This is the 19th day of my participation in the August Wenwen Challenge.

Selenium can locate elements by id, class name, tag name, CSS selector, and XPath. Which one you use matters little, as long as it reliably finds the element you want.

This article looks at XPath. Whether or not you use Selenium, XPath is a common choice once you start learning about crawlers seriously.

First of all, what is XPath, and what does it do? XPath (XML Path Language) is a language specified by the W3C for selecting nodes in XML documents, and it is widely used on HTML documents as well. The current major browsers (Chrome, Firefox, Edge, Safari) all support XPath syntax.

Absolute paths

An absolute path spells out every step from the root node. Here the root node is /html, so we start at /html and work our way down to a specific element. Absolute paths are usually long. For example:

from selenium.webdriver.common.by import By
elements = driver.find_elements(By.XPATH, "/html/body/div")

The XPath above names every step from the root with no gaps, so it is an absolute path.

Take Baidu as another example. The full XPath of one of its input elements is:

/html/body/div[1]/div[1]/div[5]/div/div/form/span[2]/input
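To see where the indexes in such a full path come from, here is a minimal sketch that rebuilds an absolute XPath for an element. It uses Python's built-in xml.etree.ElementTree rather than Selenium so that it runs without a browser. The page and the helper name full_xpath are made up for illustration, and the rule of adding an index only when several siblings share a tag is an assumption chosen to match the shape of the path above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for illustration.
PAGE = """
<html>
  <body>
    <div id="header">
      <form>
        <span><input name="q"/></span>
        <span><input name="go"/></span>
      </form>
    </div>
    <div id="footer"><p>about</p></div>
  </body>
</html>
"""


def full_xpath(root, target):
    """Return the absolute XPath of target, written from the root node."""
    # ElementTree elements do not know their parent, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not None:
        parent = parent_of.get(node)
        step = node.tag
        if parent is not None:
            same_tag = [c for c in parent if c.tag == node.tag]
            # XPath positions are 1-based; add an index only when
            # several siblings share the same tag.
            if len(same_tag) > 1:
                step += "[%d]" % (same_tag.index(node) + 1)
        steps.append(step)
        node = parent
    return "/" + "/".join(reversed(steps))


root = ET.fromstring(PAGE)
go_input = root.find(".//input[@name='go']")
print(full_xpath(root, go_input))
```

Every step is fixed by the page layout, which is why an absolute path breaks as soon as a wrapper element is added or removed.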

Relative paths

The opposite of an absolute path is a relative path. A relative path does not spell out every step from the root node. It usually starts with a double slash //, which matches nodes at any depth. For example:

//div//a

You might ask: is a relative path really that short? Yes, and because it is so short it is likely to match many elements, so be careful. Check that the matched element is the one you need, otherwise you may get errors when you use it.

The complete matching code is:

from selenium.webdriver.common.by import By
elements = driver.find_elements(By.XPATH, "//div//a")

This matches every a element nested at any depth inside a div, which can be a lot of elements.
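A relative path like this can be tried out without a browser. The sketch below uses Python's built-in xml.etree.ElementTree, which supports only a small subset of XPath and evaluates paths relative to an element, so the expression is written with a leading dot. The page content is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page used only for illustration.
PAGE = """
<html>
  <body>
    <div id="nav">
      <a href="/home">Home</a>
      <ul><li><a href="/news">News</a></li></ul>
    </div>
    <p><a href="/outside">Not inside a div</a></p>
    <div id="footer"><a href="/about">About</a></div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# ElementTree needs the leading "."; ".//div//a" plays the role of "//div//a".
links = root.findall(".//div//a")
hrefs = [a.get("href") for a in links]
print(hrefs)
```

The link inside the ul is matched even though it sits several levels below its div, while the link inside the p is skipped because it has no div ancestor. This is the sense in which a short relative path can match more than you expect.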

Wildcards

The wildcard * matches any element node, whatever its tag name.

If you want to select all direct child elements of every div node, you can use the expression //div/*:

from selenium.webdriver.common.by import By
elements = driver.find_elements(By.XPATH, "//div/*")

The code above matches every direct child element of every div on the page.
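The difference between a direct child and a deeper descendant is easy to check on a small document. This is a minimal sketch with Python's built-in xml.etree.ElementTree instead of Selenium, again with a leading dot because ElementTree evaluates paths relative to an element. The page content is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page used only for illustration.
PAGE = """
<html>
  <body>
    <div id="card">
      <h2>Title</h2>
      <p>Some <b>bold</b> text</p>
    </div>
    <div id="footer"><span>contact</span></div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# ".//div/*" plays the role of "//div/*": direct child elements of every div.
children = root.findall(".//div/*")
tags = [child.tag for child in children]
print(tags)
```

The b element does not appear in the result because it is a grandchild of the div, not a direct child. To reach it as well, the single slash before * would have to become a double slash.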

If you are interested in learning about crawlers and automated testing, you can follow the public account "code like poetry" and leave me a message. I will help you learn systematically.