Selenium and PhantomJS

Selenium is a web automation testing tool, originally developed for automated testing of websites. It is similar to the keyboard-and-mouse macro tools people use to automate games. It supports all major browsers, including headless browsers such as PhantomJS.

Following our instructions, Selenium can make the browser load pages automatically, extract the data we need, take screenshots of pages, or check whether certain actions happen on a website.

Selenium does not come with a browser of its own and provides no browser functionality by itself, so it must be used together with a third-party browser. Sometimes, though, we need it to run embedded in code without a visible browser window. In that case we can use a tool called PhantomJS instead of a real browser.

The Selenium library can be downloaded from PyPI (pypi.python.org/simple/sele…) or installed with the third-party package manager pip: pip install selenium

Selenium Python documentation: selenium-python.readthedocs.io/index.html

PhantomJS is a WebKit-based “headless” browser that loads a website into memory and executes JavaScript on the page, and runs more efficiently than a full browser because it doesn’t display a graphical interface.

By combining Selenium with PhantomJS, we can run a very powerful web crawler that handles JavaScript, cookies, headers, and anything else a real user's browser would handle.

Note: PhantomJS can only be downloaded from its official website (phantomjs.org/download.ht…). PhantomJS is a full-featured (albeit interface-less) browser, not a Python library, so it is not installed the way Python libraries are. Instead, we use it directly through Selenium calls.

PhantomJS official reference documentation: Phantomjs.org/documentati…

#3. Quick Start

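Selenium provides an API called WebDriver. A WebDriver behaves somewhat like a browser that loads websites. It can also be used like BeautifulSoup or another selector object: you can find page elements, interact with them (typing text, clicking, and so on), and perform the other actions a web crawler needs. The examples below use the Selenium 3 API. Selenium 4 dropped PhantomJS support and the find_element_by_* helper methods in favor of find_element(By.ID, ...) style calls.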

# IPython2 test code

# Import webdriver
from selenium import webdriver

# Keys is needed to send keyboard keystrokes
from selenium.webdriver.common.keys import Keys

# Create a browser object using the PhantomJS executable found on the PATH environment variable
driver = webdriver.PhantomJS()

# If PhantomJS is not on the PATH, pass its location explicitly
# driver = webdriver.PhantomJS(executable_path="./phantomjs")

# get() waits until the page has fully loaded before the program continues; tests often add time.sleep(2) here as well
driver.get("http://www.baidu.com/")

# Get the text content of the element whose id is "wrapper"
data = driver.find_element_by_id("wrapper").text

# Print the data
print(data)

# Print the page title (for Baidu's home page, roughly "Baidu it, and you'll know")
print(driver.title)

# Take a screenshot of the current page and save it
driver.save_screenshot("baidu.png")

# id="kw" is Baidu's search input box; type the string "Great Wall"
driver.find_element_by_id("kw").send_keys(u"Great Wall")

# id="su" is Baidu's search button; click() simulates a mouse click
driver.find_element_by_id("su").click()

# Take a screenshot of the new page
driver.save_screenshot("great_wall.png")

# Print the page source after rendering
print(driver.page_source)

# Print the cookies of the current page
print(driver.get_cookies())

# Ctrl+A: select everything in the input box
driver.find_element_by_id("kw").send_keys(Keys.CONTROL, 'a')

# Ctrl+X: cut the selected text
driver.find_element_by_id("kw").send_keys(Keys.CONTROL, 'x')

# Type new content into the input box
driver.find_element_by_id("kw").send_keys("itcast")

# Simulate pressing Enter
driver.find_element_by_id("su").send_keys(Keys.RETURN)

# Clear the input box
driver.find_element_by_id("kw").clear()

# Take another screenshot
driver.save_screenshot("itcast.png")

# Print the current URL
print(driver.current_url)

# close() closes only the current window (the browser exits if it was the only one)
# driver.close()

# quit() closes every window and ends the browser session
driver.quit()
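The script above only reaches driver.quit() if every step succeeds. An exception midway leaves the PhantomJS process running. One way to guard against that is sketched below, assuming nothing beyond the driver's quit() method. The context manager name quitting is hypothetical and not part of Selenium. It always calls quit(), even when an error is raised inside the block.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver, then always call driver.quit(), even after an exception.

    contextlib.closing() would call close() instead, which only closes the
    current window rather than ending the whole browser session.
    """
    try:
        yield driver
    finally:
        driver.quit()


# Usage (requires Selenium and PhantomJS):
# with quitting(webdriver.PhantomJS()) as driver:
#     driver.get("http://www.baidu.com/")
#     print(driver.title)
```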

Selenium's WebDriver provides various ways to find elements. Suppose the page contains the following form input box:

<input type="text" name="user-name" id="passwd-id" />

So:

# Find the element by its id attribute
element = driver.find_element_by_id("passwd-id")
# Find the element by its name attribute
element = driver.find_element_by_name("user-name")
# Find elements by tag name (find_elements_* returns a list)
elements = driver.find_elements_by_tag_name("input")
# Or match it with XPath
element = driver.find_element_by_xpath("//input[@id='passwd-id']")
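Before pointing find_element_by_xpath at a live page, it can help to sanity-check an expression offline. The rough sketch below uses the standard library's xml.etree.ElementTree and assumes the markup is well-formed XML. ElementTree implements only a subset of XPath and needs relative paths (.//input rather than //input). Treat it as a quick check, not a substitute for the browser's XPath engine. The helper name xpath_preview is hypothetical.

```python
import xml.etree.ElementTree as ET


def xpath_preview(markup, path):
    """Return the elements in markup that match a (limited) XPath path.

    ElementTree supports only a subset of XPath, and paths must be
    relative to the root element, e.g. ".//input[@id='passwd-id']".
    """
    root = ET.fromstring(markup)
    return root.findall(path)


form = '<form><input type="text" name="user-name" id="passwd-id" /></form>'
for element in xpath_preview(form, ".//input[@id='passwd-id']"):
    print(element.tag, element.get("name"))  # input user-name
```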

#5. Locating UI elements (WebElements)

For locating a single element, the following methods are available:

find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector

Each of these also has a find_elements_by_* variant that returns a list of all matching elements.
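The single-element methods raise NoSuchElementException when nothing matches. The find_elements variants simply return an empty list. The minimal sketch below uses this to look up an optional element without try/except. It assumes the generic driver.find_elements(by, value) method, where by is normally a By constant such as By.ID, whose value is the string "id". The helper name find_optional and the id "cookie-banner" in the usage comment are hypothetical.

```python
def find_optional(driver, by, value):
    """Return the first element matching (by, value), or None if there is none.

    driver.find_element(by, value) would raise NoSuchElementException on no
    match; driver.find_elements(by, value) returns an empty list instead.
    """
    matches = driver.find_elements(by, value)
    return matches[0] if matches else None


# Usage with a real driver ("cookie-banner" is a hypothetical id):
# from selenium.webdriver.common.by import By
# banner = find_optional(driver, By.ID, "cookie-banner")
# if banner is not None:
#     banner.click()
```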

##1.By ID

<div id="coolestWidgetEvah">... </div>

implementation

element = driver.find_element_by_id("coolestWidgetEvah")
------------------------ or -------------------------
from selenium.webdriver.common.by import By
element = driver.find_element(by=By.ID, value="coolestWidgetEvah")

By Class Name

<div class="cheese"><span>Cheddar</span></div><div class="cheese"><span>Gouda</span></div>

cheeses = driver.find_elements_by_class_name("cheese")
------------------------ or -------------------------
from selenium.webdriver.common.by import By
cheeses = driver.find_elements(By.CLASS_NAME, "cheese")

##2.By Tag Name

<iframe src="..." ></iframe>

implementation

frame = driver.find_element_by_tag_name("iframe")
------------------------ or -------------------------
from selenium.webdriver.common.by import By
frame = driver.find_element(By.TAG_NAME, "iframe")

##3.By Name

<input name="cheese" type="text"/>

implementation

cheese = driver.find_element_by_name("cheese")
------------------------ or -------------------------
from selenium.webdriver.common.by import By
cheese = driver.find_element(By.NAME, "cheese")

##4.By Link Text

<a href="http://www.google.com/search?q=cheese">cheese</a>

implementation

cheese = driver.find_element_by_link_text("cheese")
------------------------ or -------------------------
from selenium.webdriver.common.by import By
cheese = driver.find_element(By.LINK_TEXT, "cheese")

##5.By Partial Link Text

<a href="http://www.google.com/search?q=cheese">search for cheese</a>

implementation

cheese = driver.find_element_by_partial_link_text("cheese")
------------------------ or -------------------------
from selenium.webdriver.common.by import By
cheese = driver.find_element(By.PARTIAL_LINK_TEXT, "cheese")

##6.By CSS

<div id="food"><span class="dairy">milk</span><span class="dairy aged">cheese</span></div>

implementation

cheese = driver.find_element_by_css_selector("#food span.dairy.aged")
------------------------ or -------------------------
from selenium.webdriver.common.by import By
cheese = driver.find_element(By.CSS_SELECTOR, "#food span.dairy.aged")

##7.By XPath

<input type="text" name="example" /> <INPUT type="text" name="other" />

implementation

inputs = driver.find_elements_by_xpath("//input")
------------------------ or -------------------------
from selenium.webdriver.common.by import By
inputs = driver.find_elements(By.XPATH, "//input")

Sometimes we need to simulate mouse actions on the page, such as double-clicking, right-clicking, dragging, or clicking and holding. ActionChains handles these:

# Import ActionChains
from selenium.webdriver import ActionChains

# 'element', 'elementA', ... below are placeholders for real XPath expressions

# Move the mouse over element ac
ac = driver.find_element_by_xpath('element')
ActionChains(driver).move_to_element(ac).perform()

# Click on ac
ac = driver.find_element_by_xpath("elementA")
ActionChains(driver).move_to_element(ac).click(ac).perform()

# Double-click on ac
ac = driver.find_element_by_xpath("elementB")
ActionChains(driver).move_to_element(ac).double_click(ac).perform()

# Right-click on ac
ac = driver.find_element_by_xpath("elementC")
ActionChains(driver).move_to_element(ac).context_click(ac).perform()

# Press and hold the left mouse button on ac
ac = driver.find_element_by_xpath('elementF')
ActionChains(driver).move_to_element(ac).click_and_hold(ac).perform()

# Drag ac1 and drop it onto ac2
ac1 = driver.find_element_by_xpath('elementD')
ac2 = driver.find_element_by_xpath('elementE')
ActionChains(driver).drag_and_drop(ac1, ac2).perform()
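Each ActionChains method only queues a step and returns the same object. Nothing happens in the browser until perform() is called. That also makes it possible to build a drag and drop out of its individual steps, which is handy when you need extra steps between press and release. The sketch below works on any object with ActionChains' chaining methods. The helper name drag_step_by_step is hypothetical.

```python
def drag_step_by_step(chain, source, target):
    """Queue press, move, and release steps on an ActionChains-like object.

    Returns the same chain; the browser does nothing until .perform() is called.
    """
    return chain.click_and_hold(source).move_to_element(target).release(target)


# Usage with a real driver:
# drag_step_by_step(ActionChains(driver), ac1, ac2).perform()
```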

We already know how to type text into a text field, but sometimes we run into a <select> drop-down, where clicking an option directly may not work.

<select id="status" class="form-control valid" onchange="" name="status">
    <option value=""></option>
    <option value="0"> Not reviewed </option>
    <option value="1">...</option>
    <option value="2">...</option>
    <option value="3">...</option>
</select>

# Import the Select class
from selenium.webdriver.support.ui import Select

# Locate the <select> element by its name attribute and wrap it in Select
select = Select(driver.find_element_by_name('status'))

# Three ways to choose an option
select.select_by_index(1)
select.select_by_value("0")
select.select_by_visible_text(u"Not reviewed")

These are the three ways to choose an option from a drop-down box: by index, by value, or by visible text. Note:

index: counting starts from 0.

value: the value attribute of the option tag, not the text displayed in the drop-down box.

visible_text: the text inside the option tag, i.e. what the drop-down box displays.

Deselecting all options is simple: select.deselect_all() (this only works on a multi-select <select multiple> list; otherwise Selenium raises NotImplementedError).
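To see how index, value, and visible text relate, the sketch below lists each option's position, value attribute, and text. It is standard-library code, not Selenium, and runs against the first two options of the sample markup. There, index 1, value "0", and visible text "Not reviewed" all point to the same option, because index 0 is the empty first option. Selenium compares visible text against the whitespace-normalized option text, so the spaces around " Not reviewed " do not matter. The sketch approximates that with str.strip(). The names OptionCollector and list_options are hypothetical.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect [value attribute, text] pairs for each <option>, in document order."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self.options.append([dict(attrs).get("value", ""), ""])
            self._in_option = True

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False

    def handle_data(self, data):
        if self._in_option:
            self.options[-1][1] += data


def list_options(markup):
    """Return (index, value, visible text) for every option; index counts from 0."""
    parser = OptionCollector()
    parser.feed(markup)
    parser.close()
    return [(i, value, text.strip()) for i, (value, text) in enumerate(parser.options)]


status = ('<select name="status"><option value=""></option>'
          '<option value="0"> Not reviewed </option></select>')
for row in list_options(status):
    print(row)
# (0, '', '')
# (1, '0', 'Not reviewed')
```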

Pop-up alert boxes are handled by switching to them: alert = driver.switch_to_alert() (newer Selenium versions spell this driver.switch_to.alert).

#9. Switching windows

A browser often has many windows open, so there must be a way to switch between them: driver.switch_to.window("this is window name")

You can also use window_handles to get a handle for each window and switch to it, for example:

for handle in driver.window_handles:
    driver.switch_to.window(handle)
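window_handles only yields opaque handle strings. A common pattern is to loop over them and check each window's title. The minimal sketch below assumes only driver.window_handles, driver.current_window_handle, driver.switch_to.window(), and driver.title. The function name switch_to_window_titled is hypothetical.

```python
def switch_to_window_titled(driver, fragment):
    """Switch to the first window whose title contains fragment.

    Returns True on success. If no window matches, switches back to the
    window that was active before the call and returns False.
    """
    original = driver.current_window_handle
    for handle in driver.window_handles:
        driver.switch_to.window(handle)
        if fragment in driver.title:
            return True
    driver.switch_to.window(original)
    return False
```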

Page forward and back:

driver.forward()     # forward
driver.back()        # back

#11. Cookies

Get the name and value of every cookie on the page:

for cookie in driver.get_cookies():
    print("%s -> %s" % (cookie['name'], cookie['value']))
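When the crawl continues with a plain HTTP library, the browser's cookies are usually needed as a simple name-to-value mapping. The minimal sketch below relies only on get_cookies() returning a list of dicts with "name" and "value" keys. The helper name cookies_as_dict is hypothetical. The resulting dict could, for example, be passed as the cookies argument of requests.get().

```python
def cookies_as_dict(driver):
    """Collapse driver.get_cookies() into a {name: value} dict.

    get_cookies() returns a list of dicts, each with at least "name" and
    "value" keys (often alongside fields such as "domain" and "path").
    """
    return {cookie["name"]: cookie["value"] for cookie in driver.get_cookies()}
```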

Delete cookies as follows:

# Delete a single cookie by name
driver.delete_cookie("CookieName")

# Delete all cookies
driver.delete_all_cookies()

Web pages increasingly use Ajax, so a program cannot be sure when an element has finished loading. If your code tries to use an element before it has appeared in the DOM, the lookup fails with an exception (in Python, NoSuchElementException).

Waiting avoids these failures and reduces the chance of exceptions such as ElementNotVisibleException. Selenium provides two kinds of waits for this: implicit and explicit.

An implicit wait sets a maximum time for every element lookup, while an explicit wait waits until a specific condition becomes true.

Explicit wait

An explicit wait specifies a condition and a maximum wait time. If the condition is still not met when the time runs out, an exception (TimeoutException) is thrown.

from selenium import webdriver
from selenium.webdriver.common.by import By
# WebDriverWait does the repeated polling
from selenium.webdriver.support.ui import WebDriverWait
# expected_conditions provides ready-made conditions to wait for
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://www.xxxxx.com/loading")
try:
    # Poll for up to 10 seconds until the element with id="myDynamicElement" is present
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()

Unless you pass a different poll_frequency, WebDriverWait checks the condition every 0.5 seconds. until() returns as soon as the condition is met, which is immediately if the element already exists.
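The polling behavior just described can be pictured with a short pure-Python sketch. It is a simplified illustration of the idea, not WebDriverWait's actual source, and it differs in three ways. The real class passes the driver to the condition. By default it ignores NoSuchElementException between polls. It raises TimeoutException rather than the hypothetical WaitTimeout used here.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy within the timeout."""


def wait_until(condition, timeout, poll_frequency=0.5):
    """Call condition() every poll_frequency seconds until it returns something truthy.

    Returns that truthy value, or raises WaitTimeout once timeout seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %s seconds" % timeout)
        time.sleep(poll_frequency)
```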

Here are some of the built-in wait conditions that you can invoke directly instead of writing your own wait conditions.

title_is
title_contains
presence_of_element_located
visibility_of_element_located
visibility_of
presence_of_all_elements_located
text_to_be_present_in_element
text_to_be_present_in_element_value
frame_to_be_available_and_switch_to_it
invisibility_of_element_located
element_to_be_clickable (the element is displayed and enabled)
staleness_of
element_to_be_selected
element_located_to_be_selected
element_selection_state_to_be
element_located_selection_state_to_be
alert_is_present
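When none of the built-in conditions fits, until() accepts any callable. WebDriverWait calls it with the driver on every poll and returns the first truthy result. The sketch below shows a custom condition. It assumes a locator tuple such as (By.ID, "myDynamicElement") and uses only driver.find_element() and element.get_attribute(). The class name element_has_css_class and the CSS class "loaded" in the usage comment are illustrative, not part of Selenium.

```python
class element_has_css_class:
    """Custom wait condition: returns the element once it carries css_class, else False."""

    def __init__(self, locator, css_class):
        self.locator = locator      # e.g. (By.ID, "myDynamicElement")
        self.css_class = css_class

    def __call__(self, driver):
        element = driver.find_element(*self.locator)
        classes = (element.get_attribute("class") or "").split()
        return element if self.css_class in classes else False


# Usage with a real driver ("loaded" is a hypothetical CSS class):
# element = WebDriverWait(driver, 10).until(
#     element_has_css_class((By.ID, "myDynamicElement"), "loaded"))
```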

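Implicit wait

An implicit wait is simpler: you just set a wait time in seconds. From then on, whenever the driver cannot find an element immediately, it keeps retrying the lookup for up to that long before raising an exception.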

from selenium import webdriver

driver = webdriver.Chrome()
driver.implicitly_wait(10) # seconds
driver.get("http://www.xxxxx.com/loading")
myDynamicElement = driver.find_element_by_id("myDynamicElement")

If this parameter is not set, the default waiting time is 0.
