Playwright is a new-generation automated testing tool that Microsoft open-sourced in early 2020. Like Selenium and Pyppeteer, it can drive a browser to perform all kinds of automated operations. It is powerful, supports all the major browsers on the market, and has a simple yet capable API. Although it arrived relatively late, it is developing rapidly.

Now that Pyppeteer is no longer maintained, Playwright, an open source tool that is both well documented and powerful, is a great choice.

Installation

conda config --add channels conda-forge
conda config --add channels microsoft
conda install playwright
playwright install

The playwright install command above downloads and installs the browser binaries for Chromium, Firefox, and WebKit.

Features

Playwright supports all major browsers: Chromium, Firefox, and WebKit.
Playwright supports testing of mobile pages, using device emulation to test responsive web applications in mobile web browsers.
Playwright supports both headless and non-headless (headed) modes in all browsers.
Playwright is easy to install and configure: it installs the browser drivers automatically and requires no additional WebDriver configuration.
Playwright provides a rich set of automation APIs that automatically wait for the relevant node to load when the page loads, which greatly simplifies writing automation code.
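
The device emulation and headless points above can be illustrated with a short sketch. It assumes the "iPhone 11" descriptor (the device name that appears in Playwright's own codegen help text) is present in the installed device registry; the helper name emulate_mobile and the returned dictionary are hypothetical and only for illustration.

```python
def emulate_mobile(url: str, device_name: str = "iPhone 11", headless: bool = True) -> dict:
    """Open `url` in an emulated mobile device and report what the page sees.

    Hypothetical helper for illustration only; not part of Playwright.
    """
    # Imported inside the function so this module loads even where
    # Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # A device descriptor bundles viewport, user agent, touch support, etc.
        device = p.devices[device_name]
        browser = p.webkit.launch(headless=headless)
        # Emulation settings are applied per browser context.
        context = browser.new_context(**device)
        page = context.new_page()
        page.goto(url)
        info = {
            "user_agent": page.evaluate("navigator.userAgent"),
            "viewport": page.viewport_size,
        }
        browser.close()
        return info
```

Calling emulate_mobile("https://www.baidu.com") should return the emulated user agent and viewport; passing headless=False shows the browser window while it runs.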

Usage

Import Playwright into a Python script and launch one of the three browsers (Chromium, Firefox, or WebKit). Playwright supports two programming modes: an asynchronous mode like Pyppeteer's and a synchronous mode like Selenium's. We can choose whichever fits our actual needs.

Let’s start with a basic example in synchronous mode:

from playwright.sync_api import sync_playwright


with sync_playwright() as p:
    for browser_type in [p.chromium, p.firefox, p.webkit]:
        browser = browser_type.launch(headless=False)
        page = browser.new_page()
        page.goto("https://www.baidu.com")
        page.screenshot(path=f"screenshot-{browser_type.name}.png")
        print(page.title())
        browser.close()

Here we first import the sync_playwright function and call it. It returns a PlaywrightContextManager object, which can be thought of as the browser's context manager; entering it with the with statement yields a Playwright object, which is assigned to the variable p. We then take the chromium, firefox, and webkit properties of p and call the launch method of each in a for loop, setting headless to False.

One caveat here: if headless is not set to False, the browser launches in headless mode by default and we don’t see any windows.

The launch method returns a Browser object, which we assign to the variable browser. We then call new_page, which is equivalent to opening a new tab. It returns a Page object, which is assigned to the variable page. The next step is to call the page object's automation APIs. After the page is loaded, a screenshot is saved, the title is printed to the console, and the browser is closed. This code calls two methods of the page object:

1. screenshot: takes a file path as the path parameter, and the screenshot is automatically saved to that file.

2. title: returns the title of the page.

After it runs, three screenshot files are generated in the current directory, all of Baidu’s home page, each with the browser name in its file name, as shown in the figure:

Console output:

百度一下,你就知道
百度一下,你就知道
百度一下,你就知道

In addition to the synchronous mode described above, Playwright also supports an asynchronous mode. If your project uses asyncio, you should consider the asynchronous API, which is written as follows:

import asyncio
from playwright.async_api import async_playwright


async def main():
    async with async_playwright() as p:
        for browser_type in [p.chromium, p.firefox, p.webkit]:
            browser = await browser_type.launch()
            page = await browser.new_page()
            await page.goto("https://www.baidu.com")
            await page.screenshot(path=f"screenshot-{browser_type.name}.png")
            print(await page.title())

            await browser.close()

asyncio.run(main())

As you can see from the above code, the entire script is very similar to the synchronous mode.

Note:

1. The async_playwright function is imported instead of sync_playwright.

2. The async/await keywords are added where needed.
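
One reason to prefer the asynchronous mode is that several pages can be driven concurrently. The following is a minimal sketch under that assumption; the helper names fetch_title and fetch_titles are hypothetical, and the sketch uses only asyncio.gather together with the page methods already shown above.

```python
import asyncio
from typing import List


async def fetch_title(browser, url: str) -> str:
    """Open `url` in its own page and return the page title."""
    page = await browser.new_page()
    try:
        await page.goto(url)
        return await page.title()
    finally:
        # Always release the page, even if navigation fails.
        await page.close()


async def fetch_titles(urls: List[str]) -> List[str]:
    """Fetch the titles of all `urls` concurrently in one Chromium instance."""
    # Imported here so the module loads even where Playwright is not installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            # gather runs the coroutines concurrently and keeps input order.
            return await asyncio.gather(*(fetch_title(browser, u) for u in urls))
        finally:
            await browser.close()
```

For example, asyncio.run(fetch_titles(["https://www.baidu.com", "https://example.com"])) would return the two titles in the same order as the input list.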

Code generation

Playwright can also record what we do in the browser and automatically generate code from it. This is done with the codegen command, so let’s first look at what parameters the codegen command has.

playwright codegen --help

The results look something like this:

Usage: npx playwright codegen [options] [url]

open page and generate code for user actions

Options:
  -o, --output <file name>     saves the generated script to a file
  --target <language>          language to generate, one of javascript, test, python, python-async, csharp (default: "python")
  -b, --browser <browserType>  browser to use, one of cr, chromium, ff, firefox, wk, webkit (default: "chromium")
  --channel <channel>          Chromium distribution channel, "chrome", "chrome-beta", "msedge-dev", etc
  --color-scheme <scheme>      emulate preferred color scheme, "light" or "dark"
  --device <deviceName>        emulate device, for example "iPhone 11"
  --geolocation <coordinates>  specify geolocation coordinates, for example "37.819722,-122.478611"
  --ignore-https-errors        ignore HTTPS errors
  --load-storage <filename>    load context storage state from the file, previously saved with --save-storage
  --lang <language>            specify language / locale, for example "en-GB"
  --proxy-server <proxy>       specify proxy server, for example "http://myproxy:3128" or "socks5://myproxy:8080"
  --save-storage <filename>    save context storage state at the end, for later use with --load-storage
  --save-trace <filename>      record a trace for the session and save it to a file
  --timezone <time zone>       time zone to emulate, for example "Europe/Rome"
  --timeout <timeout>          timeout for Playwright actions in milliseconds (default: "10000")
  --user-agent <ua string>     specify user agent string
  --viewport-size <size>       specify browser viewport size in pixels, for example "1280, 720"
  -h, --help                   display help for command

Examples:

  $ codegen
  $ codegen --target=python
  $ codegen -b webkit https://example.com

You can see several options above: -o sets the name of the output code file; --target specifies the language to generate. The default is python, which generates synchronous code; if python-async is passed in, asynchronous code is generated. -b selects the browser type, and the default is Chromium. --device emulates a mobile browser; --lang sets the browser language; and --timeout sets the timeout for Playwright actions in milliseconds.
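
When codegen is launched from a script, the options described above can be assembled programmatically. The sketch below is only an illustration: build_codegen_command is a hypothetical helper, and it uses just the flags listed in the help output (-o, --target, -b, --device).

```python
import shlex
from typing import List, Optional


def build_codegen_command(
    url: Optional[str] = None,
    output: Optional[str] = None,
    target: str = "python",
    browser: str = "chromium",
    device: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `playwright codegen` (hypothetical helper)."""
    cmd = ["playwright", "codegen", "--target", target, "-b", browser]
    if output:
        cmd += ["-o", output]       # file that receives the generated script
    if device:
        cmd += ["--device", device]  # e.g. "iPhone 11"
    if url:
        cmd.append(url)             # optional start page
    return cmd


# Show the command line that would be executed.
print(shlex.join(build_codegen_command(output="test3.py", target="python-async")))
```

The resulting list can be handed to subprocess.run(cmd, check=True) to open the recorder.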

With that in mind, let’s launch Chromium and write the generated code to test3.py with the following command:

playwright codegen -o test3.py --target python-async

A Chromium browser window pops up, with a script window on the right that displays the code for each action in real time.

You can see that the browser also highlights the node being operated on, along with the node name.

The code changes in real time as we operate. When we are done, we can close the browser, and Playwright generates a test3.py file that reads as follows:

import asyncio

from playwright.async_api import Playwright, async_playwright


async def run(playwright: Playwright) -> None:
    browser = await playwright.chromium.launch(headless=False)
    context = await browser.new_context()

    # Open new page
    page = await context.new_page()

    # Go to https://www.baidu.com/
    await page.goto("https://www.baidu.com/")

    # Click input[name="wd"]
    await page.click("input[name=\"wd\"]")

    # Click input[name="wd"]
    await page.click("input[name=\"wd\"]")

    # Fill input[name="wd"]
    await page.fill("input[name=\"wd\"]", "How to get rich?")

    # Click text=百度一下
    # async with page.expect_navigation(url="https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd=%E5%A6%82%E4%BD%95%E6%A6%9C%E4%B8%8A%E5%AF%8C%E5%A9%86&fenlei=256&rsv_pq=ca59e3ec000cf6aa&rsv_t=5f82kcndi6iqNSwqOVo5sd%2BHSoqhzQHKLGVs1HFegxx02UtWAA5gHQbWBfw&rqlang=cn&rsv_enter=0&rsv_dl=tb&rsv_sug3=24&rsv_sug1=14&rsv_sug7=100&rsv_btype=i&prefixsug=%25E5%25A6%2582%25E4%25BD%2595%25E6%25A6%259C%25E4%25B8%258A%25E5%25AF%258C%25E5%25A9%2586&rsp=4&inputT=8686&rsv_sug4=68370&rsv_jmp=fail"):
    async with page.expect_navigation():
        await page.click("text=百度一下")
    # assert page.url == "https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd=%E5%A6%82%E4%BD%95%E6%A6%9C%E4%B8%8A%E5%AF%8C%E5%A9%86&fenlei=256&rsv_pq=ca59e3ec000cf6aa&rsv_t=5f82kcndi6iqNSwqOVo5sd%2BHSoqhzQHKLGVs1HFegxx02UtWAA5gHQbWBfw&rqlang=cn&rsv_enter=0&rsv_dl=tb&rsv_sug3=24&rsv_sug1=14&rsv_sug7=100&rsv_btype=i&prefixsug=%25E5%25A6%2582%25E4%25BD%2595%25E6%25A6%259C%25E4%25B8%258A%25E5%25AF%258C%25E5%25A9%2586&rsp=4&inputT=8686&rsv_sug4=68370"

    # Close page
    await page.close()

    # ---------------------
    await context.close()
    await browser.close()


async def main() -> None:
    async with async_playwright() as playwright:
        await run(playwright)


asyncio.run(main())

You can see that this code is basically similar to the code we wrote before, and it’s completely runnable, and when it runs, you can see that it repeats what we just did.

Note also that new_page is called here not on browser but on the context variable, which is created by calling new_context on the browser object. The context variable is a BrowserContext object, an independent context similar to an incognito window, whose running resources are isolated from those of other contexts.
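
The isolation between contexts can be checked with a small experiment: a cookie added to one context should not be visible from another. This is a minimal sketch, assuming Chromium is installed; the function name contexts_are_isolated and the cookie name "demo" are made up for illustration.

```python
def contexts_are_isolated(url: str = "https://www.baidu.com") -> bool:
    """Return True if a cookie set in one context is invisible in another.

    Hypothetical helper for illustration only; not part of Playwright.
    """
    # Imported here so the module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        # Two independent contexts inside the same browser process.
        ctx_a = browser.new_context()
        ctx_b = browser.new_context()

        # Store a cookie for `url` in the first context only.
        ctx_a.add_cookies([{"name": "demo", "value": "1", "url": url}])

        names_a = {cookie["name"] for cookie in ctx_a.cookies(url)}
        names_b = {cookie["name"] for cookie in ctx_b.cookies(url)}
        browser.close()

    return "demo" in names_a and "demo" not in names_b
```

If contexts behave like separate incognito windows, as described above, calling contexts_are_isolated() should return True.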

Selectors

Playwright's documentation on selectors is very thorough; you can refer to it directly at playwright.dev/python/docs…

Event listeners

The Page object provides an on method that can be used to listen for events on the page, such as close, console, load, request, response, and so on.

For example, we can listen for the response event, which is triggered every time a network request receives a response. We can set a callback to obtain all the information about each response.

from playwright.sync_api import sync_playwright


def on_response(response):
    print(f'Status {response.status}: {response.url}')

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.on('response', on_response)
    page.goto('https://www.kenshujun.cn/')
    page.wait_for_load_state('networkidle')
    browser.close()

Once the page object is created, we listen for the response event and set the callback to on_response. The on_response function takes the response as its argument and prints its status code and URL.

If you run it, you can see that the output matches what is loaded in the browser’s Network panel.
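
Instead of printing each response, the callback can collect responses for later analysis. The sketch below assumes only the status and url attributes used above; the helper names make_recorder, summarize, failed_urls, and crawl are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Tuple

ResponseLog = List[Tuple[int, str]]


def make_recorder(log: ResponseLog) -> Callable:
    """Return a callback that appends (status, url) of each response to `log`."""
    def on_response(response) -> None:
        log.append((response.status, response.url))
    return on_response


def summarize(log: ResponseLog) -> Counter:
    """Count the recorded responses per HTTP status code."""
    return Counter(status for status, _ in log)


def failed_urls(log: ResponseLog) -> List[str]:
    """Return the URLs whose status code signals an error (400 and above)."""
    return [url for status, url in log if status >= 400]


def crawl(url: str) -> ResponseLog:
    """Load `url` and return every response seen until the network is idle."""
    # Imported here so the module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    log: ResponseLog = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("response", make_recorder(log))
        page.goto(url)
        page.wait_for_load_state("networkidle")
        browser.close()
    return log
```

After log = crawl(url), summarize(log) gives a count per status code and failed_urls(log) lists the requests that returned an error.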
