This article comes from the project's official account, "AirtestProject". Copyright notice: reproduction is permitted, but the original link must be retained. Do not use it for commercial or illegal purposes.

Preface

Airtest is a cross-platform UI automation testing framework based on image recognition. It can locate a screenshot within the current screen by matching a large number of feature points, but it cannot read the text contained in that screenshot.

In automated testing we often run into scenarios that require text recognition, such as recognizing verification codes, recognizing text in screenshots, or reading values from screenshots. How can we handle them?

We can handle these cases with Tesseract-OCR, a free, open-source OCR engine for recognizing text in images.

1. Install Tesseract-OCR

Download the Tesseract-OCR installer, then run it to install the software.

When choosing the components to install, you can tick the "Additional language data (download)" check box. It installs the language packs for the other supported languages.

Also remember the path where the software is installed, because we need to add it to the Path system environment variable:

Another environment variable to add is TESSDATA_PREFIX. If it is missing, Tesseract may report the error: Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
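The two environment settings described above can be checked from Python before running any OCR. The following is only a minimal sketch: check_tesseract_env is a hypothetical helper written for this article, not part of Airtest or pytesseract, and it only verifies that the executable is reachable and that TESSDATA_PREFIX names an existing directory.

```python
import os
import shutil
from pathlib import Path


def check_tesseract_env(env=None, which=shutil.which):
    """Return a list of problems found; an empty list means the setup looks fine.

    `env` and `which` are injectable only so the check can be exercised
    without touching the real system settings (an assumption of this sketch).
    """
    env = os.environ if env is None else env
    problems = []

    # The tesseract executable must be reachable through PATH.
    if which("tesseract") is None:
        problems.append("tesseract executable not found on PATH")

    # TESSDATA_PREFIX must be set and must point to an existing directory.
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        problems.append("TESSDATA_PREFIX is not set")
    elif not Path(prefix).is_dir():
        problems.append("TESSDATA_PREFIX is not an existing directory: " + prefix)

    return problems
```

Calling check_tesseract_env() with no arguments inspects the real environment. Open a new terminal after editing the environment variables, since already running processes usually keep the old values.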

After the installation completes, run tesseract -v on the command line to check that it succeeded.
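The same version check can be scripted. This is a rough sketch under assumptions: the function names are hypothetical, and the parser assumes the first line of the output looks like "tesseract 5.3.0" or "tesseract v5.0.0-alpha", which may differ between builds.

```python
import re
import subprocess


def parse_tesseract_version(output):
    """Extract the version from a 'tesseract X.Y.Z' banner line, or return None."""
    match = re.search(r"^tesseract\s+v?(\S+)", output, flags=re.MULTILINE)
    return match.group(1) if match else None


def installed_tesseract_version():
    """Run `tesseract -v` and return its version, or None if the command is unavailable."""
    try:
        result = subprocess.run(
            ["tesseract", "-v"], capture_output=True, text=True, check=False
        )
    except OSError:
        # Raised when the executable cannot be found or started.
        return None
    # Some builds print the banner to stderr, so inspect both streams.
    return parse_tesseract_version(result.stdout + "\n" + result.stderr)
```

If installed_tesseract_version() returns None, the most likely cause is that the installation directory has not been added to Path.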

2. Install PyTesseract in the local Python environment

Since we will ultimately use Airtest and Tesseract from Python, we need to install the airtest and pytesseract libraries in the local Python environment:

pip install airtest
pip install pytesseract

After the installation completes, you can check the result by running pip list on the command line:
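As an alternative to reading the pip list output by eye, the installed versions can be queried from Python itself. This is a small sketch using the standard library's importlib.metadata (Python 3.8 or later); installed_versions is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["airtest", "pytesseract"]).items():
    print(name, version if version else "NOT INSTALLED")
```

Run it with the same python.exe that the scripts will use, otherwise the result describes a different environment.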

3. Use Airtest to capture the screenshot and identify the screenshot text

Open AirtestIDE and, under Options > Settings > Custom python.exe path, select the Python environment in which we just installed these libraries:

Take the Poco demo interface provided on the official website as an example. We use Airtest to capture the region marked by the red box, then use Tesseract to recognize and print the text in that screenshot:

The concrete implementation is as follows:

# -*- encoding=utf8 -*-
__author__ = "AirtestProject"

from airtest.core.api import *
from airtest.aircv import *
auto_setup(__file__)

from PIL import Image
import pytesseract

# Partial screenshots
screen = G.DEVICE.snapshot()
local = aircv.crop_image(screen, (132, 58, 380, 126))
# Save the partial screenshot to the specified folder
pil_image = cv2_2_pil(local)
pil_image.save("D:/test/score0.png", quality=99, optimize=True)

# Read the screenshot and recognize the text in it
image = Image.open(r'D:/test/score0.png')    
text = pytesseract.image_to_string(image)
print("----------- initial data is --------------")
print(text)

The identification results are as follows:

Key points:

① G.DEVICE.snapshot() takes a screenshot of the current device screen and keeps it in memory.

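② aircv.crop_image(screen, (x_min, y_min, x_max, y_max)) crops a partial screenshot. The first argument is the full screenshot (screen) and the second is the crop region given as (x_min, y_min, x_max, y_max).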

③ Image.open() reads the image at the given path.

④ image_to_string() extracts the text from the image.
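The crop region is given in pixels, so a hard-coded tuple only fits one screen resolution. One possible workaround is to describe the region as fractions of the screen and convert it to pixels at run time. The helper below is a hypothetical sketch and is not part of Airtest.

```python
def rect_from_ratio(screen_size, ratio_rect):
    """Convert (left, top, right, bottom) screen fractions into integer pixel coordinates.

    screen_size is (width, height) in pixels; each ratio must lie in [0, 1].
    """
    width, height = screen_size
    left, top, right, bottom = ratio_rect
    if not (0 <= left < right <= 1 and 0 <= top < bottom <= 1):
        raise ValueError("ratios must satisfy 0 <= min < max <= 1")
    return (
        round(left * width),
        round(top * height),
        round(right * width),
        round(bottom * height),
    )
```

Assuming the screenshot is an OpenCV-style array whose shape[:2] is (height, width), the result can be passed as the second argument of aircv.crop_image(), for example rect_from_ratio((screen.shape[1], screen.shape[0]), (0.25, 0.5, 0.75, 1.0)).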

4. Identify the verification code

For example, the following verification code is saved in D:/test/7364.jpg:

The identification methods and results are as follows:

# Identify the verification code
image2 = Image.open(r'D:/test/7364.jpg')    
text2 = pytesseract.image_to_string(image2)
print("----------- Verification code is --------------")
print(text2)
log("Verification code is:"+text2)
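The string returned by image_to_string() often contains spaces, line breaks, or other stray characters, so it is usually worth cleaning it before using it as a verification code. The helper below is a hypothetical sketch that assumes the code consists only of ASCII letters and digits and, optionally, has a known length.

```python
import re


def clean_captcha_text(raw, expected_length=None):
    """Keep only ASCII letters and digits; return None if the length check fails."""
    cleaned = re.sub(r"[^0-9A-Za-z]", "", raw)
    if expected_length is not None and len(cleaned) != expected_length:
        return None
    return cleaned
```

For the example above, clean_captcha_text(text2, expected_length=4) would give either a four-character string or None, which makes it easy to decide whether to retry with a fresh screenshot.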

5. Recognize Chinese characters

Recognizing Chinese characters works in basically the same way as recognizing English letters and digits. The only difference is that we need to pass a language parameter to image_to_string() (the sample code specifies Simplified Chinese, chi_sim):

# Recognize Chinese
image3 = Image.open(r'D:/test/3.png')    
text3 = pytesseract.image_to_string(image3,lang='chi_sim')
print("----------- identified as: --------------")
print(text3)
log("The recognized text is:"+text3)
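Passing lang='chi_sim' only works if the matching language data file (chi_sim.traineddata) is present in the tessdata directory, which is why the additional language data matters during installation. The check below is a minimal sketch: missing_languages is a hypothetical helper, and the tessdata directory path has to be supplied by the caller.

```python
from pathlib import Path


def missing_languages(tessdata_dir, lang):
    """Return the codes in `lang` (e.g. 'chi_sim+eng') that have no .traineddata file."""
    folder = Path(tessdata_dir)
    return [
        code
        for code in lang.split("+")
        if not (folder / (code + ".traineddata")).is_file()
    ]
```

Tesseract accepts several languages joined with "+", such as lang='chi_sim+eng' for screenshots that mix Chinese and English, and the helper splits the value the same way.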


Airtest website: airtest.netease.com/
Airtest tutorial website: airtest.doc.io.netease.com/
Build an enterprise private cloud service: airlab.163.com/b2b