Today I'll share a practical skill: using Python to batch-read ID card information and write it to Excel.

Reading

Taking ID cards stored as images as an example, we read the information with Baidu's OCR (optical character recognition) service. The Baidu API offers a free daily quota, which is generally enough for everyday use. Let's look at how to use Baidu OCR.

SDK installation

Baidu Cloud provides SDKs for Python, Java, and other languages. The Python SDK is easy to install with pip install baidu-aip and supports Python 2.7+ and 3.x.
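To confirm the SDK is installed, a quick check like the one below can help. It is only a minimal sketch: it asks Python whether the aip module (the import name used by baidu-aip, as in from aip import AipOcr) can be located, without importing it or contacting Baidu. The sdk_installed name is hypothetical.

```python
import importlib.util


def sdk_installed(module_name="aip"):
    """Return True if the given top-level module can be found in this environment.

    baidu-aip is imported as ``aip`` (e.g. ``from aip import AipOcr``).
    find_spec only locates the module; it does not execute it.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if sdk_installed():
        print("baidu-aip is installed")
    else:
        print("baidu-aip not found; run: pip install baidu-aip")
```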

Create an application

Creating an application requires a Baidu or Baidu Cloud account. The registration and login address is: https://login.bce.baidu.com/?redirect=http%3A%2F%2Fcloud.baidu.com%2Fcampaign%2Fcampus-2018%2Findex.html. After logging in, hover the mouse over your profile picture and click User Center in the pop-up menu, as shown:

On your first visit, fill in the requested information, as shown in the figure:

Click Save when you’re done.

Next, hover over the > symbol on the left, select Artificial Intelligence, and click Text Recognition, as shown in the picture:

Clicking it opens the following page:

Now click Create App, which takes us to the page below:

The figure above shows that Baidu OCR can recognize many categories of documents, so it is not limited to ID cards: if you have other recognition needs, it can cover those quickly as well.

Fill in the application name and description, then click Create Now.

Once the application is created, go back to the application list, as shown below:

We will need the AppID, API Key, and Secret Key listed there, so make a note of them.
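Instead of pasting these three values into the script, one option is to read them from environment variables so they stay out of source control. The sketch below assumes hypothetical variable names BAIDU_APP_ID, BAIDU_API_KEY and BAIDU_SECRET_KEY; Baidu does not define these, so pick whatever names suit you.

```python
import os

# Hypothetical environment variable names; Baidu does not prescribe these.
CREDENTIAL_VARS = ("BAIDU_APP_ID", "BAIDU_API_KEY", "BAIDU_SECRET_KEY")


def load_credentials(env=None):
    """Return (APP_ID, API_KEY, SECRET_KEY) read from the environment.

    Raises RuntimeError listing every variable that is missing or empty,
    so a misconfigured machine fails before any API call is made.
    """
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)
```

With the variables set, APP_ID, API_KEY, SECRET_KEY = load_credentials() yields the three values that AipOcr expects.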

Code implementation

The implementation is simple and takes only a few lines of Python:

from aip import AipOcr

APP_ID = 'your APP_ID'
API_KEY = 'your API_KEY'
SECRET_KEY = 'your SECRET_KEY'
# Create a client object
client = AipOcr(APP_ID, API_KEY, SECRET_KEY)
# Open the image and read its contents as bytes
with open("idcard.jpg", "rb") as f:
    fp = f.read()
# res = client.basicGeneral(fp)  # standard precision
res = client.basicAccurate(fp)  # high precision

As the code shows, recognition comes in a standard mode and a high-precision mode. To improve the recognition rate, we use high-precision mode here.
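Since both modes take the same image bytes, the choice can be made a parameter. The sketch below also checks for a failed call; it assumes that on failure the response dict carries error_code and error_msg keys instead of words_result, which is how Baidu's APIs generally report errors, so verify this against the responses you actually receive. The recognize function and OcrError class are hypothetical names; any object with basicGeneral and basicAccurate methods (such as the AipOcr client above) can be passed in.

```python
class OcrError(Exception):
    """Raised when the OCR response reports an error (hypothetical helper)."""


def recognize(client, image_bytes, accurate=True):
    """Run OCR on raw image bytes and return the response dict.

    accurate=True  -> client.basicAccurate (high precision)
    accurate=False -> client.basicGeneral  (standard precision)
    Assumes an error response contains 'error_code' and 'error_msg'.
    """
    method = client.basicAccurate if accurate else client.basicGeneral
    res = method(image_bytes)
    if "error_code" in res:
        raise OcrError("OCR failed (%s): %s" % (res["error_code"], res.get("error_msg", "")))
    return res
```

With the client from the snippet above, res = recognize(client, fp) behaves like the direct basicAccurate call but stops with a readable message instead of failing later on a missing words_result key.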

As test data, take the following three fake ID cards I found online:

Because there are several ID card images, we need a function that traverses them. The implementation is as follows:

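import os

def findAllFile(base):
    # Walk base and all its subdirectories, yielding the full path of every file
    for root, ds, fs in os.walk(base):
        for f in fs:
            yield os.path.join(root, f)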
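findAllFile yields every file in the folder, so a stray non-image file (a notes file, say) would be sent to the API too. A possible refinement is to keep only common image extensions and sort the paths so the Excel rows come out in a predictable order. The list_images name and the extension set are assumptions; adjust them to the formats your scans actually use.

```python
from pathlib import Path

# Assumed set of image extensions to keep; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(base):
    """Return sorted paths (as strings) of image files under base, recursively."""
    return sorted(
        str(p)
        for p in Path(base).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```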

The raw ID card information returned by the recognition call has the following format (the card text, which is Chinese, is shown here in translation):

{'words_result': [{'words': 'Name inspired'},
                  {'words': 'Gender male Ethnicity Han'},
                  {'words': 'Born December 20, 1654'},
                  {'words': 'Address No. 4 Jingshan Qian Street, Dongcheng District, Beijing'},
                  {'words': 'Jing Shi Fang'},
                  {'words': 'Citizen ID no. 11204416541220243X'}],
 'log_id': 1411522933129289151,
 'words_result_num': 6}
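In this structure, words_result holds one dict per recognized line of text and words_result_num is the number of lines found. A small helper (the name ocr_lines is hypothetical) can pull out the plain strings and sanity-check that the two agree before any parsing:

```python
def ocr_lines(res):
    """Return the recognized text lines from an OCR result dict, in order.

    Raises ValueError if words_result_num disagrees with the number of
    entries actually present, which would suggest something went wrong.
    """
    lines = [item["words"] for item in res.get("words_result", [])]
    expected = res.get("words_result_num", len(lines))
    if expected != len(lines):
        raise ValueError("expected %d lines, got %d" % (expected, len(lines)))
    return lines


sample = {
    "words_result": [{"words": "Jing Shi Fang"}, {"words": "Citizen ID no. 11204416541220243X"}],
    "log_id": 1411522933129289151,
    "words_result_num": 2,
}
print(ocr_lines(sample))
# ['Jing Shi Fang', 'Citizen ID no. 11204416541220243X']
```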

Writing

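We use pandas to write the ID information to Excel, but first the raw recognition result needs some preprocessing: we store each card's name, gender, ethnicity, date of birth, address, and ID number in separate lists. Because the OCR returns the Chinese text printed on the card, the code matches the Chinese field labels (姓名 name, 性别 gender, 民族 ethnicity, 出生 birth, 住址 address, 公民身份号码 citizen ID number) and slices them off. The processing code is as follows: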

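names, genders, nations, births, addrs, ids = [], [], [], [], [], []  # create once, before the image loop

base = "./images/"  # hypothetical folder holding the card images
for path in findAllFile(base):
    with open(path, "rb") as f:
        res = client.basicAccurate(f.read())
    addr = ""  # reset for each card; the address may span several lines
    for tex in res["words_result"]:
        row = tex["words"]
        if "姓名" in row:  # "Name"
            names.append(row[2:])
        elif "性别" in row:  # "Gender" and "Ethnicity" share a line, e.g. 性别男民族汉
            genders.append(row[2:3])
            nations.append(row[5:])
        elif "出生" in row:  # "Born"
            births.append(row[2:])
        elif "住址" in row:  # first line of "Address"
            addr += row[2:]
        elif "公民身份号码" in row:  # "Citizen ID number" (a 6-character label)
            ids.append(row[6:].strip())
        else:  # any other line is treated as a continuation of the address
            addr += row
    addrs.append(addr)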
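The fixed slices (row[2:], row[5:] and so on) depend on each label having an exact length and sitting at the very start of its line. A variant that strips labels by name is a little more forgiving, and it returns one dict per card, so a field that was not recognized shows up as None instead of shifting every later row; a list of such dicts can be passed straight to pd.DataFrame. This parse_idcard function is a hypothetical sketch, not the original author's code. It assumes the same Chinese labels as above and that gender and ethnicity share one line, as in the sample.

```python
def parse_idcard(lines):
    """Turn the OCR text lines of one ID card into a dict of fields.

    Assumes gender and ethnicity share a line (e.g. '性别男民族汉') and that
    unlabeled lines right after the '住址' line continue the address.
    Fields that are not found stay None.
    """
    card = dict.fromkeys(["name", "gender", "ethnicity", "birth", "address", "id"])
    address_parts = []
    in_address = False
    for raw in lines:
        row = raw.replace(" ", "")  # drop any spaces the OCR may have inserted
        if row.startswith("姓名"):  # Name
            card["name"] = row[len("姓名"):]
        elif row.startswith("性别"):  # Gender + Ethnicity on one line
            gender, _, ethnicity = row[len("性别"):].partition("民族")
            card["gender"], card["ethnicity"] = gender, ethnicity
        elif row.startswith("出生"):  # Date of birth
            card["birth"] = row[len("出生"):]
        elif row.startswith("住址"):  # First address line
            address_parts.append(row[len("住址"):])
            in_address = True
            continue
        elif row.startswith("公民身份号码"):  # Citizen ID number
            card["id"] = row[len("公民身份号码"):]
        elif in_address:  # Continuation of the address
            address_parts.append(row)
            continue
        in_address = False  # any other line ends the address block
    if address_parts:
        card["address"] = "".join(address_parts)
    return card
```

To use it, feed it the text of one card, for example parse_idcard([t["words"] for t in res["words_result"]]).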

After that, writing the information to Excel takes only a few lines:

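import pandas as pd

df = pd.DataFrame({"name": names, "gender": genders, "ethnicity": nations,
                   "birth": births, "address": addrs, "id": ids})
# index=False leaves pandas' row index out of the sheet; .xlsx output needs an engine such as openpyxl
df.to_excel('idcards.xlsx', index=False)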
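Building a DataFrame from a dict of lists requires every list to have the same length; if one card is missing a field (say, its name line was not recognized), pandas raises a ValueError that does not say which column is short. A quick check before building the frame gives a clearer message. check_columns is a hypothetical helper, called with the same dict you pass to pd.DataFrame:

```python
def check_columns(columns):
    """Verify that all column lists have the same length before building a DataFrame.

    columns: dict mapping column name -> list of values.
    Returns the common length; raises ValueError listing each length otherwise.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        detail = ", ".join("%s=%d" % item for item in lengths.items())
        raise ValueError("column lengths differ: " + detail)
    return next(iter(lengths.values()), 0)
```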

Here is what the resulting spreadsheet looks like:

With that, we have implemented batch reading of ID card information and writing it to Excel.

To get the source code, reply "ID card" to the WeChat official account Python small two.