Recently I needed to verify some paper documents: photograph them and cross-check the text against the originals. I remembered that I had previously called the Youdao Smart Cloud API to do document translation, so I looked at its OCR APIs. Youdao provides a variety of OCR interfaces, covering handwriting, print, tables, whole-question recognition, shopping receipts, ID cards, business cards, and more. So this time I again used the Youdao Smart Cloud API to build a small demo, partly to try these functions out in practice and partly to prepare for features I may need in the future.

Preparations for calling the API

First, you need to create an instance and an application on your Youdao Smart Cloud personal page, bind the application to the instance, and obtain the application's ID and key. For details on registering an account and creating an application, see the development process described in my earlier article on batch file translation.

Detailed development process

The following describes the specific code development process:

This demo is developed in Python 3 and consists of three files: mainwindow.py, ocrprocesser.py, and ocrtools.py. For the interface, to keep development simple, the Python tkinter library provides file selection, recognition-type selection, and result display. ocrprocesser.py calls the appropriate API based on the selected type, completes the recognition, and returns the result; ocrtools.py wraps the various Youdao OCR APIs and dispatches calls by type.

  1. Interface part:

    Part of the interface code is as follows; it uses tkinter's grid geometry manager to arrange the widgets.

    root = tk.Tk()
    root.title("netease youdao ocr test")
    frm = tk.Frame(root)
    frm.grid(padx='50', pady='50')
    btn_get_file = tk.Button(frm, text='select image', command=get_files)
    btn_get_file.grid(row=0, column=0, padx='10', pady='20')
    text1 = tk.Text(frm, width='40', height='5')
    text1.grid(row=0, column=1)
    combox = ttk.Combobox(frm, textvariable=tk.StringVar(), width=38)
    combox["value"] = img_type_dict
    combox.current(0)
    combox.bind("<<ComboboxSelected>>", get_img_type)
    combox.grid(row=1, column=1)
    label = tk.Label(frm, text=" ")
    label.grid(row=2, column=0)
    text_result = tk.Text(frm, width='40', height='10')
    text_result.grid(row=2, column=1)
    btn_sure = tk.Button(frm, text="start", command=ocr_files)
    btn_sure.grid(row=3, column=1)
    btn_clean = tk.Button(frm, text="clean", command=clean_text)
    btn_clean.grid(row=3, column=2)
    root.mainloop()
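The snippet above references several names it never defines (img_type_dict, get_files, get_img_type, clean_text). The post does not show them, so the following is a hypothetical reconstruction of what they might look like, not the author's original code:

```python
# Hypothetical reconstructions of helpers referenced by the interface code;
# the original post does not show them.

# Combobox entries, ordered to match the numeric type codes used in ocrtools
# (0-hand write, 1-print, 2-ID card, 3-name card, 4-table, 5-problem).
img_type_dict = ["hand write", "print", "ID card", "name card", "table", "problem"]

img_paths = []   # files chosen by the user, consumed by ocr_files()
img_type = 0     # currently selected recognition type code

def get_files():
    """Let the user pick one or more images and list them in the text1 widget."""
    from tkinter import filedialog  # imported lazily so the sketch stays importable
    global img_paths
    img_paths = list(filedialog.askopenfilenames(
        filetypes=[("images", "*.jpg *.jpeg *.png")]))
    text1.delete('1.0', tk.END)
    text1.insert(tk.END, '\n'.join(img_paths))

def get_img_type(event):
    """Map the combobox selection index to the numeric type code."""
    global img_type
    img_type = combox.current()

def clean_text():
    """Clear both the file list and the result box."""
    text1.delete('1.0', tk.END)
    text_result.delete('1.0', tk.END)
```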

The ocr_files() handler bound to btn_sure passes the file paths and recognition type to ocrprocesser:

def ocr_files():
    if ocr_model.img_paths:
        ocr_result = ocr_model.ocr_files()
        text_result.insert(tk.END, ocr_result)
    else:
        tk.messagebox.showinfo("prompt", "no file")

2. The main method in ocrprocesser is ocr_files(), which base64-encodes each image and calls the wrapped API.

def ocr_files(self):
    results = []
    for img_path in self.img_paths:
        img_file_name = os.path.basename(img_path).split('.')[0]
        # print('===========' + img_file_name + '===========')
        with open(img_path, 'rb') as f:
            img_code = base64.b64encode(f.read()).decode('utf-8')
        results.append(self.ocr_by_netease(img_code, self.img_type))
    return results
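The base64 step above can be isolated into a small helper. This is a sketch; `encode_image` is a name of my choosing, not one from the post:

```python
import base64

def encode_image(img_path):
    """Read an image file and return the base64 string that the Youdao
    OCR interfaces expect in the `img` / `q` request fields."""
    with open(img_path, 'rb') as f:
        return base64.b64encode(f.read()).decode('utf-8')
```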

3. After reading through and organizing the API documentation, the interfaces can be roughly divided into four entry points: handwriting/print recognition, ID card/business card recognition, table recognition, and whole-question recognition. Each interface has its own URL, and the request parameters are not entirely consistent across them.

# 0-hand write
# 1-print
# 2-ID card
# 3-name card
# 4-table
# 5-problem
def get_ocr_result(img_code,img_type):
    if img_type==0 or img_type==1:
        return ocr_common(img_code)
    elif img_type==2 or img_type==3 :
        return ocr_card(img_code,img_type)
    elif img_type==4:
        return ocr_table(img_code)
    elif img_type==5:
        return ocr_problem(img_code)
    else:
        return "error:undefined type!"

Then populate the `data` fields required by each interface, do some simple parsing of each interface's return value, and return it:

def ocr_common(img_code):
    YOUDAO_URL='https://openapi.youdao.com/ocrapi'
    data = {}
    data['detectType'] = '10012'
    data['imageType'] = '1'
    data['langType'] = 'auto'
    data['img'] =img_code
    data['docType'] = 'json'
    data=get_sign_and_salt(data,img_code)
    response=do_request(YOUDAO_URL,data)['regions']
    result=[]
    for r in response:
        for line in r['lines']:
            result.append(line['text'])
    return result
def ocr_card(img_code,img_type):
    YOUDAO_URL='https://openapi.youdao.com/ocr_structure'
    data={}
    if img_type==2:
        data['structureType'] = 'idcard'
    elif img_type==3:
        data['structureType'] = 'namecard'
    data['q'] = img_code
    data['docType'] = 'json'
    data=get_sign_and_salt(data,img_code)
    return do_request(YOUDAO_URL,data)
def ocr_table(img_code):
    YOUDAO_URL='https://openapi.youdao.com/ocr_table'
    data = {}
    data['type'] = '1'
    data['q'] = img_code
    data['docType'] = 'json'
    data=get_sign_and_salt(data,img_code)
    return do_request(YOUDAO_URL,data)
def ocr_problem(img_code):
    YOUDAO_URL='https://openapi.youdao.com/ocr_formula'
    data = {}
    data['detectType'] = '10011'
    data['imageType'] = '1'
    data['img'] = img_code
    data['docType'] = 'json'
    data=get_sign_and_salt(data,img_code)
    response=do_request(YOUDAO_URL,data)['regions']
    result = []
    for r in response:
        for line in r['lines']:
            for l in line:
                result.append(l['text'])
    return result
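do_request() is used throughout the functions above but is never shown in the post. A minimal sketch using only the standard library (the actual demo may well use the `requests` package instead) could look like this:

```python
import json
import urllib.parse
import urllib.request

def do_request(url, data, timeout=10):
    """POST the signed form fields to a Youdao endpoint and parse the JSON
    response. `data` is the dict built by the ocr_* functions above."""
    body = urllib.parse.urlencode(data).encode('utf-8')
    req = urllib.request.Request(url, data=body, method='POST')
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))
```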

get_sign_and_salt() adds the signature and the other required fields to data:

def get_sign_and_salt(data, img_code):
    data['signType'] = 'v3'
    curtime = str(int(time.time()))
    data['curtime'] = curtime
    salt = str(uuid.uuid1())
    signStr = APP_KEY + truncate(img_code) + salt + curtime + APP_SECRET
    sign = encrypt(signStr)
    data['appKey'] = APP_KEY
    data['salt'] = salt
    data['sign'] = sign
    return data
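truncate() and encrypt() come from Youdao's sample code and are not shown above. Per Youdao's v3 signing convention, long inputs are truncated to the first and last 10 characters with the total length in between, and the signature is a SHA-256 hex digest; a sketch:

```python
import hashlib

def truncate(q):
    """Youdao v3 'input' rule: strings of 20 characters or fewer are used
    as-is; longer ones become first 10 chars + length + last 10 chars."""
    if q is None:
        return None
    size = len(q)
    return q if size <= 20 else q[0:10] + str(size) + q[size - 10:size]

def encrypt(sign_str):
    """SHA-256 hex digest used as the v3 signature."""
    return hashlib.sha256(sign_str.encode('utf-8')).hexdigest()
```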

Results display

Handwriting recognition result:

Print recognition (recognizing code, a fitting test image for a programmer):

Business card recognition. I used a business card template found online; the accuracy looks decent:

ID card recognition (also a template):

Table recognition (the returned JSON is super long, >_< emmm...):

Whole-question recognition (formula recognition is included; the result JSON is also quite long and not very intuitive, so I won't paste it here):

Conclusion

Overall, the interfaces are powerful and cover a wide range of scenarios. One gap: there is no built-in image classifier, so the caller must pick the right interface for each image type, and the interfaces cannot be mixed at all. For example, during development I submitted a business card image to the ID card API and got back "Items not found!". This makes calling the API a bit more troublesome for developers, though it presumably improves recognition accuracy to some extent, and I suspect it also makes per-interface billing easier :P.

Project address: github.com/LemonQH/Wor…

Source: Program Girl Danny