Project address: github.com/kerlomz/cap…

Download version: github.com/kerlomz/cap…

Note: If you encounter crashes when using a cloud server (Windows Server edition), follow these steps: My Computer — Properties — Management — Add Roles and Features — check Desktop Experience, click Install, and restart after installation.

2020/06/01 update:

Presumably you found this article by accident while searching online. The quality of articles out there is uneven, clickbait titles abound, open-source code is rare, and code that actually runs is even rarer; you often hit problems such as memory leaks, hard-coded network parameters that break as soon as the training set changes, low recognition rates on other samples, and no calling examples at all.

Before I go any further, I can assure you that this will be the most practical, production-grade captcha recognition project you have ever seen.

  1. For complete beginners: you don't need to write a single line of code.
  2. For small businesses: its availability and stability can withstand real-world testing, its performance is also industry-leading, and you can adopt it with confidence.

Since I'm planning to change industries, I want to leave something behind before leaving this field to prove I was here. People keep telling me that this deployment can't be called; maybe what you really want is a single pip command to set up the environment, so today I have prepared MuggleOCR for you: pypi.org/project/mug… It bundles a general model for simple captcha recognition plus general printed-character recognition, and it supports calling models trained by this framework. The invocation takes only three lines of core code:

import time
# STEP 1
import muggle_ocr
import os
# STEP 2
sdk = muggle_ocr.SDK(model_type=muggle_ocr.ModelType.OCR)
root_dir = r"./imgs"
for i in os.listdir(root_dir):
    n = os.path.join(root_dir, i)
    with open(n, "rb") as f:
        b = f.read()
    st = time.time()
    # STEP 3
    text = sdk.predict(image_bytes=b)
    print(i, text, time.time() - st)

This really is that simple, and it is sufficient for general text recognition and captchas. (The text recognition model will be updated in a few days; after all, the 0601 model only trained for half a day.)

1. Introduction

This project requires Python 3.7 and a GPU no weaker than an NVIDIA GTX 1050Ti. The master branch now includes a GUI configuration interface and a compiled version, so it is time to write a new article.

**To make a long story short and cut to the chase:** the code currently available online is mostly oriented toward teaching and research. This project is aimed at pragmatists: as long as the basic environment is installed, you can train the model you want, redefining "anyone can use deep learning" so that a commercial-grade result can be trained with a few simple parameters.

For the network, the author chose the currently most popular CNN + BLSTM + CTC (CRNN) combination for end-to-end variable-length captcha recognition, and kept CNNX (a small network the author put together himself) / MobileNet / DenseNet121 / ResNet50 as options that can be selected directly in the configuration screen. First, a general overview:

| Network structure | Predict (CPU) | Predict (GPU) | Model size |
| --- | --- | --- | --- |
| CNN5+Bi-LSTM+H64+CTC | 15ms | 8ms | 2 MB |
| CNN5+CrossEntropy | 8ms | 2ms | 1.5 MB |

H16/H64 refers to UnitsNum, the number of hidden units in the Bi-LSTM. This project uses the GPU for training and the CPU for prediction. The prediction-service deployment project can be found here: github.com/kerlomz/cap… A compiled version of the deployment project can be downloaded at github.com/kerlomz/cap…

2. Environment dependencies

I've spent quite some space on the basic setup of the training environment, mainly for readers who have not yet gotten started; old hands can simply skip it. If you don't want to waste time on the environment, feel free to use the compiled version, which can be downloaded from the address at the beginning of the article.

CUDA / cuDNN / TensorFlow version compatibility:

Linux

| Version | Python version | Compiler | Build tools | cuDNN | CUDA |
| --- | --- | --- | --- | --- | --- |
| tensorflow_gpu-1.14.0 | 3.7 | GCC 4.8 | Bazel 0.15.0 | 7.6 | 9 |

Windows

| Version | Python version | Compiler | Build tools | cuDNN | CUDA |
| --- | --- | --- | --- | --- | --- |
| tensorflow_gpu-1.14.0 | 3.7 | MSVC 2015 update 3 | Bazel 0.15.0 | 7.6 | 10 |

If you want to use a different CUDA and cuDNN combination, you can either compile TensorFlow yourself or search GitHub for "TensorFlow Wheel" to find a third-party WHL installation package. Be forewarned: compiling it yourself is a lot of trouble with many pitfalls, so I won't expand on that here.

2.1 Environment dependence of the project

Currently, it has been tested on the following mainstream operating system platforms:

| Operating system | Minimum supported version |
| --- | --- |
| Ubuntu | 16.04 |
| Windows | 7 SP1 |
| MacOS | N/A |

The main environment dependencies of this training project are listed below:

| Dependency | Minimum supported version |
| --- | --- |
| Python | 3.7 |
| TensorFlow-GPU | 1.14.0 |
| Opencv-Python | 4.1.2.30 |
| Numpy | 1.16.0 |
| Pillow | 4.3.0 |
| PyYaml | 3.13 |
| tqdm | N/A |

2.1.1 Python 3.7 in Ubuntu 16.04

1) Install the Python environment first (skip this step if Python 3.7 is already installed)

sudo apt-get install openssl
sudo apt-get install libssl-dev
sudo apt-get install libc6-dev gcc
sudo apt-get install -y make build-essential zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm tk-dev
wget https://www.python.org/ftp/python/3.7.6/Python-3.7.6.tgz
tar -xvf Python-3.7.6.tgz
cd Python-3.7.6
./configure --prefix=/usr/local --enable-shared
make -j8
sudo make install -j8

If libpython3.7m.so.1.0 cannot be found, go to /usr/local/lib and copy that file to /usr/lib and /usr/lib64. After Python is installed, install the project dependencies with pip3 install -r requirements.txt. It is highly recommended to use a virtual environment such as Virtualenv or Anaconda to isolate each project's environment; I usually use Virtualenv. If you need to modify the code, PyCharm is the recommended Python IDE.

virtualenv -p /usr/bin/python3 venv # venv is the name of the virtual environment.
cd venv/ # venv is the name of the virtual environment.
source bin/activate # to activate the current virtual environment.
cd captcha_trainer # captcha_trainer is the project path.
pip3 install -r requirements.txt

2.1.2 CUDA/cuDNN in Ubuntu 16.04

Having read many tutorials online and deployed it many times myself, I have found Ubuntu 16.04 to give relatively few problems. Support on 14.04 is not as good, and if the motherboard does not support disabling SecureBoot, do not install the Desktop edition, because after installation you will be stuck in an endless loop at the login screen and never reach the desktop. Online tutorials say to blacklist the nouveau driver; I skipped that entirely and verified it is not necessary. Install the driver from the runfile: the deb package automatically installs a default driver, which may leave you stuck in an NVIDIA driver login loop. Download addresses: graphics driver: www.geforce.cn/drivers; CUDA: developer.nvidia.com/cuda-downlo…; cuDNN: developer.nvidia.com/cudnn (you need to register an NVIDIA account and log in, then download the deb installation package).

2. Close the GUI. Press Ctrl+Alt+F1 to switch to the character terminal, then stop the graphical interface:

sudo service lightdm stop

3. Install the Nvidia Driver

The version in the command corresponds to the downloaded version. Download the latest version based on your graphics card model from the preceding download address. Remember that the installation package is in runfile format.

chmod a+x NVIDIA-Linux-x86_64-384.90.run  # obtain execute permission
sudo ./NVIDIA-Linux-x86_64-384.90.run --no-x-check --no-nouveau-check --no-opengl-files  # install the driver

Run the following command to verify the installation. If the graphics card information is displayed, the installation is successful

nvidia-smi

4. Install CUDA

1) Install some system dependent libraries first

sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
  2) Run the installer and follow the prompts; if asked whether to install the driver, choose not to install it.
sudo sh cuda_9.0.176_384.81_linux.run

If the environment variables are not configured, append the following to the end of the ~/.bashrc file:

export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Then run sudo ldconfig in the terminal to refresh the library cache. After installation, reboot the machine or restart the graphical interface:

sudo service lightdm start

2.1.3 Windows

On Windows it is actually much easier: just download the installation packages from the official websites and install them without much thought. The downloads are the same as for Ubuntu: first install Python and the graphics driver, then CUDA, then download the matching cuDNN and copy its files into the corresponding CUDA paths.

3. Usage

Before training, many group members often ask me questions like "how many samples do I need to train a four-character alphanumeric captcha?" Let me give a unified answer here: the number of samples mainly depends on the complexity of the samples' features.

A few reference points: Is there deformation? Is there rotation? Is there complex background interference? Are there multiple fonts? How large is the character set? How many characters (labels) are there?

  1. For simple captchas, a few hundred samples are generally enough (you will need to adjust the validation set size and validation batch size).
  2. For slightly more complex ones, a few thousand samples will usually do.
  3. Especially complex ones need tens of thousands of samples.
  4. Chinese captchas, with thousands of classes, generally need around 100,000.

Note: if you have prepared fewer than 100 samples, please do not attempt training, because it simply will not work.

With the first step, environment setup, out of the way, you are ready to run the code; but a few prerequisites remain, since you cannot make bricks without straw. First of all, since this is training, you need a training set. Here is a beginner-friendly training set, an MNIST handwriting recognition example, which can be downloaded from Tencent Cloud: share.weiyun.com/5pzGF4V. Now everything is ready except the east wind.

3.1 Define a model

This project is driven entirely by parameter configuration; no code changes are needed, and almost any character-based image captcha can be trained through the visual interface. The training framework's interface can be roughly divided into several areas:

  1. Neural Network – neural network configuration area
  2. Project Configuration – Project Configuration area
  3. Sample Source – Sample Source configuration area
  4. Training Configuration – Training Configuration area
  5. Buttons – Functional control area

The training configuration steps are as follows:

  1. The configuration items in the neural network area may look numerous, but beginners can simply use the default configuration: the CNNX+GRU+CTC+C1 combination (CNN front-end network + GRU + CTC + single channel).
  2. In the project configuration area, configure the project name after selecting the network; press Enter or click a blank area to confirm.
  3. The configuration items in the sample source area configure the paths of the sample sources. Training samples are packed into TFRecords format according to these paths. Validation samples can be randomly drawn from the total training set via the [Validation Set Num] parameter, so no separate path needs to be specified.
  4. The configuration items in the training configuration area define the conditions for ending training, such as end accuracy, end cost, end epochs, and batch size.
  5. Once the configuration items are set, in the function control area click [Make Dataset] to pack the samples, then click [Start Training] to begin training.

The following is for those with a foundation:

If CrossEntropy is used as the decoder, attention must be paid to the relationship between the number of labels, LabelNum, and the image size. Because the network is designed for multiple labels (generally, multiple labels are handled by attaching multiple classifiers directly), the output of the convolution layer is reshaped as follows:

Reshape([label_num, int(outputs_shape[1] / label_num)])

In order for int(outputs_shape[1] / label_num) to be a positive integer, a certain relationship must hold between them. For the CNN5+CrossEntropy network structure, where the stride of the Conv2D layer is 1, the following relationship must be satisfied:


So sometimes you need to resize the network's input shape.

| Network | Pool step size ^ number of pooling layers | Output layer parameter |
| --- | --- | --- |
| CNN5 | 16 | 64 |
| CNNX | 8 | 64 |
| ResNet50 | 16 | 1024 |
| DenseNet | 32 | 2048 |

For example, if CNN5+CrossEntropy is used, the input width and input height must meet the following requirements:
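To make this concrete, below is a minimal sketch of the divisibility check implied by the table above, assuming the CNN5 values (pool step size to the power of pooling layers = 16, output layer parameter = 64); the framework's exact formula may differ, so treat the numbers as illustrative only.

# Hedged sketch: checks whether a CNN5 + CrossEntropy input size is compatible
# with a given LabelNum, assuming the flattened conv output is
# (width // 16) * (height // 16) * 64, per the table above.
def cross_entropy_shape_ok(width, height, label_num):
    pool_factor = 16   # pool step size ^ number of pooling layers for CNN5
    output_param = 64  # output layer parameter for CNN5
    outputs_len = (width // pool_factor) * (height // pool_factor) * output_param
    # Reshape([label_num, outputs_len / label_num]) needs an integer second dimension
    return outputs_len % label_num == 0

print(cross_entropy_shape_ok(width=160, height=64, label_num=4))  # True under these assumptions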


Similarly, if CNN5+RNN+CTC is used, the output after the convolution layer is reshaped as follows:

Reshape([-1, outputs_shape[2] * outputs_shape[3]])

The original output has shape (batch_size, outputs_shape[1], outputs_shape[2], outputs_shape[3]), while the RNN layer's input and output are required to be (batch, timesteps, num_classes). To feed the RNN, the operation above introduces the concept of a time step, so the value of timesteps is outputs_shape[1]. CTC Loss requires input of shape [batch_size, frames, num_labels]; if timesteps is smaller than the number of labels, the loss cannot be computed, no minimum can be found in the loss function, and there is nothing for gradient descent to descend. The most reasonable value for timesteps is usually twice the number of labels. To achieve this you can again resize the network's input shape; in most cases, timesteps is directly tied to the width of the image.
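As a small illustration of the reshape just described (the shapes are made-up examples, not the framework's actual values), the convolution output can be flattened into the (batch, timesteps, features) tensor expected by the RNN and CTC like this:

import numpy as np

# Illustrative only: example shapes, not the framework's actual values.
batch_size, t, h, c = 32, 25, 4, 64           # conv output: (batch, outputs_shape[1], [2], [3])
conv_output = np.zeros((batch_size, t, h, c))

# Reshape([-1, outputs_shape[2] * outputs_shape[3]]) -> (batch, timesteps, features)
rnn_input = conv_output.reshape((batch_size, t, h * c))
print(rnn_input.shape)                         # (32, 25, 256)

label_num = 12                                 # number of characters in the captcha
# CTC cannot be computed if timesteps < number of labels; roughly 2x is the
# recommendation given above (here 25 >= 2 * 12).
assert t >= 2 * label_num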

ExtractRegex parameters:

Note: if your training set is named in a format different from the beginner training set I provided, modify the ExtractRegex regular expression as needed. Currently this can only be changed by editing the YAML configuration file directly; GUI modification is not yet supported. The DatasetPath and SourcePath parameters allow multiple paths, which suits people who need to train several sample sources into one model, or who want to train a set of generic models. In most cases the character set Category does not actually need to be modified: graphic captchas rarely go beyond digits and English letters and are generally case-insensitive. Because the quality of training sets collected from coding platforms varies (some are labeled in upper case, some in lower case), it is best to unify everything as lower case; the ALPHANUMERIC_LOWER category automatically converts upper-case labels to lower case. If the captcha contains Chinese characters, the character set can be customized, for example:

Category: ['often', 'the', 'better', 'slow', 'the south', 'system', 'root', 'hard']

For single-label classification, LabelNum=1 can be used, for example:

Category: ["Carrier"."Rain boots"."Wool"."Safety helmet"."Palette"."Sea gull"."Calendar"."Tennis racket". ]Copy the code

Example file name: Aircraft carrier_1231290424123.png

If it is multi-label classification, LabelSplit=& can be used, for example:

Category: ["Carrier"."Rain boots"."Wool"."Safety helmet"."Palette"."Sea gull"."Calendar"."Tennis racket". ]Copy the code

Example file name: Aircraft carrier&Rain boots&Wool_1231290424123.png
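For illustration, here is a hedged sketch of how labels could be extracted from file names like the ones above with a regular expression and a LabelSplit character. The pattern shown is a hypothetical example, not necessarily the project's default ExtractRegex.

import re

# Hypothetical pattern: everything before the trailing "_<digits>.png" is the label part.
# The project's actual default ExtractRegex may differ.
extract_regex = r"^(.+)_\d+\.(?:png|jpg)$"
label_split = "&"   # LabelSplit used for multi-label classification

def labels_from_filename(filename):
    match = re.match(extract_regex, filename)
    if not match:
        raise ValueError("file name does not match ExtractRegex: %s" % filename)
    return match.group(1).split(label_split)

print(labels_from_filename("Aircraft carrier_1231290424123.png"))                  # ['Aircraft carrier']
print(labels_from_filename("Aircraft carrier&Rain boots&Wool_1231290424123.png"))  # ['Aircraft carrier', 'Rain boots', 'Wool']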

Note: Chinese character sets are generally much larger than digit/English sets, so convergence is slow at the beginning; training takes more time and requires a larger sample size.

An image like the one above can be easily trained to a recognition rate of more than 95%.

Preprocessing parameters:

These parameters are used to preprocess images such as GIFs.

The ConcatFrames parameter selects two frames to concatenate horizontally, which suits scrolling GIFs, while flickering GIFs can be fused using the BlendFrames parameter.
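As a rough sketch of those two ideas (this only illustrates the concept, not the framework's actual ConcatFrames/BlendFrames implementation, and the input file name is hypothetical):

from PIL import Image, ImageChops, ImageSequence

# Illustration only: how two GIF frames could be concatenated or fused.
gif = Image.open("sample.gif")                        # hypothetical input file
frames = [f.convert("L") for f in ImageSequence.Iterator(gif)]

# "ConcatFrames"-style: stitch two chosen frames side by side (scrolling GIFs).
a, b = frames[0], frames[-1]
concat = Image.new("L", (a.width + b.width, max(a.height, b.height)))
concat.paste(a, (0, 0))
concat.paste(b, (a.width, 0))

# "BlendFrames"-style: fuse flickering frames, here by keeping the darker pixel.
blend = frames[0]
for frame in frames[1:]:
    blend = ImageChops.darker(blend, frame)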

3.2 Start training

  1. After collection, samples are named in the form xxx_<random number>.png.
  2. Pack the samples directly through the GUI's [Make Dataset] button, or with make_dataset.py. Note: running this project's functional modules from source requires some programming background; parameter-modification points and examples have been reserved, so try not to modify the code of core classes or functions, to avoid errors.

Following the introduction above, you only need to change the values of a few parameters to start the formal training journey. Specifically: you can run trains.py directly with PyCharm's Run, run it from a terminal with the Virtualenv activated, or run it in a global environment with the dependencies installed, but using the GUI throughout is recommended.

python3 trains.py

All that's left is to wait, watch the process, and wait for the result. A normal start to training looks something like this:

At the end of training, a graph directory containing the pb file and a model directory containing the YAML file will be generated under the project's out path. Now it is time to deploy them.

3.3 Deployment

The deployment project really deserves a serious introduction; I put more effort into it than into the training project. Why? Project address: github.com/kerlomz/cap…

If you want to integrate this system into your own project, you can refer to the python-SDK usage: pypi.org/project/mug… The core of that package is based on captcha_platform/sdk/pb/sdk.py, which can be modified as needed, or you can directly use MuggleOCR to call models produced by this training framework. (The specific calling method is documented at the link above.)

Compiled version: github.com/kerlomz/cap…

A few things that are really worth knowing

  1. Manage multiple models at the same time and support hot swap of models
  2. Flexible version control
  3. Support batch identification
  4. Service intelligent routing policy

First, the author rewrote TensorFlow's Graph session management and designed a session pool, which allows multiple models to be managed simultaneously and realizes a dynamic multi-model deployment scheme.

1) As long as the trained PB model is placed in the graph path of the deployment project and the YAML model configuration file is placed in model, it can be discovered and loaded by the service. (Both are placed in the same directory when called with SDK)

2) If you need to uninstall a serving model, simply delete the YAML configuration file of the model in Model and delete the corresponding PB model in GRAPH.

3) If you need to update a model that is already in service, just give the new model's YAML configuration file a version number higher than the old model's. Following the order of placing the pb first and the YAML second, the service will automatically discover and load the new model; the old model will no longer be invoked because its version is lower, and you can unload the deprecated model as described above to free memory. None of these operations requires restarting the service, and the switch is completely seamless.

Second, if one service needs to serve several image recognition requirements, a strategy can be defined: during training, all images of the same size are trained into one model, and the service automatically selects which model to use based on the image size. This design lets customization and generality coexist. Once enough varied training sets have been accumulated, they can all be trained together into one general-purpose model, or each can stay independent; each additional model adds only a small amount of memory or video memory. Most solutions found online deploy one model per service, and having every process load a full TensorFlow framework is far too bloated.
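The routing idea can be sketched in a few lines. This is only a conceptual illustration of picking a model by image size; the deployment project's real session pool and routing logic are more involved, and the model names here are made up.

# Conceptual sketch only: route a request to a model based on the image size.
loaded_models = {
    (150, 50): "model_a",    # hypothetical model for 150x50 captchas
    (300, 150): "model_b",   # hypothetical model for 300x150 captchas
}

def pick_model(width, height):
    if (width, height) in loaded_models:
        return loaded_models[(width, height)]
    # Fall back to the closest registered size if there is no exact match.
    return min(loaded_models.items(),
               key=lambda kv: abs(kv[0][0] - width) + abs(kv[0][1] - height))[1]

print(pick_model(150, 50))   # model_a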

Batch recognition is needed by relatively few people, so I won't go into it here. But here is an example for 12306:

FieldParam:
  CorpParams: [
    {
      "start_pos": [118, 0],
      "interval_size": [0, 0],
      "corp_num": [1, 1],
      "corp_size": [60, 30]
    },
    {
      "start_pos": [5, 40],
      "interval_size": [5, 5],
      "corp_num": [4, 2],
      "corp_size": [66, 66]
    }
  ]
  OutputCoord: True

This parameter can be used to crop a large image into a batch of small images that are fed in as a single batch, avoiding multiple calls.
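My reading of the CorpParams above is that each entry describes a grid of crops: starting at start_pos, corp_num gives the number of crops horizontally and vertically, each of corp_size, separated by interval_size. Below is a rough Pillow sketch under that interpretation (the field semantics are an assumption on my part, and the image path is hypothetical):

from PIL import Image

def crop_batch(img, corp_params):
    # Assumed interpretation of CorpParams: cut a large image into a batch of crops.
    crops = []
    for p in corp_params:
        sx, sy = p["start_pos"]
        ix, iy = p["interval_size"]
        nx, ny = p["corp_num"]
        w, h = p["corp_size"]
        for row in range(ny):
            for col in range(nx):
                left = sx + col * (w + ix)
                top = sy + row * (h + iy)
                crops.append(img.crop((left, top, left + w, top + h)))
    return crops

params = [
    {"start_pos": [118, 0], "interval_size": [0, 0], "corp_num": [1, 1], "corp_size": [60, 30]},
    {"start_pos": [5, 40], "interval_size": [5, 5], "corp_num": [4, 2], "corp_size": [66, 66]},
]
batch = crop_batch(Image.open("12306.png"), params)   # 1 title crop + 8 sub-image crops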

In addition, the recognition project offers several optional services: gRPC, Flask, Tornado, and Sanic. Flask and Tornado provide encrypted interfaces, similar in spirit to the SecretKey/AccessKey interfaces of the WeChat Official Account development API. Those interested can read the calling source code in demo.py.

The deployment project can be compiled into an executable with package.py, which saves the trouble of setting up the environment when changing machines. The installation process for the deployment project is the same as for the training project, and the required dependencies are listed in the project's requirements.txt. It is strongly recommended that deployment projects install the CPU version of TensorFlow.

The Tornado edition is recommended for deploying this project; it has the most complete functionality and the most stable performance.

Linux:

  1. Tornado:
# port 19952
python3 tornado_server.py
  2. Flask:
# Option 1, run directly, port 19951
python flask_server.py 
# Option 2, gunicorn, port 5000
pip install gunicorn 
gunicorn -c deploy.conf.py flask_server:app
  3. Sanic:
# port 19953
python3 sanic_server.py
  4. gRPC:
# port 50054
python3 grpc_server.py
  5. Compiled version (based on Tornado):
# foreground run
./captcha_platform_tornado
# background run
nohup ./captcha_platform_tornado &

Windows: On the Windows platform, start the corresponding service with python3 xxx_server.py. Note that the performance of Tornado, Flask and Sanic drops significantly on Windows. For the compiled version, simply run the compiled EXE executable.

3.4 Calling/Testing

1. Tornado Service

| Request URL | Content-Type | Parameter form | Request method |
| --- | --- | --- | --- |
| http://localhost:19952/captcha/v1 | application/json | JSON | POST |

Specific parameters:

| Parameter name | Required | Type | Description |
| --- | --- | --- | --- |
| image | Yes | String | Base64-encoded image |
| model_name | No | String | Model name, bound in the YAML configuration |
| need_color | No | String | Color filter: black/red/blue/yellow/green/white |
| output_split | No | String | Multi-label split character |

{"image": "base64-encoded image binary stream"}

Return result:

| Parameter name | Type | Description |
| --- | --- | --- |
| message | String | Recognition result or error message |
| code | String | Status code |
| success | String | Whether the request succeeded |

The response is in JSON format, for example: {"message": "xxxx", "code": 0, "success": true}
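A minimal client sketch for the Tornado service described above, assuming it is running locally on the default port 19952, the requests library is installed, and captcha.png is a test image of your own:

import base64
import requests

with open("captcha.png", "rb") as f:               # hypothetical test image
    b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:19952/captcha/v1",
    json={"image": b64},                           # optionally also model_name, need_color, output_split
)
print(resp.json())                                 # e.g. {"message": "xxxx", "code": 0, "success": True}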

2. Flask Service:

| Request URL | Content-Type | Parameter form | Request method |
| --- | --- | --- | --- |
| http://localhost:19951/captcha/v1 | application/json | JSON | POST |

The request parameters and return format are the same as above

3. Sanic Services:

| Request URL | Content-Type | Parameter form | Request method |
| --- | --- | --- | --- |
| http://localhost:19953/captcha/v1 | application/json | JSON | POST |

The request parameters and return format are the same as above

4. gRPC Service: the dependencies are grpcio and grpcio_tools plus the corresponding grpc.proto file, which can be extracted directly from the sample code demo.py in the project.

python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. ./grpc.proto

The command above uses grpcio and grpcio_tools to generate the gRPC Python modules from grpc.proto.

class GoogleRPC(object):

    def __init__(self, host: str):
        self._url = '{}:50054'.format(host)
        self.true_count = 0
        self.total_count = 0

    def request(self, image, model_type=None, model_site=None):

        import grpc
        import grpc_pb2
        import grpc_pb2_grpc
        channel = grpc.insecure_channel(self._url)
        stub = grpc_pb2_grpc.PredictStub(channel)
        response = stub.predict(grpc_pb2.PredictRequest(
            image=image, split_char=',', model_type=model_type, model_site=model_site
        ))
        return {"message": response.result, "code": response.code, "success": response.success}

if __name__ == '__main__':
    # The host must be supplied; 127.0.0.1 is assumed here for a locally running service.
    result = GoogleRPC(host="127.0.0.1").request("Base64-encoded image binary stream")
    print(result)

3.5 Miscellaneous tricks

The deployment project's middleware/impl/color_extractor.py implements a color-separation module based on K-Means, which can be used to handle captchas whose characters are drawn in different colors.
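As a rough illustration of the idea only (this is not the project's color_extractor.py implementation), K-Means can cluster a captcha's pixels by color and keep just the cluster closest to a target color; the file names and cluster count below are assumptions:

import cv2
import numpy as np

img = cv2.imread("color_captcha.png")                   # hypothetical input image
pixels = img.reshape((-1, 3)).astype(np.float32)

# Cluster pixel colors into 5 groups (the cluster count is an arbitrary choice here).
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, 5, None, criteria, 3, cv2.KMEANS_RANDOM_CENTERS)

# Keep only the cluster whose center is closest to the target color (red, in BGR).
target = np.array([0, 0, 255], dtype=np.float32)
keep = int(np.argmin(np.linalg.norm(centers - target, axis=1)))
mask = (labels.flatten() == keep).reshape(img.shape[:2]).astype(np.uint8) * 255
cv2.imwrite("red_only.png", mask)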

There is also a plan to predict both the captcha text and the color of each character, but that requires modifying the existing neural network to support it: the last layer would be changed to a dual output, one output for the color and one for the corresponding character. This raises the requirements on sample labels and increases the cost, so if you can use unlimited generated samples, these problems are solved. For captchas like the one above, the author has written sample-generation code; if interested, see: www.jianshu.com/p/da1b972e2… There are many, many more tricks, for example substituting generated samples for the training set. In fact, most image captchas found online are built on open-source libraries with modifications, and in most cases they can be approximately reproduced by a generator. The captcha images shown above do not represent any actual website; any resemblance is purely coincidental. This project may only be used for learning and exchange purposes and must not be used for illegal purposes.

Afterword

If the descriptions in this article are not detailed enough or you need technical support, you can join group 857149419 for consultation, or open an issue on the open-source project. It would be a great honor to contribute something to the open-source community.