OCR based on a neural network

Optical Character Recognition (OCR) commonly refers to recognizing the text in a picture, such as the ID number, name, and address on an ID card, or the card number on a bank card.

Evil

Github repo

Evil is a simple recognition framework for iOS and macOS. It supports installation through CocoaPods, Carthage, and Swift Package Manager. The underlying recognition model can be easily migrated to other platforms.

The basic flow of OCR recognition

1. Capture the area to be recognized from the whole image. E.g.: capture the rectangular area where the ID card is located from the whole picture.
2. Crop the text area. E.g.: crop the ID card number area using a suitable algorithm.
3. Apply a series of preprocessing steps to the text area to make the next operations easier. E.g.: Gaussian blur, dilation, and so on.
4. Split the text, meaning that the text area is divided into single characters.
5. Feed each single character into the neural network.

Evil uses the latest Vision framework to do this. For the first four steps, Apple gives us ready-made system APIs, for example VNDetectTextRectanglesRequest. So we won’t discuss the implementation details of the first four steps here, but if you want to learn how to use the API, you can see it here.

How to use a neural network to recognize a single character

I personally think that recognizing a small set of printed characters can be handled by an image classification model; if you have a better solution, I would be glad to hear it. This section is easy to follow if you have experience tuning CNNs. If you don’t know the basics of neural networks, it may be a little hard to follow, and I am no expert on them either. If you have no prior background, you can look at Turi Create, provided by Apple, which saves you from designing your own network.

0x00 Design the network

First, we need to design a CNN that takes a picture of a single character as input and classifies it. Because our recognition task is very simple, the network architecture is also very simple. Here is the Keras (2.0.6) code:

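from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# num_classes is the number of distinct characters to recognize (defined elsewhere)
model = Sequential()

# Three convolution + pooling stages, each followed by dropout
model.add(Conv2D(32, (5, 5), input_shape=(28, 28, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(128, (1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
# Classifier head: one softmax output per character class
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))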
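To check that the layer sizes above fit a 28x28 input, it can help to trace the feature-map shape by hand. The sketch below is only an illustration, not part of Evil: it assumes the Keras defaults used above (unpadded convolutions with stride 1, and pooling with stride equal to the pool size), and the helper names are made up for this example.

```python
def conv_out(size, kernel):
    """Output size of an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool):
    """Output size of max pooling whose stride equals the pool size."""
    return size // pool


def trace_shapes(size=28):
    """Follow the spatial size and channel count through the three conv/pool stages."""
    shapes = []
    for kernel, channels in [(5, 32), (3, 64), (1, 128)]:
        size = pool_out(conv_out(size, kernel), 2)
        shapes.append((size, size, channels))
    return shapes


shapes = trace_shapes()
height, width, channels = shapes[-1]
flat = height * width * channels  # number of values that reach Flatten()

print(shapes)  # [(12, 12, 32), (5, 5, 64), (2, 2, 128)]
print(flat)    # 512
```

So the first Dense layer receives 512 values per image. A larger input size would change this number, but not the code of the model itself.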

0x01 Generate training data

We know that training a network requires a lot of raw data, so what do we do without it? Some training data can be found online, but what about a task like ours, recognizing ID numbers?

The answer, of course, is to generate the data with a script. We first generate a large number of ID number text areas, then make some random changes to increase the diversity of the data.

import random
from PIL import Image, ImageDraw, ImageEnhance, ImageFont

# BACKGROUND, FONT, LABELS, output and idx are defined elsewhere in the script
o_image = Image.open(BACKGROUND)
draw_brush = ImageDraw.Draw(o_image)

# Draw the label text with a random font size and a small random offset
font_size = random.randint(-5, 5) + 35
draw_brush.text((10 + random.randint(-10, 10), 15 + random.randint(-2, 2)), LABELS,
                fill='black',
                font=ImageFont.truetype(FONT, font_size))

o_image = ImageEnhance.Color(o_image).enhance(
    random.uniform(0.5, 1.5))  # coloring
o_image = ImageEnhance.Brightness(o_image).enhance(
    random.uniform(0.5, 1.5))  # brightness
o_image = ImageEnhance.Contrast(o_image).enhance(
    random.uniform(0.5, 1.5))  # contrast
o_image = ImageEnhance.Sharpness(o_image).enhance(
    random.uniform(0.5, 1.5))  # sharpness
o_image = o_image.rotate(random.randint(-2, 2))  # rotation

o_image.save(output + '/%d.png' % idx)
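The script above draws whatever string is stored in LABELS. One possible way to produce those strings is sketched below. This is an assumption on my part, not the code used for Evil: it builds random 18-character numbers whose last character follows the standard check-digit rule for Chinese ID numbers (weighted sum modulo 11), while the region and birth-date fields are just random digits. The function names are hypothetical.

```python
import random

# Check-digit rule for 18-character Chinese ID numbers (weighted sum modulo 11)
WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CODES = "10X98765432"


def check_char(first17):
    """Compute the 18th character from the first 17 digits."""
    total = sum(int(d) * w for d, w in zip(first17, WEIGHTS))
    return CHECK_CODES[total % 11]


def random_id_number(rng=random):
    """Random label with a valid check character (other fields are not realistic)."""
    first17 = "".join(rng.choice("0123456789") for _ in range(17))
    return first17 + check_char(first17)


def is_valid(number):
    """True if the string has 18 characters and a matching check character."""
    return (len(number) == 18
            and all(c in "0123456789" for c in number[:17])
            and check_char(number[:17]) == number[17])


label = random_id_number()
print(label, is_valid(label))
```

For training a per-character classifier, the check character is not strictly needed, since any mix of digits and X gives useful samples. Keeping it valid simply means the same labels can also be used to test a checksum verification step after recognition.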

Once we have the text areas, we need to divide each one into single characters and use them to train the network. Because this step is generic, I wrote a small tool called PrepareBot for it. The specific code is here, you can go and have a look.
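To give an idea of what dividing a text area into single characters can look like, here is a minimal sketch based on column projection: every run of adjacent columns that contain ink is treated as one character. This is not the PrepareBot implementation, only an illustration; it assumes the text area has already been binarized into rows of 0 (background) and 1 (ink), and the function names are hypothetical.

```python
def split_columns(binary_rows):
    """Return (start, end) column ranges, one per run of columns containing ink."""
    if not binary_rows:
        return []
    width = len(binary_rows[0])
    # A column has ink if any row has a 1 in that column
    ink = [any(row[x] for row in binary_rows) for x in range(width)]
    spans, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                 # a character begins
        elif not has_ink and start is not None:
            spans.append((start, x))  # a character ends
            start = None
    if start is not None:
        spans.append((start, width))  # character touching the right edge
    return spans


def crop(binary_rows, span):
    """Cut one character out of the text area."""
    start, end = span
    return [row[start:end] for row in binary_rows]


line = [
    [0, 1, 1, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0, 1],
]
spans = split_columns(line)
print(spans)  # [(1, 3), (5, 6), (7, 9)]
```

This simple approach breaks down when characters touch each other or when noise bridges the gaps, which is one reason the preprocessing step comes before the split.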

0x02 Train the network

With the data and the network model in place, training the network is very simple. It looks like this:

model.fit_generator(generator=train_data_generator)
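The article does not show how train_data_generator is built. Keras expects such a generator to loop over the data indefinitely and to yield one batch of (inputs, targets) at a time, so a minimal sketch could look like the following. The names are hypothetical, and plain lists are used to keep the example self-contained; for Keras you would convert each batch to NumPy arrays of the shape the model expects.

```python
import random


def batch_generator(samples, batch_size, rng=random):
    """Yield (inputs, targets) batches forever, reshuffling at every pass.

    samples: list of (image, label_index) pairs.
    """
    samples = list(samples)
    while True:
        rng.shuffle(samples)
        for i in range(0, len(samples), batch_size):
            chunk = samples[i:i + batch_size]
            inputs = [image for image, _ in chunk]
            targets = [label for _, label in chunk]
            yield inputs, targets


# Five dummy samples with batch size 2: each pass yields batches of 2, 2 and 1
demo = batch_generator([(i, i % 11) for i in range(5)], 2)
sizes = [len(next(demo)[0]) for _ in range(4)]
print(sizes)  # [2, 2, 1, 2]
```

Because the generator never ends, Keras needs to be told how many batches make up one epoch, which is what the steps_per_epoch argument of fit_generator is for.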

At this point, we observe the convergence of the network and the recognition accuracy. If the results are acceptable, you can save the model for future recognition tasks. Note that the Keras model generated in this step is cross-platform, meaning that it can be used for recognition on Windows, Linux, and even Android.

0x03 Convert the network

In the previous steps we produced a Keras model. How do you use this model in Evil? First, we need to use coremltools, provided by Apple, to convert the Keras model into a Core ML model:

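import keras
import coremltools

# Prepare model for inference: drop the Dropout layers, then save once
for k in list(model.layers):  # iterate over a copy because the list is modified
    if isinstance(k, keras.layers.Dropout):
        model.layers.remove(k)
model.save("./temp.model")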

core_ml_model = coremltools.converters.keras.convert("./temp.model",
                                                     input_names='image',
                                                     image_input_names='image',
                                                     output_names='output',
                                                     class_labels=list(labels),
                                                     image_scale=1 / 255.)

core_ml_model.author = 'gix.evil'
core_ml_model.license = 'MIT license'
core_ml_model.short_description = 'model to classify chinese IDCard numbers'

core_ml_model.input_description['image'] = 'Grayscale image of card number'
core_ml_model.output_description['output'] = 'Predicted digit'

core_ml_model.save('demo.mlmodel')

Keep the demo.mlmodel file for later use.
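The class_labels argument in the conversion step fixes which output index corresponds to which character, so the converted classifier can report a label directly. If you run the original Keras model on another platform, you have to do that mapping yourself. A minimal sketch follows; the label set of ten digits plus X is an assumption based on the ID number use case, and decode is a hypothetical helper, not an Evil or coremltools API.

```python
LABELS = list("0123456789X")  # assumed label set, in the same order as class_labels


def decode(probabilities, labels=LABELS):
    """Return (label, confidence) for the most probable class of one softmax output."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], probabilities[best]


# Example softmax output where the last class is the most likely one
probs = [0.01] * 10 + [0.90]
print(decode(probs))  # ('X', 0.9)
```

The order of the labels must be the same at training time and at conversion time, otherwise every prediction is silently mapped to the wrong character.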

0x04 Importing a Network

Now that we have the model file, how do we import it into the Evil framework? There are two ways.

Just drag it into Xcode

A significant disadvantage of this method is that it increases the size of the app, so we do not recommend it for release builds. It is, however, the easiest and most straightforward approach while debugging.

Runtime download

This has no effect on the size of your app, but the model file has to be downloaded at run time, which increases code complexity. The good news is that Evil provides very friendly support for this. All you need to do is host the model file on your server or a CDN and configure the download path for Evil in the Info.plist file; Evil will then set up your network model automatically.

0x05 Using a Network

Everything is in place, so how do we use it? Simply call the interfaces provided by Evil, following steps 1-5 outlined above. For example:

// 1. Use Evil built-in model to identify ID number

lazy var evil = try? Evil(recognizer: .chineseIDCard)
let image: Recognizable = ...
let cardNumber = self.evil?.recognize(image)
print(cardNumber)

// 2. Use custom models

let url: URL = ...
let evil = try? Evil(contentsOf: url, name: "demo")
let ciimage = CIImage(cvPixelBuffer: pixelBuffer).oriented(orientation)
if let numbers = ciimage.preprocessor
    // Perspective correction
    .perspectiveCorrection(boundingBox: observation.boundingBox,
                           topLeft: observation.topLeft,
                           topRight: observation.topRight,
                           bottomLeft: observation.bottomLeft,
                           bottomRight: observation.bottomRight)
    .mapValue({ Value($0.image.oriented(orientation), $0.bounds) })
    // Make sure your ID card faces up
    .correctionByFace()
    // Crop the number area
    .cropChineseIDCardNumberArea()
    // Preprocessing: Gaussian blur etc.
    .process()
    // Split text
    .divideText()
    // Simple verification
    .value?.map({ $0.image }), numbers.count == 18 {
    if let result = try? self.evil?.prediction(numbers) {
        if let cardnumber = result?.flatMap({ $0 }).joined() {
            DispatchQueue.main.async {
                self.tipLabel.text = cardnumber
            }
        }
    }
}

Conclusion

That is the end of the advertisement. Criticism is welcome, as are stars, forks, and PRs, and you are welcome to contribute your own trained models. This is my first post on Juejin, so thanks for your support.