This time, I’d like to talk about text recognition. You are probably familiar with the many scanning apps out there: hold your phone’s camera up to any text, and the app converts what the camera sees into an editable string on the phone.

Character recognition, commonly abbreviated as OCR, stands for Optical Character Recognition. Its full definition can be found on Wikipedia: en.wikipedia.org/wiki/Optica… .

A complete OCR algorithm is far from simple. It involves image recognition algorithms, converting the characters identified in an image into a binary form our programs can use, recognizing characters from different languages, improving accuracy, and so on.

Imagine if you, as a developer, had to build such an algorithm from scratch. The good news is that we live in the open source era, and plenty of people have already done it for us. Tesseract is an open source library that recognizes all the text in a given image and exposes a very simple API.

Tesseract

Tesseract is an open source OCR library backed by Google that supports multiple languages and runtime environments, including the iOS environment we will cover here. Today we will use it to build our own OCR application.

First, you need to install Tesseract. The easiest way is with CocoaPods: go to your project’s root directory and type:

pod init

Then, edit the generated Podfile configuration file:


target 'ocrSamples' do
  use_frameworks!
  pod 'TesseractOCRiOS'
end


With TesseractOCRiOS added to the dependency list, type:

pod install

The TesseractOCRiOS installation is now complete. If you haven’t worked with CocoaPods before, check out our previous article: swiftcafe.io/2015/02/10/… .

Configuration

In Build Phases -> Link Binary With Libraries, add the dependencies the project requires: CoreImage, libstdc++, and TesseractOCRiOS itself:

After the dependency libraries are configured, we also need to add the text recognition training data. What is training data? When TesseractOCRiOS recognizes an image, it follows the rules contained in this training data; Chinese and English, for example, each have their own. You can think of it as a model trained for us in advance by deep learning, which the core recognition algorithm then uses.

Tesseract has a dedicated page listing all available training data: github.com/tesseract-o… .

For example, if we need to recognize simplified Chinese, we can download the chi_sim training data:

Then drag and drop the training data into the project:

In the figure above, chi_sim.traineddata is the training data we downloaded. Note that this file must live in the tessdata folder, and that folder must be dragged in as “Create Folder Reference”:

How does this differ from the other option, “Create Groups”? The main difference is that a folder added as a reference keeps its directory structure in the final app package: our training data ends up at tessdata/chi_sim.traineddata inside the main bundle. With “Create Groups”, the folder name is ignored and the file is stored at the root of the main bundle as /chi_sim.traineddata.

TesseractOCRiOS, by default, searches for training data at the path tessdata/chi_sim.traineddata, so if you add the folder with “Create Groups”, the library will not find the training data at runtime and will report an error. This detail requires special attention.
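
If you want to double-check the bundle layout before running the recognizer, a small sanity check like the one below can help. This snippet is my own sketch, not part of the sample project:

// Verify that the trained data made it into the app bundle at the path
// TesseractOCRiOS expects. If this prints "not found", the folder was
// probably added with "Create Groups" instead of "Create Folder Reference".
let trainedDataPath = Bundle.main.path(forResource: "chi_sim",
                                       ofType: "traineddata",
                                       inDirectory: "tessdata")
print("trained data path:", trainedDataPath ?? "not found")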

Finally, for TesseractOCRiOS to build correctly, we also need to turn off Bitcode, otherwise a compilation error will be reported. It must be turned off in two places: once for the main project and once for the Pods project, as shown below:
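
If you prefer not to flip the setting by hand every time pods are reinstalled, one common approach is a post_install hook in the Podfile that switches Bitcode off for every pod target. This is an optional convenience of my own, not something the sample project requires:

post_install do |installer|
  installer.pods_project.targets.each do |target|
    target.build_configurations.each do |config|
      # Disable Bitcode for all pod targets, including TesseractOCRiOS.
      config.build_settings['ENABLE_BITCODE'] = 'NO'
    end
  end
end

Note that this only covers the Pods side; the main project’s Build Settings still need to be changed in Xcode as described above.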

Start coding

With that done, we’re ready to start coding. I’ll only show the simplest possible code here. First, we display two controls on the main interface: an image view showing a pre-stored photo that contains text, and a text view for displaying the recognition result:

override func viewDidLoad() {
    super.viewDidLoad()
    
    self.imageView = UIImageView(frame: CGRect(x: 0, y: 80, width: self.view.frame.size.width, height: self.view.frame.size.width * 0.7))
    self.imageView?.image = UIImage(named: "textimg")
    self.view.addSubview(self.imageView!)
    
    self.textView = UITextView(frame: CGRect(x: 0, y: labelResult.frame.origin.y + labelResult.frame.size.height + 20, width: self.view.frame.size.width, height: 200))
    self.view.addSubview(self.textView!)
}

Only the key initialization code for the two controls is shown here; the other, unimportant code is omitted. We can then call TesseractOCRiOS to perform the text recognition:

func recognizeText(image: UIImage) {
    if let tesseract = G8Tesseract(language: "chi_sim") {
        tesseract.engineMode = .tesseractOnly
        tesseract.pageSegmentationMode = .auto
        tesseract.image = image
        tesseract.recognize()
        self.textView?.text = tesseract.recognizedText
    }
}

That is all the recognition code. First we initialize G8Tesseract, passing in the name of the training data; here “chi_sim” stands for simplified Chinese. engineMode offers three options: tesseractOnly, tesseractCubeCombined, and cubeOnly. We use the first, which relies only on the training data. cubeOnly means using the more accurate cube method, and tesseractCubeCombined combines the two modes.

The cube mode requires additional model data, which looks roughly like this:

The figure above shows the cube recognition model for English. I haven’t found a simplified Chinese cube model yet, so this example simply uses tesseractOnly mode. Also, don’t use tesseractCubeCombined or cubeOnly when the cube model data is absent; otherwise an error will be reported because the model data does not exist. If you can find a Chinese cube model, feel free to share it in the comments; it would make the recognition more accurate.
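
If you do obtain cube data later, one defensive pattern is to choose the engine mode based on whether the cube files are actually bundled. This is purely my own sketch; it assumes the cube files follow Tesseract’s “<language>.cube.*” naming (eng.cube.lm, for instance), which may vary:

// A hypothetical helper: fall back to .tesseractOnly when no cube model is
// present, since the cube modes report an error if their data files are missing.
func engineMode(for language: String) -> G8OCREngineMode {
    let hasCubeData = Bundle.main.path(forResource: "\(language).cube",
                                       ofType: "lm",
                                       inDirectory: "tessdata") != nil
    return hasCubeData ? .tesseractCubeCombined : .tesseractOnly
}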

The other calls need no further explanation: we assign the image to be recognized to the tesseract object, trigger recognition by calling its recognize() method, and finally set tesseract.recognizedText on the text view.
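
For completeness, here is one way the recognizer might be wired up; this is a sketch of my own, assuming the view controller from above:

override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)
    
    // Kick off recognition once the photo is on screen. Note that
    // recognize() runs synchronously, so a real app would likely move
    // this work off the main thread to keep the UI responsive.
    if let image = self.imageView?.image {
        self.recognizeText(image: image)
    }
}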

The final running effect is as follows:

The picture above is a photo I took of an article on the SwiftCafe website. Judging from the result, the recognition is reasonably accurate.

Conclusion

As you can see from the code above, while Tesseract implements an inherently complex text recognition algorithm, it offers developers a fairly simple interface. OCR as a whole can be regarded as an applied branch of AI, and in this era of popular AI, mastering even a few of its application technologies can greatly broaden a developer’s horizons. Get creative, and components like Tesseract may help you create cutting-edge apps for the AI age.

Of course, Tesseract still has its limitations; for example, it can only read printed characters, not handwritten ones. But so what? Being able to apply a rather complicated OCR algorithm at very little cost is a happy thing for developers.

As usual, the sample project code for this article is available on GitHub if you need it: github.com/swiftcafex/… .

**If you find this article helpful, you can also follow the WeChat official account Swift-cafe, where more of my original content will be shared with you~**