Processing images is not an easy task. It’s easy for you, as a human being, to look at something and immediately know what you’re looking at. But that’s not how computers work.

Tasks that are hard for you, such as complex arithmetic or mathematics in general, are effortless for a computer. Here the reverse is true: tasks that are trivial for you, like recognizing a cat or a dog in an image, are really hard for a computer. In a way, we were made for each other. At least for now.

While image classification and other computer vision tasks may require a lot of code and a solid understanding of the field, reading text from clean, well-formed images is simple in Python and can be applied to many real-world problems.

In today’s post, I want to prove it. You’ll need to install a couple of libraries, but it won’t take long. Here are the libraries you need:

  • OpenCV
  • PyTesseract

OpenCV

For now, this library will only be used to load images, and you don’t really need to know much about it beforehand (although it might help, you’ll see why).

According to the official website [1]:

OpenCV is an open source computer vision and machine learning software library. OpenCV aims to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. OpenCV is a BSD-licensed product, which makes it easy for businesses to use and modify the code.

In short, you can do just about any kind of image transformation with OpenCV, and the library is fairly easy to use.

If you haven’t already installed it, it takes just one line in the terminal:

pip install opencv-python

That’s about it. Until now, things have been simple, but that is about to change.

PyTesseract

What the hell is this library? It’s a Python wrapper around Tesseract, and according to Wikipedia [2]:

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License version 2.0, and has been sponsored by Google since 2006.

I’m sure there are more sophisticated libraries available now, but I found this one to work well. In my experience, it can read text from most images, provided the font isn’t so ornate that even you would struggle to read it.

If it can’t read the text in your image, spend more time in OpenCV and apply various filters to make the text stand out.

Installation is a little trickier this time. If you’re using Linux, it all boils down to a few sudo apt-get commands:

sudo apt-get update
sudo apt-get install tesseract-ocr
sudo apt-get install libtesseract-dev
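
If you want to confirm that the install worked, here is a minimal sketch that looks for the tesseract binary on your PATH and prints its version line. The helper name tesseract_version is hypothetical (it is not part of Tesseract or PyTesseract), and the code only assumes that the executable is called tesseract.

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line of `tesseract --version`, or None if it is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    # Older builds may print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


print(tesseract_version() or "tesseract was not found on PATH")
```

If the message says the binary was not found, re-check the install commands above before moving on.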

I use Windows, so the process is a bit tedious.

First, open https://github.com/UB-Mannheim/tesseract/wiki and download the 32-bit or 64-bit installer.

The installation itself is a simple matter of clicking Next a few times. And yes, you also need to install the Python wrapper with pip:

pip install pytesseract

Next, you need to tell Python where Tesseract is installed. I didn’t need to do this on Linux machines, but it is required on Windows. By default, it installs to Program Files.

If you do everything right, executing this code should not generate any errors:
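
A minimal sketch of that setup step. The path below assumes the installer’s default location (C:\Program Files\Tesseract-OCR\tesseract.exe), so adjust it if you chose another folder. The helper name configure_tesseract is hypothetical; pytesseract.pytesseract.tesseract_cmd is the attribute the wrapper reads to find the executable.

```python
from pathlib import Path

# Assumption: the default folder used by the Windows installer.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def configure_tesseract(exe_path=DEFAULT_WINDOWS_PATH):
    """Point pytesseract at the Tesseract executable and return the path used."""
    if not Path(exe_path).is_file():
        raise FileNotFoundError(f"Tesseract executable not found: {exe_path}")
    # Imported here so the path check above still works before the
    # Python packages are installed.
    import cv2  # noqa: F401  (imported only to confirm OpenCV is installed)
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = str(exe_path)
    return str(exe_path)
```

On Windows, call configure_tesseract() once before any OCR call. If it returns without raising, both libraries are importable and the executable was found.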

Get the text

Let’s start with an easy one. I found some royalty-free images with some text in them, and the first one was this:

It should be an easy one, although Tesseract might read those blue “objects” as parentheses. Let’s see what happens:
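
A minimal sketch of the call, assuming the image is saved next to your script as 01.png (a hypothetical filename). The helper name read_text is also made up for illustration; cv2.imread, cv2.cvtColor and pytesseract.image_to_string are the real library calls doing the work.

```python
from pathlib import Path


def read_text(image_path, lang="eng"):
    """Load an image with OpenCV and return the text Tesseract finds in it."""
    if not Path(image_path).is_file():
        raise FileNotFoundError(f"No such image: {image_path}")
    # Imported here so the file check above runs even if the
    # OCR packages are missing.
    import cv2
    import pytesseract

    img = cv2.imread(str(image_path))
    if img is None:
        raise ValueError(f"OpenCV could not decode: {image_path}")
    # OpenCV loads pixels as BGR, while pytesseract expects RGB.
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return pytesseract.image_to_string(rgb, lang=lang)
```

Calling print(read_text("01.png")) should then print whatever text Tesseract recognizes in the image.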

My guess was right. That’s not a big deal, though: you can easily clean up such artifacts with a few Python tricks.
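
One such trick, sketched under the assumption that the real text contains no brackets of its own: strip bracket-like characters from the OCR output and tidy up the whitespace. The function name clean_ocr_text and the default junk set are hypothetical choices, not anything PyTesseract provides.

```python
import re


def clean_ocr_text(text, junk="()[]{}|"):
    """Remove stray bracket-like characters and collapse whitespace, line by line."""
    table = str.maketrans("", "", junk)
    cleaned = (
        re.sub(r"\s+", " ", line.translate(table)).strip()
        for line in text.splitlines()
    )
    # Drop lines that end up empty after cleaning.
    return "\n".join(line for line in cleaned if line)


print(clean_ocr_text("( Hello World )\n\n  )  Second   line ("))
```

Keep in mind that this removes legitimate brackets too, so narrow the junk set if your images contain them.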

The next one could be trickier:

I hope it doesn’t detect the “B” on the coin:

It seems to be working pretty well.

Now it’s your turn to apply it to your own problems. OpenCV skills can be crucial here if the text blends into the background.
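
As a starting point, here is a minimal preprocessing sketch, assuming a BGR image as returned by cv2.imread. It converts the image to grayscale, applies a light median blur, and binarizes it with Otsu’s threshold. The choice of filters is illustrative rather than a recipe, and the helper name preprocess_for_ocr is hypothetical.

```python
def preprocess_for_ocr(img):
    """Return a black-and-white copy of a BGR image to help text stand out."""
    # Imported here so the function can be defined before OpenCV is installed.
    import cv2

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # A small median blur removes speckle noise without smearing letters much.
    gray = cv2.medianBlur(gray, 3)
    # Otsu's method picks the threshold automatically from the histogram.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```

The result can be passed straight to pytesseract.image_to_string. Whether it actually helps depends on the image, so compare the output with and without preprocessing.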

Before you leave

Reading text from an image is a fairly difficult task for a computer. Think about it: a computer doesn’t know what a letter is; it only works with numbers. What happens under the hood may seem like a black box at first, but I encourage you to investigate further if this is an area of interest to you.

I’m not saying PyTesseract works perfectly every time, but I’ve found it good enough even on some of the more complex images. When it struggles, a bit of image manipulation to make the text stand out against the background usually helps.

References

  1. https://opencv.org/about/
  2. https://en.wikipedia.org/wiki/Tesseract_(software)
