
In this article, we will perform (1) text detection and (2) text recognition using OpenCV, Python, and Tesseract. In my last article, I showed you how to perform text detection using OpenCV’s EAST deep learning model. Using that model, we were able to detect and localize the bounding box coordinates of text contained in an image. The next step is to take each region containing text and actually recognize, i.e. OCR, the text using OpenCV and Tesseract.

To perform OpenCV OCR text recognition, we first need to install Tesseract V4, which includes a highly accurate text recognition model based on deep learning.

The steps of this article:

  • Text detection is performed using OpenCV’s EAST text detector, a highly accurate deep learning text detector for detecting text in natural scene images.
  • Once we detect text areas using OpenCV, we will extract each text ROI and pass them to Tesseract, enabling us to build the full OpenCV OCR pipeline!
  • Finally, I’ll close today’s tutorial by showing you some sample results of applying text recognition using OpenCV, and discussing some of the limitations and drawbacks of this approach.

Let’s continue using OpenCV OCR!

How to install Tesseract 4

Install Tesseract 4 on Ubuntu

The exact commands used to install Tesseract 4 on Ubuntu will vary depending on whether you are using Ubuntu 18.04 or Ubuntu 17.04 and earlier.

To check your Ubuntu version, you can use the lsb_release command:

 lsb_release -a

I’m running Ubuntu 18.04, but you should check your own Ubuntu version before continuing.

For Ubuntu 18.04 users, Tesseract 4 is part of the main apt-get repository and it is very easy to install Tesseract with the following command:

sudo apt install tesseract-ocr

Install Tesseract 4 on macOS

If Homebrew, the “unofficial” package manager for macOS, is installed on your system, installing Tesseract on macOS is simple. Tesseract V4 will be installed on your Mac simply by running the following command:

brew install tesseract

If you already have Tesseract installed on your Mac (for example, if you followed my previous Tesseract installation tutorial), you should first unlink the original installation:

 brew unlink tesseract

Install Tesseract 4 on Windows 10

For Windows 10, follow the guide: Introduction to OCR, Tesseract installation and use (AI Hao, CSDN blog). You can then run the installer it describes.

Verify your version of Tesseract


Once you have Tesseract installed on your machine, you should verify your Tesseract version by executing the following command:

tesseract -v

As soon as you see tesseract 4 somewhere in the output, you know you have the latest version of Tesseract installed on your system.

We will then use pip to install Pillow, a more Python-friendly version of PIL, followed by pytesseract and imutils:

$ pip install pillow
$ pip install pytesseract
$ pip install imutils
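Before moving on, it’s worth confirming that pytesseract can actually find the tesseract binary. A minimal, optional sanity check (get_tesseract_version raises TesseractNotFoundError if the binary isn’t on your PATH):

# optional: verify that pytesseract can locate the tesseract binary
import pytesseract

# on Windows you may need to point pytesseract at the executable first, e.g.
# pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
print(pytesseract.get_tesseract_version())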

Understand OpenCV OCR and Tesseract text recognition


Now that we have successfully installed OpenCV and Tesseract on our system, we need to briefly review our pipeline and the associated commands.

First, we will use OpenCV’s EAST text detector to detect the presence of text in an image.

The EAST text detector will give us the bounding box (x, y) coordinates of the text ROI.

We will extract each of these ROIs and then pass them to Tesseract V4’s LSTM deep learning text recognition algorithm. The output of the LSTM will give us the actual OCR results.

Finally, we will draw the OpenCV OCR results on the output image. But before we really start our project, let’s briefly review the tesseract command (which is called behind the scenes by the pytesseract library).

When calling the tesseract binary, we need to supply a number of flags. The three most important are -l, --oem, and --psm.

-l controls the language of the input text. We will use eng (English) in this example, but you can see all the languages Tesseract supports here.
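To see which language packs are actually installed on your machine, you can run:

$ tesseract --list-langs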

The --oem argument, or OCR Engine Mode, controls the type of algorithm used by Tesseract. You can view the available OCR engine modes by running the following command:

$ tesseract --help-oem
OCR Engine modes:
  0    Legacy engine only.
  1    Neural nets LSTM engine only.
  2    Legacy + LSTM engines.
  3    Default, based on what is available.

We will use --oem 1, indicating that we wish to use only the deep learning LSTM engine.

The last important flag, --psm, controls the automatic page segmentation mode used by Tesseract:

$ tesseract --help-psm
Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.

For OCR’ing text ROIs, I’ve found that modes 6 and 7 work well, but if you’re OCR’ing large blocks of text, you may want to try 3, the default mode. Whenever you find yourself getting incorrect OCR results, I strongly recommend adjusting --psm, as it can have a huge impact on your output OCR results.
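If you want to experiment, a small sketch like the one below makes it easy to compare several --psm modes side by side. Note that "roi.png" is just a placeholder; substitute one of your own cropped text regions.

# compare a few --psm modes on the same ROI; "roi.png" is a placeholder
import cv2
import pytesseract

roi = cv2.imread("roi.png")
for psm in (3, 6, 7):
    config = "-l eng --oem 1 --psm {}".format(psm)
    print("--psm {}: {!r}".format(psm, pytesseract.image_to_string(roi, config=config)))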

The project structure

├── images
│   ├── example_01.jpg
│   ├── example_02.jpg
│   ├── ...
│   └── example_05.jpg
├── frozen_east_text_detection.pb
└── text_recognition.py

Our project contains a directory and two notable files:

  • images/ : A directory containing six test images that contain scene text. We will try OpenCV OCR on each of these images.
  • frozen_east_text_detection.pb : The EAST text detector. This CNN is pre-trained for text detection and ready to use as-is.
  • text_recognition.py : Our OCR script, which we’ll review line by line. It uses the EAST text detector to find regions of text in the image and then uses Tesseract v4 to recognize them.

Implement OpenCV OCR algorithm

We are now ready to perform text recognition with OpenCV! Open the text_recognition.py file and insert the following code:

# import the necessary packages
from imutils.object_detection import non_max_suppression
import numpy as np
import pytesseract
import argparse
import cv2

Today’s OCR script requires five imports, one of which, argparse, is part of the Python standard library. Most notably, we’ll be using pytesseract and OpenCV.

My imutils package will be used for non-maximum suppression because OpenCV’s NMSBoxes function doesn’t seem to work with the Python API.

I’ll also note that NumPy is a dependency of OpenCV. The argparse package is included with Python and handles command-line arguments, so you don’t need to install anything. Now that our imports are taken care of, let’s implement the decode_predictions function:

def decode_predictions(scores, geometry):
    # grab the number of rows and columns from the scores volume, then
    # initialize our set of bounding box rectangles and corresponding
    # confidence scores
    (numRows, numCols) = scores.shape[2:4]
    rects = []
    confidences = []
    # loop over the number of rows
    for y in range(0, numRows):
        # extract the scores (probabilities), followed by the
        # geometrical data used to derive potential bounding box
        # coordinates that surround text
        scoresData = scores[0, 0, y]
        xData0 = geometry[0, 0, y]
        xData1 = geometry[0, 1, y]
        xData2 = geometry[0, 2, y]
        xData3 = geometry[0, 3, y]
        anglesData = geometry[0, 4, y]
        # loop over the number of columns
        for x in range(0, numCols):
            # if our score does not have sufficient probability,
            # ignore it
            if scoresData[x] < args["min_confidence"]:
                continue
            # compute the offset factor as our resulting feature
            # maps will be 4x smaller than the input image
            (offsetX, offsetY) = (x * 4.0, y * 4.0)
            # extract the rotation angle for the prediction and
            # then compute the sin and cosine
            angle = anglesData[x]
            cos = np.cos(angle)
            sin = np.sin(angle)
            # use the geometry volume to derive the width and height
            # of the bounding box
            h = xData0[x] + xData2[x]
            w = xData1[x] + xData3[x]
            # compute both the starting and ending (x, y)-coordinates
            # for the text prediction bounding box
            endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
            endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
            startX = int(endX - w)
            startY = int(endY - h)
            # add the bounding box coordinates and probability score
            # to our respective lists
            rects.append((startX, startY, endX, endY))
            confidences.append(scoresData[x])
    # return a tuple of the bounding boxes and associated confidences
    return (rects, confidences)

The decode_predictions function is explained in detail in my EAST text detection post. In short, it loops over the score map, filters out weak detections, and uses the geometry volume, which encodes four edge distances plus a rotation angle for each location, to derive the bounding box coordinates of the surrounding text.

Then, parse our command-line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", type=str,
    help="path to input image")
ap.add_argument("-east", "--east", type=str,
    help="path to input EAST text detector")
ap.add_argument("-c", "--min-confidence", type=float, default=0.5,
    help="minimum probability required to inspect a region")
ap.add_argument("-w", "--width", type=int, default=320,
    help="nearest multiple of 32 for resized width")
ap.add_argument("-e", "--height", type=int, default=320,
    help="nearest multiple of 32 for resized height")
ap.add_argument("-p", "--padding", type=float, default=0.0,
    help="amount of padding to add to each border of ROI")
args = vars(ap.parse_args())

Our script requires two command-line arguments:

--image: The path to the input image.

--east: The path to the pre-trained EAST text detector.

Alternatively, you can provide the following command-line arguments:

  • --min-confidence: The minimum probability required for a region to be considered text.
  • --width: The width our image will be resized to before passing it through the EAST text detector. Our detector requires multiples of 32.
  • --height: Same as the width, but for the height. Again, our detector requires multiples of 32 for the resized height.
  • --padding: The (optional) amount of padding added to each ROI border. If you find that the OCR results are incorrect, you can try 0.05 for 5%, 0.10 for 10%, and so on (see the example command just below).
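For example, to add 5% of padding to each ROI border, an invocation might look like this (using one of the test images from the project structure above):

$ python text_recognition.py --east frozen_east_text_detection.pb \
    --image images/example_02.jpg --padding 0.05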

From there, we’ll load + preprocess our image and initialize key variables:

# load the input image and grab the image dimensions
image = cv2.imread(args["image"])
orig = image.copy()
(origH, origW) = image.shape[:2]
# set the new width and height and then determine the ratio in change
# for both the width and height
(newW, newH) = (args["width"], args["height"])
rW = origW / float(newW)
rH = origH / float(newH)
# resize the image and grab the new image dimensions
image = cv2.resize(image, (newW, newH))
(H, W) = image.shape[:2]

The image is loaded into memory and copied (so that we can draw our output on it later).

We grab the original width and height, and then extract the new width and height from the args dictionary. Using the original and new dimensions, we compute the ratios used to scale our bounding box coordinates later in the script. Our image is then resized, ignoring the aspect ratio.
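As a concrete example, with hypothetical dimensions:

# hypothetical example: a 1280x720 input resized to the default 320x320
origW, origH = 1280, 720
newW, newH = 320, 320
rW = origW / float(newW)  # 4.0: multiply box x-coordinates by this
rH = origH / float(newH)  # 2.25: multiply box y-coordinates by this

Next, let’s use the EAST text detector: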

# define the two output layer names for the EAST detector model that
# we are interested in -- the first is the output probabilities and the
# second can be used to derive the bounding box coordinates of text
layerNames = [
    "feature_fusion/Conv_7/Sigmoid",
    "feature_fusion/concat_3"]
# load the pre-trained EAST text detector
print("[INFO] loading EAST text detector...")
net = cv2.dnn.readNet(args["east"])

Our two output layer names are listed here. To understand why these two output names are important, refer to my original EAST text detection tutorial.

Then, our pre-trained EAST neural network is loaded into memory. I can’t stress this enough: you need at least OpenCV 3.4.2 to have the cv2.dnn.readNet implementation.
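If you’re not sure which version you have, a quick check:

# cv2.dnn.readNet requires OpenCV 3.4.2 or newer
import cv2
print(cv2.__version__)

With the EAST model loaded, we construct a blob from the image and perform a forward pass: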

# construct a blob from the image and then perform a forward pass of
# the model to obtain the two output layer sets
blob = cv2.dnn.blobFromImage(image, 1.0, (W, H),
 (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
(scores, geometry) = net.forward(layerNames)
# decode the predictions, then  apply non-maxima suppression to
# suppress weak, overlapping bounding boxes
(rects, confidences) = decode_predictions(scores, geometry)
boxes = non_max_suppression(np.array(rects), probs=confidences)

To determine the text locations, we:

  • Construct a blob from the image.
  • Pass the blob through the neural network to obtain the scores and geometry maps (see the quick shape check below).
  • Decode the predictions using the decode_predictions function defined earlier, then apply non-maximum suppression via my imutils method. NMS effectively keeps the most likely text regions, eliminating other overlapping regions.
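If you’d like to sanity-check the forward pass, the output shapes are easy to inspect. With the default 320x320 input, the EAST feature maps are 4x smaller than the image, so you should see something like:

# optional sanity check on the EAST outputs (default 320x320 input)
print(scores.shape)    # (1, 1, 80, 80) -- text/no-text probabilities
print(geometry.shape)  # (1, 5, 80, 80) -- 4 edge distances + an angle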

Now that we know where the text regions are, we need to actually recognize the text! We begin looping over the bounding boxes and processing the results, in preparation for the actual text recognition:

# initialize the list of results
results = []
# loop over the bounding boxes
for (startX, startY, endX, endY) in boxes:
    # scale the bounding box coordinates based on the respective
    # ratios
    startX = int(startX * rW)
    startY = int(startY * rH)
    endX = int(endX * rW)
    endY = int(endY * rH)
    # in order to obtain a better OCR of the text we can potentially
    # apply a bit of padding surrounding the bounding box -- here we
    # are computing the deltas in both the x and y directions
    dX = int((endX - startX) * args["padding"])
    dY = int((endY - startY) * args["padding"])
    # apply padding to each side of the bounding box, respectively
    startX = max(0, startX - dX)
    startY = max(0, startY - dY)
    endX = min(origW, endX + (dX * 2))
    endY = min(origH, endY + (dY * 2))
    # extract the actual padded ROI
    roi = orig[startY:endY, startX:endX]

We initialize the results list to hold our OCR bounding boxes and text.

Then we begin looping over the boxes, where we:

  • Scale the bounding boxes based on the previously computed ratios.
  • Pad the bounding boxes.
  • Finally, extract the padded ROI.

Our OpenCV OCR pipeline can be completed with a bit of Tesseract v4 “magic”:

    # in order to apply Tesseract v4 to OCR text we must supply
    # (1) a language, (2) an OEM flag of 1, indicating that we
    # wish to use the LSTM neural net model for OCR, and finally
    # (3) a PSM value, in this case, 7, which implies that we are
    # treating the ROI as a single line of text
    config = ("-l eng --oem 1 --psm 7")
    text = pytesseract.image_to_string(roi, config=config)
    # add the bounding box coordinates and OCR'd text to the list
    # of results
    results.append(((startX, startY, endX, endY), text))

Note the comments in the code block, where we set the Tesseract configuration parameters.

Note: If you find yourself getting incorrect OCR results, you may need to try a different --psm value, as described at the top of this tutorial.

The pytesseract library handles the rest: we call pytesseract.image_to_string, passing in our ROI and the config string. In just two lines of code, Tesseract v4 recognizes the text ROI in the image.

Remember, there’s a lot going on behind the scenes. Our results (bounding box values and actual text strings) are appended to the results list. Then we continue the process for the other ROIs at the top of the loop. Now let’s display/print the result to see if it really works:

# sort the results bounding box coordinates from top to bottom
results = sorted(results, key=lambda r: r[0][1])
# loop over the results
for ((startX, startY, endX, endY), text) in results:
    # display the text OCR'd by Tesseract
    print("OCR TEXT")
    print("========")
    print("{}\n".format(text))
    # strip out non-ASCII text so we can draw the text on the image
    # using OpenCV, then draw the text and a bounding box surrounding
    # the text region of the input image
    text = "".join([c if ord(c) < 128 else "" for c in text]).strip()
    output = orig.copy()
    cv2.rectangle(output, (startX, startY), (endX, endY),
        (0, 0, 255), 2)
    cv2.putText(output, text, (startX, startY - 20),
        cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)
    # show the output image
    cv2.imshow("Text Detection", output)
    cv2.waitKey(0)

Our results are sorted from top to bottom by the y-coordinate of the bounding box (although you may wish to sort them differently). Then, looping over the results, we:

  • Print the OCR text to the terminal.
  • Strip non-ASCII characters from the text, because OpenCV does not support non-ASCII characters in the cv2.putText function.
  • Draw (1) a bounding box around the ROI and (2) the resulting text above the ROI.
  • Displays output and waits for any key to be pressed.

OpenCV text recognition results

Now that we have implemented our OpenCV OCR pipeline, let’s see it in action.

Open a command line, navigate to the location where you downloaded and unzipped the zip, and execute the following command:

$ python text_recognition.py --east frozen_east_text_detection.pb \
    --image images/example_01.jpg
[INFO] loading EAST text detector...
OCR TEXT
========
OH OK


Let’s start with a simple example. Notice how our OpenCV OCR system is able to correctly (1) detect the text in the image, and then (2) also recognize the text. The next example is more representative of the text we see in real-world images:

$ python text_recognition.py --east frozen_east_text_detection.pb \
    --image images/example_02.jpg
[INFO] loading EAST text detector...
OCR TEXT
========
® MIDDLEBOROUGH


Conclusion

In today’s tutorial, you learned how to apply OpenCV OCR to do two things:

  • Text detection
  • Character recognition

To accomplish this task, we:

  • We used OpenCV’s EAST text detector to apply deep learning to localize text regions in an image.
  • We extracted each text ROI and then applied text recognition using OpenCV and Tesseract v4. We also looked at Python code that performs text detection and text recognition in a single script.

Our OpenCV OCR pipeline worked well in some cases, but failed in others. To get the best OpenCV text recognition results, I recommend that you ensure that:

  • Clean up and preprocess your input ROI as much as possible. In an ideal world, your text would be perfectly separated from the rest of the image, but in practice, that’s not always possible.
  • Your text is captured at a 90-degree angle to the camera, similar to a top-down, bird’s-eye view. If this is not the case, a perspective transform may help you get better results (a minimal sketch follows this list).
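If you do need to correct for perspective, a minimal four-point transform sketch is shown below. It assumes you have already located the four corners of the text region as a float32 NumPy array pts, ordered top-left, top-right, bottom-right, bottom-left (finding and ordering those corners is up to you).

import numpy as np
import cv2

def four_point_transform(image, pts):
    # unpack the (already ordered) corner points
    (tl, tr, br, bl) = pts
    # the output size is taken from the longer opposing edges
    maxW = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    maxH = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
    # map the corners onto a top-down rectangle and warp
    dst = np.array([
        [0, 0],
        [maxW - 1, 0],
        [maxW - 1, maxH - 1],
        [0, maxH - 1]], dtype="float32")
    M = cv2.getPerspectiveTransform(pts, dst)
    return cv2.warpPerspective(image, M, (maxW, maxH))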

I hope you enjoyed today’s post on OpenCV OCR and text recognition!

Complete code: download.csdn.net/download/hh…