One, the origin

The beautiful secretary of the company knew I was an image-recognition programmer, and she came to me specifically. She said, "I've dealt with people from all sorts of industries, but never with you men in IT. May I ask, do you do IT directly? I'm a little overwhelmed…" She went on: "Is there any preparation before you do it?" I said yes. She was delighted: "That's great. There must be some preprocessing of an image before it's recognized. I happen to have some scanned documents on hand, but they aren't standard. Could you help me fix them up?"

The secretary gave me this picture, saying that the blank area of the confidential document was too large. She wanted only the text area, and asked me to mark it out with a program.

Two, boundingRect bounding rectangle

This requirement is too simple.

The first thing that came to mind was the boundingRect method in cv2, which professionally boxes out a rectangular area.

Through grayscale conversion, inversion, and binarization, and finally boundingRect recognition, I quickly got it working. The effect is as follows:

The code is as follows:

import cv2

def boundingRect(image_path):

    # Read the image: 3 channels, e.g. [[[255 255 255] [255 255 255]] [[255 255 255] [255 255 255]]]
    image = cv2.imread(image_path)
    # Grayscale: single channel, e.g. [[255 255] [255 255]]
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Black/white inversion
    gray = cv2.bitwise_not(gray)
    # Binarization (Otsu picks the threshold automatically)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    # x position, y position, width, height
    x, y, w, h = cv2.boundingRect(thresh)
    left, top, right, bottom = x, y, x + w, y + h

    # Draw the box on the picture
    cv2.rectangle(image, (left, top), (right, bottom), (0, 0, 255), 2)

    # Save the processed file to the current directory
    # cv2.imwrite('img2_1_rotate.jpg', image)
    # Pop up the processed file
    # cv2.imshow("output", image)
    # cv2.waitKey(0)

boundingRect('img1_0_origin.jpg')

I wanted to work up the courage to go to the secretary and explain my idea. Afraid of stumbling over my words when we met, I drafted what I'd say in advance.

cv2.imread reads the image in, and cv2.imshow("output", image) pops it up. The image data looks like this: [[[255,255,255],[255,255,255]],[[255,255,255],[255,255,255]]]; each pixel carries three values (OpenCV stores them in BGR order, not RGB).

This data is still a bit complicated for finding text boundaries, because the computer doesn't really care whether a pixel is red or green, only whether it is light or dark. So we call cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) to convert the image to grayscale, after which the data becomes simple: a single-channel set of pixels like [[255 255],[255 255]].

At this point, the white areas have values close to 255 and the black areas close to 0. What we care about is the text in the black areas, whose value is 0. That won't do: computers generally ignore 0 and pay attention to 255 (which is why many training sets for text recognition are white text on a black background). So we do a black/white inversion to shift our focus onto 255.

The black/white inversion algorithm is very simple: new = 255 - old. 255 - 255 = 0, and 255 - 0 = 255, so black becomes white and white becomes black. However, a grayscale image holds values anywhere between 0 and 255: there will be pixels at 127 or 128 that are neither black nor white, plus burrs and shadows at 5, 6, 7. Call them text and you can't see them clearly; call them blank and something is still faintly there. Life calls for decisive decluttering, and so must the program: every pixel is either text or blank. Being this decisive also reduces the amount of computation.

Now, the image only exists at 0 or 255. We’ll give it to boundingRect and it’ll return whatever value we need.
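The inversion and the "either text or blank" cut above can be sketched in a few lines of numpy (made-up 2×3 values; a hard cut at 127 stands in for the Otsu threshold the real code uses):

```python
import numpy as np

# Made-up grayscale patch: dark ink (small values) on a light page (large values)
gray = np.array([[250,  12, 247],
                 [  5, 249, 130]], dtype=np.uint8)

# Black/white inversion: new = 255 - old, so the ink becomes bright
inverted = 255 - gray

# Binarization: every pixel becomes either 0 or 255 (fixed threshold 127
# as a simplified stand-in for the Otsu threshold in the article's code)
binary = np.where(inverted > 127, 255, 0).astype(np.uint8)
print(binary.tolist())  # [[0, 255, 0], [255, 0, 0]]
```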

I handed the program over to the secretary, but didn't manage to say any of what I'd rehearsed.

She was busy and thanked me. She said she would try it later.

I waited excitedly for her reply, not writing a single line of code all afternoon, and went over and over the code I had given her, looking for any omissions.

Three, minAreaRect minimum area rectangle

The secretary asked me to go. I went to the bathroom first.

I imagined how I would respond to her thanks: with what kind of smile, with what words. No, I shouldn't be too smug about it either; it was a small thing, done in a minute, and making a big deal of it would seem unfriendly…

I did see her, and she said there was something wrong with the program: the little box it drew wasn't what she wanted. I took a look.

It turned out that the confidential document was scanned at a slant, so the box came out slanted. This was an edge case.

Because she was not a product manager, I held back my anger. I said I’d go back and check.

I found another method, minAreaRect, which boxes a region with the minimum-area rectangle. Even if the picture is tilted, the frame tilts with it to achieve the minimum area. Get the tilt angle, rotate the image back, and everything is normal. Success.

I began to practice in front of the mirror again. I had to summon the courage to tell her the idea behind the fix, and also explain why the last attempt failed.

The code is as follows:

import cv2
import numpy as np

def minAreaRect(image_path):

    # Read the image: 3 channels, e.g. [[[255 255 255] [255 255 255]] [[255 255 255] [255 255 255]]]
    image = cv2.imread(image_path)
    # Grayscale: single channel, e.g. [[255 255] [255 255]]
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Black/white inversion
    gray = cv2.bitwise_not(gray)
    # Binarization (Otsu picks the threshold automatically)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    # Find the rows and columns of the points greater than 0
    ys, xs = np.where(thresh > 0)
    # Stack them into coordinates like [[306 37] [306 38] [307 38]]: all the non-zero pixels
    coords = np.column_stack([xs, ys])
    # Get the minimum-area rectangle (center point, size, angle)
    rect = cv2.minAreaRect(coords)
    angle = rect[-1]  # the last element is the angle
    print(rect, angle)  # ((26.8, 23.0), (320.2, 393.9), 63.4)

    # Convert to the coordinates of the four vertices
    box = np.intp(cv2.boxPoints(rect))  # np.intp rounds to integers (np.int0 was removed in NumPy 2.0)
    print(box)  # [[15 181][367 5][510 292][158 468]]

    # Draw the box and pop up the result
    cv2.drawContours(image, [box], 0, (0, 0, 255), 2)
    cv2.imshow("output", image)
    cv2.waitKey(0)

    return angle

Reading the image, grayscale conversion, black/white inversion, and binarization are the same as in the boundingRect version.

The difference is that, in order to find the minimum area, minAreaRect requires you to provide it with all the non-blank coordinates. From these coordinates its own algorithm works out a rectangular region, not necessarily parallel to the x and y axes, that just wraps around all the points.

minAreaRect returns a value like ((248.26095581054688, 237.67669677734375), (278.31488037109375, 342.6839904785156), …), divided into three parts: the center-point coordinates x and y, the size w and h, and the angle a.

Let's look at angle a first. Note that its range depends on the OpenCV version: older versions return an angle in [-90, 0), while newer versions (4.5 and later) return a positive angle in (0, 90], which is why the example above prints 63.4.

boxPoints(rect) converts the minAreaRect result into the four vertices, e.g. [[15 181][367 5][510 292][158 468]].

As for how to rotate the image… You just write it down and use it, because that’s not the point. There’s no need to spend energy on everything. Life always has to leave some regrets to make up for.

import cv2
import numpy as np

# Pass in the image data array and the angle to rotate
def rotate_bound(image, angle):
    # Get the height and width
    (h, w) = image.shape[:2]
    (cX, cY) = (w // 2, h // 2)
    # Build the rotation matrix and extract its sine and cosine
    M = cv2.getRotationMatrix2D((cX, cY), -angle, 1.0)
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    # Compute the new bounding size of the image
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))
    # Adjust the rotation matrix to account for the translation
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY

    return cv2.warpAffine(image, M, (nW, nH), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)

Find the angle, then rotate the picture:

image = rotate_bound(cv2.imread('img2_0_rotate.jpg'), 90 - angle)
cv2.imshow("output", image)
cv2.waitKey(0)

I went to the secretary with the program, opened the door, saw that she had just put her coat on the hanger, and the moment she turned around, I saw that she was wearing a low-cut dress, very low-cut, very hot.

No way! I am a gentleman, see no evil.

I said, the program has been changed, you try again. Then I slammed the door with a red face.

After a while, she called me back and said there seemed to be a problem with the procedure.

When I went there again, I found that she had put on her coat. Why did she put on her coat when it was not very cold?

She described the symptom to me. Look, she said, the processed picture is still not what she wants; she just wants the text in the picture straightened.

I looked, and this edge case was even more unusual. The text area itself was a slanted rectangle, and the text inside that area was slanted too. No box can frame this right.

The beautiful secretary asked coyly: "Big engineer, is this difficult?"

"Difficult? Ha ha ha, not at all! Let me head back; I'll fix it for you in my spare time!" I strode out of the room, and the moment I closed the door, my composure collapsed.

Four, HoughLinesP Hough line transform

I couldn't think in terms of boxes and regions anymore. All the boxes were useless. And no thinking about whether the weather was hot or not; that would only cloud the mind.

I finally found a way to turn the twisted text right.

I’m using something called the Hough transform.

Hough transform can recognize not only straight lines, but also any shape, such as circle and ellipse.

I’m not greedy. I’m just using it here to identify lines.

import cv2
import numpy as np

# Read the image: 3 channels, e.g. [[[255 255 255] [255 255 255]] [[255 255 255] [255 255 255]]]
image = cv2.imread(image_path)
# Grayscale: single channel, e.g. [[255 255] [255 255]]
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Edge detection
edges = cv2.Canny(gray, 500, 200)
# Get the set of all detected line segments
lines = cv2.HoughLinesP(edges, 1, np.pi/180, threshold=30, maxLineGap=200)
print(lines)  # [[[185 153 369 337]] [[128 172 355 362]]]

Image gray processing is the same as before, the main purpose is to make the data is concise and effective.

Next comes cv2.Canny(gray, 500, 200) edge detection; the effect is as follows.

Again, the goal is to make the data more concise and efficient.

Let's describe the three parameters of Canny(gray, 500, 200).

  1. The first parameter is the grayscale image data, since Canny can only process grayscale images.
  2. The second parameter is the high threshold, which sets how strong an edge must be: the larger the value, the coarser the edges, and past a certain point the edges break into disconnected chunks.
  3. The third parameter is the low threshold, used to repair broken edges; its value determines how finely they are repaired.

Through edge processing we get the outline of the image without affecting its structure. Two points determine a line, but a line can also pass through three, four, or five points; given the right conditions, lines can be determined from points.

cv2.HoughLinesP works back and forth over the picture, drawing candidate lines along the outline and checking, under the given conditions, how many straight lines can be drawn. There are certainly many candidates, and not all of them qualify, but it can find every line segment that meets the parameters, as shown below.

Let's interpret its parameters: cv2.HoughLinesP(edges, 1, np.pi/180, threshold=30, maxLineGap=200).

  1. The first parameter, edges, is the edge pixel data.
  2. The second parameter is the distance resolution: how many pixels to move at each step when searching for lines.
  3. The third parameter is the angle resolution: how far to rotate at each step when searching for lines.
  4. The fourth parameter, threshold, is the minimum number of points that must intersect a line.
  5. The fifth parameter, maxLineGap, is the maximum pixel gap between two points still counted as the same line; farther apart than that, they are no longer considered one line.

After these filters, the lines come out.

Now that we have a line, we can calculate the Angle of the line, using an inverse trigonometric formula.

import numpy as np

# Calculate the angle of a line
def calculateAngle(x1, y1, x2, y2):

    x1 = float(x1)
    x2 = float(x2)
    y1 = float(y1)
    y2 = float(y2)

    if x2 - x1 == 0:
        result = 90  # the line is vertical
    elif y2 - y1 == 0:
        result = 0   # the line is horizontal
    else:
        # Calculate the slope (the image y axis points down, hence the minus sign)
        k = -(y2 - y1) / (x2 - x1)
        # Take the arctangent and convert the resulting radians to degrees
        result = np.arctan(k) * 57.29577  # 180 / pi
    return result

Then we calculate the angles of all the lines. Then the Angle with the highest frequency is selected, and this Angle basically represents the overall tilt Angle.

from collections import Counter

# Store the tilt angle of every line segment
angles = []
for line in lines:
    x1, y1, x2, y2 = line[0]
    cv2.line(image, (x1, y1), (x2, y2), (0, 0, 255))
    angle = calculateAngle(x1, y1, x2, y2)
    angles.append(round(angle))
# Find the angle that appears most often
mostAngle = Counter(angles).most_common(1)[0][0]
print("mostAngle:", mostAngle)

Finally, we call the method to show the rotated effect.

mostAngle = houghImg("img3_0_cut.jpg")
# Rotate the image to see the effect
image = rotate_bound(cv2.imread('img3_0_cut.jpg'), mostAngle)
cv2.imshow("output", image)
cv2.waitKey(0)

In this way, we can also correct this document.

I hurried to find the beautiful secretary and told her the good news.

She was pleased and said she had been working on the program herself and asked me for the source code.

So I gave her the project's GitHub address: github.com/hlwgy/doc_c…

That day, we both laughed very happily.

Five, the story continues…

I didn’t know about it until later.

The beautiful secretary has a young boyfriend, her boyfriend is also engaged in programming.

The problem with document correction is not that the beautiful secretary needs it, but that her boyfriend has a problem at work.

I recalled a lot of strange things: the secretary knew about image preprocessing, she provided such professional edge cases, and she asked me for the project source code.

It all seemed so unreasonable at the time that it seems so reasonable now.

“Big engineer, I will wait for you in room 609 of Happiness Hotel… Please do come!” I got a message from her.

Sent to the wrong person?

  1. “Big engineer” was what she used to call me.
  2. The Happiness Hotel is next to our office building.

On purpose? What is she going to do? I was lost in thought…