
Preface

This article uses Python to recognize a graphic verification code (captcha) and then log in automatically. Without further ado, let's have a good time.

Development tools

Python version: 3.6.4

Related modules:

re module;

numpy module;

pytesseract module;

selenium module;

opencv-python (cv2) module;

Pillow (PIL) module;

and some modules that come with Python.

Environment setup

Install Python and add it to the environment variables, then use pip to install the required modules.
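For reference, a single pip command along these lines should install the modules used in this article. The exact package names are inferred from the imports in the code below (cv2 is provided by opencv-python and PIL by Pillow):

pip install numpy pytesseract selenium opencv-python pillow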

1. Grayscale processing: turn the color verification code image into a grayscale image.

import cv2

# The second argument 0 loads the image as grayscale
image = cv2.imread('1.jpeg', 0)
cv2.imwrite('1.jpg', image)

2. Binarization: turn the image into a picture with only black and white pixels. Here we find that there are no interference lines, which means we only need to deal with the interference dots.

import cv2

# Load the image as grayscale
image = cv2.imread('1.jpeg', 0)
# Threshold at 100 with type 1 (THRESH_BINARY_INV): pixels above 100 become 0, the rest become 255
ret, image = cv2.threshold(image, 100, 255, 1)
height, width = image.shape
# Crop to the left 150 columns, where the characters are
new_image = image[0:height, 0:150]
cv2.imwrite('1.jpg', new_image)

3. Noise reduction: remove the small black dots, that is, the isolated black pixels.

The principle of point denoising is to examine the 8 points adjacent to each black point and check their colors. If all of them are white, the point is considered noise and is changed from black to white. For example, in a nine-cell grid, point ⑤ has 8 adjacent cells.

The coordinates of points ①②③ are shown in the figure below; the coordinates of points ④⑤⑥⑦⑧⑨ can be derived in the same way.

The noise reduction code

import cv2
import numpy as np
from PIL import Image


def inverse_color(image, col_range):
    # Read the image; 0 means the image is loaded as grayscale
    image = cv2.imread(image, 0)
    # 110 = threshold, 255 = maximum value, 1 = threshold type (THRESH_BINARY_INV): pixel values above the threshold become 0, the rest become 255; ret is the threshold value that was used
    ret, image = cv2.threshold(image, 110, 255, 1)
    # Height and width of the image
    height, width = image.shape
    # Image reverse color processing, reason: the above processing can only generate white words and black background picture, and we need black words and white background picture
    img2 = image.copy()
    for i in range(height):
        for j in range(width):
            img2[i, j] = (255 - image[i, j])
    img = np.array(img2)
    # Capture the processed picture
    height, width = img.shape
    new_image = img[0:height, col_range[0]:col_range[1]]
    cv2.imwrite('handle_one.png', new_image)
    image = Image.open('handle_one.png')
    return image


def clear_noise(img):
    # Image denoising
    x, y = img.width, img.height
    for i in range(x):
        for j in range(y):
            if sum_9_region(img, i, j) < 2:
                # Change the pixel color to white
                img.putpixel((i, j), 255)
    img = np.array(img)
    cv2.imwrite('handle_two.png', img)
    img = Image.open('handle_two.png')
    return img


def sum_9_region(img, x, y):
    """Count the black pixels in the neighborhood of (x, y)."""
    # Get the color value of the current pixel
    cur_pixel = img.getpixel((x, y))
    width = img.width
    height = img.height

    if cur_pixel == 255:  # If the current point is a white area, the neighborhood value is not counted
        return 10

    if y == 0:  # the first line
        if x == 0:  # top left vertex,4 neighborhood
            # 3 points next to the center point
            sum_1 = cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y + 1))
            return 4 - sum_1 / 255
        elif x == width - 1:  # top right vertex
            sum_2 = cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x - 1, y)) + img.getpixel((x - 1, y + 1))
            return 4 - sum_2 / 255
        else:  # uppermost non-vertex,6 neighborhood
            sum_3 = img.getpixel((x - 1, y)) + img.getpixel((x - 1, y + 1)) + cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y + 1))
            return 6 - sum_3 / 255

    elif y == height - 1:  # bottom line
        if x == 0:  # lower left vertex
            # 3 points next to the center point
            sum_4 = cur_pixel + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y - 1)) + img.getpixel((x, y - 1))
            return 4 - sum_4 / 255
        elif x == width - 1:  # lower right vertex
            sum_5 = cur_pixel + img.getpixel((x, y - 1)) + img.getpixel((x - 1, y)) + img.getpixel((x - 1, y - 1))
            return 4 - sum_5 / 255
        else:  # lowest non-vertex,6 neighborhood
            sum_6 = cur_pixel + img.getpixel((x - 1, y)) + img.getpixel((x + 1, y)) + img.getpixel((x, y - 1)) + img.getpixel((x - 1, y - 1)) + img.getpixel((x + 1, y - 1))
            return 6 - sum_6 / 255

    else:  # y is not on the boundary
        if x == 0:  # Left non-vertex
            sum_7 = img.getpixel((x, y - 1)) + cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x + 1, y - 1)) + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y + 1))
            return 6 - sum_7 / 255
        elif x == width - 1:  # Right non-vertex
            sum_8 = img.getpixel((x, y - 1)) + cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x - 1, y - 1)) + img.getpixel((x - 1, y)) + img.getpixel((x - 1, y + 1))
            return 6 - sum_8 / 255
        else:  # Meet the requirements of 9 domains
            sum_9 = img.getpixel((x - 1, y - 1)) + img.getpixel((x - 1, y)) + img.getpixel((x - 1, y + 1)) + img.getpixel((x, y - 1)) + cur_pixel + img.getpixel((x, y + 1)) + img.getpixel((x + 1, y - 1)) + img.getpixel((x + 1, y)) + img.getpixel((x + 1, y + 1))
            return 9 - sum_9 / 255


def main():
    img = '1.jpeg'
    img = inverse_color(img, (0, 160))
    clear_noise(img)


if __name__ == '__main__':
    main()
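With the denoised image saved as handle_two.png, the characters can then be read with pytesseract. The following is a minimal sketch rather than the author's exact code: it assumes the Tesseract OCR engine is installed on the system, and the digit whitelist and --psm 7 (single text line) options are just one reasonable configuration.

import pytesseract
from PIL import Image

# Open the denoised image written by clear_noise()
image = Image.open('handle_two.png')
# Restrict recognition to digits; --psm 7 treats the image as a single line of text
text = pytesseract.image_to_string(
    image, config='--psm 7 -c tessedit_char_whitelist=0123456789')
print(text.strip())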

With the biggest problem solved, the next step is to implement automatic login. First, use Selenium to automatically click the login button.
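A minimal sketch of this step with Selenium is shown below; the URL and the element locator are placeholders, since the actual site and selectors are not given in the article.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
# Hypothetical login page URL
driver.get('https://example.com/login')
# Hypothetical locator for the login button
driver.find_element(By.ID, 'login-button').click()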

Finally, the verification code image is successfully obtained as a screenshot.

Why a screenshot? Because the captcha image changes every time it is requested. For example, if I copy the image link of the 8863 captcha and open it in a new tab, the captcha has already changed: it is no longer 8863 but a different image. So fetching the captcha image by requesting its link on the current page is definitely not feasible.
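Instead, one workable approach is to screenshot the captcha element that is already rendered on the current page, which keeps the image in sync with the session. A hedged sketch, again with a placeholder locator and continuing from the driver above:

from selenium.webdriver.common.by import By

# Hypothetical locator for the captcha <img> element on the current page
captcha_element = driver.find_element(By.ID, 'captcha-image')
# Save a screenshot of just this element (Selenium writes PNG data);
# the processing code above would then read this file instead of 1.jpeg
captcha_element.screenshot('1.png')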

According to the relevant references, this problem could also be solved by requesting the captcha link with the session cookies attached. But because the relevant library failed to import, I gave up on that approach for now. We'll come back to it next time, when we do captcha machine learning.
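For completeness, the cookie-based idea mentioned here would look roughly like the sketch below: copy the cookies from the Selenium session into a requests session and download the captcha link with them, so the server returns the captcha tied to the current session. This is only an illustration of the idea, not the author's working code, and the captcha URL is a placeholder.

import requests

# Copy the cookies from the Selenium session into a requests session
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

# Hypothetical captcha URL; with the same cookies the server should return
# the captcha bound to the current session
response = session.get('https://example.com/captcha.jpeg')
with open('1.jpeg', 'wb') as f:
    f.write(response.content)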

Login successful.