Recently the WeChat mini-game Jump Jump (跳一跳) became hugely popular, so the day before yesterday I updated WeChat and played a few rounds myself; playing by hand I could only get to around 200.

Later I planned to write an auxiliary tool of my own, but when I checked GitHub I found that someone had already made one. The project was launched on December 29th, 2017 and earned 5K stars in less than 5 days.

Github.com/wangshub/we…

That project's idea is roughly this:

  • Connect the phone over ADB and grab a screenshot;
  • Identify the center positions of the chess piece and the target block in the screenshot;
  • Calculate the long-press time from the distance between them; the coefficient depends on the screen resolution;
  • Use ADB to simulate the long press and complete the jump.
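For reference, the ADB part can be driven from Python with two shell commands. A minimal sketch; the device path and the press coordinates below are arbitrary:

```python
import subprocess

def screenshot(local_path='screen.png'):
    # Capture the screen on the device, then pull the PNG to the local machine.
    subprocess.run(['adb', 'shell', 'screencap', '-p', '/sdcard/screen.png'], check=True)
    subprocess.run(['adb', 'pull', '/sdcard/screen.png', local_path], check=True)

def long_press(duration_ms, x=500, y=1600):
    # A swipe that starts and ends at the same point acts as a long press.
    subprocess.run(['adb', 'shell', 'input', 'swipe',
                    str(x), str(y), str(x), str(y), str(int(duration_ms))], check=True)
```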

Alas, what a pity to have missed such a good project.

Since someone else has already done it that way, I decided to try something different and solve it with deep learning.

The basic idea

The basic flow is similar, the only difference being how to get the center position of the piece and the target block.

If the long-press time depends only on the horizontal positions of the piece and the target block, then all you need to know are their horizontal coordinates.

This can be treated as an object detection problem: detect the objects, such as the chess piece, in the screenshot. Assume there are seven classes of objects in total:

  • The chess piece: chess
  • Easter-egg blocks: the sewer cover waste, the Rubik’s cube magic, the shop shop, and the music box
  • Common blocks: the rectangular block rect and the circular block circle
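In code, the detection class ids can be mapped to these names with a simple dictionary. A hypothetical mapping for illustration; the exact ids and names, especially the one for the music box, are my assumptions rather than what the project uses:

```python
# Hypothetical mapping from detection class id to label.
CATEGORY_INDEX = {
    1: 'chess',   # the chess piece
    2: 'waste',   # sewer cover
    3: 'magic',   # Rubik's cube
    4: 'shop',    # shop
    5: 'music',   # music box (name assumed)
    6: 'rect',    # rectangular block
    7: 'circle',  # circular block
}
```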

Model implementation

I manually annotated 500 screenshots and trained a model based on ssd_mobilenet_v1_coco with the TensorFlow Object Detection API; the trained model runs like this.

It can be seen that the chess piece, the Rubik’s cube, the rectangular blocks and the circular blocks in the screenshot are all detected, and each detection result includes three parts:

  • Object position, marked with rectangle, corresponding to the quad ymin, xmin, ymax, xmax;
  • Object category, one of the above seven categories;
  • Detection confidence: the higher the confidence, the more certain the model is about the result.
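As a rough sketch of what the inference step looks like with a frozen TensorFlow 1.x Object Detection API graph; the file name and the tensor names follow the standard exported graph, not the project's exact code:

```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Load the frozen detection graph exported by the Object Detection API.
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

def detect(image_path):
    """Return normalized boxes (ymin, xmin, ymax, xmax), class ids and scores."""
    image = np.expand_dims(np.array(Image.open(image_path).convert('RGB')), axis=0)
    with tf.Session(graph=graph) as sess:
        boxes, scores, classes = sess.run(
            ['detection_boxes:0', 'detection_scores:0', 'detection_classes:0'],
            feed_dict={'image_tensor:0': image})
    return boxes[0], classes[0].astype(int), scores[0]
```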

This is not simple rule matching; the model actually recognizes how many objects there are in the screenshot and what each of them is.

So all that remains is to extract from the detection results the position of the chess piece, and the position of the topmost non-chess object, which is the target block.

Given an object's bounding box, the horizontal coordinates of the chess piece and the target block are the midpoints of xmin and xmax. These coordinates are normalized, i.e. the screen width is 1, so the distance between the two lies between 0 and 1. Multiply this distance by a coefficient to get the long-press time, then simulate the press.
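Putting the pieces together, the decision step might look roughly like this, reusing the long_press helper from the ADB sketch above; the piece's class id, the score threshold and the coefficient are assumptions:

```python
def jump(boxes, classes, scores, coefficient=2000, min_score=0.5):
    """Pick the chess piece and the topmost non-chess object, then press for a
    time proportional to their horizontal distance (coordinates are normalized)."""
    piece_x, target = None, None
    for (ymin, xmin, ymax, xmax), cls, score in zip(boxes, classes, scores):
        if score < min_score:
            continue
        center_x = (xmin + xmax) / 2
        if cls == 1:                        # assumed class id of the chess piece
            piece_x = center_x
        elif target is None or ymin < target[0]:
            target = (ymin, center_x)       # keep the topmost (smallest ymin) block
    if piece_x is None or target is None:
        return                              # detection failed, skip this round
    distance = abs(target[1] - piece_x)     # between 0 and 1 (screen width = 1)
    long_press(distance * coefficient)      # reuse the ADB helper from above
```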

The results

It looks good, but what about the actual runs?

Probably a few hundred. What’s the problem?

The main reason is that the annotation data is too scarce and the model is under-trained, so the detections are not accurate enough. Sometimes the chess piece or the target block is not detected at all, and once that happens the run inevitably ends.

I tried the following trick: shift each screenshot in different directions to obtain nine screenshots, hoping to improve the recall of the detections. But there were still cases where nothing was detected; perhaps only more annotation data can solve this problem.
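The shifting trick can be sketched like this, producing the original screenshot plus eight translated copies to run detection on; the shift size is arbitrary:

```python
import numpy as np

def shifted_copies(image, shift=50):
    """Yield the original screenshot plus eight copies shifted in the eight
    compass directions; detection is then run on all nine images."""
    for dy in (-shift, 0, shift):
        for dx in (-shift, 0, shift):
            yield np.roll(image, (dy, dx), axis=(0, 1))
```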

A rule-based attempt

After 200K training steps there were still cases where nothing was detected. Frustrated, I decided to just write a simple rule-based version of the code.

It took less than 20 minutes to write: extract edges with OpenCV, then find the horizontal center positions of the piece and the target block. The result looks like this.
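A rough sketch of such a rule-based detector, assuming the piece is located by its characteristic body colour and the target block by the topmost Canny edge; the colour thresholds and the crop margins are guesses, not the project's actual values:

```python
import cv2
import numpy as np

def find_centers(image_path):
    img = cv2.imread(image_path)
    h, w = img.shape[:2]

    # Piece: mask its characteristic purple body colour (BGR thresholds are guesses).
    mask = cv2.inRange(img, np.array([50, 40, 40]), np.array([110, 80, 100]))
    ys, xs = np.nonzero(mask)
    piece_x = xs.mean() if xs.size else None

    # Target block: topmost edge in the middle band of the screen.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 1, 10)
    edges[:h // 3] = 0                      # drop the score display at the top
    edges[2 * h // 3:] = 0                  # drop the bottom of the screen
    if xs.size:                             # ignore the piece's own edges
        edges[:, int(xs.min()):int(xs.max()) + 1] = 0
    rows = np.nonzero(edges.any(axis=1))[0]
    target_x = np.nonzero(edges[rows[0]])[0].mean() if rows.size else None

    return piece_x, target_x
```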

As it turned out, the final score was much higher than with the previous model…

What about deep learning?

Conclusion

In the following situations, defining rules from manual experience takes much less effort and works better than training a deep learning model:

  • The problem itself is relatively simple and does not require complex abstractions;
  • Annotation data is limited, so it is difficult to fully train the model.
  • The penalty for a mistake is high and there is no tolerance for error. Even if the model is right 99 percent of the time, the remaining 1 percent instantly ends the game, which makes it less reliable than hard-coded rules.

Of course, if we can work together to collect more labeled data, there may still be some hope.

The code is on Github: github.com/Honlan/wech…

Anyway, I went back to chasing the score, with less than a hundred lines of code…