An overview of the application of image recognition in games

Computer vision (CV) has been relatively successful in the real world, with applications such as face recognition in daily life, license plate recognition, fingerprint comparison, electronic image stabilization, and pedestrian and vehicle tracking. So what applications can CV have in other areas, such as the mobile games people often play? Images of game scenes still differ from images of real scenes. Some game scenes are relatively complex: there is special-effect interference, game characters are not as regular as real people, and art fonts are not as fixed as license plates, which have a uniform background color. Other elements are relatively simple, such as fixed icons at fixed locations in the game. Simple game elements can be detected with traditional image detection methods and still achieve good results. In this article, we will look at common approaches to game scene recognition.

1. Process

Game scene recognition can be divided into two modules: GameClient and CVServer. GameClient captures real-time images from a mobile phone or PC and sends them to CVServer. CVServer processes the received game frames and returns the recognition results to GameClient, which post-processes them according to the game's requirements and feeds them back to the game side. The process is shown in Figure 1.



Figure 1 The main process of game scene identification
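As a rough illustration of the GameClient side (not the actual implementation), the client can be sketched as grabbing a frame, JPEG-encoding it, and posting it to CVServer over HTTP. The server URL and the capture function below are placeholders.

```python
import cv2
import numpy as np
import requests  # assumption: CVServer exposes an HTTP endpoint

CVSERVER_URL = "http://127.0.0.1:8080/recognize"  # hypothetical address

def capture_frame():
    """Placeholder for grabbing a frame from the phone/PC (e.g. via adb or a capture SDK)."""
    return np.zeros((720, 1280, 3), dtype=np.uint8)

def send_frame_to_cvserver(frame):
    # JPEG-encode the frame and post it; CVServer is assumed to return the result as JSON
    ok, buf = cv2.imencode(".jpg", frame)
    if not ok:
        return None
    resp = requests.post(CVSERVER_URL, data=buf.tobytes(),
                         headers={"Content-Type": "image/jpeg"})
    return resp.json()

result = send_frame_to_cvserver(capture_frame())
```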

2. Application examples

The previous section outlined the main process of game scene recognition. This section analyzes specific applications of image recognition in games.

2.1 Determination of Game State

Each game UI is called a game state; a game can be thought of as consisting of many different UIs. A sample library of these UIs is established first. When a game screen is acquired in real time, the current image is compared with the sample images to determine the current game state. There are many ways to decide whether two images are similar. Here we take feature point matching as an example; the main steps are as follows:

Step 1: Extract feature points from the sample image and the test image



Fig. 2 Feature point extraction

Step 2: Match the feature points



Fig. 3 Feature point matching

Step 3: Screen the matches



Fig. 4 Match screening according to the ratio test

ORB feature point matching is a relatively mature technique. In the collected test data set, differences in phone resolution, screen notch, or rendering lead to large differences in image size or UI position, which ordinary template matching has difficulty adapting to. This is not the case for a matching scheme based on feature points, which generally refer to corners or other salient points in the image: it is largely insensitive to the position and scale of elements in the image, so its applicability is stronger. ORB combines the FAST feature point detector with the BRIEF feature descriptor and improves and optimizes both, so that the resulting feature points are rotation invariant and scale invariant. The following sections introduce feature point extraction, feature point description, feature point matching, and feature point screening.

2.1.1 Feature point extraction: FAST

The basic idea of FAST is that if a pixel p differs significantly from enough of the pixels on the circle around it (pixels 1 to 16), then p may be a corner point. The original FAST feature points are not scale invariant; the ORB implementation in OpenCV achieves scale invariance by constructing a Gaussian pyramid and detecting corner points on each pyramid level. The original FAST also has no orientation. The ORB paper solves this with the gray (intensity) centroid method. For any feature point P, the moments of its neighborhood patch are defined as

m_{pq} = \sum_{x,y} x^p y^q I(x,y)

where I(x,y) is the gray value at point (x,y). The centroid of the patch is

C = \left( \frac{m_{10}}{m_{00}}, \frac{m_{01}}{m_{00}} \right)

and the angle between the feature point and the centroid, i.e. the direction of the FAST feature point, is

\theta = \operatorname{atan2}(m_{01}, m_{10})

Reference: E. Rosten, R. Porter, and T. Drummond, Faster and Better: A Machine Learning Approach to Corner Detection, 2010.
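As a small illustration of the intensity-centroid idea (a sketch, not OpenCV's internal ORB code), the patch moments and the resulting orientation can be computed directly:

```python
import numpy as np

def patch_orientation(patch):
    """Orientation of a grayscale patch via the intensity centroid (ORB-style)."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # take moments relative to the patch centre, where the keypoint sits
    xs -= (w - 1) / 2.0
    ys -= (h - 1) / 2.0
    m10 = np.sum(xs * patch)
    m01 = np.sum(ys * patch)
    return np.arctan2(m01, m10)  # theta = atan2(m01, m10)
```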

2.1.2 Feature point description: BRIEF

The core idea of the BRIEF algorithm is to select N point pairs in a certain pattern around the key point P and to combine the results of comparing the intensities of these N point pairs into a binary string of length N, which serves as the descriptor of the key point. When ORB computes the BRIEF descriptor, it builds a two-dimensional coordinate system centered on the key point, with the line from the feature point P to the centroid Q of the sampled patch as the X-axis and the perpendicular direction as the Y-axis. Because this coordinate system rotates with the patch, the point pairs sampled for the same feature point are consistent under different rotation angles, which solves the rotation-consistency problem.
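A toy version of the BRIEF idea (a fixed random sampling pattern and no rotation compensation, just to show the binary-test structure; this is not ORB's steered BRIEF):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256  # descriptor length in bits
# fixed pattern of N point pairs inside a 31x31 patch (offsets from the keypoint)
PAIRS = rng.integers(-15, 16, size=(N, 4))

def brief_descriptor(gray, kp_x, kp_y):
    """Binary descriptor: bit i is 1 if I(p_i) < I(q_i) for the i-th point pair."""
    bits = []
    for dx1, dy1, dx2, dy2 in PAIRS:
        p = gray[kp_y + dy1, kp_x + dx1]
        q = gray[kp_y + dy2, kp_x + dx2]
        bits.append(1 if p < q else 0)
    return np.array(bits, dtype=np.uint8)
```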

2.1.3 Feature point matching: Hamming Distance

The Hamming distance between two binary strings of equal length is the number of positions at which the corresponding characters differ. ORB uses the Hamming distance to measure the distance between two descriptors.
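With descriptors stored as bit arrays, as in the sketch above, the Hamming distance is simply the count of differing bits:

```python
import numpy as np

def hamming_distance(d1, d2):
    # number of positions where the two binary descriptors differ
    return int(np.count_nonzero(d1 != d2))
```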

2.1.4 Feature point screening: Ratio-Test

The ratio test removes ambiguous matches whose nearest-neighbor and second-nearest-neighbor distances are close (ratio = nearest-neighbor distance / second-nearest-neighbor distance). A parameter ratio controls the elimination: matches whose distance ratio exceeds the threshold are discarded. As shown in the figure below, a ratio of about 0.75 gives the best separation between correct and incorrect matches.



Fig. 7 Distribution of the ratio of nearest-neighbor distance to second-nearest-neighbor distance. The solid line is the PDF of the ratio for correct matches, and the dashed line is the PDF of the ratio for incorrect matches. D. G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, 2004.
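Putting the four steps together, a minimal OpenCV sketch of ORB matching with the ratio test might look like the following; the image paths and the minimum-match threshold are assumptions, not values from the original project.

```python
import cv2

def match_sample_to_frame(sample_path, frame_path, ratio=0.75, min_matches=10):
    sample = cv2.imread(sample_path, cv2.IMREAD_GRAYSCALE)
    frame = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=500)            # FAST keypoints + steered BRIEF descriptors
    kp1, des1 = orb.detectAndCompute(sample, None)
    kp2, des2 = orb.detectAndCompute(frame, None)
    if des1 is None or des2 is None:
        return False

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)      # Hamming-distance matching
    knn = matcher.knnMatch(des1, des2, k=2)

    good = []
    for pair in knn:                               # ratio test screening
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return len(good) >= min_matches                # enough good matches -> same UI/state

# e.g. the state is detected if match_sample_to_frame("ui_lobby.png", "screenshot.png") is True
```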

2.2 Scene Coverage

The method based on feature point matching can also be used for scene coverage. First, template images of the core scenes are loaded. While the AI runs, it collects a large number of screenshots of the game; these screenshots form the test data set. Each test image is matched against each core scene image using the feature point matching algorithm described above, and the matching results are filtered to decide which core scene, if any, each test image corresponds to. By counting which core scenes are matched and how often, the scene coverage during the AI run can be estimated.
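A minimal sketch of the coverage computation, reusing the match_sample_to_frame helper sketched in section 2.1; the template paths and screenshot directory are placeholders.

```python
import glob

core_scene_templates = {
    "lobby": "templates/lobby.png",      # hypothetical core-scene templates
    "battle": "templates/battle.png",
    "result": "templates/result.png",
}
screenshots = glob.glob("ai_run_screenshots/*.png")  # screenshots collected while the AI runs

covered = set()
for shot in screenshots:
    for scene, tmpl in core_scene_templates.items():
        if match_sample_to_frame(tmpl, shot):  # ORB + ratio-test matcher from section 2.1
            covered.add(scene)

coverage = len(covered) / len(core_scene_templates)
print(f"scene coverage: {coverage:.0%} ({sorted(covered)})")
```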

2.3 Number recognition in the game

There are many digit images in games, such as level numbers, scores, and countdowns. We can recognize these digits with a CNN-based method. CNN-based classification has been around for a long time; an early classic network is LeNet, proposed in 1998, which consists of two convolutional layers, two pooling layers, two fully connected layers, and a final Softmax layer. The input is a digit image and the output is its category.



Figure 6 LeNet network
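A sketch of a LeNet-style digit classifier in PyTorch. The layer sizes follow the classic 32x32 grayscale input; this is an illustration of the architecture described above, not the project's exact network.

```python
import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(),   # convolutional layer 1
            nn.MaxPool2d(2),                             # pooling layer 1
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),  # convolutional layer 2
            nn.MaxPool2d(2),                             # pooling layer 2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),       # fully connected layer 1
            nn.Linear(120, 84), nn.ReLU(),               # fully connected layer 2
            nn.Linear(84, num_classes),                  # final layer; softmax applied below
        )

    def forward(self, x):            # x: (batch, 1, 32, 32) grayscale digit images
        return self.classifier(self.features(x))

probs = torch.softmax(LeNet()(torch.randn(1, 1, 32, 32)), dim=1)  # per-digit class probabilities
```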

We can first segment the multi-digit image into individual digits, then classify each digit image with the LeNet network; the output category is the recognized digit. Finally, we concatenate the per-digit results to obtain the recognition result for the whole number.



Fig. 7 Digital recognition process
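A rough sketch of this segment-then-classify flow using contour-based segmentation (OpenCV 4.x return signature assumed; classify_digit stands in for the trained LeNet):

```python
import cv2

def classify_digit(crop):
    """Placeholder: resize the crop and run it through the trained LeNet, returning 0-9."""
    return 0

def recognize_number(gray):
    # binarize, find each digit as an external contour, then read the digits left to right
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = sorted((cv2.boundingRect(c) for c in contours), key=lambda b: b[0])
    digits = [classify_digit(gray[y:y + h, x:x + w]) for x, y, w, h in boxes]
    return "".join(str(d) for d in digits)
```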

With deeper network structures, stronger convolution operations, and the historical opportunity brought by GPUs and big data, CNNs have developed explosively in recent years. CNNs are used not only for classification but also for object detection, that is, instead of outputting only the category of an object, the network outputs the position of the object in the image together with the category of the object at that position. We can adopt YOLOv3, which offers a good trade-off between speed and accuracy, and, based on the image characteristics of the project, optimize the network in two directions, reducing the number of layers and reducing the number of feature maps, to further improve its speed.
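One way to run such a (possibly slimmed-down) YOLOv3 model is OpenCV's DNN module; a hedged sketch follows, where the cfg/weights file names and the confidence threshold are placeholders rather than the project's actual configuration.

```python
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3-slim.cfg", "yolov3-slim.weights")  # hypothetical files

def detect(image, conf_threshold=0.5):
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    h, w = image.shape[:2]
    boxes = []
    for out in outputs:
        for det in out:                      # det = [cx, cy, bw, bh, objectness, class scores...]
            scores = det[5:]
            class_id = scores.argmax()
            if scores[class_id] > conf_threshold:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append((int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh), int(class_id)))
    return boxes
```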



Fig. 9 The process of number recognition and recombination

2.4 Recognition of fixed icons at fixed positions

Template matching has many applications. Here we give examples of identifying fixed buttons, identifying prompt information, and detecting a stuck state. In the main interface of a game, the hero's skills, equipment, operation keys, and other buttons are generally at fixed positions. The button icon captured while the button is in its available state is used as the template. If the template is detected in the game interface acquired in real time, the button is currently available. Once the game AI has this button information, it can take appropriate actions, such as releasing a skill or buying equipment. Game prompt information is handled similarly. There are prompt messages at fixed positions in the game interface, such as the route indication information shown in Figure 7, the game end state (success or failure), the game running state (start), and so on. We first record the locations of these prompts and collect their icon templates. While the game runs, the real-time image is matched against the collected icon template at the expected position; if it matches, the prompt is currently present. For example, if the game's success icon is matched, the AI strategy used in this round should be rewarded rather than punished.



Fig. 10 Recognition of a fixed button



Figure 11 Recognition of the game prompt icon

The idea of template matching is to find, within an image, the region that best matches a given template image. The process is shown in Figure 12.



Figure 12 The process of template matching

The processing steps for template matching are as follows:

Step 1: Starting from the upper left corner of the original image, slide a window from left to right and top to bottom with a step size of 1, and compute the similarity between the template image and each window sub-image in turn.

Step 2: Store the similarity results in a result matrix.

Step 3: Find the best matching value in the result matrix. If larger values mean greater similarity, the brightest location in the result matrix is the best match.

OpenCV provides the interface function cv2.matchTemplate(src, tmpl, method) for template matching, where method selects the matching method.
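A minimal sketch of the fixed-icon check with cv2.matchTemplate; the similarity threshold below is an assumption.

```python
import cv2

def icon_present(frame_gray, template_gray, threshold=0.8):
    # result matrix of similarity scores; for TM_CCOEFF_NORMED larger means more similar
    result = cv2.matchTemplate(frame_gray, template_gray, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    return max_val >= threshold, max_loc  # best-match score reached and its top-left position
```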

2.5 Object filtering based on pixel features

By filtering the pixels in the detection region according to the value range of each color channel, we obtain the positions of target objects that match the given color features.

The color characteristics of health bars in games are quite distinctive. For example, a red health bar has a larger R channel value, a green health bar has a larger G channel value, and a blue health bar has a larger B channel value. We extract the color features of the health bars and filter the pixels accordingly; the filtered pixels form the health bar. By computing the connected region of these pixels, we obtain the length of the health bar and hence the health percentage. Through this filtering, we know the position and health percentage of friendly units (green or blue health bars) and enemy units (red health bars) on the main interface of the current game. Based on these attributes, the game AI can choose different strategies such as retreating, attacking, or grouping up.



Fig. 13 Flow chart of the health-bar percentage calculation
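A minimal sketch of the health-bar percentage calculation with a per-channel range filter; the red color range and the calibrated full-bar width are assumptions, not values from the original project.

```python
import cv2
import numpy as np

FULL_BAR_WIDTH = 120  # calibrated pixel width of a full health bar (assumption)

def red_health_percentage(bar_region_bgr):
    # keep pixels whose channels fall in the "red health bar" range (OpenCV images are BGR)
    mask = cv2.inRange(bar_region_bgr, np.array([0, 0, 150]), np.array([80, 80, 255]))
    xs = np.where(mask.any(axis=0))[0]      # columns that contain health-bar pixels
    if xs.size == 0:
        return 0.0
    current_width = xs.max() - xs.min() + 1
    return min(1.0, current_width / FULL_BAR_WIDTH)
```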

In MOBA games, the minimap often shows our towers and the enemy's towers. The color range extracted for towers is R(0, 90), G(90, 190), B(110, 200). By filtering, within the minimap region, the pixels whose channel values fall in this range, we can locate our (or the enemy's) towers and estimate each tower's health from the number of matching pixels. If our hero appears in the minimap surrounded by a green circle, we can likewise filter pixels in the range R(80, 140), G(170, 210), B(70, 110) to find the hero's position, and then perform pathfinding or strategy selection.



Fig. 14 Application of minimap pixel filtering in MOBA games
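A sketch of the minimap filtering using the ranges quoted above. Note that OpenCV stores images in BGR order, so the (R, G, B) ranges are reordered; the minimap crop coordinates and minimum blob size are placeholders.

```python
import cv2
import numpy as np

# R(0,90), G(90,190), B(110,200) for towers; R(80,140), G(170,210), B(70,110) for the hero circle
TOWER_LO, TOWER_HI = np.array([110, 90, 0]),  np.array([200, 190, 90])    # B, G, R order
HERO_LO,  HERO_HI  = np.array([70, 170, 80]), np.array([110, 210, 140])

def find_blobs(minimap_bgr, lo, hi, min_pixels=20):
    mask = cv2.inRange(minimap_bgr, lo, hi)
    n, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
    # each blob: (centre_x, centre_y, pixel_count); a tower blob's pixel count reflects its health
    return [(int(cx), int(cy), int(area))
            for (cx, cy), area in zip(centroids[1:], stats[1:, cv2.CC_STAT_AREA])
            if area >= min_pixels]

# towers = find_blobs(frame[0:220, 0:220], TOWER_LO, TOWER_HI)   # hypothetical minimap crop
# hero   = find_blobs(frame[0:220, 0:220], HERO_LO, HERO_HI)
```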

2.6 Other applications

Image recognition has many other applications in games, such as pedestrian detection, hero detection, broken-screen (artifact) detection, air wall detection, model clipping detection, and image deduplication in game scenes.

3. Summary

This article mainly introduced the applications of image recognition in games, such as determining the game state, calculating scene coverage, recognizing numbers in the game, and recognizing fixed icons at fixed positions. I hope that after reading this article, readers will have a better understanding of how image recognition is applied in games.
