🤖 AI for the Threes! game. 🎮

GitHub Repo: Halfrost-Field

Follow: halfrost dead simple

Source: github.com/halfrost/th…

Inspiration

A month ago, I took part in an AI competition with two friends. Although the result was not ideal, I at least enjoyed the programming process. From that competition I realized that Go is well suited to writing servers, game simulators, and AI. Recently, the helper bots for WeChat's Jump Jump and for Chongding Conference (a trivia app) were basically all written in Go. So I couldn't sit still either, and wrote one to commemorate our competition.

Since I am a client-side developer myself, the AI also had to be able to rack up points on a phone. So I looked for a mobile game: one that three people could play, or at least one with a "three" in the name. Thus:

Three people join one AI competition —> Threes-AI

"Flaunting" the score

So far, the Go version of the AI has racked up scores in three places, with 200 games in each. The proportion of high-scoring games is around 20%, so in the second stage of the project, machine learning, I hope to raise the high-score rate to 100%.

1. Play Threes Game official website

This site is the web version of the official game.

A video of this high score is here: Tencent video link

2. Threes Android client

The reason there are no screenshots of the AI running against the iOS client is that the iOS client requires a jailbreak. All of my devices run iOS 11.2+, so I will run it once a jailbreak becomes available.

3. Self-built Threes Game website

In order to train a model through machine learning, and to publicly demonstrate the strength of the AI, I reimplemented a web version following the official rules of the game.

A video of this high score is here: Tencent video link

There is a "rumor" circulating on the Internet that when the 12,288 tile is created by merging two 6,144 tiles, the game ends and the credits start to roll. There is no such rule on this site: you can build as high a tile as you want. The score is not capped, so the intelligence of the AI can be fully tested.

Of course, on the first two official versions of the game, I never actually built a 12,288 tile, so I have no way to verify the "rumor". Building the 12,288 tile 100% of the time is also the AI's goal; we haven't reached it yet.

How to run

1. Self-built Threes Game website

Docker


// Run the Go server on port 9000
docker container run --rm -p 9000:9000 -it halfrost/threes-ai:go-0.0.1

// Then run the web front end, at http://127.0.0.1:9888
docker container run --rm -p 9888:9888 -it halfrost/threes-ai:web-0.0.1


Local

The local build is a little more involved, and is also done in two steps: first build the Go server, then build the Web.

Build the Go server first:


// Go back to the project home directory
cd threes-ai
go run main.go


The preceding command runs the Go server, which listens for messages on port 9000.

Then build the Web:

Since the project is based on Meteor, first configure a local Meteor environment; installation instructions are in the Meteor manual link.


// In the threes! web home directory
cd threes-ai/threes!
meteor 


The above command will run the Web at http://localhost:3000.

At this point, everything is running locally.

Now let's talk about how to package the Docker images locally.

Package the Go server first. Because the inside of the Docker container is Linux, pay attention to cross-compilation when packaging; otherwise the final binary cannot be executed inside Docker. See the steps in the Dockerfile_go file for details.
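For instance, a typical cross-compilation invocation might look like the following sketch (the binary name is illustrative; the exact flags used are the ones in Dockerfile_go):

// Build a Linux binary so it can run inside the container (illustrative flags)
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o threes-server main.go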


docker image build -t threes_go:0.0.1 .
docker container run --rm -p 9000:9000 -it threes_go:0.0.1


Then package the Web:


cd threes-ai/threes!
meteor build ./dist


After executing the above command, a dist folder is generated in the current directory containing the threes!.tar.gz archive. Unzip it and you get the bundle directory you need for deployment.

You can also run the production web locally at this point. Use the following command:


cd dist/bundle
ROOT_URL=http://127.0.0.1 PORT=9888 node main.js


The Web will also run at http://127.0.0.1:9888. Note that the Node version must be 8.9.4; the required Node version follows from the Meteor version, and Meteor 1.6 uses Node 8.9.4. The author also pinned the Node version in the .nvmrc file. Now back to packaging the web Docker image.

The next steps to package into a Docker image are shown in the Dockerfile_web file.


docker image build -t threes_web:0.0.1 .
docker container run --rm -p 9888:9888 -it threes_web:0.0.1


Run the Go server Docker image first, then the web Docker image, and you are done.

2. Play Threes Game official website

The first step is to package the Go program as a dynamic library for Python to call.


go build -buildmode=c-shared -o threes.so main.go


The above command packages the Go code into the threes.so dynamic library.
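In c-shared mode, exported symbols need cgo //export annotations and an (unused) main function. Here is a minimal sketch of what such an entry point could look like; the name NextMove and its signature are illustrative assumptions, not the project's actual API:

package main

import "C"

// NextMove is a hypothetical exported entry point: it receives the board
// encoded as a C string and returns the direction to move
// (e.g. 0 = up, 1 = down, 2 = left, 3 = right).
//export NextMove
func NextMove(board *C.char) C.int {
	state := C.GoString(board) // convert to a Go string for the search code
	_ = state                  // ... run the expectimax search here ...
	return 0
}

// main is required by buildmode=c-shared but is never called.
func main() {}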

What threes_ai_web.py does is read the tile numbers from the web board and pass them to Go; when Go finishes searching, it returns the direction of the next move, and the script sends it back over WS to the web page to simulate the swipe. To drive the page, set up a WS connection to Chrome running in remote-debugging mode:


sudo /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9162

python threes_ai_web.py -b chrome -p 9162


Sometimes, when sudo is omitted, exceptions such as a GPU allocation failure may occur:

A new window has been created in an existing browser session. [29819:45571:0225/225036.004108:ERROR:browser_gpu_channel_host_factory.cc(121)] Failed to launch GPU process.


If you encounter the above error, quit Chrome completely and rerun it with --remote-debugging-port=9162. A new WebSocket connection is then established, for example:


DevTools listening on ws://127.0.0.1:9162/devtools/browser/c6deb386-3fc1-4833-98ab-0177ec50f1fa


The threes_ai_web.py script still has some problems; occasionally an error breaks the WS connection. A fix will be published once it's done.
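For reference, here is a sketch in Go of the kind of messages exchanged between the script and the Go server; the field names are hypothetical, not the project's actual wire format:

// Hypothetical message shapes for the board/move exchange described above.
type BoardMessage struct {
	Cells []int `json:"cells"` // 16 tile values, row-major; 0 = empty
	Next  int   `json:"next"`  // value of the upcoming tile
}

type MoveReply struct {
	Direction string `json:"direction"` // "up", "down", "left" or "right"
}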

3. FAQ

1. The halfrost/threes-ai:web-0.0.1 image also has a version 0.0.2. Why do the commands above use version 0.0.1?

This is related to server deployment. In the current source code, at line 367 of threes-ai/threes!/client/js/game.js, version 0.0.1 uses port 9000 and version 0.0.2 uses port 8999.

Why do the two versions use different port numbers? Because of SSL. The deployed server uses HTTPS, so WS becomes a WSS connection. Running locally it doesn't matter: both localhost and 127.0.0.1 are plain HTTP. Since the server side requires SSL, an extra Nginx reverse-proxy layer is needed to support WSS: Nginx listens on port 8999 with SSL and forwards to the Go server's port 9000. This completes the WSS interaction between the Web and Go. For local runs there is no need for any of this; the page can talk WS directly to the Go server on port 9000.
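A minimal sketch of such a reverse-proxy configuration (the domain and certificate paths are placeholders, not the actual deployment values):

server {
    listen 8999 ssl;
    server_name example.com;                      # placeholder domain

    ssl_certificate     /etc/ssl/fullchain.pem;   # placeholder cert paths
    ssl_certificate_key /etc/ssl/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:9000;         # the Go server's WS port
        proxy_http_version 1.1;                   # required for WebSocket
        proxy_set_header Upgrade $http_upgrade;   # pass the upgrade headers
        proxy_set_header Connection "upgrade";
    }
}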

2. During server deployment, the error WebSocket connection to 'wss://XXXX' failed: Error in connection establishment: net::ERR_CONNECTION_REFUSED occurs. How can it be solved?

The possible causes of a CONNECTION_REFUSED error are as follows:

  • 1. The server's iptables rules may be blocking the port.
  • 2. The IP or port may be wrong, or the protocol may not be supported.
  • 3. The server may not be started.

All of the above came up during my deployment. I checked iptables first and found no problems. The second possibility was that WSS was not supported, so I probed the port with the openssl command:


openssl s_client [-connect host:port] [-verify depth] [-cert filename] [-key filename]
 [-CApath directory] [-CAfile filename] [-reconnect] [-pause] [-showcerts] [-debug] [-msg]
 [-nbio_test] [-state] [-nbio] [-crlf] [-ign_eof] [-quiet]

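For example, to probe a deployed host (the host and port here are placeholders):

openssl s_client -connect example.com:8999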

The check showed the port did not support SSL, so adding an SSL layer via the Nginx reverse proxy solved the problem.

Game analysis

The difficulty with Threes! is that it is, ultimately, a losing game. Late in the game, once the 6,144 tile is created, it sits immobile most of the time, so the 4 × 4 = 16-cell board effectively loses a cell. In the late stage the tiles on the board can no longer all be merged at once, so very little free space remains, and you are often "squeezed to death" while still alive: a bad draw, or a run of consecutive 1s or 2s that cannot be combined into a 3, locks up the board.

When designing the web version, I did not carry over the client's "skip" (bonus) mode, in which the spawned tile can be a larger multiple of 3, such as 3, 6, 12, 24, 96, 192, 384 and other big tiles. So the web version of the game may be a little easier than the client.

Why would it be easier? Although there are no big bonus tiles (which are worth more points) and games therefore run longer, the survival rate is slightly higher. If 96, 384, 768 spawn consecutively, many unmergeable tiles suddenly appear on the board; the score soars, but the board jams because nothing can be combined, and the game quickly ends.

The client does have the "skip" setting, and such tiles can appear over a period of time. I found the same problem when testing the AI: the probability of being killed by runs of single 1s or single 2s is not very high, but the probability of being killed by high-value spawned tiles is considerable, so games are shorter and the score is not as high as in the web version.

When it comes to game layout, there is a knack to it.

The optimal layout is a monotonic one, as shown in the following two images:

As you can see, the layouts in these two diagrams are best: adjacent tiles can keep merging into higher ones, which is also the fastest way to clear tiles from the board. If you do not keep the layout roughly monotonic like this, the board often ends up in disarray, some tiles can never be merged, and you "force yourself to death".

Algorithm ideas

This repo is implemented with Expectimax Search. Of course there are other solutions to this problem; the ideas behind algorithms two and three below are also briefly mentioned, but without code implementations.

Algorithm one: Expectimax Search Trees

In everyday life, there are situations where, even after careful consideration, we cannot judge whether a choice will turn out well or badly:

    1. When you draw a card in poker, you never know what the next card will be. What kind of impact will this unknown card have on the game?
    2. In Minesweeper, whenever you click a square, it can be a mine or a number. The random positions of the mines directly determine whether the round ends there.
    3. In Pac-Man, the ghosts appear and move randomly, which directly affects the route you plan.
    4. In Threes!, the 1 and 2 tiles appear randomly, which affects how we should move the tiles.

In all of these situations, Expectimax Search Trees can be used. These problems all try to maximize a value (the score). The main ideas are as follows:

The max node is the root of the whole tree, just as in minimax search. "Chance" nodes are inserted in between, analogous to min nodes, except that their outcomes are uncertain. Finally, a weighted average is taken over each chance node, and the maximum over these expectations is the final result.

Like a Markov Decision Process, the next move is determined from the current state of the board alone.
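To make this concrete, here is a minimal Go sketch of the alternation between max nodes and chance nodes. The Board type and the game helpers (legalMoves, spawns, evaluate, gameOver) are placeholders, not the repo's actual implementation:

package threes

import "math"

// Board is a placeholder for the 4x4 game state.
type Board [4][4]int

// Spawn is one possible random tile placement with its probability.
type Spawn struct {
	Next Board
	Prob float64
}

// Stubs standing in for the real game logic.
func gameOver(b Board) bool      { return false }
func legalMoves(b Board) []Board { return nil } // boards after each of the 4 moves
func spawns(b Board) []Spawn     { return nil } // every spawn position, with probability
func evaluate(b Board) float64   { return 0 }   // heuristic score (see the later sketch)

// maxValue is the max node: try every move and keep the best expectation.
func maxValue(b Board, depth int) float64 {
	moves := legalMoves(b)
	if depth == 0 || gameOver(b) || len(moves) == 0 {
		return evaluate(b)
	}
	best := math.Inf(-1)
	for _, next := range moves {
		if v := chanceValue(next, depth); v > best {
			best = v
		}
	}
	return best
}

// chanceValue is the chance node: a probability-weighted average over
// every position where the random tile can appear.
func chanceValue(b Board, depth int) float64 {
	expected := 0.0
	for _, s := range spawns(b) {
		expected += s.Prob * maxValue(s.Next, depth-1)
	}
	return expected
}

The move actually played is then the one whose chanceValue at the root is highest.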

1. Some properties of the Expectimax expected value

The chance nodes are not adversaries, and they are not under our control, because they are unpredictable: we do not know what these nodes will produce.

Each state likewise has a maximum expected value. But we cannot blindly follow expectimax, because it is not 100% safe: an unlucky draw can still make the whole tree "come loose".

Chance nodes are handled with a probability-weighted average, rather than by blindly taking the minimum value.

2. Pruning

There is no concept of pruning in expectimax.

First, there is no such thing as the opponent's "best play", because the opponent's behavior is random and unknown. So whatever the current expectation is, random events in the future could overturn the situation. For this reason, computing expectimax is slow (though acceleration strategies exist).

3. Probability function

In Expectimax Search, we have a probabilistic model of the opponent's behavior in any state. The model can be a simple uniform distribution (such as rolling dice), or it can be complex, requiring a lot of computation to arrive at a probability.

The most uncertain factors are the opponent's behavior and random changes in the environment. Suppose that for each state we have a magic function that produces the probabilities. Those probabilities affect the final expected value: the expected value of a function is its mean, weighted by the probabilities of its inputs.

For example: calculating the time to the airport. The weight of your luggage affects your travel time.

L(none) = 20, L(light) = 30, L(heavy) = 60

In the three cases, the probability distribution is:

P(T) = {none: 0.25, light: 0.5, heavy: 0.25}

So the estimated driving time is written as

E[L(T)] = L(none) × P(none) + L(light) × P(light) + L(heavy) × P(heavy) = (20 × 0.25) + (30 × 0.5) + (60 × 0.25) = 35

4. Mathematical theory

In probability theory and statistics, the mathematical expectation (or mean) of a random variable is the sum, over every possible outcome of the experiment, of that outcome multiplied by its probability. It is one of the most basic numerical characteristics of a random variable, and it reflects the variable's average value.

It is important to note that the expected value is not necessarily the same as the everyday sense of "expectation": the expected value may not equal any individual outcome. It is the average of the variable's output values, and it is not necessarily contained in the variable's set of output values.

The law of large numbers states that, as the number of repetitions approaches infinity, the arithmetic mean of the observed values almost surely converges to the expected value.

5. The concrete approach

Now for the concrete idea.

As the two figures above show, in each position there are four possible moves, and each move causes a new tile to spawn, with a fixed probability for each position where it can appear. The expected value is computed for each case, and the result is a tree structure.

For a more detailed example, see the picture below. Suppose the current board looks like this:

The next tile is a 2. Where will it land? There are 16 cases.

If you move UP, there are four possible positions for the 2 tile. If you move DOWN, there are four possible positions. If you move LEFT, there are four. If you move RIGHT, there are four.

So once you have those 16 cases, you keep recursing. The recursion formula is as follows:
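In standard expectimax notation, writing $M(s)$ for the moves available in state $s$, $P(t)$ for the probability of the random tile event $t$, and $\mathrm{result}(s, m, t)$ for the resulting position:

$$V(s) = \max_{m \in M(s)} \sum_{t} P(t) \cdot V\big(\mathrm{result}(s, m, t)\big)$$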

The formula above simply applies the expected-value computation over and over again.

But the recursion cannot go on forever; it needs a termination condition. The convergence condition I set up here is that the recursion ends when the accumulated probability falls below a certain threshold. You can pick an appropriate threshold depending on how many levels of recursion you want.

Once the recursion terminates, the expected value at a leaf is computed as the product of a weight matrix and the board matrix. The values in the weight matrix also need tuning: weights adjusted improperly lead to many recursion levels, hurting efficiency, while too few recursion levels hurt the accuracy of the expected value. The "tuning" of this weight matrix may be left to unsupervised learning in the machine-learning stage.
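Here is a sketch of that evaluation, fleshing out the evaluate placeholder from the earlier sketch. The weight values are made up for illustration (a monotonic "snake" gradient toward the top-left corner, matching the layouts discussed earlier), not the repo's tuned matrix:

package threes

// weights is an illustrative monotonic weight matrix laid out as a
// snake: values descend from the top-left corner. These numbers are
// placeholders, not tuned values.
var weights = [4][4]float64{
	{15, 14, 13, 12},
	{8, 9, 10, 11},
	{7, 6, 5, 4},
	{0, 1, 2, 3},
}

// evaluate scores a board as the element-wise product of the weight
// matrix and the tile values, summed.
func evaluate(b Board) float64 {
	score := 0.0
	for i := 0; i < 4; i++ {
		for j := 0; j < 4; j++ {
			score += weights[i][j] * float64(b[i][j])
		}
	}
	return score
}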

This weighted product is the expected-value calculation used once the recursion reaches its convergence condition.

After this recursive expected-value calculation, we are back at the initial state. So how do we decide which way to move?

As in the airport example above, compute the expected value of each direction; the direction with the highest expected value is the one to move in.

But in real recursion the following happens:

A large blank area requires deep recursion, which indirectly means a huge amount of computation and a long "thinking" time for the AI. The solution is to limit the recursion depth.

Sample variance is used to evaluate the mean:
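For reference, the standard sample variance of samples $x_1, \dots, x_n$ with mean $\bar{x}$ is:

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2$$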

The larger S is, the more the samples deviate from the mean and the more dispersed they are. This, too, indirectly limits the recursion depth.

Reference:

[1]: Expectimax Search

[2]: What is the optimal algorithm for the game 2048?

[3]: 2048 AI — The Intelligent Bot

Algorithm two: Minimax Search

Von Neumann's minimax theorem of 1928 paved the way for later adversarial tree-search methods, which became the foundation of decision theory in the early days of computer science and artificial intelligence.

See the following repo for details:

Github.com/rianhunter/…

Algorithm three: Monte Carlo Tree Search

The Monte Carlo method, which solves problems by random sampling, was introduced in the 1940s as a way to tackle fuzzily defined problems unsuitable for direct tree search. In 2006, Rémi Coulom combined these two ideas to provide a new approach to move planning in Go, now known as Monte Carlo Tree Search (MCTS). In theory, MCTS can be applied to any domain that can be described in terms of {state, action} and whose outcomes can be predicted by simulation.

Monte Carlo methods solve reinforcement-learning problems by averaging sample returns. AlphaGo uses Monte Carlo tree search to quickly assess the value of a position. We can use the same method to estimate the probability that Threes' current move leads to the highest score.

A Go board has 19 lines and 361 playable intersections, which means roughly 10^171 possible positions (1 followed by 171 zeros). That is more than the total number of atoms in the universe, about 10^80 (1 followed by 80 zeros)!

Traditional AI uses brute-force search (as Deep Blue did), building a tree of all possible moves. With such a huge state space, enumerating every state by brute force is impossible.

Monte Carlo tree search can be roughly divided into four steps: Selection, Expansion, Simulation, and Backpropagation.

At the beginning, the search tree has only one node: the position we need to make a decision about. Each node in the search tree contains three basic pieces of information: the position it represents, the number of visits, and the cumulative score.

Selection: start from the root R and select successive child nodes down to a leaf L. The text below says more about how child nodes are chosen so that the game tree expands toward the most promising moves; this is the essence of Monte Carlo tree search.

Expansion: unless L ends the game with a win for either side, create one (or more) child nodes and choose one of them, C.

Simulation: play a random playout from node C. This step is sometimes called a playout or rollout.

Backpropagation: use the result of the playout to update the information in the nodes on the path from C back to R.
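As a concrete sketch, here is a compact Go skeleton of one search round. The UCT selection rule follows the standard UCB1 formulation from reference [2]; the State interface and its methods are placeholders, not a real game implementation:

package mcts

import (
	"math"
	"math/rand"
)

// Node stores the three basic pieces of information mentioned above.
type Node struct {
	state    State
	visits   float64 // number of visits
	score    float64 // cumulative score (wins)
	children []*Node
	parent   *Node
}

// State is a placeholder for a game position.
type State interface {
	Terminal() bool
	RandomPlayout() float64 // result of a random game: 1 = win, 0 = loss
	Expand() []State        // positions reachable in one move
}

// uct is the standard UCB1 selection value: exploitation plus exploration.
func uct(n *Node, c float64) float64 {
	if n.visits == 0 {
		return math.Inf(1) // always try unvisited children first
	}
	return n.score/n.visits + c*math.Sqrt(math.Log(n.parent.visits)/n.visits)
}

// Step runs one Selection -> Expansion -> Simulation -> Backpropagation pass.
func Step(root *Node) {
	// 1. Selection: descend to a leaf, always picking the best UCT child.
	node := root
	for len(node.children) > 0 {
		best := node.children[0]
		for _, ch := range node.children {
			if uct(ch, math.Sqrt2) > uct(best, math.Sqrt2) {
				best = ch
			}
		}
		node = best
	}
	// 2. Expansion: grow the leaf unless the game is over.
	if !node.state.Terminal() {
		for _, s := range node.state.Expand() {
			node.children = append(node.children, &Node{state: s, parent: node})
		}
		if len(node.children) > 0 {
			node = node.children[rand.Intn(len(node.children))]
		}
	}
	// 3. Simulation: random playout from the chosen node.
	result := node.state.RandomPlayout()
	// 4. Backpropagation: update visit counts and scores up to the root.
	// (In a two-player game, the result would be credited from each node's
	// own perspective, as the next paragraphs describe.)
	for n := node; n != nil; n = n.parent {
		n.visits++
		n.score += result
	}
}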

The figure shows the steps of one decision. Each node shows its wins/playouts from the point of view of the player that node represents. In the Selection diagram, black is about to move; 11/21 is the number of white wins out of the playouts from this position so far, and it complements the total of 10/21 black wins shown by the three black nodes below it, each of which represents a possible black move.

When a white simulation loses, all nodes along the selection path increase their simulation count (the denominator), but only the black nodes among them are credited with a win (the numerator). If white had won instead, all the selected nodes would still increase their simulation count, but only the white nodes would be credited with the win. This guarantees that during selection, each player's most promising moves get expanded, reflecting each player's goal of maximizing the value of their own moves.

Search rounds are repeated for as long as the time allotted to the move allows. Then the move with the most simulations (that is, the largest denominator) is chosen as the final answer.

From this we can see that Monte Carlo tree search is a heuristic search strategy that uses frequency to estimate probability: once enough samples are collected, the frequency approximates the probability, just as flipping a fair coin enough times brings the frequency of heads infinitely close to 0.5.

Reference:

[1]: Browne C B, Powley E, Whitehouse D, et al. A Survey of Monte Carlo Tree Search Methods[J]. IEEE Transactions on Computational Intelligence and AI in Games, 2012, 4(1): 1-43.

[2]: P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time Analysis of the Multiarmed Bandit Problem," Mach. Learn., vol. 47, no. 2, pp. 235-256, 2002.

To-Do

I thought the project would end here, but after Ali released its "Yellow Book" a few days ago, I suddenly had new ideas and decided to continue with a second edition using reinforcement learning. Hopefully the smarter AI will be able to reach the highest 12,288 tile 100% of the time. Once the training is done, I will come back and finish the follow-up article.

GitHub Repo: Halfrost-Field

Follow: halfrost dead simple

Source: github.com/halfrost/th…