Recently I have been obsessed with retro games, and they remind me of playing Snake as a child. It was really fun. Now, in 2019, everything seems different: we can use AI to play Go and to drive cars autonomously, so is it possible for an AI to learn to play Snake by itself? The answer is yes!

Let’s look at our results first:

Approach

For an AI to learn to play Snake, we first need to define the environment. Some people will reach for an interface built with PyGame, or even GTK, but that is too complicated: the AI isn’t even built yet, and the interface alone would take a lot of time. We can solve this problem directly with OpenCV! The code looks roughly like this:
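
(The original snippet isn’t reproduced here; what follows is a minimal sketch of the idea, drawing the board as a NumPy array and displaying it with OpenCV. The board size, cell size, colors, and the render helper are my own illustrative assumptions, not the author’s code.)

import cv2
import numpy as np

CELL = 20          # pixel size of one grid cell (assumed)
BOARD = 10         # board is BOARD x BOARD cells (assumed)

def render(snake, food):
    """snake: list of (row, col) cells, head first; food: (row, col)."""
    img = np.zeros((BOARD * CELL, BOARD * CELL, 3), dtype=np.uint8)
    for r, c in snake:                      # snake body in green
        cv2.rectangle(img, (c * CELL, r * CELL),
                      ((c + 1) * CELL - 1, (r + 1) * CELL - 1), (0, 255, 0), -1)
    fr, fc = food                           # food in red (OpenCV uses BGR)
    cv2.rectangle(img, (fc * CELL, fr * CELL),
                  ((fc + 1) * CELL - 1, (fr + 1) * CELL - 1), (0, 0, 255), -1)
    cv2.imshow('snake', img)
    cv2.waitKey(30)                         # short delay so the window refreshes

# Example frame: a 3-cell snake and one piece of food.
render([(5, 5), (5, 4), (5, 3)], (2, 7))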

How to make the AI learn

So how do we let the AI know the rules of Snake? In essence, it only needs to learn three things (a toy reward function encoding them is sketched after the list):

  • You can’t eat yourself;
  • You can’t touch the wall;
  • You get a reward for eating the red food.
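
As a rough illustration, these rules can be encoded in a tiny reward function like the one below; the specific reward values are my own guesses, not the ones used in the original training:

def reward_for(event):
    """Map a game event to a scalar reward (illustrative values only)."""
    if event == 'ate_food':      # reached the red food
        return 1.0
    if event == 'hit_wall':      # touched the wall -> episode ends
        return -1.0
    if event == 'ate_self':      # bit its own body -> episode ends
        return -1.0
    return -0.01                 # small step penalty to discourage aimless wandering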

After some attempts, I found that if a traditional dynamic-programming approach is used to learn and predict the snake’s next direction from raw pictures, the Agent actually struggles to learn anything. Instead, our strategy is as follows:

  1. First, we let the AI learn in a small environment;
  2. Then we gradually expand the board;
  3. Finally, we evaluate the result on a fixed-size board.

This idea is a bit like ProgressiveGAN: let the Agent learn step by step, starting in a small environment, mastering it, and then increasing the difficulty. Our experiments confirmed that this direction works.
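
A minimal way to schedule such a curriculum might look like the snippet below; the board sizes and episode counts are illustrative assumptions, and train_on_board is a hypothetical stand-in for the episode loop sketched later in this post:

def train_on_board(board_size, episodes):
    """Stand-in for the episode / replay loop sketched further below."""
    ...

# Hypothetical curriculum: train on progressively larger boards.
curriculum = [(6, 2000), (8, 3000), (10, 5000)]   # (board size, episodes per stage) -- assumed values

for board_size, episodes in curriculum:
    train_on_board(board_size=board_size, episodes=episodes)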

As can be seen from the figure (one stage of training), the maximum length of our snake reaches 9, which is actually quite good; the next number shows it has bitten itself to death 1,841 times, yet it is still going strong.

The overall algorithm is also very straightforward. In simple terms, the steps are as follows:

  • First, we define a total number of “reincarnations” (episodes): the snake is allowed to die up to 50,000 times, with 8 lives granted each time, and how it learns within each life is entirely up to the snake;
  • In each reincarnation, we record every move it attempts and store it in our Memory (replay buffer), so that our snake can eventually pull off moves as slick as Cai Xukun’s;
  • After each death, we use these memories to train our QNetwork;
  • In this way, the QNetwork grows stronger and stronger: because it remembers both the successful and the failed attempts, its next instructions become better and each reincarnation lasts longer. A rough sketch of this loop is given right after the list.
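
Here is a minimal sketch of that episode / replay loop. It assumes a hypothetical env object with reset() and step() methods, 4 actions, a 12-dimensional state vector, and illustrative hyper-parameters; it uses the QNetwork class built in the next section and is not the author’s exact training code:

import random
from collections import deque

import numpy as np

# Minimal sketch only. `env`, the 12-feature state, the 4 actions, and all
# hyper-parameters below are assumptions; QNetwork is the class defined in the next section.
MAX_EPISODES = 50000
memory = deque(maxlen=100000)    # replay memory of (state, action, reward, next_state, done)
qnet = QNetwork(input_shape=12, hidden_units=(64, 64, 64), output_size=4)
epsilon, gamma = 1.0, 0.95

for episode in range(MAX_EPISODES):
    state, done = env.reset(), False
    while not done:
        # epsilon-greedy: act randomly early on, rely on the QNetwork more as epsilon decays
        if random.random() < epsilon:
            action = random.randrange(4)
        else:
            action = int(np.argmax(qnet.predict(state[None, :])))
        next_state, reward, done = env.step(action)
        memory.append((state, action, reward, next_state, done))
        state = next_state

    # after each "reincarnation" ends, replay stored memories to train the QNetwork
    batch = random.sample(memory, min(len(memory), 64))
    states = np.array([b[0] for b in batch])
    next_states = np.array([b[3] for b in batch])
    targets = qnet.predict(states, batch_size=len(batch))
    next_q = qnet.predict(next_states, batch_size=len(batch))
    for i, (_, action, reward, _, terminal) in enumerate(batch):
        targets[i][action] = reward if terminal else reward + gamma * np.max(next_q[i])
    qnet.train(states, targets, batch_size=len(batch))
    epsilon = max(0.05, epsilon * 0.995)   # gradually reduce exploration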

Building the QNetwork

This kind of problem is essentially a decision-making process driven by the environment, which can be learned through reinforcement learning. However, to predict the next action we still need a DNN to fit the data and learn how to map states to actions, which is the core of the method. Our QNetwork is built with TensorFlow 2.0 using the Keras API, very much in line with current trends. The core QNetwork code is as follows:

import tensorflow as tf


class QNetwork:
    """A simple fully connected network mapping a state to one Q-value per action."""

    def __init__(self, input_shape, hidden_units, output_size, learning_rate=0.01):
        self.input_shape = input_shape
        hidden_units_1, hidden_units_2, hidden_units_3 = hidden_units
        # Three hidden ReLU layers followed by a linear output layer (one Q-value per action).
        self.model = tf.keras.Sequential([
            tf.keras.layers.Dense(units=hidden_units_1, input_dim=input_shape, activation=tf.nn.relu),
            tf.keras.layers.Dense(units=hidden_units_2, activation=tf.nn.relu),
            tf.keras.layers.Dense(units=hidden_units_3, activation=tf.nn.relu),
            tf.keras.layers.Dense(units=output_size, activation=tf.keras.activations.linear)
        ])

        self.model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate), loss='mse', metrics=['accuracy'])

    def predict(self, state, batch_size=1):
        # Return the predicted Q-values for the given state(s).
        return self.model.predict(state, batch_size)

    def train(self, states, action_values, batch_size):
        # Fit the network on one batch of (state, target Q-values) pairs.
        self.model.fit(states, action_values, batch_size=batch_size, verbose=0, epochs=1)
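
For context, here is how the class above might be instantiated and queried; the 12-dimensional state, the 4 actions, and the layer sizes are illustrative assumptions rather than choices taken from the original project:

import numpy as np

# Illustrative usage only; 12 state features and 4 actions (up/down/left/right) are assumed.
qnet = QNetwork(input_shape=12, hidden_units=(64, 64, 64), output_size=4, learning_rate=0.001)

state = np.random.rand(1, 12).astype(np.float32)   # dummy state, batch of 1
q_values = qnet.predict(state)                     # one Q-value per action
best_action = int(np.argmax(q_values))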

For this model, we could actually adopt a deeper architecture. We will continue to explore how different models affect performance, so as to make our Snake AI more intelligent.

Reinforcement learning training

We can look at the log of the entire training process:

It can be seen that after about 5,000 episodes the score gradually increases, indicating that the network has become better at guiding the snake’s next move. From the training GIF we can also see that the maximum length of our snake reaches 13. Just imagine: as the board grows and the model gets stronger, will the snake become so long that it goes beyond what any human Snake player can manage?? We’ll see!!

Training is not complete yet, but as you can see, this move is already pretty slick!!

Show Time

Finally, it’s time for our artificial intelligence to shine!! Let’s turn it over to Snake AI!!!! Snake, go!

This run is still pretty slick! After a night of training, as many as 30,000 episodes, our snake could finally keep itself alive at lengths up to around 28, which I think is already better than quite a few human players. Myself included.

To summarize the remaining problems and blind spots in our reinforcement learning approach:

  • Although the reinforcement learning model can handle 99% of situations, there will always be the 1% it has never seen; in those cases the AI most likely does not know how to decide and will just take a few blind random steps. For Snake that is acceptable, but if such a model were making autonomous-driving decisions, the result would be fatal;
  • Although we can keep making the model arbitrarily more complex, after two weeks of experiments we found that the more complex the model, the stronger the AI;
  • What is more elusive is that the AI does not simply get stronger the longer it trains; we found that it peaks at a highest average score of 188, which corresponds to a maximum length of about 40. Interested readers can modify the code and see just how strong an AI they can train, and whether it can be made rock stable;
  • The snake cannot generalize across board boundaries; that is, if the model is run on a board larger than the one it was trained on, it cannot predict where the border is. This is largely why we start training on a small board and gradually expand its size, but even so, the snake cannot reliably find food and avoid the walls on a board of arbitrary size.

Code

Finally, all the code in this tutorial consists of four main things:

  • The game environment; it is not from OpenAI, we built it ourselves with OpenCV;
  • The TensorFlow 2.0 model-building and training code;
  • The reinforcement learning training code;
  • The code that launches the AI to play Snake.

All code is available on the MANA AI platform, a dedicated platform for sharing high quality AI code maintained by a professional team:

manaai.cn/aicodes_det…