Artificial intelligence (AI)

In recent years, artificial intelligence has enjoyed a resurgence of media attention. In the eyes of the general public, AI seems almost omnipotent: self-driving cars, intelligent robots, face recognition, speech translation, playing Go, and so on all appear to be no problem for it.

As the saying goes, laymen watch the excitement while insiders look at the craft. In fact, current artificial intelligence technology mostly relies on machine learning algorithms and belongs to weak AI. Although it enables many impressive applications, it also has significant limitations, and only with a clear understanding of how it works can we avoid being carried away by exaggerated media reports. In this article we take a deeper look at how an AI learns to play a game on its own.

Machine learning

Generally speaking, machine learning is a branch of artificial intelligence and one way to achieve it. Machine learning studies algorithms that allow machines to learn automatically: they extract rules from data samples and then use the learned rules to make predictions on unseen data.

Machine learning can be broadly divided into four categories:

  • Supervised learning: learning from labeled samples.
  • Unsupervised learning: learning from unlabeled samples.
  • Semi-supervised learning: only a small portion of the samples are labeled and the rest are unlabeled; both kinds are used for learning.
  • Reinforcement learning: a very different way of learning. It has no conventional training samples and labels and instead relies on rewards and penalties to drive learning (a small sketch of these differences follows this list).
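To make the distinction concrete, here is a minimal sketch of what the data looks like in each setting. The numbers and labels are invented purely for illustration.

```python
# Supervised learning: every sample comes with a label.
labeled_samples = [([1.0, 2.0], "cat"), ([3.0, 0.5], "dog")]

# Unsupervised learning: only the samples themselves, no labels.
unlabeled_samples = [[1.0, 2.0], [3.0, 0.5], [0.2, 0.1]]

# Semi-supervised learning: a few labeled samples plus many unlabeled ones.
few_labeled = [([1.0, 2.0], "cat")]
many_unlabeled = [[3.0, 0.5], [0.2, 0.1], [2.2, 1.8]]

# Reinforcement learning: no fixed dataset; the agent generates
# (state, action, reward, next_state) experience by interacting with an environment.
experience = [("A", "down", -1, "B")]
```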

A little game

Consider a small game, shown in the picture below. After entering at the entrance, the initial position is A, and the shaded cells are pits: step into one and you die and have to start over. How would you play this game if you did not know where the pits were? You could no doubt find many ways to finish it. But what about a machine? How can a machine learn to play the game on its own? The answer is reinforcement learning!
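Since the figure cannot be shown here, a minimal stand-in for such a grid can be written in code. The layout below (grid size, pit positions, exit position) and the reward values are assumptions made purely for illustration; only the idea matters: ordinary cells, pits with a large negative reward, and an exit with a positive reward.

```python
# A hypothetical 3x4 grid world standing in for the little game.
# 'S' = start (position A), 'P' = pit, 'E' = exit, '.' = ordinary cell.
# The exact layout is an assumption for illustration, not the article's figure.
GRID = [
    ['S', '.', '.', '.'],
    ['.', 'P', '.', 'P'],
    ['.', '.', '.', 'E'],
]

PIT_REWARD = -100   # stepping into a pit ends the episode with a big penalty
EXIT_REWARD = 100   # reaching the exit ends the episode with a big reward
STEP_REWARD = -1    # every ordinary step costs a little, encouraging short paths

def cell_reward(row, col):
    """Immediate reward for entering cell (row, col)."""
    cell = GRID[row][col]
    if cell == 'P':
        return PIT_REWARD
    if cell == 'E':
        return EXIT_REWARD
    return STEP_REWARD
```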

Reinforcement learning

The TV show The Brain once had a challenge called the Honeycomb Maze, in which the challenger found the way out through repeated trial and error. Reinforcement learning is similar, and it revolves around three main concepts: state, action, and reward. Taking a maze as an example, the position of the agent is a state; taking a step in some direction from a position, such as left, right, up, or down, is an action; and every step produces a reward, for example hitting a wall yields a negative reward while a good move brings a positive one. The agent cares not only about immediate rewards but also about long-term ones, and it learns a sequence of actions with high long-term reward through trial and error.

The agent learns a behavioral strategy from the environment, that is, how to take a series of actions in the environment so as to maximize the reward signal and obtain the maximum cumulative return. Reinforcement learning evaluates the quality of an action through the signals provided by the environment, so the agent must learn from its own experience. Once the model is learned, the agent knows which action to take in which state; this learned mapping from environment states to actions is called the policy.

In the figure below, an agent interacts with the environment and changes it according to some policy. The agent observes a state from the environment, performs an action, receives an immediate reward, and then moves to the next state. The whole process can be summed up as: based on the currently observed state, find the action that maximizes the long-term return.
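This loop can be sketched in a few lines of Python. Everything here is an assumption for illustration: DummyEnv, its reset/step interface, and the random policy are placeholders for whatever environment and policy are actually used.

```python
import random

class DummyEnv:
    """A stand-in environment (pure assumption): 5 states in a row, start at 0;
    reaching state 4 gives +100 and ends the episode, every other step costs -1."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += 1 if action == "right" else -1
        self.state = max(self.state, 0)          # cannot walk past the left wall
        done = self.state == 4
        reward = 100 if done else -1
        return self.state, reward, done

def random_policy(state):
    """A trivial policy that ignores the state and acts at random."""
    return random.choice(["left", "right"])

def run_episode(env, policy, max_steps=100):
    """One pass of the loop: observe a state, act, receive a reward, move on."""
    state = env.reset()
    total_return = 0
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        total_return += reward
        state = next_state
        if done:                                  # reached a terminal state
            break
    return total_return

print(run_episode(DummyEnv(), random_policy))
```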

Markov decision process

Before diving into reinforcement learning, we need to understand what kind of problem it is trying to solve. In fact, the reinforcement learning process is the optimization of a Markov decision process (MDP), a mathematical model for decision making in which outcomes are partly random and partly under the control of the agent.

The agent can perform certain actions, such as moving up, down, left, or right, and each action may bring a reward, either positive or negative, that changes the total score. The action also changes the environment and produces a new state, in which the agent can perform another action. A Markov decision process is therefore made up of a set of states, a set of actions, rewards, and transition rules.

Decision-related elements

The whole Markov decision process involves the following five important elements (a small code sketch of this tuple follows the list).

  • The set of states: all possible states.
  • The set of actions: all actions, which can move the process from one state to another.
  • The state transition probability: the probability distribution over next states when a given action is performed in a given state.
  • The reward function: the reward obtained when moving from one state to another.
  • The discount factor: determines how much weight future rewards carry.
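Putting the five elements together, an MDP for a tiny grid game could be written down roughly as follows. The concrete numbers (grid size, transitions, rewards, discount factor) are assumptions chosen only to illustrate the structure of the tuple.

```python
# A minimal, hypothetical MDP for a 3x4 grid: the five elements as plain data.

states = [(r, c) for r in range(3) for c in range(4)]        # S: all grid cells
actions = ["up", "down", "left", "right"]                    # A: possible moves
gamma = 0.9                                                  # discount factor

def next_state(state, action):
    """Deterministic transition: move one cell, staying on the grid."""
    r, c = state
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    dr, dc = moves[action]
    return (max(0, min(2, r + dr)), max(0, min(3, c + dc)))

def transition_prob(state, action, new_state):
    """P(s' | s, a): here 1 for the deterministic successor, 0 otherwise."""
    return 1.0 if new_state == next_state(state, action) else 0.0

pits, exit_cell = {(1, 1), (1, 3)}, (2, 3)                   # assumed layout

def reward(state, action, new_state):
    """R(s, a, s'): big penalty for pits, big reward for the exit, small step cost."""
    if new_state in pits:
        return -100
    if new_state == exit_cell:
        return 100
    return -1

print(transition_prob((0, 0), "right", (0, 1)))   # 1.0
```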

Reinforcement learning is all about estimating the value of different actions in different states, not in a single step but through a lot of trial and error. The next state depends only on the current state and the current action, not on earlier states; this memorylessness is the Markov property.
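As a small worked example of the discount factor, the discounted return of a reward sequence can be computed like this (the reward sequence below is made up purely for illustration):

```python
# Discounted return: G = r0 + gamma*r1 + gamma^2*r2 + ...
def discounted_return(rewards, gamma=0.9):
    g = 0.0
    for t, r in enumerate(rewards):
        g += (gamma ** t) * r
    return g

# e.g. three small step costs followed by reaching the exit:
print(discounted_return([-1, -1, -1, 100]))   # -1 - 0.9 - 0.81 + 0.729*100 ≈ 70.19
```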

Reinforcement learning is an iterative interaction between the agent and the environment, in which:

  • In a certain state, the decision maker chooses one of the actions available in that state;
  • The process then moves, possibly at random, to a new state and gives the decision maker the corresponding reward in response;
  • The chosen action influences, through the state transition function, which new state is reached.

Characteristics of reinforcement learning

  • It’s trial-and-error learning, because it doesn’t have direct guidance like supervised learning, so it’s constantly interacting with the environment and trying to get the best strategy.
  • Its reward is delayed because it often gives guidance only in the last state, which makes it more difficult to allocate the reward to the previous state after receiving a positive or negative reward.

Learning model

Through trial and error, reinforcement learning eventually produces a model that knows how good each action is in each state, and with that model the little game can be completed.

The general idea of the learning process is as follows. Enter at the entrance: from A there are two directions, and at the start there is no return information, so a direction is chosen at random. Suppose it goes down: it keeps going down, and when it reaches B there is no way forward, so it receives a return of -100 and the episode ends. The next time the agent reaches the square above B, it updates that square's return value. The updates keep propagating back until, at A, the return for going down is -50, so the agent goes right instead. After reaching D the choice is again random at first, and then the return value of that square is updated as well. Because the agent wants to take a long view, it looks at the expected return several steps into the future, and the further away a reward is, the more heavily it is discounted. The discounted return for going right is larger, so the agent goes right and reaches F. After iteration, the return values of the squares on the way to G turn out to be smaller, while the return values toward H are larger, and so the agent continues until it reaches the exit.
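A common way to implement this kind of per-square value updating is tabular Q-learning. The article does not name a specific algorithm, so the sketch below is one plausible reading, run on the hypothetical grid used earlier; the layout, learning rate, exploration rate, and episode count are all assumptions.

```python
import random

# Hypothetical 3x4 grid from earlier: pits at (1,1) and (1,3), exit at (2,3).
ROWS, COLS = 3, 4
PITS, EXIT = {(1, 1), (1, 3)}, (2, 3)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ALPHA, GAMMA, EPSILON, EPISODES = 0.1, 0.9, 0.2, 2000

# Q[state][action] estimates the long-term return of taking that action there.
Q = {(r, c): {a: 0.0 for a in ACTIONS} for r in range(ROWS) for c in range(COLS)}

def step(state, action):
    """Move one cell (clipped to the grid) and return (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    nr = max(0, min(ROWS - 1, state[0] + dr))
    nc = max(0, min(COLS - 1, state[1] + dc))
    nxt = (nr, nc)
    if nxt in PITS:
        return nxt, -100, True
    if nxt == EXIT:
        return nxt, 100, True
    return nxt, -1, False

for _ in range(EPISODES):
    state = (0, 0)                                   # start at A
    done = False
    while not done:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        if random.random() < EPSILON:
            action = random.choice(list(ACTIONS))
        else:
            action = max(Q[state], key=Q[state].get)
        nxt, reward, done = step(state, action)
        # Q-learning update: move Q(s,a) toward reward + gamma * max_a' Q(s',a')
        target = reward + (0 if done else GAMMA * max(Q[nxt].values()))
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = nxt

# After training, the greedy policy follows the highest Q-value in each state.
policy = {s: max(acts, key=acts.get) for s, acts in Q.items()}
print(policy[(0, 0)])   # the learned first move from the start cell
```

After enough episodes the greedy policy steers around the pits and toward the exit, which mirrors the backward propagation of return values described above.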

Conclusion

So the mechanism behind an AI playing a game is reinforcement learning: through trial and error the machine learns a policy with a high long-term payoff, and with that policy it completes the game. Many other fields and applications also rely on reinforcement learning, such as aircraft flight control, robot locomotion, game playing, and financial investment.

GitHub

Github.com/sea-boat/Ma…
