DeepMind had not published further research on StarCraft AI since it released its SC2LE research environment with Blizzard in July last year. Now the company has come up with a “relational deep reinforcement learning” approach, which it tested in StarCraft II.

In the StarCraft II Learning Environment, DeepMind’s agents achieved state-of-the-art performance on six mini-games and outperformed grandmaster-level human players on four. This new type of reinforcement learning can improve the efficiency, generalization ability, and interpretability of conventional methods through structured perception and relational reasoning.

Recent advances in deep reinforcement learning (RL) [1, 2, 3] are partly driven by the ability to learn good internal representations that inform an agent’s policy. Unfortunately, deep RL models still have major shortcomings, such as low sample efficiency and a frequent failure to generalize to seemingly small changes in the task [4, 5, 6, 7]. These shortcomings suggest that powerful deep RL models tend to overfit the large amounts of data on which they are trained, and therefore fail to capture the abstract, interpretable, and generalizable structure of the problems they are trying to solve.

Here, we improve the deep RL architecture by drawing on relational RL (RRL, [8, 9]), a body of insights from the RL literature of more than 20 years ago. RRL advocates the use of relational state (and action) spaces and policy representations to combine the generalization power of relational learning (or inductive logic programming) with reinforcement learning. We propose an approach that combines these advantages with the learning capabilities offered by deep learning, advocating the learning and reuse of entity- and relation-centric functions [10, 11, 12] to implicitly reason over relational representations [13].

Our contributions are as follows: (1) we create and analyze an RL task called “Box-World” that explicitly targets relational reasoning, and show that an agent using attention-based non-local computation to produce relational representations [14] exhibits interesting generalization behavior compared with an agent that lacks this ability; (2) we apply this agent to a difficult problem, the StarCraft II mini-games [15], and achieve state-of-the-art results on six of them.

Figure 1: The Box-World and StarCraft II tasks require reasoning about entities and their relations.

Relational reinforcement learning

The core idea behind RRL is to combine reinforcement learning with relational learning or inductive logic programming [16] by using a first-order (or relational) language [8, 9, 17, 18] to represent states, actions, and policies. Moving from propositional to relational representations facilitates generalization over goals, states, and actions, and exploits knowledge gained in the early stages of learning. In addition, a relational language facilitates the use of background knowledge, which can be provided through logical facts and rules relevant to the learning problem.

In a blocks world, for example, background knowledge can be specified with a predicate such as above(S, A, B), indicating that block A is above block B in state S. This predicate can then be reused when learning about other blocks, such as C and D. The representation language, background knowledge, and assumptions form an inductive bias that guides and constrains the agent’s search for good policies. The language (or declarative) bias determines how concepts are represented. A minimal sketch of this idea follows.
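To make the relational-language idea concrete, the sketch below encodes facts like above(S, A, B) as plain Python tuples and defines one rule that generalizes over whatever blocks appear. The names (facts, is_clear, the block labels) are illustrative placeholders, not part of any of the RRL systems cited above.

```python
# Minimal sketch (hypothetical, not the cited RRL systems): relational facts
# such as above(S, A, B) stored as Python tuples, plus one rule that
# generalizes over whichever blocks appear in the facts.

# Each fact is (predicate, state, *arguments); block names are placeholders.
facts = {
    ("above", "s0", "A", "B"),
    ("above", "s0", "C", "D"),
}

def is_clear(state, block, facts):
    """A block is 'clear' in a state if no other block is above it."""
    return not any(
        pred == "above" and s == state and below == block
        for (pred, s, above, below) in facts
    )

print(is_clear("s0", "A", facts))  # True: nothing is above A
print(is_clear("s0", "B", facts))  # False: A is above B
```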

Neural networks have traditionally been associated with attribute-value, propositional approaches to reinforcement learning [19]. Here, the researchers translate the core ideas of RRL into architecturally specified inductive biases within a deep RL agent. They use neural network models that operate on structured representations of a scene (sets of entities) and perform relational reasoning iteratively. Entities correspond to local regions of the image, and the agent learns to attend to key objects and compute their pairwise and higher-order interactions.
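As a rough illustration of how local image regions become entities, the following sketch (assumed shapes and names, not DeepMind’s implementation) flattens a convolutional feature map into a set of entity vectors and tags each with its normalized spatial coordinates, so later attention steps can reason about where entities are.

```python
import numpy as np

# Sketch of the "entities from local image regions" idea described above;
# shapes are illustrative assumptions, not the paper's exact configuration.
H, W, C = 8, 8, 32
feature_map = np.random.randn(H, W, C)             # stand-in for CNN output

ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
coords = np.stack([xs, ys], axis=-1) / max(H, W)   # normalized (x, y) tags

# Every spatial position becomes one entity vector: features + coordinates.
entities = np.concatenate([feature_map, coords], axis=-1).reshape(H * W, C + 2)
print(entities.shape)  # (64, 34): N entities, each a feature vector plus position
```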

Architecture

Figure 2: Box-World agent architecture and multi-head dot-product attention (MHDPA). E is a matrix that compiles the entities produced by the visual front end; f_θ is a multilayer perceptron applied in parallel to each row of the MHDPA output A, producing the updated entities Ẽ.
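A minimal NumPy sketch of the computation the caption describes, under assumed dimensions: one multi-head dot-product attention step over the entity matrix E, followed by a shared MLP applied to each row of the attention output A to produce updated entities. The weights, helper names, and sizes here are hypothetical; this is not the authors’ code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relational_block(E, Wq, Wk, Wv, mlp, n_heads):
    """One MHDPA step over entities E (N x d), then a shared MLP applied to
    each row of the attention output A, giving updated entities E_tilde.
    Weight matrices and the MLP are assumed placeholders, not the paper's."""
    N, d = E.shape
    dh = d // n_heads
    q, k, v = E @ Wq, E @ Wk, E @ Wv                         # each N x d
    heads = []
    for h in range(n_heads):
        sl = slice(h * dh, (h + 1) * dh)
        attn = softmax(q[:, sl] @ k[:, sl].T / np.sqrt(dh))  # N x N weights
        heads.append(attn @ v[:, sl])                        # N x dh interactions
    A = np.concatenate(heads, axis=-1)                       # N x d attention output
    return np.stack([mlp(a) for a in A])                     # f_theta applied row-wise

# Toy usage with random parameters.
N, d, n_heads = 16, 32, 4
rng = np.random.default_rng(0)
E = rng.standard_normal((N, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
W1, W2 = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1
mlp = lambda x: np.maximum(x @ W1, 0) @ W2                   # tiny 2-layer MLP
E_tilde = relational_block(E, Wq, Wk, Wv, mlp, n_heads)
print(E_tilde.shape)  # (16, 32)
```

Stacking several such blocks lets entities iteratively accumulate information about their pairwise and higher-order interactions.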

Experiments and results

Box-World

Box-World is a perceptually simple but combinatorially complex environment that requires abstract relational reasoning and planning. It consists of a 12 × 12 pixel room with keys and boxes scattered randomly. The room also contains an agent, represented by a single dark gray pixel, which can move in four directions: up, down, left, and right (Figure 1).
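The observation and action interface this paragraph describes can be sketched roughly as follows; the class name, colors, and boundary handling are placeholders, and the key/box mechanics and rewards of the real environment are omitted.

```python
import numpy as np

# Rough sketch of the interface described above: a 12 x 12 pixel room and an
# agent pixel that moves up/down/left/right. Not the published Box-World
# specification; keys, boxes, and rewards are intentionally left out.
SIZE = 12
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

class GridRoom:
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.obs = np.zeros((SIZE, SIZE, 3), dtype=np.uint8)  # RGB pixels
        self.agent = tuple(rng.integers(0, SIZE, size=2))     # agent position

    def step(self, action):
        dr, dc = ACTIONS[action]
        r = min(max(self.agent[0] + dr, 0), SIZE - 1)          # clamp to the room
        c = min(max(self.agent[1] + dc, 0), SIZE - 1)
        self.agent = (r, c)
        frame = self.obs.copy()
        frame[r, c] = (105, 105, 105)   # dark gray pixel marks the agent
        return frame                    # the pixel observation the agent sees

room = GridRoom()
print(room.step("right").shape)  # (12, 12, 3)
```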

Figure 3: Box-World: example observations (left), underlying graph structures that determine the correct path to the goal along with distractor branches (center), and training curves (right).

Figure 4: Visualization of attention weights. (a) The underlying graph of one example level; (b) the analysis results for that level, using each entity along the solution path (1-5) as the source of attention. Arrows point to the entities the source is attending to, and the transparency of each arrow is determined by the corresponding attention weight.

Figure 5: Generalization in Box-World. Zero-shot transfer to levels that require: (a) opening longer sequences of boxes; (b) using key-lock combinations not seen during training.

StarCraft II mini-games

StarCraft II, a popular video game, poses a hard challenge for reinforcement learning. It is a multi-agent game in which each player controls a large number (hundreds) of units that need to interact and collaborate (see Figure 1).

Table 1: Mean scores on the StarCraft II mini-games using the full action set. “↑” indicates a score higher than that of a human grandmaster. Mini-games: (1) Move To Beacon; (2) Collect Mineral Shards; (3) Find And Defeat Zerglings; (4) Defeat Roaches; (5) Defeat Zerglings And Banelings; (6) Collect Minerals And Gas; (7) Build Marines.

Relational Deep Reinforcement Learning

Paper link: https://arxiv.org/abs/1806.01830

Abstract: In this paper, we introduce a deep reinforcement learning approach that can improve the efficiency, generalization ability, and interpretability of conventional methods through structured perception and relational reasoning. The approach uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that, in a new navigation and planning task called Box-World, the agent finds interpretable solutions and improves on baselines in terms of sample complexity and the ability to generalize to scenes more complex than those seen during training. In the StarCraft II Learning Environment, the agent achieves state-of-the-art results on six mini-games, outperforming grandmaster-level human players on four. By considering architectural inductive biases, our work opens a new direction for solving important and difficult problems in deep reinforcement learning.