In recent years, progress in deep learning and artificial intelligence has centered on vision and natural language, but there is a very basic problem that few people talk about: navigation. Even though AlphaGo can beat Lee Sedol, it is still hard for a machine to find a target in space using only its own senses.

In the early days of AI, we learned that the human cognitive faculties that seem as natural as air, such as vision, are the hardest ones for machines; far harder than making machines do arithmetic or play chess. Exploration and navigation in space is another example (to say nothing of the still harder problems of consciousness and motivation, which we can hardly approach without first understanding perception and movement). The concept of space is so deeply imprinted in our minds that we no longer find it remarkable. But close your eyes and try to walk around the house, say to find the toilet, and you realize how difficult it really is.

You might object that with modern technology, navigation is not an AI problem at all: given GPS, everything reduces to path planning. But take GPS away and the question is no longer simple. Consider how long it took hominids to walk out of Africa and spread across the world.

This is our topic today: in the absence of GPS, how can an artificial intelligence use perception and action to form a cognition of space, and then reach its destination?

One: How do animals navigate in nature?

We talk about navigation in terms of perception and motion. Before you can explore space, you need to perceive environmental signals. Vision comes first, but other senses, such as hearing and magnetosensation, can also provide good signals: some fish use the sun, pigeons use the earth's magnetic field, and bees use polarized light. In every case, the understanding of space starts from perception. Take yourself as an example: to get to the school canteen, you basically turn right when you see the teaching building and turn left when you see the playground. The key is to look.

In addition to seeing, the second essential is knowing your own movement. Animals need to know which direction they are heading and how far they have gone, rather than wandering like a drunk with no idea where they are. Suppose your navigation says the target is 300 meters to the right, then 500 meters to the left: to follow those instructions at all, you must first be able to sense your own motion and know which direction you are moving.

So the second basis for navigation is becoming aware of your own movement and measuring it, which naturally involves memory: roughly, keeping track of how far you have traveled. Without it, you cannot know that you have taken 500 steps east.

Put these two elements together: knowing how far you have moved, and perceiving the changes in the surrounding environment. Bind the steps to the corresponding landmarks, and you get a concept: so many steps to the right of the school playground is the dining hall, so many steps to the left is the teaching building. That is something like a mental map.
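To make the motion-memory idea concrete, here is a minimal path-integration (dead-reckoning) sketch in Python, assuming the agent can sense only its own heading and step length at each moment; position falls out of simply accumulating displacement vectors.

```python
import numpy as np

def path_integrate(headings_rad, step_lengths, start=(0.0, 0.0)):
    """Accumulate self-motion signals into an estimated position (no GPS)."""
    pos = np.array(start, dtype=float)
    trajectory = [pos.copy()]
    for theta, d in zip(headings_rad, step_lengths):
        # One displacement vector per step: length d in direction theta.
        pos += d * np.array([np.cos(theta), np.sin(theta)])
        trajectory.append(pos.copy())
    return np.array(trajectory)

# Example: 500 steps east (heading 0), then 300 steps north (heading pi/2).
headings = [0.0] * 500 + [np.pi / 2] * 300
steps = [1.0] * 800
print(path_integrate(headings, steps)[-1])  # -> approximately [500., 300.]
```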

These elements are the raw ingredients of what machine learning would call spatial exploration, and the map is a high-level feature built on top of them. Just as pixels are the basic elements of image data, and a CNN extracts from them more representative next-level features, such as textures and object parts, the challenge for a neural network here is to extract the basic features of space from the perception and self-motion we just discussed, and to construct from them a kind of virtual map.

Look at what biological neural networks do. Scientists long suspected that such a virtual map should exist in the brain, and they actually found a group of cells, called place cells, in and around the mouse hippocampus. These cells behave almost like a map: wherever the mouse goes, the cell corresponding to that location (and its neighbors) starts firing. Neuroscientists believe this virtual map in the brain is key to the mouse's ability to navigate mazes without getting confused.

Place cells: wherever you go, the nerve cell corresponding to that location fires.

And that’s not all. Slightly upstream, in the entorhinal cortex next door to the hippocampus, neuroscientists found a group of cells called grid cells. What is a grid cell? We usually describe a cell's firing in terms of its receptive field: the average firing rate of the cell for different inputs (a measure of what the cell is sensitive to). Here the input is location, and scientists found that the receptive fields of grid cells take the shape of a hexagonal lattice. Instead of firing at just one spot, as a place cell does, a grid cell fires at periodically arranged positions in space, like an unfolding hexagonal net; its response has a very clear spatial periodicity. (That is, as you walk through a space, the cell starts firing each time you travel a certain distance in certain directions.) Different grid cells have different spatial periods (different lattice spacings), but all are arranged hexagonally. The spatial period is stable: even in the dark these cells respond accurately, and when the animal is moved to a new environment they generate new receptive fields; the firing locations change, but the overall hexagonal structure of the grid is preserved. Taken together, the grid-cell layer forms a family of hexagonal lattices with many different periods.

Grid neurons: groups of neurons with different spatial periods.


What this shows: each grid cell is sensitive only to specific spatial locations, and these locations (red dots) together form a hexagonal lattice.
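The periodic receptive field is easy to model. A standard textbook idealization (not taken from the paper) builds a grid cell's firing map by summing three cosine plane waves whose directions are 60 degrees apart; the sketch below uses made-up spacing and phase parameters.

```python
import numpy as np

def grid_field(xs, ys, spacing=0.5, orientation=0.0, phase=(0.0, 0.0)):
    """Firing-rate map of one idealized grid cell over a 2D arena."""
    k = 4 * np.pi / (np.sqrt(3) * spacing)      # wave number for a given spacing
    rate = np.zeros_like(xs)
    for i in range(3):                          # three plane waves, 60 degrees apart
        theta = orientation + i * np.pi / 3
        kx, ky = k * np.cos(theta), k * np.sin(theta)
        rate += np.cos(kx * (xs - phase[0]) + ky * (ys - phase[1]))
    return np.maximum(rate, 0)                  # rectify: firing rates are non-negative

xs, ys = np.meshgrid(np.linspace(0, 2, 200), np.linspace(0, 2, 200))
rate_map = grid_field(xs, ys)                   # peaks sit on a hexagonal lattice
print(rate_map.shape, rate_map.max())
```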

A scientific hypothesis, still being tested, is that this mysterious grid network is exactly the basis from which the high-level concept encoded by place cells, "position", is built, just as high-level image features such as objects are composed of low-level features such as edges and corners. There is a solid mathematical basis for this: we can apply a Fourier-style decomposition to spatial signals, just as we do to temporal ones. A place cell, sensitive to one fixed position in space, can be decomposed into periodic gratings of different frequencies; and conversely, by adding up grid cells of different frequencies, a signal peaked at a single point can be obtained.
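A toy illustration of that claim, in one dimension for clarity: summing periodic responses of many different periods, all phase-aligned at one target location, produces a response peaked at that location. The periods here are arbitrary illustrative values.

```python
import numpy as np

x = np.linspace(-5, 5, 1001)
target = 0.0                                   # the "place" to be represented
periods = np.linspace(0.3, 3.0, 30)            # many grid scales

# Each "grid cell" is a cosine that peaks (among other places) at the target;
# away from the target the phases disagree and the components cancel.
place_like = sum(np.cos(2 * np.pi * (x - target) / p) for p in periods)

print(x[np.argmax(place_like)])                # -> 0.0: the sum peaks at the target
```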

Scientists have discovered that grid neurons have a correspondence with place neurons.

Why bother? Doesn't a place cell already indicate a location; why add hexagons? The key is that place cells must code not just for a particular location in one environment, but for all possible locations in all possible environments. One and the same cell may have to represent both Wangfujing in Beijing and the Oriental Pearl in Shanghai. For that, we need a powerful and efficient feature representation that can quickly re-encode positions in a new environment. The hexagonal basis formed by grid cells is exactly such a representation: to some extent it captures the basic properties of space itself, abstracted away from any particular real space.


The principle is a bit like quantum mechanics, where a particle at a particular position can be seen as a superposition of wave functions with different periods, and those waves form a basis. Here, in the dark, quantum mechanics and neuroscience shake hands.


The correspondence between grid neurons and place neurons looks like a kind of Fourier transform.


Two: DeepMind's AI

Well, let's talk about the AI. The story of DeepMind's AI is not actually complicated: it added an animal-like population of grid cells to a neural navigation module it had built before, and got much better navigation performance.

If you were to design such an AI, how would you do it?

Roughly like this: to design a virtual brain, a neural network that does the same kind of processing biology does, integrate the two elements of exploration we just discussed, visual perception and motion memory, into a deep neural network, and then train it with reinforcement learning, just as we train animals.

This model combines deep learning with actor-critic, an extremely powerful reinforcement learning framework. Reinforcement learning is learning by reward, much like training a mouse to walk a maze: find the cheese hidden in a corner of the space and you are rewarded, and in doing so you learn spatial navigation. Reinforcement learning involves states (observations), actions, policies, and rewards. The mapping from observation to action is carried out by the artificial neural network, the nerve center of the virtual creature: based on the agent's observations of its surroundings (the objects in its field of view), it produces a decision, a velocity, such as "walk 50 meters north". The network starts out as a random walker and is rewarded only when it reaches the goal within a given time. Through reinforcement learning, it learns how to synthesize the various perceptual and motor signals into correct behavior in space.

This method is called actor-critic and is among the most powerful in deep reinforcement learning. The actor is a neural network that generates the agent's behavior, continuously producing the best action (represented as a probability distribution) for the information observed at each moment. That action is then submitted to another neural network, the critic, which, like a critic, assesses the expected payoff of each action, and adjusts its payoff estimate according to the reward received at each step (TD learning). Combined, the actor keeps boldly proposing candidate optimal actions, while the critic keeps pointing out whether they are good or bad; the dynamic game between the two drives the behavior toward the optimum. The method can be said to unite the two branches of reinforcement learning, policy gradients and value functions, and inherits the advantages of both. Whenever an action has a consequence (reward or punishment), both actor and critic improve: when the agent is successfully rewarded, the actor directly increases the probability of all the intermediate actions that led there, while the critic raises its evaluation of those intermediate steps; both changes are ultimately reflected in updates to the networks' weights.
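To make the actor/critic/TD mechanics concrete, here is a minimal tabular actor-critic on a toy corridor "maze", with reward only at the far end. This is an illustration of the method, not DeepMind's deep, LSTM-based implementation; all sizes and learning rates are arbitrary.

```python
import numpy as np

n_states, n_actions = 10, 2               # actions: 0 = left, 1 = right
theta = np.zeros((n_states, n_actions))   # actor: action preferences
V = np.zeros(n_states)                    # critic: state-value estimates
gamma, lr_actor, lr_critic = 0.99, 0.1, 0.1
rng = np.random.default_rng(0)

def policy(s):
    p = np.exp(theta[s] - theta[s].max())  # softmax over preferences
    return p / p.sum()

for episode in range(500):
    s = 0
    for t in range(100):
        p = policy(s)
        a = rng.choice(n_actions, p=p)
        s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
        done = s_next == n_states - 1
        r = 1.0 if done else 0.0           # cheese only at the last cell

        # Critic: TD error = how much better/worse reality was than expected.
        td_error = r + (0.0 if done else gamma * V[s_next]) - V[s]
        V[s] += lr_critic * td_error

        # Actor: push the log-probability of the taken action along the TD error.
        grad_log_pi = -p
        grad_log_pi[a] += 1.0
        theta[s] += lr_actor * td_error * grad_log_pi

        s = s_next
        if done:
            break

print(policy(0))  # after training, "right" should dominate at the start cell
```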

Actor-critic network: the actor is the decision maker, the critic the evaluator; together they determine the loss function.


In deep reinforcement learning, beyond the training method, neural networks remain the basic tools inherited from deep learning. The first element of spatial navigation we emphasized is vision: seeing the surrounding environment. The image is handled by a CNN (convolutional network), and the image features it extracts, together with the speed of the agent's movement, are fed into the next network. This is exactly the integration of visual and motor information that navigation requires, as described above. Can you guess what this next network is? It is the recurrent neural network (RNN), the powerful tool of natural language processing. An RNN has memory, so it satisfies the second element of navigation we just talked about, motor memory. An enhanced version of the RNN, the LSTM, is used here: it receives the movement speed on one side and the perceptual signals passed on by the CNN on the other.
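A minimal sketch of this vision-plus-motion fusion, in PyTorch; the layer sizes and shapes are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class NavEncoder(nn.Module):
    """CNN encodes each frame; an LSTM integrates features + velocity over time."""
    def __init__(self, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                       # toy visual front end
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),      # -> (batch, 32)
        )
        self.lstm = nn.LSTM(32 + 2, hidden, batch_first=True)

    def forward(self, frames, velocities):
        # frames: (batch, time, 3, H, W); velocities: (batch, time, 2)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        x = torch.cat([feats, velocities], dim=-1)      # fuse vision and motion
        out, _ = self.lstm(x)                           # memory over the trajectory
        return out

enc = NavEncoder()
out = enc(torch.randn(4, 10, 3, 64, 64), torch.randn(4, 10, 2))
print(out.shape)  # torch.Size([4, 10, 128])
```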

So can our “virtual mouse” navigate mazes like a real one? Not yet. Plain reinforcement learning lets the agent explore space, but its behavior is a little clumsy. For example, if a new shortcut to the target opens up in the maze, the agent finds it hard to exploit.

The crucial step that turns this clumsiness into intelligence, and the highlight of DeepMind's new paper, is adding a spatial representation like the hippocampal grid cells; the result is an agent that navigates mazes very much like a real animal. This is truly learning from nature.

But installing the animal's grid cells is not a copy-and-paste job; they too are obtained by training an ordinary neural network until it becomes a grid network. The idea is plausible: if grid networks are something organisms rely on for navigation, then they probably emerged spontaneously from learning to predict one's own position. DeepMind therefore set up a task in which an AI creature repeatedly wanders a maze, not to find a reward, but to predict its own location, using the same kind of network responsible for temporal memory, an LSTM, plus a layer of ordinary feed-forward units. The input to this LSTM is the AI's velocity at each moment, and the feed-forward layer finally outputs its current position. You can see immediately that what is really required here is path integration: adding up the velocities at each moment to obtain the new position. And on this task, the feed-forward layer spontaneously developed receptive fields indistinguishable from nature's grid cells.
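A simplified sketch of this supervised path-integration task, as the text describes it (the actual paper predicts place-cell and head-direction-cell activations and uses dropout; here we predict raw (x, y) position). All sizes are illustrative.

```python
import torch
import torch.nn as nn

class PathIntegrator(nn.Module):
    """LSTM reads velocities; a linear 'forward' layer reads out position."""
    def __init__(self, hidden=128, n_units=256):
        super().__init__()
        self.lstm = nn.LSTM(2, hidden, batch_first=True)
        self.forward_layer = nn.Linear(hidden, n_units)  # where grid-like units emerge
        self.readout = nn.Linear(n_units, 2)             # predicted (x, y)

    def forward(self, velocities):
        h, _ = self.lstm(velocities)
        units = torch.relu(self.forward_layer(h))
        return self.readout(units), units

model = PathIntegrator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    vel = 0.1 * torch.randn(32, 50, 2)       # random-walk velocities
    true_pos = torch.cumsum(vel, dim=1)      # ground truth = integrated path
    pred, _ = model(vel)
    loss = nn.functional.mse_loss(pred, true_pos)
    opt.zero_grad(); loss.backward(); opt.step()
```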

Trained in this supervised way, the neurons that emerge in the artificial network are strikingly similar in form and function to real biological grid cells, with the same kind of receptive fields. From the activity of these cells, the "position" of the AI agent can be decoded.
The whole network framework in detail.

With the grid cells installed, the AI becomes noticeably smarter. A particularly interesting behavioral phenomenon is that an AI equipped with this spatial representation makes very good use of "shortcuts" in space, just as a human will cut across a lawn to save a few steps; this suggests the agent has some kind of "understanding" of space. The shortcut-taking ability is a bit like a Turing test of navigation.

The AI's strong ability to take shortcuts: when both a short and a long path lead from start to goal, it automatically selects the short one most of the time.

Here we can draw the flow chart of the whole task. Vision is handled by the CNN; the grid-cell LSTM receives the velocity and produces the neural code containing location information; and the other LSTM mentioned earlier is the actor, the decision center that determines the next velocity. Together they make up the whole pipeline.

The "grid cells" trained by supervised learning are inserted seamlessly into the full network; their output, together with the visual information processed by the CNN, is taken up by the decision LSTM to determine the direction of movement at the next moment.
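Putting the pieces together, here is a minimal end-to-end sketch of the flow just described, again with assumed shapes and sizes; the grid_lstm module below stands in for the pretrained grid-cell network.

```python
import torch
import torch.nn as nn

class NavAgent(nn.Module):
    """CNN vision features + grid-cell location code -> policy LSTM -> velocity."""
    def __init__(self, vis_dim=32, grid_dim=256, hidden=128):
        super().__init__()
        self.vision = nn.Sequential(
            nn.Conv2d(3, vis_dim, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Stand-in for the supervised, pretrained grid-cell module.
        self.grid_lstm = nn.LSTM(2, grid_dim, batch_first=True)
        self.policy_lstm = nn.LSTM(vis_dim + grid_dim, hidden, batch_first=True)
        self.velocity_head = nn.Linear(hidden, 2)        # actor output

    def forward(self, frames, velocities):
        b, t = frames.shape[:2]
        vis = self.vision(frames.flatten(0, 1)).view(b, t, -1)
        grid_code, _ = self.grid_lstm(velocities)        # location code
        h, _ = self.policy_lstm(torch.cat([vis, grid_code], dim=-1))
        return self.velocity_head(h)                     # next velocity

agent = NavAgent()
v = agent(torch.randn(2, 5, 3, 64, 64), torch.randn(2, 5, 2))
print(v.shape)  # torch.Size([2, 5, 2])
```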


The beauty of this paper lies in the combination: supervised learning is used to obtain key features resembling those found in real biology, and deep reinforcement learning is then used to obtain the correct behavior; a fine marriage of cognition and decision making.

The whole process can be represented as one end-to-end computational graph.

Setting aside all the mystique, what has been engineered here is a feature representation of spatial motion: a neural code of location information, packaged in a form the decision network can use. As a result, whatever environment the agent is dropped into, it can apply the skills it has learned: localizing itself in unfamiliar space, adapting to changes in the environment, exploiting shortcuts, and so on.

The real-world applications of such a technology are obvious, for example in all kinds of robot motion control. And it shows just how much inspiration artificial intelligence can draw from the seemingly useless work of those toiling PhD neurobiologists.