During the New Year’s Day holiday, there is nothing like relaxing in the lazy winter sun, picking up your phone, calling up friends you have not teamed up with in a long time, and fighting a few fierce battles in Honor of Kings, as if you were back in your younger days.

It feels great. After all, DD was once a 50-star King, haha.

But even a King like me was badly beaten last year by Juewu, the super AI that Tencent put online. I remember a friend telling me at the time that Tencent had launched a Juewu challenge mode, and that its AI was worlds apart from the bots we used to practice against. Each level was harder than the one before, and it was said that even some professional teams had lost to it.

It all sounded far-fetched, and being stubborn, I naturally did not believe it. Over the following days I was beaten so thoroughly that I could barely function, and in the end I only just scraped through by using the “Da Qiao + Mi Yue” teleport-and-steal-the-base routine that was circulating online.

While I was reliving that painful defeat and browsing the recent news, a headline suddenly caught my eye: China won the football World Cup!

You read that right. China did win a football World Cup, though the winner was not the men’s or women’s national team but WeKick, a football AI that evolved from the familiar Juewu.

WeKick won the first Google football competition on Kaggle, in which 1,138 teams took part. It is the top football AI competition on the planet and could fairly be called the World Cup of football AI.

Among all the participating teams, WeKick scored 1785.8 points and dominated the tournament, much like the Bulls in 1996 or Brazil in 2002.

Hard to believe? Let me show you a few highlights!

Quick, accurate, direct! A perfect long ball, straight into the goal!

Won the ball, then strung together 4 easy passes.

Some people may be unimpressed, thinking that given how Juewu performed in Honor of Kings, football should be easy for it too.

In fact, this is not true. First, Honor of Kings is a 5v5 game while football is an 11v11 sport, which means the number of agents (players) the AI has to control more than doubles. Second, although football is also played in real time, it requires the AI to plan over a long horizon, decide quickly, and handle a complex environment. The AI has to take into account each player’s speed, acceleration, shooting, heading, passing, defending and other attributes, make the players cooperate with each other constantly, and keep watching the opposing players so that it can anticipate their moves and make the best choice.

To cope with these differences, the WeKick team tailored model training around the following three innovations.

Self-Play reinforcement learning framework

The WeKick team used self-play reinforcement learning to train the model from scratch and deployed it in an asynchronous distributed reinforcement learning framework. The asynchronous architecture sacrifices some of the real-time performance of training, but in return it gains flexibility: computing resources can be adjusted on demand during training, so the framework adapts quickly to the football training environment with its larger number of agents.
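To make the asynchronous idea a little more concrete, here is a minimal sketch, not the WeKick team’s actual framework. It assumes a simple actor/learner split: several hypothetical `actor` threads push fake "trajectories" into a shared queue, and a learner loop consumes whatever arrives first. The function names, the dictionary layout and the fake reward are all made up for illustration.

```python
import queue
import threading


def actor(actor_id, episodes, out_queue):
    """Hypothetical actor: plays self-play episodes and pushes the results."""
    for episode in range(episodes):
        # Placeholder for a real match; this "reward" is fake data.
        reward = (actor_id + episode) % 3 - 1
        out_queue.put({"actor": actor_id, "episode": episode, "reward": reward})


def run_async_training(num_actors=4, episodes_per_actor=5):
    """Learner loop: consumes trajectories in whatever order they arrive."""
    trajectories = queue.Queue()
    workers = [
        threading.Thread(target=actor, args=(i, episodes_per_actor, trajectories))
        for i in range(num_actors)
    ]
    for worker in workers:
        worker.start()

    consumed = []
    expected = num_actors * episodes_per_actor
    while len(consumed) < expected:
        # The learner never waits for one specific actor, only for any data.
        consumed.append(trajectories.get())
        # A real learner would run a model update on the collected data here.

    for worker in workers:
        worker.join()
    return consumed


batch = run_async_training(num_actors=4, episodes_per_actor=5)
print(len(batch))
```

Because the learner only depends on the queue, changing `num_actors` is enough to scale data generation up or down, which is the kind of flexibility the paragraph above describes.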

GAIL: generative adversarial imitation learning

Honor of Kings is an adversarial MOBA game whose ultimate goal is very different from that of football. The WeKick team therefore extended the design of features and rewards, combining GAIL (Generative Adversarial Imitation Learning) with hand-designed rewards.

With this scheme, WeKick can learn from other teams by fitting the state and action distribution of expert behavior. The model trained with GAIL is then used as a fixed opponent for further self-play training, which further improves the robustness of the strategy.
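As a rough sketch of how an imitation signal and a hand-designed reward can be mixed, consider the snippet below. The article does not give WeKick’s formula, so everything here is an assumption: `d_prob` stands for a hypothetical discriminator’s probability that a (state, action) pair came from the expert, the imitation reward uses the `-log(1 - D)` form that is common in GAIL implementations, and `imitation_weight` is an invented mixing coefficient.

```python
import math


def gail_reward(d_prob, eps=1e-8):
    """Imitation reward from a discriminator output in [0, 1].

    Assumption: d_prob is the probability that the pair looks like expert
    behavior, so more expert-like behavior earns a larger reward.
    """
    return -math.log(max(1.0 - d_prob, eps))


def combined_reward(env_reward, d_prob, imitation_weight=0.5):
    """Hand-designed reward plus a weighted imitation reward (illustrative)."""
    return env_reward + imitation_weight * gail_reward(d_prob)


# A goal (hand-designed reward 1.0) scored with fairly expert-like play.
print(combined_reward(env_reward=1.0, d_prob=0.5))
# No goal this step, but the behavior still looks expert-like.
print(combined_reward(env_reward=0.0, d_prob=0.9))
```

The point of the mix is that the agent still gets a learning signal from expert-like behavior on the many steps where the hand-designed reward is zero.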

League multi-style reinforcement learning

The weakness of the self-play reinforcement learning scheme described above is that the models it produces tend to converge on a single style. In football terms, the tactics become rigid, so they are easy for opponents to target or to beat with a formation that naturally counters them. To address this problem, the WeKick team used League (multiple policy pools) multi-style reinforcement learning for the multi-agent learning task in order to improve strategy diversity.

The main flow of this League multi-style reinforcement learning scheme can be summed up in one phrase: from simple to complex!

  • First, train some basic models for skills such as dribbling, passing, shooting and so on.
  • Next, train several stylized models on top of the basic models, each focusing on one style of play. During training, the main model is added as a training opponent so that the stylized models do not become rigid and inflexible.
  • Finally, train a master model based on the basic models. The master model uses its own historical versions as training opponents and also adds all the stylized models as further opponents, so that it has an answer for whatever opponent it meets.
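The last step of the list above, where the master model trains against both its own history and the stylized models, can be sketched as a simple opponent pool. This is only an illustration under my own assumptions: the model names are invented strings, and uniform random sampling is a stand-in because the article does not say how opponents are actually chosen.

```python
import random


def build_opponent_pool(master_history, stylized_models):
    """Pool = the master's own past versions plus every stylized model."""
    return list(master_history) + list(stylized_models)


def sample_opponents(pool, num_matches, seed=0):
    """Pick one opponent per training match (uniform sampling is an assumption)."""
    rng = random.Random(seed)
    return [rng.choice(pool) for _ in range(num_matches)]


# Hypothetical names, only to show the shape of the data.
master_history = ["master_v1", "master_v2", "master_v3"]
stylized_models = ["style_a", "style_b"]

pool = build_opponent_pool(master_history, stylized_models)
schedule = sample_opponents(pool, num_matches=10, seed=42)
print(schedule)
```

Mixing both kinds of opponent in one pool is what keeps the master model from overfitting to a single style of play.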

According to the team’s internal ability scoring system, the master model trained this way scores 200 points higher than the basic model and 80 points higher than the strongest stylized model!
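Those two numbers also tell us how far the strongest stylized model sits above the basic model. A tiny worked calculation, using only the figures quoted above (the variable names are mine):

```python
# Figures quoted in the article, in the team's internal rating points.
master_gain_over_base = 200
master_lead_over_best_stylized = 80

# If the master is +200 over base and +80 over the best stylized model,
# the best stylized model must be the difference above the base model.
best_stylized_gain_over_base = master_gain_over_base - master_lead_over_best_stylized
print(best_stylized_gain_over_base)
```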

Finally, about the Google Football Kaggle competition

Founded in 2010, Kaggle is the world’s largest data science community and data science competition platform. This is the first time Kaggle has hosted a football AI challenge.

Football demands precise teamwork, real-time decision-making and competitive strategy on a rapidly changing pitch, and it has long been a hard problem for the world’s top AI research teams. As mentioned above, in the evolution from Juewu to WeKick the match size grows from 5v5 to 11v11, and the difficulty of reinforcement learning grows exponentially with the number of agents.
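To get a feel for that exponential growth, here is a back-of-the-envelope sketch. It assumes one centralized policy picks an action for every controlled player at each step and that every player has the same number of discrete actions. The value 19 is my own illustrative assumption, not a figure from the WeKick team.

```python
def joint_action_space(actions_per_agent, num_agents):
    """Number of joint actions when every agent acts at every step."""
    return actions_per_agent ** num_agents


ACTIONS_PER_AGENT = 19  # illustrative assumption, see the note above

five_agents = joint_action_space(ACTIONS_PER_AGENT, 5)
eleven_agents = joint_action_space(ACTIONS_PER_AGENT, 11)

print(f"5 agents:  {five_agents:,} joint actions")
print(f"11 agents: {eleven_agents:,} joint actions")
print(f"growth factor: {eleven_agents // five_agents:,}")
```

Going from 5 to 11 agents does not roughly double the search space; it multiplies it by the per-agent action count raised to the sixth power.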

In fact, the Juewu development team had already moved from controlling a single agent in football matches toward the simultaneous control and coordination of multiple agents. Juewu previously won the 5v5 Google Research Football League, Google’s ladder competition, and this victory is a step up from that.

From its earliest Go AI, to the MOBA AI for Honor of Kings, and now to the football AI WeKick, Tencent has been steadily advancing its deep reinforcement learning capabilities. This work is likely to be applied to a much wider range of industries in the future, so that artificial intelligence truly serves people.

Right now, I just want a chance to play a match against WeKick someday. Would you like to play against it too?

You are welcome to follow my official account, Program Monkey DD, for curated learning resources, daily practical content and giveaways.