A computer with a single GPU can now run a Go program that plays better than humans. Facebook AI Research (FAIR) has just open-sourced ELF OpenGo's pre-trained model and code. Yuandong Tian and colleagues have re-created DeepMind's AlphaZero Go program, producing the first open-source superhuman Go AI. Along the way, the researchers also found a "bug" in AI Go play.

  • Project: facebook.ai/developers/…

  • Paper: arxiv.org/abs/1902.04…

Computer-savvy Go enthusiasts can also download the final version of ELF OpenGo and compile it to play against a top-level Go AI (this requires an Nvidia CUDA-enabled GPU).

Go program link: dl.fbaipublicfiles.com/elfopengo/p…
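
Before compiling, a minimal sanity check that a CUDA-capable GPU is visible may be useful. The sketch below assumes a PyTorch installation and uses a placeholder engine path, not the actual name of the compiled binary:

```python
# Minimal environment check before building and running ELF OpenGo.
# Assumes PyTorch is installed; the engine path below is a placeholder,
# not the actual name of the compiled binary.
import os
import torch

def check_environment(engine_path: str = "./elf_opengo_engine") -> None:
    if not torch.cuda.is_available():
        raise RuntimeError("No CUDA-capable GPU detected; ELF OpenGo requires one.")
    print("CUDA device found:", torch.cuda.get_device_name(0))
    if not os.path.exists(engine_path):
        print(f"Engine binary not found at '{engine_path}' (expected after compiling).")

if __name__ == "__main__":
    check_environment()
```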

ELF OpenGo is based on DeepMind's famous Go AIs, AlphaGo Zero and AlphaZero, but does not require Google TPUs for training. The latest version of OpenGo was trained on 2,000 GPUs for 15 days to reach superhuman performance. Running on a single GPU, OpenGo posted a 20-0 record in slow games against four players ranked among the top 30 human professionals, with no time limit imposed on the human players.

"I can definitely say that ELF OpenGo has had a big impact on the Korean Go community," said Beomgeun Cho, assistant head of public relations at the Korean Go Association. "Since it came out, almost all professional players in Korea have been using ELF OpenGo to analyze their own games and those of other players. This will not only raise the level of Korean Go players, but also the level of Go worldwide."

The final model of ELF OpenGo is the result of 20 million games of self-play training, and it is stable and superhuman. But just as AlphaGo once exhibited its own weakness with "ko" fights, the researchers found that ELF OpenGo shows specific reinforcement-learning limitations during training: like AlphaZero, OpenGo struggles to fully grasp the concept of the "ladder," a capturing sequence familiar even to beginners. More than other patterns, ladders depend on slightly longer-term lookahead. Although it is easy for a human player to read 30 or more moves ahead in a ladder, DeepMind has noted that the computer only acquires such predictions late in training.

Schematic diagram of a ladder capture in Go. Human players quickly learn to predict how such sequences play out, but the computer learns them much more slowly and cannot generalize from individual examples.

Interestingly, the researchers found that ELF OpenGo learns Go in the opposite order from human players. Its reinforcement learning-based approach focuses more on the second half of the game than on the opening or middle game. Because the only incentive the AI receives is for winning, reinforcement learning pushes OpenGo to learn first how games end rather than how they open. Humans, by contrast, tend to evaluate the current board position, focusing on recent moves and local patterns as the game progresses.
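
To make this concrete, the sketch below (illustrative only, not ELF OpenGo's actual training code) shows how AlphaZero-style self-play labels its training data: every position of a game receives the final outcome as its value target, so the learning signal enters at the end of the game and propagates backward from there.

```python
# Sketch of AlphaZero-style value targets: the only reward is the final game
# outcome z, which is copied back to every position of the self-play game.
# Illustrative only; not ELF OpenGo's actual training code.
from typing import List, Tuple

def label_self_play_game(
    positions: List[object],      # board states encountered during one self-play game
    search_policies: List[list],  # MCTS visit-count distributions for those states
    outcome_for_black: float,     # +1.0 if Black won, -1.0 if White won
) -> List[Tuple[object, list, float]]:
    examples = []
    for move_index, (state, pi) in enumerate(zip(positions, search_policies)):
        # z is the terminal outcome from the perspective of the player to move.
        player_is_black = (move_index % 2 == 0)
        z = outcome_for_black if player_is_black else -outcome_for_black
        examples.append((state, pi, z))
    return examples
```

Because z is the same terminal signal for every move of the game, positions near the end are the easiest to evaluate, which is consistent with the observation that the model learns endgames before openings.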

While this finding applies to Go, it also suggests a broader limitation of reinforcement learning: a system can achieve impressive overall performance yet still fail, or be exploited, because the AI focuses too heavily on the final outcome and pays too little attention to recent changes on the board.

The following excerpts present selected content from the ELF OpenGo paper.

FAIR's ELF OpenGo was used to analyze how well the moves in 87,000 professional games from the GoGoD dataset, spanning 1700 to 2018, match the computer's choices.

Go boasts a storied history of more than 4,000 years and is regarded as one of the most complex turn-based board games with complete information. The emergence of AlphaGo (Silver et al., 2016) and its descendants, AlphaGo Zero (Silver et al., 2017) and AlphaZero (Silver et al., 2018), shows that deep reinforcement learning (deep RL) can achieve superhuman performance even without supervision from datasets of human play.

These advances in playing strength come at significant computational cost. A single training run requires millions of self-play games and days of training on thousands of TPUs, computing power that is unavailable to the vast majority of the research community. Combined with the lack of access to code and models, this makes the approach very difficult or even impossible to reproduce, study, improve, and extend.

In this paper, we present ELF OpenGo, an open-source re-implementation of the AlphaZero (Silver et al., 2018) algorithm for Go. We also apply ELF OpenGo to make the following three additional contributions.

First, we train a superhuman model for ELF OpenGo. After running our AlphaZero-style training software on 2,000 GPUs for nine days, our 20-block model surpasses human performance and is arguably comparable to the 20-block models described in Silver et al. (2017) and Silver et al. (2018). To aid research in this area, we provide pre-trained superhuman-level models, the code used to train them, a comprehensive training-trajectory dataset (consisting of 20 million self-play games divided into more than 1.5 million training minibatches), and auxiliary data. We describe the system and algorithm design in depth and include many lessons learned while developing and training our model, which we hope will help the community better understand many of the considerations of large-scale deep reinforcement learning.
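
For reference, here is a minimal sketch of an AlphaGo Zero-style 20-block residual network with policy and value heads. The channel count (256), the 17 input planes, and the head layouts follow the published AlphaGo Zero description and are assumptions here, not necessarily ELF OpenGo's exact configuration:

```python
# Sketch of an AGZ-style 20-block residual tower with policy and value heads.
# Channel count (256), 17 input planes, and head layouts follow Silver et al.
# (2017) and are assumptions here, not necessarily ELF OpenGo's exact settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

BOARD = 19

class ResBlock(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)

class PolicyValueNet(nn.Module):
    def __init__(self, blocks: int = 20, channels: int = 256, in_planes: int = 17):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(),
        )
        self.tower = nn.Sequential(*[ResBlock(channels) for _ in range(blocks)])
        # Policy head: move probabilities over the 19x19 points plus pass.
        self.policy_conv = nn.Sequential(
            nn.Conv2d(channels, 2, 1, bias=False), nn.BatchNorm2d(2), nn.ReLU())
        self.policy_fc = nn.Linear(2 * BOARD * BOARD, BOARD * BOARD + 1)
        # Value head: scalar evaluation in [-1, 1].
        self.value_conv = nn.Sequential(
            nn.Conv2d(channels, 1, 1, bias=False), nn.BatchNorm2d(1), nn.ReLU())
        self.value_fc = nn.Sequential(
            nn.Linear(BOARD * BOARD, 256), nn.ReLU(), nn.Linear(256, 1), nn.Tanh())

    def forward(self, x):
        h = self.tower(self.stem(x))
        p = self.policy_fc(self.policy_conv(h).flatten(1))
        v = self.value_fc(self.value_conv(h).flatten(1))
        return p, v
```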

Second, we provide an analysis of the model's behavior during training. (1) As training progresses, we observe considerable variation in the model's strength when it is compared against other models; this remains true even as the learning rate decreases. (2) Moves that require significant foresight to decide whether they should be played, such as ladders, are learned slowly and never fully mastered. (3) We explore how quickly the model learns high-quality moves at different stages of a game: in contrast to the typical behavior of tabular RL, the model learns mid-game and endgame moves at roughly the same rate.

Third, we perform extensive experiments to study the properties of AlphaZero-style algorithms. We identify several important parameters that were left unclear in Silver et al. (2018) and provide insight into their roles in successful training. We also briefly compare the AlphaGo Zero and AlphaZero training processes. Finally, we find that even for the final model, doubling the number of rollouts at play time still improves its strength by about 200 ELO, indicating that the AI's strength is limited by the size of the model.

Our ultimate goal is to provide the resources and exploratory insights necessary for both the AI research community and the Go community to study, refine, and test these promising methods.

OpenGo analyzed the famous 19th-century Japanese Go player Honinbo Shusaku's celebrated "ear-reddening game." Shusaku's famous 127th move as Black was played at position A, but the AI judged that Black should have played at position B.

ELF OpenGo

Our goal with ELF OpenGo is to faithfully re-implement AlphaGo Zero (AGZ) and AlphaZero (AZ), eliminate ambiguities in the original papers, and introduce a number of innovations that allow our system to run entirely on commodity hardware. For brevity, ELF OpenGo's system and software design are discussed in detail in Appendix A. Highlights include (1) colocating multiple self-play workers on each GPU to improve throughput, and (2) an asynchronous self-play workflow to handle the increased per-game latency.
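
The throughput idea behind point (1), many self-play workers sharing one GPU through batched evaluation requests, can be sketched roughly as follows. This is an illustrative toy in Python; the real pipeline is built on the C++ ELF framework:

```python
# Toy sketch of batched GPU inference shared by many self-play workers.
# Illustrative only; ELF OpenGo's actual pipeline uses the C++ ELF framework.
import queue
import torch

class BatchedEvaluator:
    def __init__(self, net, batch_size: int = 32, timeout: float = 0.01):
        self.net = net.eval()
        self.batch_size = batch_size
        self.timeout = timeout
        self.requests = queue.Queue()   # holds (input_tensor, reply_queue) pairs

    def evaluate(self, position: torch.Tensor):
        """Called from a self-play worker thread; blocks until the GPU replies."""
        reply = queue.Queue(maxsize=1)
        self.requests.put((position, reply))
        return reply.get()

    def serve_forever(self):
        """GPU thread: gather requests into a batch, run one forward pass, reply."""
        while True:
            batch, replies = [], []
            pos, rep = self.requests.get()          # wait for at least one request
            batch.append(pos); replies.append(rep)
            while len(batch) < self.batch_size:
                try:
                    pos, rep = self.requests.get(timeout=self.timeout)
                    batch.append(pos); replies.append(rep)
                except queue.Empty:
                    break
            with torch.no_grad():
                policies, values = self.net(torch.stack(batch))
            for rep, p, v in zip(replies, policies, values):
                rep.put((p, v))
```

Each self-play worker runs in its own thread and calls evaluate(), while a single server thread runs serve_forever(); the GPU therefore sees sizable batches even though each individual game advances only one move at a time.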

We used Nvidia Tesla V100 GPUs with 16 GB of RAM for training and inference, but the model can be expected to perform similarly on most Nvidia GPUs with Tensor Cores (such as consumer-grade RTX 2060 GPUs).

Table 1: Hyperparameters and training details for AGZ, AZ, and ELF OpenGo. "?" denotes details that are ambiguous or not clearly specified in Silver et al. (2017) or Silver et al. (2018).

Overall, we largely followed AZ's training details, but instead of 5,000 self-play TPUs and 64 training TPUs, we used 2,000 self-play GPUs and 8 training GPUs. Since Silver et al. (2018) did not specify the size of AZ's replay buffer, we used AGZ's setting of 500,000 games. We also used AGZ's self-play setting of 1,600 rollouts per move.
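
A replay buffer of this kind can be sketched as a fixed-capacity queue of recent self-play games from which training positions are sampled. Only the 500,000-game capacity comes from the text above; the batch size and sampling scheme below are illustrative assumptions:

```python
# Sketch of a fixed-capacity self-play replay buffer holding the 500,000 most
# recent games, from which training positions are sampled. Illustrative only;
# the batch size and sampling scheme are assumptions, not the paper's settings.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity_games: int = 500_000):
        self.games = deque(maxlen=capacity_games)   # oldest games are evicted

    def add_game(self, examples):
        """examples: list of (state, search_policy, outcome) tuples for one game."""
        self.games.append(examples)

    def sample_minibatch(self, batch_size: int = 2048):
        """Pick a random game, then a random position within it, batch_size times
        (approximately uniform over positions when game lengths are similar)."""
        batch = []
        for _ in range(batch_size):
            game = random.choice(self.games)
            batch.append(random.choice(game))
        return batch
```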

Figure 2: Progression of model strength during training. "Selfplay ELO 25,000" and "Selfplay ELO 50,000" refer to unnormalized self-play ELO ratings computed from consecutive pairs of models spaced 25,000 and 50,000 training minibatches apart, respectively. "Nash Averaging 50,000" refers to the Nash averaging rating (Balduzzi et al., 2018), computed from round-robin pairwise play among the same models used for "Selfplay ELO 50,000".

Comparison with human games

Because our model is significantly stronger than the prototype model, which already displayed superhuman strength, we assume that our model is superhuman as well. In Figure 3(c), we show how often the moves predicted by the model agree with those played by human professionals. The moves were taken from 1,000 professional games played between 2011 and 2015. The model's match rate with human moves quickly converges to about 46% around minibatch 125,000. This suggests that beyond this point, further gains in the model's strength may not come from better prediction of human professional moves, and, as Silver et al. (2016) suggest, there may be limitations in using human games for supervised training.
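
A match-rate measurement of this kind can be sketched simply: for each position in a professional game record, compare the model's highest-probability move with the move actually played. This is a rough illustration, not the paper's evaluation code:

```python
# Sketch of measuring agreement with professional moves: the fraction of
# positions where the model's top-ranked move equals the move actually played.
# Illustrative only; not the paper's evaluation code.
import torch

def human_move_match_rate(net, positions, human_moves) -> float:
    """positions: tensor of encoded board states; human_moves: tensor of move indices."""
    net.eval()
    with torch.no_grad():
        policy_logits, _ = net(positions)
        predicted = policy_logits.argmax(dim=1)
    return (predicted == human_moves).float().mean().item()
```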

Number of MCTS rollouts

Intuitively, increasing the number of MCTS iterations (the rollout count) should improve the AI's strength by exploring more of the game tree. To better understand the effect of the rollout count on playing strength, we performed a self-play analysis with the final model in which one player used twice as many MCTS rollouts as the other. We ran this analysis over a wide range of rollout counts (800 to 25,600).

Figure 9: Win rate of the model with 2x rollouts against the same model with 1x rollouts

As shown in Figure 9, when playing White, ELF OpenGo consistently enjoys an 80-90% win rate (roughly 250-400 additional ELO) from doubling the number of rollouts. When playing Black, on the other hand, doubling the rollouts yields only a 55-75% win rate (roughly 35-200 additional ELO). Moreover, as the number of rollouts grows, the incremental benefit of doubling Black's rollouts shrinks toward 50%, indicating that our model's strength as Black has a ceiling with respect to the rollout count. No such ceiling appears when the model plays White, suggesting that the 7.5-point komi (White's compensation) puts considerable pressure on Black.
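
The ELO figures quoted above follow from the standard logistic relation between expected score and rating difference; a quick check:

```python
# Convert a win rate into an approximate ELO gap using the standard logistic
# model: expected score p = 1 / (1 + 10^(-d/400))  =>  d = 400 * log10(p / (1 - p)).
import math

def elo_gap(win_rate: float) -> float:
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))

if __name__ == "__main__":
    for p in (0.55, 0.75, 0.80, 0.90):
        print(f"win rate {p:.0%} -> about {elo_gap(p):+.0f} ELO")
    # 55% -> ~+35, 75% -> ~+191, 80% -> ~+241, 90% -> ~+382,
    # roughly matching the 35-200 and 250-400 ELO ranges cited in the text.
```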

Since having both sides use the same model can introduce bias (the player with more rollouts sees every branch explored by its opponent), we also ran experiments pitting the prototype model against the final model and observed a similar trend: doubling the rollouts yields an ELO increase of about 200.

ELF OpenGo: Analysis and Open Re-implementation of AlphaZero

arxiv.org/abs/1902.04…

The AlphaGo, AlphaGo Zero, and AlphaZero family of algorithms has demonstrated, with increasing autonomy, the ability of deep reinforcement learning to reach superhuman performance in the complex game of Go. However, many obstacles remain for the research community in understanding and using these promising methods. To clarify the unresolved questions and facilitate future research, we propose ELF OpenGo, an open-source re-implementation of the AlphaZero algorithm. ELF OpenGo is the first open-source superhuman-level Go AI to achieve a convincing perfect record (20:0) against world-class professional players. We use ELF OpenGo for extensive research, identifying and analyzing many interesting phenomena in both model training and gameplay inference. Our code, models, self-play datasets, and auxiliary data are publicly available.

Reference link: ai.facebook.com/blog/open-s…