DeepMind and OpenAI have both drawn attention in the video-game world recently, and OpenAI's bot made headlines when it beat Dendi at Dota 2. Dendi said, "I give up" after losing two rounds, then calmly discussed possible strategies for defeating the bot, like a top player who knows every detail of the game. Greg Brockman of OpenAI announced that by this time next year the same competition will be a 5-on-5 challenge. Notably, only an ordinary desktop computer (said to be Greg's own) was used on stage; the model, however, was not trained on this machine but on server GPUs.

Last week, the official OpenAI blog post went into more detail about the project. Here’s a simplified description of the timeline:

  • March 1: First test results using classical reinforcement learning in a Dota environment
  • May 8: A tester with a 1.5K MMR rating says he is learning faster than the bot
  • Early June: Bot beats the 1.5K MMR tester
  • June 30: Bot beats most testers with 3K MMR ratings
  • July 8: Bot almost always loses the first round to a semi-professional tester with a 7.5K MMR rating
  • August 7: Beats Blitz 3-0 (6.2K MMR, ex-pro), Pajkatt 2-1 (8.5K MMR, pro), and CC&C 3-0 (8.9K MMR, pro); all say Sumail might have a way to beat the bot
  • August 9: Beats Arteezy 10-0 (10K MMR, top pro), who also says Sumail may have a way to beat the bot
  • August 10: Beats Sumail 6-0 (8.3K MMR, top 1-on-1 pro), who calls the bot unbeatable; however, against the August 9 version of the bot, Sumail wins 2-1
  • August 11: Beats Dendi 2-0 (7.3K MMR, pro, former world champion, old-school favorite); that day's bot has a 60% win rate against the August 10 bot

The bot's TrueSkill rating climbed steadily over this period (the theoretical limit of the scale is 101); the original blog post includes a chart of the progression.

In about five months, the bot went from moving only by randomly calling the game's API to beating top human players in 1-on-1 matches. This is the time scale at which reinforcement learning currently solves such problems.

This type of game is particularly well suited to using reinforcement learning to let an agent evolve on its own, much as a human player does. The agent's interface with the "world" (the environment) usually consists of three parts: observing the environment, taking an action, and receiving feedback. The agent is designed to continuously improve its capabilities based on that feedback; for a given objective it is a system that only evolves, never degrades.

In the Dota project, OpenAI went out of its way to emphasize the following three points:

  • Observation: The environment data available to the bot is exactly the same as that available to humans, all obtained through Dota's own bot API; the bot does not have complete knowledge of the game state
  • Action: The bot issues commands such as moving, attacking, or using items at an average human speed
  • Feedback: The bot receives the same feedback signals a human does, including victory, health, and last hits

So the basic principle is to keep the bot's inputs and outputs as close as possible to a human's, so that only the decision-making system is being tested.
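The three-part interface described above (observe, act, receive feedback) can be sketched in a few lines. This is a toy illustration under invented assumptions, not OpenAI's actual system: the environment, the ±1 actions, and the learning rule are all made up for the example.

```python
import random

class ToyEnv:
    """Toy stand-in for a game environment (hypothetical; not the Dota bot API).
    It exposes the three-part interface from the article:
    observe the environment, take an action, receive feedback."""

    GOAL = 5  # reaching +5 counts as a "win", -5 as a "loss"

    def __init__(self):
        self.state = 0

    def observe(self):
        # Partial observation, like the limited view a human player has
        return self.state

    def step(self, action):
        # Apply the action and return (observation, reward, done)
        self.state += action
        reward = 1.0 if action == 1 else 0.0  # feedback signal
        done = abs(self.state) >= self.GOAL
        return self.state, reward, done

class ToyAgent:
    """Trivial agent that adjusts its policy based on feedback."""

    def __init__(self):
        self.prefer_up = 0.5  # probability of choosing the +1 action

    def act(self, obs):
        return 1 if random.random() < self.prefer_up else -1

    def learn(self, action, reward):
        # Nudge the policy toward actions that were rewarded
        if reward > 0 and action == 1:
            self.prefer_up = min(1.0, self.prefer_up + 0.1)

env, agent = ToyEnv(), ToyAgent()
obs, done = env.observe(), False
while not done:
    action = agent.act(obs)
    obs, reward, done = env.step(action)
    agent.learn(action, reward)
```

Each pass through the loop is one observe–act–feedback cycle; the agent's policy shifts toward rewarded actions, which is the "only evolves, never degrades" behavior the article describes, in miniature.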

John McCarthy and Marvin Minsky introduced the term Artificial Intelligence in 1956 and spent their long careers exploring possible ways it could be realized. The first attempts were made in chess, the most typical pure-intelligence game: clear rules, simple outcomes, but enormous variation. Other masters, such as von Neumann, Shannon, Norbert Wiener, Alan Turing, and Peter Norvig, took a strong interest as well. Machine intelligence went on to comprehensively defeat the best humans first in chess, then in Go, poker, and video games, with the final milestones landing in the second decade of this century. The growth in computing power over those roughly 60 years was unmistakable. Some video games pose particular difficulties: they require machine vision and enormous computation, and machines cannot yet claim superiority over humans in them. But the method is becoming clear.

Despite these advances, this is still a long way from what we normally understand as intelligence, something the field's pioneers anticipated when they proposed possible paths toward ultimate intelligence.

In his paper "Some Philosophical Problems from the Standpoint of Artificial Intelligence", McCarthy proposed that the brain can be regarded as a large organization composed of finitely many small automata, each of which makes relatively simple, well-defined decisions. Problems in a complex world can thus be broken down into smaller, simpler problems and solved piecewise, achieving some degree of general intelligence. Because our understanding of intelligence is always limited, hybrids of living organisms and computers may well appear in the future; for now, though, simulating automata on a computer to solve a particular problem remains the most feasible approach.

Roughly speaking, going from 60 years to five months means solving such problems nearly 150 times faster; in that sense, computers have greatly extended a person's effective life span. As long as the world remains largely at peace, within the next 500 years we may solve some of the problems humanity has hoped to solve for the tens of thousands of years since the awakening of human consciousness. We are already witnessing the changes the Internet and artificial intelligence have brought to certain fields. Things that once existed only in religious ideals and literary fantasies are gradually becoming reality. It is an incredibly exciting time, full of incredible possibilities. So let's explore.