Abstract & Conclusion

Abstract:

  1. The paper does not propose a new learning algorithm so much as a new training framework: asynchronous parallel actor-learners applied to existing RL algorithms.

  2. All four algorithms tried in the asynchronous framework (one-step Q-learning, one-step Sarsa, n-step Q-learning, and advantage actor-critic) work well, especially A3C.

  3. A3C clearly outperforms DQN on Atari and is also very effective on continuous motor control problems.

  4. Training runs as parallel multi-threaded actor-learners on a single multi-core CPU instead of a GPU, saving computing resources (DQN relies on a GPU); a toy sketch of this setup follows below.
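
As a rough illustration of this setup, here is a toy sketch (my own, not the paper's code): several threads act as parallel actor-learners, each interacting with its own copy of a tiny chain environment and applying lock-free one-step Q-learning updates to a shared table. The environment, hyperparameters, and function names are invented for illustration; the paper's agents use neural networks and accumulate gradients per thread, and real multi-core speedups in Python would need processes or a GIL-releasing backend.

```python
# Toy sketch of asynchronous parallel actor-learners sharing one set of parameters.
# Tabular one-step Q-learning on a hypothetical 6-state chain MDP; not the paper's code.
import random
import threading
import numpy as np

N_STATES, N_ACTIONS = 6, 2            # chain MDP: action 0 = left, action 1 = right
GAMMA, ALPHA, EPSILON = 0.99, 0.1, 0.1
Q = np.zeros((N_STATES, N_ACTIONS))   # shared parameters, updated without locks

def env_step(s, a):
    """Move along the chain; reward 1 only when the right end is reached."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

def actor_learner(n_steps=20_000):
    """Each thread owns its environment copy and applies one-step Q-learning updates to Q."""
    s = 0
    for _ in range(n_steps):
        a = random.randrange(N_ACTIONS) if random.random() < EPSILON else int(np.argmax(Q[s]))
        s2, r, done = env_step(s, a)
        target = r if done else r + GAMMA * np.max(Q[s2])
        Q[s, a] += ALPHA * (target - Q[s, a])   # asynchronous, lock-free update
        s = 0 if done else s2

threads = [threading.Thread(target=actor_learner) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(np.round(Q, 2))   # the greedy policy should prefer action 1 ("right") in every state
```

Only the update rule inside `actor_learner` changes between the asynchronous one-step Q-learning, one-step Sarsa, n-step Q-learning, and A3C variants; the surrounding parallel structure stays the same.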

Conclusion:

Disadvantages of experience replay:

  • A) It requires a large amount of extra computation and memory.
  • B) It can only be used with off-policy algorithms.

  1. The asynchronous framework can be applied to a wide range of RL algorithms: on-policy or off-policy, value-based or policy-based, with discrete or continuous action spaces.

  2. It keeps the main stabilizing advantage of experience replay: it reduces data correlation and makes the training data more stationary.

  3. Experience replay itself is very data efficient, so combining it with the asynchronous framework could further improve data efficiency.

  4. Parallel multi-threaded training uses only a multi-core CPU instead of a GPU, saving computing resources (DQN relies on a GPU).

This combination is especially useful in domains where interacting with the environment is more expensive than updating the model for the architecture used; a sketch of a per-worker replay buffer follows below.
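
A minimal sketch of what that combination might look like, assuming each worker keeps its own small replay buffer next to an off-policy update rule; `env_reset`, `env_step`, `policy_fn`, and `update_fn` are hypothetical placeholders, not names from the paper:

```python
# Hypothetical sketch: an asynchronous worker with a local replay buffer, interleaving
# updates on fresh transitions with updates on replayed mini-batches so that expensive
# environment interaction is reused several times.
import random
from collections import deque

def replay_worker(env_reset, env_step, policy_fn, update_fn,
                  n_steps=10_000, capacity=10_000, batch_size=32, replay_every=4):
    """env_reset() -> s, env_step(s, a) -> (s2, r, done), policy_fn(s) -> a,
    update_fn(transitions) applies an off-policy update (e.g., one-step Q-learning)
    to the shared model."""
    buffer = deque(maxlen=capacity)
    s = env_reset()
    for t in range(n_steps):
        a = policy_fn(s)
        s2, r, done = env_step(s, a)
        buffer.append((s, a, r, s2, done))
        update_fn([(s, a, r, s2, done)])                        # on-line update from fresh data
        if t % replay_every == 0 and len(buffer) >= batch_size:
            update_fn(random.sample(list(buffer), batch_size))  # extra updates reuse old data
        s = env_reset() if done else s2
```

Because replayed transitions come from older behaviour, only the off-policy workers (one-step and n-step Q-learning) can reuse them safely, which is exactly the limitation of experience replay listed above.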

Possible extensions noted in the conclusion:

  1. Many A3C variants can be obtained by changing how the advantage function is estimated (for example, generalized advantage estimation instead of the n-step return); a sketch of both estimators follows this list.
  2. The value-based variants could benefit from techniques that reduce the overestimation bias of Q-values.
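
For point 1, here is a small sketch (not the paper's code) of two interchangeable advantage estimators: the n-step return advantage that A3C uses, and generalized advantage estimation (GAE) as one possible variation. `rewards` and `values` are assumed to come from one rollout segment of a worker, and `bootstrap_value` is the value of the state after the segment (0 if the segment ended at a terminal state).

```python
# Two drop-in advantage estimators for an A3C-style update (illustrative sketch).
import numpy as np

def nstep_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """A_t = (r_t + gamma*r_{t+1} + ... + gamma^{T-t} * V(s_T)) - V(s_t)."""
    returns = np.zeros(len(rewards))
    R = bootstrap_value
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R
        returns[t] = R
    return returns - np.asarray(values, dtype=float)

def gae_advantages(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    """A_t = sum_l (gamma*lam)^l * delta_{t+l}, where delta_t = r_t + gamma*V(s_{t+1}) - V(s_t)."""
    values_ext = np.append(np.asarray(values, dtype=float), bootstrap_value)
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values_ext[t + 1] - values_ext[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With lam=1.0 the two estimators coincide, so swapping one for the other does not change the rest of the A3C update.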

1. Introduction

The introduction largely restates the abstract and conclusion summarized above.

2. Related Work

A brief overview of prior and concurrent work; not much detail.

3. Reinforcement Learning Background

A quick review of the standard RL setting; see Sutton's book for the full background.
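
For completeness, the standard quantities that section reviews (and that the A3C update relies on) are the textbook definitions below; nothing here is specific to this paper.

```latex
% Discounted return from time t, with discount factor \gamma \in (0, 1]:
R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}

% State-value and action-value functions under a policy \pi:
V^{\pi}(s) = \mathbb{E}[R_t \mid s_t = s, \pi], \qquad
Q^{\pi}(s, a) = \mathbb{E}[R_t \mid s_t = s, a_t = a, \pi]

% Advantage-based policy-gradient term used by actor-critic methods such as A3C:
\nabla_{\theta} \log \pi(a_t \mid s_t; \theta) \, \big(R_t - V(s_t; \theta_v)\big)
```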

4. Asynchronous RL Framework