Abstract & Conclusion

Abstract:

  1. The paper does not propose a new learning algorithm so much as a new training framework: asynchronous parallel actor-learners applied to existing RL algorithms.

  2. All four algorithms tried in the asynchronous framework (one-step Q-learning, one-step Sarsa, n-step Q-learning, and advantage actor-critic) work well, especially A3C.

  3. A3C clearly outperforms DQN on Atari and is also very effective on continuous motor control problems.

  4. Training runs as parallel multi-threaded actor-learners on a single multi-core CPU instead of a GPU, saving computing resources (DQN relies on a GPU); a toy sketch of this setup follows below.
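
As a rough illustration of this setup, here is a toy sketch (my own, not the paper's code): several threads act as parallel actor-learners, each interacting with its own copy of a tiny chain environment and applying lock-free one-step Q-learning updates to a shared table. The environment, hyperparameters, and function names are invented for illustration; the paper's agents use neural networks and accumulate gradients per thread, and real multi-core speedups in Python would need processes or a GIL-releasing backend.

```python
# Toy sketch of asynchronous parallel actor-learners sharing one set of parameters.
# Tabular one-step Q-learning on a hypothetical 6-state chain MDP; not the paper's code.
import random
import threading
import numpy as np

N_STATES, N_ACTIONS = 6, 2            # chain MDP: action 0 = left, action 1 = right
GAMMA, ALPHA, EPSILON = 0.99, 0.1, 0.1
Q = np.zeros((N_STATES, N_ACTIONS))   # shared parameters, updated without locks

def env_step(s, a):
    """Move along the chain; reward 1 only when the right end is reached."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

def actor_learner(n_steps=20_000):
    """Each thread owns its environment copy and applies one-step Q-learning updates to Q."""
    s = 0
    for _ in range(n_steps):
        a = random.randrange(N_ACTIONS) if random.random() < EPSILON else int(np.argmax(Q[s]))
        s2, r, done = env_step(s, a)
        target = r if done else r + GAMMA * np.max(Q[s2])
        Q[s, a] += ALPHA * (target - Q[s, a])   # asynchronous, lock-free update
        s = 0 if done else s2

threads = [threading.Thread(target=actor_learner) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(np.round(Q, 2))   # the greedy policy should prefer action 1 ("right") in every state
```

Only the update rule inside `actor_learner` changes between the asynchronous one-step Q-learning, one-step Sarsa, n-step Q-learning, and A3C variants; the surrounding parallel structure stays the same.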

Conclusion:

Disadvantages of experience replay:

  • A) It requires a large amount of extra computation and memory.
  • B) It can only be used with off-policy algorithms.

  1. The asynchronous framework can be applied to a wide range of RL algorithms: on-policy or off-policy, value-based or policy-based, with discrete or continuous action spaces.

  2. It keeps the main stabilizing advantage of experience replay: it reduces data correlation and makes the training data more stationary.

  3. Experience replay itself is very data efficient, so combining it with the asynchronous framework could further improve data efficiency.

  4. Parallel multi-threaded training uses only a multi-core CPU instead of a GPU, saving computing resources (DQN relies on a GPU).

This combination is especially useful in domains where interacting with the environment is more expensive than updating the model for the architecture used; a sketch of a per-worker replay buffer follows below.
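
A minimal sketch of what that combination might look like, assuming each worker keeps its own small replay buffer next to an off-policy update rule; `env_reset`, `env_step`, `policy_fn`, and `update_fn` are hypothetical placeholders, not names from the paper:

```python
# Hypothetical sketch: an asynchronous worker with a local replay buffer, interleaving
# updates on fresh transitions with updates on replayed mini-batches so that expensive
# environment interaction is reused several times.
import random
from collections import deque

def replay_worker(env_reset, env_step, policy_fn, update_fn,
                  n_steps=10_000, capacity=10_000, batch_size=32, replay_every=4):
    """env_reset() -> s, env_step(s, a) -> (s2, r, done), policy_fn(s) -> a,
    update_fn(transitions) applies an off-policy update (e.g., one-step Q-learning)
    to the shared model."""
    buffer = deque(maxlen=capacity)
    s = env_reset()
    for t in range(n_steps):
        a = policy_fn(s)
        s2, r, done = env_step(s, a)
        buffer.append((s, a, r, s2, done))
        update_fn([(s, a, r, s2, done)])                        # on-line update from fresh data
        if t % replay_every == 0 and len(buffer) >= batch_size:
            update_fn(random.sample(list(buffer), batch_size))  # extra updates reuse old data
        s = env_reset() if done else s2
```

Because replayed transitions come from older behaviour, only the off-policy workers (one-step and n-step Q-learning) can reuse them safely, which is exactly the limitation of experience replay listed above.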

Possible extensions noted in the conclusion:

  1. Many A3C variants can be obtained by changing how the advantage function is estimated (for example, generalized advantage estimation instead of the n-step return); a sketch of both estimators follows this list.
  2. The value-based variants could benefit from techniques that reduce the overestimation bias of Q-values.
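
For point 1, here is a small sketch (not the paper's code) of two interchangeable advantage estimators: the n-step return advantage that A3C uses, and generalized advantage estimation (GAE) as one possible variation. `rewards` and `values` are assumed to come from one rollout segment of a worker, and `bootstrap_value` is the value of the state after the segment (0 if the segment ended at a terminal state).

```python
# Two drop-in advantage estimators for an A3C-style update (illustrative sketch).
import numpy as np

def nstep_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """A_t = (r_t + gamma*r_{t+1} + ... + gamma^{T-t} * V(s_T)) - V(s_t)."""
    returns = np.zeros(len(rewards))
    R = bootstrap_value
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R
        returns[t] = R
    return returns - np.asarray(values, dtype=float)

def gae_advantages(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    """A_t = sum_l (gamma*lam)^l * delta_{t+l}, where delta_t = r_t + gamma*V(s_{t+1}) - V(s_t)."""
    values_ext = np.append(np.asarray(values, dtype=float), bootstrap_value)
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values_ext[t + 1] - values_ext[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With lam=1.0 the two estimators coincide, so swapping one for the other does not change the rest of the A3C update.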

1. Introduction

The introduction largely restates the abstract and conclusion summarized above.

2. Related Work

A brief overview of prior and concurrent work; not much detail.

3. Reinforcement Learning Background

A quick review of the standard RL setting; see Sutton's book for the full background.
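
For completeness, the standard quantities that section reviews (and that the A3C update relies on) are the textbook definitions below; nothing here is specific to this paper.

```latex
% Discounted return from time t, with discount factor \gamma \in (0, 1]:
R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}

% State-value and action-value functions under a policy \pi:
V^{\pi}(s) = \mathbb{E}[R_t \mid s_t = s, \pi], \qquad
Q^{\pi}(s, a) = \mathbb{E}[R_t \mid s_t = s, a_t = a, \pi]

% Advantage-based policy-gradient term used by actor-critic methods such as A3C:
\nabla_{\theta} \log \pi(a_t \mid s_t; \theta) \, \big(R_t - V(s_t; \theta_v)\big)
```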

4. Asynchronous RL Framework