Abstract & Conclusion

DDPG is derived from DQN: it adapts the ideas behind Deep Q-learning so they can be used in continuous action spaces.

DDPG = actor-critic + deterministic policy gradient + deep learning (see the network sketch at the end of this section)

  1. There is no theoretical guarantee of convergence with nonlinear function approximators, but in practice the algorithm is very robust (it robustly solves more than 20 simulated physics tasks).

  2. It can learn policies directly from raw pixel inputs.

  3. Compared to DQN, far fewer steps of experience are required, even though DDPG solves more complex, continuous-control problems.

The classic problems of cartpole swing-up, dexterous manipulation, legged locomotion, and car driving are solved.

Disadvantage: like most model-free reinforcement learning approaches, DDPG still requires a large number of training episodes to find solutions.
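
The sketch below shows the two networks DDPG combines. This is a minimal illustration assuming PyTorch; the layer sizes and class names (`Actor`, `Critic`) are my own placeholders, not the paper's exact architecture. The actor is a deterministic policy μ(s) that maps a state directly to a continuous action; the critic is an action-value function Q(s, a) that scores state-action pairs.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy mu(s): maps a state to a continuous action in [-1, 1]."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # bounded continuous action
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Action-value function Q(s, a): scores a state-action pair with a scalar."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar Q-value
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```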

Introduction

Limitations of DQN:

It can only be applied to discrete, low-dimensional action spaces; it cannot handle continuous action spaces directly, so the action space must first be discretized -> curse of dimensionality

The number of discrete actions increases exponentially with the number of degrees of freedom (see the quick calculation after the list below).

This leads to two problems:

① Such large action spaces are difficult to explore efficiently, so training DQN-like networks on them is likely intractable

② Naive discretization of a continuous action space throws away information about the structure of the action domain, which may be essential for solving many problems
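
The paper's own example makes the exponential growth concrete: a 7-degree-of-freedom arm with even the coarsest discretization of each joint into three values {-k, 0, k} already yields 3^7 = 2187 discrete actions. A quick check (the 7-DOF / 3-level numbers come from the paper; the snippet itself is just illustration):

```python
# Coarsest discretization: 3 torque levels {-k, 0, k} per joint, 7 joints
levels_per_joint = 3
degrees_of_freedom = 7
print(levels_per_joint ** degrees_of_freedom)  # 2187 discrete actions
```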

Reasons why DQN is very stable and robust:

1) the replay buffer

2) the target Q-network

Both are reused by DDPG (see the sketch below), with the target networks updated softly rather than by DQN's periodic hard copy.
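
A minimal sketch of these two stabilizers, assuming PyTorch modules and illustrative values for the buffer capacity and τ: a replay buffer that decorrelates training samples, and a soft target update θ' ← τθ + (1 − τ)θ' that lets the target networks track the learned networks slowly.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (s, a, r, s', done) transitions and samples minibatches uniformly."""
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size=64):
        return random.sample(self.buffer, batch_size)

def soft_update(target_net, source_net, tau=0.001):
    """Soft target update: theta' <- tau * theta + (1 - tau) * theta'."""
    for t_param, s_param in zip(target_net.parameters(), source_net.parameters()):
        t_param.data.copy_(tau * s_param.data + (1.0 - tau) * t_param.data)
```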

Therefore, **DDPG = DQN + DPG = actor-critic + model-free + off-policy + deep learning**
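
A sketch of one DDPG update step, assuming the `Actor`, `Critic`, `ReplayBuffer`, and `soft_update` sketches above and illustrative hyperparameters. The critic is regressed toward a TD target computed with the target networks, and the actor follows the deterministic policy gradient, i.e. it ascends ∇_a Q(s, a) evaluated at a = μ(s) through the critic.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.99, tau=0.001):
    # batch: tensors of shape (B, ...); done is a float mask of 0/1
    state, action, reward, next_state, done = batch

    # Critic: regress Q(s, a) toward the TD target built from the target networks.
    with torch.no_grad():
        next_action = target_actor(next_state)
        td_target = reward + gamma * (1 - done) * target_critic(next_state, next_action)
    critic_loss = F.mse_loss(critic(state, action), td_target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: deterministic policy gradient -- maximize Q(s, mu(s)).
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Target networks slowly track the learned networks.
    soft_update(target_actor, actor, tau)
    soft_update(target_critic, critic, tau)
```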

Batch normalization is used to keep DDPG stable and robust when state features have very different physical units and scales.
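
A hedged sketch of how batch normalization could be added to the actor from above: the raw state input and the hidden layers are normalized so that features with different scales are handled uniformly. The exact layer placement here is illustrative, not necessarily the paper's precise configuration.

```python
import torch.nn as nn

class ActorBN(nn.Module):
    """Actor with batch normalization on the state input and hidden layers."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.BatchNorm1d(state_dim),  # normalize raw state features
            nn.Linear(state_dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)
```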