Abstract & Conclusion

DDPG is derived from DQN: it adapts the ideas behind Deep Q-learning so they can be used in continuous action spaces.

DDPG = actor-critic + deterministic policy gradient + deep learning (see the network sketch at the end of this section)

  1. There is no theoretical guarantee of convergence with nonlinear function approximators, but in practice the algorithm is very robust (it robustly solves more than 20 simulated physics tasks).

  2. It can learn policies directly from raw pixel inputs.

  3. Compared to DQN, far fewer steps of experience are required, even though DDPG solves more complex, continuous-control problems.

The classic problems of cartpole swing-up, dexterous manipulation, legged locomotion, and car driving are solved.

Disadvantage: like most model-free reinforcement learning approaches, DDPG still requires a large number of training episodes to find solutions.
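
The sketch below shows the two networks DDPG combines. This is a minimal illustration assuming PyTorch; the layer sizes and class names (`Actor`, `Critic`) are my own placeholders, not the paper's exact architecture. The actor is a deterministic policy μ(s) that maps a state directly to a continuous action; the critic is an action-value function Q(s, a) that scores state-action pairs.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy mu(s): maps a state to a continuous action in [-1, 1]."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # bounded continuous action
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Action-value function Q(s, a): scores a state-action pair with a scalar."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar Q-value
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```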

Introduction

Limitations of DQN:

It can only be applied to discrete, low-dimensional action spaces; it cannot handle continuous action spaces directly, so the action space must first be discretized -> curse of dimensionality

The number of discrete actions increases exponentially with the number of degrees of freedom (see the quick calculation after the list below).

This leads to two problems:

① Such large action spaces are difficult to explore efficiently, so training DQN-like networks on them is likely intractable

② Naive discretization of a continuous action space throws away information about the structure of the action domain, which may be essential for solving many problems
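
The paper's own example makes the exponential growth concrete: a 7-degree-of-freedom arm with even the coarsest discretization of each joint into three values {-k, 0, k} already yields 3^7 = 2187 discrete actions. A quick check (the 7-DOF / 3-level numbers come from the paper; the snippet itself is just illustration):

```python
# Coarsest discretization: 3 torque levels {-k, 0, k} per joint, 7 joints
levels_per_joint = 3
degrees_of_freedom = 7
print(levels_per_joint ** degrees_of_freedom)  # 2187 discrete actions
```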

Reasons why DQN is very stable and robust:

1) the replay buffer

2) the target Q-network

Both are reused by DDPG (see the sketch below), with the target networks updated softly rather than by DQN's periodic hard copy.
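
A minimal sketch of these two stabilizers, assuming PyTorch modules and illustrative values for the buffer capacity and τ: a replay buffer that decorrelates training samples, and a soft target update θ' ← τθ + (1 − τ)θ' that lets the target networks track the learned networks slowly.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (s, a, r, s', done) transitions and samples minibatches uniformly."""
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size=64):
        return random.sample(self.buffer, batch_size)

def soft_update(target_net, source_net, tau=0.001):
    """Soft target update: theta' <- tau * theta + (1 - tau) * theta'."""
    for t_param, s_param in zip(target_net.parameters(), source_net.parameters()):
        t_param.data.copy_(tau * s_param.data + (1.0 - tau) * t_param.data)
```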

Therefore, **DDPG = DQN + DPG = actor-critic + model-free + off-policy + deep learning**
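
A sketch of one DDPG update step, assuming the `Actor`, `Critic`, `ReplayBuffer`, and `soft_update` sketches above and illustrative hyperparameters. The critic is regressed toward a TD target computed with the target networks, and the actor follows the deterministic policy gradient, i.e. it ascends ∇_a Q(s, a) evaluated at a = μ(s) through the critic.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.99, tau=0.001):
    # batch: tensors of shape (B, ...); done is a float mask of 0/1
    state, action, reward, next_state, done = batch

    # Critic: regress Q(s, a) toward the TD target built from the target networks.
    with torch.no_grad():
        next_action = target_actor(next_state)
        td_target = reward + gamma * (1 - done) * target_critic(next_state, next_action)
    critic_loss = F.mse_loss(critic(state, action), td_target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: deterministic policy gradient -- maximize Q(s, mu(s)).
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Target networks slowly track the learned networks.
    soft_update(target_actor, actor, tau)
    soft_update(target_critic, critic, tau)
```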

Batch normalization is used to keep DDPG stable and robust when state features have very different physical units and scales.
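
A hedged sketch of how batch normalization could be added to the actor from above: the raw state input and the hidden layers are normalized so that features with different scales are handled uniformly. The exact layer placement here is illustrative, not necessarily the paper's precise configuration.

```python
import torch.nn as nn

class ActorBN(nn.Module):
    """Actor with batch normalization on the state input and hidden layers."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.BatchNorm1d(state_dim),  # normalize raw state features
            nn.Linear(state_dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)
```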