Advantage estimator based on importance sampling

In continuous-action tasks, deep reinforcement learning usually uses a Gaussian distribution as the policy function. To address the problem that the Gaussian policy function slows down because of clipped actions, an importance sampling advantage estimator was proposed. Based on the general advant...

Bibliographic Details
Main Authors:	Quan LIU, Yubin JIANG, Zhihui HU
Format:	Article
Language:	Chinese (zho)
Published:	Editorial Department of Journal on Communications 2019-05-01
Series:	Tongxin xuebao
Subjects:	reinforcement learning; importance sampling; deep reinforcement learning; advantage function
Online Access:	http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2019122/

Similar Items