Advantage estimator based on importance sampling

In continuous-action tasks, deep reinforcement learning usually uses a Gaussian distribution as the policy function. To address the problem that the Gaussian policy function slows down because of action clipping, an importance sampling advantage estimator was proposed. Based on the general advant...

Bibliographic Details
Main Authors: Quan LIU, Yubin JIANG, Zhihui HU
Format: Article
Language: Chinese (zho)
Published: Editorial Department of Journal on Communications 2019-05-01
Series: Tongxin xuebao
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2019122/