Advantage estimator based on importance sampling

In continuous action tasks, deep reinforcement learning typically uses a Gaussian distribution as the policy function. To address the convergence slowdown caused by clipping actions sampled from a Gaussian policy, an importance sampling advantage estimator (ISAE) was proposed. Building on the generalized advantage estimator...

Full description

Saved in:
Bibliographic Details
Main Authors: Quan LIU, Yubin JIANG, Zhihui HU
Format: Article
Language: zho
Published: Editorial Department of Journal on Communications 2019-05-01
Series: Tongxin xuebao
Subjects:
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2019122/
_version_ 1841539349042167808
author Quan LIU
Yubin JIANG
Zhihui HU
author_facet Quan LIU
Yubin JIANG
Zhihui HU
author_sort Quan LIU
collection DOAJ
description In continuous action tasks, deep reinforcement learning typically uses a Gaussian distribution as the policy function. To address the convergence slowdown caused by clipping actions sampled from a Gaussian policy, an importance sampling advantage estimator (ISAE) was proposed. Building on the generalized advantage estimator (GAE), ISAE introduces an importance sampling mechanism that speeds up convergence and corrects the bias in the value function caused by computing the ratio of the target policy to the behavior policy at boundary (clipped) actions. In addition, ISAE introduces an L parameter that improves sample reliability and stabilizes the network parameters by limiting the range of the importance sampling ratio. To verify its effectiveness, ISAE was applied to proximal policy optimization (PPO) and compared with other algorithms on the MuJoCo platform. Experimental results show that ISAE converges faster.
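The paper's full text is not part of this record, so the following is only a rough illustration of the mechanism the abstract describes: a GAE-style advantage estimate whose TD errors are weighted by an importance sampling ratio clipped to a bounded range (the role the abstract attributes to the L parameter). The function name, the exact placement of the ratio, and the clipping bounds are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def isae_sketch(rewards, values, ratios, gamma=0.99, lam=0.95, L=2.0):
    """Sketch of an importance-sampling advantage estimator (assumed form).

    Builds on GAE: each TD error delta_t is weighted by the importance
    sampling ratio pi_target(a_t|s_t) / pi_behavior(a_t|s_t), clipped to
    [1/L, L] so that boundary (clipped) actions with extreme ratios do
    not destabilize the estimate.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    # Standard backward recursion over the trajectory, as in GAE.
    for t in reversed(range(T)):
        next_value = values[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        rho = np.clip(ratios[t], 1.0 / L, L)  # bounded importance weight
        gae = rho * delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With all ratios equal to 1 this reduces to ordinary GAE, which makes the sketch easy to sanity-check against a hand computation.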
format Article
id doaj-art-5d736fc56acc4fe58d13855006d0d6db
institution Kabale University
issn 1000-436X
language zho
publishDate 2019-05-01
publisher Editorial Department of Journal on Communications
record_format Article
series Tongxin xuebao
spelling doaj-art-5d736fc56acc4fe58d13855006d0d6db2025-01-14T07:16:57ZzhoEditorial Department of Journal on CommunicationsTongxin xuebao1000-436X2019-05-014010811659727096Advantage estimator based on importance samplingQuan LIUYubin JIANGZhihui HUIn continuous action tasks,deep reinforcement learning usually uses Gaussian distribution as a policy function.Aiming at the problem that the Gaussian distribution policy function slows down due to the clipped action,an importance sampling advantage estimator was proposed.Based on the general advantage estimator,an importance sampling mechanism was introduced by the estimator to improve the convergence speed of the algorithm and correct the deviation of the value function caused by calculating the target strategy and action strategy ratio of the boundary action.In addition,the L parameter was introduced by ISAE which improved the reliability of the sample and limited the stability of the network parameters by limiting the range of the importance sampling rate.In order to verify the effectiveness of the ISAE,applying it to proximal policy optimization and comparing it with other algorithms on the MuJoCo platform.Experimental results show that ISAE has a faster convergence rate.http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2019122/reinforcement learningimportance samplingdeep reinforcement learningadvantage function
spellingShingle Quan LIU
Yubin JIANG
Zhihui HU
Advantage estimator based on importance sampling
Tongxin xuebao
reinforcement learning
importance sampling
deep reinforcement learning
advantage function
title Advantage estimator based on importance sampling
title_full Advantage estimator based on importance sampling
title_fullStr Advantage estimator based on importance sampling
title_full_unstemmed Advantage estimator based on importance sampling
title_short Advantage estimator based on importance sampling
title_sort advantage estimator based on importance sampling
topic reinforcement learning
importance sampling
deep reinforcement learning
advantage function
url http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2019122/
work_keys_str_mv AT quanliu advantageestimatorbasedonimportancesampling
AT yubinjiang advantageestimatorbasedonimportancesampling
AT zhihuihu advantageestimatorbasedonimportancesampling