Advantage estimator based on importance sampling
In continuous action tasks, deep reinforcement learning usually uses a Gaussian distribution as the policy function. To address the convergence slowdown caused by clipped actions under a Gaussian policy, an importance sampling advantage estimator was proposed. Based on the general advant...
Saved in:
Main Authors: | Quan LIU, Yubin JIANG, Zhihui HU |
---|---|
Format: | Article |
Language: | zho |
Published: |
Editorial Department of Journal on Communications
2019-05-01
|
Series: | Tongxin xuebao |
Subjects: | reinforcement learning, importance sampling, deep reinforcement learning, advantage function |
Online Access: | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2019122/ |
_version_ | 1841539349042167808 |
---|---|
author | Quan LIU Yubin JIANG Zhihui HU |
author_facet | Quan LIU Yubin JIANG Zhihui HU |
author_sort | Quan LIU |
collection | DOAJ |
description | In continuous action tasks, deep reinforcement learning usually uses a Gaussian distribution as the policy function. To address the convergence slowdown caused by clipped actions under a Gaussian policy, an importance sampling advantage estimator (ISAE) was proposed. Building on the general advantage estimator, ISAE introduces an importance sampling mechanism to improve the convergence speed of the algorithm and to correct the bias in the value function caused by computing the ratio of the target policy to the behavior policy for boundary actions. In addition, ISAE introduces an L parameter, which improves sample reliability and stabilizes the network parameters by limiting the range of the importance sampling rate. To verify the effectiveness of ISAE, it was applied to proximal policy optimization and compared with other algorithms on the MuJoCo platform. Experimental results show that ISAE achieves a faster convergence rate. |
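The abstract describes weighting a generalized-advantage-style estimator by a clipped importance sampling ratio. The paper's exact formula is not given in this record, so the sketch below is a hypothetical reading: each TD residual in the GAE recursion is scaled by the ratio of target to behavior policy probabilities, clipped to [1/L, L] by the assumed `L` parameter (the value `L=1.5` is illustrative, not from the paper).

```python
import numpy as np

def isae(rewards, values, log_probs_old, log_probs_new,
         gamma=0.99, lam=0.95, L=1.5):
    """Hypothetical importance-sampling advantage estimator (ISAE sketch).

    Follows the GAE recursion, but weights each TD residual by the
    importance ratio pi_new / pi_old, clipped to [1/L, L] to bound the
    variance contributed by boundary (clipped) actions.
    """
    T = len(rewards)
    ratios = np.exp(np.asarray(log_probs_new) - np.asarray(log_probs_old))
    ratios = np.clip(ratios, 1.0 / L, L)  # limit the IS rate's range
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # Bootstrap with 0 past the end of the trajectory
        next_value = values[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        gae = ratios[t] * delta + gamma * lam * gae  # IS-weighted residual
        advantages[t] = gae
    return advantages
```

When the target and behavior policies coincide (equal log-probabilities), the ratios are 1 and the estimator reduces to standard GAE; the resulting advantages can then feed a PPO-style surrogate objective as the abstract describes.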
format | Article |
id | doaj-art-5d736fc56acc4fe58d13855006d0d6db |
institution | Kabale University |
issn | 1000-436X |
language | zho |
publishDate | 2019-05-01 |
publisher | Editorial Department of Journal on Communications |
record_format | Article |
series | Tongxin xuebao |
spelling | doaj-art-5d736fc56acc4fe58d13855006d0d6db 2025-01-14T07:16:57Z zho Editorial Department of Journal on Communications Tongxin xuebao 1000-436X 2019-05-01 4010811659727096 Advantage estimator based on importance sampling Quan LIU Yubin JIANG Zhihui HU http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2019122/ |
spellingShingle | Quan LIU Yubin JIANG Zhihui HU Advantage estimator based on importance sampling Tongxin xuebao reinforcement learning importance sampling deep reinforcement learning advantage function |
title | Advantage estimator based on importance sampling |
title_full | Advantage estimator based on importance sampling |
title_fullStr | Advantage estimator based on importance sampling |
title_full_unstemmed | Advantage estimator based on importance sampling |
title_short | Advantage estimator based on importance sampling |
title_sort | advantage estimator based on importance sampling |
topic | reinforcement learning importance sampling deep reinforcement learning advantage function |
url | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2019122/ |
work_keys_str_mv | AT quanliu advantageestimatorbasedonimportancesampling AT yubinjiang advantageestimatorbasedonimportancesampling AT zhihuihu advantageestimatorbasedonimportancesampling |