Advantage estimator based on importance sampling

In continuous action tasks, deep reinforcement learning typically uses a Gaussian distribution as the policy function. To address the convergence slowdown caused by clipping actions sampled from a Gaussian policy, an importance sampling advantage estimator (ISAE) was proposed. Building on the generalized advantage estimator...

Full description

Saved in:
Bibliographic Details
Main Authors: Quan LIU, Yubin JIANG, Zhihui HU
Format: Article
Language: zho
Published: Editorial Department of Journal on Communications 2019-05-01
Series: Tongxin xuebao
Subjects:
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2019122/
_version_ 1841539349042167808
author Quan LIU
Yubin JIANG
Zhihui HU
author_facet Quan LIU
Yubin JIANG
Zhihui HU
author_sort Quan LIU
collection DOAJ
description In continuous action tasks, deep reinforcement learning typically uses a Gaussian distribution as the policy function. To address the convergence slowdown caused by clipping actions sampled from a Gaussian policy, an importance sampling advantage estimator (ISAE) was proposed. Building on the generalized advantage estimator (GAE), ISAE introduces an importance sampling mechanism that speeds up convergence and corrects the bias in the value function caused by computing the ratio of the target policy to the behavior policy at boundary (clipped) actions. In addition, ISAE introduces an L parameter that improves sample reliability and stabilizes the network parameters by limiting the range of the importance sampling ratio. To verify its effectiveness, ISAE was applied to proximal policy optimization (PPO) and compared with other algorithms on the MuJoCo platform. Experimental results show that ISAE converges faster.
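The paper's full text is not part of this record, so the following is only a rough illustration of the mechanism the abstract describes: a GAE-style advantage estimate whose TD errors are weighted by an importance sampling ratio clipped to a bounded range (the role the abstract attributes to the L parameter). The function name, the exact placement of the ratio, and the clipping bounds are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def isae_sketch(rewards, values, ratios, gamma=0.99, lam=0.95, L=2.0):
    """Sketch of an importance-sampling advantage estimator (assumed form).

    Builds on GAE: each TD error delta_t is weighted by the importance
    sampling ratio pi_target(a_t|s_t) / pi_behavior(a_t|s_t), clipped to
    [1/L, L] so that boundary (clipped) actions with extreme ratios do
    not destabilize the estimate.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    # Standard backward recursion over the trajectory, as in GAE.
    for t in reversed(range(T)):
        next_value = values[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        rho = np.clip(ratios[t], 1.0 / L, L)  # bounded importance weight
        gae = rho * delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With all ratios equal to 1 this reduces to ordinary GAE, which makes the sketch easy to sanity-check against a hand computation.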
format Article
id doaj-art-5d736fc56acc4fe58d13855006d0d6db
institution Kabale University
issn 1000-436X
language zho
publishDate 2019-05-01
publisher Editorial Department of Journal on Communications
record_format Article
series Tongxin xuebao
spelling doaj-art-5d736fc56acc4fe58d13855006d0d6db2025-01-14T07:16:57ZzhoEditorial Department of Journal on CommunicationsTongxin xuebao1000-436X2019-05-014010811659727096Advantage estimator based on importance samplingQuan LIUYubin JIANGZhihui HUIn continuous action tasks,deep reinforcement learning usually uses Gaussian distribution as a policy function.Aiming at the problem that the Gaussian distribution policy function slows down due to the clipped action,an importance sampling advantage estimator was proposed.Based on the general advantage estimator,an importance sampling mechanism was introduced by the estimator to improve the convergence speed of the algorithm and correct the deviation of the value function caused by calculating the target strategy and action strategy ratio of the boundary action.In addition,the L parameter was introduced by ISAE which improved the reliability of the sample and limited the stability of the network parameters by limiting the range of the importance sampling rate.In order to verify the effectiveness of the ISAE,applying it to proximal policy optimization and comparing it with other algorithms on the MuJoCo platform.Experimental results show that ISAE has a faster convergence rate.http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2019122/reinforcement learningimportance samplingdeep reinforcement learningadvantage function
spellingShingle Quan LIU
Yubin JIANG
Zhihui HU
Advantage estimator based on importance sampling
Tongxin xuebao
reinforcement learning
importance sampling
deep reinforcement learning
advantage function
title Advantage estimator based on importance sampling
title_full Advantage estimator based on importance sampling
title_fullStr Advantage estimator based on importance sampling
title_full_unstemmed Advantage estimator based on importance sampling
title_short Advantage estimator based on importance sampling
title_sort advantage estimator based on importance sampling
topic reinforcement learning
importance sampling
deep reinforcement learning
advantage function
url http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2019122/
work_keys_str_mv AT quanliu advantageestimatorbasedonimportancesampling
AT yubinjiang advantageestimatorbasedonimportancesampling
AT zhihuihu advantageestimatorbasedonimportancesampling