Actor-critic algorithm with incremental dual natural policy gradient

Existing algorithms for continuous action spaces fail to consider how to select the optimal action or how to exploit knowledge of the action space, so an efficient actor-critic algorithm was proposed by improving the natural gradient. The objective of the proposed algorithm is to maximize the...


Bibliographic Details
Main Authors: Peng ZHANG, Quan LIU, Shan ZHONG, Jian-wei ZHAI, Wei-sheng QIAN
Format: Article
Language: zho
Published: Editorial Department of Journal on Communications 2017-04-01
Series:Tongxin xuebao
Subjects:
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2017089/
_version_ 1841539528351809536
author Peng ZHANG
Quan LIU
Shan ZHONG
Jian-wei ZHAI
Wei-sheng QIAN
author_facet Peng ZHANG
Quan LIU
Shan ZHONG
Jian-wei ZHAI
Wei-sheng QIAN
author_sort Peng ZHANG
collection DOAJ
description Existing algorithms for continuous action spaces fail to consider how to select the optimal action or how to exploit knowledge of the action space, so an efficient actor-critic algorithm was proposed by improving the natural gradient. The objective of the proposed algorithm is to maximize the expected return. The upper and lower bounds of the action range are weighted to obtain the optimal action, and both bounds are approximated by linear functions. The problem of obtaining the optimal action is thereby transformed into learning two policy parameter vectors. To speed up learning, an incremental Fisher information matrix and eligibility traces for both bounds were designed. Simulation results on three reinforcement learning problems show that, compared with other representative methods for continuous action spaces, the proposed algorithm converges faster and more stably.
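The abstract's key mechanics — an action formed as a weighted combination of linearly approximated upper and lower bounds, learned as two policy parameter vectors with an incrementally maintained Fisher information matrix and per-bound eligibility traces — can be sketched as follows. This is a reconstruction from the abstract alone, not the authors' published pseudocode; the class name, the Gaussian exploration policy, the fixed mixing weight `mix`, and all variable names are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: a Gaussian policy whose mean mixes two linearly
# approximated action bounds, updated by a natural policy gradient with a
# Sherman-Morrison incremental inverse Fisher matrix and eligibility traces.
class DualNaturalActorCritic:
    def __init__(self, n_features, alpha=0.01, beta=0.1, gamma=0.99,
                 lam=0.9, sigma=0.2, mix=0.5):
        self.n = n_features
        self.alpha, self.beta = alpha, beta    # actor / critic step sizes
        self.gamma, self.lam = gamma, lam      # discount / trace decay
        self.sigma, self.mix = sigma, mix      # exploration noise / bound weight
        self.theta_u = np.zeros(n_features)    # upper-bound parameters
        self.theta_l = np.zeros(n_features)    # lower-bound parameters
        self.v = np.zeros(n_features)          # linear critic weights
        self.F_inv = np.eye(2 * n_features)    # incremental inverse Fisher
        self.e_pi = np.zeros(2 * n_features)   # eligibility over both bounds
        self.e_v = np.zeros(n_features)        # critic eligibility

    def mean_action(self, x):
        # Optimal action as a weighted mix of the two linear bounds
        return self.mix * (self.theta_u @ x) + (1 - self.mix) * (self.theta_l @ x)

    def action(self, x, explore=True):
        m = self.mean_action(x)
        return np.random.normal(m, self.sigma) if explore else m

    def update(self, x, a, r, x_next, done):
        # TD error from the linear critic
        delta = r + (0.0 if done else self.gamma * (self.v @ x_next)) - self.v @ x
        # Score of the Gaussian policy w.r.t. the stacked vector [theta_u; theta_l]
        score = (a - self.mean_action(x)) / self.sigma ** 2 * np.concatenate(
            [self.mix * x, (1 - self.mix) * x])
        # Accumulating eligibilities for the critic and for both bounds
        self.e_v = self.gamma * self.lam * self.e_v + x
        self.e_pi = self.gamma * self.lam * self.e_pi + score
        # Sherman-Morrison rank-1 update keeps the inverse Fisher incremental
        Fg = self.F_inv @ score
        self.F_inv -= np.outer(Fg, Fg) / (1.0 + score @ Fg)
        # Natural-gradient step for both parameter vectors, TD step for critic
        nat = self.F_inv @ self.e_pi
        self.theta_u += self.alpha * delta * nat[:self.n]
        self.theta_l += self.alpha * delta * nat[self.n:]
        self.v += self.beta * delta * self.e_v
```

Maintaining the inverse Fisher matrix with a rank-1 Sherman-Morrison update avoids re-inverting a matrix at every step, which is presumably what makes the incremental variant cheap enough for online learning.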
format Article
id doaj-art-a7c5f9298dbb44828af3d720c4972c3e
institution Kabale University
issn 1000-436X
language zho
publishDate 2017-04-01
publisher Editorial Department of Journal on Communications
record_format Article
series Tongxin xuebao
spelling doaj-art-a7c5f9298dbb44828af3d720c4972c3e2025-01-14T07:12:06ZzhoEditorial Department of Journal on CommunicationsTongxin xuebao1000-436X2017-04-013816617759709336Actor-critic algorithm with incremental dual natural policy gradientPeng ZHANGQuan LIUShan ZHONGJian-wei ZHAIWei-sheng QIANExisting algorithms for continuous action spaces fail to consider how to select the optimal action or how to exploit knowledge of the action space, so an efficient actor-critic algorithm was proposed by improving the natural gradient. The objective of the proposed algorithm is to maximize the expected return. The upper and lower bounds of the action range are weighted to obtain the optimal action, and both bounds are approximated by linear functions. The problem of obtaining the optimal action is thereby transformed into learning two policy parameter vectors. To speed up learning, an incremental Fisher information matrix and eligibility traces for both bounds were designed. Simulation results on three reinforcement learning problems show that, compared with other representative methods for continuous action spaces, the proposed algorithm converges faster and more stably.http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2017089/reinforcement learningnatural gradientactor-criticcontinuous space
spellingShingle Peng ZHANG
Quan LIU
Shan ZHONG
Jian-wei ZHAI
Wei-sheng QIAN
Actor-critic algorithm with incremental dual natural policy gradient
Tongxin xuebao
reinforcement learning
natural gradient
actor-critic
continuous space
title Actor-critic algorithm with incremental dual natural policy gradient
title_full Actor-critic algorithm with incremental dual natural policy gradient
title_fullStr Actor-critic algorithm with incremental dual natural policy gradient
title_full_unstemmed Actor-critic algorithm with incremental dual natural policy gradient
title_short Actor-critic algorithm with incremental dual natural policy gradient
title_sort actor critic algorithm with incremental dual natural policy gradient
topic reinforcement learning
natural gradient
actor-critic
continuous space
url http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2017089/
work_keys_str_mv AT pengzhang actorcriticalgorithmwithincrementaldualnaturalpolicygradient
AT quanliu actorcriticalgorithmwithincrementaldualnaturalpolicygradient
AT shanzhong actorcriticalgorithmwithincrementaldualnaturalpolicygradient
AT jianweizhai actorcriticalgorithmwithincrementaldualnaturalpolicygradient
AT weishengqian actorcriticalgorithmwithincrementaldualnaturalpolicygradient