Actor-critic algorithm with incremental dual natural policy gradient
Existing algorithms for continuous action spaces fail to consider how to select the optimal action or how to exploit knowledge of the action space, so an efficient actor-critic algorithm was proposed by improving the natural gradient. The objective of the proposed algorithm was to maximize the...
Main Authors: | Peng ZHANG, Quan LIU, Shan ZHONG, Jian-wei ZHAI, Wei-sheng QIAN |
---|---|
Format: | Article |
Language: | zho |
Published: | Editorial Department of Journal on Communications, 2017-04-01 |
Series: | Tongxin xuebao |
Subjects: | reinforcement learning; natural gradient; actor-critic; continuous space |
Online Access: | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2017089/ |
_version_ | 1841539528351809536 |
---|---|
author | Peng ZHANG Quan LIU Shan ZHONG Jian-wei ZHAI Wei-sheng QIAN |
author_facet | Peng ZHANG Quan LIU Shan ZHONG Jian-wei ZHAI Wei-sheng QIAN |
author_sort | Peng ZHANG |
collection | DOAJ |
description | Existing algorithms for continuous action spaces fail to consider how to select the optimal action or how to exploit knowledge of the action space, so an efficient actor-critic algorithm was proposed by improving the natural gradient. The objective of the proposed algorithm was to maximize the expected return. The upper and lower bounds of the action range were weighted to obtain the optimal action, and both bounds were approximated by linear functions. The problem of obtaining the optimal action was thereby transformed into learning two policy parameter vectors. To speed up learning, an incremental Fisher information matrix and eligibilities for both bounds were designed. On three reinforcement learning problems, compared with other representative methods for continuous action spaces, simulation results show that the proposed algorithm converges faster and with higher stability. |
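The described method can be sketched in code. The following is an illustrative reconstruction under stated assumptions, not the paper's implementation: the task, features, mixing rule, and all constants are hypothetical. It shows the core ideas the abstract names: an action formed as a weighted combination of learned upper and lower bounds (each linear in state features), a TD critic, and a natural-gradient actor step preconditioned by an incrementally maintained inverse Fisher matrix (rank-1 Sherman-Morrison update).

```python
import numpy as np

# Illustrative sketch of a dual-bound natural actor-critic (assumptions:
# toy 1-D task, polynomial features, uniform mixing weight between bounds).

rng = np.random.default_rng(0)
n = 3                                 # number of state features

def phi(s):
    # toy polynomial features of a scalar state
    return np.array([1.0, s, s * s])

theta = np.zeros(2 * n)               # [lower-bound params | upper-bound params]
w = np.zeros(n)                       # critic weights (linear value function)
F_inv = np.eye(2 * n)                 # running inverse Fisher estimate
alpha, beta, gamma = 0.02, 0.1, 0.9   # actor rate, critic rate, discount

def act(f):
    lo = f @ theta[:n]                # lower action bound, linear in features
    hi = f @ theta[n:]                # upper action bound, linear in features
    k = rng.uniform()                 # random mixing weight between the bounds
    return k * lo + (1.0 - k) * hi, k

s = 0.0
for step in range(2000):
    f = phi(s)
    a, k = act(f)
    # toy dynamics/reward: the best action equals the current state
    r = -(a - s) ** 2
    s_next = float(np.clip(s + rng.normal(scale=0.1), -1.0, 1.0))
    f_next = phi(s_next)

    # TD error and critic update
    delta = r + gamma * (f_next @ w) - (f @ w)
    w += beta * delta * f

    # dual eligibility: the mixing weight splits credit across both bounds
    u = np.concatenate([k * f, (1.0 - k) * f])

    # Sherman-Morrison rank-1 update of the inverse Fisher matrix
    Fu = F_inv @ u
    F_inv -= np.outer(Fu, Fu) / (1.0 + u @ Fu)

    # natural-gradient actor step: precondition the vanilla gradient
    theta += alpha * delta * (F_inv @ u)
    s = s_next
```

The Sherman-Morrison identity lets the inverse Fisher matrix be maintained in O(d²) per step instead of re-inverting in O(d³), which is the point of the "incremental" design the abstract mentions.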
format | Article |
id | doaj-art-a7c5f9298dbb44828af3d720c4972c3e |
institution | Kabale University |
issn | 1000-436X |
language | zho |
publishDate | 2017-04-01 |
publisher | Editorial Department of Journal on Communications |
record_format | Article |
series | Tongxin xuebao |
title | Actor-critic algorithm with incremental dual natural policy gradient |
title_full | Actor-critic algorithm with incremental dual natural policy gradient |
title_fullStr | Actor-critic algorithm with incremental dual natural policy gradient |
title_full_unstemmed | Actor-critic algorithm with incremental dual natural policy gradient |
title_short | Actor-critic algorithm with incremental dual natural policy gradient |
title_sort | actor critic algorithm with incremental dual natural policy gradient |
topic | reinforcement learning natural gradient actor-critic continuous space |
url | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2017089/ |
work_keys_str_mv | AT pengzhang actorcriticalgorithmwithincrementaldualnaturalpolicygradient AT quanliu actorcriticalgorithmwithincrementaldualnaturalpolicygradient AT shanzhong actorcriticalgorithmwithincrementaldualnaturalpolicygradient AT jianweizhai actorcriticalgorithmwithincrementaldualnaturalpolicygradient AT weishengqian actorcriticalgorithmwithincrementaldualnaturalpolicygradient |