Bayesian Q learning method with Dyna architecture and prioritized sweeping

Bibliographic Details
Main Authors:	Jun YU, Quan LIU, Qi-ming FU, Hong-kun SUN, Gui-xing CHEN
Format:	Article
Language:	zho
Published:	Editorial Department of Journal on Communications 2013-11-01
Series:	Tongxin xuebao
Subjects:	reinforcement learning Markov decision process prioritized sweeping Dyna architecture Bayesian Q learning
Online Access:	http://www.joconline.com.cn/zh/article/doi/10.3969/j.issn.1000-436x.2013.11.015/

Description
Summary:	To balance the trade-off between exploration and exploitation, Bayesian Q-learning uses a probability distribution to describe the uncertainty of the Q values and selects actions according to this distribution. However, slow convergence is a major problem for Bayesian Q-learning. To address this problem, a novel Bayesian Q-learning algorithm with Dyna architecture and prioritized sweeping, called Dyna-PS-BayesQL, was proposed. The algorithm consists of two parts. In the learning part, it models the transition function and the reward function from the collected samples and updates the Q-value function by Bayesian Q-learning. In the planning part, it updates the Q-value function with prioritized sweeping and dynamic programming based on the learned model, which makes more efficient use of historical information. Dyna-PS-BayesQL was applied to the chain problem and a maze navigation problem. The results show that the proposed algorithm balances exploration and exploitation well during learning and achieves better convergence performance.
ISSN:	1000-436X
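The two-part structure described in the summary can be sketched roughly as follows. The learning part builds an empirical model from real samples, and the planning part sweeps that model in priority order. This is a minimal Python sketch under stated assumptions, not the authors' Dyna-PS-BayesQL. The class name `DynaPSBayesSketch` and every parameter (`gamma`, `alpha`, `theta`, `n_planning`, `prior_std`) are hypothetical. The Bayesian posterior over Q values is replaced by a simple Gaussian stand-in whose spread shrinks with visit counts, actions are chosen by Q-value sampling, and the real-sample update is a plain TD step.

```python
import heapq
import itertools
import random
from collections import defaultdict


class DynaPSBayesSketch:
    """Hypothetical sketch: Dyna-style learner with prioritized sweeping.

    ASSUMPTIONS: the Q-value posterior is a Gaussian stand-in (mean plus a
    std that shrinks with visits), the real-sample update is a plain TD step,
    and the task is continuing (no terminal states). Not the paper's update.
    """

    def __init__(self, n_actions, gamma=0.95, alpha=0.1, theta=1e-4,
                 n_planning=10, prior_std=1.0, seed=0):
        self.n_actions = n_actions
        self.gamma, self.alpha, self.theta = gamma, alpha, theta
        self.n_planning, self.prior_std = n_planning, prior_std
        self.rng = random.Random(seed)
        self.mean = defaultdict(float)        # belief mean of Q(s, a)
        self.visits = defaultdict(int)        # N(s, a)
        self.next_counts = defaultdict(lambda: defaultdict(int))  # N(s, a, s')
        self.reward_sum = defaultdict(float)  # summed reward observed for (s, a)
        self.predecessors = defaultdict(set)  # s' -> {(s, a) seen to reach s'}
        self.queue = []                       # heap of (-priority, tiebreak, s, a)
        self.tiebreak = itertools.count()

    def std(self, s, a):
        # Stand-in uncertainty: shrinks as (s, a) is visited more often.
        return self.prior_std / (1 + self.visits[(s, a)]) ** 0.5

    def act(self, s):
        # Q-value sampling: draw once per action from its belief, take the best.
        draws = [self.rng.gauss(self.mean[(s, a)], self.std(s, a))
                 for a in range(self.n_actions)]
        return max(range(self.n_actions), key=draws.__getitem__)

    def best(self, s):
        return max(self.mean[(s, b)] for b in range(self.n_actions))

    def model_backup(self, s, a):
        # Full expected backup under the empirical transition/reward model.
        n = self.visits[(s, a)]
        r_hat = self.reward_sum[(s, a)] / n
        future = sum(c / n * self.best(s2)
                     for s2, c in self.next_counts[(s, a)].items())
        return r_hat + self.gamma * future

    def push(self, s, a):
        # Queue (s, a) only if its model backup would change Q by > theta.
        priority = abs(self.model_backup(s, a) - self.mean[(s, a)])
        if priority > self.theta:
            heapq.heappush(self.queue, (-priority, next(self.tiebreak), s, a))

    def observe(self, s, a, r, s2):
        # Learning part: update the model and Q from one real transition.
        self.visits[(s, a)] += 1
        self.next_counts[(s, a)][s2] += 1
        self.reward_sum[(s, a)] += r
        self.predecessors[s2].add((s, a))
        target = r + self.gamma * self.best(s2)
        self.mean[(s, a)] += self.alpha * (target - self.mean[(s, a)])
        self.push(s, a)
        self.plan()

    def plan(self):
        # Planning part: prioritized sweeping over the learned model.
        for _ in range(self.n_planning):
            if not self.queue:
                break
            _, _, s, a = heapq.heappop(self.queue)
            self.mean[(s, a)] = self.model_backup(s, a)
            for ps, pa in self.predecessors[s]:
                self.push(ps, pa)
```

A driving loop would alternate `a = agent.act(s)`, one environment step, and `agent.observe(s, a, r, s2)`. Here `n_planning` caps how many model backups are spent per real step. That cap is where the Dyna idea of reusing historical samples shows up: one real reward can be propagated back to its predecessors without revisiting them.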

Similar Items