Off-Policy Temporal Difference Learning with Bellman Residuals

In reinforcement learning, off-policy temporal difference learning methods have gained significant attention due to their flexibility in utilizing existing data. However, traditional off-policy temporal difference methods often suffer from poor convergence and stability when handling complex problem...

Bibliographic Details
Main Authors:	Shangdong Yang, Dingyuanhao Sun, Xingguo Chen
Format:	Article
Language:	English
Published:	MDPI AG 2024-11-01
Series:	Mathematics
Subjects:	reinforcement learning; value function approximation; stability; off-policy; Bellman residual
Online Access:	https://www.mdpi.com/2227-7390/12/22/3603

Similar Items