Off-Policy Temporal Difference Learning with Bellman Residuals

In reinforcement learning, off-policy temporal difference learning methods have gained significant attention due to their flexibility in utilizing existing data. However, traditional off-policy temporal difference methods often suffer from poor convergence and stability when handling complex problem...

Bibliographic Details
Main Authors: Shangdong Yang, Dingyuanhao Sun, Xingguo Chen
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/12/22/3603