Off-Policy Temporal Difference Learning with Bellman Residuals
In reinforcement learning, off-policy temporal difference learning methods have gained significant attention due to their flexibility in utilizing existing data. However, traditional off-policy temporal difference methods often suffer from poor convergence and stability when handling complex problem...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-11-01 |
Series: | Mathematics |
Online Access: | https://www.mdpi.com/2227-7390/12/22/3603 |