Off-Policy Temporal Difference Learning with Bellman Residuals
In reinforcement learning, off-policy temporal difference learning methods have gained significant attention due to their flexibility in utilizing existing data. However, traditional off-policy temporal difference methods often suffer from poor convergence and instability when handling complex problem...
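As a rough illustration of the two ideas named in the title, and not the authors' actual algorithm, the sketch below contrasts a standard semi-gradient TD(0) update with a residual-gradient update (full gradient of the squared Bellman residual) under linear value-function approximation. All symbols, parameter values, and the toy data are assumptions made for the example.

```python
# Minimal sketch (assumed, not the paper's method): semi-gradient TD(0)
# versus a Bellman-residual-gradient update with linear features.
import numpy as np

def td0_update(w, phi_s, phi_s2, r, alpha=0.1, gamma=0.99):
    """Semi-gradient TD(0): bootstraps on the next-state value
    but does not differentiate through the bootstrap term."""
    delta = r + gamma * (phi_s2 @ w) - (phi_s @ w)  # TD error
    return w + alpha * delta * phi_s

def bellman_residual_update(w, phi_s, phi_s2, r, alpha=0.1, gamma=0.99):
    """Residual-gradient update: descends the full gradient of the
    squared Bellman residual, so the next-state term is included."""
    delta = r + gamma * (phi_s2 @ w) - (phi_s @ w)
    return w + alpha * delta * (phi_s - gamma * phi_s2)

# Toy usage on a single transition (s, r, s') with random features.
rng = np.random.default_rng(0)
w = np.zeros(4)
phi_s, phi_s2 = rng.random(4), rng.random(4)
w = bellman_residual_update(w, phi_s, phi_s2, r=1.0)
```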
| Main Authors | Shangdong Yang, Dingyuanhao Sun, Xingguo Chen |
|---|---|
| Format | Article |
| Language | English |
| Published | MDPI AG, 2024-11-01 |
| Series | Mathematics |
| Online Access | https://www.mdpi.com/2227-7390/12/22/3603 |
Similar Items
- Online Attentive Kernel-Based Off-Policy Temporal Difference Learning
  by: Shangdong Yang, et al.
  Published: (2024-11-01)
- THE OPTIMAL PRINCIPLE OF BELLMAN IN THE PROBLEM OF OPTIMAL MEANS DISTRIBUTION BETWEEN ENTERPRISES FOR THE EXPANSION OF PRODUCTION
  by: A. Tarasenko, et al.
  Published: (2019-11-01)
- Finite Differences on Sparse Grids for Continuous-Time Heterogeneous Agent Models
  by: Jochen Garcke, et al.
  Published: (2025-01-01)
- Study of unusable liquid propellant residues evaporation processes parameters in the tanks of launch vehicle worked-off stage in microgravity
  by: V. I. Trushlyakov, et al.
  Published: (2019-06-01)
- Optimal feedback control of dynamical systems via value-function approximation
  by: Kunisch, Karl, et al.
  Published: (2023-07-01)