Off-Policy Temporal Difference Learning with Bellman Residuals
In reinforcement learning, off-policy temporal difference learning methods have gained significant attention due to their flexibility in utilizing existing data. However, traditional off-policy temporal difference methods often suffer from poor convergence and instability when handling complex problem...
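As a rough illustration of the two ideas named in the title, and not the authors' actual algorithm, the sketch below contrasts a standard semi-gradient TD(0) update with a residual-gradient update (full gradient of the squared Bellman residual) under linear value-function approximation. All symbols, parameter values, and the toy data are assumptions made for the example.

```python
# Minimal sketch (assumed, not the paper's method): semi-gradient TD(0)
# versus a Bellman-residual-gradient update with linear features.
import numpy as np

def td0_update(w, phi_s, phi_s2, r, alpha=0.1, gamma=0.99):
    """Semi-gradient TD(0): bootstraps on the next-state value
    but does not differentiate through the bootstrap term."""
    delta = r + gamma * (phi_s2 @ w) - (phi_s @ w)  # TD error
    return w + alpha * delta * phi_s

def bellman_residual_update(w, phi_s, phi_s2, r, alpha=0.1, gamma=0.99):
    """Residual-gradient update: descends the full gradient of the
    squared Bellman residual, so the next-state term is included."""
    delta = r + gamma * (phi_s2 @ w) - (phi_s @ w)
    return w + alpha * delta * (phi_s - gamma * phi_s2)

# Toy usage on a single transition (s, r, s') with random features.
rng = np.random.default_rng(0)
w = np.zeros(4)
phi_s, phi_s2 = rng.random(4), rng.random(4)
w = bellman_residual_update(w, phi_s, phi_s2, r=1.0)
```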
| Main Authors | Shangdong Yang, Dingyuanhao Sun, Xingguo Chen |
|---|---|
| Format | Article |
| Language | English |
| Published | MDPI AG, 2024-11-01 |
| Series | Mathematics |
| Online Access | https://www.mdpi.com/2227-7390/12/22/3603 |
Similar Items
- Online Attentive Kernel-Based Off-Policy Temporal Difference Learning
  by: Shangdong Yang, et al.
  Published: (2024-11-01)
- THE OPTIMAL PRINCIPLE OF BELLMAN IN THE PROBLEM OF OPTIMAL MEANS DISTRIBUTION BETWEEN ENTERPRISES FOR THE EXPANSION OF PRODUCTION
  by: A. Tarasenko, et al.
  Published: (2019-11-01)
- Finite Differences on Sparse Grids for Continuous-Time Heterogeneous Agent Models
  by: Jochen Garcke, et al.
  Published: (2025-01-01)
- Study of unusable liquid propellant residues evaporation processes parameters in the tanks of launch vehicle worked-off stage in microgravity
  by: V. I. Trushlyakov, et al.
  Published: (2019-06-01)
- Optimal feedback control of dynamical systems via value-function approximation
  by: Kunisch, Karl, et al.
  Published: (2023-07-01)