Nonstationary Stochastic Bandits: UCB Policies and Minimax Regret
We study the nonstationary stochastic Multi-Armed Bandit (MAB) problem, in which the reward distributions associated with the arms are time-varying and the total variation in the expected rewards is subject to a variation budget. The regret of a policy is defined by the difference in the expected cumulative reward obtained using the policy and using an oracle that selects the arm with the maximum mean reward at each time.
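For context, the variation-budget model constrains how much the arm means mu_k(t) may drift over the horizon T, typically via a bound of the form sum_{t=1}^{T-1} max_k |mu_k(t+1) - mu_k(t)| <= V_T. Policies for this setting usually forget stale data, for example by restarting or by estimating means over a sliding window. The sketch below is a minimal sliding-window UCB heuristic run on a simulated two-armed instance whose better arm switches mid-horizon; it only illustrates the windowed-estimate idea and is not the specific policies analyzed in this article. The window length, exploration constant, horizon, and reward instance are all illustrative assumptions.

```python
import numpy as np

def sliding_window_ucb(rewards, window, c=2.0):
    """Run a sliding-window UCB heuristic on a T x K reward matrix.

    rewards[t, k] is the (pre-drawn) reward of arm k at time t.
    Only the last `window` plays inform each arm's estimate, which
    lets the policy track slowly drifting means. The per-step window
    scan is unoptimized (O(T * window)); fine for a sketch.
    """
    T, K = rewards.shape
    history = []          # (arm, reward) pairs, most recent last
    total = 0.0
    for t in range(T):
        if t < K:
            arm = t       # play each arm once to initialize
        else:
            counts = np.zeros(K)
            sums = np.zeros(K)
            for a, r in history[-window:]:
                counts[a] += 1
                sums[a] += r
            ucb = np.full(K, np.inf)   # arms unseen in the window get priority
            played = counts > 0
            ucb[played] = (sums[played] / counts[played]
                           + np.sqrt(c * np.log(min(t, window)) / counts[played]))
            arm = int(np.argmax(ucb))
        r = rewards[t, arm]
        history.append((arm, r))
        total += r
    return total

# Illustrative two-armed Bernoulli instance: the better arm switches at T/2.
rng = np.random.default_rng(0)
T = 10_000
means = np.where(np.arange(T)[:, None] < T // 2, [0.6, 0.4], [0.4, 0.6])
rewards = rng.binomial(1, means).astype(float)
print("collected reward:", sliding_window_ucb(rewards, window=500))
```

The window length controls a bias-variance trade-off: shorter windows track drift faster but estimate the means more noisily, which is why analyses in this literature tune such parameters to the variation budget.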
| Main Authors: | Lai Wei, Vaibhav Srivastava |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Open Journal of Control Systems |
| Online Access: | https://ieeexplore.ieee.org/document/10460198/ |
Similar Items
- Thompson Sampling for Non-Stationary Bandit Problems
  by: Han Qi, et al.
  Published: (2025-01-01)
- A Double-Edged Sword: Bandits in the Javanese Revolution: Foes or Friends?
  by: Anung Jati Nugraha Mukti
  Published: (2024-07-01)
- Manipulation Game Considering No-Regret Strategies
  by: Julio B. Clempner
  Published: (2025-01-01)
- Formal Verification of Multi-Thread Minimax Behavior Using mCRL2 in the Connect 4
  by: Diego Escobar, et al.
  Published: (2024-12-01)
- Illustration with the Emotion of Grief and Regret in Poems of Siavash Kasraei
  by: Mohammadamin Ehsani Estahbanati, et al.
  Published: (2018-08-01)