Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees



Bibliographic Details
Main Author:	Ali Baheri
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Mathematics
Subjects:	multi-armed bandit; constrained optimization; decision making under uncertainty
Online Access:	https://www.mdpi.com/2227-7390/13/1/149

Description
Summary:	The multi-armed bandit (MAB) problem is a foundational model for sequential decision-making under uncertainty. While MAB has proven valuable in applications such as clinical trials and online advertising, traditional formulations have limitations; specifically, they struggle to handle three key real-world scenarios: (1) when decisions must follow a hierarchical structure (as in autonomous systems where high-level strategy guides low-level actions); (2) when there are constraints at multiple levels of decision-making (such as both system-wide and component-level resource limits); and (3) when available actions depend on previous choices or context. To address these challenges, we introduce the hierarchical constrained bandits (HCB) framework, which extends contextual bandits to incorporate both hierarchical decisions and multilevel constraints. We propose the HC-UCB (hierarchical constrained upper confidence bound) algorithm to solve the HCB problem. The algorithm uses confidence bounds within a hierarchical setting to balance exploration and exploitation while respecting constraints at all levels. Our theoretical analysis establishes that HC-UCB achieves sublinear regret, guarantees constraint satisfaction at all hierarchical levels, and is near-optimal in terms of achievable performance. Simple experimental results demonstrate the algorithm’s effectiveness in balancing reward maximization with constraint satisfaction.
ISSN:	2227-7390

