Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees

The multi-armed bandit (MAB) problem is a foundational model for sequential decision-making under uncertainty. While MAB has proven valuable in applications such as clinical trials and online advertising, traditional formulations have limitations; specifically, they struggle to handle three key real...

Full description

Saved in:

Bibliographic Details
Main Author:	Ali Baheri
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Mathematics
Subjects:	multi-armed bandit constrained optimization decision making under uncertainty
Online Access:	https://www.mdpi.com/2227-7390/13/1/149
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1841549186425683968
author	Ali Baheri
author_facet	Ali Baheri
author_sort	Ali Baheri
collection	DOAJ
description	The multi-armed bandit (MAB) problem is a foundational model for sequential decision-making under uncertainty. While MAB has proven valuable in applications such as clinical trials and online advertising, traditional formulations have limitations; specifically, they struggle to handle three key real-world scenarios: (1) when decisions must follow a hierarchical structure (as in autonomous systems where high-level strategy guides low-level actions); (2) when there are constraints at multiple levels of decision-making (such as both system-wide and component-level resource limits); and (3) when available actions depend on previous choices or context. To address these challenges, we introduce the hierarchical constrained bandits (HCB) framework, which extends contextual bandits to incorporate both hierarchical decisions and multilevel constraints. We propose the HC-UCB (hierarchical constrained upper confidence bound) algorithm to solve the HCB problem. The algorithm uses confidence bounds within a hierarchical setting to balance exploration and exploitation while respecting constraints at all levels. Our theoretical analysis establishes that HC-UCB achieves sublinear regret, guarantees constraint satisfaction at all hierarchical levels, and is near-optimal in terms of achievable performance. Simple experimental results demonstrate the algorithm’s effectiveness in balancing reward maximization with constraint satisfaction.
format	Article
id	doaj-art-ead94de45dd24df28af3f6ba936f535a
institution	Kabale University
issn	2227-7390
language	English
publishDate	2025-01-01
publisher	MDPI AG
record_format	Article
series	Mathematics
spelling	doaj-art-ead94de45dd24df28af3f6ba936f535a2025-01-10T13:18:25ZengMDPI AGMathematics2227-73902025-01-0113114910.3390/math13010149Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety GuaranteesAli Baheri0Department of Mechanical Engineering, Rochester Institute of Technology, Rochester, NY 14623, USAThe multi-armed bandit (MAB) problem is a foundational model for sequential decision-making under uncertainty. While MAB has proven valuable in applications such as clinical trials and online advertising, traditional formulations have limitations; specifically, they struggle to handle three key real-world scenarios: (1) when decisions must follow a hierarchical structure (as in autonomous systems where high-level strategy guides low-level actions); (2) when there are constraints at multiple levels of decision-making (such as both system-wide and component-level resource limits); and (3) when available actions depend on previous choices or context. To address these challenges, we introduce the hierarchical constrained bandits (HCB) framework, which extends contextual bandits to incorporate both hierarchical decisions and multilevel constraints. We propose the HC-UCB (hierarchical constrained upper confidence bound) algorithm to solve the HCB problem. The algorithm uses confidence bounds within a hierarchical setting to balance exploration and exploitation while respecting constraints at all levels. Our theoretical analysis establishes that HC-UCB achieves sublinear regret, guarantees constraint satisfaction at all hierarchical levels, and is near-optimal in terms of achievable performance. Simple experimental results demonstrate the algorithm’s effectiveness in balancing reward maximization with constraint satisfaction.https://www.mdpi.com/2227-7390/13/1/149multi-armed banditconstrained optimizationdecision making under uncertainty
spellingShingle	Ali Baheri Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees Mathematics multi-armed bandit constrained optimization decision making under uncertainty
title	Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
title_full	Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
title_fullStr	Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
title_full_unstemmed	Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
title_short	Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
title_sort	multilevel constrained bandits a hierarchical upper confidence bound approach with safety guarantees
topic	multi-armed bandit constrained optimization decision making under uncertainty
url	https://www.mdpi.com/2227-7390/13/1/149
work_keys_str_mv	AT alibaheri multilevelconstrainedbanditsahierarchicalupperconfidenceboundapproachwithsafetyguarantees

Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees

Similar Items