Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees

The multi-armed bandit (MAB) problem is a foundational model for sequential decision-making under uncertainty. While MAB has proven valuable in applications such as clinical trials and online advertising, traditional formulations have limitations; specifically, they struggle to handle three key real...

Full description

Saved in:
Bibliographic Details
Main Author: Ali Baheri
Format: Article
Language:English
Published: MDPI AG 2025-01-01
Series:Mathematics
Subjects:
Online Access:https://www.mdpi.com/2227-7390/13/1/149
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1841549186425683968
author Ali Baheri
author_facet Ali Baheri
author_sort Ali Baheri
collection DOAJ
description The multi-armed bandit (MAB) problem is a foundational model for sequential decision-making under uncertainty. While MAB has proven valuable in applications such as clinical trials and online advertising, traditional formulations have limitations; specifically, they struggle to handle three key real-world scenarios: (1) when decisions must follow a hierarchical structure (as in autonomous systems where high-level strategy guides low-level actions); (2) when there are constraints at multiple levels of decision-making (such as both system-wide and component-level resource limits); and (3) when available actions depend on previous choices or context. To address these challenges, we introduce the hierarchical constrained bandits (HCB) framework, which extends contextual bandits to incorporate both hierarchical decisions and multilevel constraints. We propose the HC-UCB (hierarchical constrained upper confidence bound) algorithm to solve the HCB problem. The algorithm uses confidence bounds within a hierarchical setting to balance exploration and exploitation while respecting constraints at all levels. Our theoretical analysis establishes that HC-UCB achieves sublinear regret, guarantees constraint satisfaction at all hierarchical levels, and is near-optimal in terms of achievable performance. Simple experimental results demonstrate the algorithm’s effectiveness in balancing reward maximization with constraint satisfaction.
format Article
id doaj-art-ead94de45dd24df28af3f6ba936f535a
institution Kabale University
issn 2227-7390
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj-art-ead94de45dd24df28af3f6ba936f535a2025-01-10T13:18:25ZengMDPI AGMathematics2227-73902025-01-0113114910.3390/math13010149Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety GuaranteesAli Baheri0Department of Mechanical Engineering, Rochester Institute of Technology, Rochester, NY 14623, USAThe multi-armed bandit (MAB) problem is a foundational model for sequential decision-making under uncertainty. While MAB has proven valuable in applications such as clinical trials and online advertising, traditional formulations have limitations; specifically, they struggle to handle three key real-world scenarios: (1) when decisions must follow a hierarchical structure (as in autonomous systems where high-level strategy guides low-level actions); (2) when there are constraints at multiple levels of decision-making (such as both system-wide and component-level resource limits); and (3) when available actions depend on previous choices or context. To address these challenges, we introduce the hierarchical constrained bandits (HCB) framework, which extends contextual bandits to incorporate both hierarchical decisions and multilevel constraints. We propose the HC-UCB (hierarchical constrained upper confidence bound) algorithm to solve the HCB problem. The algorithm uses confidence bounds within a hierarchical setting to balance exploration and exploitation while respecting constraints at all levels. Our theoretical analysis establishes that HC-UCB achieves sublinear regret, guarantees constraint satisfaction at all hierarchical levels, and is near-optimal in terms of achievable performance. Simple experimental results demonstrate the algorithm’s effectiveness in balancing reward maximization with constraint satisfaction.https://www.mdpi.com/2227-7390/13/1/149multi-armed banditconstrained optimizationdecision making under uncertainty
spellingShingle Ali Baheri
Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
Mathematics
multi-armed bandit
constrained optimization
decision making under uncertainty
title Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
title_full Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
title_fullStr Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
title_full_unstemmed Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
title_short Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
title_sort multilevel constrained bandits a hierarchical upper confidence bound approach with safety guarantees
topic multi-armed bandit
constrained optimization
decision making under uncertainty
url https://www.mdpi.com/2227-7390/13/1/149
work_keys_str_mv AT alibaheri multilevelconstrainedbanditsahierarchicalupperconfidenceboundapproachwithsafetyguarantees