Dual-timescale hierarchical MADDPG for Multi-UAV cooperative search

Bibliographic Details
Main Authors: Jiancheng Liu, Siwen Wei, Bo Li, Tuo Wang, Wanlong Qi, Xingye Han, Gang Hou, Ke Li, Yuqing Lin, Dingrui Xue, Kexin Wang
Format: Article
Language: English
Published: Springer 2025-07-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects: Multi-agent reinforcement learning; Markov decision process; Multi-UAV cooperative search
Online Access: https://doi.org/10.1007/s44443-025-00156-6
collection DOAJ
description Abstract: Cooperative exploration conducted by multiple unmanned aerial vehicles (UAVs) facilitates parallelized reconnaissance over expansive territories, thereby optimizing the efficiency of target localization. This study investigates the challenge of coordinated search for sparsely located, initially undiscovered stationary targets by a fleet of UAVs constrained by limited perceptual capabilities. Effective resolution of this issue is pivotal for attaining rapid situational awareness in expansive, time-sensitive missions such as disaster mitigation and strategic intelligence gathering. Nonetheless, prevailing methodologies for multi-UAV search frequently encounter limitations in concurrently achieving exhaustive spatial coverage and elevated target acquisition efficacy. To overcome these deficiencies, this study introduces a dual-timescale hierarchical reinforcement learning paradigm tailored for collaborative multi-UAV search missions. The proposed Dual-Timescale Hierarchical Multi-Agent Deep Deterministic Policy Gradient (DTH-MADDPG) architecture incorporates a high-level strategic controller and an array of low-level decentralized agents, thereby enabling temporally stratified policy optimization. This framework facilitates a more nuanced equilibrium between macro-scale environmental coverage and micro-scale target identification than monolithic architectures. Empirical evaluations within simulated operational environments reveal that DTH-MADDPG markedly surpasses contemporary benchmark algorithms, demonstrating superior scalability, accelerated convergence rates, and heightened resilience.
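The abstract describes a high-level controller and low-level decentralized agents operating on two timescales, but this record gives no algorithmic detail. The sketch below is only a toy illustration of that control structure, not the authors' DTH-MADDPG: there is no learning, the high-level "policy" is a random placeholder, and every name and parameter (`run_episode`, `step_toward`, `k`, `grid`) is a hypothetical chosen for the example.

```python
import random


def step_toward(p, g):
    """Move one grid cell from p toward goal g (x first, then y)."""
    (x, y), (gx, gy) = p, g
    if x != gx:
        return (x + (1 if gx > x else -1), y)
    if y != gy:
        return (x, y + (1 if gy > y else -1))
    return p


def run_episode(n_agents=3, grid=10, horizon=40, k=5, seed=0):
    """Toy dual-timescale loop: goals change every k steps, moves every step."""
    rng = random.Random(seed)
    pos = [(0, 0)] * n_agents          # all UAVs start in one corner
    visited = {(0, 0)}                 # cells covered so far
    goals = list(pos)
    goal_updates = 0
    for t in range(horizon):
        if t % k == 0:
            # Slow timescale: the high-level controller assigns each agent
            # a goal cell. A random choice stands in for a learned policy.
            goals = [(rng.randrange(grid), rng.randrange(grid))
                     for _ in range(n_agents)]
            goal_updates += 1
        # Fast timescale: each agent takes one local step toward its goal.
        pos = [step_toward(p, g) for p, g in zip(pos, goals)]
        visited.update(pos)
    return goal_updates, len(visited) / grid ** 2


if __name__ == "__main__":
    updates, coverage = run_episode()
    print(f"high-level updates: {updates}, coverage: {coverage:.2f}")
```

With the default `horizon=40` and `k=5`, the high-level branch fires at steps 0, 5, ..., 35, while the low-level step runs on every iteration. In a learned system both levels would be trained policies; here they are fixed rules so that the timing structure is easy to see.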
id doaj-art-b2c9f903656c4a429178b4bdc9ff124d
institution Kabale University
issn 1319-1578
2213-1248
spelling doaj-art-b2c9f903656c4a429178b4bdc9ff124d (record timestamp 2025-08-20T04:02:41Z)
Journal of King Saud University: Computer and Information Sciences, volume 37, issue 6, pages 1-17 (2025-07-01)
Jiancheng Liu (Northwest Institute of Mechanical and Electrical Engineering)
Siwen Wei (School of Computer Science and Technology, Xidian University)
Bo Li (Northwest Institute of Mechanical and Electrical Engineering)
Tuo Wang (Northwest Institute of Mechanical and Electrical Engineering)
Wanlong Qi (Northwest Institute of Mechanical and Electrical Engineering)
Xingye Han (Northwest Institute of Mechanical and Electrical Engineering)
Gang Hou (Northwest Institute of Mechanical and Electrical Engineering)
Ke Li (Northwest Institute of Mechanical and Electrical Engineering)
Yuqing Lin (School of Information Engineering, Chang’an University)
Dingrui Xue (School of Information Engineering, Chang’an University)
Kexin Wang (School of Computer Science and Technology, Xidian University)
topic Multi-agent reinforcement learning
Markov decision process
Multi-UAV cooperative search