Dual-timescale hierarchical MADDPG for Multi-UAV cooperative search

Bibliographic Details
Main Authors: Jiancheng Liu, Siwen Wei, Bo Li, Tuo Wang, Wanlong Qi, Xingye Han, Gang Hou, Ke Li, Yuqing Lin, Dingrui Xue, Kexin Wang
Format: Article
Language: English
Published: Springer 2025-07-01
Series: Journal of King Saud University: Computer and Information Sciences
Online Access: https://doi.org/10.1007/s44443-025-00156-6
Description
Summary: Cooperative exploration by multiple unmanned aerial vehicles (UAVs) enables parallel reconnaissance over large areas and thereby makes target localization more efficient. This study addresses the coordinated search for sparse, initially undiscovered stationary targets by a fleet of UAVs with limited sensing capabilities. Solving this problem is key to gaining rapid situational awareness in large-scale, time-sensitive missions such as disaster mitigation and strategic intelligence gathering. However, existing multi-UAV search methods often struggle to achieve thorough spatial coverage and high target-acquisition efficiency at the same time. To address these shortcomings, this study proposes a dual-timescale hierarchical reinforcement learning framework for collaborative multi-UAV search. The proposed Dual-Timescale Hierarchical Multi-Agent Deep Deterministic Policy Gradient (DTH-MADDPG) architecture combines a high-level strategic controller with a set of decentralized low-level agents, so that policies are optimized on two distinct timescales. Compared with monolithic (single-level) architectures, this design strikes a better balance between macro-scale environmental coverage and micro-scale target identification. In simulated operational environments, DTH-MADDPG markedly outperforms contemporary benchmark algorithms, showing better scalability, faster convergence, and greater resilience.
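The summary describes the DTH-MADDPG split only at a high level: a strategic controller acting on a slow timescale and decentralized agents acting on a fast one. The sketch below illustrates just that two-timescale control-loop structure, under assumptions that are not from the paper: a grid-world search area, a high-level controller that re-plans every `K` low-level steps, and simple greedy rules standing in for the learned high-level policy and the MADDPG low-level actors. No learning, critic, or reward is implemented, and every name here (`GRID`, `CELL`, `K`, `high_level`, `low_level`, `run`) is hypothetical.

```python
from dataclasses import dataclass

GRID = 8  # hypothetical: search area discretized into GRID x GRID cells
CELL = 4  # hypothetical: high level reasons over CELL x CELL-cell regions
K = 5     # hypothetical: high level acts once every K low-level steps


@dataclass
class UAV:
    pos: tuple[int, int]
    goal: tuple[int, int] = (0, 0)  # sub-goal cell set by the high level


def regions():
    """Yield (top-left corner, list of cells) for each CELL x CELL region."""
    for r0 in range(0, GRID, CELL):
        for c0 in range(0, GRID, CELL):
            cells = [(r, c) for r in range(r0, r0 + CELL) for c in range(c0, c0 + CELL)]
            yield (r0, c0), cells


def same_region(a, b):
    """True if cells a and b fall in the same high-level region."""
    return a[0] // CELL == b[0] // CELL and a[1] // CELL == b[1] // CELL


def high_level(uavs, visited):
    """Slow timescale: send each UAV to the centre of a least-visited region.

    Greedy placeholder for a learned strategic controller (not RL).
    """
    ranked = sorted(regions(), key=lambda rc: sum(cell in visited for cell in rc[1]))
    for uav, ((r0, c0), _) in zip(uavs, ranked):
        uav.goal = (r0 + CELL // 2, c0 + CELL // 2)


def step_toward(pos, goal):
    """Move one cell toward goal (rows first, then columns)."""
    r, c = pos
    gr, gc = goal
    if r != gr:
        return (r + (1 if gr > r else -1), c)
    if c != gc:
        return (r, c + (1 if gc > c else -1))
    return pos


def low_level(uav, visited):
    """Fast timescale: greedy one-cell move (placeholder for a learned actor).

    Prefer an unvisited neighbour inside the assigned region; otherwise
    head toward the sub-goal cell.
    """
    r, c = uav.pos
    for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if (0 <= nxt[0] < GRID and 0 <= nxt[1] < GRID
                and nxt not in visited and same_region(nxt, uav.goal)):
            uav.pos = nxt
            return
    uav.pos = step_toward(uav.pos, uav.goal)


def run(steps=40, starts=((0, 0), (7, 7))):
    """Return the fraction of cells visited after `steps` low-level steps."""
    uavs = [UAV(pos=p) for p in starts]
    visited = {u.pos for u in uavs}
    for t in range(steps):
        if t % K == 0:        # slow timescale: re-assign regions
            high_level(uavs, visited)
        for uav in uavs:      # fast timescale: every UAV moves each step
            low_level(uav, visited)
            visited.add(uav.pos)
    return len(visited) / (GRID * GRID)


print(f"coverage after 40 steps: {run(40):.2f}")
```

The point of the sketch is the cadence: `high_level` runs only when `t % K == 0`, while `low_level` runs for every UAV at every step. In a learned version, both placeholder functions would be replaced by trained policies.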
ISSN: 1319-1578, 2213-1248