Multi-UAV Path Planning for Air-Ground Relay Communication Based on Mix-Greedy MAPPO Algorithm

Bibliographic Details
Main Authors: Yiquan Wang, Yan Cui, Yu Yang, Zhaodong Li, Xing Cui
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Drones
Online Access: https://www.mdpi.com/2504-446X/8/12/706
collection DOAJ
description As UAV and communication technologies advance, UAV-to-ground communication relay has become an active research topic. This paper proposes a Multi-Agent Reinforcement Learning (MARL) method that combines the ε-greedy strategy with the multi-agent proximal policy optimization (MAPPO) algorithm to address the local optimum problem and to improve the communication efficiency and task execution capability of UAV cluster control. The paper studies path planning for multi-UAV-to-ground relay communication, focusing on the application of the proposed Mix-Greedy MAPPO algorithm. The state space, action space, communication model, training environment, and reward function are designed around the actual task and entity characteristics, including safe distance, no-fly zones, survival in a threatened environment, and energy consumption. The results show that, compared with other algorithms on the multi-UAV ground communication relay path planning task, Mix-Greedy MAPPO significantly improves communication probability, reduces energy consumption, avoids no-fly zones, and facilitates exploration. After training for the same number of steps, Mix-Greedy MAPPO achieves an average reward score 45.9% higher than that of MAPPO and several times higher than those of the multi-agent soft actor-critic (MASAC) and multi-agent deep deterministic policy gradient (MADDPG) algorithms. The experimental results verify the superiority and adaptability of the algorithm in complex environments.
id doaj-art-b6dd1a83ee084bceb8d5dc4bb67b8201
institution Kabale University
issn 2504-446X
doi 10.3390/drones8120706
volume 8
issue 12
article_number 706
affiliation China North Artificial Intelligence and Innovation Research Institute, Beijing 100072, China (listed for all five authors)
topic path planning
UAV
relay communication
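
The abstract states that the method combines the ε-greedy strategy with MAPPO but does not say how the two are mixed. The following is only a generic sketch of one possible reading, not the authors' implementation: with probability ε an agent takes a uniformly random action, and otherwise it samples from its policy's action distribution. The function names `mix_greedy_action` and `linear_epsilon`, the discrete action space, the linear decay schedule, and the default values are all assumptions made for illustration.

```python
import random


def mix_greedy_action(action_probs, epsilon, rng=random):
    """Pick an action index for one agent (hypothetical helper).

    With probability `epsilon`, explore by choosing uniformly at random.
    Otherwise, sample from the policy's own distribution `action_probs`,
    as a stochastic PPO-style policy would.
    """
    n_actions = len(action_probs)
    if rng.random() < epsilon:
        return rng.randrange(n_actions)  # uniform exploration
    # Sample an index in proportion to the policy probabilities.
    return rng.choices(range(n_actions), weights=action_probs, k=1)[0]


def linear_epsilon(step, total_steps, eps_start=0.3, eps_end=0.0):
    """Linearly anneal epsilon from eps_start to eps_end (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.1, 0.7, 0.2]  # illustrative policy output for one UAV
    for step in (0, 50, 100):
        eps = linear_epsilon(step, total_steps=100)
        print(step, round(eps, 3), mix_greedy_action(probs, eps, rng))
```

PPO-style updates assume that actions were sampled from the current policy, so a real implementation would have to decide how to treat the randomly chosen actions during training; the abstract gives no detail on this point.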