Real-Time Policy Optimization for UAV Swarms Based on Evolution Strategies

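Multi-agent decision-making faces many challenges such as non-stationarity and sparse rewards, while the complexity and randomness of the real environment further complicate policy development. This paper addresses the high-dimensional policy optimization problems of unmanned aerial vehicle (UAV) swarms. By modeling the problem scenario as a Markov decision process, a real-time policy optimization algorithm based on evolution strategy (ES) pre-training is proposed. This approach combines decision-time planning with background planning to evaluate and integrate different sets of policy parameters in a temporal context. In the experimental phase, the policy network is trained using both ES and REINFORCE algorithms on a constructed simulation platform. Comparative experiments demonstrate the effectiveness of using ES for policy pre-training. Finally, the proposed real-time policy optimization algorithm further improves the performance of the swarm by approximately 10% in simulations, offering a feasible solution for adversarial games between swarms and extending the research scope of evolutionary algorithms.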

Bibliographic Details
Main Authors: Zeyu Chen, Haiying Liu, Guohua Liu
Format: Article
Language: English
Published: MDPI AG 2024-10-01
Series: Drones
Subjects: UAV swarm; reinforcement learning; evolution strategies; real-time optimization
Online Access: https://www.mdpi.com/2504-446X/8/11/619
author Zeyu Chen
Haiying Liu
Guohua Liu
collection DOAJ
description Multi-agent decision-making faces many challenges such as non-stationarity and sparse rewards, while the complexity and randomness of the real environment further complicate policy development. This paper addresses the high-dimensional policy optimization problems of unmanned aerial vehicle (UAV) swarms. By modeling the problem scenario as a Markov decision process, a real-time policy optimization algorithm based on evolution strategy (ES) pre-training is proposed. This approach combines decision-time planning with background planning to evaluate and integrate different sets of policy parameters in a temporal context. In the experimental phase, the policy network is trained using both ES and REINFORCE algorithms on a constructed simulation platform. Comparative experiments demonstrate the effectiveness of using ES for policy pre-training. Finally, the proposed real-time policy optimization algorithm further improves the performance of the swarm by approximately 10% in simulations, offering a feasible solution for adversarial games between swarms and extending the research scope of evolutionary algorithms.
format Article
id doaj-art-e9b9796d86014c948079ca2d2a6e624c
institution Kabale University
issn 2504-446X
language English
publishDate 2024-10-01
publisher MDPI AG
record_format Article
series Drones
spelling doaj-art-e9b9796d86014c948079ca2d2a6e624c
2024-11-26T18:00:35Z
eng
MDPI AG
Drones
2504-446X
2024-10-01
8(11), 619
10.3390/drones8110619
Real-Time Policy Optimization for UAV Swarms Based on Evolution Strategies
Zeyu Chen (School of Mathematics, Southeast University, Nanjing 211102, China)
Haiying Liu (College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)
Guohua Liu (School of Mathematics, Southeast University, Nanjing 211102, China)
https://www.mdpi.com/2504-446X/8/11/619
UAV swarm
reinforcement learning
evolution strategies
real-time optimization
title Real-Time Policy Optimization for UAV Swarms Based on Evolution Strategies
topic UAV swarm
reinforcement learning
evolution strategies
real-time optimization
url https://www.mdpi.com/2504-446X/8/11/619