A Vision-Based End-to-End Reinforcement Learning Framework for Drone Target Tracking

Drone target tracking, which involves instructing drone movement to follow a moving target, encounters several challenges: (1) traditional methods need accurate state estimation of both the drone and target; (2) conventional Proportional–Derivative (PD) controllers require tedious parameter tuning a...

Bibliographic Details
Main Authors:	Xun Zhao, Xinjian Huang, Jianheng Cheng, Zhendong Xia, Zhiheng Tu
Format:	Article
Language:	English
Published:	MDPI AG 2024-10-01
Series:	Drones
Subjects:	drone target tracking, end to end, reinforcement learning, YOLOv8 detector, BoT-SORT, twin delayed deep deterministic policy gradient
Online Access:	https://www.mdpi.com/2504-446X/8/11/628

author	Xun Zhao, Xinjian Huang, Jianheng Cheng, Zhendong Xia, Zhiheng Tu
author_sort	Xun Zhao
collection	DOAJ
description	Drone target tracking, which involves instructing drone movement to follow a moving target, encounters several challenges: (1) traditional methods need accurate state estimation of both the drone and target; (2) conventional Proportional–Derivative (PD) controllers require tedious parameter tuning and struggle with nonlinear properties; and (3) reinforcement learning methods, though promising, rely on the drone’s self-state estimation, adding complexity and computational load and reducing reliability. To address these challenges, this study proposes an innovative model-free end-to-end reinforcement learning framework, the VTD3 (Vision-Based Twin Delayed Deep Deterministic Policy Gradient), for drone target tracking tasks. This framework focuses on controlling the drone to follow a moving target while maintaining a specific distance. VTD3 is a pure vision-based tracking algorithm which integrates the YOLOv8 detector, the BoT-SORT tracking algorithm, and the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. It diminishes reliance on GPS and other sensors while simultaneously enhancing the tracking capability for complex target motion trajectories. In a simulated environment, we assess the tracking performance of VTD3 across four complex target motion trajectories (triangular, square, sawtooth, and square wave, including scenarios with occlusions). The experimental results indicate that our proposed VTD3 reinforcement learning algorithm substantially outperforms conventional PD controllers in drone target tracking applications. 
Across various target trajectories, the VTD3 algorithm demonstrates a significant reduction in average tracking errors along the X-axis and Y-axis of up to 34.35% and 45.36%, respectively. Additionally, it achieves a notable improvement of up to 66.10% in altitude control precision. In terms of motion smoothness, the VTD3 algorithm markedly enhances performance metrics, with improvements of up to 37.70% in jitter and 60.64% in Jerk RMS. Empirical results verify the superiority and feasibility of our proposed VTD3 framework for drone target tracking.
format	Article
id	doaj-art-ab1f69a6e7364524bc1b6a25a0456cef
institution	Kabale University
issn	2504-446X
language	English
publishDate	2024-10-01
publisher	MDPI AG
record_format	Article
series	Drones
spelling	doaj-art-ab1f69a6e7364524bc1b6a25a0456cef | 2024-11-26T18:00:37Z | eng | MDPI AG | Drones | 2504-446X | 2024-10-01 | 8 | 11 | 628 | 10.3390/drones8110628 | A Vision-Based End-to-End Reinforcement Learning Framework for Drone Target Tracking | Xun Zhao (0); Xinjian Huang (1); Jianheng Cheng (2); Zhendong Xia (3); Zhiheng Tu (4) | (0) School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; (1) School of Cyber Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; (2) School of Sino-French Engineers, Nanjing University of Science and Technology, Nanjing 210094, China; (3) School of Sino-French Engineers, Nanjing University of Science and Technology, Nanjing 210094, China; (4) School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China | https://www.mdpi.com/2504-446X/8/11/628 | drone target tracking; end to end; reinforcement learning; YOLOv8 detector; BoT-SORT; twin delayed deep deterministic policy gradient
title	A Vision-Based End-to-End Reinforcement Learning Framework for Drone Target Tracking
topic	drone target tracking, end to end, reinforcement learning, YOLOv8 detector, BoT-SORT, twin delayed deep deterministic policy gradient
url	https://www.mdpi.com/2504-446X/8/11/628
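
The description reports motion smoothness as "Jerk RMS" and gives improvements as percentages relative to a PD controller, but this record does not define either quantity. The following is a minimal sketch, assuming that jerk is the third time derivative of position estimated by finite differences over uniformly sampled positions, and that an improvement is the relative reduction of a lower-is-better metric against the baseline. The function names, the sample period, and the sample values are hypothetical and are not taken from the paper.

```python
import math


def jerk_rms(positions, dt):
    """RMS of jerk along one axis (assumed definition, not from the paper).

    Jerk is estimated with a third-order forward difference:
    (x[i+3] - 3*x[i+2] + 3*x[i+1] - x[i]) / dt**3
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    if len(positions) < 4:
        raise ValueError("need at least 4 samples to estimate jerk")
    jerks = [
        (positions[i + 3] - 3 * positions[i + 2] + 3 * positions[i + 1] - positions[i])
        / dt ** 3
        for i in range(len(positions) - 3)
    ]
    return math.sqrt(sum(j * j for j in jerks) / len(jerks))


def relative_improvement(baseline, candidate):
    """Percentage reduction of a lower-is-better metric versus a baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - candidate) / baseline


# Hypothetical data: x(t) = t**3 sampled at dt = 1 has constant jerk 6.
cubic_path = [0.0, 1.0, 8.0, 27.0, 64.0]
# Hypothetical data: x(t) = t**2 has zero jerk.
quadratic_path = [0.0, 1.0, 4.0, 9.0, 16.0]

print(jerk_rms(cubic_path, dt=1.0))
print(jerk_rms(quadratic_path, dt=1.0))
print(relative_improvement(baseline=2.0, candidate=1.0))
```

Under these assumptions, a figure such as "60.64% in Jerk RMS" would correspond to `relative_improvement(pd_value, vtd3_value)` computed from the two controllers' Jerk RMS values on the same trajectory; the paper itself should be consulted for the exact definitions used.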

Similar Items