Path Planning of Mobile Robot in Dynamic Obstacle Avoidance Environment Based on Deep Reinforcement Learning
In this study, to address the issues faced by mobile robots in complex environments, such as sparse rewards caused by limited effective experience, slow learning efficiency in the early stages of training, and poor obstacle avoidance in the presence of dynamic obstacles, the authors propose a new path planning algorithm that introduces an Intrinsic Curiosity Module (ICM) and Long Short-Term Memory (LSTM) into the Proximal Policy Optimization (PPO) algorithm (an illustrative code sketch of this combination follows the record below; the full abstract appears in the description field).
Saved in:
| Main Authors: | Qingfeng Zhang, Wenpeng Ma, Qingchun Zheng, Xiaofan Zhai, Wenqian Zhang, Tianchang Zhang, Shuo Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | Intrinsic curiosity module; LSTM network; mobile robot; path planning; reinforcement learning |
| Online Access: | https://ieeexplore.ieee.org/document/10769446/ |
| _version_ | 1846113859271655424 |
|---|---|
| author | Qingfeng Zhang; Wenpeng Ma; Qingchun Zheng; Xiaofan Zhai; Wenqian Zhang; Tianchang Zhang; Shuo Wang |
| author_facet | Qingfeng Zhang; Wenpeng Ma; Qingchun Zheng; Xiaofan Zhai; Wenqian Zhang; Tianchang Zhang; Shuo Wang |
| author_sort | Qingfeng Zhang |
| collection | DOAJ |
| description | In this study, to address the issues faced by mobile robots in complex environments, such as sparse rewards caused by limited effective experience, slow learning efficiency in the early stages of training, and poor obstacle avoidance performance in environments with dynamic obstacles, the authors proposed a new path planning algorithm for mobile robots by introducing an Intrinsic Curiosity Module (ICM) and Long Short-Term Memory (LSTM) into the Proximal Policy Optimization (PPO) algorithm. The ICM provided intrinsic rewards in addition to external rewards, accelerating initial convergence, and the Actor-Critic network was optimized with an LSTM-based neural network to improve the avoidance of dynamic obstacles. Various simulation experiments were then conducted in Gazebo in scenarios featuring both static and dynamic obstacles, with the TurtleBot3 mobile robot used for experimental verification. The experiments demonstrate that, compared to traditional algorithms, the proposed algorithm converges significantly faster in environments with sparse rewards, and the robot reaches more target points within a single episode, indicating a more effective path planning capability. It can also avoid obstacles in various states. Finally, the effectiveness of the algorithm was validated on a physical TurtleBot3 robot in real-world scenarios. Results show that, compared to Deep Deterministic Policy Gradient (DDPG), PPO, LSTM-PPO and ICM-PPO, the success rate of the proposed algorithm in path planning increases by 9%, 7.2%, 4% and 4.8%, respectively, in the most complex simulation environment, and by 20%, 16%, 10% and 12%, respectively, in the physical environment. |
| format | Article |
| id | doaj-art-512df81fc2624bbeb59d1936dc10bbd8 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | Record ID: doaj-art-512df81fc2624bbeb59d1936dc10bbd8 (indexed 2024-12-21T00:01:00Z). Language: eng. Publisher: IEEE. Series: IEEE Access (ISSN 2169-3536). Published 2024-01-01, vol. 12, pp. 189136-189152. DOI: 10.1109/ACCESS.2024.3507016; IEEE document 10769446. Title: Path Planning of Mobile Robot in Dynamic Obstacle Avoidance Environment Based on Deep Reinforcement Learning. Authors: Qingfeng Zhang (https://orcid.org/0009-0004-6895-5353); Wenpeng Ma (https://orcid.org/0000-0002-9714-2544); Qingchun Zheng; Xiaofan Zhai; Wenqian Zhang; Tianchang Zhang; Shuo Wang; all authors are with the Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, China. Abstract: as given in the description field above. Online access: https://ieeexplore.ieee.org/document/10769446/. Keywords: Intrinsic curiosity module; LSTM network; mobile robot; path planning; reinforcement learning. |
| spellingShingle | Qingfeng Zhang; Wenpeng Ma; Qingchun Zheng; Xiaofan Zhai; Wenqian Zhang; Tianchang Zhang; Shuo Wang; Path Planning of Mobile Robot in Dynamic Obstacle Avoidance Environment Based on Deep Reinforcement Learning; IEEE Access; Intrinsic curiosity module; LSTM network; mobile robot; path planning; reinforcement learning |
| title | Path Planning of Mobile Robot in Dynamic Obstacle Avoidance Environment Based on Deep Reinforcement Learning |
| title_full | Path Planning of Mobile Robot in Dynamic Obstacle Avoidance Environment Based on Deep Reinforcement Learning |
| title_fullStr | Path Planning of Mobile Robot in Dynamic Obstacle Avoidance Environment Based on Deep Reinforcement Learning |
| title_full_unstemmed | Path Planning of Mobile Robot in Dynamic Obstacle Avoidance Environment Based on Deep Reinforcement Learning |
| title_short | Path Planning of Mobile Robot in Dynamic Obstacle Avoidance Environment Based on Deep Reinforcement Learning |
| title_sort | path planning of mobile robot in dynamic obstacle avoidance environment based on deep reinforcement learning |
| topic | Intrinsic curiosity module; LSTM network; mobile robot; path planning; reinforcement learning |
| url | https://ieeexplore.ieee.org/document/10769446/ |
| work_keys_str_mv | AT qingfengzhang pathplanningofmobilerobotindynamicobstacleavoidanceenvironmentbasedondeepreinforcementlearning AT wenpengma pathplanningofmobilerobotindynamicobstacleavoidanceenvironmentbasedondeepreinforcementlearning AT qingchunzheng pathplanningofmobilerobotindynamicobstacleavoidanceenvironmentbasedondeepreinforcementlearning AT xiaofanzhai pathplanningofmobilerobotindynamicobstacleavoidanceenvironmentbasedondeepreinforcementlearning AT wenqianzhang pathplanningofmobilerobotindynamicobstacleavoidanceenvironmentbasedondeepreinforcementlearning AT tianchangzhang pathplanningofmobilerobotindynamicobstacleavoidanceenvironmentbasedondeepreinforcementlearning AT shuowang pathplanningofmobilerobotindynamicobstacleavoidanceenvironmentbasedondeepreinforcementlearning |
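The abstract describes ICM-LSTM-PPO only at a high level. As a purely illustrative aid, the following minimal PyTorch sketch shows the two ingredients it names: an LSTM-based Actor-Critic trunk and an Intrinsic Curiosity Module whose forward-model prediction error is added to the external reward. Every class name, layer size, and the curiosity scale `eta` here is an assumption made for this example; the PPO update itself is omitted, and this is not the authors' implementation.

```python
# Illustrative sketch only (not the authors' code): the two components the
# abstract names for ICM-LSTM-PPO. Layer sizes, the curiosity scale eta,
# and all names are assumptions made for this example.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMActorCritic(nn.Module):
    """Actor-Critic whose shared trunk is an LSTM over observation sequences."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.actor = nn.Linear(hidden, act_dim)    # action logits
        self.critic = nn.Linear(hidden, 1)         # state-value estimate

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim); state carries LSTM memory between calls
        out, state = self.lstm(obs_seq, state)
        h = out[:, -1]                             # hidden state at the last step
        return self.actor(h), self.critic(h).squeeze(-1), state

class ICM(nn.Module):
    """Intrinsic Curiosity Module: forward-model prediction error as a bonus reward."""
    def __init__(self, obs_dim: int, act_dim: int, feat: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, feat), nn.ReLU())
        self.forward_model = nn.Sequential(
            nn.Linear(feat + act_dim, feat), nn.ReLU(), nn.Linear(feat, feat))
        self.inverse_model = nn.Sequential(
            nn.Linear(2 * feat, feat), nn.ReLU(), nn.Linear(feat, act_dim))

    @torch.no_grad()
    def intrinsic_reward(self, obs, action_onehot, next_obs, eta: float = 0.01):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        phi_pred = self.forward_model(torch.cat([phi, action_onehot], dim=-1))
        # Curiosity bonus: scaled squared error of the forward model's prediction
        return eta * 0.5 * (phi_pred - phi_next).pow(2).sum(dim=-1)

# Usage: the reward fed to PPO is external + intrinsic
obs_dim, act_dim = 24, 5              # e.g. laser-scan features, discrete actions (assumed)
icm = ICM(obs_dim, act_dim)
ac = LSTMActorCritic(obs_dim, act_dim)
obs, next_obs = torch.randn(8, obs_dim), torch.randn(8, obs_dim)
act = F.one_hot(torch.randint(0, act_dim, (8,)), act_dim).float()
logits, value, mem = ac(obs.unsqueeze(1))   # sequence length 1 for illustration
r_total = torch.randn(8) + icm.intrinsic_reward(obs, act, next_obs)
```

In a full training loop, `r_total` would replace the environment reward in PPO's advantage estimation, and the ICM's forward and inverse models would be trained alongside the policy; none of those hyperparameters or implementation details are given in this record.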