A Hybrid CNN-Transformer Network for Object Detection in Optical Remote Sensing Images: Integrating Local and Global Feature Fusion
Object detection in remote sensing images (RSIs) is important for natural disaster management, urban planning, and resource exploration. However, because RSIs differ substantially from natural images (NIs), most existing object detectors designed for NIs cannot be applied directly to RSIs. Most...
| Main Authors: | Youxiang Huang, Donglai Jiao, Xingru Huang, Tiantian Tang, Guan Gui |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
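| Subjects: | Convolutional neural networks (CNNs); feature fusion; local and global attention (LGA); optical remote sensing images (RSIs); vision transformer |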
| Online Access: | https://ieeexplore.ieee.org/document/10721373/ |
| _version_ | 1846123604697153536 |
|---|---|
| author | Youxiang Huang; Donglai Jiao; Xingru Huang; Tiantian Tang; Guan Gui |
| author_sort | Youxiang Huang |
| collection | DOAJ |
| description | Object detection in remote sensing images (RSIs) is important for natural disaster management, urban planning, and resource exploration. However, because RSIs differ substantially from natural images (NIs), most existing object detectors designed for NIs cannot be applied directly to RSIs. Most existing models based on convolutional neural networks (CNNs) require additional, specifically designed attention modules to relate small targets in RSIs to global positional relationships. In contrast, transformer-based models must add modules to capture more detailed information. These additions impose extra computational overhead for deployment on edge devices. To address this problem, we propose a hybrid CNN and transformer model (DConvTrans-LKA) that enhances the model's ability to acquire features, and we design a local and global attention mechanism to fuse local features with global location information. To better fuse the feature and location information extracted by the model, we introduce a feature residual pyramid network that enhances the model's ability to fuse multiscale feature maps. Finally, we conduct experiments on three representative optical RSI datasets (NWPU VHR-10, HRRSD, and DIOR) to verify the effectiveness of the proposed DConvTrans-LKA method. The experimental results show that the proposed method reaches 61.7%, 82.1%, and 61.3% mAP at 0.5 on these datasets, respectively, demonstrating its potential for RSI object detection tasks. |
| format | Article |
| id | doaj-art-075bf58b679d40af9b13bf3cadb5b2a4 |
| institution | Kabale University |
| issn | 1939-1404; 2151-1535 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| spelling | doaj-art-075bf58b679d40af9b13bf3cadb5b2a4; 2024-12-14T00:00:11Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; ISSN 1939-1404, 2151-1535; 2025-01-01; vol. 18, pp. 241-254; DOI 10.1109/JSTARS.2024.3483253; article no. 10721373; A Hybrid CNN-Transformer Network for Object Detection in Optical Remote Sensing Images: Integrating Local and Global Feature Fusion; Youxiang Huang (https://orcid.org/0009-0000-0837-139X), College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China; Donglai Jiao (https://orcid.org/0000-0003-4578-2715), College of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, China; Xingru Huang, College of Communication Engineering, Hangzhou Dianzi University, Hangzhou, China; Tiantian Tang (https://orcid.org/0000-0002-8596-1227), College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China; Guan Gui (https://orcid.org/0000-0003-3888-2881), College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China; https://ieeexplore.ieee.org/document/10721373/; Convolutional neural networks (CNNs), feature fusion, local and global attention (LGA), optical remote sensing images (RSIs), vision transformer |
| title | A Hybrid CNN-Transformer Network for Object Detection in Optical Remote Sensing Images: Integrating Local and Global Feature Fusion |
| topic | Convolutional neural networks (CNNs); feature fusion; local and global attention (LGA); optical remote sensing images (RSIs); vision transformer |
| url | https://ieeexplore.ieee.org/document/10721373/ |