A Hybrid CNN-Transformer Network for Object Detection in Optical Remote Sensing Images: Integrating Local and Global Feature Fusion

Object detection in remote sensing images (RSIs) is important for natural disaster management, urban planning, and resource exploration. However, because RSIs differ substantially from natural images (NIs), most existing object detectors designed for NIs cannot be applied directly to RSIs. Most...

Bibliographic Details
Main Authors: Youxiang Huang, Donglai Jiao, Xingru Huang, Tiantian Tang, Guan Gui
Format: Article
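Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Convolutional neural networks (CNNs); feature fusion; local and global attention (LGA); optical remote sensing images (RSIs); vision transformer
Online Access: https://ieeexplore.ieee.org/document/10721373/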
author Youxiang Huang
Donglai Jiao
Xingru Huang
Tiantian Tang
Guan Gui
collection DOAJ
description Object detection in remote sensing images (RSIs) is important for natural disaster management, urban planning, and resource exploration. However, because RSIs differ substantially from natural images (NIs), most existing object detectors designed for NIs cannot be applied directly to RSIs. Most existing models based on convolutional neural networks (CNNs) require additional, purpose-built attention modules to relate small targets in RSIs to global positional relationships. In contrast, transformer-based models must add modules to capture finer detail. Both approaches impose additional computational overhead for deployment on edge devices. To address this problem, we propose a hybrid CNN and transformer model (DConvTrans-LKA) that strengthens feature extraction, and we design a local and global attention mechanism that fuses local features with global location information. To better fuse the feature and location information extracted by the model, we introduce a feature residual pyramid network that improves the fusion of multiscale feature maps. Finally, we conduct experiments on three representative optical RSI datasets (NWPU VHR-10, HRRSD, and DIOR) to verify the effectiveness of the proposed DConvTrans-LKA method. The experimental results show that the proposed method reaches an mAP at 0.5 of 61.7%, 82.1%, and 61.3% on these datasets, respectively, demonstrating its potential for RSI object detection tasks.
format Article
id doaj-art-075bf58b679d40af9b13bf3cadb5b2a4
institution Kabale University
issn 1939-1404
2151-1535
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling doaj-art-075bf58b679d40af9b13bf3cadb5b2a4
Record timestamp: 2024-12-14T00:00:11Z
Language: eng
Publisher: IEEE
Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ISSN: 1939-1404, 2151-1535
Publication date: 2025-01-01
Volume: 18
Pages: 241-254
DOI: 10.1109/JSTARS.2024.3483253
Document number: 10721373
Authors (index, ORCID):
Youxiang Huang (0) https://orcid.org/0009-0000-0837-139X
Donglai Jiao (1) https://orcid.org/0000-0003-4578-2715
Xingru Huang (2)
Tiantian Tang (3) https://orcid.org/0000-0002-8596-1227
Guan Gui (4) https://orcid.org/0000-0003-3888-2881
Affiliations (in source order):
College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
College of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, China
College of Communication Engineering, Hangzhou Dianzi University, Hangzhou, China
College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
title A Hybrid CNN-Transformer Network for Object Detection in Optical Remote Sensing Images: Integrating Local and Global Feature Fusion
topic Convolutional neural networks (CNNs)
feature fusion
local and global attention (LGA)
optical remote sensing images (RSIs)
vision transformer
url https://ieeexplore.ieee.org/document/10721373/