A Hybrid CNN-Transformer Network for Object Detection in Optical Remote Sensing Images: Integrating Local and Global Feature Fusion

Object detection in remote sensing images (RSIs) is important in natural disaster management, urban planning, and resource exploration. However, due to the large differences between RSIs and natural images (NIs), most existing object detectors for NIs cannot be directly applied to RSIs. Most...

Bibliographic Details
Main Authors:	Youxiang Huang, Donglai Jiao, Xingru Huang, Tiantian Tang, Guan Gui
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:	Convolutional neural networks (CNNs); feature fusion; local and global attention (LGA); optical remote sensing images (RSIs); vision transformer
Online Access:	https://ieeexplore.ieee.org/document/10721373/

_version_	1846123604697153536
author	Youxiang Huang, Donglai Jiao, Xingru Huang, Tiantian Tang, Guan Gui
author_sort	Youxiang Huang
collection	DOAJ
description	Object detection in remote sensing images (RSIs) is important in natural disaster management, urban planning, and resource exploration. However, due to the large differences between RSIs and natural images (NIs), most existing object detectors for NIs cannot be directly applied to RSIs. Most existing models based on convolutional neural networks (CNNs) require additional, specifically designed attention modules to relate small targets in RSIs to global positional relationships. In contrast, transformer-based models must add modules to obtain more detailed information. This imposes additional computational overhead for deployment on edge devices. To solve this problem, we propose a hybrid CNN and transformer model (DConvTrans-LKA) that enhances the model's ability to acquire features, and we design a local and global attention mechanism that fuses local features with global location information. To better fuse the feature and location information extracted by the model, we introduce a feature residual pyramid network that enhances the model's ability to fuse multiscale feature maps. Finally, we conduct experiments on three representative optical RSI datasets (NWPU VHR-10, HRRSD, and DIOR) to verify the effectiveness of the proposed DConvTrans-LKA method. The experimental results show that the proposed method reaches an mAP at 0.5 of 61.7%, 82.1%, and 61.3%, respectively, further demonstrating its potential in RSI object detection tasks.
format	Article
id	doaj-art-075bf58b679d40af9b13bf3cadb5b2a4
institution	Kabale University
issn	1939-1404 2151-1535
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling	doaj-art-075bf58b679d40af9b13bf3cadb5b2a4; 2024-12-14T00:00:11Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; 1939-1404; 2151-1535; 2025-01-01; 18; 241; 254; 10.1109/JSTARS.2024.3483253; 10721373; A Hybrid CNN-Transformer Network for Object Detection in Optical Remote Sensing Images: Integrating Local and Global Feature Fusion; Youxiang Huang (0) https://orcid.org/0009-0000-0837-139X; Donglai Jiao (1) https://orcid.org/0000-0003-4578-2715; Xingru Huang (2); Tiantian Tang (3) https://orcid.org/0000-0002-8596-1227; Guan Gui (4) https://orcid.org/0000-0003-3888-2881; College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China; College of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, China; College of Communication Engineering, Hangzhou Dianzi University, Hangzhou, China; College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China; College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China; Object detection in remote sensing images (RSIs) is important in natural disaster management, urban planning, and resource exploration. However, due to the large differences between RSIs and natural images (NIs), most existing object detectors for NIs cannot be directly applied to RSIs. Most existing models based on convolutional neural networks (CNNs) require additional, specifically designed attention modules to relate small targets in RSIs to global positional relationships. In contrast, transformer-based models must add modules to obtain more detailed information. This imposes additional computational overhead for deployment on edge devices. To solve this problem, we propose a hybrid CNN and transformer model (DConvTrans-LKA) that enhances the model's ability to acquire features, and we design a local and global attention mechanism that fuses local features with global location information. To better fuse the feature and location information extracted by the model, we introduce a feature residual pyramid network that enhances the model's ability to fuse multiscale feature maps. Finally, we conduct experiments on three representative optical RSI datasets (NWPU VHR-10, HRRSD, and DIOR) to verify the effectiveness of the proposed DConvTrans-LKA method. The experimental results show that the proposed method reaches an mAP at 0.5 of 61.7%, 82.1%, and 61.3%, respectively, further demonstrating its potential in RSI object detection tasks.; https://ieeexplore.ieee.org/document/10721373/; Convolutional neural networks (CNNs); feature fusion; local and global attention (LGA); optical remote sensing images (RSIs); vision transformer
title	A Hybrid CNN-Transformer Network for Object Detection in Optical Remote Sensing Images: Integrating Local and Global Feature Fusion
title_sort	hybrid cnn transformer network for object detection in optical remote sensing images integrating local and global feature fusion
topic	Convolutional neural networks (CNNs); feature fusion; local and global attention (LGA); optical remote sensing images (RSIs); vision transformer
url	https://ieeexplore.ieee.org/document/10721373/

Similar Items