Multi-Modal Prototypes for Few-Shot Object Detection in Remote Sensing Images

Few-shot object detection has attracted extensive attention because it avoids time-consuming or even impractical large-scale data labeling. Current studies attempt to employ prototype-matching approaches for object detection, constructing class prototypes from textual or visual features. H...

Bibliographic Details
Main Authors:	Yanxing Liu, Zongxu Pan, Jianwei Yang, Peiling Zhou, Bingchen Zhang
Format:	Article
Language:	English
Published:	MDPI AG 2024-12-01
Series:	Remote Sensing
Subjects:	few-shot learning object detection remote sensing images
Online Access:	https://www.mdpi.com/2072-4292/16/24/4693

collection	DOAJ
description	Few-shot object detection (FSOD) has attracted extensive attention because it avoids time-consuming or even impractical large-scale data labeling. Current studies attempt to employ prototype-matching approaches for object detection, constructing class prototypes from textual or visual features. However, visual prototypes alone exhibit limited generalization in few-shot scenarios, while textual prototypes alone lack the spatial details of remote sensing targets. Therefore, to achieve the best of both worlds, we propose a prototype aggregating module that integrates textual and visual prototypes, leveraging both the semantics of textual prototypes and the spatial details of visual prototypes. In addition, the transferability of multi-modal few-shot detectors from natural scenarios to remote sensing scenarios remains unexplored, and previous training strategies for FSOD do not adequately consider the characteristics of text encoders. To address these issues, we conduct extensive ablation studies on different feature extractors of the detector and propose an efficient two-stage training strategy that takes the characteristics of the text feature extractor into account. Experiments on two common few-shot detection benchmarks demonstrate the effectiveness of our proposed method. On four widely used data splits of DIOR, our method significantly outperforms previous state-of-the-art methods by up to 8.7%.
id	doaj-art-a1461b16dafd44dab3bb9efa8f71fbfc
institution	Kabale University
issn	2072-4292
spelling	doaj-art-a1461b16dafd44dab3bb9efa8f71fbfc | 2024-12-27T14:50:55Z | eng | MDPI AG | Remote Sensing | 2072-4292 | 2024-12-01 | vol. 16, no. 24, art. 4693 | 10.3390/rs16244693 | Multi-Modal Prototypes for Few-Shot Object Detection in Remote Sensing Images | Yanxing Liu, Zongxu Pan, Jianwei Yang, Peiling Zhou, Bingchen Zhang (all: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China) | https://www.mdpi.com/2072-4292/16/24/4693 | few-shot learning; object detection; remote sensing images

Similar Items