Multi-Modal Prototypes for Few-Shot Object Detection in Remote Sensing Images

Few-shot object detection has attracted extensive attention because large-scale data labeling is time-consuming or even impractical. Current studies have attempted to employ prototype-matching approaches for object detection, constructing class prototypes from textual or visual features. H...

Bibliographic Details
Main Authors: Yanxing Liu, Zongxu Pan, Jianwei Yang, Peiling Zhou, Bingchen Zhang
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Remote Sensing
Subjects: few-shot learning; object detection; remote sensing images
Online Access: https://www.mdpi.com/2072-4292/16/24/4693
_version_ 1846102872347901952
author Yanxing Liu
Zongxu Pan
Jianwei Yang
Peiling Zhou
Bingchen Zhang
collection DOAJ
description Few-shot object detection (FSOD) has attracted extensive attention because large-scale data labeling is time-consuming or even impractical. Current studies have attempted to employ prototype-matching approaches for object detection, constructing class prototypes from textual or visual features. However, visual prototypes alone exhibit limited generalization in few-shot scenarios, while textual prototypes alone lack the spatial details of remote sensing targets. Therefore, to achieve the best of both worlds, we propose a prototype aggregating module that integrates textual and visual prototypes, leveraging both the semantics of textual prototypes and the spatial details of visual prototypes. In addition, the transferability of multi-modal few-shot detectors from natural scenarios to remote sensing scenarios remains unexplored, and previous training strategies for FSOD do not adequately consider the characteristics of text encoders. To address these issues, we conduct extensive ablation studies on different feature extractors of the detector and propose an efficient two-stage training strategy that takes the characteristics of the text feature extractor into account. Experiments on two common few-shot detection benchmarks demonstrate the effectiveness of the proposed method. On four widely used data splits of DIOR, our method significantly outperforms previous state-of-the-art methods by up to 8.7%.
format Article
id doaj-art-a1461b16dafd44dab3bb9efa8f71fbfc
institution Kabale University
issn 2072-4292
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj-art-a1461b16dafd44dab3bb9efa8f71fbfc; 2024-12-27T14:50:55Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2024-12-01; vol. 16, no. 24, art. 4693; 10.3390/rs16244693; Multi-Modal Prototypes for Few-Shot Object Detection in Remote Sensing Images; Yanxing Liu, Zongxu Pan, Jianwei Yang, Peiling Zhou, Bingchen Zhang (all: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China); https://www.mdpi.com/2072-4292/16/24/4693; few-shot learning; object detection; remote sensing images
title Multi-Modal Prototypes for Few-Shot Object Detection in Remote Sensing Images
topic few-shot learning
object detection
remote sensing images
url https://www.mdpi.com/2072-4292/16/24/4693