A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions

Bibliographic Details
Main Authors: Ke Zhang, Peijie Li, Jianqiang Wang
Format: Article
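Language: English
Published: MDPI AG 2024-11-01
Series: Remote Sensing
Subjects: remote sensing; image caption; encoder–decoder framework; attention mechanism; reinforcement learning; auxiliary task
Online Access: https://www.mdpi.com/2072-4292/16/21/4113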
author Ke Zhang
Peijie Li
Jianqiang Wang
collection DOAJ
description Remote sensing images contain a wealth of Earth-observation information. Efficient extraction and application of the knowledge hidden in these images will greatly promote the development of resource and environment monitoring, urban planning and other related fields. Remote sensing image caption (RSIC) involves obtaining textual descriptions from remote sensing images by accurately capturing and describing the semantic-level relationships between objects and attributes in the images. However, there is currently no comprehensive review summarizing the progress in RSIC based on deep learning. After defining the scope of the papers to be discussed and summarizing them, the paper reviews recent advancements in RSIC, covering six key aspects: encoder–decoder framework, attention mechanism, reinforcement learning, learning with auxiliary tasks, large visual language models and few-shot learning. Subsequently, a brief explanation of the datasets and evaluation metrics for RSIC is given. Furthermore, the results of the latest models and the pros and cons of different deep learning methods are compared and analyzed. Lastly, future directions of RSIC are suggested. The primary objective of this review is to offer researchers a deeper understanding of RSIC.
format Article
id doaj-art-49299cde34b64a95bf5d971e2860772f
institution Kabale University
issn 2072-4292
language English
publishDate 2024-11-01
publisher MDPI AG
series Remote Sensing
record_timestamp 2024-11-08T14:40:52Z
volume 16
issue 21
article_number 4113
doi 10.3390/rs16214113
affiliation Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071003, China (Ke Zhang, Peijie Li, Jianqiang Wang)
title A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions
topic remote sensing
image caption
encoder–decoder framework
attention mechanism
reinforcement learning
auxiliary task
url https://www.mdpi.com/2072-4292/16/21/4113