A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions

Remote sensing images contain a wealth of Earth-observation information. Efficient extraction and application of hidden knowledge from these images will greatly promote the development of resource and environment monitoring, urban planning and other related fields. Remote sensing image caption (RSIC...

Bibliographic Details
Main Authors:	Ke Zhang, Peijie Li, Jianqiang Wang
Format:	Article
Language:	English
Published:	MDPI AG 2024-11-01
Series:	Remote Sensing
Subjects:	remote sensing image caption encoder–decoder framework attention mechanism reinforcement learning auxiliary task
Online Access:	https://www.mdpi.com/2072-4292/16/21/4113

author	Ke Zhang Peijie Li Jianqiang Wang
collection	DOAJ
description	Remote sensing images contain a wealth of Earth-observation information. Efficient extraction and application of hidden knowledge from these images will greatly promote the development of resource and environment monitoring, urban planning and other related fields. Remote sensing image caption (RSIC) involves obtaining textual descriptions from remote sensing images by accurately capturing and describing the semantic-level relationships between objects and attributes in the images. However, there is currently no comprehensive review summarizing the progress in RSIC based on deep learning. After defining the scope of the papers to be discussed and summarizing them all, the paper begins by providing a comprehensive review of the recent advancements in RSIC, covering six key aspects: encoder–decoder framework, attention mechanism, reinforcement learning, learning with auxiliary tasks, large visual language models and few-shot learning. Subsequently, a brief explanation of the datasets and evaluation metrics for RSIC is given. Furthermore, we compare and analyze the results of the latest models and the pros and cons of different deep learning methods. Lastly, future directions for RSIC are suggested. The primary objective of this review is to offer researchers a more profound understanding of RSIC.
format	Article
id	doaj-art-49299cde34b64a95bf5d971e2860772f
institution	Kabale University
issn	2072-4292
language	English
publishDate	2024-11-01
publisher	MDPI AG
record_format	Article
series	Remote Sensing
spelling	doaj-art-49299cde34b64a95bf5d971e2860772f | 2024-11-08T14:40:52Z | eng | MDPI AG | Remote Sensing | 2072-4292 | 2024-11-01 | 16 | 21 | 4113 | 10.3390/rs16214113 | A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions | Ke Zhang (0), Peijie Li (1), Jianqiang Wang (2) | all authors: Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071003, China | https://www.mdpi.com/2072-4292/16/21/4113 | remote sensing; image caption; encoder–decoder framework; attention mechanism; reinforcement learning; auxiliary task
title	A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions
topic	remote sensing image caption encoder–decoder framework attention mechanism reinforcement learning auxiliary task
url	https://www.mdpi.com/2072-4292/16/21/4113

Similar Items