A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions
Remote sensing images contain a wealth of Earth-observation information. Efficient extraction and application of hidden knowledge from these images will greatly promote the development of resource and environment monitoring, urban planning and other related fields. Remote sensing image caption (RSIC...
Main Authors: | Ke Zhang, Peijie Li, Jianqiang Wang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-11-01 |
Series: | Remote Sensing |
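Subjects: | remote sensing; image caption; encoder–decoder framework; attention mechanism; reinforcement learning; auxiliary task |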
Online Access: | https://www.mdpi.com/2072-4292/16/21/4113 |
_version_ | 1846173160063369216 |
---|---|
author | Ke Zhang; Peijie Li; Jianqiang Wang |
author_facet | Ke Zhang; Peijie Li; Jianqiang Wang |
author_sort | Ke Zhang |
collection | DOAJ |
description | Remote sensing images contain a wealth of Earth-observation information. Efficient extraction and application of hidden knowledge from these images will greatly promote the development of resource and environment monitoring, urban planning and other related fields. Remote sensing image caption (RSIC) involves obtaining textual descriptions from remote sensing images through accurately capturing and describing the semantic-level relationships between objects and attributes in the images. However, there is currently no comprehensive review summarizing the progress in RSIC based on deep learning. After defining the scope of the papers to be discussed and summarizing them all, the paper begins by providing a comprehensive review of the recent advancements in RSIC, covering six key aspects: encoder–decoder framework, attention mechanism, reinforcement learning, learning with auxiliary task, large visual language models and few-shot learning. Subsequently a brief explanation on the datasets and evaluation metrics for RSIC is given. Furthermore, we compare and analyze the results of the latest models and the pros and cons of different deep learning methods. Lastly, future directions of RSIC are suggested. The primary objective of this review is to offer researchers a more profound understanding of RSIC. |
format | Article |
id | doaj-art-49299cde34b64a95bf5d971e2860772f |
institution | Kabale University |
issn | 2072-4292 |
language | English |
publishDate | 2024-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj-art-49299cde34b64a95bf5d971e2860772f; 2024-11-08T14:40:52Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2024-11-01; 16; 21; 4113; 10.3390/rs16214113; A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions; Ke Zhang [0]; Peijie Li [1]; Jianqiang Wang [2]; [0] Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071003, China; [1] Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071003, China; [2] Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071003, China; Remote sensing images contain a wealth of Earth-observation information. Efficient extraction and application of hidden knowledge from these images will greatly promote the development of resource and environment monitoring, urban planning and other related fields. Remote sensing image caption (RSIC) involves obtaining textual descriptions from remote sensing images through accurately capturing and describing the semantic-level relationships between objects and attributes in the images. However, there is currently no comprehensive review summarizing the progress in RSIC based on deep learning. After defining the scope of the papers to be discussed and summarizing them all, the paper begins by providing a comprehensive review of the recent advancements in RSIC, covering six key aspects: encoder–decoder framework, attention mechanism, reinforcement learning, learning with auxiliary task, large visual language models and few-shot learning. Subsequently a brief explanation on the datasets and evaluation metrics for RSIC is given. Furthermore, we compare and analyze the results of the latest models and the pros and cons of different deep learning methods. Lastly, future directions of RSIC are suggested. The primary objective of this review is to offer researchers a more profound understanding of RSIC.; https://www.mdpi.com/2072-4292/16/21/4113; remote sensing; image caption; encoder–decoder framework; attention mechanism; reinforcement learning; auxiliary task |
spellingShingle | Ke Zhang; Peijie Li; Jianqiang Wang; A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions; Remote Sensing; remote sensing; image caption; encoder–decoder framework; attention mechanism; reinforcement learning; auxiliary task |
title | A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions |
title_sort | review of deep learning based remote sensing image caption methods models comparisons and future directions |
topic | remote sensing; image caption; encoder–decoder framework; attention mechanism; reinforcement learning; auxiliary task |
url | https://www.mdpi.com/2072-4292/16/21/4113 |