Advancements in Vision–Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
Recently, the remarkable success of ChatGPT has sparked a renewed wave of interest in artificial intelligence (AI), and the advancements in Vision–Language Models (VLMs) have pushed this enthusiasm to new heights. Differing from previous AI approaches that generally formulated different tasks as dis...
Main Authors: | Lijie Tao, Haokui Zhang, Haizhao Jing, Yu Liu, Dawei Yan, Guoting Wei, Xizhe Xue |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Remote Sensing |
Subjects: | vision–language models; remote sensing |
Online Access: | https://www.mdpi.com/2072-4292/17/1/162 |
_version_ | 1841548976110698496 |
---|---|
author | Lijie Tao; Haokui Zhang; Haizhao Jing; Yu Liu; Dawei Yan; Guoting Wei; Xizhe Xue |
collection | DOAJ |
description | Recently, the remarkable success of ChatGPT has sparked a renewed wave of interest in artificial intelligence (AI), and the advancements in Vision–Language Models (VLMs) have pushed this enthusiasm to new heights. Differing from previous AI approaches that generally formulated different tasks as discriminative models, VLMs frame tasks as generative models and align language with visual information, enabling the handling of more challenging problems. The remote sensing (RS) field, a highly practical domain, has also embraced this new trend and introduced several VLM-based RS methods that have demonstrated promising performance and enormous potential. In this paper, we first review the fundamental theories related to VLM, then summarize the datasets constructed for VLMs in remote sensing and the various tasks they address. Finally, we categorize the improvement methods into three main parts according to the core components of VLMs and provide a detailed introduction and comparison of these methods. |
format | Article |
id | doaj-art-2585c56e1e7c4cb79c62fb0f6989f584 |
institution | Kabale University |
issn | 2072-4292 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj-art-2585c56e1e7c4cb79c62fb0f6989f584; 2025-01-10T13:20:26Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2025-01-01; 17; 1; 162; 10.3390/rs17010162; Advancements in Vision–Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques; Lijie Tao [0], Haokui Zhang [1], Haizhao Jing [2], Yu Liu [3], Dawei Yan [4], Guoting Wei [5], Xizhe Xue [6]; [0] School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China; [1] School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China; [2] School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China; [3] Zhejiang Lab, Hangzhou 311500, China; [4] School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China; [5] School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; [6] Department of Aerospace and Geodesy, Technical University of Munich, 80333 Munich, Germany; https://www.mdpi.com/2072-4292/17/1/162; vision–language models; remote sensing |
title | Advancements in Vision–Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques |
topic | vision–language models; remote sensing |
url | https://www.mdpi.com/2072-4292/17/1/162 |