Advancements in Vision–Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques

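Recently, the remarkable success of ChatGPT has sparked a renewed wave of interest in artificial intelligence (AI), and the advancements in Vision–Language Models (VLMs) have pushed this enthusiasm to new heights. Differing from previous AI approaches that generally formulated different tasks as discriminative models, VLMs frame tasks as generative models and align language with visual information, enabling the handling of more challenging problems. The remote sensing (RS) field, a highly practical domain, has also embraced this new trend and introduced several VLM-based RS methods that have demonstrated promising performance and enormous potential. In this paper, we first review the fundamental theories related to VLM, then summarize the datasets constructed for VLMs in remote sensing and the various tasks they address. Finally, we categorize the improvement methods into three main parts according to the core components of VLMs and provide a detailed introduction and comparison of these methods.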

Bibliographic Details
Main Authors: Lijie Tao, Haokui Zhang, Haizhao Jing, Yu Liu, Dawei Yan, Guoting Wei, Xizhe Xue
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Remote Sensing
Subjects: vision–language models; remote sensing
Online Access: https://www.mdpi.com/2072-4292/17/1/162
author Lijie Tao
Haokui Zhang
Haizhao Jing
Yu Liu
Dawei Yan
Guoting Wei
Xizhe Xue
collection DOAJ
description Recently, the remarkable success of ChatGPT has sparked a renewed wave of interest in artificial intelligence (AI), and the advancements in Vision–Language Models (VLMs) have pushed this enthusiasm to new heights. Differing from previous AI approaches that generally formulated different tasks as discriminative models, VLMs frame tasks as generative models and align language with visual information, enabling the handling of more challenging problems. The remote sensing (RS) field, a highly practical domain, has also embraced this new trend and introduced several VLM-based RS methods that have demonstrated promising performance and enormous potential. In this paper, we first review the fundamental theories related to VLM, then summarize the datasets constructed for VLMs in remote sensing and the various tasks they address. Finally, we categorize the improvement methods into three main parts according to the core components of VLMs and provide a detailed introduction and comparison of these methods.
format Article
id doaj-art-2585c56e1e7c4cb79c62fb0f6989f584
institution Kabale University
issn 2072-4292
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj-art-2585c56e1e7c4cb79c62fb0f6989f584
2025-01-10T13:20:26Z
eng
MDPI AG
Remote Sensing (ISSN 2072-4292), 2025-01-01, vol. 17, no. 1, art. 162
DOI: 10.3390/rs17010162
Advancements in Vision–Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
Lijie Tao (School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China)
Haokui Zhang (School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China)
Haizhao Jing (School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China)
Yu Liu (Zhejiang Lab, Hangzhou 311500, China)
Dawei Yan (School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China)
Guoting Wei (School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China)
Xizhe Xue (Department of Aerospace and Geodesy, Technical University of Munich, 80333 Munich, Germany)
https://www.mdpi.com/2072-4292/17/1/162
vision–language models
remote sensing
title Advancements in Vision–Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
topic vision–language models
remote sensing
url https://www.mdpi.com/2072-4292/17/1/162