Advancements in Vision–Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques

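Recently, the remarkable success of ChatGPT has sparked a renewed wave of interest in artificial intelligence (AI), and the advancements in Vision–Language Models (VLMs) have pushed this enthusiasm to new heights. Differing from previous AI approaches that generally formulated different tasks as discriminative models, VLMs frame tasks as generative models and align language with visual information, enabling the handling of more challenging problems. The remote sensing (RS) field, a highly practical domain, has also embraced this new trend and introduced several VLM-based RS methods that have demonstrated promising performance and enormous potential. In this paper, we first review the fundamental theories related to VLM, then summarize the datasets constructed for VLMs in remote sensing and the various tasks they address. Finally, we categorize the improvement methods into three main parts according to the core components of VLMs and provide a detailed introduction and comparison of these methods.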

Bibliographic Details
Main Authors:	Lijie Tao, Haokui Zhang, Haizhao Jing, Yu Liu, Dawei Yan, Guoting Wei, Xizhe Xue
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Remote Sensing
Subjects:	vision–language models; remote sensing
Online Access:	https://www.mdpi.com/2072-4292/17/1/162

author	Lijie Tao Haokui Zhang Haizhao Jing Yu Liu Dawei Yan Guoting Wei Xizhe Xue
collection	DOAJ
description	Recently, the remarkable success of ChatGPT has sparked a renewed wave of interest in artificial intelligence (AI), and the advancements in Vision–Language Models (VLMs) have pushed this enthusiasm to new heights. Differing from previous AI approaches that generally formulated different tasks as discriminative models, VLMs frame tasks as generative models and align language with visual information, enabling the handling of more challenging problems. The remote sensing (RS) field, a highly practical domain, has also embraced this new trend and introduced several VLM-based RS methods that have demonstrated promising performance and enormous potential. In this paper, we first review the fundamental theories related to VLM, then summarize the datasets constructed for VLMs in remote sensing and the various tasks they address. Finally, we categorize the improvement methods into three main parts according to the core components of VLMs and provide a detailed introduction and comparison of these methods.
format	Article
id	doaj-art-2585c56e1e7c4cb79c62fb0f6989f584
institution	Kabale University
issn	2072-4292
language	English
publishDate	2025-01-01
publisher	MDPI AG
record_format	Article
series	Remote Sensing
spelling	doaj-art-2585c56e1e7c4cb79c62fb0f6989f584 | 2025-01-10T13:20:26Z | eng | MDPI AG | Remote Sensing | 2072-4292 | 2025-01-01 | vol. 17, no. 1, art. 162 | 10.3390/rs17010162 | Advancements in Vision–Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques | Lijie Tao (School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China); Haokui Zhang (School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China); Haizhao Jing (School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China); Yu Liu (Zhejiang Lab, Hangzhou 311500, China); Dawei Yan (School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China); Guoting Wei (School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China); Xizhe Xue (Department of Aerospace and Geodesy, Technical University of Munich, 80333 Munich, Germany) | https://www.mdpi.com/2072-4292/17/1/162 | vision–language models; remote sensing
title	Advancements in Vision–Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
topic	vision–language models; remote sensing
url	https://www.mdpi.com/2072-4292/17/1/162