A Patch-Level Region-Aware Module with a Multi-Label Framework for Remote Sensing Image Captioning

Recent Transformer-based works can generate high-quality captions for remote sensing images (RSIs). However, these methods generally feed global or grid visual features to a Transformer-based captioning model for associating cross-modal information, which limits performance. In this work, we investi...

Bibliographic Details
Main Authors:	Yunpeng Li, Xiangrong Zhang, Tianyang Zhang, Guanchun Wang, Xinlin Wang, Shuo Li
Format:	Article
Language:	English
Published:	MDPI AG 2024-10-01
Series:	Remote Sensing
Subjects:	remote sensing image captioning; salient regions; multi-label classification; multi-head attention
Online Access:	https://www.mdpi.com/2072-4292/16/21/3987

Similar Items

Thangka image captioning model with Salient Attention and Local Interaction Aggregator
by: Wenjin Hu, et al.
Published: (2024-11-01)

Novel Advance Image Caption Generation Utilizing Vision Transformer and Generative Adversarial Networks
by: Shourya Tyagi, et al.
Published: (2024-11-01)

Remote Sensing Image Change Captioning Using Multi-Attentive Network with Diffusion Model
by: Yue Yang, et al.
Published: (2024-11-01)

A Multi-Label Image Classification Method based on Label Correlation Learning Network
by: WANG Lufang, et al.
Published: (2024-11-01)

A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions
by: Ke Zhang, et al.
Published: (2024-11-01)

DanceCaps: Pseudo-Captioning for Dance Videos Using Large Language Models
by: Seohyun Kim, et al.
Published: (2024-11-01)

KE-RSIC: Remote Sensing Image Captioning Based on Knowledge Embedding
by: Kangda Cheng, et al.
Published: (2025-01-01)

Visual Rotated Position Encoding Transformer for Remote Sensing Image Captioning
by: Anli Liu, et al.
Published: (2024-01-01)

CLIP-Based Grid Features and Masking for Remote Sensing Image Captioning
by: Qiaoling Lin, et al.
Published: (2025-01-01)

Enhanced CLIP-GPT Framework for Cross-Lingual Remote Sensing Image Captioning
by: Rui Song, et al.
Published: (2025-01-01)

Undergraduate students’ perceptions toward writing Instagram captions in English
by: Nahda Nafisah Hutasuhut, et al.
Published: (2024-05-01)

SOCIOPRAGMATIC STUDY ON INSTAGRAM CAPTIONS AS A MEDIA FOR TOURISM PROMOTION IN BANGKALAN
by: Tri Pujiati, et al.
Published: (2024-07-01)

A multi-label classification method for disposing incomplete labeled data and label relevance
by: Lina ZHANG, et al.
Published: (2016-08-01)

Multimodal Event Classification for Social Media Based on Text-Image-Caption Assisted Alignment
by: Yuanting Wang
Published: (2024-01-01)

An effective video captioning based on language description using a novel Graylag Deep Kookaburra Reinforcement Learning
by: M. Gowri Shankar, et al.
Published: (2025-01-01)

Detailed Image Captioning and Hashtag Generation
by: Nikshep Shetty, et al.
Published: (2024-11-01)

Affective Image Captioning for Visual Artworks Using Emotion-Based Cross-Attention Mechanisms
by: Shintaro Ishikawa, et al.
Published: (2023-01-01)

Combining Region-Guided Attention and Attribute Prediction for Thangka Image Captioning Method
by: Fujun Zhang, et al.
Published: (2025-01-01)

MIRA-CAP: Memory-Integrated Retrieval-Augmented Captioning for State-of-the-Art Image and Video Captioning
by: Sabina Umirzakova, et al.
Published: (2024-12-01)

Incidental vocabulary recognition effects of subtitled, captioned and reverse subtitled audiovisual input
by: Jana van der Kolk, et al.
Published: (2024-07-01)

Instance structure based multi-label learning with missing labels
by: Tianzhu CHEN, et al.
Published: (2021-11-01)

Preliminary Study on Image Captioning for Construction Hazards
by: Wen-Ta Hsiao, et al.
Published: (2024-08-01)

The Effect of Applying Arabic Translation Techniques on the Translation Quality Assessment of Al Jazeera Captions on TikTok Social Media
by: Farrah Chaiya Mas, et al.
Published: (2024-09-01)

Offline visual aid system for the blind based on image captioning
by: Yue CHEN, et al.
Published: (2022-01-01)

Multi-label feature selection based on dynamic graph Laplacian
by: Yonghao LI, et al.
Published: (2020-12-01)

Diversifying Multi-Head Attention in the Transformer Model
by: Nicholas Ampazis, et al.
Published: (2024-11-01)

MHRA-MS-3D-ResNet-BiLSTM: A Multi-Head-Residual Attention-Based Multi-Stream Deep Learning Model for Soybean Yield Prediction in the U.S. Using Multi-Source Remote Sensing Data
by: Mahdiyeh Fathi, et al.
Published: (2024-12-01)

Prediction and Optimization of Damper Winding Structural Parameters for Salient-Pole Synchronous Generators With Rectifier Load
by: Meijun Qi, et al.
Published: (2024-01-01)

Predicting future evapotranspiration based on remote sensing and deep learning
by: Xin Zheng, et al.
Published: (2024-12-01)

Multi-Label Feature Selection with Feature–Label Subgraph Association and Graph Representation Learning
by: Jinghou Ruan, et al.
Published: (2024-11-01)

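Non-speech information w angielskich i rosyjskich napisach Closed Captions zawartych w serialu Эпидемия. Analiza kontrastywna [Non-speech information in the English and Russian closed captions of the series Эпидемия: a contrastive analysis]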
by: Daniel Piecewicz
Published: (2023-12-01)

Weakly Supervised Nuclei Segmentation with Point-Guided Attention and Self-Supervised Pseudo-Labeling
by: Yapeng Mo, et al.
Published: (2025-01-01)

Multi-Head Attention Refiner for Multi-View 3D Reconstruction
by: Kyunghee Lee, et al.
Published: (2024-10-01)

Multi-modal feature fusion with multi-head self-attention for epileptic EEG signals
by: Ning Huang, et al.
Published: (2024-08-01)

Multi-view knowledge representation learning for personalized news recommendation
by: Chao Chang, et al.
Published: (2025-01-01)

Research on Fault Diagnosis of Rotating Parts Based on Transformer Deep Learning Model
by: Zilin Zhang, et al.
Published: (2024-11-01)

Image Captioning Generator Using Deep Learning Models: An Abbreviated Survey
by: Yasir Hameed Zaidan, et al.
Published: (2024-04-01)

L’emploi d’expressions métalangagières : phénomènes de saillance et travail interprétatif [The use of metalinguistic expressions: salience phenomena and interpretive work]
by: Blandine Pennec
Published: (2014-10-01)

CFRNet: Cross-Attention-Based Fusion and Refinement Network for Enhanced RGB-T Salient Object Detection
by: Biao Deng, et al.
Published: (2024-11-01)

Classification on Grade, Price, and Region with Multi-Label and Multi-Target Methods in Wineinformatics
by: James Palmer, et al.
Published: (2020-03-01)