Adjacent-Scale Multimodal Fusion Networks for Semantic Segmentation of Remote Sensing Data
| Main Authors: | Xianping Ma, Xichen Xu, Xiaokang Zhang, Man-On Pun |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| Subjects: | Adjacent-scale; multimodal fusion; remote sensing; semantic segmentation |
| Online Access: | https://ieeexplore.ieee.org/document/10736654/ |
| _version_ | 1846163855789522944 |
|---|---|
| author | Xianping Ma; Xichen Xu; Xiaokang Zhang; Man-On Pun |
| author_facet | Xianping Ma; Xichen Xu; Xiaokang Zhang; Man-On Pun |
| author_sort | Xianping Ma |
| collection | DOAJ |
| description | Semantic segmentation is a fundamental task in remote sensing image analysis, and the accurate delineation of objects within such imagery serves as the cornerstone for a wide range of applications. Achieving it requires handling edge detection, cross-modal data, large intraclass variability, and limited interclass variance. Traditional convolutional-neural-network-based models are notably constrained by their local receptive fields, while transformer-based methods, despite their great potential for learning features globally, easily ignore positional cues and still struggle to cope with multimodal data. Therefore, this work proposes an adjacent-scale multimodal fusion network (ASMFNet) for semantic segmentation of remote sensing data. ASMFNet stands out not only for its innovative interaction mechanism across adjacent-scale features, which effectively captures contextual cues while maintaining low computational complexity, but also for its remarkable cross-modal capability: it seamlessly integrates different modalities, enriching feature representation. Its hierarchical scale attention (HSA) module strengthens the association between ground objects and their surrounding scenes by learning discriminative features at higher levels of abstraction, thereby linking broad structural information. The adaptive modality fusion (AMF) module, informed by HSA about the interrelationships between cross-modal data, assigns spatial weights at the pixel level and integrates them into channel features, evaluating modality importance via feature concatenation and filtering to enhance the fused representation. Extensive experiments on representative remote sensing semantic segmentation datasets, including the ISPRS Vaihingen and Potsdam datasets, confirm the impressive performance of the proposed ASMFNet. (An illustrative sketch of the HSA and AMF mechanisms follows this record.) |
| format | Article |
| id | doaj-art-9c8b556adb5d4f939e948118dfb32743 |
| institution | Kabale University |
| issn | 1939-1404; 2151-1535 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| spelling | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 17, pp. 20116-20128, 2024. DOI: 10.1109/JSTARS.2024.3486906 (IEEE document 10736654). Record doaj-art-9c8b556adb5d4f939e948118dfb32743, indexed 2024-11-19. Authors: Xianping Ma (https://orcid.org/0000-0002-2180-2964), Xichen Xu, Xiaokang Zhang (https://orcid.org/0000-0002-6127-4801), and Man-On Pun (https://orcid.org/0000-0003-3316-5381). Ma, Xu, and Pun are with the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China; Zhang is with the School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan, China. |
| spellingShingle | Xianping Ma; Xichen Xu; Xiaokang Zhang; Man-On Pun; Adjacent-Scale Multimodal Fusion Networks for Semantic Segmentation of Remote Sensing Data; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; Adjacent-scale; multimodal fusion; remote sensing; semantic segmentation |
| title | Adjacent-Scale Multimodal Fusion Networks for Semantic Segmentation of Remote Sensing Data |
| title_full | Adjacent-Scale Multimodal Fusion Networks for Semantic Segmentation of Remote Sensing Data |
| title_fullStr | Adjacent-Scale Multimodal Fusion Networks for Semantic Segmentation of Remote Sensing Data |
| title_full_unstemmed | Adjacent-Scale Multimodal Fusion Networks for Semantic Segmentation of Remote Sensing Data |
| title_short | Adjacent-Scale Multimodal Fusion Networks for Semantic Segmentation of Remote Sensing Data |
| title_sort | adjacent scale multimodal fusion networks for semantic segmentation of remote sensing data |
| topic | Adjacent-scale; multimodal fusion; remote sensing; semantic segmentation |
| url | https://ieeexplore.ieee.org/document/10736654/ |
| work_keys_str_mv | AT xianpingma adjacentscalemultimodalfusionnetworksforsemanticsegmentationofremotesensingdata AT xichenxu adjacentscalemultimodalfusionnetworksforsemanticsegmentationofremotesensingdata AT xiaokangzhang adjacentscalemultimodalfusionnetworksforsemanticsegmentationofremotesensingdata AT manonpun adjacentscalemultimodalfusionnetworksforsemanticsegmentationofremotesensingdata |
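As a reading aid for the description above, the sketch below illustrates in PyTorch one plausible form of the two mechanisms the abstract names: hierarchical scale attention (HSA) over adjacent-scale features and adaptive modality fusion (AMF) over cross-modal features. The class names, layer choices, and tensor shapes are assumptions made for illustration only; this record does not include the authors' implementation, so this is a minimal sketch of the described ideas, not ASMFNet itself.

```python
# Hedged PyTorch sketch of the HSA and AMF mechanisms described in the
# abstract. All module names, internal layers, and shapes are assumptions;
# the paper's actual ASMFNet implementation may differ substantially.
import torch
import torch.nn as nn


class HierarchicalScaleAttention(nn.Module):
    """Sketch of the HSA idea: let features at one scale attend to the
    adjacent, coarser scale to inject broader structural context."""

    def __init__(self, channels: int):
        super().__init__()
        # Gate computed from fine features and upsampled coarse features
        # (assumed design, not taken from the paper).
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # fine:   (B, C, H, W)     features at the current scale
        # coarse: (B, C, H/2, W/2) features at the adjacent, coarser scale
        coarse_up = self.up(coarse)
        attn = self.gate(torch.cat([fine, coarse_up], dim=1))
        # Blend fine detail with coarse context via the learned gate.
        return fine * attn + coarse_up * (1.0 - attn)


class AdaptiveModalityFusion(nn.Module):
    """Sketch of the AMF idea: concatenate and filter the two modalities,
    derive pixel-level spatial weights, and fold them back into channel
    features to form the fused representation."""

    def __init__(self, channels: int):
        super().__init__()
        # "Concatenation and filtering" to assess modality importance.
        self.filter = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
        # One spatial weight map per modality, softmaxed per pixel.
        self.spatial = nn.Conv2d(channels, 2, kernel_size=1)

    def forward(self, optical: torch.Tensor, dsm: torch.Tensor) -> torch.Tensor:
        # optical, dsm: (B, C, H, W) features from the two modalities
        # (e.g., IRRG imagery and a DSM, as in the ISPRS datasets).
        mixed = self.filter(torch.cat([optical, dsm], dim=1))
        weights = torch.softmax(self.spatial(mixed), dim=1)  # (B, 2, H, W)
        # Pixel-level modality weights applied across channel features.
        return optical * weights[:, 0:1] + dsm * weights[:, 1:2]


if __name__ == "__main__":
    hsa = HierarchicalScaleAttention(64)
    amf = AdaptiveModalityFusion(64)
    fine, coarse = torch.randn(1, 64, 64, 64), torch.randn(1, 64, 32, 32)
    opt, dsm = torch.randn(1, 64, 64, 64), torch.randn(1, 64, 64, 64)
    print(hsa(fine, coarse).shape, amf(opt, dsm).shape)  # both (1, 64, 64, 64)
```

The gating structure mirrors the abstract's wording: HSA injects coarser-scale context into finer features to link broad structural information, while AMF scores modality importance per pixel via concatenation and filtering before folding those spatial weights back into the channel features.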