M2Caps: learning multi-modal capsules of optical and SAR images for land cover classification

Land cover classification (LCC) is essential for monitoring land use and changes. This study examines the integration of optical (OPT) and synthetic aperture radar (SAR) images for precise LCC. The disparity between OPT and SAR images introduces challenges in fusing high-level semantic information and utilizing multi-scale features. To address these challenges, this paper proposes a novel multi-modal capsules model (M²Caps) incorporating multi-modal capsules learning and cascaded features fusion modules. The multi-modal capsules learning module models high-level semantic information and abstract relationships across diverse remote sensing image (RSI) modalities as vectors, thereby facilitating the induction of joint multi-modal features with high discriminability and robustness. Subsequently, the cascaded features fusion module integrates various feature scales, concurrently processing deep multi-modal features, shallow OPT features, and shallow SAR features at each layer. This approach ensures the precise characterization of both local details and global semantics. M²Caps outperformed state-of-the-art models, improving mean intersection over union (mIoU) by 2.86%–12.9% on the WHU-OPT-SAR dataset and 3.91%–12.3% on the GF-2 and GF-3 Pohang datasets, demonstrating its effectiveness in high-precision LCC in complex environments.

Bibliographic Details
Main Authors: Haodi Zhang, Anzhu Yu, Kuiliang Gao, Xuanbei Lu, Xuefeng Cao, Wenyue Guo, Weiqi Lian
Format: Article
Language:English
Published: Taylor & Francis Group 2025-12-01
Series:International Journal of Digital Earth
Subjects:
Online Access:https://www.tandfonline.com/doi/10.1080/17538947.2024.2447347
_version_ 1841562118772490240
author Haodi Zhang
Anzhu Yu
Kuiliang Gao
Xuanbei Lu
Xuefeng Cao
Wenyue Guo
Weiqi Lian
author_facet Haodi Zhang
Anzhu Yu
Kuiliang Gao
Xuanbei Lu
Xuefeng Cao
Wenyue Guo
Weiqi Lian
author_sort Haodi Zhang
collection DOAJ
description Land cover classification (LCC) is essential for monitoring land use and changes. This study examines the integration of optical (OPT) and synthetic aperture radar (SAR) images for precise LCC. The disparity between OPT and SAR images introduces challenges in fusing high-level semantic information and utilizing multi-scale features. To address these challenges, this paper proposes a novel multi-modal capsules model (M²Caps) incorporating multi-modal capsules learning and cascaded features fusion modules. The multi-modal capsules learning module models high-level semantic information and abstract relationships across diverse remote sensing image (RSI) modalities as vectors, thereby facilitating the induction of joint multi-modal features with high discriminability and robustness. Subsequently, the cascaded features fusion module integrates various feature scales, concurrently processing deep multi-modal features, shallow OPT features, and shallow SAR features at each layer. This approach ensures the precise characterization of both local details and global semantics. M²Caps outperformed state-of-the-art models, improving mean intersection over union (mIoU) by 2.86%–12.9% on the WHU-OPT-SAR dataset and 3.91%–12.3% on the GF-2 and GF-3 Pohang datasets, demonstrating its effectiveness in high-precision LCC in complex environments.
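The abstract describes capsule learning only at a high level: per-modality features are encoded as vectors whose lengths can express confidence. As a hedged illustration (not the authors' M²Caps implementation), the standard "squash" nonlinearity from capsule networks shows how concatenated OPT and SAR feature vectors could be mapped into unit-bounded joint capsules; all array names and shapes below are invented for illustration:

```python
import numpy as np

def squash(v, eps=1e-8):
    # Capsule "squash" nonlinearity: keeps each vector's direction but
    # rescales its length into [0, 1), so length can act as a confidence.
    norm2 = np.sum(v * v, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * v / np.sqrt(norm2 + eps)

# Hypothetical per-modality capsule features: 4 capsules, 8 dims each.
rng = np.random.default_rng(0)
opt_feat = rng.standard_normal((4, 8))  # from an OPT encoder branch
sar_feat = rng.standard_normal((4, 8))  # from a SAR encoder branch

# One simple fusion choice: concatenate along the feature axis, then squash.
joint = squash(np.concatenate([opt_feat, sar_feat], axis=-1))
# joint.shape == (4, 16); every joint capsule has length < 1.
```

The squash step is what makes a vector (rather than scalar) representation usable downstream: direction encodes the entity's attributes while bounded length encodes its presence, which is the property the abstract's "high discriminability and robustness" claim relies on.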
format Article
id doaj-art-bbcb687e0c8140858df90bafd70a81d4
institution Kabale University
issn 1753-8947
1753-8955
language English
publishDate 2025-12-01
publisher Taylor & Francis Group
record_format Article
series International Journal of Digital Earth
spelling doaj-art-bbcb687e0c8140858df90bafd70a81d42025-01-03T01:09:08ZengTaylor & Francis GroupInternational Journal of Digital Earth1753-89471753-89552025-12-0118110.1080/17538947.2024.2447347M2Caps: learning multi-modal capsules of optical and SAR images for land cover classificationHaodi Zhang0Anzhu Yu1Kuiliang Gao2Xuanbei Lu3Xuefeng Cao4Wenyue Guo5Weiqi Lian6School of Surveying and Mapping, Information Engineering University, Zhengzhou, People’s Republic of ChinaSchool of Surveying and Mapping, Information Engineering University, Zhengzhou, People’s Republic of ChinaSchool of Surveying and Mapping, Information Engineering University, Zhengzhou, People’s Republic of ChinaSchool of Surveying and Mapping, Information Engineering University, Zhengzhou, People’s Republic of ChinaSchool of Surveying and Mapping, Information Engineering University, Zhengzhou, People’s Republic of ChinaSchool of Surveying and Mapping, Information Engineering University, Zhengzhou, People’s Republic of ChinaSchool of Surveying and Mapping, Information Engineering University, Zhengzhou, People’s Republic of ChinaLand cover classification (LCC) is essential for monitoring land use and changes. This study examines the integration of optical (OPT) and synthetic aperture radar (SAR) images for precise LCC. The disparity between OPT and SAR images introduces challenges in fusing high-level semantic information and utilizing multi-scale features. To address these challenges, this paper proposes a novel multi-modal capsules model (M²Caps) incorporating multi-modal capsules learning and cascaded features fusion modules. The multi-modal capsules learning module models high-level semantic information and abstract relationships across diverse remote sensing image (RSI) modalities as vectors, thereby facilitating the induction of joint multi-modal features with high discriminability and robustness. Subsequently, the cascaded features fusion module integrates various feature scales, concurrently processing deep multi-modal features, shallow OPT features, and shallow SAR features at each layer. This approach ensures the precise characterization of both local details and global semantics. M²Caps outperformed state-of-the-art models, improving mean intersection over union (mIoU) by 2.86%–12.9% on the WHU-OPT-SAR dataset and 3.91%–12.3% on the GF-2 and GF-3 Pohang datasets, demonstrating its effectiveness in high-precision LCC in complex environments.https://www.tandfonline.com/doi/10.1080/17538947.2024.2447347Land cover classificationmulti-modal semantic segmentationmulti-modal capsules learningcascaded features fusionoptical imagessynthetic aperture radar
spellingShingle Haodi Zhang
Anzhu Yu
Kuiliang Gao
Xuanbei Lu
Xuefeng Cao
Wenyue Guo
Weiqi Lian
M2Caps: learning multi-modal capsules of optical and SAR images for land cover classification
International Journal of Digital Earth
Land cover classification
multi-modal semantic segmentation
multi-modal capsules learning
cascaded features fusion
optical images
synthetic aperture radar
title M2Caps: learning multi-modal capsules of optical and SAR images for land cover classification
title_full M2Caps: learning multi-modal capsules of optical and SAR images for land cover classification
title_fullStr M2Caps: learning multi-modal capsules of optical and SAR images for land cover classification
title_full_unstemmed M2Caps: learning multi-modal capsules of optical and SAR images for land cover classification
title_short M2Caps: learning multi-modal capsules of optical and SAR images for land cover classification
title_sort m2caps learning multi modal capsules of optical and sar images for land cover classification
topic Land cover classification
multi-modal semantic segmentation
multi-modal capsules learning
cascaded features fusion
optical images
synthetic aperture radar
url https://www.tandfonline.com/doi/10.1080/17538947.2024.2447347
work_keys_str_mv AT haodizhang m2capslearningmultimodalcapsulesofopticalandsarimagesforlandcoverclassification
AT anzhuyu m2capslearningmultimodalcapsulesofopticalandsarimagesforlandcoverclassification
AT kuilianggao m2capslearningmultimodalcapsulesofopticalandsarimagesforlandcoverclassification
AT xuanbeilu m2capslearningmultimodalcapsulesofopticalandsarimagesforlandcoverclassification
AT xuefengcao m2capslearningmultimodalcapsulesofopticalandsarimagesforlandcoverclassification
AT wenyueguo m2capslearningmultimodalcapsulesofopticalandsarimagesforlandcoverclassification
AT weiqilian m2capslearningmultimodalcapsulesofopticalandsarimagesforlandcoverclassification