GLClick: Interactive Segmentation Combining Global and Local Features
Convolutional neural networks (CNNs) are the backbone of most modern interactive segmentation algorithms. However, the limited receptive field of CNNs restricts their ability to capture long-range semantic relationships. Recently, transformers have gained significant attention for their capacity to...
Main Authors: | Jiaying Tang, Hongyuan Wang, Zongyuan Ding, Zihao Xin |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-12-01 |
Series: | Applied Sciences |
Subjects: | ResNet50; Transformer; global–local feature fusion module; interactive segmentation; MLP |
Online Access: | https://www.mdpi.com/2076-3417/15/1/186 |
_version_ | 1841549447552565248 |
---|---|
author | Jiaying Tang; Hongyuan Wang; Zongyuan Ding; Zihao Xin |
author_facet | Jiaying Tang; Hongyuan Wang; Zongyuan Ding; Zihao Xin |
author_sort | Jiaying Tang |
collection | DOAJ |
description | Convolutional neural networks (CNNs) are the backbone of most modern interactive segmentation algorithms. However, the limited receptive field of CNNs restricts their ability to capture long-range semantic relationships. Recently, Transformers have gained significant attention for their capacity to capture long-range dependencies. Nevertheless, CNNs still outperform Transformers in extracting local information. An effective interactive segmentation algorithm should accurately capture fine-grained local details alongside global semantic relationships. Therefore, we propose GLClick, a global–local click-based interactive image segmentation model that integrates local and global information through a novel fusion mechanism. We design an efficient global–local feature fusion module (GLFM) that integrates fine-grained features from various layers of ResNet50 with those from the Transformer feature pyramid. This approach maintains ResNet50’s ability to extract local features while effectively leveraging the Transformer to capture global context. Additionally, we enhance the multi-layer perceptron (MLP) to improve performance. Extensive experiments on diverse benchmark datasets demonstrate significant improvements in interactive image segmentation, confirming the effectiveness of our approach. Moreover, we conduct experiments on medical image datasets, further illustrating the model’s versatility and effectiveness across different domains. |
format | Article |
id | doaj-art-714681e2554b46c493f0f92db97187f7 |
institution | Kabale University |
issn | 2076-3417 |
language | English |
publishDate | 2024-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj-art-714681e2554b46c493f0f92db97187f7; 2025-01-10T13:14:44Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2024-12-01; 15; 1; 186; 10.3390/app15010186; GLClick: Interactive Segmentation Combining Global and Local Features; Jiaying Tang (0); Hongyuan Wang (1); Zongyuan Ding (2); Zihao Xin (3); (0) School of Computer and Artificial Intelligence, Changzhou University, No. 1, Gehu Road, Changzhou 213164, China; (1) School of Computer and Artificial Intelligence, Changzhou University, No. 1, Gehu Road, Changzhou 213164, China; (2) School of Computer and Artificial Intelligence, Changzhou University, No. 1, Gehu Road, Changzhou 213164, China; (3) School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Road, Nanjing 211106, China; Convolutional neural networks (CNNs) are the backbone of most modern interactive segmentation algorithms. However, the limited receptive field of CNNs restricts their ability to capture long-range semantic relationships. Recently, Transformers have gained significant attention for their capacity to capture long-range dependencies. Nevertheless, CNNs still outperform Transformers in extracting local information. An effective interactive segmentation algorithm should accurately capture fine-grained local details alongside global semantic relationships. Therefore, we propose GLClick, a global–local click-based interactive image segmentation model that integrates local and global information through a novel fusion mechanism. We design an efficient global–local feature fusion module (GLFM) that integrates fine-grained features from various layers of ResNet50 with those from the Transformer feature pyramid. This approach maintains ResNet50’s ability to extract local features while effectively leveraging the Transformer to capture global context. Additionally, we enhance the multi-layer perceptron (MLP) to improve performance. Extensive experiments on diverse benchmark datasets demonstrate significant improvements in interactive image segmentation, confirming the effectiveness of our approach. Moreover, we conduct experiments on medical image datasets, further illustrating the model’s versatility and effectiveness across different domains.; https://www.mdpi.com/2076-3417/15/1/186; ResNet50; Transformer; global–local feature fusion module; interactive segmentation; MLP |
title | GLClick: Interactive Segmentation Combining Global and Local Features |
topic | ResNet50; Transformer; global–local feature fusion module; interactive segmentation; MLP |
url | https://www.mdpi.com/2076-3417/15/1/186 |