GLClick: Interactive Segmentation Combining Global and Local Features

Convolutional neural networks (CNNs) are the backbone of most modern interactive segmentation algorithms. However, the limited receptive field of CNNs restricts their ability to capture long-range semantic relationships. Recently, transformers have gained significant attention for their capacity to...

Bibliographic Details
Main Authors: Jiaying Tang, Hongyuan Wang, Zongyuan Ding, Zihao Xin
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Applied Sciences
Subjects: ResNet50; Transformer; global–local feature fusion module; interactive segmentation; MLP
Online Access: https://www.mdpi.com/2076-3417/15/1/186
author Jiaying Tang
Hongyuan Wang
Zongyuan Ding
Zihao Xin
collection DOAJ
description Convolutional neural networks (CNNs) are the backbone of most modern interactive segmentation algorithms. However, the limited receptive field of CNNs restricts their ability to capture long-range semantic relationships. Recently, transformers have gained significant attention for their capacity to capture long-range dependencies. Nevertheless, CNNs still outperform Transformers at extracting local information. An effective interactive segmentation algorithm should accurately capture fine-grained local details alongside global semantic relationships. Therefore, we propose GLClick, a global–local click-based interactive image segmentation model that integrates local and global information through a novel fusion mechanism. We design an efficient global–local feature fusion module (GLFM) that integrates fine-grained features from various layers of ResNet50 with those from the Transformer feature pyramid. This approach maintains ResNet50’s ability to extract local features while effectively leveraging the Transformer to capture global context. Additionally, we enhance the multi-layer perceptron (MLP) to improve performance. Extensive experiments on diverse benchmark datasets demonstrate significant improvements in interactive image segmentation, confirming the effectiveness of our approach. Moreover, we conduct experiments on medical image datasets, further illustrating the model’s versatility and effectiveness across different domains.
format Article
id doaj-art-714681e2554b46c493f0f92db97187f7
institution Kabale University
issn 2076-3417
language English
publishDate 2024-12-01
publisher MDPI AG
series Applied Sciences
doi 10.3390/app15010186
volume 15
issue 1
article_number 186
timestamp 2025-01-10T13:14:44Z
affiliation Jiaying Tang: School of Computer and Artificial Intelligence, Changzhou University, No. 1, Gehu Road, Changzhou 213164, China
Hongyuan Wang: School of Computer and Artificial Intelligence, Changzhou University, No. 1, Gehu Road, Changzhou 213164, China
Zongyuan Ding: School of Computer and Artificial Intelligence, Changzhou University, No. 1, Gehu Road, Changzhou 213164, China
Zihao Xin: School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Road, Nanjing 211106, China
title GLClick: Interactive Segmentation Combining Global and Local Features
topic ResNet50
Transformer
global–local feature fusion module
interactive segmentation
MLP
url https://www.mdpi.com/2076-3417/15/1/186