Alzheimer’s disease recognition using graph neural network by leveraging image-text similarity from vision language model

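Abstract: Alzheimer’s disease (AD), a progressive neurodegenerative condition, notably impacts cognitive functions and daily activity. One method of detecting dementia involves a task where participants describe a given picture, and extensive research has been conducted using the participants’ speech and transcribed text. However, very few studies have explored the modality of the image itself. In this work, we propose a method that predicts dementia automatically by representing the relationship between images and texts as a graph. First, we transcribe the participants’ speech into text using an automatic speech recognition system. Then, we employ a vision language model to represent the relationship between the parts of the image and the corresponding descriptive sentences as a bipartite graph. Finally, we use a graph convolutional network (GCN), considering each subject as an individual graph, to classify AD patients through a graph-level classification task. In experiments conducted on the ADReSSo Challenge datasets, our model surpassed the existing state-of-the-art performance by achieving an accuracy of 88.73%. Additionally, ablation studies that removed the relationship between images and texts demonstrated the critical role of graphs in improving performance. Furthermore, by utilizing the sentence representations learned through the GCN, we identified the sentences and keywords critical for AD classification.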
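The record's description outlines a pipeline in which image regions and transcribed sentences form a bipartite graph that a GCN classifies at graph level. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. The similarity matrix, node features, threshold, layer sizes, single-layer design, mean-pooling readout, and all function names are assumptions made for illustration; in practice the similarities and features would come from a vision language model and the weights would be learned.

```python
import numpy as np


def bipartite_adjacency(similarity, threshold=0.2):
    """Build a (P+S) x (P+S) adjacency for P image regions and S sentences.

    similarity[i, j] is an image-text similarity score (assumed to come from
    a vision language model). Scores below `threshold` or below zero become
    missing edges. Region-region and sentence-sentence blocks stay zero.
    """
    sim = np.where(similarity >= threshold, np.clip(similarity, 0.0, None), 0.0)
    p, s = sim.shape
    adj = np.zeros((p + s, p + s))
    adj[:p, p:] = sim      # region-to-sentence edges
    adj[p:, :p] = sim.T    # mirrored so the graph is undirected
    return adj


def normalize_adjacency(adj):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])   # self-loops keep every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, features, weight):
    """One graph convolution followed by ReLU."""
    return np.maximum(a_norm @ features @ weight, 0.0)


def classify_graph(similarity, features, w1, w2, threshold=0.2):
    """Return a probability-like score for one subject's graph.

    features holds one row per node (regions first, then sentences).
    w1 and w2 are hypothetical weights; here they are untrained.
    """
    a_norm = normalize_adjacency(bipartite_adjacency(similarity, threshold))
    hidden = gcn_layer(a_norm, features, w1)
    pooled = hidden.mean(axis=0)          # graph-level readout (assumed)
    logit = pooled @ w2
    return 1.0 / (1.0 + np.exp(-logit))   # sigmoid


# Toy example: 4 image regions, 3 sentences, random features and weights.
rng = np.random.default_rng(0)
toy_similarity = rng.uniform(0.0, 1.0, size=(4, 3))
toy_features = rng.normal(size=(7, 8))
toy_score = classify_graph(
    toy_similarity, toy_features, rng.normal(size=(8, 16)), rng.normal(size=16)
)
```

Each subject yields one such graph, so a full system would train the weights over many graphs with binary labels; the score above is meaningless until that training happens.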

Bibliographic Details
Main Authors:	Byounghwa Lee, Jeong-Uk Bang, Hwa Jeon Song, Byung Ok Kang
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-01-01
Series:	Scientific Reports
Subjects:	Alzheimer’s disease; Bipartite graph; Dementia; Multimodal; Graph neural network; Vision language model
Online Access:	https://doi.org/10.1038/s41598-024-82597-z

author	Byounghwa Lee, Jeong-Uk Bang, Hwa Jeon Song, Byung Ok Kang
collection	DOAJ
description	Abstract Alzheimer’s disease (AD), a progressive neurodegenerative condition, notably impacts cognitive functions and daily activity. One method of detecting dementia involves a task where participants describe a given picture, and extensive research has been conducted using the participants’ speech and transcribed text. However, very few studies have explored the modality of the image itself. In this work, we propose a method that predicts dementia automatically by representing the relationship between images and texts as a graph. First, we transcribe the participants’ speech into text using an automatic speech recognition system. Then, we employ a vision language model to represent the relationship between the parts of the image and the corresponding descriptive sentences as a bipartite graph. Finally, we use a graph convolutional network (GCN), considering each subject as an individual graph, to classify AD patients through a graph-level classification task. In experiments conducted on the ADReSSo Challenge datasets, our model surpassed the existing state-of-the-art performance by achieving an accuracy of 88.73%. Additionally, ablation studies that removed the relationship between images and texts demonstrated the critical role of graphs in improving performance. Furthermore, by utilizing the sentence representations learned through the GCN, we identified the sentences and keywords critical for AD classification.
format	Article
id	doaj-art-fc780b6cf11c4aca9246d6be539ef430
institution	Kabale University
issn	2045-2322
language	English
publishDate	2025-01-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj-art-fc780b6cf11c4aca9246d6be539ef430 | 2025-01-12T12:19:34Z | eng | Nature Portfolio | Scientific Reports | 2045-2322 | 2025-01-01 | 15 | 1 | 1 | 14 | 10.1038/s41598-024-82597-z | Alzheimer’s disease recognition using graph neural network by leveraging image-text similarity from vision language model | Byounghwa Lee (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute) | Jeong-Uk Bang (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute) | Hwa Jeon Song (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute) | Byung Ok Kang (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute) | Abstract Alzheimer’s disease (AD), a progressive neurodegenerative condition, notably impacts cognitive functions and daily activity. One method of detecting dementia involves a task where participants describe a given picture, and extensive research has been conducted using the participants’ speech and transcribed text. However, very few studies have explored the modality of the image itself. In this work, we propose a method that predicts dementia automatically by representing the relationship between images and texts as a graph. First, we transcribe the participants’ speech into text using an automatic speech recognition system. Then, we employ a vision language model to represent the relationship between the parts of the image and the corresponding descriptive sentences as a bipartite graph. Finally, we use a graph convolutional network (GCN), considering each subject as an individual graph, to classify AD patients through a graph-level classification task. In experiments conducted on the ADReSSo Challenge datasets, our model surpassed the existing state-of-the-art performance by achieving an accuracy of 88.73%. Additionally, ablation studies that removed the relationship between images and texts demonstrated the critical role of graphs in improving performance. Furthermore, by utilizing the sentence representations learned through the GCN, we identified the sentences and keywords critical for AD classification. | https://doi.org/10.1038/s41598-024-82597-z | Alzheimer’s disease | Bipartite graph | Dementia | Multimodal | Graph neural network | Vision language model
title	Alzheimer’s disease recognition using graph neural network by leveraging image-text similarity from vision language model
topic	Alzheimer’s disease; Bipartite graph; Dementia; Multimodal; Graph neural network; Vision language model
url	https://doi.org/10.1038/s41598-024-82597-z

Similar Items