Alzheimer’s disease recognition using graph neural network by leveraging image-text similarity from vision language model

Bibliographic Details
Main Authors: Byounghwa Lee, Jeong-Uk Bang, Hwa Jeon Song, Byung Ok Kang
Format: Article
Language: English
Published: Nature Portfolio, 2025-01-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-024-82597-z
Affiliation (all authors): Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute
ISSN: 2045-2322
Collection: DOAJ
Institution: Kabale University
Record ID: doaj-art-fc780b6cf11c4aca9246d6be539ef430
Keywords: Alzheimer’s disease; Bipartite graph; Dementia; Multimodal; Graph neural network; Vision language model
Abstract: Alzheimer’s disease (AD), a progressive neurodegenerative condition, notably impairs cognitive function and daily activities. One method of detecting dementia is a picture description task, in which participants describe a given picture; extensive research has been conducted on the participants’ speech and its transcribed text. However, very few studies have explored the modality of the image itself. In this work, we propose a method that predicts dementia automatically by representing the relationship between images and texts as a graph. First, we transcribe the participants’ speech into text using an automatic speech recognition system. Then, we employ a vision language model to represent the relationship between parts of the image and the corresponding descriptive sentences as a bipartite graph. Finally, treating each subject as an individual graph, we use a graph convolutional network (GCN) to classify AD patients through graph-level classification. In experiments conducted on the ADReSSo Challenge datasets, our model achieved an accuracy of 88.73%, surpassing the previous state-of-the-art performance. Additionally, ablation studies that removed the relationship between images and texts demonstrated the critical role of graphs in improving performance. Furthermore, using the sentence representations learned by the GCN, we identified the sentences and keywords that are critical for AD classification.
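The abstract describes the pipeline only at a high level (image regions and transcribed sentences joined in a bipartite graph by vision-language similarity, then a GCN performing graph-level classification). The Python sketch below illustrates that general idea under loudly stated assumptions and is not the authors' implementation: it starts from already-transcribed sentences (the speech recognition step is omitted), uses random vectors as stand-ins for vision language model embeddings, weights each region-sentence edge by cosine similarity and keeps it only above a hypothetical threshold of 0.2, and applies one graph convolution layer with the standard symmetric normalization, mean pooling as the graph readout, and a logistic output. All function names and parameters (`build_bipartite_adjacency`, `gcn_graph_score`, `threshold`, etc.) are illustrative.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_bipartite_adjacency(region_embs, sentence_embs, threshold=0.2):
    """Weighted adjacency matrix over nodes [regions..., sentences...].

    ASSUMPTION: a region-sentence edge is kept, weighted by its cosine
    similarity, only if that similarity exceeds a hypothetical `threshold`
    (assumed non-negative). Region-region and sentence-sentence entries
    stay zero, so the graph is bipartite.
    """
    r = len(region_embs)
    n = r + len(sentence_embs)
    adj = [[0.0] * n for _ in range(n)]
    for i, region in enumerate(region_embs):
        for j, sentence in enumerate(sentence_embs):
            sim = cosine(region, sentence)
            if sim > threshold:
                adj[i][r + j] = adj[r + j][i] = sim
    return adj


def normalize(adj):
    """Standard GCN propagation matrix D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python product of an (n x m) and an (m x k) list-of-lists matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_graph_score(adj, features, weight, readout):
    """One GCN layer + ReLU, mean pooling over nodes, then a logistic output in (0, 1)."""
    hidden = matmul(matmul(normalize(adj), features), weight)
    hidden = [[max(0.0, x) for x in row] for row in hidden]
    pooled = [sum(col) / len(hidden) for col in zip(*hidden)]
    logit = sum(p * w for p, w in zip(pooled, readout))
    return 1.0 / (1.0 + math.exp(-logit))


# Usage on one hypothetical participant graph: random vectors stand in for
# vision language model embeddings of 3 image regions and 5 transcribed sentences.
random.seed(0)
dim, hidden_dim = 8, 4
regions = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
sentences = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
adj = build_bipartite_adjacency(regions, sentences)
weight = [[random.gauss(0, 0.5) for _ in range(hidden_dim)] for _ in range(dim)]
readout = [random.gauss(0, 0.5) for _ in range(hidden_dim)]
score = gcn_graph_score(adj, regions + sentences, weight, readout)
print(f"P(AD) = {score:.3f}")  # untrained weights, so the value is not meaningful
```

Because the weights here are random and untrained, the printed probability carries no diagnostic meaning; in a real setup they would be learned from labeled graphs, one per participant, for example with a binary cross-entropy loss. Keeping regions and sentences in one node list with a block-structured adjacency lets a single GCN layer pass information only across the image-text edges.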