Distilling knowledge from graph neural networks trained on cell graphs to non-neural student models

Bibliographic Details
Main Authors: Vasundhara Acharya (Rensselaer Polytechnic Institute), Bülent Yener (Professor, Rensselaer Polytechnic Institute), Gillian Beamer (Adjunct Associate Professor, Texas Biomedical Research Institute)
Format: Article
Language: English
Published: Nature Portfolio, 2025-08-01
Series: Scientific Reports
ISSN: 2045-2322
Subjects: Whole slide imaging; Graph neural networks; Cell graphs; Knowledge distillation; Non-neural models; Tuberculosis
Online Access: https://doi.org/10.1038/s41598-025-13697-7

Abstract
The development and refinement of artificial intelligence (AI) and machine learning algorithms have been an area of intense research in radiology and pathology, particularly for automated or computer-aided diagnosis. Whole Slide Imaging (WSI) has emerged as a promising tool for developing and utilizing such algorithms in diagnostic and experimental pathology. However, patch-wise analysis of WSIs often falls short of capturing the intricate cell-level interactions within the local microenvironment. A robust alternative is to leverage cell graph representations, which enable a more detailed analysis of local cell interactions. These cell graphs encapsulate the local spatial arrangement of cells in histopathology images, a factor with proven prognostic value. Graph Neural Networks (GNNs) can effectively exploit these spatial representations alongside other features, demonstrating promising performance on classification tasks of varying complexity. It is also feasible to distill the knowledge acquired by deep neural networks into smaller student models through knowledge distillation (KD), achieving goals such as model compression and performance enhancement. Traditional approaches to constructing cell graphs generally rely on edge thresholds defined by sparsity/density or on the assumption that nearby cells interact; such methods may fail to capture biologically meaningful interactions. Additionally, existing work on knowledge distillation focuses primarily on distillation between neural networks. To address these limitations, we designed cell graphs with biologically informed edge thresholds or criteria, moving beyond density/sparsity-based definitions. Furthermore, we demonstrated that student models need not be neural networks: even non-neural models can learn from a neural network teacher. We evaluated our approach across varying dataset complexities, including the presence or absence of distribution shifts, varying degrees of class imbalance, and different levels of graph complexity for training GNNs. We also investigated whether softened probabilities obtained from calibrated logits offered better guidance than raw logits. Our experiments revealed that the teacher’s guidance was effective when distribution shifts existed in the data. The teacher model demonstrated decent performance owing to its higher complexity and its ability to use cell graph structure and features, and its logits provided rich information and regularization to students, mitigating the risk of overfitting the training distribution. We also examined differences in feature importance between student models trained with the teacher’s logits and their counterparts trained on hard labels. In particular, on the Tuberculosis (TB) dataset, the logit-guided student placed stronger emphasis on morphological features than the models trained with hard labels, an emphasis that aligns closely with the features pathologists typically prioritize for diagnosis. Future work could explore designing alternative teacher models, evaluating the proposed approach on larger datasets, and investigating causal knowledge distillation as a potential extension.
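
To make the contrast with conventional graph construction concrete, the following is a minimal sketch of a distance-thresholded cell graph, the kind of density/sparsity-driven construction the abstract argues against. The centroids, the 30-pixel radius, and the toy setup are hypothetical placeholders; the paper's graphs instead use biologically informed edge criteria that are not reproduced here.

```python
# Minimal sketch of a conventional, distance-thresholded cell graph.
# Centroids and radius are hypothetical; the paper replaces this purely
# geometric rule with biologically informed edge criteria.
import numpy as np
from scipy.spatial import cKDTree

def build_cell_graph(centroids: np.ndarray, radius: float = 30.0) -> np.ndarray:
    """Connect every pair of cells whose centroids lie within `radius` pixels."""
    tree = cKDTree(centroids)
    pairs = tree.query_pairs(r=radius)          # set of (i, j) index tuples
    return np.array(sorted(pairs), dtype=int)   # edge list, shape (n_edges, 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centroids = rng.uniform(0, 512, size=(200, 2))   # toy nucleus centroids (pixels)
    edges = build_cell_graph(centroids)
    print(f"{len(centroids)} cells, {len(edges)} edges")
```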
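
The distillation of a GNN teacher into a non-neural student can likewise be sketched as fitting a classical model to temperature-softened teacher probabilities instead of hard labels. The synthetic logits, the random-forest student, and the fixed temperature below are illustrative assumptions, not the paper's training recipe or calibration procedure.

```python
# Minimal sketch of distilling temperature-softened GNN-teacher probabilities
# into a non-neural student. Features, teacher logits, and the temperature are
# synthetic placeholders, not the paper's datasets or calibration procedure.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def soften(logits: np.ndarray, T: float = 2.0) -> np.ndarray:
    """Temperature-scaled softmax; larger T yields softer probabilities."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))                 # toy graph-level features per image
teacher_logits = rng.normal(size=(500, 2))     # logits a trained GNN teacher might emit
soft_targets = soften(teacher_logits, T=2.0)   # softened two-class probabilities

# Non-neural student: regress the softened probability of class 1,
# then threshold it to recover a hard prediction.
student = RandomForestRegressor(n_estimators=200, random_state=0)
student.fit(X, soft_targets[:, 1])
pred = (student.predict(X) >= 0.5).astype(int)
print("predicted class balance:", np.bincount(pred))
```

In this toy setup the temperature stands in for the calibration step the abstract alludes to; in practice it would be tuned on held-out data, and the soft-target student compared against an identical model trained only on hard labels.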