Enhancing Cluster Accuracy in Diabetes Multimorbidity With Dirichlet Process Mixture Models

Clustering of diabetic multimorbidity data from EHRs is challenging due to patient heterogeneity, high-dimensional variables, sensitivity to parameter settings, and high computational demands, which complicate clustering processes and may result in suboptimal clustering results. These complex and im...

Bibliographic Details
Main Authors: Francis John Kita, Srinivasa Rao Gaddes, Peter Josephat Kirigiti
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10816607/
author Francis John Kita
Srinivasa Rao Gaddes
Peter Josephat Kirigiti
collection DOAJ
description Clustering diabetic multimorbidity data from electronic health records (EHRs) is challenging because of patient heterogeneity, high-dimensional variables, sensitivity to parameter settings, and high computational demands, all of which complicate clustering and may lead to suboptimal results. The complex and imbalanced nature of diabetic multimorbidity data limits the effectiveness of traditional clustering techniques, which produce suboptimal clusters and yield few clinically meaningful insights. This study addresses this gap by applying the Dirichlet Process Mixture Model (DPMM), a non-parametric clustering approach that does not require the number of clusters to be specified in advance and adapts to the underlying data structure. The major advantages of DPMM are that it (1) automatically adjusts the number of clusters based on the data structure, capturing diverse patient profiles without predefined cluster counts, which suits the variability in multimorbidity patterns; (2) estimates the distributional properties directly from the data, reducing reliance on parameter choices and improving the stability of clustering results across datasets; and (3) uses a Bayesian framework to converge iteratively toward optimal clustering solutions, managing large datasets efficiently and producing more clinically meaningful clusters. Additionally, Gibbs sampling is employed so that parameter estimation converges robustly, reducing dependence on initial configurations and improving the consistency of clustering outcomes across data contexts. Results show that DPMM consistently outperforms traditional methods in clustering high-dimensional and imbalanced datasets, offering significant translational potential for guiding tailored healthcare strategies for complex chronic diseases and optimizing healthcare resource allocation.
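The description names a Dirichlet process mixture model fitted with Gibbs sampling but gives no implementation details. As an illustration only, the following is a minimal sketch of a generic textbook collapsed Gibbs sampler for a one-dimensional Gaussian DPMM with a known variance and a conjugate normal prior on the cluster means. It is not the authors' method: the function names (`gibbs_dpmm`, `predictive`), the hyperparameters (`alpha`, `sigma2`, `mu0`, `tau2`), and the synthetic data are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predictive(x, n, total, sigma2, mu0, tau2):
    """Posterior-predictive density of x for a cluster of n points summing to total.

    Assumes x ~ N(mu, sigma2) with sigma2 known and mu ~ N(mu0, tau2).
    With n == 0 this reduces to the prior predictive N(mu0, sigma2 + tau2).
    """
    precision = 1.0 / tau2 + n / sigma2
    mean = (mu0 / tau2 + total / sigma2) / precision
    return normal_pdf(x, mean, sigma2 + 1.0 / precision)


def gibbs_dpmm(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=25.0, sweeps=50, seed=0):
    """Collapsed Gibbs sampling of cluster labels; returns the final label list."""
    rng = random.Random(seed)
    labels = [0] * len(data)      # start with every point in one cluster
    counts = {0: len(data)}       # cluster label -> number of points
    sums = {0: sum(data)}         # cluster label -> sum of its points
    next_label = 1                # label reserved for a newly opened cluster
    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster.
            k = labels[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            # Existing clusters are weighted by size times predictive density.
            candidates = list(counts)
            weights = [
                counts[c] * predictive(x, counts[c], sums[c], sigma2, mu0, tau2)
                for c in candidates
            ]
            # A new cluster is weighted by alpha times the prior predictive.
            candidates.append(next_label)
            weights.append(alpha * predictive(x, 0, 0.0, sigma2, mu0, tau2))
            choice = rng.choices(candidates, weights=weights)[0]
            if choice == next_label:
                counts[choice] = 0
                sums[choice] = 0.0
                next_label += 1
            counts[choice] += 1
            sums[choice] += x
            labels[i] = choice
    return labels


# Synthetic example: two well-separated groups; no cluster count is supplied.
rng = random.Random(1)
data = [rng.gauss(-4.0, 0.5) for _ in range(30)] + [rng.gauss(4.0, 0.5) for _ in range(30)]
labels = gibbs_dpmm(data)
print(len(set(labels)), "clusters found")
```

The number of clusters is never passed in: each point either joins an existing cluster or opens a new one, with the concentration parameter `alpha` controlling how readily new clusters appear. Real EHR data would need a multivariate or mixed-type likelihood, which this sketch does not attempt.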
format Article
id doaj-art-36dc6f29c9ab4c95b5aa04265b12f534
institution Kabale University
issn 2169-3536
spelling doaj-art-36dc6f29c9ab4c95b5aa04265b12f534
2025-01-14T00:02:18Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
13
6422
6439
10.1109/ACCESS.2024.3523340
10816607
Enhancing Cluster Accuracy in Diabetes Multimorbidity With Dirichlet Process Mixture Models
Francis John Kita, https://orcid.org/0009-0001-8009-120X, Department of Mathematics and Statistics, The University of Dodoma, Dodoma, Tanzania
Srinivasa Rao Gaddes, https://orcid.org/0000-0002-3683-5486, Department of Mathematics and Statistics, The University of Dodoma, Dodoma, Tanzania
Peter Josephat Kirigiti, Department of Mathematics and Statistics, The University of Dodoma, Dodoma, Tanzania
https://ieeexplore.ieee.org/document/10816607/
Clustering methods
diabetic multimorbidity
Dirichlet process mixture model
infinite mixture models
validation metrics
title Enhancing Cluster Accuracy in Diabetes Multimorbidity With Dirichlet Process Mixture Models
topic Clustering methods
diabetic multimorbidity
Dirichlet process mixture model
infinite mixture models
validation metrics
url https://ieeexplore.ieee.org/document/10816607/