Soft graph clustering for single-cell RNA sequencing data

Bibliographic Details
Main Authors: Ping Xu, Pengfei Wang, Zhiyuan Ning, Meng Xiao, Min Wu, Yuanchun Zhou
Format: Article
Language: English
Published: BMC 2025-07-01
Series: BMC Bioinformatics
Online Access:https://doi.org/10.1186/s12859-025-06231-z

Abstract

Background: Clustering analysis is fundamental to single-cell RNA sequencing (scRNA-seq) data analysis because it elucidates cellular heterogeneity and diversity. Recent graph-based scRNA-seq clustering methods, particularly graph neural networks (GNNs), have made significant progress in tackling high dimensionality, high sparsity, and frequent dropout events, which lead to ambiguous cell population boundaries. However, one major challenge for GNN-based methods is their reliance on hard graphs constructed from similarity matrices. These constructions are problematic for scRNA-seq data for two reasons: (i) applying a threshold reduces intercellular relationships to binary edges (0 or 1), which prevents the graph from capturing continuous similarities among cells and causes significant information loss; and (ii) hard graphs contain substantial inter-cluster connections, which can confuse GNN methods that rely heavily on graph structure, potentially causing erroneous message propagation and biased clustering outcomes.

Results: To tackle these challenges, we introduce scSGC, a Soft Graph Clustering method for single-cell RNA sequencing data, which aims to characterize continuous similarities among cells more accurately through non-binary edge weights, thereby mitigating the limitations of rigid data structures. The scSGC framework comprises three core components: (i) a zero-inflated negative binomial (ZINB)-based feature autoencoder designed to handle the sparsity and dropout issues in scRNA-seq data; (ii) a dual-channel cut-informed soft graph embedding module, constructed from deep graph-cut information, which captures continuous similarities between cells while preserving the intrinsic structure of scRNA-seq data; and (iii) an optimal transport-based clustering optimization module, which achieves an optimal delineation of cell populations while maintaining high biological relevance.

Conclusion: By integrating dual-channel cut-informed soft graph representation learning, a ZINB-based feature autoencoder, and optimal transport-driven clustering optimization, scSGC effectively overcomes the challenges associated with traditional hard graph constructions in GNN methods. Extensive experiments on ten datasets demonstrate that scSGC outperforms 13 state-of-the-art clustering models in clustering accuracy, cell type annotation, and computational efficiency. These results highlight its substantial potential to advance scRNA-seq data analysis and deepen our understanding of cellular heterogeneity.
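
The central contrast in this abstract, between a hard graph that thresholds cell-cell similarities into binary (0 or 1) edges and a soft graph that keeps non-binary edge weights, can be illustrated with a short NumPy sketch. It is not the scSGC implementation, which derives its soft graph from deep graph-cut information. Cosine similarity, the 0.5 threshold, row normalization, the toy expression matrix, and the helper names `cosine_similarity`, `hard_graph`, and `soft_graph` are all illustrative assumptions.

```python
import numpy as np


def cosine_similarity(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows (cells) of a cells x genes matrix."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero cells (e.g. heavy dropout)
    xn = x / norms
    return xn @ xn.T


def hard_graph(sim: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical hard graph: 1 where similarity >= threshold, else 0 (no self-loops)."""
    adj = (sim >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj


def soft_graph(sim: np.ndarray) -> np.ndarray:
    """Hypothetical soft graph: keep continuous, non-negative similarities as edge
    weights (no self-loops), then row-normalize so each cell's weights sum to 1."""
    w = np.clip(sim, 0.0, None)
    np.fill_diagonal(w, 0.0)
    row_sums = w.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # isolated cells keep all-zero rows
    return w / row_sums


# Toy expression matrix (assumed): 4 cells x 3 genes; cells 0-1 and 2-3 resemble each other.
cells = np.array([
    [5.0, 0.0, 1.0],
    [4.0, 0.0, 2.0],
    [0.0, 3.0, 3.0],
    [0.0, 4.0, 1.0],
])
sim = cosine_similarity(cells)

hard = hard_graph(sim, threshold=0.5)  # keeps only edges 0-1 and 2-3
soft = soft_graph(sim)                 # also keeps weak, graded links such as 0-2

print(hard)
print(np.round(soft, 3))
```

Under these assumptions, the hard graph discards the weak similarity between cells 0 and 2 entirely, while the soft graph retains it with a small weight; this is the kind of continuous information the abstract says thresholding loses. The sketch does not model how scSGC learns its edge weights from graph-cut information.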

Additional Details
ISSN: 1471-2105
Keywords: Bioinformatics; scRNA-seq data; Soft graph clustering; Deep cut-informed graph embedding
Affiliations: Ping Xu, Pengfei Wang, Zhiyuan Ning, Meng Xiao, and Yuanchun Zhou (Computer Network Information Center, Chinese Academy of Sciences); Min Wu (Duke-NUS Medical School, National University of Singapore)