Optimizing machine learning for network inference through comparative analysis of model performance in synthetic and real-world networks

Abstract: Understanding the structural and operational characteristics of complex systems is crucial for network science research and analysis. Understanding the dynamics and behavior of networks requires studying them in a variety of settings, including social, biological, and technical domains. This entails modeling and analyzing networks to identify their properties, frequently using machine learning and statistical techniques. Conventional network models, such as Erdős-Rényi (ER), Barabási-Albert (BA), and Stochastic Block Models (SBM), are commonly employed in synthetic network analysis. Real-world networks, however, often exhibit additional complexities, such as modularity, clustering, and scale-free features, which pose challenges for these models. This study assesses the effectiveness of machine learning models in examining the structural features of networks across different scales, together with the associated computational costs. We show that Logistic Regression (LR) consistently outperforms Random Forest (RF) on synthetic networks of varying sizes, achieving perfect accuracy, precision, recall, F1 score, and AUC on networks with 100, 500, and 1000 nodes, while Random Forest achieves lower performance with an accuracy of 80%. These findings call into question the assumption that complex models such as Random Forest are inherently superior, indicating that simpler models such as Logistic Regression can be more effective in larger, more complex networks owing to their stronger generalization. The Stochastic Block Model (SBM) closely matches the modularity of real-world networks, while the Barabási-Albert (BA) model accurately replicates the hub-dominated structure of social networks, as confirmed by Kolmogorov-Smirnov (K-S) test statistics of $D = 0.12$ ($p = 0.18$) for BA and $D = 0.33$ ($p = 0.005$) for the Watts-Strogatz (WS) model. These results show that simpler machine learning models can outperform more sophisticated ones in some contexts, offering a more nuanced view of model selection based on network scale and complexity. They also emphasize the importance of balancing computational trade-offs when applying machine learning to real-world networks. More broadly, this research helps to optimize machine learning techniques for network inference and analysis, with implications for social, biological, and technical applications. The findings suggest that future work should focus on adapting model selection to the specific characteristics of the network and task, ensuring optimal performance and accuracy.
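
As a minimal illustration of the kind of comparison the abstract describes (not the authors' actual pipeline), the sketch below generates ER and BA graphs with networkx, trains Logistic Regression and Random Forest classifiers on simple node-level structural features, reports accuracy, precision, recall, F1, and AUC, and runs a two-sample Kolmogorov-Smirnov test on degree distributions. The library choices (networkx, scikit-learn, scipy), the feature set, the generator parameters, and the toy labeling task are assumptions made for illustration, not details taken from the paper.

# Illustrative sketch only: LR vs. RF on node-level structural features of
# synthetic graphs, plus a K-S comparison of degree distributions.
# All parameters and the labeling task are assumptions, not the paper's setup.
import networkx as nx
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

def node_features(G):
    """Simple structural features per node: degree, clustering, betweenness."""
    deg = dict(G.degree())
    clust = nx.clustering(G)
    btw = nx.betweenness_centrality(G)
    return np.array([[deg[v], clust[v], btw[v]] for v in G.nodes()])

n = 500
# Two of the synthetic generators mentioned in the abstract.
G_ba = nx.barabasi_albert_graph(n, m=3, seed=1)
G_er = nx.erdos_renyi_graph(n, p=0.01, seed=1)

# Toy task: label BA nodes 1 and ER nodes 0, then ask whether each classifier
# can separate the two regimes from structural features alone.
X = np.vstack([node_features(G_ba), node_features(G_er)])
y = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

for name, clf in [("LR", LogisticRegression(max_iter=1000)),
                  ("RF", RandomForestClassifier(n_estimators=200, random_state=0))]:
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    prob = clf.predict_proba(X_te)[:, 1]
    print(name,
          "acc=%.2f" % accuracy_score(y_te, pred),
          "prec=%.2f" % precision_score(y_te, pred),
          "rec=%.2f" % recall_score(y_te, pred),
          "f1=%.2f" % f1_score(y_te, pred),
          "auc=%.2f" % roc_auc_score(y_te, prob))

# K-S comparison of degree distributions, analogous to the abstract's D and p
# values; a BA graph stands in for the "real" network purely for illustration.
real_degrees = [d for _, d in G_ba.degree()]
for name, G in [("ER", G_er), ("WS", nx.watts_strogatz_graph(n, k=6, p=0.1, seed=1))]:
    D, p = ks_2samp(real_degrees, [d for _, d in G.degree()])
    print(name, "D=%.2f" % D, "p=%.3f" % p)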

Bibliographic Details
Main Authors: Ruby Khan, Sumbal Khan, Bakht Pari, Krzysztof Puszynski
Format: Article
Language: English
Published: Nature Portfolio, 2025-07-01
Series: Scientific Reports
ISSN: 2045-2322
Author affiliations: Ruby Khan and Krzysztof Puszynski, Department of System Biology and Engineering, Silesian University of Technology; Sumbal Khan, Khyber Girls Medical College; Bakht Pari, Sarhad University of Science and Information Technology
Subjects: Network science; Machine learning; Network inference; Logistic regression; Random forest; Scale-free networks
Online Access: https://doi.org/10.1038/s41598-025-02982-0