Optimizing machine learning for network inference through comparative analysis of model performance in synthetic and real-world networks

Abstract: Understanding the structural and operational characteristics of complex systems is crucial for network science research and analysis. Understanding the dynamics and behavior of networks requires studying them in a variety of settings, including social, biological, and technical domains. This entails modeling and analyzing networks to identify their properties, frequently using machine learning and statistical techniques. Conventional network models, such as Erdős-Rényi (ER), Barabási-Albert (BA), and Stochastic Block Models (SBM), are commonly employed in synthetic network analysis. Real-world networks, however, often exhibit additional complexities, such as modularity, clustering, and scale-free features, which pose challenges for these models. This study assesses the effectiveness of machine learning models in examining the structural features of networks across different scales, together with the associated computational costs. We show that Logistic Regression (LR) consistently outperforms Random Forest (RF) on synthetic networks of varying sizes, achieving perfect accuracy, precision, recall, F1 score, and AUC on networks with 100, 500, and 1000 nodes, while Random Forest achieves lower performance with an accuracy of 80%. These findings call into question the assumption that complex models such as Random Forest are inherently superior, indicating that simpler models such as Logistic Regression can be more effective in larger, more complex networks owing to their stronger generalization. The Stochastic Block Model (SBM) closely matches the modularity of real-world networks, while the Barabási-Albert (BA) model accurately replicates the hub-dominated structure of social networks, as confirmed by Kolmogorov-Smirnov (K-S) test statistics of $D = 0.12$ ($p = 0.18$) for BA and $D = 0.33$ ($p = 0.005$) for the Watts-Strogatz (WS) model. These results show that simpler machine learning models can outperform more sophisticated ones in some contexts, offering a more nuanced view of model selection based on network scale and complexity. They also emphasize the importance of balancing computational trade-offs when applying machine learning to real-world networks. More broadly, this research helps to optimize machine learning techniques for network inference and analysis, with implications for social, biological, and technical applications. The findings suggest that future work should focus on adapting model selection to the specific characteristics of the network and task, ensuring optimal performance and accuracy.
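
As a minimal illustration of the kind of comparison the abstract describes (not the authors' actual pipeline), the sketch below generates ER and BA graphs with networkx, trains Logistic Regression and Random Forest classifiers on simple node-level structural features, reports accuracy, precision, recall, F1, and AUC, and runs a two-sample Kolmogorov-Smirnov test on degree distributions. The library choices (networkx, scikit-learn, scipy), the feature set, the generator parameters, and the toy labeling task are assumptions made for illustration, not details taken from the paper.

# Illustrative sketch only: LR vs. RF on node-level structural features of
# synthetic graphs, plus a K-S comparison of degree distributions.
# All parameters and the labeling task are assumptions, not the paper's setup.
import networkx as nx
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

def node_features(G):
    """Simple structural features per node: degree, clustering, betweenness."""
    deg = dict(G.degree())
    clust = nx.clustering(G)
    btw = nx.betweenness_centrality(G)
    return np.array([[deg[v], clust[v], btw[v]] for v in G.nodes()])

n = 500
# Two of the synthetic generators mentioned in the abstract.
G_ba = nx.barabasi_albert_graph(n, m=3, seed=1)
G_er = nx.erdos_renyi_graph(n, p=0.01, seed=1)

# Toy task: label BA nodes 1 and ER nodes 0, then ask whether each classifier
# can separate the two regimes from structural features alone.
X = np.vstack([node_features(G_ba), node_features(G_er)])
y = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

for name, clf in [("LR", LogisticRegression(max_iter=1000)),
                  ("RF", RandomForestClassifier(n_estimators=200, random_state=0))]:
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    prob = clf.predict_proba(X_te)[:, 1]
    print(name,
          "acc=%.2f" % accuracy_score(y_te, pred),
          "prec=%.2f" % precision_score(y_te, pred),
          "rec=%.2f" % recall_score(y_te, pred),
          "f1=%.2f" % f1_score(y_te, pred),
          "auc=%.2f" % roc_auc_score(y_te, prob))

# K-S comparison of degree distributions, analogous to the abstract's D and p
# values; a BA graph stands in for the "real" network purely for illustration.
real_degrees = [d for _, d in G_ba.degree()]
for name, G in [("ER", G_er), ("WS", nx.watts_strogatz_graph(n, k=6, p=0.1, seed=1))]:
    D, p = ks_2samp(real_degrees, [d for _, d in G.degree()])
    print(name, "D=%.2f" % D, "p=%.3f" % p)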

Bibliographic Details
Main Authors: Ruby Khan, Sumbal Khan, Bakht Pari, Krzysztof Puszynski
Format: Article
Language: English
Published: Nature Portfolio, 2025-07-01
Series: Scientific Reports
ISSN: 2045-2322
Author affiliations: Ruby Khan and Krzysztof Puszynski, Department of System Biology and Engineering, Silesian University of Technology; Sumbal Khan, Khyber Girls Medical College; Bakht Pari, Sarhad University of Science and Information Technology
Subjects: Network science; Machine learning; Network inference; Logistic regression; Random forest; Scale-free networks
Online Access: https://doi.org/10.1038/s41598-025-02982-0