Optimizing machine learning for network inference through comparative analysis of model performance in synthetic and real-world networks
Abstract Understanding the structural and operational characteristics of complex systems is crucial for network science research and analysis. Gaining insight into the dynamics and behavior of networks requires studying them across a variety of settings, including social, biological, and technical...
| Main Authors: | Ruby Khan, Sumbal Khan, Bakht Pari, Krzysztof Puszynski |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
| Subjects: | Network science; Machine learning; Network inference; Logistic regression; Random forest; Scale-free networks |
| Online Access: | https://doi.org/10.1038/s41598-025-02982-0 |
| _version_ | 1849332916646051840 |
|---|---|
| author | Ruby Khan, Sumbal Khan, Bakht Pari, Krzysztof Puszynski |
| collection | DOAJ |
| description | Abstract Understanding the structural and operational characteristics of complex systems is crucial for network science research and analysis. Gaining insight into the dynamics and behavior of networks requires studying them across a variety of settings, including social, biological, and technical domains. This entails modeling and analyzing networks to identify their properties, frequently with machine learning and statistical techniques. Conventional network models, such as Erdős-Rényi (ER), Barabási-Albert (BA), and Stochastic Block Models (SBM), are commonly employed in synthetic network analysis. Real-world networks, however, often exhibit additional complexities, such as modularity, clustering, and scale-free features, which pose challenges for these models. This study assesses the effectiveness of machine learning models in examining the structural features of networks across different scales, along with the associated computational costs. Here we show that Logistic Regression (LR) consistently outperforms Random Forest (RF) on synthetic networks of varying sizes, achieving perfect accuracy, precision, recall, F1 score, and AUC on networks with 100, 500, and 1000 nodes, while Random Forest reaches a lower accuracy of 80%. These findings call into question the notion that more complex models such as Random Forest are inherently superior, indicating that simpler models such as Logistic Regression can be more effective in larger, more complex networks owing to their stronger generalization. The Stochastic Block Model (SBM) closely matches the modularity of real-world networks, while the Barabási-Albert (BA) model accurately replicates the hub-dominated structure of social networks, as confirmed by Kolmogorov-Smirnov (K-S) test statistics of D = 0.12 (p = 0.18) for BA and D = 0.33 (p = 0.005) for the Watts-Strogatz (WS) model. These results show that simpler machine learning models can outperform more sophisticated ones in some contexts, offering a more nuanced view of model selection based on network scale and complexity. They also underline the importance of balancing computational trade-offs when applying machine learning to real-world networks. More broadly, this research helps optimize machine learning techniques for network inference and analysis, with implications for social, biological, and technical applications. The findings suggest that future work should focus on adapting model selection to the specific characteristics of the network and task, ensuring optimal performance and accuracy. (A hypothetical code sketch illustrating this workflow follows the record below.) |
| format | Article |
| id | doaj-art-9fbd918bee994cd097bc071935ccb51c |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| title | Optimizing machine learning for network inference through comparative analysis of model performance in synthetic and real-world networks |
| topic | Network science; Machine learning; Network inference; Logistic regression; Random forest; Scale-free networks |
| url | https://doi.org/10.1038/s41598-025-02982-0 |
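The abstract describes a two-part workflow: generating synthetic networks (ER, BA, WS, SBM), extracting structural features for Logistic Regression and Random Forest classifiers, and comparing model degree distributions against an observed network with a Kolmogorov-Smirnov test. The Python sketch below is a minimal, hypothetical illustration of that general workflow, not the authors' code: the binary task (BA-vs-ER node classification), the feature set (degree, clustering, betweenness centrality), the placeholder "real" network, and all parameters are assumptions made for this example, using networkx, scikit-learn, and SciPy.

```python
# Minimal sketch of the workflow described in the abstract (assumptions noted
# in comments). Not the published code: task framing, features, and parameters
# are illustrative only.
import networkx as nx
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split


def node_features(G):
    """Per-node structural features: degree, clustering, betweenness centrality."""
    deg = dict(G.degree())
    clu = nx.clustering(G)
    bet = nx.betweenness_centrality(G)
    return np.array([[deg[n], clu[n], bet[n]] for n in G.nodes()])


def labeled_dataset(n=500, seed=42):
    """Assumed binary task: does a node come from a hub-dominated BA graph
    (label 1) or a homogeneous ER graph with similar density (label 0)?"""
    ba = nx.barabasi_albert_graph(n, m=3, seed=seed)   # scale-free
    er = nx.erdos_renyi_graph(n, p=6 / n, seed=seed)   # roughly matched mean degree
    X = np.vstack([node_features(ba), node_features(er)])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    return X, y


X, y = labeled_dataset(n=500)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = [
    ("Logistic Regression", LogisticRegression(max_iter=1000)),
    ("Random Forest", RandomForestClassifier(n_estimators=200, random_state=0)),
]
for name, model in models:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    proba = model.predict_proba(X_te)[:, 1]
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"F1={f1_score(y_te, pred):.3f} AUC={roc_auc_score(y_te, proba):.3f}")

# Two-sample K-S test comparing the degree distribution of a fitted synthetic
# model with an observed network. The karate-club graph is only a stand-in for
# the study's real-world data.
real = nx.karate_club_graph()
ba_fit = nx.barabasi_albert_graph(real.number_of_nodes(), m=2, seed=1)
D, p = stats.ks_2samp([d for _, d in real.degree()],
                      [d for _, d in ba_fit.degree()])
print(f"K-S test, BA vs observed degrees: D={D:.2f}, p={p:.3f}")
```

On toy data like this, the linear model typically matches the ensemble at a fraction of the training cost, which is the kind of scale-versus-complexity trade-off the abstract emphasizes; the reported statistics (e.g. D = 0.12, p = 0.18) come from the paper's own experiments and are not reproduced by this placeholder.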