Optimizing machine learning for network inference through comparative analysis of model performance in synthetic and real-world networks
| Main Authors: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
| Subjects: | |
| Online Access: | https://doi.org/10.1038/s41598-025-02982-0 |
| Summary: | Abstract Understanding the structural and operational characteristics of complex systems is crucial for network science research and analysis. Understanding the dynamics and behaviors of networks involves studying them in a variety of settings, including social, biological, and technical domains. This entails modeling and analyzing networks to identify their properties, frequently employing machine learning and statistical techniques. Conventional network models, such as Erdős-Rényi (ER), Barabási-Albert (BA), and Stochastic Block Models (SBM), are commonly employed in synthetic network analysis. Real-world networks often exhibit additional complexities, such as modularity, clustering, and scale-free features, which pose challenges for these models. This study assesses the effectiveness of machine learning models in examining the structural features of networks across different scales, along with the associated computational costs. Here we show that Logistic Regression (LR) consistently outperforms Random Forest (RF) in synthetic networks of varying sizes, achieving perfect accuracy, precision, recall, F1 score, and AUC across networks with 100, 500, and 1000 nodes, while Random Forest exhibits lower performance with an accuracy of 80%. These findings call into question the notion that complex models like Random Forest are inherently superior, indicating that simpler models like Logistic Regression can be more effective in larger, more complex networks due to their stronger generalization. The Stochastic Block Model (SBM) closely matches the modularity of real-world networks, while the Barabási-Albert (BA) model accurately replicates the hub-dominated structure of social networks, as confirmed by Kolmogorov-Smirnov (K-S) test statistics of $D = 0.12$ ($p = 0.18$) for BA and $D = 0.33$ ($p = 0.005$) for the Watts-Strogatz (WS) model. These findings show that simpler machine learning models can outperform more sophisticated ones in some contexts, offering a more nuanced view of model selection based on network scale and complexity. They also emphasize the importance of weighing computational trade-offs when applying machine learning to real-world networks. More broadly, this research helps optimize machine learning techniques for network inference and analysis, with implications for social, biological, and technical applications. The findings suggest that future research should concentrate on adapting model selection to the specific characteristics of the network and task, ensuring optimal performance and accuracy. |
| ISSN: | 2045-2322 |
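The abstract above describes two technical steps: comparing Logistic Regression and Random Forest on structural features of synthetic networks (ER, BA, SBM), and using a Kolmogorov-Smirnov test to compare model and real-world degree distributions. The sketch below is a minimal illustration of such a pipeline, not the authors' method: it assumes Python with networkx, scikit-learn, scipy, and numpy (none of which are named in the record), and the feature set, graph parameters, and two-class task are illustrative choices.

```python
# Hedged sketch: assumes networkx, scikit-learn, scipy, numpy are available.
# Feature set, graph parameters, and the ER-vs-BA task are illustrative
# assumptions, not the paper's exact pipeline.
import numpy as np
import networkx as nx
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

def graph_features(G):
    """A few global structural features of a graph (illustrative choice)."""
    degrees = [d for _, d in G.degree()]
    return [
        np.mean(degrees),          # average degree
        np.var(degrees),           # degree heterogeneity
        nx.average_clustering(G),  # clustering coefficient
        nx.density(G),             # edge density
    ]

def make_dataset(n_nodes=100, samples_per_class=50, seed=0):
    """Label 0 = Erdos-Renyi graph, label 1 = Barabasi-Albert graph."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(samples_per_class):
        er = nx.erdos_renyi_graph(n_nodes, p=0.05, seed=int(rng.integers(1_000_000)))
        ba = nx.barabasi_albert_graph(n_nodes, m=3, seed=int(rng.integers(1_000_000)))
        X.append(graph_features(er)); y.append(0)
        X.append(graph_features(ba)); y.append(1)
    return np.array(X), np.array(y)

# Compare Logistic Regression and Random Forest on the same features.
X, y = make_dataset(n_nodes=100)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
for name, model in [("LR", LogisticRegression(max_iter=1000)),
                    ("RF", RandomForestClassifier(n_estimators=100, random_state=0))]:
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    print(name,
          "accuracy:", round(accuracy_score(y_te, model.predict(X_te)), 3),
          "AUC:", round(roc_auc_score(y_te, proba), 3))

# Kolmogorov-Smirnov test on degree distributions: a candidate generative model
# versus a stand-in for a real-world network (here also synthetic, for brevity).
observed = nx.barabasi_albert_graph(1000, m=4, seed=1)   # placeholder for real data
candidate = nx.barabasi_albert_graph(1000, m=4, seed=2)
D, p = ks_2samp([d for _, d in observed.degree()],
                [d for _, d in candidate.degree()])
print(f"K-S statistic D = {D:.3f}, p-value = {p:.3f}")
```

In this toy setup the two classes are largely separable from a handful of summary statistics, which is consistent with the record's point that a simple linear model can match or beat an ensemble when the features already capture the relevant structure; the K-S step mirrors the $D$ and $p$ comparison quoted in the abstract, though here both samples are synthetic rather than drawn from a real-world network.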