Resource Time Series Analysis and Forecasting in Large-Scale Virtual Clusters

In today’s rapidly evolving internet landscape, prominent companies across various industries face increasingly complex business operations, leading to significant cluster-scale growth. However, this growth brings about challenges in cluster management and the inefficient utilization of vast amounts...

Bibliographic Details
Main Authors: Yue Lin, Jiamin Wen, Xudong Zhang, Yan Liang, Jianjiang Li
Format: Article
Language: English
Published: Tsinghua University Press 2025-05-01
Series: Big Data Mining and Analytics
Subjects: workload forecasting; multivariate time series forecasting; deep learning
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2024.9020085
author Yue Lin
Jiamin Wen
Xudong Zhang
Yan Liang
Jianjiang Li
collection DOAJ
description In today’s rapidly evolving internet landscape, prominent companies across various industries face increasingly complex business operations, leading to significant cluster-scale growth. However, this growth brings about challenges in cluster management and the inefficient utilization of vast amounts of data due to its low value density. This paper, based on the large-scale cluster virtualization and monitoring system of the data center of the Bureau of Geophysical Prospecting (BGP), utilizes time series data of host resources from the monitoring system’s time series database to propose a multivariate multi-step time series forecasting model, MUL-CNN-BiGRU-Attention, for forecasting CPU load on virtual cluster hosts. The model undergoes extensive offline training using a large volume of time series data, followed by deployment using TensorFlow Serving. Recent small-batch data are employed for fine-tuning model parameters to better adapt to current data patterns. Comparative experiments are conducted between the proposed model and other baseline models, demonstrating notable improvements in Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R² metrics by up to 35.2%, 56.1%, 32.5%, and 10.3%, respectively. Additionally, ablation experiments are designed to investigate the impact of different factors on the performance of the forecasting model, providing valuable insights for parameter optimization based on experimental results.
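For readers unfamiliar with the four evaluation metrics named in the description, the following is a minimal sketch of their standard textbook definitions. It is not taken from the paper: the function name regression_metrics and the sample values are hypothetical, and the paper's own evaluation code may differ (for example, in how multi-step forecasts are averaged).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences.

    Hypothetical helper for illustration only; standard definitions,
    not the authors' implementation.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # Mean Absolute Error
    mse = sum(e * e for e in errors) / n    # Mean Squared Error
    rmse = math.sqrt(mse)                   # Root Mean Squared Error

    # R^2 = 1 - SS_res / SS_tot; undefined when y_true is constant.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


if __name__ == "__main__":
    # Made-up CPU-load values, purely to exercise the function.
    observed = [1.0, 2.0, 3.0, 4.0]
    forecast = [1.0, 2.0, 3.0, 6.0]
    print(regression_metrics(observed, forecast))
```

Lower MAE, MSE, and RMSE indicate a better fit, while R² is better when closer to 1, which is why an improvement in R² is an increase rather than a reduction.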
format Article
id doaj-art-3c46adcbc8ec4bb78b8e95331a855d8c
institution Kabale University
issn 2096-0654
2097-406X
language English
publishDate 2025-05-01
publisher Tsinghua University Press
record_format Article
series Big Data Mining and Analytics
spelling doaj-art-3c46adcbc8ec4bb78b8e95331a855d8c; 2025-08-20T03:55:49Z; eng; Tsinghua University Press; Big Data Mining and Analytics; ISSN 2096-0654, 2097-406X; 2025-05-01; vol. 8, no. 3, pp. 592-605; DOI 10.26599/BDMA.2024.9020085
Resource Time Series Analysis and Forecasting in Large-Scale Virtual Clusters
Yue Lin (Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China)
Jiamin Wen (BGP Inc., China National Petroleum Corporation, Zhuozhou 072751, China)
Xudong Zhang (National Engineering Research Center of Oil & Gas Exploration Computer Software, Zhuozhou 072751, China)
Yan Liang (BGP Inc., China National Petroleum Corporation, Zhuozhou 072751, China)
Jianjiang Li (Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China)
title Resource Time Series Analysis and Forecasting in Large-Scale Virtual Clusters
topic workload forecasting
multivariate time series forecasting
deep learning
url https://www.sciopen.com/article/10.26599/BDMA.2024.9020085