Virtual Machine Proactive Fault Tolerance Using Log-Based Anomaly Detection
| Format: | Article |
|---|---|
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Online Access: | https://ieeexplore.ieee.org/document/10767421/ |
| Summary: | Virtual Machine (VM) fault tolerance ensures high availability in cloud computing environments. Proactive fault tolerance strategies avert service disruptions by detecting potential failures before they occur and migrating the VMs to healthy hosts. In this paper, we propose Virtual Machine Proactive Fault Tolerance using Log-based Anomaly Detection (VMFT-LAD), a semi-supervised, real-time log anomaly detection model capable of detecting failures ahead of time to provide effective VM fault tolerance. VMFT-LAD leverages the efficiency of the Matrix Profile for anomaly detection and the log inference capability of Large Language Models (LLMs) to identify potential VM failures early, while minimizing false positives. Our improved Matrix Profile enables VMFT-LAD to continuously learn and identify potential failures, including unforeseen fault types, with minimal human intervention. Additionally, its semi-supervised nature eliminates the need for labeled failure data. Extensive evaluations on several datasets, using two distinct criteria to validate anomaly detection and early failure detection capabilities, demonstrate VMFT-LAD’s outstanding performance. VMFT-LAD achieves a Numenta Anomaly Benchmark (NAB) standard score of 90.74 for predicting failures in advance, with a high early detection rate of 96.28% and a low false positive rate of 0.02%, enabling accurate and timely VM migration before failures occur. |
| ISSN: | 2169-3536 |
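The summary names the Matrix Profile as the anomaly-detection core of VMFT-LAD, but it does not describe the authors' improved variant. As a rough illustration only, the sketch below computes the standard matrix profile naively (O(n²·m)): for each length-m subsequence it records the z-normalized Euclidean distance to its nearest non-trivial neighbour, and the subsequence with the largest such distance (a "discord") is the most anomalous. The function names, the window length `m`, the exclusion-zone heuristic, and the assumption that log events have already been turned into a numeric series are all hypothetical choices made here, not the paper's method.

```python
import numpy as np


def znorm(x: np.ndarray) -> np.ndarray:
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    std = x.std()
    if std < 1e-8:
        return x - x.mean()
    return (x - x.mean()) / std


def matrix_profile(ts, m: int) -> np.ndarray:
    """Naive matrix profile: nearest-neighbour distance of every length-m window.

    Assumption: ts is a 1-D numeric series (e.g. hypothetical per-interval
    log-event counts). Windows within an exclusion zone of each other are
    skipped so a window never matches itself or a near-identical shift.
    """
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    excl = max(1, m // 4)  # common heuristic exclusion-zone half-width
    if n <= 2 * excl + 1:
        raise ValueError("series too short for this window length")
    subs = np.array([znorm(ts[i:i + m]) for i in range(n)])
    mp = np.full(n, np.inf)
    for i in range(n):
        # Distance from window i to every other window.
        d = np.sqrt(((subs - subs[i]) ** 2).sum(axis=1))
        lo, hi = max(0, i - excl), min(n, i + excl + 1)
        d[lo:hi] = np.inf  # ignore trivial matches around i
        mp[i] = d.min()
    return mp


def top_discord(ts, m: int):
    """Return (start index, distance) of the most anomalous window."""
    mp = matrix_profile(ts, m)
    idx = int(np.argmax(mp))
    return idx, float(mp[idx])


if __name__ == "__main__":
    # Periodic "normal" behaviour with a short injected burst at 500..509.
    series = np.sin(np.linspace(0, 20 * np.pi, 1000))
    series[500:510] += 2.0
    print(top_discord(series, 50))
```

Windows that overlap the injected burst have no close match elsewhere in the series, so the discord lands there, while purely periodic windows have near-zero profile values. A real-time system would need an incremental update rather than this full recomputation; optimized libraries such as STUMPY (`stumpy.stump`) compute the same profile far faster.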