Virtual Machine Proactive Fault Tolerance Using Log-Based Anomaly Detection
Virtual Machine (VM) fault tolerance ensures high availability in cloud computing environments. Proactive fault tolerance strategies avert service disruptions by detecting potential failures before they occur and migrating the VMs to healthy hosts. In this paper, we propose Virtual Machine Proactive...
| Main Authors: | Pratheek Senevirathne, Samindu Cooray, Jerome Dinal Herath, Dinuni Fernando |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
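| Subjects: | Adaptive learning; anomaly detection; cloud computing; fault tolerance; large language models; log analysis |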
| Online Access: | https://ieeexplore.ieee.org/document/10767421/ |
| _version_ | 1846128518946095104 |
|---|---|
| author | Pratheek Senevirathne; Samindu Cooray; Jerome Dinal Herath; Dinuni Fernando |
| author_sort | Pratheek Senevirathne |
| collection | DOAJ |
| description | Virtual Machine (VM) fault tolerance ensures high availability in cloud computing environments. Proactive fault tolerance strategies avert service disruptions by detecting potential failures before they occur and migrating the VMs to healthy hosts. In this paper, we propose Virtual Machine Proactive Fault Tolerance using Log-based Anomaly Detection (VMFT-LAD), a semi-supervised, real-time log anomaly detection model capable of detecting failures ahead of time to provide effective VM fault tolerance. VMFT-LAD leverages the efficiency of the Matrix Profile for anomaly detection and the log inference capability of Large Language Models (LLMs) to identify potential VM failures early, while minimizing false positives. Our improved Matrix Profile enables VMFT-LAD to continuously learn and identify potential failures, including unforeseen fault types, with minimal human intervention. Additionally, its semi-supervised nature eliminates the need for labeled failure data. Extensive evaluations on several datasets, using two distinct criteria to validate anomaly detection and early failure detection capabilities, demonstrate VMFT-LAD’s outstanding performance. VMFT-LAD achieves a Numenta Anomaly Benchmark (NAB) standard score of 90.74 for predicting failures in advance, with a high early detection rate of 96.28% and a low false positive rate of 0.02%, enabling accurate and timely VM migration before failures occur. |
| format | Article |
| id | doaj-art-21c1554b805849ec9b08b75e9f246ddc |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-21c1554b805849ec9b08b75e9f246ddc; 2024-12-11T00:04:58Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 178951-178970; DOI 10.1109/ACCESS.2024.3506833; IEEE document 10767421; Virtual Machine Proactive Fault Tolerance Using Log-Based Anomaly Detection; Pratheek Senevirathne (https://orcid.org/0009-0004-6380-9277), Samindu Cooray (https://orcid.org/0009-0005-3401-2784), Jerome Dinal Herath, Dinuni Fernando (https://orcid.org/0000-0001-5597-4185); all authors: School of Computing, University of Colombo, Colombo, Sri Lanka; https://ieeexplore.ieee.org/document/10767421/; keywords: Adaptive learning, anomaly detection, cloud computing, fault tolerance, large language models, log analysis |
| title | Virtual Machine Proactive Fault Tolerance Using Log-Based Anomaly Detection |
| topic | Adaptive learning; anomaly detection; cloud computing; fault tolerance; large language models; log analysis |
| url | https://ieeexplore.ieee.org/document/10767421/ |