Data anomaly repair method based on fuzzy voting and multi-segment interpolation

Bibliographic Details
Main Authors: Yanling Lv, Qingdong Han, Shulei Xue
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-05951-9
collection DOAJ
description Wind turbines are often sited in remote areas with harsh environmental conditions, where external noise and electromagnetic interference can corrupt the data and degrade downstream tasks such as predictive alerts and diagnostics. This paper therefore proposes a complete data processing workflow for wind farm data that covers both anomaly detection and data interpolation. First, an outlier detection method based on fuzzy voting theory combines multiple anomaly detectors to detect outliers accurately in large datasets. Second, a multi-segment data interpolation method based on segmented recognition is introduced. It uses statistical features of the dataset to set dynamic thresholds that define the upper length limit of each class of missing segment. Medium-length gaps are interpolated with forward-backward LOESS, while large gaps are filled by hot-deck imputation (rendered as "thermal card filling" in the original record) based on similar-trend recognition. This improves interpolation quality while keeping the training time cost in check. Finally, the method was validated on real-world wind farm data. Compared with LSTM and other interpolation methods, the multi-segment approach reduced MAE, MSRE, and RSE by 24%, 7.1%, and 8.2%, respectively. After the full workflow was applied, the test-set F1 score of the DLinear model increased by 3.8–19.1% and Accuracy improved by 2.3–13.3% relative to the unprocessed data. These results indicate a more precise and stable early warning model that also converges faster.
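The two stages named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not say which detectors, weights, or threshold statistics the paper uses. The z-score and MAD detectors, the equal weights, the 0.5 cutoff, and the median and mean-plus-k-sigma gap thresholds are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from statistics import mean, median, pstdev


def zscore_membership(values, scale=3.0):
    """Soft vote in [0, 1]: distance from the mean in units of scale * sigma."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [min(abs(v - mu) / (scale * sigma), 1.0) for v in values]


def mad_membership(values, scale=3.0):
    """Soft vote in [0, 1] based on the median absolute deviation (MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 if v == med else 1.0 for v in values]
    # 1.4826 rescales the MAD to match sigma for normally distributed data.
    return [min(abs(v - med) / (scale * 1.4826 * mad), 1.0) for v in values]


def fuzzy_vote(memberships, weights=None, cutoff=0.5):
    """Weighted average of detector votes; flag points scoring above cutoff.

    Equal weights and cutoff=0.5 are assumptions, not values from the paper.
    """
    n_det = len(memberships)
    weights = weights or [1.0 / n_det] * n_det
    total = sum(weights)
    scores = [
        sum(w * m[i] for w, m in zip(weights, memberships)) / total
        for i in range(len(memberships[0]))
    ]
    return scores, [s > cutoff for s in scores]


def find_gaps(series):
    """Return (start_index, length) for every run of None in series."""
    gaps, start = [], None
    for i, v in enumerate(series):
        if v is None and start is None:
            start = i
        elif v is not None and start is not None:
            gaps.append((start, i - start))
            start = None
    if start is not None:
        gaps.append((start, len(series) - start))
    return gaps


def classify_gaps(gaps, k=1.0):
    """Label gaps short/medium/large with thresholds derived from the data.

    Assumed rule: short <= median length, medium <= mean + k * std, else large.
    """
    if not gaps:
        return []
    lengths = [n for _, n in gaps]
    short_max = median(lengths)
    medium_max = mean(lengths) + k * pstdev(lengths)
    labelled = []
    for start, n in gaps:
        if n <= short_max:
            label = "short"
        elif n <= medium_max:
            label = "medium"  # candidate for forward-backward LOESS
        else:
            label = "large"   # candidate for hot-deck filling
        labelled.append((start, n, label))
    return labelled
```

In this sketch a reading is flagged only when the combined soft vote is high, so one over-sensitive detector cannot flag a point by itself. The gap labels only route each missing segment to a filling method; the LOESS and hot-deck steps themselves are not shown.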
issn 2045-2322
affiliation Yanling Lv, Qingdong Han, Shulei Xue: School of Electrical and Electronic Engineering, Harbin University of Science and Technology
citation Scientific Reports, vol. 15, no. 1, pp. 1-15, 2025-07-01