Performance and Scalability of Data Cleaning and Preprocessing Tools: A Benchmark on Large Real-World Datasets

Data cleaning remains one of the most time-consuming and critical steps in modern data science, directly influencing the reliability and accuracy of downstream analytics. In this paper, we present a comprehensive evaluation of five widely used data cleaning tools—OpenRefine, Dedupe, Great Expectatio...

Bibliographic Details
Main Authors: Pedro Martins, Filipe Cardoso, Paulo Váz, José Silva, Maryam Abbasi
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Data
Online Access: https://www.mdpi.com/2306-5729/10/5/68