Biplot visualisations of the differences between multiple imputation techniques for simulated categorical data
| Main Author: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-07-01 |
| Series: | Discover Data |
| Subjects: | |
| Online Access: | https://doi.org/10.1007/s44248-025-00063-1 |
| Summary: | Proper handling of missing data is a necessity for all data-driven research. Multiple imputation is considered a superior approach to handling missing data. This manuscript compares four ready-to-use R packages for multiple imputation of missing multivariate categorical data. The selected methods provide a variety of approaches for investigating the possible effect of congenial imputation and analysis models when compared to other imputation methods. The focus is on the evaluation of multivariate visualisations of multiple imputation techniques, specifically by using multiple correspondence analysis biplots. Simulated multivariate categorical data sets are used to compare the visualisations of complete and incomplete biplot representations. An unbiased unified visualisation method, the GPAbin biplot, is used to obtain a combined multivariate visualisation of multiple imputed data sets. This visualisation approach combines configurations by means of generalised orthogonal Procrustes analysis (GPA) and by applying Rubin's rules (-bin) to the aligned configurations. Biplot visualisation enables the investigation of associations between samples and variables by evaluating the discernible patterns that arise from the proximities of the coordinates. Differences between the visualisations of the various multiple imputation strategies can provide guidance on the suitability of the chosen imputation methods. Evaluation measures related to the distances between coordinates in the biplots are used to compare the visualisations and establish the performance of the four imputation methods. This manuscript shows how relevant visualisations can provide insight into, and intuition about, the appropriateness of the applied imputation approach. The findings will guide users in selecting an appropriate multiple imputation strategy based on the underlying data characteristics. |
|---|---|
| ISSN: | 2731-6955 |
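
The GPAbin idea described in the summary (align the configurations from each imputed data set with GPA, then pool them with Rubin's rules) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `orthogonal_procrustes_rotation` and `gpabin_sketch` are hypothetical, the alignment uses centring and rotation/reflection only (no scaling), and only the point estimate of Rubin's rules is computed, which is the mean across imputations.

```python
import numpy as np


def orthogonal_procrustes_rotation(X, target):
    """Return the orthogonal matrix Q that minimises ||X @ Q - target||_F."""
    U, _, Vt = np.linalg.svd(X.T @ target)
    return U @ Vt


def gpabin_sketch(configs, n_iter=50, tol=1e-10):
    """Align m coordinate matrices (n x k each) and pool them.

    Assumption: each matrix holds biplot coordinates obtained from one
    imputed data set, with rows in the same order across matrices.
    """
    # Remove translation by centring every configuration at the origin.
    aligned = [X - X.mean(axis=0) for X in configs]
    # Start from the first configuration as a provisional consensus.
    consensus = aligned[0].copy()
    for _ in range(n_iter):
        # Rotate every configuration towards the current consensus.
        aligned = [
            X @ orthogonal_procrustes_rotation(X, consensus) for X in aligned
        ]
        # Pooling step: the Rubin's rules point estimate is the mean
        # of the aligned coordinates across imputations.
        new_consensus = np.mean(aligned, axis=0)
        converged = np.linalg.norm(new_consensus - consensus) < tol
        consensus = new_consensus
        if converged:
            break
    return consensus, aligned
```

In this sketch the returned `consensus` plays the role of the combined configuration, and the distance of each aligned matrix from it gives a rough indication of how much the imputations disagree.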