The fallacy of single imputation for trait databases: Use multiple imputation instead

Bibliographic Details
Main Authors: Simone P. Blomberg, Orlin S. Todorov
Format: Article
Language: English
Published: Wiley 2025-04-01
Series: Methods in Ecology and Evolution
Online Access: https://doi.org/10.1111/2041-210X.14494
Description
Summary: The past few years have seen the publication of many new trait databases. A common problem with large databases is a lack of completeness, or inversely, the high prevalence of missing values. Biologists have developed several methods to impute (fill in) missing values. This allows ordinary statistical procedures to be used in analyses, and avoids restricting the analysis to complete cases, with its concomitant loss of power and accuracy. Often, biologists use simulation to test new methods by deleting values from a dataset and recording how well the imputed values match the known, removed values. Here we argue that this is a poor measure of the strength of an imputation method. We also describe the importance and logic of the statistical procedure of multiple imputation, which does not require the imputations to be precise or accurate estimates of the missing data.
ISSN: 2041-210X