Predicting Missing Data Using Multiple Imputation by Chained Process in Obesity Dataset
Synopsis
This paper showcases the effectiveness of multiple imputation (MI) using a chained process (MIC) for imputing missing values in an obesity dataset. Although MI has advantages over single imputation methods, it is often passed over because of its computational complexity and researchers' lack of familiarity with it. MIC is implemented, and its performance is compared with that of basic statistical imputation techniques. The results show that MIC yields lower error (MSE and RMSE) on numeric variables and higher accuracy on categorical variables than the statistical methods. MIC handles both numeric and categorical missing data well, provided the variables are correlated with one another. By providing a template for applying MIC, this project aims to encourage the use of MI and to promote awareness of its benefits over single imputation for missing-data problems in medical research.
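The synopsis does not spell out the procedure. The following is therefore only a minimal sketch, under stated assumptions, of what chained multiple imputation does on two correlated numeric columns. It starts by filling gaps with column means. It then repeatedly regresses each incomplete column on the other and redraws the gaps as prediction plus residual noise. This is repeated over m independently seeded runs, and the runs are averaged. The data are synthetic, the height/weight columns and all function names (`make_data`, `chained_impute`, `multiple_impute`) are hypothetical, and categorical variables are omitted. Averaging the m runs is one simple pooling choice, not necessarily the paper's method.

```python
import math
import random
import statistics


def make_data(n, seed=0):
    """Hypothetical synthetic stand-in for two correlated obesity variables."""
    rng = random.Random(seed)
    height = [rng.gauss(170, 10) for _ in range(n)]             # cm
    weight = [0.9 * h - 85 + rng.gauss(0, 5) for h in height]  # kg, depends on height
    return height, weight


def mask(values, frac, rng):
    """Copy `values`, setting a random fraction of entries to None (missing)."""
    idx = set(rng.sample(range(len(values)), int(frac * len(values))))
    return [None if i in idx else v for i, v in enumerate(values)], idx


def fit_line(x, y):
    """Ordinary least squares y ~ a + b*x; returns (b, a, residual std. dev.)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sd = math.sqrt(sum(r * r for r in resid) / (len(resid) - 2))
    return slope, intercept, sd


def chained_impute(cols, n_iter, rng):
    """One chained run for exactly two numeric columns containing None gaps."""
    missing = [[i for i, v in enumerate(c) if v is None] for c in cols]
    # Start: fill each gap with its column's observed mean.
    filled = []
    for c in cols:
        mean = statistics.fmean([v for v in c if v is not None])
        filled.append([mean if v is None else v for v in c])
    # Cycle: regress each column on the other (current fills act as predictors)
    # and redraw its gaps as regression prediction plus residual noise.
    for _ in range(n_iter):
        for j in (0, 1):
            other = filled[1 - j]
            obs = [i for i, v in enumerate(cols[j]) if v is not None]
            slope, intercept, sd = fit_line([other[i] for i in obs],
                                            [cols[j][i] for i in obs])
            for i in missing[j]:
                filled[j][i] = intercept + slope * other[i] + rng.gauss(0, sd)
    return filled


def multiple_impute(cols, m=5, n_iter=10, seed=1):
    """Run m independently seeded chains and average each cell across them."""
    runs = [chained_impute(cols, n_iter, random.Random(seed + k)) for k in range(m)]
    return [[statistics.fmean([run[j][i] for run in runs])
             for i in range(len(cols[j]))] for j in range(len(cols))]


def rmse(true, pred, idx):
    """Root mean squared error over the masked positions only."""
    return math.sqrt(sum((true[i] - pred[i]) ** 2 for i in idx) / len(idx))


rng = random.Random(42)
height, weight = make_data(500)
h_miss, h_idx = mask(height, 0.1, rng)
w_miss, w_idx = mask(weight, 0.1, rng)

# Chained multiple imputation vs. a single mean imputation of weight.
h_mic, w_mic = multiple_impute([h_miss, w_miss])
w_fill = statistics.fmean([v for v in w_miss if v is not None])
w_mean = [w_fill if v is None else v for v in w_miss]

rmse_mic = rmse(weight, w_mic, w_idx)
rmse_mean = rmse(weight, w_mean, w_idx)
print(f"weight RMSE on masked cells: mean={rmse_mean:.2f}  chained MI={rmse_mic:.2f}")
```

Because weight is generated from height here, the chained fill should beat the mean fill on RMSE. If the two columns were independent, the fitted slope would be near zero and most of that advantage would disappear, which mirrors the synopsis's caveat that MIC works well when the variables are correlated.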

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.