Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics.
Webb-Robertson, B.-J. M.; Wiberg, H. K.; Matzke, M. M.; Brown, J. N.; Wang, J.; McDermott, J. E.; Smith, R. D.; Rodland, K. D.; Metz, T. O.; Pounds, J. G.; Waters, K. M.; et al. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics. J. Proteome Res. 2015, 14 (5), 1993−2001.
Evaluation of the performance and caveats of 9 imputation algorithms applied to an LC-MS data set.
Most imputation methods perform well, but no single algorithm or imputation strategy (single-value, local, global) consistently outperforms the others; in some cases, no imputation at all gives better results in the subsequent classification analysis.
Local similarity-based approaches, such as least-squares adaptive (LSA) or regularized expectation maximization (REM), are in general the most accurate and robust methods (Figure 4).
The 'best' imputation method depends strongly on the data and on the goal of the downstream analysis, so generally advantageous methods are hard to define (Figure 3).
With left-censored data, the number of missing values depends strongly on peptide intensity: lower-intensity peptides are missing more often (Figure 1).
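The left-censoring point above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's analysis: the function names, the log2 abundance distribution, and the soft detection limit near 18 are all hypothetical choices made for illustration. It uses only the standard library so it runs without extra dependencies.

```python
import math
import random

def simulate_left_censoring(n_peptides=2000, n_samples=10, seed=0):
    """Return (peptide_means, observed); observed uses None for missing values."""
    rng = random.Random(seed)
    peptide_means, observed = [], []
    for _ in range(n_peptides):
        mean = rng.gauss(20.0, 2.0)  # hypothetical log2 abundance of this peptide
        row = []
        for _ in range(n_samples):
            value = mean + rng.gauss(0.0, 0.5)
            # Assumed soft detection limit near log2 intensity 18: the chance of
            # a value going missing rises as the intensity falls.
            p_missing = 1.0 / (1.0 + math.exp(2.0 * (value - 18.0)))
            row.append(None if rng.random() < p_missing else value)
        peptide_means.append(mean)
        observed.append(row)
    return peptide_means, observed

def missing_fraction_by_quartile(peptide_means, observed):
    """Fraction of missing values in each abundance quartile, lowest first."""
    order = sorted(range(len(peptide_means)), key=lambda i: peptide_means[i])
    size = len(order) // 4
    fractions = []
    for q in range(4):
        rows = [observed[i] for i in order[q * size:(q + 1) * size]]
        n_missing = sum(v is None for row in rows for v in row)
        fractions.append(n_missing / (len(rows) * len(rows[0])))
    return fractions

means, obs = simulate_left_censoring()
print(missing_fraction_by_quartile(means, obs))
```

Under these assumptions the lowest-abundance quartile should show the largest share of missing values, which is the qualitative pattern the notes attribute to Figure 1.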
Study design and evidence level
3 single-value approaches (LOD1, LOD2, RTI), 5 local-similarity approaches (KNN, LLS, LSA, REM, MBI), and 2 global-structure approaches (PPCA, BPCA) were evaluated, which allows the different imputation strategies to be compared and discussed. They were applied to 3 real datasets of different types and species, which represent a broad range of biological applications.
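To make the difference between a single-value and a local-similarity strategy concrete, here is a rough sketch. It is not the implementation evaluated in the paper. `half_min_impute` assumes a simple "half of the global minimum" rule as a stand-in for LOD-type methods, and `knn_impute` is a simplified nearest-neighbour scheme. Both function names and the toy matrix are hypothetical.

```python
import math

def half_min_impute(data):
    """Single-value strategy (assumed rule): replace each missing value (None)
    with half of the smallest observed value in the whole matrix."""
    observed = [v for row in data for v in row if v is not None]
    fill = min(observed) / 2.0
    return [[fill if v is None else v for v in row] for row in data]

def _distance(a, b):
    """Root-mean-square difference over the columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def knn_impute(data, k=3):
    """Local-similarity strategy: fill each missing value with the mean of that
    column across the k most similar rows that have the column observed."""
    result = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(data)
                if m != i and other[j] is not None
            )
            neighbours = [v for d, v in candidates[:k] if d < math.inf]
            if neighbours:  # leave the gap if no comparable row exists
                result[i][j] = sum(neighbours) / len(neighbours)
    return result

# Toy intensity matrix (rows = peptides, columns = samples); None = missing.
toy = [
    [100.0, 110.0, 105.0, None],
    [98.0, 112.0, 103.0, 120.0],
    [102.0, 108.0, 107.0, 118.0],
    [10.0, 12.0, None, 11.0],
    [11.0, 13.0, 12.0, 10.0],
    [9.0, 11.0, 10.0, 12.0],
]
print(half_min_impute(toy))
print(knn_impute(toy, k=2))
```

In this toy case the single-value rule fills the gap in the high-abundance first row with a very low number, while the neighbour-based fill stays close to the values of similar peptides. This shows how the strategies can diverge when a value is missing for reasons other than low abundance. It is an illustration only and says nothing about which method is preferable on real data.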