The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments
O'Brien JJ, Gunawardena HP, Paulo JA, Chen X, Ibrahim JG, Gygi SP, Qaqish BF (2018). The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments. Ann Appl Stat. 12(4):2075-95. doi: 10.1214/18-AOAS1144
This paper analyzes how missing data bias parameter contrasts and introduces a Bayesian selection model to mitigate this bias and recover interblock information. The proposed model is compared with several imputation strategies as well as complete-case analyses.
The proposed selection model for proteomics (SMP) aims to capture the missing-data mechanism of the specific dataset.
The SMP model improves accuracy, depth of discovery and interval coverage (Figures 1, 2, 3).
The mixed model and two-way ANOVA, which rely on intrablock estimation, outperform the one-way ANOVA and the other imputation methods (Min, Mean, SVD, KNN), which rely on interblock information, on all datasets (Figures 1, 2, 3).
Missing data lead to biased contrast estimates between conditions.
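To illustrate the last point, the following is a minimal simulation sketch, not the paper's simulation or the SMP model. It assumes a hypothetical logistic missingness mechanism in which low log-intensities are more likely to be missing; the function name, the intensity scale and all parameter values are made up for illustration. It compares the contrast between two conditions computed on all values with the complete-case contrast computed on observed values only.

```python
import numpy as np


def simulate_contrast_bias(n=20000, true_contrast=1.0, seed=0):
    """Return (full-data contrast, complete-case contrast) for two conditions.

    Hypothetical setup: log-intensities are normal, and the probability of
    observing a value increases with the value itself (nonignorable missingness).
    """
    rng = np.random.default_rng(seed)

    # Log-intensities: condition B is shifted upwards by the true contrast.
    a = rng.normal(20.0, 1.0, n)
    b = rng.normal(20.0 + true_contrast, 1.0, n)

    def observed(y):
        # Assumed logistic mechanism: lower intensity -> more likely missing.
        p_obs = 1.0 / (1.0 + np.exp(-2.0 * (y - 20.0)))
        return rng.random(y.size) < p_obs

    obs_a, obs_b = observed(a), observed(b)

    full = b.mean() - a.mean()
    complete_case = b[obs_b].mean() - a[obs_a].mean()
    return full, complete_case


full, complete_case = simulate_contrast_bias()
print(f"full-data contrast:     {full:.3f}")
print(f"complete-case contrast: {complete_case:.3f}")
```

Under these assumed settings the complete-case contrast is attenuated, because the lower-abundance condition loses more of its low values than the higher-abundance condition does.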
Study design and evidence level
Imputation performance is analyzed separately for estimable and inestimable protein contrasts.
Nine methods are compared: SMP, one-way and two-way ANOVA, mean, column minimum, peptide minimum, SVD, KNN and a mixed model. Most of the comparators are quite simple models.
Accuracy as well as interval coverage are assessed.
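As a rough sketch of what the simpler baselines and the two evaluation criteria involve, the code below implements two single-imputation rules and computes accuracy (as RMSE) and interval coverage. The definitions are assumptions made for illustration and may differ from those used in the paper: the matrix is taken to have peptides in rows and samples in columns, "column minimum" is read as the per-sample minimum, and "mean" as the per-peptide mean of observed values. All function names are hypothetical.

```python
import numpy as np


def impute_column_min(x):
    """Replace NaNs with the minimum observed value of the same column (sample)."""
    x = np.asarray(x, dtype=float)
    col_min = np.nanmin(x, axis=0)           # one value per column
    return np.where(np.isnan(x), col_min, x)


def impute_row_mean(x):
    """Replace NaNs with the mean of observed values in the same row (peptide)."""
    x = np.asarray(x, dtype=float)
    row_mean = np.nanmean(x, axis=1)          # one value per row
    return np.where(np.isnan(x), row_mean[:, None], x)


def rmse_and_coverage(estimates, lower, upper, truth):
    """Accuracy as RMSE, and the fraction of intervals that contain the truth."""
    estimates, lower, upper, truth = (
        np.asarray(v, dtype=float) for v in (estimates, lower, upper, truth)
    )
    rmse = float(np.sqrt(np.mean((estimates - truth) ** 2)))
    coverage = float(np.mean((lower <= truth) & (truth <= upper)))
    return rmse, coverage


# Toy example with made-up numbers.
x = np.array([[1.0, np.nan],
              [3.0, 4.0]])
print(impute_column_min(x))
print(impute_row_mean(x))

est = np.array([1.0, 2.0, 3.0])
truth = np.array([1.0, 2.5, 2.0])
print(rmse_and_coverage(est, est - 0.6, est + 0.6, truth))
```

Single imputation of this kind treats the filled-in values as if they had been observed, so the resulting intervals do not reflect imputation uncertainty. This is one reason to assess interval coverage alongside accuracy.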
Further comments and aspects
The data simulation favors the SMP model.
Model similar to:
Luo R, Colangelo CM, Sessa WC, Zhao H (2009). Bayesian Analysis of iTRAQ Data with Nonrandom Missingness: Identification of Differentially Expressed Proteins. Stat Biosci. 1(2):228-45.