The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments


O'Brien JJ, Gunawardena HP, Paulo JA, Chen X, Ibrahim JG, Gygi SP, Qaqish BF (2018). The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments. Ann Appl Stat. 12(4):2075-95. doi: 10.1214/18-AOAS1144


This paper analyzes the bias that missing data introduce into parameter contrasts and introduces a Bayesian selection model to reduce this bias and recover interblock information. The proposed model is compared with other imputation strategies as well as complete-case analyses.

Study outcomes

The introduced selection model for proteomics (SMP) tries to capture the missing data mechanisms of the specific dataset.
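For orientation, the generic form of a selection model in the missing-data literature can be written as follows. The notation is an assumption for illustration and is not taken from the paper: y denotes the full set of (log) intensities, split into observed and missing parts, r the missingness indicators, and θ and ψ the parameters of the data model and of the missingness model.

```latex
f(y, r \mid \theta, \psi) = f(y \mid \theta)\, f(r \mid y, \psi),
\qquad y = (y_{\mathrm{obs}}, y_{\mathrm{mis}})
```

The missingness is nonignorable when f(r | y, ψ) depends on the unobserved values y_mis, for example when low intensities are more likely to go undetected. In that case the missingness model cannot be dropped from the likelihood without risking biased inference about θ.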

Outcome O1

The SMP model improves accuracy, depth of discovery and interval coverage (Figures 1, 2, 3).

Outcome O2

The mixed model and two-way ANOVA, which rely on intrablock estimation, outperform the one-way ANOVA and other imputation methods (Min, Mean, SVD, KNN), which rely on interblock information, on all datasets (Figures 1, 2, 3).

Further outcomes

Missing data lead to biased estimates of the contrasts between conditions.
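A minimal simulation sketch of how intensity-dependent missingness can bias a contrast in a complete-case analysis. The logistic missingness mechanism, the parameter values and the function name `complete_case_contrast` are illustrative assumptions and are not taken from the paper.

```python
import math
import random
import statistics


def complete_case_contrast(true_diff=1.0, n=20000, slope=2.0, seed=0):
    """Estimate the B-minus-A contrast from the observed values only."""
    rng = random.Random(seed)

    def observed_values(mean):
        kept = []
        for _ in range(n):
            y = rng.gauss(mean, 1.0)  # simulated log-intensity
            # Assumed mechanism: low intensities are more likely to be missing.
            p_observed = 1.0 / (1.0 + math.exp(-slope * y))
            if rng.random() < p_observed:
                kept.append(y)
        return kept

    obs_a = observed_values(0.0)        # condition A, true mean 0
    obs_b = observed_values(true_diff)  # condition B, true mean true_diff
    return statistics.fmean(obs_b) - statistics.fmean(obs_a)


estimate = complete_case_contrast()
print(f"true contrast: 1.00, complete-case estimate: {estimate:.2f}")
```

Under these assumptions the complete-case estimate falls below the true contrast, because the lower-abundance condition loses more of its low values and its observed mean is shifted upward more strongly.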

Study design and evidence level

General aspects

Performance is analyzed separately for proteins whose contrasts are estimable and for proteins whose contrasts are inestimable.

Nine methods are compared: SMP, one-way and two-way ANOVA, mean, column minimum, peptide minimum, SVD, KNN and the mixed model, although most of them are quite simple models.
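The simplest of these single-imputation strategies can be sketched as follows. The matrix layout (rows are peptides, columns are runs), the strategy names and the function `impute` are assumptions for illustration; the exact definitions used in the paper may differ.

```python
def impute(matrix, strategy):
    """Fill None entries in a peptide-by-run matrix of log-intensities."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    filled = [row[:] for row in matrix]
    for i in range(n_rows):
        for j in range(n_cols):
            if matrix[i][j] is not None:
                continue
            row_obs = [v for v in matrix[i] if v is not None]
            col_obs = [matrix[k][j] for k in range(n_rows)
                       if matrix[k][j] is not None]
            if strategy == "column_min":      # smallest observed value in the run
                filled[i][j] = min(col_obs)
            elif strategy == "peptide_min":   # smallest observed value of the peptide
                filled[i][j] = min(row_obs)
            elif strategy == "peptide_mean":  # mean of the peptide's observed values
                filled[i][j] = sum(row_obs) / len(row_obs)
            else:
                raise ValueError(f"unknown strategy: {strategy}")
    return filled


log_intensity = [
    [20.0, 21.0, None],
    [18.0, None, 19.0],
    [None, 22.0, 23.0],
]
for name in ("column_min", "peptide_min", "peptide_mean"):
    print(name, impute(log_intensity, name))
```

Each strategy replaces a missing value with a single number and then treats it as observed, so the uncertainty about the imputed value is not carried into later estimates. The sketch fails with an error if a row or column needed for the chosen strategy has no observed values.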

Accuracy as well as interval coverage are assessed.
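When the true contrasts are known, as in simulated data, both criteria can be computed directly. The sketch below uses root mean squared error as the accuracy measure, which is an assumption and not necessarily the measure used in the paper; the function names and the toy numbers are hypothetical.

```python
import math


def rmse(estimates, truths):
    """Root mean squared error of the point estimates."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))


def interval_coverage(intervals, truths):
    """Fraction of (lower, upper) intervals that contain the true value."""
    hits = [lo <= t <= hi for (lo, hi), t in zip(intervals, truths)]
    return sum(hits) / len(hits)


# Toy values for four protein contrasts (made up for illustration).
truths = [1.0, 1.0, 0.0, -1.0]
estimates = [0.9, 1.3, 0.1, -0.6]
intervals = [(0.7, 1.1), (1.05, 1.55), (-0.2, 0.4), (-1.1, -0.1)]

print(f"RMSE: {rmse(estimates, truths):.3f}")
print(f"coverage: {interval_coverage(intervals, truths):.2f}")
```

A method with nominal 95% intervals should reach a coverage close to 0.95 over many proteins; clearly lower coverage indicates intervals that are too narrow or centered on biased estimates.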

Further comments and aspects

The data simulation favors the SMP model.


Model similar to:

Luo R, Colangelo CM, Sessa WC, Zhao H (2009). Bayesian Analysis of iTRAQ Data with Nonrandom Missingness: Identification of Differentially Expressed Proteins. Stat Biosci. 1(2):228-45.