Gene set analysis methods: a systematic comparison

Revision as of 10:44, 25 February 2020 by Ckreutz

1 Gene set analysis methods: a systematic comparison

Mathur, R., Rotroff, D., Ma, J., Shojaie, A., & Motsinger-Reif, A. (2018). Gene set analysis methods: a systematic comparison. BioData Mining, 11(1), 8.

Permanent link to the paper


1.1 Summary

Approaches for gene set analysis were assessed using simulated data generated from real experimental data sets.

1.2 Study outcomes

1.2.1 Outcome O1: False positives under null distribution

The frequency of false positives was assessed at a significance level of alpha=0.05. Accordingly, all approaches except FET-1k showed a false-positive rate of around 5% or less. FET-1k ("FET global statistic in SAFE") had around 20%.

Outcome O1 is presented as Figure 2 in the original publication for the prostate data template and in the "Additional File 1" for the other templates.

The bottom line of this outcome is that all approaches except FET-1k perform similarly well in terms of false positives.
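
The false-positive frequency can be read as the fraction of null simulations in which a method reports a p-value at or below alpha. The following is a minimal Python sketch of that calculation, assuming the p-values from the null runs are already available. The function name `false_positive_rate` and the uniformly drawn p-values are illustrative assumptions, not part of the publication or of the FANGS package.

```python
import random


def false_positive_rate(p_values, alpha=0.05):
    """Fraction of null p-values that fall at or below alpha."""
    if not p_values:
        raise ValueError("p_values must not be empty")
    return sum(p <= alpha for p in p_values) / len(p_values)


# Hypothetical null p-values: for a well-calibrated test they are
# uniformly distributed on [0, 1], so the rate should be close to alpha.
rng = random.Random(0)
null_p = [rng.random() for _ in range(10_000)]
rate = false_positive_rate(null_p)
print(f"empirical false-positive rate: {rate:.3f}")
```

A rate near 0.05 corresponds to the behaviour reported for most approaches, whereas a value near 0.20 would correspond to what is reported for FET-1k.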

1.2.2 Outcome O2

  • sigPathway showed superior performance
  • SAFE-Wilcoxon could NOT detect the differentially regulated pathway(s).
  • In general, the performance increases with increasing fraction of regulated genes (parameter pi in the paper), except for "Comp GSEA Q" that shows counterintuitive performance.

Outcome O2 is presented as Figure 3 in the original publication; the numbers are provided in the supplement.

1.2.3 Outcome O3

  • SAFE again performs poorly for most configurations
  • Only "aveDiff-boot" seems to have good power that improves with increasing magnitude tau of regulation
  • FET-1k and FET-10k could identify the regulated pathway but show counterintuitive performance (i.e. decreasing performance with increasing magnitude of regulation)

Outcome O3 is presented as Figure 4 in the original publication.

1.2.4 Outcome O4

  • COMP-GSEA-FDR and Self-GSEA-FDR showed superior performance
  • Comp-GSEA-Q and SELF-GSEA-Q showed counterintuitive performance, i.e. the performance decreases with increasing effect size tau
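
"Counterintuitive" here means that a performance measure fails to grow with the effect size tau. A small Python sketch of how such methods could be flagged automatically is given below. The function names, the tolerance parameter and the toy numbers are assumptions made for illustration; the numbers are not results from the paper.

```python
def is_non_decreasing(values, tol=0.0):
    """True if each value is at least the previous one, allowing drops up to tol."""
    return all(later >= earlier - tol
               for earlier, later in zip(values, values[1:]))


def flag_counterintuitive(performance_by_tau, tol=0.0):
    """Return the names of methods whose performance drops as tau grows.

    performance_by_tau maps a method name to performance values that are
    ordered by increasing tau.
    """
    return sorted(name for name, values in performance_by_tau.items()
                  if not is_non_decreasing(values, tol))


# Made-up numbers for illustration only; they are NOT taken from the paper.
toy = {
    "method_A": [0.20, 0.55, 0.90],  # improves with tau, as expected
    "method_B": [0.80, 0.60, 0.30],  # gets worse with tau
}
flagged = flag_counterintuitive(toy)
print(flagged)  # prints ['method_B']
```

The tolerance `tol` can absorb small fluctuations caused by a limited number of simulation runs.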


1.3 Study design and evidence level

1.3.1 General aspects

  • The authors compared four different methods:
    • Gene Set Enrichment Analysis (GSEA)
    • Significance Analysis of Function and Expression (SAFE)
    • sigPathway, and
    • Correlation Adjusted Mean RAnk (CAMERA).
  • The authors consider different sizes of the gene sets
  • The authors consider different proportions of regulated genes in the gene sets
  • The authors consider different magnitudes of the underlying effect size (i.e. log-fold-changes)
  • The authors consider three null simulations (without regulation) as reference for outcome O1
  • In this publication, the authors introduced a novel simulation approach termed FANGS
  • The simulation approach is available as an R package (FANGS), which makes it possible to reproduce the simulations and repeat the analysis for other gene set methods.
  • The authors provide a comprehensive list of the used configuration parameters
  • The authors evaluated the following alternative configurations
    • For GSEA one alternative
    • For SAFE five alternative setups
    • For sigPathway and CAMERA no other configurations were considered
  • Three experimental data sets were used as foundations for simulating data
    • prostate cancer (264 cases, 160 controls)
    • ischemic stroke (20 cases, 20 controls)
    • normal brain tissue (21 cases, 20 controls)

1.3.2 Design for Outcome O1

  • The authors consider three null simulations (without regulation) as reference:
    • permutation of class labels
    • independently sampled expression of all features (=genes)
    • centering the simulated data, i.e. set effect size to zero
  • Default configuration parameters and the alternative parameters described above were evaluated
  • Only the prostate cancer data set was considered as template for simulations
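
The three null simulations listed above can be sketched in Python as follows. This is one plausible reading of the descriptions, not the FANGS implementation: "independently sampled" is interpreted here as shuffling each gene separately across samples, and "centering" as subtracting the class-specific mean of each gene. All function names are hypothetical, and the expression matrix is assumed to be a list of samples, each a list of gene values.

```python
import random
from statistics import mean


def permute_labels(labels, rng):
    """Null 1: shuffle the class labels, leaving the expression data untouched."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def shuffle_genes_independently(expr, rng):
    """Null 2 (assumed reading): shuffle every gene separately across samples,
    which removes both the class effect and gene-gene correlation."""
    n_samples, n_genes = len(expr), len(expr[0])
    out = [row[:] for row in expr]
    for g in range(n_genes):
        column = [expr[s][g] for s in range(n_samples)]
        rng.shuffle(column)
        for s in range(n_samples):
            out[s][g] = column[s]
    return out


def center_within_class(expr, labels):
    """Null 3 (assumed reading): subtract the class mean of each gene so that
    the mean difference between classes is exactly zero."""
    n_genes = len(expr[0])
    out = [row[:] for row in expr]
    for cls in set(labels):
        idx = [s for s, lab in enumerate(labels) if lab == cls]
        for g in range(n_genes):
            class_mean = mean(expr[s][g] for s in idx)
            for s in idx:
                out[s][g] = expr[s][g] - class_mean
    return out


# Toy data: 4 samples (2 cases, 2 controls) x 2 genes.
rng = random.Random(42)
expr = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
labels = [1, 1, 0, 0]
print(permute_labels(labels, rng))
print(shuffle_genes_independently(expr, rng))
print(center_within_class(expr, labels))
```

Label permutation keeps the correlation structure between genes, whereas shuffling genes independently destroys it, so the two nulls test different assumptions of a gene set method.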

1.3.3 Design for Outcome O2

  • The outcome was generated by simulating differential expression of one pathway
  • The analysis was repeated for all three data sets as template
  • For each of the three data sets the analysis was repeated by selecting two different pathways as differentially regulated.
  • In total, six analyses were performed (3 data sets x 2 regulated pathways)
  • Default configuration parameters were chosen
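
To illustrate what "simulating differential expression of one pathway" can look like, the sketch below shifts a fraction pi of the genes in one gene set by tau in the case samples. It is a simplified stand-in written for this page and is not the FANGS algorithm; the function name `spike_in_pathway`, the additive shift, and the 0/1 label coding (1 = case) are assumptions.

```python
import random


def spike_in_pathway(expr, labels, gene_set, pi, tau, rng):
    """Add tau to a fraction pi of the genes in gene_set, in case samples only.

    expr is a list of samples, each a list of gene values; gene_set holds
    column indices. Returns the modified copy and the regulated gene indices.
    """
    n_regulated = round(pi * len(gene_set))
    regulated = sorted(rng.sample(sorted(gene_set), n_regulated))
    out = [row[:] for row in expr]
    for s, lab in enumerate(labels):
        if lab == 1:  # cases only; controls stay unchanged
            for g in regulated:
                out[s][g] += tau
    return out, regulated


# Toy data: 4 samples x 6 genes, all zero before the spike-in.
rng = random.Random(7)
expr = [[0.0] * 6 for _ in range(4)]
labels = [1, 1, 0, 0]
pathway = [0, 1, 2, 3]  # hypothetical gene set
simulated, regulated = spike_in_pathway(expr, labels, pathway, pi=0.5, tau=1.0, rng=rng)
print("regulated genes:", regulated)
```

Varying `pi` and `tau` over a grid and repeating the analysis for each combination corresponds to the kind of design described for outcomes O2 to O4.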

1.3.4 Design for Outcome O3

  • The weak performance of SAFE with the default configuration in O2 seems to have motivated the investigation of other SAFE configurations
  • Outcome O3 was generated for only one data set (prostate cancer) and two regulated pathways

1.4 Further comments and aspects

1.5 References

The list of cited or related literature is placed here.