Gene set analysis methods: a systematic comparison

Revision as of 13:08, 25 February 2020 by Ckreutz

1 Citation

Mathur, R., Rotroff, D., Ma, J., Shojaie, A., & Motsinger-Reif, A., Gene set analysis methods: a systematic comparison, 2018, BioData Mining, 11(1), 8.

Permanent link to the paper

2 Summary

Approaches for gene set analysis were assessed using simulated data generated from real experimental data sets.

  • The authors compared four different methods:
    • Gene Set Enrichment Analysis (GSEA)
    • Significance Analysis of Function and Expression (SAFE)
    • sigPathway
    • Correlation Adjusted Mean Rank (CAMERA)

3 Study outcomes

3.1 Outcome O1: False positives under null distribution

The frequency of false positives was assessed at a significance level of alpha = 0.05. As expected at this level, all approaches except FET-1k showed a false-positive rate of around 5% or less. FET-1k ("FET global statistic in SAFE") had around 20%.

Outcome O1 is presented as Figure 2 in the original publication for the prostate data template and in the "Additional File 1" for the other templates.

The bottom line of this outcome is that all approaches except FET-1k perform similarly well in terms of false positives.
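To make the meaning of the false-positive rate in O1 concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes a toy setting in which every sample carries a single gene set score, cases and controls are drawn from the same distribution (so every rejection is a false positive), and significance is assessed by a simple label-permutation test. All function names and parameter values are hypothetical.

```python
import random
import statistics


def mean_diff(values, labels):
    """Difference of group means (cases, label 1, minus controls, label 0)."""
    cases = [v for v, l in zip(values, labels) if l == 1]
    controls = [v for v, l in zip(values, labels) if l == 0]
    return statistics.fmean(cases) - statistics.fmean(controls)


def permutation_p_value(values, labels, n_perm, rng):
    """Two-sided permutation p-value for the difference of group means."""
    observed = abs(mean_diff(values, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_diff(values, shuffled)) >= observed:
            hits += 1
    # Add-one correction so that the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def false_positive_rate(n_sim=200, n_per_group=10, n_perm=199,
                        alpha=0.05, seed=1):
    """Fraction of null simulations that are declared significant."""
    rng = random.Random(seed)
    labels = [1] * n_per_group + [0] * n_per_group
    rejections = 0
    for _ in range(n_sim):
        # Null data: both groups come from the same distribution.
        values = [rng.gauss(0.0, 1.0) for _ in labels]
        if permutation_p_value(values, labels, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    print(false_positive_rate())
```

A well-calibrated test should yield a value close to alpha in such a simulation. A value far above alpha, as reported for FET-1k, indicates an anti-conservative test.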

3.2 Outcome O2

  • sigPathway showed superior performance
  • SAFE-Wilcoxon could NOT detect the differentially regulated pathway(s).
  • In general, performance increases with the fraction of regulated genes (parameter pi in the paper), except for "Comp GSEA Q", which shows counterintuitive performance.

Outcome O2 is presented as Figure 3 in the original publication; the numbers are provided in the supplement.
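The expected relationship between power and the fraction pi of regulated genes can be illustrated with a small simulation. This is only a sketch under strong assumptions: genes are independent with unit variance, a fraction pi of the genes in the set is shifted by tau in the cases, and the gene set is tested with a simple z-test on the average per-gene mean difference. This toy test is none of the four methods compared in the paper, and all names and default values are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def gene_set_p_value(cases, controls):
    """Two-sided z-test on the average per-gene mean difference.

    cases and controls are lists with one list of sample values per gene.
    Assumes independent genes with known unit variance.
    """
    diffs = [fmean(c) - fmean(k) for c, k in zip(cases, controls)]
    n_cases, n_controls = len(cases[0]), len(controls[0])
    se = math.sqrt((1 / n_cases + 1 / n_controls) / len(diffs))
    z = fmean(diffs) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def estimate_power(pi, tau=1.0, set_size=20, n_per_group=10,
                   n_sim=100, alpha=0.05, seed=0):
    """Fraction of simulations in which the gene set is detected."""
    rng = random.Random(seed)
    n_regulated = round(pi * set_size)
    detected = 0
    for _ in range(n_sim):
        # The first n_regulated genes are shifted by tau in the cases.
        cases = [[rng.gauss(tau if g < n_regulated else 0.0, 1.0)
                  for _ in range(n_per_group)]
                 for g in range(set_size)]
        controls = [[rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
                    for _ in range(set_size)]
        if gene_set_p_value(cases, controls) < alpha:
            detected += 1
    return detected / n_sim


if __name__ == "__main__":
    for pi in (0.0, 0.1, 0.25, 0.5):
        print(pi, estimate_power(pi))
```

For a well-behaved method, the estimated power should not decrease as pi grows. A method whose power drops with increasing pi shows the counterintuitive behaviour described above.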

3.3 Outcome O3

  • SAFE again performs weakly for most configurations
  • Only "aveDiff-boot" seems to have good power, which improves with increasing magnitude tau of regulation
  • FET-1k and FET-10k could identify the regulated pathway but show counterintuitive performance (i.e. decreasing performance with increasing magnitude of regulation)

Outcome O3 is presented as Figure 4 in the original publication.

3.4 Outcome O4

  • Comp-GSEA-FDR and Self-GSEA-FDR showed superior performance
  • Comp-GSEA-Q and Self-GSEA-Q showed counterintuitive performance, i.e. the performance decreases with increasing effect size tau

4 Study design and evidence level

4.1 General aspects

  • The authors consider different sizes of the gene sets
  • The authors consider different proportions of regulated genes in the gene sets
  • The authors consider different magnitudes of the underlying effect size (i.e. log-fold-changes)
  • The authors consider three null simulations (without regulation) as reference for outcome O1
  • In this publication, the authors introduced a novel simulation approach termed FANGS
  • The simulation approach is available as an R package (FANGS), which offers the opportunity to reproduce the simulations and to repeat the analysis for other gene set methods.
  • The authors provide a comprehensive list of the configuration parameters used
  • The authors evaluated the following alternative configurations:
    • For GSEA one alternative
    • For SAFE five alternative setups
    • For sigPathway and CAMERA no other configurations were considered
  • Three experimental data sets were used as foundations for simulating data
    • prostate cancer (264 cases, 160 controls)
    • ischemic stroke (20 cases, 20 controls)
    • normal brain tissue (21 cases, 20 controls)

4.2 Design for Outcome O1

  • The authors consider three null simulations (without regulation) as reference:
    • permutation of class labels
    • independently sampled expression of all features (=genes)
    • centering the simulated data, i.e. set effect size to zero
  • Default configuration parameters and the alternative parameters described above were evaluated
  • Only the prostate cancer data set was considered as template for simulations
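Two of the null constructions listed above can be sketched in a few lines. This is an illustration only and not the FANGS implementation. In particular, reading "centering" as subtracting the group mean within each class is an assumption, and the function names are hypothetical.

```python
import random
import statistics


def permute_labels(labels, seed=0):
    """Null by permutation: shuffle class labels, leave the data untouched."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def center_per_group(values, labels):
    """Null by centering (assumed reading): subtract each group's mean.

    Afterwards both groups have mean zero, so the effect size is zero.
    """
    means = {
        group: statistics.fmean(
            [v for v, l in zip(values, labels) if l == group]
        )
        for group in set(labels)
    }
    return [v - means[l] for v, l in zip(values, labels)]


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    values = [3.0, 5.0, 1.0, 2.0]  # expression of one gene in four samples
    print(permute_labels(labels))
    print(center_per_group(values, labels))
```

Permutation preserves the correlation between genes because whole samples are relabelled, whereas centering removes the mean difference gene by gene.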

4.3 Design for Outcome O2

  • The outcome was generated by simulating differential expression of one pathway
  • The analysis was repeated for all three data sets as template
  • For each of the three data sets the analysis was repeated by selecting two different pathways as differentially regulated.
  • In total, six analyses were performed (3 data sets x 2 regulated pathways)
  • Default configuration parameters were chosen

4.4 Design for Outcome O3

  • The weak performance of SAFE with the default configuration in O2 seems to have motivated the investigation of other configurations for SAFE
  • The outcome O3 was only generated for one data set (prostate cancer) and two regulated pathways

5 Further comments and aspects