Toward a gold standard for benchmarking gene set enrichment analysis


Citation

Geistlinger, L., Csaba, G., Santarelli, M., Ramos, M., Schiffer, L., Law, C., ... & Zimmer, R., Toward a gold standard for benchmarking gene set enrichment analysis, 2020, Bioinformatics, 0, 1-12

Permanent link to the paper

Summary

Gene set analyses are a combination of several analysis modules. This paper investigates the performance of ten prominent enrichment methods. Biological plausibility, based on disease relevance scores derived from experimental evidence and literature co-citation, is used for assessment.

Study outcomes

Outcome O1

The performance of ...

Outcome O1 is presented as Figure X in the original publication.

Outcome O2

...

Outcome O2 is presented as Figure X in the original publication.

Outcome On

...

Outcome On is presented as Figure X in the original publication.

Further outcomes

Runtimes are as follows:


Study design and evidence level

General aspects

  • "75 expression datasets investigating 42 human diseases"
  • microarray and RNAseq data
  • pre-existing benchmark data sets
  • 10 methods:
    • ORA
    • GLOBALTEST
    • GSEA
    • SAFE
    • GSA
    • SAMGS
    • ROAST
    • CAMERA
    • PADOG
    • GSVA
  • "Gene set relevance rankings for each disease were constructed by querying the MalaCards database. MalaCards scores genes for disease relevance based on experimental evidence and co-citation in the literature."
  • "A nominal significance level of 0.05" is used, without correction for multiple testing. Uncorrected thresholds were also common in other benchmark studies.
  • The "type I error rate was evaluated by randomization of the sample labels" of the microarray data set.
  • "Random gene sets of increasing set size were analyzed to assess whether enrichment methods are affected by geneset size." For this purpose, 100 "random gene sets of defined sizes {5,10,25,50,100,250,500}" were sampled.
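The two resampling steps described in the last two items (randomizing sample labels and drawing random gene sets of fixed sizes) can be sketched as follows. This is a minimal illustration and not the code used in the paper. The function names, the seeds, and the reading that 100 sets are drawn per size are assumptions; only the set sizes and the 0.05 level come from the text above.

```python
import random

# Set sizes quoted from the paper; 100 sets *per size* is an assumption.
SET_SIZES = [5, 10, 25, 50, 100, 250, 500]
SETS_PER_SIZE = 100


def sample_random_gene_sets(gene_universe, sizes=SET_SIZES,
                            n_per_size=SETS_PER_SIZE, seed=0):
    """Draw random gene sets (without replacement) for each set size.

    gene_universe must be a list with at least max(sizes) genes.
    Returns a dict mapping set size -> list of gene sets.
    """
    rng = random.Random(seed)
    return {
        size: [rng.sample(gene_universe, size) for _ in range(n_per_size)]
        for size in sizes
    }


def permute_labels(labels, seed=0):
    """Return a shuffled copy of the sample labels (group sizes preserved)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def empirical_type_i_error(p_values, alpha=0.05):
    """Fraction of p-values below alpha, e.g. from label-permuted data."""
    return sum(p < alpha for p in p_values) / len(p_values)
```

In such a setup, each enrichment method would be run on the label-permuted data, and the resulting gene set p-values passed to `empirical_type_i_error`. A well-calibrated method should yield a fraction close to the nominal level.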

Design for Outcome O1

  • The outcome was generated for ...
  • Configuration parameters were chosen ...
  • ...

Design for Outcome O2

  • The outcome was generated for ...
  • Configuration parameters were chosen ...
  • ...

...

Design for Outcome On

  • The outcome was generated for ...
  • Configuration parameters were chosen ...
  • ...

Further comments and aspects

An R package (GSEABenchmarkeR) is available that seems to enable similar analyses.

References

The list of cited or related literature is placed here.