Difference between revisions of "Gene set analysis methods: a systematic comparison"

Revision as of 13:20, 25 February 2020

The authors compared four different methods:
- Gene Set Enrichment Analysis (GSEA-SELF and GSEA-COMP)
- Significance Analysis of Function and Expression (SAFE), which uses the t-test as gene-wise test and offers the Wilcoxon rank sum, Fisher’s Exact Test, a Pearson’s Chi-squared type statistic and a t-statistic as global (gene-set-wide) tests
- sigPathway
- Correlation Adjusted Mean Rank (CAMERA)
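
The two-stage structure described for SAFE (a gene-wise statistic followed by a gene-set-wide statistic) can be illustrated with a minimal sketch. This is not the SAFE package's implementation: the data are simulated, the function names `welch_t` and `rank_sum` are hypothetical, and only the Wilcoxon rank sum is shown as global statistic, without the significance assessment.

```python
import random
from statistics import mean, variance

def welch_t(a, b):
    """Gene-wise (local) statistic: Welch two-sample t-statistic."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

def rank_sum(stats, gene_set):
    """Gene-set-wide (global) statistic: sum of the ranks of |t| over the set."""
    order = sorted(range(len(stats)), key=lambda g: abs(stats[g]))
    rank = {g: r + 1 for r, g in enumerate(order)}
    return sum(rank[g] for g in gene_set)

# Assumed toy data: 100 genes, 10 cases (label 1) and 10 controls (label 0).
rng = random.Random(0)
labels = [1] * 10 + [0] * 10
regulated_set = set(range(10))      # genes 0..9 are shifted in cases
null_set = set(range(10, 20))       # genes 10..19 are not regulated

expr = [
    [rng.gauss(5.0 if (g in regulated_set and l == 1) else 0.0, 1.0) for l in labels]
    for g in range(100)
]
t_stats = [
    welch_t([v for v, l in zip(row, labels) if l == 1],
            [v for v, l in zip(row, labels) if l == 0])
    for row in expr
]

regulated_score = rank_sum(t_stats, regulated_set)
null_score = rank_sum(t_stats, null_set)
print(regulated_score, null_score)
```

A regulated gene set collects the highest ranks and therefore a larger rank sum than an unregulated set of the same size.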

3 Study outcomes

3.1 Outcome O1: False positives under null distribution

The frequency of false positives was assessed at alpha=0.05. All approaches except FET-1k showed around 5% false positives or less. FET-1k ("FET global statistic in SAFE") had around 20%.

Outcome O1 is presented as Figure 2 in the original publication for the prostate data template and in the "Additional File 1" for the other templates.

The bottom line of this outcome is that all approaches except FET-1k perform similarly well in terms of false positives.
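
How a false-positive frequency at alpha=0.05 is obtained can be sketched as follows. This is only an illustration under stated assumptions: data without any group difference are simulated repeatedly, a simple permutation test on the mean difference stands in for a gene set method, and the fraction of rejections is counted. The function names and all simulation settings are hypothetical and not taken from the publication.

```python
import random

def perm_pvalue(values, labels, rng, n_perm=200):
    """Permutation p-value for the absolute difference of group means."""
    def diff(lab):
        a = [v for v, l in zip(values, lab) if l == 1]
        b = [v for v, l in zip(values, lab) if l == 0]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = diff(labels)
    shuffled = labels[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if diff(shuffled) >= observed:
            hits += 1
    # Adding 1 to numerator and denominator keeps the test valid.
    return (hits + 1) / (n_perm + 1)

def false_positive_rate(n_sim=300, alpha=0.05, seed=1):
    """Fraction of null simulations that are (wrongly) declared significant."""
    rng = random.Random(seed)
    labels = [1] * 10 + [0] * 10
    rejections = 0
    for _ in range(n_sim):
        values = [rng.gauss(0.0, 1.0) for _ in labels]  # no true effect
        if perm_pvalue(values, labels, rng) <= alpha:
            rejections += 1
    return rejections / n_sim

fpr = false_positive_rate()
print(f"false-positive rate: {fpr:.3f}")
```

A well-calibrated method should yield a value close to or below alpha, whereas a value around 20% (as reported for FET-1k) indicates an anti-conservative test.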

3.2 Outcome O2

sigPathway showed superior performance
SAFE-Wilcoxon could NOT detect the differentially regulated pathway(s).
In general, the performance increases with the fraction of regulated genes (parameter pi in the paper), except for "Comp GSEA Q", which shows counterintuitive performance.

Outcome O2 is presented as Figure 3 in the original publication; the numbers are provided in the supplement.

3.3 Outcome O3

SAFE again performs weakly for most configurations
Only "aveDiff-boot" seems to have good power that improves with increasing magnitude tau of regulation
FET-1k and FET-10k could identify the regulated pathway but show counterintuitive performance (i.e. decreasing performance for increasing magnitudes of regulation)

Outcome O3 is presented as Figure 4 in the original publication.

3.4 Outcome O4

COMP-GSEA-FDR and Self-GSEA-FDR showed superior performance
Comp-GSEA-Q and SELF-GSEA-Q showed counterintuitive performance, i.e. the performance decreases with increasing effect size tau

4 Study design and evidence level

4.1 General aspects

The authors consider different sizes of the gene sets
The authors consider different proportions of regulated genes in the gene sets
The authors consider different magnitudes of the underlying effect size (i.e. log-fold-changes)
The authors consider three null simulations (without regulation) as reference for outcome O1
In this publication, the authors published a novel simulation approach termed FANGS
The simulation approach is available as an R package (FANGS), which offers the opportunity to reproduce the simulations and to repeat the analysis for other gene set methods.
The authors provide a comprehensive list of the used configuration parameters
The authors evaluated the following alternative configurations
- For GSEA one alternative
- For SAFE five alternative setups
- For sigPathway and CAMERA no other configurations were considered
Three experimental data sets were used as foundations for simulating data
- prostate cancer (264 cases, 160 controls)
- ischemic stroke (20 cases, 20 controls)
- normal brain tissue (21 cases, 20 controls)

4.2 Design for Outcome O1

The authors consider three null simulations (without regulation) as reference:
- permutation of class labels
- independently sampled expression of all features (=genes)
- centering the simulated data, i.e. set effect size to zero
Default configuration parameters and the alternative parameters described above were evaluated
Only the prostate cancer data set was considered as template for simulations
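
The three null simulations listed above can be sketched as follows. This is a minimal illustration, not the FANGS implementation: the function names are hypothetical, the toy data are simulated, and drawing each gene independently from a normal distribution fitted to that gene is an assumption about what "independently sampled" means.

```python
import random
from statistics import mean, stdev

def permute_labels(labels, rng):
    """Null 1: shuffle the class labels, keeping the expression data as is."""
    shuffled = labels[:]
    rng.shuffle(shuffled)
    return shuffled

def sample_independently(expr, rng):
    """Null 2 (assumed form): draw every gene independently of all others
    from a normal distribution with that gene's mean and standard deviation."""
    return [[rng.gauss(mean(row), stdev(row)) for _ in row] for row in expr]

def center_by_class(expr, labels):
    """Null 3: subtract the class mean per gene, i.e. set the effect size to zero."""
    centered = []
    for row in expr:
        class_mean = {
            c: mean(v for v, l in zip(row, labels) if l == c) for c in set(labels)
        }
        centered.append([v - class_mean[l] for v, l in zip(row, labels)])
    return centered

# Assumed toy data: 4 genes, 5 cases (label 1) shifted by 2, 5 controls (label 0).
rng = random.Random(7)
labels = [1] * 5 + [0] * 5
expr = [[rng.gauss(2.0 if l == 1 else 0.0, 1.0) for l in labels] for _ in range(4)]

null_labels = permute_labels(labels, rng)
null_independent = sample_independently(expr, rng)
null_centered = center_by_class(expr, labels)
```

After centering, the mean difference between cases and controls is zero for every gene, while the within-class variation is preserved.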

4.3 Design for Outcome O2

The outcome was generated by simulating differential expression of one pathway
The analysis was repeated for all three data sets as template
For each of the three data sets the analysis was repeated by selecting two different pathways as differentially regulated.
In total, six analyses were performed (3 data sets x 2 regulated pathways)
Default configuration parameters were chosen
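
A minimal sketch of this design, assuming that a pathway is made differentially expressed by shifting a fraction pi of its genes by a magnitude tau in the cases. The function `regulate_pathway`, the pathway placeholders and the toy data are hypothetical and do not reproduce the FANGS package.

```python
import random
from itertools import product

def regulate_pathway(expr, labels, gene_set, pi, tau, rng):
    """Shift a fraction pi of the genes in gene_set by tau in the cases (label 1)."""
    n_regulated = round(pi * len(gene_set))
    regulated = set(rng.sample(sorted(gene_set), n_regulated))
    shifted = [
        [v + tau if (g in regulated and l == 1) else v for v, l in zip(row, labels)]
        for g, row in enumerate(expr)
    ]
    return shifted, regulated

# 3 data set templates x 2 regulated pathways = 6 analyses.
templates = ["prostate cancer", "ischemic stroke", "normal brain tissue"]
pathways = ["pathway_A", "pathway_B"]  # placeholders, not the pathways of the paper
design = list(product(templates, pathways))

# Assumed toy data: 50 genes, 6 cases and 6 controls, no effect before regulation.
rng = random.Random(3)
labels = [1] * 6 + [0] * 6
expr = [[rng.gauss(0.0, 1.0) for _ in labels] for _ in range(50)]
gene_set = set(range(10))

shifted, regulated = regulate_pathway(expr, labels, gene_set, pi=0.4, tau=1.5, rng=rng)
print(len(design), sorted(regulated))
```

Repeating the call over a grid of pi and tau values gives the kind of configurations over which the detection performance is compared.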

4.4 Design for Outcome O3

The weak performance of SAFE with the default configuration in O2 seems to be the motivation for investigating other configurations of SAFE
The outcome O3 was only generated for one data set (prostate cancer) and two regulated pathways

5 Further comments and aspects

Gene sets from MSigDB were used
The authors are aware of the fact that different null hypotheses are tested by the different approaches
sigPathway and CAMERA offer other options that are discussed in the article but not evaluated

@@ Line 7: / Line 7: @@
 === Summary ===
 Approaches for gene set analyses were assessed by using simulated data that were generated based on a real experimental data set.
+There are competitive tests (COMP), which use the distribution of a reference gene set (e.g. all genes that are not in the gene set) as reference, and self-contained (SELF) approaches, which do not rely on a reference.
 * The authors compared four different methods:
-** Gene Set Enrichment Analysis (GSEA)
+** Gene Set Enrichment Analysis (GSEA-SELF and GSEA-COMP)
-** Significance Analysis of Function and Expression (SAFE)
+** Significance Analysis of Function and Expression (SAFE), which uses the t-test as gene-wise test and offers the Wilcoxon rank sum, Fisher’s Exact Test, a Pearson’s Chi-squared type statistic and a t-statistic as global (gene-set-wide) tests
 ** sigPathway
 ** Correlation Adjusted Mean Rank (CAMERA)
 === Study outcomes ===
 ==== Outcome O1: False positives under null distribution ====
 The frequency of false-positives was assessed by using an alpha=0.05.
@@ Line 84: / Line 84: @@
 === Further comments and aspects ===
+* Gene sets from MSigDB were used
+* The authors are aware of the fact that different null hypotheses are tested by the different approaches
+* sigPathway and CAMERA offer other options that are discussed in the article but not evaluated
