A general modular framework for gene set enrichment analysis
Revision as of 14:43, 25 February 2020
1 Citation
M Ackermann and K Strimmer, A general modular framework for gene set enrichment analysis, 2009, BMC Bioinformatics, 10:47.
2 Summary
Gene set analyses have a modular structure, i.e. they consist of
1. gene level statistics
2. gene level significance assessment
3. gene set statistics
4. gene set significance assessment
5. statistical conclusion
Alternatively, steps 1 to 3 might be replaced by a single global test.
In this paper, 261 different variants of gene set enrichment procedures were evaluated based on simulated and experimental data.
3 Study outcomes
List the paper results concerning method comparison and benchmarking:
3.1 Outcome O1: Gene level statistics
- The choice of the gene-level statistics (t, moderated t, or correlation) does NOT have a great impact
- t statistic, moderated t, and correlation fail to find gene sets that contain up- and downregulated genes
Outcomes O1 and O2 are presented as Table 2 in the original publication.
3.2 Outcome O2: Transformation of the gene level statistics
- The transformation has a substantial impact
- Transformations help to find gene sets that contain up- and downregulated genes
- The combination of square transformation and rank transformation shows the best overall performance
Outcomes O1 and O2 are presented as Table 2 in the original publication.
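A small numeric illustration may help to see why a transformation matters for gene sets with up- and downregulated genes. The sketch below uses made-up gene-level statistics (not values from the paper) and assumes the mean as a simple gene set statistic; `set_score` and `square` are hypothetical helper names. Without transformation the positive and negative statistics cancel, whereas after squaring the mixed set clearly separates from the uninformative genes.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Made-up gene-level statistics (e.g. t values); NOT taken from the paper.
mixed_set = [3.1, 2.7, -2.9, -3.0]    # two up- and two downregulated genes
background = [0.4, -0.6, 0.1, -0.2]   # uninformative genes

def set_score(stats, transform=lambda s: s):
    """Mean of the (optionally transformed) gene-level statistics."""
    return mean([transform(s) for s in stats])

def square(s):
    """Square transformation: removes the sign of the statistic."""
    return s * s

# Untransformed: up- and downregulation cancel within the mixed set.
raw_mixed = set_score(mixed_set)
raw_background = set_score(background)

# Squared: both directions contribute to the set score.
squared_mixed = set_score(mixed_set, square)
squared_background = set_score(background, square)

print("untransformed:", raw_mixed, raw_background)
print("squared:      ", squared_mixed, squared_background)
```

In this toy example the untransformed score of the mixed set is close to zero, like that of the background, while the squared score is far larger than the background's.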
3.3 Outcome On
...
Outcome On is presented as Figure X in the original publication.
3.4 Further outcomes
If intended, you can add further outcomes here.
4 Study design and evidence level
4.1 General aspects
- 100 data sets were simulated
- The simulated data sets have 600 features (genes) and 20 samples (10 vs. 10)
- The data were simulated with normally distributed noise with variance equal to one
- 520 genes were considered uninformative (delta=0, rho=0)
- Altogether, nine different simulation data sets were generated that consist of the following combinations:
  - Gene sets with different levels of differential expression (delta \in {0, 0.75, 1, -1}) were simulated
  - Gene sets with varying levels of intra-group correlation (rho \in {0, 0.6, -0.6}) were simulated
  - Gene sets that contain regulated and unregulated genes (half/half) were generated, as well as gene sets that contain up- and downregulated genes.
- "The gene set statistic ES was not combined with a binary transformation since the latter does not allow a sensible ranking of the genes."
- In total
  - 3 gene level statistics ×
  - 5 transformations ×
  - 6 gene set statistics ×
  - 3 significance assessments
  - minus 9 combinations that are not sensible
  - = 261 variants of gene set analyses were considered
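The count of 261 can be checked by enumerating the combinations. The sketch below assumes that the nine excluded combinations are exactly those pairing the gene set statistic ES with the binary transformation, as suggested by the quote above. Labels such as `transformation_4` or `set_statistic_2` are placeholders for options that are not named on this page.

```python
from itertools import product

gene_level_statistics = ["t", "moderated_t", "correlation"]
# "binary", "square" and "rank" are mentioned on this page; the rest are placeholders.
transformations = ["binary", "square", "rank", "transformation_4", "transformation_5"]
# Only "ES" is named on this page; the other five labels are placeholders.
gene_set_statistics = ["ES"] + [f"set_statistic_{i}" for i in range(2, 7)]
significance_assessments = ["resampling", "permutation", "restandardization"]

def is_sensible(variant):
    """Assumption: only ES combined with the binary transformation is excluded."""
    _, transformation, set_statistic, _ = variant
    return not (set_statistic == "ES" and transformation == "binary")

all_variants = list(product(gene_level_statistics, transformations,
                            gene_set_statistics, significance_assessments))
variants = [v for v in all_variants if is_sensible(v)]

print(len(all_variants))                   # 270
print(len(all_variants) - len(variants))   # 9
print(len(variants))                       # 261
```

Under this assumption the nine excluded variants are the 3 gene level statistics × 3 significance assessments that would use ES together with the binary transformation.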
4.2 Design for Outcome O1: Gene level statistics
- The authors consider the impact of the selected approach for module 1 (see summary above)
- Three approaches were considered: t, moderated t and correlation
- These approaches were evaluated for five different transformations (see O2)
- Multiple other approaches
- The authors note that the choice of the gene level statistic might matter more for smaller sample sizes (e.g. 3 vs. 3)
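One possible intuition for the small impact of the gene level statistic is that the t statistic and the correlation are closely related. The sketch below is not taken from the paper: it assumes a pooled-variance two-sample t statistic and the Pearson correlation between expression and a 0/1 group label, for which the identity t = r·sqrt(n−2)/sqrt(1−r²) holds. The expression values are made up, and the moderated t is not covered.

```python
import math

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    pooled_var = (ssx + ssy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled_var * (1 / nx + 1 / ny))

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Made-up expression values of one gene in two groups (5 vs. 5 samples).
group1 = [1.2, 0.8, 1.9, 0.3, 1.1]
group2 = [0.1, -0.4, 0.6, 0.2, -0.3]

t = pooled_t(group1, group2)

# Correlation of expression with the group label (1 = group1, 0 = group2).
labels = [1] * len(group1) + [0] * len(group2)
r = pearson(group1 + group2, labels)

# For a fixed sample size, t is a monotone function of r.
n = len(labels)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

print(t, r, t_from_r)
```

Because the mapping from r to t is monotone for a fixed sample size, both statistics rank the genes identically under these assumptions.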
4.3 Design for Outcome O2: Transformation of the gene level statistics
- The outcome was generated for five different transformations (and three gene level statistics)
- Configuration parameters were chosen ...
- ...
...
(resampling, permutation, restandardization)
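As an illustration of one of these significance assessments, the following is a minimal sketch of a permutation test, assuming that "permutation" refers to permuting the sample labels. It is not the authors' implementation. The gene-level statistic (difference of group means), the gene set statistic (mean of squared gene-level statistics), the toy data, and all function names are assumptions chosen for brevity.

```python
import random

def mean(values):
    return sum(values) / len(values)

def mean_difference(values, labels):
    """Gene-level statistic: mean of label-1 samples minus mean of label-0 samples."""
    group1 = [v for v, lab in zip(values, labels) if lab == 1]
    group0 = [v for v, lab in zip(values, labels) if lab == 0]
    return mean(group1) - mean(group0)

def gene_set_statistic(expression, labels):
    """Mean of the squared gene-level statistics over all genes in the set."""
    return mean([mean_difference(gene, labels) ** 2 for gene in expression])

def permutation_p_value(expression, labels, n_perm=999, seed=0):
    """Share of label permutations with a statistic at least as large as observed."""
    rng = random.Random(seed)
    observed = gene_set_statistic(expression, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the sample labels, keep the data fixed
        if gene_set_statistic(expression, shuffled) >= observed:
            hits += 1
    # Adding one to numerator and denominator avoids a p-value of exactly zero.
    return (hits + 1) / (n_perm + 1)

# Toy gene set: 4 genes, 5 vs. 5 samples, two genes shifted up and two down,
# with standard normal noise (variance one).
labels = [1] * 5 + [0] * 5
rng = random.Random(1)
expression = [[delta * lab + rng.gauss(0, 1) for lab in labels]
              for delta in (1, 1, -1, -1)]

p_value = permutation_p_value(expression, labels)
print(p_value)
```

Permuting sample labels keeps the correlation between genes intact, which is one reason this kind of test is often considered for gene sets with intra-group correlation.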
4.4 Design for Outcome O
- The outcome was generated for ...
- Configuration parameters were chosen ...
- ...
5 Further comments and aspects
- The simulation is NOT based on characteristics or gene sets derived from real data
- The paper provides very comprehensive outcomes in terms of combinations of approaches
6 References
The list of cited or related literature is placed here.