Difference between revisions of "A general modular framework for gene set enrichment analysis"

Latest revision as of 15:40, 25 February 2020

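3 Study outcomes

3.1 Outcome O1: Gene level statistics

The choice of the gene-level statistic (t, moderated t, or correlation) does NOT have a great impact

3.2 Outcome O2: Transformation of the gene level statistics

The transformation of the gene-level statistic has a substantial impact
Transformations help to find gene sets that contain both up- and downregulated genes
The combination of the square and rank transformations shows the best overall performance
The binary transformation (i.e., using a cut-point) and the transformation to FDRs decrease the performance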

Outcomes O1 and O2 are presented as Table 2 in the original publication.
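To illustrate why a transformation matters for gene sets with mixed regulation, the following minimal Python sketch applies a few transformations to toy gene-level statistics and averages them over a set with two up- and two downregulated genes. The transformation labels, the cut-point of 2.0 and the reading of the square/rank combination as "ranks of the squared statistics" are assumptions for illustration, not the paper's exact definitions; the FDR transformation is not shown.

```python
def rank(values):
    """1-based ranks across all genes; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of the 1-based positions i..j
        i = j + 1
    return ranks


def transform(stats, kind, cutpoint=2.0):
    """Transform the gene-level statistics of ALL genes (hypothetical labels)."""
    if kind == "identity":
        return list(stats)
    if kind == "square":
        return [s * s for s in stats]
    if kind == "rank":
        return rank(stats)
    if kind == "square_rank":  # assumed reading: ranks of the squared statistics
        return rank([s * s for s in stats])
    if kind == "binary":  # 1 if |s| exceeds the (arbitrary) cut-point, else 0
        return [1.0 if abs(s) > cutpoint else 0.0 for s in stats]
    raise ValueError(f"unknown transformation: {kind}")


def set_mean(scores, gene_set):
    """Mean of the transformed scores over the genes in the set."""
    return sum(scores[i] for i in gene_set) / len(gene_set)


# Genes 0-3: two up- and two downregulated genes; genes 4-7: background.
stats = [3.0, -3.0, 2.5, -2.5, 0.1, -0.2, 0.3, 0.0]
gene_set = [0, 1, 2, 3]
results = {kind: set_mean(transform(stats, kind), gene_set)
           for kind in ("identity", "rank", "square", "square_rank", "binary")}
print(results)
```

With the identity and the plain rank transformation, the up- and downregulated genes cancel: the set mean is 0.0, and its mean rank is 4.5, exactly the average rank of all eight genes. Squaring, with or without subsequent ranking, keeps the signal.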

3.3 Outcome O3: Gene set statistics

"mean and the maxmean statistic produce ... overall very good results"
"median and the Wilcoxon test are primarily advantageous if the competitive null hypothesis is tested, or if there are many outliers in the data"
"conditional FDR ... vary strongly with the choice of the gene-level statistic, transformation and permutation approach."
The enrichment score (ES) showed a rather weak performance

Outcome O3 is presented as Table 3 in the original publication.
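A minimal Python sketch of three of the gene set statistics named above (mean, median, maxmean), applied to toy scores. The maxmean definition follows Efron and Tibshirani (2007); whether the paper uses it in exactly this form is an assumption. The examples use untransformed scores: after a square transformation all scores are non-negative, and maxmean then coincides with the mean.

```python
import statistics


def set_mean(scores):
    return sum(scores) / len(scores)


def set_median(scores):
    return statistics.median(scores)


def maxmean(scores):
    """maxmean: average the positive parts and the negative parts separately
    over ALL m genes of the set, then keep the larger of the two."""
    m = len(scores)
    pos = sum(max(s, 0.0) for s in scores) / m
    neg = sum(max(-s, 0.0) for s in scores) / m
    return max(pos, neg)


mixed = [3.0, -3.0, 2.5, -2.5]   # up- and downregulated genes cancel in the mean
outlier = [0.1, 0.2, 0.0, 10.0]  # one extreme gene dominates the mean

print(set_mean(mixed), maxmean(mixed))  # 0.0 1.375
print(set_mean(outlier), set_median(outlier))  # mean pulled up by the outlier, median stays small
```

The outlier example mirrors the remark that the median is mainly advantageous when the data contain many outliers.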

3.4 Outcome O4: Significance assessment

The parametric approach has the best power but is overoptimistic if the assumption of statistical independence is violated
Permutation seems to slightly outperform resampling
"restandardization procedure performs very similar to resampling"

Outcome O4 is presented as Table 4 in the original publication.
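The significance assessments can be sketched in Python as follows. This is a toy illustration, not the paper's implementation: it uses the ordinary pooled t statistic instead of the moderated t, the quadratic transformation with the mean as gene set statistic, gene resampling (random gene sets of the same size) and sample-label permutation. The parametric function uses the untransformed mean and assumes independent standard-normal scores, which t statistics only approximate. Restandardization is not shown.

```python
import math
import random
import statistics


def t_stat(x, y):
    """Pooled-variance two-sample t statistic (mean of y minus mean of x)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(y) - statistics.mean(x)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def gene_stats(rows, labels):
    """rows: one list of expression values per gene; labels: 0/1 per sample."""
    return [t_stat([v for v, g in zip(r, labels) if g == 0],
                   [v for v, g in zip(r, labels) if g == 1]) for r in rows]


def set_score(stats, gene_set):
    # quadratic transformation followed by the mean as gene set statistic
    return sum(stats[i] ** 2 for i in gene_set) / len(gene_set)


def p_gene_resampling(stats, gene_set, n_draws, rng):
    """Null distribution: random gene sets of the same size drawn from all genes."""
    obs = set_score(stats, gene_set)
    hits = sum(set_score(stats, rng.sample(range(len(stats)), len(gene_set))) >= obs
               for _ in range(n_draws))
    return (hits + 1) / (n_draws + 1)


def p_label_permutation(rows, labels, gene_set, n_perm, rng):
    """Null distribution: shuffled sample labels; the set's gene-gene correlation is kept."""
    sub = [rows[i] for i in gene_set]
    idx = range(len(sub))
    obs = set_score(gene_stats(sub, labels), idx)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        hits += set_score(gene_stats(sub, perm), idx) >= obs
    return (hits + 1) / (n_perm + 1)


def p_parametric(stats, gene_set):
    """Untransformed mean; treats scores as independent N(0, 1) under H0,
    so sqrt(m) * mean ~ N(0, 1). Returns a two-sided p-value."""
    m = len(gene_set)
    z = math.sqrt(m) * sum(stats[i] for i in gene_set) / m
    return math.erfc(abs(z) / math.sqrt(2))


rng = random.Random(1)
labels = [0] * 10 + [1] * 10
# hypothetical toy data: 50 genes, genes 0-4 shifted by +2 in group 1
rows = [[rng.gauss(0, 1) + (2.0 if g < 5 and lab == 1 else 0.0) for lab in labels]
        for g in range(50)]
gene_set = [0, 1, 2, 3, 4]
stats = gene_stats(rows, labels)
p_res = p_gene_resampling(stats, gene_set, 199, rng)
p_perm = p_label_permutation(rows, labels, gene_set, 199, rng)
p_par = p_parametric(stats, gene_set)
print(p_res, p_perm, p_par)
```

When the genes of a set are positively correlated, the variance of their mean exceeds the 1/m assumed by `p_parametric`, so its p-values become too small; label permutation keeps the correlation structure of the set intact.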

3.5 Outcome O5: Global approaches

The performance of the globaltest procedure "is not better than that of the less sophisticated univariate methods" but "is computationally a little bit faster".
For Hotelling's T2-test:
- an "overall poor" performance was obtained
- "the uncorrelated sets are found with the same reliability as with univariate approaches. However, ... the sets with correlation ... are hardly detected."
- shows "improved performance with sample label permutation as opposed to gene sampling."

Outcome O5 is presented in Table 5 (globaltest) and Table 6 (Hotelling's T2-test) of the original publication.

3.6 Further outcomes

4 Study design and evidence level

4.1 General aspects

100 data sets were simulated
The simulated data sets have 600 features (genes) and 20 samples (10 vs. 10)
The data were simulated with normally distributed noise with variance equal to one
520 genes were considered uninformative (delta = 0, rho = 0)
Altogether, nine different simulation data sets were generated that consist of the following combinations:
- Gene sets with different levels of differential expression (delta \in {0, 0.75, 1, -1}) were simulated
- Gene sets with varying levels of intra-group correlation (rho \in {0, 0.6, -0.6}) were simulated
- Gene sets that contain regulated and unregulated genes (half/half) were generated, as well as gene sets that contain up- and downregulated genes.
"The gene set statistic ES was not combined with a binary transformation since the latter does not allow a sensible ranking of the genes."
In total
- 3 gene level statistics ×
- 5 transformations ×
- 6 gene set statistics ×
- 3 significance assessments
- minus 9 combinations that are not sensible (ES with binary transformation)
- = 261 variants of gene set analysis in total
The authors count how frequently the gene-set-level p-values fall below the significance level of 0.05
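The simulation design and the way performance is counted can be sketched in Python. All concrete choices below are assumptions: the gene set sizes and their layout, the use of a shared latent factor to induce the intra-group correlation rho (only 0 <= rho < 1 is supported, so the negative-correlation setting is not reproduced), and the placeholder labels for four of the five transformations and for the three significance assessments. The grid at the end only re-does the count 3 × 5 × 6 × 3 - 9 = 261.

```python
import math
import random
from itertools import product


def simulate(set_specs, n_genes=600, n_per_group=10, seed=0):
    """Toy data in the spirit of the study: N(0, 1) noise, 10 vs. 10 samples.
    set_specs: list of (per_gene_deltas, rho), one entry per gene set (hypothetical
    layout). Remaining genes are uninformative (delta = 0, rho = 0)."""
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    data, sets = [], []
    for deltas, rho in set_specs:
        if not 0.0 <= rho < 1.0:
            raise ValueError("this sketch only supports 0 <= rho < 1")
        # one latent factor per sample, shared by all genes of the set -> correlation rho
        factor = [rng.gauss(0, 1) for _ in labels]
        start = len(data)
        for delta in deltas:
            data.append([math.sqrt(rho) * f + math.sqrt(1 - rho) * rng.gauss(0, 1)
                         + (delta if lab == 1 else 0.0) for f, lab in zip(factor, labels)])
        sets.append(list(range(start, len(data))))
    while len(data) < n_genes:
        data.append([rng.gauss(0, 1) for _ in labels])
    return data, labels, sets


def detection_rate(p_values, alpha=0.05):
    """Share of simulated data sets in which a gene set's p-value is below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Design grid: only "binary" is named among the five transformations; the other
# labels are placeholders, and the three significance labels are an assumption.
gene_level = ["t", "moderated t", "correlation"]
transformations = ["binary", "trafo_2", "trafo_3", "trafo_4", "trafo_5"]
set_statistics = ["mean", "maxmean", "median", "ES", "conditional FDR", "Wilcoxon"]
significance = ["resampling", "permutation", "restandardization"]
variants = [v for v in product(gene_level, transformations, set_statistics, significance)
            if not (v[1] == "binary" and v[2] == "ES")]  # ES is never combined with binary

# hypothetical sets: regulated, correlated, mixed up/down, half regulated
specs = [([1.0] * 10, 0.0), ([0.75] * 10, 0.6),
         ([1.0] * 5 + [-1.0] * 5, 0.0), ([1.0] * 5 + [0.0] * 5, 0.0)]
data, labels, sets = simulate(specs)
print(len(data), len(labels), len(variants))  # 600 20 261
print(detection_rate([0.01, 0.20, 0.03, 0.50]))  # 0.5
```

A shared factor with weight sqrt(rho), plus independent noise with weight sqrt(1 - rho), keeps each gene's noise variance at one while giving every pair of genes in the set a correlation of rho.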

4.2 Design for Outcome O1: Gene level statistics

The authors consider the impact of the approach selected for module 1 (see summary above)
Three approaches were considered: t, moderated t and correlation
These approaches were evaluated for five different transformations (see O2)

Multiple other gene-level approaches exist but were not evaluated
The authors point out that the choice of the gene-level test statistic might matter more for smaller sample sizes (e.g., 3 vs. 3)
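A small Python sketch of the three gene-level statistics shows why the choice can matter for small samples. The moderated t here shrinks the gene-wise variance toward a fixed, hypothetical prior (d0 = 4, s0_sq = 1.0); limma instead estimates these hyperparameters from all genes by empirical Bayes. The correlation statistic is taken to be the Pearson correlation between expression and the 0/1 group label, which is an assumption about the paper's definition.

```python
import math
import statistics


def pooled_var(x, y):
    d = len(x) + len(y) - 2
    return ((len(x) - 1) * statistics.variance(x) + (len(y) - 1) * statistics.variance(y)) / d


def t_stat(x, y):
    """Ordinary two-sample t statistic with pooled variance."""
    se = math.sqrt(pooled_var(x, y) * (1 / len(x) + 1 / len(y)))
    return (statistics.mean(y) - statistics.mean(x)) / se


def moderated_t(x, y, d0, s0_sq):
    """Moderated t in the spirit of limma: the gene's variance is shrunk toward a
    prior value s0_sq carrying d0 degrees of freedom (both FIXED inputs here)."""
    d = len(x) + len(y) - 2
    s_tilde_sq = (d0 * s0_sq + d * pooled_var(x, y)) / (d0 + d)
    se = math.sqrt(s_tilde_sq * (1 / len(x) + 1 / len(y)))
    return (statistics.mean(y) - statistics.mean(x)) / se


def correlation_stat(x, y):
    """Pearson correlation between expression values and the 0/1 group label."""
    values = list(x) + list(y)
    groups = [0.0] * len(x) + [1.0] * len(y)
    mv, mg = statistics.mean(values), statistics.mean(groups)
    cov = sum((v - mv) * (g - mg) for v, g in zip(values, groups))
    var_v = sum((v - mv) ** 2 for v in values)
    var_g = sum((g - mg) ** 2 for g in groups)
    return cov / math.sqrt(var_v * var_g)


# 3 vs. 3 samples with an accidentally tiny within-group variance
x = [0.0, 0.01, -0.01]
y = [1.0, 1.01, 0.99]
print(t_stat(x, y), moderated_t(x, y, d0=4, s0_sq=1.0), correlation_stat(x, y))
```

Here the ordinary t statistic exceeds 100 while the moderated t stays below 2. With larger samples the data term dominates the prior (d grows while d0 stays fixed), so the two statistics converge.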

4.3 Design for Outcome O2: Transformation of the gene level statistics

The outcome was generated for five different transformations (and three gene level statistics)

4.4 Design for Outcome O3: Gene set statistics

Six gene set statistics were investigated:
- mean
- maxmean
- median
- ES
- conditional FDR
- Wilcoxon
These analyses were performed with the moderated t statistic at the gene level and the quadratic transformation. Significance was assessed by resampling.
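The competitive Wilcoxon statistic can be sketched in Python as a rank-sum test that compares the scores of the genes in the set with those of all other genes. This version uses the normal approximation, assumes no ties and reports a one-sided p-value; these are simplifying assumptions, not necessarily how the paper computed it.

```python
import math


def wilcoxon_set_test(scores, gene_set):
    """Rank-sum W of the genes in the set among all genes, with a one-sided
    normal-approximation p-value (set ranked higher than the rest)."""
    n, m = len(scores), len(gene_set)
    order = sorted(range(n), key=lambda i: scores[i])
    rank = {gene: pos + 1 for pos, gene in enumerate(order)}  # assumes no ties
    w = sum(rank[i] for i in gene_set)
    mean_w = m * (n + 1) / 2
    sd_w = math.sqrt(m * (n - m) * (n + 1) / 12)
    z = (w - mean_w) / sd_w
    return w, 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


# 20 genes with (toy) squared statistics; genes 0-3 form the set and score highest
scores = [9.0, 8.0, 7.0, 6.0] + [0.1 * k for k in range(16)]
w, p = wilcoxon_set_test(scores, [0, 1, 2, 3])
print(w, p)  # w is 74; p is about 0.001
```

Because only ranks enter W, a single extreme gene cannot dominate the statistic, which fits the remark that the Wilcoxon test is advantageous when there are many outliers.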

4.5 Design for Outcome O4: Significance assessment

Four different approaches for assessing significance at the gene set level were evaluated:
- parametric
- resampling
- permutation
- restandardization
This analysis was performed using the moderated t as the gene-level statistic, combined with the quadratic transformation and the mean as the gene set statistic

4.6 Design for Outcome O5: Global approaches

The globaltest procedure and Hotelling's T2-test with a shrinkage covariance matrix were considered
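Hotelling's T2-test with a shrinkage covariance matrix can be sketched in Python as below. The shrinkage target (the diagonal of the pooled covariance) and the fixed intensity `lam` are assumptions; in practice the intensity would be estimated from the data. Significance is assessed by permuting sample labels, the variant reported above to work better than gene sampling.

```python
import random


def mean_vec(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_cov(g1, g2):
    """Pooled within-group sample covariance; rows are samples, columns are genes."""
    p = len(g1[0])
    s = [[0.0] * p for _ in range(p)]
    for rows in (g1, g2):
        m = mean_vec(rows)
        for r in rows:
            for a in range(p):
                for b in range(p):
                    s[a][b] += (r[a] - m[a]) * (r[b] - m[b])
    dof = len(g1) + len(g2) - 2
    return [[v / dof for v in row] for row in s]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hotelling_t2(g1, g2, lam):
    """Two-sample T2 with covariance S* = lam * diag(S) + (1 - lam) * S (lam fixed)."""
    s = pooled_cov(g1, g2)
    p = len(s)
    shrunk = [[s[a][b] if a == b else (1 - lam) * s[a][b] for b in range(p)] for a in range(p)]
    d = [u - v for u, v in zip(mean_vec(g2), mean_vec(g1))]
    w = solve(shrunk, d)
    n1, n2 = len(g1), len(g2)
    return n1 * n2 / (n1 + n2) * sum(di * wi for di, wi in zip(d, w))


def t2_permutation_p(samples, labels, lam, n_perm, rng):
    """p-value from sample-label permutation (not gene sampling)."""
    def split(lab):
        return ([s for s, g in zip(samples, lab) if g == 0],
                [s for s, g in zip(samples, lab) if g == 1])
    obs = hotelling_t2(*split(labels), lam)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        hits += hotelling_t2(*split(perm), lam) >= obs
    return (hits + 1) / (n_perm + 1)


rng = random.Random(2)
labels = [0] * 10 + [1] * 10
# hypothetical gene set of 4 genes; group 1 is shifted by 1.5 in every gene
samples = [[rng.gauss(0, 1) + (1.5 if lab == 1 else 0.0) for _ in range(4)] for lab in labels]
p_t2 = t2_permutation_p(samples, labels, lam=0.2, n_perm=199, rng=rng)
print(p_t2)
```

Shrinking the off-diagonal entries keeps the covariance matrix invertible even when a gene set has more genes than samples, where the plain sample covariance would be singular.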

5 Further comments and aspects

The simulation is NOT based on characteristics or gene sets derived from real data
The paper provides very comprehensive outcomes in terms of combinations of approaches
After the paper was published, another type of gene set statistic based on the Kolmogorov-Smirnov test appeared. This approach is applied, e.g., in GSEA.

