Comparing gene set analysis methods on single-nucleotide polymorphism data from Genetic Analysis Workshop 16

All methods have similar numbers of false positives (i.e. they identify similar numbers of gene sets when there are no associated genes).
SUMSTAT finds more of the gene sets containing associated SNPs than any other method.
All approaches (except FET with cutoff=13.8) fail for the setting "pseudo gene set with 1-2 strongly associated SNPs".
The evaluated approaches seem to perform in the following order: SUMSTAT > SUMQ > GSEA > FET 5.9 > FET 9.2 > FET 13.8 > FET 18.4

Outcome O2 is presented as Table 1 in the original publication.

3.3 Further outcomes

The overall best method (SUMSTAT) was also applied to the experimental data in order to interpret the data.

4 Study design and evidence level

4.1 General aspects

4.2 Design for Outcome O1

The outcome was generated for a rather large experimental data set from the Framingham Heart Study.

The data set contained 957 samples, of which 157 ever had heart disease and 167 ever had diabetes.

The cutoff for FET was motivated by quantiles of the chi2-distribution but seems arbitrary to some extent.
Only a single combination of configuration parameters was considered.
The two phenotypes (heart disease and diabetes) were jointly analyzed.
706 gene sets were analyzed for each of the two phenotypes (1412 gene sets in total).
One cutoff for FET has been considered.
The tradeoff between true positives and false positives (or sensitivity vs. specificity) in general depends on the cutoff. Since only a single cutoff was investigated and the methods predict different numbers of gene sets, it is difficult to interpret this result.
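
The chi2-quantile motivation of the FET cutoffs can be made concrete with a small sketch. It rests on an assumption that is not stated on this page: that the quantiles are taken from a chi2-distribution with two degrees of freedom, for which the upper quantile at tail probability alpha has the closed form -2*ln(alpha). Under that assumption, the tail probabilities 0.05, 0.01, 0.001 and 0.0001 (also an inference from the numbers, not something stated here) give values that coincide, after truncation to one decimal, with the four cutoffs 5.9, 9.2, 13.8 and 18.4 listed for Outcome O2.

```python
import math

def chi2_df2_upper_quantile(alpha):
    """Upper quantile of a chi2 distribution with 2 degrees of freedom.

    For df = 2 the survival function is exp(-x / 2), so the value x
    with P(X > x) = alpha is -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)

# Assumed tail probabilities (hypothetical; not stated in the review).
tail_probabilities = [0.05, 0.01, 0.001, 0.0001]

exact_quantiles = [chi2_df2_upper_quantile(a) for a in tail_probabilities]

# Truncate (not round) to one decimal to compare with the listed cutoffs.
truncated_cutoffs = [math.floor(q * 10) / 10 for q in exact_quantiles]
```

If this reading is right, the cutoffs correspond to conventional significance levels, which would explain the choice but not remove the concern that the grid of cutoffs is coarse.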

4.3 Design for Outcome O2

For this outcome, simulated data for 666 vs. 210 samples was generated.
The size of this data set is similar to that of the experimental data (O1), but not the same.
2000 SNPs were assumed to be weakly associated with the phenotype, 19 were assumed to be strongly associated.
Real gene sets and pseudo-gene sets were considered.
Four cutoffs for FET were evaluated.
The following regulation/association was simulated:
- real gene sets without associated genes
- real gene sets with some weakly (1-9) associated genes
- real gene sets with many weakly (10+) associated genes
- real gene sets with one to two strongly associated genes (no weakly associated genes)
- pseudo-gene sets without strongly associated genes
- pseudo-gene sets with a large number of weakly associated genes
- pseudo-gene sets with some strongly associated genes
It is counterintuitive that the proportion of identified gene sets does not depend monotonically on the chosen cutoff for FET.
There is no distinct evaluation in terms of true positives and false positives. Only the "percent of sets found as significant" is provided.
Extrapolating to smaller cutoffs, FET might outperform the other approaches. However, sensitivity and specificity would have to be considered.
There is no graphical depiction of the outcome.
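
The non-monotone dependence on the cutoff noted above may be less surprising than it looks. The following is a minimal sketch, assuming that FET denotes a one-sided Fisher's exact test on a 2x2 table (SNP in the gene set or not, SNP statistic above the cutoff or not). This page does not spell out the construction used in the paper, so the table layout, the function names and the toy statistics are all hypothetical.

```python
from math import comb

def fisher_upper_tail(N, K, n, k):
    """One-sided Fisher p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    N: all SNPs, K: SNPs above the cutoff, n: SNPs in the gene set,
    k: SNPs in the gene set that are above the cutoff.
    """
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(n, K) + 1))
    return favourable / comb(N, n)

def fet_p_value(in_set, out_set, cutoff):
    """Dichotomise SNP statistics at `cutoff`, then test for enrichment."""
    k = sum(s > cutoff for s in in_set)
    K = k + sum(s > cutoff for s in out_set)
    return fisher_upper_tail(len(in_set) + len(out_set), K, len(in_set), k)

# Toy statistics (invented): three moderately strong SNPs in the set,
# and eight background SNPs that pass only the lowest cutoff.
in_set = [10.0, 10.0, 10.0, 1.0]
out_set = [7.0] * 8 + [1.0] * 8

fet_p_values = {c: fet_p_value(in_set, out_set, c) for c in (5.9, 9.2, 13.8)}
```

In this toy example the middle cutoff gives the smallest p-value: at the lowest cutoff the background SNPs dilute the enrichment, and at the highest cutoff no SNP remains above the threshold. Because the cutoff changes both margins of the table, the p-value need not move in one direction as the cutoff grows.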

5 Further comments and aspects

The authors conclude that, in agreement with other publications, GSEA and FET are less powerful.
The paper does not discuss sensitivity vs. specificity.
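
Because the simulation design fixes which gene sets contain associated SNPs, sensitivity and specificity could in principle be computed per method and cutoff. The sketch below shows the calculation on invented labels. The function name and the example data are hypothetical and are not taken from the paper.

```python
def sensitivity_specificity(truth, called):
    """Return (sensitivity, specificity) for paired boolean lists.

    truth[i]  : gene set i really contains associated SNPs
    called[i] : the method reported gene set i as significant
    A rate is returned as None if its denominator is zero.
    """
    tp = sum(t and c for t, c in zip(truth, called))
    fn = sum(t and not c for t, c in zip(truth, called))
    tn = sum(not t and not c for t, c in zip(truth, called))
    fp = sum(not t and c for t, c in zip(truth, called))
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity

# Invented example: 3 truly associated sets, 5 null sets.
truth = [True, True, True, False, False, False, False, False]
called = [True, True, False, True, False, False, False, False]

sens, spec = sensitivity_specificity(truth, called)
```

Repeating this for each cutoff would yield one (sensitivity, specificity) pair per cutoff, which is the kind of summary that would make the methods comparable even though they report different numbers of gene sets.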
