Revision as of 12:22, 25 February 2020

1 Comparing gene set analysis methods on single-nucleotide polymorphism data from Genetic Analysis Workshop 16

Tintle, N. L., Borchers, B., Brown, M., & Bekmetjev, A., Comparing gene set analysis methods on single-nucleotide polymorphism data from Genetic Analysis Workshop 16, 2009, BMC Proceedings, 3(S7), S96.

https://doi.org/10.1186/1753-6561-3-S7-S96


1.1 Summary

The authors write in their abstract that the purpose of this study was to confirm that GSEA and FET are not optimal for the analysis of SNP data when compared with SUMSTAT.

The following methods were compared:

  • Gene Set Enrichment Analysis (GSEA)
  • Fisher's Exact Test (FET)
  • SUMSTAT (the sum of the test statistics for the individual genes)
  • SUMQ (the sum of the squared test statistics), also termed SAM-GS in the paper

GSEA, SUMSTAT and SUMQ each use 1000 randomly selected gene sets as the reference, where every random set contains the same number of genes as the set of interest.
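The random-set reference described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy statistics, the fixed seed and the add-one correction of the empirical p-value are all assumptions made for this example.

```python
import random

def sumstat(stats, gene_set):
    """Sum of the per-gene test statistics over the genes in the set."""
    return sum(stats[g] for g in gene_set)

def sumq(stats, gene_set):
    """Sum of the squared per-gene test statistics over the genes in the set."""
    return sum(stats[g] ** 2 for g in gene_set)

def random_set_pvalue(stats, gene_set, score, n_random=1000, seed=0):
    """Empirical p-value: share of random same-size gene sets that score
    at least as high as the gene set of interest."""
    rng = random.Random(seed)
    genes = sorted(stats)
    observed = score(stats, gene_set)
    hits = 0
    for _ in range(n_random):
        reference = rng.sample(genes, len(gene_set))  # same size as the set of interest
        if score(stats, reference) >= observed:
            hits += 1
    # Add-one correction (an assumption here) avoids reporting exactly zero.
    return (hits + 1) / (n_random + 1)

# Hypothetical per-gene statistics: 100 genes, three of them with large values.
stats = {f"g{i}": 0.1 * (i % 7) for i in range(100)}
stats.update({"g1": 3.0, "g2": 2.5, "g3": 2.8})
set_of_interest = ["g1", "g2", "g3"]

p_sumstat = random_set_pvalue(stats, set_of_interest, sumstat)
p_sumq = random_set_pvalue(stats, set_of_interest, sumq)
print(p_sumstat, p_sumq)
```

Both scores share the same reference procedure; only the way the per-gene statistics are aggregated differs.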

1.2 Study outcomes

List the paper results concerning method comparison and benchmarking:

1.2.1 Outcome O1

  • SUMSTAT identified more gene sets than FET and GSEA.
  • There is a large overlap between the gene sets identified by the methods considered.

Outcome O1 is presented as a Venn diagram in Figure 1 of the original publication.

1.2.2 Outcome O2

  • SUMSTAT finds more gene sets containing weakly associated SNPs than the other methods.

Outcome O2 is presented as Figure X in the original publication.


1.3 Study design and evidence level

1.3.1 General aspects

You can describe general design aspects here. The study designs for describing specific outcomes are listed in the following subsections:

1.3.2 Design for Outcome O1

  • The outcome was generated from a rather large experimental data set from the Framingham Heart Study.
  • The data set contained 957 samples, of which 157 had ever had heart disease and 167 had ever had diabetes.
  • The cutoff for FET is motivated in the paper but seems somewhat arbitrary.
  • Only a single combination of configuration parameters was considered.
  • The two phenotypes (heart disease and diabetes) were jointly analyzed.
  • 706 gene sets were analyzed for each of the two phenotypes (1412 gene sets in total).
  • Only one cutoff for FET was considered.
  • The trade-off between true positives and false positives (or sensitivity vs. specificity) generally depends on the cutoff. Since only a single cutoff was investigated and the methods predict different numbers of gene sets, this result is difficult to interpret.
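The cutoff dependence noted above can be illustrated with a small Python sketch of a one-sided Fisher's exact test for over-representation. It assumes a gene counts as significant when its p-value lies below the cutoff. The function name, the toy p-values and the cutoffs are hypothetical and are not taken from the paper.

```python
from math import comb

def fet_enrichment_pvalue(gene_pvalues, gene_set, cutoff):
    """One-sided Fisher's exact test: probability of seeing at least the
    observed number of significant genes in a set of this size by chance."""
    members = set(gene_set)                # assumed to be a subset of gene_pvalues
    total = len(gene_pvalues)
    significant = {g for g, p in gene_pvalues.items() if p < cutoff}
    n_sig = len(significant)
    in_set = len(members)
    overlap = len(significant & members)
    # Upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_sig, k) * comb(total - n_sig, in_set - k)
        for k in range(overlap, min(in_set, n_sig) + 1)
    )
    return tail / comb(total, in_set)

# Hypothetical per-gene p-values for 20 genes; the set of interest is g0..g4.
gene_pvalues = {f"g{i}": 0.5 for i in range(20)}
gene_pvalues.update({"g0": 0.001, "g1": 0.02, "g2": 0.04, "g3": 0.2, "g4": 0.6,
                     "g5": 0.03, "g6": 0.03})
gene_set = ["g0", "g1", "g2", "g3", "g4"]

for cutoff in (0.01, 0.05, 0.1):
    print(cutoff, fet_enrichment_pvalue(gene_pvalues, gene_set, cutoff))
```

With these toy values, the same gene set receives different p-values at different cutoffs, which is why a result based on a single cutoff is hard to compare across methods.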

1.3.3 Design for Outcome O2

  • For this outcome, simulated data for 666 vs. 210 samples were generated.
  • The size of this data set is similar to that of the experimental data (O1), but not the same.
  • 2000 SNPs were assumed to be weakly associated with the phenotype, and 19 were assumed to be strongly associated.
  • Real gene sets and pseudo-gene sets were considered.
  • 4 cutoffs for FET were evaluated.

1.4 Further comments and aspects

1.5 References

The list of cited or related literature is placed here.