Literature Studies

 
| This page collects the outcomes of benchmarking studies from the literature. The primary aim is to provide a comprehensive overview of neutral benchmark studies, i.e. assessments that were performed independently of the publication of a new approach. Studies that are not neutral are shown in brackets. </br>

The focus is on computational methods for analyzing experimental data from the field of molecular biology (instead of comparing experimental techniques or platforms). </br>

Please extend this list by creating a new page and adding a link below. </br>
  
 
== Results from Literature ==

https://journals.tubitak.gov.tr/biology/issues/biy-21-45-2/biy-45-2-1-2008-8.pdf

=== Preprocessing high-throughput data ===

{| class="wikitable sortable"
|-
! Year || First Author || Title
|-
| 1999 || Perkins DN || [[Probability-based protein identification by searching sequence databases using mass spectrometry data]]
|-
| 2003 || Bolstad || [[A comparison of normalization methods for high density oligonucleotide array data based on variance and bias]]
|-
| 2003 || Gentzel || [[Preprocessing of tandem mass spectrometric data to support automatic protein identification]]
|-
| 2005 || Irizarry || [[Comparison of Affymetrix GeneChip Expression Measures]]
|-
| 2005 || Meleth S || [[The case for well-conducted experiments to validate statistical protocols for 2D gels: different pre-processing = different lists of significant proteins]]
|-
| 2005 || Freudenberg || [[Comparison of background correction and normalization procedures for high-density oligonucleotide microarrays]]
|-
| 2006 || Shippy || [[Using RNA sample titrations to assess microarray platform performance and normalization techniques]]
|-
| 2006 || Wang P || [[Normalization regarding non-random missing values in high-throughput mass spectrometry data]]
|-
| 2006 || Du P || [[Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching]]
|-
| 2007 || Carvalho B || [[Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data]]
|-
| 2007 || Cannataro M || [[MS‐Analyzer: preprocessing and data mining services for proteomics applications on the Grid]]
|-
| 2008 || Goebels || [[Comparison of preprocessing methods for the hgU133+2 chip from Affymetrix]]
|-
| 2009 || Autio || [[Comparison of Affymetrix data normalization methods using 6,926 experiments across five array generations]]
|-
| 2009 || Mar JC || [[Data-driven normalization strategies for high-throughput quantitative RT-PCR]]
|-
| 2009 || Vakhrushev SY || [[Software platform for high-throughput glycomics]]
|-
| 2010 || Fan || [[Consistency of predictive signature genes and classifiers generated using different microarray platforms]]
|-
| 2010 || Li || [[Detecting and correcting systematic variation in large-scale RNA sequencing data]]
|-
| 2010 || Bullard || [[Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments]]
|-
| 2010 || Risso || [[Normalization of RNA-seq data using factor analysis of control genes or samples]]
|-
| 2010 || Armananzas R || [[Peakbin selection in mass spectrometry data using a consensus approach with estimation of distribution algorithms]]
|-
| 2011 || McCall || [[Affymetrix GeneChip microarray preprocessing for multivariate analyses]]
|-
| 2011 || Zhang ZM || [[Peak alignment using wavelet pattern matching and differential evolution]]
|-
| 2012 || Dillies || [[A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis]]
|-
| 2013 || García-Torres M || [[Comparison of metaheuristic strategies for peakbin selection in proteomic mass spectrometry data]]
|-
| 2013 || Horvatovich P || [[Bioinformatics and Statistics: LC‐MS (/MS) Data Preprocessing for Biomarker Discovery]]
|-
| 2014 || Chawade || [[Normalyzer: A Tool for Rapid Evaluation of Normalization Methods for Omics Data Sets]]
|-
| 2014 || Zhou X || [[Prevention, diagnosis and treatment of high-throughput sequencing data pathologies]]
|-
| 2014 || Coble JB || [[Comparative evaluation of preprocessing freeware on chromatography/mass spectrometry data for signature discovery]]
|-
| 2014 || Aggio RB || [[Identifying and quantifying metabolites by scoring peaks of GC-MS data]]
|-
| 2014 || Cox J || [[Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ]]
|-
| 2015 || Caraus I || [[Detecting and overcoming systematic bias in high-throughput screening technologies: a comprehensive review of practical issues and methodological solutions]]
|-
| 2015 || Tam S || [[Optimization of miRNA-seq data preprocessing]]
|-
| 2015 || Rafiei A || [[Comparison of peak‐picking workflows for untargeted liquid chromatography/high‐resolution mass spectrometry metabolomics data analysis]]
|-
| 2015 || Chawade A || [[Data processing has major impact on the outcome of quantitative label-free LC-MS analysis]]
|-
| 2015 || Wang T || [[A systematic study of normalization methods for Infinium 450K methylation data using whole-genome bisulfite sequencing data]]
|-
| 2015 || Lu J || [[Improved Peak Detection and Deconvolution of Native Electrospray Mass Spectra from Large Protein Complexes]]
|-
| 2016 || Yi L || [[Chemometric methods in data processing of mass spectrometry-based metabolomics: A review]]
|-
| 2016 || Tsuji J || [[Evaluation of preprocessing, mapping and postprocessing algorithms for analyzing whole genome bisulfite sequencing data]]
|-
| 2016 || Li B || [[Performance Evaluation and Online Realization of Data-driven Normalization Methods Used in LC/MS based Untargeted Metabolomics Analysis]]
|-
| 2016 || Zheng Y || [[An improved algorithm for peak detection in mass spectra based on continuous wavelet transform]]
|-
| 2017 || Li B || [[NOREVA: normalization and evaluation of MS-based metabolomics data]]
|-
| 2018 || Mazoure B || [[Identification and Correction of Additive and Multiplicative Spatial Biases in Experimental High-Throughput Screening]]
|-
| 2018 || Li Z || [[Comprehensive evaluation of untargeted metabolomics data processing software in feature detection, quantification and discriminating marker selection]]
|-
| 2018 || Willforss J || [[NormalyzerDE: Online Tool for Improved Normalization of Omics Expression Data and High-Sensitivity Differential Expression Analysis]]
|}

=== Imputation methods for missing values ===
|-
| 2018 || O'Brien JJ || [[The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments]]
|-
| 2019 || Gunady MK || [[scGAIN: Single Cell RNA-seq Data Imputation using Generative Adversarial Networks]]
|-
| 2020 || Hou W || [[A systematic evaluation of single-cell RNA-sequencing imputation methods]]
|-
| 2020 || Zhang L || [[Comparison of Computational Methods for Imputing Single-Cell RNA-Sequencing Data]]
|-
| 2021 || Steinheuer LM || [[Benchmarking scRNA-seq imputation tools with respect to network inference highlights deficits in performance at high levels of sparsity]]
|-
| 2021 || Jin L || [[A comparative study of evaluating missing value imputation methods in label-free proteomics]]
|}
  
=== Selection of Differential Features and Regions ===

==== Identifying differential features ====

{| class="wikitable sortable"
|-
! Year || First Author || Title
|-
| 2006 || Guo || [[Rat toxicogenomic study reveals analytical consistency across microarray platforms]]
|-
| 2006 || Yang || [[The impact of sample imbalance on identifying differentially expressed genes]]
|-
| 2010 || Su || [[A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the sequencing Quality control consortium]]
|-
| 2014 || Ching || [[Power analysis and sample size estimation for RNA-Seq differential expression]]
|-
| 2017 || van Ooijen || [[Identification of differentially expressed peptides in high-throughput proteomics data]]
|-
| 2017 || Wang || [[In-depth method assessments of differentially expressed protein detection for shotgun proteomics data with missing values]]
|-
| 2017 || Wreczycka || [[Strategies for analyzing bisulfite sequencing data]]
|-
| 2018 || Tran || [[Identification of Differentially Methylated Sites with Weak Methylation Effects]]
|-
| 2020 || Li || [[Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies]]
|}

==== Identifying differential regions (e.g. DMRs) ====

{| class="wikitable sortable"
|-
! Year || First Author || Title
|-
| 2015 || Peters || [[De novo identification of differentially methylated regions in the human genome]]
|-
| 2015 || Bhasin || [[MethylAction: detecting differentially methylated regions that distinguish biological subtypes]]
|-
| 2015 || Jühling || [[metilene: Fast and sensitive calling of differentially methylated regions from bisulfite sequencing data]]
|-
| 2016 || Kolde || [[seqlm: an MDL based method for identifying differentially methylated regions in high density methylation array data]]
|-
| 2016 || Ayyala || [[Statistical methods for detecting differentially methylated regions based on MethylCap-seq data]]
|-
| 2017 || Gaspar || [[DMRfinder: efficiently identifying differentially methylated regions from MethylC-seq data]]
|-
| 2018 || Condon || [[Defiant: (DMRs: easy, fast, identification and ANnoTation) identifies differentially Methylated regions from iron-deficient rat hippocampus]]
|-
| 2018 || Catoni || [[DMRcaller: a versatile R/Bioconductor package for detection and visualization of differentially methylated regions in CpG and non-CpG contexts]]
|-
| 2018 || Gong || [[MethCP: Differentially Methylated Region Detection with Change Point Models (bioRxiv)]]
|}

==== Identifying sets of features (e.g. gene set analyses) ====

{| class="wikitable sortable"
|-
! Year || First Author || Title
|-
| 2009 || Ackermann || [[A general modular framework for gene set enrichment analysis]]
|-
| 2009 || Tintle || [[Comparing gene set analysis methods on single-nucleotide polymorphism data from Genetic Analysis Workshop 16]]
|-
| 2018 || Mathur || [[Gene set analysis methods: a systematic comparison]]
|-
| 2020 || Geistlinger || [[Toward a gold standard for benchmarking gene set enrichment analysis]]
|}

==== Dimension reduction ====

{| class="wikitable sortable"
|-
! Year || First Author || Title
|-
| 2008 || Janecek || [[On the Relationship Between Feature Selection and Classification Accuracy]]
|-
| 2015 || Fernández-Gutiérrez || [[Comparing feature selection methods for highdimensional imbalanced data: identifying rheumatoid arthritis cohorts from routine data|Comparing feature selection methods for high-dimensional imbalanced data: identifying rheumatoid arthritis cohorts from routine data]]
|}

=== Classification ===

{| class="wikitable sortable"
|-
! Year || First Author || Title
|-
| 2003 || Wu || [[Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data]]
|-
| 2005 || Bellaachia || [[Predicting Breast Cancer Survivability Using Data Mining Techniques]]
|}
 
=== Omics Workflows ===
| 2014 || Cox J || [[Accurate Proteome-wide Label-free Quantification by Delayed Normalization and Maximal Peptide Ratio Extraction, Termed MaxLFQ* ]]
|-
| 2015 || Cleary || [[Comparing Variant Call Files for Performance Benchmarkingof Next-Generation Sequencing Variant Calling Pipelines|Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines]]
|-
| 2016 || Tyanova S || [[The MaxQuant computational platform for mass spectrometry–based shotgun proteomics]]
|-
| 2016 || Röst HL || [[OpenMS: a flexible open-source software platform for mass spectrometry data analysis]]
|-
| 2017 || Merino || [[A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies]]
|-
| 2018 || Välikangas T || [[A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation]]
|-
| 2019 || Vieth || [[A Systematic Evaluation of Single CellRNA-Seq Analysis Pipelines|A Systematic Evaluation of Single Cell RNA-Seq Analysis Pipelines]]
|-
| 2019 || Krishnan || [[Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays]]
|-
| 2020 || Tang || [[Simultaneous Improvement in the Precision, Accuracy and Robustness of Label-free Proteome Quantification by Optimizing Data Manipulation Chains]]
|-
| 2021 || Dowell JA || [[Benchmarking Quantitative Performance in Label-Free Proteomics]]
|}
  
=== ODE-based Modelling ===

{| class="wikitable sortable"
|-
! Year || First Author || Title
|-
| 2001 || Beal || [[Ways to Fit a PK Model with Some Data Below the Quantification Limit]]
|-
| 2008 || Balsa-Canto || [[Hybrid optimization method with general switching strategy for parameter estimation]]
|-
| 2011 || Tashkova || [[Parameter estimation with bio-inspired meta-heuristic optimization: modeling the dynamics of endocytosis]]
|-
| 2013 || Raue || [[Lessons Learned from Quantitative Dynamical Modeling in Systems Biology]]
|-
| 2013 || Dondelinger || [[ODE parameter inference using adaptive gradient matching with Gaussian processes]]
|-
| 2017 || Ballnus || [[Comprehensive benchmarking of Markov chain Monte Carlo methods for dynamical systems]]
|-
| 2017 || Henriques || [[Data-driven reverse engineering of signaling pathways using ensembles of dynamic models]]
|-
| 2017 || Melicher || [[Fast derivatives of likelihood functionals for ODE based models using adjoint-state method]]
|-
| 2017 || Penas || [[Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy]]
|-
| 2017 || Degasperi || [[Performance of objective functions and optimization procedures for parameter estimation in system biology models]]
|-
| 2017 || Fröhlich || [[Scalable Parameter Estimation for Genome-Scale Biochemical Reaction Networks]]
|-
| 2018 || Schälte || [[Evaluation of Derivative-Free Optimizers for Parameter Estimation in Systems Biology]]
|-
| 2018 || Loos || [[Hierarchical optimization for the efficient parametrization of ODE models]]
|-
| 2018 || Stapor || [[Optimization and profile calculation of ODE models using second order adjoint sensitivity analysis]]
|-
| 2019 || Villaverde || [[A comparison of methods for quantifying prediction uncertainty in systems biology]]
|-
| 2019 || Hass || [[Benchmark problems for dynamic modeling of intracellular processes]]
|-
| 2019 || Villaverde || [[Benchmarking optimization methods for parameter estimation in large kinetic models]]
|-
| 2019 || Lines || [[Efficient computation of steady states in large-scale ODE models of biochemical reaction networks]]
|-
| 2019 || Stapor || [[Mini-batch optimization enables training of ODE models on large-scale datasets]]
|-
| 2019 || Wu || [[Parameter Estimation and Variable Selection for Big Systems of Linear Ordinary Differential Equations: A Matrix-Based Approach]]
|-
| 2019 || Pitt || [[Parameter estimation in models of biological oscillators: an automated regularised estimation approach]]
|-
| 2019 || Loos || [[Robust calibration of hierarchical population models for heterogeneous cell populations]]
|-
| 2019 || Clairon || [[Tracking for parameter and state estimation in possibly misspecified partially observed linear Ordinary Differential Equations]]
|-
| 2020 || Schmiester || [[Efficient parameterization of large-scale dynamic models based on relative measurements]]
|-
| 2020 || Castro || [[Testing structural identifiability by a simple scaling method]]
|}

=== Other Studies ===

https://link.springer.com/article/10.1007/s00521-021-06188-z

https://www.diva-portal.org/smash/get/diva2:1568674/FULLTEXT01.pdf

https://www.sciencedirect.com/science/article/pii/S2405471221002076

https://www.tandfonline.com/doi/abs/10.1080/15476286.2021.1940047

https://escholarship.org/content/qt4091n16g/qt4091n16g.pdf

Latest revision as of 11:23, 27 August 2021

Page summary
This page collects the outcomes of benchmarking studies from the literature. The primary aim is to provide a comprehensive overview of neutral benchmark studies, i.e. assessments that were performed independently of the publication of a new approach. Studies that are not neutral are shown in brackets.

The focus is on computational methods for analyzing experimental data from the field of molecular biology (instead of comparing experimental techniques or platforms).

Please extend this list by creating a new page and adding a link below.
Use the guidelines described here.

1 Results from Literature

https://journals.tubitak.gov.tr/biology/issues/biy-21-45-2/biy-45-2-1-2008-8.pdf

1.1 Preprocessing high-throughput data

Year First Author Title
2003 Bolstad A comparison of normalization methods for high density oligonucleotide array data based on variance and bias
2003 Gentzel Preprocessing of tandem mass spectrometric data to support automatic protein identification
2005 Irizarry Comparison of Affymetrix GeneChip Expression Measures
2005 Meleth S The case for well-conducted experiments to validate statistical protocols for 2D gels: different pre-processing = different lists of significant proteins
2005 Freudenberg Comparison of background correction and normalization procedures for high-density oligonucleotide microarrays
2006 Shippy Using RNA sample titrations to assess microarray platform performance and normalization techniques
2006 Wang P Normalization regarding non-random missing values in high-throughput mass spectrometry data
2006 Du P Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching
2007 Carvalho B Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data
2007 Cannataro M MS‐Analyzer: preprocessing and data mining services for proteomics applications on the Grid
2008 Goebels Comparison of preprocessing methods for the hgU133+2 chip from Affymetrix
2009 Autio Comparison of Affymetrix data normalization methods using 6,926 experiments across five array generations
2009 Mar JC Data-driven normalization strategies for high-throughput quantitative RT-PCR
2009 Vakhrushev SY Software platform for high-throughput glycomics
2010 Fan Consistency of predictive signature genes and classifiers generated using different microarray platforms
2010 Li Detecting and correcting systematic variation in large-scale RNA sequencing data
2010 Bullard Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments
2010 Risso Normalization of RNA-seq data using factor analysis of control genes or samples
2010 Armananzas R Peakbin selection in mass spectrometry data using a consensus approach with estimation of distribution algorithms
2011 McCall Affymetrix GeneChip microarray preprocessing for multivariate analyses
2011 Zhang ZM Peak alignment using wavelet pattern matching and differential evolution
2012 Dillies A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis
2013 García-Torres M Comparison of metaheuristic strategies for peakbin selection in proteomic mass spectrometry data
2013 Horvatovich P Bioinformatics and Statistics: LC‐MS (/MS) Data Preprocessing for Biomarker Discovery
2014 Chawade Normalyzer: A Tool for Rapid Evaluation of Normalization Methods for Omics Data Sets
2014 Zhou X Prevention, diagnosis and treatment of high-throughput sequencing data pathologies
2014 Coble JB Comparative evaluation of preprocessing freeware on chromatography/mass spectrometry data for signature discovery
2014 Aggio RB Identifying and quantifying metabolites by scoring peaks of GC-MS data
2014 Cox J Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ
2015 Caraus I Detecting and overcoming systematic bias in high-throughput screening technologies: a comprehensive review of practical issues and methodological solutions
2015 Tam S Optimization of miRNA-seq data preprocessing
2015 Rafiei A Comparison of peak‐picking workflows for untargeted liquid chromatography/high‐resolution mass spectrometry metabolomics data analysis
2015 Chawade A Data processing has major impact on the outcome of quantitative label-free LC-MS analysis
2015 Wang T A systematic study of normalization methods for Infinium 450K methylation data using whole-genome bisulfite sequencing data
2015 Lu J Improved Peak Detection and Deconvolution of Native Electrospray Mass Spectra from Large Protein Complexes
2016 Yi L Chemometric methods in data processing of mass spectrometry-based metabolomics: A review
2016 Tsuji J Evaluation of preprocessing, mapping and postprocessing algorithms for analyzing whole genome bisulfite sequencing data
2016 Li B Performance Evaluation and Online Realization of Data-driven Normalization Methods Used in LC/MS based Untargeted Metabolomics Analysis
2016 Zheng Y An improved algorithm for peak detection in mass spectra based on continuous wavelet transform
2017 Li B NOREVA: normalization and evaluation of MS-based metabolomics data
2018 Mazoure B Identification and Correction of Additive and Multiplicative Spatial Biases in Experimental High-Throughput Screening
2018 Li Z Comprehensive evaluation of untargeted metabolomics data processing software in feature detection, quantification and discriminating marker selection
2018 Willforss J NormalyzerDE: Online Tool for Improved Normalization of Omics Expression Data and High-Sensitivity Differential Expression Analysis


1.2 Imputation methods for missing values

Year First Author Title
1996 Schenker Partially parametric techniques for multiple imputation
1999 Hastie T Imputing Missing Data for Gene Expression Arrays
2001 Troyanskaya Missing value estimation methods for DNA microarrays
2002 Engels J Imputation of missing longitudinal data: a comparison of methods
2003 Oba A Bayesian missing value estimation method for gene expression profile data
2005 Scholz Nonlinear PCA: a missing data approach
2007 Stacklies pcaMethods—a bioconductor package providing PCA methods for incomplete data
2007 Verboven Sequential imputation for missing values
2008 Shaffer GN Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes
2011 Templ Iterative stepwise regression imputation using standard and robust methods
2012 Hrydziuszko O Missing values in mass spectrometry based metabolomics: an undervalued step in the data processing pipeline
2012 Stekhoven MissForest—non-parametric missing value imputation for mixed-type data
2013 Taylor Accounting for undetected compounds in statistical analyses of mass spectrometry ‘omic studies
2013 Waljee Comparison of imputation methods for missing laboratory data in medicine
2014 Shah Comparison of Random Forest and Parametric Imputation Models for Imputing Missing Data Using MICE: A CALIBER Study
2014 Rodwell Comparison of methods for imputing limited-range variables: a simulation study
2014 Morris Tuning multiple imputation by predictive mean matching and local residual draws
2014 Doove L Recursive partitioning for missing data imputation in the presence of interaction effects
2015 Webb-Robertson BJM Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics
2016 Folch-Fortuny A Assessment of maximum likelihood PCA missing data imputation
2016 Lazar C Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies
2016 Yin X Multiple imputation and analysis for high-dimensional incomplete proteomics data
2018 Wei R Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data
2018 Poyatos R Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information
2018 O'Brien JJ The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments
2019 Gunady MK scGAIN: Single Cell RNA-seq Data Imputation using Generative Adversarial Networks
2020 Hou W A systematic evaluation of single-cell RNA-sequencing imputation methods
2020 Zhang L Comparison of Computational Methods for Imputing Single-Cell RNA-Sequencing Data
2021 Steinheuer LM Benchmarking scRNA-seq imputation tools with respect to network inference highlights deficits in performance at high levels of sparsity
2021 Jin L A comparative study of evaluating missing value imputation methods in label-free proteomics

1.3 Selection of Differential Features and Regions

1.3.1 Identifying differential features

Year First Author Title
2006 Guo Rat toxicogenomic study reveals analytical consistency across microarray platforms
2006 Yang The impact of sample imbalance on identifying differentially expressed genes
2010 Su A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium
2014 Ching Power analysis and sample size estimation for RNA-Seq differential expression
2017 van Ooijen Identification of differentially expressed peptides in high-throughput proteomics data
2017 Wang In-depth method assessments of differentially expressed protein detection for shotgun proteomics data with missing values
2017 Wreczycka Strategies for analyzing bisulfite sequencing data
2018 Tran Identification of Differentially Methylated Sites with Weak Methylation Effects
2020 Li Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies

1.3.2 Identifying differential regions (e.g. DMRs)

Year First Author Title
2015 Peters De novo identification of differentially methylated regions in the human genome
2015 Bhasin MethylAction: detecting differentially methylated regions that distinguish biological subtypes
2015 Jühling metilene: Fast and sensitive calling of differentially methylated regions from bisulfite sequencing data
2016 Kolde seqlm: an MDL based method for identifying differentially methylated regions in high density methylation array data
2016 Ayyala Statistical methods for detecting differentially methylated regions based on MethylCap-seq data
2017 Gaspar DMRfinder: efficiently identifying differentially methylated regions from MethylC-seq data
2018 Condon Defiant: (DMRs: easy, fast, identification and ANnoTation) identifies differentially Methylated regions from iron-deficient rat hippocampus
2018 Catoni DMRcaller: a versatile R/Bioconductor package for detection and visualization of differentially methylated regions in CpG and non-CpG contexts
2018 Gong MethCP: Differentially Methylated Region Detection with Change Point Models (bioRxiv)

1.3.3 Identifying sets of features (e.g. gene set analyses)

Year First Author Title
2009 Ackermann A general modular framework for gene set enrichment analysis
2009 Tintle Comparing gene set analysis methods on single-nucleotide polymorphism data from Genetic Analysis Workshop 16
2018 Mathur Gene set analysis methods: a systematic comparison
2020 Geistlinger Toward a gold standard for benchmarking gene set enrichment analysis

1.3.4 Dimension reduction

Year First Author Title
2008 Janecek On the Relationship Between Feature Selection and Classification Accuracy
2015 Fernández-Gutiérrez Comparing feature selection methods for high-dimensional imbalanced data: identifying rheumatoid arthritis cohorts from routine data

1.4 Classification

Year First Author Title
2003 Wu Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data
2005 Bellaachia Predicting Breast Cancer Survivability Using Data Mining Techniques


1.5 Omics Workflows

Year First Author Title
2008 Neuweger H MeltDB: a software platform for the analysis and integration of metabolomics experiment data
2008 Barla A Machine learning methods for predictive proteomics
2009 Xia J MetaboAnalyst: a web server for metabolomic data analysis and interpretation
2013 Weisser H An Automated Pipeline for High-Throughput Label-Free Quantitative Proteomics
2014 Cox J Accurate Proteome-wide Label-free Quantification by Delayed Normalization and Maximal Peptide Ratio Extraction, Termed MaxLFQ
2015 Cleary Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines
2016 Tyanova S The MaxQuant computational platform for mass spectrometry–based shotgun proteomics
2016 Röst HL OpenMS: a flexible open-source software platform for mass spectrometry data analysis
2017 Merino A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies
2018 Välikangas T A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation
2019 Vieth A Systematic Evaluation of Single Cell RNA-Seq Analysis Pipelines
2019 Krishnan Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays
2020 Tang Simultaneous Improvement in the Precision, Accuracy and Robustness of Label-free Proteome Quantification by Optimizing Data Manipulation Chains
2021 Dowell JA Benchmarking Quantitative Performance in Label-Free Proteomics

1.6 ODE-based Modelling

Year First Author Title
2001 Beal Ways to Fit a PK Model with Some Data Below the Quantification Limit
2008 Balsa-Canto Hybrid optimization method with general switching strategy for parameter estimation
2011 Tashkova Parameter estimation with bio-inspired meta-heuristic optimization: modeling the dynamics of endocytosis
2013 Raue Lessons Learned from Quantitative Dynamical Modeling in Systems Biology
2013 Dondelinger ODE parameter inference using adaptive gradient matching with Gaussian processes
2017 Ballnus Comprehensive benchmarking of Markov chain Monte Carlo methods for dynamical systems
2017 Henriques Data-driven reverse engineering of signaling pathways using ensembles of dynamic models
2017 Melicher Fast derivatives of likelihood functionals for ODE based models using adjoint-state method
2017 Penas Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy
2017 Degasperi Performance of objective functions and optimization procedures for parameter estimation in system biology models
2017 Fröhlich Scalable Parameter Estimation for Genome-Scale Biochemical Reaction Networks
2018 Schälte Evaluation of Derivative-Free Optimizers for Parameter Estimation in Systems Biology
2018 Loos Hierarchical optimization for the efficient parametrization of ODE models
2018 Stapor Optimization and profile calculation of ODE models using second order adjoint sensitivity analysis
2019 Villaverde A comparison of methods for quantifying prediction uncertainty in systems biology
2019 Hass Benchmark problems for dynamic modeling of intracellular processes
2019 Villaverde Benchmarking optimization methods for parameter estimation in large kinetic models
2019 Lines Efficient computation of steady states in large-scale ODE models of biochemical reaction networks
2019 Stapor Mini-batch optimization enables training of ODE models on large-scale datasets
2019 Wu Parameter Estimation and Variable Selection for Big Systems of Linear Ordinary Differential Equations: A Matrix-Based Approach
2019 Pitt Parameter estimation in models of biological oscillators: an automated regularised estimation approach
2019 Loos Robust calibration of hierarchical population models for heterogeneous cell populations
2019 Clairon Tracking for parameter and state estimation in possibly misspecified partially observed linear Ordinary Differential Equations
2020 Schmiester Efficient parameterization of large-scale dynamic models based on relative measurements
2020 Castro Testing structural identifiability by a simple scaling method


1.7 Other Studies

https://link.springer.com/article/10.1007/s00521-021-06188-z

https://www.diva-portal.org/smash/get/diva2:1568674/FULLTEXT01.pdf

https://www.sciencedirect.com/science/article/pii/S2405471221002076

https://www.tandfonline.com/doi/abs/10.1080/15476286.2021.1940047

https://escholarship.org/content/qt4091n16g/qt4091n16g.pdf