Literature Studies

Page summary
This page collects the outcomes of benchmarking studies from the literature. The primary aim is a comprehensive overview of neutral benchmark studies, i.e. assessments that were performed independently of the publication of a new approach. Studies that are not neutral are put in brackets. The focus is on computational methods for analyzing experimental data from the molecular biology field (rather than on comparing experimental techniques or platforms). Please extend this list by creating a new page and adding a link below. Use the guidelines described here.

1 Results from Literature

https://journals.tubitak.gov.tr/biology/issues/biy-21-45-2/biy-45-2-1-2008-8.pdf

1.1 Preprocessing high-throughput data

Year	First Author	Title
2003	Bolstad	A comparison of normalization methods for high density oligonucleotide array data based on variance and bias
2003	Gentzel	Preprocessing of tandem mass spectrometric data to support automatic protein identification
2005	Irizarry	Comparison of Affymetrix GeneChip Expression Measures
2005	Meleth S	The case for well-conducted experiments to validate statistical protocols for 2D gels: different pre-processing = different lists of significant proteins
2005	Freudenberg	Comparison of background correction and normalization procedures for high-density oligonucleotide microarrays
2006	Shippy	Using RNA sample titrations to assess microarray platform performance and normalization techniques
2006	Wang P	Normalization regarding non-random missing values in high-throughput mass spectrometry data
2006	Du P	Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching
2007	Carvalho B	Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data
2007	Cannataro M	MS‐Analyzer: preprocessing and data mining services for proteomics applications on the Grid
2008	Goebels	Comparison of preprocessing methods for the hgU133+2 chip from Affymetrix
2009	Autio	Comparison of Affymetrix data normalization methods using 6,926 experiments across five array generations
2009	Mar JC	Data-driven normalization strategies for high-throughput quantitative RT-PCR
2009	Vakhrushev SY	Software platform for high-throughput glycomics
2010	Fan	Consistency of predictive signature genes and classifiers generated using different microarray platforms
2010	Li	Detecting and correcting systematic variation in large-scale RNA sequencing data
2010	Bullard	Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments
2010	Risso	Normalization of RNA-seq data using factor analysis of control genes or samples
2010	Armananzas R	Peakbin selection in mass spectrometry data using a consensus approach with estimation of distribution algorithms
2011	McCall	Affymetrix GeneChip microarray preprocessing for multivariate analyses
2011	Zhang ZM	Peak alignment using wavelet pattern matching and differential evolution
2012	Dillies	A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis
2013	García-Torres M	Comparison of metaheuristic strategies for peakbin selection in proteomic mass spectrometry data
2013	Horvatovich P	Bioinformatics and Statistics: LC‐MS (/MS) Data Preprocessing for Biomarker Discovery
2014	Chawade	Normalyzer: A Tool for Rapid Evaluation of Normalization Methods for Omics Data Sets
2014	Zhou X	Prevention, diagnosis and treatment of high-throughput sequencing data pathologies
2014	Coble JB	Comparative evaluation of preprocessing freeware on chromatography/mass spectrometry data for signature discovery
2014	Aggio RB	Identifying and quantifying metabolites by scoring peaks of GC-MS data
2014	Cox J	Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ
2015	Caraus I	Detecting and overcoming systematic bias in high-throughput screening technologies: a comprehensive review of practical issues and methodological solutions
2015	Tam S	Optimization of miRNA-seq data preprocessing
2015	Rafiei A	Comparison of peak‐picking workflows for untargeted liquid chromatography/high‐resolution mass spectrometry metabolomics data analysis
2015	Chawade A	Data processing has major impact on the outcome of quantitative label-free LC-MS analysis
2015	Wang T	A systematic study of normalization methods for Infinium 450K methylation data using whole-genome bisulfite sequencing data
2015	Lu J	Improved Peak Detection and Deconvolution of Native Electrospray Mass Spectra from Large Protein Complexes
2016	Yi L	Chemometric methods in data processing of mass spectrometry-based metabolomics: A review
2016	Tsuji J	Evaluation of preprocessing, mapping and postprocessing algorithms for analyzing whole genome bisulfite sequencing data
2016	Li B	Performance Evaluation and Online Realization of Data-driven Normalization Methods Used in LC/MS based Untargeted Metabolomics Analysis
2016	Zheng Y	An improved algorithm for peak detection in mass spectra based on continuous wavelet transform
2017	Li B	NOREVA: normalization and evaluation of MS-based metabolomics data
2018	Mazoure B	Identification and Correction of Additive and Multiplicative Spatial Biases in Experimental High-Throughput Screening
2018	Li Z	Comprehensive evaluation of untargeted metabolomics data processing software in feature detection, quantification and discriminating marker selection
2018	Willforss J	NormalyzerDE: Online Tool for Improved Normalization of Omics Expression Data and High-Sensitivity Differential Expression Analysis

1.2 Imputation methods for missing values

Year	First Author	Title
1996	Schenker	Partially parametric techniques for multiple imputation
1999	Hastie T	Imputing Missing Data for Gene Expression Arrays
2001	Troyanskaya	Missing value estimation methods for DNA microarrays
2002	Engels J	Imputation of missing longitudinal data: a comparison of methods
2003	Oba	A Bayesian missing value estimation method for gene expression profile data
2005	Scholz	Nonlinear PCA: a missing data approach
2007	Stacklies	pcaMethods—a bioconductor package providing PCA methods for incomplete data
2007	Verboven	Sequential imputation for missing values
2008	Shaffer GN	Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes
2011	Templ	Iterative stepwise regression imputation using standard and robust methods
2012	Hrydziuszko O	Missing values in mass spectrometry based metabolomics: an undervalued step in the data processing pipeline
2012	Stekhoven	MissForest—non-parametric missing value imputation for mixed-type data
2013	Taylor	Accounting for undetected compounds in statistical analyses of mass spectrometry ‘omic studies
2013	Waljee	Comparison of imputation methods for missing laboratory data in medicine
2014	Shah	Comparison of Random Forest and Parametric Imputation Models for Imputing Missing Data Using MICE: A CALIBER Study
2014	Rodwell	Comparison of methods for imputing limited-range variables: a simulation study
2014	Morris	Tuning multiple imputation by predictive mean matching and local residual draws
2014	Doove L	Recursive partitioning for missing data imputation in the presence of interaction effects
2015	Webb-Robertson BJM	Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics
2016	Folch-Fortuny A	Assessment of maximum likelihood PCA missing data imputation
2016	Lazar C	Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies
2016	Yin X	Multiple imputation and analysis for high-dimensional incomplete proteomics data
2018	Wei R	Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data
2018	Poyatos R	Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information
2018	O'Brien JJ	The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments
2019	Gunady MK	scGAIN: Single Cell RNA-seq Data Imputation using Generative Adversarial Networks
2020	Hou W	A systematic evaluation of single-cell RNA-sequencing imputation methods
2020	Zhang L	Comparison of Computational Methods for Imputing Single-Cell RNA-Sequencing Data
2021	Steinheuer LM	Benchmarking scRNA-seq imputation tools with respect to network inference highlights deficits in performance at high levels of sparsity
2021	Jin L	A comparative study of evaluating missing value imputation methods in label-free proteomics

1.3 Selection of Differential Features and Regions

1.3.1 Identifying differential features

Year	First Author	Title
2006	Guo	Rat toxicogenomic study reveals analytical consistency across microarray platforms
2006	Yang	The impact of sample imbalance on identifying differentially expressed genes
2014	Su	A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium
2014	Ching	Power analysis and sample size estimation for RNA-Seq differential expression
2017	van Ooijen	Identification of differentially expressed peptides in high-throughput proteomics data
2017	Wang	In-depth method assessments of differentially expressed protein detection for shotgun proteomics data with missing values
2017	Wreczycka	Strategies for analyzing bisulfite sequencing data
2018	Tran	Identification of Differentially Methylated Sites with Weak Methylation Effects
2020	Li	Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies
2021	Das	A Comprehensive Survey of Statistical Approaches for Differential Expression Analysis in Single-Cell RNA Sequencing Studies

1.3.2 Identifying differential regions (e.g. DMRs)

Year	First Author	Title
2015	Peters	De novo identification of differentially methylated regions in the human genome
2015	Bhasin	MethylAction: detecting differentially methylated regions that distinguish biological subtypes
2015	Jühling	metilene: Fast and sensitive calling of differentially methylated regions from bisulfite sequencing data
2016	Kolde	seqlm: an MDL based method for identifying differentially methylated regions in high density methylation array data
2016	Ayyala	Statistical methods for detecting differentially methylated regions based on MethylCap-seq data
2017	Gaspar	DMRfinder: efficiently identifying differentially methylated regions from MethylC-seq data
2018	Condon	Defiant: (DMRs: easy, fast, identification and ANnoTation) identifies differentially Methylated regions from iron-deficient rat hippocampus
2018	Catoni	DMRcaller: a versatile R/Bioconductor package for detection and visualization of differentially methylated regions in CpG and non-CpG contexts
2018	Gong	MethCP: Differentially Methylated Region Detection with Change Point Models (bioRxiv)

1.3.3 Identifying sets of features (e.g. gene set analyses)

Year	First Author	Title
2009	Ackermann	A general modular framework for gene set enrichment analysis
2009	Tintle	Comparing gene set analysis methods on single-nucleotide polymorphism data from Genetic Analysis Workshop 16
2018	Mathur	Gene set analysis methods: a systematic comparison
2020	Geistlinger	Toward a gold standard for benchmarking gene set enrichment analysis

1.3.4 Dimension reduction

Year	First Author	Title
2008	Janecek	On the Relationship Between Feature Selection and Classification Accuracy
2015	Fernández-Gutiérrez	Comparing feature selection methods for high-dimensional imbalanced data: identifying rheumatoid arthritis cohorts from routine data

1.4 Classification

Year	First Author	Title
2003	Wu	Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data
2005	Bellaachia	Predicting Breast Cancer Survivability Using Data Mining Techniques

1.5 Omics Workflows

Year	First Author	Title
2008	Neuweger H	MeltDB: a software platform for the analysis and integration of metabolomics experiment data
2008	Barla A	Machine learning methods for predictive proteomics
2009	Xia J	MetaboAnalyst: a web server for metabolomic data analysis and interpretation
2013	Weisser H	An Automated Pipeline for High-Throughput Label-Free Quantitative Proteomics
2014	Cox J	Accurate Proteome-wide Label-free Quantification by Delayed Normalization and Maximal Peptide Ratio Extraction, Termed MaxLFQ
2015	Cleary	Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines
2016	Tyanova S	The MaxQuant computational platform for mass spectrometry–based shotgun proteomics
2016	Röst HL	OpenMS: a flexible open-source software platform for mass spectrometry data analysis
2017	Merino	A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies
2018	Välikangas T	A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation
2019	Vieth	A Systematic Evaluation of Single Cell RNA-Seq Analysis Pipelines
2019	Krishnan	Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays
2020	Tang	Simultaneous Improvement in the Precision, Accuracy and Robustness of Label-free Proteome Quantification by Optimizing Data Manipulation Chains
2021	Dowell JA	Benchmarking Quantitative Performance in Label-Free Proteomics

1.6 Microbiome & Metagenomics

Year	First Author	Title
2016	D’Amore R	A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling
2016	Bokulich N	mockrobiota: a public resource for microbiome bioinformatics benchmarking
2017	McIntyre AB	Comprehensive benchmarking and ensemble approaches for metagenomic classifiers
2018	Nearing JT	Denoising the Denoisers: an independent evaluation of microbiome sequence error-correction approaches
2019	Ye S	Benchmarking Metagenomics Tools for Taxonomic Classification
2020	Wang XW	Comparative study of classifiers for human microbiome data
2020	Calgaro M	Assessment of statistical methods from single cell, bulk RNA-seq, and metagenomics applied to microbiome data
2020	Seppey M	LEMMI: a continuous benchmarking platform for metagenomics classifiers
2021	Kubinski R	Benchmark of data processing methods and machine learning models for gut microbiome-based diagnosis of inflammatory bowel disease
2021	Lloréns-Rico V	Benchmarking microbiome transformations favors experimental quantitative approaches to address compositionality and sampling depth biases
2021	Andreu-Sánchez S	A benchmark of genetic variant calling pipelines using metagenomic short-read sequencing
2021	Cho H	Distribution-based comprehensive evaluation of methods for differential expression analysis in metatranscriptomics
2021	Parks DH	Evaluation of the microba community profiler for taxonomic profiling of metagenomic datasets from the human gut microbiome
2021	Dixit K	Benchmarking of 16S rRNA gene databases using known strain sequences
2021	Khomich M	Analysing microbiome intervention design studies: Comparison of alternative multivariate statistical methods
2022	Nearing J	Microbiome differential abundance methods produce different results across 38 datasets
2022	Briscoe L	Evaluating supervised and unsupervised background noise correction in human gut microbiome data
2024	Marić J	Comparative analysis of metagenomic classifiers for long-read sequencing datasets

1.7 Single Cell Omics

Year	First Author	Title	Link
2023	Alaqueeli	Evaluating the Performance of the Generalized Linear Model (glm) R Package Using Single-Cell RNA-Sequencing Data	https://www.mdpi.com/2076-3417/13/20/11512

1.8 ODE-based Modelling

Year	First Author	Title
2001	Beal	Ways to Fit a PK Model with Some Data Below the Quantification Limit
2008	Balsa-Canto	Hybrid optimization method with general switching strategy for parameter estimation
2011	Tashkova	Parameter estimation with bio-inspired meta-heuristic optimization: modeling the dynamics of endocytosis
2013	Raue	Lessons Learned from Quantitative Dynamical Modeling in Systems Biology
2013	Dondelinger	ODE parameter inference using adaptive gradient matching with Gaussian processes
2017	Ballnus	Comprehensive benchmarking of Markov chain Monte Carlo methods for dynamical systems
2017	Henriques	Data-driven reverse engineering of signaling pathways using ensembles of dynamic models
2017	Melicher	Fast derivatives of likelihood functionals for ODE based models using adjoint-state method
2017	Penas	Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy
2017	Degasperi	Performance of objective functions and optimization procedures for parameter estimation in system biology models
2017	Fröhlich	Scalable Parameter Estimation for Genome-Scale Biochemical Reaction Networks
2018	Schälte	Evaluation of Derivative-Free Optimizers for Parameter Estimation in Systems Biology
2018	Loos	Hierarchical optimization for the efficient parametrization of ODE models
2018	Stapor	Optimization and profile calculation of ODE models using second order adjoint sensitivity analysis
2019	Villaverde	A comparison of methods for quantifying prediction uncertainty in systems biology
2019	Hass	Benchmark problems for dynamic modeling of intracellular processes
2019	Villaverde	Benchmarking optimization methods for parameter estimation in large kinetic models
2019	Lines	Efficient computation of steady states in large-scale ODE models of biochemical reaction networks
2019	Stapor	Mini-batch optimization enables training of ODE models on large-scale datasets
2019	Wu	Parameter Estimation and Variable Selection for Big Systems of Linear Ordinary Differential Equations: A Matrix-Based Approach
2019	Pitt	Parameter estimation in models of biological oscillators: an automated regularised estimation approach
2019	Loos	Robust calibration of hierarchical population models for heterogeneous cell populations
2019	Clairon	Tracking for parameter and state estimation in possibly misspecified partially observed linear Ordinary Differential Equations
2020	Schmiester	Efficient parameterization of large-scale dynamic models based on relative measurements
2020	Castro	Testing structural identifiability by a simple scaling method
2023	Loman	Catalyst: Fast and flexible modeling of reaction networks

1.9 AI & Deep Learning

Year	First Author	Title	Link
2023	Template Author	Template Title	https://a.template.link

1.10 Other Studies

https://link.springer.com/article/10.1007/s00521-021-06188-z

https://www.diva-portal.org/smash/get/diva2:1568674/FULLTEXT01.pdf

https://www.sciencedirect.com/science/article/pii/S2405471221002076

https://www.tandfonline.com/doi/abs/10.1080/15476286.2021.1940047

https://escholarship.org/content/qt4091n16g/qt4091n16g.pdf
