
Review Article

Challenges and best practices in omics benchmarking

Abstract

Technological advances enabling massively parallel measurement of biological features — such as microarrays, high-throughput sequencing and mass spectrometry — have ushered in the omics era, now in its third decade. The resulting complex landscape of analytical methods has naturally fostered the growth of an omics benchmarking industry. Benchmarking refers to the process of objectively comparing and evaluating the performance of different computational or analytical techniques when processing and analysing large-scale biological data sets, such as those from transcriptomics, proteomics and metabolomics. With thousands of omics benchmarking studies published over the past 25 years, the field has matured to the point where the foundations of benchmarking have been established and well described. However, generating meaningful benchmarking data and properly evaluating performance in this complex domain remain challenging. In this Review, we highlight common oversights and pitfalls in omics benchmarking. We also establish a methodology to bring into focus the issues that can be addressed and to be transparent about those that cannot; this takes the form of a spreadsheet template of guidelines for comprehensive reporting, intended to accompany publications. In addition, we survey recent developments in benchmarking and provide specific guidance for commonly encountered difficulties.
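To make the benchmarking workflow described above concrete, the sketch below is a minimal, hypothetical example (it is not the article's reporting template or any specific published tool): counts are simulated with a known set of truly changed features, two generic off-the-shelf tests are applied, and each is scored against the ground truth by sensitivity and observed false discovery rate. It assumes only NumPy and SciPy; all method choices, parameters and metrics are illustrative.

```python
# Minimal, illustrative benchmarking sketch: simulate data with known ground
# truth, run two generic methods, evaluate each with the same metrics.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_counts(n_features=2000, n_per_group=5, frac_de=0.1, fold_change=3.0):
    """Negative-binomial counts for two groups with a known differential subset."""
    base_mean = rng.gamma(shape=2.0, scale=50.0, size=n_features)
    is_de = rng.random(n_features) < frac_de            # ground truth labels
    mean_a = base_mean
    mean_b = np.where(is_de, base_mean * fold_change, base_mean)
    dispersion = 0.2                                     # variance = mu + dispersion * mu^2

    def draw(mu):
        n = 1.0 / dispersion
        p = n / (n + mu[:, None])
        return rng.negative_binomial(n, p, size=(len(mu), n_per_group))

    return draw(mean_a), draw(mean_b), is_de

def benjamini_hochberg(pvals):
    """BH-adjusted p-values, implemented directly to avoid extra dependencies."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    adj_sorted = p[order] * len(p) / (np.arange(len(p)) + 1)
    adj_sorted = np.minimum.accumulate(adj_sorted[::-1])[::-1]
    adj = np.empty_like(adj_sorted)
    adj[order] = np.clip(adj_sorted, 0, 1)
    return adj

def evaluate(pvals, is_de, alpha=0.05):
    """Sensitivity and observed FDR of calls made at a BH threshold."""
    called = benjamini_hochberg(pvals) < alpha
    tp = np.sum(called & is_de)
    fp = np.sum(called & ~is_de)
    sensitivity = tp / max(int(is_de.sum()), 1)
    fdr = fp / max(int(called.sum()), 1)
    return sensitivity, fdr

counts_a, counts_b, is_de = simulate_counts()
log_a, log_b = np.log1p(counts_a), np.log1p(counts_b)   # crude normalization for illustration

methods = {
    "t-test (log counts)": lambda x, y: stats.ttest_ind(x, y).pvalue,
    "Wilcoxon rank-sum":   lambda x, y: stats.mannwhitneyu(x, y).pvalue,
}

for name, test in methods.items():
    pvals = np.array([test(log_a[i], log_b[i]) for i in range(log_a.shape[0])])
    sens, fdr = evaluate(pvals, is_de)
    print(f"{name:22s}  sensitivity={sens:.2f}  observed FDR={fdr:.2f}")
```

Even this toy comparison illustrates the design decisions a reported benchmark must make explicit: how the synthetic data were generated, which parameters were fixed, how methods were run, and which metrics and thresholds were used for evaluation.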


Fig. 1: Overview of benchmarking report template and benchmarking a pipeline.



Acknowledgements

The authors thank the reviewers and editors for their helpful comments and suggestions. This work was funded by the National Center for Advancing Translational Sciences (grant 5UL1TR000003).

Author information

Contributions

The authors contributed equally to all aspects of the article.

Corresponding author

Correspondence to Gregory R. Grant.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Genetics thanks Yvan Saeys and the other, anonymous, reviewer for their contribution to the peer review of this work.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Open Problems in Single-Cell Analysis: https://openproblems.bio

SpaceTx: https://spacetx.github.io

Updates and domain-specific templates: https://github.com/itmat/OmicsBenchmarkReport

About this article

Cite this article

Brooks, T.G., Lahens, N.F., Mrčela, A. et al. Challenges and best practices in omics benchmarking. Nat Rev Genet 25, 326–339 (2024). https://doi.org/10.1038/s41576-023-00679-6
