  • Brief Communication

A synthetic-diploid benchmark for accurate variant-calling evaluation

Abstract

Existing benchmark datasets for evaluating variant-calling accuracy are constructed from a consensus of known short-variant callers and are therefore biased toward easy regions that these algorithms can access. We derived a new benchmark dataset from the de novo PacBio assemblies of two fully homozygous human cell lines, which provides a more accurate and less biased estimate of small-variant-calling error rates in a realistic context.

Fig. 1: Construction of the Syndip benchmark dataset.
Fig. 2: Evaluation of variant-calling accuracy with Syndip.

Acknowledgements

We are grateful to E. Eichler (Department of Genome Sciences, University of Washington, Seattle, WA, USA) for providing DNA from CHM cell lines. We thank A. Carrol for testing PacBio’s new consensus caller, Arrow, and M. DePristo, J. Zook and B. Chapman for helpful suggestions. This study was supported by the US National Institutes of Health (NIH) (grants 5U54DK105566-04 and 5U01HG009088-03 to D.M. and B.N.; grant 1R01HG010040-01 to H.L.).

Author information

Contributions

H.L. conceived the study, constructed the benchmark dataset and drafted the manuscript; H.L., J.M.B. and Y.F. designed the experiment; L.G. and M.F. analyzed the data and applied the benchmark; and D.M. and B.N. supervised the project. All of the authors helped to revise the manuscript.

Corresponding authors

Correspondence to Heng Li, Benjamin Neale or Daniel MacArthur.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Reporting Summary

Supplementary Software

Syndip evaluation scripts and helper scripts used to generate the benchmark dataset

Supplementary Data 1

Numerical data and gnuplot script used to generate Fig. 2

About this article

Cite this article

Li, H., Bloom, J.M., Farjoun, Y. et al. A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat Methods 15, 595–597 (2018). https://doi.org/10.1038/s41592-018-0054-7

  • DOI: https://doi.org/10.1038/s41592-018-0054-7
