Letter | Published: 2018-09-24

A universal SNP and small-indel variant caller using deep neural networks

Nature Biotechnology

Abstract

Despite rapid advances in sequencing technologies, accurately calling genetic variants present in an individual genome from billions of short, errorful sequence reads remains challenging. Here we show that a deep convolutional neural network can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships between images of read pileups around putative variant and true genotype calls. The approach, called DeepVariant, outperforms existing state-of-the-art tools. The learned model generalizes across genome builds and mammalian species, allowing nonhuman sequencing projects to benefit from the wealth of human ground-truth data. We further show that DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, including deep whole genomes from 10X Genomics and Ion Ampliseq exomes, highlighting the benefits of using more automated and generalizable techniques for variant calling.
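The idea of presenting a read pileup around a putative variant as an image can be illustrated with a toy encoding. This is a minimal sketch under stated assumptions, not DeepVariant's actual image format: the function name `pileup_image`, the three channels (base identity, scaled base quality, mismatch flag), the `BASE_CODE` values and the toy reads are all hypothetical choices made for illustration. A convolutional network would take arrays of this kind as input and be trained against true genotype labels; the network itself is not shown.

```python
from typing import List, Tuple

# Hypothetical channel encoding; NOT DeepVariant's actual image format.
BASE_CODE = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}

# A read is (start position, bases, per-base qualities); gapless for simplicity.
Read = Tuple[int, str, List[int]]


def pileup_image(reference: str, ref_start: int, reads: List[Read],
                 center: int, half_width: int, max_reads: int):
    """Return a max_reads x (2*half_width+1) x 3 nested list around `center`.

    Channels: base identity, scaled base quality, mismatch-vs-reference flag.
    Cells not covered by any read stay all-zero.
    """
    width = 2 * half_width + 1
    window_start = center - half_width
    image = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(max_reads)]
    for row, (start, bases, quals) in enumerate(reads[:max_reads]):
        for offset, (base, qual) in enumerate(zip(bases, quals)):
            col = start + offset - window_start
            if not 0 <= col < width:
                continue  # base falls outside the window
            ref_index = start + offset - ref_start
            ref_base = reference[ref_index] if 0 <= ref_index < len(reference) else "N"
            image[row][col] = [
                BASE_CODE.get(base, 0.0),
                min(qual, 40) / 40.0,          # cap and scale quality to [0, 1]
                1.0 if base != ref_base else 0.0,
            ]
    return image


reference = "ACGTACGTAC"  # toy reference covering positions 100..109
reads = [
    (100, "ACGTAC", [30] * 6),   # matches the reference
    (102, "GTATGT", [35] * 6),   # C -> T mismatch at position 105
    (103, "TATGTA", [20] * 6),   # same mismatch
]
image = pileup_image(reference, 100, reads, center=105, half_width=2, max_reads=4)

# Number of reads showing a mismatch in the center column of the window.
mismatch_support = sum(row[2][2] for row in image)
print(mismatch_support)  # 2.0
```

In this toy case two of the three reads carry a non-reference base at the center position, the kind of signal a trained model would weigh together with base quality and the surrounding context when assigning a genotype.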

Acknowledgements

We thank J. Zook and his collaborators at NIST for their work developing the Genome in a Bottle resources, the Verily sequencing facility for running the NA12878 replicates, and our colleagues at Verily and Google for their feedback on this manuscript and the project in general. This work was supported by internal funding.

Author information

Affiliations

  Verily Life Sciences, Mountain View, California, USA.

    • Ryan Poplin
    • Dan Newburger
    • Jojo Dijamco
    • Nam Nguyen
    • Pegah T Afshar
    • Sam S Gross
    • Lizzie Dorfman
    • Cory Y McLean
    • Mark A DePristo

  Google Inc., Mountain View, California, USA.

    • Ryan Poplin
    • Pi-Chuan Chang
    • David Alexander
    • Scott Schwartz
    • Thomas Colthurst
    • Alexander Ku
    • Lizzie Dorfman
    • Cory Y McLean
    • Mark A DePristo

Contributions

R.P. and M.A.D. designed the study, analyzed and interpreted results and wrote the paper. R.P., P.-C.C., D.A., S.S., T.C., A.K., D.N., J.D., N.N., P.T.A., S.S.G., L.D., C.Y.M. and M.A.D. performed experiments and contributed to the software.

Competing interests

D.N., J.D., N.N., P.T.A. and S.S.G. are employees of Verily Life Sciences. P.-C.C., D.A., S.S., T.C. and A.K. are employees of Google Inc. R.P., L.D., C.Y.M. and M.A.D. are employees of Verily Life Sciences and Google Inc. This work was internally funded by Verily Life Sciences and Google Inc.

Corresponding author

Correspondence to Mark A DePristo.

Supplementary information

PDF files

    Supplementary Text and Figures

    Supplementary Figures 1 and 2, Supplementary Tables 1–8 and Supplementary Notes 1–11

    Life Sciences Reporting Summary

    Supplementary Data

    Evaluation metrics

    Supplementary Software

    Benchmarking script

https://doi.org/10.1038/nbt.4235