Abstract

A fundamental requirement for genetic studies is an accurate determination of sequence variation. While human genome sequence diversity is increasingly well characterized, there is a need for efficient ways to use this knowledge in sequence analysis. Here we present Graphtyper, a novel, publicly available algorithm and software package for discovering and genotyping sequence variants. Graphtyper realigns short-read sequence data to a pangenome, a variation-aware graph structure that encodes sequence variation within a population by representing possible haplotypes as graph paths. Our results show that Graphtyper is fast and highly scalable and that it provides sensitive and accurate genotype calls. Graphtyper genotyped 89.4 million sequence variants in the whole genomes of 28,075 Icelanders using fewer than 100,000 CPU days, including detailed genotyping of six human leukocyte antigen (HLA) genes. We show that Graphtyper is a valuable tool for characterizing sequence variation in both small and population-scale sequencing studies.
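
The idea of representing possible haplotypes as graph paths can be illustrated with a toy example. The sketch below is not Graphtyper's data structure or alignment algorithm; the names `TOY_GRAPH`, `haplotype_paths`, and `paths_containing` are hypothetical, and the graph is assumed to be a simple ordered list of sites, each offering one or more alternative alleles. Every path picks one allele per site, and an exact substring test stands in for read alignment.

```python
from itertools import product

# Hypothetical toy graph (an assumption for illustration only): an ordered
# list of sites, each listing the allele sequences a path may pass through.
TOY_GRAPH = [
    ["ACG"],       # invariant segment
    ["T", "C"],    # SNP site: allele T or allele C
    ["GA"],        # invariant segment
    ["", "TT"],    # indel site: no insertion, or a TT insertion
]


def haplotype_paths(graph):
    """Yield the sequence spelled by every path through the graph."""
    for choice in product(*graph):
        yield "".join(choice)


def paths_containing(graph, read):
    """Return the haplotypes that contain the read as an exact substring."""
    return [hap for hap in haplotype_paths(graph) if read in hap]


if __name__ == "__main__":
    # Four haplotypes: two SNP alleles times two indel alleles.
    print(list(haplotype_paths(TOY_GRAPH)))
    # Only haplotypes carrying the C allele are consistent with this read.
    print(paths_containing(TOY_GRAPH, "CGCGA"))
```

Enumerating all paths is only feasible for a toy graph, because the number of paths grows exponentially with the number of variant sites. That growth is why a practical tool needs an indexed graph and a dedicated alignment strategy rather than brute-force enumeration.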

Acknowledgements

We are grateful to our colleagues from deCODE Genetics/Amgen for their contributions. We also wish to thank all research participants who provided biological samples to deCODE Genetics.

Author information

Affiliations

  1. deCODE Genetics/Amgen, Inc., Reykjavik, Iceland.

    • Hannes P Eggertsson
    • Hakon Jonsson
    • Snaedis Kristmundsdottir
    • Eirikur Hjartarson
    • Birte Kehr
    • Gisli Masson
    • Florian Zink
    • Kristjan E Hjorleifsson
    • Aslaug Jonasdottir
    • Adalbjorg Jonasdottir
    • Ingileif Jonsdottir
    • Daniel F Gudbjartsson
    • Pall Melsted
    • Kari Stefansson
    • Bjarni V Halldorsson
  2. School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland.

    • Hannes P Eggertsson
    • Daniel F Gudbjartsson
    • Pall Melsted
  3. School of Science and Engineering, Reykjavik University, Reykjavik, Iceland.

    • Snaedis Kristmundsdottir
    • Bjarni V Halldorsson
  4. Berlin Institute of Health (BIH), Berlin, Germany.

    • Birte Kehr
  5. Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland.

    • Ingileif Jonsdottir
    • Kari Stefansson

Contributions

H.P.E. implemented the Graphtyper software. H.P.E., P.M., and B.V.H. designed the Graphtyper algorithm. H.P.E., D.F.G., P.M., B.V.H., and K.S. designed the experiments. H.P.E., E.H., G.M., and F.Z. ran all evaluated genotypers. H.P.E., H.J., and K.E.H. analyzed the call sets. Aslaug Jonasdottir, Adalbjorg Jonasdottir, and I.J. were responsible for PCR validation. H.J. and S.K. contributed software for the project. H.P.E. wrote the initial version of the manuscript, and H.J., S.K., B.K., P.M., B.V.H., and K.S. contributed to subsequent versions. All authors reviewed and approved the final version of the manuscript.

Competing interests

All authors are employees of deCODE Genetics/Amgen, Inc.

Corresponding authors

Correspondence to Hannes P Eggertsson or Bjarni V Halldorsson.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–6, Supplementary Tables 1–3 and Supplementary Note

  2. Life Sciences Reporting Summary