A hybrid approach for the automated finishing of bacterial genomes

Abstract

Advances in DNA sequencing technology have improved our ability to characterize most genomic diversity. However, accurate resolution of large structural events is challenging because of the short read lengths of second-generation technologies. Third-generation sequencing technologies, which can yield longer multikilobase reads, have the potential to address limitations associated with genome assembly. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at >99.9% accuracy. Complex regions with clinically relevant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 cholera reference strain, we obtained 14 scaffolds of greater than 1 kb for the experimental data and 8 scaffolds of greater than 1 kb for the simulated data, which allowed us to correct several errors in contigs assembled from the short-read data alone. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly.

Figure 1: H1 assembly.
Figure 2: Resolution of rRNA genes.
Figure 3: CTX-TLC assembly and validation.
Figure 4: Superintegron assembly.

Accession codes

Primary accessions: BioProject; NCBI Reference Sequence; Sequence Read Archive

Referenced accessions: NCBI Reference Sequence

Acknowledgements

This study was supported in part by the US National Institutes of Health National Institute of General Medical Sciences grant R01GM068851 (J.J.M. and W.P.R.), NIH R37 AI-42347 (B.M.D. and M.K.W.) and the Howard Hughes Medical Institute (B.M.D. and M.K.W.).

Author information

A.B., A.A.K., W.P.R., C.S.C., E.P., M.F., C.L.T., M.T., B.M.D., A.K., J.J.M., M.K.W. and E.E.S. designed the experiments; A.B., A.A.K., C.S.C., D.W., J.S. and J.B. designed the methods; W.P.R., E.P., D.H., M.A., S.W., P.P., R.S., J.Y., M.V., E.M., K.L., S.L., B.L., A.J., L.R., M.F., C.L.T., M.T. and B.M.D. carried out all sample-preparation experiments, all sequencing runs and PCR-validation experiments; A.B., A.A.K., W.P.R., C.S.C., D.W., J.B., M.K.W. and E.E.S. jointly analyzed the data sets; and A.B., A.A.K., W.P.R., L.R., M.F., C.L.T., M.T., B.M.D., J.J.M., M.K.W. and E.E.S. wrote the manuscript.

Correspondence to Eric E Schadt.

Ethics declarations

Competing interests

Many of the authors are employees of and own stock in Pacific Biosciences.

Supplementary information

Supplementary Text and Figures

Supplementary Methods, Supplementary Results, Supplementary Tables 1–9 and Supplementary Figs. 1–13 (PDF 2063 kb)

About this article

Cite this article

Bashir, A., Klammer, A., Robins, W. et al. A hybrid approach for the automated finishing of bacterial genomes. Nat Biotechnol 30, 701–707 (2012) doi:10.1038/nbt.2288
