A hybrid approach for the automated finishing of bacterial genomes

Abstract

Advances in DNA sequencing technology have improved our ability to characterize most genomic diversity. However, accurate resolution of large structural events is challenging because of the short read lengths of second-generation technologies. Third-generation sequencing technologies, which can yield longer multikilobase reads, have the potential to address limitations associated with genome assembly. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at >99.9% accuracy. Complex regions with clinically relevant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 cholera reference strain, we obtained 14 scaffolds of greater than 1 kb for the experimental data and 8 scaffolds of greater than 1 kb for the simulated data, which allowed us to correct several errors in contigs assembled from the short-read data alone. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly.
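A toy calculation makes the abstract's point about read length concrete. The sketch below assumes an idealized, error-free 50-kb genome that carries a 5-kb repeat (roughly the size of a bacterial rRNA operon) at a hypothetical position. Reads of a fixed length are tiled every 100 bp, and the sketch counts how many of them cover the whole repeat plus one unique flanking base on each side, the minimum a read needs to place that repeat copy unambiguously. The function names and parameters are illustrative assumptions, not the authors' hybrid-assembly method.

```python
# Toy illustration (hypothetical helpers, not the authors' pipeline): a repeat
# copy can be placed unambiguously only by a read that covers the entire repeat
# plus at least one base of unique flanking sequence on each side.


def spans_repeat(read_start: int, read_len: int, repeat_start: int, repeat_len: int) -> bool:
    """True if read [read_start, read_start + read_len) covers the repeat
    [repeat_start, repeat_start + repeat_len) plus one flanking base per side."""
    read_end = read_start + read_len        # exclusive end of the read
    repeat_end = repeat_start + repeat_len  # exclusive end of the repeat
    # Left flank base is repeat_start - 1; right flank base is repeat_end.
    return read_start <= repeat_start - 1 and read_end >= repeat_end + 1


def count_spanning_reads(read_len: int, genome_len: int, step: int,
                         repeat_start: int, repeat_len: int) -> int:
    """Tile fixed-length reads across the genome every `step` bases and
    count how many of them span the repeat."""
    return sum(
        spans_repeat(start, read_len, repeat_start, repeat_len)
        for start in range(0, genome_len - read_len + 1, step)
    )


if __name__ == "__main__":
    GENOME_LEN = 50_000    # assumed toy genome length (bp)
    REPEAT_START = 20_000  # assumed position of one repeat copy
    REPEAT_LEN = 5_000     # assumed repeat length, about one rRNA operon
    for read_len in (100, 3_000, 8_000):  # short-read vs long-read scale
        n = count_spanning_reads(read_len, GENOME_LEN, 100, REPEAT_START, REPEAT_LEN)
        print(f"read length {read_len:>5} bp: {n} spanning reads")
```

With these toy parameters, 100-bp and 3-kb reads never span the repeat, while 8-kb reads do (29 spanning reads). No read shorter than the repeat length plus two bases can span it, however deep the coverage. Real assemblers also have to handle sequencing errors and uneven coverage, which this sketch ignores.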

Figure 1: H1 assembly.
Figure 2: Resolution of rRNA genes.
Figure 3: CTX-TLC assembly and validation.
Figure 4: Superintegron assembly.

Accession codes

Primary accessions

BioProject

NCBI Reference Sequence

Sequence Read Archive

Referenced accessions

NCBI Reference Sequence


Acknowledgements

This study was supported in part by the US National Institutes of Health National Institute of General Medical Sciences grant R01GM068851 (J.J.M. and W.P.R.), NIH R37 AI-42347 (B.M.D. and M.K.W.) and the Howard Hughes Medical Institute (B.M.D. and M.K.W.).

Author information

Contributions

A.B., A.A.K., W.P.R., C.S.C., E.P., M.F., C.L.T., M.T., B.M.D., A.K., J.J.M., M.K.W. and E.E.S. designed the experiments; A.B., A.A.K., C.S.C., D.W., J.S. and J.B. designed the methods; W.P.R., E.P., D.H., M.A., S.W., P.P., R.S., J.Y., M.V., E.M., K.L., S.L., B.L., A.J., L.R., M.F., C.L.T., M.T. and B.M.D. carried out all sample-preparation experiments, sequencing runs and PCR-validation experiments; A.B., A.A.K., W.P.R., C.S.C., D.W., J.B., M.K.W. and E.E.S. jointly analyzed the data sets; and A.B., A.A.K., W.P.R., L.R., M.F., C.L.T., M.T., B.M.D., J.J.M., M.K.W. and E.E.S. wrote the manuscript.

Corresponding author

Correspondence to Eric E Schadt.

Ethics declarations

Competing interests

Many of the authors are employees of and own stock in Pacific Biosciences.

Supplementary information

Supplementary Text and Figures

Supplementary Methods, Supplementary Results, Supplementary Tables 1–9 and Supplementary Figs. 1–13 (PDF 2063 kb)

About this article

Cite this article

Bashir, A., Klammer, A., Robins, W. et al. A hybrid approach for the automated finishing of bacterial genomes. Nat Biotechnol 30, 701–707 (2012). https://doi.org/10.1038/nbt.2288
