Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data

Abstract

We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph–based consensus procedure; the preassembled reads are then assembled with off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing, and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
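The abstract says the longest reads serve as seeds but does not state the selection rule on this page. One simple rule, assumed here purely for illustration, is a length cutoff chosen so that the reads at or above it reach a target seed coverage of the genome. The function `seed_length_cutoff` and its `seed_coverage` parameter are hypothetical names, and this is a sketch rather than the authors' implementation.

```python
def seed_length_cutoff(read_lengths, genome_size, seed_coverage):
    """Return the largest cutoff L such that reads of length >= L together
    cover the genome at least seed_coverage-fold.

    seed_coverage is a hypothetical tuning parameter for this sketch.
    """
    if genome_size <= 0 or seed_coverage <= 0:
        raise ValueError("genome_size and seed_coverage must be positive")
    target = seed_coverage * genome_size  # bases the seed reads must sum to
    total = 0
    for length in sorted(read_lengths, reverse=True):  # longest reads first
        total += length
        if total >= target:
            # every read at least this long is a seed; together they meet the target
            return length
    raise ValueError("all reads together fall short of the requested seed coverage")
```

For example, with read lengths of 12, 8, 7, 5, 3 and 1 kb, a 10-kb genome and a 2-fold target, the cutoff is 8 kb, because the two longest reads already sum to 20 kb. Reads below the cutoff are not discarded: in the workflow the abstract describes, all other reads are recruited to the seeds for preassembly.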

Figure 1: Principle of the hierarchical genome-assembly process using long-insert-size DNA shotgun template libraries with SMRT sequencing.
Figure 2: Workflow for the de novo HGAP assembly of E. coli MG1655.
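The preassembly step summarized in the abstract merges alignments against a seed read into a directed acyclic graph and reads a consensus off that graph. The block below is a minimal, simplified sketch of that idea, not the authors' implementation. It assumes the reads (and the seed itself) are already aligned to the seed as gapped pairwise strings (the alignment step is not shown), keys each graph node by backbone position, insertion offset and base, and returns the highest-scoring path. The names `alignment_to_path` and `dag_consensus` and the `weight - min_support` node score are hypothetical choices for illustration.

```python
from collections import defaultdict


def alignment_to_path(backbone_aln, read_aln):
    """Turn one gapped pairwise alignment (read vs. seed backbone) into DAG nodes.

    A node is (backbone_pos, ins_index, base): ins_index is 0 for a base placed in
    a backbone column and 1, 2, ... for read bases inserted after backbone_pos.
    """
    if len(backbone_aln) != len(read_aln):
        raise ValueError("aligned strings must have equal length")
    path, pos, ins = [], -1, 0
    for b, r in zip(backbone_aln, read_aln):
        if b != "-":              # column consumes a backbone base
            pos, ins = pos + 1, 0
            if r != "-":          # match or mismatch; '-' is a deletion (no node)
                path.append((pos, 0, r))
        elif r != "-":            # read base inserted relative to the backbone
            ins += 1
            path.append((pos, ins, r))
    return path


def dag_consensus(alignments, min_support):
    """Highest-scoring-path consensus over a DAG merged from read-to-seed alignments.

    Each node scores (number of reads through it) - min_support, so weakly
    supported bases lower a path's score and are bypassed when other reads skip them.
    """
    weight = defaultdict(int)     # node -> number of reads containing it
    preds = defaultdict(set)      # node -> nodes preceding it in some read
    for backbone_aln, read_aln in alignments:
        path = alignment_to_path(backbone_aln, read_aln)
        for i, node in enumerate(path):
            weight[node] += 1
            if i > 0:
                preds[node].add(path[i - 1])
    if not weight:
        return ""
    best, back = {}, {}
    # (backbone_pos, ins_index) strictly increases along every read path,
    # so sorting the nodes gives a topological order of the DAG.
    for node in sorted(weight):
        score = weight[node] - min_support
        # sorted() keeps tie-breaking deterministic regardless of set ordering
        prev = max(sorted(preds[node]), key=lambda p: best[p], default=None)
        if prev is not None and best[prev] > 0:
            best[node], back[node] = best[prev] + score, prev
        else:                     # starting a new path here scores higher
            best[node], back[node] = score, None
    node = max(best, key=best.get)  # last node of the highest-scoring path
    bases = []
    while node is not None:
        bases.append(node[2])
        node = back[node]
    return "".join(reversed(bases))
```

For example, with the seed `ACGTAC` aligned to itself plus three reads that carry `T` at the fifth position, `dag_consensus([("ACGTAC", "ACGTAC")] + [("ACGTAC", "ACGTTC")] * 3, min_support=2)` returns `"ACGTTC"`, correcting the seed base. In this sketch `min_support` is a single global threshold; a threshold that scales with local coverage would be a natural refinement.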

Accession codes

Primary accessions: Sequence Read Archive

Referenced accessions: NCBI Reference Sequence

Acknowledgements

We thank S. Clingenpeel (Joint Genome Institute) for growing cultures and performing DNA extraction for M. ruber and P. heparinus; B. Munson and F. Antonacci for assistance with the BAC library construction; and K. Travers, S. McCalmon, M. Wang, U. Nguyen, S. Ranade, M. Ashby, L. Hon and L. Hickey (Pacific Biosciences) for assistance in sample preparation, sequencing and data analysis. The authors acknowledge the ATCC for providing the E. coli K-12 MG1655 strain. We thank S. Koren and A. Phillippy for pointing out to us the SMRT sequencing–based gap-filling functionality development in the Celera Assembler. The work conducted by the US Department of Energy Joint Genome Institute is supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231.

Author information

Contributions

C.-S.C., A. Copeland, E.E.E., S.W.T. and J.K. designed the experiments; C.-S.C., D.H.A., P.M., A.A.K., J.D., A. Clum and J.H. analyzed data; C.H. performed the validation sequencing; and C.-S.C., D.H.A., P.M., A.A.K., A. Copeland, E.E.E. and J.K. wrote the manuscript.

Corresponding author

Correspondence to Jonas Korlach.

Ethics declarations

Competing interests

C.-S.C., D.H.A., P.M., A.A.K., J.D., C.H., S.W.T. and J.K. are employees of Pacific Biosciences, a company commercializing DNA sequencing technologies.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15, Supplementary Tables 1–5 and Supplementary Notes 1 and 2 (PDF 3942 kb)

About this article

Cite this article

Chin, CS., Alexander, D., Marks, P. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods 10, 563–569 (2013). https://doi.org/10.1038/nmeth.2474

  • DOI: https://doi.org/10.1038/nmeth.2474
