We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assembly that uses only a single long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for the construction of highly accurate preassembled reads through a directed acyclic graph–based consensus procedure; we then assemble these preassembled reads with off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using sequencing data from as few as three SMRT Cell zero-mode waveguide arrays, and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
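The seed-and-consensus preassembly idea summarized above can be illustrated with a small Python sketch. It is not HGAP's implementation. It assumes that reads have already been aligned end to end against a single seed read (by some aligner, not shown) and are supplied as pairs of equal-length gapped strings. Each aligned read base becomes a DAG node keyed by (seed position, insertion index, base), consecutive bases in a read define the edges, and the consensus is the highest-scoring path. The node score used here (read support minus half the coverage) is one simple weighting chosen for illustration. The helper `select_seed_reads` and its `genome_size` and `seed_coverage` parameters are likewise hypothetical: they take the longest reads until their combined length reaches a target coverage.

```python
from collections import defaultdict

GAP = "-"


def select_seed_reads(reads, genome_size, seed_coverage):
    """Hypothetical seed picker: take the longest reads until their combined
    length reaches seed_coverage * genome_size."""
    target = genome_size * seed_coverage
    seeds, total = [], 0
    for read in sorted(reads, key=len, reverse=True):
        if total >= target:
            break
        seeds.append(read)
        total += len(read)
    return seeds


def read_path(seed_aln, read_aln):
    """Turn one gapped alignment against the seed into a list of DAG node keys.

    A key is (seed_pos, ins_idx, base). ins_idx == 0 marks a read base aligned
    to seed position seed_pos; ins_idx >= 1 marks the ins_idx-th base inserted
    after seed_pos (seed_pos == -1 means before the first seed base).
    Bases deleted from the read produce no node.
    """
    path, pos, ins = [], -1, 0
    for s, r in zip(seed_aln, read_aln):
        if s != GAP:              # column consumes one seed base
            pos, ins = pos + 1, 0
            if r != GAP:
                path.append((pos, 0, r))
        elif r != GAP:            # read base inserted relative to the seed
            ins += 1
            path.append((pos, ins, r))
    return path


def dag_consensus(seed, alignments):
    """Highest-scoring-path consensus over a DAG of reads aligned to one seed.

    alignments: (seed_aln, read_aln) gapped-string pairs, each assumed to
    cover the whole seed (a simplification; real reads overlap partially).
    """
    start, end = (-1, 0, "^"), (len(seed), 0, "$")
    support = defaultdict(int)    # node -> number of reads passing through it
    preds = defaultdict(set)      # node -> nodes with an edge into it
    for seed_aln, read_aln in alignments:
        if len(seed_aln) != len(read_aln) or seed_aln.replace(GAP, "") != seed:
            raise ValueError("each alignment must span the full seed")
        path = [start] + read_path(seed_aln, read_aln) + [end]
        for u, v in zip(path, path[1:]):
            preds[v].add(u)
        for node in path[1:-1]:
            support[node] += 1

    half_cov = len(alignments) / 2
    best, back = {start: 0.0}, {}
    # Every edge goes to a strictly larger (seed_pos, ins_idx), so sorted key
    # order is a topological order of the DAG.
    for node in sorted(preds):
        u = max(sorted(preds[node]), key=lambda p: best[p])
        # Illustrative node score: positive for majority-supported nodes,
        # negative for minority ones; the end sentinel scores zero.
        score = 0.0 if node == end else support[node] - half_cov
        best[node] = best[u] + score
        back[node] = u

    bases, node = [], back[end]
    while node != start:
        bases.append(node[2])
        node = back[node]
    return "".join(reversed(bases))


if __name__ == "__main__":
    seed = "GATACA"  # toy seed missing one T
    alns = [("GAT-ACA", "GATTACA")] * 3 + [
        ("GATACA", "GATACA"),
        ("GAT-ACA", "GATTCCA"),   # carries the T plus an A-to-C mismatch
    ]
    print(dag_consensus(seed, alns))  # prints GATTACA
```

In the toy run, four of five reads carry a T that is absent from the seed, so the highest-scoring path takes the inserted T, while the single read with an A-to-C mismatch is outvoted. Scoring nodes relative to half the coverage, rather than summing raw edge counts, keeps a path from being rewarded merely for passing through more nodes.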
We thank S. Clingenpeel (Joint Genome Institute) for growing cultures and performing DNA extraction for M. ruber and P. heparinus; B. Munson and F. Antonacci for assistance with the BAC library construction; and K. Travers, S. McCalmon, M. Wang, U. Nguyen, S. Ranade, M. Ashby, L. Hon and L. Hickey (Pacific Biosciences) for assistance in sample preparation, sequencing and data analysis. The authors acknowledge the ATCC for providing the E. coli K-12 MG1655 strain. We thank S. Koren and A. Phillippy for pointing out to us the development of SMRT sequencing–based gap-filling functionality in the Celera Assembler. The work conducted by the US Department of Energy Joint Genome Institute is supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231.
C.-S.C., D.H.A., P.M., A.A.K., J.D., C.H., S.W.T. and J.K. are employees of Pacific Biosciences, a company commercializing DNA sequencing technologies.
About this article
Cite this article
Chin, CS., Alexander, D., Marks, P. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods 10, 563–569 (2013). https://doi.org/10.1038/nmeth.2474