High-quality genome sequences of uncultured microbes by assembly of read clouds

Bishara, Alex; Moss, Eli L; Kolmogorov, Mikhail; Parada, Alma E; Weng, Ziming; Sidow, Arend; Dekas, Anne E; Batzoglou, Serafim; Bhatt, Ami S

doi:10.1038/nbt.4266

Article
Published: 15 October 2018

Nature Biotechnology volume 36, pages 1067–1075 (2018)

Abstract

Although shotgun metagenomic sequencing of microbiome samples enables partial reconstruction of strain-level community structure, obtaining high-quality microbial genome drafts without isolation and culture remains difficult. Here, we present an application of read clouds, short-read sequences tagged with long-range information, to microbiome samples. We present Athena, a de novo assembler that uses read clouds to improve metagenomic assemblies. We applied this approach to sequence stool samples from two healthy individuals and compared it with existing short-read and synthetic long-read metagenomic sequencing techniques. Read-cloud metagenomic sequencing and Athena assembly produced the most comprehensive individual genome drafts with high contiguity (>200-kb N50, fewer than ten contigs), even for bacteria with relatively low (20×) raw short-read-sequence coverage. We also sequenced a complex marine-sediment sample and generated 24 intermediate-quality genome drafts (>70% complete, <10% contaminated), nine of which were complete (>90% complete, <5% contaminated). Our approach allows for culture-free generation of high-quality microbial genome drafts by using a single shotgun experiment.
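The contiguity and quality cutoffs quoted in the abstract can be illustrated with a minimal Python sketch. It is not part of Athena or of the authors' pipeline: the function names `n50`, `is_high_contiguity`, and `classify_draft` are hypothetical, N50 is computed by its standard definition, and the completeness and contamination percentages are assumed to come from an external estimator. Only the numeric thresholds are taken from the abstract.

```python
def n50(contig_lengths):
    """Standard N50: the length of the contig at which the running total,
    taken from longest to shortest, first reaches half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def is_high_contiguity(contig_lengths):
    """Abstract's contiguity criterion: >200-kb N50 and fewer than ten contigs."""
    return n50(contig_lengths) > 200_000 and len(contig_lengths) < 10


def classify_draft(completeness, contamination):
    """Label a genome draft using the percentage cutoffs quoted in the abstract.

    Both arguments are percentages (0-100) and are assumed to be supplied
    by a separate quality-assessment tool.
    """
    if completeness > 90 and contamination < 5:
        return "complete"
    if completeness > 70 and contamination < 10:
        return "intermediate"
    return "below threshold"


# Hypothetical draft with four contigs (lengths in bp).
draft = [400_000, 300_000, 200_000, 100_000]
print(n50(draft), is_high_contiguity(draft), classify_draft(95.0, 2.0))
```

Note that under these cutoffs every "complete" draft also satisfies the intermediate-quality criteria, which is consistent with the abstract counting the nine complete drafts among the 24 intermediate-quality ones.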

**Figure 1: Overview of the read-cloud shotgun sequencing and assembly approach.**

**Figure 2: Composition of stool microbiome communities from two healthy human participants.**

**Figure 3: Combined genome-draft results of the read-cloud, SLR, and short-read approaches applied to stool samples from healthy humans.**

**Figure 4: Completeness of genome bins produced by read-cloud, SLR, and short-read sequencing for various taxa present in stool samples from healthy humans.**

**Figure 5: Comparisons of representative read-cloud genome drafts to reference genomes, and corresponding short-read and SLR drafts.**

**Figure 6: Comparison of marine-sediment genome drafts generated by read-cloud sequencing with standard short-read versus Athena assembly.**

Accession codes

Primary accessions

Sequence Read Archive

PRJNA380276

Acknowledgements

The authors thank E. Tkachenko for assistance in preparing TruSeq libraries and M. Snyder and members of the laboratory of A.S.B. for helpful feedback. The authors also thank H. Xu at Illumina for sharing read-cloud sequencing data of ATCC 20 for the mock metagenome. This work was supported by NCI K08 CA184420, the Amy Strelzer Manasevit Award from the National Marrow Donor Program, and a Damon Runyon Clinical Investigator Award to A.S.B. E.L.M. was supported by National Science Foundation Graduate Research Fellowship DGE-114747. A.B. was supported by the Stanford Genome Training Program (SGTP; NIH/NHGRI) and a Training Grant of the Joint Initiative for Metrology in Biology (JIMB; NIST). A.E.D. and the marine-sample collection and extraction were supported by National Science Foundation grant OCE-1634297. A.E.P. was supported by a Center for Dark Energy Biosphere Investigations Postdoctoral Fellowship. Access to shared computer resources was supported in part by NIH P30 CA124435 via the Stanford Cancer Institute Shared Resource Genetics Bioinformatics Service Center.

Author information

Alex Bishara and Eli L Moss: These authors contributed equally to this work.

Authors and Affiliations

Department of Computer Science, Stanford University, Stanford, California, USA
Alex Bishara & Serafim Batzoglou
Department of Medicine (Hematology, Blood and Marrow Transplantation) and Department of Genetics, Stanford University, Stanford, California, USA
Alex Bishara, Eli L Moss, Arend Sidow & Ami S Bhatt
Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA
Mikhail Kolmogorov
Department of Earth System Science, Stanford University, Stanford, California, USA
Alma E Parada & Anne E Dekas
Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
Ziming Weng & Arend Sidow

Contributions

A.B., E.L.M., A.S.B. and S.B. conceived the study. Z.W. prepared read-cloud libraries. E.L.M. extracted DNA and prepared SLR sequencing libraries. E.L.M. performed PCR validation and Sanger sequencing. A.B. and S.B. conceived the assembly approach. A.B. implemented the Athena assembler. M.K. modified the Flye assembler for use with Athena. A.E.P. and A.E.D. collected the marine-sediment sample, extracted DNA from the marine-sediment sample and assisted in analysis of these samples. A.B., A.S.B. and E.L.M. carried out all analyses, wrote the manuscript, and generated figures. All authors commented on the manuscript.

Corresponding authors

Correspondence to Serafim Batzoglou or Ami S Bhatt.

Ethics declarations

Competing interests

S.B. is an employee of and owns stock in Illumina. Shotgun sequencing products developed, marketed and/or sold by Illumina were used in this work.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5, Supplementary Tables 1, 3, 4, and Supplementary Notes 1–8 (PDF 5546 kb)

Life Sciences Reporting Summary (PDF 160 kb)

Supplementary Table 2

Per-species MetaQUAST statistics for ATCC 20 (XLSX 40 kb)

Supplementary Table 5

Assignment of metagenomic contigs to bins for healthy gut samples (TXT 1592 kb)

Supplementary Table 6

Per-taxon genome bin statistics for healthy gut samples (CSV 10 kb)

Supplementary Table 7

Comparisons of healthy gut genome bins and available references (XLSX 9 kb)

Supplementary Table 8

Assignment of metagenomic contigs to bins for marine sediment sample (TXT 6609 kb)

Supplementary Table 9

Genome bin statistics for marine sample (CSV 5 kb)

Supplementary Table 10

High-copy repeat statistics for P. copri (XLSX 7 kb)

Supplementary Table 11

Per-species MetaQUAST statistics for in silico downsampled ATCC 20 (XLSX 37 kb)

Supplementary Table 12

Per-taxon genome bin statistics for in silico downsampled human gut sample (CSV 3 kb)

About this article

Cite this article

Bishara, A., Moss, E., Kolmogorov, M. et al. High-quality genome sequences of uncultured microbes by assembly of read clouds. Nat Biotechnol 36, 1067–1075 (2018). https://doi.org/10.1038/nbt.4266

Received: 23 February 2018
Accepted: 28 August 2018
Published: 15 October 2018
Issue Date: November 2018
DOI: https://doi.org/10.1038/nbt.4266
