High-quality genome sequences of uncultured microbes by assembly of read clouds

Abstract

Although shotgun metagenomic sequencing of microbiome samples enables partial reconstruction of strain-level community structure, obtaining high-quality microbial genome drafts without isolation and culture remains difficult. Here, we apply read clouds, short-read sequences tagged with long-range information, to microbiome samples. We also present Athena, a de novo assembler that uses read clouds to improve metagenomic assemblies. We applied this approach to sequence stool samples from two healthy individuals and compared it with existing short-read and synthetic long-read (SLR) metagenomic sequencing techniques. Read-cloud metagenomic sequencing and Athena assembly produced the most comprehensive individual genome drafts, with high contiguity (>200-kb N50, fewer than ten contigs), even for bacteria with relatively low (20×) raw short-read-sequence coverage. We also sequenced a complex marine-sediment sample and generated 24 intermediate-quality genome drafts (>70% complete, <10% contaminated), nine of which were near-complete (>90% complete, <5% contaminated). Our approach allows culture-free generation of high-quality microbial genome drafts from a single shotgun experiment.
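The abstract judges drafts on two axes: contiguity (N50 and contig count) and quality (estimated completeness and contamination). The Python below is a minimal sketch of how those criteria can be checked, not code from Athena. The `GenomeDraft` class and all helper names are hypothetical, the thresholds are copied from the abstract, and the completeness and contamination percentages are assumed to come from a marker-gene estimator such as CheckM. N50 uses the usual definition: sort contigs from longest to shortest and return the length at which the running total first reaches half of the draft's assembled bases.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Length of the contig at which the cumulative length, summed from the
    longest contig downward, first reaches half of the total assembled length."""
    if not contig_lengths or min(contig_lengths) < 0:
        raise ValueError("contig lengths must be a non-empty list of non-negative ints")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids rounding total / 2
            return length
    raise AssertionError("unreachable: the final contig always reaches the total")


@dataclass
class GenomeDraft:
    """Hypothetical container for one genome bin (draft)."""
    name: str
    completeness: float   # percent; assumed to be a CheckM-style estimate
    contamination: float  # percent; assumed to be a CheckM-style estimate
    contig_lengths: List[int] = field(default_factory=list)


def is_highly_contiguous(draft: GenomeDraft) -> bool:
    # Abstract: ">200-kb N50, fewer than ten contigs".
    return n50(draft.contig_lengths) > 200_000 and len(draft.contig_lengths) < 10


def is_intermediate_quality(draft: GenomeDraft) -> bool:
    # Abstract: ">70% complete, <10% contaminated".
    return draft.completeness > 70 and draft.contamination < 10


def is_near_complete(draft: GenomeDraft) -> bool:
    # Abstract's stricter tier: ">90% complete, <5% contaminated".
    return draft.completeness > 90 and draft.contamination < 5


if __name__ == "__main__":
    # Made-up drafts for illustration only; these are not data from the study.
    drafts = [
        GenomeDraft("bin_a", 95.0, 2.0, [300_000, 250_000, 120_000, 80_000, 50_000]),
        GenomeDraft("bin_b", 75.0, 8.0, [50_000] * 20),
        GenomeDraft("bin_c", 60.0, 1.0, [10_000] * 40),
    ]
    for d in drafts:
        print(d.name, n50(d.contig_lengths), is_highly_contiguous(d),
              is_intermediate_quality(d), is_near_complete(d))
```

The quality checks are deliberately nested rather than exclusive: any draft passing the stricter >90%/<5% test also passes the >70%/<10% test, which matches how the abstract reports the nine stricter-tier drafts as a subset of the 24 intermediate-quality ones.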

Figure 1: Overview of the read-cloud shotgun sequencing and assembly approach.
Figure 2: Composition of stool microbiome communities from two healthy human participants.
Figure 3: Combined genome-draft results of the read-cloud, SLR, and short-read approaches applied to stool samples from healthy humans.
Figure 4: Completeness of genome bins produced by read-cloud, SLR, and short-read sequencing for various taxa present in stool samples from healthy humans.
Figure 5: Comparisons of representative read-cloud genome drafts to reference genomes, and corresponding short-read and SLR drafts.
Figure 6: Comparison of marine-sediment genome drafts generated by read-cloud sequencing with standard short-read versus Athena assembly.

Accession codes

Primary accessions

Sequence Read Archive

Acknowledgements

The authors thank E. Tkachenko for assistance in preparing TruSeq libraries and M. Snyder and members of the laboratory of A.S.B. for helpful feedback. The authors also thank H. Xu at Illumina for sharing read-cloud sequencing data from the ATCC 20 mock metagenome. This work was supported by NCI K08 CA184420, the Amy Strelzer Manasevit Award from the National Marrow Donor Program, and a Damon Runyon Clinical Investigator Award to A.S.B. E.L.M. was supported by National Science Foundation Graduate Research Fellowship DGE-114747. A.B. was supported by the Stanford Genome Training Program (SGTP; NIH/NHGRI) and a Training Grant of the Joint Initiative for Metrology in Biology (JIMB; NIST). A.E.D. and the marine-sample collection and extraction were supported by National Science Foundation grant OCE-1634297. A.E.P. was supported by a Center for Dark Energy Biosphere Investigations Postdoctoral Fellowship. Access to shared computer resources was supported in part by NIH P30 CA124435 via the Stanford Cancer Institute Shared Resource Genetics Bioinformatics Service Center.

Author information

Contributions

A.B., E.L.M., A.S.B. and S.B. conceived the study. Z.W. prepared read-cloud libraries. E.L.M. extracted DNA, prepared SLR sequencing libraries, and performed PCR validation and Sanger sequencing. A.B. and S.B. conceived the assembly approach. A.B. implemented the Athena assembler. M.K. modified the Flye assembler for use with Athena. A.E.P. and A.E.D. collected the marine-sediment sample, extracted its DNA and assisted in its analysis. A.B., A.S.B. and E.L.M. carried out all analyses, wrote the manuscript and generated the figures. All authors commented on the manuscript.

Corresponding authors

Correspondence to Serafim Batzoglou or Ami S. Bhatt.

Ethics declarations

Competing interests

S.B. is an employee of and owns stock in Illumina. Shotgun sequencing products developed, marketed and/or sold by Illumina were used in this work.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5, Supplementary Tables 1, 3, 4, and Supplementary Notes 1–8 (PDF 5546 kb)

Life Sciences Reporting Summary (PDF 160 kb)

Supplementary Table 2

Per-species MetaQUAST statistics for ATCC 20 (XLSX 40 kb)

Supplementary Table 5

Assignment of metagenomic contigs to bins for healthy gut samples (TXT 1592 kb)

Supplementary Table 6

Per-taxon genome bin statistics for healthy gut samples (CSV 10 kb)

Supplementary Table 7

Comparisons of healthy gut genome bins and available references (XLSX 9 kb)

Supplementary Table 8

Assignment of metagenomic contigs to bins for marine sediment sample (TXT 6609 kb)

Supplementary Table 9

Genome bin statistics for marine sample (CSV 5 kb)

Supplementary Table 10

High-copy repeat statistics for P. copri (XLSX 7 kb)

Supplementary Table 11

Per-species MetaQUAST statistics for in silico downsampled ATCC 20 (XLSX 37 kb)

Supplementary Table 12

Per-taxon genome bin statistics for in silico downsampled human gut sample (CSV 3 kb)

About this article

Cite this article

Bishara, A., Moss, E., Kolmogorov, M. et al. High-quality genome sequences of uncultured microbes by assembly of read clouds. Nat Biotechnol 36, 1067–1075 (2018). https://doi.org/10.1038/nbt.4266
