Abstract

Although shotgun metagenomic sequencing of microbiome samples enables partial reconstruction of strain-level community structure, obtaining high-quality microbial genome drafts without isolation and culture remains difficult. Here, we present an application of read clouds, short-read sequences tagged with long-range information, to microbiome samples. We present Athena, a de novo assembler that uses read clouds to improve metagenomic assemblies. We applied this approach to sequence stool samples from two healthy individuals and compared it with existing short-read and synthetic long-read metagenomic sequencing techniques. Read-cloud metagenomic sequencing and Athena assembly produced the most comprehensive individual genome drafts with high contiguity (>200-kb N50, fewer than ten contigs), even for bacteria with relatively low (20×) raw short-read-sequence coverage. We also sequenced a complex marine-sediment sample and generated 24 intermediate-quality genome drafts (>70% complete, <10% contaminated), nine of which were complete (>90% complete, <5% contaminated). Our approach allows for culture-free generation of high-quality microbial genome drafts by using a single shotgun experiment.
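
The contiguity and quality thresholds quoted above can be illustrated with a short sketch. The Python below computes N50 (the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of all assembled bases). It then checks the ">200-kb N50, fewer than ten contigs" criterion and buckets a draft using the completeness and contamination cutoffs stated in the abstract. The function names, tier label strings and example contig lengths are hypothetical and are not taken from Athena. In practice, completeness and contamination would be estimated by a tool such as CheckM, and this sketch ignores the rRNA and tRNA gene requirements that the MIMAG standard adds for its high-quality tier.

```python
from typing import Iterable, List


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the length of the contig at which the running total of lengths,
    taken from longest to shortest, first reaches half of all assembled bases."""
    lengths: List[int] = sorted(contig_lengths, reverse=True)
    if not lengths or any(length <= 0 for length in lengths):
        raise ValueError("expected a non-empty list of positive contig lengths")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:  # walk from the longest contig down
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the running sum always reaches half the total")


def is_highly_contiguous(contig_lengths: List[int],
                         min_n50: int = 200_000, max_contigs: int = 10) -> bool:
    """Check the '>200-kb N50, fewer than ten contigs' criterion from the abstract."""
    return n50(contig_lengths) > min_n50 and len(contig_lengths) < max_contigs


def draft_tier(completeness: float, contamination: float) -> str:
    """Bucket a draft by completeness and contamination, both in percent.

    Hypothetical labels that follow the abstract's wording. The stricter
    tier is tested first because it is a subset of the looser one.
    """
    if completeness > 90 and contamination < 5:
        return "complete"
    if completeness > 70 and contamination < 10:
        return "intermediate"
    return "below threshold"


if __name__ == "__main__":
    # Hypothetical contig lengths (bp) for a single genome bin.
    contigs = [1_200_000, 800_000, 450_000, 300_000, 150_000, 60_000]
    print(n50(contigs))                   # 800000
    print(is_highly_contiguous(contigs))  # True
    print(draft_tier(93.1, 2.4))          # complete
    print(draft_tier(78.0, 6.5))          # intermediate
```

Every draft meeting the >90%/<5% cutoff also meets the >70%/<10% cutoff, which is why the nine complete marine-sediment drafts are counted among the 24 intermediate-quality ones.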

Accessions

Primary accessions

Sequence Read Archive

Acknowledgements

The authors thank E. Tkachenko for assistance in preparing TruSeq libraries and M. Snyder and members of the laboratory of A.S.B. for helpful feedback. The authors also thank H. Xu at Illumina for sharing read-cloud sequencing data of ATCC 20 for the mock metagenome. This work was supported by NCI K08 CA184420, the Amy Strelzer Manasevit Award from the National Marrow Donor Program, and a Damon Runyon Clinical Investigator Award to A.S.B. E.L.M. was supported by National Science Foundation Graduate Research Fellowship DGE-114747. A.B. was supported by the Stanford Genome Training Program (SGTP; NIH/NHGRI) and a Training Grant of the Joint Initiative for Metrology in Biology (JIMB; NIST). A.E.D. and the marine-sample collection and extraction were supported by National Science Foundation grant OCE-1634297. A.E.P. was supported by a Center for Dark Energy Biosphere Investigations Postdoctoral Fellowship. Access to shared computer resources was supported in part by NIH P30 CA124435 via the Stanford Cancer Institute Shared Resource Genetics Bioinformatics Service Center.

Author information

Author notes

Alex Bishara and Eli L Moss contributed equally to this work.

Affiliations

  1. Department of Computer Science, Stanford University, Stanford, California, USA.
     Alex Bishara & Serafim Batzoglou
  2. Department of Medicine (Hematology, Blood and Marrow Transplantation) and Department of Genetics, Stanford University, Stanford, California, USA.
     Alex Bishara, Eli L Moss, Arend Sidow & Ami S Bhatt
  3. Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA.
     Mikhail Kolmogorov
  4. Department of Earth System Science, Stanford University, Stanford, California, USA.
     Alma E Parada & Anne E Dekas
  5. Department of Pathology, Stanford University School of Medicine, Stanford, California, USA.
     Ziming Weng & Arend Sidow

Authors

Alex Bishara, Eli L Moss, Mikhail Kolmogorov, Alma E Parada, Ziming Weng, Arend Sidow, Anne E Dekas, Serafim Batzoglou & Ami S Bhatt

Contributions

A.B., E.L.M., A.S.B. and S.B. conceived the study. Z.W. prepared read-cloud libraries. E.L.M. extracted DNA and prepared SLR sequencing libraries. E.L.M. performed PCR validation and Sanger sequencing. A.B. and S.B. conceived the assembly approach. A.B. implemented the Athena assembler. M.K. modified the Flye assembler for use with Athena. A.E.P. and A.E.D. collected the marine-sediment sample, extracted DNA from the marine-sediment sample and assisted in analysis of these samples. A.B., A.S.B. and E.L.M. carried out all analyses, wrote the manuscript, and generated figures. All authors commented on the manuscript.

Competing interests

S.B. is an employee of and owns stock in Illumina. Shotgun sequencing products developed, marketed and/or sold by Illumina were used in this work.

Corresponding authors

Correspondence to Serafim Batzoglou or Ami S Bhatt.

Supplementary information

PDF files

  1. Supplementary Text and Figures
     Supplementary Figures 1–5, Supplementary Tables 1, 3 and 4, and Supplementary Notes 1–8

  2. Life Sciences Reporting Summary

Excel files

  1. Supplementary Table 2
     Per-species MetaQUAST statistics for ATCC 20

  2. Supplementary Table 7
     Comparisons of healthy gut genome bins and available references

  3. Supplementary Table 10
     High-copy repeat statistics for P. copri

  4. Supplementary Table 11
     Per-species MetaQUAST statistics for in silico downsampled ATCC 20

Text files

  1. Supplementary Table 5
     Assignment of metagenomic contigs to bins for healthy gut samples

  2. Supplementary Table 8
     Assignment of metagenomic contigs to bins for marine-sediment sample

CSV files

  1. Supplementary Table 6
     Per-taxon genome bin statistics for healthy gut samples

  2. Supplementary Table 9
     Genome bin statistics for marine-sediment sample

  3. Supplementary Table 12
     Per-taxon genome bin statistics for in silico downsampled human gut sample

About this article

DOI

https://doi.org/10.1038/nbt.4266