Abstract

Shotgun metagenomics methods enable characterization of microbial communities in human microbiome and environmental samples. Assembly of metagenome sequences does not output whole genomes, so computational binning methods have been developed to cluster sequences into genome 'bins'. These methods exploit sequence composition, species abundance, or chromosome organization but cannot fully distinguish closely related species and strains. We present a binning method that incorporates bacterial DNA methylation signatures, which are detected using single-molecule real-time sequencing. Our method takes advantage of these endogenous epigenetic barcodes to resolve individual reads and assembled contigs into species- and strain-level bins. We validate our method using synthetic and real microbiome sequences. In addition to genome binning, we show that our method links plasmids and other mobile genetic elements to their host species in a real microbiome sample. Incorporation of DNA methylation information into shotgun metagenomics analyses will complement existing methods to enable more accurate sequence binning.
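To make the idea of a methylation "barcode" concrete, the following is a minimal sketch, not the authors' implementation. It assumes each sequence has already been given a per-motif methylation score (the fraction of motif sites that appear methylated). The motif list, the score values, the 0.5 threshold, and the function names `barcode` and `bin_by_barcode` are all hypothetical and chosen only for illustration. Sequences whose binarized motif profiles match are placed in the same bin, which is also how a plasmid could be tentatively linked to a host that shares its profile.

```python
from collections import defaultdict

# Hypothetical motifs and per-sequence methylation scores (invented values).
# Each score is the assumed fraction of motif sites that look methylated.
MOTIFS = ["GATC", "CCWGG", "GANTC"]

contig_scores = {
    "contig_1": [0.95, 0.03, 0.02],
    "contig_2": [0.91, 0.05, 0.04],
    "contig_3": [0.04, 0.88, 0.90],
    "plasmid_A": [0.02, 0.93, 0.85],
    "contig_4": [0.05, 0.02, 0.03],
}


def barcode(scores, threshold=0.5):
    """Binarize motif scores into a tuple of methylated/unmethylated flags."""
    return tuple(score >= threshold for score in scores)


def bin_by_barcode(scores_by_contig, threshold=0.5):
    """Group sequences whose binarized methylation profiles are identical."""
    bins = defaultdict(list)
    for name, scores in scores_by_contig.items():
        bins[barcode(scores, threshold)].append(name)
    return dict(bins)


bins = bin_by_barcode(contig_scores)
for profile, members in bins.items():
    # List the motifs flagged as methylated for this bin.
    methylated = [m for m, on in zip(MOTIFS, profile) if on] or ["none"]
    print(",".join(methylated), "->", sorted(members))
```

Exact matching of binarized profiles is the simplest possible grouping rule. Real data are noisy and coverage-dependent, so a practical analysis would more likely cluster the continuous scores rather than rely on a hard threshold.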

Acknowledgements

We thank M. Lewis for her assistance with DNA extraction and A. Bashir for his guidance on computational matters. We also thank those who contributed to the generation of the publicly available SMRT sequencing data for the 20-member Mock Community B. This work was funded by R01 GM114472 (G.F.) from the National Institutes of Health and by the Icahn Institute for Genomics and Multiscale Biology. G.F. is a Nash Family Research Scholar. This work was also supported in part through the computational resources and staff expertise provided by the Department of Scientific Computing at the Icahn School of Medicine at Mount Sinai.

Author information

Affiliations

  1. Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

    • John Beaulaurier, Shijia Zhu, Gintaras Deikus, Ilaria Mogno, Jeremiah J Faith, Robert Sebra, Eric E Schadt & Gang Fang

  2. Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

    • John Beaulaurier, Shijia Zhu, Gintaras Deikus, Ilaria Mogno, Jeremiah J Faith, Robert Sebra, Eric E Schadt & Gang Fang

  3. Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

    • Ilaria Mogno & Jeremiah J Faith

  4. Department of Medicine, New York University School of Medicine, New York, New York, USA.

    • Xue-Song Zhang

  5. Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, Florida, USA.

    • Austin Davis-Richardson, Ronald Canepa & Eric W Triplett

  6. Sema4, a Mount Sinai venture, Stamford, Connecticut, USA.

    • Robert Sebra & Eric E Schadt

Authors

  1. John Beaulaurier

  2. Shijia Zhu

  3. Gintaras Deikus

  4. Ilaria Mogno

  5. Xue-Song Zhang

  6. Austin Davis-Richardson

  7. Ronald Canepa

  8. Eric W Triplett

  9. Jeremiah J Faith

  10. Robert Sebra

  11. Eric E Schadt

  12. Gang Fang

Contributions

J.B. and G.F. designed the methods. J.B. developed the software package for all the proposed computational analyses. J.B., E.W.T., J.J.F., R.S., E.E.S. and G.F. contributed to experimental design. I.M., X.-S.Z., A.D.-R., R.C., E.W.T. and J.J.F. conducted the experiments. G.D. and R.S. designed and conducted sequencing. J.B., S.Z., E.W.T., J.J.F., R.S., E.E.S. and G.F. analyzed the data. J.B. and G.F. wrote the manuscript with input and comments from all co-authors. G.F. conceived and supervised the project.

Competing interests

E.E.S. is on the scientific advisory board of Pacific Biosciences. J.B. and G.F. are inventors on a US provisional patent application (No. 62/525,908) that describes the method for methylation binning.

Corresponding author

Correspondence to Gang Fang.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–15 and Supplementary Methods

  2. Life Sciences Reporting Summary

Zip files

  1. Supplementary Tables

    Supplementary Tables 1–11

  2. Supplementary Code

    Mbin software package and relevant scripts

About this article

DOI

https://doi.org/10.1038/nbt.4037