Date: 2017-12-11
Abstract
Shotgun metagenomics methods enable characterization of microbial communities in human microbiome and environmental samples. Assembly of metagenomic sequences generally does not produce complete genomes, so computational binning methods have been developed to cluster sequences into genome 'bins'. These methods exploit sequence composition, species abundance, or chromosome organization, but they cannot fully distinguish closely related species and strains. We present a binning method that incorporates bacterial DNA methylation signatures, which are detected using single-molecule real-time (SMRT) sequencing. Our method takes advantage of these endogenous epigenetic barcodes to resolve individual reads and assembled contigs into species- and strain-level bins. We validate our method using synthetic and real microbiome sequences. In addition to genome binning, we show that our method links plasmids and other mobile genetic elements to their host species in a real microbiome sample. Incorporating DNA methylation information into shotgun metagenomics analyses will complement existing methods and enable more accurate sequence binning.
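The idea of binning contigs by methylation signature can be illustrated with a small sketch. This is not the Mbin implementation; it is a toy under stated assumptions: per-base IPD (interpulse duration) ratios are assumed to be already available for each contig, the two motifs (`GATC`, `CTGCAG`) and their modified-base offsets are illustrative, only the forward strand is scanned, each motif is scored by the mean IPD ratio at its sites, and contigs are grouped by single-linkage clustering on Euclidean distance. All function names, the neutral IPD value, the `min_sites` cutoff, the threshold, and the simulated kinetics are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Illustrative motifs: (motif, 0-based offset of the putatively modified base).
# Hypothetical choice; real motifs would come from motif discovery on SMRT data.
MOTIFS: List[Tuple[str, int]] = [("GATC", 1), ("CTGCAG", 4)]

NEUTRAL_IPD = 1.0  # assumed IPD ratio for an unmodified base (no kinetic delay)


def motif_sites(seq: str, motif: str, offset: int) -> List[int]:
    """Positions of the modified base for every forward-strand motif match."""
    sites, start = [], seq.find(motif)
    while start != -1:
        sites.append(start + offset)
        start = seq.find(motif, start + 1)
    return sites


def methylation_profile(seq: str, ipd: Sequence[float], min_sites: int = 2) -> List[float]:
    """Mean IPD ratio at each motif's sites; NaN when a contig has too few sites."""
    profile = []
    for motif, offset in MOTIFS:
        values = [ipd[p] for p in motif_sites(seq, motif, offset)]
        profile.append(sum(values) / len(values) if len(values) >= min_sites else math.nan)
    return profile


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance over motifs scored in both contigs (inf if none shared)."""
    shared = [(x, y) for x, y in zip(a, b) if not (math.isnan(x) or math.isnan(y))]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def bin_contigs(profiles: Dict[str, List[float]], threshold: float) -> List[List[str]]:
    """Single-linkage clustering: contigs closer than `threshold` share a bin."""
    names = sorted(profiles)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if distance(profiles[a], profiles[b]) < threshold:
                parent[find(a)] = find(b)
    bins: Dict[str, List[str]] = {}
    for n in names:
        bins.setdefault(find(n), []).append(n)
    return sorted(bins.values())


def simulate_ipd(seq: str, motif_ipd: Dict[str, float]) -> List[float]:
    """Toy kinetics: neutral IPD everywhere except modified bases of methylated motifs."""
    ipd = [NEUTRAL_IPD] * len(seq)
    for motif, offset in MOTIFS:
        for p in motif_sites(seq, motif, offset):
            ipd[p] = motif_ipd.get(motif, NEUTRAL_IPD)
    return ipd


if __name__ == "__main__":
    seq = "GATCAACTGCAGTT" * 3  # toy contig with three GATC and three CTGCAG sites
    contigs = {
        "A1": (seq, {"GATC": 3.0}),       # hypothetical species A: GATC methylated
        "A2": (seq, {"GATC": 2.8}),
        "B1": (seq, {"CTGCAG": 2.5}),     # hypothetical species B: CTGCAG methylated
        "B2": (seq, {"CTGCAG": 2.7}),
        "s1": ("GATCAA", {"GATC": 3.0}),  # short contig: too few sites to score
    }
    profiles = {n: methylation_profile(s, simulate_ipd(s, m)) for n, (s, m) in contigs.items()}
    print(bin_contigs(profiles, threshold=0.5))  # [['A1', 'A2'], ['B1', 'B2'], ['s1']]
```

In this toy example the four longer contigs have identical sequences, so a composition-based method could not separate them; only the simulated methylation differs. The short contig `s1` has too few motif sites to be scored and ends up in its own bin, which mirrors the general point that shorter contigs carry less methylation information.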
Accessions
Primary accessions
BioProject
NCBI Reference Sequence
Referenced accessions
GenBank/EMBL/DDBJ
Sequence Read Archive
Acknowledgements
We thank M. Lewis for her assistance in DNA extraction and A. Bashir for his guidance in computational matters. We also thank those who contributed to the generation of the publicly available SMRT sequencing data for the 20-member Mock Community B. This work was funded by R01 GM114472 (G.F.) from the National Institutes of Health and by the Icahn Institute for Genomics and Multiscale Biology. G.F. is a Nash Family Research Scholar. This work was also supported in part through the computational resources and staff expertise provided by the Department of Scientific Computing at the Icahn School of Medicine at Mount Sinai.
Integrated supplementary information
Supplementary figures
1. Binning contigs from 8-species mock community.
2. Shorter contigs contain fewer methylated motif sites.
3. Composition and coverage-based binning methods applied to adult mouse gut microbiome assembly.
4. Infant gut microbiome contigs binned by sequence composition and methylation profiles.
5. CONCOCT bins of the mouse gut microbiome.
6. Heatmaps of methylation profiles for K. pneumoniae.
7. Sequence composition t-SNE map of modified HMP mock community B.
8. 5-mer frequency-based binning of unaligned reads from the modified HMP mock community B.
9. t-SNE map of read-level methylation profiles for two H. pylori strains.
10. Comparison of abundance-matched SMRT vs. synthetic long read (SLR) sequencing coverage.
11. Examples of uneven coverage in SLR.
12. Genomewide coverage of SLR and SMRT reads for all genomes in HMP mock community B.
13. Reference matches for bins identified from methylation profiles in mouse gut microbiome.
14. Modified relative abundances in HMP mock community B.
15. Sequence composition t-SNE map of unmodified HMP mock community B.
Supplementary information
PDF files
1. Supplementary Text and Figures: Supplementary Figures 1–15 and Supplementary Methods
2. Life Sciences Reporting Summary
Zip files
1. Supplementary Tables: Supplementary Tables 1–11
2. Supplementary Code: Mbin software package and relevant scripts