Discovering multiple types of DNA methylation from bacteria and microbiome using nanopore sequencing

Tourancheau, Alan; Mead, Edward A.; Zhang, Xue-Song; Fang, Gang

doi:10.1038/s41592-021-01109-3

Article
Published: 05 April 2021


Nature Methods volume 18, pages 491–498 (2021)


Abstract

Bacterial DNA methylation occurs at diverse sequence contexts and plays important functional roles in cellular defense and gene regulation. Existing methods for detecting DNA modification from nanopore sequencing data do not effectively support de novo study of unknown bacterial methylomes. In this work, we observed that a nanopore sequencing signal displays complex heterogeneity across methylation events of the same type. To enable nanopore sequencing for broadly applicable methylation discovery, we generated a training dataset from an assortment of bacterial species and developed a method, named nanodisco (https://github.com/fanglab/nanodisco), that couples the identification and fine mapping of the three forms of methylation into a multi-label classification framework. We applied it to individual bacteria and the mouse gut microbiome for reliable methylation discovery. In addition, we demonstrated the use of DNA methylation for binning metagenomic contigs, associating mobile genetic elements with their host genomes and identifying misassembled metagenomic contigs.


**Fig. 1: Schematics for the method design and applications.**

**Fig. 2: Systematic examination of three main types of DNA methylation with nanopore sequencing.**

**Fig. 3: Local sequence context effect on motif signatures.**

**Fig. 4: Classification and fine mapping of three types of DNA methylation.**

**Fig. 5: Methylation analysis of mouse gut microbiome samples.**


Data availability

All sequencing data generated for this study are available at the Sequence Read Archive under the BioProjects PRJNA559199 for individual bacteria and PRJNA559386 for the mouse gut microbiome samples. NCBI reference sequences used for the individual bacteria analysis are available under the accession codes CP041693, CP041696, NC_008261.1, CP014225.1, CP023448.1, NC_007796.1, NC_002946.2, CP041695 and CP003732.1 (Supplementary Table 1). Information related to methylation motifs is available from the REBASE database (http://rebase.neb.com)¹⁶. Data from the SMRT sequencing metagenomic study can be found under the BioProject PRJNA404082.

Code availability

The nanodisco software and a detailed tutorial with supporting data are available at http://github.com/fanglab/nanodisco.

References

1. Beaulaurier, J., Schadt, E. E. & Fang, G. Deciphering bacterial epigenomes using modern sequencing technologies. Nat. Rev. Genet. 20, 157–172 (2019).
2. Flusberg, B. A. et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods 7, 461–465 (2010).
3. Blow, M. J. et al. The epigenomic landscape of prokaryotes. PLoS Genet. 12, e1005854 (2016).
4. Laszlo, A. H. et al. Detection and mapping of 5-methylcytosine and 5-hydroxymethylcytosine with nanopore MspA. Proc. Natl Acad. Sci. USA 110, 18904–18909 (2013).
5. Schreiber, J. et al. Error rates for nanopore discrimination among cytosine, methylcytosine, and hydroxymethylcytosine along individual DNA strands. Proc. Natl Acad. Sci. USA 110, 18910–18915 (2013).
6. Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods 14, 407–410 (2017).
7. Rand, A. C. et al. Mapping DNA methylation with high-throughput nanopore sequencing. Nat. Methods 14, 411–413 (2017).
8. McIntyre, A. B. R. et al. Single-molecule sequencing detection of N6-methyladenine in microbial reference materials. Nat. Commun. 10, 579 (2019).
9. Ni, P. et al. DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning. Bioinformatics 35, 4586–4595 (2019).
10. Liu, Q. et al. Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data. Nat. Commun. 10, 2449 (2019).
11. Liu, Q., Georgieva, D. C., Egli, D. & Wang, K. NanoMod: a computational tool to detect DNA modifications using Nanopore long-read sequencing data. BMC Genomics 20, 78 (2019).
12. Stoiber, M. et al. De novo identification of DNA modifications enabled by genome-guided nanopore signal processing. Preprint at bioRxiv https://doi.org/10.1101/094672 (2017).
13. Amarasinghe, S. L. et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21, 30 (2020).
14. Wion, D. & Casadesus, J. N6-methyl-adenine: an epigenetic signal for DNA-protein interactions. Nat. Rev. Microbiol. 4, 183–192 (2006).
15. Casadesus, J. & Low, D. Epigenetic gene regulation in the bacterial world. Microbiol. Mol. Biol. Rev. 70, 830–856 (2006).
16. Roberts, R. J., Vincze, T., Posfai, J. & Macelis, D. REBASE–a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res. 43, D298–D299 (2015).
17. Van Der Maaten, L. Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 15, 3221–3245 (2014).
18. Bailey, T. L., Johnson, J., Grant, C. E. & Noble, W. S. The MEME Suite. Nucleic Acids Res. 43, W39–W49 (2015).
19. Saeed, I., Tang, S. L. & Halgamuge, S. K. Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition. Nucleic Acids Res. 40, e34 (2012).
20. Iverson, V. et al. Untangling genomes from metagenomes: revealing an uncultured class of marine Euryarchaeota. Science 335, 587–590 (2012).
21. Laczny, C. C., Pinel, N., Vlassis, N. & Wilmes, P. Alignment-free visualization of metagenomic data by nonlinear dimension reduction. Sci. Rep. 4, 4516 (2014).
22. Laczny, C. C. et al. VizBin—an application for reference-independent visualization and human-augmented binning of metagenomic data. Microbiome 3, 1 (2015).
23. Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013).
24. Sharon, I. et al. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res. 23, 111–120 (2013).
25. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
26. Nielsen, H. B. et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat. Biotechnol. 32, 822–828 (2014).
27. Marbouty, M. et al. Metagenomic chromosome conformation capture (meta3C) unveils the diversity of chromosome organization in microorganisms. eLife 3, e03318 (2014).
28. Burton, J. N., Liachko, I., Dunham, M. J. & Shendure, J. Species-level deconvolution of metagenome assemblies with Hi-C-based contact probability maps. G3 4, 1339–1346 (2014).
29. Marbouty, M., Baudry, L., Cournac, A. & Koszul, R. Scaffolding bacterial genomes and probing host-virus interactions in gut microbiome by proximity ligation (chromosome capture) assay. Sci. Adv. 3, e1602105 (2017).
30. Beaulaurier, J. et al. Metagenomic binning and association of plasmids with bacterial host genomes using DNA methylation. Nat. Biotechnol. 36, 61–69 (2018).
31. Fang, G. et al. Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing. Nat. Biotechnol. 30, 1232–1239 (2012).
32. Murray, I. A. et al. The methylomes of six bacteria. Nucleic Acids Res. 40, 11450–11462 (2012).
33. Schadt, E. E. et al. Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases. Genome Res. 23, 129–141 (2013).
34. Beaulaurier, J. et al. Single molecule-level detection and long read-based phasing of epigenetic variations in bacterial methylomes. Nat. Commun. 6, 7438 (2015).
35. Song, C. X., Yi, C. & He, C. Mapping recently identified nucleotide variants in the genome and transcriptome. Nat. Biotechnol. 30, 1107–1116 (2012).
36. Yoshihara, M., Jiang, L., Akatsuka, S., Suyama, M. & Toyokuni, S. Genome-wide profiling of 8-oxoguanine reveals its association with spatial positioning in nucleus. DNA Res. 21, 603–612 (2014).
37. Li, S. & Mason, C. E. The pivotal regulatory landscape of RNA modifications. Annu. Rev. Genomics Hum. Genet. 15, 127–150 (2014).
38. Roundtree, I. A., Evans, M. E., Pan, T. & He, C. Dynamic RNA modifications in gene expression regulation. Cell 169, 1187–1200 (2017).
39. Garalde, D. R. et al. Highly parallel direct RNA sequencing on an array of nanopores. Nat. Methods 15, 201–206 (2018).
40. Workman, R. E. et al. Nanopore native RNA sequencing of a human poly(A) transcriptome. Nat. Methods 16, 1297–1305 (2019).
41. Yang, C., Chu, J., Warren, R. L. & Birol, I. NanoSim: nanopore sequence read simulator based on statistical characterization. Gigascience 6, 1–6 (2017).
42. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
43. Morgan, M., Pagès, H., Obenchain, V. & Hayden, N. Rsamtools: binary alignment (BAM), FASTA, variant call (BCF), and tabix file import v.3.12 (Bioconductor, 2016).
44. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
45. Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
46. Vaser, R., Sovic, I., Nagarajan, N. & Sikic, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).
47. Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
48. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
49. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).


Acknowledgements

We thank A. Fomenkov and R. J. Roberts from NEB for their help with the bacterial strain selection and for providing us with DNA samples (B. amyloliquefaciens, B. fusiformis and N. otitidiscaviarum). We also thank R. Gunsalus from the University of California, Los Angeles (M. hungatei), S. Logan from the National Research Council Canada (C. perfringens), L. Jackson from the University of Oklahoma Health Sciences Center (N. gonorrhoeae) and B. Schink, N. Müller and A. Keller from the University of Konstanz, Germany (T. phaeum) for providing us with DNA samples. We thank Y. Kong and M. Ni for providing helpful feedback for early versions of this paper. This work was supported by a seed fund from Icahn Institute for Genomics and Multiscale Biology (G.F.) and by grant nos. R01 GM128955 (G.F.), R35 GM139655 (G.F.) and R56 HG011095 (G.F.) from the National Institutes of Health. G.F. is a Hirschl Research Scholar by Irma T. Hirschl/Monique Weill-Caulier Trust, and a Nash Family Research Scholar. This work was also supported in part through the computational resources and staff expertise provided by the Department of Scientific Computing at the Icahn School of Medicine at Mount Sinai.

Author information

Authors and Affiliations

Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Alan Tourancheau, Edward A. Mead & Gang Fang
Center for Advanced Biotechnology and Medicine, Rutgers University, New Brunswick, NJ, USA
Xue-Song Zhang


Contributions

G.F. conceived and supervised the project. A.T. and G.F. designed the methods. A.T. developed the software package for all the proposed computational analyses. A.T., E.A.M. and X-S.Z. conducted the experiments. A.T. and G.F. analyzed the data and wrote the paper with inputs and comments from all coauthors.

Corresponding author

Correspondence to Gang Fang.

Ethics declarations

Competing interests

A.T. and G.F. are inventors of two US Provisional patent applications (62/860,952 and 62/851,205) that describe the methods in this paper.

Additional information

Peer reviewer information Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Lei Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 General statistics of motif signatures.

(a) Distributions of current differences are shown for all confident motifs together (n = 46 motifs), along with average absolute differences and associated standard deviations near methylated bases ([−10 bp, +11 bp]). The lower and upper hinges correspond to the 25th and 75th percentiles, while the lower and upper whiskers extend to the minima and maxima, respectively (capped at 1.5 times the interquartile range). (b) Same as a, with distinction between DNA methylation types (n = 28 6mA motifs, n = 7 4mC motifs, n = 11 5mC motifs). (c) Same as a but for individual methylation motifs.
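The hinge and whisker convention in this caption can be made concrete with a short sketch. It is only an illustration: the function name `boxplot_stats` is hypothetical, and the quartile interpolation rule (Python's `inclusive` method) is an assumption, since plotting libraries differ slightly in how they compute percentiles.

```python
import statistics


def boxplot_stats(values):
    """Return hinges and whiskers, with whiskers capped at 1.5 x IQR.

    Assumption: quartiles use the 'inclusive' interpolation rule; the
    plotting software behind the figure may use a different rule.
    """
    data = sorted(values)
    # Lower hinge (25th percentile), median, upper hinge (75th percentile).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme observations that stay inside the fences.
    lower_whisker = min(v for v in data if v >= lower_fence)
    upper_whisker = max(v for v in data if v <= upper_fence)
    return {
        "lower_whisker": lower_whisker,
        "lower_hinge": q1,
        "median": median,
        "upper_hinge": q3,
        "upper_whisker": upper_whisker,
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 100])
```

In this toy input the value 100 lies beyond the upper fence, so the upper whisker stops at 5 and 100 would be drawn as an outlier.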

Extended Data Fig. 2 Systematic examination of three main DNA methylation types with nanopore sequencing.

(a) t-SNE projection of isolated methylation motif occurrences separated per motif. The same dataset as Fig. 2b was used with occurrences colored per motif. Other motifs are colored in gray. (b) Same as a but grouped by methylation type.

Extended Data Fig. 3 Nanopore sequencing signal processing variable.

(a) Comparison of current differences across methylation occurrences between datasets base called with Albacore 1.1.0, Albacore 2.3.4, and Guppy 3.2.4, illustrated by projection with t-SNE for 46 well-characterized motifs (Supplementary Table 2). Each dot represents one isolated motif occurrence colored by base caller version. 100,000 motif occurrences were randomly selected from each dataset to reduce the scatter plot density and ease the visualization. For each motif occurrence, current differences from 22 positions near methylated bases ([−10 bp, +11 bp]) were used. (b) Performance for de novo methylated site detection between datasets base called with Albacore 1.1.0, Albacore 2.3.4, and Guppy 3.2.4. We evaluated individual motif occurrence detection using Precision-Recall curves for H. pylori at 75x coverage. Precision-Recall curves and areas under the curves (AUC) were computed as described in the Methods section. Only confident H. pylori motifs were considered for the evaluation. (c) Comparison of current differences across methylation occurrences (same as a) between datasets produced with or without the outlier removal step (Methods). (d) Performance for de novo methylated site detection (similar to b) with datasets produced with or without the outlier removal step. (e) Variation of current differences across methylation occurrences without the outlier removal step, as illustrated by motif signatures from three motifs, AG4mCT (n = 6550 occurrences), GGW5mCC (n = 1875 occurrences), and GCYYG6mAT (n = 954 occurrences). For each motif, current differences near methylated bases ([−6 bp, +7 bp]) from all isolated occurrences are plotted with conservation of relative distances to methylated bases. Distributions of current differences for each relative distance are displayed as a violin plot. The current differences axis is limited to the −8 to 8 pA range. (f) Performance for de novo methylated site detection across current difference datasets generated with different read alignment type filtering: remove alternative alignments (filtered out XA bam flags; named No Alt.), remove supplementary alignments (filtered out 2048 bam flags; named No Supp.), remove chimeric alignments (filtered out SA bam flags; named No Chim.), only conserve unique mapping (filtered out XA and SA bam flags; named Unique), and keep all alignments (named None). (g) Performance for de novo methylated site detection across datasets normalized with linear regression (lm function), robust regression (rlm function) or no additional normalization (annotated as none). (h) Performance for de novo methylated site detection across datasets generated using a two-sided Mann-Whitney U-test or Student’s t-test. (i) Performance for de novo methylated site detection across datasets generated using different p-value smoothing window sizes: no smoothing (named None), 3 nt, 5 nt, and 7 nt. (j) Performance for de novo methylated site detection across datasets generated using different functions for combining consecutive p-values: Fisher’s method (named sumlog), logit method (named logitp), sum p method (named sump), and sum z method (named sumz). (k) Performance for de novo methylated site detection across peak datasets generated using different peak detection window sizes: 5 nt, 7 nt, and 9 nt. Plots f, g, h, i, j, and k show Precision-Recall curves and areas under the curves (AUC) for various signal processing steps and were computed as described in the Methods section. (l) Comparison of current differences across methylation occurrences (same as a) with E. coli datasets (200x) produced using either the reference genome or the de novo assembly (Methods). (m) Performance for de novo methylated site detection in E. coli datasets (200x) using either the reference genome or the de novo assembly. (n) Performance of methylation motif typing and fine mapping on E. coli datasets (200x) produced using either the reference genome or the de novo assembly (motif occurrences: n = 458 for AACNNNNNNGTGC, n = 18451 for CCWGG, n = 28110 for GATC, n = 463 for GCACNNNNNNGTT). Only results for k-nearest neighbors, neural network, and random forest are displayed.
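Panels i and j of this caption refer to smoothing per-position p-values over a window and combining consecutive p-values with Fisher's method. The sketch below illustrates that idea in plain Python under stated assumptions: the function names, the centered window, and the truncation of windows at sequence ends are hypothetical choices, not the nanodisco implementation. Fisher's method also assumes independent tests, which neighboring positions only approximate.

```python
import math


def fisher_combine(pvalues):
    """Combine p-values with Fisher's method (sum of logs).

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose survival function has a closed form
    for even degrees of freedom.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    # Half of the Fisher statistic; clamp p to avoid log(0).
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    # sf(x; 2k) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def smooth_pvalues(pvalues, window=5):
    """Combine p-values in a centered sliding window (hypothetical helper).

    Assumption: windows are truncated at both ends of the sequence.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(pvalues)):
        lo = max(0, i - half)
        hi = min(len(pvalues), i + half + 1)
        smoothed.append(fisher_combine(pvalues[lo:hi]))
    return smoothed


# Toy track: a run of small p-values surrounded by unremarkable ones.
track = [0.6, 0.4, 0.01, 0.02, 0.01, 0.5, 0.7]
combined = smooth_pvalues(track, window=5)
```

A run of moderately small p-values yields a much smaller combined value at its center, which is why windowed combination can sharpen the signal around a methylated base.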

Extended Data Fig. 4 Additional information for classification of methylation motif occurrences.

(a) Approximation of DNA methylation position in three motifs, AG4mCT (n = 6549 occurrences), GGW5mCC (n = 1875 occurrences), and GCYYG6mAT (n = 954 occurrences). Signal strength is computed using a sliding window along the motif signature to choose the best vector positioning to use for classification. (b) Flowchart of the procedure for classifier training and annotation of novel motif datasets. Training the classifier consists of gathering a set of bacteria with characterized methylomes. Confident motifs are selected to ensure the robustness of the final classifier, then all motif occurrences are localized in the genome (from the corresponding reference genome or the de novo assembled and polished genome). Current differences are then computed along the genome. Next, the training dataset is built from offset vectors of current differences, each labelled with the known methylation type and the offset combination. Finally, the classifier is trained using the chosen model(s). Analyzing a new bacterial sample consists of detecting the methylated motifs de novo from processed current differences (see Methods). Then methylated motif occurrences are localized and the motif signatures are computed (that is, the distribution of current differences at relative distances from the methylated bases). Next, those signatures are leveraged to approximate the methylated position for each de novo detected motif (see Methods), which is used to define the classifier inputs (that is, a vector of current differences centered on the approximate methylated position). Finally, the trained classifier is used to predict the methylation type and fine map the DNA methylation for each motif. (c) Boxplot of overall prediction accuracy in LOOCV evaluation (n = 46 motifs) for each classifier. Classifiers are ordered by average accuracy. The lower and upper hinges correspond to the 25th and 75th percentiles, while the lower and upper whiskers extend to the minima and maxima, respectively (capped at 1.5 times the interquartile range). (d) Effect of hyperparameters on classification accuracy. Boxplot of overall prediction accuracy in LOOCV evaluation with classifiers trained on all motifs except the ones from H. pylori (n = 27 motifs). Hyperparameters were either tuned on H. pylori motifs only (“Alt. HP”) or on all motifs (“Main HP”). The lower and upper hinges correspond to the 25th and 75th percentiles, while the lower and upper whiskers extend to the minima and maxima, respectively (capped at 1.5 times the interquartile range). (e) Relationship between LOOCV accuracy and current difference signal similarities. Current difference signal near methylated bases is visualized by projection with t-SNE for the 46 well-characterized motifs, similar to Fig. 2b. Each dot represents one isolated motif occurrence colored by accuracy from LOOCV analysis.
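Panel b describes building training examples from offset vectors of current differences labelled with methylation type and offset. A minimal sketch of that labelling step follows. The function name, the offset range of −2 to +2, and the label format are hypothetical; only the 22-position window ([−10 bp, +11 bp]) is taken from the Extended Data Fig. 3 caption.

```python
def offset_training_vectors(current_diffs, methylated_pos, meth_type,
                            offsets=range(-2, 3), upstream=10, downstream=11):
    """Build (vector, label) pairs for one methylated site.

    Each vector is centered `offset` positions away from the true
    methylated base, so a classifier trained on these labels can
    predict both the methylation type and where the base sits
    relative to the window center. The offset range is an assumption.
    """
    examples = []
    for offset in offsets:
        center = methylated_pos + offset
        start = center - upstream
        end = center + downstream + 1  # slice end is exclusive
        if start < 0 or end > len(current_diffs):
            continue  # skip windows that run past the sequence ends
        vector = current_diffs[start:end]
        label = f"{meth_type}_{offset:+d}"  # e.g. "6mA_+1" (hypothetical format)
        examples.append((vector, label))
    return examples


# Toy current-difference track with a methylated base at index 20.
track = [float(i) for i in range(40)]
examples = offset_training_vectors(track, methylated_pos=20, meth_type="6mA")
```

Treating each (type, offset) pair as its own class is one simple way to fold typing and fine mapping into a single classification problem, in the spirit of the framework the abstract describes.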

Extended Data Fig. 5 Classification and fine mapping of three types of DNA methylation (part 1).

Similar to Fig. 4d, with the full set of prediction results for a subset of methylation motifs for k-nearest neighbors, random forest, and neural network. Filling colors correspond to the percentage of occurrences classified to a specific class, ranging from blue (0%) to red (100%). Greyed-out predictions correspond to out-of-motif positions. Blank columns correspond to within-motif positions without prediction. Prediction percentages of expected classes are displayed in italic and selected predictions based on consensus are displayed in bold.

Extended Data Fig. 6 Classification and fine mapping of three types of DNA methylation (part 2).

See Extended Data Fig. 5.

Extended Data Fig. 7 Evaluation of motif enrichment with Precision-Recall curves.

(a) Effect of coverage on de novo methylated site detection. We evaluated individual motif occurrence detection using Precision-Recall curves (PR curves) for H. pylori. Studied datasets with coverage ranging from 5x to 200x were generated by random subsampling of native and WGA datasets. Precision-Recall curves were generated as described in the Methods section. We considered only confident H. pylori motifs for evaluation. (b) Same as a but using ROC curves for representation. Motif occurrences without data due to low coverage (<5x) were not considered. (c) Performance of methylation motif typing and fine mapping (n = 46 motifs) on datasets with genomic coverage subsampled at 10x, 15x, 20x, and 30x. Only results for k-nearest neighbors, neural network, and random forest are displayed. (d) Precision-Recall curves summarizing the detection performance at 75x coverage of individual methylation sites for each motif in H. pylori with adjusted frequency (Methods). (e) Same as d but using ROC curves for representation. (f) Effect of motif frequency on de novo methylated site detection. For each methylation motif, in silico datasets with a wide range of motif frequencies were created using a random subsampling strategy (subsampling either the motif occurrences or the genomic regions without motifs, see Methods). The natural motif frequencies (that is, the original ratio of motif occurrences over all queried regions) are annotated by a point on each motif curve.
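The Precision-Recall evaluation referred to throughout these captions can be sketched generically. The Methods section is not part of this preview, so the code below is an assumption about a standard approach, not the authors' procedure: positions are ranked by a detection score, each distinct score acts as a threshold, and the area under the curve is estimated as step-wise average precision. All names are hypothetical.

```python
def precision_recall_points(scores, labels):
    """Return (recall, precision) pairs, one per distinct score threshold.

    `labels` are 1 for truly methylated sites and 0 otherwise; higher
    scores mean stronger evidence of methylation.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every site tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def average_precision(points):
    """Step-wise area under the PR curve: sum of precision x recall gain."""
    area = 0.0
    prev_recall = 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Toy example: two true sites among four scored positions.
pts = precision_recall_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
auc = average_precision(pts)
```

Precision-Recall curves are usually preferred over ROC curves when positives are rare, which fits a setting where motif occurrences make up a small fraction of all genomic positions.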

Extended Data Fig. 8 Schematic representation of methylation feature vectors computation and methylation binning of contigs.

The computation of methylation features and the building of the methylation profile matrix are described in the Methods.

Extended Data Fig. 9 Detailed methylation analysis of MGM1 sample.

(a) Methylation binning using automated methylation feature selection (without precise methylation motif discovery; Methods). Methylation features are projected on two dimensions using t-SNE. Contigs are colored per bin defined using DBSCAN, with point sizes matching contig length according to the legend. Two bins with the same methylation motifs were manually merged into Bin 4. (b) Methylation binning using de novo discovered motifs on each bin found in a (Methods). Methylation features computed from de novo discovered motifs are projected on two dimensions using t-SNE. Contigs are colored per bin defined using DBSCAN except Bin 11, which was manually defined. (c) Methylation binning using de novo discovered motifs on each bin found in b. Contigs are colored per bin defined using DBSCAN except for Bin 13, which was manually defined. (d) Methylation binning of MGM1 metagenome contigs using de novo discovered motifs after three rounds of motif discovery (same as Fig. 5a).
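The binning step in this caption clusters contigs with DBSCAN after a two-dimensional t-SNE projection of their methylation features. The sketch below shows the clustering half only, as a plain-Python DBSCAN on already-projected coordinates. The coordinates, `eps`, and `min_samples` values are hypothetical, and a real analysis would more likely rely on an established library implementation.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    A point is a core point if at least `min_samples` points (itself
    included) lie within distance `eps`; clusters grow outward from
    core points.
    """
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # core point: keep growing the cluster
    return labels


# Hypothetical 2D coordinates, standing in for t-SNE projected contigs.
contigs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
bins = dbscan(contigs, eps=1.5, min_samples=3)
```

Points labelled as noise correspond to contigs that fall in no dense group, which matches the caption's note that some bins had to be defined manually.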

Extended Data Fig. 10 Detailed methylation analysis of MGM2 sample.

(a) Methylation binning using automated methylation feature selection (without precise methylation motif discovery; Methods). Methylation features are projected on two dimensions using t-SNE. Contigs are colored per defined bin, with point sizes matching contig length according to the legend. Bins 1, 3, 4, and 5 were defined using DBSCAN. The other bins are composed of one or two contigs and were manually defined after de novo methylation motif discovery. (b) Methylation binning using de novo discovered motifs on each bin found in a (Methods). Methylation features computed from de novo discovered motifs are projected on two dimensions using t-SNE. Contigs are colored per bin as described in a.

Supplementary information

Supplementary Information

Supplementary Text and Figs. 1–4.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–12.


About this article

Cite this article

Tourancheau, A., Mead, E.A., Zhang, XS. et al. Discovering multiple types of DNA methylation from bacteria and microbiome using nanopore sequencing. Nat Methods 18, 491–498 (2021). https://doi.org/10.1038/s41592-021-01109-3


Received: 12 March 2019
Accepted: 03 March 2021
Published: 05 April 2021
Issue Date: May 2021
DOI: https://doi.org/10.1038/s41592-021-01109-3
