Discovering multiple types of DNA methylation from bacteria and microbiome using nanopore sequencing

Abstract

Bacterial DNA methylation occurs at diverse sequence contexts and plays important functional roles in cellular defense and gene regulation. Existing methods for detecting DNA modification from nanopore sequencing data do not effectively support de novo study of unknown bacterial methylomes. In this work, we observed that a nanopore sequencing signal displays complex heterogeneity across methylation events of the same type. To enable nanopore sequencing for broadly applicable methylation discovery, we generated a training dataset from an assortment of bacterial species and developed a method, named nanodisco (https://github.com/fanglab/nanodisco), that couples the identification and fine mapping of the three forms of methylation into a multi-label classification framework. We applied it to individual bacteria and the mouse gut microbiome for reliable methylation discovery. In addition, we demonstrated the use of DNA methylation for binning metagenomic contigs, associating mobile genetic elements with their host genomes and identifying misassembled metagenomic contigs.

Fig. 1: Schematics for the method design and applications.
Fig. 2: Systematic examination of three main types of DNA methylation with nanopore sequencing.
Fig. 3: Local sequence context effect on motif signatures.
Fig. 4: Classification and fine mapping of three types of DNA methylation.
Fig. 5: Methylation analysis of mouse gut microbiome samples.

Data availability

All sequencing data generated for this study are available at the Sequence Read Archive under the BioProjects PRJNA559199 for individual bacteria and PRJNA559386 for the mouse gut microbiome samples. NCBI reference sequences used for the individual bacteria analysis are available under the accession codes CP041693, CP041696, NC_008261.1, CP014225.1, CP023448.1, NC_007796.1, NC_002946.2, CP041695 and CP003732.1 (Supplementary Table 1). Information related to methylation motifs is available from the REBASE database (http://rebase.neb.com; ref. 16). Data from the SMRT sequencing metagenomic study can be found under the BioProject PRJNA404082.

Code availability

The nanodisco software and a detailed tutorial with supporting data are available at http://github.com/fanglab/nanodisco.

References

  1. Beaulaurier, J., Schadt, E. E. & Fang, G. Deciphering bacterial epigenomes using modern sequencing technologies. Nat. Rev. Genet. 20, 157–172 (2019).

  2. Flusberg, B. A. et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods 7, 461–465 (2010).

  3. Blow, M. J. et al. The epigenomic landscape of prokaryotes. PLoS Genet. 12, e1005854 (2016).

  4. Laszlo, A. H. et al. Detection and mapping of 5-methylcytosine and 5-hydroxymethylcytosine with nanopore MspA. Proc. Natl Acad. Sci. USA 110, 18904–18909 (2013).

  5. Schreiber, J. et al. Error rates for nanopore discrimination among cytosine, methylcytosine, and hydroxymethylcytosine along individual DNA strands. Proc. Natl Acad. Sci. USA 110, 18910–18915 (2013).

  6. Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods 14, 407–410 (2017).

  7. Rand, A. C. et al. Mapping DNA methylation with high-throughput nanopore sequencing. Nat. Methods 14, 411–413 (2017).

  8. McIntyre, A. B. R. et al. Single-molecule sequencing detection of N6-methyladenine in microbial reference materials. Nat. Commun. 10, 579 (2019).

  9. Ni, P. et al. DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning. Bioinformatics 35, 4586–4595 (2019).

  10. Liu, Q. et al. Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data. Nat. Commun. 10, 2449 (2019).

  11. Liu, Q., Georgieva, D. C., Egli, D. & Wang, K. NanoMod: a computational tool to detect DNA modifications using Nanopore long-read sequencing data. BMC Genomics 20, 78 (2019).

  12. Stoiber, M. et al. De novo identification of DNA modifications enabled by genome-guided nanopore signal processing. Preprint at bioRxiv https://doi.org/10.1101/094672 (2017).

  13. Amarasinghe, S. L. et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21, 30 (2020).

  14. Wion, D. & Casadesus, J. N6-methyl-adenine: an epigenetic signal for DNA-protein interactions. Nat. Rev. Microbiol. 4, 183–192 (2006).

  15. Casadesus, J. & Low, D. Epigenetic gene regulation in the bacterial world. Microbiol. Mol. Biol. Rev. 70, 830–856 (2006).

  16. Roberts, R. J., Vincze, T., Posfai, J. & Macelis, D. REBASE–a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res. 43, D298–D299 (2015).

  17. Van Der Maaten, L. Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 15, 3221–3245 (2014).

  18. Bailey, T. L., Johnson, J., Grant, C. E. & Noble, W. S. The MEME Suite. Nucleic Acids Res. 43, W39–W49 (2015).

  19. Saeed, I., Tang, S. L. & Halgamuge, S. K. Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition. Nucleic Acids Res. 40, e34 (2012).

  20. Iverson, V. et al. Untangling genomes from metagenomes: revealing an uncultured class of marine Euryarchaeota. Science 335, 587–590 (2012).

  21. Laczny, C. C., Pinel, N., Vlassis, N. & Wilmes, P. Alignment-free visualization of metagenomic data by nonlinear dimension reduction. Sci. Rep. 4, 4516 (2014).

  22. Laczny, C. C. et al. VizBin—an application for reference-independent visualization and human-augmented binning of metagenomic data. Microbiome 3, 1 (2015).

  23. Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013).

  24. Sharon, I. et al. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res. 23, 111–120 (2013).

  25. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).

  26. Nielsen, H. B. et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat. Biotechnol. 32, 822–828 (2014).

  27. Marbouty, M. et al. Metagenomic chromosome conformation capture (meta3C) unveils the diversity of chromosome organization in microorganisms. eLife 3, e03318 (2014).

  28. Burton, J. N., Liachko, I., Dunham, M. J. & Shendure, J. Species-level deconvolution of metagenome assemblies with Hi-C-based contact probability maps. G3 4, 1339–1346 (2014).

  29. Marbouty, M., Baudry, L., Cournac, A. & Koszul, R. Scaffolding bacterial genomes and probing host-virus interactions in gut microbiome by proximity ligation (chromosome capture) assay. Sci. Adv. 3, e1602105 (2017).

  30. Beaulaurier, J. et al. Metagenomic binning and association of plasmids with bacterial host genomes using DNA methylation. Nat. Biotechnol. 36, 61–69 (2018).

  31. Fang, G. et al. Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing. Nat. Biotechnol. 30, 1232–1239 (2012).

  32. Murray, I. A. et al. The methylomes of six bacteria. Nucleic Acids Res. 40, 11450–11462 (2012).

  33. Schadt, E. E. et al. Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases. Genome Res. 23, 129–141 (2013).

  34. Beaulaurier, J. et al. Single molecule-level detection and long read-based phasing of epigenetic variations in bacterial methylomes. Nat. Commun. 6, 7438 (2015).

  35. Song, C. X., Yi, C. & He, C. Mapping recently identified nucleotide variants in the genome and transcriptome. Nat. Biotechnol. 30, 1107–1116 (2012).

  36. Yoshihara, M., Jiang, L., Akatsuka, S., Suyama, M. & Toyokuni, S. Genome-wide profiling of 8-oxoguanine reveals its association with spatial positioning in nucleus. DNA Res. 21, 603–612 (2014).

  37. Li, S. & Mason, C. E. The pivotal regulatory landscape of RNA modifications. Annu. Rev. Genomics Hum. Genet. 15, 127–150 (2014).

  38. Roundtree, I. A., Evans, M. E., Pan, T. & He, C. Dynamic RNA modifications in gene expression regulation. Cell 169, 1187–1200 (2017).

  39. Garalde, D. R. et al. Highly parallel direct RNA sequencing on an array of nanopores. Nat. Methods 15, 201–206 (2018).

  40. Workman, R. E. et al. Nanopore native RNA sequencing of a human poly(A) transcriptome. Nat. Methods 16, 1297–1305 (2019).

  41. Yang, C., Chu, J., Warren, R. L. & Birol, I. NanoSim: nanopore sequence read simulator based on statistical characterization. Gigascience 6, 1–6 (2017).

  42. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).

  43. Morgan, M., Pagès, H., Obenchain, V. & Hayden, N. Rsamtools: binary alignment (BAM), FASTA, variant call (BCF), and tabix file import v.3.12 (Bioconductor, 2016).

  44. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  45. Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).

  46. Vaser, R., Sovic, I., Nagarajan, N. & Sikic, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).

  47. Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).

  48. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).

  49. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

Acknowledgements

We thank A. Fomenkov and R. J. Roberts from NEB for their help with the bacterial strain selection and for providing us with DNA samples (B. amyloliquefaciens, B. fusiformis and N. otitidiscaviarum). We also thank R. Gunsalus from the University of California, Los Angeles (M. hungatei), S. Logan from the National Research Council Canada (C. perfringens), L. Jackson from the University of Oklahoma Health Sciences Center (N. gonorrhoeae) and B. Schink, N. Müller and A. Keller from the University of Konstanz, Germany (T. phaeum) for providing us with DNA samples. We thank Y. Kong and M. Ni for providing helpful feedback for early versions of this paper. This work was supported by a seed fund from Icahn Institute for Genomics and Multiscale Biology (G.F.) and by grant nos. R01 GM128955 (G.F.), R35 GM139655 (G.F.) and R56 HG011095 (G.F.) from the National Institutes of Health. G.F. is a Hirschl Research Scholar by Irma T. Hirschl/Monique Weill-Caulier Trust, and a Nash Family Research Scholar. This work was also supported in part through the computational resources and staff expertise provided by the Department of Scientific Computing at the Icahn School of Medicine at Mount Sinai.

Author information

Contributions

G.F. conceived and supervised the project. A.T. and G.F. designed the methods. A.T. developed the software package for all the proposed computational analyses. A.T., E.A.M. and X-S.Z. conducted the experiments. A.T. and G.F. analyzed the data and wrote the paper with inputs and comments from all coauthors.

Corresponding author

Correspondence to Gang Fang.

Ethics declarations

Competing interests

A.T. and G.F. are inventors of two US Provisional patent applications (62/860,952 and 62/851,205) that describe the methods in this paper.

Additional information

Peer reviewer information Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Lei Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 General statistics of motif signatures.

(a) Distributions of current differences are shown for all confident motifs together (n = 46 motifs), along with average absolute differences and associated standard deviations near methylated bases ([−10 bp, +11 bp]). The lower and upper hinges correspond to the 25th and 75th percentiles, while the lower and upper whiskers extend to the minima and maxima, respectively (capped at 1.5 times the interquartile range). (b) Same as a, with distinction between DNA methylation types (n = 28 6mA motifs, n = 7 4mC motifs, n = 11 5mC motifs). (c) Same as a, but for individual methylation motifs.

Extended Data Fig. 2 Systematic examination of three main DNA methylation types with nanopore sequencing.

(a) t-SNE projection of isolated methylation motif occurrences separated per motif. The same dataset as Fig. 2b was used with occurrences colored per motif. Other motifs are colored in gray. (b) Same as a but grouped by methylation type.

Extended Data Fig. 3 Nanopore sequencing signal processing variable.

(a) Comparison of current differences across methylation occurrences between datasets base called with Albacore 1.1.0, Albacore 2.3.4, and Guppy 3.2.4, illustrated by t-SNE projection for 46 well-characterized motifs (Supplementary Table 2). Each dot represents one isolated motif occurrence colored by base caller version. 100,000 motif occurrences were randomly selected from each dataset to reduce the scatter plot density and ease visualization. For each motif occurrence, current differences from 22 positions near methylated bases ([−10 bp, +11 bp]) were used. (b) Performance for de novo methylated site detection between datasets base called with Albacore 1.1.0, Albacore 2.3.4, and Guppy 3.2.4. We evaluated the detection of individual motif occurrences using Precision-Recall curves for H. pylori at 75x coverage. Precision-Recall curves and areas under the curve (AUC) were computed as described in the Methods section. Only confident H. pylori motifs were considered for the evaluation. (c) Comparison of current differences across methylation occurrences (same as a) between datasets produced with or without the outlier removal step (Methods). (d) Performance for de novo methylated site detection (similar to b) with datasets produced with or without the outlier removal step. (e) Variation of current differences across methylation occurrences without the outlier removal step, as illustrated by motif signatures from three motifs, AG4mCT (n = 6550 occurrences), GGW5mCC (n = 1875 occurrences), and GCYYG6mAT (n = 954 occurrences). For each motif, current differences near methylated bases ([−6 bp, +7 bp]) from all isolated occurrences are plotted with conservation of relative distances to methylated bases. Distributions of current differences for each relative distance are displayed as violin plots. The current difference axis is limited to the −8 to 8 pA range. (f) Performance for de novo methylated site detection across current difference datasets generated with different read alignment type filtering: remove alternative alignments (filtered out XA BAM tags; named No Alt.), remove supplementary alignments (filtered out the 2048 BAM flag; named No Supp.), remove chimeric alignments (filtered out SA BAM tags; named No Chim.), only conserve unique mappings (filtered out XA and SA BAM tags; named Unique), and keep all alignments (named None). (g) Performance for de novo methylated site detection across datasets normalized with linear regression (lm function), robust regression (rlm function) or no additional normalization (annotated as none). (h) Performance for de novo methylated site detection across datasets generated using a two-sided Mann-Whitney U-test or Student’s t-test. (i) Performance for de novo methylated site detection across datasets generated using different p-value smoothing window sizes: no smoothing (named None), 3 nt, 5 nt, and 7 nt. (j) Performance for de novo methylated site detection across datasets generated using different functions for combining consecutive p-values: Fisher’s method (named sumlog), logit method (named logitp), sum p method (named sump), and sum z method (named sumz). (k) Performance for de novo methylated site detection across peak datasets generated using different peak detection window sizes: 5 nt, 7 nt, and 9 nt. Plots f, g, h, i, j, and k show Precision-Recall curves and areas under the curve (AUC) for various signal processing steps and were computed as described in the Methods section. (l) Comparison of current differences across methylation occurrences (same as a) with E. coli datasets (200x) produced using either the reference genome or the de novo assembly (Methods). (m) Performance for de novo methylated site detection in E. coli datasets (200x) using either the reference genome or the de novo assembly. (n) Performance of methylation motif typing and fine mapping on E. coli datasets (200x) produced using either the reference genome or the de novo assembly (motif occurrences: n = 458 for AACNNNNNNGTGC, n = 18451 for CCWGG, n = 28110 for GATC, n = 463 for GCACNNNNNNGTT). Only results for k-nearest neighbors, neural network, and random forest are displayed.
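As an illustration of the p-value smoothing and combining steps compared in panels i and j, the following is a minimal sketch of Fisher's method applied over a centered sliding window. It is not the nanodisco implementation: the function names, the truncated windows at sequence edges, and the clamping floor are assumptions made for this example, and Fisher's method formally assumes independent p-values, which neighboring genomic positions may not satisfy.

```python
import math

def fisher_combine(pvalues):
    """Combine p-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose upper tail has a closed form for even
    degrees of freedom: exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    # half_stat equals X/2; p-values are clamped to avoid log(0)
    # (the 1e-300 floor is an arbitrary assumption for this sketch).
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)

def smooth_pvalues(pvalues, window=5):
    """Combine per-position p-values in a centered window of `window` nt.

    Positions near either end use a truncated window (an assumption).
    """
    half = window // 2
    n = len(pvalues)
    return [
        fisher_combine(pvalues[max(0, i - half): min(n, i + half + 1)])
        for i in range(n)
    ]

# Hypothetical per-position p-values around a methylated site.
example = [1.0, 0.01, 0.01, 0.01, 1.0]
smoothed = smooth_pvalues(example, window=3)
```

In this toy input, the run of three moderately small p-values yields a much smaller combined p-value at the central position, which is the behavior that makes a subsequent peak detection step easier.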

Extended Data Fig. 4 Additional information for classification of methylation motif occurrences.

(a) Approximation of DNA methylation position in three motifs, AG4mCT (n = 6549 occurrences), GGW5mCC (n = 1875 occurrences), and GCYYG6mAT (n = 954 occurrences). Signal strength is computed using a sliding window along the motif signature to choose the best vector positioning for classification. (b) Flowchart of the procedure for classifier training and annotation of novel motif datasets. Training the classifier starts with gathering a set of bacteria with characterized methylomes. Confident motifs are selected to ensure the robustness of the final classifier, then all motif occurrences are located in the genome (from the corresponding reference genome or a de novo assembled and polished genome). Current differences are then computed along the genome. Next, the training dataset is built from the offset vectors of current differences labeled with the known methylation type and the offset combination. Finally, the classifier is trained using the chosen model(s). Analyzing a new bacterial sample starts with de novo detection of the methylated motifs from processed current differences (see Methods). Then methylated motif occurrences are located and the motif signatures are computed (that is, the distribution of current differences at each relative distance from the methylated bases). Next, those signatures are leveraged to approximate the methylated position for each de novo detected motif (see Methods), which is used to define the classifier inputs (that is, the vector of current differences centered on the approximate methylated position). Finally, the trained classifier is used to predict the methylation type and fine map the DNA methylation for each motif. (c) Boxplot of overall prediction accuracy in LOOCV evaluation (n = 46 motifs) for each classifier. Classifiers are ordered by average accuracy. The lower and upper hinges correspond to the 25th and 75th percentiles, while the lower and upper whiskers extend to the minima and maxima, respectively (capped at 1.5 times the interquartile range). (d) Effect of hyperparameters on classification accuracy. Boxplot of overall prediction accuracy in LOOCV evaluation with classifiers trained on all motifs except the ones from H. pylori (n = 27 motifs). Hyperparameters were either tuned on H. pylori motifs only (“Alt. HP”) or on all motifs (“Main HP”). The lower and upper hinges correspond to the 25th and 75th percentiles, while the lower and upper whiskers extend to the minima and maxima, respectively (capped at 1.5 times the interquartile range). (e) Relationship between LOOCV accuracy and current difference signal similarities. Current difference signal near methylated bases is visualized by t-SNE projection for the 46 well-characterized motifs, similar to Fig. 2b. Each dot represents one isolated motif occurrence colored by accuracy from LOOCV analysis.
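The sliding-window positioning described in panel a could be sketched as follows. This is only one plausible reading of the caption, not the published procedure: the per-position signal strength is taken here to be the mean absolute current difference across occurrences, the window score is a plain sum, and both function names are hypothetical.

```python
def mean_abs_by_position(occurrences):
    """Summarize a motif signature as mean |current difference| per position.

    `occurrences` is a list of equal-length lists, one per motif occurrence,
    holding current differences (pA) at each relative position.
    """
    n = len(occurrences)
    width = len(occurrences[0])
    return [sum(abs(occ[j]) for occ in occurrences) / n for j in range(width)]

def best_window_start(signal_strength, width):
    """Return the start index of the window with the largest summed signal."""
    if width <= 0 or width > len(signal_strength):
        raise ValueError("window width must be between 1 and the signal length")
    window_sum = sum(signal_strength[:width])
    best_sum, best_start = window_sum, 0
    for start in range(1, len(signal_strength) - width + 1):
        # Slide by one position: add the entering value, drop the leaving one.
        window_sum += signal_strength[start + width - 1] - signal_strength[start - 1]
        if window_sum > best_sum:
            best_sum, best_start = window_sum, start
    return best_start

# Hypothetical signature: strongest perturbation at positions 2 to 4.
strength = [0.1, 0.2, 2.0, 3.0, 2.5, 0.3, 0.1]
start = best_window_start(strength, width=3)
```

The selected window would then define where the vector of current differences passed to the classifier is centered.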

Extended Data Fig. 5 Classification and fine mapping of three types of DNA methylation (part 1).

Similar to Fig. 4d, with the full set of prediction results for a subset of methylation motifs for k-nearest neighbors, random forest, and neural network. Fill colors correspond to the percentage of occurrences classified to a specific class, ranging from blue (0%) to red (100%). Greyed-out predictions correspond to out-of-motif positions. Blank columns correspond to within-motif positions without prediction. Prediction percentages of expected classes are displayed in italic and selected predictions based on consensus are displayed in bold.

Extended Data Fig. 6 Classification and fine mapping of three types of DNA methylation (part 2).

See Extended Data Fig. 5.

Extended Data Fig. 7 Evaluation of motif enrichment with Precision-Recall curves.

(a) Effect of coverage on de novo methylated site detection. We evaluated the detection of individual motif occurrences using Precision-Recall curves (PR curves) for H. pylori. Studied datasets with coverage ranging from 5x to 200x were generated by random subsampling of native and WGA datasets. Precision-Recall curves were generated as described in the Methods section. We considered only confident H. pylori motifs for evaluation. (b) Same as a but using ROC curves for representation. Motif occurrences without data due to low coverage (<5x) were not considered. (c) Performance of methylation motif typing and fine mapping (n = 46 motifs) on datasets with genomic coverage subsampled at 10x, 15x, 20x, and 30x. Only results for k-nearest neighbors, neural network, and random forest are displayed. (d) Precision-Recall curves summarizing the detection performance at 75x coverage of individual methylation sites for each motif in H. pylori with adjusted frequency (Methods). (e) Same as d but using ROC curves for representation. (f) Effect of motif frequency on de novo methylated site detection. For each methylation motif, in silico datasets with a wide range of motif frequencies were created using a random subsampling strategy (subsampling either the motif occurrences or the genomic regions without motifs; see Methods). The natural motif frequencies (that is, the original ratio of motif occurrences over all queried regions) are annotated by a point on each motif curve.
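For readers unfamiliar with the metric, a Precision-Recall curve and its area can be computed from per-site detection scores and ground-truth labels as in the sketch below. The exact procedure used in the paper is described in its Methods and is not reproduced here; the step-wise area (average-precision style), the tie handling, and the function names are assumptions of this example.

```python
def precision_recall_points(scores, labels):
    """Return (recall, precision) points, sweeping the threshold downward.

    `scores` are detection scores (higher means more likely methylated) and
    `labels` are 1 for true motif occurrences and 0 otherwise.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = []
    tp = fp = 0
    i = 0
    while i < len(order):
        # Sites with tied scores cross the threshold together.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if labels[order[j]]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((tp / total_pos, tp / (tp + fp)))
        i = j
    return points

def pr_auc(points):
    """Step-wise area under the PR curve (sum of precision * recall gain)."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area

# Toy example: two true sites ranked first and third among four candidates.
toy_points = precision_recall_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
toy_auc = pr_auc(toy_points)
```

Because precision depends on the ratio of true sites to background positions, the area changes with motif frequency, which is the effect examined in panel f.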

Extended Data Fig. 8 Schematic representation of methylation feature vectors computation and methylation binning of contigs.

The computation of methylation features and the building of the methylation profile matrix are described in the Methods.

Extended Data Fig. 9 Detailed methylation analysis of MGM1 sample.

(a) Methylation binning using automated methylation feature selection (without precise methylation motif discovery; Methods). Methylation features are projected on two dimensions using t-SNE. Contigs are colored per bin defined using DBSCAN, with point sizes matching contig length according to the legend. Two bins with the same methylation motifs were manually merged into Bin 4. (b) Methylation binning using de novo discovered motifs on each bin found in a (Methods). Methylation features computed from de novo discovered motifs are projected on two dimensions using t-SNE. Contigs are colored per bin defined using DBSCAN except Bin 11, which was manually defined. (c) Methylation binning using de novo discovered motifs on each bin found in b. Contigs are colored per bin defined using DBSCAN except for Bin 13, which was manually defined. (d) Methylation binning of MGM1 metagenome contigs using de novo discovered motifs after three rounds of motif discovery (same as Fig. 5a).

Extended Data Fig. 10 Detailed methylation analysis of MGM2 sample.

(a) Methylation binning using automated methylation feature selection (without precise methylation motif discovery; Methods). Methylation features are projected on two dimensions using t-SNE. Contigs are colored per defined bin, with point sizes matching contig length according to the legend. Bins 1, 3, 4, and 5 were defined using DBSCAN. The other bins are composed of one or two contigs and were manually defined after de novo methylation motif discovery. (b) Methylation binning using de novo discovered motifs on each bin found in a (Methods). Methylation features computed from de novo discovered motifs are projected on two dimensions using t-SNE. Contigs are colored per bin as described in a.
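The binning panels in Extended Data Figs. 9 and 10 define bins by running DBSCAN on the two-dimensional t-SNE coordinates of contigs. The sketch below shows the clustering step only, on coordinates assumed to be already computed; it is a plain textbook DBSCAN written for illustration, not the code used in the study, and the `eps` and `min_pts` values and the toy coordinates are arbitrary assumptions.

```python
import math

def dbscan(points, eps, min_pts):
    """Cluster 2D points with DBSCAN; returns one label per point (-1 = noise)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force neighborhood query (includes the point itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the bin
    return labels

# Hypothetical t-SNE coordinates for seven contigs: two groups and an outlier.
coords = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
bins = dbscan(coords, eps=1.5, min_pts=3)
```

Contigs labeled -1 fall in no density-defined bin, which is consistent with the captions noting that bins made of one or two contigs were defined manually.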

Supplementary information

Supplementary Information

Supplementary Text and Figs. 1–4.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–12.

About this article

Cite this article

Tourancheau, A., Mead, E.A., Zhang, XS. et al. Discovering multiple types of DNA methylation from bacteria and microbiome using nanopore sequencing. Nat Methods (2021). https://doi.org/10.1038/s41592-021-01109-3
