DiMeLo-seq: a long-read, single-molecule method for mapping protein–DNA interactions genome wide

Abstract

Studies of genome regulation routinely use high-throughput DNA sequencing approaches to determine where specific proteins interact with DNA, and they rely on DNA amplification and short-read sequencing, limiting their quantitative application in complex genomic regions. To address these limitations, we developed directed methylation with long-read sequencing (DiMeLo-seq), which uses antibody-tethered enzymes to methylate DNA near a target protein’s binding sites in situ. These exogenous methylation marks are then detected simultaneously with endogenous CpG methylation on unamplified DNA using long-read, single-molecule sequencing technologies. We optimized and benchmarked DiMeLo-seq by mapping chromatin-binding proteins and histone modifications across the human genome. Furthermore, we identified where centromere protein A localizes within highly repetitive regions that were unmappable with short sequencing reads, and we estimated the density of centromere protein A molecules along single chromatin fibers. DiMeLo-seq is a versatile method that provides multimodal, genome-wide information for investigating protein–DNA interactions.

Fig. 1: Genome-wide mapping of protein–DNA interactions with DiMeLo-seq.
Fig. 2: Application of DiMeLo-seq in artificial chromatin.
Fig. 3: Optimization of DiMeLo-seq targeting lamin B1 in situ.
Fig. 4: Single-molecule CTCF-binding and CpG methylation profiles.
Fig. 5: Detecting H3K9me3 in centromeres.
Fig. 6: CENP-A-directed methylation within chromosome X centromeric higher-order repeats.

Data availability

All raw fast5 sequencing data are available in the Sequence Read Archive (SRA) under BioProject accession PRJNA752170. These data were used to produce Figs. 2–6, Extended Data Figs. 1–10, Supplementary Tables 1–3 and Supplementary Fig. 2. The CTCF ChIP–seq peak bed file for GM12878 is available from the ENCODE Project Consortium under accession ENCFF797SDL. The ATAC–seq peak bed file for GM12878 is available from the ENCODE Project Consortium under accession ENCFF748UZH. Bulk and scDamID data were obtained from the Gene Expression Omnibus (GEO) under accession GSE156150. H3K9me3 CUT&RUN data are from Altemose et al. (ref. 35) and are accessible in the SRA under BioProject accession PRJNA752795. The CENP-A k-mer analyses in Fig. 6c used CHM13 CENP-A ChIP–seq data from Logsdon et al. (ref. 41), available under BioProject accession PRJNA559484. Centromere and HOR definition bed files from the Telomere-to-Telomere consortium are available at https://github.com/marbl/chm13. Known CTCF motifs are from http://compbio.mit.edu/encode-motifs/matches.txt.gz. The CpG methylation track in Fig. 6d was derived from data available at https://github.com/nanopore-wgs-consortium/CHM13 (ref. 35). Source data are provided with this paper.

Code availability

The code to reproduce the results in this manuscript is available at https://github.com/amaslan/dimelo-seq/.

References

  1. van Steensel, B. & Henikoff, S. Identification of in vivo DNA targets of chromatin proteins using tethered dam methyltransferase. Nat. Biotechnol. 18, 424–428 (2000).

  2. Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007).

  3. Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods 4, 651–657 (2007).

  4. Johnson, D. S., Mortazavi, A., Myers, R. M. & Wold, B. Genome-wide mapping of in vivo protein–DNA interactions. Science 316, 1497–1502 (2007).

  5. Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837 (2007).

  6. Skene, P. J. & Henikoff, S. An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. Elife 6, e21856 (2017).

  7. Rivera, C. M. & Ren, B. Mapping human epigenomes. Cell 155, 39–55 (2013).

  8. Sönmezer, C. et al. Molecular co-occupancy identifies transcription factor binding cooperativity in vivo. Mol. Cell 81, 255–267 (2021).

  9. Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).

  10. Abdulhay, N. J. et al. Massively multiplex single-molecule oligonucleosome footprinting. Elife 9, e59404 (2020).

  11. Stergachis, A. B., Debo, B. M., Haugen, E., Churchman, L. S. & Stamatoyannopoulos, J. A. Single-molecule regulatory architectures captured by chromatin fiber sequencing. Science 368, 1449–1454 (2020).

  12. Lee, I. et al. Simultaneous profiling of chromatin accessibility and methylation on human cell lines with nanopore sequencing. Nat. Methods 17, 1191–1199 (2020).

  13. Shipony, Z. et al. Long-range single-molecule mapping of chromatin accessibility in eukaryotes. Nat. Methods 17, 319–327 (2020).

  14. Wang, Y. et al. Single-molecule long-read sequencing reveals the chromatin basis of gene expression. Genome Res. 29, 1329–1342 (2019).

  15. Schmid, M., Durussel, T. & Laemmli, U. K. ChIC and ChEC: genomic mapping of chromatin proteins. Mol. Cell 16, 147–157 (2004).

  16. van Schaik, T., Vos, M., Peric-Hupkes, D., Celie, P. H. N. & van Steensel, B. Cell cycle dynamics of lamina-associated DNA. EMBO Rep. 21, e50636 (2020).

  17. O’Brown, Z. K. et al. Sources of artifact in measurements of 6mA and 4mC abundance in eukaryotic genomic DNA. BMC Genomics 20, 445 (2019).

  18. Drozdz, M., Piekarowicz, A., Bujnicki, J. M. & Radlinska, M. Novel nonspecific DNA adenine methyltransferases. Nucleic Acids Res. 40, 2119–2130 (2012).

  19. Lowary, P. T. & Widom, J. New DNA sequence rules for high affinity binding to histone octamer and sequence-directed nucleosome positioning. J. Mol. Biol. 276, 19–42 (1998).

  20. Guelen, L. et al. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453, 948–951 (2008).

  21. Meuleman, W. et al. Constitutive nuclear lamina–genome interactions are highly conserved and associated with A/T-rich sequence. Genome Res. 23, 270–280 (2013).

  22. Altemose, N. et al. μDamID: a microfluidic approach for joint imaging and sequencing of protein–DNA interactions in single cells. Cell Syst. 11, 354–366 (2020).

  23. Sobecki, M. et al. MadID, a versatile approach to map protein–DNA interactions, highlights telomere-nuclear envelope contact sites in human cells. Cell Rep. 25, 2891–2903 (2018).

  24. Kind, J. et al. Genome-wide maps of nuclear lamina interactions in single human cells. Cell 163, 134–147 (2015).

  25. Bell, A. C. & Felsenfeld, G. Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature 405, 482–485 (2000).

  26. Song, L. et al. Open chromatin defined by DNase I and FAIRE identifies regulatory elements that shape cell-type identity. Genome Res. 21, 1757–1767 (2011).

  27. Boyle, A. P. et al. High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res. 21, 456–464 (2011).

  28. Klenova, E. M. et al. CTCF, a conserved nuclear factor required for optimal transcriptional activity of the chicken c-myc gene, is an 11-Zn-finger protein differentially expressed in multiple forms. Mol. Cell. Biol. 13, 7612–7624 (1993).

  29. Lobanenkov, V. V. et al. A novel sequence-specific DNA binding protein which interacts with three regularly spaced direct repeats of the CCCTC-motif in the 5′-flanking sequence of the chicken c-myc gene. Oncogene 5, 1743–1753 (1990).

  30. Ohlsson, R., Renkawitz, R. & Lobanenkov, V. CTCF is a uniquely versatile transcription regulator linked to epigenetics and disease. Trends Genet. 17, 520–527 (2001).

  31. Rhee, H. S. & Pugh, B. F. Comprehensive genome-wide protein–DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011).

  32. Kelly, T. K. et al. Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules. Genome Res. 22, 2497–2506 (2012).

  33. Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162 (2019).

  34. Gershman, A. et al. Epigenetic patterns in a complete human genome. Science 376, eabj5089 (2022).

  35. Altemose, N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022).

  36. McNulty, S. M. & Sullivan, B. A. Alpha satellite DNA biology: finding function in the recesses of the genome. Chromosome Res. 26, 115–138 (2018).

  37. Rudd, M. K., Schueler, M. G. & Willard, H. F. Sequence organization and functional annotation of human centromeres. Cold Spring Harb. Symp. Quant. Biol. 68, 141–149 (2003).

  38. Willard, H. F. & Waye, J. S. Hierarchical order in chromosome-specific human alpha satellite DNA. Trends Genet. 3, 192–198 (1987).

  39. Miga, K. H. et al. Centromere reference models for human chromosomes X and Y satellite arrays. Genome Res. 24, 697–707 (2014).

  40. Hayden, K. E. et al. Sequences associated with centromere competency in the human genome. Mol. Cell. Biol. 33, 763–772 (2013).

  41. Logsdon, G. A. et al. The structure, function and evolution of a complete human chromosome 8. Nature 593, 101–107 (2021).

  42. Lica, L. & Hamkalo, B. Preparation of centromeric heterochromatin by restriction endonuclease digestion of mouse L929 cells. Chromosoma 88, 42–49 (1983).

  43. Smith, O. K. et al. Identification and characterization of centromeric sequences in Xenopus laevis. Genome Res. 31, 958–967 (2021).

  44. Miga, K. H. et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature 585, 79–84 (2020).

  45. Bodor, D. L. et al. The quantitative architecture of centromeric chromatin. Elife 3, e02137 (2014).

  46. Aldrup-MacDonald, M. E., Kuo, M. E., Sullivan, L. L., Chew, K. & Sullivan, B. A. Genomic variation within alpha satellite DNA influences centromere location on human chromosomes with metastable epialleles. Genome Res. 26, 1301–1311 (2016).

  47. Gilpatrick, T. et al. Targeted nanopore sequencing with Cas9-guided adapter ligation. Nat. Biotechnol. 38, 433–438 (2020).

  48. Kovaka, S., Fan, Y., Ni, B., Timp, W. & Schatz, M. C. Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED. Nat. Biotechnol. 39, 431–441 (2021).

  49. Gamba, R. et al. A method to enrich and purify centromeric DNA from human cells. Preprint at bioRxiv https://doi.org/10.1101/2021.09.24.461328 (2021).

  50. Meers, M. P., Bryson, T. D., Henikoff, J. G. & Henikoff, S. Improved CUT&RUN chromatin profiling tools. Elife 8, e46314 (2019).

  51. Cao, S., Zhou, K., Zhang, Z., Luger, K. & Straight, A. F. Constitutive centromere-associated network contacts confer differential stability on CENP-A nucleosomes in vitro and in the cell. Mol. Biol. Cell 29, 751–762 (2018).

  52. Zhou, K. et al. CENP-N promotes the compaction of centromeric chromatin. Preprint at bioRxiv https://doi.org/10.1101/2021.06.14.448351 (2021).

  53. Kim, B. Y. et al. Highly contiguous assemblies of 101 drosophilid genomes. Preprint at bioRxiv https://doi.org/10.1101/2020.12.14.422775 (2020).

  54. Hellman, A. & Chess, A. Gene body-specific methylation on the active X chromosome. Science 315, 1141–1143 (2007).

Acknowledgements

We thank A. Stergachis for the plasmid encoding Hia5, G. Caldas for experimental training, and G. Karpen for helpful discussions. We thank Stanford University and the Stanford Research Computing Center for providing computational resources and support that contributed to these research results. We thank M. Tan for contributions to sequencing. This work was supported by the Chan Zuckerberg Biohub and by the NIGMS of the National Institutes of Health under award no. R35GM124916 to A.S. and R01 GM074728 to A.F.S. K.H.M. is supported by R21HG010548-01. O.K.S. and R.R.B. are supported by National Institutes of Health T32 awards nos. GM113854-02 and GM007279-45, respectively. A.M., O.K.S. and R.R.B. are supported by NSF GRFP awards. N.A. is an HHMI Hanna H. Gray Fellow. A.S. is a Chan Zuckerberg Biohub Investigator and a Pew Scholar in the Biomedical Sciences.

Author information

Contributions

N.A., A.M., O.K.S., K.S., A.F.S. and A.S. designed the study. N.A., A.M., O.K.S., K.S. and R.R.B. performed the experiments. A.M.D. and N.N. assisted with sequencing and provided feedback. K.H.M. provided unpublished datasets and feedback. R.M. assisted with analysis software development. N.A., A.M., O.K.S. and K.S. analyzed and interpreted the data. N.A., A.M., O.K.S., K.S. and R.R.B. made the figures. N.A., A.M., O.K.S. and K.S. wrote the manuscript, with input from R.R.B., A.F.S. and A.S. A.F.S. and A.S. supervised the study.

Corresponding authors

Correspondence to Aaron F. Straight or Aaron Streets.

Ethics declarations

Competing interests

N.A., A.M., O.K.S., K.S., A.F.S. and A.S. are co-inventors on a patent application related to this work. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Lei Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 In vitro assessment of methylation of DNA and chromatin by pA-Hia5 and pAG-Hia5.

a,b, Agarose gel electrophoresis images of DpnI digestion of (unmethylated) plasmid DNA following incubation with Hia5, pA-Hia5 (a) or pAG-Hia5 (b) (Supplementary Note 1). Representative images of at least 2 replicates. c, Schematic of the 1×601 DNA sequence. The grey box indicates the 601 sequence; the yellow hexagon indicates the biotinylated end. d, Native polyacrylamide gel electrophoresis of naked 1×601 DNA or chromatinized 1×601 DNA before and after BsiWI digestion and glycerol gradient fractionation. Representative image of at least 2 replicates. e, Histogram (filled bars, left axis) and cumulative distribution (line traces, right axis) of the fraction of methylation (mA/A) on reads from CENP-A 1×601 chromatin methylated with free pA-Hia5, CENP-A-directed pA-Hia5 or IgG-directed pA-Hia5, or left untreated. The left y axis is truncated at 20 for better visualization. f, Percentage false discovery rate plotted against binned minimum mA probability score (Supplementary Note 4). Dotted lines indicate the threshold (0.6; 5% FDR). g,h, Receiver operating characteristic (ROC) curves comparing the fraction of methylated reads from 1×601 CENP-A chromatin after CENP-A-directed methylation (true positive rate) with that after IgG-directed methylation (g) or no treatment (h) (false positive rate). Areas under the curves (AUC) range between 0.92 and 0.94 for g and between 0.92 and 0.95 for h. i, Schematic of methylation of accessible DNA on 1×601 CENP-A chromatin co-incubated with free pA-Hia5 and SAM. j, Heatmap showing methylation on 5000 individual reads from CENP-A chromatin following incubation with free pA-Hia5. Blue indicates methylation above the threshold (0.6). k, Percentage of reads with methylation as a function of the minimum percentage of methylation on each read (methylation threshold, 0.6). The dotted line corresponds to methylation on at least 20% of each read (used in Fig. 2d).

Source data
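Extended Data Fig. 1j,k call an adenine methylated when its mA probability lies above 0.6 and then count reads by their fraction of methylated adenines (mA/A), using at least 20% per read as the cutoff for Fig. 2d. The minimal sketch below shows that per-read summary, assuming each read has already been reduced to a list of per-adenine mA probabilities (the extraction from basecaller output is not shown, and treating scores strictly above the threshold as methylated is an assumption). The helper names are hypothetical, not part of the authors' code.

```python
"""Per-read mA/A summaries at a fixed mA probability threshold (illustrative sketch)."""
from typing import Sequence

MA_THRESHOLD = 0.6       # probability cutoff quoted in Extended Data Fig. 1 (5% FDR)
MIN_READ_FRACTION = 0.2  # "methylation on at least 20% of each read"


def fraction_methylated(scores: Sequence[float], threshold: float = MA_THRESHOLD) -> float:
    """Return mA/A for one read: adenines scored above `threshold` over all adenines."""
    if not scores:
        return 0.0
    called = sum(1 for p in scores if p > threshold)
    return called / len(scores)


def percent_reads_methylated(reads: Sequence[Sequence[float]],
                             min_fraction: float = MIN_READ_FRACTION,
                             threshold: float = MA_THRESHOLD) -> float:
    """Percentage of reads whose mA/A is at least `min_fraction`."""
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if fraction_methylated(r, threshold) >= min_fraction)
    return 100.0 * hits / len(reads)
```

Sweeping `min_fraction` from 0 to 1 and plotting `percent_reads_methylated` traces out a curve of the kind shown in panel k.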

Extended Data Fig. 2 In vitro assessment of methylation of 18×601 array chromatin by pA-Hia5 and pAG-Hia5.

a, Schematic showing the location of 601 sequences (grey boxes) and AvaI digestion sites (dashed lines) between 601 sequences on the 18×601 array. Yellow hexagons indicate biotinylation. b, Schematic of 18×601 chromatin reconstitution, methylation by incubation with free pA-Hia5 and SAM, and long-read sequencing of methylated DNA extracted from chromatin. c, Native polyacrylamide gel electrophoresis showing AvaI-digested naked 18×601 array DNA or 18×601 chromatin arrays reconstituted with CENP-A or H3 (Supplementary Note 2). Representative gel image of at least 3 replicates. d, Representative immunofluorescence images of chromatin-coated beads following methylation using CENP-A-directed pA-Hia5. Scale bar, 3 μm. e, Violin plots of immunofluorescence signal on (denatured) chromatin-coated beads following antibody-directed methylation. Solid line, median; dashed lines, quartiles. n > 90 beads per condition (Supplementary Note 5). f, Histogram (filled bars, left axis) and cumulative distribution (line traces, right axis) of the fraction of methylation (mA/A) on reads from CENP-A or H3 chromatin methylated with free pA-Hia5 or CENP-A-directed pA-Hia5. The left y axis is truncated at 20 for better visualization. g,h, Heatmaps showing methylation on 2000 individual reads from CENP-A chromatin methylated with free pA-Hia5, clustered over the entire 18×601 array (g) or over a 4×601 subregion (Supplementary Note 4), with cartoons depicting predicted nucleosome positions (red circles) (h). Insets below the heatmaps show the average mA/A at every base position of the 18×601 array or the 4×601 portion (red dashed lines indicate 601 dyad positions). i, Violin plot of nucleosomes detected per read on reads from CENP-A or H3 18×601 chromatin arrays methylated with free pA-Hia5 or CENP-A-directed pA-Hia5. Solid line, median; dashed lines, quartiles. n = 3000 reads. Statistical significance was calculated using the Kruskal–Wallis test. ***P < 0.0001; ns, P > 0.05. j, Histogram (filled bars, left axis) and cumulative distribution (line traces, right axis) of the fraction of methylation (mA/A) on reads from CENP-A or H3 chromatin methylated with free pA-Hia5 or CENP-A-directed pA-Hia5. The left y axis is truncated at 20 for better visualization. k,l, Same as g,h, but for H3 chromatin methylated with H3-directed pAG-Hia5.

Source data
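Extended Data Fig. 2i compares nucleosomes detected per read across three conditions with a Kruskal–Wallis test. The stdlib sketch below implements that test, assuming the per-read nucleosome counts are already available (the nucleosome-calling step in Supplementary Note 4 is not reproduced). With exactly three groups the statistic has 2 degrees of freedom, for which the chi-square survival function is exactly exp(−H/2), so no statistics library is needed; `scipy.stats.kruskal` should give the same H and P for three groups. Function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(groups: Dict[str, Sequence[float]]) -> Tuple[float, float]:
    """Return (H, P) for a tie-corrected Kruskal-Wallis test on exactly 3 groups."""
    if len(groups) != 3:
        raise ValueError("the closed-form P value below assumes exactly 3 groups (df = 2)")
    pooled = [x for g in groups.values() for x in g]
    n = len(pooled)
    ranks = _average_ranks(pooled)
    # Sum over groups of (rank sum)^2 / group size; dict order is stable.
    h, start = 0.0, 0
    for g in groups.values():
        r = ranks[start:start + len(g)]
        h += sum(r) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts: Dict[float, int] = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, math.exp(-h / 2)  # chi-square survival function with 2 df
```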

Extended Data Fig. 3 Assessment of mA calling and LMNB1 targeting.

a, The proportion of all adenines called as methylated at each possible mA probability score, using two different software packages on ONT reads from two GM12878 DNA samples: untreated genomic DNA and purified genomic DNA methylated by Hia5 in vitro. The untreated DNA provides a measure of the false positive rate (FPR) at each score, since it contains few or no methyladenines. The Hia5-treated DNA provides a lower bound on the true positive rate (TPR) at each threshold. b, Estimates of the proportion of As methylated in the Hia5-treated DNA sample at each false discovery rate (FDR) threshold (FDR = FPR/(TPR + FPR), determined from a). At least 80% of the adenines in the Hia5-treated DNA appear to be methylated. c,d, In the DiMeLo-seq workflow, after the primary antibody binding, pA/G-MTase binding and wash steps, a sample of nuclei can be taken for quality assessment by immunofluorescence. The locations and relative quantity of pA/G-MTase molecules can be determined using fluorophore-conjugated antibodies that bind to the pA/G-MTase but not to the primary antibody. These representative images show results for pAG-EcoGII, comparing different antibodies, detergents and samples with (d) and without (c) an unconjugated secondary antibody used to recruit more pA/G-MTase molecules to the target protein. Scale bars (white lines in the FITC channel images), 10 μm.

Source data
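Extended Data Fig. 3b uses FDR = FPR/(TPR + FPR), where the FPR at each threshold is the fraction of adenines called methylated in untreated DNA and the TPR is the corresponding fraction in Hia5-treated DNA (a lower bound). A minimal sketch of that calculation, plus a hypothetical helper that picks the most permissive threshold meeting a target FDR; the input dictionaries and the numbers in the test are illustrative, not the paper's measurements.

```python
from typing import Dict, Optional


def fdr_by_threshold(fpr: Dict[float, float], tpr: Dict[float, float]) -> Dict[float, float]:
    """FDR = FPR / (TPR + FPR) at each mA probability threshold.

    fpr[t]: fraction of adenines called methylated in untreated DNA at threshold t.
    tpr[t]: fraction called methylated in Hia5-treated DNA at threshold t.
    """
    out: Dict[float, float] = {}
    for t in sorted(fpr):
        denom = tpr[t] + fpr[t]
        out[t] = fpr[t] / denom if denom > 0 else 0.0
    return out


def lowest_threshold_at_fdr(fdr: Dict[float, float], target: float = 0.05) -> Optional[float]:
    """Smallest threshold whose FDR does not exceed `target`, or None if none qualifies."""
    passing = [t for t, v in fdr.items() if v <= target]
    return min(passing) if passing else None
```

Choosing the smallest qualifying threshold keeps as many true calls as possible while holding the estimated FDR at or below the target.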

Extended Data Fig. 4 Demonstration of in vivo LMNB1-targeting and estimation of in situ sensitivity and specificity.

a, A browser view of chr7 comparing in vivo EcoGII-LMNB1 DamID (second track, green) with conventional in vivo LMNB1 DamID (first track, blue) and with LMNB1-targeted in situ DiMeLo-seq (fourth track, dark red). b, Distributions of Guppy mA probability scores across all A bases (q > 10) on all reads mapping to cLADs (gold, representing on-target methylation; n = 2.8 M) or ciLADs (blue, representing off-target methylation; n = 2.1 M), for an in situ LMNB1-targeting experiment using the final v2 protocol (#120 in Supplementary Table 1). c, As in b, but showing the cumulative distributions for all mA calls above each probability score threshold, with the ratio between them plotted as a dotted line (right-hand y axis). The vertical line indicates the stringent threshold of 0.9, at which cLADs have 20 times more mA as a proportion of all As (0.6%) than ciLADs do. If the threshold is reduced to 0.5, the fraction of As called as methylated increases to 2.5%, but the cLAD:ciLAD ratio decreases to 15.6. d, Distribution of per-read mA/A for cLADs (n = 812 reads) versus ciLADs (n = 827 reads), for all reads with at least 500 A basecalls (q > 10), using an mA probability threshold of 0.9. e, Receiver operating characteristic (ROC) curves showing, for different mA calling thresholds, the ability to classify individual reads from d as originating from cLADs or ciLADs using a simple threshold on mA/A. At a false positive rate of 6%, reads can be classified with a true positive rate of 59%, and this is similar for all mA thresholds used. The area under the curve (AUC) for the p > 0.9 curve is 0.78. f, As in Fig. 3e, but for bulk conventional DamID raw coverage. The y axis is truncated to omit outliers for visualization (max = 300,000), but these outliers were included in the linear model and correlation computation. Error bars in x represent the proportion of 32 cells ± 2 standard errors of the proportion. Error bars in y represent the mean of n = 94 to 663 genomic bins ± 2 standard errors of the mean.
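Extended Data Fig. 4e classifies each read as cLAD or ciLAD by thresholding its mA/A and summarizes performance with an ROC curve and AUC; Extended Data Fig. 5h applies the same idea to CTCF-bound reads. A minimal sketch of that evaluation, assuming per-read mA/A values for the on-target (positive) and off-target (negative) sets are already computed; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(pos: Sequence[float], neg: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from calling reads with mA/A >= cutoff positive.

    pos: per-read mA/A for on-target reads (e.g. cLADs).
    neg: per-read mA/A for off-target reads (e.g. ciLADs).
    Every distinct observed value is tried as a cutoff, from high to low.
    """
    cutoffs = sorted(set(pos) | set(neg), reverse=True)
    points = [(0.0, 0.0)]
    for c in cutoffs:
        tpr = sum(1 for x in pos if x >= c) / len(pos)
        fpr = sum(1 for x in neg if x >= c) / len(neg)
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under an ROC curve by the trapezoidal rule over (FPR, TPR) points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

Because every observed value is used as a cutoff, the trapezoidal AUC equals the probability that a randomly chosen on-target read has higher mA/A than a randomly chosen off-target read (ties counted as one half).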

Extended Data Fig. 5 Analysis of CTCF targeting performance.

a, Enrichment profiles with an mA probability threshold of 0.75 at the top quartile of ChIP–seq peaks for DiMeLo-seq protocol v1 compared with four optimization conditions (opt1: 2-hour activation, 0.05 mM spermidine at activation, SAM replenished; opt2: as opt1, plus 500 nM pA-Hia5; opt3: as opt1, plus pA-Hia5 binding at 4 °C for 2 hours; opt4: 2-hour activation, no spermidine, buffer with 1 mM Ca2+ and 0.5 mM Mg2+) (Supplementary Note 11). b, Fold enrichment of mA/A in ChIP–seq peak regions over background. Error bars represent the 95% credible interval for each ratio of proportions, determined by sampling proportions from posterior beta distributions computed with uninformative priors. c, mA/A in ATAC–seq peaks that do not overlap CTCF ChIP–seq peaks (grey) and in ATAC–seq peaks that do overlap CTCF ChIP–seq peaks (yellow). Error bars are computed as in b. d, Methylation decay from the CTCF motif center for the top decile of ChIP–seq signal, fit with an exponential decay function. The positions of the peaks are indicated, along with the spacing between peaks. e, Methylation profiles at the top quartile of ChIP–seq peaks when targeting the C terminus or N terminus of CTCF. The difference in antibody binding site produces noticeably different profiles (Supplementary Note 11). f, Receiver operating characteristic (ROC) curves from aggregate peak calling with DiMeLo-seq targeting CTCF at 5–25× coverage, using ChIP–seq as ground truth. The inset shows the area under the curve (AUC) as a function of coverage. g, Distribution of differences between the single-molecule predicted peak center and the known CTCF motif, for single molecules within top-decile ChIP–seq peaks. h, ROC curve for binary classification of CTCF-targeted DiMeLo-seq reads to identify CTCF-bound molecules based on each read’s proportion of methylated adenines in peak regions (Supplementary Note 11). At an FPR of 5.7%, a TPR of 54% is achieved. i, Fraction of reads with a CTCF binding event detected in the peak region for each decile of ChIP–seq peak strength, for the CTCF-targeted sample and the IgG control, calculated using thresholds determined from the analysis in h. Error bars are smaller than the points and are therefore not shown. j, Number of motifs and reads displayed in Fig. 4a.
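The error bars in Extended Data Fig. 5b,c are 95% credible intervals for a ratio of two proportions (peak mA/A over background mA/A), obtained by sampling each proportion from its beta posterior. A minimal Monte Carlo sketch of that procedure follows. It assumes a uniform Beta(1, 1) prior as the "uninformative" prior (Jeffreys' Beta(0.5, 0.5) is another common choice; the caption does not say which was used), and the function name and counts are illustrative.

```python
import random
from typing import Tuple


def ratio_credible_interval(k1: int, n1: int, k0: int, n0: int,
                            draws: int = 10000, level: float = 0.95,
                            seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo equal-tailed credible interval for (k1/n1) / (k0/n0).

    k1/n1: methylated / total adenines in peak regions.
    k0/n0: methylated / total adenines in background regions.
    Each proportion gets a Beta(1 + k, 1 + n - k) posterior, i.e. a uniform
    Beta(1, 1) prior (an assumption).
    """
    rng = random.Random(seed)  # seeded so the interval is reproducible
    ratios = sorted(
        rng.betavariate(1 + k1, 1 + n1 - k1) / rng.betavariate(1 + k0, 1 + n0 - k0)
        for _ in range(draws)
    )
    lo_idx = int((1 - level) / 2 * draws)
    hi_idx = int((1 + level) / 2 * draws) - 1
    return ratios[lo_idx], ratios[hi_idx]
```

For example, `ratio_credible_interval(300, 1000, 100, 1000)` returns an interval around the point estimate of 3; because the background proportion is the smaller and noisier one, it dominates the interval's width.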

Extended Data Fig. 6 Control mA and mCpG profiles at CTCF peaks.

Profiles at CTCF ChIP–seq peaks for free pA-Hia5, IgG control, in vitro treated genomic DNA and untreated genomic DNA. Quartiles indicate the rank of ChIP–seq peak strength. All axes use the same scaling as in Fig. 4a, except for mA/A of in vitro treated gDNA. At the high mA levels achieved only in this in vitro methylated control, mC basecalling fails. However, if the Rerio model res_dna_r941_min_modbases_5mC_CpG_v001.cfg is used to call mCpG separately from mA, the mCpG profile is restored, as seen in the inset for the in vitro treated gDNA sample. Notably, as indicated by the y-axis scale in the inset, the detected mCpG levels are higher when mCpG is called separately from mA.

Extended Data Fig. 7 Phased CTCF-targeted DiMeLo-seq reads.

Phased reads across one region on chr6 and two regions on chrX illustrate haplotype-specific CTCF binding due to genetic and epigenetic differences between haplotypes. a, A region on chr6 within the human leukocyte antigen (HLA) locus that contains two CTCF binding sites and many heterozygous SNPs useful for phasing reads. Both CTCF binding sites overlap a heterozygous SNP within their binding motif. At the first CTCF site, the paternal SNP allele within the motif is associated with weak or no CTCF binding on the paternal haplotype, and the opposite is true at the second CTCF site. Thus, only one of these two neighboring sites tends to be bound on each haplotype, which is clearly visible on reads spanning both CTCF sites. Further, because CpG methylation patterns are similar between the two haplotypes, these binding differences are likely due to the genetic differences in or near the CTCF binding motifs themselves. b,c, Because the GM12878 cell line has two X chromosomes and was clonally derived, one X homolog (the paternally inherited one in this cell line) has undergone X inactivation and remains inactive in all cells. Shown here are one region with CTCF binding on the active X only (b) and one region with CTCF binding on the inactive X only (c). The haplotype-specific CTCF binding patterns in these chrX regions appear to be associated with haplotype-specific CpG methylation, as similarly seen for the imprinted H19 locus shown in Fig. 4d. d, Aggregate enrichment profiles from DiMeLo-seq reads across all CTCF sites on chrX, as in Fig. 4b. Each row in the heatmaps below the aggregate plots represents a single molecule centered at the CTCF motif. Notable strips of CpG-hypermethylated reads are visible on the active X, as observed previously (refs. 12,54).
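The caption above relies on assigning each read to a haplotype using heterozygous SNPs, but does not describe the phasing procedure. The sketch below shows one simple, hypothetical way to do it: a majority vote over the alleles a read carries at heterozygous sites. In practice a dedicated tool (for example, WhatsHap's `haplotag` command, working from a BAM file and a phased VCF) would normally be used; the dictionaries and helper name here are illustrative only.

```python
from typing import Dict, Optional


def assign_haplotype(read_bases: Dict[int, str],
                     hap1: Dict[int, str], hap2: Dict[int, str],
                     min_margin: int = 1) -> Optional[str]:
    """Assign a read to 'hap1' or 'hap2' by majority vote at heterozygous SNPs.

    read_bases: genomic position -> base observed on the read.
    hap1, hap2: genomic position -> allele on each haplotype (het sites only).
    Returns None if the read covers no informative SNPs or the vote margin
    is below `min_margin`.
    """
    votes1 = votes2 = 0
    for pos, base in read_bases.items():
        if pos not in hap1:
            continue  # not a heterozygous site
        if base == hap1[pos]:
            votes1 += 1
        elif base == hap2.get(pos):
            votes2 += 1
        # bases matching neither allele (sequencing errors) cast no vote
    if votes1 - votes2 >= min_margin:
        return "hap1"
    if votes2 - votes1 >= min_margin:
        return "hap2"
    return None
```

Long reads help here because a single molecule often spans several heterozygous sites, so the vote rarely rests on one possibly miscalled base.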

Extended Data Fig. 8 Comparison of PacBio and Nanopore sequencing platforms for detecting mA from DiMeLo-seq.

The same DNA from a DiMeLo-seq experiment targeting CTCF in GM12878 cells was sequenced on both the PacBio and Nanopore platforms, as was the same untreated GM12878 DNA. Methylated base calls are analyzed for reads spanning the top decile of CTCF ChIP–seq peaks. a, PacBio data. (i) Fraction of adenines methylated within ±100 bp of the CTCF motif center (“peak region”) as a function of IPD ratio for the CTCF-targeted sample and the untreated control. (ii) Fraction of adenines methylated in the peak region for the CTCF-targeted sample at various IPD ratio thresholds and minimum numbers of passes (1 to 5, indicated in the legend). (iii) Fraction of adenines methylated in the peak region for the CTCF-targeted sample divided by the fraction for the untreated control, as a function of IPD ratio and number of passes (1 to 5, indicated in the legend). (iv) Fraction of adenines methylated in the peak region for the CTCF-targeted sample versus the enrichment of CTCF-targeted methylation over the untreated control. b, Nanopore data. Same as a, but the threshold that varies is the probability of methylation rather than the IPD ratio and number of passes. c, For a given fraction of adenines methylated in the peak region (here 0.1, for illustration), the PacBio and Nanopore enrichment profiles are overlaid. The thresholds yielding 10% peak methylation on each platform are indicated; the number-of-passes threshold for PacBio is one.
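The platform comparison above reduces to two quantities at each threshold: the fraction of adenines called methylated within ±100 bp of the CTCF motif center, and that fraction in the targeted sample divided by the untreated control. A minimal sketch, assuming each adenine observation has been reduced to a tuple of (position relative to motif center, score, number of passes), where the score is an IPD ratio for PacBio or an mA probability for Nanopore and passes can be set to 1 for Nanopore. This tuple layout and the function names are hypothetical.

```python
from typing import Iterable, Tuple

# (position relative to CTCF motif center in bp, score, number of passes)
Call = Tuple[int, float, int]


def peak_fraction(calls: Iterable[Call], threshold: float,
                  half_width: int = 100, min_passes: int = 1) -> float:
    """Fraction of peak-region adenines whose score is at least `threshold`.

    Only observations within +/- half_width bp and with >= min_passes are used.
    """
    in_peak = [s for pos, s, passes in calls
               if abs(pos) <= half_width and passes >= min_passes]
    if not in_peak:
        return 0.0
    return sum(1 for s in in_peak if s >= threshold) / len(in_peak)


def enrichment(target_calls: Iterable[Call], control_calls: Iterable[Call],
               threshold: float, half_width: int = 100, min_passes: int = 1) -> float:
    """Peak-region methylated fraction in the targeted sample over the untreated control."""
    tgt = peak_fraction(target_calls, threshold, half_width, min_passes)
    ctrl = peak_fraction(control_calls, threshold, half_width, min_passes)
    return tgt / ctrl if ctrl > 0 else float("inf")
```

Sweeping `threshold` (and, for PacBio, `min_passes`) and plotting `peak_fraction` against `enrichment` gives a trade-off curve like panel (iv).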

Extended Data Fig. 9 H3K9me3 control analysis at HOR boundaries and in centromere 7.

a, Density of methylated adenines for the H3K9me3-targeted sample and the IgG and free pA-Hia5 controls in 100-kb sliding windows across HOR boundaries 1p, 2pq, 6p, 9p, 13q, 14q, 15q, 16p, 17pq, 18pq, 20p, 21q and 22q. b, Single-molecule browser tracks of centromere 7 for the H3K9me3-targeted sample, IgG control and free pA-Hia5. The same molecules are shown in both plots, with mA calls indicated in the first and mCpG calls in the second. c, Coverage tracks in 10-kb bins accompanying the mA/A and mCpG/CpG tracks in Fig. 5d.
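Panel a above summarizes methylated adenines in 100-kb sliding windows. A minimal sketch of a sliding-window density, assuming a sorted list of genomic positions of called mAs is available. The 10-kb step and the normalization (mA per kb, rather than per adenine or per unit of coverage) are assumptions, since the caption specifies only the window size, and the function name is hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple


def sliding_density(ma_positions: Sequence[int], start: int, end: int,
                    window: int = 100_000, step: int = 10_000) -> List[Tuple[int, float]]:
    """Methylated adenines per kb in windows of `window` bp sliding by `step` over [start, end).

    Returns (window center, mA per kb) for every window that fits entirely in range.
    """
    pos = sorted(ma_positions)
    out: List[Tuple[int, float]] = []
    w0 = start
    while w0 + window <= end:
        lo = bisect.bisect_left(pos, w0)           # first mA at or after window start
        hi = bisect.bisect_left(pos, w0 + window)  # first mA past window end
        out.append((w0 + window // 2, (hi - lo) * 1000 / window))
        w0 += step
    return out
```

Binary search keeps each window query logarithmic in the number of calls, which matters when scanning megabase-scale centromeric arrays.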

Extended Data Fig. 10 AlphaHOR-RES centromere enrichment and methylation within chromosome X and chromosome 3 HORs.

a, Simulated cumulative distribution of the proportion of alpha-satellite DNA lost (black) and non-centromeric DNA kept (blue) after MscI and AseI digestion of the T2T CHM13 genome at different size-selection cutoffs. b, High-contrast (top) and low-contrast (bottom) images of an agarose gel run on total genomic DNA after MscI and AseI digestion. The sample was recovered from above the cut site (arrow). Representative image of at least 4 replicates. c, Genomic DNA TapeStation gel image of the sample before digestion, after digestion and after size selection. Representative image of at least 3 replicates. d, Coverage of the active HOR on each chromosome of the CHM13 + HG002X + hg38Y reference genome from free-floating pA-Hia5 DiMeLo-seq libraries with and without AlphaHOR-RES. e–g, Single-molecule views with individual reads in grey and mA depicted as dots for the indicated conditions. The scale bar indicates the probability of adenine methylation (from Guppy) between 0.6 and 1. Regions of at least 10 kb lacking unique 51-bp k-mers are shown in grey to illustrate locations that are difficult to map with short reads. e, chrX CDR (57.45–57.7 Mb); f, chromosome 3 HOR between 91.91 and 91.97 Mb; g, chromosome 3 HOR between 95.94 and 96.00 Mb.

Source data
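Extended Data Fig. 10a simulates MscI and AseI digestion of the CHM13 genome followed by size selection. A minimal sketch of such an in silico digest on a single sequence, assuming the standard recognition sites and top-strand cut positions (MscI TGG^CCA, AseI AT^TAAT; both sites are palindromic, so scanning one strand finds them all) and ignoring incomplete digestion and methylation sensitivity. Splitting the result into alpha-satellite versus non-centromeric DNA would additionally require the centromere annotation BED files, which are not modeled; function names are hypothetical.

```python
import re
from typing import List

# Recognition site and top-strand cut offset for each enzyme.
ENZYMES = {
    "MscI": ("TGGCCA", 3),  # TGG^CCA (blunt)
    "AseI": ("ATTAAT", 2),  # AT^TAAT
}


def cut_positions(seq: str) -> List[int]:
    """Sorted 0-based positions at which the top strand is cut."""
    seq = seq.upper()
    cuts = set()
    for site, offset in ENZYMES.values():
        # Zero-width lookahead so that overlapping sites are all reported.
        for m in re.finditer(f"(?={site})", seq):
            cuts.add(m.start() + offset)
    return sorted(cuts)


def fragment_lengths(seq: str) -> List[int]:
    """Lengths of the fragments produced by a complete double digest."""
    bounds = [0] + cut_positions(seq) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def fraction_kept(seq: str, min_size: int) -> float:
    """Fraction of bases in fragments of at least `min_size` bp (kept by size selection)."""
    lengths = fragment_lengths(seq)
    total = sum(lengths)
    return sum(n for n in lengths if n >= min_size) / total if total else 0.0
```

Evaluating `fraction_kept` over a range of `min_size` values for alpha-satellite and non-centromeric sequences separately would give curves of the kind shown in panel a (with the alpha-satellite fraction lost equal to 1 minus the fraction kept).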

Supplementary information

Supplementary Information

Supplementary Tables 1–3, Figs. 1 and 2 and Notes 1–15

Reporting Summary

Peer Review File

Source data

Source Data Extended Data Fig. 1

PDF file containing uncropped gel images shown in Extended Data Figs. 1a,b,d.

Source Data Extended Data Fig. 2

PDF file containing uncropped gel image corresponding to gel image shown in Extended Data Fig. 2c and uncropped images corresponding to images shown in Extended Data Fig. 2d.

Source Data Extended Data Fig. 2

Source data for graph in Extended Data Fig. 2e, fluorescence signal intensity per bead for the conditions in Extended Data Fig. 2e.

Source Data Extended Data Fig. 3

Unprocessed immunofluorescence micrographs in Extended Data Fig. 3c,d.

Source Data Extended Data Fig. 10

PDF file containing uncropped gel image corresponding to gel image shown in Extended Data Fig. 10b.

About this article

Cite this article

Altemose, N., Maslan, A., Smith, O.K. et al. DiMeLo-seq: a long-read, single-molecule method for mapping protein–DNA interactions genome wide. Nat Methods 19, 711–723 (2022). https://doi.org/10.1038/s41592-022-01475-6

  • DOI: https://doi.org/10.1038/s41592-022-01475-6
