Studies of genome regulation routinely use high-throughput DNA sequencing approaches to determine where specific proteins interact with DNA, and they rely on DNA amplification and short-read sequencing, limiting their quantitative application in complex genomic regions. To address these limitations, we developed directed methylation with long-read sequencing (DiMeLo-seq), which uses antibody-tethered enzymes to methylate DNA near a target protein’s binding sites in situ. These exogenous methylation marks are then detected simultaneously with endogenous CpG methylation on unamplified DNA using long-read, single-molecule sequencing technologies. We optimized and benchmarked DiMeLo-seq by mapping chromatin-binding proteins and histone modifications across the human genome. Furthermore, we identified where centromere protein A localizes within highly repetitive regions that were unmappable with short sequencing reads, and we estimated the density of centromere protein A molecules along single chromatin fibers. DiMeLo-seq is a versatile method that provides multimodal, genome-wide information for investigating protein–DNA interactions.
All raw fast5 sequencing data are available in the Sequence Read Archive (SRA) under BioProject accession PRJNA752170. These data were used to produce Figs. 2–6, Extended Data Figs. 1–10, Supplementary Tables 1–3 and Supplementary Fig. 2. The CTCF ChIP–seq peak bed file for GM12878 is available from ENCODE Project Consortium under accession ENCFF797SDL. The ATAC–seq peak bed file for GM12878 is available from ENCODE Project Consortium under accession ENCFF748UZH. Bulk and scDamID data were obtained from the Gene Expression Omnibus (GEO) under accession GSE156150. H3K9me3 CUT&RUN data are from Altemose et al.35 and accessible in the SRA with BioProject accession PRJNA752795. Data for Fig. 6c used CHM13 CENP-A ChIP–seq data for CENP-A k-mer analyses, which are available under BioProject accession PRJNA559484 from Logsdon et al.41. Centromere and HOR definition bed files from the telomere-to-telomere consortium can be found at https://github.com/marbl/chm13. Known CTCF motifs are from http://compbio.mit.edu/encode-motifs/matches.txt.gz. Data for the CpG methylation track in Fig. 6d were obtained from data available at https://github.com/nanopore-wgs-consortium/CHM13 (ref. 35). Source data are provided with this paper.
The code to reproduce the results in this manuscript is available on https://github.com/amaslan/dimelo-seq/.
van Steensel, B. & Henikoff, S. Identification of in vivo DNA targets of chromatin proteins using tethered dam methyltransferase. Nat. Biotechnol. 18, 424–428 (2000).
Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007).
Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods 4, 651–657 (2007).
Johnson, D. S., Mortazavi, A., Myers, R. M. & Wold, B. Genome-wide mapping of in vivo protein–DNA interactions. Science 316, 1497–1502 (2007).
Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837 (2007).
Skene, P. J. & Henikoff, S. An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. Elife 6, e21856 (2017).
Rivera, C. M. & Ren, B. Mapping human epigenomes. Cell 155, 39–55 (2013).
Sönmezer, C. et al. Molecular co-occupancy identifies transcription factor binding cooperativity in vivo. Mol. Cell 81, 255–267 (2021).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Abdulhay, N. J. et al. Massively multiplex single-molecule oligonucleosome footprinting. Elife 9, e59404 (2020).
Stergachis, A. B., Debo, B. M., Haugen, E., Churchman, L. S. & Stamatoyannopoulos, J. A. Single-molecule regulatory architectures captured by chromatin fiber sequencing. Science 368, 1449–1454 (2020).
Lee, I. et al. Simultaneous profiling of chromatin accessibility and methylation on human cell lines with nanopore sequencing. Nat. Methods 17, 1191–1199 (2020).
Shipony, Z. et al. Long-range single-molecule mapping of chromatin accessibility in eukaryotes. Nat. Methods 17, 319–327 (2020).
Wang, Y. et al. Single-molecule long-read sequencing reveals the chromatin basis of gene expression. Genome Res. 29, 1329–1342 (2019).
Schmid, M., Durussel, T. & Laemmli, U. K. ChIC and ChEC; genomic mapping of chromatin proteins. Mol. Cell 16, 147–157 (2004).
van Schaik, T., Vos, M., Peric-Hupkes, D., Hn Celie, P. & van Steensel, B. Cell cycle dynamics of lamina-associated DNA. EMBO Rep. 21, e50636 (2020).
O’Brown, Z. K. et al. Sources of artifact in measurements of 6mA and 4mC abundance in eukaryotic genomic DNA. BMC Genomics 20, 445 (2019).
Drozdz, M., Piekarowicz, A., Bujnicki, J. M. & Radlinska, M. Novel nonspecific DNA adenine methyltransferases. Nucleic Acids Res. 40, 2119–2130 (2012).
Lowary, P. T. & Widom, J. New DNA sequence rules for high affinity binding to histone octamer and sequence-directed nucleosome positioning. J. Mol. Biol. 276, 19–42 (1998).
Guelen, L. et al. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453, 948–951 (2008).
Meuleman, W. et al. Constitutive nuclear lamina–genome interactions are highly conserved and associated with A/T-rich sequence. Genome Res. 23, 270–280 (2013).
Altemose, N. et al. μDamID: a microfluidic approach for joint imaging and sequencing of protein–DNA interactions in single cells. Cell Syst. 11, 354–366 (2020).
Sobecki, M. et al. MadID, a versatile approach to map protein–DNA interactions, highlights telomere-nuclear envelope contact sites in human cells. Cell Rep. 25, 2891–2903 (2018).
Kind, J. et al. Genome-wide maps of nuclear lamina interactions in single human cells. Cell 163, 134–147 (2015).
Bell, A. C. & Felsenfeld, G. Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature 405, 482–485 (2000).
Song, L. et al. Open chromatin defined by DNase I and FAIRE identifies regulatory elements that shape cell-type identity. Genome Res. 21, 1757–1767 (2011).
Boyle, A. P. et al. High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res. 21, 456–464 (2011).
Klenova, E. M. et al. CTCF, a conserved nuclear factor required for optimal transcriptional activity of the chicken c-myc gene, is an 11-Zn-finger protein differentially expressed in multiple forms. Mol. Cell. Biol. 13, 7612–7624 (1993).
Lobanenkov, V. V. et al. A novel sequence-specific DNA binding protein which interacts with three regularly spaced direct repeats of the CCCTC-motif in the 5′-flanking sequence of the chicken c-myc gene. Oncogene 5, 1743–1753 (1990).
Ohlsson, R., Renkawitz, R. & Lobanenkov, V. CTCF is a uniquely versatile transcription regulator linked to epigenetics and disease. Trends Genet. 17, 520–527 (2001).
Rhee, H. S. & Pugh, B. F. Comprehensive genome-wide protein–DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011).
Kelly, T. K. et al. Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules. Genome Res. 22, 2497–2506 (2012).
Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162 (2019).
Gershman, A. et al. Epigenetic patterns in a complete human genome. Science 376, eabj5089 (2022).
Altemose, N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022).
McNulty, S. M. & Sullivan, B. A. Alpha satellite DNA biology: finding function in the recesses of the genome. Chromosome Res. 26, 115–138 (2018).
Rudd, M. K., Schueler, M. G. & Willard, H. F. Sequence organization and functional annotation of human centromeres. Cold Spring Harb. Symp. Quant. Biol. 68, 141–149 (2003).
Willard, H. F. & Waye, J. S. Hierarchical order in chromosome-specific human alpha satellite DNA. Trends Genet. 3, 192–198 (1987).
Miga, K. H. et al. Centromere reference models for human chromosomes X and Y satellite arrays. Genome Res. 24, 697–707 (2014).
Hayden, K. E. et al. Sequences associated with centromere competency in the human genome. Mol. Cell. Biol. 33, 763–772 (2013).
Logsdon, G. A. et al. The structure, function and evolution of a complete human chromosome 8. Nature 593, 101–107 (2021).
Lica, L. & Hamkalo, B. Preparation of centromeric heterochromatin by restriction endonuclease digestion of mouse L929 cells. Chromosoma 88, 42–49 (1983).
Smith, O. K. et al. Identification and characterization of centromeric sequences in Xenopus laevis. Genome Res. 31, 958–967 (2021).
Miga, K. H. et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature 585, 79–84 (2020).
Bodor, D. L. et al. The quantitative architecture of centromeric chromatin. Elife 3, e02137 (2014).
Aldrup-MacDonald, M. E., Kuo, M. E., Sullivan, L. L., Chew, K. & Sullivan, B. A. Genomic variation within alpha satellite DNA influences centromere location on human chromosomes with metastable epialleles. Genome Res. 26, 1301–1311 (2016).
Gilpatrick, T. et al. Targeted nanopore sequencing with Cas9-guided adapter ligation. Nat. Biotechnol. 38, 433–438 (2020).
Kovaka, S., Fan, Y., Ni, B., Timp, W. & Schatz, M. C. Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED. Nat. Biotechnol. 39, 431–441 (2021).
Gamba, R. et al. A method to enrich and purify centromeric DNA from human cells. Preprint at bioRxiv https://doi.org/10.1101/2021.09.24.461328 (2021).
Meers, M. P., Bryson, T. D., Henikoff, J. G. & Henikoff, S. Improved CUT&RUN chromatin profiling tools. Elife 8, e46314 (2019).
Cao, S., Zhou, K., Zhang, Z., Luger, K. & Straight, A. F. Constitutive centromere-associated network contacts confer differential stability on CENP-A nucleosomes in vitro and in the cell. Mol. Biol. Cell 29, 751–762 (2018).
Zhou, K. et al. CENP-N promotes the compaction of centromeric chromatin. Preprint at bioRxiv https://doi.org/10.1101/2021.06.14.448351 (2021).
Kim, B. Y. et al. Highly contiguous assemblies of 101 drosophilid genomes. Preprint at bioRxiv https://doi.org/10.1101/2020.12.14.422775 (2020).
Hellman, A. & Chess, A. Gene body-specific methylation on the active X chromosome. Science 315, 1141–1143 (2007).
We thank A. Stergachis for the plasmid encoding Hia5, G. Caldas for experimental training, and G. Karpen for helpful discussions. We thank Stanford University and the Stanford Research Computing Center for providing computational resources and support that contributed to these research results. We thank M. Tan for contributions to sequencing. This work was supported by the Chan Zuckerberg Biohub and by the NIGMS of the National Institutes of Health under award no. R35GM124916 to A.S. and R01 GM074728 to A.F.S. K.H.M. is supported by R21HG010548-01. O.K.S. and R.R.B. are supported by a National Institutes of Health T32 award, nos. GM113854-02 and GM007279-45, respectively. A.M., O.K.S. and R.R.B. are supported by NSF GRFP awards. N.A. is an HHMI Hanna H. Gray Fellow. A.S. is a Chan Zuckerberg Biohub Investigator, and a Pew Scholar in the Biomedical Sciences.
N.A., A.M., O.K.S., K.S., A.F.S. and A.S. are co-inventors on a patent application related to this work. The remaining authors declare no competing interests.
Peer review information
Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Lei Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 In vitro assessment of methylation of DNA and chromatin by pA-Hia5 and pAG-Hia5.
a,b, Agarose gel electrophoresis image of DpnI digestion of (unmethylated) plasmid DNA following incubation with Hia5, pA-Hia5 (a), or pAG-Hia5 (b) (Supplementary Note 1). Representative images of at least 2 replicates. c, Schematic of 1×601 DNA sequence. Grey box indicates the 601 sequence; yellow hexagon indicates the biotinylated end. d, Native polyacrylamide gel electrophoresis of naked 1×601 DNA or chromatinized 1×601 DNA before and after BsiWI digestion and glycerol gradient fractionation. Representative image of at least 2 replicates. e, Histogram (filled bars, left axis) and cumulative distribution (line traces, right axis) of fraction of methylation (mA/A) on reads from CENP-A 1×601 chromatin methylated with free pA-Hia5, CENP-A-directed pA-Hia5, IgG-directed pA-Hia5, or untreated. Left y-axis is truncated at 20 for better visualization. f, Percentage false discovery rate plotted against binned minimum mA probability score (Supplementary Note 4). Dotted lines indicate the threshold (0.6, 5% FDR). g,h, Receiver operating characteristic (ROC) curves comparing fraction of methylated reads from 1×601 CENP-A chromatin after CENP-A-directed methylation (true positive rate) to IgG-directed methylation (g) or no treatment (h) (false positive rate). Areas under the curves (AUC) range between 0.92 and 0.94 for g, and between 0.92 and 0.95 for h. i, Schematic of methylation of accessible DNA on 1×601 CENP-A chromatin co-incubated with free pA-Hia5 and SAM. j, Heatmap showing methylation on 5000 individual reads from CENP-A chromatin following incubation with free pA-Hia5. Blue indicates methylation above threshold (0.6). k, Line plot showing percentage of reads with methylation as a function of the minimum percentage of methylation on each read (methylation threshold, 0.6). Dotted line corresponds to methylation on at least 20% of each read (used in Fig. 2d).
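The per-read quantities in panels e and k can be illustrated with a minimal sketch. It assumes that a read's methylation level is the share of its adenines whose mA probability meets the 0.6 threshold, and that "at least 20% of each read" refers to that share. The function names and the toy probabilities are hypothetical and are not taken from the paper's code or data.

```python
def read_ma_fraction(probs, threshold=0.6):
    """Share of adenines on one read with mA probability >= threshold (mA/A)."""
    if not probs:
        return 0.0
    return sum(p >= threshold for p in probs) / len(probs)


def percent_reads_methylated(reads, min_fraction=0.2, threshold=0.6):
    """Percentage of reads whose mA/A is at least min_fraction."""
    if not reads:
        return 0.0
    hits = sum(read_ma_fraction(r, threshold) >= min_fraction for r in reads)
    return 100.0 * hits / len(reads)


# Toy per-adenine mA probabilities for three reads (illustrative values only).
reads = [
    [0.9, 0.7, 0.1, 0.2, 0.3],   # 2 of 5 adenines pass the threshold
    [0.1, 0.2, 0.3, 0.4, 0.5],   # none pass
    [0.65, 0.1, 0.1, 0.1, 0.1],  # 1 of 5 passes
]
pct = percent_reads_methylated(reads)
```

Sweeping `min_fraction` from 0 to 1 with this function would trace a curve of the kind described for panel k.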
Extended Data Fig. 2 In vitro assessment of methylation of 18×601 array chromatin by pA-Hia5 and pAG-Hia5.
a, Schematic showing the location of 601 sequences (grey boxes) and AvaI digestion sites (dashed line) in between 601 sequences on the 18×601 array. Yellow hexagons indicate biotinylation. b, Schematic of methylation of 18×601 chromatin reconstitution, incubation with free pA-Hia5 and SAM, and long-read sequencing of methylated DNA extracted from chromatin. c, Native polyacrylamide gel electrophoresis showing AvaI digested naked 18×601 array DNA or 18×601 chromatin array reconstituted with CENP-A or H3 (Supplementary Note 2). Representative gel image of at least 3 replicates. d, Representative immunofluorescence images of chromatin-coated beads following methylation using CENP-A-directed pA-Hia5. Scale bar, 3 microns. e, Violin plots of immunofluorescence signal on (denatured) chromatin-coated beads following antibody-directed methylation (Supplementary Note 5). Solid line, median; dashed lines, quartiles. n > 90 beads/condition. f, Histogram (filled bars, left axis) and cumulative distribution (line traces, right axis) of fraction of methylation (mA/A) on reads from CENP-A or H3 chromatin methylated with free pA-Hia5 or CENP-A-directed pA-Hia5. Left y-axis is truncated at 20 for better visualization. g,h, Heatmap showing methylation on 2000 individual reads from CENP-A chromatin methylation with free pA-Hia5, clustered over the entire 18×601 array (g) or a subset 4×601 region (Supplementary Note 4) along with cartoons depicting predicted nucleosome positions (red circles) (h). Insets below heatmaps show average mA/A on every base position of 18×601 array or 4×601 portion (red dashed line indicates 601 dyad position). i, Violin plot of nucleosomes detected per read on reads from CENP-A or H3 18×601 chromatin array methylated with free pA-Hia5, or CENP-A-directed pA-Hia5. Solid line, median; dashed lines, quartiles. n = 3000 reads. Statistical significance was calculated using Kruskal-Wallis test. ***, P-value < 0.0001; ns, P-value > 0.05. j, Histogram (filled bars, left axis) and cumulative distribution (line traces, right axis) of fraction of methylation (mA/A) on reads from CENP-A or H3 chromatin methylated with free pA-Hia5 or CENP-A-directed pA-Hia5. Left y-axis is truncated at 20 for better visualization. k,l, Same as g,h, but corresponding to H3 chromatin methylation with H3-directed pAG-Hia5.
Extended Data Fig. 3
a, The proportion of all adenines called as methylated at each possible mA probability score using two different software packages on ONT reads from two GM12878 DNA samples: untreated genomic DNA and purified genomic DNA methylated by Hia5 in vitro. The untreated DNA provides a measure of the false positive rate (FPR) at each score, since it contains few or no methyladenines. The Hia5-treated DNA provides a lower bound on the true positive rate (TPR) at each threshold. b, Estimates of the proportion of As methylated in the Hia5-treated DNA sample at each false discovery rate (FDR) threshold (FDR = FPR/(TPR + FPR), determined from a). At least 80% of the adenines on the Hia5-treated DNA appear to be methylated. c,d, In the DiMeLo-seq workflow, following the primary antibody and pA/G-MTase binding and wash steps, a sample of nuclei can be taken for quality assessment by immunofluorescence. One can determine the locations and relative quantity of pA/G-MTase molecules using fluorophore-conjugated antibodies that bind to the pA/G-MTase but not to the primary antibody. In these representative images, the results for pAG-EcoGII are shown, comparing different antibodies, detergents, and samples with (d) and without (c) the use of an unconjugated secondary antibody to recruit more pA/G-MTase molecules to the target protein. Scale bars representing 10 microns are shown in the FITC channel images as white lines.
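The FDR definition used in panel b, FDR = FPR/(TPR + FPR), can be made concrete with a short sketch. The function name and the example rates below are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def fdr_by_threshold(fpr, tpr):
    """Return FDR = FPR / (TPR + FPR) for paired rates at each score threshold."""
    out = []
    for f, t in zip(fpr, tpr):
        total = f + t
        # If no adenines are called methylated in either sample, FDR is undefined.
        out.append(f / total if total > 0 else float("nan"))
    return out


# Hypothetical rates at three increasingly strict probability thresholds.
# FPR: share of As called methylated in untreated DNA.
# TPR: share of As called methylated in Hia5-treated DNA (a lower bound).
fpr = [0.10, 0.02, 0.0]
tpr = [0.90, 0.38, 0.0]
fdr = fdr_by_threshold(fpr, tpr)
```

In this toy input the second threshold corresponds to a 5% FDR, the level marked for the 0.6 score threshold in Extended Data Fig. 1f.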
Extended Data Fig. 4 Demonstration of in vivo LMNB1-targeting and estimation of in situ sensitivity and specificity.
a, A browser view of chr7 comparing in vivo EcoGII-LMNB1 DamID (second track, green) to conventional LMNB1 in vivo DamID (first track, blue), and to LMNB1-targeted in situ DiMeLo-seq (fourth track, dark red). b, For an in situ LMNB1-targeting experiment using the final v2 protocol (#120 in Supplementary Table 1), the distributions of guppy mA probability scores across all A bases (q > 10) on all reads mapping to cLADs (gold, representing on-target methylation; n = 2.8 M) or ciLADs (blue, representing off-target methylation; n = 2.1 M). c, As in b, but showing the cumulative distributions for all mA calls above each probability score threshold, with the ratio between these plotted as a dotted line (using the right-hand y-axis). Vertical line indicates the stringent threshold of 0.9, at which cLADs have 20 times more mA as a proportion of all As (0.6%) than do ciLADs. If the threshold is reduced to 0.5, the fraction of As called as methylated increases to 2.5% but the cLAD:ciLAD ratio decreases to 15.6. d, On a per-read basis, for all reads with at least 500 A basecalls (q > 10) and using a mA probability threshold of 0.9, the distribution of mA/A called on each read for cLADs (n = 812 reads) vs. ciLADs (n = 827 reads). e, Receiver operating characteristic (ROC) curve showing, for different mA calling thresholds, the ability to classify individual reads from (d) as originating from cLADs or ciLADs using a simple linear threshold on mA/A. At a false positive rate of 6%, reads can be classified with a true positive rate of 59%, and this is similar for all mA thresholds used. The total area under the curve (AUC) for the p > 0.9 curve is 0.78. f, As in Fig. 3e, but for bulk conventional DamID raw coverage. The y axis is truncated to omit outliers for visualization (max = 300000), but these were not omitted for linear model and correlation computation. Error bars in x represent the proportion of 32 cells ± 2 standard errors of the proportion. Error bars in y represent the mean of n = 94 to 663 genomic bins ± 2 standard errors of the mean.
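The read-level classification in panel e, a single cutoff on per-read mA/A, can be sketched as follows. The AUC is computed here by pairwise comparison of on-target and off-target reads, which is equivalent to integrating the ROC curve; this is one standard way to obtain it and is not necessarily how the authors computed theirs. All names and values are hypothetical.

```python
def roc_point(pos, neg, threshold):
    """TPR and FPR when reads with mA/A >= threshold are called on-target."""
    tpr = sum(x >= threshold for x in pos) / len(pos)
    fpr = sum(x >= threshold for x in neg) / len(neg)
    return tpr, fpr


def auc(pos, neg):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy per-read mA/A values (illustrative only, not data from the paper).
clad_reads = [0.012, 0.008, 0.004, 0.001]   # on-target (cLAD) reads
cilad_reads = [0.003, 0.001, 0.000, 0.000]  # off-target (ciLAD) reads

point = roc_point(clad_reads, cilad_reads, threshold=0.004)
area = auc(clad_reads, cilad_reads)
```

The pairwise form is quadratic in the number of reads, which is acceptable for the several hundred reads per class reported in panel d.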
Extended Data Fig. 5
a, Enrichment profiles with mA probability threshold of 0.75 at the top quartile of ChIP–seq peaks for the DiMeLo-seq protocol v1 compared to four optimization conditions (opt1: 2 hour activation, 0.05 mM spermidine at activation, replenish SAM; opt2: 2 hour activation, 0.05 mM spermidine at activation, replenish SAM, 500 nM pA-Hia5; opt3: 2 hour activation, 0.05 mM spermidine at activation, replenish SAM, pA-Hia5 binding at 4 °C for 2 hours; opt4: 2 hour activation, no spermidine, 1 mM Ca²⁺ and 0.5 mM Mg²⁺ buffer) (Supplementary Note 11). b, Fold enrichment over background of mA/A in ChIP-seq peak regions. Error bars represent the 95% credible interval for each ratio of proportions determined by sampling proportions from posterior beta distributions computed with uninformative priors. c, mA/A in ATAC-seq peaks that do not overlap CTCF ChIP-seq peaks (grey) and mA/A in ATAC-seq peaks that do overlap CTCF ChIP-seq peaks (yellow). Error bars are computed as in (b). d, Methylation decay from the CTCF motif center for the top decile of ChIP-seq signal is fit with an exponential decay function. The positions of the peaks are indicated, with the spacing between peaks also noted. e, Methylation profiles at top quartile of ChIP-seq peaks when targeting the C-terminus or N-terminus of CTCF. The difference in antibody binding site produces noticeably different profiles (Supplementary Note 11). f, Receiver operating characteristic (ROC) curves from aggregate peak calling with DiMeLo-seq targeting CTCF at 5–25× coverage using ChIP-seq as ground truth. Inset shows area under the curve (AUC) as a function of coverage. g, The distribution of differences between our single-molecule predicted peak center and the known CTCF motif are plotted for single molecules within top decile ChIP-seq peaks. h, ROC curve for binary classification of CTCF-targeted DiMeLo-seq reads to identify CTCF-bound molecules based on each read’s proportion of methylated adenines in peak regions (Supplementary Note 11). At a FPR of 5.7%, a TPR of 54% is achieved. i, Fraction of reads that have a CTCF binding event detected in the peak region for each decile of ChIP-seq peak strength for the CTCF-targeted sample and IgG control. Calculated using thresholds determined from analysis in (h). Error bars do not extend beyond the points themselves so are not shown. j, Number of motifs and reads displayed in Fig. 4a.
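The credible intervals described for panel b can be approximated with a small Monte Carlo sketch. It assumes that "uninformative priors" means a uniform Beta(1, 1) prior on each proportion, which the caption does not specify; the function name, the counts and the number of draws are likewise hypothetical.

```python
import random


def ratio_credible_interval(k1, n1, k2, n2, draws=20000, level=0.95, seed=0):
    """Credible interval for p1/p2, where p1 ~ k1/n1 and p2 ~ k2/n2.

    Each proportion gets a Beta(1, 1) prior (an assumption), so its posterior
    is Beta(k + 1, n - k + 1). The ratio is sampled draw by draw.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(draws):
        p1 = rng.betavariate(k1 + 1, n1 - k1 + 1)
        p2 = rng.betavariate(k2 + 1, n2 - k2 + 1)
        ratios.append(p1 / p2)
    ratios.sort()
    lo_idx = int((1 - level) / 2 * draws)
    hi_idx = int((1 + level) / 2 * draws) - 1
    return ratios[lo_idx], ratios[hi_idx]


# Toy counts: 500 mA of 10,000 As inside peaks, 100 mA of 10,000 As outside.
low, high = ratio_credible_interval(500, 10_000, 100, 10_000)
```

Sampling the two posteriors independently and taking quantiles of the ratio avoids any closed-form approximation for the ratio of two beta variables.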
Extended Data Fig. 6
Profiles at CTCF ChIP-seq peaks for free pA-Hia5, IgG control, in vitro treated genomic DNA, and untreated genomic DNA. Quartiles indicate rank of ChIP-seq peak strength. All axes use the same scaling as in Fig. 4a, except for mA/A of in vitro treated gDNA. At the high mA levels reached only in this in vitro methylated control, mC basecalling fails. However, if the Rerio model res_dna_r941_min_modbases_5mC_CpG_v001.cfg is used for calling mCpG separately from mA, the mCpG profile is restored, as seen in the inset for the in vitro treated gDNA sample. Importantly, as indicated by the y-axis scale in the inset, if mCpG is called separately from mA, the detected mCpG levels are higher.
Extended Data Fig. 7
Phased reads across one region on chr6 and two regions on chrX illustrate haplotype-specific CTCF binding due to genetic and epigenetic differences between haplotypes. a, A region on chr6 within the human leukocyte antigen (HLA) locus which contains two CTCF binding sites and many heterozygous SNPs useful for phasing reads. Both CTCF binding sites overlap a heterozygous SNP within their binding motif. At the first CTCF site, the paternal SNP allele within the motif is associated with weak or no CTCF binding on the paternal haplotype, and the opposite is true at the second CTCF site. Thus, only one of these two neighboring sites tends to be bound on each haplotype, which is clearly visible on reads spanning both CTCF sites. Further, because CpG methylation patterns are similar between the two haplotypes, these binding differences are likely due to the genetic differences present in or near the CTCF binding motifs themselves. b,c, Because the GM12878 cell line has two X chromosomes and was clonally derived, one X homolog (the paternally inherited X homolog for this cell line) has undergone X inactivation and remains inactive in all cells. Shown here are one region with CTCF binding on the active X only (b) and one region with CTCF binding on the inactive X only (c). The haplotype-specific CTCF binding patterns in these chrX regions appear to be associated with haplotype-specific CpG methylation, as similarly seen for the imprinted H19 locus shown in Fig. 4d. d, Aggregate enrichment profiles from DiMeLo-seq reads across all CTCF sites on chrX are shown, as in Fig. 4b. Each row in the heatmaps below the aggregate plots represents a single molecule centered at the CTCF motif. Notable strips of CpG hypermethylated reads are visible on the active X, as observed previously12,54.
Extended Data Fig. 8 Comparison of PacBio and Nanopore sequencing platforms for detecting mA from DiMeLo-seq.
The same DNA from a DiMeLo-seq experiment targeting CTCF in GM12878 cells was sequenced on both PacBio and Nanopore. The same untreated GM12878 DNA was also sequenced on both platforms. Methylated base calls for reads spanning the top decile of CTCF ChIP-seq peaks are analyzed. a, PacBio data. (i) Fraction of adenines methylated within ±100 bp (“peak region”) of the CTCF motif center as a function of IPD ratio for the CTCF-targeted sample and the untreated control. (ii) Fraction of adenines methylated for CTCF-targeted sample in the peak region for various IPD ratio thresholds and number of pass thresholds (indicated in legend from 1 to 5). (iii) Fraction of adenines methylated in the peak region for CTCF-targeted sample over the fraction for the untreated control as a function of IPD ratio and number of passes (indicated in legend from 1 to 5). (iv) Fraction of adenines methylated in the peak region for CTCF-targeted sample versus the enrichment of CTCF-targeted methylation over the untreated control. b, Nanopore data. Same as in (a), but probability of methylation is the threshold that varies rather than IPD ratio and number of passes. c, For a given fraction of adenines methylated in the peak region, here 0.1 for illustration, the PacBio and Nanopore enrichment profiles are overlaid. The thresholds for each platform for 10% peak methylation are indicated and the number of passes threshold for PacBio is one.
Extended Data Fig. 9
a, Density of methylated adenines for the H3K9me3-targeted sample and IgG and free pA-Hia5 controls in 100-kb sliding windows across HOR boundaries 1p, 2pq, 6p, 9p, 13q, 14q, 15q, 16p, 17pq, 18pq, 20p, 21q, 22q. b, Centromere 7 single-molecule browser tracks for H3K9me3-targeted sample, IgG control, and free pA-Hia5. The same molecules are shown in both plots, with mA calls indicated in the first, and mCpG calls indicated in the second. c, Coverage tracks in 10-kb bins to accompany mA/A and mCpG/CpG tracks from Fig. 5d.
Extended Data Fig. 10 AlphaHOR-RES centromere enrichment and methylation within chromosome X and chromosome 3 HORs.
a, Simulated cumulative distribution of the proportion of alpha-satellite DNA lost (black) and non-centromeric DNA kept (blue) after MscI and AseI digestion of the T2T CHM13 genome at different size selection cutoffs. b, High (top) and low contrast (bottom) images of agarose gel run on total genomic DNA after MscI and AseI digestion. Sample recovered from above cut site (arrow). Representative image of at least 4 replicates. c, Genomic DNA TapeStation gel image of sample before digestion, after digestion, and after size selection. Representative image of at least 3 replicates. d, Coverage of the active HOR on each chromosome from the CHM13 + HG002X + hg38Y reference genome from free-floating pA-Hia5 DiMeLo-seq libraries with and without AlphaHOR-RES. e–g, Single-molecule view with individual reads in gray and mA depicted as dots for the indicated conditions. Scale bar indicates the probability of adenine methylation (from Guppy) between 0.6 and 1. Regions with at least 10 kb without unique 51 bp k-mers are shown in grey to illustrate locations that are difficult to map with short-read sequencing. e, ChrX CDR (57.45–57.7 Mb). f, Chromosome 3 HOR between 91.91 and 91.97 Mb. g, Chromosome 3 HOR between 95.94 and 96.00 Mb.
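The kind of in silico digestion and size selection summarized in panel a can be sketched on a toy sequence. The sketch assumes the commonly listed recognition sequences and cut positions TGG^CCA for MscI and AT^TAAT for AseI, considers only one strand, and uses hypothetical function names; it is not the authors' simulation, which was run on the full CHM13 assembly.

```python
import re

# Recognition site and cut offset within the site (assumed from common
# enzyme listings: MscI cuts TGG^CCA, AseI cuts AT^TAAT).
SITES = {"MscI": ("TGGCCA", 3), "AseI": ("ATTAAT", 2)}


def digest(seq, enzymes=("MscI", "AseI")):
    """Cut seq at every recognition site and return the fragments in order."""
    cuts = set()
    for name in enzymes:
        site, offset = SITES[name]
        # A lookahead match also finds overlapping occurrences of the site.
        for m in re.finditer("(?=%s)" % site, seq):
            cuts.add(m.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def fraction_kept(fragments, min_len):
    """Share of total bases in fragments that survive a size cutoff."""
    total = sum(len(f) for f in fragments)
    kept = sum(len(f) for f in fragments if len(f) >= min_len)
    return kept / total if total else 0.0


# Toy sequence with one MscI site and one AseI site (illustrative only).
seq = "A" * 10 + "TGGCCA" + "C" * 20 + "ATTAAT" + "G" * 5
frags = digest(seq)
kept = fraction_kept(frags, min_len=20)
```

Applying `fraction_kept` separately to alpha-satellite and non-centromeric fragments over a range of `min_len` values would give curves of the type shown in the panel.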
PDF file containing uncropped gel images shown in Extended Data Figs. 1a,b,d.
PDF file containing uncropped gel image corresponding to gel image shown in Extended Data Fig. 2c and uncropped images corresponding to images shown in Extended Data Fig. 2d.
Source data for graph in Extended Data Fig. 2e, fluorescence signal intensity per bead for the conditions in Extended Data Fig. 2e.
Unprocessed immunofluorescence micrographs in Extended Data Fig. 3c,d.
PDF file containing uncropped gel image corresponding to gel image shown in Extended Data Fig. 10b.
Altemose, N., Maslan, A., Smith, O.K. et al. DiMeLo-seq: a long-read, single-molecule method for mapping protein–DNA interactions genome wide. Nat Methods 19, 711–723 (2022). https://doi.org/10.1038/s41592-022-01475-6
Nature Methods (2022)