Epigenomic maps identify gene regulatory elements by their chromatin state. However, prevailing short-read sequencing methods cannot effectively distinguish alleles, evaluate the interdependence of elements in a locus or capture single-molecule dynamics. Here, we apply targeted nanopore sequencing to profile chromatin accessibility and DNA methylation on contiguous ~100-kb DNA molecules that span loci relevant to development, immunity and imprinting. We detect promoters, enhancers, insulators and transcription factor footprints on single molecules based on exogenous GpC methylation. We infer relationships among dynamic elements within immune loci, and order successive remodeling events during T cell stimulation. Finally, we phase primary sequence and regulatory elements across the H19/IGF2 locus, uncovering primate-specific features. These include a segmental duplication that stabilizes the imprinting control region and a noncanonical enhancer that drives biallelic IGF2 expression in specific contexts. Our study advances emerging strategies for phasing gene regulatory landscapes and reveals a mechanism that overrides IGF2 imprinting in human cells.
Primary sequencing and processed data are available through the Gene Expression Omnibus under accession no. GSE183760. The following databases were used in this study: hg38 gencode annotation (V22), hg38 Ensembl Regulatory Build (release 101) and hg38 RefSeq v.39.
Source code for analysis is available at https://github.com/BernsteinLab/targetedNOMe_2022.
Baylin, S. B. & Jones, P. A. A decade of exploring the cancer epigenome – biological and translational implications. Nat. Rev. Cancer 11, 726–734 (2011).
Greenberg, M. V. C. & Bourc’his, D. The diverse roles of DNA methylation in mammalian development and disease. Nat. Rev. Mol. Cell Biol. 20, 590–607 (2019).
Lappalainen, T. & Greally, J. M. Associating cellular epigenetic models with human phenotypes. Nat. Rev. Genet. 18, 441–451 (2017).
Cavalli, G. & Heard, E. Advances in epigenetics link genetics to the environment and disease. Nature 571, 489–499 (2019).
Klemm, S. L., Shipony, Z. & Greenleaf, W. J. Chromatin accessibility and the regulatory epigenome. Nat. Rev. Genet. 20, 207–220 (2019).
ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Stunnenberg, H. G., International Human Epigenome Consortium & Hirst, M. The International Human Epigenome Consortium: a blueprint for scientific collaboration and discovery. Cell 167, 1145–1149 (2016).
Pardo, C. E., Darst, R. P., Nabilsi, N. H., Delmas, A. L. & Kladde, M. P. Simultaneous single-molecule mapping of protein-DNA interactions and DNA methylation by MAPit. Curr. Protoc. Mol. Biol. 21, 21.22 (2011).
Kelly, T. K. et al. Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules. Genome Res. 22, 2497–2506 (2012).
Wang, Y. et al. Single-molecule long-read sequencing reveals the chromatin basis of gene expression. Genome Res. 29, 1329–1342 (2019).
Shipony, Z. et al. Long-range single-molecule mapping of chromatin accessibility in eukaryotes. Nat. Methods 17, 319–327 (2020).
Lee, I. et al. Simultaneous profiling of chromatin accessibility and methylation on human cell lines with nanopore sequencing. Nat. Methods 17, 1191–1199 (2020).
Stergachis, A. B., Debo, B. M., Haugen, E., Churchman, L. S. & Stamatoyannopoulos, J. A. Single-molecule regulatory architectures captured by chromatin fiber sequencing. Science 368, 1449–1454 (2020).
Abdulhay, N. J. et al. Massively multiplex single-molecule oligonucleosome footprinting. eLife 9, e59404 (2020).
Giesselmann, P. et al. Analysis of short tandem repeat expansions and their methylation state with nanopore sequencing. Nat. Biotechnol. 37, 1478–1481 (2019).
Gilpatrick, T. et al. Targeted nanopore sequencing with Cas9-guided adapter ligation. Nat. Biotechnol. 38, 433–438 (2020).
Wallace, A. D. et al. CaBagE: a Cas9-based background elimination strategy for targeted, long-read DNA sequencing. PLoS ONE 16, e0241253 (2021).
Stubbington, M. J. T., Rozenblatt-Rosen, O., Regev, A. & Teichmann, S. A. Single-cell transcriptomics to explore the immune system in health and disease. Science 358, 58–63 (2017).
Shema, E., Bernstein, B. E. & Buenrostro, J. D. Single-cell and single-molecule epigenomics to uncover genome regulation at unprecedented resolution. Nat. Genet. 51, 19–25 (2019).
Sönmezer, C. et al. Molecular co-occupancy identifies transcription factor binding cooperativity in vivo. Mol. Cell 81, 255–267 (2021).
Pott, S. Simultaneous measurement of chromatin accessibility, DNA methylation, and nucleosome phasing in single cells. eLife 6, e23203 (2017).
Li, P. et al. BATF–JUN is critical for IRF4-mediated transcription in T cells. Nature 490, 543–546 (2012).
Wu, J. et al. Ablation of transcription factor IRF4 promotes transplant acceptance by driving allogenic CD4+ T cell dysfunction. Immunity 47, 1114–1128 (2017).
Huber, M. & Lohoff, M. IRF4 at the crossroads of effector T-cell fate decision. Eur. J. Immunol. 44, 1886–1895 (2014).
Bell, A. C. & Felsenfeld, G. Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature 405, 482–485 (2000).
Harding, F. A., McArthur, J. G., Gross, J. A., Raulet, D. H. & Allison, J. P. CD28-mediated signalling co-stimulates murine T cells and prevents induction of anergy in T-cell clones. Nature 356, 607–609 (1992).
Lenschow, D. J., Walunas, T. L. & Bluestone, J. A. CD28/B7 system of T cell costimulation. Annu. Rev. Immunol. 14, 233–258 (1996).
Ji, Z. & Ji, H. TSCAN: pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 44, e117 (2016).
Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).
Haghverdi, L., Büttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).
Tucci, V., Isles, A. R., Kelsey, G. & Ferguson-Smith, A. C., Erice Imprinting Group. Genomic imprinting and physiological processes in mammals. Cell 176, 952–965 (2019).
Noordermeer, D. & Feil, R. Differential 3D chromatin organization and gene activity in genomic imprinting. Curr. Opin. Genet. Dev. 61, 17–24 (2020).
Monk, D., Mackay, D. J. G., Eggermann, T., Maher, E. R. & Riccio, A. Genomic imprinting disorders: lessons on how genome, epigenome and environment interact. Nat. Rev. Genet. 20, 235–248 (2019).
Court, F. et al. Genome-wide parent-of-origin DNA methylation analysis reveals the intricacies of human imprinting and suggests a germline methylation-independent mechanism of establishment. Genome Res. 24, 554–569 (2014).
Hark, A. T. et al. CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature 405, 486–489 (2000).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Ainscough, J. F., John, R. M., Barton, S. C. & Surani, M. A. A skeletal muscle-specific mouse Igf2 repressor lies 40 kb downstream of the gene. Development 127, 3923–3930 (2000).
Jones, B. K., Levorse, J. & Tilghman, S. M. Deletion of a nuclease-sensitive region between the Igf2 and H19 genes leads to Igf2 misregulation and increased adiposity. Hum. Mol. Genet. 10, 807–814 (2001).
Charalambous, M. et al. An enhancer element at the Igf2/H19 locus drives gene expression in both imprinted and non-imprinted tissues. Dev. Biol. 271, 488–497 (2004).
Freschi, A. et al. The number of the CTCF binding sites of the H19/IGF2:IG-DMR correlates with DNA methylation and expression imprinting in a humanized mouse model. Hum. Mol. Genet. 30, 1509–1520 (2021).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Zhang, Z. et al. Single nucleus transcriptome and chromatin accessibility of postmortem human pituitaries reveal diverse stem cell regulatory mechanisms. Cell Rep. 38, 110467 (2022).
Krietenstein, N. et al. Ultrastructural details of mammalian chromosome architecture. Mol. Cell 78, 554–565 (2020).
Johnstone, S. E. et al. Large-scale topological changes restrain malignant progression in colorectal cancer. Cell 182, 1474–1489 (2020).
Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21.29.1–21.29.9 (2015).
Joung, J. et al. Genome-scale CRISPR-Cas9 knockout and transcriptional activation screening. Nat. Protoc. 12, 828–863 (2017).
Schmiedel, B. J. et al. 17q21 Asthma-risk variants switch CTCF binding and regulate IL-2 production by T cells. Nat. Commun. 7, 13426 (2016).
We thank all members of the Bernstein laboratory for discussions, and C. Epstein and P. Arlotta’s laboratory for kindly providing cryopreserved H9 ESCs. This work was supported by funds from the NCI/NIH Director’s Fund (no. DP1CA216873 to B.E.B.), the Gene Regulation Observatory and the Variant-to-Function Initiative at the Broad Institute. J.W. is supported by a postdoctoral fellowship from the Damon-Runyon Cancer Research Foundation. Z.C. is supported by a K00 from the NCI (no. CA234842). B.E.B. is the Richard and Nancy Lubin Family Endowed Chair at the Dana-Farber Cancer Institute and an American Cancer Society Research Professor.
B.E.B. declares outside interests in Fulcrum Therapeutics, Arsenal Biosciences, HiFiBio, Cell Signaling Technologies and Chroma Medicine. The remaining authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Aggregate methylation values in 250 bp windows are compared between two replicates for each cell type. Pearson correlations are indicated.
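The replicate comparison described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input format (a list of `(position, call)` pairs with calls of 0 or 1) and the function names `window_means`, `pearson` and `replicate_correlation` are assumptions made for the example, and the toy data are invented.

```python
from math import sqrt


def window_means(calls, window=250):
    """Average binary methylation calls within fixed-size genomic windows.

    `calls` is an iterable of (position, call) pairs, where call is 1
    (methylated) or 0 (unmethylated). Returns {window_index: mean_call}.
    """
    sums, counts = {}, {}
    for pos, meth in calls:
        w = pos // window
        sums[w] = sums.get(w, 0) + meth
        counts[w] = counts.get(w, 0) + 1
    return {w: sums[w] / counts[w] for w in sums}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def replicate_correlation(rep1, rep2, window=250):
    """Correlate window-level methylation over windows covered in both replicates."""
    a, b = window_means(rep1, window), window_means(rep2, window)
    shared = sorted(set(a) & set(b))
    return pearson([a[w] for w in shared], [b[w] for w in shared])


# Invented toy calls for two replicates (hypothetical data).
rep1 = [(10, 1), (120, 1), (300, 0), (400, 1), (600, 0), (700, 0)]
rep2 = [(20, 1), (200, 1), (260, 0), (450, 1), (510, 0), (740, 0)]
r = replicate_correlation(rep1, rep2)
```

Only windows covered in both replicates enter the correlation. A real analysis would also need to handle windows with zero variance and minimum-coverage filters, which this sketch omits.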
Genomic loci ranging from 50 to 116 kb in size were targeted for sequencing by Cas9 cleavage. Plots for reads sorted in ascending order by size show read lengths (top), percent of full-length reads (middle), and coverage (bottom) for each locus in data acquired for 11 genomic loci targeted in GM12878 and K562 (a), 5 loci targeted in H9 ESCs and HSMM (b), and 22 loci targeted in resting and activated CD4+ T cells (c).
a, Histograms of unmethylated run lengths in GM12878 using three different approximations. Minimum length (top), midpoint (middle), and maximum length (bottom). Dashed red lines denote the lengths at which the modes of the density distributions occur. b, Plots of aggregated methylated runs in GM12878 around IL2RA, divided into linker and non-linker classifications (see Methods). The top track denotes methylated runs identified as linker DNA while the track below shows the open run signal after excluding linker runs. DNase peaks from ENCODE and gene model track are included as reference. c, Histogram shows length distribution for open runs in GM12878. d, Correspondence between DNase-seq and aggregated open runs. The first two tracks show DNase-seq signal and peaks from ENCODE for GM12878 and K562, respectively. The bottom two tracks show the aggregated open run signal and called peaks for the same cell types. e, Scatterplot comparing open run signal in GM12878 and K562. Each point represents either a DHS peak (identified in GM12878 DNase or K562 DNase or both) or a control peak. f, Histograms showing the distribution of open run signal in K562 (top) and GM12878 (bottom). The gray histograms represent control peaks while the magenta and lilac histograms represent peaks identified in K562 and GM12878 DNase, respectively. g, ROC curves generated by using the open run signal to predict the respective DNase peaks.
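One plausible reading of the three run-length approximations in panel a is sketched below. It assumes that the minimum length is the distance between the first and last unmethylated GpC of a run, that the maximum is the distance between the methylated GpCs flanking it, and that the midpoint is their average. These definitions and the function name `unmethylated_runs` are assumptions for illustration; the Methods should be consulted for the definitions actually used.

```python
def unmethylated_runs(positions, methylated):
    """Return (min_len, midpoint_len, max_len) for each unmethylated run on a read.

    `positions` holds sorted GpC coordinates on one read and `methylated`
    the matching boolean calls. Runs touching either end of the read are
    skipped because their maximum length cannot be bounded.
    """
    runs = []
    i, n = 0, len(positions)
    while i < n:
        if methylated[i]:
            i += 1
            continue
        # Extend j to the last consecutive unmethylated GpC.
        j = i
        while j + 1 < n and not methylated[j + 1]:
            j += 1
        if i > 0 and j < n - 1:
            min_len = positions[j] - positions[i]          # inner bound
            max_len = positions[j + 1] - positions[i - 1]  # outer bound
            runs.append((min_len, (min_len + max_len) / 2, max_len))
        i = j + 1
    return runs


# Invented single-read example (hypothetical coordinates).
positions = [100, 130, 160, 220, 280, 310, 340]
methylated = [True, True, False, False, False, True, True]
runs = unmethylated_runs(positions, methylated)  # [(120, 150.0, 180)]
```

The true protected footprint lies somewhere between the inner and outer bounds because GpC sites sample the molecule only sparsely, which is presumably why the caption reports all three estimates.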
Extended Data Fig. 4 Gene activity may be inferred based on accessibility and DNA methylation levels over their promoters and nearby enhancers.
a, Metaplot of open run signal in K562 cells centered at gene TSSs from 46 targeted genes. The open run peak is 60 bp upstream of the TSS. Boxplots refer to the open run distribution within a 150 bp window centered at 60 bp upstream of the TSS. The boxes in the boxplots indicate the median and the first and third quartiles (Q1 and Q3). Whiskers extend to the last observation within 1.5 times the interquartile range below and above Q1 and Q3, respectively. The rest of the observations, including maxima and minima, are shown as outliers. FPKM < 1 (grey): n = 26; FPKM < 10 (light blue): n = 7; FPKM > 10 (dark blue): n = 7. b, List of active and inactive genes in GM12878 used in Fig. 2d. An RNA-seq FPKM threshold of 1 was used to determine whether or not a gene is actively transcribed. c, Differential accessibility (open run signal) of 312 potential enhancer regions across 24 loci before (0 h), and after (24 h and 48 h) activation of primary CD4+ T cells with anti-CD3/CD28 antibodies. d, Metaplot of open run signal centered at gene TSSs from 40 targeted genes in resting T-cells and e, in stimulated T-cells. f, Heatmap of aggregated open run signal in resting T-cells centered at gene TSSs from 40 targeted genes. Gene names and RNA-seq FPKM values are labeled on the side with a blue color indicating FPKM > 1 and grey indicating FPKM < 1. g, Same plot as in (f) for stimulated T-cells. h, Expression and motif enrichment for TFs enriched within ATAC-seq sites gained in resting or stimulated T cells (P value < 1 × 10⁻¹⁰). The statistical test for motif enrichment is a one-sided hypergeometric test. The P value was not adjusted. ATAC-seq peaks were called by MACS2 callpeak with the parameters --nolambda --nomodel. We used findMotifsGenome.pl from the HOMER program with default parameters to identify TF motifs enriched in distal ATAC-seq peaks.
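The one-sided hypergeometric test named in panel h can be written out directly. The sketch below is a generic illustration of that test and not HOMER's implementation; the function name, argument names and example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total, with_motif, sample, sample_with_motif):
    """One-sided (upper-tail) hypergeometric P value.

    Probability of seeing at least `sample_with_motif` motif-containing peaks
    when `sample` peaks are drawn without replacement from `total` peaks,
    of which `with_motif` contain the motif.
    """
    upper = min(with_motif, sample)
    denom = comb(total, sample)
    num = sum(
        comb(with_motif, k) * comb(total - with_motif, sample - k)
        for k in range(sample_with_motif, upper + 1)
    )
    return num / denom


# Invented counts: 20 background peaks, 5 with the motif; 5 gained peaks, 3 with the motif.
p = hypergeom_enrichment_p(total=20, with_motif=5, sample=5, sample_with_motif=3)
```

Because the caption states that P values were not adjusted, no multiple-testing correction is applied here either.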
a, Scatterplot comparing average GpC methylation on each haplotype across a 52 kb targeted locus in GM12878 cells. Each dot represents a single GpC dinucleotide. Purple arrow indicates a site with 68.3% average GpC methylation in haplotype 1 and 20.3% in haplotype 2, measured across aggregated single molecules covering each allele. b, Top, single NOMe-seq reads (rows) from GM12878 cells split by haplotype and colored by GpC methylation reveal a ~200 base region of haplotype-specific chromatin accessibility within the ZPBP2 intronic region. Above the reads, open run tracks, aggregated by haplotype, show a TF footprint present in haplotype 1. Middle, the same reads are shown again, colored by SNP. The C > G SNP (rs12936231) creates a CTCF binding site in haplotype 1 (ref. 49). Bottom, CTCF ChIP-seq reads in GM12878 from ENCODE (ENCFF162QXM) show allele-specific CTCF binding to haplotype 1, which likely underlies the flanking allele-specific NOMe-seq accessibility. Reference sequence and CTCF binding motif are shown below ChIP-seq reads.
a, Flow cytometry for the T-cell activation markers CD69 and CD25 before (resting) and after (24 h and 48 h) stimulation of CD4+ T cells shows continuum of immune response. b, RNA-seq tracks show downregulation of CD28 and upregulation of nearby genes (CTLA4 and ICOS) after T-cell activation. Orange rectangle marks 78 kb locus enriched by RNPs. c, Inset shows targeted locus, which flanks CTLA4 and ICOS, captured with contiguous 78 kb NOMe-seq reads and profiled across a time course of CD4+ T-cell activation (0 h, 24 h and 48 h). Open runs (green) are shown for 75 (0 h), 67 (24 h) and 67 (48 h) individual reads (rows) and for the aggregate of each timepoint (above).
a, Aggregated open runs (green) and DNA methylation (red) from phased long reads for the imprinted SNRPN locus. The callouts below the tracks show 3 kb regions (chr15:24771500-24774501, chr15:24822120-24825121 and chr15:24856257-24859258) where SNPs, open runs (green), and DNA methylation (methyl-CpG in red; unmethylated CpG in blue) show strong allele specificity (calls based on 45 maternal and 41 paternal single molecules). Also shown are known DMRs (ref. 36), a novel DMR that overlaps the TSS of a SNRPN isoform (chr15:24855801-24859292), and a bi-allelic deletion in H9 ES cells (chr15:24871965-24873506). This confirms that reads assigned to a given parental allele based on genetics (SNPs) have concordant epigenetic landscapes. b, Single molecule correlation heatmap of DNA methylation for the GNAS imprinted locus. The 93 kb locus was split into non-overlapping 100 bp bins. Pairwise correlation of the average methylation within the bins on 433 individual reads was computed across the locus. c, DMRs were identified based on an adaptation of peak calling. PCA was performed on 111 individual reads using the average DNA methylation signal within the identified DMRs. The reads were then ordered by their first principal component values and colored by their DNA methylation (methyl-CpG in red; unmethylated CpG in blue). The dendrogram on the left shows the results of hierarchical clustering of these reads based on the first principal component. The reads were also assigned to alternate alleles based on heterozygous SNPs (green and purple bars next to the dendrogram). ‘Green’ and ‘purple’ alleles can be annotated as maternal and paternal, respectively, based on their epigenetic landscapes. This clustering confirms that reads assigned to a given parental allele based on their epigenetic landscapes have concordant genotypes.
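The read ordering in panel c can be illustrated with a small, dependency-free sketch. It assumes each read has already been summarized as a vector of average methylation values, one per DMR, and it finds the first principal component by power iteration rather than a full PCA. The function name `first_pc_scores` and the toy matrix are hypothetical, and the hierarchical clustering step is not reproduced.

```python
def first_pc_scores(matrix, iterations=200):
    """Project each row (read) onto the first principal component.

    `matrix` is a list of rows, one per read, each holding the average
    methylation of that read in every DMR. Columns are mean-centered and
    the leading component is found by power iteration on X^T X.
    """
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in matrix]
    # Arbitrary non-uniform starting vector (an assumption of this sketch).
    v = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        scores = [sum(r[j] * v[j] for j in range(m)) for r in centered]
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:  # no variance in the data
            break
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(m)) for r in centered]


# Invented reads x DMRs matrix: rows 0 and 2 resemble one allele, rows 1 and 3 the other.
reads = [
    [0.9, 0.1, 0.8],
    [0.1, 0.9, 0.2],
    [0.8, 0.2, 0.9],
    [0.2, 0.8, 0.1],
]
scores = first_pc_scores(reads)
order = sorted(range(len(reads)), key=lambda i: scores[i])
```

The sign of a principal component is arbitrary, so only the grouping of reads along the ordering is meaningful, not which group appears first.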
a, Schematic showing inversion of LINC01150 in the hg38 assembly. Reassembly of the region (chr11:1768505-2164321) with nanopore reads showed that LINC01150 is inverted and inserted downstream of TNNT3. b, Comparisons of H1-ESC Micro-C data (ref. 45) aligned to the hg38, repaired, and T2T-CHM13 assemblies. The color scale is set to saturate at no detectable interactions (white) to the 90th percentile of interaction values (red) in the selected region. The highlighted genes show the inversion that was corrected in the repaired assembly and in T2T-CHM13. c, Zoomed-in contact map aligned to the repaired assembly from (b) over the enriched H19 and IGF2 locus. CTCF ChIP-seq track is shown in black. Cartoon of alternate enhancer model shows position of the non-canonical enhancer that is not insulated by the ICR. d, Top, human H19/IGF2 locus conservation track shows that the non-canonical enhancer region lacks synteny in mouse. Bottom, mouse H19/Igf2 locus track shows position of previously described regulatory element (refs. 39–41). This element overlaps a CTCF anchor evident in multiple mouse tissues, including mouse heart (CTCF ChIP-seq track from ENCODE).
Extended Data Fig. 9 Examples of cell types with confirmed mono-allelic and bi-allelic expression of IGF2.
a,b, Top, expanded view of WGS reads shows heterozygous SNPs in the last exon of the IGF2 gene for Panc1 (chr11:2131663; rs7129583) (a) and HUES64 ESCs (chr11:2131029; rs79275529) (b). Bottom, RNA-seq detected a single allele on a large majority (>90%) of reads, confirming mono-allelic IGF2 expression in these human cell lines. c, Expanded views of RNA-seq reads in primary human osteoblast cells show three heterozygous SNPs (chr11:2129467; rs7873/chr11:2131992; rs184016711/chr11:2132958; rs2230949) with ~50% allele frequency, indicating bi-allelic IGF2 expression. d, Expanded views of RNA-seq reads in activated (ENCFF959SOP.bam) and naïve (ENCFF175STH.bam) CD8+ T cells from an individual show two heterozygous SNPs (chr11:2129214; rs2585/chr11:2131311; rs3168310) with ~50% allele frequency in the activated T cell population, consistent with bi-allelic IGF2 expression. No reads were detected over IGF2 in RNA-seq from naïve T cells, indicating low or no IGF2 expression in the resting state. Ref=reference sequence; Alt=alternate sequence.
Extended Data Fig. 10 Validation of bi-allelic IGF2 expression in human tissues and immune cells from gold-standard GTEx data.
a, The x axis shows 51 different tissue types for which the IGF2 gene contained informative (heterozygous) SNPs. The y axis indicates the percentage of samples among GTEx donors with confirmed bi-allelic IGF2 expression. Over 50% of samples with bi-allelic IGF2 correspond to one of the first six tissues (red). b–f, Expanded views of RNA-seq reads show heterozygous SNPs with 30–70% allele frequency in exemplar Whole Blood (chr11:2129467/rs7873) (b), Pituitary (chr11:2129214/rs2585) (c), Fibroblasts (chr11:2131311/rs3168310) (d), Cerebellum (chr11:2129214/rs2585) (e) and Esophagus (chr11:2129467/rs7873) (f) samples, confirming bi-allelic IGF2 expression. Ref=reference sequence; Alt=alternate sequence.
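The allele-frequency criteria quoted in the Extended Data Fig. 9 and 10 captions (30–70% for bi-allelic, more than 90% of reads from one allele for mono-allelic) can be expressed as a simple rule. The sketch below is illustrative only: the function name, the "ambiguous" and "not expressed" categories, and the read counts are assumptions, and no statistical test or minimum-coverage filter is applied.

```python
def classify_allelic_expression(ref_reads, alt_reads,
                                biallelic_range=(0.30, 0.70),
                                monoallelic_min=0.90):
    """Label a heterozygous SNP from RNA-seq allele counts.

    Thresholds follow the ranges quoted in the captions; intermediate
    cases are returned as 'ambiguous' (a category added for this sketch).
    """
    total = ref_reads + alt_reads
    if total == 0:
        return "not expressed"
    alt_fraction = alt_reads / total
    major_fraction = max(alt_fraction, 1 - alt_fraction)
    low, high = biallelic_range
    if low <= alt_fraction <= high:
        return "bi-allelic"
    if major_fraction > monoallelic_min:
        return "mono-allelic"
    return "ambiguous"


# Invented read counts (hypothetical).
balanced = classify_allelic_expression(48, 52)   # "bi-allelic"
skewed = classify_allelic_expression(97, 3)      # "mono-allelic"
```

In practice, a binomial test and a read-depth cutoff would usually accompany thresholds like these, since low-coverage SNPs can fall in either range by chance.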
Battaglia, S., Dong, K., Wu, J. et al. Long-range phasing of dynamic, tissue-specific and allele-specific regulatory elements. Nat Genet 54, 1504–1513 (2022). https://doi.org/10.1038/s41588-022-01188-8