Defining genome architecture at base-pair resolution

Abstract

In higher eukaryotes, many genes are regulated by enhancers that are 10⁴–10⁶ base pairs (bp) away from the promoter. Enhancers contain transcription-factor-binding sites (which are typically around 7–22 bp), and physical contact between the promoters and enhancers is thought to be required to modulate gene expression. Although chromatin architecture has been mapped extensively at resolutions of 1 kilobase and above, it has not been possible to define physical contacts at the scale of the proteins that determine gene expression. Here we define these interactions in detail using a chromosome conformation capture method (Micro-Capture-C) that enables the physical contacts between different classes of regulatory elements to be determined at base-pair resolution. We find that highly punctate contacts occur between enhancers, promoters and CCCTC-binding factor (CTCF) sites and we show that transcription factors have an important role in the maintenance of the contacts between enhancers and promoters. Our data show that interactions between CTCF sites are increased when active promoters and enhancers are located within the intervening chromatin. This supports a model in which chromatin loop extrusion (ref. 1) is dependent on cohesin loading at active promoters and enhancers, which explains the formation of tissue-specific chromatin domains without changes in CTCF binding.

Fig. 1: The comparison of MCC with other 3C techniques in erythroid cells at the promoters of the α-globin genes (Hba-a1 and Hba-a2) shows the considerably increased resolution afforded by MCC.
Fig. 2: MCC defines highly specific contacts between promoters and enhancers at many well-characterized loci.
Fig. 3: Contact profiles of the main CTCF boundary element (HS-38) at the α-globin locus are altered with changes in gene expression.
Fig. 4: Single-base-pair resolution analysis of MCC ligation junctions.

Data availability

Sequencing data have been submitted to the NCBI Gene Expression Omnibus (GSE144336 and GSE153256). Previously published data are available under the following accession codes: GSE67959 (ref. 8), GSE97871 (ref. 14), GSE44286 (ref. 18), GSE27921 (ref. 37), GSE30203 (ref. 38) and GSE51334 (ref. 39). DNase I hypersensitivity data are available for erythroid cells from ref. 40 and for ES cells from UW ENCODE (ref. 41). Source data are provided with this paper.

Code availability

The codes required for analysis of MCC data are available for academic use through the Oxford University Innovation software store (https://process.innovation.ox.ac.uk/software/p/16529a/micro-capture-c-academic/1).

References

  1. Sanborn, A. L. et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl Acad. Sci. USA 112, E6456–E6465 (2015).
  2. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
  3. Hsieh, T. S., Fudenberg, G., Goloborodko, A. & Rando, O. J. Micro-C XL: assaying chromosome conformation from the nucleosome to the entire genome. Nat. Methods 13, 1009–1011 (2016).
  4. Krietenstein, N. et al. Ultrastructural details of mammalian chromosome architecture. Mol. Cell 78, 554–565.e7 (2020).
  5. Hsieh, T. S. et al. Resolving the 3D landscape of transcription-linked mammalian chromatin folding. Mol. Cell 78, 539–553.e8 (2020).
  6. Schoenfelder, S. et al. The pluripotent regulatory circuitry connecting promoters to their long-range interacting elements. Genome Res. 25, 582–597 (2015).
  7. van de Werken, H. J. et al. Robust 4C-seq data analysis to screen for regulatory DNA interactions. Nat. Methods 9, 969–972 (2012).
  8. Davies, J. O. et al. Multiplexed analysis of chromosome conformation at vastly improved sensitivity. Nat. Methods 13, 74–80 (2016).
  9. Davies, J. O., Oudelaar, A. M., Higgs, D. R. & Hughes, J. R. How best to identify chromosomal interactions: a comparison of approaches. Nat. Methods 14, 125–134 (2017).
  10. Kornberg, R. D. Chromatin structure: a repeating unit of histones and DNA. Science 184, 868–871 (1974).
  11. Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (2012).
  12. Hughes, J. R. et al. Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat. Genet. 46, 205–212 (2014).
  13. Tan-Wong, S. M. et al. Gene loops enhance transcriptional directionality. Science 338, 671–675 (2012).
  14. Hanssen, L. L. P. et al. Tissue-specific CTCF–cohesin-mediated chromatin architecture delimits enhancer interactions and function in vivo. Nat. Cell Biol. 19, 952–961 (2017).
  15. Hentges, L. D., Sergeant, M. J., Downes, D. J., Hughes, J. R. & Taylor, S. LanceOtron: a deep learning peak caller for ATAC-seq, ChIP-seq, and DNase-seq. Preprint at https://doi.org/10.1101/2021.01.25.428108 (2021).
  16. He, Q., Johnston, J. & Zeitlinger, J. ChIP-nexus enables improved detection of in vivo transcription factor binding footprints. Nat. Biotechnol. 33, 395–401 (2015).
  17. Oudelaar, A. M. et al. Single-allele chromatin interactions identify regulatory hubs in dynamic compartmentalized domains. Nat. Genet. 50, 1744–1751 (2018).
  18. Whyte, W. A. et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 153, 307–319 (2013).
  19. Hay, D. et al. Genetic dissection of the α-globin super-enhancer in vivo. Nat. Genet. 48, 895–903 (2016).
  20. Canver, M. C. et al. BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis. Nature 527, 192–197 (2015).
  21. Ran, F. A. et al. Genome engineering using the CRISPR–Cas9 system. Nat. Protocols 8, 2281–2308 (2013).
  22. Trakarnsanga, K. et al. An immortalized adult human erythroid line facilitates sustainable and scalable generation of functional red cells. Nat. Commun. 8, 14750 (2017).
  23. Mettananda, S. et al. Editing an α-globin enhancer in primary human hematopoietic stem cells as a treatment for β-thalassemia. Nat. Commun. 8, 424 (2017).
  24. Bak, R. O., Dever, D. P. & Porteus, M. H. CRISPR/Cas9 genome editing in human hematopoietic stem cells. Nat. Protocols 13, 358–376 (2018).
  25. Scott, C. et al. Recapitulation of erythropoiesis in congenital dyserythropoietic anaemia type I (CDA-I) identifies defects in differentiation and nucleolar abnormalities. Haematologica https://doi.org/10.3324/haematol.2020.260158 (2020).
  26. Magoč, T. & Salzberg, S. L. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27, 2957–2963 (2011).
  27. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
  28. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
  29. Fornes, O. et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 48, D87–D92 (2020).
  30. Khan, A. et al. JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res. 46, D260–D266 (2018).
  31. Telenius, J. & Hughes, J. R. NGseqBasic - a single-command UNIX tool for ATAC-seq, DNaseI-seq, Cut-and-Run, and ChIP-seq data mapping, high-resolution visualisation, and quality control. Preprint at https://doi.org/10.1101/393413 (2018).
  32. Feng, J., Liu, T., Qin, B., Zhang, Y. & Liu, X. S. Identifying ChIP-seq enrichment using MACS. Nat. Protocols 7, 1728–1740 (2012).
  33. Zacher, B. et al. Accurate promoter and enhancer identification in 127 ENCODE and Roadmap Epigenomics cell types and tissues by GenoSTAN. PLoS ONE 12, e0169249 (2017).
  34. Fisher, R. A. Statistical Methods for Research Workers 5th edn (Oliver and Boyd, 1932).
  35. Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
  36. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
  37. Kowalczyk, M. S. et al. Intragenic enhancers act as alternative promoters. Mol. Cell 45, 447–458 (2012).
  38. Stadler, M. B. et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature 480, 490–495 (2011).
  39. Pope, B. D. et al. Topologically associating domains are stable units of replication-timing regulation. Nature 515, 402–405 (2014).
  40. Hosseini, M. et al. Causes and consequences of chromatin variation between inbred mice. PLoS Genet. 9, e1003570 (2013).
  41. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).


Acknowledgements

J.O.J.D. and P.H. are funded by an MRC Clinician Scientist Award (MRC Clinician Scientist Fellowship MR/R008108) to J.O.J.D. This work was also supported by a Medical Research Council Discovery Award led by D.R.H. (MC_PC_15069). L.D.H., J.R.H. and S.T. developed LanceOtron with support from the National Institutes of Health (USA) grant number R24DK106766. J.R.H. is supported by the MRC Molecular Haematology Unit (MC_UU_00016/14). M.B. is supported by an MRC Clinical Research Training Fellowship (MR/P019633/1). R. Kurita and Y. Nakamura from the RIKEN Tsukuba Branch provided HUDEP-2 cells. We thank V. Iotchkova for advice on the statistical analysis.

Author information

Contributions

J.O.J.D. conceived the project, designed, performed and analysed experiments, performed the majority of bioinformatic analyses and wrote the first draft of the manuscript. P.H. analysed data and performed experiments. L.L.P.H. assisted with the design of experiments, performed experiments and assisted with data analysis. M.B. and D.J.D. performed experiments and analysis. L.D.H., A.M.O., R.S. and S.T. contributed to analysis. D.M.J. and N.C. assisted with experiments. T.A.M. and J.R.H. assisted with experimental design and data analysis. D.R.H. provided funding and assisted with experimental design. All authors contributed to writing the manuscript.

Corresponding author

Correspondence to James O. J. Davies.

Ethics declarations

Competing interests

J.O.J.D., D.M.J. and J.R.H. are co-founders of Nucleome Therapeutics. J.O.J.D., J.R.H., D.J.D. and R.S. provide consultancy to the company and D.M.J. is an employee. J.O.J.D. and J.R.H. have filed a provisional patent application on this work (PCT/GB2020/050253).

Additional information

Peer review information Nature thanks Ralph Stadhouders and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Overview of experimental and computational workflow.

a, Cells are initially fixed with formaldehyde and then permeabilized with digitonin. They are subsequently treated with MNase at different concentrations. End repair and ligation are then performed. This results in the ligation of sequences that are in close proximity in the nucleus. DNA is then extracted to generate an MNase 3C library. This library is sonicated to a fragment size of around 200 bp. Illumina sequencing adaptors are added to the library. This library manufacture process is scaled up to maximize the amount of DNA available and the complexity of the libraries. Multiple samples with different sequencing indices are then mixed. The DNA is then denatured and mixed with a pool of biotinylated oligonucleotides. These 120-mer oligonucleotides were designed to capture the central portion of the hypersensitive site at the promoter or the central sequence of CTCF sites guided by a combination of motif analysis and DNase I footprinting. Following a hybridization reaction, a streptavidin bead pull down is performed and the uncaptured material is washed away. The material is PCR amplified and the oligonucleotide capture is repeated to improve the purity. The reads are then sequenced with 300-bp paired-end reads, which allows the entire sequence of each read to be determined as the DNA is fragmented to 200 bp by sonication. b, The overview of the data analysis. The raw FASTQ file is processed to reconstruct a single read from paired-end sequencing data. The single reads are then mapped to the 800-bp sequence surrounding the capture oligonucleotide using the non-stringent aligner BLAT. This enables the reads to be cut into ‘slices’ depending on whether they align to the sequence around the capture site. This strategy allows ligation junctions in the read to be determined with base-pair accuracy. The resulting FASTQ file is aligned to the genome using Bowtie 2 (ref. 28). 
This file is processed to remove PCR duplicates and the junctions of the ‘slices’ within the reads are identified. c, The different methods used for data visualization. Simple read pile-ups are generally used. However, the resolution can be further increased by reporting the precise base-pair position of the ligation junctions. As protein binding protects against DNA digestion by MNase, the regions of protein binding can be inferred from footprints in the junction plots that are similar to DNase I footprinting. More detailed localization of the protein-binding site that results in the interaction can be achieved by separating the junction profiles based on whether the read and therefore protein-binding site is upstream or downstream of the junction. Finally, single-base-pair resolution maps of junctions between the capture site and the peaks at regulatory elements can be generated. In the example above the central binding site of the two interacting CTCF sites is protected and the ligation junctions surround this. The direction of the reads at the capture and reporter sites can be used to identify the site of the proteins giving rise to the ligation junctions. This can be plotted with arrow plots. Here this shows that the central CTCF motifs at both the capture and reporter sites are the origin of the contacts between the two sites. This is more easily visualized using 3D surface plots. These were constructed by converting each data point into a rectangle 20 bp long and 4 bp wide in the direction of the reads giving rise to the interactions. This shows the central binding site of the two CTCF sites giving rise to the interactions and contacts between these central CTCF motifs and the neighbouring nucleosomes.
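The slicing and junction-calling steps in panel b can be illustrated with a minimal sketch. It assumes that each reconstructed read has already been cut into aligned slices; the `Slice` layout, the field names and the duplicate key are hypothetical and are not taken from the published MCC scripts.

```python
from typing import List, NamedTuple, Tuple


class Slice(NamedTuple):
    """One aligned piece of a reconstructed read (hypothetical layout)."""
    read_start: int  # 0-based offset of the slice within the read
    read_end: int    # exclusive end offset within the read
    chrom: str
    gen_start: int   # 0-based genomic start of the alignment
    gen_end: int     # exclusive genomic end of the alignment
    strand: str      # '+' or '-'


def slice_ends(s: Slice) -> Tuple[int, int]:
    """Genomic base at the read-leftmost and read-rightmost end of a slice."""
    if s.strand == '+':
        return s.gen_start, s.gen_end - 1
    return s.gen_end - 1, s.gen_start


def ligation_junctions(slices: List[Slice]) -> List[Tuple[str, int, str, int]]:
    """Return (chrom_a, pos_a, chrom_b, pos_b) for each pair of adjacent slices."""
    ordered = sorted(slices, key=lambda s: s.read_start)
    junctions = []
    for left, right in zip(ordered, ordered[1:]):
        _, left_pos = slice_ends(left)    # last read base of the left slice
        right_pos, _ = slice_ends(right)  # first read base of the right slice
        junctions.append((left.chrom, left_pos, right.chrom, right_pos))
    return junctions


def remove_duplicates(reads: List[List[Slice]]) -> List[List[Slice]]:
    """Keep one read per set of identical slice coordinates (assumed duplicate key)."""
    seen, unique = set(), []
    for slices in reads:
        key = tuple(sorted((s.chrom, s.gen_start, s.gen_end, s.strand) for s in slices))
        if key not in seen:
            seen.add(key)
            unique.append(slices)
    return unique
```

Because the junction is read off the slice boundaries rather than from binned coverage, each ligation event is reported as a pair of single genomic positions.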

Extended Data Fig. 2 Technical details of library preparation, reproducibility and biases.

a, Optimal MNase 3C library digestion keeps the nucleosome tails intact whereas over digestion leads to the loss of nucleosome linkers. This results in a failure of fragments to religate. Note that the digestion controls show that MNase cuts chromatin into mononucleosomes and that there are very few fragments over 1,000 bp. The fragment size is considerably smaller than DpnII-digested chromatin. b, MCC profiles of the Hba-a1 and Hba-a2 promoters in erythroid cells showing the interaction profiles derived from MNase-based 3C library preparation using a conventional NP-40-based nuclear extract, compared with data generated from intact cells permeabilized with digitonin. Data from three mice with two aliquots of cells from each mouse treated with two different concentrations of MNase are shown. Counts are normalized to the total number of reads across the genome. The assay is highly reproducible between replicates. To look for biases caused by MNase digestion, we sequenced the digestion controls. In addition, we sequenced the MNase 3C library without oligonucleotide capture to look for biases resulting from the ligation reaction. The global distribution of reads from the MNase digestion and the ligation junctions from the uncaptured library shows a very similar distribution to background (bottom two panels), without obvious biases towards the hypersensitive sites. c, Violin plots of the genome-wide analysis at different classes of element show the number of reads in a 1-kb window around different classes of element compared to control regions 10 kb downstream of the element generated by sequencing of the MNase digestion controls and ligation junctions in the unenriched MNase 3C library. Sequencing of the digestion controls shows no evidence of biases in MNase digestion at enhancers or CTCF sites. 
There is a small reduction in the number of reads at promoters, possibly due to the loss of smaller fragments from histone-depleted regions in the DNA extraction and sequencing library preparation. Conversely, sequencing of the ligation junctions reveals slightly higher numbers of junctions at promoters and regulatory elements including CTCF sites, which is probably due to the ligation process. d, Analysis of the DNA sequence at ligation junctions detected no biases towards ligation junctions in AT-rich sequences (which MNase is reported to cut preferentially). e, Metaplots of the junction count from the uncaptured MNase 3C library at DNase I hypersensitive sites show a small bias to the central 200 bp where there are more junctions, but this is partially offset because the fragment size is reduced in the hypersensitive sites. There was no correlation between the strength of the hypersensitive site and the number of junctions per kb within the site. A model of the background distribution of reads was generated to correct for this effect using a 20-bp moving window across the metaplot of the hypersensitive sites. f, Plots of single-normalized MCC data (to the total number of reads across the genome from the viewpoint) compared to double normalization, which corrects for the small bias at hypersensitive sites. This analysis shows that double normalization for the hypersensitive site effect does not significantly change the interaction profile compared with single normalization. Peak calling with the machine-learning-based peak caller LanceOtron of both single- and double-normalized data showed that 94% of the significant peaks remain unchanged by this correction (Supplementary Table 2).
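The difference between single and double normalization in panels e and f can be sketched as follows. The legend does not give the exact formula, so the per-million scaling, the mean-centring of the background and the function names are assumptions made for illustration only.

```python
def single_normalize(counts, total_reads, scale=1_000_000):
    """Scale raw counts to reads per `scale` total reads (the scale is an assumption)."""
    return [c * scale / total_reads for c in counts]


def moving_average(values, window=20):
    """Moving average over `window` positions; edges use the part that fits."""
    half = window // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + window)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def double_normalize(counts, total_reads, background_metaplot, window=20):
    """Divide single-normalized counts by a smoothed, mean-centred background."""
    smoothed = moving_average(background_metaplot, window)
    mean_bg = sum(smoothed) / len(smoothed)
    single = single_normalize(counts, total_reads)
    # Positions with zero background are left at their single-normalized value.
    return [s / (b / mean_bg) if b > 0 else s for s, b in zip(single, smoothed)]
```

With a flat background the second step divides by 1 everywhere, so the two normalizations coincide. This is consistent with the observation that the correction changes the profile very little when the hypersensitive-site bias is small.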

Source data

Extended Data Fig. 3 Comparison of MCC data generated using different capture sites at active genes in erythroid cells.

a–d, MCC profiles of the promoters of Hba-a1 and Hba-a2 (a), Hbb-b1 and Hbb-bs (b), Slc25a37 (c) and Cd47 (d) in erythroid and ES cells (anchor points are denoted by red arrows). The interaction profile was markedly different depending on whether the viewpoint was directly over the hypersensitive site at the promoter (central; blue) or shifted 1 kb upstream (red) or 1 kb downstream (green). Data are reported as read pile-ups without windowing and the number of reads is normalized to the total number of reads across the genome. Profiles of NG Capture-C data from the same viewpoint, DNase-seq and ChIP–seq data of H3K4me1, H3K4me3 and CTCF are also shown. Bottom, the MCC profile from the central viewpoint in ES cells as a control. At all genes, there are significantly stronger interactions between the central promoter and the known enhancers of the genes (denoted by green arrows) and CTCF-binding sites (denoted by pink or purple arrows depending on motif orientation). e, Heat maps showing the punctate nature of promoter–promoter and enhancer–enhancer contacts (compared to randomly chosen sites) using data from the unenriched Hi-C library. The data presented show a 3-kb region around the centre of the hypersensitive site with a 50-bp bin size.

Source data

Extended Data Fig. 4 Comparison of MCC data generated using capture sites at the promoters of active genes in erythroid and ES cells.

a, MCC profiles from the promoter of Myc showing long-range interactions with enhancers of the gene that are more than 1 Mb from the promoter. b, Analysis of the Sox2 locus showing long-range interactions that are 750 kb from the promoter of the gene, which precisely localize with CTCF-binding sites, DNase-seq (green, ENCODE UW) and ChIP–seq data of H3K4me1, H3K4me3 (ENCODE/LICR), NANOG (GSM1082342), SOX2 (GSM1082341), OCT4 (GSM1082340) (ref. 18) and CTCF. c, d, MCC profiles of the promoter of Klf4 (c) and Nanog (d) in ES cells. e, MCC profiles of Hbb (Hbb-bs), the inactive Hbb-y gene and the main HS-2 enhancer. Note that the profile from the inactive gene does not show contacts with the hypersensitive enhancers but that it has contacts with the surrounding chromatin compartment. f, MCC data using the promoter or enhancer of Sox2 as the viewpoint in ES cells and data of the enhancer in erythroid cells. g, The contacts of the promoter with the gene body for four genes that are transcribed in erythroid cells (Hemgn, Fhdc1, Epb4.9 and Btg2). These data show no evidence of gene looping between the promoter and the 3′ end of the gene. h, Metaplot of the number of junctions detected between the promoter and the 4-kb region surrounding the 3′ UTR. To account for distance effects, short genes (<10 kb) (left) and long genes (>10 kb) (right) are plotted separately. The comparison between erythroid cells, in which the genes are active, and ES cells, in which the genes are not transcribed, controls for distance and mapping effects. The dot plots show that there are no significant differences between the number of ligation junctions in 500-bp bins surrounding the 3′ UTR when genes are in active or inactive states. Short genes, n = 12; long genes, n = 18. Data are mean ± s.d., two-way ANOVA, no adjustments were made for multiple comparisons. This shows that there is no change in the number of contacts when the genes change from an active to an inactive state.

Source data

Extended Data Fig. 5 Promoters in gene-dense regions make contacts with multiple promoters.

a, Top, MCC profiles from the promoter of Klf1 showing long-range interactions with multiple promoters in the 400 kb surrounding the gene. However, it is likely that this gene is at least in part controlled locally as the region surrounding the promoter is monomethylated and bound by the erythroid transcription factor GATA1 (data not shown). Bottom, MCC data for Cldn13 showing contacts with 25 promoters and enhancers in the surrounding 1.5 Mb. Again, this gene is likely to be controlled at least in part by local regulatory elements. b, MCC profiles using the promoters of Atf5, Jund, Chd4, Gatad2a, Ddit3, Dedd2, Mafg and Gfer as viewpoints show interactions in gene-dense regions. c, The number of interacting elements within the 2-Mb region of each viewpoint. Notably, promoter–promoter interactions dominate in these gene-dense regions.

Source data

Extended Data Fig. 6 MCC of CTCF sites at loci that are active in erythroid or ES cells shows that highly punctate interactions occur between CTCF sites and that these correlate with the activity of the intervening chromatin.

Note that contacts from CTCF sites do not correlate strongly with DNase I hypersensitivity. a, Capture from the CTCF site downstream of the enhancers at the β-globin locus. This site interacts strongly and precisely with the CTCF site upstream of the genes. These sites are in a convergent orientation. These interactions are not present in ES cells when the gene is inactive despite there being similar levels of CTCF occupancy at these sites in both tissues. b, Capture at an intergenic CTCF site at the Cd47 locus in erythroid cells shows strong tissue-specific interactions between convergent sites upstream that are not present in ES cells when the gene is inactive. c, At the Slc25a37 locus (which encodes mitoferrin), the CTCF site downstream of the enhancers interacts strongly with the convergent CTCF site at the promoter of the gene in a tissue-specific manner. d, Strong and highly punctate contacts are seen between multiple CTCF sites at the gene-dense Klf1 locus. e, At the Myc locus, highly specific, punctate contacts occur between the CTCF sites on either side of the gene and its regulatory elements. These contacts correlate with transcription (the gene is transcribed more in ES cells than erythroid cells). f, Similarly, at the Sox2 locus, highly specific long-range contacts occur between CTCF sites and these are highly tissue-specific (they are absent in erythroid cells in which the gene is inactive). DNase-seq (green) and ChIP–seq of CTCF and RAD21 are shown for both erythroid cells and ES cells. The RAD21 and CTCF ChIP–seq are normalized using a spike-in of human cells.
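The legend repeatedly refers to CTCF sites in a convergent orientation. As a small illustration of that convention, the sketch below classifies a pair of motifs on the same chromosome from their positions and strands. The function name and the use of '+' and '-' to encode motif direction are assumptions for this example.

```python
def classify_ctcf_pair(pos_a, strand_a, pos_b, strand_b):
    """Classify two CTCF motifs on one chromosome by their relative orientation."""
    # Order the two sites by genomic position so that `up` is the upstream motif.
    (_, up), (_, down) = sorted([(pos_a, strand_a), (pos_b, strand_b)])
    if up == '+' and down == '-':
        return 'convergent'  # motifs point towards each other
    if up == '-' and down == '+':
        return 'divergent'   # motifs point away from each other
    return 'tandem'          # both motifs point the same way
```

The argument order does not matter, because the sites are sorted by position before the strands are compared.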

Extended Data Fig. 7 MCC of CTCF sites in ES cells shows that highly punctate interactions occur between CTCF sites over very long ranges.

a, At Pou5f1, tissue-specific contacts occur with a tissue-specific CTCF-binding site that is found in ES cells but not erythroid cells. b, At Klf4, we captured from a tissue-specific CTCF site, which contacts several CTCF sites in the same orientation in the vicinity. In erythroid cells, this sequence forms no specific contacts with the surrounding chromatin. DNase-seq and ChIP–seq of CTCF and RAD21 are shown for both erythroid and ES cells. c, Analysis of sequencing of the unenriched Hi-C-like MNase 3C library confirms highly punctate contacts between CTCF sites from genome-wide data (heatmap with 50-bp bins ±1,500 bp from the centre of the interacting CTCF sites). d, The number of contacts between viewpoint CTCF sites (n = 95) and other CTCF sites was calculated in two separate 300-kb windows, one upstream (red) and one downstream (blue) of the viewpoint CTCF site. H3K27ac was analysed separately for both the upstream and downstream 300-kb windows to provide a global estimate of the activity of the enhancers and promoters. The linked data points represent individual loci. Higher levels of H3K27ac on one side of a CTCF site correlated highly significantly with greater numbers of inter-CTCF site contacts (Wilcoxon signed-rank test). e, Metaplot of ligation junctions between CTCF sites. These are separated by the relative orientation and position of the viewpoint and interacting CTCF sites. This clearly shows that the orientation and relative position of the CTCF sites strongly determine contact frequency. f, Effect of CTCF orientation on ligation junction counts. The x axis displays the distance relative to the viewpoint corrected for orientation of the CTCF at the viewpoint; the y axis displays the junction count corrected for the orientation of the interacting CTCF site. Data are 6 replicates, 82 viewpoints, n = 14,010.
g, Proposed model for the way in which cohesin loading at active enhancer and promoter elements alters CTCF–CTCF contacts using the α-globin locus as a model. In erythroid cells, activity of the α-globin genes and enhancers leads to increased cohesin loading at these sites. This results in loop extrusion and subsequently increased contacts between the CTCF sites upstream of the enhancers and distal to the Hba-a1 and Hba-a2 promoters. These contacts do not occur in ES cells, despite very similar levels of CTCF binding, because the enhancers and promoters are inactive and do not load cohesin.
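The paired comparison in panel d can be sketched as below: contacts are counted in 300-kb windows on either side of each viewpoint, then ordered by which side carries more H3K27ac. The dictionary keys and function names are hypothetical, and the exact windowing used in the original analysis is not specified beyond the legend.

```python
def window_counts(viewpoint, contact_positions, window=300_000):
    """Count contacts in the windows upstream and downstream of a viewpoint."""
    upstream = sum(1 for p in contact_positions
                   if viewpoint - window <= p < viewpoint)
    downstream = sum(1 for p in contact_positions
                     if viewpoint < p <= viewpoint + window)
    return upstream, downstream


def pair_by_acetylation(loci):
    """Return (contacts on the high-H3K27ac side, contacts on the low side) per locus."""
    pairs = []
    for locus in loci:
        up, down = window_counts(locus['viewpoint'], locus['ctcf_contacts'])
        if locus['h3k27ac_up'] >= locus['h3k27ac_down']:
            pairs.append((up, down))
        else:
            pairs.append((down, up))
    return pairs
```

The two columns of the resulting pairs could then be passed to a paired non-parametric test such as `scipy.stats.wilcoxon`, which is one readily available implementation of the Wilcoxon signed-rank test named in the legend.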

Source data

Extended Data Fig. 8 Contacts from CTCF sites are highly correlated with cohesin.

a, MCC profiles of four CTCF sites with low co-occupancy of cohesin as measured by RAD21 ChIP–seq. These sites form no peaks with surrounding CTCF sites. b, When high levels of RAD21 coincide with CTCF, the sites form multiple contacts with surrounding CTCF sites. Profiles are included for DNase-seq and ChIP–seq of CTCF and RAD21 in erythroid cells.

Extended Data Fig. 9 Genome-wide analysis of inter-CTCF contacts and peak-calling analysis of MCC.

a, b, MCC profiles of the promoters at the α- and β-globin loci (Hba-a1 and Hba-a2, and Hbb-b1 and Hbb-bs, respectively), DNase-seq and ChIP–seq data of CTCF, RAD21, H3K27ac, NIPBL, GATA1, NF-E2 and KLF1. c, Metaplots of RAD21 and CTCF-binding density (RPKM) at the two nearest CTCF-binding sites flanking erythroid- and ES-cell-specific superenhancers in erythroid (red, n = 190) and ES (green, n = 462) cells. A significant difference in enrichment of RAD21 was found at CTCF sites flanking superenhancers in the different cell types. Higher levels of RAD21 were found at active ES superenhancers in ES cells compared with the same sequence when the enhancers are inactive in erythroid cells. Conversely, higher levels of RAD21 were found at active erythroid superenhancers in erythroid cells compared with the same sequence in ES cells. Binding density of RAD21 and CTCF at random sites (n = 500) was quantified as a control and was similar in both cell types. Box plots of RAD21 binding (RPKM) at the two CTCF-binding sites (1-kb region around the centre of the CTCF site) flanking erythroid- and ES-cell-specific superenhancers in erythroid (red) and ES (green) cells. Two-tailed Student’s t-test; box plots show the median, interquartile range, maximum points within 1.5× the interquartile range of quartile 1 or 3. d, Heat maps of 10-kb genomic regions surrounding promoters, enhancers or CTCF-binding sites showing DNase I hypersensitivity and ChIP–seq data for H3K27ac, H3K4me3, mediator (MED1), NIPBL, RAD21 and CTCF. The chromatin loader NIPBL is highly enriched at enhancers and promoters compared with CTCF sites. e, Manhattan plot showing highly significant peaks of interaction irrespective of the method of peak calling (–log10 transformations of the P values are plotted on the y axis). The data have been peak-called with three different orthogonal methods: MACS2, a custom Poisson-based model and a machine-learning-based model.
All of these methods calculate the enrichment over the background data, which has undergone targeted capture. f, Histogram of the percentage of peaks falling within the topologically associating domain (TAD) in erythroid cells from promoters and CTCF sites. g, Percentage of ligation junctions falling within the TAD in cis from erythroid promoters. n = 576, data are mean ± s.d. h, Analysis of the strength of contacts at promoters with different classes of element as categorized by GenoSTAN. This analysis shows that promoters broadly contact all classes of element but have a slight predilection to contact promoters and enhancers compared with CTCF sites. By contrast, CTCF sites preferentially contact other CTCF sites compared with other categories. Normalized total numbers of junctions in the 1-kb region surrounding different classes of elements within 400 kb of the viewpoint. n = 6, data are mean ± s.d., two-way ANOVA, Tukey’s multiple comparisons test.
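The legend does not describe the custom Poisson-based model in detail. As a generic illustration of how enrichment over a background rate can be turned into the –log10 P values shown on a Manhattan plot, the sketch below computes a Poisson upper-tail probability. The function names, the choice of background rate and the floor value are assumptions and do not reproduce the published method.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0, summed directly over the tail."""
    if k <= 0:
        return 1.0
    # P(X = k), computed in log space to avoid overflow for large k.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while term > 0.0 and term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return min(total, 1.0)


def neg_log10_p(observed, expected, floor=1e-300):
    """-log10 of the upper-tail P value; `floor` avoids log10(0) after underflow."""
    return -math.log10(max(poisson_upper_tail(observed, expected), floor))
```

Summing the tail directly, rather than computing one minus the cumulative probability, keeps very small P values from being lost to rounding. That matters when the values are plotted on a logarithmic axis.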

Source data

Extended Data Fig. 10 Base-pair resolution analysis of MCC ligation junctions shows that enhancer–promoter contacts generate complex patterns compared with CTCF sites.

a, Single-base-pair resolution plots of ligation junctions between two CTCF sites, showing that ligation junctions surround the central binding motifs, which are protected from MNase digestion. b, Superimposition of the directionality of the reads at the viewpoint and reporter site allows the precise location of the central CTCF-binding motif to be determined. c, Directional footprinting of the main enhancer element at the α-globin locus. d, Reconstructions of the contacts between the transcription factors and the promoter using directional footprinting of the viewpoint and enhancer. e, f, MCC footprinting shows more complex patterns of ligation junctions at enhancers than at CTCF sites at the Slc25a37 and Cd47 loci. g, h, Reconstruction metaplots of data from the strongest 150 CTCF–CTCF interactions and 65 enhancer–promoter interactions, showing the complexity of the enhancer–promoter contacts compared with contacts between CTCF sites.

Source data

Extended Data Fig. 11 Base-pair resolution analysis of MCC ligation junctions at pleiotropic enhancers in ES and erythroid cells.

a, To show that MCC could detect subtle changes in chromatin architecture, we captured from genes, such as Rad51c, that have active promoters in both erythroid and ES cells and adjacent hypersensitive enhancers at the same sequence in both cell types. b, c, MCC data at one of the shared enhancers delineate different patterns of footprinting in erythroid and ES cells. This shows that fine-scale alterations occur in the contacts at the same sequence in the two cell types, which is probably due to differences in the transcription factor repertoire. d–f, The gene Atp5a1 is also active in both erythroid (red) and ES cells (blue) (d) and there are contacts in both tissues with the promoter of Haus1 (e) and a local enhancer (f). At both of these sites, which have the same DNA sequence in the two tissues, the footprinting is clearly different. This shows fine-scale changes in the contact pattern that result from differences in the DNA-binding proteins in the two cell types.

Source data

Extended Data Fig. 12 MCC profiles at the main α-globin enhancer show the specific loss of contacts when an NF-E2 site is deleted.

a, Genome editing was used to make a small 2–4-bp deletion, determined by Illumina sequencing, in an NF-E2 consensus motif in the main enhancer at the α-globin locus (termed the R2 enhancer). b, ChIP–qPCR showing loss of NF-E2 binding at the R2 enhancer compared with enrichment at the adjacent R1 enhancer. Wild type, n = 6; NF-E2 edited, n = 8; data are mean ± s.d., two-tailed Mann–Whitney U-test. Note that complete abrogation of NF-E2 binding would not be expected because there are two NF-E2-binding sites in the enhancer, which are separated by 26 bp. c, Deletion of the NF-E2-binding site at the R2 enhancer results in a significant reduction in expression of the α-globin genes in erythroid cells from normal donors that have undergone genome editing at the R2 NF-E2 site (editing efficiencies in excess of 90% with Cas9 ribonucleoprotein, data not shown). n = 7, data are mean ± s.d., two-tailed Mann–Whitney U-test. d, e, MCC profiles of the promoters of HBA1 and HBA2 in human HUDEP-2 cells (d), showing that the interactions with the main enhancer (R2) are reduced very specifically (e) at the site of an engineered 2–4-bp deletion at the binding site of NF-E2. f, In addition, the MCC footprint is specifically altered at the NF-E2-binding site. g, Quantification of read depth at all other hypersensitive sites at the α-globin locus showing that the only statistically significant change is at the site of the deletion. Data from the edited clone were aligned to a modified hg19 genome with the deletion; n = 6, data are mean ± s.d., two-tailed Mann–Whitney U-test.
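The comparisons in this legend use a two-tailed Mann–Whitney U-test on small groups (for example, n = 6 versus n = 8). As a rough illustration of what that test computes, the sketch below derives the U statistic from midranks and an exact two-tailed P value by enumerating every way of assigning the pooled ranks to the two groups. The function name and the toy inputs are hypothetical; the statistical software used by the authors is not stated here and may handle ties or continuity corrections differently.

```python
from itertools import combinations


def mann_whitney_u_exact(x, y):
    """Exact two-tailed Mann-Whitney U test (illustrative sketch).

    Returns (U for group x, two-tailed P). Tied values receive midranks,
    and the P value is the fraction of all group labellings whose U is at
    least as far from its null mean as the observed U. Enumeration is
    only practical for small samples.
    """
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    pooled = sorted(x + y)
    # Assign midranks: tied values share the average of their ranks.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    ranks = [rank_of[v] for v in x + y]

    def u_from_rank_sum(rank_sum):
        return rank_sum - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    u_obs = u_from_rank_sum(sum(ranks[:n1]))
    deviation = abs(u_obs - mean_u)
    total = extreme = 0
    for idx in combinations(range(n1 + n2), n1):
        total += 1
        u = u_from_rank_sum(sum(ranks[k] for k in idx))
        if abs(u - mean_u) >= deviation - 1e-12:
            extreme += 1
    return u_obs, extreme / total


# Toy values for illustration only, not data from the paper.
u_stat, p_value = mann_whitney_u_exact([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

With three values per group, only 20 labellings exist, so the smallest achievable two-tailed P value is 0.1; this is why such tests need adequate group sizes to reach significance.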

Source data

Supplementary information

Supplementary Information

This file contains explanatory notes for the columns in Supplementary Table 2.

Reporting Summary

Supplementary Table 1

This file contains a summary sheet of the metrics of the sequencing data from the different experiments carried out. In addition, individual sheets for each experiment provide the sequencing data metrics for each sample and viewpoint. An analysis of the data with respect to topologically associating domains is also included.

Supplementary Table 2

This file contains a summary of all of the peak calls in the data, as determined by the three different methods of peak calling. See the Supplementary Information PDF for an explanation of the data columns in this table.

Cite this article

Hua, P., Badat, M., Hanssen, L.L.P. et al. Defining genome architecture at base-pair resolution. Nature 595, 125–129 (2021). https://doi.org/10.1038/s41586-021-03639-4
