Dissection of the 4D chromatin structure of the α-globin locus through in vivo erythroid differentiation with extreme spatial and temporal resolution

Precise gene expression patterns during mammalian development are controlled by regulatory elements in the non-coding genome. Active enhancer elements interact with gene promoters within Topologically Associating Domains (TADs)1–3. However, the precise relationships between chromatin accessibility, nuclear architecture and gene activation are not completely understood. Here, we present Tiled-C, a new Chromosome Conformation Capture (3C) technology4, which allows for the generation of high-resolution contact matrices of loci of interest at unprecedented depth, and which can be optimized for as few as 2,000 cells of input material. We have used this approach to study the chromatin architecture of the mouse α-globin locus through in vivo erythroid differentiation. Integrated analysis of matched chromatin accessibility and single-cell expression data shows that the α-globin locus lies within a pre-existing TAD, which is established prior to activation of the domain. During differentiation, this TAD undergoes further sub-compartmentalization as regulatory elements gradually become accessible and specific interactions between enhancers and promoters are formed. As these chromatin changes develop, gene expression is progressively upregulated. Our findings demonstrate that chromatin architecture and gene activation are tightly linked during development and provide insights into the distinct mechanisms contributing to the establishment of tissue-specific chromatin structures.

A major current goal in biology is to characterize the three-dimensional nuclear architecture of the genome, to determine how this structure changes during differentiation and development, and to understand how these changes relate to gene expression. It has been shown that the genome is extensively reorganized during differentiation 5,6 and that interactions between enhancers and promoters can be established prior to expression, which is suggested to prime gene loci for future activation 7,8. However, it remains unclear precisely when specific interactions between enhancers and promoters are formed relative to activation of gene expression 9-11. To better understand the relationship between chromatin architecture and gene expression, it is crucial to characterize chromatin structure at high resolution in pure, primary cell populations representing relevant developmental stages. This has been hampered by the lack of high-resolution 3C methods that are suitable for the analysis of limited numbers of primary cells.
To overcome these hurdles, we developed Tiled-C, a new 3C-based approach 4, which can generate high-resolution contact matrices of selected regions of interest. Tiled-C uses a panel of capture oligonucleotides tiled across all restriction fragments of specified genomic regions, combined with an adapted Capture-C based protocol 12, to efficiently enrich for 3C contacts within these regions.
This allows for deep, targeted sequencing of chromatin interactions within regions of interest and thus for the generation of high-resolution, Hi-C-like data at unprecedented depth, across multiplexed samples and genomic regions (Methods). Tiled-C combines the ability of all vs all methods such as Hi-C 13 to map large-scale chromatin structures including TADs with the ability of one vs all methods such as 4C 14,15 and Capture-C 12,16 to robustly identify enhancer-promoter interactions within TADs in detail (Supplementary Figure 1,2). To validate the Tiled-C approach, we compared Tiled-C data to the deepest currently available in situ Hi-C datasets (mouse ES cells 5; Figure 1a). Tiled-C data at this region were ~28-fold higher in depth and required ~19-fold less sequencing (Supplementary Table 1 Table 2) allowing for the analysis of previously intractable primary cell types. This is critical for the investigation of 4D (3D structure through developmental time) genome organization, as cell numbers become extremely limiting at early stages of development.
We used Tiled-C to study the changes in chromatin structure associated with gene activation in primary cells during erythroid differentiation. Using fluorescence-activated cell sorting (FACS), we isolated sequential stages of erythroid differentiation directly from mouse fetal livers 18,19 (Figure 3a,b). We focused our analysis on a ~3.3 Mb region containing the well-characterized α-globin genes and their associated regulatory elements. The α-globin genes are regulated by five erythroid-specific enhancer elements (R1-R4 and Rm), which classify as a super-enhancer 20, and interact with the gene promoters within a TAD flanked by multiple CTCF-binding elements 21-24 (Supplementary Figure 2). We generated a single-cell RNA-seq dataset 25, which is the first scRNA-seq dataset to include the full course of in vivo erythroid differentiation through to terminal differentiation in the mouse (Supplementary Figure 5). This dataset shows that α-globin is expressed at basal levels in the S0 populations. Expression of α-globin increases dramatically during the S2 stage and plateaus at S3; however, the earliest cells showing elevated expression of α-globin are found in the S1 stage (Figure 3c). To validate that erythroid-specific α-globin upregulation begins at S1, we performed RNA-FISH to detect nascent transcription in FACS-sorted primary cells (Figure 3d). We detect a small increase in nascent transcription from S0-low to S0-medium cells, and confirm a robust increase in expression from S0 to S1 cells (P < 0.005 by paired T-test; Figure 3e).
We used the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) 26 to profile chromatin accessibility in these stages. Interestingly, we find that both enhancer and promoter elements are accessible prior to the onset of erythroid-specific gene expression, and that the degree of accessibility gradually increases, concomitant with upregulation of α-globin expression (Figure 4, Supplementary Figure 6). Tiled-C shows that a TAD structure encompassing the α-globin locus is present at the earliest stage (S0-low), prior to the formation of weak enhancer-promoter interactions in the S0-medium stage (Figure 4, Supplementary Figure 7). These enhancer-promoter interactions strengthen further in the subsequent S1 and S2 stages, accompanied by increases in α-globin expression and accessibility. In the S3 stage, where chromatin accessibility and expression reach their maximum levels, we observe strong enhancer-promoter interactions in a sub-compartmentalized chromatin structure similar to that observed in primary erythroblasts derived from mature spleen tissue (Figure 1, Supplementary Figure 1,2). This smaller sub-compartmentalized structure, which forms within the pre-existing TAD, is delimited by convergent CTCF-binding elements that flank the α-globin enhancers and genes (Supplementary Figure 2). We have previously shown that these CTCF-binding elements are functionally important to restrict the interactions of the α-globin enhancers and prevent other genes within the TAD, but outside of the sub-compartmentalized structure, from being upregulated 22 (Figure 4, Supplementary Figure 2). This suggests that this smaller erythroid-specific domain is likely formed by CTCF-dependent mechanisms similar to those that form TADs, although it is smaller in size (~70 kb) and has very high internal interaction frequencies compared to typical TADs.
Since both accessibility and the encompassing TAD structure are present prior to erythroid-specific α-globin expression, we purified early hematopoietic progenitor populations to investigate when in differentiation these features are established. Interestingly, we find that the pre-existing TAD containing the α-globin locus is already present in hematopoietic stem cells, despite four out of five enhancers and both promoters being inaccessible at this stage (Supplementary Figure 8).
To examine whether a similar order of events operates at other gene loci, we examined the chromatin architecture of Cpeb4, which is located ~420 kb upstream of the α-globin genes. Cpeb4 is an essential gene for terminal erythropoiesis 27 and is gradually upregulated during erythroid differentiation (Supplementary Figure 9). Tiled-C shows that the TAD encompassing Cpeb4 is established prior to erythroid-specific gene expression. Enhancer accessibility and a sub-
The early and gradual establishment of accessible regulatory elements is consistent with previous analysis of transcription factor binding at the α-globin locus during erythroid differentiation 28. Our findings are also consistent with previous chromatin conformation studies of the α- and β-globin loci in erythroid cell lines 29,30. Furthermore, our results fit with our recent analysis of the conformation of the α-globin locus in two stages of ex vivo erythroid differentiation. Using super-resolution microscopy, we showed that the α-globin locus forms a self-interacting domain early in differentiation, prior to maximal gene activation. As erythropoiesis proceeds and α-globin transcription is upregulated, this domain decondenses, while the flanking CTCF sites come into more frequent proximity 24. Here, we extend these observations in vivo at higher spatial and temporal resolution and provide new insights into the origins of these structures by analyzing earlier stages of erythroid differentiation.
Our findings are consistent with a model emerging from the work of several labs, in which pre-existing TADs are established prior to domain activation. During early differentiation, smaller substructures are formed, which facilitate interactions between enhancers and promoters to prime loci for gene activation. As cells differentiate further, these interactions are strengthened, concomitant with strong upregulation of gene activity. It has recently been shown for the globin loci that gene activation is associated with the formation of higher-order hub-like structures, in which multiple enhancers and promoters form simultaneous, specific interactions 23,31. Our data suggest that these structures may only be formed in the final stage of differentiation, when chromatin accessibility and interactions between enhancers and promoters are strongest, and may be important to achieve maximal gene expression. This model is further supported by recent live imaging experiments in Drosophila, in which gene activation only occurred upon the formation of tight associations between enhancers and a gene promoter, and not after induced enhancer-promoter proximity resulting from interactions between insulator elements 32. Moreover, it is consistent with previous analyses of the conformations of the Hoxd cluster 33 and the Shh locus 34 across tissue types. Interestingly, our model implies that there are multiple processes contributing to the formation of specific chromatin structures associated with gene activation. A pre-existing TAD encompassing the α-globin locus is formed prior to and thus independent of activation of the regulatory elements within the domain. This is likely driven by tissue-invariant loop extrusion mediated by cohesin and constitutive CTCF-binding elements 35. During differentiation, chromatin accessibility increases, and a smaller sub-domain is formed within this TAD.
We have previously shown that deletion of the CTCF-binding sites at the base of this sub-domain causes it to expand and leads to aberrant expression of the neighboring genes 22 . This indicates that sub-compartmentalization is dependent on these CTCF-binding sites and implies that its formation is mediated by loop extrusion. Since the CTCF-binding sites are constitutively occupied, erythroid-specific compartmentalization is likely driven by increased rate or processivity of loop extrusion in this region during differentiation.
As we have previously observed erythroid-specific accumulation of cohesin at the α-globin enhancers 22, it is possible that this is mediated by increased cohesin recruitment at the activated regulatory elements. This is further supported by studies showing that cohesin co-localizes with transcription factors across the genome 36,37.
Within the sub-compartmentalized domain, specific interactions between the activated enhancers and promoters are formed gradually during differentiation. Somewhat surprisingly, the initial formation of accessible elements significantly precedes the onset of specific enhancer-promoter interactions. This indicates that chromatin opening can occur independently of chromatin reorganization, yet further increases in accessibility do occur alongside the establishment and progressive strengthening of enhancer-promoter interactions, suggesting only a partial decoupling. This is consistent with our previous work, which showed that deletion of the α-globin enhancers does not affect formation of the α-globin TAD, but does affect interactions between the enhancers and promoters within the domain 21,24. This suggests that active regulatory elements play a role in the formation of tissue-specific chromatin structures, possibly mediated by interactions between the multi-protein complexes bound at these elements.
In conclusion, our dissection of the chromatin architecture of a well-understood gene locus during in vivo differentiation provides new insights into the mechanisms which regulate gene expression patterns during development. Importantly, Tiled-C provides an approach that enables such detailed analysis in cell types that were previously intractable.

Mature erythroid cells
Mature primary Ter119+ erythroblasts were obtained from spleens of female C57BL/6 mice treated with phenylhydrazine as previously described 12.

Mouse ES cells
Mouse ES cells were cultured and harvested as previously described 12 .

Erythroid progenitors
Primary erythroid progenitor cells were isolated from fetal livers, which were freshly isolated at

Replicates
The presented Tiled-C data derived from mature splenic erythroblasts and ES cells represent biological triplicates produced from separate mice or culture flasks, respectively. The presented Tiled-C data derived from hematopoietic and erythroid progenitor populations represent biological duplicates, with the exception of the S1 stage, for which we used a single biological replicate to generate technical duplicates. The presented ATAC data derived from hematopoietic and erythroid progenitor populations represent biological triplicates for the S0-low, S0-medium and S1 populations, biological duplicates for the S2 and S3 populations, and single replicates for the hematopoietic progenitor populations. The presented RNA-FISH data represent biological triplicates except for the brain and no-primary-antibody negative control, which represent biological duplicates.

Ethics
All protocols were approved through the Oxford University Local Ethical Review process and all experimental procedures were performed in accordance with European Union Directive 2010/63/EU and/or the UK Animals (Scientific Procedures) Act, 1986.

Rationale
Tiled-C is a hybrid of the all vs all 3C methods, such as Hi-C 13, and the one vs all methods, such as 4C 14,15 and Capture-C 12,16. Tiled-C generates all vs all contact matrices of specified genomic regions and thus combines an unbiased all vs all view with the ability to target regions of interest, without the need to sequence chromatin interactions genome-wide. Tiled-C has similarities to 5C 38, but overcomes shortcomings related to PCR primer design and duplicate filtering and is able to generate data at higher resolution and depth. Tiled-C also has similarities to Capture Hi-C 39. The main difference is that Tiled-C enriches a 3C library rather than a biotinylated Hi-C library, thereby retaining optimal library complexity, which is critical for analysis of small cell numbers. In
We ordered panels of double-stranded capture oligonucleotides from Twist Bioscience (Custom probes for NGS target enrichment). As recommended by Twist, we used 13.67 fmol of each individual oligonucleotide per enrichment reaction.
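As a rough illustration of how the restriction fragments of a target region could be enumerated when laying out a tiled oligonucleotide panel, the following Python sketch lists the fragments of a sequence. The recognition site (`GATC`), the function name `restriction_fragments` and the toy sequence are assumptions made for illustration; they are not taken from the design procedure used in this study.

```python
def restriction_fragments(sequence, site="GATC"):
    """Return half-open (start, end) intervals between successive recognition sites.

    Simplification: the cut is placed at the first base of each site,
    whereas a real enzyme cuts at a defined offset within its site.
    The default site is an assumption for this example.
    """
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop empty intervals, e.g. when the sequence starts with a site.
    return [(start, end) for start, end in zip(bounds, bounds[1:]) if end > start]


toy_sequence = "AAGATCTTTTGATCCC"  # hypothetical sequence, not real data
fragments = restriction_fragments(toy_sequence)
```

Each interval in `fragments` would correspond to one restriction fragment for which capture oligonucleotides could be designed.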

Experimental procedure
We prepared 3C libraries as previously described 12
For enrichment using single-stranded oligonucleotides, we used the NimbleGen SeqCap EZ reagents and followed the SeqCap EZ Library SR User's Guide (Chapters 5-7). We multiplexed up to 6 samples per enrichment reaction in a single tube, and multiplied the volumes
For enrichment using double-stranded oligonucleotides, we used Twist Bioscience reagents and followed the Twist Custom Panel Protocol (Steps 4-7). To multiplex samples, we used 375-500 ng indexed library per sample, mixed up to 1.5 µg (in an exact 1:1 ratio) in a single tube, and used single reaction volumes as described in the protocol. We processed multiple tubes simultaneously if required. Streptavidin C1 beads were used to enrich the hybridized DNA and the washed material was amplified using 10-12 cycles of PCR. Ampure-XP beads were used in a 1.8:1 bead:sample ratio to clean up the amplification reaction and DNA was eluted in 30 µl PCR-grade water. To increase enrichment, a second round of oligonucleotide capture was performed following the same procedure, using up to 1.5 µg of enriched material in a single hybridization reaction (even if multiplexed in the first round) of 20-24 hours.
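The pooling constraints for double-stranded oligonucleotide enrichment (375-500 ng of indexed library per sample, an exact 1:1 ratio, and at most 1.5 µg per tube) can be sketched as a small Python helper. The function name and its behaviour when too many samples are requested are assumptions for illustration; the limit of four samples per tube follows only from dividing 1.5 µg by 375 ng.

```python
def pool_masses_ng(n_samples, min_ng=375.0, max_ng=500.0, max_total_ng=1500.0):
    """Equal mass (ng) of indexed library per sample for one enrichment tube.

    Uses as much library per sample as allowed while keeping the pool at
    or below max_total_ng; raises if the per-sample mass would drop below
    min_ng (in which case the samples would need a second tube).
    """
    if n_samples < 1:
        raise ValueError("need at least one sample")
    per_sample = min(max_ng, max_total_ng / n_samples)
    if per_sample < min_ng:
        raise ValueError("too many samples for a single tube")
    return [per_sample] * n_samples


three_plex = pool_masses_ng(3)  # 500 ng each, 1.5 µg in total
four_plex = pool_masses_ng(4)   # 375 ng each, 1.5 µg in total
```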
The enriched Tiled-C libraries were assessed using the Agilent Bioanalyzer or D1000 Tapestation and quantified using KAPA Library Quantification reagents, before sequencing using the Illumina NextSeq platform. In high-quality libraries, sequencing 3-5 million reads per enriched Mb per sample is sufficient for data at 5 kb resolution.
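As a worked example of the sequencing guideline above, the following Python sketch estimates the read range for a region of a given size. The function name is hypothetical; the example applies the 3-5 million reads per Mb guideline to the ~3.3 Mb region studied here, which gives roughly 10 to 17 million reads per sample.

```python
def reads_for_region(region_mb, reads_per_mb_low=3_000_000, reads_per_mb_high=5_000_000):
    """Lower and upper read estimates per sample for an enriched region.

    region_mb is the size of the enriched region in Mb; the per-Mb
    defaults follow the 3-5 million reads guideline for 5 kb resolution.
    """
    return region_mb * reads_per_mb_low, region_mb * reads_per_mb_high


low, high = reads_for_region(3.3)
print(f"{low / 1e6:.1f}-{high / 1e6:.1f} million reads per sample")
```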

Analysis
The most straightforward way to analyze Tiled-C data is to use the HiC-Pro pipeline 41 with the options for Capture-Hi-C analysis. We have also adjusted our pipelines for Capture-C analysis 12 to be compatible with Tiled-C data. This pipeline is designed to analyze deep, targeted 3C data and provides very stringent filtering, especially regarding PCR-related artefacts. All data presented in the paper have been analyzed using a combination of this CCseqBasic pipeline
To examine the reproducibility of Tiled-C in low-input samples, we used HiCRep to calculate stratum-adjusted correlation coefficients 43, considering a maximum distance of 100,000 bp.
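To convey the idea behind a stratum-adjusted correlation, the Python sketch below correlates two contact matrices separately for each genomic distance (stratum) up to a maximum distance and then takes a weighted average. This is a deliberately simplified illustration and not the HiCRep implementation: it omits the smoothing and variance-stabilising steps of the published method, and the function name, weighting and toy matrices are assumptions.

```python
from math import sqrt


def simple_scc(m1, m2, resolution, max_dist=100_000):
    """Weighted average of per-stratum Pearson correlations (simplified)."""
    n = len(m1)
    max_k = min(n - 1, max_dist // resolution)
    weighted_sum = 0.0
    weight_total = 0.0
    for k in range(1, max_k + 1):
        # Stratum k: all bin pairs separated by exactly k bins.
        x = [m1[i][i + k] for i in range(n - k)]
        y = [m2[i][i + k] for i in range(n - k)]
        size = len(x)
        mean_x = sum(x) / size
        mean_y = sum(y) / size
        var_x = sum((a - mean_x) ** 2 for a in x) / size
        var_y = sum((b - mean_y) ** 2 for b in y) / size
        if var_x == 0 or var_y == 0:
            continue  # correlation is undefined for a constant stratum
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / size
        spread = sqrt(var_x * var_y)
        weight = size * spread  # larger, more variable strata count more
        weighted_sum += weight * (cov / spread)
        weight_total += weight
    return weighted_sum / weight_total if weight_total else float("nan")


# Hypothetical 4-bin matrices at 5 kb resolution; rep_b is rep_a scaled by 2.
rep_a = [[0, 1, 2, 3], [1, 0, 5, 1], [2, 5, 0, 4], [3, 1, 4, 0]]
rep_b = [[2 * value for value in row] for row in rep_a]
score = simple_scc(rep_a, rep_b, resolution=5_000)
```

Because each stratum is correlated separately, a uniform difference in depth between replicates (as in the toy example) does not lower the score.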

Hi-C
We compared Tiled-C data in mouse ES cells to the deepest currently available Hi-C data in mouse ES cells 5 . We explored the data using HiGlass 44 and downloaded and re-analyzed the Hi-C data using the HiC-Pro pipeline 41 with default options and ICE normalization 42 .
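The principle of iterative correction can be sketched in a few lines of Python: each bin's total coverage is repeatedly scaled towards the mean coverage until all rows of the contact matrix sum to approximately the same value. This is a minimal sketch under simplifying assumptions (dense symmetric matrix, fixed number of iterations, no filtering of low-coverage bins) and is not the HiC-Pro implementation; the function name and toy matrix are hypothetical.

```python
def ice_normalize(matrix, n_iter=50):
    """Balance a symmetric contact matrix by iterative correction.

    Returns the balanced matrix and the accumulated per-bin bias factors.
    """
    n = len(matrix)
    balanced = [[float(value) for value in row] for row in matrix]
    bias = [1.0] * n
    for _ in range(n_iter):
        sums = [sum(row) for row in balanced]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break  # empty matrix: nothing to balance
        mean = sum(nonzero) / len(nonzero)
        # Bins without contacts keep a neutral factor of 1.
        scale = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
    return balanced, bias


toy_matrix = [[1.0, 2.0], [2.0, 3.0]]  # hypothetical counts
balanced, bias = ice_normalize(toy_matrix)
row_sums = [sum(row) for row in balanced]
```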

ATAC-seq
Experimental procedure
For FACS-sorted erythroid progenitors from fetal liver, either 1 or 2 technical replicates were processed of ~50,000 cells each for each sorted population. ATAC-seq was performed as previously described 26.
For FACS-sorted hematopoietic stem and progenitor cells from adult bone marrow, 1 technical replicate of between 5,000 and 20,000 cells was processed for each population. Cells were spun at 500 g for 10 minutes at 4 °C. The supernatant was discarded and cells were resuspended directly in Dig-transposition buffer (25 µl 2x TD Buffer [Illumina], 2.5 µl Tn5 transposase, 0.5 µl 1% digitonin and 22 µl H2O) before incubating at 37 °C for 30 minutes with agitation at 600 rpm.
After the transposition step, samples were processed as previously described 26 .
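When several low-input samples are processed in parallel, the Dig-transposition buffer recipe given above (50 µl per reaction) can be scaled to a master mix. The Python sketch below does this arithmetic; the function name and the optional `overage` parameter for pipetting loss are assumptions added for illustration and are not part of the protocol.

```python
# Per-reaction volumes in µl, as listed for the Dig-transposition buffer.
DIG_TRANSPOSITION_UL = {
    "2x TD Buffer": 25.0,
    "Tn5 transposase": 2.5,
    "1% digitonin": 0.5,
    "H2O": 22.0,
}


def master_mix(n_reactions, overage=0.0):
    """Scale the per-reaction recipe to n_reactions.

    overage is a hypothetical fractional excess (e.g. 0.1 for 10%)
    to allow for pipetting loss.
    """
    factor = n_reactions * (1.0 + overage)
    return {reagent: volume * factor for reagent, volume in DIG_TRANSPOSITION_UL.items()}


per_reaction_total = sum(DIG_TRANSPOSITION_UL.values())  # 50 µl
mix_for_four = master_mix(4)
```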

Analysis
Reads were mapped to the mouse mm9 genome and PCR duplicates removed using NGseqBasic 45 .
Technical replicates were merged and peaks called using MACS2 46 . Peaks were merged and the number of reads in each sample overlapping each peak was calculated using BEDTools merge and multicov 47 . For visualization, bedgraph files were generated using BEDTools genomecov with a scaling factor of 1e6 / (total number of reads in peaks). All analysis scripts are available at https://github.com/rbeagrie/alpha-tiledc.
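The scaling factor used for visualization can be computed directly from the per-peak read counts. The Python sketch below assumes the counts have already been extracted (for example from the output of BEDTools multicov); the function name and example counts are hypothetical. The resulting value could then be supplied to the `-scale` option of BEDTools genomecov.

```python
def reads_in_peaks_scale(peak_counts):
    """Return 1e6 / (total number of reads in peaks) for one sample."""
    total = sum(peak_counts)
    if total <= 0:
        raise ValueError("sample has no reads in peaks")
    return 1e6 / total


# Hypothetical counts for one sample: 500,000 reads in peaks in total.
scale = reads_in_peaks_scale([250_000, 150_000, 100_000])
```

Scaling by reads in peaks rather than by total mapped reads makes samples with different signal-to-noise ratios more directly comparable at accessible sites.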

Single-cell RNA-seq
Experimental procedure
Fetal livers were harvested and pooled from 7 e13.5 C57BL/6 mouse embryos and processed as above. Cells were first stained with 2.5 µg/ml biotin-conjugated anti-Ter119 (BD 553672) and 2. [AAGCGCTTGGCA], and 1 µg/ml non-specific IgG conjugated with ADT10 [CGGAGTAGTAAT]). Antibodies were conjugated to streptavidin as previously described 25 and mixed with biotinylated custom oligonucleotides. Cells were processed for scRNA-seq using the 10x Genomics Single Cell 3' v2 kit. cDNA reads were mapped to the mouse mm10 assembly and barcodes assigned to cells using Cell Ranger v2.1.1 (10x Genomics). ADT reads were mapped to cell/antibody barcodes using CITEseq-count (https://github.com/Hoohm/CITE-seq-Count). Potential doublet cells were removed using Scrublet 48. Further analysis was performed using Seurat v2. Low quality cells with fewer than 300 or more than 5,000 identified genes, or with more than 9% mitochondrial reads, were also removed. Clusters were identified using the "FindClusters" function and UMAP projection was generated using the "RunUMAP" function, both with the first 16 principal components. Seurat clusters were annotated using marker genes and by reference to previously published data 19. Seurat identified two clusters corresponding to committed erythroid progenitors (CEP) and four clusters corresponding to cells undergoing erythroid terminal differentiation (ETD). Average gene expression for populations matching those obtained by FACS sorting was generated by using the "SubsetData" function to select cells with low levels of cell-surface barcodes corresponding to lineage markers, and appropriate levels of barcodes corresponding to CD71 and Ter119. All analysis scripts are available at https://github.com/rbeagrie/alpha-tiledc.
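The cell-level quality thresholds described above can be expressed as a simple predicate. The Python sketch below is an illustration only: the analysis itself was performed in Seurat, and the function name, field names and example cells are hypothetical.

```python
def passes_qc(n_genes, pct_mito, min_genes=300, max_genes=5000, max_pct_mito=9.0):
    """True if a cell has 300-5,000 detected genes and at most 9% mitochondrial reads."""
    return min_genes <= n_genes <= max_genes and pct_mito <= max_pct_mito


# Hypothetical per-cell summaries, not real data.
cells = [
    {"barcode": "cell_1", "n_genes": 250, "pct_mito": 2.0},   # too few genes
    {"barcode": "cell_2", "n_genes": 2400, "pct_mito": 4.5},  # kept
    {"barcode": "cell_3", "n_genes": 6100, "pct_mito": 3.0},  # too many genes
    {"barcode": "cell_4", "n_genes": 1800, "pct_mito": 12.0}, # high mitochondrial fraction
]
kept = [c["barcode"] for c in cells if passes_qc(c["n_genes"], c["pct_mito"])]
```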

RNA-FISH
Experimental procedure
Standard RNA-FISH was carried out as previously described 49 . Sorted cells from mouse fetal liver were placed back into culture for 6 hours to allow nascent transcription to be re-established.
Samples were hybridized with digoxigenin-labelled oligonucleotide probes directed to α-globin introns (30 ng per slide) and visualized using FITC-conjugated antibodies (primary: sheep anti-DIG FITC [Roche] 1:50, secondary: rabbit anti-sheep FITC [Vector] 1:100). Two negative controls were also included: brain tissue from a male, adult CD1 mouse and Ter119+ (i.e. mature) fetal liver erythroid cells that were probed with secondary antibody but no primary antibody.
showing the mean ± s.d. of 3 independent experiments (except for brain and "no primary" negative controls, which have n=2). P values were calculated by two-tailed paired T-tests.

Figure 4: Upregulation of α-globin expression correlates with increased chromatin accessibility and enhancer-promoter interactions.
Tiled-C contact matrices of 500 kb spanning the mouse α-globin locus in sequential stages of in vivo erythroid differentiation at 2 kb resolution. Contact frequencies represent normalized, unique interactions in 2 replicates. Matched open chromatin (ATAC) profiles are shown underneath each matrix and represent normalized data from 3 S0-low, S0-medium and S1 replicates and 2 S2 and S3 replicates. Gene annotation (α-globin genes highlighted in red), open chromatin (ATAC) and CTCF occupancy in mature mouse erythroblast cells are shown at the top. Coordinates (mm9): chr11:31,900,000-32,400,000.
Our data support a model in which TADs are established early in differentiation, prior to activation of the domain. During differentiation, accessible regulatory elements are formed. This is followed by sub-compartmentalization of the TAD into smaller domains, in which enhancers and promoters form specific interactions. Through differentiation, accessibility and interactions between enhancers and promoters are gradually increased, concomitant with upregulation of gene expression.

Supplementary Figure 1: Tiled-C contact matrices of the α-globin locus.
Tiled-C contact matrices of 500 kb spanning the α-globin locus in primary mouse erythroid cells (top) and ES cells (bottom) at 2 kb resolution. Contact frequencies represent normalized, unique interactions in 3 replicates. Gene annotation (α-globin genes highlighted in red), open chromatin (ATAC) and CTCF occupancy are shown below the matrices. Coordinates (mm9): chr11:31,900,000-32,400,000.

Supplementary Figure 2: Tiled-C contact matrix and virtual viewpoints of the α-globin locus.
Tiled-C contact matrix of 300 kb spanning the α-globin locus in primary mouse erythroid cells at 2 kb resolution. Gene annotation (α-globin genes highlighted in red), open chromatin (ATAC), CTCF occupancy and interaction profiles from virtual viewpoints (highlighted by blue arrows) across the α-globin locus are shown underneath. Contact frequencies represent normalized, unique interactions in 3 replicates. The pre-existing TAD and sub-compartmentalized erythroid-specific domain are highlighted in white and magenta dashed lines, respectively. Coordinates (mm9): chr11:32,000,000-32,300,000.
Example marker genes used to annotate scRNA-seq clusters, displayed with Seurat normalized expression values. (c) Cells were also labelled with barcoded antibodies against the same surface markers used for FACS purification. Plots show three example surface markers highlighting early populations (cKit), mid-differentiation (CD71) and late-differentiation (Ter119) erythroid cells. (d) In silico gating of the scRNA-seq data was used to match cell populations from scRNA-seq to bulk datasets obtained through FACS sorting. Bottom right: Gating of Lin- cells based on barcode counts for CD71 and Ter119. Bottom left, top left, top right: For three example gates (S0-low, S1 and S3, respectively) the cluster composition of the included cells is shown. S0-low contains predominantly MMP and EEP, S1 contains mostly CEP and S3 contains ETD cells.

Supplementary Figure 6: Chromatin accessibility in the α-globin locus through erythroid differentiation.
Chromatin accessibility (ATAC) in a region of 160 kb spanning the mouse α-globin locus in sequential stages of in vivo erythroid differentiation. ATAC profiles represent normalized data from 3 S0-low, S0-medium and S1 replicates and 2 S2 and S3 replicates. The profiles are shown at different scales to highlight changes in accessibility in early stages of differentiation. Gene annotation (α-globin genes highlighted in red), open chromatin (ATAC) and CTCF occupancy in mature mouse erythroblast cells are shown at the top. Coordinates (mm9): chr11:32,090,000-32,250,000.

Supplementary Figure 7: Chromatin interactions and accessibility in the extended α-globin locus through erythroid differentiation.
Tiled-C contact matrices of ~3.3 Mb spanning the mouse α-globin locus in sequential stages of in vivo erythroid differentiation at 5 kb resolution. Contact frequencies represent normalized, unique interactions in 2 replicates. Matched open chromatin (ATAC) profiles are shown underneath each matrix and represent normalized data from 3 S0-low, S0-medium and S1 replicates and 2 S2 and S3 replicates. Gene annotation (α-globin and Cpeb4 genes highlighted in red and green, respectively), open chromatin (ATAC) and CTCF occupancy in mature mouse erythroblast cells are shown at the top. Coordinates (mm9): chr11:29,900,000-33,230,000.

Supplementary Figure 10: Chromatin interactions and accessibility in the Cpeb4 locus through erythroid differentiation.
Tiled-C contact matrices of 500 kb spanning the mouse Cpeb4 locus in sequential stages of in vivo erythroid differentiation at 2 kb resolution. Contact frequencies represent normalized, unique interactions from 2 replicates. Matched open chromatin (ATAC) profiles are shown underneath each matrix and represent normalized data from 3 S0-low, S0-medium and S1 replicates and 2 S2 and S3 replicates. Gene annotation (Cpeb4 gene highlighted in green), open chromatin (ATAC) and CTCF, H3K4me3 and H3K4me1 occupancy in mature mouse erythroblast cells are shown at the top. Coordinates (mm9): chr11:31,450,000-31,950,000. The matrices show that the Cpeb4 TAD is established early in differentiation and that a smaller domain, in which the Cpeb4 promoter interacts with accessible elements marked with H3K4me1, is formed later in differentiation, concomitant with upregulation of Cpeb4 expression (Supplementary Figure 9).