Sequential and directional insulation by conserved CTCF sites underlies the Hox timer in stembryos

During development, Hox genes are temporally activated according to their relative positions on their clusters, contributing to the proper identities of structures along the rostrocaudal axis. To understand the mechanism underlying this Hox timer, we used mouse embryonic stem cell-derived stembryos. Following Wnt signaling, the process involves transcriptional initiation at the anterior part of the cluster and a concomitant loading of cohesin complexes enriched on the transcribed DNA segments, that is, with an asymmetric distribution favoring the anterior part of the cluster. Chromatin extrusion then occurs with successively more posterior CTCF sites acting as transient insulators, thus generating a progressive time delay in the activation of more posterior-located genes due to long-range contacts with a flanking topologically associating domain. Mutant stembryos support this model and reveal that the presence of evolutionary conserved and regularly spaced intergenic CTCF sites controls the precision and the pace of this temporal mechanism.


Supplementary Notes 1 Paused and elongating pol II
To complement the analyses of the spreading of H3K27ac over the posterior part of the HoxD cluster, we ChIPed-seq the large subunit of pol II using a pan-specific antibody recognizing the Cterminal domain (CTD) of RPB1. Pol II occupancy was somewhat similar to the H3K27 15 acetylation profile, yet with an unexpected accumulation over Hoxd8 and Hoxd9 at 96h already ( Supplementary Fig. 1a), which was higher than both the H3K27ac profiles and transcripts levels.
Consequently, while a colinear distribution was scored when comparing the 72h, 120h and 168h time points, profiles were virtually identical between the entire 96h to 132h time window (Supplementary Fig. 1a-c). This poor colinear progression of pol II was like that reported in 20 amphibians 1 . However, when we ChIPed pol II phosphorylated in serine 2 (Ser2-p) of the CTD domain, a modification that accompanies the transition from pausing to productive elongation, which usually covers the transcription unit with a robust enrichment in the 3' part (e.g. 2 ) a different distribution was scored ( Supplementary Fig. 1d).
The pol II Ser2-p profile at 96h was clearly different from that of the pan-pol II antibody and 25 involved mainly the Hoxd1 to Hoxd4 region with a weak coverage of Hoxd8, correlating well with the H3K27ac and RNA-seq profiles at this stage ( Supplementary Fig. 1d). The signals had now extended towards more posterior genes and was enriched in 3'of Hoxd8 ( Supplementary Fig. 1d, left arrow in 84h), whereas at 144h, the signal had moved in 3' of Hoxd9 ( Supplementary Fig. 1d, arrowhead). At this latter stage, the coverage of the pol II Ser2-p was eventually similar to that 30 2 seen with the pan pol II antibody. These differences between the two forms of pol II ( Supplementary Fig. 1d, compare the signals with arrows in 96h to 144h) suggest that, while actively transcribed regions (pol II Ser2-p positive) were spreading from the 3' to the 5' parts of the cluster, Pol II remained positioned throughout, likely under a paused state.  b, Heatmap of Pol II ChIP-seq coverage over the HoxD cluster divided into 10 kb bins. c, Computed ratio between Hoxd9 (dashed square) and Hoxd4 coverage (fourth bin from Hoxd1) for both H3K27ac and Pol II at 96h and 120h (left panel), and between their FPKM values (right panel). Comparable ratio between H3K27 acetylation and transcription was observed, unlike with Pol II where enrichment was already observed over Hoxd9 at 96h. d, ChIP-seq profiles of Pol II phosphorylated at serine 2 at five time-points. The dynamic of the elongating Pol II Ser2-p was comparable to H3K27 acetylation though with a slight delay. In a and d, genomic coordinates: chr2:74636423-74780095.

5
Such striking differences were also readily observed at both HoxA and HoxB clusters ( Supplementary Fig. 2a, b, respectively). We concluded that elongating pol II follows a colinear dynamic first detected in anterior regions of the clusters devoid of CTCF binding sites (see below), then progressively spreading towards posterior regions similar to H3K27 acetylation, unlike paused pol II, which seems to be recruited in a rather simultaneous manner over an extended 10 central part of the clusters including the transition from the 3' CTCF-free region towards the CTCF rich region.

Supplementary Figure 2
Supplementary Fig. 2. Pol II and Pol II Ser2-p profiles on HoxA and HoxB clusters. On top a time course of Pol II ChIP-seq profiles over the HoxA (a) and HoxB (b) genes clusters in stembryos from 72h to 168h, showing a rapid spreading of Pol II over the gene clusters. Bottom: The use of an antibody against Pol II with phosphorylated serine 2 in ChIP-seq profiles at various time points reveals distinct, more progressive and colinear profiles, suggesting that Pol II is initially pausing over the gene clusters before 20 starting to elongate.

Expression of CDX genes in stembryos and binding profiles
After the initial activation of anterior Hox genes, partly in response to Wnt signaling 21 , 22 , Cdx transcription factors were reported to activate more centrally located Hox genes 23-25 , until the location of the TAD boundary (the site of inversion in the orientations of the CTCF sites) located 5 inside the Hox clusters, which generally separates the Hox12 and Hox13 genes from all other anterior Hox genes. Other studies have proposed that the transcription of the latter posterior genes depends on TGFbeta signaling, in particular by Gdf11 26,27 . Coincidentally, the mapping of CTCF binding sites within Hox clusters revealed three sub-domains; an 'anterior' domain devoid of CTCF sites; a centrally located domain where series of CTCF sites are orientated towards the 3' 10 end of the clusters, and a posterior domain where several CTCF sites display the opposite orientation 28 . A potential upstream function for CDX proteins was thus investigated.
CDX proteins 3-5 play important roles in trunk extension 6,7 and, in our stembryos, both Cdx1 and Cdx2 were expressed as early as 72h with their mRNAs reaching peak levels at around 84h ( Supplementary Fig. 3a). Single-cell RNA-seq at 96h and 120h revealed a clear overlap between 15 Hoxd8, Hoxd9 and Cdx1/Cdx2 RNAs positive cells, in particular in neuro-mesodermal progenitors (NMPs) and early presomitic mesoderm (PSM) cells ( Supplementary Fig. 3b-d), suggesting that a direct regulation by CDX proteins is possible.
We produced a ChIP-seq dataset using an antibody against CDX2 and performed a HOMER de novo motif analysis of CDX2 peaks, which revealed homeobox-like binding motifs and scored 20 CDX-known motifs among the top two high-scoring results ( Supplementary Fig. 3e). Binding into the HoxD cluster was detected as early as in 72h, in several positions in the central region containing Hoxd3 to Hoxd9 ( Supplementary Fig. 3f, orange arrows), i.e., corresponding to those genes previously proposed to be activated by CDX proteins. Three prominent peaks were scored with decreasing intensities at 120h, thus matching the timing of transcriptional activation of central 25 genes. No binding was observed beyond Hoxd9 at any stage analyzed, supporting CDX2 as participating to the activation of central Hox genes only. However, while these three peaks covered the expected Hoxd4 to Hoxd9 region, no evidence was found for a progressive recruitment along the cluster and the peak around Hoxd9 appeared consistently more enriched than the two others. the UMAP clustering map. A clear overlap was observed between Cdx and Hoxd8 to Hoxd10. e. HOMER motif analysis of CDX2 ChIP-seq peaks, with de novo motifs (top, rank:1, 49.60%, p-value: 1e-1076) and known motifs (bottom, CDX4, rank:1, 57.57%, p-value: 1e-968). f. CDX2 ChIP-seq profiles over the Hoxd genes cluster at 72h, 96h and 120h. CDX2 binding peaks within the central part of the cluster are indicated with arrows. No colinear progression was observed in CDX2 recruitment over the cluster.

Spatial expression of Hoxd9 in mutant stembryos
In our stembryos lacking CBSs and activating prematurely the gene located immediately in 5' 10 (posteriorly), we asked how such a premature expression would affect the spatial distribution of

Deletions of the Hoxd1-Hoxd4 fragment and of sub-TAD1
In addition to the single or multiple deletions of CTCF binding sites, we also looked at the importance of preferentially loading cohesin in the anterior portion of the cluster by deleting this region. This deletion removed from Hoxd1 to Hoxd4 included, yet CBS1 was left in place 5 ( Supplementary Fig. 5a). RNA-seq of mutant stembryos showed a marked decrease in transcripts level for Hoxd8 and Hoxd9, both at 96h and 120h ( Supplementary Fig. 5b). To see whether this decrease in transcription was associated with a change in the contacts established between the various intra-cluster CBSs and sub-TAD1, we produced a CHi-C dataset of Del(Hoxd1-Hoxd4) stembryos at 96h ( Supplementary Fig. 5c). While the micro-TAD observed over the anterior region 10 of the cluster under normal condition had expectedly disappeared along with the deletion ( Supplementary Fig. 5c), a quantification of contacts between the various CBS1-5 and the CS38-40 sub-TAD1 region revealed important modifications ( Supplementary Fig. 6). In 96h control stembryos, CBS2 showed a higher contact frequency with the CBSs located within the CS38-40 region, when compared to CBS1 (Supplementary Fig. 6a). In contrast, the Del(Hoxd1-Hoxd4) 15 mutant stembryos showed comparable interaction levels between CS38-40 and CBS1 and 2 ( Supplementary Fig. 6a). We interpret this as a reduction in cohesin complexes reaching more 'posterior' CBS, due to reduced loading in the absence of the DNA segment including early transcribed genes and targeted by NIPBL. Basal accumulation of cohesin was nevertheless sufficient to establish the expected constitutive contacts with the CS38-40 region. 20 We then deleted sub-TAD1 to evaluate its importance for the transcription of those central genes located in 5' of CBS1. Accordingly, the CBS-rich CS38-40 region was moved close to the anterior part of the cluster ( Supplementary Fig. 5d). Changes in mRNAs level were already observed at 96h for Hoxd8 ( Supplementary Fig. 5e, top) and a substantial decrease was scored at 120h for both Hoxd8 and Hoxd9, whereas anterior genes were less affected ( Supplementary Fig.  25 5e, bottom). These results confirmed that posterior Hoxd genes somewhat require sub-TAD1 to be properly activated and that this may be achieved through those CBS-dependent interactions that are observed in the wild type situation. It also confirmed that the initial Wnt-dependent activation of anterior genes is -at least partially-independent from sub-TAD1 and thus potentially involves elements located within the cluster itself. The CHi-C profile of these mutant stembryos revealed a 30 rewiring of contacts between the cluster and the now closely located region CS38-40 ( Supplementary Fig. 5f), with a strong micro-TAD including the anterior portion of the cluster, where cohesin loading could presumably occur normally ( Supplementary Fig. 6b).  determined by Welch's unequal variances t-test (* is p-value < 0.05, ** < 0.01). f, CHi-C map using Del(sub-TAD1) stembryos at 96h, mapped onto an in silico reconstructed mutant genome. Bin size 5kb. A higher contact frequency is observed between the Hoxd genes cluster and the CS38-40 boundary region, now relocated close by. Quantifications in Supplementary Fig. 6. Genomic coordinates: chr2:74635000-75240000. 20 Supplementary Fig. 6. Virtual 4C contacts generated from Capture Hi-C datasets produced from either control or mutant stembryos. a, Del(Hoxd1-Hoxd4) and b, Del(sub-TAD1) stembryos were used at 96h. Data were mapped to corresponding in-silico reconstructed mutant genomes. a, Quantification of the contacts between the indicated HoxD cluster CBS1 to 5 (viewpoint) and the CBS of the CS38-40 region. While CBS2 is the CBS the more used in wt, in Del(CBS1-4) both CBS1 and 2 are preponderant. b,

10
Contacts between the Hoxd gene bodies indicated below (viewpoints) and the CBS of the CS38-40 region. On top, scheme of the deleted fragment (dashed lines with scissors). An increase in contact is observed between Hoxd genes and the CS38-40 region in the mutant Del(sub-TAD1). In a and b, box plots with median value and 25-75% percentiles, whiskers represent minimum and maximum (wt: sum of 3 independent replicates, mutant: single replicate). In b, the Mann-Whitney nonparametric test was used to 15 assess significant changes between wt and Del(sub-TAD1) mutant (**** is p-value < 0.0001). after gene activation and the concurrent translocation of the cluster, this preferential interaction slowly shifts towards more central positions within the cluster (white line).

Supplementary Table 1
Supplementary