Main

The interphase human genome folds into TADs and nested subTADs. TADs were originally defined in first-generation Hi-C and 5C data as megabase (Mb)-scale, self-interacting chromatin segments in which DNA sequences exhibit substantially higher contact frequency within—compared to between—domains3,4,5,6. Molecular and computational advances over the past decade have resulted in ultrahigh-resolution genome folding maps with substantially improved signal-to-noise ratios8,9,10,11. Such technical advances have enabled the discovery of fine-grained A/B compartments8, nested subTADs within TADs7, punctate dot structures indicative of long-range looping interactions8, and stripes indicative of loop extrusion12,13,14. In light of the critical importance of dissecting the link between specific higher-order chromatin architectural features and genome function, a leading challenge is to classify subtypes of TADs/subTADs in Hi-C maps by their fine-grained structural features. Clearly defining structural classes of TADs/subTADs can in turn facilitate the careful dissection of each boundary’s molecular composition, organizing principles and unique cause-and-effect relationship across a range of genome functions.

Here we ascertain the functional link between distinct structural classes of TADs/subTADs and DNA replication. Replication initiates from tens of thousands of origins licensed in excess across the human genome in telophase and throughout G1 (refs.1,2). A small proportion of licensed origins subsequently fire in orchestrated temporal waves during S phase2. It is established that origins fire at one or more sites chosen stochastically within ≈40 kb regions (IZs)15,16,17. Nevertheless, a consensus sequence encoding origin or IZ placement has not been definitively identified in humans. Waves of early and late replication correlate with A and B compartments, respectively, and the temporal transitions from early to late replication can in some cases align with TAD boundaries3,18,19. However, the role of fine-scale genome folding patterns during interphase (such as loops, subTADs and TADs detectable in high-resolution Hi-C data) in the genomic placement of initiated origins following entry into S phase is not known.

We recently developed a high-resolution Repli-seq method to identify the placement of IZs across the genome at 50-kb resolution16. We first compared the genomic locations of IZs replicating across early, early–mid and late S phase to our high-resolution Hi-C data developed in the 4D Nucleome Consortium from H1 human embryonic stem (ES) cells11. We noticed that high-efficiency, early-S-phase IZs colocalize to strongly insulated boundaries demarcated by corner-dot TADs/subTADs on one or both sides (Fig. 1a and Extended Data Figs. 1, 2a and 3). By contrast, low-efficiency IZs that fire late in S phase can colocalize with boundaries between TADs/subTADs devoid of corner-dots (Fig. 1a and Extended Data Figs. 2b and 3). Our qualitative observations suggest that early and late IZs are enriched at genomic locations serving as boundaries of corner-dot and dotless TAD/subTADs, respectively.

Fig. 1: High-efficiency IZs localize specifically to corner-dot TAD/subTAD boundaries with high-density arrays of CTCF + cohesin-binding sites in complex orientations.
figure 1

a, A Hi-C map from H1 human ES cells for the locus chromosome (chr.) 18: 23.75 Mb–25.75 Mb, hg38, showing TADs, subTADs, loops, CTCF motifs, A/B compartments, CTCF cleavage under targets and release using nuclease (CUT&RUN), cohesin chromatin immunoprecipitation with sequencing (ChIP–seq), two-fraction Repli-seq, 16-fraction Repli-seq and IZs. b, Distribution of the number of sites co-bound by CTCF and cohesin (red) or bound only by cohesin (blue) per boundary for: dot boundaries colocalized with early IZs (n = 2,200); dot boundaries colocalized with no IZs (n = 4,087); dotless boundaries colocalized with late IZs (n = 66); and dotless boundaries colocalized with no IZs (n = 628). c, Proportion of boundaries with no CTCF motif, one single CTCF motif, CTCF motifs in a tandem orientation, and CTCF motifs in a complex divergent or convergent orientation. Boundaries are stratified into dot and dotless boundaries with either early/late IZs or no IZs. d, Top: boundary classification as detailed in the Supplementary Methods and Supplementary Table 7. Middle: aggregate peak analysis of the average observed/expected interaction frequency of the domains centred on each boundary classification. Bottom: averaged 16-fraction Repli-seq signal for each S-phase fraction centred on boundaries ±750 kb. Boundaries, TADs and Repli-seq data were normalized to the same genomic length scale. Boundary numbers are provided only for autosomal chromosomes. e, We computed right-tailed, one-tailed empirical P values using a randomization test with early-, early–mid- and late-S-phase IZs and size- and A/B compartment-matched null IZs (Supplementary Methods).

To quantify the link between TAD/subTAD boundaries and IZ genomic placement, we identified a total of 23,851 chromatin domains genome-wide in Hi-C data for human ES cells using our graph-theory-based method 3DNetMod20 (Supplementary Methods and Supplementary Table 1). We also applied statistical methods developed by our laboratory and others to identify dot-like structures representative of bona fide looping interactions8,21,22. We identified 16,922 dots genome wide in ensemble Hi-C maps of human ES cells. Such dots represent punctate groups of adjacent pixels with significantly higher contact frequency compared to the surrounding local chromatin domain structure (Fig. 1a, green rectangles, Supplementary Methods and Supplementary Table 2). After co-registration of dots with domains, we identified 8,279 corner-dot TADs/subTADs and 15,572 dotless TADs/subTADs genome wide in human ES cells (Supplementary Table 3). We stratified boundaries into three groups, including those that are structurally demarcated by: adjacent corner-dot TADs/subTADs on both sides (double-dot boundaries, n = 6,318); corner-dot TADs/subTADs on only one side and dotless on the other (single-dot boundaries, n = 2,163); and adjacent dotless TADs/subTADs on both sides (dotless boundaries, n = 1,089) (Supplementary Table 4). By applying a range of parameter stringencies and methods for dot calling, we could modify the proportion of boundaries classified as double-dot, single-dot and dotless, but the colocalization of dot boundaries with IZs was evident regardless of statistical methodology (Supplementary Methods and Extended Data Fig. 4). We combined all double-dot and single-dot boundaries into dot boundaries, as they showed similar IZ localization patterns (Supplementary Table 4).

Cohesin is essential for the formation of TADs/subTADs through loop extrusion and stalling against boundaries insulated by the architectural protein CTCF12,13,23,24,25. We reasoned that the density and orientation of CTCF-binding sites might reveal an architectural protein signature at boundaries linked to placement of active origins that fire in S phase. We observed a substantially higher density of co-bound CTCF + cohesin-binding sites at dot boundaries overlapping early IZs compared to those that do not overlap any IZs (Fig. 1b and Supplementary Tables 5 and 6). We also examined sites that bind only cohesin, as they can earmark CTCF-independent enhancer–promoter interactions7,23, but we did not see a notable difference in the number of sites that bind only cohesin across dot versus dotless TAD/subTAD boundaries (Fig. 1b). Together, our data indicate that boundaries colocalizing with human early-S-phase IZs exhibit enriched occupancy of motifs co-bound by CTCF and cohesin, but not cohesin alone, thus confirming and substantially expanding on observations in previous reports linking cohesin generally to a small subset of replication origins in Drosophila26 and humans27.

Recent reports have uncovered that convergently oriented CTCF motifs anchor long-range looping interactions formed by cohesin-mediated extrusion12,14,23,28,29. We observed that most dot boundaries are marked by two or more CTCF + cohesin-bound motifs arranged in a convergent or divergent orientation (hereafter called complex motif orientation; Fig. 1c), and this molecular signature was further enriched when dot boundaries colocalize with early replicating IZs. By contrast, nearly all dotless boundaries have only one or no CTCF + cohesin-bound motifs (Fig. 1c). Dotless boundaries colocalized with late IZs were most often anchored by one CTCF motif. We therefore establish six boundary classes by stratifying dot (classes 1–3) and dotless (classes 4–6) boundaries into those localized with CTCF + cohesin-bound motifs in a complex orientation (classes 1 and 4), tandem or single-motif orientation (classes 2 and 5), or no bound motifs (classes 3 and 6; Fig. 1d).

We next formulated a statistical test to quantify IZ enrichment at boundaries compared to the background expectation across autosomes (Supplementary Methods and Supplementary Table 7). Consistent with our qualitative observations, high-efficiency IZs firing in early S phase were significantly enriched at dot boundaries marked by CTCF + cohesin-binding sites in complex orientations compared to a null distribution of random intervals matched by size and A/B compartment distribution (class 1; Fig. 1d,e, Extended Data Fig. 5b–d and Supplementary Methods). By contrast, low-efficiency IZs firing in late S phase were depleted at dot boundaries and significantly enriched at dotless boundaries with tandem + single CTCF + cohesin-bound motifs or no bound motifs (classes 5 and 6; Fig. 1d,e, Extended Data Fig. 5b–d and Supplementary Methods). We note that our null distribution was created with random intervals matched to real IZs by their size and compartment distribution, reinforcing that the enrichment reflects a strong localization at boundaries above the known link between early and late replication and A and B compartments, respectively (Supplementary Methods).

We sought to independently verify our observed link between IZs and boundaries with an orthogonal technique for assaying replication origin activity. Small nascent strand sequencing (SNS-seq) identifies approximately 10 origins per 100 kb of the genome and enriches for high-efficiency origins localized in early replicating regions30. A previous report using ENCODE (Encyclopedia of DNA Elements) phase I pilot microarray data of 1% of the human genome reported enrichment of the cohesin subunit RAD21 at approximately 300 replication origins27. Here, using genome folding features from high-resolution Hi-C data, we find that SNS-seq data from human ES cells30 exhibits heightened origin enrichment specifically at class 1 dot boundaries (Extended Data Fig. 5e). Thus, through two independent replication mapping techniques, we observe a strong enrichment of high-efficiency, early-S-phase IZs at a subset of genetically encoded corner-dot TAD/subTAD boundaries. The colocalization of IZs with TAD boundaries generally has been further confirmed recently with super-resolution imaging31.

Transcription correlates with origin placement and efficiency15,17,32,33,34,35. To ascertain whether transcription at boundaries could explain our results, we stratified dot boundaries with a complex CTCF orientation (class 1), dotless boundaries with a complex CTCF orientation (class 4) and dotless boundaries with no CTCF occupancy (class 6) into those that also had transcribed genes and those that were devoid of genes or had only inactive genes (Extended Data Fig. 6 and Supplementary Table 8). Boundaries with transcribed genes in the absence of the dot features (Extended Data Fig. 6b) or in the absence of CTCF + cohesin (Extended Data Fig. 6c) did not exhibit precise localization of high-efficiency early IZs. These results are consistent with the literature, as a large proportion of active promoters are not sites of efficient replication initiation, suggesting that further distinguishing features encode human origins36. It is also particularly noteworthy that we see enrichment of early IZs at dot boundaries with a complex CTCF motif orientation only when transcribed genes were also present (Extended Data Fig. 6a). Our data suggest that transcription alone is not sufficient to localize high-efficiency early IZs at boundaries. Transcription may cooperate with CTCF and cohesin-based loop extrusion to position high-efficiency IZs replicating in early S phase.

To understand whether cohesin and TAD/subTAD structural integrity are functionally necessary for origin placement in S phase, we examined IZs after global genome folding disruption using wild-type HCT116 cells engineered to degrade the cohesin subunit RAD21 within hours using a degron23. Such a system is uniquely suited to test the role of cohesin-mediated extrusion on IZs decoupled from transcription, as only hours of RAD21 degradation results in genome-wide ablation of nearly all loops with minimal short-term effect on transcription23. We synchronized HCT116 RAD21–mAID cells in mitosis, degraded RAD21 with auxin throughout G1, and then assessed replication initiation across S phase (Extended Data Fig. 7 and Supplementary Methods). We identified the same dot and dotless TADs/subTADs and boundary classes in Hi-C from wild-type HCT116 (untreated HCT116 RAD21–mAID) cells as in human ES cells (Fig. 2a and Supplementary Tables 914). Consistent with previous reports23, our observations show that nearly all dot and dotless boundaries were destroyed following short-term cohesin knockdown in HCT116 cells (Fig. 2b,d and Extended Data Fig. 8). Therefore, although the molecular composition of boundaries influences their structural features of insulation strength and corner-dot presence, most are dependent on cohesin.

Fig. 2: Loss of cohesin-mediated TADs/subTADs severely disrupts the genomic placement of DNA replication IZs.
figure 2

a, Boundary classification in HCT116 wild-type (untreated HCT116 RAD21–mAID) cells conducted as in human ES cells (Fig. 1d) with boundary counts as listed in Supplementary Table 14. The boundary numbers in the figure are provided for autosomal chromosomes alone. b,d, Aggregate peak analysis of the Hi-C observed/expected average interaction frequency of the domains centred on each boundary classification in HCT116 wild-type (WT; untreated HCT116 RAD21–mAID; b) and HCT116 RAD21-knockdown (KD; auxin-treated HCT116 RAD21–mAID; d) cells after cohesin degradation with auxin treatment. The Hi-C source data are from ref. 23. c,e, High-resolution 16-fraction Repli-seq data in wild-type HCT116 (WT; untreated HCT116 RAD21–mAID; c) and HCT116 RAD21-knockdown (KD; auxin-treated HCT116 RAD21–mAID; e) cells. Each row represents a temporal fraction from S phase, with 16 rows/fractions in total. The Repli-seq signal plotted represents an average across all boundaries in a particular class for that fraction (y-axis) in 50-kb bins across a ±750-kb genomic distance centred on the midpoint of the boundaries (x-axis). Sample sizes for each class are shown in a. f, ORM data for wild-type (untreated HCT116 RAD21–mAID; black) and RAD21-knockdown (auxin-treated HCT116 RAD21–mAID; red) cells.

Previous studies have reported that replication timing domains are not globally altered following genome-wide disruption of cohesin-mediated loops37,38,39. Analyses in these studies relied on the log ratio of DNA synthesized in the first or second halves of S phase (two-fraction early/late Repli-seq)40, the resolution of which renders it difficult to discern IZs. Moreover, previously published two-fraction Repli-seq signals were often quantile normalized37,39, which obscures the localized disruption in IZ placement and timing shifts at specific TAD/subTAD boundaries. We generated and analysed high-resolution 16-fraction Repli-seq data (Fig. 2c,e and Supplementary Table 15), as well as single-molecule optical replication mapping (ORM) data17 (Fig. 2f), in both wild-type and cohesin-knockdown HCT116 cells (Extended Data Fig. 7 and Supplementary Methods). As in human ES cells, we observed that 16-fraction Repli-seq data exhibit focal enrichment of high-efficiency/early IZs specifically at dot boundaries marked by CTCF + cohesin co-bound motifs in a complex orientation in wild-type HCT116 cells (class 1; Fig. 2c). Enrichment of early IZs occurs only at boundaries that colocalize with cohesin (Extended Data Fig. 9). Moreover, as in human ES cells, low-efficiency, late IZs were enriched at weak dotless boundaries in wild-type HCT116 cells (Fig. 2c). Using single-molecule ORM data, which can directly assess IZ efficiency as the percentage of molecules that initiate within a particular IZ, we detected enriched origin initiation specifically at class 1 boundaries (Fig. 2f). Together, our single-molecule and ensemble replication initiation data indicate that early-S-phase IZs fire at a key subset of genetically encoded dot boundaries.

Following ablation of cohesin-mediated boundaries (Fig. 2b,d and Extended Data Fig. 8), we observe severe disruption of high-efficiency early-S-phase IZs specifically at class 1 boundaries, as evidenced by a diffuse and delocalized Repli-seq signal (class 1; Fig. 2c,e). Consistent with our qualitative observations, early wave IZs were less numerous and increased in width specifically at dot boundaries with a complex CTCF motif orientation after loss of cohesin (Extended Data Fig. 10 and Supplementary Table 16). We also noticed that low-efficiency IZs shift to replicating at the end of S phase (fractions 14–16) at dotless boundaries following cohesin knockdown (classes 4–6, Fig. 2c,e and Extended Data Fig. 10). Independently conducted ORM analyses confirmed our observations of IZ disruption by cohesin removal (Fig. 2f). Cell cycle progression and 5-bromodeoxyuridine incorporation was not substantially affected by RAD21 knockdown39 (Extended Data Fig. 7). Together, our ensemble and single-molecule IZ data demonstrate that disruption of cohesin-mediated loops during G1 alters the genomic placement where origins or clusters of origins fire during early S phase.

On the basis of our observations, we reason that a failure of cohesin to unload, and therefore the creation of new long-range loops due to more cohesin molecules stalled at complex CTCF boundaries in G1 phase, might result in an increased number of high-efficiency origins or a narrowing of their genomic placement in S phase. Recently, it was reported that knockdown of the gene encoding the cohesin unloading factor WAPL results in increased long-range loops41. We examined the genomic placement of IZs in S phase with 16-fraction Repli-seq in wild-type HCT116 cells engineered with an improved degron system (AID2) to degrade WAPL throughout G1 phase42. First, we created Hi-C libraries in wild-type and WAPL-knockdown HCT116 cells (Fig. 3a,b and Extended Data Fig. 7). Consistent with published results, our observations show that dots indicative of loops are more numerous, and traverse a longer genomic distance, compared with those in wild-type HCT116 cells (Fig. 3a,b and Supplementary Table 18). We observed that the gain-of-looping phenotype following WAPL knockdown occurs most strongly at dot boundaries with a complex CTCF motif orientation (class 1; Fig. 3c). At class 1 boundaries, we observe that early IZs become significantly narrower following WAPL knockdown (Fig. 3d,e and Supplementary Table 17). We note that IZs tighten and refine following gain of looping in the WAPL-knockdown condition at the same boundaries where IZs grow more diffuse following cohesin knockdown (Fig. 3a,b and Extended Data Fig. 10). Together, the findings from our gain and loss of structural boundary experiments further support a model in which cohesin-based loop extrusion in interphase deterministically informs the placement of the subset of origins that fire during S phase.

Fig. 3: Gain of looping with WAPL degradation narrows the genomic placement of early IZs at dot boundaries with a complex CTCF motif orientation.
figure 3

a,b, Hi-C maps from wild-type HCT116 (WT; untreated HCT116 WAPL–mAID2) and HCT116 WAPL-knockdown (KD; auxin-treated HCT116 WAPL–mAID2) cells for the loci chromosome 17: 71.19–73.8 Mb, hg38 (a) and chromosome 10: 42.2–45 Mb, hg38 (b). The tracks show CTCF motifs, CTCF ChIP–seq, RAD21 ChIP–seq, high-resolution 16-fraction Repli-seq and IZs. c, Distribution of loops per boundary for each of the six boundary classes. Vertical lines demarcate mean number of loops per boundary within each sample and boundary class. Two-tailed Mann–Whitney U-test between HCT116 WAPL-knockdown and HCT116 wild-type cells for class 1 P = 2.0 × 10−207 and class 2 P = 1.2 × 10−16. d, Averaged Repli-seq for each of the 16 fractions in a ±750-kb window at boundary classes 1, 2 and 6 as detailed in the Supplementary Methods and Supplementary Table 14. Boundary numbers are provided in the figure for autosomal chromosomes alone. Each Repli-seq row represents a temporal fraction from S phase, there are 16 rows/fractions, and the Repli-seq signal plotted represents an average across all boundaries in a particular class for that fraction (y-axis) in 50-kb bins across a ±750-kb genomic distance centred on the midpoint of boundaries (x-axis). e, Width of all IZs colocalized with boundary classes 1, 2 and 6. Two-tailed Mann–Whitney U comparing HCT116 wild-type to HCT116 WAPL-knockdown cells for class 1 P = 3.0 × 1022 and class 2 P = 3.3 × 109.

We finally sought to understand whether specific boundaries are necessary and sufficient to regulate IZ firing. We used targeted CRISPR–Cas9 genome editing to delete an 80-kb section of the genome containing a complex array of more than 10 CTCF + cohesin-binding sites with complex motif orientations anchoring a long-range chromatin loop that separates late from early replication timing domains (Fig. 4a). The loop anchor was chosen because it also partially overlaps an early-S-phase IZ, but does not encompass the full IZ, thus allowing us to ablate the loop while keeping much of the IZ intact. We observed a striking local delay of replication timing from early to late following deletion of the 80-kb loop anchor, consistent with the loss of an early IZ (Fig. 4a,c(i)). As a negative control, we deleted a different 30-kb loop anchored by two tandemly oriented CTCF-binding sites within an adjacent late replication timing domain, but not overlapping an IZ (Fig. 4b). Deletion of this 30-kb loop anchor disrupted the dot boundary but preserved the timing and genomic location of DNA replication (Fig. 4b,c(ii)). The direct overlap of IZs with boundaries precludes our ability to fully decouple them, and overlap of functional elements remains a technical challenge for functional perturbative studies in the genome biology field at large. Nevertheless, our data provide evidence that replication at a specific early IZ can undergo a striking shift to late S phase following ablation of a boundary. These data are consistent with our cohesin-knockdown observations and our model in which boundaries marked by a complex CTCF motif orientation inform the precise placement of high-efficiency IZs.

Fig. 4: Targeted perturbation leading to gain and loss of structural boundaries can deterministically shift replication timing from early to late S phase.
figure 4

a, Schematic showing a CRISPR-mediated 80-kb deletion encompassing the IDS gene (coordinates of deletion: hg38, chromosome X: 149,470,422–149,555,112). b, Schematic showing a CRISPR-mediated 30-kb deletion encompassing two CTCF sites approximately 100 kb upstream from the FMR1 gene (coordinates of deletion: hg38, chromosome X: 147,804,022–147,838,883). CTCF ChIP–seq tracks for wild-type induced pluripotent stem (iPS) cells and the edited clone are shown. Scissors represent the location of the cut sites verified with Sanger sequencing. c, 5C heatmaps (chromosome X: 145,118,480–151,431,528, hg38) and two-fraction Repli-seq tracks in wild-type iPS cells, and iPS cells with an 80-kb (i) or a 30-kb (ii) loop anchor deletion. Tracks for IZs in human ES cells are overlaid. d, Gain of boundary Hi-C and Repli-seq at chromosome 2: ≈13M (hg19) and chromosome 6: ≈102M (hg19) in HAP1 cells with a transposon inserted boundary43 and replication timing. e, Model of DNA replication initiation determined by high-likelihood cohesin extrusion stalling against strong TAD/subTAD boundaries created high-density arrays of CTCF + cohesin-bound motifs with a complex orientation (early replicating IZs) or low-likelihood cohesin pausing against weak TAD/subTAD boundaries formed by single CTCF motifs (late replicating IZs). Yellow double hexamers depict the MCM2–7 complex at licensed origins. Red double hexamers represent the subset of licensed origins that are activated.

As the direct overlap of IZs with boundaries is not amenable to clean, single-variable ‘loss-of-structure’ perturbative experiments, we also examined a ‘gain-of-structure’ approach in which we assessed whether the introduction of an engineered ectopic boundary was sufficient to induce changes in replication initiation. We mapped replication with two-fraction Repli-seq in published HAP1 cell lines in which we have previously demonstrated a gain in boundary following insertion of an established 2 kb-sized cell-type-invariant boundary element43. We observed a striking shift from late to early replication directly at the location of the engineered boundary (Fig. 4d), consistent with the possibility that boundaries can be sufficient for de novo early IZ firing. Together, our data reveal that both global and local gain and loss of structural boundaries can deterministically influence the placement of IZs.

It is well established that the initiation of DNA replication involves two mutually exclusive steps1,2. The first step, origin licensing, begins in telophase with the loading of two copies of the mini-chromosome maintenance (MCM2–7) complex2,44. MCM2–7 is initially loaded in excess at tens of thousands of sites across the human genome in an inactive form as a double hexamer that encircles double stranded DNA (yellow double hexamers in Fig. 4e). The second step, origin activation, occurs at the onset of S phase. Origin activation involves mechanisms that both prevent further MCM loading and recruit multiple extra factors to initiate the unwinding of the double helix and DNA synthesis2,44. In mammalian systems, a critical mystery remains regarding the mechanisms that governing the selection of a subset of MCM-bound, licensed origins for activation in S phase.

Here we propose a model in which cohesin-mediated loop extrusion and stalling at dot boundaries marked by CTCF + cohesin-binding sites oriented in convergent and divergent directions is required for the positioning of high-efficiency replication origins (Fig. 4e). We propose two possible models to explain the strong localization of high-efficiency IZs to a subset of cohesin-dependent, genetically encoded boundaries: cohesin could directly push licensed MCM double hexamers or other origin activation cofactors along the genome before stalling at high-density arrays of CTCF + cohesin-bound motifs in complex orientations; alternatively, cohesin might pass over many licensed, MCM-bound origins and selectively participate in the activation of those already loaded at boundaries. We also posit that low-efficiency IZs might fire at weaker dotless boundaries later in S phase because cohesin only temporarily pauses during its traversal along the genome, and thus cannot aggregate initiation activity (Fig. 4e). In the cell types from our study, cohesin-mediated loop extrusion is required for IZ placement, and the changes in replication timing are subtle and indirect owing to the altered distance of nearby genomic regions to the nearest initiation site. We note that although we do not see evidence for a dominant role for cohesin on the larger replication timing program, we cannot rule out that cohesin knockdown might have a more profound effect on the replication timing program in other cell types, species and experimental designs.

Previous studies using mass spectrometry and co-immuno-precipitation have reported the direct binding of cohesin to DNA replication factors, such as MCM7, MCM6, MCM4, RFC1 and DNA polymerase α27,45. The MCM complex has the ability to slide after loading and can be pushed by polymerase during transcription46,47,48. However, the extent and rate at which this occurs on chromatin in the presence of nucleosomes (≈11 nm) is still an open question. The internal diameter of cohesin is 40 nm, whereas the MCM2–7 double hexamer is only 15 nm. The findings of a recent Hi-C and imaging study suggest that, despite their small size, MCM complexes could also serve as boundaries to block cohesin-based loop extrusion49. TAD boundaries and loops persist through S phase50, but MCMs are removed from chromatin after IZs fire1,2. Therefore, we favour a model in which cohesin pushes licensed MCMs in G1, leading to the localization and activation of a key subset of origins at boundaries with a complex CTCF motif orientation in S phase (Fig. 4e). Nevertheless, both proposed models remain exciting areas for future mechanistic dissection.

Understanding the structure–function relationship of the human genome remains a major challenge for human geneticists and chromatin biologists. Here we stratify TADs and subTADs by their structural and molecular features. We conduct global and local perturbative studies to reveal that genetically encoded TAD/subTAD boundaries formed by cohesin-mediated loop extrusion in G1/pre-S functionally inform genome function in the case of the initiation of DNA replication in S phase. Our work sheds light on the question of whether and how the location of fired origins is deterministically encoded in humans by the genome, epigenome and higher-order chromatin folding.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.