Abstract
Chromatin in the mammalian nucleus folds into discrete contact-enriched regions such as topologically associating domains (TADs). The folding hierarchy and internal organization of TADs are highly dynamic throughout cellular differentiation and are correlated with gene activation and silencing. To account for multiple interacting TADs, we developed a parsimonious randomly cross-linked (RCL) polymer model that maps high-frequency Hi-C encounters within and between TADs into direct loci interactions using cross-links at a given base-pair resolution. We reconstruct three TADs of the mammalian X chromosome for three stages of differentiation. We compute the radius of gyration of TADs and the encounter probability between genomic segments. We found (1) a synchronous compaction and decompaction of TADs throughout differentiation and (2) a high-order organization into meta-TADs resulting from weak inter-TAD interactions. Finally, the present framework allows us to infer transient properties of the chromatin from steady-state statistics embedded in the Hi-C/5C data.
Introduction
Mammalian chromosomes fold into discrete megabase-pair (Mbp) contact-enriched regions termed topologically associating domains (TADs). Although the precise role of TADs remains unclear, they participate in synchronous gene regulation^{1,2} and replication^{3}. Gene regulation within TADs is modulated by transient loops at the sub-TAD scale^{1,4,5,6}, formed by regulatory elements such as enhancers and promoters^{7}. However, sparse connectors between TADs (at a scale >Mbp) can significantly influence chromatin dynamics and gene regulation within TADs^{8,9}. We present and apply here a methodology based on a polymer model that accounts for interactions between TADs to study chromatin dynamics at long scales, an area that remains largely unexplored.
Genome organization can be probed by chromosome conformation capture (CC) methods^{2,4,10}, which report simultaneous genomic contacts (loops) at scales of kilobase pairs (kbp) to Mbp. TADs appear as matrix blocks in the contact maps obtained by summing encounter events over an ensemble of millions of cells^{1,10}. Conformation capture contact maps provide a statistical summary of steady-state looping frequencies, but contain neither direct information about the size of the folded genomic section nor any transient genomic encounter times^{11}. Throughout cell differentiation stages, the boundaries between TADs remain stable^{1,12}, but their internal looping pattern is highly variable. Moreover, TADs can assemble hierarchically into meta-TADs, formed by inter-TAD connectivity and correlated with the transcription state of the chromatin^{12}.
Coarse-grained polymer models are used to study steady-state and transient properties of chromatin at a given scale. The starting point is the Rouse polymer^{13}, composed of N monomers connected sequentially by harmonic springs. Its linear connectivity does not capture the complexity of molecular interactions and therefore cannot account for contact-enriched regions such as TADs. To include short- and long-range looping, other polymer models have been developed, with self-avoiding interactions^{11,14,15,16,17,18,19,20,21,22}, random loops^{8,15,23,24}, or epigenomic state^{25,26}, that in addition account for TADs.
To account for interactions between multiple TADs, we develop a parsimonious polymer model of the chromatin. We extend the construction of the randomly cross-linked (RCL) polymer model, introduced in refs. ^{9,11}, to account for multiple connected TADs of various sizes and connectivities within and between TADs. Random cross-links can be generated by binding molecules such as CTCF^{1,27} or by a loop extrusion mechanism^{28}, although the exact mechanism by which they form is not relevant for the present work. The random positions of the cross-links capture the heterogeneity of chromatin structure sampled in a large ensemble of cells.
We then introduce our methodology to construct an RCL polymer from the empirical encounter probability (EP) extracted from 5C or Hi-C data. We determine the geometrical organization of TADs and characterize their distribution in space by computing their volume from the mean radius of gyration (MRG). We further investigate the role of several parameters such as the number and positions of the cross-links or the polymer length. We show that TADs and higher-order genome organization are modulated by the inter-TAD connectivities. These connectivities determine the steady-state and time-dependent statistical properties of monomers within each TAD. We validate the present method by comparing Hi-C and 5C data and the reconstructed statistics between two 5C replicas. Finally, we study the reorganization of three neighboring TADs on the mammalian X chromosome throughout stages of cellular differentiation. The key step of our method was to derive an expression for the EP of the RCL polymer model (see Eqs. (3) and (5)); this step was a bottleneck when fitting the empirical 5C data^{1}. We then determine the rate of compaction and decompaction of TADs, which can be correlated with gene silencing and activation, respectively. These properties cannot be studied by a direct comparison of the overall number of interactions within and between TADs from 5C data, because individual realizations cannot be discerned from the ensemble. To conclude, the present framework allows us to study chromatin dynamics from Hi-C or 5C data, to derive its statistical properties, to connect steady-state Hi-C statistics with time-dependent analysis of single-locus trajectories, and to reveal the influence of multiple connected TADs on each other and on genome reorganization throughout cellular differentiation.
Results
Statistical properties of heterogeneous RCL
We study the statistical properties of the heterogeneous RCL polymer representing multiple TADs using four quantities: first, the mean-square radius of gyration (MSRG); second, the EP in two cases, between monomers of the same TAD (intra-TAD) and across TADs (inter-TAD); third, the mean-square displacement (MSD) of single monomers; and fourth, the distribution of the distance between any two monomers. Details are given in Supplementary Methods.

(1) The MSRG \(\langle R_g^2\rangle\) characterizes the folding of a TAD inside a ball of radius \(\sqrt {\langle R_g^2\rangle }\). When the condition of a dominant intra-TAD connectivity (assumption H_{1}, Supplementary Methods Equation 29) is satisfied, the MSRG for TAD_{i} (see derivation in Supplementary Methods Equations 24–34) is given by
$$\left\langle {R_g^2} \right\rangle ^{(i)} \approx \frac{{b^2}}{{(1 - \xi _{ii})(\zeta _0^{(i)}({\mathrm{\Xi }}) - \zeta _1^{(i)}({\mathrm{\Xi }}))}},$$(1)
where
$$\begin{array}{lll} \zeta _0^{(i)}({\mathrm{\Xi }}) & = & y^{(i)}({\mathrm{\Xi }}) + \sqrt {y^{(i)}({\mathrm{\Xi }})^2 - 1} , \\ \zeta _1^{(i)}({\mathrm{\Xi }}) & = & y^{(i)}({\mathrm{\Xi }}) - \sqrt {y^{(i)}({\mathrm{\Xi }})^2 - 1} , \end{array}$$(2)
\(y^{(i)}({\mathrm{\Xi }}) = 1 + \frac{{\mathop {\sum }\nolimits_{k = 1}^{N_T} \xi _{ik}N_k}}{{2(1 - \xi _{ii})}}\), and ξ_{ij} is the connectivity matrix defined in Eq. (14).
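Expressions (1) and (2) can be evaluated numerically as in the following minimal sketch. The function names and the connectivity values in `xi` are hypothetical and chosen for illustration only; they are not fitted values. The connectivity matrix is passed in directly rather than built from Eq. (14).

```python
import math

def zeta_roots(xi, sizes, i):
    """Return (zeta_0, zeta_1) of Eq. (2) for TAD i (0-based index)."""
    weighted = sum(xi[i][k] * sizes[k] for k in range(len(sizes)))
    y = 1.0 + weighted / (2.0 * (1.0 - xi[i][i]))
    root = math.sqrt(y * y - 1.0)
    return y + root, y - root

def msrg(xi, sizes, i, b):
    """Mean-square radius of gyration of TAD i, Eq. (1)."""
    z0, z1 = zeta_roots(xi, sizes, i)
    return b ** 2 / ((1.0 - xi[i][i]) * (z0 - z1))

# Hypothetical connectivity fractions for three TADs (illustrative only).
xi = [[0.020, 0.002, 0.001],
      [0.002, 0.030, 0.002],
      [0.001, 0.002, 0.025]]
sizes = [50, 40, 60]   # number of monomers per TAD
b = 0.2                # monomer spacing parameter, in micrometres

for i in range(len(sizes)):
    # The MRG is the square root of the MSRG.
    print("TAD", i + 1, "MRG (um):", math.sqrt(msrg(xi, sizes, i, b)))
```

Because ζ_0 and ζ_1 are the two roots of the same quadratic, their product equals 1, which gives a quick consistency check on any implementation.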

(2) Under the condition of non-vanishing connectivities (Supplementary Methods Equation 50), the EP between monomers m and n within TAD A_{i} (see derivation in Supplementary Methods) is given by
$$P_{m^{(i)},n^{(i)}}({\mathrm{\Xi }}) \propto \left({\frac{d}{{2\pi \sigma _{m,n}^2({\mathrm{\Xi }})}}} \right)^{d/2},$$(3)
where
$$\sigma _{m,n}^2({\mathrm{\Xi }}) = \left\{ \begin{array}{ll} \left\langle {R_g^2} \right\rangle ^{(i)}\left( \frac{(\zeta _0^{(i)}({\mathrm{\Xi }})^{m - n} - 1)^2 - 2\zeta _0^{(i)}({\mathrm{\Xi }})^{m + n - 1}}{\zeta _0^{(i)}({\mathrm{\Xi }})^{2m - 1}} + 2 \right), & m \ge n; \\ \left\langle {R_g^2} \right\rangle ^{(i)}\left( \frac{(\zeta _0^{(i)}({\mathrm{\Xi }})^{n - m} - 1)^2 - 2\zeta _0^{(i)}({\mathrm{\Xi }})^{m + n - 1}}{\zeta _0^{(i)}({\mathrm{\Xi }})^{2n - 1}} + 2 \right), & m < n, \end{array} \right.$$(4)
and \(\left\langle {R_g^2} \right\rangle ^{(i)}\) is the MSRG of TAD A_{i}, defined by relation (1). When monomers belong to distinct TADs i and j (i ≠ j), the EP formula is modified to (see Supplementary Methods, subsection Encounter probability of monomers of the heterogeneous RCL polymer)
$$P_{m^{(i)},n^{(j)}}({\mathrm{\Xi }}) \propto \left( {\frac{d}{{2\pi \sigma _{mn}^2({\mathrm{\Xi }})}}} \right)^{d/2},$$(5)
where
$$\begin{array}{lll} \sigma _{mn}^2(\Xi ) & = & \left\langle {R_g^2} \right\rangle ^{(i)}(1 + \zeta _0^{(i)}(\Xi )^{1 - 2m}) + \left\langle {R_g^2} \right\rangle ^{(j)}(1 + \zeta _0^{(i)}(\Xi )^{1 - 2n}) \\ & & + b^2\left( \frac{1}{N_i\sum\nolimits_{k \ne i}^{N_T} N_k \xi _{ik}} + \frac{1}{N_j\sum\nolimits_{k \ne j}^{N_T} N_k \xi _{jk}} \right). \end{array}$$(6)
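The intra-TAD case, Eqs. (3) and (4), can be sketched as follows. The function names are hypothetical, monomers are indexed from 1 within their TAD, and normalizing each row over the monomers of the same TAD is an assumption made here to turn the proportionality in Eq. (3) into a probability.

```python
import math

def sigma2_intra(m, n, zeta0, rg2):
    """Variance of Eq. (4) for monomers m and n (1-based) of the same TAD."""
    hi, lo = max(m, n), min(m, n)
    ratio = ((zeta0 ** (hi - lo) - 1.0) ** 2
             - 2.0 * zeta0 ** (hi + lo - 1)) / zeta0 ** (2 * hi - 1)
    return rg2 * (ratio + 2.0)

def ep_intra_row(m, size, zeta0, rg2, d=3):
    """EP of monomer m with every other monomer of its TAD (Eq. (3)).

    Assumption: the row is normalised to sum to 1 over the TAD.
    """
    weights = {}
    for n in range(1, size + 1):
        if n != m:  # sigma^2 vanishes for m == n, so skip the diagonal
            s2 = sigma2_intra(m, n, zeta0, rg2)
            weights[n] = (d / (2.0 * math.pi * s2)) ** (d / 2.0)
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

# Illustrative values only: zeta_0 = 1.5 and an MSRG of 0.04 um^2.
row = ep_intra_row(20, 50, zeta0=1.5, rg2=0.04)
print(row[21], row[25], row[40])
```

Two properties follow directly from Eq. (4) and are useful sanity checks: the variance vanishes for m = n, and it is symmetric under exchange of m and n.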
(3) The MSD of a monomer \(r_m^{(i)}\) located inside \(A_i\) for intermediate times (see Supplementary Methods, subsection Mean-square displacement of monomers of the heterogeneous RCL polymer) is given by
$$\langle \langle (r_m^{(i)}(t + s) - r_m^{(i)}(s))^2\rangle \rangle \approx 2dD_{cm}t + \frac{{db^2\,\mathrm{Erf}\left[ {\sqrt {2dDt\mathop {\sum }\nolimits_{k = 1}^{N_T} \frac{{N_k\xi _{ik}}}{{b^2}}} } \right]}}{{2\sqrt {(1 - \xi _{ii})\mathop {\sum }\nolimits_{k = 1}^{N_T} N_k\xi _{ik}} }},$$(7)
where \(D_{cm} = \frac{D}{{\mathop {\sum }\nolimits_{k = 1}^{N_T} N_k}}\) and Erf is the Gauss error function.
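Expression (7) translates directly into code. The sketch below uses a hypothetical function name and illustrative connectivity values; it only evaluates the formula and does not simulate the polymer.

```python
import math

def msd_rcl(t, i, xi, sizes, b, D, d=3):
    """MSD of a monomer of TAD i (0-based) at intermediate time t, Eq. (7)."""
    weighted = sum(sizes[k] * xi[i][k] for k in range(len(sizes)))
    d_cm = D / sum(sizes)                      # centre-of-mass diffusion, D_cm
    drift = 2.0 * d * d_cm * t                 # first term of Eq. (7)
    arg = math.sqrt(2.0 * d * D * t * weighted / b ** 2)
    amplitude = d * b ** 2 / (2.0 * math.sqrt((1.0 - xi[i][i]) * weighted))
    return drift + amplitude * math.erf(arg)

# Hypothetical connectivity fractions (illustrative only).
xi = [[0.020, 0.002, 0.001],
      [0.002, 0.030, 0.002],
      [0.001, 0.002, 0.025]]
sizes = [50, 40, 60]

for t in (0.1, 1.0, 10.0):
    print(t, msd_rcl(t, 0, xi, sizes, b=0.2, D=8e-3))
```

The error-function term saturates at long times, so the curve crosses over from a fast initial rise to the slow linear drift of the centre of mass.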

(4) The distribution \(f_{D_{mn}}(x)\) of the distance \(D_{mn} = \parallel r_m - r_n\parallel\) between any two monomers r_{m} and r_{n} (see Supplementary Methods, subsection Distribution of the distance between monomers of the RCL polymer) is given by
where Γ is the Γ-function.
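As an illustrative sketch only, suppose the separation vector r_m − r_n is an isotropic Gaussian in d dimensions whose mean-square length is σ²_{mn}, which is the Gaussian form implied by the prefactor of Eqs. (3) and (5). Under that assumption the distance follows a scaled chi distribution, which can be evaluated as below. The function name is hypothetical, and this form is an assumption rather than a quotation of the expression derived in the Supplementary Methods.

```python
import math

def distance_pdf(x, sigma2, d=3):
    """Density of ||r_m - r_n|| for an isotropic Gaussian separation vector.

    Assumption: each of the d components has variance sigma2 / d, so that
    the mean-square distance equals sigma2.
    """
    a = d / (2.0 * sigma2)
    return (2.0 * a ** (d / 2.0) * x ** (d - 1)
            * math.exp(-a * x * x) / math.gamma(d / 2.0))

# Numerical check of the normalisation on a fine grid (illustrative sigma2).
sigma2 = 0.04
step = 1e-3
grid = [k * step for k in range(3001)]
print(sum(distance_pdf(x, sigma2) for x in grid) * step)
```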
To validate formulas (1)–(7) before using them to extract statistical properties from 5C/Hi-C data, we tested them against numerical simulations of three synthetic interacting TADs. We constructed an RCL polymer containing three TADs with N_{1} = 50, N_{2} = 40, and N_{3} = 60 monomers, such that condition H_{1} (Supplementary Methods Equation 29) about dominant intra-TAD connectivity is satisfied. We imposed that the number of connectors within each TAD be at least twice the number between TADs (see Fig. 1a).
To construct the encounter frequency matrix, we simulated Eq. (17) in dimension d = 3, with b = 0.2 μm and diffusion coefficient D = 8 × 10^{−3} μm^{2} s^{−1} ^{29}, starting with a random-walk initial polymer configuration. Connectors were placed between monomers with a uniform probability in each TAD_{i} and between TADs, as indicated in Fig. 1a. We ran 10,000 simulations until the polymer relaxation time (see Supplementary Methods Equation 23 and ref. ^{11}). The longest relaxation time of RCL chains containing N_{T} TADs corresponds to tens of thousands of simulation steps. At the end of each realization, we collected the monomer pairs separated by less than the distance \(\epsilon\) = 40 nm, and constructed the simulation encounter frequency matrix. This matrix shows three distinct diagonal blocks (Fig. 1a) resulting from high intra-TAD connectivity, and further reveals a high-order organization (cyan blue in Fig. 1a), which resembles the meta-TADs discussed in ref. ^{12}. We thus propose here that hierarchical TAD organization is a consequence of weak inter-TAD connectivity.
We then computed the steady-state EP from the simulation encounter frequency matrix (Fig. 1a) by dividing each row by its sum. We then compared simulated and theoretical EPs (Eqs. (3) and (5)) in Fig. 1b: the three sample curves for monomers r_{20} (upper left), r_{70} (upper right), and r_{120} (lower), located in the middle of each TAD, are in good agreement with the theory. Furthermore, the theoretical and simulated EPs for monomers r_{1}, r_{51}, and r_{91}, located at boundaries of TADs (Fig. 1b, bottom), are in good agreement. Finally, we computed the MRG \(\bar R_g = \sqrt {\langle R_g^2\rangle }\) for TAD_{1}, TAD_{2}, and TAD_{3} and obtained 0.177, 0.13, and 0.165 μm from simulations, in agreement with 0.178, 0.13, and 0.167 μm, respectively, obtained from expression (1).
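The row normalization that converts an encounter frequency matrix into an EP matrix can be sketched as follows. The function name and the small count matrix are hypothetical; rows with no recorded encounters are mapped to zeros, which is a convention chosen here.

```python
def encounter_probability(freq):
    """Divide every row of a square encounter-frequency matrix by its sum."""
    ep = []
    for row in freq:
        total = sum(row)
        # Rows without any encounter are left as zeros (assumed convention).
        ep.append([v / total if total > 0 else 0.0 for v in row])
    return ep

# Toy symmetric count matrix for three monomers (illustrative only).
counts = [[0, 8, 2],
          [8, 0, 6],
          [2, 6, 0]]
for row in encounter_probability(counts):
    print(row)
```

Note that the resulting EP matrix is in general no longer symmetric, because each row is divided by a different sum.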
To validate the MSD expression (Eq. (7)), we simulated Eq. (17) for 2500 steps with a time step Δt = 0.01 s, past the relaxation time τ(Ξ) (Supplementary Methods Equation 29), and computed the average MSD over all monomers in each TAD_{i}, i = 1, 2, 3. In Fig. 1c, we plotted the average MSD in each TAD against expression (7) (dashed); the two are in good agreement. The overshoot of the MSD of TAD_{1} results from the weak coupling of the centers of mass of TADs (see Supplementary Methods Equation 22). The amplitude of the MSD curve is inversely proportional to the total connectivity of each TAD, as shown in Fig. 1c for TAD_{1} (blue, 26 connectors), TAD_{2} (red, 44 connectors), and TAD_{3} (yellow, 37 connectors). We conclude that the present approach (numerical and theoretical) captures the steady-state properties (Eqs. (1), (3)–(5), (7)) of multiple interacting TADs.
In addition, we found that adding an exclusion force with a radius of 40 nm did not modify the statistical quantities defined above (see Supplementary Fig. 3 compared to Fig. 1c). However, when the exclusion radius increased to 67 nm, deviations started to appear (Supplementary Fig. 4). To conclude, an exclusion radius of the order of 40 nm, also used in ref. ^{30}, is consistent with the physical crowding properties of condensin and cohesin^{31} to fold and unfold chromatin. Thus, using the present RCL polymer model, we now reconstruct the statistical properties of chromatin in different phases of cellular differentiation.
Reconstructing genome reorganization during cell differentiation
To extract chromatin statistical properties, we systematically constructed an RCL model from 5C data of the X chromosome^{1}. We focus on the chromatin organization during three stages of differentiation: undifferentiated mouse embryonic stem cells (mESC), neuronal precursor cells (NPC), and mouse embryonic fibroblasts (MEF). We first used the average of two replicas of a subset of 5C data generated in ref. ^{1}, and then each replica separately. The two replicas harbor three TADs, TAD D, E, and F, which span a genomic section of about 1.9 Mbp. We coarse-grained the 5C encounter frequency data at a scale of 6 kb (Fig. 2a, upper), which is twice the median length of the restriction segments of the HindII enzyme used in producing the 5C data^{1,8,9}. At this scale, we found that long-range persistent peaks of the 5C encounter data are sufficiently smoothed out to allow expressions (3) and (5) to be fitted to the 5C EP using a standard norm minimization procedure. The result is a coarse-grained encounter frequency matrix that includes pairwise encounter data of 302 equally-sized genomic segments. To determine the position of TAD boundaries, we mapped the TAD boundaries reported in bps (see ref. ^{1}) to genomic segments after coarse-graining. We then constructed a heterogeneous RCL polymer with N_{D} = 62, N_{E} = 88, and N_{F} = 152 monomers for TAD D, E, and F, respectively. To compute the minimum number of connectors within and between TADs, we fitted the EP of each monomer in the coarse-grained empirical EP matrix using formulas (3) and (5). In Fig. 2a (bottom), we present the fitted EP matrices for mESC (left), NPC (middle), and MEF (right).
We computed the number of connectors N_{c}, within and between TADs, by averaging the connectivity values ξ_{m} for monomers in each TAD obtained from fitting the EP (Eqs. (3) and (5)) to all 302 monomers. After averaging, we obtained the connectivity matrix Ξ and used it in relations (13) and (14) to recover the number of connectors within and between TADs. In the differentiation from mESC to NPC, the mean number of connectors within and between TADs rose to about 145–200% of its mESC value (Fig. 2b). The number of connectors within TAD F rose to about 145% of its mESC value, from 22 for mESC to 31 for NPC; the inter-TAD connectivity between TAD F and E rose to about 150%, from 9 for mESC to 14 for NPC; and the connectivity between TAD F and D doubled, from 6 connectors for mESC to 12 for NPC, whereas the number of connectors within TAD E remained constant at 15. In the MEF stage, the number of connectors within TADs D, E, and F returned to values comparable to mESC, and the inter-TAD connectivity between TAD F and E was 9.
To evaluate the size of the folded TADs, we computed the MRG of the three TADs throughout differentiation stages. The MRG is the square root of the MSRG (Eq. (1)) for each TAD, computed using the calibrated connectivity matrix Ξ obtained from fitting the experimental 5C EP (Fig. 2c, left). We found that the MRG can both increase and decrease depending on the number of connectors within TADs, but is also affected by inter-TAD connectivity, as revealed by Eq. (1). The MRG of all TADs decreased on average from 0.21 μm at the mESC stage to 0.19 μm for NPC and was 0.2 μm for MEF cells. From mESC to NPC, the MRG of TAD E exceeded that of TAD D (Fig. 2c, red squares) despite the higher number of added connectors in TAD D and its smaller size, N_{D} = 62 compared to N_{E} = 88. This result shows how the inter-TAD connectivity contributes to determining the MRG and the volume of TADs. In addition, using the calibrated RCL model at 3 kb, we were able to reproduce the distributions of three-dimensional distances between seven DNA FISH probes reported in ref. ^{8} (Supplementary Fig. 2).
Finally, we recall that the radius of gyration alone is insufficient to characterize the degree of compaction inside a TAD, because it does not give the density of bps per nm^{3}. To obtain a better characterization of chromatin compaction, we use the compaction ratio for TAD_{i}, defined by the ratio of volumes:
where \(\left\langle {R_g^2} \right\rangle ^{(i)}\) is given by formula (1) and the denominator is the MSRG for a linear Rouse chain of size N_{i}^{13}. We find that TAD F (N_{F} = 152 monomers) has the highest compaction ratio among all the TADs in the three stages of differentiation (Fig. 2c, circles): indeed, for TAD F, \(C_r^F = 91\), 135, and 97 at the mESC, NPC, and MEF stages, respectively, that is, it is 91-, 135-, and 97-fold more compact than the linear Rouse chain with N = 152 monomers. For TAD E (N_{E} = 88 monomers), the compaction ratio is 51, 66, and 45; it is thus more compact than the linear Rouse chain with N = 88 monomers (Fig. 2c right, red squares), despite retaining 15 intra-TAD connectors in all stages of differentiation (panel b). This effect is due to an increased inter-TAD connectivity between TAD E and F at the NPC stage to 15. Finally, TAD D (N_{D} = 62 monomers), characterized by \(C_r^D = 28\), 44, and 35 (blue diamonds) for the mESC, NPC, and MEF stages, respectively, is more compact than a Rouse chain of N = 62 monomers.
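A compaction ratio of this kind can be evaluated with the following minimal sketch. Two assumptions are made here and are not taken from the text: the MSRG of the linear Rouse chain is approximated by its large-N limit N b²/6, and the ratio is oriented so that values above 1 mean "more compact than the Rouse chain", which is how the fold values above read. The function names are hypothetical.

```python
def rouse_msrg(n, b):
    """MSRG of a linear Rouse chain of n monomers, large-n limit n * b^2 / 6."""
    return n * b ** 2 / 6.0

def compaction_ratio(rg2_rcl, n, b):
    """Ratio of the Rouse-chain volume to the RCL volume for a TAD.

    Volumes scale as the cube of the radius of gyration, hence the power 3/2
    applied to the ratio of mean-square radii. Orientation is an assumption.
    """
    return (rouse_msrg(n, b) / rg2_rcl) ** 1.5

# Illustration: a TAD whose radius of gyration is half that of the Rouse chain.
n, b = 88, 0.2
print(compaction_ratio(rouse_msrg(n, b) / 4.0, n, b))
```

Halving the radius of gyration divides the volume by eight, so the cubic dependence makes the ratio very sensitive to small changes in the MRG.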
To examine the consistency of our approach and the ability of the RCL model to represent chromatin, we independently fitted the EPs P^{(1)} and P^{(2)} of the 5C data of replicas 1 and 2 at 10 kb resolution (Supplementary Fig. 5A–C) using Eqs. (3)–(5) (Methods). We found that the number of added connectors in replicas 1 and 2 differs by at most five connectors for TAD F. This difference between replicas may arise from intrinsic fluctuations in the statistics of encounter frequencies. We further compared the fitted EP P^{(1)} with the empirical EP E^{(2)} of replica 2 (Supplementary Fig. 5D, left): we found that \(\langle \parallel P^{(1)}(m) - E^{(2)}(m)\parallel \rangle _m\), averaged over monomers m (Supplementary Methods Equation 65), equals 0.17. Note that the main contribution to this difference arises from monomers forming long-range loops (Supplementary Fig. 5D), consistent with off-diagonal peaks of the 5C data. Similarly, we found \(\langle \parallel P^{(2)} - E^{(1)}\parallel \rangle _m = 0.17\) (Supplementary Fig. 5D, right). In addition, the mean radii of gyration for all three TADs in both replicas were comparable for all three stages of differentiation (Supplementary Fig. 6A, B, left).
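The comparison between a fitted and an empirical EP matrix can be sketched as a row-wise norm averaged over monomers. The exact norm is specified in Supplementary Methods Equation 65, which is not reproduced here; the Euclidean norm used below is therefore an assumption, and the function name and toy matrices are hypothetical.

```python
import math

def mean_row_distance(p, e):
    """Average over monomers m of the Euclidean norm ||P(m) - E(m)||."""
    distances = [
        math.sqrt(sum((a - c) ** 2 for a, c in zip(p_row, e_row)))
        for p_row, e_row in zip(p, e)
    ]
    return sum(distances) / len(distances)

# Toy fitted and empirical EP matrices for two monomers (illustrative only).
fitted = [[0.0, 1.0], [1.0, 0.0]]
empirical = [[0.3, 0.7], [0.6, 0.4]]
print(mean_row_distance(fitted, empirical))
```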
To evaluate the consequences of boundaries between TADs on the number of added connectors necessary to reconstruct the heterogeneous RCL polymer (Supplementary Fig. 7), we subdivided each of TADs D–F into two equal parts, and repeated the fitting of the heterogeneous RCL model to the EP of the six resulting sub-TADs. We thus tested the scenario where the number and boundaries of TADs differ from those in ref. ^{1}, which can result from various TAD-calling algorithms^{32,33}. We extracted the intra- and inter-TAD connectivity fractions Ξ and computed the number of connectors by fitting the empirical EP (Supplementary Fig. 6A, B). We computed the difference in the average number of connectors between the three-TAD case (Supplementary Fig. 6C) and the six sub-TAD case. We found a maximal difference of six connectors for the intra-TAD connectivity of TAD E in the MEF stage (Supplementary Fig. 7D). For inter-TAD connectivity, the average difference is two connectors. In addition, we find that the compaction of TADs throughout differentiation is preserved for the six-TAD case (Supplementary Fig. 8A), and further find a qualitative agreement between the compaction ratios of the three- and six-TAD cases (see comparison, Supplementary Fig. 8B).
Finally, to determine the robustness of the predictions of the heterogeneous RCL polymer, we compared the reconstructed 5C statistics (Fig. 2) to the statistics reconstructed from Hi-C data^{34} of the X chromosome, harboring TAD D, E, and F, binned at 10 kb, with b = 1.814 μm computed from Supplementary Methods Equation 67, and for three successive stages of differentiation: mESC, NPC, and cortical neurons (CN) (Fig. 3). Note that the polymer models reconstructed from Hi-C and 5C data are not necessarily identical, although they share some similar statistics, because the two data sets are generated at different resolutions. However, we found a good agreement between the intra-TAD connectivity of TADs D and E of the 5C and Hi-C data for the mESC and NPC stages (Fig. 3a, c). In general, the inter-TAD connectivity in the Hi-C data was lower (average of 1.5 connectors) than that of the 5C data (average of 4), which resulted in an increased MRG for all TADs (Fig. 3b, d, left) and decreased compaction ratios (right). A direct comparison between the reconstructed statistics of the 5C MEF and Hi-C CN data was not possible. To conclude, inter-TAD connectivity plays a key role in the compaction of TADs, and therefore recovering the exact number of inter-TAD connectors is a key step for precisely recovering genome reorganization from 5C data.
Distribution of anomalous exponents for single monomer trajectories
Multiple interacting TADs in a cross-linked chromatin environment, mediated by cohesin molecules, can affect the dynamics of single-locus trajectories. Indeed, analysis of single particle trajectories (SPTs)^{35,36,37,38,39} of a tagged locus revealed a deviation from classical diffusion, as measured by the anomalous exponent. We recall briefly that the MSD (Eq. (7)) is computed from the positions r_{i}(t) of all monomers i = 1, …, N_{T}. In that case, the MSD, which is an average over realizations, behaves for small time t as a power law in t, whose exponent α_{i} is the anomalous exponent of monomer i.
It is still unclear how the value of the anomalous exponent α_{i} relates to the local chromatin environment, although it reflects some of its statistical properties, such as the local cross-link interaction between loci^{14,35}. We therefore explored how the distribution of cross-links extracted from the EP of the Hi-C data could influence the anomalous exponents. For that purpose, we simulated a heterogeneous RCL model in which the number of cross-links was previously calibrated to the data. The number and positions of the connectors remain fixed throughout all simulations (for tens of seconds).
We started with a heterogeneous RCL model with three TADs, reflecting the inter- and intra-TAD connectivity shown in Fig. 2. We generated a hundred chromatin realizations \({\cal{G}}_1, \ldots ,{\cal{G}}_{100}\). In each realization \({\cal{G}}_k\), the positions of the added connectors are fixed. We then simulated each configuration in time a hundred times until the relaxation time (Supplementary Methods Equation 23). After the relaxation time was reached, defined as t = 0, we followed the position of each monomer and computed the MSD up to time t = 25 s. We fitted the MSD curves using a power law (Eq. (10)) to estimate the anomalous exponents \(\alpha _i,i = 1, \ldots ,302\) along the polymer chain. We repeated the procedure for each stage of cell differentiation: mESC, NPC, and MEF.
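One common way to estimate an anomalous exponent from an MSD curve is a least-squares fit in log-log coordinates, sketched below. It assumes the power law has the form MSD ≈ A t^α; the fitting procedure actually used for the results above is not specified here, so this is an illustration rather than the authors' method. The function name and the synthetic curve are hypothetical.

```python
import math

def anomalous_exponent(times, msd):
    """Fit MSD ~ A * t**alpha by linear least squares on log-log axes.

    Returns (alpha, A). All times and MSD values must be strictly positive.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(v) for v in msd]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    prefactor = math.exp(mean_y - alpha * mean_x)
    return alpha, prefactor

# Synthetic MSD curve sampled every 0.01 s up to 25 s (illustrative only).
times = [0.01 * k for k in range(1, 2501)]
msd = [0.05 * t ** 0.4 for t in times]
alpha, prefactor = anomalous_exponent(times, msd)
print(alpha, prefactor)
```

On noisy simulated or experimental curves, restricting the fit to the short-time window where the power law holds matters more than the choice of regression method.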
In Fig. 4, we plotted the anomalous exponent α_{i} for each monomer in the three stages mESC (left), NPC (middle), and MEF (right), and for TAD D (dark blue), TAD E (cyan), and TAD F (brown). We find a wide distribution of α_{i}, with values in the range \(\alpha _i \in [0.25,0.65]\) for all TADs in the three cell types. The average anomalous exponent in TAD D is α_{D} = 0.46 in the mESC stage, is reduced to α_{D} = 0.41 in NPC due to the increased intra-TAD connectivity, and increases to α_{D} = 0.435 in the MEF stage. The average anomalous exponent in TAD E was α_{E} = 0.425, 0.41, and 0.426 at the mESC, NPC, and MEF stages, respectively. The average anomalous exponent of TAD F was α_{F} = 0.443, 0.405, and 0.44 at the mESC, NPC, and MEF stages, respectively. The anomalous exponent α decreases when connectors are added, as observed throughout differentiation in all TADs, which is in agreement with the compaction and decompaction of TADs (Fig. 2c and Supplementary Figs. 6 and 7). Furthermore, we obtain an average anomalous exponent of 0.4, as previously reported experimentally in ref. ^{38}.
To complement the anomalous exponent, we estimated the space explored by monomers by computing the length of constraint L_{c}^{35} (computed empirically along a trajectory of N_{p} points for monomer R as \(L_{c} \approx \sum _i \left( {\frac{1}{{N_p}}R(i{\mathrm{\Delta }}t) - \langle R\rangle } \right)^{2}\)) for three monomers, one in each of TADs D, E, and F: r_{20}, r_{70}, and r_{120}. For a single connector realization, we obtain \(L_c \approx 0.3\), 0.25, and 0.26 μm, respectively, which is about twice the simulated MRG of TADs D, E, and F: 0.18, 0.13, and 0.17 μm, respectively. Thus we conclude that random distributions of fixed connectors can reproduce the large variability of anomalous exponents reported in experimental systems using single-locus trajectories, especially for bacteria and yeast genomes^{35,38} in various conditions.
Discussion
Here, we report a general framework based on the RCL model^{9,11} to extract statistical and physical properties of multiple interacting TADs. The present polymer model differs from others in several aspects. Our construction of a polymer model from Hi-C data is parsimonious: it uses a minimal number of added connectors at a given scale to match the experimental steady state of Hi-C/5C data, in contrast to the model of ref. ^{8}, which is based on a full monomer–monomer interaction, described by pairs of potential wells. In addition, at the scale of a few μm occupied by TADs D, E, and F of chromosome X, we neglected crowding effects from neighboring chromosomes. Furthermore, we do not use here several types of diffusing binders that need to find a binding site in order to generate a stable link, as introduced in ref. ^{40}. In addition, contrary to the random loop model (RLM)^{16}, we do not consider here transient binding, because the positions of random connectors within TADs do not matter (as long as they are uniformly randomly placed). By placing connectors randomly, we capture the heterogeneity in chromatin organization across the cell population. Here we fix connectors, which are stable on the time scale of minutes to hours. The present polymer construction is motivated by the evidence of many stable loci–loci interactions, which are common to the majority of chromosomes in 5C (e.g., peaks of the 5C data)^{1,32}. These stable interactions are also found at TAD boundaries, which are conserved in both human and mouse. There are several conflicting studies^{5,32,41} about the binding time of connectors (CTCF-cohesin, etc.), which suggest that cross-links can remain stable for minutes to hours and even during the entire phase cycle. Here, we study the chromatin dynamics within this time range where cross-links are stable^{32,41}.
One final difference between the present RCL model and the RLM is that the RCL model can account for several interacting TADs and provides expressions for statistical quantities such as the radius of gyration, the EP, and the MSD.
We applied our framework to reconstruct chromatin organization across cell differentiation from conformation capture (3C, 5C, and Hi-C) data, where we accounted for both intra- and inter-TAD connectivities. The RCL polymer model allows us to estimate the average number of cross-links within and between TADs, length scales such as the MSRG that characterize the size of folded TADs, and the MSD of monomers in multiple interacting TADs. The present method also allowed us to estimate the volume occupied by TADs. These quantities cannot be derived directly from the empirical conformation capture data and are usually extracted from SPT experiments^{42,43}. Finally, the present approach can also be used to study the phase-liquid transition at the chromatin level, depending on the number of connectors. We computed here the radius of gyration, which clarifies how chromatin compaction depends on the number of added connectors. Similar questions were recently discussed in a general perspective^{44}, and we presented here some answers.
We have applied the present approach to reconstruct multiple TAD reorganization during three stages of cell differentiation: mouse embryonic stem cells (mESC), neuronal precursor cells (NPC), and mouse embryonic fibroblasts (MEF). We fitted expressions (3) and (5) to the empirical 5C encounter matrices of the three differentiation stages (Fig. 2a) and showed that the RCL model produced contact-enriched TADs with a variability in monomer connectivity within each TAD (Fig. 2a, super- and sub-diagonals of EP matrices). We used the average connectivity obtained to compute the average number of connectors within and between TADs (Fig. 2b). At a scale of 6 kb, our results show that the X chromosome acquires connectors within and between TADs D, E, and F in the transition from mESC to NPC, and that the number of connectors at the MEF stage is comparable to that of mESC. Increased connectivity for NPC cells is correlated with an increase of LaminB1 (see ref. ^{1}, Fig. 3). Similarly, we reported (Fig. 2a, b) an increase in the number of connectors within TADs D, E, and F in the NPC stage in comparison to the mESC stage, which is associated with TAD compaction. Indeed, the MRG curves (Fig. 2c, left) decrease from the mESC to the NPC stage, indicating a higher chromatin compaction for all TADs. The compaction ratio (compared to the Rouse polymer) showed a high compaction at the NPC stage (Fig. 2c, right) for all TADs, which can be associated with a heterochromatin state, suppressed gene expression, and lamina-associating domains^{45}. The intra-TAD connectivity of TAD E remained stable, as its number of connectors did not change, but its MRG decreased and its compaction ratio increased at the NPC stage. This result shows that an accurate description of chromatin from 5C data by polymer models has to account for inter-TAD connectivity. Indeed, despite having similar intra-TAD connectivity, the reconstructions from 5C and Hi-C data differ in inter-TAD connectivity, which is reflected in a change in the MRG (Fig. 3).
In addition, using Eq. (8), we were able to reproduce the distribution of three-dimensional distances between seven genomic loci, measured by DNA FISH probes (Supplementary Fig. 2). Overall, the RCL polymer captures the correlated reorganization of TADs D, E, and F during differentiation. TAD reorganization is also compatible with transcription co-regulation in these TADs^{1}. Moreover, we found here multiple connected regions with weak inter-TAD connectivity, as suggested by experimental conformation capture data^{1,4,5,12}, that can affect the dynamics of all loci inside a TAD (Fig. 4).
To describe the EP of 5C and Hi-C data, we used theory (Eqs. (3) and (5)) and simulations so that we could capture both TADs and higher-order structures (meta-TADs^{12}) resulting from weak inter-TAD connectivity (Fig. 1a, b). This representation allowed us to clarify how the dynamics of monomers are affected by the local connectivity within and between TADs^{46}. Using the MSD curves (Fig. 1c), we found that the local connectivity (number and positions of connectors) shapes the value of the anomalous exponents, leading to a large spectrum with a mean of 0.4 (Fig. 4). This result explains the large variability in MSD and anomalous exponent behavior reported in SPT experiments^{9,22,38,39,43,47,48,49}. Indeed, the local chromatin organization can vary across a cell population, as cohesin could bind randomly at various places. The value of the anomalous exponent α does not in general depend on the diffusion coefficient D or on b, as is known for various other polymer models, such as Rouse or β-polymers^{14,46}. Instead, we have shown here (Fig. 4) that the anomalous exponent depends crucially on the number and the distribution of connectors (see also Supplementary Fig. 3D). In the mean-field approximation, the mean exponent is 0.5, which does not depend on the polymer scale. Finding the exact relation between the anomalous exponent and the number of connectors for a specific connector configuration (not in the mean-field case) remains challenging, and is relevant for reconstructing the local connector environment from a measured anomalous exponent. To conclude, we propose that measuring the anomalous exponents of loci positioned at different locations inside a TAD could reveal the number of connectors, and thus chromatin condensation, beyond the exact position of these loci.
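As an illustration of how an anomalous exponent can be read off an MSD curve, the following minimal sketch fits MSD(t) ≈ A t^α by linear regression in log-log space. It is not the authors' Matlab code: the function names `msd` and `anomalous_exponent`, the use of a time-averaged MSD, and the choice of lag range are assumptions made for this example, and the test trajectory is free Brownian motion rather than an RCL polymer.

```python
import numpy as np

def msd(trajectory, max_lag):
    """Time-averaged MSD of one trajectory (T x d array) for lags 1..max_lag."""
    lags = np.arange(1, max_lag + 1)
    values = np.array([
        np.mean(np.sum((trajectory[lag:] - trajectory[:-lag]) ** 2, axis=1))
        for lag in lags
    ])
    return lags, values

def anomalous_exponent(lags, msd_values, dt=1.0):
    """Fit MSD(t) ~ A * t**alpha by least squares on log(MSD) vs. log(t)."""
    slope, intercept = np.polyfit(np.log(lags * dt), np.log(msd_values), 1)
    return slope, np.exp(intercept)

# Sanity check on free Brownian motion (hypothetical test data),
# for which the exponent is expected to be close to 1.
rng = np.random.default_rng(0)
path = np.cumsum(rng.normal(size=(20000, 3)), axis=0)
lags, values = msd(path, max_lag=20)
alpha, amplitude = anomalous_exponent(lags, values)
```

Applied to the trajectory of a monomer inside a cross-linked polymer, the same fit over short lags would be expected to return a value of α below 1; the fitted value depends on the lag window, which has to be chosen with care.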
In summary, the present method allows one to reconstruct a polymer model from Hi-C data, to generate numerical simulations, and to estimate the MSD and the anomalous exponents, thereby relating Hi-C to SPT statistics.
We emphasize here that the present approach can be used to describe chromatin with volume exclusion lower than 40 nm (Supplementary Figs. 3 and 4). Furthermore, we obtained estimates for the number of connectors within and between TADs, which were consistent across two replicates of the 5C data (Supplementary Figs. 7 and 8). Comparing the predicted number of connectors between 5C and Hi-C data, we found good agreement in intra-TAD connectivity. However, inter-TAD connectivity was reduced in Hi-C relative to 5C (Fig. 3). We attribute this discrepancy to the loss of inter-TAD interactions, which are smoothed out at the 10 kb resolution of the Hi-C data. Connectors are likely to represent binding molecules such as cohesin, and the present results suggest that only a few of them are needed to condense chromatin and that their exact positions inside a TAD do not necessarily matter.
The interpretation of the number of connectors Nc is not straightforward. This number characterizes the number of connectors at a given scale: in the limit of 1 bp resolution, Nc would equal the endogenous number of linkers, but as the resolution decreases, the number of connectors reported by the Hi-C data should also decrease, because two connectors in the same bin are not counted (Fig. 5a). It is always possible to coarse-grain a polymer model, but reconstructing a refined polymer from a coarse-grained kb scale is an ill-posed problem, because the refined EP at high resolution cannot be inferred from a low-resolution one (yellow and red arrows in Fig. 5b).
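The loss of connectors under coarse-graining can be illustrated with a small sketch. The function `coarse_grain_connectors` and the example pairs are hypothetical, and the counting rule is an assumption made for this example: connectors whose two ends fall in the same bin or in neighboring bins are treated as absorbed into the backbone, and several connectors mapping to the same pair of bins are counted once.

```python
def coarse_grain_connectors(pairs, factor):
    """Map connected monomer pairs to bins of `factor` monomers.

    Returns the set of distinct connected bin pairs that are not nearest
    neighbors at the coarse scale (assumed counting rule, see lead-in).
    """
    coarse = set()
    for m, n in pairs:
        a, b = sorted((m // factor, n // factor))
        if b - a > 1:  # same-bin and adjacent-bin links are not counted
            coarse.add((a, b))
    return coarse

# Six hypothetical connectors at the finest scale.
pairs = [(0, 7), (1, 6), (2, 40), (3, 41), (10, 25), (12, 14)]
fine = coarse_grain_connectors(pairs, factor=1)     # all six survive
coarse = coarse_grain_connectors(pairs, factor=10)  # only bin pair (0, 4) survives
```

The mapping is many-to-one: the single coarse connector between bins 0 and 4 is compatible with many fine-scale configurations, which is one way to see why refining a coarse-grained model is ill-posed.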
To conclude, the presented framework is a tool to systematically reconstruct chromatin structural reorganization from Hi-C matrices, and can be used to interpret chromatin capture data. In particular, using experimental data from proximity ligation experiments (Hi-C, 5C, 4C), we obtained here statistical properties that go beyond conformation capture data. The present analysis could also be applied to reveal the statistics of complex matter at the transition point between liquid and gel^{44,50}, based on heterogeneous random architecture at the level of chromatin sub-organization.
Methods
RCL polymer model for multiple interacting TADs
We now describe the construction of a heterogeneous RCL polymer for multiple interacting TADs. RCL polymer models were previously used in ref. ^{11} and in the Supplementary Methods (subsection Constructing a RCL polymer for a single TAD) to compare the statistical properties of simulations vs. theory for a single TAD. Heterogeneous RCL polymers consist of N_{T} sequentially connected RCL polymers (Fig. 6a, Supplementary Methods subsection Constructing a RCL polymer for a single TAD) that account for connectivity within TADs of \(N = [N_1,N_2, \ldots ,N_{N_T}]\) monomers, respectively (Fig. 6b). The polymer is constructed as follows: the position of the ensemble of monomers R in all TADs is described by N_{T} vectors represented in a block matrix
\[ R = \left[ R^{(1)}, R^{(2)}, \ldots , R^{(N_T)} \right]^T, \]
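where the superscript indicates the TAD 1, …, N_{T} to which each block belongs, and square brackets indicate a block matrix, such that \(R^{(i)} = \left[ {r_1^{(i)}, \ldots ,r_{N_i}^{(i)}} \right]\). The linear backbone of the polymer is a chain of N_{T} Rouse matrices (Supplementary Methods Equation 2), defined in the block matrix
\[ M = \begin{bmatrix} [M^{(1)}] & 0 & \cdots & 0 \\ 0 & [M^{(2)}] & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & [M^{(N_T)}] \end{bmatrix}, \]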
where each [M^{(j)}] contains N_{j} monomers. Note that during the numerical simulations, we connect the last monomer of block [M^{(j)}] to the first one of [M^{(j+1)}] by a spring connector. One of the key ingredients of the present model is the addition of random spring connectors between non-nearest-neighbor (non-NN) monomer pairs. The maximal possible number N_{L} of non-NN connected pairs within TAD A_{i} (respectively, between TADs A_{i} and A_{j}) is given by
\[ N_L(i,j) = \begin{cases} \frac{(N_i - 1)(N_i - 2)}{2}, & i = j, \\ N_i N_j, & i \neq j. \end{cases} \]
We add Nc(i, j) spring connectors randomly between non-NN monomer pairs (Fig. 6b) of TADs A_{i} and A_{j}. We define the square symmetric connectivity fraction matrix \({\mathrm{\Xi }} = \{ \xi _{ij}\}\), 1 ≤ i, j ≤ N_{T}, as the ratio of the number of connectors to the total possible number of non-NN monomer pairs
\[ \xi_{ij} = \frac{Nc(i,j)}{N_L(i,j)}. \]
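The pair counts and connectivity fractions described above can be sketched numerically as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the counting is an assumption derived from the definitions in the text, namely all monomer pairs of a TAD minus its N_i − 1 backbone pairs within a TAD, and N_i N_j pairs between two TADs (the single backbone link joining consecutive TADs in simulations is ignored).

```python
import numpy as np

def max_non_nn_pairs(sizes):
    """Maximal number of non-nearest-neighbor pairs within and between TADs.

    sizes: number of monomers N_i in each TAD (each assumed to be >= 3).
    """
    sizes = np.asarray(sizes)
    n_l = np.outer(sizes, sizes)  # between TADs i != j: N_i * N_j
    # Within TAD i: N_i (N_i - 1) / 2 pairs minus N_i - 1 backbone neighbors.
    n_l[np.diag_indices(len(sizes))] = (sizes - 1) * (sizes - 2) // 2
    return n_l

def connectivity_fraction(num_connectors, sizes):
    """Connectivity fraction: connectors divided by available non-NN pairs."""
    return np.asarray(num_connectors, dtype=float) / max_non_nn_pairs(sizes)

# Hypothetical example: three small TADs and a symmetric connector count matrix.
sizes = [4, 5, 6]
nc = [[3, 2, 0],
      [2, 3, 3],
      [0, 3, 5]]
n_l = max_non_nn_pairs(sizes)
xi = connectivity_fraction(nc, sizes)
```

Inverting the same relation, a fitted connectivity fraction multiplied by the number of available pairs gives an average number of connectors at the chosen resolution.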
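A realization \({\cal{G}}\) of the RCL polymer is a random choice of \(Nc(i,j)\) monomer pairs to connect. We define the added connectivity matrix \(B^{\cal{G}}({\mathrm{\Xi }})\) by
\[ B_{mn}^{\cal{G}}({\mathrm{\Xi }}) = \begin{cases} -1, & m \neq n \text{ and } r_m, r_n \text{ are connected}, \\ -\sum_{k \neq m} B_{mk}^{\cal{G}}({\mathrm{\Xi }}), & m = n, \\ 0, & \text{otherwise}. \end{cases} \]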
The spring constants of the linear backbone and the added Nc connectors are similar. The added springs keep distal connected monomers in close proximity. The energy \(\phi _{\mathrm{\Xi }}^{\cal{G}}\) of realization \({\cal{G}}\) of the RCL polymer is the sum of the spring potential of the linear backbone plus that of the added random connectors:
\[ \phi _{\mathrm{\Xi }}^{\cal{G}}(R) = \frac{\kappa }{2} \mathrm{Tr}\left( R^T \left( M + B^{\cal{G}}({\mathrm{\Xi }}) \right) R \right), \tag{16} \]
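where κ is the spring constant and Tr is the trace operator. The dynamics of the monomers R is induced by the potential energy in Eq. (16) plus thermal fluctuations:
\[ \frac{dR}{dt} = -\frac{dD}{b^2}\left( M + B^{\cal{G}}({\mathrm{\Xi }}) \right) R + \sqrt{2D}\, \frac{d\omega }{dt}, \]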
where the dimension is d = 3, b is the mean distance between neighboring monomers of a linear chain with no added connectors (Nc = 0), D is the diffusion coefficient, and ω are standard d-dimensional Brownian motions with mean 0 and variance 1. We note that the added connectors describe crosslinking by binding molecules such as cohesin or CTCF, and their random positions in each realization \({\cal{G}}\) capture the heterogeneity over the cell population. In addition, after connectors have been added, the mean distance between two consecutive monomers r_{m}, r_{m+1} can be smaller than the initial mean distance b for a Rouse chain (Nc = 0), as computed based on Eqs. (4)–(6).
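The construction and dynamics described in this section can be sketched with a short simulation. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the function names, parameter values, random-walk initial condition, and the Euler–Maruyama time stepping are choices made for this example. The backbone is taken as a single linear chain, which corresponds to joining consecutive TAD blocks by one spring as done in the simulations, and the backbone pair joining two consecutive TADs is excluded from the candidate connector pairs.

```python
import numpy as np

def rouse_matrix(n):
    """Rouse (graph Laplacian) matrix of a linear chain of n monomers."""
    m = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    m[0, 0] = m[-1, -1] = 1.0
    return m

def added_connectivity(sizes, num_connectors, rng):
    """Laplacian of random connectors between non-nearest-neighbor pairs.

    sizes[i] is the number of monomers in TAD i; num_connectors[i][j] is the
    number of connectors to add within (i == j) or between (i < j) TADs.
    """
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    n = offsets[-1]
    added = np.zeros((n, n))
    for i in range(len(sizes)):
        for j in range(i, len(sizes)):
            if i == j:
                rows, cols = np.triu_indices(sizes[i], k=2)  # |m - n| >= 2
            else:
                rows, cols = np.indices((sizes[i], sizes[j])).reshape(2, -1)
                if j == i + 1:  # skip the backbone link joining the two TADs
                    keep = ~((rows == sizes[i] - 1) & (cols == 0))
                    rows, cols = rows[keep], cols[keep]
            chosen = rng.choice(len(rows), size=num_connectors[i][j], replace=False)
            m = rows[chosen] + offsets[i]
            k = cols[chosen] + offsets[j]
            added[m, k] = added[k, m] = -1.0
    added[np.diag_indices(n)] = -added.sum(axis=1)
    return added

def simulate(sizes, num_connectors, steps=1000, dt=0.01, d=3, D=1.0, b=1.0, seed=0):
    """Euler-Maruyama integration of dR = -(d D / b^2) (M + B) R dt + sqrt(2 D) dw."""
    rng = np.random.default_rng(seed)
    n = int(np.sum(sizes))
    lap = rouse_matrix(n) + added_connectivity(sizes, num_connectors, rng)
    # Initial condition: random walk with mean squared bond length b^2.
    r = np.cumsum(rng.normal(scale=b / np.sqrt(d), size=(n, d)), axis=0)
    for _ in range(steps):
        noise = rng.normal(size=(n, d))
        r = r - (d * D / b**2) * (lap @ r) * dt + np.sqrt(2 * D * dt) * noise
    return r, lap

# Hypothetical example: three TADs of 20 monomers each.
sizes = [20, 20, 20]
nc = [[5, 1, 0],
      [1, 5, 1],
      [0, 1, 5]]
positions, laplacian = simulate(sizes, nc, steps=200)
```

Averaging quantities such as the radius of gyration or encounter frequencies over many seeds, each giving a different realization of the connectors, would mimic the heterogeneity over a cell population. The explicit time step has to remain small compared with the fastest relaxation time of the connected chain for the scheme to be stable.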
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
The datasets analyzed during the current study are available in the spatial organisation of the X inactivation center repository (www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE35721) for the 5C encounter matrices; and in the multiscale 3D genome rewiring during mouse neural development repository (www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96107) for the HiC data.
Code availability
Codes for performing simulation and fit in the manuscript were constructed using Matlab 2017b. All codes are available from our repository website: http://bionewmetrics.org/statisticsofchromatinorganizationduringcelldifferentiationrevealedbyheterogeneouscrosslinkedpolymers/.
References
1. Nora, E. P. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385 (2012).
2. Simonis, M. et al. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat. Genet. 38, 1348–1354 (2006).
3. Pope, B. D. et al. Topologically associating domains are stable units of replication-timing regulation. Nature 515, 402–405 (2014).
4. Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Capturing chromosome conformation. Science 295, 1306–1311 (2002).
5. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
6. Szabo, Q. et al. TADs are 3D structural units of higher-order chromosome organization in Drosophila. Sci. Adv. 4, eaar8082 (2018).
7. Tark-Dame, M., Jerabek, H., Manders, E. M. M., Heermann, D. W. & van Driel, R. Depletion of the chromatin looping proteins CTCF and cohesin causes chromatin compaction: insight into chromatin folding by polymer modelling. PLoS Comput. Biol. 10, e1003877 (2014).
8. Giorgetti, L. et al. Predictive polymer modeling reveals coupled fluctuations in chromosome conformation and transcription. Cell 157, 950–963 (2014).
9. Shukron, O. & Holcman, D. Transient chromatin properties revealed by polymer models and stochastic simulations constructed from chromosomal capture data. PLoS Comput. Biol. 13, e1005469 (2017).
10. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
11. Shukron, O. & Holcman, D. Statistics of randomly cross-linked polymer models to interpret chromatin conformation capture data. Phys. Rev. E 96, 012503 (2017).
12. Fraser, J. et al. Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation. Mol. Syst. Biol. 11, 852 (2015).
13. Doi, M. & Edwards, S. F. The Theory of Polymer Dynamics (Clarendon, Oxford, 1986).
14. Amitai, A. & Holcman, D. Polymer model with long-range interactions: analysis and applications to the chromatin structure. Phys. Rev. E 88, 052604 (2013).
15. Bohn, M. & Heermann, D. W. Diffusion-driven looping provides a consistent framework for chromatin organization. PLoS ONE 5, e12218 (2010).
16. Bohn, M., Heermann, D. W. & van Driel, R. Random loop model for long polymers. Phys. Rev. E 76, 051805 (2007).
17. Heermann, D. W. Physical nuclear organization: loops and entropy. Curr. Opin. Cell Biol. 23, 332–337 (2011).
18. Langowski, J. & Heermann, D. W. Computational modeling of the chromatin fiber. In Seminars in Cell & Developmental Biology, Vol. 18, 659–667 (Elsevier, 2007).
19. Sokolov, I. Cyclization of a polymer: first-passage problem for a non-Markovian process. Phys. Rev. Lett. 90, 080601 (2003).
20. Vasquez, P. A. & Bloom, K. Polymer models of interphase chromosomes. Nucleus 5, 376–390 (2014).
21. Vasquez, P. A. et al. Entropy gives rise to topologically associating domains. Nucleic Acids Res. 44, 5540–5549 (2016).
22. Verdaasdonk, J. S. et al. Centromere tethering confines chromosome domains. Mol. Cell 52, 819–831 (2013).
23. Bryngelson, J. D. & Thirumalai, D. Internal constraints induce localization in an isolated polymer molecule. Phys. Rev. Lett. 76, 542 (1996).
24. Jespersen, S., Sokolov, I. M. & Blumen, A. Small-world Rouse networks as models of cross-linked polymers. J. Chem. Phys. 113, 7652–7655 (2000).
25. Jost, D., Carrivain, P., Cavalli, G. & Vaillant, C. Modeling epigenome folding: formation and dynamics of topologically associated chromatin domains. Nucleic Acids Res. 42, 9553–9561 (2014).
26. Michieletto, D., Orlandini, E. & Marenduzzo, D. Polymer model with epigenetic recoloring reveals a pathway for the de novo establishment and 3D organization of chromatin domains. Phys. Rev. X 6, 041047 (2016).
27. Noordermeer, D. & de Laat, W. Joining the loops: β-globin gene regulation. IUBMB Life 60, 824–833 (2008).
28. Fudenberg, G. et al. Formation of chromosomal domains by loop extrusion. Cell Rep. 15, 2038–2049 (2016).
29. Amitai, A. & Holcman, D. Diffusing polymers in confined microdomains and estimation of chromosomal territory sizes from chromosome capture data. Phys. Rev. Lett. 110, 248105 (2013).
30. Cheng, T. M. K. et al. A simple biophysical model emulates budding yeast chromosome condensation. eLife 4, e05565 (2015).
31. Anderson, D. E., Losada, A., Erickson, H. P. & Hirano, T. Condensin and cohesin display different arm conformations with characteristic hinge angles. J. Cell Biol. 156, 419–424 (2002).
32. Rao, S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
33. Weinreb, C. & Raphael, B. J. Identification of hierarchical chromatin domains. Bioinformatics 32, 1601–1609 (2015).
34. Bonev, B. et al. Multiscale 3D genome rewiring during mouse neural development. Cell 171, 557–572 (2017).
35. Amitai, A., Seeber, A., Gasser, S. M. & Holcman, D. Visualization of chromatin decompaction and break site extrusion as predicted by statistical polymer modeling of single-locus trajectories. Cell Rep. 18, 1200–1214 (2017).
36. Hauer, M. H. et al. Histone degradation in response to DNA damage enhances chromatin dynamics and recombination rates. Nat. Struct. Mol. Biol. 24, 99–107 (2017).
37. Javer, A. et al. Persistent super-diffusive motion of Escherichia coli chromosomal loci. Nat. Commun. 5, 3854 (2014).
38. Javer, A. et al. Short-time movement of E. coli chromosomal loci depends on coordinate and subcellular localization. Nat. Commun. 4, 3003 (2013).
39. Kepten, E., Bronshtein, I. & Garini, Y. Improved estimation of anomalous diffusion exponents in single-particle tracking experiments. Phys. Rev. E 87, 052713 (2013).
40. Barbieri, M. et al. Complexity of chromatin folding is captured by the strings and binders switch model. Proc. Natl Acad. Sci. USA 109, 16173–16178 (2012).
41. Hansen, A. S., Pustova, I., Cattoglio, C., Tjian, R. & Darzacq, X. CTCF and cohesin regulate chromatin loop stability with distinct dynamics. eLife 6, e25776 (2017).
42. Amitai, A., Toulouze, M., Dubrana, K. & Holcman, D. Analysis of single locus trajectories for extracting in vivo chromatin tethering interactions. PLoS Comput. Biol. 11, e1004433 (2015).
43. Gasser, S. M. Nuclear architecture: past and future tense. Trends Cell Biol. 26, 473–475 (2016).
44. Hnisz, D., Shrinivas, K., Young, R. A., Chakraborty, A. K. & Sharp, P. A. A phase separation model for transcriptional control. Cell 169, 13–23 (2017).
45. Peric-Hupkes, D. et al. Molecular maps of the reorganization of genome–nuclear lamina interactions during differentiation. Mol. Cell 38, 603–613 (2010).
46. Amitai, A. & Holcman, D. Polymer physics of nuclear organization and function. Phys. Rep. 678, 1–83 (2017).
47. Dion, V. & Gasser, S. M. Chromatin movement in the maintenance of genome stability. Cell 152, 1355–1364 (2013).
48. Weber, S. C., Spakowitz, A. J. & Theriot, J. A. Bacterial chromosomal loci move subdiffusively through a viscoelastic cytoplasm. Phys. Rev. Lett. 104, 238102 (2010).
49. Weber, S. C., Spakowitz, A. J. & Theriot, J. A. Nonthermal ATP-dependent fluctuations contribute to the in vivo motion of chromosomal loci. Proc. Natl Acad. Sci. USA 109, 7338–7343 (2012).
50. Harmon, S. T., Holehouse, A. S., Rosen, M. K. & Pappu, R. V. Intrinsically disordered linkers determine the interplay between phase separation and gelation in multivalent proteins. eLife 6, e30294 (2017).
Acknowledgements
V.P. and D.N. acknowledge funding from the Fondation pour la Recherche Médicale (AJE20140630069) and the Agence Nationale de la Recherche (ANR14ACHN000901 and ANR16TERC002701). D.H.’s research is supported by FMR team (DEQ20160334882).
Author information
Contributions
O.S. and D.H. wrote the manuscript. O.S. and D.H. conceived the research. O.S. designed the code, performed numerical simulations, and mathematical derivations. O.S. and D.H. established the model. D.N. and V.P. performed analysis of the contact matrices and analyzed the data.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information: Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shukron, O., Piras, V., Noordermeer, D. et al. Statistics of chromatin organization during cell differentiation revealed by heterogeneous cross-linked polymers. Nat Commun 10, 2626 (2019). https://doi.org/10.1038/s41467-019-10402-x