Heterogeneous cross-linked polymers to reconstruct chromatin reorganization during cell differentiation

Chromatin of mammalian nucleus folds into discrete contact enriched regions such as Topologically Associating domains (TADs). The folding hierarchy of the TADs and their internal organization is highly dynamic through cellular differentiation, where structural changes within and between TADs are correlated with gene activation and silencing. How these changes in chromatin conformation is correlated with gene regulation can be revealed by polymer models. Here we introduce a heterogeneous randomly cross-linked (RCL) polymer model that accounts for multiple interacting TADs. We study how connectivity within and between TADs affects chromatin organization and dynamics. Closed formula for the steady-state encounter probability within and between TADs indicate how cross-links across TADs affect TAD size and the dynamics of the chromatin. We apply the present polymer model to characterize the degree of compaction and show how secondary structures such as meta-TADs can be detected in Hi-C data. Finally, we apply the present approach to reconstruct high-order TAD organization due to inter-TAD connectivity for three TADs obtained from chromosomal capture data for the mammalian X chromosome during three successive stages of differentiation.


INTRODUCTION
Mammalian chromosome folds into discrete mega-bps (mbp) contact enriched regions termed Topologically associating domain (TADs). Although the precise role of TADs remain unclear, they participate in gene regulation [1, 2] and allows gene to activate synchronously, a concept known as replication timing units [3]. Gene regulation within TAD is modulated by transient loops at sub-TAD scale [1, 4,5], formed by regulatory elements such as enhancers and promoters [6]. However, sparse connectors between TADs (at a scale >mbps) can significantly influence chromatin dynamics and gene regulation by generating bridges within TADs [7,8]. We present here a polymer model that accounts for interactions between TADs to study chromatin dynamic at long scales, an area that remains largely unexplored. Genome organization can be probed by chromosome conformation capture (CC) methods [2, 4,9], which report simultaneous genomic contacts (loops) at scales of kilo bps (kbps) to mbps. TADs appear as matrix block in the contact maps by averaging encounter events over an ensemble of millions of cells [1,9]. CC contact maps provide a statistical summary of steady-state looping frequencies, but do not contain neither direct information about the size of the folded genomic section nor any transient genomic encounter times [10]. Throughout cell differentiation stages, the boundaries between TADs remain stable, [1,11] but their internal looping pattern is highly variable. Moreover, TADs can form hierarchy structures into meta TADs, formed by inter TAD connectivity, correlated with transcription state of the chromatin [11]. Coarse-grained polymer models are used to represent chromatin in steady state and in transient regimes, but also to study the statistical properties of the chromatin at a given scale. Starting with the Rouse polymer [12], composed of N monomers connected sequentially by har-monic springs, the linear connectivity in the Rouse model does not account for the complexity of molecular interactions and therefore cannot account for contact enriched regions such as TADs. To include short and longrange looping, other polymer models have been developed [10,[13][14][15][16][17][18], such as self avoiding interactions, random loops [8,13,19,20] and epigenomic state [21,22] demonstrating that TADs formation results from the intrinsic polymer properties. Here, to account for interactions between multiple TAD, we develop a parsimonious polymer model of the chromatin. We extend the construction of the randomly cross-linked (RCL) polymer model, introduced in [7,10], to account for multiple connected TADs of various sizes and connectivity within and between TADs. RCL can be generated by binding molecules such as CTCF [1] or by loop extrusion mechanism [23], although the exact mechanism by which they form is not the focus of the present work. We randomize the positions of the crosslinks to capture the heterogeneity in chromatin structure in an ensemble of cells. To study the polymer dynamics, we obtain closed formulas for the encounter probability (EP), variance and the radius of gyration of the heterogeneous RCL polymer that reveals the role of parameters. We show that TADs and higher order genome organization can be modulated by external connectivity between TADs. This external connectivity also affect steady-state and time-dependent statistical properties of monomers within each TADs. To demonstrate the impact of the present model, we study the reorganization of three neighboring TADs on the mammalian X chromosome throughout successive stages of cellular differentiation. By computing the average number of connectors within and between TADs from fitting the analytical expression for the EP to the empirical 5C data [1], we evaluate for the first time the rate of compaction and decompaction of TADs, correlated to gene silencing and activation, respectively. To conclude, the present framework allows to study chromatin dynamics from 5C data, to derive its statistical properties and reveal the influence of multiple connected TADs on each other and genome reorganization throughout cell differentiation.

CONSTRUCTING A RCL POLYMER FOR A SINGLE ASSOCIATING DOMAIN (AD)
The RCL polymer for a single AD [10] in dimension d = 3 is composed of N monomers at positions R = [r 1 , r 2 , ..., r N ] T , connected sequentially by harmonic springs [12] and N c spring connectors are added between random non-nearest neighboring (NN) monomer pairs (Fig. 1A) leading to a realization G. The added connectors describe the cross-linking by binding molecules such as CTCF, and their random position captures the heterogeneity over cell population. The energy of the RCL polymer is the sum of the spring potential of the linear backbone and that of added random connectors (called realization G) [15,20] : where κ = dk B T /b 2 is the spring constant, b the standard-deviation of the spring connector between monomers, k B is the Boltzmann's constant and T the temperature. The matrix M is the Rouse matrix for the linear backbone [12], defining NN monomers connectivity and B G (ξ) is a square symmetric connectivity matrix, representing connections between N c randomly chosen non NN monomer pairs, and 0 ≤ ξ ≤ 1 is the fraction of maximal possible non NN monomer pairs, and related to N c by We define the added connectivity matrix as For each polymer realization, we randomize the choice of N c monomer pairs to connect out of the possible (N − 1)(N − 2)/2 pairs, i.e., we change the matrix B G (ξ). The dynamics of the resulting monomers (vector R) is driven by Brownian motion and the field of forces derived from the energy 1, where D = kB T ζ is the diffusion constant, ζ is the friction coefficient, ω are standard d-dimensional Brownian motions with mean 0 and variance 1. To study the statistical properties of 4, a mean-field approach [10] is used, where we replaced B G (ξ) by its average ⟨B G (ξ)⟩ (average over all realization G for a given ξ [10,15]). To recover the connectivity fraction ξ and the mean number of crosslinks (N c ), we have used the formula for the steady-state variance, the EP between monomers m, n of a single AD [10], and the expression for the EP fitted to empirical 5C data of a single TAD. We shall now explain how this procedure can be extended to many TADs, so that the entire properties of the chromatin can be accounted for.

RCL POLYMER MODEL FOR MULTIPLE INTERACTING ADS
To construct multiple interacting TADs, we extend equations 1-4 to include N T > 1 ADs of variable sizes, inter and intra AD-connectivity. The heterogeneous polymer consists of connecting sequentially N T RCL polymers of N = [N 1 , N 2 , . . . , N NT ] monomers, respectively (Fig. 1B). The position of monomers R consists of N T vectors in the block matrix form where the superscript indicates the AD 1, . . . , N T it belongs to and square brackets indicate a block matrix. The polymer's linear backbone is a chain of N T Rouse matrices, defined in the matrix block form by where each [M (j) ] is a Rouse matrix of the form 2 with N j monomers. The square-symmetric connectivity fraction matrix Ξ = {ξ ij }, 1 ≤ i, j ≤ N T accounts for connectors within and between ADs, and the mean-field average added block connectivity matrix is ⟨B G (Ξ)⟩ ( Fig 1C) given by where and ∂A i are the indices of the boundaries of A i . The derivation of matrix ⟨B G (Ξ)⟩ is described in the appendix. The energy of the heterogeneous RCL polymer is computed from Eq. 1, with the connectivity fraction Ξ, the block matrix 6, and average random connectivity matrix ⟨B G (Ξ)⟩. The mean-field stochastic equations describing the dynamics of monomers R is where ω are Brownian motions with mean 0 and standard-deviation 1.

Statistical properties of heterogeneous RCL and numerical validation
We summarize here the main statistical properties of the heterogeneous RCL representing TADs. We first present the mean square radius of gyration (MSRG), second the encounter probability (EP) of the heterogeneous RCL polymer in two cases: encounters between monomers of the same AD (intra AD) and monomers of different ADs (inter-AD). Third, we discuss the mean square displacement of monomers of the heterogeneous RCL polymer in each AD. The computations are detailed in the Appendix. The mean square radius of gyration (MSRG) characterizes the folding of each AD inside a ball of radius R g . When the condition of dominant intra-AD connectivity (assumption H1, appendix 36) is satisfied, we obtain the MSRG for A i (see Appendix) where and Second, under the condition of non-vanishing connectivities (H n , see condition 46), the EP between monomer m and n inside A i is given by where and ⟨R 2 g ⟩ (i) is the MSRG of A i , defined in relation 10. When each monomer belongs to two distinct ADs i and j (i ̸ = j), the EP formula is modified to (see Appendix) where Finally, the Mean-Square Displacement of a monomer of the averaged heterogeneous RCL polymer inside A i for intermediate times (see appendix) is given by and Erf is the classical Error function. We now investigate the steady-state EP within ADs (Eq. 13) and between ADs (Eq. 15) by comparing Brownian simulations and formulas: for that purpose, we constructed a RCL polymer containing three ADs with N 1 = 50, N 2 = 40, N 3 = 60 total monomers, so that condition H1 (appendix Eq. 36) about dominant intraconnectivity is satisfied. We impose the number of connectors in each AD to be at least twice compared to the one between ADs (see Fig. 2A).
To construct the encounter frequency matrix, we use equation 9 in dimension d = 3, with b = √ 3µm and diffusion coefficient D = 1µm 2 /s starting with a random walk initial polymer configuration. The connectors are placed uniformly in each A i and in between ADs according to the numbers indicated in Fig. 2A. We ran 5000 simulations until polymer relaxation time, defined as ) , This is the longest relaxation time of RCL chains among N T [10] ADs of the order of tens of thousands of steps.
We then constructed the simulation encounter frequency matrix, where the encounter distance is ϵ = b/10. The encounter frequency matrix shows three distinct diagonal blocks ( Fig. 2A) resulting from high intra AD connectivity, and further reveals a high-order organization (cyan blue in Fig. 2A), which resembles the meta-ADs discussed in [11]. We thus propose here that hierarchical AD organization is a consequence of weak inter connectivity properties. Second, we computed using numerical simulations, the steady-state EP (Eqs. 13 and 15) from the encounter frequency matrix ( Fig. 2A) by dividing each row with its sum. We then compared the results of the simulations with the theoretical EP (Eqs. 13 and 15) in Fig.  2B: we show three sample curves for monomer 20 (upper left), 70 (upper right) and 120 (lower) located in the middle of each AD, which are in good agreement with the theory. Finally, the MSRG ⟨R 2 g ⟩ (i) for A 1 , A 2 and A 3 is 2.42, 1.5, 2.15µm (simulations), compared to 2.38, 1.31, 2.09, respectively, obtained from expression 10, which are in good agreement. To validate the MSD expression (Eq. 17), we simulated eq. 9 for 500 steps with a time step ∆t = 0.05s, past the relaxation time τ (Ξ) (Eq. 18) and computed the MSD for each monomers in each AD i = 1, 2, 3. In Fig. 2C we plotted the mean MSD in each AD against the theoretical expression Eq. 17 (dashed), which are in good agreement. The overshoot of the MSD of A 1 , results from the fact that center of masses of ADs remain weakly coupled (see Eq. 35). The height of the MSD curve is inversely proportional to the total connectivity of each AD, with A 1 (Fig. 2C, orange, 26 connectors), A 2 (blue, 46 connectors) and A 3 (red, 37 connectors). We conclude that various statistical properties can be extracted from the present heterogenous RLC polymer model that we shall use now to reconstruct chromatin in different phases of nucleus cell differentiation.

Genome reorganization with multiple TADs from the 5C data reconstructed from the heterogeneous RCL polymer during cellular differentiation
We now show how the heterogeneous RCL model can be used to reveal the chromatin properties based on 5C data of the X chromosome. We focus on the chromatin reorganization during three successive stages of differ- We used the average of two replica of a subset of the 5C data generated in [1], harboring 3 topologically associating domains (TADs): TAD D, E, and F, which span a genomic section of about 1.9 Mbp. We coarse-grained the 5C encounter frequency data at a scale of 6 kb, which is twice the median length of the restriction segment of the HindII enzyme used in producing the 5C data [7,8]. At this scale we empirically found that long-range persistent peaks of the 5C encounter data are smoothed out sufficiently to enable us to use expressions 13 and 15 for fitting the 5C EP. The resulting coarse-grained encounter frequency matrix includes pair-wise encounter data of 302 equally-sized genomic segments. To determine the posi-tion of TAD boundaries, we mapped the TAD boundaries reported in bp (see [1] Supplementary Information, 'Analysis of 5C data, section) to genomic segments after coarse-graining. We constructed a heterogeneous RCL polymer with N D = 62, N E = 88, N F = 152 monomers for TAD D, E and F, respectively. To compute the minimum number of connectors within and between TADs, we fitted the EP of each monomer in the coarse-grained empirical EP matrix using formulas 13 and 15. In Fig.  3A, we present the fitted EP matrices for MESC (left) NPC (middle) and MEF (right). We computed the average number of connectors N c , within and between each TAD by averaging the connectivity values obtained from the fitting the EP over all monomers to obtain the connectivity matrix Ξ and used relations 19 and 20. The mean number of connectors in the differentiation from MESC to NPC showed an increase by 145-200% (Fig. 3B) within and between TADs. The number of connectors within TAD F increases by 145% from 22 for MESC to 32 in NPC, the inter-connectivity between TAD F and E increases by 150% from 9 at MESC to 14 for NPC and the connectivity between TAD F and D doubled from 6 connectors for MESC to 12 for NPC, whereas the number of connectors within TAD E remained constant of 14. In the differentiation stage between NPC to MEF the number of connectors within TADs D, E and F returned to values comparable to MESC, whereas the inter-TAD connectivity between TAD F and E decreased from 14 for NPC to 9 for MEF. To evaluate the size of the folded TADs, we computed the mean square radius of gyration of the three TADs throughout differentiation stages. We found that the MSRG increases and decreases in correlation with the acquisition and lose of connectors in all TADs. We Computed the MSRG (Eq. 10) for each TAD using the calibrated connectivity matrix Ξ, obtained from fitting the experimental 5C EP (Fig. 3C, left) in the units of b 2 . The MSRG of all TADs decreased from average of 2 b 2 µm at MESC stage to 1.7 b 2 µm for NPC and increased back to 2 b 2 µm for MEF cells. In the transition from MESC to NPC, the MSRG of TAD D surpassed that of TAD E (Fig. 3C, red squares) due to the increased in inter-TAD connectivity between TAD E and F at NPC stage.

DISCUSSION
In this work, we constructed a general RCL model [7,10] for multiple interacting ADs of variable size, intra and inter-AD connectivities. We provide a framework for studying a complete representation of conformation capture (3C,5C, and Hi-C) data, where we account for both intra and inter-TAD connectivity. The RCL polymer model allows estimating the average number of crosslinks within and between each TAD, length-scales such as the mean square radius of gyration to characterize the size of the folded AD, and the mean square displacement in multiple ADs. These quantities cannot be derived from the empirical conformation capture data and requires complementary method, such as single particle tracking [24,25]. The present analysis reveals also the statistics of complex matter at the transition point between liquid and gel [26,27] and also how transcriptional bursting dynamics is modulated by the random architecture of chromatin sub-organization. To validate the heterogeneous RCL model, we reconstructed multiple TAD reorganization during three successive stages of cell differentiation: mouse embryonic stem cell (ESC), neuronal precursor cells (NPC) and mouse embryonic fibroblasts (MEF). Indeed, we fitted expressions 13 and 15 to the empirical 5C encounter ma-trices of three differentiation stages (Fig. 3A) and showed that the RCL model produced contact enriched TADs with a variability in monomer connectivity within each TAD (Fig. 3A, super and sub diagonals of EP matrices). We use the average connectivity we obtained from the fitting procedure to derive the average number of connectors between and within TADs (Fig. 3B). In the coarse-graining scale of 6 kb, our results show that the X chromosome acquires connectors within and between TADs D E and F [1] in the transition between MESC to NPC and later returns to values comparable to those of MESC at MEF stage. The increasing connectivity for NPC cells is correlated with the acquisition of LaminB1 (see [1], Figure 3). The increase in the number of connectors in each TAD at NPC stage correlated with TAD compaction. Indeed, the MSRG curves (Fig. 3C left) decrease at NPC stage, indicating a higher chromatin compaction for all TADs. The compaction ratio (in relation to linear Rouse polymer) showed high compaction at NPC stages (Fig. 3C, right) for all TADs, which can be associated with heterochromatin state, suppressed gene expression and lamina associating domains [28]. Inter-TAD connectivity is quite stable as the number of connectors for TAD E remains constant at 14 however, but the MSRG and compaction ratio decreased at NPC stage. This result indicates that the a full description of the 5C data by polymer models must take into account inter-TAD connectivity. Overall, the RCL polymer captures the correlated reorganization of TAD D, E and F during differentiation. TAD reorganization we found is also compatible with transcription co-regulation in these TADs [1]. Moreover, our construction of the RCL polymer accounts for multiple connected regions with weak inter-TAD connectivity, as suggested by experimental CC data [1,4,5,11]. Despite such weak inter-AD connectivity, multiple ADs can affect a locus dynamics inside a TAD. The expression we obtained for the steady-state encounter probability (Eqs. 13 and 15) can be used to describe the 5C encounter probability matrices. We validated these expressions by Brownian simulations and showed that we can capture both TADs and higher order structures (metaTADs [11]) resulting from inter-TAD connectivity ( Fig. 2A and B). We further demonstrated how the dynamics of monomers is affected by the local connectivity between and within ADs, using the MSD curves (Fig. 2C). This result explains the large variability in MSD behavior seen experimentally in single particle tracking experiments [7,24,29] based on the variability in local chromatin organization in cell population, represented here by the random connector localisation. To conclude, the present framework is new tool to reconstruct systematically chromatin structural reorganization from HiC matrices and can be used to interpret chromatin capture data. In particular, using any experimental data of ligation proximity experiments (Hi-C, 5C 4C), the model reveals statistical properties beyond CC data.

APPENDIX From HiC statistical maps to the average connectivity matrix ⟨B G (Ξ)⟩
To construct ⟨B G (Ξ)⟩, we first define the indices b i that represent the boundary between successive blocks i and i + 1, 1 ≤ i ≤ N T (Fig. 4A): and represent the bottom right corner of each block of ⟨B G (ξ)⟩ (Fig. 4A) where N L is the maximal possible number of non nearest neighbor (NN) connected pairs in block [T ij ], given by We make the choice that the number N c (ξ ij ) connected pairs is independent between blocks [T ij ], (i ̸ = j), and all monomer pairs are equally likely to be chosen within each block. Due to symmetry, we shall only consider choices of monomer pairs in the upper triangular part of ⟨B G (ξ)⟩. The probability,P r ii (k), of monomer b i < m < b i+1 to have k ≥ 0 connectors in [T ii ], is computed by choosing k positions in row m and the remaining N L − k connectors in any row and column n ̸ = m, leading to the hypergeometric probability and for off-diagonal blocks [T ij ], j > i where Here we consider that the mean number of non NN connectors β m (ξ) for monomer b i ≤ m ≤ b i+1 is the sum of the mean number of connectors in each block The matrix ⟨B G (Ξ)⟩ can then be defined for each block [T ij ] by where 1 ij is a N i × N j matrix of ones, M (i) is a Rouse matrix of N = N i monomers (Eq. 2), and I i is an N i ×N i identity matrix.

Diagonalization of the average connectivity matrix
To study the steady-state properties of equation 9, we decouple the system 9 into independent coordinates by diagonalizing it in the normal coordinates where is a diagonal block matrix where each block [V (i) ] is a N i × N i matrix of the eigenvectors of a Rouse polymer of N i monomers [12], given by We multiply 9 from the left by 27 to obtain the system of equations in the normal form where η = V ω are Brownian motion with mean 0 and standard-deviation 1, and is a N T × N T block diagonal matrix of the eigenvalues of Rouse chains [12] of N 1 , N 2 , . . . , N T monomers defined by where 2) and therefore the multiplication V ⟨B(Ξ)⟩V T in the right-hand side of 29, can be carried out for each block separately to obtain where Gij is a Ni × Nj matrix of zeros with 1 in its the top left cell. The stochastic equations that describes the dynamic of the normal coordinate u and the centers of masses u where η (i) m are d-dimensional Gaussian noise with mean 0 and standard-deviation √ 2D. In the following analysis, we consider the case of a dominant intra-AD connectivity, i.e which is well verified for the properties TADs of CC maps [1, 4,5].

Mean square radius of gyration for each AD
The mean square radius of gyration (MSRG) characterizes the folded size of each AD. We derive here an expression for the distribution of the square radius of gyration of each AD under the assumption of dominant intra-AD connectivity 36. We start with the classical formula for the distribution of the square radius of gyration P (R 2 g ) is given by [30] where ic is the complex unit, [Γ (i) ] is a (Ni − 1) × (Ni − 1) diagonal matrix of eigenvalues for the block connectivity matrix of AD i excluding the first vanishing one, and det is the determinant operator. We evaluate now the determinant in integral 37 [30] det where T r is the trace operator. We truncate the series in 38 at order p = 2 and substitute the resulting expressing in 37 to obtain Under the present approximation, the distribution of the square radius of gyration of each AD i is Gaussian with mean and variance given respectively by Under the assumption of dominant intra-AD connectivity, we approximate [Γ where Substituting 44 and 12 into 42 and then into 40, we obtain the expression for the MSRG of AD i . (45)

Encounter probability of monomers of the heterogeneous RCL polymer
To compute the encounter probability (EP) in the heterogeneous RCL polymer, we distinguish two cases: encounters between monomers of the same AD (intra AD), and monomers of different ADs (inter AD). We now derive formulas for the EP in these two cases under a non-vanishing connectivity assumption defined by (Hn) ξij > 0, 1 ≤ i, j ≤ NT . (46)

Intra-AD encounter probabilities
For monomers 1 ≤ m, n ≤ Ni of AD i, expressing the position in the normal coordinates 26, we get We recall that the variance the Ornstein-Uhlenbeck system 34 is where ⟨u The steady-state limit of 48 gives Substituting 49 in 47, we obtain (50) For Ni ≫ 1, we approximate the sum 50 using the Euler-MacLaurin formula, making a change of variable x = pπ N i , dx = N i π dp to obtain where y (i) (Ξ) is defined in expression 43. The integral in 51 is computed in the complex plane along the contour of the unit disk [10] using the residue theorem and we get where ⟨R 2 g ⟩ (i) is the MSRG of Ai defined by expression 45. Because the heterogeneous RCL model belongs to the class of generalized Gaussian models [31],the EP between monomer m and n of AD i is given by

Inter-AD encounter probabilities
We now compute the EP between monomers of Ai and Aj (i ̸ = j). We start by computing the variance of the vector between r (i) m of Ai and r (j) n of Aj, using the center of masses r The variance of the vector between a monomer m of Ai and its center of mass in normal coordinates (Eq. 26) at steady-state is Similarly for Aj, we obtain ⟨(r (j) n − r (j) cm ) 2 ⟩ = ⟨R 2 g ⟩ (j) (1 + ζ (j) 0 (Ξ) 1−2n )).
Under the assumption of weak inter-AD connectivity (Assumption H1, Eq. 36), the center of masses of ADs are independent. Thus the variance of the vectors between center of masses of Ai and Aj is from the Ornstein-Uhlenbeck equation 35 at steady-state, the variance ⟨u Substituting 58 for Ai and Aj into 57, we obtain Substituting expressions 55, 56 and 59 into 54, we obtain the final expression for the inter-AD variance The EP between monomer m of AD i and n of AD j is

Mean-Square Displacement of monomers of the heterogeneous RCL polymer
We now study the dynamical properties using the mean square displacement (MSD) of monomers in Ai. The computational method is similar to [10], where here we replace the mean number of connectors of monomers N ξ of a single AD by the sum ∑ N T k=1 N k ξ ik , which is the mean number of connectors of monomers in Ai when there are multiple ADs.
where Dcm = D ∑N T k=1 N k . Expression 63 can be further approximated as Expression 64 scales as √ t similar to Rouse. This effect is due to the averaging over all realization. We also highlight how the MSD curve depends on the connectivity between Ai and all other ADs.