The Rabl configuration limits topological entanglement of chromosomes in budding yeast

The three dimensional organization of genomes remains mostly unknown due to their high degree of condensation. Biophysical studies predict that condensation promotes the topological entanglement of chromatin fibers and the inhibition of function. How organisms balance between functionally active genomes and a high degree of condensation remains to be determined. Here we hypothesize that the Rabl configuration, characterized by the attachment of centromeres and telomeres to the nuclear envelope, helps to reduce the topological entanglement of chromosomes. To test this hypothesis we developed a novel method to quantify chromosome entanglement complexity in 3D reconstructions obtained from Chromosome Conformation Capture (CCC) data. Applying this method to published data of the yeast genome, we show that computational models implementing the attachment of telomeres or centromeres alone are not sufficient to obtain the reduced entanglement complexity observed in 3D reconstructions. It is only when the centromeres and telomeres are attached to the nuclear envelope (i.e. the Rabl configuration) that the complexity of entanglement of the genome is comparable to that of the 3D reconstructions. We therefore suggest that the Rabl configuration is an essential player in the simplification of the entanglement of chromatin fibers.

Here we hypothesized that the Rabl configuration significantly reduces the incidence of chromosome entanglements during interphase. To test this hypothesis we turned to budding yeast. The three dimensional organization of the budding yeast genome is much simpler than that of the human genome, its chromosome territory structure is much less pronounced, and its overall organization is believed to be closer to equilibrium than that of the mammalian genome 56,57 . On the other hand, the budding yeast genome preserves some of the basic packaging features of higher organisms 21,53,58 and presents very strong evidence for the Rabl configuration 49,50,59,60 . We used previously published 3D reconstructions obtained from chromosome conformation capture (CCC) data which have successfully illuminated many aspects of 3D chromosomal architecture in yeast including chromatin structure, functional compartmentalization and DNA repair [61][62][63] .
Quantification of the entanglement complexity between two chains is a well-defined mathematical problem when the chains are circular; in this case entanglement complexity is commonly quantified by topological invariants, including the linking number 64 . Since chromosomes are open chains (i.e. non-circular), new mathematical tools are required for quantifying their entanglement [65][66][67] . Here we propose a method inspired by advances in knot identification in proteins 68 . There, the knot type of the protein backbone is identified by producing ensembles of closed circular chains associated to a single protein backbone conformation. The topology of the protein is probabilistic in nature and determined by the proportion of different knots observed in the associated ensembles. We extend this approach by examining the entanglement of pairs of open chains, and by introducing the linking proportion for two open chains as a measure of their entanglement complexity. The distribution of the linking proportions associated to each pair of chromosomes in a CCC reconstruction quantifies the entanglement complexity of the reconstruction and provides a measure to compare experimentally derived reconstructions against each other and against theoretical models.
To quantify the entanglement complexity of the yeast genome, we compared the distributions of linking proportions associated with CCC reconstructions to the distributions of linking proportions associated with randomly embedded semiflexible (wormlike) chains in confinement. We found that the entanglement complexity of all reconstructions was lower than the entanglement complexity predicted by the wormlike chain model. This finding validates our approach and provides further evidence that the yeast genome is not randomly organized in the interphase nucleus 52,[69][70][71] . We also show that the entanglement complexity of a random organization of simulated yeast chromosomes can be significantly reduced by attachments of centromeres or telomeres to the nuclear envelope, although these attachments alone cannot fully account for the low entanglement complexity observed in the 3D reconstructions of the yeast genome. The absence of entanglement observed in the reconstructions is achieved only with the implementation of the Rabl configuration. We therefore suggest that the Rabl configuration is a key organizational feature that prevents the yeast chromosomes from becoming entangled.

Data, Models and Methods
Data. In previous work we generated a set of eleven 3D reconstructions of the yeast genome 72 using data published by Duan and colleagues 62 . Reconstructions in Figure 1 were obtained using those published data. These reconstructions were the product of two consecutive restriction reactions with different enzymes, different false discovery rate (FDR) thresholds 73 at the time of significance assessment of contacts, and/or different physical distances. Reconstruction 1 was the reconstruction reported in the publication by Duan and colleagues; it was obtained from contact maps using HindIII followed by the combination of MseI and MspI (denoted by MseI ∪ MspI) restriction assays and an FDR threshold of 0.01. Reconstructions 2-4 were obtained by changing the FDR threshold (0.01, 0.1 and 1.0). Reconstructions 5, 6, 7 and 8 used restriction enzyme HindIII followed by MseI (Reconstruction 5), MspI (Reconstruction 6) and MseI ∪ MspI (Reconstructions 7 and 8). Additionally, Reconstructions 7 and 8 used recomputed physical distances. Reconstruction 9 used the common fragments obtained from EcoRI and HindIII (denoted by EcoRI ∩ HindIII) followed by MseI ∪ MspI; Reconstruction 10 used EcoRI followed by MseI ∩ MspI; and Reconstructions 11 and 12 used EcoRI followed by MseI and MspI, respectively 72 . Distance units in the reconstructions reported here coincide with those used in 72 and are proportional to the experimental measurements 62 .

Quantification of the entanglement complexity of the yeast genome and classification of 3D reconstructions.
To determine the topological complexity of the yeast genome we defined a new geometrical invariant that extends the concept of linking number to open curves. We use the single stochastic closure method 68,74 to associate an ensemble of closed trajectories to each chromosome reconstruction. For this purpose, we define a sphere S with its center coinciding with the center of mass of the original reconstruction (all 16 chromosomes), and radius R > r, where r is the radius of the smallest sphere containing the reconstruction (e.g. r ≈ 110 and R = 150). We then define a circular trajectory for each chromosome by tracing two rays connecting the telomeres of each chromosome reconstruction with a point P chosen at random on the surface of the sphere S (Fig. 2a). We obtained an ensemble of closed trajectories associated to the chromosome by repeating this process for P_i, i = 1, …, N. In the results reported below we used N = 10^3. To validate our method, we recomputed the linking proportion for different values of R, P and N and obtained consistent results. We define the linking proportion PLk(l_1, l_2) for the reconstruction of two chromosomes l_1 and l_2 in a given reconstruction or simulation as the proportion of pairs of closures (C_i^1, C_j^2) for which I(C_i^1, C_j^2) ≠ 0; that is, a pair of closures contributes 1 if I(C_i^1, C_j^2) ≠ 0 and 0 otherwise, where I is a topological invariant, which in our study is the linking number, of two circular trajectories C_i^1 and C_j^2 associated, through the single stochastic closure method, to the linear trajectories l_1 and l_2. We calculated the linking number using the double Gaussian integral form 75 and compiled the linking proportion results in a lower triangular matrix, as illustrated in Table 1. Using the Kolmogorov-Smirnov (KS) test 76 we determined differences between reconstructions by testing whether the entries in the tables in their vectorized form 77 were derived from the same distribution (Section 2.3).
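The closure procedure and the linking proportion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names (`linking_number`, `sample_sphere`, `linking_proportion`) are hypothetical, the linking number is computed by counting signed crossings in a generic planar projection rather than by the double Gaussian integral (both give the same integer for closed polygons in general position), and each of the N trials pairs one independently drawn closure point per chain.

```python
import numpy as np

def linking_number(c1, c2, rng=None):
    """Linking number of two closed polygons given as (n, 3) vertex arrays.

    Swapped-in technique: signed crossing count in a generic projection,
    used here instead of the Gauss double integral.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Random proper rotation so the projection onto the xy-plane is generic.
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0
    a0, b0 = np.asarray(c1, float) @ q, np.asarray(c2, float) @ q
    a1, b1 = np.roll(a0, -1, axis=0), np.roll(b0, -1, axis=0)
    da = (a1 - a0)[:, None, :]           # edges of curve 1, shape (n, 1, 3)
    db = (b1 - b0)[None, :, :]           # edges of curve 2, shape (1, m, 3)
    w = b0[None, :, :] - a0[:, None, :]  # offsets between edge start points
    with np.errstate(divide="ignore", invalid="ignore"):
        cross = da[..., 0] * db[..., 1] - da[..., 1] * db[..., 0]
        # Solve a0 + s*da = b0 + t*db in the projection plane.
        s = (w[..., 0] * db[..., 1] - w[..., 1] * db[..., 0]) / cross
        t = (w[..., 0] * da[..., 1] - w[..., 1] * da[..., 0]) / cross
        hit = (cross != 0) & (s > 0) & (s < 1) & (t > 0) & (t < 1)
        za = a0[:, None, 2] + s * da[..., 2]  # height of curve 1 at crossing
        zb = b0[None, :, 2] + t * db[..., 2]  # height of curve 2 at crossing
        signs = np.sign(cross) * np.sign(za - zb)
    return int(round(0.5 * signs[hit].sum()))

def sample_sphere(n, radius, center, rng):
    """Draw n points uniformly on the sphere of given radius and center."""
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return center + radius * v

def linking_proportion(l1, l2, radius, center, n=1000, rng=None):
    """Fraction of random closures of open chains l1, l2 with Lk != 0."""
    rng = np.random.default_rng() if rng is None else rng
    p = sample_sphere(n, radius, center, rng)
    q = sample_sphere(n, radius, center, rng)
    linked = 0
    for pi, qi in zip(p, q):
        # Appending the sphere point closes the chain through both telomeres.
        c1, c2 = np.vstack([l1, pi]), np.vstack([l2, qi])
        linked += linking_number(c1, c2, rng) != 0
    return linked / n
```

With the example values quoted in the text, a call would look like `linking_proportion(l1, l2, radius=150.0, center=com, n=1000)`, where `l1`, `l2` are vertex arrays of two chromosome reconstructions and `com` is the center of mass of the whole reconstruction.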
Differences between reconstructions and simulated configurations were estimated using the Wilcoxon test (Section 2.5). In both statistical tests we assumed samples were independent and identically distributed. All p-values were corrected using FDR.
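As an illustration of the p-value correction step, the sketch below implements the Benjamini-Hochberg procedure, a common way of controlling the FDR. That this specific procedure was the one applied is an assumption, since the text only states that p-values were corrected using FDR; the function name `benjamini_hochberg` is hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by m / k.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out
```

A test is then declared significant at level α when its adjusted p-value is at most α.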
Table 1. … (Fig. 1). The entries reported are given in %. The entry in row i and column j corresponds to PLk(l_i, l_j), the linking proportion of the linear chromosomes l_i and l_j. Entries highlighted in red indicate a linking proportion greater than 50%.
Figure 1. … Panel (1) shows the reconstruction using data from 62 and Panel (2) shows a replicate of Panel (1) obtained using the same input parameters. Panels (2-12): reconstructions obtained in 72 using different experimental conditions and reconstruction parameters. All reconstructions are consistent with CCC data.
Simulation of Yeast Genomes. We modeled yeast chromosomes as a discrete approximation of a semiflexible chain with no torsional constraint, a configuration known as the wormlike chain 78 . Each chromosome C consisted of n_C segments {e_1, …, e_{n_C}} of equal length l and a bending energy that depends on k and on the angles θ_i, where k is the bending rigidity constant of the 10 nm fiber and θ_i is the exterior angle between edges e_i and e_{i+1}. Given the experimentally estimated value for the persistence length of the yeast genome, L_p = 197 ± 62 nm 52 , we calculated the value of the bending rigidity constant k and the corresponding number of discrete segments necessary to represent each Kuhn length 78 . Each chromosome realization was obtained by applying simulated annealing to a freely jointed chain that was gradually confined inside a sphere of fixed radius (i.e. the cell nucleus) and simultaneously minimized with respect to the energy of the wormlike chain. To generate Rabl configurations the same annealing algorithm was used with the additional conditions that centromeres were clustered and telomeres were tethered to variable locations near their experimentally measured locations 79,80 . The energy potential that binds telomeres and centromeres to a specific location uses the L1 norm, which allows for more movement at higher temperatures yet more restricted movement at lower temperatures, and a smoother transition during the cooling schedule (see SI for more details).
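To make the role of k and θ_i concrete, the sketch below evaluates a bending energy for a polygonal chain. The functional form k Σ (1 − cos θ_i) is one common discretization of the wormlike chain and is an assumption here; the exact expression used in the study is not reproduced. The function names are hypothetical. The Kuhn length of a wormlike chain is twice its persistence length, which is the only relation used in `kuhn_length`.

```python
import numpy as np

def bending_energy(vertices, k):
    """Assumed discrete wormlike-chain energy: k * sum_i (1 - cos(theta_i)).

    theta_i is the exterior angle between consecutive edges e_i and e_{i+1}.
    """
    edges = np.diff(np.asarray(vertices, dtype=float), axis=0)
    unit = edges / np.linalg.norm(edges, axis=1, keepdims=True)
    # Cosine of the exterior angle at each interior vertex.
    cos_theta = np.einsum("ij,ij->i", unit[:-1], unit[1:])
    return k * np.sum(1.0 - cos_theta)

def kuhn_length(persistence_length):
    """Kuhn length of a wormlike chain (twice the persistence length)."""
    return 2.0 * persistence_length
```

For the persistence length quoted above, `kuhn_length(197.0)` gives 394 nm. A straight chain has zero bending energy under this form, and each right-angle bend contributes k.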

Generation of ensembles of open chains from closed chains.
To test whether our algorithm distinguishes conformations that are in close proximity and entangled from those that are in close proximity but untangled (Section 2.2), we associated ensembles of open chains to pairs of closed chains. We generated statistically independent ensembles of closed freely jointed chains of fixed length with their centers of mass separated by a fixed distance. For a pair of closed chains C_1 and C_2, we randomly selected a segment in each of the closed chains (s_1 and s_2), removed four segments consecutive to each of the selected segments s_i, i = 1, 2, and applied the stochastic closure algorithm to compute their linking proportion. This process was repeated for different s_i. Proportions were averaged over all sets of open chains and used as a measure of entanglement of C_1 and C_2.
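The step that opens a closed chain can be sketched as follows. The indexing convention is an assumption made for illustration: edge i joins vertices i and i+1 (cyclically), the selected segment is edge `start`, and the removed segments are the `n_removed` edges that follow it. The function name `open_chain_from_closed` is hypothetical.

```python
import numpy as np

def open_chain_from_closed(closed, start, n_removed=4):
    """Open a closed polygon by deleting the n_removed edges after edge `start`.

    `closed` is an (n, 3) array of vertices; the returned open chain runs
    from the far end of the deleted stretch around to vertex start + 1.
    """
    closed = np.asarray(closed, dtype=float)
    n = len(closed)
    # Rotate so the first surviving vertex after the gap comes first.
    rolled = np.roll(closed, -(start + 1 + n_removed), axis=0)
    # n - n_removed edges survive, i.e. n - n_removed + 1 vertices.
    return rolled[: n - n_removed + 1]
```

Repeating this for several random values of `start` in each closed chain, and averaging the linking proportions of the resulting open pairs, gives the entanglement measure described above.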

Results
The circularization algorithm quantifies the entanglement complexity of 3D reconstructions.
We used the linking proportion between pairs of chromosomes (i.e. open chains l_1 and l_2) as a measure of entanglement complexity. Although this geometrical invariant changes with chromosome length and with the distance between the centers of mass of the chromosomes, it detects entanglement between two open curves and it is robust with respect to the noise inherent to CCC data, a problem that has previously obscured the geometrical interpretation of CCC data 12,23,24,27,33 . We illustrate the properties of this algorithm with some examples. Figure 2 shows 3D reconstructions corresponding to some of the entries in Table 1 (Fig. 2a-c), Supplementary Table 9 (Fig. 2e) and Supplementary Table 10 (Fig. 2f). Tables for all other reconstructions can be found in the supplementary material. On the basis of this work, we conclude that the circularization algorithm captures the entanglement complexity for 3D reconstructions of chromosomes.

The single stochastic closure algorithm can distinguish between entangled and unentangled open reconstructions.
To test whether our method can statistically discriminate pairs of chains that are in close proximity and entangled from those that are in close proximity but untangled, we implemented the following statistical test. First, we generated a random sample of 1,000 pairs of closed circular freely jointed chains of equal length and with centers of mass separated by a fixed distance d 81 . Second, we split the population into two subpopulations: those with linking number equal to zero and those with linking number different from zero. We computed their associated ensembles of open chains and linking proportions (see Section 1.4). Third, we computed and compared the average value of the linking proportions in each of the subpopulations (with or without linking number equal to zero). The sample mean of the linking proportion was 72% for entangled chains and 57% for untangled chains. To test whether these mean values were statistically different we generated the null distribution by permuting the linking proportion values of the entangled and untangled chains. The p-value for the permutation test was ~10^−3 (significant for α = 0.05). These results were further corroborated by performing the large sample independent t-test. Hence we concluded that the linking proportions can distinguish between entangled and untangled configurations that are in very close proximity.
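A permutation test of the kind described above can be sketched as follows. This is a generic two-sided test on the difference of sample means, with a hypothetical function name and hypothetical parameters (`n_perm`, `rng`); the number of permutations and other details of the original analysis are not stated in the text, so they are assumptions here.

```python
import numpy as np

def permutation_test_means(x, y, n_perm=10000, rng=None):
    """Two-sided permutation p-value for a difference in sample means."""
    rng = np.random.default_rng() if rng is None else rng
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the statistic.
        perm = rng.permutation(pooled)
        diff = abs(perm[: x.size].mean() - perm[x.size:].mean())
        count += diff >= observed
    # Add-one correction keeps the estimate strictly positive.
    return (count + 1) / (n_perm + 1)
```

Here `x` and `y` would hold the linking proportions of the entangled and untangled subpopulations, respectively. With `n_perm` permutations the smallest attainable p-value is 1/(n_perm + 1), so resolving values near 10^−3 requires at least about a thousand permutations.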

The single stochastic closure algorithm outperforms the linking number for open chains when comparing reconstructions.
To further validate our approach we tested whether the linking proportion obtained by our method could distinguish reconstructions (obtained with different initial conditions) better than the known linking number for open chains. Based on the work in 72 , we would expect a statistical algorithm to distinguish reconstructions 7 and 8 from the other reconstructions, since they were generated using different physical distances. Tables were vectorized and compared using the K-S test (Section 1.2). Results of this comparison are shown in Table 2. The rows and columns in the table correspond to different reconstructions and the entries show the p-value associated with the linking number for open chains (first two rows) and with the stochastic closure method proposed here (second two rows). Our results clearly show that the stochastic closure method can distinguish all of the reconstructions from reconstructions 7 and 8 (except for reconstruction 11). The linking number for open chains, on the other hand, fails to distinguish eleven of them. Based on these results we conclude that the stochastic closure algorithm outperforms other standard methods to measure entanglement of open curves obtained through the CCC data analyzed here.
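The vectorization and comparison of two linking proportion tables can be sketched as below. The code assumes that each table is a square array whose strictly lower triangle holds the pairwise values (120 entries for 16 chromosomes), and it computes only the two-sample K-S statistic with NumPy; obtaining the p-value needs the null distribution of that statistic, for which a library routine such as `scipy.stats.ks_2samp` could be used. The function names are hypothetical.

```python
import numpy as np

def vectorize_lower(table):
    """Flatten the strictly lower triangle of a square table into a vector."""
    table = np.asarray(table, dtype=float)
    rows, cols = np.tril_indices(table.shape[0], k=-1)
    return table[rows, cols]

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_x - F_y|."""
    x, y = np.sort(x), np.sort(y)
    grid = np.concatenate([x, y])
    # Empirical CDFs of both samples evaluated on the pooled sample.
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))
```

A statistic near 0 indicates that the two vectorized tables have similar empirical distributions, while a value near 1 indicates almost no overlap.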
The entanglement complexity of the yeast genome is lower than predicted by the wormlike chain model.
We compared the observed linking proportion distribution in the reconstructions with those obtained using Monte-Carlo simulations of random embeddings of wormlike chains confined to a spherical volume. The mean, median and range for the distribution of linking proportions are shown in Table 3. Figure 3 shows the histograms representing the frequency of the linking proportions of the CCC reconstructions (blue) and of wormlike chains obtained by computer simulations (red). All the means were significantly smaller than that estimated for the wormlike chain (μ_wlc = 81%). These results clearly show that the entanglement complexity predicted by the wormlike chains is much larger than that of the reconstructions. This finding is consistent with biophysical results of the linking of randomly embedded chains in confined volumes 82 and it provides further evidence that the yeast genome is not randomly organized. We also illustrate this result in Fig. 3, which shows the distribution of linking proportions for three representative reconstructions: Reconstruction 1 (published in 62 ), and Reconstructions 8 and 10, which have the lowest and largest mean/median of the linking proportion values, respectively.
Table 3. Linking proportion mean, median and range values for all reconstructions reflect the simulated Rabl configuration. Each column corresponds to one reconstruction.

The Rabl configuration of the yeast genome helps explain the observed reduced chromosome entanglement.
The main genomic organizational features observed at the level of resolution of chromosome arms and territories are the clustering of centromeres and telomeres in the Rabl configuration. We therefore posit that the distinct Rabl configuration plays a key role in preventing the entanglement of chromosomes. To test this hypothesis we performed three different simulation studies: (i) with centromeres clustered near the nuclear envelope and free telomeres; (ii) with only telomeres tethered near the nuclear envelope; and (iii) with both centromeres and telomeres fixed, resembling the Rabl configuration.
The top three panels in Fig. 4 show the distribution of linking proportions corresponding to simulation (i). The distributions of linking proportions corresponding to the selected reconstructions (blue) still show much lower linking proportions than when only centromeres are clustered at the nuclear envelope (red) (μ_cent = 70 ± 20%). The lower three panels show the results for simulation (ii), in which only telomeres are attached near the nuclear envelope 52,70 . Interestingly, in case (ii), the distribution of linking number frequencies (μ_tel = 60 ± 30%) was closer to that of the reconstructions and much simpler than those of the randomly embedded wormlike chain and those in which only centromeres were attached (Fig. 4, top three panels). These results clearly show that neither mechanism alone is sufficient to reduce the entanglement complexity to levels similar to those observed in the reconstruction data.
The situation was different when we implemented the Rabl configuration in simulation (iii). In this case the topological complexity was reduced to levels comparable to those observed in the reconstructions and the mean of the distribution of linking numbers was μ_Rabl = 24 ± 21%, a value much closer to those observed in the reconstructions (Fig. 5). In fact there were no significant differences between the simulated configurations and Reconstructions 1 and 10 (p = 0.062 and p = 0.062). Interestingly, we still found significant differences between reconstruction 8 and the simulated models. Inspection of this reconstruction revealed that chromatin fibers have more interactions with the nuclear envelope than in our proposed Rabl configuration. This reconstruction suggests a mechanism of entanglement simplification driven by the frequent attachment of chromatin fibers to the nuclear envelope 83 . Based on these results we suggest that the Rabl configuration is a regulator of the three dimensional organization of genomes that prevents entanglement of chromosome fibers.

Conclusion and Discussion
The three dimensional organization of genomes is essential for the correct functioning of the cell. Confinement of DNA fibers, however, promotes entanglement as evidenced experimentally by DNA knots and links observed in some viruses 5,84 and in the mitochondrial DNA of kinetoplastids (reviewed in 85 ), and by multiple theoretical studies 6,7,[9][10][11]13,86,87 . Organisms have evolved mechanisms to regulate DNA entanglement. Most notable is the presence of topoisomerases and site specific recombinases, enzymes that regulate the topology of genomes and that are known to unknot and unlink DNA [88][89][90] . There is evidence, however, that the cell has evolved other mechanisms to regulate the entanglement of genomes. For instance, at large scales, the eukaryotic chromosome is confined into territories (reviewed in 22 ) and below the megabase scale, genomes are partitioned into domains and loops, an organizational feature that is preserved from bacteria 37,38 to humans [30][31][32][33][34]36,42,91,92 .
On the basis of the work reported here, we propose that the Rabl configuration, an organizational feature that is also preserved through multiple species in evolution, provides another mechanism for topology simplification. We tested this hypothesis using published CCC data and developed a new method in statistical topology to estimate the topological entanglement of genomes. Our method for estimating entanglement has some advantages over the standard linking number for open chains. First, it detects entanglement of curves better than the standard linking number for open curves when analyzing 3D reconstructions; second, it can be extended by using topological invariants finer than the linking number. Whether the former is a general property or not remains to be determined. Our results showed that the entanglement complexity of CCC reconstructions of the yeast genome is lower than the entanglement complexity of free randomly embedded chains and of chains in which centromeres or telomeres alone had been tethered to the nuclear envelope; only the implementation of the Rabl configuration yielded an entanglement complexity comparable to that of the CCC reconstructions. These results suggest that the Rabl configuration is a regulator of the entanglement complexity of the genome. Note, however, that this finding does not exclude the possibility of other mechanisms such as the tethering of other regions of the genome to the nuclear envelope 20,83 . Our method and conclusions are limited by the fact that chromosomes are highly dynamic, especially as the cell goes through the cell cycle. This limitation opens the door to new inquiries on topology regulation during the cell cycle.

Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.