Pathways of DNA unlinking: A story of stepwise simplification

In Escherichia coli, DNA replication yields interlinked chromosomes. Controlling topological changes associated with replication and returning the newly replicated chromosomes to an unlinked monomeric state is essential to cell survival. In the absence of the topoisomerase topoIV, the site-specific recombination complex XerCD-dif-FtsK can remove replication links by local reconnection. We previously showed mathematically that there is a unique minimal pathway of unlinking replication links by reconnection while stepwise reducing the topological complexity. However, it is biologically plausible that a reconnection step preserves or even increases topological complexity. In this case, are there other unlinking pathways? Which is the most probable? We address these questions in an analytical and numerical study of minimal unlinking pathways. We use a Markov Chain Monte Carlo algorithm with Multiple Markov Chain sampling to model local reconnection on 491 different substrate topologies (166 knots and 325 links) and distinguish between pathways connecting a total of 881 different topologies. We conclude that the minimal pathway of unlinking replication links that was found under more stringent assumptions is the most probable. We also present exact results on unlinking a 6-crossing replication link. These results point to a general process of topology simplification by local reconnection, with applications going beyond DNA.

Flexible circular chains appear often in nature, from microscopic DNA plasmids to macroscopic loops in the solar corona. Such chains can entrap rich geometrical and topological complexity, which can give insight into the processes underlying their formation or modification. Knotted and interlinked states often coincide with higher energy states in physical systems and are usually undesired, and topology-simplifying reconnection processes involving one or two cleavages are observed in such systems. Examples in biology include the action of type II topoisomerases and of site-specific recombinases. Type II topoisomerases bind to two segments of double-stranded DNA, cleave one of the segments, transport the other through the break (strand passage) and reseal the break. Site-specific recombinases bind to two specific sites (short segments of double-stranded DNA), introduce a double-stranded break at each site, recombine the ends and reseal the breaks. The action of recombination enzymes is a local reconnection event. Here we investigate pathways of unlinking of newly replicated DNA links by local reconnection. The results presented and the numerical methods proposed are not restricted to the biological example and apply to any local reconnection process.
In genetics, the observation of topological links dates back to studies of plants in the 1930s. In a study of chromosomal variation in Crepis tectorum, M. Navashin observed ring chromosomes, noting "in one case, the two daughter strands composing a normal chromosome failed to separate". Navashin reported on a metaphase involving four rings, two of which were "united in the fashion of chain links", thus documenting the appearance of two newly replicated circular chromosomes forming a singly-linked catenane, or 2-crossing link 1 . In her study of ring chromosomes in maize, Barbara McClintock observed the accumulation of several rings in the same cell and hypothesized that "lack of uniformity in the splitting plane could give rise to a double sized ring with two insertion regions or cause split halves of the ring to become interlocked", thus introducing the ideas of chromosome dimers and links (also called catenanes) 2 . Three decades later, DNA links were studied in vitro via random cyclization of circular DNA in the presence of an excess of DNA circles 3 , and in 1980 interlinked dimers formed by nicked, newly replicated 5.2 kb circular dsDNA minichromosomes from SV40 were observed by electron microscopy 4 . The mechanisms of replication and segregation of circular DNA predict products that can be topologically characterized as right-handed (RH) 2m-crossing torus links with parallel sites, which we here refer to as parallel 2m-cats (denoted mathematically as parallel $(2m)_1^2$ or $T(2, 2m)_p$) 5 . These topological forms were confirmed by characterizing the linked replication intermediates that accumulate in topoIV mutants 6 (Fig. 1(A)).
Sogo et al. 7 hypothesized that catenanes appeared as replication intermediates of bacteriophage λ DNA and observed that, in order to secure proper segregation of circular chromosomes at cell division, the linking number of the two newly replicated molecules must be reduced to zero. However, the topology of a circular double-stranded (ds)DNA molecule is insensitive to any manipulation that does not involve a double-stranded break 5 . Nicking of a single DNA strand, however extensive, is insufficient to unlink two newly replicated DNA circles unless pre-existing nicks are present along the second strand. The type II topoisomerase topoIV is a major decatenase in E. coli 6,8 . Grainge et al. showed that in the absence of topoIV, the XerCD-dif-FtsK molecular machine can act in vivo to separate two interlinked, newly replicated chromosomes 9 . The XerCD complex consists of the site-specific tyrosine recombinases XerC and XerD. The dif site is a 28 bp recombination site located within the terminus region of the E. coli chromosome. FtsK is a powerful translocase that assembles at the division septum, where it activates XerCD-dif recombination. The experimental data of Grainge et al. suggested a gradual reduction in the topological complexity of the substrates, which were RH 2m-cats with parallel dif sites 9 . The proposed unlinking pathway, through which the enzymes unlink the replication links in a stepwise fashion, is illustrated in Fig. 1A. In the figure, each closed curve represents a circular dsDNA molecule, and the components of a two-component link represent two newly replicated DNA chains. In previous work 10 we showed that, when each reconnection step is required to strictly reduce the minimal crossing number, there is a unique unlinking pathway starting at a 2m-crossing replication link. In E. coli a replication link is a 2m-cat with parallel dif sites 6 , and this pathway predicts the first product to be a $(2m-1)_1$ knot with two dif sites in direct repeats. Two sites along a knotted chain are in direct repeats if they induce the same orientation into the knot.
Figure 1. (A) Replication links are 2m-crossing right-handed torus links with parallel sites (mathematical notation: $(2m)_1^2$). The pathway in the figure illustrates, for m = 6, the only unlinking pathway starting at the parallel 2m-cat under the assumption that each reconnection step strictly reduces the minimal crossing number. All the intermediate topologies are torus links of the form $(2m)_1^2$ or torus knots of the form $(2m-1)_1$, with two reconnection sites in direct repeats as in the figure. (B) One reconnection step: here the cleavage regions of the reconnection sites on a $6_1^2$ link are brought together to form a synapse (shown as a ball enclosing two strings). The synapse is modeled mathematically as a 2-string tangle. In the case of XerCD site-specific recombination, the strings in the tangle contain the core regions of the dif sites (indicated by two arrows in a tangle P representing two very short segments of double-stranded DNA, which physically behave as two almost straight strings) and any bound DNA which does not change during recombination (gray shaded region). Any interesting geometrical or topological complexity of the substrate is captured mathematically as an outside tangle O that remains constant during reconnection. Before strand cleavage, the substrate is modeled by the tangle equation $N(O + P) = 6_1^2$. The local reconnection is modeled by tangle surgery, where P is replaced with R, yielding a product represented as $N(O + R) = K$, where K is a knot with two directly repeated sites. (C) Local reconnection is a simple event which can be modeled as a band surgery, where P = (0) is replaced with a tangle R = (w, 0) enclosing a vertical row of w twists, for some integer w. The rational tangle notation (or Conway notation) for such a vertical tangle is R = (w, 0). When $w = \pm 1$, the notation simplifies to $R = (\pm 1)$. In the simplest cases, P = (0) with sites in parallel alignment goes to $R = (\pm 1)$, and P = (0) with sites in anti-parallel alignment goes to $R = (0, 0)$, as illustrated in the figure.
Scientific Reports | 7: 12420 | DOI:10.1038/s41598-017-12172-2
A rigorous mathematical analysis of the recombination experiments of Grainge et al. 9 showed that at least 2m steps are needed in order to unlink any RH 2m-cat with parallel sites 10 . This result relied only on the assumption that the XerCD tetramer binds the two dif sites and that a simple cut-reconnect-paste reaction ensues (Fig. 1C). If the shortest pathway of unlinking a 2m-crossing replication link has exactly 2m steps, it is natural to ask how many such pathways exist and whether some are more likely than others. Under the assumption that each step strictly reduces the topological complexity of its substrate (as measured by minimal crossing number), Shimokawa et al. 10 showed that the only possible pathway of unlinking a 2m-crossing replication link is that in Fig. 1A. Using tangle calculus, they proposed a 3-dimensional topological mechanism to take the parallel 2m-cat to the unlink. This mechanism incorporates three solutions obtained by tangle calculus at each step of the process, and the last three steps are fully characterized. The results in Shimokawa et al. 10 provide unprecedented detail in the study of the topological mechanism of DNA unlinking by site-specific recombination. Going beyond the original problem of unlinking newly replicated circular chromosomes, these results apply to any reconnection event that can be modeled using tangles as in Fig. 1. For example, the same unlinking pathway proposed for DNA links under site-specific recombination has been observed during reconnection events in physical systems such as vortices in fluid flow [11][12][13] . Further mathematical research on this subject can be found in the literature [14][15][16][17][18] .
Successful unlinking by XerCD-FtsK of newly replicated plasmids containing dif sites was shown in ref. 9 . Quantification of these data gave only weak support to the assumption of stepwise reduction in complexity during the unlinking reaction 10 . As can be seen in Fig. 2, the gel quantification clearly illustrates the reduction of replication links by XerCD-FtsK site-specific recombination at dif sites. However, because of the complexity of the data, confirming a stepwise reduction would require repeating the time course experiments 9 for each individual topology. This motivates the current work, where we remove the assumption of stepwise decrease in complexity and design mathematical and numerical methods to assess the different unlinking pathways and identify the most probable ones. We ask whether there are other minimal unlinking pathways and hypothesize that the minimal pathway previously proposed 9,10,19 and illustrated in Fig. 1A is the most likely among all the possible minimal pathways. First, we allow the complexity of the products to decrease or remain the same at each step of the reaction. We provide an analytical proof that, under this assumption, there are exactly nine minimal pathways of unlinking a parallel 6-cat; 6 of the 16 resulting transitions are fully characterized. Characterizing minimal pathways of unlinking by local reconnection and resolving the topological mechanisms involved are problems of high theoretical complexity, since the number of possibilities quickly increases with the number of crossings of the substrate. For instance, characterizing the topological mechanism(s) taking a link $L_i$ to a knot $K_j$ is equivalent to characterizing all band surgeries between $L_i$ and $K_j$ (see Fig. 1C).
In order to discriminate between different minimal unlinking pathways for a given substrate and to extend the study to higher crossing numbers, we eliminate the complexity assumption and develop a Monte Carlo method to simulate local reconnection events. The method can be applied to a substrate with any topology, allows products of varying topological complexity, and facilitates the rigorous quantification of the transition probabilities along each pathway obtained. Using this method we embark on a numerical study relevant to unlinking of DNA replication links by site-specific recombination at dif sites. More specifically, we restrict the numerical study to knotted chains of fixed length with two reconnection sites (representing the dif sites) that are evenly spaced along the chain, and to 2-component links whose components have equal length and carry one site each. The computational approach provides a rigorous means to discriminate between mathematically equivalent unlinking pathways. The combination of the mathematical and computational studies provides strong quantitative support for the hypothesis that the unlinking pathway from Fig. 1A is the most likely, even under the weakened assumptions.

Figure 2. Grainge et al. 9 showed a time course of unlinking by XerCD-dif-FtsK50C at 25 °C of newly replicated plasmids containing dif sites. Line scans of the gel were previously published 10 . In this figure each topological class is shown as a separate series of points with linear interpolation. The labels assume that the bands observed correspond to the topologies expected from a substrate composed of replication links, i.e. 2m-crossing links (e.g. 2m-cats), and some of the corresponding knotted intermediates (open circle or $0_1$, $3_1$, $5_1$). "Unlink" corresponds to the two unlinked components in monomeric state (topology type $0_1^2$), and "Unknot" corresponds to the dimeric unknot ($0_1$). The quantification clearly illustrates the reduction of replication links by XerCD-FtsK site-specific recombination at dif sites. The complexity of the data is also evident, with the relative proportions of the different topologies fluctuating from one step to the next, thus obscuring the signal.

Nomenclature for knots and links
It is important at the outset to say a word about the naming convention used for the knots and links which arise in this study (490 knots and 391 two-component links). A local reconnection event on a two-component link with one cleavage site in each component yields a knotted chain with two sites in direct repeats (cf. Fig. 1A). Rolfsen's Knot Table 20 summarizes the knot nomenclature used in the mathematics community, which was not designed to distinguish between mirror images or between different orientations of link components, an important consideration when dealing with circular DNA and other biopolymers. Chirality is relevant, and indeed crucial, to characterize biological and chemical compounds. In this paper, we use the writhe-based knot nomenclature proposed in Brasher et al. 21 . The writhe is a geometric quantity that provides a measure of a chain's entanglement complexity and chirality. It is defined by a Gauss double integral and can be estimated numerically by averaging the writhe of planar diagrams over all projection directions (the projected writhe). The mean writhe of a knot type K is the average of the writhes of all knotted chains of type K; numerically, it is estimated by averaging over a sufficiently large, randomly generated ensemble of conformations of type K. The representative of a chiral pair is chosen based on its mean writhe 21 . We extend this nomenclature to the 2-component links depicted in Fig. 3. For prime 2-component links with 9 or more crossings we use the default notation from KnotPlot 22 .
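The writhe on which this nomenclature rests can be evaluated exactly for a closed polygon, such as a lattice conformation, by discretizing the Gauss double integral over pairs of edges. The sketch below uses the segment-pair solid-angle formula of Klenin and Langowski; this choice of discretization is our assumption (the text does not say which formula was used), and the function names `writhe` and `segment_pair_omega` are hypothetical. For a polygon, this exact value coincides with the average of the projected writhe over all projection directions. It requires NumPy.

```python
import numpy as np


def _unit(v):
    """Normalize v, or return None for a (numerically) zero vector."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 1e-12 else None


def segment_pair_omega(p1, p2, p3, p4):
    """Signed solid-angle term for the edge pair p1->p2 and p3->p4
    (Klenin-Langowski discretization of the Gauss integral)."""
    r12, r34 = p2 - p1, p4 - p3
    r13, r14 = p3 - p1, p4 - p1
    r23, r24 = p3 - p2, p4 - p2
    triple = np.dot(np.cross(r34, r12), r13)
    if abs(triple) < 1e-12:
        # Coplanar pairs, including edges sharing a vertex, contribute nothing.
        return 0.0
    normals = [_unit(np.cross(r13, r14)), _unit(np.cross(r14, r24)),
               _unit(np.cross(r24, r23)), _unit(np.cross(r23, r13))]
    if any(nv is None for nv in normals):
        return 0.0
    omega_star = sum(
        np.arcsin(np.clip(np.dot(normals[k], normals[(k + 1) % 4]), -1.0, 1.0))
        for k in range(4)
    )
    return omega_star * np.sign(triple)


def writhe(vertices):
    """Writhe of the closed polygon through `vertices` (edge i joins vertex i to i+1)."""
    pts = np.asarray(vertices, dtype=float)
    n = len(pts)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += segment_pair_omega(pts[i], pts[(i + 1) % n],
                                        pts[j], pts[(j + 1) % n])
    # Wr = (sum over ordered pairs of Omega) / (4*pi) = (sum over i < j) / (2*pi)
    return total / (2.0 * np.pi)
```

A planar polygon has zero writhe, and reflecting a conformation flips the sign of its writhe, which is why the mean writhe can be used to pick a representative of a chiral pair. Under the same assumptions, the mean writhe of a knot type would be estimated as `np.mean([writhe(c) for c in ensemble])` over an ensemble of conformations of that type.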

Results
There are exactly 9 shortest pathways to unlink the 6-cat that do not increase substrate complexity.
We consider an event where two oriented sites come together and undergo cleavage followed by reconnection. If the substrate is a single circle, then the oriented sites are in direct repeat, i.e. they induce the same orientation into the circle. If the substrate consists of two circular chains, then there is one site in each chain. Note that such an event always changes the topology of the substrate: reconnection between two sites in separate components of a link yields a knot with two sites in direct repeats, and reconnection on a knot with two directly repeated sites yields a 2-component link with one site in each component. The reconnection event is modeled as a system of tangle equations as described in Fig. 1(B). In the context of DNA unlinking, as in Shimokawa et al. 10 , we model dsDNA as a curve defined by the axis of the DNA double helix, and the synapse formed by the enzymes bound to the core regions of the dif recombination sites as the 2-string tangle P. Reconnection changes P into R. If we assume that each reconnection is modeled as a coherent band surgery, i.e. P = (0) and R = (w, 0) for some integer w, then any minimal pathway to unlink an n-crossing torus link with parallel sites (e.g. $4_1^2$ or $6_1^2$) has exactly n steps. Furthermore, if each reconnection step is assumed to strictly reduce the complexity of its substrate, then the minimal pathway is unique: RH 2m-cat, RH $(2, 2m-1)$-torus knot, RH $(2m-2)$-cat, …, RH trefoil, Hopf link, trivial knot, trivial link. Figure 1A illustrates the 6-cat case. Since the experimental data 9 give only weak support to the assumption that the complexity strictly decreases at each step of the reaction (Fig. 2), we here examine the case where no reconnection step increases the number of crossings and provide an analytical characterization of all shortest pathways from the 6-cat to the unlink.

Figure 3. … 20 and in works by Kanenobu 28,29 is included in Supplementary Fig. S5. Arrows indicate the relative orientations of the sites.

Assumption 1. Consider a reconnection pathway from a parallel RH 2m-cat to the unlink. Assume that each product along the pathway is a knot or a 2-component link, that the pathway is shortest, and that no reconnection event increases the number of crossings of its substrate.
Recall that any shortest reconnection pathway from $(2m)_1^2$ to the unlink has exactly 2m steps 10 . In Theorem 2 we show that there are exactly nine unlinking pathways from the parallel RH 6-cat satisfying Assumption 1.

Theorem 2. A pathway from the parallel RH 6-cat that satisfies Assumption 1 is one of the 9 shown in Fig. 4.
The 9 pathways found in Theorem 2 involve 16 possible transitions taking a knot to a link or vice versa; 6 of the transitions have fully characterized mechanisms. The proof of the theorem and the characterization of the mechanisms are presented in the Supplementary Methods. Figure 4 summarizes the results as an oriented graph where each node is a knot/link type and each edge represents the transition between two topologies by one reconnection step. All minimal pathways taking the parallel $6_1^2$ to the unlink $0_1^2$ and satisfying Assumption 1 are shown. In the next section we undertake a thorough computational study with the objective of discriminating between minimal pathways while minimizing the number of assumptions. In particular, we use the numerical work to assign frequencies to each transition in the pathway graph (represented in Fig. 4 as weights on the edges).
We here give a sketch of the proof of Theorem 2. More details, including Lemmas S1-S8, Propositions S9-S17, and Figs S1 and S2 exhibiting the steps of the proof and relevant band surgeries for each of the transitions in Fig. 4, are included in the Supplementary Methods. In order to characterize the minimal pathways starting from the parallel $6_1^2$ link, we first investigate the effect of band surgeries on certain topological invariants, such as the signature, the Jones polynomial, the Q polynomial and the Arf invariant, of the knots and links involved in those pathways. By Lemma S6, the sequence of the signatures of the knots and links along such a pathway is −5, −4, −3, −2, −1, 0, 0. Lemma S7 shows that split links cannot appear in a shortest pathway. Lemma S8 identifies the candidate topologies for the minimal pathways from $6_1^2$.
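As a small consistency check of the statements above, the sketch below lists the torus-family pathway of Fig. 1A for an n-crossing RH torus knot or link together with the signature of each intermediate. It assumes the sign convention in which an RH $(2, c)$ torus knot or link with parallel orientation has signature $-(c-1)$ (so the RH trefoil has signature −2); with this convention the pathway from $6_1^2$ reproduces the sequence −5, −4, −3, −2, −1, 0, 0 quoted from Lemma S6. It also checks the standard fact that a coherent band surgery changes the signature by at most 1. All helper names are hypothetical.

```python
def torus_label(c):
    """Label of the RH (2, c) torus link (c even) or torus knot (c odd), c >= 2."""
    return f"{c}_1^2" if c % 2 == 0 else f"{c}_1"


def torus_unlinking_pathway(n):
    """Pathway of Fig. 1A from the n-crossing RH torus knot/link to the unlink."""
    if n < 2:
        raise ValueError("n must be at least 2")
    labels = [torus_label(c) for c in range(n, 1, -1)]  # n, n-1, ..., 2 crossings
    return labels + ["0_1", "0_1^2"]                    # Hopf link -> unknot -> unlink


def pathway_signatures(n):
    """Signatures along the pathway, assuming sigma(RH T(2, c)) = -(c - 1)."""
    return [-(c - 1) for c in range(n, 1, -1)] + [0, 0]


def signature_steps_ok(signatures):
    """A coherent band surgery changes the signature by at most 1."""
    return all(abs(a - b) <= 1 for a, b in zip(signatures, signatures[1:]))


path = torus_unlinking_pathway(6)
print(" -> ".join(path))        # 6_1^2 -> 5_1 -> 4_1^2 -> 3_1 -> 2_1^2 -> 0_1 -> 0_1^2
print(len(path) - 1)            # 6
print(pathway_signatures(6))    # [-5, -4, -3, -2, -1, 0, 0]
```

The number of steps equals the crossing number n of the substrate, matching the statement that any minimal pathway from an n-crossing torus link with parallel sites has exactly n steps.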
Because XerC and XerD are tyrosine recombinases and act through a Holliday junction intermediate, the tangle pairs (P, R) that are relevant to unlinking of DNA replication links by Xer recombination are those of Fig. 1C. The above proposition allows us to determine all the topological mechanisms for each of the three combinations of substrate and product in the statement. We illustrate the solutions in Proposition S18 and in Supplementary Fig. S3 in the Supplementary Methods. Just as in Shimokawa et al. 10 , here each system of tangle equations yields three solutions, and the three solutions can be interpreted as representing a unique 3-dimensional topological mechanism.
Which unlinking pathways are most probable?
In the previous section, we proved analytically that under Assumption 1 there are 9 minimal pathways of unlinking the parallel 6-cat, $6_1^2$. The mathematical analysis, which includes enumeration of pathways and characterization of topological mechanisms, becomes difficult for substrates with high crossing numbers. Furthermore, if the assumption of reduction in complexity (which is equivalent to imposing a topological filter on the physical system) is lifted, then the number of possible pathways increases rapidly and the detailed mathematical analysis quickly becomes intractable. We here remove Assumption 1 and set out on a numerical exploration of reconnection pathways starting from a broader set of substrate topologies. We develop software which finds reconnection sites along polygonal chains in the simple cubic lattice and simulates the reconnection event. Figure 5C illustrates the basic reconnection move on a simplified polygon. Figure 5A shows a lattice trefoil with a single reconnection site, before and after local reconnection. We simulate reconnection to explore different topological transitions, to quantify transition probabilities and to discriminate between unlinking pathways that are mathematically indistinguishable when only substrate, product and length are specified.
We provide numerical evidence that, of all minimal pathways starting with the RH parallel 6-cat, the one in Fig. 1A is the most likely. The weights in Fig. 4 correspond to the transition probabilities obtained in the numerical simulations. More generally, our numerical data suggest that this trend holds for any substrate that is an RH 2m-cat with parallel sites, or an RH $(2m-1)$-torus knot with two sites in direct repeats. It is important to emphasize that the simulations do not use Assumption 1. Figure 5B is a circos figure that shows all observed reconnection transitions that maintain or decrease the minimal crossing number and that belong to an observed minimal pathway from the $9_1$ knot. The thickness of the arcs corresponds to the directed transition probability between two topologies. Transitions in the most probable minimal pathway from $9_1$ are colored red. The predominance of these most probable unlinking pathways is consistent with the experimental observations for XerCD-FtsK-dif site-specific recombination on DNA replication links 9 and for reconnection in fluid vortices 12 , and is also consistent with the predictions in the literature 10,11 .
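Given a table of observed transition probabilities, such as the matrix provided in the Supplementary Data, the minimal pathways from a substrate to the unlink and the most probable among them can be extracted with a reverse breadth-first search followed by dynamic programming. The sketch below assumes that the probability of a pathway is the product of its step-wise transition probabilities; the function name `minimal_pathways` and the toy transition table are hypothetical, and the toy numbers are made up rather than taken from this study. An optional `allow` predicate acts as a topological filter, for example to mimic Assumption 1 by rejecting transitions that increase the crossing number.

```python
from collections import deque
from functools import lru_cache


def minimal_pathways(transitions, source, target, allow=lambda u, v: True):
    """Return (number of minimal pathways, best probability, best pathway).

    transitions[u][v] is the observed probability that reconnection takes u to v.
    """
    def edges(u):
        return [(v, p) for v, p in transitions.get(u, {}).items()
                if p > 0 and allow(u, v)]

    # Reverse breadth-first search: number of steps from each topology to the target.
    reverse = {}
    for u in transitions:
        for v, _ in edges(u):
            reverse.setdefault(v, []).append(u)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        for u in reverse.get(v, []):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    if source not in dist:
        return 0, 0.0, None

    @lru_cache(maxsize=None)
    def best(u):
        # Minimal pathways only use edges that bring the target exactly one step closer.
        if u == target:
            return 1, 1.0, (target,)
        count, best_p, best_path = 0, 0.0, None
        for v, p in edges(u):
            if dist.get(v) == dist[u] - 1:
                c, q, path = best(v)
                count += c
                if p * q > best_p:
                    best_p, best_path = p * q, (u,) + path
        return count, best_p, best_path

    return best(source)


# Toy transition table with made-up probabilities (not data from this study).
toy = {
    "S": {"A": 0.6, "B": 0.4},
    "A": {"U": 0.5, "C": 0.5},
    "B": {"U": 0.9, "S": 0.1},
    "C": {"U": 1.0},
}
print(minimal_pathways(toy, "S", "U"))  # 2 minimal pathways; most probable is S -> B -> U
```

Run with and without a crossing-number filter, such a function would give counts analogous to the $P_{\min}(L)$ and $P(L)$ values of Fig. 5D, restricted to the transitions actually observed; unobserved transitions simply do not appear in the table.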
The minimal number of band surgeries needed to take a link type $L_i$ to a knot type $K_j$ is called the nullification distance 25,26 . In the numerical experiment we started by choosing knots and 2-component links that are at nullification distance 1-3 from one of the 11 knots or links along one of the 9 minimal pathways of Theorem 2 and Fig. 4, or are obtained from these topologies by taking mirror images or reversing the orientation of one of the components. For completeness, we expanded the initial set to include 491 substrate topologies representing almost all knots and links with 9 or fewer crossings. Reasons for omitting a handful of 9-crossing split links from the substrate set are described in detail below. We use the BFACF algorithm to generate large independent ensembles of conformations for each substrate topology. BFACF is a dynamic Monte Carlo method which samples uniformly the set of all lattice polygons of fixed topology for a given mean length 27 . The BFACF moves used to perturb each chain are illustrated in Fig. S4 in the Supplementary Methods. Split links such as the unlink $0_1^2$ or $0_1 \cup 3_1$ (see Fig. 3), even though they appear as reconnection products, are not used as substrates due to the difficulty of keeping the components together without altering the Monte Carlo procedure. In order to improve the efficiency of sampling statistically independent conformations we implemented BFACF as a Composite Markov Chain (CMC). Details of the simulations, including a description of the algorithms and different parameters, are included in the numerical methods section and in the Supplementary Methods. Fig. S6 in the Supplementary Methods illustrates all the transitions observed between 881 topologies in the numerical experiment, including those that do not appear in minimal pathways from $9_1$. The resulting transition probabilities are available in matrix form in the data spreadsheet provided as Supplementary Information (Supplementary Data).
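The details of the Composite Markov Chain scheme are not given here, so the following is only a generic sketch of the swap step commonly combined with BFACF in composite (multiple) Markov chain sampling. It assumes that chain k samples polygons ω with probability proportional to $\beta_k^{|\omega|}$, where $|\omega|$ is the number of edges and $\beta_k$ is a fugacity parameter of chain k (our notation, not the paper's). Exchanging the conformations held by neighbouring chains k and l is then accepted with probability $\min\{1, (\beta_k/\beta_l)^{|\omega_l| - |\omega_k|}\}$, which leaves each chain's stationary distribution unchanged. The function names and fugacity values are hypothetical.

```python
import random


def swap_acceptance(beta_k, beta_l, len_k, len_l):
    """Acceptance probability for exchanging the conformations of chains k and l.

    Assumes chain k samples conformations w with weight beta_k ** len(w);
    len_k and len_l are the lengths of the conformations currently held.
    """
    return min(1.0, (beta_k / beta_l) ** (len_l - len_k))


def attempt_neighbour_swaps(betas, states, rng=random):
    """One sweep of swap attempts between neighbouring chains (modifies states in place).

    states[k] is the polygon (list of vertices) currently held by chain k; for a
    closed polygon the number of edges equals the number of vertices.
    """
    for k in range(len(betas) - 1):
        p = swap_acceptance(betas[k], betas[k + 1], len(states[k]), len(states[k + 1]))
        if rng.random() < p:
            states[k], states[k + 1] = states[k + 1], states[k]
    return states
```

Between swap sweeps, each chain would be advanced independently by its own BFACF moves (not shown); conformations at the fugacity of interest are then sampled from the corresponding chain.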
Figure 5D contains exact counts for the number of minimal unlinking pathways for torus knots and links with up to 6 crossings, and the corresponding numerical estimates for 7 and 8 crossings. Under Assumption 1 there are 9 minimal pathways of unlinking the $6_1^2$ link. In the numerical study, we find 36 minimal unlinking pathways for the $7_1$ knot and 208 minimal unlinking pathways for the $8_1^2$ link under Assumption 1 ($P_{\min}(L)$). Once the assumption is removed (so that the crossing number can increase at any given step), we observe $P(7_1) = 2760$ minimal pathways for the knot $7_1$ and $P(8_1^2) = 6434$ minimal pathways for the link $8_1^2$. However, it has been shown analytically that, without this assumption, there are infinitely many minimal pathways between any 2n-crossing torus link with parallel sites and the unlink 17 . The numerical data can provide biologically relevant information by establishing a ranking of the most likely pathways. The third row in Fig. 5D indicates the number of distinct product topologies (as detected by the HOMFLY-PT polynomial) observed for torus knots and links of the type $T(2, n)$ with 8 or fewer crossings after a single reconnection step.

Figure 5. … All substrate knots have directly repeated sites that are 60 segments apart, with a tolerance of ±6 segments, and all links have two components of length 60 ± 6, so that the sum of the lengths is exactly 120. Reconnection on links is only performed between sites in different components. (B) Circos figure: all reconnection transitions in a minimal pathway from the $9_1$ that satisfy Assumption 1. 2-component links (resp. knots) are arranged by increasing crossing number from bottom to top in the left (resp. right) hemisphere, and are color-coded blue (resp. red). Color intensity increases with decreasing crossing number. An arc between K and L indicates at least one observed reconnection event between K and L. The thickness of the arcs corresponds to the directed transition probability between two topologies. Transitions with an observed probability <0.2 are thickened to be more visible. Transitions are colored according to the probability of the most probable minimal pathway they are a member of. The first, second, and third most probable unlinking pathways from $9_1$ are colored red, orange, and yellow, respectively. If no arc appears between a pair {K, L}, no reconnection between them was observed. Observed transitions for all substrate topologies, including those in non-minimal pathways, are included in Supplementary Data and in Fig. S6.

Discussion
In Theorem 2 we prove that there are exactly 9 shortest unlinking pathways for the $6_1^2$, assuming that at every step the complexity of the substrate goes down or remains the same. The 9 pathways are illustrated in Fig. 4. We solve the topological mechanisms involved for 6 of the 16 transitions along these pathways. We develop a new Monte Carlo based numerical method which allows us to model local reconnection on chains of fixed length and topology. We run the numerical simulation on each topology found to be within 3 nullification steps of any topology in Fig. 4. Notice that in these experiments nothing prevents the complexity of a substrate from going up at any given step. We can determine the set of all observed minimal pathways from any of the substrate topologies, and single out the most probable pathway. In Fig. 5 we provide numerical estimates for the number of minimal pathways for torus knots and links with 7 and 8 crossings. In our numerical data the most probable minimal pathway from a torus link (or knot) to the unlink is the one where every intermediate is also in the torus family, as in Fig. 1A. The data from the numerical experiments can be found in the Supplementary Data.
Mathematically, extending Theorem 2 to determine all minimal pathways for $T(2, N)$ torus knots and links is difficult. In general, if the substrate is a torus knot or link $T(2, N)$, one can find multiple pathways that preserve the minimal crossing number at many steps. The complexity of the problem grows with the minimal crossing number of the substrate. For example, using numerical simulation we estimate the number of minimal pathways from the $7_1$ (resp. $8_1^2$) to the unlink to be at least 36 (resp. 208) under Assumption 1. These are not tight bounds, due to the limitations of using links of the form $K \# 2_1^2$ as substrates in the numerical experiments. It is known that when the assumption is removed, there are infinitely many shortest pathways between the $T(2, 2N)_p$ torus link and the unlink 17 . In our numerical work, once Assumption 1 is removed we count at least 744, 2760 and 6434 shortest unlinking pathways for $6_1^2$, $7_1$ and $8_1^2$, respectively. The problem of computing the nullification distance between a knot and a link is of interest to the mathematical community 17,25,26,28,29 . In cases where the analytical tools fail to provide an exact nullification distance, one can estimate the distance between two topologies using the numerical method and possibly remove ambiguities by exhibiting the relevant band surgeries.
The numerical simulations in this study posed a number of challenges. For example, in order to generate an ensemble of essentially independent unknots $0_1$ of length 120 we had to run at least twice as many iterations of the BFACF algorithm as for any other substrate topology. Further, these unknots contained synapses meeting the reconnection criteria approximately once every $7.5 \times 10^9$ iterations. In order to improve the efficiency of such runs, we implemented the BFACF algorithm as a Composite Markov Chain process [30][31][32][33] . Similar challenges extend to any topology consisting of a connected sum of a knot and a Hopf link, $K \# 2_1^2$, or the disjoint union of a knot and an unknot, $K \cup 0_1$ (see examples in Fig. 3). In the first case, the unknotted component tends to shrink, making it difficult to satisfy the equal-length criterion for recombination. In the second case, even though these topologies appear as reconnection products, they cannot be used as substrates due to the difficulty of keeping the components together (without biasing the simulations for those specific substrates). Now consider an example where a bacterial chromosome dimer forms a $3_1$ knot with two equidistant directly repeated dif sites. In our simulations we see that 0.025% of trefoils transition to $0_1 \cup 3_1$, the disjoint union of an unknot and a trefoil, and 95.2% of trefoils transition to $2_1^2$. In the first case the knotted dimer is effectively unlinked in one step, but one of the components remains knotted, which can pose problems during chromosome segregation. In the second case unlinking of the trefoil can be achieved in 3 steps, with a combined probability of 0.925; the final product is $0_1^2$, a union of two circles which can then segregate at cell division.
In the case of unlinking of DNA replication links, each component of the link corresponds to a newly replicated chromosome from E. coli, with one dif site in each component. This example motivated our choice to require that two reconnection sites within a single circle be equidistant, and that the two components of a linked substrate or product have the same length. In other contexts, such as site-specific recombination between non-equidistant sites, more general homologous recombination, and possibly other reconnection processes in physics, the distance between sites will be an important parameter, requiring further exploration of how the transition probabilities obtained by the numerical method depend on length and topology.
Furthermore, in nature, DNA molecules are often tightly packaged in crowded environments. A study of reconnection on confined chains would shed light on whether confinement plays a role in driving topological simplification by processes involving local reconnection. Existing studies of confined polygonal chains, both on and off the lattice, suggest methods for generating ensembles of confined conformations 34,35 .

Materials and Methods
Mathematical Methods. The tangle method is briefly summarized in Fig. 1. The naming convention used for knots and links is reviewed in the Introduction and in Fig. 3. More detailed mathematical methods and results used in the proof of Theorem 2 are provided in Fig. 4 and in the Supplementary Methods. A site-specific recombination event is modeled as a local reconnection and is represented mathematically as a system of tangle equations as described in Fig. 1B. The circular chain represents the starting knot or link, and P is a 2-string tangle that encloses the reconnection sites. Reconnection changes P into R. We assume that each reconnection is modeled as a coherent band surgery, i.e. P = (0) and R = (w; 0) for some integer w (Fig. 1C).
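The system of tangle equations referred to above can be written out as follows. This is a sketch in the notation commonly used for the tangle method, in which the outside tangle O (a label introduced here, not named in this section) is left unchanged by reconnection and N(·) denotes numerator closure; Fig. 1B gives the system as used in this study.

```latex
\begin{aligned}
N(O + P) &= \text{substrate knot or link}, \\
N(O + R) &= \text{product knot or link}, \\
P &= (0), \qquad R = (w;\,0), \qquad w \in \mathbb{Z}.
\end{aligned}
```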
Numerical Methods: modeling reconnection. Computer simulations of local reconnection. We use an integrated set of computational tools to generate and filter ensembles of conformations, perform reconnection, identify product topologies, compute transition probabilities and facilitate statistical analysis of the results. Given an ensemble of lattice conformations with fixed length and constant topology, our algorithm searches for possible synapses along each conformation, selects one uniformly at random, and performs reconnection as illustrated in Fig. 5A. Our original motivation came from XerC/D site-specific recombination at dif sites, either in newly replicated chromosomes with one site in each component or in chromosome dimers with two equidistant directly repeated sites. In this case, reconnection events are constrained by the position and orientation of the dif sites. We therefore impose a set of constraints on where reconnection can occur. These constraints act as topological filters that can be adjusted to best fit the scenario being modeled. Here, a reconnection synapse is defined as a pair of coplanar edges at unit distance from each other with antiparallel orientation; each of the two oriented edges is a reconnection site. Reconnection exchanges each edge of the synapse for one perpendicular to it, as shown in Fig. 5C. The set of edge pairs that can form a synapse is further constrained by their step distance along the conformation. We set this parameter so that the arc lengths on either side of the synapse each lie within ±6 of half the total length, while keeping fixed the total length of the knotted polygon, or the sum of the component lengths of interlinked polygons. For knots, this models two equidistant sites in the synapse. For two-component links, it models two components of equal length with a single site in each component.
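The synapse search and the reconnection step for a single-component polygon can be sketched as follows. This is a minimal sketch under two assumptions of ours: a synapse is read as two antiparallel edges forming opposite sides of a unit lattice square (our reading of Fig. 5C), and arc length counts edges between the two sites. The function names `find_synapses` and `reconnect` are hypothetical; the link case (one site per component, merging two circles into one) is analogous and omitted.

```python
from typing import List, Tuple

Vec = Tuple[int, int, int]
UNIT: List[Vec] = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def add(p: Vec, q: Vec) -> Vec:
    return (p[0] + q[0], p[1] + q[1], p[2] + q[2])


def sub(p: Vec, q: Vec) -> Vec:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def find_synapses(poly: List[Vec], min_arc: int = 54, max_arc: int = 66) -> List[Tuple[int, int]]:
    """Pairs (i, j), i < j, where edge i runs a -> b and edge j runs b+e -> a+e for a unit
    vector e perpendicular to b - a, and both arcs have length in [min_arc, max_arc]."""
    n = len(poly)
    index = {v: k for k, v in enumerate(poly)}  # vertex -> position (polygon is self-avoiding)
    synapses = []
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        d = sub(b, a)
        for e in UNIT:
            if e == d or add(e, d) == (0, 0, 0):
                continue  # e must be perpendicular to edge i
            j = index.get(add(b, e))
            if j is None or j <= i or poly[(j + 1) % n] != add(a, e):
                continue  # no antiparallel partner edge (or pair already seen)
            arc = j - i  # number of edges on the side from i + 1 to j
            if min_arc <= arc <= max_arc and min_arc <= n - arc <= max_arc:
                synapses.append((i, j))
    return synapses


def reconnect(poly: List[Vec], i: int, j: int) -> Tuple[List[Vec], List[Vec]]:
    """Replace edges i and j by the two perpendicular sides of the synapse square,
    splitting one circle into two."""
    first = poly[: i + 1] + poly[j + 1:]  # a is followed by a + e, then back to poly[0]
    second = poly[i + 1: j + 1]           # b ... b + e, closing with the new edge b + e -> b
    return first, second


# A 3 x 1 rectangle; with both arcs forced to length 4, only the middle pair qualifies.
rect = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (3, 1, 0), (2, 1, 0), (1, 1, 0), (0, 1, 0)]
print(find_synapses(rect, 4, 4))
```

For the length-120 knots described here, the defaults min_arc = 54 and max_arc = 66 encode the ±6 window; one synapse would then be drawn uniformly from the returned list.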
We exclusively sampled conformations of total length 120 which contain at least one reconnection synapse.
Generation of reconnection substrates. Self-avoidance is an important property when modeling biopolymers such as circular DNA. Here, conformations in the simple cubic lattice $\mathbb{Z}^3$ are self-avoiding polygons whose vertices have integer coordinates and whose edges are parallel to one of the three coordinate axes. The BFACF algorithm is a dynamic Monte Carlo method that samples from the space of lattice conformations of a fixed topology 27 . The states of the resulting Markov chain are conformations, and each transition is obtained by first randomly selecting an edge and then attempting one of the three moves shown in Fig. S4 in the Supplementary Methods: the (−2)-move, the (+2)-move or the (0)-move. None of these moves changes the knot or link type of the conformation 27,36 .
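The move set above can be sketched as follows. This is a minimal, unoptimized Python sketch, not the constant-time implementation used in this study: vertices are stored in a list, so insertions and the occupancy set cost O(n) per move. The acceptance probabilities are the commonly quoted BFACF choices p(−2) = 1/(1+3β²), p(0) = (1+β²)/(2(1+3β²)) and p(+2) = β²/(1+3β²), assumed here rather than taken from Fig. S4; with the edge picked uniformly at random they give a stationary distribution proportional to n β^n over conformations of length n. The function name `bfacf_step` is hypothetical.

```python
import random
from typing import List, Tuple

Vec = Tuple[int, int, int]
DIRS: List[Vec] = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def add(p: Vec, q: Vec) -> Vec:
    return (p[0] + q[0], p[1] + q[1], p[2] + q[2])


def sub(p: Vec, q: Vec) -> Vec:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def bfacf_step(poly: List[Vec], beta: float, rng: random.Random) -> None:
    """Attempt one BFACF move, in place, on a closed self-avoiding lattice polygon.

    Edge k runs from poly[k] to poly[(k + 1) % n]; acceptance probabilities are assumed.
    """
    n = len(poly)
    i = rng.randrange(n)  # edge (b, c) chosen uniformly
    b, c = poly[i], poly[(i + 1) % n]
    edge = sub(c, b)
    perp = [d for d in DIRS if d != edge and add(d, edge) != (0, 0, 0)]
    d = rng.choice(perp)  # one of the 4 perpendicular directions
    a, e = poly[i - 1], poly[(i + 2) % n]  # neighbours of the chosen edge
    bd, cd = add(b, d), add(c, d)
    denom = 1.0 + 3.0 * beta * beta
    u = rng.random()
    occupied = set(poly)
    if a == bd and e == cd:
        # (-2)-move: a-b-c-e is a U-turn; drop b and c (never shrink the 4-edge square).
        if n > 4 and u < 1.0 / denom:
            for k in sorted((i, (i + 1) % n), reverse=True):
                del poly[k]
    elif a == bd or e == cd:
        # (0)-move: flip the corner at b (or at c) across the unit square.
        if u < (1.0 + beta * beta) / (2.0 * denom):
            if a == bd and cd not in occupied:
                poly[i] = cd
            elif e == cd and bd not in occupied:
                poly[(i + 1) % n] = bd
    elif bd not in occupied and cd not in occupied and u < beta * beta / denom:
        # (+2)-move: push edge (b, c) out to (b + d, c + d), adding two vertices.
        poly.insert(i + 1, cd)
        poly.insert(i + 1, bd)


rng = random.Random(1)
poly = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]  # start from the unit square
for _ in range(5000):
    bfacf_step(poly, 0.2, rng)
```

Because the chain changes length, fixed-length samples (such as the length-120 conformations used here) would have to be selected from its output.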
Generating large ensembles of conformations with at least one valid synapse for each topology posed significant technical challenges. The unknot $0_1$ and links of the type $K\#2^2_1$, where K is a knot with high crossing number, were particularly problematic. This is because the component with trivial topology tends to have a short average length, making sampled conformations that form a reconnection synapse very rare. For example, the $0_1$ forms such a synapse in fewer than 1 in $1.3 \times 10^6$ sampled conformations. To address these challenges and gain the computational performance needed for this study, we extend the efficient implementation of the BFACF algorithm used in previous work 34,35,37,38 , whose per-move cost is constant in the knot length, by employing it as a Composite Markov Chain (CMC) Monte Carlo process [30][31][32][33]39 . CMC BFACF iterates simultaneously on multiple Markov chains with different fugacity parameters, swapping conformations between chains when certain weighted random criteria are met; more details of the implementation are included in the Supplementary Methods. Exchanging conformations between chains speeds up the rate at which conformations are randomized, which improves sampling efficiency. We sample conformations at a frequent, fixed rate and correct for dependence between samples using block mean analysis 40 ; this standardizes the sampling methodology across all topologies in the study and avoids relying on direct estimates of the integrated autocorrelation time. With this methodology, we generated at least on the order of $10^7$ conformations for every substrate topology. Among the topologies for which a reconnection event was observed, the number of conformations containing at least one reconnection synapse ranged from approximately $1.5 \times 10^6$ for the $9_{13}^*$ knot to as few as 86 for the $6_2\#2^2_1$ link.
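The swap step and the block mean correction described above can be sketched as follows. This is a minimal sketch under stated assumptions: each chain k is assumed to target the usual BFACF distribution, proportional to |ω| β_k^|ω| for a conformation ω, and swaps use the standard Metropolis exchange rule; the weighted criteria actually used in this study are given in the Supplementary Methods and may differ. The function names `swap_probability` and `block_standard_error` are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def swap_probability(beta_i: float, beta_j: float, n_i: int, n_j: int) -> float:
    """Metropolis acceptance for exchanging the conformations of chains i and j.

    Assumes chain k targets pi_k(w) proportional to |w| * beta_k**|w|; the |w| factors
    cancel, leaving (beta_i / beta_j) ** (n_j - n_i).
    """
    return min(1.0, (beta_i / beta_j) ** (n_j - n_i))


def block_standard_error(samples: Sequence[float], block_size: int) -> float:
    """Standard error of the mean from non-overlapping block means.

    Correlated samples inflate the naive error estimate's optimism; averaging over blocks
    longer than the correlation time restores approximately independent block means.
    """
    n_blocks = len(samples) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks")
    means = [mean(samples[k * block_size:(k + 1) * block_size]) for k in range(n_blocks)]
    return stdev(means) / math.sqrt(n_blocks)


print(swap_probability(0.1, 0.2, 10, 12))
print(block_standard_error([1, 1, 3, 3, 1, 1, 3, 3], 2))
```

In practice one would compute the block standard error for increasing block sizes and read it off where it levels out.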
Two-component links whose components have different knot types are difficult to sample efficiently because conformations that meet our stringent arc-length criteria are rare. Split links, i.e. those topologies in which the two components are not interlinked, are even more problematic because the two components tend to drift apart, dramatically reducing the probability of sampling conformations that contain a valid synapse. We identified such topologies as reconnection products but did not include them in the set of substrate topologies described in the next paragraph.
Recall that 9 minimal unlinking pathways from the 6-cat were obtained analytically in Theorem 2 under the assumption that each reconnection step either preserves or reduces the complexity of the substrate. Our simulations eliminate that assumption, enabling a wider exploration of possible topological reconnection pathways. We start with 491 substrate topologies, including those along the 9 pathways from Fig. 4 (excluding the unlink $0^2_1$). With CMC BFACF, we generate ensembles of conformations of fixed topology to be used as reconnection substrates. The number of substrate conformations generated ranges from $1.2 \times 10^7$ for the $7^2_6$ link to more than $6.9 \times 10^8$ for the $0_1$ knot. We perform one reconnection per conformation and identify the resulting topology. Including all substrate topologies and the identifiable products of reconnection, 881 topologies are analyzed in this study (490 knots and 391 two-component links).
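Since one reconnection is performed per substrate conformation, transition probabilities are empirical frequencies of the identified products. The following minimal sketch tallies them; the function name and the product labels in the example are hypothetical. The binomial standard error it reports assumes independent samples, which successive Markov chain samples are not, so in practice it would be replaced by a block-mean estimate as described above for the sampled conformations.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def transition_probabilities(products: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """Map each product topology to (estimated probability, naive binomial standard error)."""
    counts = Counter(products)
    total = sum(counts.values())
    result = {}
    for topology, k in counts.items():
        p = k / total
        # sqrt(p(1 - p)/N) is valid only for independent samples (assumption).
        result[topology] = (p, math.sqrt(p * (1.0 - p) / total))
    return result


# Hypothetical outcome labels from four reconnections on one substrate.
print(transition_probabilities(["product_A"] * 3 + ["product_B"]))
```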
Knot identification. Our simulations require a rigorous, unambiguous way of identifying the knot or link type of conformations in $\mathbb{Z}^3$. With the exception of the chiral knots $8_{17}$ and $9_{42}$, which have the same HOMFLY-PT polynomial as their mirror images, and $9_{12}$, which has the same HOMFLY-PT polynomial as $4_1\#5_2$, all prime knots with nine or fewer crossings can be unambiguously identified using the HOMFLY-PT polynomial 41,42 . Our knot identification software is based on previously published algorithms 43,44 . To identify product topologies, we first perform 20,000 BFACF iterations with randomly chosen (0) and (−2) moves. At each step the conformation either keeps its length or becomes shorter, in many cases approaching the minimal length for that topology 38 . The final conformation then goes through an energy minimization algorithm 22 ; we compute an extended Gauss code and identify the topology using the HOMFLY-PT polynomial. Information on the oriented knots and links with 10 or fewer crossings that HOMFLY-PT fails to identify uniquely is included in the Supplementary Methods.
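The crossing data behind a Gauss code can be extracted from a lattice conformation by projecting it along a generic direction. The following minimal sketch is an illustration, not the identification software used in this study. It projects along (1, √2, √3), a direction we chose because no lattice vertex can project onto another vertex or onto the interior of a non-adjacent lattice edge along it. It records each crossing with its over/under information and sign, and lists the crossings in order along each component, an extended Gauss code in the simplest sense. It omits the shrinking and energy-minimization steps and the HOMFLY-PT computation. As a sanity check it also computes the linking number of a two-component link as half the signed sum of inter-component crossings. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

V = (1.0, math.sqrt(2.0), math.sqrt(3.0))  # generic projection direction (our assumption)


def dot(p: Sequence[float], q: Sequence[float]) -> float:
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1], p[2] * q[0] - p[0] * q[2], p[0] * q[1] - p[1] * q[0])


def sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


U1 = cross(V, (0.0, 0.0, 1.0))  # U1 and U2 span the plane orthogonal to V
U2 = cross(V, U1)


def segments(components: List[list]):
    """(component, edge index, start, end) for every edge of every closed polygon."""
    return [(c, k, poly[k], poly[(k + 1) % len(poly)])
            for c, poly in enumerate(components) for k in range(len(poly))]


def crossings(components: List[list]) -> List[Tuple[int, int, float, float, int]]:
    """Crossings of the projection along V as (over edge, under edge, t_over, t_under, sign)."""
    segs = segments(components)
    found = []
    for x in range(len(segs)):
        for y in range(x + 1, len(segs)):
            (c1, k1, p0, p1), (c2, k2, q0, q1) = segs[x], segs[y]
            n = len(components[c1])
            if c1 == c2 and (k1 - k2) % n in (1, n - 1):
                continue  # adjacent edges only share a vertex
            dp, dq, w = sub(p1, p0), sub(q1, q0), sub(q0, p0)
            r, s = (dot(dp, U1), dot(dp, U2)), (dot(dq, U1), dot(dq, U2))
            w2 = (dot(w, U1), dot(w, U2))
            den = r[0] * s[1] - r[1] * s[0]
            if abs(den) < 1e-12:
                continue  # parallel in projection
            t1 = (w2[0] * s[1] - w2[1] * s[0]) / den
            t2 = (w2[0] * r[1] - w2[1] * r[0]) / den
            if not (0.0 < t1 < 1.0 and 0.0 < t2 < 1.0):
                continue
            h1 = dot(p0, V) + t1 * dot(dp, V)  # height toward a viewer at +V
            h2 = dot(q0, V) + t2 * dot(dq, V)
            if h1 > h2:
                over, under, t_o, t_u, d_o, d_u = x, y, t1, t2, dp, dq
            else:
                over, under, t_o, t_u, d_o, d_u = y, x, t2, t1, dq, dp
            found.append((over, under, t_o, t_u, 1 if dot(cross(d_o, d_u), V) > 0 else -1))
    return found


def gauss_code(components: List[list]):
    """Per component, the ordered list of (crossing label, 'O' or 'U', sign)."""
    segs = segments(components)
    events = []
    for label, (over, under, t_o, t_u, sign) in enumerate(crossings(components)):
        events.append((segs[over][0], segs[over][1], t_o, label, "O", sign))
        events.append((segs[under][0], segs[under][1], t_u, label, "U", sign))
    events.sort()  # order by component, edge index, then position along the edge
    code = [[] for _ in components]
    for comp, _, _, label, kind, sign in events:
        code[comp].append((label, kind, sign))
    return code


def linking_number(components: List[list]) -> int:
    """Half the sum of signs over crossings between the two components."""
    segs = segments(components)
    return sum(sign for over, under, _, _, sign in crossings(components)
               if segs[over][0] != segs[under][0]) // 2
```

For example, a side-2 square in the plane z = 0 and a side-2 square in the plane y = 1 whose vertical edge at x = 1 pierces the first square once form a Hopf link with linking number ±1.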
Recombination between two directly repeated sites along a single circular chain yields a 2-component link. The number of product topologies increases dramatically with the complexity of the substrate. Figure 3 shows a selection of the expected products, including composite links that are not normally shown in knot tables. Composites are of two types: connected sums of prime knots or links, and disjoint unions. In this study, we perform recombination on two types of substrates: (i) knots with two (approximately) equidistant directly repeated sites; and (ii) 2-component links whose components have (approximately) equal length, with one site in each component. More specifically, each substrate knot is a self-avoiding lattice polygon of length 120, and recombination occurs between two directly repeated sites that are between 54 and 66 units apart (Fig. 5A). Each linked substrate consists of two self-avoiding polygons, each between 54 and 66 units long, whose lengths sum to exactly 120. Recombination is restricted to synapses in which the two sites (one in each component, for links) lie at unit distance from each other in antiparallel alignment, as illustrated in Fig. 5(A and C). A small representative subset of the knot and link types used in the simulations is shown in Fig. 3; the naming convention is described in the nomenclature section, in the Supplementary Methods and in Supplementary Fig. S5.