Design principles for rapid folding of knotted DNA nanostructures

Knots are some of the most remarkable topological features in nature. Self-assembly of knotted polymers without breaking or forming covalent bonds is challenging, as the chain needs to be threaded through previously formed loops in an exactly defined order. Here we describe principles to guide the folding of highly knotted single-chain DNA nanostructures as demonstrated on a nano-sized square pyramid. Folding of knots is encoded by the arrangement of modules of different stability based on derived topological and kinetic rules. Among DNA designs composed of the same modules and encoding the same topology, only the one with the folding pathway designed according to the ‘free-end' rule folds efficiently into the target structure. Besides high folding yield on slow annealing, this design also folds rapidly on temperature quenching and dilution from chemical denaturant. This strategy could be used to design folding of other knotted programmable polymers such as RNA or proteins.


S3
Supplementary Figure 4 | Relative rates and probabilities obtained from FFS simulations of the designed P0 folding pathway. For description of the parameters refer to Fig. 2 in the main text. After the formation of Aa, Bb, Cc, and Dd, the rate of Ee formation, which involved threading a long tail through a small loop, decreased by over four orders of magnitude. As with P1, because of the similar stabilities of the Aa and Bb, and Gg and Hh complementary module pairs, paths where their order of formation is reversed may also be relevant. If Bb forms before Aa, the resulting pathway also obeys the "free-end" rule. By contrast, if Hh were to form before Gg, the final step would then involve an unfavourable "L+L" step. However, simulations indicate that Gg is significantly more likely to form before Hh due to the much closer proximity of the former two modules once connections Aa→Ff have formed already.

Supplementary Tables
Supplementary contains the universal flanking sequences. In order to remove the flanking sequences (we refer to this step as "end-trimming"), we must first generate dsDNA RE sites via thermal annealing (shown in bold) in the presence of end-trimming oligonucleotides before cutting with RE enzymes. designs. The parameter di is the minimum distance between any intended base pair in module i (as measured from the center of mass of the bases), and xi is the number of intended base pairs in module i.
Results presented without the standard error of the mean value were obtained from one independent rate calculation.

Free-end gluing
Let w = s 1 s 2 ...s 2n be a string in which each symbol appears exactly twice. A symbol s appearing twice in w is a dimer composed of two complementary nucleotide modules within a single polynucleotide chain that glue in an antiparallel orientation. We say that a segment s at position i belongs to a free end if at that stage no segment at positions i,i +1,i + 2,..., 2n has been glued yet. Actually, this defines a tail free end. In a similar way one could define the head free end.
Theorem. There exists a sequence of gluings such that at each step at least one segment of a pair being glued belongs to a free end.
Proof. We give a "greedy" gluing algorithm that takes our string w as input and returns the gluing order of pairs with the property that at each gluing, the second occurrence of a segment being glued belongs to a free end. Code in Python: Note that the algorithm scans the string w from left to right and the same algorithm can be applied also for a scan in the direction from right to the left. Whenever a symbol s appears for the second time, it is appended to the list last. In the end the completed list last determines the gluing order. However, at the stage when the symbol s is being put in the list, none of the positions coming after it have been glued, S19 and the theorem is proven. Note that we have used only the tail free end in the proof. This means that the result holds even in the case when one of the termini is fixed.

Supplementary Note 2 oxDNA Model
The oxDNA model provides a good representation of DNA features that are relevant for the current study, including duplex melting temperatures, hairpin stability, and the relative flexibility of single coarse-grained models where higher diffusion constants can be used to accelerate sampling. The coarsegrained nature of the potential also means the free-energy landscape is, on a microscopic scale, smoother than for real DNA, additionally accelerating dynamic processes 7 . See reference 1 for more details and discussion of D sim in oxDNA. These features make the sampling of complex self-assembly pathways more computationally feasible. Here we are primarily concerned with using simulations to probe the relative rates of similar processes, so the absolute magnitude of the rates is not an issue.

Forward Flux Sampling
We used direct Forward Flux Sampling (FFS) to more efficiently calculate rates between local freeenergy minima, as well as sample the transition pathways between minima. Direct FFS is illustrated schematically in Supplementary Fig. 12

S21
where N A i−1 ,A i is the number of times the simulation that reached A i started in A i−1 , τ is the total simulation time, and f A is the fraction of τ in which the state A i−1 was visited more recently than A k , where k ≠ i −1.
In the phase space between these states, we draw successive non-intersecting interfaces λ (2).
The first term in the product on the right-hand side of Supplementary Equation 2 is calculated by loading random configurations that have just crossed λ The process is then iterated for successive interfaces, and the flux as well as the trajectories that successfully reach Q max i from the distribution of pathways can be sampled.
Mimicking annealing in FFS simulations is not computationally feasible for the current study due to the relatively wide temperature range over which the strand is designed to fold. Thus, all FFS simulations of 4Py designs were carried out at a constant temperature (72 ∘ C). In experiments, competing metastable structures are thermodynamically disfavored for P1, where slow annealing encourages the strand to fold according to the designed folding pathway (Fig. 2). Simulations running at a constant temperature may undesirably sample out-of-order folded metastable structures that may be long-lived. To avoid complications caused by metastable structures forming in simulations, we sampled only the folding of modules in the exact same order for a particular design as shown in Fig. 2. Specifically, we prevented all subsequent modules (i+1) from forming structure until module i formed completely for the first time. This We estimated the random error in the FFS simulations in the following way. In Fig. 2 in the main text we reported the mean value for the formation rates of modules from several identical and independent implementations of FFS for the 4Py strands that we considered. The error reported for each 4Py is the standard error of the mean value. In Supplementary Tables 8-10 we report the mean and the standard error of the mean for each individual interface for all 4Py studied. We note that this estimation of the error may be an underestimate of the true variance in the data because we were only able to calculate each rate independently two or three times due to limited computational resources.

Simulations Protocols
In all kinetics simulations we used a simulation box with a volume of 1.

Order Parameter Used in FFS Simulations
We used two types of order parameters in the simulations, which we combined into one multidimensional order parameter. Specifically, a 'distance' order parameter ( Supplementary Table 7.
With the order parameter defined, the transition rate for the formation of module i in a designed 4Py folding pathway (see Fig. 2), is computed from data obtained from FFS simulations as

Initialization of Single-Strand States for Use in FFS Simulations
FFS requires the initial configurations that are used to start flux generation simulations to be thermodynamically representative of the A i−1 -state. However, in simulations sampling the formation of a pyramid, configurations may form 14 bp (our requirement for the complete formation of a module) before the tail has completely threaded. For pyramid systems it was not possible to implement Monte-Carlo simulations, which could be used to calculate relative free energies between states with completely threaded tails and those that are incompletely threaded, because we lack a systematic approach for measuring the linking number of strands at vertices. We chose to run molecular dynamics simulations at 72 ∘ C before the start of a rate calculation for the folding of module i in an attempt to relax trial configurations. The simulations sample, in addition to any pre-formed pyramid structure, only the diffusion of unbound single stranded regions, which are prevented from forming any unintended base pairs that could lead to competing metastable structures. Since the tails only need to sample their local environments, which we naively expect they will be able to do with relative ease, the barrier between threaded and incompletely threaded tails should not necessarily be exceedingly large for the P1 system, where the longer tails have to thread when less pyramid structure is in place. Simulations that together accumulate a long amount of simulation time should be sampling at least some representative states. The relaxation simulations may also sample the melting of a previously formed module. Configurations that do lose modules were collected but were not used in any future simulations. To obtain a set of relaxed configurations, 500 starting configurations were randomly selected from the final state of the previous rate calculation for the formation of the module (i-1), where each configuration was simulated for 0.152 s before being saved. A configuration from this relaxed set is selected at random and set to be the starting configuration in a flux generation simulation.

Measurement of the Height of P1
As reported in the main text, we measured the average height h of P1 in simulations after cooling completely folded configurations down to 25 ∘ C. The average height was predicted to be 4 ± 0.3 nm. In Supplementary Fig. 13

Kinetics Results
The cumulative statistics of the FFS simulations for the complete formation of P0 are listed in Supplementary Table 8, for the P1 system in Supplementary Table 9, and the formation of the first four connection-forming steps in the P4 system in Supplementary Table 10.
Supplementary Tables 8-10 also list the estimated melting rates of a module ( k − i−1 ) which were sampled during the simulations for the formation of module i. The melting rate of a module is estimated by dividing the total number of sampled melting events, which occurred during flux generation simulations, divided by the total simulated flux time. For the module formation events that were sampled in P1, all estimated S26 melting rates are larger than the corresponding calculated forward rate by at most a factor of ten. P4 is less affected by melting until the first kissing hairpin forms, at which point module formation becomes extremely slow. P0 is also affected by melting of pre-formed modules, and is most prominent during the sampling of module formation when long tails must thread loops for the module to be completely formed.
There are two main reasons for the high rate of melting of pre-formed modules. First, pyramids containing improperly threaded modules are much more likely to melt because the pyramid is rather flexible at its vertices due to the linkers, which can help relieve the stress caused by a topological defect by allowing the affected module to lose some base pairs and partially unwind. Secondly, the simulation temperature is in the vicinity of the melting temperature of the modules, so properly formed modules may melt spontaneously. Most melting events in P0 and P1 are due to modules 2 and 3 unfolding, which are also the modules that we observe most often containing an improperly threaded tail. However, not all long tails lead to a higher incidence of defective modules in completed pyramids. For example, in the P0 system we see module 5, another double helix that forms as a result of a tail containing three modules threading a loop, forming correctly in the majority of sampled pyramids. The higher probability of forming defect-free modules is caused by a comparably smaller loop that the long tail must thread in order to form all base pairs when compared to the formation of module 3 in P1, because more pyramid structure is in place.
As discussed in section "Order Parameter Used in FFS Simulations", the implemented order parameter could not tell us whether a tail had completely threaded or not. Consequentially, not all of the initial simulations run prior to FFS simulations of the formation of module i may have sampled thermodynamically representative configurations of the state !!! . However, since we were able to sample pyramids with and without defects in several folding pathway designs, incomplete threading of modules may not be strongly favored over completely threaded tails in the final simulated structure. Thus, the rates of formation of modules may only be marginally affected by the formation of topological defects.
However melting rates are likely to be over-estimated. Simulations at lower temperatures would suppress the high rate of melting but may not help much in reducing the number of topologically defected modules, which are likely to still be present regardless of the simulation temperature. While there is uncertainty in