Compiler-aided systematic construction of large-scale DNA strand displacement circuits using unpurified components

Biochemical circuits made of rationally designed DNA molecules are proofs of concept for embedding control within complex molecular environments. They hold promise for transforming the current technologies in chemistry, biology, medicine and material science by introducing programmable and responsive behaviour to diverse molecular systems. As the transformative power of a technology depends on its accessibility, two main challenges are an automated design process and simple experimental procedures. Here we demonstrate the use of circuit design software, combined with the use of unpurified strands and simplified experimental procedures, for creating a complex DNA strand displacement circuit that consists of 78 distinct species. We develop a systematic procedure for overcoming the challenges involved in using unpurified DNA strands. We also develop a model that takes synthesis errors into consideration and semi-quantitatively reproduces the experimental data. Our methods now enable even novice researchers to successfully design and construct complex DNA strand displacement circuits.


Molecules with synthesis errors
We first define the probability of having n errors in a chemically synthesized DNA strand of l bases, given that r is the probability of synthesis error per base:
$$P(r, l, n) = \binom{l}{n} r^{n} (1 - r)^{l - n} \tag{1}$$
We then calculate the populations of signal, gate and threshold molecules with and without synthesis errors (Fig. 8a). To keep the model simple, yet accurate enough to describe reactions that involve molecules with synthesis errors at distinct locations, we treat the very small population of molecules with more than one synthesis error as non-reactive, and classify the remaining molecules containing a single synthesis error based on the domain where the error occurs. For example, a signal strand is composed of two branch migration domains flanking a toehold domain (Fig. 8a, top left). Given that a branch migration domain has 15 bases and a toehold domain has 5 bases, the probability of a signal strand having s errors in a specific branch migration domain (and thus none in the other) and t errors in the toehold domain can be calculated as:
$$P_w(r, s, t) = P(r, 15, s) \times P(r, 5, t) \times P(r, 15, 0) \tag{2}$$
The failure rate for each nucleotide coupling event during DNA synthesis is known to be 1% or less (ref. 4; https://www.idtdna.com/pages/docs/technical-reports/chemical-synthesis-of-oligonucleotides.pdf), and we use r = 0.01 in all following calculations. Specifically, a signal species composed of domains Sj, T and Si can be classified into five populations: $P_w(r, 0, 0)$ = 70.3% with no synthesis errors (named $w_{j,i}$), $P_w(r, 1, 0)$ = 10.7% with an error in the Sj domain (named $w_{j*,i}$), $P_w(r, 0, 1)$ = 3.6% with an error in the T domain (named $w_{j,*i}$), $P_w(r, 1, 0)$ = 10.7% with an error in the Si domain (named $w_{j,i*}$), and $1 - P_w(r, 0, 0) - 2 \times P_w(r, 1, 0) - P_w(r, 0, 1)$ = 4.8% with two or more errors (considered inert and not participating in any reactions). The location of a star in the name corresponds to the location of a synthesis error.
Because the same toehold domain (which we call the universal toehold) is used in all signal species and is thus not specified in the name, an error in the toehold domain is indicated by a star following the comma that separates j and i.
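The signal populations quoted above can be checked with a short Python sketch. It assumes that errors occur independently at each base, so that the number of errors in a domain follows a binomial distribution; the function and variable names are illustrative and are not taken from the authors' code.

```python
from math import comb


def p_errors(r: float, l: int, n: int) -> float:
    """Probability of exactly n errors in l bases (assumed binomial)."""
    return comb(l, n) * r**n * (1 - r) ** (l - n)


def p_signal(r: float, s: int, t: int) -> float:
    """s errors in one specific 15-base branch migration domain,
    t errors in the 5-base toehold, none in the other 15-base domain."""
    return p_errors(r, 15, s) * p_errors(r, 5, t) * p_errors(r, 15, 0)


r = 0.01  # per-base synthesis error probability used in the text

signal_populations = {
    "w_j,i": p_signal(r, 0, 0),   # no errors
    "w_j*,i": p_signal(r, 1, 0),  # one error in Sj
    "w_j,*i": p_signal(r, 0, 1),  # one error in T
    "w_j,i*": p_signal(r, 1, 0),  # one error in Si
}
# Everything else (two or more errors) is treated as inert.
signal_populations["inert"] = 1 - sum(signal_populations.values())

for name, fraction in signal_populations.items():
    print(f"{name}: {fraction:.1%}")
```

With r = 0.01 the five fractions round to the percentages given in the text, which supports reading the per-strand error count as binomial.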
A gate molecule consists of a signal strand bound to a gate bottom strand that has two toehold domains flanking a branch migration domain (Fig. 8a, bottom). The gate bottom strand is never free, and only participates in reactions in which two signal strands compete for the same bottom strand. Any error in the branch migration domain of the bottom strand should not significantly affect the reaction rate, because it does not bias the competition in either direction, and after the initiation of strand displacement, the random walk steps of adjacent base pair opening and closing should remain sufficiently fast (ref. 3). Thus we only consider errors in the remaining two branch migration domains and three toehold domains. The probability of a gate molecule having s errors in a specific branch migration domain (and thus none in the other) and t errors in a specific toehold domain (and thus none in the other two) can be calculated as:
$$P_G(r, s, t) = P(r, 15, s) \times P(r, 5, t) \times P(r, 25, 0) \tag{3}$$
A threshold molecule consists of an extended toehold domain of 10 bases and two complementary branch migration domains (Fig. 8a, top right). The branch migration domains only participate in irreversible strand displacement reactions in which a threshold molecule consumes a signal strand. An error in these two domains should not significantly affect the reaction rate, because it either occurs in the top strand and biases the reaction further forward, which has little effect since the reaction is already strongly favored in the forward direction, or occurs in the bottom strand and does not introduce additional bias to the random walk steps. Thus we only consider errors in the extended toehold domain. The probability of a threshold molecule having t errors can be calculated as:
$$P_{Th}(r, t) = P(r, 10, t) \tag{4}$$
Using equations 3 and 4, a gate species can be classified into seven populations, including $1 - P_G(r, 0, 0) - 2 \times P_G(r, 1, 0) - 3 \times P_G(r, 0, 1)$ = 7.5% inert molecules.
A threshold species can be classified into three populations, including $1 - P_{Th}(r, 0) - P_{Th}(r, 1)$ = 0.4% inert molecules.
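The inert fractions for gate and threshold species can be checked the same way. The sketch below again assumes a binomial error count per domain. It also assumes that the threshold probability is simply the binomial over the 10-base extended toehold, which is inferred from the description above. All names are illustrative.

```python
from math import comb


def p_errors(r: float, l: int, n: int) -> float:
    """Probability of exactly n errors in l bases (assumed binomial)."""
    return comb(l, n) * r**n * (1 - r) ** (l - n)


def p_gate(r: float, s: int, t: int) -> float:
    """s errors in one specific 15-base branch migration domain, t errors in
    one specific 5-base toehold, none in the remaining 25 considered bases."""
    return p_errors(r, 15, s) * p_errors(r, 5, t) * p_errors(r, 25, 0)


def p_threshold(r: float, t: int) -> float:
    """t errors in the 10-base extended toehold (assumed form)."""
    return p_errors(r, 10, t)


r = 0.01

# Two branch migration domains and three toehold domains can each carry
# the single error; all remaining molecules are treated as inert.
gate_inert = 1 - p_gate(r, 0, 0) - 2 * p_gate(r, 1, 0) - 3 * p_gate(r, 0, 1)
threshold_inert = 1 - p_threshold(r, 0) - p_threshold(r, 1)

print(f"gate inert: {gate_inert:.1%}")
print(f"threshold inert: {threshold_inert:.1%}")
```

Under these assumptions the two values round to the 7.5% and 0.4% stated in the text.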

Reactions that involve molecules with synthesis errors
Seesaw circuits can be modeled with five types of reactions (ref. 1; Fig. 8b): seesawing reactions that reversibly exchange two signals between inactive (i.e. bound to a gate) and active (i.e. free-floating) states, thresholding reactions that irreversibly consume a signal, reporting reactions that generate fluorescence readout, leak reactions that slowly release a signal from a gate molecule, and universal toehold binding reactions that temporarily occur between any single strand and any gate or threshold molecule. Compared to the reactions that only involve molecules without synthesis errors, there is a much longer list of reactions that involve molecules with synthesis errors, because each distinct species is now divided into multiple populations. To describe these reactions concisely, we define that a reaction written in the following format
$$\{R_{11}, R_{12}, \cdots, R_{1n}\} + \{R_{21}, R_{22}, \cdots, R_{2m}\} \xrightarrow{k} \{P_{11}, P_{12}, \cdots, P_{1n}\} + \{P_{21}, P_{22}, \cdots, P_{2m}\}$$
can be interpreted as the set of reactions:
$$R_{1i} + R_{2j} \xrightarrow{k} P_{1i} + P_{2j}, \quad 1 \le i \le n, \; 1 \le j \le m$$
This means that for reactions with two reactants and two products, we always group them so that the first product is determined by the first reactant alone, and the second product is determined by the second reactant alone. Note that a reversible reaction can be seen as two irreversible reactions that each follow the same definition. Based on previous results (ref. 5), we estimated the rates of seesawing and thresholding reactions that involve all populations of signal, gate and threshold molecules shown in Fig. 8a.
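The set notation defined above can be illustrated with a small Python sketch that expands one grouped reaction into its individual reactions. It assumes that the i-th product in each set corresponds to the i-th reactant in the matching set. The function name and the placeholder species labels are hypothetical.

```python
from itertools import product


def expand_reactions(reactants1, reactants2, products1, products2, k):
    """Expand {R1..} + {R2..} -> {P1..} + {P2..} into individual reactions.

    The i-th entry of products1 is paired with the i-th entry of reactants1,
    and likewise for the second set, so each product depends on one reactant.
    """
    if len(reactants1) != len(products1) or len(reactants2) != len(products2):
        raise ValueError("each reactant set must match its product set in size")
    first = zip(reactants1, products1)
    second = zip(reactants2, products2)
    return [((r1, r2), (p1, p2), k) for (r1, p1), (r2, p2) in product(first, second)]


# Placeholder labels only; these are not species from the circuit.
reactions = expand_reactions(
    ["R11", "R12"], ["R21", "R22", "R23"],
    ["P11", "P12"], ["P21", "P22", "P23"],
    k=1.0,
)
for (r1, r2), (p1, p2), rate in reactions:
    print(f"{r1} + {r2} -> {p1} + {p2}  (k = {rate})")
```

A grouped reaction with n and m populations on the two sides therefore stands for n × m individual reactions, all sharing the same rate constant k.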

(1) Seesawing reactions:
If there is no error in the domains participating in a seesawing reaction, regardless of any errors in the other domains (e.g. the Sj and Sk domains when $w_{j,i}$ interacts with $G_{i:i,k}$), the rate remains the same as in the previous model for purified seesaw circuits.
If there is an error in the participating toehold or branch migration domain of the invading signal strand, or in the initiating toehold domain of the gate molecule (i.e. the toehold that binds to the invading signal strand), the forward rate is 100 times slower and the backward rate remains the same. Symmetrically, if there is an error in the participating toehold or branch migration domain of the bound signal in the gate molecule, or in the dissociation toehold domain (i.e. the toehold that is originally covered), the backward rate is 100 times slower and the forward rate remains the same.
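The seesawing rate rule can be summarized as a pair of multiplicative factors applied to the error-free forward and backward rates. The following Python sketch is one way to encode it. The labels "invading" and "bound" and the function name are illustrative choices, and only the case of a single rate-affecting error is covered.

```python
from typing import Optional, Tuple

SLOWDOWN = 100.0  # factor stated in the text for a rate-affecting error


def seesaw_rate_factors(error_side: Optional[str]) -> Tuple[float, float]:
    """Return (forward_factor, backward_factor) for a seesawing reaction.

    error_side is None when no participating domain carries an error,
    "invading" for an error in the invading signal's participating domains
    or the initiating toehold of the gate, and "bound" for an error in the
    bound signal's participating domains or the originally covered toehold.
    """
    if error_side is None:
        return 1.0, 1.0
    if error_side == "invading":
        return 1.0 / SLOWDOWN, 1.0
    if error_side == "bound":
        return 1.0, 1.0 / SLOWDOWN
    raise ValueError(f"unknown error_side: {error_side!r}")


print(seesaw_rate_factors(None))
print(seesaw_rate_factors("invading"))
print(seesaw_rate_factors("bound"))
```

Returning factors rather than absolute rates keeps the sketch independent of the rate constants of the purified model, which are not restated here.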
Note that gate molecules with two synthesis errors (e.g. $G_{*i:i,k*}$ and $G_{i*:i,k*}$) are not in the initial populations but can be produced by a seesawing reaction between signal and gate molecules that each have just one synthesis error.
Reactions are omitted if there is more than one synthesis error that can significantly affect the rate, because they are either too slow or do not have enough reactants to take place. For example: [example reactions omitted]
(2) Thresholding reactions: Unlike in a seesawing reaction, if there is no error in the toehold domains participating in a thresholding reaction, regardless of any errors in the branch migration domains, the rate remains the same as in the previous model for purified seesaw circuits.
Otherwise the rate is 100 times slower.
Reactions are again omitted if there is more than one synthesis error that can significantly affect the rate. For example: [example reactions omitted]
An error in the extended toehold but not in the universal toehold domain of a signal strand is not considered to affect the rate of thresholding, because an error more distant from the branch migration domain should affect the rate less, and considering it would complicate the classification of signal molecules.
(3) Reporting reactions: Reporting reactions are also irreversible, and thus are modeled similarly to the thresholding reactions, based on whether there is an error in the toehold domains.
(4) Leak reactions: Leak reactions are essentially 0-toehold strand displacement reactions. If there is no error in the participating domains of the two competing signal strands, the rate remains the same as in the previous model for purified seesaw circuits. An error in the gate bottom strand should not affect the rate significantly, regardless of whether it is in the toehold domain, because the toehold is covered and thus treated the same as the branch migration domain.
Leak reactions should be faster if there is an error in the participating toehold or branch migration domain of the bound signal strand in the gate molecule, because the forward reaction will be favored. The reaction would be roughly 10 times faster if the error occurred at either end of the double-stranded domain and opened up a 1-nucleotide toehold for the invading signal strand. However, the error is much more likely to occur in the middle of the double-stranded domain, where it serves as a much less effective 1-nucleotide toehold. Therefore, we estimate the rate to be only twice as fast.
Since leak reactions are already very slow, reactions are omitted if there is an error that slows down the rate even further. For example: [example reactions omitted]
(5) Universal toehold binding reactions: Finally, the forward rate remains the same for all universal toehold binding reactions, since it is just the rate of hybridization. The backward rate remains the same if there is no error in the toehold domains, and is 10 times faster if there is an error, because the rate of toehold dissociation can be estimated as $10^{6-l}$ /s, where l is the number of bases in the toehold (refs. 2, 3).
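The factor of 10 for toehold dissociation follows directly from the exponential estimate. The Python sketch below evaluates it under the assumption, made here for illustration, that a single error makes a 5-base toehold behave like a 4-base toehold. The function name is hypothetical.

```python
def toehold_dissociation_rate(l: int) -> float:
    """Estimated toehold dissociation rate in 1/s for a toehold of l bases."""
    return 10.0 ** (6 - l)


intact = toehold_dissociation_rate(5)    # 5-base universal toehold
with_error = toehold_dissociation_rate(4)  # assumed effective length with one error

print(f"intact: {intact} /s")
print(f"with error: {with_error} /s")
print(f"ratio: {with_error / intact}")
```

Because the estimate is exponential in l, losing one effective base pair multiplies the backward rate by 10 regardless of the starting toehold length.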

Approximation in domain lengths
When calculating the populations of molecules with synthesis errors in distinct domains, we assume that a branch migration domain has 15 bases, a toehold domain has 5 bases, and a signal strand is composed of two branch migration domains flanking a toehold domain. This is a simplification of the actual components of a signal strand.
To reduce undesired leak reactions between two gate species, a signal strand is designed to include clamp domains of 2 bases (for more details, see supplementary note S8 of ref. 1). These 2 bases are either part of a branch migration domain or part of a toehold domain, depending on which side of a gate the signal strand is bound to or interacts with. Because of this double identity of a clamp domain, a signal strand actually has 33 bases. Considering the clamp domains would significantly complicate the classification of molecules and reactions, and would result in only a very small difference compared to the calculations that we made in the model. Thus, we chose not to consider the clamp domains and used the approximation that a signal strand has 35 bases.
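To give a rough sense of the size of this approximation, the Python sketch below compares the error-free fraction of a 35-base strand with that of a 33-base strand at r = 0.01. It assumes independent per-base errors and looks only at the error-free fraction, not at the full classification of populations. The variable names are illustrative.

```python
r = 0.01  # per-base synthesis error probability used in the model

# Fraction of strands with no synthesis error at all, for each length.
error_free_35 = (1 - r) ** 35  # approximation used in the model
error_free_33 = (1 - r) ** 33  # actual signal strand length

difference = error_free_33 - error_free_35

print(f"35 bases: {error_free_35:.1%}")
print(f"33 bases: {error_free_33:.1%}")
print(f"difference: {difference:.1%}")
```

Under these assumptions the two error-free fractions differ by less than two percentage points.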

Concentrations of threshold species
Our model including molecules with synthesis errors explained the significant slowdown of the unpurified seesaw circuits compared to purified ones, but it cannot explain why the threshold molecules had significantly higher effective concentrations than the signal strands. In fact, we applied a threshold-to-signal ratio of β/α = 1.4, as calculated in equation 7, to all threshold concentrations in the simulations. Having higher threshold concentrations in the model is not new to seesaw circuits: a ratio of β/α = 1.1 was applied to all threshold concentrations in the previously developed model for purified seesaw circuits (ref. 1). Unlike the β/α that we specifically measured in this work, the 1.1× nominal threshold was simply tuned to obtain better agreement between simulations and experiments for large circuits. We suspect that the difference in concentrations of threshold and signal species (both free-floating and bound to a gate) is caused by certain aspects of the DNA oligonucleotide synthesis procedures that we do not yet understand. This difference may be reduced (e.g. from 1.4× to 1.1×), but cannot be completely removed, by in-house gel purification. Thus, it is important that the value of β/α is determined by users of the Seesaw Compiler, following the procedures that we discussed in threshold calibration.