Generation of DNA oligomers with similar chemical kinetics via in-silico optimization

Networks of interacting DNA oligomers are useful for applications such as biomarker detection, targeted drug delivery, information storage, and photonic information processing. However, differences in the chemical kinetics of hybridization reactions, referred to as kinetic dispersion, can be problematic for some applications. Here, it is found that limiting unnecessary stretches of Watson-Crick base pairing, referred to as unnecessary duplexes, can yield exceptionally low kinetic dispersions. Hybridization kinetics can be affected by unnecessary intra-oligomer duplexes containing only 2 base-pairs, and such duplexes explain up to 94% of previously reported kinetic dispersion. As a general design rule, it is recommended that unnecessary intra-oligomer duplexes larger than 2 base-pairs and unnecessary inter-oligomer duplexes larger than 7 base-pairs be avoided. Unnecessary duplexes typically scale exponentially with network size, and nearly all networks contain unnecessary duplexes substantial enough to affect hybridization kinetics. A new method for generating networks which utilizes in-silico optimization to mitigate unnecessary duplexes is proposed and demonstrated to reduce in-vitro kinetic dispersions as much as 96%. The limitations of the new design rule and generation method are evaluated in-silico by creating new oligomers for several designs, including three previously programmed reactions and one previously engineered structure.


Introduction
Molecules of single stranded deoxyribonucleic acid, sometimes referred to as DNA oligomers, can form duplexes where two complementary base-sequences are held together by Watson-Crick base-pairing.
A given network of DNA oligomers is often designed to adopt speci c duplexes which form a spatial arrangement of matter (i.e., a two-dimensional 7,31,32 or three-dimensional 33 shape) or implement a chemical reaction network 34,35 .However, networks of oligomers may also form other duplexes, referred to here as unnecessary duplexes.It has been well established that unnecessary duplexes can affect hybridization reactions 9,23,36−40 , making them a potential contributor to kinetic dispersion.Several other factors known to affect hybridization reactions may also contribute to kinetic dispersion.For separate aqueous solutions containing oligomers with the same base-sequences, initial state, and nal state these known factors include temperature 12,22,41−44 , ionic strength 45,46 , and viscosity 45,47 .For separate aqueous solutions containing oligomers with different base-sequences but similar initial and nal states these known factors additionally include oligomer length 42 and duplex stability 12,22,48,49 .

Kinetic dispersion in existing experimental data
Numerous previous studies have characterized the chemical kinetics of DNA oligomers in aqueous solutions undergoing hybridization reactions and several of these studies characterized enough samples for a meaningful re-analysis of the data.Here, this existing experimental data was used to better understand the relationship between unnecessary duplexes and kinetic dispersion.
Five datasets (i.e., sets of comparable samples) were identi ed in which a speci c hybridization reaction was characterized and all known hybridization-reaction-affecting factors except unnecessary duplexes were approximately constant.These ve datasets are listed in Fig. 1a and were given labels abbreviating the rst author, temperature, and reaction type.The ve datasets include: (1) H22F reported by Hata et al. 22 , (2) O25C reported by Olson et al. 23 , (3) O25L reported by Olson et al. 23 , (4) Z37F reported by Zhang et al. 12 , and (5) Z55F reported by Zhang et al. 12 .Labels for reaction types include duplex-formation (F), catalytic (C), and leak (L).The reactions in this dataset ranged from a relatively simple duplex-formation reaction to a relatively complicated multiple-intermediate catalytic reaction.Additional information regarding these datasets is reported in supplemental table S1-01.
Histograms of the rate-constant values in each dataset are shown in Fig. 1b.The rate-constant values within each dataset span at least 3 orders of magnitude, indicating that substantial kinetic dispersion exists in all ve datasets.The rate-constant values in each dataset are left skewed (i.e., have mean less than the median), and appear to be poorly approximated by a gaussian distribution.
To ascertain how much of the kinetic dispersion in these datasets can be explained by unnecessary interoligomer duplexes, the kinetic dispersion of samples with the least substantial unnecessary interoligomer duplexes were compared to that of randomly selected samples.Unnecessary inter-oligomer duplexes were quanti ed using the network tness score above baseline (denoted ΔN) and kinetic dispersion was quanti ed using the Inter-Quartile Range of the Natural Logarithm of rate-constants (denoted IQRNL), both of which are described in the methods section.When the kinetic dispersion of the three most ΔN-t samples were estimated (Fig. 1c green bars), kinetic dispersions were reduced by 37%, 51%, 23%, 39%, and 19% relative to the random samples (Fig. 1c grey bars).The estimated kinetic dispersion of the ΔN-t samples overlapped with that of the random samples for all 5 datasets, indicating that the differences were only marginally resolvable for any speci c dataset.However, the systematic decrease in kinetic dispersion across all ve datasets was interpreted as evidence that at least some of the kinetic dispersion in these datasets can be explained by unnecessary duplexes contributing to ΔN.The kinetic dispersions estimated for each ΔN-t population are reported in supplemental table S1-02.
Correlation plots of rate-constant as a function of ΔN are shown in supplemental gure S1-01.
To ascertain how much of the kinetic dispersion in these datasets can be explained by unnecessary intraoligomer duplexes, the kinetic dispersion of samples with the least substantial unnecessary intraoligomer duplexes were compared to that of randomly selected samples.Unnecessary intra-oligomer duplexes were quanti ed using the oligomer tness score above baseline (denoted ΔO and described in the methods section).Kinetic dispersion was quanti ed using IQRNL.When the kinetic dispersion of the three most ΔO-t samples were estimated (Fig. 1c orange bars), kinetic dispersions were reduced 84%, 92%, 86%, 93%, and 94% relative to the random samples (Fig. 1c grey bars).For all ve datasets, there was no overlap between the estimated kinetic dispersion of three ΔO-t samples and that of random samples, indicating this trend was strongly resolved.The decreased IQRNL of the ΔO-t samples was interpreted as evidence that the kinetic dispersion in all ve datasets can be mostly explained by unnecessary duplexes contributing to ΔO.The (non-normalized) kinetic dispersions estimated for each ΔO-t population are reported in supplemental table S1-02.Correlation plots of rate-constant as a function of ΔO are shown in supplemental gure S1-01.
The size of the unnecessary intra-oligomer duplexes explaining this kinetic dispersion indicates that relatively small duplexes are substantial enough to cause kinetic dispersion.For the ve datasets, the median ΔO of the three most ΔO-t samples differed from the median ΔO of three random samples by 3.3 x 10 3 , 7.0 x 10 3 , 7.0 x 10 3 , 1.0 x 10 5 , and 1.1 x 10 5 tness points.The smallest of these differences (3.3 x 10 3 ) is equivalent to approximately 3 unnecessary intra-oligomer duplexes containing 3 base-pairs each.This difference was present in the H22F dataset and was associated with an 84% reduction in kinetic dispersion.It was inferred from this information that, at least some, unnecessary intra-oligomer duplexes containing 3 base-pairs are substantial enough to impact hybridization kinetics.
The most ΔN-t samples in each dataset also exhibited kinetic dispersions systematically lower than random samples.However, the median ΔO of the three ΔN-t samples was also lower than the median ΔO of three random samples by 2.4 x 10 3 , 4 x 10 3 , 4 x 10 3 , 4 x 10 4 , and 5 x 10 4 tness points.Thus, the correlation between ΔN and decreasing kinetic dispersion can be fully explained by 3-base pair unnecessary intra-oligomer duplexes.
The weighted tness scores above baseline (abbreviated ΔW x and detailed in the methods section) were used to simultaneously quantify both unnecessary intra-oligomer duplexes and unnecessary interoligomer duplexes.To ascertain the performance of different ΔW x tness scores, kinetic dispersion was estimated for the three most ΔW x -t samples according to several x values (Fig. 1d).For x ≥ 10 4 , the most ΔW x -t samples were also the most ΔO-t samples, and the same large kinetic dispersion reductions were observed.For x ≤ 10 2 , the most ΔW x -t samples were similar or identical to the ΔN-t samples, and also yielded only marginal kinetic dispersion reductions.These results were expected based on the de nition of ΔW x and demonstrate how larger x values can be used to increase the penalty for unnecessary intra-oligomer duplexes.Since ΔW x with x ≥ 10 4 performed su ciently for all ve datasets, it was inferred that this tness score may be useful across a range of experimental conditions and network designs.

Scaling of typical unnecessary duplexes with network size
For networks of DNA oligomers, it is known that the largest unnecessary duplex typically increases with both the number and length of oligomers present 38,52,53 .Here, how unnecessary duplexes scale with these factors was studied in-silico using the network tness score (denoted N) and oligomer tness score (denoted O).To determine the typical tness score of different sized networks, networks were sampled by randomly generating base-sequences for a model system.The model system for this study (Fig. 2a) was a network containing no intentional base pairing, i oligomers, and j bases per oligomer.
Values of N and O estimated for several i and j combinations are shown in Fig. 2b 2c).Both N and O values were observed to systematically diverge from this model for small j values, which can be explained by the upper limit oligomer length establishes on duplex length.
It was inferred from this data that nearly all networks possess unnecessary duplexes substantial enough to cause kinetic dispersion.This is based on the following logic.First, during the analysis of existing experimental data (Fig. 1), it was observed that unnecessary intra-oligomer duplexes containing as few as 3 base-pairs are substantial enough to affect the chemical kinetics of hybridization reactions.A single unnecessary duplex containing 3 base-pairs corresponds with ΔO = 10 3 .Since ΔO increases with both i and j, sets of oligomers with ΔO ≤ 10 3 are a small fraction of the countably in nite number of possible networks.Consequently, nearly all oligomers possess unnecessary duplexes substantial enough to affect hybridization kinetics.The model in Fig. 2c implies that this is true for over 50% of oligomer-sets when (0.0526) • i 1.56 • j 3.63 ≥ 10 3 .This condition is satis ed by networks containing the following: 1) any oligomer longer than 15 bases, 2) any 2 oligomers longer than 11 bases, or 3) any 4 oligomers longer than 8 bases.In the literature it is a common assumption that several randomly generated networks approximate a target reaction well enough for a meaningful comparison.However, these results indicate this may often be a poor approximation and should be considered carefully.

Kinetic dispersion of newly generated oligomers
To con rm that minimizing unnecessary duplexes is su cient to mitigate kinetic dispersion, several new networks were generated using in-silico optimization and characterized in-vitro.The employed generation process is summarized in Fig. 3 and described in greater detail in the methods section.The model system for this study was the network design shown in Fig. 4a.This network design consists of three oligomers (labeled S 1 , S 2 , and S 3 ) intended to undergo two hybridization reactions referred to as duplex-formation and strand displacement.Networks were generated using one of four optimization criteria: no criteria (i.e., random sequence assignment or the RND group), N (the N-t group), O (the O-t group), or W 1 (W 1 -t group).Three networks were generated using each optimization criteria, and the oligomers for each generated network are reported in supplemental table S4-01.
The unnecessary duplexes for each network were quanti ed using ΔN and ΔO (Fig. 4b).Histograms summarizing the unnecessary duplexes are reported in supplemental gure S4-01.It was noted that the randomly-generated networks contained intra-oligomer unnecessary duplexes as large as 5 base-pairs, while no N-t, O-t, or W 1 -t network contained such duplexes larger than 2 base-pairs.Optimization of ΔO increased both inter-oligomer duplexes and ΔN, which is explained by the known logical connection between N and O.
The chemical kinetics of the new networks were characterized using a uorescence signal and modeled using second-order rate equations.The duplex-formation reaction was modeled using the following rateequation governed by rate constant k f .The strand-displacement reaction was modeled using the following rate-equation governed by rateconstant k d : Based on these equations, the rate-constants k f and k d both have units of M -1 s -1 .
The measured rate-constant values are shown in Fig. 4c.The largest rate-constant value observed was 5.9 x 10 7 M -1 s -1 (k f , O-t-1, 60°C), and the smallest was 9.2 x 10 3 M -1 s -1 (k f , RND-1, 10°C).The range of these rate-constant values is consistent with values reported elsewhere in the literature 12,22,23,43,47,57−61 .The mean and standard deviation of triplicate measurements for select data points are shown as error bars on Fig. 4c.The maximum difference within these triplicate measurements was 4% or less.Additional information regarding rate-constant measurements, including uorescence data and t quality are reported in supplemental section 4.
Network generation via W 1 optimization yielded the lowest kinetic dispersions.These kinetic dispersions were reduced by 86% (formation reaction) and 75% (displacement reaction) relative to the randomly generated networks.These reductions are consistent with the reductions observed while analyzing measurements reported by previous researchers (Fig. 1).Three key observations were made based on the large kinetic dispersion reductions of W 1 -t networks.1) This validates that substantial kinetic dispersion remained despite the factors held constant (e.g., temperature, viscosity, ionic strength, oligomer length, dt duplex stability, and toehold sequence).2) Unnecessary duplexes contributing to W 1 explain most of the remaining kinetic dispersion.3) Controlling speci cally for these factors is su cient to mitigate most kinetic dispersion.
Network generation via O optimization yielded two networks with kinetics similar to the W 1 -t group, and one network with a substantially faster duplex-formation reaction.This accelerated reaction may be explained by the numerous large unnecessary inter-oligomer duplexes introduced during O optimization (i.e., duplexes as large as 8 base-pairs).Notably, the two O-t networks with kinetics similar to the W 1 -t group also contain large unnecessary inter-oligomer duplexes (i.e., as large as 10 base-pairs) indicating that not all unnecessary inter-oligomer duplexes affect all hybridization reactions.
Network generation via N optimization yielded networks with unnecessary duplexes very similar to the W 1 -t group.This was associated with median ΔN values 2.9 x 10 4 tness points below and median ΔO values 1.1 x 10 3 tness points above the W 1 -t group.For all the N-t and W 1 -t oligomer-sets, the largest intra-oligomer unnecessary duplex contained 2 base-pairs and the largest inter-oligomer unnecessary duplex contained 4 base-pairs.However, two of the N optimized oligomer-sets (N-Fit-2 and N-t-3) exhibited duplex-formation kinetics substantially different from the W 1 -t group.This effect was especially pronounced at low temperatures, where the rate-constants differed up to a factor of 7.This was interpreted as evidence that intra-oligomer unnecessary duplexes as small as 2 base-pairs can affect hybridization kinetics under certain conditions.For all optimized networks, kinetic dispersion decreased at higher temperatures.This effect was especially pronounced for the W 1 -t oligomers undergoing the displacement reaction at 50°C and 60°C.At these temperatures, kinetic dispersions were reduced by 94% and 96% relative to randomly generated oligomers.The remaining kinetic dispersion for the W 1 -t oligomers at these temperatures appears to be beyond the resolution of the current study, which was estimated at 4%.The trend of decreasing kinetic dispersion with increasing temperature may be explained by the destabilization of smaller intra-oligomer unnecessary duplexes.From this, it was inferred that operating networks of DNA oligomers at relatively high temperatures may help mitigate kinetic dispersion, especially so for networks containing relatively small unnecessary duplexes.

Temperature dependence of hybridization kinetics
Rate-constants for the newly generated networks were measured at temperatures of 10, 20, 30, 40, 50 and 60°C.Rate constants for the duplex-formation reaction were observed to be strictly increasing in this temperature range, suggesting an Arrhenius temperature dependence.Similar reactions where oligomers form a single duplex have been characterized in previous studies and both Arrhenius 41,43,47,62 and non-Arrhenius 22,47,63 temperature dependences have been observed.Rate constants for the stranddisplacement reaction typically exhibited a broad peak with a maximum rate-constant between 30 and 40°C.The decreased strand-displacement rate-constants at higher temperatures can be explained by a destabilizing intermediate, which was presumed to be the localization of three-oligomers prior to stranddisplacement.
The temperature dependence of duplex-formation rates were modeled using the following equation: Referred to as the Arrhenius equation, this equation includes the following values: the pre-exponential factor (A), the activation energy (E a ), the absolute temperature (T), and the Boltzmann constant (k b ).
Based on Eq. 8, a plot of the natural logarithm of k f as a function of inverse temperature is expected to be linear, and such plots are shown in Fig. 5a.The activation energy and pre-exponential factor extracted from these ts are shown in Fig. 5b.
The duplex-formation rate-constants of all twelve networks were well modeled by the Arrhenius equation (Eq.8).The Arrhenius equation models the rate-limiting step of a reaction as a constant energy barrier overcome by thermal energy.Dispersion in the energy of this barrier (i.e., the activation energy E a ) followed similar trends to kinetic dispersion and was most uniform for the W 1 optimized networks.A plot of the natural logarithm of the pre-exponential factor as a function of activation energy was observed to be highly linear and is shown in Fig. 5c.This implies the following equation relating these parameters: where C 1 and C 2 are real valued constants.A linear t to the data in Fig. 5c is shown as a solid line.This t yielded C 1 and C 2 values of 19.1 log M -1 s -1 and 20.6 x 10 19 J -1 log M -1 s -1 , respectively.Eq. 9 was simpli ed by declaring the following constants: which yielded the following empirical model for k f : One feature of this equation is the theoretical critical temperature (T c ) at which formation rates equal the maximum rate (C 3 ) regardless of the value of the activation energy (E a ).The C 1 and C 2 values from the t to Eq. 9 imply C 3 and T c values of 1.97 x 10 8 M -1 s -1 and 352 K.These values are likely speci c to the duplex-formation reaction in Fig. 4a and factors such as viscosity, ionic strength, oligomer length, and duplex stability.
Correlations such as Eq. 9 have been observed by previous researchers and were interpreted as evidence of enthalpy/entropy compensation in an underlying thermodynamic model 64 .However, this explanation is contradicted by evidence that duplex formation reactions are a dynamic equilibrium governed by a rate- limiting transition through an intermediate 42,59,65,66 .As such, it was concluded that Eq. 12 arises from a similarity in the rate-limiting reaction mechanism of the reactions.

Generation of new oligomers for existing network designs
Larger and/or more complicated network designs may introduce issues which limit the effectiveness of W x optimization.To better understand the limitations of W x optimization, new oligomers were generated for several existing network designs.
The process shown in Fig. 3 was used to generate oligomers for four existing network designs.These designs included: 1) the "10 x 10 x 10 molecular canvas" reported by Ke et al. 67 , 2) the "four-input OR seesaw-gate network" reported by Qian et al. 68 , 3) the "autocatalytic four-arm junction" reported by Kotani et al. 69 , and 4) the "entropy driven autocatalytic network" reported by Zhang et al. 57 .Unnecessary duplexes in both the published and the new oligomers were quanti ed using ΔN and ΔO and are shown in Table 1.Additional details regarding these oligomers, including the base-sequence of the new oligomers and tables detailing the length and number of unnecessary duplexes, are reported in supplemental section 5. W x optimization was successfully used to generate oligomers with reduced unnecessary duplexes for each network design.This is apparent in both the reduced ΔN and ΔO for the new oligomers (Table 1) and in the size of unnecessary duplexes.None of the new networks contain unnecessary intra-oligomer duplexes larger than 3 base-pairs, and three of the four new networks contain no unnecessary intraoligomer duplexes larger than 2 base-pairs.In contrast, each of the existing networks contain unnecessary intra-oligomer duplexes of at least 4-base pairs, and as larger as 8 base-pairs.
For the optimization goal of eliminating all unnecessary intra-oligomer duplexes larger than 2 base-pairs, it was inferred that the current size limit is slightly smaller than the "10 x 10 x 10 molecular canvas" (i.e., approximately 517 oligomers containing up to 48 bases).However, the optimization goal of reducing ΔO values below 3 x 10 3 was not achievable for any of the designs.For the optimization goal of reducing ΔO values below 3 x 10 3 , it was inferred that the current size limit is less than that of the model system in Fig. 4a (i.e., 3 oligomers containing up to 49 bases).The exact value of these size limits are likely sensitive to network design, however the general magnitude of these limits are expected to be similar for many network designs.
During generation, it was necessary to tune the value of x independently for each network design.For x values which were too small, generated oligomers contained excessive intra-oligomer duplexes.For x values which were too large, generated oligomers contained excessive inter-oligomer duplexes.For all designs, an x value balancing intra-oligomer and inter-oligomer duplexes was found.The need to tune x can be explained by the different scaling of inter-oligomer and intra-oligomer duplexes network size observed in Fig. 2.
The amount of computation required to generate optimized oligomers varied substantially depending on the network design.Generating oligomers for the Zhang et al. design required the least amount of computation.The new oligomers for this design were generated after scoring 800,000 potential oligomersets in 2 minutes using a laptop computer (Intel Xeon E3-1505M v5 processor).The most computation was required for the "10 x 10 x 10 molecular canvas" design.The new oligomers for this design were generated after scoring 48,000,000 networks using 4 computing nodes for 6 hours each (each node contained dual intel Xeon E5-2680 v4 processors).The larger amount of computation required by this design is explained both by the relatively large number of oligomers (517 total oligomers) and the relatively large oligomers in the design (i.e., 66 oligomers containing 48 bases).

Comparison to other generation methods
Numerous methods for generating DNA oligomers have been demonstrated and there exist at least fteen computer programs speci cally intended to generate networks of interacting DNA oligomers 7,38,52,53,55,56,70−78 .Generation methods which utilize in-silico optimization to mitigate unnecessary duplexes can be generally divided into three groups: Those which minimize the number of base-pairs in the largest unnecessary duplex (i.e., the sequence symmetry criteria) 38,52,53,75 , those which optimize thermodynamic properties 70,79,80 , and those which optimize other custom tness scores 12,55,76 .The method shown in Fig. 3 optimizes the W x tness score, which quali es as a custom tness score.
This custom tness score gives the method key advantages and key disadvantages relative to the other methods.
The following three advantages of the generation method were observed.First, when suitably low ΔN and ΔO values can be achieved, W x optimization yields the lowest kinetic dispersions of any method known to the authors.Second, calculation of W x requires less computation than thermodynamic or molecular dynamic simulations.Third, W x calculation does not involve experimental conditions such as temperature, ionic-strength, or oligomer concentration.As such, the low kinetic dispersions of W x t networks is expected to be more robust to variations in experimental conditions than networks generated via thermodynamic or molecular dynamic simulation.
The following three limitations of the generation method were observed.First, this optimization method is most effective when ΔO can be reduced below 3 x 10 3 .This limits either the size or the quality of networks the method can generate.Second, W x assigns points to unnecessary duplexes based solely on their size (i.e., the number of base-pairs they contain).This is derived from the idea that all duplexes of a given length are equally bad, which becomes a poor assumption for duplexes containing approximately 4 or more base pairs since some 4 base-pair duplexes are more thermodynamically stable than some 5 base-pair duplexes.Consequently, W x optimization may erroneously favor more problematic duplexes when larger duplexes must be included.Third, N, O, and W x are ine cient in the sense that they penalize all inter-oligomer and all intra-oligomer unnecessary duplexes equally.Consequently, these scores likely overlook many experimentally viable oligomers where the unnecessary duplexes are either not stable enough or located in locations which do not affect the intended hybridization reaction.
To verify that existing methods do not already su ciently mitigate unnecessary duplexes, oligomers were generated for a model system using several freely available computer programs.The network design for this model system contained two 35-base oligomers forming a single 35 base-pair duplex.Oligomers were generated using the following methods: 1) optimization of the W 1 property using SeqEvo, 2) optimization of the W 10,000 property using SeqEvo, 3) default parameters of the Domain Design (DD) program 55 , 4) default parameters of the DNASequenceGenerator (DSG) program 53 , 5) default parameters of the Exhaustive Generation of Nucleic Acid Sequence (EGNAS) program 52 , 6) default parameters of the Nucleic Acid Package (NUPACK) software 56 , 7) default parameters of the Uniquimer3D program 75 , and 8) random sequence assignment.
The typical unnecessary duplexes for each generation method were identi ed and are reported in Table 2.
All oligomers generated for the model system were observed to form at least some unnecessary duplexes.However, the oligomers forming both the fewest and the smallest unnecessary duplexes were generated by SeqEvo.SeqEvo optimization of W 1 yielded oligomers forming less substantial unnecessary intra-oligomer duplexes than any program.SeqEvo optimization of W 10,000 yielded oligomers forming no unnecessary intra-oligomer duplexes larger than a single base pair.All programs other than SeqEvo yielded oligomers with unnecessary intra-oligomer duplexes of 3 base-pairs or larger.It was inferred from this that no known method renders SeqEvo obsolete, and that utility exists for the program.It is possible that optimization of criteria other than W x could yield low kinetic dispersions.However, for the purpose of eliminating unnecessary intra-oligomer duplexes based solely on the number of base-pairs they contain, the SeqEvo software outperforms any other optimization method studied here.

Discussion
Several applications for DNA oligomers have been demonstrated.However, the large differences in chemical kinetics exhibited by these oligomers, referred to as kinetic dispersion, can be problematic.Here, it was demonstrated that controlling factors known to affect hybridization reactions is su cient to mitigate most kinetic dispersion.These factors included temperature, ionic strength, viscosity, oligomer length, duplex stability, and unnecessary duplexes.
Mitigating unnecessary duplexes was observed to be key to achieving exceptionally low kinetic dispersions.Intra-oligomer duplexes speci cally were found to explain up to 94% of the kinetic dispersion in existing experimental data.Evidence was observed that intra-oligomer duplexes as small as 2 basepairs can affect hybridization kinetics under certain experimental conditions and that intra-oligomer duplexes as small as 3 base-pairs affect hybridization kinetics under typical experimental conditions.
Unnecessary duplexes were found to typically scale exponential with network size, indicating that most networks of interacting oligomers contain unnecessary duplexes large enough to impact hybridization kinetics.
A new method which generates oligomers by using in-silico optimization to limit unnecessary duplexes was presented.The generation method was demonstrated to mitigate kinetic dispersion up to 84% for the model system in Fig. 4a and to eliminate unnecessary intra-oligomer duplexes more effectively than several other freely available methods.When used to generate new oligomers for four existing network designs, the generation method successfully eliminated 3 base-pair or larger unnecessary intra-oligomer duplexes in three of the four network designs.
These results suggest the following model of how unnecessary duplexes affect hybridization reactions.When known factors are su ciently controlled, hybridization reactions exhibit highly similar chemical kinetics manifesting as low kinetic dispersions.For hybridization reactions with where a single stable duplex is formed and all known factors including unnecessary duplexes are mitigated, these chemical kinetics exhibit an Arrhenius temperature dependence.Relative to these chemical kinetics, unnecessary duplexes can either accelerate or decelerate a given hybridization reaction, which manifests as increased kinetic dispersion.Different unnecessary duplexes affect the reaction mechanism in different ways, leading to a divergence in reaction mechanisms and many possible behaviors.This model was observed to adequately describe both relatively simple reactions such as the duplex-formation reaction in Fig. 4a, and more complicated reactions such as the reaction characterized in dataset O25C from Fig. 1.For at least some hybridization reactions, these divergent reaction mechanisms continue to follow mathematical relationships such as Eq.12.
The ability to mitigate up to 94% of kinetic dispersion is expected to improve outcomes for applications such as biomarker detection, genomic sequencing, targeted drug delivery, arti cial gene regulation, information storage, and photonic information processing.However, the oligomer generation method reported here is most effective for relatively small networks where 3 base-pair unnecessary intra-oligomer duplexes can be fully eliminated.It may be possible to expand the effectiveness of this oligomer generation method by developing improved optimization criteria or increasing the e ciency of the software.

Methods
Datasets which reported 30 or more measurements of a speci c rate-constant under consistent experimental conditions were identi ed for analysis.The ve datasets identi ed include: (1) 47 measurements by Hata et al. 22 of a duplex-formation reaction at room temperature (approximated as 22°C), (2) 51 measurements by Olson et al. 23 of a catalytic reaction at 25°C, (3) 51 measurements by Olson et al. 23 of a leak reaction at 25°C, (4) 98 measurements by Zhang et al. 12 of a duplex-formation reaction at 37°C, and (5) 95 measurements by Zhang et al. 12 of a duplex-formation reaction at 55°C.The samples within each dataset are expected to have approximately constant temperature, ionic strength, viscosity, oligomer length, and duplex-stability.Either no other oligomers, or only poly-T oligomers were consistently present in each dataset.It is expected that unnecessary duplexes are the only known cause of kinetic dispersion not controlled within each dataset.The oligomers which yielded this data were neither randomly generated nor rationally designed.However, they were approximated as independent and identically distributed 81,82 for the statistical analysis.

Iqrnl
Kinetic dispersion was quanti ed as the inter-quartile range of the natural logarithm of rate-constant values (abbreviated IQRNL).The following observations regarding this metric were made.Small IQRNL values correspond with smaller, and more favorable, kinetic dispersion.IQRNL values can only be meaningfully compared if they are derived from the same rate-equations and rate-constants.IQRNL values are expected to be robust to both occasional outliers and the several orders of magnitude expected for these rate constants.IQRNL values are typically insensitive to the outer 50% of data, making them a conservative metric for quantifying kinetic dispersion.The IQRNL of a sample is expected to systematically underestimate the value of a population, marking it as a biased estimator and making the most meaningful comparisons of IQRNL values those calculated from the same number of samples.

N, O, and W x
Inter-oligomer duplexes were summarized using the network tness score (abbreviated N).N was calculated by accumulating 10 L tness points for each duplex connecting two oligomers: where L is the number of base-pairs in the duplex.Calculation of N included tness points for both duplexes which are part of larger duplexes and for each oligomer interacting with an identical oligomer.
The value of N is zero when the oligomers can form no inter-oligomer duplexes, and larger values of N indicate more substantial (i.e., larger or more numerous) inter-oligomer duplexes.Unnecessary interoligomer duplexes were quanti ed using the network tness score above baseline (abbreviated ΔN), which was calculated as network tness score minus the network tness score of all necessary duplexes.Intra-oligomer duplexes were summarized using the oligomer tness score (abbreviated O).O was calculated by accumulating 10 L tness points for each duplex connecting an oligomer with itself: where L is the number of base-pairs in the duplex.Calculation of O included tness points for duplexes which are part of larger duplexes.The value of O is zero when the oligomers can form no intra-oligomer duplexes, and larger values of O indicate more substantial intra-oligomer duplexes.Unnecessary intraoligomer duplexes were quanti ed using the oligomer tness score above baseline (abbreviated ΔO), which was calculated as oligomer tness score minus the oligomer tness score of all necessary duplexes.
Inter-oligomer and intra-oligomer duplexes were collectively summarized using a class of weighted tness scores (generally abbreviated W x ).These tness scores were calculated as weighted linear combinations of N and O: where x is a positive real number determining the relative contribution of O. Speci c W x are denoted with the value of x in the subscript (e.g., W 1 denotes W x with x = 1).The use of larger x values allows one to place an increasing emphasis on intra-oligomer duplexes.The value of all W x are zero when no duplexes are present, and larger values indicate more substantial duplexes.Unnecessary duplexes were quanti ed using the weighted tness scores above baseline (abbreviated ΔW x ), which were calculated as the weighted tness score minus the weighted tness score of any necessary duplexes.
The following three observations were made regarding these tness scores.First, N and O are not orthogonal since any intra-oligomer duplex physically implies the existence of two inter-oligomer duplexes.Alternatively stated, if an oligomer can form base-pairs with itself, it can also form these basepairs with an identical oligomer.Consequently, any structure which contributes to O also contributes to N, and N is always greater than or equal to 2•O.Second, while oligomers with ΔN and ΔO values approaching zero converge to the design, oligomers with larger values may diverge from the design by a variety of mechanisms resulting in a variety of behaviors.This makes ΔN and ΔO appropriate for quantifying unnecessary duplexes, but less appropriate for predicting the behavior of oligomers.Third, DNA oligomers are known to participate in other interactions not captured by N and O, so some oligomers with abnormal behavior may be expected.Such other interactions include non-canonical base-pairing, triplexes, quadruplexes, the binding of oligomers to container walls, or aptamer-like binding to other targets 50,51 .

Estimating kinetic dispersion in existing experimental data
Fitness score values for each sample were calculated using the custom-written Device Pro ler (abbreviated DevPro) computer program.This program utilizes an exhaustive linear search to identify both necessary and unnecessary.The source code of the DevPro computer program is freely available online (removed for double blind peer review).More information regarding the DevPro program, including an example score calculation is provided in supplemental section 2.
The kinetic dispersions of randomly selected samples were estimated using the following process.First, the number of samples to analyze, n, was chosen.The rate-constants of n samples were selected randomly.These rate constants were resampled with replacement 1,000 times and IQRNL was calculate for each of the resamplings.The 25th, 50th, and 75th percentiles of these IQRNL values were recorded.
This process was repeated 1,000 times, and the median of the 25th, 50th, and 75th percentiles are reported in Fig. 1.
The kinetic dispersions of the most-t samples were estimated using the following process.First, the number of samples to analyze, n, was chosen.The rate-constants of the n most-t samples were selected.These rate constants were resampled with replacement 1,000 times and IQRNL was calculate for each of these resamplings.The 25th, 50th, and 75th percentiles of these IQRNL values were recorded.
Normalized percentiles were calculated by dividing these values by the 50th percentile of the randomly selected samples and are reported in Fig. 2.
For both N-t and O-t samples, similar trends were observed when a greater number of samples were selected.However, kinetic dispersion reduction was most pronounced for small sample sizes.This can be explained by the decrease in sample tness when increasing the number of samples.Since IQRNL values were all calculated using the same number of samples, the increasing kinetic dispersion reduction with decreasing number of samples is not expected to be an artifact of biased parameter estimation.

Scaling of unnecessary duplexes
The typical unnecessary inter-oligomer duplexes for a given combination of i and j was estimated using the following process.First, 10,000 networks with random base-sequence were generated.N was calculated individually for each network.Since these networks contain no intentional duplexes, all duplexes are unnecessary duplexes and N = ΔN.The 50th percentile of these N values was taken to be an estimate of the typical unnecessary inter-oligomer duplexes.The 25th and 75th percentile of the N values were taken as lower and upper bounds for this estimate.The unnecessary intra-oligomer duplexes were estimated similarly, except using the O tness score.

Generation of new oligomers
New networks of oligomers forming less substantial unnecessary duplexes were generated by using insilico optimization of ΔN, ΔO, or ΔW x .In terms of the design paradigms established by Dirks et al. 54 , this generation method contains a network design acting as a positive design component and base-sequence optimization acting as a negative design component.
New networks were generated using the following process described visually in Fig. 3a.First, the intentional duplexes for the network were formalized as a domain-based design 55 .This involved describing each oligomer as a sequence of binding domains and binding domain complements.Binding domains were declared as either xed (i.e., not to be mutated) or variable (i.e., free to be mutated).Next, initial base-sequences for each domain were speci ed.The domain-based design and initial domain sequences were used as input for the custom-written Sequence Evolver (abbreviated SeqEvo) computer program, which optimized ΔN, ΔO, or ΔW x via a custom evolution-inspired algorithm.Fitness score calculation is described in detail in supplemental section 2. Operation of SeqEvo and the optimization algorithm are detailed in supplemental section 3. The source code of SeqEvo has been made freely available online (this text has been removed for double blind peer review.).All mutations performed by SeqEvo rearrange bases within a variable domain.Thus, all oligomers generated by SeqEvo have domains with the same length and base-composition as the initial base-sequences.For the rst optimization, default SeqEvo parameters were used.The network resulting from SeqEvo optimization, referred to as a candidate network, was then analyzed by a human looking for obvious aws.For this purpose, the custom-written Device Pro ler (abbreviated DevPro) computer program was used to identify and pro le unnecessary duplexes.Operation of DevPro is detailed in supplemental section 2. The source code of DevPro has been made freely available online (this text has been removed for double blind peer review.).It is possible to incorporate other software such as the nucleic acid analysis package (NUPACK) 56 thermodynamic simulator at this step if necessary.Based on the human analysis, the candidate network was either accepted as nal or the SeqEvo parameters were updated and optimization repeated.

Characterization of new oligomers
The network design for this model system (Fig. 4a) three DNA oligomers (labeled S 1 , S 2 , and S 3 ) composed of two binding domains (α and β).From their 5' ends: oligomer S 1 contains domains α then β, S 2 contains the binding complement of β then the binding complement of α, and S 3 contains only domain α.These three oligomers form two intentional duplexes: D 1 (which forms between S 1 and S 2 ) and D 2 (which forms between S 3 and S 2 ).Domain α is a variable sequence of 10 C's, 10 G's, 10 A's, and 11 T's.
Domain β is the xed sequence TCTCCATG, which was adopted from a previous study in order to mitigate the known effects of toehold stability on hybridization kinetics 84 .
Hybridization reactions were characterized using the following procedure.Oligomers were purchased HPLC puri ed from Integrated DNA Technologies with Cy3 and "Black Hole Quencher 1" modi cations, respectively.Oligomer concentrations were calculated using the absorbance coe cients reported by Integrated DNA Technologies and absorption measurements at 260nm (Thermo Nanodrop One spectrophotometer).Reactants were prepared in 1x TE buffer (supplemented with 1 M NaCl) at 1 µM concentration.Reactants were combined at an initial concentration of 10 nM in 1x TE (supplemented with 1 M NaCl) inside of one of two Cary Eclipse spectrophotometers.The concentration of unquenched Cy3 dye, which was proportional to the concentration of unreacted S 1 , was monitored using excitation and emission wavelengths of 548 nanometers and 574 nanometers.
Under these conditions, a plot of inverse-reactant-concentration as a function of time is expected to be linear with slope equal to the reaction rate 22,37,44 .Linear ts were applied to the rst 5, 10, 20, 50, 100, 200, and 500 seconds of data and the t with the largest coe cient of determination (R 2 ) was taken to be the best rate-constant measurement.This method yielded median R 2 values of 0.9968 and 0.9951 for the duplex-formation and oligo-displacement reactions.The uorescence traces, concentration plots, and calculated rate-constants are provided in supplemental section 4.
The precision of the rate-constant measurements was sampled using triplicate measurements of the "W 1t-3" oligomer-set.This included three measurements each for the formation and displacement reactions at 20°C and 40°C.In total, this yielded twelve rate-constant values organized into four groups of three.Within these groups of three, the largest deviation from a group average was 4% and the median deviation was 2%.Errors in commercial oligonucleotide synthesis have been observed to induce up to a factor of two difference in rate-constant values 12 and each group of three rate-constant values were based on a single synthesis.This suggests that, despite the relatively high precision of the experimental method, all measurements for any given oligomer-set may be systematically offset by up to a factor of two due to synthesis errors.Since the observed rate-constant values span four orders of magnitude, it is reasonable to expect the majority of observed rate variation to arise from factors other than synthesis errors.This assumption is further supported by the high reproducibility of certain design groups, such as the W 1 -t oligomers at high experimental temperatures.
Kinetic dispersion was estimated using the following process.First, the three samples were resampled with replacement 5,000 times.The IQRNL was calculated for each of these resamplings and the 50th percentile of these IQRNL values was interpreted as the estimated kinetic dispersion.The 25th and 75th percentiles were used as upper and lower bounds for this estimate.Normalized kinetic dispersions were calculated relative to the 50th percentile of the RND group.

Comparison with other generation
To quantify the unnecessary duplexes typical of different oligomer generation programs, oligomers were generated for a model system.The network design of this model system design contained two 35 base oligomers forming a single 35 base-pair duplex.A single duplex was chosen due to constraints of certain software.The length of 35 base-pairs was chosen because it is the smallest duplex which necessitates the introduction of a 3 base-pair unnecessary duplex 52 , which provided contrast in the results.For each generation method, three networks were generated using default parameters.Unnecessary duplexes were calculated for network using the DevPro program.The network with median O was taken to be typical for the generation method.
and Fig. 2c.In this gure, values for select j values (i.e., 16, 64, 256, and 1024 bases) are shown.Other j values (i.e., 8, 32, 128, and 512 bases) were used for tting but are omitted to reduce clutter in the gure.Both N and O were observed to scale exponentially with both i and j.This was con rmed by ts to the equation N = a•i b •j c and the equation O = a•i b •j c .In these equations, a, b, and c are real valued constants.The units of constant a are tness points and constants b and c are both unitless.N values appeared well modeled by values of a = 0.389, b = 3.19, and c = 3.68 (green lines in Fig. 2b) and O values appeared well modeled by values of a = 0.0526, b = 1.56, and c = 3.63 (orange lines in Fig.
been removed for double blind peer review.SOFTWARE AVAILABILITYThis text has been removed for double blind peer review.FUNDINGThis text has been removed for double blind peer review.CONFLICT OF INTERESTThis text has been removed for double blind peer review.

Figures
Figures

Figure 1 Analysis
Figure 1

Figure 2 In
Figure 2

Table 1
Fitness scores of oligomers generated for existing network designs.

Table 2
Unnecessary duplexes of oligomers generated using select methods.