Native characterization of nucleic acid motif thermodynamics via non-covalent catalysis

DNA hybridization thermodynamics is critical for accurate design of oligonucleotides for biotechnology and nanotechnology applications, but parameters currently in use are inaccurately extrapolated based on limited quantitative understanding of thermal behaviours. Here, we present a method to measure the ΔG° of DNA motifs at temperatures and buffer conditions of interest, with significantly better accuracy (6- to 14-fold lower s.e.) than prior methods. The equilibrium constant of a reaction with thermodynamics closely approximating that of a desired motif is numerically calculated from directly observed reactant and product equilibrium concentrations; a DNA catalyst is designed to accelerate equilibration. We measured the ΔG° of terminal fluorophores, single-nucleotide dangles and multinucleotide dangles, in temperatures ranging from 10 to 45 °C.


Software analysis
Supplementary Figure 2: Gel image analysis with Image Quant TL software (panel labels: Fluorescence normalization; Step 2: Subtract background; Step 3: Fluorescence Quantitation). First, full-width lanes are created centered on the bands of interest. Next, bands are automatically detected, and the fluorescence background of the gel is subtracted using a "rolling ball" algorithm packaged with the Image Quant TL software. Finally, the band intensities are calculated and normalized so that the lane 1 band is 100 arbitrary units.
When the concentration of 3p-Ref-G is higher than that of 3p-Comp-G, the double-stranded band intensity should remain saturated, and excess single-stranded 3p-Ref-G results in a new single-stranded band whose intensity is proportional to the excess quantity. (b) Plot of double-stranded (red) and single-stranded (blue) DNA band intensities. Scattered dots represent experimental intensity data, and lines show best-fit predictions. The single-stranded DNA normalized fluorescence intensity, the duplex DNA normalized fluorescence intensity, and the true stoichiometric ratio were fitted to the observed band intensities. For this particular pair of oligonucleotides, the stoichiometric ratio based on absorbance at 260 nm was 1.04. Similar PAGE analysis was performed for all unlabeled species. For fluorophore-labeled species, the stoichiometry relative to the unlabeled complementary strands is directly inferred from the fluorescent PAGE used to analyze the reaction ΔG°, and will be discussed in that section.

Supplementary note 1: Motif thermodynamics inferred from traditional melt analysis
Thermal melt analysis is traditionally the most commonly used method to determine DNA thermodynamic parameters. The study of the melting characteristics of DNA oligomers as a means to evaluate the sequence-dependent thermodynamic stability of DNA was first reported in the early 1980s. Two assumptions are applied during the analysis: (1) the transition equilibrium is assumed to involve only two states, and (2) the difference in the heat capacities of the two states is assumed to be negligible.
In melting analysis, samples are heated at a constant rate from 4 °C to 95 °C. The duplex-to-coil transition is monitored by measuring the absorbance at 260 nm, or by the fluorescence of an intercalating dye. The upper and lower temperature baselines are each assumed to represent a single state: either duplex or single strand. Because the duplex and single-stranded states themselves may exhibit temperature-dependent absorbance/fluorescence, baseline slopes must be fitted to determine hybridization yields at other temperatures. Subjective judgement calls in baseline slope determination may have a significant impact on the inferred ΔG° values.
A rough description of a typical melt analysis procedure is as follows. Two short complementary single strands form duplex D: $S_1 + S_2 \rightleftharpoons D$. The parameters $\Delta H^\circ_{duplex}$ and $\Delta S^\circ_{duplex}$ are the duplex melting transition enthalpy and entropy, respectively. The total concentration of strands $C_T$ is given by $C_T = [S_1] + [S_2] + 2[D]$. At $T = T_m$, $C_T K_D = \alpha$, where $\alpha = 4$ when S1 and S2 are distinct species or $\alpha = 1$ when S1 and S2 are identical. From $\Delta H^\circ_{duplex} - T_m \cdot \Delta S^\circ_{duplex} = R \cdot T_m \ln(C_T/\alpha)$, the enthalpy and entropy can be obtained by two methods: (1) averaging ΔH° and ΔS° from the fits of individual curves, and (2) fitting plots of reciprocal melting temperature versus the natural logarithm of the total strand concentration.
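Method (2) can be illustrated with a minimal sketch. It rearranges the relation above into 1/T_m = (R/ΔH°)·ln(C_T/α) + ΔS°/ΔH° and fits a straight line by ordinary least squares. The function names, the choice of kcal/mol units, and the synthetic input values are illustrative assumptions, not taken from the original analysis.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)

def melting_temperature(dH, dS, c_total, alpha=4.0):
    """T_m (kelvin) from dH - T_m*dS = R*T_m*ln(C_T/alpha)."""
    return dH / (dS + R * math.log(c_total / alpha))

def fit_inverse_tm(c_totals, tms, alpha=4.0):
    """Fit 1/T_m = (R/dH)*ln(C_T/alpha) + dS/dH; return (dH, dS)."""
    xs = [math.log(c / alpha) for c in c_totals]
    ys = [1.0 / t for t in tms]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    dH = R / slope          # slope = R / dH
    dS = intercept * dH     # intercept = dS / dH
    return dH, dS

# Synthetic example with hypothetical parameters (kcal/mol and kcal/(mol*K)).
concs = [1e-7, 3e-7, 1e-6, 3e-6, 1e-5]  # total strand concentrations in M
tms = [melting_temperature(-70.0, -0.19, c) for c in concs]
dH_fit, dS_fit = fit_inverse_tm(concs, tms)
```

With noise-free synthetic data the fit returns the input parameters; with real melting temperatures the scatter of the points around the line determines how well ΔH° and ΔS° are constrained.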
One shortcoming of the melting approach is that the two-state assumption is frequently incorrect for moderate to long DNA oligonucleotides, because the single-stranded molecules may adopt one or more secondary structures. For any sequence, unless two-state melting is independently verified, uncertainties due to potential deviations from two-state behavior will always affect the predictions of melting stability. Because of this, most previous reports used short oligonucleotides (6-16 nt) or self-complementary strands.
Melt experiments do not produce very accurate thermodynamic parameters for several reasons. First, experiments are typically performed in 1 M NaCl, possibly for historical reasons. Thermodynamic parameters fitted at one buffer condition are difficult to extrapolate to other buffer conditions; for example, to date there is no salt adjustment for the thermodynamics of RNA folding. Additionally, assumptions about the baselines in melt curve analysis affect the inferred ΔH° and ΔS° values. Finally, although DNA absorbs light strongly at 260 nm, other chemical impurities in the solution may also absorb at this wavelength, complicating inference of DNA hybridization thermodynamics.
Even disregarding the challenges of extrapolating thermodynamic parameters from melting analysis to other temperatures and buffer conditions, melt analysis must contend with a decomposition problem, because the fitted ΔH° and ΔS° values represent the sum of all component motifs (e.g. base stacks). For example, Bommarito's studies on single-nucleotide dangles used 9 nt oligonucleotides, so that the ΔH° and ΔS° values obtained correspond to the sum of 7 base stacks and 2 single-base dangles. Through the course of roughly 30 such experiments, a system of linear equations can be set up to solve for the ΔG° of each single-base dangle. However, this approach lacks accuracy due to the error contributions of the 7 individual base stack parameters for each sequence.
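The accuracy loss can be made explicit with a simple error-propagation sketch. Assuming, purely for illustration, that the fitted duplex value and the 7 base stack parameters carry independent errors (the subscript names are ours), the variance of the inferred contribution of the 2 dangles is the sum of the component variances:

```latex
\Delta G^\circ_{\text{dangles}}
  = \Delta G^\circ_{\text{duplex}} - \sum_{i=1}^{7} \Delta G^\circ_{\text{stack},i},
\qquad
\sigma^2_{\text{dangles}}
  = \sigma^2_{\text{duplex}} + \sum_{i=1}^{7} \sigma^2_{\text{stack},i}
```

Under this assumption the uncertainty of the dangle term can never be smaller than the combined uncertainty of the 7 stack parameters, however precisely the duplex itself is measured.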
Temperature dependence of ΔH° and ΔS°. In melt curve analyses, the ΔH° and ΔS° of hybridization are assumed to be temperature invariant, which is equivalent to the assumption of no change in heat capacity (ΔCp = 0). Published experimental data, however, have been inconsistent with regard to this assumption. Whereas ΔCp has been correlated with changes in solvent-exposed hydrophobic surface area, the underlying physical basis of this correlation lies in changes in the fluctuations of hydrogen-bonding patterns among solvating waters. Based on this, Cp should be temperature dependent. Several groups have studied the heat capacity change of DNA across temperatures using a number of different approaches [2][3][4]. This is the case for the data analyzed by Petruska and Goodman [5], who found strong nonlinear correlations between experimental values of ΔH° and ΔS° for regular base-pair doublets and a number of mismatched and modified base pairs. A significant heat capacity increase ΔCp associated with DNA melting, in the range of 40-100 cal/mol·K per base pair, was also reported in ref. [6].
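To show what a nonzero ΔCp implies for extrapolation, the following minimal sketch applies the standard thermodynamic relations ΔH°(T) = ΔH°(T_ref) + ΔCp·(T − T_ref) and ΔS°(T) = ΔS°(T_ref) + ΔCp·ln(T/T_ref). It assumes a ΔCp that is itself constant over the temperature range, which is a first-order simplification; the reference temperature, function names, and parameter values are hypothetical.

```python
import math

T_REF = 310.15  # reference temperature in kelvin (37 C), chosen for illustration

def dH_at(temp_k, dH_ref, dCp, t_ref=T_REF):
    """Enthalpy at temp_k for a constant heat capacity change dCp."""
    return dH_ref + dCp * (temp_k - t_ref)

def dS_at(temp_k, dS_ref, dCp, t_ref=T_REF):
    """Entropy at temp_k for a constant heat capacity change dCp."""
    return dS_ref + dCp * math.log(temp_k / t_ref)

def dG_at(temp_k, dH_ref, dS_ref, dCp, t_ref=T_REF):
    """Free energy dG = dH(T) - T*dS(T)."""
    return dH_at(temp_k, dH_ref, dCp, t_ref) - temp_k * dS_at(temp_k, dS_ref, dCp, t_ref)

# Hypothetical values in kcal/mol and kcal/(mol*K); dCp = 0 recovers the
# temperature-invariant assumption used in melt curve analysis.
for dCp in (0.0, -0.5):
    print(dCp, [round(dG_at(t, -60.0, -0.17, dCp), 3) for t in (283.15, 298.15, 310.15, 318.15)])
```

The two parameter sets agree at the reference temperature and diverge away from it, which is why parameters fitted near the melting temperature become less reliable when extrapolated to distant temperatures.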
Should the ΔH° and ΔS° of DNA hybridization motifs truly be temperature dependent, the accuracy of previous thermodynamic parameters inferred from melt curves would be challenged even further. Our native characterization approach, in contrast, allows more direct measurement of ΔH° and ΔS° at temperatures far from the melting temperature.
up to 90 minutes. Verification that photobleaching is not significant at these time scales is important to ensure reproducibility of results, as well as to ensure that there are no systematic biases in converting fluorescent band intensity to concentration. Supplementary Figure 13 shows that even with 5.5 hr of continuous excitation at 45 °C, there is no significant reduction of fluorescence for either the Alexa-532 or ROX fluorophores.

Supplementary note 5: Fluorophore ΔG°rxn inference from band intensities
In this section, we describe how quantitated gel bands are used to infer ΔG°rxn values. Other than the 6-lane gel shown in manuscript Fig. 2b demonstrating the effectiveness of catalysis, all other gels include only the 4 lanes corresponding to lanes 1, 2, 5, and 6 in Fig. 2b, omitting lanes 3 and 4 because these pre-equilibrium states do not help inform ΔG°.
Supplementary Figure 14 illustrates the process by which band intensities are first mapped to concentrations, and then to ΔG° values. Importantly, the per-unit fluorescence of single-stranded species and double-stranded species are not assumed to be identical, because of the known quenching effects of proximal G nucleotides. Furthermore, there is lane-to-lane pipetting error, which results in a difference in total DNA quantity between lanes; the total inferred concentration of the fluorescent species in each lane is corrected according to the total inferred concentration in the first lane.
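The mapping from band intensities to ΔG° can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Supplementary Figure 14: it assumes a generic exchange reaction X + YZ ⇌ XZ + Y in which only X and XZ are fluorescent, known per-unit fluorescence values `f_ss` and `f_ds`, and unlabeled concentrations obtained from mass conservation. All names and numbers are hypothetical.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)

def band_concentrations(i_ss, i_ds, f_ss, f_ds, lane1_total):
    """Convert band intensities to concentrations and rescale the lane so
    that its total labeled-strand concentration matches that of lane 1."""
    c_ss = i_ss / f_ss  # single-stranded labeled species
    c_ds = i_ds / f_ds  # double-stranded labeled species
    scale = lane1_total / (c_ss + c_ds)  # corrects lane-to-lane pipetting error
    return c_ss * scale, c_ds * scale

def reaction_dG(x, xz, y0, yz0, temp_k):
    """dG = -RT ln K for the assumed reaction X + YZ <-> XZ + Y.
    Each XZ formed consumes one YZ and releases one Y (mass conservation)."""
    y = y0 + xz
    yz = yz0 - xz
    k_eq = (xz * y) / (x * yz)
    return -R * temp_k * math.log(k_eq)

# Hypothetical lane: intensities in arbitrary units, concentrations in nM.
x, xz = band_concentrations(i_ss=40.0, i_ds=72.0, f_ss=1.0, f_ds=1.2, lane1_total=100.0)
print(round(reaction_dG(x, xz, y0=0.0, yz0=150.0, temp_k=298.15), 3))
```

Because K for this reaction is a ratio of concentration products, it is dimensionless and insensitive to the absolute concentration scale, but it does depend on the relative per-unit fluorescence of the two bands.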
Systematic bias due to CZ, CXZ, and CYZ. We did not observe any higher molecular weight CXZ or CYZ bands in any of the fluorescent PAGE experiments. However, we cannot quantitate the equilibrium concentration of CZ (a.k.a. Cat·Comp), and the presence of significant CZ at equilibrium will result in an overestimate of the reaction ΔG°. From our analysis (Supplementary Figure 16), we estimate that there is a systematic bias of no more than 0.2 kcal/mol due to CZ. Furthermore, we believe that this systematic bias will be roughly the same in value for all experiments involving fluorophores. The upshot is that dangle parameter characterizations that subtract two different ΔG° values likely have little to no bias due to CZ intermediates.
Fitting ΔH° and ΔS° values. From the 4 different best-fit values of ΔG° at 10, 25, 37, and 45 °C, we can make a preliminary estimate of ΔH° and ΔS° based on maximum likelihood fits, described in Supplementary Figure 17. Fitted values of ΔH° and ΔS° will be reported in later supplementary sections.
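A minimal sketch of such a fit is given below. It assumes independent Gaussian errors on each ΔG° value, in which case the maximum likelihood estimate of the line ΔG° = ΔH° − T·ΔS° reduces to weighted least squares; the exact likelihood used in Supplementary Figure 17 may differ, and the input numbers are hypothetical.

```python
def fit_dH_dS(temps_k, dGs, sds):
    """Weighted least-squares fit of dG = dH - T*dS with weights 1/sd^2.
    Returns (dH, dS); temperatures in kelvin."""
    weights = [1.0 / s ** 2 for s in sds]
    w_sum = sum(weights)
    t_mean = sum(w * t for w, t in zip(weights, temps_k)) / w_sum
    g_mean = sum(w * g for w, g in zip(weights, dGs)) / w_sum
    stt = sum(w * (t - t_mean) ** 2 for w, t in zip(weights, temps_k))
    stg = sum(w * (t - t_mean) * (g - g_mean)
              for w, t, g in zip(weights, temps_k, dGs))
    slope = stg / stt              # slope of dG versus T equals -dS
    dH = g_mean - slope * t_mean   # intercept at T = 0 equals dH
    return dH, -slope

# Hypothetical dG values (kcal/mol) at 10, 25, 37, and 45 C with their
# standard deviations.
temps = [283.15, 298.15, 310.15, 318.15]
dGs = [-1.52, -1.07, -0.68, -0.47]
sds = [0.05, 0.03, 0.04, 0.06]
dH, dS = fit_dH_dS(temps, dGs, sds)
```

With only 4 temperatures spanning 35 K, the fitted ΔH° and ΔS° are strongly correlated, so small changes in any single ΔG° value shift both parameters together.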
Placing confidence intervals on ΔH° and ΔS° tends to be more subtle than for ΔG°, because the 4 ΔG° values may have different standard deviations, resulting in asymmetrical upper and lower bounds relative to the best fit. In particular, placing a reasonable estimate on ΔS° given so few temperature-based data points was difficult, so we arbitrarily picked a methodology to provide a rough estimate. Finally, we generally do not believe that ΔH° and ΔS° are temperature invariant, which means that our linear fits may not be of high predictive value for extrapolating ΔG° values to other temperatures.
Effects of running buffer. Our hybridization and catalysis reactions were performed in 10 mM Tris-MgCl2 buffer or 1x PBS buffer at different temperatures. Our PAGE assays were subsequently carried out in 1x TAE running buffer at the same temperature as the hybridization/catalysis reaction. Samples were immediately loaded, and voltage was typically applied within 3 minutes of loading. We believe that the equilibria were not significantly disturbed by this transient buffer change.
To experimentally verify that buffer switching does not impact inferred thermodynamic parameters, we performed four groups of experiments with 1x TAE with 12.5 mM MgCl2 as running buffer, and compared them to gels run in TAE without MgCl2. Supplementary Figure 18 shows that there is no significant difference in inferred ΔG° values for either fluorophores or single-base dangles, at the p = 0.05 (2 standard deviation) significance level. We could not practically use 1x PBS as gel running buffer because the high salinity would cause exorbitant amounts of Joule heating. Based on our results using TAE with Mg2+ as running buffer, we expect that parameters in 1x PBS are likewise not significantly affected by the transient change in gel running buffer before application of voltage.
Species identity confirmation via SybrGold. A single-stranded ladder (10-60 bases) and a double-stranded ladder were used to support the identities of the gel bands that we used for equilibrium constant calculation.

Supplementary note 6: Single-base dangle ΔG°rxn inference
As described in the main text, the ΔG° of a single-base dangle is arithmetically calculated by subtracting the ΔG° values of two reactions. The first reaction reports the ΔG° of a fluorophore next to a duplex, and the second reaction reports the ΔG° of a fluorophore next to a duplex minus the dangle thermodynamics. In this and future sections, we also refer to the ΔG° of the second reaction as the "raw" ΔG° value, and the subtracted ΔG° value corresponding to only the dangle as the "real" ΔG° value. We performed all experiments using both ROX and Alexa-532 fluorophores, so there are two sets of raw and real ΔG° values for each dangle parameter. The two real ΔG° values are then combined into a single consensus ΔG° value as described in the main text. Supplementary Figure 20 shows the workflow of the experiments we performed to obtain the 160 consensus ΔG° values for the 32 single-base dangles under 5 different sets of temperature/salinity conditions.
We previously described the positive bias that ignoring the CZ intermediate would have on inferred ΔG° values, but argued that this effect would be canceled out for dangles because both reaction 1 and reaction 2 will harbor similar systematic biases. Supplementary Figure 21 shows the effect of different concentrations of CZ on the inferred dangle ΔG° value. As the assumed concentration of CZ ranges between 0 and 30 nM (the total amount of catalyst in the reaction), the ΔG° of both reaction 1 and reaction 2 become more negative by about 0.2 kcal/mol. The deviation in the dangle ΔG°, however, is kept below 0.1 kcal/mol. This analysis assumes that the concentration of CZ is identical or similar for both reaction 1 and reaction 2, which is likely to be true because the sequence identity of the CZ species is the same in both reaction 1 and 2, implying the same standard free energy of formation.
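The subtraction and the cancellation of a shared bias can be sketched in a few lines. The sketch reads the description above as ΔG°(dangle) = ΔG°(reaction 1) − ΔG°(reaction 2), propagates independent errors in quadrature, and combines the two fluorophore results by inverse-variance weighting. The sign convention and the weighting scheme are our assumptions for illustration; the consensus procedure of the main text is not reproduced here, and all numbers are hypothetical.

```python
import math

def dangle_dG(dG_rxn1, sd1, dG_rxn2, sd2):
    """'Real' dangle dG as the difference of two reaction dG values, with the
    standard deviation propagated assuming independent errors."""
    return dG_rxn1 - dG_rxn2, math.hypot(sd1, sd2)

def consensus(values, sds):
    """Inverse-variance weighted mean and its standard deviation."""
    weights = [1.0 / s ** 2 for s in sds]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mean, 1.0 / math.sqrt(sum(weights))

# Hypothetical values in kcal/mol for the ROX and Alexa-532 experiments.
rox, rox_sd = dangle_dG(-1.00, 0.03, -0.50, 0.04)
alexa, alexa_sd = dangle_dG(-1.30, 0.03, -0.70, 0.04)
print(consensus([rox, alexa], [rox_sd, alexa_sd]))

# A bias shared by both reactions (e.g. from unquantitated CZ) cancels.
bias = -0.2
biased, _ = dangle_dG(-1.00 + bias, 0.03, -0.50 + bias, 0.04)
print(abs(biased - rox) < 1e-12)
```

The cancellation is exact only when both reactions carry the same bias; any difference between the two biases passes directly into the dangle value.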

Supplementary note 7: Melt curve analysis
To compare our method to traditional melt curve analysis, we also performed inference of ΔG° for poly A and poly T multinucleotide dangles using traditional melting analysis and Van't Hoff equation fitting. The concentrations of DNA strands and the hybridization buffer used were identical to those in our native catalysis experiments. Table S11-1 summarizes the melt curve based results; in 1x PBS there is a trend for longer dangles to be destabilizing, but this trend is not present in TE-MgCl2 buffer. This is consistent with our observations from native catalysis fluorescent PAGE experiments. Note that the error bars on ΔG° values are significantly larger for melt curve experiments.
Supplementary Figure 70 demonstrates the overall procedure for ΔH° and ΔS° inference from melt curves. The observed fluorescence at different temperatures (blue) is compared against a linear fit of the temperature dependence of the fluorescence at low temperatures (red) and at high temperatures (purple). This allows inference of hybridization yields at different temperatures; the equilibrium constant K is calculated from the hybridization yield. The Van't Hoff plot shows the relationship between the natural logarithm of the equilibrium constant K and the inverse of the temperature (in Kelvin). The best linear fit to the Van't Hoff plot gives the ΔH° (from the slope) and ΔS° (from the intercept) of the hybridization reaction. Mathematically, the fitting of the Van't Hoff plot can be expressed as:
$\ln K = -\frac{\Delta H^\circ}{R T} + \frac{\Delta S^\circ}{R}$ (1)
The details of the upper and lower baselines can have a large impact on the inferred thermodynamic parameters. For example, Supplementary Figure 71 shows the effects of slight differences in the assumptions used to determine the upper and lower baselines. For our melt curve thermodynamics inferences, we consistently used the first 7 data points (60 to 61.2 °C) to fit the upper baseline and 20 data points (71.2 to 75 °C) to fit the lower baseline. Different assumptions here would have led to systematically different ΔH° and ΔS° values.
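The procedure can be sketched in code as follows. This is a minimal illustration, assuming two distinct strands at equal concentration (so that K = 2θ / (C_T·(1 − θ)²) for hybridization yield θ), straight-line baselines given as (slope, intercept) pairs, and ordinary least squares for the Van't Hoff line. Function names and all numbers are hypothetical.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)

def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def hybridization_yield(temp, signal, duplex_line, coil_line):
    """Fraction hybridized, from the position of the signal between the
    extrapolated duplex and single-strand baselines at this temperature."""
    duplex = duplex_line[0] * temp + duplex_line[1]
    coil = coil_line[0] * temp + coil_line[1]
    return (signal - coil) / (duplex - coil)

def equilibrium_constant(theta, c_total):
    """K of S1 + S2 <-> D for distinct strands at equal concentration."""
    return 2.0 * theta / (c_total * (1.0 - theta) ** 2)

def vant_hoff_fit(temps_k, ks):
    """Fit ln K = -dH/(R*T) + dS/R; returns (dH, dS)."""
    slope, intercept = linear_fit([1.0 / t for t in temps_k],
                                  [math.log(k) for k in ks])
    return -R * slope, R * intercept

# Hypothetical baselines and one data point (temperature in C, signal in a.u.).
theta = hybridization_yield(66.0, 520.0, (-2.0, 1100.0), (-0.5, 100.0))
print(round(theta, 3), equilibrium_constant(theta, 1e-6))
```

Changing the slope or intercept of either baseline changes every θ, and therefore every point on the Van't Hoff plot, which is how baseline choices propagate into ΔH° and ΔS°.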
Each set of data was acquired in triplicate, and ΔG°, ΔH° and ΔS° values were averaged from triplicates. As mentioned in the main text and manuscript Fig. 2fg, melt curves showed poor reproducibility in the Van't Hoff plots, resulting in errors that increase the standard deviation of the best estimate ΔG° of dangle motifs. These run-to-run errors are presumably unbiased and can be corrected by averaging a statistically large number of experiments, but the 6- to 12-fold higher standard deviation relative to the native catalysis method means that 36- to 144-fold more experiments are needed to achieve the same parameter precision.
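The factor of 36 to 144 follows from the scaling of the standard error of a mean with the number of replicates. Assuming independent, unbiased replicates with per-experiment standard deviation σ (the subscripts below are our labels), equal precision requires:

```latex
\mathrm{s.e.} = \frac{\sigma}{\sqrt{N}}
\quad\Longrightarrow\quad
\frac{\sigma_{\text{melt}}}{\sqrt{N_{\text{melt}}}} = \frac{\sigma_{\text{native}}}{\sqrt{N_{\text{native}}}}
\quad\Longrightarrow\quad
\frac{N_{\text{melt}}}{N_{\text{native}}} = \left(\frac{\sigma_{\text{melt}}}{\sigma_{\text{native}}}\right)^{2},
\qquad 6^{2} = 36, \quad 12^{2} = 144
```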