Expanding detection windows for discriminating single nucleotide variants using rationally designed DNA equalizer probes

Combining experimental and simulation strategies to facilitate the design and operation of nucleic acid hybridization probes is highly important to both fundamental DNA nanotechnology and diverse biological and biomedical applications. Herein, we introduce the DNA equalizer gate (DEG) approach, a class of simulation-guided nucleic acid hybridization probes that drastically expand detection windows for discriminating single nucleotide variants in double-stranded DNA (dsDNA) via the user-definable transformation of the quantitative relationship between the detection signal and target concentrations. A thermodynamics-driven theoretical model was also developed, which quantitatively simulates and predicts the performance of DEG. The effectiveness of DEG for expanding detection windows and improving sequence selectivity was demonstrated both in silico and experimentally. As DEG acts directly on dsDNA, it is readily adaptable to nucleic acid amplification techniques, such as polymerase chain reaction (PCR). The practical usefulness of DEG was demonstrated through the simultaneous detection of infections and the screening of drug resistance in clinical parasitic worm samples collected from rural areas of Honduras.

For a typical reversible reaction, the equilibrium constant Keq can be derived from the standard reaction free energy ∆G° (eq. 2), and the concentrations of all nucleic acid species obey mass conservation. The resulting expression (eq. 4) applies when Keq ≠ 1.
To quantitatively describe and compare sequence specificity, the discrimination factor (DF) is commonly employed, where DF = ηC/ηS, the ratio of the yield for the correct target (C) to that for the spurious target (S). In the general case where neither the correct nor the spurious target has an equilibrium constant of 1, DF is expressed as:

Supplementary Figure 2 | The theoretical concentration dependency of the discrimination factor (DF).
a. Theoretical DF as a function of target concentration for a correct target against five theoretical spurious targets defined by their thermodynamic parameters. Reaction free energies of the spurious targets are shown in the same colors as the corresponding DF curves. b. The difference in yield between a pair of correct and spurious targets as a function of target concentration. c. Simulation of DF after applying the correction based on the limit of detection (LOD).
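The concentration dependency of DF can be illustrated with a minimal sketch. It assumes a generic toehold-exchange equilibrium X + PC ⇌ XC + P, a temperature of 25 °C, and illustrative free energies; the function names, the optional initial protector concentration, and all numbers are hypothetical and are not taken from this study.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def equilibrium_constant(dg_kcal, temp_k=298.15):
    """K = exp(-dG/RT). The 25 C default temperature is an assumption."""
    return math.exp(-dg_kcal / (R_KCAL * temp_k))


def yield_fraction(k_eq, target, probe, protector=0.0):
    """Fraction of probe PC converted to XC at equilibrium.

    Assumed model: K = x (protector + x) / ((target - x) (probe - x)),
    i.e. (K - 1) x^2 - (K (target + probe) + protector) x + K target probe = 0.
    The physical root is written as 2c / (b + sqrt(b^2 - 4ac)), which avoids
    cancellation and remains valid at K = 1.
    """
    a = k_eq - 1.0
    b = k_eq * (target + probe) + protector
    c = k_eq * target * probe
    x = 2.0 * c / (b + math.sqrt(b * b - 4.0 * a * c))
    return x / probe


def discrimination_factor(dg_correct, dg_spurious, target, probe, protector=0.0):
    """DF = yield(correct) / yield(spurious) at the same target concentration."""
    eta_c = yield_fraction(equilibrium_constant(dg_correct), target, probe, protector)
    eta_s = yield_fraction(equilibrium_constant(dg_spurious), target, probe, protector)
    return eta_c / eta_s


if __name__ == "__main__":
    # Illustrative values only: 10 nM probe, dG = -1.0 and +2.5 kcal/mol.
    for target_nm in (0.1, 1.0, 10.0, 100.0, 1000.0):
        df = discrimination_factor(-1.0, 2.5, target_nm, 10.0)
        print(f"target = {target_nm:7.1f} nM  DF = {df:.2f}")
```

Under these assumptions DF depends on where the target concentration sits relative to the probe concentration, which is the behavior that motivates defining a detection window.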

Robustness Factor
To quantitatively describe the detection window for discriminating single nucleotide variants (SNVs), we mathematically define a robustness factor (RF) as the concentration ratio of a spurious target to a correct target at which the two give the same yield. To do so, we first derived the target concentration τ as a function of yield and equilibrium constant (eq. 7). RF can then be derived using eq. 8, which gives eq. 9. The theoretical RF values increase linearly as a function of η (Supplementary Fig. 3a). However, this deviates significantly from experimental observations, because the absolute concentration differences (τS − τC) between spurious and correct targets become much less significant when the hybridization yield approaches 0 or 100% (Supplementary Fig. 3b). To better reflect the analytical performance, we corrected our model by taking both the limit of detection (LOD) and the limit of linearity (LOL) into consideration.
Supplementary Fig. 3e shows a corrected RF simulation with the LOD set to 1% yield and the LOL set to 95% yield. To understand the concentration dependency of RF, we further converted the x-axis from the yield η to the concentration of the correct target τC (Supplementary Fig. 3c-e).
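The definition of RF can be made concrete with a short sketch. It assumes the simple equilibrium X + PC ⇌ XC + P with no protector present initially, so the expression for τ below is not necessarily eq. 7 of this study. Treating the LOD and LOL as bounds outside which RF is not reported is only one possible reading of the correction; the function names are hypothetical.

```python
def target_for_yield(eta, k_eq, probe=1.0):
    """Target concentration tau that gives yield eta (0 < eta < 1).

    Assumed model: K = x^2 / ((tau - x) (probe - x)) with x = eta * probe,
    which rearranges to tau = eta*probe + eta^2 * probe / (K (1 - eta)).
    """
    return eta * probe + (eta ** 2) * probe / (k_eq * (1.0 - eta))


def robustness_factor(eta, k_correct, k_spurious, probe=1.0):
    """RF = tau_S / tau_C evaluated at the same yield eta."""
    tau_s = target_for_yield(eta, k_spurious, probe)
    tau_c = target_for_yield(eta, k_correct, probe)
    return tau_s / tau_c


def windowed_rf(eta, k_correct, k_spurious, lod=0.01, lol=0.95, probe=1.0):
    """Report RF only for lod <= eta <= lol; return None otherwise.

    This windowing is an assumed stand-in for the LOD/LOL correction.
    """
    if not lod <= eta <= lol:
        return None
    return robustness_factor(eta, k_correct, k_spurious, probe)


if __name__ == "__main__":
    # Illustrative equilibrium constants only.
    for eta in (0.005, 0.05, 0.5, 0.9, 0.99):
        print(eta, windowed_rf(eta, k_correct=10.0, k_spurious=1.0))
```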

Supplementary Figure 3 | The dependence of RF on reaction yield and target concentration. a. The theoretical prediction of RF values as a function of reaction yield η. b. The absolute concentration differences between spurious and correct targets as a function of yield. c. RF as a function of target concentration. d. Absolute concentration differences at the same yield for the spurious and correct targets as a function of target concentration. e. Corrected RF using LOD and LOL.

Detection window of a toehold-exchange probe
To demonstrate the concentration dependency of the detection window for a toehold-exchange probe, we next simulated η, DF, and RF numerically in MATLAB. A 42-nt synthetic DNA (see S5.1) was used as a model target, and a single T-to-A mutation was introduced to create the spurious target. The standard Gibbs free energy (∆G°) of each DNA species can be calculated using NUPACK software, and ∆G° for each toehold-exchange reaction can thus be calculated as the sum over the two products minus the sum over the two reactants: ∆G°rxn = ∆G°(target–probe duplex) + ∆G°(released strand) − ∆G°(target) − ∆G°(probe). The thermodynamic difference between a pair of correct and spurious targets can be quantified using ∆∆G°, where ∆∆G° = ∆G°rxn(spurious) − ∆G°rxn(correct). In our model system, by further including ∆∆G° as a variable, we were able to simulate the concentration dependency of a toehold-exchange probe for all possible mutations, which are mathematically reflected as varying ∆∆G° values (Supplementary Fig. 4d-4f). Our simulation results quantitatively show that the detection window is inversely related to the difficulty of discriminating a given mutation: the smaller the ∆∆G° value, the narrower the concentration range that allows effective discrimination (Supplementary Fig. 4e).
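The free-energy bookkeeping described above can be sketched as follows. The species labels (X, PC, XC, P) and every numerical value are hypothetical placeholders rather than NUPACK outputs from this study, and 25 °C is assumed.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def reaction_dg(dg_products, dg_reactants):
    """Standard reaction free energy: sum over products minus sum over reactants."""
    return sum(dg_products) - sum(dg_reactants)


# Hypothetical species free energies in kcal/mol (placeholders, not study data).
DG = {
    "X": -1.2,       # correct target
    "X_mut": -1.2,   # spurious target
    "PC": -55.0,     # probe duplex
    "XC": -57.5,     # correct target bound to probe
    "X_mutC": -54.0, # spurious target bound to probe
    "P": -0.5,       # released strand
}

dg_correct = reaction_dg([DG["XC"], DG["P"]], [DG["X"], DG["PC"]])
dg_spurious = reaction_dg([DG["X_mutC"], DG["P"]], [DG["X_mut"], DG["PC"]])

# ddG > 0 means the spurious target reacts less favorably than the correct one.
ddg = dg_spurious - dg_correct

# Ratio K_correct / K_spurious implied by ddG at the assumed temperature.
k_ratio = math.exp(ddg / (R_KCAL * 298.15))

if __name__ == "__main__":
    print(f"dG(correct)  = {dg_correct:.2f} kcal/mol")
    print(f"dG(spurious) = {dg_spurious:.2f} kcal/mol")
    print(f"ddG          = {ddg:.2f} kcal/mol, K ratio = {k_ratio:.1f}")
```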

DNA Equalizer Gate
DEG is designed to convert a dsDNA target into an ssDNA output in a quantitative manner with a well-defined detection window. To simulate this process, we assume that all reactions are thermodynamically driven and that all DNA species are in their thermodynamically stable states. Under this assumption, a set of equilibrium equations can be used to predict the concentration distribution of newly formed DNA species (Fig. 3a in the main text). However, only independent equations should be solved; otherwise, meaningless solutions are generated. To help identify the independent equilibrium equations, we extracted a numerical reaction matrix (RM) from the reaction system. Combining the DEG model with a classic toehold-exchange model allows us to precisely simulate the yield and discrimination factor for the correct target and any given mutation. To simulate RF in the DEG system, a built-in inverse function in MATLAB was used to first convert the reaction yield into the concentration of the ssDNA output using the toehold-exchange model, and then convert the concentration of ssDNA into that of the dsDNA target using the DEG model.
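One way to pick independent equilibrium equations from a reaction matrix is to keep only the rows that are linearly independent. The sketch below does this with exact rational arithmetic. The species A to E and the three reactions are hypothetical placeholders, not the actual DEG reaction system, and the function name is invented for illustration.

```python
from fractions import Fraction


def independent_reactions(matrix):
    """Return indices of rows forming a maximal linearly independent set.

    Each row is a reaction; each column is the stoichiometric coefficient of
    one species (negative for reactants, positive for products).
    """
    basis = []  # (pivot column, reduced row) for every row kept so far
    kept = []
    for idx, row in enumerate(matrix):
        vec = [Fraction(v) for v in row]
        # Eliminate the components along the rows already in the basis.
        for pivot, brow in basis:
            if vec[pivot] != 0:
                factor = vec[pivot] / brow[pivot]
                vec = [a - factor * b for a, b in zip(vec, brow)]
        pivot = next((j for j, v in enumerate(vec) if v != 0), None)
        if pivot is not None:  # a nonzero remainder means the row is independent
            basis.append((pivot, vec))
            kept.append(idx)
    return kept


# Columns: A, B, C, D, E (hypothetical species).
RM = [
    [-1, -1, 1, 0, 0],   # R1: A + B <=> C
    [0, 0, -1, -1, 1],   # R2: C + D <=> E
    [-1, -1, 0, -1, 1],  # R3: A + B + D <=> E (equals R1 + R2)
]

if __name__ == "__main__":
    print(independent_reactions(RM))
```

In this toy system the third reaction is the sum of the first two, so its equilibrium equation adds no new constraint and is dropped.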

Comparison between DEG and an increased energy barrier for expanding the detection window
An already established detection window for discriminating a single nucleotide mismatch (left in Supplementary Fig. 6) can be enlarged either by increasing the energy barrier for activating the probe (bottom right in Supplementary Fig. 6) or by using our DEG approach (top right in Supplementary Fig. 6).
As demonstrated by the simulation results in Supplementary Fig. 6, our DEG approach performs better in both respects: it gives a greater degree of expansion (essentially unbounded) and higher sensitivity in the low-concentration range.

Supplementary Figure 6 | Simulation results of enlarged detection windows achieved through DEG (Top) and the increase in the energy barrier for activating the toehold-exchange probe (Bottom).
Increase in the energy barrier is achieved by elongating the reverse toehold by 2 bp.

Correction of ∆G°
It was previously found by Zhang and colleagues that corrections to ∆G° values predicted using NUPACK software are necessary to improve the agreement between theoretical prediction and experimental observation.¹ A similar correction was also performed in our study to improve the accuracy of the mathematical prediction (Supplementary Fig. 7). By comparing theoretically predicted and experimentally determined yields at varying ∆G°, a correction of 1.575 kcal/mol was determined and applied throughout this study. The ∆G° values were predicted using NUPACK software.
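To give a sense of scale for a correction of this size, the sketch below computes how much an equilibrium constant changes when ∆G° shifts by 1.575 kcal/mol. The temperature of 25 °C and the additive sign convention are assumptions, since neither is stated above; the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3      # gas constant in kcal/(mol*K)
DG_CORRECTION = 1.575  # kcal/mol, the correction reported in the text


def corrected_dg(dg_nupack, correction=DG_CORRECTION):
    """Apply the correction additively; the sign convention is an assumption."""
    return dg_nupack + correction


def k_fold_change(correction=DG_CORRECTION, temp_k=298.15):
    """Factor by which K = exp(-dG/RT) changes for a shift of this magnitude."""
    return math.exp(abs(correction) / (R_KCAL * temp_k))


if __name__ == "__main__":
    print(f"corrected dG for -2.0 kcal/mol: {corrected_dg(-2.0):.3f} kcal/mol")
    print(f"fold change in K at 25 C: {k_fold_change():.1f}")
```

At the assumed temperature, the shift corresponds to roughly a 14-fold change in the equilibrium constant, which is large enough to move predicted yields appreciably.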

Determination of experimental RF through fitting
As the calibration curves for both the correct and spurious targets were established from discrete data points, it is not possible to determine the experimental RF directly. Therefore, we combined experimental fitting and mathematical conversion to address this issue (Supplementary Fig. 8). A 4-parameter nonlinear fit was first applied to the experimental results. A set of four parameters, M, L, s and E, is determined through the fitting (eq. 12). M and L represent the highest and lowest signals of the curve, E represents the target concentration that gives a signal halfway between the maximum and minimum limits, and s represents the steepness of the fitted curve. Once this mathematical model was established through fitting, we were able to convert any yield in a toehold-exchange reaction into the corresponding concentration of either the correct or the spurious target.
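The conversion from a fitted curve back to concentration can be sketched as follows. The 4-parameter logistic form used here is a common parameterization chosen to match the descriptions of M, L, s and E above; it is an assumption and may differ from eq. 12. The fitted parameter values are hypothetical.

```python
def four_pl(conc, high, low, slope, mid):
    """4-parameter logistic curve.

    high = M (upper limit), low = L (lower limit), slope = s (steepness),
    mid = E (concentration giving the halfway signal).
    """
    return low + (high - low) / (1.0 + (mid / conc) ** slope)


def inverse_four_pl(signal, high, low, slope, mid):
    """Concentration that produces a given signal, for low < signal < high."""
    if not low < signal < high:
        raise ValueError("signal must lie strictly between L and M")
    return mid * ((signal - low) / (high - signal)) ** (1.0 / slope)


def experimental_rf(signal, fit_correct, fit_spurious):
    """RF = spurious / correct target concentration at the same signal."""
    tau_s = inverse_four_pl(signal, *fit_spurious)
    tau_c = inverse_four_pl(signal, *fit_correct)
    return tau_s / tau_c


# Hypothetical fit results as (M, L, s, E), concentrations in nM.
FIT_CORRECT = (1.0, 0.02, 1.2, 5.0)
FIT_SPURIOUS = (1.0, 0.02, 1.2, 50.0)

if __name__ == "__main__":
    for signal in (0.1, 0.5, 0.9):
        print(signal, experimental_rf(signal, FIT_CORRECT, FIT_SPURIOUS))
```

In practice the two parameter sets would come from fitting each calibration curve separately, and RF would then be read off at any signal level covered by both curves.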

Detection of varying single nucleotide mutations using DEG
We examined the analytical performance and versatility of DEG for discriminating single nucleotide variants using two sets of synthetic targets: a set of subgenomic sequences (28 to 87 bp) from a β-tubulin gene of a parasitic worm, Trichuris trichiura (TT), and a 44 bp subgenomic sequence from the Hepatitis B Virus (HBV) S gene. Both pathogens are major threats to human health worldwide. Various types of mutations and indels were tested using our DEG detection platform. Various synthetic cancer targets carrying single-nucleotide-mutation hot spots were used to demonstrate the robustness and clinical application potential of our DEG method. We also demonstrated the possibility of multiplexed DEG by mixing the two sets of targets and the corresponding DEPs in the same test tube.

Detection of single nucleotide mutations in HBV
The double-stranded synthetic HBV S-gene target was designed to test the versatility of our DEG approach. As shown in Supplementary Fig. 27, a pair of DEPs and a reporter probe were designed for this synthetic target. Both single nucleotide mutations and base insertions/deletions were introduced and tested in this system using DEG. To verify the DEG approach for discriminating challenging single nucleotide mutations, we intentionally introduced an A to G mutation, a well-known challenging SNV because the formation of a G-T wobble pair reduces the difference in free energy between correct and spurious targets. We found that our DEG approach effectively improved the specificity and concentration robustness for analyzing this challenging SNV compared with direct analysis using the toehold-exchange beacon (highlights in Supplementary Fig. 29 and 30).

Detection of clinically important single nucleotide variants in cancer
To further demonstrate the versatility and robustness of our DEG method, we designed 9 sets of DEG and toehold-exchange probes for clinically important single nucleotide variants frequently detected in cancer. The sequences and designs are shown in Supplementary Fig. 31, and the performance of DEG for analyzing the 9 sets of targets is shown in Supplementary Fig. 32 and 33. output generated by asymmetric PCR using the reporter probe that is operated using toehold-exchange.

Analyzing clinical parasitic worm specimens using DEG-PCR
Infections caused by viruses, bacteria, and parasitic worms are major threats to human health worldwide. The extensive use of antibiotics for treating various infectious diseases (often because of insufficient diagnosis) also leads to the issue of drug resistance. Therefore, an ideal test for diagnosing infectious diseases should not only detect specific pathogens with high accuracy but also screen for or identify drug resistance and thus guide treatment. Toward this goal, we engineered DEG-PCR by introducing a dual reporter system that allows the simultaneous detection of infections caused by Trichuris trichiura (TT) and screening for drug resistance. The first reporter (FAM-reporter), which operates on the principle of toehold exchange, is designed to target a specific A to T mutation at the 200th codon of β-tubulin, a well-established hotspot in TT for resistance to benzimidazole (BZ) drugs. As such, the fluorescence of