## Introduction

The advance of high throughput sequencing has uncovered a large number of biomedically relevant DNA sequences, from driver mutations in cancer to new bacterial/viral pathogen DNA sequences to microbiome metagenomic profiles that affect mental disorders on the gut-brain axis1,2,3,4. For discovery applications, “shotgun” whole genome sequencing (WGS) is the preferred approach to identify novel DNA sequences of interest5. However, the human genome comprises over 3 billion nucleotides, and despite the lowering costs of high-throughput sequencing, it is not practical today to perform WGS to high depths necessary for identification of subclonal mutations, such as somatic mutations in cancer. For routine detection of disease-relevant DNA variants in known genes of interest, targeted sequencing or direct qPCR approaches are typically used6,7. Of the two dominant methods today for target enrichment, multiplex PCR tends to have shorter workflows and require less DNA input than hybrid-capture probes8. However, multiplex PCR struggles to scale to large panels covering hundreds of genes, due to the nonlinear increase of primer dimer species that reduce NGS mapping rates and increase effective cost9.

Currently, multiplex PCR methods for NGS target enrichment (e.g., Ampliseq8) primarily rely on (1) enzymatic digestion of modified bases in primers10 and (2) DNA size selection to preferentially remove short amplicon species likely to be primer dimers. However, both steps are labor-intensive and cannot be applied universally to all multiplexed PCR reactions. In contrast, relatively little systematic work has been reported on computational approaches to minimizing the formation of primer dimers in the first place. To the best of our knowledge, existing multiplex primer design algorithm never exceeded 70 primer pairs in one tube11,12,13,14. This is mainly due to the high computational cost when the number of primers increases15. The development of a robust multiplex primer set design algorithm that produces highly multiplexed primer sets with minimal primer dimer formation could allow further scaling of multiplex PCR target enrichment to even larger NGS panels when combined with enzymatic and size selection methods. Alternatively, it can simplify the workflow of moderate size NGS and qPCR assays by removing the need for strict contamination control from open-tube steps.

There are two primary challenges in designing highly multiplexed PCR primer sets: First, for an N-plex PCR primer set comprising 2N primers, there are $$\left(\begin{array}{l}2N\\ 2\end{array}\right)$$ possible simple primer dimer interactions. For N = 50, this corresponds to $$\left(\begin{array}{l}100\\ 2\end{array}\right)$$ = 4950 times as many potential primer dimer bindings as for a single-plex PCR primer set. Second, there are typically M > 10 reasonable candidate choices for each primer when considering specific gene targets and amplicon length constraints, resulting in M2N possible N-plex primer sets. For M = 20 and N = 50, the number of possible primer sets is 20100 ≈ 1.3 × 10130, billions of times larger than the number of atoms in the universe. Thus, it is computationally intractable to evaluate all possible multiplex primer sets. Simultaneously, primer dimer formation emerges from the interactions of two or more primers in the primer set, so changing the sequence of a primer to mitigate one primer dimer interaction may result in the appearance of another more serious primer dimer. In the language of numerical optimization, multiplex primer design is high dimensional problem with a highly non-convex fitness landscape. Consequently, standard convex optimization algorithms (e.g., gradient descent) will not be effective.

Here, we present Simulated Annealing Design using Dimer Likelihood Estimation (SADDLE), an algorithmic framework for designing highly multiplex PCR primer sets. Within this framework, we present an example multiplex primer design algorithm, comprising an algorithm for primer candidate generation and a rapidly computable Loss function for estimating primer dimers. Using the SADDLE, we designed and experimentally tested multiplex primer sets comprising 192 primers (96-plex) and 784 primers (384-plex), and show low primer dimer formation through NGS experiments. Building upon this success, we built a single-tube 60 primer qPCR and Sanger assay to detect and identify 56 gene fusions with clinical actionability for non-small cell lung cancer.

## Results

### Simulated Annealing Design using Dimer Loss Estimation (SADDLE)

There are six main steps in SADDLE, as illustrated in Fig. 1:

1. 1.

Generation of forward primer (fP) and reverse primer (rP) candidates for each gene target.

2. 2.

Selection of an initial primer set S0 from the primer candidates.

3. 3.

Evaluation of the Loss function L(S) on the initial primer set S0.

4. 4.

Generate a temporary primer set T based on set Sg (primer set from generation g) by randomly changing 1 or more primers.

5. 5.

Evaluate L(T), and set Sg+1 to either Sg (no change) or T, depending probabilistically on the relative values of L(Sg) and L(T).

6. 6.

Repeat steps 4 to 5 until an acceptable primer set Sfinal is constructed.

The above abstract framework provides a basis for many potential multiplex primer design algorithms, depending on the specific details of primer candidate generation, form of Loss function, temporary set T generation, and the dynamic probability of setting Sg+1 to T. Below, we describe our specific implementation of SADDLE, based on our accumulated understanding of primer design principles and primer dimer formation mechanisms. Given the infinite possibilities for function forms and hyper-parameters, we did not systematically evaluate or optimize at the high-level. Lower-level parameters, such as standard free energy (ΔG°) ranges for primers, were experimentally optimized and these are described below.

1. Primer candidate generation. We begin our implementation of primer candidate generation through the selection of one or more “pivot” nucleotides on human genomic DNA around which we design the forward and reverse primers (Fig. 2a). The pivot nucleotides are the ones that must be included in the amplicon insert, and for example could be the hotspot region of a gene that is frequently mutated. From the pivot nucleotides and a constraint on the maximum length of the amplicons (e.g., determined by the read length of NGS), we can systematically generate a series of different proto-primers with 3′ end just outside the pivot nucleotides. The proto-primers have a large range of different lengths and binding energies to their complementary sequences, and will next be trimmed at the 3′ end to generate the primer candidates (Fig. 2a).

From our past experiences and preliminary optimization experiments, primers that hybridize to their cognate templates with ΔG° ≈ −11.5 kcal/mol have the best tradeoff between amplification efficiency/uniformity and nonspecific hybridization. Shorter primers may not bind consistently with high efficiency to their templates, resulting in variability in amplification efficiency and non-uniformity of amplicon on-target reads in the NGS library. Longer primers have increased likelihood of binding to other loci in human genome, and can result in non-specific amplicons. Based on this ΔG° goal, we next systematically constructed primer candidates from the proto-primers by truncating nucleotides from the 3′ end until the primer candidate has ΔG° between −10.5 kcal/mol and −12.5 kcal/mol (Supplementary Section S9). Due to the granularity of ΔG° for base stacks, some proto-primers with the same 5′ end will result in multiple primer candidates (e.g., with ΔG° = −10.9 kcal/mol and −12.0 kcal/mol). Optionally, one could implement additional filters here to remove undesirable prime candidates, such as based on G/C content. For our demonstration panels, we restrict the G/C content of primer candidates to be between 0.25 and 0.75, removing primer candidates with G/C content outside this range.

In the implementation of SADDLE, primer candidates can be treated as individual primers or as primer pairs. Our specific implementation treats primers as pairs, so we next combinatorially generate all candidate primer pairs for an DNA target, in order to better constrain the distribution of amplicon lengths. Any candidate primer pairs that generate amplicons with length exceeding our maximum amplicon length or below our minimum are removed.

2. Initial primer set S0 selection. We randomly selected a primer pair candidate for every amplicon that we wish to design, and collectively the selected primers are known as the initial primer set S0.

3. Evaluation of Loss function L(S) on S0. The Loss function L(S) is a rapidly computable function that aims to approximate the severity of primer dimer formation by a primer set S. L(S) sums the potential primer dimer interactions between every pair of primers in the primer set. To prevent confusion, we refer to the predicted formation of dimers for a particular pair of primers to be the Badness. Mathematically,

$${{\mbox{L}}}\,(S) =\mathop{\sum}\limits_{b\ge a}\,{{\mbox{Badness}}}\,({p}_{{{{{{\mathrm{a}}}}}}},{p}_{{{{{{\mathrm{b}}}}}}})\\ =\frac{1}{2}\cdot \mathop{\sum }\limits_{a=1}^{2N}\mathop{\sum }\limits_{b=1}^{2N}\,{{\mbox{Badness}}}\,({p}_{{{{{{\mathrm{a}}}}}}},{p}_{{{{{{\mathrm{b}}}}}}})+\underbrace{\frac{1}{2}\cdot \mathop{\sum }\limits_{a = 1}^{2N}{{\mbox{Badness}}({p}_{{{{{{\mathrm{a}}}}}}},{p}_{{{{{{\mathrm{a}}}}}}})}}_{\begin{array}{c}{{\mbox{pre-calculated}}}\end{array}}$$
(1)

where pa and pb are the ath and bth primer in primer set S, respectively (Fig. 1). Note that the second term of L(S) can be calculated in advance during primer candidates generation. One can imagine the Badness function to be proportional to the amount of primer dimers formed by two primers. In an optimized primer set with a relatively low concentration of primer dimers compared to the concentration of on-target amplicons, the amount of primer dimers formed between primer pa and pb should not significantly impact the amount of dimers formed between pa and pc, so the Loss function being defined as the sum of the component Badness functions is justified.

The Badness function in our implementation is defined as follows (Fig. 2b):

$$\,{{\mbox{Badness}}}({p}_{{{{{{\mathrm{a}}}}}}},{p}_{{{{{{\mathrm{b}}}}}}})=\sum \frac{{2}^{{{{{\mathrm{len}}}}}}\cdot {2}^{{{{{\mathrm{numGC}}}}}}}{({d}_{1}+1)({d}_{2}+1)}$$
(2)

The sum in the Badness definition is over all reverse complementary subsequences between primer pa and pb with at least 4 nt of continuous complementarity. len is the length of the subsequence, d1 and d2 are the distances of the subsequence to the 3′ ends of primer pa and pb, respectively, and numGC is the number of G/C nucleotides in the subsequence. Our choice of 4 nt is based on our preliminary experimental studies in qPCR showing that up to 3 nt of complementarity at the 3′ ends of two primers will not result in significant primer dimers even in no-template control (NTC) reactions.

For each complementary subsequence, its length (len) and its number of GC nucleotides in complementary subsequence (numGC) contribute exponentially to Badness. Thus, the exponential components of Badness roughly reflect the partition function of the complementarity interaction, with G/C base pairs roughly twice as strong as A/T base pairs. We chose to use these simplistic parameters, rather literature base stacking thermodynamics parameters16,17,18, because there is significant uncertainty in the effective salinity of the PCR reaction buffer, and because our previous studies on DNA thermodynamics suggests that previously reported ΔH° and ΔS° parameters do not extrapolate well to higher temperatures19.

The distances of the complementary subsequence to the 3′ ends of primers pa and pb, denoted as d1 and d2, are known to significantly affect the likelihood of primer dimer formation. In our preliminary qPCR experiments, we observed that a primer pair with 10 nt of complementarity at the 5′ end will not result in observable primer dimer formation, but a primer pair with 5 nt of complementarity at the 3′ end would. Depending on whether the specific DNA polymerase used, different d1-based and d2-based attenuation of Badness may be optimal for minimizing primer dimers. Because high-fidelity DNA polymerases with $${3}^{\prime} > \, {5}^{\prime}$$ exonuclease activity can remove mismatched 3′ nucleotides, the optimal d1-based and d2-based attenuation should be significantly weaker for high-fidelity DNA polymerases.

The evaluation of the Badness function is single largest component of software runtime cost, due to the large number of times the Badness function will be evaluated. For a primer that is 25 nt in length, there are 22 subsequences of length 4, 21 subsequences of length 5, etc. Evaluation of Badness for a single primer pair would thus have time complexity of O(P3), where P is the length of each primer (Eq. 2). In our specific implementation, subsequence length len also has a maximum of 8 nt, decreasing the time complexity to O(P2). Naively, evaluation of the Loss L(S) of the whole primer set would have time complexity of O(N2 × P2) (Eq. 1). However, due to the additive nature of subsequence components to the overall Badness function, we implement more rapid Badness evaluation by using a hash table20, as shown in Eq. (3), where H is the hash table, s is a subsequence of the primer set, d is the distance to 3$${}^{\prime}$$ ends of each occurrence of s, and revcomp is a function that converts s to its reverse complement sequence (Fig. 2b). The time complexity to set up the hash table is O(N × P) to calculate the hash value for each unique subsequence in the primer set, and the time complexity to evaluate the L(S) by stepping through all subsequences of all primers is also O(N × P) (Eq. 3). Consequently, the overall time complexity of evaluating L(S) is O(N × P) for all primers in S.

$${{\mbox{H}}}\,[s]=\sum \frac{1}{d+1}\\ \mathop{\sum }\limits_{a=1}^{2N}\mathop{\sum }\limits_{b=1}^{2N}\,{{\mbox{Badness}}}({p}_{{{{{{\mathrm{a}}}}}}},{p}_{{{{{{\mathrm{b}}}}}}})=\sum \frac{{2}^{{{{{\mathrm{len}}}}}}\cdot {2}^{{{{{\mathrm{numGC}}}}}}}{d+1}{{\mbox{H}}}\,[\,{{\mbox{revcomp}}}\,(s)]$$
(3)

4. Generate temporary primer set T based on Sg. Step 4 begins the recursive optimization process. Based on the current primer set Sg at generation g, we first randomly select one primer pair to “mutate.” For that primer pair, we randomly select a different primer pair from the list of all candidate primer pairs generated in Step 1. Temporary primer set T is thus generated by combining this new primer pair with all remaining primers in set Sg. Optionally, multiple primer pairs can be replaced simultaneously in this step to allow faster and more efficient exploration of the primer set space. In our preliminary in silico evaluations, we found that simultaneously mutating multiple primer pairs generally caused a slowdown of the optimization process.

5. Evaluate L(T) and set Sg+1 to be either T or Sg. The Loss of temporary primer set T can be evaluated significantly faster than the initial evaluation of L(S0), because the hash table only need to be modified based on the changed primers.

We next compare the value of L(T) vs. L(Sg). If L(T) is smaller than L(Sg), then the primer pair change was an improvement and accepted, so Sg+1 is set to T. If L(T) is larger than L(Sg), the change was detrimental, but we will still accept the change with a certain probability, as part of the simulated annealing algorithm21. To clarify, “simulated annealing” here refers to a specific computer science algorithm, and not a literal simulation of a physical DNA thermal annealing process. If we never accept any detrimental primer pair changes, then the approach degenerates to become a stochastic gradient descent approach. In preliminary in silico evaluations, we confirmed that stochastic gradient descent produces final primer sets with significantly worse Loss, because it becomes too easy to get stuck in a local Loss minima.

The probability of accepting a detrimental change depends on both the magnitude of the detriment (L(T) − L(Sg)) and the generation g of the optimization. Worse changes with higher L(T) are accepted with lower probability, and later generations of the optimization (higher g) are less tolerant of detrimental changes. In our implementations, the probability of setting Sg+1 to be T when L(T) is greater than L(Sg) are as follows:

$$p=\left\{\begin{array}{ll}{e}^{\frac{{{{{{{\rm{L}}}}}}}({S}_{{{{{{\mathrm{g}}}}}}})-{{{{{{\rm{L}}}}}}}(T)}{{{{{{{\rm{C}}}}}}}(g)}}&\,(g \; < \; {g}_{{{{{{\mathrm{t}}}}}}})\\ 0&\,(g\;\ge\; {g}_{{{{{{\mathrm{t}}}}}}})\end{array}\right.$$
(4)

where e is Euler’s number and gt is a positive interger. C(g) is a function that is monotonically non-increasing in g, indicating decreasing tolerance to detrimental changes at later generations. The parameter gt indicates the generation in which simulated annealing terminates, and we switch over to stochastic gradient descent.

6. Repeat steps 4 and 5. Steps 4 and 5 are repeated until either a pre-determined generation g, or until L(Sg) is below a pre-determined threshold Lt. In our implementation, we typically run the optimization to about 1.5 × gt to ensure we reach local minima. To further improve the overall quality of the generated primer set, we recommend running multiple SADDLE optimization processes with different starting conditions (initial primer sets) and selecting the best final primer set.

### Design and experimental evaluation of a 96-plex primer set

We first used SADDLE to optimize the design of a 96-plex primer set, each amplicon target one arbitrarily selected exon of a different cancer-related gene22,23,24,25 (Fig. 2c). Figure 2d shows the calculated value of L(Sg) at different generations g, and is representative of our typical optimization trajectory. We selected the designed primer sets at three different optimization generations for experimental testing: PS1 (initial unoptimized primer set), PS2 (primer set with intermediate Loss optimization), and PS3 (primer set with saturating Loss optimization). The primer set Loss decreased roughly 24-fold from PS1 to PS3; after 40,000 generations, only very marginal improvements were observed. We chose the primer set at 40,000 generations as PS3, rather than the one at 60,000 generations, because we know that our Loss function is an imperfect predictor of primer dimers. Over-training on an imperfect Loss function can lead to worse experimental results. The optimization finished in about 10 min under a conventional laptop with MATLAB R2021b and Linux operating system.

We applied each of the three primer sets individually to human genomic DNA (10 ng NA18562, sheared to a mean length of approximately 150 nt) and amplified for 17 cycles. We next constructed NGS libraries from the amplicons generated using PS1, PS2, and PS3, using a standard adaptor ligation protocol (Supplementary Section S1). After library preparation, capillary electrophoresis results show a clear increase of amplicons of the expected length from PS1 to PS2 to PS3 (Fig. 2e). In the NGS data analysis workflow, after the first step of adapter trimming, we separated NGS reads into three major species: on-target amplicons, dimers, and non-specific amplicons (Supplementary Section S2). On-target amplicons are the NGS reads that were successfully aligned to the intended amplicon sequences using Bowtie226. The remaining NGS reads were aligned separately to each forward and reverse primer sequence. Reads with insert length shorter than the sum of the two aligned primers are classified as Dimers, and reads with insert length longer than the sum of the two aligned primers are classified as Non-specific amplicons (amplifying unintended regions of the human genome).

The amounts of these three species in the three primer set libraries are shown in Fig. 3a. Going from the PS1 to the PS3 library, the fraction of primer dimers dropped significantly, from 90.7% in the PS1 library to 39.6% in the PS2 library and then to 4.9% in the PS3 library. However, even with the decrease of dimers from the PS2 library to the PS3 library, the proportion of non-specific amplicons in these two libraries remained about the same. This is reasonable because the SADDLE Loss function was designed only minimizes primer Dimers, and does not consider likelihood of Non-specific amplicon formation. The distribution of amplicon length in NGS reads is consistent with the capillary electrophoresis results in three libraries (Fig. 3b and Supplementary section S3).

We next tested the PS3 primer set on five formalin-fixed, paraffin-embedded (FFPE) clinical tissue samples (one breast cancer, two lung cancer, and two colorectal cancer samples, see also Supplementary Section S5). The beeswarm plot of the observed reads (Fig. 3d) show high consistency across the different samples, and are also consistent with our results from sheared genomic DNA. The identities and quantities of primer dimers formed, likewise, are similar between FFPE DNA samples and genomic DNA (Fig. 3e).

To demonstrate the scalability of SADDLE, we next designed and tested a 384 amplicon panel comprising 768 primers. The optimization finished in about 60 min under a conventional laptop with MATLAB R2021b and Linux operating system. Due to the high cost of primer synthesis for this large panel, we only experimentally tested the final primer set design. Surprisingly, the observed Dimer fraction was only 1% for this library, using an input of 40 ng sheared NA18562 genomic DNA (Fig. 3f). Roughly 56% of the reads were Non-specific amplicons, resulting in a NGS library on-target rate of 43% (Supplementary section S6).

### Accuracy of the dimerization prediction

We constructed the SADDLE Badness function based on our understanding of the mechanisms of primer dimer formation, but we know that this Badness function is imperfect both because our understanding of primer dimer formation is imperfect, and because it is computationally too expensive to implement many classes of potentially more accurate Badness functions. Accurate assessment of how good or bad the current Badness function is at predicting Dimers, however, is critical to further incremental improvement in multiplex PCR primer design using SADDLE.

Through the course of SADDLE optimization, we expect that the Dimer prediction accuracy will get worse in later optimization generations, because we are selecting for primer sets with low expected Badness that will include false negatives. Experiments and analysis of PS1, PS2, and PS3 confirm this understanding (Supplementary Section S7). The Dimer reads for each pair of primers from PS1 are plotted against the predicted Badness in Fig. 4a.

To facilitate discussions of Badness function accuracy in terms of sensitivity and specificity, we set two separate thresholds: the Reads Threshold (horizontal orange line) and the Badness Threshold (vertical dotted purple line). The plotted Reads Threshold in Fig. 4a corresponds to the mean on-target read depth, and the Badness Threshold plotted correspond to the value that maximizes prediction sensitivity plus specificity. For these Threshold values, we observe a sensitivity 92.5% ($$\frac{62}{67}$$) and a specificity of 90.3% ($$\frac{33,226}{36,797}$$). By adjusting the Badness Threshold value, we can change the tradeoff between sensitivity and specificity, resulting in a receiver operator characteristic (ROC) curve (Fig. 4b). The area under the ROC curve (AUROC) is 0.9577, indicating very high Dimer prediction accuracy by the Badness function. When the Read Threshold is adjusted higher, the AUROC also increases (Fig. 4c), but the positive predictive value (PPV) decreases.

We next examined the top five most dominant Dimer reads in the library (Fig. 4d) and compared them to the top five predicted dimer reads based on the Badness function (Fig. 4e). It is noteworthy that only one of the two different top five lists overlap. The other four predicted dimers did not contribute significantly experimentally, and the other four observed dimers were not predicted to have high risk for dimer formation. At a glance, it appears we over-weighted the possibility of forming primer dimers in which the 3′-most nucleotide in unpaired, and we may need to adjust the Badness function to allow a stronger attenuation of Badness based on distance from the 3′ end. Additionally, it appears that the Badness function may be not scaled optimally, as the log10(Badness) ranges between 0 and 3.5, whereas the log10(Dimers) ranges between 0 and 5 (Fig. 4f). This may mean that the current algorithm over-weights weak potential dimers, at the expense of insufficiently avoiding strong predicted primer dimers.

Beyond the above observations, it is not clear why some dimers are observed at much higher reads experimentally than others. For example, the top observed dimer only has a 5 nt overlap at the 3′ end, compared to a 7 nt overlap at the 3′ end for the rank 4 dimer. This is not consistent with our understanding of DNA hybridization and polymerase extension kinetics, and implies that we may not be able to generate a perfect Badness function even ignoring computational resource constraints.

### Gene fusion detection with qPCR and sanger sequencing

Gene fusions are therapeutic targets and attractive diagnostic biomarkers to guide treatment27,28,29,30. Currently, gene fusions are detected either in single-plex by qPCR for known high-frequency fusions (e.g., BCR-ABL1), or by NGS. A highly multiplexed qPCR assay that can detect tens of potential gene fusions relevant to a particular disease could greatly increase the accessibility of gene fusion testing.

Here, we used SADDLE to design a set of 60 primers to detect 56 actionable gene fusions for non-small cell lung cancer (NSCLC) across six genes (ALK, ROS1, RET, NRTK1, NTRK2, and NRTK3). The number of primers are lower than 56 × 2 because the same exon can be fused with multiple partner genes or exons. We detect the fusions in complementary DNA (cDNA) reverse transcribed from RNA, in order to limit the complexity and length of the detection targets. For each fusion of interest, the primer set includes a forward primer (fP) targets the upstream partner gene and a reverse primer (rP) targets the downstream partner gene (Fig. 5a).

We first tested the multiplex PCR panel against synthetic samples bearing the gene fusions of interest (Fig. 5b, c). In all cases, the positive samples were clearly distinguishable by cycle threshold (Ct) value against both commercial wildtype cDNA (WT) and the no-template control (NTC), with all ΔCt values above 10. We also tested the panel on synthetic gene fusion samples with a variant allele frequency (VAF) of 1% (Supplementary Section S8). The 1% VAF samples were constructed by mixing synthetic gBlocks that contained a single fusion (the variant) with human cDNA (the wildtype).

Finally, we applied the gene fusion qPCR panel to clinical cDNA samples extracted from extracellular vesicles in blood plasma from NSCLC patients (Fig. 5d). Of the ten clinical samples analyzed, three were called positive for gene fusions. To identify the exact gene fusion in these samples, we performed Sanger sequencing on the amplicons from the positive samples. Two samples were identified with EML4 exon20-ALK exon20, and one was identified with EML4 exon 15-ALK exon 20.

## Discussion

In this study, we designed a multiplex PCR primer design algorithm SADDLE targeting numerous genomic regions in a single tube. We presented experimental validation of primer sets on a 96-plex cancer-related exons panel, demonstrating that the SADDLE was capable in selecting better primers by reducing dimerization in a multiplex PCR reaction. The dimer rate decreased going from the 90.7% in a naively designed PS1 to 39.6% in an intermediate PS2 and to 4.9% in an optimized PS3, resulting in an increased on-target rate as well as greater uniformity of on-target amplicons. In another 384-plex panel targeting random-selected SNPs in the human genome, the NGS library using the optimized primer set showed a dimer rate of 1%. SADDLE can reduce reagent costs and enable the amplification of hundreds of target templates simultaneously without wasting NGS reads. Importantly, library preparation using optimized primer sets generated by SADDLE does not depend on labor-intensive enzymatic cleavage or size selection steps to remove dimers.

The improvement of NGS library on-target rates through the reduction of primer dimers can allow significantly larger targeted panels to be possible using multiplex PCR library preparation. Because multiplex PCR generally requires less input DNA and are faster than ligation-based library preparation approaches31, due to the low yields of end repair and ligation, we envision that SADDLE-designed primers can be useful for a variety of research and clinical applications where DNA sample quantities are limited and/or where rapid turnaround is needed. For example, in oncology tissue biopsies obtained through fine needle aspirates and core biopsies are frequently insufficient for standard NGS analysis, and cell-free DNA from peripheral blood plasma likewise are limited and impose sensitivity limitations to ligation-based approaches32. Furthermore, in reproductive medicine, samples from amniocentesis and preimplantation genetic screening (PGS) and preimplantation genetic diagnosis (PGD) are also very limited, and require rapid turnaround for molecular diagnostics due to the time-sensitivity of clinical decisions33.

Through our analyses of predicted vs. observed dimers, we found that the parameters in the Loss function used in SADDLE could be adjusted to optimize dimer prediction performance, particularly in the 3$${}^{\prime}$$ distance attenuation. However, with the current SADDLE algorithm, Non-specific amplicons now appear to dominate off-target rates, rather than Dimers. Thus, to further scale-up the panels that can be designed by SADDLE, it will be necessary to construct and optimize new Loss functions that penalize primer sets based on predicted off-target genomic amplification. Modification of the Loss function to minimize Non-specific amplicon formation would require significantly more work, as it requires consideration of the expected sample genome sequence. Whereas the current Loss function is “universal” in improving multiplex PCR primer set designs, a Loss function that considers Non-specific amplicons would inherently be suboptimal for primer dimer minimization. A Loss function predicting Non-specific amplification must also consider external factors, including the average length of the DNA molecules in the sample and nonpathogenic genomic polymorphisms. Current Loss function can be further improved based on NGS data and other methods including machine learning34.

In medical and research applications where the cost of NGS cannot be economically justified35, qPCR assays will likely be the dominant tool for study of genomic variants. In qPCR, even single-plex primers can form significant dimers if poorly designed with Ct values below 30. Multiplex qPCR thus typically requires significant empirical optimization, even at around 4-plex11. SADDLE allowed us to successfully design a 60-primer qPCR panel targeting 56 gene fusions, and exact fusion identities can be determined through affordable Sanger sequencing. Thus, we envision that SADDLE can revolutionize the use of qPCR for highly multiplex molecular diagnostics.

## Methods

### Ethical approval

All procedures performed in studies involving human participants were approved by the ethics committees of Shanghai Pulmonary Hospital, Tongji University (protocol K19-155Y), and were in accordance with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all participants.

### Oligonucleotides

All primers were purchased as standard desalted DNA oligonucleotides (Integrated DNA Technologies), and stored at 4 °C.

### Samples

Synthetic DNA templates were purchased as desalted DNA oligonucleotides (gBlocks, Integrated DNA Technologies), and stored at −20 °C. Human cell-line gDNA sample NA18562 (Coriell Biorepository) was stored at −20 °C. The gDNA was mixed with synthetic DNA templates at various ratios to create samples containing different proportions of a specific variant sequence. Dilution of gDNA samples and synthetic DNA templates were made in 1× TE buffer with 0.1% Tween 20 (Sigma Aldrich).

FFPE slides were purchased from Coriell Institute. FFPE DNA was extracted from GeneRead DNA FFPE Kit (Qiagen).

Ten plasma samples from ten NSCLC patients in de-identified format were collected from Shanghai Pulmonary Hospital. RNA in extracellular vesicles was extracted with exoRNeasy Serum/Plasma Kit (Qiagen). cDNA was synthesized with SuperScript IV First-Strand Synthesis System (ThermoFisher Scientific).

### Multiplex PCR protocol

Multiplex PCR was performed on a T100 Thermocycler or a C1000 Thermocycler (Bio-Rad). The total volume of each reaction was 50 µl. DNA sample input ranged from 10 to 100 ng per tube. PCR reagents including vent (exo-) polymerase, ThermoPol Reaction Buffer (10×), and dNTP (New England Biolabs) were used for enzymatic amplification. Thermal cycling started with a 3 min incubation step at 95 °C for polymerase activation, followed by 17 cycles of 30 s at 95 °C for DNA denaturing, 3 min at 60 °C for annealing, and 30 s at 72 °C for extension, followed by a final extension of 5 min at 72 °C. Detailed experiment protocol for the NGS library preparation can be found in Supplementary Section S1, S3.

### End repair protocol

Multiplex PCR product was end-repaired using NEBNext® Ultra II End Repair/dA-Tailing Module (New England Biolabs). Each reaction was a mixture of 3 μl NEBNext Ultra II End Prep Enzyme Mix, 7 μl NEBNext Ultra II End Prep Reaction Buffer, 20 μl multiplex PCR products, and 30 μl H2O. End repair was performed on a Eppendorf Mastercycler. Thermal cycling started with the incubation at 20 °C for 30 min and 65 °C for 30 min, with the heated lid set to 80 °C.

End repair mixture was ligated with adapters using NEBNext® Ultra II Ligation Module (New England Biolabs). Each reaction was a mixture of 30 μl NEBNext Ultra II Ligation Master Mix, 1 μl NEBNext Ligation Enhancer, 2.5 μl NEBNext Adaptor for Illumina, and 60 μl previous End repair mixture. Ligation was performed on a Mastercycler from Eppendorf. Thermocycling started with the incubation at 20 °C for 15 min with the heated lid off; after adding 3 μl USER enzyme to the ligation mixture, the reaction was incubated at 37 °C for 15 min with the heated lid set to 55 °C.

### Index quantitative PCR

Following adapter ligation, Index qPCR was performed on CFX96 Touch Deep Well Real-Time PCR Detection system (Bio-Rad). Quantification of different libraries was performed simultaneously in each well. Each reaction was a 10 μl mixture, with 1 μl i5 index, 1 μl i7 index, 1 μl ligation products, 2 μl Milli-Q, and 5 μl PowerUp SYBR Green Master Mix. Experiment was performed following a thermal cycling protocol with a 3 min incubation step at 95 °C for polymerase activation, followed by 40 cycles of 10 s at 95 °C for DNA denaturing and 30 s at 60 °C for annealing and extension. Ct values were obtained directly from the CFX96 system.

### Index PCR

Index PCR was performed on a T100 Thermocycler or a C1000 Thermocycler (Bio-Rad). Index primers used were NEBNext® Multiplex Oligos for Illumina® (New England Biolabs). Each reaction was a mixture of 2 μl each i5 and i7 index primers, 5 μl ligation products, and PCR reagents including vent (exo-) polymerase, ThermoPol Reaction Buffer (10×), and dNTP. The volume of each reaction was 52 μl. Thermal cycling started with a 3 min incubation step at 95 °C for polymerase activation, followed by various cycles of 30 s at 95 °C for DNA denaturing and 30 s at 60 °C for annealing, and 30 s at 72 °C for extension, followed by a final extension of 5 min at 72°.

### Column purification

Multiplex PCR products and ligation products were all purified using DNA Clean & Concentrator Kits (ZYMO Research). The volume of DNA-binding buffer was 250 μl for multiplex PCR products clean-up, and 482.5 μl for ligation products clean-up; 25 μl Milli-Q water was used as elution buffer for each reaction.

Index PCR product was purified using AMPure XP beads (Beckman Coulter). For each 50 μl reaction mixture, 90 μl of beads was added; 40 μl Milli-Q water was used as elution buffer.

### Library quantitation

All the libraries were quantified using the Qubit dsDNA HS Assay Kit (ThermoFisher Scientific).

### Bioanalyzer

Sizes of PCR products and libraries were measured using Bioanalyzer High Sensitivity DNA Assay (Agilent), and DNA chips were run on the Agilent 2100 Bioanalyzer system.

### Next-generation sequencing

All the libraries were loaded on a Miseq Reagent V2 for obtaining pair-end reads and were sequenced on a Miseq (Illumina).

### Sanger sequencing

PCR products were purified and prepared using a BigDye Terminator v1.1 Cycle Sequencing Kit (Thermo Fisher Scientific) and were sequenced on a Thermo Fisher Scientific 3500 Series Genetic Analyzer. Detailed experiment protocol for Sanger Sequencing can be found in Supplementary Section S8.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.