POSoligo software for in vitro gene synthesis

Oligonucleotide synthesis is vital for molecular experiments, and bioinformatics has been employed to create various algorithmic tools for the in vitro synthesis of nucleotides. The main approach to synthesizing long-chain DNA molecules is to link short-chain oligonucleotides by ligase chain reaction (LCR) and polymerase chain reaction (PCR). Short-chain DNA molecules have low mutation rates, but LCR requires complementary interfaces at both ends of the two nucleic acid molecules; otherwise the conformation of the nucleotide chain may be altered, leading to termination of amplification. Therefore, molecular melting temperature, length, and specificity must be considered during experimental design. POSoligo is a specialized offline tool for nucleotide fragment synthesis. It optimizes oligonucleotide length and specificity based on an input single-stranded DNA sequence, producing multiple contiguous long strands (COS) and short patch strands (POS) with complementary ends. This design keeps the 5′- and 3′-ends free during oligonucleotide synthesis, preventing secondary structure formation and ensuring specific binding between COS and POS without relying on Tm values to stabilize the complementary strands. POSoligo was used to synthesize the linear RBD sequence of SARS-CoV-2 using only one DNA strand, several POSs for LCR ligation, and two pairs of primers for PCR amplification in a time- and cost-effective manner.

melting temperature) to ensure uniform hybridization during assembly and high specificity of oligonucleotides for the target to avoid incorrect assembly.
Our POSoligo software was based on the POS method 27, utilizing single-stranded DNA as a template, thereby circumventing the need for specific thermodynamic properties in each segment of double-stranded DNA. Moreover, the short patch chains exhibit high specificity for adjacent template chains, thereby reducing the likelihood of mutations during synthesis. In terms of versatility, this software supports the input of single-stranded DNA and RNA sequences.

Materials and methods
All oligonucleotides were procured from Sangon Biotech Bioengineering (Shanghai) Co. A comprehensive index of sequences was compiled by replicating the designs outlined in the procedural section.

Polymerase chain reaction amplification
A partial double-stranded DNA template was obtained by PCR amplification using the outermost primers. The reaction mixture comprised 1 μL of LCR product, 2.5 μL of dNTPs (2 mM), 1 μL of each primer (0.2 μM), 5 μL of 10× Pfu DNA polymerase buffer, high-fidelity Pfu DNA polymerase (5 U/μL, Bio-Basic Inc., Ontario, Canada), and 38.5 μL of nuclease-free H2O. The PCR program was as follows: 94 °C for 3 min; 30 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 2 min; and a final extension at 72 °C for 5 min.

Amplification, cloning and sequencing of synthetic fragments
A 6-well plate was prepared with 3 wells of 293FT cells (including duplicate wells and a control) at a density of 1 × 10⁶/mL per well. For electro-transfection, 2 μg of gel-recovered DNA and 100 μL of electro-transfection buffer were added to the electro-transfection cup and mixed well in the X-Porator H1 electro-transfer apparatus; after transfection, the cells were returned to the incubator. RNA was extracted from 3 tubes of cytosol (C1, C2, CON) and reverse transcribed to cDNA. The cDNA served as a template, and the RBD-F and RBD-R primers (Table 1) were used to amplify the RBD target fragment. The PCR program was as follows: 94 °C for 3 min; 30 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 2 min; and a final extension at 72 °C for 5 min. The amplified fragment was ligated into the pGEM-T vector, and 2 μL of the ligation product was transformed into E. coli JM109. Recombinant plasmids were screened by blue/white colony selection, and plasmid extracted from the white colonies was sequenced for analysis.

Algorithm
Our software converts the input sequence into a set of oligonucleotides. The input sequence is interpreted as single-stranded DNA and divided into consecutive strands (COS) of 50-120 bp.
In addition, a complementary patch strand (POS) is designed for each junction between two adjacent COS; it hybridizes to the terminal regions of both strands and serves as a bridge between them. The two outermost terminal regions of the original sequence intentionally lack patch strands, which keeps them accessible.
Following LCR, primers are designed in the terminal regions of the long chain to facilitate PCR amplification.
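The splitting and bridging steps described above can be illustrated with a minimal Python sketch. It is not the POSoligo implementation (which is written in C++ and is not reproduced in this paper); the function names, the even-split rule, and the default arm length of 15 nt are assumptions made for illustration. Only the 50-120 length range and the idea of a patch complementary to both sides of a junction come from the text.

```python
import math

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def split_into_cos(template: str, min_len: int = 50, max_len: int = 120) -> list[str]:
    """Cut the template into contiguous, near-uniform segments (assumed rule)."""
    n = math.ceil(len(template) / max_len)   # fewest segments that respect max_len
    base, extra = divmod(len(template), n)   # first `extra` segments get one more nt
    if base < min_len:
        raise ValueError("cannot satisfy the requested length range")
    segments, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        segments.append(template[start:start + size])
        start += size
    return segments


def patch_strand(left: str, right: str, left_arm: int = 15, right_arm: int = 15) -> str:
    """Patch that pairs with the 3' end of `left` and the 5' end of `right`."""
    return reverse_complement(left[-left_arm:] + right[:right_arm])


def design_patches(segments: list[str], arm: int = 15) -> list[str]:
    """One patch per internal junction; the two outer ends get none."""
    return [patch_strand(a, b, arm, arm) for a, b in zip(segments, segments[1:])]


if __name__ == "__main__":
    # 3' end of RBD-2 and 5' start of RBD-3 as listed in Table 1.
    rbd2_tail = "TAGTACCTTTAAGTGTTATG"
    rbd3_head = "GGGTGAGCCCGACAAAAC"
    # Arm lengths of 15 and 9 nt are inferred from the listed P2 sequence.
    print(patch_strand(rbd2_tail, rbd3_head, 15, 9))  # GGCTCACCCCATAACACTTAAAGG
```

With arms of 15 and 9 nt, the sketch reproduces the P2 sequence listed in Table 1, which suggests that the published patch strands are reverse complements of the COS junctions, although the arm lengths evidently vary from junction to junction.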

Implementation
POSoligo was developed in C++ and accepts a sequence either typed directly into the interface or loaded from a .TXT file (Fig. 1). The algorithmic process for POS primarily involves designing a series of overlapping patches, aligning them based on their common sequences, and then merging them to generate the final DNA sequence. This iterative process ensures a satisfactory outcome. The C++ implementation employs several algorithms and techniques to optimize this process:
• Sequence alignment algorithms identify suitable target sequences for amplification or detection.
• Primer/probe design algorithms select appropriate, specific, and efficient oligonucleotide sequences.
• Secondary structure prediction algorithms prevent non-specific binding or unwanted interactions between oligonucleotides.
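As a rough illustration of what a "free end" check could look like, the following Python sketch tests whether the terminal bases of an oligonucleotide have an exact reverse-complementary partner elsewhere in the same strand. This is a deliberately crude stand-in for secondary structure prediction, not the method used by POSoligo: the function name `end_is_free`, the 6-nt window, and the 3-nt minimum loop are all assumptions, and a real predictor would use thermodynamic folding rather than exact string matching.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def end_is_free(oligo: str, end: str, window: int = 6, min_loop: int = 3) -> bool:
    """Return True if the terminal `window` bases cannot fold back onto the strand.

    `end` is "5" or "3". The terminal window is compared against the rest of
    the oligo, leaving at least `min_loop` unpaired bases for a hairpin loop.
    """
    if end not in ("5", "3"):
        raise ValueError("end must be '5' or '3'")
    oligo = oligo.upper()
    if end == "5":
        probe = oligo[:window]
        partner_region = oligo[window + min_loop:]
    else:
        probe = oligo[-window:]
        partner_region = oligo[:-(window + min_loop)]
    # An exact reverse-complement match means the end could form a stem.
    return reverse_complement(probe) not in partner_region


if __name__ == "__main__":
    hairpin = "GGGCCC" + "AAAA" + "GGGCCC"   # both ends can pair with each other
    print(end_is_free(hairpin, "5"), end_is_free(hairpin, "3"))  # False False
```

A design loop could use such a check to shift a cut position by a few bases whenever an end fails the test, but that step is not shown here.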

Design of an oligonucleotide set for SARS-CoV-2
During the coronavirus pandemic, we developed this software to help combat the virus, aiming to save time and streamline PCR synthesis. To evaluate POSoligo, we designed the nucleotide sequence of the coding region of the S1 protein gene of SARS-CoV-2 (GenBank accession no. QHD43416) encompassing the RBD amino acids Arg319-Lys529, a span of 211 residues (633 bp of coding sequence). Subsequently, we appended the CMV promoter at its 5′-end and the polyA tail at its 3′-end to construct the CMV + RBD + polyA60 expression frame (Fig. 2). The program generated a total of 34 oligonucleotides (Table 1), including CMV1-CMV10 and RBD1-RBD8 as long structural oligonucleotides (COS) ranging from 50-120 base pairs, and P1-P7 and CP1-CP9 as short POS spanning 22-30 base pairs. Notably, CMV-R served as an intermediate POS for joining the CMV promoter to the RBD reading frame, while CMV-F and RBD-R were used as upstream and downstream primers for PCR amplification after segment assembly. Additionally, RBD-F and RBD-R served as primers for amplifying the reverse-transcribed cDNA, which was then sequenced to validate correct expression of the RBD sequence. Furthermore, the software's folder was used to predict the secondary structure of COS RNA, ensuring free 5′- and 3′-ends while minimizing self-annealing in the middle portion, which readily denatures at high temperatures (Fig. 3).

Synthesis of SARS-CoV-2 RBD gene in vitro
CMV1-CMV10, RBD1-RBD8, P1-P7, CP1-CP9, and CMV-R (Table 1) were combined using the 2× Pfu PCR Mix system and subjected to LCR on MG25+ under the following conditions: 95 °C for 5 min; 45 cycles of 95 °C for 30 s, 51 °C for 20 s, and 45 °C for 4 min; and a final overnight incubation at 45 °C for high-temperature ligation.
The PCR reaction mixture comprised 1 μL of LCR product, 1 μL each of the CMV-F and RBD-R primers (Table 1), 12.5 μL of 2× Pfu PCR Mix, and 9.5 μL of ddH2O. PCR amplification followed this program: 94 °C for 3 min; 30 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 2 min; and a final extension at 72 °C for 5 min. Target bands were identified by 1% gel electrophoresis, excised, and purified using a gel recovery kit.
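As a quick consistency check on the recipe and program above, the short Python sketch below sums the listed volumes and the programmed hold times. The helper names are illustrative only, and the time estimate ignores ramping between temperatures, which depends on the instrument.

```python
def total_volume(components: dict[str, float]) -> float:
    """Sum the component volumes (in microlitres) of a reaction mix."""
    return sum(components.values())


def program_minutes(initial_s: int, cycles: int, step_seconds: list[int], final_s: int) -> float:
    """Programmed hold time of a thermocycler run in minutes (ramp time ignored)."""
    return (initial_s + cycles * sum(step_seconds) + final_s) / 60


# Volumes as stated in the text, in microlitres.
pcr_mix = {
    "LCR product": 1.0,
    "CMV-F primer": 1.0,
    "RBD-R primer": 1.0,
    "2x Pfu PCR Mix": 12.5,
    "ddH2O": 9.5,
}

volume = total_volume(pcr_mix)                        # 25.0 uL
mix_strength = 2 * pcr_mix["2x Pfu PCR Mix"] / volume  # 1.0, i.e. 1x final
# 94 C 3 min; 30 x (94 C 30 s, 55 C 30 s, 72 C 2 min); 72 C 5 min
run_time = program_minutes(180, 30, [30, 30, 120], 300)  # 98.0 min

if __name__ == "__main__":
    print(volume, mix_strength, run_time)
```

The volumes add up to 25 μL, so the 2× mix ends at 1× working strength, and the program holds for 98 min before ramp time is counted.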
Figure 4 (S1) illustrates the successful in vitro synthesis and amplification of the expression frame sequence of CMV + target antigen RBD + polyA60 (1530 bp) following the LCR + PCR reaction. The electrophoresis fragment matched the expected size, confirming the successful synthesis of the expression frame sequence.

Synthesis and validation of RBD gene
The LCR-PCR product was recovered from the gel, and its concentration was measured spectrophotometrically, yielding 0.32 μg/μL. Subsequently, the DNA recovered from the gel was transfected into 293FT cells and incubated for 48 h. The RNA extracted and measured spectrophotometrically at a concentration of 1.

Discussion
Our in vitro gene synthesis approach offers several advantages over many current methodologies and tools. In our previous gene synthesis work, designing a large number of primers for the PCR synthesis method often led to mismatches in repetitive sequences, resulting in shifting or loss of some repetitive sequences in the synthesized whole gene fragment. To reduce the error rate associated with manual primer design for whole gene synthesis, our software generates primers in batch, significantly reducing the mismatch rate and greatly facilitating whole gene synthesis.
Advantage 1: Most modern tools, such as Gene2Oligo 26 and Assembly PCR Oligo Maker 28, require dividing target nucleotides into stable and homogeneous oligonucleotides with consistent thermodynamic properties. In contrast, our software cleaves sequences into varying lengths while specifically keeping the 5′- and 3′-ends of each POS sequence free and unobstructed. This design greatly facilitates the correct ligation of adjacent structural oligonucleotides by Taq DNA ligase. Unlike techniques that rely on maintaining a consistent Tm value among different oligodeoxynucleotides, our method releases any residual secondary structure during denaturation at 95 °C in the LCR, without needing to consider the stability of the Tm value. While Stemmer et al. 18 synthesized a 2.7-kbp sequence in a single step, verification at an intermediate stage is necessary because the risk of errors increases in long sequences, reducing the likelihood of obtaining an accurate fragment after assembly 10-12. Therefore, shortening long sequences is crucial to improve assembly accuracy. In POSoligo, users can calculate increasingly uniform lengths of oligonucleotides within a defined range using the "segment length" option.
Advantage 2: Many current tools design complete double-stranded DNA using various methods involving special ends, enzymatic methods, or different lengths of isolated fragments. Our software streamlines the process: it avoids the steep learning curve associated with these tools and instead follows common assembly methods such as PCR and LCR 21,22,25,26,29-32.
POSoligo provides clear output labeling below the sequence alignment once the calculation is completed. Additionally, it generates an "output" file in the root folder, which can be easily viewed using Notepad or similar software. The results can be copied directly into an email or other applications, facilitating communication with the manufacturer of the synthesized sequence. We have also compiled the program in C++ to ensure usability across multiple systems, and we plan to develop a web version of the program in the near future to further enhance accessibility.
POSoligo is highly versatile and can be widely used for designing long DNA fragments in synthetic biology and biotechnology research.We are committed to continuously upgrading the functionality of the software to better serve the needs of researchers.
Name     Sequence
…        GCG AAT CTG GTT GCG TTG AAT
P2       GGC TCA CCC CAT AAC ACT TAA AGG
P3       CAA TCT TAC CTG TTT GGC CCG GTG CAA TTT
P4       TAG TTA CCC CCG ACT TTG CTA TCA AGATT
P5       GTG CTA CCA GCC TGG TAT ATT TCG GTA
P6       TGG TAA CCA ACG CCG TTT GTA GGT TGA AAT
P7       TCT AGA GGA TCT TAC TTC TTG GGC CCA CAA ACC
CP1      ATA CAA CGT ATG CAA TGG GCC AAG CTC ATG
CP2      TAA TAA CTA GTC AAT AAT CAA TGT CAA CAT
CP3      ATT TAC CGT AAG TTA TGT AAC GCG GAA CTC
CP4      AAG TCC CTA TTG GCG TTA CTA TGG GAA CAT
CP5      GGG GCG TAC TTG GCA TAT GAT ACA CTT GAT
CP6      ATG TAC TGC CAA GTA GGA AAG TCC CAT AAG
CP7      CAA ACC GCT ATC CAC GCC CAT TGA TGT ACT
CP8      GAA AGT CCC GTT GAT TTT GGT GCC AAA ACA
CP9      TAA ACG AGC TCT GCT TAT ATA GAC CTC CCA
CMV1     AAA GGT GTG GGT TTG GAT CCG GCC TCG GCC TCT GCA TAA ATA AAA AAA ATT AGT CAG CCA TGA GCT TGG CCC ATTGC
CMV2     ATA CGT TGT ATC CAT ATC ATA ATA TGT ACA TTT ATA TTG GCT CAT GTC CAA CAT TAC CGC CAT GTT GAC ATT GAT TAT TGAC
CMV3     TAG TTA TTA ATA GTA ATC AAT TAC GGG GTC ATT AGT TCA TAG CCC ATA TAT GGA GTT CCG CGT TAC ATAAC
CMV4     TTA CGG TAA ATG GCC CGC CTG GCT GAC CGC CCA ACG ACC CCC GCC CAT TGA CGT CAA TAA TGA CGT ATG TTC CCA TAG TAA CGC C
CMV5     AAT AGG GAC TTT CCA TTG ACG TCA ATG GGT GGA GTA TTT ACG GTA AAC TGC CCA CTT GGC AGT ACA TCA AGT GTA TCA TATGC
CMV6     CAA GTA CGC CCC CTA TTG ACG TCA ATG ACG GTA AAT GGC CCG CCT GGC ATT ATG CCC AGT ACA TGA CCT TAT GGG ACT TTC CTA
CMV7     CTT GGC AGT ACA TCT ACG TAT TAG TCA TCG CTA TTA CCA TGG TGA TGC GGT TTT GGC AGT ACA TCA ATG GGC GTG
CMV8     GAT AGC GGT TTG ACT CAC GGG GAT TTC CAA GTC TCC ACC CCA TTG ACG TCA ATG GGA GTT TGT TTT GGC ACC AAA ATC
CMV9     AAC GGG ACT TTC CAA AAT GTC GTA ACA ACT CCG CCC CAT TGA CGC AAA TGG GCG GTA GGC GTG TAC GGT GGG AGG TCT ATA TAA GCAGA
CMV10    GCT CGT TTA GTG AAC CGT CAG ATC GCC TGG AGA CGC CAT CCA CGC TGT TTT GAC CTC CAT AGA AGA CAC CGA CTC TAG AG
RBD-1    ATG CGC GTA CAA CCG ACG GAG AGT ATC GTA CGA TTC CCT AAC ATA ACG AAT CTC TGT CCG TTT GGA GAG GTA TTC AAC GCA ACC AGA TTC GC
RBD-2    GTC AGT CTA TGC GTG GAA TCG GAA GAG AAT ATC TAA TTG TGT TGC TGA CTA TTC TGT GCT GTA TAA CTC AGC CTC CTT TAG TAC CTT TAA GTG TTA TG
RBD-3    GGG TGA GCC CGA CAA AAC TTA ACG ACC TTT GCT TTA CCA ACG TGT ACG CCG ACA GTT TTG TAA TCA GGG GGG ATG AAG TTA GGC AAA TTG CAC CGG GCC
RBD-4    AAA CAG GTA AGA TTG CAG ACT ATA ACT ACA AAT TGC CAG ATG ACT TCA CTG GTT GTG TTA TCG CGT GGA ATT CTA ACA ATC TTG ATA GCA AAG TCG G
RBD-5    GGG TAA CTA TAA CTA TCT TTA CCG CCT GTT TAG AAA AAG TAA CCT TAA ACC GTT CGA GCG AGA CAT AAG TAC CGA AAT ATA CCA G
RBD-6    GCT GGT AGC ACA CCT TGC AAT GGG GTG GAG GGG TTC AAC TGT TAC TTC CCC CTC CAA AGT TAT GGA TTT CAA CCT ACA AACGG
RBD-7    CGT TGG TTA CCA GCC TTA CAG GGT CGT TGT ACT CAG TTT CGA GTT GCT TCA TGC TCC TGC TAC GGT TTG TGG GCC CAA
RBD-8    GAA GTA AGA TCC TCT AGA AAT AAA AGA TCT TAA GTT TCA TTA GAT CTG TGT GTT GGT TTT TTG TGT G
CMV-F    AAA GGT GTG GGT TTG GAT CCG GCC TCG GCC TC
CMV-R    CTC TCC GTC GGT TGT ACG CGC ATC TCT AGA GTC GGT GTC TTC TAT GGA GG
RBD-F    ATG CGC GTA CAA CCG ACG GAG AGT ATC GTA CGA TTC CC
RBD-R    CAC ACA AAA AAC CAA CAC AC

• Simulated annealing or genetic algorithms optimize chemical synthesis and minimize errors or side reactions.
C++ programs generate optimized protocols for the design and synthesis of oligonucleotides by integrating these algorithms and techniques.

Figure 1. The interface of the POSoligo program enables users to select a document or input a sequence directly into the "Input" section. Users can adjust the lengths of contiguous long strands (COS) and short patch strands (POS) based on the experiment's objectives and requirements. Clicking on "Calculate" generates the results.

Figure 2. SARS-CoV-2 S protein primary structure and promoter CMV + target antigen RBD + polyA60 complete expression frame. (A) RBD located on the S1 subunit in S protein; (B) promoter CMV + target antigen RBD + polyA60 complete expression frame.
3 μg/μL was reverse-transcribed into cDNA. The RT-PCR products were analyzed by electrophoresis, showing a match with the expected size of the RBD target gene fragment (Fig. 5A & S2). Sequencing results confirmed the successful synthesis of the RBD fragment, measuring 639 bp (Fig. 5B & Supplement 3).

Figure 3. Schematic diagram illustrating the flow of the POS method, with the secondary structure of the POS chain maintaining double ends free in the upper right corner.