The application of thermophilic DNA primase TtDnaG2 to DNA amplification

For DNA replication in vivo, DNA primase uses a complementary single-stranded DNA template to synthesize RNA primers 4 to 20 nucleotides long, which are then elongated by DNA polymerase. Here, we report that, in the presence of double-stranded DNA, the thermophilic DNA primase TtDnaG2 synthesizes RNA primers of around 100 nucleotides with low initiation specificity at 70 °C. Analysis of a modelled structure of TtDnaG2 indicated that it adopts a compact conformation in which the conserved sites of its zinc-binding domain are sequestered away from its RNA polymerase domain; this might give rise to the low initiation specificity and the synthesis of long RNA segments by TtDnaG2. Based on these unique features of TtDnaG2, we developed a DNA amplification method. We used TtDnaG2 to synthesize RNA primers at 70 °C after denaturation at 95 °C, followed by isothermal amplification with the DNA polymerase Bst3.0 or phi29. Using this method, we amplified the genomic DNA of a virus with 100% coverage and low copy number variation. Our data also demonstrate that the method efficiently amplifies circular DNA from a mixture of circular and linear DNA, providing a tool to amplify low-copy-number circular DNA such as plasmids.

In vitro amplification of genomic DNA is an essential step in genomic DNA sequencing. An ideal amplification method would amplify the whole genome with high fidelity and uniformity. Conventional whole-genome amplification (WGA) methods, including multiple displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC), use synthetic random oligonucleotides as primers to amplify genomic DNA 17,18 . However, such random primers have disadvantages; in particular, they normally show some sequence bias, which accumulates during the amplification process.
Various amplification methods have been developed in which a primase, rather than random oligonucleotides, is used to synthesize primers 14,19 ; this approach is called primase-based whole-genome amplification (pWGA). Using bacteriophage T7 gp4 and the T4 replisome, pWGA was shown to achieve over 10^3-fold amplification. A newer method called TruePrime, which uses the Thermus thermophilus primase PrimPol to synthesize DNA primers, has also been used for whole-genome DNA amplification and has been reported to have high amplification activity and low sequence bias 20 . However, these primases recognize specific sequences for primer synthesis, which might also lead to amplification bias.
In this study, we demonstrate that TtDnaG2 can synthesize long RNA primers with low initiation specificity at 70 °C in the presence of a denatured double-stranded DNA template. It can also be used in whole-genome amplification with high uniformity, and preferentially amplifies circular DNA rather than linear DNA.

Results
Bioinformatic analysis of TtDnaG2 revealed the basis of its special priming activity. Using structural and functional information on E. coli DnaG and bacteriophage T7 primase gp4, we deduced the corresponding domains and the conserved catalytic centre of TtDnaG2 (Tyr106). Like E. coli primase DnaG, TtDnaG2 has an N-terminal zinc-binding domain (ZBD) and an RNA polymerase domain (RPD) 16 ; however, it lacks the DnaB-interacting domain (DnaB-ID) (Fig. 1a). We then modelled the 3D structure of TtDnaG2 using I-TASSER and found that it adopts a compact conformation (Fig. 1b). The predicted structure of TtDnaG2 has a high TM-score against Aquifex aeolicus primase DnaG (Fig. 1c), indicating high structural similarity. A previous study showed that A. aeolicus primase adopts a compact conformation in which the conserved residues of the ZBD are far away from the active sites of the RPD 21,22 , which decreases the specificity of initiation and restricts the primer length. In our predicted structure, TtDnaG2 likewise has a compact conformation in which the conserved residue Tyr106 in the ZBD is far away from the active site of the RPD. This may explain why TtDnaG2 produces long products, as well as why it shows low site specificity on all templates tested previously 16 . However, in T. tengcongensis the main primase is TtDnaG, which resembles a conventional primase, whereas the special features and potential physiological function of TtDnaG2 remain unknown.
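The TM-score comparison above can be illustrated with a short sketch. TM-score sums, over aligned residue pairs, a term that decays with the distance d_i between paired residues after superposition, normalized by the target length L: TM-score = (1/L) Σ 1/(1 + (d_i/d0)^2), with the length-dependent scale d0 = 1.24(L − 15)^(1/3) − 1.8 Å. The helper below (`tm_score_fixed` is a hypothetical name, not part of I-TASSER or the TM-score program) assumes the per-residue distances of one already-superposed alignment are given; unlike the real TM-score program, it does not search for the optimal superposition. The 0.5 Å floor on d0 for short chains is an assumption following common practice.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) used in the TM-score."""
    if target_length <= 15:
        # The cube-root term is undefined or tiny for very short chains;
        # fall back to the assumed 0.5 angstrom floor.
        return 0.5
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)


def tm_score_fixed(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition (hypothetical helper).

    distances: distance in angstroms between each aligned residue pair
    after superposition. Unaligned residues contribute nothing but still
    count in the normalisation by target_length.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Toy example: 120 of 150 target residues aligned, all within 2 angstroms.
    print(round(tm_score_fixed([2.0] * 120, 150), 3))
```

Scores range from 0 to 1, and values above roughly 0.5 are commonly taken to indicate the same fold, which is the sense in which a "high TM-score" implies structural similarity.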
TtDnaG2 initiates RNA primer synthesis on a denatured double-stranded template. Unlike other known primases, which prefer to initiate primer synthesis at specific DNA sequences 23,24 , TtDnaG2 has no apparent template specificity for primer synthesis 16 . Here, we explored whether it can be used in primase-based whole-genome amplification. A previous study revealed that TtDnaG2 can synthesize RNA primers over a wide range of temperatures on an M13 single-stranded template, as well as on short synthetic DNA oligonucleotides 16 . First, we tested whether TtDnaG2 can synthesize RNA primers on denatured double-stranded DNA. We used the plasmid pET28a and lambda DNA as templates; after these two double-stranded templates were denatured at 95 °C for 30 s, RNA primers were synthesized at 70 °C for 30 min (Fig. 2a, Supplementary Fig. 8b) in the presence of [α-32P]ATP, and the RNA products were analysed by autoradiography. As shown in Fig. 2b, TtDnaG2 synthesizes RNA using denatured double-stranded DNA as a template, and the RNA products are longer than 100 nucleotides. This also indicates that TtDnaG2 can withstand 95 °C for 30 s. After heating at 95 °C for 30 s, we also tested reaction temperatures of 75 °C, 80 °C, 85 °C, 90 °C, and 95 °C for 30 min. In contrast to a previous study with single-stranded templates, we found that primer synthesis on a double-stranded template hardly occurred at temperatures higher than 75 °C (Fig. 2d, Supplementary Fig. 8a); specifically, Li et al. reported that TtDnaG2 could synthesize primers on a single-stranded template at 85 °C. To explain this difference, we propose that, after being heated at 95 °C for 30 s, TtDnaG2 may be denatured during a 30-min incubation at temperatures higher than 75 °C.
When 50 ng of double-stranded plasmid was heated at 95 °C for 30 s and primer synthesis was carried out at 70 °C for 30 min, the primers were visible by agarose gel electrophoresis (Fig. 2c). In addition, unlike the primers synthesized in reactions with short synthetic DNA oligonucleotides, the synthesized primers were mainly longer than 100 nucleotides.
TtDnaG2 and Bst3.0 can be used in rolling circle amplification. Because TtDnaG2 initiated primer synthesis on every DNA sequence we tested 16 , we further explored whether its RNA product can act as a primer in isothermal DNA amplification. We used the plasmid pET28a as a template and carried out rolling circle amplification with the DNA polymerase Bst3.0, which has high amplification and strand displacement activities, to extend DNA strands from the RNA primers. The reaction proceeds in two steps. In the priming step, we added 500 μM dNTP to the reaction system and incubated it at 95 °C for 30 s and then at 70 °C for 30 min. After the priming step, we raised the temperature to 95 °C for 3 min to abolish the priming activity of TtDnaG2 and release it from the template, so that the template becomes available to the DNA polymerase. We then placed the reaction on ice for 10 min so that the RNA primers could re-anneal to the templates. Subsequently, we added Bst3.0 and kept the reaction at 65 °C overnight for amplification (Fig. 3a). Agarose gel electrophoresis showed that the product was very large, consistent with our expectation that it is a large DNA molecule with many branches (Fig. 3b). Starting from 2.5 ng of template in a 50-µL reaction, we obtained 26.5 µg of DNA on average, representing 1.06 × 10^4-fold amplification (Fig. 3c). With a two-round amplification strategy starting from 2.5 pg of plasmid template, amplification was 5.1 × 10^7-fold on average (Fig. 3d). We then checked the amplification products by PCR and sequencing: three primer pairs designed to amplify three fragments of pET28a all yielded pET28a sequences (Fig. 3e).
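The fold-amplification values above are the DNA yield divided by the input template mass, after both are converted to the same unit. The sketch below reproduces the single-round figure from the reported numbers and also expresses it as an equivalent number of perfect doublings (log2 of the fold change); the function names and the doubling interpretation are illustrative additions, not part of the original analysis.

```python
import math

# Mass units expressed in nanograms.
_NG_PER_UNIT = {"pg": 1e-3, "ng": 1.0, "ug": 1e3}


def fold_amplification(output_mass: float, output_unit: str,
                       input_mass: float, input_unit: str) -> float:
    """Fold amplification = DNA yield / template input (same units)."""
    output_ng = output_mass * _NG_PER_UNIT[output_unit]
    input_ng = input_mass * _NG_PER_UNIT[input_unit]
    if input_ng <= 0:
        raise ValueError("input mass must be positive")
    return output_ng / input_ng


def equivalent_doublings(fold: float) -> float:
    """Number of perfect doublings that would give the same fold change."""
    return math.log2(fold)


if __name__ == "__main__":
    # Single-round Bst3.0 reaction reported in the text: 26.5 ug from 2.5 ng.
    fold = fold_amplification(26.5, "ug", 2.5, "ng")
    print(f"{fold:.3g}-fold, ~{equivalent_doublings(fold):.1f} doublings")
```

The log2 view is only a convenient scale for comparing reactions; rolling circle amplification with branching is not a sequence of discrete doublings.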
Next, we applied the DNA polymerase phi29, which has strand displacement activity and is widely used in whole-genome amplification 25-27 . Because phi29 DNA polymerase and TtDnaG2 cannot work in the same buffer, two reaction systems were needed: after the priming step, the mixture was added to the phi29 amplification system. Phi29 DNA polymerase also extended DNA strands from the RNA primers synthesized by TtDnaG2 when 5 ng of plasmid pET28a was used as a template (Fig. 3f). As the elongation activity of phi29 is lower than that of Bst3.0, the amplification products of phi29 are shorter than those of Bst3.0. These data indicate that TtDnaG2 can be used for efficient primase-based DNA amplification.
TtDnaG2 and Bst3.0 can be used to amplify circular DNA mixed with linear DNA. Next, we explored whether the linear template lambda DNA can be amplified with this method (Fig. 4a). Because Bst3.0 can in principle synthesize a DNA strand without stopping, its amplification products could be longer than lambda DNA. However, because TtDnaG2 synthesizes primers at random positions and each primer can be elongated only to the end of a linear template, different primers on the same template produce products of different lengths. Consistent with this, the amplification product of lambda DNA was a mixture of DNA fragments of different lengths, most of which were shorter than lambda DNA (Fig. 4a). As Bst3.0 can in principle elongate without stopping on a circular template, we further explored whether linear and circular DNA molecules are amplified with different efficiencies. We mixed E. coli genomic DNA, which is usually a linear molecule in vitro, with the plasmid pET28a, amplified the mixture with TtDnaG2 and Bst3.0, and measured the ratio of plasmid to genomic DNA in the amplification products by real-time PCR. The relative amount of plasmid increased 22-fold relative to genomic DNA (Fig. 4b). This indicates that our method might be useful for efficiently amplifying low-copy-number circular DNA mixed with linear DNA.
TtDnaG2 can be used in whole-genome amplification. To test whether our method can be used for whole-genome amplification, we used TtDnaG2 and Bst3.0 to amplify the genomic DNA of a virus newly discovered in Sulfolobus solfataricus. We used 5 ng of genomic DNA as the amplification template and sequenced the product by next-generation sequencing. To assess amplification quality, we analysed coverage breadth, GC content, copy number variation (CNV), and amplification error rate.
To analyse the coverage of the amplification product, we used the Burrows-Wheeler Aligner (BWA). At sequencing depths of 5-fold and 10-fold, the coverage was 93.6% and 100%, respectively (Fig. 5a). The GC content was 33%, consistent with the reference sequence (Fig. 5b). For CNV detection, we used Integrative Genomics Viewer (igv_2.3.88), which showed that the CNV is low (Fig. 5d). Single-nucleotide polymorphism analysis identified only 2 nucleotides that differed from the reference sequence, giving an amplification error rate of 0.0086% (Fig. 5c). Thus, the combination of TtDnaG2 and Bst3.0 effectively amplifies the genomic DNA of this virus.
Figure 3 (panels b to f). (b) Efficiency of circular plasmid amplification in the first-round reaction; the template DNA was 2.5 ng. (c) Efficiency of circular plasmid amplification in the second-round reaction, in which the products of the first-round reaction were used as the template for a second amplification; the initial template DNA was 2.5 pg. (d) Agarose electrophoresis image of the amplification product. The sample was denatured at 95 °C for 30 seconds and then incubated at 70 °C for RNA primer synthesis with (+) or without (−) TtDnaG2. In the amplification step, the mixture was incubated on ice for 10 minutes, after which Bst3.0 was added and the mixture was incubated at 65 °C overnight. Circular plasmid was used as the template. (e) Agarose electrophoresis image of the PCR check. TtDnaG2 was used in the amplification reaction; primer pairs 1, 2, and 3 were used for the PCR check. (f) Agarose electrophoresis image of the amplification product synthesized by phi29. The sample was denatured at 95 °C for 30 seconds and then incubated at 70 °C for RNA primer synthesis with (+) or without (−) TtDnaG2. In the amplification step, the mixture was combined with "mixture B" (described in Methods) and incubated at 95 °C for 3 min, then placed on ice for 10 min. Finally, phi29 was added, followed by incubation at 30 °C overnight. Circular plasmid was used as the template.
Scientific Reports | 7: 12809 | DOI:10.1038/s41598-017-12241-6
Figure 4. (a) Agarose electrophoresis image of the amplification product. The sample was denatured at 95 °C for 30 seconds and then incubated at 70 °C for RNA primer synthesis with (+) or without (−) TtDnaG2. In the amplification step, the mixture was incubated on ice for 10 minutes, after which Bst3.0 was added and the mixture was incubated at 65 °C overnight. Linear lambda DNA (50 ng) was used as the template, and 1/10 of the products was loaded on the gel. (b) The relative fold enrichment of plasmid in the amplification products of TtDnaG2 and Bst3.0. The template (10 ng) was a mixture of E. coli genomic DNA and plasmid at a ratio of 1000:1. The percentage of plasmid in the starting mixture was defined as "1" (red column); the blue column represents the fold enrichment after amplification.
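The coverage and GC-content figures reported for the viral genome can be computed from a per-base depth table and the genome sequence. The paper does not say how these values were extracted from the BWA alignment; the sketch below assumes per-position depths in the three-column text format written by `samtools depth -a` (reference name, position, depth), which is an assumption rather than the authors' stated workflow. Coverage breadth is taken here as the fraction of reference positions covered by at least `min_depth` reads, and GC content as the fraction of G or C among unambiguous bases; all function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_depth_lines(lines: Iterable[str]) -> List[int]:
    """Read per-base depths from 'ref<TAB>pos<TAB>depth' lines."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _ref, _pos, depth = line.split("\t")
        depths.append(int(depth))
    return depths


def coverage_stats(depths: List[int], min_depth: int = 1) -> Tuple[float, float]:
    """Return (mean depth, breadth = fraction of positions with depth >= min_depth)."""
    if not depths:
        raise ValueError("no positions given")
    covered = sum(1 for d in depths if d >= min_depth)
    return sum(depths) / len(depths), covered / len(depths)


def gc_content(seq: str) -> float:
    """Fraction of G/C among A/C/G/T bases (ambiguous bases such as N are ignored)."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    if not acgt:
        raise ValueError("sequence has no A/C/G/T bases")
    return sum(1 for b in acgt if b in "GC") / len(acgt)


if __name__ == "__main__":
    # Hypothetical depth lines for a 4-bp stretch of the viral reference.
    demo = ["virus\t1\t0", "virus\t2\t4", "virus\t3\t7", "virus\t4\t9"]
    depth_mean, breadth = coverage_stats(parse_depth_lines(demo))
    print(depth_mean, breadth, gc_content("ATGCGNNAT"))
```

Reporting breadth at a given mean depth, as in Fig. 5a, would additionally require downsampling the reads to that depth before computing the per-base table.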

Discussion
Primases usually synthesize RNA primers on a single-stranded DNA template in vivo. In the priming step of DNA replication, the double-stranded template is unwound by helicase 28,29 , and ssDNA-binding protein is needed to stabilize the ssDNA before primases can synthesize RNA primers. In our study, we found that the primase TtDnaG2 can synthesize RNA primers at high temperature in vitro on heat-denatured double-stranded DNA, without helicase or ssDNA-binding protein. Consistent with previous findings 16 , and unlike most reported primases, TtDnaG2 produces long priming products: the RNA primers in our study were longer than 100 nucleotides, and the priming activity was high. Another characteristic of TtDnaG2 is its low site bias. A previous sequence analysis indicated that TtDnaG2 lacks motif IV in the RPD, as does T7 bacteriophage gp4 16,30 . Because gp4 synthesizes short RNA primers at specific sites, the lack of motif IV in the RPD of TtDnaG2 may not explain its low site bias and long RNA products 31 . The modelled 3D structure of TtDnaG2 indicates that it adopts a compact conformation in which the active sites that control the specificity and length of primers are sequestered away from the synthesis domain. This may explain why TtDnaG2 can synthesize long RNA primers with low sequence bias.
Primases are considered promising for overcoming some shortcomings of current whole-genome amplification methods 19 . In this context, the ability of TtDnaG2 to synthesize RNA primers at high temperature in vitro, without helicase or ssDNA-binding protein, suggests its potential in whole-genome amplification. By combining TtDnaG2 with Bst3.0 or phi29, which have high strand displacement activity and can use RNA primers to initiate amplification, we amplified DNA efficiently, reaching 5.1 × 10^7-fold amplification of plasmid DNA with the two-round Bst3.0 protocol. The reaction can be summarized as follows. In the primer synthesis step, double-stranded DNA is denatured by heating at 95 °C, and TtDnaG2 then synthesizes RNA primers at 70 °C; TtDnaG2 activity is subsequently abolished by heating at 95 °C for 3 min. In the elongation step, Bst3.0 or phi29 extends new DNA strands from the RNA primers. Our data also indicate that Bst3.0 and phi29 can use RNA primers to elongate DNA strands and may displace any downstream RNA they encounter. Compared with the previously reported T4 and T7 pWGA methods, which need ssDNA-binding protein or other proteins, our method is simpler because neither helicase nor ssDNA-binding protein is required. Compared with TruePrime, TtDnaG2 has been shown to have low sequence bias, which may lead to lower amplification bias. The amplification of viral genomic DNA and the analysis of the sequencing results indicate that the method has high amplification activity, 100% coverage, low CNV, and an error rate lower than 1/10^4.
To reduce nucleotide error rates during amplification, we also tried the high-fidelity DNA polymerases phi29 and Pfu for amplification after priming. However, we did not observe DNA amplification with the combination of Pfu DNA polymerase and TtDnaG2. Phi29 can use RNA primers generated by TtDnaG2 to amplify DNA, but it cannot work in the same buffer as TtDnaG2, so a buffer exchange is needed between priming and DNA amplification. Bst3.0 is a highly processive polymerase but has no 3′ → 5′ exonuclease activity. Nevertheless, we found only two nucleotides that differed from the reference sequence after Bst3.0 amplification. Given that the virus genome is 23,219 bp, the error rate is 2/23,219 (about 8.6 × 10^-5). Accordingly, the error rate of Bst3.0 in our reaction system is lower than 1/10^4, comparable to that of phi29 (1/10^4 to 1/10^6) 32 . Moreover, TtDnaG2 and Bst3.0 work well in the same reaction buffer. Thus, combining TtDnaG2 with Bst3.0 is advantageous for amplifying genomic DNA, and we believe that this method could help obtain complex genomic information accurately.
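The error-rate arithmetic above can be written out explicitly. This sketch follows the same simple ratio used in the text: each difference called against the reference is counted as one amplification error and divided by the genome length. It does not correct for sequencing errors or for differences below the variant caller's detection threshold. The function name is hypothetical.

```python
def per_base_error_rate(differences: int, genome_length_bp: int) -> float:
    """Observed differences from the reference per sequenced base."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    if differences < 0:
        raise ValueError("differences cannot be negative")
    return differences / genome_length_bp


if __name__ == "__main__":
    rate = per_base_error_rate(2, 23_219)  # values reported for the viral genome
    print(f"{rate:.2e} per base = {rate * 100:.4f}%")
    print("below 1/10^4:", rate < 1e-4)
```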
In the amplification of a mixture of linear E. coli genomic DNA and circular plasmid, the plasmid was amplified with 22-fold higher efficiency than the linear DNA. Bst3.0 can in principle elongate a strand continuously on a circular DNA template, whereas linear DNA prepared in a typical experiment is at most about 300 kbp long. This results in high amplification efficiency for circular templates and lower efficiency for short linear ones. In principle, the amplification difference is small when the linear DNA template is sufficiently long, and it grows as the linear template gets shorter. The long primers synthesized by TtDnaG2 also overcome the inability of Bst3.0 to use short random primers at 65 °C. These features make the method a potential tool for amplifying low-copy-number circular DNA, such as plasmid DNA or circular viral DNA, from a whole DNA sample in which such circular DNA might otherwise not be efficiently amplified and its information might be lost in subsequent sequencing analysis.
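The length argument above can be made concrete with a deliberately simple toy model, which is our illustration rather than the authors' analysis. Assume, hypothetically, that each primer is extended by at most `cap` nucleotides during the reaction, that priming positions are uniform along the template, that on a circular template every primer can run the full `cap`, and that on a linear template of length L extension stops at the template end. The expected extension per primer on the linear template is then L/2 when L ≤ cap and cap − cap²/(2L) otherwise, so the circular-to-linear ratio is large for short linear templates and approaches 1 as L grows. The model ignores re-priming of newly synthesized strands and the resulting branching, so it shows only the direction of the effect, not the measured 22-fold value.

```python
import random


def expected_linear_extension(length: float, cap: float) -> float:
    """Mean extension per primer on a linear template, priming uniform on [0, length).

    A primer at position x can extend min(length - x, cap) nucleotides.
    """
    if length <= 0 or cap <= 0:
        raise ValueError("length and cap must be positive")
    if length <= cap:
        return length / 2.0
    return cap - cap * cap / (2.0 * length)


def circular_to_linear_ratio(length: float, cap: float) -> float:
    """Toy advantage of a circular template, where every primer can extend by cap."""
    return cap / expected_linear_extension(length, cap)


def simulate_linear_extension(length: float, cap: float, n: int, seed: int = 0) -> float:
    """Monte Carlo check of expected_linear_extension."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(0.0, length)
        total += min(length - x, cap)
    return total / n


if __name__ == "__main__":
    cap = 50_000  # hypothetical per-primer extension limit (nt)
    for length in (5_000, 50_000, 300_000, 5_000_000):
        print(length, round(circular_to_linear_ratio(length, cap), 2))
```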

Methods
Expression and purification of TtDnaG2. E. coli Rosetta (DE3) competent cells were transformed with the plasmid pET28a-TtDnaG2 and cultured in Luria-Bertani (LB) medium containing 50 µg/mL kanamycin at 37 °C until the OD600 reached 0.4 to 0.6, after which isopropyl-β-D-thiogalactopyranoside (IPTG) was added to a final concentration of 0.5 mM. Protein expression was then induced at 37 °C for 3 h. The cells were harvested by centrifugation at 5000 rpm for 30 min, resuspended in lysis buffer [0.3 M NaCl, 10 mM imidazole and 50 mM sodium phosphate buffer (NaH2PO4/Na2HPO4, pH 8.0)], and lysed by ultrasonication for 50 min. To obtain soluble protein, the lysate was centrifuged at 10,000 rpm for 30 min, the pellet was discarded, and the supernatant was incubated at 70 °C for 30 min and centrifuged again under the same conditions. The pellet was discarded and the supernatant was passed through a 0.45-µm filter. TtDnaG2 was then purified by affinity chromatography on a Ni2+-nitrilotriacetic acid (Ni-NTA) agarose column, followed by gel filtration chromatography on a Superdex 200 10/300 GL column. The purified protein was checked by SDS-PAGE, concentrated with Amicon Ultra-15 concentrators, and stored in 10% glycerol at −70 °C.
DNA amplification assay. For amplification with TtDnaG2 and Bst3.0, the 50-µL primer synthesis reaction contained 50 mM HEPES (pH 7.5), 10 mM dithiothreitol, 100 mM potassium glutamate, 5 mM magnesium acetate, 100 µM NTP mixture, 500 µM dNTP mixture, and 2.5 ng of plasmid pET28a template (or 50 ng of lambda DNA, or 10 ng of a mixture of genomic DNA and plasmid). The synthesis programme started with denaturation at 95 °C for 30 s, followed by synthesis at 70 °C for 30 min. In the amplification step, the mixture was incubated at 95 °C for 3 min and then on ice for 10 min, after which 2 µL of Bst3.0 was added and the reaction was incubated at 65 °C overnight. For two-round amplification of 2.5 pg of plasmid DNA, we used a 20-µL reaction system in the first round, with an amplification time of 3 h. In the second round, we added 30 µL of reaction mix [50 mM HEPES (pH 7.5), 10 mM dithiothreitol, 100 mM potassium glutamate, 5 mM magnesium acetate, 100 µM NTP mixture, 500 µM dNTP mixture, and 1 µM TtDnaG2], after which the priming and amplification steps were carried out.
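Assembling the 50-µL priming reaction above amounts to a C1V1 = C2V2 dilution for each component. The final concentrations below follow the text, but the stock concentrations are hypothetical placeholders (the paper does not report them), and template and enzyme volumes are left out because their stock concentrations are not given; substitute the actual stocks before use. The names `PRIMING_MIX_UM`, `stock_volume_ul`, and `pipetting_plan` are illustrative.

```python
from typing import Dict, Tuple

# component: (final concentration, stock concentration), both in micromolar.
# Final values follow the text; stock values are HYPOTHETICAL placeholders.
PRIMING_MIX_UM: Dict[str, Tuple[float, float]] = {
    "HEPES pH 7.5": (50_000, 1_000_000),
    "dithiothreitol": (10_000, 1_000_000),
    "potassium glutamate": (100_000, 1_000_000),
    "magnesium acetate": (5_000, 100_000),
    "NTP mixture": (100, 10_000),
    "dNTP mixture": (500, 10_000),
}


def stock_volume_ul(final_um: float, stock_um: float, total_ul: float) -> float:
    """C1*V1 = C2*V2 solved for the stock volume V1."""
    if stock_um < final_um:
        raise ValueError("stock is more dilute than the final concentration")
    return final_um * total_ul / stock_um


def pipetting_plan(mix: Dict[str, Tuple[float, float]], total_ul: float) -> Dict[str, float]:
    """Stock volume per component, plus the water left over."""
    plan = {name: stock_volume_ul(f, s, total_ul) for name, (f, s) in mix.items()}
    # Whatever remains is water (template and enzymes still to be fitted in).
    plan["water (before template/enzyme)"] = total_ul - sum(plan.values())
    return plan


if __name__ == "__main__":
    for name, vol in pipetting_plan(PRIMING_MIX_UM, 50.0).items():
        print(f"{name:32s} {vol:6.2f} uL")
```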
For amplification with TtDnaG2 and phi29, the 1-µL primer synthesis reaction (mixture A) contained 50 mM HEPES (pH 7.5), 10 mM dithiothreitol, 100 mM potassium glutamate, 5 mM magnesium acetate, 100 µM NTP mixture, and 5 ng of plasmid pET28a. The synthesis programme started with denaturation at 95 °C for 30 s, followed by synthesis at 70 °C for 30 min. In the amplification step, mixture A was first combined with mixture B (9 µL), which contained 1 µL of 10× phi29 DNA polymerase reaction buffer, 250 µM dNTP mixture, and 5 ng of plasmid pET28a, and the mixture was incubated at 95 °C for 3 min and then placed on ice for 10 min. Finally, 1 µL of phi29 was added, followed by incubation at 30 °C overnight.
qPCR assay of amplification products and calculation of fold enrichment. The qPCR primers for genomic DNA were E-f (CCAGTGGTCGCATCATCGTTA) and E-r (CCATTATCTCGGTGGTAGGTG); those for plasmid pET28a were P-f (CGACATATCGGATTGTCCCTA) and P-r (TCGGCCAGATCGTTATTCAGT). Fold enrichment was calculated as follows: ΔCt = Ct(P) − Ct(E); ΔΔCt = ΔCt(amplification products) − ΔCt(control); fold enrichment = 2^(−ΔΔCt).
PCR assay. We designed three primer pairs to amplify fragments of the pET28a amplification products. The amplification reaction mixture was diluted 1:100, and 1 µL was added to the PCR reaction mix. The three fragments were amplified with Takara PrimeSTAR Max.
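The fold-enrichment formula in the qPCR assay above can be implemented directly. The sketch below averages replicate Ct values (an assumption; the paper does not describe replicate handling) and, like the standard 2^(−ΔΔCt) method, assumes that the genomic (E) and plasmid (P) amplicons both amplify with close to 100% efficiency. The function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_enrichment(ct_p_sample: Sequence[float], ct_e_sample: Sequence[float],
                    ct_p_control: Sequence[float], ct_e_control: Sequence[float]) -> float:
    """Plasmid enrichment relative to genomic DNA by the 2^(-ddCt) method.

    dCt  = Ct(P) - Ct(E)                      (plasmid vs genomic amplicon)
    ddCt = dCt(amplification products) - dCt(control, unamplified mixture)
    fold = 2 ** (-ddCt)
    """
    d_ct_sample = mean(ct_p_sample) - mean(ct_e_sample)
    d_ct_control = mean(ct_p_control) - mean(ct_e_control)
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values, for illustration only.
    print(fold_enrichment([18.1, 18.3], [24.0, 24.2], [27.0, 27.2], [28.4, 28.6]))
```

Because the plasmid Ct falls relative to the genomic Ct when the plasmid is enriched, ΔΔCt becomes negative and 2^(−ΔΔCt) exceeds 1; a fold enrichment of 22 corresponds to ΔΔCt of about −4.5 cycles.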
Agarose gel electrophoresis assay. A 1.2% agarose gel was prepared in TAE buffer (40 mM Tris-acetate, 2 mM EDTA). Samples and a DNA ladder were loaded on the gel and electrophoresed at 100 V for 25 min. After staining with ethidium bromide, the gel was imaged under UV light.
Quantification of the DNA output. The sample was precipitated with three volumes of precooled ethanol and stored at −20 °C for 2 h. It was then centrifuged at 13,300 rpm and 4 °C, the supernatant was discarded, and the pellet was washed twice with 70% ethanol. After drying at room temperature, the pellet was dissolved in sterile water. The DNA was then quantified with a spectrophotometer and the output was calculated.
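The text does not specify how the spectrophotometer reading was converted into DNA output. A common convention, assumed here, is that an A260 of 1.0 (1-cm path) corresponds to about 50 µg/mL of double-stranded DNA; the sketch applies that factor together with the dilution used for the reading and the final resuspension volume. The function name, the A260 value, and the volumes in the example are hypothetical.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # conventional factor for dsDNA, 1-cm path (assumed)


def dna_output_ug(a260: float, dilution_factor: float, volume_ul: float,
                  factor: float = DSDNA_UG_PER_ML_PER_A260) -> float:
    """Total DNA mass (ug) from an A260 reading of a diluted aliquot."""
    if a260 < 0 or dilution_factor <= 0 or volume_ul <= 0:
        raise ValueError("invalid measurement parameters")
    conc_ug_per_ml = a260 * factor * dilution_factor
    return conc_ug_per_ml * volume_ul / 1000.0  # uL -> mL


if __name__ == "__main__":
    # Hypothetical reading: A260 = 0.4 on a 1:20 dilution, pellet dissolved in 50 uL.
    print(round(dna_output_ug(0.4, 20, 50), 2))
```

Dividing the resulting mass by the template input gives the fold amplification reported in the Results.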
Sequencing of amplification product. After amplification of the whole-genome DNA, the DNA sample was sequenced by Novogene. The three short PCR products were sequenced by TSINGKE, with sequencing primers F1, F2, and F3.
Sequence coverage, single nucleotide polymorphism (SNP), and CNV analysis. The sequences of the PCR products were subjected to a BLAST search using the software A Plasmid Editor (ApE). The coverage of whole-genome amplification products was analysed with BWA, and CNV with Integrative Genomics Viewer. SNPs were analysed with the Genome Analysis Toolkit (GATK). The reference sequence of the viral genome was obtained from GenBank (MF144115).
Data availability. All data generated or analysed during this study are included in this published article and its Supplementary Information files.