Identification of Thermus aquaticus DNA polymerase variants with increased mismatch discrimination and reverse transcriptase activity from a smart enzyme mutant library

DNA polymerases are the key enzymes for several biotechnological applications. Obviously, nature has not evolved these enzymes to be compatible with applications in biotechnology. Thus, engineering the natural scaffold of DNA polymerases may lead to enzymes improved for several applications. Here, we investigated a two-step approach for the design and construction of a combinatorial library of mutants of KlenTaq DNA polymerase. First, we selected amino acid sites for saturation mutagenesis that interact with the primer/template strands or are evolutionarily conserved. From this library, we identified mutations that interfere little with DNA polymerase activity. Next, these functionally active mutants were combined randomly to construct a second library with enriched sequence diversity. We reasoned that the combination of mutations that have a minuscule effect on enzyme activity and thermostability will result in entities that have an increased mutation load but still retain activity. Besides activity and thermostability, we screened the library for entities with two distinct properties. Indeed, we identified two different KlenTaq DNA polymerase variants that exhibit either increased mismatch extension discrimination or increased reverse transcription PCR activity, respectively.

Scientific RepoRts | (2019) 9:590 | DOI: 10.1038/s41598-018-37233-y

saturation mutagenesis at sites identified from structural data that are contacting the substrates. The latter enzymes with increased substrate scope (i.e., the capability to use RNA as a template) were identified in libraries that were built from unfocused mutations across the entire enzyme scaffold through error-prone PCR. Interestingly, over the years, when searching for improved DNA polymerases, we failed to identify DNA polymerases with increased mismatch discrimination from unfocused libraries, and we likewise failed to obtain DNA polymerases with reverse transcription activity from focused libraries.
Obviously, the use of unfocused libraries suffers from two drawbacks: an astronomical number of variants must be sampled to find new combinations of beneficial mutations, and non-functional protein sequences are over-represented in the library. Therefore, rationalizing the amino acid positions based on structural and evolutionary information, and generating a combinatorial library from functionally beneficial mutations, should considerably improve the sequence diversity while increasing the share of functional protein sequences. Thus, we attempted to design a combinatorial library that couples structure-based rational design with molecular shuffling of functional mutants. We reasoned that shuffling only active mutants for combinatorial library design would reduce the load of non-functional mutations in enzyme variants and thereby enrich the sequence diversity of the library with improved functional scope. Indeed, we identified two different KlenTaq DNA polymerase variants that exhibit either increased mismatch extension discrimination or increased reverse transcription PCR activity, respectively.

Results
General study design and selection of mutation sites. First, we aimed at constructing a focused library of KlenTaq DNA polymerase variants by rationally selecting target residues for mutation that are located in the proximity of the active site and make contact with the primer/template DNA complex. Functional mutants were to be identified by saturation mutagenesis at the target residues followed by fluorescence-based screening. These mutants were then to be shuffled in a single reaction to construct a combinatorial library, which was in turn screened for variants with improved mismatch discrimination or reverse transcription activity, respectively.
For the selection of sites to be mutagenized by saturation mutagenesis, we first inspected a crystal structure of a ternary complex of KlenTaq DNA polymerase (Fig. 1A,B) 23 . We reasoned that residues that make direct contacts with the primer/template DNA complex and the incoming nucleoside-5′-O-triphosphate are promising candidates for mutagenesis, as these residues could significantly influence the substrate recognition and the catalytic properties of KlenTaq DNA polymerase. These residues were selected for mutagenesis. In addition, we also selected residues that are evolutionarily conserved in family A DNA polymerases, as judged from a multiple sequence alignment of Taq, E. coli and T7 DNA polymerases, since mutations at such sites were shown to be promising for obtaining altered functions [23][24][25][26][27] . The selected residues are mostly located in the thumb, palm and finger domains, since these regions form the crevice that binds the primer/template DNA complex and the nucleotide. We chose 19 residues as target sites for focused library generation: 9 residues from the finger domain, 6 residues from the palm domain and 4 residues from the thumb domain. A detailed view of the chosen sites, which are either in proximity to or in contact with the primer/template DNA complex, is depicted in Fig. 1C-E. Residues R728, R746, M747, Q754 (finger), A570, D578 (palm) and N483 (thumb) contact the template strand. Residues V586 (palm) and E507, S515, K540 (thumb) contact the primer strand. Residues I614, H639, F667, K663, R659 (finger) contact the incoming nucleoside-5′-O-triphosphate. Residues V783, H784 (palm) are highly conserved and located in proximity to the primer strand. Residue R573 (palm) contacts both the primer and the template strand (Fig. 1D insert).
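The site selection described above can be summarized as a small lookup table. The following minimal sketch records each of the 19 target residues with its domain and the contact named in the text, then tallies the residues per domain. The dictionary layout and the names `TARGET_SITES` and `residues_per_domain` are illustrative assumptions, not part of the original analysis.

```python
from collections import Counter

# Target residues as listed in the text: residue -> (domain, contact).
# The table layout is an illustrative assumption.
TARGET_SITES = {
    "R728": ("finger", "template"),
    "R746": ("finger", "template"),
    "M747": ("finger", "template"),
    "Q754": ("finger", "template"),
    "A570": ("palm", "template"),
    "D578": ("palm", "template"),
    "N483": ("thumb", "template"),
    "V586": ("palm", "primer"),
    "E507": ("thumb", "primer"),
    "S515": ("thumb", "primer"),
    "K540": ("thumb", "primer"),
    "I614": ("finger", "incoming triphosphate"),
    "H639": ("finger", "incoming triphosphate"),
    "F667": ("finger", "incoming triphosphate"),
    "K663": ("finger", "incoming triphosphate"),
    "R659": ("finger", "incoming triphosphate"),
    "V783": ("palm", "conserved, near primer"),
    "H784": ("palm", "conserved, near primer"),
    "R573": ("palm", "primer and template"),
}


def residues_per_domain(sites):
    """Count how many target residues fall into each domain."""
    return Counter(domain for domain, _contact in sites.values())


counts = residues_per_domain(TARGET_SITES)
print(dict(counts))
```

The tally reproduces the split stated in the text (9 finger, 6 palm and 4 thumb residues).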
Focused library: Construction and functional screening. After selecting the target sites for mutagenesis, we constructed a focused library of all 19 possible mutants for each target position. Each target site was mutated with oligonucleotides containing defined codons for the 19 individual amino acid exchanges using standard site-directed mutagenesis. The 361 sequence-confirmed mutants from the 19 target sites were established in 384 well plates and subsequently utilized for functional screening. Screening was performed with heat-denatured bacterial lysates, as done successfully before, to identify active and inactive mutants using real time PCR 28 . For this purpose, a 92mer template DNA of the NANOG promoter region 29,30 , along with the corresponding primers and SYBR ® Green I, was employed in the screening reaction and the data were recorded. The PCR active and inactive mutants were determined by measuring the respective Cq value, i.e., the cycle number at which the fluorescence signal rises above the background signal with an exponential increase. The screening results revealed that mutation at each position influenced activity to a varied extent (Fig. 1F). Of the positions investigated in the palm and thumb domains, more than 60% and 70% of the generated mutants, respectively, were found to be PCR active (Fig. 1F). On the other hand, mutations in the finger domain resulted in a larger number of inactive mutants than in the palm and thumb domains, and only 25% of the mutants were PCR active (Fig. 1F). The protein expression levels of selected inactive mutants were analysed by SDS-PAGE in order to verify that the reduction in activity is not due to reduced protein expression (see Supplementary Fig. S1).
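The screening logic above amounts to binning each mutant by its Cq value. The sketch below is a minimal illustration that uses the Cq ranges given in the Fig. 1F legend (active: Cq 1-30, reduced activity: Cq 31-35, inactive: Cq >36). How values that fall between the stated bins are treated, and the use of `None` for wells without amplification, are assumptions made here for illustration; the example Cq values are hypothetical.

```python
N_SITES = 19
N_SUBSTITUTIONS = 19  # 20 proteinogenic amino acids minus the parent residue
library_size = N_SITES * N_SUBSTITUTIONS  # 361 single mutants, as in the text


def classify_cq(cq):
    """Bin a Cq value using the ranges from the Fig. 1F legend.

    Assumption: values between the stated bins (e.g. 30.5 or 36) are
    assigned to the next higher bin; None means no amplification.
    """
    if cq is None:
        return "inactive"
    if cq <= 30:
        return "active"
    if cq <= 35:
        return "reduced"
    return "inactive"


def fraction_active(cq_values):
    """Fraction of mutants in a list of Cq values classified as active."""
    labels = [classify_cq(cq) for cq in cq_values]
    return labels.count("active") / len(labels)


# Hypothetical Cq values for five mutants at one target site.
example_cqs = [22.4, 28.9, 33.0, None, 38.2]
print(library_size, fraction_active(example_cqs))
```

Applied per domain, a function like `fraction_active` would give the kind of percentage reported for the palm, thumb and finger domains.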
Generation of a combinatorial library with functional mutants. Strategies involving degenerate oligonucleotides, gene fragments and synthetic DNA fragments were employed in the past to design combinatorial protein libraries [31][32][33][34] . Nevertheless, the rationale of shuffling several functional mutants of multiple target residues to construct a combinatorial library has not been attempted before. Here, we exploited the functional information on single site mutants obtained from the focused library to construct a combinatorial library for KlenTaq DNA polymerase by molecular shuffling (Fig. 2), following an approach that was described as RACHITT (random chimeragenesis on a transient template) 34 . We attempted to construct a combinatorial library with 173 active mutants from 12 target sites of KlenTaq DNA polymerase (Fig. 2). For this purpose, 173 mutagenic oligonucleotides were used, each containing the defined mutation of one active mutant of the focused library (see Supplementary Table S2). The mutagenic oligonucleotides of the 173 active mutants were annealed to a transient single stranded template DNA of KlenTaq DNA polymerase that was obtained by PCR in the presence of dUTP instead of dTTP. The 3′ ends of the annealed mutagenic oligonucleotides were extended with Phusion U Hotstart DNA polymerase. After chimeric strand synthesis, the nicks were ligated with Taq DNA ligase and the transient single stranded template DNA was treated with uracil-DNA glycosylase to introduce abasic sites that foster strand cleavage under slightly basic conditions. Upon PCR amplification of the chimeric strand and ligation into plasmid DNA, the recombinant plasmids were transformed into E. coli and grown on selection plates. Plasmids were prepared from randomly chosen recombinant clones and the sequences were analysed for the presence of chimerism at the defined mutational sites of KlenTaq DNA polymerase. The chimerism of mutants from the resulting library was verified by sequencing 20 random clones, and the analysis of nucleotide sequences revealed that 85% of random clones had mutations at least at one or two target sites (see Supplementary Fig. S2). Notably, combinatorial library design involving shuffling of a large number of functional mutants in a position-specific manner, including mutagenesis of multiple target residues (12 target sites), has not been reported before 35 .

Figure 1. (A) [...] 23 . The N-terminal domain is highlighted in grey. The DNA complex containing primer (green) and template (yellow) is depicted. The finger, palm and thumb domains are depicted in cyan, red and magenta, respectively. (B) Rationally selected target sites for focused library construction are shown in the primary structure of KlenTaq DNA polymerase. The residues selected for mutation are highlighted in coloured boxes and the location of residues in the different domains of KlenTaq DNA polymerase is shown in red (palm), magenta (thumb) and cyan (finger). Evolutionarily conserved residues are marked with an asterisk above the amino acid. (C-E) Detailed view of rationally selected target residues from the finger, palm and thumb domains, respectively. The incoming dideoxycytidine triphosphate is shown in dark blue. (F) Activity profile of target amino acid sites investigated by site-directed mutagenesis and denoted in their respective domain (finger, palm and thumb) colour code. Amino acid substitutions resulting in PCR active mutants are denoted in green (Cq 1-30) and the inactive mutants are shown in black (Cq >36). Blue indicates mutants with reduced activity (Cq 31-35) and the parent amino acids are highlighted in grey. Circled numbers indicate the active mutants included for molecular shuffling in the combinatorial library.
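The sequence space opened up by shuffling can be estimated by simple counting: if each target site carries either the parent residue or one of its active substitutions, the number of possible variants is the product of (active substitutions + 1) over all sites. The sketch below illustrates this count. The per-site split of the 173 active substitutions is a hypothetical placeholder chosen only so that the counts sum to 173 over 12 sites; it is not the distribution used in the study.

```python
import math


def theoretical_diversity(active_per_site):
    """Number of distinct variants when every site carries either the
    parent residue or one of its active substitutions."""
    return math.prod(n + 1 for n in active_per_site)


# Hypothetical split of 173 active substitutions over 12 target sites.
example_counts = [15, 15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14]

diversity = theoretical_diversity(example_counts)
print(f"{diversity:.2e} possible variants for this hypothetical split")
```

Under any such split the theoretical diversity is far larger than a library of a few thousand clones can sample, which is why the hit rate among functional clones matters more than exhaustive coverage.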
Establishing and screening of combinatorial library for PCR activity. In order to generate the combinatorial library for activity screening, single cell clones were picked from the selection plates and cultivated in multi-well plates. A total of 15,000 recombinant clones were established. Overexpression was conducted in 96 well plate format, and lysates were prepared for screening of functional mutants (PCR activity) as described above. The screen was performed using the corresponding template and primers, and enzyme activity was recorded by quantifying the synthesised double stranded DNA using SYBR ® Green I. About 2000 mutants were identified as PCR active, using a permissive Cq cutoff of <36 cycles to account for enzyme variants with impeded expression.

Mutant with improved discrimination between matched and mismatched primer/template substrates.
KlenTaq DNA polymerase mutants with enhanced discrimination between extension from matched and mismatched primer ends have potential in applications like allele specific amplification (ASA) 21,36,37 . Thus, we set up a screen to monitor allelic discrimination through real time PCR (qPCR) 38 . Two parallel PCR reactions, one with a matched primer/template complex and the other with a primer/template complex that has a single mismatch at the 3′-primer terminus, were conducted directly using heat-denatured bacterial lysates containing the mutant DNA polymerase. The amplification in both reactions was recorded by qPCR and the respective Cq values were determined. The ∆Cq, i.e., the difference in Cq value between the two reactions conducted in parallel, is a measure of the allelic discrimination of the respective enzyme. Mutants showing a higher ∆Cq than wild-type KlenTaq DNA polymerase were considered positive hits. In order to investigate the influence of protein expression and other factors, e.g., those originating from components of the lysate in the real time PCR reaction, selected positive hits were purified to homogeneity (see Supplementary Fig. S3) and investigated in primer extension experiments (see Supplementary Fig. S4). The best performing mutant, henceforth termed Mut_ADL, showed promising results in both primer extension and real time PCR (Fig. 4). In primer extension, Mut_ADL showed a significant difference in the propensity to extend the mismatched primer when compared with wild-type KlenTaq DNA polymerase (Fig. 4A). Wild-type KlenTaq DNA polymerase yielded 30 nt long reaction products due to non-templated addition of one extra nucleotide to the blunt DNA duplex, a phenomenon that was described before for 3′-5′ exonuclease deficient DNA polymerases 39,40 . Interestingly, this non-templated reaction was reduced in the case of the mutant DNA polymerase.
In qPCR experiments, the purified mutant enzyme exhibited a ∆Cq value of 26 cycles, whereas wild-type KlenTaq DNA polymerase exhibited only 17 cycles. The mutant enzyme is, however, somewhat less efficient than the wild-type enzyme in amplification from the matched primer strand. Nevertheless, the higher ∆Cq value of the mutant enzyme confirmed the improved discrimination property of Mut_ADL. Melting-curve analysis revealed the amplification of a single specific PCR product during the PCR reaction (Fig. 4B,C). The corresponding melting peaks (right panels) confirmed the PCR products that were amplified by both DNA polymerases. Nucleotide sequence analysis of Mut_ADL revealed that the enzyme bears mutations at eight positions (see Fig. 5D and Supplementary Fig. S5). Notably, the mutant DNA polymerase tolerated a mutation load of up to eight amino acid substitutions at positions that are involved in contacts with the primer/template complex. Next, to demonstrate the potential application of Mut_ADL's increased discrimination, we investigated the ability of the enzyme to detect a single nucleotide polymorphism in genomic DNA. We thus chose a genomic SNP in an olfactory receptor sequence context 41 within HeLa genomic DNA. The increased propensity of Mut_ADL for allelic discrimination in comparison with the wild-type enzyme becomes more evident here, as the mutant amplifies from a matched primer/template complex significantly more efficiently than from the mismatched one (Fig. 5A-C).
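The ∆Cq comparison reported above can be sketched in a few lines. The functions below compute ∆Cq from a pair of Cq values, apply the hit criterion (a higher ∆Cq than the wild-type enzyme), and convert a ∆Cq difference into an approximate fold change. The fold conversion assumes ideal doubling of product per cycle, which is an idealization introduced here and not a claim from the study; since the mutant amplifies the matched template less efficiently, the true factor may differ. The individual Cq values in the usage line are hypothetical.

```python
def delta_cq(cq_mismatched, cq_matched):
    """Difference in Cq between the mismatched and matched reactions."""
    return cq_mismatched - cq_matched


def is_positive_hit(dcq_mutant, dcq_wild_type):
    """A mutant counts as a hit if its delta-Cq exceeds the wild-type value."""
    return dcq_mutant > dcq_wild_type


def fold_change(cycles, efficiency=1.0):
    """Approximate fold difference for a given number of cycles.

    Assumption: the product increases by a factor (1 + efficiency) per
    cycle; efficiency=1.0 corresponds to ideal doubling.
    """
    return (1.0 + efficiency) ** cycles


DCQ_MUT_ADL = 26    # reported for purified Mut_ADL
DCQ_WILD_TYPE = 17  # reported for wild-type KlenTaq

gain = DCQ_MUT_ADL - DCQ_WILD_TYPE
print(is_positive_hit(DCQ_MUT_ADL, DCQ_WILD_TYPE), gain, fold_change(gain))

# Hypothetical raw Cq values that would give a delta-Cq of 26.
print(delta_cq(cq_mismatched=40.0, cq_matched=14.0))
```

Under the ideal-doubling assumption, the 9-cycle gain in ∆Cq would correspond to roughly a 500-fold stronger suppression of mismatch extension relative to the wild-type enzyme.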
In order to gain first preliminary insights into the overall selectivity of Mut_ADL, we also sequenced the amplificates of the PCR and found that the sequence was maintained without any errors (see Supplementary Fig. S6). Additionally, we investigated the incorporation selectivity for matched versus mismatched nucleotides. The data obtained (see Supplementary Fig. S7) indicate no significant changes compared to the wild-type enzyme. Thus, the obvious increase in mismatch extension discrimination is not accompanied by an increase in insertion selectivity. This is not surprising, since our screening design was set to focus on the former. Next, we investigated the processivity of Mut_ADL in comparison to the wild-type enzyme. We followed published protocols [42][43][44] and used heparin as a trap. In brief, the primer/template complex was preincubated with the enzyme prior to simultaneous addition of dNTPs and heparin. After incubation, the reactions were analysed by PAGE. Interestingly, while the wild-type enzyme extends the primer by incorporation of up to 13 nucleotides, Mut_ADL is much more distributive: it hardly starts synthesis and mostly aborts extension after incorporation of a single nucleotide (Fig. 5E). This distributive behaviour might add to the overall propensity of the enzyme to discriminate mismatches at the primer end.
In order to gain first insights into whether the discrimination of mismatch extension also applies to other sequence contexts, we investigated Mut_ADL along these lines (see Supplementary Fig. S8). Much to our delight, the property of mismatch discrimination was also found in the other sequence context. In summary, the depicted combinatorial library design yielded a viable mutant with potential application in allele-specific amplification, which is used e.g., for genotyping, mutation diagnostics, and HLA typing [45][46][47] . This mutant carries a hitherto unprecedented mutation load for a DNA polymerase with increased selectivity.
Mutants with new catalytic activity: reverse-transcription PCR. The sequence and functional diversity of the constructed library was next screened for a functional property in which the wild-type enzyme shows little activity, namely the ability to reverse transcribe RNA into DNA 48,49 . The screen for reverse transcription PCR activity was again conducted with lysates of cells expressing the mutated enzymes 22,50,51 . The enzymes were screened for reverse transcriptase (RT) activity through real time PCR, using MS2 bacteriophage genomic RNA as substrate for DNA synthesis. Mutants with RT activity were selected by quantifying PCR product formation with SYBR ® Green I and by melting curve analysis in 96 well plate format. After screening the PCR active mutants of the combinatorial library, we identified several mutants with significantly increased RT activity. Afterwards, the RT active mutants were screened in a second round under more stringent conditions, involving a reduced extension time of 7.5 min in the initial reverse transcription step and a reduced template concentration of 5 pg/μl. The most promising hit from this round (henceforth named Mut_RT) was sequenced and further characterized (Fig. 6). The five mutations conferring the novel polymerase activity are depicted in Fig. 6A, and the changes in nucleotide sequence at the mutation sites are shown in Supplementary Fig. S9. The new mutations showcase the functional plasticity of KlenTaq DNA polymerase and, more importantly, the sequence diversity contributing to activity on an RNA template. The mutant was purified by Ni-NTA affinity chromatography and analysed by SDS-PAGE along with wild-type KlenTaq DNA polymerase (Supplementary Fig. S10). We characterized the mutant Mut_RT in detail, in comparison to wild-type KlenTaq DNA polymerase, by conducting primer extension and real time PCR experiments (Fig. 6).
A radioactively labelled DNA primer strand was annealed to its complementary site on either a 53 nt RNA template strand or a DNA template strand and incubated with the respective enzyme, wild-type KlenTaq DNA polymerase or Mut_RT. The extended products of the primer extension experiments were resolved by 12% denaturing polyacrylamide gel electrophoresis. Both wild-type and mutant were able to extend the primer strand to a full length product when the DNA template was used. However, with the RNA template, it was evident that the wild-type polymerase has little intrinsic ability to utilize an RNA template, as it failed to extend the bound DNA primer strand beyond two nucleotides (Fig. 6B). On the other hand, Mut_RT showed remarkable extension of the primer strand when hybridized with an RNA template (Fig. 6B). The specific activity using RNA and DNA templates was investigated in comparison to the wild-type enzyme (Table 1, Supplementary Fig. S11). This confirmed the superiority of the identified mutant in reverse transcription compared to the parental enzyme. Interestingly, the activity of Mut_RT on the DNA template was 4-fold higher in comparison to wild-type KlenTaq DNA polymerase. This effect was also observed for other DNA polymerases, such as other variants of KlenTaq DNA polymerase and human DNA polymerase β, that were evolved to have increased reverse transcriptase or lesion bypass activity 22,50,52,53 .
Next, the reverse transcriptase activity of the evolved enzyme was investigated by employing varied concentrations of RNA template in qPCR with SYBR ® Green I for detection of double stranded DNA (Fig. 6C). The sensitivity of Mut_RT in the employed qPCR was about 0.1 ng, whereas the wild-type enzyme showed no amplification under identical conditions. Finally, to demonstrate the applicability of Mut_RT in cDNA synthesis, total RNA from HEK293 cells was extracted and tested as target for Mut_RT for amplification from the target HRPT mRNA 54 . The desired 77 bp PCR product was amplified from the HRPT mRNA transcript, as detected by agarose gel electrophoresis (Fig. 6D). Again, in order to gain first preliminary insights into the overall selectivity of Mut_RT, we also sequenced the amplificates of the PCR and found that the sequence was maintained without any errors (see Supplementary Fig. S12). The above findings show that a catalytic function that is not present in the parental enzyme could be identified by the employed combinatorial library design. We identified a new set of mutations that result in reverse transcription activity. We expect that the novel mutations of Mut_RT can serve as an evolutionary starting point for further engineering studies 12,55 .

Discussion
Here, we investigated a two-step approach that coupled structure-based design and molecular shuffling for the design and construction of a combinatorial library of mutants of KlenTaq DNA polymerase. First, we selected target amino acid sites for saturation mutagenesis that directly interact with the primer/template strands or are evolutionarily conserved among family A DNA polymerases. The rationally designed library was investigated extensively for the effect of amino acid substitution at each target site, and the activity profile showed some of the selected target sites to be immutable (Fig. 1F). For instance, arginine 573 directly interacts with the nucleobases of both primer and template strands and could not tolerate any substitution, suggesting a crucial role in the structural stability and catalytic function of the DNA polymerase (Fig. 1D insert). Residues R659, K663, R728, R746 and Q754 of the finger domain were also among the least tolerant to substitution (Fig. 1F).
Overall, from this library we identified mutations that had negligible interference with DNA polymerase activity and thermostability. These functionally active mutants were then randomly combined to build a versatile second library. We reasoned that the combination of mutations that have a minuscule effect on enzyme activity and thermostability would result in entities that have an increased mutation load, but still retain activity. Besides activity and thermostability, we screened the library for entities with two distinct properties. First, we screened the library for entities with increased mismatch discrimination when the mismatch is located at the 3′-primer terminus. We identified the variant Mut_ADL, which significantly discriminates between matched and mismatched primer termini and has high potential for applications like allele-specific amplification or genotyping (Fig. 5). Interestingly, the mutant Mut_ADL bears eight mutations, in contrast to our earlier attempts along these lines, in which the only competent variants identified bore single mutations and further mutation resulted in significantly inactivated enzymes 21 . We also found that four of the eight mutations are located in evolutionarily conserved motifs of family A DNA polymerases (N483K in Motif 1, A570E and D578G in Motif 2, and I614M in Motif A) 24,56 . We speculate that such a tremendous tolerance of mutation load was feasible owing to the rationale of combining functionally active mutants for the combinatorial library design.
Next, we screened the library for entities that exhibit reverse transcription activity, a catalytic activity that is only minuscule in the parental enzyme. Again, we were successful in identifying a variant. The variant termed Mut_RT bears five mutations and exhibits significantly increased activity on RNA as well as DNA in comparison to the wild-type enzyme. The combination of mutations in Mut_RT DNA polymerase is novel to our knowledge (Fig. 6A). It is known that the mutation E507K in Taq DNA polymerase contributes to a fast PCR cycling property 57 , and that mutation at K540, in concert with five other mutations, confers increased resistance to inhibitors such as heparin 58 . Residue I614 tolerates amino acid exchange and retains activity close to that of the wild-type enzyme, but exchange at this site confers low fidelity 59 . I614M in combination with secondary mutations (E602V, A608V, E615G) enables the synthesis of DNA-RNA hybrids 60 . However, the single mutant I614K has been shown to incorporate both deoxyribonucleotides and ribonucleotides with a DNA template but not with an RNA template 61 . From this, we speculate that the reverse transcriptase activity of Mut_RT might be due to synergistic effects of the distal mutations (N483K, E507K, I614K, K540Y, V586G).
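The mutation labels used above follow the usual notation of parent residue, position and new residue (e.g. N483K). As a minimal sketch, the code below parses such labels and compares the five Mut_RT mutations with the four Mut_ADL mutations that are named explicitly in the text; the remaining four Mut_ADL mutations are not listed in this section and are therefore left out. The function name `parse_mutation` and the comparison itself are illustrative additions, not part of the original analysis.

```python
import re

# Parent residue (one letter), position, new residue (one letter).
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'N483K' into ('N', 483, 'K')."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a valid mutation label: {label!r}")
    parent, position, new = match.groups()
    return parent, int(position), new


MUT_RT = ["N483K", "E507K", "I614K", "K540Y", "V586G"]
# Only the four Mut_ADL mutations named in the text (conserved motifs).
MUT_ADL_NAMED = ["N483K", "A570E", "D578G", "I614M"]

shared_mutations = sorted(set(MUT_RT) & set(MUT_ADL_NAMED))
shared_positions = sorted(
    {parse_mutation(m)[1] for m in MUT_RT}
    & {parse_mutation(m)[1] for m in MUT_ADL_NAMED}
)
print(shared_mutations, shared_positions)
```

Among the mutations named in the text, the two variants share N483K and both carry an exchange at position 614, although with different substitutions (I614M versus I614K).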
The obtained results support our hypothesis that the depicted approach (rational design of mutation sites and combinatorial design by shuffling active mutants) is indeed successful in identifying enzyme variants with a diverse functional scope. We were successful in obtaining an enzyme with limited substrate scope (extension from matched primer strands) and one with increased substrate scope (efficient usage of RNA templates). This is the first time that we were able to "evolve" these seemingly incompatible properties from a single library. We envision that this method is not limited to the functions described herein. Future attempts to explore the impact of the sequence diversity of this library on enzyme function will aim to screen for mutants with, e.g., the ability to sense epigenetic marks at single base resolution, the propensity to process damaged templates, improved propensity for chemically modified substrates, and improved tolerance to inhibitors. Of note, we reason that the approach depicted here is not limited to KlenTaq DNA polymerase but can also be extended to other proteins or enzymes for which structural data are readily available.

Material and Methods
DNA manipulation and focused library construction. We purchased oligonucleotides from biomers.net GmbH and dissolved them in deionized water. The single mutants for all the selected target sites were generated by site-directed mutagenesis. Forward primers (mutagenic primers) were designed to carry the triplet codon for the desired mutation, and 5′ phosphorylated reverse primers enabled ligation after PCR. Each target residue was mutated with 19 different forward primers in PCR reactions containing Phusion ® High fidelity DNA polymerase (NEB) and the other necessary PCR components. After PCR amplification, the template plasmid (wild-type KlenTaq in pGDR11) was digested using the methylation-dependent endonuclease DpnI (NEB) by incubating at 37 °C for 1 h. The PCR products were purified from agarose gel with the QIAquick ® Gel Extraction Kit (QIAGEN) and ligated with T4 DNA ligase (NEB) overnight in the refrigerator. The ligated products were transformed into calcium chloride treated Escherichia coli BL21 (DE3) cells (Novagen) and positive clones were selected after overnight incubation at 37 °C. Plasmids were extracted from positive clones using the QIAprep ® Spin Miniprep Kit (QIAGEN) and sequenced by Sanger sequencing (GATC Biotech). Clones carrying the desired mutations were established in 384 well plates containing 150 μl of LB medium supplemented with 100 μg/ml of carbenicillin and incubated for overnight growth. The overnight cultures were stored at −80 °C after adding 50 μl of 60% sterile glycerol (v/v).
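The storage step above implies a final glycerol concentration that follows from a simple dilution calculation. The helper below is a minimal sketch of that arithmetic; the function name `final_percent` is a hypothetical name introduced here, and the calculation assumes that the volumes are additive.

```python
def final_percent(stock_percent, stock_volume_ul, sample_volume_ul):
    """Final concentration (% v/v) after adding a stock to a sample.

    Assumption: volumes are additive.
    """
    total_volume_ul = stock_volume_ul + sample_volume_ul
    return stock_percent * stock_volume_ul / total_volume_ul


# 50 ul of 60% glycerol added to 150 ul of overnight culture.
glycerol_final = final_percent(60.0, 50.0, 150.0)
print(glycerol_final)  # 15.0
```

With the volumes given in the text, the frozen stocks therefore contain 15% (v/v) glycerol.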
Overexpression of KlenTaq DNA polymerase library. Overexpression of the KlenTaq DNA polymerase mutant library was carried out as already described 28 . In short, the expression was performed in 96 well plates with 10 μl of overnight culture in 940 μl of LB medium containing 100 μg/ml of carbenicillin. Protein expression was induced with 1 mM IPTG and cells were harvested by centrifugation at 4000 rpm for 20 min. Cell pellets were resuspended in 1× KlenTaq buffer (50 mM Trizma ® base (pH 9.2), 16 mM (NH4)2SO4, 2.5 mM MgCl2, 0.1% (v/v) Tween 20) containing 0.1 mg/ml lysozyme and lysed in a 37 °C water bath for 20 min. After heat denaturation at 75 °C for 45 min, the plates were centrifuged at 4000 rpm at 4 °C to clear the lysates, and the lysates were used directly for screening.