Structural basis for TNA synthesis by an engineered TNA polymerase

Darwinian evolution experiments carried out on xeno-nucleic acid (XNA) polymers require engineered polymerases that can faithfully and efficiently copy genetic information back and forth between DNA and XNA. However, current XNA polymerases function with inferior activity relative to their natural counterparts. Here, we report five X-ray crystal structures that illustrate the pathway by which α-L-threofuranosyl nucleic acid (TNA) triphosphates are selected and extended in a template-dependent manner by a laboratory-evolved polymerase known as Kod-RI. Structural comparison of the apo, binary, open and closed ternary, and translocated product structures details an ensemble of interactions and conformational changes required to promote TNA synthesis. Close inspection of the active site in the closed ternary structure reveals a sub-optimal binding geometry that explains the slow rate of catalysis. Ternary structural information of this kind, which is missing for all naturally occurring archaeal DNA polymerases, provides a framework for engineering new TNA polymerase variants.

The laboratory-evolved polymerase Kod-RI catalyzes α-L-threose nucleic acid (TNA) synthesis. Here, the authors present Kod-RI crystal structures that give insights into how TNA triphosphates are selected and extended in a template-dependent manner, which will help to engineer improved TNA polymerases for synthetic genetics applications.

Synthetic genetics is an emerging field of science that aims to extend the principles of heredity and evolution to nucleic acid polymers with backbone structures that are distinct from those found in nature 1 . Collectively referred to as xeno-nucleic acids or XNA 2 , these polymers have unique physicochemical properties that often include strong resistance to degradative enzymes and duplex structures that adopt a range of helical geometries 3 . By engineering polymerases to synthesize and recover genetic information encoded in XNA, researchers are developing molecular systems capable of undergoing Darwinian evolution in response to imposed selection constraints 4 . These studies, which expand our ability to store, propagate, and evolve genetic information, have profound implications for biotechnology, molecular medicine, and the origin of life 5 .
To date, five different XNA polymers with non-ribose sugars (1,5-anhydrohexitol nucleic acid (HNA), arabino nucleic acid (ANA), 2′-fluoro-arabino nucleic acid (FANA), cyclohexenyl nucleic acid (CeNA), and α-L-threose nucleic acid (TNA)) have been replicated in a Darwinian evolution system [6][7][8][9] . Of these, TNA is considered the most structurally divergent, because its backbone repeat unit is one atom shorter than that of DNA or RNA (Fig. 1a) 10 . By comparison, all other XNAs capable of Darwinian evolution maintain the six-atom backbone repeat unit found in natural DNA and RNA 1 . Remarkably, despite this difference, TNA forms stable antiparallel Watson-Crick duplex structures with itself and with complementary strands of DNA and RNA 10,11 . Solution NMR studies reveal that duplex formation in either the self-pairing mode (TNA/TNA) or the cross-pairing mode (TNA/DNA or TNA/RNA) occurs through an A-like helical geometry templated by the rigid TNA backbone 12,13 . More recently, stability assays performed under harsh biological conditions have shown that TNA is refractory to nuclease digestion 14 . This feature, coupled with the ability to undergo Darwinian evolution in vitro 8 , makes TNA a promising candidate for diagnostic and therapeutic applications that require high biological stability 15 .
TNA synthesis is made possible by Kod-RI, a laboratory-evolved polymerase derived from a replicative B-family polymerase isolated from the hyperthermophilic archaeon Thermococcus kodakarensis (Kod) 16 . In addition to the 3′–5′ exonuclease-silencing mutations D141A and E143A, Kod-RI carries the TNA synthesis mutations A485R and E664I (Supplementary Fig. 1), which were identified using droplet-based optical polymerase sorting, a microfluidic strategy that allows rapid screening of engineered polymerases in uniform microcompartments 17 . Kod-RI is the most efficient TNA polymerase developed to date 16 , exhibiting 5-fold faster primer extension (3 h vs. 15 h) and ~20-fold higher fidelity (4 vs. 70 errors per 1000 nucleotide incorporations) than Therminator DNA polymerase (9°N, A485L), which was previously used for TNA synthesis 18 .
Although more than 30 years have passed since the first polymerase structure was solved by X-ray crystallography 19 , no structural information is available for an archaeal polymerase with a primer-template and nucleoside triphosphate bound in the active site. Because archaeal polymerases are widely used in biotechnology 20 , open and closed ternary structures would help evaluate the mechanism of natural DNA synthesis relative to other polymerase families. Crystal structures of XNA polymerases would be even more valuable, as they would reveal the limitations of existing XNA polymerases. In their absence, structural information about the ternary complex must be inferred from distantly related viral (RB69 Pol and Phi29 Pol) and eukaryotic (Pols α, δ, and ε) polymerases, which share only ~20% sequence identity with archaeal B-family polymerases [21][22][23][24][25] .
Here, we describe a structural approach to evaluating the pathway by which a laboratory-evolved polymerase synthesizes unnatural TNA polymers on natural DNA templates. The five X-ray crystal structures detail an ensemble of intermolecular interactions and conformational changes that allow Kod-RI to promote TNA synthesis. Close inspection of the active site in the closed ternary structure, trapped in a pre-catalytic state, reveals a sub-optimal binding geometry for the incoming TNA triphosphate. This finding explains the slow rate of Kod-RI-mediated TNA synthesis relative to DNA synthesis by natural Kod DNA polymerase. Together, the structures offer insight into the structural plasticity of DNA polymerases and provide a framework to guide the engineering of new TNA polymerase variants with improved catalytic activity.

Results
Function. Kod-RI is a DNA-dependent TNA polymerase that can transcribe individual strands or large libraries of degenerate DNA sequences into TNA (Fig. 1b) 16 . This property, which is remarkable given how different the TNA backbone is from that of DNA and RNA, has enabled the evolution of TNA aptamers from unbiased pools of random sequences 26 . The enzyme functions by a primer-extension mechanism in which a primer strand (DNA or TNA) annealed to a DNA template is extended with chemically synthesized TNA triphosphates. Previous analyses indicate that Kod-RI functions with a modest rate of ~1 nucleotide per minute 16 , which is ~10,000-fold slower than the rate of DNA synthesis by wild-type Kod DNA polymerase 27 . Affinity measurements on the primer-template (P/T) complex and TNA triphosphates reveal that Kod-RI has a K_D of 0.6 nM for the P/T duplex and a K_M of ~15 μM for tNTP substrates (Fig. 1c). Because these values fall within the range of natural archaeal B-family DNA polymerases 28 , substrate binding is unlikely to limit the rate, which led us to speculate that the slow rate of TNA synthesis is due to an imperfect active site that positions tNTP substrates in a geometry that is sub-optimal for phosphodiester bond formation.
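The argument that binding is not the bottleneck can be made concrete with simple Michaelis-Menten arithmetic. The sketch below uses only the K_M and rates quoted above plus the 100 μM tNTP concentration from the primer-extension protocol in the Methods; the function and constant names are illustrative.

```python
# Illustrative arithmetic using values quoted in the text; names are hypothetical.

def fraction_saturated(substrate_uM: float, km_uM: float) -> float:
    """Michaelis-Menten fractional saturation, [S] / ([S] + K_M)."""
    return substrate_uM / (substrate_uM + km_uM)

KM_TNTP_UM = 15.0          # K_M of Kod-RI for tNTP substrates (~15 uM)
ASSAY_TNTP_UM = 100.0      # each tNTP in the primer-extension assay (Methods)

KOD_RI_NT_PER_MIN = 1.0    # ~1 nucleotide per minute for Kod-RI
FOLD_SLOWER = 10_000       # ~10,000-fold slower than wild-type Kod

saturation = fraction_saturated(ASSAY_TNTP_UM, KM_TNTP_UM)
wt_nt_per_s = KOD_RI_NT_PER_MIN * FOLD_SLOWER / 60.0

# Saturation across the tNTP range used for the kinetic titration (2.5-100 uM)
for s in (2.5, 15.0, 100.0):
    print(f"{s:6.1f} uM tNTP -> {fraction_saturated(s, KM_TNTP_UM):.2f} of V_max")

print(f"assay saturation: {saturation:.2f}")
print(f"implied wild-type rate: ~{wt_nt_per_s:.0f} nt/s")
```

At 100 μM tNTP the enzyme is already about 87% saturated, so even perfect binding could raise the rate by only ~15%, far short of the ~10,000-fold gap; this is consistent with a defect in the chemical step rather than in substrate binding.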
Crystallization. To provide structural insights into the mechanism of TNA synthesis, protein crystals of Kod-RI were grown under conditions designed to capture four main steps in the TNA synthesis pathway (Fig. 1d): P/T binding, nucleoside triphosphate binding, catalysis, and translocation. Critical to this effort was the chemical synthesis of a chain-terminating primer bearing a 2′-deoxy-α-L-threofuranosyl adenosine residue (tAd) at the 3′ end. Controlled pore glass (CPG) functionalized with a tAd analog was obtained in 8 synthetic steps (Fig. 2) from a known dimethoxytrityl-protected α-L-threofuranosyl adenosine nucleoside 29 . Solid-phase synthesis was then used to construct the DNA primer (P2, Supplementary Table 1) on the tAd-modified CPG resin. A second primer (P1, Supplementary Table 1) bearing an unmodified TNA adenosine residue (tA) was similarly constructed using tA-modified CPG. The α-L-threofuranosyl adenosine 3′-triphosphate (tATP) required for crystallization was obtained in 12 steps from L-ascorbic acid using known methodology 29,30 .
Although Kod-RI crystallized in the presence of tATP and magnesium, no electron density was observed for either the bound triphosphate or the magnesium ions. Similar results have been reported for crystallization trials conducted on wild-type Kod polymerase and related homologs 31,32 . To overcome this problem, we explored a variety of crystallization conditions, including variations in tATP concentration, soaking time, and metal ion additives. After extensive optimization, we obtained open and closed ternary structures with clear electron density for the TNA substrate and bound magnesium ions in the active site. The most successful condition involved soaking co-crystals grown in the presence of tATP with excess tATP and/or magnesium ions just …

Architecture. Consistent with all known structures of B-family polymerases, Kod-RI adopts a disk-shaped architecture comprising the N-terminal (NTD), exonuclease (Exo), and catalytic domains (Fig. 3a) 34 . The catalytic domain is further divided into the palm, finger, and thumb subdomains. The P/T duplex binds in a groove formed by the palm and thumb subdomains, which contact 9 base pairs of the duplex. Interaction maps created for the binary and ternary structures reveal that the duplex is recognized primarily through contacts to the phosphodiester backbone, with only a few direct contacts to the sugar and nucleobase moieties (Supplementary Fig. 3a-d). All of the sugar and nucleobase contacts occur through the minor groove, consistent with the propensity of B-family polymerases to accept modified nucleotides bearing functional groups at the C5 pyrimidine and C7 deazapurine positions 35 , which project into the major groove. In addition, many residues that recognize the P/T duplex are highly conserved among B-family polymerases, including the KKKY motif (residues 591–594, Fig. 3b), which is thought to stabilize the B-form helix by bringing the primer and template closer together 36 . The unpaired region of the template is stabilized by residues from the NTD and Exo domains, which cause an abrupt kink in the template at the +1 position. These interactions are all consistent with the high-resolution binary structures previously solved for Kod and 9°N 31 .
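The backbone-versus-base breakdown in the interaction maps can be approximated with a distance-based contact count. The sketch below is an illustration under stated assumptions, not the procedure used to build Supplementary Fig. 3: it uses PDB-style DNA atom names, counts the phosphorus and the bridging and non-bridging phosphate oxygens as backbone, applies a hypothetical 4.0 Å cutoff, and takes coordinates as plain tuples rather than parsing a structure file.

```python
import math

# PDB-style DNA atom names (assumed); TNA residues would need their own mapping.
BACKBONE_ATOMS = {"P", "OP1", "OP2", "O5'", "O3'"}
SUGAR_ATOMS = {"C1'", "C2'", "C3'", "C4'", "C5'", "O4'"}

def moiety(atom_name: str) -> str:
    """Assign a nucleotide atom to backbone, sugar, or base."""
    if atom_name in BACKBONE_ATOMS:
        return "backbone"
    if atom_name in SUGAR_ATOMS:
        return "sugar"
    return "base"

def classify_contacts(protein_atoms, nucleic_atoms, cutoff=4.0):
    """Count protein-nucleic acid atom pairs within `cutoff` angstroms, by nucleotide moiety.

    Both inputs are lists of (atom_name, (x, y, z)) tuples.
    """
    counts = {"backbone": 0, "sugar": 0, "base": 0}
    for _, p_xyz in protein_atoms:
        for n_name, n_xyz in nucleic_atoms:
            if math.dist(p_xyz, n_xyz) <= cutoff:
                counts[moiety(n_name)] += 1
    return counts

# Toy coordinates (hypothetical): one protein atom near a phosphate oxygen and a sugar carbon
protein = [("NZ", (0.0, 0.0, 0.0))]
nucleic = [("OP1", (3.0, 0.0, 0.0)), ("C1'", (0.0, 3.5, 0.0)), ("N3", (0.0, 0.0, 5.0))]
print(classify_contacts(protein, nucleic))
```

A real analysis would also distinguish minor-groove from major-groove base atoms and use residue-specific atom names for the TNA nucleotides.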
Conformational changes. Because the five structures span the TNA synthesis pathway, we could examine the conformational changes that facilitate TNA synthesis. Comparative structural analyses identified three major conformational changes among the five Kod-RI structures. The first arises when the apo form of the polymerase binds the P/T duplex to form the binary complex (Fig. 3b). Upon P/T binding, the thumb subdomain transitions from an ensemble of poorly defined conformations to a well-ordered binary structure. One striking rearrangement involves helix α20, which shifts ~23 Å to bind the minor groove face of the P/T duplex (Fig. 3b). In addition, several secondary structural elements (e.g., β26–28 and 3₁₀4) that are not visible in the apo structure become ordered in the binary complex (Fig. 3b). These changes underscore the importance of the thumb subdomain in P/T binding.
The second major conformational change involves formation of the ternary complex with the P/T duplex and tATP substrate bound in the active site. This step involves insertion of … Fig. 4), implying that the binary complex can accommodate a tNTP substrate without structural change. This observation is consistent with previous kinetic data showing that dNTPs diffuse directly into the active site of B-family polymerases 37 , rather than occupying a pre-insertion site as observed for some A-family polymerases 38 . The initial tATP binding event is followed by a major conformational change (Fig. 3c) in which the finger subdomain closes upon the nucleoside triphosphate. Alignment of the open and closed ternary complexes (rmsd 2.5 Å) shows that Kod-RI undergoes a substantial conformational change to reach the catalytically relevant state. During this transition, helix α15 in the finger subdomain tilts 21° inward to form the closed ternary complex, in which tATP is trapped in a pre-catalytic state because the 2′-deoxy-tA residue at the primer 3′ end lacks the 2′-hydroxyl nucleophile. In addition, numerous residues in the finger (e.g., Arg460, Lys464, Lys487, and Asn491) and palm (e.g., Asp404, Asp540, Asp542, Glu578, and Glu580) subdomains are repositioned to promote substrate recognition and catalysis (Fig. 4c).
Following catalysis, the polymerase undergoes a third conformational change in which the finger subdomain reopens and the polymerase translocates to the next position on the template (Fig. 3d). This step completes the cycle of nucleotide addition by moving the polymerase from position 0 to position −1 on the DNA template. The structural changes between the closed ternary complex and the translocated product are similar to those between the open and closed ternary complexes (rmsd 2.5 Å), as expected if the finger subdomain opens and closes to the same extent in each cycle of nucleotide addition.
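The rmsd values quoted for these comparisons come from optimal rigid-body superposition of matched atoms. The text does not specify the atom set or software, so the following is a generic sketch of the Kabsch superposition, assuming two equally sized, correspondingly ordered coordinate arrays (for example matched Cα atoms).

```python
import numpy as np

def kabsch_rmsd(p, q) -> float:
    """RMSD between matched coordinate sets p and q (N x 3) after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)              # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                       # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))

# Demo on synthetic coordinates: a rotated and translated copy superposes exactly
rng = np.random.default_rng(0)
coords = rng.normal(scale=10.0, size=(50, 3))
angle = np.radians(40.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([5.0, -3.0, 2.0])
print(kabsch_rmsd(coords, moved))   # ~0 up to floating-point error
```

Because superposition removes only rigid-body motion, a nonzero rmsd such as 2.5 Å reflects genuine internal rearrangement, here dominated by the finger subdomain.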
Active site analysis. The active site pocket, which encompasses the nascent T4:tATP base pair, is formed primarily by residues Ile488, Asn491, Ser492, Tyr494, Gly495, Gly498, and Tyr499 in the finger subdomain and Leu408 and Tyr409 in the palm subdomain (Fig. 4a, b). Three highly conserved carboxylate residues (Asp404, Asp540, and Asp542) mark the polymerase active site 34 . Interestingly, the steric gate residue Tyr409 does not interact with the threose sugar of the incoming tATP substrate (~4.0 Å away), even though this position is known to discriminate against ribonucleoside triphosphates in analogous wild-type polymerases (Supplementary Fig. 5) 39 . Electron density maps indicate that tATP is tightly bound in the closed ternary conformation (Fig. 4b) but only weakly bound in the open conformation (Fig. 4d), as expected from the position of the finger subdomain in the two structures.
Three magnesium ions are observed in the active site of the closed ternary complex. Two of these (Mg2 and Mg3) adopt positions that are structurally identical to those in other B-family polymerases whose ternary structures have been solved by X-ray crystallography (Fig. 4b, c). The third (Mg1), located between the α-phosphate of tATP and the primer strand (Fig. 4b, c), is responsible for aligning the 2′-hydroxyl group on the TNA primer for nucleophilic attack on the tNTP substrate. This metal ion lies 3.2 Å from the C2′ atom of the primer strand and 2.5 Å from the α-phosphate of the incoming tATP. A simulated annealing omit map for tATP reveals that the adenine base is highly flexible relative to the triphosphate tail and threose sugar (Fig. 4b). This nucleobase flexibility is attributed to an active site pocket that is not fully optimized for the smaller tNTP substrate (Fig. 4a). Unlike DNA, TNA lacks a 5′-methylene carbon; in a natural dNTP, this carbon helps bring the nucleoside triphosphate closer to the templating base and thereby supports stronger Watson-Crick base pairing. Presumably, these interactions could be strengthened by mutations in the thumb subdomain that better constrain the tNTP substrate in the active site pocket.
Base pair geometry. Computational analysis of the Kod-RI structures containing a P/T duplex revealed significant deviations from planarity in the base pair at the active site 40 . In particular, the buckle and propeller parameters for the T5:tA12 base pair of the binary complex are ~−22° and ~18°, respectively (Fig. 4d, Supplementary Table 2). These distortions recur in the nascent T4:tATP base pair of the open and closed ternary complexes, where the non-planar geometry propagates to the T5:tA12 and C6:G11 base pairs (Supplementary Table 2). However, after a single turnover event the base pair geometry returns to a normal planar conformation, as seen in the translocated structure (Fig. 4d), indicating that the sub-optimal geometry observed in the pre-catalytic state is corrected after the bond-forming step. Similar analyses of the ternary structures of known viral and eukaryotic B-family polymerases bound to dNTPs show that the distortions observed for Kod-RI are distinct and severe relative to natural B-family polymerases (Supplementary Table 3) [21][22][23][24][25] . In all of these natural polymerase structures, base pair planarity is maintained throughout the duplex, including the incoming nucleoside triphosphate, which stacks directly on the 3′ end of the primer strand. One minor deviation from this trend is the binary structure of wild-type Kod DNA polymerase bound to an all-DNA P/T duplex, which exhibits buckle and propeller distortions at T5:A12 (Supplementary Table 2) 31 . Whether this distortion is typical of Kod DNA polymerase is difficult to assess without more structural information, most notably a ternary structure of the wild-type polymerase.
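Buckle and propeller are defined in a standardized base-pair reference frame, as implemented in dedicated programs such as 3DNA, and reproducing that decomposition is beyond a short example. As a simpler proxy for the overall non-planarity discussed above, the sketch below fits a least-squares plane to each base's ring atoms and reports the angle between the two plane normals. It does not separate buckle from propeller, and the input is assumed to be the ring-atom coordinates of each base.

```python
import numpy as np

def plane_normal(coords) -> np.ndarray:
    """Unit normal of the least-squares plane through a set of atoms."""
    xyz = np.asarray(coords, dtype=float)
    _, _, vt = np.linalg.svd(xyz - xyz.mean(axis=0))
    return vt[-1]                      # direction of least variance

def interbase_angle_deg(base_a, base_b) -> float:
    """Angle (0-90 degrees) between the mean planes of two paired bases."""
    # Plane normals have arbitrary sign, so compare their absolute dot product
    cos = abs(float(np.dot(plane_normal(base_a), plane_normal(base_b))))
    return float(np.degrees(np.arccos(min(cos, 1.0))))

# Synthetic demo: a flat hexagon and a displaced copy tilted by 25 degrees about the x axis
ring = np.array([[1.4 * np.cos(a), 1.4 * np.sin(a), 0.0]
                 for a in np.radians(np.arange(0, 360, 60))])
tilt = np.radians(25.0)
rot_x = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(tilt), -np.sin(tilt)],
                  [0.0, np.sin(tilt), np.cos(tilt)]])
tilted = ring @ rot_x.T + np.array([0.0, 3.0, 0.0])
print(round(interbase_angle_deg(ring, tilted), 1))
```

A planar Watson-Crick pair gives an angle near zero, so a large value for the nascent pair flags the kind of distortion described above, although the split into buckle and propeller still requires a base-pair analysis program.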
TNA synthesis mutations. Apart from the exonuclease-silencing mutations, Kod-RI differs from natural Kod DNA polymerase by the mutations A485R and E664I, which were identified by directed evolution as amino acid changes that confer TNA synthesis activity on the natural polymerase scaffold 17 . Both mutations are located outside the active site pocket, and mutational analysis suggests that the two residues perform independent functions 16 . Arg485 is located on helix α15 in the finger subdomain (Fig. 5). In the closed ternary complex, the Arg485 side chain points upward relative to the helix and interacts with Arg266, Glu330, and Leu333 on helices α6 and α9 of the Exo domain, whereas in each of the open-conformation structures (binary, open ternary, and translocated) it points downward along the helix (Supplementary Fig. 6). Comparison of the binary structures of Kod-RI and natural Kod DNA polymerase reveals that the bulky Arg485 side chain causes helix α15 to bend toward the primer-template duplex (Supplementary Fig. 7). We postulate that this structural perturbation enables the recognition of TNA substrates by altering the shape of the active site pocket.
By contrast, Ile664 is located on strand β27 in the thumb subdomain (Fig. 5), where it contacts the phosphodiester backbone at nucleotide positions +5 and +6 of the P/T duplex. Holliger and co-workers have identified position 664 as a key site for the processive synthesis of non-cognate nucleic acid polymers 41 . In one example, variants of a DNA polymerase isolated from Thermococcus gorgonarius were found to promote efficient RNA synthesis on DNA templates 41 . While the precise functional role of Ile664 remains unknown, we speculate that this mutation relaxes stringent recognition of the natural P/T duplex by replacing a critical electrostatic interaction with a less discriminating hydrophobic side chain.

Discussion
Archaea constitute one of the three major evolutionary lineages of life 42 . These organisms exist in a broad range of habitats that include harsh environments, such as hot springs and salt lakes, as well as milder areas consisting of soils, oceans, and marshlands. Because of their ability to withstand high temperatures and organic solvents, enzymes isolated from thermophilic archaea have been exploited in many biotechnology applications 43 .
Indeed, numerous examples now exist where archaeal B-family DNA polymerases have been shown to accept chemically modified nucleotides bearing alternative functionality at the sugar or nucleobase moieties 44,45 and these enzymes are often used as the starting point for the directed evolution of XNA polymerases 6,17 . However, despite a prominent role in nature and biotechnology, there exists a paucity of structural information for this important class of DNA polymerases.
Here, we address this shortcoming by providing a series of X-ray crystal structures that describe how a laboratory-evolved polymerase synthesizes unnatural TNA polymers on natural DNA templates. This body of work produced X-ray … The open and closed ternary structures of Kod-RI bound to a primer-template duplex and TNA triphosphate reveal a sub-optimal geometry for the incoming nucleoside triphosphate, characterized by severe buckle and propeller distortions of the nascent base pair. This sub-optimal binding geometry, which extends to the divalent metal ions, likely accounts for the slow rate of catalysis observed for Kod-RI. By comparison, the ternary structures of distantly related viral and eukaryotic B-family polymerases exhibit a co-planar geometry for the nascent base pair, with divalent metal ions ideally positioned for the subsequent bond-forming step (Fig. 6). This difference is not surprising given the limited evolutionary history of Kod-RI relative to natural polymerases 17 , which have benefited from billions of years of natural selection. That engineered polymerases can synthesize artificial genetic polymers with backbones distinct from those found in nature is nonetheless a remarkable achievement, and one that can be further improved by directed evolution.
Moving forward, the closed ternary complex provides an important structural framework for generating new TNA polymerase variants with increased catalytic activity. Future efforts to improve the geometry of the nascent base pair would benefit from studies that focus on primary and secondary shell residues in the active site. These could include mutagenesis of nearby residues in the thumb subdomain, together with residues in the finger and palm subdomains that may be required for compensatory reasons. As these studies continue, a more comprehensive view of natural and engineered archaeal polymerases would benefit from high-resolution X-ray crystal structures of three ternary complexes that have yet to be solved: (i) wild-type polymerase bound to a natural dNTP; (ii) wild-type polymerase bound to a tNTP; and (iii) Kod-RI bound to a natural dNTP. These efforts would also benefit from binary and ternary structures in which the primer strand of the P/T duplex is composed entirely of TNA. Structural information of this type would make it possible to better evaluate the slow rate of TNA synthesis relative to natural DNA synthesis by the wild-type polymerase.
In summary, structural analysis of the TNA synthesis pathway provides critical insights into the constraints of a laboratory-evolved polymerase. Combining directed evolution with structure determination provides important clues that can guide the development of future XNA polymerases. It will be interesting to see how well molecular evolution, either alone or guided by computational design, can close the gap in catalytic activity between engineered polymerases and their natural counterparts. Such efforts open the door to a vast new world of synthetic genetics, in which sequence-defined synthetic polymers can be used to create new tools for biotechnology and medicine, and possibly even improve our understanding of the origin of life.

Methods
Synthetic procedures. All reagents and solvents of highest purity were purchased from commercial suppliers and used without further purification. Reactions were run under an inert atmosphere of nitrogen or argon, unless otherwise indicated. Thin-layer chromatography (TLC) was performed using Silica Gel 60 F254-coated glass plates (0.250 mm thickness) with visualization by UV irradiation. Chromatography was performed with SiliaFlash P60 (230–400 mesh) silica gel. Solutions in organic solvents were concentrated under reduced pressure using a Buchi rotary evaporator.
NMR spectra were collected at 298 K on a Bruker DRX-400 (400 MHz ¹H) equipped with a QNP probe or a Bruker DRX-500 equipped with a TCI cryogenic probe (500 MHz ¹H, 125.7 MHz ¹³C). Spectra are internally referenced to residual solvent signals (CDCl₃ is referenced to 7.26 ppm for ¹H and 77.16 ppm for ¹³C). Chemical shifts are reported in ppm, and coupling constants (J) are rounded to the nearest integer or half-integer and reported in Hz. High-resolution mass spectrometry was performed by the University of California, Irvine Mass Spectrometry Center.
Chemical synthesis. … (4). To a solution of 3 (400 mg, 0.64 mmol) in methanol (15 mL) was added K₂CO₃ (500 mg). The suspension was stirred for 18 h at 24°C, after which the solvent was evaporated, and the residue was suspended in EtOAc (30 mL) and washed sequentially with water (15 mL) and brine (15 mL). The organic phase was dried over MgSO₄ and evaporated to dryness. The residue was purified by silica gel column chromatography (0–2% MeOH/CH₂Cl₂ with 0.5% TEA). The pure product (4) was obtained as a solid foam (240 mg, 72%). ¹H NMR … the solvents were evaporated and the crude material was purified by silica gel chromatography. The pure product (6) was obtained as a solid foam (90 mg, 70%). ¹H NMR (400 MHz, CDCl₃, Supplementary Fig. 8a) … After stirring for 1 h at 24°C, the reaction mixture was added to dried, detritylated dT-CPG (R2) (150 mg) and 0.25 M ETT in MeCN (0.5 mL) and stirred for 15 min at 24°C. The CPG was filtered and washed with DCM and MeCN. The CPG was then divided among three DNA synthesis columns, oxidized, and capped using a standard DNA synthesis protocol at 1 µmol scale.
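The quantities reported for the conversion of 3 into 4 can be cross-checked with standard stoichiometric arithmetic. The sketch below uses only the masses, amounts, and yield given in the procedure; the molar mass of K₂CO₃ (138.21 g/mol) is the only outside value, and the molar masses it derives for 3 and 4 are implied by the stated numbers rather than reported.

```python
# Stoichiometry check for the K2CO3/MeOH step (3 -> 4); inputs are taken from the procedure.
K2CO3_G_PER_MOL = 138.21

def mmol(mass_mg: float, molar_mass: float) -> float:
    """Amount in mmol from a mass in mg and a molar mass in g/mol."""
    return mass_mg / molar_mass

substrate_mmol = 0.64                   # compound 3
implied_mw_3 = 400.0 / substrate_mmol   # mg / mmol == g/mol

k2co3_equiv = mmol(500.0, K2CO3_G_PER_MOL) / substrate_mmol

product_mmol = substrate_mmol * 0.72    # 72% isolated yield
implied_mw_4 = 240.0 / product_mmol

print(f"implied MW of 3: {implied_mw_3:.0f} g/mol")
print(f"K2CO3: {k2co3_equiv:.2f} equiv")
print(f"product: {product_mmol:.3f} mmol; implied MW of 4: {implied_mw_4:.0f} g/mol")
```

The base is used in roughly 5.7-fold molar excess, and the implied molar masses (about 625 and 521 g/mol) can be compared with the structures of 3 and 4 as a consistency check on the reported amounts.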
Oligonucleotide synthesis. TNA-modified oligonucleotides were synthesized on an Applied Biosystems 3400 DNA synthesizer using standard β-cyanoethyl phosphoramidite chemistry (Supplementary Table 1). The P1 primer was synthesized on a Universal Support II CPG column (1 μmol scale, Glen Research) using chemically synthesized tA phosphoramidite 29 . The P2 primer was synthesized on the chemically synthesized 2′-deoxy-threofuranosyl adenosine-modified CPG (Fig. 2). TNA oligonucleotides were obtained by solid-phase synthesis on a 1 μmol scale using standard DNA coupling conditions 10 . Cleavage from the solid support and deprotection were achieved in NH₄OH (33%) for 18 h at 55°C. Oligonucleotides were purified by preparative denaturing polyacrylamide gel electrophoresis, isolated, ethanol precipitated, and desalted on Sephadex G-25 gel filtration resin. Pure salt-free oligonucleotides were validated by MALDI-TOF mass spectrometry, quantified by UV absorbance, and stored in H₂O at −20°C.
Kod-RI expression and purification. The kod-RI gene, which additionally harbors two mutations (D141A and E143A) that inactivate exonuclease activity, was PCR amplified from a previously constructed vector, pGDR11-Kod-RI 16 , using Kod-RI_for and Kod-RI_rev primers (IDT) containing NdeI and NotI restriction sites, respectively (Supplementary Table 1). The purified PCR product and pET21 (Novagen) were digested with NdeI and NotI (NEB) and ligated, and the resulting pET21-kod-RI construct was sequence verified (Retrogen). Acella® cells (Edge BioSystems) harboring pET21-kod-RI were grown aerobically at 37°C in LB medium containing 100 μg mL−1 ampicillin. At an OD600 of 0.8, expression of untagged Kod-RI was induced with 0.8 mM isopropyl β-D-1-thiogalactopyranoside at 18°C for 20 h. Cells were harvested by centrifugation for 20 min at 3315 × g and 4°C and lysed by sonication in 40 mL lysis buffer (10 mM Tris-HCl pH 7.5, 100 mM NaCl, 0.1 mM EDTA, 1 mM DTT, 10% glycerol, 5 mg hen egg lysozyme). The lysate was centrifuged at 23,708 × g for 30 min, and the clarified supernatant was heat treated for 20 min at 70°C and centrifuged again at 23,708 × g for 30 min. The supernatant was loaded onto 5 mL HiTrap Q HP and HiTrap Heparin HP columns (GE) connected in series, with the Q column upstream of the heparin column. After washing with lysis buffer, the Q column was removed and Kod-RI was eluted from the heparin column with a linear gradient of high-salt buffer (10 mM Tris-HCl pH 7.5, 1 M NaCl, 0.1 mM EDTA, 1 mM DTT, 10% glycerol). Fractions containing Kod-RI were identified by SDS-PAGE, pooled, and concentrated using a 30 kDa cutoff Amicon centrifugal filter (Millipore).

TNA synthesis. Primer-extension reactions were performed in a final volume of 10 µL using the IR800-labeled PBS8 primer (5′-GTCCCCTTGGGGATACCACC-3′) and the L11 library (5′-GGATCGTCAGTGCATTGAGA-N40-GGTGGTATCCCCAAGGGGAC-3′, where N40 is the random region). Each reaction contained 10 pmol primer/template complex, 1× ThermoPol buffer [20 mM Tris-HCl, 10 mM (NH₄)₂SO₄, 10 mM KCl, 2 mM MgSO₄, 0.1% Triton X-100, pH 8.8], 1 µM Kod-RI, 100 µM of each tNTP, and 1 mM MnCl₂. Reactions were incubated for 30 min at 55°C, quenched with stop buffer (8 M urea, 45 mM EDTA), and analyzed by 20% denaturing urea PAGE. The uncropped gel is provided as Supplementary Fig. 9.
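Because the PBS8 primer must anneal to the 3′ constant region of the L11 library, the sequences in the protocol can be checked for complementarity and used to predict the full-length product. A minimal sketch that ignores the IR800 label and writes the random region as N; the helper names are illustrative.

```python
# Sequence sanity check for the primer-extension setup; sequences are copied from the protocol.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N is left as N)."""
    return seq.translate(COMPLEMENT)[::-1]

PBS8 = "GTCCCCTTGGGGATACCACC"                      # IR800 label omitted
L11 = "GGATCGTCAGTGCATTGAGA" + "N" * 40 + "GGTGGTATCCCCAAGGGGAC"

binding_site = reverse_complement(PBS8)
primer_anneals_at_3prime_end = L11.endswith(binding_site)

# Full-length extension copies every template position 5' of the primer-binding site
tna_nt_added = len(L11) - len(PBS8)
full_length_product = len(L11)

print(primer_anneals_at_3prime_end, tna_nt_added, full_length_product)
```

The 20-nt primer pairs exactly with the template's 3′ constant region, so a full-length product corresponds to the addition of 60 TNA nucleotides (80 nt in total), which is the band expected on the denaturing gel.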
Michaelis-Menten kinetic analysis. Kinetic measurements were performed in 96-well format. Each 10 μL reaction contained 1 μM of the self-priming hairpin template (5′-TCTCTATAGTGAGTCGTATAGGTGGTATCCGAAAGGATACCACC-3′), 1× ThermoPol buffer, 4× EvaGreen fluorescent dye, 1 mM MnCl₂, 1 μM Kod-RI, and tNTPs titrated from 2.5 to 100 μM. Reactions were initiated by denaturing for 2 min at 95°C and extended for 2 h at 55°C, with fluorescence intensity measured at 6 s intervals. The first 18 s of data were excluded to eliminate possible artifacts caused by temperature equilibration. The data were fit by nonlinear regression in R using the equation $Y = V_{\min} + \frac{V_{\max} X}{X + K_M}$, where $X$ is the tNTP concentration, $Y$ is the observed velocity, $V_{\min}$ is the minimum velocity, and $V_{\max}$ is the maximum velocity. The reported values are the average of three independent replicates, with error bars indicating the standard deviation.
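The fit described above can be sketched without specialized software. The authors used nonlinear regression in R; the stand-in below instead profiles K_M over a grid and, for each candidate, solves for V_min and V_max by exact linear least squares, which is valid because the model is linear in those two parameters once K_M is fixed. The concentrations between the stated 2.5 and 100 μM endpoints and the velocities are synthetic values chosen only to exercise the code.

```python
# Grid-profiled least-squares fit of Y = V_min + V_max * X / (X + K_M).

def fit_mm_with_offset(x, y, km_grid):
    """Return (v_min, v_max, k_m) minimizing the residual sum of squares over km_grid."""
    n = len(x)
    y_mean = sum(y) / n
    best = None
    for km in km_grid:
        f = [xi / (xi + km) for xi in x]          # saturation term for this K_M
        f_mean = sum(f) / n
        s_ff = sum((fi - f_mean) ** 2 for fi in f)
        s_fy = sum((fi - f_mean) * (yi - y_mean) for fi, yi in zip(f, y))
        v_max = s_fy / s_ff                       # ordinary least-squares slope
        v_min = y_mean - v_max * f_mean           # intercept
        sse = sum((yi - v_min - v_max * fi) ** 2 for fi, yi in zip(f, y))
        if best is None or sse < best[0]:
            best = (sse, v_min, v_max, km)
    return best[1:]

# Synthetic, noise-free demo (hypothetical values: V_min 0.05, V_max 1.2, K_M 15 uM)
conc_uM = [2.5, 5, 10, 20, 40, 60, 80, 100]
velocity = [0.05 + 1.2 * c / (c + 15.0) for c in conc_uM]
grid = [0.5 * i for i in range(1, 201)]           # K_M candidates 0.5-100 uM
v_min, v_max, k_m = fit_mm_with_offset(conc_uM, velocity, grid)
print(f"V_min={v_min:.3f}  V_max={v_max:.3f}  K_M={k_m:.1f} uM")
```

With real, noisy replicates the grid would be refined around the best value (or handed to a gradient-based optimizer), and fitting each replicate separately gives the mean and standard deviation reported in the text.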
K_D analysis. A twofold dilution series of Kod-RI starting from 2 μM was prepared in 1× ThermoPol buffer in a final volume of 5 μL. Each polymerase solution was combined with 5 μL of annealed primer-template complex [TNA strand: 5′-gacactcgtatgcagtagcc-3′; 5′-labeled DNA strand: 5′-Cy5-ACATTGGCATCAAGTCATAAGGCTACTGCATACGAGTGTC-3′] at a final concentration of 2 nM in 1× ThermoPol buffer. The mixtures were incubated for 60 min at 23°C, loaded into separate capillaries, and measured on a Monolith NT.115 Pico (NanoTemper) using 20% laser and MST power. The dissociation constant (K_D) was determined with the standard fit in NanoTemper's analysis software. Control experiments with wild-type Kod and engineered Kod-RI binding to DNA/DNA and DNA/TNA duplexes are provided in Supplementary Fig. 10.
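Because the labeled P/T duplex (2 nM) is present at a concentration above the measured K_D (0.6 nM), free and total polymerase concentrations differ appreciably at the low end of the titration, and a simple hyperbola would bias the estimate. A 1:1 binding model that accounts for ligand depletion (the quadratic solution of the mass-action law) handles this regime. The sketch below assumes that model, treats the stated concentrations as final concentrations, and uses a hypothetical 16-point series; it is not a reproduction of the vendor software.

```python
import math

def fraction_bound(protein_nM: float, ligand_nM: float, kd_nM: float) -> float:
    """Fraction of labeled ligand in complex for 1:1 binding with depletion (quadratic model)."""
    s = protein_nM + ligand_nM + kd_nM
    complex_nM = (s - math.sqrt(s * s - 4.0 * protein_nM * ligand_nM)) / 2.0
    return complex_nM / ligand_nM

def twofold_series(top_nM: float, n_points: int):
    """Twofold dilution series starting at top_nM."""
    return [top_nM / 2 ** i for i in range(n_points)]

KD_NM = 0.6            # measured K_D for the P/T duplex
PT_NM = 2.0            # labeled P/T concentration
for conc in twofold_series(2000.0, 16):
    print(f"{conc:10.4f} nM Kod-RI -> {fraction_bound(conc, PT_NM, KD_NM):.3f} bound")
```

Fitting measured MST signals would add two parameters for the unbound and bound baselines, with fraction_bound supplying the shape of the curve.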
Crystallization procedures. All reagents purchased from commercial suppliers were of analytical grade. Stock solutions of sodium sulfate decahydrate (Fluka), polyethylene glycol 3000 and 3350 (Sigma Aldrich), and 2-(N-morpholino)ethanesulfonic acid (Calbiochem) were filtered before use. The additives 1,6-hexanediol and Silver Bullets Bio #56 (Hampton Research) were used without further manipulation. All crystallization samples were prepared according to the desired mechanistic step (see below), and 0.2 μL of the appropriate additive (10% v/v of the 2 μL crystallization drop) was added before centrifugation. Crystals were optimized over a range of pH and PEG concentrations in 24-well hanging-drop trays, with each drop containing 1 μL of sample mixed with 1 μL of mother liquor, equilibrated against 500 μL of mother liquor in the reservoir. Trays were stored in the dark at room temperature, and crystals typically grew within 1–4 weeks.
Duplex preparation. The DNA template (Supplementary Table 1) was purchased from IDT as an HPLC-purified sample bearing a 5′ Cy5 label. The template strand (T) was used without further purification as a substrate for crystallization trials 31,51 . Duplexes P1/T and P2/T were prepared by combining equimolar amounts of the primer and template strands in Kod-RI buffer (see protein expression) supplemented with 20 mM MgCl2, then annealed by heating at 95°C for 5 min and slowly cooling to 10°C over 10 min.
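The mixing and annealing arithmetic can be written out as a small helper. The target duplex concentration, final volume, strand stock concentrations, and the 1 M MgCl2 stock are hypothetical inputs; the 20 mM MgCl2, the 95°C/5 min hold, and the 10 min ramp to 10°C come from the text. The remaining volume is filled with Kod-RI buffer, treated here as a ready-to-use stock (so the other additions dilute it slightly).

```python
def mix_equimolar(duplex_um, final_ul, primer_stock_um, template_stock_um,
                  mgcl2_stock_mm=1000.0, mgcl2_final_mm=20.0):
    """Volumes (μL) for an equimolar primer/template mix supplemented with 20 mM MgCl2."""
    primer = duplex_um * final_ul / primer_stock_um
    template = duplex_um * final_ul / template_stock_um
    mgcl2 = mgcl2_final_mm * final_ul / mgcl2_stock_mm
    buffer_ul = final_ul - primer - template - mgcl2
    if buffer_ul < 0:
        raise ValueError("stock solutions are too dilute for this final volume")
    return {"primer": primer, "template": template, "MgCl2": mgcl2, "Kod-RI buffer": buffer_ul}


def annealing_program(hot_c=95.0, hold_min=5.0, cold_c=10.0, ramp_min=10.0):
    """Thermocycler-style steps plus the implied average cooling rate (°C/min)."""
    rate = (hot_c - cold_c) / ramp_min
    steps = [("hold", hot_c, hold_min), ("ramp_to", cold_c, ramp_min)]
    return steps, rate


if __name__ == "__main__":
    print(mix_equimolar(duplex_um=50.0, final_ul=100.0,
                        primer_stock_um=500.0, template_stock_um=250.0))
    print(annealing_program())
```

The 95°C to 10°C cooling over 10 min corresponds to an average ramp of 8.5°C per min, which is within the range most thermocyclers can program directly.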
Crystallization and structure determination. All crystals were grown in hanging drops and, immediately before harvesting, transferred into mother liquor containing 20% glycerol (or the corresponding polyethylene glycol concentration of the respective crystallization condition). The specific crystallization conditions for each polymerase state are described below. Five diffraction data sets were collected from single crystals at synchrotron sources (Advanced Light Source and Stanford Synchrotron Radiation Lightsource). Unless otherwise specified, images were indexed, integrated, and merged using iMOSFLM 52 . Data collection statistics are summarized in Table 1. Initial models were obtained by molecular replacement (MR) using Phaser 53 , and all final models were produced through iterative rounds of manual building in Coot 54 and refinement in PHENIX 55 . TLS parameters were used in the final stages of refinement; unless otherwise specified, Kod-RI was partitioned into four TLS groups (residues 1–156, 157–304, 305–532, and 533–756), and the template and primer strands each formed an additional TLS group. The stereochemistry and geometry of all structures were validated with MolProbity 56 , and the final refinement statistics are summarized in Table 1. Final coordinates and structure factors have been deposited in the Protein Data Bank. All molecular graphics were prepared with PyMOL 57 .
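The default TLS partition can be expressed as atom-selection strings for refinement. This sketch assumes chain IDs A (Kod-RI), T (template), and P (primer), which are hypothetical, and writes one `refinement.refine.adp.tls` assignment per group in the dotted parameter form accepted by phenix.refine; the parameter path and selection syntax should be checked against the PHENIX documentation for the version in use. Residues outside 1–756 (for example residue 757 in the translocated product) fall outside the default groups.

```python
# Default Kod-RI TLS partition from the text (inclusive residue ranges)
TLS_RANGES = [(1, 156), (157, 304), (305, 532), (533, 756)]


def tls_group(resid, ranges=TLS_RANGES):
    """1-based TLS group index for a protein residue, or None if unassigned."""
    for i, (lo, hi) in enumerate(ranges, start=1):
        if lo <= resid <= hi:
            return i
    return None


def phenix_tls_selections(protein_chain="A", template_chain="T", primer_chain="P",
                          ranges=TLS_RANGES):
    """Four protein groups plus one group each for the template and primer strands.
    Chain IDs are hypothetical placeholders."""
    sels = [f"chain {protein_chain} and resid {lo}:{hi}" for lo, hi in ranges]
    sels += [f"chain {template_chain}", f"chain {primer_chain}"]
    return sels


def to_eff(selections):
    """One dotted-path assignment per TLS group, suitable for a parameter file."""
    return "\n".join(f'refinement.refine.adp.tls = "{s}"' for s in selections)


if __name__ == "__main__":
    print(to_eff(phenix_tls_selections()))
```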
Binary complex: The binary complex was prepared by incubating Kod-RI (5 mg mL−1) with 1.5 molar equivalents of the P1/T duplex at 37°C for 30 min. The binary complex co-crystallized in 0.2 M sodium sulfate decahydrate, 0.1 M 2-(N-morpholino)ethanesulfonic acid pH 6.0, and 16% polyethylene glycol 3350, supplemented with Silver Bullets Bio (Hampton Research) additive #56 (0.2% w/v D-sorbitol, 0.2% w/v glycerol, 0.2% w/v glycine, 0.2% w/v myo-inositol, 0.2% w/v sarcosine, 0.02 M HEPES sodium pH 6.8). MR was performed using the Kod exo− binary complex structure (PDB ID: 4K8Z) as the search model; its primer/template duplex is identical in sequence to P1/T except for the single TNA A residue (tA12, Supplementary Table 1) at the 3′ end of the primer strand 31 . After MR, the corresponding DNA residue A12 was deleted and replaced with a tA model, whose phosphate group was linked to the O3′ of the preceding residue G11 before iterative refinement and model building began. PDB ID: 5VU6.
Ternary complex (open): An initial binary complex was prepared by incubating Kod-RI (5 mg mL−1) with 1.5 molar equivalents of the P2/T duplex at 37°C for 30 min. A 5-fold molar excess of tATP monomer was then added to the binary complex, and the solution was incubated at 37°C for a further 30 min. The ternary complex co-crystallized in 0.2 M sodium sulfate decahydrate, 0.1 M 2-(N-morpholino)ethanesulfonic acid pH 5.5, 3% w/v 1,6-hexanediol, and 22% polyethylene glycol 3350. Ternary complex crystals were transferred for 30 min to mother liquor containing 20% glycerol and additionally supplemented with 2 mM tATP. MR was performed using an early binary complex structure (with the P1/T duplex) as the search model. tATP was initially modeled as the incoming substrate; however, owing to poor electron density, it was replaced by tA. In addition to tA, the final open ternary complex structure contains one sulfate ion and one magnesium ion. PDB ID: 5VU7.
Translocated product: The binary complex with the P1/T duplex was prepared using a similar protocol, after which a 5-fold molar excess of tATP was added and the mixture was incubated at 37°C for 30 min before crystallization. Co-crystals of the translocated complex were grown in 0.2 M sodium sulfate decahydrate, 0.1 M 2-(N-morpholino)ethanesulfonic acid pH 5.8, and 12% polyethylene glycol 3350, supplemented with Silver Bullets Bio (Hampton Research) additive #56. MR was performed using an early binary complex structure (with the P1/T duplex) as the search model. During initial model building, the P1/T duplex was translocated, and a TNA A residue (tA13) was added with its phosphate group linked to the O2′ atom of tA12. The final translocated product structure spans residues 1–757 and contains 363 water molecules. PDB ID: 5VU9.
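The complexes above share one stoichiometric recipe (Kod-RI at 5 mg mL−1, 1.5 molar equivalents of duplex, and for the ternary and translocated complexes a 5-fold molar excess of tATP). A minimal sketch of the conversion to molar concentrations follows. The polymerase molecular weight is an assumed, approximate value (roughly 90 kDa for a Kod-family polymerase) and should be replaced by the mass of the actual construct; the sketch also assumes the tATP excess is relative to the polymerase, which the text does not state explicitly.

```python
def mg_per_ml_to_um(conc_mg_ml, mw_da):
    """mg/mL equals g/L; divide by g/mol to get mol/L, then scale to μM."""
    return conc_mg_ml / mw_da * 1e6


def complex_recipe(pol_mg_ml=5.0, pol_mw_da=90_000.0, duplex_eq=1.5, tatp_excess=5.0):
    """Target concentrations (μM) for a Kod-RI/duplex/tATP complex.
    pol_mw_da is an ASSUMED approximate mass; tATP excess ASSUMED relative to polymerase."""
    pol_um = mg_per_ml_to_um(pol_mg_ml, pol_mw_da)
    return {
        "polymerase_uM": pol_um,
        "duplex_uM": duplex_eq * pol_um,
        "tATP_uM": tatp_excess * pol_um,
    }


if __name__ == "__main__":
    for name, value in complex_recipe().items():
        print(f"{name}: {value:.1f}")
```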
Data availability. Coordinates and structure factors for all five Kod-RI crystal structures have been deposited in the PDB with the accession codes: 5VU5, 5VU6, 5VU7, 5VU8, and 5VU9. Other data are available from the corresponding author upon reasonable request.