Stringent control of the RNA-dependent RNA polymerase translocation revealed by multiple intermediate structures

Each polymerase nucleotide addition cycle is associated with two primary conformational changes of the catalytic complex: the pre-chemistry active site closure and post-chemistry translocation. While active site closure is well interpreted by numerous crystallographic snapshots, translocation intermediates are rarely captured. Here we report three types of intermediate structures in an RNA-dependent RNA polymerase (RdRP). The first two types, captured in forward and reverse translocation events, both highlight the role of RdRP-unique motif G in restricting the RNA template movement, corresponding to the rate-limiting step in translocation. By mutating two critical residues in motif G, we obtain the third type of intermediates that may mimic the transition state of this rate-limiting step, demonstrating a previously unidentified movement of the template strand. We propose that a similar strategy may be utilized by other classes of nucleic acid polymerases to ensure templating nucleotide positioning for efficient catalysis through restricting interactions with template RNA.

P rocessive nucleic acid polymerases are molecular machines that carried out the template-directed biosynthesis of long DNA and RNA chains using nucleoside triphosphates (NTPs) as substrates, playing essential roles in nearly all life forms. These polymerases are capable of forming an elongation complex (EC) that processively carries out thousands of nucleotide addition cycles (NACs), and the mechanism by which the EC utilizes to execute each NAC is essential to the understandings of these enzymes. An NAC contains at least four micro-steps: the template-directed binding of NTP at or near the polymerase active site that is catalytically "open", the active site closure defined by conformational changes of key polymerase motifs for optimal catalysis, the formation of the phosphodiester bond (i.e., chemistry), and the translocation event that moves the polymerase one register in the downstream direction of the nucleic acid template [1][2][3] .
For more than two decades, insights into the polymerase catalysis have been benefitted from numerous three-dimensional structures defining distinct states of the polymerase in the NAC. To date, the visually most significant conformational changes between different NAC states are mainly associated with the active site closure step. Relatively large-scale conformational changes were observed in A-family (also known as Pol I family) polymerases and multi-subunit DNA-dependent RNA polymerases [4][5][6][7] . For instance, the fingers domain of bacteriophage T7 RNA polymerase undergoes a dramatic rotational movement to allow the O-helix to reposition the NTP from a "pre-insertion site" to the catalytic competent insertion site 4,5 , and the trigger loop of yeast RNA polymerase II (pol II) becomes more ordered and moves to the proximity of the nucleotide addition site ("A" site, and equivalent to "insertion site") upon active site closure to allow direct and indirect interactions with all moieties of NTP 7 . A different mode of polymerase active site closure is represented by viral RNA-dependent RNA polymerases (RdRPs). Viral RdRP comprises palm, fingers, and thumb domains as with other singlesubunit polymerases, but forms a unique encircled architecture through fingers-thumb interactions that is absent in A-family polymerases 8 . The structural rearrangements leading to the active site closure in viral RdRPs are mostly limited to their palm domain with the hallmark being a small-scale inward movement of the catalytic motifs A/D backbone to facilitate catalysis 9 .
Translocation is generally regarded as the most important post-chemistry NAC event. It changes the whole footprint of polymerase on the nucleic acids by one register and moves the newly incorporated nucleotide out of the active site, thus being a possible and interesting target for polymerase intervention independent of the pre-chemistry NTP substrate selection. In Afamily polymerases, translocation is facilitated by the O-helix movement that is essentially the reversion of the motion required for active site closure. This conformational change allows the side chain of an aromatic residue (e.g., Y631 in T7 RNA polymerase) to "push" the nascent base pair from the +1 site to the −1 site 1,5,6 . An analogous mechanism was proposed in multi-subunit RNA polymerases that the bending of bridge helix may account for the "push" based on a straight bridge helix observed in pre-translocation yeast pol II ECs and a bent bridge helix in apo bacterial RNA polymerases 10,11 . However, direct evidence of a bent bridge helix "pushes" against the nascent base pair in an intermediate EC is necessary to prove this hypothesis. It is conceivable that polymerases utilize key parts in the vicinity of the nascent base pair to facilitate translocation after catalysis. However, the mode of the nucleic acids movement, in particular, that of the template-product duplex relative to the polymerase active site during translocation remains largely elusive. In numerous polymerase EC structures, the duplex upstream of the active site is either in pre-or in post-translocation states.
Whether stable translocation intermediates between the pre-and post-translocation states exist, or could such intermediates be captured by structural biology approaches to interpret translocation mechanism, remains to be clarified.
We previously developed a methodology in efficiently crystallizing picornavirus RdRP EC by taking advantage of RNAmediated crystal contacts 12 . This approach allowed us to efficiently obtain versatile crystal forms of RdRP ECs, and the variations in lattice-induced restraints on the function of polymerase provided potentials to stabilize or trap low-energy intermediates that may be short-lived in solution. Building on our recent work reporting an enterovirus 71 (EV71) RdRP EC with an intermediate conformation of its template-product duplex 13 , here we report three types of translocation intermediate crystal structures that define multiple nucleic acid conformations bridging the pre-and post-translocation states, both in the regulate forward translocation and in an unusual reverse translocation induced by pyrophosphate (PP i ) and divalent metal ions. These structural data, in combination with enzymology characterization of the RdRP, highlight the key role of RdRP motif G in controlling the critical movement of the template RNA strand that is probably the rate-limiting step for both forward and reverse translocations. Our work provides important mechanistic details towards the understanding of how translocation occurs in viral RdRPs and will serve as an important reference for understanding the nucleic acid polymerase NAC in general.

A translocation intermediate with a mixture of two states.
Building on the previously established methods in assembling enterovirus RdRP EC and in capturing various EC states 12,13 , we used an EV71 RdRP bearing a C291M mutation in such approaches, since the equivalent PV RdRP mutant (C290M) was shown to produce higher quality EC crystals (Fig. 1a) 12 . The sixstate nomenclature of the RdRP catalytic cycle suggested in our previous work was used in this study to aid the illustration of translocation intermediates and the mechanisms suggested by them ( Fig. 1b) 3,9,14 . Besides capturing a state 6 (S 6 ) translocation intermediate after incorporation of a C-C-U tri-nucleotide with almost identical conformation to the previously reported intermediate (pdb entry 5F8N, Supplementary Fig. 1), we observed unusual electron densities for some of the EC structures derived from similar crystal soaking strategies (Fig. 2 and Table 1, S 6A /S 6B (C 3 ) structure, 2.3 Å resolution, C 3 denotes the third NAC). While the template strand density of this type of structure appeared normal, the density of the product strand clearly indicated the presence of two sets of backbone phosphates. When a pre-translocation product RNA was modeled, strong positive density appeared between the modeled phosphates and continuous density was observed for an array of neighboring product bases (Fig. 2a). By modeling two alternate conformations of the product strand, this unusual density can be readily interpreted ( Fig. 2b-d). Accordingly to the six-state nomenclature of the RdRP NAC suggested in our previous work with S 6 representing translocation intermediates, we assign these two EC states as S 6A and S 6B . While both states share the same template model that is nearly identical to that in the previously reported S 6 structure, the two product models differ in the extent of translocation with S 6B leading S 6A with an occupancy of 60% and 40%, respectively (Fig. 2d). In order to better visualize the intermediate states, here we use previously reported representative state 1 (S 1 , posttranslocation with empty active site) and state 5 (S 5 , pretranslocation with active site reset to open conformation) structures as the primary references (S 1 : 5F8G/5F8L; S 5 : 3OL9, chain A) 9,13 , and several conserved residues, including I176 in motif F, D238 in motif A, and D329 in motif C, were used to aid structural comparisons (Fig. 2b). Comparing to the S 5 product RNA, the S 6A product slightly moved toward the upstream as an entity, while the S 6B product had the upstream nucleotides move by longer distances than the downstream nucleotides and is nearly identical to the previously reported S 6 product ( Fig. 2c-d; Supplementary Movie 1: the first clip shows the modeled transition from S 5 to S 6 ). We have proposed that these intermediates can form fast equilibrium with the pre-translocation state, and the observation of the S 6A and S 6B co-existence suggests that these intermediates are indeed stable enough to form distributed states and to be captured by crystallography.
A PPi-induced reverse translocation intermediate structure. The observation of relative high level of freedom during translocation for the product strand intrigued us to test whether such a phenomenon is also applied to translocation in the reverse direction, a process starting from the post-translocation state to the pre-translocation state in the previous cycle. Although not previously observed in nucleic acid polymerases, such in-crystal reverse translocation may possibly occur as this EC lattice was shown to accommodate multiple rounds of catalysis and translocation 13 . By soaking native EC crystals in CTP-containing solutions, we were able to allow the polymerase to reach the state 1 of the third cycle (C 3 S 1 : after two incorporation events and two translocation events) ( Supplementary Fig. 2, top left). Subsequently, these S 1 EC crystals were soaked in solution containing PP i and MnCl 2 . In a time scanning experimental setting, applying long soaking time resulted in completion of the reverse translocation and the EC had "walked" back to a pre-translocation state of the previous cycle (C 2 S 4/5 , Supplementary Fig. 2, bottom right), while shorter soaking time led to an intermediate state again with strong indication of a mixed distribution of product strand phosphates and continuous electron density for neighboring bases ( Supplementary Fig. 2, bottom left; Fig. 3a and Table 1, S 6RA /S 6RB (C 2 ) structure, where "R" denotes reverse, 2.2 Å resolution). In contrast, the entire backbone of the template strand stayed largely at the post-translocation state and the density of bases were localized. Alternate conformation modeling and refinement indicated that two populations of the product strand existed with different reverse translocation extents (Fig. 3b-d).
The first population represents an intermediate state with small-scale reverse translocation of a few upstream product nucleotides (positions −5 to −7) (state 6 RA or S 6RA , 40% occupancy), while the second population had the entire product backbone moved toward the downstream with an average distance about one-half register and having the downstream nucleotides moved farther than the upstream ones ( Fig. 3c-d, S 6RB , 60% occupancy; Supplementary Movie 2: the clip shows the modeled transition from S 1 to S 6RB ). Unlike what was observed in the forward translocation intermediates, the majority of the original base pairs in the S 6RB were no longer maintained. Slippage occurred between the two RNA strands, resulting in a mixture of mismatches, G:U wobble base pairs, and a G:C base pair ( Fig. 3d and Supplementary Movie 2). When MgCl 2 was used in reverse translocation soaking trials, an intermediate structure can also be captured with consistent features of the MnCl 2 -derived S 6RA /S 6RB structure ( Supplementary Fig. 2, top right structure). It seems that the observed heterogeneity of the product strand conformation in both forward and reverse translocation is relatively independent, as not only the template strand, but also the product-interacting RdRP regions have defined electron density to indicate conformation homogeneity (Figs. 2b, 3b, and Supplementary Fig. 3).
While both the translocation directions (in particular the movement of the product strand) and the interaction details between the two RNA strands are different for forward and reverse translocations, as suggested by the captured intermediates (comparing Figs. 2d and 3d). A striking common feature for both modes of translocation is that the template backbone, in particular for positions −2 to +1, stayed fixed relative to its starting point of translocation. As we pointed out previously 12,13 , this phenomenon is probably related to two motif G residues (T114-S115 in EV71 RdRP, "G" symbol in Figs. 2d and 3d) that create a hurdle-like steric blockage right at the +1/+2 junction of the template backbone. Aside from this, the −1 and −2 template phosphates interact with two residues with basic side chains ("+" symbols in Figs. 2d and 3d), K127 and R188, help maintain an unusual backbone conformation of the −2 template and may play an auxiliary role to control the template backbone movement during translocation 12 , as both K127 and R188 are not conserved in viral RdRPs. In order to complete either forward or reverse translocation, the template +1 position phosphate needs to "leap" over the hurdle, and overcoming such an energy barrier is probably the rate-limiting step during translocation.  Fig. 1 A schematic diagram for EV71 RdRP EC assembly, crystallization, and crystal soaking and a six-state model for the viral RdRP NAC. a The EC was assembled using the T35/P10 construct upon addition of a G-A-G-A-G-A hexa-nucleotide. The 16-mer (P16)-containing EC was crystallized and subjected to different soaking trials to capture various NAC states. b The previously proposed six-state RdRP NAC starts with the state 1 (S 1 ) complex with an empty and open conformation active site, and the NTP binding leads to the state 2 (S 2 ) complex that remains in the open conformation. In most cases, the cognate NTP induces a conformational change to close the active site (corresponding to the active site closure) to reach state 3 (S 3 ), and then completion of the phosphoryl transfer indicates the formation of the state 4 (S 4 ) complex that remains in the closed conformation. The active site often resets to open conformation before translocation and the corresponding complex was assigned as the state 5 (S 5 ) complex. The state 6 (S 6 ) complex represents any intermediates that poised between the pre-and post-translocation states. The representative PDB entries are provided next to the state symbols and reference entries used to help present the intermediate structures in this study are shown in blue.
The hurdle residues in RdRP motif G are tunable. Possibly due to low sequence conservation, the function of motif G has not been well clarified since its discovery as an RdRP-unique motif ( Fig. 4a) 15 . The EV71 RdRP G117-equivalent residue exhibits relatively high-level conservation in the positive-strand and double-stranded (ds) RNA viruses. This residue does not interact with the RNA and the preference of a glycine could be related to its involvement in the folding of a sharp turn (corresponding to residues 117-120 in EV71 RdRP), thus only playing a structural role. Among the motif G residues, the aforementioned hurdle residues are of particular interest due to their critical spatial position and their roles in restricting the template RNA movement as suggested by both forward and reverse translocation intermediate structures. With an aim to dissect the role of the motif G hurdle residues in translocation and in RdRP NAC in general, we compared representative RdRP structures to help locate structurally equivalent motif G residues and found that the majority of residues at these two positions are serine (S), alanine (A), threonine (T), and glycine (G) (Fig. 4a). This observation suggests that RdRPs tend to use small-side-chain residues at these two positions. The low-level sequence conservation and high-level conservation in side chain size intrigued us to screen for motif G mutants that could possibly change the nature in restricting the movement of the template strand while keeping the ability in catalysis. By applying all possible combination of S/A/T/G residues at RdRP positions 114 and 115, we made 15 mutants for a comparison with the WT (T114-S115). We also made four mutants with identical residues at both positions for comparison, and medium-side-chain residues asparagine (N), leucine (L) and large-side-chain residues phenylalanine (F), tyrosine (Y) were chosen. Except for the double L and double F mutants (Supplementary Table 1), these constructs were able to form ECs after Capital letter G indicates motif G hurdle residues. The "+" symbols indicate the basic residues K127/R188 that can interact with the -1/-2 template phosphates, respectively. The red arrows indicate the distance and direction of backbone movement of the product nucleotides. Forty and 60% denote the crystallographic occupancy of S 6A and S 6B , respectively.
incorporating a G-A-G-A tetra-nucleotide directed by a "CUCU" template sequence (Fig. 4b, top), allowing us to determine the catalytic efficiency (denoted by k cat /K M,NTP ) in a stopped-flow fluorescence-based single nucleotide incorporation assay modified from previously reported methods (Fig. 4b-d) 16 . For the WT construct and five mutants (S115G, S115A, T114S, T114A, and T114A-S115A), the fluorescence signal change upon CTP addition fits well to a single-exponential decay model, while the signal change for other mutants deviated from the WT-model to different extents, possibly indicating conformational alteration brought by some of the mutants. We therefore classified all 18 constructs in five classes (I-V) based on the observed kinetics behaviors ( Fig. 4c and Supplementary Fig. 4). Nevertheless, the same fitting strategy was applied for all constructs to determine the k cat and K M,NTP values (Table S1). It turned out that more than half of the mutants had at least 10% catalytic efficiency of the WT-level (11-105%), to some extent supporting the postulation that residues at these two positions may be variable to some extent while maintaining the catalytic capability.
An "ST" mutant leads to a transition-state-like structure. For all five classes of mutants, the first three exhibited relative efficient EC assembly. We next tested one mutant from each class for EC crystallization to see whether the mutation could change the behavior in template strand restriction during translocation. Among these mutants, a class II mutant T114S-S115T (abbreviated as ST) yielded EC crystals that are suitable for structure determination. This mutant has a k cat value comparable to the WT but its K M,NTP is about 50-fold of the WT level ( Fig. 4c-d).
The much lower NTP affinity could indicate the alteration of the catalytic pocket at the NTP binding stage. Interestingly, the ST mutant-derived EC structure indeed exhibited a previously unidentified conformation with the −2 to +2 template nucleotides clearly occupying intermediate positions, while the product strand almost reached the post-translocated position (Table 1 and we did not observe NTP-like electron densities in it, and therefore the incoming NTP-assisted translocation is not evident. Previously determined native enterovirus EC crystal structures all adopted the post-translocation conformation, and therefore it is probable that the hurdle residue mutation relatively destabilize the posttranslocation state, allowing the capture of the observed intermediate state. Comparing to the pre-translocation state, the template +1 phosphate moved by 2.2 Å toward the upstream and residue 115 backbone amide nitrogen interacting with the phosphate oxygen also moved 1.7 Å accordingly (Fig. 5d). The electrostatic interactions between the K127/R188 side chains and the −1/−2 template phosphates were lost, and the unusual backbone conformation switched to regular conformation ( Fig. 5d and Supplementary Fig. 5, compare S 5 and S 6M ). Therefore, the S 6M represents a late stage in the translocation process and probably mimics the actual transition state of the rate-limiting step in forward translocation for an RdRP with a WT-sequence in motif G (Supplementary Movie 1: the second clip shows the modeled transition from S 6 to S 6M and the third clip shows the modeled transition from S 6M to S 1 of the next cycle).  Supplementary  Fig. 6), indicating that the energy landscape for the translocation process may have been indeed changed by the ST-mutation.

Discussion
The biosynthesis of long chain nucleic acid is essential for most forms of life and comprises thousands of NACs carried out by different types of processive nucleic acid polymerases. Catalysis and translocation need to be well coupled within each NAC and between neigboring NACs to fulfil rapid synthesis in a processive manner (tens to hundreds of nucleotides per second). However, these two processes have distinct characters and requirements. For catalysis, the polymerase needs to precisely position the +1 template nucleotide, ensuring the NTP recognition and subsequent conformational changes to close the active site for the phosphoryl transfer reaction. The conformational changes for all types of polymerases, in an induced-fit manner, unexceptionally occurred around the NTP substrate [4][5][6][7]9,17 . Therefore, the polymerase is relatively "fixed" on the nucleic acid in the catalysis part of the NAC. Once chemistry occurs, the goal of translocation is to "move" the entire polymerase one-register downstream and therefore to set up the stage for the next catalysis event in the next NAC. Hence, the RNA-RdRP interaction changes brought by the global movement of RNA relative to the RdRP during translocation are not expected to be localized around the active site, and the relative movement between the polymerase and nucleic acids and the components that control or contribute to these movements are essential to the understanding of translocation. However, as detailed below, only small-scale conformational changes around the active site of the RdRP occur during translocation as the RNA "threads" through the polymerase. Although the detailed translocation mechanism suggested by the intermediate structures reported here may only applicable for viral RdRPs, other classes of polymerases may utilize similar strategies to control translocation in accordance with catalysis. The observation of the three types of translocation intermediates allows a detailed analysis of RdRP translocation mechanism in the context of previously proposed models. The Brownian ratchet model emphasizes the fast equilibrium between the pre-and post-translocation states and the PP i or incoming NTP as the "ratchet" for stabilizing the pre-and posttranslocation states, respectively 18,19 . In forward and reverse translocations, fast equilibrium may be established between the starting state of each translocation event and the structurally observed intermediates (Fig. 6, a-b: S 5 and S 6A /S 6B ; S 1 and S 6RA / S 6RB ), and the observation of overall longer moving distances for the leading nucleotides (i.e., upstream nucleotides in Fig. 2d  The sequences of the T33-F int /P10 construct used in the stopped-flow assay and the data from the WT construct showing faster fluorescence signal decrease with higher CTP concentration. F int denotes a fluorescein-labeled nucleotide at an internal position. When GTP and ATP were provided as the only NTP substrates, the 10-mer primer (P10) was extended to a 14-mer product for EC assembly (the incorporated G-A-G-A tetra-nucleotide is boxed). The assembled EC was then used in stopped-flow trials by mixing with CTP for monitoring the singlenucleotide conversion from the 14-mer to a 15-mer. c A comparison of catalytic efficiency for WT and hurdle residue mutants. All constructs were classified into five categories based on their kinetics behaviors ( Supplementary Fig. 4) "irreversible" feature of the power stroke mechanism fits well with the rate-limiting steps of both forward and reverse translocation processes, at which a conformational change is required to release the locking interactions between the motif G hurdle residues and the template strand backbone (Fig. 6a-b; Supplementary Movie 1, the combination of the second and third clips shows the modeled process in releasing the locking interactions in forward translocation). The S 6M structure further suggests that mutating the motif G hurdle residues could alter the relative stability of key states in translocation, presumably through weakening of the locking interactions (Fig. 6c). For the ST mutant, rate-limiting step present in regular forward translocation may not exist, and  Fig. 2. For the S 6M structure, motif G hurdle residues and motif F I176 moved with the template strand accordingly, while the electrostatic interactions between -1/-2 template phosphates and K127 and R188 were lost and the unusual -2 template backbone conformation reset to normal. d A structural comparison of pre-translocation (S 5 ), early (S 6B ), and late (S 6M ) translocation intermediates states highlighting the coordinative conformational changes of the template strand and key residues in motifs F and G. The athlete-hurdle and ratcheting wheel cartoons are provided for better illustration of critical role of the hurdle residues 114-115. Labeling, symbols, and coloring scheme are the same as in Fig. 2 with motif G in pink.  Empty triangles indicate the motif G hurdle residues (right) and the −1/−2 phosphates-interacting basic residues (left) that "lock" the template (Fig. 5d). These locking interactions only unlock at the rate-limiting steps in models in (a-b), while in the panel c model, the template locking is overall weakened (partially locked symbols) by the ST mutation for all states. T1/T2, T1R/T2R, T1M/T2M are proposed transition states for forward, reverse, ST-mutant-derived translocations. The horizontal bars are only used to indicate relative free energy levels proposed in our working model and do not indicate quantitative suggestions.
the entire translocation process may be consistent with the Brownian ratchet mechanism as the stability of the intermediate state is likely comparably to that of the pre-and posttranslocation states. However, altering the free energy landscape is probably not beneficiary to the overall efficiency of the NAC. The about 50-fold increase in K M,NTP likely indicates the impairment of the precise placement of the +1 nucleotide by the mutant RdRP. Based on these analyses, we propose that other processive nucleic acid polymerases may also need to stringently control the translocation to bias the pre-and post-translocation state for efficient NTP binding and catalysis, through certain interactions to ensure the positional control of the templating nucleotide (i.e., the +1 template nucleotide). While the capture of three types of translocation intermediates in the EV71 RdRP allowed us to reveal the motif G hurdle residues as the key component in the stringent control, solving analogous intermediate structures may be needed to reveal functionally equivalent polymerase component(s) in other classes of nucleic acids polymerases systems. In bacterial DNA-dependent RNA polymerase, a pausing-related structure shows a conformation similar to the S 6B in this study and our previously reported S 6 structure, with an overall pre-translocated template DNA and posttranslocated product RNA 20 . The retention of the template strand in this structure is likely related to the interactions between the nucleotides at the +1/+2 junction and the bridge helix, which also plays an essential role in catalysis. Motif G has been only present in viral RdRPs, and even the most homologous reverse transcriptase does not have such an element. Our data here demonstrate that the motif G hurdle residues are tunable to some extent with small-side-chain residue replacement. Therefore, these two residues may serve as RdRPunique variable sites to alter the RdRP properties as well as the virus behavior, and could provide opportunity for potential applications related to RNA virus reverse genetics and vaccine development. We previously suggested that motifs A/D movement essential for catalysis is not coupled to translocation 23 . However, the observation of catalytic efficiency modulation by mutation(s) in the hurdle residues suggests that motif G not only serves as the key element to control translocation but also have indirect impact on catalysis. Additionally, The motif F isoleucine (EV71 RdRP I176, with leucine or alanine in RdRPs from a minority of RNA viruses), that caps the +1 template base in the pre-and post-translocation states, moved with the base during template translocation as suggested by the S 6M structure (Fig. 5c). The motif B loop, which contains an absolutely conserved glycine (G290 in EV71 RdRP) and was proposed in PV and influenza virus studies as a candidate to "push" the nascent base pair toward upstream during translocation [24][25][26] , undergoes small-scale backbone shift (no greater than 1 Å) in accordance with motifs G and F (Fig. 5d, and Supplementary Fig. 5 and Movie 1). This loop may indeed fully swing toward the upstream in the final stage of translocation as suggested by the "out"-conformation captured using PV RdRP constructs bearing mutations in this region 24 . Taken together, as some RdRP components contribute to both catalysis and translocation, these two key steps of NAC are coupled to some extent and the modulation of translocation independent of catalysis may not be readily achieved.
While efficient and rigorously controlled catalysis and translocation ensures the long chain nucleic acid biosynthesis in a processive manner, proofreading activities are necessary for maintaining an optimal fidelity level in genome replication, transcription, and reverse transcription processes. Under certain circumstances in transcription (e.g., pausing or arrest), the 3′-end nucleotide can be cleaved through hydrolysis as assisted by factors such as bacterial GreA/GreB and eukaryotic TFIIS 27,28 , or through pyrophosphorolysis reactions 29,30 . These activities, either directly linked to proofreading or related to iterative cleavage and re-synthesis processes beneficiary to the formation of ECs, may require a reverse translocation (known as "backtracking" in transcription) movement of the template-product to position the 3′-end nucleotide in the +1 position for cleavage reactions. The in-crystal reverse translocation and the intermediate structure observed in this study provide a structural basis to understand this movement ( Supplementary Fig. 2, Fig. 3, and Supplementary Movie 2). The pyrophosphorolysis is often much slower than nucleotide addition. When comparing the forward and reverse translocation intermediate structures S 6A /S 6B and S 6RA /S 6RB , a major difference is the disruption and reestablishment of baseparing interactions during the reverse translocation, suggesting that the reverse translocation is relatively difficult to achieve (compare Supplementary Movies 1 and 2). We performed PP i induced cleavage in solution and found that the cleavage only become obvious when PP i concentration reached 2 or 5 mM when Mg 2+ or Mn 2+ was provided as the divalent metal ion, respectively ( Supplementary Fig. 7, lanes 11-19 and 34-39). Although multiple cleavage events were evident, very slow kinetics and low yield were observed in the presence of either divalent metal ion ( Supplementary Fig. 7b). This is also largely consistent with the crystallographic observation of reverse translocation but not cleavage ( Fig. 3 and Supplementary Fig. 2).
RNA virus genome replication and transcription are typically not regulated by proofreading factors, except that the exonuclease activity may be related to fidelity regulation in coronaviruses and mismatched NTP-mediated 3′-end nucleotide cleavages were observed in hepatitis C virus RdRP 31,32 . However, intrinsic cleavage activities could still exist either through hydrolysis and pyrophosphorolysis under certain situation for viral RdRPs to achieve its optimal synthesis and fidelity. We hypothesize that misincorporation or sequence-related pausing may alter the free energy landscape of translocation, in particular for the stability of the post-translocation state, thus increasing the opportunity for reverse translocation and subsequent cleavage. In this way, the polymerase may improve the overall synthesis efficiency and fidelity without the assistance of other protein factors.

Methods
Plasmid construction and protein preparation. Two pET26b-Ub vector-based plasmid containing the EV71 (strains SK-EV006-LPS1 and HeN09-17/HeN/ CHN2009) 3D pol (RdRP) gene were used as the original cloning templates to construct the mutant plasmids according to previously described methods 13,33,34 . The constructs derived from the first strain were used for crystallographic studies, while those derived from the second strain were used in biochemical characterizations. Cell growth, isopropyl-β-D-thiogalactopyranoside (IPTG) induction, cell harvesting, cell lysis, protein purification, and protein storage were performed as described previously 13,16 .
RNA preparation and 3D pol EC assembly. The 35-mer template (T35) RNA used was prepared by T7 RNA polymerase-glmS ribozyme-based approaches described previously 35,36 , and the 10-mer RNA primer (P10) was chemically synthesized (Integrated DNA technologies). RNA construct assembly and EC assembly, purification, and storage were performed as previously described 13 , except that the KCl and NaCl concentrations were 50 and 20 mM, respectively, and MES (pH 6.5) was used as the buffering agent for EC assembly and storage for PDB entries 6LSE and 6LSF, and the NaCl concentration was further adjusted to 30 mM, and HEPES (pH 7.0) was used as the buffering agent for PDB entries 6LSG and 6LSH.
Crystallographic data processing and structure determination. X-ray diffraction data was collected at Shanghai Synchrotron Radiation Facility (SSRF) beamline BL19U1 (wavelength: 0.9789 Å) for the S 6A /S 6B structure and BL17U1 (wavelength: 0.9792 Å) for S 6RA /S 6RB and S 6M (C 0 and C 2 ) structures at 100 K. Data of at least 180°were typically collected in 0.2°oscillation steps. Reflections were integrated, merged and scaled using HKL2000 (Table 1) 37 . The initial structure solution was obtained using the molecular replacement program PHASER 38 with coordinates derived from EV71 EC structures (PDB entries 5F8G/5F8L, chains A-C) as the search model 13 . Manual model building and structure refinement were done using Coot and Phenix, respectively 39,40 . For the S 6A /S 6B and S 6RA /S 6RB structures, the entire product RNA chain was split into two alternate conformations and group occupancy refinement was applied. Ramachandran statistics are 92.7/7.1/0.0/0.2, 92.4/7.3/0.0/0.2, 94.4/5.4/0.0/0.2, and 93.2/6.6/0.0/0.2 for the S 6A / S 6B , S 6RA /S 6RB , S 6M (C 0 ), and S 6M (C 2 ) structures, respectively (values are in percentage and are for most favored, additionally allowed, generously allowed, and disallowed regions in Ramachandran plots, respectively). The 3500-K composite simulated-annealing (SA) omit 2F o -F c electron density maps were generated using CNS 41 . Unless otherwise indicated, protein structure superimposition was done using the maximum likelihood-based structure superpositioning program THESEUS 42 .
The stopped-flow fluorescence-based single nucleotide incorporation assay. This assay was modified from protocols described previously 16,43 . Briefly, the 10mer RNA primer (P10) and the 33-mer RNA template T33-F int with an internal fluorescein label (Integrated DNA technologies) were annealed at a 1.1:1 molar ratio to yield the T33-F int /P10 construct that was used for EC assembly and stopped-flow trials, except that: the enzyme concentration was 16 μM for EC assembly; the NaCl, KCl, MgCl 2 , and TCEP concentrations in the SF buffer were adjusted to 30, 50, 5, and 5 mM, respectively; the fluorescence excitation monochromator bandwidth is 4 nm. The fluorescence signal change was measured at 22.5°C after rapid-mixing, producing a final solution with 25 nM RNA, 400 nM 3D pol , 0.5 μM ATP/GTP and different concentrations of CTP in SF buffer. The EC was assembled using the T33-F int /P10 and EV71 RdRP upon incorporation of a G-A-G-A tetra-nucleotide with GTP and ATP as only NTP substrates, producing a 14-mer (P14)-containing EC (EC14). A rapid fluorescence signal decrease was observed when CTP was mixed with EC14 using a stopped-flow instrument (Applied PhotoPhysics Chirascan SF3).
Data analysis and fitting were done as described previously 16 . Briefly, the data of rapid signal decrease observed for CMP incorporation were fitted into a single exponential decay curve Y = amplitude•exp(-rate•X) + offset. The incorporation rates were then plotted as a function of the CTP concentration and the data were fitted to the Michaelis-Menten type equation, rate=k cat •[CTP]/([CTP] + K M,CTP ), to determine the k cat and apparent K M,CTP values for calculating the catalytic efficiency (k cat /K M,CTP ).
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.