Functional control of a 0.5 MDa TET aminopeptidase by a flexible loop revealed by MAS NMR

Large oligomeric enzymes control a myriad of cellular processes, from protein synthesis and degradation to metabolism. The 0.5 MDa TET2 aminopeptidase, a prototypical protease important for cellular homeostasis, degrades peptides within a ca. 60 Å wide tetrahedral chamber with four lateral openings. The mechanisms of substrate trafficking and processing remain debated. Here, we integrate magic-angle spinning (MAS) NMR, mutagenesis, co-evolution analysis and molecular dynamics simulations and reveal that a loop in the catalytic chamber is a key element for enzymatic function. The loop is able to stabilize ligands in the active site and may additionally have a direct role in activating the catalytic water molecule, with a conserved histidine playing a key role. Our data provide a strong case for the functional importance of highly dynamic, and often overlooked, parts of an enzyme, and for the potential of MAS NMR to investigate their dynamics at atomic resolution.


Introduction
Cells use large protein assemblies to perform many essential biological processes. The cellular protein quality control machinery comprises a collection of such large protein assemblies, including chaperones, unfoldases, proteases and peptidases. Collectively, these proteins eliminate damaged or misfolded proteins either by refolding them to a functional state or by proteolysis. Many proteases and peptidases form large oligomeric assemblies, often in the molecular weight range of hundreds of kilodaltons. The self-compartmentalization of these machineries allows for specificity, as only unfolded proteins or small fragments can access the protease reaction centers. The proteasome, a prominent example, cleaves proteins into peptides of ca. 7-15 residues in length (1). These peptide fragments are then further digested to amino acids by aminopeptidases (2), such as the tetrahedral aminopeptidases, which are present in all forms of life. While structures of many proteases, peptidases and chaperones are available, the precise mechanisms of their action, including substrate entry, fixation and product release, often remain difficult to decipher. Motions and allosteric regulation are thought to be intimately linked to enzymatic function, as shown for machines of the protein quality-control system (3-5). A growing number of cases reveal that enzymatic turnover can directly depend on interconversions between states, such as conformations in which the active site is open or closed (6-10), or in which larger domains reorganize, e.g. to bind additional accessory proteins (11). Characterizing the link between enzyme structure, dynamics and function at the atomic scale remains, however, experimentally challenging.
We study here the 468 kDa tetrahedral aminopeptidase TET2 from the hyperthermophilic archaeon Pyrococcus horikoshii, a member of the M42 metallopeptidase family. TET aminopeptidases assemble into dodecameric tetrahedral structures, encapsulating twelve Zn²⁺ active sites within a large hollow lumen with a diameter of ca. 60 Å (12-16) (Fig. 1). Pores on each of the four faces of the tetrahedron, each ca. 18 Å wide, allow the passage of unfolded or short α-helical or β-hairpin peptides, while preventing folded proteins from entering the catalytic chamber. Processing occurs sequentially from the N-terminus of the peptide (17). The fastest hydrolysis is observed for peptides of up to ca. 12 amino acids in length, and the longest peptides processed by TET2 are ca. 35 residues long (12). The cleaved amino acids may be released through pores located on the tetrahedral faces close to the central entry pore (13), through small pores at the apices of the tetrahedron (17), or through the large entry pores. Different TET isoforms have different substrate preferences and may form hetero-dodecameric assemblies with improved efficiency for peptide processing (14,17-19). Homo-dodecameric TET2 displays the highest activity for cleavage of hydrophobic residues, with a preference for leucine as the N-terminal amino acid, and remains highly active up to 100 °C over a broad pH range (20). Eukaryotic homologues of TET2 are involved in blood pressure regulation in humans (21,22) and in hemoglobin degradation in the human malaria parasite P. falciparum (23).
Although static high-resolution structures are available, important mechanistic aspects of the function of TET peptidases remain debated (17). Solution-state NMR is ideally suited to study protein dynamics and thus to link structures to function. However, for proteins of the size of TET2, solution-state NMR suffers from rapid signal loss for most sites, and methyl-specific labeling is generally the method of choice to obtain site-specific information (24-27). This approach is, by definition, limited to methyl-bearing residues. Unlike solution-state NMR, magic-angle spinning (MAS) NMR does not face inherent protein size limitations and can, in principle, detect every atom. MAS NMR is a powerful technique for studying dynamics at atomic resolution, and has been applied to crystalline, membrane and amyloid proteins (28-40). Most previous MAS NMR dynamics studies focused on proteins below 20-30 kDa, as the resonance overlap often encountered in larger proteins complicates analyses. We have recently achieved the resonance assignment of ca. 90 % of the backbone atoms and about 70 % of the side-chain heavy atoms, as well as of the methyl groups of Ile, Leu and Val residues (41,42) and of Phe ring positions (43) in TET2, and developed an approach that uses medium-resolution cryo-EM data along with (primarily solid-state) NMR data to solve the structure of TET2 (41). With 353 residues per subunit, TET2 is among the largest proteins for which such near-complete assignment has been achieved, and its structure is the largest solved to date with this combined approach. Exploiting the possibility to probe dynamics and interactions at essentially all sites in TET2, we use here quantitative MAS NMR experiments, co-evolution analyses and molecular dynamics (MD) simulations to probe, at the atomic level, the dynamic contacts formed between the active sites and a functionally important loop.
Enzyme kinetics experiments on mutants link these findings to the function of this enzyme. Our study provides direct insight into the functional control of an enzyme through a region that is not even visible in high-resolution crystal structures, and demonstrates the maturity of MAS NMR for studying the structure-function link of even very large proteins.

Results
A highly dynamic loop in the catalytic chamber controls enzyme activity.
The catalytic chamber of TET2 comprises twelve long loops, one from each subunit, spanning residues 120-138. Interestingly, in 3D structures of TET2 obtained by crystallography (13), these loop regions have not been modeled; similarly, in our recent cryo-EM data (41) this region had very weak electron density. In crystal structures of the homologous TET1 and TET2, the loop has been modeled, and has high B-factors (Fig. S1). All these observations point to a large mobility of these loops. When modeled into the TET2 structure, the loops fill close to 30% of the catalytic chamber volume (Figure 1a-c). One may thus expect them to represent a significant steric obstacle to substrate trafficking in the catalytic chamber, which raises the question of their functional role. To probe whether the loops influence catalytic activity, we measured the peptidase activity of a TET2 mutant in which the loop has been shortened to a two-residue β-turn (Δ(120-138), hereafter Δloop). The cleavage of the chromogenic substrate leucine-p-nitroanilide (Leu-pNA) is detected by the absorbance of the reaction product, pNA. The Δloop variant showed a dramatically reduced enzymatic activity compared to the wild-type protein (Figure 1d and Table S1). To verify that the loop deletion does not significantly perturb the protein structure, which could by itself explain the drop in activity, we collected MAS NMR correlation experiments (2D hNH and 3D hCONH).
Figure 1 (caption fragment): The structure is based on PDB ID 1Y0R, and the loops (unresolved in the crystal structure) have been modeled with Swiss-Model. The schematic model shows the arrangement of the six dimers within the dodecamer. (c) View into the catalytic chamber and onto two adjacent subunits (light grey, black) with their respective loops (red), the catalytic zinc ions in the active sites, and loops from adjacent subunits. (d) Enzyme kinetics data, obtained as the initial rate of absorbance signal following the cleavage of leucine-p-nitroanilide (Leu-pNA).
Based on the very similar chemical shifts of the Δloop and WT variants (discussed below, Fig. 3), we can rule out significant structural distortions in this mutant; the loop itself must therefore play an important role for the catalytic activity of WT TET2. We used MAS NMR to probe the conformational behavior of the loop in more detail. In dipolar-coupling-based MAS NMR experiments, which are inherently most sensitive to rigid parts, the backbone of residues S119 to K132 and W136 to Q138 could not be assigned (41). The absence of these peaks in such dipolar-coupling-based transfers may point to large-amplitude motions. If such motions were very fast (tens of nanoseconds at most), scalar-coupling-based transfers should be efficient. However, we did not observe additional peaks in scalar-coupling-based hNH correlation experiments (data not shown). The assigned residue D135 shows rapid ¹⁵N R1ρ spin relaxation (≈12 s⁻¹; Fig. S2a). Moreover, D135 shows ¹⁵N relaxation dispersion (Fig. S2b, c). Collectively, the absence of observable signals for most of the backbone sites in the loop and the direct evidence from spin relaxation of D135 suggest that the loop undergoes µs-time-scale motion.
Figure caption fragment: ¹³C R1ρ values at two ¹³C spin-lock field strengths (i.e. two points of the NERRD profile), highlighting that only V120 and I139 have a strong NERRD effect.
To gain further insight into the loop dynamics, we used methyl-directed MAS NMR of a specifically Ile/Leu/Val ¹³CHD₂-labeled and otherwise deuterated sample (Figure 2a). The well-resolved cross-peaks of Val 120, located toward the beginning of the loop, and of Ile 139, located at the C-terminal junction of the loop to a β-strand, were assigned through a mutagenesis approach (42). The additional methyl group signal in the loop, δ1 of Ile 124, is not spectrally resolved (42). V120 and I139 are convenient probes of the loop conformational dynamics, which we characterized quantitatively. Fig. 2b, c shows the ¹H-¹³C dipolar-coupling tensor data of all Ile-δ1, Val-γ2 and Leu-δ2 methyl groups. The dipolar coupling reflects the motional amplitude of the methyl group axis, averaged over all time scales up to hundreds of µs (45), and can be directly translated into the order parameter, which ranges from 1 for fully rigid to 0 for fully flexible sites. In the case of a valine, it reflects motion of a single side-chain torsion angle (χ1) and of the backbone. While the vast majority of valines are rather rigid, with order parameters (S²) in the range 0.7 to 1, Val120 is highly flexible (S² = 0.15 ± 0.02).
Ile 139, at the end of this loop, also displays a similarly low order parameter.
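The conversion from a measured dipolar-coupling anisotropy to S² can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code. It assumes two things: the tensor anisotropy equals twice the H-C dipolar coupling constant, which is the convention that reproduces the 43588 Hz quoted in the Online Methods for a 1.115 Å bond, and fast methyl rotation scales the anisotropy by 1/3. The function names are hypothetical.

```python
import math

# Physical constants in SI units (standard reference values).
MU0_OVER_4PI = 1e-7            # T^2 m^3 J^-1
HBAR = 1.054571817e-34         # J s
GAMMA_H = 2.6752218744e8       # rad s^-1 T^-1
GAMMA_C = 6.728284e7           # rad s^-1 T^-1


def dipolar_anisotropy_hz(r_m: float) -> float:
    """Rigid-limit H-C dipolar tensor anisotropy in Hz for bond length r_m (m).

    Assumed convention: anisotropy = 2 x dipolar coupling constant,
    i.e. 2 * mu0*hbar*gH*gC / (4*pi*r^3) / (2*pi).
    """
    d_rad_per_s = MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_C / r_m ** 3
    return 2.0 * d_rad_per_s / (2.0 * math.pi)


# Fast rotation about the methyl 3-fold axis scales the tensor by
# |P2(cos 109.47 deg)| = 1/3 (tetrahedral angle between axis and C-H bond).
RIGID_METHYL_HZ = dipolar_anisotropy_hz(1.115e-10) / 3.0


def order_parameter_squared(anisotropy_hz: float,
                            rigid_hz: float = RIGID_METHYL_HZ) -> float:
    """S^2 = (measured anisotropy / rigid-limit methyl anisotropy)^2."""
    return (anisotropy_hz / rigid_hz) ** 2
```

Under these assumptions, an S² of 0.15, as reported for Val120, corresponds to a residual anisotropy of roughly 5.6 kHz: `order_parameter_squared(5627.0)` evaluates to about 0.15.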
Spin-relaxation rate constants are sensitive to both the time scales and the amplitudes of dynamics. In particular, R1ρ experiments performed at different spin-lock radio-frequency (RF) field strengths, up to the regime where the RF nutation frequency approaches the MAS frequency (44,46,47), i.e. near-rotary-resonance relaxation dispersion (NERRD) experiments, sense µs-ms dynamics even if the exchanging conformations do not significantly differ in their chemical shifts. Figs. 2d, e show NERRD data for Val sites in TET2. The NERRD curves of almost all methyl sites are flat or show only a modest increase (≈10 s⁻¹) close to the rotary-resonance condition (see Fig. S5). Strongly non-flat NERRD profiles are observed for V120 and I139 and unambiguously demonstrate that these sites undergo µs motions. As the motion experienced by the V120 side chain is presumably complex, involving methyl rotation, side-chain motion and loop reorientation, quantitative analysis is challenging, and we limit the analysis to an estimated time scale of ca. 10-1000 µs (see Fig. S7 for discussion). Motion on this time scale leads to fast transverse relaxation in MAS NMR experiments (45), which explains the broadening beyond detection of most backbone signals as well as the elevated R1ρ and relaxation dispersion of D135 (Fig. S2). We additionally measured longitudinal relaxation rate constants (¹³C R1), which are sensitive to faster (nanosecond) motions (45,50,51). Neither V120 nor I139 shows a particularly fast R1 decay, indicating that their motion does not take place predominantly on the ns time scale (Fig. S6). This observation further supports that the large-amplitude motion occurs primarily on longer (µs-ms) time scales, corroborating the R1ρ data. We then characterized which parts of the catalytic chamber are in (possibly transient) contact with the loop.
We exploited the fact that the chemical shift is a suitable reporter of such contacts: it is sensitive to the local environment around a given atom, averaged over time scales up to milliseconds. Therefore, even transient contacts of a given residue with the loop would be imprinted on its chemical shifts. The loop contacts should thus be observable by comparing the chemical shifts of the wild-type (WT) protein and of a protein lacking the loop. We prepared these two samples of TET2 (WT, Δloop) with uniform labeling (u-[²H,¹⁵N,¹³C], 100% back-exchanged) and probed the backbone amide ¹H, ¹⁵N and ¹³Cα chemical shifts. Fig. 3a, b shows the hCANH-derived chemical-shift differences (WT − Δloop), which reflect the effects induced by the loop. As expected, chemical-shift perturbations (CSPs) are located exclusively in the interior of the enzymatic lumen. The large area within the catalytic chamber involved in loop contacts spans residues from the entry pore to the active site. Transient contacts of the loop with all these residues require large-amplitude motion, in line with the dynamics data reported in Fig. 2.
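The per-residue WT versus Δloop comparison reduces to a weighted combination of backbone shift differences. The minimal sketch below assumes the weights given in the Online Methods (α_CA = 0.3, α_N = 0.1), and further assumes that they multiply the ¹³Cα and ¹⁵N differences before squaring. Residues missing from either assignment list are skipped; this includes the loop residues deleted in Δloop. The shift values are invented placeholders, not measured data, and the function names are hypothetical.

```python
import math

ALPHA_CA = 0.3  # weight on the 13C-alpha shift difference (Online Methods)
ALPHA_N = 0.1   # weight on the 15N shift difference (Online Methods)


def combined_csp(d_h, d_ca, d_n, alpha_ca=ALPHA_CA, alpha_n=ALPHA_N):
    """Combined 1H/13Ca/15N chemical-shift perturbation in ppm (assumed form)."""
    return math.sqrt(d_h ** 2 + (alpha_ca * d_ca) ** 2 + (alpha_n * d_n) ** 2)


def csp_per_residue(shifts_a, shifts_b):
    """shifts_x: {residue number: (1H, 13Ca, 15N) in ppm}.

    Only residues assigned in both samples are compared; returns {residue: CSP}.
    """
    out = {}
    for res in sorted(shifts_a.keys() & shifts_b.keys()):
        (ha, ca_a, na), (hb, ca_b, nb) = shifts_a[res], shifts_b[res]
        out[res] = combined_csp(ha - hb, ca_a - ca_b, na - nb)
    return out


# Placeholder shifts (ppm) for illustration only; residue 125 lies in the
# deleted loop and is therefore absent from the Delta-loop list.
wt = {93: (8.10, 61.20, 121.5), 94: (8.52, 54.10, 118.2), 125: (8.30, 56.00, 119.0)}
dloop = {93: (8.05, 61.00, 121.0), 94: (8.52, 54.10, 118.2)}
csp = csp_per_residue(wt, dloop)
```

The weights down-scale the heteronuclear differences, whose ppm ranges are much wider than that of ¹H, so that no single nucleus dominates the combined value.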

Co-evolution and molecular dynamics simulations detect loop contact sites.
Based on the observation that the loop is crucial for function (Figure 1d) and in contact with many residues, we reasoned that interaction patterns of the loop may be conserved across TET homologs. We thus investigated how residues in the loop co-evolved in more than 20000 homologous sequences. Co-evolution analysis (Direct Coupling Analysis, see Methods) highlights the conservation of contacts involving the loop and residues in the catalytic chamber, both intra- and intermolecularly (Figure 3c). Such co-evolution is remarkable, as residues in loop regions generally evolve quickly (52). Co-evolution is observed between residues of the loop and, e.g., V93, located right next to the Zn²⁺ center, as well as P246 in the entry pore. We used one-microsecond-long all-atom molecular dynamics (MD) simulations of the dodecameric TET2 assembly to gain additional insight into the contacts of the loop with other structural parts. These simulations are challenging for several reasons: at 468 kDa, TET2 represents a size challenge for all-atom MD, making it difficult to reach long time scales; furthermore, as the experimental data revealed, the loop motion occurs on a tens-of-microseconds time scale (Figs. 2 and S7). Consequently, converged sampling would require simulating hundreds of microseconds to milliseconds. Our simulations can, thus, only provide qualitative conclusions, and these are in very good agreement with the experimental observations. Fig. 3d highlights the residues within the TET2 cavity that are in transient contact with the loop along the MD trajectory; these span the range from the entry pore to the active site, mirroring the NMR CSP data and the co-evolution data. The MD data also reveal numerous contacts between the loops of two adjacent monomers within the dimeric building block of TET2, as well as contacts to loops from other subunits (Fig. S9).
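A back-of-the-envelope estimate makes the convergence argument concrete. For illustration only, assume a single exchange process with a characteristic time τ_ex at the fast end of the experimental range (≈10 µs). The expected number of exchange events N in a trajectory of length T is then roughly T/τ_ex:

$$N \approx \frac{T}{\tau_{\mathrm{ex}}} = \frac{1\,\mu\mathrm{s}}{10\,\mu\mathrm{s}} = 0.1 \ \text{per loop}, \qquad 12 \times 0.1 \approx 1 \ \text{per dodecamer}.$$

Observing of the order of 10 to 100 events per loop would thus require T ≈ N·τ_ex ≈ 0.1 to 1 ms, in line with the estimate above. Slower exchange (τ_ex up to ca. 1 ms) lengthens the requirement correspondingly.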
MD provides a structural view of the loop conformations. In particular, we aimed to understand which role the evolutionarily conserved contacts play for the loop conformations. We observed that conformations in which the loop contacts residues D94 and I238 (P121-I238, P122-D94 and Q125-D94) correspond to states that bring the loop into proximity of the active site (Fig. 3e). This finding suggests that the observed co-evolution may be related to contacts of the loop with substrates. Collectively, three fundamentally different approaches, MAS NMR, MD and co-evolution analysis, reveal large-amplitude motion of the functionally important loop, which we show to occur on a time scale of ca. 10 µs to 1 ms.

Conserved loop residues are important for enzyme-substrate interaction.
To identify the mechanisms of this enzymatic control via a highly flexible loop, we investigated the sequence alignment of TET2 homologs. It reveals strong conservation of a histidine in this loop and, to a lesser extent, of a Pro-Pro motif, corresponding to P121, P122 and H123 in TET2 (Figure 4a). Analysis of the structures of related aminopeptidases and homologs in which the loop has been modeled into the electron density suggests that H123 of a given subunit may be in close vicinity to the active site of the adjacent subunit (Fig. S1). Histidines often play an important role in enzyme catalysis because the imidazole side chain can form hydrogen bonds and combine donor and acceptor properties (53). Hydrogen bonds between the substrate and residues outside the canonical Zn²⁺ center are known to stabilize substrates in several other aminopeptidases (54). We speculated that the highly conserved H123 may contribute to stabilizing the substrate in the active site; the Pro-Pro motif preceding H123 is conformationally restricted (55) and may be important to position the conserved His within the active site. We investigated the role of the conserved residues using functional assays with mutant proteins. Specifically, we mutated H123 to phenylalanine or tyrosine (ring structures with dimensions similar to His, but without the H-bond donor/acceptor nitrogens) or to lysine (to investigate the importance of a positive charge). In an additional mutant, Δ(122,126), we shortened the loop by one residue on each side of H123, thus reducing its ability to reach into the active site. To test the importance of the PP motif (residues 121 and 122), we replaced it by a flexible GG stretch. Figure 4b shows the results of activity assays for the WT and mutant samples processing the chromogenic substrate H-Leu-pNA. The mutants have significantly reduced activity.
In particular, the Michaelis constants (56), K_M, of the mutants indicate that the stability of the enzyme-substrate complex is reduced by up to one order of magnitude (Figure 4b and Table S1). Experiments with a longer substrate, the tetrapeptide H-Leu-Val-Leu-Ala-pNA, are in good qualitative agreement with the data from the short H-Leu-pNA (Figure 4c). Taken together, the activity assays show that the stability of the enzyme-substrate complex is reduced by mutations that either shorten the loop or alter the conserved Pro-Pro-His motif.
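K_M values such as those in Table S1 are typically obtained by fitting the Michaelis-Menten equation, v = V_max·[S]/(K_M + [S]), to initial rates measured at several substrate concentrations. This section does not spell out the authors' fitting procedure, so the following is only a minimal nonlinear least-squares sketch under that assumption. For each trial K_M, V_max enters linearly and is solved in closed form; K_M is then refined by golden-section search. The data and function names are synthetic or hypothetical.

```python
import math


def _vmax_sse(km, s, v):
    """For fixed K_M, v = Vmax * x with x = s / (K_M + s) is linear in Vmax."""
    x = [si / (km + si) for si in s]
    vmax = sum(vi * xi for vi, xi in zip(v, x)) / sum(xi * xi for xi in x)
    sse = sum((vi - vmax * xi) ** 2 for vi, xi in zip(v, x))
    return vmax, sse


def fit_michaelis_menten(s, v, km_lo=1e-4, km_hi=1e4, n_grid=500, iters=60):
    """Least-squares estimate of (K_M, Vmax); K_M has the units of s."""
    # Log-spaced grid over K_M to bracket the minimum of the profile SSE.
    grid = [km_lo * (km_hi / km_lo) ** (k / (n_grid - 1)) for k in range(n_grid)]
    sse = [_vmax_sse(km, s, v)[1] for km in grid]
    k = min(range(n_grid), key=sse.__getitem__)
    a, b = grid[max(k - 1, 0)], grid[min(k + 1, n_grid - 1)]
    # Golden-section refinement inside the bracket [a, b].
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if _vmax_sse(c, s, v)[1] < _vmax_sse(d, s, v)[1]:
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    km = 0.5 * (a + b)
    return km, _vmax_sse(km, s, v)[0]


# Synthetic example: K_M = 0.8 mM, Vmax = 10 (arbitrary rate units).
conc = [0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]  # mM
rates = [10.0 * sc / (0.8 + sc) for sc in conc]
km, vmax = fit_michaelis_menten(conc, rates)
```

Comparing K_M between WT and a mutant, or equivalently 1/K_M as the affinity proxy used for Fig. 4, is what the text summarizes as a change in the stability of the enzyme-substrate complex.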
Ligand-dependent conformational equilibrium of the loop.
How can a highly dynamic loop without a stably defined position play a crucial role for the activity of an enzyme? We propose that the wide range of loop conformations includes states that bring important residues, such as H123, close to the active site; in these conformations, contacts to the substrate may increase the stability of the enzyme-substrate complex, as evidenced by the K_M values. According to this view, the equilibrium of loop conformations is expected to be altered by the presence of substrate bound to the active site.
We experimentally tested this model by measuring the effect of bound ligands on the conformational equilibrium of the loop. The challenge for such experiments comes from the short lifetime of substrates inside the active site, as they are cleaved within milliseconds (57). Moreover, the fraction of TET2 particles that simultaneously have all 12 sites occupied is extremely small. As an experimentally feasible alternative to generate a temporally stable and fully ligand-occupied state of TET2, we prepared samples of TET2 with the inhibitor amastatin (Fig. 5a), a peptide that binds tightly and non-covalently to the active sites of TET2 (13). MAS NMR experiments reveal the chemical-shift perturbations induced by this tightly bound inhibitor (Figs. S10 and S11a). CSPs are observed in the close vicinity of the binding site, in excellent agreement with the crystal structure (13). Importantly, a single set of resonances is found in our samples, i.e., the entire population is shifted from the apo state to the amastatin-bound state. We turned to methyl ¹H-¹³C correlation spectra and used the signal of V120 to monitor whether the bound inhibitor impacts the conformational equilibrium of the loop. Compared to the apo state, the cross-peak of V120 shifts significantly upon amastatin binding (Fig. 5b). V120 is ca. 6-18 Å away from the nearest atom of the inhibitor in the MD ensemble (Fig. 5c and data not shown), too far for amastatin to affect the V120 signal through direct molecular contact. Because the chemical shift reports on the ensemble-averaged conformational equilibrium, the altered peak position of V120 rather reveals that the relative populations of the loop conformers are altered; similarly, the backbone N-H signal of D135 is significantly altered upon amastatin binding (Fig. S11a). Fig. 5e sketches this idea of a conformational ensemble.
Based on the activity measurements, we expected that H123, via its effect on stabilizing the substrate in the active site, plays an important role in reshuffling the loop conformational equilibrium upon ligand binding. Consequently, we expected that an H123F mutant, unable to form these contacts, would be unable to induce this population reshuffling. This is indeed the case: in H123F TET2, the NMR resonance reporting on the loop conformation, V120, was essentially unaffected by the presence of the inhibitor in the active site (Fig. 5d). We ensured that amastatin tightly binds to the H123F mutant, as evidenced by significant CSPs in ¹H-¹⁵N correlation spectra upon inhibitor binding and, again, a fully bound state (i.e. no residual apo-state peaks), akin to the wild-type protein (Fig. S11b).
[…] TET2. The inhibitory properties of aliphatic alcohols on aminopeptidases have been reported earlier (58). In the presence of MPD, several H-N moieties (Gly92 NH, Asp94 NH and the Nδ-Hδ of the zinc-chelating histidine 323) feature two cross-peaks of approximately equal intensity, indicating that MPD-bound and free states of TET2 co-exist in slow exchange (Fig. 6). A comparison with spectra of TET2 without MPD (sedimented rather than MPD-precipitated) shows that one of these two peaks corresponds to an MPD-bound form.
In Δloop TET2, only a single ¹H-¹⁵N cross-peak is observed for these sites in the presence of MPD (Fig. 6), in contrast to WT TET2. For all three sites, the observed peak is close to the position that corresponds to the free (not MPD-bound) state of the WT protein. This finding suggests that the observed peak represents an apo state and that the MPD-bound state is not significantly populated; alternatively, MPD binding and release may still occur, but at a faster rate, such that the observed peak is the population-weighted average of free and bound states. Of note, these experiments do not provide direct evidence that the loop interacts with the ligand (in this case MPD). It is conceivable that the loop stabilizes the MPD-bound state more indirectly, by contacting other residues of the protein rather than the ligand itself. The precise mechanism, as well as the binding affinity, likely also depends on the nature of the substrate. Irrespective of the mechanism, these data show that the observed affinity of a ligand at the active site directly depends on the presence of the loop, mirroring the reduced binding affinity of substrates that we observed in the activity measurements (Fig. 4, 1/K_M values).
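The two limiting interpretations above, together with the slow-exchange doublets seen for WT TET2 with MPD, map onto simple two-state estimates. The sketch is illustrative and rests on two assumptions. (i) In slow exchange, peak intensities are proportional to populations; this ignores possible differences in relaxation or cross-polarization efficiency between the states. (ii) In fast exchange, the observed peak position is the population-weighted average of the free and bound positions; in 2D, the bound fraction is taken as the least-squares projection of the observed displacement onto the free-to-bound vector, in unweighted ppm. Function names and numbers are hypothetical.

```python
def bound_fraction_slow(i_bound, i_free):
    """Slow exchange: two separate peaks; bound fraction from relative intensities."""
    return i_bound / (i_bound + i_free)


def bound_fraction_fast(obs, free, bound):
    """Fast exchange: obs = (1 - p) * free + p * bound, for (1H, 15N) tuples in ppm.

    Solved for p by least squares, i.e. projecting (obs - free) onto (bound - free).
    """
    ref = [b - f for b, f in zip(bound, free)]
    disp = [o - f for o, f in zip(obs, free)]
    norm2 = sum(r * r for r in ref)
    if norm2 == 0.0:
        raise ValueError("free and bound peak positions coincide")
    return sum(dv * r for dv, r in zip(disp, ref)) / norm2


# A doublet of equal intensity, as for Gly92 in WT TET2 with MPD, gives p = 0.5.
p_slow = bound_fraction_slow(1.0, 1.0)
# Placeholder positions: a peak a quarter of the way from free to bound.
p_fast = bound_fraction_fast((8.1, 120.5), (8.0, 120.0), (8.4, 122.0))
```

In the Δloop case, a single peak lying close to the free position gives a small p under either reading. This is why the data cannot by themselves distinguish a depopulated bound state from fast exchange.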

Discussion
Why do the large TET aminopeptidases, present in all kingdoms of life, feature loop regions that fill almost one third of their catalytic chamber? Why has evolution generated these long stretches, which seem to hamper the access of substrates to the active sites, rather than an empty, spacious lumen? Our combined MAS NMR, functional and computational study clarifies the functional role of these loops, which we show to be highly flexible (Fig. 2). We demonstrated that these loops stabilize substrates in the active site (higher enzyme-substrate affinity, Fig. 4), that the loop-substrate interaction in turn shifts the conformational ensemble of the loop (Fig. 5e), and that the loop affects ligand binding at the active site (Fig. 6). Residues in loop regions evolve rapidly on evolutionary time scales and are generally poorly conserved (52). Evolution of the physico-chemical characteristics of the loop may have helped to widen the substrate specificity of TET peptidases.
Remarkably, though, we identified a conserved Pro-Pro-His stretch, and demonstrated its functional relevance. Why is this functionally important element highly flexible, rather than being located on a short, less flexible element in the direct vicinity of the active site? We propose that the high degree of flexibility is required to allow the passage of substrates within the chamber, particularly as the substrates can be up to 35 amino acids long. Freedom of movement within the chamber is important not only for newly entering substrates but also for substrates that were cleaved once and which remain in the chamber for further degradation at one of the 12 catalytic sites (17). The length and flexibility of the loops may furthermore allow the required versatility for the interaction with a broad range of substrates of different lengths.
On the methodological side, the current study establishes that MAS NMR is highly suited for probing enzyme function, even of very large complexes such as the half-megadalton TET assembly. For complexes of this size, solution-state NMR (59,60) is generally limited to methyl groups. The ability of MAS NMR to detect essentially all backbone and side-chain sites provides a more comprehensive view; here, only the combination of methyl data with backbone and even side-chain His resonances allowed us to see which parts are in contact with the loop (Fig. 3) or bind the ligands (Fig. 6). We have exploited advanced MAS NMR methods to probe dynamics, including ¹³C NERRD data (Fig. 2d), which, to our knowledge, is the first report of this method applied to ¹³C, and asymmetric dipolar-coupling tensor averaging (Fig. 2b), neither of which is available to solution-state NMR. Our study demonstrates that such quantitative dynamics experiments by MAS NMR, combined with other biophysical methods, can be decisive to link static structures to functional mechanisms.

Online Methods
Protein samples. TET2 from P. horikoshii (UniProt entry O59196) was produced by overexpression of a pET41c plasmid encoding the TET2 sequence in Escherichia coli BL21(DE3) (Novagen) cells in suitably isotope-labeled M9 minimal medium. Samples used for NMR studies were either u-[²H,¹³C,¹⁵N] […] mM NaCl (pH 7.6), and mixed 1:1 (vol/vol) with 2-methyl-2,4-pentanediol (MPD), resulting in a white precipitate, which we filled into 1.3 mm MAS rotors (Bruker Biospin) using an ultracentrifuge device (ca. 50000 g, in a Beckman SW32 rotor, 20000 rpm) for at least 1 hour. We also prepared samples by sedimenting TET2 from the buffer solution [20 mM Tris, 20 mM NaCl (pH 7.6)] with the same ultracentrifuge parameters overnight, without addition of a precipitation agent (used for data shown in Fig. 6, light blue). ¹³C-¹³C spectra of MPD-precipitated, isopropanol-precipitated and sedimented samples were highly similar (data not shown). The loop-deletion mutant plasmid, lacking residues 120-138 of the WT sequence, was prepared by the RoBioMol platform at IBS Grenoble within the Integrated Structural Biology Grenoble (ISBG) facility. The other mutants were generated by a commercial provider, GenScript.
NMR. MAS NMR data were acquired on a 14.1 T (600 MHz ¹H Larmor frequency) Bruker Avance III HD spectrometer (Bruker Biospin) using a 1.3 mm probe tuned to ¹H, ¹³C and ¹⁵N frequencies on the main coil, and an additional ²H coil that allows for deuterium decoupling, which greatly enhances the resolution of ¹³CHD₂ spectra (61). One additional data set, a ¹³[…]. The ¹H, ¹³Cα, ¹⁵N CSP reported in Fig. 3a was calculated as
$$\mathrm{CSP} = \sqrt{\Delta\delta(^{1}\mathrm{H})^{2} + \left(\alpha_{\mathrm{CA}}\,\Delta\delta(^{13}\mathrm{C}\alpha)\right)^{2} + \left(\alpha_{\mathrm{N}}\,\Delta\delta(^{15}\mathrm{N})\right)^{2}}$$
where α_CA = 0.3 and α_N = 0.1, and Δδ denotes the chemical-shift differences between the two spectra in units of ppm. All ¹³C relaxation experiments and the REDOR experiment described below were obtained using pulse sequences reported in Figure S2 of ref. (43) as a series of 2D ¹H-¹³C spectra (also implemented in NMRlib (65)). ¹H-¹³C transfers (out and back) were achieved by cross-polarization (CP), typically using a ca. 2 ms long CP transfer with a ¹H RF field strength of ca. 90 kHz (linear ramp 90-100%) and matching the ¹³C RF field strength to the n=1 Hartmann-Hahn condition (i.e. ca. 35 kHz). ¹³C near-rotary-resonance relaxation dispersion (NERRD) R1ρ experiments (44,48) (Fig. 2d, e) were recorded at 14.1 T and a MAS frequency of 46 kHz. Relaxation delays were adapted in the different experiments to avoid damaging the hardware with excessively long high-power spin-lock durations; the delays are listed in Table S3. ¹³C R1 measurements were done at 22.3 T, using relaxation delays of 0.05, 0.2, 0.4, 0.6, 0.8, 1.0, 1.25, 1.5, 2.0 and 2.5 seconds. ¹H-¹³C rotational-echo double resonance (REDOR) (66) experiments (Fig. 2c), in the implementation described in ref. (67), were used to measure asymmetric dipolar coupling tensors. The MAS frequency was 55.555 kHz (18 µs rotor period). The ¹H and ¹³C π pulses were 5 µs and 6 µs long (100 kHz and 83.3 kHz RF field strength), respectively.
One out of two ¹H π pulses was shifted away from the center of the rotor period in order to scale down the dipolar-coupling evolution and thus sample it more accurately, as described earlier (68), such that the short and long delays between successive ¹H π pulses were 0.5 µs and 7.5 µs, respectively. NMR data were processed in the Topspin software (version 3, Bruker Biospin) and analyzed using CCPnmr (69) (version 2.3) and in-house written Python analysis routines. In analyses of the NERRD experiment, a two-parameter monoexponential decay function was fitted to the peak-intensity decays as a function of spin-lock duration at the various RF field strengths. The fitting procedure of the REDOR experiment was described previously (67). Briefly, numerical simulations were performed with the GAMMA simulation package (70), setting all pulse-sequence-related parameters (MAS frequency, pulse durations, RF field strengths and timing) to the values used in the experiment. A series of such simulations was carried out, in which the ¹H-¹³C dipolar-coupling tensor anisotropy was varied from 1030 to 15000 Hz with a grid step size of 30 Hz (a rigid H-C pair at a distance of 1.115 Å has a tensor anisotropy of 43588 Hz, which results in a rigid-limit value of 14529 Hz when considering the fast methyl rotation), and the tensor asymmetry was varied from 0 to 1 with a grid step size of 0.05. Each experimental REDOR curve was compared to this two-dimensional grid of simulations (ca. 9800 simulations in total), and a chi-square value was calculated for each simulation. The reported best-fit tensor parameters are those that minimize the chi-square. Error estimates were obtained by a Monte Carlo approach. Briefly, for each methyl site, 1000 synthetic noisy REDOR curves were generated around the best-fit simulated REDOR curve, using the spectral noise level and a normal distribution for generating the noisy data points.
These 1000 synthetic REDOR curves were fitted analogously to the above-described procedure, and the standard deviations of the tensor anisotropy and asymmetry over these fits are reported as error estimates. Squared order parameters (Fig. 2b) were obtained by dividing the best-fit tensor anisotropy by the rigid-limit value (14529 Hz) and squaring the result.
The v-rescale (74) and Parrinello-Rahman (75) schemes were employed to control temperature (T = 300 K) and pressure (P = 1 atm), respectively. The initial configuration of the TET complex was taken from the X-ray structure (PDB code: 1Y0R), and the missing loop was modeled with SWISS-MODEL. The TET complex was solvated in a rhombic dodecahedron box with a volume of 2880 nm³ under periodic boundary conditions. A cutoff of 1 nm was used to compute van der Waals interactions, while electrostatic interactions were evaluated with the Particle Mesh Ewald algorithm using a cutoff of 1 nm for the real-space interactions. The LINCS (76) algorithm was used to constrain all bond lengths to their equilibrium values. High-frequency bond-angle vibrations of hydrogen atoms were removed by replacing the hydrogens with virtual sites, allowing an integration time step of 4 fs (77). Distance restraints between the protein and the zinc atoms were applied to preserve the local geometry of the enzymatic site. In the simulations of the substrate-bound protein, the substrate was modeled as a tetrapeptide (Leu-Leu-Val-Ala) in which the N-terminal residue was modified to have a neutral terminus. A substrate molecule was bound to the active site of each monomer by introducing an additional set of distance restraints between substrate and zinc atoms. These restraints were based on the X-ray structure of the amastatin-bound complex (PDB code: 1Y0Y) to preserve a correct binding geometry. Apo and substrate-bound systems were energy-minimized and equilibrated for 200 ns, after which 1 µs production runs were performed for each system.
Reported results were obtained by analyzing one frame every 100 ps. Residues were considered to be in direct contact when the minimum inter-residue distance between heavy atoms was below 5 Å, whereas a looser cutoff (7 Å) was used when evaluating DCA predictions, according to standard practice in coevolutionary analysis (78).
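The contact criterion above (minimum heavy-atom distance between two residues below 5 Å) can be sketched in a few lines of NumPy. This is a hypothetical illustration on plain coordinate arrays, not the analysis used for the paper: it assumes one (n, 3) array of heavy-atom coordinates in Å per residue and per frame, which in practice would be extracted from the trajectory with an MD analysis tool. The helper names are made up for this example.

```python
import numpy as np


def min_heavy_atom_distance(res_a, res_b):
    """Minimum distance (Angstrom) between the heavy atoms of two residues.

    res_a: (n, 3) array and res_b: (m, 3) array of coordinates in Angstrom.
    """
    diff = res_a[:, None, :] - res_b[None, :, :]        # (n, m, 3) pairwise vectors
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())


def in_contact(res_a, res_b, cutoff=5.0):
    """True if the minimum heavy-atom distance is below the cutoff (Angstrom)."""
    return min_heavy_atom_distance(res_a, res_b) < cutoff


def contact_fraction(frames_a, frames_b, cutoff=5.0):
    """Fraction of frames in which two residues are in contact.

    frames_a, frames_b: equal-length sequences of per-frame coordinate arrays.
    """
    flags = [in_contact(a, b, cutoff) for a, b in zip(frames_a, frames_b)]
    return sum(flags) / len(flags)


# One frame every 100 ps of a 1 us run gives 1e6 ps / 100 ps = 10,000 frames.
if __name__ == "__main__":
    a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    b_near = np.array([[5.5, 0.0, 0.0], [8.0, 0.0, 0.0]])   # closest pair 4.5 A apart
    b_far = b_near + np.array([2.0, 0.0, 0.0])              # closest pair 6.5 A apart
    print(contact_fraction([a, a], [b_near, b_far]))        # 0.5
```

Passing `cutoff=7.0` gives the looser criterion quoted for evaluating DCA predictions.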
Enzymatic activity assays. The enzymatic activity was measured by following the absorbance change induced when a para-nitroanilide (pNA)-labelled substrate is enzymatically cleaved, using the aminoacyl-pNA compounds H-Leu-pNA and H-Leu-Val-Leu-Ala-pNA (Bachem, Bubendorf, Switzerland) as substrates. Measurements were performed on a BioTek Synergy H4 plate reader (Fisher Scientific), measuring the absorbance at 410 nm in a 384-well plate at 50 °C. In all cases, the wells were filled with 50 µL of substrate solution in buffer (20 mM Tris, 100 mM NaCl, pH 7.5), at concentrations ranging from 0.1 to 6.4 mM for H-Leu-pNA and at 1 mM for H-Leu-Val-Leu-Ala-pNA; plates were briefly centrifuged to ensure that the solution was at the bottom of the wells. The plate loaded with the substrate solutions was pre-equilibrated for 20 minutes at 50 °C. Then, 10 µL of the protein solution (in the same buffer as the substrate) was added to each well to reach a final protein concentration of 5 ng/µL in each well. All solutions contained 2.8 % (vol/vol) dimethylsulfoxide (DMSO; Sigma-Aldrich), which increases the solubility of the substrates. To minimize changes in the substrate solution (e.g. temperature) upon protein addition, the plate cover was kept above the plate-reader thermostat, and an electronic multichannel pipette was used to load the protein solution into the wells and gently mix the solution. The pNA concentration was estimated from the solution absorbance using a molar absorption coefficient of pNA at 410 nm of 8800 M⁻¹ cm⁻¹. The path length (0.375 cm) was estimated from the dimensions of the wells and the total volume of solution. Before analysis, curves from blank samples (no protein) were subtracted. The time-dependent absorbance values were analyzed with in-house written Python scripts by fitting the initial rate with a linear equation.
These initial-regime slopes, as a function of substrate concentration, were fitted to obtain the Michaelis-Menten parameters KM and kcat reported in Figs. 1d and 4b,c and in Table S1.
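The chain from absorbance traces to Michaelis-Menten parameters can be sketched with NumPy and SciPy. This is a minimal illustration, not the authors' in-house script: it assumes Beer-Lambert behaviour with the 8800 M⁻¹ cm⁻¹ coefficient given above, and the well cross-section of 0.16 cm² is a hypothetical value chosen only because 60 µL (50 + 10 µL) over that area reproduces the 0.375 cm path length. The enzyme concentration and all data below are synthetic placeholders.

```python
import numpy as np
from scipy.optimize import curve_fit

EPS_PNA = 8800.0   # molar absorption coefficient of pNA at 410 nm, M^-1 cm^-1 (from the text)


def path_length_cm(volume_ul, well_area_cm2):
    """Liquid height in a flat-bottomed well: volume / cross-sectional area."""
    return volume_ul * 1e-3 / well_area_cm2          # 1 uL = 1e-3 cm^3


def initial_rate_um_per_s(t_s, absorbance, path_cm):
    """Initial rate (uM/s) from a linear fit of blank-subtracted absorbance vs time."""
    slope, _intercept = np.polyfit(t_s, absorbance, 1)   # absorbance units per second
    return slope / (EPS_PNA * path_cm) * 1e6              # Beer-Lambert: c = A / (eps * l)


def michaelis_menten(s, vmax, km):
    """v0 = Vmax [S] / (KM + [S])."""
    return vmax * s / (km + s)


def fit_michaelis_menten(s_mm, v0_um_per_s, enzyme_um):
    """Return (KM in mM, kcat in s^-1) from initial rates at several [S] (mM)."""
    p0 = [np.max(v0_um_per_s), np.median(s_mm)]          # rough starting guesses
    (vmax, km), _cov = curve_fit(michaelis_menten, s_mm, v0_um_per_s, p0=p0)
    return km, vmax / enzyme_um                           # kcat = Vmax / [E]


if __name__ == "__main__":
    l_cm = path_length_cm(60.0, 0.16)                     # hypothetical well area -> 0.375 cm
    t = np.arange(0.0, 600.0, 30.0)
    rate = initial_rate_um_per_s(t, 0.05 + 3.3e-4 * t, l_cm)   # synthetic trace, 0.1 uM/s
    s = np.array([0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4])           # mM, range used for H-Leu-pNA
    v = michaelis_menten(s, 2.0, 1.0)                           # synthetic noise-free rates
    km, kcat = fit_michaelis_menten(s, v, enzyme_um=0.1)        # hypothetical [E]
    print(f"path {l_cm:.3f} cm, rate {rate:.3f} uM/s, KM {km:.3f} mM, kcat {kcat:.1f} s^-1")
```

Working in mM and µM/s keeps both fitted parameters of order one, which helps the least-squares fit converge; for real data the rates would be the slopes of the blank-subtracted traces.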
Bioinformatic analyses. An initial seed for the coevolution analysis was built from the sequences contained in the PFAM seed of the M42 peptidase family (PFAM ID: PF05343), aligned using the MAFFT utility. The alignment was then curated by removing overly gapped regions. This resulted in a sequence model of 353 positions covering the full length of the Pyrococcus horikoshii TET2 peptidase (UniProt ID O59196). An HMMER profile of the family was then built using the hmmbuild utility and used to search the UniProt database (union of the TrEMBL and Swiss-Prot datasets, release 07_2019) for homologs with the hmmsearch utility, using standard inclusion thresholds. To remove fragments, the retrieved homologs were further filtered by coverage, keeping only sequences with no more than 25% gapped positions. The loop region of TET was defined as lying between V120 and Q138 in the Pyrococcus horikoshii TET2 peptidase. From this final multiple sequence alignment (MSA), sequence logos of the loop region, extended to residues 115 to 139 to include neighboring residues with highly conserved physicochemical properties, were generated using seqlogo (79), a method that takes the position weight matrix of a DNA sequence motif and plots the corresponding sequence logo. Column heights in Fig. 4a are proportional to the information content. No significant differences were observed between sequence logos computed from the full MSA and at a 90% sequence-identity threshold (data not shown). Direct-Coupling Analysis (DCA) was performed using the asymmetric version of the pseudo-likelihood maximization method, implemented in the lbsDCA code (78), with standard regularization parameters. To remove sampling bias, sequences were reweighted by identity, down-weighting sequences with more than 90% sequence identity to homologs. DCA results were processed using utilities in the dcaTools package (78).
To ignore uninformative very-short-range predictions, all reported predictions and accuracies are for residue pairs separated by more than four residues along the chain. Structural contacts were defined by inter-atomic distances between heavy atoms below 8 Å. The sequence mining procedure yielded 26,067 TET homologs with at least 75% coverage. After reweighting by sequence identity, the number of effective sequences was 9157.67, giving an excellent B_Eff/N_Pos ratio of 25.9, where B_Eff denotes the number of effective sequences after weighting sequences by sequence identity (80) and N_Pos denotes the number of residue positions (i.e. columns) in the MSA. DCA predictions benchmarked on the 1Y0Y structure show excellent prediction accuracies over a large range of predictions (Fig. S8a). Notably, the top 2N_Pos = 706 highest-ranked DCA predictions yield a prediction accuracy of 88%. Ignoring the false positives arising from predictions in regions where the PDB structure is not defined, the accuracy rises above 90%. Inspection of the predicted contacts with respect to the 1Y0Y PDB structure (Fig. S8b and Fig. 3) highlights multiple sets of predicted contacts involving the loop region. These can be separated into a set of loop-loop interactions, a set of putative intra-molecular loop contacts and a third set of putative inter-molecular loop contacts (Fig. S8). Table S2 lists all 19 predicted contacts involving the TET loop.
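The prediction accuracies above are precisions of the top-ranked DCA pairs, counting only pairs separated by more than four residues. The sketch below is a hypothetical illustration, not the dcaTools implementation: it assumes the predictions come as a ranked list of (i, j) residue pairs and that a minimum heavy-atom distance map from the reference structure is available as a dictionary. Pairs involving residues absent from the structure can either be counted as false positives or skipped, which corresponds to the two accuracies quoted (88% and above 90%).

```python
import math


def top_k_precision(ranked_pairs, dist, k, cutoff=8.0, min_sep=4, skip_undefined=False):
    """Fraction of the top-k predicted pairs that are structural contacts.

    ranked_pairs: list of (i, j) residue indices, best-scoring first.
    dist: dict mapping (i, j) with i < j to a minimum heavy-atom distance in
          Angstrom; a missing key or NaN means the pair is not resolved.
    Only pairs with |i - j| > min_sep are considered.
    """
    hits = total = 0
    for i, j in ranked_pairs:
        if abs(i - j) <= min_sep:
            continue                              # drop very-short-range predictions
        d = dist.get((min(i, j), max(i, j)), math.nan)
        if math.isnan(d) and skip_undefined:
            continue                              # ignore pairs absent from the structure
        total += 1
        if not math.isnan(d) and d < cutoff:
            hits += 1
        if total == k:
            break
    return hits / total if total else float("nan")


if __name__ == "__main__":
    ranked = [(1, 3), (1, 10), (2, 20), (5, 30), (7, 40)]   # hypothetical ranking
    dist = {(1, 10): 5.0, (2, 20): 12.0, (7, 40): 6.0}       # (5, 30) not resolved
    print(top_k_precision(ranked, dist, k=4))                 # 0.5
    print(top_k_precision(ranked, dist, k=4, skip_undefined=True))
```

For the TET alignment, k would be 2 × 353 = 706.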