Structure, sequon recognition and mechanism of tryptophan C-mannosyltransferase

C-linked glycosylation is essential for the trafficking, folding and function of secretory and transmembrane proteins involved in cellular communication processes. The tryptophan C-mannosyltransferase (CMT) enzymes that install the modification attach a mannose to the first tryptophan of WxxW/C sequons in nascent polypeptide chains by an unknown mechanism. Here, we report cryogenic-electron microscopy structures of Caenorhabditis elegans CMT in four key states: apo, acceptor peptide-bound, donor-substrate analog-bound and as a trapped ternary complex with both peptide and a donor-substrate mimic bound. The structures indicate how the C-mannosylation sequon is recognized by this CMT and its paralogs, and how sequon binding triggers conformational activation of the donor substrate: a process relevant to all glycosyltransferase C superfamily enzymes. Our structural data further indicate that the CMTs adopt an unprecedented electrophilic aromatic substitution mechanism to enable the C-glycosylation of proteins. These results afford opportunities for understanding human disease and therapeutic targeting of specific CMT paralogs.


… malaria-causing Plasmodium spp. 16 . Beyond mammalian biology and diseases, C-mannosylation has been studied in the model nematode C. elegans, where it was shown to control Wnt signaling to establish left/right asymmetry in neuroblast polarization and migration during development 17 .
While tryptophan C-mannosylation was discovered nearly three decades ago 2 , the first tryptophan C-mannosyltransferase (CMT) was identified less than a decade ago: the C. elegans protein DPY19 (ref. 18). This enzyme was predicted to be an integral membrane protein belonging to the glycosyltransferase C superfamily (GT-C) 18 . All known CMTs are homologous to the C. elegans enzyme (CeDPY19), after which they are named. Unlike C. elegans, which has a single CMT enzyme, humans have four paralogs, termed HsDPY19L1, HsDPY19L2, HsDPY19L3 and HsDPY19L4 (ref. 18), with HsDPY19L1 sharing the greatest sequence similarity with the C. elegans homolog 18 . The human paralogs differ in their tissue expression profiles and sequon preferences. HsDPY19L1, HsDPY19L3 and HsDPY19L4 are expressed ubiquitously 19 , whereas HsDPY19L2 is found exclusively in the testis 20 , where it plays an essential role in spermatogenesis 21 . Loss-of-function mutations in HsDPY19L2 cause globozoospermia and thereby male infertility 21 .
The reaction catalyzed by CMT enzymes occurs on the luminal side of the endoplasmic reticulum and involves the transfer of a mannose unit from a dolichylphosphate mannose (Dol-P-Man) donor 22 to an acceptor protein containing a WxxW or WxxC consensus sequon (Fig. 1a) 5 . Like N-glycosylation and O-mannosylation, tryptophan C-mannosylation occurs before or during protein folding 23 .
The true extent and the physiological role of tryptophan C-mannosylation have long remained poorly understood. Biochemical …

CMT activity is not divalent metal ion dependent
We purified recombinantly expressed CeDPY19 and developed an in vitro assay that recapitulated the tryptophan C-mannosylation reaction using synthetic, fluorescently labeled peptides (WEHI-1881196, WEHI-1881197, WEHI-1881198 and WEHI-1886494, numbered 1-4) based on known acceptor substrate sequences 2,22 . As the donor substrate, we used a synthetic Dol-P-Man analog (Dol25-P-Man, 5), a compound that contains five isoprenoid units (25 carbon atoms) and had previously been used successfully as a mannose donor for the endoplasmic reticulum-resident GT-C mannosyltransferases ALG3, ALG9 and ALG12 of the N-glycosylation pathway 33 . We measured in vitro C-mannosylation using Tricine-SDS-PAGE (Fig. 1b and Extended Data Fig. 1a) and liquid chromatography-mass spectrometry (LC-MS) (Fig. 1c), which demonstrated the covalent attachment of a single hexose unit to the peptide. The activity of purified recombinant CeDPY19 was unchanged when reactions were supplemented with Mn²⁺ or Mg²⁺ salts, or with the metal ion chelator EDTA, suggesting that the enzyme does not require divalent metal ion cofactors (Fig. 1d). This is in stark contrast to oligosaccharyltransferase (OST) 34 and O-mannosyltransferase (PMT1/2) 29 , the related GT-C enzymes responsible for the N- and O-glycosylation of proteins: OST activity is metal dependent 34 , and available evidence suggests that PMT1/2 activity is also metal dependent 29 . CMT's metal independence and the unique nature of the chemical bond it forms suggest that its mechanism diverges from those of PMT1/2 and OST, even though it uses the same donor substrate (Dol-P-Man) as PMT1/2. Finally, we found that CeDPY19 exclusively processed synthetic Dol-P-Man analogs and did not accept a synthetic, glucose-containing Dol-P-Glc analog (Dol25-P-Glc, 6) 33 as a substrate, thus recapitulating the enzyme's physiological substrate specificity in vitro (Extended Data Fig. 1b).
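The LC-MS readout rests on simple mass arithmetic. Transfer of one mannose adds an anhydrohexose residue (C6H10O5) to the peptide, and because mannose and glucose are isobaric, the same shift would be observed for either sugar; the Man/Glc selectivity above therefore comes from the choice of donor, not from the mass. The sketch below computes the expected shift and the m/z values of the unmodified and mannosylated peptide at a given charge state. The peptide mass in the example is hypothetical, and the function names are illustrative rather than part of the analysis pipeline used in the paper.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.007276466812  # mass of a proton (Da)


def residue_mass(formula: dict[str, int]) -> float:
    """Monoisotopic mass of an elemental composition such as {'C': 6, ...}."""
    return sum(MONO[element] * count for element, count in formula.items())


# C-mannosylation adds mannose minus water, i.e. an anhydrohexose (C6H10O5).
HEXOSE_SHIFT = residue_mass({"C": 6, "H": 10, "O": 5})


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion for a neutral monoisotopic mass M."""
    return (neutral_mass + charge * PROTON) / charge


def expected_ions(peptide_mass: float, charge: int, n_hexoses: int = 1):
    """Return (unmodified m/z, modified m/z) for n added hexose units."""
    modified = peptide_mass + n_hexoses * HEXOSE_SHIFT
    return mz(peptide_mass, charge), mz(modified, charge)


if __name__ == "__main__":
    # Hypothetical peptide mass; not a value reported for peptides 1-4.
    unmod, mod = expected_ions(1000.0, charge=2)
    print(f"hexose shift = {HEXOSE_SHIFT:.4f} Da")
    print(f"[M+2H]2+: {unmod:.4f} -> {mod:.4f}")
```

At charge z, a single hexose moves the peak by about 162.05/z on the m/z axis, which is how singly and doubly glycosylated species would be told apart.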

Structure and topology of CeDPY19
To facilitate high-resolution cryo-EM studies, we used phage display and a synthetic Fab library 35 to isolate a Fab that binds a conformational epitope of CeDPY19 (Extended Data Fig. 2a,b). This approach has been shown to increase the size and mass of particles and to provide a fiducial mark for particle alignment 33 … (… Table 1 and Supplementary Fig. 1). CeDPY19 contains 13 transmembrane helices, two long endoplasmic reticulum-luminal loops (EL1 and EL5) and a globular, endoplasmic reticulum-luminal, C-terminal domain (Fig. 2b). The CMT2-Fab used for these cryo-EM analyses binds to the cytoplasmic loops IL4 and IL5 and is therefore positioned on the opposite side of the membrane from the CeDPY19 active site (Fig. 2a and Supplementary Fig. 3), where it cannot interfere with catalysis (Extended Data Figs. 1b and 3). Like other GT-Cs, CeDPY19 contains a structurally conserved module and a variable module 33 . The variable module is reminiscent of that of the STT3 subunit of OST 28,30,32 with respect to transmembrane helix arrangement and fold, suggesting that the C-linked and N-linked protein glycosylation machineries evolved from a common ancestor. The C-terminal, endoplasmic reticulum-luminal domain of CeDPY19 contains a core formed by an α₅β₅ sandwich (Fig. 2a,b: β1-β5, α1-α4 and α8). Because this domain forms a lid-like structure that covers the catalytic site, and given its shape, we refer to it as the 'luminal dome'. In structural alignments, we found the core of the luminal dome to be conserved in otherwise structurally diverse luminal domains of other GT-Cs, including those of OSTs 28,30,32 , bacterial arabinosyltransferases 38,39 and arabinofuranosyltransferases 40 (Fig. 2c).
CeDPY19 contains three disulfide bridges. One of them links Cys407 and Cys630, thereby tethering the luminal dome to EL5, the loop connecting transmembrane helices 9 and 10 (Fig. 2b and Extended Data Fig. 4). This contrasts with STT3 in OST, where reversible engagement and disengagement of EL5 relative to the luminal dome is essential for the binding and release of substrates and products 31 . The presence of this disulfide bond in CeDPY19, which is conserved in CMT enzymes, suggests that substrate binding and product release do not require association and dissociation of these domains.

Acceptor sequon recognition in unfolded proteins
To reveal how CMTs select and bind acceptor sequons, we determined a cryo-EM structure of CeDPY19 bound to a synthetic octapeptide (WEHI-1886493, 7), which contains a WxxW sequon: Trp(0)-Ala(+1)-Lys(+2)-Trp(+3). The numbers in parentheses refer to the position relative to Trp(0), the tryptophan that is modified by CeDPY19. A fluorescently labeled version of this peptide (WEHI-1886494) was a suitable substrate of CeDPY19 in vitro (Extended Data Fig. 1b). The resolution of the acceptor peptide-bound complex was 2.7 Å (Fig. 3, Extended Data Fig. 5 and Supplementary Table 1), and the peptide was well resolved, providing complete coverage of the WxxW sequon (Fig. 3a). To test the functional relevance of the enzyme-peptide interactions observed in our structure, we determined the activity of selected CeDPY19 mutants using a semiquantitative, yeast-based cellular assay 26 . Each CeDPY19 mutant was coexpressed with RNase2, a CMT substrate that can serve as a reporter of C-mannosylation activity, in Pichia pastoris, an organism that naturally lacks C-mannosylation activity. The RNase2 reporter protein was affinity-purified from culture supernatants, and the occupancy of its single C-mannosylation site was determined by MS after trypsin digestion of the samples (Fig. 4).
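The mutant results are reported as site occupancy and as a drop relative to wild type. The section does not give the formula, so the sketch below assumes a common convention: occupancy is the fraction of the reporter's tryptic-peptide signal that carries the mannose, and the drop is one minus the mutant-to-wild-type occupancy ratio. The intensities are invented for illustration, and `occupancy` and `percent_drop` are hypothetical helper names.

```python
def occupancy(intensity_modified: float, intensity_unmodified: float) -> float:
    """Fraction of the C-mannosylation site that carries a mannose.

    Assumes the mannosylated and unmodified tryptic peptides ionize
    similarly, so their summed intensities approximate the total amount.
    """
    total = intensity_modified + intensity_unmodified
    if total <= 0:
        raise ValueError("no signal for either peptide form")
    return intensity_modified / total


def percent_drop(occ_mutant: float, occ_wild_type: float) -> float:
    """Loss of C-mannosylation relative to wild type, in percent."""
    if occ_wild_type <= 0:
        raise ValueError("wild-type occupancy must be positive")
    return 100.0 * (1.0 - occ_mutant / occ_wild_type)


if __name__ == "__main__":
    # Invented intensities (arbitrary units), for illustration only.
    wt = occupancy(8.0e6, 2.0e6)       # 0.80
    mutant = occupancy(0.4e6, 9.6e6)   # 0.04
    print(f"WT {wt:.2f}, mutant {mutant:.2f}, drop {percent_drop(mutant, wt):.0f}%")
```

Expressing the effect relative to wild type rather than as raw occupancy keeps mutants comparable even if the absolute occupancy of the reporter varies between experiments.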
The acceptor peptide binds into a groove between the endoplasmic reticulum-luminal loops and the luminal dome of CeDPY19. While peptide-binding grooves are present at similar general locations in OST and PMT1/2 (refs. 28,29,32), the shape of the groove and its interactions with the acceptor peptide are distinct in CeDPY19, providing selectivity for the WxxW sequon. The shape of the groove forces the main chain of the acceptor peptide to bend sharply next to Trp(0), which is incompatible with any secondary protein structure. This observation rationalizes why CMTs exclusively process unfolded proteins 3,23 , since neither an α-helix nor a β-strand would fit into the binding pocket of the enzyme. Nevertheless, C-mannosylation is often found in thrombospondin type 1 repeats, which have an extended polypeptide backbone conformation that closely resembles β-strands 4 .
The backbone of the acceptor peptide forms several H-bonds to side chains of CeDPY19. The indole moiety of Trp(+3) is wedged into a deep cavity and held in place by cation-π interactions with the flanking residues Arg211 and Lys473 (Fig. 3a inset). Mutating either of those residues to alanine led to a near-complete (90-100%) drop in protein C-mannosylation (Fig. 4). Notably, a mutation of the equivalent arginine (Arg290) to histidine in the human HsDPY19L2 paralog represents the most frequently reported missense mutation in globozoospermic patients 41 . The side chain of Ala(+1) of the acceptor peptide points into a shallow cavity of the enzyme, where larger side chains would clash and therefore interfere with sequon binding. This rationalizes the observed preference for smaller side chains at position (+1) of the sequon 3 . In contrast, the side chain of Lys(+2) points into the solvent, which explains the high tolerance of CMTs to sequence variability reported for position (+2) of the WxxW sequon 3 . The indole group of the acceptor tryptophan, Trp(0), fits snugly into a groove formed mainly by the aromatic side chains Tyr395, Phe401 and Tyr578, forming a network of π-π stacking interactions. The backbone carbonyl of Pro576 forms an H-bond to N1 of the indole of Trp(0), providing a key contact to the substrate.
To rationalize the acceptor sequon variability at position (+3) among the four human CMT paralogs, we compared our peptide-bound CeDPY19 structure with the models of HsDPY19L1-L4 predicted by AlphaFold 42 (Fig. 3b). The predicted peptide-binding sites of HsDPY19L1 and HsDPY19L2 are similar to that of CeDPY19, which is in line with the finding that HsDPY19L1 recognizes WxxW sequons 5 . While the substrate specificity of HsDPY19L2 has not been experimentally demonstrated, its similarity to CeDPY19 suggests that HsDPY19L2 is an active, testis-specific CMT that preferentially glycosylates the WxxW sequon. In contrast, HsDPY19L3 recognizes the sequon WxxC, which contains a cysteine at the (+3) position 5 . In the CeDPY19 structure, a leucine residue (Leu474) lies near Trp(+3) of the bound peptide, forming the 'floor' of the Trp(+3) indole-binding pocket. The equivalent residue in HsDPY19L3 is a tyrosine (Tyr485), which would clash with the indole moiety of Trp(+3) (Fig. 3b). This likely explains why HsDPY19L3 cannot process the WxxW sequon 5 . Instead, it is plausible that the hydroxyl group of Tyr485 of HsDPY19L3 forms an H-bond with Cys(+3) of the WxxC sequon. Finally, HsDPY19L4 has the active site that is least similar to that of CeDPY19 (Fig. 3b). Like HsDPY19L3, the predicted model of HsDPY19L4 features a tyrosine residue (Tyr488 in HsDPY19L4) at the position of Leu474 in CeDPY19. Although this does not reveal the substrate specificity of HsDPY19L4, it suggests that HsDPY19L4 is unlikely to recognize and process the canonical WxxW sequon and is more likely to prefer amino acids with smaller side chains at the (+3) position. We conclude that CeDPY19 Leu474 and the equivalent residues in CMT paralogs are the key determinant of CMT acceptor sequon preference at the (+3) position.
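The sequon rules discussed above can be expressed as a short sequence scan. The sketch below looks for WxxW and WxxC motifs, reports the position of the acceptor Trp(0) and the residue at (+3), and keeps overlapping sites such as WxxWxxW, where the (+3) Trp of one sequon is Trp(0) of the next. Assigning WxxW to CeDPY19/HsDPY19L1 and WxxC to HsDPY19L3 follows the experimentally supported statements in the text; HsDPY19L2 and HsDPY19L4 are left out because their specificities are only inferred from models, and the softer preferences at (+1) and (+2) are deliberately ignored. The function and dictionary names are illustrative.

```python
import re

# Experimentally supported (+3) preferences discussed in the text.
PLUS3_PREFERENCE = {"W": "WxxW (CeDPY19, HsDPY19L1)", "C": "WxxC (HsDPY19L3)"}

# Zero-width lookahead so that overlapping sequons (e.g. WxxWxxW) are all found.
SEQUON = re.compile(r"(?=(W..[WC]))")


def find_sequons(sequence: str) -> list[tuple[int, str, str]]:
    """Return (1-based position of Trp(0), motif, sequon class) tuples."""
    hits = []
    for match in SEQUON.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start() + 1, motif, PLUS3_PREFERENCE[motif[3]]))
    return hits


if __name__ == "__main__":
    # Core of acceptor peptide WEHI-1886493 (Ac-Pra-GSWAKWS-NH2),
    # written without the N-terminal Pra and the terminal caps.
    for pos, motif, kind in find_sequons("GSWAKWS"):
        print(pos, motif, kind)
```

For the GSWAKWS core this reports a single WxxW site starting at the third residue, the Trp(0)-Ala(+1)-Lys(+2)-Trp(+3) sequon seen in the structure.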

Active site structure and mechanism of Dol-P-Man recognition
Several side chains in the vicinity of the acceptor indole are likely involved in catalysis. Glu71 is well positioned to abstract a proton from C2 of the Trp(0) indole group at some point during the glycosylation reaction, and no other nearby residue could act as a general base during catalysis. Mutating Glu71 to Ala, Gln, Leu or Met abolished CeDPY19 activity (Fig. 4), providing strong support for an essential role of Glu71 in catalysis, most likely as the catalytic base. By contrast, mutating Glu71 to Asp resulted in only a 20% drop in protein C-mannosylation. Notably, the human paralog HsDPY19L4 contains an aspartate (Asp95) at this position rather than a glutamate, suggesting that this protein is a competent enzyme and that nature has explored and adopted both residues as catalytic bases for C-mannosylation (Fig. 3b and Extended Data Fig. 4). As expected from our in vitro assay, and unlike in OST 32 or PMT1/2 (ref. 29), no obvious metal-coordinating residues were found in the CeDPY19 active site.
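Statements such as 'the equivalent arginine (Arg290)' or 'HsDPY19L4 contains an aspartate (Asp95) at this position' rest on mapping residue numbers through an alignment of the paralogs. A minimal sketch of that bookkeeping, assuming a pairwise alignment in which '-' marks a gap, is shown below. The two aligned fragments and their starting numbers are made up so the result can be checked by hand; they are not real DPY19 sequences and do not reproduce the alignment used in the paper.

```python
def residue_number_map(aligned_a: str, aligned_b: str,
                       start_a: int = 1,
                       start_b: int = 1) -> dict[int, tuple[str, int, str]]:
    """Map residue numbers of sequence A onto sequence B.

    Both strings are rows of the same pairwise alignment ('-' marks a gap).
    Returns {number_in_A: (residue_in_A, number_in_B, residue_in_B)};
    residues of A aligned to a gap in B are omitted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    num_a, num_b = start_a, start_b
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a != "-" and res_b != "-":
            mapping[num_a] = (res_a, num_b, res_b)
        # Only advance the residue counter of a sequence that has a residue here.
        if res_a != "-":
            num_a += 1
        if res_b != "-":
            num_b += 1
    return mapping


if __name__ == "__main__":
    # Made-up fragments: A starts at residue 68, B at residue 93.
    m = residue_number_map("GL-EAR", "GLSDAK", start_a=68, start_b=93)
    res_a, num_b, res_b = m[70]
    print(f"{res_a}70 -> {res_b}{num_b}")
```

Because the gap in fragment A shifts the numbering, Glu70 of A pairs with Asp96 of B rather than with the residue at the same offset, which is why equivalences are read from an alignment and not from raw residue numbers.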
To understand how CMTs recognize their donor substrate, we determined a 3.0 Å resolution structure of CeDPY19 bound to the synthetic, water-soluble donor-substrate analog Dol25-P-Man (ref. 33) (… Table 1). The substrate is recruited to the active site via a tunnel formed by EL5. At the active site, the mannose moiety is partially solvent exposed but also forms several hydrogen bonds with the enzyme. Selectivity for Dol-P-Man over Dol-P-Glc appears to be ensured by the presumed catalytic base Glu71, the side chain of which would clash with the equatorial C2 hydroxyl of the glucose moiety if Dol-P-Glc were bound instead of Dol-P-Man (Extended Data Fig. 6a). The phosphate moiety, which is the leaving group of Dol-P-Man, is coordinated by a salt bridge to Arg471 and by an H-bond to the indole NH of Trp262 (Extended Data Fig. 6a). The mutations R471A and R471Q abolished CeDPY19 activity (Fig. 4), demonstrating an important role for Arg471 in recruiting the donor substrate and/or in catalysis. The dolichyl moiety of bound Dol25-P-Man fits into a hydrophobic groove formed by TM6 and TM11 that is lined with the side chains of the conserved hydrophobic residues Trp262, Phe264, Leu472 and Phe401 from EL5 (Extended Data Fig. 6b). The mutations F264A and F401A led to a >50% reduction in protein C-mannosylation (Fig. 4), suggesting that recruitment of Dol-P-Man involves the specific recognition, binding and partial extraction of the dolichyl moiety from the membrane.

Trapped ternary complex reveals donor recruitment and activation
A comparison of the independently determined structures of peptide-bound and Dol25-P-Man-bound CeDPY19 revealed that the acceptor tryptophan of the bound peptide would clash with the mannose moiety of Dol25-P-Man (Fig. 5b). This suggests that conformational changes are required in the active site for the two substrates to bind simultaneously and for catalysis to proceed. To visualize these changes, we trapped the enzyme in a ternary complex using a nonhydrolyzable and thus nonreactive donor substrate analog containing a phosphonate group, termed Dol25-P-C-Man 8 (Fig. 5c). This prevents turnover, and the resulting ternary complex corresponds to a pseudo-Michaelis complex.
The structure, determined at 3.6 Å, revealed an active site conformation that was more similar to acceptor peptide-bound than to donor-substrate-bound CeDPY19 (Fig. 5b). The orientation of the bound peptide and its interactions with the enzyme were very similar to the peptide-bound CeDPY19 structure (Fig. 5b, Extended Data Fig. 5 and Supplementary Table 1). In contrast, key conformational changes were observed for bound Dol25-P-C-Man compared to Dol25-P-Man-bound CeDPY19. First, the phosphonate moiety is shifted by roughly 4 Å toward the two arginines Arg471 and Arg211 and the catalytic Glu71 (Extended Data Fig. 6). Second, the mannose moiety adopts a 'bent-back' conformation (Fig. 5d) that allows for the simultaneous binding of the donor substrate and the acceptor peptide. As a result, the anomeric C1 carbon of the mannose moiety is at a distance of roughly 3.5 Å from the C2 carbon of the acceptor tryptophan and optimally positioned for an electrophilic attack ( Fig. 6a and Extended Data Fig. 6a).
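The geometric statements in this paragraph (a distance of roughly 3.5 Å between mannose C1 and indole C2, and a phosphonate shift of roughly 4 Å) are Euclidean distances between atomic coordinates; the shift compares equivalent atoms in two models that have already been superposed. The sketch below performs that arithmetic on hypothetical coordinates. Reading real models would normally go through a structure library, and the superposition step is assumed to have been done beforehand.

```python
import math

Coord = tuple[float, float, float]


def distance(a: Coord, b: Coord) -> float:
    """Euclidean distance between two points, in the units of the input (Å)."""
    return math.dist(a, b)


def centroid(points: list[Coord]) -> Coord:
    """Geometric center of a group of atoms, e.g. a phosphonate group."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


if __name__ == "__main__":
    # Hypothetical coordinates (Å), not taken from the deposited models.
    man_c1 = (10.0, 4.0, 2.0)
    trp_c2 = (13.0, 5.5, 3.0)
    print(f"C1 to C2: {distance(man_c1, trp_c2):.2f} Å")

    # Hypothetical phosphorus-group atoms in two already superposed models.
    donor_only = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
    ternary = [(x + 4.0, y, z) for x, y, z in donor_only]
    shift = distance(centroid(donor_only), centroid(ternary))
    print(f"group shift: {shift:.2f} Å")
```

Using the centroid of a group rather than a single atom makes the reported shift less sensitive to the exact placement of individual oxygens at moderate resolution.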
The bent-back conformation of Dol25-P-C-Man appears to be stabilized by the active-site loop LDLβ2-α3. Because this loop interacts strongly with the bound peptide, activation of the donor substrate appears to be allosterically induced by acceptor peptide binding (Fig. 5c). Similar bent-back conformations have been observed for nucleoside-diphosphate-linked hexose donors bound to enzymes of glycosyltransferase superfamily B (GT-B), including PimA 43 , MshA 44 , HepE 45 and PglH 46 . Our findings reveal how the donor substrate adopts an active conformation only upon acceptor peptide binding, providing a mechanism to prevent the futile hydrolysis of Dol-P-Man in the absence of an acceptor peptide. No other GT-C structure has been reported in both the donor-only and the ternary complex state. The only other example of a donor-only bound GT-C, ALG6 bound to Dol-P-Glc, was also observed in a catalytically inactive state 33 . Given the structural similarity of donor-substrate binding sites in GT-Cs 33 and a proposed common mode of donor recruitment 47 , we postulate that inactive donor resting states and acceptor-mediated donor activation are conserved in GT-C enzymes.

Catalytic mechanism
Our structural and functional findings provide sufficient molecular detail to propose a mechanism for CMT-catalyzed tryptophan C-mannosylation (Fig. 6b). The reaction catalyzed by tryptophan CMT can be considered an electrophilic aromatic substitution at C2 of the indole, with inversion of configuration at the anomeric (C1) carbon of the mannose moiety 22 . The enzyme ensures regioselectivity by placing C1 of the mannose donor in close proximity to C2 of the indole acceptor. As the basicity of the carboxylate side chain of the catalytic Glu71 is likely insufficient for direct deprotonation of the indole at C2, we propose that this occurs after the mannosyl transfer step. The reaction can be thought to occur in three steps. In step 1, the enzyme activates the mannosyl donor by stabilizing the negative charge of the departing dolichylphosphate leaving group via interactions with Arg211 and Arg471, and potentially through protonation by Glu71. This facilitates an attack of the indole on the anomeric carbon of mannose, forming a C-glycosidic bond with inversion of anomeric configuration. The resulting cationic intermediate is resonance-stabilized. In step 2, the C2 proton of the cationic intermediate can now be plausibly abstracted by Glu71 to rearomatize the indole and generate the reaction product, with Glu71 completing its role as general acid-base in the reaction. While it is not clear to what extent steps 1 and 2 are concerted, mechanistic studies of other glycosyltransferases suggest that late transition states with substantial ionic character are not uncommon during glycosyl transfer 48 . The presumed resonance-stabilized cationic intermediate may be further stabilized by H-bonding interactions between the indole N1 and the backbone carbonyl of Pro576 and via a π-stacking network with Tyr395, Phe401 and Tyr578. 
In step 3, the observed geometric arrangement of the substrates and the steric restrictions of the active site suggest that the immediate reaction product has the mannose in a ⁴C₁ conformation. Given that the energetically preferred conformation of C-linked mannose is ¹C₄ rather than ⁴C₁ (ref. 49), product release from the enzyme might coincide with a conformational flip of the mannose to ¹C₄, which would minimize the potential for product inhibition.
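The three proposed steps can be written as a charge-balanced scheme. This is one possible bookkeeping under stated assumptions, not a scheme given in the paper: Dol-P-Man is taken as the monoanionic, β-configured phosphodiester; Glu71 is assumed to start protonated and to protonate the departing phosphate in step 1 (which the text describes only as a possibility); and the resonance-stabilized cationic intermediate is abbreviated as [α-Man-Trp]⁺.

```latex
\begin{align*}
\text{1:}\quad & \text{Dol-P-}\beta\text{-Man}^{-} + \text{Trp(0)} + \text{Glu71-COOH}
  \longrightarrow [\alpha\text{-Man-Trp}]^{+} + \text{Dol-OPO}_3\text{H}^{-} + \text{Glu71-COO}^{-} \\
\text{2:}\quad & [\alpha\text{-Man-Trp}]^{+} + \text{Glu71-COO}^{-}
  \longrightarrow \alpha\text{-}C\text{-Man-Trp}\ ({}^{4}C_{1}) + \text{Glu71-COOH} \\
\text{3:}\quad & \alpha\text{-}C\text{-Man-Trp}\ ({}^{4}C_{1})
  \longrightarrow \alpha\text{-}C\text{-Man-Trp}\ ({}^{1}C_{4}) \quad \text{(on release)} \\
\text{net:}\quad & \text{Dol-P-}\beta\text{-Man}^{-} + \text{Trp(0)}
  \longrightarrow \alpha\text{-}C\text{-Man-Trp} + \text{Dol-OPO}_3\text{H}^{-}
\end{align*}
```

Written this way, each step balances charge (−1 → −1 in step 1, 0 → 0 in step 2), Glu71 is regenerated, and inversion at C1 turns the β-linked donor into the α-C-mannosyl product.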
Mechanistically, CMT and other GT-C members such as OST and PMT1/2 share similar spatial arrangements of donor and acceptor substrates, as well as analogous carboxylic acid residues acting as a general base (Glu71 in CeDPY19). They all promote glycosyl transfer by stabilizing (pyro)phosphate leaving groups through interactions with cationic residues and, in OST and PMT1/2, with metal ions. Unlike OST and PMT1/2, CMT appears to have evolved metal independence by repurposing its catalytic carboxylic acid residue as a general acid-base that can promote phosphate departure through protonation and then deprotonate the glycosylated intermediate to generate the product. However, the most significant difference between CMT and other GT-C members such as OST and PMT1/2 is that the latter catalyze nucleophilic displacements, whereas the former catalyzes an electrophilic aromatic substitution. Thus, while OST has evolved unique mechanisms to enhance amide nucleophilicity 32 , CMT has evolved the means to direct electrophilic attack at the C2 position of the indole, stabilize the cationic intermediate and regenerate indole aromaticity via deprotonation. This mechanism of enzyme-mediated carbon-carbon bond formation on ribosomally synthesized polypeptides is unique; the closest related example is the prenylation of tryptophan-derived secondary metabolites via electrophilic aromatic substitution by fungal dimethylallyltryptophan synthases 50 . However, the fold, chemistry, substrate preferences and mechanisms of the dimethylallyltryptophan synthases are entirely different from those of the CMTs.

Discussion
The CeDPY19 structures presented here define the architecture of tryptophan CMTs, provide the structural basis of acceptor sequon binding in nascent polypeptide chains and of mannose donor-substrate recognition, and reveal how the donor substrate is activated by the enzyme. These data enabled us to propose a reaction mechanism for C-glycosidic bond formation, thereby addressing the most substantial gap in our understanding of protein glycosylation chemistries. The structures also provide a framework for understanding the recruitment and activation of lipid-linked carbohydrate donors by GT-C superfamily enzymes. Our findings rationalize the protein substrate preferences of the human CMT paralogs HsDPY19L1/L3, provide clues as to the preferences of HsDPY19L2/L4 and help understand mutations that cause CMT-deficiency-associated male infertility. Finally, our substrate-bound structures provide a strong foundation for the development of CMT inhibitors. Such molecules would be invaluable for exploring the biology of tryptophan C-mannosylation through paralog-specific inhibition of CMT enzymes to better understand the physiological role of tryptophan C-mannosylation in cell-cell communication and tissue development. They may also have applications as anti-parasite drugs (for example, against toxoplasmosis, malaria or helminth infections) or as male contraceptives through the specific inhibition of human HsDPY19L2.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41589-022-01219-9.

Overexpression and purification of CeDPY19
A synthetic gene construct encoding C. elegans DPY19 (CeDPY19) with a C-terminal 3×FLAG tag 26 was cloned into a pOET1 vector (Oxford Expression Technologies) and expressed in Spodoptera frugiperda (Sf9) cells infected with baculovirus generated using flashBAC GOLD (Oxford Expression Technologies). Cells were cultured in serum-free SF4 medium at 27 °C, infected at a density of 1 × 10⁶ cells per ml and harvested after 3 days. For purification, the cells were resuspended in 50 mM HEPES pH 7.4, 150 mM NaCl with 0.1 mg ml−1 DNase I, 1:100 protease inhibitor cocktail (Sigma) and 0.1 mg ml−1 PMSF, and were lysed by Dounce homogenization before solubilization by addition of 0.5% lauryl maltose neopentyl glycol (LMNG, Anatrace), 0.05% cholesteryl hemisuccinate (CHS, Anatrace) and 10% glycerol. After 1 h of solubilization, cell debris was pelleted by centrifugation at 100,000g in a Type 45 Ti rotor (Beckman). The supernatant was added to ANTI-FLAG M2 Affinity Gel (Sigma-Aldrich) and incubated for 1 h. The affinity gel was then washed with 2 × 20 column volumes of washing buffer (40 mM HEPES pH 7.4, 150 mM NaCl, 0.01% LMNG, 0.001% CHS), and the protein was eluted by incubation for 1 h with washing buffer supplemented with 0.3 mg ml−1 FLAG peptide. The protein was further purified by size-exclusion chromatography (SEC), which also exchanged it into 20 mM HEPES pH 7.4, 150 mM NaCl, 0.01% LMNG, 0.001% CHS.

In vitro glycosyl transfer assays for CeDPY19
Reactions of CeDPY19 were carried out in reaction buffer (40 mM HEPES pH 7.4, 150 mM NaCl, 0.01% LMNG and 0.001% CHS). Purified CeDPY19 (50 nM) was mixed with 50 mM Dol25-P-Man, 10 mM peptide and, optionally, 5 mM MnCl₂, MgCl₂ or EDTA. For reactions containing EDTA, CeDPY19 was preincubated with the chelator for 1 h on ice before the donor and acceptor substrates were added. The reactions were incubated for a total of 42 h at 20 °C, then stopped by 50-fold dilution in Laemmli buffer and analyzed by Tricine-SDS-PAGE 51 .

Enzymatic biotinylation of CeDPY19
CeDPY19 was fused to a C-terminal Avitag and purified as described above. CeDPY19 was then biotinylated in biotinylation buffer (…). … 13 virions were added to the streptavidin beads and incubated for 30 min. The resuspended beads containing bound phages were washed extensively and then used to infect log-phase E. coli XL1-Blue cells. Phages were amplified overnight in 2xYT medium with 50 µg ml−1 ampicillin and 10⁹ pfu ml−1 of M13-KO7 helper phage. To obtain binders of high affinity and specificity, three additional rounds of selection were performed with decreasing target concentrations (second round 125 nM, third round 62.5 nM and fourth round 12.5 and 6.5 nM), using the amplified phage pool of the preceding round as the input. Selection rounds two to four were performed on a KingFisher Purification System (Thermo Scientific) using a solution capture method in which the target was premixed with the amplified phage pool before streptavidin beads were added to the mixture. From the second round onward, bound phages were eluted using 100 mM glycine, pH 2.7. Because this harsh elution often also releases nonspecific and streptavidin binders, the precipitated phage pool was negatively selected against 100 µl of streptavidin beads from the second round onward before being added to the target. The precleared phage pool was then used as the input for the selection.

Single-point enzyme-linked immunosorbent assay (ELISA)
ELISA experiments were performed at 4 °C in 96-well plates coated with 50 µl of 2 µg ml−1 neutravidin in Na₂CO₃ buffer, pH 9.6, and subsequently blocked with 0.5% BSA in PBS. A single-point phage ELISA was used to rapidly screen the binding of the obtained Fab fragments in phage format.

Thermostability assays
Thermostability analysis experiments were performed as described previously 53 , with purified CeDPY19 in LMNG:CHS-supplemented buffer, preincubated with or without a 1.5-fold molar excess of Fab for 2 h on ice. The samples were then incubated for 10 min at different temperatures in a PCR machine and analyzed by SEC, measuring A₂₈₀ instead of fluorescence to assess the area under the curve of the SEC peaks of the respective samples. While a full curve was measured to determine the melting temperature (Tm) of CeDPY19 (35.68 °C), only two data points were measured for CeDPY19-Fab complexes: one at 4 °C and one at 36 °C. The percentage of peak-height retention was then compared between the samples with and without Fab, which allowed a qualitative assessment of thermostabilizing effects.
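The section does not state how Tm was extracted from the full curve, so the sketch below uses a simple, openly swapped-in approach: normalize SEC peak heights (A₂₈₀) to the unheated sample and linearly interpolate the temperature at which half of the peak remains. It also computes the peak-height retention used for the two-point comparison with and without Fab. All numbers in the example are invented and are not the measured CeDPY19 data; for real curves a sigmoidal fit would usually be preferred.

```python
def fraction_remaining(peak_heights: list[float], reference: float) -> list[float]:
    """Normalize SEC peak heights (A280) to the unheated reference sample."""
    return [h / reference for h in peak_heights]


def tm_by_interpolation(temps: list[float], fractions: list[float]) -> float:
    """Temperature at which the fraction remaining first crosses 0.5.

    Linear interpolation between the two bracketing points; temperatures
    must be given in increasing order.
    """
    points = list(zip(temps, fractions))
    for (t1, f1), (t2, f2) in zip(points, points[1:]):
        if f1 >= 0.5 >= f2 and f1 != f2:
            return t1 + (f1 - 0.5) * (t2 - t1) / (f1 - f2)
    raise ValueError("curve does not cross 50% within the measured range")


def retention_percent(height_heated: float, height_4c: float) -> float:
    """Peak-height retention after heating, relative to the 4 °C sample."""
    return 100.0 * height_heated / height_4c


if __name__ == "__main__":
    # Invented A280 peak heights (mAU), for illustration only.
    temps = [4.0, 25.0, 30.0, 35.0, 40.0, 45.0]
    heights = [50.0, 49.0, 42.0, 28.0, 10.0, 3.0]
    fr = fraction_remaining(heights, reference=heights[0])
    print(f"Tm ~ {tm_by_interpolation(temps, fr):.1f} °C")
    print(f"retention at 36 °C: {retention_percent(21.0, 50.0):.0f}% (invented)")
```

With only 4 °C and 36 °C measured for the Fab complexes, no Tm can be interpolated for them; comparing retention at 36 °C, close to the apo Tm, is what makes the two-point comparison informative.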

EM sample preparation
For the apo structure, purified CeDPY19 was mixed with excess CMT2-Fab and excess anti-Fab nanobody 37 . After incubation overnight at 4 °C, excess Fab and nanobody were removed by SEC. Peak fractions were pooled, and the CeDPY19-CMT2-Fab-anti-Fab-Nb complex was concentrated to 6.5 mg ml−1 and used for cryo-EM grid preparation. For the acceptor peptide-bound structure and for the Dol25-P-Man-bound structure, the CeDPY19-CMT2-Fab-anti-Fab-Nb complex was concentrated to 5 mg ml−1 (35 µM). Subsequently, either 1 mM acceptor peptide WEHI-1886493 (Ac-Pra-GSWAKWS-NH2) or 500 µM synthetic Dol25-P-Man (ref. 33) (final concentrations) was added, and the samples were incubated for 1-2 h on ice and then used for grid preparation.
For the structure of the ternary complex, the CeDPY19-CMT2-Fab-anti-Fab-Nb complex was concentrated to 4.9 mg ml−1 (34 µM) and mixed with 500 µM Dol25-P-C-Man and 1 mM Ac-Pra-GSWAKWS-NH2 (final concentrations). The samples were incubated for 1 h on ice and subsequently used for grid preparation.
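The paired mass and molar concentrations given above imply a molar mass for the CeDPY19-CMT2-Fab-anti-Fab-Nb complex and define how large the substrate excesses are. The calculation below only reuses numbers from this section (5 mg ml−1 corresponding to 35 µM, 4.9 mg ml−1 corresponding to 34 µM, 1 mM peptide and 500 µM donor analog). The implied mass of roughly 143-144 kDa is a back-calculation, not a value stated in the paper, and it ignores any detergent associated with the complex.

```python
def implied_molar_mass(mass_conc_mg_per_ml: float, molar_conc_um: float) -> float:
    """Molar mass (g/mol) implied by a mass and a molar concentration.

    mg/ml equals g/l, and 1 µM is 1e-6 mol/l, so M = (g/l) / (mol/l).
    """
    return mass_conc_mg_per_ml / (molar_conc_um * 1e-6)


def molar_excess(ligand_um: float, protein_um: float) -> float:
    """Number of ligand molecules present per protein complex."""
    return ligand_um / protein_um


if __name__ == "__main__":
    # Values from the EM sample preparation described above.
    for mg_ml, um in [(5.0, 35.0), (4.9, 34.0)]:
        kda = implied_molar_mass(mg_ml, um) / 1000
        print(f"{mg_ml} mg/ml at {um} µM -> ~{kda:.0f} kDa")
    print(f"peptide excess: {molar_excess(1000.0, 35.0):.0f}-fold")
    print(f"Dol25-P-Man excess: {molar_excess(500.0, 35.0):.0f}-fold")
```

Both pairs of values give nearly the same implied mass, which is a quick consistency check on the reported concentrations; the substrates were present at roughly 14- to 29-fold molar excess over the complex.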

EM grid preparation
Quantifoil holey carbon grids (Cu, R 1.2/1.3, 300 mesh) were glow discharged for 45 s at 25 mA using a PELCO easiGLOW glow discharger. Sample (2.5 µl) was applied to the grids and blotted for 1-3.5 s before plunge freezing in a liquid ethane-propane mixture with a Vitrobot Mark IV (Thermo Fisher Scientific) operated at 4 °C and 100% humidity.

EM data collection
Data were recorded on a second-generation Titan Krios electron microscope (Thermo Fisher Scientific) operated at 300 kV and equipped with a Gatan BioQuantum 1967 energy filter (slit width 20 eV) and a Gatan K3 camera. Videos were collected semiautomatically using EPU 2 software (Thermo Fisher Scientific) at a nominal magnification of ×130,000 and a pixel size of 0.33 Å per pixel in super-resolution mode. The defocus range was −0.6 to −2.8 µm. Each video contained 40 frames per stack, with a dose of 1.21 e−/Å² per frame.
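These acquisition parameters fix a few derived quantities used during processing. The sketch assumes that the super-resolution frames were binned twofold during motion correction, which is consistent with the 0.66 Å per pixel used for particle re-extraction but is not stated explicitly. It computes the physical and threefold-binned pixel sizes, their Nyquist limits (twice the pixel size) and the total exposure (per-frame dose times number of frames).

```python
SUPER_RES_PIXEL_A = 0.33   # Å per pixel in super-resolution mode
FRAMES = 40                # frames per stack
DOSE_PER_FRAME = 1.21      # e-/Å^2 per frame


def nyquist(pixel_size_a: float) -> float:
    """Best resolution (Å) representable at a given pixel size."""
    return 2.0 * pixel_size_a


physical_pixel = SUPER_RES_PIXEL_A * 2   # assumed 2x binning after motion correction
binned_pixel = physical_pixel * 3        # threefold binning used for initial sorting
total_dose = FRAMES * DOSE_PER_FRAME     # accumulated exposure per movie

if __name__ == "__main__":
    print(f"physical pixel {physical_pixel:.2f} Å, Nyquist {nyquist(physical_pixel):.2f} Å")
    print(f"3x binned pixel {binned_pixel:.2f} Å, Nyquist {nyquist(binned_pixel):.2f} Å")
    print(f"total dose {total_dose:.1f} e-/Å^2")
```

At 1.98 Å per pixel the Nyquist limit is about 4 Å, which is why particles sorted at that sampling had to be re-extracted at 0.66 Å per pixel before refinement to 2.7-3.6 Å.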

EM data processing, model building and refinement
For the apo structure of CeDPY19, 11,725 movies were collected, corrected for beam-induced motion using MotionCor2 (ref. 54) and subjected to further processing in RELION v.3.1 (https://relion.readthedocs.io/en/latest/Installation.html). The contrast transfer function (CTF) was estimated using Gctf 55 . Using LOG-based particle picking, 7,357,737 particles were auto-picked, extracted with threefold binning (1.98 Å per pixel) and sorted by two-dimensional (2D) and three-dimensional (3D) classification. A total of 473,614 particles were re-extracted at 0.66 Å per pixel and subjected to another round of 3D classification. From these, 384,830 particles were selected for further refinement with the Fab and the detergent micelle masked out. Subsequent particle polishing and per-particle CTF refinement yielded a resolution of 2.75 Å, again with the Fab-Nb complex and the detergent micelle masked out.
For the acceptor peptide-bound structure of CeDPY19, 13,041 movies were collected, corrected for beam-induced motion using MotionCor2 (ref. 54) and subjected to further processing in RELION v.3.1. The CTF was estimated using Gctf 55 . Next, 9,885 micrographs with an estimated resolution better than 3.5 Å were selected for further processing. Using LOG-based particle picking, 3,876,382 particles were auto-picked, extracted with threefold binning (1.98 Å per pixel) and sorted by 2D and 3D classification. A total of 324,852 particles were re-extracted at 0.66 Å per pixel and subjected to further refinement with the Fab and the detergent micelle masked out. Subsequent particle polishing and per-particle CTF refinement yielded a resolution of 2.83 Å. To improve the resolution of the substrate, the particles were subjected to an additional round of 3D classification, and 287,795 particles were selected. Subsequent particle polishing and per-particle CTF refinement yielded a final resolution of 2.72 Å, with the Fab-Nb complex and the detergent micelle masked out.
For the Dol25-P-Man-bound structure of CeDPY19, 7,874 movies were collected, corrected for beam-induced motion using MotionCor2 (ref. 54) and subjected to further processing in RELION v.3.1. The CTF was estimated using Gctf 55 . Then 7,115 micrographs with an estimated resolution better than 3.5 Å were selected for further processing. Using LoG-based particle picking, 2,543,900 particles were auto-picked, extracted with threefold binning (1.98 Å per pixel) and sorted by 2D and 3D classification. A total of 301,020 particles were re-extracted at 0.66 Å per pixel and subjected to further refinement with the Fab and the detergent micelle masked out. Subsequent particle polishing and per-particle CTF refinement yielded a resolution of 2.97 Å. To improve the resolution of the substrate density, the particles were subjected to an additional round of nonuniform refinement in cryoSPARC v.3.2 (https://cryosparc.com/) with optimized per-particle defocus and optimized per-group CTF parameters, using a soft mask around the entire particle, yielding a final resolution of 2.99 Å.
For the structure of the ternary complex of CeDPY19, 21,730 movies were collected, corrected for beam-induced motion using MotionCor2 (ref. 54) and subjected to further processing in RELION v.3.1. The CTF was estimated using Gctf 55 . Next, 12,385 micrographs with an estimated resolution better than 3.5 Å were selected for further processing. Using LoG-based particle picking, 4,309,649 particles were auto-picked, extracted with threefold binning (1.98 Å per pixel) and sorted by 2D and 3D classification. A total of 276,165 particles were re-extracted at 0.66 Å per pixel and subjected to further refinement with the Fab and the detergent micelle masked out. Subsequent particle polishing and per-particle CTF refinement, masking out the Fab-Nb complex and the detergent micelle, yielded a resolution of 3.2 Å. To improve the resolution of the substrates, the particles were subjected to an additional round of nonuniform refinement in cryoSPARC v.3.2 (https://cryosparc.com/) with optimized per-particle defocus and optimized per-group CTF parameters, yielding a resolution of 3.31 Å. To identify particles with a high occupancy of Dol25-P-C-Man, the particles were subjected to 3D variability analysis in cryoSPARC v.3.2 with the Fab and the detergent micelle masked out. From these, 57,289 particles were selected and subjected to nonuniform refinement in cryoSPARC v.3.2 with optimized per-particle defocus and optimized per-group CTF parameters, using a soft mask around the entire particle, yielding a final overall resolution of 3.63 Å. Local resolution estimates for all structures were calculated in RELION v.3.1.
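Across the four datasets, the fraction of auto-picked particles that survives into the final reconstruction varies considerably. The short sketch below computes it from the counts reported above; for the Dol25-P-Man-bound dataset it assumes that all 301,020 re-extracted particles went into the final nonuniform refinement, since no further selection is described.

```python
# (auto-picked, final) particle counts as reported in the text.
PARTICLES = {
    "apo": (7_357_737, 384_830),
    "peptide-bound": (3_876_382, 287_795),
    "Dol25-P-Man-bound": (2_543_900, 301_020),  # assumes no further selection
    "ternary complex": (4_309_649, 57_289),
}


def retention_percent(picked: int, final: int) -> float:
    """Percentage of auto-picked particles kept in the final particle set."""
    return 100.0 * final / picked


for name, (picked, final) in PARTICLES.items():
    print(f"{name:>18}: {retention_percent(picked, final):5.2f}% of {picked:,} picked")
```

The ternary complex retains by far the smallest fraction (about 1.3%), consistent with the additional 3D variability step used to isolate particles with high Dol25-P-C-Man occupancy.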
Atomic coordinates were built manually in Coot (https://www.ucl.ac.uk/~rmhasek/coot.html), refined in PHENIX (http://www.phenix-online.org/) and validated using MolProbity (http://molprobity.biochem.duke.edu/). The model of apo CeDPY19 was built de novo, and the ligand-bound CeDPY19 models were built based on the coordinates of the apo structure. The structure of the CMT2-Fab-anti-Fab-Nb complex was built based on a published model of a Fab-anti-Fab-Nb complex 37 . Ligands were generated using eLBOW (https://phenix-online.org/documentation/reference/elbow.html).

Solid phase peptide synthesis
Solid phase peptide synthesis was performed on a CEM Liberty Blue Automated Microwave Peptide Synthesizer. Rink amide resin (0.68 mmol g−1 loading) was swollen in N,N-dimethylformamide (DMF) for 1 h, then washed with DMF (10 ml) and CH2Cl2 (2 × 10 ml) before coupling of the first Fmoc-protected amino acid. Unless otherwise stated, Fmoc-protected amino acids were coupled under the following conditions: 4 eq. Fmoc-Xxx-OH, 1.25 eq. DIPEA, 5 eq. DIC, 5 eq. Oxyma, microwave 90 °C, 4 min. Fmoc deprotection was accomplished by treating the resin with 20% (v/v) pyrrolidine in DMF at 75 °C for 5 min, then washing three times with DMF. N-terminal acetyl capping was achieved by treating the resin with Ac2O/DIPEA/DMF (5/5/90, v/v/v, 5 ml) at 22 °C for 30 min. The completed peptide was cleaved from the resin with TFA/iPr3SiH/H2O (95/2.5/2.5, v/v/v, 5 ml) at 40 °C for 40 min. The cleavage solution was concentrated to 2 ml and the product precipitated by the addition of ice-cold Et2O (4 ml). The crude peptide precipitate was collected and purified by preparative reversed-phase high-performance liquid chromatography (HPLC) using a system comprising a Waters ZQ 3100 mass detector, 2545 pump, SFO
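Reagent amounts follow directly from the resin loading and the stated equivalents. A minimal sketch, assuming a hypothetical 0.1 mmol synthesis scale (the scale is not given in the text) and splitting the 5 ml cleavage cocktail by its v/v/v ratio:

```python
RESIN_LOADING_MMOL_PER_G = 0.68  # Rink amide resin loading from the text

# Equivalents per coupling relative to resin loading (from the text).
COUPLING_EQUIV = {"Fmoc-Xxx-OH": 4.0, "DIPEA": 1.25, "DIC": 5.0, "Oxyma": 5.0}


def resin_mass_g(scale_mmol: float) -> float:
    """Mass of resin that carries scale_mmol of reactive sites."""
    return scale_mmol / RESIN_LOADING_MMOL_PER_G


def coupling_mmol(scale_mmol: float) -> dict:
    """Amount (mmol) of each coupling reagent for one coupling cycle."""
    return {reagent: eq * scale_mmol for reagent, eq in COUPLING_EQUIV.items()}


def split_cocktail(total_ml: float, parts: dict) -> dict:
    """Component volumes (ml) of a mixture given as a v/v/... ratio."""
    total_parts = sum(parts.values())
    return {name: total_ml * p / total_parts for name, p in parts.items()}


SCALE_MMOL = 0.1  # hypothetical synthesis scale, for illustration only
print(f"resin: {resin_mass_g(SCALE_MMOL):.3f} g")
for reagent, mmol in coupling_mmol(SCALE_MMOL).items():
    print(f"{reagent}: {mmol:.3f} mmol")
print(split_cocktail(5.0, {"TFA": 95, "iPr3SiH": 2.5, "H2O": 2.5}))
```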

LC-MS analysis of glycopeptides
Samples (2 µl) were analyzed on a calibrated Q-Exactive mass spectrometer (Thermo Fisher Scientific) coupled to a nano-ACQUITY UPLC system (Waters). Peptides were resuspended in 2.5% acetonitrile with 0.1% formic acid, loaded onto an Acclaim PepMap 100 trap column (75 µm × 20 mm, 100 Å, 3 µm particle size) and separated on a nano-ACQUITY UPLC BEH130 C18 column (75 µm × 250 mm, 130 Å, 1.7 µm particle size) at a constant flow rate of 300 nl min−1 and a column temperature of 50 °C, using a linear gradient of 2-60% acetonitrile/0.1% formic acid over 20 min, then 60-98% acetonitrile/0.1% formic acid over 5 min, followed by an isocratic hold for another 5 min. The mass spectrometer was operated in data-dependent acquisition mode, with each scan cycle comprising a full-scan MS survey spectrum followed by up to 12 sequential higher-energy collisional dissociation (HCD) tandem MS (MS/MS) scans on the most intense signals above a threshold of 1 × 10⁴. Full-scan MS spectra (600-2,000 m/z) were acquired in the FT-Orbitrap at a resolution of 70,000 at 400 m/z, while HCD MS/MS spectra were recorded in the FT-Orbitrap at a resolution of 35,000 at 400 m/z. HCD was performed with a target value of 1 × 10⁵ and a normalized collision energy of 25. Automated gain control (AGC) target values were 5 × 10⁵ for full-scan Fourier transform MS. For all experiments, dynamic exclusion was used with a single repeat count, 15 s repeat duration and 30 s exclusion duration. One clean run was performed between samples.
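The LC program is a piecewise-linear function of time. The sketch below interpolates the acetonitrile percentage between the breakpoints stated above; it assumes t = 0 marks the start of the 2-60% gradient (the trap-loading period is not described), and the breakpoint list is an illustrative encoding, not an instrument method file.

```python
from bisect import bisect_right

# (time in min, % acetonitrile) breakpoints: 2->60% over 20 min,
# 60->98% over 5 min, then a 5 min isocratic hold.
GRADIENT = [(0.0, 2.0), (20.0, 60.0), (25.0, 98.0), (30.0, 98.0)]


def percent_acetonitrile(t_min: float, program=GRADIENT) -> float:
    """Linearly interpolate solvent composition at time t_min."""
    if t_min <= program[0][0]:
        return program[0][1]
    if t_min >= program[-1][0]:
        return program[-1][1]
    times = [t for t, _ in program]
    i = bisect_right(times, t_min)  # program[i - 1][0] <= t_min < program[i][0]
    (t0, b0), (t1, b1) = program[i - 1], program[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for t in (0, 10, 20, 22.5, 27, 30):
    print(f"{t:>5} min: {percent_acetonitrile(t):.1f}% acetonitrile")
```

Encoding the program as breakpoints keeps it easy to compare with other gradients, such as the 65 min buffer B program used for the RNase2 digests.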

Chemical synthesis of phosphonate donor mimic Dol25-P-C-Man
In brief, β-d-mannosyl phosphonate 13 was synthesized from allyl 2,3,4,6-tetra-O-acetyl-α-d-mannopyranoside 9 in a nine-step sequence based on a previously described method 56 . Notably, installing isopropylidene protecting groups was essential to achieve a high degree of anomeric control during the key Horner-Wadsworth-Emmons phosphonate insertion reaction. The α-anomer was readily separated after cleavage of the less stable 4,6-O-isopropylidene protecting group (reaction schemes and compounds are shown in the Supplementary Information).

Cloning CeDPY19 mutants
Site-directed mutagenesis of CeDPY19 was accomplished by PCR-amplifying the pGAPZ-CeDPY19 plasmid 26 with mutagenic primers in two reactions to give two double-stranded DNA fragments, which were reassembled into the mutant vector using Gibson Assembly (NEB, E2611L). The mutagenic primer pairs used to make each mutant are provided in Supplementary Tables 2 and 3. The Gibson Assembly reactions were transformed into chemically competent E. coli DH5α, and transformants were selected on low-salt Luria-Bertani agar plates containing zeocin (50 µg ml−1) with the exclusion of light. Plasmids were prepared from single colonies for each mutant and verified by Sanger sequencing.

Nature Chemical Biology
Article https://doi.org/10.1038/s41589-022-01219-9

Integrating CeDPY19 mutants into Pichia pastoris
For each mutant, 15 µg of plasmid DNA was linearized overnight using AvrII (NEB, R0174S) and then purified by ethanol precipitation. This DNA was transformed into electrocompetent GS115 P. pastoris cells that had been complemented with a pPIC9K-RNase2 vector to enable methanol-induced expression of human RNase2 (ref. 26). Transformants were selected on yeast extract-peptone-dextrose-sorbitol (YPDS) agar plates with 100 µg of zeocin at 30 °C for 96 h with the exclusion of light. Colonies for each mutant were restreaked onto fresh YPDS agar plates with 100 µg of zeocin and grown at 30 °C for 96 h with the exclusion of light. Clones for each mutant were subjected to colony PCR using the primer pairs DPY075/DPY076, DPY077/DPY078 and DPY079/DPY080 (Supplementary Table 2) to confirm integration of the linearized vector into the GAP promoter 26 . Positive clones were used to inoculate YPD medium (5 ml) to produce material for verifying mutant CeDPY19 expression by western blot (Supplementary Fig. 2). These cultures were grown to an optical density (OD600) of 1.0 and the cells were collected by centrifugation (1,500g, 30 min, 4 °C). The pellet was resuspended in buffer (10 mM NaPi pH 7.5, 1 M sorbitol, 10 mM EDTA, 100 mM dithiothreitol (DTT)) with lyticase (1 U, Sigma-Aldrich) and the mixture was gently nutated at 37 °C for 1 h. SDS was added to a final concentration of 10% w/v and the mixture was gently nutated at 22 °C for 10 min. SDS-PAGE loading buffer was added and the sample was held at 37 °C for 10 min (further heating resulted in aggregation of the hydrophobic CeDPY19 protein). Western blot analyses of these samples were performed with M2 anti-FLAG mouse IgG1 (1:5,000, Sigma-Aldrich, F3165) as the primary antibody and goat anti-mouse horseradish peroxidase conjugate (1:10,000, ThermoFisher, 62-6520) as the secondary antibody.
Membranes were imaged using a ChemiDoc System (Bio-Rad) and processed using Image Lab v.6.1 (Bio-Rad).

Isolation and digestion of RNase2 coexpressed with CeDPY19 mutants
Each yeast strain harboring a CeDPY19 mutant was used to inoculate BMGY medium (10 ml) and the cultures were grown at 30 °C and 225 rpm for 20 h. These were centrifuged (1,500g, 10 min, 4 °C), the supernatant was discarded and the cell pellet was resuspended in BMMY medium to induce expression of RNase2. Cultures were grown at 30 °C and 225 rpm for 20 h, centrifuged (4,000g, 30 min, 4 °C) and the supernatant was collected. The supernatant was buffered to 50 mM Tris-HCl pH 7.5, 200 mM NaCl and 1 mM EDTA. Protease inhibitor cocktail (Roche, cOmplete EDTA-free) and 0.02% sodium azide were added to the buffered supernatant and the sample was filtered (0.22 µm). A 50% slurry (75 µl) of anti-FLAG M2-affinity gel (Sigma-Aldrich, A2220) was added to capture the secreted RNase2 and the samples were nutated overnight at 4 °C. The affinity gel was pelleted by centrifugation (500g, 15 min, 4 °C) and the supernatant decanted. The gel was transferred to a spin-cup and washed three times with 500 µl of 50 mM Tris-HCl pH 7.5 and 200 mM NaCl. The spin-cup containing the gel was transferred to a fresh 1.5-ml microcentrifuge tube and 500 µl of elution buffer (50 mM Tris-HCl pH 7.5, 100 mM NaCl, 5% SDS) was added. These samples were incubated at 22 °C for 10 min and then at 85 °C for 5 min before the purified RNase2 was eluted in SDS solution by centrifugation (3,000g, 3 min). These protein samples were reduced by adding DTT to a final concentration of 20 mM and incubating in a ThermoMixer (Eppendorf) at 95 °C and 750 rpm for 15 min. After cooling, iodoacetamide was added to a final concentration of 100 mM and the sample was incubated for 30 min at 22 °C with the exclusion of light. These alkylation reactions were quenched by adding DTT to a final concentration of 200 mM and incubating at 22 °C for 10 min.
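Each "final concentration" step above implies a volume of stock solution that depends on the stock strength and on what is already in the tube. A minimal sketch of that mass balance, assuming hypothetical 1 M stock solutions and a 500 µl starting eluate (no stock concentrations are stated in the text):

```python
def stock_volume_to_add(v0: float, c_final: float, c_stock: float, c_existing: float = 0.0) -> float:
    """Volume of stock (same units as v0) needed to reach c_final after mixing.

    Solves (c_existing * v0 + c_stock * v) / (v0 + v) = c_final for v.
    All concentrations must share one unit (e.g. mM).
    """
    if not c_existing <= c_final < c_stock:
        raise ValueError("requires c_existing <= c_final < c_stock")
    return v0 * (c_final - c_existing) / (c_stock - c_final)


# Hypothetical worked example: bring a 500 µl eluate to 20 mM DTT from a 1 M stock,
# then to 100 mM iodoacetamide from a 1 M stock.
v_dtt = stock_volume_to_add(500.0, c_final=20.0, c_stock=1000.0)
v_iaa = stock_volume_to_add(500.0 + v_dtt, c_final=100.0, c_stock=1000.0)
print(f"DTT stock: {v_dtt:.1f} µl; iodoacetamide stock: {v_iaa:.1f} µl")
```

The `c_existing` argument covers steps where the reagent is already present, for example if the 200 mM DTT quench is read as a total DTT concentration.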
These reduced and alkylated samples were acidified with phosphoric acid to a final concentration of 1.2% and then diluted with 6 volumes of S-TRAP binding buffer (100 mM TEAB pH 7.55, 90% methanol). The samples were applied to S-TRAP mini columns (ProtiFi) and washed with S-TRAP binding buffer (4 × 400 µl). The captured protein was digested on-column using sequencing-grade Glu-C (Promega, V1651) in 125 µl of 100 mM NH4HCO3 buffer pH 7.8 at 37 °C for 16 h. Digested peptides were eluted from the column by centrifugation (4,000g, 1 min) after each addition of: 80 µl 100 mM NH4

LC-MS analysis of digested RNase2
Samples were resuspended in buffer A* (0.1% TFA, 2% MeCN) and separated using a two-column chromatography set-up composed of a PepMap100 C18 20 mm × 75 µm trap and a PepMap C18 500 mm × 75 µm analytical column (ThermoFisher) coupled to an Orbitrap Exploris 480 Mass Spectrometer (ThermoFisher). A 65 min gradient was run for each sample, with loading onto the trap column at 6 µl min−1 for 6 min using buffer A (2% DMSO, 0.1% HCO2H), followed by separation on the analytical column by altering the buffer composition from 3% buffer B (2% DMSO, 78% MeCN, 0.1% HCO2H) to 23% buffer B over 29 min; then from 23% to 40% buffer B over 10 min; then from 40% to 80% buffer B over 5 min; then holding at 80% buffer B for 5 min; then dropping to 3% buffer B over 1 min and holding at this value for another 9 min. The Orbitrap Exploris 480 Mass Spectrometer was operated in a hybrid data-dependent and independent manner, switching between 2.5 s of data-dependent acquisition and roughly 0.5 s of parallel reaction monitoring scans for specific peptides of interest. For data-dependent acquisition, a single Orbitrap MS scan (300-1,800 m/z; maximum injection time 25 ms; AGC 300%; resolution 120,000) was followed by MS2 HCD scans of precursors (maximum injection time 80 ms; AGC 400%; resolution 30,000; stepped normalized collision energy of 20, 30 and 40%) for up to 2 s. For parallel reaction monitoring, the ions at m/z 1,144.0522, 763.0372, 1,225.0072, 817.0539, 1,306.1022 and 871.0706, corresponding to the +2/+3 charge states of the (glyco)peptide NLYFQGKPPQFTWAQWFE in the nonglycosylated, singly glycosylated and doubly glycosylated states, were monitored. Parallel reaction monitoring scans were acquired using a maximum injection time of 80 ms, AGC 500%, resolution 45,000 and stepped normalized collision energy of 25, 30 and 40%.
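The monitored precursor values follow from the peptide's monoisotopic mass, the number of attached mannoses n and the charge state z, via m/z = (M + n × 162.0528 + z × 1.00728)/z. The sketch below assumes that each C-mannose adds one hexose residue (C6H10O5) and uses standard monoisotopic residue masses; it is an illustrative check, not the software used to set up the acquisition.

```python
# Standard monoisotopic amino-acid residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once for the free peptide termini
PROTON = 1.00728    # mass of a proton
HEXOSE = 162.05282  # C6H10O5, net mass added per C-linked mannose


def peptide_mass(seq: str) -> float:
    """Monoisotopic neutral mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def precursor_mz(seq: str, n_hexose: int, charge: int) -> float:
    """m/z of the [M + zH]z+ ion carrying n_hexose mannose residues."""
    neutral = peptide_mass(seq) + n_hexose * HEXOSE
    return (neutral + charge * PROTON) / charge


PEPTIDE = "NLYFQGKPPQFTWAQWFE"
for n in range(3):
    for z in (2, 3):
        print(f"{n} Man, +{z}: m/z {precursor_mz(PEPTIDE, n, z):.4f}")
```

With these constants, the nonglycosylated +2 and +3 ions come out at the listed 1,144.0522 and 763.0372 to within 0.001 m/z, and the +3 glycoforms agree with the listed values to within a few thousandths.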

Occupancy analysis of RNAse glycopeptides
To quantify the relative level of C-glycosylation within samples, extracted ion chromatograms of the monoisotopic peaks of the +2 charge states were generated (±10 ppm) using Freestyle Viewer v.1.7 SP1 (Thermo Fisher Scientific). Peaks were processed with a 15-point Gaussian smooth and the area under the curve was calculated. The resulting areas were used to calculate the relative abundance of each peptide species, with occupancy determined as a percentage of the total ion current of the compared peptide species. The resulting MS data and search results have been deposited in the PRIDE ProteomeXchange Consortium repository (http://www.proteomexchange.org/). These can be accessed with the identifier PXD032391 using the username reviewer_pxd032391@ebi.ac.uk and password sItWEoNT.
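The occupancy calculation amounts to smoothing each extracted ion chromatogram, integrating it and normalizing the areas. The sketch below reproduces those steps in plain Python; the Gaussian width (sigma = window/6) and the trapezoidal integration are assumptions, since the text does not specify how Freestyle parameterizes its 15-point smooth or integrates peaks, and the chromatogram values are hypothetical.

```python
import math


def gaussian_kernel(points: int = 15, sigma=None) -> list:
    """Normalized Gaussian weights over an odd-length window."""
    if points % 2 == 0:
        raise ValueError("window must have an odd number of points")
    sigma = sigma if sigma is not None else points / 6.0  # assumed width
    half = points // 2
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(intensities: list, kernel: list) -> list:
    """Weighted moving average; weights are renormalized at the trace edges."""
    half = len(kernel) // 2
    n = len(intensities)
    out = []
    for i in range(n):
        acc = wsum = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                acc += w * intensities[j]
                wsum += w
        out.append(acc / wsum)
    return out


def area_under_curve(times: list, intensities: list) -> float:
    """Trapezoidal integration of a chromatogram."""
    return sum((times[i + 1] - times[i]) * (intensities[i] + intensities[i + 1]) / 2.0
               for i in range(len(times) - 1))


def occupancy_percent(areas: dict) -> dict:
    """Each species' area as a percentage of the summed areas of all compared species."""
    total = sum(areas.values())
    return {name: 100.0 * a / total for name, a in areas.items()}


# Hypothetical example: integrate one synthetic peak, then normalize three areas.
times = [0.01 * i for i in range(200)]
peak = [1e6 * math.exp(-0.5 * ((t - 1.0) / 0.05) ** 2) for t in times]
area = area_under_curve(times, smooth(peak, gaussian_kernel()))
print(occupancy_percent({"0 Man": area, "1 Man": 3 * area, "2 Man": 0.0}))
```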

Statistics and reproducibility
Unless otherwise stated, all described in vitro CeDPY19 assays and Tricine gel-based analyses were performed once, as shown in the figures. All in vivo assays were performed as independent triplicates. Depicted 'representative micrographs' were chosen randomly but are representative of all micrographs that were visually observed during the respective cryo-EM data collections.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
Atomic coordinates of the CeDPY19 models have been deposited in the RCSB Protein Data Bank under accession numbers 7ZLH (apo), 7ZLH (peptide-bound), 7ZLI (Dol25-P-Man-bound) and 7ZLJ (Dol25-P-C-Man- and peptide-bound). The three-dimensional cryo-EM maps were deposited in the Electron Microscopy Data Bank under accession numbers EMD-14780 (apo), EMD-14779 (peptide-bound), EMD-14781 (Dol25-P-Man-bound) and EMD-14782 (Dol25-P-C-Man- and peptide-bound). MS data quantifying tryptophan C-mannosylation of RNase2 have been deposited in the PRIDE proteomics repository under accession number PXD032391 (username reviewer_pxd032391@ebi.ac.uk, password sItWEoNT). Source data are provided with this paper.