Engineering subtilisin proteases that specifically degrade active RAS

We describe the design, kinetic properties, and structures of engineered subtilisin proteases that degrade the active form of RAS by cleaving a conserved sequence in switch 2. RAS is a signaling protein that, when mutated, drives a third of human cancers. To generate high specificity for the RAS target sequence, the active site was modified to be dependent on a cofactor (imidazole or nitrite) and protease sub-sites were engineered to create a linkage between substrate and cofactor binding. Selective proteolysis of active RAS arises from a 2-step process wherein sub-site interactions promote productive binding of the cofactor, enabling cleavage. Proteases engineered in this way specifically cleave active RAS in vitro, deplete the level of RAS in a bacterial reporter system, and also degrade RAS in human cell culture. Although these proteases target active RAS, the underlying design principles are fundamental and will be adaptable to many target proteins. Chen et al. describe a rational design of subtilisin mutants that degrade active RAS by cleaving a conserved sequence in switch 2. They further modified the active site to be dependent on a cofactor to generate high target specificity. Proteases engineered to cleave this region degraded RAS in vitro and in cells with a promise of adaptability for other target proteins too.

ngineering proteases that cleave specific signaling proteins in active conformations would open new possibilities to study, regulate, and reprogram signaling pathways 1 . Our target here was the active state of the rat sarcoma (RAS) oncoprotein. RAS is a small GTPase that responds to growth factors, activates downstream effector molecules (such as those in the MAPK pathway), and stimulates cell growth. Three RAS isoforms (HRAS, KRAS, and NRAS) are the primary regulators of cell signaling pathways. All three isoforms coexist in cells and have distinct but overlapping roles in signaling 2,3 . Inherent to the function of RAS is a switch between its inactive (GDP-bound) and active (GTP-bound) forms. Two regions, termed switch 1 (amino acids [30][31][32][33][34][35][36][37][38] and switch 2 (amino acids [59][60][61][62][63][64][65][66][67][68][69][70][71][72][73][74][75][76], undergo structural changes as RAS cycles between the GDP-bound and GTP-bound forms in all RAS isoforms. The GTP-bound conformation drives the cascade of signaling effects. Based on these observations we have engineered the serine protease subtilisin to target the conserved QEEYSAM sequence in switch 2 of RAS (amino acids 61-67, Fig. 1A). Examination of several crystal structures of active RAS identified considerable conformational heterogeneity in switch 2 induced by a γphosphate on the nucleotide cofactor [4][5][6] . We hypothesized that these structural changes would uncover a cryptic cleavage site, making the QEEYSAM target sequence more vulnerable to proteolysis in active RAS than in the inactive form. Motifs of this type generally occur in amphipathic helices because most of the amino acids have high α-helical propensity and also because the spacing between the large hydrophobic amino acids (Y and M) matches a helical periodicity.
Natural subtilisins have broad sequence specificity, but there is a wealth of information about engineering mutations that alter specificity [7][8][9][10][11][12][13][14] . The major challenge to designing high-specificity proteases, however, is that specificity based on differential substrate binding falls far short of natural processing proteases. In many natural proteases the cognate sequence influences the chemical steps in peptide hydrolysis and not just binding steps 15 . To engineer highly sequence-specific subtilisins we combined two previous observations. First, mutations at remote binding pockets for substrate side chains (sub-sites) can distort the subtilisin active site 9,11 . Second, mutating a catalytic amino acid in an enzyme may allow chemical rescue with a small molecule cofactor 16,17 . We leveraged these observations to create an engineered protease that requires both conformational and chemical rescue by the substrate and cofactor, respectively, to achieve high levels of activity. In particular, for the fully engineered RASspecific proteases described here, binding of the cognate QEEY-SAM sequence results in structural changes that are transmitted from the binding pockets to the active site and enable cofactor activation. In contrast, binding of non-cognate sequences adversely affects the active site and in fact antagonizes cofactor binding. These cofactor-dependent subtilisins strongly prefer dynamic regions of proteins (such as switch 2 in active RAS) over structured regions because efficient cleavage requires interactions with P5 to P1' amino acids in an extended conformation (Fig. 1b).
Engineered proteases were tested in vitro, in engineered E. coli, and in human cell culture. Major findings are the following: (1) Engineered proteases cut the QEEYSAM sequence in a synthetic peptide substrate with high specificity and are tightly controlled by the cognate cofactor; (2) X-ray crystal structures of proteasesubstrate-cofactor complexes reveal interactions involved in substrate recognition and cofactor activation; (3) The QEEYSAM sequence in native RAS is cut in response to the cognate cofactor; (4) RAS-specific proteases cut active (GTP) RAS 60-80 times faster than the inactive (GDP) form; (5) NMR analysis reveals that the QEEYSAM sequence is more dynamic in active RAS than the inactive form; (6) The level of RAS within E. coli cells can be regulated by co-expression of the RAS gene with different protease and cofactor combinations; (7) A RAS-specific protease can destroy RAS in mammalian cells.

Results
Protease engineering. The engineering process to develop specificity for the RAS sequence QEEYSAM involved extensive modification of the Bacillus protease, subtilisin BPN' [16][17][18][19][20][21][22][23][24] , a canonical serine protease in which the scissile peptide bond is attacked by a nucleophilic serine (S221). The nucleophilicity of S221 results from its interactions with the catalytic histidine (H64) and aspartic acid (D32) that together form a charge relay system 25 . Most substrate contacts are with the first five amino acids on the acyl side of the scissile bond (denoted P1 through P5) 26 and the first amino acid on the leaving group side (denoted P1′, Fig. 1b). Corresponding sub-sites on subtilisin are denoted S1′, S1, S2, etc. Natural subtilisins have a strong preference for a hydrophobic amino acid at S1 and S4 sub-sites (Fig. 1B), but little discrimination for a particular hydrophobic amino acid 27,28 . The key to engineering high sequence-specificity was linking interactions in amino acid sub-sites to the rate of the first chemical step (acylation). In our design strategy, we considered two welldocumented properties of proteases. The first is that certain mutations at sub-sites alter the conformation of remote catalytic residues 9,11 . The second observation is that mutating a catalytic amino acid radically decreases activity of an enzyme 29,30 but in certain cases allows some chemical rescue of activity by an exogenous small molecule that mimics the mutated amino acid [31][32][33] . The design strategy was based on the hypothesis that a linkage Fig. 1 Target sequence (QEEYSAM) in RAS and in the active site of a RAS-specific protease. a Structure of RAS with a bound GTP analog (PDB code 6Q21) 5,6 , highlighting the YSAM site in Switch 2. b RAS-specific protease based on an X-ray structure of 3BGO.pdb 17 . Cognate sequence QEEYSAM-RD is modeled in the binding cleft. Substrate residues are denoted P1 through P5, numbering from the scissile bond toward the N-terminus of the substrate. The substrate amino acid on the leaving group side is denoted P1'. between substrate binding and chemical rescue can be created by mutating sub-sites to optimize interactions with the desired cognate sequence and combining these with mutation of a catalytic amino acid.
To begin testing this hypothesis robustly, we generated two different active site variations: D32G variants that are activated by nitrite or azide 16,17 and H64G variants that are activated by imidazole. Initial design was based on the X-ray structure of a D32A variant of subtilisin (SBT189, 3BGO.pdb) 17 . The structure shows the enzyme complexed with a cognate peptide (LYRAL) and azide bound in the position normally occupied by the catalytic D32. The substrate-binding pockets and the azide binding site form an interconnected network ( Fig. S1) 8,11,34,35 . The theory is that binding at one sub-site can influence interactions in other parts of the network. The desired cognate sequence QEEYSAM-RD for RAS was modeled into the binding cleft of the 3BGO.pdb structure. Amino acid substitutions were introduced into the model and protein-protein and protein solvent interactions were evaluated by visual inspection 36 . Based on this analysis, as well as earlier engineering work, we designed mutations in the catalytic region, the S1 pocket, and the S4 pocket to create a nitrite-dependent (D32G) protease and an imidazoledependent (H64G) protease (Table S1). These are denoted Protease1(N) and Protease1(I), respectively. These mutants were expressed, purified, and characterized for activity and specificity with a substrate series that was originally used to characterize SBT189: sDXKAM-AMC, where X = Y, F, I, or L 16,17,22,24 . AMC is the fluorogenic leaving group, 7-amino-4-methylcoumarin. Activity of both proteases is highest for P4 = F with lower activity for P4 = Y, I, and L (Fig. S2). The ability of other naturally occurring anions to activate Protease1(N) was examined using nitrite, formate, acetate, carbonate, and chloride (Fig. S3A). The ability of imidazole derivatives to activate the Protease1(I) was examined using imidazole, 4-hydroxymethyl-imidazole, imidazolepropionic acid, imidazoleacetic acid, and histamine (Fig. S3B). Consistent with earlier results 17 , nitrite is the strongest activator of Protease1(N), with formate and chloride activating weakly at 10 mM concentration 17 . Imidazole is the strongest activator of Protease1(I) with 4-hydroxymethyl-imidazole) weakly activating at 10 mM concentration. As expected, none of the anions affect the activity of Protease1(I) and none of the imidazole compounds affect the activity of Protease1(N).
The next step was to determine the X-ray crystal structure of Protease1(N) in complex with a cognate peptide LFRAL (6UBE. pdb) and use this structure to model additional mutations in the S4 pocket. To increase specificity for P4 = Y, we introduced the single and double mutations of V107I and L135V into Protease1 (N) and Protease1(I) and examined the activity of the mutants on P4 = F vs. Y substrates. (Fig. 2a). The double mutants (denoted Protease2(N) and Protease2(I), Table S1) exhibited considerable preference for the substrate sDYKAM-AMC. When P4 is F instead of Y, k cat /K M falls by more than two-fold. All other variations at P4 result in a greater than 50-fold decrease in k cat /K M (Fig. 2b, c). Further evaluation was carried out with peptide-AMC substrates with variations at P1 and P2. As previously observed for D32 mutants, H64G mutants also show high preference for P1 = M or L, and a moderate preference for P2 = A (Fig. S4) 16,17 . A thorough sub-site analysis previously performed on the natural subtilisins BPN' and lentus documents their high activity again a broad range of substrates sequences 37,38 .
Finally, we examined variations to increase activity for P3 = S and P5 = E. Protease mutants at 101 (S, K, R) and 103 (Q, R) were evaluated using the peptide substrates QEXYSAM-AMC, where X = E, R, I, or L and QEEYXAM-AMC, where X = S, R, or E. The S101K mutants gave a high preference for the desired cognate sequence QEEYSAM. The S101K mutants were further analyzed for activity using the peptide substrate QEEYSAM-AMC. Figure 3a compares k cat /K M as a function of nitrite concentration for Protease2(N) and the S101K mutant. Figure 3b compares k cat /K M as a function of imidazole concentration for Protease2(I) and the S101K mutant. The S101K mutation in both Protease2(N) and Protease2(I) increases the k cat /K M by~8-fold in 1 mM cofactor. The two S101K mutants are denoted as RAS-specific proteases RASProtease(N) and RASProtease(I). For reference, the activity of a previously engineered protease (SBT160) with a complete catalytic triad and preference for P4 = F or Y is also shown 24 . As expected, SBT160 activity is unaffected by nitrite or imidazole (Fig. 3a, b).
Synergy between conformational and chemical rescue. Comparing nitrite activation in RASProtease(N) with the cognate substrate (QEEYSAM-AMC) versus a near cognate (QEEISAM-AMC) revealed a linkage between cognate substrate binding and cofactor activation (Fig. 4a). In 1 mM nitrite k cat /K M is~100-fold greater for P4 = Y versus I. The low activity of RASProtease(N) with the near cognate substrate is partly because of weaker substrate binding (5.5-fold less for P4 = I than P4 = Y) but mostly because of a lack of cofactor activation when P4 = I. A similar phenomenon is observed with RASProtease(I). The k cat /K M value is~300-fold higher for the cognate P4 = Y versus P4 = I in 1 mM imidazole (Fig. 4B).
Structural analysis of protease interactions with the switch 2 target sequence. To understand the structural basis for specificity and cofactor activation, we determined crystal structures of RASProtease(I) alone, in complex with YSAM and QEEYSAM product peptides, and with imidazole ( Fig. 5a, S5-S8). We compared these structures with the crystal structure of Protease1(N) in complex with a peptide corresponding to its sequence specificity (LFRAL, Fig. 5b, right).
Overall the structures are very similar, with an RMSD between Cα carbons of 0.17 Å. The aromatic ring of the P4 side chain has common van der Waals interactions: the Cα and Cβ atoms of A104 interact with Cε1, the Cδ1 of L126 interacts with Cδ2, and Cβ of S128 interacts with Cδ2 and Cε2. Because of the space created in the S4 pocket by the L135V mutation, either a water or   an ion coordinates the hydroxyl of P4 Tyr, the hydroxyl of Y171, and the backbone nitrogen of S132 in RASProtease(I) (Fig. 5b,  Fig. S5). We currently have this modeled as a chloride ion based on the peak height observed in an anomalous difference fourier map; however, added chloride does not appear to influence kinetic properties. P4 interactions are facilitated by a significant shift of the loop containing residues 130 to 133 of RASProtease(I) relative to the Protease1(N) complex. In fact, these are the only residues in the entire structure whose Cα positions shift by more than 1 Å, including the N-and C-termini.
In contrast to P4, the changes to the P3 site are more limited (Fig S6). Mutation of S101K increases activity with Ser at the P3 position. Hydrogen bonding interactions between the backbone nitrogen and carbonyl oxygen of the P3 residue and the corresponding partners on residue G127 are conserved.
The structures also provide insight into the structural basis for cofactor activation by both nitrite in Protease1(N) and imidazole in RASProtease(I). In both cases, a catalytic amino acid is mutated to Gly and the vacated space is occupied by a network of conserved water molecules in the absence of the cognate cofactor. Nitrite was modeled into the solvent network of the Protease1(N) structure (replacing HOH 54; Fig. S7A) such that it supplies the critical Hbond to the catalytic H64 and is coordinated to N33 and three conserved waters (23, 302, and 382). The coordination sphere of nitrite in Protease1(N) appears to be more complete than observed for azide in the parent enzyme (3BGO) and suggests why nitrite is tightly bound in spite of its lower pKa (3.37, compared to 4.72 for azide). Structures of RASProtease(I) without imidazole have three conserved waters (19,81, and 122) that interact with Oδ1 and Oδ2 of D32, CO of S125, NH of G64, Oγ of S62, and Oγ of S221. When imidazole binds, these waters are displaced, the imidazole nitrogens H-bond to Oδ1 and Oδ2 of D32 and Oγ of S221, and the charge relay system is reconstituted (Fig. S8, S7B). An additional interesting feature of the structure of RASProtease(I) with QEEYSAM was the presence of an acyl adduct between the Cterminus of the P1 Met of the peptide and the Oγ of S221 of the enzyme ( Fig S5). Apparently, the binding energy of the QEEYSAM peptide is sufficient to push its terminal carbon and Oγ of S221 into an orientation that drives the equilibrium from a product complex to the acyl enzyme 39 . Studying these types of subtle changes to the water structure in the active site that occur upon substrate and cofactor binding will allow us to make further changes to promote chemical and conformational rescue.
Cleavage of RAS(GDP) and RAS(GMPPNP). The next step in assessing our RAS-specific proteases was to monitor cleavage of native RAS protein, wherein switch 2 would adopt a range of conformations from extended to helical. Moreover, measuring protein cleavage allowed us to assess the relative rates of cleavage for active versus inactive RAS, thereby testing our hypothesis that active RAS is more vulnerable to proteolytic attack due to increased dynamic motion in switch 2. To trap RAS in an active conformation, we used an adduct with guanosine 5′-[β,γ-imido] triphosphate (GMPPNP), a slow hydrolyzing analog of GTP. The cleavage of RAS(GMPPNP) is then compared with the inactive RAS(GDP) form. The goal of the kinetic experiments is to achieve a quantitative understanding of individual steps in the reaction pathway. A minimum realistic mechanism is as follows: where R = RAS, P = protease, r 1 = the RAS N-terminal cleavage product and r 2 = the RAS C-terminal cleavage product. K S is the dissociation constant for uncleaved RAS and k 2 is the rate of the first chemical step in peptide bond cleavage (acylation). K P is the dissociation constant for the r 1 product fragment. To accurately determine kinetic parameters for RAS cleavage, we measured the kinetics of QEEYSAM-AMC cleavage in the presence of RAS(GDP) and RAS(GMPPNP). Data for RASProtease (I) are shown in Fig. 6a, b. The concentration of RAS(GMPPNP) or RAS(GDP) was varied from 0 to 20 µM. Interaction of protease with RAS is manifested as inhibition of peptide-AMC cleavage by RAS and RAS cleavage products. Early in the progress curves, the major inhibitory species is native RAS. As the reaction progresses, inhibition increases as the concentration of the r 1 fragment increases. K P of r 1 was determined in an independent series of experiments. To accomplish this, a fragment corresponding to the N-terminal cleavage product of RAS (amino acids 1-67, ending at YSAM) was expressed in E. coli and purified. Inhibition of protease by r 1 was measured by varying its concentration from 50 nM to 3.3 µM in reactions of each protease and with 0.1, 0.5, and 1 µM QEEYSAM-AMC (Table S2). NMR analysis of cleaved RAS shows that both r 1 and r 2 are disordered (Fig. S9).
Kintek Explorer was used to fit all data to mechanism 1 (Fig. 6, Table S2).
Values of k 2 /K S were calculated to compare specificity for active RAS, independent of the r 1 product dissociation rate (Table S2). The analysis shows that Although less quantitative, the increased rate of cleavage of RAS(GMPPNP) relative to RAS(GDP) could also easily be observed by SDS gel analysis (Fig. S10). Analysis by MALDI confirms that the enzyme cleaves RAS after the QEEYSAM site with no detectible off-target cleavages (Fig. S11). Consistent with the low intrinsic activity observed in peptide assays, no cleavage of RAS was observed in the absence of a cognate cofactor with either enzyme. Measurements of binding uncleaved RAS (GMPPNP) and RAS(GDP) with RAS protease were also made by gel filtration and show the increased affinity of protease for RAS(GMPPNP) (Fig. S12).
Analysis of the target region in RAS by NMR. To better understand how dynamics in switch 2 contribute to RAS protease specificity for the active form, we examined the dynamics of this region by NMR. An order-to-disorder transition of switch 2 has been previously observed in RAS crystal structures 40,41 . Twodimensional 1 H-15 N HSQC spectra indicate extensive structural differences between the GDP-and GMPPNP-bound forms of RAS (Figs. S13A, B). NMR backbone resonance assignments were made using standard procedures and deposited in BioMa-gResBank (accession codes for the GDP and GMPPNP forms are 28008 and 28009, respectively). Most of the amide signals in the switch 1 (residues 30-38) and switch 2 (residues 59-76) regions are detectable in the GDP-bound state but are exchange broadened in the GMPPNP-bound form, indicating enhanced dynamics in the GMPPNP state on the NMR timescale (microseconds to milliseconds). Comparable observations have been made with WT HRAS, WT KRAS, and other RAS mutants [42][43][44][45][46][47][48][49] . Analysis of differences in backbone dynamics using 15 Nrelaxation measurements indicates that the GMPPNP-bound form of RAS is also considerably more flexible on the ps-ns timescale than the GDP-bound form (Fig. S13C, Fig. S14). Thus, the effect of GMPPNP binding is to increase main chain flexibility for a large number of residues in the molecule over a wide timescale range, making the GMPPNP-and GTP-bound states, and the switch regions in particular, more susceptible to proteolytic cleavage than the GDP-bound form.
Co-expression of RAS with RASProtease(I) in E. coli. To measure activity and specificity in cells, we developed a bacterial system for co-expressing RAS with RAS-specific proteases. We constructed genes for RAS fusion proteins (Fig. S15) that consisted of an N-terminal G A domain 50,51 , amino acids 1-166 of human HRAS, and a C-terminal cellulose binding domain 52 . The small N-terminal and C-terminal binding domains allow easy purification of the entire fusion protein as well as both cleavage products so that the precise cleavage site in E. coli can be determined. The expression of the RAS fusion protein is shown in Fig.  S16. The protease was co-expressed as a zymogen (34.9 kDa) consisting of an N-terminal inhibitory (I) domain (8.5 kDa) and the mature protease domain (26.4 kDa) 53 . Growth was at 37˚C either without imidazole (Fig. S17) or with 100 µM imidazole (Fig. 7) added to the culture media. The gel patterns show that the protease zymogen is processed into the I-domain and mature protease in both cases, but RAS cleavage is dependent on imidazole and coupled to degradation of the I-domain into smaller fragments. RAS fragments r 1 and r 2 appear as intact I-domain  disappears. By 41 hrs. in imidazole, all I-domain is cleaved and greater than 50% of RAS is cleaved (Fig. 7). The gel pattern also shows that RAS is specifically cleaved into two discrete fragments. We confirmed that RAS was cleaved after the QEEYSAM sequence by purifying the r 1 fragment from the E. coli extract and measuring its mass by MALDI. There is no indication in the gel pattern of E. coli proteins being degraded. The fact that~50% of RAS remains intact after 41 h is consistent with a high RAS protease preference for active RAS in the cell. Newly synthesized RAS predominantly binds GTP and therefore initially exists in the dynamic, active conformation. RAS(GTP) then converts to RAS(GDP) at a rate of~1 h −1 54-56 . Because E. coli lacks GDP-GTP exchange factors, we assume that RAS remains in the GDP bound form after hydrolysis and is less vulnerable to cleavage. After new RAS synthesis stops, RAS accumulates in the inactive and partially protease-resistant form.
Co-expression of RAS with RASProtease(N). Fig. S18 shows coexpression of the RAS gene with the RASProtease(N) zymogen gene. RASProtease(N) cleaves~70% of RAS into two discrete fragments within 18 h of growth but the remaining RAS remains intact after 48 h. Unlike the case with imidazole, we are not able to precisely control nitrite concentration because it is an intermediate in metabolic pathways involving nitrogen in E. coli 57,58 . Nevertheless, the RAS cleavage by nitrite-activated proteases is potentially informative because nitrite is a disease marker in eukaryotic cells and its concentration in E. coli is likely similar to that in cancer cells 57,58 . RASProtease(N) appears to be more active in E. coli with endogenous nitrite than RASProtease(I) in 100 µM imidazole. Even so, a cleavage-resistant population of RAS remains after 48 h. We presume this is RAS(GDP).
Engineered proteases can destroy active RAS in mammalian cells. Finally, we tested whether a RASProtease(N) could cleave the switch 2 target sequence in a human cell. Based on our NMR data and the fact that switch 2 is conserved among the 3 major RAS isoforms, any RAS isoform would be appropriate for this type of experiment. However, since KRAS is the most important cellular and therapeutic target, we used KRAS as our target for cell-based experiments. We employed fluorescently tagged RAS (eGFP-KRAS) as a means to simplify the detection of cleaved product. This fusion (48 kDa) can be observed in a Western blot probed with an anti-GFP monoclonal antibody (Abcam, Fig. 8a, lane 1). Induction of expression of the protease zymogen with doxycycline results in a marked depletion of eGFP-RAS and a corresponding increase in the presence of a band consistent with the eGFP protein from the fusion (Fig. 8a, lane 2). Probing the same samples with an anti-RAS antibody (Ras10, ThermoFisher) confirms that this band contains an eGFP-RAS fusion and further shows that RAS disappearance coincides with the appearance of the eGFP product (Fig. 8a, lanes 1 vs. 2). As expected, primary processing of the RASProtease(N) zymogen (34.9 kDa) to the mature protease (26.4 kDa) occurs readily in cells (Fig. 8A, lane  2). In addition, fluorescence microscopy images show a marked decrease in eGFP signal upon induction of RASProtease(N) (Fig. 8b). Disappearance of the eGFP-RAS fragment indicated that, as in E. coli, cleaved RAS is further degraded by cellular proteases. As expected, western blot analysis of cells cotransfected with eGFP-KRAS and inactive (S221A) RASProtease does not result in the appearance of a KRAS cleavage product upon induction of the inactive protease when probed with an anti-GFP antibody following a GFP pull-down (Fig. S19A). Likewise, the intensity of the RAS-reactive band remains constant when probed with an anti-RAS antibody. The inactive protease fails to cleave its inhibitory I domain as evidenced by the unprocessed protease band at approximately 35 kDa (Fig. S19A). Induction of the inactive protease in HEK 293T cells at 24 h after transfection results in no change in GFP fluorescence at 48 and 72 h after transfection compared to the same cells without induction of protease expression (Fig. S19B).

Discussion
Our goal in this work was to develop principles for engineering protein-specific proteases and to apply these principles to target active RAS. The key to engineering high sequence specificity in a protease is linking binding at sub-sites with chemical steps in peptide bond hydrolysis. This was accomplished by exploiting two facts about enzymes: (1) mutations at remote sub-sites can affect the conformation of catalytic amino acids; (2) change in the conformation of the catalytic region affects chemical rescue of active site mutants. High-specificity occurs when cognate sequence binding at sub-sites is compatible with productive binding of cofactor but binding of incorrect sequences antagonizes cofactor binding.
To understand substrate and chemical rescue, it is useful to consider some basic structural features of subtilisin proteases. Subtilisin has a cardioid shape with the active site and substrate binding pockets forming a cusp that divides N-terminal and Cterminal domains (Fig. 9, 28 ). The catalytic D32 and H64 and the P2, P3, and P5 sub-sites are primarily associated with the Nterminal domain (Fig. 5a). The catalytic S221 and P1, P4, and P6 sub-sites are primarily associated with the C-terminal domain. Thus the substrate intercalates between the two domains, bridges the interface, and affects the association between the two. Subtilisin specificity, in general, can be understood in terms of a model in which the domain interface is either in a deformed, lowactivity conformation or in a canonical active conformation. When the domain interface is stable, the substrate adapts to the enzyme and specificity is broad. This has also clearly been shown for α-lytic protease which resists deformation of catalytic amino acids even as binding pockets conform to bind a range of substrate sequences 59 . When the stability of the domain interface of a protease is decreased by mutation, however, the enzyme conforms to the substrate and activity will be low unless the substrate fit is precise and promotes the active conformation (i.e. in part by reestablishing the native interface between domains). Because individual sub-sites and the catalytic triad are interconnected, distortion in one area affects the other. Rheinnecker et al. 9,11 have shown previously that certain mutations in the S4 pocket of subtilisin, particularly those that form cavities, adversely affect activity of even small substrates that do not interact at S4. This led to an increase in specificity for peptide substrates with L or F over A at P4. To create RAS-specific proteases, we combined traditional sub-site engineering approaches with mutations that destabilize the domain interface. The best mutations weaken the interface between the two domains but generate favorable interactions with a cognate substrate and/or the cofactor. The favorable interactions rescue the active site conformation at the interface resulting in high activity. For example, removing a catalytic residue (e.g. D32G or H64G) reduces activity both by eliminating an element of the charge relay system and weakening the domain interface, but also creates potential for rescuing the active conformation and chemistry by the cognate cofactor and substrate. Likewise, enlarging the S4 pocket to accommodate tyrosine causes instability that is transmitted to the weakened catalytic site. Binding of the cognate substrate with a P4 tyrosine rescues the active conformation, however, by supplying stabilizing interactions in the S4 pocket that are transmitted to the catalytic region and promote productive binding of cofactor. The magnitude of cofactor activation depends on the population of the active conformation in the apo enzyme relative to its population with substrate and cofactor bound. This is a delicate balance. If the domain interface is too stable, a mutant will be fast but nonspecific. If the interface is too unstable, however, a mutant will be specific but slow. To produce high specificity and activity, the energies of the inactive and active conformations must be close enough that cognate binding to sub-sites significantly populates the active form 60 . This creates a critical state in which both cofactor binding and cognate sub-site interactions become required for activity. These proteases have very low activity against all substrates in the absence of cofactor but allow rescue of the active conformation by the cognate sequence with its cofactor.
Comparing several high-resolution structures reveals structural changes that may contribute to the ability of cognate substrate and cofactor binding to rescue activity in RAS-specific proteases. RAS-specific proteases have an expanded S4 site that binds numerous solvent molecules in the enlarged S4 pocket. Structures with a cognate peptide bound show that the L135V mutation allows space for an adventitious ion-binding site that coordinates the hydroxyl group of the P4 tyrosine of the substrate. This stabilizing interaction may be an important element for rescuing the active conformation of the catalytic region and enabling strong cofactor activation. Likewise, mutating a catalytic amino acid to Gly creates space that is occupied by a network of conserved waters in the absence of the cognate cofactor.
It might be assumed that enzymes engineered in this way would be slow 61 . This is not true. Kinetic analysis shows proteases targeting RAS are both more specific and more active than the viral processing proteases TEV and HRV 3C are for their cognate sequences 62,63 . For example, in the presence of 1 mM cofactor, RASProtease(N) and RASProtease(I) have k cat /K M of > 10 4 M −1 s −1 compared to k cat /K M values of~10 3 M −1 s −1 for 3Ctype proteases.
Engineering cofactor activation also proved to be useful for regulating protease activity inside cells. Imidazole is benign but normally not present inside cells. Thus, it can be used as a xenobiotic activator to tightly control RAS cleavage in E. coli. Controlling RAS cleavage in E. coli with nitrite is more complicated because it is an intermediate in metabolic pathways involving nitrogen. Nitrite may be quite useful for regulating activity in eukaryotic cells, however. Elevated nitrite is a common signature of disease states (including RAS-related cancers) [64][65][66] and reaches concentrations >100 µM in tumor cells 67,68 . This occurs because disease-induced nitric oxide synthase produces nitric oxide (NO) and NO quickly oxidizes and accumulates as nitrite. To date, we have only tested one RAS protease in human cells, but we have established that a RAS-specific protease can self-activate in human cells, locate KRAS at the plasma membrane, and cleave it as indicated by the presence of the eGFP fusion product and the precipitous disappearance of KRAS (Fig. 8). Although RAS isoforms are highly abundant in human cells (~300,000 RAS molecules (150 nM) in colorectal cancer cells) 69 , the proteases developed in E. coli will need to be adapted to the new environment if precise control of RAS signaling is the goal. Perhaps the more important point, however, is that the general principles learned from E. coli makes re-programming cultured human cells appear feasible and may provide strategies to selectively cleave RAS in cancer cells.
Primary structure alone is generally insufficient for encoding protease specificity. For example, Caspases are considered highly specific, but it is not possible to predict their natural protein targets from their cleavage patterns on small peptides 70,71 . Thus, a critical element of specificity is discrimination between different conformations of the same sequence. To use the conformation of the target protein to increase specificity, we chose a target sequence that is partially exposed in active RAS but is typically found in amphipathic α-helices and resistant to proteolysis when it occurs in other proteins. NMR analysis indicated extensive structural changes between the GDP-and GMPPNP-bound forms of RAS and increased mobility of the QEEYSAM sequence in RAS(GMPPNP). Specificity is governed therefore by both the correct primary structure and dynamic changes in the secondary structure of the target region. The additional information from conformation allows much higher specificity than can be achieved based on sequence alone. The effectiveness of this type of recognition is manifested in the 60 to 80-fold difference in cleavage rate between active and inactive RAS. In this sense inactive RAS serves as an internal control. To be useful in a cell, a proteinspecific protease must selectively destroy the target protein and not many competing substrates. Experiments with RAS-specific proteases in cells show this to be the case. Significant depletion of RAS can be achieved without any apparent effect on cell viability or noticeable degradation of endogenous proteins.
In conclusion, the principles presented here are general and can be applied to many target proteins. This includes proteins involved in aberrant signal transduction but also includes foreign proteins involved in cell invasion. The process of creating new proteinspecific proteases begins with matching the specificity of an existing protease with changes in local or global stability in a desired target protein. It culminates with designing-evolving the protease to match the new target sequence and cofactor environment.
Kinetic measurements. Synthetic peptide-AMC (7-amino-4-methylcoumarin (AMC, Ex: 350 nm, Em: 450 nm) was purchased from AnaSpec Inc. Concentrations of the AMC substrates were determined by absorbency at 324 nm using an extinction coefficient of 16 mM −1 cm −1 . Reaction kinetics of AMC substrates were measured using a KinTek Stopped-Flow Model SF2001 (Ex: 380 nm, Em: 400 nm cutoff filter). Kinetic data were fit using KinTek Global Explorer software obtained from the KinTek Corporation website (www.kintek-corp.com). Kinetic measurements of longer reactions, after manual mixing, were determined using a BioTek Synergy HT plate reader.
Statistics and reproducibility. Highly pure (≥98%) protease was used for all kinetic experiments. Kinetic parameters were determined using at least five substrate concentrations for each experiment. Determinations of were made at least three times for each substrate-enzyme-cofactor combination. Deviations in k cat /K M were < 5% in independent determinations.
Protein expression and purification of RAS. The genes for human HRAS (amino acids 1-166) were cloned into the vector pPal8 16 , which encodes an engineered subtilisin pro-sequence as an N-terminal fusion domain. The resulting fusion proteins were produced in E. coli and purified using an affinity-cleavage tag system, which we developed 16,75 . A commercial version of the purification system is available through Bio-Rad Laboratories (Profinity eXact Purification System). Exchange of GDP in recombinant preps 76 for GMPPNP was performed using EDTA to accelerate nucleotide dissociation and exchange 77 .
Cell lines and plasmid constructs. HEK 293T cells 78 were purchased from ATCC. ORFs for protease clones were synthesized by Genscript (Piscataway, NJ) and subcloned into pLVX-TetOne-Puro. The eGFP-KRAS plasmid was provided by the RAS Initiative (Frederick, MD).
Analysis of RAS cleavage in cells. HEK 293T cells were seeded into 6-well plates at a density of 1.2 × 10 6 cells per well in DMEM plus 10% Tetracycline-free FBS. Sixteen hours after seeding, cells were transfected with 1.25 µg of each plasmid using Lipofectamine 3000 (Thermo Fisher) according to the manufacturer's protocol. Transfected cells were incubated for 24 hr at 37°C in a humidified incubator providing 5% CO 2 . Twenty-four hours after transfection, expression of the protease was induced by addition of doxycycline to fresh media at a final concentration of 2 µg/mL, followed by incubation at 37°C for an additional 48 hr.
NMR spectroscopy. NMR spectra of RAS-GDP and RAS-GMPPNP were recorded on a Bruker Avance III 600 MHz spectrometer fitted with a cryoprobe. Samples for three-dimensional experiments were approximately 0.25 mM RAS in 20 mM HEPES, 50 mM NaCl, 5 mM MgCl 2 , and 1 mM TCEP, pH 7.4. All assignment experiments were collected at 25°C. The following standard three-dimensional heteronuclear experiments were acquired: HNCACB, CBCA(CO)NH, HN(CA)CO, HNCO. Triple resonance experiments were performed with 25% non-uniform sampling 79 . Amide chemical shift perturbations between the GDP and GMPPNPbound forms of G12V-HRAS, Δδ total , were calculated from the equation Δδ total = [(W H Δδ H ) 2 + (W N Δδ N ) 2 ] 1/2 , where Δδ H and Δδ N are the proton and nitrogen shift differences for a given resonance, respectively, and W H = 1 and W N = 0.2 are weighting factors. Two-dimensional heteronuclear { 1 H}-15 N steady-state NOE experiments were collected with a gradient-selected, sensitivity-enhanced pulse sequence 80 . Spectra were recorded with and without saturation in an interleaved manner employing a 5 s recycle delay. The heteronuclear NOEs were calculated from the NOE_on/NOE_off ratio and standard deviations were obtained from measured background noise levels. 15 N R 1 and R 2 measurements were carried out using gradient-selected, sensitivity-enhanced 2D 1 H-15 N HSQC experiments 80 with water flip-back modifications for solvent suppression. 15 N R 1 experiments were acquired with variable delay times of 10, 150, 300, 400, 600, 900, and 1200 ms. 15 N R 2 experiments were acquired with delay times of 8,16,24,32,40,48,64,80,96, and 130 ms. R 1 and R 2 values were obtained from single exponential decay fitting with error estimates for R using Sparky. Generalized order parameters (S 2 ) were extracted from the 15 N relaxation data utilizing the Modelfree program 81 . Spectra were processed using NMRPipe 82 and analyzed with Sparky 83 .
Crystallization, data collection, and processing. Purified RASProtease(I) was concentrated to 7 mg/mL for use in crystallization screening. The best crystals from our screens were obtained in a condition containing 0.1 M Bis-TRIS propane pH 8.5, 0.2 M KSCN, and 20% PEG 3350. Crystals appeared overnight and grew to a maximum size after 2-3 days. These crystals belong to space group P4 1 2 1 2 and have unit cell dimensions a = b = 58.6 Å, c = 124.8 Å, α=β=γ=90°. Native data up to 1.7 Å resolution were collected at 100 K using in-house X-ray diffraction resources at a wavelength of 1.5418 Å.
For the RASProtease(I)-YSAM complex, RASProtease(I) crystals were soaked overnight in mother liquor supplemented with the YSAM peptide at a final concentration of 2.1 mM. RASProtease(I) crystallizes in a variety of conditions and for the YSAM complex, the mother liquor for the YSAM complex contained 0.1 M TRIS-HCl pH 8.0, 20% PEG 6 K, 0.2 M NaCl. Crystals were transferred to the crystallization condition supplemented with 15% glycerol and 2.1 mM YSAM peptide and flash frozen in liquid nitrogen prior to data collection. Native data up to 1.2 Å resolution were collected at Stanford Synchrotron Radiation Laboratory (SSRL) beamline 12-2. The data were reduced using mosflm 84 and Aimless 85 from the CCP4 program suite 85 . A similar procedure was used to generate the RASProtease(I) -QEEYSAM complex. The mother liquor for these crystals was 0.1 M HEPES pH 7.0, 20% PEG 6 K, 0.2 M NaCl. Data for the RASProtease(I)-QEEYSAM complex were collected to 1.63 Å using in-house X-ray diffraction resources.
Similarly, purified Protease1(N) was concentrated to 12 mg/mL in a buffer composed of 5 mM HEPES pH 7.0, 10 mM Azide, and 0.53 mM LFRAL pentapeptide. Initial crystallization screening at 17°C using in-house resources identified a number of potential conditions, the best of which was 30% PEG 8 K, 0.2 M NaCl, and 0.1 M Imidazole pH 8.0. This condition was subsequently refined to 0.1 M Imidazole pH 8.4, 0.2 M NaCl, and 25% PEG 8 K. Like the RASProtease(I) crystals, these crystals belong to space group P4 1 2 1 2 and have unit cell dimensions a = b = 58.7 Å, c = 125.7 Å, α=β=γ=90°. A crystal from this condition was cryoprotected using mother liquor mixed with glycerol to give a final glycerol concentration of 17%, cryocooled directly in the gaseous nitrogen stream of the Xray source, and diffraction data were collected up to 1.6 Å resolution. The data were reduced using the D*Trek package 86 .
Structure determination, model building and refinement. Initial phases for RASProtease(I) and the RASProtease(I) -YSAM complex were determined by molecular replacement using the program Molrep 87 using the coordinates of the subtilisin chain in a subtilisin-prosegment complex structure (PDB ID: 1SPB). Prior to molecular replacement, all solvent molecules and ions were removed from the search model. The structure was refined using Refmac5 88-92 from CCP4 program suite. Iterative cycles of model building using COOT [93][94][95][96] yielded structures with R work /R free of 0.14/0.18 for RASProtease(I), 0.09/0.11, for the RASProtease(I)-YSAM complex, and 0.15/0.18 for the RASProtease(I)-QEEYSAM complex. A summary of the refinement statistics for all structures is provided in Table S3.
Initial phases for the Protease1(N)-LFRAL complex were determined by using the program Molrep 87 using the coordinates of a previously determined subtilisin crystal structure (PDB ID: 3F49). Prior to molecular replacement, all solvent molecules and ions with B factors greater than 20 Å 2 were removed from the search model. The structure was refined using Refmac5 88-92 from the CCP4 program suite. Iterative cycles of model building using COOT 93-96 yielded a structure with R work /R free of 0.16/0.18. A summary of the refinement statistics for the structure is provided in Table 1. The refined structure was deposited in the PDB (Accession Code: 6UBE).

Analysis of KRAS cleavage in cells.
To ensure a relatively uniform amount of nitrite in cells transfected with the designed protease, 1 mM sodium nitrite was added to the media along with doxycycline. Following this incubation, the media in each well was replaced with PBS and the cells from each well were harvested. eGFP-KRAS and/or its cleavage products were isolated from the cells using a GFP-trap_A immunoprecipitation kit (Chromotek). Briefly, cells were lysed in 10 mM TRIS-HCl pH 7.5, 150 mM NaCl, 0.5 mM EDTA, 1% (v/v) Triton X-100 on ice according to the manufacturer's recommendations. After centrifugation to remove insoluble cell debris, the lysates were incubated with anti-GFP beads that had been equilibrated in 10 mM TRIS-HCl pH 7.5, 150 mM NaCl, 0.5 mM EDTA overnight with gentle agitation at 4°C. The beads were then washed three times with 10 mM TRIS-HCl pH 7.5, 150 mM NaCl, 0.5 mM EDTA. Following removal of the wash buffer, the beads were resuspended in SDS-PAGE running buffer and boiled for 10 min at 95°C prior to analysis by Western blotting. The same lysates were then used for anti-FLAG pulldowns using a FLAG immunoprecipitation kit (Sigma) according to the manufacturer's instructions. The wash buffer was 50 mM TRIS-HCl pH 7.5, 150 mM NaCl and the samples were boiled for three minutes at 95°C prior to analysis by Western blotting.
Microscopy. Six-well tissue culture plates (Thermo Fisher Scientific cat. No. 140675) were imaged with a Zeiss LSM 710 AxioObserver microscope with plate adapter (cat. no. 451353-0000-000). Eight bit images were taken with an EC Plan-NeoFluar 10x/0.30 M27 objective. A 16% power, 514 nm wavelength was used to excite GFP conjugated to protein. Four images from each well near the center were taken to minimize reflection and light artifacts at 24, 48 and 72 h post-transfection resulting in 24 images at each timepoint.
Quantifying fluorescent cells. A representative timepoint image was opened in Zen Lite (Blue Edition v. 2.6, Zeiss). The green channel histogram was set from default 255 to 50. This increased the brightness before being exported into.tiff format. Images in the timepoint folder were processed similarly using "Batch" Each was converted to black and white. Green channel was set to 300%. Next the images were flattened and inverted. Exported images were grayscale, 8-bit,.tiff files after using "Actions" (batch processing) in Photoshop. Images were imported into FIJI 1 (Image J, NIH) and counted with the "Analyze Particles" module. Size was set to 0.01-0.04 with circularity equal to 0.00-0.50. The settings were saved to a "Macro" and used as a batch process on all images in the folder. Prism (v8.0, GraphPad) was used to plot the counts. Additional information can be accessed at: https://www.protocols.io/view/quantifying-fluorescent-cells-in-mammalian-cell-ti-y2nfyde.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The refined X-ray structures are available in the PDB (Accession Codes: 6U9L, 6UAI, 6UAO). The source data for the charts and graphs in the figures are available as Supplementary data 1. All other data are available from the corresponding authors on reasonable request.