Engineering protein-specific proteases: targeting active RAS

We describe the design, kinetic properties, and structures of engineered subtilisin proteases that degrade the active form of RAS by cleaving a conserved sequence in switch 2. RAS is a signaling protein that, when mutated, drives a third of human cancers. To generate high specificity for the RAS target sequence, the active site was modified to be dependent on a cofactor (imidazole or nitrite) and protease sub-sites were engineered to create a linkage between substrate and cofactor binding. Selective proteolysis of active RAS arises from a 2-step process wherein sub-site interactions promote productive binding of the cofactor, enabling cleavage. Proteases engineered in this way specifically cleave active RAS in vitro, deplete the level of RAS in a bacterial reporter system, and also degrade RAS in human cell culture. Although these proteases target active RAS, the underlying design principles are fundamental and will be adaptable to many target proteins.

Figure 1. A: Structure of RAS with a bound GTP analog (PDB code 6Q21) (4,5), highlighting the YSAM site in switch 2. B: RAS-specific protease based on the X-ray structure 3BGO.pdb (6). The cognate sequence QEEYSAM-RD is modeled in the binding cleft. Substrate residues are denoted P1 through P5, numbering from the scissile bond toward the N-terminus of the substrate. The substrate amino acid on the leaving group side is denoted P1'.

Examination of several crystal structures of active RAS identified considerable conformational heterogeneity in switch 2 induced by a γ-phosphate on the nucleotide cofactor (4,5,7). We hypothesized that these structural changes would uncover a cryptic cleavage site, making the QEEYSAM target sequence more vulnerable to proteolysis in active RAS than in the inactive form. Motifs of this type generally occur in amphipathic helices because most of the amino acids have high α−helical propensity and also because the spacing between the large hydrophobic amino acids (Y and M) matches a helical periodicity.
Natural subtilisins have broad sequence specificity, but there is a wealth of information about engineering mutations that alter specificity (8-15). The major challenge to designing high-specificity proteases, however, is that specificity based on differential substrate binding falls far short of that of natural processing proteases. In many natural proteases the cognate sequence influences the chemical steps in peptide hydrolysis and not just binding steps (16). To engineer highly sequence-specific subtilisins we combined two previous observations: 1) mutations at remote binding pockets for substrate side chains (sub-sites) can distort the subtilisin active site (10,12), and 2) mutating a catalytic amino acid in an enzyme can allow chemical rescue with a small-molecule cofactor (6,17). We leveraged these observations to create an engineered protease that requires both conformational and chemical rescue by the substrate and cofactor, respectively, in order to achieve high levels of activity. In particular, for the fully engineered RAS-specific proteases described here, binding of the cognate QEEYSAM sequence results in structural changes that are transmitted from the binding pockets to the active site and enable cofactor activation. In contrast, binding of non-cognate sequences adversely affects the active site and in fact antagonizes cofactor binding. These cofactor-dependent subtilisins strongly prefer dynamic regions of proteins (such as switch 2 in active RAS) over structured regions because efficient cleavage requires interactions with P5 to P1' amino acids in an extended conformation (Fig. 1B).
Engineered proteases were tested in vitro, in engineered E. coli, and in human cell culture. Major findings are the following: 1) Engineered proteases cut the QEEYSAM sequence in a synthetic peptide substrate with high specificity and are tightly controlled by the cognate cofactor; 2) X-ray crystal structures of protease-substrate-cofactor complexes reveal interactions involved in substrate recognition and cofactor activation; 3) The QEEYSAM sequence in native RAS is cut in response to the cognate cofactor; 4) RAS-specific proteases cut active (GTP) RAS 60-80 times faster than the inactive (GDP) form; 5) NMR analysis reveals that the QEEYSAM sequence is more dynamic in active RAS than in the inactive form; 6) The level of RAS within E. coli cells can be regulated by co-expression of the RAS gene with different protease and cofactor combinations; 7) A RAS-specific protease can destroy RAS in mammalian cells.

Protease engineering
The engineering process to develop specificity for the RAS sequence QEEYSAM involved extensive modification of the Bacillus protease, subtilisin BPN' (6,17-24), a canonical serine protease in which the scissile peptide bond is attacked by a nucleophilic serine (S221). The nucleophilicity of S221 results from its interactions with the catalytic histidine (H64) and aspartic acid (D32) that together form a charge relay system (25). Most substrate contacts are with the first five amino acids on the acyl side of the scissile bond (denoted P1 through P5, numbering from the scissile bond toward the N-terminus of the substrate) (26) and the first amino acid on the leaving group side (denoted P1', Fig. 1B). Corresponding sub-sites on subtilisin are denoted S1', S1, S2, etc. Natural subtilisins have a strong preference for a hydrophobic amino acid at the S1 and S4 sub-sites (Fig. 1B), but little discrimination for a particular hydrophobic amino acid (27,28). The key to engineering high sequence specificity was linking interactions in amino acid sub-sites to the rate of the first chemical step (acylation). In our design strategy, we considered two well-documented properties of proteases. The first is that certain mutations at sub-sites alter the conformation of remote catalytic residues (10,12). The second is that mutating a catalytic amino acid radically decreases activity of an enzyme (29,30) but in certain cases allows some chemical rescue of activity by an exogenous small molecule that mimics the mutated amino acid (31-33). The design strategy was based on the hypothesis that a linkage between substrate binding and chemical rescue can be created by mutating sub-sites to optimize interactions with the desired cognate sequence and combining these with mutation of a catalytic amino acid.
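The P/P' numbering used throughout can be made concrete with a short sketch. The function name `label_subsites` and the hyphen-as-scissile-bond convention are illustrative assumptions taken from the way the cognate sequence is written in this paper (QEEYSAM-RD); they are not part of any published tool.

```python
def label_subsites(cleavage_motif: str) -> dict[str, str]:
    """Map each residue of a motif such as 'QEEYSAM-RD' to its P/P' label.

    The hyphen marks the scissile bond. Residues to its left are numbered
    P1, P2, ... moving toward the N-terminus; residues to its right are
    numbered P1', P2', ... moving toward the C-terminus.
    """
    acyl_side, leaving_side = cleavage_motif.split("-")
    labels = {}
    # Walk the acyl side backwards so the residue next to the bond is P1.
    for offset, residue in enumerate(reversed(acyl_side), start=1):
        labels[f"P{offset}"] = residue
    # Walk the leaving-group side forwards so the first residue is P1'.
    for offset, residue in enumerate(leaving_side, start=1):
        labels[f"P{offset}'"] = residue
    return labels


ras_labels = label_subsites("QEEYSAM-RD")
for position in ("P5", "P4", "P3", "P2", "P1", "P1'"):
    print(position, ras_labels[position])
# P5 E
# P4 Y
# P3 S
# P2 A
# P1 M
# P1' R
```

Read this way, the S4 pocket engineering targets the tyrosine at P4, and the S1 pocket accommodates the methionine at P1.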
To test the robustness of this hypothesis we generated two different active site variations: 1) D32G variants that are activated by nitrite or azide; 2) H64G variants that are activated by imidazole (6,17). Initial design was based on the X-ray structure of an inactive (D32A, S221A) variant of subtilisin (3BGO.pdb) (6).
The structure shows the enzyme bound to a cognate peptide (LYRAL-SA) and azide in the position normally occupied by the catalytic D32. The substrate binding pockets and the azide binding site form an interconnected network (Fig. S1) (9,12,34,35). The theory is that binding at one sub-site can influence interactions in other parts of the network. The desired cognate sequence QEEYSAM-RD for RAS was modeled into the binding cleft of the 3BGO.pdb structure. Amino acid substitutions were introduced into the model and protein-protein and protein-solvent interactions were evaluated by visual inspection (36). Based on this analysis, as well as earlier engineering work, we designed mutations in the catalytic region, the S1 pocket, and the S4 pocket to create a nitrite-dependent (D32G) protease and an imidazole-dependent (H64G) protease (Table 1). These are denoted Protease1(N) and Protease1(I), respectively.
These mutants were expressed, purified, and characterized for activity and specificity using the substrate series that was originally used to characterize the progenitor protease: sDXKAM-AMC, where X = Y, F, I, or L (6,17,22,24). AMC is the fluorogenic leaving group, 7-amino-4-methylcoumarin. Activity of both proteases is highest for P4 = F, with lower activity for P4 = Y, I, and L (Fig. S2). The X-ray crystal structure of Protease1(N) was determined in complex with a cognate peptide, LFRAL. We used the Protease1(N) structure to model additional mutations to increase specificity for P4 = Y.
*Engineered proteases have mutations at 18 positions in addition to those specified above (6).
On the basis of this analysis we introduced V107I and L135V mutations in the S4 pocket. This created the second pair of mutants: Protease2(N) and Protease2(I) (Table 1).
These variants exhibited considerable preference for the substrate sDYKAM-AMC. Thus, when [...] Finally, we examined variations to increase activity for P3 = S and P5 = E. Protease mutants at positions 101 (S, K, R) and 103 (Q, R) were evaluated using the peptide substrates QEXYSAM-AMC, where X = E, R, I, or L, and QEEYXAM-AMC, where X = S, R, or E. The S101K mutants gave a high preference for the desired cognate sequence QEEYSAM. The S101K mutants were further analyzed for activity using the peptide substrate QEEYSAM-AMC.

Fig. S3 compares kcat/KM as a function of nitrite concentration for Protease2(N) and the S101K mutant. The figure shows that the S101K mutation increases kcat/KM sixfold in 1 mM nitrite.
The two S101K mutants are denoted as the RAS-specific proteases RASProtease(N) and RASProtease(I).

Structural Analysis of Protease Interactions with the Switch 2 Target Sequence
To understand the structural basis for specificity and cofactor activation, we determined crystal structures of RASProtease(I) alone, in complex with YSAM and QEEYSAM product peptides, and with imidazole (Fig. 4, S4). We compared these structures with the crystal structure of Protease1(N) in complex with a peptide corresponding to its sequence specificity (LFRAL, Fig. 4B). Overall the structures are very similar, with an RMSD between Cα carbons of 0.17 Å. The aromatic ring of the P4 side chain has common van der Waals interactions: the Cα and Cβ atoms of A104 interact with Cε1, the Cδ1 of L126 interacts with Cδ2, and Cβ of S128 interacts with Cδ2 and Cε2. Because of the space created in the S4 pocket by the L135V mutation, either a water or an ion coordinates the hydroxyl of P4 Tyr, the hydroxyl of Y171, and the backbone nitrogen of S132 in RASProtease(I) (Fig. 4B, Fig. S4). We currently have this modeled as a chloride ion based on the peak height observed in an anomalous difference Fourier map; however, added chloride does not appear to influence kinetic properties. P4 interactions are facilitated by a significant shift of the loop containing residues 130 to 133 of RASProtease(I) relative to the Protease1(N) complex. In fact, these are the only residues in the entire structure whose Cα positions shift by more than 1 Å, including the N- and C-termini. This structural remodeling reflects what we intended (i.e., creating a stable state that only exists when substrate is bound), but the mode of conformational rescue is not what we expected.
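The comparison above rests on two quantities: a global Cα RMSD and a per-residue Cα shift with a 1 Å cutoff. The following is a minimal sketch of both, assuming the two structures have already been superposed and that Cα coordinates are keyed by residue number. The coordinates are invented for illustration and are not taken from the deposited structures; the residue numbers are borrowed from the text only as labels.

```python
import math

def ca_shifts(coords_a, coords_b):
    """Per-residue Calpha displacement (in Angstrom) for residues present in both structures."""
    shared = sorted(set(coords_a) & set(coords_b))
    return {resid: math.dist(coords_a[resid], coords_b[resid]) for resid in shared}

def rmsd(shifts):
    """Root-mean-square of the per-residue displacements."""
    return math.sqrt(sum(d * d for d in shifts.values()) / len(shifts))

# Hypothetical, pre-superposed Calpha coordinates (Angstrom), for illustration only.
structure_a = {104: (0.0, 0.0, 0.0), 126: (3.8, 0.0, 0.0),
               131: (7.6, 0.0, 0.0), 132: (11.4, 0.0, 0.0)}
structure_b = {104: (0.1, 0.0, 0.0), 126: (3.8, 0.2, 0.0),
               131: (7.6, 1.2, 0.0), 132: (11.4, 0.0, 1.5)}

shifts = ca_shifts(structure_a, structure_b)
overall_rmsd = rmsd(shifts)
moved_residues = [resid for resid, d in shifts.items() if d > 1.0]

print(f"RMSD over {len(shifts)} Calpha atoms: {overall_rmsd:.2f} A")
print("Residues shifted by more than 1 A:", moved_residues)
```

A real analysis would first superpose the structures (for example by a least-squares fit) before measuring displacements. Because RMSD averages over all atoms, a local rearrangement of a short loop can coexist with a very small global value, so the per-residue listing is the more informative of the two here.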
In contrast to P4, the changes to the P3 site are more limited (Fig S5). Mutation of S101K increases activity with Ser at the P3 position. Hydrogen bonding interactions between the backbone nitrogen and carbonyl oxygen of the P3 residue and the corresponding partners on residue G127 are conserved.
The structures also provide insight into the structural basis for cofactor activation by both nitrite in Protease1(N) and imidazole in RASProtease(I). In both cases a catalytic amino acid is mutated to Gly and the vacated space is occupied by a network of conserved water molecules in the absence of the cognate cofactor. Nitrite was modeled into the solvent network of the Protease1(N) structure (replacing HOH 54; Fig. S6A) such that it supplies the critical H-bond to the catalytic H64 and is coordinated to N33 and three conserved waters (23, 302, and 382).
The coordination sphere of nitrite in Protease1(N) appears to be more complete than that observed for azide in the parent enzyme (3BGO) and suggests why nitrite is tightly bound in spite of its lower pKa (3.37, compared to 4.72 for azide). Structures of RASProtease(I) without imidazole have three conserved waters (19, 81, and 122) that interact with Oδ1 and Oδ2 of D32, CO of S125, NH of G64, Oγ of S62, and Oγ of S221. When imidazole binds, these waters are displaced, the imidazole nitrogens H-bond to Oδ1 and Oδ2 of D32 and Oγ of S221, and the charge relay system is reconstituted (Fig. S7, S6B). An additional interesting feature of the structure of RASProtease(I) with QEEYSAM was the presence of an acyl adduct between the C-terminus of the P1 Met of the peptide and the Oγ of S221 of the enzyme (Fig. S4). Apparently the binding energy of the QEEYSAM peptide is sufficient to push its terminal carbon and Oγ of S221 into an orientation that drives the equilibrium from a product complex to the acyl enzyme (37). Studying these types of subtle changes to the water structure in the active site that occur upon substrate and cofactor binding will allow us to make further changes to promote chemical and conformational rescue.

Cleavage of RAS(GDP) and RAS(GMPPNP)
The next step in assessing our RAS-specific proteases was to monitor cleavage of the intact protein, wherein switch 2 would adopt a range of conformations from extended to helical. Moreover, measuring protein cleavage allowed us to assess the relative rates of cleavage for active versus inactive RAS, thereby testing our hypothesis that active RAS is more vulnerable to proteolytic attack due to increased dynamic motion in switch 2. Reactions of RAS(GDP) and RAS(GMPPNP) with RASProtease(I) were also compared (data not shown). In summary, RAS(GMPPNP) cleavage is always faster than that observed for RAS(GDP), but detailed kinetic analysis is difficult because of limitations in quantitation of bands on a gel. Analysis by MALDI confirms that the enzyme cleaves RAS after the QEEYSAM site with no off-target cleavages (Fig. S8). No cleavage of RAS is observed in the absence of a cognate cofactor. [...] (Table 2).
NMR analysis of cleaved RAS shows that both r1 and r2 are disordered (Fig. S9). Independent measurements of binding of uncleaved RAS to protease were made by gel filtration in the absence of cofactor (Fig. S10).
KinTek Explorer was used to fit all data to mechanism 1 (Fig. 6, Table 2). Values of k2/KS were calculated to compare specificity for active RAS, independent of the r1 product dissociation rate (Table 2). The analysis shows that k2/KS for RASProtease(I) in 1 mM imidazole is ~60 times higher for RAS(GMPPNP) than for RAS(GDP). Similar measurements and global fits were also made for RASProtease(N) in 1 mM nitrite (Fig. S13), with similar specificity (80-fold) for RAS(GMPPNP). [...] KP is ± 5%.
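The fold preference quoted above is a ratio of two k2/KS values. The sketch below shows the arithmetic only; the rate and binding constants are hypothetical numbers chosen so that the ratio comes out near 60, and they are not the fitted values of Table 2.

```python
def specificity_constant(k2: float, ks: float) -> float:
    """Return k2/KS in M^-1 s^-1, given k2 in s^-1 and KS in M."""
    if ks <= 0:
        raise ValueError("KS must be positive")
    return k2 / ks

# Hypothetical parameters for illustration only; not the fitted values.
HYPOTHETICAL_FITS = {
    "RAS(GMPPNP)": {"k2": 0.60, "ks": 2.0e-6},
    "RAS(GDP)": {"k2": 0.02, "ks": 4.0e-6},
}

constants = {
    form: specificity_constant(**fit) for form, fit in HYPOTHETICAL_FITS.items()
}
fold_preference = constants["RAS(GMPPNP)"] / constants["RAS(GDP)"]

for form, value in constants.items():
    print(f"{form}: k2/KS = {value:.2e} M^-1 s^-1")
print(f"Preference for the active form: {fold_preference:.0f}-fold")
```

Because k2/KS contains only the binding and acylation terms, the comparison is unaffected by how slowly the r1 product leaves the enzyme. In this invented example the preference arises mostly from k2 (30-fold) and partly from KS (2-fold); which term dominates for the real enzymes must be read from the fitted parameters.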

Analysis of the target region in RAS by NMR
To better understand how dynamics in switch 2 contribute to RAS protease specificity for the active form, we examined the dynamics of this region by NMR. An order-to-disorder transition of switch 2 has been previously observed in RAS crystal structures (38,39). Two-dimensional 1H-15N HSQC spectra indicate extensive structural differences between the GDP- and GMPPNP-bound forms of RAS (Fig. S11A,B). NMR backbone resonance assignments were made using standard procedures and deposited in [...] also considerably more flexible on the ps-ns timescale than the GDP-bound form (Fig. S11C, Fig. S12). Thus, the effect of GMPPNP binding is to increase main chain flexibility for a large number of residues in the molecule over a wide timescale range, making the GMPPNP- and GTP-bound states, and the switch regions in particular, more susceptible to proteolytic cleavage than the GDP-bound form.

Co-expression of RAS with RASProtease(I) in E. coli
To measure activity and specificity in cells, we developed a bacterial system for co-expressing RAS with RAS-specific proteases. We constructed genes for RAS fusion proteins (Fig. S14) (51). Growth was at 37 °C with 100 µM imidazole added to the culture medium (Fig. 7). Fig. S16 shows a control expression time course without imidazole. The gel patterns show that RAS cleavage is dependent on imidazole and coupled to the degradation of the inhibitory domain of the protease zymogen, with RAS fragments r1 and r2 appearing as intact inhibitor decreases. By 41 hrs in imidazole, all inhibitor is cleaved and greater than 50% of RAS is cleaved (Fig. 7). The gel pattern also shows that RAS is specifically cleaved into two discrete fragments. We confirmed that RAS was cleaved after the QEEYSAM sequence by purifying the r1 fragment from the E. coli extract and measuring its mass by MALDI. There is no indication in the gel pattern of E. coli proteins being degraded. The fact that ~50% of RAS remains intact after 41 hrs is consistent with a high RAS protease preference for active RAS in the cell. Newly synthesized RAS predominantly binds GTP and therefore initially exists in the dynamic, active conformation. RAS(GTP) then converts to RAS(GDP) at a rate of ~1 hr⁻¹ (52-54). Because E. coli lacks GDP-GTP exchange factors, we assume that RAS remains in the GDP-bound form after hydrolysis and is less vulnerable to cleavage. After new RAS synthesis stops, RAS accumulates in the inactive and partially protease-resistant form. Unlike the case with imidazole, we are not able to precisely control nitrite concentration because it is an intermediate in metabolic pathways involving nitrogen in E. coli (55,56). Nevertheless, RAS cleavage by nitrite-activated proteases is potentially informative because nitrite is a disease marker in eukaryotic cells and its concentration in E. coli is likely similar to that in cancer cells (55,56). RASProtease(N) appears to be more active in E. coli with endogenous nitrite than RASProtease(I) in 100 µM imidazole. Even so, a cleavage-resistant population of RAS remains after 48 hrs. We presume this is RAS(GDP).

Engineered proteases can destroy active RAS in mammalian cells
[...] (Fig. 8A, lane 1). Induction of expression of the pro-protease with doxycycline results in a marked depletion of eGFP-RAS and a corresponding increase in a band consistent with the eGFP protein from the fusion (Fig. 8A, lane 2). This change in migration does not occur with an inactive mutant of RASProtease(N) (S221A) with or without doxycycline (Fig. 8A, lanes 3 and 4). Probing the same samples with an anti-RAS antibody (Ras10, ThermoFisher) confirms that this band contains an eGFP-RAS fusion and further shows that RAS disappearance coincides with the appearance of the eGFP product (Fig. 9A, lanes 1 vs. 2). As expected, primary processing of the RASProtease(N) zymogen (34.9 kDa) to the mature protease (26.4 kDa) occurs readily in cells (Fig. 8A, lane 2). Primary processing is not observed for the S221A mutant (compare Fig. 8A, lanes 2 and 4). In addition, fluorescence microscopy images show a marked decrease in eGFP signal upon induction of the active protease but not the inactive protease (Fig. 8B).
Disappearance of the eGFP-RAS fragment indicated that, as in E. coli, cleaved RAS is further degraded by cellular proteases.

DISCUSSION
Our goal in this work was to develop principles for engineering protein-specific proteases and to apply these principles to target active RAS. The key to engineering high sequence specificity in a protease is linking binding at sub-sites with chemical steps in peptide bond hydrolysis. This was accomplished by exploiting two facts about enzymes: 1) mutations at remote sub-sites can affect the conformation of catalytic amino acids; 2) a change in the conformation of the catalytic region affects chemical rescue of active site mutants. High specificity occurs when cognate sequence binding at sub-sites is compatible with productive binding of cofactor but binding of incorrect sequences antagonizes cofactor binding.
Engineering specificity using substrate and chemical rescue. To understand the engineering process, it is useful to consider some basic structural features of subtilisin proteases. Subtilisin has a cardioid shape with the active site and substrate binding pockets forming a cusp that divides the N-terminal and C-terminal domains (Fig. 9) (28).

[Figure 9: subtilisin structure with the N-terminal domain, C-terminal domain, and bound QEEYSAM substrate labeled]
The catalytic D32 and H64 and the P2, P3, and P5 sub-sites are primarily associated with the N-terminal domain (Fig. 4A). The catalytic S221 and P1, P4 and P6 sub-sites are primarily associated with the C-terminal domain. Thus the substrate intercalates between the two domains, bridges the interface, and affects the association between the two. Subtilisin specificity in general can be understood in terms of a model in which the domain interface is either in a deformed, low-activity conformation or in a canonical active conformation. When the domain interface is stable, the substrate adapts to the enzyme and specificity is broad. This has also clearly been shown for α−lytic protease which resists deformation of catalytic amino acids even as binding pockets conform to bind a range of substrate sequences (57). When the stability of the domain interface of a protease is decreased by mutation, however, the enzyme conforms to the substrate and activity will be low unless the substrate fit is precise and promotes the active conformation (i.e. in part by reestablishing the native interface between domains). Because individual sub-sites and the catalytic triad are interconnected, distortion in one area affects the other. Rheinnecker et al. (10,12) have shown previously that certain mutations in the S4 pocket of subtilisin, particularly those that form cavities, adversely affect activity of even small substrates that do not interact at S4. This led to an increase in specificity for peptide substrates with L or F over A at P4. To create RAS-specific proteases, we combined traditional sub-site engineering approaches with mutations that destabilize the domain interface.
The best mutations weaken the interface between the two domains but generate favorable interactions with a cognate substrate and/or the cofactor. The favorable interactions rescue the active site conformation at the interface resulting in high activity. For example, removing a catalytic residue (e.g. D32G or H64G) reduces activity both by eliminating an element of the charge relay system and weakening the domain interface, but also creates potential for rescuing the active conformation and chemistry by the cognate cofactor and substrate. Likewise, enlarging the S4 pocket to accommodate tyrosine causes instability that is transmitted to the weakened catalytic site. Binding of the cognate substrate with a P4 tyrosine rescues the active conformation, however, by supplying stabilizing interactions in the S4 pocket that are transmitted to the catalytic region and promote productive binding of cofactor. The magnitude of cofactor activation depends on the population of the active conformation in the apo enzyme relative to its population with substrate and co-factor bound. This is a delicate balance. If the domain interface is too stable, a mutant will be fast but non-specific. If the interface is too unstable, however, a mutant will be specific but slow. To produce high specificity and activity, the energies of the inactive and active conformations must be close enough that cognate binding to sub-sites significantly populates the active form (58). This creates a critical state in which both cofactor binding and cognate sub-site interactions become required for activity.
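The balance described above can be illustrated with a two-state Boltzmann model. This is a deliberately simplified sketch, not the analysis used in this work: it assumes that activity is proportional to the population of the active conformation, that a cognate substrate plus cofactor stabilizes the active state by a fixed amount (`DDG_COGNATE`, an invented value), and that a non-cognate substrate contributes no stabilization. All energies are hypothetical.

```python
import math

RT = 0.593  # kcal/mol at about 298 K

def active_fraction(dg: float) -> float:
    """Population of the active state, where dg = G(active) - G(inactive) in kcal/mol."""
    return 1.0 / (1.0 + math.exp(dg / RT))

# Hypothetical stabilization of the active state by cognate substrate plus cofactor.
DDG_COGNATE = -3.0  # kcal/mol

def activity_and_specificity(dg_interface: float):
    """Return (relative activity on cognate, cognate/non-cognate activity ratio)."""
    p_noncognate = active_fraction(dg_interface)
    p_cognate = active_fraction(dg_interface + DDG_COGNATE)
    return p_cognate, p_cognate / p_noncognate

# Three hypothetical enzymes differing only in domain-interface stability.
scenarios = {
    "stable interface": -3.0,
    "balanced interface": 2.0,
    "unstable interface": 7.0,
}
results = {name: activity_and_specificity(dg) for name, dg in scenarios.items()}

for name, (activity, specificity) in results.items():
    print(f"{name:>20}: activity {activity:.4f}, specificity {specificity:.1f}-fold")
```

With these invented numbers the stable interface gives nearly full activity but almost no discrimination, the unstable interface gives the largest discrimination but very little activity, and only the intermediate case is both active and selective. In the real enzymes non-cognate sequences also antagonize cofactor binding, which this sketch ignores and which would increase the discrimination further.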
These proteases have very low activity against all substrates in the absence of cofactor, but allow rescue of the active conformation by the cognate sequence with its cofactor. [...] (65,66). This occurs because disease-induced nitric oxide synthase produces nitric oxide (NO) and NO quickly oxidizes and accumulates as nitrite.
Using the conformation of the target protein to increase specificity. Primary structure alone is generally insufficient for encoding protease specificity. For example, caspases are considered highly specific, but it is not possible to predict their natural protein targets from their cleavage patterns on small peptides (67,68). Thus, a critical element of specificity is discrimination between different conformations of the same sequence. To do this, we chose a target sequence that is partially exposed in active RAS but is typically found in amphipathic α-helices and resistant to proteolysis when it occurs in other proteins. NMR analysis indicated extensive structural changes between the GDP- and GMPPNP-bound forms of RAS and increased mobility of the QEEYSAM sequence in RAS(GMPPNP). Specificity is therefore governed by both the correct primary structure and dynamic changes in the secondary structure of the target region. The additional information from conformation allows much higher specificity than can be achieved based on sequence alone. The effectiveness of this type of recognition is manifested in the 60- to 80-fold difference in cleavage rate between active and inactive RAS. In this sense inactive RAS serves as an internal control. To be useful in a cell, a protein-specific protease must selectively destroy the target protein and not the many competing substrates. Experiments with RAS-specific proteases in cells show this to be the case.
Significant depletion of RAS can be achieved without any apparent effect on cell viability or noticeable degradation of endogenous proteins.
In conclusion, the principles presented here are general and can be applied to many target proteins. This includes proteins involved in aberrant signal transduction but also includes foreign proteins involved in cell invasion. The process of creating new protein-specific proteases begins with matching the specificity of an existing protease with changes in local or global stability in a desired target protein. It ends with designing-evolving the protease to match the new target sequence and cofactor environment.

Kinetic measurements
Initial characterization of different protease-inhibitor combinations was carried out using a KinTek Stopped-Flow Model SF2001. Kinetic measurements of longer reactions, after manual mixing, were made using a BioTek Synergy HT plate reader.

Protein Expression and purification of RAS
The genes for human HRAS (amino acids 1-166) were cloned into the vector pPal8 (17), which encodes an engineered subtilisin pro-sequence as an N-terminal fusion domain. The resulting fusion proteins were produced in E. coli and purified using an affinity-cleavage tag system that we developed (17), essentially as described in (71). A commercial version of the purification system is available through Bio-Rad Laboratories (Profinity eXact Purification System). Exchange of GDP in recombinant preps (72) for GMPPNP was performed as described in (73).

Cell lines and plasmid constructs
HEK 293T cells (74) were purchased from ATCC. ORFs for protease clones were synthesized by Genscript (Piscataway, NJ) and subcloned into pLVX-TetOne-Puro. The eGFP-KRAS plasmid was provided by the RAS Initiative (Frederick, MD).

NMR spectroscopy
NMR spectra of RAS-GDP and RAS-GMPPNP were recorded on a Bruker Avance III 600 MHz spectrometer fitted with a cryoprobe.

Crystallization, data collection and processing
Purified RASProtease(I) was concentrated to 7 mg/mL for use in crystallization screening. The best crystals from our screens were obtained in a condition containing 0.1 M Bis-TRIS propane pH 8.5, 0.2 M KSCN, and 20% PEG 3350. Crystals appeared overnight and grew to a maximum size after 2-3 days. These crystals belong to space group P4₁2₁2 and have unit cell dimensions a=b=58.6 Å, c=124.8 Å, α=β=γ=90°. Native data up to 1.7 Å resolution were collected using in-house X-ray diffraction resources. Additional details of all methods are given in Supplemental information.