Mapping protein selectivity landscapes using multi-target selective screening and next-generation sequencing of combinatorial libraries

Characterizing the binding selectivity landscape of interacting proteins is crucial both for elucidating the underlying mechanisms of their interaction and for developing selective inhibitors. However, current mapping methods are laborious and cannot provide a sufficiently comprehensive description of the landscape. Here, we introduce a novel and efficient strategy for comprehensively mapping the binding landscape of proteins using a combination of experimental multi-target selective library screening and in silico next-generation sequencing analysis. We map the binding landscape of a non-selective trypsin inhibitor, the amyloid protein precursor inhibitor (APPI), to each of the four human serine proteases (kallikrein-6, mesotrypsin, and anionic and cationic trypsins). We then use this map to dissect and improve the affinity and selectivity of APPI variants toward each of the four proteases. Our strategy can be used as a platform for the development of a new generation of target-selective probes and therapeutic agents based on selective protein–protein interactions.

A defining characteristic of protein-protein interactions (PPIs) is the binding selectivity landscape of the interacting proteins [1][2][3] , which relates the amino acid sequence to the affinity of a protein toward its target. Comprehensively mapping this landscape is crucial both for understanding the mechanisms and evolutionary origins of selective PPIs and for protein engineering purposes, e.g., for designing selective binders and/or inhibitors for target proteins [4][5][6][7] . The binding selectivity landscape of each protein in a certain PPI is characterized by the interfacial residues of the protein, such that point mutating these residues can help determine the contribution of each residue to target selectivity-or in a protein with a broad selectivity spectrum to the selectivity of the protein to each of its putative targets individually. The binding selectivity landscape usually comprises of four types of key interface residues. Hot-spot residues are a few 8 interface residues that are highly relevant for a specific PPI, i.e., they contribute almost 75% of the total free energy of binding (ΔΔG bind ) of the protein to its partner [9][10][11] . Mutating hot-spot residues therefore decreases the affinity of the protein to a specific partner-but not necessarily to others. Cold-spot residues 1,[12][13][14] are interface residues occupied by suboptimal amino acids, such that mutating them increases the binding affinity of the protein to a specific partner. Selectivity-switch residues 15,16 are interface residues in which a point-mutation simultaneously decreases the affinity of the protein to one partner and increases its affinity to another. Finally, correlated-selectivity residues 17,18 are interface residues that work together to increase the selectivity of the protein to one specific partner. Such residues are especially difficult to characterize with conventional methods because only a double mutation (one mutation in each residue) can change the affinity of the protein to a certain partner.
Methods for mapping protein selectivity landscapes typically include mutating candidate residues and testing the resulting changes in affinity 4,19 . Despite considerable advancements in recent years 8,[20][21][22] , currently available methods still demonstrate several caveats that hinder our ability to develop, inter alia, selective inhibitors for clinically important proteins. For instance, alanine scanning and similar classical approaches [23][24][25][26][27][28][29] can test only a subset of all possible mutants, are time-consuming and laborious, require protein purification and binding affinity measurements for each mutant, and most importantly, focus chiefly on hot-spot residues. Modern approaches can overcome some of these caveats by employing protein library display and sorting technologies, which rapidly explore all possible (hot-and coldspot) mutations and qualitatively map the contribution of each residue to the affinity of a protein toward its target [30][31][32][33][34][35][36][37] . However, as only several hundred clones of the sorted libraries are ultimately sequenced, these methods do not comprehensively characterize the entire library and, to date, they cannot identify correlated-selectivity residues. A more recent approach employed next-generation sequencing (NGS) to guide protein and synthetic small-molecule optimization 4,7,[38][39][40][41][42] , effectively improving the binding affinity and selectivity, and generating binding epitopes de novo 6,7,19,40,41,43,44 . However, this and most other currently available approaches generate high-affinity (but not necessarily selective) binders, and, in the few studies designed to generate selective binders 7,40,41,43,44 , the methodology was limited to improving discrimination between only two target proteins that have different binding epitopes. Significantly, some broadspectrum proteins may have many potential targets with binding epitopes that have high-sequence homology and structural similarity.
In the current study, we present a novel, single-step approach for comprehensively mapping the binding selectivity landscape of proteins (including hot-spot, cold-spot, selectivity-switch, and correlated-selectivity residues) using a combination of experimental multi-target selective library screening and in silico NGS analysis. To test our approach in a real-life context, we chose to map the binding selectivity landscape of a broad-spectrum trypsin inhibitor, namely, the human amyloid protein precursor inhibitor (APPI; a member of the human Kunitz-domain family of serine protease inhibitors 45 ), to each of the four human serine proteases-kallikrein-6 (KLK6), mesotrypsin, anionic trypsin, and cationic trypsin-all of which share high-sequence homology and structural similarity. Then, we used this landscape to improve both the selectivity and affinity of APPI variants to each protease, which we evaluated through inhibition studies using the purified proteins.
We recently used a yeast-surface display (YSD) platform-a powerful directed evolution protein engineering technology 30,46-50 -to generate APPI-3M 51 : a triple-mutant APPI (M17G/I18F/ F34V) whose affinity to mesotrypsin, anionic trypsin, and cationic trypsin is comparable [K i = 89.8 ± 0.23 pM, 1.47 ± 0.02 pM, and 4.96 ± 0.25 pM for mesotrypsin, anionic trypsin, and cationic trypsin, respectively 51 ], whereas its affinity to KLK6 is lower by three orders of magnitude [K i = 1.09 ± 0.12 nM 51 ]. These features render APPI-3M an optimal model scaffold for engineering binding selectivity; its lack of selectivity toward mesotrypsin, anionic trypsin, and cationic trypsin is a good starting point for manipulating its relative selectivity, while its lower selectivity toward KLK6 makes it a good target for engineering selectivity switches.
Our study design is demonstrated in Supplementary Figure 1. We began by generating a yeast-displayed APPI-3M library including clones with single-residue random mutations in the binding interface (i.e., in the APPI binding loop) and clones with multiple-residue random mutations both within and beyond the binding interface (i.e., in the APPI scaffold and binding loop). Then, we divided the four proteases into combinatorial pairs (six combinations) and sorted the YSD APPI-3M library for variants with differential selectivity toward each protease in each pair. We then used NGS to sequence these fractions and analyzed them computationally. Consequently, the sorted APPI-3M mutant library fractions were rich in affinity-and selectivity-enhancing mutations; of these, we identified the most highly selective APPI mutations based on their ability to inhibit-as soluble proteinseach of the four proteases. To the best of our knowledge, this is the first report of a platform that can provide such a rich PPI binding selectivity landscape.

Results
Selecting APPI variants with improved selectivity. We began by generating a library of APPI-3M clones using both site-directed random mutagenesis of the APPI-3M binding loop (residues 11-18, except invariant Cys-14) and error-prone PCR amplification of the entire coding sequence. This design yielded a 'naive' APPI-3M library of 3.5 million variants, each with 0-2 amino acid mutations. Then, using our YSD system, we expressed each of these variants on the surface of yeast cells and used fluorescence-activated cell sorting (FACS) to quantify their binding to each of the four (soluble) serine proteases (mesotrypsin, KLK6, anionic trypsin, and cationic trypsin). We introduced the yeast-displayed naive library to pairs of serine proteases, each labeled with a different fluorescent dye (Alexa Fluor-650 or Alexa Fluor-488; i.e., a pairwise selective screen, Fig. 1a), at concentrations optimized for each pair to achieve an equivalent distribution of staining intensities (Fig. 1b). The library was sorted to isolate ∼1 million variants per sorted fraction (sorting gate), with increased selectivity toward each of the four serine protease targets versus its paired protease. Subsequent Mapping hot and cold spots and selectivity switches. To map the binding selectivity landscape of APPI-3M to each of the four serine proteases, we used Illumina Miseq to perform a highthroughput sequencing of APPI-3M gene fragments from the sorted and naive libraries. We then used this sequencing data to identify single and double amino acid substitutions in the APPI-3M sequence that had modulated its selectivity toward each of the four serine proteases. In each sorted fraction, the average number of read pairs per sequenced library was 1 million; of these, 95% of the sequenced read pairs passed quality filtering and integration and were thus translated to amino acid sequences and aligned to the sequence of APPI-3M. Because we were only interested in amino acid substitutions (and not in insertions or deletions), we analyzed only sequences of the same length as that of APPI-3M and determined a threshold value of 100 reads for variants with a single-amino acid substitution and 10 reads for variants with a double-amino acid substitution. To correlate between the abundance of a variant and its target selectivity, we determined the enrichment ratio of each variant, which we defined as the frequency of a certain mutation in the sorted library fraction divided by the frequency of that mutation in the naive library. Thus, we assumed that mutations that increase the selectivity of each variant to its putative target (mesotrypsin, KLK6, anionic trypsin, or cationic trypsin) will be more abundant in the sorted library fraction than in the naive library (enrichment ratio >1), while mutations that decrease selectivity will be less abundant in the sorted library than in the naive library (enrichment ratio <1).
We first characterized the effect of single-amino acid substitution on target selectivity. To this end, we created a heatmap for each sorted library fraction ( Fig. 2 and Supplementary Figure 3), using the enrichment ratio as a measure of binding selectivity. Then, we used this map to identify (i) hot spots, defined as APPI-3M residues, in which most mutations decreased the binding selectivity to one target protease versus another, (ii) cold spots, in which most mutations increased the binding selectivity; and (iii) selectivity switches, in which a single mutation decreased the selectivity to one target protease and increased the selectivity toward another.
Our analysis revealed that residue 15 in APPI-3M is a general hot spot for binding human serine proteases, as all mutations in this residue, except a substitution to Lys, decreased its binding affinity toward all four proteases (  Table 1), as mutating the residue in position 11 (originally Thr) from His to Ile (Fig. 2a, d, white arrows) switched the selectivity from anionic trypsin to mesotrypsin by a factor of 69 × 10 3 and mutating the residue in position 17 (originally Gly) from Glu to Arg (Fig. 2b, c, black arrows) switched the selectivity from cationic trypsin to KLK6 by a factor of 7 × 10 3 .
Mapping correlated-selectivity residues. Next, we turned to identify the effects of double-amino acid substitutions in APPI-3M on the selectivity toward each of the four serine proteases. The first steps in this process (quality filtration and integration, translation, alignment, and enrichment ratio calculations) were similar to those described above for single-amino acid analyses. Most double-mutant APPI-3M variants that increased the selectivity toward one serine protease versus all others increased the selectivity toward KLK6 [note that the affinity of the parental APPI-3M to KLK6 was two orders of magnitude lower than to anionic and cationic trypsin and one order of magnitude lower than to mesotrypsin 51 ], and these variants had mutations in residues 11 and 17 (Supplementary Table 1). To elucidate the effects of correlated residues and of residues 11 and 17 ( Fig. 3c), in particular on the selectivity toward KLK6, we predicted the total effect of each pair of mutated residues (i.e., the effect of all mutations in these two residues; see Methods) and illustrated the results as heatmaps ( Fig. 3a and Supplementary Figure 4). Variants in which both residues 11 and 17 were mutated demonstrated an increased selectivity (enrichment ratio >1) toward KLK6 versus the three other proteases. Therefore, we generated additional heatmaps to estimate the effect of specific pair residues (all pair combinations of residues 11 and 17, Fig. 3b). These heatmaps (Fig. 3b) revealed that many combinations of doubleamino acid substitutions increased the selectivity toward KLK6 versus the three other proteases, including a combination of either Val, Ala, Pro, or Ser at residue 11 with either Ala, Arg, or Ser at residue 17 (enrichment ratio >1). For instance, the combination of Val at residue 11 and Arg at residue 17 increased the total selectivity of APPI-3M toward KLK6 by a factor of 4 × 10 9 (calculated as the multiplication of the three relative selectivities: 59 × 10 3 -fold versus mesotrypsin,~364-fold versus cationic trypsin, and~170-fold versus anionic trypsin; Supplementary  Table 2). Similarly, the combination of Ser at residue 11 and Arg at residue 17 increased the total selectivity by a factor of 7 × 10 7 (~37 × 10 3 -fold versus mesotrypsin,~24-fold versus cationic trypsin, and~85-fold versus anionic trypsin).
Positions 11 and 17 in the APPI-3M sequence are correlated.
As residues in positions 11 and 17 of the APPI-3M sequence increased the selectivity of APPI-3M toward KLK6, we elucidated the interactions between different amino acids at these positions by generating and purifying representative single-and doublemutant APPI-3M variants. We chose the KLK6-selective T11V/ G17R and T11S/G17R double-mutant variants (see Fig. 3), and their corresponding single-mutant selectivity-switch variants T11V, T11S, and G17R (see Tables 1 and 2). We tested the affinity of the soluble forms of these five variants to each of the four serine proteases in a competitive inhibition assay (Table 3) and, based on the extracted K i values, we determined the selectivity of each variant toward KLK6 and compared it with the selectivity of the unmodified APPI-3M (Table 4). The amino acid substitution that most increased the total selectivity of APPI-3M toward KLK6 was T11V/G17R, followed by G17R and finally, T11S/G17R. The individual substitutions T11V and T11S did not improve the selectivity toward KLK6, rather they somewhat decreased it (Table 4). These results suggest that residues 11 and 17 are correlated-selectivity residues, which act together to increase target selectivity. To further test this hypothesis, we conducted a double-mutant cycle analysis 52 , in which we used the selectivity values of KLK6 with the two doublemutant variants and their single variants (T11V/G17R, T11S/ G17R, T11V, T11S, and G17R, Table 4) to calculate the selectivity strength between two mutated residues (i.e., the coupling energy, ΔΔG int ; Supplementary Figure 7). Indeed, in both double mutations, the ΔΔG int values were non-zero, indicating that residues 11 and 17 interact with each other to cooperatively affect the selectivity toward KLK6.
To gain insight into the structural basis of the observed selectivity changes, we attempted to crystallize the APPI-3M-T11V/G17R variant in complex with the increased-selectivity target KLK6 and the reduced-selectivity target mesotrypsin. We were able to obtain a high-resolution crystal structure of the APPI-3M-T11V/G17R variant bound to mesotrypsin (PDB ID: Table 3). A structural analysis of this complex revealed that a deleterious steric interaction between the APPI Arg-17 mutation and mesotrypsin Arg-193 pushes Arg-193 into a more buried conformation (Supplementary Figure 8), as previously found in the structures of mesotrypsin bound to wildtype APPI or BPTI Kunitz-type inhibitors 53,54 . The steric clash and the restriction of Arg-193 to a single buried conformation can explain the reduction in affinity toward mesotrypsin, which is consistent with our prior structure and mutagenesis studies 51,55 . The corresponding amino acid that occupies position 193 in KLK6 is Gly (PDB ID: 4D8N); therefore, the lack of a side chain in position 193 of KLK6 is probably more energetically favored (upon binding to APPI-3M-G17R) than that of mesotrypsin Arg-193 (due to the steric clash and the restriction of Arg-193). Efforts to crystallize the APPI-3M-T11V/G17R complex with KLK6 were unsuccessful, and thus the basis for selectivity improvements toward this alternative target, and for cooperativity between APPI residues 11 and 17, remain a subject for future investigations.
Selective screens are superior to affinity screens. A significant advantage of our pairwise selectivity screen approach over the traditional sequential affinity screen (a commonly used method, in which the library is sorted against each enzyme separately in a sequential manner 40,43 ) is the ability of our approach to identify, in a single screening step (rather than two sequential affinity screen steps), the top~5% of clones that are more selective toward one target versus another, even if the absolute affinity of these clones toward both targets is lower than that of the parent variant (in the current study, APPI-3M). To demonstrate that the traditional sequential affinity screens are unable to detect the clones obtained by our pairwise selectivity screens (namely, those with improved selectivity and low affinity), we performed two separate sequential affinity screens, one toward KLK6 and another toward cationic trypsin (Supplementary Figure 9A, B, D, E). As expected, both the sequential affinity and the pairwise selectivity screen approaches were able to identify the G17R mutation as a KLK6 selectivity-improving mutation (Supplementary Table 4), which is consistent with the 1.7-fold improvement in the selectivity toward KLK6 versus cationic trypsin, measured by the enzymatic assay (Supplementary Table 5). In contrast, we were unable to identify the selective G17E mutation by using the sequential affinity approach (Supplementary Table 4), although it was clearly identified using the pairwise selectivity screen between KLK6 and cationic trypsin (Supplementary Table 4), demonstrating a 3.4-fold improved selectivity toward cationic trypsin, as measured by the enzymatic assay (Supplementary Table 5). This discrepancy between the two approaches stems from the fact that the G17E mutation was not in the top~5% binders in the cationic trypsin and KLK6 sorts due to its weakened affinity toward cationic trypsin and KLK6 relative to the parental molecule APPI-3M (by~4-fold and~10-fold, respectively, Supplementary Table 5).
Upscaling. Our selective pairwise screening approach can be easily scaled up for multiple target proteins per screen, such that a library can be screened against a target of interest (labeled with one type of fluorophore) versus a mixture of competitors (all labeled with the same fluorophore, which is different from the one used for the target of interest). Such an approach is especially useful in the case where there is a single primary target of interest, since it will be completed through only a single sort. To demonstrate the feasibility of such an approach, we performed a competitive sort, in which KLK6 was the primary target of interest (labeled with Alexa Fluor-650) and cationic trypsin, anionic trypsin, and mesotrypsin (each labeled with Alexa Fluor-488) were the competitors (Supplementary Figure 9C, F), and compared the enrichment values to those of our pairwise comparisons. The enrichment ratios of the competitive multi-target screen were highly correlated with those of the pairwise selective screen; in both setups, the top-rated selectivity-improving clones were similar (Supplementary Table 6), both for single mutations (e.g., G17R) and for double mutations (e.g., T11S/G17R and T11V/G17R). In addition, this analysis revealed a clear selectivity cold spot, in which most mutations in residue 17 increased the  17   56  55  54  53  52  51  50  49  48  47  46  45  44  43  42  41  40  39  38  37  36  35  34  33  32  31  30  29  28  27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11  10  9  8  7  6  5  4  3   56  55  54  53  52  51  50  49  48  47  46  45  44  43  42  41  40  39  38 11 17 11 17 Figure 3E). This finding is consistent with those obtained using the pairwise screening approach (Supplementary Figure 3B).

Discussion
We describe a novel strategy for mapping the binding selectivity landscapes of proteins through a combination of experimental multi-target selective library screening and in silico nextgeneration sequencing analysis. Employing the APPI/serine protease system as a model PPI, we show that our strategy can be used to map, in a rapid, single-step, cost-effective process, several crucial aspects of the selectivity landscape, including hot-spot residues, selectivity switch residues, and correlated-selectivity residues. The latter are of special importance, as characterizing correlated-selectivity mutations and analyzing their effects (both individually and combined) on target affinity and selectivity is challenging with currently available approaches 27,56 . Several previous studies have combined selective screening of a protein library and NGS analyses to map the binding landscape of various proteins, including influenza inhibitors (HB36.4, HB80.3) 7 , the human Yes Associated Protein 65 (hYAP65) WW domain 19 , and an anti-VEGF antibody 40 . However, these approaches employed either libraries of clones with only single mutations or library screens that were performed against only a single target. Therefore, in these previous studies, it was difficult to identify mutations that change target selectivity or that work in concert to affect target affinity and selectivity in a correlated manner. Thus, a major advantage of our approach is its ability to identify correlatedselectivity mutations. For example, we found that the mutations T11V and G17R, when combined, yield a highly potent and selective inhibitor for KLK6, while combining the mutations T11S and G17R yield only a moderately potent and partially selective inhibitor for KLK6. These findings may suggest that a small and hydrophobic amino acid (e.g., Val in position 11) exerts a stronger effect on selectivity towards KLK6 than a small and polar amino acid (Ser in position 11).
Another advantage of our approach lies in using a pairwise selectivity screen, rather than the sequential affinity screen that is commonly used in other approaches 40,43 , to increase selectivity. This advantage is especially noticeable for the identification of clones that are selective but have distinct affinities toward both targets that are lower than that of the parent variant (in the current study, APPI-3M), as demonstrate in Supplementary  Table 4. In addition, the pairwise screening approach can be easily scaled up for multiple target proteins per screen, such that a library can be screened against a target of interest versus a mixture of competitors. Such an approach is especially useful where there is a single primary target of interest, since it will be completed with only one sort, as demonstrate in Supplementary  Table 6.
We chose the serine protease family as an ideal group of targets to demonstrate our strategy mainly because inhibiting the human serine proteases is of clinical value: both KLK6 and mesotrypsin are involved in cancer progression [57][58][59] , while anionic and cationic trypsins are involved in the etiology of pancreatitis 60,61 . However, the development of inhibitors capable of discriminating among trypsin-like proteases has been challenging. We and others have previously used X-ray crystallography to explore the structures of these proteases, in some cases identifying the distinguishing features that suggest the potential for developing highly selective inhibitors 54,62,63 . For example, several adaptive mutations have been shown to shape the active site of mesotrypsin for distinct substrate and inhibitor-binding selectivity 54,[63][64][65] . Nevertheless, the development of truly selective inhibitors has yet to be achieved, and we anticipate that our novel approach, which is capable of rapidly and efficiently screening large libraries to comprehensively map selectivity, will enable the development of selective probes and therapeutic agents.
APPI has attracted our interest as a scaffold for engineering selective serine protease inhibitors due to the marked sequence diversity among Kunitz family members, which possess canonical binding loops that are highly tolerant to substitution or incorporation of additional amino acids 66,67 . Because the sequence of the canonical binding loop and neighboring residues largely determine the affinity and selectivity of the inhibitor to its targets 53,68 , using APPI as a scaffold offers a unique opportunity to optimize target affinity and selectivity without compromising  stability. In addition, the affinity of the complexes between APPI and mesotrypsin, anionic trypsin, and cationic trypsin is similar, which facilitated the identification of cold spots, whereas the affinity of the APPI/KLK6 complex is three orders of magnitude lower than that of the other complexes, thus allowing us to identify selectivity-switch residues. As a validation of the utility of our platform, we show that the results obtained using NGS of the selected APPI clones typically correlate well with the binding selectivity of the purified protein variants in solution (as measured by competitive inhibition studies), but at different scales (Supplementary Table 7). For example, the selectivity values of 13 combinations of enzyme-inhibitor variants (out of a total of 15 possible combinations examined), calculated using NGS, are well-correlated (whether the selectivity was improved or damaged) with those obtained in the enzymatic assay. Of note, in all 15 combinations, a clear correlation was found between the ranking of the selectivity values that were calculated by each method (ranking is according to the level of selectivity improvement within each method for each enzyme, with the greatest improvement ranked as one; see example in bold boxes in Supplementary Table 7). As shown in Table 1, the NGS analysis predicted a selectivity increase of~7 × 10 3 -fold from cationic trypsin to KLK6 for G17R compared with G17E, and of~70 × 10 3 -fold from anionic trypsin to mesotrypsin for T11I compared with T11H; both these findings are in qualitative agreement with the increase in selectivity determined from the K i values of the soluble proteins, namely, an increase of 5.7-fold and~1.6-fold, respectively (Table 2). However, no correlation was found between the magnitudes of the improvements, i.e., the 7 × 10 3 -fold improvement calculated by NGS was calculated as ã 5.7-fold improvement in the enzymatic assay, while the 70 × 10 3fold improvement calculated by NGS was calculated as only a~1.6fold improvement in the enzymatic assay. Therefore, the selectivity increase values that were calculated by the NGS cannot be directly compared with those of the competitive inhibition studies; rather, the values can be compared between experiments using each method, and not between the two methods. Nevertheless, the results shown in Tables 1 and 2 confirm that our approach can predict the positions that can change target selectivity, and that our approach is sufficiently sensitive to detect small affinity changes, whereas other currently available approaches can typically identify only greater changes in the interactions between proteins 69 .
In further validation of our strategy, we identified most previously described mutations that affect the binding affinity and selectivity of APPI to serine proteases, as well as some novel mutations. For example, we identified residue 15 as a hot spot for all four human serine proteases, as all amino acid mutations in this residue (except R15K) reduced the binding affinity of APPI-3M to each of the four proteases ( Fig. 2 and Supplementary Figure 3). Indeed, residue 15 had previously been identified as a hot spot in Kunitz-domain inhibitors in studies with BPTI 70,71 . In addition, our data identified, for the first time, to the best of our knowledge, that residue 13 is a selectivity cold-spot for mesotrypsin, as most of the mutations in this residue improved selectivity toward mesotrypsin versus all other proteases. On the other hand, mutating the residue in position 11 switched the selectivity from anionic trypsin to mesotrypsin. Therefore, the difference between residues 13 and 11 is that the former facilitates a selectivity switch from three proteases to a specific protease, while the latter enables a selectivity switch from one protease to one other protease.
The use of NGS covered the entire library and provided a comprehensive map of the binding interface. However, generating the library by using a combination of site-specific saturation mutagenesis on the APPI loop, and random mutations also on other parts of the gene, limited our ability to analyze residues that are distant from the interaction site. We attribute this limitation to technical aspects of our library design, as the random mutations generated by using the error-prone PCR were represented to a lower extent than mutations generated by using site-saturation libraries. Nevertheless, the residues that we found to improve the selectivity of APPI toward the four serine proteases can provide an explanation for the basis for target selectivity of inhibitors toward serine proteases. These selectivity-improving mutations can also be beneficial for designing targeted therapeutics for cancer and other diseases, as they can potentially inhibit the desired serine protease in a selective manner, so as to minimize toxic effects. This study also serves as an example for the general utility of our new platform, as many PPI mediators and disease targets belong to large families of related proteins, making target selectivity a highly desirable but challenging goal in drug development. Thus, we our approach for simply and efficiently mapping PPI selectivity landscapes offers great promise for designing novel target-selective therapeutics.

Methods
YSD and flow cytometry cell sorting. The yeast-displayed APPI-3M library was constructed as described in Supplementary Methods. To display the APPI-3M library on the surface of the yeast, the library was grown in an SDCAA selective medium (2% dextrose, 0.67% Difco yeast nitrogen base, 0.5% Bacto casamino acids, 0.52% Na 2 HPO 4 , and 0.856% NaH 2 PO 4 •H 2 O) and induced for expression with a galactose medium (as for SDCAA, but with galactose 2%, instead of dextrose) according to an established protocol 72 . Inactive forms of mesotrypsin, anionic trypsin, and cationic trypsin containing the mutation S195A were used as a precaution against enzymatic cleavage during the experiments 51 . The four serine proteases were labeled with Alexa Fluor dyes (Invitrogen, Carlsbad, CA) and used to detect binding. For pairwise selectivity screen,~1 × 10 8 yeast cells were incubated with different Alexa Fluor-labeled serine proteases in a binding buffer (100 mM Tris, pH = 8.0, 1 mM CaCl 2 , 1% BSA) for 1.5 h at room temperature. Then, the cells were washed with the binding buffer and sorted for the high-selective variants by conducting several independent sorts, using FACSAria [the Ilse Katz Institute for Nanoscale Science and Technology, Ben-Gurion University of the Negev (BGU), Israel]. The complexes included the following pairs and . APPI-3M variants that showed a high binding affinity (top 5% of the entire population) toward one serine protease in the pair and a low binding affinity toward the other were selected. Dual-color flow cytometry (BD Accuri C6, Piscataway, NJ) was used to test the selective binding of each sorted library to one serine protease in the pair in the presence of the other.
Quality filtration and integration of sequences. Sequencing data from each library were treated identically and evaluated in triplicates, and Spearman's rank correlation coefficient 73 was calculated to be above 95%. An average Illumina quality score was calculated for each read in a given set of paired-end reads, and read pairs in which either read had an average quality score lower than 20 (i.e., less than 99% accuracy) were discarded. The remaining read pairs were merged into a single sequence by fast length adjustment of short reads (FLASH) software 74 . DNA sequences and their amino acid translations were aligned to the sequence of APPI-3M; sequences of different lengths and sequences containing stop codons were discarded.
Computational analysis of high-throughput sequencing results. The analysis was performed in MATLAB, version R2016a. Variants with one amino acid mutation and variants with multiple amino acids mutations were analyzed separately. First, the number of reads of each variant from each library was counted. Then, to avoid variants with a low number of reads (which can yield noisy frequencies and enrichment ratios), we determined a threshold value of 100 reads for variants with a single amino acid mutation and 10 reads for variants with double amino acids mutations. Variants with read numbers below the threshold in the naive and sorted library fractions were discarded, and variants with read numbers below the threshold value received the threshold value if the read number of the variant in the other library was above the threshold. Next, the frequency of each remaining variant, v, from each library was computed as F v ¼ Reads v P Reads v , where Reads v is the number of times that this variant appeared in the library. Based on its frequency, the enrichment ratio of each variant from each sorted library was calculated. The enrichment ratio for a given variant, v, was calculated as ER v; ¼ , where F v,sorted is the frequency of the variant in the sorted library and F v,naive is the frequency of the same variant in the naive (presorted) library. Eventually, for single amino acid substitution, heatmaps were created based on the enrichment ratio 5 ; for double amino acid substitutions, we summed the enrichment ratios of similar double-mutation variants that have mutations in the same residues ( P ER x;y ¼ ER 1 þ ER 2 þ þ ER N , where x and y are the mutated residues and N is the number of substitutions at the x and y residues). We illustrated these results as heat maps.

Data availability
All relevant data are available from the authors. The coordinates and structure factors for the complex of APPI-3M-T11V/G17R variant bound to mesotrypsin have been submitted to the Protein Data Bank (PDB) under the accession code 6GFI. The crystal structure of APPI-3M is available in the PDB under the accession code 5C67. The crystal structure of KLK6 is available in the PDB under the accession code 4D8N. The crystal structure of the mesotrypsin/BPTI complex is available in the PDB under the accession code 2R9P.