Introduction

Plasminogen activator inhibitor-1 (PAI-1, SERPINE1) is a member of the serine protease inhibitor (SERPIN) protein superfamily1. SERPINs function as irreversible inhibitors that covalently bind the active site of their target proteases2. All inhibitory SERPINs share a common mechanism wherein a flexible reactive center loop (RCL) extends outside the central structure of the molecule, serving as a cleavable peptide “bait” for the target protease (Fig. 1a)3. The SERPIN and protease form an acyl intermediate when the active site serine acts as a donor nucleophile to bind to the carbonyl carbon of the P1 residue within the RCL4,5. However, the RCL rapidly inserts into central β-sheet A before the hydrolysis reaction is completed, carrying the tethered protease to the opposite side of the molecule in a pole-to-pole transition6. This structural transformation stabilizes the protease-SERPIN complex and renders both the SERPIN and the serine protease no longer active2,7,8. If the rate of RCL insertion is slower than the resolution of the acyl intermediate, complete proteolysis of the scissile bond occurs, with the SERPIN then functioning as a substrate instead of an inhibitor9. Either of these mechanisms is possible when a SERPIN interacts with a given protease. The specificity of a SERPIN for its target protease is in part driven by the probability that the inhibitory pathway dominates over the substrate pathway10. This balance presents a unique challenge when engineering SERPIN variants that inhibit non-canonical target proteases11,12.

Figure 1

PAI-1 displayed as a fusion protein on the surface of filamentous phage is an inhibitor of uPA. (a) Active PAI-1 (PDB:3PB1) reacts with free uPA (PDB:1W0Z) to form a covalent uPA:PAI-1 complex59,60. Both free and complexed uPA can be immunoprecipitated using an anti-uPA antibody. Alternatively, PAI-1 can spontaneously relax from its active, metastable state to a low-energy yet chemically inert latent conformation (PDB: 1DVN)61. PAI-1’s reactive center loop is highlighted in orange. (b) Schematic of phPAI-1 displayed as a fusion to the pIII coat protein of M13 filamentous phage, with the N-terminal myc- and C-terminal FLAG- and E-tags highlighted. This figure was generated using Adobe Illustrator 2021 version 25.2.3 (a,b), PyMOL version 2.5.0 (a), and BioRender.com (b).

PAI-1 is the principal inhibitor of the serine proteases tissue-type and urokinase-type plasminogen activators (tPA and uPA, respectively)13,14. tPA and uPA activate the zymogen plasminogen to plasmin15, the enzyme responsible for the proteolysis of fibrin, the structural backbone of blood clots16. Human deficiency of PAI-1 results in excessive plasmin generation and a mild to moderate bleeding diathesis17,18,19,20,21. Likewise, PAI-1 deficient mice are viable, exhibiting a moderate increase in fibrinolytic activity22. In addition to its critical role in regulating hemostasis, PAI-1 has been implicated in a number of other processes, including longevity23 and numerous pathogenic processes24. Given the wide-ranging functions of PAI-1, understanding the effects of potentially damaging mutations in PAI-1 may have clinical implications beyond the canonical role of PAI-1 in regulating fibrinolysis.

The present work couples phage display with high-throughput DNA sequencing (HTS) to measure the effects of multiple missense mutations on PAI-1’s uPA inhibitory function in a massively parallel fashion25. The overall goal is to map the mutational landscape of PAI-1 with respect to mutations that render PAI-1 nonfunctional, in order to (1) better understand the natural evolution of PAI-1 with respect to the amino acid space that can be occupied at any given residue and (2) use this high-resolution map of PAI-1 as a basis for engineering novel SERPINs. SERPINs, including PAI-1, have previously been engineered to inhibit proteases other than their canonical targets26,27, demonstrating the potential to develop novel therapeutics for the treatment of a variety of disorders including hemophilia and alpha-1-antitrypsin deficiency. PAI-1 is a particularly attractive choice as a SERPIN scaffold as it lacks native cysteine residues and remains functional in the absence of glycosylation, facilitating large-scale production of functionally active, recombinant PAI-1 in bacterial systems28,29.

Results and discussion

Characterization of PAI-1 fusion protein

Phage-displayed PAI-1 (phPAI-1) was selected for complex formation with uPA in the presence of a ninefold molar excess of a negative control, phage displaying the A3 domain of von Willebrand Factor (phVWF-A3; the VWF A3 domain is a protein fragment not known to interact with uPA) (Fig. 2). Selection resulted in at least a fivefold enrichment of phPAI-1 relative to phVWF-A3, indicating that the immunoprecipitation is specific for uPA and uPA:PAI-1 complexes. Given the rigorous washing of the immunoprecipitated complex, this enrichment further suggests that PAI-1 expressed as a pIII fusion protein on the phage surface retains its inhibitory activity30,31,32.

Figure 2

Immunoprecipitation is specific for phPAI-1:uPA complexes. Nine parts phVWF-A3 and one part phPAI-1 (9:1) were combined (input, n = 3), incubated with uPA (1.7 nM) for 30 min at 37 °C, and selected by immunoprecipitation with an anti-uPA antibody (selected, n = 3). For each replicate, 24 colonies were genotyped by PCR using primers common to both the phVWF-A3 (677 bp) and phPAI-1 (1271 bp) inserts, followed by analysis on a 1% agarose gel. This figure was generated using GraphPad Prism version 9.0.2.

Characterization of the mutant library

The phPAI-1 mutant library exhibits a depth of 8.04 × 10⁶ independent clones with an average of 4.2 ± 1.8 amino acid substitutions/molecule as determined by Sanger sequencing of 13 randomly selected phage clones. HTS demonstrated that 5117 of the possible 7201 missense variants (71%) are present in the mutant phPAI-1 library, along with at least 1 nonsense mutation at 269 of the 379 PAI-1 amino acid positions (71%). The frequency of DNA sequencing reads for individual amino acid substitutions within the starting library ranged over > 10⁴-fold. As expected, the frequency of specific amino acid substitutions also varied based on the genetic code, with, for example, reduced representation of Met and Trp substitutions (both encoded by only a single codon), compared to Arg, Leu, and Ser substitutions (each encoded by six codons) (Fig. 3).
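The codon-count argument above can be checked directly against the standard genetic code. The following is an illustrative sketch only and was not part of the original analysis; it assumes the standard genetic code (NCBI translation table 1), and the names `BASES`, `AMINO_ACIDS`, and `codon_counts` are hypothetical. It also reproduces the figure of 7201 possible missense variants as 379 positions times 19 alternative amino acids.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, with codons ordered
# TTT, TTC, TTA, TTG, TCT, ... GGG ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_counts():
    """Return the number of codons encoding each amino acid (or stop)."""
    codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
    table = dict(zip(codons, AMINO_ACIDS))
    return Counter(table.values())


counts = codon_counts()
for residue in ("M", "W", "R", "L", "S"):
    print(residue, counts[residue])

# 379 PAI-1 positions, each with 19 possible alternative amino acids.
N_POSITIONS = 379
possible_missense = N_POSITIONS * 19
print(possible_missense)
```

Codon degeneracy is only one contributor to uneven representation; the mutational bias of the polymerase used for error-prone PCR is not modeled here.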

Figure 3

phPAI-1 mutant library generated by error prone PCR includes more than two-thirds of all possible missense mutations. The mutational library contains 71% of all possible missense and nonsense mutations with 27% of all missense variants present with sufficient depth (base mean score > 10, padj < 0.05) to accurately determine the effects of the mutation on PAI-1 function. The primary amino acid position within PAI-1 is indicated along the x-axis and single amino acid substitutions are listed along the y-axis. WT amino acid residues are indicated in yellow, while missense and nonsense (X) mutations not present in the input library are shown in white. Variants present within the library are shown in grey (see scale) as a percentage of the input library represented by that variant. This figure was generated using Adobe Illustrator 2021 version 25.2.3 and the ggplot2 package version 3.3.3 for RStudio version 1.4.1106.

To limit the proportion of phPAI-1 variants transitioning to the non-reactive latent conformation, all reactions with uPA were performed immediately following phage production. To limit false positives within the dataset, only those variants with a base mean score (average of the normalized counts in the input and selected libraries corrected for sequencing depth, as defined by Love et al.33) greater than 10 and an adjusted p value (padj) < 0.05 were included in further analyses (Fig. 4). Based on these criteria, 1963 (38%) of the 5117 missense variants present in the starting library could be scored for uPA reactivity or lack thereof. Although not a complete profile of all mutational space, these data represent a marked expansion of the mutational space that has been explored in previous reports34. Furthermore, the use of HTS facilitates accurate assessment for both gain- and loss-of-function mutations after only a single round of panning31,35.
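The inclusion criteria described above amount to a simple two-threshold filter. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: the record type `VariantResult`, the function `is_scorable`, and all values in `results` are hypothetical and made up for demonstration. A missing padj is treated as failing the filter, which is an assumption about how such values would be handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantResult:
    variant: str              # hypothetical label
    base_mean: float          # mean normalized count across input and selected
    log2_fold_change: float   # selected vs. input
    padj: Optional[float]     # adjusted p value; None stands in for a missing value


def is_scorable(result, min_base_mean=10.0, max_padj=0.05):
    """Apply the thresholds from the text: base mean > 10 and padj < 0.05."""
    if result.padj is None:
        return False
    return result.base_mean > min_base_mean and result.padj < max_padj


# Made-up example records, for illustration only.
results = [
    VariantResult("variant_1", 250.0, 1.6, 0.001),
    VariantResult("variant_2", 180.0, -1.6, 0.002),
    VariantResult("variant_3", 6.0, 2.0, 0.010),    # too few reads
    VariantResult("variant_4", 90.0, 0.3, 0.400),   # not significant
    VariantResult("variant_5", 40.0, -0.9, None),   # no adjusted p value
]

scorable = [r.variant for r in results if is_scorable(r)]
print(scorable)

# Fraction of library variants that passed both thresholds, from the text.
fraction_scored = 1963 / 5117
print(f"{fraction_scored:.0%}")
```

Requiring both thresholds trades coverage for confidence: variants with few reads or unstable enrichment are left unscored rather than misclassified.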

Figure 4

phPAI-1 uPA-selected libraries are enriched in variants that retain their inhibitory function and depleted of those without inhibitory activity. An MA plot33 of base mean score (average of counts in the input and selected libraries) vs. log2-fold change is shown, with missense mutations in pink and nonsense variants in blue. Variants with padj < 0.05 are shown as closed circles, while those that do not meet this significance threshold are shown as open circles. A base mean score greater than 10 was also set as a threshold for determining significance. This figure was generated using Adobe Illustrator 2021 version 25.2.3 and the ggplot2 package version 3.3.3 for RStudio version 1.4.1106.

Massively parallel assessment of variant impact on uPA inhibitory function

Following selection with uPA, 826 PAI-1 missense variants retained the ability to form a complex with uPA, with a range of enrichment scores likely representing varying degrees of inhibitory activity towards uPA (Fig. 5). Similarly, depleted variants (log2-fold enrichment score ≤ 0, n = 1137) are broadly classified here as loss-of-function, likely including variants that retain a low level of inhibitory activity towards uPA. This again reflects that the approach enables the mapping of functional variability with respect to both gain and loss of function (Fig. 5). Missense variants were enriched or depleted up to 6- or 23-fold, respectively, and different amino acid substitutions at the same position may exhibit opposite effects. For comparison, consider two mutations at Ile91, I91L and I91N, each representing approximately 0.0005% of the input library (Fig. 3). Following selection with uPA, I91L was enriched three-fold, consistent with previous reports that this mutation not only preserves PAI-1’s inhibitory function, but also extends its functional half-life31. In contrast, I91N was depleted three-fold, demonstrating that while the I91L mutation is well tolerated, I91N results in loss of function with respect to uPA inhibition. Of note, the selection method employed here (complex formation with uPA) does not distinguish between the three potential mechanisms for loss-of-function: PAI-1 misfolding, accelerated transition to the inactive latent state, and/or serving as a substrate for uPA. All three of these loss-of-function phenotypes would result in the inability of PAI-1 to form a covalent complex with uPA, and thus would be lost to selection for uPA binding.
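Because the text reports fold changes while the classification threshold is stated on the log2 scale, it can help to convert between the two. The sketch below is illustrative only; the function names `fold_to_log2` and `classify` are hypothetical, and the classification rule (log2-fold change > 0 retained, ≤ 0 loss of function) is taken from the paragraph above.

```python
import math


def fold_to_log2(fold, enriched=True):
    """Convert an n-fold enrichment or depletion to a signed log2-fold change."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    magnitude = math.log2(fold)
    return magnitude if enriched else -magnitude


def classify(log2_fold_change):
    """Variants with log2-fold change > 0 are scored as retaining function."""
    return "retained" if log2_fold_change > 0 else "loss_of_function"


# Fold changes quoted in the text.
examples = [
    ("three-fold enriched", 3, True),
    ("three-fold depleted", 3, False),
    ("maximum enrichment (6-fold)", 6, True),
    ("maximum depletion (23-fold)", 23, False),
]

for label, fold, enriched in examples:
    lfc = fold_to_log2(fold, enriched)
    print(f"{label}: log2-fold change = {lfc:+.2f}, class = {classify(lfc)}")
```

On this scale, equal-magnitude enrichment and depletion are symmetric about zero, which is why the log2-fold change is the natural axis for the MA plot and heat map.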

Figure 5

phPAI-1 mutational libraries contain missense variants that result in loss of function. Amino acid position is indicated on the x-axis, while amino acid substitutions are indicated on the y-axis. Loss of function missense variants, as well as those with a reduced capacity to inhibit uPA, are shown in red with shading as a function of their log2-fold score, while variants in blue retain PAI-1 inhibitory function. The intensity of shading scales with the degree of enrichment or depletion. WT amino acid residues are shown in black, while beige indicates missense mutations that were present in the mutational library but did not meet significance thresholds. White indicates amino acid substitutions that were not present in the mutational library. This figure was generated using Adobe Illustrator 2021 version 25.2.3 and the ggplot2 package version 3.3.3 for RStudio version 1.4.1106.

Implications for structure–function relationships

The results of our PAI-1 functional screen can be used to assess specific regions within PAI-1 without the additional construction of targeted libraries. This point is highlighted by analysis of the RCL (residues 331–350) as illustrated in Fig. 6, although a similar approach could be applied to other regions of interest. In the RCL, the observed enrichment and depletion at the P1 and P1′ positions (residues 346 and 347) are consistent with our understanding of PAI-1 biology. The P1 position has been shown to be a key determinant of SERPIN target protease specificity36,37,38,39,40, with PAI-1 inhibitory activity toward uPA requiring either a P1 Lys or the WT Arg residue41. Consistent with these previous reports, no missense mutations were tolerated at P1 in our screen (of note, lysine at this position is absent from our library), with several substitutions significantly depleted (Figs. 3, 6). Consistent with the previously reported tolerance of the P1′ position for most amino acid substitutions41, our screen identified no loss-of-function PAI-1 variants at this position (Fig. 6).

Figure 6

Deep mutational scanning of PAI-1 provides insight into the mutational landscape of PAI-1’s RCL. Relative log2-fold change scores of missense mutations in PAI-1’s RCL (amino acids 331–350) following selection by ability to inhibit uPA39. Amino acids below the x-axis indicate depletion (log2-fold change < 0), while those above the x-axis indicate enrichment (log2-fold change > 0). Letter height corresponds to the relative log2-fold change. Residues are color coded by properties: acidic residues in red, basic residues in blue, polar amino acids in green, non-polar amino acids in orange, and Pro in purple39. The P1 and P1′ positions (Arg346 and Met347 in WT) are also indicated. This figure was generated using Adobe Illustrator 2021 version 25.2.3 and Seq2Logo 2.0 (http://www.cbs.dtu.dk/biotools/Seq2Logo/).

At the N-terminus of the RCL, enriched or tolerated substitutions observed in our data generally consist of small aliphatic and polar amino acids. For PAI-1 to retain its inhibitory function, this region of the RCL must be able to insert into β-sheet A9. These small amino acids allow the RCL to undergo the dramatic conformational changes that are required for this insertion. Consistent with this model, substitutions with bulky and/or charged side chains (Lys, Arg, Pro, Asp, Phe) were the most depleted residues for those N-terminal RCL positions whose side chains become oriented into the core beta sheet upon insertion42,43,44,45. In contrast, residues C-terminal to the scissile bond (P2′–P4′) are more tolerant of mutations than those at the N terminus of the RCL, as the former region does not insert into the central β-sheet46. Finally, the flexibility of the RCL is also important for dictating PAI-1’s inhibitory behavior, and our data are concordant with a previous proline-scanning mutagenesis screen42. Proline residues in the RCL would also be incompatible with RCL insertion into β-sheet A, which transforms it from a largely parallel β-sheet to a more stable anti-parallel β-sheet.

Correlation with predictive algorithms and human genome sequence variant data

A number of algorithms have been developed to predict the impact of single amino acid substitutions on protein function based on evolutionary conservation and/or amino acid type47. We compared our high throughput screening data with predictions from two commonly used algorithms, SIFT48 and PolyPhen-249. SIFT predicts the effects of amino acid substitutions by comparison to homologous sequences, and PolyPhen-2 uses both sequence conservation and structural homology to predict the effects of amino acid substitutions on protein function.

The SIFT algorithm prediction was concordant for 745 of the 1137 (66%) amino acid substitutions scored as “loss of function” in our screen and for 538 of the 826 (65%) scored as neutral. This level of concordance is similar to that previously reported for known deleterious human genetic mutations in other genes48. PolyPhen-2 exhibited concordance with our data for 994 of 1137 (87%) “loss of function” substitutions, but only 454 of the 826 (55%) neutral PAI-1 amino acid substitutions49. Overall, while these algorithms are a valuable resource for predicting protein functionality, they are unable to correctly assign all missense variants, emphasizing the need for deep mutational scanning of multiple different types/families of proteins.

Additionally, available human genomic sequence information provides support for the potential value of our data in interpreting the significance of human genetic variation identified by future clinical sequencing. The gnomAD database50 catalogs human amino acid sequence variant information from ~ 140,000 human exomic/genomic sequences, including 202 variants scored in our mutation screening analysis. Of these 202 variants, 92 were classified by our data as “loss of function”, significantly fewer than expected by chance (p = 2 × 10⁻⁴; SI Table 3), consistent with negative (purifying) selective pressure in the human population to maintain PAI-1 activity.

Evolutionary conservation of PAI-1 is consistent with mutational tolerance

To determine whether the distribution of functional missense mutations detected in this screen reflected the evolutionary constraints of individual PAI-1 amino acid positions, the evolved variation in natural sequences was leveraged. PAI-1 sequences of 84 extant mammalian species were present in the cleaned alignment. Significant differences in evolutionary conservation of sites in these alignments were observed among positions manifesting varying numbers of functional mutants in our mutational scanning data (ANOVA F(3, 375) = 24.5, p < 0.0001, R² = 0.23; Fig. 7). The overall trend was toward increasing evolutionary lability (less conservation) in the positions that accepted more functional mutants in our human PAI-1 constructs. For example, position 346, which defines the P1 site, is among the most evolutionarily conserved residues (evolutionary conservation score = − 0.774) and, consistent with this, falls in the first quartile of the normalized functional scores (Fig. 7). Natural exploration of sequence space through evolutionary time in PAI-1 therefore provides a partial guide to mutational tolerance that complements deep mutational scanning, which can explore mutational space that is hitherto unseen in nature. Overall, these results suggest that there is a limited mutational space that is consistent with PAI-1 functionality as a specific uPA inhibitor, and that altering the specificity of PAI-1 for novel serine proteases will likely require expansion into as yet unexplored regions of PAI-1’s mutational landscape.

Figure 7

Site susceptibility to accepting missense mutations correlates with evolutionary conservation. The normalized functional mutation count (defined as the number of enriched/functional mutations divided by the total number of mutations scored in our screen at each position) is shown as quartiles along the x-axis with the range and number of positions in each quartile indicated. The fourth quartile (Q4, purple) corresponds to those positions at which the most introduced missense mutations are tolerated, while the first quartile (Q1, blue) corresponds to those positions where the fewest introduced mutations were tolerated. In comparison, the ConSurf evolutionary conservation score is shown on the y-axis and ranges from − 0.77 to 6.3, with larger scores corresponding to sites that are more evolutionarily labile and lower scores corresponding to positions that are more conserved. ANOVA shows that the degree of tolerance for missense mutations predicts evolutionary conservation in extant mammalian species’ PAI-1 (F(3, 375) = 24.5, p < 0.0001, R² = 0.23). This figure was generated using Adobe Illustrator 2021 version 25.2.3 and GraphPad Prism version 9.0.2.

Conclusions

Deep mutational scanning has been applied to a number of proteins to analyze function, binding interactions, cellular protein abundance, cell growth/viability, and protein stability25,28,29,51,52. In the present study, we have adapted this approach to construct a detailed map for the mutational landscape of PAI-1 with respect to the gain or loss of its capacity to inhibit uPA. We anticipate that the mutational landscape of PAI-1 for other serine proteases, including its other canonical target protease, tPA, would likely demonstrate significant differences53, enabling engineering of PAI-1-like SERPINs with novel inhibitory profiles.

Previous PAI-1 mutational studies34 have been restricted to limited segments of PAI-141,53, or selected for a few variants with a unique functional impact, such as extended functional stability31,35. The error prone PCR approach used here to generate the phPAI-1 mutant library offers speed and ease of application with broad coverage of a significant subset of potential single amino acid substitutions. However, variant coverage is incomplete (Fig. 3), providing significant loss of function data for only a subset of mutation space (Fig. 5). Future advances in molecular approaches and machine learning algorithms will facilitate a comprehensive map of the mutation landscape for PAI-1 and numerous other proteins54.

Broadly, the work presented herein demonstrates how deep mutational scanning complements predictive algorithms of protein function and patterns observed in natural evolutionary processes. Furthermore, the data reported in this study provide a valuable resource for the interpretation of sequence variants in PAI-1 and other genes identified by the expanding clinical application of human whole genome sequencing.

Methods

Construction of a phage display library expressing PAI-1 fusion proteins

For display of PAI-1 on the M13 filamentous bacteriophage (phPAI-1, Fig. 1b), human SERPINE1 cDNA including an N-terminal myc tag and Gly–Gly–Gly–Ser linker was cloned between the AscI and NotI restriction sites of pAY-FE (Genbank #MW464120)55,56. The resulting construct encodes a phage-displayed PAI-1 protein N-terminally fused to a myc tag and C-terminally fused with FLAG and E tags (Fig. 1). The PAI-1 fusion protein was randomly mutagenized using the GeneMorph II Random Mutagenesis Kit (Agilent Technologies, Santa Clara, CA, USA). Primers used for PCR mutagenesis (SI Table 1) maintained the AscI and NotI restriction sites for ligation of the restriction digested insert into pAY-FE. Following ligation, the library was transformed into electrocompetent XL-1 Blue MRF’ E. coli as per manufacturer’s instructions. The depth of the library was determined by quantifying the number of ampicillin resistant colonies. Mutation frequency was estimated by Sanger sequencing of the SERPINE1 inserts from randomly selected individual colonies (n = 13).

Phage production and purification

Phage were prepared as previously reported57. Briefly, E. coli harboring pAY-FE PAI-1 were grown in LB Broth supplemented with 2% glucose and ampicillin (100 μg/mL) at 37 °C and during mid-log phase (OD600 0.3–0.4) were infected with M13KO7 helper phage at a multiplicity of infection of ~ 100, followed by growth for an additional 1 h at 37 °C. Cells were pelleted by centrifugation (4250×g for 10 min at 4 °C), resuspended in 2xYT media (16 g/L tryptone, 10 g/L yeast extract, 5 g/L NaCl) supplemented with ampicillin (100 μg/mL), kanamycin (30 μg/mL) and IPTG (0.4 mM) to induce expression of the PAI-1 fusion protein, and grown for 2 h at 37 °C55. All subsequent phage preparation steps were carried out at 4 °C to minimize PAI-1 transition to latency (SI Fig. 1). Phage were precipitated with polyethylene glycol-8000 (2.5% w/v) and NaCl (0.5 M) for up to 16 h followed by centrifugation (20,000×g for 20 min at 4 °C). The precipitated phage pellet was resuspended in 50 mM Tris containing 150 mM NaCl (pH 7.4; TBS). Phage titer was determined by transducing naïve XL-1 Blue MRF’ E. coli grown to mid-log phase for 1 h at 37 °C and plating on LB-agar supplemented with ampicillin (100 μg/mL) and 2% glucose.

Selection of uPA-bound phPAI-1

Phage displaying the VWF A3 domain (phVWF-A3) were used as a negative control for uPA binding. To generate phVWF-A3, the VWF A3 domain (Ser1681-Cys1872) was PCR amplified (SI Table 1) and cloned into the pAY-FE vector between the AscI and NotI restriction sites (generating pAY-FE-VWF-A3). phPAI-1 were mixed with phVWF-A3 at a ratio of one part phPAI-1 to nine parts phVWF-A3 and then incubated with uPA (1.7 nM) for 30 min at 37 °C. Residual protease activity was inhibited by incubating the reaction mixture with 1X EDTA-free protease inhibitor cocktail for 10 min at 37 °C. uPA (free and complexed) was immunoprecipitated using magnetic protein G beads (15 μL), which were previously coupled to a polyclonal anti-uPA antibody (17 nM). Beads were washed four times with TBS containing 5% BSA (1 mL), resuspended in Tris (20 mM) pH 8.0 containing 50 mM NaCl, 2 mM CaCl2, and 5% BSA, and eluted by digestion with enteropeptidase (16 U, New England BioLabs, Ipswich, MA, USA) for 16 h at 4 °C. The eluted phage pool was used to infect naïve XL-1 Blue MRF’ E. coli. Eluted phage titers were quantified by transduction of XL-1 Blue MRF’ cells as described above. To determine the composition of the phage pools before and after selection, single colonies of ampicillin resistant bacteria were selected, and their DNA was amplified by PCR using primers annealing outside the insertion site, in a region common to both pAY-FE:PAI-1 and pAY-FE:VWF-A3 (SI Table 1), with three replicates of n = 24 colonies each.

High-throughput sequencing (HTS)

Twelve overlapping amplicons (150 bp) were PCR amplified from pAY-FE PAI-1 (SI Table 1), with each overlapping region analyzed on only one amplicon (SI Table 2). PCR amplicon products were gel purified, pooled (100 ng DNA divided equally between 12 amplicons), dA-tailed (NEBNext Ultra End Repair/dA-tail, New England Biolabs), and ligated to NextFlex barcodes (Bioo Scientific, Austin, TX) with NEBNext Ultra Ligation (New England Biolabs). Ligated products were purified with AMPure beads (Beckman Coulter, Indianapolis, IN, USA) according to the manufacturer’s directions. HTS was performed as previously described29 using the Illumina HiSeq2500 or HiSeqX platforms (Illumina, San Diego, CA, USA) at the University of Michigan DNA Sequencing Core (Ann Arbor, MI, USA) or MedGenome, Inc (Foster City, CA, USA). HTS data were analyzed using DESeq233, with mutations at each position analyzed independently of other mutations within a given amplicon.

Comparison of variant selection results to publicly available datasets

Results of the DESeq2 analysis of input versus selected libraries were compared to the output from the Sorting Intolerant from Tolerant (SIFT) algorithm, which predicts the effect of an amino acid substitution on protein function by multiple sequence alignments of related proteins (PAI-1 from S. scrofa, B. taurus, M. vison, R. norvegicus, and M. musculus; glia-derived nexin from H. sapiens, M. musculus, and R. norvegicus; neuroserpin from H. sapiens, G. gallus, R. norvegicus, and M. musculus)58. To compare SIFT results to our data, tolerated mutations were defined as those that were able to inhibit uPA (log2-fold change > 0) at 0 h, and noninhibitors (log2-fold change < 0) at 0 h were defined as not tolerated. Finally, a χ² test (SI Table 3) was used to determine if the variants identified as loss-of-function by our high-throughput screen were significantly underrepresented in the gnomAD database50 by comparing the expected frequency of variants identified in our screen that were present in gnomAD versus those that were not present.
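A test of this kind can be sketched as a Pearson χ² test on a 2 × 2 table. The example below is an illustration under stated assumptions and may not match the layout of SI Table 3, which is not reproduced here: the table is assembled from counts quoted in the Results (1963 scored variants, 1137 of them loss of function, 202 present in gnomAD, 92 of those loss of function), no continuity correction is applied, and the function name `chi_square_2x2` is hypothetical. For one degree of freedom, the upper-tail p value equals erfc(√(χ²/2)), so only the standard library is needed.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts taken from the Results text.
total_scored, total_lof = 1963, 1137
in_gnomad, in_gnomad_lof = 202, 92

a = in_gnomad_lof                      # in gnomAD, loss of function
b = in_gnomad - in_gnomad_lof          # in gnomAD, function retained
c = total_lof - in_gnomad_lof          # not in gnomAD, loss of function
d = (total_scored - total_lof) - b     # not in gnomAD, function retained

chi2, p_value = chi_square_2x2(a, b, c, d)
expected_lof_in_gnomad = in_gnomad * total_lof / total_scored
print(f"expected loss-of-function variants in gnomAD: {expected_lof_in_gnomad:.0f}")
print(f"chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

Under this assumed layout, the observed 92 loss-of-function variants in gnomAD fall below the roughly 117 expected if gnomAD membership were independent of functional class, which is the direction of the reported depletion.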

Evolutionary variability of PAI-1

A protein alignment of PAI-1 orthologs from 94 mammal species was constructed using the Comparative Genomics tools of the Ensembl webserver (www.ensembl.org; release 104) to search for orthologs to the human SERPINE1 gene (ENSG00000106366). We trimmed the alignment to include only the 379 positions contained in human PAI-1, and removed sequences containing more than 5 percent gaps after trimming. We then used ConSurf (https://consurf.tau.ac.il/) to calculate evolutionary conservation scores for each position in the protein, where higher ConSurf scores indicated more evolutionarily variable positions. To relate the functional susceptibility of each amino acid position to substitutions in our library to the same position’s evolutionary conservation score, a normalized functional mutation score was determined. This score was equivalent to the number of significantly enriched or functional missense mutations at a given position divided by the total number of missense mutations with sufficient depth to be analyzed in our library at the same position. The normalized functional mutation scores were further divided into quartiles and used for an analysis of variance (ANOVA) to test whether the normalized functional mutation score predicted the degree of evolutionary conservation at the same position.
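The scoring and grouping steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-position counts and conservation scores in `positions` are made up, the function names are hypothetical, and rank-based quartile cut points from Python's `statistics.quantiles` are an assumption about how quartiles were formed. The one-way ANOVA F statistic is computed by hand from the between-group and within-group sums of squares.

```python
from statistics import mean, quantiles


def normalized_functional_score(n_functional, n_scored):
    """Functional (enriched) missense mutations divided by all scored mutations."""
    if n_scored == 0:
        return None  # position has no scorable mutations
    return n_functional / n_scored


def assign_quartile(score, cuts):
    """Map a score to quartile 1-4 given three ascending cut points."""
    return 1 + sum(score > cut for cut in cuts)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up positions: (functional mutations, scored mutations, conservation score).
positions = [
    (0, 5, -0.7), (1, 6, -0.5), (2, 6, -0.2), (2, 5, 0.1),
    (3, 5, 0.4), (4, 6, 0.9), (5, 6, 1.5), (5, 5, 2.2),
]

scored = [(normalized_functional_score(f, n), cons) for f, n, cons in positions]
scored = [(s, cons) for s, cons in scored if s is not None]
cuts = quantiles([s for s, _ in scored], n=4)

# Group conservation scores by the quartile of the functional score.
groups = {q: [] for q in (1, 2, 3, 4)}
for s, cons in scored:
    groups[assign_quartile(s, cuts)].append(cons)

f_stat, df_between, df_within = one_way_anova_f([groups[q] for q in (1, 2, 3, 4)])
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

With four quartile groups and all 379 PAI-1 positions included, the same calculation has 3 and 375 degrees of freedom, matching the degrees of freedom reported for the ANOVA.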