Deep mutational scanning of the plasminogen activator inhibitor-1 functional landscape

The serine protease inhibitor (SERPIN) plasminogen activator inhibitor-1 (PAI-1) is a key regulator of the fibrinolytic system, inhibiting the serine proteases tissue- and urokinase-type plasminogen activator (tPA and uPA, respectively). Missense variants can render PAI-1 non-functional through misfolding, through its turnover as a protease substrate, or through a more rapid transition to the latent/inactive state. Deep mutational scanning was performed to evaluate the impact of amino acid sequence variation on PAI-1 inhibition of uPA using an M13 filamentous phage display system. Error-prone PCR was used to construct a mutagenized PAI-1 library encompassing ~70% of potential single amino acid substitutions. The relative effects of 27% of all possible missense variants on PAI-1 inhibition of uPA were determined using high-throughput DNA sequencing. A total of 826 missense variants demonstrated conserved inhibitory activity, while 1137 resulted in loss of PAI-1 inhibitory function. The least evolutionarily conserved regions of PAI-1 were also identified as being the most tolerant of missense mutations. The results of this screen confirm previous low-throughput mutational studies, including those of the reactive center loop. These data provide a powerful resource for explaining structure–function relationships for PAI-1 and for the interpretation of human genomic sequence variants.

www.nature.com/scientificreports/ regulating hemostasis, PAI-1 has been implicated in a number of other processes, including a link to longevity 23 and numerous other pathogenic processes 24 . Given the wide-ranging functions of PAI-1, understanding the effects of potentially damaging mutations in PAI-1 may have clinical implications beyond the canonical role of PAI-1 in regulating fibrinolysis. The present work couples phage display with high-throughput DNA sequencing (HTS), to measure the effects of multiple missense mutations on PAI-1's uPA inhibitory function in a massively parallel fashion 25 . The overall goal is to map the mutational landscape of PAI-1 with respect to mutations that render PAI-1 no longer functional to (1) better understand the natural evolution of PAI-1 with respect to the amino acid space that can be occupied at any given residue, but also (2) to use this high resolution map of PAI-1 as a basis for engineering novel SERPINs. SERPINs, including PAI-1, have been previously engineered to inhibit proteases other than their canonical targets 26,27 demonstrating the potential to develop novel therapeutics for the treatment of a variety of disorders including hemophilia and alpha-1-antitrypsin deficiency. PAI-1 is a particularly attractive choice as a SERPIN scaffold as it lacks native cysteine residues and remains functional in the absence of glycosylation, facilitating large-scale production of functionally active, recombinant PAI-1 in bacterial systems 28,29 .

Results and discussion
Characterization of PAI-1 fusion protein. Phage-displayed PAI-1 (phPAI-1) was selected for complex formation with uPA in the presence of a ninefold molar excess of a negative control, phage displaying the A3 domain of von Willebrand Factor (phVWF-A3; a VWF fragment not known to interact with uPA) (Fig. 2). Selection resulted in at least a five-fold enrichment of phPAI-1 relative to phVWF-A3, indicating that the immunoprecipitation is specific for uPA and uPA:PAI-1 complexes. Given the rigorous washing of the immunoprecipitated complex, this enrichment further suggests that PAI-1 expressed as a pIII fusion protein on the phage surface retains its inhibitory activity 30-32.
Figure 1. […] 59,60. Both free and complexed uPA can be immunoprecipitated using an anti-uPA antibody. Alternatively, PAI-1 can spontaneously relax from its active, metastable state to a low-energy yet chemically inert latent conformation (PDB: 1DVN) 61. PAI-1's reactive center loop is highlighted in orange. (b) Schematic of phPAI-1 displayed as a fusion to the pIII coat protein of M13 filamentous phage, with the N-terminal myc tag and C-terminal FLAG and E tags highlighted. This figure was generated using Adobe Illustrator 2021 version 25.2.3 (a,b), PyMOL version 2.5.0 (a), and BioRender.com (b).
Characterization of the mutant library. The phPAI-1 mutant library has a depth of 8.04 × 10^6 independent clones with an average of 4.2 ± 1.8 amino acid substitutions per molecule, as determined by Sanger sequencing of 13 randomly selected phage clones. HTS demonstrated that 5117 of the possible 7201 missense variants (71%) are present in the mutant phPAI-1 library, along with at least one nonsense mutation at 269 of the 379 PAI-1 amino acid positions (71%). The frequency of DNA sequencing reads for individual amino acid substitutions within the starting library ranged over > 10^4-fold.
As expected, the frequency of specific amino acid substitutions also varied based on the genetic code, with, for example, reduced representation of Met and Trp substitutions (both encoded by only a single codon), compared to Arg, Leu, and Ser substitutions (each encoded by six codons) (Fig. 3).
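The codon counts cited above can be checked directly against the standard genetic code. The following is a minimal sketch, not part of the original analysis; the names `CODON_TABLE` and `codons_per_aa` are introduced here purely for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in
# T, C, A, G order for each position; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons encoding each amino acid (codon degeneracy).
codons_per_aa = Counter(CODON_TABLE.values())

for aa in "MWRLS":
    print(aa, codons_per_aa[aa])
```

Amino acids with more synonymous codons offer more mutational routes from any starting codon, which is one plausible reason for their higher representation in a randomly mutagenized library.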
Figure 2. Immunoprecipitation is specific for phPAI-1:uPA complexes. Nine parts phVWF-A3 and one part phPAI-1 (9:1) were combined (input, n = 3), incubated with uPA (1.7 nM) for 30 min at 37 °C, and selected by immunoprecipitation with an anti-uPA antibody (selected, n = 3). For each replica, 24 colonies were genotyped by PCR using primers common to both the phVWF-A3 (677 bp) and phPAI-1 inserts (1271 bp) followed by analysis on a 1% agarose gel. This figure was generated using GraphPad Prism version 9.0.2.
Figure 3. phPAI-1 mutant library generated by error prone PCR includes more than two-thirds of all possible missense mutations. The mutational library contains 71% of all possible missense and nonsense mutations with 27% of all missense variants present with sufficient depth (base mean score > 10, p_adj < 0.05) to accurately determine the effects of the mutation on PAI-1 function. The primary amino acid position within PAI-1 is indicated along the x-axis and single amino acid substitutions are listed along the y-axis. WT amino acid residues are indicated in yellow, while missense and nonsense (X) mutations not present in the input library are shown in white. Variants present within the library are shown in grey (see scale) as a percentage of the input library represented by that variant. This figure was generated using Adobe Illustrator 2021 version 25
To limit the proportion of phPAI-1 variants transitioning to the non-reactive latent conformation, all reactions with uPA were performed immediately following phage production. To limit false positives within the dataset, only those variants with a base mean score (average of the normalized counts in the input and selected libraries corrected for sequencing depth as defined by Love et al. 33) greater than 10 and an adjusted p value (p_adj) < 0.05 were included in further analyses (Fig. 4).
Based on these criteria, 1963 (38%) of the 5117 missense variants present in the starting library could be scored for uPA reactivity or lack thereof. Although not a complete profile of all mutational space, these data represent a marked expansion of the mutational space that has been explored in previous reports 34. Furthermore, the use of HTS facilitates accurate assessment of both gain- and loss-of-function mutations after only a single round of panning 31,35.
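The coverage figures quoted in the text follow from simple arithmetic on the reported counts. The sketch below reproduces them; the variable and function names are illustrative only, and the sole inputs are the counts stated in the text (379 positions, 5117 variants present, 1963 variants scored).

```python
# Counts taken from the text; everything else is derived.
PAI1_LENGTH = 379           # amino acid positions in PAI-1
ALTERNATIVE_RESIDUES = 19   # 20 amino acids minus the wild-type residue

possible_missense = PAI1_LENGTH * ALTERNATIVE_RESIDUES
present_in_library = 5117
scored = 1963


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


print(possible_missense)
print(round(percent(present_in_library, possible_missense)))  # library coverage
print(round(percent(scored, present_in_library)))             # scored, of those present
print(round(percent(scored, possible_missense)))              # scored, of all possible
```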
Massively parallel assessment of variant impact on uPA inhibitory function. Following selection with uPA, 826 PAI-1 missense variants retained the ability to form a complex with uPA, with a range of enrichment scores likely representing varying degrees of inhibitory activity towards uPA (Fig. 5). Similarly, depleted variants (log2-fold enrichment score ≤ 0, n = 1137) are broadly classified here as loss-of-function and likely include variants that retain a low level of inhibitory activity towards uPA; again, this reflects that the approach enables mapping of functional variability with respect to both gain and loss of function (Fig. 5). Missense variants were enriched up to 6-fold or depleted up to 23-fold, and different amino acid substitutions at the same position may exhibit opposite effects. For comparison, consider two mutations at Ile91, I91L and I91N, each representing approximately 0.0005% of the input library (Fig. 3). Following selection with uPA, I91L was enriched three-fold, consistent with previous reports that this mutation not only preserves PAI-1's inhibitory function but also extends its functional half-life 31. In contrast, I91N was depleted three-fold, demonstrating that while the I91L mutation is well tolerated, I91N results in loss of function with respect to uPA inhibition. Of note, the selection method employed here (complex formation with uPA) does not distinguish between the three potential mechanisms for loss of function: PAI-1 misfolding, accelerated transition to the inactive latent state, and/or serving as a substrate for uPA. All three of these loss-of-function phenotypes would result in the inability of PAI-1 to form a covalent complex with uPA, and such variants would therefore be lost during selection for uPA binding.
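The scoring logic described above (a depth filter, a significance filter, then the sign of the log2-fold enrichment) can be sketched as follows. This is an illustrative reconstruction rather than the pipeline used in the study: the `VariantResult` class and `classify` function are hypothetical, and the base mean and adjusted p values in the example are invented placeholders. Only the thresholds and the three-fold changes for I91L and I91N come from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class VariantResult:
    name: str
    base_mean: float         # depth-normalized mean count across input and selected pools
    log2_fold_change: float  # selected vs. input
    p_adj: float             # multiple-testing adjusted p value


def classify(v, min_base_mean=10.0, max_p_adj=0.05):
    """Apply the thresholds stated in the text to a single variant."""
    if v.base_mean <= min_base_mean or v.p_adj >= max_p_adj:
        return "not scored"
    return "functional" if v.log2_fold_change > 0 else "loss of function"


# Three-fold enrichment and depletion expressed on the log2 scale.
# base_mean and p_adj here are made-up placeholder values.
i91l = VariantResult("I91L", base_mean=50.0, log2_fold_change=math.log2(3), p_adj=0.01)
i91n = VariantResult("I91N", base_mean=50.0, log2_fold_change=-math.log2(3), p_adj=0.01)
low_depth = VariantResult("example", base_mean=4.0, log2_fold_change=1.0, p_adj=0.01)

for v in (i91l, i91n, low_depth):
    print(v.name, classify(v))
```

Variants that fail either filter are left unscored rather than assigned to a class, mirroring the exclusion of low-depth or non-significant variants from further analysis.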
Implications for structure-function relationships. The results of our PAI-1 functional screen can be used to assess specific regions within PAI-1 without the additional construction of targeted libraries. This point is highlighted by analysis of the RCL (residues 331-350), as illustrated in Fig. 6, although a similar approach could be applied to other regions of interest. In the RCL, the observed enrichment and depletion at the P1 and P1′ positions (residues 346 and 347) are consistent with our understanding of PAI-1 biology. The P1 position has been shown to be a key determinant of SERPIN target protease specificity 36-40, with PAI-1 inhibitory activity toward uPA requiring either a P1 Lys or the WT Arg residue 41. Consistent with these previous reports, no missense mutations were tolerated at P1 in our screen (of note, lysine at this position is absent from our library), with several substitutions significantly depleted (Figs. 3, 6). Consistent with the previously reported tolerance of the P1′ position for most amino acid substitutions 41, our screen identified no loss-of-function PAI-1 variants at this position (Fig. 6). At the N-terminus of the RCL, enriched or tolerated substitutions observed in our data generally consist of small aliphatic and polar amino acids. For PAI-1 to retain its inhibitory function, this region of the RCL must be able to insert into β-sheet A 9. These small amino acids allow the RCL to undergo the dramatic conformational changes that are required for this insertion. Consistent with this model, substitutions with bulky and/or charged side chains (Lys, Arg, Pro, Asp, Phe) were the most depleted residues for those N-terminal RCL positions whose side chains become oriented into the core β-sheet upon insertion 42-45.
In contrast, residues C-terminal to the scissile bond (P2′-P4′) are more tolerant of mutations than those at the N terminus of the RCL, as the former region does not insert into the central β-sheet 46 . Finally, the flexibility of the RCL is also important for dictating PAI-1's inhibitory behavior, and our data are concordant with a previous proline-scanning mutagenesis screen 42 . Proline residues in the RCL would also be incompatible with RCL insertion into β-sheet A, which transforms it from a largely parallel β-sheet to a more stable anti-parallel β-sheet.
Correlation with predictive algorithms and human genome sequence variant data. A number of algorithms have been developed to predict the impact of single amino acid substitutions on protein function based on evolutionary conservation and/or amino acid type 47. We compared our high-throughput screening data with predictions from two commonly used algorithms, SIFT 48 and PolyPhen-2 49. SIFT predicts the effects of amino acid substitutions by comparison to homologous sequences, and PolyPhen-2 uses both sequence conservation and structural homology to predict the effects of amino acid substitutions on protein function. The SIFT algorithm prediction was concordant for 745 of the 1137 (66%) amino acid substitutions scored as "loss of function" in our screen and for 538 of the 826 (65%) scored as neutral. This level of concordance is similar to that previously reported for known deleterious human genetic mutations in other genes 48. PolyPhen-2 exhibited concordance with our data for 994 of 1137 (87%) "loss of function" substitutions, but only 454 of the 836 (54%) neutral PAI-1 amino acid substitutions 49. Overall, while these algorithms are a valuable resource for predicting protein functionality, they are unable to correctly assign all missense variants, emphasizing the need for deep mutational scanning of multiple different types/families of proteins.
Additionally, available human genomic sequence information provides support for the potential value of our data in interpreting the significance of human genetic variation identified by future clinical sequencing. The gnomAD database 50 catalogs human amino acid sequence variant information from ~140,000 human exomic/genomic sequences, including 202 variants scored in our mutation screening analysis. Of these 202 variants, 92 were classified by our data as "loss of function", significantly fewer than expected by chance (p = 2 × 10^-4; SI Table 3), consistent with negative (purifying) selective pressure in the human population to maintain PAI-1 activity.

Evolutionary conservation of PAI-1 is consistent with mutational tolerance.
To determine whether the distribution of functional missense mutations detected in this screen reflected the evolutionary constraints of individual PAI-1 amino acid positions, the evolved variation in natural sequences was leveraged. PAI-1 sequences of 84 extant mammalian species were present in the cleaned alignment. Significant differences in evolutionary conservation of sites in these alignments were observed among positions manifesting varying numbers of functional mutants in our mutational scanning data (ANOVA F(3,375) = 24.5, p < 0.0001, R^2 = 0.23, Fig. 7). The overall trend was toward increasing evolutionary lability (less conservation) in the positions that accepted more functional mutants in our human PAI-1 constructs. For example, position 346, which defines the P1 site, is among the most evolutionarily conserved residues (evolutionary conservation score = −0.774) and is correspondingly found in the first quartile of the normalized functional scores (Fig. 7). Natural exploration of sequence space through evolutionary time in PAI-1 therefore provides a partial guide to mutational tolerance that complements deep mutational scanning, which can explore mutational space that is hitherto unseen in nature. Overall, these results suggest that there is a limited mutational space that is consistent with PAI-1 functionality as a specific uPA inhibitor, and that altering the specificity of PAI-1 for novel serine proteases will likely require expansion into as yet unexplored regions of PAI-1's mutational landscape.

Conclusions
Deep mutational scanning has been applied to a number of proteins to analyze function, binding interactions, cellular protein abundance, cell growth/viability, and protein stability 25,28,29,51,52 . In the present study, we have adapted this approach to construct a detailed map for the mutational landscape of PAI-1 with respect to the gain or loss of its capacity to inhibit uPA. We anticipate that the mutational landscape of PAI-1 for other serine proteases, including its other canonical substrate, tPA, would likely demonstrate significant differences 53 , enabling engineering of PAI-1-like SERPINs with novel inhibitory profiles. Previous PAI-1 mutational studies 34 have been restricted to limited segments of PAI-1 41,53 , or selected for a few variants with a unique functional impact, such as extended functional stability 31,35 . The error prone PCR approach used here to generate the phPAI-1 mutant library offers speed and ease of application with broad coverage of a significant subset of potential single amino acid substitutions. However, variant coverage is incomplete (Fig. 3), providing significant loss of function data for only a subset of mutation space (Fig. 5). Future advances in molecular approaches and machine learning algorithms will facilitate a comprehensive map of the mutation landscape for PAI-1 and numerous other proteins 54 .
Broadly, the work presented herein demonstrates how deep mutational scanning complements predictive algorithms of protein function and patterns observed in natural evolutionary processes. Furthermore, the data reported in this study provide a valuable resource for the interpretation of sequence variants in PAI-1 and other genes identified by the expanding clinical application of human whole genome sequencing.

Methods
Construction of a phage display library expressing PAI-1 fusion proteins. For display of PAI-1 on the M13 filamentous bacteriophage (phPAI-1, Fig. 1B), human SERPINE1 cDNA including an N-terminal myc tag and Gly-Gly-Gly-Ser linker was cloned between the AscI and NotI restriction sites of pAY-FE (GenBank #MW464120) 55,56. The resulting construct encodes a phage-displayed PAI-1 protein N-terminally fused to a myc tag and C-terminally fused to FLAG and E tags (Fig. 1). The PAI-1 fusion protein was randomly mutagenized using the GeneMorph II Random Mutagenesis Kit (Agilent Technologies, Santa Clara, CA, USA). Primers used for PCR mutagenesis (SI Table 1) maintained the AscI and NotI restriction sites for ligation of the restriction-digested insert into pAY-FE. Following ligation, the library was transformed into electrocompetent XL-1 Blue MRF' E. coli as per the manufacturer's instructions. The depth of the library was determined by quantifying the number of ampicillin-resistant colonies. Mutation frequency was estimated by Sanger sequencing of the SERPINE1 inserts from randomly selected individual colonies (n = 13).
Phage production and purification. Phage were prepared as previously reported 57. Briefly, E. coli harboring pAY-FE PAI-1 were grown in LB Broth supplemented with 2% glucose and ampicillin (100 μg/mL) at 37 °C and during mid-log phase (OD600 0.3-0.4) were infected with M13KO7 helper phage at a multiplicity of infection of ~100, followed by growth for an additional 1 h at 37 °C. Cells were pelleted by centrifugation (4250×g for 10 min at 4 °C), resuspended in 2xYT media (16 g/L tryptone, 10 g/L yeast extract, 5 g/L NaCl) supplemented with ampicillin (100 μg/mL), kanamycin (30 μg/mL) and IPTG (0.4 mM) to induce expression of the PAI-1 fusion protein, and grown for 2 h at 37 °C 55. All subsequent phage preparation steps were carried out at 4 °C to minimize PAI-1 transition to latency (SI Fig. 1). Phage were precipitated with polyethylene glycol-8000 (2.5% w/v) and NaCl (0.5 M) for up to 16 h followed by centrifugation (20,000×g for 20 min at 4 °C). The precipitated phage pellet was resuspended in 50 mM Tris containing 150 mM NaCl (pH 7.4; TBS). Phage titer was determined by transducing naïve XL-1 Blue MRF' E. coli grown to mid-log phase for 1 h at 37 °C and plating on LB-agar supplemented with ampicillin (100 μg/mL) and 2% glucose.
Selection of uPA-bound phPAI-1. Phage displaying the VWF A3 domain (phVWF-A3) were used as a negative control for uPA binding; the A3 domain (Ser1681-Cys1872) was PCR amplified (SI Table 1) and cloned into the pAY-FE vector between the AscI and NotI restriction sites (generating pAY-FE-VWF-A3). phPAI-1 were mixed with phVWF-A3 at a ratio of one part phPAI-1 to nine parts phVWF-A3 and then incubated with uPA (1.7 nM) for 30 min at 37 °C. Residual protease activity was inhibited by incubating the reaction mixture with 1X EDTA-free protease inhibitor cocktail for 10 min at 37 °C. uPA (free and complexed) was immunoprecipitated using magnetic protein G beads (15 μL) that had previously been coupled to a polyclonal anti-uPA antibody (17 nM). Beads were washed four times with TBS containing 5% BSA (1 mL) and resuspended in Tris (20 mM) pH 8.0 containing 50 mM NaCl, 2 mM CaCl2, and 5% BSA, and bound phage were eluted by digestion with enteropeptidase (16 U, New England BioLabs, Ipswich, MA, USA) for 16 h at 4 °C. The eluted phage pool was used to infect naïve XL-1 Blue MRF' E. coli. Eluted phage titers were quantified by transduction of XL-1 Blue MRF' cells as described above. To determine the composition of the phage pools before and after selection, single colonies of ampicillin-resistant bacteria were selected, and their DNA was amplified by PCR using primers annealing outside the insertion site, to a region common to both pAY-FE:PAI-1 and pAY-FE:VWF-A3 (SI Table 1), with three replicates of n = 24 colonies each.
High-throughput sequencing (HTS). Twelve overlapping amplicons (150 bp) were PCR amplified from pAY-FE PAI-1 (SI Table 1) with overlapping regions only analyzed on one amplicon (SI Table 2 […] 58. To compare SIFT results to our data, tolerated mutations were defined as those that were able to inhibit uPA (log2-fold change > 0) at 0 h, and noninhibitors (log2-fold change < 0) at 0 h were defined as not tolerated.
Finally, a χ2 test (SI Table 3) was used to determine if the variants identified as loss-of-function by our high-throughput screen were significantly underrepresented in the gnomAD database 50 by comparing the expected frequency of variants identified in our screen that were present in gnomAD versus those that were not present.
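One way such a test could be set up is as a 2 × 2 contingency table of functional class against presence in gnomAD. The sketch below is a reconstruction under stated assumptions, not the authors' calculation: SI Table 3 is not reproduced here, so the table is rebuilt from counts given in the main text (1137 loss-of-function and 826 functional variants scored; 202 scored variants present in gnomAD, 92 of them loss-of-function), and the function `chi_square_2x2` is a hypothetical helper that computes an uncorrected Pearson χ2 statistic.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p value for 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts reconstructed from the main text (assumed layout, see lead-in).
lof_in_gnomad = 92
functional_in_gnomad = 202 - 92
lof_not_in_gnomad = 1137 - 92
functional_not_in_gnomad = 826 - functional_in_gnomad

chi2, p_value = chi_square_2x2(
    lof_in_gnomad, functional_in_gnomad,
    lof_not_in_gnomad, functional_not_in_gnomad,
)
print(f"chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

Under these assumptions the p value falls on the order of 10^-4, the same order as the value reported in the text; a continuity-corrected test or a different table layout would shift it slightly.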

Evolutionary variability of PAI-1. A protein alignment of PAI-1 orthologs from 94 mammal species was constructed using the Comparative Genomics tools of the Ensembl webserver (www.ensembl.org; release 104) to search for orthologs to the human SERPINE1 gene (ENSG00000106366). We trimmed the alignment to include only the 379 positions contained in human PAI-1, and removed sequences containing more than 5 percent gaps after trimming. We then used ConSurf (https://consurf.tau.ac.il/) to calculate evolutionary conservation scores for each position in the protein, where higher ConSurf scores indicated more evolutionarily variable positions.
To relate the functional susceptibility of each amino acid position to substitutions in our library to the same position's evolutionary conservation score, a normalized functional mutation score was determined. This score was equivalent to the number of significantly enriched or functional missense mutations at a given position divided by the total number of missense mutations with sufficient depth to be analyzed in our library at the same position. The normalized functional mutation scores were further divided into quartiles and used for an analysis of variance (ANOVA) to test whether the normalized functional mutation score predicted the degree of evolutionary conservation at the same position.
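The per-position score and the one-way ANOVA described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, the handling of positions with no scored variants is an assumption, and the conservation values in the example are invented toy numbers.

```python
def normalized_functional_score(n_functional, n_scored):
    """Fraction of scored missense variants at one position that remained functional."""
    if n_scored == 0:
        return None  # assumption: positions with no scored variants are left undefined
    return n_functional / n_scored


def one_way_anova(groups):
    """Return (F statistic, between-group df, within-group df) for a list of groups."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy conservation scores grouped by functional-score quartile (invented values).
toy_quartiles = [
    [-1.0, -0.8, -0.6],
    [-0.5, -0.3, -0.1],
    [0.0, 0.2, 0.4],
    [0.8, 1.0, 1.2],
]
f_stat, df_between, df_within = one_way_anova(toy_quartiles)
print(normalized_functional_score(3, 12), f_stat, df_between, df_within)
```

With four quartile groups, the between-group degrees of freedom are 3; if all 379 positions were included, the within-group degrees of freedom would be 375.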