Mapping specificity, entropy, allosteric changes and substrates in blood proteases by a high-throughput protease screen

Proteases are among the largest protein families in eukaryotic phyla, with more than 500 genetically encoded proteases in humans. By cleaving a wide range of target proteins, proteases are critical regulators of a vast number of biochemical processes, including apoptosis and blood coagulation. Over the last 20 years, knowledge of proteases has been drastically expanded by the development of proteomic approaches to identify and quantify proteases and their substrates. In spite of their merits, some of these methods are laborious, not scalable or incompatible with native environments. Consequently, a large number of proteases remain poorly characterized. Here, we introduce a simple proteomic method to profile protease activity based on isolation of protease products from native lysates using a 96FASP filter and their analysis in a mass spectrometer. The method is significantly faster, cheaper and technically less demanding than current alternatives, is easily multiplexed and produces accurate protease fingerprints in near-native conditions. By using the blood cascade proteases as a case study, we obtained protease substrate profiles of unprecedented depth that can be reliably used to map specificity, entropy and allosteric changes of the protease, to design fluorescent probes and to predict physiological substrates. The native protease characterization method is comparable in performance to current alternatives but largely exceeds their throughput.

To fully exploit the potential of the method, we applied it to comprehensively characterize the blood cascade serine proteases. The group of enzymes tested consists of the blood coagulation proteases aFVII, aFIX, aFX, aFXI, activated α-Thrombin, PLG and aPC, as well as the Thrombin isoforms β- and γ-Thrombin, which are autoproteolytic products of α-Thrombin. We chose these proteases because they (i) are biologically and chemically related; (ii) have a substantial therapeutic potential; (iii) have been to some extent structurally characterized and (iv) have a repertoire of substrates that is not yet fully characterized. The concerted action of these serine proteases regulates blood clot formation through activation of Thrombin, which converts fibrinogen to insoluble fibrin and activates platelets via PAR1 proteolytic activation 44. Besides the nine blood coagulation proteases, we also included chymotrypsin in the screen because it has the archetypal protease structure for the S1 chymotrypsin-like family 45.

The respective proteases were analyzed using the HTPS workflow. The detected specificity features are summarized in Figure 3A. For each protease we identified cleavages ranging from 1800 for aFVII up to more than 10000 for PLG (Figure 3B). This represents an increase in the number of identified cleavages by about two orders of magnitude compared to the MEROPS database (428 vs 38513 unique substrates/cleavages, Figure 1C, Table S2). While most of these substrates are not likely to be processed during blood coagulation due to the nature of the substrate sample, they are nevertheless very useful to determine the cleavage specificity, entropy, allostery and other functional/structural properties of the proteases. The heat maps shown in Figure 3A and Figure S4 report significant (corrected p-value<0.01, Table S5) [...] indicating that, similar to Trypsin and Chymotrypsin, the protease specificity is determined mainly by the P1 position. This is also evident from the sub-pocket resolved cleavage entropy profiles (Figure 3D), which show the substrate preference per position for each protease. Proteases from cluster 1 are promiscuous proteases: their specificity is essentially determined by the amino acid in position P1 and, as a consequence, they cleave more frequently compared to other coagulation proteases (Figure 3B). [...] pocket, generates a more extended specificity for coagulation proteases (from P3 to P2') and thus a distinct substrate fingerprint. As an example, the preference of aFX for Gly in position P2 (Figure 3A) is generated by bulky residues in the insertion loop, which accepts small amino acids at the corresponding substrate positions 48. In α-Thrombin, the 60-loop generates the preference in position P2 for Pro, hydrophobic and planar residues and a preference in position P1' for small residues like Ala (Figure 3A).
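To make the notion of a per-position cleavage entropy profile concrete, the following minimal sketch computes a Shannon entropy for each subsite from a set of aligned cleavage windows. It is an illustration under stated assumptions, not the implementation used in this study: the function name `positional_entropy`, the toy windows and the choice of a base-20 logarithm (so that 0 means a single residue is accepted and 1 means all 20 amino acids are equally frequent) are assumptions, and no correction for background amino acid frequencies is applied.

```python
import math
from collections import Counter


def positional_entropy(windows, base=20):
    """Return one Shannon entropy value per position of aligned cleavage windows.

    `windows` is a list of equal-length strings (e.g. P4..P4').
    With base=20, values range from 0 (one residue only) to 1 (uniform usage).
    """
    if not windows:
        raise ValueError("need at least one cleavage window")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all cleavage windows must have the same length")

    entropies = []
    for pos in range(length):
        counts = Counter(w[pos] for w in windows)
        total = sum(counts.values())
        # Sum of -p * log(p) over the residues observed at this position.
        h = sum(-(c / total) * math.log(c / total, base) for c in counts.values())
        entropies.append(h)
    return entropies


# Hypothetical P4..P4' windows, invented for illustration only.
toy_windows = ["GIPRAAGD", "AVGRSLKE", "LTPRGVNF", "DSARTQWL"]
profile = positional_entropy(toy_windows)
```

In this toy input every window carries Arg at the fourth position (P1), so the entropy there is zero, whereas positions with varied residues yield higher values; a low-entropy subsite is one that contributes strongly to specificity.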

The key role of the steric hindrance of the 60-loop in the selectivity of α-Thrombin was shown by   (Table S2,

In the previous section we found that specificity profiles generated with our method were sensitive enough to detect subtle differences between the investigated proteases. We next asked whether HTPS could detect changes of specificity profiles corresponding to structural rearrangements of a protease catalytic pocket [...] regulatory mechanism of proteins where the binding of an allosteric effector modulates conformational and consequently functional changes of a protein. To investigate the effects of Na+ on the reorganization of the hydrophobic pocket in the active site of blood coagulation proteases and the ensuing effects on activity and specificity, we generated protease fingerprints of the tested coagulation proteases in the presence of 0.2 M NaCl or choline chloride (ChCl). ChCl was used to keep the ionic strength constant without exerting an allosteric effect 57. Different from other thrombin allostery studies 55, the reaction was performed at the physiological temperature of 37°C and 0.2 M NaCl, where about 50% of thrombin is bound to Na+ ions 56, to ensure efficient proteolysis. We first tested whether the experimental conditions recapitulated the well-known activity patterns of α-Thrombin towards Fibrinogen (FGA) and PC. Previous studies have shown that α-Thrombin, when bound to Na+ (Fast form), has an enhanced activity towards [...] the protease that influences α-Thrombin activity with a negative feedback mechanism 58-60. As the dissociation constant of Na+ binding to α-Thrombin is close to the concentration of the ion in blood at 37°C 58, a subtle deviation of Na+ concentration, e.g. around platelet thrombi in vivo, generates a different substrate selectivity with an important implication for the pro- vs. anti-coagulatory activity of α-Thrombin. The cleavage patterns observed in our data were used to infer the amino acid specificity enrichment of α-Thrombin towards its known physiological substrates (FGA and PC) 54. In the presence of NaCl we observed lower p-values for the FGA compared to the PC substrate, while in the presence of ChCl we identified a similar distribution of p-values (Figure 4A). This is in agreement with previous studies which showed that 0.2 M Na+ led to an increase in the specificity towards FGA, but not towards PC 58. Together with the boosted activity of α-Thrombin in the presence of Na+ (Figure 4C), this also results in a higher rate of FGA cleavages.

Other proteases, similar to α-Thrombin, exhibited an increase in proteolytic activity and these patterns [...] Figure 4C, Figure 3A).

We next performed an unsupervised hierarchical clustering to investigate the impact of allostery on the specificity and entropy of the blood coagulation proteases included in the study. We clustered the significant specificity changes detected in the presence of NaCl vs. ChCl (Figure 4D) and observed that proteases regulated allosterically by Na+ clustered closely together and showed significant changes in their substrate specificity. In contrast, no significant changes were observed for proteases that cannot bind Na+ and thus cluster separately. The specificity differences observed in the case of aFVII, aFIX, aPC, aFX, β- and α-Thrombin suggest that Na+ had an impact not only on the number of cleavages, but also on substrate specificity (Figure S7). Importantly, this is also evident from the correlation between activity changes and the changes detected at the level of substrate preference, with an R² value of 0.96 (Figure 4E). Next, we [...] Figure S8B). We observed also in this case an R² value of 0.83, demonstrating that proteases where Na+ had the most significant impact on activity showed corresponding changes at the level of entropy and specificity. These results demonstrate that Na+ binding to the allosteric site of coagulation cascade proteases regulates their activity in a way that reflects on protease substrate preference (Figure 4F) and entropy (Figure 4G), with strong changes observed, for example, for [...] all coagulation proteases included in the study, and showed that HTPS has the capacity to detect the functional consequences of allosteric changes at an exquisite level of sensitivity. We found that differential activity, specificity and entropy correlated exactly with the presence/absence of residues that enable Na+ coordination. This highlights the potential of this approach as a tool for systematic screening [...] Figure 5A, 5B). We designed two synthetic peptides that represent the best match according to the detected specificity for activated α-Thrombin (NH2-GIPR↓AAGD-COOH) and aFX (NH2-GIGR↓RIAE-COOH). As our analysis investigated the positional specificity but did not take into account possible sub-site cooperativity 53, we confirmed that these peptides were cleaved effectively by the respective proteases. We monitored the intensity of the cleavage products by mass spectrometry, using MS1 signal integration (Figure 5C). The results showed the expected patterns and thus confirmed that both synthetic peptides represent a good entry point for development of substrates.
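The idea of a "best match" peptide derived from positional specificity can be sketched as follows: for each subsite independently, take the residue with the highest enrichment score. This is only a minimal illustration assuming a simple per-position score table; the function name `best_match_peptide` and all scores in `toy_scores` are invented placeholders, not values from this study, and they were chosen so that the toy output mirrors the α-Thrombin peptide named in the text. Because each position is treated independently, the sketch shares the limitation noted above: it ignores sub-site cooperativity.

```python
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]


def best_match_peptide(scores, positions=POSITIONS, site_after="P1"):
    """Pick the top-scoring residue at each position; '|' marks the scissile bond."""
    residues = []
    for pos in positions:
        column = scores[pos]
        # sorted() makes ties resolve alphabetically, so the output is deterministic.
        best = max(sorted(column), key=column.get)
        residues.append(best)
        if pos == site_after:
            residues.append("|")
    return "".join(residues)


# Invented enrichment scores (arbitrary units), for illustration only.
toy_scores = {
    "P4": {"G": 1.2, "L": 0.8},
    "P3": {"I": 1.5, "V": 1.1},
    "P2": {"P": 3.0, "G": 1.0},
    "P1": {"R": 6.0, "K": 2.0},
    "P1'": {"A": 1.4, "S": 1.3},
    "P2'": {"A": 0.9, "I": 0.7},
    "P3'": {"G": 0.6, "A": 0.5},
    "P4'": {"D": 0.4, "E": 0.3},
}
peptide = best_match_peptide(toy_scores)
```

A peptide obtained this way is a starting hypothesis only, which is why experimental confirmation of cleavage, as described above, remains necessary.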

To evaluate how exactly these peptides bind to the protease active site (AS), we performed a molecular docking analysis. The structural data for α-Thrombin (1ppb) and aFX (1g2l) showed strong [...] accessible in aFX. Further, the aryl-binding site S4 of α-Thrombin, located above the conserved residue [...] residues Tyr 99 and Phe 174, which together with the indole ring system of Trp 215 form the walls of an aromatic box. Our docking models (Figure 5D) showed how the residue P1 (Arg) can be effectively oriented inside the S1 pocket, while the different P2 residues (i.e., Pro in the case of α-Thrombin and Gly in the case of aFX) can fit specifically into the corresponding S2 subsites, thus ensuring the interaction with the AS. These results indicate that substrate design based on HTPS results produces structurally plausible solutions.

Next, we designed small fluorescent tetrapeptide substrates corresponding to the P4-P1 active site preferences to monitor the activity of activated α-Thrombin (zGIPR-AMC) and aFX (zGIGR-AMC). [...] substrates against a panel of proteases in a standard assay 66 and observed that both substrates had selectivity for the target proteases (Figure 5E). Accordingly, zGIPR-AMC was most efficiently cleaved by α-Thrombin and also by γ- and β-Thrombin, while other proteases included in the assay did not cleave zGIPR-AMC. zGIGR-AMC was less selective because it was cleaved by aFX and also by α-, β- and γ-Thrombin. We further calculated kcat/KM and demonstrated that zGIPR-AMC had good selectivity for α-Thrombin over aFX (380-fold higher kcat/KM). Selectivity in the case of zGIGR-AMC was substantially lower, with 3.5-fold higher kcat/KM values for aFX in comparison with α-Thrombin (Figure 5F). The [...] our protocol, we therefore asked whether we could use the protease specificity information derived from a native lysate to generate hypotheses on proteins known to be secreted into the blood. To pursue this aim, we developed a three-step filtering framework to single out, from a large initial search space, substrate candidates for specific proteases (Figure 6A). We used as a reference the 718 proteins reported as secreted [...] Figure S9A) (due to its broad specificity, PLG was excluded from the subsequent analysis).
[...] operator curves and filtered the data to match a false positive rate of 1% (Figure 6D). By this means, we could reduce the search space of potential physiological substrates about 100-fold. The evaluation of the sensitivity and specificity indicated a good prediction power for all proteases included in the analysis (average AUC~0.97, Figure S9D-J). In a second step, we used the JPred4 72 software tool to predict the solvent accessible regions of (nearly) all 718 secretome proteins and removed all the sequences that were predicted to be buried, thus eliminating structurally implausible targets. This step further reduced the number of potential targets by about half, from 2688 to 1380 (in the case of α-Thrombin substrates). Finally, in a third step, we used loose protein-level filters to further refine the target selection: proteins for which no expression was measured as well as those for which no co-citation with the target protein was reported were removed. A good proxy for physiological substrates was calculated from the ranking of the frequency of co-citation, protein abundance and the number of potential cleavage sites. These steps significantly reduced the search space (for Thrombin we identified 794 potential physiological cleavages), while having a negligible effect on the recall of previously known substrates (Figure S9K). As expected, the final score generated from our filtering strategy was highly skewed towards known substrates reported in MEROPS, indicating that it correctly reports potential substrate candidates (Figure S9L, Table S8).
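The three filtering steps described above can be sketched in a few lines. This is a hedged illustration, not the pipeline used in the study: the function names `cutoff_at_fpr` and `filter_candidates`, the dictionary keys and all toy values are assumptions, the score cutoff is derived here from a list of negative (non-substrate) scores rather than from a full ROC analysis, and the final ranking by co-citation, abundance and number of cleavage sites is omitted.

```python
def cutoff_at_fpr(negative_scores, max_fpr=0.01):
    """Return a cutoff such that at most max_fpr of the negatives score strictly above it."""
    ranked = sorted(negative_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # negatives tolerated above the cutoff
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]


def filter_candidates(candidates, cutoff):
    """Apply the three steps: score cutoff, predicted accessibility, protein-level filters."""
    kept = []
    for c in candidates:
        if c["score"] <= cutoff:       # step 1: sequence score at the chosen FPR
            continue
        if not c["accessible"]:        # step 2: drop sites predicted to be buried
            continue
        if not c["expressed"] or c["cocitations"] == 0:  # step 3: loose protein-level filters
            continue
        kept.append(c)
    return kept


# Invented toy data (arbitrary score units), for illustration only.
negatives = list(range(1000))
cutoff = cutoff_at_fpr(negatives, max_fpr=0.01)
candidates = [
    {"site": "site_A", "score": 995, "accessible": True, "expressed": True, "cocitations": 12},
    {"site": "site_B", "score": 995, "accessible": False, "expressed": True, "cocitations": 3},
    {"site": "site_C", "score": 500, "accessible": True, "expressed": True, "cocitations": 7},
    {"site": "site_D", "score": 999, "accessible": True, "expressed": True, "cocitations": 0},
]
kept_sites = [c["site"] for c in filter_candidates(candidates, cutoff)]
```

Applying the cheapest and most discriminating filter first, as in this sketch, keeps the later structure-based and protein-level steps limited to a much smaller candidate set.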

Furthermore, we found that using this filtering strategy, most target sequences were unique to specific proteases and only a few were shared among all six (Figure 6E). Interestingly, the protein substrates displayed an opposite trend: only a few proteins were targeted by a single protease, and the large majority was potentially a target of several or even all of them. The high number of target sequences carried on average by each target protein seems to explain this observation to a significant extent (Figure 6F).

Next, we asked which processes and functions were enriched among the proteins targeted by the blood cascade proteases. We used DAVID 73 to calculate the enrichment against the secretome background and

We demonstrate the microscale and high-throughput capabilities of HTPS by applying the workflow to a set of coagulation cascade proteases and detect specificity features for activated α-, β-, γ-Thrombin, aFVII, aFIX, aFX, aFXI, aPC and PLG. Here, the high numbers of detected cleavages allowed us to characterize the minor distinguishing features between these closely related proteases and group them according to their cleavage specificity and cleavage entropy. Furthermore, we were able to recapitulate from our proteomic data the known specificity differences between two isoforms of Thrombin (α- and γ-), which further demonstrated the sensitivity of the screen. The large number of cleavage events identified per measurement allowed us to investigate the allosteric effect of Na+ on activity and specificity with great sensitivity: we obtained results that confirm the mechanisms of allosteric regulation for α-Thrombin 53,58, aPC 63 and aFX 62 and expand our knowledge also to other blood proteases for which mechanisms of allosteric regulation with Na+ have so far not been extensively described. This demonstrated that differential specificity and entropy profiling can be used to identify restraints to model conformational changes. It is also important to note that allosteric effects are typically investigated with fluorescence anisotropy, [...] crystallography). In contrast, in its current implementation, HTPS analyses are performed in near-native conditions and require less than 1 µg of protease per assay, and further downscaling can be envisioned.

The translational value of HTPS is perhaps best illustrated in the context of designing sensitive tools for detection of protease activity. We used HTPS data to design synthetic peptides and show that they were cleaved by their respective proteases, demonstrating that positional substrate preferences detected with our protocol can translate into tools for detecting protease activity. This is useful, especially in case of poorly

To conclude, we introduce a new proteomic tool for protease research, which we dub HTPS. We believe it

The fluorescent substrates for α-Thrombin (zGIPR-AMC) and for aFX (zGIGR-AMC) were purchased from Biomatik (USA) and selectivity was tested in a standard protease screen as described elsewhere 66.

All measurements were performed in 20 mM ammonium bicarbonate, pH 7.8, with 200 mM NaCl. Where