High throughput protease profiling comprehensively defines active site specificity for thrombin and ADAMTS13

We have combined random 6 amino acid substrate phage display with high throughput sequencing to comprehensively define the active site specificity of the serine protease thrombin and the metalloprotease ADAMTS13. The substrate motif for thrombin was defined by >6,700 cleaved peptides and was highly concordant with previous studies. In contrast, ADAMTS13 cleaved only 96 peptides (out of >10^7 sequences), with no apparent consensus motif. However, when the hexapeptide library was substituted into the P3-P3′ interval of VWF73, an exosite-engaging substrate of ADAMTS13, 1670 unique peptides were cleaved. ADAMTS13 exhibited a general preference for aliphatic amino acids throughout the P3-P3′ interval, except at P2, where Arg was tolerated. The cleaved peptides assembled into a motif dominated by Leu at P3 and bulky aliphatic residues at P1 and P1′. Overall, the P3-P2′ amino acid sequence of von Willebrand Factor appears optimally evolved for ADAMTS13 recognition. These data confirm the critical role of exosite engagement for substrates to gain access to the active site of ADAMTS13, and define the substrate recognition motif for ADAMTS13. Combining substrate phage display with high throughput sequencing is a powerful approach for comprehensively defining the active site specificity of proteases.


Figure S1: Data Analysis Pipeline
Approximately 10% of the paired-end reads were removed from the dataset by four different quality filters. Removed reads 1) did not match any of the given seed sequences used to orient the sequence (light blue), 2) did not have a perfect match between sense and antisense reads within the NNK region (magenta), 3) had a quality score ≤5 (out of 40) at any NNK position (green), or 4) encoded a stop codon (dark blue, not visible).
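The four filters can be sketched as a per-read-pair classifier. This is a minimal sketch, not the authors' pipeline: it assumes a single hypothetical seed sequence lying immediately 5′ of an 18-nt NNK insert (the real pipeline checked several seeds), Phred scores already decoded to integers, and illustrative helper names (`locate_insert`, `classify_pair`). The filters are applied in the order listed in the legend.

```python
from typing import List, Optional, Tuple

# Standard genetic code; codons enumerated in TCAG order (index = 16*i + 4*j + k).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def translate(nt: str) -> str:
    # Codons missing from the table (e.g. containing N) become 'X'.
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def locate_insert(seq: str, quals: List[int], seed: str,
                  insert_len: int = 18) -> Optional[Tuple[str, List[int]]]:
    """Return the NNK insert (and its qualities) just downstream of `seed`, in library orientation."""
    # Try the read as sequenced, then its reverse complement (qualities reversed to match).
    for s, q in ((seq, quals), (revcomp(seq), quals[::-1])):
        pos = s.find(seed)
        if pos != -1 and pos + len(seed) + insert_len <= len(s):
            start = pos + len(seed)
            return s[start:start + insert_len], q[start:start + insert_len]
    return None


def classify_pair(r1: str, q1: List[int], r2: str, q2: List[int], seed: str,
                  min_qual: int = 6) -> Tuple[str, Optional[str]]:
    """Apply the four filters in order; return (status, peptide or None)."""
    hit1, hit2 = locate_insert(r1, q1, seed), locate_insert(r2, q2, seed)
    if hit1 is None or hit2 is None:
        return "no_seed", None                    # filter 1: read cannot be oriented
    (ins1, qual1), (ins2, qual2) = hit1, hit2
    if ins1 != ins2:
        return "sense_antisense_mismatch", None   # filter 2: reads disagree in NNK region
    if min(qual1 + qual2) < min_qual:
        return "low_quality", None                # filter 3: any score <= 5 fails
    peptide = translate(ins1)
    if "*" in peptide:
        return "stop_codon", None                 # filter 4: NNK can still encode TAG
    return "ok", peptide
```

For example, with a hypothetical seed `GATTACAG`, a read pair whose insert is `CTGCGTGCTGAGGTTTGG` in both mates passes all four filters and is translated to `LRAEVW`.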

Figure S2: Deep protease profiling
(A) Substrate phage display employs a random peptide library cloned between an epitope tag and the phage pIII protein, which is anchored to the phage body. (B) Phages displaying a recombinant peptide are first captured on anti-FLAG agarose beads. Following incubation with the protease, cleaved phages are released from the beads and separated from the remaining uncleaved phages. (C) Single-stranded phage DNA is prepared from the cleaved phage pool, the library inserts are amplified by PCR, and the appropriate adapters are added for high throughput sequencing. Enrichment relative to the unselected library is evaluated by counting the occurrences of each unique peptide in the sequencing data.
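The counting step in panel C can be illustrated with a simple library-size-normalized log2 ratio. This is a hedged sketch only: the paper's enrichment statistics come from DESeq2 (Figure S4), whereas this version uses plain frequency ratios with a hypothetical pseudocount of 0.5 to avoid division by zero for peptides absent from one library.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def count_peptides(peptides: Iterable[str]) -> Counter:
    # One entry per unique peptide, valued by its number of sequencing reads.
    return Counter(peptides)


def log2_enrichment(selected: Counter, unselected: Counter,
                    pseudocount: float = 0.5) -> Dict[str, float]:
    """log2 of (frequency after selection) / (frequency in the unselected library)."""
    n_sel = sum(selected.values())
    n_uns = sum(unselected.values())
    enrichment = {}
    for pep in set(selected) | set(unselected):
        f_sel = (selected[pep] + pseudocount) / n_sel    # Counter returns 0 for missing keys
        f_uns = (unselected[pep] + pseudocount) / n_uns
        enrichment[pep] = math.log2(f_sel / f_uns)
    return enrichment
```

Positive values mark peptides over-represented in the cleaved (released) pool relative to the starting library; negative values mark depletion.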

Figure S3: Read distribution in random peptide library
Histograms of read counts per unique peptide in the random peptide library for the unselected (black), thrombin-selected (blue), and ADAMTS13-selected (red) phage populations. Most peptides are observed only once or twice in every treatment. For the final analysis, we required a minimum of 4 sequencing reads per peptide.
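The histogram and the read-count threshold can be sketched as follows. This assumes, as a simplification, that the minimum of 4 reads is applied to each peptide's count within a single population; the function names are illustrative.

```python
from collections import Counter
from typing import Dict


def count_histogram(peptide_counts: Counter) -> Dict[int, int]:
    """Map 'reads per peptide' to 'number of unique peptides with that many reads'."""
    return dict(sorted(Counter(peptide_counts.values()).items()))


def filter_min_reads(peptide_counts: Counter, min_reads: int = 4) -> Counter:
    # Keep only peptides seen in at least `min_reads` sequencing reads.
    return Counter({p: n for p, n in peptide_counts.items() if n >= min_reads})
```

Because most peptides are seen only once or twice, a threshold of this kind discards the bulk of unique sequences while retaining those with enough reads for a stable enrichment estimate.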

Figure S4: DESeq2 enrichment plots
The MA-plot generated with DESeq2 software (49) shows the log2 fold change of each unique peptide against its mean of normalized counts. Points that fall outside the plotting window are drawn as open triangles pointing up or down. This figure illustrates the statistical treatment of the enrichment data required to control for stochastic variation in read count. Only data points with padj < 0.05 (red) are used in subsequent analyses. (A) MA plots for thrombin or ADAMTS13 selection of the random 6 amino acid peptide library. (B) MA plot for ADAMTS13 selection of the VWF73(P3-P3') library. (C) A heatmap gives a different representation of the same data and more clearly indicates the performance of each amino acid at each position. Green indicates amino acids that potentiate cleavage by thrombin, and red indicates amino acids that antagonize it.
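The padj cutoff refers to p-values adjusted for multiple testing; DESeq2 uses the Benjamini-Hochberg procedure by default. The sketch below shows only that adjustment and the padj < 0.05 selection on a list of raw p-values. It does not reproduce DESeq2's count model, Wald test, or independent filtering, and the helper names are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(ids: Sequence[str], padj: Sequence[float], alpha: float = 0.05) -> List[str]:
    # Peptides retained for downstream motif analysis.
    return [x for x, q in zip(ids, padj) if q < alpha]
```

For raw p-values of 0.001, 0.2 and 0.9, the adjusted values are approximately 0.003, 0.3 and 0.9, so only the first peptide passes the 0.05 cutoff.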

Figure S6: Amino Acid motif for ADAMTS13 enrichment and depletion
Frequency plot of all significantly enriched and depleted peptides after selection by ADAMTS13, generated with iceLogo as in Figure S5.
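An iceLogo-style plot compares positional amino acid frequencies in an experimental peptide set with those in a reference set. The sketch below computes only the per-position percentage difference. It is a simplification, not the iceLogo implementation: it omits iceLogo's significance testing, and the choice of reference set is left to the caller.

```python
from collections import Counter
from typing import Dict, List, Sequence


def position_frequencies(peptides: Sequence[str]) -> List[Dict[str, float]]:
    """Fraction of each amino acid at each position of equal-length peptides."""
    freqs = []
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        freqs.append({aa: n / len(peptides) for aa, n in counts.items()})
    return freqs


def percent_difference(experimental: Sequence[str],
                       reference: Sequence[str]) -> List[Dict[str, float]]:
    # Positive: over-represented in the experimental set; negative: depleted.
    out = []
    for e, r in zip(position_frequencies(experimental), position_frequencies(reference)):
        out.append({aa: 100.0 * (e.get(aa, 0.0) - r.get(aa, 0.0)) for aa in set(e) | set(r)})
    return out
```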

Figure S7: Nucleotide distribution in VWF73(P3-P3') libraries
The frequency of each nucleotide at each of the 18 randomized positions of VWF73(P3-P3') libraries A and B is shown, confirming NNK randomization.
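Checking a library for NNK randomization amounts to tabulating nucleotide frequencies at each of the 18 positions and confirming that every third codon position contains only G or T (K). This is a minimal sketch with hypothetical function names; with real sequencing data, a small tolerance (`tol`) for A or C at the third position would absorb sequencing errors.

```python
from collections import Counter
from typing import Dict, List, Sequence


def nucleotide_profile(inserts: Sequence[str]) -> List[Dict[str, float]]:
    """Fraction of A, C, G and T at each position of equal-length inserts."""
    profile = []
    for pos in range(len(inserts[0])):
        counts = Counter(s[pos] for s in inserts)
        profile.append({nt: counts[nt] / len(inserts) for nt in "ACGT"})
    return profile


def consistent_with_nnk(profile: List[Dict[str, float]], tol: float = 0.0) -> bool:
    # NNK allows only G or T at codon position 3 (0-based positions 2, 5, 8, ...).
    return all(col["A"] + col["C"] <= tol
               for pos, col in enumerate(profile) if pos % 3 == 2)
```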

Figure S8: Amino Acid distribution in VWF73(P3-P3') library
(A, B) The proportion of each amino acid at the 6 positions of VWF73(P3-P3') libraries A and B, respectively. (C) The frequency of each amino acid in the unselected libraries and the ADAMTS13-selected library was compared with the codon frequencies expected from the NNK randomization scheme. These data show deviations from the expected amino acid frequencies that differ from those seen in the original random peptide library (Figure 1C), implying that the VWF73 peptide sequence contributes to library bias.

6N wobble for improved cluster diversity, unique barcodes for both forward and reverse primers (lowercase), and a FUSE55 vector hybridization domain. PCR products from barcoding primers were isolated and used as a template in a PCR reaction that completed the Illumina adapter sequence using PE1-seq and PE2-seq.
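The primer layout described above (6N wobble, lowercase barcode, FUSE55 hybridization domain) can be split into its segments with a regular expression keyed on letter case. This is a sketch under stated assumptions: the segment names are hypothetical, and treating the uppercase 5′ segment as a partial Illumina adapter is an inference from the text, which says a second PCR completed the adapter sequence. Spaces in the listed sequences are treated as segment separators only.

```python
import re
from typing import Dict

# Uppercase 5' segment, six N wobble bases, lowercase barcode, uppercase vector-binding 3' end.
PRIMER_PATTERN = re.compile(
    r"^(?P<adapter>[ACGT]+)(?P<wobble>N{6})(?P<barcode>[acgt]+)(?P<hybridization>[ACGT]+)$"
)


def parse_primer(seq: str) -> Dict[str, str]:
    """Split a barcoding primer into its segments; raise if the layout does not match."""
    match = PRIMER_PATTERN.match(seq.replace(" ", ""))
    if match is None:
        raise ValueError(f"unexpected primer layout: {seq}")
    return match.groupdict()
```

Applied to NGSg-S1, for example, this yields the barcode `gaatc` and the hybridization domain `GAGCAGGCGCCCAAC`; distinct barcodes are what allow reads from different samples to be separated after pooled sequencing.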

-A Library Preparation Oligonucleotides
The following primers were used in a PCR with VWF cDNA as the template. The reaction contained 1 nM VWF73-S2 and VWF73-S3, 1 µM VWF73-S1 and VWF73-AS1, and 1 ng template. The resulting product was purified and digested with BglI prior to cloning into FUSE55.

VWF73(NNK)6-B Library Preparation Oligonucleotides
The following primers were used to generate a second VWF73(NNK)6 library from a synthetic template. The PCR was conducted with 1 µM VWF73-NNK S1 and VWF73-NNK AS3, 1 nM VWF73-NNK-AS1 and VWF73-NNK-AS2, and 1 ng VWF73-NNK-templ. PCR products from barcoding primers were isolated and used as a template in a PCR reaction that completed the Illumina adapter sequence using PE1-seq and PE2-seq (Table S2).

Name        Sequence
NGSg-S1     ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNgaatc GAGCAGGCGCCCAAC
NGSg-AS1    ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNacagt GAGCAGGCGCCCAAC
NGSl-AS1    CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTNNNNNNatgtc ATCAGAGGCAGGATTTCC

Table S7: Amino acid bias in Random AA library vs VWF73(NNK) library
The influence of nucleotide identity at the 3rd codon position on amino acid diversity is shown. Amino acids that require G at this position are compared with amino acids that require T at this position and with those encoded by A, C, G, or T (NA) at this position. The abundance of T at this position within the VWF73(NNK) libraries (see Fig. S11A, B) results in a biased amino acid diversity that likely explains the difference in amino acid content between this library and the random peptide library (NNK).
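The reasoning in Table S7, and the NNK expectation used in Figure S8C, can be made concrete by enumerating the 32 NNK codons with the standard genetic code. This is a sketch under a stated simplifying assumption: the first two codon positions are taken as uniform, and the third-position composition is summarized by a single hypothetical parameter `p_g` (fraction G, with the remainder T). `third_position_classes` groups amino acids by which third-position nucleotide(s) can encode them.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Standard genetic code; codons enumerated in TCAG order (index = 16*i + 4*j + k).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def nnk_codons() -> List[str]:
    # N = A/C/G/T at positions 1 and 2; K = G/T at position 3, giving 32 codons.
    return [a + b + k for a in "ACGT" for b in "ACGT" for k in "GT"]


def third_position_classes() -> Dict[str, Set[str]]:
    """For each NNK-encodable amino acid ('*' = stop), the third-position bases that encode it."""
    classes: Dict[str, Set[str]] = defaultdict(set)
    for codon in nnk_codons():
        classes[CODON_TABLE[codon]].add(codon[2])
    return dict(classes)


def expected_frequencies(p_g: float = 0.5) -> Dict[str, float]:
    """Expected amino acid frequencies if the third position is G with probability p_g."""
    freqs: Dict[str, float] = defaultdict(float)
    for codon in nnk_codons():
        third = p_g if codon[2] == "G" else 1.0 - p_g
        freqs[CODON_TABLE[codon]] += third / 16.0  # 1/16 for each first-two-base pair
    return dict(freqs)
```

With `p_g = 0.5` this reproduces the usual NNK expectation: 3/32 each for Leu, Ser and Arg; 2/32 each for Ala, Gly, Pro, Thr and Val; and 1/32 for each remaining amino acid and for the TAG stop codon. Lowering `p_g` raises the expected frequency of T-only amino acids such as Tyr and lowers that of G-only amino acids such as Met and Trp, which is the direction of bias described in Table S7.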