Efficient backbone cyclization of linear peptides by a recombinant asparaginyl endopeptidase

Cyclotides are diverse plant backbone cyclized peptides that have attracted interest as pharmaceutical scaffolds, but fundamentals of their biosynthetic origin remain elusive. Backbone cyclization is a key enzyme-mediated step of cyclotide biosynthesis and confers a measure of stability on the resultant cyclotide. Furthermore, cyclization would be desirable for engineered peptides. Here we report the identification of four asparaginyl endopeptidases (AEPs), proteases implicated in cyclization, from the cyclotide-producing plant Oldenlandia affinis. We recombinantly express OaAEP1b and find it functions preferably as a cyclase by coupling C-terminal cleavage of propeptide substrates with backbone cyclization. Interestingly, OaAEP1b cannot cleave at the N-terminal site of O. affinis cyclotide precursors, implicating additional proteases in cyclotide biosynthesis. Finally, we demonstrate the broad utility of this enzyme by cyclization of peptides unrelated to cyclotides. We propose that recombinant OaAEP1b is a powerful tool for use in peptide engineering applications where increased stability of peptide products is desired.

P roteases are abundant throughout nature and are essential for a wide range of cellular processes. They typically serve to hydrolyse polypeptide chains, resulting in either degradation of the target sequence or maturation to a biologically active form. Less frequently, proteases can also ligate polypeptides, producing new or alternatively spliced variants. This unusual function has been reported for processes such as the maturation of the lectin concanavalin A 1 , peptide presentation by major histocompatibility complex class I molecules 2 , and anchoring of bacterial proteins to the cell wall 3 . Recently, this enzymatic transpeptidation has also been implicated in the backbone cyclization of ribosomally synthesized cyclic peptides [4][5][6][7][8][9][10] .
Cyclotides are a well-studied class of gene-encoded cyclic peptides that are expressed in plants and exhibit a range of bioactivities including insecticidal, nematicidal and molluscicidal activity against agricultural pests [11][12][13][14] . Structurally, they are characterized by a cyclic cystine knot motif that confers exceptional stability. Importantly, this stable framework can be used as a pharmaceutical scaffold, and bioactive sequences have been successfully grafted into cyclotides 15 . Backbone cyclization can also endow peptides with oral bioavailability, suggesting that this modification might find broad application in peptide drug engineering [16][17][18] . However, in vitro cyclization of synthetic peptides is challenging and the limited availability of enzymes capable of this process is a hurdle to large-scale production 19,20 . Furthermore, expression yields of cyclotides in transgenic plants that are not native cyclotide producers is poor, impeding transfer of agriculturally relevant bioactivities to other plants 8,21 . The mechanism of enzymatic cyclization intrinsic to cyclotide biosynthesis is poorly understood. Elucidating it will be important for the realization of the pharmaceutical and agricultural potential of cyclotides and for increasing the cyclization efficiency of unrelated 'designed' bioactive peptides.
Cyclotides are produced as precursors in which the cyclotide sequence is flanked by N-and C-terminal propeptides (Fig. 1). It is thought that enzymatic removal of the N-terminal propeptide precedes the final maturation step of C-terminal propeptide cleavage and ligation of the free N-and C-termini 8,21 . Only four native cyclases have been identified to date and the best characterized of these is the serine protease PatG, which cyclizes the bacterial cyanobactins [4][5][6][7] . In plants, the serine protease PCY1 cyclizes the segetalins; cyclic peptides from the Caryophyllaceae 4 . However, in the two other classes of plant-derived cyclic peptides (cyclotides and the PawS-derived cyclic peptides), strong Asx sequence conservation at the C-terminal P1 site implicates as possible cyclases the asparaginyl endopeptidases (AEPs), a group of cysteine proteases also known as vacuolar processing enzymes or Legumains, and this hypothesis is supported by studies in transgenic plants 8,9,21,22 .
Recently, an AEP (butelase 1) was isolated from the cyclotideproducing plant Clitoria ternatea and shown to cyclize a modified precursor of the prototypical cyclotide, kalata B1 (kB1) from Oldenlandia affinis, however, recombinant expression of functionally active butelase 1 has not been achieved, limiting its application 5 . Only one AEP with any cyclizing ability has been produced recombinantly, and this enzyme was highly inefficient, producing mainly hydrolysed substrate 10 . Here we report the identification, recombinant production and characterization of an O. affinis AEP that preferentially functions as a cyclase. The enzyme can cyclize native kalata substrate precursors and the unrelated anti-malarial peptide, R1, at close to 100% efficiency. This AEP releases the C-terminal propeptide of kB1, but it does not mediate the N-terminal processing event, which must occur first if efficient cyclization is to take place. Moreover, its specificity for model peptides mirrors the sequence requirements for cyclization of kB1 in transgenic plants, supporting a native function in the maturation of O. affinis cyclotides 8,21 .

Results
Identification and recombinant expression of O. affinis AEPs. Three expressed AEP isoforms were identified in an O. affinis complementary DNA library (OaAEP1-3) and a fourth sequence, with a single nucleotide change from OaAEP1 (resulting in a Glu 371 Val variant), was identified from genomic DNA (OaAEP1 b ) ( Fig. 2a; Supplementary Fig. 1). The four isoforms share at least 77% identity at the protein level, as determined by pairwise protein alignments. When compared with butelase 1, 64-69% identity was observed, whereas identity with human legumain was 49-53%.
OaAEP1 b was expressed in Escherichia coli as a His6-ubiquitin-AEP1 b fusion protein ( Supplementary Fig. 2a). AEPs are usually produced as zymogens that are self-processed at low pH to their mature, active form [23][24][25] . Consistent with this processing, activity of rOaAEP1 b against an internally quenched fluorescent (IQF) peptide representing the native C-terminal processing site in kB1 (Table 1; wildtype (wt)) was markedly increased following incubation at pH 4.5 (Fig. 2b). After purification, a dominant band of B32 kDa was evident by reducing SDS-polyacrylamide gel electrophoresis (PAGE) and confirmed to be rOaAEP1 b by Western blotting (Fig. 2c; Supplementary Fig. 2b). The average total protein yield from two independent experiments was B1.8 mg l À 1 after activation and purification, however batch to batch variation in purity was observed. Although glycosylation of some AEPs has been reported 26 , the production of an active form in E. coli confirms that this is not a requirement for activity of O. affinis AEP1 b .
Mass spectrometry (MS)/MS sequencing of peptide fragments generated from tryptic, chymotryptic or endoGlu-C digestion of the activated enzyme identified several peptide fragments with noncanonical cleavage sites, suggesting that they may be derived from rOaAEP1 b auto-processing events ( Supplementary Fig. 2). This allowed Asp52 to be assigned as the likely N-terminal autoprocessing site and Asp328, Asn329, Asp334, Asn336, Asp349 and/ or Asp351 as potential C-terminal processing sites ( Fig. 2a; Supplementary Fig. 2). No peptides downstream of Asp351 were identified, indicating that the activation was essentially complete and that the C-terminal domain (Leu352-Pro474) was removed during the post-activation purification step. The theoretical mass of the processed forms  is in good agreement with that determined by SDS-PAGE/Western blotting (Fig. 2c).
inhibit AEPs 27 , was a poor inhibitor of the recombinant enzyme suggesting that at least some P 0 residues are important for active site targeting.
Substrate specificity. The activity of rOaAEP1 b against IQF peptides representing wt and mutant versions of the native kB1 C-terminal cleavage site was determined (Table 1; Supplementary  Fig. 4a). Along with the strict P1 Asx specificity characteristic of AEPs 24 , rOaAEP1 b exhibited strong P2 0 selectivity since after Leu 31 Ala substitution within the IQF peptide barely any hydrolysis was observed. This observation is consistent with the lack of cyclic product generated when the corresponding mutation was introduced in planta 21 . Kinetic parameters (V max , K m and k cat ) are reported where applicable ( Table 1). The turnover rates (k cat ) reported here (B0.06-1.6 min À 1 ) are much slower than that reported for recombinant human legumain assayed against a small substrate (B8 s À 1 ) (ref. 28). This is not unexpected given that rOaAEP1 b prefers to carry out cyclization, rather than the hydrolysis being measured here. Supporting the observed P2 0 selectivity, rOaAEP1 b was unable to cleave the generic AEP substrate Z-AAN-MCA ( Supplementary Fig. 5). The substrate specificity of rOaAEP1 b was compared with that of recombinant human legumain (rhuLEG; Supplementary  Fig. 4b) 29 . A stringent P1 Asx requirement was again observed; however, in contrast to rOaAEP1 b , rhuLEG cleaved the Leu 31 Ala substrate at a rate similar to the wt substrate, demonstrating that P2 0 specificity is not a feature of all AEPs.
Cyclization of kB1 precursors. To explore the cyclization ability of rOaAEP1 b , processing of correctly folded (as determined by NMR) synthetic kB1 precursors was assessed by MS. When incubated with the wt kB1 precursor carrying the native C-terminal pro-hepta-peptide (GLPSLAA), the active enzyme produced a peptide of 2,891.2 Da (monoisotopic, [M þ H] þ ), consistent with the expected mass of mature, cyclic kB1 (Fig. 3a). This product was confirmed to be identical to native kB1 by reversed phase-high performance liquid chromatography (RP-HPLC) co-elution ( Supplementary Fig. 6) and one-and two-dimensional-NMR experiments ( Supplementary Fig. 7). Kinetic parameters (±s.e.m.) for the processing of the wt kB1 precursor were 0.53 ( ± 0.1) s -1 for k cat , 212 ( ± 76) mM for K m and 2,500 M À 1 s À 1 for k cat /K m as determined from a Michaelis-Menten plot ( Supplementary Fig. 8). While the turnover rate (k cat ) is lower than that reported for the plant-derived   To determine if rOaAEP1 b could also carry out the N-terminal processing required for cyclotide maturation, a kB1 precursor was tested that contained the folded cyclotide domain flanked by four residues from each of the N-and C-terminal propeptides (Fig. 3b). No N-terminal processing was observed, indicating that this processing is conducted by an enzyme other than OaAEP1 b . Although the bulk of the precursor remained intact after 20.5 h, the predominant processing product was a linear peptide lacking the C-terminal propeptide, demonstrating that correct N-terminal processing must occur before cyclization. Interestingly, a mass corresponding to a cyclized version of the C-terminally processed peptide (that is, C-terminal propeptide residues released, N-terminal propeptide residues remaining) was also observed, although this was the least abundant product.
Processing of modified cyclotide precursors. To further probe cyclization requirements, we tested rOaAEP1 b -mediated processing of modified kB1 and kB2 precursors over time (Fig. 4). When presented within an IQF peptide, the Leu 31 Ala substrate analogue was not hydrolysed by rOaAEP1 b (Table 1). However, the same substitution within the kB1 precursor did not preclude cyclization by rOaAEP1 b , although this version was cyclized far more slowly than the wt precursor ( Fig. 4a,b). Surprisingly, the presence of disulfide bonds in the cyclotide precursors is not a requirement for cyclization since a kB1 substrate analogue in which all six cysteines were substituted with serines was also efficiently cyclized (Fig. 4c). The absence of a defined kB1-like structure in the kB1 6xS mutant was confirmed by NMR spectroscopy (Supplementary Fig. 7c). Similarly, a kB2 linear precursor with the same Cys-Ser substitutions was also efficiently cyclized, confirming that enzyme activity is not specific to individual cyclotides (Fig. 4d).
In cysteine protease-mediated peptide bond hydrolysis, nucleophilic attack of a water molecule is required to resolve the acyl-enzyme thioester intermediate. However, during peptide cyclization (or transpeptidation) the substrate's N-terminal amine is postulated to function as a competing nucleophile, facilitating aminolysis of the reactive thioester intermediate 30 . Accordingly, a kB1 precursor with an acetyl-capped N-terminal amine was processed only to a linear peptide lacking the C-terminal propeptide (Fig. 4e). This hydrolysis occurred at a slower rate than cyclization of the wt precursor (compare with Fig. 4a). Water can therefore access the active site of rOaAEP1 b , but cyclization is favoured over hydrolysis in the presence of an appropriately positioned nucleophile.
Water is excluded during cyclization. An alternative ligation mechanism, distinct from transpeptidation, was recently proposed for huLEG 31 ). In that mechanism, initial hydrolysis of the C-terminal propeptide is followed by a separate ligation event requiring a C-terminal Asn residue in the substrate. To distinguish between these mechanisms in the case of rOaAEP1 b , reactions were carried out in the presence of 18 Olabelled water and the products were analysed by high-resolution MS. An isotopic shift consistent with the incorporation of 18 O was evident following enzymatic hydrolysis of the N-terminal acetylated kB1 precursor to give a linear product (Fig. 5a). However, there was no isotopic shift after processing of the wt precursor to a cyclic product, suggesting that hydrolysis is unlikely to play a role in cyclization by rOaAEP1 b (Fig. 5a).  rOaAEP1 b can cyclise an unrelated peptide. We also investigated cyclization of other substrates structurally unrelated to cyclotides by rOaAEP1 b , focussing on the anti-malarial peptide R1 (refs 32,33). This peptide was efficiently cyclized following the addition of N-and C-terminal AEP recognition sequences (Fig. 6a). Sequential trimming of the added recognition residues revealed that cyclization could be achieved following the addition of only a C-terminal Asn-Gly-Leu motif (although some linear product was also produced from this precursor) (Figs 6a-d). Lys and Gln were also accepted in place of Gly at the N terminus (Figs 6e-f) with little impact on yield at the time point tested. No processing of either the native R1 peptide or a modified R1 carrying the N-terminal Gly-Leu motif with only an Asn at the C terminus was observed ( Supplementary Fig. 9). Subsequent digestion with endoGlu-C confirmed that, in all cases, rOaAEP1 b processing produced cyclic peptide ( Supplementary Fig. 10). Evidence of an additional, linear, rOaAEP1 b -generated cleavage product was only observed for the R1 variant without any N-terminal flanking residues ( Fig. 6d; Supplementary  Fig. 10c).

Discussion
This study reports the cloning of four AEPs from the cyclotideproducing plant O. affinis; one of which was recombinantly expressed. The recombinant enzyme required self-processing to produce the active product: a cyclase that preferentially and efficiently couples C-terminal processing with C-and N-terminal ligation of linear O. affinis cyclotide precursors. Furthermore, this cyclizing ability was highly efficient when transferred to an unrelated anti-malarial peptide, demonstrating broad applicability in peptide engineering. Consistent with other auto-inhibited proteases, rOaAEP1 b required proteolytic activation to achieve maximum activity (Fig. 2b). The observed N-terminal auto-processing site (Asp52) is consistent with other experimentally validated N-terminal auto-processing sites identified in jack bean AEP 34 , butelase 1 (ref. 5) and human legumain 28,35 (Fig. 2a, Supplementary Fig. 1). In contrast, six potential C-terminal auto-processing sites (Asp328/334/349/351, Asn329/336) were observed within a region particularly rich in Asn/Asp residues (324-351). This finding is in agreement with the multiple C-terminal maturation steps recently described for rhuLEG 28,35 . Regardless of which of these sites is relevant in planta, the instability of active AEPs above pH 6 (refs 28,29,36) will likely preclude direct production of active enzyme in E. coli. Activated rOaAEP1 b proteolytically removed the C-terminal (but not N-terminal) propeptide of a kB1 precursor and resolved the acyl-intermediate in a hydrolysis-independent manner, generating a backbone cyclized product (Figs 3 and 5). rOaAEP1 b could also hydrolyse precursors lacking a free N-terminal amine to produce linear products, albeit at a slower rate. Although it is unknown if  OaAEP1 b performs both these cyclase and protease activities in vivo, the observed preference for cyclization over hydrolysis suggests that it probably functions predominantly as a cyclase. Dual protease/ligase capabilities have been reported for PatG 37 , a serine protease involved in cyclic peptide production in cyanobacteria, and more recently for human legumain 31 . Separate mechanistic pathways for hydrolysis and ligation have been proposed for human legumain: proteolysis proceeds via hydrolysis of a cysteinyl-thioester enzyme intermediate, whereas peptide ligation occurs via activation of the free peptidyl-acarboxylate through transient formation of an enzyme-linked anhydride intermediate that is subsequently resolved via aminolysis 31 . The catalytically critical residue for this mechanism of ligation, Asp188, is conserved in OaAEP1 b , but several lines of evidence preclude a role for this pathway in OaAEP1 b -mediated peptide cyclization. First, rOaAEP1 b cannot cyclize peptides carrying a free C-terminal Asn, the minimal proposed substrate requirement in the alternative pathway ( Supplementary Fig. 9). Second, our H 2 18 O experiments demonstrate the absence of 18 O incorporation into the cyclized product, which strongly indicates that cyclization does not follow a hydrolysis/ligation mechanism as proposed in the alternative pathway (Fig. 5). Third, our MS/MS data for rOaAEP1 b show no evidence of a reactive succinimide enzyme intermediate required for formation of the substrate-enzyme anhydrides 31 . Hence, our mechanistic data are in agreement with the traditional concerted mechanism, in which some of the energy from the (exergonic) cleavage of the C-terminal Asn-propeptide bond is preserved in the form of a thioester intermediate and used to overcome the energetically unfavourable (endergonic) peptide bond formation (cyclization) in the second step. However, alternative mechanisms may still play a role in cyclization mediated by other AEPs or with alternative substrates.
In the context of this established mechanism, C-and N-terminal proximity was thought to be crucial for cyclization to be favoured over hydrolysis 8 . Here we show that pre-organization of C-and N-termini in the substrate is not required by rOaAEP1 b since unconstrained cyclotide precursors lacking the characteristic disulfide-bonded structure are efficiently cyclized (Fig. 4). Furthermore, rOaAEP1 b can cyclize an anti-malarial peptide that is structurally and functionally unrelated to cyclotides following the addition of short flanking sequences (Fig. 6). These findings are consistent with the limited structural and/or sequence requirements imposed by other native cyclases on their substrates 4,5,7 Conceivably, polypeptides of diverse composition and length may be cyclized by rOaAEP1 b , provided that association of the C-and N-termini is not sterically hindered.
The application of this technology is limited to peptides that can retain activity following incorporation of the additional residues required for AEP-mediated processing. While rOaAEP1 b can cyclize a model peptide with only a single non-native residue incorporated to the mature peptide, this is at the cost of cyclization efficiency (Fig. 6d). Understanding the interplay between the sequence requirements for efficient cyclization and retention of bioactivity for a given target peptide will be crucial to realizing the potential of AEP-mediated cyclization. Importantly, the estimated turnover rate of rOaAEP1 b (k cat , 0.53 s À 1 ; Supplementary Fig. 8) is multiple orders of magnitude higher than the recombinantly produced cyclase PatG (1 d À 1 ; ref. 7), supporting its widespread application in peptide engineering.
In a previous study, a conserved tripeptide motif C terminal to both the N-and C-terminal cyclotide processing sites was identified through cyclotide sequence analysis 8 . This motif is Gly-Leu-Pro in the kB1 sequence, and its importance for efficient cyclization is supported by mutagenesis studies in transgenic plants 8,21 . The protease specificity of rOaAEP1 b reported here against IQF peptides mirrors these requirements (Table 1). At the C-terminal processing site, the P1 Asn and P2 0 Leu are particularly well-conserved, and both were crucial for both in planta cyclization and in vitro cleavage of model peptides. However, over the longer incubation period of the cyclization assays, a kB1 precursor with a Leu 31 Ala mutation was still enzymatically processed to a cyclic product (Fig. 4). It was initially proposed that the conserved Leu residue at the P2 0 position of cyclotides was important for preventing water from accessing the active site during cyclization. However the observed cyclization of the kB1 Leu 31 Ala mutant suggests that the role of a conserved bulky hydrophobic residue at the P2 0 position is only to promote appropriate enzyme-substrate interaction. Congruent with this hypothesis, the absence of a P2 0 residue renders substrates poor targets of both rOaAEP1 b (cyclization and hydrolysis; Fig. 6c; Supplementary Fig. 5) and butelase 1 (at least for hydrolysis) 5 . In summary, our results suggest a cyclization model in which the cleaved C-terminal propeptide retains sufficient affinity to remain bound to the active site until it is displaced by the incoming N terminus of the peptide, finally leading to cyclization by resolving the acyl intermediate 8,38 .
This P2 0 requirement is not characteristic of all AEPs 9,34,39 and might be a predictor of cyclization ability within this protease family. Indeed, in extracts from C. ternatea, protein fractions that were active against the generic AEP substrate Z-AAN-MCA (which does not contain a P2 0 residue) did not contain the cyclizing enzyme and, conversely, the butelase 1 containing fraction did not display activity against this substrate 5 . Here we report the presence of four unique AEP sequences in O. affinis ( Fig. 2a; Supplementary Fig. 1), and demonstrate that one (OaAEP1 b ) is capable of cyclizing O. affinis cyclotide precursors. Further work will investigate whether all, or a subset of, O. affinis AEPs (at least two more of which should exist to explain all the AEP contig sequences observed) exhibit this function and whether their substrate specificity is an accurate predictor of cyclization ability.
The constraints on the sequence of the incoming N terminus may not be very stringent. At least two residues with different properties are accepted in place of Gly at the P1 00 position by rOaAEP1 b with comparable yields of cyclic product under the conditions tested (Fig. 6e,f). Butelase 1 and Pat G also exhibit promiscuity in this region 5,37 . Interestingly, Gly 1 is highly conserved across cyclotides from different plant species, raising the possibility that selection at this position is not driven by AEP cyclase specificity. In transgenic plants, more stringent requisites were observed and no cyclic product was made from kB1 precursors with a conservative Gly 1 Ala mutation 8 . Because AEP cannot liberate the N-terminal propeptide, we hypothesize that selection at Gly 1 might be driven by the putative N-terminal processing enzyme. Further analysis will be necessary to determine if this reflects differences in the enzyme homologues being assayed (that is, AEPs from the model plant Nicotiana benthamiana compared with AEPs from native cyclotide producers) or experimental conditions.
In conclusion, this study unequivocally demonstrates the involvement of an AEP in maturation of native O. affinis cyclotides, advancing our understanding of the biosynthesis of this important class of cyclic peptides. Furthermore, the promiscuous yet highly efficient nature of this recombinantly produced enzyme highlights its exciting potential value as a biological tool for cyclization of a range of bioactive peptides. were synthesized by Genscript at 490% purity. Control IQF peptides representing the predicted cleavage products of the wt peptide (Abz-STRN; GLPS-Y(3NO 2 ) were also synthesized by Genscript at 490% purity. All IQF peptides were solubilized in 25% (v/v) acetonitrile:water. The fluorogenic peptide substrate Z-AAN-MCA (where Z is carboxybenzyl; MCA is 7-amido-4-methylcoumarin) and the caspase inhibitor Ac-YVAD-CHO (where Ac, acetyl; CHO, aldehyde) were supplied by the Peptide Institute and solubilized in dimethyl sulfoxide. The linear cyclotide precursor peptides kB1 wt , kB1 C&N , kB1 L31A and kB1 acetyl were chemically synthesized in-house by standard Fmoc solid-phase peptide synthesis. Folding and disulfide formation was carried out by incubating the reduced peptides in folding buffer (100 mM ammonium bicarbonate, 50% isopropanol, 2 mM reduced glutathione, 1 mM oxidized glutathione, pH 8.2) for 3 days 40 . The products were isolated by RP-HPLC at 495% purity and characterized by high-resolution MS and NMR spectroscopy. Peptides kB1 6xS and kB2 6xS as well as R1 and its derivatives were supplied by Genscript at 485% purity, as determined by RP-HPLC and MS. Peptides were dissolved in ultrapure water before analysis.
O. affinis transcriptome. Total RNA was extracted from O. affinis root, leaf and seedling tissues using a phenol extraction method. Plant material was frozen in liquid nitrogen and ground to a fine powder, which was then resuspended in buffer (0.1 M Tris-HCl pH 8.0, 5 mM EDTA, 0.1 M NaCl, 0.5% SDS, 1% 2-mercaptoethanol), extracted twice with 1:1 phenol:chloroform and precipitated by addition of isopropanol. The pellets were dissolved in 0.5 ml water and RNA was precipitated overnight at 4°C by addition of 4 M lithium chloride. The extracted RNA of each tissue was analysed by GeneWorks using the Illumina GAIIx platform. In total, 69.3 million 75 bp paired-end reads were generated. Reads were filtered with a phred confidence value of Q37 and assembled into contigs using Oases 41 with k-mer ranging from 41-67. The assemblies were merged using cd-hit-est 42 , resulting in 270,000 contigs. Statistics on the depth of sequencing were made by aligning the reads of each tissue on the contigs using BWA 43 . All the sequences, including one AEP, previously identified from an EST library of O. affinis were present among the contigs 44 . Homologues of this AEP sequence were searched using BLAST 45 in the contig library using a maximum E-value of 1e-20, resulting in the identification of 371 putative AEP transcripts. These sequences could then be clustered in 13 groups sharing at least 90% sequence identity using cd-hit 42 .
OaAEP1-3 cloning. Full-length AEP transcripts from the O. affinis transcriptome assembly were used to design a set of primers. A single degenerate forward primer (OaAEPdegen-F, 5 0 -ATG GTT CGA TAT CYC GCC GG-3 0 ) was sufficient to amplify all sequences since variability within the extreme 5 0 region of each fulllength transcript was limited to a single nucleotide position. Three reverse primers (OaAEP1-R, 5 0 -TCA TGA ACT AAA TCC TCC ATG GAA AGA GC-3 0 ; OaAEP2-R, 5 0 -TTA TGC ACT GAA TCC TTT ATG GAG GG-3 0 ; OaAEP3-R 5 0 -TTA TGC ACT GAA TCC TCC ATC G-3 0 ) were designed with the aid of Primer 3 (ref. 46). Each primer set successfully amplified an AEP sequence.
To clone expressed OaAEPs, total RNA was extracted from O. affinis leaves and shoots using TRIzol (Life Technologies) and was reverse transcribed with SuperScript III reverse transcriptase (Life Technologies) according to the manufacturer's instructions 11 . Target sequences were amplified from the resulting complementary DNA using Phusion High Fidelity Polymerase (New England BioLabs) and the primers described above under the recommended reaction conditions. Gel extracted PCR products were dA-tailed by incubation with Invitrogen Taq Polymerase (Life Technologies) and 0.5 ml 10 mM dA in the supplied buffer. The processed products were cloned into pCR8-TOPO (Life Technologies) and transformed into E. coli. Purified DNA from clones that were PCR positive for an AEP insert were sent for Sanger sequencing at the Australian Genome Research Facility (www.agrf.org). Coding sequences have been deposited in Genbank (accession codes: OaAEP1 (KR259377), OaAEP2 (KR259378), OaAEP3 (KR259379)).
In an alternative approach, genomic DNA was extracted from O. affinis leaf tissue using a DNeasy Plant Mini Kit according to the manufacturer's instructions. PCR amplification from this DNA used primers specifically targeting the OaAEP1 nucleotide sequence. Gel extracted product was dA-tailed as above, cloned into the TOPO vector and transformed into E. coli. Sequencing of PCR-positive clones identified a fourth sequence with a single amino acid change from OaAEP1 (OaAEP1 b ).
Antibodies. Polyclonal anti-OaAEP1 b rabbit serum was generated by immunizing a New Zealand White rabbit with a denatured, inactive form of OaAEP1 b (residues D 47 -P 474 ) that was produced recombinantly in E. coli. The rabbit received three doses, four weeks apart, of 150 mg of antigen in 50% (v/v) PBS and Freund's incomplete adjuvant. Serum was obtained 2 weeks after the final dose and used at a 1:2,000 dilution for Western blotting.
Recombinant expression of O. affinis AEP1 b (rOaAEP1 b ). Initial trials to produce active O. affinis AEP1 b based on predicted N-and C-terminal processing sites (residues D 47 -D 420 ) were unsuccessful and subsequent expression attempts incorporated both N-and C-terminal prodomains. DNA encoding full-length O. affinis AEP1 b without the putative signalling domain (residues A 24 -P 474 ) was inserted into the pHUE vector 47 to give a His6-ubiquitin-OaAEP1 b fusion protein construct (Supplementary Fig. 2) and introduced into T7 shuffle E. coli cells (New England BioLabs). Transformed cells were grown at 30°C in superbroth (3.5% tryptone (w/v), 2% yeast extract (w/v), 1% glucose (w/v), 90 mM NaCl, 5 mM NaOH) to mid-log phase; the temperature was then reduced to 16°C and expression was induced with isopropyl -D-1-thiogalactopyranoside (0.4 mM) for B20 h. Cells were harvested by centrifugation and resuspended in non-denaturing lysis buffer (50 mM Tris-HCl, 150 mM NaCl, 0.1% triton X 100, 1 mM EDTA, pH 7). Lysis was promoted by a total of five freeze/thaw cycles and the addition of lysozyme (hen egg white; 0.4 mg ml À 1 ). DNase (bovine pancreas; 40 mg ml À 1 ) and MgCl 2 (0.4 M) were also added. Cellular debris was removed by centrifugation and the lysate was stored at À 80°C until required.
Purification and activation of rOaAEP1 b . Lysate containing expressed rOaAEP1 b was filtered through a 0.1 mm glass fibre filter before being diluted 1:8 in buffer A (20 mM bis-Tris, 0.2 M NaCl, pH 7) and loaded onto two 5 ml HiTrap Q Sepharose high performance columns connected in series (GE Healthcare; 1.6-3.1 ml undiluted lysate per millilitre resin). Bound proteins were eluted with a continuous salt gradient (0-30% buffer B (20 mM bis-Tris, 2 M NaCl, pH 7); 15 column volumes) and AEP-positive fractions identified by Western blotting (anti-OaAEP1 b rabbit serum (1:2,000); peroxidase-conjugated anti-rabbit IgG (GE Healthcare NA934; 1:5000)). To self-activate rOaAEP1 b , EDTA (1 mM) and Tris(2-carboxyethyl)phosphine hydrochloride (0.5 mM) were added, the pH was adjusted to 4.5 with glacial acetic acid and the protein pool was incubated for 5 h at 37°C. Protein precipitation at this pH allowed removal of the bulk of the contaminating proteins by centrifugation. The remaining protein was filtered (0.22 mm), diluted 1:8 in buffer A2 (50 mM acetate, pH 4) then captured on a 1 ml HiTrap SP Sepharose high performance column (GE Healthcare). Bound proteins were eluted with a salt gradient (0-100% buffer B2 (50 mM acetate, 1 M NaCl, pH 4); 10 column volumes) and AEP-positive fractions were pooled. The final product was analysed by MS and reducing SDS-PAGE followed by Western blotting and staining with Instant blue (Expedeon). The total concentration of protein was estimated by bicinchoninic acid assay according to the manufacturer's instructions.
Tandem MS data were generated in Information Dependent Acquisition experiments, wherein full-scan TOF-MS spectra were acquired for 250 ms over m/z 350-1,800, and the 20 most intense signals with charge state þ 2 to þ 5 were selected for product ion scans of duration 50 ms over m/z 80-1,400. Data were acquired and processed using the Analyst TF 1.6 software (AB SCIEX, Canada). Automated protein identification was performed in ProteinPilot 4.0 using the Paragon algorithm with 'no enzyme' settings to interrogate an E. coli K12 proteome database (Uniprot) concatenated with sequences of rOaAEP1 b and those of the three cleavage enzymes, with false discovery rate set at Po0.05. Spectral data for peptides software matched to rOaAEP1 b were confirmed via manual interpretation in Analyst TF 1.6.
Assaying protease activity against fluorescent peptides. To assay activity of rOaAEP1 b against both internally quenched and other fluorescent peptides, substrate and enzyme were diluted as appropriate in activity buffer (50 mM sodium acetate, 50 mM NaCl, 1 mM EDTA, 0.5 mM Tris(2-carboxyethyl)phosphine hydrochloride, pH 5). To assay activity of rhuLEG (R&D systems) against the same substrates, the enzyme was first activated by incubation in 50 mM sodium acetate, 100 mM NaCl, pH 4 (4 ml buffer/1 ml enzyme) for 2 h at 37°C. Substrates and activated rhuLEG were diluted in 50 mM MES, 250 mM NaCl, pH 5 as required. Diluted enzyme and substrate were added to black, flat bottomed microtiter plates in a total assay volume of 110-200 ml. The change in fluorescence intensity over time was monitored on a SpectraMax M2 (Molecular Devices) using excitation/ emission wavelengths of 320/420 nm (IQF peptides) or 360/460 nm (other fluorescent peptides). Substrate and enzyme concentrations in each assay and the time point presented are as indicated in the figure legends.
To determine the kinetics of rOaAEP1 b activity against IQF peptides, each substrate was assayed at a range of concentrations between 2.5 and 80 mM in a total volume of 200 ml. The total protein concentration of the enzyme preparation used in the kinetic assays was 3.5 mg ml À 1 . It was not possible to precisely determine the concentration of active enzyme due to impurities remaining in the preparation and the absence of an inhibitor appropriate for active site titration. However, a conservative turnover rate (k cat ) was estimated based on a mass of 32 kDa and the assumption that the total protein concentration reflected active enzyme. At each substrate concentration, initial velocities were calculated from the linear portion of the progress curve. K m and V max were estimated using the Michaelis-Menten equation and the curve-fitting program GraphPad Prism (GraphPad Software, San Diego).
The high peptide concentrations required for estimating kinetic parameters necessitated the use of a correction factor to account for the inner filter effect; a phenomenon where high relative concentrations of the quenching group impede detection of the signal from the fluorescent donor even after substrate hydrolysis 48 . This was achieved as described previously 48 . The output generated by the fluorescent hydrolysis product (Abz-STRN) was measured in the presence of each concentration of non-hydrolysed substrate. The correction factor was the ratio between the expected and observed fluorescence signal at each substrate concentration. The corrected signal for each data point was then converted to amount of product by comparison to a standard curve of the fluorescent hydrolysis product.
Inhibition assays. To investigate the impact of inhibitors on enzyme activity against the wt IQF peptide, rOaAEP1 b (4.4 mg ml -1 total protein) was incubated with the indicated concentration of E64, Ac-YVAD-CHO, pepstatin A and iodoacetamide for 40 min before addition to the substrate (11 mM). Enzyme activity against the wt IQF peptide was then assessed as described above.
Cyclization assay. Linear target peptides (280 mM) were incubated with rOaAEP1 b (12 mg ml À 1 total protein unless otherwise indicated) in activity buffer. The reaction was allowed to proceed for up to 22 h at room temperature and was analysed by matrix-assisted laser desorption/ionization MS (MALDI MS), RP-HPLC or NMR as appropriate.
To confirm the presence of cyclic product, R1 derivatives processed by rOaAEP1 b were subsequently digested with endoGlu-C (25 mg ml À 1 ) in reaction buffer (50 mM Tris-HCl, 0.5 mM Glu-Glu, pH 8) such that the final dilution of the cyclization mix was 1:4. The reaction was allowed to proceed for 18 h at 37°C before analysis by MALDI MS.
In heavy water experiments, isotopically labelled water (97 atom % 18 O) was used in place of unlabelled water. Linear target peptides (70 mM) were incubated with rOaAEP1 b (6 mg ml À 1 total protein) in a non-reducing activity buffer (50 mM sodium acetate, 50 mM NaCl, 1 mM EDTA, pH 5) for 22 h at room temperature. The final H 2 18 O concentration in the assay was 81%.
MS to track cyclization of linear peptides. Cyclization of linear target peptides was monitored by MALDI MS. The reaction mixture (10-20 ml) was desalted using C18 zip tips and eluted in 4 ml 50% acetonitrile, 0.1% trifluoroacetic acid (TFA). A saturated MALDI matrix solution (a-cyano-4-hyroxycinnamic acid) prepared in 95% acetonitrile, 0.1% TFA was diluted 1:22 such that the final matrix solution comprised 90% acetonitrile, 0.1% TFA and 1 mM NH 4 H 2 PO 4 . Eluted samples were mixed 1:4 with the MALDI matrix, spotted onto a MALDI plate and analysed by an Ultraflex III TOF/TOF (Bruker) in positive reflector mode.
Purification of kB1 following in vitro cyclization. The crude cyclization mixture was loaded to an Agilent Zorbax C18 reversed-phase column (4.6 Â 250 mm, 300 Å) and separated on a Shimadzu Prominence system using a linear gradient of 5-55% buffer B (90% acetonitrile, 10% H 2 O, 0.05% TFA) in buffer A (0.05% TFA/H 2 O) over 60 min. Fractions were collected manually, analysed by MALDI MS essentially as described above and lyophilized. Analytical HPLC and co-elution studies with chemically synthesized kB1 were carried out as described above.
Nuclear magnetic resonance spectroscopy. All peptides were dissolved in 90% H 2 O/10% D 2 O at a concentration of B2.0 mg ml À 1 (0.5-0.75 mM). kB1 obtained from in vitro cyclizations was dissolved at 0.3 mg ml À 1 (B0.1 mM). Spectra were recorded on a Bruker Avance 600 MHz spectrometer equipped with a cryoprobe at 298 K. Phase-sensitive mode using time-proportional phase incrementation for quadrature detection in the t 1 dimension was used for all two-dimensional spectra. Excitation sculpting with gradients was used to achieve water suppression. NMR experiments included TOCSY using a MLEV-17 spin lock sequence with an 80 ms mixing time, and NOESY with a 200 ms mixing time. Spectra were recorded with 4,096 data points in the F 2 dimension and 512 increments in the F 1 dimension. The t 1 dimension was zero-filled to 1,024 real data points, and the F 1 and F 2 dimensions were multiplied by a sine-squared function before Fourier transformation. The spectra were referenced to the water signal at 4.77 p.p.m. at 298 K. All spectra were processed using TopSpin (Bruker) and manually assigned with CCPNMR using the sequential assignment protocol.
Cyclization kinetics. To determine the kinetics of rOaAEP1 b activity against the wt kB1 precursor, the substrate was assayed at room temperature at a range of concentrations between 75 and 250 mM in a total volume of 20-160 ml of activity buffer. The total protein concentration of the enzyme preparation added to the kinetic assays was 19.7 mg ml À 1 . The reaction was quenched after 5 min with 0.1% TFA and the volume adjusted to 800 ml. A volume of 700 ml was loaded onto a reversed-phase C18 analytical column (Agilent Eclipse C18, 5 mm, 4.6 Â 150 mm) and peptides were separated by HPLC (19 min linear gradient of 12-60% acetonitrile, 0.1% TFA at 1 ml min À 1 ). The identity of eluted peaks was confirmed using MALDI MS. The area under the curve corresponding to the precursor peptide was quantitated by comparison to a standard curve and initial velocities were calculated by converting this to mmoles product formed. Kinetic parameters were estimated using the Michaelis-Menten equation and the curve-fitting program GraphPad Prism (GraphPad Software, San Diego). As for the IQF peptides, it was not possible to precisely determine the concentration of active enzyme due to impurities remaining in the preparation and the absence of an inhibitor appropriate for active site titration. However, a conservative turnover rate (k cat ) was estimated based on a mass of 32 kDa and the assumption that the total protein concentration reflected active enzyme. Differences in enzyme preparations means these parameters are not directly comparable to those determined for the IQF peptides.