Stepwise Evolution Improves Identification of Diverse Peptides Binding to a Protein Target

Considerable efforts have been made to develop technologies for selection of peptidic molecules that act as substrates or binders to a protein of interest. Here we demonstrate the combination of rational peptide array library design, parallel screening, and stepwise evolution to discover novel peptide hotspots. These hotspots can be systematically evolved to create high-affinity, high-specificity binding peptides to a protein target in a reproducible and digitally controlled process. The method can be applied to synthesize both linear and cyclic peptides, as well as peptides composed of natural and non-natural amino acid analogs, thereby enabling screens in a much more diverse chemical space. We apply this method to the stepwise evolution of peptide binders to streptavidin, a protein studied for over two decades, and report novel peptides that mimic key interactions of biotin with streptavidin.

L-amino acid peptides binding to streptavidin. To identify hotspot sequences that bind to streptavidin (SA), we bound Cy5-SA to an array library of 2,476,099 5-mer L-peptides synthesized with 19 of 20 natural amino acids (excluding cysteine). The fluorescence signal intensity was aggregated across three independently synthesized arrays; then the 2,047 peptides with a signal-to-background ratio (S/B) >4 were selected. The data was further filtered to select 1,100 peptides for which signal intensities were highly correlated on all 3 arrays (the percentage of mean deviation to mean was <10%, see Supplementary dataset Table 2A). Most selected sequences (1,019 of 1,100) contained HP, PQ, or PM sequences: their presence would be expected in the well-known HPQ and HPM streptavidin binders [13][14][15][16][17][18] .
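The two-stage filter described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "aggregated" means the mean across the three arrays, that "percentage of mean deviation to mean" is the mean absolute deviation divided by the mean, and that a single background value applies to all features. The function names and parameters are hypothetical.

```python
from statistics import mean


def percent_mean_deviation(values):
    """Mean absolute deviation from the mean, as a percentage of the mean (assumed definition)."""
    m = mean(values)
    return 100.0 * mean(abs(v - m) for v in values) / m


def select_hotspot_candidates(signals, background, sb_cutoff=4.0, dev_cutoff=10.0):
    """Keep peptides with S/B above sb_cutoff and reproducible signal across arrays.

    signals: dict mapping peptide sequence -> list of intensities, one per array.
    background: single background intensity (a simplifying assumption).
    """
    selected = []
    for peptide, values in signals.items():
        signal_to_background = mean(values) / background
        if signal_to_background > sb_cutoff and percent_mean_deviation(values) < dev_cutoff:
            selected.append(peptide)
    return selected


def has_hpq_like_motif(peptide):
    """True if the peptide contains HP, PQ, or PM, the dipeptides named in the text."""
    return any(motif in peptide for motif in ("HP", "PQ", "PM"))
```

Applied to a dictionary of per-array intensities, `select_hotspot_candidates` returns the reproducible binders, and `has_hpq_like_motif` then separates the HPQ-like sequences from the remaining "secondary" candidates.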
The remaining 81 non-HPQ sequences shown in Supplementary dataset Table 2B were analyzed using PEPLIB 19 . Several peptide sequence clusters were identified (Supplementary Fig. 1). The peptides FDEWL, LAEYH, and PAWAH were selected as representative sequences from distinct clusters, along with the abundant HPQ motif, as hotspot sequences for the next step of evolution (table inset Fig. 1). Each of these 4 sequences was extended from both the N- and the C-termini with all 160,000 possible combinations of L-amino acid dimers, using all 20 natural L-amino acids (see Materials and Methods). Streptavidin binding to these 4 new libraries exhibited an amino acid preference at both the N- and C-termini, as shown by the Logo plots (weblogo.berkeley.edu) in Supplementary Figure 2. The table inset in Fig. 1 shows the consensus sequences identified with the extension libraries.
At the 'peptide maturation with substitution libraries' step (see Fig. 1), we generated a series of iterative libraries containing all possible single-substitution, double-substitution, and deletion variants of candidate binders selected in the prior two steps of the process (see Materials and Methods). As an example, the LGEYH peptide selected from the 5-mer library was extended to create an XXLGEYHXX library, where X is one of the 20 natural amino acids. From this library, the 9-mer peptide DYLGEYHGG, which showed the highest signal intensity, was extended to a 12-mer peptide and tested for specificity as follows. First, the DYLGEYHGG 9-mer peptide was extended by two glycine residues at the N-terminus to give GGDYLGEYHGG, and a substitution library (single/double/deletion) was generated and tested for streptavidin binding. Second, one of the top sequences in this library, FEDYLGEYHGG, was extended at the N-terminus by a single glycine to create the 12-mer GFEDYLGEYHGG, and a substitution library was generated. The single substitution plot shown in Fig. 2A for this peptide validated the high specificity of the majority of the residues, except for glutamate at position 3, for which an E3P substitution was preferable. For this peptide, the effect of E3P on the relative signal intensity was much more significant than that of F2L or any other substitution. A substitution plot generated for the peptides from the same array library with a fixed proline at position 3 (double substitution plot, Fig. 2B) showed an almost 2-fold improvement in overall binding signal intensity without loss of sequence specificity.
D-amino acid peptides binding to streptavidin. The same stepwise evolution approach (Fig. 1) was followed to identify hotspot D-amino acid sequences that bind to streptavidin.
Again, a 2,476,099 5-mer peptide library was synthesized with 19 of 20 D-amino acids (excluding cysteine), and fluorescence signal intensities of bound Cy5-SA across three arrays were compared to identify 114 5-mer D-peptides with the highest S/B values (Supplementary dataset Table 3). Two 5-mer hotspots, "wqeea" and "lanvd", were selected for the extension step with all 160,000 possible combinations of D-amino acid dimers. As with the L-amino acids, the extended D-peptides showed a preference for specific sequences at both termini (Supplementary Figure 2). Consensus sequences identified with the extension libraries for each hotspot sequence are shown in the inset table in Fig. 1.
As with the L-peptides, the D-peptides with the highest fluorescence intensity after extension were further matured by synthesizing a series of substitution and deletion libraries as outlined in Fig. 1. Examples of the substitution plots for D-peptides are shown in Supplementary Figures 7 and 8; sequences of the matured peptides used in the following experiments are shown in the inset table in Fig. 1.
Cyclic L/D-amino acid peptides binding to streptavidin. To identify cyclic peptide binders to streptavidin, Cy5-SA was bound to an array library of 388,962 5-mer L/D-peptides. The pentameric peptides were composed of a 4-mer peptide synthesized with a combination of uncharged L- and D-amino acids and a γ-Glu: this residue enabled synthesis of both head-to-tail cyclic and linear versions of each peptide through either allyl ester or t-butyl ester C-terminal protection, respectively (see Materials and Methods for amino acid composition and cyclization protocol). The paired cyclic and linear array features were spatially positioned side-by-side to control for potential variations in cyclization yield for each peptide. The fluorescence signal intensities across three independent replicates on the same array were compared, as was the ratio of cyclic to linear peptide signal intensities. The cyclic peptide with the highest fluorescence signal intensity was NQpW[γ-Glu], while the comparable linear peptide showed no measurable fluorescence signal intensity on the array (Supplementary Fig. 3).
Co-crystal structures of array-matured peptides with streptavidin. We determined high-resolution (1.05-1.61 Å) co-crystal structures for the 7 matured peptides listed in the inset table in Fig. 1 and for the head-to-tail cyclic peptide NQpWQ to reveal the details of peptide/streptavidin interactions (Supplementary Table 1). All peptides bind within or near the biotin binding pocket of streptavidin formed by two surface loops [1/2 (amino acids 22-28) and 3/4 (amino acids 42-52)], and by antiparallel β-sheets involved in an extensive polar interaction network, in which residues Ser88 and Thr90 of β-strand 6 play a major role (Fig. 3).
The surface loops of streptavidin are flexible: upon biotin binding they undergo a conformational change to form a closed conformation 21 , but to accommodate the peptide ligands, loop 3/4 is dislocated by 13-16 Å from the biotin closed form to adopt a well-defined peptide-specific structure. The loop region around Trp120 of the neighboring subunit provides additional contacts within the streptavidin tetramer. With the exception of these flexible loops, streptavidin has a rigid structure, with a root mean square deviation (rmsd) of ~0.5 Å for a superposition of all atoms in the co-crystal structures. The Gdlwqheatwkkq, GGwhdeatwkpG and GNSFDDWLASKG peptides bind with the same N-terminus to C-terminus directionality, opposite to the binding orientation of all other peptides.
All contacts between peptides and streptavidin within a distance of 4 Å, together with the hydrogen bond and polar interactions of biotin with streptavidin, are listed in Supplementary Table 4.
Detailed analysis of correlation between array and crystallography data. L-peptides. The binding conformation of the EWVHPQFEQKAK peptide closely resembles previously published HPQ peptide-streptavidin structures 22 . The HPQFE amino acids at positions 4-8 occupy the biotin-binding pocket of streptavidin and adopt a rigid conformation, whereas the N- and C-terminal amino acids are exposed to the solvent and show a significant degree of freedom in the crystal structure [Fig. 4(B)]. The His4 and Gln6 sidechains of the HPQ motif form hydrogen bonds with residues Ser88 and Thr90 of streptavidin. Pro5 is crucial for positioning the Gln6 sidechain within hydrogen bonding distance of Thr90. The sidechain of Phe7 is involved in critical π-π stacking against Trp120 from a neighboring streptavidin subunit. Further, at position 8, the negatively charged, long sidechain of Glu8 is important for charge-charge interactions with Arg84 [Fig. 4(A)]. In the substitution plot for the EWVHPQFEQKAK peptide [Fig. 4(C)], the most specific region is the HPQFE middle portion, which forms a "specificity valley" surrounded by much less specific N- and C-terminal regions.
In contrast to the EWVHPQFEQKAK peptide, the GNSFDDWLASKG peptide forms an α-helix and is located outside the biotin-binding pocket, which is occupied by a glycerol molecule [Fig. 5(A,B)], so that the peptide does not directly participate in polar interactions with Ser88 and Thr90. Instead, Asn2 forms a hydrogen bond with Ser45; the backbone NH of the C-terminal Gly12 forms a hydrogen bond with Asn85. The negatively charged sidechain of Asp5 forms a salt bridge with Arg84 and contributes to a network of polar interactions with  All mentioned peptide residues (Asn2, Phe4, Asp5, Trp7, Leu8, and Gly12) demonstrate high specificity in the substitution plot [Fig. 5(C)].
For example, the substitution plot reveals a clear preference for Asp over Glu at position 5, which can be explained by the multiple polar interactions that the short Asp5 sidechain-but not the longer Glu sidechain-can accommodate. Because GNSFDDWLASKG forms an α-helix, a repeating pattern of conserved residues alternates with non-conserved positions in steps of 2-3 amino acids.
Strong correlation between specificity in the substitution plots and co-crystal data was also observed for the AFPDYLAEYHGG peptide (the highest affinity for streptavidin in SPR measurements) and the RDPAPAWAHGGG peptide (the longest stretch of highly specific amino acids between positions 2 and 11) (Supplementary Fig. 5 and Supplementary Fig. 6, respectively).
D-peptides. All D-peptides fold into a left-handed α-helix with one turn for GyGlanvdessG and two turns each for the Gdlwqheatwkkq and GGwhdeatwkpG peptides. Gdlwqheatwkkq and GGwhdeatwkpG share high sequence homology; further, their conformation and binding mode are highly similar [ Fig. 3(C) and Supplementary Fig. 7(A)]. In the common "wxxea" core of these two peptides, the D-Trp forms polar contacts with Asp128 and contributes an edge-to-face interaction with Trp108, whereas the D-Glu shares a salt bridge with Arg84. D-Ala is critical to the core motif because its short sidechain perfectly accommodates the limited space within the pocket. The positions "xx" of the "wxxea" core face the solvent and therefore are less specific [ Supplementary Fig. 7(B,C)]. For the GyGlanvdessG peptide, the co-crystal structure supports the importance of residues 3-10 as indicated by the substitution plot [ Supplementary Fig. 8(A)], while the substitution-tolerant N-and C-terminal ends of the peptide are disordered in the structure [ Supplementary Fig. 8(B)].
Cyclic peptide. The peptide adopts a flat disc conformation with all amino acid side chains (except Gln5) localized in the same plane (Fig. 6). Asn1 and Gln5 engage in polar interactions with Thr90 and Ser88, respectively. Further, the backbone carbonyl oxygen of Asn1 is hydrogen-bonded to Ser27 and Tyr43.

Discussion
To demonstrate the feasibility of stepwise evolution and rational peptide array design (Fig. 1), we selected streptavidin as a target molecule. For almost three decades, streptavidin has been a classic model for evaluating various screening techniques, including phage display, mRNA display, and combinatorial peptide bead libraries [13][14][15][16][17][18]. The most frequently identified peptide sequences in the majority of these studies contain the same HPQ motif. Indeed, we found that HPQ(M) was the predominant motif among the top 5-mer L-peptides discovered in the initial screen of the 5-mer library. When we examined the top 1,100 sequences that bound to streptavidin, only 81 contained "secondary" binding motifs rather than the HPQ(M) motif. Here, we took advantage of the "spatially addressable features" of our technology: we could assess the relative binding affinity of not only the most dominant peptide families, but also peptide families exhibiting weaker, but nonetheless detectable, binding.
From these "secondary" peptide families identified in the initial screen, we chose three 5-mer "hot spot" sequences: LAEYH, PAWAH, and FDEWL. To the best of our knowledge, none of these sequences has been reported previously. We used a series of array peptide libraries rationally designed around each of these sequences to successfully evolve the three sequences to 12-mer L-peptides. One L-peptide, AFPDYLAEYHGG, showed a Kd of 43 nM, a 100-fold improvement over the HPQ peptide also developed in this study and a 1,000-fold improvement over Strep-tag II HPQ (Table 1). Finding a 12-mer peptide using current random library selection technologies would theoretically require the initial random library to possess diversity greater than 4 × 10^15; such a library would be several orders of magnitude larger than any library of practical size.
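The library sizes quoted in this work follow from simple counting, as the short sketch below illustrates. The helper name is hypothetical, and the only assumption is that every position is drawn independently from the stated alphabet.

```python
def library_size(alphabet_size, length):
    """Number of distinct linear peptides of a given length over a given alphabet."""
    return alphabet_size ** length


# Initial 5-mer screen: 19 amino acids (cysteine excluded) at each of 5 positions.
five_mer_screen = library_size(19, 5)

# A fully random 12-mer library over all 20 natural amino acids.
random_twelve_mer = library_size(20, 12)

print(f"5-mer screen:   {five_mer_screen:,}")
print(f"random 12-mer:  {random_twelve_mer:,}")
```

The first value matches the 2,476,099-feature array used for the initial screen, and the second is the diversity of roughly 4 × 10^15 that a single random 12-mer library would need to cover.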
To explain why "secondary" binders that exhibit higher affinity to streptavidin than the commonly reported HPQ-motif binders have not been previously selected in random libraries, we assumed that the HPQ(M) motif contributes most of the binding energy, whereas the contribution of the flanking sequences is relatively minor and multiple flanking sequences are acceptable. Thus, HPQ(M)-peptides can en masse outcompete alternative candidates that have longer core sequences and that might be present in smaller copy numbers in the initial rounds of random library selection. We believe that the "winner takes all" bias could be a general phenomenon inherent to all display technologies, and thus limits their utility.
The benefits of employing a systematic screening approach rather than random library selection have been demonstrated 6,28 , but such an approach faces the challenge of synthesizing focused, yet adaptable libraries. The array synthesis technology described here has made stepwise evolution practical for three reasons: (1) the high density of the peptide arrays enabled placement of nearly all possible 5-mer peptides on a single array, even for an initial screen; (2) the high sensitivity enabled detection of even low-affinity binding events; and (3) the efficient array design and synthesis process enabled rapid completion of multiple rounds of binder evolution.
Notably, rational peptide array design enables not only identification of high-affinity peptide binders but also assessment of the effects of individual amino acid substitutions at each position in a single experiment, as clearly demonstrated by the single/double amino acid substitution plots (Fig. 2). Unlike the well-known alanine scan, this method both compares the binding of thousands of related peptide variants and identifies critical binding residues.
One concern is that the substitution plots might reflect artifacts of array-synthesized peptides and/or the surface microenvironment rather than bona fide peptide interactions with the target. Although these effects cannot be completely excluded, comparing the substitution plots with the co-crystal structures for each peptide revealed an excellent correlation between the binding specificity of the preferred amino acids and their relative contribution to the interactions with streptavidin in the co-crystal structure. The highly specific amino acids identified by substitution analysis face the streptavidin binding pocket and participate in an elaborate network of inter-/ intra-molecular interactions (Figs 4, 5, and Supplementary Figs 5-8). Similarly, amino acids that are exposed to solvent and do not make contacts with streptavidin represent non-specific peptide positions.
All eight L- and D-peptides discovered in this work (Table 1) bind to the same pocket of streptavidin in distinctive binding modes, suggesting (1) the rich malleability of L- and D-peptides and (2) the ability of streptavidin itself to adjust the flexible loop to accommodate multiple peptides with diverse sequences and conformations (Fig. 3). Interestingly, five of the eight peptides found here adopt an α-helical conformation, underscoring the importance of intra-peptide interactions for peptide/target stability. The observation that highly diverse peptides can bind to the same pocket is likely not unique to streptavidin; we expect that the described approach will enable the discovery of multiple peptide binders for other targets (manuscript in preparation). Our combined stepwise evolution and rational peptide array design approach could potentially advance drug discovery by designing peptide libraries that incorporate additional modifications (e.g., β-amino acids, N-methyl amino acids, and peptoids) to expand the libraries' physicochemical and conformational diversity. These libraries, together with array-based assays, could be used to select for binding affinity and to assess proteolytic stability and cell permeability. Ultimately, this approach could both greatly shorten the time needed for lead molecule discovery and enable identification, in a single screen, of compounds with integrated drug-like properties including oral availability, good pharmacokinetics, and low toxicity.

Methods
Methods and the associated references are available in the online version of the paper.
Peptide array synthesis. Peptide synthesis was accomplished through light-directed array synthesis in a Roche NimbleGen Maskless Array Synthesizer (MAS) using an amino-functionalized substrate as previously reported 29 .
The combined cyclic and linear peptide libraries were synthesized starting with either the allyl ester (OAll) or t-butyl ester, respectively, of N-(2-nitrophenyl)propoxycarbonyl (NPPOC)-protected glutamate (γ-Glu) linked to the array surface through the carboxylic acid side chain. To cyclize the peptides prior to side chain deprotection, the array was first treated with tetrakis(triphenylphosphine)palladium(0) (2 mM) in THF for 3 h at room temperature to remove the OAll protecting group from the C-terminus of the peptide library. To remove residual palladium from the array, the slide was washed with 5% N,N-diisopropylethylamine (DIPEA) and 5% sodium diethyldithiocarbamate in DMF for 5 min. After a 1-min wash with water, the slide was spun to dryness before cyclization. The array was then cyclized by coupling the N- to the C-terminus using a standard coupling procedure: (1)
HotSpot Extension Arrays. Extension libraries were designed using a fixed-core sequence extended at both the N- and C-termini with all possible dimers of the 20 L- or D-amino acids. Each library included 160,000 unique peptides synthesized in five replicates. Each array accommodated up to three independent extension libraries.
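The extension design can be enumerated directly, as in the minimal sketch below. It is an illustration rather than the array design software used in this work; the function name and the one-letter alphabet constant are hypothetical, and the LGEYH core is taken from the example in the Results.

```python
from itertools import product

# One-letter codes for the 20 natural amino acids (used here for L-residues).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extension_library(core, alphabet=AMINO_ACIDS, flank=2):
    """Yield every peptide of the form XX-core-XX (flank residues on each side)."""
    for n_term in product(alphabet, repeat=flank):
        for c_term in product(alphabet, repeat=flank):
            yield "".join(n_term) + core + "".join(c_term)


library = list(extension_library("LGEYH"))
print(len(library))            # 20**4 sequences for a dimer on each terminus
print("DYLGEYHGG" in library)  # the 9-mer named in the text is one member
```

Each flank contributes 20 × 20 dimers, so the two termini together give the 160,000 unique peptides per library stated above.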
Substitution Arrays. Substitution libraries were designed by introducing all possible single-and double-amino-acid substitutions and single-amino-acid deletions for a specific sequence using all 20 L-or D-amino acids. Each library was synthesized in five to seven replicates. Each array accommodated up to 12 independent substitution libraries.
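A substitution library of this kind can be enumerated as in the sketch below. This is a minimal illustration under the assumption that "all possible" means every non-identical residue at every position; the function and variable names are hypothetical, and the 12-mer GFEDYLGEYHGG from the Results is used as the parent sequence.

```python
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 natural amino acids, one-letter codes


def substitution_library(parent, alphabet=AMINO_ACIDS):
    """Return single-substitution, double-substitution, and single-deletion variants."""
    single, double, deletion = set(), set(), set()
    positions = range(len(parent))
    for i in positions:
        # Single deletions; deleting either of two identical neighbors gives the same peptide.
        deletion.add(parent[:i] + parent[i + 1:])
        for aa in alphabet:
            if aa != parent[i]:
                single.add(parent[:i] + aa + parent[i + 1:])
    for i, j in combinations(positions, 2):
        for a, b in product(alphabet, repeat=2):
            if a != parent[i] and b != parent[j]:
                double.add(parent[:i] + a + parent[i + 1:j] + b + parent[j + 1:])
    return {"single": single, "double": double, "deletion": deletion}


variants = substitution_library("GFEDYLGEYHGG")
for kind, members in variants.items():
    print(kind, len(members))
```

For a 12-mer, this gives 12 × 19 single substitutions and 66 × 19 × 19 double substitutions; the E3P variant GFPDYLGEYHGG discussed in the Results is one of the single substitutions.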
Cyclic Discovery Arrays. All peptides in the library were 5-mers in the format XXXX[γ-Glu], where XXXX is a combination of all possible 4-mer amino acids from a subset of L- and D-amino acids, and γ-Glu is an L-glutamate protected on the C-terminus with either an allyl ester or a t-butyl ester, to generate cyclic or linear features, respectively, as described above. The L-amino acids included in this design were Ala, Asn, Gln, Gly, Ile, Leu, Phe, Pro, Ser, Thr, Trp, Tyr, and Val; the D-amino acids were Ala, Asn, Leu, Phe, Pro, Ser, Trp, and Tyr.
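The cyclic discovery design can be sketched in the same way. The residue lists below come from the paragraph above, with lowercase letters denoting D-amino acids as elsewhere in the text. The assumption that each 4-mer is counted once as a cyclic feature and once as a linear feature is an inference; it is made here because it reproduces the reported library size. All names are hypothetical.

```python
from itertools import product

# 13 L-amino acids (uppercase) and 8 D-amino acids (lowercase) listed for this design.
L_RESIDUES = list("ANQGILFPSTWYV")
D_RESIDUES = list("anlfpswy")


def cyclic_discovery_library():
    """Yield (sequence, form) pairs: each XXXX[γ-Glu] as a cyclic and a linear feature."""
    for four_mer in product(L_RESIDUES + D_RESIDUES, repeat=4):
        sequence = "".join(four_mer) + "[γ-Glu]"
        yield sequence, "cyclic"   # allyl ester protection, cyclized on the array
        yield sequence, "linear"   # t-butyl ester protection, left linear


features = list(cyclic_discovery_library())
print(len(features))
```

With 21 residue choices at each of four positions and two forms per sequence, the count is 2 × 21^4, which agrees with the 388,962 features reported for this array.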
Streptavidin binding on array. Cy5™-streptavidin (Cy5-SA) was purchased from GE Healthcare (Little Chalfont, UK). Freshly deprotected arrays were used in each experiment. Streptavidin binding to all arrays was performed with 0.5 µg/ml Cy5-SA either in binding buffer containing 10 mM Tris-HCl (pH 7.4), 1% alkali-soluble casein (EMD Millipore), 0.05% Tween-20 or in 10 mM Tris-HCl (pH 7.4), 4% BSA (Roche, Basel, Switzerland), 0.05% Tween-20 in a 30 mL PAP Jar container (Evergreen Scientific, Vernon, CA) overnight at 4 °C. After incubation, arrays were washed in 20 mM Tris-HCl (pH 7.8), 0.2 M NaCl, 1% SDS or 1X TBS (pH 7.4) for 30 sec followed by a 1 min wash in water, and then dried by spinning in a microcentrifuge equipped with an array holder.
Data analysis. Cy5 fluorescence intensity of the arrays was measured with an MS200 scanner (Roche NimbleGen, Madison, WI) at resolution 2 µm, wavelength 635 nm, gain 25%, and laser intensity 100%. Cy5 signal intensities were extracted using Image Extraction Software (Roche NimbleGen). Data pre-processing, normalization, and statistical tests were performed using the language R. Data visualization and analysis were performed with the Spotfire 6.5.0 (Tibco, Boston, MA) software platform. Distance analysis and principal component analysis of distance matrices were performed with the R package PEPLIB 19 .
Peptide synthesis. All peptides were provided at 98-99% purity and used as received. Strep-tag II peptide, NH2-SAWSHPQFEK-COOH (Strep-tag II HPQ), was purchased from IBA GmbH (Goettingen, Germany). The cyclic (head-to-tail) and linear versions of peptide NQpWQ were purchased from GenScript (Piscataway, NJ). All other peptides were synthesized by either the University of Wisconsin Biotechnology Center (Madison, WI) or by Peptide 2.0 (Chantilly, VA).
SPR experiments. Surface Plasmon Resonance (SPR) experiments were performed using a Biacore X100 instrument (GE Healthcare). 60 µl of 100 µg/ml streptavidin in 10 mM Na-acetate (pH 5.0) was immobilized to flow cell 2 (Fc2) of a sensor chip CM5 (GE Healthcare) using the Amine Coupling Kit (GE Healthcare) at 20 °C for 6 min. Peptide stock solutions were prepared at 5 or 10 mM in H2O and diluted in HBS-EP+ (GE Healthcare) buffer. Peptide binding was performed in a multiple kinetics mode using HBS-EP+ as a running buffer and 0.2 M NaCl, 10 mM NaOH, or 10 mM HCl-glycine (pH 1.7) as the regeneration buffer. Binding kinetics parameters were calculated using Biacore X100 software.
Crystallization and data collection. Crystallization screening for streptavidin (Roche Diagnostics, Risch-Rotkreuz, Switzerland) and peptides was performed at 21 °C in vapor diffusion sitting-drop experiments at streptavidin concentrations of 20-30 mg/ml. Crystals were obtained by mixing 0.14 µL protein with 0.06 µL of screening solution (Procomplex, Qiagen, Hilden, Germany). Details regarding protein-peptide incubation ratios and times, concentrations and crystallization solutions are summarized in Supplementary Table 1. Various crystal forms were found in each peptide co-crystallization experiment. The first crystals appeared within minutes, mainly in polyethylene glycol-containing solutions, and grew to their final size within 3 days after setup. Crystals could be directly harvested out of the screening plate without any further optimization steps because crystal size and quality were sufficient for data collection. For cryoprotection, crystals were transferred into crystallization solution supplemented with 20% glycerol. Diffraction data were collected at the Swiss Light Source (Villigen, Switzerland) on beamline X10SA using a Pilatus 6 M detector.
Structure determination and refinement. Data were processed with Extended Data Services 30 and scaled using SADABS x-ray diffraction (Bruker, Billerica, MA). Structures were determined by molecular replacement with PHASER 31 using the apo-streptavidin coordinates of Protein Data Bank (PDB) entry 3RY1. With programs from the CCP4 suite 32 and BUSTER 33 , the coordinates obtained by molecular replacement were subsequently refined by rigid-body and positional refinement (Supplementary Table 1). Manual rebuilding of the protein was achieved using model-building software (COOT 34 ). The difference electron density was used to rebuild the loop areas and to place the peptides. Distance calculations and analysis of contacts between streptavidin and the peptides were conducted in COOT and with the molecular modeling/simulation program MOE 35 . Images were produced with the structural visualization program PYMOL 36 .