A Shorter Route to Antibody Binders via Quantitative in vitro Bead-Display Screening and Consensus Analysis

Affinity panning of large libraries is a powerful tool to identify protein binders. However, panning rounds are followed by the tedious re-screening of the clones obtained to evaluate binders precisely. In a first application of Bead Surface Display (BeSD) we show successful in vitro affinity selections based on flow cytometric analysis that allows fine quantitative discrimination between binders. Subsequent consensus analysis of the resulting sequences enables identification of clones that bind tighter than those arising directly from the experimental selection output. This is demonstrated by evolution of an anti-Fas receptor single-chain variable fragment (scFv) that was improved 98-fold vs the parental clone. Four rounds of quantitative screening by fluorescence-activated cell sorting of an error-prone library based on fine discrimination between binders in BeSD were followed by analysis of 200 full-length output sequences that suggested a new consensus design with a Kd ∼140 pM. This approach shortens the time and effort to obtain high affinity reagents and its cell-free nature transcends limitations inherent in previous in vivo display systems.

the maximally achievable display level (~10 6 per bead 31 ). The fluorescence distributions of the negative controls (without the BG on the bead) varied between the two detection modes, suggesting that there is a fraction of dysfunctional, misfolded scFv present attached non-specifically to the bead surface itself, which can be detected with the anti-HA antibody. However, these scFv molecules do not bind to the target (and consequently are not detected with the anti-Fc antibody). It has been shown that certain scFv antibodies can be prone to aggregation as a result of misfolding 38,39 , and this could explain the observed background signal during the on-bead display measurement. Nevertheless, the presence of a negligible fraction of misfolded scFv on the bead was not expected to interfere with the performance of the binding assay during the selection process. In order to achieve uniform surface display levels on beads for the SNAP-scFv-HA and the SNAP-HA fusion proteins, prior optimisation of in vitro expression and display efficiency was necessary. In particular, the time and temperature of the incubation step had to be re-evaluated (see Supplementary Fig. 2 and Supplementary Protocol 1).
Figure 2. (A) The SNAP-E09 scFv-HA construct was expressed in vitro in the presence of streptavidin-coated beads coupled to spiking anchors without the BG moiety (negative controls; histograms a and d) or to spiking anchors conjugated to the BG (histograms b and e). The beads were then subjected to on-bead display (cartoons a and b) and binding assays (cartoons d and e). In the on-bead display assay the SNAP-scFv-HA was detected with Alexa488-labelled anti-HA antibody (histograms a and b; green). Fluorescence distribution from the flow cytometric analysis showed a 250-fold increase in the median fluorescence signal compared to the fluorescence of beads without bound ligands. The SNAP-scFv-HA showed a similar fluorescence signal (histogram b; filled green) to the SNAP-HA construct displayed on beads (histogram c; black dotted line), suggesting that maximum on-bead display was reached. For the on-bead binding assay, the same SNAP-scFv-HA construct was detected with a fluorescently labelled anti-Fc antibody (that detects the binding of the FasR at 1 nM; histograms d and e; blue), showing > 1,250-fold signal increase over the background. (B) On-bead binding assays for different scFv variants analysed by flow cytometry: a non-binder (CEA6, grey) and three binders with increasing affinity, namely E09_Y58S (Kd ~180 nM, orange), E09 (Kd ~9 nM, blue) and EP6b_B01 (Kd ~0.2 nM, red). Beads displaying the respective scFv variants were incubated in the presence of 1 nM, 10 nM and 30 nM of the target FasR-Fc (detected with a fluorescently labelled anti-Fc antibody). The peak of the fluorescence distribution obtained by flow cytometry correlates to the affinity of the displayed scFv and shows a clear difference between the variants. The resolution of scFv binders is improved at lower antigen concentration, since, at equilibrium, a greater proportion of higher affinity antibodies than lower affinity antibodies is able to bind to the available antigen.
In order to establish sufficiently stringent conditions to select for high affinity scFv in library screenings, the resolution of the display system was investigated. We compared the flow cytometric on-bead binding signals of the anti-FasR variants with a range of affinities: EP6b_B01, E09, E09_Y58S (with the Kd values of 0.18 nM, 8.6 nM and 187 nM, respectively) 37 and a non-binder CEA6 scFv 40 . Each scFv was displayed on the bead surface and subsequently incubated with 1 nM, 10 nM and 30 nM of the antibody-labelled FasR-Fc (Fig. 2B). This experiment demonstrated that the median fluorescence signal (MFS) corresponds to the affinity of the scFv and the increase in the signal indicates tighter binding. The difference in the MFS between weakest (187 nM) and strongest binders (0.18 nM) increased from 1.8-fold to 8.4-fold when the antigen concentration was dropped from 30 nM to 1 nM. This apparent correlation supported the idea 41 that, in a monovalent selection system, a reduced antigen concentration increases the selection stringency, such that, at equilibrium, a greater proportion of displayed higher affinity antibodies will carry the fluorescent antigen compared to lower affinity antibodies. These observations suggest that BeSD is capable of discriminating between variants on the basis of binding affinity, provided that an appropriate antigen concentration is chosen.
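The stringency argument can be illustrated with the textbook one-site equilibrium occupancy, [Ag] / ([Ag] + Kd). The sketch below is an idealised model and not the authors' analysis: it assumes monovalent binding at equilibrium with no antigen depletion and no background signal, so it is not expected to reproduce the measured 1.8-fold and 8.4-fold MFS differences, only the trend that the ratio grows as the antigen concentration falls. The function names are hypothetical.

```python
def fraction_bound(antigen_nM: float, kd_nM: float) -> float:
    """Idealised equilibrium occupancy of a displayed binder: [Ag] / ([Ag] + Kd)."""
    return antigen_nM / (antigen_nM + kd_nM)


def occupancy_ratio(antigen_nM: float, kd_tight_nM: float, kd_weak_nM: float) -> float:
    """Ratio of occupancies (tight binder over weak binder) at one antigen concentration."""
    return fraction_bound(antigen_nM, kd_tight_nM) / fraction_bound(antigen_nM, kd_weak_nM)


if __name__ == "__main__":
    # Kd values quoted in the text for EP6b_B01 (0.18 nM) and E09_Y58S (187 nM).
    for antigen in (30.0, 10.0, 1.0):
        ratio = occupancy_ratio(antigen, kd_tight_nM=0.18, kd_weak_nM=187.0)
        print(f"{antigen:5.1f} nM antigen: idealised occupancy ratio = {ratio:.1f}")
```

In this model the ratio simplifies to ([Ag] + Kd,weak) / ([Ag] + Kd,tight), which makes explicit why lowering [Ag] towards the Kd of the tighter binder widens the separation between variants.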
Screening of an scFv library generated by error-prone PCR. Beginning with the scFv E09 (parent) 37 as a DNA template, an error-prone library with a low mutation rate of ~1.7 amino acid changes per scFv (Library I) was created. Selection cycles were performed using an improved protocol (Fig. 1) based on that of Diamante et al. 31 (as discussed above), in which 1-2 × 10⁵ library members were screened in each round. To find an optimal threshold concentration for selection of the improved scFvs, the equilibrium binding titration curves were determined (Supplementary Fig. 3). The apparent Kd value of the parent was 4.3 nM (± 0.3), thus the library screening was done at an antigen concentration of 1 nM. It is worth noting that the on-bead Kd measurement was highly reproducible; the standard deviation between the individual normalised values was only 13% and between the derived Kd values 25% (Supplementary Fig. 4). By comparison, the variability of Kd values in yeast display was reported to be 30% 42 . This screening procedure (pictured in Fig. 1) was repeated until a total of four rounds of selection had been performed (see Fig. 3A) with intermittent re-randomisation and an increasingly narrow sorting gate. In the first selection round (round I) a permissive sorting gate was set (i.e. on 3% of the parent fluorescent population and the same gate was used for selecting variants from Library I), followed by more stringent rounds with the sorting gate set on the top 0.5% of library members (rounds II-IV). To further explore diversity of the scFv variants, another round of mutagenesis by error-prone PCR was performed to introduce mutations after round II (made using the same mutation rate as before, resulting in Library II), now starting from the pool of scFv variants selected in rounds I and II.
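The gating logic described above can be sketched as a simple percentile cut on per-bead fluorescence. This is a hypothetical illustration, not the sorter software used in the work: it assumes the round I gate corresponds to the brightest 3% of the parent population and that each event is summarised by a single fluorescence value. The names `gate_threshold` and `sort_events` are invented for this example.

```python
def gate_threshold(signals, top_fraction):
    """Return the fluorescence value at or above which roughly `top_fraction` of events fall."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    if not signals:
        raise ValueError("signals must not be empty")
    ranked = sorted(signals, reverse=True)
    # Keep at least one event so that the gate is always defined.
    n_keep = max(1, round(len(ranked) * top_fraction))
    return ranked[n_keep - 1]


def sort_events(signals, threshold):
    """Return the events that fall inside the gate (signal >= threshold)."""
    return [s for s in signals if s >= threshold]


if __name__ == "__main__":
    # Toy data only: a reference population and a small "library" of bead signals.
    parent = list(range(1, 1001))
    library = [850, 990, 975, 1200, 12]
    permissive = gate_threshold(parent, 0.03)   # gate defined on the reference population
    print(sort_events(library, permissive))
```

Defining the threshold on a reference population and then applying it unchanged to the library mirrors the round I description; for the later rounds the same helper could be applied to the library signals themselves with `top_fraction=0.005`.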
It is noteworthy that the selections were done entirely in vitro -the recovered scFv sequence from each round was PCR-assembled into the full SNAP-scFv-HA construct using a high fidelity polymerase (see Methods). Subsets of each output were also cloned for the sequence analysis, which demonstrated highly diverse variant sequences (Fig. 3B). The frequency of wild-type E09 occurrence decreased continuously in successive selection rounds, indicating that the enrichment of improved variants at the expense of the starting species was successful (Fig. 3B).

Characterisation of the selected scFvs.
After the fourth selection round the output was cloned into the pISNEX in vitro expression vector (Supplementary Fig. 9) or into the vector pCantab6 (for periplasmic bacterial expression) 43 . 21 randomly selected clones were tested in on-bead binding assays to identify binders of improved affinity (Fig. 3C). Additionally, the proportion and affinity improvement of the output binders from our selection was compared to the output of the previous ribosome display campaign (that underwent six rounds of evolution 36 rather than four as in this work). 88 clones from the round IV output of BeSD were picked and expressed in E. coli and the supernatants were tested in a binding plate assay as described by Chodorge et al. 36 . The supernatant screen showed that hit rate in both outputs was comparable: the improved variants (defined as mutants with a binding signal three times above the parent) constituted 45% of the ribosome display output (data not published) and 35% for the BeSD output ( Supplementary Fig. 5). Moreover, the on-bead assay (performed with 21 variants, Fig. 3C) showed that 44% of the tested variants demonstrated higher binding signal when compared to the parent scFv. This result agreed with the plate based supernatant screen and it emphasises that the on-bead approach faithfully reflects conventional biophysical measurements.
Six clones (A01b, A03a, A07a, A09a, A11b, F03a), out of the 21 tested in the on-bead binding assay, with the highest affinity improvements (> 2-fold increase of MFS in on-bead binding assay over E09, see Fig. 3C) and the most abundant variant (A05a, contributing ~7% of the output of round IV) were expressed in E. coli and their binding kinetics were tested by bio-layer interferometry (BLI) (with the exception of F03a, which failed in bacterial expression). The parent scFv E09 and EP6b_B01, a high affinity variant from ribosome display 37 , were analysed alongside the selection outputs as references, and Table 1 summarises the K d , k on and k off values obtained. The binding constants of the scFv antibodies showed that all variants selected by BeSD had improved, low nM affinities. The largest gain was demonstrated by the variant A07a, being ~19-fold over the parent and ~2-fold over the EP6b_B01 clone. As the BLI was insufficiently sensitive to discriminate slow k off values, the scFv A07a was converted to an IgG to exclude avidity effects in the determination of K d values, by virtue of performing the analysis with a low density IgG-coated chip surface and using surface plasmon resonance (SPR). In this format A07a, from BeSD, and EP6b_B01, from ribosome display, were 47-fold and 228-fold improved, respectively.

Analysis of the sequence alignment of the output reveals affinity-relevant hotspots.
Panning selections have the drawback that they typically yield a proportion of proteins with weak affinity that are still sufficient to pass a threshold defined by the antigen concentration and washing steps, which demands screening hundreds, if not thousands, of individually prepared variants in biophysical assays to provide additional characterisation. In BeSD (and in other multivalent display systems, e.g., yeast display) the readout is quantitative, i.e., it reports directly on ligand occupancy and thus more directly reflects Kd. We therefore probed whether the quantitative readout from BeSD could define a binding consensus, simply based on sequencing rather than time-consuming biophysical analysis of individual clones. To examine this hypothesis, 223 randomly selected clones from the fourth selection round output were sequenced (by Sanger sequencing) and aligned to the parent E09 scFv. The resulting alignment revealed 8 hotspots 36 [...] sequences analysed (Fig. 4). The mutations in those positions were progressively enriched in the course of selections (Supplementary Fig. 6). In the hotspot positions each residue was preferably changed to a specific amino acid (contributing to more than 80% of all mutations occurring in that position, data not shown), suggesting that those favoured mutations were specifically selected during the scFv evolution, as they are beneficial for its biophysical properties. The residue VL Y50 (numbered according to the Kabat system 44 ) was the only position that did not have a clearly dominating mutation, and the tyrosine was mutated either to serine or histidine (covering 50% or 48% of mutations occurring in this position, respectively). This observation of two alternative residues in position VL Y50 raised the possibility of deleterious negative epistatic interactions 45-47 between these residues and other mutations and suggested another possible driving force behind selection of either histidine or serine in this position. This bifurcation in the evolutionary history of the final variants 48 would be indicative of a rugged fitness landscape 49,50 .
Figure 3. [...] Each output had high diversity of scFv amino acid sequence (with a decreasing proportion of parent E09 in each subsequent round). (C) 21 randomly selected scFvs from the fourth output were tested in on-bead binding assays. The plot shows the binding ratio of each variant over the E09 scFv (green bar, triplicate measurements), and the frequency of each variant in the selection output (numbers on right hand side). Seven scFvs either showing the largest improvements in binding (signal > 2-fold higher than E09; marked with '*') or appearing with the highest frequency in the output (marked with '**') were expressed in E. coli and analysed biophysically (Table 1). Only one clone, F03a, failed in bacterial expression, and thus was not characterised.
Scientific RepoRts | 6:36391 | DOI: 10.1038/srep36391

Phylogenetic analysis of the scFv from the final selection output allows building a consensus scFv binder with greatly improved affinity.
To investigate the relationships between the hotspot mutations a phylogenetic analysis was performed. To uncover potential epistatic interactions between the hotspot mutations, the unique sequences of the fourth selection round output (193 sequences) were aligned using the MUSCLE algorithm 51 (online) and the resulting phylogenetic tree (displayed in Fig. 5) was created with CIPRES 52 . The mutations found in more than 20% of the sequences in each clade were classified as consensus mutations for that clade.
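The rule that a mutation counts as a consensus mutation when it occurs in more than 20% of the sequences in a clade can be sketched as below. This is a minimal illustration and not the authors' workflow (the Methods describe manual alignment and frequency counting in Excel). It assumes that all sequences are already aligned to the parent and have equal length, that gaps are written as "-", and that positions are 1-based alignment indices rather than Kabat numbers. The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def consensus_mutations(parent, clade_sequences, min_fraction=0.20):
    """Map 1-based position -> list of (parent_residue, mutant_residue, fraction).

    A mutation is reported when it occurs in MORE than `min_fraction`
    of the sequences in the clade.
    """
    if not clade_sequences:
        raise ValueError("clade_sequences must not be empty")
    if any(len(seq) != len(parent) for seq in clade_sequences):
        raise ValueError("all sequences must be aligned to the parent length")

    n = len(clade_sequences)
    result = {}
    for index, wild_type in enumerate(parent):
        # Count every non-parent, non-gap residue observed at this position.
        counts = Counter(
            seq[index] for seq in clade_sequences
            if seq[index] != wild_type and seq[index] != "-"
        )
        for residue, count in counts.items():
            fraction = count / n
            if fraction > min_fraction:
                result.setdefault(index + 1, []).append((wild_type, residue, fraction))
    return result


if __name__ == "__main__":
    # Toy four-residue "parent" and a five-member clade, for illustration only.
    toy_parent = "ASYI"
    toy_clade = ["APYI", "APSI", "APSI", "ASHI", "APYY"]
    for position, hits in consensus_mutations(toy_parent, toy_clade).items():
        for wild_type, mutant, fraction in hits:
            print(f"{wild_type}{position}{mutant}: {fraction:.0%} of clade")
```

Running the same function separately on each clade, rather than on the pooled output, is what allows alternative substitutions at one position (such as serine or histidine at VL Y50) to be assigned to different consensus designs.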
In addition to the hotspot positions (defined above), residue I28 was mutated to a tyrosine in 26% of sequences from Clade 1, qualifying it also as a consensus mutation. To verify the contribution of each of the identified consensus mutations to the scFv's binding capability, the individual mutations were introduced into the parent and tested by BLI (Supplementary Fig. 7). Mutations VH S25P and VL Y50S each resulted in a 4-fold increase in the affinity for the FasR (compared to the Kd of the parent scFv E09). Consequently, to evaluate the epistatic interactions between the mutations, the two scFv consensus mutants (named R4aS and R4aH) were constructed (Fig. 5). The scFv R4aH containing all consensus mutations from Clade 2 had already been identified in our experiments as the A05a scFv, the most abundant scFv variant in the output of the fourth round of BeSD selection (see Fig. 3C and Table 1), and showed 3-fold improvement in the Kd value compared to the parent E09. The second consensus scFv, R4aS, was created by introducing the mutations from Clade 1 into the E09 backbone by site-directed mutagenesis. R4aS was expressed both as scFv and IgG and its affinity determined by BLI and SPR (Table 1). The R4aS variant resulted in a high affinity IgG with a Kd of 0.14 nM and a Kd gain over parent of 98-fold (measured by SPR), representing tighter binding than the best experimentally selected variant, A07a (Kd ∼ 0.29 nM; Table 1). When modelled on the structure of the E09 scFv in complex with the FasR derived by Chodorge et al. 37 (Supplementary Fig. 8), neither the individual mutations nor the consensus scFv were predicted to improve the binding, because they are remote from the interaction surface.
Figure 5. [...] The mutations contributing to more than 20% of all analysed sequences in the corresponding clade were classified as consensus mutations. The co-existence of these mutations suggests that no negative epistatic interactions occur between them. Additionally, the fact that they were greatly enriched in the last selection round indicates that they were necessary to improve the target binding properties of the scFvs. The dominant residues for each subsequent clade were assembled into R4aS and R4aH scFv consensus mutants. Interestingly, the scFv containing all consensus mutations from Clade 2 was also identified as the A05a scFv, the most abundant scFv variant in the output of the fourth round of BeSD selection (see Fig. 3C and Table 1), and has ~3-fold improved Kd value (compared to the parent E09 scFv). The second consensus scFv, R4aS, was created (by introducing the relevant mutations to E09 backbone by site-directed mutagenesis), and subsequently its affinity was measured by BLI (see Table 1).

Discussion
This work has established a new powerful methodology in which the evolution of protein binders is achieved by a combination of quantitative in vitro screening and subsequent analysis of consensus mutations. The combination of these two approaches identifies binders with affinities that surpass those of the experimental output: (i) Quantitative in vitro screening. First, experimental in vitro selections with a quantitative readout that reports on affinity were carried out based on a new bead display system, BeSD, which is used here for the first time for evolution of an antibody fragment. The selection of the 290 pM binder A07a and an improvement of 47-fold after four rounds of BeSD was possible from a moderately sized (10⁵-membered) error-prone library (in contrast to a 49-fold improvement obtained after 6 rounds of ribosome display selections and extensive subsequent screening of hundreds of individually prepared variants, as reported before with a 10⁷-fold larger library) 37 . The high quality single clone characterisation in BeSD leads to a high positive rate (confirmed by biophysical characterisation by BLI and SPR), because the titration-like FACS binding assays provide a much more quantitative way of assessing binding strength than affinity panning (i.e. selection against an immobilised target, where stringent control of selection threshold is hard to achieve). The benefit of a quantitative selection system is to improve the selection efficiency: the majority of hits were true positives with improved Kd and satisfying the selection threshold. Strikingly, only seven variants needed to be individually prepared and screened to isolate A07a from the BeSD output.
By contrast, typical panning outputs in phage or ribosome display selections need to undergo the laborious step of re-screening hundreds, if not thousands, of individually prepared variants in microplate assays in order to distinguish those variants genuinely selected for improvements in the desired function from those which simply persist in the selection process. In short, as binders with similar affinity result in both approaches, the much larger library diversity screened in ribosome display seems to be compensated by the better quality of the screen in BeSD, reporting directly on K d (while involving dramatically fewer experiments).
A further advantage of BeSD is that the in vitro approach transcends in vivo host constraints: it allows more precise control over the protein folding environment (e.g., control of redox conditions and the presence of particular molecular chaperones), allows simplified selections for protein stability (e.g., thermostability, which typically results in the death of the host cell 53 ) and also obviates the need for cumbersome transformations of variant libraries into a host cell. The new methodology goes beyond other bead-based platforms on record 54,55 , as none of the reported platforms has demonstrated the ability to display antibody fragments or to discriminate quantitatively between binders of varying affinities.
(ii) Consensus analysis. The quantitative readout of BeSD provides information that actively guided the design of consensus mutants. Highly enriched mutations were identified, and by applying a phylogenetic analysis, a consensus scFv variant was assembled. The phylogenetic grouping of consensus patterns provides further guidance to avoid unproductive interactions between residues, based on genotypic incompatibility (i.e., negative epistatic interactions 46,49 ). This treatment led to creation of antibody R4aS with a 98-fold higher affinity than the parent. The high quality of selections is the precondition for such an approach to be successful, as reported in a hotspot analysis by Boder 56 that was also able to characterise binding for each single clone quantitatively. Nowadays, the availability of high throughput sequencing technologies enhances the depth and breadth of insight into the course of directed evolution of proteins 57,58 . Deep sequencing analysis has identified nanomolar binders from naïve selections 59 , but with short reads epistatic interactions are likely to be obscured. Also, the limited sequencing length 60 confined the readout to the CDR loops 59 , while in this work beneficial mutations were identified across the entire gene. Although in this work the sequencing depth is moderate (200 clones), the availability of full-length sequence data gives access to information on context dependence for cluster analysis, which has never been productively used before.
The combination of stringent experiment and consensus analysis shortens the time from library to improved clone (from weeks to days) compared to ribosome or phage display -first by performing fewer rounds of experimental selections and secondly by replacing the extensive biophysical analysis of individually prepared variants selected by panning methods with characterisation of just one (or a few) consensus mutants. Given that sequencing of many clones is now cheap and fast, the consensus analysis based on a BeSD output and design of a binder based on these patterns, provides a new strategy to obtain improved binders faster.

Methods
Display construct. The plasmids pIVEX-SNAP-HA and pIVEX-anchor were constructed by Diamante et al. 31 , based on the pIVEX-SNAP-GFP vector. Plasmids pIVEX-SNAP-GFP and pIVEX-anchor are available via the AddGene repository. pIVEX-SNAP-GFP contains the R30I mutant of the SNAP-tag 61 . The scFv insert was cloned into pIVEX-SNAP-HA by replacing the avi-tag in front of the SNAP sequence with NdeI and KpnI restriction sites. For the purpose of scFv selections a new vector, pISNEX, was created (for details see the Supplementary Protocol 2 and Supplementary Fig. 9A).
The benzylguanine (BG) conjugation to the pIVBT7 oligonucleotide method was adapted from the protocol published by Stein et al. 20 , with minor modifications (see Supplementary Protocol 3 for an updated procedure).
Creation of the scFv error-prone libraries. The pISNEX-SNAP-HA plasmid containing the gene encoding E09 scFv or the recovered linear DNA from the second round of selection was used as a template for the error-prone PCR using the GeneMorph II random mutagenesis kit (Agilent) (see Supplementary Protocol 6 for details). A portion of the library was used for assembly into the full BeSD template and a fraction was cloned into pISNEX (by NotI and BamHI restriction) to determine the mutation rate by sequencing a number of randomly picked colonies (74 or 45 sequences were analysed for Library I and Library II respectively).
The assembly of the in vitro transcription linear DNA template. The assembly of the linear DNA template for the in vitro transcription and translation (IVTT) was performed essentially as described by Houlihan et al. 24 , with a few modifications. In brief: the assembly fragments (5′ untranslated region-AGT and 3′ HA-tag and untranslated region) were amplified in separate PCR steps with the primer pairs LMB/LMB-match and pIVBT7/ pIVBT7-match (for the full list of primers used in this publication see Supplementary Table 1), respectively, from the plasmid template pISNEX-SNAP-GFP-HA. Standard thermo-cycling conditions were used with the annealing temperature of 55 °C and the 30 s extension at 72 °C (using Pfu Ultra II polymerase; Agilent). To remove the template vector the fragments were gel purified (QIAquick Gel Extraction Kit, QIAGEN), then digested with DpnI (NEB), followed by ProteinaseK treatment (NEB), and finally purified with DNA Clean&Concentrator kit (Zymo Research). The 50 μ l PCR assembly reaction (also with Pfu Ultra II polymerase) was done with LMB and BG-conjugated pIVBT7 primers, 20 ng of each assembly fragment and 40 ng of the insert fragment (either created with error-prone or with recovery primer pair). The standard thermo-cycling was done with annealing step at 58 °C, extension at 72 °C for 1 min 30 s and for 30 cycles. Samples were run on an agarose gel to confirm that DNA fragments with the correct size were amplified (data not shown).
Selection by Bead Surface Display and on-bead assays. The streptavidin coated beads (5.18 μ m, SiO2-MAG-SA-S1964, Microparticles) were used for the display of the SNAP-scFv-HA construct. Each selection round was performed according to the procedure described by Diamante et al. 31 , with the following modifications: 1) improvement of the emulsion PCR reproducibility by changing the surfactant to PicoSurf-1 (Dolomite); 2) optimisation of the IVTT conditions to accommodate for the expression of antibody fragments (e.g., lowering the expression temperature and increasing the incubation time); 3) optimising the deemulsification procedure for minimal loss of beads; 4) changing the primer pair for more efficient DNA recovery of beads selected by FACS. See the Supplementary Protocol 1 for the detailed experimental procedure.
For the on-bead assays the beads were first coated with the DNA anchors (presenting a biotin molecule on the 5′ end and a BG moiety on the 3′ end), then SNAP-scFv-HA (or SNAP-HA) was expressed with PURExpress (NEB), following the manufacturer's recommendations. See the Supplementary Protocols 4 and 5 for the synthesis of the anchor DNA and in vitro transcription/translation procedures. For the display assay the beads were incubated with 70 nM Alexa488-labelled anti-HA antibody (1 h, room temperature, shaken at 1,200 rpm), then the unbound antibody was removed by the standard wash. For the binding assay the beads were incubated (1 h, room temperature, shaken at 1,200 rpm) with 1 nM FasR-Fc (R&D Systems), unless stated otherwise, followed by an incubation (1 h, room temperature, shaken at 1,200 rpm) with DyLight488-labelled polyclonal goat anti-Fc antibody (Abcam) at 10-fold molar excess over the receptor concentration (typically 10 nM). Both steps were carried out in 1.5% Marvel in PBS, and each was followed by three washes with PBS supplemented with Tween 20 (0.05%).
The fluorescence of the beads was analysed by flow cytometry (Cytek DxP8) and the data were analysed in FlowJo10. For the on-bead affinity analysis the FasR-Fc was titrated against the beads and the median fluorescence signal values (normalised to the highest value) were fitted to the saturation binding curve equation using GraphPad Prism software (see Supplementary Fig. 3).
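For readers without access to Prism, the saturation binding fit can be approximated with a few lines of dependency-free Python. This is a stand-in sketch rather than the procedure used in the paper: it assumes the standard one-site model y = Bmax · x / (Kd + x), solves Bmax in closed form for each trial Kd, and searches Kd on a progressively refined logarithmic grid. The search bounds, function names and synthetic data are all hypothetical.

```python
import math


def saturation(conc, bmax, kd):
    """One-site saturation binding model: signal = Bmax * [L] / (Kd + [L])."""
    return bmax * conc / (kd + conc)


def _profile(concs, signals, kd):
    """For a fixed Kd, return the least-squares Bmax and the residual sum of squares."""
    f = [c / (kd + c) for c in concs]
    bmax = sum(y * fi for y, fi in zip(signals, f)) / sum(fi * fi for fi in f)
    sse = sum((y - bmax * fi) ** 2 for y, fi in zip(signals, f))
    return bmax, sse


def fit_saturation(concs, signals, kd_min=1e-3, kd_max=1e4, rounds=4, points=200):
    """Fit (Bmax, Kd) by a log-spaced grid search on Kd, zooming in each round.

    Concentrations must be positive; Kd is returned in the same units.
    """
    lo, hi = math.log10(kd_min), math.log10(kd_max)
    best_log_kd = lo
    for _ in range(rounds):
        step = (hi - lo) / (points - 1)
        grid = [lo + i * step for i in range(points)]
        best_log_kd = min(grid, key=lambda g: _profile(concs, signals, 10 ** g)[1])
        lo, hi = best_log_kd - step, best_log_kd + step
    kd = 10 ** best_log_kd
    bmax, _ = _profile(concs, signals, kd)
    return bmax, kd


if __name__ == "__main__":
    # Synthetic, noise-free titration generated with Kd = 4.3 nM, then
    # normalised to the highest value as described in the text.
    concs = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0]
    raw = [saturation(c, 1.0, 4.3) for c in concs]
    normalised = [r / max(raw) for r in raw]
    bmax, kd = fit_saturation(concs, normalised)
    print(f"Bmax = {bmax:.3f}, Kd = {kd:.2f} nM")
```

Normalising to the highest measured value rescales Bmax but leaves Kd unchanged, which is why the fitted Bmax can exceed 1 when the highest concentration does not fully saturate the beads.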
Quantification of DNA molecules coupled to streptavidin beads by real-time PCR. The procedure was done as described before 31 . In brief: each PCR reaction contained 500 beads decorated with DNA templates and/or anchors, 0.8 μM of each primer (F-RT-1 and R-RT-1 for quantification of the template or F-RT-1 and pIVBT7 for the anchors) and 2x SensiMix SYBR No-ROX Kit (Bioline). The RT-PCR (Corbett Research Rotor-Gene 6000) program started with an initial step of 10 min at 95 °C followed by 40 cycles (95 °C for 10 s, 60 °C for 10 s, 72 °C for 5 s). Reactions were performed in duplicate and a standard curve was obtained using known concentrations of linear template coding for SNAP-GFP-HA (created with unmodified LMB and pIVBT7 primers) with correlation coefficient R² > 0.99. The number of DNA copies/reaction was calculated using the software accompanying the Rotor-Gene 6000 series and divided by the number of beads (500) and, in the case of DNA templates, multiplied by the correction factor 0.3 (as defined by Diamante et al. 31 , fraction of beads bearing DNA, according to the Poisson distribution, out of the total amount of beads).
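The per-bead arithmetic described above is small enough to write out. The sketch below follows the text literally (copies per reaction divided by 500 beads, with an optional multiplication by the 0.3 correction factor for DNA templates). The Poisson helper is an added illustration: it uses the general result that the fraction of beads carrying at least one molecule is 1 − e^(−λ), and the mean loading that would give 0.3 is an inference for illustration, not a value stated in the source. Function names and the input value of 5000 copies are hypothetical.

```python
import math


def copies_per_bead(copies_per_reaction, beads_per_reaction=500, correction=None):
    """Convert a qPCR copy number per reaction into copies per bead.

    `correction` is the optional factor applied to DNA templates
    (0.3 in the text); anchors are left uncorrected.
    """
    per_bead = copies_per_reaction / beads_per_reaction
    if correction is not None:
        per_bead *= correction
    return per_bead


def poisson_occupied_fraction(mean_per_bead):
    """Fraction of beads with at least one molecule under Poisson loading: 1 - exp(-mean)."""
    return 1.0 - math.exp(-mean_per_bead)


if __name__ == "__main__":
    # Illustrative input: 5000 copies detected in one 500-bead reaction.
    print(copies_per_bead(5000))                   # anchors, no correction
    print(copies_per_bead(5000, correction=0.3))   # templates, corrected as in the text
    # Mean loading at which 30% of beads would carry DNA (illustrative inference).
    print(round(-math.log(0.7), 3))
```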
Phylogenetic analysis of the selection output. The 223 amino acid sequences of randomly selected scFvs from the fourth selection round output were aligned using the MUSCLE algorithm 51 . Afterwards, the alignment was analysed with the Randomised Axelerated Maximum Likelihood program (RAxML) using the CIPRES online tool 52 . RAxML is the leading method for large-scale maximum likelihood (ML) estimation, which is a classical statistical method for phylogeny estimation. The analysis was performed using standard parameters and using bootstrap analysis (resampling method) with 100 tests. The resulting phylogenetic tree was then visualised using the FigTree software (http://tree.bio.ed.ac.uk/software/figtree). The amino acid sequences of the scFv from each clade were aligned manually and the mutation frequency was calculated with Microsoft Excel.
Site-directed mutagenesis. Single-point mutants of E09 scFv as well as the R4aS and R4aH consensus scFvs were generated by saturation mutagenesis with QuikChange Lightning Site-Directed Mutagenesis Kit (Agilent) following the manufacturer's protocol. The site-directed mutagenesis was performed on the E09 in pCantab6 vector using mutagenic primers listed in the Supplementary [...]
Expression and characterisation of the selected antibodies and scFvs. The scFvs were expressed periplasmically from the vector pCantab6 in the bacterial strain TG1 and purified using nickel affinity chromatography. The IgGs were expressed and purified as described by Chodorge et al. 37 . The Kd values were determined by bio-layer interferometry using an Octet Red384 instrument (ForteBio, Inc.) and surface plasmon resonance (SPR) using a BIAcore T100 instrument. See Supplementary Protocols 8-10 for the experimental details.