Binding protein generation typically relies on laborious screening cascades that process candidate molecules individually. We have developed NestLink, a binder selection and identification technology able to biophysically characterize thousands of library members at once without the need to handle individual clones at any stage of the process. NestLink uses genetically encoded barcoding peptides termed flycodes, which were designed for maximal detectability by mass spectrometry and support accurate deep sequencing. We demonstrate NestLink’s capacity to overcome the current limitations of binder-generation methods in three applications. First, we show that hundreds of binder candidates can be simultaneously ranked according to kinetic parameters. Next, we demonstrate deep mining of a nanobody immune repertoire for membrane protein binders, carried out entirely in solution without target immobilization. Finally, we identify rare binders against an integral membrane protein directly in the cellular environment of a human pathogen. NestLink opens avenues for the selection of tailored binder characteristics directly in tissues or in living organisms.
Mass spectrometry data are available via ProteomeXchange with the identifier PXD009301. NGS datasets are available on the European Nucleotide Archive (ENA) under accession number PRJEB25673. The NGS and MS data were handled and annotated using the B-Fabric information management system (ref. 35) and are available for registered users under project identifiers 1644 and 1875, respectively. Source data for Figs. 4–6 are available online.
The custom software used to design the flycode library and to filter and analyze NGS data is available through http://bioconductor.org/packages/NestLink/.
Hanes, J. & Plückthun, A. In vitro selection and evolution of functional proteins by using ribosome display. Proc. Natl Acad. Sci. USA 94, 4937–4942 (1997).
Bradbury, A. R., Sidhu, S., Dubel, S. & McCafferty, J. Beyond natural antibodies: the power of in vitro display technologies. Nat. Biotechnol. 29, 245–254 (2011).
Boder, E. T., Midelfort, K. S. & Wittrup, K. D. Directed evolution of antibody fragments with monovalent femtomolar antigen-binding affinity. Proc. Natl Acad. Sci. USA 97, 10701–10705 (2000).
Hassapis, K. A., Stylianou, D. C. & Kostrikis, L. G. Architectural insight into inovirus-associated vectors (IAVs) and development of IAV-based vaccines inducing humoral and cellular responses: implications in HIV-1 vaccines. Viruses 6, 5047–5076 (2014).
Burkovitz, A. & Ofran, Y. Understanding differences between synthetic and natural antibodies can help improve antibody engineering. mAbs 8, 278–287 (2016).
Genick, C. C. et al. Applications of biophysics in high-throughput screening hit validation. J. Biomol. Screen. 19, 707–714 (2014).
Fusaro, V. A., Mani, D. R., Mesirov, J. P. & Carr, S. A. Prediction of high-responding peptides for targeted protein assays by mass spectrometry. Nat. Biotechnol. 27, 190–198 (2009).
Zimmermann, I. et al. Synthetic single domain antibodies for the conformational trapping of membrane proteins. eLife 7, e34317 (2018).
Hohl, M., Briand, C., Grütter, M. G. & Seeger, M. A. Crystal structure of a heterodimeric ABC transporter in its inward-facing conformation. Nat. Struct. Mol. Biol. 19, 395–402 (2012).
Hohl, M. et al. Structural basis for allosteric cross-talk between the asymmetric nucleotide binding sites of a heterodimeric ABC exporter. Proc. Natl Acad. Sci. USA 111, 11025–11030 (2014).
Pardon, E. et al. A general protocol for the generation of Nanobodies for structural biology. Nat. Protoc. 9, 674–693 (2014).
Storek, K. M. et al. Monoclonal antibody targeting the β-barrel assembly machine of Escherichia coli is bactericidal. Proc. Natl Acad. Sci. USA 115, 3692–3697 (2018).
Fridy, P. C. et al. A robust pipeline for rapid production of versatile nanobody repertoires. Nat. Methods 11, 1253–1260 (2014).
Cheung, W. C. et al. A proteomics approach for the identification and cloning of monoclonal antibodies from serum. Nat. Biotechnol. 30, 447–452 (2012).
Sato, S. et al. Proteomics-directed cloning of circulating antiviral human monoclonal antibodies. Nat. Biotechnol. 30, 1039–1043 (2012).
Wine, Y. et al. Molecular deconvolution of the monoclonal antibodies that comprise the polyclonal serum response. Proc. Natl Acad. Sci. USA 110, 2993–2998 (2013).
Lavinder, J. J. et al. Identification and characterization of the constituent human serum antibodies elicited by vaccination. Proc. Natl Acad. Sci. USA 111, 2259–2264 (2014).
Boutz, D. R. et al. Proteomic identification of monoclonal antibodies from serum. Anal. Chem. 86, 4758–4766 (2014).
Cotham, V. C., Horton, A. P., Lee, J. W., Georgiou, G. & Brodbelt, J. S. Middle-Down 193-nm ultraviolet photodissociation for unambiguous antibody identification and its implications for immunoproteomic analysis. Anal. Chem. 89, 6498–6504 (2017).
Gu, L. C. et al. Multiplex single-molecule interaction profiling of DNA-barcoded proteins. Nature 515, 554 (2014).
Darmanis, S. et al. ProteinSeq: high-performance proteomic analyses by proximity ligation and next generation sequencing. PLoS ONE 6, e25583 (2011).
McGregor, L. M., Jain, T. & Liu, D. R. Identification of ligand-target pairs from combined libraries of small molecules and unpurified protein targets in cell lysates. J. Amer. Chem. Soc. 136, 3264–3270 (2014).
Jespers, L., Schon, O., Famm, K. & Winter, G. Aggregation-resistant domain antibodies selected on phage by heat denaturation. Nat. Biotechnol. 22, 1161–1165 (2004).
Sieber, V., Pluckthun, A. & Schmid, F. X. Selecting proteins with improved stability by a phage-based method. Nat. Biotechnol. 16, 955–960 (1998).
Krokhin, O. V. et al. An improved model for prediction of retention times of tryptic peptides in ion pair reversed-phase HPLC: its application to protein peptide mapping by off-line HPLC-MALDI MS. Mol. Cell Proteomics 3, 908–919 (2004).
Panse, C., Trachsel, C., Grossmann, J. & Schlapbach, R. specL—an R/Bioconductor package to prepare peptide spectrum matches for use in targeted proteomics. Bioinformatics 31, 2228–2231 (2015).
Geertsma, E. R. & Dutzler, R. A versatile and efficient high-throughput cloning tool for structural biology. Biochemistry 50, 3272–3278 (2011).
Shugay, M. et al. Towards error-free profiling of immune repertoires. Nat. Methods 11, 653–655 (2014).
Glanville, J. et al. Deep sequencing in library selection projects: what insight does it bring? Curr. Opin. Struct. Biol. 33, 146–160 (2015).
Barkow-Oesterreicher, S., Turker, C. & Panse, C. FCC—an automated rule-based processing tool for life science data. Source Code Biol. Med. 8, 3 (2013).
Keller, A., Nesvizhskii, A. I., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392 (2002).
Nesvizhskii, A. I., Keller, A., Kolker, E. & Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75, 4646–4658 (2003).
Schenck, S. et al. Generation and characterization of anti-VGLUT nanobodies acting as inhibitors of transport. Biochemistry 56, 3962–3971 (2017).
Gabay, J. E., Blake, M., Niles, W. D. & Horwitz, M. A. Purification of Legionella pneumophila major outer membrane protein and demonstration that it is a porin. J. Bacteriol. 162, 85–91 (1985).
Türker, C. et al. B-Fabric: the swiss army knife for life sciences. In Proc. 13th International Conference on Extending Database Technology (eds Manolescu, I. et al.) 717–720 (Association for Computing Machinery, 2010).
We thank O. Schubert, R. Dawson and E. Geertsma for their helpful comments on the manuscript, and S. Štefanić for alpaca immunizations. The authors acknowledge the CRAN and Bioconductor Core teams and in particular L. Shepherd for making the NestLink package available through Bioconductor. This work was funded by a grant of the Commission for Technology and Innovation CTI (no. 16003.1 PFLS-LS, to M.A.S.), a SNSF Professorship of the Swiss National Science Foundation (no. PP00P3_144823, to M.A.S.), an SNSF BRIDGE proof-of-concept grant (no. 20B1-1_175192, to P.E.) and a BioEntrepreneur-Fellowship of the University of Zurich (no. BIOEF-17-002, to I.Z.).
A patent seeking to protect the NestLink technology has been applied for (PCT/EP2017/077816). P.E., I.Z. and M.A.S. are listed as inventors.
Integrated supplementary information
The nested library is prepared for Illumina MiSeq NGS by a restriction digest with SfiI. This creates two different trinucleotide sticky ends, allowing for site-specific ligation of synthetic, double-stranded MiSeq adapters containing complementary sticky ends. It is critical to avoid PCR for Illumina adapter joining, as it introduces a large number of recombination events and inevitably causes the attachment of the same flycodes to several different binders (see Supplementary Fig. 3). Applying Illumina 2 × 300-bp paired-end sequencing to the nested single-domain antibody library yields a read overlap of 47–90 bp (depending on the length of the nested binders and flycodes), which allows accurate acquisition of full-length sybody or nanobody sequences. Indexed adapters permit multiplexed analysis of different nested libraries in a single NGS run. Based on the flycode diversity estimates obtained by c.f.u. counting during library nesting, the concentrations of differentially indexed nested libraries are adjusted to yield about 50 raw reads per flycode. Raw reads are first filtered for sequencing quality and expected sequence patterns. In a second filtering step, the high read redundancy allows the calculation of representative consensus scores for nanobody sequences attached to the same flycode. A stringent consensus score filter is then applied, which removes sequencing errors with high efficiency and eliminates from further analyses the rare flycodes that are attached to more than one binder of the library. The NGS output is a database of unique nanobodies and their attached sets of flycodes (flycode assignment table).
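The second filtering step can be illustrated with a minimal Python sketch. It is not the NestLink Bioconductor implementation: the function name, the input format (pairs of flycode and binder sequence, already quality-filtered) and both thresholds are assumptions made for illustration, and the consensus score is simplified to the fraction of reads of a flycode that agree with its most frequent binder sequence.

```python
from collections import Counter, defaultdict

def build_flycode_assignment(reads, min_reads=10, min_consensus=0.9):
    """Build a flycode assignment table from (flycode, binder) read pairs.

    min_reads and min_consensus are hypothetical thresholds, not the
    values used in the original study.
    """
    # Count how often each binder sequence was read together with each flycode.
    by_flycode = defaultdict(Counter)
    for flycode, binder in reads:
        by_flycode[flycode][binder] += 1

    assignment = defaultdict(list)
    for flycode, counts in by_flycode.items():
        total = sum(counts.values())
        if total < min_reads:
            continue  # too few reads to form a reliable consensus
        binder, top = counts.most_common(1)[0]
        # Simplified consensus score: share of reads supporting the top binder.
        # Low scores indicate sequencing errors or a flycode linked to
        # more than one binder; such flycodes are discarded.
        if top / total >= min_consensus:
            assignment[binder].append(flycode)

    # Binder sequence -> sorted list of unambiguously assigned flycodes.
    return {binder: sorted(flycodes) for binder, flycodes in assignment.items()}
```

With roughly 50 raw reads per flycode, as targeted above, a read-fraction criterion of this kind has enough redundancy to separate occasional sequencing errors from flycodes that are genuinely linked to two binders.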
The software Mascot was used for flycode identification and Progenesis for binder abundance determination. In a first step, the flycode assignment obtained by NGS is converted into a list of concatenated unique flycode sequences (proteins) and identifiers (binder sequences). The concatenated flycodes can be understood as virtual proteins that are searched for by Mascot. To this end, Mascot first performs an in silico tryptic digest of the virtual proteins. Because flycodes do not encode internal trypsin cleavage sites, the in silico digest reverts the concatenation process via cleavage at the flycode terminal arginines. In a second step, the individual flycodes are matched to the recorded MS/MS spectra, followed by the calculation of flycode and binder scores, analogous to peptide and protein scores in proteomics. In a third step, the MS1 precursor ion features of identified flycodes are integrated using the software Progenesis. What we refer to as ‘binder abundance’ corresponds to the sum of all unique (non-conflicting) flycode feature MS1 intensity integrals of a particular binder. After filtering for a minimal number of detected flycodes (see Methods), the experimental outcome for NestLink applications II and III is analyzed by calculating ‘relative binder abundances’, which refers to the fraction of a particular binder abundance relative to the sum of all binder abundances in a sample (that is, one LC–MS/MS run). This sample-internal normalization enables one to monitor changes of binder frequencies (enrichment or depletion) within pools that have been subjected to different selection pressures.
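The two calculations described in this legend can be sketched in Python as follows. This is an illustration only and reproduces neither Mascot nor Progenesis: the function names, the dictionary-based inputs and the `min_flycodes` threshold are assumptions, the digest simply cuts after every arginine (sufficient here because flycodes contain no internal cleavage sites), and the normalization is assumed to run over the binders that pass the flycode-count filter.

```python
def digest_at_arginine(virtual_protein):
    """Split a concatenated flycode string after every arginine (R)."""
    peptides, start = [], 0
    for i, residue in enumerate(virtual_protein):
        if residue == "R":
            peptides.append(virtual_protein[start:i + 1])
            start = i + 1
    if start < len(virtual_protein):
        peptides.append(virtual_protein[start:])  # trailing piece without R
    return peptides

def relative_binder_abundances(assignment, ms1_intensity, min_flycodes=3):
    """Sum MS1 intensity integrals per binder and normalize within one sample.

    assignment:    binder -> list of its unique flycodes
    ms1_intensity: flycode -> integrated MS1 intensity in one LC-MS/MS run
    min_flycodes:  hypothetical minimal number of detected flycodes
    """
    abundance = {}
    for binder, flycodes in assignment.items():
        detected = [ms1_intensity[f] for f in flycodes if f in ms1_intensity]
        if len(detected) >= min_flycodes:
            abundance[binder] = sum(detected)  # 'binder abundance'
    total = sum(abundance.values())
    if total == 0:
        return {}
    # 'Relative binder abundance': fraction of the summed abundance in the run.
    return {binder: value / total for binder, value in abundance.items()}
```

Comparing the relative abundance of one binder between two runs, for example before and after a selection step, then gives its enrichment or depletion without any external standard.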
Supplementary Figure 3 PCR amplifications need to be avoided for library nesting and NGS library preparation.
We initially tested a PCR-based protocol for NGS-adapter attachment. Contrary to what we expected, NGS data analysis revealed that the majority of flycodes were linked redundantly to different binder sequences and could thus not be unambiguously assigned to individual library members. We hypothesized that PCR amplification resulted in recombination events and tested this hypothesis experimentally. a, Eight clonal sybodies were pooled and amplified by PCR, followed by subcloning into a plasmid and Sanger sequencing of individual clones. Three out of twelve obtained sequences exhibited recombined CDRs. b, Mega-primer formation model. During PCR, incomplete elongation reactions result in mega-primer formation, which can anneal to alternative library members in subsequent cycles and cause recombination events. Since our libraries contain extensive homologous regions, recombination due to mega-primer formation represents a critical problem. Therefore, the NestLink protocol (library nesting and NGS-adapter attachment) operates completely independently of PCR.
a, Five distinct flycode C-terminal sequences (red, yellow, green, blue, purple) were designed to cover all areas of the optimal LC–MS/MS detection window with respect to m/z and hydrophobicity. The right panel shows the mass (y-axis) and retention time (x-axis) of 5,202 flycode precursor ions of application II detected by LC–MS/MS (Mascot scores higher than 40). The left panel depicts the simulated dispersion of the same set of flycodes, with hydrophobicities predicted by Sequence Specific Retention Calculator (SSRC)14. b, Histogram showing the theoretical amino acid composition of the seven randomized positions of the flycodes (X7) along with the experimentally determined composition based on NGS of 60,894 flycodes. Note that the amino acids C, H, I, M, K and R were planned to be absent from the flycode sequence and occurred only rarely in NGS (<0.06% of all sequenced amino acids). c, Histogram depicting the number of unique flycodes per nanobody of the nested library used in application II, as determined by NGS.
Supplementary Figure 5 Characterization of a monomeric and an oligomerizing sybody in the presence and absence of flycodes.
A monomeric and an oligomerizing sybody were individually fused to more than 1,000 flycodes at the genetic level, followed by expression and purification of the fusion proteins via His-tag. As controls, both binders were expressed and purified without flycodes. Purified proteins were separated by SEC using a Superdex 200 increase 10/300 GL column.
a, One sybody (identical sequence throughout the experiment) was fused to 24 different sets of 30 flycodes of known sequence. The 24 flycode sets were separately expressed and purified. After concentration determination, the flycode sets were mixed to obtain four concentration groups, each comprising six flycode sets of which an identical amount was added. The entire mixture was split in two and added either to 50 ml E. coli or an M. smegmatis lysate. Subsequently, the flycodes were isolated from these two lysates (biological replicates) and analyzed in two LC–MS/MS runs (technical replicates). b, List showing the MS1 intensity sums over all detected flycodes for each of the 24 flycode sets. Raw data are given for the four LC–MS/MS runs performed in this experiment. Further, the number of detected flycodes and the coverage (percentage of detected flycodes/total flycodes) is provided for the 24 flycode sets. The data were further analyzed as shown in Supplementary Fig. 7. The entire experiment was performed once.
a, Absolute quantification of protein concentrations via flycodes. Mean and s.d. of the summed MS1 intensity for each concentration group (six values per concentration group) are plotted against the amount of flycode sets added to the bacterial lysate (calculations based on the first LC–MS/MS run of flycodes isolated from the E. coli lysate). The data were fitted using linear regression (R² = 0.9994). The coefficient of variation (CV) for each data point is indicated. b, The CVs between flycode sets of the same concentration groups were determined in the context of relative quantification (biological and technical replicates). More information on how these CVs were calculated can be found in Supplementary Note 1. c–e, CVs are plotted versus the flycode numbers for absolute quantification (c) and relative quantification using biological replicates (d) or technical replicates (e). The values were obtained by running a computational program that randomly removed flycodes from each flycode set (the gray dots represent the results of 1,000 repetitions). Blue dots correspond to the median CV and blue tick marks indicate the 0.05 and 0.95 quantiles. The red lines at CVs of 50% (c) and 10% (c–e) were included to guide the eye.
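The random-removal analysis of panels c–e can be approximated with the following Python sketch. It is not the program used for the figure: the function names, the input layout (one list of MS1 intensities per flycode set of the same concentration group) and the fixed random seed are assumptions, and the sketch covers only the CV between summed intensities of flycode sets, not the replicate-based relative quantification described in Supplementary Note 1.

```python
import random
import statistics

def cv_percent(values):
    """Coefficient of variation in percent (sample s.d. divided by mean)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def subsampled_cv(flycode_sets, n_keep, repetitions=1000, seed=0):
    """Estimate how the CV depends on the number of flycodes per set.

    flycode_sets: list of lists of MS1 intensities, one list per flycode set
    n_keep:       number of flycodes retained per set after random removal
    Returns (median CV, 0.05 quantile, 0.95 quantile) over all repetitions.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    cvs = []
    for _ in range(repetitions):
        # Keep n_keep random flycodes in every set and sum their intensities.
        sums = [sum(rng.sample(fset, n_keep)) for fset in flycode_sets]
        cvs.append(cv_percent(sums))
    # n=20 yields cut points at 5%, 10%, ..., 95%.
    cut_points = statistics.quantiles(cvs, n=20)
    return statistics.median(cvs), cut_points[0], cut_points[-1]
```

Calling `subsampled_cv` for a range of `n_keep` values and plotting the returned median against `n_keep` gives a curve of the kind shown in blue, from which the number of flycodes needed to stay below a chosen CV can be read off.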
Data were measured with the ProteOn XPR36 Protein Interaction Array System (Bio-Rad) using biotinylated MBP (immobilized) and 81 nM, 27 nM, 9 nM, 3 nM, and 1 nM of the purified sybodies. This SPR experiment was performed once.
The characterized nanobodies of main text Fig. 5 are shown. The list compares their enrichment in the original experiment (upper line, light gray bars) with the repetition experiment (lower line, dark gray bars). The bars illustrate the fold increase of relative abundances within the four LC–MS/MS samples of each binder.
Data were measured with the ProteOn XPR36 Protein Interaction Array System (Bio-Rad) using biotinylated TM287/288 (immobilized) and the following concentrations of the purified nanobodies: 27 nM, 9 nM, 3 nM, 1 nM, 0.33 nM for NL2.01, NL13.1; 81 nM, 27 nM, 9 nM, 3 nM, 1 nM for NL3.01, NL6.01, NL11.1, NL19.1, NL26.1, NL29.1; 243 nM, 81 nM, 27 nM, 9 nM, 3 nM for NL17.1. This SPR experiment was performed once.
a, Venn diagram showing the number of nanobodies identified using binder selections in solution by SEC (green circle) and by pulldown using immobilized target (blue circle). b, Venn diagram showing the number of nanobodies identified by pulldown with immobilized target using monomeric nested library members after SEC (blue circle) or all expressed nested library members prior to SEC (red circle). c, Elution profile of nested nanobody library on SEC. The gray rectangle indicates the elution fractions corresponding to monomeric nanobodies that were used for binder selections after SEC.
Detection of cell-surface binding on various bacterial strains for the sybodies recognizing MOMP of Lp-SG6 (extended analysis of Fig. 6e).
a, Library members of interest (here sybodies) are cloned into an FX cloning initial vector20, from which they are excised using type IIS restriction enzyme SapI. b, The flycode library is harbored on the E. coli expression vector pNLx. The library of interest is inserted via exchange of ccdB using SapI. c, Open reading frame of pNLx after library nesting shown at the example of a sybody linked to an 11-amino-acid flycode. SfiI cleavage is used to attach Illumina adapters.
About this article
Cite this article
Egloff, P., Zimmermann, I., Arnold, F.M. et al. Engineered peptide barcodes for in-depth analyses of binding protein libraries. Nat Methods 16, 421–428 (2019). https://doi.org/10.1038/s41592-019-0389-8