Engineered peptide barcodes for in-depth analyses of binding protein libraries

Abstract

Binding protein generation typically relies on laborious screening cascades that process candidate molecules individually. We have developed NestLink, a binder selection and identification technology able to biophysically characterize thousands of library members at once without the need to handle individual clones at any stage of the process. NestLink uses genetically encoded barcoding peptides termed flycodes, which were designed for maximal detectability by mass spectrometry and support accurate deep sequencing. We demonstrate NestLink’s capacity to overcome the current limitations of binder-generation methods in three applications. First, we show that hundreds of binder candidates can be simultaneously ranked according to kinetic parameters. Next, we demonstrate deep mining of a nanobody immune repertoire for membrane protein binders, carried out entirely in solution without target immobilization. Finally, we identify rare binders against an integral membrane protein directly in the cellular environment of a human pathogen. NestLink opens avenues for the selection of tailored binder characteristics directly in tissues or in living organisms.


Fig. 1: NestLink overview.
Fig. 2: Library nesting.
Fig. 3: Flycode library design and characteristics.
Fig. 4: Application I: ranking hundreds of binders according to their off-rates.
Fig. 5: Application II: nanobody selections without target immobilization.
Fig. 6: Application III: specific recognition of an outer membrane protein in the cellular context.

Data availability

Mass spectrometry data are available via ProteomeXchange with the identifier PXD009301. NGS datasets are available on the European Nucleotide Archive (ENA) under accession number PRJEB25673. The NGS and MS data were handled and annotated using the B-Fabric information management system (ref. 35) and are available for registered users under project identifiers 1644 and 1875, respectively. Source data for Figs. 4–6 are available online.

Code availability

The custom software used to design the flycode library and to filter and analyze NGS data is available through http://bioconductor.org/packages/NestLink/.

References

  1. Hanes, J. & Plückthun, A. In vitro selection and evolution of functional proteins by using ribosome display. Proc. Natl Acad. Sci. USA 94, 4937–4942 (1997).

  2. Bradbury, A. R., Sidhu, S., Dubel, S. & McCafferty, J. Beyond natural antibodies: the power of in vitro display technologies. Nat. Biotechnol. 29, 245–254 (2011).

  3. Boder, E. T., Midelfort, K. S. & Wittrup, K. D. Directed evolution of antibody fragments with monovalent femtomolar antigen-binding affinity. Proc. Natl Acad. Sci. USA 97, 10701–10705 (2000).

  4. Hassapis, K. A., Stylianou, D. C. & Kostrikis, L. G. Architectural insight into inovirus-associated vectors (IAVs) and development of IAV-based vaccines inducing humoral and cellular responses: implications in HIV-1 vaccines. Viruses 6, 5047–5076 (2014).

  5. Burkovitz, A. & Ofran, Y. Understanding differences between synthetic and natural antibodies can help improve antibody engineering. mAbs 8, 278–287 (2016).

  6. Genick, C. C. et al. Applications of biophysics in high-throughput screening hit validation. J. Biomol. Screen. 19, 707–714 (2014).

  7. Fusaro, V. A., Mani, D. R., Mesirov, J. P. & Carr, S. A. Prediction of high-responding peptides for targeted protein assays by mass spectrometry. Nat. Biotechnol. 27, 190–198 (2009).

  8. Zimmermann, I. et al. Synthetic single domain antibodies for the conformational trapping of membrane proteins. eLife 7, e34317 (2018).

  9. Hohl, M., Briand, C., Grütter, M. G. & Seeger, M. A. Crystal structure of a heterodimeric ABC transporter in its inward-facing conformation. Nat. Struct. Mol. Biol. 19, 395–402 (2012).

  10. Hohl, M. et al. Structural basis for allosteric cross-talk between the asymmetric nucleotide binding sites of a heterodimeric ABC exporter. Proc. Natl Acad. Sci. USA 111, 11025–11030 (2014).

  11. Pardon, E. et al. A general protocol for the generation of Nanobodies for structural biology. Nat. Protoc. 9, 674–693 (2014).

  12. Storek, K. M. et al. Monoclonal antibody targeting the β-barrel assembly machine of Escherichia coli is bactericidal. Proc. Natl Acad. Sci. USA 115, 3692–3697 (2018).

  13. Fridy, P. C. et al. A robust pipeline for rapid production of versatile nanobody repertoires. Nat. Methods 11, 1253–1260 (2014).

  14. Cheung, W. C. et al. A proteomics approach for the identification and cloning of monoclonal antibodies from serum. Nat. Biotechnol. 30, 447–452 (2012).

  15. Sato, S. et al. Proteomics-directed cloning of circulating antiviral human monoclonal antibodies. Nat. Biotechnol. 30, 1039–1043 (2012).

  16. Wine, Y. et al. Molecular deconvolution of the monoclonal antibodies that comprise the polyclonal serum response. Proc. Natl Acad. Sci. USA 110, 2993–2998 (2013).

  17. Lavinder, J. J. et al. Identification and characterization of the constituent human serum antibodies elicited by vaccination. Proc. Natl Acad. Sci. USA 111, 2259–2264 (2014).

  18. Boutz, D. R. et al. Proteomic identification of monoclonal antibodies from serum. Anal. Chem. 86, 4758–4766 (2014).

  19. Cotham, V. C., Horton, A. P., Lee, J. W., Georgiou, G. & Brodbelt, J. S. Middle-Down 193-nm ultraviolet photodissociation for unambiguous antibody identification and its implications for immunoproteomic analysis. Anal. Chem. 89, 6498–6504 (2017).

  20. Gu, L. C. et al. Multiplex single-molecule interaction profiling of DNA-barcoded proteins. Nature 515, 554 (2014).

  21. Darmanis, S. et al. ProteinSeq: high-performance proteomic analyses by proximity ligation and next generation sequencing. PLoS ONE 6, e25583 (2011).

  22. McGregor, L. M., Jain, T. & Liu, D. R. Identification of ligand-target pairs from combined libraries of small molecules and unpurified protein targets in cell lysates. J. Am. Chem. Soc. 136, 3264–3270 (2014).

  23. Jespers, L., Schon, O., Famm, K. & Winter, G. Aggregation-resistant domain antibodies selected on phage by heat denaturation. Nat. Biotechnol. 22, 1161–1165 (2004).

  24. Sieber, V., Plückthun, A. & Schmid, F. X. Selecting proteins with improved stability by a phage-based method. Nat. Biotechnol. 16, 955–960 (1998).

  25. Krokhin, O. V. et al. An improved model for prediction of retention times of tryptic peptides in ion pair reversed-phase HPLC: its application to protein peptide mapping by off-line HPLC-MALDI MS. Mol. Cell Proteomics 3, 908–919 (2004).

  26. Panse, C., Trachsel, C., Grossmann, J. & Schlapbach, R. specL—an R/Bioconductor package to prepare peptide spectrum matches for use in targeted proteomics. Bioinformatics 31, 2228–2231 (2015).

  27. Geertsma, E. R. & Dutzler, R. A versatile and efficient high-throughput cloning tool for structural biology. Biochemistry 50, 3272–3278 (2011).

  28. Shugay, M. et al. Towards error-free profiling of immune repertoires. Nat. Methods 11, 653–655 (2014).

  29. Glanville, J. et al. Deep sequencing in library selection projects: what insight does it bring? Curr. Opin. Struct. Biol. 33, 146–160 (2015).

  30. Barkow-Oesterreicher, S., Turker, C. & Panse, C. FCC—an automated rule-based processing tool for life science data. Source Code Biol. Med. 8, 3 (2013).

  31. Keller, A., Nesvizhskii, A. I., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392 (2002).

  32. Nesvizhskii, A. I., Keller, A., Kolker, E. & Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75, 4646–4658 (2003).

  33. Schenck, S. et al. Generation and characterization of anti-VGLUT nanobodies acting as inhibitors of transport. Biochemistry 56, 3962–3971 (2017).

  34. Gabay, J. E., Blake, M., Niles, W. D. & Horwitz, M. A. Purification of Legionella pneumophila major outer membrane protein and demonstration that it is a porin. J. Bacteriol. 162, 85–91 (1985).

  35. Türker, C. et al. B-Fabric: the swiss army knife for life sciences. In Proc. 13th International Conference on Extending Database Technology (eds Manolescu, I. et al.) 717–720 (Association for Computing Machinery, 2010).

Acknowledgements

We thank O. Schubert, R. Dawson and E. Geertsma for their helpful comments on the manuscript, and S. Štefanić for alpaca immunizations. The authors acknowledge the CRAN and Bioconductor Core teams and in particular L. Shepherd for making the NestLink package available through Bioconductor. This work was funded by a grant of the Commission for Technology and Innovation CTI (no. 16003.1 PFLS-LS, to M.A.S.), a SNSF Professorship of the Swiss National Science Foundation (no. PP00P3_144823, to M.A.S.), an SNSF BRIDGE proof-of-concept grant (no. 20B1-1_175192, to P.E.) and a BioEntrepreneur-Fellowship of the University of Zurich (no. BIOEF-17-002, to I.Z.).

Author information

Contributions

P.E., I.Z. and M.A.S. developed the conceptual basis of NestLink. P.E., B.R., C.P. and M.A.S. designed the flycode library. P.E. and I.Z. generated the library. P.E. performed library nesting and selections. NGS was carried out by L.O., L.P. and P.E. I.Z., F.M.A., C.A.J.H., D.M., H.A.K. and P.E. performed single-clone analyses via SPR or flow cytometry. LC–MS/MS was performed and the data were analyzed by P.E., B.R. and C.P. P.E. and M.A.S. wrote the manuscript.

Corresponding author

Correspondence to Markus A. Seeger.

Ethics declarations

Competing interests

A patent seeking to protect the NestLink technology has been applied for (PCT/EP2017/077816). P.E., I.Z. and M.A.S. are listed as inventors.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Scheme illustrating NGS and sequence data analysis for NestLink.

The nested library is prepared for Illumina MiSeq NGS by a restriction digest with SfiI. This creates two different trinucleotide sticky ends, allowing for site-specific ligation of synthetic, double-stranded MiSeq adapters containing complementary sticky ends. It is critical to avoid PCR for Illumina adapter joining, as it introduces a large number of recombination events and inevitably causes the attachment of the same flycodes to several different binders (see Supplementary Fig. 3). By applying Illumina 2 × 300-bp paired-end sequencing to read a nested single-domain antibody library, one can achieve a 47–90 bp overlap (depending on the length of the nested binders and flycodes), which allows for accurate acquisition of full-length sybody or nanobody sequences. Indexed adapters permit multiplexed analysis of different nested libraries in a single NGS run. Based on the flycode diversity estimations obtained by c.f.u. counting during library nesting, the concentrations of differentially indexed nested libraries are adjusted to yield about 50 raw reads per flycode. Raw reads are first filtered for sequencing quality and expected sequence patterns. In a second filtering step, the high read redundancy allows calculation of representative consensus scores for nanobody sequences attached to the same flycode. A stringent consensus score filter is applied, which removes sequencing errors with high efficiency and eliminates rare flycodes attached to more than one binder of the library from further analyses. The NGS output is a database of unique nanobodies and their attached sets of flycodes (flycode assignment table).
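The second filtering step can be illustrated with a minimal Python sketch. It is not the authors' implementation (their software is the NestLink Bioconductor package); the consensus score is assumed here to be the fraction of reads of a flycode that agree with its most frequent binder, and the function name, thresholds and sequences are hypothetical.

```python
from collections import Counter, defaultdict

def build_assignment_table(reads, min_reads=3, min_consensus=0.9):
    """Map each binder to the set of flycodes that pass a consensus filter.

    reads: iterable of (flycode, binder) pairs, one per quality-filtered read.
    min_reads and min_consensus are illustrative thresholds, not values
    taken from the paper.
    """
    binders_per_flycode = defaultdict(Counter)
    for flycode, binder in reads:
        binders_per_flycode[flycode][binder] += 1

    table = defaultdict(set)
    for flycode, counts in binders_per_flycode.items():
        total = sum(counts.values())
        binder, top = counts.most_common(1)[0]
        # Assumed consensus score: share of reads supporting the top binder.
        if total >= min_reads and top / total >= min_consensus:
            table[binder].add(flycode)
    return dict(table)

# Made-up flycode-like strings and binder labels, for illustration only.
reads = (
    [("GSAAATTVWR", "binderA")] * 10                                      # unambiguous
    + [("GSDDEEPLWR", "binderA")] * 19 + [("GSDDEEPLWR", "binderB")] * 1  # one error read
    + [("GSFFGGTTWR", "binderA")] * 5 + [("GSFFGGTTWR", "binderB")] * 5   # ambiguous
)
table = build_assignment_table(reads)
print(table)
```

In this toy input the third flycode is linked equally to two binders and is therefore dropped, which mirrors the removal of flycodes attached to more than one library member.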

Supplementary Figure 2 LC–MS/MS analysis for NestLink.

The software Mascot was used for flycode identification and Progenesis for binder abundance determination. In a first step, the flycode assignment obtained by NGS is converted into a list of concatenated unique flycode sequences (proteins) and identifiers (binder sequences). The concatenated flycodes can be understood as virtual proteins that are searched for by Mascot. To this end, Mascot first performs an in silico tryptic digest of the virtual proteins. Because flycodes do not encode trypsin cleavage sites internally, the in silico digest reverts the concatenation process via cleavage at the flycode terminal arginines. In a second step, the individual flycodes are matched to the recorded MS/MS spectra, followed by the calculation of flycode and binder scores, analogous to peptide and protein scores in proteomics. Next, the MS1 precursor ion features of identified flycodes are integrated using the software Progenesis. What we refer to as ‘binder abundance’ corresponds to the sum of all unique (non-conflicting) flycode feature MS1 intensity integrals of a particular binder. After filtering for a minimal number of detected flycodes (see Methods), the experimental outcome for NestLink applications II and III is analyzed by calculating ‘relative binder abundances’, which refers to the fraction of a particular binder abundance relative to the sum of all binder abundances in a sample (that is, one LC–MS/MS run). This sample-internal normalization enables one to monitor changes of binder frequencies (enrichment or depletion) within pools that have been subjected to different selection pressures.
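The concatenation, in silico digest and abundance normalization described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: cleavage is modeled as a cut after every arginine (flycodes are assumed to contain no lysine and no internal arginine), and all sequences, intensities and function names are hypothetical rather than taken from Mascot, Progenesis or the paper.

```python
from collections import defaultdict

def tryptic_digest(protein):
    """Cut after every arginine (simplified trypsin rule for flycodes)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa == "R":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # trailing piece without terminal R
    return peptides

def relative_binder_abundances(flycode_to_binder, ms1_intensity):
    """Sum MS1 intensities per binder, then normalize within the sample."""
    abundance = defaultdict(float)
    for flycode, intensity in ms1_intensity.items():
        abundance[flycode_to_binder[flycode]] += intensity
    total = sum(abundance.values())
    return {binder: value / total for binder, value in abundance.items()}

# Made-up flycode-like strings, each ending in arginine.
flycode_to_binder = {
    "GSAAATTVWR": "binderA",
    "GSDDEEPLWR": "binderA",
    "GSFFGGTTWR": "binderB",
}

# Concatenate the flycodes of each binder into one virtual protein.
virtual_proteins = defaultdict(str)
for flycode, binder in flycode_to_binder.items():
    virtual_proteins[binder] += flycode

# The digest recovers the individual flycodes from each virtual protein.
digested = {b: tryptic_digest(p) for b, p in virtual_proteins.items()}

# Hypothetical MS1 intensity integrals from a single LC-MS/MS run.
ms1_intensity = {"GSAAATTVWR": 300.0, "GSDDEEPLWR": 100.0, "GSFFGGTTWR": 100.0}
relative = relative_binder_abundances(flycode_to_binder, ms1_intensity)
print(digested)
print(relative)
```

Because the normalization is done within one run, the relative abundances of a sample always sum to one, so enrichment or depletion is read from changes in these fractions between samples.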

Supplementary Figure 3 PCR amplifications need to be avoided for library nesting and NGS library preparation.

We initially tested a PCR-based protocol for NGS-adapter attachment. Contrary to what we expected, NGS data analysis revealed that the majority of flycodes were linked redundantly to different binder sequences and could thus not be unambiguously assigned to individual library members. We suspected that PCR amplification resulted in recombination events and tested this hypothesis experimentally. a, Eight clonal sybodies were pooled and amplified by PCR, followed by subcloning into a plasmid and Sanger sequencing of individual clones. Three out of twelve obtained sequences exhibited recombined CDRs. b, Mega-primer formation model. During PCR, incomplete elongation reactions result in mega-primer formation; these mega-primers can anneal to alternative library members in subsequent cycles and cause recombination events. Since our libraries contain extensive homologous regions, recombination due to mega-primer formation represents a critical problem. Therefore, the NestLink protocol (library nesting and NGS-adapter attachment) operates completely independently of PCR.

Supplementary Figure 4 In-depth analysis of flycodes based on NGS and LC–MS/MS data.

a, Five distinct flycode C-terminal sequences (red, yellow, green, blue, purple) were designed to cover all areas of the optimal LC–MS/MS detection window with respect to m/z and hydrophobicity. The right panel shows the mass (y-axis) and retention time (x-axis) of 5,202 flycode precursor ions of application II detected by LC–MS/MS (Mascot scores higher than 40). The left panel depicts the simulated dispersion of the same set of flycodes, with hydrophobicities predicted by Sequence Specific Retention Calculator (SSRC)14. b, Histogram showing the theoretical amino acid composition of the seven randomized positions of the flycodes (X7) along with the experimentally determined composition based on NGS of 60,894 flycodes. Note that the amino acids C, H, I, M, K and R were planned to be absent from the flycode sequence and occurred only rarely in NGS (<0.06% of all sequenced amino acids). c, Histogram depicting the number of unique flycodes per nanobody of the nested library used in application II, as determined by NGS.

Supplementary Figure 5 Characterization of a monomeric and an oligomerizing sybody in the presence and absence of flycodes.

A monomeric and an oligomerizing sybody were individually fused to more than 1,000 flycodes at the genetic level, followed by expression and purification of the fusion proteins via His-tag. As controls, both binders were expressed and purified without flycodes. Purified proteins were separated by SEC using a Superdex 200 increase 10/300 GL column.

Supplementary Figure 6 Titration experiment to evaluate flycode detection by LC–MS/MS.

a, One sybody (identical sequence throughout the experiment) was fused to 24 different sets of 30 flycodes of known sequence. The 24 flycode sets were separately expressed and purified. After concentration determination, the flycode sets were mixed to obtain four concentration groups, each comprising six flycode sets of which an identical amount was added. The entire mixture was split in two and added either to 50 ml E. coli or an M. smegmatis lysate. Subsequently, the flycodes were isolated from these two lysates (biological replicates) and analyzed in two LC–MS/MS runs (technical replicates). b, List showing the MS1 intensity sums over all detected flycodes for each of the 24 flycode sets. Raw data are given for the four LC–MS/MS runs performed in this experiment. Further, the number of detected flycodes and the coverage (percentage of detected flycodes/total flycodes) are provided for the 24 flycode sets. The data were further analyzed as shown in Supplementary Fig. 7. The entire experiment was performed once.

Supplementary Figure 7 Analysis of the flycode titration experiment.

a, Absolute quantification of protein concentrations via flycodes. Mean and s.d. of the summed MS1 intensity for each concentration group (six values per concentration group) are plotted against the amount of flycode sets added to the bacterial lysate (calculations based on the first LC–MS/MS run of flycodes isolated from the E. coli lysate). The data were fitted using linear regression (R² = 0.9994). The coefficient of variation (CV) for each data point is indicated. b, The CVs between flycode sets of the same concentration groups were determined in the context of relative quantification (biological and technical replicates). More information on how these CVs were calculated can be found in Supplementary Note 1. c–e, CVs are plotted versus the flycode numbers for absolute quantification (c) and relative quantification using biological replicates (d) or technical replicates (e). The values were obtained by running a computational program that randomly removed flycodes from each flycode set (the gray dots represent the results of 1,000 repetitions). Blue dots correspond to the median CV and blue tick marks indicate the 0.05 and 0.95 quantiles. The red lines at CVs of 50% (c) and 10% (c–e) were included to guide the eye.
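The random-removal analysis of panels c–e can be approximated with the following Python sketch. It is a rough illustration, not the program used for the figure: the intensities are simulated, the CV is assumed to be computed across the summed intensities of six flycode sets of one concentration group, and all names and parameters are hypothetical (the actual calculation is described in Supplementary Note 1).

```python
import random
import statistics

def cv_percent(values):
    """Coefficient of variation (sample s.d. divided by mean) in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def subsampled_cvs(flycode_sets, n_keep, repetitions=1000, seed=0):
    """CV across sets after keeping n_keep randomly chosen flycodes per set."""
    rng = random.Random(seed)
    results = []
    for _ in range(repetitions):
        sums = [sum(rng.sample(intensities, n_keep)) for intensities in flycode_sets]
        results.append(cv_percent(sums))
    return results

# Simulated MS1 intensities: six sets of 30 flycodes at the same concentration.
sim = random.Random(1)
flycode_sets = [[sim.lognormvariate(10.0, 1.0) for _ in range(30)] for _ in range(6)]

cvs_few = subsampled_cvs(flycode_sets, n_keep=5)    # only 5 flycodes per set
cvs_all = subsampled_cvs(flycode_sets, n_keep=30)   # nothing removed

print(statistics.median(cvs_few), statistics.median(cvs_all))
```

Plotting the median and the 0.05 and 0.95 quantiles of such CV distributions against `n_keep` would give curves of the kind shown in panels c–e.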

Supplementary Figure 8 SPR analysis of MBP sybodies identified by NestLink.

Data were measured with the ProteOn XPR36 Protein Interaction Array System (Bio-Rad) using biotinylated MBP (immobilized) and 81 nM, 27 nM, 9 nM, 3 nM, and 1 nM of the purified sybodies. This SPR experiment was performed once.

Supplementary Figure 9 Reproducibility test based on application II.

The characterized nanobodies of main text Fig. 5 are shown. The list compares their enrichment in the original experiment (upper line, light gray bars) with the repetition experiment (lower line, dark gray bars). The bars illustrate the fold increase of relative abundances within the four LC–MS/MS samples of each binder.

Supplementary Figure 10 SPR analysis of nanobodies selected in solution against TM287/288.

Data were measured with the ProteOn XPR36 Protein Interaction Array System (Bio-Rad) using biotinylated TM287/288 (immobilized) and the following concentrations of the purified nanobodies: 27 nM, 9 nM, 3 nM, 1 nM, 0.33 nM for NL2.01, NL13.1; 81 nM, 27 nM, 9 nM, 3 nM, 1 nM for NL3.01, NL6.01, NL11.1, NL19.1, NL26.1, NL29.1; 243 nM, 81 nM, 27 nM, 9 nM, 3 nM for NL17.1. This SPR experiment was performed once.

Supplementary Figure 11 Additional selection experiments using the nested pool of application II.

a, Venn diagram showing the number of nanobodies identified using binder selections in solution by SEC (green circle) and by pulldown using immobilized target (blue circle). b, Venn diagram showing the number of nanobodies identified by pulldown with immobilized target using monomeric nested library members after SEC (blue circle) or all expressed nested library members prior to SEC (red circle). c, Elution profile of nested nanobody library on SEC. The gray rectangle indicates the elution fractions corresponding to monomeric nanobodies that were used for binder selections after SEC.

Supplementary Figure 12 Flow cytometry screen.

Detection of cell-surface binding on various bacterial strains for the sybodies recognizing MOMP of Lp-SG6 (extended analysis of Fig. 6e).

Supplementary Figure 13 Vectors for library nesting.

a, Library members of interest (here sybodies) are cloned into an FX cloning initial vector20, from which they are excised using the type IIS restriction enzyme SapI. b, The flycode library is harbored on the E. coli expression vector pNLx. The library of interest is inserted via exchange of ccdB using SapI. c, Open reading frame of pNLx after library nesting, shown using the example of a sybody linked to an 11-amino-acid flycode. SfiI cleavage is used to attach Illumina adapters.

Supplementary information

Supplementary Information

Supplementary Figures 1–13, Supplementary Notes 1–3 and Supplementary Tables 1 and 2

Reporting Summary

Supplementary Protocol

Identification of binding proteins by NestLink

About this article

Cite this article

Egloff, P., Zimmermann, I., Arnold, F.M. et al. Engineered peptide barcodes for in-depth analyses of binding protein libraries. Nat Methods 16, 421–428 (2019). https://doi.org/10.1038/s41592-019-0389-8
