High-resolution mapping of protein sequence-function relationships

Abstract

We present a large-scale approach to investigate the functional consequences of sequence variation in a protein. The approach entails the display of hundreds of thousands of protein variants, moderate selection for activity and high-throughput DNA sequencing to quantify the performance of each variant. Using this strategy, we tracked the performance of >600,000 variants of a human WW domain after three and six rounds of selection by phage display for binding to its peptide ligand. Binding properties of these variants defined a high-resolution map of mutational preference across the WW domain; each position had unique features that could not be captured by a few representative mutations. Our approach could be applied to many in vitro or in vivo protein assays, providing a general means for understanding how protein function relates to sequence.
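As a rough illustration of how sequencing counts can quantify the performance of each variant, the sketch below computes a log2 enrichment ratio per variant from reads taken before and after selection. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `enrichment_ratios`, the `min_input_count` threshold, the choice of log2, and the toy variant labels are all hypothetical and are not taken from the article.

```python
from collections import Counter
from math import log2


def enrichment_ratios(input_reads, selected_reads, min_input_count=1):
    """Return {variant: log2(frequency after selection / frequency before)}.

    Assumptions (hypothetical, for illustration only):
    - each read has already been reduced to a single variant label;
    - variants absent from either library are skipped, because the
      ratio is undefined for them;
    - variants with fewer than `min_input_count` input reads are skipped
      to avoid ratios dominated by sampling noise.
    """
    input_counts = Counter(input_reads)
    selected_counts = Counter(selected_reads)
    input_total = sum(input_counts.values())
    selected_total = sum(selected_counts.values())

    ratios = {}
    for variant, n_in in input_counts.items():
        n_sel = selected_counts.get(variant, 0)
        if n_in < min_input_count or n_sel == 0:
            continue
        # Normalize counts to frequencies so that libraries sequenced
        # to different depths remain comparable.
        f_in = n_in / input_total
        f_sel = n_sel / selected_total
        ratios[variant] = log2(f_sel / f_in)
    return ratios


# Toy data with made-up labels; not measurements from the study.
input_library = ["variant_1"] * 2 + ["variant_2"] * 2
after_selection = ["variant_1"] * 3 + ["variant_2"] * 1
scores = enrichment_ratios(input_library, after_selection)
```

In this toy example a positive score means the variant became more frequent under selection and a negative score means it was depleted. Comparing scores from different rounds (for instance, rounds three and six) would require running the function once per round against the same input library.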

Figure 1: A highly parallel assay for exploring protein sequence-function relationships.
Figure 2: Comparison of mutational tolerance and evolutionary conservation in the WW domain.
Figure 3: Comprehensive sequence-function map of the WW domain.
Figure 4: Prediction of WW domain folding energies and double-mutant enrichment ratios.

Accession codes

Accessions

GenBank/EMBL/DDBJ

Acknowledgements

We thank C. Lee and J. Shendure for assistance with DNA sequencing, and J. Kelly, J. Thomas, J. Hesselberth, E. Phizicky, A. Rubin, L. Starita and K. McGarvey for helpful comments and discussion. This work was supported by the US National Institutes of Health (P41 RR11823 to S.F. and D.B., and F32GM084699 to D.M.F.). S.F. and D.B. were supported by the Howard Hughes Medical Institute.

Author information

Contributions

D.M.F. conceived of the method, carried out the experiments, analyzed the data and wrote the paper; C.L.A. conceived of the method, analyzed the data and wrote the paper; J.J.S. carried out the experiments; E.H.K., S.J.F. and D.B. carried out the protein folding and binding energy calculations; and S.F. conceived of the method and wrote the paper.

Corresponding author

Correspondence to Stanley Fields.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–11, Supplementary Tables 1–2, Supplementary Note 1 (PDF 13072 kb)

About this article

Cite this article

Fowler, D., Araya, C., Fleishman, S. et al. High-resolution mapping of protein sequence-function relationships. Nat Methods 7, 741–746 (2010). https://doi.org/10.1038/nmeth.1492

  • DOI: https://doi.org/10.1038/nmeth.1492
