We present a large-scale approach to investigate the functional consequences of sequence variation in a protein. The approach entails the display of hundreds of thousands of protein variants, moderate selection for activity and high-throughput DNA sequencing to quantify the performance of each variant. Using this strategy, we tracked the performance of >600,000 variants of a human WW domain after three and six rounds of selection by phage display for binding to its peptide ligand. Binding properties of these variants defined a high-resolution map of mutational preference across the WW domain; each position had unique features that could not be captured by a few representative mutations. Our approach could be applied to many in vitro or in vivo protein assays, providing a general means for understanding how protein function relates to sequence.
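As a rough illustration of how sequencing read counts could be turned into a per-variant performance measure, the sketch below compares each variant's frequency after selection with its frequency in the input library. The log2 frequency ratio, the function names, the `min_input_reads` cutoff and the toy counts are all assumptions made for illustration; the abstract does not specify the scoring formula the authors used.

```python
import math


def frequencies(counts):
    """Convert raw read counts per variant into fractions of the library."""
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def log2_enrichment(input_counts, selected_counts, min_input_reads=1):
    """Score each variant as log2(frequency after selection / input frequency).

    Assumption: this ratio is one simple way to quantify performance.
    Variants with too few input reads, or absent after selection, are
    skipped rather than given an undefined score.
    """
    f_in = frequencies(input_counts)
    f_sel = frequencies(selected_counts)
    scores = {}
    for variant, n_in in input_counts.items():
        n_sel = selected_counts.get(variant, 0)
        if n_in < min_input_reads or n_sel == 0:
            continue
        scores[variant] = math.log2(f_sel[variant] / f_in[variant])
    return scores


# Hypothetical read counts (not data from the study).
input_counts = {"WT": 500, "A": 250, "B": 250}
selected_counts = {"WT": 500, "A": 400, "B": 100}

scores = log2_enrichment(input_counts, selected_counts)
for variant, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{variant}\t{score:+.2f}")
```

In this toy example a positive score means the variant became more frequent under selection and a negative score means it was depleted. A real analysis would also need to account for sequencing error and sampling noise at low read counts.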
We thank C. Lee and J. Shendure for assistance with DNA sequencing, and J. Kelly, J. Thomas, J. Hesselberth, E. Phizicky, A. Rubin, L. Starita and K. McGarvey for helpful comments and discussion. This work was supported by the US National Institutes of Health (P41 RR11823 to S.F. and D.B., and F32GM084699 to D.M.F.). S.F. and D.B. were supported by the Howard Hughes Medical Institute.
The authors declare no competing financial interests.
About this article
Cite this article
Fowler, D., Araya, C., Fleishman, S. et al. High-resolution mapping of protein sequence-function relationships. Nat Methods 7, 741–746 (2010). https://doi.org/10.1038/nmeth.1492