High-resolution mapping of protein sequence-function relationships

Fowler, Douglas M; Araya, Carlos L; Fleishman, Sarel J; Kellogg, Elizabeth H; Stephany, Jason J; Baker, David; Fields, Stanley

doi:10.1038/nmeth.1492

Article
Published: 15 August 2010

Douglas M Fowler¹,
Carlos L Araya¹,
Sarel J Fleishman²,
Elizabeth H Kellogg²,
Jason J Stephany¹,³,
David Baker²,³ &
Stanley Fields¹,³,⁴

Nature Methods volume 7, pages 741–746 (2010)

Abstract

We present a large-scale approach to investigate the functional consequences of sequence variation in a protein. The approach entails the display of hundreds of thousands of protein variants, moderate selection for activity and high-throughput DNA sequencing to quantify the performance of each variant. Using this strategy, we tracked the performance of >600,000 variants of a human WW domain after three and six rounds of selection by phage display for binding to its peptide ligand. Binding properties of these variants defined a high-resolution map of mutational preference across the WW domain; each position had unique features that could not be captured by a few representative mutations. Our approach could be applied to many in vitro or in vivo protein assays, providing a general means for understanding how protein function relates to sequence.
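The quantification step described above can be illustrated with a small sketch. The abstract does not give the scoring formula, so the following is an assumption: it scores each variant by the log2 ratio of its sequencing-read frequency after selection to its frequency in the input library, then expresses that score relative to the wild-type sequence. The function names, the pseudocount and the toy variant labels are all hypothetical and chosen only for illustration.

```python
import math


def log2_enrichment(input_counts, selected_counts, pseudocount=0.5):
    """Log2 ratio of each variant's read frequency after vs. before selection.

    Assumption: a pseudocount is added to every count so that variants
    missing from one library still get a finite score.
    """
    variants = set(input_counts) | set(selected_counts)
    n_in = sum(input_counts.values()) + pseudocount * len(variants)
    n_sel = sum(selected_counts.values()) + pseudocount * len(variants)
    scores = {}
    for v in variants:
        f_in = (input_counts.get(v, 0) + pseudocount) / n_in
        f_sel = (selected_counts.get(v, 0) + pseudocount) / n_sel
        scores[v] = math.log2(f_sel / f_in)
    return scores


def relative_to_wild_type(scores, wild_type):
    """Shift scores so the wild-type sequence sits at zero."""
    return {v: s - scores[wild_type] for v, s in scores.items()}


# Toy read counts (hypothetical, not data from the study).
input_library = {"WT": 100, "variant_up": 100, "variant_down": 100}
after_selection = {"WT": 100, "variant_up": 400, "variant_down": 25}

scores = relative_to_wild_type(
    log2_enrichment(input_library, after_selection), wild_type="WT"
)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:+.2f}")
```

Under these assumptions, a positive score means a variant became more abundant than wild type during selection and a negative score means it was depleted. Comparing libraries sampled after different numbers of rounds (for example three and six) would use the same calculation with a different selected library.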

**Figure 1: A highly parallel assay for exploring protein sequence-function relationships.**

**Figure 2: Comparison of mutational tolerance and evolutionary conservation in the WW domain.**

**Figure 3: Comprehensive sequence-function map of the WW domain.**

**Figure 4: Prediction of WW domain folding energies and double-mutant enrichment ratios.**

Accession codes

Accessions

GenBank/EMBL/DDBJ

SRA020603

Acknowledgements

We thank C. Lee and J. Shendure for assistance with DNA sequencing, and J. Kelly, J. Thomas, J. Hesselberth, E. Phizicky, A. Rubin, L. Starita and K. McGarvey for helpful comments and discussion. This work was supported by the US National Institutes of Health (P41 RR11823 to S.F. and D.B., and F32GM084699 to D.M.F.). S.F. and D.B. were supported by the Howard Hughes Medical Institute.

Author information

Authors and Affiliations

Department of Genome Sciences, University of Washington, Seattle, Washington, USA
Douglas M Fowler, Carlos L Araya, Jason J Stephany & Stanley Fields
Department of Biochemistry, University of Washington, Seattle, Washington, USA
Sarel J Fleishman, Elizabeth H Kellogg & David Baker
Howard Hughes Medical Institute, Seattle, Washington, USA
Jason J Stephany, David Baker & Stanley Fields
Department of Medicine, University of Washington, Seattle, Washington, USA
Stanley Fields

Contributions

D.M.F. conceived of the method, carried out the experiments, analyzed the data and wrote the paper; C.L.A. conceived of the method, analyzed the data and wrote the paper; J.J.S. carried out the experiments; E.H.K., S.J.F. and D.B. carried out the protein folding and binding energy calculations; and S.F. conceived of the method and wrote the paper.

Corresponding author

Correspondence to Stanley Fields.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–11, Supplementary Tables 1–2, Supplementary Note 1 (PDF 13072 kb)

About this article

Cite this article

Fowler, D., Araya, C., Fleishman, S. et al. High-resolution mapping of protein sequence-function relationships. Nat Methods 7, 741–746 (2010). https://doi.org/10.1038/nmeth.1492

Received: 25 March 2010
Accepted: 13 July 2010
Published: 15 August 2010
Issue Date: September 2010
DOI: https://doi.org/10.1038/nmeth.1492
