We present a large-scale approach to investigate the functional consequences of sequence variation in a protein. The approach entails the display of hundreds of thousands of protein variants, moderate selection for activity and high-throughput DNA sequencing to quantify the performance of each variant. Using this strategy, we tracked the performance of >600,000 variants of a human WW domain after three and six rounds of selection by phage display for binding to its peptide ligand. Binding properties of these variants defined a high-resolution map of mutational preference across the WW domain; each position had unique features that could not be captured by a few representative mutations. Our approach could be applied to many in vitro or in vivo protein assays, providing a general means for understanding how protein function relates to sequence.
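The quantification step can be illustrated with a minimal sketch. It assumes that a variant's performance is summarized as the log2 ratio of its frequency after selection to its frequency in the input library; the abstract does not state the exact statistic used, so this formula, the pseudocount, and the function name `log2_enrichment` are illustrative assumptions rather than the authors' method.

```python
import math
from collections import Counter


def log2_enrichment(input_reads, selected_reads, pseudocount=0.5):
    """Return {variant: log2(frequency after selection / frequency in input)}.

    Hypothetical helper: each read is assumed to be already reduced to a
    variant identifier (for example, the translated protein sequence).
    The pseudocount is an assumption that keeps variants missing from one
    library from producing a division by zero or log of zero.
    """
    in_counts = Counter(input_reads)
    sel_counts = Counter(selected_reads)
    variants = set(in_counts) | set(sel_counts)

    # Denominators include the pseudocount mass added across all variants.
    in_total = sum(in_counts.values()) + pseudocount * len(variants)
    sel_total = sum(sel_counts.values()) + pseudocount * len(variants)

    scores = {}
    for variant in variants:
        f_in = (in_counts[variant] + pseudocount) / in_total
        f_sel = (sel_counts[variant] + pseudocount) / sel_total
        scores[variant] = math.log2(f_sel / f_in)
    return scores


if __name__ == "__main__":
    # Toy data: variant "AAA" is enriched by selection, "AAB" is depleted.
    before = ["AAA"] * 4 + ["AAB"] * 4
    after = ["AAA"] * 6 + ["AAB"] * 2
    for variant, score in sorted(log2_enrichment(before, after).items()):
        print(variant, round(score, 3))
```

Under this assumption, a positive score marks a variant that became more common during selection and a negative score marks one that was depleted. Comparing scores after three and after six rounds would show how each variant's representation changes as selection proceeds.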
Supplementary information: Supplementary Figures 1–11, Supplementary Tables 1–2 and Supplementary Note 1.