  • Letter
Local fitness landscape of the green fluorescent protein

Abstract

Fitness landscapes [1,2] depict how genotypes manifest at the phenotypic level and form the basis of our understanding of many areas of biology [2,3,4,5,6,7], yet their properties remain elusive. Previous studies have analysed specific genes, often using their function as a proxy for fitness [2,4], experimentally assessing the effect on function of single mutations and their combinations in a specific sequence [2,5,8,9,10,11,12,13,14,15] or in different sequences [2,3,5,16,17,18]. However, systematic high-throughput studies of the local fitness landscape of an entire protein have not yet been reported. Here we visualize an extensive region of the local fitness landscape of the green fluorescent protein from Aequorea victoria (avGFP) by measuring the native function (fluorescence) of tens of thousands of derivative genotypes of avGFP. We show that the fitness landscape of avGFP is narrow, with 3/4 of the derivatives with a single mutation showing reduced fluorescence and half of the derivatives with four mutations being completely non-fluorescent. The narrowness is enhanced by epistasis, which was detected in up to 30% of genotypes with multiple mutations and mostly occurred through the cumulative effect of slightly deleterious mutations causing a threshold-like decrease in protein stability and a concomitant loss of fluorescence. A model of orthologous sequence divergence spanning hundreds of millions of years predicted the extent of epistasis in our data, indicating congruence between the fitness landscape properties at the local and global scales. The characterization of the local fitness landscape of avGFP has important implications for several fields including molecular evolution, population genetics and protein design.

Figure 1: Exploring the local fitness landscape.
Figure 2: The effect of single mutations on avGFP.
Figure 3: Prevalence of epistasis in the local fitness landscape of avGFP.
Figure 4: Modelling genotype to phenotype relationship.
Figure 5: The fitness matrix model of GFP long-term evolution.

References

  1. Wright, S. The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proc. Sixth Int. Congr. Genet. 1, 356–366 (1932)

  2. de Visser, J. A. G. M. & Krug, J. Empirical fitness landscapes and the predictability of evolution. Nature Rev. Genet. 15, 480–490 (2014)

  3. Dean, A. M. & Thornton, J. W. Mechanistic approaches to the study of evolution: the functional synthesis. Nature Rev. Genet. 8, 675–688 (2007)

  4. Soskine, M. & Tawfik, D. S. Mutational effects and the evolution of new protein functions. Nature Rev. Genet. 11, 572–582 (2010)

  5. Weinreich, D. M., Lan, Y., Wylie, C. S. & Heckendorn, R. B. Should evolutionary geneticists worry about higher-order epistasis? Curr. Opin. Genet. Dev. 23, 700–707 (2013)

  6. Mackay, T. F. C. Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nature Rev. Genet. 15, 22–33 (2014)

  7. Taylor, M. B. & Ehrenreich, I. M. Higher-order genetic interactions and their contribution to complex traits. Trends Genet. 31, 34–40 (2015)

  8. Bershtein, S., Segal, M., Bekerman, R., Tokuriki, N. & Tawfik, D. S. Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein. Nature 444, 929–932 (2006)

  9. Fowler, D. M. et al. High-resolution mapping of protein sequence-function relationships. Nature Methods 7, 741–746 (2010)

  10. Roscoe, B. P., Thayer, K. M., Zeldovich, K. B., Fushman, D. & Bolon, D. N. Analyses of the effects of all ubiquitin point mutants on yeast growth rate. J. Mol. Biol. 425, 1363–1377 (2013)

  11. Jacquier, H. et al. Capturing the mutational landscape of the beta-lactamase TEM-1. Proc. Natl Acad. Sci. USA 110, 13067–13072 (2013)

  12. Melamed, D., Young, D. L., Gamble, C. E., Miller, C. R. & Fields, S. Deep mutational scanning of an RRM domain of the Saccharomyces cerevisiae poly(A)-binding protein. RNA 19, 1537–1551 (2013)

  13. Olson, C. A., Wu, N. C. & Sun, R. A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr. Biol. 24, 2643–2651 (2014)

  14. Bank, C., Hietpas, R. T., Jensen, J. D. & Bolon, D. N. A systematic survey of an intragenic epistatic landscape. Mol. Biol. Evol. 32, 229–238 (2015)

  15. Meini, M. R., Tomatis, P. E., Weinreich, D. M. & Vila, A. J. Quantitative description of a protein fitness landscape based on molecular features. Mol. Biol. Evol. 32, 1774–1787 (2015)

  16. Kondrashov, A. S., Sunyaev, S. & Kondrashov, F. A. Dobzhansky–Muller incompatibilities in protein evolution. Proc. Natl Acad. Sci. USA 99, 14878–14883 (2002)

  17. Firnberg, E., Labonte, J. W., Gray, J. J. & Ostermeier, M. A comprehensive, high-resolution map of a gene’s fitness landscape. Mol. Biol. Evol. 31, 1581–1592 (2014)

  18. Parera, M. & Martinez, M. A. Strong epistatic interactions within a single protein. Mol. Biol. Evol. 31, 1546–1553 (2014)

  19. Coates, M. M., Garm, A., Theobald, J. C., Thompson, S. H. & Nilsson, D. E. The spectral sensitivity of the lens eyes of a box jellyfish, Tripedalia cystophora (Conant). J. Exp. Biol. 209, 3758–3765 (2006)

  20. DePristo, M. A., Weinreich, D. M. & Hartl, D. L. Missense meanderings in sequence space: a biophysical view of protein evolution. Nature Rev. Genet. 6, 678–687 (2005)

  21. Milkman, R. Selection differentials and selection coefficients. Genetics 88, 391–403 (1978)

  22. Kimura, M. & Crow, J. F. Effect of overall phenotypic selection on genetic change at individual loci. Proc. Natl Acad. Sci. USA 75, 6168–6171 (1978)

  23. Crow, J. F. & Kimura, M. Efficiency of truncation selection. Proc. Natl Acad. Sci. USA 76, 396–399 (1979)

  24. Rockah-Shmuel, L., Tóth-Petróczy, Á. & Tawfik, D. S. Systematic mapping of protein mutational space by prolonged drift reveals the deleterious effects of seemingly neutral mutations. PLOS Comput. Biol. 11, e1004421 (2015)

  25. Li, W. H. Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons. J. Mol. Evol. 24, 337–345 (1987)

  26. Akashi, H. Inferring weak selection from patterns of polymorphism and divergence at ‘silent’ sites in Drosophila DNA. Genetics 139, 1067–1076 (1995)

  27. Povolotskaya, I. S. & Kondrashov, F. A. Sequence space and the ongoing expansion of the protein universe. Nature 465, 922–926 (2010)

  28. Usmanova, D. R., Ferretti, L., Povolotskaya, I. S., Vlasov, P. K. & Kondrashov, F. A. A model of substitution trajectories in sequence space and long-term protein evolution. Mol. Biol. Evol. 32, 542–554 (2015)

  29. Eyre-Walker, A. & Keightley, P. D. The distribution of fitness effects of new mutations. Nature Rev. Genet. 8, 610–618 (2007)

  30. Ohta, T. Slightly deleterious mutant substitutions in evolution. Nature 246, 96–98 (1973)

Acknowledgements

We thank Y. Kulikova and G. Filion for discussion on statistical analysis and I. Osterman, R. Moretti and J. Meiler for technical assistance and M. Friesen for a critical reading of the manuscript. We thank H. Himmelbauer, CRG Genomic Unit and the Russian Science Foundation project (14-50-00150) for sequencing. Experiments were partially carried out using the equipment provided by the IBCH core facility (CKP IBCH). The work was supported by HHMI International Early Career Scientist Program (55007424), the EMBO Young Investigator Programme, MINECO (BFU2012-31329), Spanish Ministry of Economy and Competitiveness Centro de Excelencia Severo Ochoa 2013-2017 grant (SEV-2012-0208), Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement de la Generalitat’s AGAUR program (2014 SGR 0974), Russian Science Foundation (14-25-00129) and the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013, ERC grant agreement, 335980_EinME).

Author information

Contributions

K.S.S. and M.V.M. conceived the idea for the experiment; K.S.S., D.A.B., M.V.M., A.S.M., G.V.S., M.D.L., D.M.C., E.V.P., I.Z.M., D.S.T., K.A.L. and F.A.K. participated in experimental design; K.S.S., D.A.B., M.V.M., G.V.S., E.V.P., E.S.E. and M.D.L. performed the experiments; K.S.S., D.A.B., M.V.M., D.R.U., A.S.M., D.N.I., N.G.B., M.S.B., O.S., N.S.B., P.K.V., A.S.K. and F.A.K. performed data analysis; K.S.S., D.A.B., M.V.M., D.R.U., D.N.I. and F.A.K. wrote the paper.

Corresponding author

Correspondence to Fyodor A. Kondrashov.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Raw sequencing data were deposited in the Sequence Read Archive (SRA) under BioProject number PRJNA282342. Processed data sets are available at Figshare http://dx.doi.org/10.6084/m9.figshare.3102154.

Extended data figures and tables

Extended Data Figure 1 Scheme of the experimental approach.

The depiction of the construct design, expression and cell sorting.

Extended Data Figure 2 Fluorescence and impact of mutations.

A violin plot of the measured levels of fluorescence for genotypes carrying different numbers of missense mutations.

Extended Data Figure 3 Mutant genotypes and evolution.

a, b, The log-fluorescence and evolutionary conservation expressed as Shannon entropy (a), and fraction of mutant amino acid states found in avGFP orthologues (b). The y-axis error bars in b show the binomial proportion confidence interval level (68%), and other error bars denote s.e.m.
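The two statistics named in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the entropy is computed over the residues of a single alignment column with no special handling of gap characters, and the 68% interval is assumed here to be a normal-approximation (Wald) interval with z = 1, which the caption does not specify.

```python
import math
from collections import Counter


def shannon_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the amino acid states in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def wald_interval(successes: int, trials: int, z: float = 1.0) -> tuple:
    """Normal-approximation binomial interval; z = 1.0 covers roughly 68%.

    Assumption: the caption does not say which interval method was used.
    """
    p = successes / trials
    half_width = z * math.sqrt(p * (1.0 - p) / trials)
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # A fully conserved column has zero entropy; a 50/50 column has one bit.
    print(shannon_entropy("AAAA"), shannon_entropy("AACC"))
    # Hypothetical counts: 50 of 100 mutant states seen in orthologues.
    print(wald_interval(50, 100))
```

Lower entropy means a more conserved site, so plotting it against log-fluorescence relates tolerance to mutation in the assay to conservation among orthologues.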

Extended Data Figure 4 Epistatically interacting pairs of sites in the GFP structure.

a, Pairs of amino acid sites for which we assayed at least one combination of mutations (in blue, top left). The distribution of the maximum level of epistasis observed between sites (blue scale, bottom right) and unknown values (white). b, Pairs of sites under exceptionally strong epistatic interaction (e < −2) connected by a blue line on the GFP structure. c, The distribution of distances in the GFP structure between sites with at least one pair of epistatically interacting mutations (red) and all pairs of sites in the structure (grey). d, Epistasis between pairs of mutations as a function of their individual fluorescence. e, The contribution of internally and externally oriented amino acid residues in the avGFP structure relative to pairs of missense mutations showing no epistasis (|e| < 0.3), weak (0.3 < |e| < 0.7) and strong (|e| > 0.7) epistasis.
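The thresholds on e in panel e can be applied as in the sketch below. It assumes that epistasis is the deviation of the double mutant's log-fluorescence from the additive expectation of the two single mutants; that definition is not stated in this caption and is an assumption here. The function names and the numeric values in the usage example are hypothetical, and the treatment of values falling exactly on 0.3 or 0.7 is an arbitrary choice because the caption leaves the boundaries open.

```python
def epistasis(f_wt: float, f_a: float, f_b: float, f_ab: float) -> float:
    """Deviation of the double mutant from additivity, on a log-fluorescence scale.

    Assumed definition: e = F_ab - (F_a + F_b - F_wt).
    """
    expected_additive = f_a + f_b - f_wt
    return f_ab - expected_additive


def classify_epistasis(e: float) -> str:
    """Bin |e| using the caption's thresholds: none < 0.3, weak < 0.7, else strong."""
    magnitude = abs(e)
    if magnitude < 0.3:
        return "none"
    if magnitude < 0.7:
        return "weak"
    return "strong"


if __name__ == "__main__":
    # Hypothetical log-fluorescence values, for illustration only.
    e = epistasis(f_wt=3.7, f_a=3.5, f_b=3.4, f_ab=1.3)
    print(round(e, 2), classify_epistasis(e))
```

A negative e means the double mutant is dimmer than the two single-mutation effects would predict on their own.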

Extended Data Figure 5 Modelling effect of mutations on fluorescence.

a, A multiple linear regression in which fluorescence is a linear combination of the effects of individual single mutations. b, A multiple regression in which mutations contribute linearly to a fitness potential, p, and fluorescence is a sigmoidal function of p, where $F \approx e^{-p}/(1 + e^{-p})$. c, d, The predicted fluorescence by a neural network approach. Predicted fitness function by a neural network with one hidden neuron and two neurons in the outer layer. e, The scheme of our neural network approach. The genotype data were passed to the input layer of the neural network as an array of 0s or 1s corresponding to the absence or presence of amino acid mutations in the genotype, respectively. The first hidden layer consisted of a single neuron that calculated the weighted sum of inputs using weights obtained during training. The output of the first hidden layer was passed through an output subnetwork that transformed this value with a nonlinear function to make the final prediction of fluorescence. The output subnetwork consisted of several neurons with a sigmoidal transfer function, allowing the subnetwork to approximate a broad range of nonlinear functions. The final mapping of the hidden value to fluorescence was determined by the weights of connections between neurons inside the output subnetwork. During training all weights were optimized to find the best prediction of fluorescence from the hidden value. The resulting function that was defined during training is shown in Fig. 4. f, Correlation between the hidden value of the neural network and Rosetta-predicted ΔΔG for single mutants.
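The sigmoidal model of panel b can be illustrated with a short sketch: each mutation adds a fixed amount to a fitness potential p, and fluorescence follows the sigmoid of p given in the caption. The wild-type offset, the mutation labels and the per-mutation effects below are hypothetical values chosen only to show the threshold-like behaviour; they are not fitted parameters from the paper, and the output is on a normalized 0 to 1 scale.

```python
import math


def fitness_potential(genotype, effects, p_wt=0.0):
    """Additive fitness potential: wild-type offset plus the sum of mutation effects."""
    return p_wt + sum(effects[mutation] for mutation in genotype)


def predicted_fluorescence(p: float) -> float:
    """Sigmoid e^{-p} / (1 + e^{-p}), evaluated without overflow for large |p|."""
    if p >= 0.0:
        z = math.exp(-p)
        return z / (1.0 + z)
    # For negative p, use the equivalent form 1 / (1 + e^{p}).
    return 1.0 / (1.0 + math.exp(p))


if __name__ == "__main__":
    # Hypothetical effects: three mutations, each slightly deleterious on its own.
    effects = {"m1": 1.5, "m2": 1.5, "m3": 1.5}
    for genotype in ([], ["m1"], ["m1", "m2"], ["m1", "m2", "m3"]):
        p = fitness_potential(genotype, effects, p_wt=-4.0)
        print(len(genotype), round(predicted_fluorescence(p), 3))
```

Because the effects add in p but not in F, mutations that each cost little fluorescence alone can together push the genotype past the steep part of the curve, which is one way to read the threshold-like loss of fluorescence described in the abstract.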

Extended Data Table 1 Genotypes with measured fluorescence in our data set

Supplementary information

Supplementary Information

This file contains Supplementary Text and Data, Supplementary Figures 1-5, Supplementary Table 1 and Supplementary references; see contents page for details. (PDF 2608 kb)

The local GFP fitness landscape

A 3D rendering of our dataset that is also depicted in Figure 1b. The protein sequence is arranged in a circle, with the N terminus and the chromophore labelled on the outer circle. Black line markers outside the fitness landscape representation are positioned every 10 sites of avGFP. The Z-axis, height, represents the level of fluorescence, which is colour-coded from green to black. The surface is shown as the median fluorescence brightness levels of all mutations at a given site with fluorescence levels conferred by individual mutations shown by dots. The centre represents the fluorescence of avGFP with distance away from it corresponding to the number of mutations in the genotype. The median surface extends up to genotypes with 10 mutations. (MP4 26274 kb)

Cite this article

Sarkisyan, K., Bolotin, D., Meer, M. et al. Local fitness landscape of the green fluorescent protein. Nature 533, 397–401 (2016). https://doi.org/10.1038/nature17995

  • DOI: https://doi.org/10.1038/nature17995
