Fitness landscapes1,2 depict how genotypes manifest at the phenotypic level and form the basis of our understanding of many areas of biology2,3,4,5,6,7, yet their properties remain elusive. Previous studies have analysed specific genes, often using their function as a proxy for fitness2,4, experimentally assessing the effect on function of single mutations and their combinations in a specific sequence2,5,8,9,10,11,12,13,14,15 or in different sequences2,3,5,16,17,18. However, systematic high-throughput studies of the local fitness landscape of an entire protein have not yet been reported. Here we visualize an extensive region of the local fitness landscape of the green fluorescent protein from Aequorea victoria (avGFP) by measuring the native function (fluorescence) of tens of thousands of derivative genotypes of avGFP. We show that the fitness landscape of avGFP is narrow, with 3/4 of the derivatives with a single mutation showing reduced fluorescence and half of the derivatives with four mutations being completely non-fluorescent. The narrowness is enhanced by epistasis, which was detected in up to 30% of genotypes with multiple mutations and mostly occurred through the cumulative effect of slightly deleterious mutations causing a threshold-like decrease in protein stability and a concomitant loss of fluorescence. A model of orthologous sequence divergence spanning hundreds of millions of years predicted the extent of epistasis in our data, indicating congruence between the fitness landscape properties at the local and global scales. The characterization of the local fitness landscape of avGFP has important implications for several fields including molecular evolution, population genetics and protein design.
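The threshold-like mechanism described above can be illustrated with a toy calculation. The sketch below is not the authors' model: it assumes that each mutation adds a fixed amount to a hypothetical destabilisation score and that fluorescence falls off sigmoidally once the total crosses a threshold. The names `THRESHOLD`, `STEEPNESS`, `fluorescence`, `log_effect` and `epistasis`, and all numeric values, are illustrative assumptions. Epistasis is taken here as the difference between the observed log-fluorescence change of a multiple mutant and the sum of the changes of its single mutations.

```python
import math

# Hypothetical parameters, chosen only for illustration.
THRESHOLD = 3.0   # total destabilisation at which fluorescence is halved
STEEPNESS = 4.0   # how sharply fluorescence drops around the threshold


def fluorescence(total_destabilisation):
    """Sigmoid map from summed destabilisation to relative fluorescence."""
    return 1.0 / (1.0 + math.exp(STEEPNESS * (total_destabilisation - THRESHOLD)))


def log_effect(mutations):
    """Log10 fluorescence of a genotype relative to the unmutated sequence.

    `mutations` is a list of per-mutation destabilisation scores, which are
    assumed to add up on the underlying (stability-like) scale.
    """
    wild_type = fluorescence(0.0)
    mutant = fluorescence(sum(mutations))
    return math.log10(mutant / wild_type)


def epistasis(mutations):
    """Observed effect minus the sum of the single-mutation effects."""
    expected = sum(log_effect([m]) for m in mutations)
    observed = log_effect(mutations)
    return observed - expected


if __name__ == "__main__":
    # Two mildly destabilising mutations: each alone is nearly neutral,
    # but together they cross the threshold (negative epistasis).
    print(epistasis([2.0, 2.0]))
    # Two very weak mutations stay far from the threshold (almost no epistasis).
    print(epistasis([0.1, 0.1]))
```

In this toy setting, mutations that are additive on the hidden stability scale still appear epistatic at the level of fluorescence, which is the qualitative pattern the abstract describes.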
We thank Y. Kulikova and G. Filion for discussion on statistical analysis; I. Osterman, R. Moretti and J. Meiler for technical assistance; and M. Friesen for a critical reading of the manuscript. We thank H. Himmelbauer, CRG Genomic Unit and the Russian Science Foundation project (14-50-00150) for sequencing. Experiments were partially carried out using the equipment provided by the IBCH core facility (CKP IBCH). The work was supported by the HHMI International Early Career Scientist Program (55007424), the EMBO Young Investigator Programme, MINECO (BFU2012-31329), the Spanish Ministry of Economy and Competitiveness Centro de Excelencia Severo Ochoa 2013-2017 grant (SEV-2012-0208), the Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement de la Generalitat’s AGAUR program (2014 SGR 0974), the Russian Science Foundation (14-25-00129) and the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013, ERC grant agreement, 335980_EinME).
A 3D rendering of our dataset that is also depicted in Figure 1b. The protein sequence is arranged in a circle, with the N terminal and the chromophore labelled on the outer circle. Black line markers outside the fitness landscape representation are positioned every 10 sites of avGFP. The Z-axis, height, represents the level of fluorescence, which is colour-coded from green to black. The surface is shown as the median fluorescence brightness levels of all mutations at a given site with fluorescence levels conferred by individual mutations shown by dots. The centre represents the fluorescence of avGFP with distance away from it corresponding to the number of mutations in the genotype. The median surface extends up to genotypes with 10 mutations.
About this article
Nature Genetics (2018)