Introduction

By considering the relationship between genotype and fitness as a topographic map, Wright (1931) created the concept of a fitness landscape. During the last century, this concept has been adopted across various subfields of the sciences, and it has been used extensively to study how populations may adapt to novel environments (De Visser and Krug 2014; Gorter et al. 2018; Perfeito et al. 2011). Only recently have technological and experimental advances enabled the assessment of large empirical fitness landscapes at high resolution (Bank et al. 2014; 2016; Hietpas et al. 2013; Weinreich et al. 2006; Wu et al. 2016). Wright (1931) noted early on that a complete fitness landscape with L loci, each of which has k alleles, results in a hypercube of \(k^L\) genotypes. This enormous dimensionality can never be fully sampled and therefore enforces a careful and limited choice of the mutations that may be assayed in any given experiment. Thus, most fitness landscape studies to date have only considered amino acid changing mutations (e.g., Bank et al. 2016; Wu et al. 2016). Only considering the genotype–fitness relationship at the amino acid level entails the risk of misrepresenting the true underlying fitness landscape and, thus, the potential routes along which adaptive walks may proceed.

Firstly, mutations in an amino acid based fitness landscape are, by definition, non-synonymous. This neglects the accumulating evidence from both comparative and experimental studies that synonymous mutations (i.e., mutations that change the codon but not the encoded amino acid) can display non-negligible fitness effects (Agashe et al. 2013; 2016; Bailey et al. 2014; Bali and Bebok 2015; Choi and Aquadro 2016; Drummond and Wilke 2008; Firnberg et al. 2014; Hunt et al. 2014; Knöppel et al. 2016; Kudla et al. 2009; Lind et al. 2010; Plotkin and Kudla 2011; Presnyak et al. 2015; Sauna and Kimchi-Sarfaty 2011; Singh et al. 2007; Zhou et al. 2009). For example, recent studies have shown that synonymous mutations can affect the speed and accuracy of translation (Bali and Bebok 2015; Drummond and Wilke 2008; Plotkin and Kudla 2011; Saunders and Deane 2010), mRNA structure (O’Brien et al. 2014; Presnyak et al. 2015; Shabalina et al. 2013), and expression in response to environmental changes (Shabalina et al. 2013), and that they are associated with several organismal malfunctions (Hunt et al. 2014; Parmley and Hurst 2007). Although synonymous effects undoubtedly exist, effect sizes are often small, which has made a systematic characterization difficult. In particular, to our knowledge there exists no study to date that has characterized whether fitness effects of synonymous mutations vary across environments; a finding that could be in concordance with the costs of adaptation that are frequently reported for amino acid changing mutations (e.g., Bataillon et al. 2011; Hietpas et al. 2013; Rodriguez-Verdugo et al. 2014; Wenger et al. 2011).

Secondly, considering a fitness landscape at the codon level lowers the connectivity of the genotypes, i.e., it changes the topology of the fitness landscape. Whereas in the amino acid view of the landscape any amino acid transition is possible in a single mutational step, a codon-based landscape requires up to three mutational steps to transition from one amino acid to another (cf. Fig. 1a). Hence, even a single amino acid position in the genome contains a fitness landscape that consists of \((4\ \mathrm{nucleotides})^{3\ \mathrm{loci}} = 64\) codons at that position.

Fig. 1

Graphical summary of the study. a The consideration of the codon structure and the fitness effects of synonymous mutations results in fitness landscapes with different topologies and topographies. The graphs illustrate fitness landscapes at a single amino acid position. Gray lines indicate single-step mutations and colors indicate potential fitness differences. (1) Many studies implicitly assume that all amino acids are connected by a single mutational step. (2) The codon table restricts the number of possible substitutions at the amino acid level and thus results in a different topology. We denote the fitness landscape that accounts for the codon table but neglects the potential effects of synonymous mutations as the averaged landscape (codons that code for the same amino acid are presented in similar colors). (3) We denote the fitness landscape that considers the individual effect of each codon as the single-effect landscape (each codon has a specific color). b For this study, we obtained deep mutational scanning data of 54 codon fitness landscapes from Bank et al. (2014). We infer individual selection coefficients using a newly developed analysis software, empiricIST. c We quantify the distribution of synonymous fitness effects and perform regression methods to relate these effects to biological mechanisms. d We quantify the shape of the codon fitness landscapes across environments. We illustrate the consequences of ignoring synonymous effects on the evolutionary dynamics on the landscapes

We illustrate two aspects of the differences between the amino acid and codon levels in Fig. 1a. Considering the codon level results in a different topology of the fitness landscape (A1 to A2), and considering effects of synonymous mutations results in a potentially different topography of this codon fitness landscape (A2 to A3). As highlighted by Zagorski et al. (2016), a change in the topology of a fitness landscape can result in markedly different accessibility of fitness peaks, and the topography further amplifies this effect. For example, a single-nucleotide mutation in a codon-based landscape can result in only 5 to 7 amino acid changes rather than the 20 total possible amino acid changes. Thus, at a single amino acid position, a codon-based fitness landscape (with 64 genotypes) can have multiple local fitness peaks, whereas the corresponding amino acid landscape (with 21 genotypes) is by definition single-peaked.
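The connectivity argument can be checked directly against the standard genetic code. The following minimal Python sketch is an illustration only and not part of the original analysis; the function names `neighbors` and `reachable_amino_acids` are hypothetical. It lists the nine single-nucleotide neighbors of a codon and the distinct amino acids (or stop signal) that they encode.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def neighbors(codon):
    """Return all codons that differ from `codon` at exactly one position."""
    return [
        codon[:pos] + base + codon[pos + 1:]
        for pos in range(3)
        for base in BASES
        if base != codon[pos]
    ]


def reachable_amino_acids(codon):
    """Amino acids (or stop) reachable by one nucleotide change, excluding the current one."""
    own = CODON_TABLE[codon]
    return {CODON_TABLE[n] for n in neighbors(codon)} - {own}
```

For the methionine codon ATG, for example, `reachable_amino_acids("ATG")` contains six amino acids (I, K, L, R, T, and V), whereas the amino acid view would allow all alternatives in one step.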

Here we quantify the effects of synonymous mutations (Fig. 1c) and study how including synonymous effects modifies the evolutionary dynamics on codon fitness landscapes (Fig. 1d). To this end, we use published data (Bank et al. 2014) from deep mutational scanning (Fowler and Fields 2014; Hietpas et al. 2012), which consist of codon fitness landscapes of the same 9-amino acid positions across 6 environments. Our results indicate that the distribution of synonymous effect sizes is heavy-tailed, with many mutations of little effect and a few larger-effect mutations. Furthermore, we compare the shape of the codon fitness landscapes with and without consideration of effects of synonymous mutations. We find that the evolutionary dynamics on these landscapes differ greatly between the two types of landscapes, as local optima created by synonymous effects can stall the progression towards the global optimum of the fitness landscape. Thus, our work calls for a more careful consideration of synonymous effects in future studies of fitness landscapes and adaptive walks.

Material and methods

MCMC method

We implemented software to infer selection coefficients from deep mutational scanning experiments. The empiricIST software is based on a previously developed Bayesian Markov chain Monte Carlo (MCMC) approach (Bank et al. 2014) and is a user-friendly and accurate tool for improved growth rate estimation from time-sampled deep-sequencing data. We took advantage of the high accuracy provided by this method to estimate selection coefficients of synonymous mutations. empiricIST is a software package for (1) processing sequencing count data from deep mutational scanning experiments, (2) estimating growth rates using a Bayesian MCMC approach described in detail in Bank et al. (2014) and (3) post-processing of growth rate estimates to estimate the shape of the beneficial tail of the distribution of fitness effects (DFE). A detailed description of the software, its usage, and options can be found in the accompanying manual (https://github.com/Matu2083/empiricIST). In the following, we give a brief description of the assumed experimental setup and the model underlying the MCMC and estimation procedure, and by means of simulations compare the accuracy of the results to that obtained from conventional linear regression (Matuszewski et al. 2016).

Assumptions of the model and input data

We consider an experiment assessing the fitness of K mutants, labeled \(i \in \{ 1, \ldots ,K\}\). Each mutant i is assumed to be present at initial population size \(c_i\) and to grow exponentially at constant rate \(r_i\), such that its true abundance at time t, \(N_i(t)\), is given by \(N_i(t) = c_i\exp (r_it)\). At each sampling time point \(t \in \{ 1, \ldots ,T\}\), sequencing reads \(n_{i,t}\) are drawn from a multinomial distribution with parameters \(n_t = \sum\nolimits_{i = 1}^K n_{i,t}\) (i.e., the total number of sequencing reads) and \(p_t = (p_{1,t}, \ldots ,p_{K,t})\), where \(p_{i,t} = \frac{c_i\exp (r_it)}{\sum\nolimits_{j = 1}^K c_j\exp (r_jt)}\) is the relative frequency of mutant i in the population at time t. Here, time is measured in hours to make results comparable across different environmental conditions (Bank et al. 2014; Chevin 2011). The software allows for input of either generation or standard time. We furthermore assume that sampling points are independent such that the overall likelihood can be written as the product of the individual likelihoods of each sampling point:

\(L(n) = \prod\limits_{t = 1}^{T} L\left( {c,r \mid \left\{ {n_{1,t}, \ldots ,n_{K,t}} \right\}} \right).\)
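As a minimal sketch of this sampling model (not the empiricIST implementation, which is written in C++), the log-likelihood can be evaluated as a sum of multinomial log-probabilities over time points. The function names below are hypothetical, and the read counts are assumed to be given as one list per sampling time point.

```python
import math


def expected_frequencies(c, r, t):
    """Relative frequencies p_{i,t} under exponential growth N_i(t) = c_i * exp(r_i * t)."""
    weights = [ci * math.exp(ri * t) for ci, ri in zip(c, r)]
    total = sum(weights)
    return [w / total for w in weights]


def multinomial_log_pmf(counts, probs):
    """Log-probability of the read counts of one time point under a multinomial model."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
    return log_coef + sum(k * math.log(p) for k, p in zip(counts, probs) if k > 0)


def log_likelihood(c, r, times, reads):
    """Sum of per-time-point log-likelihoods (sampling points assumed independent)."""
    return sum(
        multinomial_log_pmf(reads[idx], expected_frequencies(c, r, t))
        for idx, t in enumerate(times)
    )
```

Working on the log scale turns the product over time points into a sum, which avoids numerical underflow for large read counts.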

All initial population sizes \(c_i\) and growth rates \(r_i\) are estimated relative to those of a chosen reference mutant with its initial population size and growth rate arbitrarily set to 10,000 and 1, respectively. Here, the wild-type sequence in laboratory conditions of 30 °C was used as the reference.

MCMC model

We implemented a Metropolis–Hastings algorithm in C++ using flat priors allowing all attainable values \(r_i \in \mathbb{R}^+\) and \(c_i \in \mathbb{N}\) to be realized with equal probability. During the burn-in period, the variance of both proposal distributions was adjusted such that the targeted acceptance ratio is around 25%, which optimizes the performance of the MCMC chain (Gelman et al. 1996).

The updated variance of the proposal distribution was calculated using

\(\sigma _{{\mathrm{new}}} = \sigma _{{\mathrm{old}}}f(x;y,k)\)

with

$$f(x;y,k) = \left[ {1 + \frac{{\left( {{\mathrm{cosh}}\left( {x - y} \right) - 1} \right)\left( {k - 1} \right)}}{{{\mathrm{cosh}}\left( {y - \left| {x - y} \right|} \right) - 1}}} \right]{\mathrm{sgn}}(x - y),$$

where x denotes the targeted acceptance ratio, y is the current acceptance ratio, and k is a (fixed) scale parameter that restricts the maximal change in the variance of the proposal distribution (Roberts et al. 2001). After discarding the first 100,000 accepted samples (i.e., after the burn-in period), the MCMC was run for an additional 10,000,000 accepted samples. Only every 1000th sample was retained for further analyses, such that the posterior distribution of each parameter was characterized by 10,000 samples overall.
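The burn-in and thinning scheme can be illustrated with a generic random-walk Metropolis sampler. This is a toy sketch under stated assumptions, not the empiricIST sampler: it uses a fixed proposal width (no adaptive tuning), counts iterations rather than accepted samples, uses far fewer samples than the run lengths reported above, and targets a hypothetical one-parameter posterior (`log_posterior`) with a flat prior on positive growth rates.

```python
import math
import random


def log_posterior(r, center=1.0, spread=0.1):
    """Toy log-posterior: flat prior on r > 0 times a Gaussian-shaped likelihood."""
    if r <= 0.0:
        return -math.inf
    return -0.5 * ((r - center) / spread) ** 2


def metropolis(log_target, x0, n_burn, n_keep, thin, step, rng):
    """Random-walk Metropolis: discard n_burn iterations, then keep every thin-th state."""
    x, lp = x0, log_target(x0)
    kept = []
    for it in range(n_burn + n_keep * thin):
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        if it >= n_burn and (it - n_burn + 1) % thin == 0:
            kept.append(x)
    return kept
```

Thinning reduces the autocorrelation between retained samples, so that fewer stored values are needed to describe the posterior.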

Convergence and mixing were checked by visual inspection of the resulting trace files for all estimated parameters, and by calculating the effective sample sizes (i.e., the number of independent samples) and the Hellinger distance (Boone et al. 2014) between sets of 1000 batched recorded samples. Effective sample sizes were generally larger than 1000 for all parameters, and Hellinger distances below 0.1 indicated convergence and good mixing. To facilitate estimation, we took advantage of the fact that the multinomial distribution is preserved when a subset of the counting variables is observed. This enabled us to split the data set into subsets with 10 mutants each (implicitly treating the other mutants’ sequencing reads as observed). More options, such as outlier detection, data imputation, and DFE tail-shape estimation, are detailed in the Supporting Information.

Assessing accuracy of the MCMC

To assess the accuracy of the Bayesian MCMC approach, we compared its parameter estimates to those obtained using ordinary least squares (OLS) linear regression of the log ratios of the number of sequencing reads \(n_{i,t}\) over the different sampling time points (Matuszewski et al. 2016). For that we simulated time-sampled deep-sequencing data (implemented in C++; available from https://github.com/Matu2083/empiricIST), assuming that individual mutant growth rates and initial population sizes for each of the K mutants are drawn independently from a normal distribution (i.e., \(r_i \sim {\cal N}(1,0.01)\)) and a log-normal distribution (i.e., \(c_i \sim 10^{{\cal N}(4,0.25)}\)), respectively. Without loss of generality, we denote the wild-type reference (or any other reference genotype) by i = 1 and set its growth rate to 1.
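The OLS baseline can be sketched as follows. This is a simplified illustration that assumes normalization to the reference mutant (rather than to the total number of reads); the function names are hypothetical. Under exponential growth, the log ratio of mutant to reference reads changes linearly in time with slope \(r_i - r_1\), so the regression slope estimates the growth rate difference.

```python
import math


def ols_slope(x, y):
    """Slope of the ordinary least squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def relative_growth_rate(times, mutant_reads, reference_reads):
    """Estimate r_i - r_ref from the slope of log(n_i / n_ref) against time."""
    log_ratios = [math.log(m / w) for m, w in zip(mutant_reads, reference_reads)]
    return ols_slope(times, log_ratios)
```

With noise-free abundances the slope recovers the growth rate difference exactly; with sampled reads it is only an estimate, which is the situation the comparison with the MCMC addresses.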

Sequencing reads were then drawn independently for each of the T equally spaced time points from a multinomial distribution with parameters \(n_t\) (i.e., the number of total sequencing reads per time point) and \(p_t = (p_{1,t}, \ldots ,p_{K,t})\). To check the robustness of these results when applied to the real experimental data, we furthermore drew growth rates from a mixture distribution

$$r_i \sim \left\{ \begin{array}{ll} |{\cal N}(1,\hat \sigma )| & {\mathrm{if}}\;z = 0,\\ {\mathrm{Exp}}(\hat \lambda ) + 1 & {\mathrm{if}}\;z = 1, \end{array} \right.$$

where \(Z \sim {\cal B}(\hat x)\) is a Bernoulli-distributed random variable that indicates whether growth rates are drawn from the deleterious part of the DFE (i.e., if z = 0) or from the exponential beneficial tail (i.e., if z = 1). The parameters \(\hat \sigma\), \(\hat \lambda\), and \(\hat x\) are estimated from the underlying experimental data, and based on growth rate estimates obtained from OLS linear regression.
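Drawing from this mixture can be sketched in a few lines. The parameterization is an assumption made for illustration: \(\hat \sigma\) is treated as a standard deviation and \(\hat \lambda\) as the rate of the exponential distribution, neither of which is specified above; the function name is hypothetical.

```python
import random


def draw_growth_rate(rng, sigma, lam, x):
    """Draw one growth rate from the deleterious/beneficial mixture.

    Assumptions: sigma is a standard deviation, lam is an exponential rate,
    and x is the probability of drawing from the beneficial tail (z = 1).
    """
    z = 1 if rng.random() < x else 0
    if z == 0:
        # Deleterious part: folded normal distribution centered at the reference rate 1.
        return abs(rng.gauss(1.0, sigma))
    # Beneficial tail: exponential excess over the reference growth rate of 1.
    return rng.expovariate(lam) + 1.0
```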

Finally, the accuracy of the parameter estimates was assessed by computing the mean square error (MSE)

$${\mathrm{MSE}} = \frac{1}{{K - 1}}\mathop {\sum}\nolimits_{i = 2}^K (\hat r_i - r_i)^2,$$

the length of the credibility interval (CI, calculated from the MCMC posterior distribution), and the frequency of the true growth rate lying in the 95% confidence interval obtained via OLS calculated over 100 simulated data sets.
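The two accuracy measures can be written compactly. In this sketch (hypothetical function names), index 0 of each list holds the reference mutant, which is excluded from the MSE as in the formula above, and intervals are given as (lower, upper) pairs.

```python
def mean_square_error(estimates, truths):
    """MSE over mutants i = 2..K; the first entry (the reference) is skipped."""
    pairs = list(zip(estimates, truths))[1:]
    return sum((est - true) ** 2 for est, true in pairs) / len(pairs)


def coverage(intervals, truths):
    """Fraction of true growth rates that fall inside their (lower, upper) interval."""
    hits = sum(1 for (low, high), true in zip(intervals, truths) if low <= true <= high)
    return hits / len(truths)
```

A well-calibrated 95% interval should give a coverage close to 0.95 when averaged over many simulated data sets.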

Bayesian MCMC outperforms linear regression

Validating the method with various types of simulated data mimicking the experimental data (as detailed in the Supplementary Material) shows that our MCMC generally outperforms ordinary least squares (OLS) regression. Figures 2 and S1 show the simulation results. Although the mean square error (MSE) of the MCMC is comparable to that of the OLS when analyzing few time points (i.e., 3 to 5 time points), the MSE of the MCMC decreases faster as the number of time points increases (Fig. 2a).

Fig. 2

Comparison between the performance of empiricIST and ordinary least squares (OLS) regression with a varying number of sampled time points. We display a mean square error (MSE), b size of the credibility interval (CI), and c the proportion of cases in which the true growth rate is contained in the CI. As shown, empiricIST yields an equal or lower MSE than OLS regression, particularly as the number of sampled time points increases. Furthermore, empiricIST outperforms the OLS regression regarding the size of the CI and at capturing the true growth rate, even when sampling a small number of time points.

Furthermore, when analyzing few time points, the length of the credibility interval (CI) is significantly smaller for the MCMC than the corresponding confidence interval of the OLS regression (Fig. 2b). While the difference between the length of the confidence intervals decreases as the number of time samples T increases, the size of the CI from the MCMC always remains smaller, which implies that it yields more precise and accurate results than the conventional OLS regression. Most importantly, and unlike the OLS regression, the CI of the MCMC remains well calibrated along the entire range of parameters that were tested (cf. Figure 2c for illustration across a range of time points), despite being generally narrower than its OLS counterpart.

Apart from its main program—the Bayesian MCMC program—empiricIST provides Python and shell scripts for data pre-processing and post-processing. Details about their usage and options are given in the accompanying manual. Here we outline the two different options that are available for dealing with outliers in the sequencing data—i.e., outlier detection and data imputation—and explain the DFE tail-shape estimation.

Outlier detection in empiricIST

As an alternative to treating outliers as unobserved (i.e., as missing data), we also implemented an approach in which data points identified as outliers were imputed (see SI). For that we again used the linear regression of the log ratios of the mutant’s read number to the total number of reads at each individual time point (i.e., the ‘total’ normalization, sensu Bank et al. 2014), and classified as outliers data points that exceed the DFBETA cutoff of 2 and that had an absolute studentized residual bigger than 3. In comparison to other reasonable and established outlier criteria, this approach proved to be more cautious, as exemplified by the higher specificity and lower sensitivity (Fig. 2, Fig. S1). By combining two independent outlier criteria (i.e., the DFBETA statistic and the studentized residuals), this approach ensures that data points identified as outliers have leverage effects (i.e., change the slope considerably) and are in conflict with the remaining data points (i.e., they differ strongly from them). Thus, to minimize changes in the original experimental data we took an extremely conservative approach, such that only those data points that stand out as extreme outliers will be imputed.
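The combined criterion can be sketched for a simple linear regression with leave-one-out refits. This is an illustration under assumptions, not the empiricIST code: the DFBETA statistic is taken to be the scaled slope change (often called DFBETAS), the residuals are externally studentized, and `x` and `y` are plain Python lists with at least four points and some residual scatter (a perfect fit would give a zero denominator).

```python
import math


def fit_line(x, y):
    """Return intercept, slope, residual sum of squares, mean of x, and Sxx."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse, mean_x, sxx


def flag_outliers(x, y, dfbeta_cut=2.0, resid_cut=3.0):
    """Indices of points exceeding both the slope-DFBETAS and studentized-residual cutoffs."""
    n = len(x)
    intercept, slope, _, mean_x, sxx = fit_line(x, y)
    flagged = []
    for i in range(n):
        # Refit without point i to obtain the leave-one-out slope and residual scale.
        _, slope_loo, sse_loo, _, _ = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        s_loo = math.sqrt(sse_loo / (n - 3))
        leverage = 1.0 / n + (x[i] - mean_x) ** 2 / sxx
        residual = y[i] - intercept - slope * x[i]
        t_resid = residual / (s_loo * math.sqrt(1.0 - leverage))
        dfbetas = (slope - slope_loo) / (s_loo / math.sqrt(sxx))
        if abs(dfbetas) > dfbeta_cut and abs(t_resid) > resid_cut:
            flagged.append(i)
    return flagged
```

Requiring both criteria means that a point is flagged only if it is far from the fitted line and also changes the estimated slope substantially, which matches the conservative intent described above.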

When comparing the MSE over 100 simulated data sets across different outlier detection methods, we find that the MSE increases with the proportion of outliers in the data set, independent of the method used. Imputing data points generally improves the accuracy of the parameter estimates compared to treating outliers as missing data (Bank et al. 2014, Fig. S2, S3). As expected, when there are no outliers in the data, normalization to the wild type displays the lowest error (cf. Bank et al. 2016; Matuszewski et al. 2016). However, with only 1% outliers in the data, the error of the normalization to the wild type is comparable to that of the normalization to the total number of reads, and it performs increasingly worse as the proportion of outliers in the data increases (Fig. 2). Note that in the presence of outliers, using any outlier method improves growth rate estimates considerably.

Estimating the shape for the beneficial tail with empiricIST

Finally, empiricIST contains a Python script for estimating the shape of the beneficial tail of the DFE. It is often believed that these effects typically follow an exponential distribution (Gillespie 1983; 1984) characterized by many small, nearly-neutral mutations and a few strongly beneficial mutations. Using extreme value theory, it is possible to test whether experimental data complies with that assumption (and falls into the Gumbel domain), or whether the data are better represented by distributions from the Weibull domain (i.e., bounded distributions that decay more rapidly than an exponential distribution, implying more small effect mutations) or from the Fréchet domain (i.e., distributions decaying less rapidly than an exponential distribution implying an excess of large effect mutations; see also Beisel et al. 2007). Additional information about the different types of distributions and likelihood estimation are available in the section on DFE estimation in the SI (SI-Additional Material and Methods, section DFE tail-shape estimation, Fig S4).

We analyzed the power of the maximum-likelihood method to make this distinction by simulating 1000 Generalized Pareto Distribution (GPD) data sets for different underlying shape parameter (κ) values (spanning across all three GPD domains) and varying sample sizes. We find that for small sample sizes (Fig. S4A, B) \(\hat \kappa\) displays a large variance and a slight negative bias, in particular, if the underlying shape parameter is from the Weibull domain (i.e., κ < 0). This bias is caused by a (numerical) discontinuity in the log-likelihood function around κ = −1 (Eq. S3 in SI), causing \(\hat \kappa\) to consistently deviate (Rokyta et al. 2008). As sample size increases, however, the variance of the maximum-likelihood estimate decreases and its bias vanishes (Fig. S4C, D). Furthermore, while \(\hat \kappa\) typically falls into the correct domain (even for low sample sizes), the statistical power for detecting deviations from the null hypothesis (i.e., whether H0: κ = 0) is low (unless sample sizes are large).
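For orientation, the GPD quantile function and log-likelihood can be sketched as below. The parameterization (shape κ and a scale parameter, here called `psi`) follows the common convention in which κ < 0, κ = 0, and κ > 0 correspond to the Weibull, Gumbel, and Fréchet domains; the function names and the scale symbol are assumptions, and the exact likelihood used by empiricIST is given in the SI, not here.

```python
import math


def gpd_quantile(u, kappa, psi=1.0):
    """Inverse CDF of the GPD; plugging in uniform u in [0, 1) simulates a draw."""
    if abs(kappa) < 1e-12:
        return -psi * math.log(1.0 - u)  # exponential limit (Gumbel domain)
    return psi / kappa * ((1.0 - u) ** (-kappa) - 1.0)


def gpd_log_likelihood(data, kappa, psi=1.0):
    """Log-likelihood of non-negative excesses `data` under a GPD(kappa, psi)."""
    total = 0.0
    for x in data:
        if abs(kappa) < 1e-12:
            total += -math.log(psi) - x / psi
        else:
            arg = 1.0 + kappa * x / psi
            if arg <= 0.0:
                return -math.inf  # x lies outside the support for this kappa
            total += -math.log(psi) - (1.0 / kappa + 1.0) * math.log(arg)
    return total


def tail_domain(kappa):
    """Name of the extreme value domain implied by the sign of kappa."""
    if kappa < 0.0:
        return "Weibull"
    if kappa > 0.0:
        return "Frechet"
    return "Gumbel"
```

For κ < 0 the distribution is bounded above by \(-\psi/\kappa\), which is why the Weibull domain implies a limit to the size of beneficial effects.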

Experimental data

The data used in this study were originally obtained in Bank et al. (2014) using the EMPIRIC approach (Hietpas et al. 2011; 2012). Briefly, single-codon-substitution libraries were generated using a plasmid constitutively expressing Hsp90. These were then transformed into the Saccharomyces cerevisiae DBY288 shutoff strain (Bank et al. 2014; Hietpas et al. 2011) using the lithium acetate method. Amplification occurred initially for 12 h at 30 °C in nonselective galactose medium with ampicillin (100 μg/ml; for details of the medium composition see Bank et al. 2014). These were then transferred to selective dextrose medium, also at 30 °C, to initiate shutoff of the wild-type copy of Hsp90. Bulk competition started after 8 h in this selective medium, under six different environmental conditions (25 °C, 30 °C, 36 °C, 25 °C + S, 30 °C + S, and 36 °C + S, where S represents the addition of 0.5 M sodium chloride). For simplicity, we will refer to these conditions as normal medium or high-salt medium, and abbreviate these by 25N and 25S, for example, when additionally referring to the 25 °C environment. Samples were taken at several time points during the experiment (Table S1) and stored at −80 °C for subsequent DNA isolation and sequencing. Sequencing was performed by the University of Massachusetts deep-sequencing facility, which generated approximately 30 million reads (see also Table S1, Bank et al. 2014). For further details regarding the experimental method, see Bank et al. (2014) and Hietpas et al. (2011; 2012; 2013).

The data set analyzed here contains all 576 possible single-codon mutations in a 9-amino acid region of the C terminal part of Hsp90 (amino acid positions 582 to 590) in Saccharomyces cerevisiae. Whereas from most environments only a single replicate was available, we had access to three technical replicates at 30N and two biological replicates at 30S. Populations were originally adapted to the 30N environment. Growth rates for all mutants were estimated using empiricIST. Furthermore, to obtain growth rate estimates per amino acid (residue) position, we pooled nucleotide sequences and jointly estimated growth rates for those nucleotide sequences that resulted in the same amino acid sequence (see above and SI). All downstream analyses are based on 1000 subsamples of the posterior distribution obtained from empiricIST, if not otherwise indicated. Selection coefficients were obtained by normalizing to the median growth rate of all mutations synonymous to the reference sequence as detailed in Bank et al. (2014).

Distribution of synonymous mutations

We obtained the distribution of synonymous fitness effects across all amino acid mutations as the difference between the selection coefficient of each individual codon and its corresponding pooled amino acid estimate. These data were used to perform the analyses in section The distribution of synonymous fitness effects.

Quantifying the impact of GC bias

To check whether Illumina sequencing created a GC bias in our data, we estimated the impact of GC content throughout the several steps of data acquisition and selection coefficient estimation. Firstly, because the library composition was not assessed directly for the data sets used in this study, we used WebPlotDigitizer (https://automeris.io/WebPlotDigitizer/) to obtain the abundance of each of the 64 codons during library construction from Supplementary Figure 7C in Hietpas et al. (2011). We estimated the fraction of each of the 4 nucleotides present and calculated the deviation from the expected 25%. This was done for all three codon positions. We found a positive bias towards AT codons (Fig. S5) in the library construction. To estimate GC bias in the sequencing data obtained after the experiment, we calculated how many Gs or Cs (guanine or cytosine nucleotides) were present in each barcoded codon (minimum 10, maximum 17). A generalized linear model (GLM) using the negative binomial family and including Environment indicated a small positive bias of GC content in the sequence abundance (GC: 0.098, P < 0.0001). This bias was also observed when we tested the correlation between GC abundance and selection coefficient for each amino acid substitution using an ANOVA model including GC content and Environment (GC: 0.00456, P < 0.0001). However, when repeating this analysis using the selection coefficient of synonymous mutations (i.e., after subtracting the amino acid effect) this bias was no longer significant (GC: 0.00003, P = 0.7244), indicating that the observed GC bias may indeed reflect selection rather than being an artifact of sequencing. Nevertheless, to account for any potential contribution of GC content or its interaction with other mechanisms, we included the GC abundance in the models that were used to identify possible mechanisms causing synonymous fitness effects.

Detecting the effect of synonymous mutations

Experimental error and reproducibility of measurements

To assess the reproducibility of measurements, we compared the correlation between selection coefficient estimates across the three 30N and two 30S replicates, and computed the overlap in their growth rate posteriors. For each replicate pair, we calculated the correlation between mutation-specific fitness effects from both the median estimates and 1000 randomly selected posterior samples. The median correlation of fitness effects across pairs of replicates for high salt medium (biological replicates) was 0.84 (lower and upper credibility intervals from 1000 posterior samples: [0.78, 0.88]) and for standard medium (technical replicates) it was 0.98 (lower and upper credibility intervals from 1000 posterior samples: [0.97, 0.99]), confirming that the experimental protocol has an excellent resolution for measuring selection coefficients. An ANOVA test indicated that experimental error was negligible in comparison to the effect of changing medium (Table S2) and confirmed the previously observed strong costs of adaptation (Hietpas et al. 2013).

To quantify whether the empiricIST credibility intervals cover the experimental error appropriately, we estimated the overlap between the 95% credibility intervals of the posterior distribution for all pairs of replicates. We observed a large overlap between pairs of replicates (Fig. S6, normal environment—(a) Rep1-2: 98%; (b) Rep1-3: 91%; (c) Rep2-3: 90%; high salt environment—(d) Rep1-2: 90%), indicating that the variance between replicates is indeed mostly covered by variance in the posterior distribution, and that we can use empiricIST credibility intervals as confidence levels in our analysis.

We used linear models to quantify the contribution of various factors to the estimated effects of synonymous mutations. Model variable names are highlighted throughout the paper using italics. The respective analyses were performed on the distribution of synonymous effects data, i.e., the data from which the median amino acid effect was removed.

We estimated the relative contributions of the experimental error and the effect of synonymous mutations in the data by comparing the impact of replicate, codon, and medium (i.e., whether salt was added or not) using the following ANOVA model with data between replicates 2 and 3 of both the standard and the high-salinity environment at 30 °C:

$$\begin{array}{rl} Y = & {\mathrm{codon}} + {\mathrm{replicate}} + {\mathrm{medium}} + {\mathrm{replicate}} \ast {\mathrm{codon}} \\ & + \,{\mathrm{codon}} \ast {\mathrm{medium}} + {\mathrm{replicate}} \ast {\mathrm{medium}} \\ & + \,{\mathrm{codon}} \ast {\mathrm{medium}} \ast {\mathrm{replicate}} + \varepsilon \end{array}$$

where Y corresponds to the normalized selection coefficient, codon to a fixed factor corresponding to the 64 codons present in the data, replicate to a fixed factor pertaining to the arbitrary replicate number 2 or 3 for each environment, medium is a fixed factor corresponding to the presence or absence of high salt concentration in the medium, and ε corresponds to the residual error. Additionally, we estimated effect size by calculating \(\eta^2\) (i.e., the ratio of the variance explained by a predictor to the total variance explained by the entire model; Levine and Hullett 2002) for each of the model terms, using the etasq function of the R package sjstats (Lüdecke 2017). To assess the variability of our estimates, we performed the analysis for 1000 posterior samples. We find that the fitness effects of codon changes contribute more to the total variance of the model than variation in replicates, indicating that we can detect overall effects of codon changes, despite the presence of experimental error (Fig. S7).

Quantifying the effect size of synonymous mutations

To quantify the effect size of synonymous codon changes, we performed a linear regression for each amino acid (including all amino acids with 3 or more codons) and calculated \(\eta^2\) for the codon term as proxy for effect size (Levine and Hullett 2002). The regression per amino acid was performed within each environment and took into account residue solvent accessibility (i.e., whether the position was buried or exposed). Pooling of positions was performed to allow for the testing of codon effect within an amino acid. To minimize potential differences arising from pooling positions, we separated the data into buried and exposed positions according to solvent accessibility of the residue. Additionally, using an ANOVA model we tested how the estimated effect size per amino acid (using \(\eta^2\) as dependent variable) varied across environment and amino acid.
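As a simplified illustration of the effect size measure, the classical one-way version of \(\eta^2\) is the between-group sum of squares divided by the total sum of squares. The sketch below treats the synonymous codons of one amino acid as groups of selection coefficients; it ignores solvent accessibility and all other model terms, so it is not the model fitted in this study, and the function name is hypothetical.

```python
def eta_squared(groups):
    """One-way eta squared: between-group sum of squares over total sum of squares."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in groups
    )
    return ss_between / ss_total
```

Values close to 0 indicate that codon identity explains little of the variation in selection coefficients, whereas values close to 1 indicate that it explains most of it.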

We performed three different but related types of analyses to quantify the average fitness effects of synonymous mutations. Firstly, we focused solely on the 15 mutations that are synonymous to the reference sequence (similar to Bank et al. 2014). Here, we computed the medians of the maximum and minimum effect size, and the standard deviation from 1000 samples of the posterior. Secondly, across the whole data set, we computed the descriptive statistics of the differences between each codon and the average amino acid effect of this codon. Finally, we compared the distributions of the absolute pairwise differences between amino acid effects, synonymous-codon effects, and samples from the posterior of the same codon. For the environments 30N and 30S (30 °C with normal and high salinity) we performed all analyses across the available 3 and 2 replicates, respectively, and confirmed that our conclusions remain qualitatively similar (results not shown).

Potential mechanisms underlying the effect of synonymous mutations on fitness

There are several mechanisms through which synonymous mutations can affect protein translation (reviewed in Plotkin and Kudla 2011). In this study, we focused on whether codon usage frequency or predicted mRNA stability (using RNA melting temperature as a proxy) can predict effects of synonymous mutations (Presnyak et al. 2015).

Firstly, to enable the inclusion of codon frequency patterns in yeast into our regression models, we obtained the relative abundance of each codon in the yeast genome from the Codon Usage Database (http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=4932).

Secondly, synonymous mutations may affect translation through different stability of the mRNA generated by different codons. To obtain predictions of how mRNA stability is affected by synonymous mutations, we used the prediction software mfold (Markham and Zuker 2008; Zuker et al. 1999) for 25, 30, and 36 °C and with high salt concentrations (0.5 M Na+), with physiological concentrations of salt (0.015 M Na+), and 0.001 M Mg2+, respectively. As input, we used sequences spanning 135 nucleotides of the Hsp90 protein in yeast. To obtain these sequences, we added 54 nucleotides flanking both 5′ and 3′ sides of the region of interest (complete sequences were obtained from https://www.addgene.org/41188/sequences/). From each of these data sets, we selected the conformation with the highest melting temperature (Tm), as highest-stability reference point.

Since Hsp90 is a chaperone involved in the response to thermal stress as well as in the regulation of osmotic stress (Boucher et al. 2014; Yang et al. 2006), we tested which factors can explain variation in codon fitness effects. To this end, we performed model selection using the leaps package (Lumley 2017). We started with the full model with all factors (Temperature, Medium, Codon frequency, Melting temperature, Residue position, and GC content) and their interactions and proceeded by backward selection. We selected the best models under three different criteria: BIC (Bayesian Information Criterion), adjusted R², and Mallows' Cp. To select the best model, we calculated the credibility interval, based on 1000 posterior samples, for AIC, BIC, R² and adjusted R², and performed model comparison with an ANOVA analysis.

Effect of synonymous mutations on the topography and the dynamics of adaptive walks in codon fitness landscapes

To quantify the impact of effects of synonymous mutations coding for the same amino acid on the topography of the fitness landscape, we compared the single-effect landscape with the averaged landscape. For the single-effect landscapes (Fig. 1a3), the effect of each codon was directly obtained from the experimental data. For the averaged landscape (Fig. 1a2), we assigned to every codon that coded for the same amino acid, the same pooled amino acid estimate obtained from empiricIST.
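The relationship between the two landscape types can be sketched as follows. This is a simplified illustration, not the analysis pipeline: it replaces the pooled amino acid estimate from empiricIST with the plain mean over the synonymous codons, and the names `averaged_landscape` and `synonymous_contribution` are hypothetical.

```python
from itertools import product
from statistics import mean

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def averaged_landscape(single_effect):
    """Assign every codon the mean effect of its amino acid.

    Assumption: the plain mean stands in for the pooled estimate.
    """
    by_aa = {}
    for codon, effect in single_effect.items():
        by_aa.setdefault(CODON_TABLE[codon], []).append(effect)
    aa_mean = {aa: mean(effects) for aa, effects in by_aa.items()}
    return {codon: aa_mean[CODON_TABLE[codon]] for codon in single_effect}


def synonymous_contribution(single_effect):
    """Codon effect minus the averaged effect of its amino acid."""
    averaged = averaged_landscape(single_effect)
    return {c: single_effect[c] - averaged[c] for c in single_effect}


# Hypothetical single-effect landscape: one alanine codon deviates.
single = {codon: 0.0 for codon in CODON_TABLE}
single["GCT"] = 0.02
averaged = averaged_landscape(single)
residual = synonymous_contribution(single)
```

By construction, the synonymous contributions within each amino acid sum to zero, so the two landscapes share the same amino acid level signal and differ only in the within-amino-acid variation.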

Each amino acid position in our data set corresponds to a complete multiallelic fitness landscape with 4³ = 64 genotypes. We characterized the prevalence of epistasis in the resulting 9 × 6 = 54 fitness landscapes using several fitness landscape statistics. We estimated (1) the roughness-to-slope ratio (Aita et al. 2001; Bank et al. 2016; Szendro et al. 2013) to quantify the relative deviations from an additive model; (2) the multi-allelic gamma statistics (Bank et al. 2016; Ferretti et al. 2016) to characterize the prevalence and type of epistasis in the landscape. Additionally, to test the impact of synonymous mutations on the evolutionary dynamics on the landscape we estimated: (1) the number of local peaks (Szendro et al. 2013) and (2) the length and variance in the length of potential adaptive walks in the landscapes (Neidhart and Krug 2011; Szendro et al. 2013). Probabilities of adaptive walks were computed analytically under the strong-selection weak-mutation approximation, following Bank et al. (2016). Credibility of the estimates was assessed by computing the fitness landscape statistics for 100 posterior samples.
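The analytical computation of adaptive-walk lengths can be illustrated with the following minimal sketch. It assumes that, under strong selection and weak mutation, the probability of stepping to a fitter single-nucleotide neighbor is proportional to its fitness gain (used here as a proxy for the selection coefficient), and it obtains the mean and variance of the walk length by recursion over the landscape. The function names are hypothetical and the toy landscape is not taken from our data.

```python
from functools import lru_cache
from itertools import product

BASES = "ACGT"


def neighbors(codon):
    """Yield the 9 codons that differ from `codon` at one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def walk_length_moments(fitness, start):
    """Mean and variance of the adaptive-walk length from `start`.

    Assumption: step probability is proportional to the fitness gain.
    """

    @lru_cache(maxsize=None)
    def moments(genotype):
        gains = {
            h: fitness[h] - fitness[genotype]
            for h in neighbors(genotype)
            if fitness[h] > fitness[genotype]
        }
        if not gains:  # no fitter neighbor: the walk ends here
            return 0.0, 0.0
        total = sum(gains.values())
        first = second = 0.0
        for h, gain in gains.items():
            p = gain / total
            e1, e2 = moments(h)
            first += p * (1 + e1)            # E[1 + N_h]
            second += p * (1 + 2 * e1 + e2)  # E[(1 + N_h)^2]
        return first, second

    m1, m2 = moments(start)
    return m1, m2 - m1 * m1


# Toy additive landscape: fitness increases with the number of G's.
toy = {"".join(c): 0.01 * c.count("G") for c in product(BASES, repeat=3)}
mean_len, var_len = walk_length_moments(toy, "AAA")
```

Because fitness strictly increases along a walk, the recursion always terminates. In the toy landscape every walk from AAA gains one G per step, so all walks have the same length.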

Differences between averaged and single-effect landscapes were assessed by comparing the lower and upper 2.5% boundary of the credibility estimates. Differences were considered significant if the lower credibility interval from the averaged landscape and the upper credibility interval of the single-effect landscape (and vice-versa) did not overlap for each of the statistics. We did not perform multiple testing adjustment for this analysis.
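The overlap criterion can be sketched as follows. The sketch assumes simple nearest-rank empirical quantiles of the posterior samples, which is one of several possible quantile conventions; the function names and the example values are hypothetical.

```python
def credibility_interval(samples, lower=0.025, upper=0.975):
    """Empirical interval from posterior samples (nearest-rank rule)."""
    ordered = sorted(samples)
    last = len(ordered) - 1
    return ordered[round(lower * last)], ordered[round(upper * last)]


def intervals_differ(samples_a, samples_b):
    """True if the two credibility intervals do not overlap."""
    lo_a, hi_a = credibility_interval(samples_a)
    lo_b, hi_b = credibility_interval(samples_b)
    return hi_a < lo_b or hi_b < lo_a


# Hypothetical peak counts from 100 posterior samples per landscape.
averaged_peaks = [1] * 60 + [2] * 40
single_peaks = [4] * 30 + [5] * 50 + [6] * 20
significant = intervals_differ(averaged_peaks, single_peaks)
```

Since no multiple-testing adjustment is applied, the outcome for any single statistic should be read as descriptive rather than as a formal test.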

All analyses were performed with R (version 3.3.3; R Core Team 2017) or Mathematica (version 11.2; Wolfram Research Inc. 2017).

Results and discussion

The distribution of fitness effects of synonymous mutations

Previous studies have shown that synonymous mutations can directly affect fitness (e.g., Firnberg et al. 2014; Hunt et al. 2014; Lind et al. 2010) and impact the ability of populations to adapt to new environments (Agashe et al. 2016; Bailey et al. 2014). For example, Bailey et al. (2014) found that two synonymous mutations were responsible for adaptation of Pseudomonas fluorescens to a new medium by increasing the expression of a gene involved in glucose metabolism. In a more recent study in Methylobacterium extorquens, Agashe et al. (2016) found that the deleterious effect of synonymous mutations in a medium with methylamine as the sole carbon source could be rescued by different mutations, including four synonymous mutations that increased transcription and protein production levels. The impact of synonymous mutations at the genome-wide level was also found in patterns of codon usage bias (synonymous codons are used at different frequencies) across genomes. Evidence from studies within and between species supports the role of direct selection on synonymous sites in various genes (Choi and Aquadro 2016; DuMont et al. 2004; Hershberg and Petrov 2009; Ran and Higgs 2010; Shah and Gilchrist 2011; Singh et al. 2007; Sun et al. 2016). A first piece of evidence for synonymous effects in the studied region of Hsp90 came from Bank et al. (2014), who reported that one of the 15 mutations synonymous to the parental sequence had a significantly deleterious effect in 4 out of 6 environments (Fig. 9 in Bank et al. 2014). In order to quantify the distribution of synonymous fitness effects across different amino acid backgrounds, we applied empiricIST to the data set from Bank et al. (2014), which provided us with the growth rate of all 576 possible codon mutations across a 9 amino acid region of Hsp90 in Saccharomyces cerevisiae in 6 different experimental conditions, estimated from bulk competitions. We extracted the synonymous fitness contribution of each mutation by subtracting the mean amino acid effect.

The effect size of synonymous mutations

Although most of the 15 mutations that are synonymous to the wild type are of similar effect, some individual synonymous mutations show as much as a 1% change in fitness (codon AAC, Fig. S8). Since the data set includes all 64 codon mutations for each residue position, we obtained a larger set of synonymous mutations by extracting the effect of synonymous mutations across all amino acids. By default, these mutations include the effect of amino acid changes in relation to the wild type plus possible effects of synonymous mutations. To eliminate the amino acid effect, for all mutations we subtracted the estimated average effect of the corresponding amino acid (see Methods). As a result, the DFE of the synonymous mutations is concentrated around 0. Its shape is clearly different from the DFE of the non-synonymous mutations (Fig. S9), with much lower effect sizes. As observed for the mutations synonymous to the wild type, the average effect size of synonymous mutations varies across environments, from 0.001 in 30S to 0.004 in 36N (Table S3B). However, effect sizes are also highly variable (Fig. S9) and can reach up to 0.019 in 25N, 0.010 in 25S, 0.022 in 30N, 0.014 in 30S, 0.045 in 36N and 0.023 in 36S (Table S3B).

To quantify the effect of synonymous mutations in comparison with the effect of non-synonymous mutations and experimental error, we calculated the absolute pairwise differences between 1000 random pairs of amino acids, codons, posterior samples and replicates (Table S4). This allowed us to estimate the average effect of: (1) an amino acid change (non-synonymous mutations), (2) a codon change within the same amino acid (synonymous mutations), (3) variation between posterior samples (estimation error). As expected, absolute pairwise differences between mutants that differ in their amino acid are much larger than those between synonymous mutants (Table S4, see Material & Methods). Overall, differences between synonymous mutations are larger than those between two random draws from the posterior. On average, the effects of synonymous mutations are larger in the 36N environment (Table S4, Fig. S10), where Hsp90 is expected to be more important for organism survival (Boucher et al. 2014; Mishra et al. 2016; Yang et al. 2006).
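The pairwise comparison can be sketched as follows. The sketch assumes that pairs are drawn without replacement within a pair and that a fixed seed is used for reproducibility; the function name and all values are hypothetical and serve only to show the computation.

```python
import random
from statistics import mean


def mean_abs_pair_difference(values, n_pairs=1000, seed=0):
    """Mean absolute difference between randomly drawn pairs of values."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_pairs):
        a, b = rng.sample(values, 2)  # two distinct entries
        diffs.append(abs(a - b))
    return mean(diffs)


# Hypothetical effects: amino acid changes vs. synonymous codon changes.
amino_acid_effects = [-0.30, -0.20, -0.10, 0.00, 0.05]
synonymous_effects = [-0.002, 0.000, 0.001, 0.003]

d_amino_acid = mean_abs_pair_difference(amino_acid_effects)
d_synonymous = mean_abs_pair_difference(synonymous_effects)
```

Applying the same function to pairs of posterior samples of a single codon gives the estimation-error baseline against which the synonymous differences are compared.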

Overall, our assessment shows that the size of synonymous fitness effects is non-negligible but only slightly above the experimental detection limit. Therefore, we restrict ourselves to statistical arguments about the distribution of synonymous effects in this study rather than identifying specific mutations, which would likely result in a high false discovery rate.

The beneficial tail of the distribution of synonymous fitness effects

The distribution of fitness effects contains information about the availability of beneficial mutations (Orr 2005; 2010). It is of particular interest to study the shape of the beneficial tail of this distribution as it determines various aspects regarding the nature of adaptive walks (Eyre-Walker 2006; Orr 2010). Using the same data as in the present study, Bank et al. (2014) previously found that for all environments except 25S, the beneficial tail of the full distribution of fitness effects most likely belonged to the Weibull domain. This suggested that populations were close to a well-defined optimum, and the available beneficial mutations would be of similar and small size (Bank et al. 2014; Joyce et al. 2008; Orr 2010).

We used the tail shape estimator from empiricIST to estimate the tail shape of the distribution of beneficial synonymous mutations (i.e., the distribution of the beneficial contributions to the amino acid effects). We find that the shape parameter of the fitted Generalized Pareto Distribution is most likely positive in all environments, which indicates that the resulting shape of the beneficial tail belongs to the Fréchet domain (Fig. 3) (Joyce et al. 2008; Orr 2010). Distributions from this domain are characterized by many mutations of small effect, along with few mutations of large and unpredictable effect (Jain and Seetharaman 2011; Joyce et al. 2008; Neidhart and Krug 2011). Such a shape of the distribution of synonymous effects makes sense both intuitively and with respect to the reported examples of large-effect synonymous mutations (Agashe et al. 2016; Bailey et al. 2014): a majority of synonymous mutations may not affect fitness at all, whereas specific ones could indeed significantly affect fitness. This would also explain why synonymous effects are seldom detected, as the rarity of large effects implies that a large number of mutations has to be screened to obtain a positive result.

Fig. 3

Shape parameter estimates of the beneficial tail of the distribution of synonymous effects. In all environments, the estimated shape parameter of the tail is positive, which indicates that the distribution of synonymous effects belongs to the Fréchet domain (i.e., that it can be characterized by a heavy-tailed distribution). This implies the presence of many nearly-neutral and few larger-effect mutations. The shape parameter was estimated using the tail shape estimator from empiricIST using the information of 1000 samples from the posterior distribution. Environmental conditions are indicated as the combination of temperature (25, 30, and 36 °C) and salinity (N = normal and S = high salinity)

Relationship between observed synonymous effects and potential underlying biological mechanisms

Synonymous mutations can affect fitness by altering speed and accuracy of translation, and mRNA folding and stability (Brule and Grayhack 2017; Drummond and Wilke 2008; Knöppel et al. 2016; Kudla et al. 2009; Plotkin and Kudla 2011; Presnyak et al. 2015; Shabalina et al. 2013; Sharp et al. 2010; Yu et al. 2015; Zhou et al. 2009). It has been proposed that protein folding may be affected more significantly by changes in translation accuracy for buried (structural) positions, as they are often involved in the formation of crucial secondary and tertiary structures of the protein (Drummond and Wilke 2008; Saunders and Deane 2010; Zhou et al. 2009). The usage of different synonymous codons could therefore allow cells to slow down or arrest protein production in response to sudden environmental changes and to optimize resource production (Fredrick and Ibba 2010; Tuller et al. 2010; Zhang et al. 2009). We evaluated whether the effects of synonymous mutations that we observe can be explained by variation in codon preference or mRNA stability. To this end, we analyzed a full linear model incorporating temperature, medium composition, residue position, melting temperature of mRNAs, GC content, and codon usage frequency, as well as all possible interactions of those factors.

No clear predictors of codon fitness emerged from this analysis. The best model indicated that fitness effects of synonymous mutations are affected by interactions between residue positions and temperature, medium composition, mRNA melting temperature, GC content, and codon usage frequency (Table S5); however, only 1.4% of the variance in fitness effects could be attributed to this combination of factors. There are various reasons that could explain this inconclusive result. Firstly, the synonymous effect sizes could be too small compared with the experimental uncertainty to yield a clear result. This problem should be amplified by the observed shape of the distribution of synonymous effects; if only few mutations have an effect, the statistical power to detect this effect in the full data set will be very low. Secondly, we considered diverse amino acid positions and environments. Intuitively, it seems plausible that at each of the positions, different biological mechanisms could contribute to synonymous fitness effects. Thirdly, our analysis is based on a distribution of synonymous fitness effects that was observed on top of an amino acid effect in a conserved region of the protein, which could blur the true distribution of synonymous effects. Thus, larger data sets based on synonymous mutations to a common reference will be necessary for a better statistical assessment of the factors underlying the distribution of synonymous fitness effects.

The shape of the codon fitness landscape with and without synonymous effects

Having established that there is a non-negligible distribution of synonymous fitness effects, it is natural to ask how considering such effects changes a given fitness landscape. In the following section, we analyze the 54 64-genotype fitness landscapes of single amino acid positions that are contained in our data set. In contrast to the section above, we now also consider the amino acid effects of mutations and compare the shape of the fitness landscape when (1) all codons for the same amino acid are assigned the same effect (averaged landscape) and when (2) all codons have individually estimated effects (single-effect landscape).

Epistasis in the codon fitness landscape

We investigated the effect of synonymous mutations on the topography of the fitness landscape by comparing the prevalence and type of epistasis for averaged and single-effect landscapes (see Fig. 1 a2, a3, Material & Methods) for each of the 9 amino acid positions across 6 environments. For all 54 landscapes, we computed two statistics: the roughness-to-slope ratio r/s (Szendro et al. 2013) and the locus-specific gamma statistic (Ferretti et al. 2016). The roughness-to-slope ratio quantifies the prevalence of epistasis by comparing the deviation of the landscape from an additive model with the magnitude of the fitness effects (Carneiro and Hartl 2010; Schenk et al. 2013). The γij statistic measures the correlation of fitness effects of the same mutations in a single-step distance across all genetic backgrounds. Whereas the roughness-to-slope ratio describes the landscape by means of only a single value, γij results in a representation of the landscape by means of 2L values, where L is the number of loci. This epistatic footprint makes heterogeneity of epistasis in the landscape visible, and can thus indicate epistatic signals at the level of single loci (e.g. Bank et al. 2016).
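To make the roughness-to-slope ratio concrete, the following sketch computes it for a complete multi-allelic landscape. It relies on two assumptions that may differ in detail from the published multi-allelic definition: the additive fit is obtained from marginal allele means (which coincides with the least-squares fit for a complete landscape), and the slope is taken as the mean absolute additive difference between alleles at the same locus. The function name and the toy landscapes are hypothetical.

```python
from itertools import combinations, product
from math import sqrt

BASES = "ACGT"


def roughness_to_slope(fitness):
    """r/s for a complete landscape given as {genotype: fitness}."""
    genotypes = list(fitness)
    n_loci = len(genotypes[0])
    grand = sum(fitness.values()) / len(fitness)

    # Additive (main) effect of each allele at each locus.
    main = {}
    for locus in range(n_loci):
        for allele in {g[locus] for g in genotypes}:
            vals = [f for g, f in fitness.items() if g[locus] == allele]
            main[(locus, allele)] = sum(vals) / len(vals) - grand

    # Roughness: root mean squared residual of the additive fit.
    residuals = [
        f - (grand + sum(main[(i, g[i])] for i in range(n_loci)))
        for g, f in fitness.items()
    ]
    roughness = sqrt(sum(r * r for r in residuals) / len(residuals))

    # Slope (assumed convention): mean absolute additive step size.
    steps = [
        abs(main[(i, a)] - main[(i, b)])
        for i in range(n_loci)
        for a, b in combinations(sorted({g[i] for g in genotypes}), 2)
    ]
    slope = sum(steps) / len(steps)
    return roughness / slope if slope > 0 else float("inf")


# Toy additive landscape, and a copy with one epistatic deviation.
effect = {"A": 0.00, "C": 0.01, "G": 0.02, "T": 0.03}
additive = {
    "".join(c): sum((i + 1) * effect[b] for i, b in enumerate(c))
    for c in product(BASES, repeat=3)
}
epistatic = dict(additive)
epistatic["GGG"] += 0.05

rs_additive = roughness_to_slope(additive)
rs_epistatic = roughness_to_slope(epistatic)
```

A purely additive landscape yields a ratio of (numerically) zero, and any deviation from additivity increases it; because the ratio scales the residuals by the effect sizes, it is sensitive to both epistasis and measurement noise.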

The roughness-to-slope ratio indicates that all but one of the codon landscapes are highly epistatic (r/s > 1), with the magnitude of the roughness-to-slope ratio varying across amino acid positions and environments (Fig. S11). Single-effect landscapes tend to be more epistatic (larger roughness-to-slope ratio) than averaged landscapes, although this difference is in general small. Interestingly, in a few cases the single-effect landscape has a smaller roughness-to-slope ratio than its corresponding averaged landscape. This is noteworthy because in this case the consideration of synonymous fitness effects makes the landscape less rugged/more linear, which is opposite to the intuitive expectation that adding variation in synonymous effects should increase the number of peaks and thus the prevalence of epistasis in the landscape. At high salinity, the roughness-to-slope ratio tends to be larger than in normal environments, and the difference in the roughness-to-slope ratios between amino acid positions and between averaged and single-effect landscapes is also larger (Fig. S11). The stronger epistatic signal observed in the high-salinity environments could be caused by the combination of low absolute growth rates observed in high-salinity conditions, which result in larger relative fitness differences of the mutations (cf. Table S1 in Bank et al. 2014), and larger experimental uncertainty (Fig. S6) in this environment. This indicates that one needs to be cautious when interpreting roughness-to-slope ratios across data sets, because the measure may be confounded by experimental differences rather than genuine changes in the epistatic component of the landscape.

Computing the γij statistic per codon position confirms that averaged and single-effect landscapes tend to display a similar strength of epistasis within amino acid position and environment on a global scale (Fig. 4, Fig. S12). Only when γij is computed for individual pairs of nucleotide substitutions do larger differences in epistasis appear across the resulting epistatic profiles (see Fig. S13). The γij statistic per codon position shows smaller differences between environments than the roughness-to-slope ratio. As the gamma statistic is based on the correlation and not the effect size of fitness effects across genetic backgrounds, it is less sensitive to differences in mutational effect sizes and experimental error. The largest systematic differences in the codon position-wise strength of epistasis are found when comparing the order in which mutations occur. Gamma measures obtained from the epistatic effect of non-synonymous mutations (γ1→2, γ2→1) in general display stronger epistasis (Fig. 4) than gamma measures obtained from (mostly) synonymous mutations (γ1→3, γ2→3, γ3→1, γ3→2). Thus, the structure of the codon table (i.e., the existence of synonymous and non-synonymous mutations) leaves a distinctive signal in all fitness landscapes, but the signal looks similar for averaged and single-effect landscapes. Splitting this signal into its components caused by individual pairs of nucleotides illustrates extensive local heterogeneity of epistasis across codon fitness landscapes and between single-effect and averaged landscapes (see Fig. S13) and indicates the potential for different dynamics of adaptive walks, which is discussed in the next section. However, this measure describes each codon fitness landscape by a set of 216 values, which makes it difficult to obtain quantitative comparisons. Nevertheless, Fig. S13 shows qualitatively that every codon fitness landscape indeed has a different epistatic profile. This local heterogeneity of the fitness landscapes is not well captured by averaging summary statistics such as the roughness-to-slope ratio and the codon position-based gamma statistic.

Fig. 4

Gamma statistics of pairs of codon positions for single-effect landscapes across amino acid positions (y axis) and environments (x axis). In general, interactions of non-synonymous mutations (γ1→2, γ2→1) are more epistatic than non-synonymous mutations in the background of synonymous mutations. There is no systematic variation across environments (x axis), but there seems to be a systematic impact of amino acid position on the strength of epistasis (y axis). Specifically, position 582 shows qualitatively stronger epistasis for both γ1→3 and γ2→3 across all environments. This indicates that synonymous mutations may have a relatively larger role in constraining evolutionary paths at this position

Impact of synonymous mutations on adaptive walks

Including synonymous mutations changes the topography of the landscape, which may affect the accessibility of different mutational paths by creating additional peaks and sinks in the fitness landscape. To quantify the impact of synonymous effects on adaptive walks, we calculated the number of optima, the mean expected length of adaptive walks, and the variance in the number of steps for the single-effect and averaged landscapes. We based our calculation on the assumption of the strong-selection weak-mutation limit (Gillespie 1984) in which evolution happens by means of sequential beneficial substitutions that result in an adaptive walk that ends in a fitness peak (e.g., Frank 2014; Orr 2005; Schoustra 2009; Zagorski et al. 2016). We define a fitness peak as any genotype with fitness higher than all single-step mutational neighbors. For averaged landscapes, in which all synonymous mutations are assigned equal fitness, we consider a fitness plateau spanned by synonymous codons as a single local optimum if all non-synonymous codons in a distance of a single nucleotide step have lower fitness (as in Fig. 1a3).
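The peak definition, including the treatment of plateaus in averaged landscapes, can be sketched as follows. The sketch assumes that a plateau is a set of equal-fitness codons connected by single-nucleotide steps, so synonymous codons that are not mutational neighbors would be treated as separate plateaus; exact equality of fitness values is appropriate here only because averaged landscapes assign identical values by construction. The function names and the toy landscapes are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def neighbors(codon):
    """Yield the 9 codons that differ from `codon` at one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def count_optima(fitness):
    """Count local optima, treating each connected plateau as one."""
    seen = set()
    n_optima = 0
    for start in fitness:
        if start in seen:
            continue
        # Flood fill: collect the equal-fitness plateau around `start`.
        plateau, stack = {start}, [start]
        while stack:
            g = stack.pop()
            for h in neighbors(g):
                if h not in plateau and fitness[h] == fitness[start]:
                    plateau.add(h)
                    stack.append(h)
        seen |= plateau
        # Optimum if every neighbor outside the plateau is less fit.
        if all(
            fitness[h] < fitness[start]
            for g in plateau
            for h in neighbors(g)
            if h not in plateau
        ):
            n_optima += 1
    return n_optima


# Toy averaged-type landscape: the four alanine codons form a plateau.
flat = {"".join(c): 0.0 for c in product(BASES, repeat=3)}
for codon in ("GCA", "GCC", "GCG", "GCT"):
    flat[codon] = 0.01
n_plateau = count_optima(flat)

# Adding an isolated fitter codon creates a second local optimum.
two_peaks = dict(flat)
two_peaks["TTT"] = 0.005
n_two = count_optima(two_peaks)
```

For single-effect landscapes with distinct fitness values, every plateau consists of a single codon and the function reduces to counting strict peaks.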

By definition, the number of fitness peaks in the averaged landscape has to be lower than or equal to that of the single-effect landscape. Indeed, we find that there is a large difference in the number of fitness peaks between single-effect and averaged landscapes (Fig. 5, Fig. S14). This difference is environment-dependent and also varies across amino acid positions (Fig. 5, Fig. S14), and it is accompanied by a larger between-environment variation in the number of peaks in the single-effect landscapes. For most environments and positions, averaged landscapes have only 1 or 2 fitness peaks (Fig. 5, Fig. S14). Conversely, among the single-effect landscapes, 25 °C stands out with a consistently large number of peaks across all amino acid positions (mean across positions for single-effect: 5.568, mean across positions for averaged landscape: 2.593). The much larger number of fitness peaks in the single-effect landscape suggests that evolution on the ‘true’ fitness landscape that includes effects of synonymous mutations is less predictable (De Visser and Krug 2014; Lobkovsky et al. 2011).

Fig. 5

Number of optima observed from 100 posterior samples of single-effect (dark blue) and averaged (light yellow) landscapes for positions 582, 584, 586, and 590 (from left to right) across environments. (See Fig. S14 for the complete set of loci.) The number of optima is always larger in single-effect than in averaged landscapes. The number of optima is smaller at high temperatures, which may indicate increased constraints to adaptation. The large difference between the number of peaks in averaged and single landscapes suggests that synonymous mutations can affect adaptation to a new environment by trapping the population at a local optimum. The mean and median of the distributions are significantly different according to Welch two sample t-test and Wilcoxon test, p < 0.00001 after Bonferroni correction for 54 comparisons

As synonymous mutations are expected to have a stronger effect in buried amino acid positions (Drummond and Wilke 2008), differences between adaptive walks on single-effect and averaged landscapes should be larger in buried positions. However, we do not see consistent variation between the two landscape types between buried and exposed positions (see Material & Methods), which suggests that the impact of synonymous mutations is not solely due to effects on protein folding, i.e., not strongly correlated with the solvent accessibility of the residues.

Biologically, a larger difference in predicted adaptive walks between averaged and single-effect landscapes points to a greater importance of synonymous fitness effects. The pronounced differences that we observed at cold temperature could stem both from the smaller absolute growth rate of the wild type in this environment, which results in larger relative effects of mutations (i.e., small-effect mutations could become more visible), and from a reduced need for functional Hsp90 at cold temperature, which could result in a larger number of (synonymous) adaptive solutions that are connected to the fine-tuning of the protein. In support of this hypothesis, we observe fewer optima and longer and more variable adaptive walks in the single-effect landscapes at 36N (Tables S6, S7), which is in agreement with the importance of Hsp90 at high temperatures (Bank et al. 2014; Boucher et al. 2014; Hietpas et al. 2013; Mishra et al. 2016), which may leave only few options for improvement. This is consistent with the small proportion of beneficial mutations across the whole DFE observed by Bank et al. (2014) in this condition.

Our results allow for an interesting thought experiment regarding the impact of synonymous mutations on evolution across populations of different sizes. Our results add to the notion that synonymous fitness effects exist but are small on average. According to the nearly-neutral theory, such small fitness effects will only be visible to selection if the population size is large (Ohta 1992). When they become visible in large populations, synonymous fitness effects create additional peaks in the organism’s fitness landscape, in which adaptation can become stalled. In such a situation, bottlenecks (i.e., sudden drops in the population size), which can occur under natural scenarios and are also frequently imposed in experiments, may render synonymous mutations effectively neutral. This erases the previous fitness peak and allows the population to continue its adaptive walk on an averaged-type fitness landscape. Thus, by opening mutational paths and erasing synonymous fitness peaks, a (temporarily) smaller population size could speed up adaptation and increase its predictability (Jain et al. 2011; Wright 1931). Because it is caused by the different effect size and distribution of non-synonymous versus synonymous mutations, this effect is in contrast to the slowdown of adaptation and decrease of predictability of evolution in small populations proposed in standard population-genetic theory (Lanfear et al. 2014; Orr 2000).

Conclusion

The impact of the codon table on the evolutionary dynamics on fitness landscapes has received little attention. This is a consequence of the vast size of the nucleotide space and the resulting dimensionality of the fitness landscape, which has led to most studies restricting themselves to the amino acid level. Using selection coefficient estimates obtained with empiricIST, a new software for the estimation of growth rates from deep mutational scanning data, we characterized the distribution of synonymous fitness effects and investigated the consequences of including synonymous mutations when characterizing the fitness landscape of single amino acid positions across environments. Interestingly, we found support for a heavy-tailed distribution of beneficial synonymous effects across all environments, suggestive of a distribution of fitness effects with many small-or-no effect mutations and few mutations of potentially large effects. This is in line with the current population-genetics literature, in which the importance of accounting for synonymous fitness effects is discussed controversially. We demonstrate that synonymous mutations can impact the topography of the fitness landscape and affect adaptation in an environment-dependent fashion. Importantly, we show that synonymous fitness effects can directly impact both the path and endpoint of an adaptive walk by creating additional fitness peaks. This highlights the importance of their consideration in the study of fitness landscapes.

Data archiving

The complete documentation of all analyses, which allows for the reiteration of all steps, is available from the Dryad Digital Repository https://doi.org/10.5061/dryad.k7jm5hp.