
# Expression level is a major modifier of the fitness landscape of a protein coding gene

## Abstract

The phenotypic consequence of a genetic mutation depends on many factors, including the expression level of the gene. However, a comprehensive quantification of this expression effect is still lacking, as is a general mechanistic understanding of it. Here, we measured the fitness effects of almost all (>97.5%) single-nucleotide mutations in GFP, an exogenous gene with no physiological function, and in URA3, a conditionally essential gene. Each gene was expressed from either of two promoters whose expression levels differed by about tenfold. The resulting fitness landscapes revealed that the fitness effects of at least 42% of all single-nucleotide mutations within the two genes were expression dependent. Biophysical properties of the encoded protein and messenger RNA explain only a small fraction of the variation in fitness effects among mutations. Even so, our analyses revealed that the avoidance of stochastic molecular errors generally underlies the expression dependency of mutational effects. They also suggested that protein misfolding is the most important type of molecular error among those examined. Our results therefore directly explain the slower evolution of highly expressed genes and highlight cytotoxicity due to stochastic molecular errors as a non-negligible component of the phenotypic consequences of mutations.
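The coverage figure (>97.5%) is a fraction of all possible single-nucleotide substitutions. A CDS of length L has 3L of them, because each position can mutate to any of the three other bases. The Python sketch below shows only that bookkeeping. The function names and the toy sequence are hypothetical, and this is not the authors' analysis code.

```python
from typing import List, Set, Tuple

Mutation = Tuple[int, str, str]  # (0-based position, reference base, alternative base)


def all_single_nucleotide_mutations(cds: str) -> List[Mutation]:
    """Every possible single-nucleotide substitution of an A/C/G/T CDS (3 per position)."""
    cds = cds.upper()
    return [(i, ref, alt) for i, ref in enumerate(cds) for alt in "ACGT" if alt != ref]


def coverage(cds: str, observed: Set[Mutation]) -> float:
    """Fraction of all possible single-nucleotide mutations found among `observed`."""
    possible = all_single_nucleotide_mutations(cds)
    return sum(m in observed for m in possible) / len(possible)


if __name__ == "__main__":
    toy_cds = "ATGGCT"  # hypothetical toy sequence, not GFP or URA3
    possible = all_single_nucleotide_mutations(toy_cds)
    print(len(possible))                          # 18 = 3 x 6
    print(coverage(toy_cds, set(possible[:17])))  # 17/18, about 0.944
```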

## Data availability

All raw data from high-throughput sequencing were deposited to NCBI BioProjects under accession number PRJNA681990.

## Code availability

Custom R and Python code used for data analysis is available on GitHub (https://github.com/woson2020/Experror).

## Acknowledgements

We thank J. Zhang, X. He and W. Qian for comments on the manuscript. This work was supported by the following sources:
- the National Natural Science Foundation of China (grant nos. 31771406 to X.C., 81830103 to G.-B.T. and J.-R.Y., and 31671320 and 31871320 to J.-R.Y.);
- the National Key R&D Program of China (grant nos. 2017YFA0103504 to X.C. and 2018ZX10301402 to J.-R.Y.);
- the National Special Research Program of China for Important Infectious Diseases (grant no. 2018ZX10302103 to X.C.);
- the start-up grants from the ‘100 Top Talents Program’ of Sun Yat-sen University (grant nos. 50000-18821112 to X.C. and 50000-18821117 to J.-R.Y.).

## Author information

### Contributions

J.-R.Y. and X. Chen conceived the idea and designed and supervised the study. Z.W., X. Cai and Y.L. conducted experiments and acquired data. X. Cai, G.-B.T., J.-R.Y. and X. Chen contributed new reagents and analytical tools. Z.W., X.Z., X. Chen and J.-R.Y. analysed data. Z.W., X. Chen and J.-R.Y. wrote the paper.

### Corresponding authors

Correspondence to Jian-Rong Yang or Xiaoshu Chen.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

Peer review information Nature Ecology & Evolution thanks Yuping Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Extended data

### Extended Data Fig. 1 Additional information on the experimental pipeline.

(a) The CDS of the gene (GFP or URA3) was divided into non-overlapping regions of 50 bp. For each 50-bp region, one mutant primer was synthesized. Each primer contained the focal 50-bp region, made with doped nucleotides (thereby introducing mutations), and a 20-bp invariant sequence flanking either side of the region; see also Fig. 1a, Methods and Supplementary Table 6. (b) Different mutant primers were used with the terminator primer to amplify PCR fragment 1 and with the promoter primer to amplify PCR fragment 2. PCR fragments 1 and 2 were then fused using promoter primers and barcode (+ index) primers, giving rise to fusion PCR fragment 1, which was further fused with the LEU2 marker. The final product, fusion PCR fragment 2, was ready for recombination; see also Fig. 1a, Methods and Supplementary Table 6. (c and d) Electrophoresis results for PCR fragment 1 (c) and PCR fragment 2 (d) of GFP. (e and f) Construction of the acceptor strains. The HO locus in S. cerevisiae strain BY4741 (MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0) was replaced separately by each of two expression cassettes, PTDH3-GFP-KanMX and PADP1-GFP-KanMX, via homologous recombination to construct two GFP acceptor strains (e). The two URA3 acceptor strains were constructed by replacing the CDS of GFP in the GFP acceptor strains via homologous recombination (f). (g) Typical transformation results on plates selective for transformants, showing a relatively high transformation rate. (h) Histogram of between-sample correlations of genotype frequencies. The correlations are stratified into those between samples from the same timepoint (that is, biological replicates; red) and those between samples from different timepoints (green).
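The primer layout in panel (a) can be sketched as follows. The CDS is tiled into non-overlapping 50-bp windows, and each primer carries one window (to be synthesized with doped nucleotides) flanked by 20 bp of invariant sequence on each side. Several details are assumptions, not taken from the Methods or Supplementary Table 6:
- lowercase letters mark the doped positions;
- the function name is hypothetical;
- the `upstream`/`downstream` context parameters are added because the flanks of the first and last windows extend beyond the CDS;
- a final window may be shorter than 50 bp.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MutantPrimer:
    start: int     # 0-based start of the doped window within the CDS
    end: int       # exclusive end of the doped window
    sequence: str  # left flank + doped window (lowercase) + right flank


def design_mutant_primers(cds: str, upstream: str, downstream: str,
                          window: int = 50, flank: int = 20) -> List[MutantPrimer]:
    """Tile a CDS into non-overlapping windows, each flanked by invariant sequence.

    `upstream` and `downstream` supply context so that the first and last
    windows also get full-length flanks; the final window may be shorter.
    """
    if len(upstream) < flank or len(downstream) < flank:
        raise ValueError("need at least `flank` nt of context on each side")
    cds = cds.upper()
    context = upstream.upper() + cds + downstream.upper()
    offset = len(upstream)
    primers = []
    for start in range(0, len(cds), window):
        end = min(start + window, len(cds))
        left = context[offset + start - flank: offset + start]
        doped = cds[start:end].lower()  # lowercase = synthesized with doped nucleotides
        right = context[offset + end: offset + end + flank]
        primers.append(MutantPrimer(start, end, left + doped + right))
    return primers


if __name__ == "__main__":
    # Hypothetical 120-nt CDS with 20 nt of promoter/terminator context
    for p in design_mutant_primers("A" * 120, "C" * 20, "G" * 20):
        print(p.start, p.end, len(p.sequence))
```

For this toy input, the last window covers positions 100 to 120 only, so its primer is 60 nt instead of 90 nt.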

### Extended Data Fig. 2 The measured fitness landscape is accurate overall.

(a) Protein expression levels driven by the two promoters, PTDH3 and PAGP1, assayed by flow cytometry using the corresponding GFP acceptor strains. Error bars represent the standard deviation of 9 replicates (3 biological × 3 technical). (b) Distribution of read ratios (day 7/day 0) of wild-type barcodes in the four mutant libraries. Only wild-type barcodes whose read ratios were within one standard deviation of their average (black bars) were pooled and used to estimate the wild-type read ratio. (c) The coefficient of variation (CV) of fitness among biological replicates was calculated for each genotype and is shown collectively as a standard boxplot for all genotypes within a library. For comparison, similar estimates are shown for the previously reported fitness landscape of a tRNA gene. (d and e) Comparison of the fitness of PTDH3-GFP variants grown in YPD on day 7 between biological replicates. Pearson’s correlation coefficients and the corresponding P values are shown. Correlations for the other libraries are listed in Supplementary Table 3. (f) Correlation of fitness estimates between day 3 and day 7, and among different growth media. Pearson’s correlation coefficients are shown by colour, as indicated by the colour scale bar. (g) Enrichment of deleterious mutations in the active centre of URA3. The fitness of each amino acid position was calculated as the average fitness of all single mutants of the corresponding codon. The labelled amino acids are four known active-site residues, and the dashed ellipse outlines the active centre of URA3. (h) Distributions of fitness for four types of mutants, shown as boxplots for each library. The four types are defined as follows:
- synonymous: one or more synonymous mutations;
- single missense: one missense mutation, with additional synonymous mutations allowed;
- multiple missense: two or more missense mutations, with additional synonymous mutations allowed;
- nonsense: at least one nonsense mutation, with additional missense or synonymous mutations allowed.

Statistical significance of differences (Mann–Whitney U-test) is indicated by asterisks: *: P < 0.05; **: P < 0.001; n.s.: not significant. (i) The ten least-fit and ten fittest genotypes picked from the PTDH3-GFP library grown in YPD were individually measured by spectrophotometer for their doubling time. The wild-type doubling time was subtracted from each value to give the relative doubling time (y axis). The relative doubling time of each genotype was tested for significant deviation from 0 by Mann–Whitney U-test using the three biological replicates. Filled circles denote significant genotypes and empty circles non-significant ones.
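The four mutant categories in panel (h) can be assigned mechanically by translating wild-type and mutant codons. The sketch below makes these assumptions for illustration:
- it uses the standard genetic code;
- it classifies each altered codon once, so a codon carrying two nucleotide changes counts as a single mutation;
- it treats a stop-loss change as missense;
- its function names are hypothetical.

```python
from typing import List

BASES = "TCAG"
# Standard genetic code, codons ordered by first, second, third base in TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codon_changes(wt: str, mut: str) -> List[str]:
    """Classify each altered codon as 'synonymous', 'missense' or 'nonsense'."""
    if len(wt) != len(mut) or len(wt) % 3:
        raise ValueError("sequences must be equal-length, in-frame CDSs")
    wt, mut = wt.upper(), mut.upper()
    changes = []
    for p in range(0, len(wt), 3):
        a, b = wt[p:p + 3], mut[p:p + 3]
        if a == b:
            continue
        aa_wt, aa_mut = CODON_TABLE[a], CODON_TABLE[b]
        if aa_mut == aa_wt:
            changes.append("synonymous")
        elif aa_mut == "*":
            changes.append("nonsense")
        else:
            changes.append("missense")  # includes stop-loss changes in this sketch
    return changes


def mutant_type(wt: str, mut: str) -> str:
    """Assign a genotype to one of the four categories of panel (h)."""
    changes = codon_changes(wt, mut)
    n_missense = changes.count("missense")
    if "nonsense" in changes:
        return "nonsense"
    if n_missense >= 2:
        return "multiple missense"
    if n_missense == 1:
        return "single missense"
    return "synonymous" if changes else "wild type"


if __name__ == "__main__":
    wt = "ATGGCTAAA"  # hypothetical toy CDS fragment (M A K)
    print(mutant_type(wt, "ATGGTTTAA"))  # nonsense
```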

### Extended Data Fig. 3 Extrapolation of the expression-dependent and independent components of the fitness effect of CDS mutations.

(a) Two types of linear models were fitted to the fitness effect of each single-nucleotide mutation (y axis) as a function of the expression levels of PTDH3 and PAGP1. The ‘expression-dominant’ model assumed no expression-independent component, whereas the ‘mixed’ model contained both expression-dependent and expression-independent components. (b) How well the expression-dominant and mixed models described the data was assessed using the Akaike information criterion (AIC). The number of mutations better described by the mixed model (blue bar) was always higher than the number better described by the expression-dominant model (red bar), regardless of the fitness landscape (y axis) used. Binomial P values against the null expectation of equal preference for both models are indicated as **: P < 10−3; ***: P < 10−5. (c and d) The activities of different promoters, relative to ACT1, in yeast strain S288c grown in different media were measured by RT–qPCR of the expression levels of the corresponding native genes (c). The response of each promoter to uracil shortage was calculated as the ratio between its activity in SC or SC-URA and that in YPD (d). In both panels c and d, error bars represent the standard deviation of six replicates, and P values from Mann–Whitney U-tests are indicated as *: P < 0.05. (e) A simple model explaining why a functionally required gene is more sensitive to deleterious mutations when it is lowly expressed than when it is highly expressed. The green curve represents the relationship between the expression level of the gene (e, x axis) and organismal fitness (w, y axis). We assume diminishing returns, that is, $$dw/de = f(e)$$, where f is a monotonically decreasing function. The dark, medium and light blue vertical lines represent optimal expression, slight shortage and severe shortage of gene expression, respectively. A deleterious mutation should cause a small loss of function of the gene, effectively equivalent to a small reduction in expression (Δe, the two grey segments). This reduction leads to a corresponding reduction in fitness (Δw, the red or pink segments). Under diminishing returns, $$\Delta w/\Delta e \approx f(e)$$ is smaller for larger e. The same Δe therefore gives rise to a smaller Δw when e is higher, at ‘slight shortage’ (the pink segment), than when e is lower, at ‘severe shortage’ (the red segment). This relationship is also shown intuitively in the figure.
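Panels (a) and (b) compare two models for each mutation: an expression-dominant model (s = b·e) and a mixed model (s = a + b·e). In the mixed model, a is the expression-independent component and b·e the expression-dependent one. The two fits are compared by AIC, and a binomial test then asks whether the counts of mutations preferring each model depart from 50:50. Below is a minimal plain-Python sketch. Its assumptions are:
- Gaussian least-squares fits on replicate fitness estimates at the two expression levels;
- AIC computed only up to an additive constant shared by both models;
- hypothetical function names;
- made-up expression values and fitness effects in the example.

```python
import math
from typing import Sequence


def rss_through_origin(e: Sequence[float], s: Sequence[float]) -> float:
    """RSS of the expression-dominant fit s = b * e (no intercept)."""
    b = sum(x * y for x, y in zip(e, s)) / sum(x * x for x in e)
    return sum((y - b * x) ** 2 for x, y in zip(e, s))


def rss_with_intercept(e: Sequence[float], s: Sequence[float]) -> float:
    """RSS of the mixed fit s = a + b * e (a = expression-independent part)."""
    n = len(e)
    me, ms = sum(e) / n, sum(s) / n
    b = (sum((x - me) * (y - ms) for x, y in zip(e, s))
         / sum((x - me) ** 2 for x in e))  # needs at least 2 distinct expression levels
    a = ms - b * me
    return sum((y - a - b * x) ** 2 for x, y in zip(e, s))


def gaussian_aic(rss: float, n: int, k: int) -> float:
    """AIC of a Gaussian least-squares fit, up to a constant shared by both models.

    k = number of regression coefficients + 1 for the error variance; the
    tiny floor on rss only guards against log(0) for a perfect fit.
    """
    return n * math.log(max(rss, 1e-300) / n) + 2 * k


def preferred_model(e: Sequence[float], s: Sequence[float]) -> str:
    n = len(s)
    aic_dom = gaussian_aic(rss_through_origin(e, s), n, k=2)
    aic_mix = gaussian_aic(rss_with_intercept(e, s), n, k=3)
    return "mixed" if aic_mix < aic_dom else "expression-dominant"


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k out of n against p = 0.5."""
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical relative expression: PTDH3 = 10, PAGP1 = 1, three replicates each
    expr = [10, 10, 10, 1, 1, 1]
    proportional = [-0.100, -0.101, -0.099, -0.010, -0.011, -0.009]
    constant = [-0.050, -0.051, -0.049, -0.050, -0.051, -0.049]
    print(preferred_model(expr, proportional))  # expression-dominant
    print(preferred_model(expr, constant))      # mixed
```

Because the models are nested, the mixed fit never has a larger residual sum of squares. The 2k penalty is what lets the simpler expression-dominant model win when the intercept adds nothing.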

### Extended Data Fig. 4 Additional tests on the error avoidance models.

(a and b) These panels are similar to Fig. 4e, f except that the tRNA adaptation index (tAI) was used as a proxy to test the mistranslation avoidance hypothesis. (c and d) These panels are similar to Fig. 4c, d except that the misinteraction avoidance hypothesis was tested. For this purpose, the mutation-triggered increment of hydrophobicity was multiplied by the relative solvent accessibility (RSA) as a proxy for the probability of misinteraction. Only amino acids at the protein surface (RSA > 0.4) were considered; altering the RSA cutoff used to define the protein surface did not change the conclusion (data not shown). (e–j) These panels are similar to Fig. 4c, e, g, d, f, h except that the fitness landscape of URA3 in YPD was used. These results were largely consistent with those presented in Fig. 4c–h, except for the functional constraints for misinteraction (i). We suspect that this exception arises from mutations within the 75–100% quantile of Pmisfold. For these mutations, the fraction of misfolded URA3 may be so high that too few correctly folded URA3 molecules remain when expression is driven by PAGP1, compared with when it is driven by PTDH3. (k–m) These panels are similar to Fig. 4c, e, g except that the fitness landscape of URA3 in SC was used. (n–p) These panels are similar to Fig. 4c, e, g except that the fitness landscape of URA3 in SC-URA was used.
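The misinteraction proxy in panels (c) and (d) multiplies a mutation's change in hydrophobicity by the residue's RSA and keeps only surface residues (RSA > 0.4). The caption does not name a hydrophobicity scale. The sketch below therefore makes these assumptions:
- it uses the Kyte–Doolittle hydropathy scale;
- it takes the signed difference (mutant minus wild type);
- it does not handle nonsense changes;
- its function name is hypothetical.

```python
from typing import Optional

# Kyte & Doolittle hydropathy values; the scale is an assumption, not stated in the caption
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

SURFACE_RSA = 0.4  # residues with RSA above this value are treated as surface-exposed


def misinteraction_proxy(wt_aa: str, mut_aa: str, rsa: float,
                         cutoff: float = SURFACE_RSA) -> Optional[float]:
    """Hydrophobicity increment weighted by RSA; None for non-surface residues."""
    if rsa <= cutoff:
        return None  # only surface residues are considered
    # Signed change: positive when the substitution makes the residue more hydrophobic
    increment = KYTE_DOOLITTLE[mut_aa] - KYTE_DOOLITTLE[wt_aa]
    return increment * rsa


if __name__ == "__main__":
    print(misinteraction_proxy("K", "L", 0.5))  # surface Lys -> Leu: about 3.85
    print(misinteraction_proxy("K", "L", 0.3))  # buried residue: None
```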

### Extended Data Fig. 5 Relative contribution of different error avoidance models in other fitness landscapes.

(a–c) These panels are similar to Fig. 5c except that other measured fitness landscapes, as indicated above each panel, were used to estimate the relative contribution of misfolding avoidance (red), misinteraction avoidance (blue) and the mRNA folding requirement (green).

## Supplementary information

### Supplementary Information

Supplementary Tables 1–6.

## Cite this article

Wu, Z., Cai, X., Zhang, X. et al. Expression level is a major modifier of the fitness landscape of a protein coding gene. Nat Ecol Evol (2021). https://doi.org/10.1038/s41559-021-01578-x