Genetic variation segregating within a species reflects the combined activities of mutation, selection, and genetic drift. In the absence of selection, polymorphisms are expected to be a random subset of new mutations; thus, comparing the effects of polymorphisms and new mutations provides a test for selection (refs 1–4). When evidence of selection exists, such comparisons can identify properties of mutations that are most likely to persist in natural populations (ref. 2). Here we investigate how mutation and selection have shaped variation in a cis-regulatory sequence controlling gene expression by empirically determining the effects of polymorphisms segregating in the TDH3 promoter among 85 strains of Saccharomyces cerevisiae and comparing their effects to a distribution of mutational effects defined by 236 point mutations in the same promoter. Surprisingly, we find that selection on expression noise (that is, variability in expression among genetically identical cells; ref. 5) appears to have had a greater impact on sequence variation in the TDH3 promoter than selection on mean expression level. This is not necessarily because variation in expression noise impacts fitness more than variation in mean expression level, but rather because of differences in the distributions of mutational effects for these two phenotypes. This study shows how systematically examining the effects of new mutations can enrich our understanding of evolutionary mechanisms. It also provides rare empirical evidence of selection acting on expression noise.
Smith, J. D., McManus, K. F. & Fraser, H. B. A novel test for selection on cis-regulatory elements reveals positive and negative selection acting on mammalian transcriptional enhancers. Mol. Biol. Evol. 30, 2509–2518 (2013)
Denver, D. R. et al. The transcriptional consequences of mutation and natural selection in Caenorhabditis elegans. Nature Genet. 37, 544–548 (2005)
Stoltzfus, A. & Yampolsky, L. Y. Climbing mount probable: mutation as a cause of nonrandomness in evolution. J. Hered. 100, 637–647 (2009)
Rice, D. P. & Townsend, J. P. A test for selection employing quantitative trait locus and mutation accumulation data. Genetics 190, 1533–1545 (2012)
Raser, J. M. & O’Shea, E. K. Control of stochasticity in eukaryotic gene expression. Science 304, 1811–1814 (2004)
McAlister, L. & Holland, M. J. Differential expression of the three yeast glyceraldehyde-3-phosphate dehydrogenase genes. J. Biol. Chem. 260, 15019–15027 (1985)
Pierce, S. E., Davis, R. W., Nislow, C. & Giaever, G. Genome-wide analysis of barcoded Saccharomyces cerevisiae gene-deletion mutants in pooled cultures. Nature Protocols 2, 2958–2974 (2007)
Ringel, A. E. et al. Yeast Tdh3 (glyceraldehyde 3-phosphate dehydrogenase) is a Sir2-interacting factor that regulates transcriptional silencing and rDNA recombination. PLoS Genet. 9, e1003871 (2013)
Gruber, J. D., Vogel, K., Kalay, G. & Wittkopp, P. J. Contrasting properties of gene-specific regulatory, coding, and copy number mutations in Saccharomyces cerevisiae: frequency, effects and dominance. PLoS Genet. 8, e1002497 (2012)
Liti, G. et al. Population genomics of domestic and wild yeasts. Nature 458, 337–341 (2009)
Schacherer, J., Shapiro, J. A., Ruderfer, D. M. & Kruglyak, L. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458, 342–345 (2009)
Lynch, M. et al. A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl Acad. Sci. USA 105, 9272–9277 (2008)
Patwardhan, R. P. et al. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nature Biotechnol. 27, 1173–1175 (2009)
Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nature Biotechnol. 30, 271–279 (2012)
Patwardhan, R. P. et al. Massively parallel functional dissection of mammalian enhancers in vivo. Nature Biotechnol. 30, 265–270 (2012)
Kwasnieski, J. & Mogno, I. Complex effects of nucleotide variants in a mammalian cis-regulatory element. Proc. Natl Acad. Sci. USA 109, 19498–19503 (2012)
Yagi, S., Yagi, K., Fukuoka, J. & Suzuki, M. The UAS of the yeast GAPDH promoter consists of multiple general functional elements including RAP1 and GRF2 binding sites. J. Vet. Med. Sci. 56, 235–244 (1994)
Baker, H. V. et al. Characterization of the DNA-binding activity of GCR1: in vivo evidence for two GCR1-binding sites in the upstream activating sequence of TPI of Saccharomyces cerevisiae. Mol. Cell. Biol. 12, 2690–2700 (1992)
Hornung, G. et al. Noise-mean relationship in mutated promoters. Genome Res. 22, 2409–2417 (2012)
Lehner, B. Selection to minimise noise in living systems and its implications for the evolution of gene expression. Mol. Syst. Biol. 4, 1–6 (2008)
Fraser, H. B., Hirsh, A. E., Giaever, G., Kumm, J. & Eisen, M. B. Noise minimization in eukaryotic gene expression. PLoS Biol. 2, 834–838 (2004)
Wang, Z. & Zhang, J. Impact of gene expression noise on organismal fitness and the efficacy of natural selection. Proc. Natl Acad. Sci. USA 108, E67–E76 (2011)
Newman, J. R. S. et al. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441, 840–846 (2006)
Batada, N. & Hurst, L. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nature Genet. 39, 945–949 (2007)
Zhang, Z., Qian, W. & Zhang, J. Positive selection for elevated gene expression noise in yeast. Mol. Syst. Biol. 5, 1–12 (2009)
Frankel, N. et al. Phenotypic robustness conferred by apparently redundant transcriptional enhancers. Nature 466, 1–5 (2010)
Perry, M. W., Boettiger, A. N., Bothma, J. P. & Levine, M. Shadow enhancers foster robustness of Drosophila gastrulation. Curr. Biol. 20, 1562–1567 (2010)
Fontana, W. & Buss, L. “The arrival of the fittest”: toward a theory of biological organization. Bull. Math. Biol. 56, 1–64 (1994)
De Vries, H. Species and Varieties, Their Origin by Mutation 825–826 (Open Court, 1905)
Taly, J.-F. et al. Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures. Nature Protocols 6, 1669–1682 (2011)
Löytynoja, A. & Goldman, N. webPRANK: a phylogeny-aware multiple sequence aligner with interactive alignment browser. BMC Bioinform. 11, 579 (2010)
Libkind, D. et al. Microbe domestication and the identification of the wild genetic stock of lager-brewing yeast. Proc. Natl Acad. Sci. USA 108, 14539–14544 (2011)
Scannell, D. R. et al. The awesome power of yeast evolutionary genetics: new genome sequences and strain resources for the Saccharomyces sensu stricto genus. G3 1, 11–25 (2011)
Liti, G. et al. High quality de novo sequencing and assembly of the Saccharomyces arboricolus genome. BMC Genom. 14, 69 (2013)
Wang, Q.-M., Liu, W.-Q., Liti, G., Wang, S.-A. & Bai, F.-Y. Surprisingly diverged populations of Saccharomyces cerevisiae in natural environments remote from human activity. Mol. Ecol. 21, 5404–5417 (2012)
Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013)
Clement, M., Posada, D. & Crandall, K. A. TCS: a computer program to estimate gene genealogies. Mol. Ecol. 9, 1657–1659 (2000)
Ashkenazy, H., Erez, E., Martz, E., Pupko, T. & Ben-Tal, N. ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res. 38, W529–W533 (2010)
Hittinger, C. T. Saccharomyces diversity and evolution: a budding model genus. Trends Genet. 29, 309–317 (2013)
Wittkopp, P. J., Haerum, B. K. & Clark, A. G. Evolutionary changes in cis and trans gene regulation. Nature 430, 85–88 (2004)
Wittkopp, P. J. in Molecular Methods for Evolutionary Genetics Vol. 772 (eds Orgogozo, V. & Rockman, M. V. ) 297–317 (Humana, 2011)
Gietz, R. & Woods, R. in Methods in Molecular Biology 2nd edn, Vol. 313 (ed. Xiao, W. ) 107–120 (Springer, 2006)
Kudla, G., Murray, A., Tollervey, D. & Plotkin, J. Coding-sequence determinants of gene expression in Escherichia coli. Science 324, 255–258 (2009)
Lo, K., Hahne, F., Brinkman, R. R. & Gottardo, R. flowClust: a Bioconductor package for automated gating of flow cytometry data. BMC Bioinform. 10, 145 (2009)
Hahne, F. et al. flowCore: a Bioconductor package for high throughput flow cytometry. BMC Bioinform. 10, 106 (2009)
R Core Team. R: A language and environment for statistical computing (2013)
We thank C. Maclean, J. Zhang, and C. Hittinger for strains, the University of Michigan Center for Chemical Genomics for technical assistance with flow cytometry, and J. Coolon, R. Lusk, K. Stevenson, A. Hodgins-Davis, J. Lachowiec, C. Maclean, J. Yang, C. Landry, J. Townsend, and D. Petrov for comments on the manuscript. Funding for this work was provided to P.J.W. by the March of Dimes (5-FY07-181), Alfred P. Sloan Research Foundation, National Science Foundation (MCB-1021398), National Institutes of Health (1 R01 GM108826) and the University of Michigan. Additional support was provided by the University of Michigan Rackham Graduate School, Ecology and Evolutionary Biology Department, and the National Institutes of Health Genome Sciences training grant (T32 HG000040) to B.P.H.M.; by a National Institutes of Health Genetics training grant (T32 GM007544) to D.C.Y.; by a National Institutes of Health National Research Service Award (NRSA) postdoctoral fellowship (1 F32 GM083513-0) to J.D.G.; and by a European Molecular Biology Organization postdoctoral fellowship (EMBO ALTF 1114-2012) to F.D.
The authors declare no competing financial interests.
Flow cytometry data have been deposited in the FlowRepository under Repository ID FR-FCM-ZZBN.
Extended data figures and tables
a, Locations of polymorphisms within the TDH3 promoter relative to known functional elements, including RAP1 and GCR1 transcription factor binding sites, are shown. Squares, point mutations; circles, indels. Red, G:C→A:T; yellow, G:C→T:A; blue, G:C→C:G; orange T:A→C:G; green, T:A→G:C; purple, T:A→A:T. b, The log2 ratio of total expression divergence between natural isolates and a reference strain (x axis) versus the log2 ratio of total cis-regulatory expression divergence between natural isolates and the reference strain (y axis). Error bars, 95% confidence intervals. The 25 of 48 strains with significant cis-regulatory differences from the reference strain are shown in blue. Reference strain is shown in red. These data show differences in cis- and trans-regulation among strains, but do not reveal the evolutionary changes that give rise to these differences.
a, The TDH3 promoter haplotype network is shown with the inferred ancestral strain at the left. Circles represent haplotypes observed among the 85 strains, with their diameters proportional to haplotype frequency. The haplotypes are coloured according to clade (Supplementary Table 1). Triangles are haplotypes that were not observed among the strains sampled, but must exist or have existed as intermediates between observed haplotypes. Squares are possible intermediates connecting two observed haplotypes, but it is unknown which of these actually exists or existed in S. cerevisiae. Solid lines connect haplotypes that differ by a single mutation; dashed lines connect haplotypes that differ by multiple mutations. Mutations on each branch are coloured by the mutation type as in Extended Data Fig. 1a. b, Relationship between the effect of a polymorphism on mean expression level and the frequency of that polymorphism among the strains sampled (P = 0.43). c, Relationship between the effect of a polymorphism on expression noise and the frequency of that polymorphism among the strains sampled (P = 0.0028).
Distributions of effects on mean expression level from previous random mutagenesis experiments are shown partitioned by mutation type. For each mutation type, the distribution (inside) and density (outside, coloured) of the effects on mean expression level are shown. The number of mutations tested for each promoter is shown in the upper right corner of each panel. a, Bacteriophage SP6 promoter. b, Bacteriophage T3 promoter. c, Bacteriophage T7 promoter. d, Human CMV promoter. e, Human HBB promoter. f, Human S100A4/PEL98 promoter. g, Synthetic cAMP-regulated enhancer. h, Interferon-β enhancer. i, ALDOB enhancer. j, ECR11 enhancer. k, LTV1 enhancer replicate 1. l, LTV1 enhancer replicate 2. m, Rhodopsin promoter. Red: bacteriophage promoters from ref. 13. Blue: mammalian promoters from ref. 13. Green: mammalian enhancers from ref. 14. Yellow: mammalian promoters from ref. 15. Purple: promoter from ref. 16. n, Distribution of effects for C→T (red) and G→A (blue) mutations for mean expression level in this study. o, Same as n, but for expression noise. p, Distribution of effects for C→T/G→A polymorphisms compared with other polymorphism types for mean expression level in this study. q, Same as p, but for gene expression noise.
a, Correlation between mean expression level (x axis) and expression noise (y axis) for the 236 point mutations in the TDH3 promoter (R² = 0.85). Grey points correspond to mutations in known transcription factor binding sites. Coloured points correspond to individual mutations highlighted in c–f. b, Alternative plot showing the majority of data from a more clearly; grey and coloured points are the same as in a. c, Distribution of gene expression phenotypes from a mutant (blue) with decreased mean expression level but similar expression noise as the reference strain (black). Outside the known TFBS, 50% of mutations decreased mean expression. d, Distribution of gene expression phenotypes from a mutant (red) with increased mean expression level but similar gene expression noise as the reference strain (black). Outside the known TFBS, 50% of mutations increased mean expression. e, Distribution of gene expression phenotypes from a mutant (brown) with decreased gene expression noise but similar mean expression level as the reference strain (black). Outside the known TFBS, 13% of mutations decreased expression noise. f, Distribution of gene expression phenotypes from a mutant (green) with increased gene expression noise but similar mean expression level as the reference strain (black). Outside the known TFBS, 87% of mutations increased expression noise.
a–h, Tests for selection using likelihood. a, The distribution of likelihood values for 100,000 randomly sampled sets of 45 mutations drawn from the mutational effect distribution is shown for mean expression level. The average likelihood for all samples of mutations tested (red) as well as the likelihood of the observed polymorphisms (blue) are also shown. b, Same as a, but for expression noise. The average likelihood for all mutation samples tested is shown in brown and the likelihood of the observed polymorphisms is shown in green. c, Same as a, but with the large effect mutations in the TFBS removed from the mutational effect distribution used for sampling. d, Same as b, but after removing the mutations in the TFBS from the mutational effect distribution. e, Same as a, but using only G→A and C→T polymorphisms. f, Same as b, but using only G→A and C→T polymorphisms. g, Distribution of likelihoods for 10,000 random walks along the TDH3 promoter haplotype network using the effects from the mutational distribution. h, Same as g, but for expression noise. i–n, Tests for selection using average effects. i, The distribution of average effects for 100,000 randomly sampled sets of 45 mutations drawn from the mutational effect distribution is shown for mean expression level (black). Polymorphisms do not have a significantly different average mean expression (blue, 99.5%) than sets of mutations (red, 98.8%; P = 0.16438). This figure is comparable to Extended Data Fig. 5a, but uses average effects instead of the likelihoods to test for differences in distribution between random mutations and polymorphisms. j, Same as i, but for expression noise. Polymorphisms have significantly lower average expression noise (green, 102.1%) than sets of random mutations (brown, 110.9%; P < 0.00001). k, Same as i, but with the large effect mutations in the TFBS removed from the mutational effect distribution used for sampling (polymorphisms, 99.5%; mutations, 99.6%; P = 0.37602). l, Same as j, but after removing the mutations in the TFBS from the mutational effect distribution (polymorphisms, 102.1%; mutations, 104.8%; P = 0.00002). m, Same as i, but using only G→A and C→T polymorphisms (polymorphisms, 99.7%; mutations, 98.8%; P = 0.21656). n, Same as j, but using only G→A and C→T polymorphisms (polymorphisms, 100.0%; mutations, 110.9%; P < 0.00001).
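The average-effect test in panels i–n can be illustrated with a short resampling sketch. This is not the authors' R script: the function name `resampling_p_value`, the choice to sample with replacement, and the one-sided direction of the comparison are all assumptions made for illustration. The caption specifies only that 100,000 sets of 45 mutations were drawn from the mutational effect distribution and compared with the observed polymorphisms.

```python
import random
import statistics


def resampling_p_value(mutation_effects, polymorphism_effects,
                       n_samples=100_000, seed=0):
    """Empirical one-sided P value for 'polymorphisms have a lower average
    effect than random sets of new mutations'.

    Assumptions (not stated in the source): sets are drawn with replacement,
    and the P value is the fraction of random sets whose mean is at or below
    the observed polymorphism mean.
    """
    rng = random.Random(seed)
    set_size = len(polymorphism_effects)      # e.g. 45 in the caption
    observed = statistics.fmean(polymorphism_effects)

    at_or_below = 0
    for _ in range(n_samples):
        # Draw one random set of mutational effects of the same size.
        sample = rng.choices(mutation_effects, k=set_size)
        if statistics.fmean(sample) <= observed:
            at_or_below += 1

    return observed, at_or_below / n_samples
```

Effects here would be expressed as a percentage of the reference allele, as in the caption (for example 102.1% versus 110.9% for expression noise). A small P value indicates that random sets of mutations rarely have an average as low as the polymorphisms.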
Extended Data Figure 6 Test for selection using alternative metrics for quantifying gene expression noise.
a–d, Distributions of effects for mutations on gene expression noise across the TDH3 promoter with expression noise quantified as σ (a), σ²/μ² (b), σ²/μ (c), and residuals from the regression of σ on μ (d). e–h, Distributions of effects for mutations on gene expression noise (brown) compared with polymorphisms (green) with noise quantified as σ (e), σ²/μ² (f), σ²/μ (g), and residuals from the regression of σ on μ (h). i–l, The maximum likelihood fitness function (middle, black) relating the distribution of mutational effects (top, brown) to the distribution of observed polymorphisms (bottom, green) for expression noise quantified as σ (i), σ²/μ² (j), σ²/μ (k), and residuals from the regression of σ on μ (l). m–p, Changes in expression noise observed among haplotypes over time in the inferred haplotype network (Extended Data Fig. 2a) are shown in green. The brown background represents the 95th, 90th, 80th, 70th, 60th, and 50th percentiles, from light to dark, for expression noise resulting from 10,000 independent simulations of phenotypic trajectories in the absence of selection where noise is quantified as σ (m), σ²/μ² (n), σ²/μ (o), and residuals from the regression of σ on μ (p). q, P values for tests of selection using mean expression (μ) and five metrics of expression noise, including σ/μ which is used throughout the main text.
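The five noise metrics named in this caption can be computed from single-cell expression values as in the following sketch. The function names and dictionary keys are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption; the source does not say which estimator was used.

```python
import statistics


def noise_metrics(expression_values):
    """Per-genotype noise metrics from single-cell expression values.

    Uses the population standard deviation; this is an assumption, not a
    detail taken from the source.
    """
    mu = statistics.fmean(expression_values)
    sigma = statistics.pstdev(expression_values)
    return {
        "mu": mu,
        "sigma": sigma,                        # standard deviation
        "sigma_over_mu": sigma / mu,           # coefficient of variation
        "sigma2_over_mu2": sigma ** 2 / mu ** 2,
        "sigma2_over_mu": sigma ** 2 / mu,     # Fano factor
    }


def regression_residuals(mus, sigmas):
    """Residuals from an ordinary least-squares regression of sigma on mu,
    computed across genotypes (one mu and one sigma per genotype)."""
    mean_mu = statistics.fmean(mus)
    mean_sigma = statistics.fmean(sigmas)
    sxx = sum((m - mean_mu) ** 2 for m in mus)
    sxy = sum((m - mean_mu) * (s - mean_sigma) for m, s in zip(mus, sigmas))
    slope = sxy / sxx
    intercept = mean_sigma - slope * mean_mu
    return [s - (intercept + slope * m) for m, s in zip(mus, sigmas)]
```

The residual metric differs from the others in that it is defined only relative to a set of genotypes: it measures how much noisier a genotype is than expected given its mean expression.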
Extended Data Figure 7 Effects of mutations and polymorphisms on a second trans-regulatory background.
a, A comparison between effects of mutations on mean expression in the original trans-regulatory background (x axis) and a hybrid trans-regulatory background between BY4741 and YPS1000 (y axis). Error bars, 95% confidence intervals. b, Same as a, but for gene expression noise. c, Effects of individual mutations on mean expression level in the hybrid trans-regulatory background are shown in terms of the percentage change relative to the un-mutagenized reference allele, and are plotted according to the site mutated in the 678 bp region (significant mutations: red lines, t-test, Bonferroni corrected). Note that most mutations decrease expression, unlike in the original genetic background. d, Same as c, but for gene expression noise (significant mutations: brown lines, t-test, Bonferroni corrected). e, Distribution of de novo mutation effects in the second trans-regulatory background (red) compared with the effects of naturally occurring haplotypes in this trans-regulatory background (blue). Inset: the distribution of likelihood values for 100,000 randomly sampled sets of 27 mutations drawn from the mutational effect distribution is shown for mean expression level. The average likelihood for all samples of mutations tested (red) as well as the likelihood of the observed polymorphisms (blue) are also shown (P = 0.2584). Removing mutations in the known TFBS resulted in a significant difference between mutations and polymorphisms (P = 0.00781). f, Same as e, but for gene expression noise. Mutations, brown. Polymorphisms, green (P = 0.00037). Removing mutations in the known TFBS did not change this result (P < 0.00001).
a, Raw data from the flow cytometer are shown for the first control sample collected. Each point is an individual event scored by the flow cytometer, the vast majority of which are expected to be cells. FSC.A is a proxy for cell size, and FL1.A is a measure of YFP fluorescence. Log₁₀ values are plotted both for FSC.A and for FL1.A. b, The same sample is shown after events found in the negative control sample (using hard gates on FSC.A and FL1.A) were excluded. c, The same sample is shown after flowClust was used to remove events likely to be from multiple cells entering the detector simultaneously. d, The same sample is shown after flowClust was used to isolate the densest homogeneous population within the sample. The R² value shown is the correlation between YFP fluorescence and cell size. e, After correcting for differences in cell size, the correlation between YFP fluorescence and cell size was nearly 0 and not significant. In all panels, the number of events analysed (that is, sample size) is shown in the bottom right corner. Box plots of mean expression of control samples before (red) and after (blue) correcting for the effects of individual plates for each day on which samples were run (f), for replicates nested within day (g), for array nested within day and replicate (h), for stack nested within day (i), for depth nested within day (j), for order nested within day and replicate (k), for row nested within array (l), for column nested within array (m), for block nested within array (n), and for the final cell count (o). The y axis is in arbitrary units. p–x, Same as f–o, but for gene expression noise.
a, The effects on mean expression level for each of the 28 mutations tested on both the reference haplotype (x axis) and natural haplotype A observed in wild strains (y axis) are shown. These two haplotypes differ by a single point mutation. Solid lines show expression from the PTDH3 haplotypes on which the two sets of mutations were created, both of which were defined as 100% activity. The grey line shows y = x. The dashed line shows the consistent increase in mean expression level when these mutations were tested on haplotype A. Error bars, 95% confidence intervals. Coloured points have significantly different effects on the two backgrounds (P < 0.05, ANOVA, Bonferroni corrected), indicating weak epistasis. b, Same as a, but for gene expression noise. c, Distributions of mutational effects for mean expression levels based on the 236 point mutations tested on the reference haplotype (red) as well as for the 28 mutations tested on haplotype A (blue). d, Same as c, but for gene expression noise. e, The effect on mean expression of the full TDH3 promoter (red) compared with promoters containing six fewer base pairs at the 5′ end (blue). Each box plot summarizes data from nine replicates. f, Same as e, but for expression noise.
a, A histogram summarizing the mutational effects on mean expression level is shown (red), overlaid with the density curve (black line) used to calculate the likelihood of an effect on mean expression level. b, Same as a, but for expression noise. c, Density curves for the effects of one (red), two (blue), three (green), four (purple), or five (black) mutations randomly drawn from the distribution of mutational effects observed for mean expression level. d, Same as c, but for expression noise.
FASTA formatted sequences of TDH3 promoter haplotypes from all naturally occurring strains and species analyzed. (TXT 64 kb)
R script for analysis of pyrosequencing data used to quantify variation in TDH3 mRNA abundance and cis-regulatory activity. (TXT 12 kb)
R script for calculating mean expression level and expression noise from the flow cytometry data. (TXT 17 kb)
R script for analyzing effects of mutations and polymorphisms, including tests for selection. (TXT 110 kb)
R script for analyzing effects of mutations and polymorphisms in a second genetic background. (TXT 34 kb)
Identity and properties of strains included in the study. (XLS 51 kb)
A list of primers used in the study. (XLS 105 kb)
Pyrosequencing data for all samples. (XLS 158 kb)
A summary of flow cytometry data. (XLS 2110 kb)
Summary of flow cytometry data in second genetic background. (XLSX 166 kb)
About this article
Cite this article
Metzger, B., Yuan, D., Gruber, J. et al. Selection on noise constrains variation in a eukaryotic promoter. Nature 521, 344–347 (2015). https://doi.org/10.1038/nature14244