Main

Assessing the natural diversity of species has the potential to uncover new genetic traits and gene functions. Moreover, natural isolate collections are increasingly powerful tools for probing reproducibility and the generalizability of experimental findings. In budding yeast, sequencing collections of natural isolates1,2,7,8,9 revealed that around 20% of the isolates exhibit stable aneuploidies—imbalances in chromosome copy numbers. This finding contrasts with earlier studies of laboratory strains, in which aneuploidies were unstable and exacted fitness costs3,5,10. The high natural prevalence implies that aneuploidies are beneficial, at least under some conditions, and that natural isolates have strategies to mitigate fitness costs1,4. Indeed, aneuploidies can increase stress resistance and drug tolerance in yeast11,12,13,14,15,16, virulence and immune escape in protists17 and malignancy, invasiveness and drug tolerance in human cancer cells18,19,20,21,22. However, how cells mediate aneuploidy tolerance is not well understood. One potential mechanism is dosage compensation—the attenuation of altered gene dosage. Dosage compensation of either individual genes or whole chromosomes at the proteome level has been observed in tissue-cultured mammalian cancer cells and in pathogens such as Leishmania donovani23,24,25,26,27,28,29,30,31. However, the diverging results obtained with two collections of lab-engineered aneuploid yeast strains have triggered a debate about whether dosage compensation is associated with aneuploidy across species, and have raised questions about which proteins are dosage compensated and through which mechanisms3,5,10,32. One of these collections consists of ‘synthetic’ disomes, in which each haploid strain carries a duplication of one of the 16 S. cerevisiae chromosomes, maintained under selection3. 
In these disomes, the gene dosage from a chromosome duplication yielded, on average, a twofold increase in mRNA and protein levels, but a fraction of proteins—namely, components of protein complexes—were attenuated specifically at the proteome level3,10. The second collection of strains was produced by inducing the meiosis of triploid or pentaploid strains and subsequent selection for stable aneuploid progeny5. Aneuploidies in this collection were more stable, but dosage compensation was minimal, and not enriched for protein-complex subunits5. Moreover, the two collections differ in that the SSD1 gene, which has been linked to aneuploidy tolerance, is truncated in the disomic collection33. Finally, contradictory aneuploidy-associated transcriptional signatures have been reported for lab-engineered aneuploid strains, which suggests that aneuploids respond dissimilarly to the presence of aneuploid chromosomes3,10,34,35.

Lab-generated synthetic aneuploids represent an early state of acquired aneuploidy; that is, they have not undergone long-term selection with the aneuploid chromosome. Dosage compensation might thus be different in natural isolates, in which most aneuploids are assumed to have been present over longer timescales36. The transcriptomes of natural aneuploids, however, resembled lab-generated aneuploids: transcripts that cause toxicity when overexpressed are dosage compensated, but the average chromosome-wide gene dosage is not attenuated and follows the chromosome copy number change4,37,38.

On the other hand, although dosage compensation is linked mainly with protein levels, the proteomes of natural isolates have yet to be investigated. Here we developed cell growth, sample preparation, data acquisition and data preprocessing strategies to generate precise proteomes for 796 environmental and domesticated natural isolates of S. cerevisiae. By integrating these proteomes, isolate by isolate, with genomes1 and transcriptomes6, we obtained a multi-omic dataset covering 613 natural isolates, including 15.5% aneuploids. Furthermore, for eight isolates, we generated a ubiquitinome dataset39, and for 55 isolates, we measured protein turnover using pulse labelling with stable amino acid isotopes40. We found that aneuploid proteomes differ markedly between natural and lab-engineered yeast. Natural isolates attenuate the aneuploid gene dosage broadly, affecting about 70% of proteins encoded on aneuploid chromosomes, across protein functions and on a chromosome-wide level. Attenuation of the proteome in natural isolates can be attributed to the induction of the ubiquitin–proteasome system (UPS) and to increases in overall rates of protein turnover.

Multi-omic data on natural yeast

We produced a multi-omic dataset for natural isolates by generating proteomes for isolates that were obtained globally from environmental and industrial niches (Supplementary Table 1), and integrating them with genomes1 and transcriptomes6 of the same isolates. For the proteomes, isolates were arrayed in 96-well plates and cultivated in a minimal synthetic medium; 933 strains reached sufficient biomass (Supplementary Table 2). We generated proteomes by adapting a SWATH mass spectrometry (SWATH-MS)-based high-throughput proteomics pipeline41,42 and by developing a computational preprocessing pipeline that accounts for the genetic diversity of natural isolates (Fig. 1a and Extended Data Fig. 1a). After extensive quality filtering, we retained 7,946 precursors that quantified 1,576 proteins across 796 isolates (Fig. 1a, Extended Data Fig. 1b,c and Supplementary Table 3). With less than 4% missing values, the proteins were quantified with a coefficient of variation (CV) of 16% across the 77 control samples, and the natural diversity was reflected in a CV of 32.8% across the isolates (Extended Data Fig. 1d). Processing of the disomic strain collection3 over the same pipeline yielded quantities for 1,377 proteins at high completeness and precision (less than 2.3% missing values; median CV of 9.5% within replicates, and 26.7% median CV across samples; Extended Data Fig. 1e,f and Supplementary Table 4).
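As an illustration of the precision metric quoted above, the following minimal sketch computes a per-protein coefficient of variation and summarizes it across proteins. It assumes that the reported CV is a median over per-protein CVs calculated on linear-scale abundances, which the text does not state explicitly; the function names and the toy values are hypothetical.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Return the CV in percent for one protein, ignoring missing values (None)."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return None  # a CV is undefined for fewer than two measurements
    return 100.0 * stdev(observed) / mean(observed)


def median_cv(protein_table):
    """Median of per-protein CVs; protein_table maps protein -> list of abundances."""
    cvs = [coefficient_of_variation(v) for v in protein_table.values()]
    return median(cv for cv in cvs if cv is not None)


# Hypothetical linear-scale abundances for two proteins in repeated control samples.
controls = {
    "protein_A": [100.0, 110.0, 90.0, 100.0],
    "protein_B": [50.0, 55.0, None, 45.0],
}
print(round(median_cv(controls), 2))
```

Applied to the control samples, such a summary reflects technical variation; applied across isolates, it would reflect technical plus biological variation.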

Fig. 1: High-throughput proteomics pipeline and assembly of a cross-omics dataset for studying aneuploidy in natural yeast isolates.

a, S. cerevisiae isolates were cultivated in synthetic minimal medium in 96-well format. Cells were collected by centrifugation at mid-log phase and lysed by bead beating under denaturing conditions. The lysate was then treated with reducing and alkylating reagents and digested with trypsin. The resulting peptides were desalted by solid-phase extraction (SPE) and analysed by liquid chromatography–tandem mass spectrometry (LC–MS/MS) in data-independent acquisition (DIA) mode using SWATH-MS. Data were integrated using DIA-NN. b, Integrated dataset of 613 natural S. cerevisiae isolates. Relative chromosome copy number (CN), relative median mRNA levels and relative median protein abundances per chromosome between isolates and euploid reference (log2 ratios strain/euploid) are shown. The median across all euploid isolates was used as the reference for ratio calculations. Isolate ploidy is indicated at the top of the heat map.

For integrating the proteomes with genomes and transcriptomes, we included isolates that either were euploid or had whole-chromosome aneuploidies1, and excluded all isolates with potential inconsistencies within or between the genome and transcriptome (Extended Data Fig. 1g, Supplementary Tables 5 and 6 and Methods). The integrated dataset represents 613 haploid to pentaploid natural isolates, including 95 with at least one aneuploid chromosome (Fig. 1b and Extended Data Fig. 1h). Aneuploidy was seen for all chromosomes (Extended Data Fig. 1i). Chromosome gains were more common (88.4%) than losses (11.6%). Chromosomes 1, 9 and 3 were most frequently aneuploid, whereas aneuploidy was rare for chromosomes 2, 4, 10 and 13 (Fig. 1b and Extended Data Fig. 1i). The low aneuploidy frequency of these chromosomes could, at least partially, be linked to their large size36. Furthermore, aneuploidy was more frequent in isolates with higher basal ploidy, and 26 strains had complex aneuploidies (Fig. 1b, Extended Data Fig. 1h–j and Supplementary Table 7).

Dosage compensation in natural aneuploids

Gene-by-gene analysis of transcript and protein levels relative to chromosome copy number changes revealed a high degree of dosage compensation at the proteome level in natural isolates. For example, Gvp36 (a Golgi vesicle protein), Ccr4 (a component of the CCR4–NOT complex) and Rps20 (a component of the small ribosomal subunit) all had minimal attenuation at the mRNA level (Fig. 2a; mRNA attenuation slopes of 0.9, 0.98 and 1.11, respectively; Supplementary Table 8). At the protein level, Gvp36 was not attenuated, but Ccr4 and Rps20 were partially and strongly attenuated, respectively (Fig. 2a; protein attenuation slopes of 0.97, 0.69 and 0.12, respectively; Supplementary Table 8). In total, 70.5% of all proteins encoded by genes on the aneuploid chromosomes were dosage compensated at the proteome level (Fig. 2b). Consistent with attenuation acting mainly at the protein level, the correlation between mRNA- and protein-level attenuation was weak (Spearman’s ρ = 0.20, P = 4.9 × 10⁻⁹; Extended Data Fig. 2a). The fraction of attenuated proteins was consistently higher than the fraction of attenuated mRNAs (Fig. 2b), and the median attenuation slope across all genes expressed on aneuploid chromosomes was markedly lower, indicating stronger attenuation, for proteins than for mRNAs (0.65 versus 0.92; Extended Data Fig. 2b).
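The attenuation slope can be illustrated with a minimal sketch: for one gene, the log2 expression ratio is regressed against the log2 copy number ratio, so that a slope of 1 corresponds to no buffering and a slope of 0 to complete buffering. The sketch assumes ordinary least squares with an intercept and treats a gene as attenuated when its slope falls below 0.85 (the analysis threshold shown in Fig. 2b); the fitting details used in the study may differ, and all names and toy values are hypothetical.

```python
from math import log2


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def fraction_attenuated(slopes, threshold=0.85):
    """Fraction of genes whose slope lies below the threshold (assumed definition)."""
    return sum(s < threshold for s in slopes) / len(slopes)


# Hypothetical gene observed on aneuploid chromosomes in four isolates:
# monosomy in a diploid, tetrasomy in a triploid, trisomy in a diploid, disomy in a haploid.
log2_cn = [log2(1 / 2), log2(4 / 3), log2(3 / 2), log2(2 / 1)]
log2_protein = [0.12 * v for v in log2_cn]  # strongly buffered toy protein

print(round(ols_slope(log2_cn, log2_protein), 2))
print(fraction_attenuated([0.97, 0.69, 0.12]))  # protein slopes quoted above for Gvp36, Ccr4 and Rps20
```

With the three protein slopes quoted in the text, two of three genes fall below the threshold, matching the description of Gvp36 as not attenuated and of Ccr4 and Rps20 as attenuated.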

Fig. 2: Gene-by-gene quantification of dosage compensation.

a, Linear regression of relative mRNA and protein levels against relative copy number changes for three exemplary genes with different extents of protein-level dosage compensation. Each point shows the log2 expression when expressed from an aneuploid chromosome in an isolate (GVP36: chr. 9, relative CN change > 0 in 24 points; CCR4: chr. 1, CN change > 0 in 29 points; RPS20: chr. 8, CN change > 0 in 9 points). The expected behaviour of a non-buffered gene (y = x, dotted grey line) is shown. b, Cumulative distribution analysis for natural isolates and disomic strains, comparing the fraction of genes attenuated at the mRNA or protein level on the basis of the distribution slopes. The vertical dotted grey line denotes an analysis threshold of 0.85, and the horizontal dotted grey lines and numbers indicate the fraction of proteins attenuated at this threshold. c, Proportion of genes in natural isolates and in synthetic disomes that are part of macromolecular complexes. d, Proportion of proteins that are not part of macromolecular complexes and attenuated or not in natural isolates or synthetic disomes. For a–d, the regression analysis was performed separately for natural isolates and disomic strains (827 and 680 proteins included in the analysis, respectively). e, For proteins (dots) that are either attenuated (n = 583) or not (n = 244) when expressed from aneuploid chromosomes of natural isolates, comparison of the number of complexes a protein is part of (P = 4.1 × 10⁻⁸ (without adjustment); P = 9.3 × 10⁻⁷ (with adjustment)), the number of experimentally confirmed protein–protein interactions (PPIs) (P = 0.056; P = 0.20), the number of ubiquitination sites per protein obtained from UniProt (integrating both experimental and computational resources; P = 0.023; P = 0.13) and the general variability in protein abundance (P = 4.4 × 10⁻³; P = 0.034). White diamonds represent means. P values refer to significance testing by two-sample, two-sided Wilcoxon tests without or with Benjamini–Hochberg adjustment.

By contrast, for the disome collection, the fraction of attenuated proteins was lower than it was in the natural isolates at any given threshold (Fig. 2b and Supplementary Table 9), whereas median attenuation slopes across all genes for proteins and mRNAs were similar (0.87 versus 0.94; Extended Data Fig. 2b). Furthermore, when comparing synthetic disomic strains (1n + 1) to natural isolates with chromosome duplications (1n + 1 or 2n + 2; Extended Data Fig. 2c), most proteins attenuated in disome strains were attenuated in natural isolates (around 69%), whereas proteins that were not attenuated in the synthetic aneuploids were also attenuated in the natural disomes (Extended Data Fig. 2d,e). Thus, attenuation is specific to the proteome, and is more prevalent in natural isolates than in lab strains, both qualitatively and quantitatively.

Previous work on the synthetic disomes and in mammalian cell lines explained dosage compensation with the attenuation of surplus subunits of protein complexes10,25. In natural isolates, functional analysis of attenuated proteins found that 44% of attenuated proteins are components of macromolecular complexes (Extended Data Fig. 2f). However, most (63%) proteins that are not known to be macromolecular-complex components are also attenuated in natural isolates (Fig. 2c,d). For instance, proteins involved in metabolic pathways such as the TCA cycle, glycolysis and gluconeogenesis are attenuated (Supplementary Table 10). By contrast, only some metabolic processes associated with growth were not attenuated, including the glutathione, pyrimidine, pantothenate and CoA, and glycerophospholipid pathways, as well as the alanine, aspartate and glutamate pathway (Supplementary Table 10). Thus, we asked whether attenuation is determined by other factors. We found that, in addition to protein-complex membership, attenuation at the protein level correlated with the number of potential ubiquitination sites. The significance of the observed relationship was not robust to multiple testing correction in our dataset, but has been reported previously26 (Fig. 2e and Extended Data Fig. 3). Also, consistent with the importance of protein complexes, the numbers of protein–protein interactions showed a similar trend. Moreover, proteins that exhibited lower variability in abundance across euploid isolates were more likely to be attenuated (Fig. 2e and Extended Data Fig. 2g,h). Homologues of proteins that are non-exponentially degraded in mammalian cells25 were also more likely to be attenuated (Extended Data Fig. 2i).

By analysing effect sizes across isolates, we compared chromosome-wide dosage compensation in the natural isolates and the disomes (Extended Data Fig. 4a–c and Supplementary Table 11). As in the synthetic disomes and as reported for a small selection of natural isolates3,4,10,43, the relative transcript abundances of aneuploid chromosomes centred around the relative change in copy number. In the disomes, relative protein levels measured here also centred around the relative copy number increase, with the attenuation of specific proteins (for example, complex subunits) having a shouldered distribution, confirming previous studies10 (Extended Data Fig. 4a). By contrast, for the natural isolates, the mean relative protein distribution was shifted towards the euploid state, indicating chromosome-wide dosage compensation (Extended Data Fig. 4b, Supplementary Table 11 and Supplementary Fig. 1). For example, disome 9 and natural isolate AHS have the same karyotype and similar average mRNA distributions, but only in the natural isolate was the median relative protein abundance shifted towards the euploid state (Fig. 3a). Across all isolates, absolute attenuation was stronger with lower overall ploidy and for larger chromosomes, but also diverged widely between isolates (Fig. 3b, Extended Data Fig. 4d–f and Supplementary Tables 11 and 12). For example, relative protein levels in diploid isolate CFV were attenuated by 58%, whereas in triploid isolate ABV they were attenuated by only 14% (Fig. 3c and Supplementary Table 12). Chromosome-wide attenuation was significant for all chromosome gains and losses (Extended Data Fig. 4a–c and Supplementary Table 11, except for 3n−1 chromosomes), as were the differences between the proteome and the transcriptome (Extended Data Fig. 4a,b). These differences were robust across different growth stages (Extended Data Fig. 4g,h and Supplementary Table 13). 
Quantitatively, relative protein expression in the natural isolates was attenuated by an average of 25% across relative chromosome copy number changes (Fig. 3b).
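One way to read the chromosome-wide percentages is sketched below: the median log2 protein ratio of all genes on the aneuploid chromosome is compared with the log2 copy number ratio expected without buffering. This is an assumed definition chosen for illustration (attenuation = 1 − observed/expected on the log2 scale), not necessarily the exact calculation behind the reported values; the function name and toy ratios are hypothetical.

```python
from math import log2
from statistics import median


def chromosome_attenuation(log2_protein_ratios, copies_in_isolate, copies_in_euploid):
    """Fraction of the expected log2 dosage change that is not seen at the protein level.

    0.0 means protein levels follow copy number; 1.0 means a full return to euploid levels.
    """
    expected = log2(copies_in_isolate / copies_in_euploid)
    observed = median(log2_protein_ratios)
    return 1.0 - observed / expected


# Hypothetical trisomic chromosome in a diploid isolate (expected log2 ratio of about 0.585).
gain = chromosome_attenuation([0.20, 0.25, 0.30], 3, 2)

# Hypothetical monosomic chromosome in a diploid isolate (expected log2 ratio of -1).
loss = chromosome_attenuation([-0.5, -0.75, -1.0], 1, 2)

print(round(gain, 2), round(loss, 2))
```

Under this reading, the same function covers gains and losses, because attenuation is measured relative to the expected shift in either direction.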

Fig. 3: Dosage compensation across isolates.

a, Comparison of relative mRNA and protein expression between natural isolate AHS and a disomic lab strain, both disomic for chromosome 9. Gene-by-gene log2 mRNA or protein ratios are shown, sorted by chromosomal location. Genes located on aneuploid chromosomes are coloured. The solid grey line marks 0, and dashed coloured lines indicate the medians of the aneuploid mRNA or protein expression distributions, respectively. Box plots show the distributions of log2 mRNA or protein ratios for all genes encoded on euploid or duplicated chromosomes (AHS: n = 1,513/48 euploid/aneuploid; disome 9: n = 1,322/45 euploid/aneuploid). In box plots, the centre marks the median, hinges mark the 25th and 75th percentiles and whiskers show all values that, at maximum, fall within 1.5 times the interquartile range; the median of the distributions is shown below the boxes. Only log2 ratios between −2 and 2 are shown to improve readability. b, Quantification of relative dosage compensation at the mRNA level (top) and at the protein level (bottom) for natural isolates and disomic lab strains. The medians of the log2 mRNA and log2 protein distributions in Extended Data Fig. 4a,b are plotted against the relative chromosome copy number change. Linear regressions for mRNA (orange) and protein (purple) levels in natural isolates and disomic lab strains are shown. Linear models and R² values are shown for natural isolates. The dotted black line indicates the expected relative expression levels under no dosage compensation (y = x). c, Scatter plot comparing the isolate-wise extent of attenuation at the mRNA and protein level. Isolates AHS, CFV and ABV are highlighted. Distributions of mRNA-level (orange) and protein-level (purple) buffering across all aneuploid isolates are shown. The pie chart shows the number of isolates that are buffered more strongly at the protein or at the mRNA level. Isolates with complex aneuploidies of different relative copy number changes, as well as isolates that probably reverted to the euploid state, were excluded.

UPS activation across natural aneuploids

Aneuploidy affects gene expression in trans; that is, on euploid chromosomes. Three trans expression signatures potentially provide insights into the physiological responses to aneuploidy: the environmental stress response (ESR)3,35,44, the aneuploidy-associated protein signature (APS)10 and the common aneuploidy gene expression (CAGE)34. However, because these signatures differed between collections of aneuploid lab strains, they were fiercely debated10,34,35. Notably, the natural isolates exhibited transcriptional signatures that resembled the ESR, APS and CAGE signatures. However, the direction of up- or downregulation was highly isolate dependent, with clusters of isolates showing completely opposite patterns of regulation (Fig. 4a and Supplementary Fig. 2). Moreover, these transcriptional signatures were largely mitigated at the proteome level. Thus, the presence of ESR, APS and CAGE signatures differs in different isolates, and these signatures are not necessarily translated to the proteome.

Fig. 4: Analysis of the trans expression response in natural aneuploid isolates.

a, mRNA and protein trans expression (log2 isolate/euploid) of genes previously implicated in the global response to aneuploidy across natural aneuploid isolates (n = 95). Genes annotated as CAGE genes34, ESR genes3,44 and APS genes10 are clustered to the left of the heat maps, with the direction of the regulation described in the reference papers indicated in red (up) or blue (down). Genes that are located on aneuploid chromosomes in a respective isolate are omitted from trans expression analyses and are therefore shown in grey. b, GSEA of median log2 protein expression ratios (isolate/euploid) for genes encoded on all euploid chromosomes across aneuploid natural isolates (n = 95, genes in trans of aneuploid chromosomes). Statistically significant enrichment scores (false discovery rate (FDR) < 0.05) are coloured in purple. The green background highlights gene sets with positive enrichment scores, the blue one gene sets with negative enrichment scores. c, Volcano plots for natural isolates (n = 95) and disomic strains (n = 9, biological triplicates) showing the results of one-sample, two-sided t-tests comparing the mean log2 protein ratios to μ = 0. Proteins with statistically significant differential expression after multiple hypothesis correction (Benjamini–Hochberg) are coloured in dark grey. Structural components of the proteasome are highlighted in blue.
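The Benjamini–Hochberg correction named in the legend for c can be sketched as follows. This is a generic implementation of the standard step-up procedure, shown for orientation only; the P values are invented and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw P values for four proteins.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

A protein would then be called differentially expressed when its adjusted value falls below the chosen false discovery rate.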

We then performed gene set enrichment analysis (GSEA) to ask whether other trans signals were consistently seen across the proteomes of natural isolates. The KEGG gene set ‘Proteasome’ was highly enriched among the upregulated proteins across all natural aneuploids relative to all euploids, and was not enriched in the proteomes of the disomes (Fig. 4b,c and Extended Data Fig. 5a). Furthermore, ‘Proteasome’ was the only enriched KEGG term that was related neither to growth-related metabolism, which is also slightly altered in the natural aneuploids1,36, nor to transcription-associated processes that are plausibly associated with aneuploidy owing to their increased need for DNA and RNA synthesis. The enrichment of the KEGG term was due to upregulation of structural components of the proteasome, such as the core and regulatory particle (Fig. 4c, Extended Data Fig. 5b,c and Supplementary Table 14), and was seen across ploidies (Extended Data Fig. 5d,e), although the effect sizes varied between isolates (Extended Data Fig. 6a). The degree of enrichment for proteasome components did not correlate with the degree of aneuploidy (Extended Data Fig. 6b), and did not seem to be driven by the transcription factor Rpn4, a transcriptional regulator of proteasome components in lab strains (Extended Data Fig. 6c–j). Thus, proteasome components are induced in natural aneuploids but not in the lab-engineered disomes.

Protein turnover and dosage compensation

We hypothesized that the proteasome could mediate dosage compensation by accelerating protein degradation and thus protein turnover. Ubiquitinomics experiments using K-GG remnant peptide profiling39 revealed an increased total amount of K-GG-modified peptides in proteins encoded on aneuploid chromosomes (Extended Data Fig. 7a and Supplementary Table 15). Ubiquitination levels correlated with increased gene dosage, but were not found super-stoichiometrically on proteins encoded on the aneuploid chromosome (Extended Data Fig. 7b). Next, we analysed the proteomes of around 4,800 deletion mutants in the lab strain BY4741, of which more than 100 had acquired aneuploidies45. When encoded on an aneuploid chromosome, proteins with a faster turnover46 were more strongly dosage compensated than were proteins with a slower turnover (Fig. 5a).

Fig. 5: Increased protein turnover in natural isolates is linked to dosage compensation.

a, Changes in protein abundance after inadvertent chromosomal duplications in aneuploid strains of the yeast deletion collection45 for proteins with short and long half-lives. Fold changes are defined as ratios between protein abundances and the median abundances of the respective protein across all strains. Long and short half-lives are defined as being >75% and <25% quantile (n = 110), respectively. P values were obtained by two-sided t-test. b, Stacked distributions of protein half-lives calculated for 55 isolates. c, Comparison of median turnover rates in euploid isolates (n = 10) versus all aneuploid isolates (n = 45) or isolates exhibiting high attenuation (n = 7). P values were determined using two-sample, two-sided Wilcoxon tests. d, Correlation between median turnover rate and protein-level dosage compensation in aneuploid isolates. Pearson correlation coefficient (PCC), P value and linear regression (blue line) are shown. The PCC and P value of the shown correlation do not change when genes expressed on aneuploid chromosomes in aneuploid isolates are excluded (PCC = 0.31, P = 0.037, two-sided). e, Comparison of median quantile-normalized turnover rates when a protein is expressed from aneuploid chromosomes, euploid chromosomes of aneuploid strains or euploid chromosomes of euploid strains. f, Distribution of PCCs between relative protein expression and isolate turnover rate determined for proteins expressed from euploid chromosomes of euploid isolates or aneuploid chromosomes. g, Distribution of PCCs for proteins expressed from aneuploid chromosomes split by protein-complex membership. In box plots, the centre marks the median, hinges mark the 25th and 75th percentiles and whiskers show all values that, at maximum, fall within 1.5 times the interquartile range.

We asked whether we could adapt dynamic SILAC40 to generate protein turnover data for natural isolates. We exploited our previous finding that prototrophic yeasts switch from self-synthesis to uptake for lysine47, and confirmed the uptake of lysine using stable-isotope-labelled lysine followed by liquid chromatography–selective reaction monitoring (Extended Data Fig. 7c–e). Natural isolates consumed lysine in preference to endogenous synthesis, indicating a full switch to lysine uptake. Indeed, most natural isolates had a higher lysine uptake than did a lysine-auxotrophic lab strain. Next, we conducted dynamic SILAC experiments on 48 haploid or diploid aneuploid isolates, including all diploid isolates with a single chromosome gain (trisomic strains), and 12 haploid or diploid euploid isolates with similar ranges of growth rates (Extended Data Fig. 8a and Supplementary Table 16). SILAC ratios were obtained for 2,400–4,800 peptides per time point (Extended Data Fig. 8b), providing information about the turnover rates for a median number of around 1,100 proteins across 55 isolates (Fig. 5b, Extended Data Fig. 8c and Supplementary Table 17).
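For orientation, a common way to turn dynamic SILAC ratios into a turnover rate is sketched below. It assumes a simple first-order model in which the unlabelled (light) fraction of a protein decays as exp(−kt) after the switch to labelled lysine, so that ln(1 + H/L) increases linearly with time with slope k, where k combines degradation and dilution by growth. The model, the fit through the origin, the function names and the toy ratios are illustrative assumptions and are not taken from the study's Methods.

```python
from math import exp, log


def turnover_rate(times_h, heavy_to_light):
    """Estimate k (per hour) by least squares through the origin on ln(1 + H/L) = k * t."""
    y = [log(1.0 + ratio) for ratio in heavy_to_light]
    numerator = sum(t * yi for t, yi in zip(times_h, y))
    denominator = sum(t * t for t in times_h)
    return numerator / denominator


def half_life(k):
    """Half-life (hours) implied by a first-order turnover rate k."""
    return log(2.0) / k


# Toy heavy/light ratios simulated from a known rate of 0.3 per hour.
times = [1.0, 2.0, 4.0]
ratios = [exp(0.3 * t) - 1.0 for t in times]

k = turnover_rate(times, ratios)
print(round(k, 3), round(half_life(k), 2))
```

Per-isolate summaries such as a median turnover rate would then be taken across the proteins quantified in that isolate.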

Natural isolates exhibited a wide range of turnover rates. Isolates with a high degree of attenuation had significantly higher rates of turnover (Fig. 5c). Furthermore, the rates of turnover correlated with the degree of attenuation across the isolates (Fig. 5d). When quantile-normalized rates of turnover were analysed as a function of whether the protein was expressed from a euploid or an aneuploid chromosome, we found that proteins expressed from euploid chromosomes, in either euploid or aneuploid strains, exhibited comparable turnover rates (Fig. 5e). By contrast, when a protein was expressed from an aneuploid chromosome, its turnover rate was more frequently higher compared with when it was expressed from a euploid chromosome in either euploid or aneuploid strains (Fig. 5e). This suggests that, in natural isolates, proteins expressed from aneuploid chromosomes are often degraded more rapidly than they are when expressed from euploid chromosomes.

The relationship between protein abundance and the overall protein turnover of isolates was different for many proteins when they were expressed from an aneuploid rather than a euploid chromosome (Fig. 5f and Extended Data Fig. 8d). For example, in euploid isolates, the abundance of Age2 did not correlate with isolate turnover; however, when Age2 was expressed from an aneuploid chromosome, its protein levels were more attenuated with increasing turnover (Extended Data Fig. 8d). Such negative correlations between protein levels and turnover rates were observed across genes expressed on aneuploid chromosomes (Fig. 5f), and were particularly evident for subunits of protein complexes (Fig. 5g). By contrast, the correlation coefficient distribution for relative protein abundances and isolate-wise turnover rates centred around 0 for proteins expressed from euploid chromosomes of euploid isolates (Fig. 5f). Notably, proteasome components were enriched among the genes with positive correlation coefficients across euploid isolates (Extended Data Fig. 8e). This is consistent with increased protein turnover being, at least partially, mediated through an increase in proteasome abundance.
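The per-protein correlations described above can be illustrated with a minimal sketch of the Pearson correlation coefficient between a protein's relative abundance and the median turnover rate of each isolate. The values are invented to mimic a protein whose levels fall as isolate turnover rises when it is encoded on an aneuploid chromosome; the variable names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented data: median turnover rate per isolate and the log2 protein ratio of one
# protein in the same isolates, in which it is encoded on an aneuploid chromosome.
isolate_turnover = [0.1, 0.2, 0.3, 0.4]
log2_protein_ratio = [0.9, 0.7, 0.6, 0.4]

pcc = pearson(isolate_turnover, log2_protein_ratio)
print(round(pcc, 3))
```

Repeating this calculation for every protein yields a distribution of coefficients of the kind summarized in Fig. 5f,g.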

Discussion

The advent and success of next-generation sequencing technologies have greatly facilitated molecular biology investigations beyond traditional model organisms, and increasingly enable the generalizability of laboratory findings to be tested1,48,49. However, investigations with natural strains had not yet been extended to the proteome. A key finding from libraries of yeast natural isolates was the frequent occurrence of aneuploidy1,2,4. This contrasts with lab strains, in which aneuploidies are often transient and impose fitness costs3,12. Here we adapted high-throughput and data-independent acquisition proteomics41,42 to quantify the proteomes of hundreds of natural S. cerevisiae isolates at high precision, and with a low number of missing values. To capture additional proteomic properties for the isolates, we generated ubiquitination profiles and protein turnover data using dynamic SILAC40. Integrating these proteomic datasets with genomes1 and transcriptomes6 generated a large, systematic, openly available community resource, which is applicable, for instance, to questions related to protein expression dynamics, smaller copy number variations, the genetic basis of protein expression and gene function.

By comparing natural aneuploids with natural euploids, and comparing the natural aneuploids with lab-generated aneuploids, we report three major observations. First, in agreement with previous literature3,4,10,37,38,43,50, at the transcriptional level, chromosome-wide average mRNA expression levels are largely unaltered, and overall transcript abundance follows chromosome copy number in all yeast isolates, with the caveat that some mRNAs diverge from a clear one-to-one relationship with the DNA copy number. Moreover, our results provide a global perspective on the effects of changes in gene expression on euploid chromosomes. Here, three trans expression signatures (ESR, APS and CAGE) were considered common responses to aneuploidy3,34,44. Although all three transcription signatures are evident amongst the aneuploids, they are sometimes up- and sometimes downregulated, and are dampened considerably at the proteome. Of course, transcriptional responses are likely to be dynamic with respect to different environmental and growth conditions as well, but these results indicate a strong background dependency of transcriptional signals, and their mitigation at the proteome suggests that some of them might not be functional.

Second, at the proteome level, where dosage compensation was previously either attributed to the attenuation of surplus protein complexes or considered minimal5,10,38, in the natural isolates we find that dosage compensation applies at a broad scale. Indeed, 70.5% of proteins encoded on aneuploid chromosomes across functional classes, including proteins that are not part of protein complexes, show attenuation, with protein levels shifting to the euploid state all across the aneuploid chromosome. The attenuation effect sizes in aneuploids remove, on average, one-quarter of the additional relative gene dosage provided by the extra chromosome. Of note, individual isolates can differ markedly from the average, with some attenuating up to half of the extra proteomic mass. When it comes to chromosome-wide dosage compensation, natural yeast isolates more closely resemble the protein attenuation observed for mammalian tissue culture cells or L. donovani, which also show dosage compensation at the chromosome-wide level26,27,28,29,30,31. A likely explanation is that the synthetic yeast aneuploids represent a state in which cells are not yet adapted to aneuploidy, whereas natural strains might have stably carried aneuploidies over long periods of time1,2. Our findings thus propose a timeline for adaptive processes that could result in chromosome-wide dosage compensation in mediating aneuploidy tolerance. When aneuploidy offers a selective advantage, as demonstrated in certain environmental or stress conditions, or in the presence of toxic substances and antifungal drugs4,5,11,12,13,14,16, the fitness benefits can outweigh the costs associated with aneuploidy. These ‘naive aneuploids’ may at first lack chromosome-wide dosage compensation and suffer fitness costs, making them transient in temporary environmental or stress conditions3,12. 
However, if environmental conditions continue to favour aneuploidy, selective pressure may select for strains with dosage compensation, to increase the stability of the aneuploid state.
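The statement that attenuation removes, on average, one-quarter of the additional relative gene dosage can be made concrete with a short calculation. The sketch below is only one simple reading of that sentence, on a linear scale; how attenuation is actually quantified is not defined in this section, and the function names are hypothetical.

```python
import math

def expected_ratio(copies_aneuploid: int, copies_euploid: int) -> float:
    """Relative gene dosage if abundance scaled one-to-one with copy number."""
    return copies_aneuploid / copies_euploid

def attenuated_ratio(expected: float, attenuation: float) -> float:
    """Ratio left after removing a fraction of the additional dosage.

    Assumption: attenuation acts on the linear-scale difference from the
    euploid level (ratio 1). attenuation = 0 means no compensation and
    attenuation = 1 means a full return to the euploid level.
    """
    return 1.0 + (1.0 - attenuation) * (expected - 1.0)

# Trisomic chromosome in a diploid isolate: three copies instead of two.
trisomy = expected_ratio(3, 2)
observed = attenuated_ratio(trisomy, attenuation=0.25)
print(f"expected {trisomy:.3f}, after attenuation {observed:.3f}, "
      f"log2 {math.log2(observed):.3f}")
```

Under this reading, a protein on a trisomic chromosome would sit at 1.375 times the euploid level instead of 1.5 times. The same function applies to chromosome losses, for which the expected ratio is below 1 and attenuation moves the level upwards.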

Third, the finding that dosage compensation dominates proteins that are not necessarily members of complexes requires new mechanistic explanations. Here, the data recorded for natural isolates highlight the role of the UPS51 and protein turnover rates: structural components of the proteasome were increased in aneuploids relative to euploids, and proteins encoded on aneuploid chromosomes showed an increased level of ubiquitination in absolute terms. The data are also consistent with the notion that the synthetic strains present a pre-adapted state, in comparison to natural isolates. Although the disomes do not show increased expression of structural components of the proteasome, they show signs of proteotoxic stress21,43,52. Furthermore, when evolved in vitro under selective antibiotic pressure to maintain the aneuploid chromosome, they accumulate adaptive mutations that influence UPS activity and increase the levels of protein degradation53. Because these data pointed to the role of protein turnover in dosage compensation, we turned to dynamic SILAC experiments. We found that proteins with a high turnover are better dosage compensated, and that natural isolates with the highest attenuation levels have a faster protein turnover. Moreover, when a protein is encoded on an aneuploid chromosome and is thus more likely to be attenuated, it is more likely to exhibit a higher turnover than when it is encoded on a euploid chromosome in either a euploid or an aneuploid strain.

How could an increase in total protein turnover mediate the attenuation of gene dosage? First, at a faster turnover rate, the principle of non-exponential degradation25 might extend from stable protein complexes to more transient interactions, as well as to interactions between proteins, small molecules or metal ions. For instance, metabolite-bound proteins, which are consequently more stable, might be degraded more slowly than unbound proteins54. This hypothesis is supported by previous observations from high-throughput proteomic experiments: proteins that turn over rapidly are less likely to be differentially expressed across the yeast deletion collection strains, probably because they are better buffered45. In addition, proteins that are generally more tightly regulated across the S. cerevisiae species have a higher propensity to be attenuated. Second, single-cell diversity and rapid clonal selection might give an advantage to those cells with a proteome closer to the most optimal level. Consistent with this idea, aneuploid isolates with chromosome losses (for example, monosomic chromosomes in a diploid strain) also compensate for these aneuploidies by increasing the protein levels closer to the euploid state. Thus, dosage compensation specifically, and buffering effects that act in trans on euploid chromosomes, could be achieved by degrading a broad range of proteins faster, and then by natural selection for cells with optimal protein levels or by regulatory processes that promote proteostasis. Our results therefore add to increasing evidence that the proteome level buffers transcriptional events. For example, we previously found that proteomic changes resulting from genetic perturbations are buffered45, and others have found that co-expression signals resulting from the three-dimensional structure of the genome are buffered as well55.

Finally, this study reveals considerable diversity in the natural yeast proteome. This diversity is evident across several dimensions, including protein abundance, ubiquitination, dosage compensation, turnover rates and responses to aneuploidy in different strains. Our findings serve as a potent reminder that outcomes observed in a single genetic background, no matter how meticulously analysed, may not be universally representative. Consequently, incorporating proteomic data alongside genomic and transcriptomic information from a broad collection of natural isolates provides a valuable resource for addressing numerous questions about the generalizability of key evolutionary, ecological, molecular and metabolic processes.

Methods

Reagents

Unless otherwise noted, reagents were purchased as follows. Bacto yeast extract (212750), Bacto peptone (211677), Bacto dehydrated agar (214010), water (LC–MS grade, Optima, 10509404), acetonitrile (ACN) (LC–MS grade, Optima, 10001334), methanol (LC–MS grade, Optima, A456-212) and formic acid (LC–MS grade, 13454279) were purchased from Fisher Chemicals. Heavy (13C6/15N2) lysine was purchased from Roth (2085.1) and Silantes (211604102). Trypsin (sequence grade, V511X) was purchased from Promega. d-glucose (G7021), glycerol (G2025), DL-dithiothreitol (BioUltra, 43815), iodoacetamide (BioUltra, I1149), ammonium bicarbonate (eluent additive for LC–MS, 40867), yeast nitrogen base without amino acids (Y0626) and glass beads (acid washed, 425–600 µm, G8772) were purchased from Sigma-Aldrich. Urea (puriss. p.a., reag. Ph. Eur., 33247H) and acetic acid (eluent additive for LC–MS, 49199) were purchased from Honeywell Research Chemicals. Ninety-six-well solid-phase extraction plates (MACROSpin C18, 50–450 μl, SNS SS18VL) were purchased from the Nest Group.

Yeast strains

The natural isolate library comprises 1,023 strains in total, of which 997 strains were previously described1 as representative of the entire S. cerevisiae species. A further 26 strains were described in two studies7,56. The isolates were arranged in a 96-well plate format according to growth rates estimated from growth on YPD agar. Aneuploidies in strains from ref. 56 and in laboratory isolates were detected manually from the coverage plots of the genomic read mapping. For all other isolates, the previously described aneuploidy annotations1 were used. All strain details, including aneuploidy, phylogenetic classification, ecological origin of isolation and ploidy, are provided in Supplementary Table 1.

Mat A disomic yeast strains, constructed by A. Amon’s laboratory3, were provided by R. Li (disome WT, strain 11311; disome 1, strain 12683; disome 2, strain 12685; disome 4, strain 24367; disome 5, strain 14479; disome 8, strain 13628; disome 9, strain 13975; disome 10, strain 12689; disome 11, strain 13771; disome 12, strain 12693; disome 13, strain 12695; disome 14, strain 13979; disome 15, strain 12697; and disome 16, strain 12700).

Transcriptomic data

Raw read counts are described in a parallel study6 and were filtered to include only genes with a mean of more than 1 count per million across measured strains (Supplementary Table 18). These filtered read counts were then normalized using the trimmed mean of M-values (TMM) method57 as implemented in edgeR58,59. Non-zero, non-log2-transformed counts-per-million values were used for further analysis.
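The expression filter described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; it assumes counts per million are computed against unadjusted library sizes before filtering, and it does not reproduce the TMM normalization, which was done with edgeR. The function names and the toy counts are hypothetical.

```python
def counts_per_million(counts):
    """Convert raw counts to counts per million (CPM).

    counts: dict mapping gene -> list of raw read counts, one per strain,
            with strains in the same order for every gene.
    """
    n_samples = len(next(iter(counts.values())))
    # Library size of each strain = sum of counts over all genes.
    lib_sizes = [sum(c[i] for c in counts.values()) for i in range(n_samples)]
    return {
        gene: [1e6 * c[i] / lib_sizes[i] for i in range(n_samples)]
        for gene, c in counts.items()
    }

def filter_by_mean_cpm(counts, min_mean_cpm=1.0):
    """Keep genes whose mean CPM across strains exceeds min_mean_cpm."""
    cpm = counts_per_million(counts)
    return {
        gene: counts[gene]
        for gene, values in cpm.items()
        if sum(values) / len(values) > min_mean_cpm
    }

# Toy data: two strains, each with a library size of exactly 1,000,000 reads.
raw = {
    "geneA": [500_000, 600_000],
    "geneB": [500_000, 399_999],
    "geneC": [0, 1],  # mean CPM of 0.5, so it is removed
}
filtered = filter_by_mean_cpm(raw)
print(sorted(filtered))
```

The filter returns raw counts rather than CPM values so that a normalization step can still be applied to the retained genes afterwards.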

Microarray gene-expression data for lab-engineered disomic strains were downloaded from the supplementary material of a previous study3. Only data for strains grown in batch culture were used. Raw expression profiling data for this dataset are available from the Gene Expression Omnibus database60 under the accession number GSE7812.

High-throughput cultivation of yeast isolates

Natural isolates

The yeast samples were cultivated and digested as follows: the collection was grown on agar plates containing synthetic minimal (SM) medium (6.8 g l−1 yeast nitrogen base, 2% glucose, without amino acids). Subsequently, colonies were inoculated in SM liquid medium (200 μl) and incubated at 30 °C overnight. Then, 160 µl of the culture was transferred to 96-deep-well plates pre-filled with one borosilicate glass bead in each well and diluted 10× in SM liquid medium to a total volume of 1.6 ml per well. Plates were sealed using an oxygen-permeable membrane and grown at 30 °C to exponential phase (Supplementary Table 2), shaking at 1,000 rpm for 8 h. Then, 1.5 ml of cell suspension was transferred to a new deep-well plate and collected by centrifugation (3,220g, 5 min, 4 °C). The supernatant was discarded and plates were immediately cooled on dry ice, then stored at −80 °C until further processing.

Lab-engineered synthetic disomic strains

Samples were grown using SD-His+G418 agar and medium selecting for the duplicated chromosomes (6.7 g l−1 yeast nitrogen base without ammonium sulfate, Difco 233520; 20 g l−1 glucose; 1 g l−1 monosodium glutamate, VWR 27872.298; 0.56 g l−1 CSM-His-Leu-Met-Trp-Ura, MP Biomedicals 4550422; 0.02 mg ml−1 uracil; 0.06 mg ml−1 leucine; 0.02 mg ml−1 methionine; 0.04 mg ml−1 tryptophan; 200 µg ml−1 G418, Gibco 11811023). Each disomic strain and the euploid wild type were set up in triplicate. The procedure for cultivation and lysis of the disomic strains was as described above, except that the collection by centrifugation was performed at 2,700g, 10 min and 4 °C.

Preparation of proteomics samples

Natural isolates

The samples for proteomics were prepared in 96-well plates as previously described41,61, with up to four plates processed in parallel. For yeast lysis, 200 µl of lysis buffer (100 mM ammonium bicarbonate and 7 M urea) and around 100 mg glass beads were added to each well, followed by 5 min bead beating at 1,500 rpm (Spex Geno/Grinder). For reduction and alkylation, 20 μl of 55 mM DL-dithiothreitol (1 h incubation at 30 °C) and 20 μl of 120 mM iodoacetamide (incubated for 30 min in the dark at ambient temperature) were used. Subsequently, 1 ml of 100 mM ammonium bicarbonate was added per well, followed by centrifugation (3,220g, 3 min) and 230 μl of this mixture was transferred to plates pre-filled with 0.9 μg trypsin per well. The samples were incubated for 17 h at 37 °C and the digestion was subsequently stopped by adding 24 μl of 10% formic acid (FA). The mixtures were cleaned up using C18 96-well plates, with 1-min centrifugations between the steps at the described speeds. The plates were conditioned with methanol (200 μl, centrifuged at 50g), washed twice with 50% ACN (200 μl, centrifuged at 50g) and equilibrated three times with 3% ACN/0.1% FA (200 μl, centrifuged at 50g, 80g and 100g, respectively). Then, 200 μl of the digested sample was loaded (centrifuged at 100g) and washed three times with 3% ACN/0.1% FA (200 μl, centrifuged at 100g). After the last washing step, the plates were centrifuged at 180g. Subsequently, peptides were eluted in three steps, twice with 120 μl and once with 130 μl of 50% ACN (180g), and collected in a plate (1.1 ml, square well, V-bottom). The collected material was completely dried on a vacuum concentrator and redissolved in 40 μl of 3% ACN/0.1% FA before transfer to a 96-well plate. The final peptide concentration was estimated by absorption measurements at 280 nm with a Lunatic photometer (Unchained Labs, 2 µl of sample). 
All pipetting steps were performed with a liquid handling robot (Biomek NXP) and samples were shaken on a thermomixer (Eppendorf Thermomixer C) after each step.

Lab-engineered synthetic disomic strains

Lysis, reduction, alkylation and digestion of the disomic strains were performed as described above. The digest was quenched using 25 µl of 10% FA per sample. The conditioning of the solid-phase-extraction plates was performed as described above, but using 0.1% FA instead of the 3% ACN/0.1% FA mixture. After loading 200 µl of the digested sample, the columns were washed four times with 200 µl of 0.1% FA followed by centrifugation (150g). Purified peptides were collected by three consecutive elution steps using 110 µl of 50% ACN (centrifugation at 200g). After vacuum drying, peptides were dissolved in 30 µl of 0.1% FA. All steps of the sample preparation were performed by hand. Peptide concentrations were determined using a fluorimetric peptide assay kit following the manufacturer’s instructions (Thermo Fisher Scientific, 23290).

LC–MS/MS measurements

Natural isolates

For the collection of natural isolates, liquid chromatography was performed on a nanoAcquity UPLC system (Waters) coupled to a Sciex TripleTOF 6600. Peptides (2 μg) were separated on a Waters HSS T3 column (150 mm × 300 μm, 1.8-μm particles) ramping in 19 min from 3% B to 40% B (solvent A: 1% ACN/0.1% FA; solvent B: ACN/0.1% FA) with a non-linear gradient (Supplementary Table 19). The flow rate was set to 5 μl min−1. The SWATH acquisition method62 consisted of an MS1 scan from m/z 400 to m/z 1,250 (50 ms accumulation time) and 40 MS2 scans (35 ms accumulation time) with a variable precursor isolation width covering the mass range m/z 400 to m/z 1,250 (Supplementary Table 20). Proteomic raw data were recorded using Analyst v.1.8.1.

Lab-engineered synthetic disomic strains

Proteomics measurements were performed on an Agilent 1290 Infinity LC system coupled to a SCIEX TripleTOF 6600 equipped with an IonDrive source as previously described41. Buffer A consisted of 0.1% FA in water, and buffer B of 0.1% FA in ACN. All solvents were LC–MS grade. Five micrograms of peptides per sample were separated at 30 °C with a 5-min active gradient starting with 1% B and increasing to 36% B on an Agilent Infinitylab Poroshell 120 EC-C18 column (2.1 × 50 mm, 1.9-μm particles). The flow rate was set to 0.8 ml min−1 and the scanning SWATH acquisition method consisted of an m/z 10-wide sliding isolation window.

Generation of an experimental spectral library for strain S288c

Five micrograms of yeast digest was injected and run on a nanoAcquity UPLC (Waters) coupled to a SCIEX TripleTOF 6600 with a DuoSpray Turbo V source. Peptides were separated on a Waters HSS T3 column (150 mm × 300 µm, 1.8-µm particles) with a column temperature of 35 °C and a flow rate of 5 µl min−1. A 55-min linear gradient ramping from 3% ACN/0.1% FA to 40% ACN/0.1% FA was applied. The ion source gas 1 (nebulizer gas), ion source gas 2 (heater gas) and curtain gas were set to 15 psi, 20 psi and 25 psi, respectively. The source temperature was set to 75 °C and the ion spray voltage to 5,500 V. In total, 12 injections were run with the following m/z mass ranges: 400–450, 445–500, 495–550, 545–600, 595–650, 645–700, 695–750, 745–800, 795–850, 845–900, 895–1,000 and 995–1,200. The precursor isolation window was set to m/z 1 except for the mass ranges m/z 895–1,000 and m/z 995–1,200, for which the precursor windows were set to m/z 2 and m/z 3, respectively. The cycle time was 3 s, consisting of high- and low-energy scans, and data were acquired in ‘high-resolution’ mode. The spectral libraries were generated using library-free analysis with DIA-NN directly from these scanning SWATH acquisitions. For this DIA-NN analysis, MS2 and MS1 mass accuracies were set to 25 ppm and 20 ppm, respectively, and the scan window size was set to 6.

Proteomics data processing

Natural isolates

Protein-wise fasta files were created by inferring single-nucleotide polymorphisms for each strain on the basis of the reference genome of the S288c strain. In cases of heterozygosity, one of the possible alleles was randomly inferred1,7. For non-reference genes, a single representative sequence per protein was available based on the genomes. The proteome for the reference strain S288c was obtained from UniProt (UP000002311, accessed 10 February 2020)63. Sequences of strains present in the original strain collections1,7 and subject to intellectual property restrictions were excluded from our study, leading to the inclusion of 1,023 strains in the processing. To reduce the processing time and limit the search space to relevant peptides, the protein-wise fasta files were processed to select peptides that were well shared across the strain collection. The protein sequences were thus trypsin-digested in silico and missed cleavages were disregarded. Non-proteotypic peptides were excluded and only peptides shared by 80% of the strains were selected for further analysis. This list of peptides was used to filter the experimental library. Raw mass spectrometry files were processed using the filtered spectral library with the DIA-NN software (v.1.7.12)42. Default parameters of the software were used except for the following: mass accuracy, 20; mass accuracy MS1, 12. Because the peptides selected were not necessarily present ubiquitously in all the strains, an additional step was required to remove false-positive peptide assignments (entries in which a peptide is detected in a strain in which it should be absent). This filter led to the exclusion of around 1% of the entries. Samples with insufficient MS2 signal quality (around 5.7 × 10⁷) and entries with a q value greater than 0.01 or a protein group q value greater than 0.01 were removed. 
Outlier samples were detected on the basis of both the total ion chromatograms (TIC) and the number of identified precursors per sample (z-score > 2.5 s.d.) and were excluded from further analysis. Precursor normalized values as inferred by DIA-NN that were well detected across the samples (in at least 80% of the strains) and with CV < 0.3 in the quality control samples were retained. Subsequently, batch correction was performed at the precursor level by bringing median precursor quantities of each batch to the same value. Proteins were then quantified using the maxLFQ64 function implemented in the DIA-NN R package, resulting in a dataset containing 1,576 proteins for 796 strains. Missing values (less than 4% of all values) were imputed using k-nearest neighbours (KNN) imputation65.
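The peptide selection at the start of this subsection (in silico tryptic digestion without missed cleavages, exclusion of non-proteotypic peptides, and retention of peptides shared by 80% of the strains) could look roughly as follows. This is a sketch under stated assumptions, not the pipeline used in the study: the cleavage rule (after K or R, but not before P), the minimum peptide length, the reading of 80% as "at least 80%", and all names are assumptions made for illustration.

```python
import re
from collections import defaultdict

def tryptic_peptides(sequence, min_len=7):
    """Fully cleaved tryptic peptides (no missed cleavages).

    Assumed rule: cleave after K or R unless the next residue is P.
    min_len is a hypothetical cutoff, not taken from the text.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)  # needs Python >= 3.7
    return {p for p in pieces if len(p) >= min_len}

def select_shared_peptides(strain_proteomes, min_fraction=0.8, min_len=7):
    """Peptides mapping to a single protein and present in enough strains.

    strain_proteomes: dict strain -> dict protein -> amino acid sequence.
    """
    strains_with = defaultdict(set)   # peptide -> strains containing it
    proteins_with = defaultdict(set)  # peptide -> proteins containing it
    for strain, proteome in strain_proteomes.items():
        for protein, seq in proteome.items():
            for pep in tryptic_peptides(seq, min_len):
                strains_with[pep].add(strain)
                proteins_with[pep].add(protein)
    n_strains = len(strain_proteomes)
    return {
        pep
        for pep, strains in strains_with.items()
        if len(proteins_with[pep]) == 1            # proteotypic
        and len(strains) >= min_fraction * n_strains
    }

# Toy collection of five strains; strain s5 carries a variant in P1.
reference = {"P1": "AAAKCCCRDDDK", "P2": "AAAKGGGK"}
proteomes = {s: dict(reference) for s in ("s1", "s2", "s3", "s4")}
proteomes["s5"] = {"P1": "AAAKCCCRDEDK", "P2": "AAAKGGGK"}

selected = select_shared_peptides(proteomes, min_fraction=0.8, min_len=4)
print(sorted(selected))
```

In the toy data, AAAK occurs in both proteins and is dropped as non-proteotypic, the variant peptide DEDK occurs in only one of five strains and is dropped, and DDDK is kept because it is present in four of five strains.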

Lab-engineered synthetic disomic strains

Mass spectrometry files were processed using the experimental spectral library obtained through gas phase fractionation for the S288c strain with the DIA-NN software (v.1.7.12). Default parameters of the software were used except for the following: mass accuracy, 20; mass accuracy MS1, 12. The output from the software was then processed in R. Entries with a q value greater than 0.01 or a protein group q value greater than 0.01 and non-proteotypic peptides were removed. Samples with too low an optical density (OD) (less than 0.075) were excluded from further analysis (disome 4 and one replicate of disome 8). The precursor normalized values inferred by DIA-NN were used, and precursors that were detected in at least 80% of the samples were retained. Proteins were then quantified using the maxLFQ function implemented in the DIA-NN R package. The resulting dataset consists of 1,377 proteins for 38 samples. Missing values (less than 2.35% of all values) were imputed using the KNN approach. The median value of all available replicate measurements was used for each protein during all further analyses.
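The filtering and summarization steps in this subsection were carried out in R; the sketch below only illustrates their logic in Python. The field names (`q_value`, `pg_q_value`, `proteotypic`, `precursor`, `sample`) are illustrative placeholders, not the DIA-NN report column names, and the maxLFQ quantification and KNN imputation are not reproduced.

```python
from statistics import median

def passes_fdr(entry, q_max=0.01):
    """Keep proteotypic entries with precursor and protein group q values <= q_max."""
    return (
        entry["q_value"] <= q_max
        and entry["pg_q_value"] <= q_max
        and entry["proteotypic"]
    )

def well_detected_precursors(entries, n_samples, min_fraction=0.8):
    """Precursors observed in at least min_fraction of the samples."""
    samples_seen = {}
    for e in entries:
        samples_seen.setdefault(e["precursor"], set()).add(e["sample"])
    return {
        p for p, samples in samples_seen.items()
        if len(samples) >= min_fraction * n_samples
    }

def replicate_median(values_by_key):
    """Collapse replicate quantities to their median, per (strain, protein)."""
    return {key: median(vals) for key, vals in values_by_key.items()}

def make_entry(sample, precursor, q, pg_q, proteotypic=True):
    return {"sample": sample, "precursor": precursor, "q_value": q,
            "pg_q_value": pg_q, "proteotypic": proteotypic}

# Toy report with five samples and three hypothetical precursors.
entries = [make_entry(s, "PREC_A", 0.001, 0.002) for s in ("r1", "r2", "r3", "r4", "r5")]
entries += [
    make_entry("r1", "PREC_B", 0.05, 0.001),          # fails the q value cutoff
    make_entry("r2", "PREC_B", 0.001, 0.001),         # passes, but seen in one sample only
    make_entry("r3", "PREC_C", 0.001, 0.001, False),  # non-proteotypic
]

confident = [e for e in entries if passes_fdr(e)]
kept = well_detected_precursors(confident, n_samples=5)
medians = replicate_median({("disome 1", "protein X"): [10.0, 12.0, 11.0]})
print(kept, medians)
```

Applying the identification filter before the completeness filter means that a precursor counts as detected in a sample only when its identification there is confident.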

Twenty-four-hour time-course proteomics

Yeast isolates were cultivated on SM medium (as above) in batch culture. In brief, colonies from across an agar plate were incubated in 5 ml medium for 16 h at 30 °C, 750 rpm. The pre-culture was diluted to an optical density at 600 nm (OD600 nm) of 0.1 in 30 ml medium, and incubated for 24 h at 30 °C, 750 rpm. At regular intervals, the OD600 nm was recorded, and around 4 × 10⁷ cells were collected by centrifugation (5 min, 10,000g, 4 °C) at five time points to cover early exponential, mid-exponential, late exponential and stationary phases of growth. Samples were lysed in screw cap tubes by adding around 100 mg of glass beads and 160 µl of lysis buffer (7 M urea and 100 mM ammonium bicarbonate (ABC)), followed by four cycles of bead beating (5 min, 1,500 rpm followed by 5 min on ice) using a GenoGrinder. Samples were centrifuged (5 min, 10,000g, 4 °C) and the supernatant was transferred to a 500-µl 96-well plate. Twenty microlitres of 55 mM DTT was added to each well and the samples were incubated for 1 h at 30 °C. Subsequently, the plate was cooled on ice for 5 min, and then 20 μl of 120 mM IAA was added to each well. The samples were incubated for 30 min at 25 °C in the dark. The reduced and alkylated samples were diluted by adding 500 μl of 100 mM ABC to each well. Then, 2 μg of trypsin/LysC was added to each sample and the plate was incubated for 17 h at 37 °C. The digest was stopped by the addition of 35 µl 20% FA, and peptides were purified using solid-phase extraction as described above. Purified peptides were dried using a vacuum concentrator and dissolved in 35 µl 0.1% FA, and peptide concentrations were determined using a fluorimetric peptide assay kit following the manufacturer’s instructions (Thermo Fisher Scientific, 23290).

Peptide separation was accomplished in a 63-min water to ACN active gradient on an Ultimate 3000 RSLnanoHPLC coupled to a Q Exactive Plus mass spectrometer (both Thermo Fisher Scientific) operating in data-independent acquisition (DIA) mode. Tryptic peptides (1 µg) were concentrated on a trap column (PepMap C18, 5 mm × 300 μm × 5 μm, 100 Å, Thermo Fisher Scientific, buffer containing 2:98 (v/v) ACN/water containing 0.1% (v/v) trifluoroacetic acid, flow rate of 20 μl min−1) and separated on a C18 column (Acclaim PepMap C18, 2 μm, 100 Å, 75 μm, 150 mm, Thermo Fisher Scientific) in a linear gradient from 5% to 28% buffer B in 63 min, followed by a step to 98% B in 1 min, washing for 9 min with 98% buffer B and equilibration for 15 min at initial conditions, at a flow rate of 300 nl min−1 (buffer A, 0.1% formic acid; buffer B, 80% ACN and 0.1% formic acid). The total acquisition time was 100 min. The Orbitrap worked in centroid mode with a duty cycle consisting of one MS1 scan at 70,000 resolution power with a maximum injection time of 300 ms and an AGC target of 3 × 10⁶ followed by 40 variable MS2 scans using a 0.5-Da overlapping window pattern. The window length started with 25 MS2 scans at 12.5 Da, followed by 7 windows with 25 Da, and the last 8 windows were set to 62.5 Da. Precursor MS spectra (m/z 378–1,370) were analysed with 17,500 resolution after 110 ms accumulation of ions to a target value of 3 × 10⁶ in centroid mode. The following mass spectrometric settings were used: spray voltage, 2.1 kV; no sheath and auxiliary gas flow; heated capillary temperature, 275 °C; normalized HCD collision energy, 27%. In addition, the background ion at m/z 445.1200 served as lock mass.

Raw data were processed using DIA-NN v.1.8 (ref. 42) with the scan window size set to 7 and the MS2 and MS1 mass accuracies set to 20 ppm and 10 ppm, respectively. A spectral library-free approach and yeast UniProt (UP000002311, reviewed, canonical, downloaded 18 November 2021)63 were used for annotation. The output was filtered at 1% FDR on peptide level. Log2-transformed protein expression levels between the aneuploid and the euploid isolate were calculated per time point for each protein present in at least two of the three biological replicates of the euploid strain in the given time point, and normalized per strain and time point as described below for the natural isolate library.

Ubiquitinomics

Selected aneuploid and euploid yeast isolates were cultivated in SM medium (6.7 g l−1 yeast nitrogen base with ammonium sulfate, Difco 291920, 20 g l−1 glucose) at 30 °C. Three individual pre-cultures per strain were cultured for 16 h, and used to inoculate three flasks of 30–50 ml SM medium per strain. Cultures were collected at mid-log phase by centrifugation (2,880g, 8 min, 4 °C) and pellets were frozen at –20 °C. Cells were lysed using glass beads (volume equal to pellet volume) in 200 µl freshly prepared SDC buffer (1% sodium deoxycholate, 10 mM TCEP, 40 mM chloroacetamide and 75 mM Tris-HCl, pH 8.5) by five cycles of 1 min vortexing, 1 min on ice. Samples were centrifuged (13,800g, 15 min, 4 °C) and the supernatant was collected. Protein concentrations were determined using a Pierce BCA Protein Assay Kit (Thermo Fisher Scientific, 23225). Then, 500 µg of proteins was digested with a trypsin/LysC mix (V5071 or V5072, Promega) overnight at 37 °C with a 1:50 enzyme-to-protein ratio. K-GG peptide enrichment was performed as reported previously39. The digestion was stopped by adding two volumes of 99% ethylacetate/1% TFA, followed by sonication for 1 min using an ultrasonic probe device (energy output of around 40%). The peptides were desalted using 30 mg Strata-X-C cartridges (8B-S029-TAK, Phenomenex) as follows: (a) conditioning with 1 ml isopropanol; (b) conditioning with 1 ml of 80% ACN/5% NH4OH; (c) equilibration with 1 ml of 99% ethylacetate/1% TFA; (d) loading of the sample; (e) washing with 2× 1 ml of 99% ethylacetate/1% TFA; (f) washing with 1 ml of 0.2% TFA; and (g) elution with 2× 1 ml of 80% ACN/5% NH4OH. The eluates were snap-frozen in liquid nitrogen and lyophilized overnight. K-GG peptide enrichment was performed by resuspending lyophilized peptides in 1 ml of cold immunoprecipitation (IP) buffer (50 mM MOPS pH 7.2, 10 mM Na2HPO4 and 50 mM NaCl). 
Peptides were then incubated with 4 µl of K-GG antibody bead conjugate (Cell Signaling Technology, PTMScan HS Ubiquitin/SUMO Remnant Motif (K-ε-GG) Kit, 59322) for 2 h at 4 °C with end-over-end rotation. Beads were washed (with the help of a magnetic stand) four times with 1 ml IP buffer and an additional time with cold Milli-Q water. After removing all of the supernatant, the beads were incubated with 200 µl of 0.15 % TFA at room temperature while shaking at 1,400 rpm. After briefly spinning, the supernatant was recovered and desalted using in-house-prepared, 200 µl two plug StageTips66 with SDB-RPS (3M Empore, 2241). SDB-RPS StageTips were conditioned with 60 µl isopropanol, 60 µl 80% ACN/5% NH4OH and 100 µl 0.2% TFA. The K-GG enrichment eluate (0.15% TFA) was directly loaded onto the tips followed by two washing steps of 200 µl 0.2% TFA each. Peptides were eluted with 80% ACN/5% NH4OH. Peptides were Speedvac-dried and then resuspended in 10 µl of 0.1% FA, of which 4 µl were injected into the mass spectrometer.

For LC–MS measurement, peptides were loaded on 40-cm reversed-phase columns (75 µm inner diameter, packed in-house with ReproSil-Pur C18-AQ 1.9 µm resin (ReproSil-Pur, Dr. Maisch)). The column temperature was maintained at 60 °C using a column oven. An EASY-nLC 1200 system (Thermo Fisher Scientific) was directly coupled online with the mass spectrometer (Q Exactive HF-X, Thermo Fisher Scientific) through a nano-electrospray source, and peptides were separated with a binary buffer system of buffer A (0.1% FA plus 5% DMSO) and buffer B (80% ACN plus 0.1% FA plus 5% DMSO), at a flow rate of 300 nl min−1. The mass spectrometer was operated in positive polarity mode with a capillary temperature of 275 °C. The DIA method consisted of an MS1 scan (m/z = 300–1,650) with an AGC target of 3 × 10⁶ and a maximum injection time of 60 ms (R = 120,000). DIA scans were acquired at R = 30,000, with an AGC target of 3 × 10⁶, ‘auto’ for injection time and a default charge state of 4. The spectra were recorded in profile mode and the stepped collision energy was 10% at 25%. The number of DIA segments was set to achieve an average of four to five data points per peak. For details on the DIA method set-up, see a previous report39.

Raw data-independent acquisition data files were analysed using DIA-NN (v.1.8) in library-free mode, searching against the S. cerevisiae reference proteome (strain ATCC 204508/S288c, UniProt ID UP000002311, excluding isoforms, accessed 18 November 2021). Trypsin/P, one missed cleavage, a maximum of two variable modifications (including cysteine carbamidomethylation and diglycine remnant modification, K-GG) and a precursor charge range between 2 and 4 were set for precursor ion generation. The MBR and remove interferences options were enabled and Robust LC (high precision) was chosen as the quantification strategy. Peptides with a diglycine remnant (UniMod: 121) were used to quantify genes using the built-in MaxLFQ algorithm in DIA-NN, with a global and run-specific FDR of 1% being applied at both the precursor and the protein group level. For further analyses, the resulting data were filtered to include only genes that were measured in at least two of the three biological replicates in at least one strain.
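The final completeness filter (a gene is kept if it was measured in at least two of the three biological replicates in at least one strain) can be written compactly. The sketch below assumes missing measurements are represented as `None`; the data layout, gene labels and strain labels are hypothetical.

```python
def genes_passing(quant, min_replicates=2):
    """Genes measured in >= min_replicates replicates in at least one strain.

    quant: dict gene -> dict strain -> list of replicate values,
           with None marking a replicate in which the gene was not measured.
    """
    kept = []
    for gene, by_strain in quant.items():
        if any(
            sum(value is not None for value in replicates) >= min_replicates
            for replicates in by_strain.values()
        ):
            kept.append(gene)
    return kept

# Toy data: three replicates per strain.
quant = {
    "gene1": {"strainA": [1.2, None, 1.1], "strainB": [None, None, None]},
    "gene2": {"strainA": [0.8, None, None], "strainB": [None, 0.9, None]},
    "gene3": {"strainA": [None, None, None], "strainB": [2.0, 2.1, 1.9]},
}
print(genes_passing(quant))
```

Here gene2 is removed because it never reaches two measured replicates within a single strain, even though it was measured twice across the dataset.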

Intracellular lysine measurements

Isolates (ABH, AFR, AHR, AHS, AII, AIP, ALK, ALM, ALT, AMC, ANA, ANR, APD, APM, APT, AQD, ARL, ARV, ASV, ATA, ATC, AVL, BBD, BBV, BDA, BDI, BDK, BDL, BFK, BFV, BHE, BIP, BKP, BLF, BPA, BPG, BPH, BPP, BTS, CAH, CAN, CCH, CHH, CLN, CLT, CME, CMF, CMM, CMN, CNL, CPB, CPE, CPQ, CPR, CPT, CQQ, CQR, CRL, SACE.YAB and SACE.YCO, as for dynamic SILAC experiments, see below) were randomized in triplicate on two 96-well microtitre plates. Six replicates of a lysine-auxotroph lab strain (BY4742-HLU)67 were also added (three positions randomly per plate). Colonies were picked after 48 h growth at 30 °C on SM medium + 2% agar with a Singer Rotor HDA (Singer Instruments) and pre-cultured overnight in 200 µl SM medium supplemented with labelled l-lysine (80 mg l−1) for the continuous labelling experiment, or 200 µl unlabelled l-lysine (80 mg l−1) for the switching experiment. The OD600 nm was measured after 17 h (Tecan Infinite) and cells were diluted to a starting OD of around 0.1 in 1.6 ml Lys-8- or Lys-0-labelled medium, respectively. For the continuous labelling experiment, isolates were cultured for 8 h at 30 °C whilst shaking, collected by centrifugation (2,900g, 10 min, 4 °C), and stored overnight at −80 °C. For the switching experiment, isolates were cultured at 30 °C whilst shaking for 4 h, then centrifuged (2,900g, 10 min, 30 °C), the supernatant discarded and the pellets washed using 1 ml SM medium, cultivated for another 3 h at 30 °C whilst shaking, and collected as above. The final OD600 nm at collection of all samples was measured.

Amino acids were extracted by adding 200 µl pre-cooled 80% ethanol containing the internal standard D4-l-lysine (Silantes, 211113913) to each of the frozen cell pellets. The samples were incubated for 2 min at 80 °C and subsequently vortexed. This step was repeated two more times. The samples were centrifuged (2,900g, 10 min, 4 °C) and the supernatants were collected. The measurements of 1 µl per sample were performed on a triple quadrupole mass spectrometer system (Agilent 6460) as previously described68. Technical controls from pooled extracted metabolite samples were included and measured by LC–MS/MS after every 15th sample, in total 27 times. The analysis was performed using MassHunter Software B.07.01 (Agilent Technologies). The internal standard response ratios were calculated for each sample and normalized to the OD600 nm measured at collection.
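The normalization in the last step amounts to two divisions. The sketch below assumes that the response of analyte and internal standard is taken as the integrated peak area, which is not stated explicitly above; the function name and example values are hypothetical.

```python
def normalized_response(analyte_area, internal_standard_area, od600):
    """Internal standard response ratio, normalized to culture density.

    analyte_area:            peak area of lysine (assumed response measure)
    internal_standard_area:  peak area of the D4-lysine internal standard
    od600:                   OD600 nm of the culture at collection
    """
    if internal_standard_area <= 0 or od600 <= 0:
        raise ValueError("internal standard area and OD must be positive")
    response_ratio = analyte_area / internal_standard_area
    return response_ratio / od600

# Toy example: twice the internal standard signal at an OD600 nm of 0.5.
value = normalized_response(2000.0, 1000.0, 0.5)
print(value)
```

Dividing by the internal standard corrects for extraction and injection variability, and dividing by the OD makes samples collected at different cell densities comparable.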

Dynamic SILAC

Yeast strains (46 diploid aneuploid isolates with a single chromosome gain (trisomic strains) for which we quantified attenuation, 2 randomly chosen haploid aneuploid isolates with a single chromosome gain, as well as 10 diploid euploid and two haploid euploid isolates with a similar range of growth rates to that of the aneuploid isolates, meaning isolates ABH, AFR, AHR, AHS, AII, AIP, ALK, ALM, ALT, AMC, ANA, ANR, APD, APM, APT, AQD, ARL, ARV, ASV, ATA, ATC, AVL, BBD, BBV, BDA, BDI, BDK, BDL, BFK, BFV, BHE, BIP, BKP, BLF, BPA, BPG, BPH, BPP, BTS, CAH, CAN, CCH, CHH, CLN, CLT, CME, CMF, CMM, CMN, CNL, CPB, CPE, CPQ, CPR, CPT, CQQ, CQR, CRL, SACE.YAB and SACE.YCO; see also Supplementary Table 16) were grown on synthetic medium containing 6.7 g l−1 yeast nitrogen base, 2% glucose and 80 mg l−1 l-lysine (SM + Lys-0). For SILAC labelling, l-lysine was swapped for 80 mg l−1 heavy [13C6/15N2] lysine (SM + Lys-8). Cells were taken from cryo stocks and streaked on freshly prepared SM + Lys-0 agar plates (20 g l−1 agar) and grown for 48–72 h at 30 °C. Colonies across the whole agar plate were gathered and cultivated in 5 ml SM + Lys-0 for approximately 16 h at 30 °C and 300 rpm. The overnight pre-culture was then diluted in 25 ml in SM + Lys-0 (pre-warmed to 30 °C) to a starting OD600 nm of around 0.1. The culture was grown at 30 °C, 300 rpm until it reached an OD600 nm of between 0.25 and 0.3. At this point, the medium was switched from SM + Lys-0 to SM + Lys-8 using the following procedure. First, 20 ml of the culture was transferred into a 50-ml Falcon tube and centrifuged for 5 min at 30 °C, 3,095g. Then, the supernatant was decanted and the pellet was washed twice with 4 ml SM + Lys-8 (pre-warmed to 30 °C). Lastly, the pellet was resuspended in 20 ml warm SM + Lys-8 and transferred into clean flasks. 
The heavy-labelled cultures were grown at 30 °C, 300 rpm and at three time points (90 min, 135 min and 180 min), 2 ml of the culture was collected into ice-cold screw cap tubes. The samples were centrifuged at 10,000g, 4 °C, the supernatant aspirated and the pellets stored at −80 °C. At each collection time point, the OD600 nm was also recorded. Strains BPP, BDK, ATA and BFV did not grow well under the chosen conditions or were not growing exponentially when sampled, and were therefore omitted from further processing.

Cells were lysed mechanically in screw cap tubes by adding around 100 mg glass beads and 100 µl fresh lysis buffer (7 M urea and 100 mM ABC) to each sample, followed by two cycles of bead beating (5 min, 1,500 rpm, followed by 5 min on ice) using a GenoGrinder. The samples were briefly centrifuged (4,000g, 1 min) and the supernatant was transferred to a 500-μl Eppendorf 96-well plate. From this step onwards, all samples were processed together in high throughput. Reduction, alkylation and digest were performed as described in the ‘Twenty-four-hour time-course proteomics’ section, using 10 μl of 55 mM DTT, 10 μl of 120 mM IAA, 380 μl of 100 mM ABC and 2 μg of trypsin/LysC. Samples were digested for 17 h at 37 °C. The digest was stopped by the addition of 25 μl of 20% FA and the samples were purified using solid-phase extraction as described above. Purified peptides were dried using a vacuum concentrator and subsequently dissolved in 25 µl 0.1% FA. Peptide concentrations were determined using a fluorimetric peptide assay kit following the manufacturer’s instructions (Thermo Fisher Scientific, 23290).

For each strain, 1 µg of peptide sample was separated on a VanquishNeo System (Thermo Fisher Scientific) by reverse-phase chromatography with a 30-min efficient gradient from 3 to 30% ACN on a self-packed 20-cm column (ID 75 µm, 1.9-µm beads), and directly injected through electrospray ionization (ESI) to an Exploris480 Orbitrap (Thermo Fisher Scientific). In brief, the MS settings for Top20 acquisition scheme were the following: ESI voltage: 2.2 kV; resolution MS1 60k; IT MS1 10 ms; RF-Lens 55; resolution MS2 15k; maxIT MS2 50 ms; isolation width 1.2 Da; HCD collision energy 28; AGC target 100%.

Raw files were analysed with MaxQuant v.1.6.7.0 using standard settings, with match between runs and requantify enabled, and the UniProt S. cerevisiae protein database including isoforms (downloaded 9 February 2023) was selected for the database search. The multiplicity was set to 2, with Lys-8 set as the heavy label. Further processing of the data and calculation of half-lives were done in R. First, the evidence.txt file was loaded with the fread function from the data.table package, filtered for lysine-containing peptides and cleared of potential contaminants and remaining reverse hits. Because many proteins in yeast are very stable, a correction for doubling times is not applicable to most identified proteins. As in a previous study46, we therefore calculated turnover rates (kdp) and the corresponding half-lives without doubling time or dilution rate correction (Supplementary Table 16). In more detail, protein turnover rates were calculated for proteins with valid SILAC ratios in at least two time points per strain by fitting a linear model of the log-transformed H/L ratios against the sampling time points. The slope of each fit gives the kdp value of the protein in the respective strain. Half-lives were calculated from the resulting kdp as log(2)/kdp (Supplementary Table 17). Isolate CLN was excluded owing to a very low number of valid SILAC ratios obtained at t = 135 min. Furthermore, three proteins (ERP1, NEO1 and MDE1) were excluded from the dataset because they were measured in only a few isolates (6, 4 and 3, respectively) and exhibited very high variability in half-lives across these strains.
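As an illustration of the half-life calculation described above, the following Python sketch fits a least-squares line to log-transformed H/L ratios against sampling time and converts the slope (kdp) into a half-life. It is not the code used in this study, which was written in R. The function names, the use of the natural logarithm and the example ratios are assumptions made for illustration only.

```python
import math

def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def turnover_rate(times_min, hl_ratios):
    """Return kdp as the slope of ln(H/L) against time.

    Requires valid ratios at two or more time points, as in the text;
    missing ratios are passed as None. Natural log is an assumption.
    """
    valid = [(t, r) for t, r in zip(times_min, hl_ratios)
             if r is not None and r > 0]
    if len(valid) < 2:
        return None
    times, ratios = zip(*valid)
    return ols_slope(times, [math.log(r) for r in ratios])

def half_life(kdp):
    """Half-life as log(2)/kdp, in the time unit used for the fit."""
    return math.log(2) / kdp

# Hypothetical protein sampled at the three collection time points (min).
times = [90, 135, 180]
ratios = [0.10 * math.exp(0.005 * (t - 90)) for t in times]
kdp = turnover_rate(times, ratios)
t_half = half_life(kdp)
print(f"kdp = {kdp:.4f} per min, half-life = {t_half:.1f} min")
```

Because no doubling-time correction is applied, the fitted slope reflects the combined effect of degradation and dilution by growth.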

Post-processing statistical analyses

All statistical analyses were conducted in R v.3.6 unless otherwise indicated. KEGG annotations for S. cerevisiae genes were obtained through the KEGG database (accessed January 2021)69. The org.Sc.sgd.db package70 was used to obtain chromosomal location information for genes and to map gene names to systematic open reading frame (ORF) identifiers. If no gene name was annotated in this package, the systematic ORF identifier was used instead. Standardized S. cerevisiae yeast strain names and systematic ORF identifiers for genes were used throughout all analyses. Heat maps were plotted using the ComplexHeatmap package71. In all box plot representations, the centre marks the median, box plot hinges mark the 25th and 75th percentiles and whiskers show all values that, at maximum, fall within 1.5 times the interquartile range.

Assembly of integrated chromosome copy number, mRNA and protein expression datasets

Gene copy numbers for natural isolates were downloaded from the 1002 Yeast Genome website (http://1002genomes.u-strasbg.fr/files/)1, and the following loci were excluded: ribosomal DNA, Ty elements, RTM loci, ORFs located on the 2-micron plasmid, mitochondrial ORFs and non-reference material. Furthermore, the table was filtered to retain only genes with non-zero and non-missing values for further analyses. Chromosome copy number status for all engineered disomic strains was confirmed by Torres et al.3, meaning that all disomic strains used in our study were haploid with indicated ‘disomic’ chromosomes duplicated. One exception was disome 13: despite published mRNA expression values being available and proteomics data having been measured in our experiments, disome 13 was excluded from all analyses because it had undergone whole-genome duplication when reaching our laboratory (personal communication, J. Zhu).

For 761 isolates, both proteomes (this study) and transcriptomes6 were available, and for 759 isolates, gene copy number information1 was available. Data for gene copy number, mRNA expression and protein abundances were matched by strain name and systematic ORF identifier for both the natural isolate collection and the disomic strain collection. Only genes for which values for gene copy number, transcript and protein levels were available were used for analyses. We noticed that a number of strains in the natural isolate collection exhibited a mismatch between the median gene copy number per chromosome and the assigned aneuploidy as described in Supplementary Table 1, which is likely to be attributable to segmental aneuploidies, shorter gene copy number variations or algorithm-specific thresholds used for aneuploidy determination. We excluded all strains (n = 80) containing one or more of those ‘mismatched’ chromosomes from our analysis (Supplementary Table 5). From this point, chromosome copy numbers as given by the aneuploidy annotation were used throughout the analyses.

Calculation of relative chromosome copy numbers, mRNA and protein expression values

The assembled integrated dataset was used to compare the relative changes in chromosome copy number, mRNA transcript expression and protein abundances between aneuploid and euploid strains. Relative chromosome copy number changes were calculated as the log2 ratio between the chromosome copy number and the ploidy of the strain. Relative abundances for transcriptomic and proteomic data were calculated gene-wise as the log2 ratio between a gene’s mRNA or protein abundance in a given strain and the median mRNA or protein expression value of the respective gene across all euploid strains (‘all-euploid strain’ method). In addition, a ploidy-wise calculation of relative mRNA or protein abundance was tested, in which the abundance of each mRNA or protein in a haploid strain was compared with the median abundance of that same mRNA or protein across all euploid haploid strains, in a diploid strain with the median across all euploid diploids, and so forth for all basal ploidies (‘ploidy-wise’ method). There was a high correlation between relative expression values calculated across all euploid strains and ploidy-wise calculated relative expression values (Extended Data Fig. 9), indicating that non-linear scaling of the proteome with ploidy72 had no significant effect on the outcome of the data normalization strategy used. For the transcriptomic data of the lab-engineered disomic strains, we used the transcript levels as published; that is, as log2 fold changes relative to the wild-type strain (disome WT, 11311)3. For replicate measurements, the median value was used for further analysis.
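The two ratios defined above can be sketched in a few lines of Python. This is an illustrative example under assumed inputs (the strain labels and abundance values are hypothetical), not the analysis code of this study.

```python
import math
from statistics import median

def relative_copy_number(chromosome_copies, ploidy):
    """log2 ratio between chromosome copy number and basal ploidy."""
    return math.log2(chromosome_copies / ploidy)

def relative_expression(abundance_by_strain, euploid_strains):
    """Gene-wise log2 ratio of each strain against the euploid median
    ('all-euploid strain' method). For the 'ploidy-wise' method, pass
    only the euploid strains of the matching basal ploidy."""
    reference = median(abundance_by_strain[s] for s in euploid_strains)
    return {s: math.log2(v / reference)
            for s, v in abundance_by_strain.items()}

# Hypothetical abundances of one protein in three euploid strains (E1-E3)
# and one aneuploid strain (A1).
abundance = {"E1": 100.0, "E2": 120.0, "E3": 80.0, "A1": 150.0}
ratios = relative_expression(abundance, ["E1", "E2", "E3"])
trisomy = relative_copy_number(3, 2)  # one extra copy in a diploid
print(ratios["A1"], trisomy)
```

In this example the protein in A1 scales fully with a diploid trisomy, because both log2 ratios equal log2(1.5).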

For the log2 mRNA and protein ratios of genes encoded on euploid chromosomes, a distribution centred around 0 would be expected, representing no overall shift of relative expression values of these genes across strains. This was true for our proteomics data, and also for most strains in the transcriptomic data. Because some natural isolates showed left tails in this distribution for the transcriptomic data, presumably because of restricting the assembled dataset to genes for which we had data across all three -omics layers, we decided to normalize the relative mRNA and protein expression values. Normalization was performed in a strain-by-strain manner for both the across-euploid strains and the ploidy-wise ratio calculation methods (see above): first, the median log2 mRNA or protein ratio of all genes encoded on euploid chromosomes of a given strain was calculated. This median value was then subtracted from all log2 mRNA or protein ratios of that strain.
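The strain-by-strain normalization described above amounts to a median shift. A minimal Python sketch follows, assuming the log2 ratios of one strain are held in a dictionary keyed by gene; the gene labels, chromosome assignments and values are hypothetical.

```python
from statistics import median

def normalize_strain(log2_ratios, gene_chromosome, euploid_chromosomes):
    """Subtract the median log2 ratio of all genes encoded on euploid
    chromosomes of a strain from every log2 ratio of that strain."""
    euploid_values = [v for g, v in log2_ratios.items()
                      if gene_chromosome[g] in euploid_chromosomes]
    shift = median(euploid_values)
    return {g: v - shift for g, v in log2_ratios.items()}

# Hypothetical strain with a gain of chromosome 1; chromosomes 2 and 3
# are euploid.
gene_chromosome = {"g1": 1, "g2": 2, "g3": 2, "g4": 3}
log2_ratios = {"g1": 0.8, "g2": 0.1, "g3": 0.3, "g4": 0.2}
normalized = normalize_strain(log2_ratios, gene_chromosome, {2, 3})
print(normalized)
```

After the shift, the genes on euploid chromosomes are centred on 0, and the ratio of the gene on the aneuploid chromosome is expressed relative to that baseline.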

The proteome profiles of disome 12 and disome 14 showed no aneuploid signature, indicating that those strains, even though they were maintained under selective pressure, had lost their duplicated chromosome either before their arrival in our laboratory or during our experiments. Both strains were therefore excluded from our analysis. Similarly, when comparing relative chromosome copy number changes and relative mRNA expression levels in the natural isolate collection, we noticed discrepancies indicative of chromosomal instabilities in natural S. cerevisiae isolates. Some euploid strains had gained or lost chromosomes, evident as much higher or lower fold changes of expression values in the transcriptomics data than in the chromosome copy numbers. Likewise, some aneuploid strains had undergone changes in their karyotype, resulting either in more complex aneuploidies or in reversion to euploidy. We decided to include in our analysis only strains that showed consistent relative (log2 ratio) values at the chromosome copy number and transcriptome levels. Consequently, we excluded strains that had at least one chromosome for which the difference between the relative chromosome copy number and the median of the normalized relative mRNA abundances deviated by more than 4 standard deviations from the mean of all relative chromosome–mRNA comparisons (Supplementary Table 6, n = 66). After excluding these strains, the calculations to obtain gene-wise relative (strain/euploid) mRNA and protein expression values were repeated to avoid unintended biases towards these excluded strains.
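The consistency filter described above can be illustrated with the following Python sketch. It assumes one difference value per strain and chromosome (relative chromosome copy number minus the median normalized relative mRNA abundance) and uses the sample standard deviation; both the data layout and that choice are assumptions, and the example values are hypothetical.

```python
from statistics import mean, stdev

def strains_to_exclude(differences, n_sd=4.0):
    """Return strains with one or more chromosomes whose difference
    deviates from the mean of all comparisons by more than n_sd
    standard deviations.

    differences: {(strain, chromosome): relative CN - median relative mRNA}
    """
    values = list(differences.values())
    centre = mean(values)
    spread = stdev(values)
    return sorted({strain for (strain, _), d in differences.items()
                   if abs(d - centre) > n_sd * spread})

# Hypothetical data: 20 strains, 2 chromosomes each, small deviations,
# plus one chromosome whose transcriptome does not match its copy number.
differences = {(f"S{i:02d}", chrom): 0.01 * ((i + chrom) % 3 - 1)
               for i in range(20) for chrom in (1, 2)}
differences[("S07", 2)] = 0.58
flagged = strains_to_exclude(differences)
print(flagged)
```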

Gene-by-gene quantification of dosage compensation

Linear regressions between log2 mRNA or protein expression ratios and relative chromosome copy number (CN) changes (log2 chromosome CN/basal ploidy) were performed for all genes that were encoded on an aneuploid chromosome in at least three different natural aneuploid isolates (so genes on chromosomes 1, 3, 4, 5, 6, 8, 9, 11, 12 and 14). Isolates that had reverted to euploidy (Supplementary Table 12) were excluded from this analysis. Relative chromosome copy numbers were restricted to be greater than or equal to 0, thus including all euploid chromosomes, and all chromosome gains of aneuploid isolates, but excluding chromosome losses. Therefore, each regression was performed using data for the expression of the gene on euploid chromosomes (log2 CN change = 0) and at least three independent data points with a relative chromosome CN change greater than 0. The slopes of these gene-by-gene linear regressions were used as a measure of across-isolate dosage compensation. For lab-engineered disomic strains, a similar analysis was performed; however, it was necessarily restricted to one ‘aneuploid’ data point per gene and forced through 0 because each aneuploid chromosome was engineered exactly once in the disomic strain collection. Therefore, in total, the regressions were performed for 827 and 680 genes at the mRNA and protein level in natural isolates and disomic strains, respectively. For the cumulative distribution (‘rolling threshold’) analysis, the number of mRNAs or proteins exhibiting attenuation slopes smaller than a given threshold were counted, and effect sizes for these attenuated mRNAs or proteins were calculated as the median of the attenuation slopes smaller than the respective threshold. For the following analyses, a threshold of 0.85 was selected to define attenuated mRNAs and proteins.
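To make the slope-based measure concrete, the following Python sketch computes a gene-wise regression slope for a natural-isolate case (with intercept) and for a disome case (forced through 0), and applies the 0.85 threshold. The function names and the data points are hypothetical; the sketch illustrates the approach described above rather than reproducing the original R analysis.

```python
import math

ATTENUATION_THRESHOLD = 0.85  # slopes below this are called attenuated

def regression_slope(x, y, through_origin=False):
    """Least-squares slope of log2 expression ratio (y) on log2 relative
    chromosome copy number change (x)."""
    if through_origin:
        return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def is_attenuated(slope):
    return slope < ATTENUATION_THRESHOLD

# Hypothetical gene in natural isolates: four euploid data points and
# three diploid isolates carrying one extra copy of its chromosome.
trisomy = math.log2(3 / 2)
x_nat = [0.0, 0.0, 0.0, 0.0, trisomy, trisomy, trisomy]
y_nat = [0.02, -0.03, 0.01, 0.0, 0.30, 0.35, 0.25]
slope_nat = regression_slope(x_nat, y_nat)

# Hypothetical gene in a disome: a single aneuploid data point
# (log2(2/1) = 1), with the fit forced through 0.
slope_dis = regression_slope([1.0], [0.9], through_origin=True)

print(slope_nat, is_attenuated(slope_nat))
print(slope_dis, is_attenuated(slope_dis))
```

A slope of 1 corresponds to expression that scales fully with gene dosage, and a slope of 0 to complete compensation.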

To assess the influence of protein properties on attenuation, the following sources were used: macromolecular-complex membership: Complex Portal of the EBI (accessed December 2020)73; protein–protein interactions (PPIs): STRING database (accessed November 2022)74; prediction of protein disorder and linear interacting peptides by AlphaFold, MobiDB and anchor: MobiDB (accessed October 2022)75; GC content and percentile mean gRSCU: calculated on the Saccharomyces cerevisiae S288C sequence (NCBI: GCF_000146045.2_R64) using the gc1, gc2, gc3 and gRSCU functions in BioKIT v.0.1.2 (ref. 76); ribosome occupancy: from a previous study77; amino acid synthesis costs and glucose cost: from a previous study78; absolute protein copy numbers per cell: from a previous study79; protein length, mass and modification sites: UniProt (accessed October 2022, ubiquitinated residue information inferred from experimental and automatic cross-link evidence listed as ‘Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in ubiquitin)’)63; and protein half-life: from a previous study46. The internal variability of transcripts and proteins was calculated as the standard deviation of mRNA and protein abundance, respectively, across all euploid isolates of the collection. Receiver operating characteristic (ROC) curves were calculated using the pROC package80 as described previously26.

For assessment of mRNA- and protein-level attenuation across pathways, cellular localizations, molecular functions and biological processes, KEGG annotations were obtained using the KEGG API (accessed January 2021)69, and a GO slim mapping file was obtained from SGD (accessed January 2021)81. The degree of relative attenuation was determined as 100 × (1 − slope) per mRNA or protein, and median attenuation levels per KEGG or GO category were calculated only for those KEGG or GO terms with at least six mRNAs or proteins for which an attenuation slope had been determined.
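The per-category summary described above can be sketched as follows. The example is in Python with hypothetical gene and term labels, and illustrates the calculation only.

```python
from statistics import median

def relative_attenuation(slope):
    """Degree of relative attenuation in per cent: 100 x (1 - slope)."""
    return 100.0 * (1.0 - slope)

def median_attenuation_by_term(slopes, term_members, min_size=6):
    """Median attenuation per KEGG or GO term, keeping only terms with
    at least min_size genes for which a slope was determined."""
    result = {}
    for term, genes in term_members.items():
        measured = [relative_attenuation(slopes[g])
                    for g in genes if g in slopes]
        if len(measured) >= min_size:
            result[term] = median(measured)
    return result

# Hypothetical slopes and term memberships.
slopes = {"g1": 0.5, "g2": 0.6, "g3": 0.7, "g4": 0.8,
          "g5": 0.9, "g6": 1.0, "g7": 0.4, "g8": 0.3}
terms = {"termA": ["g1", "g2", "g3", "g4", "g5", "g6"],
         "termB": ["g7", "g8", "g9"]}
summary = median_attenuation_by_term(slopes, terms)
print(summary)
```

Here termB is dropped because only two of its members have a slope, which is below the minimum of six.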

For the comparison of proteins non-exponentially degraded in human cells versus yeast, we identified the yeast homologues of the human proteins quantified and assigned as either exponentially degraded (ED), non-exponentially degraded (NED) or undefined in a previous report25. We identified 759 yeast homologues for 3,187 human proteins covered in that report25 (around 23%), agreeing very well with the expected fraction of the human proteome that has yeast homologues. Of those 759 proteins, 146 were classified as NED, 349 classified as ED and 262 as undefined; two proteins could not be unambiguously mapped to one of these categories.

Chromosome-wide and strain-by-strain quantification of dosage compensation

To assess attenuation at the chromosome level, the median mRNA or protein log2 ratio of all genes encoded per chromosome or relative chromosome copy number change across isolates was calculated. For both disomic lab-engineered strains and natural isolates of S. cerevisiae, mRNA and protein expression log2 ratios between a gene’s expression in a strain and the gene’s expression over all euploid strains were examined in relation to the log2-transformed fold change of the copy number of the chromosome on which the gene is located. For gains of chromosomes, all log2 chromosome copy number changes for which fewer than 300 affected data points (genes) were quantified were excluded for distribution visualization. For chromosome losses, fewer data points were available overall, so the described cut-off was set at 50 data points (genes). The attenuation observed in the lab-engineered disomic strains measured by DIA-MS was comparable to that previously measured using SILAC10 (Extended Data Fig. 10).

To quantify the relationship between chromosome gains (log2 chromosome copy number/ploidy > 0, all relative chromosome copy number changes included) and relative mRNA or protein expression from aneuploid chromosomes, linear models were fitted between the log2 chromosome copy number change and the median relative mRNA or protein expression value.

For the strain-by-strain quantification of dosage compensation, non-parametric two-sided one-sample Wilcoxon tests were performed for each relative chromosome copy number change per isolate to compare the normalized log2 mRNA or protein expression distributions to the expected median (log2 chromosome copy number/basal ploidy). P values were corrected using the Benjamini–Hochberg method. In this way, it could be assessed for each isolate whether the observed attenuation of chromosomes with the same copy number change was significant. Aneuploid isolates were marked as ‘reverted to euploid’ if the pseudomedian of the protein-level Wilcoxon test was between −0.1 and 0.1. The attenuation at mRNA or protein level was calculated as 100 × (1 − pseudomedian/relative chromosome copy number change) per isolate, with ‘pseudomedian’ referring to the pseudomedian obtained from the Wilcoxon test. The calculation was performed only for isolates that had a single aneuploidy, or a complex aneuploidy in which all aneuploid chromosomes had the same relative chromosome copy number change. For example, attenuation levels could be calculated for a diploid isolate with one extra copy of chromosome 1 (such as isolate BDI) or for a diploid isolate with one gained copy each of chromosome 1 and chromosome 4 (such as isolate CFV), but not for a tetraploid isolate with a complex aneuploidy that gained one copy of chromosome 1 and lost one copy of chromosome 3 (such as isolate BRP).
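The quantities used in this strain-by-strain analysis can be illustrated with a short Python sketch. It computes the pseudomedian as the Hodges–Lehmann estimator (the median of all pairwise Walsh averages), derives the attenuation level from it and applies a Benjamini–Hochberg adjustment. The Wilcoxon test itself is not reimplemented here, the exact handling of the ±0.1 boundaries is an assumption, and all input values are hypothetical.

```python
import math
from statistics import median

def pseudomedian(values):
    """Hodges-Lehmann estimator: median of all Walsh averages
    (pairwise means, including each value paired with itself)."""
    n = len(values)
    walsh = [(values[i] + values[j]) / 2
             for i in range(n) for j in range(i, n)]
    return median(walsh)

def isolate_attenuation(pm, relative_cn_change):
    """Attenuation in per cent: 100 x (1 - pseudomedian / CN change)."""
    return 100.0 * (1.0 - pm / relative_cn_change)

def reverted_to_euploid(pm, bound=0.1):
    """Flag isolates whose protein-level pseudomedian lies near 0."""
    return -bound < pm < bound

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical normalized log2 protein ratios of genes on the gained
# chromosome of a diploid trisomic isolate.
protein_ratios = [0.40, 0.45, 0.50, 0.35, 0.55]
pm = pseudomedian(protein_ratios)
attenuation = isolate_attenuation(pm, math.log2(3 / 2))
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(pm, attenuation, reverted_to_euploid(pm), adjusted)
```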

For investigating the relationship between the degree of aneuploidy and dosage compensation, we defined an additional measure of degree of aneuploidy by calculating the ploidy-adjusted absolute number of protein copies per cell of all proteins encoded on aneuploid chromosomes in a given strain (referred to as ‘aneuploid protein load’). This measure correlated very well with the number of genes located on aneuploid chromosomes, a previously used measure of aneuploidy degree (for example, ref. 26; PCC = 0.96, P << 0.05).

Assessment of the trans transcriptome and proteome response in natural aneuploid isolates

Trans expression at the transcriptome and proteome level was defined as the mRNA or protein expression, respectively, of genes encoded on euploid chromosomes in aneuploid isolates. Genes up- or downregulated according to the ESR were mapped as described in ref. 44, genes up- or downregulated according to the CAGE signature were mapped as described in ref. 34 and genes annotated as upregulated in the APS were mapped as described in ref. 10. To find genes differentially expressed in trans in aneuploid strains—that is, genes encoded on euploid chromosomes of aneuploid strains that show up- or downregulation at the mRNA or protein level when compared with euploid strains—we calculated the gene-by-gene median normalized relative mRNA or protein abundances (log2 ratios) of all genes encoded on euploid chromosomes in aneuploid strains (n = 95). KEGG-pathway GSEA of these median relative expression values was performed with WebGestalt 2019 using the default settings (accessed December 2021)82. In addition, one-sample t-tests were used to compare gene-by-gene mean normalized protein log2 ratios across euploid chromosomes of aneuploid strains against the theoretical gene-by-gene mean protein log2 ratio value across euploid strains (µ = 0). P values were corrected for multiple hypothesis testing using the Benjamini–Hochberg method as implemented in the rstatix package83. Annotation of structural components was obtained through KEGG (accessed January 2021)69. Detailed proteasome component annotations were obtained from a previous study84.

For the analysis of the role of RPN4 in mediating the increase of proteasome abundance, RPN4 transcript levels were obtained from the transcriptome data of the natural isolate collection6, and TMM normalized and scaled as described above. RPN4 regulon targets (found in high-throughput screens and manually curated ones) were downloaded from SGD (accessed April 2023)81, with around 50 of these targets being measured in the proteomic dataset of the natural isolates.

Determination of ubiquitination levels

Relative levels of ubiquitinated proteins were determined gene-wise by calculating the log2 ratio between the measured abundance of a ubiquitinated protein in each strain and the median abundance of the ubiquitinated protein across all euploid strains. Assuming a distribution centred around 0 of relative levels of ubiquitinated proteins on euploid chromosomes, these relative abundances were then normalized strain-wise by subtracting the calculated median log2 ratio of all genes expressed on euploid chromosomes of a strain from all log2 ratios of that strain.

Attenuation and turnover analyses

Proteomes of the aneuploid yeast deletion collection were obtained from a previous study45. Fold changes were defined as ratios between protein abundances and the median abundances of the respective protein across all strains. Chromosomes were defined as duplicated when the median log2 expression levels were greater than 0.8 across all measured proteins on the respective chromosome. Log2 fold changes were averaged across the strains with duplications of the respective chromosome. Long and short half-lives were defined as being greater than the 75% quantile and less than the 25% quantile (n = 110), respectively. Half-lives were taken from a reference dataset and were obtained by metabolic labelling46.
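The two classification rules described above (chromosome duplication calls and half-life groups) can be sketched in Python as follows. The quantile interpolation method is an assumption chosen to mirror the default of the R quantile function, and the protein labels and values are hypothetical.

```python
from statistics import median, quantiles

def is_duplicated(log2_fold_changes, cutoff=0.8):
    """Call a chromosome duplicated when the median log2 fold change of
    all measured proteins encoded on it exceeds the cutoff."""
    return median(log2_fold_changes) > cutoff

def split_by_half_life(half_lives):
    """Return (short, long) protein lists: below the 25% quantile and
    above the 75% quantile of the half-life distribution."""
    q25, _, q75 = quantiles(list(half_lives.values()), n=4,
                            method="inclusive")
    short = sorted(p for p, h in half_lives.items() if h < q25)
    long_ = sorted(p for p, h in half_lives.items() if h > q75)
    return short, long_

# Hypothetical log2 fold changes of proteins on two chromosomes.
dup_call = is_duplicated([0.9, 1.0, 0.7, 0.95])
euploid_call = is_duplicated([0.1, -0.1, 0.0])

# Hypothetical half-lives (arbitrary units) of nine proteins.
half_lives = {f"p{i}": 10.0 * i for i in range(1, 10)}
short, long_ = split_by_half_life(half_lives)
print(dup_call, euploid_call, short, long_)
```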

To compare the turnover rates of proteins when expressed from aneuploid versus euploid chromosomes, we first filtered the turnover dataset to include only proteins for which we had turnover rates determined in at least 80% (44/55) of isolates and KNN-imputed the remaining missing values. We then quantile-normalized the turnover rates to correct for differences in overall turnover rates between isolates. We calculated the median quantile-normalized turnover rate for each protein when expressed from aneuploid chromosomes, from euploid chromosomes of aneuploid isolates or from euploid chromosomes of euploid isolates. These calculations were performed only for proteins for which turnover rates were determined at least three times on aneuploid and euploid chromosomes, respectively. The median turnover rates were then compared to count the number of times a protein exhibits a difference in turnover rates depending on whether it is expressed from aneuploid or euploid chromosomes.
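The coverage filter and the quantile normalization described above can be illustrated with the following Python sketch. It assumes a complete matrix after imputation (the KNN imputation step is not shown), ignores ties and uses hypothetical isolate labels and rates; it is a simplified stand-in for the R implementation used in practice.

```python
def passes_coverage(rates, min_fraction=0.8):
    """Keep a protein if turnover rates were determined in at least
    min_fraction of isolates; missing values are given as None."""
    measured = sum(1 for r in rates if r is not None)
    return measured / len(rates) >= min_fraction

def quantile_normalize(columns):
    """Quantile-normalize turnover rates across isolates.

    columns: {isolate: [rate per protein]}, equal lengths, no missing
    values. Each value is replaced by the mean, across isolates, of the
    values sharing its rank.
    """
    n = len(next(iter(columns.values())))
    sorted_cols = [sorted(v) for v in columns.values()]
    reference = [sum(col[k] for col in sorted_cols) / len(sorted_cols)
                 for k in range(n)]
    normalized = {}
    for isolate, values in columns.items():
        order = sorted(range(n), key=lambda i: values[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized[isolate] = out
    return normalized

# A protein measured in 44 of 55 isolates just meets the 80% criterion.
keep = passes_coverage([1.0] * 44 + [None] * 11)

# Hypothetical rates of three proteins in two isolates; isolate B has
# overall rates twice as high as isolate A.
rates = {"A": [1.0, 3.0, 2.0], "B": [2.0, 6.0, 4.0]}
normalized = quantile_normalize(rates)
print(keep, normalized)
```

After normalization both isolates share the same distribution of rates, so differences in overall turnover between isolates no longer dominate the per-protein comparison.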

To determine the relationship between protein attenuation and the overall turnover rates of isolates, the Pearson correlation between each protein’s log2 expression ratio in a given isolate (see ‘Calculation of relative chromosome copy numbers, mRNA and protein expression values’ section) and the isolates’ turnover rates was calculated. This analysis was performed for each protein expressed at least three times from aneuploid chromosomes, euploid chromosomes of euploid isolates or euploid chromosomes of aneuploid isolates. GSEA was conducted on Pearson correlation coefficients with WebGestalt 2019 using the default settings.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.