Abstract

Phenotypic cell-to-cell variability is a fundamental determinant of microbial fitness that contributes to stress adaptation and drug resistance. Gene expression heterogeneity underpins this variability but is challenging to study genome-wide. Here we examine the transcriptomes of >2,000 single fission yeast cells exposed to various environmental conditions by combining imaging, single-cell RNA sequencing and Bayesian true count recovery. We identify sets of highly variable genes during rapid proliferation in constant culture conditions. By integrating single-cell RNA sequencing and cell-size data, we provide insights into genes that are regulated during cell growth and division, including genes whose expression does not scale with cell size. We further analyse the heterogeneity of gene expression during adaptive and acute responses to changing environments. Entry into the stationary phase is preceded by a gradual, synchronized adaptation in gene regulation that is followed by highly variable gene expression when growth decreases. Conversely, sudden and acute heat shock leads to a stronger, coordinated response and adaptation across cells. This analysis reveals that the magnitude of global gene expression heterogeneity is regulated in response to different physiological conditions within populations of a unicellular eukaryote.

Main

Gene expression is tightly regulated at multiple levels, including chromatin structure, transcription, mRNA degradation and translation. This multi-layered process underpins the robust and timely expression of single proteins as well as the coordinated regulation of entire genetic programs that include dozens of genes. Yet, even in constant environments, the expression of specific genes varies between genetically identical cells, which leads to cell-to-cell heterogeneity in mRNA numbers and concentrations1,2,3. Cell-to-cell variability in gene expression results from different phenomena. First, the random timing of biological reactions makes transcription intrinsically stochastic. This form of variability, also called intrinsic noise, is gene specific and depends on the promoter sequence and chromatin states4,5. Heterogeneity in quantitative traits such as cell size, growth rate and the concentration of transcription factors also shapes gene expression variability in complex, non-trivial ways. This form of variability is not entirely stochastic and depends on other single-cell attributes that affect biomolecule numbers6,7. Furthermore, cells can enter dynamic cellular states that are characterized by specific gene expression programs. Examples of this include progression through the cell cycle or adoption of distinct metabolic states8. Cells in different states co-exist in cell populations or tissues leading to dynamic, yet deterministic, cell-to-cell variability in gene expression. Finally, cells in metazoan tissues belong to different cell types that are important for organ architecture and function. Although reversible and plastic, this form of individuality is not expected to be as dynamic as the transient cellular states.

The development of RNA sequencing protocols that support the analysis of entire transcriptomes from single cells has been instrumental in describing cell-to-cell variability and phenotypic heterogeneity in multicellular organisms9. Because such approaches sample expression levels of many genes in an unbiased manner, they provide insights into the molecular complexity of healthy tissues and tumours, affording a better understanding of tissue biology in health and disease10,11,12.

The gene-expression variability that is present in a population of unicellular organisms is conceptually different from heterogeneity in metazoan tissues. Our understanding of its structure and regulation remains superficial as transcriptomic approaches to sampling gene expression in single microbial cells have lagged behind13. This is mostly due to the small size and resistant cell walls of microorganisms14. Such approaches are critically needed because determining the extent of cell-to-cell variability in gene regulation in microbial populations is required to reach a mechanistic understanding of antibiotic resistance, cellular adaptation or population dynamics and evolution13,14.

Here, we overcome these limitations and develop an integrated experimental and computational framework to image individual cells of the fission yeast Schizosaccharomyces pombe followed by single-cell RNA sequencing (scRNA-seq) analysis and Bayesian true count recovery. Using this approach, we obtain a unique account of the heterogeneity in gene expression and cellular states as a function of cell size, growth and adaptation in this popular model organism.

Results

Single fission yeast cell imaging and transcriptome analysis

We developed an integrated approach for the imaging and isolation of single cells using a tetrad dissection microscope followed by transcriptome analysis through scRNA-seq (Fig. 1a). Datasets that combined images and scRNA-seq libraries were generated for 2,028 cells across a range of conditions together with 780 matching control libraries, each obtained from 3 pg total RNA (ctrRNA; Supplementary Table 13).

Fig. 1: Imaging and transcriptome analysis of single fission yeast cells.

a, Experimental and analysis pipelines. Batches of 96 single cells were imaged and isolated using an MSM-400 dissection microscope (Singer Instruments) (grey). Single-cell RNA-sequencing data were generated using SCRB-seq for yeast cells, normalized using bayNorm (green)17 and used for functional analysis (blue). b, Overall coverage of scRNA-seq datasets and selection of high-confidence genes. The highest raw count of each coding and non-coding gene observed across 21 datasets (n = 2,028 cells) is plotted as a function of the number of cells in which it was detected. All of the genes (n = 6,646) as well as the high-confidence genes used for all further analyses in this study (n = 1,011) are shown. High-confidence genes were defined as genes that represented >0.16% of the transcriptome of at least one cell. c, Gene expression levels in single cells. Normalized scRNA-seq counts for high-confidence genes are plotted against the population-average mRNA copies per cell reported by Marguerat and colleagues18. The normalized counts in single cells (n = 2,050,308 measurements) and average counts across cells are shown. RPearson = 0.61 and 0.48 for average counts (n = 1,011 genes) and when low-confidence genes were included, respectively.

For library preparation, we used a variation of the single-cell RNA barcoding and sequencing (SCRB-seq) protocol15. Our approach targets 3′-end cDNA sequences, includes unique molecular identifiers (UMIs) and benefits from optimizations of the Smart-seq2 protocol (see Methods and Supplementary Table 9)15,16. We generated 8.2 × 10⁵ mappable sequencing reads per cell that were clustered around the transcription termination sites and corresponded to 6,721.3 unique mRNA molecules on average (Supplementary Fig. 1a–d, Supplementary Tables 1–4). This represents a mean transcriptome coverage of ~1.5% or ~6% on the basis of calibration with spike-in controls or single-molecule fluorescence in situ hybridization (smFISH) measurements, respectively (Supplementary Fig. 1e).

On average we could detect 1,421.1 genes per cell, but as genes with low molecule counts carried little information on true expression levels, we focused this study on a consolidated list of 1,011 robustly detected high-confidence genes that represented 18.5% and 4.3% of all coding and non-coding genes, respectively (see Methods; Fig. 1b and Supplementary Fig. 1f,g). These genes were often highly expressed in cell populations, involved in most cellular processes and showed constitutive, as well as condition-specific, regulation (Fig. 1b, Supplementary Fig. 1h and Supplementary Table 6).
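The selection rule for high-confidence genes (a gene must account for more than 0.16% of the transcriptome of at least one cell) can be sketched as follows. This is a minimal illustration and not the code used in the study; the function name, the nested-dictionary input layout and the toy counts are assumptions made for the example.

```python
def high_confidence_genes(counts, min_fraction=0.0016):
    """Return genes exceeding `min_fraction` of the total counts in >= 1 cell.

    `counts` maps a cell identifier to a dict of gene -> raw UMI count.
    The 0.0016 default mirrors the >0.16% threshold quoted in the text;
    the data layout itself is an assumption made for this sketch.
    """
    selected = set()
    for gene_counts in counts.values():
        total = sum(gene_counts.values())
        if total == 0:
            continue  # skip empty libraries to avoid dividing by zero
        for gene, n in gene_counts.items():
            if n / total > min_fraction:
                selected.add(gene)
    return selected


# Hypothetical toy data: geneX passes the threshold only in cell_b.
toy_counts = {
    "cell_a": {"geneX": 10, "geneY": 1, "geneZ": 9989},
    "cell_b": {"geneX": 20, "geneZ": 980},
}
selected = high_confidence_genes(toy_counts)
```

Because the rule only requires the threshold to be met in a single cell, genes with strong condition-specific expression can be retained even when they are silent in most cells.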

The shallow transcriptome coverage inherent to scRNA-seq together with the high level of amplification required to detect small yeast transcriptomes made the pre-processing of data challenging. We used a Bayesian normalization approach called bayNorm that performs true mRNA count recovery based on cell-specific mRNA mean capture efficiencies (β) and gene-specific priors estimated from the data (see Methods and Supplementary Fig. 2a)17. The methodology and performance of this approach is reported in detail elsewhere17. Applied to our datasets, bayNorm generated true count distributions that were highly similar to measurements from population RNA-seq (Supplementary Fig. 2b) and to absolute mRNA counts obtained by smFISH (Supplementary Fig. 2c and Supplementary Table 5)17,18. Finally, true counts correlated with expression levels derived from cell populations18 while preserving information about cell-to-cell variability in mRNA expression (average RPearson = 0.78; Fig. 1c and Supplementary Fig. 2b–e). In summary, we generated a combined dataset of transcriptomes and microscopy images from single yeast cells, which is analysed in detail below.

Cell-to-cell variability of transcriptome regulation

We first searched for mRNAs with high cell-to-cell variability in comparison to most genes (highly variable genes, HVGs). For this, we used nine scRNA-seq datasets consisting of 864 unperturbed cells growing exponentially at 2–10 × 10⁶ cells ml⁻¹ (Supplementary Table 2). As reported previously and expected from stochastic gene expression models19,20, the coefficients of variation (CV, σ/µ) and means of normalized counts were anti-correlated in cells and ctrRNA (Supplementary Fig. 3a). We therefore defined HVGs as mRNAs with CVs that were significantly higher than this overall trend, using simulated data with Poisson noise as the only source of variability as a reference (see Methods and Fig. 2a). We applied this procedure to each dataset normalized separately because it led to CVs that were closest to smFISH measurements while avoiding batch effects (Supplementary Fig. 2c). We identified 411 genes with CVs that were significantly higher than the baseline in at least one dataset; 112 of these genes were also present in ctrRNAs and were discarded as false positives. This analysis generated a list of 299 high-confidence HVGs (Supplementary Fig. 3b and Supplementary Table 6). To investigate the specificity of our scRNA-seq approach, we analysed five control genes and eight HVGs covering a range of variability scores and biological functions by smFISH. Significantly higher size-corrected Fano factors (scFano) were evident in HVGs, thus confirming their higher noise levels (Fig. 2b,c and Supplementary Fig. 3d).
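The idea of scoring a gene by how far its CV lies above a Poisson-only baseline can be illustrated with a simplified sketch. Note the substitution: the study compares each gene with simulated Poisson data, a Loess fit, z scores and bootstrapping, whereas the sketch below uses the closed-form Poisson expectation (CV = 1/√mean) and only ranks genes by their excess CV. All function names and the toy counts are hypothetical.

```python
import math
import statistics


def cv(values):
    """Coefficient of variation (population standard deviation / mean)."""
    return statistics.pstdev(values) / statistics.fmean(values)


def delta_cv(values):
    """Excess CV over the analytic Poisson expectation, 1 / sqrt(mean).

    Simplification: the study derives its baseline from simulated data,
    not from this closed-form expression.
    """
    return cv(values) - 1 / math.sqrt(statistics.fmean(values))


def rank_by_excess_variability(counts_by_gene):
    """Order genes from most to least variable relative to Poisson noise."""
    scores = {
        gene: delta_cv(values)
        for gene, values in counts_by_gene.items()
        if statistics.fmean(values) > 0  # the CV is undefined at zero mean
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical counts: both genes have a mean of 10 molecules per cell.
toy = {
    "bursty": [0, 20] * 50,      # bimodal, far noisier than Poisson
    "steady": [9, 10, 11] * 30,  # tighter than Poisson
}
ranking = rank_by_excess_variability(toy)
```

Comparing genes at matched mean expression matters because the CV falls with the mean even for pure sampling noise, so a raw CV threshold would mostly select lowly expressed genes.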

Fig. 2: Cell-to-cell variability of the fission yeast transcriptome.

a, Identification of HVGs. The CV of normalized counts are plotted against their respective mean expression for all filtered normalized genes (mostly hidden; n = 1,001), HVGs (n = 125) and simulated synthetic control RNA-seq data (synRNA; n = 1,001 genes; see Methods). Genes are called variable if their ΔCV is significantly higher than the distribution of ΔCV from synRNA of similar mean expression using z scores and assuming normality (P < 0.1 in ≥85% of the bootstrapped samples; see Methods). The dashed line represents a Loess fit to synRNAs and the ΔCV of an example variable gene is highlighted by an arrow. b, Validation by smFISH of mRNA called variable from the scRNA-seq data. Representative smFISH images are shown for low-variability control rpb1 mRNA and three mRNAs with different levels of variability (lsd90, mot1 and SPAC27D7.09c; see also Supplementary Figs. 2c and 3c,d). The scale bars represent 5 μm and the scFano values are indicated on each plot. c, Boxplot showing scFano factors measured by smFISH for HVGs (n = 8) or control genes (n = 5). The P value from a one-sided Wilcoxon test is shown above the figure. d, Functional analysis of variable genes from 9 datasets consisting of 96 cells each (n = 864 cells total) during rapid proliferation. The significance of the overlap of variable mRNAs in each dataset with selected functional categories is shown. P values were corrected for multiple testing using the Benjamini–Hochberg procedure; the number of tests = 14 categories per dataset. False positive mRNAs that were called from total ctrRNA experiments were filtered out (Supplementary Fig. 3b). Note that some categories are more pervasively variable across datasets than others. Transcription factor data (unpublished observations by Bähler lab). e, Functional analysis of non-periodic HVGs. HVGs that were not among the top 500 most cell-cycle periodic genes in cell populations were sorted into three categories with 'low', 'moderate' or 'high' pervasive variable expression (n = 235 genes). Selected distinctive gene functions are shown in the plot. Boxplots represent the median, interquartile range and most extreme data points that are no more than 1.5× the interquartile range.

Genes that are periodically expressed during the cell cycle have been identified in synchronized cell populations21,22,23,24. We hypothesized that these genes would be over-represented among HVGs, as we sampled asynchronous cells from different cell-cycle stages. Accordingly, 53.3% of the top 500 periodic genes24 that were found among high-confidence genes were HVGs (Supplementary Fig. 3b; PFisher = 5.1 × 10⁻⁹). In addition, genes associated with phase-specific expression during the cell cycle were generally enriched among variable genes (Fig. 2d). To evaluate the sensitivity of our approach, we analysed the top 500 periodic genes that were not HVGs. These showed significant but low amplitude and periodicity in population RNA-seq data, which is consistent with scRNA-seq being a less sensitive approach (Supplementary Fig. 4a–c). Only a minority of periodic genes were false positives (12.5%), thus confirming the specificity of our experimental and computational protocols (Supplementary Fig. 3b and Supplementary Table 6). Finally, this analysis demonstrates that periodic gene expression is a single-cell feature of asynchronous populations and not a technical artefact of cell-cycle synchronization25.
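Over-representation tests of this kind (a gene list overlapping a functional category more than expected by chance) are commonly computed as a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The sketch below shows that calculation with the standard library only. The function name and the example counts are hypothetical, and the underlying contingency table for the reported P value is not given in this section, so it is not reproduced here.

```python
from math import comb


def enrichment_p(overlap, list_size, category_size, universe):
    """One-sided P value for observing >= `overlap` shared genes.

    universe       total number of genes considered
    category_size  genes in the category (for example, periodic genes)
    list_size      genes in the tested list (for example, HVGs)
    overlap        genes found in both
    Computed as the hypergeometric upper tail.
    """
    total = comb(universe, list_size)
    tail = sum(
        comb(category_size, k) * comb(universe - category_size, list_size - k)
        for k in range(overlap, min(list_size, category_size) + 1)
    )
    return tail / total


# Hypothetical toy case: 10 genes, 5 in the category, 5 in the list, all shared.
p_toy = enrichment_p(overlap=5, list_size=5, category_size=5, universe=10)
```

The choice of universe (here, the high-confidence genes) strongly affects the result, because genes that could never have been detected should not count as background.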

Notably, most HVGs were not cell-cycle periodic and could not be identified in synchronized cell populations. To characterize these genes, we split them into three categories on the basis of the number of datasets in which they were highly variable (>3, highly pervasive; 2–3, moderately pervasive and 1, lowly pervasive; Fig. 2e and Supplementary Table 6). These categories describe how robustly variable each gene is across biological replicates. Importantly, some lowly pervasive genes, such as lsd90, demonstrated high amplitudes of regulation (Fig. 2b and Supplementary Fig. 3c,d). Moderately and highly pervasive HVGs were related to mitochondria and heat-shock response. Interestingly, the genes that encode Nmt1 and the associated biosynthetic enzyme Thi2 were among the most pervasively variable HVGs, which suggests widespread heterogeneity in vitamin B1 metabolism. In terms of gene expression regulators, the transcription factor Fil1, which controls the amino acid starvation response and the TATA-associated factor Mot1, a general transcription factor, were pervasively variable (Fig. 2b and Supplementary Fig. 3c,d)26,27. The latter is consistent with the association of TATA-box sequences with variable and noisy genes5,28,29,30,31,32 and with a role of Mot1 expression variability in this regulation27. Lowly pervasive HVGs span diverse functions that are related to membrane biology and the adaptation to external conditions, and include genes from the core environmental stress response (CESR) program (Supplementary Fig. 3b)33. Low pervasive variability could result from subtle responses to external fluctuations, consistent with recent budding yeast scRNA-seq data14.

We finally investigated the association of HVGs with several cellular and genetic features (Supplementary Fig. 4d). Interestingly, budding yeast orthologues of HVGs were highly variable between cells at the protein level. This indicates that the architecture of gene expression variability is at least partially conserved between both yeasts19. HVGs were more regulated in response to environmental and genetic perturbations34,35, which suggests that noisy transcription could underlie rapid adaptation to unpredictable challenges36. Variable genes have been reported to evolve rapidly28,29. Accordingly, HVGs demonstrated higher evolution rates and non-synonymous/synonymous mutation ratios between fission yeast species35,37. Conversely, HVGs showed fewer negative genetic interactions and less co-regulation with other genes35. This suggests that variability may be detrimental to large protein complexes or highly connected regulatory networks30. Regarding promoter sequences, HVGs were likely to have a canonical TATA box, as has been observed in other organisms, but showed only a moderate enrichment for specific transcription factor binding sites (Supplementary Table 10).

In summary, our analysis defines the functional organization and pervasiveness of genome-wide gene expression variability in unperturbed fission yeast cells.

Cell size dependence of transcriptome regulation

Single-cell RNA-seq provides a snapshot of the gene-expression variability and cell states in a population. The interpretation of this information can be greatly facilitated by integrating transcriptomics data with measures of quantitative cellular features6,19,38,39,40. Our combined approach offers such capabilities as it includes microscopy images of each cell matched to their respective transcriptomes (Fig. 1a).

We used cell-length measurements from images across all growth datasets to order cells as a function of size, which reflects progression through the cell cycle (Supplementary Table 3). We first examined changes in the global properties of scRNA-seq measurements as a function of size. The mean cell length during rapid growth was 10.9 µm, consistent with reported data41 (Fig. 3a). Mean normalized scRNA-seq counts were constant across the size range, consistent with bayNorm returning size-corrected absolute molecule numbers (which are proportional to concentrations; Fig. 3a).

Fig. 3: Cell-size dependence of the fission yeast transcriptome.

a, Cell length distribution across 864 cells during rapid proliferation and global characteristics of the corresponding transcriptomes (histogram). The mean raw expression scores (blue) and mean bayNorm-normalized expression scores (green) are shown for cell length bins of 1 μm. Note the positive correlation of raw scores with cell size that is lost following normalization. b, Single cells were assigned to functional categories on the basis of their relative transcriptome signatures. The boxplots show cells assigned to categories associated with cell sizes that were significantly smaller (blue; PWilcoxon,one-sided < 0.05) or larger (red; PWilcoxon,one-sided < 0.05) than the overall population (green). The boxplots are overlaid onto the cell size frequency histogram shown in a. The vertical line marks the average cell length in the dataset. Boxplots represent median, interquartile range and most extreme data points that are no more than 1.5× the interquartile range. c, Differential expression analysis between large (13–16 μm; n = 292) and small (8–10 μm; n = 281) cells performed using the MAST package70. The number of bootstrap iterations showing significant differential expression call is plotted for each gene (PMAST < 0.05; total iterations = 100) as a function of MAST log2 differential expression ratios. The genes that were significantly induced in small and large cells are highlighted in blue and red, respectively (cut-off: number of significant iterations > 90 and absolute log2[ratio] > 0.2; see Methods and Supplementary Table 6). Selected functional categories that were significantly enriched in either list are shown with the enrichment P values. PFisher,one-sided values were corrected for multiple testing using the Benjamini–Hochberg procedure. d, Transcripts that change in concentration during the G2 growth phase (non-scaling genes). The average bayNorm expression scores were computed in bins of 1 μm for cells shorter than 11 μm, normalized to the smallest size bin and used for k-means clustering (n = 414 cells; Supplementary Fig. 5a). Only genes with significant linear correlation with cell size were included in this analysis (n = 78 genes; PPearson,two-sided < 0.05). The boxplots represent the median and interquartile range. e, Co-regulation of non-scaling gene clusters. The Pearson correlation coefficients between the clusters from d are shown.

In EMM2 medium, fission yeasts elongate during the G2 phase for over two-thirds of the cell cycle. At mitosis, cells stop growing until cell division, which occurs in the G1/S phase. To validate our image-based classification of scRNA-seq data, we investigated whether transcriptome signatures of the M/G1/S phases were apparent in larger cells. As expected, these featured increased transcriptome fractions related to processes specific to G1/S transition and cell-wall biogenesis (Fig. 3b). This was also apparent when the expression counts were plotted as a function of cell length (Supplementary Fig. 5a). Besides cell-cycle signatures, some large cells showed increased transcriptome fractions related to respiratory metabolism, which increases during the reductive building phase of the yeast metabolic cycle (YMC; Fig. 3b; see below)8,42.

We then used cell-length measurements to guide our analysis of the transcriptional program associated with cell proliferation. To do so, we searched for genes that were differentially expressed between large cells in M/G1/S and small, recently born G2 cells using bayNorm priors specific to each group (Fig. 3c and Supplementary Fig. 5b,c,e). We identified 92 genes that were significantly upregulated in large cells (Supplementary Table 6). Consistent with large cells being in M/G1/S, 28.3% of these were also periodically expressed in synchronized cell populations21,22,23,24. Twice as many genes (193) were induced in small cells, 19.2% of which are periodic24. Importantly, this analysis combined with HVG detection based on ΔCV (Fig. 2) retrieved 81.7% of the top 500 periodic genes present in the dataset, with the remaining genes showing no apparent regulation (Supplementary Fig. 4b,c). A significant proportion of genes that were overexpressed in small cells belonged to the stress-response program (PFisher,one-sided = 0.02) and/or had hydrolase activity (PFisher,one-sided = 0.002). Large cells, on the other hand, induced several genes involved in mitochondrial membrane transport. These observations, together with the analysis from Fig. 3b, are reminiscent of the YMC. We therefore analysed gene signatures of the YMC phases: reductive charging, oxidative and reductive building. Signatures of the YMC were compartmentalized with cell size and cycle (Supplementary Fig. 5d)42. Reductive charging genes were expressed at higher levels in small cells, whereas the expression of reductive building genes increased in large cells at the time of DNA replication (P = 6.1 × 10⁻⁵)8,42. This analysis raises the possibility of a YMC, synchronized with the cell cycle, in the proliferation of asynchronous single fission yeast cells43,44. These signatures were not apparent in the HVGs identified in Fig. 2, which demonstrates the increased sensitivity that is provided by combining both imaging and transcriptomics.

The molecule numbers of most mRNAs increase coordinately (scale) with cell size to maintain concentrations45,46. Accordingly, the average UMI-corrected raw counts per cell correlated with cell size (RPearson = 0.17, PPearson,two-sided = 6.6 × 10⁻⁷; Fig. 3a). Genes that escape this trend have not been characterized globally, yet they could be important in regulating growth and cell-size homoeostasis46. To identify genes that escape scaling, we analysed G2 cells between birth and a length of 11 µm; beyond this we found strong signatures of the M/G1/S programs (Supplementary Fig. 5a and Supplementary Table 6). The concentration of 78 genes changed coordinately with cell size during G2 (see Methods; PPearson,two-sided < 0.05). Using k-means clustering, we defined five small gene clusters, three of which increased (Cl1–Cl3) and one of which decreased (Cl5) in concentration (Fig. 3d and Supplementary Table 6). Cl4 showed significant, but low-amplitude, positive regulation with size. We assessed whether these clusters defined one or more cellular states by looking at the correlation between single cells of the clusters (see Methods). We found evidence for two states in our datasets. The first was defined by Cl1 and Cl2, which were positively correlated and contained genes that are also upregulated during meiotic differentiation (Fig. 3e)47. Cl3 and Cl5 were anti-correlated and defined a second state containing a small number of genes that function in carbohydrate metabolism (Fig. 3e). Although not significant, this enrichment could hint at a gradual change in cellular energy metabolism coordinated with cell size. An orthogonal random forest approach confirmed that 69 of the 100 genes with the strongest nonlinear correlation with cell size were either differentially expressed between large and small cells or escaped scaling (Supplementary Table 6). Together, this analysis uncovers gene expression programs that occur during growth in G2 and escape coordination with cell size.
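The binning step behind the non-scaling analysis (average normalized expression in 1 µm length bins for cells shorter than 11 µm, expressed relative to the smallest bin, then tested for a linear trend with size) can be sketched as follows. This is an illustrative outline under stated assumptions: the function names, the use of the floor of the cell length as the bin label and the toy values are hypothetical, and the significance testing and k-means clustering used in the study are omitted.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def binned_profile(lengths, expression, max_length=11.0):
    """Mean expression per 1 µm length bin, relative to the smallest bin.

    Cells at or above `max_length` are excluded, mirroring the G2-only
    window described in the text. Assumes the smallest bin has a
    non-zero mean expression.
    """
    bins = defaultdict(list)
    for length, value in zip(lengths, expression):
        if length < max_length:
            bins[math.floor(length)].append(value)
    ordered = sorted(bins)
    means = [sum(bins[b]) / len(bins[b]) for b in ordered]
    return ordered, [m / means[0] for m in means]


# Hypothetical gene whose concentration rises with length; the 12 µm cell
# falls outside the G2 window and is ignored.
lengths = [7.2, 7.8, 8.1, 8.9, 9.5, 10.4, 12.0]
expression = [2, 2, 3, 3, 4, 5, 100]
bin_labels, profile = binned_profile(lengths, expression)
trend = pearson(bin_labels, profile)
```

Because bayNorm counts are described as proportional to concentrations, a scaling gene would give a flat profile near 1.0 across bins, and a sustained departure from 1.0 marks a candidate non-scaling gene.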

Interestingly, 45.5% of HVGs from Fig. 2 were more highly expressed in large or small cells, or escaped scaling. Their variability is therefore not stochastic but can be understood in the light of two physiological variables: cell size and cell-cycle stage. This demonstrates the potential of analysing scRNA-seq datasets in the context of quantitative cellular features to understand gene regulation.

Transcriptome heterogeneity within cell populations in response to environmental changes

Defining the impact of environmental signals on gene expression heterogeneity is important to understand how these factors shape population structures and adaptation. We generated a blueprint of 1,824 single-cell transcriptional profiles in a series of environmental conditions, which included stress response, high cell density and nutrient depletion (Supplementary Tables 1 and 2). To compare and contrast single-cell responses to different environments, we focused on 110 genes that are upregulated as part of the CESR33. Transcriptional signatures could be clearly distinguished using principal component analysis of bayNorm-normalized counts (Fig. 4a). Interestingly, cells growing rapidly in constant conditions occupied a distinct area of the transcriptional space, thus validating our previous observation that an exacerbated stress response is not common in single cells during rapid proliferation (Fig. 2d). We then examined the specificity of the transcriptional programs defined by scRNA-seq, with a focus on heat shock and oxidative-stress genes33. These signatures singled out cells that had experienced the corresponding stresses, thus confirming the specificity of scRNA-seq and the capacity of bayNorm normalization to correct for experimental batch effects (Fig. 4b,c).

Fig. 4: Gene-expression heterogeneity of fission yeast populations in response to environmental changes.

a, Single cells show distinct stress signatures of gene expression in response to different external conditions. Principal component analysis (PCA) of normalized gene expression scores for the CESR genes. A total of 1,824 cells that grew in different external conditions and included the cells from Figs. 1–3 were analysed (Supplementary Tables 1–4). Each condition is colour coded as per the legend on the right and larger groups are circled and annotated. b, Genes specific to the heat-shock response as in a (n = 1,824 cells)33. c, Genes specific to the oxidative-stress response as in a (n = 1,824 cells)33. d, Heterogeneity in CESR gene expression during rapid proliferation and entry into the stationary phase. Average CESR gene expression per cell plotted as a function of cell size. The cell density is colour coded as per the legend on the right (n = 1,056 cells). The dashed lines represent one, two or three standard deviations from the mean of the dataset. e, Between-cell CV in CESR gene expression and average expression (inset) related to d. Note the strong increase in average CESR expression per cell that is accompanied by an increase in expression heterogeneity that occurs at higher cell densities (n = 1,056 cells). f, Heterogeneity in CESR gene expression during acute response and adaptation to heat shock. The average CESR gene expression per cell is plotted as a function of cell size. The conditions are colour coded as per the legend on the right (n = 576 cells). The dashed lines represent one, two or three standard deviations from the mean of the dataset. g, Between-cell CV in CESR gene expression (main panel) and average expression (inset) related to f. Note the acute increase in average expression per cell of heat-shock genes and the lack of increase in expression heterogeneity during acute and adaptive responses (n = 576 cells). Boxplots represent median, interquartile range and most extreme data points that are no more than 1.5× the interquartile range.

We then investigated whether the dynamics and heterogeneity of responses differed between perturbations. We first analysed the response of single cells to a gradual change in external conditions. Specifically, we analysed cells growing at densities that ranged from 2 × 10⁶ to 74 × 10⁶ cells ml⁻¹, which encompassed rapid proliferation and the early stages of the stationary phase (Supplementary Fig. 6a and Supplementary Table 13). We observed a progressive increase in CESR mRNAs up to a density of about 40 × 10⁶ cells ml⁻¹ coordinated with a decrease in mRNAs from the translation and the cell-growth programs (Supplementary Fig. 6b)33. The concentration of ribosomes is known to increase with growth rate to support higher biosynthetic demand48. We therefore examined whether a decrease in the concentration of growth-related mRNA would affect growth rates. Surprisingly, growth rates remained constant up to approximately 40 × 10⁶ cells ml⁻¹ (Supplementary Fig. 6a). This indicates that mRNAs of the translation machinery can decrease in concentration in response to environmental changes independently of, and without affecting, cell growth. This is consistent with the existence of a free ribosome fraction that buffers growth and environment48. Importantly, only a few other mRNA classes showed coordination with cell density, which indicates that this behaviour is not ubiquitous (Supplementary Fig. 6c, left). Notably, the increase in CESR mRNA concentration was accompanied neither by an increase in gene-expression noise nor by the appearance of outlier cells that had entered a full stress-resistance state (Fig. 4d,e and Supplementary Fig. 6c, right). This result indicates that single cells undergo a gradual and synchronized adaptation of gene expression at increasing cell densities.

Strikingly, within the following cell division, we detected a strong and heterogeneous induction of CESR genes (Fig. 4d,e) together with a decrease in growth rate (Supplementary Fig. 6a). Importantly, exhaustive functional analyses revealed that increased transcriptional heterogeneity was restricted to specific pathways and not a global property of the transcriptomes (Supplementary Fig. 6c, right). Additional genes showing strong heterogeneous responses were also regulated during meiotic differentiation and growth on glycerol (Supplementary Fig. 6c, middle and right)47,49. Together, these data support a model in which single cells readjust the balance of the stress- and growth-related transcriptional programs synchronously as a function of cell density and ahead of changes in growth rate. This is followed, within a single cell cycle, by a substantial, heterogeneous reshuffling of the cellular transcriptome. These findings indicate that entry into the stationary phase is a process that increases transcriptional heterogeneity and possibly promotes cell individuality and differentiation.

We then examined the impact of a rapid and severe change in external conditions on gene expression. The culture conditions of cells were briefly switched from 25 °C to 37 °C in a turbidostat and grown at steady-state at 37 °C (adaptation) or 25 °C (relaxation). The expression of CESR genes rapidly increased following a temperature switch and adjusted back to pre-stress levels during both adaptation and relaxation (Fig. 4f,g). In stark contrast with entry into the stationary phase, only a minor increase in transcriptional heterogeneity could be detected during heat shock, which did not propagate during adaptation or relaxation. This suggests that the acute stress response can be synchronous in a cell population and does not lead to phenotypic heterogeneity (Fig. 4g). Together, this analysis demonstrates that the level of transcriptional heterogeneity that is induced by changes in external conditions is variable and regulated, depending on the type and strength of the stimulus.

Conclusion

We report an integrated approach to analyse transcriptomes of single yeast cells in combination with phenotypic measurements. We also provide an account of genome-wide gene expression heterogeneity in fission yeast, during rapid proliferation under constant culture conditions and in response to environmental changes. When conditions are constant, the periodic gene expression during the cell cycle is the most robust and pervasive form of heterogeneity. However, G2-specific expression signatures reminiscent of the YMC of a budding yeast also exist together with genes that escape scaling with cell size. This analysis relied on the ability to order and classify cells based on size, independently of the scRNA-seq data. A set-up for quantitative imaging of high-dimensional morphological features coupled to our approach would extend its potential to additional cellular traits, such as nuclear size, mitochondrial numbers or actin structure. This would enable a better understanding of the hidden diversity of cellular states that occur during growth and adaptation. We analysed gene expression heterogeneity and its dynamics in response to environmental changes. We observed striking differences between stationary phase entry, a heterogeneous process, and an acute heat-shock response, which seemed to be more coordinated. This raises the question of whether gene-expression heterogeneity depends on the strength of the challenge and indicates that expression heterogeneity is controlled in a condition-specific manner. Analysis of diverse environmental challenges at the single-cell level will be required to understand the root of this variability. In particular, a comparison of post-mitotic quiescent cells with proliferating cells would inform on the impact of growth on heterogeneity. 
Overall, in addition to increasing our understanding of how a single-celled eukaryote functions, the findings reported here highlight the potential of investigating gene regulation as a cause and/or consequence of quantitative cellular phenotypes, such as cell size, genome-wide in single cells.

Methods

The analysis of scRNA-seq is challenging because the yeast cell wall is resistant to standard lysis conditions that preserve RNA integrity. Moreover, yeast transcriptomes are highly plastic and respond to external conditions within minutes, making cell isolation and manipulation a source of artefacts14,33. We have overcome both hurdles by snap-freezing cells immediately after harvesting, a procedure that fixes both cell morphology and transcriptomes, and establishing a protocol for yeast cell lysis at high temperature in conditions that protect RNA integrity, thus bypassing the need for enzymatic digestion of the cell wall.

Cell culture

Fission yeast cells (strain 972 h-) were cultured with a seeding density of 0.5 × 10⁶ cells ml⁻¹ in all experiments. Standard EMM2 medium was used except when indicated otherwise50. All culturing conditions are described in detail in Supplementary Table 2 and are assigned to individual samples in Supplementary Tables 1 and 3. All cultures were snap frozen at the time of harvest. This treatment kills fission yeast cells and precludes any changes in gene expression during the following isolation steps.

Heat stress

Cells were grown in YE medium at 25 °C up to a density of 2–4 × 10⁶ cells ml⁻¹. The cells were transferred to a water bath maintained at 37 °C or 39 °C (for datasets 1712_1 and 1712_2, 0408_2). To study adaptation to heat, cells were transferred post-heat shock to a turbidostat and maintained at a density of 4 × 10⁶ cells ml⁻¹ and a temperature of either 25 °C or 37 °C (siphon-flow based derivative of the instrument described in ref. 51).

Glycerol growth

Cells were grown in YE medium with 3% glycerol and 0.1% glucose.

Osmotic shock and oxidative stress

Cells were cultured in YE medium up to a density of 4 × 10⁶ cells ml⁻¹. To induce osmotic shock, an equal volume of 2 M sorbitol prepared in YE medium was added to the cell culture to a final concentration of 1 M for 15 min. To induce oxidative stress, cells were treated with 0.5 mM H2O2 for 15 or 60 min.

Nitrogen starvation

Cells were pre-cultured in EMM2 medium with NH4Cl as the nitrogen source up to a density of 2 × 10⁶ cells ml⁻¹. Cells were harvested by centrifugation, washed twice with EMM2 medium without nitrogen and re-suspended in medium without nitrogen. The cells were harvested at 6 and 24 h after starvation.

Cell isolation and imaging

Single cells were imaged with a ×20 objective on a MSM-400 tetrad-dissection microscope (Singer Instruments), picked into 3 µl QuickExtract RNA extraction solution (Lucigen, Epicenter) in 200 µl PCR tubes and immediately snap frozen at −80 °C. The use of the QuickExtract buffer solution is critical to protect RNA against degradation during cell lysis. For each ctrRNA sample, 3 pg total RNA isolated from matching cultures by hot phenol extraction were diluted in QuickExtract and processed as single cells. Single cell images were analysed using ImageJ. All cells and ctrRNA samples are described in Supplementary Table 3.

SCRB-seq for yeast library preparation

Single cells were lysed at 98 °C for 10 min in a PCR machine and library preparation was performed based on refs. 15,16,52 using the primer sequences described in Supplementary Table 9. The protocol was modified as follows. Briefly, oligo(dT)-containing cell indexes and UMIs were added to each well to a final concentration of 1 µM. Primers were annealed to the RNA template at 72 °C for 3 min and the components for reverse transcription added to the following final concentrations: 100 U Superscript II reverse transcriptase (Invitrogen); 10 U RNase inhibitor (Invitrogen); 1×Superscript buffer; 5 mM dithiothreitol; 1 M betaine (Sigma); 1.5 mM MgCl2; 1 mM of each dNTP; 1 µl ERCC spikes Set A diluted 1/10⁶ (NEB) and 1 µM RNA-TSO primer. Reverse transcription was carried out at 42 °C for 90 min, after which the temperature was ramped between 50 °C and 42 °C for 10 cycles of 2 min each. The reaction was heat inactivated at 70 °C for 15 min and cooled to 15 °C. Each single-cell sample was treated with 20 U Exonuclease I (NEB) for 30 min at 37 °C followed by heat inactivation at 80 °C for 20 min. Sets of 96 samples were pooled and purified using a PCR purification kit (Qiagen) and eluted in 60 µl elution buffer containing 10 mM Tris–Cl, pH 7.5. The samples were treated with 40 U exonuclease I for 30 min at 37 °C for a second time followed by heat inactivation at 80 °C for 20 min. PCR was performed on the pooled sample with the addition of 1×KAPA HiFi buffer, 0.075 mM of each dNTP, 1 µM PCR primer and 1.25 U KAPA HiFi enzyme. PCR cycling was performed with denaturation at 98 °C for 3 min, followed by 25 cycles of denaturation, annealing and extension at 98 °C, 60 °C and 72 °C for 20 s, 15 s and 1 min, respectively. A final extension at 72 °C for 5 min was conducted before cooling the samples to 15 °C. The samples were purified using 0.6×Agencourt AMPure XP beads and eluted in 10–15 µl nuclease-free 10 mM Tris, pH 7.5. 
The libraries were quantified on an Agilent Bioanalyser using an HS-DNA chip to confirm the presence of a clean peak at ~1,000 bp. Between 1 and 2 ng of PCR library was used for tagmentation with the Illumina Nextera XT kit and a modified I5 primer as described in refs. 15,52 (Supplementary Table 9). Between 8 and 12 PCR cycles were performed post-tagmentation to amplify the 3′ fragments carrying the poly-A tail, the cell barcode and the UMI. The final libraries were purified twice using Agencourt AMPure XP beads at 1×bead concentration and the final elution was performed in elution buffer. The libraries were quantified on an Agilent Bioanalyser and sequenced.

smFISH

Measurements of cell size, mRNA number per cell and cellular mRNA concentrations were obtained for 12 genes by smFISH as described in ref. 53. The genes queried were: SPBC16E9.16c (lsd90), SPAC27D7.09c, SPCC330.02 (rhp7), SPBC725.11c (php2), SPBC28F2.12 (rpb1), SPBC1826.01c (mot1), SPCC1223.11 (ptc2), SPBC146.13c (myo1), SPAPB1E7.04c, SPAC328.03 (tps1), SPAC2H10.01 and SPCC1739.01. The processed data are provided in Supplementary Table 7. The probe sequences are provided in Supplementary Table 8. For Supplementary Fig. 2c, we used R-code available at https://stackoverflow.com/questions/35717353/split-violin-plot-with-ggplot2. To calculate scFano values, the mRNA counts of each cell were divided by the length of the cell and multiplied by the average cell size in the experiment. The scFano factors were then calculated as the variance over the mean of these normalized counts (\(\sigma ^2/\mu\)).
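
The scFano calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name `sc_fano` is hypothetical, and the use of the population variance (rather than the sample variance) is an assumption.

```python
from statistics import fmean, pvariance


def sc_fano(counts, lengths):
    """Size-corrected Fano factor (variance / mean) for one gene.

    counts  -- smFISH mRNA counts, one per cell
    lengths -- cell lengths, one per cell, in the same order
    """
    if not counts or len(counts) != len(lengths):
        raise ValueError("counts and lengths must be non-empty and equally long")
    mean_length = fmean(lengths)
    # Divide each count by the length of its cell, then multiply by the
    # average cell length in the experiment.
    normalized = [c / l * mean_length for c, l in zip(counts, lengths)]
    mu = fmean(normalized)
    # Assumption: population variance; the text does not specify the estimator.
    return pvariance(normalized, mu) / mu
```

When counts scale perfectly with cell length the normalized counts are identical in every cell, so the scFano factor is zero. Any value above zero reflects variability that is not explained by cell size.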

Sequencing and read mapping

Pools of scRNA-seq libraries were sequenced on an Illumina HiSeq 2500 instrument at the MRC LMS genomics facility. Paired-end reads (100 nt) were generated from two pools of 96 samples per sequencing lane. Data were processed using RTA 1.18.64, with default filter and quality settings. The reads were de-multiplexed with bcl2fastq-1.8.4 (CASAVA, allowing 0 mismatches). Read 1 was used to extract cell-specific indexes and UMIs. The corresponding Read 2 was mapped to the fission yeast genome as described in ref. 18. Mapped reads were assigned to fission yeast genes as described in ref. 18 using Pombase annotation as of 27 May 2015 and including 5′ and 3′ UTR sequences. Reads 1 and 2 were assigned to specific cell/RNA samples based on cell-specific index sequences de-multiplexed using in house Perl scripts. Within each specific cell/RNA sample, the reads that shared identical UMI sequences and mapped to the same gene were collapsed.

UMI correction

Unique molecular identifiers are short random DNA sequences, typically 6–8 nt in length, that are appended to every single cDNA molecule during SCRB-seq for yeast sequencing library preparation. In SCRB-seq for yeast, UMIs are part of the first-strand reverse transcription primer15. UMIs are commonly used to remove PCR amplification biases but, importantly, have been recognized to be prone to sequencing errors and biases themselves54,55. For a given gene, these errors produce more UMIs separated by small sequence distances (Hamming distances) than expected by chance54. This phenomenon is present in our data and results in an overestimation of the library diversity. To correct for this bias, we developed an original network-based method that removes recursively, at each genomic locus, reads associated with UMIs that differ by a distance of 1 nt (Hamming distance = 1) from the UMIs with the highest abundance. Our method is identical to the adjacency method introduced and implemented recently in UMI-tools54. The application of our UMI error correction method removes about 30% of the raw read pool (Supplementary Fig. 1b). For the dataset descriptions, statistics and raw counts see Supplementary Tables 14.
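
The recursive removal step can be illustrated with the following Python sketch. It follows the description above (keep the most abundant UMI, discard UMIs at Hamming distance 1 from it, repeat on what remains) and is not the in-house implementation. The function names and the alphabetical tie-breaking rule are assumptions made for the example.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umi_reads):
    """Return the UMIs retained for one gene in one cell.

    umi_reads -- list of UMI sequences, one entry per read
    """
    counts = Counter(umi_reads)
    # Most abundant first; ties broken alphabetically (an assumption,
    # made only so that the result is reproducible).
    remaining = sorted(counts, key=lambda u: (-counts[u], u))
    kept = []
    while remaining:
        top = remaining.pop(0)
        kept.append(top)
        # Discard every UMI exactly one mismatch away from the kept UMI.
        remaining = [u for u in remaining if hamming(top, u) != 1]
    return kept
```

For example, five reads with UMI `AAAA`, one read with `AAAT` and two reads with `CCCC` collapse to two molecules, because `AAAT` is treated as a sequencing error of `AAAA`.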

Average gene

Average profiles were obtained from raw UMI-corrected counts for ten cells using the deeptools package56. The function 'computeMatrix scale-regions' and default bin size of 10 nt and flanking regions of 300 nt were used (Supplementary Fig. 1d).

Selection of high-confidence genes

Genes representing >0.16% of the total molecules detected in the transcriptome of at least one cell across all datasets were included in the high-confidence gene-set used in this study. This empirical filter resulted in a list of 1,011 genes with varied functions and regulation patterns across conditions (Fig. 1b, Supplementary Fig. 1f–h and Supplementary Table 6). Importantly, this approach included genes with high expression in a small number of cells in the high-confidence list. This would not have been possible using a single cut-off of mean expression values across the dataset. This filtering protocol mostly removed genes that were expressed at low levels with a very high fraction of cells with zero counts (dropouts) for which detection becomes mostly stochastic (Supplementary Fig. 1f). Accordingly, the mean expression level of the discarded genes after removal of zero values was 1.23 molecules per cell, which was significantly lower than that of high-confidence genes (mean = 5.8 molecules per cell, PWilcoxon,one-sided = 0; Supplementary Fig. 1g). Together, this analysis confirms the low information content of the discarded genes and validates our filtering approach.
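
A minimal Python sketch of this filter is given below. The data layout (a dictionary mapping gene names to per-cell counts) and the function name are hypothetical. Only the 0.16% threshold and the "at least one cell" rule come from the text.

```python
def high_confidence_genes(counts, threshold=0.0016):
    """Select genes exceeding `threshold` of a cell's molecules in >= 1 cell.

    counts -- dict mapping gene name to a list of per-cell molecule counts
    """
    genes = list(counts)
    n_cells = len(counts[genes[0]])
    # Total number of molecules detected in each cell.
    totals = [sum(counts[g][j] for g in genes) for j in range(n_cells)]
    selected = []
    for g in genes:
        fractions = (
            counts[g][j] / totals[j] for j in range(n_cells) if totals[j] > 0
        )
        if any(f > threshold for f in fractions):
            selected.append(g)
    return selected
```

Because the test is applied per cell, a gene that is strongly expressed in a single cell passes the filter even if its mean expression across the dataset is low.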

Estimation of β using spike-ins and smFISH

We define the capture efficiency βi of a cell i as the probability of observing (sequencing) any one original mRNA molecule of the cell. We defined β as the mean of the βi values across all cells. Because, for any given gene, the observed UMI-corrected counts per cell are lower than the original number of mRNA molecules present in a cell, βi values range between 0 and 1. Spike-in controls can be used to estimate βi and β (ref. 57). To do this, we divided the total number of spike-in molecules observed within each cell by the corresponding theoretical number of input spike-in molecules. The mean of these ratios across all the cells is 0.015. We believe this number is an underestimate of the true β given recent absolute estimates of average mRNA counts in fission yeast populations18. Consistent with our observations, it has recently been reported that spike-ins have a lower β than mRNAs58. An alternative way to estimate β relies on estimates of absolute mRNA molecule numbers per cell obtained by smFISH. Using 12 different genes, we fitted a linear regression between the mean expression of UMI-corrected sequencing counts and the mean of the corresponding smFISH counts. With this approach, β was estimated as the slope of the regression line, which is about 0.06 (Supplementary Fig. 1e). In summary, the β estimates that are obtained from spike-in controls and smFISH measurements are very different. We chose to use the geometric mean of the two estimates, which led to a β of 0.03. This estimate is one of the parameters for our Bayesian data normalization protocol described below (bayNorm)17. We note that, within this range, our biological conclusions are not overly sensitive to specific values of β. The dependence of bayNorm normalization on the choice of β is systematically explored elsewhere17.
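
The arithmetic behind the three estimates can be sketched as follows. The function names are hypothetical, and fitting the regression with an intercept by ordinary least squares is an assumption, as the text does not specify the regression model.

```python
from math import sqrt
from statistics import fmean


def beta_from_spike_ins(observed_per_cell, input_molecules):
    """Mean ratio of observed to theoretical input spike-in molecules."""
    return fmean(obs / input_molecules for obs in observed_per_cell)


def regression_slope(smfish_means, sequencing_means):
    """Least-squares slope of sequencing means against smFISH means.

    Assumption: ordinary least squares with an intercept.
    """
    mx, my = fmean(smfish_means), fmean(sequencing_means)
    sxy = sum((x - mx) * (y - my) for x, y in zip(smfish_means, sequencing_means))
    sxx = sum((x - mx) ** 2 for x in smfish_means)
    return sxy / sxx


def geometric_mean(a, b):
    """Geometric mean of two positive estimates."""
    return sqrt(a * b)
```

With the values reported above, `geometric_mean(0.015, 0.06)` gives √(0.0009) = 0.03, the β used for normalization.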

Estimation of βi of single cells

The βi values of single cells are proportional to cell-specific global scaling factors (si) that are commonly used in normalization of scRNA-seq data (see for example ref. 59):

$$\beta _i = \beta \times s_i/s$$

where the constant of proportionality is related to β. The scaling factors can be estimated using spike-in controls57 or alternatively directly from the data. Simple estimates of si are the total number of molecules detected per cell (total count) or the mean of the number of molecules of a subset of genes detected in each cell. Popular bulk RNA-seq methods such as DESeq are designed to compute si (refs. 60,61). Unfortunately, these methods are not applicable to scRNA-seq datasets because of the high frequency of dropouts (genes with zero counts) present in the data59. Alternative methods have been developed specifically for scRNA-seq (for example see refs. 62,63). We carefully assessed several existing methods for the estimation of scaling factors and settled on estimations of si based on the mean of UMI-corrected counts of a carefully chosen subset of genes in each given cell. The rationale behind this choice is the following: we reasoned that genes with high dropout rates, high variability in ctrRNA control experiments (showing technical variability) or those in the tail of the mean expression distribution (which have disproportionally high contribution to the total count) are not suitable for scaling factor estimation. Specifically, we used a list of 768 genes for the estimation of si that were selected as follows: (1) genes with a dropout rate >70% were excluded (zero UMI-corrected counts in more than 70% of the cells across all datasets, 202 genes); (2) the top 20 genes with the highest expression across datasets after total count normalization were removed and (3) genes with significantly high technical variability in ctrRNA controls were removed. To do this, we called HVGs in 11 ctrRNA datasets using the Bioconductor package scran and excluded 21 genes that were noisy in at least 5 of the 11 datasets63,64. 
Interestingly, the procedure described above produced the highest correlation between βi values and cell sizes (0.1781 for this procedure, 0.149 for the method proposed in ref. 63 and 0.0568 for the spike-in estimates).
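
The conversion from scaling factors to cell-specific capture efficiencies can be illustrated as below. This sketch assumes that s in the equation above denotes the mean of the si values across cells, which makes the mean of the βi values equal to β. That reading is an assumption, and the function names and data layout are hypothetical.

```python
from statistics import fmean


def scaling_factors(counts, reference_genes):
    """Per-cell scaling factor: mean count over a chosen subset of genes.

    counts          -- dict mapping gene name to a list of per-cell counts
    reference_genes -- genes retained for scaling-factor estimation
    """
    n_cells = len(counts[reference_genes[0]])
    return [fmean(counts[g][j] for g in reference_genes) for j in range(n_cells)]


def cell_capture_efficiencies(s, mean_beta=0.03):
    """beta_i = beta * s_i / s, with s taken as the mean of s_i (assumption)."""
    s_bar = fmean(s)
    return [mean_beta * s_i / s_bar for s_i in s]
```

Under this reading, a cell whose scaling factor is twice the average is assigned a capture efficiency of twice the mean β.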

Normalization using bayNorm

Data from scRNA-seq are commonly normalized by dividing the raw counts by the global scaling factor si estimated for each cell59. We have recently developed bayNorm, an alternative Bayesian approach to scRNA-seq normalization, which also provides simultaneous imputation for the dropouts17. In this approach, given the raw count xij observed in the jth cell for the ith gene and given the βj of the jth cell, we estimate the posterior distribution of the expected number of mRNAs \(x_{ij}^0\) that were originally present in the cell. We found that a reasonable choice for the likelihood of observed counts \(x_{ij}\) is a binomial distribution with size \(x_{ij}^0\) and probability \(\beta _j\) (ref. 17). In addition, we assume that the prior for \(x_{ij}^0\) follows a negative binomial distribution with mean \(\mu _i\) and size factor \(\phi _i\), with the following parameterization:

$$\sigma ^2 = \mu + (\mu )^2/\phi$$

Using the Bayes rule, the posterior distribution of the ith gene in the jth cell can be expressed as:

$$\underbrace {\Pr \left( {x_{ij}^0\,|\,x_{ij},\mu _i,\phi _i,\beta _j} \right)}_{{\mathrm{Posterior}}} = \frac{{\overbrace {{\mathrm{Pr}}\left( {x_{ij}\,|\,x_{ij}^0,\beta _j} \right)}^{{\mathrm{Binomial}}\,{\mathrm{likelihood}}} \times \overbrace {{\mathrm{Pr}}\left( {x_{ij}^0\,|\,\mu _i,\phi _i} \right)}^{{\mathrm{Negative}}\,{\mathrm{binomial}}\,{\mathrm{prior}}}}}{{\underbrace {{\mathrm{Pr}}\left( {x_{ij}\,|\,\mu _i,\phi _i,\beta _j} \right)}_{{\mathrm{Marginal}}\,{\mathrm{likelihood}}}}}$$

The outputs of the bayNorm normalization procedure are either samples from the posterior distribution described above or its maximum a posteriori estimate (Supplementary Fig. 2a). In addition to raw RNA-seq counts, bayNorm normalization requires as inputs prior distributions of the parameters \(\mu _i\;{\mathrm{and}}\;\phi _i\) for each gene. In bayNorm, these are estimated from the scRNA-seq data directly using an Empirical Bayes approach (see ref. 17 for details). Prior estimation can be done using data from all cells across all datasets irrespective of experimental conditions. We refer to this procedure as 'global' normalization. Alternatively, if cells can be split into different groups on the basis of experimental conditions or phenotypic information, for instance, prior parameters \(\mu _i\;{\mathrm{and}}\;\phi _i\) can also be estimated within each group independently. We refer to this procedure as 'local' normalization. On the one hand, the use of global priors based on the Empirical Bayes method reduces the technical batch effects that occur between different experiments. On the other hand, the use of local priors for different groups of cells enhances the resolution and sensitivity of differential expression analysis between these groups. The flexibility of prior parameter estimation allows heterogeneous cell populations to be accounted for65. Bayesian normalization, as implemented in bayNorm, has several additional advantages over widely used normalization approaches that rely on dividing molecule numbers by si (see also ref. 62). First, bayNorm also provides imputation by replacing a large proportion of zero counts with non-zero values, thus greatly reducing the fraction of dropouts in the normalized data (from 42.27% to 3.56% in the cell datasets for high-confidence genes). 
Second, bayNorm effectively corrects for the experimental batch-dependent variation in β and reduces batch-specific biases performing similarly to SCnorm but without the need for multiple expression-dependent scaling factors17,62. Also, the use of global priors as explained above can further reduce batch effects. Third, bayNorm normalization preserves the uncertainty present in the data particularly for cells with low coverage, thus reducing false discovery rates in differential expression analysis. Finally, bayNorm produces mRNA distributions and noise estimates close to the state-of-the-art smFISH measurements (Supplementary Figs. 2c and 3c,d) and averaged transcriptome structures close to high-quality population estimates (Fig. 1c and Supplementary Fig. 2b,e).
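
To make the posterior above concrete, the following Python sketch evaluates it numerically for one gene in one cell. It is an illustration of the binomial likelihood and negative binomial prior only and does not reproduce the bayNorm package. The function names and the truncation of the grid at `extra` molecules above the observed count are assumptions.

```python
from math import exp, lgamma, log


def log_binomial_pmf(x, n, p):
    """log Pr(x | n, p) for a binomial distribution (requires 0 < p < 1)."""
    return (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
            + x * log(p) + (n - x) * log(1 - p))


def log_negative_binomial_pmf(k, mu, phi):
    """log Pr(k | mu, phi), parameterized so that variance = mu + mu^2 / phi."""
    return (lgamma(k + phi) - lgamma(phi) - lgamma(k + 1)
            + phi * log(phi / (phi + mu)) + k * log(mu / (phi + mu)))


def posterior(x, mu, phi, beta, extra=500):
    """Posterior of the original count x0 given the observed count x.

    Evaluated on the grid x0 = x, ..., x + extra (truncation is an assumption).
    """
    grid = range(x, x + extra + 1)
    logs = [log_binomial_pmf(x, n, beta) + log_negative_binomial_pmf(n, mu, phi)
            for n in grid]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [exp(v - top) for v in logs]
    total = sum(weights)  # plays the role of the marginal likelihood
    return {n: w / total for n, w in zip(grid, weights)}
```

The maximum a posteriori estimate is then `max(post, key=post.get)` for `post = posterior(...)`, and samples can be drawn from the returned probabilities.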

Prior distributions, β, posterior distribution and point estimate datasets

The mean capture efficiency was set at 0.03 throughout except for Supplementary Figs. 2c and 3c where a mean capture efficiency of 0.06 calculated from smFISH data was used. Prior distributions were generated as follows: the data for Figs. 1c; 3a,b,d,e; 4; Supplementary Figs. 2b,d,e; 3c; 4c; 5a–c and 6b,c were normalized using global priors that were obtained from all cells in the dataset to correct for batch effects. Supplementary Fig. 2c used priors calculated from rapidly growing cells. Fig. 2a,d,e and Supplementary Fig. 3a–c data were normalized using local priors estimated within each individual dataset to exclude any residual contribution of batch differences to HVG calls. Figure 3c and Supplementary Fig. 5d,e data were normalized using local priors estimated independently for sets of either large (13–16 μm) or small (8–10 μm) cells to maximize sensitivity of differential expression analysis.

Detection of noisy or highly variable genes

The expression of a given gene can vary among cells within a population. Cell-to-cell variability in gene expression, also called noise, is defined as the CV \(\left( {\sigma /\mu } \right)\) where \(\sigma\) and μ are the standard deviation and the mean of expression scores across cells, respectively. A number of modelling and experimental studies have shown that gene expression noise is inversely correlated to mean gene expression calculated across cells19,20,66,67. Genes with particularly high cell-to-cell variability are called noisy or HVGs and are defined as having significantly higher noise than most genes with similar means (Fig. 2a). The identification of HVGs from scRNA-seq experiments is challenging due to the strong technical noise present in the data and several teams have addressed this problem64,66,67,68. The general consensus is to decompose the total noise observed into its technical and biological components. To do this, the dependence of technical noise to the mean is measured and used to infer potential additional biological noise present for each gene (Fig. 2a and Supplementary Fig. 3a). Here, we have developed an original method for HVG detection based on bayNorm-normalized data and sets of computed synRNA to estimate noise floors. The synRNA data were generated as follows: Given a scRNA-seq dataset with a raw count matrix xij and a vector of estimated βj values, we produced a set of synRNA data \(x_{ij}^{{\mathrm{syn}}}\) with similar mean expressions and βj values as the real experimental data but with no biological variability above what is expected from the Poisson distribution. Poisson noise is the minimal amount of expected noise if no additional biological variability is present. To do this, we first generated a gene expression dataset \(x_{ij}^{{\mathrm{Poisson}}}\) sampled from a Poisson distribution with mean expression obtained from raw count matrix xij and βj:

$$x_{ij}^{{\mathrm{Poisson}}} = {\mathrm{Poisson}}\left( {\lambda _i} \right),\quad {\mathrm{where}}\;\lambda _i = \left\langle {\frac{{x_{ij}}}{{\mathop {\sum }\nolimits_i x_{ij}}}} \right\rangle _j \times \left\langle {\frac{{\mathop {\sum }\nolimits_i x_{ij}}}{{\beta _j}}} \right\rangle _j$$

Both means above were calculated across cells (index j). In a final step, we used binomial downsampling to generate a \(x_{ij}^{{\mathrm{syn}}}\) dataset from \(x_{ij}^{{\mathrm{Poisson}}}\) simulating the effect of partial RNA capture during the scRNA-seq procedure:

$$x_{ij}^{{\mathrm{syn}}} = {\mathrm{Binom}}(x_{ij}^{{\mathrm{Poisson}}},\beta _j)$$
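
The two sampling steps above can be sketched with the Python standard library as follows. This is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the samplers are simple textbook constructions, and every cell is assumed to have a non-zero total count.

```python
import random


def poisson(lam, rng):
    """Poisson sample: count unit-rate exponential arrivals before time lam."""
    k, t = 0, rng.expovariate(1.0)
    while t < lam:
        k += 1
        t += rng.expovariate(1.0)
    return k


def binomial(n, p, rng):
    """Binomial sample as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def synthetic_counts(x, betas, seed=0):
    """Generate a synRNA matrix with the same shape as the raw matrix x.

    x     -- x[i][j] is the raw count of gene i in cell j
    betas -- capture efficiency of each cell j
    """
    rng = random.Random(seed)
    n_genes, n_cells = len(x), len(betas)
    totals = [sum(x[i][j] for i in range(n_genes)) for j in range(n_cells)]
    # Mean across cells of the estimated original number of molecules.
    mean_depth = sum(totals[j] / betas[j] for j in range(n_cells)) / n_cells
    syn = []
    for i in range(n_genes):
        # Mean across cells of the fraction of the transcriptome from gene i.
        mean_fraction = sum(x[i][j] / totals[j] for j in range(n_cells)) / n_cells
        lam = mean_fraction * mean_depth
        # Poisson draw per cell, then binomial downsampling with beta_j.
        syn.append([binomial(poisson(lam, rng), betas[j], rng)
                    for j in range(n_cells)])
    return syn
```

A gene with zero counts in every cell has λ = 0 and therefore remains at zero in the synthetic dataset.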

Finally, \(x_{ij}^{{\mathrm{syn}}}\) data were normalized with bayNorm using prior parameters estimated from the original raw data (that is, identical to those used for the normalization of the cell data). To identify HVGs, a local regression between noise and mean expression of all genes from the normalized synRNA dataset is calculated and compared with the noise levels observed in the corresponding normalized experimental datasets (log–log; Fig. 2a). To call genes with noise levels significantly above the synRNA fitted line, we used an approach similar to the one proposed in ref. 64 and based on an adaptation of the gene expression variation model68. Briefly, vertical differences (only positive residuals were considered) between noise levels in the experimental dataset and the fitted line were calculated (illustrated as ΔCV in Fig. 2a). The differences were normalized by dividing by the residuals from the regression, which follow a normal distribution. Most genes were assumed to not deviate significantly from the centre of the distribution. The centre was found by the kernel density of the normalized differences. Next, a normal distribution was fitted using differences that were below that centre. P values were then extracted on the basis of normal distribution and adjusted using the Benjamini–Hochberg procedure. Noisy genes were called independently for each batch of 96 cells or ctrRNAs generated in this study after normalization with bayNorm using priors estimated within each dataset (local priors). For each gene in each dataset, noise and mean values across cells were calculated using pooled expression scores of 100 samples of the bayNorm posterior distribution per cell. Using this design, gene variability was assessed in 100 bootstrapped versions of each dataset. Genes were called noisy if they had a false discovery rate < 0.1 in 85 or more bootstrap samples. 
Genes that were called in at least one rapid growth cell dataset (2502_1, 2502_3, 2502_5, 2502_7, 2502_9, 1904_1, 0109_3, 1711_1 and 1711_2) and none of the ctrRNA datasets (2502_2, 2502_4, 2502_6, 2502_8, 2502_1, 1904_2 and 0408_5) were called HVGs and are discussed further in this study (Supplementary Table 6).
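
The final calling step (Benjamini–Hochberg adjustment followed by the 85-of-100 rule) can be sketched as follows. The sketch starts from per-gene P values and does not cover the regression, kernel density or normal-fitting steps. The function names and input layout are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_noisy(pvalue_sets, fdr=0.1, min_hits=85):
    """Call a gene noisy if its adjusted P value is < fdr in >= min_hits samples.

    pvalue_sets -- one list of per-gene P values per bootstrap sample
    """
    n_genes = len(pvalue_sets[0])
    hits = [0] * n_genes
    for pvalues in pvalue_sets:
        for gene, q in enumerate(benjamini_hochberg(pvalues)):
            if q < fdr:
                hits[gene] += 1
    return [h >= min_hits for h in hits]
```

Requiring agreement across most bootstrap samples means that a gene is only called when the evidence is robust to the uncertainty carried by the posterior samples.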

Functional analysis of HVGs

The levels of quantitative variables describing a series of gene features were compared between HVGs (with and without the top 500 periodic genes) and all other genes from the high confidence set using one-sided Wilcoxon tests. Features were obtained from four studies and are listed below19,34,35,37. The labels were adjusted to be self-explanatory and a detailed description of each feature is available in the original publications.

  1. Features as per the Koch et al.35 additional file 2. The labels from Supplementary Fig. 4d are listed with labels from the original additional file 2 between parentheses: Yeast conservation (Yeast.conservation); dN/dS (dN.dS); Multifunctionality (associated GO terms) (Multifunctionality); Disordered domains (%) (Disorder); Number of physical protein interactions (PPI.degree); Fitness (SM.fitness.defect); Copy number (paralogues number) (Copy.number); Codon Adaptation Index (CAI); Codon usage bias (Nc); Number of co-expressed genes (Co.expression.degree); Protein length (Protein.length); Expression level (RNA) (Expression.level); Expression variation (RNA) (Expression.variation); Number of protein domains (Num.of.domains); Number of single protein domains (Num.of.unique.domains); Broad conservation (Broad.conservation); Negative genetic interaction degrees (Observed.GI.degree).

  2. From Rhind et al.37: Evolutionary rates ('Rate' values from Supplementary Table 30).

  3. From Pancaldi et al.34: Expression variation (RNA) (Between condition variability score; Supplementary Table 1).

  4. From Newman et al.19: Cell-to-cell variability (YEPD) (DM values in YEPD) and Cell-to-cell variability (SD) (DM values in SD).

Transcriptome fractions of functional categories

To assign cells to particular functional categories, we calculated z scores for the sums of the counts of each category within each cell. Cells were assigned to a given category if the category z score absolute value was >1.2 in more than 70 of 100 bayNorm posterior samples. The categories with assigned cells significantly larger or smaller than the whole population are shown in Fig. 3b (PWilcoxon < 0.05).
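
The assignment rule can be sketched as below for a single functional category. The sketch assumes that z scores are computed across cells within each posterior sample, using the population standard deviation. Both points are assumptions, as is the function name.

```python
from statistics import fmean, pstdev


def assign_cells(category_sums, z_cutoff=1.2, min_samples=70):
    """Flag cells whose |z score| exceeds z_cutoff in > min_samples samples.

    category_sums -- category_sums[s][j] is the summed count of the category
                     in cell j for posterior sample s
    """
    n_cells = len(category_sums[0])
    hits = [0] * n_cells
    for sample in category_sums:
        mu, sd = fmean(sample), pstdev(sample)
        if sd == 0:
            continue  # no variation across cells in this sample
        for j, value in enumerate(sample):
            if abs((value - mu) / sd) > z_cutoff:
                hits[j] += 1
    # "More than 70 of 100" is a strict inequality.
    return [h > min_samples for h in hits]
```

Because the absolute z score is used, cells with unusually low category expression are flagged as well as cells with unusually high expression.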

Differential gene expression analysis

Several differential expression analysis methods tailored for scRNA-seq analysis have been published. A recent comparative analysis69 and our own experience identifies MAST70 as a reliable method. Therefore, in this study, we used the MAST package with method = 'glm', the 'ebayes' option enabled and considering adjusted P values from the continuous part of the hurdle model utilized in MAST (multiple testing adjustment method: Benjamini–Hochberg)70. Differential expression detection was run independently on 100 bayNorm posterior distribution samples for Fig. 3. Genes called as differentially expressed in >90% of the posterior distributions were considered differentially expressed. The log2 ratios are the mean of the log2 ratios from each posterior distribution. An additional cut off on log2 ratios (>0.2 or <−0.2) was used in Fig. 3. The differential expression analysis shown in Fig. 3 used two sets of cells, large (13–16 μm) and small (8–10 μm).

For this analysis, large and small cell sets were normalized by bayNorm using different local priors specific to each set. To demonstrate that our differential expression analysis reflected gene expression differences related to different cell sizes and was not an artefact of the use of local priors, we performed the following experiment. Two sets of 50 cells were selected from the large or small cell sets and normalized using bayNorm and local priors. This sub-sampling experiment was repeated 20 times. In parallel, the labels of the large and small cells were randomized in advance before sub-sampling and normalization as above. Both groups were then used for differential expression analysis. As expected, this second randomized set showed almost no genes with robust and reproducible differential expression, thus validating our approach (Supplementary Fig. 5e).

Random forest model

To identify genes that have nonlinear correlation of expression levels with cell size, we built a random forest model of cell size given gene expression levels. We chose a subset of normal cells and applied a filtering criterion so that cells with total counts lower than 100,000 or higher than 300,000 were removed. In addition, we filtered out cells that were smaller than 6 µm or larger than 25 µm. We then applied a random forest model described in ref. 71 to the filtered dataset. Genes were ranked according to the importance statistic '%IncMSE' returned by the model (Supplementary Table 6).

YMC analysis and validation

Gene signatures of the three proposed YMC phases (oxidative, reductive building and reductive charging) were obtained from ref. 42. Fission yeast orthologues were identified for each signature and the differential expression ratio between large and small cells from Fig. 3c was plotted for each list (Supplementary Fig. 5d). To add statistical support to the observed expression patterns, we used the following rationale. Supplementary Fig. 5d shows that large/small differential expression ratios increase from the reductive charging to the oxidative and reductive building phases. This pattern leads to a positive correlation between YMC steps and the differential expression ratio. We calculated the slope of the regression line between differential expression ratios and steps of the YMC and compared it with 1,000 datasets in which the same ratios were randomized between YMC steps or in which ratios were randomly sampled from the whole dataset. The R2 values from our data were significantly higher than those from the permutations (z scores, P < 10^−4). The same was true when using Pearson correlations. Together, this analysis confirms that the observed pattern is unlikely to have arisen by chance.
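The permutation logic can be sketched in Python as follows. This is an illustration rather than the analysis code used in the study: it covers only the variant in which ratios are shuffled between YMC steps, it uses the regression slope as the test statistic, and it assumes the steps are encoded as 0 (reductive charging), 1 (oxidative) and 2 (reductive building). Function names and the seed are hypothetical.

```python
import random
from statistics import mean, pstdev


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_test(steps, ratios, n_perm=1000, seed=0):
    """Compare the observed slope with slopes obtained after shuffling ratios.

    Returns (observed_slope, z_score, empirical_p), where empirical_p is the
    one-sided fraction of permuted slopes at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = slope(steps, ratios)
    shuffled = list(ratios)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the link between ratio and YMC step
        null.append(slope(steps, shuffled))
    z = (observed - mean(null)) / pstdev(null)
    empirical_p = (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    return observed, z, empirical_p
```

With 1,000 permutations the empirical P value cannot fall below about 10^−3, so a smaller P value has to be derived from the z score, as described in the text.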

Promoter sequence analysis

Promoter sequences of HVGs (position relative to transcription start site: −300 to +100) were analysed using genes that were neither HVGs nor false positives as a reference set. We used the tool CentriMo72, available as part of the MEME software suite, to identify known motifs enriched in these promoters based on the YEASTRACT lists of budding yeast motifs (Supplementary Table 10).
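As an illustration of the promoter window, the following Python sketch converts a transcription start site into genomic coordinates for the −300 to +100 region. It assumes 1-based inclusive coordinates and a strand given as '+' or '-'; the function name and these conventions are assumptions, and sequence extraction and the CentriMo run are not shown.

```python
def promoter_window(tss, strand, upstream=300, downstream=100):
    """Return (start, end) of the promoter region around a TSS.

    Coordinates are 1-based and inclusive. On the minus strand, upstream
    of the TSS corresponds to larger genomic coordinates.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp to the first base of the chromosome
    return max(1, start), end
```

For minus-strand genes the extracted sequence would additionally need to be reverse-complemented before motif analysis.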

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All raw scRNA-seq datasets are available in ArrayExpress under accession number E-MTAB-6825. Cell size measurements and all smFISH data are available as Supplementary Material. The bayNorm package is available from Bioconductor: https://bioconductor.org/packages/release/bioc/html/bayNorm.html. All figures except Fig. 1a and Supplementary Fig. 2a contain original data.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Kaern, M., Elston, T. C., Blake, W. J. & Collins, J. J. Stochasticity in gene expression: from theories to phenotypes. Nat. Rev. Genet. 6, 451–464 (2005).
  2. Shahrezaei, V. & Swain, P. S. The stochastic nature of biochemical networks. Curr. Opin. Biotechnol. 19, 369–374 (2008).
  3. Raj, A. & van Oudenaarden, A. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell 135, 216–226 (2008).
  4. Elowitz, M. B., Levine, A. J., Siggia, E. D. & Swain, P. S. Stochastic gene expression in a single cell. Science 297, 1183–1186 (2002).
  5. Segal, E. & Widom, J. From DNA sequence to transcriptional behaviour: a quantitative approach. Nat. Rev. Genet. 10, 443–456 (2009).
  6. Battich, N., Stoeger, T. & Pelkmans, L. Control of transcript variability in single mammalian cells. Cell 163, 1596–1610 (2015).
  7. Shahrezaei, V. & Marguerat, S. Connecting growth with gene expression: of noise and numbers. Curr. Opin. Microbiol. 25, 127–135 (2015).
  8. Mellor, J. The molecular basis of metabolic cycles and their relationship to circadian rhythms. Nat. Struct. Mol. Biol. 23, 1035–1044 (2016).
  9. Svensson, V., Vento-Tormo, R. & Teichmann, S. A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604 (2018).
  10. Stubbington, M. J. T., Rozenblatt-Rosen, O., Regev, A. & Teichmann, S. A. Single-cell transcriptomics to explore the immune system in health and disease. Science 358, 58–63 (2017).
  11. Baslan, T. & Hicks, J. Unravelling biology and shifting paradigms in cancer with single-cell sequencing. Nat. Rev. Cancer 17, 557–569 (2017).
  12. Griffiths, J. A., Scialdone, A. & Marioni, J. C. Using single-cell genomics to understand developmental processes and cell fate decisions. Mol. Syst. Biol. 14, e8046 (2018).
  13. Saliba, A.-E., Santos, S. C. & Vogel, J. New RNA-seq approaches for the study of bacterial pathogens. Curr. Opin. Microbiol. 35, 78–87 (2017).
  14. Gasch, A. P. et al. Single-cell RNA sequencing reveals intrinsic and extrinsic regulatory heterogeneity in yeast responding to stress. PLoS Biol. 15, e2004050 (2017).
  15. Soumillon, M., Cacchiarelli, D., Semrau, S., van Oudenaarden, A. & Mikkelsen, T. S. Characterization of directed differentiation by high-throughput single-cell RNA-Seq. Preprint at bioRxiv https://doi.org/10.1101/003236 (2014).
  16. Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181 (2014).
  17. Tang, W. et al. bayNorm: Bayesian gene expression recovery, imputation and normalisation for single cell RNA-sequencing data. Preprint at bioRxiv https://doi.org/10.1101/384586 (2018).
  18. Marguerat, S. et al. Quantitative analysis of fission yeast transcriptomes and proteomes in proliferating and quiescent cells. Cell 151, 671–683 (2012).
  19. Newman, J. R. S. et al. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441, 840–846 (2006).
  20. Bar-Even, A. et al. Noise in protein expression scales with natural protein abundance. Nat. Genet. 38, 636–643 (2006).
  21. Rustici, G. et al. Periodic gene expression program of the fission yeast cell cycle. Nat. Genet. 36, 809–817 (2004).
  22. Peng, X. et al. Identification of cell cycle-regulated genes in fission yeast. Mol. Biol. Cell 16, 1026–1042 (2005).
  23. Oliva, A. et al. The cell cycle-regulated genes of Schizosaccharomyces pombe. PLoS Biol. 3, e225 (2005).
  24. Marguerat, S. et al. The more the merrier: comparative analysis of microarray studies on cell cycle-regulated genes in fission yeast. Yeast 23, 261–277 (2006).
  25. Cooper, S. On a heuristic point of view concerning the expression of numerous genes during the cell cycle. IUBMB Life 64, 10–17 (2012).
  26. Duncan, C. D. S., Rodríguez-López, M., Ruis, P., Bähler, J. & Mata, J. General amino acid control in fission yeast is regulated by a nonconserved transcription factor, with functions analogous to Gcn4/Atf4. Proc. Natl Acad. Sci. USA 115, E1829–E1838 (2018).
  27. Ravarani, C. N. J., Chalancon, G., Breker, M., de Groot, N. S. & Babu, M. M. Affinity and competition for TBP are molecular determinants of gene expression noise. Nat. Commun. 7, 10417 (2016).
  28. Tirosh, I., Weinberger, A., Carmi, M. & Barkai, N. A genetic signature of interspecies variations in gene expression. Nat. Genet. 38, 830–834 (2006).
  29. Landry, C. R., Lemos, B., Rifkin, S. A., Dickinson, W. J. & Hartl, D. L. Genetic properties influencing the evolvability of gene expression. Science 317, 118–121 (2007).
  30. Lehner, B. Selection to minimise noise in living systems and its implications for the evolution of gene expression. Mol. Syst. Biol. 4, 170 (2008).
  31. Blake, W. J., Kærn, M., Cantor, C. R. & Collins, J. J. Noise in eukaryotic gene expression. Nature 422, 633–637 (2003).
  32. Weinberger, L. et al. Expression noise and acetylation profiles distinguish HDAC functions. Mol. Cell 47, 193–202 (2012).
  33. Chen, D. et al. Global transcriptional responses of fission yeast to environmental stress. Mol. Biol. Cell 14, 214–229 (2003).
  34. Pancaldi, V., Schubert, F. & Bähler, J. Meta-analysis of genome regulation and expression variability across hundreds of environmental and genetic perturbations in fission yeast. Mol. Biosyst. 6, 543–552 (2010).
  35. Koch, E. N. et al. Conserved rules govern genetic interaction degree across species. Genome Biol. 13, R57 (2012).
  36. López-Maury, L., Marguerat, S. & Bähler, J. Tuning gene expression to changing environments: from rapid responses to evolutionary adaptation. Nat. Rev. Genet. 9, 583–593 (2008).
  37. Rhind, N. et al. Comparative functional genomics of the fission yeasts. Science 332, 930–936 (2011).
  38. Lane, K. et al. Measuring signaling and RNA-Seq in the same cell links gene expression to dynamic patterns of NF-κB activation. Cell Syst. 4, 458–469.e5 (2017).
  39. Cadwell, C. R. et al. Electrophysiological, transcriptomic and morphologic profiling of single neurons using Patch-seq. Nat. Biotechnol. 34, 199–203 (2016).
  40. Nichterwitz, S. et al. Laser capture microscopy coupled with Smart-seq2 for precise spatial transcriptomic profiling. Nat. Commun. 7, 12139 (2016).
  41. Turner, J. J., Ewald, J. C. & Skotheim, J. M. Cell size control in yeast. Curr. Biol. 22, R350–R359 (2012).
  42. Kuang, Z. et al. High-temporal-resolution view of transcription and chromatin states across distinct metabolic states in budding yeast. Nat. Struct. Mol. Biol. 21, 854–863 (2014).
  43. Silverman, S. J. et al. Metabolic cycling in single yeast cells from unsynchronized steady-state populations limited on glucose or phosphate. Proc. Natl Acad. Sci. USA 107, 6946–6951 (2010).
  44. Slavov, N., Airoldi, E. M., van Oudenaarden, A. & Botstein, D. A conserved cell growth cycle can account for the environmental stress responses of divergent eukaryotes. Mol. Biol. Cell 23, 1986–1997 (2012).
  45. Marguerat, S. & Bähler, J. Coordinating genome expression with cell size. Trends Genet. 28, 560–565 (2012).
  46. Schmoller, K. M. & Skotheim, J. M. The biosynthetic basis of cell size control. Trends Cell Biol. 25, 793–802 (2015).
  47. Mata, J., Lyne, R., Burns, G. & Bähler, J. The transcriptional program of meiosis and sporulation in fission yeast. Nat. Genet. 32, 143–147 (2002).
  48. Metzl-Raz, E. et al. Principles of cellular resource allocation revealed by condition-dependent proteome profiling. eLife 6, e28034 (2017).
  49. Malecki, M. et al. Functional and regulatory profiling of energy metabolism in fission yeast. Genome Biol. 17, 240 (2016).
  50. Moreno, S., Klar, A. & Nurse, P. Molecular genetic analysis of fission yeast Schizosaccharomyces pombe. Methods Enzymol. 194, 795–823 (1991).
  51. Takahashi, C. N., Miller, A. W., Ekness, F., Dunham, M. J. & Klavins, E. A low cost, customizable turbidostat for use in synthetic circuit characterization. ACS Synth. Biol. 4, 32–38 (2015).
  52. Semrau, S. et al. Dynamics of lineage commitment revealed by single-cell transcriptomics of differentiating embryonic stem cells. Nat. Commun. 8, 1096 (2017).
  53. Keifenheim, D. et al. Size-dependent expression of the mitotic activator Cdc25 suggests a mechanism of size control in fission yeast. Curr. Biol. 27, 1491–1497 (2017).
  54. Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017).
  55. Islam, S. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods 11, 163–166 (2014).
  56. Ramírez, F., Dündar, F., Diehl, S., Grüning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014).
  57. Lun, A. T. L., Calero-Nieto, F. J., Haim-Vilmovsky, L., Göttgens, B. & Marioni, J. C. Assessing the reliability of spike-in normalization for analyses of single-cell RNA sequencing data. Genome Res. 27, 1795–1806 (2017).
  58. Svensson, V. et al. Power analysis of single-cell RNA-sequencing experiments. Nat. Methods 14, 381–387 (2017).
  59. Vallejos, C. A., Risso, D., Scialdone, A., Dudoit, S. & Marioni, J. C. Normalizing single-cell RNA sequencing data: challenges and opportunities. Nat. Methods 14, 565–571 (2017).
  60. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
  61. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
  62. Bacher, R. et al. SCnorm: robust normalization of single-cell RNA-seq data. Nat. Methods 14, 584–586 (2017).
  63. Lun, A. T. L., Bach, K. & Marioni, J. C. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75 (2016).
  64. Lun, A. T. L., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 5, 2122 (2016).
  65. Ziegenhain, C., Vieth, B., Parekh, S., Hellmann, I. & Enard, W. Quantitative single-cell transcriptomics. Brief. Funct. Genomics 17, 220–232 (2018).
  66. Brennecke, P. et al. Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10, 1093–1095 (2013).
  67. Grün, D., Kester, L. & van Oudenaarden, A. Validation of noise models for single-cell transcriptomics. Nat. Methods 11, 637–640 (2014).
  68. Chen, H.-I. H., Jin, Y., Huang, Y. & Chen, Y. Detection of high variability in gene expression from single-cell RNA-seq profiling. BMC Genomics 17, 508 (2016).
  69. Jaakkola, M. K., Seyednasrollah, F., Mehmood, A. & Elo, L. L. Comparison of methods to detect differentially expressed genes between single-cell populations. Brief. Bioinform. 18, 735–743 (2017).
  70. Finak, G. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278 (2015).
  71. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
  72. Bailey, T. L. & Machanick, P. Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res. 40, e128 (2012).

Acknowledgements

We thank T. Livermore for his help during the initial part of this project and C. Stefanelli for her contribution to the development of bayNorm. We are grateful to S. Parrinello, A. Martinez-Segura and M. Priestman for their input on the manuscript. This research was supported by the UK Medical Research Council, a Leverhulme Research Project Grant (grant no. RPG-2014-408) and a Wellcome Trust Senior Investigator Award to J.B. (grant no. 095598/Z/11/Z). We used the computing resources of the UK Medical Bioinformatics partnership (UK MED-BIO), which is supported by the UK Medical Research Council (grant no. MR/L01632X/1) and the Imperial College High Performance Computing Service.

Author information

Author notes

    • François Bertaux

    Present address: Institut Pasteur, Paris, France

    • Anna Köferle

    Present address: Munich Center for Neurosciences, Ludwig-Maximilian-Universität, Planegg, Germany

  1. These authors contributed equally: François Bertaux, Wenhao Tang.

Affiliations

  1. MRC London Institute of Medical Sciences, London, UK

    • Malika Saint
    • François Bertaux
    • Xi-Ming Sun
    • Laurence Game
    • Samuel Marguerat
  2. Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK

    • Malika Saint
    • François Bertaux
    • Xi-Ming Sun
    • Laurence Game
    • Samuel Marguerat
  3. Department of Mathematics, Faculty of Natural Sciences, Imperial College London, London, UK

    • François Bertaux
    • Wenhao Tang
    • Vahid Shahrezaei
  4. Research Department of Genetics, Evolution and Environment and UCL Genetics Institute, University College London, London, UK

    • Anna Köferle
    • Jürg Bähler

Authors

  1. Malika Saint
  2. François Bertaux
  3. Wenhao Tang
  4. Xi-Ming Sun
  5. Laurence Game
  6. Anna Köferle
  7. Jürg Bähler
  8. Vahid Shahrezaei
  9. Samuel Marguerat

Contributions

M.S. set up the scRNA-seq protocol in yeast cells, and performed all sequencing experiments and part of the computational analysis. W.T. and F.B. developed bayNorm and performed most computational analyses together with S.M. and V.S. X.M.S. performed all smFISH and growth experiments. A.K. developed the first generation of the single-cell isolation and PCR amplification protocol under the supervision of S.M. and J.B. L.G. assisted with the development of the scRNA-seq protocol. S.M., V.S. and J.B. supervised this study. S.M., V.S., J.B., W.T. and M.S. wrote the paper.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Vahid Shahrezaei or Samuel Marguerat.

Supplementary information

  1. Supplementary Information

    Supplementary Figures 1–6, Supplementary References.

  2. Reporting Summary

  3. Supplementary Tables 1–10.

    Sample descriptions and statistics, description of experimental conditions, description of cells and ctRNA, raw counts, filtered normalized counts, gene statistics and features, smFISH data, smFISH probe sequences and fluorochromes, primers used for single-cell RNA sequencing, sequence analysis of HVG promoters.

About this article

DOI

https://doi.org/10.1038/s41564-018-0330-4