Transcript levels powerfully influence cell behavior and phenotype and are carefully regulated at several steps. Recently developed single cell approaches such as RNA single molecule fluorescence in-situ hybridization (smFISH) have produced advances in our understanding of how these steps work within the cell. In comparison to single-cell sequencing, smFISH provides more accurate quantification of RNA levels. Additionally, transcript subcellular localization is directly visualized, enabling the analysis of transcription (initiation and elongation), RNA export and degradation. As part of our efforts to investigate how this type of analysis can generate improved models of gene expression, we used smFISH to quantify the kinetic expression of STL1 and CTT1 mRNAs in single Saccharomyces cerevisiae cells upon 0.2 and 0.4 M NaCl osmotic stress. In this Data Descriptor, we outline our procedure along with our data in the form of raw images and processed mRNA counts. We discuss how these data can be used to develop single cell modelling approaches, to study fundamental processes in transcription regulation and develop single cell image processing approaches.
|Design Type(s)||observation design • replicate design • stimulus or stress design • image analysis objective|
|Measurement Type(s)||transcription profiling assay|
|Technology Type(s)||epifluorescence microscopy|
|Factor Type(s)||Concentration • temporal_instant • biological replicate • technical replicate|
|Sample Characteristic(s)||Saccharomyces cerevisiae BY4741|
Machine-accessible metadata file describing the reported data (ISA-Tab format)
Background & Summary
Transcript levels are regulated by essential biological processes1,2. Traditionally, investigations of how these processes are regulated within the cell have relied on cell populated based approaches3,4,5. Such studies have uncovered important insights into how transcript levels are regulated. However, studies performed using recently developed single cell-based approaches have demonstrated that the population average reported from such experiments can unintentionally equate essentially distinct expression patterns6,7,8,9,10,11,12,13,14,15,16,17,18,19,20. For example, two cell populations, one wherein a few cells express high levels of a particular RNA and one wherein many cells express low levels of a particular RNA, would appear to have comparable RNA expression levels in population-based measurements. Additionally, cell populated methods are also unable to address whether two different mRNA species are expressed in the same cell or in different cells which has hampered the identification of distinct cell types. This transcript variability is problematic because any selection on these cells based on RNA expression levels could have a vastly different impact on these two cell populations in terms of cell vitality and/or the number of cells that survive.
In comparison to population measurements, single-cell/single-molecule imaging experiments such as RNA single molecule fluorescence in situ hybridization (smFISH) can resolve the spatial and temporal distribution of individual RNA molecules with high resolution (Fig. 1). This enables the correlation of the expression of different RNA species with each other and with cellular phenotypes. Additionally, we and others have recently shown how RNA smFISH data can lead to better more predictive models of gene expression due to: 1) the fact that they contain information on more gene regulatory processes, including transcription (initiation, elongation), RNA export, and RNA degradation, and 2) they enable direct measurement of the distribution of RNA molecules which are essential for single cell modeling20,21,22,23,24,25,26,27.
Unfortunately, very few RNA smFISH datasets are publicly available, and available datasets are limited in terms of the number or RNA species, time points, and cells. This limits the ability of the research community to develop improved RNA smFISH analysis approaches and discover new insights into the important biological processes regulating transcription levels.
To address this limitation, we present our information rich RNA smFISH data set in this Data Descriptor which is an extension to our previous publication26,28. The experimental and computational workflow consist of (1) sample preparation, (2) imaging cells, nucleus and RNA species, (3) nuclear and cytoplasmic segmentation, (4) RNA spot detection, (5) RNA counting and (6) computation of various sample statistics (Fig. 1a). This data set reports the imaging and quantification of two mRNAs (STL1 and CTT1) in 65,000 individual model yeast Saccharomyces cerevisiae cells imaged in 3D (Figs 1–3). The expression of these mRNAs was induced using two different osmotic stresses (0.2 M and 0.4 M NaCl) and monitored over sixteen time points in biological duplicate or triplicate26,29.
Because this data set contains both 3D spatial and temporal information on the expression of these RNAs, it can be used to simultaneously investigate many different important processes regulating transcription levels as they occur in the cell. Such analyses are not possible in cell population-based or single cell sequencing experiments1,2,3,4,5,27. Here we present RNA expression as population mean, fraction of cells above basal mRNA expression (ON-cells), the variance normalized by the expression mean (Fano factor) (Fig. 1), marginal probability of nuclear and cytoplasmic mRNA (Fig. 2), and the joint probability of nuclear and cytoplasmic RNA expression (Fig. 3).
The reuse potential of this data set lies in three independent areas of research. The first area is in gene expression modelling. Standard approaches to modelling do not always work well on single cell data sets. This data set could be particularly helpful for theoretical labs working to develop improved modelling approaches and which may not have the resources or expertise to generate their own similar data sets. The second area is in image processing. The limit on the amount and quality of data that can be extracted from data sets such as these is set by image processing. Thus, improved image processing approaches are needed to maximize the amount of data extracted from one data set. Finally, we expect that such a dataset could be re-used by labs interested in studying transcription, RNA export, and RNA degradation as these processes have not yet been thoroughly investigated using single cell-based approaches.
These methods are expanded versions of descriptions in our related work26.
Yeast strain and growth condition
Saccharomyces cerevisiae strain BY4741 (MATa his3∆1 leu2∆0 met15∆0 ura3∆0) was used for RNA smFISH experiments. Three days before the experiment, yeast cells were streaked out on a complete synthetic media plate (CSM, Formedia, UK) from a glycerol stock stored at −80 C. The day before the experiment, a single colony from the CSM plate was inoculated in 5 ml CSM medium (pre- culture). After 6–12 h, the optical density (OD) of the pre-culture was measured and the cells were diluted into new CSM medium to reach an OD of 0.5 the next morning.
RNA single molecule Fluorescence In-Situ Hybridization (smFISH) Probe Construction
Probes used to detect discrete STL1 and CTT1 mRNAs consist of a set of 48 DNA oligonucleotides, each with a length of 20 nt (see Table 1 for DNA sequences). The 3′-end of each probe was modified with an amine group and coupled to either tetramethylrhodamine (TMR) (Invitrogen) (STL1) or Cy5 (GE Amersham) (CTT1). Coupled probes were ethanol precipitated and purified on an HPLC column to isolate oligonucleotides displaying the highest degree of coupling of the fluorophore to the amine groups.
Sample preparation for smRNA-FISH
Yeast cells in log-phase growth (OD = 0.5) were concentrated 10 times (OD = 5) by a glass filter system with a 0.45 μm filter (Millipore). Cells were exposed to a final osmolyte concentration of either 0.2 M or 0.4 M NaCl, and fixed in 4% formaldehyde at time points 0, 1, 2, 4, 6, 8, 10, 15, 20, 25, 30, 35, 40, 45, 50, and 55 minutes. At each time point, cells (5 ml) in the corresponding beaker were poured into a 15 ml falcon tube containing 0.55 ml of 37% formaldehyde, resulting in cell fixation. Cells were fixed at +20 C for 30 minutes, then transferred to +4 C and fixed overnight on a shaker. After fixation, cells were centrifuged at 500 × g for 5 minutes. The liquid phase was discarded, the cell pellet was resuspended in 5 ml ice-cold Buffer B (1.2 M sorbitol, 0.1 M potassium phosphate dibasic, pH 7.5), and cells were centrifuged again. After discarding the liquid phase, yeast cells were resuspended in 1 ml Buffer B, and transferred to 1.5 ml centrifugation tubes. Cells were then centrifuged again at 500 × g for 5 minutes. The supernatant was removed and the pellet was resuspended in 0.5 ml Spheroplasting Buffer (Buffer B, 0.2% β-mercaptoethanol, 10 mM Vanadyl-ribonucleoside complex). The OD of each sample was measured and samples were adjusted such that each contained the same number of cells. 10 μl of 2.5 mg/ml Zymolyase (US Biological) was added to each sample on a +4 C block. Cells were incubated on a rotor at +30 C and cell wall digestion was monitored by microscopy. Zymolyase digestion was terminated when 90% of the cells visualized under the microscope were black instead of opaque (indicating cell wall digestion) by incubation on a +4 C block. Cells were centrifuged for 5 minutes, then the cell pellet was resuspended with 1 ml ice-cold Buffer B and spun down for 5 minutes at 500 × g. After discarding the liquid phase, the pellet was gently re-suspended with 1 ml of 70% ethanol and kept at +4 C for at least 12 hours at +4 C. Cells were incubated with a 1:1000 dilution (chosen for optimal ratio of RNA spot intensity to cell background as determined by independent optimization experiments) of RNA smFISH probes (stock concentration: STL1-TMR probes: 242pmol/µl; CTT1-Cy5 probes: 90 pmol/µl) in hybridization buffer (10% dextran, 10 mM vandyl-ribonucleoside complex, 0.02% RNAse-free BSA, 1 mg /ml E.coli tRNA, 2x SSC, 10% formamide) to label STL1 and CTT1 mRNAs.
Microscopy setup for single molecule RNA FISH imaging
Cells were imaged with a Nikon Ti Eclipse epifluorescent microscope equipped with perfect focus (Nikon), a 100x VC DIC lens (Nikon), fluorescent filters for DAPI, TMR and CY5 (Semrock), an X-cite 120 fluorescent light source (Excelitas), and an Orca Flash 4v2 CMOS camera (Hamamatsu). The microscope was controlled by the Micro-Manager software program30.
Image acquisition for single molecule RNA FISH imaging
For single molecule RNA-FISH microscopy, z-stacks of images from fixed yeast cells for widefield (20 ms exposure), DAPI (20 ms exposure), TMR (1 s exposure) and CY5 (1 s exposure) were taken with each image in the z-stack separated by 200 nm. For each sample, multiple positions on the slide were imaged to ensure large numbers of cells.
Image analysis for single-molecule RNA-FISH
RNA smFISH images generated by microscopy were analyzed to segment cells and define the nuclear and cytoplasmic compartment of each cell26,31. For each DAPI image stack, a maximum intensity projection was generated. A DAPI intensity threshold was picked by eye to identify the maximum number of nuclei in the image. Based on this threshold, the image was then converted into a binary image. Connected regions that were too big or too small were removed. Connected regions in the image were then labeled by individual numbers resulting in individual numbered nuclei. Next, for each connected region an individual DAPI threshold was computed as 50% of the difference between the maximum DAPI signal minus the background DAPI signal. Through this iterative and cell specific thresholding, cell-to-cell differences in nuclei DNA concentration and DAPI staining were considered. For each cell, the entire DAPI image stack was converted into a binary image stack using the cell-specific DAPI thresholds, resulting in a 3D nucleus and cytoplasm where thresholds have been considered. After the nucleus had been segmented, the last 5 images of the widefield image stack were subjected to maximum intensity projection to generate the cell outlines. A background image was generated by running a disk smoothing filter over this image. The background image was subtracted from the maximum projected image to enhance contrast. This image was then combined with the thresholded DAPI image using morphological reconstruction and cells were segmented with a watershed algorithm. After image segmentation, elements that were not properly segmented and resulted in artificially small or large segmented elements were removed. Cells too close to the image borders were also removed.
To identify RNA spots, fluorescence thresholds for RNA-FISH images were determined for each dye based on a single image plane, for each position imaged and for each sample. For a given dye, the mean threshold value was calculated based on the thresholds identified from each image in the data set. This threshold was reproducible from image to image and from person to person. After the threshold had been determined for each dye, each image in the stack was subjected to a Gaussian filter to remove noise, and then filtered with a Laplacian of a Gaussian filter to detect spots in the image. These filtered images were then converted into binary images using the previously determined threshold. For each pixilated signal, the regional maximum was determined, which identifies the xyz-position of the RNA spot within each 3D image stack. This process was repeated for each of the TMR and CY5 image stacks. The number of RNA spots for each cell in the cytoplasm or the nucleus was determined by applying the mask of segmented nucleus and cytoplasm on the filtered RNA-FISH image stacks in 3D. The numbers of RNA molecules in the nucleus and cytoplasm were counted for each individual cell. The RNA-FISH data set consists of a total of 65454 single cells (25511 at 0.2 M NaCl and 39943 at 0.4 M NaCl).
RNA-FISH data analysis
For each time point, a distribution of mRNA molecules in single cells was determined as the marginal distribution of total STL1 and CTT1 mRNA or as the joint probability distribution of nuclear and cytoplasmic RNA. The distributions were also summarized in a binary data set of ON and OFF cells. ON-fraction are considered cells that turn on RNA transcription after osmotic stress. RNA-expression prior to osmotic stress is considered basal RNA transcription of the gene and is considered an OFF-cell. For the quantification of ON-fraction, the basal expression defined as 95% of the cumulative STL1 and CTT1 RNA distribution was chosen resulting in ON-cells with more than two (STL1) or more than eight (CTT1) mRNA molecules. The population average was computed as the mean marginal cytoplasmic or nuclear RNA for each mRNA species.
Data also have been deposited to BioStudies with the dataset identifier S-BIAD1, accessible through the URL: https://www.ebi.ac.uk/biostudies/studies/S-BIAD1.
Data also have been deposited to a public dropbox folder (https://www.dropbox.com/sh/4ydp2dnnu8kvdrm/AAA3eDV5CJOllWJprvwEjGy8a?dl=0)
The data sets consist of raw image stacks, analysed images and RNA spot counts in each cell.
Statistical analysis using a ks-test was performed on marginal distributions and the test statistics are available within Figshare32.
Raw datasets: widefield and fluorescent microscopy image stack
Each experimental condition, time point, and field of view is stored as an image stack consisting of 25 or 26 images of each STL1 mRNA labelled with TMR coupled oligonucleotides, CTT1 labelled with CY5 coupled oligonucleotides (see Table 1 for the DNA sequences), the DAPI stained DNA, and widefield images of the cells. Each stack consists of 100 or 104 images and is stored as a TIFF file. In total, 4–8 fields of view have been imagined at 16 different time points, for two experimental conditions each in biological duplicates or triplicates (331 image stacks = 277.44 GB) and can be accessed on-line (Table 2).
RNA spot counts in each cell
After all the images for an experimental data set have been analysed, the analysed data from all image stacks are combined into a single file for each experiment. The content of this file is described in Table 3 and is available as MATLAB and CVS files. From these datasets, the RNA counts per cell are then used to compute the mean (Fig. 1c), fraction of ON-cells (Fig. 1d), Fano factor (Fig. 1e), marginal (Fig. 2) and joint probability distribution (Fig. 3). Statistical analysis using a ks-test was performed on marginal distributions32.
During sample preparation, the cell wall of the yeast cells needs to be digested. If the cell wall is digested correctly, cells turn from an opaque to a black color. Cells were monitored under the microscope every 10 minutes after addition of Zymolyase and when 90% of the cells turned black, cells were transferred to the +4 C block to stop Zymolyase activity.
Optimizing RNA-FISH probe concentration (dilutions: 1:500; 1:1000; 1:2000; 1:4000; 1:8000) and formamide concentration (5, 10, 15, 20, 25%) in the hybridization buffer is an important step in validating the RNA-FISH methodology. We use hybridization conditions without probes as a control in these experiments to determine any background signal. At time t = 0 min, mRNA expression of STL1 is very low as previously shown with RNA-FISH and qRT-PCR15.
Validation of the computational pipeline
Each data set contains 122 to 232 images per stack. For each image stack the initial threshold to identify the maximum number of DAPI stained nuclei was determined manually by comparing the raw image with the image of the identified nuclei. Similarly, the thresholds for RNA spot detection in the TMR and CY5 channel was determined manually for each image stack by comparing the raw image with the image of the identified TMR or CY5 spots and then the mean RNA-spot threshold was calculated.
Raw datasets: widefield and fluorescent microscopy image stack
These datasets could be re-used to develop new and/or improved image processing applications. Each stack contains all the images acquired at a specific field of view. The resolution of the images is 2048 pixels by 2048 pixels. The number of images per stack is 100 or 104 with 25 or 26 z-position’s. At each z-position, a TMR (image 1), a CY5 (image 2), a DAPI (image 3), and a widefield (image 4) were acquired.
These datasets could be used to compare different image processing applications during application development and for investigating processes that affect the spatial organization of mRNA in the cell.
RNA spot counts in each cell
The data sets for each experiment contain the number of mRNA molecules in each cell, the cytoplasm, or the nucleus for each of the mRNA species visualized. These data sets can be directly used to compute the mean, the ON-fraction or the Fano factor. The Fano factor is computed as the variance divided by the mean. In addition, these data sets can be used to compute marginal nuclear and cytoplasmic RNA distributions as well as the joint probability distribution of nuclear and cytoplasmic RNA for STL1 and CTT1.
The computer codes used to generate spatiotemporal RNA expression data sets reported in this study and for automated cell segmentation and RNA spot detection were developed at Vanderbilt University and are intended solely for scientific research. The computer codes used for automated cell segmentation are available in the public dropbox repository: https://www.dropbox.com/sh/egb27tsgk6fpixf/AADaJ8DSjab_c0gU7N7ZF0Zba?dl=0. The computer codes used to generate spatiotemporal RNA expression data sets reported in this study are available from the corresponding author on reasonable request.
Chatterjee, S. & Ahituv, N. Gene Regulatory Elements, Major Drivers of Human Disease. Annu. Rev. Genomics Hum. Genet. 18, 45–63 (2017).
Lelli, K. M., Slattery, M. & Mann, R. S. Disentangling the Many Layers of Eukaryotic Transcriptional Regulation. Annu. Rev. Genet. 46, 43–68 (2012).
Brown, J. B. & Celniker, S. E. Lessons from modENCODE. Annu Rev Genomics Hum Genet. 16, 31–53 (2015).
Schneider-Poetsch, T. & Yoshida, M. Along the Central Dogma-Controlling Gene Expression with Small Molecules. Annu Rev Biochem. 20(87), 391–420 (2018).
Miller, C. et al. Dynamic transcriptome analysis measures rates of mRNA synthesis and decay in yeast. Mol. Syst. Biol. 7, 458 (2011).
Femino, A. M., Fay, F. S., Fogarty, K. & Singer, R. H. Visualization of Single RNA Transcripts in Situ. Science 280, 585–590 (1998).
Vera, M., Biswas, J., Senecal, A., Singer, R. H. & Park, H. Y. Single-Cell and Single-Molecule Analysis of Gene Expression Regulation. Annu. Rev. Genet. 50, 267–291 (2016).
Balazsi, G., van Oudenaarden, A. & Collins, J. J. Cellular decision making and biological noise: from microbes to mammals. Cell 144, 910–925 (2011).
Lenstra, T. L., Rodriguez, J., Chen, H. & Larson, D. R. Transcription Dynamics in Living Cells. Annu. Rev. Biophys. 45, 25–47 (2016).
Boettiger, A. et al. Super-resolution imaging reveals distinct chromatin folding for different epigenetic domains. Nature 529, 418–422 (2016).
Adivarahan, S. et al. Spatial Organization of Single mRNPs at Different Stages of the Gene Expression Pathway. Mol Cell. Nov 15, 72(4), 727–738 (2018).
Shaffer, S. M. et al. Rare cell variability and drug-induced reprogramming as a mode of cancer drug resistance. Nature 546, 431–435 (2017).
Raj, A. & van Oudenaarden, A. Nature, Nurture, or Chance: Stochastic Gene Expression and Its Consequences. Cell 135, 216–226 (2008).
Moffitt, J. R. et al. High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization. Proc. Natl. Acad. Sci. 113, 201612826 (2016).
Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A. & Tyagi, S. Imaging individual mRNA molecules using multiple singly labeled probes. Nat. Meth. 5, 877–879 (2008).
Tsanov, N. et al. smiFISH and FISH-quant a flexible single RNA detection approach with super-resolution capability. Nucleic Acids Res. 45, e141–e141 (2017).
Iyer, S., Park, B. R. & Kim, M. Absolute quantitative measurement of transcriptional kinetic parameters in vivo. Nucleic Acids Res. 44, 1–8 (2016).
Stockwell, S. R. & Rifkin, S. A. A living vector field reveals constraints on galactose network induction in yeast. Mol. Syst. Biol. 13, 908 (2017).
Munsky, B., Neuert, G. & van Oudenaarden, A. Using Gene Expression Noise to Under- stand Gene Regulation. Science 336, 183–187 (2012).
Neuert, G. et al. Systematic Identification of Signal-Activated Stochastic Gene Regulation. Science 339, 584–587 (2013).
Senecal, A. et al. Transcription Factors Modulate c-Fos Transcriptional Bursts. Cell Rep. 8, 75–83 (2014).
Munsky, B. & Neuert, G. From analog to digital models of gene regulation. Phys. Biol. 12, 45004 (2015).
Sepulveda, L. A., Xu, H., Zhang, J., Wang, M. & Golding, I. Measurement of gene regulation in individual cells reveals rapid switching between promoter states. Science 351, 1218–1222 (2016).
Hilfinger, A., Norman, T. M. & Paulsson, J. Exploiting natural fluctuations to identify kinetic mechanisms in sparsely characterized systems. Cell Systems 2, 251–259 (2016).
Gómez-Schiavon, M., Chen, L.-F., West, A. E. & Buchler, N. E. BayFish: Bayesian inference of transcription dynamics from population snapshots of single-molecule RNA FISH in single cells. Genome Biol. 18, 164 (2017).
Munsky, B., Li, G., Fox, Z., Shepherd, D. & Neuert, G. Distribution Shapes Govern the Discovery of Predictive Models for Gene Regulation. Proceedings of the National Academy of Sciences. 29(115), 7533–7538 (2018).
Chappell, L., Russell, A. J. C. & Voet, T. Single-Cell (Multi)omics Technologies. Annu. Rev. Genomics Hum. Genet. 19, 15–41 (2018).
Li, G. & Neuert, G. A microscopy data set of discrete spatial and temporal single molecule RNA expression in single cells. University of Dundee. https://doi.org/10.17867/10000118 (2019).
Saito, H. & Posas, F. Response to Hyperosmotic Stress. Genetics 192, 289–318 (2012).
Edelstein, A. D. et al. Advanced methods of microscope control using μManager software. J. Biol. Methods 1, 10 (2014).
Kesler, B., Li, G., Thiemicke, A., Venkat, R. & Neuert, G. Automated cell boundary and 3D nuclear segmentation of cells in suspension. Preprint at https://doi.org/10.1101/632711 (2019).
Li, G. & Neuert, G. Multiplex RNA single molecule FISH of inducible mRNAs in single yeast cells. Figshare, https://doi.org/10.6084/m9.figshare.c.4464845 (2019).
GL and GN were funded by NIH DP2 GM11484901, NIH R01GM115892 and Vanderbilt Startup Funds. The authors thank Alexander Thiemicke, Benjamin Kesler, Hossein Jashnsaz, and Amanda Johnson for comments on the manuscript.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
ISA-Tab metadata file
About this article
Cite this article
Li, G., Neuert, G. Multiplex RNA single molecule FISH of inducible mRNAs in single yeast cells. Sci Data 6, 94 (2019). https://doi.org/10.1038/s41597-019-0106-6
A yeast-optimized single-cell transcriptomics platform elucidates how mycophenolic acid and guanine alter global mRNA levels
Communications Biology (2021)
World Journal of Microbiology and Biotechnology (2021)