Gene expression is a biological process regulated at different molecular levels, including chromatin accessibility, transcription, and RNA maturation and transport. In addition, these regulatory mechanisms have strong links with cellular metabolism. Here we present a multi-omics dataset that captures different aspects of this multi-layered process in yeast. We obtained RNA-seq, metabolomics, and H4K12ac ChIP-seq data for wild-type and mip6Δ strains during a heat-shock time course. Mip6 is an RNA-binding protein that contributes to RNA export during environmental stress and is informative of the contribution of post-transcriptional regulation to control cellular adaptations to environmental changes. The experiment was performed in quadruplicate, and the different omics measurements were obtained from the same biological samples, which facilitates the integration and analysis of data using covariance-based methods. We validate our dataset by showing that ChIP-seq, RNA-seq and metabolomics signals recapitulate existing knowledge about the response of ribosomal genes and the contribution of trehalose metabolism to heat stress. Raw data, processed data and preprocessing scripts are made available.
Measurement(s): RNA • metabolite • histone_modification • epigenetic status
Technology Type(s): RNA sequencing • nuclear magnetic resonance spectroscopy • ChIP-seq
Factor Type(s): environmental stress • duration of heat shock • wild type vs Mip6 mutant strain
Sample Characteristic - Organism: Saccharomyces cerevisiae
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.11836743
Background & Summary
Eukaryotic gene expression is a complex process in which the information coded in genes is transformed into functions that support living cells. This process comprises different interconnected steps, which occur in separate compartments and are performed by specific molecular components1,2. One of the earlier steps consists of setting up the appropriate epigenetic modifications to allow the expression or repression of specific gene programs3,4. These modifications take place mostly on DNA and histones, ensuring access to the proper transcriptional machinery. The specific set of modifications across the genome regulates the final synthesis of the mRNA5. These newly synthesized RNA molecules are extensively modified prior to their export to the cytoplasm, where they can be degraded by the mRNA decay machinery, stored in specific foci or translated into proteins6,7. Throughout this journey, RNA molecules are guided by RNA-binding proteins that control their fate7,8,9. Finally, the encoded protein products participate in numerous processes, including cellular metabolism, where organic compounds are transformed and/or stored. A number of these compounds, such as acetyl-CoA, glucose or methyl groups, participate, in turn, in chromatin modifications that regulate gene expression.
Our current understanding of transcriptional regulation was largely established using yeast as a model organism. Yeast is an ideal model for transcriptome research because it exhibits most of the cellular complexity present in eukaryotes and has a relatively compact, accessible genome. However, while the interconnected transcriptional circuit has been studied and accepted by many yeast labs2,10,11,12,13,14,15, these studies usually target individual aspects of transcriptional regulation and produce separate results that contribute to the overall transcriptional model. Moreover, different labs use different strains, experimental conditions and batches, and there are few examples of experimental datasets where all these layers have been measured on exactly the same samples. This poses challenges to the mathematical integration of the data, as methods that rely on the analysis of co-variation patterns are limited in their applicability. When the multi-layered data are obtained on the same samples, additional analysis opportunities arise that facilitate the establishment of relationships across regulatory mechanisms.
In this Data Descriptor, we present a yeast multi-omics dataset that features three basic layers of the transcriptional circuit, measured in the same set of samples. These include one epigenetic modification (H4K12ac, a mark for active promoters), gene expression (RNA-seq) and targeted metabolomics. Moreover, data are obtained for both WT and a mip6Δ mutant, in control and heat-shock induced conditions. Mip6 is an RNA-binding protein that participates in RNA export under stress16 and consequently is informative of the contribution of post-transcriptional regulation to the adaptation of RNA levels to environmental changes. Taken together, the selection of yeast strains, growth conditions and omics experiments creates a unique dataset to study the cross-talk between epigenetic, transcriptional and metabolic regulation in response to environmental cues. The availability of multi-omics data on the same set of samples will facilitate the application of powerful statistical approaches that fully leverage paired measurements to propose quantitative regulatory models. A subset of this collection, namely gene expression data for 20 genes that are regulated by the stress transcription factors Msn2/4, has been published elsewhere16.
Figure 1 illustrates the experimental design of our dataset. Panel A describes the strategy to manage sampling and the time-course nature of our experiment. A single 330 mL culture (either for WT or mip6Δ strains) was grown at 30 °C until the exponential phase (OD = 0.7), and this culture was subsequently split across three flasks. One flask was maintained at 30 °C and labeled as time point 0. The other two flasks were incubated at 39 °C for 20 minutes and 120 minutes, respectively, by adding preheated media to rapidly increase the temperature to 39 °C. These last two flasks capture the heat-shock response, while the 30 °C flask serves as a control representing non-stress conditions. The rationale for having two flasks for the 39 °C temperature instead of a single flask sampled at two different time points was to avoid introducing effects related to culture volume. Panel B describes how samples were obtained for omics measurements. For each of the flasks described above, three aliquots were extracted for RNA-seq, NMR metabolomics, and ChIP-seq analyses. Therefore, the three omics assays were performed on the same cell culture. RNA-seq and NMR aliquots were obtained first, and the remaining culture was treated to induce cross-linking before collecting the ChIP-seq aliquot. After aliquoting, each tube was immediately frozen and stored at −80 °C.
The process described in Fig. 1 was repeated four times to generate four biological replicates. However, due to sample management limitations, these four replicates were created on two different days. Specifically, biological replicates 1 and 2 were obtained on day 1, and replicates 3 and 4 were obtained on day 2.
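As an illustration of the design described above, the following Python sketch enumerates the flasks (2 strains × 3 time points × 4 replicates) together with the day on which each replicate was grown. It is not part of the published preprocessing scripts, and the sample naming scheme is a hypothetical choice made for this example only.

```python
from itertools import product

STRAINS = ["WT", "mip6"]          # "mip6" stands for the mip6Δ mutant
TIMEPOINTS_MIN = [0, 20, 120]     # minutes at 39 °C; 0 is the 30 °C control flask
REPLICATES = [1, 2, 3, 4]


def culture_day(replicate):
    """Replicates 1 and 2 were grown on day 1, replicates 3 and 4 on day 2."""
    return 1 if replicate <= 2 else 2


def build_design():
    """Return one record per flask; every flask yields RNA-seq, NMR and ChIP-seq aliquots."""
    rows = []
    for strain, minutes, rep in product(STRAINS, TIMEPOINTS_MIN, REPLICATES):
        rows.append({
            "sample": f"{strain}_{minutes}min_rep{rep}",  # hypothetical naming scheme
            "strain": strain,
            "temperature_C": 30 if minutes == 0 else 39,
            "minutes_at_39C": minutes,
            "replicate": rep,
            "day": culture_day(rep),
        })
    return rows


design = build_design()
print(len(design), "flasks in total")
```

Keeping the culture day as an explicit column makes it straightforward to check later whether the two-day split shows up as a batch effect.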
Acquisition and preprocessing of multi-omics data
Total RNA was isolated by hot acid phenol extraction. RNA integrity was checked with a Bioanalyzer (Agilent) and the RNA was then submitted to a commercial sequencing facility (Macrogen Korea). Sequencing was done with Illumina using the TruSeq protocol. Between 50 and 60 million 100 bp paired-end reads were obtained from each sample. Raw sequencing data quality was checked with FastQC and good overall quality (Fig. 2a) was observed in all cases. No trimming was deemed necessary. Reads were mapped to the yeast sacCer3 genome with TopHat217 and genes were quantified with HTSeq18, intersection-option. Supplementary Table S1 shows the number of reads, mapping rate and number of reads in genes for all samples, revealing uniform quantities across the dataset.
The NOISeq19 R package was used to perform the quality control of count data. We observed that most reads mapped onto protein-coding genes (>80%), as expected (Fig. 2b). Counts were normalized via TMM20 and low-count filtering was applied with the NOISeq cpm method (with cpm = 1). Principal Component Analysis (PCA) indicated a slight batch effect for the day of culture growth (Fig. 2c left) that was removed by ARSyN21 (Fig. 2c right). In total, we obtained gene expression values for 6,379 genes.
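To make the low-count filtering step concrete, the Python sketch below keeps a gene when its mean counts per million (CPM) reaches the threshold in at least one experimental condition. This is a simplified stand-in written for illustration: it does not reproduce the NOISeq implementation, it does not perform TMM normalization, and the gene names and counts are invented toy values.

```python
def counts_per_million(counts):
    """counts: dict gene -> list of raw counts (one per sample). Returns CPM in the same shape."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(vals[j] for vals in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * v / lib_sizes[j] for j, v in enumerate(vals)]
        for gene, vals in counts.items()
    }


def filter_low_counts(counts, conditions, cpm_threshold=1.0):
    """Keep genes whose mean CPM is >= cpm_threshold in at least one condition."""
    cpm = counts_per_million(counts)
    groups = {}
    for j, cond in enumerate(conditions):
        groups.setdefault(cond, []).append(j)
    kept = {}
    for gene, vals in cpm.items():
        means = [sum(vals[j] for j in idx) / len(idx) for idx in groups.values()]
        if max(means) >= cpm_threshold:
            kept[gene] = counts[gene]
    return kept


# Toy data: two control and two heat-shock samples (values are invented).
toy_counts = {
    "geneA": [500000, 600000, 400000, 450000],
    "geneB": [0, 1, 0, 0],
    "geneC": [499999, 399999, 600000, 550000],
}
toy_conditions = ["ctrl", "ctrl", "heat", "heat"]
kept = filter_low_counts(toy_counts, toy_conditions, cpm_threshold=1.0)
print(sorted(kept))
```

Filtering on per-condition means rather than on single samples avoids discarding genes that are expressed in only one condition, which matters in a heat-shock design where some genes are switched on only under stress.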
Metabolomics measurements were performed on an NMR platform as previously described22. Briefly, metabolites were extracted via chloroform–methanol extraction and the spectra of cell extract samples were recorded on a Bruker AVII-500 using a TCI cryoprobe with spinning at 3,500 Hz. Spectra were processed using Topspin2.16 software (Bruker GmbH, Karlsruhe, Germany). Metabolite identification and assignment were performed with the help of the Human Metabolome Database and 2D NMR experiments. Signal peaks of spectra were normalized considering that the sum of peak areas across all metabolites was constant for every sample, and values for each metabolite were given as a fraction of the total area. A total of 45 compounds were detected, including 5 sugars, 17 amino acids, 4 alcohols, 3 vitamin-derived compounds, 5 carboxylic acids, and other compounds (CMP, NAD, glutathione, ATP and GMP), plus 3 unidentified metabolites (Table 1). Raw data were log2 transformed and compounds with non-positive measurements across all samples were removed, as they were considered to be below the reliable detection limit. PCA indicated a small batch effect (Fig. 3a) that was removed by ARSyN (Fig. 3b).
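The normalization and transformation described above can be sketched in a few lines of Python. The example assumes an order of operations (total-area normalization, removal of compounds that are non-positive in every sample, then log2 transformation) that the text does not state explicitly, and the peak areas are invented toy values.

```python
import math


def normalize_to_total_area(areas):
    """areas: dict metabolite -> list of peak areas (one per sample).

    Each value is divided by the summed area of its sample, so that every
    sample adds up to 1.
    """
    n_samples = len(next(iter(areas.values())))
    totals = [sum(vals[j] for vals in areas.values()) for j in range(n_samples)]
    return {m: [vals[j] / totals[j] for j in range(n_samples)] for m, vals in areas.items()}


def log2_and_filter(fractions):
    """Drop compounds that are non-positive in all samples, then log2-transform.

    Non-positive values in retained compounds are returned as None
    (an assumption made for this sketch).
    """
    out = {}
    for metabolite, vals in fractions.items():
        if all(v <= 0 for v in vals):
            continue
        out[metabolite] = [math.log2(v) if v > 0 else None for v in vals]
    return out


# Toy peak areas for two samples (invented values).
toy_areas = {
    "trehalose": [2.0, 6.0],
    "glucose": [6.0, 2.0],
    "unknown_1": [0.0, 0.0],
}
fractions = normalize_to_total_area(toy_areas)
log_values = log2_and_filter(fractions)
print(log_values)
```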
For ChIP-seq samples, chromatin immunoprecipitation was performed as previously described23. After cross-linking the cultures for 20 minutes at room temperature with 1% formaldehyde (Sigma), they were quenched with 125 mM glycine for 15 minutes. Subsequently, the cells were collected by centrifugation, split into two aliquots (one for each ChIP) and washed three times with 25 mL of cold Tris-saline buffer (20 mM Tris-HCl, 150 mM NaCl, pH 7.5). The pellets were frozen in liquid nitrogen and stored at −80 °C until further processing. Cells were disrupted by adding 300 µL of lysis buffer (50 mM HEPES-KOH at pH 7.5, 1 mM EDTA, 140 mM NaCl, 1 mM PMSF and protease inhibitors) and 200 µL of glass beads and vortexing for 13 minutes at 4 °C. The cell extracts were sonicated for 30 minutes in a Bioruptor sonicator (Diagenode) at high intensity using 30 seconds on/30 seconds off cycles in a 4 °C water bath. The cellular lysate was clarified by centrifugation at 12,000 rpm for 10 minutes at 4 °C and the whole supernatant was used for immunoprecipitation by incubating with magnetic beads (Dynabeads, Invitrogen) bound to anti-histone H4 (Abcam) or anti-histone H4K12ac (Active Motif) antibody for 2 hours at 4 °C. Beads were subsequently washed twice with lysis buffer, twice with lysis buffer supplemented with 360 mM NaCl, twice with wash buffer (10 mM Tris-HCl, pH 8.0, 250 mM LiCl, 125 mM Na-deoxycholate, 1 mM EDTA and 0.5% NP-40), and once with TE buffer. Samples were eluted by adding 50 µL of elution buffer (50 mM Tris-HCl, pH 8, 10 mM EDTA, 1% SDS) to the beads and incubating for 10 minutes at 65 °C. This step was repeated twice. The samples were incubated overnight at 65 °C to reverse the cross-linking and then incubated with 100 µg/250 µL of proteinase K (Ambion) for 1.5 hours at 45 °C. DNA was isolated by phenol extraction. This DNA was sent to Macrogen Korea for sequencing.
Sequencing was done following the Illumina TruSeq protocol. Around 20 million 50 bp reads were obtained for each sample. Note that two ChIP-seq data files were obtained for each sample: H4 and H4K12ac. H4 files contain the reads obtained after purification of total histone H4, and H4K12ac files contain the data associated with acetylation of lysine 12. Raw sequencing data quality was checked with FastQC and good overall quality (Fig. 4a) was observed in all cases. Trimming of Illumina adapters was performed using Cutadapt24. Reads were mapped to the yeast sacCer3 genome with Bowtie225. Supplementary Tables S2 and S3 summarize sequencing performance in terms of number of reads and mapping rate for H4 and H4K12ac samples, respectively.
The H4 sample of mip6.39.0 was discarded as it showed poor sequencing performance. MACS2 software26 was used to call histone H4 acetylation peaks on the H4K12ac samples alone. Next, a consensus file was generated by merging peaks across all samples using the merge command from bedtools27 with default parameters. These consensus regions were used to map back the reads of all samples, including H4 samples. Peaks were quantified with HTSeq18, intersection-option. The NOISeq19 R package was used to perform quality control of the count data. Moreover, coverage per base was obtained for both H4 and H4K12ac samples using the genomecov command from bedtools27.
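The consensus step can be pictured as a plain interval merge. The Python sketch below pools the peaks of several samples and merges intervals that overlap or are book-ended, which is our understanding of the default behavior of bedtools merge. It is an illustration only, not a replacement for bedtools, and the coordinates are invented.

```python
def merge_peaks(peaks):
    """Merge overlapping or book-ended intervals.

    peaks: iterable of (chrom, start, end) tuples in BED-style half-open
    coordinates. Returns a sorted list of merged (chrom, start, end) tuples.
    """
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Toy peak calls from three samples (invented coordinates).
sample_a = [("chrI", 100, 200), ("chrI", 500, 600)]
sample_b = [("chrI", 150, 250), ("chrII", 10, 50)]
sample_c = [("chrI", 600, 650)]

consensus = merge_peaks(sample_a + sample_b + sample_c)
print(consensus)
```

Quantifying every sample against the same consensus regions gives a count matrix with a common set of rows, which is what makes the H4 and H4K12ac samples directly comparable.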
Validation of dataset replicability
In order to assess replicability, pairwise scatter plots were obtained for RNA-seq data (Fig. 5a), metabolomics data (Fig. 5b) and ChIP-seq data (Fig. 5c,d). Only WT strain replicates are shown as mip6Δ strain data behaved similarly. Replicates were highly and equally correlated with each other, and no experimental outliers were detected.
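A numerical counterpart to the pairwise scatter plots is the Pearson correlation between every pair of replicates. The Python sketch below computes it for toy data; the replicate names and values are invented, and the choice of Pearson correlation is an assumption, since the text describes the comparison only through scatter plots.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(replicates):
    """replicates: dict name -> list of feature values. Returns dict (name_a, name_b) -> r."""
    return {
        (a, b): pearson(replicates[a], replicates[b])
        for a, b in combinations(sorted(replicates), 2)
    }


# Toy log-expression values for four replicates of one condition (invented).
toy_replicates = {
    "rep1": [1.0, 2.0, 3.0, 4.0],
    "rep2": [2.0, 4.0, 6.0, 8.0],
    "rep3": [1.1, 1.9, 3.2, 3.9],
    "rep4": [0.9, 2.1, 2.8, 4.2],
}
correlations = pairwise_correlations(toy_replicates)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```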
Validation of biological consistency
Translational repression upon heat-shock occurs in most eukaryotes31,32, and it is well known that ribosomal genes rapidly shut down after heat treatment. Moreover, previous studies assessing the impact of heat-shock on yeast cells revealed a protective effect for trehalose33,34. We evaluated whether this effect was corroborated by our data.
First, we analyzed ribosomal data. As expected, we found a general down regulation of both ribosomal protein genes (RP genes, Fig. 6a, left panel) and ribosomal biosynthesis genes (RiBi genes, Fig. 6a, right panel) upon a heat treatment at 39 °C. This response was similar for WT and mip6Δ strains. The strongest effect was observed after 20 minutes of heat (blue bars), and recovery was observed for all genes after 120 minutes (orange bars). The drastic downregulation of several RP genes after 20 minutes at 39 °C was further validated in independent experiments by q-PCR (Fig. 6b).
It is well established that histone modifications modulate gene expression programs35. Among other modifications, histone acetylation is considered a key player in the epigenetic control of gene expression and is associated with transcriptionally active genes36. Moreover, a significant deacetylation of H4 was observed after a one-hour heat-shock of HeLa cells, with histone H4K12ac being affected37. To evaluate our ChIP-seq data in relation to heat-shock, we first analyzed the composite profile across all genes for the H4K12ac marker (Fig. 7a). In agreement with its role in transcriptional activation, we found a general enrichment of H4K12ac at the Transcription Start Site (TSS) of genes, and, as expected for the heat-shock response, we found consistently lower levels for all 39 °C samples. Moreover, we found a significant reduction of H4K12 acetylation at the TSS of RB genes after 20 minutes of heat-shock (Kolmogorov-Smirnov test p-value < 1e-10, Fig. 7b, blue lines), which agrees with the strong downregulation of their expression at this time point (Fig. 6a, blue lines). Notably, H4K12 acetylation levels appeared to be fully restored after 120 minutes (Fig. 7b, orange lines) while gene expression levels were not (Fig. 6a, orange lines). This result suggests that H4K12 acetylation responds more rapidly to heat stress than gene expression.
To determine whether this time-dependent heat-response pattern between gene expression and H4K12 acetylation was a general pattern, we analyzed the distribution of both types of data. In particular, we obtained the mean read coverage per base at the TSS ± 100 bp for the H4K12ac signal in each gene, and computed log2 fold-change values (Log2FC) for the comparison between consecutive time points. We compared this distribution to the Log2FC of gene expression. This analysis showed that the gene expression response measured by RNA-seq (grey boxes) has a larger dynamic range than H4K12 acetylation measured by ChIP-seq (white boxes), as the distributions are broader in the former (Fig. 7c). Additionally, the transcriptional response at 0′–20′ is overall larger than that at 20′–120′. However, the direction of signal change at the gene level seems to be the same for both omics layers, since the position of selected genes in the Log2FC distributions is similar for RNA-seq and H4K12ac data (Fig. 7c). This was true both for genes that are downregulated upon heat-shock (RP genes) and for upregulated genes (e.g. trehalose metabolism genes, Fig. 7c). We concluded that a coordinated signal of H4K12ac and gene expression can be inferred from our data, although the magnitude of change in each differs, with RNA-seq data manifesting a larger dynamic range.
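The per-gene computation described above (mean coverage in a window around the TSS, followed by Log2FC between consecutive time points) can be sketched as follows. The pseudocount, the 0-based coordinates and the flat toy coverage profiles are assumptions introduced for this example and are not taken from the published scripts.

```python
import math


def mean_tss_coverage(coverage, tss, flank=100):
    """Mean per-base coverage in the window TSS ± flank.

    coverage: list of per-base read depths along one chromosome (0-based).
    tss: 0-based position of the transcription start site.
    """
    lo = max(0, tss - flank)
    hi = min(len(coverage), tss + flank + 1)
    window = coverage[lo:hi]
    return sum(window) / len(window)


def consecutive_log2fc(values_by_time, pseudocount=1.0):
    """Log2 fold change between consecutive time points.

    values_by_time: dict minutes -> value. Returns dict (t_prev, t_next) -> Log2FC.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    times = sorted(values_by_time)
    return {
        (a, b): math.log2((values_by_time[b] + pseudocount) / (values_by_time[a] + pseudocount))
        for a, b in zip(times, times[1:])
    }


# Toy per-base coverage for one gene at the three time points (invented, flat profiles).
toy_coverage = {0: [7] * 400, 20: [3] * 400, 120: [7] * 400}
toy_tss = 200

signal = {t: mean_tss_coverage(cov, toy_tss) for t, cov in toy_coverage.items()}
fold_changes = consecutive_log2fc(signal)
print(fold_changes)
```

Applying the same function to expression values and to H4K12ac coverage yields two Log2FC distributions on a common scale, which is the comparison summarized in Fig. 7c.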
Figure 7c suggested that expression changes of trehalose metabolism genes in mip6Δ cells were larger than in WT. We therefore investigated this pathway further. We confirmed a general upregulation, although with different magnitudes, of the gene members of the trehalose metabolism pathway (Fig. 8a). Interestingly, we always observed the highest value for mip6Δ cells under these conditions, particularly for the TSL1 and PGM2 genes, which showed the strongest transcriptional regulation, suggesting that the heat-induced accumulation of trehalose might be larger in mip6Δ than in WT. We verified this hypothesis by analyzing trehalose levels in our metabolomics dataset (Fig. 8b). We found a strong increase of this metabolite in the treated cells and a significantly higher accumulation in the mutant. The metabolomics measurement was further confirmed by an independent analysis, where we measured trehalose levels in a double mutant lacking MIP6 and its yeast paralogue PES416 (Fig. 8c). Finally, network analysis of gene-metabolite levels of the trehalose pathway shows a strong correlation of trehalose with genes using this metabolite either as substrate (NTH1, NTH2) or as product (TPS2), suggesting a direct regulation of trehalose levels by these gene products (Fig. 8d).
Taken together, this section shows a biologically consistent and coordinated signal across our RNA-seq, ChIP-seq and metabolomics datasets that agrees with previous findings. Our analysis also suggests a specific role for Mip6 in the metabolic control of the heat-shock response, further supporting the biological interest of the dataset.
Preprocessing scripts for each of the omics datasets are available at the GitHub repository (https://github.com/ConesaLab/MultiMip6).
Rodríguez-Navarro, S. & Hurt, E. Linking gene regulation to mRNA production and export. Curr. Opin. Cell. Biol. 23, 302–309 (2011).
García-Oliver, E., García-Molinero, V. & Rodríguez-Navarro, S. mRNA export and gene expression: The SAGA–TREX-2 connection. BBA-Gene Regul. Mech. 1819, 555–565 (2012).
Kouzarides, T. Chromatin Modifications and Their Function. Cell 128, 693–705 (2007).
Zhang, T., Cooper, S. & Brockdorff, N. The interplay of histone modifications – writers that read. EMBO Rep. 16, 1467–1481 (2015).
Woo, H., Dam, H. S., Lee, S. B., Buratowski, S. & Kim, T. Modulation of gene expression dynamics by co-transcriptional histone methylations. Exp. Mol. Med. 49, e326 (2017).
Zinder, J. C. & Lima, C. D. Targeting RNA for processing or destruction by the eukaryotic RNA exosome and its cofactors. Gene. Dev. 31, 88–100 (2017).
Helwak, A., Kudla, G., Dudnakova, T. & Tollervey, D. Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. Cell 153, 654–665 (2013).
Zander, G. et al. mRNA quality control is bypassed for immediate export of stress-responsive transcripts. Nature 540, 593 (2016).
Yoon, J.-H. et al. PAR-CLIP analysis uncovers AUF1 impact on target RNA fate and genome integrity. Nature Commun. 5, 5248–5248 (2014).
García-Oliver, E. et al. A novel role for Sem1 and TREX-2 in transcription involves their impact on recruitment and H2B deubiquitylation activity of SAGA. Nucleic Acids Res. 41, 5655–5668 (2013).
Cuenca-Bono, B. et al. A novel link between Sus1 and the cytoplasmic mRNA decay machinery suggests a broad role in mRNA metabolism. BMC Cell Biol. 11, 19–19 (2010).
Schneider, M. et al. The Nuclear Pore-Associated TREX-2 Complex Employs Mediator to Regulate Gene Expression. Cell 162, 1016–1028 (2015).
Schubert, T. & Köhler, A. Mediator and TREX-2: Emerging links between transcription initiation and mRNA export. Nucleus 7, 126–131 (2016).
Pascual-García, P. et al. Sus1 is recruited to coding regions and functions during transcription elongation in association with SAGA and TREX2. Gene. Dev. 22, 2811–2822 (2008).
Sen, R. et al. Distinct Functions of the Cap-Binding Complex in Stimulation of Nuclear mRNA Export. Mol. Cell. Biol. 39, e00540–00518 (2019).
Martín-Expósito, M. et al. Mip6 binds directly to the Mex67 UBA domain to maintain low levels of Msn2/4 stress dependent mRNAs. EMBO Rep. e47964, (2019).
Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
Anders, S., Pyl, P. T. & Huber, W. HTSeq - a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
Tarazona, S. et al. Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package. Nucleic Acids Res. 43, e140 (2015).
Bullard, J. H., Purdom, E., Hansen, K. D. & Dudoit, S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11, 94 (2010).
Nueda, M. J., Ferrer, A. & Conesa, A. ARSyN: a method for the identification and removal of systematic noise in multifactorial time course microarray experiments. Biostatistics 13, 553–566 (2012).
Palomino-Schätzlein, M., Molina-Navarro, M. M., Tormos-Pérez, M., Rodríguez-Navarro, S. & Pineda-Lucena, A. Optimised protocols for the metabolic profiling of S. cerevisiae by 1H-NMR and HRMAS spectrosc. Anal. Bioanal. Chem. 405, 8431–8441 (2013).
Oliete-Calvo, P. et al. A role for Mog1 in H2Bub1 and H3K4me3 regulation affecting RNAPII transcription and mRNA export. EMBO Rep. 19, e45992 (2018).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Yang, Y. et al. Leveraging biological replicates to improve analysis in ChIP-seq experiments. Comput Struct Biotechnol J. 9, e201401002 (2014).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Nuño-Cabanes, C. et al. A multi-omics dataset of heat-shock response in the yeast RNA transport protein Mip6. Gene Expression Omnibus https://identifiers.org/geo:GSE135568 (2019).
Nuño-Cabanes, C. et al. A multi-omics dataset of heat-shock response in the yeast RNA binding protein Mip6 (NMR assay). MetaboLights https://identifiers.org/metabolights:MTBLS1320 (2020).
Nuño-Cabanes, C. et al. A multi-omics dataset of heat-shock response in the yeast RNA binding protein Mip6. figshare. https://doi.org/10.6084/m9.figshare.c.4716677 (2020).
Gasch, A. P. et al. Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes. Mol. Biol. Cell 11, 4241–4257 (2000).
Causton, H. C. et al. Remodeling of yeast genome expression in response to environmental changes. Mol. Biol. Cell 12, 323–337 (2001).
Felix, C. F. et al. Protection against thermal denaturation by trehalose on the plasma membrane H+-ATPase from yeast. Eur. J. Biochem. 266, 660–664 (1999).
Hottiger, T., De Virgilio, C., Hall, M. N., Boller, T. & Wiemken, A. The role of trehalose synthesis for the acquisition of thermotolerance in yeast. Eur. J. Biochem. 219, 187–193 (1994).
Chu, S. et al. The Transcriptional Program of Sporulation in Budding Yeast. Science 282, 699 (1998).
Chen, L. et al. Genetic Drivers of Epigenetic and Transcriptional Variation in Human Immune Cells. Cell 167, 1398–1414.e1324 (2016).
Fritah, S. et al. Heat-shock factor 1 controls genome-wide acetylation in heat-shocked cells. Mol. Biol. Cell 20, 4976–4984 (2009).
This work is part of a research project funded by Generalitat Valenciana through PROMETEO grants programme for excellence research groups (PROMETEO 2016/093). We thank Dr. Palomino-Schatzlein from CIPF-IISLAFE Joint Research Unit of Metabolomics for support in metabolite sample preparation and analysis. Also, we thank Salva Casaní-Galdón from BioBam Bioinformatics S.L. for experimental support.
The authors declare no competing interests.
Cite this article
Nuño-Cabanes, C., Ugidos, M., Tarazona, S. et al. A multi-omics dataset of heat-shock response in the yeast RNA binding protein Mip6. Sci Data 7, 69 (2020). https://doi.org/10.1038/s41597-020-0412-z