A multi-omics dataset of heat-shock response in the yeast RNA binding protein Mip6

Gene expression is a biological process regulated at different molecular levels, including chromatin accessibility, transcription, and RNA maturation and transport. In addition, these regulatory mechanisms have strong links with cellular metabolism. Here we present a multi-omics dataset that captures different aspects of this multi-layered process in yeast. We obtained RNA-seq, metabolomics, and H4K12ac ChIP-seq data for wild-type and mip6Δ strains during a heat-shock time course. Mip6 is an RNA-binding protein that contributes to RNA export during environmental stress and is informative of the contribution of post-transcriptional regulation to control cellular adaptations to environmental changes. The experiment was performed in quadruplicate, and the different omics measurements were obtained from the same biological samples, which facilitates the integration and analysis of data using covariance-based methods. We validate our dataset by showing that ChIP-seq, RNA-seq and metabolomics signals recapitulate existing knowledge about the response of ribosomal genes and the contribution of trehalose metabolism to heat stress. Raw data, processed data and preprocessing scripts are made available.

examples of experimental datasets where all these layers have been measured on exactly the same samples. This poses challenges to the mathematical integration of the data, as methods that rely on the analysis of co-variation patterns will have application restrictions. When the multi-layered data are obtained on the same samples, additional analysis opportunities arise that facilitate the establishment of relationships across regulatory mechanisms.
In this Data Descriptor, we present a yeast multi-omics dataset that features three basic layers of the transcriptional circuit, measured in the same set of samples. These include one epigenetic modification -H4K12ac, a mark for active promoters-, gene expression -RNA-seq-and targeted metabolomics. Moreover, data are obtained for both WT and a mip6Δ mutant, in control and heat-shock induced conditions. Mip6 is an RNA-binding protein that participates in RNA export under stress 16 and consequently is informative of the contribution of post-transcriptional regulation to the adaptation of RNA levels to environmental changes. Taken together, the selection of yeast strains, growth conditions and omics experiments, creates a unique dataset to study the cross-talk between epigenetic, transcriptional and metabolic regulation in response to environmental cues. The availability of multi-omics data on the same set of samples will facilitate the application of powerful statistical www.nature.com/scientificdata www.nature.com/scientificdata/ approaches that fully leverage paired measurements to propose quantitative regulatory models. A subset of this collection, namely gene expression data for 20 genes that are regulated by stress transcriptional factors Msn2/4, has been published elsewhere 16 . Methods experimental design. Figure 1 illustrates the experimental desing of our dataset. Panel A describes the strategy to manage sampling and the time-course nature of our experiment. A single, 330 mL culture (either for WT or mip6Δ strains) was grown at 30 °C until the exponential phase (OD = 0.7), and this culture was subsequently split across three flasks. One flask was maintained at 30 °C and labeled as time point 0. The other two flasks were incubated at 39 °C for 20 minutes and 120 minutes, respectively, by adding preheated media to rapidly increase the temperature to 39 °C. These last two flasks capture the heat-shock response, while the 30 °C flask   www.nature.com/scientificdata www.nature.com/scientificdata/ serves as a control representing non-stress conditions. The rationale for having two flasks for the 39 °C temperature instead of a single flask sampled at two different time points was to avoid introducing effects related to culture volume. Panel B describes how samples were obtained for omics measuments. Basically, for each of the flasks described above three aliquotes were extracted for RNA-seq, NMR metabolomics, and ChIP-seq analyses. Therefore, the three omics assays were performed on the same cell culture. RNA-seq and NMR aliquotes were obtained first, and the remaining culture was treated to induce cross-linking before collecting the ChIP-seq aliquote. After aliquoting, each tube was inmediatelly frozen and stored at −80 °C.  www.nature.com/scientificdata www.nature.com/scientificdata/ The process described in Fig. 1 was repeated 4 times to generate four biological replicates. However, due to sample management limitations these 4 replicates were created on two different days. Specifically, biological replicates 1 and 2 were obtained in day 1, and replicates 3 and 4 were obtained in day 2.   www.nature.com/scientificdata www.nature.com/scientificdata/ Acquisition and preprocessing of multi-omics data. RNA-seq. Total RNA was isolated by hot acid phenol extraction. RNA integrity was checked with Bioanalyzer (Agilent) and then submitted to a commercial sequencing facility (Macrogen Corea). Sequencing was done with Illumina using the TruSeq protocol. Between 50-60 million reads of 100 bp paired data were obtained from each sample. Raw sequencing data quality was checked by fastQC and good overall quality (Fig. 2a) was observed in all cases. No trimming was deemed necessary. Reads were mapped to the yeast saccer3 genome with Tophat2 17 and genes were quantified with HTSEQ 18 , intersection-option. Supplementary Table S1 shows the number of reads, mapping rate and number of reads in genes for all samples, revealing uniform quantities across the dataset.
The NOISeq 19 R package was used to perform the quality control of count data. We observed most of reads mapped onto protein-coding genes (>80%), as expected (Fig. 2b). Counts were normalized via TMM 20 and a low count filtering was applied with the NOISeq cpm method (with cpm = 1). Principal Component Analysis (PCA) indicated a slight batch effect for the day of culture growth ( Fig. 2c left) that was removed by ARSyN 21 (Fig. 2c  right). In total, we obtained gene expression values for 6,379 genes.
Metabolomics. Metabolomics measurements were performed on an NMR platform as described in 22 . Basically, metabolites were extracted via chloroform-methanol extraction and the spectra of cell extract samples were recorded on a Bruker AVII-500 using a TCI cryoprobe with spinning at 3,500 Hz. Spectra were processed using Topspin2.16 software (Bruker GmbH, Karlsruhe, Germany). Metabolite identification and assignment were performed with the help of the Human Metabolome Database and 2D NMR experiments. Signal peaks of spectra were normalized considering that the sum of peak areas across all metabolites was constant for every sample, and values for each metabolite were given as a fraction of the total area. A total of 45 compounds were detected, that included 5 sugars, 17 amino-acids, 4 alcohols, 3 vitamin-derivated compounds, 5 carboxylic acids, and other compounds (CMP, NAD, Glutathione, ATP and GMP), plus 3 unidentified metabolites (Table 1). Raw data were log2 transformed and compounds with non-positive measure across all samples were removed, as they were considered to be below reliable detection limit. PCA analysis indicated a small batch effect (Fig. 3a), that was removed by ARSyN (Fig. 3b).
ChIP-seq. For ChIP-seq samples, chromatin immunoprecipitation was performed as previously described 23 . After cross-linking the cultures for 20 minutes at room temperature with 1% formaldehyde (Sigma), they were quenched with 125 mM glycine for 15 minutes. Subsequently, the cells were collected by centrifugation, split in two aliquots (one for each ChIP) and washed with 25 mL of cold Tris-saline buffer (20 mM Tris-HCl, 150 mM NaCl, pH 7.5) three times. The pellets were frozen in liquid nitrogen and stored at −80 °C until further processing. Cells were disrupted by adding 300 µL of lysis buffer (50 mM HEPES-KOH at pH 7.5, 1 mM EDTA, 140 mM NaCl, 1 mM PMSF and protease inhibitors) and 200 µL of glass beads and vortexing for 13 minutes at 4 °C. The cell extracts were sonicated for 30 minutes in a Bioruptor sonicator (Diagenode) at high intensity www.nature.com/scientificdata www.nature.com/scientificdata/ using 30 seconds on/30 second off cycles in a 4 °C water bath. The cellular lysate was clarified by centrifugation at 12,000 rpm for 10 minutes at 4 °C and the whole supernatant was used for immunoprecipitation by incubating with magnetic beads (Dynabeads, Invitrogen) bound to anti-histone H4 (Abcam) or anti-histone H4K12ac (Active Motif) antibody for 2 hours at 4 °C. Beads were subsequently washed twice with lysis buffer, twice with lysis buffer supplemented with 360 mM NaCl, twice with wash buffer (10 mM Tris-HCl, pH 8.0, 250 mM LiCl, 125 mM Nadeoxycholol, 1 mM EDTA and 0.5% NP-40), and once with TE buffer. Samples were eluted by adding 50 µL of elution buffer (50 mM Tris-HCl, pH 8, 10 mM EDTA, 1% SDS) to the beads and incubating for 10 min at 65 °C. This step was repeated twice. The samples were incubated overnight at 65 °C to reverse the cross-linking and then incubated with 100 µg/250 µl of proteinase K (Ambion) for 1.5 h at 45 °C. DNA was isolated by phenol extraction. This DNA was sent to Macrogen Corea for sequencing.
Sequencing was done following the Illumina TruSeq protocol. Around 20 million 50 bp reads were obtained for each sample. Note that two ChIP-seq data files were obtained for each sample: H4 and H4K12ac. H4 files contain the reads after purification of total H4 histone and H4K12ac files contain the data associated to acetylation of Lysine 12. Raw sequencing data quality was checked by fastQC and good overall quality (Fig. 4a) was observed in all cases. Trimming of Illumina adapters was performed using Cutadapt 24 . Reads were mapped to the yeast saccer3 genome with Bowtie2 25 . Supplementary Tables S2 and S3 summarize sequencing performance in terms of number of reads and mapping rate for H4 and H4K12ac samples, respectively.
H4 sample of mip6.39.0 was discarded as it showed poor sequencing performance. Macs2 software 26 was used to call Histone 4 acetylation peaks on the H4K12ac samples alone. Next, a consensus file was generated by merging peaks across all samples using the merge command from bedtools software 27 with default parameters. These consensus regions were used to map back reads of all samples, including H4 samples. Peaks were quantified with HTSEQ 18 , intersection-option. NOISeq 19 , R package was used to perform a quality control of count data. Moreover, coverage per base was obtained for both, H4 and H4K12ac samples, using the genomecov command from bedtools 26 .

Data Records
Raw data. The yeast WT and mip6Δ heat-shock multi-omics data have been deposited in different public repositories 28,29 . Table 2 shows a list of the current hosting of raw data files.

technical validation
Validation of dataset replicability. In order to assess replicability, pairwise scatter plots were obtained for RNA-seq data (Fig. 5a), metabolomics data (Fig. 5b) and ChIP-seq data (Fig. 5c,d). Only WT strain replicates are shown as mip6Δ strain data behaved similarly. Replicates were highly and equally correlated with each other, and no experimental outliers were detected.
Validation of biological consistency. Translational repression upon heat-shock occurs in most eukaryotes 31,32 , and it is well known that ribosomal genes rapidly shut down after heat treatment. Moreover, previous studies assessing the impact of heat-shock on yeast cells revealed a protective effect for trehalose 33,34 . We evaluated whether this effect was corroborated by our data.
First, we analyzed ribosomal data. As expected, we found a general down regulation of both ribosomal protein genes (RP genes, Fig. 6a, left panel) and ribosomal biosynthesis genes (RiBi genes, Fig. 6a, right panel) upon a heat treatment at 39 °C. This response was similar for WT and mip6Δ strains. The strongest effect was observed after 20 minutes of heat (blue bars), and recovery was observed for all genes after 120 minutes (orange bars). The drastic downregulation of several RP genes after 20 minutes at 39 °C was further validated in independent experiments by q-PCR (Fig. 6b).
It is well stablished that histone modifications modulate gene expression programs 35 . Among other modifications, histone acetylation is considered as a key player in the epigenetic control of gene expression and is associated with transcriptionally active genes 36 . Moreover, a significant deacetylation of H4 was observed after a one hour heat-shock of HeLa cells, being histone H4K12ac affected 37 . To evaluate our ChIP-seq data in relation to heat-shock we first analyzed the composite profile across all genes for the H4K12ac marker (Fig. 7a). In agreement with its role in transcriptional activation, we found a general enrichment of H4K12ac at the Transcription Start Site (TSS) of genes, and, as expected for the heat-shock response, we found consistently lower levels for all 39 °C www.nature.com/scientificdata www.nature.com/scientificdata/ samples. Moreover, we found a significant reduction of H4K12 acetylation at the TSS of RB genes after 20 minutes of heat-shock (Kolmogorov-Smirnov test p.value < 1e-10, Fig. 7b, blue lines), which agrees with the strong down regulation of their expression at this time point (Fig. 6a, blue lines). Notably, H4K12 acetylation levels appeared to be fully restored after 120 minutes (Fig. 7b, orange lines) while gene expression levels were not (Fig. 6a, orange  lines). This result suggests that H4K12 acetylation responds more rapidly to heat stress than gene expression.
To determine if this time-dependent heat-response pattern between gene expression and H4K12 acetylation was a general pattern we analyzed the distribution of both types of data. In particular, we obtained the mean read coverage per base at the TSS ± 100 p for the H4K12ac signal in each gene, and computed log2 fold-change values (Log2FC) of the comparison between consecutive time-points. We compared this distribution to the Log2FC of gene expression. This analysis showed that the gene expression response measured by RNA-seq (grey boxes) has a larger dynamic range than H4K12 acetylation measured by ChIP-seq (white boxes), as the distributions are broader in the former (Fig. 7c). Additionally, the transcriptional response at 0′-20′ is overall larger than that at 20' to 120′. However, the direction of signal change at the gene-level seems to be the same for both omics layers, since the position of selected genes in the Log2FC distributions is similar for RNA-seq and H4K12ac data (Fig. 7c). This was true both for genes that are down-regulated upon heat-shock (RP genes) than for upregulated genes (i.e. trehalose metabolism genes, Fig. 7c). We concluded that a coordinated signal of H4K12ac and gene expression can be inferred from our data, although the magnitude of change in each differs, with RNA-seq data manifesting a larger dynamic range. Figure 7c suggested that expression changes of trehalose metabolism genes in mip6Δ cells were larger than in WT. We therefore investigated further this pathway. We confirmed a general -although with different magnitudesupregulation for genes members of the trehalose metabolism pathway (Fig. 8a). Interestingly, we observe the highest value always for mip6Δ cells under these conditions, particularly for TSL1 and PGM2 genes, that showed the strongest transcriptional regulation, suggesting that a heat-induced accumulation of trehalose might be larger in the mip6Δ versus the WT. We verified this hypothesis by analyzing trehalose levels of our metabolomics dataset (Fig. 8b). We found a strong increase of this metabolite in the treated cells and a significant higher accumulation in the mutant. The metabolomics measurement was further confirmed by an independent analysis, where we measured trehalose levels in a double mutant lacking MIP6 and its yeast paralogue PES4 16 (Fig. 8c). Finally, network analysis of gene-metabolite levels of the trehalose pathway shows a strong correlation of trehalose with genes using this metabolite either as substrate (NTH1, NTH2) and product (TPS2), suggesting a direct regulation of trehalose levels by these gene products (Fig. 8d).
Taken together, this section shows a biologically consistent and coordinated signal of our RNA-seq, ChIP-seq and metabolomics datasets that agrees with previous findings. Our analysis also suggests a specific role for mip6 in the metabolic control of the heat-shock response, further supporting the biological interest of the dataset.

Code availability
Preprocessing scripts for each of the omics datasets are available at the Github repository (https://github.com/ ConesaLab/MultiMip6).