Multi-omics network model reveals key genes associated with p-coumaric acid stress response in an industrial yeast strain

The production of ethanol from lignocellulosic sources presents increasingly difficult issues for the global biofuel scenario, leading to increased production costs of current second-generation (2G) ethanol when compared to first-generation (1G) plants. Among the setbacks encountered in industrial processes, the presence of chemical inhibitors from pre-treatment processes severely hinders the potential of yeasts in producing ethanol at peak efficiency. However, some industrial yeast strains have, either naturally or artificially, higher tolerance levels to these compounds. Such is the case of S. cerevisiae SA-1, a Brazilian fuel ethanol industrial strain that has shown high resistance to inhibitors produced by the pre-treatment of cellulosic complexes. Our study focuses on the characterization of the transcriptomic and physiological impact of an inhibitor of this type, p-coumaric acid (pCA), on this strain under chemostat cultivation via RNAseq and quantitative physiological data. It was found that strain SA-1 tend to increase ethanol yield and production rate while decreasing biomass yield when exposed to pCA, in contrast to pCA-susceptible strains, which tend to decrease their ethanol yield and fermentation efficiency when exposed to this substance. This suggests increased metabolic activity linked to mitochondrial and peroxisomal processes. The transcriptomic analysis also revealed a plethora of differentially expressed genes located in co-expressed clusters that are associated with changes in biological pathways linked to biosynthetic and energetical processes. Furthermore, it was also identified 20 genes that act as interaction hubs for these clusters, while also having association with altered pathways and changes in metabolic outputs, potentially leading to the discovery of novel targets for metabolic engineering toward a more robust industrial yeast strain.


Results and discussion
Metabolic changes in SA-1 strain upon pCA exposure. A thorough understanding of the effects of pCA on yeast metabolism is required to generate potential metabolic engineering strategies that can improve strain robustness. Since lignocellulosic hydrolysates contain a high number of phenolic compounds of which 80% represents pCA, this compost was added to the feed-medium of carbon-limited chemostats. Cultivation without inhibitors served as control. Data collected along each batch phase was linearized by applying the natural logarithm to exit of CO 2 values as a function of time. Specific consumption rates of glucose and specific production rates of selected extracellular metabolites are shown in Table 1 and Table S1.
It has been reported that in aerobic cultures containing pCA, the growth rate is significantly reduced in a dose-dependent manner and inhibiting the efficient bioconversion of lignocellulose biomass by fermentative organisms 46,47 . Although the S. cerevisiae CEN.PK113-7D strain was shown to be capable of slow in situ catabolic conversion of 9.7 mM pCA in aerobic batch cultivations, performing a complete conversion of pCA into other phenolic compounds over a period of 72 h 46,47 . Our results indicate that under anaerobic conditions, the same catabolic effect does not occur with the pCA concentrations remaining relatively stable across steady-state measurements. This highlights the importance of analyzing industrial S. cerevisiae strains under anaerobic chemostat conditions, especially considering that mitochondrial respiration has been shown to be of significant importance for controlling yeast growth rate processes and resistance to phenolic compounds [48][49][50] .
Exploring the differences in the physiological parameters resulting from the addition of the phenolic compound, multiple alterations were observed. Some specific consumption and production rates increased, such as for glucose (26%), CO 2 (12%), and ethanol (53%). On the other hand, decrease in biomass yield (22%), and the glycerol production rate (19%) was observed (Table 1). In anaerobic glucose-limited chemostat cultures of the S. cerevisiae strains, carbon is mainly diverted to ethanol and CO 2 , and minor amounts of glycerol, lactic and acetic acids, with a concomitant formation of yeast biomass. The ethanol yield of SA-1 in the control condition was 21% lower than in the presence of pCA. Moreover, under anaerobic glucose-limited chemostat cultivations pCA is not metabolized by SA-1 (Fig. 1).
Overall, physiological data collected from the anaerobic chemostat cultures showed that pCA exposure may increase ethanol production in SA-1 strain (log2(foldChange) = 0.58; t-test p-value < 0.05). These changes are accompanied by an increased glucose uptake rate (log2(fC) = 0.33, pval = 0.10), which might suggest alterations in the metabolic state. In addition, these changes were followed by a decrease in the overall dry-weight cell biomass (log2(fC) = − 0.27, pval < 0.05) and biomass yield (log2(fC) = − 0.25, pval = 0.10). This is different from previous studies that showed that S. cerevisiae Ethanol Red ® , when exposed to pCA during aerobic batch cultivations, increases biomass yield and decreases ethanol yield in comparison to control conditions (without pCA) 47 . Strains resistant to lignocellulosic inhibitors have lower growth rates when exposed to such compounds 27 ; however, SA-1 strain showed no signs of slower growth during the batch phase, with both control and treated samples showing an average growth rate around 0.37 h -1 .

Differential gene expression between treated and control samples. Gene expression analysis
based on RNASeq data revealed that both conditions (treated and control) have a high correlation between biological duplicates (Pearson's R 2 > 0. 99 , Fig. S1). These values are within the established parameters for chemostat cultures 51 , which generate replicates with low biological variability. Additionally, the principal component analysis showed that 79.57% of the explained variance observed in the samples can be associated with the axis that represents separation based on experimental conditions, with only 10.28% of explained variance being associated with alterations from samples under the same conditions (Fig. S1B). These analyses indicate significant changes in the transcriptomic landscape of SA-1 strain when exposed to pCA. Table 1. Physiology of S. cerevisiae strains in glucose-limited anaerobic chemostats at a dilution rate of 0.1 h −1 . Specific rates (q) are given in mmol g −1 h −1 , µ in h −1 , yeast biomass (X) in g DW L −1 , conversion factor of substrate into biomass (Y X/S ) in g DW g glucose −1 , ethanol yield (Y Eth/S ) in g ethanol −1 g glucose −1 , and C recovery in (%). The µ value represents the growth rate measured during the batch phase before the steadystate is achieved. Data is the average values of duplicate experiments ± deviation of the mean. In order to assess the quality of our differential gene expression data, we measured the observed relationships between average gene expression ratios and their deviation from the mean (Fig. S1C), the observed p-values and variation and expression quantiles (Fig. S1D) as well as the correlation between observed gene expression standard deviation and median expression value (Fig. S1E). Not only does our quality control data show that there is little-to-no bias in our dataset, it has a similar behavior to what is expected from a well controlled differential gene expression experiment when using RNA samples collected in the second steady-state of chemostat cultures 52 .
It was identified a total of 1472 differentially expressed genes between conditions (404 up-and 1068 downregulated); however, only 448 (10 up-and 438 down-regulated) of the genes was statistically significant, with p-value < 0.05, FDR ≤ 0.01, and had |log2(FoldChange)| ≥ 1 ( Fig. 2A, Table S2). It is known that the presence of insoluble lignocellulosic inhibitors in the medium can promote generalized downregulation of gene expression in another industrial S. cerevisiae strain 34 . With that in mind, the possible transcriptomic alterations induced by pCA in our chemostat experiments were explored. By using gene expression data obtained from our RNAseq data, we observed a tendency towards global downregulation of the transcriptional machinery. This same expression pattern was already observed by Moreno et al. 34 as a response to the presence of insoluble inhibitory compounds in the fermentation medium. Our findings not only reflect the same pattern reported, but highlights the importance of understanding the biological impacts of different types of inhibitory conditions on fermentative microorganisms for directing bioengineering efforts towards more robust strains for the ethanol industry.
The DEGs identified in our study were mapped in the KEGG database 53 to identify perturbations in pathways containing SA-1 differentially expressed gene sets (either up-or down-regulated), which affect the overall pathway activity (Fig. 2B, Table S3). Using this approach, a total of 11 pathways that had statistically significant gene set alterations by pCA stress were identified, and three were negatively altered (repressed): oxidative phosphorylation, citrate cycle, and peroxisome. The other eight had perturbations with positive effects on the pathway www.nature.com/scientificreports/ (activated): cell cycle, ribosome biogenesis, metabolic pathways, biosynthesis of amino acids, pyrimidine metabolism, purine metabolism, methane metabolism and glycine, serine, and threonine metabolism. Additionally, a total of 20 differentially expressed genes (17 downregulated and 3 upregulated) were identified, associated with redox regulation and/or response to reactive oxygen species and another set of 15 DEGs (13 downregulated and 2 upregulated) that can be linked to ethanol metabolism and/or fermentation (Fig. 2C). No overlapping genes between these two sets (ROS/Redox and Ethanol/Fermentation) were observed. Amongst the impacted pathways, purine metabolism was one of the most prominent, this pathway is involved in the formation of adenine and guanine. The former is an essential part of the overall cell metabolism, in the form of ATP/ADP/AMP, by providing energy for cellular processes, and the latter plays a distinct role in cell response to stress conditions in the form of GTP/GDP, a molecule often used in signaling processes during stress response for transcriptional regulation 54 and glucose signaling via GPCR 55 . The increase in the metabolism of these compounds could also be related to an increased rate of metabolic processes during the stress response. Furthermore, several differentially expressed genes that have known associations, via Gene Ontology 56,57 , to reactive oxygen species and ethanol fermentative processes are identified. Both of these processes are known to be intricately related to the mitochondria and peroxisome organelles 48,58,59 , and are known hallmarks of yeast response to stress induced by lignocellulosic inhibitors [59][60][61][62] . In addition to the aforementioned processes, it was also identified several other upregulated pathways that are linked to mitochondrial activity, such as biosynthesis and metabolism of amino acids 63 . These processes are intrinsically associated with the TCA cycle, which is downregulated in our dataset, and of paramount importance in maintaining amino-acid homeostasis, which, in turn, is vital for promoting long-term viability in yeasts 64 .
In addition to several of our results suggesting alterations surrounding pathways related to the mitochondria in response to pCA stress, multiple studies already pointed out the importance of this organelle towards the resistance to this particular compound [65][66][67] . However, the exact mechanisms by which the pCA affects this S. cerevisiae organelle under anaerobic conditions are still largely unexplored. When exploring the literature available for other organisms, some studies conducted in rat liver and human cells showed that pCA may cause damage to the mitochondria: by inhibiting the pyruvate transport mechanism 68,69 , inducing reactive oxygen species (ROS) damage 70 and mitochondrial membrane depolarization 71 . Another study demonstrated that PAD1, a mitochondrial protein that is downregulated (log2(FC) = − 0.5) in SA-1, is essential for the decarboxylation of phenylacrylic acids 65 . In all the three cases described in the literature, mitochondrial damage ultimately leads to a signaling cascade that starts cell autophagy.
Graph network clustering using known protein-protein interaction data associated with fold changes derived from RNASeq was applied to identify co-expressed gene clusters. This approach is advantageous for inferring networks from data with a low number of samples, as is the case of most experiments in biology. This is because it brings precision from a prior knowledge of a manually curated database (STRINGdb) combined with fold change values that include the particularity of the experiment. A total of 98 clusters were found within the differentially expressed genes, with 50% of DEGs being located in the 7 biggest clusters (Fig. 2D, Table S4). These clusters were then characterized according to their expression profile (Table S5), based on the distribution of log2(fold change) for genes within each cluster, enrichment of significant Gene Ontology classes and KEGG pathways (Table S5). Additionally, no correlation (R 2 = 0.033) was found between the number of genes and the overall standard deviation of fold changes observed within each cluster.

Functional characterization of co-expressed gene clusters.
To further explore the clustered gene sets, the clusters were filtered bythe fold change values, located in a distance of 1.5 × IQR (interquartile range), were in the same quadrant, either above or below zero, and had more than 20 genes. A total of 9 clusters (C2, C3, C4, C6, C7, C9, C10, C11, and C12), with a total of 462 DEGs, were selected and characterized according to biological functional enrichment and association (Fig. 3A, Table S6) with phenotypic alterations (Fig. 3B, Table S7). Six of these clusters (C2, C4, C7, C9, C10, and C11) comprised downregulated genes, while 3 (C3, C6, and C12) were mostly from upregulated genes (Fig. 3C). However, each cluster had distinct associations with biomass yield and ethanol production (Fig. 3D). Downregulated clusters tended to share positive associations with biomass and negative relations with ethanol production. Upregulated clusters, on the other hand, showed an inverse pattern, with negative regulation of biomass and positive association with ethanol.
To delve deeper into the transcriptomic landscape alterations induced by pCA stress on SA-1 strain, we shifted our focus to the co-expressed gene clusters extracted from our network in order to characterize which functional groups of genes were being up/down-regulated and if those clusters could be related to the pathways changes observed previously. Amongst the groups of genes that stood out from the background, it was noticed that one of the co-expressed clusters associated with autophagy (C11) was actively repressed in SA-1 under p-coumaric stress. Our data also suggests a potential disruption of the mitochondria, with 70% of the genes located in two downregulated clusters being directly associated with mitochondrial cellular processes (C2) and translation (C7). This correlates with the negative impact observed for KEGG pathways associated with the citrate cycle and oxidative phosphorylation. Furthermore, 20 of the 27 differentially expressed genes associated with the peroxisomal pathway (AGX1, CAT2, CTA1, DCI1, ECI1, FAA2, IDP2, IDP3, PEX1, PEX11, PEX14, PEX2, PEX5, POT1, POX1, PXA1, PXA2, SPS19, YAT1, and YAT2) are also located in a downregulated cluster associated with the cytoplasmic organic substance metabolic process (C4). Peroxisomes are involved with long fatty acid degradation and biosynthesis in yeasts 72,73 , with lipid metabolism/biogenesis being one of the downregulated gene clusters identified (C10). In contrast, increased expression was detected in genes linked with nuclear ribogenesis (C3) and metabolic activities (C6 and C12). When compared with the predicted pathway impact, C3 has 19 of the 23 DEGs associated with the ribosome biogenesis pathway ( When coupled with the alterations observed in gene sets associated with the mitochondria, our data imply that multiple biological pathways are being regulated to compensate for the stress induced by pCA exposure. Such regulation might be associated with processes used by SA-1 to survive under adverse conditions. To further explore the characterized clusters, we also evaluated each of the genes found within the clusters to establish which of them acted as "network hubs" in their respective clusters, based on their values for eigenvector centrality, betweenness, degree, and closeness. In total, 25 genes (Fig. 4, Table S8) that act as the main points of interaction for the genes in the network were identified.
Prediction of genomic short-variants based on RNA-seq. Using data collected from high-throughput RNA sequencing, we reconstructed short variants (SNPs and INDELs) that occur within the transcripts and predicted the impact that they might have on their associated coding sequence 74 . A total of 38,420 short variations were identified (compared to the R64-1-1 reference annotation) that can be subdivided into three major categories: homozygous (both alleles carry the variant), heterozygous with reference (one allele carries the variant, while the other is equal to the reference) and heterozygous without reference (both alleles carry different variants, and neither is equal to the reference). These short variants were then classified according to their predicted impact on their associated coding sequence (Fig. 5A): 683 modifier variants (non-coding, e.g., intronic or UTR variants); 23,388 low impact variants (e.g., synonymous mutations); 13,655 moderate impact variants (e.g., missense mutations that preserve overall protein length/structure) and 1093 high-impact variants  (Table S9). This occurs because a variant can impact multiple genes, and current limitations make it difficult to solve such conflicts using HTS data alone 74,75 .
The major classes of high-impact variants involved frameshift variants, and most of them (~ 38%) were heterozygous (concerning the reference) in nature. Homozygous frameshift variants also comprised the second biggest class, with approximately 25% of variants falling in that class, and non-reference heterozygous frameshifts comprising ~ 4% of high-impact variants. Heterozygotic variants located in splicing sites were also a major class of high-impact predictions, with a combined total of ~ 17%. Lastly, stop-gaining mutations represented ~ 8% of all high-impact variants, with approximately half (3.75%) being homozygotic in nature and the remaining (4.68%) being heterozygous.
Then only the variants that had predicted moderate and high protein impacts were filtered for further exploration. Our analysis showed that the major class of moderate impact alterations was the class of missense variants, with ~ 67% of them being homozygous in nature and ~ 29% being heterozygous with the reference (Fig. 5B). However, when comparing the same results for high-impact variants, a much broader distribution of CDS consequences (χ 2 (3, N = 15,166) = 7096.5, p < 0.001, Fig. 5C) was observed.
In addition to gene expression changes, mutations are also a major player in the process of generating resistant strains for biofuel production, be them artificially generated or naturally selected [76][77][78] . These structural alterations can promote changes in the function of genes and proteins in multiple ways 79 , with even small mutations possibly having far-reaching effects 80,81 . By characterizing the profile of single nucleotide variants when compared to the S288C reference strain (which has extensive functional gene annotation data), it was possible to predict the impacts of mutations found in SA-1's genes 82 . Although we recognize that a comprehensive analysis based on comparative genomics of industrial yeasts, which is outside the scope of the present article, would be ideal to characterize the genomic complexity of the SA-1 industrial strain, it's expect that, the extraction of the information on the genetic diversity encountered in SA-1 will provide an important layer of information on the mechanisms associated to pCA response on this strain.

Assembly of a multi-omics network model for pCA response.
To generate a single model that represents the association between differentially expressed genes located in perturbed pathways, the association with fermentation/ethanol and ROS/Redox, the presence of high-impact short variants, and phenotype impact prediction, all the information described in the previous sections was converted into a multi-omics graph-based network model. Integrative approaches, such as this, are especially relevant to "Big Data" datasets, such as ours, to extract comprehensive models that capture subtleties involved in biological regulation that otherwise would be lost if each "omic" was only analyzed independently 83 . This type of analysis has already been successfully used in multiple fields, from biomedical research 84,85 to biotechnology 86,87 . Furthermore, it has been demonstrated that in-silico modeling of complex biological networks can be a powerful tool for driving the improvement of commercially relevant organisms, such as the development of an improved model of Aspergillus nidulans metabolic network models, which is important for the construction and optimization of glucoamylase-producing strains [88][89][90] .
When applied specifically to S. cerevisiae, this strategy has also been proven to be crucial in unraveling novel molecular mechanisms associated with gene regulation 91 , stress tolerance 92 and selection of targets for bioengineering 93 . From the complete network, all the edges in which at least one vertex was either a hub gene (as shown in Fig. 4) or a gene associated with fermentation/ethanol or ROS/Redox (as shown in Fig. 2C) were extracted. A total of 16 genes (Fig. 6, Table 2) were selected based on the aforementioned criteria for constructing the model, while these targets were clustered into two major groups: those associated with ethanol production (IDP2, ERG10, CYT1, ARO10, GCV1, TDA10, and CHA1) and those related to biomass yield (SOD1, CTA1, www.nature.com/scientificreports/ IRC7, SPO14, UTP13, CYB2, MET3, ADK1, and ADE13). In order to facilitate the discussion, we will be focusing on the hub genes mentioned previously, since they are the most representative targets for their respective coexpressed clusters. However, we strongly encourage the exploration of our entire network model made available in the supplementary files (Table S10) by the readers. This multi-level network also showed the type of interaction between genes and their targeted phenotype and associated pathway, which can be either a direct relationship between the changes in gene expression and the pathway/phenotype alteration (e.g. both upregulated) or an inverse relationship (where one is upregulated and the other is downregulated). In total, 19 positive interactions (where the gene foldChange occurs in the same direction as the change in pathway activity or metabolite measurement) and 11 negative interactions (where the gene foldChange occurs in the opposite direction as the change in pathway activity or metabolite measurement) were found. The proposed network showed 6 genes with predicted mutations (TDA10, CHA1, SPO14, IRC7, ADK1 and ADE13), 2 genes associated with ROS/Redox processes (CTA1 and SOD1) and 1 gene directly related to fermentation/ethanol (ARO10).
Genes identified as anchor nodes in the multi-omics model were annotated using keywords assigned by the UniProt database that reflect their functional and structural characteristics. The additional columns also show their predicted phenotypic association (Phenotype), any associated pathways (Pathway) and if the gene contains an SNV site (SNV). By anchoring our network on genes and using their associations (positive, negative or neutral) to pathways, phenotypes and genomic variants, it was possible to identify two main groups of targets: those associated with www.nature.com/scientificreports/ ethanol production and those associated with biomass yield. Each of these groups has unique features, particularly in terms of their target pathways, which will be explored separately. When analyzing the first group of genes (ethanol related) of the integrated response model, the TDA10 gene, an ATP-binding protein with unknown function that resembles E. coli kinases 94,95 , was downregulated (log2FoldChange − 0.72) and had an inverse relation with glycine, serine, and threonine metabolism and ethanol production. In addition, the TDA10 gene was also the target of a homozygous missense variant in position 343 of the CDS, which changes the corresponding amino acid from phenylalanine to leucine, causing structural changes to the overall protein. A negative correlation with ethanol production for the ERG10 and IDP2 genes was observed. The former (ERG10) may act in the oxidative stress response 95 and its deletion was associated with slower doubling times and susceptibility to high NaCl concentrations 96 , being a major target for genetic engineering approaches 29,97,98 . The latter (IDP2) is an isocitrate dehydrogenase that was downregulated in our dataset (log2FC − 3.57) and has been linked to small reductions in yeast lifespan 99 -this gene was also downregulated in mutants susceptible to thermosensitive autolysis and associated with mitochondrial dysfunction 100 .
It was also identified 3 genes that had a positive correlation with ethanol production: ARO10, GCV1, and CHA1 (Fig. 2C). Besides its regulatory role in fermentation 101 , ARO10 acts in the detoxification of damaged amino acids and resistance to lignocellulosic compounds, such as HMF and furfural 102 . This gene was upregulated (log2FoldChange 1.09) upon exposure to 7 mM of pCA, with a positive correlation to ethanol production. The GCV1 gene, upregulated in our dataset (log2FC 0.82), encodes the T subunit of the mitochondrial glycine decarboxylase system and increases in expression under multiple types of stress responses in S. cerevisiae [103][104][105][106] ; however, the exact role of GCV1 in these scenarios is still not fully understood. Lastly, the CHA1 was slightly upregulated in our dataset (log2FC 0.51) and had a positive correlation with the biosynthesis of amino acids and metabolism of glycine, serine, and threonine. This gene catalyzes the degradation of L-serine and L-threonine to use them as nitrogen sources and is upregulated in the response to ethanol stress 107 and congo red 108 .
Both CYB2 and CYT1 are mitochondrial genes that are regulated during changes in the anaerobic metabolic processes and fermentation of glucose [109][110][111] . In our dataset, both genes showed a negative correlation with metabolic pathways (that is, they were downregulated while the pathway was activated). Moreover, they appear to have inverse relations with phenotypic changes: CYB2 has a positive (direct) relation with biomass yield, and CYT1 has a negative (inverse) relation with the ethanol output.
In the second group of genes (biomass related) a total of four genes (SOD1, CTA1, CYB2, and SPO14) with a positive influence on the biomass yield and five with a negative correlation (IRC7, UTP13, MET3, ADK1 and ADE13) were identified. One of the most interesting targets in this group is SOD1, a downregulated gene (log2FC − 1.34): it encodes a Cu-Zn superoxide dismutase that has the main role of catalyzing the breakdown of toxic superoxides in the cell 112 and is also involved in signaling processes involving oxygen and glucose stimuli 113 . However, recent studies suggested that the main biological role of these proteins in yeasts is the peroxide signaling and activation of peroxisomes and multiple cell homeostasis pathways 114 . This is in accordance with our findings www.nature.com/scientificreports/ for both gene expression, pathway impact and co-expressed gene clusters enrichment. The other gene associated with response to reactive oxygen species was CTA1, a downregulated gene (log2FC − 2.84) in our dataset. This gene encodes a catalase associated with ROS detoxification in peroxisome and in the mitochondria 115 , and its activity is relevant in oxidative 116 , acetic acid 117 and heat 118 stress responses. These genes showed positive associations with the activity of the peroxisomal pathway, which is one of the most affected by the stress induced by pCA in SA-1 strain. While SOD1 did not appear as a hub in the selected gene clusters, it did act as a major interactor for cluster 8, which is associated with regulatory and cell homeostasis pathways (Table S5). However, CTA1 is a major interaction hub for C4, being enriched for genes related to the metabolism of organic substances in the cytoplasm. Additionally, the SPO14 (log2FC − 0.82) gene was associated with changes in the cell cycle regulation (Honigberg et al. 1992) and regulation of lipid metabolism 119,120 in S. cerevisiae. In addition to its role in the cell cycle, SPO14 was also a hub gene for cluster 10, which is enriched in genes for the lipid metabolic process. As for the genes that had negative correlations with the biomass yield, the IRC7 gene seems to be a target with multiple associated conditions. Besides its transcriptional behavior (log2FC 0.60) in SA-1 strain, when exposed to pCA, this gene is involved in the production of thiol compounds 121 and yeast survivability using cysteine as nitrogen source 122 . This gene was the most affected by our analysis of variants, accumulating a large amount of moderate-to-high-impact impact variants in heterozygosity, but none in homozygosity, suggesting that one of the alleles might be severely impaired. This corroborates an analysis of wine fermenting yeasts, which showed that several S. cerevisiae strains carried inactivating mutations for one or both alleles of IRC7 123 , reducing the overall enzymatic activity of this protein. Another study showed that the over-expression of IRC7 also resulted in the increased production of hydrogen sulfide 122 , a volatile sulfur compound that has been linked to increased longevity in S. cerevisiae 124 . Lastly, 3 genes that showed a positive correlation with purine metabolism: MET3 (log2FC 0.73), ADK1 (log2FC 0.50) and ADE13 (log2FC 0.82) were also identified. Upregulated in our dataset (log2FoldChange 0.73), MET3 is an ATP sulfurylase involved in sulfate and methionine metabolism 125 , which was upregulated during hypoxia 48 . Moreover, the over-expression of ADE13 may increase fermentation efficiency under acetic acid stress 126 , while ADK1 appears to be activated in response to sulphuric acid 127 and to heat stress 128 . These three genes were also associated with clusters enriched in genes linked to the regulation of metabolic processes.

Conclusion
Our results suggest that p-coumaric acid (pCA) stress may induce higher cellular activity in SA-1 strain under anaerobic conditions, with increased glucose uptake rate, and CO 2 and ethanol production rates, being the major indicators obtained from quantitative physiological data. In accordance, it was also observed a decrease in biomass yield and overall dry-weight cell biomass, which implicates the existence of some type of disturbance in the cell homeostasis. It was demonstrated that pCA stress can cause an overall activation of metabolic and biosynthesis pathways, which are also followed by increased rRNA biogenesis. Downregulation of several mitochondrial and peroxisomal-associated pathways may also be an indicator of cellular damage caused by the exposure to pCA; our data suggest that SA-1 strain has yet-to-be-explored molecular mechanisms that allow them to circumvent triggers that lead to programmed cell death. At the gene level, multiple genes that could be novel and/or interesting targets for bioengineering were identified. Our results highlight the importance of an integrated approach for target identification and association with phenotypes of interest for industrial applications. By using network-enhanced gene cluster detection, genes that could be the most influential in their biological vicinity were identified. These "hub genes'' are prime targets for genetic engineering approaches, as they are the ones with the highest impact on their sphere of influence and are most-likely to produce deep alterations in the associated biological process within the gene community. Although exploratory in nature, the data presented in this study contributes to understanding the characteristics of pCA-induced stress in S. cerevisiae and deepening the knowledge of mechanisms used by industrial yeast strains that can thrive under high-stress conditions, such as exposure to lignocellulosic inhibitors. Taken together, our results show that the biological mechanisms used by S. cerevisiae SA-1 to survive under the influence of lignocellulosic inhibitors are much more intricate than previously understood. Multiple biological pathways, which sometimes have opposite effects when analyzed individually, are intertwined in a complex balance that allows these yeasts to thrive even when exposed to high levels of stress. Systemic analysis is essential to understand the nuances involved in such interactions, with several information sources and analyses being integrated into a single model that can reflect multiple levels of biological data. This is especially relevant for researches in economically-driven or similar fields, such as bioethanol production and other industrial capacities, where the ability to select targets for bioengineering approaches that maximize the desired effect (e.g. improving ethanol production) while minimizing undesired side-effects (e.g. affecting unrelated pathways and/or other phenotypes) can be of paramount importance to gain a competitive edge. By using our network model as a frame of reference to develop strains that are more robust to the effects of inhibitory compounds, such as pCA, we hope to drive innovation towards a more robust yeast strain that is capable of improved efficiency under the strenuous conditions imposed by industrial fermentation vats.

Methods
Yeast strain and cultivation conditions. The strain investigated in this study, S. cerevisiae SA-1, is an industrial strain obtained from Fermentec (Piracicaba, Brazil). Inoculum cultures were prepared from glycerol stocks stored at − 80 °C on a defined medium 129,130 , whose composition (in g L −1 ) is described as follows: (NH 4 ) 2 SO 4 , 5.0; KH 2 PO 4 , 3.0; MgSO 4 .7H 2 O, 0.5; 1 mL L −1 trace element solution, 1 mL L −1 vitamin solution, and 25 g L −1 glucose. Cultures were grown overnight at 30 °C in a rotary shaker at 200 rpm. Chemostat cultivation with pCA was performed in a 2.0 L water jacket model Labfors 5 (Infors AG, Switzerland) with 1.0 L working volume, which was kept constant by a mechanical drain and a peristaltic pump. Throughout cultivation, both the culture vessel (0.5 L min −1 ) and the medium vessel (flow rate not measured) were purged with nitrogen gas to maintain anaerobic conditions. The agitation frequency was set at 800 rpm, the temperature was controlled at 30 °C, and the pH was adjusted to 5.0 using a controlled 2 M KOH solution. Pre-cultures for batch bioreactor cultivations were grown overnight in an orbital shaker at 30 °C and 200 rpm in 500 mL shake flasks containing 100 mL of the defined medium with 20 g L −1 starting glucose. The medium had the same composition as the preculture, except that Tween 80 and ergosterol were added at a final concentration of 0.01 g L −1 and 0.42 g L −1 , respectively, to allow anaerobic growth. The batch phase was terminated after glucose depletion (monitored by a sharp drop in CO 2 concentration in the exhaust gas), whereupon cultivation switched to continuous mode with the addition of fresh medium supplemented or not supplemented with 7 mM pCA. The dilution rate was set at 0.1 h −1 and the cultivation was assumed to be in a steady state when the dry weight of the culture and the specific carbon dioxide production rate varied by less than 2% for two volume changes during at least five residence times 131 .
The chemostat system was chosen due to its characteristics of maintaining physiological conditions in constant values among experiments, which is important when trying to isolate transcriptomic alterations that arise in response to a singular input (in our case, the presence of pCA), eliminating the effects of growth rates and other stochastic perturbations which may arise due to environmental conditions 132 .
Analysis of extracellular metabolites. Cell dry mass concentration was determined by the gravimetric method 133 . Extracellular metabolite samples from the chemostat cultures were filtered through 0.2 µm syringe filters. Concentrations of residual carbon, ethanol, glycerol, and organic acids were quantified by high-performance liquid chromatography (HPLC) 134 , using a Prominence HPLC model (Shimadzu Corporation, Japan) and an HPX-87H analytical column (Bio-Rad Laboratories, USA) at 60 °C with 5 mM H 2 SO 4 as mobile phase at 0.6 mL min −1 . Ethanol concentrations were corrected for evaporation 135 and pCA was quantified 136 by using an HPLC and a C18 analytical column (Supelco Inc. model Waters Spherisorb ODS-25 µm, 250 mm × 4.6 mm) at 30 °C with 2% (v/v) acetic acid in ionized water (eluent A) and acetic acid 0.5% in ionized water and acetonitrile (50:50, v/v; eluent B) as mobile phase at 1.0 mL min −1 using a gradient program: from 10 to 15% B (10 min), 15% B isocratic (3 min), 15 to 25% B (7 min), 25  www.nature.com/scientificreports/ (5 min), from 100 to 10% B (0.1 min). The total run time was 60 min, with a flow rate of 1.0 mL min −1 and an oven temperature of 30 °C. The injection volume for all samples was 10 μL. Monitoring was performed using a Shimadzu UV detector at wavelengths of 280 nm and 320 nm. Concentrations of compounds were calculated from calibration curves obtained from standard solutions 131 .
RNA extraction and sequencing. RNA extraction was performed using the Direct-zol™ RNA MiniPrep kit (Zymo Research catalog no. R2051) following the manufacturer's instructions. RNA samples were sequenced using BGISEQ-500, with each library generating approximately 24 M paired-end reads of 100 bp. Raw RNAseq reads were filtered to remove adapter contamination and low-quality reads, with Trimmomatic 137 . Each sample was aligned against the R64-1-1 version of the S. cerevisiae reference genome, which is based on the S288C strain, with STAR v2.7.0 138 , using "-sjdbGTFfile, " "-quantMode GeneCounts", "-twopassMode Basic" and the ENCODE guidelines for best practices of eukaryotic RNASeq 139 as additional parameters. The corresponding gene annotation files and variant call files were also obtained for the same assembly. All genome data was obtained from Ensembl Fungi release 48 140 .
Gene expression analysis and functional characterization. Differential

Co-expressed gene cluster detection and hub gene identification. Co-expressed gene clusters
were identified using an adaptation of the kNN-enhance method, which intensifies an existing network with node attributes 145 . The total protein-protein interaction (ppi) network from STRINGdb (v11) was converted into an undirected graph, where each node is a protein and the edges represent known interactions between them-only interactions with a total ppi_score ≥ 0.7 (high confidence) were considered for downstream analysis. Each node metadata information was enhanced with an extra attribute corresponding to the log2(foldChange) value of that protein, and for each pair of vertices connected by an edge, a second score was used (called fold-Change_score). This metric was calculated by 1 − norm X i − X j 2 where X i is the foldChange for vertex 1, and X j is the foldChange for vertex 2, and norm X i − X j 2 is the normalized Euclidean distance between X i and X j . Thus, values for foldChange_score varied from 1 (identical foldChange scores) to 0 (the largest fold-Change difference between two nodes in the network). Final edge weight scores were calculated by combining foldChange_score and ppi_score in a new "enhanced_ score. " Co-expressed gene clusters were identified using MCL clustering 146,147 , applied to the attribute-enhanced network, with inflation hyper-parameter tuned to maximize modularity score (Q). For each of the identified clusters, it was also extracted genes that could serve as "local hubs" based on four different metrics: degree, betweenness, eigenvector and closeness.
BNFinder 148 was used For the association between genes and phenotypical data, combining per-sample normalized gene expression (from RNASeq) with physiological data (derived from HPLC), and converted the observations into classes with the Sturges' rule 149 .
A two-fold strategy was applied to generate cluster functional labels: the first was based on gene ontology enrichment classes, with the most significant enriched class (lowest FDR) that represented at least 50% of genes within the cluster; the second strategy was based on Bayesian inference of association with physiological data, with genes being able to be associated either positively or negatively with the changes in each measured metabolite.
Short variant discovery. Short variants (SNPs and Indels) were identified using the GATK4 pipeline, in accordance with the best practices for RNASeq short variant discovery 74,75 . Aligned RNA/seq reads in BAM format were used as input, as well as GFF and VCF files for R64-1-1 annotation obtained from Ensembl. Short variant impact was estimated using Ensembl Variant Effect Predictor 150 .
Multi-omics network assembly. A graph-based approach was used to integrate all data layers (gene expression, co-expressed cluster hubs, pathway impact and nucleotide variants) into a unified network model using NetworkX 151 . The information for each layer was re-structured and merged with the others in a way that for every pair of vertices υ and ν, the first vertex (υ) represents a gene and the second vertex (ν) represents the characteristic associated to that gene (a pathway, phenotype or mutation). The weight of the edge defined by (υ,ν) was assigned according to the relationship between the expression change of the gene and alteration on the pathway/phenotype: + 1 for direct relationships (the direction of the gene fold change is in the same direction of the altered pathway/phenotype, i.e. both upregulated), − 1 for inverse relationships (the direction of the gene fold change is in the opposed direction of the altered pathway/phenotype, i.e. one upregulated while the other is downregulated) or 0 for neutral relationships (in the case of nucleotide variants). Directionality of the network was established in accordance with the following structure: variant → gene → pathway/phenotype, to reflect the idea that: "a nucleotide variant may affect the gene function, leading downstream alterations".  11-15 (2008).