Transcriptomic profiling of Escherichia coli K-12 in response to a compendium of stressors

Environmental perturbations impact multiple cellular traits, including gene expression. Bacteria respond to these stressful situations through complex gene interaction networks, thereby inducing stress tolerance and survival of cells. In this paper, we study the response mechanisms of E. coli when exposed to different environmental stressors via differential expression and co-expression analysis. Gene co-expression networks were generated and analyzed via Weighted Gene Co-expression Network Analysis (WGCNA). Based on the gene co-expression networks, genes with similar expression profiles were clustered into modules. The modules were analysed for identification of hub genes, enrichment of biological processes and transcription factors. In addition, we also studied the link between transcription factors and their differentially regulated targets to understand the regulatory mechanisms involved. These networks validate known gene interactions and provide new insights into genes mediating transcriptional regulation in specific stress environments, thus allowing for in silico hypothesis generation.


Materials and methods
Bacterial strains and growth conditions. Overnight cultures of E. coli K12 strain MG1655 grown in different environments were diluted 1:1000 and grown at 37 °C and 220 rpm. On reaching the desired OD 600 , growth was stopped by adding Qiagen RNA Protect Bacteria Reagent (cat no. 76506). The antibiotic concentrations used were determined by performing an minimum inhibitory concentration (MIC) experiment for chloramphenicol and trimethoprim, and a concentration that gave twice the doubling time compared to growth in control media was chosen. The growth conditions are described in detail in Table 1. For each growth condition, two biological replicates were grown, except for Rich M9_1, which had four. Different OD600 values account for the differential growth rates of E. coli in different conditions. Library preparation and sequencing. Total RNA was isolated using the Qiagen RNeasy Mini Kit (cat no. 74104) and checked for purity and intactness with a Agilent 2100 Bioanalyzer. The libraries were prepared with ribosomal RNA depletion (Ribo-Zero, NEB) and were sequenced on Illumina HiSeq2500-v4, SR100 mode at the VBCF NGS Unit (https:// www. vbcf. ac. at), resulting in 10.7-13.8 million single-end 100 bp Illumina reads per sample.
For RNA-seq quantification, a processed GFF file was generated by Bacpipe, where all features of interest were listed as a "gene", with each gene identified by MG1655 locus tag. Resulting GFF file contained 4566 features (4240 protein coding, 147 pseudogene, 71 noncoding RNA, 22 rRNA, and 86 tRNA). Following this, read counting was done by featureCounts v1.6.4, using options "-O -M --fraction -t gene -g ID -s 2", since the library was www.nature.com/scientificreports/ sequenced using a dUTP-based strand-specific protocol (Supplementary Table). Overall, 95.0-97.6% of initial reads were assigned to an annotated feature. For visualization, scaled gedGraph files were generated using bedtools genomecov with a scaling coefficient of 109/(number of aligned bases), separately for sense and antisense DNA strands. Bedgraph files were converted to bigWig using bedGraphToBigWig utility (Kent utilities, http:// hgdow nload. soe. ucsc. edu/ admin/ exe/ linux. x86_ 64/). Coverage tracks, annotation, and genome sequence were visualized using JBrowse v1. 16 (Reiner-Benaim, 2007) was used to compute P adj values. An absolute value of log 2 fold change > 2 (i.e., a fourfold difference in either direction) and an P adj < 0.001 was used as the threshold for selecting DEGs.
Stress response network (SRN). Genes involved in E. coli K12 stress response were obtained using GO term GO:0006950 (Response to stress) from Ecocyc 15 . Protein-protein interaction (PPI) networks were constructed for (i) genes in GO:0006950 and (ii) DEGs identified in E. coli K12 grown in carbon and amino acid starvation, low oxygen, presence of antibiotic stress and low pH, using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database version 11.0b using high confidence (cutoff score: 0.7). Each of these networks were exported to Cytoscape 3.8.0 and a union of the two networks was created to be used as the final Stress Response Network (SRN). Densely connected regions in the network representing important pathways were identified using MCODE (Molecular Complex Detection) clustering algorithm 16 . Additionally, the network was analysed to identify crucial stress response proteins. Functional enrichment of Gene Ontology (GO) annotations was performed using Database for Annotation, Visualization, and Integrated Discovery (DAVID) 6.8 and was visualized using MonaGO 17 .

Co-expression network analysis.
Signed weighted gene co-expression networks were constructed for the dataset using the Weighted Co-Expression Network Analysis (WGCNA) package 9 in R version 4.0.2. Firstly, to reduce the effect of noise due to low expression, genes where the sum of counts across all samples was < 10 TPM (Transcripts Per Million) were discarded from further analysis 18 (297 genes were discarded). A variance stabilizing transformation was then applied to the TPM counts using the DESeq2 package (Love et al., 2014). The goodSamplesGenes function was used to ensure the dataset had no missing values. A distance matrix was created for the samples and hierarchical clustering was applied to detect any sample outliers. Hierarchical clustering on the distance matrix of samples did not detect any outliers, hence all samples were retained for the analysis. Network topology analysis was performed using multiple soft-thresholding powers to obtain reliable scale independence and mean connectivity measures. Based on a scale-free topology criterion, an appropriate soft-thresholding power b was chosen using the pickSoftThreshold function. We chose the power for which the scale-free topology fit index (R 2 ) was > 0.80. The Pearson's correlations were raised to a power (b) of 14 to create a weighted adjacency matrix, which was then transformed into a Topological Overlap Matrix (TOM) and the corresponding dissimilarity was calculated to reduce the effects of pseudo associations. The topological overlap for a pair of genes is calculated by comparing their connections with all other genes in the network. Genes sharing the same neighbourhood are said to have a high topological overlap 19 . The TOM matrix was used as an input to create a dendrogram of genes using average linkage hierarchical clustering. Each leaf in the dendrogram represents a gene and the highly interconnected and co-expressed genes are grouped together by the branches. Gene modules were identified by cutting the branches off the dendrogram using the cutreeDynamic function and using a minimum cluster size of 30 genes. Modules with similar expression profiles were merged by clustering the module eigengenes and using the mergeCloseModules function with a height cut of 0.25. Eigengenes of the new merged modules were calculated and each module was identified with a colour, grey colour representing a module of uncorrelated genes. Module eigengenes were correlated to the environmental traits in the dataset to look for the most significant associations. P-values were calculated for the correlations and an FDR correction was applied to compute P adj values. Significant modules were identified using a cut-off criteria of correlation value > 0.7 and P adj ≤ 0.005. Network construction and identifying hubs. Protein-protein interaction (PPI) networks were constructed for the significant modules using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database version 11.0b with default parameters. The interactions from the STRING database were exported to Cytoscape 3.8.0 and the top 10 protein hubs for each module were identified using the Maximal Clique Centrality (MCC) algorithm of the CytoHubba plugin. The MCC algorithm was chosen for hub identification, as it performs better in comparison with the other algorithms of the CytoHubba plugin and can capture both high and low degree essential proteins in a network 20 .

Identification and distribution of transcription factors (TFs) and sigma factors. A collection of
TFs and sigma factors was obtained using the RegulonDB version 10.8 datasets supported by experimental evidence. In order to identify key genes induced by stress, we analysed the DEGs in each of the stress environments in our study to look for TFs and their regulated gene targets.
An enrichment analysis was also carried out using the hypergeometric test in R version 4.0.2 to identify modules highly enriched in TFs and sigma factors and comprehend their association with the co-expression modules. www.nature.com/scientificreports/ Differentially co-expressed genes (DCGs). The WGCNA function in the DCGL package in R version 4.0.2 was used to identify DCGs for relevant pairwise comparisons of different environments. Alternatively, overlapping genes between the co-expressed genes in the hub modules and the respective DEG set were also identified. Briefly, for environmental traits with more than one significant module, co-expressed genes were pooled into a single list and the genes overlapping with the DEGs were identified. They are referred to as differential co-expression overlap gene set (DCOGs) in this paper. TFs from DCOGs and the gene targets regulated by them were identified and the targets showing differential regulation were extracted for further study.
Functional enrichment analysis. Gene Ontology Biological Processes (GO-BP) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) Pathway enrichment analysis were carried out using the ClusterProfiler package in R 21 to determine the processes and pathways regulated by the co-expression modules, transcription factors and their DE targets. Enriched terms were identified using a cut-off criteria of P adj ≤ 0.01.

Results
Identification of differentially expressed genes (DEGs). Transcriptomic changes were studied by identifying DEGs showing significant fourfold difference in gene expression in either direction. To understand the effect of environmental stress on gene regulation, we compared antimicrobial (Fig. 1a,c,d), acidic pH ( Fig. 1a,f), and low oxygen environments (Fig. 1a,e) to the Rich M9 environment. The effect of nutrient limitation was analysed by comparing gene expression data in minimal environments (Rich M9 and poor M9) to a nutrient rich growth media (LB) (Fig. 1b). In comparison with the nutritionally rich growth condition, growth in poor M9 showed the highest number of DEGs, a total of 746 genes were identified, of which 558 genes were significantly up-regulated and 188 genes were significantly down-regulated. In comparison with Rich M9, growth in antimicrobial containing environments had the highest number of DEGs. SRN network. The SRN generated consisted of 1431 nodes and 12,076 edges. Topological analysis of the network using the NetworkAnalyzer plugin in Cytoscape 3.8.0 showed that the highest degree was 157 and the average was 18.184. Degree of a node represents the number of nodes connected to it. Nodes with a higher degree are likely to be considered as hubs or central proteins in the network 22 . Using degree of a node as a measure of centrality, we identified 33 central proteins. We hypothesize that these proteins are crucial in responses to a variety of stressors and might also be involved in mediating stress-induced cross-protection. The list of proteins central to stress response and their literature annotations can be found as Supplementary Table S1 online. Out of the 33 central proteins identified, 24 were found to have known roles in stress response.
Five sub-networks using the DEGs were generated for each of the stress environments-Poor M9, CAM, TMP, LOX, and pH5. We observed that a majority (> 70-80%) of these central proteins were present in each of the stressor specific sub-networks, except for the LOX sub-network which had 40% of the central proteins present. The genes in top five densely connected clusters in the SRN identified by MCODE belonged to Flagellar assembly, Energy metabolism, SOS response and DNA Repair, RNA binding proteins, and Biosynthesis of amino acids and secondary metabolites. Genes involved in each of the clusters are outlined in Supplementary  Table S2 online. Functional enrichment of SRN identified response to heat and oxidative stress, DNA repair, SOS response, TCA cycle, anaerobic respiration, nitrate assimilation, flagellum based cell motility and cellular response to DNA damage as significantly enriched (P adj < 0.01) (Fig. 2).
Co-expression networks. Signed networks were constructed to identify genes that are co-expressed in the tested growth conditions as they take into consideration the sign of the correlation coefficients and can identify modules that are significantly positively or negatively correlated with the categorical variables or experimental conditions.
There are trade-offs for maximising R 2 and retaining the number of mean connections, hence a power (β) of 14 was chosen.The network analysis identified a total of 20 co-expression modules and were assigned a colour each indicated at the bottom of the dendrogram.
On introducing environmental traits in the network, significant associations (modules) were identified using a threshold of correlation value (between module eigengenes (ME) and traits) > 0.7 and P adj ≤ 0.005. Of the 20, 7 modules were identified with significant associations (Fig. 2). No modules were found to be significantly associated with Rich M9, Rich M9 supplemented with 1.2 µg/ml chloramphenicol (CAM) and Rich M9 acidic condition (pH5). Summary about the modules is given in Table 2. Detailed information about the genes in the modules can be found in Supplementary Table S3 online.
Functional enrichment analysis was carried out to understand the relationship between the modular biological functions and the experimental conditions. Analysis of E. coli K-12 grown in nutritionally rich Lennox broth identified two modules, green and darkorange (Fig. 3) that are positively and negatively correlated to the environment, respectively. The genes in the green module are significantly enriched in biological processes like carbohydrate metabolism and transport (Fig. 4a), whereas the darkorange module is involved in vitamin, amino acid, and nucleotide metabolic and biosynthetic processes (Fig. 4b). Three modules, blue, darkred and purple (Fig. 3) were found to be significantly associated with the minimal growth media-Poor M9. Of these, the darkred module (Fig. 4a) was significantly negatively correlated with the growth environment. Some of the GO-BP terms associated with the darkred module include ribosome biogenesis, RNA metabolic processes, post-transcriptional regulation of gene expression, translation. The blue module was largely involved in organic substance catabolic processes and alpha-amino acid biosynthesis and metabolism (Fig. 4b). The purple module was mainly linked with small molecule catabolic process and carbohydrate transport (Fig. 4c). Growth of E. coli K-12 in Rich M9 medium supplemented with trimethoprim 0.3 µg/mL was found to be significantly associated with genes in the www.nature.com/scientificreports/ orange module (Fig. 3). The orange module is associated with processes like SOS response, cellular response to DNA damage, stress, and external stimulus (Fig. 4c). Genes in the saddle brown module were positively correlated with the low oxygen (LOX) environment (Fig. 3) and were majorly involved in cofactor metabolic processes, iron-sulfur cluster assembly and protein maturation (Fig. 4c). In the KEGG enrichment analysis, 27 pathways were significantly enriched among five of these modules, some of which include microbial metabolism in diverse environments, quorum sensing and phosphotransferase system (PTS) (Fig. 4d). No significantly enriched KEGG terms were found to be associated with the orange and saddle brown module. www.nature.com/scientificreports/

Protein-protein interaction (PPI) networks and screening hub genes. PPI network interactions
for the hub modules were obtained from the STRING tool and exported to Cytoscape for network visualization (Fig. 5). The MCC algorithm of the CytoHubba plugin in Cytoscape 20 was used to identify the top 10 hub genes in each of the modules. Module hubs and their respective functions can be found as Supplementary Table S4 online.

Distribution of TFs and sigma factors. Transcription factors rewire bacterial gene expression based on
environmental signals allowing the bacterium to rapidly survive stress 23,24 . We found a total of 32 TFs to be differentially expressed across Poor M9 (individually compared to LB and Rich M9), CAM, TMP and pH5. The frequently occurring TF families found were AraC/Xyls, LuxR/UhpA and LysR. A comprehensive list for each stress environment detailing the TFs identified, their family classification, target genes and their differential expression status in that environment can be found in Supplementary Tables S5-S9 online. Figure 6a shows differential expression of the 32 TFs across Poor M9, CAM, TMP and pH5. Additionally, the distribution of experimentally validated TFs and sigma factors in the modules was identified. Based on the enrichment analysis using the hypergeometric distribution, two modules, green (p-value = 0.001) and purple (p-value = 0.02) were found to be significantly enriched in TFs. To identify the pivotal pathways regulated by TFs during stress, a KEGG enrichment analysis was carried out for the differentially regulated TF -gene targets only in the purple module (significantly associated with carbon and nitrogen starvation environment).  www.nature.com/scientificreports/ The KEGG analysis identified quorum sensing to be significantly enriched (P adj < 0.01). Out of the seven modules, only two modules (blue and darkred-significantly associated with Poor M9 environment) were found to be significantly enriched (P adj < 0.05) in sigma factors. Figure 6b shows the distribution of sigma factors in the blue and darkred modules. The entire list of sigma factor regulated genes in the two modules can be found as Supplementary Table S10 online.
Association between differentially expressed and co-expressed genes. The WGCNA function in the DCGL package did not identify any differentially co-expressed genes in the tested pairwise comparisons.
To understand the link between differential expression and co-expression, a list of all genes that were significantly co-expressed in an environment were compared to the list of genes differentially expressed in the same environment. This analysis was carried out only for the carbon and amino acid starvation condition (Poor M9), low oxygen (LOX), and antimicrobial stress (TMP), as significant co-expression was observed only in these environments (Fig. 3). DCOGs or the genes commonly identified between differential expression and co-expression were extracted for each of the stress environments-Poor M9, LOX and TMP. The list of DCOGs was further analysed to look for the presence of TFs and its respective differentially regulated targets. A KEGG pathway analysis was carried out for the identified TFs and its differentially regulated targets, in an effort to understand the regulation of transcriptional network by bacteria when encountering stress. No TFs were identified in the DCOGs for the low oxygen (LOX) environment. Six TFs were identified in the DCOGs for Poor M9 and the flagellar assembly pathway was found to be significantly enriched (P adj < 0.01) in KEGG analysis. A single TF RcsAB was identified in the DCOGs for environment containing the antibiotic trimethoprim (TMP) and the KEGG analysis revealed the biofilm formation pathway to be significantly associated (P adj < 0.01).

Discussion
In this study we use RNA-Seq and WGCNA to understand the functional modules and co-expression networks involved in the stress response of E. coli K12.  www.nature.com/scientificreports/ The Poor M9 medium is a representative of carbon limitation and amino acid starvation conditions. Analysis of TFs showed that genes involved in response to sulfate starvation, curli assembly and transport, biofilm formation, maintenance of pH homeostasis, arabinose catabolism, glycolate utilization were induced in response to starvation, indicative of their role in response to starvation. Based on co-expression analysis, nutrient depletion and amino acid stress activates a stringent response pathway in E. coli. The pathway is mediated by genes relA and spoT through the synthesis of the alarmone (p)ppGpp 25 . (p)ppGpp serves a global regulator for gene expression and redirects gene transcription from genes required for growth to genes required for survival during starvation 4 . The modules blue, darkred and purple were identified as being significantly associated with the Poor M9 medium (Fig. 3). The darkred module (Fig. 4a) was significantly negatively associated with the Poor M9 medium and was involved in translation, ribosome biogenesis, RNA metabolic processes and post-transcriptional regulation of gene expression, which is indicative of the induction of the stringent response pathway 26 . Although, the genes involved in mediating this pathway, relA and spoT, were not found to be differentially expressed in the Poor M9 medium in our dataset, previous studies have reported a decrease in basal levels of (p)ppGpp with high levels of SpoT 27,28 . Also, the "hopping model" for RelA mediated (p)ppGpp synthesis suggested that during a stringent response, RelA may hop between ribosomes, thereby allowing low enzyme concentrations to produce sufficient levels of (p)ppGpp 29 . However, this model has not been supported by other studies 30,31 and the mechanistic details of (p)ppGpp synthesis via RelA remains unclear. Based on this, it can be inferred that the darkred module is involved in regulatory interactions leading to the activation of the stringent response pathway under starvation conditions. On the other hand, the blue (Fig. 4b) and purple (Fig. 4c) modules, found to be positively correlated with Poor M9, are mostly involved in amino acid biosynthesis and import, and catabolism. This can be attributed to the stringent response as well as the RpoS dependent stress response mechanisms. RpoS is known to be a master regulator of stress response, required for adapting to growth in conditions with glycerol as the sole carbon source by inducing the carbon scavenging mechanisms in the cell 32 . Out of the 558 genes significantly upregulated in the Poor M9 medium, 105 were found to be RpoS-dependent. Also, an increase in the (p)ppGpp level due to the activation of the stringent response pathway favours the transcription of s 38 (RpoS) dependent www.nature.com/scientificreports/ promoters 33 . Several studies have shown that a rise in the alarmone ((p)ppGpp) levels, in co-ordination with the nutrient responsive transcription factor DksA, activate transcription of genes involved in de novo amino acid biosynthesis and import [34][35][36][37] . The RpoS dependent stress response also activates the transcription of poxB and acs (differentially upregulated in Poor M9) involved in carbon metabolism converting pyruvate to acetyl coenzyme A as well as other genes involved in catabolic reactions 33,38 . This explains the role of the co-expressed genes identified in the three modules significantly associated with Poor M9. Furthermore, our analysis also shows how the (p)ppGpp mediated stringent response activated by coexpressed genes in the darkred module feeds into the induction of coexpressed genes in blue and purple modules, thereby activating the RpoS dependent stress response mechanisms. This explores a potential link between the three different coexpressed gene modules in an environment.
Based on the hypergeometric distribution, the purple module was significantly enriched in transcription factors. A KEGG analysis of the DEG regulated by the TFs revealed the quorum sensing pathway to be significantly enriched and the genes mediating the quorum sensing (QS) pathway belong to lsr operon. Although, a previous study showed that the presence of glycerol and glycerol-3-phosphate (G3P) repress the lsr operon 39 , the lsr operon genes are found to be significantly upregulated in our study, making it an interesting experimental candidate for further exploration of the role of lsr operon in nutrient limiting conditions. Also, it would be interesting to see if there is a link between the stringent response and the lsr mediated QS circuit in E. coli K-12. Links between the stringent response and QS regulation have been demonstrated in the enterohemorrhagic E. coli O157:H7 EDL933 strain 40 . In addition, the blue and darkred modules were found to be significantly enriched for sigma factors. Majority of the genes in these modules were under the control of σ 70 (regulation of housekeeping genes), followed by σ 54 (regulation of genes involved in nitrogen metabolism), σ 38 (regulation of stationary phase genes), σ 32 (heat shock regulating genes), σ 24 (extreme heat shock genes) and a small number of genes under the control of σ 28 (regulation of flagellar proteins) 41 . The data confirms that a starvation response offers cross-protection to high temperatures in E. coli, and is also supported by the GO enrichment analysis of SRN (Fig. 2).
Analysing the transcriptional regulation using the association between differential expression and co-expression indicated that the Poor M9 medium was enriched in flagellar assembly. Genes involved in the pathway were under the control of the CsgD transcription activator/master biofilm regulator, that is known to repress the flagellar assembly genes 42 . This is in line with the upregulation of the CsgD (Curli subunit gene D) cascade (genes involved in curli assembly and transport) and downregulation of the flagellar genes observed in our study. E. coli can switch between a sessile lifestyle regulated by CsgD and motile planktonic growth regulated by the flagellar cascade depending on environmental signals. Studies have reported a "foraging behaviour" in E. coli in response to poor carbon sources (e.g., glycerol, glycine, and succinate), in which bacteria activate the costly mechanism of flagellar synthesis in order to access better growth conditions [43][44][45] . Six non-coding, small RNAs (sRNA) (OmrA, OmrB, GcvB, RprA, McaS, ArcZ) have been reported to fine-tune the interplay between curli mediated biofilm state and flagellar mediated motility 46,47 . The expression levels of the six sRNAs were analysed and were indicative of curli mediated biofilm formation in Poor M9 in this study. Pathogenic E. coli may undergo rapid biofilm dispersal and revert back to single cell planktonic state 48 , suggesting that cell motility is related to virulence/pathogenicity, whereas biofilm formation is a mechanism of defence against stress 49 .
Trimethoprim (TPM) is an inhibitor of bacterial DNA synthesis by inhibiting dihydrofolate reductase, thereby preventing thymine incorporation 50 . Exposure to antibiotic stress lead to the upregulation of curli genes and genes involved in synthesis of colanic acid capsule (See Supplementary Table S8 online). WGCNA identified the orange module to be significantly associated with the TMP environment. The module is mostly involved in SOS response and cellular response to stimulus via recA expression (Fig. 4c), which confirms the facts known regarding SOS induction caused by trimethoprim. The SOS response is also known to induce filament production on exposure to trimethoprim by transcribing the SOS cell division inhibitor, sulA 50,51 , found to be differentially upregulated in the TMP environment in our study. The rcsA gene encoding the RcsAB transcription factor was identified in the DCOGs for TMP, and its differentially regulated targets (csgDEF) were enriched in biofilm formation. Bacteria are known to form biofilms as part of the SOS response, providing them with protection from antibiotic exposure and other harsh environments 42,52 . The Rcs system is a negative regulator of the csg operons in an Rcs-A dependent manner 53 , however, there is a significant upregulation of genes csgD, csgE and csgF of the csgDEFG operon as well as csgB of the csgBA operon in our study. The regulation of curli expression is a highly complex interaction, with more than ten transcription factors controlling the csgD promoter, each responding to a different aspect in stress related environments 54,55 . It will be interesting to explore if the antibiotic stress signal causes a derepression of the Rcs system to activate curli expression and biofilm formation.
The stress response generated by low oxygen environment was studied by sealing the E. coli cultures with a layer of paraffin oil, thereby limiting gas exchange. An air-saturated medium leads to endogenous production of hydrogen peroxide, initiating an oxidative stress response in E. coli 56 . Hydrogen peroxide signals the activation of the OxyR regulon, leading to the transcription of hydrogen peroxide resistance genes 57,58 , of which katG, ahPF and trxC were among the top 10 upregulated genes (Fig. 1e), grxA was significantly upregulated and ahpC was threefold upregulated in the LOX environment. Hydrogen peroxide destabilises bacterial iron-sulfur clusters and cause the release of molecular iron 59 . This explains the involvement of genes co-expressed in the saddle brown module significantly associated with LOX in the de novo assembly of iron-sulfur clusters via the ics operon (Fig. 4c). Although, hydrogen peroxide can inactivate the ics operon and induce the suf operon to compensate for the iron-sulfur cluster assembly, the effect is reversed with a decline in hydrogen peroxide stress 60 . Since, we did not see a strong expression of the suf operon genes in the LOX environment, combining the differential expression and co-expression analysis, it can be inferred that hydrogen peroxide is detoxified by the resistance genes, allowing for the ics operon to assemble and repair the damaged iron-sulfur clusters.

Conclusions
In this study, we applied WGCNA on RNA-Seq data to identify relevant gene modules and the biological functions involved in E. coli stress response. Linking co-expression, differential expression and transcription factors allows us to find candidate genes that might help explore and further our understanding of the stress response cascade. In addition, further analysis of the module hubs might give useful insights into the regulation of coexpressed genes in a particular environment. Our data can lay the ground work for hypothesis based experimental validation of gene functions potentially involved in E. coli stress response mechanisms.