Attractor Concepts to Evaluate the Transcriptome-wide Dynamics Guiding Anaerobic to Aerobic State Transition in Escherichia coli

For any dynamical system, like living organisms, an attractor state is a set of variables or mechanisms that converge towards a stable system behavior despite a wide variety of initial conditions. Here, using multi-dimensional statistics, we investigate the global gene expression attractor mechanisms shaping anaerobic to aerobic state transition (AAT) of Escherichia coli in a bioreactor at early times. Out of 3,389 RNA-Seq expression changes over time, we identified 100 sharply changing genes that are key for guiding 1700 genes into the AAT attractor basin. Collectively, these genes were named as attractor genes constituting of 6 dynamic clusters. Apart from the expected anaerobic (glycolysis), aerobic (TCA cycle) and fermentation (succinate pathways) processes, sulphur metabolism, ribosome assembly and amino acid transport mechanisms together with 332 uncharacterised genes are also key for AAT. Overall, our work highlights the importance of multi-dimensional statistical analyses for revealing novel processes shaping AAT.

Microorganisms are able to adapt to diverse environmental changes, making them the longest surviving living systems on this planet. The well-studied bacterium Escherichia coli is able to switch between 2 stable attractor states, aerobic and anaerobic conditions, based on oxygen availability 1 . E. coli grows well on both conditions, albeit at a lower rate anaerobically. Over the last few decades, numerous studies have investigated the transition of E. coli between the states from a traditional and reductionist standpoint 2,3 .
The current understanding of E. coli metabolism during aerobic condition is that the glucose flux moves through the glycolysis pathway, and channelled to the TCA cycle via pyruvate dehydrogenase complex. This mode of respiration yields higher ATP levels, thereby, generating more energy compared to anaerobic respiration. In the absence of oxygen, depending on the availabilities of electron donors or acceptors, pyruvate formate-lyase, nowadays called formate C-acetyltransferase, catalyses pyruvate and coenzyme-A into formate and acetyl-CoA, a reversible conversion. In addition, lactate dehydrogenase also acts on pyruvate to produce lactate, and still others include succinate, acetate and ethanol production. These processes, collectively, suppress the metabolic fluxes channelling to the TCA cycle. Thus, major metabolic switching mechanisms occur during aerobiosis or AAT, and numerous recent studies have focused on deciphering other key novel mechanisms that are also concerted. For example, Green and colleagues used microarray transcript profiling to reveal peroxide stress response and methionine biosynthesis as novel processes induced during aerobiosis 4 . Although these works have shed light into the novel processes guiding E. coli AAT, it is important to note that living cells are dynamical systems that involve large interplay of cellular networks.
For understanding complex cellular behaviours such as immune response or growth, numerous studies have employed computational models utilizing linear and non-linear differential equations to monitor intracellular as well as extracellular molecular species, such as proteins or metabolites turnover, over time 5 . In addition, dynamical systems and chaos theories have also been used to study the complex self-organizing (e.g. skin pattern formation) and stochastic or chaotic behaviors (e.g. multiple lineage cell differentiation from single cell origin) of living systems 6 . Here, the models developed are qualitative, rather than quantitative, in defining the dynamical system,

E. coli transcriptome-wide expressions follow lognormal distribution. The original experiments
were performed by Feuer and colleagues 26 . Briefly, E. coli K-12 strain W3110 cells were grown anaerobically in a 3-liter continuously stirred tank bioreactor at pH7 and 37°C, and stirred at 500 rpm with a Rushton turbine. The first sample was drawn (t = 0) when OD of 3 at 600 nm was achieved, and air supply of 1L/min was then initiated. Subsequent samples were taken at t = 0.5, 1, 2, 5 and 10 min, with 3 biological replicates. The resultant RNA-Seq data, in read counts, for all samples were deposited and available with GEO accession number GSE71562.
The RNA-Seq data needed to be checked and reduced for gene expressions that can be considered noisy, especially the lowly expressed ones. Previously, we had used statistical distributions on normalized expressions as a means to remove unreliable or noisy genes 17,23,24,27 . Therefore, Transcripts Per Kilobase Million or Transcripts Per Million (TPM) normalization of the read counts of all samples were performed and checked for statistical distribution of all (4240 non-zero) transcripts for all time points (Figs. 2A and S1A). As previously seen for other cell types, we observed theoretical Pareto (power-law) and lognormal distributions best followed the experimental transcript distributions above a threshold of 5 TPM. Next, to check how close the gene expressions match the Pareto and lognormal distributions, we performed a Quantile-Quantile plot (Figs. 2B and S1B). The data show that lognormal has major advantage over Pareto, noticeable for the higher expressions (TPM > 800). The quality of statistical distribution fit was finally confirmed using the Akaike information criterion (Table S1). Notably, lognormal distribution for gene expressions has also been recently implicated for several other cell types 27,28 . Hence, we concur that our E. coli gene expressions follow lognormal distribution across all replicates and time points (Fig. S1A,B) and retained genes with transcript levels above the 5 TPM threshold for further analysis (N = 3391).
Correlation analysis shows gradual transcriptome-wide deviation from origin. Temporal correlation metrics are often used to check how a dataset deviates from their initial condition, time or perturbation 17,18,22 . Here, to investigate the transcriptome-wide deviation from t = 0, we adopted 4 correlation metrics (Pearson, Spearman, Biweight midcorrelation or bicor 29 , and Mutual Information-based correlation 30 MI c , see Methods). We used these metrics to investigate parametric (Pearson), and non-parametric (Spearman, bicor and MI c ) correlations. All metrics pointed to a gradual decay in the transcriptome-wide correlation with time for all 3 replicates (Fig. 2C,D -left panel and Fig. S2). These data suggest that there is a constant and gradual global transcriptomic "movement" in time 31 .
Looking closer, the Pearson correlation, which measures the linear association between 2 vectors, showed a sharp initial decay (t = 0.5 and 1 min) and a recovery before gradual decay for 2 replicates. This "discrepancy" is absent in the other 3 non-parametric correlations which could be less outlier-sensitive. In addition, the Q-Q plots show that the highly expressed genes deviated from the global statistical distribution (Figs. 2B and S1B). Thus, we removed genes one by one from the highest expression and checked the transcriptome-wide Pearson correlation. Consequently, the removal of the top 2 highest expressed genes (rnpB and lpp) was sufficient to result in smooth correlation decay in all metrics used (Fig. 2C,D -right panel and Fig. S2). Thus, we kept the remaining genes www.nature.com/scientificreports www.nature.com/scientificreports/ (N = 3389) for further analysis. Next, we investigate which portions of genes are key for guiding the anaerobic to aerobic state transition. To do this, we need to consider the concepts of attractors often used in physics and mathematics 18,32 . Defining attractor region and transcriptomic elements. State space for a dynamical system is the set of all possible states, where each state of the system corresponds to a unique point in the state space 13,14,20 . As it is difficult to define explicit expressions for representing transcriptome-wide dynamics, the analysis of state space The landscape is a schematic 3-D projection of N (total number of genes) to a twodimensional state space. In the attractor landscape, many stationary attractors (represented by the local minima), which correspond to the natural cellular phenotypes such as cell fate, might co-exist. Each attractor associates with a unique cellular signature profile (or gene expression profile in this study). The transitioning processes (dashed blue line) guide the cell from one stable attractor to another 57 . (E) Gene expression attractor landscapes generated by the superimposition of probability distribution of Pearson and mutual information correlation mertics to create a 3-D space 18 . Existence of stable attractor coinciding the convergence of cellular trajectories is indicated by the local minima.
as set of all possible pairs of linear and non-linear gene expression correlation dynamics (based on R v and MI v , see below) provides a useful way for understanding the qualitative features of attractor localizations or visualizations 18,22 . Note that the modified mutual information (MI v ) and Pearson correlation (R v ) used here are slightly different from those used in the previous section (Method).
The fractal nature of transcriptomic response in cell fate determination has been explored in previous studies 18,33,34 . Basically, discrete subsets of transcriptome are responsible for guiding the transition of gene expressions from one "equilibrium" attractor state to another. The attractor state is the result of the convergence of gene expression dynamics across the transitioning time. Thus, in order to identify the genetic drivers of AAT, we divided the transcriptome into discrete fractions, namely transcriptomic elements 18 , and compared their  www.nature.com/scientificreports www.nature.com/scientificreports/ individual trajectories against attractor region (Fig. 3A). Any element falling within the attractor boundary is attributed to AAT response.
A transcriptomic element is a minimum set of genes that possess enough correlation information (based on R v and MI v ) to show whether they fall into a cell attractor state 18 . To identify the size of a transcriptomic element, n genes were randomly chosen from the whole genome (n = 25, 50, 100, 200, 400, 600, 800, 1000), and their R v and MI v were evaluated for 100 repeats at all time points. The standard deviation of the R v and MI v were then plotted for the different n sizes and compared with the law of large numbers or LLN 18,35 . Both R v and MI v distributions of the transcriptomic elements converged to specific values for n > 50 (Figs. 3B and S3A). Furthermore, the result shows that the standard deviations of R v and MI v distributions at each time point decrease with increasing n, approximately following the law of large number (Figs. 3C and S3B). To investigate ensemble of genes that fall into the AAT attractor basin we, therefore, chose n = 100 as a compromise between density distribution and to retain a reasonable number of transcriptomic elements since higher n reduces the total number of elements.
To check the existence of cell attractor state in AAT, we analysed the superimposition of probability density (SPD) distribution for modified Pearson correlations (R v ) and modified mutual information (MI v ) of gene expression deviations from t = 0 to 10 minutes 18 (Method). We found 2 distinct peaks (Fig. 3D,E), with the major peak coinciding with the state transition end time of the experiments. This implies the existence of cell attractor by the localizations of R v and MI v on the correlation space. As previously, we defined the attractor region boundary or basin as the inflection plane of the major peak curves 18 , and observed the transcriptome-wide trajectory also fall within this attractor region (Fig. 3E). Note that there are usually several meta-stable attractor states before the final state transition occurs. Here, we are evaluating the early meta-stable attractor states crucial for AAT.

Identification of genes falling into attractor basin.
To identify the number of genes that fall into the attractor basin and their names, we first ranked all genes according to their expression variability, measured by standard deviation (SD) across time 18 . Secondly, we assembled the ranked genes into transcriptomic elements (n = 100) and checked their R v -MI v trajectories in time across the coordinates of the attractor region (Fig. S4A). We observe that one element, with the highest SD, fall above the attractor basin, 4 elements (3, 4, 5 and 12) fall into the basin and the rest fall out and remain below the basin (Fig. S4A).
The Euclidean distance of each element compared with the whole transcriptome trajectory is shown in Fig. S3C. Notably, the elements that fall above and into the basin shows the closest distance (less than the mean values) with whole transcriptome trajectory. Thus, we concur that these 5 transcriptomic elements are key in shaping the AAT response, and checked their combined effect. As anticipated, merging the elements into a larger sub-transcriptome resulted in them falling into the attractor region (Fig. S3C, black dotted). Hence, we named them as attractor genes.
Among the remaining 29 transcriptomic elements, 13 show close Euclidian distance to the whole transcriptome trajectory (Fig. S3D, empty circle symbol), while 16 elements have large deviation (Fig. S3D. empty square symbol). We considered them as pseudo-attractor and non-attractor elements as their collective or combined R v -MI v trajectories fall into and outside the attractor basin, respectively (Fig. S3D, green and blue dotted).
Finally, we combined the attractor (500) and pseudo-attractor (1300) genes and tracked their overall trajectories which resulted in them falling into the attractor basin (Fig. 3F). Since the 1,800 genes collectively enter into the attractor basin, we now re-term them as the attractor genes, while keeping the remaining as the non-attractor genes (Fig. 3F). In other words, our attractor analyses reveal 53% or about half of the transcriptome, spreading across a wide spectrum of expression levels (Figs. 3G and S3D), is responsible for shaping the E. coli AAT. Next, we also conducted a similar attractor study, sorting elements according to fold-changes instead (Fig. S4B). Here, we obtained 65% of the transcriptome fall into the attractor basin.
Overall, contrary to the general impression that only a small number of highly expressed genes shape AAT, our data suggest that gene expression levels are not indicative of AAT, and an order of about half the transcriptome is crucial for the biological state transition.

Principal component analysis as a test for attractor genes. Previous works had utilised principal
components (PC) trajectories to investigate the dynamic global response of cell differentiation 13,[16][17][18]36 . Here, we performed similar analysis for whole, attractor and non-attractor genes. PCs 1 and 2 constituted over 70% variance at t = 0 for all genes (Fig. S5A) and, hence, we tracked their values at each time point (Fig. 4A). As anticipated, our results show that the attractor genes tracked similar whole genome PC trajectories in time.
Next checking correlations, the attractor genes produce the most variable response (Fig. 4B). Nevertheless, the non-attractor genes also show monotonic or gradual variation in time, suggesting that even these genes can support AAT response, albeit with lower kinetics. Therefore, in this remaining 1,589 genes, we removed 68 genes that had below 1.12 fold changes at any time point, and checked their PCA and correlation plots (Figs. 4A and S5B). Our data show that removing the 68 genes did not noticeably affect the PCA or correlation response. Although the 1521 genes did not enter the attractor basin, they still show gradual response in time. Thus, these were named as collective non-attractor genes.
To summarize, our analysis indicate that 1,800 attractor genes shape the global AAT response, while 1521 genes follow weak but collective global response with remaining 68 genes not having any response. (Note that these 68 genes are additional to the 851 genes already removed during the low and high expression filtering using lognormal distribution fitting.) Temporal groups of attractor genes and characterization of major biological processes. The previous section highlights the significance of the attractor genes in shaping the global response of E. coli in aerobiosis. To scrutinize the biological functions of attractor genes, hierarchical clustering was utilised, resulting in 13 initial clusters (Fig. 5A). From these clusters, we further refined the gene groupings by setting a Pearson (2020) 10:5878 | https://doi.org/10.1038/s41598-020-62804-3 www.nature.com/scientificreports www.nature.com/scientificreports/ www.nature.com/scientificreports www.nature.com/scientificreports/ correlation of above 0.7 between each gene's temporal expression with the average profile for that group. Those that were below the threshold were re-evaluated within subgroups and subsequently re-grouped. As a result, we obtained 6 temporal groups of patterns for the attractor genes (Fig. 5B).
To elucidate the major biological processes that are regulated in each of the temporal groups, we performed Gene Ontology enrichment analysis using clusterProfiler R package 37 with Entrez Gene database 38 (both False Discovery Rate and adjusted p-value cut-off were set at 0.05). Enrichment analysis for the 6 groups of attractor genes resulted in (Fig. 6, Table S2): Group A: Gradual decay, mainly enriched in glycolysis, fermentation, anaerobic respiration, phosphorylation, amino acid metabolism, and locomotion, Group B: Gradual activation, enriched in TCA cycle, aerobic respiration, sulphur compound metabolism, protein folding and response to stress, Group C: Fast activation, followed by decay and re-activation, enriched in ribosome biogenesis, RNA processing, gene expression and response to copper ion, Group D: Early activation, followed by decay, mostly enriched in amino acid metabolism, cation transport, and organelle organization, group E: Early activation followed by plateau, enriched in aerobic respiration, several types of transporters, and ion homeostasis, and group F: Early decay, followed by plateau, enriched in biosynthesis of lipid, organonitrogen and nucleobase-containing compounds, and negative regulation of cellular processes.
Focusing on the names and functions of each attractor gene (using UniProt 39 and EcoCyc 40 databases), we observe several novel or uncharacterised genes for all groups (Tables S2-S4). Group A possesses several genes coding for flagellum protein, ligase, transcriptional regulator, hydrogenase/dehydrogenase functions, together with 71 uncharacterized genes. Group B contains ion and small molecular binding, and oxidoreductase genes together with 60 uncharacterized genes. Notably, a large portion of group C are tRNA genes (approximately 20% of the genes in group C) and ribosomal protein genes, with 76 uncharacterized genes. Group D consists of several ribosomal proteins (distinct from group C) and transporters, with 19 uncharacterized genes. Group E comprises mainly of oxidoreductase, ion transferase and binding proteins, with 33 uncharacterized genes. Finally, group F shows many rRNA coding genes, along with other protein binding genes and 73 uncharacterized genes (note that tRNAs and rRNAs were not mapped to any gene ontology term in the Entrez Gene database, they were found in EcoCyc database instead).
To narrow down the pool of the attractor genes and to investigate their individual function, a more stringent cut-off of expression levels (TPM) greater than 500 units at any time point, and a 3-fold change between any 2 time points unveiled 94 genes (individual function of the genes are listed in Table S5). As expected, the majority of the refined Group A (23 genes) belongs or is connected to glycolysis and fermentation. In addition, genes of aldehyde-alcohol dehydrogenase (adhE), transcriptional regulator (gadE), ferritin (ftn), formate hydrogenlyase regulatory protein (hycA) and fumarate reductase (frdB, frdC) are also present in the group.
Finally, we compared our attractor analysis with the common method of choosing genes with more than 2-fold expression change between any 2 time points. This resulted in 631 genes clustered in 5 temporal groups (Fig. S6A,B). Venn diagram comparative analysis reveals 522 common genes (Fig. 7), with 48 unique novel or uncharacterized genes compared with 219 unique in the attractor set and 113 that are common. The notable novel genes for the 2-fold analysis include uracil permease (yfbP), plasmid stabilization mediating proteins (mokC) and a few tRNA genes (Table S6). Gene expression distribution of the attractor and 2-fold genes showed that the latter possess higher proportion of lowly expressed genes (Fig. S6B).

Discussion
E. Coli aerobiosis is a highly investigated area of research. Although numerous works have been performed, the early dynamic transcriptome-wide response is still poorly understood. Here we investigated RNA-Seq data, consisting of 4,240 non-zero expressions, of anaerobic E. Coli in a bioreactor where readings were taken at 0, 0.5, 1, 2, 5 and 10 min after air supply was initiated. Unlike other approaches that typically investigated arbitrarily selected differential fold changing genes, and performed gene ontology enrichment analysis [41][42][43] , here we undertook multi-dimensional statistical approaches, in view of a dynamical system approach, where we queried the portion of transcriptome that are able to track the global transcriptome-wide attractor response. In other words, we investigated the spectrum of genes that are responsible for shaping the AAT.
After incorporating statistical distribution fitting ( Fig. 2A,B), we retained 3,389 genes for analysis. We also included the concept of dynamical systems approach into analysing the remaining gene expressions, introducing the state space visualizations through modified linear (Pearson, R v ) and non-linear (mutual information, MI v ) correlation density distributions. From these, the attractor regions were generated where a subset of genes non-attractor (1589) genes overlaid on SPD of R v and MI v for whole transcriptome. (G) Distribution of expression level for attractor (red), and non-attractor (blue) genes at representative t = 0. See Fig. S3D  www.nature.com/scientificreports www.nature.com/scientificreports/ (transcriptomic elements) were shown to fall into (Fig. 3). We obtained 1800 genes that fall into the attractor basin (attractor genes), and another 1589 genes fall away (non-attractor genes). Notably, the attractor genes track the global transcriptome-wide trajectory on the PC space (Fig. 4A), and their temporal correlation analysis showed significant variation in time (Fig. 4B). Among the 1589 non-attractor genes, 1521 genes showed small-scaled but clear trajectory on PC space, indicating weak but collective global response, while the remaining 68 did not show any response during the AAT (Fig. S5B).
Hierarchical clustering of the attractor genes showed 6 temporally regulated groups (Fig. 5). Functional enrichment analysis to characterise the function of these important genes shows, as expected, glycolytic, fermentation, anaerobic respiration and cell motility related genes were gradually deactivated (Group A), whereas TCA cycle, aerobic respiration and sulphur compound metabolism were activated (Group B), inversely to Group A. These data are consistent with our general understanding of E. coli aerobiosis 4 . There were also several novel or www.nature.com/scientificreports www.nature.com/scientificreports/ uncharacterized genes significantly induced (TPM > 500 and 3-fold changes), including osmY, ybaY, yfhJ, and ytjA. Additionally, entericidin B (ecnB), which plays a role in bacteriolysis 44 , was also found in this group.
Group C reveals ribosomal biogenesis, translation and gene expression processes. Notably, almost 20% of the genes in this group are tRNA, along with several unknown functional genes (yciG, bdm, YciG, YmdF, YobH, YciH - Table S3). Group F, on the other hand, constitutes mainly of ribosomal RNA. The data from these 2 groups are interesting in that several recent high-profile articles have highlighted the importance of hibernating ribosomes to conserve respiration energy during anaerobic conditions, and certain enzymes "kick in" to revive the metabolism [45][46][47][48][49] . Also, tRNAs are key for protein synthesis and they play important roles in cellular growth, stress response and general translational regulation. The data here show several genes coding for ribosomes and tRNA perhaps play key roles in translational process that shapes anaerobic to aerobic transition.
Group D and E elucidates several hydrogenase genes which are known to be produced in anaerobic or stress conditions and participates in the reduction of fumarate and dimethyl sulfoxide (fnr) 50 . In addition to these, cystathionine gama-synthase (metB), alkyl hydroperoxide reductase (ahpF, ahpC), ornithine carbamoyltransferase (argF, argI), and ferric iron reductase (fhuF) are also revealed. Many of these genes were not previously identified with the anaerobic to aerobic transition. www.nature.com/scientificreports www.nature.com/scientificreports/ Overall, our work undertaking several statistical metrics, on a dynamical systems viewpoint, to infer the transcriptome-wide response of E. Coli aerobiosis have revealed, for the first time, the existence of a fractal portion of transcriptome (1,800 attractor genes) that tracks transcriptome-wide response, and is collectively crucial for the adaptive state transition. This shows a much higher resolution than the conventional 2-fold expression changing genes selected between any 2 time points (Fig. 7). Notably, previous works have mainly focused on metabolic regulatory genes, but here we show the significance of other types of genes such as tRNAs and rRNAs which are largely involved in post-transcriptional and -translational processes, in addition to 332 uncharacterised attractor genes. Future work could focus on elucidating the function of these genes that are captured for each temporal group of gene regulation.

Methods
Statistical distributions fitting. Fitting gene expression distributions was performed using the maximum-likelihood estimation method (fitdistplus packge 51 for parameter estimation and the mass package 52 for log-normal, Pareto, Burr, Loglogistic, Weibull and Burr distributions 53 ).

Spearman correlation. Spearman rank correlations is defined by
where r x,i and r y,i are the ranks of the i th observation x i and y i , in vectors X and Y, respectively.

Bicor (Biweight midcorrelation).
The biweight midcorrelation 29 of two vectors X and Y is given by noting the weights for x or y, represented by general term p:  where 54 the joint probability distribution function p(x, y), and marginal probability distribution functions, p x ( ) i and p y ( ) i are estimated by means of an histogram-based approach by discretizing the rank-transformed gene expression into K = 10 bins 18,54,55 for t i (i = 0,1,2..,M, where and M = 10 min). Note that systematic error ε occurs during the discretization, which is then subtracted from the raw MI value. ε is defined as minimum MI for 100 random permutation of the rank-transformed gene expression vector 17 . The MI-based correlation between X and Y is expressed via 30 : Modified Pearson correlation Rv. The dynamic gene expression at time t i can be defined as a ) with x j (t i ) being expression value of the j th gene at t i . The deviation-from-average expression vector at time t i is defined as is the average expression of j th gene over M + 1 time points) The modified Pearson correlation is defined as This index thus measures the temporal correlation of genome-wide expression deviations from their average values so as to allow discriminating gene expressions with different amplification but similar temporal profiles 18 .

Modified mutual information MI v . Mutual information between vectors V(t i ) and V(t 0 ) is defined similar
to formula (1) with V(t i ) replacing X and V(t 0 ) replacing Y. The probability functions p(x), p(y) and p(x,y) are estimated based on discretized gene deviation data into 10 bins using histogram-based approach. Systematic error ε is defined as minimum MI for 100 random permutation of gene deviation vectors V(t i ) 18 . Finally, to compare MI among different replicates, we used the normalized value: being the standard deviation of gene j th expression across 6 time points. After that, we divided the ranked standard deviation vector σ into p groups, each group with n genes (n = 25, 50, 100, 200, 400, 600, 800, 1000). Note that we choose p = ⌈ N n ⌉, the p th group contained n genes which can be overlapped with the (p − 1) th group. Next, we examined the trajectory on MI v − R v space for each individual group of genes to check whether it fall into attractor region.
Determination of attractor region on R v -Mi v space. Attractor boundary was defined on the superimposed probability density (SPD) distribution of modified Pearson correlation R v and modified mutual information MI v for whole genome (3389 genes). Distribution of whole genome R v and MI v was generated by randomly choosing n = 100 genes for 100 times from the pool of 3389 genes, and SPD of these 100 repeats was estimated on discretised lattice by 2D kernel density estimation using the mass library in R programming 52 .
Attractor boundary was determined by the inflection points on the SPD of whole genome R v and MI v , where the inflection points 18 were determined as highest gradients in vertical and horizontal directions from the local points on the lattice. Averaging the z-coordinates of the vertical and horizontal inflection points determine the z-coordinate of inflection curve, or attractor boundary contour.
Hierarchical clustering. Hierarchical clustering was performed on normalized expressions of attractor and pseudo-attractor genes using Ward clustering method 56 . Normalized expression of the j th gene at time t i is defined as 17 σ = − z t x t x ( ) ( ( ) )/ j i j i j j where x t ( ) j i is expression of the j th gene at time t i , x j is the mean expression across all time points, and σ j is the standard deviation. As a result, 9 clusters were obtained, which were further regrouped according to 5 distinct temporal average expression patterns for attractor genes, and 4 distinct temporal average expression patterns for pseudo-attractor genes.

Data availability
The R-codes for transcriptomics analysis are available from the authors upon request. The E. coli data is obtained using GEO accession number GSE71562.