Hysteresis control of epithelial-mesenchymal transition dynamics conveys a distinct program with enhanced metastatic ability

Epithelial-mesenchymal transition (EMT) have been extensively characterized in development and cancer, and its dynamics have been modeled as a non-linear process. However, less is known about how such dynamics may affect its biological impact. Here, we use mathematical modeling and experimental analysis of the TGF-β-induced EMT to reveal a non-linear hysteretic response of E-cadherin repression tightly controlled by the strength of the miR-200s/ZEBs negative feedback loop. Hysteretic EMT conveys memory state, ensures rapid and robust cellular response and enables EMT to persist long after withdrawal of stimuli. Importantly, while both hysteretic and non-hysteretic EMT confer similar morphological changes and invasive potential of cancer cells, only hysteretic EMT enhances lung metastatic colonization efficiency. Cells that undergo hysteretic EMT differentially express subsets of stem cell and extracellular matrix related genes with significant clinical prognosis value. These findings illustrate distinct biological impact of EMT depending on the dynamics of the transition.


Supplementary Figure 1. Hysteresis attractors and experimental validation.
(a) Forward orbits of the iterated function system for TGF- stimulus below, within, and above threshold. The E-Cadherin (phenotype) forward orbits of the iterated function system, conveyed as a sequence of blue curves whose intersections with the red axes occur at stable (blue circles) or unstable (red circle) equilibria. Ten

Code availability
All computer code are scripts in Python and Mathematica. The scripts implement the methods described in this document and are available upon request.

From biology to a general ODE-based mathematical model
We identify a general ODE-based mathematical model to the biological mechanisms (Fig. 1a). To simplify presentation, we list species and their abbreviations, corresponding to (Figs. 1a, b), below in Supplementary Table 2. The biological mechanisms and corresponding mathematical  representations are exhibited in Supplementary Table 3.

Supplementary
ZEB transcriptional repression of E-cadherin ⇤ The symbol "%" means leads to increase in. ⇤ The symbol "•" indicates complexing. ⇤ The symbol "&" means leads to decrease in. ⇤ The symbol "!" means leads to.
Taking the general ODE-model of Supplementary Table 3, we simplify by assuming negligible delays between transcription and translation (gene and protein expression respectively), enabling treatment of individual species. Hereafter, model efforts are concerned with gene expression. For the reduced set of equations characterizing gene regulation, we prescribe Michaelis-Mentenstyle kinetics. This model is based on the following assumptions: (1) transcriptional activation is proportional to the number of binding sites occupied by transcriptional activators and (2) transcriptional repression is proportional to the transcriptional repressors bound. We have the following considerations for the transcriptional output of gene G: • exogeneously influenced by only transcriptional activation and repression, with respective rates k G,a , k G,r 2 R + ("reaction rate") • The model form is then

Application of ODE-based mathematical model to TGF-induced EMT signaling
We denote Smads, ZEB1/2, E-cadherin, and miR-200 as S, Z, E, and M respectively, and note that the putative promoter for miR-200 has two E-box binding sites and two Z-box binding sites, identified to two clusters. Both Smad•X, where X is a co-repressor (" • " means complexing, Smad•X means Smad recruitment of corepressor X ), and ZEB1/2 bind to E-box sites [3,4,5,6], and Z-box binding sites are exclusively bound by ZEB1/2 [7]. Therefore, miR-200 repression is competitive for Smad•X and ZEB1/2 for E-box binding, and this E-box binding is synergistic with Zbox binding. We introduce a simple quadratic term to account for the complexing reaction between miR-200 and ZEB1/2, a key component of the mutual inhibitory feedback loop [5].
Definition 1 (EMT model). We denote the EMT system asẋ = f (x, ✓) with state variables x 2 R 4 + , parameters ✓ 2 R 17 >0 , and initial conditions x 0 2 R 4 0 . The system is specified as where for species "X" we have B X as baseline transcription rates, k X as reaction coefficients, and K X as Michaelis-Menten constants. k MZ denotes the reaction rate for the complexing reaction between miR-200 and ZEB1/2.

Mathematical results for the TGFb EMT model of Definition 1
Recall that the symbol " ⇤" indicates the end of a proof. In this section we use tools from dynamical systems theory, in particular fixed-point analysis of non-linear ODE models. For more information on these ideas, see [8]. For concision, the contents of the mathematical proofs of this section are detailed in (later) Supplementary Section 3.
We show that there exists a solution to the mathematical model for TGFb EMT. There are several important properties of the fixed points of the system salient in the proof. Firstly, the parameters K T , k S , d S , and T only affect the fixed point value of S, and S is completely determined by their values. So, when we are only interested in fixed point properties, it is possible to take S to be a system parameter rather than T , which lowers the dimension of the parameter space by three. are not uniformly convergent. This does not affect the zeros themselves, but for certain ✓ and x 0 , the zeros are isolated points. This behavior can be observed in Fig. S1a for the intra-threshold panel.
In discussing stability we will make use of the Jacobian of f , given by It is immediately evident that d S and d E are both negative eigenvalues of J f (x). The sum of the other two eigenvalues is still negative since both k MZ M d Z and k MZ Z d M are negative, so there is at least one more eigenvalue with negative real part. Thus, J f (x) has at least three eigenvalues with negative real part. Consequently, we can use the determinant of J f (x) to determine asymptotic stability. The following result is useful because it shows that the stabilities of the equilibria coincide for the differential system and the iterated function system. We should note that Theorem 2 does not state that the stable equilibria are identical for the differential and iterated function systems for a given ✓ and x 0 . Although this is generally the case, we observe that for some values of ✓ and x 0 the corresponding stable equilibria take dramatically different values. The determinant of J f (x) can be simplified by expanding by minors across the row d S and then down the column of d E , giving as before and noting d S , d E > 0, we observe that det(J f ) has the same sign as det(A f ).

Corollary 2 (Instability
. An important finding is the nature of the input-output relationship for the differential system. Here, we show that the output-input relationship is unique; however, the converse is not true. To show this, we first derive a polynomial relation that gives the equilibrium values of Z.

Proposition 1 (Polynomial relation for Z).
The equilibrium values of Z are given by the roots of a quartic polynomial. Expressed as a polynomial relation, this is

Proposition 2.
No single biologically feasible value of Z is an equilibrium value of Z for two distinct positive values of S.

Proof. Supplementary Section 3.
This result means that there is a one-to-one mapping from some subset of values of Z to S. Said another way, if there exists one value of S > 0 that produces Z 2 (0, Z max ), then there exists no more. The result reveals that the mapping from Z to S is either one-to-one or one-to-none.

Proof. Supplementary Section 3.
There are one to one relationships between Z and E and S and T , specified through E =

Corollary 3 (E to T ).
For every E 2 (0, E max ), there exists a unique positive T .
The converse of Theorem 3 (and Corollary 3) is not true, as multistability, where a single equilibrium of S maps to multiple equilibria of Z (and M ), occurs for certain parameters ✓ h ⇢ ✓ ✓ R 13 >0 . Enforcing Z 2 (0, Z max ) has the effect of making some of the roots of Z inaccessible. Namely, the smallest root is negative, Z < 0, and a branch of equilibria Z > Z max occurs. These are all impermissible as they mathematically occur outside (0, Z max ). In terms of T 7 ! E, for ✓ h , as we vary T as observe the following cases The number of distinct realizable values of E 2 [0, E max ] is one or three, and the particular T 7 ! E correspondence is determined by the initial conditions. In the one-to-three mapping consisting of 0 < E 2 < E 3 < E 4 < E max where T 2 (a, b), E 2 and E 4 are stable fixed-points and E 3 is unstable. Unfortunately, despite the simple appearance of the differential system, necessary and sufficient conditions for this bifurcation do not appear to be analytically expressible, and we proceed with numerical treatment of the composition and properties of ✓ h .

Definition 2 (Hysteresis bifurcation).
Consider the systemẋ = f (x, ✓ h ). The bifurcation of E 1 in T is called a hysteresis bifurcation.

Simplification of mathematical model of Definition 1
Because the model exhibits a one-to-one correspondence between the steady-state value of S and the parameter T , we simplify the EMT model.

Definition 3 (Simplified EMT model)
. We simplify the model from Definition 1 by replacing the state variable S with the parameter T , We denote state variables as >0 , and initial conditions as We use the model from Definition 3 to define the relation g : T 7 ! E 1 between the dose of TGFb protein, T , and the steady-state mRNA concentration of CDH-1, E 1 .

Random parameters
Consider the systemẋ = f (x, ✓) with (x 0 , ✓) 2 X ⇥ ⇥. In real biological systems, uncertainty exists in (x 0 , ✓), where cell-to-cell variation occurs in the counts of molecules and proteins, cell size, receptor affinities (from counts and spatiotemporal fluctuations of bound receptors on cell surface), and so on. Hence, (x 0 , ✓) is defined as a random variable on the probability space (⌦, H , P).

Autocrine regulation
We introduce additional state variables (T f , T b ) into the ODE model from Definition 3 and replace the occurrence of T by T b to provision a simple model for autocrine regulation.
Definition 4 (Autocrine model). Let T f denote free (unbound) TGFb and T b denote TGFb that is bound to a receptor. Dynamics are specified through the system Free TGFb (T f ) has the following fates: production by secretion, loss by conversion to T b , and loss by biological decay. Bound TGFb (T b ) has the following fates: production by conversion from T f and loss by biological decay.

Partial differential equation model for spatial diffusion
We conceptualize a single cell located at A on some domain D ⇢ R 2 acting as a point source of T f , with auto-induced secretion of T f on A by the activities of T b . We define a partial differential equation model for spatial diffusion.
Definition 5 (EMT model with diffusion). Consider models from Definitions 3 and 4, and endow the state variable T f with diffusion on the 2D spatial domain D ⇢ R 2 (note that the model from Definition 1 can be similarly equipped with diffusion, retaining S as a state variable). For a single cell associated with domain A ⇢ D, we specify the simplified EMT model having diffusion as for initial conditions X(0, r) = d · A and Y (0) = 0 and with X 2 C 1,2 (R + ⇥ R + ) such that X : This is a 1-D heat equation where the boundary term is a source that fluctuates based on the value of the PDE at the boundary. For r = 0, we have The above system can be solved using the method of lines using the idea of discretizing the spatial domain, which generates a system of ODEs that can be numerically integrated using the standard techniques [9]. We approximate the diffusion term using a second-order central finite difference scheme (from Taylor's formula) for interior points where O(( r) 2 is the truncation error and i = 2, ..., M, and for the boundary point i = 1, a slightly different second-order finite difference scheme In this research, we use a two-dimensional second-order finite difference scheme. This model is exhibited in Fig. 4a with annotations of biological dynamics.

Simulated dynamics and computational studies
The models in Supplementary Section 1 are particular instances of ordinary and partial differential equation models of transcription, with random parameters and initial conditions. Despite their simple functional form, they exhibit non-trivial dynamics, including bifurcations representing hysteresis having complex functional dependence on model parameters. To understand how some measure of the hysteresis bifurcation depends on the model parameters, we compute a high dimensional model representation (HDMR), which reveals the hierarchy of functional effects (referred to as component functions). Using the HDMR component functions, we perform a global sensitivity analysis (GSA). Importantly, both the HDMR expansion and its associated GSA may be computed from a single set of random input-output data.

Identification of the hysteresis bifurcation in E-cadherin expression
We take imposing an upper bound of one across the corresponding state variables, with initial conditions (Z 0 , M 0 ) (0, 1). Using parameter values 1, 1, 100, 1, 10 1 , 10 1 ), corresponding to a robust miR-200-ZEB1/2 regulatory axis, fixed-point iteration is conducted with ✏ = 10 4 . The forward and backward orbits of g : T 7 ! E 1 are exhibited in Fig. 1b ranging over T 2 [0, 10]. The blue curve is generated with E 0 1, the green curve with E 0 E 1 (T ), and region between is the region of hysteresis, containing the equilibria of unstable fixed-points ("separatrix" or "boundary").
Definition 6 (Bistability). The existence of two disconnected concentration "levels" for CDH1 as TGFb varies in Fig. 1b is called bistability. Definition 7 (Hysteresis). The difference between the locations of the discontinuities of the forward and backward orbits as TGFb varies in Fig. 1b for CDH1 is called hysteresis.
The asymptotic stability analysis from Supplemental Section 1.2 reveals two regimes of behavior: two eigenvalues of the Jacobian are always negative, and the third eigenvalue's sign depends on ✓ and T . The separatrix of these regimes in ✓ and T defines a hysteresis bifurcation: as we vary the parameters T and ✓ and initial conditions X 0 for the map X 7 ! f (X, T, ✓) X, we move from a single attractive fixed-point to two attractive fixed-points and an intermediary repelling fixed-point. This phenomenon is illustrated with the behavior of the iterated function system

Identification of bistability in E-cadherin expression
Recall the probability space (⌦, H , P). We let the parameter vector be a random variable with distribution (image measure) ⌫. We specify ⌫ as a product of gamma measures with parameters (↵, ), where E ✓ = ↵ and Var ✓ = ↵ 2 , and N iid random realizations are attained To illustrate "hysteresis" the (hyperparamter) values of ↵ and are set such that E ✓ = (10 1 , 1, 100, 1, 10 1 , 10 1 ) Recall the fixed-point mapping, T ⇥ ⇥ 7 ! R + , which takes some value of TGFb , parameter vector ✓, and convergence threshold ", and returns a vector in X = [0, 1] 3 representing the equilibrium (the fixed-point). Now, consider the trace of the mapping onto E ⇢ X. The notation E(T, ✓) means the value in E depends on the value of TGFb T and the random variable ✓. Fig. 1c exhibits the density of E(T, ✓) with superimposed E E(T, ·) for T 2 [0, T max ], revealing a bimodal distribution. Bimodality is exhibited in switching behavior, whereby the system transitions from epithelial to mesenchymal state, or from mesenchymal to epithelial (MET) state (hence Fig. 3e is a mirror image to Fig. 1c). Fig. 1d illustrates the density of E(T, ✓) for various T profiles.
Having held T constant for any value of E, we examine the scenario of a step function input, where a, b > 0 are positive constants. Numerically integrating the ODE system, we show the distributions of the simulated equilibria in Fig. 1d. Notice that the solutions attained for the ODE system and the corresponding iterated function system are similar for constant and transient inputs. For "non-hysteresis" we set K Z 0 = 1 and keep the remaining parameters at their values. Their results are also shown in Fig. 1d. Bootstrapped population mean estimates with one standard deviation envelopes are exhibited in Fig. 1e.

Defining a (functional) measure of hysteresis
We define a (non-negative) measure of hysteresis as where, for the TGFb value t, the quantity E b (t, ✓) is the fixed-point value of E with X 0 = (0, 1, 1) and E a (t, ✓) is the fixed-point value of E with X 0 = (1, 0, 1). The quantity A(✓) is the difference of the two solution curves of E(✓) (identified to the two initial conditions) and computes the area of the interior region formed by these curves ("hysteresis region"). This quantity is illustrated in Fig. 1b, which additionally exhibits the separatrix of unstable equilibria (green dotted; "boundary").
Here, we are interested in examining the structure of the input-output map ✓ 7 ! A(✓), where and E by Z respectively (without loss of generality, this refinement has cost of one additional parameter). In determination of steady-state, we consider the sequence (E n ) n to have converged when the difference of successive iterations E n+1 E n has a L 2 norm less than ". We set the parameter space to be a scaled uniformly distributed hypercube, It is important to note that ODE system is uniformly bounded for all time under equality constraints of the parameters, and the bound is specifiable. We enforce unit bounds (in expectation), so that values of A(✓) are on the same scale and directly comparable. 10,000 independent random realizations of ✓ were attained using Uniform(⇥). For each random realization, the quantity A(✓)

High dimensional model representation (HDMR)
The quantity h(✓) is a complex, strongly non-linear surface in ✓. Even with the simple model of Definition 1, necessary and sufficient conditions for the existence of hysteresis as analytic relations among the parameters do not appear feasible or straightforward to attain. This is a common setting in dynamical systems, where non-linearity confers great complexity and subtlety to dynamics. This is also a common problem for identification of complex systems from empirical data. Therefore, we think of h(✓) as a complex system, and we interrogate it by evaluating it on random values in the parameter space, forming the collection Using D, we profile the functional dependence of h in ✓ using high dimensional model representation, an analysis tool from the complexity sciences, which may be regarded as a kind of uncertainty quantification methodology. Because the method is critical to attaining insight into the behavior of the quantity h(✓), we give a reasonably detailed overview of the method before prior to application.
In doing so, we use the notations of [10] and [11].

Description of HDMR
High dimensional model representation (HDMR) is a finite and non-orthogonal representation of multivariate functions, as a hierarchy of projections into subspaces of increasing dimensions, with the expectation that subspace contributions to output(s) rapidly diminish with increasing dimension. We concentrate here on the essential issues of HDMR as a finite decomposition of the space of square-integrable functions, where (X, X , ⌫) is a (non-degenerate) probability space, with n = |X| 2 N and p = |F | 2 N. The inner product h·, ·i on F , induced by ⌫, is defined as , h(x) 2 F, and the norm k·k F on F , induced by h·, ·i, is defined as Definition 8 (High dimensional model representation [10,12]). For every T 2 N n and f (x) 2 F , and defining where {P u } is the collection of (hierarchically-orthogonal) projection operators. We denote When T = n, the projection operators form a resolution of the unit operator, P u P u = 1. The notion of hierarchical-orthogonality is essential, as this guarantees the existence and uniqueness of the decomposition for general (non-degenerate) ⌫. Hierarchical-orthogonality is a generalization of mutual orthogonality and requires functions to be orthogonal to only those functions defined on its nested subspaces. For example, for ⌫ 123 6 = ⌫ 1 ⌫ 2 ⌫ 3 , hierarchical-orthogonality implies that hf 12 , f 1 i = hf 12 , f 2 i = 0 but neither implies hf 1 , f 2 i = 0 nor hf 12 , f 3 i = 0.

Define the residual as
and denote the risk functional When T = n, we have J = 0 (equality): equivalently written as A key property of HDMR is that models are often well approximated, or even exactly represented, using HDMR expansions truncated at low-order (T ⌧ n), i.e. f T is a reduced-order representation of f . When the input variables are independent (⌫ = Q i ⌫ i ), the component functions are mutually orthogonal and can be recursively constructed, . For independent inputs, the HDMR component functions convey a decomposition of model variance, Normalizing by Var f , we retrieve sensitivity indices reflecting the relation This illustrates that the HDMR component functions {f u } convey a decomposition of variance. When normalized, this is known as global sensitivity analysis. In this context of independent inputs, the sensitivity indices {S u } are known as Sobol sensitivity indices [13]. In global sensitivity analysis applications, only {S u } are sought (the {f u } are not required), so the {S u } are (directly) estimated using specialized sampling [14,15,16].

Remark 1 (Kolmogorov's superposition theorem).
It would seem that the best finite representation of multivariate functions should involve only univariate functions and for continuous functions this is true: Kolmogorov's superposition theorem [17] is an existence result that establishes that every multivariate continuous function on the unit cube (X = [0, 1] n ) can be represented using a finite number of univariate continuous functions, where i are ij are the continuous functions. These functions are also highly non-smooth, greatly restricting their utility for applied settings. Compared to Kolmogorov's result, HDMR supposes a hierarchy of projections into subspaces of increasing dimensions, and expresses f (x) as a superposition of 2 n functions. While many additional HDMR component functions appear when compared to Kolmogorov's result, most of the HDMR component functions are identically zero, or insignificant, for F of practical interest, i.e., a low effective-dimension, f (x) ' f T (x) for T ⌧ n.

Illustrative example of a HDMR analysis
Consider the "product function" (or monomial)

Its variance is given by
Its component functions are Consider the k-dimension component function f u (x u ) (|u| = k). It sensitivity index is given by (⇢ 2 +1) n 1 , and the k-order sum (the sum of all k dimensional sensitivity indices) is given by (and of course P k p{k} = 1). Because n is fixed, we concentrate on the numerator, which consists of the (polynomial) number of dimension k subspaces n k and the (exponential) ⇢ 2k . Observe that when ⇢ < 1 the explained variance of low dimensional approximations increases, so that mass of the probability vector p = (p{k} : k = 1, . . . , n) concentrates about small k (exponential suppression beats polynomial growth). As a point of reference, the uniform distribution (a common distribution for a "null hypothesis") on the unit interval has ⇢(Unif[0, 1]) = where ⌧ ⌘ 1+µ regulates expansion efficiency (instead of ⇢ = /µ). This worked example has a suggestive conclusion: HDMR analysis reveals that the effective dimension of the product function (monomial) is regulated by the coefficient of variation of the input distribution.

HDMR has been independently formulated for and applied to seemingly distinct domains
HDMR is discussed as early as [18] in ANOVA analysis and has enjoyed extensive application to statistics. When F is the collection of symmetric functionals of iid variables HDMR is known as the Hoeffding decomposition, a fundamental object in U-statistics [19]. The univariate terms of HDMR are sometimes known as Hajek projections and are extensively used to establish asymptotic normality of various statistics [20]. [10] discusses HDMR as a general decomposition of F where the input space resides in R n (function HDMR) or in the n-fold product space of arbitrary linear topological function spaces (functional HDMR). [10] also discusses F as the collection of functions taking finite value at some point x 2 X, where ⌫ is the Dirac measure sitting at x (cut HDMR or anchored ANOVA). These ideas have been further developed to F for general (nondegenerate) measures [12], wherein HDMR is known as generalized functional ANOVA, which in turn provisions global sensitivity analysis in terms of structural and correlative sensitivity indices [21]. In application to goodness-of-fit settings, e.g., Pearson Chi-square, HDMR identifies and eliminates information leakage in "big data" settings [22]. Note that sometimes HDMR is known as the Hoeffding-Sobol decomposition [23] or Sobol decomposition [24]. See [25] for references on HDMR's earlier history.

HDMR usage for this article
In this article, we use random sampling high dimensional model representation (RS-HDMR) [10,26,27]. We truncate the (function) HDMR of h to second-order (T = 2) and estimate the sensitivity indices, {S u : |u|  T } from N independent random vectors in ⇥ with independent inputs (⌫ = Q i ⌫ i ). The HDMR component functions and sensitivity indices were estimated using 500 bootstrap estimates of the component functions, having bootstrap samples of size 1000 from D and estimators as systems of (orthogonal) bases [26]. The estimator's coefficient of variation ⇢ = µ/ was employed to filter out low-quality estimates (we use greater than two).

HDMR results for this article
Supplemental Table 4 reveals significant first and second order sensitivity indices, whereby well over half of (hysteresis) model variance is constituted by first and second-order effects. The parameter having by far the highest total sensitivity index (S total 0.0499 P P P X jk 0.2944 P P P X i + P P P X jk 0.6247 ⇤ Values of S are computed per [26], as described in Method usage for this article and satisfy ⇢ > 2

Hysteresis propagation
Both Fig. 3f and Fig. 4b have similar computational treatment: both solve the system of equations of Definition 5 (based on the ODE systems of Definitions 1 and 3). Using the method of lines, Definition 5's model is solved on a 2D square grid (n 2 points)-depicted in Fig. 4a-using secondorder finite difference approximations and periodic boundary conditions. The rate of convergence of the system to an equilibria, and in turn chosen to be the "end time" of the simulation, depends upon the parameter values. For both figures, we choose "reasonable" sets of parameter values which exhibit interesting non-equilibrium behavior with computationally short integration times t.
For Fig. 3f, a grid-size of n = 21 is taken, and the method of lines generates a system of 2 205 ODEs. The parameter values for hysteresis and non-hysteresis panels are shown below in Supplementary Table 5. Denoting the subset of the parameter space ⇥ which exhibits a hysteresis bifurcation as ⇥ h , i.e. ⇥ h ⇢ ⇥, we see that nearly all parameters have no such bifurcation: across all values of such parameters, there exists no subset of values in ⇥ h . However, for certain parameters, their parameter spaces may be partitioned into ⇥ h and ⇥ c h . Initial conditions to this system are specified as: the center grid point takes value X 0 (·) = 10, with the remaining X 0 (·) = 0 and all Y 0 = 0. Using these initial conditions in conjunction with those of Definition 3, the system of ODEs is numerical integrated on [0, t], with time t = 100 chosen because the system is at equilibrium. Non-equilibrium dynamics are contrasted between hysteresis and no hysteresis for various time points.
For Fig. 4b, a grid-size of n = 51 is taken, and the method of lines generates a system of 15 606 ODEs. The parameter values for hysteresis and non-hysteresis panels are shown below in Supplementary Table 6. Initial conditions to this system are specified as: N = 70 grid points are randomly assigned (without replacement) probability law X 0 (·) = Gamma(5, 2 ⇥ 10 3 ), with the remaining X 0 (·) = 0 and all Y 0 = 0. Using these initial conditions in conjunction with those of Definition 1, the system of ODEs is numerical integrated on [0, t], with time t = 100 chosen because the system is at equilibrium. Non-equilibrium dynamics are contrasted between hysteresis and no hysteresis for various time points. Table 5: Parameter values for Fig. 3f Parameter(s) Hysteresis No Hysteresis

Supplementary
The parameters (k Z , K Z , k MZ ) admit clearly identifiable partitions of (⇥ i ) into ⇥ h (hysteresis) and ⇥ c h (non-hysteresis) subspaces, while the remaining parameters do not and effectively behave as free parameters which do not alter qualitative behaviors. The parameters (T max , Z max , M max , E max , K a , K b , d T b , k M , k E ) are set to "unit" values, whilst the parameters identified to TGFb , conveying spatial effects of hysteresis, are set such that their putative effects concentrate well on the chosen (finite) 2D spatial domain. Table 6: Parameter values for Fig. 4b Parameter(s) Hysteresis No Hysteresis

Supplementary
The considerations of these parameters are identical to those of Supplementary Table 5 except that k Z is unchanged.

Proofs for Supplementary Section 1.3
Theorem 1.
Proof. We want to demonstrate the existence of a solution to f (x, ✓) = 0 for any ✓ (and x 0 ), which is the system of equations For S and E this can be solved immediately, giving whereby the fixed point value for S is uniquely specified by the system parameters, and the value for E depends only on Z. It remains to consider Z and M . For Z, we have the expression where we call the right-hand side F (M ), a function determined by the system parameters. Similarly, for M we obtain where we call the right-hand side G(Z), again a function determined by the system parameters. Using these functions, fixed point values of Z and M must satisfy Z = F (G(Z)) = (F G)(Z). Defining Proof. Note that x 0 is asymptotically stable if and only if all eigenvalues of J f (x 0 ) have negative real part, so let us use this condition instead of asymptotic stability in the proof. Suppose all eigenvalues of J f (x) have negative real part. Two of these are the real negative d S and d E , whose product is d S d E . Since the coefficients of J f (x) are real, its characteristic polynomial has real coefficients, and so its complex eigenvalues come in conjugate pairs. So, J f (x) has a further eigenvalue z with negative real part and non-zero imaginary part, thenz is the fourth eigenvalue, Otherwise, all eigenvalues of J f (x) are real and negative, and thus det J f (x) is the product of four negative real numbers and again is positive. Conversely, suppose det J f (x) > 0. We argue similarly: two eigenvalues are negative and real, d S and d E . The other two must then either be a pair of complex conjugates, in which case their real parts are the same and must be negative because J f (x) has at least three eigenvalues with negative real part, or must be real numbers, at least three of which are negative, and then the fourth must be negative as well since their product is det J f (x) > 0.

Theorem 2.
Proof. We will go back to discussing the J f in terms of partial derivatives of f rather than explicit values here. By expanding its determinant by minors first across the row of d S and then down the column of d E , we obtain Call the matrix whose determinant appears on the right A f . Then, since d S d E > 0, det A and det J f have the same sign, so by the lemma it follows that an equilibrium is stable if and only if det A f > 0 at the equilibrium. Now, a fixed point of F G is asymptotically stable if and only if the derivative (F G) 0 (Z) = F 0 (G(Z))G 0 (Z) = F 0 (M )G 0 (Z) is less than one in magnitude. Since F G is increasing, its derivative is always positive, so we can safely ignore the condition of magnitude and take this condition to be that F 0 (M )G 0 (Z) < 1. Thus, the statement of the theorem is equivalent to showing that det A and 1 F 0 (M )G 0 (Z) have the same sign. Let us write the component functions f Z and f M of the function f in terms of the functions F and G, whereby in terms of these functions, we have But, at an equilibrium point F (M ) = Z and G(Z) = M , so the second term is zero in each expression and we have simply and also So, the determinant of A, using the expression in terms of these functions, is Since k MZ M + d Z and k MZ Z + d M are both positive, this has the same sign as 1 F 0 (M )G 0 (Z), which is the necessary result.

Lemma 2.
Proof. This can seen through linearization about X = (S, M, Z, E) 2 R 4 + . The Jacobian is 0 and we see that two eigenvalues have negative real parts, d E and d S . A third eigenvalue also have negative real parts, and the fourth is negative whenever Z = 0 or whenever Z > 0 and Proof. Recall at equilibrium we have Solving for M , we have At the same time, at equilibrium, we have Equating these these expressions for M , we see that the equilibrium values of Z satisfy Taking the reciprocal on each side, we have Inserting M max = k M d M and Z max = k Z d Z and factoring k Z from the left-hand side, we have Finally, we multiply the second fraction of the left-hand side by Z and clear its denominator, giving Proof. Recall the polynomial relation which gives the equilibrium values of Z: For Z > 0, the left-hand side is positive, so the right-hand side must be positive. The first three terms of the right-hand side are automatically positive, and the last term must also be positive at all such values. The function S K S +S = 1 1+K S /S is strictly increasing in S for S positive. For a given fixed biologically reasonable Z, the right-hand side is strictly increasing in S for S > 0, and the result follows. Proof. We multiply the polynomial relation of Z from Proposition 1 by K S + S, giving a quadratic relation in S. Expanding to form aS 2 + bS + c = 0 using Mathematica, we attain we simplify the coefficients to Dividing through by vwx, we attain an equation of the form a 0 Noting a 0 c 0 = y vwx the discriminant d is given by and by completing the square we have Since Z 2 (0, Z max ), we have w > 1 and x > 0 and thus d > 0, whereby the quadratic equation in S will always have two distinct solutions. The product of these solutions is equal to c 0 a 0 , which can be written as Observe that v, w, x, y > 0 and 0 < x < 1. Therefore, 1 x > 1. Hence, the entire inner term is positive and c 0 a 0 is negative. Thus, of the two distinct real solutions, one must be positive and one must be negative. Thus, there exists a positive solution S for any Z 2 (0, Z max ).

D<=
The mathematical model used for transcriptional regulation is an ordinary differential equation, Underlying reactions for these ODEs are defined in Supplementary Table 8.
Parameters are defined in Supplementary Table 9.  Underlying reactions for these ODEs are defined in Supplementary Table 16.

Supplementary Table 8. Reaction kinetics for model of Supplementary
Parameters are defined in Supplementary Table 17.
Note: the functional form of the general model is elected for representation of autocrine signaling, conferring similar timescales of the autocrine and "core" models.