Tipping points and early warning signals in the genomic composition of populations induced by environmental changes

We live in an ever changing biosphere that faces continuous and often stressing environmental challenges. From this perspective, much effort is currently devoted to understanding how natural populations succeed or fail in adapting to evolving conditions. In a different context, many complex dynamical systems experience critical transitions where their dynamical behaviour or internal structure changes suddenly. Here we connect both approaches and show that in rough and correlated fitness landscapes, population dynamics shows flickering under small stochastic environmental changes, alerting of the existence of tipping points. Our analytical and numerical results demonstrate that transitions at the genomic level preceded by early-warning signals are a generic phenomenon in constant and slowly driven landscapes affected by even slight stochasticity. As these genomic shifts are approached, the time to reach mutation-selection equilibrium dramatically increases, leading to the appearance of hysteresis in the composition of the population. Eventually, environmental changes significantly faster than the typical adaptation time may result in population extinction. Our work points out several indicators that are at reach with current technologies to anticipate these sudden and largely unavoidable transitions.


S1 Calculation of the fitness value of a genome according to the NK model
In this section, and for the sake of clarity, we present a detailed example of how to calculate the fitness of the genomes in a given fitness landscape L, according to the NK model. We have focused on the fitness values f s associated to the sequences s of length N = 4 loci, an alphabet of A = 2 letters (0 and 1) and landscape ruggedness K = 1. First, we construct the matrix L that represents fitness landscape L. Its size is A K+1 × N = 4 × 4, and the elements L ij are random numbers uniformly chosen between 0 and 1 (see Table S1 for a realization of such matrix  Table S1. Matrix L associated to fitness landscape L in the case of genomes of length N = 2, alphabet of A = 2 letters, and landscape ruggedness K = 1. Every locus j interacts with locus j + 1. Numbers in bold correspond to the contributions of the different loci to the total fitness of the genome s = 0110. There are A K+1 = 4 values in each column, representing the A K+1 possible combinations of K +1 = 2 elements taking A = 2 different states (i.e. 00, 01, 10 and 11). For example, elements in column j = 1 (i.e. L 11 , L 21 , L 31 and L 41 ) represent all possible contributions of the first locus of each genome to its total fitness. Index l j stands for the correct one for locus j in a particular sequence.
In order to improve the comprehension of this methodology, let us calculate the fitness associated to sequence s = 0110. The actual contribution of locus j, L lj ,j , depends on its state and on those of the K = 1 other loci that interact with it. In this case, we suppose that every locus j interacts with locus j + 1 (and locus j = N = 4 interacts with locus 1), although any other criterion of interaction would be equally valid. For s = 0110 we obtain l 1 = 2, l 2 = 4, l 3 = 3 and l 4 = 1 (bold elements of matrix L in Table S1). Finally, the fitness of sequence s = 0110 is the average value of the fitness contributions of each of its loci, The calculation of the fitness associated to the remaining A N − 1 = 15 sequences of length N = 2, alphabet of A = 2 letters, and ruggedness K = 1, in the fitness landscape L defined by Table S1, would be done in a similar fashion.

S2 Robustness of the model
In this section we examine several modifications of the main model with the aim of proving the robustness of the phenomenology described. We present results for the distribution of changes in the state of the population at equilibrium under a stochastically varying environment (see Methods). As mentioned, the presence of fat tails in these histograms is indicative of the non-linear response to changes in the landscape and the generality of genomic shifts.

S2.1 Length of the genome
Computational limitations prevent the systematic use of genome lengths much larger than those used here. Figure S1(a) compares two lengths, N = 8 and 12. There is a quantitative difference between the two response curves shown, the one with larger responses corresponding to the larger length. The amount of changes of small size in the population is depleted as N grows, while the tail of the distribution gains weight, meaning that more events of large size are found. Though relatively small in this example, this is a robust trend with sequence length. Increasing the length has indeed significant effects on the structure of fitness landscapes, causing an increase in the number of local fitness maxima among other structural changes [1]. In practice, increasing the length of the sequence works towards isolating different regions of the landscape, thus enhancing the magnitude of sudden transitions.

S2.2 Ruggedness of the landscape (degree of epistasis)
Figure S1(b) illustrates how different values of K (which controls epistasis and thus the ruggedness of the landscape) affect the response of the population. The larger K, the more likely are sudden changes. Only in the case of very low or no epistasis do we obtain a narrow response curve (exponential decay, dashed lines). A landscape without epistasis has a low degree of ruggedness and is therefore similar to a Fujiyama-like landscape, where dynamics of populations are smooth by construction. In our case, even small values of K cause the development of a fat tail (power-law decay, dotted lines) in the response curve.

S2.3 Fraction of lethal mutations
In the examples that illustrate this work, we have used several values of the lethality coefficient f l . The reason is that in different systems, and also in populations with varying degrees of adaptation, the relative amount of beneficial, deleterious, and lethal mutations may vary amply [2]. Figure S1(c) illustrates the effect of parameter f l in the dynamics of the model. There is a significant dependence of the strength of the response to environment on f l . Actually, this turns out to be the parameter to which our model is most sensitive, yielding apparently narrow response curves (exponential decay, dashed lines) for small f l and fat tails (power-law decay, dotted lines) for larger values.
As it is shown in Eq. (S12), and proved in the analytic calculations shown in section S3, it is the difference between the two largest eigenvalues, λ 1 − λ 2 , that controls the intensity of change in the state of the population. It turns out that for too low values of f l the number of sequences with fitness zero is very low and that difference typically does not come too close to zero when N is small, so changes are less pronounced [3]. However, our results when other variables are changed indicate that in realistic scenarios, where the degree of epistasis should be high, the sequences much longer than those that can be computationally tackled, or environmental perturbations much more frequent, low values of f l and even f l = 0 should also yield sudden changes in population states.

S2.4 Intensity of environmental noise
In most of our examples, except for the case of hysteresis, we have allowed the system to reach mutationselection equilibrium before the next perturbation arrives. Also, we have selected noise of small amplitude. That is to say, our examples are very conservative regarding the pace of perturbations that populations might encounter in natural environments. Actually, it is even argued that fast-mutating populations subject to rapidly changing environments (such as RNA viruses, pathogens infecting different hosts, or free-living bacteria) are never able to attain true mutation-selection equilibria. Increases in the noise intensity are one form in which such perturbations can be modelled. As shown in Fig. S1(d) and analytically described in Eq. (S2), the intensity of response of the population increases very significantly with our parameter . Finally, smaller values of than those used in these examples also yield the same phenomenology.

S2.5 Mutation rate
The transition matrix used in our dynamics, M = FG, is the particular case µ = 1 of the more general situation M = r(1 − µ)FI + rµ S FG (Eq. (5) of the main text, see Methods for more information). In general, the mutation rate µ also plays a quantitative role in the dynamics, in addition to the fitness (F) and topology (G) of the landscape. In particular, µ may significantly affect the spread of the population over the genome space. In the limit of sufficiently small µ, all genomes might concentrate in a single node (homogeneous population) and this might severely interfere with the evolution of the population on the network [4]. In our case, however, this just enhances the intensity of the transitions, as shown in Fig. S1(e). As lower values of the mutation rate are used, the response curve displays an accumulation of abrupt transitions between non-overlapping population states (at ∆ u 1 (τ, τ + 1) ∼ 1). Independently of the value of µ used, the typical response curve presents a fat tail over several decades.

S2.6 Size of the alphabet
Finally, in Figs. 1-3 of the main text we have considered binary sequences (a two-letter alphabet), while in the case of nucleotides the size of the alphabet would be 4, rising to 20 in the case of proteins, and occasionally larger and certainly variable in the case of genes and their alleles. We can see in Fig. S1(f) that there are no major differences when the size of the alphabet is varied. To do so, we have plotted in bold black the curve corresponding to a typical case with A = 4, while the rest maintain the same parameters, but increasing N (red curve), K (green), and f l (blue). The shift shown in all three cases towards larger values of the difference between subsequent equilibria ∆ u 1 (τ, τ +1) certifies that increasing the length of the sequence, the ruggedness of the landscape and the fraction of lethal mutations enhance the intensity of transitions: in summary, that the general results obtained for 2-letter alphabets are also applicable to larger alphabets.

S3 Analytical calculation of changes in population state and population fitness
It is possible to calculate how the variations in the fitness of a node and in its population are distributed as the environment changes. The results, that we develop in this section, provide a proof of the qualitative difference between both distributions: the stochastic variable characterizing changes of fitness in a single node has a Gaussian, narrow distribution, while the response of the population of that node to changes in fitness can be arbitrarily large. Hence, the distribution corresponding to the latter variable can develop a long, fat tail. For clarity, and without any loss of generality, in these calculations we will assume f l = 0 (no lethal mutations), and a stochastically varying fitness landscape (see Methods).

S3.1 Distribution of fitness changes for individual nodes
Consider the stochastic variable X i , which describes the changes in the fitness value of node i in contiguous environments, Let us recall how matrix L, which eventually defines f i , changes from τ to τ + 1 under a stochastically fluctuating fitness landscape, where L jk (τ ) are random numbers uniformly distributed in [0, 1], η jk (τ ) are random numbers uniformly distributed in [−1, 1] and is the strength of the noise. Hence, following Eq. (1) of the main manuscript, Since η lj ,k is a random variable uniformly distributed in [− , ], its mean is zero and its variance equals 2 /3. In the limit of large N we can apply the Central Limit Theorem, obtaining that the probability distribution of the stochastic variable X i (τ ) = f i (τ + 1) − f i (τ ) follows a Gaussian distribution with mean and variance

S3.2 Distribution of changes in the population abundance of individual nodes
In this section we calculate the statistical properties of the changes in the fraction of sequences populating each node under an environmental change from τ to τ + 1, similarly to the previous section. To this end, we will consider the stochastic variable Using the results in section S3.1, we can estimate the change in the elements of the transition matrix M as or in matrix form where X(τ ) = F(τ + 1) − F(τ ) and P is a matrix whose entries are distributed as a Gaussian, N (0, 1). Therefore, the calculation of the changes in the population of a node with time reduces to the study of the perturbation of the transition matrix M by another matrix [5] whose elements are in this case taken from a Gaussian distribution of mean zero and standard deviation / √ 3N . We need, therefore, the expression of the eigenvector associated to the largest eigenvalue of a perturbed non-symmetric, non-negative matrix. For the sake of completeness, we show its analytical derivation.
Proof. The main target of the analytical calculation shown here is to describe the new matrix M as a perturbation of matrix M, and consequently obtain the maximum eigenvalue λ 1 of M as a perturbation of λ 1 , and its associated eigenvector u 1 as a perturbation of u 1 . This methodology is inspired on the perturbation theory of matrices shown in [6], and has been applied in the context of Complex Network theory [5]. where We first substitute Eqs. (S4), (S5) and (S6) in Eq. (S3) and equate terms with the same order of . Taking into account that u 1 = 1 ⇒ u 1 · v 1 = 0, we obtain to first order in (where the superindex L indicates the left eigenvector), and to second order, 2 Vector v 1 can be written as Including Eq. (S8) in (S7), and multiplying both sides by u L k from the left, we obtain c k = and including Eq. (S9) in Eq. (S5), and Eq. (S8) in Eq. (S6), we finally obtain .
Let us return to our particular case. Making use of the Eq. (S3) just obtained, and taking into account that m = A N and Q = 1 √ 3N P, for each node i we obtain where α ijkl = (u L l ) j (u 1 ) k (u l ) i , P jk are random numbers normally distributed following N (0, 1) and {conn} is the set of pairs jk corresponding to connected pairs of nodes of the space of genomes.
If we sum a number of Gaussian variables with different variances, the result is also normally distributed and its variance is equal to the sum of such variances. Therefore, it follows (S11) Note that, as mentioned in Methods and proved in [7], the eigenvalues of matrix M are real. Therefore, the terms l = 2 in the summations in Eqs. (S10) and (S11) are the most relevant ones because λ 1 − λ l > λ 1 − λ 2 for l > 2. Therefore, we finally obtain from where we see that the perturbations suffered by the population of each node after one environmental change can be arbitrarily high when λ 2 becomes close to λ 1 . Finally, note that, as V ar(f i (τ + 1) − f i (τ )) = 2 /(3N ),

S4 Supplementary Video S1. Animated description of a genomic shift in a population in a stochastically varying environment
The evolution of the genomic composition shown in Figs. 2(a), (c), (e), (g) and the evolution of the fitness landscape shown in Figs. 2(b), (d), (f) and (h) of the main text are plotted in Supplementary Video S1. The underlying regular space of genomes is plotted in pale grey, while dark blue links connect sequences with positive fitness. The size of each node i is proportional to population of sequence i, (u 1 ) i , (plotted in white) and to its fitness f i (plotted in red). As in Fig. 2 of the main text, 50 values of τ were plotted. Parameters are: length of the sequence N = 8 (which leads to a sequence space of 256 nodes), landscape ruggedness K = 4, alphabet A = 2, lethality coefficient f l = 0.55, and noise intensity = 0.01.