The IICR and the non-stationary structured coalescent: demographic inference with arbitrary changes in population structure

In the last years, a wide range of methods allowing to reconstruct past population size changes from genome-wide data have been developed. At the same time, there has been an increasing recognition that population structure can generate genetic data similar to those produced under models of population size change. Recently, Mazet et al. (2016) showed that, for any model of population structure, it is always possible to find a panmictic model with a particular function of population size changes, having exactly the same distribution of T2 (the coalescence time for a sample of size two) to that of the structured model. They called this function IICR (Inverse Instantaneous Coalescence Rate) and showed that it does not necessarily correspond to population size changes under non panmictic models. Besides, most of the methods used to analyse data under models of population structure tend to arbitrarily fix that structure and to minimise or neglect population size changes. Here we extend the seminal work of Herbots (1994) on the structured coalescent and propose a new framework, the Non-Stationary Structured Coalescent (NSSC) that incorporates demographic events (changes in gene flow and/or deme sizes) to models of nearly any complexity. We show how to compute the IICR under a wide family of stationary and non-stationary models. As an example we address the question of human and Neanderthal evolution and discuss how the NSSC framework allows to interpret genomic data under this new perspective. Author summary Genomic data are becoming available for a rapidly increasing number of species, and contain information about their recent evolutionary history. If we wish to understand how they expanded, contracted or admixed as a consequence of recent and ancient environmental changes, we need to develop general inferential methods. Currently, demographic inference is either done assuming that a species is a single panmictic population or using arbitrary structured models. We use the concept of IICR (Inverse of the Instantaneous Coalescence Rate) together with Markov chains theory to develop a general inferential framework which we call the Non-Stationary Structured Coalescent and apply it to explain human and Neanderthal genomic data in a single structured model.


Introduction
The structured coalescent of Herbots (1994) assumes that the size of each subpopulation is 139 maintained constant under migration, which generates the following constraint at the popula- where q ij is the probability that one individual migrates from deme i to deme j. In other words, 142 all outward migrants must be replaced by inward migrants from the other islands. 143 This condition is required in the structured coalescent but we stress that it is not required 144 in the structured model of Notohara (1990) or when simulating data under structured models 145 using the ms software of Hudson (2002).
Looking now backward in time, Herbots (1994) defines the backward migration parameter from deme i to deme j (denoted m ij ) as: The backward migration parameter m ij represents the proportion of individuals in deme i that 149 were in deme j just before the migration step. Also, m i = i =j m ij represents the proportion 150 of individuals inside deme i that were in a different deme just before the migration step.

151
In this backward perspective, we suppose that we have a sample of k haploid genomes

176
If we now define, for all t ≥ 0, the exponential matrix P t = e tQ which has the same size as Q, 177 P t then satisfies the following properties, for all s, t:

178
• P t+s = P t P s (semigroup property), • P t is a stochastic matrix (each coefficient is non-negative and the sum over each row is 181 one).

182
Also each coefficient of the matrix P t , for all t ≥ 0, is a transition probability: where (X t ) t≥0 is a continuous-time Markov chain on the finite set I. In other words, X t is a 184 jump process, whose behaviour is the following:

185
• if at a given time s ≥ 0 we have X s = i, then it jumps away from state i after an 186 exponential time of parameter −Q(i, i), which does not depend on s, 187 • at each jump from state i, the rate at which state j is reached is The transition rate matrix Q then contains all the information on the behaviour of (X t ) t≥0 , 190 given the initial condition X 0 . We can see that, for all i ∈ I, the parameter Q(i, j) is the rate 191 of going from i to j, as soon as j = i, and the parameter −Q(i, i) is the rate of leaving i. 192 In the case of the structured coalescent the jump process of interest is the ancestral lineage matrices can be used to analyse a wide family of models of population structure. We take the 206 case of k = 2 and step by step explain how the transition rate matrix can be constructed. We 207 then describe the general algorithm used to construct the IICR for all the models analysed here 208 for k = 2. We finally apply this method to the n-island model and show that we can re-derive 209 the results obtained by Mazet et al. (2015) and Herbots (1994). in deme 1, one ancestral lineage in deme 2 and no ancestral lineage in deme 3. Note that We call E k,n the set of all possible states of a structured model with n 222 demes and a sample of size k. We have: where c represents the state when the MRCA of the sample is reached (|α| = 1).

224
The Markov chain can change from one state α ∈ E k,n to another state β ∈ E k,n either by a 225 migration event (which implies that |β| = |α|) or by a coalescence event inside a deme (which 226 implies that |β| = |α| − 1). Before constructing the associated transition rate matrix we need 227 to define an order on E k,n . We choose the inverse lexicographical order. For example for n = 3 228 and k = 2 it would be: Note that the state c (when the MRCA of the sample is reached) is placed in the last position. 230 We denote φ the function that associates an element of E k,n with the corresponding index in 231 the inverse lexicographical order. For example, taking the previous example, for α = (2, 0, 0) 232 it will be φ(α) = 1 and φ(c) = 7. Throughout the next sections we will assume that there is 233 an order on the set E k,n given by the function φ. We define n α := φ(α) so that P t (n α , 1) refers 234 to the first element of the row n α in the matrix P t .

235
The corresponding transition rate matrix can be constructed as: where i is the vector whose components are 1 on the i th position and 0 elsewhere.

237
The matrix Q describes two types of possible events for each configuration α:

238
• β = α − i + j when one lineage migrates (backward in time) from island i to island j.

239
The rate of this migration is M ij /2 (migration rate to deme j for each lineage in deme i) 240 times α i , the number of lineages present in deme i.

241
• β = α − i denotes a coalescence event between two lineages in deme i, which reduces 242 the number of lineages by one in this deme. This occurs only if α i ≥ 2. If this is not 243 the case we can see that α i (α i − 1) = 0. The term α i (α i − 1)/2 is the number of possible 244 pairs among the α i lineages. This term is multiplied by 1/c i since the i th island has a 245 population size equal to 2c i N , and 1/c i is the coalescence rate for each pair of lineages in 246 this island since time is scaled by 2N .

247
Since no other kind of event can occur than a migration or a coalescence, and multiple coa-248 lescences or migrations are negligible, the other rates are null. Note that the opposite of the 249 diagonal coefficient −Q(n α , n α ) is the total jump rate from configuration α.

250
The transition rate matrix can be very large depending on the model of population structure 251 assumed and on the sample size. For k ≤ n the number of states is on the order of n k , and the 252 matrix will have on the order of n 2k terms.

253
3.2 Case of a sample of two lineages (k = 2) 254 We now consider the case where we take a sample of two lineages (i.e., k = 2 corresponding 255 to two haploid genomes or one diploid genome) in an arbitrary model of population structure 256 with n demes of size 2c i N , for large N . We can reduce all possible configurations to only two 257 types of configurations, excluding the configuration where the two lineages have coalesced:

258
• both lineages are in the same deme i: α = 2 i ,

259
• the two lineages are in different demes, say, demes i and j with i = j: α = i + j .

260
When the two lineages are in the same deme (first case), there are two possible events that 261 can change the configuration: a coalescence with rate 1/c i , or a (backward) migration from i to 262 j = i for each lineage, with rate α i M ij /2 for both lineages, hence a total rate of α i M ij . When 263 a coalescence happens, the number of lineages decreases by one. When a migration from deme 264 i to deme j happens, the new configuration is one in which the lineages are now in different 265 demes, which is a second-type configuration.

266
When the two lineages are in different demes, no coalescence can occur and the two lineages 267 may either stay in the same deme or migrate to another deme, from i to (which can be equal 268 to j) for the first lineage, with rate α i M i /2, or from j to (which can be equal to i) for the 269 second lineage, with rate α j M j /2. If the lineages end up in the same deme we are back to a configuration of the first type, otherwise, we end up in a second-type configuration.

271
By definition, the number of rows and columns of the full transition rate matrix (that we 272 will call n c ) is the number of different configurations for the ancestral lineages. In the case of a 273 model with n demes and a sample of size k = 2, we have that n c = n 2 + 1. We will assume that 274 the "last configuration" is the one in which the two lineages have coalesced, and thus ignore 275 where the coalescence took place. Also note that the rate of a coalescence event in deme i 276 (which is equal to 1/c i ) depends on the size of deme i. In the transition rate matrices that we 277 will use here the coalescence configuration corresponds to the last row and column. Here we give a general algorithm that can be used to construct the transition rate matrix of a 281 given model. The first step is to explicitly order all the demes. Then, given the number n of 282 (ordered) demes the set of all possible configuration for k = 2 lineages is: where i + j means that there is one lineage in deme i and one lineage in deme j (note that it 284 could be i = j); and c is the configuration where both lineages have coalesced.

285
As in section 3.1 we take the inverse lexicographical order on E 2,n . Define φ as a function 286 from E 2,n to {1, 2, ..., |E 2,n |} such that φ(α) is the index of α according to the inverse lexico- Q ← n c × n c matrix full of zeros Initialisation; transition rate matrix l ← φ(y 1 , y 2 , . . . , y n ) 306 15:   3.4 Using the transition rate matrix to derive the distribution of coalescence times and evaluate the IICR for samples of size two 324 We now focus on the coalescence time between two lineages and see that we can derive the 325 IICR in terms of transition rate matrices. The theory of Markov chains (Norris, 1998) gives the 326 tools allowing to compute the probability distribution of T 2 based on the matrix exponential of 327 the transition rate matrix for the model of interest where P t is the transition semigroup of the corresponding Markov process, i.e., P t (n α , n β ) = the past and α(0) represents the initial sample configuration.

331
As noted in Section 2.2, the terms of P t represent the transitions probabilities of interest.

332
For instance, the term in row n α and column n β of P t represents the probability that the 333 process is in the configuration β at time t given that it was in the configuration α at time zero.

334
Thus, the probability that two lineages in the configuration α at t = 0 have reached their most 335 recent common ancestor at time t can be found as P t (n α , n c ), where n c is the last column since 336 n c = φ(c) is the column number of the coalescence state.

337
Consequently, if we denote by T α 2 the coalescence time of two lineages sampled in the 338 configuration α, the cumulative distribution function (cdf ) of this random variable can be 339 computed from the transition semigroup: The probability density function (pdf ) of T α 2 , f T α 2 (t), is by definition the derivative of F T α 2 (t).

341
It can thus be computed from the matrix P t , by using the property where P t is the matrix whose cells contain the derivative of the corresponding cells of P t . We can thus write It is then easy to derive, for any time t ≥ 0, the instantaneous coalescence rate, which is 345 the probability to coalesce at time t given that the lineages have not coalesced yet. This is by 346 definition the ratio .
In the next section, we show how transition rate matrices can be used to re-derive the  There is a third state that corresponds to the coalescence event which takes place 360 at rate 1. We thus obtain the simplified transition rate matrix where the first configuration is s, the second is d, and the third one corresponds to a coalescence 362 event, which can only occur when both lineages are in the same island.  Constructing the IICR for two stationary models, the 374 2D stepping stone and continent-island models 375 We now apply the framework and algorithm described above to two stationary models. To  Stepping stone models (Kimura, 1994;Malécot and Blaringhem, 1948) assume that the demes without edge with edge, island 1 (corner) with edge, island 2 (middle edge) with edge, island 5 (center) Figure 3: IICR plots for the 2D stepping stone model. Here we assumed a model with 3 × 3 = 9 islands and M = 1, with and without edge effect. In the model with edge effect, we plot the three ways to sample two lineages in the same island: in island 1, 3, 7 or 9 (corner), in island 2, 4, 6 or 8 (middle of the edge), and in island 5 (center of the lattice).
at a value equal to the deme size and converge in the ancient past towards the same plateau.  Here we assume a model where the population is divided into n demes (one big deme called 413 continent and n − 1 equally sized demes, smaller than the continent, called islands). The 414 continent is connected with the remaining n − 1 islands, but the islands are not connected 415 between each other ( Figure 2). Therefore, migration can only occur between the continent 416 and the islands, but not between different islands. We order the n demes in such a way that 417 the continent is deme number 1, whose (scaled) size is c 1 . We denote c 2 the size of the other 418 islands, and M 1 /2 the (scaled) migration rate from the continent to each island, and M 2 /2 419 the migration rate from each island to the continent. Condition (1) implies that we have the 420 following constraint: For the case n ≥ 3, the symmetry of the model allows us to consider, for a sample of two Note that c is the ratio between the sizes of the islands and the continent, and that the diagonal : IICR for a continent-island model. We constructed the transition rate matrix for a model with n = 4, namely one continent and three same-sized islands. The sizes of the continent and of the islands were set to c 1 = 1 and c 2 = 0.05, respectively. In other words, the continent was 20 times larger than the islands. We set the migration rates to M 1 /2 = 0.05, M 2 /2 = 1 (note that once M 1 is set, M 2 is constrained to keep inward and outward migrant gene numbers equal, as required by equation 1). In this model there are only four types of IICR curves, two IICR s and two IICR d . The first two correspond to the cases where we sample the two lineages either in the continent or in one of the islands. The IICR d curves correspond to cases where one gene comes from the continent and the other from an island or when the two genes come from two different islands. In this section we extend our work to non-stationary structured (NSS) models under the co- Previous sections showed that to any given stationary structured population model corresponds 469 a transition rate matrix, Q that can be constructed and used to predict the IICR for a given 470 sample configuration. Assuming that we sample k genes in configuration α, we call T α k the 471 time to the first coalescence event among these k lineages. We also described how the theory of 472 Markov chains allows to compute the probability distribution of T α k from Q using the formula: where n α denotes the index of the configuration α and n c is the number of possible configurations 474 and corresponds to the index of the coalescence configuration.

475
The matrix P t (which is the transition semigroup) has size n c × n c and is obtained by Markov jump process, even if the transition probabilities between these states will change. So, 490 the size of the matrix P t will always be the same. time T , we can computeP t by using the semigroup property as follows: In particular, the distribution of T α k , the first coalescence time of k genes sampled in configu-498 ration α under this structured model with a past demographic change event, can be computed The pdf of T α k can then be computed by This procedure can be extended to any number of parameter changes, by defining the

603
We would like to stress that the theoretical arguments that guarantee the convergence of Markov process is beyond the objectives of this work and deserves an independent study too.  Changes in connectivity in a complex splitting model produce complex genomic patterns 673 that cannot be easily interpreted. By using the IICR and the NSSC we were able to re-interpret 674 human and Neanderthal evolution, while stressing that it is only one of probably many possible 675 interpretations.

676
The structured scenario used here for humans and Neanderthals ignores spatial structure and migration rates are identical between all islands). The advantage of using symmetries is 716 that it significantly reduces the size of the transition rate matrix and computation time but 717 this idea will not be viable for all structured models.

718
In conclusion, one of the great challenges of population genetics inference is to identify the 719 structured models that could explain existing genomic data. Until now the choices of structured 720 models has been to a large extent arbitrary. The NSSC modelling framework proposed here may 721 be a powerful and promising way to overcome that challenge, and perhaps reduce arbitrariness 722 and some level of story-telling that has often plagued human evolution discourse. All scripts 723 used to carry out the simulations or analyse the data will be made available upon publication 724 of the manuscript.

Supplementary Information for
The IICR and the non-stationary structured coalescent: demographic inference with arbitrary changes in population structure 1 General algorithm for the construction of the transition rate matrix for two lineages We give a general algorithm that can be used to construct the transition rate matrix of a given model. The first step is to explicitly order all the demes. Then, given the number n of (ordered) demes the set of all possible configuration for k = 2 lineages is: where i + j means that there is one lineage in deme i and one lineage in deme j (note that it could be i = j); and c is the configuration where both lineages have coalesced.
We take the inverse lexicographical order on E 2,n . Define φ as a function from E 2,n to {1, 2, ..., |E 2,n |} such that φ(α) is the index of α according to the inverse lexicographical order. Then φ −1 is the inverse of φ and φ −1 (i) gives the element of E 2,n which is at position i according the inverse lexicographical order.
Once the function φ is defined and we have the values of C = (c 1 , . . . , c n ) (the size of the demes) and M ij (the migration matrix), we can use the following algorithm to construct the transition rate matrix Q: l ← φ(y 1 , y 2 , . . . , y n ) 15: return Q 28: end procedure Note that since the last configuration (coalescence) is an absorbing state of the Markov process, the last row has only zeros.
2 Constructing the IICR for stationary models. Examples: stepping stone and continent-island We now apply the framework and algorithm described above to some stationary models. By a stationary model we understand a structured model in which the parameters (i.e., number of demes, sizes of demes and gene flow) remain constant over time. To our knowledge, there is no analytical expression for the distribution of the coalescence time T 2 under these models. For some of them it is possible to find a simplified transition rate matrix using some symmetries.
In those case we give the corresponding transition rate matrix Q that can be used to compute numerically the distribution of T 2 and the IICR. In other cases it is not possible to get a simplified version of Q and we used the algorithm given in section 1 to obtain the IICR.

stepping-stone models
Stepping stone models (Kimura, 1953;Malécot, 1948) assume that the demes are located at the nodes of a regular lattice in one or two dimensions (hereafter 1D and 2D stepping stone models). Each deme can have up to four neighbours and migration events are only possible between neighbouring demes. These models incorporate space, and are thus thought to be more realistic than the n-island model described above, which implicitly assumes that migration is as likely between neighbours as it is between distant islands. The border demes can either be connected with each other, hence forming a torus, or can behave as bouncing borders ( Figure  S1). In some models the bouncing borders migrants are assumed to stay in their deme, whereas in other models they are distributed among the demes to which their deme is connected. We will distinguish two cases: 1. Without edges: One dimension (1D circular stepping stone) and two dimensions (2D torus stepping stone). They are more symmetric since all the migration rates are equal.
2. With edges: 1D and 2D stepping stone. Islands located on the edges and in the corners have fewer neighbours than islands in the middle of the lattice. In order to maintain simplicity and symmetry, the same migration rate is taken between each pair of islands. This implicitly assumes that migrants trying to migrate "outside" are bouncing back to their deme of origin. As we will see there are still more parameters in the model, and the corresponding transition rate matrices are more complex.
We will give an example of each of the four combinations: one or two dimensions, and with and without edge effects.

Circular 1D stepping-stone model
Here we assume that the population is divided into n (n ≥ 2) equal-sized islands which are located on a circle ( Figure S2). Each island thus receives immigrants coming only from its two neighbours.
With the notations of the main manuscript, ∀i = 1 . . . n we set c i = 1, M i = M , and The symmetry of the model allows us to consider that the configuration of a sample of two lineages depends only on their distance d, defined as the number of islands separating them, d ranges from 0 to n/2 ( x is the largest integer not larger than x), that is, n/2 + 1 different values.
The corresponding matrix Q is then of size n/2 + 2, the last configuration corresponding to the coalescence event, which can occur only if both lineages are in the same island. When there are five demes (n = 5), then we have n/2 = 2, the simplified transition rate matrix Q has thus 4 rows and columns: The first row represents the transitions away from the configuration in which both lineages are in the same island. They coalesce with rate 1/c i = 1. Each lineage can migrate with rate M/2 towards any of the two neighbouring islands. Any of these migrations will lead to a configuration in which both lineages are in a pair of islands distant of 1 unit (this is the second configuration that we consider).
From this second configuration (corresponding to the second row), no coalescence can occur and each lineage can only migrate to the next island, leading to two possible configurations. Either the migration brings them back on the same island (and we are back to the first configuration with rate M/2) or one of them migrates to the next island hence increasing the distance between them by one unit to 2 units (this is the third configuration). Since there are n = 5 islands there cannot be a distance greater than two (islands 2 and 5 or islands 1 and 4 are only 2 units distant) and we have thus all possible configurations of the simplified matrix Q. Also, since n = 5 is odd, migration events from this third configuration can only lead to configurations that are identical to itself or to the second one (with rate M/2) (see Figure S2). Some IICR corresponding to the circular stepping stone are shown in Figures S3, S4 and S5.
When there are six demes (n = 6), then we have n/2 = 3, the simplified matrix Q has thus 5 rows and columns: The only difference with the previous example is the fourth configuration, which corresponds to the largest distance of 3 units. From that configuration all migration events necessary lead to the third configuration (corresponding to a distance of 2).

1D stepping-stone model with bouncing edges
Here we consider the edge effects since the two islands located at the extremes of the 1D stepping stone have only one neighbour. The population is divided into n (n ≥ 2) equal-sized island (see Figure S6).
Since there are fewer symmetries than in the circular model, there are significantly more possible configurations in the simplified transition rate matrix Q and we now have to take into account the distance between the two lineages, and the distance from the edge of the linear stepping stone.
The general case can be analysed using combinatorics approaches but this will not be presented here and we will simply give the results for n = 4. Even in this case the simplified version of the transition rate matrix Q has as many as seven rows and seven columns. If we denote by (i, j) the configuration when one lineage is in island i and the other in island j, with i, j = 1 . . . 4, and given the central symmetry of the model, we can enumerate the configurations as follows : If we replace M 2 by M in equation (2) we have M 1 = c 2 M/c 1 . Then, we normalise population sizes by fixing c 1 = 1. Denoting c 2 /c 1 = c 2 by c, we obtain the following transition rate matrix: Note that c is the ratio between the sizes of the islands and the continent, and that the diagonal entries are obtained by the constraint that the sum over each row is zero. Figure S9 shows the IICR s and IICR d plots for the different sample configurations for a pair of genomes in a continent-island model with n = 4 (one continent and three islands). As expected from previous work on the IICR (Mazet et al., 2016;, first generation hybrid individuals, whose genome is sampled in different demes, exhibit IICR plots which would be interpreted as expansions from an ancient stationary population, even though the total population size is constant. One of the most striking result is that a diploid individual sampled in one of the islands exhibits an IICR that suggests (forward in time) an ancient stationary population which first expanded before being subjected to a significant population decrease. Thus, different individuals will exhibit very different history, not because their populations were subjected to different demographic histories, but because the IICR does not represent the history of a population. It represents the coalescent history of a particular sample in a particular model.

Particular case: only one continent and one island
If we focus on the particular case where there is only one continent and one island (i.e. n = 2), then configuration 3 in the case n ≥ 3 does not exist anymore. We thus obtain the following 4 × 4 transition rate matrix:    without edge with edge, island 1 (corner) with edge, island 2 (middle edge) with edge, island 5 (center) Figure S8: IICR plots for the 2D stepping stone model. Here we assumed a model with 3×3 = 9 islands and M = 1, with and without edge effect. In the model with edge effect, we plot the three ways to sample two lineages in the same island: in island 1, 3, 7 or 9 (corner), in island 2, 4, 6 or 8 (middle of the edge), and in island 5 (center of the lattice). Both in the continent Both in one island Continent-island Different islands Figure S9: IICR for a continent-island model. We constructed the transition rate matrix for a model with n = 4, namely one continent and three same-sized islands. The sizes of the continent and of the islands were set to c 1 = 1 and c 2 = 0.05, respectively. In other words, the continent was 20 times larger than the islands. We set the migration rates to M 1 /2 = 0.05, M 2 /2 = 1 (note that once M 1 is set, M 2 is constrained to keep inward and outward migrant gene numbers equal, as required by equation 1). In this model there are only four types of IICR curves, two IICR s and two IICR d . The first two correspond to the cases where we sample the two lineages either in the continent or in one of the islands. The IICR d curves correspond to cases where one gene comes from the continent and the other from an island or when the two genes come from two different islands.