A conditional likelihood is required to estimate the selection coefficient in ancient DNA

Time-series of allele frequencies are a useful and unique set of data to determine the strength of natural selection on the background of genetic drift. Technically, the selection coefficient is estimated by means of a likelihood function built under the hypothesis that the available trajectory spans a sufficiently large portion of the fitness landscape. Especially for ancient DNA, however, often only one single such trajectories is available and the coverage of the fitness landscape is very limited. In fact, one single trajectory is more representative of a process conditioned both in the initial and in the final condition than of a process free to visit the available fitness landscape. Based on two models of population genetics, here we show how to build a likelihood function for the selection coefficient that takes the statistical peculiarity of single trajectories into account. We show that this conditional likelihood delivers a precise estimate of the selection coefficient also when allele frequencies are close to fixation whereas the unconditioned likelihood fails. Finally, we discuss the fact that the traditional, unconditioned likelihood always delivers an answer, which is often unfalsifiable and appears reasonable also when it is not correct.

genetics, albeit sometimes not completely realistic, provide a clear platform to derive analytical results easy to interpret and generalize. The aim of this work is to introduce a new likelihood function that works for any strength of the selection coefficient and for any value of the frequency, i.e., also for frequencies close to the fixation boundary. Accordingly, in order to understand the potentiality and the limits of such analysis we will first work with the Moran model of population genetics, which is by far one of the most intensively studied and successful metaphor of evolution under selection and drift 4,13,14 . We will then study the same problem with the one-locus two-allele Wright-Fisher model, which is definitively a more complex and more realistic metaphor of natural selection and drift 6 .
As we shall see, extracting the selection coefficient even in such a simple set-up is tricky. If one uses the wrong likelihood, apparently meaningful, self-consistent but otherwise incorrect conclusions are produced. The key point will be to understand that single time-series of processes that are per se non-stationary need to be treated as stochastic processes conditioned both in the initial and the final condition.

Results
The models that we consider have two types of alleles, A and B. In the Moran model we will have haploid individuals carrying the alleles of type A and B. In the Wright-Fisher model we will have diploid individuals carrying a pair of alleles of types A and B in one autosomal locus. Although these two models differ in structure and complexity, it is still possible to provide a common description of the underlying process of selection and drift. We start by considering a population of N alleles. To allow for a common treatment of both models we will assume that N is an even number. At any point in time, N A and N B are the number of alleles of type A and B in the population, respectively, and at each time point N A + N B = N holds. We will say that N A and N B are the frequencies of alleles A and B, respectively. Throughout the whole manuscript, we assume that these frequencies can be measured exactly (no sampling errors).
We will follow the fate of the number of alleles of type B whose dynamics will be described as a Markov chain in discrete time with two absorbing states in N B = 0 and N B = N. These two absorbing states correspond to the fixation of allele A and B, respectively.
A single historical trajectory of T + 2 measurements for the frequency N B can be used to estimate the selection coefficient. The trajectory has a initial condition N B (0) = i, followed by T intermediate measurements from strictly consecutive updates and one additional, final measurement at T F . In what follows, while N B (0) is the same for all cases studied here, we consider various options for the timing T F of the final measurement and for the value of the frequency of the alleles of type B at T F , N B (T F ). We will also assume that the time is measured in generations, even if, strictly speaking, in the Moran model the generations are overlapping and in the Wright-Fisher model they are non-overlapping. We consider a total of four different limiting cases (Fig. 1). On the one hand, the first two cases are when T F is just one generation after the Tth measurement, i.e., T F = T + 1. Ideally, these first two cases correspond to time series of consecutive generations finishing at present time. Case I is defined when N B (T F ) is at an intermediate frequency, i.e., N B (T F ) ≠ 0, N. Case II is when N B (T F ) = N, namely when the allele of type B has reached fixation before or at present time. On the other hand, the second two cases are when generation T F is long after generation T, i.e., T F ≫ T. Ideally, this corresponds to trajectories where the initial time t = 0 of the temporal observation is far back in the past so that also after T generations the time-series of available data is still far back in the past. Here, generation T F is at present time and N B (T F ) is known but the values of N B at times between generations T and T F are missing. We then distinguish between case III, when the present frequency N B (T F ) is at any intermediate frequency, i.e., N B (T F ) ≠ 0, N, and case IV where the present frequency is at fixation for the allele of type B, i.e., N B (T F ) = N. Obviously, cases III and IV reduce to cases I and/or II when the frequency at present time is ignored. As will become clear later, these cases are definitively different depending on the assumption that one makes for the present state. One can also recognize that case III is the most studied one in the literature so far 2,6,15 . Since in cases II and IV fixation can occur at any generation including generations t < T, with the Wright-Fisher model we have also considered a variant of these two cases in which N B (T F ) = N − 1, i.e., very close to fixation but not yet fixed. These variants do not present substantial differences in the results and are further discussed below. Schematic representation of the four cases considered here. The green bars represent available data for T consecutive generations, whereas the blue dashed lines represent non available data. The time arrow goes from left to right with the present time called generation T F . The data includes an initial condition at generation zero. We follow the trajectory of the allele B, whose frequency at present time is known in all cases. In cases I and II, T F is just one generation after the measurement T, i.e., T F = T + 1, so that the available data concern the recent history of the allele. In cases III and IV the measurement T F is made a long time after the measurement T. We can think of the cases III and IV as time-series where both the ancient history and the present frequency are known in detail but data in between are missing. Within the Wright-Fisher model we consider a variant of cases II and IV, in which B is very close to fixation but not yet fixed. For each one of these four cases we generate 100 independent time-series while keeping the selection coefficient fixed to S = 2, whose meaning is explained below for each of the two models separately. We generate such trajectories via stochastic simulations and then analyze them with the likelihood developed below to prove if we are able to reliably extract the selection coefficient. Within each of the four cases I to IV, all trajectories share the same initial and final conditions N B (0) and N B (T F ), respectively, but are otherwise completely independent.
Each trajectory is fully described by the index functional δ ij (t) ∈ {0, 1} such that namely δ ij (t) = 1 when a transition from frequency i to frequency j of the number of B alleles occurs at time step t. The index t runs over the measurements, t = 0, 1, 2, … , T. Thanks to this functional, the selection coefficient can be estimated by means of the conditioned likelihood where lowercase s refers to the estimated value of S and the conditional transition probability is defined as The selection coefficient s enters into this definition through the explicit form of the model as will be discussed in detail below and in the Methods section.
The application of the conditioning at the final frequency N B at the end of the trajectory allows to explicitly write the relationship between the conditioned and the non-conditioned transition probabilities by exploiting the Markov property of the chains, as 16 which in a shorthand we write as where P ij (s) are the non-conditioned transition probabilities as defined by the model. The functions φ ij|k are instead complex functionals, determined by the Doob's h-transform, that depend on s, i, j and T F − t (Methods). If we could ideally access a large number of trajectories collected under the same initial condition but free to cover the available fitness landscape, only the initial condition would matter and the condition in the final state is no longer necessary. This case is what one encounters in experimental evolution. The estimation of the selection coefficient in those cases should be made by means of the unconditioned likelihood 17 where n({i, j}) is the number of transitions between each pair of frequencies i and j in the ensemble of trajectories. The number of transitions can be computed through The likelihood function L(s), and variations thereof that take sampling errors into account, is the most commonly used function to estimate the selection coefficient 2 . For a correct interpretation of the results presented below it is relevant to note that L c (s) and L(s) are related through c where Φ (s) is a complex functional depending on the φ ij|k and on the specific trajectory described by δ ij (t).
In the following we present a comparison of the estimated value of the selection coefficient from the likelihood L c (s) and from the likelihood L(s), for the Moran and the Wright-Fisher models. Applied to each single trajectory both likelihoods allow to derive the most likely value of s. The variation of the maximum likelihood estimates across the set of 100 time-series for each of the four cases introduced earlier (Fig. 1) gives a distribution from which the average and the 95% confidence interval can be estimated. For this model, the 100 trajectories of type B frequencies for each of the four types ( Fig. 1) have a duration of T = 500 generations. The trajectories have been generated by standard methods for conditioned processes 16,19 and then both the conditioned likelihood L c (s) and the non-conditioned likelihoods L(s) have been numerically derived as described. Surprisingly, only for the case III, i.e., time-series far back in the past with the character not yet fixed at present time, also L(s) delivers a selection coefficient close to the true one (Fig. 2, grey boxes). In all other cases I, II, and IV, L(s) delivers selection coefficients that are quite different from the true one.
The Wright-Fisher model for one-locus two alleles. In the Wright-Fisher model we consider an autosomal locus of a diploid organism with two alleles A and B. Reproduction occurs with perfect mixing but population size is fixed to a total of N alleles (corresponding to N/2 individuals). The three possible genotypes have fitness W AA , W BB and W AB . The selection coefficient is S = W AA /W BB with codominance implying W AB /W BB = (1 + S)/2. With these choices, in the absence of genetic drift the evolutionary trajectory would deterministically lead to the fixation of the allele A. For finite populations instead, the zygotes of the next generation are sampled from the gametes from the previous generation, in which the frequency of the alleles A and B have been determined through the evolutionary dynamics. The number N B of alleles of type B in a finite adult population thus changes randomly from one generation to the next as a result of selection and drift (Methods). Also here the number of alleles N B is described as a Markov chain with two absorbing states in 0 and N, corresponding to the fixation of allele A and B, respectively.
This model is numerically more challenging than the Moran model. In particular, the time scale to fixation is shorter than for the Moran model because here the generations are non-overlapping. Here, thus, each trajectory has a duration of T = 100 generations. As for the Moran model, we have generated 100 independent time-series for each of the four types (Fig. 1). Using the transition probabilities of this model and the δ ij (t), we have numerically derived the conditional likelihood L c (s) and the unconditioned likelihood L(s). The results are qualitatively similar to the ones for the Moran model (Fig. 3). For type III trajectories, however, the two likelihoods perform differently, with L c (s) providing a good estimate of the selection coefficient and L(s) a poor estimate. As discussed below, this has to do with the very rapid time scales of the Wright-Fisher model. If T is set to 10 generations instead of 100, the estimate from L(s) becomes closer to the true value. Due to its rapid time scales, it was also convenient to set N B (T F ) = N − 1, i.e., very close to fixation, in order to have relatively long trajectories.

Methods
In both models considered here, the process N B is a Markov chain in discrete time in a finite state space {0, 1, … , N}. These Markov chains are characterized by the one step transition probability matrix P whose elements P ij are independent of time and are defined as The factors φ ij|k that enter into the definition of the conditioned transition probabilities can be explicitly written by exploiting the definition of conditional probabilities and the Markov property of the process 16 as which are non-negative functions dependent explicitly on i, j, k and T F − t for t = 0, 1, 2… , T. When T F = T + 1, as in the cases I and II (Fig. 2), the factors φ ij|k depend explicitly on time and change in such a way to realize the condition N B (T F ). Nevertheless, the knowledge of the transition probabilities P ij defined in Eq. (8) allow to compute all likelihoods through Eq. (9) for any choice of the parameters. When  T T F , as in the cases III and IV (Fig. 1), the factors φ ij|k do not depend on time 20 and can be computed as the mathematical limit T F → ∞ by exploiting the spectral properties of the transition probability matrix P. When k is a transient state, i.e., k ≠ 0, N, then where u ik is the probability of absorption in k for a process started in i. Since deciding when T F is sufficiently large to allow using these last limit cases may depend on the system 15 , the definition given in Eq. (9) was used to the limits of numerical precision for large powers. For the Moran model, at each generation, each individual of type A produces a number of offspring equal to W A and each individual of type B produces a number of offspring equal to W B . At each generation, just one among the entire pool of N A W A + N B W B offspring is chosen at random. This new individual, then, replaces one randomly chosen individual in the parents' population. With this dynamics, the population size remains constant but the frequencies N A and N B change with time. Eventually, all individuals will be either of type A or of type B.
We follow the fate of character B. At each generation and before fixation occurs, the frequency N B can increase by one, decrease by one or stay the same. Based on the dynamics described above, the probabilities associated to the changes of N B are given by where the selection coefficient S = W A /W B is non-negative and the transition probabilities are independent of time. When 0 ≤ S < 1 the individuals of type B have a selective advantage with respect to individuals of type A (i.e., P j > Q j ) and vice versa when S > 1. The borderline case S = 1 corresponds to neutral evolution. The probabilities P i , Q i and R i form the elements of the transition probability matrix The fixation probabilities as function of the initial frequencies and of the selection coefficient can be computed as absorption probabilities from this matrix 4,16,18 .
For the Wright-Fisher model, let N B (t) = i be the number of alleles of type B in the adult population at generation t. Then, according to the evolutionary dynamics the frequency of the allele B in the successive gamete population is 3 where p B (i) = i/N, p A (i) = 1 − p B (i) and W O is the average fitness of the adult population, defined as The frequency of the allele of type B in the new adult population is obtained through random sampling and leads to the transition probabilities according to the binomial sampling.

Discussion
At a first sight, it may seem odd that the correct likelihood should depend on the conditional transition probabilities P ij|k (s). In fact, L c (s) is computed on one single trajectory of a stochastic process governed by selection and genetic drift. The key point is that single trajectories of a stochastic process should be considered as representative of a bundle of trajectories starting and ending at fixed conditions. Functionals of single trajectories are thus conditioned not only in the initial condition but also in their final condition. When only one realization of ancient DNA variations is available a special form of conditioning in the future has to be included in order to correctly estimate the selection parameters. Such processes were already studied by Schrödinger 21 who recognized the emergence of possible contradictory claims from the observation of diffusion trajectories conditioned in their initial and final positions. In mathematics, this kind of conditioning has been studied in the context of Brownian bridges, namely processes conditioned both at their initial and final value, a precise description of which requires the introduction of the Doob's h-transform. More recently, the Doob's h-transform has become an essential theoretical tool to study the statistics of rare events 22 and to understand circular arguments in statistical analysis 16,23 . It was also shown that this transform emerges necessarily when trajectories are selected on the basis of their outcome 16 . The likelihood L(s) defined in Eq. (6), based on the transition probabilities P ij given in Eq. (12) is not the one that should be used to extract a parameter like the selection coefficient from one given trajectory. Indeed, L(s) fails in almost all cases to provide a realistic interval of confidence. The reason for the failure of this method is born in the fact that a given realization does not reveal if it is an unlikely event of a process that would otherwise typically behave differently. As a matter of fact, the process behind a given realization is rather more representative of a process conditioned (in probabilistic terms) to end at the frequency observed at its final observation. If one knows, from first principles, what is the microscopic (molecular) mechanism driving the process under scrutiny then one can follow the procedure explained in this work, derive the conditional probabilities P ij|k and write the likelihood L c (s) in terms of these conditioned quantities. This quite obviously provides a good estimate of the selection coefficient. A crucial requirement for the success of this enterprise is the knowledge of the correct model to use.
The use of the unconditioned likelihood L(s) would still give an answer, i.e., a value of s that is apparently consistent with the data. Indeed, case I, which describes a process conditioned on ending at an intermediate state k ≠ 0, N would lead to support the idea of neutral evolution or balancing selection and in fact, L(s) yields a value of s close to unity. In case II, when k = N instead, the individuals of type B would get fixed in the population and the analysis of such a trajectory by means of unconditioned likelihood L(s) would lead to support the idea of a selective advantage in favor of type B even if type A individuals have a selective advantage by construction. Moreover, the time dependence of the transition probabilities, due simply to the effect of conditioning as seen in Eq. (9), would deceptively support the idea of changing environmental conditions. We see that these conclusions, albeit logical from the point of view of explaining the observations a posteriori, are determined by conditioning, i.e., by the fact that N B (T F ) takes a particular value. Given our a priori knowledge of how we have generated the trajectories, conclusions taken through the analysis with the likelihood L(s) would be therefore deprived of any foundation. But if we had no such a priori knowledge, there would be no way to confirm or reject the conclusions based on L(s). Case III, with data coming from far back in the past and no fixation, presents some peculiarity. For the Moran model L(s) gives a relatively good estimate of the true selection coefficient whereas for the Wright-Fisher model it does not. The reason relies on the different time scales associated to absorption in each of these models. One step in the Wright-Fisher model corresponds to at least N steps in the Moran model. Thus, when the duration T of the time-series is very long and no absorption takes place at the end or close to the end of the time-series, the analysis performed with L(s) leads to a value of s close to unity, compatible with the apparent neutrality of the evolutionary trajectory. When T is short, instead, also for the Wright-Fisher model L(s) delivers a value of s closer to the true value (a test done with T = 10 confirmed this assertion). Therefore, the effect of conditioning in the future combined with the typical time scale of the process and the length of the measurement T is non trivial 15 . Finally, in case IV conditioning can be very strong because the process can enter fixation at any time before the present, including times during the observed time-series. From the point of view of the likelihood L(s), case IV would give type B individuals a selective advantage where L c (s) instead correctly predicts that A was in advantage. Furthermore, in the light of the relationship between L c (s) and L(s), it emerges especially in trajectories belonging to case II that L c (s) is bimodal, with a local maximum governed by L(s) and a second local maximum at larger values of s governed by Φ (s). This explains the larger confidence interval for this case in both models. This suggests that the ratio of the likelihoods R(s) = L c (s)/L(s) rather than L c (s) alone could be considered an even better functional to estimate the true value of the selection coefficient.
It had already been observed in the context of other models of population genetics that the generation of faithful trajectories of allele frequencies under the condition that fixation has occurred requires the introduction of a fictitious selection coefficient 19,24 . In the context of the Moran model instead, it was shown that under the condition that fixation has not occurred after long time, the transition probabilities require a correction factor 20 . While both these cases are included and generalized in this manuscript, we should stress here instead that extrapolating the selection coefficients from single historic records without due consideration to the peculiar conditioning associated to single trajectories gives values of the selection coefficients that are often very different from the real values.