A conditional likelihood is required to estimate the selection coefficient in ancient DNA

Valleriani, Angelo

doi:10.1038/srep31561

Article
Open access
Published: 16 August 2016

Scientific Reports volume 6, Article number: 31561 (2016)

Abstract

Time-series of allele frequencies are a useful and unique set of data to determine the strength of natural selection on the background of genetic drift. Technically, the selection coefficient is estimated by means of a likelihood function built under the hypothesis that the available trajectory spans a sufficiently large portion of the fitness landscape. Especially for ancient DNA, however, often only a single such trajectory is available and the coverage of the fitness landscape is very limited. In fact, a single trajectory is more representative of a process conditioned both in the initial and in the final condition than of a process free to visit the available fitness landscape. Based on two models of population genetics, here we show how to build a likelihood function for the selection coefficient that takes the statistical peculiarity of single trajectories into account. We show that this conditional likelihood delivers a precise estimate of the selection coefficient even when allele frequencies are close to fixation, whereas the unconditioned likelihood fails. Finally, we discuss the fact that the traditional, unconditioned likelihood always delivers an answer, which is often unfalsifiable and appears reasonable even when it is not correct.

Introduction

Past records of the frequency of a character, i.e., an allele or a phenotype, until present observational time are often the only source of information to infer the strength of selection on that character. Time series of ancient DNA, in particular, are becoming available thanks to modern advances in preparation and sequencing methods^1,2. These past records deliver the fluctuating frequency of an allele over time. The nature of these fluctuations is characterized by the combined effect of various mechanisms, the simplest of which are natural selection and genetic drift, on which we will focus our attention here. While natural selection drives the frequency towards fixation or stabilization, genetic drift caused by a small effective population size works towards elimination of genetic diversity and, thus, towards fixation of one of the characters or alleles³. As such, if the population size is known, genetic drift is a noisy effect that changes the frequency of the alleles and masks the effect of selection.

Natural selection can be theoretically described with relatively simple population genetics models, such as the Moran and the Wright-Fisher models^3,4. At the basis of these models, the effect of natural selection is often crystallized in one single parameter per locus, called the selection coefficient. One of the tasks ahead of the analysis of DNA time-series is thus the extrapolation of the underlying selection coefficient. Indeed, the selective advantage of a certain character is quite impossible to determine from first principles, e.g. from an evaluation of metabolic costs and benefits, with the exception perhaps of a few experimentally controlled cases in bacterial populations. But even in bacteria, the advantage of a certain gene compared to another is determined indirectly, mostly by competition experiments and growth rate measurements⁵.

Various methods, mostly based on maximum likelihood techniques, have been developed to duly take both genetic drift and sampling errors into account^6,7,8,9. Several studies have considered the limiting case of determining the selection coefficient in the absence of genetic drift, i.e., with large populations, thus taking in effect a deterministic approach^5,10,11,12. The limiting cases that we consider here are a haploid character with two competing alleles and a one-locus two-allele model with selection and codominance. We consider a finite population with perfect sampling. These conditions allow an analytic and precise treatment of the effect of genetic drift.

Setting aside those cases where the population size is too large for genetic drift to play any role, in the general case it is possible that the less advantageous character or allele is present at a larger frequency than a competing but more advantageous character. Nevertheless, we may inquire if and when a given time-series of the frequencies is informative of the relative selection strength of the two competing characters. Simple models of population genetics, albeit sometimes not completely realistic, provide a clear platform to derive analytical results that are easy to interpret and generalize. The aim of this work is to introduce a new likelihood function that works for any strength of the selection coefficient and for any value of the frequency, i.e., also for frequencies close to the fixation boundary. Accordingly, in order to understand the potential and the limits of such an analysis we will first work with the Moran model of population genetics, which is one of the most intensively studied and successful metaphors of evolution under selection and drift^4,13,14. We will then study the same problem with the one-locus two-allele Wright-Fisher model, which is definitely a more complex and more realistic metaphor of natural selection and drift⁶.

As we shall see, extracting the selection coefficient even in such a simple set-up is tricky. If one uses the wrong likelihood, apparently meaningful, self-consistent but otherwise incorrect conclusions are produced. The key point will be to understand that single time-series of processes that are per se non-stationary need to be treated as stochastic processes conditioned both in the initial and the final condition.

Results

The models that we consider have two types of alleles, A and B. In the Moran model we will have haploid individuals carrying the alleles of type A and B. In the Wright-Fisher model we will have diploid individuals carrying a pair of alleles of types A and B in one autosomal locus. Although these two models differ in structure and complexity, it is still possible to provide a common description of the underlying process of selection and drift. We start by considering a population of N alleles. To allow for a common treatment of both models we will assume that N is an even number. At any point in time, N_A and N_B are the number of alleles of type A and B in the population, respectively and at each time point N_A + N_B = N holds. We will say that N_A and N_B are the frequencies of alleles A and B, respectively. Throughout the whole manuscript, we assume that these frequencies can be measured exactly (no sampling errors).

We will follow the fate of the number of alleles of type B whose dynamics will be described as a Markov chain in discrete time with two absorbing states in N_B = 0 and N_B = N. These two absorbing states correspond to the fixation of allele A and B, respectively.

A single historical trajectory of T + 2 measurements for the frequency N_B can be used to estimate the selection coefficient. The trajectory has an initial condition N_B(0) = i, followed by T intermediate measurements from strictly consecutive updates and one additional, final measurement at T_F. In what follows, while N_B(0) is the same for all cases studied here, we consider various options for the timing T_F of the final measurement and for the value N_B(T_F) of the frequency of the alleles of type B at T_F. We will also assume that time is measured in generations, even if, strictly speaking, in the Moran model the generations are overlapping and in the Wright-Fisher model they are non-overlapping.

We consider a total of four different limiting cases (Fig. 1). The first two cases are when T_F is just one generation after the Tth measurement, i.e., T_F = T + 1. Ideally, these first two cases correspond to time series of consecutive generations finishing at present time. Case I is defined when N_B(T_F) is at an intermediate frequency, i.e., N_B(T_F) ≠ 0, N. Case II is when N_B(T_F) = N, namely when the allele of type B has reached fixation before or at present time. The second two cases are when generation T_F is long after generation T, i.e., T_F ≫ T. Ideally, this corresponds to trajectories where the initial time t = 0 of the temporal observation is far back in the past so that also after T generations the time-series of available data is still far back in the past. Here, generation T_F is at present time and N_B(T_F) is known but the values of N_B at times between generations T and T_F are missing. We then distinguish between case III, when the present frequency N_B(T_F) is at any intermediate frequency, i.e., N_B(T_F) ≠ 0, N, and case IV, where the present frequency is at fixation for the allele of type B, i.e., N_B(T_F) = N.

Obviously, cases III and IV reduce to cases I and/or II when the frequency at present time is ignored. As will become clear later, these cases are definitely different depending on the assumption that one makes for the present state. One can also recognize that case III is the most studied one in the literature so far^2,6,15. Since in cases II and IV fixation can occur at any generation including generations t < T, with the Wright-Fisher model we have also considered a variant of these two cases in which N_B(T_F) = N − 1, i.e., very close to fixation but not yet fixed. These variants do not present substantial differences in the results and are further discussed below.

For each one of these four cases we generate 100 independent time-series while keeping the selection coefficient fixed to S = 2, whose meaning is explained below for each of the two models separately. We generate such trajectories via stochastic simulations and then analyze them with the likelihood developed below to test whether we are able to reliably extract the selection coefficient. Within each of the four cases I to IV, all trajectories share the same initial and final conditions N_B(0) and N_B(T_F), respectively, but are otherwise completely independent.

Each trajectory is fully described by the index functional δ_ij(t) ∈ {0, 1} such that

$$\delta_{ij}(t) = \begin{cases} 1 & \text{if } N_B(t) = i \text{ and } N_B(t+1) = j, \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$

namely δ_ij(t) = 1 when a transition from frequency i to frequency j of the number of B alleles occurs at time step t. The index t runs over the measurements, t = 0, 1, 2, …, T. Thanks to this functional, the selection coefficient can be estimated by means of the conditioned likelihood

$$L_c(s) = \prod_{t=0}^{T} \prod_{i,j} \left[ P_{ij|k}(s) \right]^{\delta_{ij}(t)}, \tag{2}$$

where lowercase s refers to the estimated value of S and the conditional transition probability is defined as

$$P_{ij|k}(s) = \Pr\{ N_B(t+1) = j \mid N_B(t) = i,\ N_B(T_F) = k \}. \tag{3}$$

The selection coefficient s enters into this definition through the explicit form of the model as will be discussed in detail below and in the Methods section.

Conditioning on the final frequency N_B(T_F) = k at the end of the trajectory allows us to write explicitly the relationship between the conditioned and the non-conditioned transition probabilities by exploiting the Markov property of the chains, as¹⁶

$$\Pr\{ N_B(t+1) = j \mid N_B(t) = i,\ N_B(T_F) = k \} = \Pr\{ N_B(t+1) = j \mid N_B(t) = i \}\, \frac{\Pr\{ N_B(T_F) = k \mid N_B(t+1) = j \}}{\Pr\{ N_B(T_F) = k \mid N_B(t) = i \}}, \tag{4}$$

which in a shorthand we write as

$$P_{ij|k}(s) = P_{ij}(s)\, \phi_{ij|k}, \tag{5}$$

where P_ij(s) are the non-conditioned transition probabilities as defined by the model. The functions ϕ_ij|k are instead complex functionals, determined by Doob’s h-transform, that depend on s, i, j and T_F − t (Methods).

If we could ideally access a large number of trajectories collected under the same initial condition but free to cover the available fitness landscape, only the initial condition would matter and the condition in the final state would no longer be necessary. This case is what one encounters in experimental evolution. The estimation of the selection coefficient in those cases should be made by means of the unconditioned likelihood¹⁷

$$L(s) = \prod_{i,j} \left[ P_{ij}(s) \right]^{n(\{i,j\})}, \tag{6}$$

where n({i, j}) is the number of transitions between each pair of frequencies i and j in the ensemble of trajectories. The number of transitions can be computed through n({i, j}) = Σ_t δ_ij(t). The likelihood function L(s), and variations thereof that take sampling errors into account, is the most commonly used function to estimate the selection coefficient². For a correct interpretation of the results presented below it is relevant to note that L_c(s) and L(s) are related through

$$L_c(s) = L(s)\, \Phi(s), \tag{7}$$

where Φ(s) is a complex functional depending on the ϕ_ij|k and on the specific trajectory described by δ_ij(t).
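
As a worked illustration of the relation between L_c(s) and L(s), the following sketch shows that the product of the factors ϕ_ij|k along one trajectory telescopes. It assumes the standard bridge form of the factors stated in the first comment line; the symbols h_m, x_t and n are notation introduced here for illustration only and do not appear in the original text.

```latex
% Assumption: with h_m(i) = [P^m(s)]_{ik}, the factor for a step i -> j at time t is
%   \phi_{ij|k} = h_{T_F - t - 1}(j) / h_{T_F - t}(i).
% Let x_0, x_1, \dots, x_n be the states observed at consecutive steps t = 0, \dots, n,
% with n \le T_F and final condition N_B(T_F) = k.
\begin{align}
\Phi(s) &= \prod_{t=0}^{n-1} \frac{h_{T_F-t-1}(x_{t+1})}{h_{T_F-t}(x_t)}
         = \frac{h_{T_F-n}(x_n)}{h_{T_F}(x_0)}, \\
L_c(s)  &= L(s)\,\Phi(s)
         = \frac{h_{T_F-n}(x_n)}{h_{T_F}(x_0)} \prod_{t=0}^{n-1} P_{x_t x_{t+1}}(s).
\end{align}
% If the last observed state is the conditioning state (n = T_F, x_n = k), then
% h_0(k) = 1 and \Phi(s) = 1 / [P^{T_F}(s)]_{x_0 k}.
```

Under this assumption Φ(s) depends on the trajectory only through its first and last observed states and the time remaining to T_F, which is one way to see why L_c(s) and L(s) can peak at different values of s.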

In the following we present a comparison of the estimated value of the selection coefficient from the likelihood L_c(s) and from the likelihood L(s), for the Moran and the Wright-Fisher models. Applied to each single trajectory, both likelihoods allow one to derive the most likely value of s. The variation of the maximum likelihood estimates across the set of 100 time-series for each of the four cases introduced earlier (Fig. 1) gives a distribution from which the average and the 95% confidence interval can be estimated.
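
The estimation procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names, the grid search and the percentile-based interval are assumptions made here, and `toy_prob` is a hypothetical two-outcome model used only to exercise the functions. Any callable returning P_ij(s) could be passed in its place.

```python
import math
import statistics
from collections import Counter

def transition_counts(trajectories):
    """n({i, j}): number of i -> j transitions, summed over all trajectories."""
    counts = Counter()
    for traj in trajectories:
        counts.update(zip(traj[:-1], traj[1:]))
    return counts

def log_likelihood(trajectories, trans_prob, s):
    """log L(s) = sum over (i, j) of n({i, j}) * log P_ij(s)."""
    total = 0.0
    for (i, j), n in transition_counts(trajectories).items():
        p = trans_prob(i, j, s)
        if p <= 0.0:
            return -math.inf  # an observed transition is impossible under s
        total += n * math.log(p)
    return total

def grid_mle(trajectories, trans_prob, grid):
    """Value of s on the grid that maximizes log L(s)."""
    return max(grid, key=lambda s: log_likelihood(trajectories, trans_prob, s))

def summarize(estimates):
    """Mean and central 95% percentile interval of a list of estimates."""
    cuts = statistics.quantiles(estimates, n=40, method="inclusive")
    return statistics.mean(estimates), cuts[0], cuts[-1]

# Hypothetical toy model, used only to exercise the functions:
# the walk steps up with probability 1/(1+s) and down with probability s/(1+s).
def toy_prob(i, j, s):
    if j == i + 1:
        return 1.0 / (1.0 + s)
    if j == i - 1:
        return s / (1.0 + s)
    return 0.0

example = [[5, 6, 5, 4, 3, 4, 3, 2]]        # 2 steps up, 5 steps down
grid = [0.5 + 0.1 * k for k in range(36)]   # s from 0.5 to 4.0
s_hat = grid_mle(example, toy_prob, grid)   # analytic maximum is at s = 5/2
```

Applying `grid_mle` to each of the 100 trajectories of a case and passing the resulting list to `summarize` would give the kind of average and interval described in the text, under the assumption that the interval is read off the empirical percentiles.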

The Moran model

We consider the simplest version of the Moran model^4,13,18, which consists of a population of N individuals split into N_A individuals carrying the character A and N_B individuals carrying the character B. Except for the characters A and B, the individuals are identical. Individuals of type A have fitness W_A and individuals of type B have fitness W_B. The selection coefficient is S = W_A/W_B. In the Moran model the generations are overlapping and the dynamics runs as follows. At each time point t, one of the existing individuals reproduces with a probability proportional to its fitness. The resulting offspring is identical to the parent individual and replaces one of the existing individuals chosen at random with uniform probability. At each time step, thus, the number N_B of B individuals can increase or decrease by one, or stay the same with probabilities that depend on N_B, N and S (Methods). The Moran model is thus a random walk on a line for the number N_B, with two absorbing states in 0 and N corresponding to the fixation of the character A and B, respectively.
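
The dynamics just described can be sketched as a short simulation. This is only an illustration under stated assumptions: the replaced individual is drawn uniformly from all N individuals (the parent included), fitness is measured in units of W_B, and the function names are hypothetical. The sketch generates unconditioned trajectories, whereas the trajectories analyzed in this work are conditioned on their final state.

```python
import random

def moran_step_probs(i, N, S):
    """Probabilities that N_B = i goes up by one, down by one, or stays.

    Assumes S = W_A / W_B, one birth chosen proportionally to fitness and
    one replaced individual chosen uniformly among all N individuals.
    """
    if i == 0 or i == N:
        return 0.0, 0.0, 1.0              # absorbing states
    total_fitness = i + S * (N - i)       # in units of W_B
    up = (i / total_fitness) * ((N - i) / N)        # B reproduces, A is replaced
    down = (S * (N - i) / total_fitness) * (i / N)  # A reproduces, B is replaced
    return up, down, 1.0 - up - down

def simulate_moran(i0, N, S, steps, rng):
    """Unconditioned trajectory N_B(0), ..., N_B(steps) starting from i0."""
    traj = [i0]
    for _ in range(steps):
        i = traj[-1]
        up, down, _ = moran_step_probs(i, N, S)
        u = rng.random()
        if u < up:
            i += 1
        elif u < up + down:
            i -= 1
        traj.append(i)
    return traj

rng = random.Random(1)  # fixed seed so the run is reproducible
traj = simulate_moran(i0=50, N=100, S=2.0, steps=500, rng=rng)
```

With these step probabilities the ratio of the down to the up probability equals S at every interior state, so S > 1 biases the walk towards the loss of B.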

For this model, the 100 trajectories of type B frequencies for each of the four types (Fig. 1) have a duration of T = 500 generations. The trajectories have been generated by standard methods for conditioned processes^16,19 and then both the conditioned likelihood L_c(s) and the non-conditioned likelihood L(s) have been numerically derived as described. Surprisingly, L(s) delivers a selection coefficient close to the true one only in case III, i.e., for time-series far back in the past with the character not yet fixed at present time (Fig. 2, grey boxes). In all other cases I, II and IV, L(s) delivers selection coefficients that are quite different from the true one.

The Wright-Fisher model for one-locus two alleles

In the Wright-Fisher model we consider an autosomal locus of a diploid organism with two alleles A and B. Reproduction occurs with perfect mixing but population size is fixed to a total of N alleles (corresponding to N/2 individuals). The three possible genotypes have fitness W_AA, W_BB and W_AB. The selection coefficient is S = W_AA/W_BB with codominance implying W_AB/W_BB = (1 + S)/2. With these choices, in the absence of genetic drift the evolutionary trajectory would deterministically lead to the fixation of the allele A. For finite populations instead, the zygotes of the next generation are sampled from the gametes from the previous generation, in which the frequency of the alleles A and B have been determined through the evolutionary dynamics. The number N_B of alleles of type B in a finite adult population thus changes randomly from one generation to the next as a result of selection and drift (Methods). Also here the number of alleles N_B is described as a Markov chain with two absorbing states in 0 and N, corresponding to the fixation of allele A and B, respectively.

This model is numerically more challenging than the Moran model. In particular, the time scale to fixation is shorter than for the Moran model because here the generations are non-overlapping. Here, thus, each trajectory has a duration of T = 100 generations. As for the Moran model, we have generated 100 independent time-series for each of the four types (Fig. 1). Using the transition probabilities of this model and the δ_ij(t), we have numerically derived the conditional likelihood L_c(s) and the unconditioned likelihood L(s). The results are qualitatively similar to the ones for the Moran model (Fig. 3). For type III trajectories, however, the two likelihoods perform differently, with L_c(s) providing a good estimate of the selection coefficient and L(s) a poor estimate. As discussed below, this has to do with the very rapid time scales of the Wright-Fisher model. If T is set to 10 generations instead of 100, the estimate from L(s) becomes closer to the true value. Due to its rapid time scales, it was also convenient to set N_B(T_F) = N − 1, i.e., very close to fixation, in order to have relatively long trajectories.

Methods

In both models considered here, the process N_B is a Markov chain in discrete time in a finite state space {0, 1, …, N}. These Markov chains are characterized by the one step transition probability matrix P whose elements P_ij are independent of time and are defined as

$$P_{ij} = \Pr\{ N_B(t+1) = j \mid N_B(t) = i \}. \tag{8}$$

The factors ϕ_ij|k that enter into the definition of the conditioned transition probabilities can be explicitly written by exploiting the definition of conditional probabilities and the Markov property of the process¹⁶ as

$$\phi_{ij|k} = \frac{\left[ P^{\,T_F - t - 1} \right]_{jk}}{\left[ P^{\,T_F - t} \right]_{ik}}, \tag{9}$$

which are non-negative functions dependent explicitly on i, j, k and T_F − t for t = 0, 1, 2, …, T. When T_F = T + 1, as in the cases I and II (Fig. 2), the factors ϕ_ij|k depend explicitly on time and change in such a way as to realize the condition N_B(T_F). Nevertheless, the knowledge of the transition probabilities P_ij defined in Eq. (8) allows all likelihoods to be computed through Eq. (9) for any choice of the parameters. When T_F ≫ T, as in the cases III and IV (Fig. 1), the factors ϕ_ij|k do not depend on time²⁰ and can be computed as the mathematical limit T_F → ∞ by exploiting the spectral properties of the transition probability matrix P. When k is a transient state, i.e., k ≠ 0, N, then

$$\phi_{ij|k} = \frac{w_0(j)}{\lambda_0\, w_0(i)}, \tag{10}$$

where λ₀ is the largest non-trivial eigenvalue of P and w₀(i) is the i-th component of the corresponding right eigenvector. When k represents fixation, i.e., k = 0 or N, then¹⁶

$$\phi_{ij|k} = \frac{u_{jk}}{u_{ik}}, \tag{11}$$

where u_ik is the probability of absorption in k for a process started in i. Since deciding when T_F is sufficiently large to allow using these last limit cases may depend on the system¹⁵, the definition given in Eq. (9) was used to the limits of numerical precision for large powers.
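
A minimal sketch of how the finite-horizon factors and the conditioned likelihood could be computed is given below. It assumes that ϕ_ij|k is the ratio of the probability of reaching k from j in T_F − t − 1 steps to the probability of reaching k from i in T_F − t steps, and it evaluates those probabilities by repeated matrix-vector products. The function names and the 5-state matrix `P` are hypothetical and serve only to exercise the code.

```python
import math

def mat_vec(P, v):
    """Matrix-vector product (P v)(i) = sum_j P[i][j] * v[j]."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def reach_probs(P, k, max_steps):
    """hs[m][i] = [P^m]_{ik}: probability of being in k exactly m steps after i."""
    h = [1.0 if i == k else 0.0 for i in range(len(P))]
    hs = [h]
    for _ in range(max_steps):
        h = mat_vec(P, h)
        hs.append(h)
    return hs

def phi(hs, i, j, steps_left):
    """Factor phi_{ij|k} for a step i -> j taken when steps_left = T_F - t."""
    return hs[steps_left - 1][j] / hs[steps_left][i]

def conditioned_log_likelihood(traj, P, k, t_final):
    """log L_c for states traj[0], traj[1], ... observed at consecutive steps."""
    hs = reach_probs(P, k, t_final)
    total = 0.0
    for t in range(len(traj) - 1):
        i, j = traj[t], traj[t + 1]
        total += math.log(P[i][j] * phi(hs, i, j, t_final - t))
    return total

# Hypothetical 5-state chain with absorbing ends (down 0.4, stay 0.4, up 0.2).
P = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.4, 0.4, 0.2, 0.0, 0.0],
    [0.0, 0.4, 0.4, 0.2, 0.0],
    [0.0, 0.0, 0.4, 0.4, 0.2],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
traj = [2, 3, 3, 4]  # ends in the conditioning state k = 4 at t_final = 3
log_lc = conditioned_log_likelihood(traj, P, k=4, t_final=3)
```

In this toy example the unconditioned probability of the path is 0.2 × 0.4 × 0.2 = 0.016 and the probability of reaching state 4 from state 2 in three steps is 0.072, so the conditioned likelihood is their ratio. For long horizons the entries of `hs` can underflow, which is the numerical limit mentioned in the text.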

For the Moran model, at each generation, each individual of type A produces a number of offspring equal to W_A and each individual of type B produces a number of offspring equal to W_B. At each generation, just one among the entire pool of N_A W_A + N_B W_B offspring is chosen at random. This new individual, then, replaces one randomly chosen individual in the parents’ population. With this dynamics, the population size remains constant but the frequencies N_A and N_B change with time. Eventually, all individuals will be either of type A or of type B.

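We follow the fate of character B. At each generation and before fixation occurs, the frequency N_B can increase by one, decrease by one or stay the same. Based on the dynamics described above, the probabilities associated with the changes of N_B = i are given by

$$P_i = \frac{i}{i + S (N - i)}\, \frac{N - i}{N}, \qquad Q_i = \frac{S (N - i)}{i + S (N - i)}\, \frac{i}{N}, \qquad R_i = 1 - P_i - Q_i, \tag{12}$$

where P_i, Q_i and R_i are the probabilities that N_B increases by one, decreases by one or stays the same, respectively, the selection coefficient S = W_A/W_B is non-negative and the transition probabilities are independent of time. When 0 ≤ S < 1 the individuals of type B have a selective advantage with respect to individuals of type A (i.e., P_j > Q_j) and vice versa when S > 1. The borderline case S = 1 corresponds to neutral evolution. The probabilities P_i, Q_i and R_i form the elements of the transition probability matrix

$$P = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ Q_1 & R_1 & P_1 & \cdots & 0 & 0 \\ \vdots & \ddots & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & Q_{N-1} & R_{N-1} & P_{N-1} \\ 0 & 0 & \cdots & 0 & 0 & 1 \end{pmatrix}. \tag{13}$$

The fixation probabilities as a function of the initial frequencies and of the selection coefficient can be computed as absorption probabilities from this matrix^4,16,18.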
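
The construction of the Moran transition matrix and of the absorption probabilities can be sketched as follows. The step probabilities assume one birth drawn proportionally to fitness and one uniformly chosen replacement among all N individuals; the closed form for the fixation probability of B is the standard birth-death result for a constant ratio S between the down and up probabilities, not a formula quoted from this text. All function names are hypothetical.

```python
def moran_step_probs(i, N, S):
    """(up, down, stay) probabilities for N_B = i with S = W_A / W_B."""
    if i == 0 or i == N:
        return 0.0, 0.0, 1.0
    total_fitness = i + S * (N - i)  # in units of W_B
    up = (i / total_fitness) * ((N - i) / N)
    down = (S * (N - i) / total_fitness) * (i / N)
    return up, down, 1.0 - up - down

def moran_matrix(N, S):
    """Tridiagonal (N+1) x (N+1) transition matrix with absorbing states 0 and N."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        up, down, stay = moran_step_probs(i, N, S)
        if i < N:
            P[i][i + 1] = up
        if i > 0:
            P[i][i - 1] = down
        P[i][i] = stay
    return P

def fixation_prob_B(i, N, S):
    """Probability of absorption in N_B = N starting from N_B = i."""
    if S == 1.0:
        return i / N  # neutral case
    return (1.0 - S ** i) / (1.0 - S ** N)

N, S = 10, 2.0
P = moran_matrix(N, S)
u = [fixation_prob_B(i, N, S) for i in range(N + 1)]
```

A quick consistency check is that the vector `u` is left unchanged by one step of the chain, i.e., u_i equals the sum over j of P_ij u_j for every state i. For N = 10, S = 2 and i = 5 the closed form gives 31/1023, about 3%.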

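For the Wright-Fisher model, let N_B(t) = i be the number of alleles of type B in the adult population at generation t. Then, according to the evolutionary dynamics the frequency of the allele B in the successive gamete population is³

$$\tilde{p}_B(i) = \frac{W_{BB}\, p_B(i)^2 + W_{AB}\, p_A(i)\, p_B(i)}{W_O}, \tag{14}$$

where p_B(i) = i/N, p_A(i) = 1 − p_B(i) and W_O is the average fitness of the adult population, defined as

$$W_O = W_{AA}\, p_A(i)^2 + 2\, W_{AB}\, p_A(i)\, p_B(i) + W_{BB}\, p_B(i)^2. \tag{15}$$

The frequency of the allele of type B in the new adult population is obtained through random sampling and leads to the transition probabilities

$$P_{ij} = \binom{N}{j}\, \tilde{p}_B(i)^{\,j} \left[ 1 - \tilde{p}_B(i) \right]^{N - j}, \tag{16}$$

according to the binomial sampling.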
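
The selection-then-sampling step can be sketched as follows. The sketch assumes the textbook viability-selection recursion for one locus with two alleles, fitness measured in units of W_BB (so W_AA = S and W_AB = (1 + S)/2), and binomial sampling of N alleles. The function names `gamete_freq_B` and `wf_matrix` are hypothetical.

```python
import math

def gamete_freq_B(i, N, S):
    """Frequency of allele B among gametes when N_B = i in the adult population.

    Assumes W_BB = 1, W_AA = S and codominance, W_AB = (1 + S) / 2.
    """
    p_b = i / N
    p_a = 1.0 - p_b
    w_ab = (1.0 + S) / 2.0
    w_mean = S * p_a * p_a + 2.0 * w_ab * p_a * p_b + p_b * p_b  # average fitness
    return (p_b * p_b + w_ab * p_a * p_b) / w_mean

def wf_matrix(N, S):
    """(N+1) x (N+1) matrix with P[i][j] = Binomial(N, q_i) evaluated at j."""
    P = []
    for i in range(N + 1):
        q = gamete_freq_B(i, N, S)
        row = [math.comb(N, j) * q ** j * (1.0 - q) ** (N - j) for j in range(N + 1)]
        P.append(row)
    return P

N, S = 10, 2.0
P = wf_matrix(N, S)
q_half = gamete_freq_B(N // 2, N, S)  # below 1/2 because A is favored when S > 1
```

Unlike the Moran matrix, every interior row of this matrix is dense, since any number of B alleles between 0 and N can be reached in one generation. The states 0 and N remain absorbing because the gamete frequency is exactly 0 or 1 there.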

Discussion

At first sight, it may seem odd that the correct likelihood should depend on the conditional transition probabilities P_ij|k(s). In fact, L_c(s) is computed on one single trajectory of a stochastic process governed by selection and genetic drift. The key point is that single trajectories of a stochastic process should be considered as representative of a bundle of trajectories starting and ending at fixed conditions. Functionals of single trajectories are thus conditioned not only in the initial condition but also in their final condition. When only one realization of ancient DNA variations is available, a special form of conditioning in the future has to be included in order to correctly estimate the selection parameters. Such processes were already studied by Schrödinger²¹ who recognized the emergence of possible contradictory claims from the observation of diffusion trajectories conditioned in their initial and final positions. In mathematics, this kind of conditioning has been studied in the context of Brownian bridges, namely processes conditioned both at their initial and final value, a precise description of which requires the introduction of Doob’s h-transform. More recently, Doob’s h-transform has become an essential theoretical tool to study the statistics of rare events²² and to understand circular arguments in statistical analysis^16,23. It was also shown that this transform emerges necessarily when trajectories are selected on the basis of their outcome¹⁶.

The likelihood L(s) defined in Eq. (6), based on the transition probabilities P_ij given in Eq. (12), is not the one that should be used to extract a parameter like the selection coefficient from one given trajectory. Indeed, L(s) fails in almost all cases to provide a realistic confidence interval. The reason for the failure of this method lies in the fact that a given realization does not reveal whether it is an unlikely event of a process that would otherwise typically behave differently. As a matter of fact, the process behind a given realization is rather more representative of a process conditioned (in probabilistic terms) to end at the frequency observed at its final observation. If one knows, from first principles, the microscopic (molecular) mechanism driving the process under scrutiny, then one can follow the procedure explained in this work, derive the conditional probabilities P_ij|k and write the likelihood L_c(s) in terms of these conditioned quantities. This quite obviously provides a good estimate of the selection coefficient. A crucial requirement for the success of this enterprise is the knowledge of the correct model to use.

The use of the unconditioned likelihood L(s) would still give an answer, i.e., a value of s that is apparently consistent with the data. Indeed, case I, which describes a process conditioned on ending at an intermediate state k ≠ 0, N, would lead to support the idea of neutral evolution or balancing selection and in fact, L(s) yields a value of s close to unity. In case II, when k = N instead, the individuals of type B would get fixed in the population and the analysis of such a trajectory by means of the unconditioned likelihood L(s) would lead to support the idea of a selective advantage in favor of type B even if type A individuals have a selective advantage by construction. Moreover, the time dependence of the transition probabilities, due simply to the effect of conditioning as seen in Eq. (9), would deceptively support the idea of changing environmental conditions. We see that these conclusions, albeit logical from the point of view of explaining the observations a posteriori, are determined by conditioning, i.e., by the fact that N_B(T_F) takes a particular value. Given our a priori knowledge of how we have generated the trajectories, conclusions drawn through the analysis with the likelihood L(s) would therefore be deprived of any foundation. But if we had no such a priori knowledge, there would be no way to confirm or reject the conclusions based on L(s).

Case III, with data coming from far back in the past and no fixation, presents some peculiarity. For the Moran model L(s) gives a relatively good estimate of the true selection coefficient whereas for the Wright-Fisher model it does not. The reason lies in the different time scales associated with absorption in each of these models. One step in the Wright-Fisher model corresponds to at least N steps in the Moran model. Thus, when the duration T of the time-series is very long and no absorption takes place at the end or close to the end of the time-series, the analysis performed with L(s) leads to a value of s close to unity, compatible with the apparent neutrality of the evolutionary trajectory. When T is short, instead, also for the Wright-Fisher model L(s) delivers a value of s closer to the true value (a test done with T = 10 confirmed this assertion). Therefore, the effect of conditioning in the future combined with the typical time scale of the process and the length of the measurement T is non-trivial¹⁵. Finally, in case IV conditioning can be very strong because the process can enter fixation at any time before the present, including times during the observed time-series. From the point of view of the likelihood L(s), case IV would give type B individuals a selective advantage where L_c(s) instead correctly predicts that A had the advantage. Furthermore, in the light of the relationship between L_c(s) and L(s), it emerges especially in trajectories belonging to case II that L_c(s) is bimodal, with a local maximum governed by L(s) and a second local maximum at larger values of s governed by Φ(s). This explains the larger confidence interval for this case in both models. This suggests that the ratio of the likelihoods R(s) = L_c(s)/L(s) rather than L_c(s) alone could be considered an even better functional to estimate the true value of the selection coefficient.

It had already been observed in the context of other models of population genetics that the generation of faithful trajectories of allele frequencies under the condition that fixation has occurred requires the introduction of a fictitious selection coefficient^19,24. In the context of the Moran model instead, it was shown that under the condition that fixation has not occurred after long time, the transition probabilities require a correction factor²⁰. While both these cases are included and generalized in this manuscript, we should stress here instead that extrapolating the selection coefficients from single historic records without due consideration to the peculiar conditioning associated to single trajectories gives values of the selection coefficients that are often very different from the real values.

Conclusions

A historic time-series is one trajectory whose contingency acts as a condition in the future and thus enters in the form of a bias in the elementary transition probabilities. The existence of such a bias when processes are conditioned in the future is often referred to as Doob’s h-transform. Extracting the selection coefficient from frequency time-series using the wrong likelihood function has a deceptive effect: the extracted parameters seem to be meaningful and would support models that completely agree with the data used to extract them. Especially when predictions about future outcomes are not possible because of experimental limitations, seeking models solely from past macroscopic data generates a false self-consistency reminiscent of circularity in data analysis^16,25,26,27. When the correct model is known, it is possible to derive a likelihood function that takes Doob’s h-transform into account and to produce reliable estimates of the selection coefficient.

Additional Information

How to cite this article: Valleriani, A. A conditional likelihood is required to estimate the selection coefficient in ancient DNA. Sci. Rep. 6, 31561; doi: 10.1038/srep31561 (2016).

References

1. Schraiber, J. G. & Akey, J. M. Methods and models for unravelling human evolutionary history. Nature Reviews Genetics (2015).
2. Malaspinas, A.-S. Methods to characterize selective sweeps using time serial samples: An ancient DNA perspective. Mol Ecol 25, 24–41 (2016).
3. Gillespie, J. H. Population Genetics: A concise guide (JHU Press, 2010).
4. Ewens, W. J. Mathematical Population Genetics 1: Theoretical Introduction, vol. 27 (Springer Science & Business Media, 2012).
5. Woods, R. J. et al. Second-order selection for evolvability in a large Escherichia coli population. Science 331, 1433–1436 (2011).
6. Bollback, J. P., York, T. L. & Nielsen, R. Estimation of 2Nes from temporal allele frequency data. Genetics 179, 497–502 (2008).
7. Malaspinas, A.-S., Malaspinas, O., Evans, S. N. & Slatkin, M. Estimating allele age and selection coefficient from time-serial data. Genetics 192, 599–607 (2012).
8. Mathieson, I. & McVean, G. Estimating selection coefficients in spatially structured populations from time series data of allele frequencies. Genetics 193, 973–984 (2013).
9. Feder, A. F., Kryazhimskiy, S. & Plotkin, J. B. Identifying signatures of selection in genetic time series. Genetics 196, 509–522 (2014).
10. Illingworth, C. J. & Mustonen, V. Distinguishing driver and passenger mutations in an evolutionary history categorized by interference. Genetics 189, 989–1000 (2011).
11. Illingworth, C. J., Parts, L., Schiffels, S., Liti, G. & Mustonen, V. Quantifying selection acting on a complex trait using allele frequency time series data. Molecular Biology and Evolution 29, 1187–1197 (2012).
12. Illingworth, C. J., Fischer, A. & Mustonen, V. Identifying selection in the within-host evolution of influenza using viral sequence data. Plos Comput Biol 10, e1003755 (2014).
13. Moran, P. A. P. Random processes in genetics. In Mathematical Proceedings of the Cambridge Philosophical Society, vol. 54, 60–71 (Cambridge Univ Press, 1958).
14. Lieberman, E., Hauert, C. & Nowak, M. A. Evolutionary dynamics on graphs. Nature 433, 312–316 (2005).
15. Zhao, L., Lascoux, M. & Waxman, D. An informational transition in conditioned Markov chains: Applied to genetics and evolution. Journal of Theoretical Biology 402, 158–170 (2016).
16. Valleriani, A. Circular analysis in complex stochastic systems. Scientific Reports 5, 17986 (2015).
17. Anderson, T. W. & Goodman, L. A. Statistical inference about Markov chains. The Annals of Mathematical Statistics 89–110 (1957).
18. Nowak, M. A. Evolutionary Dynamics: Exploring the Equations of Life (Harvard University Press, 2006).
19. Zhao, L., Lascoux, M. & Waxman, D. Exact simulation of conditioned Wright-Fisher models. Journal of Theoretical Biology 363, 419–426 (2014).
20. Huillet, T. Siegmund duality with applications to the neutral Moran model conditioned on never being absorbed. Journal of Physics A: Mathematical and Theoretical 43, 375001 (2010).
21. Schrödinger, E. Über die Umkehrung der Naturgesetze. Sitzungsber. Preuss. Akad. Wiss. Phys.-Math. Kl. 412–422 (1931).
22. Chetrite, R. & Touchette, H. Nonequilibrium Markov processes conditioned on large deviations. Annales Henri Poincaré 1–53 (2014).
23. Rusconi, M. & Valleriani, A. Predict or classify: The deceptive role of time-locking in brain signal classification. Scientific Reports 6, 28236 (2016).
24. Zhao, L., Lascoux, M., Overall, A. D. & Waxman, D. The characteristic trajectory of a fixing allele: A consequence of fictitious selection that arises from conditioning. Genetics 195, 993–1006 (2013).
25. Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S. & Baker, C. I. Circular analysis in systems neuroscience: The dangers of double dipping. Nature neuroscience 12, 535–540 (2009).
26. Brenner, S. Sequences and consequences. Philosophical Transactions of the Royal Society of London B: Biological Sciences 365, 207–212 (2010).
27. Lewontin, R. C. Facts and the factitious in natural sciences. Critical Inquiry 18, 140–153 (1991).

Author information

Authors and Affiliations

Department of Theory and Bio-Systems, Max Planck Institute of Colloids and Interfaces, Potsdam, 14424, Germany
Angelo Valleriani

Contributions

A.V. conceived and conducted this study, performed the simulations and calculations, and wrote the manuscript.

Ethics declarations

Competing interests

The author declares no competing financial interests.

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Received: 13 April 2016
Accepted: 22 July 2016
Published: 16 August 2016
DOI: https://doi.org/10.1038/srep31561
