On the proportional abundance of species: Integrating population genetics and community ecology

The frequencies of genes in interconnected populations and of species in interconnected communities are affected by similar processes, such as birth, death and immigration. The equilibrium distribution of gene frequencies in structured populations has been known since the 1930s, under Wright's metapopulation model known as the island model. The equivalent distribution for species frequencies at the metacommunity level, the species proportional abundance distribution (SPAD), is, however, unknown. In this contribution, we develop a stochastic model that accounts for this distribution analytically. We show that, as for genes, the SPAD follows a beta distribution, which provides a good description of empirical data and applies across a continuum of scales. This stochastic model, based upon a diffusion approximation, provides an alternative to neutral models for the species abundance distribution (SAD), which focus on numbers of individuals instead of proportions, and demonstrates that the relative frequencies of genes in local populations and of species within communities follow the same probability law. We hope our contribution will help stimulate the mathematical and conceptual integration of theories in genetics and ecology.

In the decomposition $X_J = \widetilde{X}_J + M_J$, $\widetilde{X}_J$ is the observable process (predictable, in mathematical terms), and $M_J$ is the noise process (a martingale).
The process $\widetilde{X}_J$ is easily computed by means of the Markov property of $X_J$ (see (1)). This requires introducing the filtration, or history, induced by $X_J$, given roughly by $\mathcal{F}^J_t$, obtained from $\sigma(X_J(s);\ 0 \le s \le t)$ by the customary Dellacherie procedure, for all $t \ge 0$. We assume further that we choose a nice version of $X_J$, that is, one that is right-continuous with left-hand limits. As a result, given any measurable function $f$ defined on $[0, 1]$, the predictable compensator $\widetilde{f \circ X_J}(t)$ of $f \circ X_J(t) = f(X_J(t))$ follows from the Markov property. Applying that formula to the identity function on $[0, 1]$ yields (S4), which gives the main dynamics for the proportion of living individuals in the population of size $J$.
The noise is (trivially) given by $M_J = X_J - \widetilde{X}_J$. However, the important characteristic of this noise is its "energy dissipation", that is, the increasing process $\langle M_J, M_J \rangle$ such that $M_J^2 - \langle M_J, M_J \rangle$ is a martingale (equivalently, the predictable compensator of the square of the noise, which is an observable quantity). Using the Markov property again, one obtains (S6). The two processes (S4) and (S6) are essential to understand the approximation of the dynamics by a diffusion when $J \to \infty$ and one considers a large time scale ($Jt$ instead of $t$).
Let us define $Z_J(t) = X_J(Jt)$. After an elementary change of variables ($u = s/J$) in (S4), we obtain the predictable compensator of $Z_J$. Analogously, the new time scale applied to (S6) yields the increasing process associated with $Z_J$.
Theorem 1 Define $Z_J(t) = X_J(Jt)$, for all $J \ge 1$ and $t \ge 0$. Assume $Z_J(0) = z \in [0, 1]$ fixed, and that there exist two continuous functions $\beta, \sigma : [0, 1] \to \mathbb{R}$ satisfying, in addition, the two hypotheses (H1) and (H2). Then the process $Z_J$ converges in distribution towards a diffusion process $Z$ which can be represented as
$$Z(t) = z + \int_0^t \beta(Z(s))\,ds + \int_0^t \sigma(Z(s))\,dW(s), \qquad \text{(S7)}$$
where $W$ is a standard Brownian motion. Moreover, $Z(t) \in [0, 1]$ with probability 1 for all $t \ge 0$. $Z$ is a Feller process and its semigroup $(T_t)_{t \ge 0}$ acting on $C([0, 1])$ has a generator $L$ given by
$$Lf(x) = \beta(x)\,f'(x) + \tfrac{1}{2}\,\sigma^2(x)\,f''(x).$$
As a result, the dual semigroup $(T^*_t)_{t \ge 0}$ leaves the space $L^1([0, 1])$ invariant, so that, in particular, given any probability density $\rho$ on $[0, 1]$, its evolution $\rho_t = T^*_t \rho$ satisfies the Chapman-Kolmogorov equation (or Master Equation)
$$\frac{\partial \rho_t}{\partial t}(x) = -\frac{\partial}{\partial x}\big(\beta(x)\,\rho_t(x)\big) + \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\big(\sigma^2(x)\,\rho_t(x)\big).$$
Proof. This theorem is a direct consequence of Proposition III.2.4, pages 92-93 in (2) (see also a more general result in (3)). One notes first that the process $Z_J$, with states in $[0, 1]$ almost surely, has vanishing jumps as $J \to \infty$, since $\sup_t |\Delta Z_J(t)| \le 1/J$. Thus the first hypothesis of Proposition III.2.4 in (2) is satisfied.
In addition, given $T > 0$, two inequalities hold on $[0, T]$, one for the predictable compensator of $Z_J$ and, similarly, one for its increasing process. In both inequalities the left-hand terms converge to 0 in probability as $J \to \infty$, due to (H1) and (H2).
Moreover, the hypotheses on $\beta$ and $\sigma$ imply that there is a unique solution in distribution to equation (S7) (see for instance (4), Corollary 4.29 and Theorem 5.7). As a result, Proposition III.2.4 in (2) fully applies, and the convergence to the diffusion $Z$ is proved.
Finally, the coefficients $\beta$ and $\sigma$ of the diffusion are bounded, with bounded derivatives, so that the generator $L$ is well defined on each function of its core.
It is worth noticing that the convergence in distribution mentioned in the above theorem means the convergence of the sequence of laws of the processes on the space of their trajectories. As a result, any continuous functional $F(Z_J)$ of the trajectory of $Z_J$ converges in distribution to $F(Z)$.
Corollary 1 Consider the sequences $B_J$ and $D_J$ given by equations (5) and (6) in the main text, where the functions $b_J, d_J \in C^1(]0, 1[)$ and $c_J \in C^2(]0, 1[)$ satisfy equations (7) and (8) in the main text.
Then $Z_J$ converges in distribution to a diffusion $Z$ represented as
$$Z(t) = z + \int_0^t \big(b(Z(s)) - d(Z(s))\big)\,ds + \int_0^t \sqrt{2\,c(Z(s))}\,dW(s). \qquad \text{(S10)}$$
Proof. Define $\beta(x) = b(x) - d(x)$, where $b$ and $d$ are given by (7) in the main text. Similarly, define $\sigma^2(x) = 2c(x)$, where $c$ is obtained from (8) in the main text. A simple computation of the drift, for all $x \in [0, 1]$, shows that (H1) is trivially satisfied. Moreover, since $b_J$ and $d_J$ are bounded, $\lim_{J} \frac{1}{J}\big(b_J(x) + d_J(x)\big) = 0$, and equation (8) in the main text gives the convergence of $c_J$ to $c$ as $J \to \infty$, uniformly in $x \in [0, 1]$. This implies in particular (H2), and the proof is complete.
It is worth noticing that the distribution $P_t$ of $Z(t)$ represents the state of the open ecological system at time $t$. This state has a density $\rho_t$, that is, $P_t(dx) = \rho_t(x)\,dx$, and it can be obtained from the process $Z$ as follows: $P_t(]a, b]) = P(Z(t) \in\, ]a, b])$ is the limit of the frequency of trajectories of the process $Z$ that visit the interval $]a, b]$ at time $t$. These frequencies can therefore be obtained by simulating the solutions to (S10).
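The simulation route described above can be sketched with an Euler-Maruyama scheme for a diffusion of the form $dZ = (b - d)(Z)\,dt + \sqrt{2c(Z)}\,dW$. This is a minimal illustration under stated assumptions, not the authors' code: the drift $b(x) - d(x) = \gamma(\alpha - (\alpha + \beta)x)$ and the coefficient $c(x) = \gamma x(1 - x)$ are hypothetical choices that are not taken from the main text, picked because a diffusion with these coefficients has a Beta($\alpha$, $\beta$) stationary density. The function names, the step size and the clipping of paths to $[0, 1]$ are likewise assumptions.

```python
import math
import random


def simulate_paths(z0, drift, c, t_end, dt, n_paths, seed=0):
    """Euler-Maruyama for dZ = drift(Z) dt + sqrt(2 c(Z)) dW, clipped to [0, 1].

    Returns the list of terminal values Z(t_end), one per simulated path.
    """
    rng = random.Random(seed)
    n_steps = int(round(t_end / dt))
    sqrt_dt = math.sqrt(dt)
    finals = []
    for _ in range(n_paths):
        z = z0
        for _ in range(n_steps):
            noise = math.sqrt(2.0 * c(z)) * sqrt_dt * rng.gauss(0.0, 1.0)
            z = z + drift(z) * dt + noise
            z = min(1.0, max(0.0, z))  # crude projection, keeps c(z) >= 0
        finals.append(z)
    return finals


def interval_frequency(values, a, b):
    """Fraction of simulated values falling in the interval ]a, b]."""
    return sum(1 for v in values if a < v <= b) / len(values)


# Hypothetical coefficients, assumed for illustration only.
ALPHA, BETA, GAMMA = 2.0, 3.0, 1.0


def drift(x):
    return GAMMA * (ALPHA - (ALPHA + BETA) * x)


def c(x):
    return GAMMA * x * (1.0 - x)


finals = simulate_paths(z0=0.5, drift=drift, c=c, t_end=2.0, dt=0.005, n_paths=2000)
mean_z = sum(finals) / len(finals)
freq = interval_frequency(finals, 0.2, 0.6)
print(f"mean of Z(t): {mean_z:.3f}; frequency in ]0.2, 0.6]: {freq:.3f}")
```

With these assumed coefficients, the empirical frequency of paths in $]a, b]$ should approach the Beta probability of that interval once $t$ is large compared with the relaxation time. A smaller step size or a boundary-respecting scheme would be preferable for quantitative work.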

Derivation of the Beta distribution
The invariant density of $Z(t)$ is the solution of equation (11) in the main text. With $b$, $d$, $c$ chosen according to equations (12), (13), (14) in the main text, and noticing that $b_0 - d_0 = \alpha\gamma$ and $b_1 - d_1 = (\beta - \alpha)\gamma$, this equation reduces to (S11). A straightforward computation shows that any function of the form $\rho(x) = C\,x^{\alpha - 1}(1 - x)^{\beta - 1}$, $x \in\, ]0, 1[$, solves (S11). Choosing the normalization constant $C = 1/B(\alpha, \beta)$, one obtains the unique solution $\rho_\infty(x)$ of (S11) which is a probability density on the real line.
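The stationarity of a Beta density can be checked numerically. The sketch below assumes, hypothetically, that the invariant-density equation takes the zero-flux form $(c\rho)'(x) = (b(x) - d(x))\,\rho(x)$, with $c(x) = \gamma x(1 - x)$ and $b(x) - d(x) = \gamma(\alpha - (\alpha + \beta)x)$. These coefficient forms are illustrative and are not copied from equations (12)-(14) of the main text; the function names are also assumptions. The code evaluates the residual of that equation by central differences for the density $x^{\alpha - 1}(1 - x)^{\beta - 1}/B(\alpha, \beta)$.

```python
import math


def beta_density(x, alpha, beta):
    """Beta(alpha, beta) density on ]0, 1[, normalized by 1 / B(alpha, beta)."""
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_rho = (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x) - log_b
    return math.exp(log_rho)


def zero_flux_residual(x, alpha, beta, gamma=1.0, h=1e-5):
    """Central-difference estimate of (c rho)'(x) - (b - d)(x) rho(x).

    Uses the assumed forms c(x) = gamma x (1 - x) and
    (b - d)(x) = gamma (alpha - (alpha + beta) x).
    """
    def c_rho(y):
        return gamma * y * (1 - y) * beta_density(y, alpha, beta)

    derivative = (c_rho(x + h) - c_rho(x - h)) / (2 * h)
    drift = gamma * (alpha - (alpha + beta) * x)
    return derivative - drift * beta_density(x, alpha, beta)


grid = [i / 10 for i in range(1, 10)]
max_residual = max(abs(zero_flux_residual(x, 2.0, 3.0)) for x in grid)
print(f"largest residual on the grid: {max_residual:.2e}")
```

A residual at the level of the finite-difference error indicates that, under these assumed coefficients, the normalized Beta density satisfies the stationary equation on the grid; it is a sanity check, not a proof.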
In particular, the choice of coefficients (21), (22), (23) (see main text), with $p = 1/S$, leads to a density of the same form, with parameters determined by those coefficients.
Remark. Under the neutrality hypothesis, the numbers of living individuals $N_i(t)$ of the species $i = 1, \ldots, S$ have the same probability distribution at time $t \ge 0$, and these variables are independent. Let $N(t)$ denote any of these variables. Since $0 \le Z_J(t) = N(tJ)/J \le 1$ for all $t \ge 0$, the sequence $(Z_J(t))_{J \in \mathbb{N}}$ is uniformly integrable for all $t$. Therefore, the convergence in distribution of $Z_J$ to $Z$ yields
$$\lim_{J \to \infty} \mathbb{E}\big[Z_J(t)\big] = \mathbb{E}\big[Z(t)\big] = \int_0^1 x\,\rho_t(x)\,dx,$$
where $\rho_t$ is the solution to the Master Equation.
Also, under the neutrality hypothesis, one can approximate the probability $p_{n,J}$ of finding a species with $n$ individuals at time $tJ$; this probability is given by (S14).
Finally, as has been the tradition in neutral theory, we can derive the typical species abundance distribution (SAD), that is, the expected number of species having $n$ individuals in the focal community. Since the species are independent and identically distributed, the number of species with $n$ individuals follows a binomial distribution with parameters $(S, p_{n,J})$, so that its mean value is simply $S\,p_{n,J}$. For $J$ large enough, this mean can therefore be approximated by using the large-$J$ approximation of $p_{n,J}$.
Table S1: Fit of the discrete Beta distribution (eqn. 28) to fifteen plant and animal communities. Data for communities 1-6 come from (5), 7-9 from (6), 10 from (7), 11-12 from (8) and 13-15 from (9). The parameters α and β were estimated by optimisation with the Nelder-Mead method, as implemented in the maximum likelihood function mle2 of the bbmle library for R. For each community, the Volkov model was simulated using the function volkov of the untb library for R. Observed richness (S) and total abundance (J) were calculated directly from the data and passed to the function as arguments. The parameters theta and m required by this function were estimated using the software tetame (10,11). Observed and predicted frequency distributions were compared using Pearson's correlation (P).