Introduction

Ever since the evolutionary synthesis, population genetics theory has been integrated, to different extents, into different disciplines within biology including systematics and ecology. This later integration took off with the development of theoretical formulations relating the processes that drive changes in numbers of individuals within age-structured populations, with changes in the fitness of different genotypes1,2. Yet further integration was achieved with the emergence of the new ecological genetics spoused by Antonovics3, one of whose tenets was that “Forces maintaining species diversity and genetic diversity are similar. An understanding of community structure will come from considering how these kind of diversity interact”. More recently, the emergence of community genetics4 has reinvigorated the search for connections between population genetics and community ecology, along with the realization that there is a striking similarity between processes driving changes in the abundance and diversity of species within communities and genes within populations5,6.

The recent development of neutral approaches to the study of ecological systems7,8,9,10 have provided a renewed emphasis upon the value of theory and stochasticity in ecology11,12,13,14 and a locus for the further integration of genetical and ecological theories15,16. By merging the mathematical and statistical tools developed by population geneticists with the neutrality approach, neutral theory in ecology allows us to better understand the factors affecting the abundance and distribution of species15,16,17,18,19. But there is a major barrier to this integration, while population geneticists pioneered the use of diffusion approximations (i.e. a continuous process) to the understanding of processes affecting gene frequencies20, ecologists have favored to work with the distribution of the number of individuals across species (i.e. a discrete process) or SAD8,21,22,23 (but see24). It is not surprising then that the answer for the abundance of species within communities (i.e. Fisher’s Log-series8,) is different from that for gene frequencies within populations (i.e., a Beta distribution25,26). In this contribution, we aim at filling this gap in knowledge by analyzing the distribution of species abundances as a continuous process (i.e. using a diffusion approach). To do so we focus on the proportional abundance of species instead of the number of individuals. We show that if one assumes that birth and death rates follow the functional form used in neutral theory8,28 the stationary distribution for the species proportional abundance distribution (SPAD), the same as for genes, is a beta distribution with parameters α and β that quantify the relative importance of immigration and speciation, respectively, in relation to stochastic fluctuations. We show that this distribution provides a good description of empirical data and applies across a continuum of scales.

The model

We model the community as an open system, and as such we do not distinguish two spatial scales in our system, as usually done in neutral models, as the one proposed by Volkov et al.8, but a continuum of scales, which are defined by the observer of the system when studying it. The system could be, for example, a 50 ha plot in a tropical forest or a 1 m2 plot in the intertidal. What is important is to realize that once the observer defines the spatial scale of the system, it defines a boundary or an inside and an outside, where the focal system is embedded (Fig. 1). We call this observer defined scale the focal community that is embedded into a bath or environment with which it interacts. The focal community dynamics is driven by birth and death processes and by immigration from the outside. We do not explicitly consider speciation as this is subsumed into the immigration process11. Indeed the spatial scale of analysis is to some extent dictated by which is the dominant process adding new species to a given focal community; immigrations of individuals from species not yet found in the focal community but somewhere else in the bath, or new species arising through speciation within the focal community. If the later is the dominant process, then the spatial scale is likely to be large, since all species in the potential pool are already present and the only way a new species can arrive would be through speciation. Similarly, the processes that remove individuals and species from the focal community include death and emigration towards the bath or environment. To model the dynamics of this focal community we used the diffusion approximation of birth and death processes independent of a focal community size J. By community size we mean the total number of individuals regardless of species identity.

Figure 1
figure 1

Diagrammatic description of the diffusion approach taken in this contribution. This approach assumes the existence of a focal community (the white area delimited by a discontinuous line) of size J, and where N J (t) denotes the number of individual of a given species within it. The abundance of any species in this focal community follows a birth death process, with rates b J , d J and c J . However, since we are interested in the proportion of individuals instead on their numbers, we introduce the process Z(t) or stochastic proportional abundance. It is shown that as J → ∞, Z(t) converges to a diffusion that satisfies the stochastic differential equation for dZ(t) with rates b(x), d(x) and c(x) (see Eqs (79)). At any given time the probability density of Z(t) is given by the Fokker-Planck equation associated to ρ t (x) (Eq. (10)). Further, when t → ∞ this probability density becomes stationary or invariant and is called ρ (x). We show that when b(x), d(x) and c(x) have a particular functional form (see Eqs (1214)) the invariant distribution is a beta distribution (Eq. (15)). The Panels on the right show the simulation of trajectories for the diffusion process Z(t), the associated density at a given time ρ t (x) and the invariant distribution ρ (x).

Let N J (t) denote the number of living individuals of a given species within a focal community of size J, at time t ≥ 0 (so that N J (t) is less or equal to J for all t). This is assumed to be a birth and death process, with transition matrix P(t) = (P n,m (t); n, m = 0, …, J) (n and m denotes the number of individuals). For a small time increment h, this matrix satisfies as h → 0 for n ≥ 0

$${P}_{n,n+1}(h)={B}_{J}(n)h+o(h),\,{\rm{for}}\,{\rm{n}}\ge {\rm{0}},$$
(1)
$${P}_{n,n-1}(h)={D}_{J}(n)h+o(h),\,{\rm{for}}\,{\rm{n}}\ge {\rm{1}},$$
(2)
$${P}_{n,n}(h)=1-({B}_{J}(n)+{D}_{J}(n))h,\,{\rm{for}}\,{\rm{n}}\ge {\rm{0}},$$
(3)
$${P}_{n,m}\mathrm{(0)}={\delta }_{n,m},$$
(4)

where B J (n) and D J (n) are the birth and death rates, respectively, D J (0) = 0, B J (0) > 0, δ n,m is the customary Kronecker delta, and o(h) denotes the Landau-symbol, which satisfies \({\mathrm{lim}}_{h\to 0}\frac{o(h)}{h}=0\). Here, in addition, we assume that these rates are decomposed as follows

$${B}_{J}(n)={b}_{J}(n)+{c}_{J}(n)$$
(5)
$${D}_{J}(n)={d}_{J}(n)+{c}_{J}(n\mathrm{).}$$
(6)

The terms b J and d J represent birth and death rates in the focal community, respectively, which will be asymptotically independent of J, while c J takes into account the variations on the above rates due to the interaction between the focal system and the environment wherein it is embedded, proper to an open system approach. Since we are interested in proportions n/J, we introduce the variable x = n/J, which takes values in {0, 1/J, 2/J, …, 1}, and analyze the behavior of the system as the size of the population grows indefinitely: J → ∞. At this stage it is important to state meaningful hypotheses for the previous rates for large J, as all changes of scales in the dynamics of the open system are driven by this community size.

We first assume that b j and d J will lead, respectively, to the J-invariant (or endogenous) birth and death rates of the focal system, that satisfy

$$\mathop{\mathrm{lim}}\limits_{J\to \infty }{b}_{J}(xJ)=b(x);\,\mathop{\mathrm{lim}}\limits_{J\to \infty }{d}_{J}(xJ)=d(x),\,(x\in \mathrm{[0,}\,\mathrm{1]).}$$
(7)

On the contrary, the rate c J , should vary significantly with J, however, we require that it satisfies

$$\mathop{\mathrm{lim}}\limits_{J\to \infty }\frac{{c}_{J}(xJ)}{J}=c(x),\,(x\in \mathrm{[0,}\,\mathrm{1]).}$$
(8)

We can now define the stochastic process Z J  = (Z J (t) = N(tJ)/J; t ≥ 0) that we call the stochastic proportional abundance. This family of processes has a limit Z = (Z(t); t ≥ 0) as J → ∞, that corresponds to a diffusion process satisfying the stochastic differential equation (see Supplementary Information)

$$dZ(t)=(b(Z(t))-d(Z(t)))dt+\sqrt{2c(Z(t))}dW(t),$$
(9)

where W(t) denotes a Brownian motion.

It is worth noticing (see Supplementary Information also) that the process Z J  = (Z J (t); t ≥ 0) converges in distribution towards a diffusion process Z = (Z(t); t ≥ 0) as proven in27, and so, any continuous functional F(Z J ) of the trajectory of Z J converges in distribution to F(Z). In particular, it is proved (see Supplementary Information) that for any values 0 < a < b ≤ 1, it holds

$$\mathop{lim}\limits_{J\to \infty }\,{\mathbb{P}}(a < {Z}_{J}(t)\le b)={\mathbb{P}}(a < Z(t)\le b\mathrm{).}$$

where \({\mathbb{P}}\) is the probability defined on the set of all trajectories of the process.

Correspondingly, the Fokker-Planck equation associated with the probability density ρ t (x) of Z(t), is given by

$$\frac{\partial }{\partial t}{\rho }_{t}(x)=\frac{{\partial }^{2}}{\partial {x}^{2}}(c(x){\rho }_{t}(x))-\frac{\partial }{\partial x}([b(x)-d(x)]{\rho }_{t}(x)),$$
(10)

With the additional condition that \({\int }_{{\mathbb{R}}}\frac{\partial }{\partial t}{\rho }_{t}(x)dx=1\). The stationary solution ρ is determined as the solution to the equation

$$\frac{{\partial }^{2}}{\partial {x}^{2}}(c(x){\rho }_{\infty }(x))-\frac{\partial }{\partial x}([b(x)-d(x)]{\rho }_{\infty }(x))=0$$
(11)

In order to find the stationary distribution we need to make a hypothesis for each of the rates b(x), d(x) and c(x), the simplest ones are that

$$b(x)={b}_{0}+{b}_{1}x$$
(12)
$$d(x)={d}_{0}+{d}_{1}x$$
(13)
$$c(x)=\gamma x\mathrm{(1}-x),$$
(14)

where b i , d i , (i = 0,1), and γ are positive constants. Under these hypotheses (see Supplementary Information) the stationary solution takes the form of a typical Beta distribution

$${\rho }_{\infty }(x)=\frac{{\rm{\Gamma }}(\alpha +\beta )}{{\rm{\Gamma }}(\alpha ){\rm{\Gamma }}(\beta )}{x}^{\alpha -1}{\mathrm{(1}-x)}^{\beta -1}\mathrm{.}$$
(15)

Then an elementary computation shows that (15) provides a solution to (11) with

$$\alpha =\frac{{b}_{0}-{d}_{0}}{\gamma }$$
(16)
$$\beta =\frac{{d}_{1}-{b}_{1}}{\gamma }-\frac{{b}_{0}-{d}_{0}}{\gamma }\mathrm{.}$$
(17)

In Fig. 1, we provide a diagrammatic version of the main steps taken in our derivation of the stationary Beta distribution. As an important particular case, let us use the rates proposed by McKane28 and used in the neutral theory model proposed by Volkov8, which in our framework, this will correspond to the following rates

$${b}_{J}(n)=mp(1-\frac{n}{J})$$
(18)
$${d}_{J}(n)=m(1-p)\frac{n}{J}$$
(19)
$${c}_{J}(n)={\lambda }_{J}\mathrm{(1}-m)\frac{n}{J}\frac{J-n}{J-1}\mathrm{.}$$
(20)

where p is the probability with which we choose individuals of a given species, and m denote a migration probability. In addition, we introduce the parameter λ J to keep track of fluctuations in demographic rates due to interactions between the focal system and the environment, for instance, as a consequence of temperature variations or due to other unknown biotic or abiotic variables. We assume that λ J /J → λ as J → ∞. Thus, letting J → ∞, one obtains the convergence towards the corresponding limits

$$b(x)=mp(1-x)$$
(21)
$$d(x)=m(1-p)x$$
(22)
$$c(x)=\lambda \mathrm{(1}-m)x\mathrm{(1}-x),$$
(23)

where x [0, 1] (that is, b 0 = mp, b 1 = −mp, d 0 = 0, d 1 = m(1 − p), γ = λ(1 − m)).

Thus, under the above choice of coefficients, (9) becomes

$$\begin{array}{rcl}Z(t) & = & z-{\int }_{0}^{t}m(Z(s)-p)ds\\ & & +{\int }_{0}^{t}\sqrt{2\lambda \mathrm{(1}-m)Z(s\mathrm{)(1}-Z(s))}d{W}_{s}\mathrm{.}\end{array}$$
(24)

And ρ has the form (15) with

$$\alpha =\frac{mp}{\lambda \mathrm{(1}-m)}$$
(25)
$$\beta =\frac{m\mathrm{(1}-p)}{\lambda \mathrm{(1}-m)}\mathrm{.}$$
(26)

We interpret α, as quantifying the relative contribution of immigration of known species to the abundance of species already present in a focal community, while β quantifies the relative contribution of immigration of species not yet known in the focal community, that is, speciation. Notice that, both α and β are expressed in relation to the magnitude of the fluctuations induced by the interaction with the environment (i.e. λ(1 − m)).

When the probability with which we choose individuals of a given species is p = 1/S, where S denote the total number of species, β = α(S − 1) and thus (15) becomes

$${\rho }_{\infty }(x)=\frac{1}{B(\alpha ,\alpha (S-\mathrm{1))}}{x}^{\alpha -1}{\mathrm{(1}-x)}^{\alpha (S-\mathrm{1)}-1},$$
(27)

where \(\,B(\alpha ,\alpha (S-\mathrm{1))=}{\int }_{0}^{1}{x}^{\alpha -1}{\mathrm{(1}-x)}^{\alpha (S-\mathrm{1)}-1}\) (normalization constant) and \(\alpha =\frac{m}{S\mathrm{(1}-m)\lambda }\) (see derivation of the Beta distribution in the Supplementary Information).

In Fig. 2 we show the fit of (27) to several datasets including the Malayan butterflies and the Rothamsted Lepidoptera data originally used by Fisher29, tropical birds in Manu Park (Perú)30, tropical forests31, Fynbos shrublands32 and coral reefs33. In all cases the correlations between observed and fitted frequencies (expressed as proportional abundance) was highly significant (Table 1).

Figure 2
figure 2

Fit of the Beta distribution to different animal and plant communities. From left to right, first row, Amazon birds, Lepidoptera, butterflies, second row Tropical trees and Coral reefs (communities 10, 12, 11, 6, 2 and 14 in Table 1 respectively). Third row Tropical trees. Fourth row Tropical trees, and Fynbos shrublands. Fifth row Fynbos shrubland and coral reefs (communities 1, 3, 4, 5, 7, 8, 9, 13 and 15 in Table 1 respectively).

Table 1 Fit of the Beta distribution (Eq. (27)) to fifteen plant and animal communities.

In Fig. 3 we show the relationship between α and β. As expected, both are positively correlated, but more interestingly it is apparent that birds, butterflies and marine communities are characterized by large α, a measure of the importance of migration, as expected for open and highly connected systems where immigration in the form of dispersal could be the dominant processes accounting for the appearance of new individual each generation. Similarly, the Fynbos shrub dominated communities (7–9 in Table 1) are characterized by low β, which may be associated to low rates of speciation (but see32,34). Indeed, β is correlated to the biodiversity number θ of classical neutral theory (Pearson’s r = 0.97, n = 6, P < 0.01, see inset in Fig. 3), which is a function of speciation rate7,8.

Figure 3
figure 3

Relationship between parameters α and β for the communities shown in Table 1 (in blue Marine, in green Tropical Forest, in red shrublands, in light yellow butterflies and in strong yellow bird communities). In the inset the relationship between β (y axis) for forest communities 1–6 in Table 1, and the θ (x axis) parameter estimated in31 for the same forest communities.

Finally, in Fig. 4a, we show simulations of the stochastic proportional abundance of species or trajectories of Z(t) in (24). Figure 4b is the plot of the confidence intervals around the mean \({\mathbb{E}}Z(t)\), notice that the process rapidly converges to the long term average value. As we mentioned before, the density distribution ρ t of the stochastic proportional abundance, which corresponds to a neutral abundance at the rescaled time t, tends to a stationary distribution ρ as t → ∞. We can estimate ρ by sampling the trajectories of Z(t) after a large number of generations (e.g. t = 1000) represented by the histogram in Fig. 4c, which is in good agreement with the limit beta distribution density ρ .

Figure 4
figure 4

(a) Simulation of 225 trajectories (only 50 are shown) using Eq. (24) with λ = 0.001585, p = 0.0044, m = 0.09 and an initial proportional abundance Z(0) equal to 0.2. (b) Mean value of the observed trajectories and 95% confidence intervals. (c) Histogram of the trajectories Z(t) for t = 1000 in (a), estimated Beta distribution (27) (continuous blue line) and the theoretical density ρ (27) (red dashed line).

Discussion

A key component of the evolutionary synthesis was the mathematical formalization of the processes driving changes in gene frequencies within Mendelian populations. Wright’s island model25,35 demonstrated that the frequency of neutral alleles in a local open population affected by mutation, migration and drift, will converge to a Beta steady-state distribution of allele frequencies. In light of our results, the equilibrium distribution of gene frequencies in a local population is equivalent to the frequency of different species in a local community or the Species Proportional Abundance Distribution (SPAD). Although this equivalence was expected, as both genes and species are affected by similar stochastic processes, it is a novel result since the equilibrium distribution of the SPAD was unknown, and previous results have either relied upon additional assumptions, such as density dependence36 or on approximations to the continuous limit37,38. Our results complement the efforts to understand the distribution of species abundances that have focused on changes in the numbers of individuals in different species (e.g7,8,29.) instead of the proportional abundance of species within communities. As far as we know, ours is the only continuous, neutral, and exact mathematical formulation derived from first principles. That is, based upon a birth death processes on the appropriately rescaled relative abundance process that, in the limit as J → ∞, is shown to satisfy the stochastic differential equation (9) in agreement with Rebolledo’s central limit theorems27 (see also Supplementary Information).

The general model for SPAD that we propose is based on a diffusion approach, as it has been used in population genetics to study the distribution of gene frequencies. Indeed Kolmogorov39, showed that the steady state distribution for allele frequencies (i.e. a Beta distribution) in Wright’s island model was the stationary distribution of the diffusion approximation. In this vein, we show that the stationary distribution for the species proportional abundance is a Beta distribution, but only if birth and death rates are of the form (12), which accommodates, as a particular case, the ones traditionally used in neutral models8,28,31.

Since the gamma distribution is the invariant distribution of a single species population following stochastic logistic growth40,41, it has been suggested as the most appropriate to describe SADs42. Interestingly, Fisher’s logarithmic series model is a Gamma type distribution. It is derived from Poisson sampling a population of S species (i.e. when the number of individuals sampled from any species is Poisson distributed) whose abundance follows a gamma distribution with shape parameter k = 0. As shown by Kempton43 if the sampled population consists of independent subpopulations each following a generalization of the Gamma model (i.e. k ≠ 0) then the resulting distribution will be a Beta distribution, as it is well know in statistics, and the resulting sampling distribution would be the generalized log series. Similarly, Engen and Lande42 show that under a stochastic logistic model with positive mean growth rate, the relative abundances of species would be Dirichlet distributed, which is the multivariate version of the beta distribution. Thus, the beta distribution has been around for a long time in ecology, here we show it is the invariant distribution associated to a diffusion process representing an open dynamical system under neutrality.

It is important to realize that the stochastic process described by Z(t) is of the Markov type since future changes depend on the present state, but not on the past history which led to this present state. Although this is a common assumption in ecological and evolutionary models, a large body of experimental data and analyses shows the importance of history (or memory) in affecting current states at the level of individuals, populations and lineages44,45,46. In this context it will be desirable to develop non-markovian models for neutral macroecology; after all, life is a historical process and the explicit consideration of history may be the simplest way of breaking the symmetry of neutrality.

If the variable Z(t) were discontinuous (i.e. if it were a measure of number of individuals instead of proportions) it will change in jumps due to birth, death, immigration and speciation processes and in this case the probability of a change during a small time interval (t,t + h) is small (of the order of magnitude h), but if a change occurs, it is of finite magnitude. In the diffusion approximation, during any time interval, however small, Z(t) undergoes some change, such that the probability that Z(t + h) − Z(t) > ε is of smaller order of magnitude than h. Continuity in this case, is possible only for large J as the number of event per time interval become continuous in rescaled time (i.e. tJ).

In genetics, where diffusion methods where first applied in the context of biology, the diffusion approximation was used to derive the distribution of allele frequencies under the process of migration, mutation, selection and drift (by themselves and in combination)47. Interestingly, in this area of inquiry, diffusion methods provided good approximations to model the evolution of finite populations48, even though its derivations requires J → ∞. In our case, the derivation of the beta distribution is based on two limits one for the number of individuals, and secondly, one in time. The order in which these limits are taken cannot be changed. Once the diffusion limit is obtained via J → ∞, the beta distribution is indeed obtained as a consequence of t → ∞. Since what we are analyzing is the evolution of individual abundance, a process that started with the origin of life, it is correct to assume that we are at the large t limit (even if we consider the time since the last major extinction event 66 million years ago) and thus the finding of a beta distribution should be common. In our case, the fits to finite focal communities seems remarkable, however we do not know how J affects the fit to our stationary solution and if there is a minimum J below which our approximation would seem inadequate. The issue get even more complex since the Beta distribution does not have a close form Maximum Likelihood estimator, which hinders the usability of the model in terms of estimating parameters of the distribution given the data, and testing hypotheses about them. An alternative solution is to use an approximation to the maximum likelihood, several of which are implemented in available packages such as R, Matlab and Scipy, and which provide accurate estimations of parameters (less that 3 percent bias) with sample size above 10049, or to estimate the coefficients of the diffusion process itself using the methods suggested by50 and simulate the stochastic process (9) to obtain the expected form of ρ t as shown in Fig. (3) and then compare it to empirical ones. Although in strict terms Z(t) and its invariant ρ apply to one species, the neutrality assumption allow us to use ρ as a good hypothesis for multispecies assemblages. In this context we show in the Supporting Information (Figs S1–3) that the parameters of the Beta distribution β,α can be estimated with little error when simulating 200 trajectories of Z(t) (see also Fig. 4c), which as a first approximation we consider as a proxy for 200 species under neutrality. Finally, if the steady state assumption in (11) does not hold, due to perturbations or in the case of a newly colonized habitat, then we will be observing ρ t and its functional form can be explored through simulations (codes provided upon request). These are important issues that require further investigation to increase the applicability of the diffusion approximation herein provided.

Our diffusion approximation is based upon the paradigm of open dynamical systems, whereby we try to understand the behavior of a focal system, or focal community, in the context of an environment or bath with which it interacts; an approach that has been mostly developed for open quantum systems51. Since we are only able to specify the dynamics of our focal system, which is the one we study and develop theories an hypothesis about, everything we do not know about it is specified in the fluctuations represented by the noise term in the stochastic differential equation (9), whose intensity is dependent upon the the value of c(x). In this respect, our model can accommodate both neutral and non-neutral processes, with the latter being included in the noise term. In the particular case we explored, using transition rates as in28, the core of the dynamics is neutral at the level of the focal system but everything else that could potentially impact upon the dynamics of the local systems, either neutral or not, will be capture in the fluctuations induced by the interaction with the reservoir and included in the Brownian noise term. It is important to notice that we assume that these fluctuations act at comparable time scales, if this were not the case (as it is likely since immigration is faster than speciation) the addition of a different time scale in the form of fluctuations following a Poisson distribution may be in order. In this case we would arrive to a Lévy type diffusion process.

One of the problems of our derivation is that there are no comparable models against which to contrast its performance, as our model is defined using proportional abundances instead of the usual number of individuals. To solve this problem we show (see Supplementary Information) that an approximation for the abundance function, defined as the average number of species containing n individuals, n  {1, …, J}, or SAD is:

$$\langle {\varphi }_{n}\rangle \sim \frac{S}{JB(\alpha ,\alpha (S-\mathrm{1))}}{(\frac{n}{J})}^{\alpha -1}{(1-(\frac{n}{J}))}^{\alpha (S-\mathrm{1)}-1}$$
(28)

As shown in Table S1 (Supplementary Information) the approximation to the SAD derived from our model is as good as previous ones.

Finally, it is worth reiterating that the form of the stationary distribution ρ is dependent upon the transitions probabilities characterizing the birth and death process and that the Beta distribution is valid only for the transitions specified by28 but other are possible23,31. It remains to be seen what other stationary distributions can be found and if these are compatible with observed SADs. This will certainly improve our understanding of the causes underlying the distribution of abundance in ecological systems.