Dispersal ability determines the scaling properties of species abundance distributions: a case study using arthropods from the Azores

Species abundance distributions (SAD) are central to the description of diversity and have played a major role in the development of theories of biodiversity and biogeography. However, most work on species abundance distributions has focused on one single spatial scale. Here we used data on arthropods to test predictions obtained with computer simulations on whether dispersal ability influences the rate of change of SADs as a function of sample size. To characterize the change of the shape of the SADs we use the moments of the distributions: the skewness and the raw moments. In agreement with computer simulations, low dispersal ability species generate a hump for intermediate abundance classes earlier than the distributions of high dispersal ability species. Importantly, when plotted as function of sample size, the raw moments of the SADs of arthropods have a power law pattern similar to that observed for the SAD of tropical tree species, thus we conjecture that this might be a general pattern in ecology. The existence of this pattern allows us to extrapolate the moments and thus reconstruct the SAD for larger sample sizes using a procedure borrowed from the field of image analysis based on scaled discrete Tchebichef moments and polynomials.

Here we explain how to reconstruct a frequency distribution, f(x), from knowledge of its raw moments using scaled Tchebichef moments and polynomials 19 . In the context of this paper it means reconstructing the species abundance distribution from knowledge of its raw moments. In addition, we provide a function in R to calculate the scaled Tchebichef moments and polynomials and exemplifies the application of the method.
The idea of reconstructing a distribution, f(x), from its moments using scaled Tchebichef moments and polynomials can be explained as follows. If you know the moments then you can calculate the scaled Tchebichef moments, T n , which are then the weights of the scaled Tchebichef polynomials, , through the formula where the maximum value of N coincides with the number of bins. By giving different weights to each polynomial we can obtain through the above sum the shape of the distribution from which the original moments were calculated.
The basic ingredients to reconstruct the frequency distribution are: knowledge of the raw moments, M n , the number of required bins, N (in practice, we need to estimate the abundance of the most abundant species) and the total number of data points, S (that is, the number of species). Assuming that these are known, we can calculate the scaled Tchebichef polynomials, , using the following recurrence formulas We show in Fig. S1 the scaled Tchebichef polynomials from order n = 0 to order n = 5. These are the polynomials that are going to be added weighed by the respective scaled Tchebichef moments, T n , according to Eq. S1.
Finally, the scaled Tchebichef moment of order n, T n , with 0≤n<N, is given by , where M i is the raw moment of order i, which we assumed to be known, are the Stirling numbers of the first kind, C k (n,N) is ; n=0,1,…,N-1.
Notice that the information on the distribution that we want to reconstruct is provided through the raw moments, M i , in eq. S3.
We exemplify the reconstruction of a frequency distribution using the R function called figure.S2. This is a demonstration function of the function tcheb.mom.pol which is the one that implements the above expressions to calculate the scaled Tchebichef moments and polynomials. We hope that by providing an example of how to use the tcheb.mom.pol function, this function can be used in a wide variety of situations with some adaptations to the problem at hand.
The function figure.S2 assumes that the package "untb" has been installed. If it has not, then you should run the following instruction before using the function figure.S2 install.packages ("untb") (assuming that you have access to the internet).
We suggest you copy and paste function figure -c(0,1,2,3,4,5,6,7,8, 9,10,11) counts <-c(27,11,20,30,29,38,30,21,20,5,4,1) (Notice that this is rather artificial since we know beforehand the distribution, the point here is solely to see how well the method performs and, simultaneously, provide an example on how to call the function tcheb.mom.pol, which is the one that really matters.) Here we called the function twice because we want to exemplify the reconstruction of the distribution using the maximum allowed number of moments (twelve) and a smaller number of moments (six) to illustrate how an increase in the number of polynomials increases the fitting of the curve to the original histogram. In a typical situation it would be probably called only once.
The function tcheb.mom.pol has the following structure tcheb.mom.pol <-function(no.m,no.bins,mom,S) and the meaning of the input variables is: -no.m is the number of raw moments and, therefore the number of scaled Tchebichef moments and polynomials, -no.bins is the number of bins, -mom is a vector containing the raw moments, and -S is the total number of points to obtain the frequency distribution.
The number of moments, no.m, should be smaller or equal to the number of bins, no.bins (see eq. S1). The function "tcheb.mom.poly" calls the functions "Ck.f", "ro.f" and "stirling.1st.ord" in order to implement equations S1 to S3.
The remainder of function figure.S2 serves only to plot the histogram of the original distribution and the curves obtained with tcheb.mom.pol. Figure S1. The shapes of the scaled Tchebichef polynomials from order 0 (a horizontal line equal to 1) to order 5. was assumed known, and from these we calculated the raw moments. We then reconstruct the original distribution using the information contained in the raw moments by using scaled Tchebichef moments and polynomials up to order 5 and 11, to which corresponds 6 and 12 moments, respectively. Notice that 6 moments (from order 0 to 5) already provide a reasonable fit to the empirical distribution; the use of 12 moments leads to a perfect fit. # if the library "untb" has not been installed, then before using this # instruction type (assuming that you have access to the internet):
However, compared to the data used in ref. 5 on tropical tree species in a contiguous 50ha plot in Barro Colorado Island, Panama, the problem we face with the present dataset is that we have a small number of transects for each island (with the exception of Terceira). Therefore, half that number provides a very small scaling region from which to estimate the slope and intercept of the perceived power law relationship; in fact, for Santa Maria island we do not attempt to test the method because the total number of transects is only four. For Terceira and Pico islands we also show the forecasted distribution based only one one quarter of the transects. Still, the forecasted distributions provided a reasonable estimate of the species relative abundance of the entire data set, Figure S3. Terceira islands the dotted (gray) line is the forecasted species relative abundance obtained from one quarter of the total number of transects (4 and 10 transects for Pico and Terceira islands, respectively).

Figures with results for all sampled islands
Here

Sensitivity analysis of the impact of the choice of dispersal ability on the evolution of the skewness of the SADs
Dispersal ability forms a continuous. However, it is at the moment impossible from a practical point of view to quantify the dispersal ability for the 337 species of arthropods used in this work. Therefore, we described dispersal ability in a binary way: high and low dispersal ability. If we had access to more precise information on the dispersal ability for each species we could repeat the analysis by using only those sets of species at the extremes of the dispersal continuum (after choosing some thresholds), but we believe that in such a case we would have observed very similar trends, just more pronounced. We hope that one day such information will become available, so that these analyses become possible to perform. Here we assess possible sources of biases due to the miss-identification of dispersal ability.
Although we dealt with a large number of species, only 19 (approximately 6% of all species, with 3 identified as "High" and 16 as "Low" dispersal) raised doubts about their dispersal category; these are listed in Table S2. Therefore, we first assessed what would have been the impact of changing the dispersal "type" of these species on our previous results. Because one of our main arguments was that dispersal ability determines the evolution of the shape of the SADs which, as we discussed in the main text, in mainly reflected in the skewness of the distributions, we focus on this attribute to summarize the impact of incorrectly assigning dispersal ability. Our conclusion is that it did not change our main findings, as we can see by comparing Fig. S5 with Fig. S9, which are virtually identical.
Although we are confident about our assessment of the dispersal ability of the large majority of species, in what follows we further conducted a more thorough sensitivity analyses. We consider four levels for the probability of a species dispersal ability being incorrectly identified, Prob(W) = 0.06, 0.12, 0.18, 0.24, that is, ranging from our perceived possible error of 6% up to approximately one quarter of the species. Then, among the species that were incorrectly identified we consider six levels for the probability of a species being incorrectly identified as "High" instead of "Low", Prob(H|W) = 0.0, 0.2, 0.4, 0.6, 0.8, 1.0 or, equivalently, the probability of being incorrectly identified as "Low" instead of "High", Prob(L|W) = 1-P(H|W).
We use the data from Terceira island, the one with the largest number of transects, to illustrate the impact of incorrectly identifying the dispersal ability on the evolution of the SADs as described by the skewness. From Fig. S10 we conclude that only for the largest values of Prob(W) = 0.18 and 0.24, and Prob(H|W) equal to zero would our conclusions have been different, values of Prob(W) and Prob(H|W) that we think are unlikely. Therefore, we believe that our results are not significantly impacted by any unlikely miss identification of a species dispersal ability. Figure S9. The evolution of the skewness as a function of the number of transects assuming an incorrect identification of the dispersal ability for the species listed in Table S2 and thus changing for these the dispersal trait from "high" to "low", and vice-versa. Figure S10. The evolution of the skewness as a function of the number of transects assuming different probabilities of incorrectly identifying a species, P(W), and different probability for an incorrectly identified species being "High" while it should be "Low", P(H|W). Plots a-f are for P(W) = 0.06, similar to the value estimated by us, g-l for P(W) = 0.12, m-r for P(W) = 0.18, and s-x for P(W) = 0.24. In each set of six plots P(H|W) = 0.0, 0.2, 0.4, 0.6, 0.8, 1.0.