Introduction

Values of quantitative traits can vary considerably among and within species1,2. The structure of variation can be explored by partitioning total variance into components related to different sources (e.g. variation between species, between sites within species, between individuals within site, and within individuals). The calculated variance components express the relative contribution of each source as a percentage; therefore, their values can be compared between sites, species, or traits. However, some ecological hypotheses concern the extent of trait variation1,2. For example, it is hypothesized that the extent of intraspecific trait variation (ITV) is higher in generalist than in specialist species3,4, and that it may change along environmental and species richness gradients5,6,7.

The coefficient of variation (CV), the standard deviation divided by the arithmetic mean, is the most widely used measure of the extent of trait variation e.g.8,9,10,11. CV has two advantages: it is dimensionless and it measures relative variation2. Absolute variation can be compared among traits only if the traits are measured in the same units. For example, the standard deviation of height, measured in cm, and that of SLA, measured in cm2 g−1, cannot be compared, whereas their CVs are comparable because CV is dimensionless. Comparing the absolute variation of the same trait between species can also be misleading when the difference between means is large: a departure of ten centimeters from the species' mean height is large for a short forb but small for a tall tree. That is why it is better to use relative measures, such as the coefficient of variation, for among-species comparisons too.
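The unit independence of CV can be illustrated with a minimal Python sketch. The height values and the helper name `cv` are assumptions made for illustration only; they are not data or code from this study.

```python
import statistics

def cv(values):
    """Sample CV: sample standard deviation divided by the arithmetic mean."""
    return statistics.stdev(values) / statistics.fmean(values)

# Hypothetical heights of five individuals, in cm (illustrative values only).
height_cm = [12.0, 15.0, 9.0, 20.0, 14.0]
# The same heights expressed in metres.
height_m = [h / 100 for h in height_cm]

sd_cm = statistics.stdev(height_cm)
sd_m = statistics.stdev(height_m)
cv_cm = cv(height_cm)
cv_m = cv(height_m)

print(f"SD: {sd_cm:.4f} (cm) vs {sd_m:.4f} (m)")  # depends on the unit
print(f"CV: {cv_cm:.4f} vs {cv_m:.4f}")           # unit-free
```

The standard deviation changes by the conversion factor, whereas CV is the same in both units up to rounding.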

Several papers have called attention to cases where CV should not be used e.g.12,13. The most important restrictions are that the domain of the variable has to be non-negative (otherwise, its arithmetic mean could be zero, preventing the calculation of CV) and that it has to be measured on a ratio or log-interval scale, where the meaning of the "zero" value is not arbitrary. CV cannot be calculated for nominal or ordinal scale data, for which the mean and standard deviation are undefined. It also should not be calculated for interval and difference scales (i.e. for log-transformed ratio-scale variables14), where changing the unit shifts the mean value and thus alters CV. Brendel15 pointed out that the CV of standardized stable isotope ratios depends on the applied reference isotope ratio. The aims of this paper are (1) to call attention to another problem: swapping the numerator and denominator of ratio-type traits results in an altered CV value; and (2) to suggest the use of the quartile coefficient of variation, which is free from this problem.
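The restriction to scales with a non-arbitrary zero can be illustrated with a short Python sketch. The temperature values are hypothetical and serve only as an example of an interval-scale variable; temperature is not a trait analysed in this paper.

```python
import statistics

def cv(values):
    """Sample CV: sample standard deviation divided by the arithmetic mean."""
    return statistics.stdev(values) / statistics.fmean(values)

# Hypothetical temperatures measured on an interval scale (degrees Celsius).
temp_celsius = [8.0, 10.0, 12.0, 15.0, 20.0]
# The same temperatures after shifting the arbitrary zero point (kelvin).
temp_kelvin = [t + 273.15 for t in temp_celsius]

cv_celsius = cv(temp_celsius)
cv_kelvin = cv(temp_kelvin)

# The standard deviation is identical, but the mean is shifted,
# so the two CV values disagree although the data are the same.
print(f"CV in Celsius: {cv_celsius:.4f}")
print(f"CV in kelvin:  {cv_kelvin:.4f}")
```

Because the zero point of the Celsius scale is a convention, neither value can be interpreted as the relative variation of temperature.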

Ratios of size or mass of plant organs are widely used as functional traits, such as the ratio of leaf area and leaf dry mass (specific leaf area, SLA, or its inverse, leaf mass per area, LMA), the ratio of root length and root dry mass (specific root length, SRL) or the ratio of shoot and root mass16. In these ratios, the numerator and denominator are often interchangeable without loss of meaning; for example, instead of the specific leaf area (SLA), its inverse, the leaf mass per area (LMA), is often calculated17. We would expect the relative variation of the two forms of a ratio (e.g. SLA and LMA) to be the same. Note that some ratios can be transformed into proportions. For example, instead of the shoot mass : root mass ratio, we can use the proportion of shoot mass, i.e. shoot mass/(shoot mass + root mass). In the case of proportions, the relative variation of a proportion and of its complement is considered.

Theory

The coefficient of variation is defined as the ratio of standard deviation and mean of the distribution:

$$CV=\frac{\sigma }{\mu }.$$

Regarding the ratio of two random variates as a bivariate function allows its mean (\({\mu }_{x/y}\)) and standard deviation (\({\sigma }_{x/y}\)) to be approximated by Taylor series expansion (see Supplementary Appendix A for the derivation of the formulas):

$${\mu }_{x/y}\approx {\widetilde{\mu }}_{x/y}=\frac{{\mu }_{x}}{{\mu }_{y}}-\frac{cov\left(x,y\right)}{{\mu }_{y}^{2}}+\frac{{\sigma }_{y}^{2}{\mu }_{x}}{{\mu }_{y}^{3}},$$
$${\sigma }_{x/y}\approx {\widetilde{\sigma }}_{x/y}=\frac{{\mu }_{x}}{{\mu }_{y}}\sqrt{\frac{{\sigma }_{x}^{2}}{{\mu }_{x}^{2}}+\frac{{\sigma }_{y}^{2}}{{\mu }_{y}^{2}}-2\frac{cov\left(x,y\right)}{{\mu }_{x}{\mu }_{y}}}.$$
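A minimal Python sketch of how such approximations can be checked against simulation. It uses the standard delta-method forms, i.e. with \({\sigma }_{y}^{2}\) in the last term of the mean and \({\sigma }_{y}^{2}/{\mu }_{y}^{2}\) under the square root. The function names, the parameter values and the choice of independent normal variates are assumptions made for illustration; they do not come from Supplementary Appendix A.

```python
import math
import random
import statistics

def approx_ratio_mean(mu_x, mu_y, sd_y, cov_xy=0.0):
    """Second-order Taylor approximation of the mean of x/y."""
    return mu_x / mu_y - cov_xy / mu_y**2 + sd_y**2 * mu_x / mu_y**3

def approx_ratio_sd(mu_x, mu_y, sd_x, sd_y, cov_xy=0.0):
    """First-order Taylor approximation of the standard deviation of x/y."""
    rel_var = (sd_x / mu_x)**2 + (sd_y / mu_y)**2 - 2 * cov_xy / (mu_x * mu_y)
    return mu_x / mu_y * math.sqrt(rel_var)

# Hypothetical parameters: small relative variation keeps y far from zero.
MU_X, SD_X, MU_Y, SD_Y = 10.0, 1.0, 5.0, 0.5

# Simulate independent normal x and y (so cov(x, y) = 0) and form the ratio.
rng = random.Random(1)
ratios = [rng.gauss(MU_X, SD_X) / rng.gauss(MU_Y, SD_Y) for _ in range(50_000)]

mean_approx = approx_ratio_mean(MU_X, MU_Y, SD_Y)
sd_approx = approx_ratio_sd(MU_X, MU_Y, SD_X, SD_Y)
mean_sim = statistics.fmean(ratios)
sd_sim = statistics.stdev(ratios)

print(f"mean: approximation {mean_approx:.4f}, simulation {mean_sim:.4f}")
print(f"SD:   approximation {sd_approx:.4f}, simulation {sd_sim:.4f}")
```

The approximations are only expected to be close when the relative variation of the denominator is small; with larger variation, the simulated and approximate values can diverge.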

If the CVs of x/y and y/x are equal, then

$$\frac{{\mu }_{x/y}}{{\sigma }_{x/y}}=\frac{{\mu }_{y/x}}{{\sigma }_{y/x}},$$

therefore,

$${\mu }_{x/y}=\frac{{\sigma }_{x/y}}{{\sigma }_{y/x}}{\mu }_{y/x}.$$

This equation should hold, at least approximately, for the approximate means and standard deviations as well, but:

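$$\frac{{\widetilde{\sigma }}_{x/y}}{{\widetilde{\sigma }}_{y/x}}{\widetilde{\mu }}_{y/x}=\frac{{\mu }_{x}^{2}}{{\mu }_{y}^{2}}\left(\frac{{\mu }_{y}}{{\mu }_{x}}-\frac{cov\left(x,y\right)}{{\mu }_{x}^{2}}+\frac{{\sigma }_{x}^{2}{\mu }_{y}}{{\mu }_{x}^{3}}\right)=\frac{{\mu }_{x}}{{\mu }_{y}}-\frac{cov\left(x,y\right)}{{\mu }_{y}^{2}}+\frac{{\sigma }_{x}^{2}}{{\mu }_{x}{\mu }_{y}}\ne {\widetilde{\mu }}_{x/y}.$$

The last term coincides with the last term of \({\widetilde{\mu }}_{x/y}\) only in the special case \({\sigma }_{x}/{\mu }_{x}={\sigma }_{y}/{\mu }_{y}\).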

Since the equation does not hold for the approximate value, we can expect that CVs of a ratio and its inverse may differ. A real example will be shown in the Results section to illustrate that the difference could be important.
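A small numerical illustration of this expectation, written as a Python sketch. The three ratio values are hypothetical and were chosen only because they are asymmetric on the logarithmic scale; `cv` is an illustrative helper, not the function from Supplementary Appendix B.

```python
import statistics

def cv(values):
    """Sample CV: sample standard deviation divided by the arithmetic mean."""
    return statistics.stdev(values) / statistics.fmean(values)

# Hypothetical values of a ratio x/y and of its inverse y/x.
ratio = [1.0, 2.0, 10.0]
inverse = [1 / r for r in ratio]

cv_ratio = cv(ratio)
cv_inverse = cv(inverse)

print(f"CV of x/y: {cv_ratio:.4f}")
print(f"CV of y/x: {cv_inverse:.4f}")
```

The two values disagree although both samples carry exactly the same information.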

However, there is an important exception: the case when the ratio follows a log-normal distribution. If x/y is log-normally distributed, its logarithm follows a normal distribution,

$$ln\frac{x}{y}\sim N\left(\nu ,\theta \right),$$

where \(\nu\) and \(\theta\) are the mean and standard deviation of the log-transformed ratio, respectively. The mean and standard deviation of the log-normal distribution are

$${\mu }_{x/y}=exp\left(\nu +0.5{\theta }^{2}\right),$$
$${\sigma }_{x/y}=exp\left(\nu +0.5{\theta }^{2}\right)\sqrt{\mathrm{exp}\left({\theta }^{2}\right)-1}.$$

Therefore, CV depends on \(\theta\) only, and it is independent from \(\nu\)18:

$${CV}_{x/y}=\sqrt{\mathrm{exp}\left({\theta }^{2}\right)-1}.$$

Since

$$ln\frac{y}{x}=-ln\frac{x}{y},$$

the logarithm of the inverse ratio is also normally distributed with the same standard deviation:

$$ln\frac{y}{x}\sim N\left(-\nu ,\theta \right).$$

Thus in this case CV is the same for the ratio and its inverse.
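The log-normal case can be checked numerically with a short Python sketch that evaluates the mean and standard deviation formulas given above. The parameter values and function names are assumptions chosen for illustration.

```python
import math

def lognormal_mean_sd(nu, theta):
    """Mean and SD of a log-normal variate whose log is N(nu, theta)."""
    mean = math.exp(nu + 0.5 * theta**2)
    sd = mean * math.sqrt(math.exp(theta**2) - 1)
    return mean, sd

def lognormal_cv(nu, theta):
    """CV computed as SD divided by mean; nu cancels out."""
    mean, sd = lognormal_mean_sd(nu, theta)
    return sd / mean

# Hypothetical parameters of ln(x/y).
NU, THETA = 1.3, 0.4

cv_ratio = lognormal_cv(NU, THETA)     # x/y, with ln(x/y) ~ N(nu, theta)
cv_inverse = lognormal_cv(-NU, THETA)  # y/x, with ln(y/x) ~ N(-nu, theta)

print(f"CV of x/y: {cv_ratio:.6f}")
print(f"CV of y/x: {cv_inverse:.6f}")
```

The means and standard deviations of the two distributions differ, but their quotient depends on \(\theta\) only.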

CV can be estimated by replacing the standard deviation (σ) and the mean (μ) with their estimates (s and m, respectively):

$$\widehat{CV}=\frac{s}{m}.$$

If x/y follows lognormal distribution, there is another estimator of CV:

$$\widehat{{CV}_{L}}=\sqrt{\mathrm{exp}\left({\widehat{\theta }}^{2}\right)-1}=\sqrt{\mathrm{exp}\left(\frac{\sum {\left({z}_{i}-\overline{z }\right)}^{2}}{n-1}\right)-1},$$

where \({z}_{i}=ln\left({x}_{i}/{y}_{i}\right)\), \(\overline{z }\) is the arithmetic mean of the log-transformed ratios and n is the sample size. \(\widehat{{CV}_{L}}\) can be used as a descriptive statistic even if the ratio does not follow a log-normal distribution.
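A minimal Python sketch of this estimator. The leaf area and leaf mass values are hypothetical, and `cv_l` is an illustrative name; this is not the R function of Supplementary Appendix B.

```python
import math
import statistics

def cv_l(x, y):
    """Log-normal based CV estimate of the ratio x/y.

    Uses the sample variance (n - 1 denominator) of z = ln(x/y).
    """
    z = [math.log(xi / yi) for xi, yi in zip(x, y)]
    return math.sqrt(math.exp(statistics.variance(z)) - 1)

# Hypothetical leaf areas (cm2) and leaf dry masses (g) of five leaves.
leaf_area = [12.1, 8.4, 15.3, 10.2, 9.7]
leaf_mass = [0.061, 0.050, 0.066, 0.058, 0.041]

cvl_sla = cv_l(leaf_area, leaf_mass)  # area / mass
cvl_lma = cv_l(leaf_mass, leaf_area)  # mass / area

print(f"CV_L of area/mass: {cvl_sla:.6f}")
print(f"CV_L of mass/area: {cvl_lma:.6f}")
```

Swapping the arguments only changes the sign of each \({z}_{i}\), so the variance of z, and with it the estimate, is unchanged apart from floating-point rounding.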

Kirkwood19 proposed another descriptive statistic, the so-called geometric coefficient of variation:

$$GCV=\mathrm{exp}\left(\sqrt{\frac{\sum {\left({z}_{i}-\overline{z }\right)}^{2}}{n-1}}\right)-1.$$

GCV is not an estimate of CV, even if the ratio x/y follows a log-normal distribution.
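The difference between GCV and \(\widehat{{CV}_{L}}\) can be seen by computing both from the same log-transformed values, as in the following Python sketch. The ratio values and function names are hypothetical.

```python
import math
import statistics

def log_sd(ratios):
    """Sample standard deviation (n - 1 denominator) of the log-ratios."""
    return statistics.stdev([math.log(r) for r in ratios])

def gcv(ratios):
    """Geometric coefficient of variation: exp(s_z) - 1."""
    return math.exp(log_sd(ratios)) - 1

def cv_l(ratios):
    """Log-normal based CV estimate: sqrt(exp(s_z^2) - 1)."""
    return math.sqrt(math.exp(log_sd(ratios) ** 2) - 1)

# Hypothetical ratio values (e.g. SLA in cm2 per g) and their inverses.
ratios = [150.0, 180.0, 210.0, 160.0, 240.0, 190.0]
inverses = [1 / r for r in ratios]

gcv_ratio, gcv_inverse = gcv(ratios), gcv(inverses)
cvl_ratio = cv_l(ratios)

print(f"GCV:  {gcv_ratio:.4f} (ratio), {gcv_inverse:.4f} (inverse)")
print(f"CV_L: {cvl_ratio:.4f}")
```

Both statistics are functions of the standard deviation of the log-ratios only, so both are unaffected by inverting the ratio, but they are different functions and give different values for the same sample.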

The logic of calculating CV is that dividing a measure of dispersion (the standard deviation in CV) by a measure of location (the mean in CV) removes the differences in dispersion that are due to different locations and, if both are measured in the same units, results in a dimensionless measure. Following this logic, several alternatives to CV have been developed. The main motivation was to develop more robust (i.e. less sensitive to outlier values) alternatives to CV (ref. 20 and references therein). Unfortunately, most of the proposed robust measures of relative variation are also sensitive to swapping the numerator and denominator in ratio-type traits. An exception is the quartile coefficient of variation (\({CV}_{Q}\)):

$${CV}_{Q}\left(x\right)=\frac{{Q}_{3}\left(x\right)-{Q}_{1}\left(x\right)}{{Q}_{3}\left(x\right)+{Q}_{1}\left(x\right)},$$

where \({Q}_{1}\left(x\right)\) and \({Q}_{3}\left(x\right)\) are the first and third quartiles of variable x20,21.
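A minimal Python sketch of the quartile coefficient of variation. The sample quartile definition is an assumption: the sketch uses `statistics.quantiles` with `method="inclusive"` (linear interpolation between order statistics), and other quartile definitions can give slightly different values. The data are hypothetical.

```python
import statistics

def cv_q(values):
    """Quartile coefficient of variation: (Q3 - Q1) / (Q3 + Q1)."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1)

# Hypothetical trait values; with five values, Q1 and Q3 are the 2nd and
# 4th order statistics under the inclusive method.
trait = [1.0, 2.0, 3.0, 4.0, 5.0]
cvq_trait = cv_q(trait)

print(f"CV_Q = {cvq_trait:.4f}")
```

Only the two quartiles enter the formula, which is why values in the tails of the sample do not affect the result.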

To prove that \({CV}_{Q}\left(x/y\right)={CV}_{Q}\left(y/x\right)\), we will use the equation \({Q}_{3}\left(y/x\right)=1/{Q}_{1}\left(x/y\right)\); therefore, this equation has to be proved first. Let us start from the definitions of the first quartile of x/y

$$P\left\{x/y\le {Q}_{1}\left(x/y\right)\right\}=0.25,$$

and third quartile

$$P\left\{y/x\le {Q}_{3}\left(y/x\right)\right\}=0.75.$$

From the definition of the first quartile of x/y

$$P\left\{x/y>{Q}_{1}\left(x/y\right)\right\}=0.75,$$

thus

$$P\left\{x/y>{Q}_{1}\left(x/y\right)\right\}= P\left\{y/x\le {Q}_{3}\left(y/x\right)\right\}.$$

If \(x/y>{Q}_{1}\left(x/y\right)\) then \(y/x<1/{Q}_{1}\left(x/y\right)\), therefore

$$P\left\{y/x<1/{Q}_{1}\left(x/y\right)\right\}= P\left\{y/x\le {Q}_{3}\left(y/x\right)\right\}.$$

Since for a continuous variable, the probability of any possible value is zero, on the right side the “less than or equal to” can be replaced by “less than”

$$P\left\{y/x<1/{Q}_{1}\left(x/y\right)\right\}= P\left\{y/x<{Q}_{3}\left(y/x\right)\right\},$$

and this equation holds only if

$${Q}_{3}\left(y/x\right)=1/{Q}_{1}\left(x/y\right).$$

Now we can return to the proof of the equality \({CV}_{Q}\left(x/y\right)={CV}_{Q}\left(y/x\right)\):

$${CV}_{Q}\left(y/x\right)=\frac{{Q}_{3}\left(y/x\right)-{Q}_{1}\left(y/x\right)}{{Q}_{3}\left(y/x\right)+{Q}_{1}\left(y/x\right)}=\frac{\frac{1}{{Q}_{1}\left(x/y\right)}-\frac{1}{{Q}_{3}\left(x/y\right)}}{\frac{1}{{Q}_{1}\left(x/y\right)}+\frac{1}{{Q}_{3}\left(x/y\right)}}= \frac{\frac{{Q}_{3}\left(x/y\right)-{Q}_{1}\left(x/y\right)}{{Q}_{3}\left(x/y\right){Q}_{1}\left(x/y\right)}}{\frac{{Q}_{3}\left(x/y\right)+{Q}_{1}\left(x/y\right)}{{Q}_{3}\left(x/y\right){Q}_{1}\left(x/y\right)}}=\frac{{Q}_{3}\left(x/y\right)-{Q}_{1}\left(x/y\right)}{{Q}_{3}\left(x/y\right)+{Q}_{1}\left(x/y\right)}={CV}_{Q}\left(x/y\right).$$

Note that finite sample estimates of \({CV}_{Q}\left(y/x\right)\) and \({CV}_{Q}\left(x/y\right)\) may slightly differ.
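The following Python sketch illustrates why finite-sample estimates may differ. It assumes that sample quartiles are obtained by linear interpolation between order statistics (`statistics.quantiles` with `method="inclusive"`); the data are hypothetical. When the quartiles coincide with observed values, the relationship \({Q}_{3}\left(y/x\right)=1/{Q}_{1}\left(x/y\right)\) holds exactly; when they are interpolated, it holds only approximately, because the reciprocal of an interpolated value is not the interpolated reciprocal.

```python
import statistics

def quartiles(values):
    """First and third sample quartiles (inclusive, linear interpolation)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3

def cv_q(values):
    """Quartile coefficient of variation: (Q3 - Q1) / (Q3 + Q1)."""
    q1, q3 = quartiles(values)
    return (q3 - q1) / (q3 + q1)

def invert(values):
    """Element-wise inverse of a sample of positive ratios."""
    return [1 / v for v in values]

# n = 5: quartiles fall exactly on the 2nd and 4th order statistics.
ratio_5 = [1.0, 2.0, 3.0, 5.0, 8.0]
# n = 6: quartiles are interpolated between neighbouring values.
ratio_6 = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0]

diff_5 = abs(cv_q(ratio_5) - cv_q(invert(ratio_5)))
diff_6 = abs(cv_q(ratio_6) - cv_q(invert(ratio_6)))

print(f"n = 5: |CV_Q(x/y) - CV_Q(y/x)| = {diff_5:.6f}")
print(f"n = 6: |CV_Q(x/y) - CV_Q(y/x)| = {diff_6:.6f}")
```

The size of the discrepancy depends on the quartile definition and on the spacing of the observations around the quartiles.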

Finally, let us briefly review the relative variation of proportions. The standard deviations of a proportion and its complement are the same: \({\sigma }_{x/\left(x+y\right)}={\sigma }_{y/\left(x+y\right)}\). Their means, however, are different, \({\mu }_{x/\left(x+y\right)}=1-{\mu }_{y/\left(x+y\right)}\); therefore, their CVs will also be different. The first and third quartiles of a proportion and its complement are related:

$${Q}_{3}\left(\frac{y}{x+y}\right)=1-{Q}_{1}\left(\frac{x}{x+y}\right),$$
$${Q}_{1}\left(\frac{y}{x+y}\right)=1-{Q}_{3}\left(\frac{x}{x+y}\right).$$

The interquartile range is the same for both y/(x + y) and x/(x + y), but the sum of the two quartiles, and therefore the quartile coefficient of variation, is different. The absolute variation (i.e. standard deviation or interquartile range) of a proportion and its complement is the same, but their relative variation is different. We have to keep in mind that a proportion and its complement are interchangeable when absolute variation is studied, but they have different meanings when relative variation is calculated.
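These statements about proportions can be verified with a short Python sketch. The proportions are hypothetical values, and sample quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which is an assumption about the quartile definition.

```python
import statistics

def cv(values):
    """Sample CV: sample standard deviation divided by the arithmetic mean."""
    return statistics.stdev(values) / statistics.fmean(values)

def quartiles(values):
    """First and third sample quartiles (inclusive, linear interpolation)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3

# Hypothetical shoot-mass proportions x/(x+y) and complements y/(x+y).
shoot_prop = [0.10, 0.20, 0.30, 0.40, 0.50]
root_prop = [1 - p for p in shoot_prop]

q1_s, q3_s = quartiles(shoot_prop)
q1_r, q3_r = quartiles(root_prop)

# Absolute variation: identical for a proportion and its complement.
sd_shoot, sd_root = statistics.stdev(shoot_prop), statistics.stdev(root_prop)
iqr_shoot, iqr_root = q3_s - q1_s, q3_r - q1_r

# Relative variation: different, because the location measures differ.
cv_shoot, cv_root = cv(shoot_prop), cv(root_prop)
cvq_shoot = iqr_shoot / (q3_s + q1_s)
cvq_root = iqr_root / (q3_r + q1_r)

print(f"SD:   {sd_shoot:.4f} vs {sd_root:.4f}")
print(f"IQR:  {iqr_shoot:.4f} vs {iqr_root:.4f}")
print(f"CV:   {cv_shoot:.4f} vs {cv_root:.4f}")
print(f"CV_Q: {cvq_shoot:.4f} vs {cvq_root:.4f}")
```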

Results

As expected, the differences between SLA and LMA in \(\widehat{{CV}_{L}}\) and GCV came only from rounding errors: the largest difference was of the order of \(10^{-16}\). For the quartile coefficient of variation, the largest difference was 0.007 (Fig. 1b). These differences hardly influenced the ranking of species according to the amount of intraspecific trait variation: the largest difference in ranks was 1, and for 67 of 79 species the rank was the same for both traits. In contrast, the amount of intraspecific trait variation (ITV) of SLA and LMA measured by \(\widehat{CV}\) (i.e. estimated standard deviation divided by sample mean) differed considerably (Fig. 1a): the largest difference was 1.07. Although the ranks of species based on SLA and LMA were strongly correlated even when ITV was measured by \(\widehat{CV}\) (Fig. 2), the position of some species was strongly influenced: the largest difference in ranks between the two traits (SLA and LMA) was 21, and the rank remained the same for only 4 of 79 species.

Figure 1

Within-species relative variation of specific leaf area (SLA) and leaf mass per area (LMA) calculated by (a) CV (coefficient of variation, standard deviation divided by mean) and (b) quartile coefficient of variation (see formula in the main text). Red line is the 1:1 line.

Figure 2

Rank of species based on their within-species relative variation of specific leaf area (SLA) and leaf mass per area (LMA) calculated by CV (coefficient of variation, standard deviation divided by mean). Red line is the 1:1 line.

The differences in \(\widehat{CV}\) between SLA and LMA were mainly caused by outlier values. After species-wise exclusion of outlier SLA values, the largest difference decreased to 0.25, but the difference between the ranks of species according to ITV of SLA and LMA remained large: the largest rank difference was 24 (even larger than without excluding outliers), and the two ranks were the same for only 14 of 79 species.

Excluding outlier values had a negligible effect on ITV measured by the quartile CV: the correlation between values estimated with and without excluding outliers was 0.99. The corresponding correlation for \(\widehat{CV}\) was 0.84. Surprisingly, the correlations between ITV calculated with and without excluding species-wise outliers were even smaller for \(\widehat{{CV}_{L}}\) and GCV (0.67 and 0.65, respectively).

All four measures of ITV indicate almost the same property of species (Table 1): the lowest linear correlation was 0.61, while the lowest Spearman's rank correlation was 0.72. The quartile coefficient of variation differed the most from the other three measures because it depends only on the central part of the trait distribution, and therefore it is fully insensitive to outlier values.

Table 1 Correlations between within-species relative variation of SLA with (upper half-matrix) and without (lower half-matrix) excluding outliers.

Discussion

The presented results illustrate that the ratio of the sample standard deviation and the sample mean (\(\widehat{CV}\)) is sensitive both to outlier values and to the choice between a ratio-type trait and its inverse (for example SLA or LMA). Three alternatives to this measure were evaluated in this paper. Both \(\widehat{{CV}_{L}}\) and GCV gave the same value for a trait and its inverse, but they are more sensitive to outlier values than \(\widehat{CV}\). The quartile CV proved to be the most robust measure of ITV: it was hardly influenced either by excluding outliers or by choosing a trait or its inverse. Therefore, I suggest that in studies testing hypotheses related to the amount of intraspecific trait variation, the quartile coefficient of variation should be used, especially if the inverse of the studied trait (i.e. 1/trait) is also meaningful.

Materials and methods

An R function for calculating two estimates of CV (\(\widehat{CV}\) and \(\widehat{{CV}_{L}}\)), the geometric coefficient of variation (GCV), and the quartile coefficient of variation (\({CV}_{Q}\)) was developed (Supplementary Appendix B). All analyses were done in the R environment, and the script and data will be available in a public repository.

For illustration, the dataset of Gyalus et al.22 was used, which contains plot-level measurements of leaf traits. In this paper, only specific leaf area (SLA, leaf area in cm2 per leaf dry mass in g) data were used. Leaf mass per area (LMA) was calculated as 1/SLA. Four indices of relative variation of SLA and LMA were calculated for each species with at least 10 SLA values. Then the absolute differences between SLA and LMA in relative within-species variation and in species rank according to within-species variation were calculated. Since \(\widehat{CV}\) could be more sensitive to outlier values than the other measures, all analyses were repeated after excluding outlier values.
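The workflow described above can be sketched in Python as follows. This is not the R script used in the study: the species names and SLA values are simulated placeholders, the function names are hypothetical, sample quartiles use `statistics.quantiles(..., method="inclusive")`, and the outlier-exclusion step is omitted because its rule is not specified here.

```python
import math
import random
import statistics

def relative_variation(values):
    """Return the four indices for one sample of positive trait values."""
    z = [math.log(v) for v in values]
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "CV": statistics.stdev(values) / statistics.fmean(values),
        "CV_L": math.sqrt(math.exp(statistics.variance(z)) - 1),
        "GCV": math.exp(statistics.stdev(z)) - 1,
        "CV_Q": (q3 - q1) / (q3 + q1),
    }

def ranks(scores):
    """Rank species by score (1 = smallest); ties broken by species name."""
    ordered = sorted(scores, key=lambda sp: (scores[sp], sp))
    return {sp: i + 1 for i, sp in enumerate(ordered)}

# Simulated placeholder SLA samples (cm2 per g) for made-up species.
rng = random.Random(42)
sla = {
    "species_a": [rng.lognormvariate(5.0, 0.15) for _ in range(12)],
    "species_b": [rng.lognormvariate(5.3, 0.30) for _ in range(15)],
    "species_c": [rng.lognormvariate(4.8, 0.50) for _ in range(10)],
    "species_d": [rng.lognormvariate(5.1, 0.20) for _ in range(6)],
}

MIN_N = 10  # keep only species with at least 10 SLA values
sla = {sp: vals for sp, vals in sla.items() if len(vals) >= MIN_N}
lma = {sp: [1 / v for v in vals] for sp, vals in sla.items()}  # LMA = 1/SLA

idx_sla = {sp: relative_variation(vals) for sp, vals in sla.items()}
idx_lma = {sp: relative_variation(vals) for sp, vals in lma.items()}

max_diff, max_rank_diff = {}, {}
for name in ("CV", "CV_L", "GCV", "CV_Q"):
    # Absolute SLA-LMA difference in the index, per species.
    diff = {sp: abs(idx_sla[sp][name] - idx_lma[sp][name]) for sp in sla}
    # Species ranks according to the index, for each form of the trait.
    r_sla = ranks({sp: idx_sla[sp][name] for sp in sla})
    r_lma = ranks({sp: idx_lma[sp][name] for sp in sla})
    max_diff[name] = max(diff.values())
    max_rank_diff[name] = max(abs(r_sla[sp] - r_lma[sp]) for sp in sla)
    print(f"{name}: largest difference {max_diff[name]:.2e}, "
          f"largest rank difference {max_rank_diff[name]}")
```

With real data, the dictionary `sla` would be filled from the measurements instead of the simulated values, and the whole calculation would be repeated after removing outliers.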