Quartile coefficient of variation is more robust than CV for traits calculated as a ratio

Botta-Dukát, Zoltán

doi:10.1038/s41598-023-31711-8

Published: 22 March 2023

Scientific Reports volume 13, Article number: 4671 (2023)

Abstract

Comparing within-species variations of traits can be used in testing ecological theories. In these comparisons, it is useful to remove the effect of the difference in mean trait values; therefore measures of relative variation, most often the coefficient of variation (CV), are used. The studied traits are often calculated as the ratio of the size or mass of two organs: e.g. specific leaf area (SLA) is the ratio of leaf size and leaf mass. Often the inverse of these ratios is also meaningful; for example, the inverse of SLA is often referred to as LMA (leaf mass per area). Relative variation of a trait and its inverse should not considerably differ. However, it is illustrated that using the coefficient of variation may result in differences that could influence the interpretation, especially if there are outlier trait values. Two alternatives, estimating CV from the standard deviation of log-transformed data assuming a log-normal distribution and Kirkwood’s geometric coefficient of variation, are free from this problem, but they proved to be sensitive to outlier values. The quartile coefficient of variation performed best in the tests: it gives the same value for a trait and its inverse and it is not sensitive to outliers.

Introduction

Values of quantitative traits can vary considerably among and within species^1,2. The structure of variation can be explored by partitioning total variance into components related to different sources (e.g. variation between species, between sites within species, between individuals within site, and within individuals). The calculated variance components express the relative contribution of sources in percentage, therefore their values can be compared between sites, species, or traits. However, some ecological hypotheses are related to the extent of trait variation^1,2. For example, it is hypothesized that the extent of intraspecific trait variation (ITV) is higher in generalist than in specialist species^3,4, and it may change along environmental and species richness gradients^5,6,7.

The coefficient of variation (CV), the standard deviation divided by the arithmetic mean, is the most widely used measure of the extent of trait variation (e.g.^8,9,10,11). CV has two advantages: it is dimensionless, and it measures relative rather than absolute variation². The extent of trait variation can be compared among traits only if it is measured in the same units. For example, the standard deviation of height, measured in cm, and that of SLA, measured in cm² g⁻¹, cannot be compared, while their CVs are comparable because CV is dimensionless. Comparing the absolute variation of the same trait between species can also be misleading when the difference between means is large: a departure of ten centimeters from the mean height of the species is large for a short forb but small for a tall tree. That is why it is better to use relative measures, such as the coefficient of variation, for among-species comparisons too.

Several papers have called attention to cases where CV should not be used (e.g.^12,13). The most important restrictions are that the domain of the variable has to be non-negative (otherwise, its arithmetic mean could be zero, preventing the calculation of CV) and that it has to be measured on a ratio or log-interval scale, where the meaning of the “zero” value is not arbitrary. CV cannot be calculated for nominal or ordinal scale data, where the mean and standard deviation are undefined. CV also should not be calculated for interval and difference scales (i.e. for log-transformed ratio-scale variables¹⁴), where changing the unit influences the mean value. Brendel¹⁵ pointed out that the CV of standardized stable isotope ratios depends on the applied reference isotope ratio. The aims of this paper are (1) to call attention to another problem: swapping the numerator and denominator of ratio-type traits results in an altered CV value; and (2) to suggest using the quartile coefficient of variation, which is free from this problem.

Ratios of the size or mass of plant organs are widely used as functional traits, such as the ratio of leaf area and leaf dry mass (specific leaf area, SLA, or its inverse, leaf mass per area, LMA), the ratio of root length and root dry mass (specific root length, SRL) or the ratio of shoot and root mass¹⁶. In these ratios, the numerator and denominator are often interchangeable without loss of meaning; for example, instead of the specific leaf area (SLA), its inverse, the leaf mass per area (LMA), is often calculated¹⁷. We would expect the relative variation of the two forms of a ratio (e.g. SLA and LMA) to be the same. Note that some ratios can be transformed into proportions. For example, instead of the shoot mass : root mass ratio, we can use the proportion of shoot mass, i.e. shoot mass/(shoot mass + root mass). In the case of proportions, the relative variation of a proportion and of its complement is considered.

Theory

The coefficient of variation is defined as the ratio of standard deviation and mean of the distribution:

$$CV=\frac{\sigma }{\mu }.$$

Regarding the ratio of two random variates as a bivariate function allows approximating its mean (${\mu }_{x/y}$) and standard deviation (${\sigma }_{x/y}$) by Taylor series expansion (see Supplementary Appendix A for the derivation of formulas):

$${\mu }_{x/y}\approx {\widetilde{\mu }}_{{x}/{y}}=\frac{{\mu }_{x}}{{\mu }_{y}}-\frac{cov\left(x,y\right)}{{\mu }_{y}^{2}}+\frac{{\sigma }_{y}^{2}{\mu }_{x}}{{\mu }_{y}^{3}},$$

$${\sigma }_{x/y}\approx {\widetilde{\sigma }}_{x/y}=\frac{{\mu }_{x}}{{\mu }_{y}}\sqrt{\frac{{\sigma }_{x}^{2}}{{\mu }_{x}^{2}}+\frac{{\sigma }_{y}^{2}}{{\mu }_{y}^{2}}-2\frac{cov\left(x,y\right)}{{\mu }_{x}{\mu }_{y}}}.$$

If the CVs of x/y and y/x are equal, then

$$\frac{{\sigma }_{x/y}}{{\mu }_{x/y}}=\frac{{\sigma }_{y/x}}{{\mu }_{y/x}},$$

therefore,

$${\mu }_{x/y}=\frac{{\sigma }_{x/y}}{{\sigma }_{y/x}}{\mu }_{y/x}.$$

This equation should hold, at least approximately, for the approximate means and standard deviations. The approximations for y/x are obtained by exchanging x and y in the two formulas above. The square-root term is symmetric in x and y, so ${\widetilde{\sigma }}_{x/y}/{\widetilde{\sigma }}_{y/x}={\mu }_{x}^{2}/{\mu }_{y}^{2}$, and

$$\frac{{\widetilde{\sigma }}_{x/y}}{{\widetilde{\sigma }}_{y/x}}{\widetilde{\mu }}_{y/x}=\frac{{\mu }_{x}}{{\mu }_{y}}-\frac{cov\left(x,y\right)}{{\mu }_{y}^{2}}+\frac{{\sigma }_{x}^{2}}{{\mu }_{x}{\mu }_{y}}\ne {\widetilde{\mu }}_{{x}/{y}},$$

unless x and y happen to have the same coefficient of variation (${\sigma }_{x}/{\mu }_{x}={\sigma }_{y}/{\mu }_{y}$).

Since the equation does not hold for the approximate value, we can expect that CVs of a ratio and its inverse may differ. A real example will be shown in the Results section to illustrate that the difference could be important.

However, there is an important exception: the case when the ratio follows a log-normal distribution. If x/y is log-normally distributed, its logarithm follows a normal distribution:

$$ln\frac{x}{y}\sim N\left(\nu ,\theta \right),$$

where $\nu$ and $\theta$ are the mean and standard deviation of the log-transformed ratio, respectively. The mean and standard deviation of the log-normal distribution are

$${\mu }_{x/y}=exp\left(\nu +0.5{\theta }^{2}\right),$$

$${\sigma }_{x/y}=exp\left(\nu +0.5{\theta }^{2}\right)\sqrt{\mathrm{exp}\left({\theta }^{2}\right)-1}.$$

Therefore, CV depends on $\theta$ only, and it is independent from $\nu$¹⁸:

$${CV}_{x/y}=\sqrt{\mathrm{exp}\left({\theta }^{2}\right)-1}.$$

Since

$$ln\frac{y}{x}=-ln\frac{x}{y},$$

the logarithm of the inverse ratio is also normally distributed with the same standard deviation:

$$ln\frac{y}{x}\sim N\left(-\nu ,\theta \right).$$

Thus in this case CV is the same for the ratio and its inverse.
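As a quick numerical check of the formula above (the value $\theta =0.5$ is an arbitrary choice for illustration, not an estimate from the data):

$${CV}_{x/y}=\sqrt{\mathrm{exp}\left({0.5}^{2}\right)-1}=\sqrt{\mathrm{exp}\left(0.25\right)-1}\approx \sqrt{0.2840}\approx 0.533.$$

The logarithm of the inverse ratio has the same standard deviation, $\theta =0.5$, so the same calculation gives ${CV}_{y/x}\approx 0.533$, whatever the value of $\nu$.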

CV can be estimated by replacing the standard deviation (σ) and the mean (μ) with their estimates (s and m, respectively):

$$\widehat{CV}=\frac{s}{m}.$$
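The estimator above can be sketched in a few lines of Python. This is only an illustration, not the R function from Supplementary Appendix B: the function name `cv` and the three data values are hypothetical, and the sample standard deviation is assumed to use the n − 1 denominator. The toy values are chosen so that the ratio and its inverse give visibly different estimates.

```python
from statistics import mean, stdev


def cv(values):
    """Sample CV: sample standard deviation (n - 1 denominator) over the arithmetic mean."""
    return stdev(values) / mean(values)


# Hypothetical values of a ratio-type trait (not taken from the paper's dataset).
ratio = [1.0, 2.0, 10.0]
inverse = [1.0 / r for r in ratio]

cv_ratio = cv(ratio)
cv_inverse = cv(inverse)

print(f"CV of the ratio:   {cv_ratio:.3f}")
print(f"CV of the inverse: {cv_inverse:.3f}")
```

With these values the two estimates differ clearly, which is the behaviour that the Taylor approximation above leads us to expect.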

If x/y follows a log-normal distribution, there is another estimator of CV:

$$\widehat{{CV}_{L}}=\sqrt{\mathrm{exp}\left({\widehat{\theta }}^{2}\right)-1}=\sqrt{\mathrm{exp}\left(\frac{\sum {\left({z}_{i}-\overline{z }\right)}^{2}}{n-1}\right)-1},$$

where ${z}_{i}=ln\left({x}_{i}/{y}_{i}\right)$, $\overline{z }$ is the arithmetic mean of the log-transformed ratios and n is the sample size. $\widehat{{CV}_{L}}$ can be used as a descriptive statistic even if the ratio does not follow a log-normal distribution.
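A minimal Python sketch of this estimator follows, assuming strictly positive ratios. The function name `cv_lognormal` and the data are hypothetical. It also checks that a ratio and its inverse give the same value, because inverting the ratio only changes the sign of the log-transformed values.

```python
import math
from statistics import stdev


def cv_lognormal(ratios):
    """CV estimated from the sample SD (n - 1 denominator) of log-transformed ratios."""
    log_sd = stdev([math.log(r) for r in ratios])
    return math.sqrt(math.exp(log_sd ** 2) - 1.0)


# Hypothetical ratio values; all must be strictly positive for the logarithm.
ratio = [1.0, 2.0, 10.0]
inverse = [1.0 / r for r in ratio]

cvl_ratio = cv_lognormal(ratio)
cvl_inverse = cv_lognormal(inverse)

print(f"CV_L of the ratio:   {cvl_ratio:.6f}")
print(f"CV_L of the inverse: {cvl_inverse:.6f}")
```

Any remaining difference between the two printed values comes from floating-point rounding only.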

Kirkwood¹⁹ proposed another descriptive statistic, the so-called geometric coefficient of variation:

$$GCV=\mathrm{exp}\left(\sqrt{\frac{\sum {\left({z}_{i}-\overline{z }\right)}^{2}}{n-1}}\right)-1.$$

GCV is not an estimate of CV, even if the ratio follows a log-normal distribution (i.e. even if z is normally distributed).
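The difference between GCV and $\widehat{{CV}_{L}}$ can be seen by computing both from the same sample. The sketch below is illustrative only: the function names and data are hypothetical, and the n − 1 denominator is assumed for the standard deviation of the log-transformed values.

```python
import math
from statistics import stdev


def log_sd(ratios):
    """Sample standard deviation (n - 1 denominator) of the log-transformed ratios."""
    return stdev([math.log(r) for r in ratios])


def gcv(ratios):
    """Geometric coefficient of variation: exp(s_z) - 1."""
    return math.exp(log_sd(ratios)) - 1.0


def cv_lognormal(ratios):
    """Log-normal based estimate of CV: sqrt(exp(s_z^2) - 1)."""
    return math.sqrt(math.exp(log_sd(ratios) ** 2) - 1.0)


# Hypothetical, strictly positive ratio values.
ratio = [1.0, 2.0, 10.0]
inverse = [1.0 / r for r in ratio]

gcv_ratio, gcv_inverse = gcv(ratio), gcv(inverse)
cvl_ratio = cv_lognormal(ratio)

print(f"GCV (ratio):   {gcv_ratio:.4f}")
print(f"GCV (inverse): {gcv_inverse:.4f}")
print(f"CV_L (ratio):  {cvl_ratio:.4f}")
```

Both statistics depend on the data only through the standard deviation of z, so both are unchanged by inverting the ratio, but they are different functions of that standard deviation and generally give different values.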

The logic of calculating CV is that dividing the measure of dispersion (standard deviation in CV) by the measure of location (mean in CV) removes the effect of differences in dispersion due to different locations, and, if both are measured in the same units, results in a dimensionless measure. Following this logic, several alternatives to CV have been developed. The main motivation was to develop more robust (i.e. less sensitive to outlier values) alternatives to CV (see²⁰ and references therein). Unfortunately, most of the proposed robust relative variation measures are also sensitive to swapping the numerator and denominator in ratio-type traits. An exception is the quartile coefficient of variation (CV_Q):

$${CV}_{Q}\left(x\right)=\frac{{Q}_{3}\left(x\right)-{Q}_{1}\left(x\right)}{{Q}_{3}\left(x\right)+{Q}_{1}\left(x\right)},$$

where ${Q}_{1}\left(x\right)$ and ${Q}_{3}\left(x\right)$ are the first and third quartiles of variable x^20,21.
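A minimal Python sketch of this formula is given below. The choice of sample quartile definition is an assumption: the text does not say which one is used, and here the `inclusive` method of `statistics.quantiles` is taken for illustration. The function name and the data are hypothetical.

```python
from statistics import quantiles


def cv_quartile(values):
    """Quartile coefficient of variation: (Q3 - Q1) / (Q3 + Q1)."""
    # Assumption: quartiles by linear interpolation between order statistics
    # (the "inclusive" method); other quartile definitions give slightly different values.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1)


# Hypothetical sample: Q1 = 4 and Q3 = 8, so CV_Q = (8 - 4) / (8 + 4).
sample = [2, 4, 6, 8, 10]
cvq = cv_quartile(sample)
print(f"CV_Q = {cvq:.3f}")  # prints "CV_Q = 0.333"
```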

To prove that ${CV}_{Q}\left(x/y\right)={CV}_{Q}\left(y/x\right)$, we will use the equation ${Q}_{3}\left(y/x\right)=1/{Q}_{1}\left(x/y\right)$, which therefore has to be proved first. Let us start from the definition of the first quartile

$$P\left\{x/y\le {Q}_{1}\left(x/y\right)\right\}=0.25,$$

and third quartile

$$P\left\{y/x\le {Q}_{3}\left(y/x\right)\right\}=0.75.$$

From the definition of the first quartile of x/y it also follows that

$$P\left\{x/y>{Q}_{1}\left(x/y\right)\right\}=0.75,$$

thus

$$P\left\{x/y>{Q}_{1}\left(x/y\right)\right\}= P\left\{y/x\le {Q}_{3}\left(y/x\right)\right\}.$$

If $x/y>{Q}_{1}\left(x/y\right)$ then $y/x<1/{Q}_{1}\left(x/y\right)$, therefore

$$P\left\{y/x<1/{Q}_{1}\left(x/y\right)\right\}= P\left\{y/x\le {Q}_{3}\left(y/x\right)\right\}.$$

Since for a continuous variable, the probability of any possible value is zero, on the right side the “less than or equal to” can be replaced by “less than”

$$P\left\{y/x<1/{Q}_{1}\left(x/y\right)\right\}= P\left\{y/x<{Q}_{3}\left(y/x\right)\right\},$$

and this equation holds only if

$${Q}_{3}\left(y/x\right)=1/{Q}_{1}\left(x/y\right).$$

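The same argument with the roles of the two quartiles exchanged gives ${Q}_{1}\left(y/x\right)=1/{Q}_{3}\left(x/y\right)$. Now we can return to the proof of the equality ${CV}_{Q}\left(x/y\right)={CV}_{Q}\left(y/x\right)$: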

$${CV}_{Q}\left(y/x\right)=\frac{{Q}_{3}\left(y/x\right)-{Q}_{1}\left(y/x\right)}{{Q}_{3}\left(y/x\right)+{Q}_{1}\left(y/x\right)}=\frac{\frac{1}{{Q}_{1}\left(x/y\right)}-\frac{1}{{Q}_{3}\left(x/y\right)}}{\frac{1}{{Q}_{1}\left(x/y\right)}+\frac{1}{{Q}_{3}\left(x/y\right)}}= \frac{\frac{{Q}_{3}\left(x/y\right)-{Q}_{1}\left(x/y\right)}{{Q}_{3}\left(x/y\right){Q}_{1}\left(x/y\right)}}{\frac{{Q}_{3}\left(x/y\right)+{Q}_{1}\left(x/y\right)}{{Q}_{3}\left(x/y\right){Q}_{1}\left(x/y\right)}}=\frac{{Q}_{3}\left(x/y\right)-{Q}_{1}\left(x/y\right)}{{Q}_{3}\left(x/y\right)+{Q}_{1}\left(x/y\right)}={CV}_{Q}\left(x/y\right).$$

Note that finite sample estimates of ${CV}_{Q}\left(y/x\right)$ and ${CV}_{Q}\left(x/y\right)$ may slightly differ.
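The sketch below illustrates this finite-sample effect under one assumed quartile definition (linear interpolation between order statistics, the `inclusive` method of Python's `statistics.quantiles`). The function name and data are hypothetical. When the quartiles coincide with observed values the two estimates agree, but when they are interpolated between observations a small difference appears, because interpolation is not preserved by taking reciprocals.

```python
from statistics import quantiles


def cv_quartile(values):
    """Quartile coefficient of variation with interpolated ("inclusive") sample quartiles."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1)


def inverse_of(values):
    return [1.0 / v for v in values]


# Five observations: the quartiles coincide with the 2nd and 4th order statistics.
on_data = [2.0, 4.0, 6.0, 8.0, 10.0]
diff_on_data = abs(cv_quartile(on_data) - cv_quartile(inverse_of(on_data)))

# Six observations: the quartiles are interpolated between order statistics.
interpolated = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
diff_interpolated = abs(cv_quartile(interpolated) - cv_quartile(inverse_of(interpolated)))

print(f"difference, quartiles on observed values: {diff_on_data:.2e}")
print(f"difference, interpolated quartiles:       {diff_interpolated:.2e}")
```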

Finally, let us briefly review the relative variation of proportions. The standard deviation of a proportion and that of its complement are the same: ${\sigma }_{x/\left(x+y\right)}={\sigma }_{y/\left(x+y\right)}$. But their means are different, ${\mu }_{x/\left(x+y\right)}=1-{\mu }_{y/\left(x+y\right)}$, therefore their CVs will also be different. The first and third quartiles of a proportion and its complement are related:

$${Q}_{3}\left(\frac{y}{x+y}\right)=1-{Q}_{1}\left(\frac{x}{x+y}\right),$$

$${Q}_{1}\left(\frac{y}{x+y}\right)=1-{Q}_{3}\left(\frac{x}{x+y}\right).$$

The interquartile range is the same for both y/(x + y) and x/(x + y), but the sum of the two quartiles, and therefore the quartile coefficient of variation, is different. The absolute variation (i.e. standard deviation or interquartile range) of a proportion and its complement is the same, but their relative variation is different. We have to keep in mind that a proportion and its complement are interchangeable when absolute variation is studied, but they have different meanings when relative variation is calculated.
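A small numerical illustration, with quartiles chosen arbitrarily for the example: suppose the proportion $p=x/\left(x+y\right)$ has ${Q}_{1}\left(p\right)=0.2$ and ${Q}_{3}\left(p\right)=0.4$. By the two relations above, its complement has ${Q}_{1}\left(1-p\right)=0.6$ and ${Q}_{3}\left(1-p\right)=0.8$. The interquartile range is 0.2 in both cases, but

$${CV}_{Q}\left(p\right)=\frac{0.4-0.2}{0.4+0.2}\approx 0.333,\qquad {CV}_{Q}\left(1-p\right)=\frac{0.8-0.6}{0.8+0.6}\approx 0.143.$$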

Results

As expected, the differences between SLA and LMA in $\widehat{{CV}_{L}}$ and GCV came only from rounding errors: the largest difference was of the order of $10^{-16}$. For the quartile coefficient of variation, the largest difference was 0.007 (Fig. 1a), and the differences hardly influenced the ranking of species according to the amount of intraspecific trait variation: the largest difference in ranks was 1, and for 67 of 79 species the rank was the same for both traits. In contrast, the amount of intraspecific trait variation (ITV) of SLA and LMA measured by $\widehat{CV}$ (i.e. estimated standard deviation divided by sample mean) differed considerably (Fig. 1b): the largest difference was 1.07. Although the ranks of species based on SLA and LMA were strongly correlated even when ITV was measured by $\widehat{CV}$ (Fig. 2), the position of some species was strongly influenced: the largest difference in ranks between the two traits (SLA and LMA) was 21, and the rank was the same for only 4 of 79 species.

The differences in $\widehat{CV}$ between SLA and LMA were mainly caused by outlier values. After species-wise exclusion of outlier SLA values, the largest difference decreased to 0.25, but the difference between the ranks of species according to ITV of SLA and LMA remained large: the largest rank difference was 24 (even larger than without excluding outliers), and the two ranks were the same for only 14 of 79 species.

Excluding outlier values had a negligible effect on ITV measured by the quartile CV: the correlation between values estimated with and without excluding outliers was 0.99. The corresponding correlation for $\widehat{CV}$ was 0.84. Surprisingly, the correlations between ITV calculated with and without excluding species-wise outliers were even smaller for $\widehat{{CV}_{L}}$ and GCV (0.67 and 0.65, respectively).

All four measures of ITV indicate almost the same property of species (Table 1): the lowest linear correlation was 0.61, while the lowest Spearman’s rank correlation was 0.72. The quartile coefficient of variation differed most from the other three measures because it depends only on the central part of the trait distribution, and therefore it is fully insensitive to outlier values.
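The contrast in outlier sensitivity can be illustrated with a toy sample. The sketch below is not the analysis reported above: the data are hypothetical, a single extreme value is appended by hand, and the quartiles are computed with an assumed definition (the `inclusive` method of `statistics.quantiles`).

```python
from statistics import mean, stdev, quantiles


def cv(values):
    """Sample standard deviation (n - 1 denominator) divided by the arithmetic mean."""
    return stdev(values) / mean(values)


def cv_quartile(values):
    """Quartile coefficient of variation with interpolated ("inclusive") quartiles."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1)


# Hypothetical trait values: 10, 11, ..., 20, then the same sample plus one extreme value.
base = list(range(10, 21))
with_outlier = base + [100]

cv_shift = abs(cv(with_outlier) - cv(base))
cvq_shift = abs(cv_quartile(with_outlier) - cv_quartile(base))

print(f"change in CV:   {cv_shift:.3f}")
print(f"change in CV_Q: {cvq_shift:.3f}")
```

In this toy case the single extreme value changes the CV several-fold, while CV_Q moves only slightly, because the quartiles are determined by the central part of the sample.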

Table 1 Correlations between within-species relative variation of SLA with (upper half-matrix) and without (lower half-matrix) excluding outliers.

Discussion

The presented results illustrate that the ratio of the sample standard deviation and the sample mean ($\widehat{CV}$) is sensitive both to outlier values and to the choice between a ratio trait and its inverse (for example SLA or LMA). Three alternatives to this measure were evaluated in this paper. Both $\widehat{{CV}_{L}}$ and GCV gave the same value for a trait and its inverse, but they are more sensitive to outlier values than $\widehat{CV}$. The quartile CV proved to be the most robust measure of ITV: it was hardly influenced either by excluding outliers or by choosing a trait or its inverse. Therefore, I suggest that in studies testing hypotheses related to the amount of intraspecific trait variation, the quartile coefficient of variation should be used, especially if the inverse of the studied trait (i.e. 1/trait) is also meaningful.

Materials and methods

An R function for calculating the two estimates of CV ($\widehat{CV}$ and $\widehat{{CV}_{L}}$), the geometric coefficient of variation (GCV), and the quartile coefficient of variation (${CV}_{Q}$) was developed (Supplementary Appendix B). All analyses were done in the R environment; the script and data are available in a public repository (see Data availability).

For illustration, the dataset of Gyalus et al.²² was used, which contains plot-level measurements of leaf traits. In this paper, only specific leaf area (SLA, leaf area in cm² per leaf dry mass in g) data were used. Leaf mass per area (LMA) was calculated as 1/SLA. Four indices of relative variation of SLA and LMA were calculated for each species with at least 10 SLA records. Then the absolute differences between SLA and LMA in relative within-species variation and in species rank according to within-species variation were calculated. Since $\widehat{CV}$ could be more sensitive to outlier values than the other measures, all analyses were repeated after excluding outlier values.
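The per-species workflow described above can be sketched as follows. This is a Python illustration under stated assumptions, not the R script used for the paper: the species labels and SLA values are made up, the quartile definition (`inclusive` interpolation) is an assumption, and the outlier-exclusion and ranking steps are omitted.

```python
import math
from statistics import mean, stdev, quantiles

MIN_RECORDS = 10  # species with fewer SLA records are skipped, as described above


def relative_variation(values):
    """Return the four indices of relative variation for one sample."""
    log_sd = stdev([math.log(v) for v in values])
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")  # assumed definition
    return {
        "CV": stdev(values) / mean(values),
        "CV_L": math.sqrt(math.exp(log_sd ** 2) - 1.0),
        "GCV": math.exp(log_sd) - 1.0,
        "CV_Q": (q3 - q1) / (q3 + q1),
    }


# Hypothetical SLA records (cm^2 per g) keyed by made-up species labels.
sla_by_species = {
    "species_a": [150, 162, 171, 180, 188, 195, 203, 214, 226, 410],
    "species_b": [90, 95, 99, 104, 108, 111, 117, 120, 126, 131, 140],
    "species_c": [200, 220, 240],  # fewer than MIN_RECORDS records
}

differences = {}
for species, sla in sla_by_species.items():
    if len(sla) < MIN_RECORDS:
        continue
    lma = [1.0 / v for v in sla]  # LMA calculated as 1/SLA
    itv_sla = relative_variation(sla)
    itv_lma = relative_variation(lma)
    # Absolute SLA-LMA difference for each index.
    differences[species] = {name: abs(itv_sla[name] - itv_lma[name]) for name in itv_sla}

for species, diff in differences.items():
    print(species, {name: f"{value:.2e}" for name, value in diff.items()})
```

In this toy data the first made-up species contains one extreme value, so its SLA and LMA estimates of CV differ noticeably, while the differences for CV_L and GCV are at the level of rounding error.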

Data availability

Data and code are available from Zenodo: https://doi.org/10.5281/zenodo.6907699.

References

1. Albert, C. H. et al. Intraspecific functional variability: Extent, structure and sources of variation. J. Ecol. 98, 604–613 (2010).
2. Albert, C. H., Grassein, F., Schurr, F. M., Vieilledent, G. & Violle, C. When and how should intraspecific variability be considered in trait-based plant ecology? Perspect. Plant Ecol. Evol. Syst. 13, 217–225 (2011).
3. Sides, C. B. et al. Revisiting Darwin’s hypothesis: Does greater intraspecific variability increase species’ ecological breadth? Am. J. Bot. 101, 56–62 (2014).
4. Wellstein, C. et al. Intraspecific phenotypic variability of plant functional traits in contrasting mountain grasslands habitats. Biodivers. Conserv. 22, 2353–2374 (2013).
5. Helsen, K. et al. Biotic and abiotic drivers of intraspecific trait variation within plant populations of three herbaceous plant species along a latitudinal gradient. BMC Ecol. 17, 38 (2017).
6. Kuppler, J. et al. Global gradients in intraspecific variation in vegetative and floral traits are partially associated with climate and species richness. Glob. Ecol. Biogeogr. 29, 992–1007 (2020).
7. Lemke, I. H. et al. Patterns of phenotypic trait variation in two temperate forest herbs along a broad climatic gradient. Plant Ecol. 216, 1523–1536 (2015).
8. Cheng, J., Chu, P., Chen, D. & Bai, Y. Functional correlations between specific leaf area and specific root length along a regional environmental gradient in inner Mongolia grasslands. Funct. Ecol. 30, 985–997 (2016).
9. Li, S. et al. Leaf functional traits of dominant desert plants in the Hexi Corridor, Northwestern China: Trade-off relationships and adversity strategies. Glob. Ecol. Conserv. 28, e01666 (2021).
10. Roscher, C. et al. Trait means, trait plasticity and trait differences to other species jointly explain species performances in grasslands of varying diversity. Oikos 127, 865–865 (2018).
11. Roscher, C. et al. Functional groups differ in trait means, but not in trait plasticity to species richness in local grassland communities. Ecology 99, 2295–2307 (2018).
12. Livers, J. J. Some limitations to use of coefficient of variation. J. Farm Econ. 24, 892 (1942).
13. Pélabon, C., Hilde, C. H., Einum, S. & Gamelon, M. On the use of the coefficient of variation to quantify and compare trait variation. Evol. Lett. 4, 180–188 (2020).
14. Houle, D., Pélabon, C., Wagner, G. P. & Hansen, T. F. Measurement and meaning in biology. Q. Rev. Biol. 86, 3–34 (2011).
15. Brendel, O. Is the coefficient of variation a valid measure for variability of stable isotope abundances in biological materials?: Is CV a valid measure for isotopic compositions? Rapid Commun. Mass Spectrom. 28, 370–376 (2014).
16. Pérez-Harguindeguy, N. et al. New handbook for standardised measurement of plant functional traits worldwide. Aust. J. Bot. 61, 167–234 (2013).
17. Poorter, H., Niinemets, Ü., Poorter, L., Wright, I. J. & Villar, R. Causes and consequences of variation in leaf mass per area (LMA): A meta-analysis. New Phytol. 182, 565–588 (2009).
18. Koopmans, L. H., Owen, D. B. & Rosenblatt, J. I. Confidence intervals for the coefficient of variation for the normal and log normal distributions. Biometrika 51, 25–32 (1964).
19. Kirkwood, T. B. L. Geometric means and measures of dispersion. Biometrics 35, 908–909 (1979).
20. Arachchige, C. N. P. G., Prendergast, L. A. & Staudte, R. G. Robust analogs to the coefficient of variation. J. Appl. Stat. 49, 268–290 (2022).
21. Bonett, D. G. Confidence interval for a coefficient of quartile variation. Comput. Stat. Data Anal. 50, 2953–2957 (2006).
22. Gyalus, A. et al. Plant trait records of the Hungarian and Serbian flora and methodological description of some hard to measure plant species. Acta Bot. Hung. 64, 451–454 (2022).

Acknowledgements

This research was supported by the NKFIH-K124671 grant.

Funding

Open access funding provided by ELKH Centre for Ecological Research.

Author information

Authors and Affiliations

Centre for Ecological Research, Alkotmány 2-4., Vácrátót, H-2163, Hungary
Zoltán Botta-Dukát

Contributions

Z.B.-D. conceived, designed, and executed this study and wrote the manuscript. No other person is entitled to authorship.

Corresponding author

Correspondence to Zoltán Botta-Dukát.

Ethics declarations

Competing interests

The author declares no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Botta-Dukát, Z. Quartile coefficient of variation is more robust than CV for traits calculated as a ratio. Sci Rep 13, 4671 (2023). https://doi.org/10.1038/s41598-023-31711-8

Received: 14 July 2022
Accepted: 16 March 2023
Published: 22 March 2023
DOI: https://doi.org/10.1038/s41598-023-31711-8
