## Background & Summary

Plant traits are the morphological, chemical, physiological or phenological properties of individuals1. They determine how plants, as primary producers, capture, process and store resources, how they respond to their abiotic and biotic environment and to disturbances, and how they affect other trophic levels and the fluxes of water, carbon and energy through ecosystems2,3,4,5,6,7,8.

Despite the overwhelming diversity of plant forms and life histories on Earth, single plant organs, such as leaves, stems, or seeds, show comparatively few essential trait combinations9. Evidence for recurrent trait syndromes beyond the level of single organs has been rare, restricted geographically or taxonomically, and often contradictory. Díaz et al.9 addressed this question by analyzing the worldwide variation in six major traits critical to growth, survival and reproduction, namely: plant height (H), stem specific density (SSD), leaf area (LA), leaf mass per area (LMA), leaf nitrogen content per dry mass (Nmass) and diaspore (seed or spore) mass (SM). Díaz et al.9 found that occupancy of the six-dimensional trait space is highly constrained, and is captured in a two-dimensional global spectrum of plant form and function, indicating strong correlation and trade-offs among traits. These results provide a foundation and baseline for studies of plant evolution, comparative plant and ecosystems ecology, and predictive modelling of future vegetation based on continuous variation in essential plant functional dimensions.

Here we provide the trait dataset that served as the basis for the analysis of the global spectrum of plant form and function presented in Díaz et al.9: the ‘Global Spectrum of Plant Form and Function Dataset’ (hereafter ‘Global Spectrum Dataset’). The dataset is predominantly based on trait records compiled in the TRY database10,11 and provides trait values corresponding, to the extent possible, to mature and healthy plants grown under natural conditions within the species distribution range. It provides species mean values for the six plant traits mentioned above plus leaf dry matter content, which was used to impute stem specific density. The dataset covers >46,000 of the approximately 391,000 vascular plant species known to science12. Despite the rapid development of large plant trait datasets, the Global Spectrum Dataset stands out in terms of coverage and reliability. First, it provides quantitative information for a very large number of species, about 5% of which have ‘complete coverage’ (all six traits). Second, it combines probabilistic outlier detection with comprehensive validation of trait values against expert knowledge and external information for data quality assurance. Third, it attributes data to their original references, even where datasets contributed to TRY had been assembled from multiple original sources.

The quantitative trait data are enhanced by higher-level taxonomic information, based on the Angiosperm Phylogeny Group classification APG III (http://www.mobot.org/MOBOT/research/APweb/), and by categorical traits, based on the ‘TRY – Categorical Traits Dataset’13 and enriched by field data and various literature sources. This information facilitates stratification of species and quantitative traits according to phylogenetic and morpho-functional criteria.

The present dataset integrates trait measurements from many datasets received via TRY with additional, partly unpublished, data. The data come from largely independent studies that address a wide variety of questions at different scales and use different measurement methods, units and terminologies14. Developing the dataset therefore posed three challenges: (1) to derive a dataset of species mean values covering all six traits that is representative of vascular plant species worldwide; (2) to detect erroneous trait records (due to errors in sampling, measurement, unit conversion, etc.); and (3) to ensure that correctly measured extreme trait values were not mistakenly identified as outliers and excluded from the dataset. To deal with these challenges, we collected as many trait observations as possible. The dataset was developed over six years (2009–2015), with new trait records added continuously as they became available. The final dataset is based on almost 1 million trait records, which can be traced back to ca. 2,500 references (see file: ‘References_original_sources.xlsx’). We identified outliers and potential errors using a probabilistic approach10 combined with validation by domain experts and external information.

These combined efforts of data acquisition, integration and quality control resulted in the most comprehensive and probably most accurate dataset for species mean traits of vascular plants published so far.

## Methods

### Selection of plant traits

There is an extensive literature, summarized in Díaz et al.9 and Pérez-Harguindeguy et al.6, supporting the key importance of the six core traits chosen (H, SSD, LA, LMA, Nmass and SM) for growth, survival and reproduction. Díaz et al.9 went further by showing that, together, these traits capture the essence of plant form and function at the broad scale: a two-dimensional space captures roughly three-quarters of total trait variation, with one major dimension reflecting the size of whole plants and their organs, and the other representing the balance between leaf construction costs and growth potential. The core quantitative traits were complemented by the categorical traits woodiness, growth form, succulence, adaptation to terrestrial or aquatic habitats, nutrition type, and leaf type.

### Definition of traits

In the following section we provide the names and definitions used for the continuous traits in the original publication of the global spectrum9, plus the names and definitions used in the Thesaurus Of Plant Characteristics (TOP)14. The detailed rationale, ecological meaning and key references for each trait can be found in the methods section of Díaz et al.9 and in Garnier et al.7. For the categorical traits we provide names, definitions where available, and the categories used in the database. Traits were mostly measured following the protocols and definitions specified in the ‘New Handbook for Standardised Measurement of Plant Functional Traits Worldwide’6 (http://www.nucleodiversus.org). Data from the LEDA database were measured following the protocols developed in the LEDA project16 (https://www.leda-traitbase.org). For published datasets, individual measurement protocols are available in the original publications listed in Table S1.

#### Plant height (H) (unit: m)

Adult plant height, i.e. typical height of the upper boundary of the main photosynthetic tissues at maturity (TOP: vegetative plant height; the plant height considering the highest vegetative component).

#### Stem specific density (SSD) (unit: mg mm−3)

Stem dry mass per unit of stem fresh volume (TOP: stem specific density; the ratio of the mass of the stem or a unit thereof assessed after drying to its volume assessed without drying). SSD is much more commonly measured on woody species (particularly trees) than on non-woody species. Therefore, gaps in SSD for non-woody species were filled by estimates derived from leaf dry matter content (see Data Imputation below).

#### Leaf area (LA) (unit: mm2)

One-sided surface area of an individual lamina (TOP: leaf lamina area; the area of the leaf lamina in the one-sided projection; in case of compound leaves the area of a leaflet lamina).

#### Leaf mass per area (LMA) (unit: g m−2)

Leaf dry mass per unit of lamina surface area (TOP: leaf mass per area, the ratio of the dry mass of a leaf to its area).

#### Leaf nitrogen per mass (Nmass) (unit: mg g−1)

Leaf nitrogen content per unit of lamina dry mass (leaf total N) (TOP: leaf nitrogen content per leaf dry mass; the ratio of the quantity of nitrogen in the leaf or component thereof, i.e. leaf lamina or leaflet, per respective unit dry mass).

#### Diaspore mass (SM) (unit: mg)

Dry mass of an individual seed or spore plus any additional structures that assist dispersal and do not easily detach (TOP: seed dry mass; mass of an individual seed or spore assessed after drying). Spore mass of pteridophytes, rarely reported in the literature, was estimated from published values of diaspore diameter and density (see Data Imputation below).

#### Leaf dry matter content (LDMC) (unit: g g−1)

The ratio of the dry mass of the leaf or a component thereof, i.e. the leaf lamina, to the corresponding water-saturated fresh mass. In addition to the six focal traits, we compiled LDMC for herbaceous plants to estimate missing values of SSD (see Data Imputation below).

#### Adaptation to terrestrial or aquatic habitats

Classification based on the type of habitat in which the species naturally grows. Categories: aquatic, aquatic/semiaquatic, semiaquatic, terrestrial.

#### Woodiness

A feature of the whole plant defining the occurrence and distribution of wood along the stem. Categories: woody, non-woody, semi-woody (woody at base of stem(s) only).

#### Growth form

Growth form is mainly determined by woodiness and the direction and extent of growth, and any branching of the main shoot axis or axes. Categories: bamboo graminoid, climber, fern, herbaceous graminoid, herbaceous non-graminoid, herbaceous non-graminoid/shrub, succulent, shrub, shrub/tree, tree, other.

#### Succulence

Succulence characterizes plants with thickened, fleshy and engorged parts, usually serving to retain water where climate or soil characteristics strongly limit water availability. This trait aims to provide more detailed information on the succulent growth form whenever available. Categories: leaf and stem succulent, leaf rosette and stem succulent, leaf rosette succulent, leaf rosette succulent (tall), leaf succulent, stem succulent, stem succulent (short), stem succulent (tall), succulent.

#### Nutrition type

Nutrition type here refers to whether the major source of energy and nutrients for the plant is photosynthesis, animals, dead material or other plants. Parasitism categories: hemiparasitic, holoparasitic, independent, parasitic. Carnivory categories: carnivorous, detritivorous.

According to the ‘New Handbook for Standardised Measurement of Plant Functional Traits Worldwide’6, succulence and nutrition type are part of growth form. We treat them separately here for simplicity and to avoid combined categories.

#### Leaf type

A classification of the presence or absence of photosynthetically active leaves and their basic forms. Categories: broadleaved, needleleaved, scale-shaped, scale-shaped/needleleaved, photosynthetic stem.

### Definition of representative trait records

The six core quantitative traits show intraspecific variation, caused, among other factors, by differences in ontogenetic stage and growth conditions. Because the dataset focuses on species means rather than intraspecific variation, it was intended to represent mean trait values for mature and healthy (not obviously unhealthy) plants grown under natural conditions within the species distribution range. Leaf traits were intended to represent young but fully expanded and healthy leaves from the light-exposed top of the canopy. Trait records not conforming to these requirements, i.e. records from plants grown under experimental conditions in laboratories and records measured on juvenile plants, were excluded from the dataset. These decisions were based on the respective metadata in the TRY database (see below).

### Data sources

The vast majority of quantitative trait data were provided by the TRY Plant Trait Database10 (https://www.try-db.org; TRY version 2.0 accessed July 2010, updated with TRY version 3.0 accessed May 2015). This dataset was supplemented by a small number of published data not included in TRY and by original unpublished data contributed by W. J. Bond, J. H. C. Cornelissen, S. Díaz, L. Enrico, M. T. Fernandez-Piedade, L. D. Gorné, D. Kirkup, M. Kleyer, N. Salinas, E.-D. Schulze, K. Thompson, and R. Urrutia-Jalabert.

Categorical traits were derived from the TRY Categorical Traits Dataset (https://www.try-db.org/TryWeb/Data.php#3), enhanced by field data and various literature sources.

The datasets contributing quantitative traits via TRY are described in Supplementary Table S1, which contains data from refs. 4,16–233 and the following unpublished datasets: French Weeds Trait Database; Photosynthesis and Leaf Characteristics Database; South African Woody Plants Database (ZLTP); Tundra Plant Traits Database; Leaf N-Retention Database; Traits for Herbaceous Species from Andorra; Leaf Characteristics of Pinus sylvestris and Picea abies; Plant Coastal Dune Traits (France, Aquitaine); Dispersal Traits Database; LABDENDRO Brazilian Subtropical Forest Traits Database; Growth and Herbivory of Juvenile Trees; Cold Tolerance, Seed Size and Height of North American Forest Tree Species; Harze Trait Intravar: SLA, LDMC and Plant Height for Calcareous Grassland Species in South Belgium; Functional Traits for Restoration Ecology in the Colombian Amazon; Komati Leaf Trait Data; Baccara - Plant Traits of European Forests; Traits of Bornean Trees Database; Meadow Plant Traits: Biomass Allocation, Rooting depth; New South Wales Plant Traits Database; Catalonian Mediterranean Shrubland Trait Database; The Netherlands Plant Height Database; Plant Traits from Spanish Mediterranean Shrublands; Crown Architecture Database; Maxfield Meadow, Rocky Mountain Biological Laboratory – LMA; Herbaceous Plants Traits From Southern Germany; Leaf Area, Dry Mass and SLA Dataset; Herbaceous Leaf Traits Database Old Field New York; Plant Functional Traits From the Province of Almeria, Spain; Traits for Common Grasses and Herbs in Spain; Midwestern and Southern US Herbaceous Species Trait Database; Overton/Wright New Zealand Database; San Lorenzo Epiphyte Leaf Traits Database.

The reference for each individual trait record contributing via TRY to the Global Spectrum Dataset before exclusion of non-representative trait records, errors and duplicates is documented in the data file ‘References.xlsx’.

### Data integration and quality management

#### Semantic integration of terminologies from different datasets

Ecological studies address a large number of different questions at different scales, and researchers often work independently and with little coordination. This results in idiosyncratic datasets using heterogeneous terminologies14. The first step was therefore the semantic integration of terminologies. The core traits were standardized according to the definitions and measurement protocols provided in the Thesaurus Of Plant Characteristics (TOP)14 and the ‘New Handbook for Standardised Measurement of Plant Functional Traits Worldwide’6,15. The metadata for plant and organ maturity (juvenile, mature), health (healthy, not healthy), growth conditions (natural, experimental) and sun- versus shade-grown leaves were harmonized across datasets.

#### Consolidation of taxonomy

Species names were standardized and attributed to families according to The Plant List (http://www.theplantlist.org), the commonly accepted list for vascular plants at the time of publication of Díaz et al.9, using TNRS234,235, complemented by manual standardization by experts. Attribution of families to higher-rank groups was made according to APG III (2009) (http://www.mobot.org/MOBOT/research/APweb/).

#### Conversion and correction of units, and exclusion of errors

Different datasets often used different units for the same trait. After conversion to the standard unit for each trait, differences among datasets, sometimes amounting to orders of magnitude, became obvious. These differences could often be traced back to errors in the original units and were corrected. Obvious errors (e.g. impossible trait values such as LMA < 0 g m−2) were excluded from the dataset.
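To make this screening step concrete, the sketch below shows one way such checks could be automated. It is an illustration, not the procedure used for the dataset: the function name `screen_dataset_units`, the input layout (values already converted to the standard unit and grouped by source dataset) and the threshold of 0.75 orders of magnitude are all assumptions. Non-positive values are treated as impossible for these traits, and a dataset whose median deviates from the pooled median by roughly a power of ten is reported with that power, which often hints at a unit error.

```python
import math
import statistics


def screen_dataset_units(records, min_log10_offset=0.75):
    """Flag impossible values and dataset-level offsets of about a power of ten.

    records: dict mapping a (hypothetical) dataset name to a list of trait
    values already converted to the standard unit, e.g. LMA in g m-2.
    Returns (impossible, suspect): impossible maps dataset -> non-positive
    values; suspect maps dataset -> rounded log10 offset of its median.
    """
    impossible = {}
    logs_by_dataset = {}
    for name, values in records.items():
        bad = [v for v in values if v <= 0]  # impossible for these traits
        if bad:
            impossible[name] = bad
        logs_by_dataset[name] = [math.log10(v) for v in values if v > 0]

    pooled = [x for logs in logs_by_dataset.values() for x in logs]
    pooled_median = statistics.median(pooled)

    suspect = {}
    for name, logs in logs_by_dataset.items():
        if not logs:
            continue
        offset = statistics.median(logs) - pooled_median
        if abs(offset) >= min_log10_offset:
            suspect[name] = round(offset)  # nearest power of ten
    return impossible, suspect


impossible, suspect = screen_dataset_units(
    {"a": [80, 100, 120], "b": [90, 110, 95], "c": [0.09, 0.1, 0.11, -5]}
)
print(impossible, suspect)
```

Because the pooled median is itself affected by erroneous datasets, a screen like this only points to candidates; flagged datasets would still need to be traced back to their original units, as described above.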

#### Data imputation

To increase the number of species with values for all six core traits, trait records for SSD, LMA, Nmass and SM were complemented by trait values derived from records of related traits:

##### Imputation of SSD

Trait records for SSD are available for a very large number of woody species but for very few herbaceous species. To incorporate this fundamental trait in the analyses by Díaz et al.9, we estimated SSD for herbaceous species from leaf dry matter content (LDMC), a much more widely available trait, exploiting its close correlation with stem dry matter content (StDMC, the ratio of stem dry mass to stem water-saturated fresh mass). StDMC is a good proxy for SSD in herbaceous plants, with a ratio of approximately 1:1 (ref. 199), despite substantial differences in stem anatomy among botanical families236, including those between non-monocotyledons and monocotyledons (for which sheaths were measured). We used a dataset of 422 herbaceous species from 31 botanical families, collected in the field across Europe and Israel, to parameterize linear relationships of StDMC to LDMC. The slopes were significantly higher for monocotyledons than for other angiosperms (F = 12.3; P < 0.001, analysis of covariance); within non-monocotyledons, the slope for Fabaceae was higher than that for species from other families (F = 4.5; P < 0.05, analysis of covariance). We therefore used three different equations (Table 1), one for monocotyledons, one for Fabaceae and one for other non-monocotyledons, to predict SSD for 1,963 herbaceous species for which LDMC values were available in TRY. Estimated data are flagged.
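The choice among the three equations can be sketched as follows. This is a minimal illustration that assumes each equation is an untransformed linear model of StDMC on LDMC. The fitted coefficients are given in Table 1 and are not reproduced here, so the coefficient values below are arbitrary placeholders, and the function and group names are hypothetical.

```python
from typing import Dict, Tuple

# group name -> (intercept, slope) of a linear StDMC ~ LDMC equation
Coefficients = Dict[str, Tuple[float, float]]


def equation_group(is_monocot: bool, family: str) -> str:
    """Pick which of the three equations applies to a herbaceous species."""
    if is_monocot:
        return "monocot"
    if family == "Fabaceae":
        return "fabaceae"
    return "other_non_monocot"


def impute_ssd(ldmc: float, is_monocot: bool, family: str, coeffs: Coefficients) -> dict:
    """Estimate SSD (mg mm-3) from LDMC (g g-1) and flag it as imputed.

    The equation predicts StDMC (g g-1); because StDMC approximates SSD at
    roughly 1:1 in herbaceous plants, the prediction is used as the SSD estimate.
    """
    intercept, slope = coeffs[equation_group(is_monocot, family)]
    stdmc = intercept + slope * ldmc
    return {"SSD": stdmc, "imputed": True}


# ARBITRARY placeholder coefficients for illustration only, NOT the Table 1 values.
placeholder = {
    "monocot": (0.0, 2.0),
    "fabaceae": (0.0, 3.0),
    "other_non_monocot": (0.0, 1.0),
}
print(impute_ssd(0.25, False, "Fabaceae", placeholder))
```

Keeping the imputed flag next to the value mirrors the flagging of estimated data described above, so downstream analyses can separate measured from estimated SSD.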

##### Imputation of LMA

Trait records for SLA (leaf area per leaf dry mass) were converted to LMA (leaf dry mass per leaf area): LMA = 1/SLA.
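Because SLA and LMA are usually reported in different units, the reciprocal also needs a scale factor. A minimal sketch, assuming SLA is given in mm2 mg−1 (numerically identical to m2 kg−1) and LMA is wanted in g m−2 as defined above: 1/SLA is then in mg mm−2, and 1 mg mm−2 = 1,000 g m−2. The function name is hypothetical.

```python
def lma_from_sla(sla_mm2_per_mg: float) -> float:
    """Convert SLA (mm2 mg-1, equivalently m2 kg-1) to LMA (g m-2)."""
    if sla_mm2_per_mg <= 0:
        raise ValueError("SLA must be positive")
    # 1 / SLA has units mg mm-2; 1 mg mm-2 = 1e-3 g / 1e-6 m2 = 1000 g m-2
    return 1000.0 / sla_mm2_per_mg


print(lma_from_sla(20.0))  # 50.0
```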

##### Imputation of Nmass

Trait records for leaf nitrogen content per leaf area (Narea) were converted to records of leaf nitrogen content per leaf dry mass (Nmass) if records for LMA were available for the same observation (leaf): Nmass = Narea/LMA.
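A minimal sketch of this conversion, assuming Narea and LMA are both in g m−2 (so their ratio is in g g−1 and a factor of 1,000 gives mg g−1, the unit used for Nmass) and that each record is a dictionary holding the values measured on one observation. Field names and functions are hypothetical. Nmass is derived only when both inputs belong to the same observation, as required above.

```python
def nmass_from_narea(narea_g_m2: float, lma_g_m2: float) -> float:
    """Nmass (mg g-1) = Narea / LMA, times 1000 to go from g g-1 to mg g-1."""
    return 1000.0 * narea_g_m2 / lma_g_m2


def impute_nmass(observations: list) -> list:
    """Fill missing Nmass from Narea and LMA measured on the same observation."""
    completed = []
    for obs in observations:
        obs = dict(obs)  # copy so the caller's records are not modified
        # obs.get("LMA") is falsy for None or 0, which also avoids dividing by zero
        if obs.get("Nmass") is None and obs.get("Narea") is not None and obs.get("LMA"):
            obs["Nmass"] = nmass_from_narea(obs["Narea"], obs["LMA"])
            obs["Nmass_imputed"] = True
        completed.append(obs)
    return completed


records = [{"Narea": 2.0, "LMA": 100.0}, {"Narea": 2.0}]  # second record lacks LMA
out = impute_nmass(records)
print(out)
```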

##### Imputation of SM

To include trait data for pteridophytes in the analyses of Díaz et al.9, diaspore mass was estimated from published data on spore radius (r). We assumed that spores are approximately spherical, with volume = (4/3)πr³, and that their density is 0.5 mg mm−3 (refs. 237,238,239,240). Although these assumptions are imprecise, we are confident that they yield spore masses of the right order of magnitude, several orders of magnitude smaller than the seed masses of spermatophytes. Most data were from Page237; data for Sadleria pallida were from Lloyd238, for Pteridium aquilinum from Conway239, and for Diphasiastrum spp. from Stoor et al.240.
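The calculation can be written out as follows. This sketch assumes the published spore radius is in micrometres and converts it to millimetres so that the density of 0.5 mg mm−3 applies directly; the function name is hypothetical.

```python
import math

SPORE_DENSITY_MG_PER_MM3 = 0.5  # density assumed in the text


def spore_mass_mg(radius_um: float, density: float = SPORE_DENSITY_MG_PER_MM3) -> float:
    """Mass (mg) of a spherical spore whose radius is given in micrometres."""
    radius_mm = radius_um / 1000.0
    volume_mm3 = (4.0 / 3.0) * math.pi * radius_mm ** 3
    return density * volume_mm3


print(f"{spore_mass_mg(20.0):.2e}")  # 1.68e-05
```

A spore 20 µm in radius thus has an estimated mass on the order of 10⁻⁵ mg, consistent with the statement that spore masses fall several orders of magnitude below the seed masses of spermatophytes.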

#### Probabilistic outlier detection

The hierarchical taxonomic classification of plants into families, genera and species has been shown to be highly informative with respect to the probability of trait values241,242,243. We therefore used it to conduct outlier detection at each of these levels.

The six core traits provided in the Global Spectrum Dataset are approximately normally distributed on a logarithmic scale10. We therefore assume that, on the log scale, trait values are samples from normal distributions. A normal distribution is symmetric about its mean, with 99.73% of values expected within the mean ± 3 standard deviations and 99.99% within the mean ± 4 standard deviations. Using these wide intervals reduces the risk that correctly measured values at the true extremes of trait variation in nature are mistakenly identified as outliers and excluded from the dataset.

The z-score indicates how many standard deviations a record is away from the mean:

$$\text{z-score} = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$

Under the assumed normal distribution, trait values with absolute z-scores >4 (>3) occur with a probability of less than 0.01% (0.3%). Such values most probably result from errors not yet detected for the individual records, e.g. wrong units, decimal errors in the trait value, wrong species (e.g. mistakenly attributing a herb species name to a height measured on a tree), problems related to the trait definition, or non-representative growth or measurement conditions. We acknowledge, however, that the choice of z-score cutoff is arbitrary.
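A minimal sketch of the z-score on the log scale, together with the two-sided normal tail probability behind the thresholds of 3 and 4. Base-10 logarithms are an assumption here (any base gives the same z-score as long as mean and standard deviation are computed on the same scale), and the function names are hypothetical.

```python
import math


def z_score_log(value: float, mean_log10: float, sd_log10: float) -> float:
    """z-score of a positive trait value relative to a mean and SD on the log10 scale."""
    return (math.log10(value) - mean_log10) / sd_log10


def two_sided_tail(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal variable Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# A record 10x above a taxon's geometric mean of 100, with an SD of 0.25 log10 units:
print(z_score_log(1000.0, 2.0, 0.25))  # 4.0
for z in (3, 4):
    print(z, f"{two_sided_tail(z):.4%}")  # 3 0.2700% and 4 0.0063%
```

The tail probabilities reproduce the 99.73% and 99.99% coverage quoted above for ±3 and ±4 standard deviations.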

In many cases the number of trait values per taxon (e.g. a given species) was too small to provide a representative sample or a reliable estimate of the standard deviation (see Fig. 1). To circumvent this problem, we used the average standard deviation of trait values at the given taxonomic level (species, genus, family or all vascular plants). This average approximates the standard deviation expected for an individual taxon if a sufficient number of observations were available (Fig. 1)10.
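The sketch below illustrates the averaging idea and is not the TRY implementation. The record layout, the minimum group size for estimating a standard deviation (`min_n`) and the function name are assumptions. At each level, every record is compared with the mean of its own taxon, but divided by the average standard deviation across all taxa at that level that have enough records; the largest absolute z-score across levels is kept, matching the "at least one taxon mean" criterion used in the text.

```python
import math
import statistics
from collections import defaultdict

LEVELS = ("species", "genus", "family", "all")


def max_abs_z(records: list, min_n: int = 2) -> list:
    """Largest |z| of each record across taxonomic levels, on the log10 scale."""
    if min_n < 2:
        raise ValueError("at least two records are needed to estimate an SD")
    logs = [math.log10(r["value"]) for r in records]
    worst = [0.0] * len(records)
    for level in LEVELS:
        groups = defaultdict(list)  # taxon name -> indices of its records
        for i, r in enumerate(records):
            groups["all" if level == "all" else r[level]].append(i)
        sds = [statistics.stdev([logs[i] for i in idx])
               for idx in groups.values() if len(idx) >= min_n]
        avg_sd = statistics.fmean(sds) if sds else 0.0
        if avg_sd == 0.0:
            continue  # no usable spread estimate at this level
        for idx in groups.values():
            mean = statistics.fmean([logs[i] for i in idx])
            for i in idx:
                worst[i] = max(worst[i], abs(logs[i] - mean) / avg_sd)
    return worst


records = (
    [{"value": v, "species": "A", "genus": "G", "family": "F"} for v in (100, 110, 90, 105, 95)]
    + [{"value": v, "species": "B", "genus": "G", "family": "F"} for v in (200, 210, 190)]
    + [{"value": 100000, "species": "C", "genus": "G", "family": "F"}]  # lone suspicious record
)
z = max_abs_z(records)
print([round(x, 2) for x in z])
```

In this toy example the single record of species C cannot be assessed at species level (one record gives no spread) and stands out only at genus level and above, where it has the largest |z|. Its z-score stays below 3 because, with only one genus in the toy data, the outlier itself inflates the averaged SD; with many taxa per level, as in the real data, the average is far less sensitive to a single record.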

This probability-based data quality assessment at the different levels of the taxonomic hierarchy is routinely conducted within the TRY database for all traits with more than 1,000 records. The z-scores of each trait record are made available on the TRY website, and the highest absolute value is provided with each data release.

Trait values with an absolute z-score >4 (more than 4 standard deviations from at least one taxon mean) were excluded from the dataset unless their retention could be justified by external sources. Trait records with an absolute z-score between 3 and 4 (3 to 4 standard deviations from at least one taxon mean) were checked for plausibility by domain experts among the authors and retained or excluded accordingly.
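The resulting decision rule is simple enough to state as code. A minimal sketch; the function name and the handling of the boundary values (exactly 3 and exactly 4 go to expert review) are our reading of the text rather than a documented specification.

```python
def triage(largest_abs_z: float) -> str:
    """Map a record's largest |z| across taxon levels to the handling in the text."""
    if largest_abs_z > 4:
        return "exclude unless justified by external sources"
    if largest_abs_z >= 3:
        return "expert plausibility check"
    return "retain"


for z in (1.2, 3.0, 3.7, 4.0, 5.5):
    print(z, "->", triage(z))
```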

#### Exclusion of duplicate trait records

Duplicate trait records were identified on the basis of the following criteria: same species (after standardization of taxonomy), similar trait values (accounting for rounding errors after semantic integration, unit conversion and data complementation), and no information on different measurement locations or dates.
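The three criteria can be sketched as a pairwise test. The relative tolerance of 1% for "similar" values and the field names are assumptions; a missing location or date is treated as "no information", so it does not by itself distinguish two records. The pairwise scan is quadratic, which is fine within one species but would call for grouping by species on a full dataset.

```python
import math


def is_duplicate(a: dict, b: dict, rel_tol: float = 0.01) -> bool:
    """Same species, similar value, and no information showing different location or date."""
    if a["species"] != b["species"]:
        return False
    if not math.isclose(a["value"], b["value"], rel_tol=rel_tol):
        return False
    for key in ("location", "date"):
        if a.get(key) is not None and b.get(key) is not None and a[key] != b[key]:
            return False  # documented difference: not a duplicate
    return True


def drop_duplicates(records: list, rel_tol: float = 0.01) -> list:
    """Keep the first occurrence of each group of duplicate records."""
    kept = []
    for r in records:
        if not any(is_duplicate(r, k, rel_tol) for k in kept):
            kept.append(r)
    return kept


records = [
    {"species": "X", "value": 12.0, "location": "A"},
    {"species": "X", "value": 12.04},                  # rounding variant, no location
    {"species": "X", "value": 12.0, "location": "B"},  # different site: kept
    {"species": "Y", "value": 12.0},
]
kept = drop_duplicates(records)
print(len(kept))
```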

#### Calculation of species mean trait values

The resulting dataset was used to calculate species mean trait values without further stratification, e.g. by dataset or measurement site. As the six core traits have been shown to be log-normally distributed9, species means were calculated on log-transformed trait values (i.e. as geometric means).
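A minimal sketch of a species mean computed on log-transformed values: averaging the logarithms and back-transforming gives the geometric mean, which Python's standard library also provides as `statistics.geometric_mean`. Returning the record count alongside the mean mirrors the record counts reported in the dataset file; the function name is hypothetical.

```python
import math
import statistics


def species_mean(values: list) -> tuple:
    """Geometric mean of positive trait values and the number of records used."""
    log_mean = statistics.fmean([math.log(v) for v in values])
    return math.exp(log_mean), len(values)


heights_m = [1.0, 100.0]  # two illustrative records for one species
print(species_mean(heights_m))
print(statistics.geometric_mean(heights_m))
```

For these two records the geometric mean is 10 m, whereas the arithmetic mean would be 50.5 m; on skewed, log-normal data the geometric mean is much less dominated by the largest records.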

Data for the categorical traits were added and, where in doubt, checked against expert knowledge and independent external information from specialized websites on the Internet.

#### Final validation of taxonomy and mean trait values

Finally, taxonomy was checked manually once more against The Plant List and APG III. The ten most extreme species mean values of each trait (smallest and largest) were checked manually for reliability against external sources. Lastly, outliers of species mean traits (after categorization of species according to the categorical traits, and in bi- and multivariate trait space) were validated against external sources (see Díaz et al.9 Fig. 2, Extended Data Fig. 3, and Extended Data Fig. 4).

## Data Records

The dataset is available under a CC-BY license at the TRY File Archive (https://www.try-db.org/TryWeb/Data.php):

Díaz, S. et al. The global spectrum of plant form and function: enhanced species-level trait dataset. TRY File Archive https://doi.org/10.17871/TRY.81 (2022)244

The dataset consists of two data files:

- Species_mean_traits.xlsx
- References.xlsx

### Species_mean_traits.xlsx

The file provides mean trait values of plants grown under natural conditions for 46,047 species (including a small number of genus-level entries, subspecies and local varieties). Species names and mean trait values are complemented by the taxonomic hierarchy (genus, family and phylogenetic group), the number of trait records contributing to each mean trait value, and categorical traits. Values of all six traits are available for 2,214 species. In total, the dataset contains 476,932 entries for quantitative and categorical traits and higher-level taxonomy (92,159 entries for quantitative traits, 200,585 for categorical traits, and 184,188 for higher-level taxonomy).
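The reported counts are internally consistent, which a few lines of arithmetic confirm. This is a worked check of the figures above, not code distributed with the dataset: the three entry types sum to the stated total, and 2,214 complete species out of 46,047 corresponds to the "about 5%" with complete coverage mentioned in the Background.

```python
N_SPECIES = 46_047
N_COMPLETE = 2_214  # species with values for all six core traits
ENTRIES = {"quantitative": 92_159, "categorical": 200_585, "taxonomy": 184_188}

total_entries = sum(ENTRIES.values())
share_complete = N_COMPLETE / N_SPECIES

print(total_entries)            # 476932
print(f"{share_complete:.1%}")  # 4.8%
```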

The quantitative species-level trait information is based on about 1 million trait records (see Table S1), measured on >500,000 plant individuals (the number of distinct Observations in References.xlsx, see below). A single trait record reported in a dataset is often based on replicated measurements of several representative individuals at a site: the New Handbook for Standardised Measurement of Plant Functional Traits Worldwide6 recommends measurements on 10 to 25 individual plants or leaves, depending on the trait. Where this or a related protocol was followed, a trait record in the original database therefore probably represents the site-specific mean trait value of a given species. Reporting only the site-specific mean trait value was standard practice in older publications and aggregated databases, on the assumption that replicated measurements on different individuals followed a common approach. More recent datasets tend to provide all individual measurements, among other reasons because this allows better treatment of intraspecific trait variation.

The present dataset was derived from 157 datasets (Table S1). Trait records can be traced to ca. 2,500 original publications (see References_original_sources.xlsx). All species are complemented with higher-level taxonomic information; 92.5% and 84.8% of species are assigned to categories of woodiness and basic growth form, respectively. The raw data are available via the TRY Database (https://www.try-db.org/TryWeb/Home.php).

### References.xlsx

This file contains the references for all trait data that contributed to the core traits of the Global Spectrum Dataset via the TRY database. Where datasets contributed to TRY had already been compiled from original publications, the table also provides the references of these original publications. The references are linked to the data in the species mean trait dataset via unique species identifiers and trait names.

The sum of replicates in the species mean trait table is about 100,000 trait records lower than the total of 979,924 trait records in References.xlsx and Supplementary Table S1, because the species mean trait table contains mean trait values and numbers of trait records only for those species-trait combinations that were retained after data cleaning and imputation.

## Technical Validation

The dataset has global coverage in geographic and climate space (Fig. 2; see also Díaz et al.9 Extended Data Fig. 1), although with known gaps9,10,11. The numbers of species characterized per trait are similar to those in TRY Database version 5, published in 2019 (ref. 11), which indicates efficient data collection and curation for the Global Spectrum Dataset. All species mean trait values (Table 2) lie within the ranges published in Kattge et al.10. Histograms of trait frequency distributions are provided in Fig. 3, and the coverage of species per trait with respect to woodiness is presented in Fig. 4. The dataset has so far been used in Díaz et al.9, where the data showed high internal consistency in bi- and multivariate analyses: known bivariate relationships were well reproduced (Díaz et al.9 Extended Data Figs. 3 and 4), and individual species were positioned on the first axes of the principal component analysis where expected from general knowledge of these species (Díaz et al.9 Fig. 2).

## Usage Notes

In case the dataset is used in publications, both this paper and Díaz et al.9 should be cited.

The six quantitative traits compiled here (plus LDMC) are among the best-covered quantitative traits in the TRY database. However, as is typical for such observational data, the numbers of records per species are unevenly distributed: a few species mean trait values are based on a large number of records, while a large fraction of the species means are based on only a few records or a single one (see the difference between mean and median number of trait records per species and trait in Table 2; the number of trait records per species mean is also given in the dataset file ‘Species_mean_traits.xlsx’). The representativeness of these mean values should be interpreted with caution, because the trait measurements must be treated as samples from the variation of traits within species, which can be substantial for some traits10. However, as mentioned above, one trait record is often based on several measurements on representative individuals and therefore represents a site-specific species mean. In large-scale analyses, the variation within species has been shown to be considerably smaller than the variation between species10.