A functional trait database for Mediterranean Basin plants

Functional trait databases are emerging as crucial tools for a wide range of ecological studies across the world. Here, we provide a database of functional traits for vascular plant species of the Mediterranean Basin. The database includes 25,764 individual records of 44 traits from 2,457 plant taxa distributed in 119 taxonomic families. This database (BROT 2.0) is an updated and enlarged version of a previous database (BROT 1.0; 8,263 records, 14 traits, 952 taxa). Trait data were obtained from a comprehensive literature review, plus some field and experimental observations. All records are fully referenced and, in many cases, include geographic coordinates. The database is structured to include different levels of accuracy of trait information for each entry. BROT 2.0 should facilitate testing hypotheses on plant functional ecology within the Mediterranean Basin, and comparing this region with other ecosystems worldwide. The BROT 2.0 database and its trait definitions can be used as a template for creating similar trait databases in other regions of the world.

Because different Mediterranean species do not share their distribution limits, the border of our study area is also coarse, although as a general geographical reference we follow the limits of the Mediterranean climate by Quézel & Médail 26 (Fig. 2). That is, some data come from locations outside of this geographical limit (Fig. 2), but these locations are close and share many species and relevant ecological processes (e.g. summer drought and frequent fires) with ecosystems in the core area of the Mediterranean climate.
We compiled measurements and observations on plant functional traits from scientific papers, books, reports, theses, and other types of publications for plants from the Mediterranean Basin. In some cases, we also gathered functional trait information from unpublished sources such as expert opinion. The compilation is based on the traits defined in BROT 1.0 (ref. 24), but also includes 30 additional traits; some of the original traits have been redefined in line with current knowledge. The final 44 plant functional traits considered include 22 general vegetative traits, 15 regeneration traits, and 7 sexual reproductive traits (Table 9). These functional traits reflect the physiological, morphological, reproductive, phenological, and regenerative properties of plants, and determine the ecological role of each plant species in the ecosystem. Many of the functional traits included in the database are known to be important for the response of plants to global changes and disturbances 2,6,27,28. For each trait, we provided a standardized definition, and in many cases, this definition includes quantitative, semi-quantitative and qualitative information (see 'Data Records' section), which maximizes the use of the available information. In some cases, the data are presented in a conditional form, that is, the data value includes some relevant extra information after a vertical bar (|, ASCII 124; i.e., data|condition; see examples in Table 5). For some traits (eight traits, called 'fixed' traits in Table 9), only one value for each taxon was compiled; these traits are assumed to be constant across the study area. However, for most traits (36 traits, called 'variable' traits in Table 9) we compiled information from multiple sources and localities. This information may include different types (and quality) of data (e.g., quantitative or binary), but may also include different, sometimes even contradictory, information. This variability in the data may reflect within-species variability, plasticity or poor knowledge, and thus it is an area for further research. The selection of the fixed/variable attribute of the traits was based on the variability observed when gathering the data, and on the general ecological knowledge of the region.

Figure 1. Structure of BROT 2.0 database. Yellow boxes are the four files that make up the database, and include the column names of each file and the number of rows (below the box); ID refers to unique identifiers in the corresponding file. Black thick lines indicate the link between files. Gray arrows indicate the manuscript table in which information on a column can be found. (*) "Coordinates" include four columns: latitude, longitude, altitude, and accuracy (see Table 1). (#) "Authority" includes two columns: the authority for the species binomial name, and for the infra-species category (see Table 2).

ID
Unique identifier.

TaxonID
Unique taxa identifier; the ID in the Taxa file (Table 2, Fig. 1).

Taxon
Taxa name without author names; see the Taxa file for authorities (see Table 2).

Trait
Name of the functional traits considered (list and definitions are given in Table 9 and in 'Data Records' section).

Data
The actual data. The categories for each functional trait are described in 'Data Records' section; words are not capitalized, all text is in lower case except single-letter codes (e.g. DispMode).

Units
Units for quantitative data; number of classes (in square brackets) for categorical and semi-quantitative data.

DataType
Type of data as defined in Table 5. Note that the database includes different types of data, even for a given trait.

Method
General description of the method for gathering the information; it is related to the accuracy of the data (see Table 6).

SourceID
Unique identifier for the data source (Fig. 1) from which data have been obtained. Complete references are listed in the Source file (see Table 3).

Region
The region of the Mediterranean Basin where the observation or experiment was performed or from where the seeds were collected (see Table 7).

Lat
Latitude (in decimal degrees) of the study site. This field can be empty.

Long
Longitude (in decimal degrees) of the study site. This field can be empty.

Alt
Altitude (in m) of the study site. This field can be empty.

Accuracy
The accuracy of the geographical coordinates. The definitions of the study sites range from small plots to large regions, and thus the given geographical coordinates may vary in their degree of accuracy in relation to the data (see Table 8). This field can be empty.

Comments
Some data have a brief comment or clarification provided by the author (indicated as 'f.a.') or by the data compiler (indicated as 'f.c.'). This field can be empty.
The database only includes taxa (at least at the species level) native to the Mediterranean Basin. We have tried to avoid information obtained from different publications that were based on the same experiment (in many cases, different publications by the same authors). Functional trait data obtained from inadequate experimental procedures or data that did not fully fit the definition of a particular trait were excluded. When available, we provided geographic information (latitude, longitude, and altitude) for individual records in the database. In many cases, the reference from which the functional trait information was extracted did not include the coordinates, and thus we estimated them from the name of the locality (when possible). Because the studies vary from small plots to large regions, the coordinates of different records differ in their degree of accuracy in relation to the data. Thus we included the variable 'Accuracy' in the data to refer to the accuracy of the geographical location (see details in Table 8). Geographical information is provided for the variable traits only, as fixed traits are assumed constant across the region.

Authority for the species binomial name.

Infraname
Two words separated by a space; the first is 'spp.' or 'var.' referring to subspecies or variety, respectively, the second is the name of the subspecies or variety. This field can be empty.

Authority2
Authority for the name of the subspecies or variety. This field can be empty.

Conditional
The above types may also, in turn, be conditional (quantitative conditional, semi-quantitative conditional, etc.) and indicate that the data entry has a value plus some additional information separated by a vertical bar (|, ASCII 124). This additional information is different in each trait, and it is defined in the corresponding traits (see 'Data Records' section). Here are three examples:
HeatStimGerm = "low|L_H": germination stimulated by low heat intensities in an experiment where only Low and High heat intensities were tested (as defined in 'Data Records' section). That is, it is unknown whether it would be stimulated by a moderate heat intensity treatment; it was not stimulated by a high heat shock.
ChemCues = "stimulation|smk": germination stimulated by smoke treatments (i.e., germination after smoke treatment was significantly higher than the germination in control conditions).
SoilSeedBank = "persistent|bur": plant that has a persistent soil seed bank according to evidence from a seed burial experiment.
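As a minimal sketch of how the conditional form (data|condition) could be handled in R, the block below splits entries at the vertical bar. The helper name `split_conditional` and the toy data frame are illustrative assumptions, not part of BROT 2.0; the three conditional values are the examples quoted for Table 5, and the SeedMass value is made up.

```r
# Toy records mimicking the 'Trait' and 'Data' columns of the Data file.
# The conditional values are the examples quoted in the text; the
# quantitative SeedMass value is invented for illustration only.
records <- data.frame(
  Trait = c("HeatStimGerm", "ChemCues", "SoilSeedBank", "SeedMass"),
  Data  = c("low|L_H", "stimulation|smk", "persistent|bur", "12.5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each entry at the first vertical bar.
# Entries without a bar keep their value and get an NA condition.
split_conditional <- function(x) {
  bar <- as.integer(regexpr("|", x, fixed = TRUE))  # -1 when no bar is present
  has_bar <- bar > 0
  value <- ifelse(has_bar, substr(x, 1, bar - 1), x)
  condition <- ifelse(has_bar, substr(x, bar + 1, nchar(x)), NA_character_)
  data.frame(Value = value, Condition = condition, stringsAsFactors = FALSE)
}

parsed <- cbind(records, split_conditional(records$Data))
print(parsed)
```

Keeping the value and the condition in separate columns makes it possible to filter on either part, for example selecting only soil seed bank records inferred from burial experiments.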
Taxa names were homogenized following the European Science Foundation - European Documentation System (ESFEDS) 29, which is largely based on Flora Europaea 30. When a taxon name was missing in the ESFEDS database, or when some important taxonomic updates were available, generic taxonomical databases 31,32, taxonomic databases for specific families 33,34, or regional floras 35-37 were used. Finally, all taxa names were checked for up-to-date synonymous names and spelling errors using the Taxonomic Name Resolution Service 38. All BROT 2.0 full taxa names, including authority, are provided in the Taxa file. This file also includes the taxonomic family following the APG IV system 39. In a separate file (the Synonymous file; Fig. 1), we include synonymous names for the taxa considered; this is not an exhaustive list and only includes some alternative names frequently used in the literature.

Method Definition

Measure
Published or unpublished data obtained from an experimental design in which the data is, at least, one of the objectives of the study.

Experience
Published or unpublished data from visual (rough) estimation or personal experience.

Compilation
Published data compiled from different sources (including experience, published data,...).

General reference
Data published and obtained from a general publication such as a regional flora.

U
Unknown, or unclear in the original source, or from more than two of the above regions.

Table 7. Descriptions of regions used in the Data file (the 'Region' column).

Accuracy Definition

High
Data refer to a region of < 5 km² in size.

Mod
Data refer to a region between 5 and 500 km² in size.

Low
Data refer to a region between 500 and 50 000 km² in size.

VLow
Data refer to a region > 50 000 km² in size.

Code availability
The four files of the database are in 'CSV' format and can be easily retrieved from Figshare (Data Citation 1) and loaded into most statistical software, spreadsheets, or database management systems. The Sources, Taxa and Synonymous files have some special characters (e.g., accents on authors and authority names), and thus, for portability, UTF-8 encoding is used; the Data file uses plain text. Missing values are included as empty cells, and only occur in the Data file (in the last 5 columns; i.e., coordinates and comments; Table 1) and Taxa file (in the last two columns; Table 2).
As an example, we provide below the code to import the data into R and to perform some simple tasks; this code was tested in R (ver. 3.4) under Windows (ver. 7 and 10) and Linux (Ubuntu 16.04) and should work in any of the main computer configurations. Importing the files into a spreadsheet is easy, especially for LibreOffice and OpenOffice; users just need to select 'Unicode UTF-8' and fields separated by 'comma' when opening csv files. Microsoft Excel users may need to first rename the four *.csv files as *.txt, open them via the Import Text Wizard, and then select 'Unicode UTF-8' with fields separated by 'comma'.
Reading the BROT 2.0 files:
List the references that provide information on lignotubers:
Add families to all records in the main data file:
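The three tasks named above can be sketched in R as follows. This is an illustrative sketch, not the authors' original listing: the wrapper `read_brot`, the key column `ID` and the columns `Family` (Taxa file) and `FullSource` (Sources file) are assumed names, and small made-up tables stand in for the real files so that the block runs on its own.

```r
# Hypothetical wrapper for reading one BROT 2.0 CSV file. UTF-8 is requested
# because the Sources, Taxa and Synonymous files contain accented characters;
# na.strings = "" is meant to turn empty cells into missing values (NA).
read_brot <- function(path) {
  read.csv(path, fileEncoding = "UTF-8", na.strings = "",
           stringsAsFactors = FALSE)
}
# With the Figshare files in the working directory, one would call, e.g.:
#   dat <- read_brot("BROT2_dat.csv"); tax <- read_brot("BROT2_tax.csv")
#   sou <- read_brot("BROT2_sou.csv"); syn <- read_brot("BROT2_syn.csv")

# Made-up stand-ins for the Data, Taxa and Sources files.
# Column names 'Family' and 'FullSource' are assumptions.
dat <- data.frame(
  ID = 1:3, TaxonID = c(10, 10, 11),
  Taxon = c("Taxon A", "Taxon A", "Taxon B"),
  Trait = c("BudSource", "GrowthForm", "BudSource"),
  Data = c("lignotuber", "tree", "lignotuber"),
  SourceID = c(100, 101, 102), stringsAsFactors = FALSE
)
tax <- data.frame(ID = c(10, 11), Taxon = c("Taxon A", "Taxon B"),
                  Family = c("Family X", "Family Y"),
                  stringsAsFactors = FALSE)
sou <- data.frame(ID = 100:102,
                  FullSource = c("Reference 1", "Reference 2", "Reference 3"),
                  stringsAsFactors = FALSE)

# References that provide information on lignotubers.
is_ligno <- dat$Trait == "BudSource" & dat$Data == "lignotuber"
ligno_refs <- sou$FullSource[sou$ID %in% unique(dat$SourceID[is_ligno])]

# Add families to all records in the main data file (left join on the taxon ID).
dat_fam <- merge(dat, tax[, c("ID", "Family")],
                 by.x = "TaxonID", by.y = "ID", all.x = TRUE)
print(ligno_refs)
print(dat_fam)
```

Using `all.x = TRUE` keeps every data record even if a taxon were missing from the Taxa table, in which case its family would appear as NA.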

Data Records
The compiled data are available in a single dataset (Data Citation 1) composed of four ASCII text files, all in CSV format with quoted fields: the 'Data' file (BROT2_dat.csv) is the main file, the 'Taxa' file (BROT2_tax.csv) includes full taxa names, and the 'Sources' file (BROT2_sou.csv) includes full references. An additional 'Synonymous' file (BROT2_syn.csv) includes synonymous names for some of the taxa in the 'Taxa' file. In total, we compiled functional trait data from 624 sources, of which 448 are articles published in peer-reviewed journals between 1893 and early 2018 (Fig. 3). The database includes 25,764 records for 2,457 taxa belonging to 2,265 species, 704 genera, and 119 families throughout the Mediterranean Basin. Asteraceae, Fabaceae, Lamiaceae, Poaceae, and Cistaceae are the best represented families in BROT 2.0, with the specific ranking depending on whether we consider the number of records or the number of species (Fig. 4). The top traits in number of records were RespFire (2,668), GrowthForm (2,451), SeedlEmerg (2,417), SeedMass (1,995), and FruitType (1,551 records) (Fig. 5). Data records are geographically distributed throughout the Mediterranean Basin, but some parts of the basin (e.g. the southern rim) are poorly represented (Fig. 2), reflecting the lower number of available studies in this area. Most of the data records (12,132) come from 379 studies conducted in the Iberian Peninsula, followed by Greece (2,366 records from 59 sources), Anatolia (1,303 records from 45 sources), Mediterranean France (1,302 records from 38 sources), and Italy (761 records from 27 sources). Considering the numbers of data records, taxa, and sources summarized above, BROT 2.0 represents a significant improvement (both quantitatively and qualitatively) over BROT 1.0 (ref. 24) (Table 10).
Definitions, categories, and units of functional traits included in the database are given below, with the trait short names (as in the Data file, Table 1) and whether it is a fixed or variable trait (F, V) in brackets. Traits are ordered following Table 9, that is, first are vegetative traits (1 to 22), then sexual reproductive traits (23 to 29), and finally regeneration traits (30 to 44). Note that the trait short names (in brackets) and all categories (in bold) are written as used in the database (Data file), including the letter case.
1 Growth form (GrowthForm, F)
Morphology of the whole plant related to its size (for non-disturbed individuals). Categories are:
-tree: very tall woody plant, frequently with one main, primary stem and the green canopy rarely reaching the ground.

-winter deciduous: plant that drops all its leaves during the winter.
-winter semi-deciduous: plant that drops part of its leaves during the winter, maintaining some brownish leaves in the crown.
-drought semi-deciduous: plant that drops part of its leaves during the dry period (excluding species that drop leaves only in very extreme droughts).

-none: without leaves or any functional analogue organ. If leaves are modified as spines, then LeafShape = spines (see below).
-broad: plant with broad leaves (for compound leaves, this refers to leaflets).
-needle-like: plant with needle-like leaves.
-linear: plant with linear leaves (for compound leaves, this refers to leaflets).
-scale-like: plant with scale-like leaves.
-spines: leaves are modified as spines (which is different from having spines or thorns in the branches).

Average leaf area (LeafArea, V)
Average one-sided projected surface area (mm²) of an individual leaf (or phyllodes) 40. For compound leaves, leaflets area × leaflets number. For species with different stem and basal leaves, it refers to stem leaves. Alternatively, one of the following categories:
-very small: small needle-like, scale-like, and linear leaves (typically less than 25 mm²).
-large: large broad leaves, or divided leaves with numerous and large broad leaflets (typically 2025-4550 mm²).
-very large: very large broad leaves, usually divided (typically more than 4550 mm²).

Mass-based leaf nitrogen content (LNCm, V)
Leaf nitrogen content (mass-based) (mg g⁻¹), that is, the ratio of the quantity of nitrogen in the leaf per respective unit dry mass 40,41.

Average specific leaf area (SLA, V)
Average one-sided area of the fresh leaf (or phyllodes) divided by its oven-dry mass 40,42, excluding petiole and/or rachis; for mature plants only. Units are mm² mg⁻¹ (note that mm² mg⁻¹ × 10 = cm² g⁻¹).

Oven-dry mass divided by the fresh volume of a section of the main stem 40 (excluding bark for woody species; i.e., wood density) (g cm⁻³).

12 Coarse:fine fuel (CFFuel, V)
Coarse to fine fuel biomass ratio (i.e., ≥ 6 mm diameter and < 6 mm diameter, respectively), including live and dead material. Alternatively, one of the following categories:
-low: without coarse fuel; all fuel is fine.

Dry matter content of leaves (mg g⁻¹), that is, the ratio of the dry mass of a leaf to its water-saturated fresh mass 40,41.

Clonality (Clonality, V)
Ability to colonize the space through vegetative reproduction. Categories considered are:
-rhizomes: non-swollen belowground horizontal stem that grows near the soil surface with the ability to produce roots and stems. Rhizomes s.l., i.e., including woody and non-woody rhizomes.
-roots: roots normally growing close to the soil surface.

-fleshy: fleshy fruit, including fruit in which the fleshy part is the floral cup (hypanthium) (e.g. Rosa).

Seed dry mass (SeedMass, V)
Average dry mass of seeds (including some single-seeded fruits such as achenes or caryopsis) (mg). Alternatively, one of the following categories:
-very light: < 3 mg
-light: ≥ 3 and < 30 mg
-medium: ≥ 30 and < 300 mg
-heavy: ≥ 300 mg

28 Annual seed production (SeedProd, V)
Average number of seeds per plant produced every year (# seeds). Alternatively, one of the following categories:
-rarely: rarely, if ever, produces seeds in the study area.

29 Basic seed shape (SeedShape, V)
Ratio between the two maximum diameters: first maximum divided by the second maximum, excluding structures attached to the seed coat such as wings or pappus. Alternatively, one of the following categories:
-regular: close to 1 (spherical or lens-shaped seeds).

Bud source (BudSource, V)
Location of the bud bank for resprouting species 43. The categories are:
-epicormic buds: stem buds (protected by the bark).
-apex: buds in the stem apex protected from fire by leaf bases.
-root crown: transition point between stem and root.
-lignotuber: woody swelling below or just above the soil, ontogenetically programmed (i.e., inherited character). Based on embryological and/or anatomical features.
-thickened root-crown: woody swelling below or just above the soil, non-ontogenetically programmed (e.g. stem coalescence). Thickened root crown.
-burl: woody swelling below or just above the soil with unspecified origin (no distinction between lignotuber and burl is reported).
-rhizomes: belowground horizontal stem (non-swollen).
-storage organs: non-woody storage organs, i.e., modified stems (bulbs, corms or stem tubers) or roots (root tubers).
-others: other bud sources, including those not clearly specified (e.g. stump).

Resprouting capacity one year after clipping 100% of the plant, as average proportion of adult plants that resprout (percentage). Not reported for annual plants (which can be assumed to be RespClip = no). Alternatively, one of the following categories:
-no: without resprouting capacity.

-unaffected|###: germination of the treated seeds equal to the control.
-inhibition|###: germination of the treated seeds lower than the control.
36 Heat-stimulated germination (HeatStimGerm, V)
The highest intensity in heat treatments (i.e., seed exposure to dry heat ≥ 50 °C) that produces higher germination than the control (i.e., stimulated germination). Heat intensity is defined as: Low (L: < 100 °C for ≤ 5 min), Moderate (M: < 100 °C for > 5 min, or ≥ 100 °C for ≤ 5 min), High (H: ≥ 100 °C for > 5 min) or unknown (unk). Note that in many cases, post-treatment seed viability is not considered or not specified. The heat intensities tested in each experiment are indicated after the vertical bar (category|LMH), with an underscore when the corresponding heat intensity was not tested (e.g., category|L_H). The categories are:
-yes|unk: stimulated germination is produced after exposure to an unspecified heat intensity.
-high|###: stimulated germination after exposure to High-intensity treatments (### refers to L, M and H respectively).
-moderate|###: the highest heat intensity that produces stimulated germination is Moderate (### refers to L, M and H respectively).
-low|###: the highest heat intensity that produces stimulated germination is Low (### refers to L, M and H respectively).
-unaffected|###: germination is not stimulated after any heat intensity tested and at least one of the treatments does not affect seed germination (### refers to L, M and H respectively; unk if unknown).
-inhibition|###: inhibited germination (i.e., lower germination than control) in all heat treatments tested (### refers to L, M and H respectively; unk if unknown).
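To illustrate how the code after the vertical bar in HeatStimGerm could be interpreted, the sketch below turns each entry into a response plus three flags stating whether Low, Moderate and High intensities were tested. The function name `decode_heat` and the output column names are assumptions made for this example; the input strings follow the category|LMH convention described above.

```r
# Hypothetical decoder for HeatStimGerm entries of the form category|LMH,
# where an underscore marks an intensity that was not tested and 'unk'
# means that the tested intensities are unknown.
decode_heat <- function(x) {
  parts <- strsplit(x, "|", fixed = TRUE)
  response <- vapply(parts, `[`, character(1), 1)
  code <- vapply(parts, `[`, character(1), 2)   # NA if there is no bar
  known <- !is.na(code) & code != "unk"
  # TRUE/FALSE when the tested intensities are known, NA otherwise
  tested <- function(letter) {
    ifelse(known, grepl(letter, code, fixed = TRUE), NA)
  }
  data.frame(Response = response,
             TestedLow = tested("L"),
             TestedModerate = tested("M"),
             TestedHigh = tested("H"),
             stringsAsFactors = FALSE)
}

heat <- decode_heat(c("low|L_H", "yes|unk", "unaffected|LMH"))
print(heat)
```

With such flags, a record like low|L_H can be told apart from low|LMH: in the first case a Moderate treatment was never applied, so stimulation at moderate intensity cannot be ruled out.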

Other germination cues (OtherCues, V)
Germination response to exposure to boiling water (blw), mechanical scarification (mec), summer temperature (tsu), temperature fluctuation (tfu), or light (lgt). The response is indicated before the bar and the cue tested is indicated after the vertical bar (category|cue, e.g., stimulation|blw). Categories are:
-stimulation|###: germination of the treated seeds higher than the control.
-unaffected|###: germination of the treated seeds equal to the control.
-inhibition|###: germination of the treated seeds lower than the control.
38 Canopy seed bank longevity (CanopySeedBank, V)
Presence and longevity of the stored seeds in the canopy. The categories are:
-yes: stored seeds are present in the canopy (no information on longevity).
-no: no stored seeds in the canopy.

Average proportion of serotinous (closed) cones in the canopy (%) 44.

40 Soil seed bank longevity (SoilSeedBank, V)
Period that seeds remain viable in the soil seed bank inferred from: soil and vegetation comparisons (veg), experimental seed burial (bur), seed dormancy (dor) or unknown methods (unk). The method used is indicated after the vertical bar (response|method, e.g., persistent|veg). The categories are:
-transient|###: no soil seed bank; seeds germinate in the first favorable season after dispersal. Normally seed bank longevity ≤ 1 yr (no persistent seed bank).
-persistent|###: seeds do not germinate in the first favorable season after dispersal. Normally seed bank longevity > 1 yr (could be longer but it is unknown).
-at least short-persistent|###: at least > 1 and ≤ 5 yr (could be longer but it is unknown).
-mid-persistent|###: at least > 5 yr (could be longer but it is unknown).
-long-persistent|###: at least > 15 yr.
-very long-persistent|###: ≥ 30 yr.

41 Age at maturity of resprouts (MatResp, V)
Average age of resprouts at the first successful reproduction (yr), i.e. when most of the resprouted plants produce the first seed crop. Alternatively, one of the following categories:
-early: < 5 yr
-medium: 5-10 yr
-late: > 10 yr

42 Age at maturity of saplings (MatSap, V)
Average age of saplings at the first successful reproduction (yr), i.e. when most of the saplings produce the first seed crop, excluding saplings from plantations. Alternatively, one of the following categories:
-early: < 5 yr
-medium: 5-10 yr
-late: > 10 yr

43 Post-fire seedling emergence (SeedlEmerg, V)
Average density of seedlings per pre-fire mature individual emerged during the first year after fire. This is a ratio (number of seedlings / number of mature plants), or alternatively, one of the following categories:
-no: no post-fire seedling emergence.
-low: number of seedlings lower than the number of pre-fire mature individuals.
-high: number of seedlings higher than the number of pre-fire mature individuals.
-yes: post-fire seedling emergence observed (quantitative data not available).
-variable: high variability observed between populations or sampled areas.
44 Post-fire seedling survival (SeedlSurv, V)
Proportion of seedlings surviving the first dry season after fire (%). Alternatively, one of the following categories:
-no: no post-fire seedling survival.

Technical Validation
Most of the records included in the database (79%) are based on published material in peer-reviewed scientific journals (46%), books (27%), or theses (5%), and thus most data should be accurate. In addition, each specific data value has the original references, and so users can evaluate the validity and accuracy of the original source. The data has been checked for possible redundancies and errors, and some published data were excluded from the final database; data considered doubtful (e.g., very extreme values, or data that did not fully fit the definition of the trait, or that were obtained with a questionable sampling) were not included. Also note that for each species and trait, the BROT 2.0 database includes different values from different sources, and so users can make their own decisions on data usage; for example, using the mean trait value, or excluding extreme values. This feature of BROT 2.0 database is especially important because it provides information on trait variability.
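As one possible way of using multiple records per species and trait, the sketch below computes the mean and the number of quantitative records per taxon. The toy values are invented, and treating every entry that converts to a number as quantitative is a simplifying assumption (categorical and conditional entries are dropped); in the real data the 'DataType' column could be used instead.

```r
# Made-up records mimicking the Taxon, Trait and Data columns.
dat <- data.frame(
  Taxon = c("Taxon A", "Taxon A", "Taxon A", "Taxon B", "Taxon B"),
  Trait = rep("SeedMass", 5),
  Data  = c("2.0", "4.0", "light", "30", "50"),
  stringsAsFactors = FALSE
)

# Entries that are not plain numbers (e.g. categories) become NA.
dat$Value <- suppressWarnings(as.numeric(dat$Data))
num <- dat[!is.na(dat$Value), ]

# Mean trait value and number of quantitative records per taxon and trait.
trait_mean <- aggregate(Value ~ Taxon + Trait, data = num, FUN = mean)
trait_n    <- aggregate(Value ~ Taxon + Trait, data = num, FUN = length)
names(trait_mean)[3] <- "Mean"
names(trait_n)[3]    <- "N"
summ <- merge(trait_mean, trait_n)   # joins on Taxon and Trait
print(summ)
```

Replacing `mean` with `median` would give a summary that is less sensitive to extreme values, which is one of the choices left to the user.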
The spatial scope of the database is the Mediterranean Basin (Fig. 2). The differences in the number of records among sub-regions of the Basin (for example, the north and the south of the Basin, Fig. 2) reflect the spatial heterogeneity of knowledge across the Basin.

Usage Notes
To properly use the data, it is important to consider the following fields in the Data file: 'Method', 'Accuracy', and 'DataType'. The first provides information on the origin of the data, and is related to quality; data with Method = 'measure' are likely to be of the highest quality (Table 6). 'Accuracy' provides an indicator of the accuracy of the geographical location (Table 8). Users can use these fields to select the data that best suit their requirements, or to weight their analysis. The 'DataType' column is also informative, as different types of data (quantitative, semi-quantitative, etc.) are linked to the quality and accuracy of the data (Table 5). Because quantitative data can be converted to qualitative or semi-quantitative data (following the trait definitions given in the section 'Data Records'), users can choose between using more data of lower quality or fewer data of higher quality.
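The conversion from quantitative to semi-quantitative data can be sketched with the SeedMass classes given in 'Data Records' (very light: < 3 mg; light: ≥ 3 and < 30 mg; medium: ≥ 30 and < 300 mg; heavy: ≥ 300 mg). The function name `seed_mass_class` and the example masses are assumptions made for illustration.

```r
# Hypothetical helper: map seed dry mass (mg) to the SeedMass categories.
# right = FALSE makes intervals closed on the left: [0,3), [3,30), [30,300), [300,Inf)
seed_mass_class <- function(mass_mg) {
  cut(mass_mg,
      breaks = c(0, 3, 30, 300, Inf),
      labels = c("very light", "light", "medium", "heavy"),
      right = FALSE)
}

masses <- c(0.5, 3, 29.9, 30, 300, 1200)   # made-up values (mg)
classes <- as.character(seed_mass_class(masses))
print(data.frame(mass_mg = masses, class = classes))
```

Once quantitative records are expressed in the same classes as the categorical ones, both kinds of records can be pooled in a single analysis, at the cost of losing resolution.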
Note that there may be variability in a given trait, and the more information (more records from different sources), the more certain is the value of the trait for a given species. For instance, resprouting is a key trait in fire-prone ecosystems, and for some (few) species, we may find records for both 'yes' and 'no' in the ability to resprout. This may be due to variability, but it could also be due to poor sampling. The number of records, together with the field 'Method', may provide an estimation of the level of confidence in the information. Moreover, given that all records are referenced, it is always possible to go to the original reference to search for additional information.
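A simple way to spot such contradictory records is to cross-tabulate taxa against the reported categories. The sketch below uses invented RespFire records; the taxa, values and source identifiers are placeholders, not data from BROT 2.0.

```r
# Made-up resprouting records for three placeholder taxa.
resp <- data.frame(
  Taxon = c("Taxon A", "Taxon A", "Taxon A", "Taxon B", "Taxon B", "Taxon C"),
  Trait = "RespFire",
  Data = c("yes", "yes", "no", "yes", "yes", "no"),
  SourceID = c(1, 2, 3, 1, 4, 5),
  stringsAsFactors = FALSE
)

# Number of records per taxon and category.
counts <- table(resp$Taxon, resp$Data)

# Taxa with at least one 'yes' and at least one 'no' record.
conflict <- rownames(counts)[counts[, "yes"] > 0 & counts[, "no"] > 0]

# Number of distinct sources per taxon, as a rough indicator of confidence.
n_sources <- tapply(resp$SourceID, resp$Taxon, function(s) length(unique(s)))

print(counts)
print(conflict)
print(n_sources)
```

For the taxa flagged in this way, the record counts and the 'Method' field can then be inspected before deciding which value to use.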
Note also that some traits are poorly known and prone to mistakes. For instance, the type of underground resprouting (trait: 'BudSource') is not always easy to observe, and many researchers, especially in the past, tended to confuse structures like 'lignotubers' with any other underground resprouting organ 43,45. When the mistake was clear from the reference, it was not included in the database, but in many instances, it is impossible to know, as variability exists among populations and geographical locations.