Globe-LFMC, a global plant water status database for vegetation ecophysiology and wildfire applications

Globe-LFMC is an extensive global database of live fuel moisture content (LFMC) measured from 1,383 sampling sites in 11 countries: Argentina, Australia, China, France, Italy, Senegal, Spain, South Africa, Tunisia, United Kingdom and the United States of America. The database contains 161,717 individual records based on in situ destructive samples used to measure LFMC, representing the amount of water in plant leaves per unit of dry matter. The primary goal of the database is to calibrate and validate remote sensing algorithms used to predict LFMC. However, this database is also relevant for the calibration and validation of dynamic global vegetation models, eco-physiological models of plant water stress as well as understanding the physiological drivers of spatiotemporal variation in LFMC at local, regional and global scales. Globe-LFMC should be useful for studying LFMC trends in response to environmental change and LFMC influence on wildfire occurrence, wildfire behavior, and overall vegetation health.


Background & Summary
Live Fuel Moisture Content (LFMC) is the water content of live foliage relative to its dry mass, which influences vegetation susceptibility to wildfire 1 . Vegetation with high LFMC takes longer to ignite and leaf water acts as a heat sink, slowing down the rate of fire spread and reducing fire intensity 2,3 . LFMC as a measure of plant water status also has important implications for assessing drought stress in natural vegetation 4 , determining over and under watering practices in agricultural crops 5 , and assessing vegetation health 6 and wildlife habitat suitability 7 .
Field sampling and gravimetric methods are the most direct way to estimate LFMC. These methods require in situ destructive collection of a representative sample of leaf/shoot material, which is then weighed fresh, oven-dried and reweighed to determine dry matter mass. Field sampling is labour-intensive and sampling sites must be carefully selected to represent spatial variation in LFMC and vegetation types. Sampling must also be repeated over time to capture temporal variation in LFMC. Consequently, the compilation of a database capturing broad-scale spatial and temporal variability in LFMC is not feasible with the resources of a single organization or research group. Remote sensing data provide the opportunity to predict LFMC over large areas at fine spatial and temporal resolutions, but these data also require field samples for calibration and validation 1 . Given the large cost of collecting field measurements of LFMC over large areas or long time periods, an international effort to compile and share existing field observations in a global database would help overcome a key constraint for the improvement and validation of LFMC remote sensing methods.
Individual universities, research centres and government departments have started to organize and share their time series of field-sampled LFMC data. For example, the U.S. National Fuel Moisture Database 8 (NFMD, http://www.wfas.net/nfmd/public/index.php) is a web-based query system that enables users to view live- and dead-fuel moisture data. Chuvieco 9 made available a database (FMC_UAH v1.1, http://www.geogra.uah.es/emilio/FMC_UAH.html) composed of 880 LFMC samples taken during different campaigns from 1996 to 2010 in Spain. Since 1996, the French National Forest Service (ONF) has been sampling and freely sharing weekly LFMC (www.reseauhydrique.dpfm.fr) at 35 geolocated sites; these data were recently quality-checked and made available by Duché et al. 10,11 . However, while a large number of LFMC datasets have been published in the refereed literature, much of the source data has not previously been made available to the research community.
We present Globe-LFMC 12 , the most comprehensive global database of in situ destructive sampling measurements of LFMC. Globe-LFMC is a compilation of 161,717 field measurements carried out at 1,383 sampling sites in 11 countries from 1977 to 2018 (Fig. 1). The database is properly documented, georeferenced and publicly accessible. When available, each record has an accompanying reference. We have made all names of sampling sites and species names consistent. We have also removed duplicates and corrected inconsistencies in the LFMC data. Additionally, the database reports on the protocol used to obtain each LFMC value. Finally, we also used remote sensing to assess the heterogeneity of vegetation greenness surrounding site coordinates, since highly heterogeneous areas within a specific satellite footprint may not be suitable for the calibration or validation of remote sensing products.
This database will lead to further advances in modeling and monitoring of spatial and temporal variation in LFMC. It should also allow evaluation of LFMC estimation methods, providing guidance for end-users in determining which LFMC estimation methods best fit their specific application. The database will also assist investigation of spatial variability in LFMC across the plant, local, and regional scales, and allow improved sampling strategies to capture spatial and temporal variation. The database can also be used to calibrate dynamic global vegetation models, eco-physiological models of plant drying as well as understanding the environmental and physiological drivers of LFMC. Finally, the database may be useful for exploring LFMC trends in response to environmental change and LFMC influence on wildfire occurrence, wildfire behaviour and overall vegetation health.

Field: Description
First name: First name of the contact person.
Last name: Last name of the contact person.
Email: E-mail address of the person to be contacted.
Tel (include all codes): Phone number of the person to be contacted.
Institution: The institution where the contact person works.
Country: Country where the sampling area is located.
Latitude: Latitude of sampling area location (Decimal Degrees, DD).
Longitude: Longitude of sampling area location (Decimal Degrees, DD).
Sampling time: Time when the sampling occurred (hh:mm, 24-hour notation).
Sampling date: Date when the sampling occurred, in the format "yyyymmdd".
Sampling year: Year when the sampling occurred.
Protocol: Identifier of the protocol used to obtain the LFMC value. Details are given in the "LFMC protocol codes" spreadsheet.
Slope (%): Percentage slope of the sampling plot.
Reference: Citation or link to original LFMC data sets or to relevant publications that use the data or describe the site (in cases where the collection of data in an area proceeded after the publication cited, the record in Globe-LFMC was still linked to the article related to the older sampling campaign, in order to provide a description of the site).
Name of picture file: Name of the photograph showing the sampling site.

Methods
Globe-LFMC unifies existing LFMC data created and provided by researchers and agencies in different countries (Fig. 1). All of the data presented in the database were collected by in situ destructive sampling of leaf material or, occasionally, small twigs (<0.6 cm). After the mass of the fresh samples was determined, the samples were dried in an oven until the water had evaporated and were then reweighed to determine dry mass. LFMC is typically calculated as the percentage of water mass with respect to dry mass, and can thus exceed 100%.
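The gravimetric calculation described above can be sketched as follows. This is a minimal illustration, not code from the Globe-LFMC authors; the function and parameter names are hypothetical, and masses are assumed to be in the same unit (grams here).

```python
def lfmc_percent(fresh_mass_g: float, dry_mass_g: float) -> float:
    """Return LFMC: water mass as a percentage of dry mass.

    Both names are illustrative; the source only states that LFMC is the
    percentage of water mass with respect to dry mass.
    """
    if dry_mass_g <= 0:
        raise ValueError("dry mass must be positive")
    if fresh_mass_g < dry_mass_g:
        raise ValueError("fresh mass cannot be smaller than dry mass")
    water_mass_g = fresh_mass_g - dry_mass_g  # water lost during oven drying
    return 100.0 * water_mass_g / dry_mass_g


# A sample holding more water than dry matter yields a value above 100%.
print(lfmc_percent(25.0, 10.0))
```

Because the denominator is dry mass rather than fresh mass, a sample with 15 g of water and 10 g of dry matter has an LFMC of 150%.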
The sampling methods for the different data sources were slightly different in terms of the equipment used to collect the samples, the drying temperature and time and other protocols for data acquisition or processing. Globe-LFMC summarizes the sampling methods used via a unique code (see Data Records paragraph), and the detailed description of the methodology can be found in the citation of each code included in the database.
Overall, only the LFMC from leaves or small terminal twigs (<0.6 cm) was considered and added to the database. Occasionally information about other vegetation components was recorded in the source data but this information was omitted in Globe-LFMC because leaves are generally the dominant component when viewing vegetation from above and thus contribute the most to the spectral signal observed by an airborne/spaceborne sensor. Moreover, any quality control flags in the source data indicating low-quality data led to the omission of the corresponding LFMC values in Globe-LFMC.

Field: Description
Protocol code: ID corresponding to the one contained in the "LFMC data" sheet.

Table 4. Distribution of dataset records and descriptive statistics for LFMC by country and overall (Global). n = number of observations. "Dominant Land Cover (number of observations)" and "Dominant Land Cover (number of plots)" give the land cover type with the most observations and the most sites, respectively, overall (Global) and per country.
LFMC values from samples corresponding to the same date, species and site were recorded as a mean value. This approach allowed us to maintain consistent information on every single species sampled at the same plot. However, in some instances, the sampler collected and weighed different species together in the same sample. In those situations, the "Species collected" database field contains a list of species instead of a single species. Sometimes the species were reported with their common names or with typos in the original datasets, in which case the correct genus and species were substituted.
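The averaging step can be sketched as a simple group-by. The snippet below is an assumption-laden illustration rather than the authors' processing code: the dictionary keys ("site", "date", "species", "lfmc") are hypothetical names, not the column names of the database.

```python
from collections import defaultdict
from statistics import mean


def mean_lfmc_by_group(samples):
    """Average LFMC over samples sharing the same site, date and species.

    `samples` is a list of dicts with hypothetical keys
    "site", "date", "species" and "lfmc".
    """
    groups = defaultdict(list)
    for sample in samples:
        key = (sample["site"], sample["date"], sample["species"])
        groups[key].append(sample["lfmc"])
    # One mean value per site-date-species combination.
    return {key: mean(values) for key, values in groups.items()}


samples = [
    {"site": "A", "date": "20150601", "species": "Quercus ilex", "lfmc": 100.0},
    {"site": "A", "date": "20150601", "species": "Quercus ilex", "lfmc": 120.0},
    {"site": "A", "date": "20150601", "species": "Cistus monspeliensis", "lfmc": 80.0},
]
result = mean_lfmc_by_group(samples)
```

Keeping species in the grouping key is what preserves one value per species at a plot, as described above; samples in which several species were weighed together would need the species list itself as the key.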
There were some entries where the same site name was used for more than one set of geographic coordinates. In order to have a single LFMC value for each species-date-site combination, the names were modified by adding an identifier (e.g. an increasing number or the state abbreviation). Conversely, we found some entries where two or more different site names corresponded to identical geographic coordinates. Those names were unified by creating a new name (e.g. from "A" and "B" to "A-B", where A and B are two different names, or from "C1" and "C2" to "C", where C corresponds to the word that was in common in the C1 and C2 names). Plots with missing geographic coordinates were not added to the final database.
Most of the information provided came from the original datasets, but a few columns were added to the database to provide additional insight into site characteristics. A "Land Cover" column gives the land cover class at the sample site, obtained from the ESA Climate Change Initiative global land cover map for 2015 at 300 m spatial resolution (http://maps.elie.ucl.ac.be/CCI/viewer/download.php). Columns were also added for "NDVI SD min", "NDVI SD max", "NDVI CV min" and "NDVI CV max". These refer to the minimum and maximum Standard Deviation and Coefficient of Variation of the Normalized Difference Vegetation Index (NDVI) 13 within a 500 m square buffer centred on the geographic coordinates of each site. These NDVI-derived statistics were computed as indicators of the heterogeneity of the sampling sites and the area surrounding them. Filtering out heterogeneous sites may be a key site selection criterion for calibration and validation of LFMC predictions from coarser spatial resolution remote sensing products. Both the NDVI Standard Deviation and the Coefficient of Variation were computed from Landsat 8 Operational Land Imager data using Google Earth Engine 14 . Monthly mean NDVI maps were created for each month of 2015 (the same year as the ESA land cover map used to characterize the land cover type of each site) from USGS Landsat 8 Surface Reflectance Tier 1 data, after masking the pixels marked as cloud, cloud shadow or snow. For each of the 12 monthly maps, the standard deviation and the mean were computed within the 500 m × 500 m window. If 20% or more of the NDVI values within the window were missing due to cloud and snow masking, no NDVI value was reported for that month. Consequently, every site was assigned up to 12 NDVI standard deviation values (one for each month) and up to 12 NDVI mean values.
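The per-site heterogeneity statistics can be sketched in plain Python as below. This is not the authors' Google Earth Engine code: it assumes each monthly window is given as a flat list of NDVI values with None for masked pixels, uses the population standard deviation (the source does not say which estimator was used), and takes the coefficient of variation in its standard form, SD divided by mean. All function and key names are hypothetical.

```python
from statistics import mean, pstdev

MAX_MISSING_FRACTION = 0.20  # from the text: 20% or more missing means no value


def monthly_stats(window):
    """Return (sd, mean) of NDVI in one monthly window, or None if too sparse.

    `window` is a list of NDVI values; None marks a masked pixel.
    """
    if not window:
        return None
    valid = [v for v in window if v is not None]
    missing = len(window) - len(valid)
    if missing / len(window) >= MAX_MISSING_FRACTION:
        return None
    # Population SD is an assumption; the estimator is not stated in the text.
    return pstdev(valid), mean(valid)


def site_heterogeneity(monthly_windows):
    """Summarize up to 12 monthly windows into min/max SD and CV for a site."""
    stats = [s for s in map(monthly_stats, monthly_windows) if s is not None]
    if not stats:
        return None
    sds = [sd for sd, _ in stats]
    cvs = [sd / m for sd, m in stats if m != 0]  # CV = SD / mean
    return {
        "ndvi_sd_min": min(sds),
        "ndvi_sd_max": max(sds),
        "ndvi_cv_min": min(cvs) if cvs else None,
        "ndvi_cv_max": max(cvs) if cvs else None,
    }


windows = [
    [0.5, 0.5, 0.5, 0.5],   # homogeneous month
    [0.2, 0.4, 0.2, 0.4],   # heterogeneous month
    [0.5, None, 0.5, 0.5],  # 25% masked, so this month is dropped
]
summary = site_heterogeneity(windows)
```

Months that fail the masking threshold simply contribute nothing to the minimum and maximum, so a site with persistent cloud cover is summarized from fewer than 12 months.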
Globe-LFMC contains the minimum and maximum Standard Deviation and Coefficient of Variation of NDVI for every site. Finally, we also added information on slope and altitude for some sampling plots.
Description of Google Earth Engine code. Google Earth Engine 14 was used to compute the NDVI statistics added to Globe-LFMC. The input of the program is a point shapefile ("samplePlotsShapefile", extensions .cpg, .dbf, .prj, .shp, .shx) representing the location of each Globe-LFMC site. This shapefile is available as additional data in figshare 12 (see Code Availability). To run this GEE code, the shapefile needs to be uploaded into the GEE Assets and then imported into the Code Editor with the name "plots" (without quotation marks).
The outputs of the program are 12 ".csv" files, one for each month of 2015. Every file contains the following statistics for the 500 m × 500 m buffer around the coordinates of each Globe-LFMC site: NDVI SD, NDVI mean, the total pixel count and the count of unmasked pixels.

Data Records
The compiled data 12 are available in a single database in Excel format with three interrelated spreadsheets: "Contact", "LFMCdata" and "Protocol" (Fig. 2). A description of the fields in each spreadsheet can be found in Tables 1, 2 and 3. Each data record represents the LFMC measurement taken at a sampling site (Sitename) at a specific time and has a unique record identification code: C(contact_id)_(Sitename_id)_(record_id). These details allow users to select and download discrete datasets for their area of interest, and to identify the contact person for each data entry.
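A record identification code of this form can be split back into its parts, for example to look up the contact person for a record. The sketch below assumes that all three components are plain integers and that the code looks like "C12_3_45"; the source gives only the template, so this concrete shape, the regular expression and all names are assumptions.

```python
import re
from typing import NamedTuple


class RecordId(NamedTuple):
    contact_id: int
    site_id: int
    record_id: int


# Assumed shape: "C<contact_id>_<site_id>_<record_id>" with integer parts.
_RECORD_ID_PATTERN = re.compile(r"^C(\d+)_(\d+)_(\d+)$")


def parse_record_id(text: str) -> RecordId:
    """Split a record code into contact, site and record numbers."""
    match = _RECORD_ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized record id: {text!r}")
    return RecordId(*(int(part) for part in match.groups()))


parsed = parse_record_id("C12_3_45")
```

If the real identifiers contain non-numeric site codes, the pattern would need to be relaxed accordingly.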
We plan to publish updates to Globe-LFMC as new data become available. Scientists interested in sharing their data can contact the corresponding author of this manuscript for instructions on how to do so.

Technical Validation
The database represents a range of countries and land cover types, with LFMC values ranging from 0.21% to 549% (Table 4). A majority of samples were collected in the Western US, due to extensive government sampling programs for assessing wildfire danger, with some time series stretching back decades. Large numbers of samples were also collected in France, Spain and Australia (Table 4). The land cover type with the largest number of observations and sites is "Tree cover, needle-leaved, evergreen, closed to open (>15%)" followed by "Shrubland", "Grassland" and "Cropland-rainfed" (Table 5).
Data in the database have been checked for possible duplicates and errors. We validated the data by checking their consistency with expected LFMC ranges; a detailed LFMC trend analysis is beyond the scope of this paper and will be the objective of future work. Globe-LFMC contains values lower than 30%, which are characteristic of dead fuels 15 . Those values mostly come from partially or fully cured grassland and herbaceous plots but were also occasionally recorded in other land covers (Table 4 and Fig. 3). Setting aside those occasional outliers, the distribution of LFMC for species with significant numbers of observations (Fig. 3) is consistent with established knowledge on the seasonal pattern of LFMC according to the type of vegetation, their strategies to cope with drought 16 and their pyro-ecophysiological traits 17,18 . For example, Eucalyptus is a genus that includes over seven hundred species of broad-leaved trees, usually evergreen and native to Australia. Because Eucalyptus trees have roots that can exceed 2.5 m in length and adapted ecophysiological traits, they can draw water from deep in the soil profile to avoid drought, and their LFMC therefore fluctuates only around a value of 100% across seasons. Similarly, Quercus ilex is an evergreen broad-leaved oak native to the Mediterranean region with similar strategies. Conversely, Quercus gambelii is a deciduous broad-leaved tree widespread in western North America that shows greater LFMC variability, with values in summer significantly lower than in spring. Finally, Artemisia tridentata (drought deciduous/evergreen shrub of western North America), Cistus monspeliensis (evergreen Mediterranean shrub) and grasslands present the strongest seasonality, with the highest values in spring, the lowest (<30%) in summer and intermediate values in autumn and winter.

Usage Notes
Users of the database are encouraged to look at the available photos of the sites, whose names can be found in the "LFMC data" spreadsheet. The photos are contained in the zip folder named "photos of sites".
If the database is to be used for the calibration or validation of remote sensing products, we recommend using the minimum and maximum "NDVI SD" and "NDVI CV" fields to select the most homogeneous sites.
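One way such a selection might look is sketched below. The source does not recommend any particular cut-off, so the threshold value, the function name and the dictionary keys are all hypothetical and should be tuned to the sensor footprint and application.

```python
def select_homogeneous_sites(sites, max_cv=0.1):
    """Keep sites whose maximum monthly NDVI CV does not exceed `max_cv`.

    `sites` is a list of dicts with hypothetical keys "site" and
    "ndvi_cv_max"; `max_cv=0.1` is an arbitrary illustrative threshold.
    Sites without an NDVI CV value are excluded rather than assumed uniform.
    """
    return [
        site["site"]
        for site in sites
        if site.get("ndvi_cv_max") is not None and site["ndvi_cv_max"] <= max_cv
    ]


sites = [
    {"site": "A", "ndvi_cv_max": 0.05},
    {"site": "B", "ndvi_cv_max": 0.30},
    {"site": "C", "ndvi_cv_max": None},
]
selected = select_homogeneous_sites(sites)
```

Filtering on the maximum rather than the minimum CV is the stricter choice, since it requires a site to be homogeneous in every month with valid data.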
An extra database in Excel format ("References&Changes_LFMC.xlsx", at figshare) 12 with two spreadsheets, "References" and "Changes to USA National FM db", contains information on references and copyright notices and lists the changes made to the original datasets. We provide this information in case a researcher would like to compare the values shown here to the original databases.

Code availability
The Google Earth Engine (GEE) 14 code and the shapefile "samplePlotsShapefile" (extensions .cpg, .dbf, .prj, .shp, .shx) used for computing the NDVI statistics are part of the data and files uploaded together with Globe-LFMC into figshare 12 . This code can only be run if the user has access to a Google account and to GEE.