The Forest Observation System, building a global reference dataset for remote sensing of forest biomass

Forest biomass is an essential indicator for monitoring the Earth’s ecosystems and climate. It is a critical input to greenhouse gas accounting, estimation of carbon losses and forest degradation, assessment of renewable energy potential, and for developing climate change mitigation policies such as REDD+, among others. Wall-to-wall mapping of aboveground biomass (AGB) is now possible with satellite remote sensing (RS). However, RS methods require extant, up-to-date, reliable, representative and comparable in situ data for calibration and validation. Here, we present the Forest Observation System (FOS) initiative, an international cooperation to establish and maintain a global in situ forest biomass database. AGB and canopy height estimates with their associated uncertainties are derived at a 0.25 ha scale from field measurements made in permanent research plots across the world’s forests. All plot estimates are geolocated and have a size that allows for direct comparison with many RS measurements. The FOS offers the potential to improve the accuracy of RS-based biomass products while developing new synergies between the RS and ground-based ecosystem research communities.



Background & Summary
Global estimates of forest height, aboveground biomass (AGB) and their changes over space and time are needed as both essential climate variables 1 and essential biodiversity variables 2 , and to support international policy initiatives such as REDD+ 3 . Several space-borne missions to assess forest structure and functioning, including BIOMASS (ESA), ALOS PALSAR (JAXA), GEDI (NASA) and NISAR (NASA-ISRO), will be operational in the coming years. These missions require ground-based estimates for algorithm calibration and product validation. For instance, high-quality, standardized measurements of forest biomass and height are critical for improving the accuracy of products derived from space-borne instruments. Furthermore, ensuring that different missions have access to the same set of high-quality standardized measurements for calibration and validation should greatly improve the comparability of, and confidence in, future remote sensing (RS) products.
Remote sensing users typically have different product requirements from those of the ecological and forestry communities. Namely, RS users often (1) need access to AGB estimates at the pixel level, while ecologists and foresters produce area-based estimates derived from individual tree measurements. RS users typically (2) need products at a consistent spatial resolution, while a variety of plot sizes and shapes have been adopted by ecologists and foresters. Finally, RS users (3) require AGB to be computed via globally and regionally consistent routines, while various approaches have been developed to derive AGB estimates from tree measurements. These communities also operate differently from a funding perspective. Most notably, recurrent investments are needed to maintain permanent forest plots (including censuses that temporally match RS data collection) and to ensure that field and botanical staff, without whom the data would not be collected, are paid and trained. In contrast, RS users typically access data provided by space-borne missions that have already been funded. Despite these differences, there is a clear need to share existing data sets for the benefit of both communities.

A full list of authors and their affiliations appears at the end of the paper. Correspondence and requests for materials should be addressed to D.S. (email: schepd@iiasa.ac.at).

[...] tropical forest plot data, as few as 40 tree height observations are sufficient for characterizing this relationship if stratified by diameter 22 .
All the data presented here were collected from permanent forest sample plots with known locations; accurate coordinates (with an error of less than 30 meters) have been either delivered to the FOS or will be recorded during the next census. The median plot size is 1 ha, but plot sizes range from 0.25 ha to 50 ha. Large plots are subdivided into 0.25 ha (i.e., 50 × 50 m) sub-plots. The FOS consortium made the decision to consider only relatively large and permanent plots in order to reduce errors in georeferencing and to decrease the variability in the measured parameters. Recent research has quantified the effect of spatial resolution on the uncertainties in the AGB estimates, with sampling error dropping from 46.3% for 0.1 ha plots, to 26% and 16.5% for 0.25 ha and 1 ha plots, respectively 52 . Scaling up from the plot to the landscape level using lidar-derived metrics, studies have shown decreases in the RMSE for the AGB-lidar models, from 70-90 to 36-51 Mg AGB per ha, when increasing the plot size from 0.25 ha to 1 ha 17,53 . There are always size-effort tradeoffs (e.g., smaller plots would permit greater replication), but by concentrating on larger plots that are also permanent, the FOS has chosen a smaller but high-quality set of plots. Our approach, therefore, excludes the possibility of using databases of smaller plots such as those found in national forest inventories.
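The subdivision of large plots into 50 × 50 m sub-plots can be illustrated with a minimal sketch. It assumes that each tree has local coordinates in metres measured from one corner of the plot and an already computed AGB in kilograms; the function name, the tuple layout and the example values are hypothetical and are not part of the FOS processing chain.

```python
from collections import defaultdict

SUBPLOT_SIDE_M = 50.0  # 50 m x 50 m = 2500 m^2 = 0.25 ha


def aggregate_to_subplots(trees, side_m=SUBPLOT_SIDE_M):
    """Sum tree-level AGB (kg) per grid cell and convert to Mg per ha.

    `trees` is an iterable of (x_m, y_m, agb_kg) tuples, with x and y in
    metres from one corner of the plot (a hypothetical layout).
    """
    totals_kg = defaultdict(float)
    for x_m, y_m, agb_kg in trees:
        # Integer grid index of the sub-plot that contains the tree
        cell = (int(x_m // side_m), int(y_m // side_m))
        totals_kg[cell] += agb_kg
    area_ha = (side_m * side_m) / 10_000.0
    # kg -> Mg (divide by 1000), then normalise by the sub-plot area
    return {cell: kg / 1000.0 / area_ha for cell, kg in totals_kg.items()}


# Illustrative values only: two trees fall in cell (0, 0), one in cell (1, 0)
trees = [(10.0, 10.0, 500.0), (40.0, 20.0, 1500.0), (60.0, 10.0, 1000.0)]
subplot_agb = aggregate_to_subplots(trees)
```

Sub-plots that contain no trees do not appear in the returned dictionary, so a real workflow would need to add them explicitly with a value of zero.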
AGB and associated uncertainties were obtained using a standardized procedure implemented in the BIOMASS R-package 5 . For the sake of standardization, we systematically considered only trees having a diameter ≥10 cm (or ≥5 cm where smaller trees contribute substantially (>5%) to the total AGB, e.g., in savannas). Taxonomy was first checked using the Taxonomic Name Resolution Service, which in turn served to assign a wood density value to each tree using the Global Wood Density Database (GWDD) as a reference 54,55 . Species- or genus-level averages were assigned when possible and, if not, the plot-level mean wood density was assigned to each tree species with no known wood density. Tree height was estimated in three different ways. First, when available, subsets of tree height measurements were used to build plot-specific height-diameter (HD) relationships, assuming a three-parameter Weibull model 5 or a two-parameter Michaelis-Menten model, whichever provided the lowest prediction error. Secondly, the regional height-diameter models proposed by Feldpausch et al. 31 were used to infer tree height. Finally, height was implicitly taken into consideration in the AGB calculation through the use of the bioclimatic predictor E proposed by Chave et al. 30 . A comparison of these three approaches (Fig. 2) has shown that their relative performance varies among locations. Thus, the most conservative approach is to provide the three estimates so that the uncertainty associated with the HD relationship can be assessed.
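As an illustration of the first height option, the sketch below combines a two-parameter Michaelis-Menten height-diameter model with the height-explicit pantropical equation of Chave et al., AGB = 0.0673 × (ρD²H)^0.976, with AGB in kg, D in cm, H in m and ρ in g cm⁻³. The coefficients are quoted from the published model and should be checked against the source; the Michaelis-Menten parameters and the tree values are hypothetical, and the code is not taken from the BIOMASS R-package.

```python
def michaelis_menten_height(dbh_cm, a, b):
    """Two-parameter Michaelis-Menten height-diameter model: H = a * D / (b + D)."""
    return a * dbh_cm / (b + dbh_cm)


def chave_agb_kg(wood_density, dbh_cm, height_m):
    """Height-explicit pantropical model: AGB = 0.0673 * (rho * D^2 * H)^0.976.

    wood_density in g cm^-3, dbh_cm in cm, height_m in m; returns AGB in kg.
    """
    return 0.0673 * (wood_density * dbh_cm ** 2 * height_m) ** 0.976


# Hypothetical plot-specific parameters: asymptotic height a (m), half-saturation b (cm)
A_HEIGHT_M = 45.0
B_DBH_CM = 30.0

dbh_cm = 30.0
height_m = michaelis_menten_height(dbh_cm, A_HEIGHT_M, B_DBH_CM)  # 45 * 30 / 60
tree_agb_kg = chave_agb_kg(0.6, dbh_cm, height_m)
```

In practice the parameters a and b would be fitted to the plot's own height measurements rather than fixed by hand.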
Errors associated with each of these steps (i.e., DBH measurement, wood density, tree height) were propagated through a Monte Carlo scheme to provide mean AGB estimates with associated credibility intervals (Fig. 2).
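The idea of Monte Carlo error propagation can be sketched as follows. This is a simplified illustration, not the routine used in the BIOMASS R-package: the error magnitudes (2% of DBH, 0.05 g cm⁻³ for wood density, 10% of height), the use of independent normal errors and the tree values are all assumptions, and the residual error of the allometric model itself is omitted.

```python
import random
import statistics


def agb_kg(wood_density, dbh_cm, height_m):
    """Height-explicit pantropical model of Chave et al.; returns AGB in kg."""
    return 0.0673 * (wood_density * dbh_cm ** 2 * height_m) ** 0.976


def monte_carlo_plot_agb(trees, n_iter=1000, seed=42,
                         dbh_cv=0.02, wd_sd=0.05, h_cv=0.10, area_ha=0.25):
    """Return (mean, lower, upper) plot AGB in Mg per ha with a 95% interval.

    `trees` is a list of (wood_density, dbh_cm, height_m) tuples. All error
    magnitudes are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_iter):
        total_kg = 0.0
        for wd, dbh, h in trees:
            # Perturb each input independently; clamp to keep values positive
            wd_i = max(rng.gauss(wd, wd_sd), 0.05)
            dbh_i = max(rng.gauss(dbh, dbh_cv * dbh), 0.1)
            h_i = max(rng.gauss(h, h_cv * h), 0.1)
            total_kg += agb_kg(wd_i, dbh_i, h_i)
        draws.append(total_kg / 1000.0 / area_ha)
    draws.sort()
    lower = draws[int(0.025 * n_iter)]
    upper = draws[int(0.975 * n_iter) - 1]
    return statistics.fmean(draws), lower, upper


# Hypothetical trees in one 0.25 ha sub-plot
trees = [(0.60, 30.0, 22.0), (0.55, 45.0, 28.0), (0.70, 15.0, 12.0)]
mean_agb, lower, upper = monte_carlo_plot_agb(trees)
```

Fixing the seed makes the sketch reproducible; the width of the resulting interval depends entirely on the assumed error magnitudes.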
Boreal and temperate plots (representing 11% of the total number of sub-plots) were processed manually using similar steps. Species-specific allometric equations 56 allowed the stem volume to be estimated based on the height and DBH measurements. Biomass conversion and expansion factors 57 were used to estimate AGB from the stem volume taking the tree age, site index and stocking into account. The next version of the BIOMASS R-package will be capable of processing boreal and temperate data in addition to tropical.

Data Records
The data in FOS 58 are organized in a hierarchical structure (Fig. 3). The Plot description includes a link to the institution and network. The central part of the database is the Sub-plot table, where geolocation, the date of the census, the people who manage the specific plots, the AGB and the canopy height are stored.
The FOS does not store individual tree-level information, only plot-level aggregates. Users interested in tree-level information can contact the contributing networks or the plot PIs using the links provided in the Plot table.
• AGB - aboveground biomass (Mg ha⁻¹)
• AGB_local - aboveground biomass (Mg ha⁻¹) estimated using local equations or equation 4 in Chave 30 with wood density, DBH and H derived from local height-diameter relationships.

Technical Validation
The key predictive variables of AGB are tree dimensions (primarily diameter and height) and taxonomic identity, which explains most tree-to-tree variation through interspecific differences in wood density 59 . The procedures for ensuring the quality of the data collected are as follows:

(1) On-site measurement accuracy. To ensure diameter accuracy and consistency among and within censuses, field teams follow standard forest inventory protocols for the correct choice of the point of measurement (POM). For example, the RAINFOR protocol for tropical forests 60 records each POM by painting the location on each tree to ensure that subsequent measurements can be performed at the same point. For tree height, consistency is ensured by having a designated, trained operator who works at multiple sites using the same instrument. At some sites, double measurements of height (from different positions) have been carried out, and mean values have been used as the height of the individual trees. For species identification, reliability is important in highly diverse tropical plots; hence, tree and plot AGB are estimated by taking the species-level variability in wood density into account 61 . This is supported by collecting botanical vouchers from every taxon (or potential taxon) in the field. In many cases, these vouchers have been deposited in recognized regional herbaria, identified by botanical experts and, where possible, made available electronically (e.g., via ForestPlots.net). However, voucher collection is not currently a standard protocol for every plot in the FOS.

(2) Multiple censusing. By working primarily with re-censused permanent plots rather than single-census plots, we have reduced the uncertainties, because almost every tree has been measured at least twice by the time of the focal census. This provides the opportunity to identify spurious values and correct any errors made previously. Repeat censuses also provide more opportunities to improve species identification by increasing the chance of encountering fertile material (see the next step).

(3) Post-fieldwork data processing, e.g., identifying trees to species level. Species identification can be extremely challenging in tropical forests due to their diversity and the fact that most trees lack flowers or fruits when inventoried. Botanical identity is a key control on the AGB through its effect on wood density. To explore the reliability of identification in some of the most diverse RAINFOR sites in western Amazonia, PIs have separated the tree species assemblages into several larger taxonomic groups. As reported by Baker et al. 62 , taxonomic specialists for each group have then assessed the accuracy of the species identifications of the herbarium collections using 18 different botanists across 60 plots during the past 30 years. Overall, even in taxonomically difficult groups where species are often very rare, 75% of tree species were correctly identified.

(4) Common protocols for potential error detection. These protocols have been developed by contributing networks and include, e.g., flagging for attention any tree whose diameter has declined by more than 5 mm. This makes it possible to detect trees that have shrunk between two censuses and to check whether the individual is dead or rotten. Potential issues are flagged so that they can be checked against existing field notes and during the following census. Thus, as mentioned previously, repeat censuses provide more opportunities to improve data quality than single-census plots.

(5) Within-network collaboration. Data quality is further enhanced through the exchange of ideas between experts at different sites and between nations, through the use of common data analysis protocols (i.e., allometric equations, R packages, etc.), and by promoting shared publications.

(6) Cross-network collaboration. In the FOS, by applying a uniform R script for data aggregation and AGB estimation, potential biases from using different height-diameter, wood density and allometric relations are strongly reduced.
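The diameter-decline check described in step (4) can be sketched as a simple comparison between two censuses. The 5 mm threshold comes from the text; the dictionary layout, the tree identifiers and the function name are hypothetical and do not reproduce any network's actual protocol.

```python
def flag_shrinking_trees(previous_mm, current_mm, threshold_mm=5.0):
    """Return the IDs of trees whose diameter declined by more than threshold_mm.

    `previous_mm` and `current_mm` map tree ID -> diameter in mm for two
    consecutive censuses (a hypothetical data layout).
    """
    flagged = []
    for tree_id, prev in previous_mm.items():
        curr = current_mm.get(tree_id)
        if curr is None:
            continue  # not re-measured (e.g., dead or missed); needs a separate check
        if prev - curr > threshold_mm:
            flagged.append(tree_id)
    return sorted(flagged)


# Illustrative values: T2 declines by 7 mm, T3 by exactly 5 mm (not flagged)
census_1 = {"T1": 250.0, "T2": 310.0, "T3": 122.0}
census_2 = {"T1": 252.0, "T2": 303.0, "T3": 117.0}
to_check = flag_shrinking_trees(census_1, census_2)
```

Flagged trees are candidates for review against field notes, not automatic corrections.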
The distribution of FOS plots by continent is presented in Table 1. Africa, Europe and South America are represented by similar numbers of locations (i.e., 62-80 plots) and contribute more than 80% of the plots at the time of publication, but in terms of coverage, South America alone comprises 49% of the forest area covered.
The IIASA network provides the highest number of plot locations to FOS (Table 2), while the TmFO network contributes the most in terms of areal coverage.
The range of values of major forest parameters represented in the FOS database is shown in Table 3. The maximum AGB value (918 Mg ha⁻¹) and canopy height (41.7 m) at a 0.25 ha sub-plot were recorded in Lopé, Gabon. Some savannah sub-plots (e.g., in Gabon) have few or no trees >5 cm DBH, which leads to low or zero biomass estimates. The tallest tree (60.1 m) was found in Costa Rica and the maximum basal area (85.6 m² ha⁻¹) was found in the Caucasus, Russia. Table 4 contains information about the AGB for different biomes and globally. As expected, the average AGB increases from boreal to temperate and then from temperate to tropical forests.

Usage Notes
This data set will be essential for validating and calibrating satellite observations and forest biometric models. The focus is to provide ground support for current and planned space-borne missions, such as NASA GEDI (https://gedi.umd.edu/), NASA-ISRO NISAR (https://nisar.jpl.nasa.gov/), JAXA ALOS PALSAR (http://global.jaxa.jp/projects/sat/alos/) and ESA BIOMASS (https://earth.esa.int/web/guest/missions/esa-future-missions/biomass), which are aimed at retrieving forest structure parameters such as forest height and biomass.

At this stage, we are making no claims regarding the statistical robustness of the FOS data set for global or regional biomass estimations. Instead, our aim is to present uniformly processed data on forest biomass from available locations (see Table 1). One of the main goals of the FOS is to highlight gaps in the observations.

Using sub-plot data for validation of RS data might lead to spatial autocorrelation problems. Possible solutions are to use a plot average, to use only one sub-plot value from each plot, or to test for the presence of spatial autocorrelation.
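The first of these options, averaging sub-plots within each plot, can be sketched as below. The record layout (plot identifier, sub-plot AGB) and the values are hypothetical and do not reflect the actual column names of the FOS data package.

```python
from collections import defaultdict
from statistics import fmean


def plot_means(subplots):
    """Average sub-plot AGB (Mg per ha) within each plot.

    `subplots` is an iterable of (plot_id, agb_mg_ha) pairs (hypothetical layout).
    """
    grouped = defaultdict(list)
    for plot_id, agb_mg_ha in subplots:
        grouped[plot_id].append(agb_mg_ha)
    # One value per plot removes within-plot spatial dependence from the sample
    return {plot_id: fmean(values) for plot_id, values in grouped.items()}


# Illustrative values only
records = [("A", 200.0), ("A", 300.0), ("B", 150.0)]
plot_agb = plot_means(records)
```

Averaging reduces the number of validation points, so the choice between this option and an explicit autocorrelation test depends on how many plots are available in the region of interest.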
This data package contains geographical coordinates rounded to two decimal places (a precision of up to 1 km at the equator). The most up-to-date extended data set with accurate geolocation is available in the FOS portal: https://forest-observation-system.net/

The FOS initiative depends on the contributions of high-quality forest plot data from participating networks. The fair use of the data presented here requires respecting the efforts and rights of the partners and supporting the long-term future of these observational efforts. The data set will be licensed under a Creative Commons Attribution 4.0 International License (CC-BY 4.0), which means that it will be fully open even for commercial use but requires acknowledgment of the PIs and plot owners. We would also appreciate it if all users of the FOS data would share their own data via the FOS and/or commit to collaboratively funding new censuses and the expansion of existing plot networks.

Code Availability
The BIOMASS R-package is an open source library available from the CRAN R repository. The development version is publicly available on the GitHub platform at: https://github.com/AMAP-dev/BIOMASS. Furthermore, the BIOMASS R-package is accompanied by an open access paper describing the functionality in more detail 5 .

Table 4. The distribution of aboveground biomass data (t ha⁻¹) by biome in the FOS database (as of December 2018).