Background & Summary

Global estimates of forest height and aboveground biomass (AGB), and of their changes over space and time, are needed as both essential climate variables1 and essential biodiversity variables2, and to support international policy initiatives such as REDD+ 3. Several space-borne missions to assess forest structure and functioning, including BIOMASS (ESA), ALOS PALSAR (JAXA), GEDI (NASA) and NISAR (NASA-ISRO), will be operational in the coming years. These missions require ground-based estimates for algorithm calibration and product validation. In particular, high-quality, standardized measurements of forest biomass and height are critical for improving the accuracy of products derived from space-borne instruments. Furthermore, ensuring that different missions have access to the same set of high-quality standardized measurements for calibration and validation should greatly improve the comparability of, and confidence in, future remote sensing (RS) products.

RS users typically have different product requirements from those of the ecological and forestry communities. Namely, RS users often (1) need access to AGB estimates at the pixel level, while ecologists and foresters produce area-based estimates derived from individual tree measurements. RS users typically (2) need products at a consistent spatial resolution, while a variety of plot sizes and shapes have been adopted by ecologists and foresters. Finally, RS users (3) require AGB to be computed via globally and regionally consistent routines, while various approaches have been developed to derive AGB estimates from tree measurements. These communities also operate differently from a funding perspective. Most notably, recurrent investments are needed to maintain permanent forest plots – including censuses that temporally match RS data collection – and to ensure that field and botanical staff, without whom the data would not be collected, are paid and trained. In contrast, RS users typically access data provided by space-borne missions that have already been funded. Despite these differences, there is a clear need to share existing data sets for the benefit of both communities.

The Forest Observation System (FOS) is an international, collaborative initiative that aims to establish a global in situ forest AGB database to support Earth Observation (EO) and to encourage investment in relevant field-based measurements and research4. The FOS enables access to high-quality field data by partnering with some of the most well-established teams and networks responsible for managing permanent forest plots globally. In doing so, FOS is benefiting both the RS and ecological/forestry communities while facilitating positive interactions between them.

To this end, the FOS project has established a data sharing policy and framework that seeks to overcome existing barriers between data providers and users. For example, data made available on the FOS website are plot-aggregated (i.e., stand AGB, canopy height, etc.), while the underlying original tree-by-tree data are managed by participating ecological networks. To ensure that estimates added to the FOS are robust and consistent, a freely downloadable BIOMASS R-package5 has been upgraded, which makes the procedure for computing plot AGB estimates from tropical forest inventories transparent, standardized and reproducible. There are developments underway to make the package usable for any forest type, including boreal and temperate ecosystems. This work has been complemented by the definition of a set of technical requirements and standards aimed at ensuring data comparability4.

The FOS currently hosts aggregate data from plots contributed by several existing networks, including: the network of the Center for Tropical Forest Science – Forest Global Earth Observatory (CTFS-ForestGEO)6, the RAINFOR7, AfriTRON8 and T-FORCES9 (curated on the platform)10, the IIASA network11,12, the Tropical Managed Forests Observatory (TmFO)13 and AusCover14. These international collaborations have already (i) invested in establishing permanent sampling plots; (ii) proposed robust protocols for accurate tree mapping and measurement, which are largely standardized across networks; (iii) monitored existing plots repeatedly; and (iv) established databases with particular emphasis on data quality control10,15. As the FOS is an open initiative, additional networks (e.g., GFBI16) and teams that comply with the aforementioned criteria are welcome to join in the future.

The data presented here have been partly published before17,18,19,20,21, but never in such a unified and comprehensive manner. Results based on some of the plots presented here have impacted a wide range of scientific fields, including tropical forest ecology22,23,24,25,26, drought sensitivity of forests19,27,28,29, tree allometry30,31,32,33, carbon cycles21,34,35,36, remote sensing18,37,38,39, climate change8,40,41,42,43, biodiversity44,45,46,47, diversity-carbon relationships48,49 and historical forest use50,51, among others.

The online database provides open access to the canopy height and biomass estimates as well as information about the plot PIs who have granted access to the data (see Fig. 1).

Fig. 1 The web portal.


Within the sample plots, every stem above a defined threshold in diameter at breast height (DBH, usually 1, 5, 7 or 10 cm) was taxonomically identified and the DBH measured, avoiding any buttresses or deformities. In most plots, tree height was measured for a subset of trees that are representative of different diameter classes and tree species in order to develop site-specific height-diameter regression equations. Based on an analysis using the tropical forest plot data, as few as 40 tree height observations are sufficient for characterizing this relationship if stratified by diameter22.

All the data presented here were collected from permanent forest sample plots with known locations; accurate coordinates (with an error of less than 30 meters) have been either delivered to the FOS or will be recorded during the next census. The median plot size is 1 ha, but plot sizes vary from 0.25 ha to 50 ha. Large plots are subdivided into 0.25 ha (i.e., 50 × 50 m) sub-plots. The FOS consortium made the decision to consider only relatively large and permanent plots in order to reduce errors in georeferencing and to decrease the variability in the measured parameters. Recent research has quantified the effect of spatial resolution on the uncertainties in the AGB estimates, with sampling error dropping from 46.3% for 0.1 ha plots, to 26% and 16.5% for 0.25 ha and 1 ha plots, respectively52. When scaling up from the plot to the landscape level using lidar-derived metrics, studies have shown decreases in the RMSE of the AGB-lidar models, from 70–90 to 36–51 Mg AGB per ha, when increasing the plot size from 0.25 ha to 1 ha17,53. There are always size-effort tradeoffs, e.g., smaller plots would permit greater replication, but by focusing on larger plots that are also permanent, FOS has chosen to concentrate its efforts on a smaller but high-quality set of plots. Our approach, therefore, excludes the possibility of using databases of smaller plots such as those found in national forest inventories.

AGB and associated uncertainties were obtained using a standardized procedure implemented in the BIOMASS R-package5. For the sake of standardization, we systematically considered only trees having a diameter ≥10 cm (or ≥5 cm where trees between 5 and 10 cm contribute substantially (>5%) to the total AGB, e.g., in savannas). Taxonomy was first checked using the Taxonomic Name Resolution Service, which in turn served to assign a wood density value to each tree using the Global Wood Density Database (GWDD) as a reference54,55. Species- or genus-level averages were assigned when possible; otherwise, the plot-level mean wood density was assigned to each tree with no known wood density. Tree height was estimated in three different ways. First, when available, subsets of tree height measurements were used to build plot-specific height-diameter relationships, assuming a three-parameter Weibull model5 or a two-parameter Michaelis-Menten model, whichever provided the lowest prediction error. Second, the regional height-diameter models proposed by Feldpausch et al.31 were used to infer tree height. Finally, height was implicitly taken into consideration in the AGB calculation through the use of the bioclimatic predictor E proposed by Chave et al.30. Equation 7 of Chave et al.30 was used in this case, while the generalized allometric model (equation 4) was used otherwise (where heights were derived from local or Feldpausch height-diameter relationships). Among the three approaches, the use of a local height-diameter (HD) model is the most accurate. However, local height measurements are not systematically available for all plots. The Chave et al. (2014) and Feldpausch et al. (2012) approaches are both alternatives to the use of a local HD model, but independent validation (e.g., Fig. 2) has shown that their relative performance varies among locations. Thus, the most conservative approach is to provide the three estimates so that the uncertainty associated with the HD relationship can be assessed.
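The model forms named in this paragraph can be written out as a short sketch. The Python functions below are illustrative only and are not the BIOMASS R-package code: the function names are hypothetical, the height-diameter parameters (a, b, c) would have to be fitted to local data, and the allometric coefficients are those we recall from equations 4 and 7 of Chave et al.30 and should be checked against that paper before any use. Diameter D is assumed to be in cm, height H in m, wood density in g cm−3 and tree AGB in kg.

```python
import math

def height_weibull(d_cm, a, b, c):
    """Three-parameter Weibull HD form: H = a * (1 - exp(-(D / b) ** c))."""
    return a * (1.0 - math.exp(-((d_cm / b) ** c)))

def height_michaelis_menten(d_cm, a, b):
    """Two-parameter Michaelis-Menten HD form: H = a * D / (b + D)."""
    return a * d_cm / (b + d_cm)

def agb_with_height_kg(wd, d_cm, h_m):
    """Tree AGB (kg) with explicit height (assumed form of Chave eq. 4)."""
    return 0.0673 * (wd * d_cm ** 2 * h_m) ** 0.976

def agb_with_e_kg(wd, d_cm, e):
    """Tree AGB (kg) with height implicit in the bioclimatic predictor E
    (assumed form of Chave eq. 7)."""
    ln_d = math.log(d_cm)
    return math.exp(-1.803 - 0.976 * e + 0.976 * math.log(wd)
                    + 2.673 * ln_d - 0.0299 * ln_d ** 2)

def plot_agb_mg_per_ha(tree_agb_kg, plot_area_ha):
    """Sum tree-level AGB (kg) and express it per hectare in Mg ha-1."""
    return sum(tree_agb_kg) / 1000.0 / plot_area_ha

# Hypothetical tree: wood density 0.6 g cm-3, DBH 30 cm, height from an
# illustrative Michaelis-Menten curve with made-up parameters.
h = height_michaelis_menten(30.0, a=40.0, b=30.0)
agb_local = agb_with_height_kg(0.6, 30.0, h)
agb_chave = agb_with_e_kg(0.6, 30.0, e=0.0)
```

Computing both estimates for the same tree, as in the last lines, mirrors the choice made here to report several AGB values per plot rather than a single one.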

Fig. 2 An example of the AGB estimation with the BIOMASS R-package. MDJ-02, CAP-10 and other indexes on the horizontal axis are Plot IDs. The vertical axis is AGB in Mg ha−1 and the error bar represents the 95% credibility interval of the stand AGB value following error propagation.

Errors associated with each of these steps (i.e., DBH measurement, wood density, tree height) were propagated through a Monte Carlo scheme to provide mean AGB estimates with associated credibility intervals (Fig. 2).
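The idea of a Monte Carlo propagation scheme can be illustrated with a minimal sketch. It is not the procedure implemented in the BIOMASS R-package: the error magnitudes (cv_dbh, sd_wd, cv_h) are hypothetical placeholders, errors are assumed to be independent and Gaussian, the uncertainty of the allometric model itself is ignored, and the allometric coefficients are the assumed form of equation 4 of Chave et al.30.

```python
import random

def agb_tree_kg(wd, d_cm, h_m):
    """Tree AGB (kg), assumed form of Chave eq. 4."""
    return 0.0673 * (wd * d_cm ** 2 * h_m) ** 0.976

def monte_carlo_plot_agb(trees, plot_area_ha, n_iter=1000, seed=42,
                         cv_dbh=0.02, sd_wd=0.05, cv_h=0.10):
    """Return (mean, 2.5th percentile, 97.5th percentile) of plot AGB in Mg ha-1.

    trees is a list of (wood_density, dbh_cm, height_m) tuples. The error
    settings are illustrative assumptions, not values from the FOS protocol.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_iter):
        total_kg = 0.0
        for wd, d_cm, h_m in trees:
            # Perturb each input; floors keep the draws physically plausible.
            d = max(rng.gauss(d_cm, cv_dbh * d_cm), 0.1)
            w = max(rng.gauss(wd, sd_wd), 0.05)
            h = max(rng.gauss(h_m, cv_h * h_m), 0.5)
            total_kg += agb_tree_kg(w, d, h)
        totals.append(total_kg / 1000.0 / plot_area_ha)
    totals.sort()
    mean = sum(totals) / n_iter
    lower = totals[int(0.025 * (n_iter - 1))]
    upper = totals[int(0.975 * (n_iter - 1))]
    return mean, lower, upper

# Hypothetical 0.25 ha sub-plot with three trees.
trees = [(0.6, 30.0, 25.0), (0.5, 45.0, 30.0), (0.7, 15.0, 14.0)]
mean_agb, cred_2_5, cred_97_5 = monte_carlo_plot_agb(trees, 0.25)
```

The two percentiles play the role of the Cred_2.5 and Cred_97.5 fields reported alongside each AGB estimate.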

Boreal and temperate plots (representing 11% of the total number of sub-plots) were processed manually using similar steps. Species-specific allometric equations56 allowed the stem volume to be estimated based on the height and DBH measurements. Biomass conversion and expansion factors57 were used to estimate AGB from the stem volume, taking the tree age, site index and stocking into account. The next version of the BIOMASS R-package will be capable of processing boreal and temperate data in addition to tropical data.

Data Records

The data in FOS58 are organized in a hierarchical structure (Fig. 3). The Plot description includes a link to the institution and network. The central part of the database is the Sub-plot table, where geolocation, the date of the census, the people who manage the specific plots, the AGB and the canopy height are stored.

Fig. 3 The database structure of the plot information.

The FOS does not store individual tree-level information, only plot-level aggregates. Users interested in tree-level information can contact the contributing networks or the plot PIs using the links provided in the Plot table.

The details of the fields found in the two linked tables of Fig. 3 are provided below.

Plot description

  • Plot_ID – unique plot ID

  • Country_Name – Name of the country

  • Network – the name of the network (e.g., RAINFOR)

  • Institution – the institution that carried out the measurements

  • Link – web link to the data provider

  • Year_established – the year when the plot was established

  • Reference – a reference to the publications

  • Other_measurements – list of parameters measured on the plot

  • Biomass_processing_protocol – file name of the biomass processing protocol (available at Data Package 1), which contains the R code, the variables assigned and the intermediate results.

Sub-plot description

  • Sub-plot_ID – unique sub-plot ID

  • Plot_ID – link to the Plot description table

  • Year_census – year of the census

  • PI_team – List of Principal Investigator(s)

  • Lat_cnt – Latitude of the center of the plot

  • Long_cnt – Longitude of the center of the plot

  • Altitude (m a.s.l.)

  • Slope (degree)

  • Plot_area (ha)

  • Plot_shape (e.g., rectangle, circle, plus dimensions)

  • Forest_status – forest description, including age, successional stage, disturbances, etc.

  • Min_DBH – Minimum diameter of trees at breast height included in the census (cm)

  • H_Lorey – Lorey’s height, DBH-weighted mean tree height (m)

    • Hlor local – mean height estimated from local H = f(DBH) curve (m)

    • Hlor Chave – mean height estimated from the curve by Chave30 (m)

    • Hlor Feldpausch – mean height estimated from the curve by Feldpausch31 (m)

  • H_max – height of the tallest tree (m)

    • Hmax local – tallest tree measured or estimated from local H = f(DBH) curve (m)

    • Hmax Chave – maximum height estimated from the curve by Chave (m)

    • Hmax Feldpausch – maximum height estimated from the curve by Feldpausch (m)

  • AGB – Aboveground biomass (Mg ha−1)

    • AGB_local – aboveground biomass (Mg ha−1) estimated using local equations or equation 4 in Chave30 with wood density, DBH and H derived from local height-diameter relationships.

      • Cred_2.5 – lower bound of 95% credibility interval (Mg ha−1)

      • Cred_97.5 – upper bound of 95% credibility interval (Mg ha−1)

    • AGB_Feldpausch – AGB (Mg ha−1) using equation 4 in Chave30 with wood density, DBH and H derived from Feldpausch31 height-diameter relationship.

      • Cred_2.5 – lower bound of 95% credibility interval (Mg ha−1)

      • Cred_97.5 – upper bound of 95% credibility interval (Mg ha−1)

    • AGB_Chave – aboveground biomass (in Mg ha−1) estimated using equation 7 in Chave30 with wood density, DBH and H implicitly taken into consideration through the use of the bioclimatic predictor E

      • Cred_2.5 – lower bound of 95% credibility interval (Mg ha−1)

      • Cred_97.5 – upper bound of 95% credibility interval (Mg ha−1)

  • Wood_density – mean wood density of the trees (g cm−3)

  • GSV – growing stock volume (m3 ha−1)

  • BA – basal area (m2 ha−1)

  • Ndens – number of trees per hectare

Note that we have merged the Plot and Sub-plot tables in the data package associated with this paper58 for the user’s convenience.
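The merge mentioned in this note amounts to a join of the two tables on Plot_ID. A minimal sketch follows, assuming each table is held as a list of dictionaries keyed by the field names listed above; the function name and all example values are hypothetical and are not taken from the FOS data package.

```python
def merge_plot_subplot(plots, subplots):
    """Attach the parent Plot fields to every Sub-plot record via Plot_ID."""
    plots_by_id = {p["Plot_ID"]: p for p in plots}
    merged = []
    for sp in subplots:
        parent = plots_by_id.get(sp["Plot_ID"])
        if parent is None:
            # Sub-plot without a matching plot: skipped here; raising an
            # error would be an equally reasonable choice.
            continue
        row = dict(parent)   # plot-level fields first
        row.update(sp)       # sub-plot fields added on top
        merged.append(row)
    return merged

# Made-up records for illustration only.
plots = [{"Plot_ID": "P1", "Country_Name": "Gabon", "Network": "AfriTRON"}]
subplots = [
    {"Sub-plot_ID": "P1-01", "Plot_ID": "P1", "Year_census": 2016, "Plot_area": 0.25},
    {"Sub-plot_ID": "P1-02", "Plot_ID": "P1", "Year_census": 2016, "Plot_area": 0.25},
    {"Sub-plot_ID": "X9-01", "Plot_ID": "X9", "Year_census": 2015, "Plot_area": 1.0},
]
merged = merge_plot_subplot(plots, subplots)
```

Each merged row then carries both the plot description (e.g., Network) and the sub-plot measurements, which is the flat layout a user would expect from a single merged table.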

Technical Validation

The key predictive variables of AGB are tree dimensions (primarily diameter and height) and taxonomic identity, which is responsible for explaining most tree-to-tree variations through interspecific wood density variations59. The procedures for ensuring the quality of the data collected are as follows:

  1. On-site measurement accuracy. To ensure diameter accuracy and consistency among and within censuses, field teams follow standard forest inventory protocols for the correct choice of the point of measurement (POM). For example, the RAINFOR protocol for tropical forests60 records each POM by painting the location on each tree to ensure that subsequent measurements can be performed at the same point. For tree height, the consistency of the height measurement is ensured by having a designated, trained operator who works at multiple sites using the same instrument. At some sites, double measurements of height (from different positions) have been carried out, and mean values have been used as the height of the individual trees. For species identification, the reliability in highly diverse tropical plots is important; hence, the tree and plot AGB is estimated by taking the species-level variability in wood density into account61. This is supported by collecting botanical vouchers from every taxon (or potential taxon) in the field. In many cases, these vouchers have been deposited in recognized regional herbaria, identified by botanical experts, and where possible, made available electronically. However, voucher collection is not currently a standard protocol for every plot in the FOS.

  2. Multiple censusing. By working primarily with re-censused permanent plots rather than single census plots, we have ensured that the uncertainties are reduced because almost every tree has been measured at least twice by the time of the focal census, thus providing the opportunity to correct any errors that may have been made previously, through the identification of spurious values. Repeat censuses also provide more opportunities to improve species identification by increasing the chance of encountering fertile material (see the next step).

  3. Post fieldwork data processing, e.g., by identifying trees to species level. Species identification can be extremely challenging in tropical forests due to their diversity and the fact that most trees lack flowers or fruits when inventoried. Botanical identity is a key control on the AGB through its effect on wood density. To explore the reliability of identification in some of the most diverse RAINFOR sites in western Amazonia, PIs have separated the tree species assemblages into several larger taxonomic groups. As reported by Baker et al.62, taxonomic specialists for each group have then assessed the accuracy of the species identifications of the herbarium collections using 18 different botanists across 60 plots during the past 30 years. Overall, even in taxonomically difficult groups where species are often very rare, 75% of tree species were correctly identified.

  4. Common protocols for potential error detection. These protocols have been developed by contributing networks, e.g., by flagging for attention any tree whose diameter has declined by more than 5 mm. This makes it possible to detect trees that appear to have shrunk between two censuses and to check whether the individual is dead or rotten. Potential issues are flagged in order to be checked against existing field notes and during the following census. Thus, as mentioned previously, repeat censuses provide more opportunities to improve data quality as compared to single-census plots.

  5. Within-network collaboration. Data quality is further enhanced through the exchange of ideas between experts at different sites and between nations, through the use of common data analysis protocols (i.e., allometric equations, R packages, etc.), and by promoting shared publications.

  6. Cross-network collaboration. In the FOS, by applying a uniform R script for data aggregation and AGB estimation, potential biases from using different height-diameter, wood density and allometric relations are strongly reduced.
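The error-detection rule described in the fourth item of the list above (flagging trees whose diameter declined by more than 5 mm between censuses) can be sketched as follows. This is an illustration under stated assumptions rather than any network's actual quality-control code: the function name, the dictionary layout (tree ID mapped to DBH in cm) and the example values are hypothetical.

```python
def flag_shrinking_trees(prev_census, curr_census, max_decline_mm=5.0):
    """Return (tree_id, decline_mm) for trees whose DBH fell by more than the threshold.

    Both censuses map a tree ID to its DBH in cm. Trees missing from the
    current census (e.g., dead or not re-found) are not compared.
    """
    flagged = []
    for tree_id, prev_dbh_cm in prev_census.items():
        curr_dbh_cm = curr_census.get(tree_id)
        if curr_dbh_cm is None:
            continue
        decline_mm = (prev_dbh_cm - curr_dbh_cm) * 10.0  # cm to mm
        if decline_mm > max_decline_mm:
            flagged.append((tree_id, round(decline_mm, 1)))
    return flagged

# Made-up census data: T2 shrinks by 11 mm, T3 by only 2 mm, T4 was not re-measured.
previous = {"T1": 25.0, "T2": 31.2, "T3": 12.0, "T4": 18.5}
current = {"T1": 25.4, "T2": 30.1, "T3": 11.8}
flagged = flag_shrinking_trees(previous, current)
```

Flagged records are candidates for checking against field notes and during the next census, not automatic corrections.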

The distribution of FOS plots by continent is presented in Table 1. Africa, Europe and South America are represented by similar numbers of locations (i.e., 62–80 plots) and contribute more than 80% of the plots at the time of publication, but in terms of coverage, South America alone comprises 49% of the forest area covered.

Table 1 Distribution of records by continent (as of December 2018).

The IIASA network provides the highest number of plot locations to FOS (Table 2), while the TmFO network contributes the most in terms of areal coverage.

Table 2 The distribution of records by participating networks (as of December 2018).

The range of values of major forest parameters represented in the FOS database is shown in Table 3. The maximum AGB value (918 Mg ha−1) and canopy height (41.7 m) at a 0.25 ha sub-plot were recorded in Lopé, Gabon. Some savannah sub-plots (e.g., in Gabon) have few or no trees >5 cm DBH, which leads to low or zero biomass estimates. The tallest tree (60.1 m) was found in Costa Rica and the maximum basal area (85.6 m2 ha−1) was found in the Caucasus, Russia.

Table 3 The range of major forest parameters in the FOS database (as of December 2018).

Table 4 contains information about the AGB for different biomes and globally. As expected, the average AGB increases from boreal to temperate and then from temperate to tropical forests.

Table 4 The distribution of aboveground biomass data (Mg ha−1) by biome in the FOS database (as of December 2018).

Usage Notes

This data set will be essential for validating and calibrating satellite observations and forest biometric models. The focus is to provide ground support for current and planned space-borne missions, such as NASA GEDI, NASA-ISRO NISAR, JAXA ALOS PALSAR and ESA BIOMASS, which are aimed at retrieving forest structure parameters such as forest height and biomass.

At this stage, we are making no claims regarding the statistical robustness of the FOS data set for global or regional biomass estimations. Instead, our aim is to present uniformly processed data on forest biomass from available locations (see Table 1). One of the main goals of the FOS is to highlight gaps in the observations.

Using sub-plot data for the validation of RS data might lead to spatial autocorrelation problems. Possible solutions are to use a plot average, to use only a single value per plot, or to test for the presence of spatial autocorrelation.
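The first of these options, a plot average, can be sketched as an area-weighted mean of the sub-plot values. The snippet assumes records keyed by the Plot_ID, Plot_area and AGB_local fields described under Data Records; the function name and the example values are hypothetical.

```python
from collections import defaultdict

def plot_level_agb(subplots, agb_field="AGB_local"):
    """Area-weighted mean AGB (Mg ha-1) per plot from its sub-plot records."""
    sums = defaultdict(lambda: [0.0, 0.0])  # Plot_ID -> [sum(AGB * area), sum(area)]
    for sp in subplots:
        acc = sums[sp["Plot_ID"]]
        acc[0] += sp[agb_field] * sp["Plot_area"]
        acc[1] += sp["Plot_area"]
    return {plot_id: weighted / area for plot_id, (weighted, area) in sums.items()}

# Made-up sub-plot records for illustration only.
subplots = [
    {"Plot_ID": "P1", "Plot_area": 0.25, "AGB_local": 200.0},
    {"Plot_ID": "P1", "Plot_area": 0.25, "AGB_local": 300.0},
    {"Plot_ID": "P2", "Plot_area": 1.0, "AGB_local": 150.0},
]
plot_means = plot_level_agb(subplots)
```

Averaging reduces each plot to one observation, which removes within-plot dependence at the cost of a coarser spatial support than the 0.25 ha sub-plots.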

This data package contains geographical coordinates rounded to two decimal places (a positional uncertainty of up to about 1 km at the equator). The most up-to-date extended data set with accurate geolocation is available in the FOS portal.
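The order of magnitude of this rounding effect can be checked with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the function name is hypothetical. Rounding to two decimals displaces a coordinate by at most 0.005° along each axis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth

def rounding_error_km(lat_deg, decimals=2):
    """Maximum displacement (north, east, combined) in km from rounding coordinates."""
    half_step_deg = 0.5 * 10 ** (-decimals)                # 0.005 deg for 2 decimals
    km_per_deg_lat = math.pi * EARTH_RADIUS_KM / 180.0     # about 111 km per degree
    err_north = half_step_deg * km_per_deg_lat
    err_east = err_north * math.cos(math.radians(lat_deg))  # meridians converge poleward
    return err_north, err_east, math.hypot(err_north, err_east)

north_km, east_km, combined_km = rounding_error_km(0.0)
```

At the equator this gives roughly 0.56 km along each axis and about 0.8 km combined, while the 0.01° grid step itself corresponds to about 1.1 km; the east-west component shrinks at higher latitudes.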

The FOS initiative depends on the contributions of high-quality forest plot data from participating networks. The fair use of the data presented here requires respecting the efforts and rights of the partners and supporting the long-term future of these observational efforts. The data set will be licensed under a Creative Commons Attribution 4.0 International License (CC-BY 4.0), which means that it will be fully open even for commercial use but requires acknowledgment of the PIs and plot owners. We would also appreciate it if all users of the FOS data would share their own data via the FOS and/or commit to collaboratively funding new censuses and the expansion of existing plot networks.