A dataset of forest biomass structure for Eurasia

The most comprehensive dataset of in situ destructive sampling measurements of forest biomass in Eurasia has been compiled from a combination of experiments undertaken by the authors and from scientific publications. Biomass is reported as four components: live trees (stem, bark, branches, foliage, roots); understory (above- and belowground); green forest floor (above- and belowground); and coarse woody debris (snags, logs, dead branches of living trees and dead roots). The dataset consists of 10,351 unique records of sample plots and 9,613 sample trees from ca 1,200 experiments for the period 1930–2014; the plot and tree records overlap only partially. The dataset also contains other forest stand parameters, such as tree species composition, average age, tree height and growing stock volume, when available. Such a dataset can be used for the development of models of biomass structure and biomass expansion factors, change detection in biomass structure, investigations into biodiversity, species distribution and the biodiversity-productivity relationship, and the assessment of the carbon pool and its dynamics, among many other applications.


Background & Summary
Biomass is an important indicator of terrestrial vegetation and, as such, is recognised as an Essential Climate Variable (ref. 1) and an Essential Biodiversity Variable (ref. 2). The links between biodiversity, tree species distribution and biomass (ref. 3), as well as the biodiversity-productivity relationship (ref. 4), are well recognised. Moreover, biomass is mentioned in six of the seventeen UN Sustainable Development Goals (ref. 5). Remote sensing is one of the most common approaches to estimating forest biomass and its dynamics over large areas. This includes measurements of canopy cover, vegetation status from different indices, canopy height and forest structure (refs 6,7). However, no remote method can directly measure biomass density or the partitioning of biomass by component; these can only be obtained from ground measurements. This is why field measurements are so crucial: they are the most accurate way to determine biomass structure, and they are needed to calibrate remote sensing instruments, model the carbon cycle and assess forest productivity, among other uses. Yet the sharing of biomass measurements has traditionally been highly problematic. Most researchers prefer to keep the raw data confidential and publish only aggregated results or a limited number of the measured parameters. Several reasons explain this situation. First, the destructive sampling method (DSM) for measuring biomass on sample plots is very labour-intensive, and the considerable investment required over time does not incentivise researchers to share the data. Secondly, in some cases, agreements between researchers and plot owners have tended to restrict use of the data to individual research projects. Finally, many experiments and measurements were undertaken in the pre-internet era and may not have been published in English. Preserved in paper format in different countries around the world, these measurements have therefore not been readily accessible to the scientific community.
To help remedy this situation, we have compiled the most comprehensive dataset of in situ forest biomass measurements in Eurasia obtained with the DSM. The dataset combines experiments undertaken by the authors with data from scientific publications, and every record is accompanied by a reference. It consists of 10,351 sample plots and 9,613 sample trees (Fig. 1). The plot and tree datasets are not completely linked, but they partially overlap: 6,280 trees are associated with 791 plots, while all other plots have no associated trees and all other trees have no associated plot. The plot-level dataset contains forest biomass structure per hectare, including live trees (stem, bark, branches, foliage, roots), understory (above- and belowground), green forest floor (above- and belowground) and coarse woody debris (snags, logs, dead branches of living trees and dead roots). Because the quantities were compiled from diverse studies, some fractions (e.g., stem wood, foliage) are better represented than others (e.g., roots, green forest floor); we report only those fractions for which actual measurements were made. In addition to biomass, we have recorded a number of other forest stand parameters where available, including tree species composition, average age, tree height and growing stock volume. The tree-level dataset contains a description of each sample tree, its size and its biomass fractions (see the Methods section for more details).
The data presented here have been partly published before (refs 8–14), but never in a comprehensive, open-access electronic format that includes the full set of parameters. We have combined existing forest biomass datasets, removing duplicate records and merging complementary parameters to create a single fused product.
The dataset is complementary to existing datasets (e.g., refs 4,15), with little or no overlap between them. It can be used for the development of models of biomass structure, allometric equations and biomass expansion factors (BEF), change detection of biomass structure, investigations into biodiversity, species distribution and the biodiversity-productivity relationship, and the assessment of the carbon pool and its dynamics, among other applications.

Methods
All the data presented here were collected with the DSM. The basic prerequisite of the method is a statistically sound sampling procedure. Sample plots should be representative of the selected forest unit and include 200–300 trees. Within each sample plot, the diameter at breast height (DBH, usually measured at 1.3 m, or at 4.5 ft in the US convention) is measured for every tree. Tree height is measured for 12–15 trees per species, selected in proportion to the number of trees in each diameter class, in order to develop height-diameter regression relationships. These and other measurements allow the estimation of basic biometric (mensuration) characteristics of the stand, such as tree species composition, age, average diameter and height, and growing stock volume. To assess live biomass, a number of trees are selected, cut and measured as follows:
• The sample trees selected for destructive measurement (typically 5–15 per sample plot) should represent all tree species and the full range of tree diameters on the sample plot.
• Trees are cut and measured to estimate taper, age, volume, increment and other biometric characteristics as accurately as possible.
• The wood and bark of every sample tree are sampled 5–10 times at different heights (usually as cross-sections 2–3 cm thick).
• The crown of each sample tree is sampled to represent all parts from the bottom to the top, including the full range of branch sizes, for further analysis (separation of foliage and drying) and for weighing in both the fresh and oven-dry states.
• Leaf area index is calculated from the area-to-mass ratio of a foliage sample, upscaled to 1 ha using foliage biomass.
• To measure root biomass, soil is sampled at different distances from the stem and at different depths (ref. 10). The samples are washed with water to extract the roots, which are separated into dead and live, tree and grass, and by size. However, most field studies omit belowground investigations because this work is extremely labour-intensive.
• The understory is accounted for in sub-plots, usually 2 × 2 m, distributed regularly over the plot. If the understory is unevenly distributed, mapping of the canopy gaps is recommended, with separate understory accounting in the gaps and under the canopy. The number of plants by species and height is recorded, and average representatives of each species and height class are harvested for separation by biomass fraction, drying and weighing.
• The green forest floor is described and sampled for subsequent analysis in 1 × 1 m sub-plots.
• Coarse woody debris is accounted for by type (logs and snags), size (length and diameter) and stage of decomposition, and is sampled accordingly.
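The procedure above does not fix the regression form linking height to diameter, nor how sample-tree biomass is upscaled to the plot. A minimal Python sketch, assuming a power-law model y = a·DBH^b fitted by least squares on log-transformed data (one common choice among several, not necessarily the form used in the original studies), could look like the following. The function names `fit_power_law`, `predict_heights` and `stand_biomass_t_ha` are hypothetical, and the upscaling step assumes that a single DBH-based allometry is applied to every tallied tree on the plot.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by ordinary least squares on ln(y) = ln(a) + b * ln(x).

    Returns (a, b). No correction for log-transformation bias is applied.
    Requires at least two distinct x values.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_heights(tally_dbh_cm, measured_dbh_cm, measured_height_m):
    """Fit a height-diameter curve to the measured trees and apply it to every tallied tree."""
    a, b = fit_power_law(measured_dbh_cm, measured_height_m)
    return [a * d ** b for d in tally_dbh_cm]


def stand_biomass_t_ha(tally_dbh_cm, sample_dbh_cm, sample_dry_kg, plot_area_m2):
    """Upscale sample-tree dry biomass (kg) to the stand (t/ha) via a DBH allometry (assumed)."""
    a, b = fit_power_law(sample_dbh_cm, sample_dry_kg)
    total_kg = sum(a * d ** b for d in tally_dbh_cm)
    # kg per plot -> kg per m2 -> kg per ha (x 10,000) -> t per ha (/ 1,000)
    return total_kg / plot_area_m2 * 10_000 / 1_000
```

In practice, separate equations would usually be fitted per species and per biomass fraction, and a correction for log-transformation bias is often added; both are omitted here for brevity.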
The samples are delivered to laboratories, oven-dried and weighed, and the results are recorded as mass of dry matter. The methods outlined above are described in detail in a number of publications (e.g., refs 10,16,17). The data collected with the DSM can be found in Biomass_plot_DB.xlsx (plot data, Data Citation 1) and Biomass_tree_DB.xlsx (tree data, Data Citation 2).
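The laboratory and sub-plot measurements are converted to per-hectare dry-matter values by simple ratios. The sketch below assumes that a field fresh mass is scaled by the dry/fresh ratio of its laboratory sample, that sub-plot means are expanded by area, that canopy-gap and under-canopy strata are combined by area weighting, and that the foliage sample area is measured one-sided so that the result is a one-sided leaf area index. All function names are hypothetical.

```python
def dry_mass(fresh_mass, sample_fresh, sample_dry):
    """Scale a field fresh mass by the dry/fresh ratio of its oven-dried lab sample."""
    return fresh_mass * sample_dry / sample_fresh


def subplot_mean_t_ha(dry_g_per_subplot, subplot_area_m2):
    """Mean dry mass per sub-plot (g) expanded to t/ha (e.g. 2 x 2 m = 4 m2 sub-plots)."""
    mean_g = sum(dry_g_per_subplot) / len(dry_g_per_subplot)
    # g per m2 -> g per ha (x 10,000) -> t per ha (/ 1,000,000)
    return mean_g / subplot_area_m2 * 10_000 / 1_000_000


def stratified_t_ha(gap_t_ha, canopy_t_ha, gap_fraction):
    """Area-weighted estimate when canopy gaps and closed canopy are sampled separately."""
    return gap_fraction * gap_t_ha + (1.0 - gap_fraction) * canopy_t_ha


def leaf_area_index(sample_leaf_area_m2, sample_dry_kg, foliage_t_ha):
    """One-sided LAI (m2 leaf per m2 ground) from a foliage sample's area-to-mass ratio."""
    area_per_kg = sample_leaf_area_m2 / sample_dry_kg   # m2 of leaf per kg dry foliage
    foliage_kg_m2 = foliage_t_ha * 1_000 / 10_000       # t/ha -> kg/m2
    return area_per_kg * foliage_kg_m2


# Illustrative numbers only, not values from the dataset
understory = stratified_t_ha(
    gap_t_ha=subplot_mean_t_ha([310.0, 260.0, 420.0], 4.0),
    canopy_t_ha=subplot_mean_t_ha([90.0, 120.0, 60.0], 4.0),
    gap_fraction=0.25,
)
```

The same expansion applies to the 1 × 1 m green forest floor sub-plots by passing a sub-plot area of 1.0.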

Data Records
A list containing the fields and summary statistics is presented for the sample plot (

Technical Validation
The dataset represents a range of countries (Table 3), biomes (Table 4) and tree species (Table 5). The best-represented countries are Russia, Ukraine, China and Kazakhstan (Table 3).
The DSM remains the most labour-intensive and the most precise method of assessing forest biomass. The accuracy of the method and, consequently, the reliability of the presented biomass data depend on the number of sample trees. The error of the method has been estimated in several studies (refs 10,18–20) that drew sub-samples from comprehensive datasets (e.g., a complete harvest of all trees on a sample plot) to investigate how accuracy changes with sample size. The results show that accuracy varies with the biomass component: stem biomass is the most reliable estimate (92–94%), while crown (80–90%) and belowground (70–80%) biomass estimates are the least reliable. We have validated the data by checking their consistency with the expected ranges of these parameters. The distribution of forest biomass by major biome (Fig. 3) shows reasonable variation with climatic conditions.
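The sub-sampling experiments described above can be mimicked with a small simulation. This is a sketch under stated assumptions: the "fully harvested plot" is synthetic (generated from an assumed power-law allometry with lognormal noise), sample trees are drawn by simple random sampling rather than by diameter class, and accuracy is taken as 100% minus the mean absolute relative error of the plot total. The functions are hypothetical and do not reproduce the results of the cited studies.

```python
import math
import random


def fit_power_law(x, y):
    """OLS fit of ln(y) = ln(a) + b*ln(x); returns (a, b). Needs >= 2 distinct x."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    return math.exp(mean_y - b * mean_x), b


def synthetic_harvest(n_trees=250, seed=42):
    """Synthetic fully harvested plot: DBH (cm) and dry biomass (kg).

    Assumed allometry 0.1 * DBH**2.4 with lognormal noise (sd 0.2 on the log scale).
    """
    rng = random.Random(seed)
    dbh = [rng.uniform(8.0, 40.0) for _ in range(n_trees)]
    biomass = [0.1 * d ** 2.4 * math.exp(rng.gauss(0.0, 0.2)) for d in dbh]
    return dbh, biomass


def subsample_accuracy(dbh, biomass, n_sample, n_rep=200, seed=0):
    """Mean accuracy (%) of the plot total estimated from n_sample randomly chosen trees."""
    rng = random.Random(seed)
    true_total = sum(biomass)
    rel_errors = []
    for _ in range(n_rep):
        idx = rng.sample(range(len(dbh)), n_sample)
        a, b = fit_power_law([dbh[i] for i in idx], [biomass[i] for i in idx])
        estimate = sum(a * d ** b for d in dbh)
        rel_errors.append(abs(estimate - true_total) / true_total)
    return 100.0 * (1.0 - sum(rel_errors) / n_rep)


dbh, biomass = synthetic_harvest()
for n in (5, 10, 20, 50):
    print(n, round(subsample_accuracy(dbh, biomass, n), 1))
```

Repeating this per biomass fraction, with noisier synthetic data for crowns and roots, would illustrate why those components are less reliable than stems.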
Relative indicators (especially BEF) are usually the most useful for validation. For example, wood density varies substantially with tree species (Fig. 4) and site index (Fig. 5), but stays within the expected range reported in a number of ecological publications (e.g., ref. 15). Figure 6 shows that the share of crown biomass depends strongly on stand age, as expected (e.g., refs 10,21).
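The relative indicators used for validation are simple ratios of the reported components. The sketch below assumes that BEF is defined as aboveground live biomass per unit growing stock volume (t/m³; other definitions, e.g. per biomass fraction or including roots, are also in use), crown share as branches plus foliage over aboveground biomass, belowground share as roots over total live tree biomass, and wood density as stem dry mass over stem volume. The `PlotBiomass` record and its field names are hypothetical and do not correspond to the dataset's column names.

```python
from dataclasses import dataclass


@dataclass
class PlotBiomass:
    """Hypothetical per-hectare plot record (field names are illustrative only)."""
    growing_stock_m3_ha: float
    stem_t_ha: float       # stem wood and bark
    branches_t_ha: float
    foliage_t_ha: float
    roots_t_ha: float

    @property
    def aboveground_t_ha(self) -> float:
        return self.stem_t_ha + self.branches_t_ha + self.foliage_t_ha


def bef(p: PlotBiomass) -> float:
    """Biomass expansion factor: aboveground live biomass per m3 of growing stock (t/m3)."""
    return p.aboveground_t_ha / p.growing_stock_m3_ha


def crown_share(p: PlotBiomass) -> float:
    """Share of branches and foliage in aboveground biomass."""
    return (p.branches_t_ha + p.foliage_t_ha) / p.aboveground_t_ha


def belowground_share(p: PlotBiomass) -> float:
    """Share of roots in total live tree biomass."""
    return p.roots_t_ha / (p.aboveground_t_ha + p.roots_t_ha)


def wood_density_kg_m3(stem_dry_kg: float, stem_volume_m3: float) -> float:
    """Sample-tree wood density as stem dry mass over stem volume (kg/m3)."""
    return stem_dry_kg / stem_volume_m3


# Illustrative numbers only
plot = PlotBiomass(growing_stock_m3_ha=200.0, stem_t_ha=80.0, branches_t_ha=12.0,
                   foliage_t_ha=8.0, roots_t_ha=25.0)
```

Plotting such ratios against stand age, site index or biome is one way to reproduce the kind of consistency checks shown in the figures.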
The distribution of belowground live biomass is shown in Fig. 7. A larger belowground share is typically observed in low-biomass forests and/or under harsh site conditions. Some common relationships among the sample tree parameters are presented in Fig. 8; outliers can be explained by the individual characteristics of the tree species and by the climatic gradient.
In terms of geographic and parametric coverage, the data satisfactorily represent the forests of the major forest-forming species of Eurasia. Outliers (i.e., values outside the mean ± 3 s.d.) are negligible; where present, they can usually be explained by age, site or climatic conditions, or by tree species. Overall, the data are consistent with reported ranges of national and zonal aggregations and regulations (e.g., yield tables, ref. 22).
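The ±3 s.d. criterion can be applied to the whole dataset or within groups such as tree species. A minimal sketch, assuming the sample standard deviation and a hypothetical record layout (dictionaries with a group key and a value key), is shown below.

```python
import statistics


def flag_outliers(values, k=3.0):
    """Return indices of values outside mean +/- k sample standard deviations."""
    if len(values) < 2:
        return []
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) > k * sd]


def flag_outliers_by_group(records, group_key, value_key, k=3.0):
    """Apply flag_outliers within each group (e.g. tree species) and return flagged records."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[group_key], []).append(rec)
    flagged = []
    for recs in groups.values():
        vals = [r[value_key] for r in recs]
        flagged.extend(recs[i] for i in flag_outliers(vals, k))
    return flagged
```

Grouping matters because a value that is extreme for one species can be typical for another; flagged records are candidates for inspection rather than automatic removal.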

Usage Notes
The data are stored in Excel .xlsx format. The sheets 'Plot_db' and 'Tree_db' contain the data records. The sheet 'Species' lists the tree species codes with their English and Latin names. The sheet 'References' contains a reference for every individual data record, and the sheet 'Field description' describes the dataset fields and their units.
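A minimal sketch for loading and linking the records with pandas follows. Reading .xlsx files with `pandas.read_excel` requires the openpyxl package. The sheet names come from this section, but the key columns `plot_id` and `species_code` are assumptions; the actual field names should be checked in the 'Field description' sheet.

```python
import pandas as pd


def load_sheet(path, sheet):
    """Read one sheet of an .xlsx file (requires the openpyxl package)."""
    return pd.read_excel(path, sheet_name=sheet)


def link_trees_to_plots(trees, plots, key="plot_id"):
    """Inner join: keep only sample trees whose plot is present in the plot table.

    `key` is a hypothetical column name; check the 'Field description' sheet.
    """
    return trees.merge(plots, on=key, how="inner", suffixes=("_tree", "_plot"))


def add_species_names(df, species, key="species_code"):
    """Attach English and Latin names from the 'Species' sheet (hypothetical key column)."""
    return df.merge(species, on=key, how="left", validate="many_to_one")


def summarize_link(linked, key="plot_id"):
    """Number of linked trees and of distinct plots they belong to."""
    return len(linked), linked[key].nunique()


# Example, assuming each file holds the sheet of the matching name:
# plots = load_sheet("Biomass_plot_DB.xlsx", "Plot_db")
# trees = load_sheet("Biomass_tree_DB.xlsx", "Tree_db")
# n_trees, n_plots = summarize_link(link_trees_to_plots(trees, plots))
```

Because only part of the tree records are associated with plots, the inner join deliberately drops unlinked trees; a left join would keep them with empty plot attributes instead.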