Harmonized global maps of above and belowground biomass carbon density in the year 2010

Remotely sensed biomass carbon density maps are widely used for myriad scientific and policy applications, but all remain limited in scope. They often only represent a single vegetation type and rarely account for carbon stocks in belowground biomass. To date, no global product integrates these disparate estimates into an all-encompassing map at a scale appropriate for many modelling or decision-making applications. We developed an approach for harmonizing vegetation-specific maps of both above and belowground biomass into a single, comprehensive representation of each. We overlaid input maps and allocated their estimates in proportion to the relative spatial extent of each vegetation type using ancillary maps of percent tree cover and landcover, and a rule-based decision schema. The resulting maps consistently and seamlessly report biomass carbon density estimates across a wide range of vegetation types in 2010 with quantified uncertainty. They do so for the globe at an unprecedented 300-meter spatial resolution and can be used to more holistically account for diverse vegetation carbon stocks in global analyses and greenhouse gas inventories.


Background & Summary
Terrestrial ecosystems store vast quantities of carbon (C) in aboveground and belowground biomass 1 . At any point in time, these stocks represent a dynamic balance between the C gains of growth and C losses from death, decay and combustion. Maps of biomass are routinely used for benchmarking biophysical models [2][3][4] , estimating C cycle effects of disturbance [5][6][7] , and assessing biogeographical patterns and ecosystem services [8][9][10][11] . They are also critical for assessing climate change drivers, impacts, and solutions, and factor prominently in policies like Reducing Emissions from Deforestation and Forest Degradation (REDD+) and C offset schemes [12][13][14] . Numerous methods have been used to map biomass C stocks but their derivatives often remain limited in either scope or extent 12,15 . There thus remains a critical need for a globally harmonized, integrative map that comprehensively reports biomass C across a wide range of vegetation types.
Most existing maps of aboveground biomass (AGB) and the carbon it contains (AGBC) are produced from statistical or data-driven methods relating field-measured or field-estimated biomass densities and spaceborne optical and/or radar imagery 12,15,16 . They largely focus on the AGB of trees, particularly those in tropical landscapes where forests store the majority of the region's biotic C in aboveground plant matter. Land cover maps are often used to isolate forests from other landcover types where the predictive model may not be appropriate such that forest AGB maps intentionally omit AGB stocks in non-forest vegetation like shrublands, grasslands, and croplands, as well as the AGB of trees located within the mapped extent of these excluded landcovers 17 . Non-forest AGB has also been mapped to some extent using similar approaches but these maps are also routinely masked to the geographic extent of their focal landcover [18][19][20][21] . To date, there has been no rigorous attempt to harmonize and integrate these landcover-specific, remotely sensed products into a single comprehensive and temporally consistent map of C in all living biomass.
Maps of belowground biomass (BGB) and carbon density (BGBC) are far less common than those of AGB because BGB cannot be readily observed from space or airborne sensors. Consequently, BGB is often inferred from taxa-, region-, and/or climate-specific "root-to-shoot" ratios that relate the quantity of BGB to that of AGB [22][23][24] . These ratios can be used to map BGB by spatially applying them to AGB estimates using maps of their respective strata 5 . In recent years, more sophisticated regression-based methods have been developed to predict www.nature.com/scientificdata www.nature.com/scientificdata/ of spaceborne optical and synthetic aperture radar (SAR) imagery and represents the year 2010 at a 100 m spatial resolution -making it the most contemporary global woody AGB currently available and the only such map available for that year. Moreover, GlobBiomass aims to minimize prediction uncertainty to less than 30% and a recent study suggests that it has high fidelity for fine-scale applications 31 .
The GlobBiomass product was produced by first mapping the growing stock volume (GSV; i.e. stem volume) of living trees, defined following Food and Agriculture Organization (FAO) guidelines 32 as those having a diameter at breast height (DBH) greater than 10 centimeters (cm). AGB density was then determined from GSV by applying spatialized biomass expansion factors (BEFs) and wood density estimates. These factors were mapped using machine learning methods trained from a suite of plant morphological databases that compile thousands of field measurements from around the globe 33 . The resulting AGB estimates represent biomass in the living structures (stems, branches, bark, twigs) of trees with a DBH greater than 10 cm. This definition may thereby overlook AGB of smaller trees and/or shrubs common to many global regions. Unlike other maps, though, the

Data source Description Use
Santoro et al. 30 (GlobBiomass) Global, remotely sensed map of woody AGB in living trees with DBH greater than 10 cm and masked to pixels containing Landsat-identified tree cover in 2010 34 . Native resolution of 100 m. Includes accompanying standard error of predictions layer.

Woody AGBC mapping
Bouvet et al. 35 Continental, remotely sensed map of woody AGB in living trees of any size in Africa. Unmasked  Berner et al. 18 Non-linear regression model relating tundra AGBC density to Landsat ETM derived NDVI. Previously used to map Alaskan Tundra. Tundra AGBC mapping MODIS Surface Reflectance (Daily) 46,47 Daily global NDVI composite images derived from MODIS Aqua and Terra surface reflectance images. Native resolution of 250 m. Accessed in Google Earth Engine 81 . Tundra AGBC mapping Steven et al. 48 Coefficients for calibrating MODIS to Landsat ETM NDVI values. Tundra AGBC mapping Monfreda et al. 20 Globally gridded yield maps for 70 annually harvested herbaceous commodity crops (Online-only  22 Mean and standard error of field-measured root-to-shoot ratios for natural landcover types, stratified by climatic zone.
Woody BGBC mapping Grassland BGBC mapping; Kottek et al. 43 Updated version of the Köppen-Gieger climate classification. www.nature.com/scientificdata www.nature.com/scientificdata/ GlobBiomass product employs a subpixel masking procedure that retains AGB estimates in 100 m grid cells in which any amount of tree cover was detected in finer resolution (30 m) imagery 34 . This unique procedure retains AGB estimates in tree-sparse regions like savannahs, grasslands, croplands, and agroforestry systems where AGB is often overlooked 17 , as well as in forest plantations. The GlobBiomass product is the only global map that also includes a dedicated uncertainty layer reporting the standard error of prediction. We used this layer to propagate uncertainty when converting AGB to AGBC density, modelling BGBC, and integrating with C density estimates of other vegetation types.
Bouvet et al. 35 -some of whom were also participants of the GlobBiomass project -independently produced a separate AGB density map for African savannahs, shrublands and dry woodlands circa 2010 at 25 m spatial resolution 35 (hereafter "Bouvet map"), which we included in our harmonized product to begin to address the GlobBiomass map's potential omission of small trees and shrubs that do not meet the FAO definition of woody AGB. This continental map of Africa is based on a predictive model that directly relates spaceborne L-band SAR imagery -an indirect measure of vegetation structure that is sensitive to low biomass densities 36 -with region-specific, field-measured AGB. Field measurements (n = 144 sites) were compiled from 7 different sampling campaigns -each specifically seeking training data for biomass remote sensing -that encompassed 8 different countries 35 . The resulting map is not constrained by the FAO tree definition and is masked to exclude grid cells in which predicted AGB exceeds 85 megagrams dry mater per hectare (Mg ha −1 ) -the threshold at which the SAR-biomass relationship saturates. To avoid extraneous prediction, it further excludes areas identified as "broadleaved evergreen closed-to-open forest", "flooded forests", "urban areas" and "water bodies" by the European Space Agency's Climate Change Initiative (CCI) Landcover 2010 map 37 and as "bare areas" in the Global Land Cover (GLC) 2000 map 38 . While the Bouvet map is not natively accompanied by an uncertainty layer, its authors provided us with an analytic expression of its uncertainty (SE; standard error of prediction) as a function of estimated AGB (Eq. 1) which we used to generate an uncertainty layer for subsequent error propagation.

Fig. 2
Difference between underlying woody aboveground biomass maps in Africa. Maps considered are the GlobBiomass 30 global map and the Bouvet 35 map of Africa. Both maps were aggregated to a 300 m spatial resolution and converted to C density prior to comparison using the same schema. The difference map was subsequently aggregated to a 3 km spatial resolution and reprojected for visualization. Negative values denote lower estimates by Bouvet et al. 35  We combined the GlobBiomass and Bouvet products to generate a single woody biomass map by first upscaling each map separately to a matching 300 m spatial resolution using an area-weighted average to aggregate grid cells, and then assigning the Bouvet estimate to all overlapping grid cells, except those identified by the CCI Landcover 2010 map as closed or flooded forest types (Online-only Table 1) which were not within the dryland domain of the Bouvet map. While more complex harmonization procedures based on various averaging techniques have been used by others 39,40 , their fidelity remains unclear since they fail to explicitly identify and reconcile the underlying source of the inputs' discrepancies 41 . We thus opted to use a more transparent ruled-based approach when combining these two woody biomass maps, which allows users to easily identify the source of a grid cell's woody biomass estimate. Given the local specificity of the training data used to produce the Bouvet map, we chose to prioritize its predictions over those of the GlobBiomass product when within its domain. In areas of overlap, the Bouvet map values tend to be lower in moist regions and higher in dryer regions (Fig. 2), though, where used, these differences rarely exceed ±25 megagrams C per hectare (MgC ha −1 ).
We then converted all woody AGB estimates to AGBC by mapping climate and phylogeny-specific biomass C concentrations from Martin et al. 42 . Climate zones were delineated by aggregating classes of the Köppen-Gieger classification 43 (Table 2) to match those of Martin et al. 42 . Phylogenetic classes (angiosperm, gymnosperm and mixed/ambiguous) were subsequently delineated within each of these zones using aggregated classes of the CCI Landcover 2010 map (Online-only Table 1). Martin et al. 42 only report values for angiosperms and gymnosperms so grid cells with a mixed or ambiguous phylogeny were assigned the average of the angiosperm and gymnosperm values and the standard error of this value was calculated from their pooled variance. Due to residual classification error in the aggregated phylogenetic classes, we weighted the phylogeny-specific C concentration within each climate zone by the binary probability of correctly mapping that phylogeny (i.e. user's accuracy) 44 using Eq. 2 where, within each climate zone, μ c is the mean probability-weighted C concentration of the most probable phylogeny, μ m is the mean C concentration of that phylogeny from Martin et al. 42 , p m is the user's accuracy of that phylogeny's classification (Table 3), and μ n and μ o are the mean C concentrations of the remain phylogenetic classes from Martin et al. 42 . Standard error estimates for these C concentrations were similarly weighted using summation in quadrature (Eq. 3) where σ c is the probability-weighted standard error of the most probable phylogeny's C concentration and σ m , σ n and σ o are the standard errors of the respective phylogeny-specific C concentrations from Martin et al. 42 .
Probability-weighted C concentrations used are reported in Table 4.  www.nature.com/scientificdata www.nature.com/scientificdata/ Mapped, probability-weighted C estimates were then arithmetically applied to AGB estimates. Uncertainty associated with this correction was propagated using summation in quadrature of the general form (Eq. 4) … , σ f is the uncertainty of μ f , and σ σ σ … , , , i j k , are the respective uncertainty estimates of the dependent parameters (standard error unless otherwise noted). Here, μ f , is the estimated AGBC of a given grid cell, and is the product of its woody AGB estimate, and its corresponding C concentration.
Tundra vegetation biomass. The tundra and portions of the boreal biome are characterized by sparse trees and dwarf woody shrubs as well as herbaceous cover that are not included in the GlobBiomass definition of biomass. AGB density of these classes has been collectively mapped by Berner et al. 18,45 for the North Slope of Alaska from annual Landsat imagery composites of the normalized difference vegetation index (NDVI) and a non-linear regression-based model trained from field measurements of peak AGB that were collected from the published literature (n = 28 sites). Berner et al. 18 note that while these field measurements did not constitute a random or systematic sample, they did encompass a broad range of tundra plant communities. In the absence of a global map and due the sparsity of high quality Landsat imagery at high latitudes, we extended this model to the pan-Arctic and circumboreal regions using NDVI composites created from daily 250 m MODIS Aqua and Terra surface reflectance images 46,47 that were cloud masked and numerically calibrated to Landsat ETM reflectance -upon which the tundra model is based -using globally derived conversion coefficients 48 . We generated six separate 80 th percentile NDVI composites circa 2010 -one for each of the MODIS missions (Aqua and Terra) in 2009, 2010 and 2011 -following Berner et al. 18 . We chose to use three years of imagery (circa 2010) rather than just one (2010) to account for the potential influence that cloud masking may exert upon estimates of the 80 th NDVI percentile in a single year. We then applied the tundra AGB model to each composite, converted AGB estimates to AGBC by assuming a biomass C fraction of 49.2% (SE = 0.8%) 42 and generated error layers for each composite from the reported errors of the AGB regression coefficients and the biomass C conversion factor using summation in quadrature as generally described above (Eq. 4). A single composite of tundra AGBC circa 2010 was then created as the pixelwise mean of all six composites. We also generated a complementary uncertainty layer representing the cumulative standard error of prediction, calculated as the pixelwise root mean of the squared error images in accordance with summation in quadrature. Both maps were upscaled from their native 250 m spatial resolution to a 300 m spatial resolution using an area weighted aggregation procedure, whereby pixels of the 300 m biomass layer was calculated as the area weighted average of contained 250 m grid cells, and the uncertainty layer was calculated -using summation in quadrature -as the root area-weighted average of the contained grid cells squared.
Grassland biomass. Grassland AGBC density was modelled directly from maximum annual NDVI composites using a non-linear regression-based model developed by Xia et al. 19 for mapping at the global scale. This model was trained by relating maximum annual NDVI as measured by the spaceborne Advanced Very High-Resolution Radiometer (AVHRR) sensor to globally distributed field measurements of grassland AGBC that were compiled from the published literature (81 sites for a total of 158 site-years). Like the tundra biomass training data, these samples did not constitute a random or systematic sample but do encompass a comprehensive range of global grassland communities. Given the inevitable co-occurrence of trees in the AVHRR sensor's 8 km resolution pixels upon which the model is trained, it's predictions of grassland AGBC are relatively insensitive to the effects of co-occurring tree cover. We thereby assume that its predictions for grid cells containing partial tree cover represent the expected herbaceous AGBC density in the absence of those trees. Maximum model predicted AGBC (NDVI = 1) is 2.3 MgC ha −1 which is comparable to the upper quartile of herbaceous AGBC estimates from global grasslands 49 and suggests that our assumption will not lead to an exaggerated estimation. For partially wooded grid cells, we used modelled grassland AGBC density to represent that associated with the herbaceous fraction of the grid cell in a manner similar to Zomer et al. 17

as described below (See "Harmonizing Biomass Carbon Maps").
We applied the grassland AGBC model to all grid cells of maximum annual NDVI composites produced from finer resolution 16-day (250 m) MODIS NDVI imagery composites circa 2010 50,51 . Here again, three years of imagery were used to account for potential idiosyncrasies in a single year's NDVI composites resulting from annual data availability and quality. As with AGB of tundra vegetation, annual composites (2009-2011) were constructed separately from cloud-masked imagery collected by both MODIS missions (Aqua and Terra; n = 6) and then numerically calibrated to AVHRR reflectance using globally derived conversion coefficients specific to areas of herbaceous cover 52 . We then applied the AGBC model to each of these composites and estimated error for each composite from both the AVHRR calibration (standard deviation approximated from the 95% confidence interval of the calibration scalar) and the AGBC model (relative RMSE) using summation in quadrature. A single map of grassland AGBC circa 2010 was then created as the pixelwise mean of all six composites and an associated error layer was created as the pixelwise root mean of the squared error images. Both maps were aggregated from their original 250 m resolution to 300 m to facilitate harmonization using the area-weighted procedure described previously for woody and tundra vegetation (see section 1.2).
Cropland biomass. Prior to harvest, cropland biomass can also represent a sizable terrestrial C stock. In annually harvested cropping systems, the maximum standing biomass of these crops can be inferred from annual net primary productivity (ANPP). While spaceborne ANPP products exist, they generally perform poorly in www.nature.com/scientificdata www.nature.com/scientificdata/ croplands 53,54 . Instead, cropland ANPP is more commonly derived from crop yields 20,21,53 . We used globally gridded, crop-specific yields of 70 annually harvested herbaceous commodity crops circa 2000 by Monfreda et al. 20 the only year in which these data were available. These maps were produced by spatially disaggregating crop-yield statistics for thousands of globally distributed administrative units throughout the full extent of a satellite-based cropland map 20 . These maps were combined with crop-specific parameters (Online-only Table 2) to globally map AGBC as aboveground ANPP for each crop following the method of Wolf et al. 21 . This method can be simplified as (Eq. 5)   43 was used to stratify grassland root-to-shoot ratios from Mokany et al. 22 as described in Table 5 and biomass carbon concentrations by Martin et al. 42 as described in Table 4.  www.nature.com/scientificdata www.nature.com/scientificdata/ where y is the crop's yield (Mg ha −1 ), ω is the dry matter fraction of its harvested biomass, h is its harvest index (fraction of total AGB collected at harvest) and c is the carbon content fraction of its harvested dry mass. This simplification assumes, following Wolf et al. 21 , that 2.5% of all harvested biomass is lost between the field and farmgate and that unharvested residue and root mass is 44% C.
Total cropland AGBC density was then calculated as the harvested-area-weighted average of all crop-specific AGBC estimates within a given grid cell. Since multiple harvests in a single year can confound inference of maximum AGBC from ANPP, we further determined the harvest frequency (f) of each grid cell by dividing a cell's total harvested area (sum of the harvested area of each crop reported within a given grid cell) by its absolute cropland extent as reported in a complementary map by Ramankutty et al. 55 . If f was greater than one, multiple harvests were assumed to have occurred and AGBC was divided by f to ensure that AGBC estimates did not exceed the maximum standing biomass density.
Since the yields of many crops and, by association, their biomass have changed considerably since 2000 56,57 , we calibrated our circa 2000 AGBC estimates to the year 2010 using local rates of annual ANPP change (MgC ha −1 yr −1 ) derived as the Theil-Sen slope estimator -a non-parametric estimator that is relatively insensitive to outliers -of the full MODIS Terra ANPP timeseries (2000-2015) 58 . Total ANPP change between 2000 and 2010 for each grid cell was calculated as ten times this annual rate of change. Since MODIS ANPP represents C gains in both AGB and BGB, we proportionately allocated aboveground ANPP to AGBC using the total root-to-shoot ratio derived from the circa 2000 total crop AGBC and BGBC maps (described below). Since error estimates were not available for the yield maps or the crop-specific parameters used to generate the circa 2000 AGBC map, estimated error of the circa 2010 crop AGBC map was exclusively based on that of the 2000-2010 correction. The error of this correction was calculated as the pixel-wise standard deviation of bootstrapped simulations (n = 1000) in which a random subset of years was omitted from the slope estimator in each iteration. The 8 km resolution circa 2000 AGBC map and error layer were resampled to 1 km to match the resolution of MODIS ANPP using the bilinear method prior to ANPP correction and then further resampled to 300 m to facilitate harmonization.
Woody crops like fruit, nut, and palm oil plantations were not captured using the procedure just described and their biomass was instead assumed to be captured by the previously described woody biomass products which retained biomass estimates in all pixels where any amount of tree cover was detected at the sub-pixel level (see section 1.1).  Table 4. Climate and phylogeny specific biomass C fractions used to convert biomass density estimates to carbon density. C fractions were taken from Martin et al. 42 and weighted by the aggregated probability of correct phylogenetic classification (i.e. user's accuracy) from Table 3. Climate zones are spatially defined in Table 2.  Table 5. Root-to-shoot ratios used to map BGB of select landcover types. Root-to-shoot ratios and their standard errors were taken from Mokany et al. 22 . Grassland stratification classes correspond with those reported as "Mokany Grassland Class" in Table 2.
www.nature.com/scientificdata www.nature.com/scientificdata/ Belowground biomass carbon maps. Matching maps of BGBC and associated uncertainty were subsequently produced for each of the landcover-specific AGBC maps using published empirical relationships.
With the exception of savannah and shrubland areas, woody BGBC was modelled from AGBC using a multiple regression model by Reich et al. 25 that considers the phylogeny, mean annual temperature (MAT), and regenerative origin of each wooded grid cell and that was applied spatially using maps of each covariate in a fashion similar to other studies 5,27 . Tree phylogeny (angiosperm or gymnosperm) was determined from aggregated classes of the CCI Landcover 2010 map 37 (Online-only Table 1) with phylogenetically mixed or ambiguous classes assumed to be composed of 50% of each. MAT was taken from version 2 of the WorldClim bioclimatic variables dataset  at 1 km resolution 59 and resampled to 300 m using the bilinear method. Since there is not a single global data product mapping forest management, we determined tree origin -whether naturally propagated or planted -by combining multiple data sources. These data included (i) a global map of "Intact Forest Landscapes" (IFL) in the year 2013 60 (a conservative proxy of primary, naturally regenerating forests defined as large contiguous areas with minimal human impact), (ii) a Spatial Database of Planted Trees (SDPT) with partial global coverage 61 , (iii) national statistics reported by the FAO Global Forest Resources Assessment (FRA) on the extent of both naturally regenerating and planted forests and woodlands within each country in the year 2010 62 , and (iv) national statistics reported by the FAOSTAT database (http://www.fao.org/faostat) on the planted area of plantation crops in 2010. Within each country, we assumed that the total area of natural and planted trees was equal to the corresponding FRA estimates. If the FAOSTAT-reported area of tree crops exceeded FRA-reported planted area, the difference was added to FRA planted total. All areas mapped as IFL were assumed to be of natural origin and BGB was modelled as such. Likewise, besides the exceptions noted below, all tree plantations mapped by the SDPT were assumed to be of planted origin. In countries where the extent of the IFL or SDPT maps fell short of the FRA/FAOSTAT reported areas of natural or planted forests, respectively, we estimated BGBC in the remaining, unknown-origin forest grid cells of that country (BGBC u ), as the probability-weighted average of the planted and natural origin estimates using Eq. 6 where BGBC p and BGBC n are the respective BGBC estimates for a grid cell assuming entirely planted and natural origin, respectively, and ∆ p and ∆ n are the respective differences between (i) the FRA/FAOSTAT and (ii) mapped extent of planted and natural forest within the given grid cell's country. While the mapped extent of IFL forests within a given country never exceeded that country's FRA reported natural forest extent, there were infrequent cases (n = 22 of 257) in which the mapped extent of tree plantations exceeded the corresponding FRA/FAOSTAT estimate of planted forest area. In these cases, we down-weighted the BGB estimates of SDPT forests in a similar fashion such that the weight of their planted estimate (ω p ) was equal to the quotient of (i) the FRA/FAOSTAT planted area and (ii) the SDPT extent within the country, and the weight of the natural origin estimate applied to the SDPT extent (ω n ) was equal to ω − 1 p . A BGBC error layer was then produced using summation in quadrature from the standard error estimates of the model coefficients, the AGBC error layer, the relative RMSE of MAT (27%), and the derived global uncertainty of the phylogeny layer. Phylogeny error was calculated as the Bernoulli standard deviation (δ) of the binary probability (p) of correct classification (i.e. "area weighted user's accuracy" 44 ; Table 3) using Eq. 7.
Since savannahs and shrublands are underrepresented in the regression-based model 25 , their BGBC was instead estimated using static root-to-shoot ratios reported by Mokany et al. 22 , which are somewhat conservative in comparison to the IPCC Tier-1 defaults 23,24 put favoured for consistency with methods used for grasslands (see below). Error was subsequently mapped from that of the AGBC estimates and the root-to-shoot ratios applied ( Table 5).
BGBC of tundra vegetation was mapped from AGBC using a univariate regression model derived by Wang et al. 26 that predicts root-to-shoot ratio as a function of MAT. We applied the model using the WorldClim version 2 MAT map 59 and propagated error from the AGBC estimates, the relative RMSE of MAT and the standard error of regression coefficients. Where tundra AGB exceeded 25 Mg ha −1 -the maximum field-measured shrub biomass reported by Berner et al. 18 -vegetation was considered to include trees and the Reich et al. 25 method described earlier for woody vegetation was used instead.
In the absence of a continuous predictor of grassland root-to-shoot ratios, we applied climate specific root-to-shoot ratios from Mokany et al. 22 to the corresponding climate regions of the Köppen-Gieger classification 43 (Table 2). Here, again, these ratios vary slightly from the IPCC Tier-1 defaults 23,24 but were chosen for their greater sample size and specificity. Grassland BGBC error was mapped from the error of the AGBC estimates and the respective root-to-shoot ratios.
Cropland BGBC was again estimated from crop-specific yields and morphological parameters (Online-only Table 2) following Wolf et al. 21 and Eq. 8 1 where y is the crop's yield (Mg ha −1 ), r is the root-to-shoot ratio of the crop, and h is its harvest index. Here again we assume that 2.5% of all harvested biomass is lost between the field and farmgate and that root biomass is 44% C, following Wolf et al. 21 . BGBC error was mapped from the error of the 2000-to-2010 ANPP correction for BGBC allocation as described above for cropland AGBC.
Harmonizing biomass carbon maps. The AGBC and BGBC maps were harmonized separately following the same general schema (Fig. 3). Given that our harmonized woody biomass map contains biomass estimates for grid cells in which any amount of tree cover was detected at the subpixel level (see section 1.1), we conserved its estimates regardless of the landcover reported by the 2010 CCI map in order to more fully account for woody biomass in non-forested areas 17 . We then used the MODIS continuous vegetation fields percent tree cover map for 2010 63 to allocate additional biomass density associated with the most probable herbaceous cover (grass or crop) to each grid cell in quantities complementary to that of the grid cell's fractional tree cover estimate (Eq. 9) where μ T is the total biomass estimate of a grid cell, μ w is the woody biomass estimate for the grid cell, μ h is its herbaceous biomass estimate, and q is the MODIS fractional tree cover of the grid cell. Since MODIS tree cover estimates saturate at around 80% 64 , we linearly stretched values such that 80% was treated as complete tree cover (100%). Moreover, we acknowledge that percent cover can realistically exceed 100% when understory cover is considered but we were unable to reasonably determine the extent of underlying cover from satellite imagery. As such, our approach may underestimate the contribution of herbaceous C stocks in densely forested grid cells. The most likely herbaceous cover type was determined from the CCI Landcover 2010 map, which we aggregated into two "likely herbaceous cover" classes -grass or crop -based on the assumed likelihood of cropland in each CCI class (Online-only Table 1). However, due to inherent classification error in the native CCI Landcover map, when determining the herbaceous biomass contribution we weighted the relative allocation of crop and grass biomass to a given grid cell based on the probability of correct classification by the CCI map (i.e. "user's accuracy", Table 6) of the most probable herbaceous class (p i ) such that μ h can be further expressed as (Eq. 10) where μ i is the predicted biomass of the most probable herbaceous class, and μ j is that of the less probable class. The uncertainty of a grid cell's total AGBC or BGBC estimate (σ T ) was determined and mapped from that of its components (μ μ and w h ) by summation in quadrature which can be simplified as (Eq. 11) where σ w is the error of the grid cell's estimated μ w , σ h is the error of its estimated μ h , and σ q is the error of its q.
Here, σ h can be further decomposed and expressed as Eq. 12 to account for the accuracy weighted allocation procedure expressed previously (Eq. 10) where σ i is the error of the estimated biomass density of the most probable herbaceous class, δ i is the estimated standard deviation of that class's Bernoulli probability (p; Eq. 7), and σ j is the error of the estimated biomass density of the less probable herbaceous subclass. Exceptions to the above schema were made in the tundra and boreal biomes -as delineated by the RESOLVE Ecoregions 2017 biome polygons 65 -where thematic overlap was likely between the woody and tundra plant biomass maps. A separate set of decision rules (Fig. 3) was used to determine whether grid cells in these biomes were to be exclusively allocated the estimate of the tundra plant map or that of the fractional allocation procedure described above. In general, any land in these biomes identified as sparse landcover by the CCI landcover map (Online-only Table 1) was assigned the tundra vegetation estimate. In addition, lands north of 60° latitude with less than 10% tree cover or where the tundra AGBC estimate exceeded that of the woody AGBC estimate were also exclusively assigned the tundra vegetation estimate. Lands north of 60° latitude not meeting these criteria were assigned the woody value with the additional contribution of grass.
Subtle numerical artefacts emerged from the divergent methodologies employed north and south of 60°N latitude. These were eliminated by distance weighting grid cells within 1° of 60°N based on their linear proximity to 60°N and then averaging estimates such that values at or north of 61°N were exclusively based on the northern methodology, those at 60°N were the arithmetic average of the two methodologies and those at or south of 59°N were exclusively based on the southern methodology. This produced a seamless, globally harmonized product that integrates the best remotely sensed estimates of landcover-specific C density. Water bodies identified as class "210" of the CCI 2010 landcover map were then masked from our final products.

Data Records
Data layers (n = 4, Table 7) for the maps of AGBC and BGBC density (Fig. 4) as well as their associated uncertainty maps which represent the combined standard error of prediction (Fig. 5) are available as individual 16-bit integer rasters in GeoTiff format. All layers are natively in a WGS84 Mercator projection with a spatial resolution of approximately 300 m at the equator and match that of the ESA CCI Landcover Maps 37 . Raster values are in units megagrams C per hectare (MgC ha −1 ) and have been scaled by a factor of ten to reduce file size.
www.nature.com/scientificdata www.nature.com/scientificdata/ These data are accessible through the Oak Ridge National Laboratory (ORNL) DAAC data repository (https://doi. org/10.3334/ORNLDAAC/1763) 66 . In addition, updated and/or derived vegetation-specific layers that were used to create our harmonized 2010 maps are available as supplemental data on figshare 67 .

technical Validation
Our harmonized products rely almost exclusively upon maps and models that have been rigorously validated by their original producers and were often accompanied by constrained uncertainty estimates. Throughout our harmonization procedure, we strived to conserve the validity of each of these products by minimizing the introduction of additional error and by tracking any introductions, as described above, such that the final error layers represent the cumulative uncertainty of the inputs used. Ground truth AGB and BGB data are almost always collected for individual landcover types. Consequently, we are unable to directly assess the validity of our integrated estimates beyond their relationships to individual landcover-specific estimates and the extents to which they were modified from their original, previously-validated form prior to and during our harmonization procedure.

Modifications to independent biomass layers.
Temporal and spatial updates made to existing landcover-specific maps of non-tree AGB resulted in relatively small changes to their predictions. For example, we used numerically calibrated MODIS imagery to extend the Landsat-based tundra plant AGB model beyond its native extent (the North Slope of Alaska) to the pan-Arctic region since neither a comparable model nor a consistent Landsat time series were available for this extent. We assessed the effects of these assumptions by comparing our predictions for the North Slope with those of the original map 18 (Fig. 6a). Both positive and negative discrepancies exist between ours and the original, though these rarely exceed ±2 MgC ha −1 and no discernibly systematic bias was evident.
Our updated map of grassland biomass carbon in the year 2010 was similarly made by applying the original AVHRR-based model to calibrated MODIS imagery. This too resulted in only subtle changes to the original biomass map (Fig. 6b) that were rarely in excess of 0.5 MgC ha −1 . In most areas, our estimates were higher than those of Xia et al. 19 who mapped the mean AGBC density between 1986 and 2006. Most of these elevated estimates corresponded with areas in which significant NDVI increases ("greening") have been reported while notably lower estimates in the Argentine Monte and Patagonian steppe biomes of southern South America, likewise, correspond with areas of reported "browning" 68,69 . Both greening and browning trends are well documented phenomena and have been linked to climatic changes 70 . Moreover, we further compared AGBC estimates from both the original Xia et al. 19 map and our 2010 update to AGBC field measurements coordinated by the Nutrient Network that were collected from 48 sites around the world between 2007 and 2009 49 . The RMSE (0.68 MgC ha −1 ) of our updated map was 10% less that of the Xia et al. map for sites with less than 40% tree cover. Likewise, our 2010 estimates were virtually unbiased (bias = −0.01 MgC ha −1 ) in comparison to the Xia map (bias = 0.25 MgC ha −1 ). While still noisy, these results suggest that our temporal update improved the overall accuracy of estimated grassland AGBC.
Finally, cropland biomass carbon maps were also updated from their native epoch (2000) to 2010 using pixel-wise rates of MODIS ANPP change over a ten-year period. While MODIS ANPP may be a poor snapshot of crop biomass in a single year, we assumed that its relative change over time reflects real physiological shifts affecting the cropland C cycle. This correction also resulted in only small differences that rarely exceeded ±2 MgC ha −1 and that, spatially, correspond well with observed declines in the yields of select crops that have been linked to climate change 71,72 (Fig. 6c). Nonetheless, updated global yield maps comparable to those available for 2000 would greatly improve our understanding of the interactions between climate change, crop yields, and C dynamics.

Belowground biomass estimates.
Belowground biomass is notoriously difficult to measure, model, and also to validate. We accounted for the reported uncertainty of nearly every variable considered when estimating belowground biomass and pixel-level uncertainty, but we were unable to perform an independent validation of our harmonized estimates at the pixel level due to a paucity of globally consistent field data. To complete such a task, a globally orchestrated effort to collect more BGB samples data across all vegetation types is needed.
Given this lack of data, we instead compared the estimated uncertainty of our BGBC maps to that of our AGBC estimates to infer the sources of any divergence (Fig. 5). As expected, our cumulative BGBC uncertainty layer generally reveals greater overall uncertainty than our AGBC estimates, with BGBC uncertainty roughly twice that of AGBC throughout most of the globe. The highest absolute uncertainty was found in biomass rich forests. Arid woodlands, especially those of the Sahel and eastern Namibia, generally had the greatest relative BGBC uncertainty, though their absolute uncertainty was quite small (generally less than 3 MgC ha −1 ). Here, biomass estimates of sparse woody vegetation were primarily responsible for heightened relative uncertainty.

Crop
Non-Crop User's Acc.  Table 6. Area-weighted confusion matrix for "likely herbaceous" classes. Aggregated "likely herbaceous" classes were aggregated from the ESA CCI 2010 landcover map as described in Online-only Table 1. Class accuracies were taken from the ESA CCI matrix for the year 2010 as reported in Tables 4-5 in version 2.5 of the D3.4-PUG CCI Landcover Product User Guide 84 and area-weighted following Olofsson et al. 44 . Area weighted user's accuracies were used to propagate uncertainty associated with herbaceous biomass allocation.
www.nature.com/scientificdata www.nature.com/scientificdata/ High relative and absolute BGBC uncertainty were also associated with predictions in select mountainous forests (e.g. east central Chile) as well as forested areas in and around cities. These patterns were largely driven by AGB uncertainty in the GlobBiomass product.
Biomass harmonization. The GlobBiomass global woody AGB map produced by Santoro et al. 30 comprises the backbone of our integrated products and, with few exceptions, remains largely unchanged in our final AGBC   www.nature.com/scientificdata www.nature.com/scientificdata/ map. The native version of the GlobBiomass map is accompanied by an error layer describing the uncertainty of each pixel's biomass estimate and this too forms the core of our integrated uncertainty layers. In areas with tree cover, the global average error of GlobBiomass estimates is 39 Mg ha −1 or 50% with greater relative uncertainty in densely forested areas, along the margins of forested expanses like farm fields and cities, and in similar areas with sparse tree cover.
Adding additional grass or crop biomass in complementary proportion to a grid cell's tree cover often did not exceed the estimated error of the original GlobBiomass map (Fig. 7). Grid cells exceeding GlobBiomass's native uncertainty comprise less than 40% of its total extent. Exceptions were primarily found in grassland and cropland dominated regions where tree cover was generally sparse, and, consequently, the herbaceous biomass contribution was relatively high. Even so, the absolute magnitude of these additions remains somewhat small (less than 2.3 MgC ha −1 for grassland and 15 MgC ha −1 for cropland).
Larger deviations from GlobBiomass were also present in areas of both dryland Africa and the Arctic tundra biome, where we used independent layers to estimate woody biomass. In African drylands, GlobBiomass likely underestimates woody biomass by adopting the conservative FAO definition (DBH > 10 cm), which implicitly omits the relatively small trees and shrubs that are common to the region. The Bouvet map of Africa that we used to supplement these estimates is not bound by this constraint, was developed from region-specific data, and predicts substantially higher AGB density throughout much of its extent with comparatively high accuracy (RMSE = 17.1 Mg ha −1 ) 35 . www.nature.com/scientificdata www.nature.com/scientificdata/ www.nature.com/scientificdata www.nature.com/scientificdata/ GlobBiomass also included sporadic biomass estimates throughout much of the Arctic tundra biome. Trees are generally scarce throughout this biome, which is instead dominated by dwarf shrubs and herbaceous forbs and graminoids, so given GlobBiomass's adherence to FAO guidelines, its predictions here may be spurious. We thus prioritized the estimates of the independent model developed specifically to collectively predict biomass of both woody and herbaceous tundra vegetation. These estimates were generally higher than GlobBiomass but agreed well with independent validation data from North America (RMSE = 2.9 Mg ha −1 ) 18 .
Comparison with the ipCC Tier-1 global biomass carbon map. While far from a perfect comparison, the only other map to comprehensively report global biomass carbon density for all landcover types is the IPCC Tier-1 map for the year 2000 by Ruesch and Gibbs 28 . As previously described, this map was produced using an entirely different method ("stratify and multiply") and distinct data sources 23 and represents an earlier epoch. However, the map is widely used for myriad applications, and it may thus be informative to assess notable differences between it and our new products.
Ruesch and Gibbs 28 report total living C stocks of 345 petagrams (PgC) in AGBC and 133 PgC in BGBC for a total of 478 PgC, globally. Our estimates are lower at 287 PgC and 122 PgC in global AGBC and BGBC, respectively, for a total of 409 PgC in living global vegetation biomass. Herbaceous biomass in our maps comprised 9.1 and 28.3 PgC of total AGBC and BGBC, respectively. Half of all herbaceous AGBC (4.5 PgC) and roughly 6% of all herbaceous BGBC (1.7 PgC) was found in croplands. Moreover, we mapped 22.3 and 6.1 PgC, respectively, in the AGB and BGB of trees located within the cropland extent. These trees constituted roughly 7% of all global biomass C and are likely overlooked by both the Ruesch and Gibbs map 28 and by remotely sensed forest C maps that are masked to forested areas. Zomer et al. 17 first highlighted this potential discrepancy in the Ruesch and Gibbs map 28 when they produced a remarkably similar estimate of 34.2 Pg of overlooked C in cropland trees using Tier-1 defaults. However, their estimates were assumed to be in addition to the 474 PgC originally mapped by Ruesch and Gibbs 28 . Here, we suggest that the 28.4 PgC we mapped in cropland trees is already factored into our 409 PgC total.
Our AGBC product predicts substantially less biomass C than Ruesch and Gibbs 28 throughout most of the pantropical region and, to a lesser extent, southern temperate forests (Fig. 8a). This pattern has been noted by others comparing the Ruesch and Gibbs map 28 to other satellite-based biomass maps 73 and may suggest that the IPCC default values used to create it 23 are spatially biased. In addition, well-defined areas of high disagreement emerge in Africa that directly correspond with the FAO boundaries of the "tropical moist deciduous forest" ecofloristic zone and suggest that this area, in particular, may merit critical review. Moreover, the opposite pattern is observed in this same ecofloristic zone throughout South America. Our map also predicts greater AGBC throughout much of the boreal forest as well as in African shrublands and the steppes of South America.
We observed similar, though less pronounced discrepancies, when comparing BGBC maps (Fig. 8b). Notably, our map predicts substantially more BGBC throughout the tundra biome -a previously underappreciated C stock that has recently risen to prominance 74 -the boreal forest, African shrublands and most of South America and Australia. However, we predict less BGBC in nearly all rainforests (Temperate and Tropical). These differences and their distinct spatial patterns correspond with the vegetation strata used to make the IPCC Tier-1 map 28 and suggest that the accuracy of the "stratify and multiply" method depends heavily upon the quality of the referenced and spatial data considered. Inaccuracies in these data may, in turn, lead to false geographies. Integrating, continuous spatial estimates that better capture local and regional variation, as we have done, may thus greatly improve our understanding of global carbon geographies and their role in the earth system. www.nature.com/scientificdata www.nature.com/scientificdata/ Congruence with IPCC Tier-2 and Tier-3 nationally reported woody carbon stocks. The error and variance between our woody biomass estimates -when aggregated to the country level -and comparable totals reported in the FRA were less for comparisons made against FRA estimates generated using higher tier IPCC methodologies than for those based on Tier-1 approaches (Fig. 9). Across the board for AGBC, BGBC, and total C comparisons, the relative RMSE (RMSE CV ) of our estimates, when compared to estimates generated using high tier methods, was roughly half of that obtained from comparisons with Tier-1 estimates (Table 8). Likewise, the coefficient of determination (R 2 ) was greatest for comparisons with Tier-3 estimates. For each pool-specific comparison (AGBC, BGBC, and total C), the slopes of the relationships between Tier-1, 2, and 3 estimates were neither significantly different from a 1:1 relationship nor from one another (p > 0.05; ANCOVA). Combined, these results suggest that our maps lead to C stock estimates congruent with those attained from independent, higher-tier reporting methodologies.
To explore this association at a finer regional scale, we also compared our woody C estimates to the United States Forest Service's Forest Inventory Analysis 75 (FIA) and found similarly strong congruence for AGBC and Total C stocks but subtle overestimates for BGBC (Fig. 9). The FIA is a Tier-3 inventory of woody forest biomass C stocks that is based on extensive and statistically rigorous field sampling and subsequent upscaling, We used data available at the state level for the year 2014 -again, the only year in which we could obtain data partitioned by AGBC and BGBC. Like our FRA comparison, we found a tight relationship between our woody AGBC totals and www.nature.com/scientificdata www.nature.com/scientificdata/ those reported by the FIA (Fig. 9b; RMSE CV = 25.7%, R 2 = 0.960, slope = 1.10, n = 48). Our woody BGBC estimates, though, were systematically greater than those reported by the FIA (Fig. 9d; RMSE CV = 86.4%, R 2 = 0.95, slope = 1.51, n = 48). This trend has been noted by others 27 and suggests that the global model that we used to estimate woody BGBC may not be appropriate for some finer scale applications as is foretold by the elevated uncertainty reported in our corresponding uncertainty layer (Fig. 5b). Our total woody C (AGBC + BGBC) estimates (Fig. 9f), however, agreed well with the FIA (RMSE CV = 34.1%, R 2 = 0.961, slope = 1.17, n = 48) and thus reflect the outsized contribution of AGBC to the total woody C stock. When the contribution of herbaceous C stocks is further added to these comparisons, our stock estimates intuitively increase in rough proportion to a state's proportional extent of herbaceous cover. The effect of this addition is particularly pronounced for BGBC estimates due to the large root-to-shoot ratios of grassland vegetation.
The relative congruence of our results with higher-tier stock estimates suggests that our maps could be used to facilitate broader adoption of higher-tier methods among countries currently lacking the requisite data and those seeking to better account for C in non-woody biomass. This congruence spans a comprehensive range of Fig. 9 Comparison of woody biomass density estimates to corresponding estimates of the FAO's FRA and the USFS's FIA. National woody AGBC totals derived from the woody components of our harmonized maps are compared to national totals reported in the 2015 FRA 62 (a) in relation to the IPCC inventory methodology used by each country. Likewise, we derived woody AGBC totals for US states and compared them to the corresponding totals reported by the 2014 FIA 75 (b), a Tier-3 inventory. We also show the additional effect of considering non-woody C -as is reported in our harmonized maps -in light green. Similar comparisons were made between our woody BGBC estimates and the corresponding estimates of both the FRA (c) and FIA (d). We further summed our woody AGBC and BGBC estimates and compared them to the total woody C stocks reported by both the FRA (e) and FIA (f).
www.nature.com/scientificdata www.nature.com/scientificdata/ biophysical conditions and spatial scales ranging from small states to large nations. Moreover, a recent study suggests that the fidelity of the underlying GlobBiomass AGB map may extend to even finer scales 31 . While our BGBC estimates may differ from some fine-scale estimates (Fig. 9d), their tight agreement with high tier BGBC totals at the national level (Fig. 9c) suggests that they may still be well suited for many national-scale C inventories -especially for countries lacking requisite high tier data. Use of our maps is unlikely to introduce error in excess of that currently implicit in Tier-1 estimates. Credence, though, should be given to the associated uncertainty estimates. To facilitate wider adoption of higher-tier methodologies, our maps could be used to derive new, region-specific default values for use in Tier-2 frameworks 76 or to either represent or calibrate 2010 baseline conditions in Tier-3 frameworks. In so doing, inventories and studies alike could more accurately account for the nuanced global geographies of biomass C.

Usage Notes
These maps are intended for global applications in which continuous spatial estimates of live AGBC and/or BGBC density are needed that span a broad range of vegetation types and/or require estimates circa 2010. They are loosely based upon and share the spatial resolution of the ESA CCI Landcover 2010 map 37 , which can be used to extract landcover specific C totals. However, our products notably do not account for C stored in non-living C pools like litter or coarse woody debris, nor soil organic matter, though these both represent large, additional ecosystem C stocks [77][78][79] . Our maps are explicitly intended for global scale applications seeking to consider C in the collective living biomass of multiple vegetation types. For global scale applications focused exclusively on the C stocks of a single vegetation type, we strongly encourage users to instead use the respective input map or model referenced in Table 1 to avoid potential errors that may have been introduced by our harmonization procedure. For AGB applications over smaller extents, users should further consider whether locally specific products are available. If such maps are not available and our maps are considered instead, credence should be given to their pixel-level uncertainty estimates. As mentioned above, the biomass of shrublands was only explicitly accounted for in Africa and the Arctic tundra, since neither broad-scale maps nor models generalizable to other areas were available in the existing literature. As such, we caution against the use of our maps outside of these areas when shrubland biomass is of particular interest or importance. Moreover, in contrast to the estimates for all other vegetation types considered, which we upscaled to a 300 m resolution, cropland C estimates were largely based on relatively coarse 8 km resolution data that were downscaled using bilinear resampling to achieve a 300 m spatial resolution. As such, these estimates may not adequately capture the underlying finer-scale spatial variation and should be interpreted with that in mind. Likewise, we reiterate that some BGBC estimates may differ from locally derived Tier-3 estimates, and attention should thus be given to our reported pixel-level uncertainty for all applications. Finally, our maps should not be used in comparison with the IPCC Tier-1 map of Ruesch and Gibbs (2008) to detect biomass change between the two study periods due to significant methodological differences between these products.

Code availability
Cropland biomass maps were created in the R statistical computing environment 80 . All other coding was done in Google Earth Engine 81 (GEE), wherein our workflow consisted of 18 interconnected scripts. All code can be found on GitHub (https://github.com/sethspawn/globalBiomassC) and permanently archived by Zenodo 82 .   Table 8. Statistical comparison of woody biomass carbon totals derived from the 2010 harmonized maps and those reported by the FRA in relation to the IPCC inventory methodology used. Statistics for AGBC, BGBC, and total C correspond to relationships depicted in Fig. 9a,c,f, respectively. (2020) 7:112 | https://doi.org/10.1038/s41597-020-0444-4 www.nature.com/scientificdata www.nature.com/scientificdata/ www.nature.com/scientificdata www.nature.com/scientificdata/ acknowledgements We gratefully acknowledge all the data producers, without whom this work would not be possible. al. whose AGB estimates comprise the core of our harmonized products and, in many cases, whose feedback greatly improved the quality of their inclusion. We are also grateful to the thoughtful feedback of three anonymous reviewers whose suggestions greatly improved the quality of our products and the clarity of our manuscript. Funding for this project was generously provided by the David and Lucile Packard Foundation and the National Wildlife Federation.