Background & Summary

Plant biomass not only shapes how humans and wildlife use terrestrial ecosystems1,2,3 but also influences climate by modulating ecosystem carbon storage and surface energy balance4,5,6. However, plant biomass and its associated ecosystem services are sensitive to rapid climate warming, which is occurring at least three times faster in the Arctic than the global average7,8. Rapid warming of Arctic ecosystems has enabled plants such as shrubs and trees to grow taller and expand across the land surface9,10,11,12 and vegetation to become more productive13,14,15,16,17. These changes can affect traditional land use18,19, impact habitat suitability for wildlife3,15,20, and amplify climate warming, primarily by lowering the surface albedo4,6. Consequently, there is a pressing need to better understand spatial patterns and temporal changes in plant biomass and species distribution throughout Arctic ecosystems.

Field measurements are crucial for quantifying the amount, composition, distribution, and temporal changes in plant biomass across Arctic ecosystems. While recognizing the importance of efforts like the International Tundra Experiment21 and the US National Ecological Observatory Network22, it is nevertheless uncommon for plant biomass to be systematically measured and monitored in Arctic ecosystems, where such measurements are time-consuming, logistically challenging, and resource intensive. Rather, plant biomass typically has been measured as part of individual research projects, each with its own focus and protocols. For instance, researchers have investigated how plant biomass is affected by climate23,24, wildfire25,26,27,28, herbivores29,30, and soil properties such as texture, nutrients, and pH31,32,33. Researchers have also measured plant biomass to assess ecosystem carbon storage34,35, evaluate terrestrial ecosystem models36, and map spatial patterns of plant biomass from landscape to biome scales by linking field and remote sensing measurements37,38,39,40.

Researchers typically measure plant aboveground biomass by harvesting sample plots during mid- to late-summer, though measuring tall shrub and tree biomass generally requires surveying stems on sample plots and using allometric models34,41. However, the number and size of sample plots vary among research projects, as do the taxonomic and functional groupings used when partitioning samples. Samples are sometimes partitioned by species, or more coarsely partitioned into plant functional types that include multiple species with similar functional traits42. Furthermore, while researchers are progressively archiving individual datasets in a growing number of online public repositories, finding datasets can be challenging and many datasets remain unarchived. Even when archived, it is still necessary to harmonize datasets before they can be used together to inform larger-scale biomass monitoring and mapping efforts. So far, there have been limited efforts to compile and harmonize plot-level measurements of plant aboveground biomass across individual datasets, either regionally43,44 or for the overall Arctic37,45. Altogether, these factors hinder efforts to understand spatial patterns and temporal changes in plant biomass across the rapidly warming Arctic.

Here, we present The Arctic plant aboveground biomass synthesis dataset, which includes georeferenced measurements of lichen, bryophyte, herb (graminoid and forb), shrub, and/or tree aboveground biomass on 2,327 sample plots from 636 field sites across Arctic and Subarctic ecosystems (Fig. 1). These five plant functional types correspond to broad differences in trait characteristics (e.g., height, woodiness, vascularity) and effects on ecosystem processes42 (Table 1), and are commonly used by terrestrial ecosystem models to represent plant form and function46,47. We created the synthesis dataset by assembling and harmonizing 32 datasets where aboveground biomass was quantified by harvesting sample plots, or, for trees and often tall shrubs, by surveying sample plots and using allometric models. Aboveground biomass is reported for each plant functional type as grams of oven-dried aboveground live biomass per square meter of ground surface (g m−2), and in most cases represents the peak summer biomass on each sample plot. The synthesis dataset does not include measurements of belowground biomass, which were recently compiled elsewhere45, or biomass chemistry (e.g., carbon or nitrogen content). Altogether, the synthesis dataset includes measurements that span a broad range of bioclimatic conditions across seven of the eight Arctic nations (Figs. 1, 2a,b). The synthesis dataset can be used for a variety of ecological applications that include monitoring, mapping, and modeling spatial patterns and temporal changes in plant aboveground biomass across the Arctic.

Fig. 1

Synthesis dataset field site locations in (a) geographic and (b) climatic spaces. The synthesis dataset includes field sites from the sparsely vegetated High Arctic, moderately vegetated Low Arctic, mountainous Oroarctic, and forested Subarctic. (a) Bioclimatic zones were derived from several datasets98,99,100 and clipped to north of 55°N. (b) Climatologies are for the period 1981 to 2010 based on the CHELSA dataset gridded at 1 km2 resolution (version 2.1)101,102. Growing degree days represent the heat sum above 0 °C. To improve clarity, panel (b) excludes the Subarctic, the warmest and wettest 2.5th percentiles, and climate spaces (i.e., unique growing degree day and precipitation combinations) that covered less than 500 km2.

Table 1 Description of plant functional types used in The Arctic plant aboveground biomass synthesis dataset.
Fig. 2

Frequency distributions of where and when sample plots were measured. Specifically, the (a) bioclimatic zone, (b) country, (c) year, and (d) day of year in which sample plots were measured. (b,c,d) Histogram bars are subdivided and color-coded by bioclimatic zones.

Methods

General approach

To create the synthesis dataset, we assembled, harmonized, and screened individual datasets. We then merged the harmonized datasets, added completeness flags, and performed further quality assurance. The workflow is depicted in Fig. 3, with further details provided below.

Fig. 3

Workflow diagram depicting the process for creating The Arctic plant aboveground biomass synthesis dataset from existing datasets. Harmonization of metadata and biomass data included reformatting sample dates and spatial coordinates into common formats, as well as summarizing aboveground biomass by a common set of plant functional types.

Dataset sources

We assembled datasets by searching public data archives and directly soliciting datasets from the authors of relevant scientific papers and members of our professional networks. We searched public data archives and Google Scholar using combinations of keywords that included Arctic, tundra, vegetation, plant, and biomass. Data archives included the Arctic Data Center, DataONE, Dryad, Oak Ridge National Laboratory Distributed Active Archive Center for Biogeochemical Dynamics, PANGAEA Data Publisher for Earth and Environmental Science, Polar Data Catalog, and Zenodo. After identifying datasets and performing an initial screening, we contacted the researchers who created each dataset, sought additional information as needed, requested permission to include the dataset in the synthesis, and invited those researchers to be coauthors on the synthesis dataset. In total, we assembled 32 individual datasets provided by 54 researchers at 28 institutions in eight countries (Table 2).

Table 2 Summary of individual datasets that comprise The Arctic plant aboveground biomass synthesis dataset.

Metadata harmonization

We harmonized plot-level plant biomass measurements and metadata from individual datasets using custom scripts written in R48. These scripts provide a record of the harmonization process and enable future updates. For each dataset, we assigned a unique sequential identifier, recorded the names of data contributors, and included a citation to the original peer-reviewed paper or dataset, thereby enabling users to trace the origin of each measurement. For each unique sample plot in the dataset, we identified the country of origin, assigned a general locale, and recorded the original field site ID and sample plot ID. Site ID and plot ID may not be unique identifiers. Therefore, to ensure that each field site and sample plot could be uniquely identified in the synthesis dataset, we created site codes and plot codes by concatenating the country, locale, site ID, and plot ID. We documented whether the GPS coordinates were recorded at the site or plot level, then harmonized the coordinates to decimal degrees in the WGS84 global reference system using the sf package in R49.
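Two of the harmonization steps above, building unique codes and converting coordinates to decimal degrees, can be illustrated with a short Python sketch. It is not the authors' R workflow: the underscore separator, the example identifiers, and the degrees-minutes-seconds (DMS) input format are hypothetical assumptions. It also does not perform a datum transformation to WGS84, which in practice needs a projection library (the authors used the sf package).

```python
# Hypothetical sketch (not the authors' R scripts): build unique site and
# plot codes, and convert DMS coordinates to signed decimal degrees.

def make_codes(country, locale, site_id, plot_id):
    """Return (site_code, plot_code) by concatenating identifiers.

    The "_" separator is an assumption for illustration only.
    """
    site_code = "_".join([country, locale, site_id])
    plot_code = "_".join([site_code, plot_id])
    return site_code, plot_code


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert DMS to decimal degrees; south and west become negative.

    Only the number format changes; no datum transformation is applied.
    """
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


site, plot = make_codes("USA", "NorthSlope", "S1", "P03")
print(site, plot)  # USA_NorthSlope_S1 USA_NorthSlope_S1_P03
print(round(dms_to_decimal(68, 37, 48, "N"), 4))  # 68.63
print(round(dms_to_decimal(149, 36, 0, "W"), 4))  # -149.6
```

Concatenating all four identifiers keeps codes unique even when two datasets reuse the same site or plot ID.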

The definition of a field site varied among individual datasets. Most often, a field site included multiple sample plots along one or more transects in a single vegetation type. In other cases, a field site included sample plots spread among multiple vegetation types in a landscape29,35,38,50,51. In these latter cases, we subdivided the field site by grouping sample plots by vegetation types (e.g., low shrub tundra vs. graminoid tundra) that were recorded by the researchers who conducted the field work. This helped to ensure that in the synthesis dataset, each field site included multiple sample plots (i.e., replicate measurements) from a single vegetation type.
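Subdividing a landscape-scale field site by recorded vegetation type amounts to a group-by operation. The Python sketch below is illustrative only. The dictionary keys (`site_id`, `plot_id`, `veg_type`) and the scheme of appending a compact vegetation label to the original site ID are hypothetical, not the naming used in the synthesis dataset.

```python
from collections import defaultdict


def subdivide_site(plots):
    """Group the plots of one landscape-scale field site by vegetation type.

    plots: iterable of dicts with hypothetical keys 'site_id', 'plot_id',
    and 'veg_type'. Returns {new_site_id: [plot_id, ...]}, where the new
    site ID appends a compact vegetation label (hypothetical scheme).
    """
    groups = defaultdict(list)
    for p in plots:
        label = p["veg_type"].title().replace(" ", "")
        groups[f"{p['site_id']}-{label}"].append(p["plot_id"])
    return dict(groups)


plots = [
    {"site_id": "T1", "plot_id": "1", "veg_type": "low shrub tundra"},
    {"site_id": "T1", "plot_id": "2", "veg_type": "graminoid tundra"},
    {"site_id": "T1", "plot_id": "3", "veg_type": "low shrub tundra"},
]
print(subdivide_site(plots))
# {'T1-LowShrubTundra': ['1', '3'], 'T1-GraminoidTundra': ['2']}
```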

Plant aboveground biomass measurement harmonization

Plant aboveground biomass was quantified for most functional types by harvesting sample plots during mid- to late-summer. Typically, non-tree vascular plants rooted in a sample plot were clipped at the moss or ground surface and sorted into functional types (e.g., herbs, shrubs) or finer taxonomic groupings (e.g., species). If present, lichen and the green portion of mosses and other bryophytes were then harvested. Samples were typically dried to a constant weight at 50–60 °C in a drying oven and weighed on a digital scale. Trees and tall shrubs are challenging to harvest and process; therefore, tree and often tall shrub aboveground biomass were quantified on sample plots by (1) measuring the diameter of each stem at the ground surface or at breast height, (2) predicting stem dry weight from stem diameter using allometric models41,52, and (3) summing stem dry weight across all stems on the sample plot. For some sample plots, dwarf to low shrubs were harvested while tall shrubs were surveyed. The synthesis dataset includes the sampling date and quantification method for each plant biomass measurement.
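The three-step allometric approach (measure stems, predict stem dry weight, sum per plot) can be sketched in Python as follows. This assumes a simple power-law model, dry mass = a · D^b, which is one common form of stem allometry. The coefficients in the example are made-up placeholders, not values from any published model. The research teams in the synthesis each chose their own models.

```python
def stem_mass_g(diameter_cm, a, b):
    """Power-law allometry: stem dry mass (g) = a * diameter**b.

    The coefficients a and b are hypothetical; real values come from a
    species- and region-specific published allometric model.
    """
    return a * diameter_cm ** b


def plot_biomass_g_m2(diameters_cm, plot_area_m2, a, b):
    """Sum predicted dry mass over all stems, then divide by plot area."""
    total_g = sum(stem_mass_g(d, a, b) for d in diameters_cm)
    return total_g / plot_area_m2


# Three stems surveyed on a hypothetical 25 m2 tall-shrub plot
print(plot_biomass_g_m2([1.0, 2.0, 3.0], 25.0, a=30.0, b=2.5))
```

Dividing by the surveyed area converts the summed stem mass into the same g m−2 units used for harvested functional types.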

Individual datasets differed in the taxonomic detail of plant biomass measurements. While some datasets provided measurements for individual species and one dataset provided measurements of total aboveground biomass53, most datasets instead provided measurements for species-groups or broader plant functional types. Therefore, it was necessary to aggregate the plant biomass measurements to a harmonized set of plant functional types, with the level of taxonomic detail dictated by the most coarsely partitioned datasets. The synthesis dataset therefore includes plant biomass measurements that were aggregated to five plant functional types: lichens, bryophytes, herbs, shrubs, and trees (Table 1). Lichens are predominantly fungal54, yet are often included as a plant functional type in Arctic ecology42.
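Aggregating finer groupings to the five harmonized functional types is essentially a lookup followed by a sum. The Python sketch below shows the idea. The `GROUP_TO_PFT` table and its group names are hypothetical examples. Species-level data would need a species-to-functional-type lookup instead.

```python
from collections import defaultdict

# Hypothetical lookup from groupings used in an individual dataset to the
# five harmonized plant functional types (PFTs).
GROUP_TO_PFT = {
    "graminoid": "herb",
    "forb": "herb",
    "evergreen shrub": "shrub",
    "deciduous shrub": "shrub",
    "moss": "bryophyte",
    "liverwort": "bryophyte",
    "lichen": "lichen",
    "tree": "tree",
}


def aggregate_to_pft(records):
    """Sum biomass (g m-2) of finer groups into PFTs for one sample plot.

    records: iterable of (group_name, biomass_g_m2) tuples. An unmapped
    group raises KeyError, so it is caught rather than silently dropped.
    """
    totals = defaultdict(float)
    for group, biomass in records:
        totals[GROUP_TO_PFT[group.lower()]] += biomass
    return dict(totals)


print(aggregate_to_pft([("Graminoid", 10.0), ("forb", 5.0), ("moss", 20.0)]))
# {'herb': 15.0, 'bryophyte': 20.0}
```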

Plant biomass is expressed as grams of oven-dried aboveground live biomass per square meter of ground surface (i.e., g m−2); however, the actual area of each sample plot widely varied among individual datasets and plant functional types. For instance, bryophytes and lichens are small and particularly time consuming to harvest, thus sample plots typically were about 0.1 m2. In several cases, bryophyte and lichen biomass were upscaled using targeted harvests and measurements of functional type cover on a larger sample plot35,55. Herbs and shrubs were typically harvested from sample plots that were about 0.25 m2, while tall shrub and tree biomass were quantified by surveying sample plots up to 25 m2 and 100 m2, respectively. The synthesis dataset therefore includes the area (m2) of the sample plot that was used when measuring plant biomass for each functional type.
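Because plot areas differ among datasets and functional types, every harvested mass has to be divided by the area from which it was collected. The Python sketch below shows that normalization. It also includes one possible form of the cover-based upscaling mentioned above: biomass per square meter of full cover, taken from targeted harvests, multiplied by the fractional cover measured on a larger plot. That second function is an assumption about how such upscaling can work, not the exact procedure used in the cited studies.

```python
def harvest_to_g_m2(dry_mass_g, plot_area_m2):
    """Convert oven-dry mass harvested from a plot to g m-2."""
    if plot_area_m2 <= 0:
        raise ValueError("plot area must be positive")
    return dry_mass_g / plot_area_m2


def upscale_by_cover(mass_per_full_cover_g_m2, cover_fraction):
    """Assumed cover-based upscaling (illustrative only).

    mass_per_full_cover_g_m2: biomass per m2 of 100% cover, estimated
    from targeted harvests. cover_fraction: 0-1 cover of the functional
    type on the larger sample plot.
    """
    if not 0.0 <= cover_fraction <= 1.0:
        raise ValueError("cover_fraction must be between 0 and 1")
    return mass_per_full_cover_g_m2 * cover_fraction


print(harvest_to_g_m2(12.5, 0.25))  # 50.0 (e.g., shrubs clipped from a 0.25 m2 plot)
print(upscale_by_cover(400.0, 0.25))  # 100.0
```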

We sought to assemble plant biomass measurements for all functional types present on each sample plot; however, there were cases when a plant functional type was present but not measured. This was most common for bryophytes and lichens. Several datasets were missing plant biomass measurements for certain functional types but had ancillary estimates of areal cover by functional type. We set plant biomass to 0 g m−2 for functional types that had 0% cover and added a note to document this decision. We took special care to record a plant functional type as “unmeasured” when it was present on a sample plot but not measured (i.e., missing data). Therefore, every sample plot in the synthesis dataset includes a discrete biomass measurement or documented missing value for each of the five plant functional types. Furthermore, each sample plot has a set of logical flags (i.e., true or false) that identify which groups of plant functional types were measured (e.g., all vascular or woody functional types). These flags can help guide appropriate use of the synthesis dataset.
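The missing-value rules and completeness flags described above can be expressed as two small Python functions. This is a sketch, not the authors' code. The flag names (`all_pfts_measured`, `vascular_measured`, `woody_measured`) and the note text are hypothetical; the actual column names are listed in Table 3.

```python
PFTS = ("lichen", "bryophyte", "herb", "shrub", "tree")
VASCULAR = ("herb", "shrub", "tree")
WOODY = ("shrub", "tree")


def resolve_biomass(measured, cover_pct):
    """Return (biomass_g_m2, note) for one functional type on one plot.

    measured: biomass in g m-2, or None if not measured.
    cover_pct: ancillary percent cover, or None if unavailable.
    """
    if measured is not None:
        return measured, ""
    if cover_pct == 0:
        # Absent from the plot, so zero biomass is a true value.
        return 0.0, "set to 0 g m-2 because cover was 0%"
    # Present (or presence unknown) but not measured: missing data.
    return None, "unmeasured"


def completeness_flags(plot):
    """plot: dict mapping each PFT to biomass (g m-2) or None if unmeasured.

    Flag names are hypothetical.
    """
    def all_measured(group):
        return all(plot[p] is not None for p in group)

    return {
        "all_pfts_measured": all_measured(PFTS),
        "vascular_measured": all_measured(VASCULAR),
        "woody_measured": all_measured(WOODY),
    }


plot = {"lichen": None, "bryophyte": 40.0, "herb": 10.0, "shrub": 0.0, "tree": 0.0}
print(completeness_flags(plot))
# {'all_pfts_measured': False, 'vascular_measured': True, 'woody_measured': True}
```

Keeping zero and missing distinct matters: summing total biomass over a plot whose lichens were unmeasured would otherwise silently underestimate the total.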

Data Records

The Arctic plant aboveground biomass synthesis dataset is publicly available online through the Arctic Data Center56. The dataset includes one file in a comma-separated format (.csv) that has 11,372 rows and 33 columns. The first row stores column names, while each subsequent row stores the biomass measurements and associated metadata for a single plant functional type (e.g., shrubs) on a sample plot. The dataset has 17 columns with character values, eleven columns with numeric values, and five columns with logical flags. Details about each column are provided in Table 3.
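Because the file stores one row per functional type per sample plot, many analyses start by pivoting it to one record per plot. The standard-library Python sketch below shows one way to do this. The column names `plot_code`, `pft`, and `biomass_g_m2`, and the treatment of empty strings and "NA" as missing, are placeholders; the actual column names and missing-value encoding should be taken from Table 3.

```python
import csv
import io


def long_to_wide(csv_text, plot_col="plot_code", pft_col="pft", value_col="biomass_g_m2"):
    """Pivot one-row-per-PFT records into one dict per sample plot.

    Column names are hypothetical placeholders. Empty strings and "NA"
    are treated as missing (None) under an assumed encoding.
    """
    wide = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row[value_col].strip()
        value = None if raw in ("", "NA") else float(raw)
        wide.setdefault(row[plot_col], {})[row[pft_col]] = value
    return wide


example = """plot_code,pft,biomass_g_m2
A_1,shrub,120.5
A_1,lichen,NA
B_2,shrub,0
"""
print(long_to_wide(example))
# {'A_1': {'shrub': 120.5, 'lichen': None}, 'B_2': {'shrub': 0.0}}
```

For the downloaded file, the text can be read with `open(path, newline="").read()` before passing it to the same function.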

Table 3 Description of each column in The Arctic plant aboveground biomass synthesis dataset.

Altogether, the synthesis dataset requires about 7 MB of hard drive storage space.

Technical Validation

We took multiple steps to ensure the technical quality of The Arctic plant aboveground biomass synthesis dataset. For individual datasets (n = 32), we started by examining the structure of the tabular data, as well as visually screening these data for potential errors (e.g., typographical errors). Because each individual dataset had a unique structure, we harmonized each one using a custom script in R48. These scripts provide documented and refinable workflows for data harmonization, which included, but were not limited to, fixing typographical errors and screening the spatial coordinates for each field site and/or sample plot. Specifically, we visually screened spatial coordinates for irregularities by mapping each reported location over high-resolution satellite imagery using the R package leaflet57. Accurate spatial coordinates are especially important for ecosystem monitoring and mapping. Each script also included checks to ensure there were plot-level data for all five plant functional types and, after harmonization, that the dataset columns matched those of the synthesis dataset.
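The two automated checks mentioned above can be sketched in Python as follows: one confirms that every row has the expected columns in the expected order, and the other confirms that every plot has a row for each of the five functional types. This is an illustrative stand-in for the authors' R checks. The `plot_code` and `pft` keys are hypothetical column names.

```python
EXPECTED_PFTS = {"lichen", "bryophyte", "herb", "shrub", "tree"}


def check_harmonized(rows, expected_columns):
    """Return a list of problems found in one harmonized dataset.

    rows: list of dicts, one per PFT per plot, with hypothetical keys
    'plot_code' and 'pft'. Column names must match expected_columns
    exactly and in the same order.
    """
    problems = [
        f"row {i}: columns do not match the synthesis schema"
        for i, row in enumerate(rows)
        if list(row.keys()) != list(expected_columns)
    ]
    if problems:
        return problems  # fix the schema before checking content

    pfts_by_plot = {}
    for row in rows:
        pfts_by_plot.setdefault(row["plot_code"], set()).add(row["pft"])
    for plot, pfts in sorted(pfts_by_plot.items()):
        missing = EXPECTED_PFTS - pfts
        if missing:
            problems.append(f"{plot}: missing {sorted(missing)}")
    return problems


cols = ["plot_code", "pft", "biomass_g_m2"]
rows = [{"plot_code": "A_1", "pft": p, "biomass_g_m2": 1.0}
        for p in ["lichen", "bryophyte", "herb", "shrub", "tree"]]
print(check_harmonized(rows[:4], cols))  # ["A_1: missing ['tree']"]
```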

We created the synthesis dataset by merging the individual harmonized datasets and then performed additional screening using R. To ensure data quality for each column with character values, we extracted the unique values and visually checked for errors. For each column with numeric values, we calculated the range of values and similarly checked for errors. Plant aboveground biomass (g m−2) is the principal measurement in the synthesis dataset; therefore, we further examined these numeric values. This included visually inspecting histograms for each plant functional type (Fig. 4), as well as computing standardized anomalies (i.e., z-scores) and inspecting measurements with anomalies greater than three standard deviations for errors.
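The z-score screening step can be sketched in Python as below. This assumes z-scores are computed separately for each functional type, use the sample standard deviation, and apply the threshold to absolute values. It is not the authors' exact code. Flagged values are candidates for inspection, not automatic removals, and because biomass is right-skewed some large but valid values will be flagged.

```python
import statistics
from collections import defaultdict


def flag_anomalies(records, threshold=3.0):
    """Return (record_id, pft, z) for values with |z| > threshold within a PFT.

    records: list of (record_id, pft, biomass_g_m2); None values
    (unmeasured) are skipped. PFTs with fewer than two values or zero
    spread are skipped because z is undefined.
    """
    by_pft = defaultdict(list)
    for rid, pft, value in records:
        if value is not None:
            by_pft[pft].append((rid, value))

    flagged = []
    for pft, items in by_pft.items():
        values = [v for _, v in items]
        if len(values) < 2:
            continue
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation
        if sd == 0:
            continue
        for rid, v in items:
            z = (v - mean) / sd
            if abs(z) > threshold:
                flagged.append((rid, pft, round(z, 2)))
    return flagged


# Hypothetical data: 19 similar shrub values and one extreme value
recs = [(f"r{i}", "shrub", 10.0) for i in range(19)] + [("r19", "shrub", 1000.0)]
print([r[0] for r in flag_anomalies(recs)])  # ['r19']
```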

Fig. 4

Frequency distribution of plant aboveground biomass (g m−2) by functional type for sample plots in The Arctic plant aboveground biomass synthesis dataset. To improve clarity, (1) the x-axis is limited to 95% of the maximum range in aboveground biomass for each plant functional type, and (2) sample plots are not shown if there was no biomass (i.e., 0 g m−2) for the plant functional type. The total number of sample plots and field sites with biomass measurements is provided for each plant functional type.

To further validate the synthesis dataset, we compared the range of total aboveground biomass values in the synthesis dataset with values reported by two prior syntheses43,58 and found they were of similar magnitudes. Gilmanov and Oechel43 reported that total aboveground biomass ranged from 3 g m−2 to 4,058 g m−2 among 56 field sites in Subarctic and Arctic ecosystems across North America and Greenland. In our synthesis dataset, total aboveground biomass ranged from 2 g m−2 to 3,123 g m−2 among 182 field sites in the same regions, excluding two forest sites in the Subarctic with 6,261 and 8,947 g m−2. Similarly, Wielgolaski58 reported that total non-tree aboveground biomass ranged from 57 g m−2 to 2,162 g m−2 across 14 field sites in Subarctic and Arctic ecosystems in the USSR and Norway44. In our synthesis dataset, total non-tree aboveground biomass ranged from 15 g m−2 to 2,344 g m−2 across 60 field sites in the same regions, excluding one site in a dense riparian shrub thicket with 8,218 g m−2. In our entire synthesis dataset, only 2.5% of field sites had total aboveground biomass greater than 4,000 g m−2 (max = 8,947 g m−2), almost all of which were Subarctic forests. Total aboveground biomass tends to be much lower in Arctic tundra than Subarctic forests, where total aboveground biomass averages ~6,000 g m−2 but can range from ~2,000 g m−2 to ~30,000 g m−2 depending on climate and disturbance history59,60.

We further examined how aboveground biomass varied for plant functional types both within and across bioclimatic zones (Table 4) as compared with previously reported patterns. However, it is important to recognize that field sites in our synthesis dataset are not random or stratified samples of these bioclimatic zones and thus summary statistics may be biased. Nevertheless, the most pronounced pattern was an increase in median shrub aboveground biomass from ~35 g m−2 in the High Arctic to ~140 g m−2 in the Low Arctic, reaching ~190 g m−2 in the Oroarctic and ~340 g m−2 in the Subarctic. Similarly, the median total aboveground biomass increased from ~340 g m−2 in the High Arctic to 1,230 g m−2 in the Subarctic. General increases in shrub and total aboveground biomass from the High Arctic to the Subarctic are well-documented macroecological patterns23,24. Also consistent with prior research23, we observed that median bryophyte and shrub aboveground biomass were consistently higher than median lichen, herb, or tree aboveground biomass, with bryophytes comprising the largest proportion of total aboveground biomass in the High Arctic and shrubs the largest proportion in the Low Arctic. However, it is important to note there is high spatial variability in the amount and composition of plant aboveground biomass among field sites in each bioclimatic zone, reflecting pronounced heterogeneity within and among vegetation communities61,62.
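Zone-level medians like those in Table 4 can be computed with a simple group-by. The Python sketch below uses hypothetical values and assumes each input row is one plot-level measurement. Whether Table 4 summarizes plots or field sites should be checked against the table itself. Unmeasured values (None) are excluded rather than treated as zero.

```python
import statistics
from collections import defaultdict


def median_by_zone(rows):
    """Median biomass (g m-2) per (bioclimatic zone, PFT).

    rows: iterable of (zone, pft, biomass_g_m2); None means unmeasured
    and is excluded from the median.
    """
    groups = defaultdict(list)
    for zone, pft, value in rows:
        if value is not None:
            groups[(zone, pft)].append(value)
    return {key: statistics.median(vals) for key, vals in groups.items()}


# Hypothetical plot-level values, not taken from the synthesis dataset
rows = [
    ("High Arctic", "shrub", 30.0),
    ("High Arctic", "shrub", 40.0),
    ("High Arctic", "shrub", 35.0),
    ("Low Arctic", "shrub", 140.0),
    ("Low Arctic", "shrub", None),
]
print(median_by_zone(rows))
# {('High Arctic', 'shrub'): 35.0, ('Low Arctic', 'shrub'): 140.0}
```

Medians are less sensitive than means to the few very high-biomass plots, which suits these right-skewed distributions.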

Table 4 Summary of aboveground biomass (g m−2) by plant functional type for each bioclimatic zone.

Usage Notes

It is important to be aware of potential uncertainties and limitations when using the synthesis dataset, including uncertainties related to quantifying plant aboveground biomass on sample plots. First, it can be challenging to establish sample plot boundaries and identify which plants are rooted inside the plot and spreading outside the plot, versus rooted outside and spreading in. Second, it can be difficult to separate aboveground from belowground biomass. This source of error could particularly impact moss biomass measurements since the transition can be difficult to discern, though can also affect vascular plant biomass measurements if belowground rhizomes that form shoot tissue are excluded. Third, if plants are highly intermixed, it can be difficult to cleanly separate aboveground biomass into taxonomic or functional groups. Lichens are particularly prone to underestimation because small filamentous lichens are difficult to separate from litter, and crustose lichens were not harvested. Fourth, since it was not feasible to harvest trees and tall shrubs on sample plots, their aboveground biomass was instead estimated using stem diameter measurements and allometric models. Individual research teams selected and applied the allometric models they deemed most suitable, though it is important to acknowledge the dearth of allometric models for most Arctic trees and shrubs. In total, about 4.4% of plant biomass measurements in the synthesis dataset were derived using this approach and are thus subject to allometric model uncertainty. Efforts to reduce measurement uncertainty and improve data quality could focus on developing new allometric models for Arctic trees and shrubs, as well as establishing good-practice guidelines for measuring plant aboveground biomass in Arctic ecosystems.

The synthesis dataset has a slight taxonomic bias towards vascular plants over non-vascular plants. Specifically, herb, shrub, and tree biomass were measured on 89–97% of sample plots, but lichen and bryophyte biomass were measured on 67–72% of sample plots. This is likely due to greater research emphasis on vascular plants and, as discussed above, challenges with measuring lichen and bryophyte biomass. We encourage researchers to measure biomass for every plant functional type found on their sample plots whenever possible.

The geolocation accuracy of the field measurements should be considered when using the synthesis dataset for geospatial analyses. For each dataset, we assembled the best available coordinates, resulting in plot-level and site-level coordinates for 72% and 28% of measurements, respectively. Additionally, plot and site coordinates were determined using a variety of GPS units, with accuracies ranging from <1 meter to tens of meters. If necessary, users can filter the biomass measurements by coordinate type (i.e., plot or site), though we caution that not all plot-level coordinates may be suitable for geospatial analyses that require meter or submeter accuracy.

The synthesis dataset includes plant biomass measurements from across the Arctic (e.g., Figs. 1, 2a,b); however, there are geographic biases and gaps in data coverage. The distribution of sample plots was biased towards northern Europe (33%) and Alaska, USA (25%), with much lower density of sample plots across Canada (20%), Russia (16%) and Greenland (6%). Regions with data gaps include large parts of northern Canada, the Taimyr Peninsula and Chukchi Peninsula in Russia, and most of Greenland. These general regions have been identified as undersampled in prior assessments of geographic sampling biases in Arctic terrestrial research63,64,65. Regional and bioclimatic biases and gaps in existing field measurements of plant biomass could be quantitatively assessed using the synthesis dataset, which could help strategically prioritize future efforts to measure and monitor ecological changes occurring in the Arctic.

The time periods represented by the synthesis dataset should also be considered. Plant biomass was measured on sample plots between June and early September from 1998 to 2022 (Fig. 2c,d), with about two-thirds of sample plots measured after mid-July. In tundra ecosystems, total plant aboveground biomass tends to reach a summer maximum between mid-July and late-August depending on growing season conditions, vegetation composition, and herbivory66,67,68. We estimate that plant biomass measurements made after mid-July are likely within ±15% of the summer maximum based on seasonal changes in plant aboveground biomass measured on sample plots in the Oroarctic68, Low Arctic67, and High Arctic66. Plant biomass measurements made before mid-July likely underestimate the summer maximum to a greater degree. When using the synthesis dataset, plant biomass measurements can be temporally filtered to fit the research needs.
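A temporal filter of this kind is straightforward. The Python sketch below treats "mid-July" as 15 July, which is an assumption chosen for illustration. Users can adjust `cutoff_day` to match their own tolerance for pre-peak measurements.

```python
from datetime import date


def measured_after_mid_july(sample_date, cutoff_day=15):
    """True if a plot was sampled on or after July `cutoff_day` of its year.

    Treating 15 July as "mid-July" is an assumption. Comparing calendar
    dates avoids the one-day shift a fixed day-of-year threshold has in
    leap years.
    """
    return sample_date >= date(sample_date.year, 7, cutoff_day)


dates = [date(2019, 7, 2), date(2019, 7, 20), date(2020, 8, 14)]
print([measured_after_mid_july(d) for d in dates])  # [False, True, True]
```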

We included as many individual datasets across the Arctic as possible within the time allocated to this work but acknowledge the synthesis does not include all existing datasets. We prioritized datasets from observational studies carried out in the 21st century where plant biomass was separately measured for every functional type and where sample plots were accurately geolocated. In some cases, it was not possible to obtain access to datasets, or incorporate datasets that very recently became available69,70. We programmatically created the synthesis dataset using custom R scripts, and thus the synthesis dataset could in the future be updated to include additional datasets and other refinements.

The Arctic plant aboveground biomass synthesis dataset can be used for a variety of ecological applications that include monitoring, mapping, and modeling spatial patterns and temporal changes in plant biomass. Sample plots in the synthesis dataset could serve as ecological baselines for long-term monitoring and experimental manipulations (e.g., warming chambers, herbivore exclosures), or be used to analyze geographic biases and gaps in existing field data63,64. These field data could be linked with satellite or airborne observations to create maps of plant biomass that can be used for carbon accounting71, land use planning29, terrestrial ecosystem model evaluation72, and other ecological applications5,39. These field data can also be directly used to evaluate and improve terrestrial ecosystem models and their simulations of Arctic ecosystem response to climate warming36,73,74,75. Overall, The Arctic plant aboveground biomass synthesis dataset is a unique dataset suitable for many ecological applications.