Environmental drivers of increased ecosystem respiration in a warming tundra

Arctic and alpine tundra ecosystems are large reservoirs of organic carbon1,2. Climate warming may stimulate ecosystem respiration and release carbon into the atmosphere3,4. The magnitude and persistency of this stimulation and the environmental mechanisms that drive its variation remain uncertain5–7. This hampers the accuracy of global land carbon–climate feedback projections7,8. Here we synthesize 136 datasets from 56 open-top chamber in situ warming experiments located at 28 arctic and alpine tundra sites which have been running for less than 1 year up to 25 years. We show that a mean rise of 1.4 °C [confidence interval (CI) 0.9–2.0 °C] in air and 0.4 °C [CI 0.2–0.7 °C] in soil temperature results in an increase in growing season ecosystem respiration by 30% [CI 22–38%] (n = 136). Our findings indicate that the stimulation of ecosystem respiration was due to increases in both plant-related and microbial respiration (n = 9) and continued for at least 25 years (n = 136). The magnitude of the warming effects on respiration was driven by variation in warming-induced changes in local soil conditions, that is, changes in total nitrogen concentration and pH and by context-dependent spatial variation in these conditions, in particular total nitrogen concentration and the carbon:nitrogen ratio. Tundra sites with stronger nitrogen limitations and sites in which warming had stimulated plant and microbial nutrient turnover seemed particularly sensitive in their respiration response to warming. The results highlight the importance of local soil conditions and warming-induced changes therein for future climatic impacts on respiration.

the climate data from the depth closest to the most frequent depth across all datasets, i.e. using the data-contributors' own protocols (e.g., pin-pointing, cover estimation).Aboveground biomass was based on actual biomass harvest data (19 datasets) as well as biomass estimated from vegetation community cover data (42 datasets).Height reflected mean community height based on field estimates (33 datasets followed our suggested protocol, 12 datasets followed their own protocol).Similarly as for the soil data, vegetation data were measured in the same year as, or the closest year in time to, the ER measurements per dataset (Supp.Discussion 5).

(d) Microbial community: Bacterial abundance, Fungal abundance, and FB-ratio
Microbial drivers included plot-level proxies for bacterial and fungal biomass and derived fungal-bacterial (FB)-ratios.These proxies were based on taxon-specific quantification of marker genes.All microbial data were collected according to a standardized protocol in 2013 in 8 of the experiments covered by our meta-analysis, which could be related to 16 ER measurement years (datasets).Details on the sampling process and microbial analyses can be found in Jeanbille et al. (2021) 1 .

(ii) Context-dependencies
We collected information to compare the influence of variation in context-specific environmental conditions between the experiments on the ER response to warming in two different ways.First, similarly as for the indirect warming effects, we derived drivers from field measurements provided by data contributors, but we averaged climate and soil values from the control plots only here.Drivers quantifying context-specific environmental conditions related to the (a) Climate included Air temperature, Soil temperature and Soil moisture.Drivers related to the (b) Soil conditions included SOM, TC, TN, C:N-ratio, pH, BD, OLdepth (Ext. Table 4, Fig. 4-5).See details in i) Indirect warming effects above.Hence, these drivers comprised plot-specific data with a high level of detail, but at the same time they were not available from all experiments, lowering the sample size and power of these drivers in the meta-regressions.We supplemented these plot-specific drivers with map-derived values (e.g.soil C stock) or categorical values (e.g.soil moisture class) at the experiment level.This was because we aimed to investigate context-dependencies across all experiments, and to also include context-dependencies related to the vegetation communities for which no 'control' measurements were available that were comparable across all experiments.Specifically, for (a) Climate, we added Zone and permafrost (PF-probability); for (b) Soil, we included soil C stock, soil moisture class, and soil pH class, and for (c) Vegetation, we included Vegetation Class, and Net Primary Productivity.As we did not want to mix these two data sources, there is some overlap for a few drivers reflecting similar conditions but with greater or smaller sample sizes and respectively coarser or more specific data quality (i.e.soil moisture and pH).See below for details.See Supp.Table 5 for sample sizes for each of the context-dependent drivers.

Zone
Data contributors classified their experiment into one of three bioclimatic zones: alpine, low Arctic or high arctic tundra.Alpine zones refer to tundra growing on altitudes >1000m, whereas low Arctic and high arctic zones refer to the bioclimatic subzones from the Circumpolar Arctic Vegetation Map D-E and A-C respectively 2 .

Permafrost probability
For each experiment, we used a map-derived 1 km² value for probability of permafrost based on the map in Obu et al. (2019) 3 .

Soil C stock
We used a map-derived value for soil organic carbon (SOC) stock (tons ha -1 ) at 0-30 cm soil depth at the experiment level.We combined two SOC sources, i.e.SOC_global, based on the global SOC map (see FAO & ITPS (2018) 4 ) and SOC_circumpolar, based on the circumpolar soil map (see Hugelius et al. (2013) 5 ).SOC_global was used as default and SOC_circumpolar was used if the former was unavailable for the experiment.We preferred SOC_global over SOC_circumpolar because the former is more recent and more precise in terms of spatial accuracy.

Soil moisture class
We included soil moisture as a categorical driver, i.e. data contributors classified their experiment into one of three zones: dry, mesic, or wet.Soil moisture classes were broadly defined as wet (partly or continuously saturated, poor drainage, high water table), mesic (moist but not saturated, drains well, but never dries out), and dry (water drains or evaporates quickly leaving soils dry for most of the time).

Soil pH class
We additionally included soil pH as a categorical driver, i.e. data contributors classified their experiment into one of three pH classes: low (<5), medium (5-7), high (>7).

Vegetation class
For each experiment, we classified the vegetation into one of five broad categories of tundra systems (Barrens (B), Graminoid tundra (G), Prostrate shrub tundra (P), Erect shrub tundra (S), and Wetlands (W)).Data contributors classified their experiment into the most suitable Circumpolar Arctic Vegetation Map (CAVM) class 2 .For those experiments that were not included in the CAVM map (e.g.alpine tundra), we used a combination of data to determine the most comparable CAVM class, i.e. (i) broad description of the vegetation type by data contributors, (ii) vegetation cover of key functional groups, and (iii) soil moisture class.

NPP
For each site, we used a map-derived value of observed NPP (kg C m -2 year -1 ) obtained through the MODIS satellite, see Running & Zhao (2019) 6 .

(iii) Driver measurement years
Note that, in most cases, the vegetation, soil, and microbial data were measured in the same year or within 3 years from the ER measurements.For instance, 90% of the submitted '%Graminoids' driver data, 60% of 'TN', and 38% of 'FB_ratio' were measured in a year

Supp. Discussions
In this section, we discuss the supplementary analyses that we performed to evaluate the robustness of our methodology and results, specifically related to quantifying the ER response to experimental warming with OTC chambers (methodology), and investigating the drivers of variation in the ER response (meta-regression results).The meta-analysis results (see Ext. Table 1) show two key findings.First, investigating ER responses to warming based on the subset of 9 datasets confirmed that the response was similar (in sign and magnitude) to the ER response based on the 136 datasets used in our main study (see Ext. Table 1), and we therefore assume that the Ra and Rh response to warming from the subset is representative for the full dataset as well.The increase in total ecosystem respiration (ER) observed in the smaller metaanalysis is similar to that in the larger dataset, although slightly higher: 39% compared to 30% (Ext.Table 1).The overlap within their confidence intervals, however, strongly suggests that the results are within a comparable range.Second, the strong increase in ER with warming based on the subset of 9 datasets, appeared driven by increased plant-related and heterotrophic respiration, as both show strongly positive Hedges SMD and ROM as well.AIC values of all possible factor combinations, and showed no significant effects (p>0.05) of any of the drivers.We also ran single-factor models including all these drivers separately, for which we also did not find significant effects of any of the drivers.
We therefore conclude that the ER response was not affected by measurement methodology, and did not include these factors in our further meta-analysis orregression models as confounding factor.

Supp. Discussion 3: Potential interaction of OTC microclimate effects
Surprisingly, we did not find that changes in the microclimate caused by the OTCs (i.e., higher air and soil temperature, lower soil moisture; Supp.Table 3) were related to the ER response (Ext.Table 3).Because potential stimulating effects of higher soil temperatures on ER might have been masked by potential negative effects of soil drying occurring at the same time, we tested the interaction between soil temperature and moisture on the ER response with a two-factor meta-regression model (mods = ~ soil temp * soil moist).This model did not show a significant main or interactive effect, and we therefore conclude that there is no interaction masking possible microclimatic effects on the ER response.

Supp. Discussion 4: Sensitivity analysis meta-regressions for 'indirect warming effects' and 'context-dependencies'
To analyze relationships between a range of potential environmental drivers that represented indirect warming effects and context-dependencies, and the ER responses, we used plot-level field measurements reflecting soil, vegetation and microbial community conditions from the same experiments as the ER data, provided by the data contributors (only where these were available, see Supp.Table 2).The remoteness of many of these tundra field experiments often hampers regular, manual measurements in the field.In addition, the measurement of some environmental drivers is too labor intensive, destructive or specialized to be performed repeatedly with high frequency across all experiments.Data contributors were thus asked to provide as many environmental driver data as they had available, and to match these as closely in time as possible with their ER data.Still, our environmental data on the soil, vegetation, and microbial community drivers had the following characteristics that could have influenced the meta-regression results: 1) driver data were not always available for the same year as the ER data (i.e.inducing differences in years between ER and driver data), and 2) they were not always available from multiple years per experiment (i.e.inducing replicated driver data).Data contributors were asked to match their environmental driver data as closely in time as possible with their ER data.
These characteristics led to case-specific scenarios, depending on data availability per site, with some experiments having more accurately temporally matched ER-driver data than others, e.g. 3 years of ER data coupled to 3 years of a measured driver, vs.
3 years of ER data coupled to 1 (same) year of a measured driver.
We performed a sensitivity analysis to see how this case-specific matching in years between ER and environmental driver data might have affected our meta-regression results.We therefore re-ran all meta-regression models with p<0.1 with different scenarios to increasingly restrict our sample sizes related to the two characteristics that might have caused bias, i.e. difference in calendar years between ER and drivers, and replication of drivers to be coupled to multiple ER datasets (years).We then compared the model output (i.e.QM, slope (beta), p-value, and sample size) across these scenarios with the output from the full or unrestricted model (Supp.Figure 4).
The 7 restrictive scenarios included: A) differences in calendar year between the ER and the driver is maximum 0 (i.e., measured in exactly same year), 1, 2, 3, 4, or 5 years difference; and B) driver data cannot be replicated, and are only coupled to the ER year closest to the driver year.See below* for the overview of the number of datasets per environmental soil, vegetation, and microbial driver in the 8 sample size scenarios (7 restrictive + 1 unrestrictive).
This sensitivity analysis showed that overall, the slopes of the model (Beta) did not appear strongly influenced by the sample size (restriction), especially for most of the drivers that are significant in the full (unrestricted) scenario and reported and presented in the main text (Supp.Figure 4b).This suggests that the direction of the relations between drivers and ER response to warming are robust.That is, not affected by the discrepancy between measurement years of ER and driver data or by replication.On the other hand, QM values, i.e. the 'importance of moderator' reflecting the power of a driver in explaining the ER response, as well as the p-values were strongly driven by the sample size (restriction), where QM was higher, and p-values more significant for the fullest model for most of the drivers that are significant in the full (unrestricted) scenario.This implies that the models were less powerful to detect effects due to decreasing the sample sizes in more restrictive sample size scenarios.
We therefore conclude that using restrictive scenarios would compromise the power to detect relations in our database from field-based tundra warming experiments too strongly, including multiple labor intensive, destructive or specialized driver measurements.At the same time, using the full (unrestricted) database for the metaregressions does not appear to influence the relations (Beta) for the significant drivers.
We therefore deem the meta-regression results based on the full database the best, most realistic representation we were able to get from field measurements taken in the actual tundra warming experiments under study and therefore reported and interpret these in the main text.ER data from 56 independent OTC experiments was used in the analyses, where each experiment contributed between 1 and 13 measurement years of ER data (Supp. Table 2, Supp.Fig. 1), and between one and multiple measurement occasions each year (Supp.To accommodate for differences in precision related to the number of ER measurements across the growing season, we calculated Hedges' g Standardized Mean Difference (SMD), which is an effect size that divides the differences in means of both comparison groups (control and OTC) by the pooled standard deviation.
Hence, it is accompanied by a confidence interval that reflects the precision of the effect size for each dataset.In this way, datasets that were more precise, i.e. based on more measurements per season, weighed more in the meta-analyses andregressions.
To accommodate for differences in precision related to the number of ER years contributed per experiment, we repeated our meta-analysis on a subset excluding the Here, we investigated within-experiment temporal patterns in the variation of the ER response to warming, in addition to the temporal patterns observed across experiments that we reported in the main text (Fig. 3).Thereto, we extracted the slopes of linear regression models testing the effect of warming duration on the ER response for each experiment, both (a) across and (b) within age classes.Then, we performed meta-analyses testing whether overall the slope was significantly different from zero, which would indicate significant overall temporal trends within experiments.
The results are presented in the table below*.We visualized these linear regressions in Supp.Fig. 5.  From left to right, the columns show the dataset number (Nr), an alphabetically + chronologically ordered number per dataset ('DS') consisting of the unique experimental code and the year of flux measurement (Exp_ID_Year'), the total number of ER measurements or observations available for each dataset ('Tot Obs'), the number of day-observations used for each dataset ('Day Obs'), the number of OTC and control plots where ER measurements were taken ('OTC Plots', 'CTL Plots'), and the weights per dataset as used in the main meta-analysis.
Supp.From left to right: The type of environmental driver ('Type') and the specific driver ('Driver') investigated, the effect of the warming treatment on the driver (i.e.meta-analysis results: 'Slope') for two effect sizes: Hedges Standardized Mean Difference (SMD, left) vs. Raw Mean Difference (RMD, right).Model results are presented as meta-analysis slopes and 95% confidence intervals between square brackets, as well as significance levels based on p-values.Significant drivers and results are highlighted in bold.An upward (↑) or downward (↓) arrow before the model estimates indicates that the warming treatment increased or decreased the environmental driver.Sample sizes can be found in Supp.Table 5 ('Context-dependencies').Note that we do not include raw mean differences models for the vegetation community ('NA') because the methodologies across experiments to measure the %cover, biomass, or plant height differed too much, so comparing raw values is not warranted.From the ISRIC soil database, we sampled randomly 0.1% from the ca 10 million gridded datapoints within the study area.The convex hull shows the areas which the observations cover, ca 64% of the gridded values.

*
Overview of depths at which soil temperature and soil moisture data were provided:

maximum 3 Supp. Methods 2 : 1 . 1 km grid cell, 2 .
years different (before or after) the ER measurement year (90/100; 26/43; 6/16, respectively).See Supp.Discussion 4 for the full overview of sample sizes, when applying different restrictive scenarios to limit the max.acceptable differences in years between ER and environmental driver data, and for how we assessed to what extent (mis-)matches between the measurement years of ER versus soil, vegetation, or microbial data might have influenced our results.Methods for the upscaling Flow diagram of the upscaling method: Dark brown boxes indicate the layers that were used, which were based on the variables that were included in the significant model to predict ER response.In yellow, the processing steps are indicate, i.e.: Aggregation of the layer products to mean for each 1 km x Using information about the abundance of mineral layers for each grid cell, 3. Weight averaging the products obtained in 1 and 2, Next, 4.To estimate the uncertainty derived from the soil data, we used truncated multivariate random normal distribution generator to create 100 maps for both TN and C:N-ratio.Further, to include the uncertainty of the C:N-ratio estimates from SOC and TN instead of C:N-layer, we used measured values from our dataset, fitted 100 lines to it and used this to estimate the error derived from this assumption.Then, 5. Using the ROM model from our study, we predicted the RE response to warming for each n = 100 set of TN and C:N-ratio map.Finally, 6.The resulting n = 100 set of predicted values and associated standard errors were then combined to a mean and standard deviation for each grid cell.For a detailed description, see the Methods.

Fig 2 ).
Using seasonally averaged ER data based on different measurement occasions, and using different number of datasets (ER measurement years) per experiment across the analysis implies different precision in the data across the experiments.

Supp. Discussion 6 :
data from two 'outlier' experiments.The experiments ALA_1 and GRE_6 contributed data from many more measurement occasions per season (16273 and 1333 resp.)and/or from more years (11 and 13 years) than other experiments and therefore may have weighed disproportionally strong on the results.The results were similar as for the full dataset: mean pooled effect size of 0.59 (Hedges SMD, 95% CI [0.47, 0.72], N=112, p <0.001) as when all experiments and years were included: mean pooled effect size of 0.57 (95% CI [0.44-0.70],N=136, p<0.001).Temporal patterns in ER response to warming within experiments

Supp. Figure 3 :Supp. Figure 4 :Supp. Figure 7 :Supp. Figure 9 :
Trends (p<0.1) in the effects of flux measurement machine type (a) and OTC height (b) on ER response rates.Machine type represents whether ER measurements were taken with an infrared 'IR' or 'laser' machine.OTC height represents the height of the open-top-chambers used for experimental warming.ER Hedges Standardized Mean Differences (SMDs) for individual datasets are displayed with grey bubbles, calculated as [mean ER of the warmed plots -mean ER of the control plots] / pooled standard deviation).Bubble size denotes the weight of the observation used in the meta-regression models quantified as the inverse of the square root of within-study variance, with greater bubbles indicating greater weights.In (a), violin plots of the actual data across the two factor levels are displayed, i.e., kernel density estimation of the underlying distributions.Within the violin plots, meta-regression model estimates and 95% confidence intervals are displayed with yellow circles and error bars.In (b), the significant regression lines with 95%CI are displayed with black lines and grey shaded areas, respectively.Top left in each panel shows the sample size ('N', number of datasets), and bottom left shows the 'Qm' (Q-value of importance of the environmental drivers) and 'p'-value of the meta-regression models.Dotted horizontal lines (y=0.2, 0.5, 0.8 and -0.2, -0.5, -0.8) reflect small, medium, and large positive and negative Hedges SMD effect sizes, respectively 12 or subsequently greater ER increases and decreases with warming.Results of the sensitivity analysis on meta-regression models for drivers showing significant effects on ER Hedges SMD, and those showing trends in the latter.Significant environmental drivers based on the full model are presented on the x-axis with a red frame, trends without a frame.Different dots per driver represent different restrictive sample size scenarios, increasing in restrictiveness from right to left: the full model most right, followed by 6 scenarios where calendar year differences between ER measurements and driver measurements are restricted to maximum 5, 4, 3, 2, 1, 0 years, and the most left scenario represents no replication of driver measurements, including driver measurement only for the ER year closest to the driver year).Panel A shows the Qm-value per model per scenario, which represents the explanatory power of each driver, while Panel B represents the beta or (meta-)regression parameter for each driver.Dots are colored by sample size (nr. of datasets) in panel A, and by p-value of the meta-regression model in panel B.TN, CN, SMD, BD, and BM reflect total nitrogen, carbon:nitrogen ratio, Hedges Standardized Mean Differences, bulk density, and aboveground biomass.Spatial patterns of current ecosystem respiration (ER) (a) and its responses to 1.4°C warming (b-d) across the arctic and circumarctic alpine tundra, with uncertainty estimates (e, f).Current ER (a) and ER after 1.4°C warming (b), i.e. the average observed warming in the open-top chamber (OTC) warming experiments in our database, were compared to calculate absolute (c) and relative (d) changes in ER.Uncertainty estimates are given as standard error for the absolute change in ER (e) and as relative standard error for ER (standard error compared to the change in ER) (f).To conduct the upscaling, we used the significant meta-regression multi-factor model based on the warming experiments, i.e. including the ratio of means (ROM) as effect size (response) and TN and C:N-ratio in the mineral layer (Qm=6.7,p<0.05,N=39) as significant explanatory drivers , and applied this model to each 1 km x 1 km grid cell using global gridded soil data for TN concentration and C:Nratio of the mineral layer, resulting in a relative change in respiration.We also multiplied this relative change to spatially explicit baseline ER, calculated from spatially explicit global soil respiration and plant respiration databases, to obtain absolute changes in ER.For the whole region, ER increased from 3.4 to 4.3 PgC yr-1 (increase of 0.86 PgC yr-1 with std.deviation of 1.36 PgC yr-1) or by 25% (std.deviation = 40%).See Methods section and Supp.Methods 3 for more information.Comparison of observed TN and C:N-ratio (red, based on our experimental dataset) and gridded mean TN and C:N-ratio (blue, from ISRIC soil database) used for upscaling.

Supp. Discussion 1: ER partitioning: Autotrophic (Ra) vs. Heterotrophic respiration (Rh) response to warming
To understand whether the effect of warming on ER originated primarily from responses of plant or microbial respiration, we collected additional data from experiments used in our main analyses for which ER was partitioned into autotrophic (Ra) and heterotrophic respiration (Rh).

Discussion 5: Unbalanced ER time series across experiments
in 7 restrictive sample size scenarios and the full model.From left to right, scenario of no 278 replication of driver measurements, i.e.only including driver data once and for the ER year

01. Significant results with p<0.05 are highlighted in bold.
years, while a the ER response significantly increased with warming duration in age class[10-15)years.As a comparison, when evaluating the temporal patterns across experiments (main text), we found similar results, except for a significant decreasing trend in age class [5-10) years, which is not found here within experiments.It should be noted that these within-experiment temporal analyses, especially within age classes and for longer warming duration, are based on very small sample sizes (see table* above).Moreover, there are currently only two experiments with long time series of 11 and 13 years of subsequent ER data (ALA_1, GRE_6), thus covering multiple age classes, while all other experiments provided 1-4 measurement years of text), we found no significant effect.The results (b) within age classes showed no significant relation of ER response with warming duration in age classes [0-5),[5-10), and ≥ 15 ER data only (see table below*), often covering only single age classes.While the above within-experiment analyses thus strengthen rather than weaken our conclusion that experimental warming causes a continued increase in ER, we argue that the analyses across experiments, as reported in the main text, are more reliable representations of the overall the temporal patterns of ER responses to warming.We have therefore based our conclusions using exclusively the across-experiment temporal analyses.*Overview of the variability in time series length of ER years (i.e.Nr of repeated ER measurement years) within an experiment.

136 ER measurement years or datasets General information on the datasets included in the meta-analysis.
From left to right: country ('Country') and site ('Site') where the warming experiment is located; unique identification code of the warming experiment ('Exp ID') with the three first letters representing the country code; location in decimal degrees of the experiment ('Location'); environmental conditions quantifying the climate (climate zone or 'Zone', permafrost probability or 'PFprob'), the soil ('Soil Moisture class', 'pH class', and 'Soil C stock'), and vegetation ('Vegetation class', 'NPP'); specific ER measurement years as well as the number of years per experiment ('ER years', 'Nr ER years'); and the experimental warming duration in years at the time of ER measurements ('Warming duration').See Supp.Methods 1 for details on the environmental drivers.Countries with multiple Sites are highlighted with a grey background, as well as Sites with multiple Experiments (i.e.reflecting different vegetation communities or site conditions), and Experiments with multiple ER measurement years (i.e.'Datasets' throughout the manuscript)