## Introduction

Terrestrial net ecosystem carbon exchange (NEE) currently absorbs the equivalent of approximately 25% of all anthropogenic CO2 emissions1 and plays a significant role in regulating the variability of the global carbon (C) cycle2,3,4. Despite the importance of terrestrial NEE, its response to climate is a major source of uncertainty in future climate predictions5. Terrestrial NEE represents the small imbalance between CO2 assimilation through gross primary production (GPP) and CO2 release through total ecosystem respiration (TER). GPP and TER are coupled over the long term through the distribution of carbon assimilated to ecosystem carbon pools and their subsequent turnover leading to TER. Yet, GPP and TER can be decoupled on temporal scales going from years to centuries if one of these fluxes is perturbed by environmental conditions, and small decoupled variations in GPP or TER fluxes can result in large variations in NEE. Studies have shown that inter-annual variability (IAV) of terrestrial NEE and its sensitivity to climate has increased during the past 50 years3,6,7, but whether such an increase in NEE variability is due to the climate sensitivity of ecosystem production or respiration remains difficult to determine8,9,10,11.

A rich history of multiyear site-based data has revealed that the sensitivity of ecosystem production to precipitation decreases as water availability increases12,13,14,15,16,17. Global analyses suggest that fluctuations in global land NEE are either due to water-controlled production in dry land ecosystems18,19, or due to temperature-controlled respiration in tropical ecosystems3. These competing hypotheses, i.e., water vs. temperature, were recently reconciled by Jung et al.20, who found that water-driven GPP and TER responses compensate each other, dampening water-driven NEE variability regionally and therefore leaving a dominant temperature signal at the global scale. However, previous conclusions were based either on ecosystem-scale measurements, on the water availability and NEE simulated by global vegetation models, or on data-driven empirical models used to extrapolate NEE globally, which are difficult to verify at the regional scale. Therefore, we still lack a good empirical understanding of the climate sensitivity of ecosystem production and respiration and their consequences for net terrestrial carbon dynamics at regional scales21. Using a dense network of well-constrained observations across the contiguous United States (CONUS), we began by testing whether the widespread assumption that production is the primary control on IAV of NEE holds at the continental scale. We then investigated what processes control the IAV of NEE and how they respond to water availability. Finally, we studied whether state-of-the-art dynamic global vegetation models (DGVMs) can capture the production and respiration dynamics in response to continental-scale water availability patterns.

We first calculated the per-pixel temporal correlation between IAV of gridded observation-based fluxes (e.g., detrended GPP and NEE, using GPP derived from MODIS observations of the fraction of light absorbed by plants with a light-use efficiency model (GPPMODIS) and mean NEE from atmospheric CO2 inversions constrained by a dense network of atmospheric CO2 concentration observations, given atmospheric transport models (NEEACI)), and at the ecosystem scale using a dense network of 17 eddy covariance sites from across the representative biomes in CONUS (see Methods: Temporal correlation between IAV of GPP and NEE). Then, we compared the sensitivity of GPP and TER derived from these observation-constrained estimates with the results from an ensemble of ten DGVMs. Here precipitation was chosen as the main controlling variable for carbon fluxes, based on previous analyses13,14,15 and our own sensitivity analysis (see Methods: Sensitivity analysis). Precipitation is a simple measure of ecosystem water availability that is accurately measured across CONUS, and there are several lines of evidence for strong relationships with ecosystem production in this region15. Temporal sensitivities (δt and γt) were calculated from linear regression models in which ecosystem carbon fluxes (i.e., GPP or TER) are regressed against climate factors (i.e., precipitation and temperature) that varied over time. These values of δt and γt indicate the apparent sensitivities of carbon flux anomalies to a unit change in a climate factor for a given ecosystem over time14. In addition, spatial sensitivities (δs and γs) were calculated from nonlinear models based on ecosystem flux data combined from grid cells or ecosystem sites and used to indicate the apparent sensitivities of mean carbon fluxes to a unit change in a climate factor along climate gradients, generally across different ecosystem types14.
Lastly, we conducted a set of simple respiration modeling experiments of increasing complexity based on empirically derived models of heterotrophic respiration (see Methods: Ecosystem respiration modeling experiments) in order to allow a process-oriented evaluation of DGVM results. We show a precipitation threshold between 750 and 950 mm yr−1, below which the IAV of NEE is regulated by ecosystem production and above which the IAV of NEE appears to be regulated by ecosystem respiration across CONUS. This precipitation threshold is evident across multiple datasets and scales of observation, but is not captured by DGVMs, likely due to inaccurate simulation of the response of heterotrophic respiration to environmental constraints.

## Results and Discussion

### Spatial and temporal correlation between GPP and NEE

Contrary to the finding that ecosystem production is the primary factor controlling continental-scale variations in net carbon exchange19,22,23, we find that mean annual GPP and NEE occupy different climate spaces (Supplementary Fig. 1) and do not necessarily covary spatially (Fig. 1a, b) or temporally (Fig. 1c) at regional scales. First, looking at spatial variations, ecosystem GPP is much more strongly controlled by mean annual precipitation (MAP; r = 0.93, p < 0.001) than by mean annual temperature (MAT; r = 0.38, p < 0.001), and increases with precipitation (Supplementary Fig. 2a), such that the highest mean annual GPP appears in the relatively warm and wet southeastern United States (Fig. 1a). In contrast, the largest NEE (i.e., a strong carbon sink) is found at intermediate levels of MAP (~750–1200 mm yr−1), and NEE then decreases at both higher MAP (>1200 mm yr−1) and higher MAT (>20 °C) (Supplementary Fig. 2b), such that the highest mean annual NEE appears in the relatively cool and wet North Central United States (Fig. 1b). The spatial inconsistency between patterns of observed mean annual GPP and NEE suggests that the most productive ecosystems do not necessarily have the largest NEE uptake. This happens because NEE reflects not only current GPP but also the legacy of past ecosystem exposure to climate, the effect of management such as biomass harvest, and disturbances, all of which spatially decouple annual NEE from GPP.

To test the commonly held assumption that photosynthesis is the main process regulating inter-annual NEE24, we calculated Pearson’s product moment temporal correlations between four independent GPP proxies and two NEE estimates at 1° spatial resolution from 2000 to 2014 across CONUS (n = 8 data products). Mean Pearson’s r from the resulting eight gridded observation-based fluxes showed that there is a significant positive correlation between GPP and NEE in more xeric (e.g., semiarid western grassland or shrubland) ecosystems, but only a very weak correlation in more mesic (e.g., eastern deciduous broadleaf forest) ecosystems (Fig. 1c). This sharply contrasting pattern is independent of the combination of GPP and NEE datasets used (Supplementary Fig. 3) and of their spatial or seasonal resolution (Supplementary Fig. 4), which suggests that this relationship is not an artifact of the observationally constrained dataset used. Wildfire CO2 emission (Supplementary Fig. 5) and human activities, such as agriculture (Supplementary Fig. 6) also did not substantially change the temporal correlation between GPP and NEE along the precipitation gradient in the CONUS. In contrast, mean Pearson’s r between GPP and NEE from the ensemble of TRENDY DGVM simulations (n = 10 models) showed a universally positive temporal correlation across the CONUS that was stronger in more xeric western ecosystems (Fig. 1d). Mean Pearson’s r from TRENDY DGVM simulations became higher than that from observation-constrained estimates in all regions where MAP is >~750 mm yr−1 (Fig. 1e). The varying strengths of GPP and TER controls on IAV of NEE along the continental precipitation gradient appear to be a robust pattern among observational datasets that is not well captured by DGVMs, in which NEE remains highly coupled to GPP across the entire CONUS.

### NEE sensitivity along the precipitation gradient

To explore the underlying processes that control NEE sensitivity along the precipitation gradient, we compared the spatial and temporal sensitivity of production and respiration to climate controls. Regarding temporal sensitivities, we found that the IAV of both GPP and respiration is primarily driven by precipitation (Supplementary Fig. 7 and Supplementary Table 1) and thus focused our analysis on their sensitivities to precipitation (Fig. 2). Both observational datasets showed decreasing sensitivities of GPP ($$\delta _{\rm GPP}^t$$ and $$\delta _{\rm GPP}^s$$) and TER ($$\delta _{\rm TER}^t$$ and $$\delta _{\rm TER}^s$$) in response to increasing precipitation, but the slope of GPP sensitivity is steeper than that of TER (Supplementary Figs. 8, 9 and Supplementary Table 1). This results in a precipitation threshold below which the IAV and local spatial gradients of NEE are controlled by GPP (more xeric ecosystems; ∆δt > 0 or ∆δs > 0) and above which they are controlled by respiration (more mesic ecosystems; ∆δt < 0 or ∆δs < 0, Fig. 2). This precipitation threshold is the highest for temporal sensitivity using gridded observation-based fluxes (i.e., inversions of NEE and gridded GPP data products) (δt: MAP = 950 ± 90 mm yr−1, Fig. 2a) and lowest for spatial sensitivity using EC observations (δs: 750 ± 75 mm yr−1, Fig. 2b). The different precipitation thresholds between different observations may be due to data uncertainties in the large-scale gridded observation-based fluxes. The results also indicated that gridded observation-based fluxes show lower sensitivity than the EC observations (for both δt and δs), likely because they average out ecosystem-scale variability. Furthermore, the spatial sensitivity (δs) is larger (a steeper slope) than the temporal sensitivity (δt) (Fig. 2). This is likely because δs reflects gradients of different vegetation types across precipitation, while δt only includes the short-term temporal response of fluxes to precipitation variability14.
The legacy effect of the previous year’s precipitation on current-year production may also contribute to the lower δt (refs. 25,26). The sensitivity of GPP to precipitation decreases from dry grass/shrub ecosystems to wet forest ecosystems, which is also consistent with independent ecosystem-scale measurements13,15.

Although TRENDY models also simulate the decreasing sensitivity of GPP and TER with increasing precipitation (Supplementary Fig. 10), they do not appear to show the same sensitivity threshold behavior as the observation-based fluxes. The $$\delta _{\rm GPP}^t$$ is always higher than $$\delta _{\rm TER}^t$$ in the TRENDY models across the CONUS (Supplementary Fig. 10), and DGVMs appear to overestimate the sensitivity of GPP to precipitation in the more mesic ecosystems of CONUS. Thus, in more mesic ecosystems, DGVM simulations show that GPP is the dominant control on terrestrial NEE variability, while observationally constrained estimates show that TER is the dominant control on terrestrial NEE variability. This threshold in precipitation and the model–data mismatch are also evident when looking at the fraction of precipitation being lost as evapotranspiration, indicating that water surplus may cause a shift in NEE variability to more respiration control (Supplementary Fig. 11).

### Potential mechanism for the data–model mismatch

We then explore which ecosystem process—production or respiration—leads to this model–data mismatch. The per-pixel Pearson’s temporal r between GPPMODIS and TRENDY GPP (GPPTRENDY) is universally positive in the CONUS (Supplementary Fig. 12), indicating that the mismatch is not likely due to GPP but rather due to TER in the more mesic ecosystems. Indeed, our results indicate that TER inverted from gridded observation-based fluxes (TERinv = GPPMODIS–NEEACI) shows a significant temporal correlation with TRENDY TER (TERTRENDY) in the more xeric ecosystems where MAP < 750 mm yr−1 (r = 0.72, p < 0.001) (Fig. 3a), but less of a correlation in the more mesic ecosystems where MAP > 750 mm yr−1 (r = 0.302, p > 0.1) (Fig. 3b). This suggests that DGVM TER simulations may be less realistic in more mesic regions than in more xeric regions and thus respiration most likely explains the mismatch between DGVMs and observations in more mesic ecosystems.

Total respiration is the sum of autotrophic respiration (Ra) by plants and heterotrophic respiration (Rh) by soil microbes. Rh, which makes up about half of TER, is jointly controlled by carbon supply, soil properties, and climate-dependent decomposition rates. We hypothesized that the relative influence of carbon supply versus environmental controls on decomposition (especially soil moisture) over Rh is a function of water availability, which drives the decoupling between GPP and TER and thus varies the strength of the temporal correlation between GPP and NEE along the precipitation gradient. To test this hypothesis, we used three simple empirical ecosystem respiration models with varying complexity and factors that included SOC (C), temperature (T), soil moisture (M), and current-year productivity (P) as a proxy of fresh input to litter in the fast SOC pools. These ecosystem respiration models aimed to identify key environmental factors that may help improve respiration simulation within DGVMs. We found that TERTRENDY and all empirical ecosystem respiration models were able to simulate the IAV of TERinv in the more xeric ecosystems (r = 0.72–0.869, p < 0.01), except for the TERCT model (r = –0.203, p > 0.05) (Fig. 3a). By contrast, in the more mesic ecosystems, only the TERCTMP model captured the variance of TERinv (r = 0.66, p < 0.001) (Fig. 3b), and it resulted in the lowest RMSE (Fig. 3c) and a statistically indistinguishable estimate of TER when compared with TERinv (for the best TERCTMP model: mean ± sd = 1211 ± 31 g C m−2 yr−1; TERinv: 1242 ± 41 g C m−2 yr−1). Then, we partitioned TERCTMP into Ra, a GPP-dependent estimate, and Rh, consisting of a GPP-dependent component representing the fast-responding labile component of Rh and a GPP-independent term representing Rh from slower soil carbon pools27.
Comparison with an observed soil respiration database (SRDB v3)28 confirms that the TERCTMP model performed better at simulating TER, Ra, and Rh than the DGVM simulations (Supplementary Fig. 13c). In particular, the TERCTMP simulations of Rh are much better at explaining the IAV of TERinv (r = 0.44, p < 0.1) than DGVM-simulated Rh (r = 0.25, p > 0.1) in more mesic regions. This suggests that DGVMs do not effectively simulate Rh, and thus TER, as water availability increases across the CONUS. Temporal correlation between IAV of gridded GPPMODIS and Rh from TERCTMP and TRENDY reveals that Rh may control the decoupling between GPP and NEE in the mesic CONUS areas (Fig. 4a, b).

Underestimation of the influence of soil moisture and soil carbon on Rh is a possible explanation of why DGVMs were not able to effectively simulate Rh in mesic ecosystems. DGVMs have routinely incorporated temperature and moisture constraints on Rh, but the effects of moisture on decomposition rate are much more uncertain than those of temperature, especially in warmer and wetter environments29,30, and are also soil-dependent31. Currently, global land surface models like the ones in TRENDY appear to overestimate temperature effects on decomposition rates29,32, leading to faster soil carbon turnover times and stronger carbon–climate feedbacks32,33, and may therefore override the influence of soil moisture and soil carbon on Rh34. In addition, experimental studies have shown that the sensitivity of microbial respiration to soil moisture increases in wetter ecosystems35, which may contribute to the higher observed TER variability in the more mesic ecosystems. Partitioning TERCTMP into Rh and Ra confirms that Rh contributed more to the IAV of TER in mesic regions than in arid regions (0.50 vs. 0.40, calculated as the S.D. of Rh divided by the S.D. of TERCTMP). Therefore, our analysis suggests that the model–data mismatch in more mesic ecosystems is most likely due to the poorly understood response of heterotrophic respiration to wetter conditions.

Large amounts of SOC are a major source of carbon supply for microbial decomposition36, and can sustain Rh when the GPP anomalies are low and fresh labile carbon supply to Rh is being suppressed, therefore explaining why NEE variations tend to be more buffered against changes in GPP in more mesic ecosystems (r = –0.22, p = 0.43). Therefore, Rh may be more limited by environmental conditions, rather than by carbon supply in more mesic ecosystems, while Rh may be more limited by carbon supplied through productivity in drier ecosystems. Consistent with this interpretation, we see that SOC shows a similar increase across the continental precipitation gradient (Supplementary Fig. 14). Therefore, accurate simulation of Rh requires capturing the relative influence of carbon supply (e.g., production and SOC) and environmental constraints (e.g., soil moisture) under different hydrological conditions. However, GPP and Rh appear to be too tightly coupled in DGVM simulations in mesic ecosystems across the CONUS, suggesting that Rh may be overly dependent upon production in global land surface models, and therefore underestimate the influence of environmental constraints in the IAV of the Rh. This may help explain in part why land surface models tend to underestimate turnover times27.

We also hypothesized that human land-use activity is a second plausible driver of the spatial mismatch between production and net carbon exchange in CONUS (Fig. 1a, b). The spatial inconsistency between annual mean GPP and NEE, especially in the Midwestern United States, is most likely due to harvest in these intensive agricultural ecosystems. Agricultural statistics show that Midwest states accounted for about 21.2% of total agricultural land but contributed about 45.7% of the crop exports of the United States in 2012. The region exports roughly 0.08 Pg C yr−1 of crop products37, which is approximately half of the Midwest regional mean NEE (0.18 Pg C yr−1) and one-third of CONUS mean NEE (0.3 Pg C yr−1) between 2000 and 2014. The amount of carbon harvested in croplands needs to be better represented in DGVMs, which lack realistic simulations of crop yields, often parameterize harvest as a fixed fraction of daily net primary production, and do not consider the lateral transport of C in harvested goods. Removal of crop yields also means reduced SOC inputs into the soil, and the production of harvest residues that decompose faster than natural litter (e.g., because of tillage) should increase the control of NEE by Rh (Fig. 1c).

Our results help to clarify the climate sensitivity of key carbon cycle processes and point to potential ways to improve DGVM simulations at the continental scale. Previous studies have shown that moisture-regulated productivity in arid or semiarid regions is the dominant control on IAV of global land net carbon exchange18,19; however, these studies relied heavily on DGVM simulations that appear to overestimate the GPP response to water availability. To the extent that IAV is a useful diagnostic of long-term carbon–climate sensitivity, our results indicate that moisture-regulated respiration in mesic ecosystems can be another major mechanism regulating the variability of NEE. As the water balance of ecosystems within the United States is projected to be drier in certain regions and wetter in others38, our analysis will facilitate the identification of potential critical thresholds which, if crossed, can abruptly change the carbon balance of ecosystems in CONUS. Our analysis highlights heterotrophic respiration as one of the most poorly understood carbon cycle processes and thus one of the most difficult to accurately simulate in land surface models, especially in more mesic ecosystems. Therefore, better understanding of the environmental controls of heterotrophic respiration may help improve carbon turnover times in model simulations, thereby reducing the amount of uncertainty in future carbon–climate feedbacks.

## Methods

### Temporal correlation between IAV of GPP and NEE

We calculated the temporal correlation between IAV of detrended GPP and NEE using Pearson’s product moment correlation (Pearson’s r) at pixel level over the CONUS. For gridded observation-based fluxes, four gridded GPP or photosynthetic capacity indices combined with two gridded NEE estimates, resulting in eight individual Pearson’s r maps between IAV of GPP and NEE (Supplementary Fig. 3), were used to produce mean Pearson’s r (Fig. 1c) at 1° resolution from 2000 to 2014. For each pixel, if more than four individual correlations were significant, then the mean Pearson’s r is considered as significant. For TRENDY simulation, the mean Pearson’s r was calculated from detrended fluxes from each individual model (n = 10) at 1° resolution from 2000 to 2010 (Supplementary Fig. 14). For each pixel, if more than 5 individual correlations out of 10 TRENDY DGVM simulations were significant, then the mean Pearson’s r is considered as significant. Significance level at 0.1 was used in this study.
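
The per-pixel procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that detrending means removing a least-squares linear trend (the text says only "detrended"), and the function names `detrend`, `iav_correlation`, and `mean_r_is_significant` are hypothetical. The p-values of the individual correlations are taken as given inputs.

```python
from math import sqrt


def detrend(series):
    """Subtract the least-squares linear trend (and mean) from an annual series.

    Assumption: linear detrending; the source does not state the method.
    """
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]


def pearson_r(x, y):
    """Pearson's product moment correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def iav_correlation(gpp, nee):
    """Correlation between the detrended (IAV) parts of GPP and NEE for one pixel."""
    return pearson_r(detrend(gpp), detrend(nee))


def mean_r_is_significant(p_values, min_significant, alpha=0.1):
    """Voting rule: more than `min_significant` individual tests pass at `alpha`."""
    return sum(p < alpha for p in p_values) > min_significant
```

Under these assumptions, `mean_r_is_significant(p_values, 4)` would correspond to the rule for the eight observation-based maps and `mean_r_is_significant(p_values, 5)` to the rule for the ten TRENDY models.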

To test the robustness of temporal correlation between IAV of GPP and NEE, the same procedure was also applied at 3° spatial resolution and to growing-season fluxes (May–Oct) (Supplementary Fig. 3a, c, and d). To further reduce the uncertainties caused by pixel-level estimates for NEEACI, we also calculated the correlation between IAV of GPP observed by MODIS (GPPMODIS) or TERinv and NEEACI at the subcontinental scale (i.e., for the two regions with MAP above and below 750 mm yr−1). The TER inverted from gridded observation-based fluxes (TERinv) is calculated as the difference between GPPMODIS and NEEACI (i.e., TERinv = GPPMODIS–NEEACI). Although GPPMODIS and NEEACI may be subject to large uncertainties at pixel level, we minimized this uncertainty and its influence on our main conclusion by using an ensemble mean from multiple data sources and regional estimates rather than pixel-level values (see Methods: Sensitivity analysis). For TRENDY simulations, GPP, TER, and NEE are calculated as ensemble mean of ten DGVMs from 2000 to 2010. We found a significant positive Pearson’s r between IAV of GPP and NEE in more xeric regions (MAP < 750 mm yr−1) from both observations and TRENDY simulations, but correlations tended to diverge in the more mesic region (MAP > 750 mm yr−1). TERinv and GPPMODIS have a correlation with NEEACI of comparable absolute value but opposite sign in more mesic regions. In contrast, GPP still had a significant positive correlation with NEE in mesic regions in the TRENDY ensemble.

To test whether disturbance changes the temporal correlation between IAV of GPP and NEE, we compared the wildfire CO2 emission with GPPMODIS and NEEACI, and found that wildfire CO2 emissions are much smaller in magnitude (0.021 ± 0.0039 Pg C yr−1), compared to GPPMODIS (6.29 ± 0.26 Pg C yr−1) and NEEACI (0.30 ± 0.13 Pg C yr−1) (Supplementary Fig. 5). To test whether human activities (e.g., agriculture) change the temporal correlation between IAV of GPP and NEE, we masked out regions with high human influence index (HII) with different thresholds (HII > 0.4 or HII > 0.3). We found that human activity had little influence on the temporal correlation between GPP and NEE along the precipitation gradient across the CONUS (Supplementary Fig. 6).

To visualize the temporal correlation between IAV of GPP and NEE along precipitation and GPP gradients, we plotted the mean and standard deviation of Pearson’s r within each mean GPPMODIS and precipitation bin (Fig. 1e). Mean GPP and precipitation for each pixel were calculated from 2000 to 2014. We used monthly precipitation at 0.5° spatial resolution from the Climatic Research Unit at the University of East Anglia. Mean annual GPP and precipitation were each binned into 14 equal intervals. The mean and SD of mean Pearson’s r from constrained global observations (Fig. 1c) and DGVM simulations (Fig. 1d), together with mean annual GPP and mean annual precipitation (MAP), were summarized in each interval and plotted along the GPP/precipitation gradient (Fig. 1e).
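
The binning step can be illustrated with a short sketch. It assumes equal-width bins spanning the observed range of the gradient variable and the sample standard deviation; neither detail is specified in the text, and `summarize_by_bins` is a hypothetical helper name.

```python
from statistics import mean, stdev


def summarize_by_bins(gradient, r_values, n_bins=14):
    """Mean and SD of r within equal-width bins of a gradient variable.

    `gradient` holds per-pixel MAP or mean GPP; `r_values` holds the matching
    mean Pearson's r. Returns (bin_center, mean_r, sd_r) per non-empty bin.
    """
    lo, hi = min(gradient), max(gradient)
    if hi == lo:
        raise ValueError("gradient has no spread; cannot form bins")
    width = (hi - lo) / n_bins
    groups = [[] for _ in range(n_bins)]
    for g, r in zip(gradient, r_values):
        # The maximum value falls on the upper edge; keep it in the last bin.
        idx = min(int((g - lo) / width), n_bins - 1)
        groups[idx].append(r)
    summary = []
    for i, vals in enumerate(groups):
        if not vals:
            continue
        # Assumption: sample SD; a single-member bin is reported with SD 0.
        sd = stdev(vals) if len(vals) > 1 else 0.0
        summary.append((lo + (i + 0.5) * width, mean(vals), sd))
    return summary
```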

Finally, the soil organic carbon (SOC) content was plotted along the precipitation and GPP gradients to show its potential influence on Pearson’s r in CONUS (Supplementary Fig. 14). The 0–100-cm SOC stock map was interpolated from SOC points measured by the Rapid Carbon Assessment (RaCA) of the USDA-NRCS Soil Science Division in 2010 using the kriging method.

### Sensitivity analysis

Temporal sensitivity ($${\mathrm{\gamma }}_{\rm flux}^t\,{\mathrm{and}}\,{\mathrm{\delta }}_{\rm flux}^t$$): The temporal sensitivity was used to indicate the inter-annual sensitivity of carbon flux to change in climate factor for a given ecosystem over time. Therefore, the temporal sensitivity was calculated from each time-series measurement in which ecosystem production and respiration and climate factors have varied over time. Temporal relationship between ecosystem production and respiration and climate factors from long-term site-level data are usually modeled as linear regardless of ecosystem types14. In this analysis, the temporal model was formulated as

$$\Delta {\mathrm{flux}} = {\mathrm{\gamma }}_{\rm flux}^t\Delta {\mathrm{Temp}} + {\mathrm{\delta }}_{\rm flux}^t\Delta {\mathrm{Prep}}$$
(1)

where Δflux (i.e., GPP or TER), ΔTemp, and ΔPrep are annual anomalies for gross carbon flux, temperature, and precipitation, respectively. Therefore, $${\mathrm{\gamma }}_{\rm GPP}^t$$ ($${\mathrm{\gamma }}_{\rm TER}^t$$) and $${\mathrm{\delta }}_{\rm GPP}^t$$ ($${\mathrm{\delta }}_{\rm TER}^t$$) indicate the apparent temporal sensitivity of GPP (TER) to the absolute change (Δ) of Temp and Prep controls. A summary for the temporal sensitivity was included in Supplementary Table 1, and the $${\mathrm{\delta }}_{\rm GPP}^t$$ and $${\mathrm{\delta }}_{\rm TER}^t$$ were used to generate Supplementary Fig. 2a and Supplementary Figs. 8–10. Annual anomalies (ΔTemp and ΔPrep) were calculated by removing the mean from the time-series data.

To calculate the relative contribution of Prep and Temp anomalies to the carbon flux anomalies (Supplementary Fig. 7), we follow the previous approach20. The product of a given sensitivity (e.g., $${\mathrm{\delta }}_{\rm GPP}^t$$) and the corresponding climate-forcing anomaly (e.g., ΔPrep) constitutes the flux anomaly component driven by this climate factor. Thus, $$\Delta {\rm GPP} = {\mathrm{\gamma }}_{\rm GPP}^t\Delta {\mathrm{Temp}} + {\mathrm{\delta }}_{\rm GPP}^t\Delta {\mathrm{Prep}}$$ estimates the contributions of temperature ($${\mathrm{\gamma }}_{\rm GPP}^t\Delta {\mathrm{Temp}}$$) and precipitation ($${\mathrm{\delta }}_{\rm GPP}^t\Delta {\mathrm{Prep}}$$) anomalies to the carbon flux anomalies (ΔGPP).
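
A minimal sketch of Eq. (1) and the contribution decomposition follows. It assumes ordinary least squares without an intercept on mean-removed anomalies, which is consistent with the form of Eq. (1) but is not confirmed as the authors' fitting procedure; the function names are hypothetical.

```python
def anomalies(series):
    """Annual anomalies: remove the time-series mean."""
    m = sum(series) / len(series)
    return [v - m for v in series]


def temporal_sensitivity(flux, temp, prep):
    """Fit d_flux = gamma * d_temp + delta * d_prep by least squares.

    Solves the 2x2 normal equations directly. Returns (gamma, delta).
    """
    df, dt, dp = anomalies(flux), anomalies(temp), anomalies(prep)
    stt = sum(a * a for a in dt)
    spp = sum(a * a for a in dp)
    stp = sum(a * b for a, b in zip(dt, dp))
    stf = sum(a * b for a, b in zip(dt, df))
    spf = sum(a * b for a, b in zip(dp, df))
    det = stt * spp - stp * stp
    if det == 0:
        raise ValueError("temperature and precipitation anomalies are collinear")
    gamma = (stf * spp - spf * stp) / det
    delta = (spf * stt - stf * stp) / det
    return gamma, delta


def climate_contributions(temp, prep, gamma, delta):
    """Flux anomaly components driven by temperature and by precipitation."""
    temp_part = [gamma * a for a in anomalies(temp)]
    prep_part = [delta * a for a in anomalies(prep)]
    return temp_part, prep_part
```

Summing the two returned components year by year gives the fitted flux anomaly, so their relative magnitudes indicate which climate factor dominates in a given year.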

Spatial sensitivity ($${\mathrm{\gamma }}_{\rm flux}^s\,{\mathrm{and}}\,{\mathrm{\delta }}_{\rm flux}^s$$): The spatial sensitivity was used to indicate the sensitivity of carbon fluxes across climate gradients (and ecosystem types). The spatial sensitivity was calculated from a spatially explicit gridded model and observation-based datasets. Spatial relationships between ecosystem productivity or respiration and climate factors are usually nonlinear when they span large gradients in climate14. We modeled the ecosystem production and respiration flux as a function of mean Temp and Prep using a polynomial function (up to second order) to capture the nonlinear environmental effects.

$${\rm flux} = \alpha _0 + \alpha _1{\rm Temp} + \alpha _2{\rm Temp}^2 + \alpha _3{\rm Prep} + \alpha _4{\rm Prep}^2$$
(2)

Finally, the first-order derivative of the fitted flux–climate curve is taken as the spatial sensitivity of the flux to climate factors. We derived $${\mathrm{\gamma }}_{\rm GPP}^s$$ ($${\mathrm{\gamma }}_{\rm TER}^s$$) and $${\mathrm{\delta }}_{\rm GPP}^s$$ ($${\mathrm{\delta }}_{\rm TER}^s$$) to indicate the apparent spatial sensitivities of GPP (TER) to the change of Temp and Prep controls over space. A summary of the spatial sensitivity values is included in Supplementary Table 1, and the $${\mathrm{\delta }}_{\rm GPP}^s$$ and $${\mathrm{\delta }}_{\rm TER}^s$$ were used to generate Supplementary Fig. 2b and Supplementary Figs. 8–10.
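
As a worked step, differentiating Eq. (2) with respect to each climate factor gives the spatial sensitivities in closed form. This assumes the quadratic form of Eq. (2) exactly as written, with no Temp × Prep interaction term:

$$\gamma _{\rm flux}^s = \frac{\partial\,{\rm flux}}{\partial\,{\rm Temp}} = \alpha _1 + 2\alpha _2{\rm Temp},\qquad \delta _{\rm flux}^s = \frac{\partial\,{\rm flux}}{\partial\,{\rm Prep}} = \alpha _3 + 2\alpha _4{\rm Prep}$$

Under this form, the spatial sensitivity to precipitation varies linearly with Prep, and a negative fitted α4 corresponds to a sensitivity that declines as precipitation increases.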

Bootstrapping: to ensure that the sensitivity of ecosystem production and respiration to climate factors is not affected by extreme values, we performed 100 bootstrap analyses by randomly selecting a subset of data in each model. The confidence intervals of sensitivity in Fig. 2 and Supplementary Figs. 8–10 confirm that the threshold of ecosystem production and respiration to precipitation is not particularly sensitive to a few extreme values.
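
The resampling idea can be sketched as below. The text states only that a random subset of the data was selected in each of 100 replicates, so the subset fraction (`fraction=0.8`), sampling without replacement, the nearest-rank percentile rule, and the example `slope` estimator are all assumptions made for illustration.

```python
import random


def slope(records):
    """Ordinary least-squares slope of flux against precipitation.

    `records` is a list of (precipitation, flux) pairs.
    """
    n = len(records)
    mx = sum(p for p, _ in records) / n
    my = sum(f for _, f in records) / n
    sxx = sum((p - mx) ** 2 for p, _ in records)
    sxy = sum((p - mx) * (f - my) for p, f in records)
    return sxy / sxx


def bootstrap_estimates(records, estimator, n_rep=100, fraction=0.8, seed=0):
    """Apply `estimator` to `n_rep` random subsets drawn without replacement."""
    rng = random.Random(seed)
    k = min(len(records), max(2, round(fraction * len(records))))
    return [estimator(rng.sample(records, k)) for _ in range(n_rep)]


def percentile_interval(values, lower=5.0, upper=95.0):
    """Nearest-rank percentile interval of the replicate estimates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return (ordered[round(lower / 100 * last)], ordered[round(upper / 100 * last)])
```

If a few extreme points were driving the fitted sensitivity, the spread of the replicate estimates returned by `bootstrap_estimates` would be wide; a narrow interval suggests the estimate is robust to them.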

Sensitivity calculation for EC measurement: EC measurements provide direct observations of net ecosystem CO2 exchange and estimated GPP and TER fluxes with climate variables. A total of 17 sites with at least 5 years of data, representing the major ecosystems across the CONUS were obtained from the FLUXNET2015 database (Supplementary Table 2 and Supplementary Fig. 16). Wetland sites and sites with recent major disturbance were excluded from our analyses. Daily GPP and TER were estimated as the mean value from both the nighttime partitioning method39 and the light response curve method40. More details on the flux partitioning and gap-filling methods used are provided by ref. 41. Daily values were summed to annual values, and then used to estimate the sensitivity of productivity (i.e., GPP) and respiration (i.e., TER) to annual Temp and Prep. The temporal sensitivity (i.e., $${\mathrm{\gamma }}_{\rm GPP}^t,{\mathrm{\gamma }}_{\rm TER}^t,{\mathrm{\delta }}_{\rm GPP}^t,{\rm and}\,{\mathrm{\delta }}_{\rm TER}^t$$, Supplementary Table 1) for each individual eddy-covariance site was calculated from time-series measurements and plotted along the precipitation gradient for each bootstrap replicate (Supplementary Fig. 8a). The spatial sensitivity (i.e., $${\mathrm{\gamma }}_{\rm GPP}^s$$, $${\mathrm{\gamma }}_{\rm TER}^s$$, $${\mathrm{\delta }}_{\rm GPP}^s$$, and $${\mathrm{\delta }}_{\rm TER}^s$$, Supplementary Table 1) was calculated from all 17 flux sites and plotted along the precipitation gradient (Supplementary Fig. 8b).

Sensitivity calculation for gridded observation-based fluxes: we used National Ecological Observatory Network (NEON) ecodomains to calculate spatial and temporal sensitivity of GPP and TER to Prep and Temp. There are 17 NEON ecodomains in the CONUS, and these ecodomains were designed strategically to capture the variability in ecological and climatological conditions. Within each ecodomain, we summarized mean GPPMODIS, NEEACI, TERglobal_obs, Prep, and Temp from 2000 to 2014. The temporal sensitivity (i.e., $${\mathrm{\gamma }}_{{\rm{GPP}}}^t,{\mathrm{\gamma }}_{{\rm{TER}}}^t,{\mathrm{\delta }}_{{\rm{GPP}}}^t,{\rm and}\,{\mathrm{\delta }}_{{\rm{TER}}}^t$$, Supplementary Table 1) for each individual NEON ecodomain was calculated from annual anomalies and plotted along the precipitation gradient for each bootstrap replicate (Supplementary Fig. 9a). The spatial sensitivity (i.e., $${\mathrm{\gamma }}_{{\rm{GPP}}}^s,{\mathrm{\gamma }}_{{\rm{TER}}}^s,{\mathrm{\delta }}_{{\rm{GPP}}}^s,{\rm and}\,{\mathrm{\delta }}_{{\rm{TER}}}^s$$, Supplementary Table 1) was calculated from long-term mean (2000–2014) gross carbon flux and climate of all 17 NEON ecodomains and plotted along the precipitation gradient (Supplementary Fig. 9b). The NEON ecodomains were obtained from http://www.neonscience.org/data/maps-spatial-data.

The data uncertainties associated with GPPMODIS and NEEACI may affect the spatial and temporal sensitivity of constrained global observations. The GPPMODIS uncertainty arises mainly from its inputs (including MODIS observations of FPAR, LAI, land cover, and daily meteorological data) and algorithms42. Of these, meteorological data contribute the largest uncertainty at the global scale, but this uncertainty is lower in regions with dense observations, such as CONUS43. Validation against EC measurements suggested that GPPMODIS shows reasonable spatial patterns and temporal variability across a diverse range of biomes and climate regimes44. The annual NEEACI is the ensemble mean NEE of four atmospheric CO2 inversions; averaging across inversions reduces the uncertainty of individual inversions, which arises primarily from limited atmospheric data, uncertain prior flux estimates, and errors in the atmospheric transport models45. In North America, the largest uncertainty in NEEACI is in the Midwestern United States, where agriculture dominates the landscape.

Sensitivity calculation for TRENDY simulation: The same procedure used to calculate temporal and spatial sensitivity for constrained global observations was applied to the TRENDY simulations, except that (1) the temporal span is 2000 to 2010 and (2) GPP, NEE, and TER are the ensemble mean annual values across ten DGVMs. The temporal and spatial sensitivities calculated from TRENDY simulations are plotted in Supplementary Fig. 10.

Comparing the climate sensitivity of GPP and TER along the precipitation gradient (Fig. 2). To compare the relative sensitivity of productivity and respiration to precipitation, we calculated the difference (Δδt or Δδs) between the sensitivity of GPP to precipitation ($$\delta _{{\rm{GPP}}}^t$$ and $$\delta _{{\rm{GPP}}}^s$$) and the sensitivity of TER to precipitation ($$\delta _{{\rm{TER}}}^t$$ and $$\delta _{{\rm{TER}}}^s$$) for each bootstrapping replicate (i.e., blue point minus red point in Supplementary Figs. 8–10). Because $$\delta _{{\rm{TER}}}^t$$ and $$\delta _{{\rm{TER}}}^s$$ were positive, a positive Δδt or Δδs indicates that GPP is more sensitive to precipitation than TER. The mean and 90 percentile of Δδt or Δδs (n = 100) were plotted along the precipitation gradient (Fig. 2). The Δδt and Δδs were summarized as a change in carbon flux (unit: gC m−2 yr−1) in response to a 100-mm change in precipitation.
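As a worked illustration of this comparison, the sketch below computes the difference between the GPP and TER precipitation sensitivities for each bootstrap replicate and rescales it to a 100-mm change in precipitation. The function name is hypothetical, and reading "90 percentile" as the central 90% range (5th to 95th percentile) is an assumption made only for this example.

```python
import numpy as np

def delta_difference(delta_gpp, delta_ter, per_mm=100.0):
    """Difference in precipitation sensitivity, GPP minus TER.

    delta_gpp, delta_ter: one value per bootstrap replicate,
    in gC m-2 yr-1 per mm of precipitation.
    Returns (mean, lower, upper) scaled to a `per_mm` change in precipitation.
    The 5th-95th percentile range is an assumed reading of "90 percentile".
    """
    diff = (np.asarray(delta_gpp, float) - np.asarray(delta_ter, float)) * per_mm
    lower, upper = np.percentile(diff, [5, 95])
    return diff.mean(), lower, upper
```

A positive mean indicates that GPP responds more strongly to precipitation than TER in that region of the gradient.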

### Robustness of climate sensitivity of GPP and TER along the water availability gradient

We used two other water availability indices, mean annual precipitation minus evapotranspiration (P-ET, mm yr−1) and the ratio between MAP and potential evapotranspiration (P/PET, unitless), to test the robustness of the climate sensitivity of GPP and TER along the water availability gradient. The P-ET integrates the temperature effect on water demand and is widely used to represent climate water deficit. The P/PET is an indicator of the degree of dryness of the climate at a given temperature. We calculated the sensitivity of GPP and TER to these two water availability indices for constrained global observation and TRENDY simulation (Supplementary Fig. 11). We did not report the sensitivity of GPP and TER to water deficit for constrained EC observation, as ET/PET was not included in the dataset. Monthly ET/PET data at 0.5° resolution were obtained from the MOD16 ET product (http://www.ntsg.umt.edu).
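The two indices can be computed from monthly fields as in the following sketch. The function name and the input layout (one row per year, one column per month, values in mm per month) are assumptions made for illustration.

```python
import numpy as np

def water_indices(p_monthly, et_monthly, pet_monthly):
    """Return (P-ET in mm yr-1, P/PET unitless) for one pixel or region.

    Each input is array-like with shape (n_years, 12), in mm per month.
    Monthly values are summed to annual totals, then averaged over years.
    """
    p, et, pet = (np.asarray(v, dtype=float).sum(axis=1).mean()
                  for v in (p_monthly, et_monthly, pet_monthly))
    p_minus_et = p - et
    p_over_pet = p / pet if pet > 0 else np.nan  # undefined where PET is zero
    return p_minus_et, p_over_pet
```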

### Ecosystem respiration modeling experiment

We designed a simple ecosystem respiration modeling experiment to diagnose why the DGVMs fail to capture the precipitation threshold of the sensitivity of production and respiration to precipitation. We used three published empirical respiration models of increasing complexity, whose drivers include observed SOC (C), temperature (T), soil moisture (M), and current-year production (P), and then compared them with TERinv (g C m−2 yr−1).

### TERCT model

According to the models previously validated against a global database of soil respiration (Rs) observations46, Rs can be predicted in response to soil C content (SoilC, Mg ha−1) and temperature (Temp, °C) as follows:

$${\mathrm{TER}}_{{\mathrm{CT}}} = {\mathrm{SoilC}} \times 64 \times 1.72^{0.21 \times {\mathrm{Temp}}}$$
(3)

### TERCTM model

On the basis of TERCT model, the effect of soil moisture (SoilM, m3 m−3) on Rs can be modeled as follows:46

$$\begin{array}{l}{\mathrm{TER}}_{{\mathrm{CTM}}} = \\ {\mathrm{SoilC}} \times 64 \times 1.72^{0.21 \times {\mathrm{Temp}}} \times \left( {\frac{{{\mathrm{SoilM}} - 2.1}}{{0.55 - 2.1}}} \right)^{6.6481} \times \left( {\frac{{{\mathrm{SoilM}} + 0.007}}{{0.55 - 0.007}}} \right)^{3.23}\end{array}$$
(4)
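Equations (3) and (4) translate directly into code. The sketch below transcribes the constants exactly as printed above; the function names are hypothetical, and no unit conversion beyond what the equations state is attempted.

```python
import numpy as np

def ter_ct(soil_c, temp):
    """Eq. (3): respiration from soil C (Mg ha-1) and temperature (deg C)."""
    return soil_c * 64.0 * 1.72 ** (0.21 * temp)

def ter_ctm(soil_c, soil_m, temp):
    """Eq. (4): Eq. (3) scaled by two soil-moisture terms (soil_m in m3 m-3).

    Both bases stay positive for soil_m between 0 and 2.1, so the
    non-integer exponents are well defined over the physical range.
    """
    f1 = ((soil_m - 2.1) / (0.55 - 2.1)) ** 6.6481
    f2 = ((soil_m + 0.007) / (0.55 - 0.007)) ** 3.23
    return ter_ct(soil_c, temp) * f1 * f2
```

Both functions also accept NumPy arrays, so they can be applied to gridded SoilC, SoilM, and Temp fields without modification.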

### TERCTMP model

The TERCTMP model is a photosynthesis-dependent respiration model that was calibrated and validated against eddy-covariance data27. TERCTMP combines the joint influences of temperature (f(Temp)), precipitation (f(Prep), with Prep in mm yr−1), and substrate availability, including SOC (SoilC) and current-year production (P, g C m−2 yr−1), on ecosystem respiration, and can be described as follows:

$${\mathrm{TER}}_{{\mathrm{CTMP}}} = \left( {{{R}}_0 + {{k}}2 \times {{P}}} \right) \times {{f}}( {{\mathrm{Temp}}}) \times {{f}}( {{\mathrm{Prep}}} )$$
(5)

where

$${{R}}_0 = {\mathrm{constant}} + {{a}}1 \times {\mathrm{LAI}}_{{\mathrm{max}}} + {{a}}2 \times {\mathrm{SoilC}}$$
(6)
$${{f}}( {{\mathrm{Temp}}} ) = {\mathrm{e}}^{{{E}}_0 \times \left( {\frac{1}{{{\mathrm{Tref}} - {{T}}0}} - \frac{1}{{{\mathrm{Temp}} - {{T}}0}}} \right)}$$
(7)
$${{f}}( {{\mathrm{Prep}}} ) = \frac{{\left( {{{a}} \times {{k}} + {\mathrm{Prep}} \times \left( {1 - {{a}}} \right)} \right)}}{{\left( {{{k}} + {\mathrm{Prep}} \times \left( {1 - {{a}}} \right)} \right)}}$$
(8)

In the TERCTMP model, R0 is the reference respiration rate at the reference temperature (Tref) (15 °C), E0 is the activation energy, and T0 = −46.02 °C. In the response of respiration to precipitation (f(Prep)), k (mm) is the half-saturation constant of the hyperbolic relationship and a is the response of total respiration to null Prep. LAImax is the maximum leaf area index within a pixel. LAI at 1-km2 spatial resolution is derived from MODIS observations (MOD15A2, v6)47. Current-year GPP was used in the TERCTMP model as there is no evidence for lagged effects of GPP on TERinv or TERCTMP Rh (Supplementary Fig. 17). Conceptually, this model can be considered as the sum of a GPP-dependent term comprising autotrophic respiration (Ra) and the fast-responding labile component of heterotrophic respiration, and a GPP-independent term standing for heterotrophic respiration (Rh) of slower carbon pools. Therefore, TERCTMP can be partitioned into Ra and Rh as follows:

$${{\mathrm R}}{\mathrm{a}} = {{k}}2 \times {{P}} \times {{f}}( {{\mathrm{Temp}}} ) \times {{f}}( {{\mathrm{Prep}}} )$$
(9)
$${{\mathrm R}}{\mathrm{h}} = {{R}}_0 \times {{f}}( {{\mathrm{Temp}}} ) \times {{f}}( {{\mathrm{Prep}}} )$$
(10)

All the coefficients used in TERCTMP were taken from the original study27, where 104 globally distributed sites from the FLUXNET networks were used to derive plant functional-type specific parameters.
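The structure of Eqs. (5) to (10) can be sketched as follows. Tref and T0 are taken from the text; every other coefficient is passed in as an argument because the plant functional-type specific values from the original study27 are not listed here. The function and argument names are hypothetical.

```python
import numpy as np

T_REF = 15.0   # reference temperature (deg C), from the text
T0 = -46.02    # deg C, from the text

def f_temp(temp, e0):
    """Eq. (7): temperature response; equals 1 at temp == T_REF."""
    return np.exp(e0 * (1.0 / (T_REF - T0) - 1.0 / (temp - T0)))

def f_prep(prep, k, a):
    """Eq. (8): hyperbolic precipitation response; equals a at prep == 0."""
    return (a * k + prep * (1.0 - a)) / (k + prep * (1.0 - a))

def ter_ctmp(p, temp, prep, lai_max, soil_c, *, const, a1, a2, k2, e0, k, a):
    """Eqs. (5), (6), (9), (10): return (TER, Ra, Rh).

    All keyword coefficients are placeholders to be supplied by the user.
    """
    r0 = const + a1 * lai_max + a2 * soil_c          # Eq. (6)
    scale = f_temp(temp, e0) * f_prep(prep, k, a)
    ra = k2 * p * scale                              # Eq. (9), GPP-dependent
    rh = r0 * scale                                  # Eq. (10), GPP-independent
    return ra + rh, ra, rh                           # Eq. (5) = Ra + Rh
```

Because both terms share the same climate scaling, Ra and Rh sum exactly to TERCTMP, which is what allows the partitioning described above.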

Model evaluation: using TERinv as a benchmark, we calculated the spatially averaged root-mean-squared error (RMSE) between four TER models (the three empirical respiration models described above and one ensemble TRENDY TER (TERTRENDY)) and TERinv (Fig. 3c). We also calculated the temporal correlation between the four TER models and TERinv at the subcontinental scale (MAP above and below 750 mm yr−1) (Fig. 3a, b).
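The two evaluation metrics can be sketched as below. This is one possible reading, stated as an assumption: RMSE is computed over time for each pixel and then averaged across pixels, and the subcontinental correlation is computed between regional-mean annual time series. Pixels are averaged without area weighting for simplicity, and the function names are hypothetical.

```python
import numpy as np

def spatial_mean_rmse(model, benchmark):
    """Per-pixel RMSE over time, averaged across pixels.

    model, benchmark: arrays of shape (n_years, n_lat, n_lon).
    """
    rmse = np.sqrt(np.nanmean((model - benchmark) ** 2, axis=0))
    return np.nanmean(rmse)

def regional_correlation(model, benchmark, map_mm, threshold=750.0):
    """Temporal correlation of regional-mean series, split by MAP.

    map_mm: mean annual precipitation (mm yr-1), shape (n_lat, n_lon).
    Returns a dict with keys "wet" (MAP > threshold) and "dry".
    """
    wet = map_mm > threshold
    result = {}
    for name, mask in (("wet", wet), ("dry", ~wet)):
        m = np.nanmean(model[:, mask], axis=1)      # regional mean per year
        b = np.nanmean(benchmark[:, mask], axis=1)
        result[name] = np.corrcoef(m, b)[0, 1]
    return result
```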

We also compared TERCTMP and TERTRENDY with a global soil respiration database v3 (SRDB v3). Only measurements after 2000 were selected, and wetlands and deserts were excluded as well as disturbed ecosystems (Supplementary Fig. 18). A total of 123 site-year data were used. Of all the SRDB v3 data, a total of 18 site-years explicitly measured Rh and Ra, and these were selected to validate Ra and Rh from the TERCTMP model and TRENDY simulations (Supplementary Fig. 13). Comparison between annual TER, Ra, and Rh from the TERCTMP model and TRENDY DGVM simulations and the SRDB v3 showed that the TERCTMP model explained significantly more variation in measured Rh in SRDB (v3) than the DGVM simulations did (Supplementary Fig. 13c).

Temporal correlations between Rh derived from the TERCTMP model and GPPMODIS, and between Rh from the TRENDY simulations and GPPTRENDY, were calculated at the pixel level (Fig. 4a, b) and at the subcontinental scale (MAP above and below 750 mm yr−1, Fig. 4c, d).

### Datasets

Gridded observation-based fluxes. We used four remotely sensed observations of GPP or photosynthetic capacity indices: MODIS 17 GPP (GPPMODIS), solar-induced chlorophyll fluorescence (SIF), normalized difference vegetation index (NDVI), and fraction of absorbed photosynthetically active radiation (FPAR). GPPMODIS is the product of maximum light-use efficiency, FPAR, incoming radiation, and two scalar reduction factors that represent limitations on photosynthesis through temperature and vapor pressure deficit42,48. Monthly GPPMODIS at 0.05° resolution from 2000 to 2014 was obtained from the NTSG group (http://www.ntsg.umt.edu/). Annual mean GPP from 2000 to 2014 was used to produce Fig. 1a. SIF, which is sensitive to the electron transport rate of plant photosynthesis as well as the fraction of absorbed radiation49,50, was obtained from the Global Ozone Monitoring Experiment-2 (GOME-2) for the period 2007–2014. The SIF data are retrieved near the λ = 740 nm far-red peak in chlorophyll fluorescence emission. Details of the retrieval of SIF from GOME-2 measurements can be found in ref. 49. Monthly GOME-2 SIF at 0.05° resolution from 2007 to 2014 was used. NDVI is an index of landscape-integrated vegetation greenness and photosynthetic capacity, which is related to photosynthetic potential under ideal environmental conditions; NDVI thus reflects an inherent vegetation photosynthetic property. NDVI was taken from the monthly, 0.05° MODIS MOD13C2 (C6) product from 2000 to 2014, and only the data flagged as “good-quality” were used. FPAR is the fraction of photosynthetically active radiation in the 0.4–0.7-μm spectral range that a plant canopy absorbs for photosynthesis and growth. FPAR was taken from the 8-day, 1-km resolution MODIS MCD15A2 (C5) product from 2000 to 2014, and only the data flagged as “good-quality” were used. All the GPP or photosynthetic capacity indices were aggregated to an annual time step at 1° spatial resolution.
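The spatial aggregation step can be illustrated with a block average. The text does not state which aggregation method was used, so the sketch assumes a simple unweighted mean over non-overlapping blocks (a factor of 20 takes a 0.05° grid to 1°), ignoring missing values. The function name is hypothetical.

```python
import numpy as np

def block_mean(grid, factor=20):
    """Aggregate a 2-D grid by averaging non-overlapping factor x factor blocks.

    NaN cells (e.g., pixels removed by quality flags) are ignored.
    Assumes an unweighted mean; area weighting is not applied.
    """
    ny, nx = grid.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    blocks = grid.reshape(ny // factor, factor, nx // factor, factor)
    return np.nanmean(blocks, axis=(1, 3))
```

For example, a 4 × 4 grid with `factor=2` yields a 2 × 2 grid in which each cell is the mean of the four fine-resolution cells it covers.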

We used two gridded NEE estimates: NEE from an ensemble of four atmospheric CO2 inversions (ACI) (NEEACI) and NEE upscaled from eddy covariance flux data for North America (EC-MOD)51. Atmospheric CO2 inversions estimate carbon exchange between the earth surface and atmosphere by utilizing atmospheric CO2 measurements (e.g., their observed temporal and spatial gradients), a key observational component of the global carbon cycle. ACIs differ mainly because of choices for atmospheric observations, transport model, spatial and temporal flux resolution, prior fluxes, observation uncertainty and prior error assignment, and inverse method. Therefore, different ACIs are likely to differ in the spatial distribution and magnitude of carbon flux45. Four different ACI products, including Carbon-Tracker 2015 (CT2015)52, Carbon-Tracker Europe 2015 (CTE2015)53, CAMS54, and Jena CarboScope v3.855,56, were obtained from 2000 to 2014, and resampled to 1° resolution using nearest-neighbor resampling at an annual time step. For each year from 2000 to 2014, an ensemble annual mean NEE was calculated across the four ACIs (termed NEEACI). Positive NEE indicates a net CO2 flux from the atmosphere to the land ecosystem, and thus a carbon sink for the land ecosystem. Annual mean NEEACI from 2000 to 2014 was used to plot Fig. 1b. The EC-MOD NEE was developed from eddy covariance (EC) flux data, MODIS data streams, micrometeorological reanalysis data, stand age, and aboveground biomass data using a data-driven approach at the UNH51. EC-MOD NEE was obtained at an 8-day time step at 1-km resolution between 2000 and 2012 and was aggregated to an annual time step at 1° spatial resolution. GPP and TER from the EC-MOD approach were not used, because they are not directly measured and are inherently correlated with NEE.

TRENDY DGVM simulations. We used simulations of ten DGVMs from the TRENDY v2 ensemble57 for the period 2000–2010: Hyland58, JULES59, LPJ60, LPJ-GUESS61, NCAR-CLM462, ORCHIDEE63, OCN64,65, SDGVM66, and VEGAS67. The model ensemble stems from the TRENDY Inter-model Comparison (“Trends in net land–atmosphere carbon exchange over the period 1980–2010”) that provided bottom-up estimates of carbon cycle processes for the Regional Carbon Cycle Assessment and Processes (RECCAP). Our analysis uses simulations from the “S2” storyline, which includes time-varying atmospheric CO2 concentrations and climate and fixed land cover for 2005. All simulations were based on climate forcing from the CRU-NCEPv4 climate variables at 6-h resolution for the years 1901–2010, including precipitation, snowfall, temperature, short-wave and long-wave radiation, specific humidity, air pressure, and wind speed. GPP, NEE, and TER were summarized at 1° spatial resolution at an annual timescale from 2000 to 2010 for each model.

EC observations. A total of 17 sites with at least 5 years of data, representing the major ecosystems across the CONUS, were obtained from the FLUXNET2015 database (Supplementary Table 2 and Supplementary Fig. 16). Consistent with NEEACI, positive NEE denotes uptake by the biosphere, and negative values indicate carbon losses.

Climate data. Monthly gridded temperature (Temp) and precipitation (Prep) at 0.5° spatial resolution from 2000 to 2014 were obtained from the Climatic Research Unit (CRU TS v. 3.25) at the University of East Anglia68.

Global soil respiration database. The global soil respiration database (SRDB v3)28 encompasses all published studies from 1961 to 2012 that report at least one of the following data measured in the field (not laboratory): annual total soil respiration (Rs), mean seasonal Rs, a seasonal or annual partitioning of Rs into its source fluxes (i.e., Ra and Rh), Rs temperature response (Q10), or Rs at 10 °C. In this analysis, we used records containing annual Rs (Ra or Rh, if present) after 2000 in the CONUS (Supplementary Fig. 18). Wetland and desert records were excluded. In total, we obtained 123 site-year annual Rs measurements, and 18 site-year Ra and Rh measurements.

Wildfire emission. Monthly gridded CO2 emission from wildfire at 0.25° resolution was obtained from the Global Fire Emissions Database (GFED4s, with small fires). Information about the algorithms, data, and uncertainties for the product can be found in ref. 69.

Human influence index. The human influence index (HII), an indicator of human impacts on the environment and ecosystem, was obtained from the Global Human Footprint Dataset of the Last of the Wild Project, Version 2, 2005 (LWP-2)70. The HII was created from nine global data layers covering human population pressure (population density), human land use and infrastructure (built-up areas, nighttime lights, and land use/land cover), and human access (coastlines, roads, railroads, and navigable rivers), and normalized by biome and realm.