Soils store the largest amount of carbon in terrestrial ecosystems containing around 1500 Pg C (Pg; 1 Pg = 1015 g) soil organic carbon (SOC) in top 1 m depth1,2. Understanding the relationship between SOC and its environmental controllers is key for accurately predicting climate and land use change impacts on SOC and reducing uncertainties in large scale carbon climate feedback projections.

Earth System Models (ESMs)3 are used to predict the global carbon climate feedbacks and simulate the future state of soils and ecology. Despite their key roles in determining the spatial heterogeneity of SOC and regulating the rate of SOC decomposition, many environmental factors that regulate soil formation are not adequately represented in current land surface models4. As a result, current land surface models poorly represent baseline SOC spatial heterogeneity4,5 and show large uncertainties in predicting future carbon climate feedbacks6. Burke, et al.7 reported that the quantity, spatial distribution, and decomposability of SOC stocks accounted for half of the overall uncertainty in predicting future carbon climate feedbacks and associated climate changes. Therefore, to reduce the uncertainty in future carbon climate feedback projections, it is critical to appropriately represent environmental controllers and the spatial heterogeneity of SOC in land surface models. One way to improve the spatial heterogeneity of SOC stocks in land surface models is to quantify and represent the environmental controls on SOC stocks consistent with field observations.

Statistical geospatial modeling can be used to quantify the heterogeneity of environmental controllers on SOC stocks from regional to national and global scales. Also, environmental controllers’ spatial gradient can be used in a space-for-time substitution approach to predict the spatiotemporal variation in SOC stocks8. O'Rourke, et al.9 stated that linking specific environmental controllers to SOC functions can be a way to toward more realistic projections in ESMs. Examples of this approach were applied for Australian10, Brazilian11 and other soils using geospatial modelling. The same way Weintraub, et al.12 highlighted the potential of soil observation networks to accelerate process representation in SOC models and improve our current capacity to make predictions.

In this study we used a database composed of 4559 soil profile observations in conterminous US and 56 environmental predictors representing climate, topography, soil type, geology, land use and water dynamics to fit a geographically weighted regression model. We used the model to predict and generate maps of current and future SOC in 21 US ecoregions. The maps were generated considering current and two shared socioeconomic pathways SSP126 and SSP585 from GFDL-EMS4 of Coupled Model Intercomparison Project phase 6 (CMIP6), corresponding of IPCC RCP 2.6 and 8.5 scenarios. To produce the SSP126 and SSP585 pathways maps, we used future temperature and precipitation regimes from GFDL-EMS4 predictions in conterminous US. Finally, the results were compared with total SOC stocks from GFDL-ESM4 model predictions.


The chosen model included 12 individual variables from the original 56 (Table 1) as yearly averages: Temperature, Precipitation, Atmosphere Net Radiation, Net Primary Productivity, (Normalized difference vegetation index) NDVI, Forest and Pastureland cover classes, Inceptisols, Vertisols, Ultisols, Spodosols, and Mollisols soil classes. The obtained SOC stocks map for all ecoregions (Figs. 1, 2) showed higher values in the north east and pacific north west, forested mountainous areas, south Florida, and central US prairies. The regions with highest mean SOC stocks were Everglades (777 Mg/ha), Mixed Wood Shield (660 Mg/ha), Mississippi Alluvial and Southeastern USA Plains (519 Mg/ha), Atlantic Highlands (456 Mg/ha) and Mixed Wood Plains (352 Mg/ha) (Table 2). The lowest values were found in Warm Deserts and Texas-Louisiana Coastal Plain, less than 65 Mg ha−1 C.

Table 1 Environmental predictors used for the geospatial modelling process.
Figure 1
figure 1

United States of America level II ecoregions, and the RaCA datapoints distribution, the red and green points were chosen for calibration and validation of the geographically weighted regression model respectively.

Figure 2
figure 2

Maps of soil organic carbon stocks distribution over continental United States of America obtained from the geographically weighted regression model (a) and model residuals (b). The black lines represent the borders of level II ecoregions described in Fig. 1.

Table 2 Soil carbon stocks and model residue for US main ecoregions.

Predicted SOC stocks using geographically weighted regression (GWR) showed coefficient of variation (R2) of 0.48 (quasi-global). The GWR approach consistently underestimated the SOC stocks, bias is higher in the ecoregion with higher observed SOC stocks (Table 2). The bias was 83 and 71% in the Everglades and Ozark-Ouachita-Appalachian Forests; all other regions had bias less than 43%. Six Ecoregions presented model bias lower than 15. The model residual was not distributed homogeneously inside ecoregions, but some regions presented higher model bias than others (e.g., Mississippi delta and Warm Deserts SOC stocks were under and overestimated respectively) (Fig. 2).

SOC stocks were controlled by different environmental factors across US ecoregions (Fig. 3). The ecoregions, Mediterranean California, Cold Desert, Upper Gila Mountain, Warm Desert, Western Cordillera, and South-Central Semiarid Prairies were closely related to net primary production and vegetation types. In the Southern plains ecosystems, precipitation and net solar radiation were good predictors, and Everglades showed a different dynamics, presenting mean temperature and NDVI as good predictors. There was not a group of predictors highlighted for the other Ecoregions (Fig. 3). Mean temperature was positively related with SOC stocks in Everglades, Texas-Louisiana Costal Plains, Mediterranean California and Mississippi Alluvial and Southeastern USA Plains, on the other Ecoregions temperature increases resulted in reduction of SOC stocks. In Texas-Louisiana Coastal Plains and Mediterranean California SOC stocks were more sensitive to precipitation amount, which indicated it is a major diver of SOC stocks in warm and dry conditions. In the Everglades, Central USA Plains, Upper Gila Mountains and Mixed Wood Shield precipitation was negatively related on SOC stocks.

Figure 3
figure 3

Principal component analysis relating the selected environmental predictors and the United States of America level II ecoregions.

The prediction for SSP126 and SSP585 scenarios using GFDL-ESM4 projection of temperature and precipitation showed a decrease in SOC stocks. The magnitude was about 4% to 7% (3252 and 5470 Tg) (Tg; 1 Tg = 1012 g) respectively (Table 2, Fig. 4). Both prediction maps (Fig. 4) showed an increase in SOC for northern prairies and Florida ecoregions. Other ecoregions showed decrease in SOC stocks indicating sensitivity of those ecosystems to future climatic changes. However, uncertainties related to sea level rise13,14, especially on SSP585 scenario, which predicts a raise between 60 and 110 cm util 2100, make it difficult to precisely determine SOC decreases in Florida Lowlands as this region is expected to lose land mass. Four ecoregions showed an increase in SOC stocks, Everglades, Mixed Wood Shield, Mississippi Alluvial and Southeastern USA Plains and Warm Deserts, it was most pronounced in the Everglades with a 192% increase in SOC. Model projections showed a reduction in SOC stocks for both lower and higher emission scenarios (SSP126 and SSP585) (Table 3, Fig. 5). The changes were not significant for SSP126 with approximately ~ 2% change. In contrast, for the SSP585 scenario, the reduction ranged between 5% in 2030 to 21% in 2100. Although the highest value of SOC observed in SSP585 (4507 Mg ha−1) was almost double the SSP126 (2506 Mg ha−1), total SOC stocks (the sum of all pixels in the maps) was lower (Fig. 4). We also compared our SOC results with the GFDL-ESM4 projections. Despite differences in the model’s resolution (800 m for geospatial model and 0.5° for GFDL-ESM4) and particularities of process representation (e.g., GFDL-ESM4 does not simulate for wetlands and peatlands); it also showed a decrease in SOC for SSP585 pathway, being more pronounced in northwest part of US (Supplementary Fig. 1). The lowest emission scenario (SSP126) maintained the current SOC stocks.

Figure 4
figure 4

Soil organic carbon stocks prediction for 2100 in SSP585 (a) and SSP126 (b) scenarios using GFDL-ESM4 model of couple model intercomparison project phase 6 (CMIP6) climate projections data and the calculated difference compared with 2017 soil organic carbon stocks for SSP585 (c) and SSP126 (d). The black lines represent the borders of level II ecoregions described in Fig. 1.

Table 3 Soil carbon stocks predictions for every 10 years between 2030 and 2100.
Figure 5
figure 5

Soil organic carbon stocks predictions in continental United States of America for every ten years between 2020 and 2100 for SSP126 in blue and SSP585 in red.


Our baseline SOC stock distribution showed similar spatial distribution as the published RaCA map produced using an ordinary kriging spatial interpolation approach15. The high values of SOC stocks in coastal environments showed the high capacity of wetlands like the Everglades to store SOC. Wetlands potential for store SOC was also highlighted in other studies, Hinson, et al.16 estimated 1153–1359 Tg of SOC in the 0–100 cm in US peatlands, or about 19% of US SOC stocks. Similarly, the model bias in the SOC estimation for ecoregions with higher SOC stock were also documented in earlier studies16. Net primary production importance as predictor for SOC stocks in driest ecosystems like the North American deserts (Fig. 3) may be related to its sparse vegetation. Thereby, probably as the carbon input to soil is low compared to other ecosystems, the net primary production impact was captured as a limiting factor.

In this study, climatic variables were important predictors of SOC stocks in some ecoregions such as the Southeastern USA Plains and Everglades, showing the sensitivity of these ecosystems to projected climate change estimates. The higher influence of temperature and precipitation on SOC of wetlands indicate vulnerability of these system to future climate changes17,18. Although climate factor has been reported in this and other studies as important predictors for SOC stocks, it may be a proxy for geochemical factors that affect SOC directly like soil texture and microbial activity. Geochemical factors were also reported as important predictors for SOC stock in other studies19. Yang, et al.20 found soil moisture and texture as major drivers to explain SOC distribution in Tibetan plateau and Giardina, et al.21 suggested the change in soil respiration due to increased temperature and precipitation as major factor to determine the change in SOC stocks.

Some other factors used to predict SOC stocks are also expected to change in the future (e.g., the location of wetlands is expected to change with the change in hydrologic regime and wildfires may increase in some regions due to temperature raise and reduced precipitation). According to the fifth IPCC report18, the Southern US ecoregions are expected to get drier, opposite to the Everglades and the central region that are expected to experience higher precipitation. Although wetlands are expected to change due to increased CO2 concentration, precipitation change, frequent extreme events (e.g., floods, storms), sea level rise, frequent wildfires, increase methane emissions and temperature, the effect of climate change on SOC stocks is still not well known22. The complexity of the water cycle makes it difficult to capture all the feedbacks and peatlands/wetlands may emit more carbon than sequester17. It’s not clear whether climate change will increase or decrease wetland area over continental US, but the net primary production should increase23. Primary vegetation is expected to move South to North in continental US due to increase in temperature in higher latitude23,24. Isolated events like wildfires has increase in the last years in west conterminous US, resulting in tree mortality25. Although carbon emission from wildfires is certain, its effect is still not well understood because of vegetation regeneration26. On the other hand, US deserts may increase biomass production18.

The extreme SOC values showed no difference for record low between SSP126 and 585, but the record high was 41 to 500% higher in SSP585 compared to 126. Soil organic carbon stocks reduction accounting for conterminous US for SSP585 was highly pronounced compared to 126, highlighting SOC sensitivity to climatic factors. The main exception, Florida, showed an increase in SOC stocks (Table 2), that can be explained by the high sensitivity of this region to mean temperature (Fig. 3). Temperature and precipitation changing are expected to influence other factors such as vegetation and sea level, influencing SOC stocks. These changes, as well its effect over SOC stocks, takes time, since SOC stabilization in soils is a dynamic process that reaches a new equilibrium stage on ecosystems after years or even decades from perturbation. These factors can add uncertainties when using a spatial based approach instead of temporal8. Although, when comparing our results with GFDL-ESM4 model simulations that accounted for carbon climate feedbacks, the same pattern was observed, considering total SOC stocks reduction for SSP126 and SSP585.

Although SOC dynamics may be similarly affected by the explored predictors in all the studied region, our results indicated relative importance, since SOC stocks were limited for different predictors across different ecoregions. This way, changes in SOC stocks inside the same ecoregion was related to variation on the specific limiting factors. This is especially important for improve soil process representations in Earth System Models, where treat specific important factors more accurately in different ecoregions can be a strategy for improve models’ predictions. Higher precipitation and lower temperatures were associated with higher levels of SOC stocks in majority of ecoregions. Changes in land cover types (vegetation properties) was important in drier ecosystem as North American deserts, whereas soil types and topography were more important in American prairies. Wetlands of the Everglades was highly sensitive to projected temperature changes. Although represent these behaviors in Earth System Models can be a way to produce more accurate global predictions, interaction among factors can result in emergent complex interactions, difficult to derive especially when data is a limiting factor12. This shows the importance of local models as an alternative to produce more accurate predictions on its represented region. The development of appropriate observation networks may provide data for the development, benchmarking, and validation of statistical and process-based models, increasing our understanding of SOC cycle at different scales and improving the predictive accuracy12,27. In addition, the description of different environmental controllers for US ecoregions can help to describe the scope and importance of global and local models.


Study area and soil carbon profiles and observations

We performed this study in the continental US using 4559 georeferenced soil profile observations obtained from the Rapid Carbon Assessment28 project (Fig. 1). The soil profiles were distributed across the conterminous US covering 21 existing ecoregions described by the United States Environmental Protection Agency (EPA,, land cover and soil types. A total of 31,472 samples describe the soil profiles, 77% (24,192) were distributed between 0 and 100 cm, which was used for this study. Thus, approximately five samples per soil profile were used for calculating soil carbon stocks at 0–100 cm depth. SOC was measured by dry combustion and bulk density was modelled30. For calculating SOC stocks for each pedon/profile, a fixed depth approach was used. A more complete description of the methods used to analyze the samples can be obtained in the Rapid Carbon Assessment: Methodology, Sampling and Summary15. The measured SOC ranged from 1.37 to 11,981.0 Mg C ha−1 and with a mean of 209.6 Mg C ha−1. The lowest stocks were measured in desert biomes and highest in wetlands. The SOC stocks showed a unimodal positively skewed distribution to normalize the data for the modelling activities we applied a box-cox transformation.

Environmental variables

Soil orders were obtained from the conterminous United States digital soil map derived from the gridded Soil Survey Geographic Database (gSSURGO) in 30 m resolution ( and aggregated into 10 soil orders: Alfisols, Andisols, Aridisols, Entisols, Histosols, Inceptisols, Mollisols, Spodosols, Ultisols, Vertisols. We obtained the bedrock geology types information31 from the United States Geological Survey ( map with 1 km resolution and aggregated the main geological features into 23 groups.

We obtained 30 years (1981–2010) mean annual average data of precipitation, maximum, minimum and mean temperature, mean dew point temperature and minimum and maximum vapor pressure deficit at 800 m resolution from PRISM website ( Monthly mean net radiation was derived from NASA ( with 0.25 degrees resolution and compiled into average annual net radiation datasets. The potential evapotranspiration maps were acquired from Consultative Group on International Agricultural Research—Consortium for Spatial Information ( with spatial resolution of 30 arc-seconds (≈ 1 km at equator). The mean annual NDVI and net primary productivity datasets were obtained from NASA Earth observations at 10 km resolution ( We obtained US land cover data of 30 m resolution from multi resolution land characteristics ( and aggregated land cover types into 6 major categories, briefly: Open Water and Perennial Ice/Snow were grouped as Water, Developed Open Space, Low, Medium and High Intensity were grouped as Urban, Deciduous, Evergreen and Mixed Forest were grouped as Forest, Woody and Emergent Herbaceous Wetlands were grouped as Wetlands and the other classes were Rock, Cropland, Grassland, Pasture and Scrub. The Water, Urban and Bare Rock Land cover types were excluded from the analysis. All the chosen categorical variables were treated as individual predictors (e.g., We considered Alfisols, Andisols, Forest and Croplands as four predictors).

We obtained a digital elevation model (DEM) with 30 m resolution32 from US Geological Survey Database ( and derived the following terrain attributes: Elevation, aspect, slope, plan curvature, profile curvature and total curvature in GIS environment using ArcGIS v 1033. The environmental predictors used are described in Table 1 and the ecoregion used for group the observations in Fig. 1. All the environmental dataset raster maps were resampled to 800 m, the climatic data resolution, and the values were extracted at SOC sample points and used in further analysis.

Data pre-processing and geospatial model fit

The SOC data in the same ecoregion that were outside of 1.5*IQR where removed as outliers. First, we used plots to identify non-linear relationships that could not be captured by linear modelling. Further, to avoid including unnecessary environmental predictors in the model we generated Pearson correlation coefficients for all numerical predictors paired in a correlation matrix. When the pair showed a high correlation coefficient (r > 0.70) one of the predictors was removed, we choose keeping the one which its effect over SOC dynamics is better known according to current theory. To measure the model fit, we divided the database into training (75%; 3419 samples) and testing (25%; 1140 samples) datasets. The training dataset was used to fit the model and testing set to measure the model prediction capacity. We treated the nominal/categorical variables as dummy variables (e.g., 0 or 1).

After pre-processing, we fitted multiple regression models and used three selection criteria to choose the optimum set of linear predictors, variable significance according to F test at p < 0.05, best subset34, and R2. Using the significant environmental predictors, we fitted a GWR model, the adaptative bandwidth was chosen based on Akaike Information Criterion minimization35. Briefly, GWR works as a multiple linear regression that fits unique parameters for every feature in the dataset. Consequently, the result is a model with the same predictors for SOC stocks and different coefficients (β values) for each local and ecoregion. The spatial variation of the coefficients can be used to explore local ecoregion specific controllers for SOC stocks. A more complete description of GWR approach can be found in Fotheringham book36,37. The model was then applied to the test dataset which had an R2 of 0.48. We then used the GWR model to generate a SOC stocks map in in Mg ha−1 33. Using the coefficients adjusted for the predictor variables in the GWR model, we calculated a correlation matrix and performed a principal component analysis to visualize the relationship between environmental predictors and ecoregions.

Future climate change scenarios

We investigated the changes in SOC stocks until 2100 under SSP126 and SSP585 climate change scenarios of CMIP618,38. The increase in atmospheric CO2 is expected to increase temperature and change precipitation averages. We obtained maps of mean air temperature and precipitation from National Climate Assessment ( and GFDL-ESM4 models (NOAA, National Oceanic and Atmospheric Administration) for every 10 years from 2030 to 2100 and used the GWR model to explore the effect of these changes on SOC stocks. Briefly, all the model predictors were kept constant with exception of temperature and precipitation (obtained from GFDL-ESM4 predictions), then the model was run to produce SOC maps. The generated SOC stock maps were also compared with GFDL-ESM4 model SOC stock maps for current and future climate change scenarios (CMIP6, This model was chosen because produced better predictions of SOC stocks in American biomes compared to other Earth System Models4. The maps were subtracted (e.g., 2100 SOC map for SSP126 scenario—current SOC map) for generate SOC stocks difference maps and those were used for comparing the differences between scenarios. All modelling processes were performed using R v. 3.6.139.