Introduction

The soil in terrestrial and blue carbon (C) ecosystems (BCE) represents a substantial store of organic C globally, with a total stock of approximately 1500 to 2400 Gt, including the ~32 Gt in BCE1,2. This stock is larger than that in terrestrial vegetation and the atmosphere combined, making soil organic carbon (SOC) sequestration and protection in terrestrial and BCE (seagrasses, tidal marshes, and mangroves) an important nature-based solution to mitigate climate change3. Increasing SOC also provides other co-benefits listed as priorities under the Sustainable Development Goals set up by the United Nations Educational, Scientific and Cultural Organization (UNESCO), such as improved ecosystem health, biodiversity, food security, water quality, and coastal protection4,5,6,7.

Understanding the environmental drivers of SOC variation in terrestrial and BCE is imperative to anticipate potential losses posed by stressors and identify opportunities in climate mitigation strategies. Changing climate and disturbance regimes can threaten SOC stocks in both terrestrial and BCE8,9,10. For example, the summer of 2019/20 saw an estimated 5.8 million hectares of temperate forest burnt, which equates to ~21% of their area in southeast Australia11. Depending on their severity and frequency, wildfires can influence forest SOC stocks12,13 with the greatest effects indicated after short-interval high-severity fires14. An improved understanding of the drivers of SOC variability and stocks can inform forest management that prioritises C dense hotspots vulnerable to changes in disturbance regimes and climate.

BCE are among the most vulnerable and threatened habitats to C loss from climate change, coastal development, sea level rise, and deforestation15,16. Future increases in temperature and rainfall variability with a changing climate may also cause changes in the distribution and the processes that alter C storage in BCE17. Although some BCE may transiently increase their SOC stocks with increased warming18, they are vulnerable to extreme weather events19,20. For example, ref. 21 estimated the loss of 0.54–2.45 Mt C from the seagrass beds of Shark Bay, Western Australia, following a marine heatwave. Similar to terrestrial ecosystems, the SOC stocks of BCE are highly vulnerable to both climate change and anthropogenic disturbances. Therefore, it is critical to develop accurate spatially explicit estimates of SOC stocks for these systems using consistent methods across terrestrial and BCE.

Land management practices can alter SOC stocks22,23. Activities that disrupt the soil and increase respiration and erosion release the SOC stored24. In Australia, land use change from native systems to cropping has released an estimated 51% of the SOC stored in the top 0.1 m of soil25 and conversion of forest to cultivation has caused a decrease in SOC of around 30%26. In BCE, the loss of seagrass meadows exposes the underlying organic-rich soils to erosional processes27. The conversion of mangroves to aquaculture ponds or changes in the tidal flow within estuaries can result in significant C loss28,29. The historical losses of SOC in terrestrial and BCE have incited interest in protecting and restoring the vegetation and the underlying SOC to help mitigate climate change30,31.

In terrestrial environments, soils are typically well drained with C inputs accumulating from biomass above and within the soil profile. BCEs are at least periodically inundated, affecting their physical and chemical soil properties and making the SOC stored in these habitats distinct from terrestrial environments. The position of the soil in the intertidal zone and the prevalence of flooding in BCEs alter the soil water regimes, drainage, and oxygen availability32, which drive the accumulation and long-term storage of SOC33,34. Soils with low oxygen availability will have reduced capacity for microbes to mineralise the C stored in saturated saline sediments, slowing SOC decomposition35. Moreover, position in the intertidal zone will affect the deposition of SOC and minerals in sediments from terrestrial and oceanic inputs34. The combination of these processes enhances the rate of C accumulation in BCEs and reduces C mineralisation and loss of C from the soil profile, thereby increasing total SOC storage and making BCEs a long-term store of SOC. Therefore, it is crucial that baselines of continental SOC are derived consistently and include both terrestrial ecosystems and BCEs.

Spatially explicit estimates of SOC stocks in a country’s terrestrial and BCE are needed for conservation and restoration practices, informing national inventories, and climate mitigation policies such as Nationally Determined Contributions under the United Nations Framework Convention on Climate Change36. Spatially explicit estimates of SOC stock currently exist for terrestrial habitats with global maps37, as well as country-specific estimates from Australia38, the USA39, France40, China41, Tunisia42, and other countries. There is a continental map of Australian BCE stocks43, and also a global map of SOC stocks under mangroves44. However, we have found no literature on the combined spatial modelling of terrestrial and BCE anywhere in the world. We need harmonised datasets and consistent modelling of SOC stocks across both ecosystems to improve our understanding of its variation at different spatial scales. The connectivity between terrestrial and marine ecosystems45 requires a holistic understanding of relationships and processes occurring both inland and along the coast to aid conservation strategies.

Here, we collated and harmonised the most comprehensive terrestrial and BCE dataset of SOC stocks in Australia. We modelled the stocks using a multi-scale machine learning method with climatic, edaphic, mineralogical, vegetation, terrain, and oceanographic predictors. Our aims were to (i) enhance our understanding of continental and regional environmental drivers of SOC stocks in Australia, (ii) estimate the 0–30-cm layer SOC stocks in Australia, its ecosystems and land uses to identify hotspots for conservation, and (iii) digitally map the SOC stocks and their uncertainty across Australia at a spatial resolution of 90 m to inform nature-based strategies for climate change mitigation.

Results

We harmonised data from 6,767 sites representing Australia’s terrestrial ecosystems and BCE and calculated the SOC stocks for the 0–30-cm layer. It is the largest and most comprehensive dataset currently available and represents all soil types, ecosystems, and land uses in Australia. We then set up a model to represent the soil and environmental controls and the spatial distribution of SOC in terrestrial and BCE. This model describes SOC as a function of factors related to its formation and distribution: edaphic, climatic, biotic, terrain, and oceanographic (see Methods)38,46,47. SOC varies at different scales over the landscape48. To capture the scale dependence of SOC across Australia, we decomposed elevation and the topographic wetness index (TWI) with the discrete wavelet transform (DWT; see Methods)49,50 (Fig. 1), and used the multi-scale data in our model. We used the regression-tree method cubist to relate 29 spatially explicit proxies of the factors in the model to the SOC stocks.

Fig. 1: Topographic wetness index (TWI) of Australia and wavelet multiresolution decomposition.

a The original TWI map (90 m × 90 m pixels) and b panel inset showing a magnified section of northern New South Wales. c–f show the levels of wavelet decomposition (4–7) depicting shorter to longer range spatial variation. Level four represents ~720 m, level five ~1440 m, level six ~2880 m, and level seven ~5760 m.

The regression-tree algorithm further aids with the multi-scale understanding of the drivers of SOC and its spatial distribution because it segments the data into subsets based on a series of rules with conditional statements (see Methods). Within each segment, the characteristics of the predictors are similar. Then, within each division, a least-squares regression predicts the response51. We analysed the conditions and linear models to describe the drivers of SOC variation in terrestrial and BCE at the continental scale. We used the soil, climate, vegetation, multi-scale terrain, and oceanographic variables within each partition to discern the regional drivers of SOC (see Methods). We validated the models with a tenfold cross-validation, by bootstrapping and using an independent test set, which showed the models were accurate (Table 1).

Table 1 Summary statistics for the model validation.

Regionalising Australia’s terrestrial and BCE

cubist partitioned the data into 31 distinct rule sets that regionalise SOC and characterise the spatial variation in SOC stocks across Australia. The model separated the data into contiguous regions closely aligned with habitat type and climate zone, representing increasing SOC stocks (Fig. 2a). To aid interpretation, we grouped the rules into eight rulesets (Fig. 2; see Methods). For example, forests were in rulesets representing larger SOC stocks, whereas shrublands and grasslands were in rulesets that represent smaller stocks (Fig. 2a). In BCE, seagrasses occurred in rulesets representing soil with smaller SOC stocks. In contrast, mangroves and tidal marshes were generally in rulesets that represent larger stocks alongside terrestrial forests (Fig. 2a, b). The spatial arrangement of the rules across Australia shows that the model adequately regionalised the dataset into bioclimatic regions (Fig. 2c).

Fig. 2: Continental soil carbon stocks by cubist rule.

a Boxplot of the eight cubist rulesets, separated by ecosystem type. Center line represents median SOC (t ha−1), upper and lower quartiles represented by the box limits, and whiskers extend to the smallest and largest observations within 1.5 times the interquartile range. b Violin plot with red point representing median SOC (t ha−1) content of each ruleset highlighting the increase in SOC with rule number, c spatial distribution of data collated in this study and the rulesets which they occupy.

Environmental drivers of SOC in terrestrial and BCE

We used a state factor model to represent the continental and regional drivers of the spatial distribution of SOC in terrestrial ecosystems and BCE. The model describes the soil as the state, which is a function of biotic, climate, edaphic, oceanographic and terrain factors that define the system. For the modelling, these factors are substituted with spatially explicit proxies, which we used as predictors in our model (see Methods). To determine the drivers and to digitally map the SOC stocks, cubist partitions the data into subsets based on a series of rules with conditional statements (see Methods). The data in each subset (described by a ruleset) are then fitted with an ordinary linear least-squares regression. To determine the drivers of the SOC at the continental scale, we analysed the predictors used in all of the conditions and the linear models using the variable importance from cubist (see Methods). The most important driver of the variation in SOC stocks over Australia is net primary productivity (NPP), which highlights the provision of biomass and C inputs into the soil in terrestrial and BCE. Solar radiation, rainfall, and clay content were also key drivers of continental SOC stocks (Fig. 3a). However, at this scale, terrain attributes were less important. In BCE, the oceanographic predictors, known to vary with SOC52, were relatively unimportant. Possible reasons are the relative sparseness of the data in BCE, and that other variables affecting SOC across terrestrial and coastal systems, such as elevation, clay content, and mineralogy, helped to proxy the effects of oceanographic predictors. The continental analysis provides an expected general overview of the drivers of SOC variation in terrestrial and BCE in Australia. However, we must describe the region-specific drivers of SOC to better understand C dynamics in these ecosystems and to aid management strategies for the preservation of current SOC stocks for climate change mitigation.

Fig. 3: Continental and regional drivers of SOC variation derived from the cubist model.

a Continental control of variation represented with variable importance (%) and b Regional controls of SOC variation derived by regression coefficients from each of the distinct cubist rulesets (from 1–8). Rulesets correspond to those in Fig. 2. Rules represented by larger numbers depict data with greater SOC stocks. The size of the regression coefficient indicates the effect, and the sign (positive or negative) indicates its direction.

To describe the regional drivers, we used the regression coefficients of the linear model fitted to each regionalised subset of the data that formed the ruleset. We found that multi-scale terrain attributes were consistently strong drivers of regional SOC variation (Fig. 3b). In rule one, which is associated with the smallest SOC stock and represents mostly grassland and shrubland ecosystems (Fig. 2a), elevation at a medium scale (of approximately 1500 m) (Fig. 1; see Methods), NPP, and rainfall were the most important drivers of SOC variation (Fig. 3b). Rule two, associated with the next largest SOC stock, represents woodlands in terrestrial ecosystems and seagrasses in BCE (Fig. 2). NPP and rainfall drive SOC variation here, but clay content and the clay mineral smectite were also key drivers. In rule three, associated with grasslands and woodlands of semi-arid Australia, medium-scale elevation and kaolinite have the greatest influence on SOC variation, with mean annual rainfall and the Prescott index also important. Rules four and five had a mix of terrestrial and BCE data. The Prescott index, smectite, and total phosphorus had the greatest influence on SOC variation in rule four, which includes data that span terrestrial and BCE almost equally. The SOC stock in rule five, with the smallest number of sites, was predominantly controlled by short–medium scale elevation and kaolinite. Tidal marshes and mangroves occurred mainly in rule six (Fig. 2). Here, clay content, kaolinite, and longer scale TWI (Fig. 1; see Methods) drove the variation in SOC stocks. Rule seven is associated with the second-largest SOC stock and represents forests, mangroves, seagrass, and tidal marshes. In addition to medium–longer scale elevation (Fig. 1; see Methods), annual rainfall in terrestrial ecosystems and wave height in BCE drive variation in SOC stocks. Finally, in rule eight, associated with the largest SOC stock and comprising mostly temperate forests (Fig. 2a), elevation and evapotranspiration drive the variation in SOC stocks (Fig. 3b).

Digital mapping of SOC stocks and their uncertainty

We used the cubist model to digitally map the SOC stock in Australia’s terrestrial and BCE (Fig. 4; see Methods). The model’s validation statistics, which we used to assess predictability (Table 1), show that the model was unbiased and accurate. We estimate that the average SOC stock in Australia, including terrestrial and BCE, in the 0–30 cm layer is 36.2 t ha−1 (95% CI 25.7–51.3 t ha−1). Estimates range from 7.6 t ha−1 in the arid regions of central Australia to 582 t ha−1 in the temperate regions of southeastern Australia and western Tasmania (Fig. 4). Aggregating the mapped estimates to the continental scale, we estimate total SOC stocks to be 27.9 Gt (95% CI 19.8–39.6 Gt) (Fig. 4a). The mean stock in BCE was 61.8 t ha−1 (95% CI 35.4–108.8 t ha−1), ranging from 17.1 t ha−1 in regions with temperate climates to 313 t ha−1 in regions with subtropical climates. We estimate BCE’s 0–30 cm total SOC to be 0.35 Gt C (95% CI 0.2–0.6 Gt).

Fig. 4: Predicted SOC stocks and uncertainty for Australia’s terrestrial and BCEs.

a soil organic carbon (SOC) stock (t ha−1) across Australia’s terrestrial and BCE. Panels are examples of SOC estimates for each of the states and territories of Australia highlighted on the continental map; Northern Territory (NT), Western Australia (WA), South Australia (SA), Tasmania (TAS), Victoria (VIC), New South Wales (NSW), and Queensland (QLD). In the top panels, we show estimates for BCE for three states (WA, NT, QLD). To highlight the BCE estimates, we show the terrestrial predictions in grayscale; b Standardised 95% CI across Australia’s terrestrial and BCE to represent uncertainty. Panels are examples of uncertainty for each of the states and territories of Australia; Northern Territory (NT), Western Australia (WA), South Australia (SA), Tasmania (TAS), Victoria (VIC), New South Wales (NSW), and Queensland (QLD). In the top panels, we show uncertainty for BCE for three states (WA, NT, QLD). To highlight the BCE uncertainty, we show the terrestrial uncertainty in grayscale.

The uncertainties of the estimates were smaller where the sampling was denser and larger where the sampling was sparse. For example, estimates of SOC stock in seagrass areas in southern Australia are more uncertain than those in Western Australia (Fig. 4b). The dataset was sparser in the center of Australia, resulting in larger uncertainties (Fig. 4b).

Soil organic carbon stocks and vegetation

The soil under Australia’s tall open eucalypt forests had the largest mean SOC stock with 138.1 t ha−1 (95% CI 100–189 t ha−1). Conversely, the soil under acacia woodlands, which occur in arid and semi-arid climates, had the lowest mean SOC stocks with 21.5 t ha−1 (95% CI 14.0–32.5 t ha−1). Rainforests and tall open eucalypt forests were the only ecosystems with mean SOC stocks that exceeded 100 t ha−1 (Table 2). Open eucalypt (10–30 m canopy height) and low open eucalypt forests (<10-m canopy height) contain relatively less SOC with a mean of 87.4 t ha−1 (95% CI 63.8–119.6 t ha−1) and 78.5 t ha−1 (95% CI 55.3–110.6 t ha−1), respectively. Acacia woodlands, hummock grasslands, and eucalypt woodlands had less than a quarter of the mean SOC stocks of tall eucalypt forests. However, given the large spatial extent of these ecosystems, eucalypt woodlands and hummock grasslands had the largest total SOC stocks (combined 30% of the total SOC stocks; Table 2). Acacia shrublands and tussock grasslands occur above soils with relatively small mean stocks of 25.3 and 26.3 t ha−1, respectively. However, these systems also occupy a large extent (831,374 and 1,410,611 km2, respectively). They store more SOC than soil under tall open eucalypt forests and rainforests combined, which hold the largest mean stocks per unit area but occupy relatively smaller areas (Table 2).

Table 2 Estimates of mean (t ha−1) and Australia-wide (Gt C) SOC stocks in major vegetation groups with 95% confidence interval of prediction.

BCE have mean SOC stocks comparable to some of Australia’s most C dense terrestrial forests. Our mean estimates of SOC stocks for seagrasses, tidal marshes and mangroves are 50.0 (95% CI 27.1–94.1 t ha−1), 64.7 (95% CI 37.4–111.6 t ha−1), and 79.3 t ha−1 (95% CI 47.7–131.7 t ha−1), respectively (Table 2). Tidal marshes have the greatest total SOC stocks of the BCE and contribute most to Australia’s SOC stocks with 0.14 Gt (95% CI 0.08–0.24 Gt, Table 2). Blue carbon ecosystems occupy approximately 54,500 km2 (0.7% of the terrestrial and BCE area) in Australia, and contribute approximately 1.3% (0.35 Gt C, 0.2–0.6 Gt 95% CI) of the total stocks. The SOC stocks in BCE are among the most variable, and estimates are the most uncertain compared to all other ecosystems.

Soil organic carbon stocks and land use

Production from irrigated agriculture and plantations had the largest mean stocks with 52.93 t ha−1 (95% CI 42.25–66.28 t ha−1, Table 3). Areas under production from relatively natural environments, which include native grazing, had the smallest SOC stock, with an average stock of 32.41 t ha−1 (95% CI 23.55–44.65 t ha−1). However, because they span ~3,722,989 km2, the soil under these production systems stores 12.11 Gt (95% CI 8.8–16.68 Gt), which is ~43% of Australia’s total SOC stocks. The soil used for dryland cropping and agriculture occupies around 625,630 km2; its estimated mean SOC stock is 49.9 t ha−1 (95% CI 39.68–62.66 t ha−1) and its total stock is 3.13 Gt (95% CI 2.49–3.93 Gt).
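The relation between the mean stocks, areas, and totals quoted above can be checked with a short calculation. The Python sketch below is only an approximation under a stated assumption: it multiplies a mean stock by an area, whereas the reported totals aggregate the mapped estimates, so small differences are expected. The function name is hypothetical.

```python
HA_PER_KM2 = 100.0   # 1 km2 = 100 ha
T_PER_GT = 1e9       # 1 Gt = 1e9 t


def total_stock_gt(mean_t_per_ha, area_km2):
    """Approximate total SOC stock (Gt) from a mean stock (t ha-1) and an area (km2)."""
    return mean_t_per_ha * area_km2 * HA_PER_KM2 / T_PER_GT


# Values quoted in the text for two land uses.
dryland = total_stock_gt(49.9, 625_630)       # reported total: 3.13 Gt
natural = total_stock_gt(32.41, 3_722_989)    # reported total: 12.11 Gt
print(f"dryland cropping ~{dryland:.2f} Gt, relatively natural ~{natural:.2f} Gt")
```

Both results fall within a few hundredths of a gigatonne of the reported totals, which is consistent with rounding and with the totals being summed over the map rather than derived from the mean.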

Table 3 Estimates of mean (t ha−1) and total (Gt C) SOC stocks in primary land use types88 with 95% confidence interval of prediction.

Discussion

We used spatial machine learning to model SOC stocks in Australia’s terrestrial and BCE simultaneously. An advantage of the approach is that it leads to knowledge-discovery as it enabled us to derive an exhaustive and consistent understanding of the drivers of SOC variation across the continent and to compare the like-for-like stocks from both ecosystems. The data-driven machine learning we performed separated the country into eight distinct bioclimatic regions, where the within-region variation in climate and vegetation was similar. The model’s regional ‘blocking’ of climate and vegetation revealed the underlying regional drivers of SOC variation. Thus, the model accounted for pedogenic processes involving SOC in Australia’s terrestrial and BCE.

We showed that the continental controls of SOC were above-ground productivity and climate. Regionally, the drivers of SOC variation were multi-scale terrain and soil properties specific to the model-derived bioregions. The drivers of SOC variation in forest, with large mean SOC stocks, were small-scale elevation, evapotranspiration, and temperature, where a higher elevation and evapotranspiration affect biomass production, the C inputs into the soil and decomposition rates53,54. It has been reported that temperature and rainfall, coastal geomorphology (tides, sediment accretion, and nutrient deposition), and the above-ground biomass affect SOC variation in BCE55,56,57. Our modelling shows that elevation, clay content, and mineralogy are key drivers of SOC variation in tidal marshes and mangroves. BCE with more clay content in the soil had more SOC, possibly due to enhanced C protection and fluvial depositions of SOC in sediments56,58. Tidal marshes and mangroves had more SOC when wave height was lower, which suggests these systems sit higher on the intertidal zone or within protected areas with minimal wave-induced erosion to remove SOC and a greater propagation of above-ground biomass59. Similarly, SOC increased with total phosphorus in tidal marshes where increased nutrient deposition from adjacent terrestrial systems in the upper tidal areas increases labile SOC60,61. The knowledge gained with our approach improves understanding of SOC and its variation in terrestrial and BCE, and could inform regional and ecosystem-specific conservation and management strategies that contribute to climate change mitigation through the preservation of current SOC stocks and associated avoided emissions.

Most of the SOC in Australia resides in the vast semi-arid and arid regions under grasslands and shrublands. The mean SOC stock in these soils is small, but the large area these systems occupy results in a sizeable total stock. Soils with the largest mean SOC stocks occur in eucalypt tall open forests and rainforests with high annual rainfall, which is conducive to biomass production, and cooler temperatures that slow decomposition62. These systems should therefore constitute hotspots for conservation to avoid emissions from degraded ecosystems and to preserve the plethora of additional co-benefits they provide. For example, the indirect effects of climate change also threaten these systems through increases in the frequency and severity of disturbance events, like drought and wildfires11,14,63, which affect the potential for SOC replenishment through impacts on post-fire vegetation recovery and productivity64. In BCE, mangroves had the largest mean SOC stock; however, these systems are at risk of C loss from deforestation and degradation65,66, rising sea levels, and extreme weather events causing tree mortality16,67. Our results show a need for enhanced awareness and the importance of sustainable management for preserving vulnerable areas with large SOC stocks to prevent future losses.

The research presented here builds on that of Viscarra Rossel et al.38 and Serrano et al.43, making our findings the most comprehensive and consistent continental investigation of SOC in Australia. In addition, our results provide the most current spatially explicit estimates of SOC stocks in terrestrial and BCE and their uncertainty. Viscarra Rossel et al.38 estimated the terrestrial SOC stock of Australia to be 24.97 Gt (95% CI 19.04–31.83 Gt). Our estimate of the total SOC stocks is larger as our model includes new data from Australia’s temperate forests68 and BCE43. Including data from ref. 68 improved our continental estimates under eucalypt forests and woodlands. Our estimates of the stock in these systems (see Supplementary Note 2, Supplementary Table 2) fall within the confidence intervals of the estimated SOC stock in their small-scale regional study68.

The measurements of the SOC stock in terrestrial ecosystems were to a maximum depth of 30 cm, as much of the data comes from agricultural cropping regions. However, the measurements of SOC in BCE were to a depth of 100 cm. In our research, we constrained the modelling to 0–30 cm, rather than implementing depth functions to extrapolate the terrestrial stocks to 100 cm (e.g., ref. 69) because the main objective of this study was to harmonise the data to model and compare all ecosystems, and because extrapolation methods (e.g., ref. 70) tend to introduce errors in the modelling. Doing so would have hindered the interpretation of the drivers of SOC variation across terrestrial and BCE and made the estimates of their stocks significantly more uncertain.

The uncertainty of our estimates is related to the sampling density within the different ecosystems: sparse sampling results in more uncertain estimates (Fig. 4b). For instance, estimates of SOC under seagrasses were most uncertain because of the sparsity of data along Australia’s vast coastline. Therefore, future research and soil surveys to improve baselines and monitoring should target areas where our maps are most uncertain (Fig. 4b). We should also measure other soil properties (e.g. pH, clay content, mineralogy) to improve our understanding and modelling of soil in BCE, including the long-term preservation of SOC linked to adsorption to mineral surfaces and organo-metallic compounds71,72.

The spatially explicit estimates of SOC stocks and uncertainty we derived could help update Australia’s national inventory and improve the accuracy of the continental C balance (i.e., additions and removals) and its reporting. Our research shows how to derive consistent, spatially explicit multi-scale SOC models that span terrestrial and BCE. The approach also enhances our understanding of SOC variation and produces accurate baselines of continental SOC stocks that allow direct comparisons between terrestrial and BCE. The baselines developed here could provide a reference to more accurately determine the potential of Australian soil management to sequester C and mitigate climate change. Future work might determine changes in SOC from land management and climate change based on the spatially explicit results from this study.

Methods

Using an environmental correlation methodology, we harmonised and modelled the SOC stocks in terrestrial and BCE. The approach uses climatic, edaphic, biotic, multi-scale terrain, and oceanographic variables. Our research builds on that of ref. 38 by supplementing the dataset with samples from native forests and BCE and further developing the modelling.

The soil C and bulk density data from Viscarra Rossel et al.38, which represent all of Australia’s terrestrial ecosystems, soil types, and land uses, were supplemented with data from native forests68 and tidal marsh, mangrove, and seagrass BCE43. The updated dataset comprised 6767 sites (Table 4). The additional data were compiled from a broader range of studies and originated from various depth layers. Terrestrial SOC data were not measured beyond 30 cm depth, whereas BCE were consistently measured to a metre depth. To reduce the extrapolation and associated error of predicting to a metre depth in terrestrial ecosystems, where most of the data lie, predictions were limited to 0–30 cm. To obtain one estimate of the drivers of SOC variation and stocks for the 0–30 cm layer at each sampling location, the soil data required harmonisation. We used a similar approach to that described in ref. 38. First, we calculated SOC stock for each layer from the three different datasets:

$$\mathrm{SOC}\,(\mathrm{t\,ha^{-1}}) = \mathrm{OC} \times \mathrm{BD} \times (1-g) \qquad (1)$$

where OC is the gravimetric proportion of organic C (%), BD is the bulk density (g cm−3) and g is the gravimetric proportion of gravel (>2 mm) in the sample. For the native forest dataset68, g is the sum of the gravimetric proportions of the coarse mineral fragments, roots and charcoal. The BCE dataset did not contain gravimetric gravel content.
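As a rough illustration of Eq. 1, the Python sketch below computes the stock of a single layer. Eq. 1 as printed has no layer-thickness term; the `thickness_cm` argument and the unit bookkeeping are assumptions added here so that the result is in t ha−1, and the function name and sample values are hypothetical. With the default thickness of 1 cm the function reduces to Eq. 1 as written.

```python
def soc_stock_t_per_ha(oc_percent, bd_g_cm3, gravel_fraction=0.0, thickness_cm=1.0):
    """SOC stock of one soil layer in t ha-1 (sketch of Eq. 1).

    (OC / 100) g g-1 x BD g cm-3 x thickness cm gives g cm-2, and
    1 g cm-2 = 100 t ha-1, so the factors of 100 cancel.
    thickness_cm is an assumption; Eq. 1 as printed does not include it.
    """
    if not 0.0 <= gravel_fraction < 1.0:
        raise ValueError("gravel_fraction must be in [0, 1)")
    return oc_percent * bd_g_cm3 * thickness_cm * (1.0 - gravel_fraction)


# Hypothetical 10-cm layer: 2% organic C, bulk density 1.3 g cm-3, 10% gravel.
layer_stock = soc_stock_t_per_ha(2.0, 1.3, gravel_fraction=0.1, thickness_cm=10.0)

# BCE samples have no gravel data, so g defaults to zero.
bce_stock = soc_stock_t_per_ha(3.0, 0.9, thickness_cm=10.0)
```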

Table 4 Number of sites in each major ecosystem type used for spatial modelling.

We harmonised the data from the different soil depth layers with continuous depth functions as follows38. Sites with data from two depths were fitted with a log-log model54, while sites with three or more layers were fitted with natural cubic splines73. To reduce the overestimation of SOC stocks, we constrained the cubic spline function to be linear beyond the boundary points of the soil depth C data. The coefficients of the cubic spline and log-log functions were used to estimate SOC stock at every centimetre from 0–30 cm to obtain total SOC stock estimates for that layer. The SOC stocks were positively skewed and were log10 transformed before modelling.
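The two-depth case can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: it assumes each layer is summarised by a mid-depth (cm) and a stock density (t ha−1 per cm), fits log10(density) = a + b log10(depth) through the two points, and sums the fitted density at the midpoint of every centimetre from 0 to 30 cm. The function names and sample values are hypothetical, and the spline case for three or more layers is not shown.

```python
import math


def fit_loglog(depths_cm, densities):
    """Fit log10(density) = a + b * log10(depth) exactly through two points."""
    (d1, d2), (s1, s2) = depths_cm, densities
    b = (math.log10(s2) - math.log10(s1)) / (math.log10(d2) - math.log10(d1))
    a = math.log10(s1) - b * math.log10(d1)
    return a, b


def density_at(a, b, depth_cm):
    """Fitted stock density (t ha-1 per cm) at a given depth."""
    return 10 ** (a + b * math.log10(depth_cm))


def stock_0_30(a, b, top_cm=0, bottom_cm=30):
    """Sum the fitted density at the midpoint of each 1-cm slice."""
    return sum(density_at(a, b, z + 0.5) for z in range(top_cm, bottom_cm))


# Hypothetical site: 3.0 t ha-1 per cm at 5 cm and 1.5 t ha-1 per cm at 20 cm.
a, b = fit_loglog((5.0, 20.0), (3.0, 1.5))
site_stock = stock_0_30(a, b)
log10_stock = math.log10(site_stock)  # transformed value used as the model response
```

Evaluating at slice midpoints avoids taking the logarithm of zero at the soil surface; other evaluation points would be an equally reasonable choice.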

Spatial modelling

To derive spatially explicit estimates of SOC stocks across Australia’s terrestrial and BCE, we set up a model that relates the measured SOC stocks to the environmental and multi-scale variables related to the formation and distribution of SOC. A regression-tree model, cubist, was used to estimate SOC in areas with no available SOC stocks but with environmental variables. We describe the approach below.

Spatially explicit determinants of SOC

Climate, organisms, terrain attributes, parent material, and for some BCE, oceanographic variables affect the spatial variation of SOC46,47. Therefore, we compiled a list of 29 spatially explicit predictors that are proxies for the environmental factors known to control the variation in SOC both terrestrially and in BCE10,38,47. We did not include data relating to ecosystem disturbance regimes because consistent disturbance history data are scarce across the continent and because disturbance has been reported to have a relatively minor influence on forest SOC68. When necessary, the predictors were resampled to the 90-m grid of the digital elevation model (DEM) using a bilinear interpolation (Table 5).

Table 5 Spatially explicit predictors known to influence variation in SOC stocks of terrestrial and BCE in Australia.

Preparation of the spatially explicit predictors

Consistent modelling across all ecosystems requires that the predictors span terrestrial and BCE. Therefore, we merged the terrestrial 3-arc second shuttle radar topographic mission (SRTM) DEM with bathymetry data74 to get depth below sea level for the seagrass and mangrove ecosystems. We used net primary productivity (NPP) data that cover the terrestrial and BCE75. In that dataset, the blue C NPP (seagrasses) was derived with a vertically generalised production model (VGPM76), and the terrestrial NPP came from MODIS Terra MOD17A377. To account for the variation of SOC due to oceanographic influence, we included spatially explicit predictors of tidal range, wave energy and wave height in the models. These layers do not have any meaningful values for terrestrial ecosystems, so we replaced them with zeros in the terrestrial extent.

There were no parent material and soil property predictors that covered both terrestrial and BCE. Thus, we used the terrain, climate, and vegetation predictors that spanned both ecosystems to extrapolate the soil and parent material predictors to the BCE extent (see Supplementary Note 1, Supplementary Table 1). Following Young et al.10, the extent of BCE was derived for each of the habitats from SeaMap Australia78 and merged into a single file. Seagrass ecosystems are generally poorly mapped across much of Australia, particularly in the Northern Territory and northern Western Australia. Therefore, these extents underestimate the northern seagrass meadows43.

Multi-scale decomposition of terrain attributes

Soil properties, including organic C, vary at different scales. To capture the scale dependency of SOC in our modelling, we decomposed elevation and the topographic wetness index (TWI) with the discrete wavelet transform (DWT)49,50.

To decompose the DEM and TWI, we used a Daubechies wavelet function with two vanishing moments79. The algorithm starts by applying a high-pass and a low-pass filter to the data to separate its detailed (high-frequency) and smooth (low-frequency) components. It then continues the decomposition by applying the same filters to the smooth components49. We decomposed the data into ten scales and then used the inverse wavelet transform to reconstruct each layer, giving a predictor with the same size and resolution as the initial raster. We used four of these scales to represent the short- to long-range variation in the models. These decomposed predictors explicitly account for the multi-scale nature of SOC in our models (Fig. 1).
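The filter-and-reconstruct procedure described above can be sketched in one dimension. The example below is an illustration under stated assumptions, not the implementation used in the study: it works on a short synthetic transect rather than a raster, uses three scales instead of ten, and assumes periodic boundaries. The filter coefficients are the standard four-tap Daubechies filter, which has two vanishing moments; all function names are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Four-tap Daubechies low-pass filter (two vanishing moments).
LOW = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# Matching high-pass filter (quadrature mirror of LOW).
HIGH = [LOW[3], -LOW[2], LOW[1], -LOW[0]]


def dwt_step(x):
    """One level of the periodic DWT; returns (smooth, detail) at half length."""
    n = len(x)
    smooth = [sum(LOW[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    detail = [sum(HIGH[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return smooth, detail


def idwt_step(smooth, detail):
    """Invert one level: rebuild the signal at double length."""
    n = 2 * len(smooth)
    x = [0.0] * n
    for i in range(len(smooth)):
        for k in range(4):
            x[(2 * i + k) % n] += LOW[k] * smooth[i] + HIGH[k] * detail[i]
    return x


def decompose(x, levels):
    """Repeatedly filter the smooth component; details[0] is the finest scale."""
    smooth, details = list(x), []
    for _ in range(levels):
        smooth, detail = dwt_step(smooth)
        details.append(detail)
    return smooth, details


def reconstruct(smooth, details):
    """Inverse transform from the coarsest level back to full length."""
    for detail in reversed(details):
        smooth = idwt_step(smooth, detail)
    return smooth


def scale_components(x, levels):
    """One full-length layer per detail scale, plus the final smooth layer."""
    smooth, details = decompose(x, levels)
    layers = []
    for j in range(levels):
        # Keep only the detail coefficients of scale j; zero everything else.
        kept = [d if i == j else [0.0] * len(d) for i, d in enumerate(details)]
        layers.append(reconstruct([0.0] * len(smooth), kept))
    layers.append(reconstruct(smooth, [[0.0] * len(d) for d in details]))
    return layers


# Synthetic 16-cell elevation transect (invented values).
elevation = [10 + 5 * math.sin(2 * math.pi * i / 16) + (i % 2) for i in range(16)]
layers = scale_components(elevation, levels=3)
```

Each layer has the same length as the input, and because the transform is linear and orthogonal the layers sum back to the original transect. A raster version would apply the same filters along rows and columns.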

Modelling of SOC stocks

To model and estimate the SOC stocks in Australia’s terrestrial and BCE, we used the regression-tree algorithm cubist80. cubist is a piecewise linear regression tree. It partitions the response into subsets where the predictors have similar characteristics and then fits an ordinary least-squares regression to the data in each subset, providing localised estimates. A series of if-then-else conditions defines the rules and data subsets: if a condition is true, the associated regression is applied; otherwise, the next rule is evaluated. These conditions can comprise one or several attributes. We describe our implementation below.
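As a rough illustration of the rule structure described above, the sketch below evaluates a small rule list in order and applies the linear model of the first rule whose conditions all hold. It follows the description in this section rather than the internals of the cubist software, and every predictor name, threshold, and coefficient is invented.

```python
import operator

OPS = {"<=": operator.le, ">": operator.gt}

# Hypothetical rules: each has conditions and a linear model.
RULES = [
    {
        "conditions": [("rainfall", ">", 1200.0), ("elevation", "<=", 600.0)],
        "intercept": 3.0,
        "coefs": {"rainfall": 0.001, "npp": 0.5},
    },
    {
        "conditions": [("elevation", "<=", 0.0)],
        "intercept": 4.0,
        "coefs": {"tidal_range": 0.2},
    },
]
# Hypothetical fallback model used when no rule applies.
DEFAULT = {"intercept": 2.0, "coefs": {"npp": 0.3}}


def linear(model, site):
    """Evaluate intercept + sum(coefficient * predictor value)."""
    return model["intercept"] + sum(c * site[name] for name, c in model["coefs"].items())


def predict(site, rules=RULES, default=DEFAULT):
    """Apply the linear model of the first rule whose conditions all hold."""
    for rule in rules:
        if all(OPS[op](site[name], threshold) for name, op, threshold in rule["conditions"]):
            return linear(rule, site)
    return linear(default, site)


wet_forest = {"rainfall": 1500.0, "elevation": 300.0, "npp": 1.0, "tidal_range": 0.0}
seagrass = {"rainfall": 800.0, "elevation": -2.0, "npp": 0.4, "tidal_range": 2.0}
dry_inland = {"rainfall": 300.0, "elevation": 200.0, "npp": 1.0, "tidal_range": 0.0}
predictions = [predict(site) for site in (wet_forest, seagrass, dry_inland)]
```

The wet forest site satisfies the first rule, the seagrass site falls through to the second, and the dry inland site uses the fallback model, so each site receives a locally fitted linear estimate.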

Model training and validation

Before modelling, we randomly separated the harmonised dataset into a training set and a test set, with 70% (4736 samples) of the data used to train the model. The remaining data (2031 samples) were used as an independent test set (Table 6). cubist has two parameters that need to be optimised: the number of committees and the number of neighbours. We used the training set to select the optimal combination of parameters using 10-fold cross-validation and a tuning grid with five values for committees (2, 5, 10, 15, 20) and four values for neighbours (2, 5, 7, 9). We used the root mean squared error (RMSE) to select the optimal combination of committees and neighbours.
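The split and the tuning grid described above can be sketched as follows. The sample counts and grid values come from the text; the random seed, the fold assignment, and the cv_rmse callable are assumptions, and no model is fitted here.

```python
import itertools
import random

N_SAMPLES = 6767  # 4736 training + 2031 test samples, as reported in the text

rng = random.Random(42)  # arbitrary seed, for reproducibility only
indices = list(range(N_SAMPLES))
rng.shuffle(indices)

n_train = int(0.7 * N_SAMPLES)  # 70% of the samples, rounded down
train_idx, test_idx = indices[:n_train], indices[n_train:]

COMMITTEES = [2, 5, 10, 15, 20]
NEIGHBOURS = [2, 5, 7, 9]
grid = list(itertools.product(COMMITTEES, NEIGHBOURS))  # 20 combinations


def k_folds(idx, k=10):
    """Split shuffled indices into k near-equal folds."""
    return [idx[i::k] for i in range(k)]


folds = k_folds(train_idx)


def select_best(grid, cv_rmse):
    """Pick the (committees, neighbours) pair with the lowest CV RMSE.

    cv_rmse is a hypothetical callable that would fit the model on nine
    folds, score it on the tenth, and return the mean RMSE over folds.
    """
    return min(grid, key=lambda params: cv_rmse(*params))
```

With these counts the split reproduces the 4736 and 2031 samples reported above, and the grid contains 20 parameter combinations, each of which would be scored over the ten folds.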

Table 6 Summary statistics for the test and training data used to calibrate the cubist model.

Viscarra Rossel et al.38 implemented cubist-kriging to account for any spatial correlation of errors in the model. We tested this approach and found no significant increase in model accuracy or skill (Δρc <0.02, ΔRMSE <0.005). Such a small increase suggests that cubist and the predictors used already accounted for both the deterministic and random components of the SOC stock variation in Australia.

Quantifying uncertainty

To quantify the uncertainty of the model estimates, we used the non-parametric bootstrap approach81. Repeated sampling with the bootstrap provides independent sets of residuals to assess uncertainty and confidence limits for the estimates. We took 30 bootstrap sets of the training data and fitted cubist to each. Each bootstrap set left approximately one-third of the original samples out (out-of-bag samples). cubist was then used to estimate the SOC stocks of the out-of-bag samples, which provided an additional evaluation of model performance.
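The resampling step can be sketched as below. It only shows how in-bag and out-of-bag indices arise from sampling with replacement; the function name and seed are assumptions, and model fitting is omitted.

```python
import random


def bootstrap_sets(n_samples, n_sets=30, seed=0):
    """Draw n_sets bootstrap samples; return (in_bag, out_of_bag) index lists."""
    rng = random.Random(seed)  # arbitrary seed
    sets = []
    for _ in range(n_sets):
        # Sample n_samples indices with replacement.
        in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
        drawn = set(in_bag)
        # Samples never drawn form the out-of-bag set for this bootstrap.
        out_of_bag = [i for i in range(n_samples) if i not in drawn]
        sets.append((in_bag, out_of_bag))
    return sets


sets = bootstrap_sets(4736)  # size of the training set reported in the text
oob_fraction = sum(len(oob) for _, oob in sets) / (len(sets) * 4736)
```

For large samples the expected out-of-bag share is about 37% (1/e), which is consistent with the approximately one-third stated above.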

cubist was also validated using the test data. To assess model performance, we measured bias with the mean error (ME), imprecision with the standard deviation of the error (SDE), and inaccuracy with the root mean squared error (RMSE). We also calculated the concordance correlation coefficient (ρc)82, which measures the agreement between measured and estimated values in terms of their deviation from the 1:1 line. The statistic evaluates both imprecision and bias82. A ρc value of 1 denotes perfect agreement, while values of <0.65 generally suggest poor agreement83.
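The four statistics can be computed as in the sketch below. It assumes population (1/n) variances throughout, which is the convention in Lin's definition of ρc and makes RMSE² equal ME² plus SDE²; the study may have used the sample (n−1) form for SDE. The function name and the example values are invented.

```python
import math


def evaluate(observed, estimated):
    """Return ME, SDE, RMSE and Lin's concordance correlation coefficient."""
    n = len(observed)
    errors = [e - o for o, e in zip(observed, estimated)]
    me = sum(errors) / n                                         # bias
    sde = math.sqrt(sum((err - me) ** 2 for err in errors) / n)  # imprecision
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)        # inaccuracy

    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_e = sum((e - mean_e) ** 2 for e in estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated)) / n
    # Penalises both scatter (through cov) and departure from the 1:1 line.
    ccc = 2 * cov / (var_o + var_e + (mean_o - mean_e) ** 2)
    return {"ME": me, "SDE": sde, "RMSE": rmse, "ccc": ccc}


observed = [1.0, 2.0, 3.0, 4.0]
biased = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated but shifted by +1
stats = evaluate(observed, biased)
```

In this example the estimates are perfectly correlated with the observations, yet ρc falls below 1 because of the constant bias, which is the behaviour that distinguishes it from the ordinary correlation coefficient.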

Model interpretation

The cubist model partitioned the dataset into 31 distinct rules. However, to aid the interpretation of habitat trends and environmental drivers of SOC stocks, we grouped these into 8 rules. We averaged the coefficients and rules sequentially in groups of 4, with the final group consisting of 3 rules. To evaluate the continental drivers of SOC variation, we calculated the overall variable importance of the cubist model from each of the bootstrap sets. A variable importance function (varImp) within the caret84 package computes a linear combination of all the predictor variables used in the conditions and linear models of each ruleset of the cubist model. The function then outputs the variable importance percentage for each of the predictors.
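The sequential grouping can be sketched as follows. How the study treated a predictor that appears in only some rules of a group is not stated; this sketch assumes such a predictor contributes a coefficient of zero in the rules that omit it. The function name and the toy coefficients are hypothetical.

```python
def group_rules(rule_coefs, size=4):
    """Average coefficients over consecutive groups of `size` rules.

    rule_coefs is a list of {predictor: coefficient} dicts, one per rule.
    Assumption: a predictor absent from a rule counts as a zero coefficient.
    """
    groups = []
    for start in range(0, len(rule_coefs), size):
        chunk = rule_coefs[start:start + size]
        names = sorted({name for coefs in chunk for name in coefs})
        groups.append(
            {name: sum(coefs.get(name, 0.0) for coefs in chunk) / len(chunk) for name in names}
        )
    return groups


# 31 toy rules with a single invented coefficient each.
toy_rules = [{"npp": float(i)} for i in range(31)]
grouped = group_rules(toy_rules)
```

With 31 rules and a group size of 4, this yields seven groups of four rules and a final group of three, matching the 8 grouped rules described above.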

To determine the regional drivers of SOC stocks we reported the predictor variables and their regression coefficients for each of the linear models from the rulesets. Predictors were scaled before assessing the regression coefficients to remove the effect of units from the rulesets. Predictors with a larger positive or negative regression coefficient were interpreted as having a greater influence on the regional SOC stocks.

Spatial estimates and digital mapping of SOC stocks

The final SOC prediction was an average of the 30 bootstrap models, and their 95% confidence intervals quantified uncertainty. We standardised the 95% intervals by dividing the difference between the upper and lower bounds by the mean estimates from the bootstraps. The mean SOC stock estimates and their 95% confidence intervals were on the logarithmic scale, so we back-transformed them to the original unit using Cox’s method85,86.
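For a single grid cell, the back-transformation and the standardised interval width might look like the sketch below. It uses Cox's method as commonly stated for the mean of log-normal data, with limits exp(m + s²/2 ± z·sqrt(s²/n + s⁴/(2(n−1)))). Exactly how the method was applied to the 30 bootstrap estimates is not specified in this section, so the inputs, the order of operations, and the function names are assumptions.

```python
import math
import statistics

Z95 = 1.96  # standard normal quantile for a 95% interval


def cox_back_transform(log_values):
    """Back-transform log-scale values to a mean and 95% limits (Cox's method)."""
    n = len(log_values)
    m = statistics.fmean(log_values)
    s2 = statistics.variance(log_values)  # sample variance on the log scale
    centre = m + s2 / 2.0
    half_width = Z95 * math.sqrt(s2 / n + s2 ** 2 / (2.0 * (n - 1)))
    return (
        math.exp(centre),
        math.exp(centre - half_width),
        math.exp(centre + half_width),
    )


def standardised_width(lower, upper, mean):
    """Interval width relative to the mean estimate."""
    return (upper - lower) / mean


# 30 invented log-scale estimates for one cell (SOC in t/ha before the log).
log_estimates = [math.log(v) for v in [40.0, 45.0, 50.0, 55.0, 60.0] * 6]
mean, lower, upper = cox_back_transform(log_estimates)
width = standardised_width(lower, upper, mean)
```

The s²/2 term corrects for the fact that simply exponentiating the log-scale mean would give the geometric rather than the arithmetic mean, so the back-transformed estimate is always at least as large as the exponentiated log mean.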

Total SOC stocks

We calculated Australia’s total SOC stocks and the stocks in its habitats and land uses. We estimated total SOC stocks as the sum of the cell (pixel) values multiplied by the cell area, that is, the squared resolution (Res) of the SOC estimates converted to hectares, and divided the result by 10^9 to convert tonnes to gigatonnes (Gt) of C.

$$\mathrm{SOC}\,(\mathrm{Gt})=\frac{\sum \mathrm{SOC}\,(\mathrm{t\,ha^{-1}})\times \mathrm{Res}^{2}}{10^{9}}$$
(2)
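A small worked example of Eq. (2) follows. It assumes the 90-m grid mentioned earlier, so that each cell covers 90 × 90 m² = 8100 m² = 0.81 ha; the function name and cell values are invented, and None marks cells without an estimate.

```python
def total_soc_gt(soc_t_per_ha, res_m=90.0):
    """Sum per-cell SOC stocks (t/ha) over a grid and return gigatonnes of C."""
    cell_area_ha = res_m ** 2 / 10_000.0  # 1 ha = 10,000 m^2
    total_tonnes = sum(v * cell_area_ha for v in soc_t_per_ha if v is not None)
    return total_tonnes / 1e9             # tonnes -> gigatonnes


cells = [50.0, 100.0, None, 30.0]  # invented SOC estimates in t/ha
total = total_soc_gt(cells)
```

Here the three valid cells hold 180 t/ha in total, which at 0.81 ha per cell is 145.8 t, or about 1.46 × 10^-7 Gt of C.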

We used the National Vegetation Information System (NVIS, version 6)87 and land use types88 to interpret and evaluate the spatially explicit estimates of SOC stock. We merged the NVIS layer with the BCE extent to extend the coverage over BCE. The land use types did not extend to the BCE.