Introduction

Soil organic matter (SOM) plays a crucial role in achieving multiple Sustainable Development Goals (SDGs)1,2,3,4,5, in particular SDG target 2.4 on sustainable food production and resilient agricultural practices and target 15.3 on land degradation neutrality by 2030. Furthermore, SOM is linked to six of the eight mission objectives of the Soil Deal for Europe6, which aims to fulfill European and international commitments to the SDGs. In 2023, the European Commission underscored its commitments through the Directive on Soil Monitoring and Resilience7, a legislative proposal where soil health is defined as “the continued capacity of soils to support ecosystem services”6.

Besides being essential for soil health, SOM offers an opportunity for climate change mitigation through carbon sequestration8,9. Studies have shown that increasing SOM is feasible under best management practices; consequently, the “4 per mille Soils for Food Security and Climate” initiative was launched with the aspiration of increasing global SOM stocks by 4 per 1000 (or 0.4%) per year10. Coordinated efforts are underway to develop best practices for measuring, reporting, and verifying SOM changes11, while simultaneously adapting agricultural systems to facilitate carbon farming as a means of mitigating greenhouse gas emissions12.

However, while it is perhaps too early to determine the effectiveness of international commitments, SOM continues to decrease in European croplands13 and peatlands14. It is therefore no surprise that increasing SOM and conserving peat soils remain the main challenge related to soil health15.

With its intensive agriculture, degraded peatlands, and highly anthropogenic landscapes, the Netherlands is an ideal case for examining SOM changes to address the priority soil health challenge, “4 per mille”, the Soil Deal and SDG targets. Situated in Europe’s largest delta, the Rhine-Meuse-Scheldt delta, agriculture in the Netherlands is widely regarded as the most intensive in Europe16. Before the start of agriculture in the low-lying regions less than 2000 years ago, more than 50% of what is now the Netherlands was covered in peat17,18. Through drainage, excavation and/or agricultural use of peatlands, this has now been reduced to 15% (Fig. 1d, e; SI 1). The Netherlands is largely composed of anthropogenic landscapes: historic land reclamation (17%, as a percentage of present-day surface area; Fig. 1c), conversion to urban areas (15%; Fig. 1a, b) and re-landscaping of new nature and recreational areas have had a tremendous impact on the soils.

Fig. 1: Maps of land use, physical geographic regions and peat occurrence.

a Land use map of the Netherlands in 1953. b Land use map of the Netherlands in 2022. c Main physical geographic regions. d Peat occurrence in 1987 at 5–15 cm depth. e Peat occurrence in 1987 at 60–100 cm depth. See Methods for how maps of dynamic covariates were derived for all years between 1953 and 2022 and SI 1 for information on plaggen soils shown in (c).

The challenge of increasing SOM for restoring soil health and fulfilling international commitments requires high-resolution, spatio-temporally explicit SOM assessment to facilitate management practices and land use decisions tailored to local soil conditions. To address this challenge, we developed a modeling platform in 3D space and time (3D+T) as a new paradigm for SOM monitoring and mapping. It provides annual predictions of SOM and its uncertainty in the Netherlands, at 25 m resolution at point support between 0 m and 2 m depth from 1953 (first measurements; Fig. 2) to 2022. We used machine learning, 869,094 SOM observations from 339,231 point locations (~10 locations per km2; Fig. 2 and Table 1) and spatially explicit environmental covariates. Using quantile regression forest19, the median of the predicted probability distribution was taken as the predicted SOM value, while the 90% prediction interval (PI90) indicates prediction uncertainty (Methods). The covariates were either static (2D), variable in time (2D+T)20,21 or variable in 3D+T, and served as proxies for soil-forming factors. While climate, relief, and parent material were considered static, land use (Fig. 1a, b) and the occurrence of peat (Fig. 1d, e) were considered dynamic due to their greater susceptibility to anthropogenic influence over 70 years compared to the other soil-forming factors22. Besides prediction uncertainty, we used statistical validation to assess mapping accuracy. Space-time mapping of soil properties remains a major challenge, with relatively few studies having mapped temporal changes on a regional to global scale20,21,23,24,25,26,27,28,29,30. To the best of our knowledge, our approach is the first to use a 3D+T dynamic covariate (Fig. 1d, e). Furthermore, these are the first SOM maps in 3D+T on a national scale. As a convention in this paper, SOM and absolute changes in SOM between two years (ΔSOM) are expressed as mass percentages.
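
As a minimal illustration of how a median prediction and a PI90 can be read from a predicted distribution, the sketch below assumes that the distribution for one location, depth and year is available as a plain list of SOM values. The helper names and the example values are hypothetical and are not taken from this study; PI90 is assumed here to be the interval between the 0.05 and 0.95 quantiles.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def summarize_prediction(distribution):
    """Return (median, lower PI90 bound, upper PI90 bound, PI90 width)."""
    median = quantile(distribution, 0.50)
    lower = quantile(distribution, 0.05)  # assumed lower limit of PI90
    upper = quantile(distribution, 0.95)  # assumed upper limit of PI90
    return median, lower, upper, upper - lower


# Hypothetical predicted SOM distribution [%] for one location, depth and year
som_distribution = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 8.0, 12.0, 30.0]
median, lower, upper, width = summarize_prediction(som_distribution)
```

In this toy distribution the long upper tail widens the interval without shifting the median much, which is one reason a median is a convenient point prediction for a skewed variable such as SOM.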

Fig. 2: Soil organic matter (SOM) observations.

a Map showing the 339,231 locations with SOM observations. b Temporal distribution of SOM observations (note log-scale of y-axis). Laboratory measurements are shown in orange and field estimates are shown in green.

Table 1 Number of SOM laboratory measurements and field estimates used for model tuning and calibration separated by depth layer

Results and discussion

SOM decrease in peatlands

Our findings indicate that between 1953 and 2022 there was a decrease of more than 1% in SOM on 14% of the land surface area of the Netherlands, which is equivalent to 4750 km2. Furthermore, there was a decrease of over 10% in SOM on 4.5% of the land surface area, which amounts to 1520 km2, predominantly occurring in peatlands. In former peat layers now classified as mineral soil layers, average SOM decreases at 0–30 cm depth ranged from 9% to 21% (Table 2). For soils still classified as peat, average SOM decreases were mostly >2%. Regardless of whether the peat had oxidized or not, average SOM decreases in croplands were substantially higher than in grasslands and forests. A recent study conducted in the Netherlands found that there were no substantial changes in SOM in organic soils at 0–30 cm depth31. However, at 30–100 cm depth, the study showed a decline of 7–9.5% between 1998 and 2018, which is consistent with our research findings in organic and former organic soils across all depths. The prediction maps also align with the spatial patterns of SOM in other studies31,32.

Table 2 Predicted average changes in SOM [%] for 0–30 cm depth between 1953 (left) and 2022 (top) for combinations of soil type (peat vs. mineral) and land use (grass = grassland, crop = cropland, forest)

3D spatial predictions of SOM changes (Fig. 3) were different for bog and brook-valley peatlands, located mainly in the Northeast of the country, and fen peatlands, located mainly in the low-lying West and Northwest of the country (Fig. 1c). In bogs and brook-valleys, peat layers were often thinner than 1 m (Fig. 3a, b, e, Supplementary videos 1–6) and SOM decreased by >10% or even >25% between 1953 and 2022 (Fig. 3c, d, f). Time-lapse maps spanning the entire 70-year period provide a visualization of these gradual changes over time at different depth layers (Supplementary videos 1–6). SOM decreases exceeding 25% were primarily predicted at depths below 20 cm (Fig. 3f). This can be attributed to the fact that in peatlands, SOM predictions were mainly confined to a range of 5–20% within the uppermost 20 cm, while greater depths had predictions surpassing 25% (Fig. 4a, b, e, Supplementary videos 1–6). Figure 4c shows a typical brook-valley region on a 1:25,000 map (left), where SOM decreased by 10–25% at 5–15 cm depth due to peat oxidation. The center of this map depicts a location under intensive grassland visited as part of the ΔSOM validation dataset (Methods). Here, yearly SOM predictions between 1953 and 2022 (lines) and measurements in 1965 and 2022 (points) from 9–18 cm to 23–30 cm depth also underline this trend of decreasing SOM over time (Fig. 4c). Overall, our findings corroborate previous surveys in the northeastern province of Drenthe, which indicated that peat layers less than 40 cm thick are found in roughly 47% of the area previously classified as having thick peat soils, while mineral soils cover ~55% of the area that was initially identified as shallow peat soils33,34,35.

Fig. 3: Predicted soil organic matter (SOM) in 2022 and SOM changes (ΔSOM) between 1953 and 2022 at different depths.

a SOM [%] predictions for 2022 at 5–15 cm depth. b SOM [%] predictions for 2022 at 60–100 cm depth and dashed line showing location of depth transect (plot e). c ΔSOM [%] (1953–2022) at 5–15 cm depth and crosses showing the ΔSOM validation locations (Methods). d ΔSOM [%] (1953–2022) at 60–100 cm depth and dashed line showing location of depth transect (plot f). e SOM [%] predictions for 2022. f ΔSOM [%] predictions for 1953–2022. Predictions shown in (e, f) were made from 0–200 cm depth with 5 cm depth increments along the transect shown in (b, d).

Fig. 4: Accuracy plots and metrics, zoom-in maps and time series of soil organic matter changes (ΔSOM).

a Accuracy plot and accuracy metrics using 10-fold cross-validation of SOM [%] laboratory measurements between 1953–2011. b Accuracy plot and accuracy metrics of SOM [%] temporal changes (ΔSOM) at 63 locations shown in Fig. 3c (Methods). c–e Zoom-in maps [1:25,000] of three ΔSOM validation locations (left) and time series plots from these locations sampled at two or three depths (right). Time series plots depict SOM [%] laboratory measurements (points), model predictions [%] (line) and prediction uncertainty [%] (PI90, background shading) between 1953–2022 colored by depth. Time series of soil samples in (c–e) are also indicated in (b).

In fens, peat layers exceeding 1 m within the predicted 0–2 m depth range were found to be less susceptible to SOM decrease compared to thinner layers (Fig. 3). Upon comparing Fig. 3e, f, a noticeable trend emerges between 110,000 and 130,000 Easting: a decrease in SOM is predicted in the thinner peat layers that are located adjacent to the thicker ones. Typical for these fen meadow regions, Fig. 4d shows little to no SOM changes between 1953 and 2022 at 30–60 cm depth. Here, yearly SOM predictions between 1953 and 2022 and measurements in 1971 and 2022 at 23–50 cm depth were above 40% and did not change. SOM measurements increased from 40% to 49% in the upper 6 cm, perhaps because the area was turned into a nature conservation area in the 1980s.

Although our predictions indicate limited SOM changes in peat layers exceeding 1 m, it is important to note that such areas may still experience a decline in carbon stocks or net CO2 emissions. As our modeling was restricted to the top 2 m, SOM changes beyond this depth were not captured in our analysis. Therefore, it is crucial to exercise caution when interpreting our findings with respect to carbon loss or gain in soils with peat layers thicker than 2 m. In fact, previous studies reported carbon losses up to 1 g/kg/year36 and peat oxidation rates up to 1 cm/year in the fen meadow regions of the Netherlands37,38.

SOM changes in reclaimed lands due to land subsidence

Model predictions reveal SOM increases of more than 10% in large areas of reclaimed land (Fig. 1c) below 80 cm depth between 145,000 and 170,000 Easting (Fig. 3d, f). Time-lapse maps spanning 70 years visualize these gradual changes at 100–200 cm depth (Supplementary video 6). Land subsidence due to clay ripening, peat oxidation and soil compaction caused peat layers below 80 cm depth to shift upwards in terms of relative depth37,39,40,41 (Fig. 5), leading to SOM increase. As a result of land subsidence, SOM also decreased by 10–25% directly below thinner peat layers of ~50 cm (Fig. 3f between 155,000 and 170,000 Easting). Even at 0–30 cm depth, some soil layers changed from mineral to peat as a result of land subsidence, which explains predicted increases of 5–23% SOM (Table 2). If peat layers closer to the surface rise above groundwater levels, oxidation and carbon emissions will increase in the next decades. Carbon emissions as a result of land subsidence have been confirmed not only in the Netherlands18,42, but also other coastal plains and deltas worldwide43,44.

Fig. 5: Original and updated map of peat classes.

a Map of sub-classes of soils containing peat in the original national soil map of the Netherlands (1:50,000)85. b Map of sub-classes of soils containing peat in the updated national soil map of the Netherlands. Following the Dutch soil classification79,80,81, soils containing peat can be classified according to the starting depth and thickness of the peat horizon. These maps were used to derive eight fuzzy subsets of peat classes (one for each class) as 2D+T dynamic covariates and one fuzzy subset of peat occurrence as a 3D+T dynamic covariate. Note that the national soil maps exclude urban areas (white). The national soil map of the Netherlands can be downloaded and viewed at broloket.nl and bodemdata.nl.

Little SOM change in mineral soils

The model showed no substantial changes in SOM in the top 30 cm of mineral soils in grasslands or forests between 1953 and 2022 (Table 2). However, an average decrease of 0.1% was predicted in croplands. When grasslands or forests were converted into croplands, we predicted an average decrease of 0.1–0.3% in SOM, while the reverse scenario resulted in an increase of a similar amount. The model’s predictions for 2022 indicated that SOM levels in mineral soils in the top 30 cm were between 2.5% and 5% (Fig. 4a, e). Below 30 cm, temporal changes in SOM were almost non-existent in mineral soils (Fig. 3d, f, Supplementary videos 4–6). This outcome was expected because land use, which was considered a dynamic 2D+T covariate in the model (Fig. 1a, b, Methods), had little effect on the subsoil. Below 30 cm depth, the model’s predictions ranged from 1–2.5% for clay and loamy soils along the rivers, delta, and loess regions and less than 1% in the Pleistocene sandy areas (Fig. 4b, e, Supplementary videos 4–6). These findings are reasonable because subsoils typically have lower SOM than topsoils.

Despite its overall effectiveness, the model occasionally did not detect subtle temporal changes in SOM, as demonstrated in Fig. 4e (right). In this case, measurements indicated a 1% decrease in SOM between 1968 and 2022 in a sandy soil currently used for maize cultivation. However, the model’s predictions remained relatively constant. Analysis of a 1:25,000 map of the surrounding area revealed few predicted changes in SOM above 1% at a depth of 5–15 cm, with the exception of a nearby brook valley located southeast of the measurement site (Fig. 4e; left). Small changes in SOM as a consequence of land use changes can also be assumed based on the model’s variable importance, where dynamic land use covariates were among the least important (Fig. S3).

While the general trends we found in mineral soils across different land uses are consistent with other findings for the Netherlands and Europe, the average rates of SOM change were lower than in previous research. When converted to SOM changes (Methods), Reijneveld et al.36 found increases of 0.40% for grasslands (0–5 cm) and 0.32% for croplands (0–25 cm) in mineral soils in the Netherlands between 1984 and 2004. Chardon et al.45 confirmed constant or slight increases in Dutch agricultural soils with SOM less than 14%. However, Conijn and Lesschen46 found an overall increase in SOM for permanent grassland but an overall decrease in SOM in croplands using a dynamic soil-crop model47,48. More recently, Knotters et al.31 found a decrease in SOM of 0.38% and 0.86% in Dutch croplands in mineral soils between 1998 and 2018 at 0–30 cm and 30–100 cm depth, respectively. However, for Dutch grasslands, the same study found a decrease of 0.48% SOM for 30–100 cm, while no substantial change was found for 0–30 cm depth. Between 2009 and 2015, converted SOM changes (Methods) on a European scale were approximately −0.09% on croplands, 0.24% on grasslands and −0.51% for grasslands converted to cropland13.

To summarize, our study and most recent research conducted in the Netherlands and Europe indicate that there is a decrease in SOM in croplands, particularly when converted from grassland, and an increase in SOM in grasslands. When accounting for the differences in time periods across the compared studies, the predicted changes in SOM we found were about 10-fold lower compared to actual measurement analyses at point or field-scale in other studies. Future studies should investigate this further, but one reason for this difference might be that random forest and other regression models smoothen predictions49, possibly leading to smaller predicted SOM changes. In addition, our model included both mineral and peat soils with SOM values up to 100%, potentially decreasing the model’s sensitivity to detect changes of smaller magnitude in mineral soils. To improve the accuracy of 3D+T mapping specifically for mineral soils, we suggest mapping them separately from organic soils or choosing a hierarchical approach50, investing in repeated measurements at the same locations for model calibration, and deriving covariates related to agricultural management practices. Future studies should investigate whether performance of our 3D+T modeling approach improves when monitoring data (more repeated measurements) are used during model calibration.

Model accuracy assessment

Model accuracy was assessed using a 10-fold cross-validation with data from 1953–2011 (MEC = 0.64; Fig. 4a) and design-based inference from 1993–2000 (MEC = 0.50) and 2018 (Table 3; Methods). The relatively high proportion of peat samples with SOM values up to 100% may explain relatively high RMSE values around 10%. The predicted SOM content at 0–30 cm depth was underestimated (ME > 1 for all methods; Table 3), which probably also relates to smaller predicted SOM changes in mineral topsoils compared to other studies (see above). Inaccuracies in the 30–100 cm layer in 2018 may be due to positional errors, differences in sampling support, or changes in laboratory methods between the calibration and validation data31. Furthermore, the use of the same data for model calibration and the national soil map generation, from which dynamic peat covariates were derived (Fig. 5), may have biased our predictions. The overall spatial patterns of predicted SOM align with previous SOM mapping studies in the Netherlands31,32.
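
For readers less familiar with these metrics, the sketch below shows how ME, RMSE and MEC are commonly computed from paired observations and predictions. It is an illustration under stated assumptions rather than the code used in this study: the error is assumed to be observed minus predicted (so that a positive ME corresponds to underestimation, in line with the statement above), MEC is assumed to be the model efficiency coefficient (one minus the ratio of residual to total sum of squares), and all values are hypothetical.

```python
import math


def accuracy_metrics(observed, predicted):
    """Return (ME, RMSE, MEC) for paired observations and predictions."""
    n = len(observed)
    # Assumed sign convention: positive error means the model underestimates
    errors = [o - p for o, p in zip(observed, predicted)]
    me = sum(errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    mec = 1.0 - ss_res / ss_tot
    return me, rmse, mec


# Hypothetical SOM values [%]: three mineral samples and one peat sample
observed = [2.0, 4.0, 6.0, 40.0]
predicted = [2.5, 3.5, 5.0, 35.0]
me, rmse, mec = accuracy_metrics(observed, predicted)
```

In this toy example the single peat sample dominates both ME and RMSE, which mirrors the explanation given above for the relatively high RMSE values.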

Table 3 Model accuracy metrics of SOM [%] predictions using 10-fold cross-validation with laboratory measurements from 1953–2011 and design-based inference of an independent probability sample using measurements from 1993–2000 and 2018, respectively (Methods)

Our estimates of prediction uncertainty (PI90) in 3D space (SI 2) and time (Fig. 4c–e) were reliable based on the evaluation of prediction interval coverage probability (PICP; SI 2; Methods). However, when divided by depth, design-based inference from 1993–2000 and 2018 revealed that prediction uncertainty was overly optimistic at 0–30 cm depth and slightly pessimistic below 30 cm depth (Table 3). Areas with high SOM values (peatlands) and urban areas had the highest prediction uncertainty, which overall increased with depth (SI 2). Figure 4c confirms that uncertainty decreased as predictions decreased over time. Mineral soils had lower uncertainty (SI 2 and Fig. 4e).
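
The PICP check can be sketched as follows, assuming that each validation observation comes with the lower and upper limits of its prediction interval. The function name and the values are hypothetical. For a well-calibrated PI90 the PICP should be close to 0.90; a clearly lower value suggests intervals that are too narrow (overly optimistic uncertainty), a clearly higher value intervals that are too wide.

```python
def picp(observed, lower_bounds, upper_bounds):
    """Fraction of observations that fall inside their prediction interval."""
    inside = sum(
        1
        for obs, low, high in zip(observed, lower_bounds, upper_bounds)
        if low <= obs <= high
    )
    return inside / len(observed)


# Hypothetical SOM observations [%] with their PI90 limits
observed = [3.0, 4.2, 35.0, 1.1, 7.5]
lower = [1.0, 2.0, 10.0, 1.5, 3.0]
upper = [6.0, 5.0, 30.0, 4.0, 9.0]
coverage = picp(observed, lower, upper)
```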

An important limitation of our modeling approach is that it does not quantify the uncertainty of differences in SOM (ΔSOM) and of spatial aggregates (e.g., Table 2). These uncertainties can only be obtained if cross- and spatial correlation in prediction errors are quantified. For instance, the prediction error variance of ΔSOM at some location and depth is given by the sum of the variances of the SOM prediction errors at the two points in time for that location and depth, minus twice their covariance. Computation of the covariance requires the correlation between the two prediction errors. In case of spatial aggregation one must first quantify the spatial correlation of the SOM prediction errors, for instance by semivariograms, after which the uncertainty of the spatial aggregate can be computed using a block kriging of the residuals. Wadoux and Heuvelink51 did this in 2D space, but it is unclear how this should be efficiently done in 3D space and time (four dimensions). Semivariogram fitting in 3D+T is extremely challenging given that space-time and lateral-vertical anisotropies would have to be accounted for, while also the conventional geostatistical assumptions on multivariate normality and second-order stationarity would have to be questioned. There are also considerable computational challenges when conducting block kriging in a 3D+T context at high spatial resolution. We know of only one study that estimated 3D+T semivariograms and applied 3D+T kriging52, but this study assumed normality and used a simplified metric space-time semivariogram, which might not be appropriate for SOM in the Netherlands. Uncertainty quantification of SOM changes and spatial aggregates in 3D+T at scales relevant to management and policy is a critical task to investigate in future research, since the uncertainty related to soil monitoring has prompted widespread doubts about the feasibility of measuring and verifying SOM and soil organic carbon changes53,54. 
Given the challenges and complexity of such an analysis, it was beyond the scope of this research.
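
The relation for the prediction error variance of ΔSOM described above can be written out as follows. The symbols are introduced here for illustration only and do not appear elsewhere in the paper: e_{t_1} and e_{t_2} denote the SOM prediction errors at the two points in time for a given location and depth, σ_{t_1} and σ_{t_2} their standard deviations, and ρ their correlation.

```latex
\operatorname{Var}\left(e_{t_2} - e_{t_1}\right)
  = \operatorname{Var}\left(e_{t_1}\right)
  + \operatorname{Var}\left(e_{t_2}\right)
  - 2\,\operatorname{Cov}\left(e_{t_1}, e_{t_2}\right),
\qquad
\operatorname{Cov}\left(e_{t_1}, e_{t_2}\right) = \rho\,\sigma_{t_1}\sigma_{t_2}
```

If the two errors were uncorrelated (ρ = 0), the variances would simply add; a positive correlation reduces the error variance of ΔSOM, which is why ρ has to be known before the uncertainty of a change can be reported.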

Temporal SOM changes (ΔSOM) were difficult to predict (Fig. 4b). The model sometimes failed to detect ΔSOM (Fig. 4b–e). However, prediction errors at point support tend to average out when increasing spatial and temporal support51,55,56, e.g., by averaging over larger areas, depth layers and years, making the 3D+T maps useful for many applications. For instance, predictions of samples c1 and c2 (lines in Fig. 4c) showed an unrealistic decrease of >10% SOM between 1972 and 1973, but when averaged over several years, the trend of decreasing SOM was confirmed by measurements in 1965 and 2022 (points in Fig. 4c). In general, temporal SOM variation was often lower than 3D spatial variation. However, the limited ΔSOM validation data (127 measurements from 63 locations; Fig. 3c) highlight the need for long-term soil monitoring using consistent methodologies in the field and laboratory11. For additional validation of ΔSOM predictions beyond the dataset used here, future studies could compare our predicted SOM changes with measured changes at individual locations of long-term field experiments57 for different soils and land uses. This would allow evaluation of the model’s ability to predict temporal changes, while currently the accuracy assessment was based on changes both in space and time.

3D+T mapping: a new paradigm for SOM monitoring

Our study demonstrated that SOM is highly variable over depth and time, which is insufficiently captured by non-spatially explicit13,31,58 or 2D mapping methods20,59 that are currently used for reporting SOM changes for soil health and climate mitigation commitments. In the Netherlands, approaches to assess SOM changes encompass long-term field experiments conducted at specific locations, summarized by Kooistra and Kuikman57, some of which served as the basis for process-based models45,46,47,48. Reijneveld et al.36 assessed changes of agricultural parcels using farm data. Knotters et al.31 assessed changes on the scale of four selected domains of interest, categorized by mineral soil, organic soils, grasslands and croplands. Unlike our research, none of the above studies explicitly account for spatial variation when modeling SOM dynamics. Furthermore, process-based models are often constrained by soil type (e.g., mineral) or land use60. In contrast, our 3D+T model accounts for SOM variation in space, depth and time and can predict for mineral and organic soils under any land use. In addition, the 3D+T model does not require repeated measurements from the same location, although prediction accuracy is likely to increase with better monitoring data. This is a major advantage because most areas in the world do not have monitoring data beyond individual field trials.

Previous digital soil mapping studies used spline depth functions61, geostatistical methods62,63,64, parametric depth functions65 or depth as a covariate59,66 to map a soil property at different depths. Gasch et al.52 used 3D covariates (soil parameters) and a 2D+T covariate (crop type) to predict soil water, temperature and electrical conductivity in 3D+T on a field scale (37 ha). However, to our knowledge this is the first study to use a 3D+T covariate. Machine learning has proven advantageous for predicting complex, non-linear relationships between soil-forming factors and soil properties in 2D space67. Our study represents a next step in extending the predictive power of machine learning to 3D+T. In doing so, the 3D+T model was able to detect complex relationships between SOM and peat occurrence, which varied considerably in space, depth and time. The 3D+T covariate was the most important covariate in the model (Fig. S3). As a result, we found major changes in SOM not only in space but also over depth and time, especially in peatlands and reclaimed land. Therefore, we recommend that 3D+T modeling approaches be incorporated in SOM reporting alongside point monitoring networks to provide spatially explicit information. This is key for facilitating multi-functional land use policies and management practices based on local soil conditions. The 3D+T SOM maps with quantified uncertainty can support decisions on where to implement measures to increase SOM stocks, address the 4 per mille initiative and incorporate carbon farming in agricultural practices. They are also an important first step towards providing spatially explicit changes in soil carbon stocks and CO2 emissions from soils. As stated in the Climate Agreement of the Netherlands, emissions from peat soils must be reduced by 1 Mton CO2-eq and those from mineral agricultural soils by 0.4–0.6 Mton CO2-eq before 203068.
Moreover, these maps identify where SOM-related soil health has declined most in the past 70 years and where restoration potential is highest and most urgent. Finally, the 3D+T SOM maps can serve as visual tools to raise awareness of the importance of soils for society, which is the Soil Deal’s eighth mission objective.

This research goes beyond the mere mapping of SOM between 1953 and 2022, as it has far-reaching implications for the future. With advancements in the explainable machine learning research domain, models using algorithms such as quantile regression forest are no longer considered black-box models. Recently, these methods have been applied in soil science to gain new insights into the complex relationship between covariates and soil properties69,70. Future studies could identify potential local drivers of SOM dynamics by using explainable machine learning methods69,70 to study the relationship between covariates and changes in SOM over time using the 3D+T methodology. Furthermore, this will allow the prediction of potential future changes in SOM. For example, by considering various scenarios involving groundwater levels, land use changes, or climatic indicators71, we can forecast changes in SOM25, providing crucial insights into the measures necessary to restore soil health.

The implications of the findings reported in this study also extend beyond the Netherlands. Peatland conversion72,73, land reclamation74, and agricultural intensification75 are ongoing in many parts of the world, but there is a lack of spatio-temporal soil and land use data in many of these regions. Moreover, the mechanisms underlying SOM dynamics apply to other bioclimatic zones as well, suggesting that the changes observed in the Netherlands may be relevant to less data-rich regions across the globe. These findings are particularly relevant to deltas worldwide, which often share similar geographic features and are home to ~350 million people76,77.

Materials and methods

Soil point data

We obtained 869,094 observations of SOM from 339,231 point locations using different datasets, most of which are part of the Dutch soil database (BIS; Fig. 2 and Tables 4, 5). SOM observations consist of either measurements in the laboratory using loss on ignition at 550 °C78 or field estimates. Regarding the latter, soil surveyors estimated SOM in the field by visually inspecting and touching the soil sample to assess its color and texture. An expert estimate was then made, also based on their extensive, regional pedological knowledge of the soil-forming factors, soil texture and expected soil type, and using SOM laboratory measurements at nearby locations. In instances where multiple soil surveyors made estimates at the same location and depth, the individual estimates were merged and the median value recorded79,80,81. Based on approximately eight thousand paired laboratory measurements and field estimates, and assuming that the laboratory measurement error is negligible compared to the field estimation error, the mean error, mean absolute error and standard deviation of the field estimation error were 0.23%, 2.09%, and 4.6% SOM, respectively. Hence, these errors were within an acceptable range for our purpose, considering that laboratory measurements themselves are also subject to errors82. Field estimates were discarded from all modeling steps whenever laboratory measurements from the same 3D location were available. Only the PFB and BPK datasets of BIS were used for model calibration because the probability sampling design of the LSK and CCNL datasets was ideal to independently assess mapping accuracy66,83,84 (Table 4).
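
The two steps described above, merging several field estimates by their median and summarizing the field estimation error against paired laboratory measurements, can be sketched as follows. This is an illustration only: the error is assumed to be field estimate minus laboratory measurement, the sample standard deviation is assumed, and all names and values are hypothetical.

```python
import statistics


def field_estimation_errors(lab, field):
    """Return (mean error, mean absolute error, standard deviation of error)."""
    # Assumed sign convention: field estimate minus laboratory measurement
    errors = [f - l for f, l in zip(field, lab)]
    mean_error = statistics.mean(errors)
    mean_abs_error = statistics.mean([abs(e) for e in errors])
    sd_error = statistics.stdev(errors)  # sample standard deviation (n - 1)
    return mean_error, mean_abs_error, sd_error


# Several surveyors' estimates [%] at one location and depth, merged by median
merged_estimate = statistics.median([3.0, 4.0, 6.0])

# Hypothetical paired laboratory measurements and field estimates [%]
lab = [2.0, 5.0, 10.0, 40.0]
field = [3.0, 4.0, 12.0, 38.0]
mean_error, mean_abs_error, sd_error = field_estimation_errors(lab, field)
```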

Table 4 Overview of datasets used for modeling
Table 5 Descriptive statistics of SOM observation datasets shown in Table 4

Soil point data for model calibration

For model calibration, we used 15,312 laboratory measurements from 4298 locations (PFB) and 840,638 field estimates from 334,668 locations (PFB and BPK; Fig. 2a; Table 4). The locations with laboratory measurements, arranged in a purposive sampling design, were selected in the past to create the national soil map (1:50,000)85, meaning that soil variability is well covered. Soil samples measured in the laboratory were collected by genetic soil horizon between 1953 and 2011 (PFB; Fig. 2b).

In contrast to the laboratory measurements, the majority of field estimates (BPK) were spatially clustered in specific areas for regional soil mapping purposes. SOM field estimates were removed if there was a laboratory measurement available from the same location and genetic soil horizon. We decided to include field estimates in model calibration due to the additional spatio-temporal coverage (1953–2022), especially in recent years (Fig. 2b). However, since field estimates are less accurate than laboratory measurements and due to their clustered spatial distribution, we tested excluding field estimates and assigning them lower weights during model tuning (see below).

Soil point data for model accuracy assessment

Four different datasets were used for statistical validation to assess model accuracy (Table 4). The first dataset (PFB) consisted of the same 15,312 laboratory measurements used during model calibration. For model tuning and accuracy assessment purposes, this dataset was used for cross-validation (see below).

In addition, the LSK and CCNL datasets, which were collected specifically to validate the national soil map (1:50,000)85, were available. The LSK consisted of 4952 SOM laboratory measurements from 1185 locations sampled by horizon between 1993 and 2000. These soil sampling locations were determined using a national probability sample, more specifically a stratified simple random sample. The dataset is described in more detail in Finke et al.83 and Visschers et al.84, and its use for validating digital soil maps is described in Section 2.1 of Helfenstein et al.66.
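
To indicate why a stratified simple random sample lends itself to design-based inference, the sketch below applies the standard estimator for such a design: the population mean of a quantity (here a prediction error) is estimated as the sum of the stratum sample means weighted by the relative stratum areas. The stratum names, weights and values are hypothetical and do not correspond to the actual LSK strata.

```python
def stratified_mean(values_by_stratum, stratum_weights):
    """Estimate a population mean from a stratified simple random sample.

    values_by_stratum: dict mapping stratum name to a list of sampled values
    stratum_weights:   dict mapping stratum name to its relative area (sums to 1)
    """
    total = 0.0
    for stratum, values in values_by_stratum.items():
        stratum_mean = sum(values) / len(values)
        total += stratum_weights[stratum] * stratum_mean
    return total


# Hypothetical prediction errors [% SOM] at validation locations per stratum
errors_by_stratum = {
    "peat": [4.0, 6.0],
    "sand": [0.5, 1.5, 1.0],
    "clay": [1.0, 3.0],
}
area_weights = {"peat": 0.1, "sand": 0.5, "clay": 0.4}
me_estimate = stratified_mean(errors_by_stratum, area_weights)
```

Because the weights reflect stratum areas rather than sample sizes, a stratum that is sampled more densely than its area would suggest does not dominate the national estimate.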

All LSK locations that were still accessible were re-sampled at two fixed depth increments (0–30 cm and 30–100 cm) in 201831,86,87,88. This so-called CCNL dataset consists of 2295 laboratory measurements from 1144 locations. In terms of space-time design, the LSK and CCNL datasets are therefore a supplemented panel because only a subset of the sampling locations of the first survey were re-visited ~20 years later89. Despite the supplemented panel design, substantial methodological differences between the LSK and CCNL datasets prevent a temporal assessment at point scale90, as described in more detail in Section 2.1 and Appendix C of Knotters et al.31, who used these data to study temporal SOM changes within domains (not at point scale).

We will refer to the three datasets described above by their measurement years, i.e., 1953–2011, 1993–2000 and 2018.

In order to also assess changes in SOM over time (ΔSOM) at point locations, we re-sampled the same 1–3 uppermost genetic soil horizons from 63 PFB locations in 2022, leading to a total of 127 samples (Fig. 3c; Table 4). These PFB locations were first sampled between 1953 and 1999 (depending on the location) and because all locations were re-sampled in 2022, it can be termed a pure panel space-time design89. In contrast to LSK and CCNL, we sampled identical legacy soil horizons as in the past to the best of our abilities. The purposive sampling design of ΔSOM locations is described in detail in SI 3. The 127 samples used for statistical validation of ΔSOM were removed from the PFB dataset to avoid their use during model calibration and 10-fold cross-validation (see above).

Covariates

In line with the digital soil mapping methodology91, we used covariates as model independent variables (i.e., explanatory variables or features) that were representative of the soil-forming factors: climate, organisms, relief (topography), parent material (geology) and time92,93. In order to map SOM in 3D space and time, we extended established methods by using covariates that were static (2D), variable in time (2D+T)20,21 and variable over depth and time (3D+T). All covariates were prepared at 25 m resolution, for the standard depth layers specified by GlobalSoilMap94 (GSM; 0–5 cm, 5–15 cm, 15–30 cm, 30–60 cm, 60–100 cm and 100–200 cm) in the case of the 3D+T covariate, and for every year from 1953 to 2022 for 2D+T and 3D+T covariates.

Static covariates

Covariates were designated as static based on limited temporal variation or unavailable data over the 70-year period. Specifically, climate, relief, and parent material exhibited little temporal variability, while satellite-derived land cover indices were not available for the entire 70-year period. Although climate change may have impacted SOM, its effects were considered smaller than those of the dynamic covariates we selected (see below). Table 2 and Supplement S2 of Helfenstein et al.66 provide an overview of the static covariates. Additionally, we obtained monthly mosaics of Sentinel 2 RGB and NIR bands from 2015 onwards, removing monthly mosaics with more than 1% clouds. We computed eight indices from these mosaics: Brightness Index, Saturation Index (SI), Hue Index, Coloration Index, Redness Index, Carbonate Index, Grain Size Index, and Normalized Difference Vegetation Index, following Loiseau et al.95. To improve the signal-to-noise ratio and reduce data volume, we further processed the spectral indices into the first three principal components over all months and years (long-term yearly aggregates), as well as the long-term monthly mean and standard deviation. In total, we considered 318 static covariates for further model selection.

Dynamic 2D+T and 3D+T covariates

In recent decades and perhaps even centuries, anthropogenic activity has arguably altered soil characteristics more than any natural soil-forming factor96. To account for SOM changes between 1953 and 2022 in the Netherlands, we chose land use and the occurrence of peat as dynamic covariates due to their important link to SOM and high temporal variability.

Acquiring harmonized and spatially exhaustive information on land use changes from the 1950s to the present is difficult, as high resolution remote sensing products are not available prior to the 1980s. However, in the Netherlands, the main land use categories were carefully mapped using topographic maps since the 1900s. Largely based on these maps, digital historical land use maps were created for around 1900, 1960, 1970, 1980, and 199097 (Table 6). Since the 1980s, national land use maps have been developed by combining information from the Dutch key registries for topography, land cover, agricultural parcels, urban areas, and nature, as well as remote sensing data, to provide greater detail98. These maps have been regularly updated using the latest data and improved methods and since 2018 are updated annually. In total, we used five historical land use maps (1900, 1960, 1970, 1980, and 1990) and eleven recent land use maps from 1984 to 2021 to derive dynamic land use covariates.

Table 6 Table of the dynamic covariates variable in time (2D+T) related to land use (LU)

All land use maps were reclassified into nine general classes (Fig. 1a, b; Table 6), which were limited by the classes contained in the historical land use maps. We obtained the land use for every location, with coordinates x and y for every year t between 1953 and 2022 (LUxyt), by assigning the same class as in the temporally nearest year for which a map was available. If t was exactly in between 2 years for which a map was available, the older map was used. In the same manner, we further defined LUxyt_Δ5, LUxyt_Δ10, LUxyt_Δ20, and LUxyt_Δ40 by assigning the land use class that occurred most frequently in the 5, 10, 20, and 40 years prior to and including year t, respectively. These modal classes were assigned to account for the delayed response of SOM to land use change. An example of a 2D+T dynamic land use covariate (LUxyt) is shown in Fig. 1a, b for the years 1953 and 2022, respectively. Previous studies have employed comparable approaches to derive dynamic covariates that serve as proxies for land cover and land cover changes; however, they utilized continuous covariates sourced from Landsat, MODIS or AVHRR products from more recent years20,21.
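
The temporal assignment rules for the land use covariates can be illustrated with a minimal Python sketch for a single location. The map years and classes in the example are hypothetical. Two details are assumptions rather than statements from the text: a window of n years is taken to run from t − n + 1 to t, and a tie between equally frequent classes is resolved in favour of the class present in the most recent year.

```python
from collections import Counter


def nearest_map_year(t, map_years):
    """Temporally nearest map year; an exact tie resolves to the older map."""
    return min(map_years, key=lambda m: (abs(m - t), m))


def land_use(t, maps):
    """Land use class at one location in year t (LUxyt)."""
    return maps[nearest_map_year(t, list(maps))]


def modal_land_use(t, maps, window):
    """Most frequent class in the `window` years up to and including t."""
    years = range(t - window + 1, t + 1)  # assumed window definition
    counts = Counter(land_use(y, maps) for y in years)
    best = max(counts.values())
    # Assumed tie-break: the tied class found in the most recent year.
    for y in reversed(years):
        if counts[land_use(y, maps)] == best:
            return land_use(y, maps)


# Hypothetical map series for one location: year -> class.
maps = {1960: "grassland", 1970: "cropland", 1980: "cropland"}
lu_1965 = land_use(1965, maps)               # equidistant, older map is used
lu_1968 = land_use(1968, maps)
lu_1968_d10 = modal_land_use(1968, maps, 10)
```

In this example the location is cropland in 1968 according to the nearest map, while the 10-year modal class is still grassland, which is the kind of lag the modal covariates are meant to capture.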

Furthermore, we derived dynamic covariates for peat occurrence, considering that the majority of SOM in the Netherlands is found in soils with a peat layer. Peat soils have undergone substantial changes over time due to historical excavation, drainage, compaction, and agricultural management31,33,34,35,37,38,42. In order to account for temporal changes in peat soil horizons, we used the original and updated version of the national soil map of the Netherlands (1:50,000)85 (Fig. 5), which is available at BROloket and bodemdata.nl. We included 2D+T dynamic covariates of different peat classes and a 3D+T covariate of peat occurrence because the combination of both helped explain SOM variability.

For deriving 2D+T dynamic covariates of the occurrence of different peat classes, we used the concept of fuzzy membership99, which has also been widely used in soil classification and mapping100. Let p(x, y, t) denote the peat class at a location with coordinates x and y at time t, where t is any year between 1953 and 2022. In the Dutch soil classification system, used in the national soil map, soils containing peat can be assigned to eight sub-classes, based on the starting depth and thickness of peat horizons (Fig. 5). Thus, each p(x, y, t) ∈ {1, 2, 3, 4, 5, 6, 7, 8}, where the first seven classes refer to different peat soils and where class 8 contains no peat within 0–120 cm (i.e., mineral soil). The information about which peat class occurs at a particular location was limited to a maximum of two points in time, \({t}_{o}\) for the original mapping year and \({t}_{u}\) for the year the map was updated. \({t}_{o}\) and \({t}_{u}\) varied in space and \({t}_{u}\) was only available for part of the country, since the soils of the Netherlands were systematically mapped, region by region, between the 1960s and 1990s, and some regions were updated once between 2014 and 2021. Many of the updated regions were areas with less than 2 m of peat within the top 2 m and reclaimed land, where soil characteristics or the depths at which these characteristics occur have substantially changed since the original map was made due to peat oxidation, compaction and land subsidence. For regions that were not updated, the old and updated maps were the same. For \(t < {t}_{o}\), we assumed \(p(x, y, t) = p(x, y, {t}_{o})\), i.e., for years between 1953 and the first mapping campaign we designated the same peat class as in the original map. For \(t > {t}_{u}\), we assumed \(p(x, y, t) = p(x, y, {t}_{u})\), i.e., for years between the update of a region and 2022, we designated the same peat class as in the updated map. However, for \({t}_{o} \le t \le {t}_{u}\), peat class was designated a fuzzy membership value that was a combination of the classes at \({t}_{o}\) and at \({t}_{u}\). We let the membership of the class at \({t}_{o}\) linearly decrease from one to zero in the period from \({t}_{o}\) to \({t}_{u}\), and similarly we let it linearly increase from zero to one for the observed class at \({t}_{u}\), so that the sum of the memberships was always one. We used the 2D+T membership values for each of the eight peat classes as covariates in the random forest model.
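
The linear transition between the original and updated peat class can be written down compactly. The following Python sketch is only an illustration of the rule described above for a single location; the function name, its arguments and the example years are hypothetical.

```python
def peat_class_memberships(t, t_o, class_o, t_u=None, class_u=None, n_classes=8):
    """Fuzzy membership of each peat class (1..n_classes) in year t.

    t_o, class_o: original mapping year and peat class.
    t_u, class_u: update year and peat class (None if the region was not updated).
    """
    memberships = [0.0] * n_classes
    if t_u is None or t <= t_o:
        memberships[class_o - 1] = 1.0        # before or at the original map
    elif t >= t_u:
        memberships[class_u - 1] = 1.0        # at or after the updated map
    else:
        weight_u = (t - t_o) / (t_u - t_o)    # rises linearly from 0 to 1
        memberships[class_o - 1] += 1.0 - weight_u
        memberships[class_u - 1] += weight_u
    return memberships


# Hypothetical location: class 3 mapped in 1970, updated to class 8 in 2020.
m_1995 = peat_class_memberships(1995, 1970, 3, 2020, 8)
```

Using `+=` in the transition branch keeps the membership at one when the original and updated classes are identical.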

For deriving the 3D+T dynamic covariate of peat occurrence, we used the peat starting depth and thickness information contained in the peat classes to derive another fuzzy variable of the occurrence of peat depending on location, depth and time. The fuzzy membership that was derived was as before a number between 0 and 1, where 0 means that peat does not occur, and 1 means that peat occurs. Any membership value in between 0 and 1 means that the soil at that location, depth and time was in a transition period from peat to non-peat or vice versa, in accordance with the soil peat class membership value at that location and time.

Note that some assumptions were needed to derive 3D+T peat occurrence depending on the peat class, depth and expert knowledge. For example, if p(x, y, t) = 4, which is the peat class for peat starting between 15 and 40 cm and having a thickness of 15–40 cm (Fig. 5), we assumed that peat occurs at 15 ≤ d ≤ 80 cm because the exact depth range of peat between this minimum and maximum depth was not known more precisely. Likewise, for p(x, y, t) ∈ {1, 3, 5, 7}, where peat thickness is >40 cm, we assumed that peat occurs for the entire depth range between the minimum starting depth for each class, that is 0, 15, 40, and 80 cm, respectively, and the maximum depth modeled (200 cm; Fig. 5). We made this assumption based on explanations from soil surveyors, who informed us that areas mapped with peat layers thicker than 40 cm tend to have substantially greater thicknesses. In order to predict at the standard depth layers specified by GSM94, the upper and lower depth boundaries of the 0–5 cm, 5–15 cm, 15–30 cm, 30–60 cm, 60–100 cm and 100–200 cm layers were used for each year to derive peat occurrence according to t and d. During this step, assumptions were made regarding the overlap between GSM depth layers and defined depths of peat classes. When peat occurrence was >0, the GSM depth layer needed to overlap by ≥1 cm with the peat layer, whereas when peat occurrence was equal to 0, we permitted no overlap with the peat layer. These assumptions were made because even if there is only a small overlap of peat, it will lead to substantially higher SOM values than on purely mineral soils.
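
A minimal Python sketch of the overlap rule is given below. It covers only the classes whose assumed depth ranges are stated above (1, 3, 4, 5, 7 and the mineral class 8); classes 2 and 6 are omitted because their ranges are not given in the text. Summing the class memberships of all overlapping classes to obtain the 3D+T occurrence value is an assumption made for this illustration.

```python
# Assumed peat depth range (cm) per peat class, as stated in the text.
# None means no peat (class 8, mineral soil).
PEAT_DEPTH_RANGE_CM = {
    1: (0, 200),
    3: (15, 200),
    4: (15, 80),
    5: (40, 200),
    7: (80, 200),
    8: None,
}


def overlaps(layer, peat_range, min_overlap_cm=1):
    """True if the depth layer overlaps the peat range by at least 1 cm."""
    if peat_range is None:
        return False
    top, bottom = layer
    peat_top, peat_bottom = peat_range
    return min(bottom, peat_bottom) - max(top, peat_top) >= min_overlap_cm


def peat_occurrence(layer, memberships):
    """Fuzzy peat occurrence (0..1) for a depth layer.

    memberships: dict mapping peat class -> 2D+T membership in a given year.
    """
    return sum(m for cls, m in memberships.items()
               if overlaps(layer, PEAT_DEPTH_RANGE_CM[cls]))


# Hypothetical location halfway through a transition from class 3 to class 8.
halfway = {3: 0.5, 8: 0.5}
occ_5_15 = peat_occurrence((5, 15), halfway)    # layer ends where peat starts
occ_15_30 = peat_occurrence((15, 30), halfway)
```

The same function can be applied to sampled horizon boundaries instead of GSM layers by passing the upper and lower horizon depths as `layer`.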

Depending on the peat class in the original and updated national soil map, d, t and the location, this ultimately resulted in one of three possible outcomes: no changes in peat occurrence, peat “appearing” or peat “disappearing”. An example of the 3D+T dynamic peat occurrence covariate is shown in Fig. 1d, e for the year 1987 for depths 5–15 cm (d) and 60–100 cm (e).

Overlay and regression matrix

We created a regression matrix containing SOM and covariate values by performing a spatial overlay for static covariates, a space-time overlay for 2D+T covariates and a space-depth-time overlay for 3D+T covariates. t was equal to the year at which a SOM observation was made. For deriving peat occurrence according to t and d, the upper and lower sampled horizon boundaries were used, whereby the same assumptions were made regarding the overlap of depth layers as when deriving peat occurrence for the GSM depth layers (see above).

Sampling depth information, more specifically the upper and lower boundary and midpoint of each sampled horizon, was included in the regression matrix as covariates so that predictions could easily be made at any chosen depth and depth interval. Including these as covariates also supported accounting for changes in SOM over depth, in addition to the other 3D covariate. See Ma et al.101 for an overview of models using depth as a covariate in comparison to non-3D digital soil mapping methods.

Model selection, tuning and calibration

For model selection, defined here as selecting the best of several candidate models based on performance102, we first reduced the number of static covariates (n = 318). We removed covariates in a two-step procedure using de-correlation followed by recursive feature elimination as in Poggio et al.59. From any pair of covariates for which the Pearson correlation coefficient was >0.85 or <−0.85, the covariate that was more correlated with all remaining covariates was removed. Recursive feature elimination103 was implemented using the caret package104. This resulted in a set of 16 static covariates. These, in addition to the three depth covariates, the five 2D+T covariates of land use, eight 2D+T covariates of peat classes and the 3D+T covariate of peat occurrence, were selected for model tuning and calibration and can be found in the variable importance plot (SI 2; Fig. S3).
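
The de-correlation step can be sketched as follows. This Python example is an illustration under stated assumptions, not the implementation used in the study: "more correlated with all remaining covariates" is interpreted as the larger mean absolute Pearson correlation with the other remaining covariates, pairs are processed starting from the most strongly correlated one, and the covariate names and values are invented.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)  # constant covariates are not handled


def decorrelate(covariates, threshold=0.85):
    """Drop covariates until no pair has |r| above the threshold."""
    keep = list(covariates)
    while True:
        corr = {(a, b): abs(pearson(covariates[a], covariates[b]))
                for i, a in enumerate(keep) for b in keep[i + 1:]}
        pairs = [pair for pair, r in corr.items() if r > threshold]
        if not pairs:
            return keep
        a, b = max(pairs, key=corr.get)

        def mean_abs_corr(name):
            others = [k for k in keep if k != name]
            total = sum(corr[(name, o)] if (name, o) in corr else corr[(o, name)]
                        for o in others)
            return total / len(others)

        keep.remove(a if mean_abs_corr(a) >= mean_abs_corr(b) else b)


# Invented covariate values at six locations.
covariates = {
    "elevation": [1, 2, 3, 4, 5, 6],
    "slope": [2, 4, 6, 8, 10, 13],      # nearly collinear with elevation
    "rainfall": [5, 3, 6, 2, 7, 1],
}
kept = decorrelate(covariates)
```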

For model tuning, we grew random forest models105 with the goal of optimizing hyper-parameters for mean predictions. Model tuning was performed using a location-grouped 10-fold cross-validation of the calibration data, wherein all observations from the same location were forced to be in the same fold. Each hold-in fold contained a random selection of 90% of the laboratory measurements and all field observations. Each hold-out fold contained the remaining 10% of laboratory measurements. In this way, models were calibrated using both laboratory measurements and field estimates, while performance to optimize hyper-parameters was assessed only using laboratory measurements. We evaluated all combinations of the same hyper-parameters as in Section 2.4 of Helfenstein et al.66 and selected the combination with the best performance.
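
The fold construction can be illustrated with a short Python sketch. It is a simplified illustration: the location identifiers are hypothetical, locations are distributed over folds in round-robin order after shuffling, and the seed is arbitrary.

```python
import random


def location_grouped_folds(lab_locations, k=10, seed=0):
    """Assign each laboratory measurement to a fold so that all
    measurements from the same location share one fold."""
    locations = sorted(set(lab_locations))
    random.Random(seed).shuffle(locations)
    fold_of = {loc: i % k for i, loc in enumerate(locations)}
    return [fold_of[loc] for loc in lab_locations]


def split(fold, lab_fold_ids, n_field):
    """Indices for one cross-validation round.

    Hold-in: laboratory measurements outside `fold` plus all field estimates.
    Hold-out: laboratory measurements in `fold` only.
    """
    train_lab = [i for i, f in enumerate(lab_fold_ids) if f != fold]
    test_lab = [i for i, f in enumerate(lab_fold_ids) if f == fold]
    train_field = list(range(n_field))
    return train_lab, train_field, test_lab


# Hypothetical location identifier of each laboratory measurement.
lab_locations = ["A", "A", "B", "C", "C", "C", "D"]
fold_ids = location_grouped_folds(lab_locations, k=3)
train_lab, train_field, test_lab = split(0, fold_ids, n_field=5)
```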

In order to account for the lower accuracy of field estimates compared to laboratory measurements, we assigned larger weights to the laboratory measurements. Weight values of two, five, ten and fifteen times the weight of field estimates were tested. The final set of weights and other hyper-parameters was chosen based on the lowest root mean squared error (RMSE; Equation (3)) across the cross-validation. When the increase in RMSE was below 0.1%, the model with fewer trees was chosen to reduce computation time. Note that an alternative way to account for differences in observation quality would be to derive the weights from the measurement error and residual variance, as in error-filtered machine learning106.
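
The weighting and the selection rule can be sketched as follows. The RMSE values in the example are invented for illustration and are not results of this study. The rule "fewer trees if the increase in RMSE is below 0.1%" is interpreted here as choosing, among all configurations within 0.1% of the lowest RMSE, the one with the fewest trees; this interpretation is an assumption.

```python
def case_weights(n_lab, n_field, lab_weight):
    """Per-observation weights: laboratory measurements receive
    `lab_weight` times the weight of a field estimate."""
    return [float(lab_weight)] * n_lab + [1.0] * n_field


def choose_configuration(results, tolerance=0.001):
    """Pick the configuration with the fewest trees among those whose
    cross-validated RMSE is within `tolerance` (relative) of the best."""
    best_rmse = min(r["rmse"] for r in results)
    near_best = [r for r in results if r["rmse"] <= best_rmse * (1 + tolerance)]
    return min(near_best, key=lambda r: (r["num_trees"], r["rmse"]))


# Invented cross-validation results for three candidate configurations.
results = [
    {"lab_weight": 5, "num_trees": 250, "rmse": 4.003},
    {"lab_weight": 5, "num_trees": 500, "rmse": 4.000},
    {"lab_weight": 2, "num_trees": 250, "rmse": 4.200},
]
chosen = choose_configuration(results)
weights = case_weights(n_lab=3, n_field=2, lab_weight=chosen["lab_weight"])
```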

The final quantile regression forest used for model prediction was fitted using all soil observations in the calibration set (n = 15,312 laboratory measurements and 840,638 field estimates), 33 covariates and the final set of hyper-parameters, as optimized using random forest. We used the ranger package107 with the option “quantreg” to grow a quantile regression forest and without it to grow random forest models.

Variable importance

During model fitting, we used impurity as a measure of variable importance (SI 2; Fig. S3). Impurity assesses the total reduction in heterogeneity that a covariate generates on the response variable. It is calculated by summing up all the reductions in the heterogeneity index in the tree nodes where a covariate was selected for splitting108. It is important to note that impurity is biased towards covariates with more distinct values and therefore against categorical covariates, as they have a finite number of binary splits due to their limited number of classes109,110. While impurity was used in this study, the more appropriate permutation measure of variable importance depends on the out-of-bag error108. As we assigned larger weights to laboratory measurements, there were not enough unselected soil samples available to calculate the out-of-bag error, making it impossible to use permutation to measure variable importance.

Prediction maps

The calibrated quantile regression forest was used to derive the median (0.50 quantile; q0.50), 0.05 quantile (q0.05) and 0.95 quantile (q0.95) at every 25 m pixel and each standard depth layer specified by GSM94 for every year t from 1953 to 2022 over the Netherlands. In addition, spatially explicit 90% prediction intervals (PI90) were obtained at every 25 m pixel as a measure of prediction uncertainty as follows:

$$PI90={q}_{0.95}-{q}_{0.05}$$
(1)

Absolute mass percentage changes in SOM (ΔSOM) over the 70-year period were mapped by subtracting the 1953 SOM [%] median prediction map from the 2022 map. For prediction, the depth covariates were equal to the upper and lower boundary and midpoint of each standard depth layer specified by GSM94. However, note that the model can predict at any depth, so in order to analyze changes in SOM over smaller depth increments, we also predicted SOM [%] at 5 cm depth intervals between 0 and 2 m along a transect (Fig. 3e, f). The location of this transect was chosen such that it contained fen and bog peat soils of varying thickness, reclaimed land and mineral soil under different land use types.

SOM changes based on soil type and land use

In order to gain insight into average changes in peat vs. mineral soils and the dominant land uses in the Netherlands (grassland, cropland and forest), we computed average ΔSOM values between 1953 and 2022 for each combination of these soil types and land uses for the top 30 cm (Table 2). This depth interval was chosen because the topsoil is usually most relevant for agricultural and ecological purposes. Weighted averages were computed for the upper layers using the ΔSOM maps from 0–5 cm, 5–15 cm and 15–30 cm.
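
As a worked illustration of the depth aggregation, the sketch below combines the three upper GSM layers into a single 0–30 cm value. Weighting each layer by its thickness (5, 10 and 15 cm) is an assumption, since the weights are not stated explicitly; the ΔSOM values are invented.

```python
TOPSOIL_LAYERS_CM = [(0, 5), (5, 15), (15, 30)]


def topsoil_weighted_mean(values):
    """Thickness-weighted mean over the 0-5, 5-15 and 15-30 cm layers."""
    thickness = [bottom - top for top, bottom in TOPSOIL_LAYERS_CM]
    return sum(v * w for v, w in zip(values, thickness)) / sum(thickness)


# Invented ΔSOM values (mass %) for one pixel in the three layers.
delta_som_0_30 = topsoil_weighted_mean([-3.0, -1.5, 0.0])
```

With these values the result is (5 × −3.0 + 10 × −1.5 + 15 × 0.0) / 30 = −1.0.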

In order to compare our findings with other studies13,31,36,45,46, conversions from soil organic carbon to SOM were necessary. We used the same conversion factor that was used for soils in the Netherlands in Knotters et al.31, so soil organic carbon values reported in other studies were multiplied by 2.000. Note that this conversion ratio depends on the soil type and we did not account for its uncertainty, as shown for soils in the Netherlands in Fig. 53 of van Tol-Leenders et al.86. If other studies reported values in g/kg, values were divided by 10 to obtain absolute mass percentages.
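
The two unit conversions amount to simple arithmetic, sketched below. The function names are hypothetical and the input value is an invented example.

```python
SOC_TO_SOM_FACTOR = 2.0  # conversion factor stated in the text


def g_per_kg_to_percent(value_g_per_kg):
    """Convert a mass fraction from g/kg to mass percent."""
    return value_g_per_kg / 10.0


def soc_to_som(soc_percent):
    """Convert soil organic carbon [%] to soil organic matter [%]."""
    return soc_percent * SOC_TO_SOM_FACTOR


# Invented example: 25 g/kg soil organic carbon.
som_percent = soc_to_som(g_per_kg_to_percent(25.0))
```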

Model accuracy assessment

In order to assess model accuracy, we used prediction uncertainty, cross-validation (1953–2011), design-based inference (1993–2000 and 2018) and non-design based inference to evaluate SOM temporal changes (ΔSOM; Table 4).

Prediction uncertainty

At the location, depth and year of a SOM measurement, all quantiles from 0 to 1 at steps of 0.01 were predicted to obtain the PI90 (Equation (1)) as well as the prediction interval coverage probability (PICP) of all prediction intervals. The PICP is the proportion of observations that fall into the corresponding prediction interval111. It is an indication of how accurately quantile regression forest quantifies uncertainty. Prediction uncertainty using PI90 is an example of model internal accuracy assessment since it is model (quantile regression forest) dependent.
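
A minimal Python sketch of PI90 and PICP for one prediction interval is given below. The observations and quantile predictions are invented, and counting observations that lie exactly on an interval boundary as covered is an assumption.

```python
def pi90(q05, q95):
    """Width of the 90% prediction interval, Equation (1)."""
    return [upper - lower for lower, upper in zip(q05, q95)]


def picp(observed, lower, upper):
    """Proportion of observations inside their prediction interval."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(observed, lower, upper))
    return inside / len(observed)


# Invented SOM observations [%] and predicted 0.05 and 0.95 quantiles.
observed = [2.0, 5.0, 12.0, 30.0]
q05 = [1.0, 3.0, 4.0, 20.0]
q95 = [4.0, 9.0, 10.0, 45.0]

widths = pi90(q05, q95)
coverage = picp(observed, q05, q95)
```

For a well-calibrated model, the PICP of the 90% interval should be close to 0.90; repeating the calculation for other quantile pairs gives the PICP of the remaining prediction intervals.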

Cross-validation (1953–2011)

In order to obtain an overall indication of 3D+T SOM mapping accuracy, we used a location-grouped 10-fold cross-validation with only laboratory measurements from the 1953–2011 dataset in the hold-out folds, similar to the procedure used during model tuning (see above). Cross-validation is commonly used for digital soil mapping assessment112 and was also used in other space-time soil mapping studies20.

To obtain commonly used accuracy metrics, median predictions were used to calculate residuals. From these residuals we estimated the mean error (ME or bias), the RMSE and the model efficiency coefficient (MEC):

$$\widehat{ME}=\frac{1}{n}\,\mathop{\sum }\limits_{i=1}^{n}\left({y}_{i}-{\widehat{y}}_{i}\right)$$
(2)
$$\widehat{RMSE}=\sqrt{\frac{1}{n}\,\mathop{\sum }\limits_{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}$$
(3)
$$\widehat{MEC}=1-\frac{\mathop{\sum }\nolimits_{i = 1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}{\mathop{\sum }\nolimits_{i = 1}^{n}{\left({y}_{i}-\overline{y}\right)}^{2}}$$
(4)

where n is the number of observations, \({y}_{i}\) and \({\widehat{y}}_{i}\) are the ith observation and prediction, respectively, at a certain location, depth and year, and \(\overline{y}\) is the mean of all test set observations. We computed these accuracy metrics for all observations (Fig. 4a) and separately for observations in the 0–30, 30–100 and 100–200 cm depth layers (Table 3), as the latter was necessary for design-based inference (see below). The midpoint of the sampled depth interval was used to assign each observation to the corresponding depth layer. For example, an observation from 20–60 cm, having a midpoint of 40 cm, was assigned to the depth layer 30–100 cm.
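
Equations (2)–(4) and the assignment of observations to depth layers can be sketched in Python as follows. The observations and predictions are invented, and assigning a midpoint that falls exactly on a layer boundary to the deeper layer is an assumption.

```python
from math import sqrt

DEPTH_LAYERS_CM = [(0, 30), (30, 100), (100, 200)]


def accuracy_metrics(observed, predicted):
    """Mean error, RMSE and model efficiency coefficient, Equations (2)-(4)."""
    n = len(observed)
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
    sum_sq_res = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    sum_sq_tot = sum((y - mean_obs) ** 2 for y in observed)
    me = sum(residuals) / n
    rmse = sqrt(sum_sq_res / n)
    mec = 1.0 - sum_sq_res / sum_sq_tot
    return me, rmse, mec


def depth_layer(upper_cm, lower_cm):
    """Depth layer that contains the midpoint of a sampled interval."""
    midpoint = (upper_cm + lower_cm) / 2
    for top, bottom in DEPTH_LAYERS_CM:
        if top <= midpoint < bottom:  # assumed boundary handling
            return (top, bottom)
    return None


# Invented observations and median predictions (SOM in mass %).
me, rmse, mec = accuracy_metrics([2.0, 4.0, 6.0, 8.0], [3.0, 4.0, 5.0, 10.0])
layer = depth_layer(20, 60)
```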

Design-based inference (1993–2000 and 2018)

Since the 1993–2000 and 2018 datasets are probability samples over 2D space (see “Soil point data for model accuracy assessment”), we used design-based inference to compute accuracy metrics for the 0–30, 30–100 and 100–200 cm depth layers (Table 3), in the same manner as in Section 2.6.2 of Helfenstein et al.66. This included the lower and upper 97.5% confidence limits of the accuracy metrics, which together give the 95% confidence intervals (de Gruijter et al.113; Section 7.2.4). Design-based inference using a probability sample is recommended for map validation because it yields unbiased estimates of the accuracy metrics and allows computing confidence intervals89,114,115.

Evaluation of SOM temporal changes (ΔSOM)

Using the ΔSOM validation dataset (see “Soil point data for model accuracy assessment” and locations in Fig. 3c), we computed the difference between a SOM measurement in 2022 and a legacy SOM measurement called Δy as follows:

$$\Delta y=y(x,y,d,{t}_{2022})-y(x,y,d,{t}_{i})$$
(5)

where y is a SOM observation, x and y are the coordinates of a location, d is the sampled depth and \({t}_{i}\) is the year between 1953 and 1999 in which the legacy soil sample was collected at that location and depth. We compared predictions to observations and also computed accuracy metrics using Equations (2)–(4), except that \({y}_{i}\) and \({\widehat{y}}_{i}\) were the ith ΔSOM measurement and prediction, respectively (Fig. 4b).