## Introduction

Silicon (Si) has been shown to be beneficial for crops exposed to biotic or abiotic stresses1,2,3,4,5. The role of Si remains controversial1,6, but crop yields may depend on the bioavailable Si in the soil7, particularly for globally cultivated crops and staple food that contain 1% or more Si by dry weight such as rice, sugar cane, or wheat, or more broadly, the Poaceae8,9,10. Soil Si bioavailability is therefore an emerging issue in agriculture, as Si fertilization of soil depleted of bioavailable Si could increase yields8,11,12,13.

Si is taken up from the soil solution by roots and is accumulated in the shoots in the form of amorphous silica particles (phytoliths). In the soil solution, dissolved Si (DSi) primarily occurs in the neutral form Si(OH)4 (silicic acid). The DSi concentration typically ranges from 0.1 to 0.6 mmol L−13. DSi originates from the dissolution of primary and secondary silicate minerals through alteration or chemical weathering14 and from plant recycling through the dissolution of phytoliths15,16. The alterability of silicate minerals can be characterized by their solubility and kinetic properties, which are dependent upon temperature, pH, and available solutes17,18.

DSi concentration in the soil solution can be lowered by plant uptake19, uptake by silica-shelled microorganisms20 and Si adsorption to mineral surfaces21. Quantifying the pool of Si that is bioavailable (PAS) is still a challenge7. Si extracted with CaCl2 (SiCaCl2), acetate, acetic acid, or citrate are used as PAS proxies since the concentrations extracted by these reagents are proportional to plant Si concentration or grain yield22,23. PAS concentrations extracted by these reagents have been shown to be positively correlated with soil properties such as phytolith content24, pH25,26,27, < 2 µm fraction7,28, organic matter, and iron oxides28. A negative correlation between PAS and total Si content has also been documented, reflecting the predominance of low solubility minerals such as quartz in the Si pool28,29. Critical levels of PAS have been identified in cultivated lands mainly under tropical climate including the paddy soils of Asia7,30, the USA31, and Brazil32, and the sugarcane fields of Australia33. Under critical levels of PAS, silicate fertilization may be recommended and is already applied on tropical soils in some countries.

Chemical weathering leads to a removal of Si through leaching and erosion14. Intense weathering causes soils to become acidic and depleted in bases and Si (desilication) due to loss of primary silicates that are easily weatherable. Cultivation, through exportation of crops containing silicon has been indicated as a Si pool modifier by lowering the phytolith pool34,35,36,37,38,39; as a consequence, cultivation may reduce the soil Si availability if plant residues are not returned to the fields40. Darmawan et al.11 documented a significant decrease of available Si after approximately 30 years of intensive rice cultivation in Indonesia. The decrease of PAS in soil has been listed as a factor contributing to the stagnation of crop yields12. Si supply is considered a useful strategy for improving crop health7, the influence of land use and agricultural practices on PAS has become an emerging issue41.

The objectives of this paper are to determine the impact of agriculture on PAS in temperate soils. To do so we (i) estimated the spatial variability of PAS in France, a country with a high soil diversity42 that can be considered as representative of most European soils, and likely be of most temperate soils worldwide; (ii) hierarchized the soil characteristics governing this variability in non-cultivated soils for different soil groups (pH, < 2 µm fraction, cation exchange capacity (CEC), major elements) to (iii) understand how and under which pedological conditions, cultivation acts on these soil divers; and (iv) determine which soils may exhibit too low SiCaCl2 concentrations in soil solution for proper growth of wheat, a Si accumulator plant.

We implemented a spatial statistical approach based on an extensive dataset extracted from the French Soil Quality Monitoring Network database43. This dataset contains data from over 2200 sites in the France mainland, all of which were analysed for major chemical, physical, pedological, climatic, and geological parameters including SiCaCl2 concentrations. We focused on the SiCaCl2 soil concentrations of the surface horizons (0–30 cm), considered to be the horizon most explored by plant roots.

## Results and discussion

### Variables explaining SiCaCl2 using digital soil mapping (DSM)

Concentrations of SiCaCl2 in French topsoils ranged from 2.3 to 134 mg kg−1 (subset 1, see Supplementary Table S1) with a 1st quartile value of 9.5 mg kg−1, a median of 17 mg kg−1, and a 3rd quartile of 28 mg kg−1 (Table 1). The values fell in the same range as values found elsewhere25,26.

The spatial distribution of SiCaCl2 at the French territory scale obtained by the DSM approach was evaluated by the following criteria using a 30-fold cross validation process: R2 = 0.43, concordance = 0.58, RMSE = 13 mg kg−1, and bias = 0.24 mg kg−1. These criteria are of the same order of magnitude than those obtained in other DSM studies, at national scale, for other soil parameters, which yield R2 ranging from 0.25 to 0.5544,45,46. The most important covariates for predicting the spatial pattern of the SiCaCl2 concentration in the obtained Random Forest model, was, as expected, the parent material (parent_material) (Fig. 1). The second most important covariate was the so called NDVI_1, a proxy for vegetation growth dynamics as defined by Loiseau et al.47. Other important covariates for the model were soil type, precipitation, and land use, in that order. These results confirmed the data obtained in Japan28 and in Louisiana31 indicating that PAS depends on pedological/geological conditions. As a consequence, the primary spatial structures of the map (Fig. 2a) matched the parent materials and the negative correlation with total Si concentration29. These results also highlighted the importance of land use/vegetation on the SiCaCl2 concentration.

### Physical and chemical control of SiCaCl2 in non-cultivated soils

In this part, only non-cultivated soils are considered to study the mechanisms that occur naturally in soils, without the influence of agriculture that generally strongly modifies some of the soil characteristics considered as PAS drivers, notably the soil pH by liming for slightly acidic to acidic soils, the soil organic carbon concentration (SOC).

#### Overall correlations between the main soil characteristics and SiCaCl2 concentrations

In the non-cultivated soils, we confirmed previous findings indicating that SiCaCl2 concentration was positively correlated (Table 2) with pH (r = 0.46) and the < 2 µm fraction (r = 0.59). The relationships with SOC and iron oxides (estimated by the Mehra Jackson extraction method48) were weaker (with r of 0.27 and 0.35, respectively). SiCaCl2 was also positively correlated with CEC (r = 0.59), exchangeable Ca (r = 0.54), and CEC of < 2 µm fraction (r = 0.44). The correlation of SiCaCl2 with pH indicated that SiCaCl2 concentration may be controlled by Si adsorbed on the surface of soil minerals and/or by the soil weathering state25,26. The correlation of SiCaCl2 with the < 2 µm fraction can be attributed to the presence of small phytoliths49, clay mineral dissolution50, or adsorption onto clay minerals51.

Nevertheless, while SiCaCl2 linearly increased with the < 2 µm fraction in non-cultivated soils regardless of parent material (Fig. 3a), the correlation with pH was more complex. The concentration in SiCaCl2 increases with pH for pH values lower than 6–7 and then slightly decreases for pH larger than 8 (Fig. 3b). Such an evolution of SiCaCl2 with pH was already described in the literature52. As a consequence, there was no correlation between SiCaCl2 and the soil pH (Table 2) for carbonated soils due to their high (> 7) and rather constant pH values (Fig. 4b). For the SOC and Fe oxides, the relationship with SiCaCl2 is less marked, even though the SiCaCl2 concentration globally increases with these parameters (Fig. 3c,d). As a conclusion, in non-cultivated soils, the SiCaCl2 concentration seems to be mainly driven by < 2 µm content and pH suggesting adsorption.

#### The impact of pH on the SiCaCl2 concentration is a function of the < 2 µm fraction

To explore the respective role of pH, CEC of < 2 µm fraction, and < 2 µm fraction on SiCaCl2 concentrations in soils, we considered the correlation between soil SiCaCl2 concentration and soil pH for each of the < 2 µm fractions defined in Fig. 3a. For low < 2 µm fraction concentrations (< 50 g kg−1), the SiCaCl2 concentration was significantly correlated with the pH but the slope of the regression was very low and appeared to be driven by a few extreme values (Fig. 5). As a result, SiCaCl2 concentrations increased minimally from 3.3–4.3 mg kg−1 to 6.4–9.3 mg kg−1 in 4.5 pH units, demonstrating the low impact of pH on SiCaCl2 concentration for soils with poor < 2 µm fractions. For < 2 µm fraction concentrations ranging from 50 to 325 g kg−1, SiCaCl2 concentration increased with pH. Nevertheless, for < 2 µm fraction content higher than 230 g kg−1, the increase of SiCaCl2 with pH is smaller. No correlation is observed for < 2 µm fraction concentrations higher than 325 g kg−1, indicating that adsorption might not be the dominant process at these < 2 µm contents. Carbonated soils are in this < 2 µm fraction range (median of 364 g kg−1, Supplementary Table S2) explaining the absence of correlation of SiCaCl2 with pH.

As a conclusion, the classically described increase of SiCaCl2 with pH (explained by Si adsorption processes) was only encountered in temperate soils for non-carbonated soils with < 2 µm concentrations higher than 50 g kg−1 or lower than 325 g kg−1.

#### Impact of the nature of the < 2 µm fraction on SiCaCl2 concentration

The < 2 µm CEC can be used as a proxy for the nature of the clay minerals of the soil as shown by the contrasted CEC of the different clay minerals provided by Goldberg et al.53. On this basis53, we assumed that (i) soils with < 2 µm CEC lower than 10 cmol+ kg−1 were primarily kaolinitic or possessed a < 2 µm fraction rich in quartz; (ii) soils with < 2 µm CEC ranging from 10 to 40 cmol+ kg−1 contained a large amount of illite and chlorite or a complex mixture of clay minerals; and (iii) soils with a < 2 µm CEC larger than 40 cmol+ kg−1 contained a large quantity of vermiculite and smectite.

A correlation of the SiCaCl2 concentrations with the < 2 µm CEC (a proxy for the nature of the clay minerals of the soil) was observed globally and for all non-carbonated parent material, with the exception of podzols (Table 2), suggesting a role of the nature of the clay minerals on the SiCaCl2 concentrations. However, Fig. 5 shows that soil pH and clay nature were linked, with acidic soils being mainly kaolinitic and basic soils being more smectitic or rich in vermiculite. This could partially explain the link between SiCaCl2 concentration and pH, because kaolinites are more stable than other clay minerals (e.g. smectite)14.

For carbonated soils developed on sedimentary parent materials and podzols, the SiCaCl2 concentrations did not appear to be impacted by the nature of the corresponding clay mineral (< 2 µm CEC) (Table 2). In carbonated soils on sediment parent material, the clay mineral assemblage was primarily dominated by smectite and vermiculite, according to the < 2 µm CEC recorded (Fig. 4d). In podzols, the nature of the clay minerals did not vary much and primarily consisted of kaolinite and quartz in the < 2 µm fraction (see the very low < 2 µm CEC in Supplementary Table S2).

Our results showed that the increase in SiCaCl2 concentrations in tandem with pH may be explained by changes in the nature of the underlying mineralogy of the < 2 µm fraction. In acidic soils with kaolinite as the primary clay component, the SiCaCl2 was lower than in soils with a near neutral pH, with a primary clay component of smectite.

### Impact of agriculture on available silicon in soils

For soils derived from carbonated and non-carbonated sediment material (73% of French soils), SiCaCl2 significantly increased in the cultivated soils (perennial and annual crop) as compared to soils at uncultivated sites (Fig. 4a). Indeed, the results of linear model showed that about 12% of the total variance is explained by this difference. However, for soils derived from igneous intrusive rock and metamorphic rock, there was no significant difference in SiCaCl2 concentrations between cultivated and non-cultivated sites. For these two parent materials, SiCaCl2 concentrations were very low, as compared to other parent materials. Thus contrarily to what was expected, cultivation increases SiCaCl2 concentrations but only in soils where this concentration is not too low.

Cultivated soils exhibited significantly higher pH values than uncultivated vegetated soils, regardless of parent material (Fig. 4b) due to liming practices on cultivation, with the notable exception of the carbonated soils, which are generally not limed. For carbonated soils, the difference of pH was due to acidifying conditions under non-cultivation, as compared to agricultural conditions. The difference in pH between cultivated and non-cultivated soils was less than 0.5 for carbonated soils on sediment and igneous intrusive rocks. Since carbonated soils have a < 2 µm fraction content higher than 325 g kg−1, pH has no effect on the SiCaCl2 concentration as shown by Fig. 5 and Table 2.

For non-carbonated soils on sediments and soils on metamorphic rocks, the difference was higher: the average pH of non-cultivated soils was 5.7 and 5.2, while the average pH of cultivated soils was 6.7 and 6.1 respectively for non-carbonated soils on sediments and metamorphic rocks (Supplementary Table S2).

As shown in Fig. 4c (and Supplementary Table S2), the < 2 µm fractions for non-carbonated soils on sediments primarily comprised 50 to 325 g kg−1. For this < 2 µm fraction range, we showed that the SiCaCl2 concentration increased with pH (Fig. 5). Therefore, cultivation associated to a pH increase by liming may be responsible for the SiCaCl2 concentration increase under cultivation for these soils. This pH increase was also associated with an increase in < 2 µm CEC and therefore with a higher smectite/vermiculite content under cultivation, and a higher illite/chorite content under permanent vegetation (Fig. 4d). Similar changes in clay mineral composition following changes in land use were observed on paired site approaches (Cornu et al.54 and references herein). The pH-driven effects of liming on SiCaCl2 concentrations have already been observed52. Indeed, increases in pH could cause higher adsorption of silicon to soil minerals and facilitate the solubilization of phytoliths in the soil solution, thus increasing the SiCaCl2 concentration at short time scales52.

### Consequences in terms of potential PAS deficiency for wheat in temperate soils

To our knowledge, no study has defined the critical level of SiCaCl2 for avoiding Si deficiency for temperate staple crops as wheat. Critical levels are, however, available in the literature for rice and sugarcane; these levels are adopted as references for discussion here because wheat has a shoot Si concentration between those of rice and sugarcane10. We proposed two critical values: a lower bound (20 mg kg−1) defined by Haysom and Chapman33 for sugarcane in Australia, and an upper bound (40 mg kg−1), an average value provided for rice based on data from silt loam in Louisiana soils31 (37 and 43 mg kg−1) and from southern acidic soils in India30 (43 mg kg−1). We applied these two thresholds to the French soils cultivated with wheat, obtained by crossing the arable land pixels defined to Corine Land Cover map55 with the cereal producing municipalities from the French OTEX classification (the Technico-economic orientation of the farms, http://agreste.agriculture.gouv.fr), to quantify the importance of potential SiCaCl2 deficiency in temperate soils (Fig. 2b). It results that only 4% of the soils cultivated with wheat fell below the 20 mg kg−1 critical level and could therefore be depleted of plant available Si. However, this hypothesis requires a more technical definition of the Si requirements of wheat. Indeed, this would enable to determine if the soils having concentrations between 20 and 40 mg kg−1 and representing 85% of the surfaces cultivated with wheat are below or under the critical level for wheat.

## Materials and methods

### Soil data from RMQS program

The soil data used in this study were provided by the French Soil Quality Monitoring Network (RMQS). The RMQS network consists of observation sites situated at the centre of a regular grid (16 * 16 km) covering the French territory. It provides 2111 sites in metropolitan France, almost half of which are cultivated (permanent crops or field crops) and the remaining of which consist of pastures, natural vegetation, or urban soils. The dataset used in this study corresponds to the first sampling campaign of the RMQS carried out from 2000 to 2009.

Soil type, parent material, climate, and land use were described in the field. At each site, 25 core samples were taken within a 20 m × 20 m plot and combined into a composite sample. Sampling depth generally consisted of the 0–30 cm layer43. The composite samples were air-dried and sieved to 2 mm before being analysed in Soil Analysis Laboratory of INRAE (Arras, France).

The following parameters were measured: (i) the total soil organic carbon concentration, as measured by dry combustion (NF ISO 10694); (ii) the particle size distribution, by wet sieving and pipette method (NF X 31-107); (iii) cation exchange capacity (CEC) and exchangeable cations (cobaltihexamin method, NF X 31-130); (iv) pH in water (1 to 5 soil to water ratio, NF X 31-107); (v) calcium carbonate, using the volumetric method (NF X 31-106) (CaCO3); and (vi) total P, K, Ca, Mg, Fe, and Al, determined by ICP-MS after dissolution with hydrofluoric and perchloric acids (NF X 31-147). The < 2 µm CEC was estimated as follows:

$$2\,\upmu\text{m}\;\;CEC=\frac{\left(CEC-0.15*OC\right)}{<2\,\upmu \text{m}\;fraction\; content}*1000$$
(1)

with CEC in cmol+ kg−1, OC in g kg−1, and < 2 µm faction content in g kg−1. We used 0.15 cmol+ kg−1 as an estimate of the CEC of organic matter.

Total Si was analysed from a subset of 673 samples by ICP after alkaline fusion56, and extrapolated to the remaining sites using the following conceptual equation:

$${\text{Si}} = f\left( {{\text{Al}},\;{\text{Fe}},\;{\text{K}},\;{\text{Na}},\;{\text{Ca}}_{nc} ,\;{\text{Mg}}_{nc} ,\;{\text{P}},\;SOC,\;{\text{CaCo}}_{3} ,\;residual\;water} \right)$$
(2)

where Canc and Mgnc are the fractions of Ca and Mg, respectively, that are not included in carbonate minerals or adsorbed to the exchangeable surfaces, and SOC is the organic carbon percentage. This model was implemented in cubist29 and yielded an R2 value greater than 0.98.

Bioavailable Si (SiCaCl2) was estimated on samples from 2091 sites using the 0.01 M CaCl2 method33. This widely adopted method25 allows estimation of the pool of readily soluble Si.

Bulk samples were equilibrated during 16 h at room temperature with 0.01 M CaCl2 with a solid:liquid ratio of 1:10. Si was then analyzed, after filtration of the supernatant at 0.45 µm, by Inductively Coupled Plasma Atomic Emission Spectroscopy (axial ICP-AES; 720 ES, Varian)29. The limit of quantification of the method was of 0.5 mg kg−1 with an uncertainty, U, (with a confidence level of 0.95) evaluated as follow:

$$U=0.0271*{\mathrm{Si}}_{\mathrm{CaCl}2}+0.25$$
(3)

with SiCaCl2 and U expressed in mg kg−1.

To decipher the impact of agricultural use on SiCaCl2 concentration, we stratified the database by (i) parent material, based on the European Soil Information System (EUSIS) classification57 available at the French territory scale as in Landré et al.29; and by (ii) land use. The group of soils developed on sediment was cut into two subgroups (carbonated and non-carbonated depending on their carbonate content (> 1% and < 1% respectively). We considered soils under forests, pastures, parks, natural vegetation, and wetlands to be non-cultivated soils. We did not make any distinction of land use for soils on igneous extrusive rocks (volcanic) and podzols, because cultivation of these soil types was limited to two and three sites, respectively.

Our statistical analysis was based on a subset of 1986 data points (subset 1, see Supplementary Table S1) due to the removal from the initial 2091 point dataset of any sites with missing geological information and three sites with peat soils.

### Digital soil mapping approach

To map and define the spatial distribution of SiCaCl2 concentrations, we implemented a digital soil mapping (DSM) approach based on the scorpan model framework proposed by McBratney et al.58, a spatial prediction function utilizing quantitative relationships between soil properties and soil forming factors as follows:

$${Soil}_{x}=f\left(s,c,o,r,p,a,n\right)+e$$
(4)

where $$Soil$$ is a soil property at position $$x$$. The $$s$$ refers to soil information derived from prior soil maps or from remote or proximal sensing data; $$c$$ refers to the climatic properties of the environment at a given point; $$o$$ refers to organisms, including vegetation, fauna, or human activity; $$r$$ refers to relief; $$p$$ refers to the parent material or lithology; $$a$$ refers to the soil age; $$n$$ refers to space or spatial position; and $$e$$ is the spatially correlated errors.

To apply this model, we retained a set of spatial covariates describing scorpan factors to predict the spatial distribution of SiCaCl2 at 90 m resolution. All covariates were previously resampled at the same resolution. The selection was carried out from a larger set of covariates using a combination of expert knowledge and statistical approaches. The latter was based on the boruta package in R59. This algorithm uses the classification of the covariate importance implemented in the randomForest package and compares the importance with random variables based on the Z score value. This step allows the identification of the most relevant covariates before fitting a prediction model. The initial 18 spatial covariates represented the scorpan factors as follows: soil (map of soil type and available water capacity (AWC)), climate (map of climate type, precipitation, and evapotranspiration), vegetation (land use, forest type), parent material (parent material types and gravimetry), and relief (shuttle radar topography mission (SRTM), compound topographic index (CTI), slope cosine, erosion, slope, and slope position). A final covariate corresponding to the normalized difference vegetation index (NDVI) was derived from remote sensing data. This spectral index is widely used to describe the photosynthetic capacity of vegetative cover. The underlying assumption driving this method holds that changes in vegetation may reflect various plant responses to climate, land management, or soil properties47. This covariate was computed from a large time series dataset. To summarise the temporal data, Loiseau et al.47 performed a principal component analysis that yielded 3 covariates corresponding to the 3 first components: NDVI_1, NDVI_2, and NDVI_3.

For the scorpan model, we developed a regression kriging (RK) model using the selected covariates and soil observations. The RK model is a hybrid technique combining a Random Forest approach with a geostatistical approach. Random Forest60 consists of an ensemble of regression trees built from covariates and point data. The final prediction is the mean of the individual tree predictions. This technique is applicable in R and requires two main parameters: the number of randomly selected splitting variables at each tree (mtry), and the number of trees (ntree). We used the default parameters provided by the package (mtry = 6 and ntree = 500). The final predictions are the sum of the Random Forest predictions and the residuals computed through an ordinary kriging procedure61.

The RK model was fitted on the on a subset of the RMQS sites for which covariate information were available (subset 2: 1987 points, see Supplementary Table S1). This procedure was implemented with the package GSIF62, which also allows fitting of residuals in a variogram.

To assess the covariates importance in the model, we used the function varImp() implemented in the Random Forest package. The function calculates the difference in MSE (Mean Squared Error) when the values of each predictor are shuffled. This indicator is then normalized by the standard deviation of the differences to obtain the Increase in MSE noted %IncMSE. The highest values are attributed to the more important variables.

The validity of the fitted prediction model was evaluated by a 30-fold cross validation. At each step, the soil dataset was split into two datasets: 2/3 for calibration, and 1/3 for validation.

To obtain an estimate of the final map accuracy, the validation indicators were calculated at each iteration and then averaged over the 30 repetitions. We used different indicators for validation, including the Root Mean Square Error (RMSE), the coefficient of determination (R2), the concordance63, and the bias.

### Take home messages

The SiCaCl2 concentrations in French soils is highly variable (from 2.3 to 134 mg kg−1) and depends mainly on parent material and soil type, but also on the land use.

When the soils are cultivated, the concentration in SiCaCl2 significantly increases for soils developed on sediment parent material but not for those developed on metamorphic rocks and igneous intrusive rocks. For these two last parent materials, SiCaCl2 concentration is low compared to soils developed on sediments. For soil developed on igneous extrusive rocks and podzol, no conclusion on the impact of agriculture on SiCaCl2 concentrations could be drawn due to the low number of sites under cultivation for these soils in France. For non-carbonated soils, this increase is due to the pH increase associated to liming practices. More research is needed to better understand the impact of cultivation on SiCaCl2 concentration for carbonated soils for example through a paired sites approach.

We also verified on this large panel of temperate soils that the SiCaCl2 concentration is governed by the < 2 µm fraction, pH, and iron oxide content to a lesser extent. The nature of the clay assemblage seemed also to act on the SiCaCl2 concentration. We however showed that the SiCaCl2 concentration increases with pH only for soils with content in < 2 µm fraction ranging from 50 to 325 g kg−1.

We also showed that 4% of the soils cropped with wheat could be deficient in SiCaCl2. This result was based on critical level of SiCaCl2 estimated for sugar cane and rice. Studies should be done to determine precisely the critical level for wheat.

At last, this study did not allow accessing the kinetic aspects. Other Si pools and/or chronosequences studies should be analyzed to answer this question.