Background & Summary

The soil water characteristics curves (SWCCs) describes the relationship between soil water content (in gravimetric or volumetric form) and matric potential1,2 and is a fundamental soil hydraulic property that characterizes soil water dynamics and key hydrological processes3. The amount of water retained in the soil pores at given matric potential is dominantly controlled by soil texture, structure, and organic matter4. The measurement of the SWCCs in the laboratory or field is laborious and time-consuming5. This limitation hinders empirical characterization of SWCC parameters over large areas as required in land surface and catchment-scale modeling. An alternative to excessive sampling to obtain spatially distributed SWCC parameters is the application of pedotransfer functions (PTFs)6. However, agronomic legacy of soil mapping influences PTF-derived SWCCs, which tend to be region-specific and focus on homogeneous agricultural soil7,8,9. The growing demand for hydrological land surface parameterization beyond agricultural lands and the related refinement of spatial representation require more definitive SWCC information10. Motivated by the growing need for more comprehensive and spatially resolved SWCCs, a key objective of this study was to pool published datasets and supplement these with anecdotal literature values for regions with poor coverage. A cursory inspection of major datasets such as WOSIS11, UNSODA12 and measurements by Holtan13 shows critical deficiencies ranging from lack of spatial referencing to partial information with datasets omitting the wet-end (water content measured for matric potential ≤0.2 m) and dry-end (water content measured at matric potential ≥150 m) that is often defined as wilting point of SWCCs. For example, WOSIS11 and Holtan13 datasets provide SWCCs starting at 0.6 and 1.0 m matric potential, respectively, and missing the wet-end measurements. In contrast, the ZALF dataset14 provides wet-end information but dry-end information is missing. In addition, Batjes et al.11 and Holtan13 used bulk density measured at 3.3 m matric potential (defined as density at field capacity) to convert gravimetric to volumetric water content, ignoring the water mass retained in the pore space at field capacity. Therefore, a critical aspect of compiling a global SWCCs dataset for hydro-climatic applications is to harmonize the data and devise strategies for imputing missing information as we further elaborate in the section ‘Materials and methods’.

The objectives of this study are to address the limitations of currently available datasets (such as lack of wet- and dry-end measurements of SWCCs, poor estimates of dry bulk density, etc.) and to provide a systematic approach to infer more reliable and geo-referenced SWCC parameters. We put an effort to collect a global soil hydraulic properties (GSHP) database with 15,259 SWCCs by importing, quality controlling, and standardizing tabular data from existing datasets and scientific literature and estimating van Genuchten (vG) model parameters from the measured SWCCs. For the quality control, firstly, we excluded SWCCs using likelihood-based confidence intervals, yielding 11,705 as ‘good quality estimate’. Along with SWCCs, GSHP also provides information regarding soil texture, bulk density, organic carbon, and porosity. The database also contains 8,675 data of Ksat that allow the quantification of the unsaturated hydraulic conductivity function. The GSHP database covers most countries, climatic regions, and continents, including tropical regions with more intense soil-forming processes and inactive clay minerals. The data and codes are made publicly available to promote data-driven analysis and to collect additional data to increase the accuracy of global modeling of unsaturated soil properties in land surface and Earth system models.

Methods

Data sources

The GSHP database was assembled in two steps. Firstly, we combined well-known datasets such as UNSODA12, HYBRAS15, WOSIS11, and AFSPDB16. Secondly, we used different search engines, including Science Direct (https://www.sciencedirect.com/), Google Scholar (https://scholar.google.com/) and Scopus (https://www.scopus.com) to collect additional data. We searched for SWCC datasets using “soil water retention curves”, “moisture retention curves”, and “soil water characteristics curves” as keywords. Moreover, the country names were also used with these keywords to conduct more specific searches for areas with few or no data points (this was the case, for example, for Cambodia, Vietnam, and Thailand). The sources of the collected datasets are listed in Table 1 together with the number of SWCCs for each dataset.

Table 1 List of sources for SWCC data and number of SWCCs (N) per dataset assembled in the GSHP database.

The next task was to check the availability of spatial coordinates for the measurements in each dataset. We assigned each SWCC to one of eight ‘accuracy classes’ ranging from highest (0–100 m) to lowest (more than 10,000 m or non-available information (NA)) accuracy. For example, Forrest et al.17, and Ottoni et al.15 provided exact coordinates of the locations, thus we assigned a location accuracy of 0–100 m (i.e., highly accurate; see Table S1 for more details). For other references, we digitized provided maps or sketches with locations of the measurements. We first georeferenced these maps using ESRI ArcGIS software (v10.3) and then digitized the coordinates from georeferenced images. Some of the documents we digitized (e.g. Nemes et al.12) provided the names of specific locations, and hence we used Google Earth to obtain the coordinates. Coordinates of SWCCs for which only a location name was provided, were estimated using Google Earth, while SWCCs with neither coordinates nor location name were discarded. A detailed description of the extraction of coordinates is provided in Gupta et al.18. Note that Batjes et al.11, and Leenaars et al.16 used another definition of accuracy classes in their collection of datasets. Therefore, we merged the various accuracy definitions into broader classes as shown in Table S1. For example, samples that have minimum and maximum location accuracy less than 100 m (likely 0–10 m, 0–40 m, etc) are assigned to the 0–100 m class. Likewise, 0–1000 m values are assigned to the 500–1000 m class. Furthermore, datasets were cross-checked to avoid redundancy. For example, the WOSIS dataset11 includes the AFSPDB dataset16, so we removed the redundant data from WOSIS and used the original dataset (AFSPDB) in the final compilation. The Florida dataset19 was directly obtained from a project website.

Database cleaning and harmonization

The harmonization of the collected datasets was done by first converting all data to the same units. The units are described in Table 2. Note that we used potential heads (unit of length) and not pressure values for the characterization of the water potential. Furthermore, the data was cleaned as follows: a) SWCCs with maximum volumetric water content more than 1.0 m3/m3 were removed, b) SWCCs that had less than four data pairs were removed, c) SWCCs in which the water content increased more than 10% for increasing (absolute) matric potential were removed (see Figure S1), and d) SWCCs without wet-end information that did not have bulk density data were removed because it was impossible to impute the saturated water content without such information.

Table 2 List of all 54 variables in the GSHP database and their units.

After data extraction from literature, geo-referencing, and harmonization, all information was collected in tabular form in the new GSHP database. The database consists of 54 variables (see Table 2) and 136,989 records with water content measurements (and complementary information) at given matric potential recorded for 15,259 SWCCs. A list of the variables of the database along with their units is given in Table 2.

Conversion of gravimetric to volumetric water content

Some datasets such as Holten13 and Batjes et al.11 provided the gravimetric water content, and the dry bulk density (ratio of solid mass to total volume), before potential shrinkage was required to convert to volumetric water content. However, for these samples, the dry bulk density of a soil clod had been determined by measuring the volume (and mass) of a dry ‘clod’ of soil and not by measuring the soil sample volume at wet state (for shrinking soils, the dry bulk density will be overestimated by measuring the sample volume at dry state). For these datasets, the ‘wet’ bulk density at 3.0 or 3.3 m matric potential had been measured as well, and this value had been used in the original analysis to convert gravimetric to volumetric water content. However, this ‘wet’ bulk density (including water mass) is higher than the dry bulk density. Here we followed a different strategy for conversion to volumetric water content and assumed that the volume of the ‘clod’ of soil at 3.3 m matric potential is the same as at saturation. This is a simplification because soils with structural pores can shrink by application of −3.3 m matric potential as observed by Assi et al.20 and Bonvin et al.21. However, the reported bulk density changes between 0.1 and 5.2% (average 3%) were small compared to the error when using bulk density measured at 3.3 m matric potential to convert gravimetric to volumetric water mass (typically, about 20% of sample mass is water at a matric potential of −3.3 m). The adapted expression for dry bulk density is then equal to:

$${\rho }_{bulk}=\frac{{m}_{solid}}{{V}_{total}}=\frac{{\rho }_{3.3m}}{\left(1+{\theta }_{grav3.3m}\right)}$$
(1)

where ρbulk is the dry bulk density, msolid is the mass of solid particles, Vtotal is the volume of the soil clod at 3.3 m matric potential), ρ3.3 m is the bulk density and θgrav3.3 m is the gravimetric water content at 3.3 m matric potential. Note that Holten13 provided the gravimetric water content at 3 m matric potential, but we assumed for simplicity that the water contents at 3 and 3.3 m matric potential are equal.

PTFs for constraining saturated (θ s) and residual (θ r) water contents for SWCCs without wet- and dry-end measurements

In this study, SWCCs were modeled using van Genuchten (vG) Eq. 2.

The vG model provides the volumetric water content, θ (m3/m3) at matric potential ψ (m) as

$$\theta (\psi )={\theta }_{r}+({\theta }_{s}-{\theta }_{r}){[1+{(\alpha | \psi | )}^{n}]}^{-(1-1/n)}$$
(2)

where θs and θr are the saturated and residual water contents, respectively (m3/m3), and α (m−1) and n (dimensionless) are SWCC shape parameters.

To estimate the vG parameters, measurements close to full and residual saturation are needed. The GSHP database contains some SWCCs that lack the wet- and/or dry-end information (i.e., no measurements are available at matric potential ≤ 0.2 m or ≥ 150 m, respectively). Fitting the vG model to SWCCs without wet-end information leads to unreliable parameter estimates as shown in Figure S2 and Table S2. Therefore, PTFs were used to impute the wet- and/or dry-end information in this study.

PTF for θ s

A PTF for θs was developed and tested based on those SWCCs that had water content information at matric potentials ranging from 0.00 or 0.01 m (saturated water content) to 150 m (permanent wilting point). In total, 2,287 SWCCs had both wet- and dry-end measurements and from this set we selected those 1,947 SWCCs with soil texture and bulk density data to build the PTF. After fitting the vG model to these SWCCs see below, a robust linear regression PTF was developed for θs using the R package ‘robustbase’22. This package is useful when the residual errors of the regression model have long-tailed distributions.

The linear regression PTF for θs was developed using, as covariates, clay and sand contents, bulk density, and a categorical variable that distinguished between tropical and other climatic regions (to account for unique soil-formation conditions in tropical climate). Two PTFs of θs were developed depending on the available covariate information: The first PTF (Model1) was developed for the SWCCs when both soil texture and bulk density were available and the second PTF (Model2) for the SWCCs when only bulk density information was available. To account for the intense weathering processes in the wet and warm climate of the tropical regions23, we distinguished between SWCCs from tropical and other climate regions (arid, boreal, temperate, and polar). Tropical regions have often Oxisols that are dominated by inactive (non-swelling) clay minerals (kaolinite) as shown by Ito and Wagai24 and are characterized by intense soil-formation processes affecting soil hydraulic properties25. The climate region information was extracted from the Köppen-Geiger climate zone map26,27.

The accuracy of the PTF was assessed by 20-fold cross-validation using coefficient of determination (R2), Root Mean Square Error (RMSE) and BIAS as accuracy measures. BIAS and RMSE are defined as: \(BIAS={\sum }_{i=1}^{{N}_{1}}\frac{\left({\widehat{\theta }}_{s,i}-{\theta }_{s,i}\right)}{{N}_{1}}\) and \(RMSE=\sqrt{\frac{{\rm{SSE}}}{{N}_{1}}}\) where \(SSE={\sum }_{i=1}^{{N}_{1}}{\left({\widehat{\theta }}_{s,i}-{\theta }_{s,i}\right)}^{2}\). SSE is the sum of squared errors between the cross-validation predictions \({\widehat{\theta }}_{s,i}\) and the measurements θs, i, and N1 is the total number of SWCCs. The coefficient of determination R2 is defined as: \({R}^{2}=\left[1-\frac{{\rm{SSE}}}{{\rm{SST}}}\right]\) where \(SST={\sum }_{i=1}^{{N}_{1}}{\left({\theta }_{s,i}-{\bar{\theta }}_{s,i}\right)}^{2}\). SST is the total sum of squares and \({\bar{\theta }}_{s,i}\) is the arithmetic mean of the θs deduced from fitting the measured SWCCs. We calculated the prediction interval for θs at 95% confidence and used their upper and lower limits as box constraints for θs when estimating the vG parameters by the ‘soilhypfit’28 (https://cran.r-project.org/web/packages/soilhypfit/index.html) R package for the SWCCs lacking wet-end measurements.

Tuller and Or29 model for θ r

A physically-based model by Tuller and Or29 was used to constrain estimates of θr for the SWCCs where dry-end measurements were missing. The model describes the dry-end gravimetric water content (θm) as a function of specific surface area (SA, in m2/kg) and the thickness of water film (h, in meters) adsorbed on mineral surfaces,

$${\theta }_{m}=h\cdot SA\cdot {\rho }_{w}$$
(3)

where ρw is the density of water. According to Or and Tuller30, capillary contribution becomes negligible for matric potential above 1000 m for a wide range of soil textures. For water adsorbed on planar surfaces by van der Waals forces, Iwamatsu and Horii31 expressed the equilibrium thickness (h) of an adsorbed thin water film as a function of a matric potential (ψ) as shown below in Eq. 4:

$$h=\sqrt[3]{\frac{{A}_{svl}}{6\pi g{\rho }_{w}\psi }}$$
(4)

where Asvl (J) is the Hamaker constant for solid-vapor interactions (we used Asvl equal to 6 × 10−20 J as proposed by Tuller and Or29), ψ (m) is the matric potential, ρw (kg/m3) is the density of the liquid, and g (m/s2) is acceleration due to gravity. According to Or and Tuller32, the water film thickness of an adsorbed water layer is equal to 3.510−10 m. At 150 m matric potential, the water thickness according to Eq. 4 is approximately 7.010−10 m (two layers of water molecules). For SWCCs without dry-end information, we constrained the possible range of residual water content using Eq. 3 with thickness h between one and two monolayers of water (3.510−10–7.010−10 m).

Constraints on vG shape parameters

We constrained the possible values of n from 1.0 to 7.0. These limits were assigned considering literature, expert knowledge, and values obtained from fitting the SWCCs without constraints. We used minimum and maximum α values of 0 and 100 (m−1) for all textural classes. Because the inverse of α characterizes the capillary pressure in the largest soil pores, the maximum value imposed for α corresponds to a maximum pore diameter of 3 mm. Detailed Information on employed constraints when estimating the vG parameters is provided in Table S3.

Fitting and quality check of the vG parameters using soilhypfit R package

SWCCs parameters were estimated, with box-constraints described in the previous sections, using the ‘soilhypfit’ R package28. ‘soilhypfit’ was designed for parametric modeling of SWCCs and/or unsaturated hydraulic conductivity datasets. The main function to estimate vG parameters is called ‘fit_wrc_hcc’ and uses maximum likelihood (ML) to estimate with the restriction m = 1 - 1/n. ‘fit_wrc_hcc’ uses optimisation algorithms of the NLopt library33 or the Shuffled Complex Evolution (SCE) algorithm34. Firstly, we used the default global optimisation algorithms to fit the SWCCs. Then, we refined the parameter estimates by the default unconstrained local algorithm, again fitting the untransformed parameters (α, n) with the same parameter range as in global optimization and using as initial values the parameters estimated by the global optimization algorithm. The reason for this sequential procedure is that the parameters returned by the global algorithm are sometimes slightly away from the parameters of the true maximum of the objective function and the computed standard errors of α and n, computed from the Hessian of loglikelihood function (=the observed Fisher information matrix), are then not accurate.

We checked the quality of estimates of the van Genuchten parameters n and α using loglikelihood profiles. From the profiles, we computed likelihood-based confidence intervals of α and n for 3 confidence levels (0.5, 0.8, and 0.95) (see chapter 4 in Uusipaikka35). Furthermore, we identified those SWCCs where the estimates of α and n coincided with any of the limits of the constraining intervals. We classified the parameter estimates in five classes based on the loglikelihood profiles and the standard errors of estimated parameter. The first two classes contained SWCCs with estimated parameters equal to the upper limits (α = 100, n = 7). For these SWCCs no standard error for the estimated parameters could be specified (classes denoted as ‘upper limit for α’ and ‘upper limit for n’). The third and fourth classes had flat likelihood profiles such that no quantiles of 75% probability or higher could be determined (classes denoted as ‘flat upper profile for α’ and ‘flat upper profile for n’). The fifth class or final class of the SWCCs was assigned to ‘good quality estimate’. After fitting the vG model to the SWCCs, standard deviations of the modelling errors, say \({\widehat{\sigma }}_{\theta }\) were computed and and SWCCs with \({\widehat{\sigma }}_{\theta }\) larger than 0.1 m3/m3 water content are discarded.

Data Records

The database consists of 54 variables and 136,989 records with water content measurements at given matric potential (and complementary information) recorded for 15,259 SWCCs from 2,702 locations. A list of the variables of the database along with their units is given in Table 2. The global distribution of SWCCs is shown in Fig. 1, and the source of each dataset is given in Table 1. Table 3 shows the total number of SWCCs with the corresponding number N of θ-ψ data pairs from the various sources. The SWCCs with more than 5 data pairs dominate the database with 10,117 SWCCs followed by 3,784 and 1,359 SWCCs with 5 and 4 data pairs, respectively. The dataset with highest and lowest number of SWCCs were the Florida19 and the Swiss36 datasets, with 6,008 and 111 SWCCs, respectively. The ETH literature dataset designates the SWCC data that we collected by our own literature search. The database and readme file is uplodeded on the Zenodo platform and can be accessed using this link https://doi.org/10.5281/zenodo.554733837.

Fig. 1
figure 1

Spatial distribution of SWCCs in the GSHP database. A total of 2,702 locations are shown on the map. The locations are grouped into four classes: YW and NW stand for SWCCs with and without wet-end information (water content measured for matric potential ≤ 0.2 m), respectively, while YD and ND stand for SWCCs with and without dry-end information (water content measured at matric potential ≥ 150 m), respectively.

Table 3 Number of SWCCs with 4, 5, and more than 5 data pairs (matric potential, water content) per sample along with a total number of SWCCs.

Technical Validation

Quality of PTFs for θ s and θ r

Two PTFs for θs were developed for SWCCs without wet-end information (Eqs. 56). For 30% of all SWCCs, wet-end information was missing and had to be inferred from PTFs.

$${\theta }_{s}=a+b\cdot BD+c\cdot Clay+d\cdot Sand$$
(5)
$${\theta }_{s}=a+b\cdot BD$$
(6)

where θs (m3/m3) is the saturated water content, BD the bulk density (g/cm3) and Sand and Clay fractions are given in %. Note that we differentiated between tropical and other climatic regions. Only the values of the intercepts differ for tropical climate; coefficients b, c, and d are the same for all climatic regions. The coefficients are provided in Table 4. Moreover, the variance-covariance matrices and residual standard errors are provided for both models (Eqs. 5 and 6) in the supplementary document (Tables S4 and S5). The model was calibrated with 1,947 SWCCs, and 20-fold cross-validation was applied to validate the model. The results of 20-fold cross-validation for the model with bulk density and soil texture and differentiating between tropical and other climate regions are shown in Fig. 2a. Likewise, Fig. 2b shows the results for the model using only bulk density and climatic region. Moreover, the results for θr were validated by comparing the predicted water content at 150 m matric potential using the Tuller and Or model with the measured water content at 150 m as shown in Fig. 2c.

Table 4 Coefficients of linear regression PTFs for saturated water content θs for tropical and other climates.
Fig. 2
figure 2

Performance of linear regression PTF to estimate θs and of the Tuller and Or29 model to estimate θr for SWCCs without wet- and dry-end information, respectively. (a) Relationship between measured values and cross-validation predictions of θs (measured θs are the values deduced from fitting the vG model to measured SWCCs). The solid black line is the 1:1 line, and the blue dashed line is the LOWESS (locally weighted scatter plot smoothing) curve. Cross-validation resulted in R2 = 0.645 and RMSE = 0.061 m3/m3 with BIAS = −0.009 m3/m3. (b) Performance of PTF that uses only bulk density to estimate θs for SWCCs without wet-end information measurements. Cross-validation resulted in R2 = 0.611 and RMSE = 0.066 m3/m3 with BIAS = −0.008 m3/m3. (c) Relation between measured and predicted water content at 150 m matric potential. Quantitative validation yielded the R2 = 0.752, RMSE = 0.053 m3/m3 with BIAS = −0.002 m3/m3.

Spatial coverage and auxiliary soil properties

The GSHP database contains SWCCs from all USDA soil textural classes (see Fig. 3a). Concerning the geographical distribution of the data, most of the SWCCs are from North America followed by Europe, Africa, Asia, South America, Australia/Oceania as shown in Table S6. 10,048 SWCCs belong to the temperate region, and 2,344, 1,422,1,411, and 34 SWCCs to boreal, tropical, arid, and polar regions, respectively (Fig. 3c). Regarding the measurement method of SWCCs, 99% of the SWCCs were measured in the laboratory and only 1% stem from the field (most field SWCCs are from the UNSODA dataset). Note that different laboratory methods were used to estimate SWCCs. Most commonly, pressure plate and sandbox apparatus for wet-end and pressure chamber for the dry-end measurements were used.

Fig. 3
figure 3

Overview of SWCC data. (a) Distribution of soil textures of samples in the GSHP database on the USDA soil texture triangle. (b) Venn diagram illustrating the number of SWCCs in the GSHP database for which bulk density, soil texture, and soil organic carbon data were also available. (c) Number of SWCCs per climatic regions.

Along with SWCCs, in GSHP we also collected information on soil texture, bulk density, organic carbon, porosity, and saturated hydraulic conductivity (Ksat). Out of 15,259 SWCCs, 12,233, 15,125, 2,255, 2,754 SWCCs have information on soil texture, bulk density, organic carbon, porosity, respectively. In addition, 12,193 SWCCs have both soil texture and bulk density information, and 2,159 SWCCs have information on soil texture, bulk density, and organic carbon, as shown in Fig. 3b. Note that in addition to 12,233 soil texture values, 82 measurements have soil texture information with a total (sand + silt + clay) less than 98% or greater than 102%. We did not use these SWCCs in the development of the PTF for θs, but kept them in the database and assigned the value “Error” to the soil texture class variable. The database also contains Ksat data for 8,675 soil samples that allow the quantification of the unsaturated hydraulic conductivity function using the Mualem-van Genuchten parameterization (see Figure S3). Regarding the number of soil profiles per location, in the GSHP database, 214 locations are without profile information whereas 2,390, 60, 17, and 21 locations have 1, 2–5, 6–10, and >10 soil profiles, respectively. Similarly, 259, 1,418, 916, and 109 locations have 1, 2–5, 6–10, and >10 soil samples, respectively. Moreover, 32 locations are without soil depth information whereas 287, 1,447, 876, 60 locations have 1, 2–5, 6–10, and >10 soil depths, respectively as shown in Table S7.

After the quality check of SWCCs, 10,373 SWCCs have been assigned as most accurate with a 0–100 m location accuracy. 1,890 SWCCs lack location accuracy information. Furthermore, after calculating the likelihood-based confidence intervals, a total of 11,705 SWCCs out of 15,259 emerges as ‘good quality estimate’ whereas 1,857, 633, 947, and 117 were classified as ‘flat upper profile for α’, ‘flat upper profile for n’, ‘upper limit for α’, and ‘upper limit for n’, respectively.

Statistical characteristics of vG parameters

Table 5 reports the means and standard deviations of vG parameters and the number of SWCCs per soil textural class. The largest mean values of α were observed for clayey soils (classes clay and silty clay); loamy soils, and sandy soils showed smaller values of α. In contrast, the largest mean n was observed for sandy soils (classes loamy sand and sand). For the other textural classes, n ranged from 1.34 to 1.71. Similarly, highest and lowest θs and θr were observed for clayey and sandy soils, respectively. Regarding other soil properties, the largest Ksat values were obtained for sandy soils followed by clay soils, most likely due to the presence of macropores in these fine-textured soils18. Fig. 4 shows the distribution of vG parameters for combinations of aggregated soil texture classes and tropical and other climatic regions. In the database, there are 1,097 clayey, 5,034 loamy, and 4,768 sandy SWCCs from other climates and 423 clayey, 449 loamy, and 462 sandy SWCCs from tropical regions in the GSHP database.

Table 5 Means and standard deviations (format: mean, standard deviation) of basic soil properties, hydraulic conductivity, and vG SWCCs parameters per soil textural class.
Fig. 4
figure 4

Distribution of vG parameters using aggregated soil texture classes (sandy soils: sand and loamy sand; loamy soils: sandy loam, loam, silt loam, silt, silty clay loam, clay loam, and sandy clay loam; clayey soils: sandy clay, silty clay, and clay) for SWCCs from tropical and other climatic regions. Note that soil textures are estimated using the USDA-Natural Resources Conservation Service soil texture triangle. The numbers in panel (a) show the number of vG parameters for different aggregated soil texture classes used to make this plots.

Usage Notes

Challenges to determine vG parameters

The major problem to determine the vG parameters was missing measurements of water content close to saturation. Imposing the expected range of water content close to saturation had a large effect on the estimated vG parameters as shown with three illustrative examples in Figure S2 and Table S2. To scope with SWCCs that have not enough measured data pairs to cover the full matric potential and water content range, we imposed limits for the range of parameter values based on literature. Although we provided the PTF-based values of θs for SWCCs that lack the wet-end information, after estimating the likelihood-based confidence intervals, we noticed that some of the SWCCs are fitted with vG parameters equal to upper limits of box constraints used in the fitting process. For example, 15% SWCCs without measured wet-end information were fitted with α values equal to a maximum value (100 m−1). Moreover, only 37% (1,714 out of 4,598 SWCCs) of SWCCs without wet-end information were assigned to the class of ‘good quality estimate’. We conclude that the limited number of measured data pairs and water content range imposes a large uncertainty of vG parameter values. But due to the scarcity of more complete measurements in many regions of the world, we must use such SWCCs as well that allow some general characterization of the unsaturated soil properties.

Effect of climate on SWCCs

Soil hydraulic properties do not depend on soil texture alone but on soil structure as well, especially saturated hydraulic conductivity Ksat and the shape parameter α that is inversely related to the air entry value10,18. Soil-formation processes are particularly intense in tropical regions, it can be expected that SWCCs parameter values differ for tropical and other climatic regions38. For both clayey and loamy soils, we found larger α values for tropical regions. Similarly, Ottoni et al.15. also reported that soil hydraulic properties (vG parameters and Ksat) are different in the tropical compared to the temperate regions. Likewise, Gupta et al.18 illustrate that the PTFs developed using temperate Ksat measurements could not predict successfully tropical Ksat measurements. We hypothesize that this is due to the differences in the soil-forming processes that also determine the clay type and mineralogy. Tropical soils are often Oxisols rich inactive (non-swelling) clay minerals (kaolinite). In contrast to tropical soils, active (smectite) and moderately active clay minerals (illite) are the dominant clay minerals in other climate regions. These swelling clay minerals retain water within internal structures with strong capillary forces. Recently, Lehmann et al.25 revealed that the incorporation of clay-type informed PTFs could improve characterization of soil hydraulic and mechanical properties.

Limitations of GSHP database

Some limitations need to be taken into account when using the GSHP database, as shortly detailed here. The database still lacks data from some regions (mainly Canada, and northern and western Australia). The database, in addition to SWCCs with ≥5 data pairs, also includes 9% SWCCs with 4 data pairs only (see spatial distribution of θ-ψ data pairs in Figure S4). For these SWCCs, the vG parameters are not well estimated because the measurements do not allow to properly capture the shape of the SWCC. 94% of SWCCs with 4 data pairs belong to the class without wet-end information. For some samples, we extracted the coordinates using Google Earth, and 12.3% SWCCs have no location accuracy information, hence, we advise using the coordinates with caution. Some estimated parameters were equal to the limits of the higher range of vG parameters but this represents only a limited portion (1% SWCCs attained the n value = 7 and 6.2% SWCCs obtained α = 100 m−1) of the total database. In the end, 11,705 SWCCs emerge as the good quality estimate after passing the profile likelihood test. Therefore, we recommend using these SWCCs with confidence, and other SWCCs should be used with care.

Future applications of the GSHP database

The new GSHP database presents SWCC data from all continents and covers regions in Russia, which were so far not represented in currently available datasets and generally not included in PTF training and global mapping of soil hydraulic properties (note that there are still some gaps in the geographical representation of the data, especially data are lacking from Canada and northern and southern Australia). This global coverage is a great asset for the development of global PTFs for vG parameters. Remote sensing technology opens the doors to link measured hydraulic properties with global environmental data. For example, Gupta et al.39 linked Ksat with satellite-based maps of environmental covariates such as local information on vegetation, climate, and topography to generate a global map of Ksat with 1-km resolution. Likewise Chaney et al.40 developed the maps of soil hydraulic properties (POLARIS dataset) at 30 m resolution using remote sensing covariates for the United States using SSUGRO (Soil Survey Geographic) database. We recently used the GSHP dataset to generate highly resolved global maps of vG parameters. For that purpose, we applied machine learning approaches to relate vG parameters to soil and environmental covariates and predicted vG parameter values for all locations as a function of remote sensing information41.