Background & Summary

Drylands, which comprise all areas with an aridity index (precipitation divided by potential evapotranspiration) lower than 0.65, collectively form the largest set of biomes on Earth1. In these water-limited ecosystems2,3, soil moisture is a key determinant of structure and functioning4,5,6. It largely drives the activity of vascular plants and soil organisms7, and it affects multiple hydrological processes, such as runoff, evaporation and transpiration from vegetation8, as well as biogeochemical cycles9. Consequently, soil moisture strongly influences essential ecosystem services provided by drylands, such as soil fertility and biomass/food production, which directly sustain the livelihoods of more than 1 billion people worldwide10.

Soil moisture is characterized by complex dynamics across a wide range of spatio-temporal scales11. An accurate characterization of these dynamics is therefore particularly helpful for data assimilation models, weather and flood forecasting, surface and subsurface hydrology studies, and drought monitoring at local and regional scales8,9,11,12,13. Moreover, it may widen our understanding of feedback mechanisms between different meteorological and hydrological components and of their interaction with ongoing climate change14. Climate models forecast average (median) warming of 3.2 °C to 3.7 °C for drylands by the late 21st century15. Together with associated changes in rainfall patterns, this warming may decrease soil moisture across drylands worldwide16,17. These projections are not, however, free from uncertainties18. Continuous and long-term (>10 yr) observations of soil moisture are particularly valuable for calibrating remote sensing products19 and parameterizing hydrological/ecosystem models12,13,20. They can also help reduce the uncertainty of forecasts of long-term, climate-driven changes in soil moisture and in other hydrological and vegetation attributes20. However, such soil moisture series are only available for a limited set of ecosystems and geographical areas19,21, and are particularly scarce in drylands.

In drylands, vegetation is typically organised in a two-phase mosaic composed of plant-covered patches interspersed in a matrix of open areas without perennial vascular plants22,23,24. Vegetated and open areas have contrasting water dynamics. Infiltration rates are typically higher beneath plant patches, which also lose less water via runoff and evaporation25,26,27,28. Open areas are, however, not devoid of life. They are commonly covered by biocrusts, communities dominated by mosses, lichens, fungi and cyanobacteria that live on the soil surface in drylands worldwide29. Both vascular plants and biocrusts are key modulators of the water cycle in drylands, because they affect processes such as infiltration, runoff and evapotranspiration30 that ultimately determine soil moisture content. Despite the hydrological importance of both vascular plants and biocrusts, no dataset characterizing long-term (>10 yr) temporal variations in soil moisture across plant- and biocrust-dominated areas (microsites) is currently available.

Here we introduce the MOISCRUST dataset, a 14-yr continuous dataset of surface soil moisture measurements from multiple microsites (vegetated and open areas with different degrees of biocrust development). The data were gathered at the Aranjuez Experimental Station, a semi-arid grassland in Central Spain where multiple studies on the ecology of biocrusts have been carried out27,31,32,33,34,35,36.


Study site

The Aranjuez Experimental Station is located in the centre of the Iberian Peninsula (40° 02′ N, 3° 32′ W; 590 m a.s.l., Fig. 1). The climate is Mediterranean semiarid, with average annual temperature and rainfall of 15 °C and 349 mm, respectively. Soils are classified as Gypsiric Leptosols37. Depending on the microsite considered (open areas, vegetation, and biocrusts)31, pH values range between 7.2 and 7.7, organic carbon content between 9 and 32 mg/g soil, and total nitrogen content between 0.8 and 4 mg/g soil. Soils have a silty loam texture, showing c. 64.5%, 63.7–64.1% and 61.3–63.7% of sand, c. 28.4%, 28.4–29.2% and 30.0–32.4% of silt and c. 7.1%, 6.7–7.9% and 6.3% of clay for open and biocrust-dominated areas, respectively. The vegetation is dominated by Stipa tenacissima L. (18% of total cover), Retama sphaerocarpa (L.) Boiss. and Helianthemum squamatum Pers. (6% of total cover for both shrubs)31. The open areas between vascular plant patches harbour a well-developed biocrust community covering ~34% of the soil surface. This community is dominated by lichens such as Diploschistes diacapsis (Ach.) Lumbsch, Squamarina lentigera (Weber) Poelt, Fulgensia subbracteata (Nyl.) Poelt, Toninia sedifolia (Scop.) Timdal, and Psora decipiens (Hedw.) Hoffm.33.

Fig. 1

Location (upper panels) and partial view (lower panel) of the study area in central Spain. Patches of Stipa tenacissima and Retama sphaerocarpa are surrounded by a well-developed biocrust (white patches occupying the space between plant individuals), dominated by species such as Diploschistes diacapsis, Fulgensia subbracteata and Psora decipiens. From Berdugo et al.7.


The reproducible workflow is available in the Supplementary Material as an interactive RStudio notebook (file moiscrust.Rmd). It is packaged with renv to facilitate reproducibility. This means that the R package versions originally used to run the notebook are already installed in the “renv” folder of the repository. The workflow comprises the following steps: (i) data loading and preparation, (ii) imputation of missing data, (iii) incorporation of weather data at daily resolution, and (iv) preparation of dataset formats (see Supplementary Material for more details).
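Step (iii) requires aggregating the sub-daily soil moisture readings to the daily resolution of the weather records. The Python sketch below illustrates one way to do this. It is not the R code in moiscrust.Rmd. The record layout (sensor id, timestamp, VWC, with None standing for NA) and the function names `daily_means` and `join_weather` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, date
from statistics import mean


def daily_means(records):
    """Average sub-daily VWC readings per (sensor, calendar day), skipping NA.

    Hypothetical layout: `records` yields (sensor_id, timestamp, vwc) tuples,
    where vwc is None for a missing reading.
    """
    buckets = defaultdict(list)
    for sensor_id, ts, vwc in records:
        if vwc is not None:
            buckets[(sensor_id, ts.date())].append(vwc)
    return {key: mean(values) for key, values in buckets.items()}


def join_weather(daily, weather):
    """Left-join daily weather (a dict keyed by date) onto daily soil moisture.

    Days without weather data get None, like an NA after a left join.
    """
    return {
        (sensor_id, day): {"vwc": vwc, "weather": weather.get(day)}
        for (sensor_id, day), vwc in daily.items()
    }


# Illustrative values only, not MOISCRUST data
records = [
    ("S1", datetime(2010, 5, 1, 0, 0), 0.25),
    ("S1", datetime(2010, 5, 1, 2, 0), 0.75),
    ("S1", datetime(2010, 5, 1, 4, 0), None),
    ("S1", datetime(2010, 5, 2, 0, 0), 0.5),
]
daily = daily_means(records)
joined = join_weather(daily, {date(2010, 5, 1): {"rain_mm": 3.2}})
```

Days with only a few valid readings still produce a mean here. Keeping the number of readings per day alongside each mean is a simple way to flag such days.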

Data acquisition

Soil moisture was measured in the five most common microsites at the study site (Fig. 2). These were Stipa tussocks (Stipa), Retama shrubs (Retama), and open areas devoid of perennial vegetation with very low (<5%, BSCl), medium (25–75%, BSCm) and high (>75%, BSCh) cover of biocrust-forming lichens. Stipa microsites were located on the north face of Stipa tussocks, within 10 cm of their base. They are characterized by shaded conditions and a biocrust community dominated by mosses (mainly Pleurochaete squarrosa and Tortula revolvens). Retama microsites occur beneath the canopy of R. sphaerocarpa shrubs, and are characterized by moderate shade and litter accumulation. All microsites were selected in flat areas to minimize water inputs from runoff, which could be a confounding factor in soil moisture measurements. Microsites were separated from one another by at least 2 m.

Fig. 2

Photographs of the different microsites used in the study. Stipa = Stipa tenacissima; Retama = Retama sphaerocarpa; BSCl = open areas devoid of perennial vegetation with very low (<5%) cover of biocrust-forming lichens; BSCm = open areas with medium (25%–75%) cover of biocrust-forming lichens; BSCh = open areas with high (>75%) cover of biocrust-forming lichens. From Berdugo et al.7.

We used soil moisture sensors (ECH2O EC-5, Decagon Devices Inc., Pullman, USA) to monitor soil moisture at sub-daily resolution. These sensors provide estimates of volumetric water content (VWC) with an accuracy of ±3%. The standard calibration equation was applied to the sensors in all microsites. Because soil textures are very similar among microsites, calibration errors are expected to be similar as well38,39,40,41,42,43,44. This approach has commonly been used in studies assessing soil moisture in drylands38,39,40,41,42, and it performs well in the type of soils found at our study site43,44. Three replicate sensors per microsite (total n = 15) were installed following a stratified random design in November 2006 (Fig. 3). The sensors were inserted vertically into the soil45, so that each probe recorded soil moisture from 0 to 5 cm depth. We did so for two main reasons. First, we were particularly interested in recording soil moisture in the topsoil (0–5 cm depth), which is the fraction of the soil profile most affected by plants and biocrusts (e.g.38,39,46,47,48). Second, installing the sensors horizontally would have caused substantial disturbance in a protected and very sensitive ecosystem, and we wanted to avoid this at all costs (biocrusts are very sensitive to trampling and other disturbances48,49,50). Such disturbance would also have affected other measurements being conducted in this experiment, such as soil respiration46.

The study area also had a meteorological station (Onset, Pocasset, MA, USA) that recorded daily temperature, precipitation and relative air humidity from 30th March 2007 to 16th December 2020. The accuracies of these measurements were ±0.2 °C, ±0.2 mm and ±3.5%, respectively. Solar radiation (W/m²) was also recorded daily during this period using a silicon pyranometer (Onset S-LIB-M003).

Fig. 3

Pictures of the EC-5 moisture sensors used in open areas devoid of perennial vegetation with very low (<5%, A) and high (>75%, B) cover of biocrust-forming lichens.

Soil moisture has been recorded at the five microsites described above since 17th November 2006. Three replicate soil moisture sensors were placed at each microsite. They recorded VWC (m³/m³) continuously: every 120 min from 17th November 2006 to 31st January 2017, and every 150 min from 1st February 2017 to 16th December 2020. The data presented in this Data Descriptor therefore form a spatio-temporally continuous soil moisture dataset from 2006 to 2020. The dataset captures the effects of both vegetation and biocrusts (with different degrees of cover) on soil moisture during this period.
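Because the logging interval changed on 1st February 2017, the nominal time grid of each sensor has two regimes. The Python sketch below builds that grid and lists the time steps that have no reading. This is one simple way to locate the data gaps discussed in the next section. It assumes that logging in the second period started exactly at the switch time. The real logger start times are not stated here, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def expected_timestamps(start, switch, end, before_min=120, after_min=150):
    """Nominal logging times for one sensor (a sketch).

    Readings every `before_min` minutes from `start` up to, but not
    including, `switch`; then every `after_min` minutes from `switch` up to
    and including `end`. Assumption: the second regime starts exactly at
    `switch`.
    """
    times = []
    t, step = start, timedelta(minutes=before_min)
    while t < switch:
        times.append(t)
        t += step
    t, step = switch, timedelta(minutes=after_min)
    while t <= end:
        times.append(t)
        t += step
    return times


def find_gaps(observed, expected):
    """Expected timestamps for which no reading was observed."""
    seen = set(observed)
    return [t for t in expected if t not in seen]


# Illustrative window around the change of logging interval
grid = expected_timestamps(
    datetime(2017, 1, 31, 20, 0), datetime(2017, 2, 1), datetime(2017, 2, 1, 5, 0)
)
gaps = find_gaps([datetime(2017, 1, 31, 20, 0), datetime(2017, 2, 1, 2, 30)], grid)
```

At 120 min, a sensor yields 12 readings per day. At 150 min it yields 9 or 10 readings per calendar day (1,440 / 150 = 9.6). This matters when averaging to daily values.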

Filling data gaps

MOISCRUST contains a total of 697,695 records over the study period, obtained from 15 soil moisture sensors. Of these, 380,583 (54.5% of the total records) are either missing or negative values. Missing values have diverse causes, including damaged sensors, sensors removed for maintenance, and exhausted batteries. Rabbits (Oryctolagus cuniculus) also caused malfunctions by gnawing the sensor wires; once we discovered this, we protected the wires with a plastic hose. In addition, the MOISCRUST database contains several negative values that fall within the margin of error of the sensors. These anomalous values are caused by imbalances in the standard calibration equation, and they were set to NA. Whenever an anomalous value was observed, we checked whether the sensor continued to measure correctly by comparing it with another trustworthy sensor. Measurements consistent with those of the reference sensor were then kept in the dataset, and anomalous measurements were discarded.
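The screening of negative readings and the share of missing records reported above can be sketched as follows. This is a minimal Python illustration, not the authors' R implementation. Representing NA as None and the function names are assumptions.

```python
def flag_negative(values):
    """Set negative VWC readings (physically impossible) to None, i.e. NA."""
    return [None if v is not None and v < 0 else v for v in values]


def percent_missing(values):
    """Percentage of entries that are NA (None)."""
    if not values:
        return 0.0
    return 100.0 * sum(v is None for v in values) / len(values)


readings = flag_negative([0.12, -0.01, None, 0.08])
print(readings)                   # [0.12, None, None, 0.08]
print(percent_missing(readings))  # 50.0

# The same arithmetic applied to the counts reported for MOISCRUST
print(round(100 * 380_583 / 697_695, 1))  # 54.5 (before imputation)
print(round(100 * 133_881 / 697_695, 1))  # 19.2 (after imputation)
```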

To fill the gaps in the MOISCRUST dataset, we proceeded as follows for each sensor y with a missing value at time t. We first identified the sensor x with valid data at t that meets three criteria: it belongs to the same type of microsite (if possible), it has the longest period of data in common with y, and it shows the highest correlation with y. We then estimated the missing value of y with a linear model y ~ x. To find the best candidate sensor (x) to estimate the missing data (y), we correlated all pairs of sensors and computed a selection score using the following equation:

$$S_{x}=\%vc_{x,y}+\left(R_{x,y}^{2}\cdot 100\right)+\begin{cases}100, & \text{if } \mathrm{microsite}_{x}=\mathrm{microsite}_{y}\\ 0, & \text{otherwise}\end{cases}$$

where $S_x$ is the selection score of the candidate sensor x; y is the sensor with a missing value to be estimated; x is the sensor used as a candidate predictor of the missing value in y; $\%vc_{x,y}$ is the percentage of common valid cases of sensors x and y; $R^2_{x,y}$ is the Pearson's R² of the common valid cases of sensors x and y; and $\mathrm{microsite}_x$ and $\mathrm{microsite}_y$ are the microsites of sensors x and y, respectively. During data imputation, the sensor with the highest selection score was used to estimate each missing value (see Supplementary Material for a detailed description and a worked example of this procedure).
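To illustrate how the selection score ranks candidates, the Python sketch below implements $S_x$. Sensors are stored as equal-length lists aligned in time, with None marking missing values. It assumes that %vc is computed relative to the length of the target series y. The authors' exact definition is given in the Supplementary Material. The function names and the toy data are hypothetical.

```python
def common_valid(x, y):
    """(x, y) pairs at time steps where both sensors have a value."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]


def r_squared(pairs):
    """Squared Pearson correlation of paired values (0.0 if undefined)."""
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def selection_score(x, y, microsite_x, microsite_y):
    """S_x = %vc + 100 * R^2 + 100 if both sensors share a microsite.

    Assumption: %vc is relative to the length of the target series y.
    """
    pairs = common_valid(x, y)
    pct_vc = 100.0 * len(pairs) / len(y)
    bonus = 100.0 if microsite_x == microsite_y else 0.0
    return pct_vc + 100.0 * r_squared(pairs) + bonus


def rank_candidates(target, sensors, microsites):
    """Candidate sensor ids ordered by decreasing S_x, plus the scores."""
    y = sensors[target]
    scores = {
        sid: selection_score(series, y, microsites[sid], microsites[target])
        for sid, series in sensors.items()
        if sid != target
    }
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy example: "a" tracks "y" closely, but "b" shares its microsite
sensors = {
    "y": [0.10, 0.20, None, 0.40],
    "a": [0.11, 0.21, 0.31, 0.41],
    "b": [0.40, 0.10, 0.30, 0.20],
}
microsites = {"y": "Stipa", "a": "Retama", "b": "Stipa"}
ranked, scores = rank_candidates("y", sensors, microsites)
print(ranked)  # ['b', 'a']
```

%vc and $R^2 \cdot 100$ each range from 0 to 100. The 100-point microsite bonus therefore usually lets a same-microsite candidate outrank a sensor from another microsite, which matches the "same type of microsite (if possible)" rule. In the toy example, b scores 200 and a scores about 175, so b is chosen despite its weaker correlation.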

To provide an indicator of imputation quality, the algorithm generates a new column named interpolation quality. In this column, observed values are marked with “1”, and imputed values contain the correlation coefficient of the model used to estimate them (see Supplementary Material for details). After this process, the number of missing values in the dataset was reduced to 133,881 records (19.2% of the total records). The imputation algorithm was implemented in R51 using the libraries ‘renv’52, ‘data.table’53, ‘janitor’54, ‘tidyverse’55, ‘kableExtra’56, ‘foreach’57, ‘doParallel’58, ‘readr’59, ‘writexl’60, ‘RSQLite’61, ‘zip’62, ‘knitr’63, and ‘DBI’64.
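The gap-filling step and the interpolation quality column can be sketched in Python as below. This is a simplified illustration, not the authors' R code, and it rests on three assumptions:

- `ranked` is a list of candidate sensor ids ordered by decreasing selection score $S_x$.
- Each model y ~ x is fitted by ordinary least squares on the common valid cases.
- The quality value stored for an imputed record is the Pearson correlation coefficient r of that model.

The function names are hypothetical.

```python
def fit_line(pairs):
    """OLS fit y = a + b*x on (x, y) pairs; returns (a, b, r) or None if degenerate."""
    n = len(pairs)
    if n < 3:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    b = sxy / sxx
    return my - b * mx, b, sxy / (sxx * syy) ** 0.5


def impute(target, ranked, sensors):
    """Fill NA values of sensors[target] from the best-ranked sensor with data at each time.

    Returns (filled, quality): quality is 1 for observed values, the model's r
    for imputed values, and None where no candidate could be used.
    """
    y = sensors[target]
    filled = list(y)
    quality = [1 if v is not None else None for v in y]
    models = {}  # one fitted y ~ x model per candidate sensor, fitted lazily
    for t, value in enumerate(y):
        if value is not None:
            continue
        for sid in ranked:
            x_t = sensors[sid][t]
            if x_t is None:
                continue  # the candidate must have data at time t
            if sid not in models:
                pairs = [(xv, yv) for xv, yv in zip(sensors[sid], y)
                         if xv is not None and yv is not None]
                models[sid] = fit_line(pairs)
            if models[sid] is None:
                continue
            a, b, r = models[sid]
            filled[t] = a + b * x_t
            quality[t] = r
            break
    return filled, quality


# Toy example: "b" has no data, so the missing value falls back to "a"
sensors = {
    "y": [0.10, 0.20, None, 0.40],
    "a": [0.05, 0.10, 0.15, 0.20],
    "b": [None, None, None, None],
}
filled, quality = impute("y", ["b", "a"], sensors)
```

Caching one model per candidate avoids refitting the same y ~ x regression for every missing time step.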

Data structure

The raw and interpolated data sets of soil moisture provide records and estimations of soil moisture from 17th November 2006 to 16th December 2020 in four different formats: plain text (csv), SQLite, R (.Rdata), and Excel (.xlsx).
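For readers working with the SQLite version, the Python sketch below uses the standard-library sqlite3 module to show how the interpolation quality column can separate observed from imputed records. The table name soil_moisture, its column names and the example rows are hypothetical and only illustrate the idea. Check the actual schema in the distributed files.

```python
import sqlite3


def mean_vwc_by_microsite(conn, observed_only=True):
    """Mean VWC per microsite; optionally keep only observed records (quality = 1).

    Hypothetical schema: soil_moisture(sensor, microsite, datetime, vwc,
    interpolation_quality).
    """
    sql = "SELECT microsite, AVG(vwc) FROM soil_moisture WHERE vwc IS NOT NULL"
    if observed_only:
        sql += " AND interpolation_quality = 1"
    sql += " GROUP BY microsite ORDER BY microsite"
    return dict(conn.execute(sql).fetchall())


# In-memory database with illustrative rows, not MOISCRUST data
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE soil_moisture (sensor TEXT, microsite TEXT, datetime TEXT, "
    "vwc REAL, interpolation_quality REAL)"
)
conn.executemany(
    "INSERT INTO soil_moisture VALUES (?, ?, ?, ?, ?)",
    [
        ("BSCh_1", "BSCh", "2010-05-01 00:00", 0.10, 1),
        ("BSCh_1", "BSCh", "2010-05-01 02:00", 0.20, 0.93),  # imputed value
        ("Stipa_1", "Stipa", "2010-05-01 00:00", 0.30, 1),
        ("Stipa_1", "Stipa", "2010-05-01 02:00", None, None),  # still missing
    ],
)
observed = mean_vwc_by_microsite(conn)
everything = mean_vwc_by_microsite(conn, observed_only=False)
```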

Data Records

Raw and imputed data (in the “data” and “database” folders, respectively) are freely available from Figshare65. Data files are accompanied by a metadata file with a brief description of the dataset. This dataset will be updated annually in Figshare to include new data. In addition, the repository contains the “renv” folder to facilitate reproducibility (see Methods and Supplementary Material). For a full description of this database, please see the Data Descriptor: Moreno, J., S. Asensio, M. Berdugo, B. Gozalo, V. Ochoa, D. S. Pescador, B. M. Benito & F. T. Maestre. 2022. Fourteen years of continuous soil moisture records from plant and biocrust-dominated microsites. Scientific Data.

Technical Validation

Soil moisture measurements from the EC-5 sensors were validated against independent measurements obtained with the Time Domain Reflectometry technique (TDR)66 on the same dates and in the same microsites. These measurements were conducted at the same depth (0–5 cm) using TDR probes as described in Castillo-Monroy et al. A total of 169 TDR measurements were used for this validation. They were gathered between 17th March 2009 and 25th October 2018 and span the whole range of soil moisture values observed at the study area. The results show a good linear fit between TDR and EC-5 measurements (adjusted R² = 0.722, β = 0.839, 95% CI [0.753, 0.924], Fig. 4). This suggests that the sensors properly measure soil moisture content and its temporal variation at the study area.
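The validation statistics can be reproduced in outline with an ordinary least-squares regression. The Python sketch below regresses EC-5 readings on TDR readings and reports the slope β, the adjusted R² and a 95% confidence interval for β. The regression direction is an assumption, because the text does not state which variable was the predictor. For simplicity, the sketch uses a normal quantile instead of Student's t, which gives a slightly narrower interval. The function name and the example values are hypothetical.

```python
from statistics import NormalDist


def validate(reference, sensor, level=0.95):
    """OLS of sensor readings on reference readings (assumed direction).

    Returns (slope, adjusted R^2, (lower, upper)); the slope interval uses a
    normal approximation rather than Student's t.
    """
    n = len(reference)
    if n != len(sensor) or n < 3:
        raise ValueError("need at least 3 paired measurements")
    mx = sum(reference) / n
    my = sum(sensor) / n
    sxx = sum((x - mx) ** 2 for x in reference)
    syy = sum((y - my) ** 2 for y in sensor)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reference, sensor))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(reference, sensor))
    r2 = 1 - sse / syy
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)  # one predictor
    se_slope = (sse / (n - 2) / sxx) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, adj_r2, (slope - z * se_slope, slope + z * se_slope)


tdr = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative values, not MOISCRUST data
ec5 = [1.1, 1.9, 3.2, 3.8, 5.0]
slope, adj_r2, ci = validate(tdr, ec5)
```

With n = 169 pairs, as in this validation, the normal and t quantiles differ only slightly.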

Fig. 4

Relationship between soil moisture obtained by EC-5 sensors and Time Domain Reflectometry (TDR) measurements at the same date and microsite during 2009–2018.

Usage Notes

Previous short-term versions of the MOISCRUST dataset have been used for two purposes. One was to model annual variations in soil respiration rates across vegetation- and biocrust-dominated microsites. The other was to assess how vegetation, biocrusts and abiotic factors modulate wetting and drying events7. This dataset is particularly well suited for long-term studies aimed at understanding spatio-temporal patterns of soil moisture in drylands67. It is also suited to analysing soil moisture–vegetation relationships (e.g. links between plant functional types and soil moisture68) and their feedbacks on the dynamics of dryland ecosystems69. It can further be used to evaluate how vascular plants and biocrusts determine soil water dynamics in drylands, to parameterize and calibrate hydrological models of these ecosystems, and to forecast their hydrological responses to ongoing climate change. Overall, MOISCRUST contributes to advancing our understanding of hydrological processes in drylands. It will therefore be of interest to both researchers and managers working in these important ecosystems.

When using data from the MOISCRUST dataset, please cite this publication. Both data and code are available under a Creative Commons Attribution 4.0 International Public License. Under this license, anyone may freely use and adapt the dataset, provided that the original source is credited, the original license is linked, and any changes to the data are indicated in subsequent use.