## Background & Summary

Evapotranspiration (ET) connects the surface water and energy budgets1. It is the second largest component of the terrestrial water balance after precipitation and is a source of feedback in the climate system2,3. Our ability to observe the return flow of moisture from the land to the atmosphere is limited by sparse in situ observations that are not generally representative of regional scales4,5,6. Remotely sensed ET products often represent the seasonality of ET similarly2,7. However, these products show large dissimilarities7, in particular when water is the limiting factor for ET (e.g. during drought)2,8. In the absence of snow, ET is the sum of three components: 1) evaporation from the soil surface (Esoil), 2) transpiration from vegetation ($${E}_{T}$$), and 3) evaporation of intercepted water from vegetation canopies (Ec). Partitioning of ET into these three components with models5,9,10 and remote sensing2 often reveals large disagreements. In this study, we apply the methodology developed by Small et al.11 to estimate soil evaporation using soil moisture drying rates observed by the Soil Moisture Active Passive (SMAP) satellite. This continental-scale gridded dataset is distinct from other datasets and has the potential to improve the representation of ET partitioning in hydrologic models and climate studies.

Ground-based observational techniques, for example the eddy covariance12 or Bowen ratio energy balance methods13,14, provide measurements of the total ET flux. However, these observations only provide estimates of Esoil when transpiration ($${E}_{T}$$) is zero, for example when vegetation experiences seasonal senescence. Other ground-based techniques can estimate Esoil directly, as with weighing lysimeters15,16, or indirectly, as with the heat pulse method17,18. However, such ground-based observations of Esoil are labor intensive, and thus cannot be applied at the regional scale or for long-term monitoring15,16,17,18,19.

Land surface models (LSMs) complement sparse ground-based monitoring of ET by producing spatially and temporally continuous estimates of total ET and its components. Yet simulated fluxes depend on imperfect model structure and parameters that are difficult to estimate, resulting in large differences in Esoil estimates from different LSMs4,5,11. Total ET simulated by LSMs in the Global Land Data Assimilation System (GLDAS20), North American Land Data Assimilation System phase 2 (NLDAS-221,22) and experimental NLDAS-Testbed has been evaluated through comparison with remotely sensed ET23,24 and networks of eddy covariance flux towers11,25, but there has been no similar effort to evaluate Esoil, $${E}_{T}$$ or Ec, as few datasets exist for this purpose4,5. Without observationally based estimates of how ET is partitioned into its component fluxes, it is not possible to improve the representation of Esoil, $${E}_{T}$$ or Ec in hydrologic models.

Remote sensing provides a promising tool for estimating latent heat flux and evaluating simulated ET. Remote sensing methods that estimate ET largely rely on thermal data as a key input to the evapotranspiration algorithm26,27,28,29. However, these algorithms provide only the total ET flux, not information about ET partitioning. Two exceptions are the Global Land Evaporation Amsterdam Model (GLEAM3,30) and the Priestley-Taylor Jet Propulsion Laboratory (PT-JPL31,32) products, which provide estimates of total ET and its components. These products use remotely sensed soil moisture to inform estimates of Esoil, but both GLEAM and PT-JPL depend strongly on models that indirectly estimate ET and its components rather than on direct measurements of evaporative flux (e.g. weighing lysimeters).

To address the above issues, we develop a new remote sensing-based dataset of Esoil over the conterminous United States (CONUS) from 2015–2019 that essentially uses SMAP as a giant lysimeter with a sensing scale equivalent to SMAP’s 9 km x 9 km footprint. This Evaporation-Soil Moisture Active Passive dataset (E-SMAP) is the first to use remotely sensed soil drying rates in a mass balance framework to estimate Esoil (Fig. 1), thus providing unique estimates of Esoil3,31. We extend the initial work of Small et al.11, which developed and evaluated E-SMAP at several in situ observation locations, to provide a continental-scale, 4-year, 9 km soil evaporation dataset. In this data descriptor, we first describe the calculation of Esoil and a data screening procedure, followed by an examination of the components of soil evaporation. Since there are no ‘true’ observations of continental-scale soil evaporation, the technical evaluation consists of comparisons between E-SMAP and another remote sensing Esoil product (GLEAM) as well as two LSM-based datasets from NLDAS-2.

## Methods

### Evaporation and the water balance of the surface soil layer

The procedure used to create E-SMAP follows the methodology described in Small et al.11. A brief summary is provided here, along with descriptions of alterations made to that approach. Esoil is estimated independently at each SMAP 9 km x 9 km grid cell via a water balance of the surface soil control volume (Fig. 1), where:

$${E}_{soil}=-\frac{d{\theta }_{s}}{dt}D-{q}_{bot}-{E}_{Ts}+I$$
(1)

$${\theta }_{s}$$ is volumetric soil moisture in the surface soil control volume (mm3 mm−3), D is the thickness of the control volume (mm), qbot (mm day−1) is the flux across the bottom boundary of the control volume, ETs (mm day−1) is surface transpiration, which is the fraction of total transpiration proportional to the fraction of roots within the top 50 mm surface soil layer, and I is infiltration (mm day−1). We define the thickness of the control volume, D, to be equivalent to the SMAP sensing depth (50 mm)33, noting that this sensing depth can vary through time with soil moisture34. We define qbot as positive when water moves from the control volume to deeper soil and negative when water moves from deeper soil to the control volume.

We use SMAP soil moisture time series to estimate Esoil following the assumption that Esoil is typically the largest flux in Eq. 1 excluding times when infiltration is actively occurring due to precipitation or snowmelt11. The observed $${\theta }_{s}$$ time series is used to calculate $$\frac{d{\theta }_{s}}{dt}$$ for intervals defined by successive SMAP overpasses35. The remaining terms on the right-hand side of Eq. 1 are estimated using a combination of auxiliary data and models described below.
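The bookkeeping in Eq. 1 can be sketched for a single overpass interval as follows. This is a minimal illustration only: the function and argument names are hypothetical (they are not part of the E-SMAP release) and the input values are invented for the example.

```python
def soil_evaporation(theta_start, theta_end, dt_days, q_bot, e_ts,
                     infiltration, depth_mm=50.0):
    """Eq. 1: E_soil = -(d theta_s / dt) * D - q_bot - E_Ts + I.

    theta_* are volumetric soil moisture values (mm3 mm-3) at two
    successive overpasses; all fluxes are in mm day-1. q_bot is positive
    when water leaves the control volume downward.
    """
    dtheta_dt = (theta_end - theta_start) / dt_days
    return -dtheta_dt * depth_mm - q_bot - e_ts + infiltration


# Hypothetical interval: the surface layer dries from 0.25 to 0.22 over
# 2 days, with a small upward bottom flux and no infiltration.
e_soil = soil_evaporation(0.25, 0.22, 2.0, q_bot=-0.1, e_ts=0.15,
                          infiltration=0.0)
print(round(e_soil, 3))
```

In this made-up case the drying term contributes 0.75 mm day−1, and the upward bottom flux and surface transpiration partly offset one another.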

### Precipitation screening

Following Small et al.11, Eq. 1 is not applied to SMAP overpass intervals with substantial precipitation, since we seek to minimize uncertainties in the partitioning of incoming precipitation between runoff, canopy interception, and infiltration. Therefore, ‘valid intervals’ are defined as successive SMAP overpasses with less than 2 mm of precipitation, while those with larger precipitation values are considered ‘not valid’11. This threshold was selected to reflect SMAP’s accuracy and sensing depth33,36, where 2 mm of infiltrated water in a 50 mm soil column yields a soil moisture change equal to SMAP’s reported uncertainty (0.04 mm3mm−3). After screening for precipitation, 66% of SMAP’s overpasses remain valid (Fig. 2a).
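The reasoning behind the 2 mm threshold, and the resulting validity mask, can be sketched as below. The helper names and the sample precipitation totals are assumptions made for illustration; only the 2 mm threshold, the 50 mm depth and the 0.04 mm3 mm−3 uncertainty come from the text.

```python
SENSING_DEPTH_MM = 50.0      # SMAP sensing depth used as control volume
PRECIP_THRESHOLD_MM = 2.0    # intervals at or above this are 'not valid'


def equivalent_moisture_change(water_mm, depth_mm=SENSING_DEPTH_MM):
    """Soil moisture change (mm3 mm-3) if water_mm infiltrates the layer."""
    return water_mm / depth_mm


def is_valid_interval(total_precip_mm, threshold_mm=PRECIP_THRESHOLD_MM):
    """An interval is valid when it received less than the threshold."""
    return total_precip_mm < threshold_mm


# 2 mm spread over 50 mm matches SMAP's reported 0.04 mm3 mm-3 uncertainty.
delta_theta = equivalent_moisture_change(PRECIP_THRESHOLD_MM)

# Hypothetical precipitation totals (mm) for five overpass intervals.
precip_by_interval = [0.0, 1.2, 5.4, 0.3, 2.0]
valid = [is_valid_interval(p) for p in precip_by_interval]
```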

### Bottom flux (qbot)

We use the Hydrus 1-D model37 to estimate qbot. Model inputs include soil properties that are defined using soil texture and top boundary conditions that are set to observed atmospheric boundary conditions (Table 1). The model solves the Richards equation for saturated and unsaturated conditions. Here, the modeled soil column depth was set to 1000 mm, discretized with 101 nodes evenly spaced 10 mm apart. Model simulations were initialized with a 4-year spin-up run (April 1, 2015 to March 31, 2019), and the outputs from March 31, 2019 of this spin-up were used to set initial soil moisture conditions in the Hydrus simulations used to calculate qbot for E-SMAP. The exchange of moisture below the 50 mm node represents the flux at the bottom boundary of the control volume, qbot. Small et al.11 quantified the uncertainty of qbot caused by soil parameter uncertainties to be less than 0.1 mm day−1 during valid intervals (<2 mm of total precipitation).

### Transpiration from the surface soil layer (ETs)

We compute transpiration from the surface soil control volume for each grid cell based on the calculation of total transpiration38. Using a modified version of the Penman-Monteith potential evapotranspiration (PET) equation39, potential transpiration is calculated accounting for the fraction of the land surface covered by vegetation based on the Enhanced Vegetation Index (EVI)40:

$$\lambda E=\frac{\left(s\times A\times {F}_{c}+\rho \times {C}_{p}\times \frac{{e}_{sat}-e}{{r}_{a}}\right)\times (1-{F}_{wet})}{s+\gamma \times \left(1+\frac{{r}_{s}}{{r}_{a}}\right)}$$
(2)

where λE is potential transpiration, s is the slope of the saturated water vapor pressure curve (Pa K−1), A is the net radiation (W m−2), $$\rho$$ is air density (kg m−3), Cp is the specific heat capacity of air (1005 J kg−1 K−1), esat − e is the vapor pressure deficit (Pa), ra is aerodynamic resistance (s m−1), $$\gamma$$ is the psychrometric constant (Pa K−1), and rs is surface resistance (s m−1). Fc is the fraction of total vegetation cover calculated as a function of EVI38 and Fwet is the relative surface wetness38. We then calculate ETs from λE by applying linear restrictions based on the fraction of total roots in the surface soil layer, following an exponential function for root density41, as well as the surface soil water stress, using observed soil moisture content from SMAP and soil properties42,43, in Eq. 3

$${E}_{Ts}=(\lambda E\times rf)\times {F}_{SM}$$
(3)

where rf is the fraction of roots in the top 50 mm of the surface soil column41 and FSM is the soil water stress, calculated following prior literature42,43 using Eq. 4

$${F}_{SM}=\frac{({\theta }_{i}-{\theta }_{w})}{({\theta }_{cap}-{\theta }_{w})}$$
(4)

where θi is soil moisture at timestep i, θw is the wilting point of the soil and θcap is the field capacity of the soil.

Input data sources for calculation of ETs can be found in Table 1.
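Eqs. 2–4 can be chained as in the following sketch. Several elements are assumptions rather than details stated in the text: the function names, the clipping of FSM to the range 0–1, the conversion from W m−2 to mm day−1 using a fixed latent heat of vaporization of 2.45 MJ kg−1, and all input values.

```python
LATENT_HEAT_J_PER_KG = 2.45e6   # assumed constant, not given in the text
SECONDS_PER_DAY = 86400.0


def potential_transpiration(s, A, Fc, rho, vpd, ra, rs, gamma, Fwet,
                            Cp=1005.0):
    """Eq. 2: potential transpiration as a latent heat flux (W m-2)."""
    numerator = (s * A * Fc + rho * Cp * vpd / ra) * (1.0 - Fwet)
    denominator = s + gamma * (1.0 + rs / ra)
    return numerator / denominator


def w_m2_to_mm_day(flux_w_m2):
    """Convert a latent heat flux to a water depth rate (assumed step)."""
    return flux_w_m2 * SECONDS_PER_DAY / LATENT_HEAT_J_PER_KG


def soil_water_stress(theta, theta_w, theta_cap):
    """Eq. 4, clipped to [0, 1] (the clipping is an assumption)."""
    f_sm = (theta - theta_w) / (theta_cap - theta_w)
    return min(1.0, max(0.0, f_sm))


def surface_transpiration(lambda_e, root_fraction, f_sm):
    """Eq. 3: E_Ts = (lambda E * rf) * F_SM."""
    return lambda_e * root_fraction * f_sm


# Hypothetical inputs in SI units.
lam_e = potential_transpiration(s=145.0, A=400.0, Fc=0.5, rho=1.2,
                                vpd=1000.0, ra=50.0, rs=100.0,
                                gamma=66.0, Fwet=0.1)
f_sm = soil_water_stress(theta=0.20, theta_w=0.10, theta_cap=0.30)
e_ts = surface_transpiration(w_m2_to_mm_day(lam_e), root_fraction=0.1,
                             f_sm=f_sm)
```

Keeping the unit conversion as its own helper makes it easy to replace the fixed latent heat with a temperature-dependent value.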

### Infiltration (I)

I is assumed to be equivalent to precipitation during valid intervals, and is therefore expected to be overestimated since canopy interception is not considered. We do not expect this error source to significantly impact Esoil calculated over intervals with little or no precipitation because overestimates in I will largely cancel out with overestimates in downwards qbot which are estimated from Hydrus 1-D simulations that receive the same precipitation. This assumption may result in underestimation of Esoil during periods when I is driven by other sources, such as snowmelt. However, these errors are expected to negligibly impact E-SMAP because SMAP already includes screening flags for regions and times with frozen soil and substantial snow coverage (snow fraction exceeding 5%)44.

### Data screening

Data are screened on the basis of precipitation (described above in the Precipitation screening section) as well as through SMAP quality flags. SMAP’s retrieval quality flag is used to screen data that are not of “recommended quality”44. Screening on the basis of SMAP’s quality flags removed nearly 40% of all SMAP grid cells in the study domain (a reduction from 118,531 to 72,105).

An additional constraint is the non-convergence of the Hydrus 1-D solver. A total of 9,450 grid cells did not converge in Hydrus 1-D with the originally chosen soil parameter sets. To overcome the non-convergence, soil parameters at these grid cells were altered in one of two ways: (1) parameters associated with the secondary soil classification at the grid cell were used or (2) if there was no secondary soil classification, the NLDAS-2 “other” soil classification was used. Altering soil parameters resulted in convergence of 8,699 grid cells, while the remaining 751 points (0.6% of the domain) were ultimately screened from the dataset. Altering soil parameters is expected to have minimal impacts on calculations of Esoil because the uncertainty in qbot associated with soil parameters is much smaller than the magnitude of Esoil11. Finally, intervals with negative Esoil or ETs estimates were considered physically unrealistic and were also screened, reducing the E-SMAP space-time domain by 31%. The two primary reasons for negative Esoil outputs from Eq. 1 are (i) negative biases in SMAP observed drying rates and (ii) underestimates in precipitation (e.g. under-catch errors). The implications of this screening procedure as a whole are presented in the Technical Evaluation section.
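The negativity screen and the grid-cell counts quoted above can be illustrated with a short sketch. The record layout and function name are hypothetical; the counts are taken from the text.

```python
def screen_negative(records):
    """Keep only intervals whose E_soil and E_Ts are both non-negative."""
    return [r for r in records if r["e_soil"] >= 0.0 and r["e_ts"] >= 0.0]


# Hypothetical interval estimates (mm day-1).
records = [
    {"e_soil": 0.7, "e_ts": 0.10},
    {"e_soil": -0.2, "e_ts": 0.10},   # negative E_soil: screened
    {"e_soil": 0.4, "e_ts": -0.05},   # negative E_Ts: screened
    {"e_soil": 0.0, "e_ts": 0.00},
]
kept = screen_negative(records)

# Consistency check on the counts reported in the text.
domain_cells = 118531
non_converged = 9450
recovered = 8699
still_failed = non_converged - recovered
failed_percent = 100.0 * still_failed / domain_cells
```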

### Statistical testing

Statistical significance of a Pearson correlation reported in the Technical Evaluation section is calculated from a right-tailed significance test in MATLAB (https://www.mathworks.com/help/stats/corr.html). Statistical significance of the differences between medians reported in the Technical Evaluation section is calculated from paired one-tailed Wilcoxon signed rank tests using the exactRankTests R library45.

## Data Records

A list of data sources used to build E-SMAP is included in Table 1. Each data source is remapped to SMAP’s 9 km EASE-Grid with the nearest neighbor approach. As part of the E-SMAP dataset, gridded estimates are posted for each component in Eq. 1 on SMAP’s 9 km EASE-Grid from April 2015 through March 2019 during SMAP’s valid intervals (Table 2). The spatial domain encompasses 25°N–50°N and 125°W–67°W, covering the entire CONUS. The dataset, archived on Mendeley in netCDF format, is intended to support model development efforts that focus on the partitioning of ET into its components, as well as climate case studies within the period of the data record (2015–2019) that require an independent representation of ET components. The dataset should be cited as: Abolafia-Rosenzweig, R., Badger, A., Small, E., Livneh, B. E-SMAP: Evaporation-Soil Moisture Active Passive. Mendeley https://doi.org/10.17632/ffw8zbdmpm.2 (2020)46.

E-SMAP is compared with one remote sensing-based and two LSM-based soil evaporation datasets in the “Technical Evaluation” (Table 3). The three evaluation datasets were remapped to SMAP’s 9 km EASE-Grid using bilinear interpolation from the CDO software47 prior to comparison with E-SMAP. No true ‘validation’ of E-SMAP was conducted because no continental-scale and spatially representative observations of Esoil exist. Thus, the technical evaluation examines similarities and differences of E-SMAP relative to widely used Esoil datasets rather than quantifying the accuracy of E-SMAP. A point scale evaluation of the E-SMAP methodology over 10 validation sites can be found in Small et al.11.

## Technical Evaluation

Kernel density estimators are used to show the overall tendencies of E-SMAP components in Fig. 3b–g. Esoil is largely explained by SMAP drying rates, $$-\frac{d{\theta }_{s}}{dt}D$$, and is modulated more modestly by the other fluxes in Eq. 1 that are estimated from auxiliary data and models (qbot, I, and ETs). On average, for most regions, qbot is upwards into the surface control volume and largely cancels out ETs. Additionally, qbot, I, and ETs are each approximately four to five times smaller than SMAP drying rates. As a result, the sum of qbot, I, and ETs is, on average, four times smaller than the drying rates observed by SMAP (Fig. 3).

The median ratio between SMAP drying rates and Esoil (Fig. 3a) is used to quantify the central tendency of the fraction of the Esoil signal attributable to SMAP drying rates. For example, in the Midwest, this fraction is 0.85, thus the summation of components estimated from ancillary data and tools (qbot, I, and ETs) composes 15% of the Esoil signal. E-SMAP relies on ancillary data and models more heavily where the ratio of SMAP drying to Esoil is substantially less than 1.0. For example, in the Northwest this ratio is approximately 0.77. There is a statistically significant correlation (p < 0.01; R2 = 0.91) between mean regional drying rates and the ratio of drying rates divided by Esoil, supporting the interpretation that where the SMAP drying rates are relatively large, qbot, I and ETs play smaller roles in the Esoil calculation. Overall, Fig. 3 supports that variability of Esoil in E-SMAP is primarily explained by SMAP drying rates, with contributions from other estimates ranging from 2% (Northeast) to 23% (Northwest).
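The relationship between the median drying-to-Esoil ratio and the share attributed to ancillary terms can be sketched as follows. The function name and the sample values are hypothetical; the example is constructed so that the median ratio equals the 0.85 quoted for the Midwest.

```python
from statistics import median


def ancillary_share(drying_rates, e_soil_values):
    """One minus the median ratio of SMAP drying, -(d theta_s/dt) * D,
    to E_soil. Inputs are paired interval values in mm day-1."""
    ratios = [d / e for d, e in zip(drying_rates, e_soil_values) if e > 0]
    return 1.0 - median(ratios)


# Hypothetical paired values chosen to give a median ratio of 0.85.
drying = [0.85, 0.68, 0.90]
e_soil_vals = [1.00, 0.80, 1.00]
share = ancillary_share(drying, e_soil_vals)
```

With a median ratio of 0.85, the ancillary terms (qbot, I, and ETs) account for the remaining 15% of the Esoil signal, matching the Midwest example.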

We seek to understand the implications of data screening on the magnitude of Esoil to evaluate how well the screened E-SMAP product represents climatological conditions. We compare a screened version of each evaluation product, matching the temporal sampling that results from E-SMAP’s screening, with corresponding temporally continuous estimates (Fig. 4). All evaluation datasets show that E-SMAP screening results in a statistically significant increase (p < 0.01) in the central tendency of mean monthly Esoil (Fig. 4) and Esoil/ET (not shown). Evaluation products’ Esoil averaged over valid E-SMAP intervals is larger than corresponding continuous estimates, on average, by 9%, 10% and 2%, while Esoil/ET is larger by 3%, 17% and 8% for GLEAM, Mosaic and Noah, respectively. Figure 4d shows that the interquartile range for the ratio of Esoil from screened time series relative to continuous time series is 1.05–1.12, 1.06–1.14, and 1.00–1.05 for GLEAM, Mosaic and Noah, respectively.

Screening based on negative E-SMAP Esoil results in higher monthly Esoil in all evaluation datasets, whereas precipitation screening results in higher Esoil in GLEAM and Mosaic but lower Esoil in Noah. Precipitation screening results from GLEAM and Mosaic contradict the hypothesis that Esoil is higher over rainy intervals. Therefore, these results may indicate that Noah more accurately represents Esoil relative to GLEAM and Mosaic. However, further analysis of this disagreement is outside the scope of this data descriptor. Regardless, the effect of precipitation screening in reducing Noah Esoil is outweighed by increases corresponding with negativity screening. In sum, all evaluation products show higher Esoil after following the E-SMAP screening procedure. Thus, on average, the E-SMAP product is expected to represent modestly, but significantly, higher monthly Esoil and Esoil/ET than temporally continuous estimates, notwithstanding the large spatial and temporal variability noted in Fig. 4. We therefore include temporally static, gridded scaling factors with the E-SMAP dataset, calculated as the ratio of the mean monthly continuous Esoil time series to the mean monthly screened time series from the evaluation datasets, that may be multiplied with E-SMAP’s final Esoil to estimate average temporally continuous Esoil over the 4-year E-SMAP period. Key to the application of these scaling factors is the assumption that Esoil estimated from Eq. 1 is affected by screening in a manner similar to the evaluation products.
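For a single grid cell, the scaling factor and its application can be sketched as below. The function name and the monthly values are hypothetical; the released dataset provides the gridded factors directly, so this only illustrates the ratio described above.

```python
def scaling_factor(continuous_monthly, screened_monthly):
    """Ratio of the mean continuous E_soil to the mean screened E_soil,
    both taken from an evaluation product at one grid cell (mm day-1)."""
    mean_continuous = sum(continuous_monthly) / len(continuous_monthly)
    mean_screened = sum(screened_monthly) / len(screened_monthly)
    return mean_continuous / mean_screened


# Hypothetical monthly means; screened values are 10% higher.
continuous = [0.40, 0.50, 0.60]
screened = [0.44, 0.55, 0.66]
factor = scaling_factor(continuous, screened)

# Applying the static factor to a hypothetical E-SMAP mean E_soil.
esmap_mean = 0.72
esmap_continuous_estimate = esmap_mean * factor
```

Because screening raises Esoil in the evaluation products, the factor is typically below 1 and lowers the E-SMAP average toward a temporally continuous estimate.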

Esoil from E-SMAP falls within the range of the evaluation products (Fig. 5). Comparing mean values of Esoil, E-SMAP is on average 0.72 mm day−1, which is larger than GLEAM (0.17 mm day−1) and Noah (0.5 mm day−1) but smaller than Mosaic (0.89 mm day−1). E-SMAP Esoil has a lower R2 with GLEAM, Mosaic and Noah (0.16, 0.13 and 0.15, respectively; not shown) than GLEAM has with the LSM evaluation datasets (R2 = 0.48 and 0.52 with Mosaic and Noah, respectively), which may reflect E-SMAP’s independence from these datasets. Reduced correlations are also partially attributable to the SMAP drying rates themselves, which are expected to be unbiased but contain random noise that may exceed the magnitude of Esoil in some cases32. This noise would yield a noisy Esoil estimate with reduced correlation relative to the evaluation datasets, but with more stable averages over seasonal or longer time periods. Overall, Esoil from E-SMAP is comparable with Esoil from the evaluation datasets, but caution should be exercised with individual data points because of the effect of random noise within SMAP drying rates.

## Usage Notes

Moisture flux estimates in the E-SMAP dataset represent the average flux over the valid SMAP interval and are reported at the mid-date of the respective interval. The E-SMAP dataset may be used to estimate soil evaporation over a time period of months or years. However, soil evaporation estimates at individual time steps should be used with caution because random, unbiased errors in the drying rates observed by the SMAP satellite introduce noise into shorter-interval estimates.
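One way to follow this guidance is to aggregate interval estimates to monthly means keyed by their mid-dates, as in the sketch below. The tuple layout, function names and values are hypothetical, and the unweighted mean is an assumption; weighting by interval length is an equally reasonable choice.

```python
from collections import defaultdict
from datetime import datetime


def mid_date(start, end):
    """Mid-point between two successive overpass times."""
    return start + (end - start) / 2


def monthly_mean_esoil(intervals):
    """Average interval E_soil (mm day-1) by the month of the mid-date.

    intervals: iterable of (start, end, e_soil) tuples for valid intervals.
    """
    buckets = defaultdict(list)
    for start, end, e_soil in intervals:
        mid = mid_date(start, end)
        buckets[(mid.year, mid.month)].append(e_soil)
    return {month: sum(vals) / len(vals) for month, vals in buckets.items()}


# Hypothetical valid intervals.
intervals = [
    (datetime(2016, 6, 1, 6), datetime(2016, 6, 3, 6), 0.8),
    (datetime(2016, 6, 10, 6), datetime(2016, 6, 11, 6), 0.6),
    (datetime(2016, 7, 2, 6), datetime(2016, 7, 4, 6), 1.0),
]
monthly = monthly_mean_esoil(intervals)
```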