Background & Summary

Surface soil moisture (SSM) is a key variable in water and energy exchange between land and atmosphere: it controls the partitioning of precipitation into runoff, evapotranspiration, and infiltration, as well as the partitioning of turbulent energy fluxes into latent and sensible heat1,2,3,4,5. Because SSM is a Global Climate Observing System “Essential Climate Variable”6,7, long and spatio-temporally consistent SSM datasets are necessary for many applications, including numerical weather prediction, drought and flood monitoring, agricultural yield assessment, and water resource management, as well as numerical modelling and studies of global climate change processes.

Microwave remote sensing, particularly L-band radiometry, provides a unique monitoring opportunity, with global coverage and high accuracy. Legacy and currently operational satellites and sensors, including SMAP and the Soil Moisture and Ocean Salinity mission (SMOS) working at L band (1.41 GHz), and AMSR-E/AMSR2, ASCAT, FY-3, and TMI working at C band (6.9 GHz) or X band (10 GHz), among others, provide SSM products covering more than thirty years. Those products vary in accuracy and resolution as determined by the mission specifications, calibration, and retrieval algorithms. The SMAP mission, launched by the National Aeronautics and Space Administration (NASA) in 20158, provided observations of both brightness temperature (TB) and backscatter at L band. SMAP has a dedicated subsystem to enable detection and mitigation of radio-frequency interference (RFI)8, to overcome L-band RFI problems encountered by SMOS9,10. Many validation and evaluation studies11,12,13 conducted at global and regional scales constrain errors in the SMAP SSM product to less than 0.04 m3/m3, which is useful for global monitoring, but longer time series are required for analyses of trends, decadal variability, and climatological studies. Meanwhile, the Japan Aerospace Exploration Agency’s (JAXA) AMSR2 (2012–present), combined with its predecessor AMSR-E (2002–2011), has provided a TB record covering 20 years with similar instrument attributes (frequencies, incidence angles, etc.), orbit settings, TB calibration, and processing. However, compared to the lower-frequency SMAP SSM products, the currently available SSM products retrieved from AMSR-E/2 TB are more prone to errors and biases from atmospheric, vegetation, and soil interactions.

The usual method to establish a longer remote sensing dataset is to combine multiple satellite products. Owe et al.14 built an SSM dataset from 1978 to 2007 by applying one retrieval model (the land parameter retrieval model, LPRM) to all the TB from available satellite microwave sensors working at C/X band. Liu et al.15,16 combined multiple soil moisture products covering 1978 to 2008, using a Cumulative Distribution Function (CDF) matching technique. This led to the Climate Change Initiative (CCI) program initiated by the European Space Agency (ESA), whose target for soil moisture is to produce a complete, consistent global SSM record based on active/passive products17. CCI SSM provides three datasets: an active dataset, a passive dataset, and a combined dataset. For the passive dataset, SSM products from multiple satellites are retrieved using the LPRM algorithm and then rescaled by CDF matching. In an ESA-funded study of SSM fusion, De Jeu et al.9,18,19,20 developed three fusion approaches to retrieve an SSM dataset using SMOS and AMSR-E. Yao et al.21 developed a dataset from SMOS and AMSR-E/AMSR2 using neural networks.

Of these long-term satellite SSM datasets, only the CCI dataset is public and accessible. While the CCI dataset is of sufficient length for climatological studies, (a) its dependence on numerous sensor designs, frequencies, and retrieval algorithms and (b) the SSM CDF-matching approach preclude the assessment of climatological metrics such as trends and inter-annual to decadal scale variability.

This research extends some of the benefits of the SMAP SSM product to the longer AMSR-E/AMSR2 TB dataset, using an artificial neural network (ANN) method to extract as many characteristics of the more robust L-band data as possible from the higher-frequency instruments. This approach allows us to transfer the merits of SMAP products back to AMSR2/AMSR-E TB and forward to future AMSR3 TB. Currently, we have created a long-term daily SSM product (named ‘NNsm’) at 36 km resolution from 2002 to 2019. Validation against in situ data from the USDA, Tibetan Plateau, OzNet, REMEDHUS and AMMA sites demonstrates that NNsm matches well with SMAP and in situ SSM, with accuracy similar to that of SMAP (~0.04 m3/m3). Our intent is for this dataset to provide opportunities for longer-term study of the water cycle into the L-band era.

Methods

The satellite brightness temperature (TB) data we use are: (1) AMSR-E TB in slow rotation mode (L1S), (2) AMSR-E and AMSR2 Level 3 standard TB. The AMSR-E/2 Level 3 daily TB at 0.25 degree resolution were obtained from the online dissemination service Globe Portal System (G-Portal, https://gportal.jaxa.jp/gpr/) of JAXA. The SMAP Level 3 passive daily SSM product (SMAPL3sm) was obtained from NASA National Snow and Ice Data Center Distributed Active Archive Center (NSIDC DAAC, https://nsidc.org/data/SPL3SMP/versions/6), with a 36 km resolution at cylindrical Equal Area Scalable Earth Grid, Version 2.0 (EASE-Grid 2.0), with size of 406 rows × 964 columns. Table 1 summarizes the data used, their spatial resolutions, and provenance.

Table 1 Details of data used in the study.

Our research approach adopts the soil moisture retrieval algorithm developed by Yao et al.21 using artificial neural networks (ANN). The approach is divided into three components: calibration, training, and simulation/validation, as shown in Fig. 1. An ANN is a nonlinear mathematical computing system capable of representing arbitrarily complex nonlinear processes that relate the inputs and outputs of any system22. The structure of an ANN model includes an input layer, a hidden layer, and an output layer. We use MATLAB to implement the ANN training and simulation.

Fig. 1

Data Processing for the NNsm dataset.

AMSR-E/AMSR2 data inter-calibration

Firstly, to ensure consistency of data from two sensors onboard different satellites, the TB of AMSR-E are calibrated to the TB of AMSR2 for each grid cell.

The common approaches for inter-calibration23 are inappropriate for AMSR-E and AMSR2 data due to a temporal gap of 9 months between them. Previous studies on inter-calibration of AMSR-E and AMSR2 are based on the double difference (DD) method, which takes a third sensor as an intermediate reference, such as the microwave radiometer imager (MWRI) onboard the FengYun-3B (FY3B) satellite24,25,26. AMSR-E operations were halted due to rotational issues in 2011 and then restarted at a slow rotation rate from 2012 to 2015 (L1S TB), which is useful for cross-calibrating AMSR-E with other radiometers. Hu et al.27 conducted direct global calibration for AMSR-E and AMSR2 using AMSR-E L1S TB data.

The AMSR-E L1S data has 3 years of overlapping observations with AMSR2, based on which we developed a linear regression model for each grid cell to calibrate the AMSR-E TB to AMSR2 TB, for each frequency and each polarization:

$$T{B}_{amsr2}=a\ast T{B}_{amsre}+b$$
(1)

Inputting the AMSR-E TB (2002–2011) into Eq. (1) yields the calibrated AMSR-E TB. Combined with AMSR2 TB (2012–2019), we get harmonized TB time series for AMSR-E and AMSR2 from 2002 to 2019.
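As a minimal sketch of how Eq. (1) could be fitted and applied for a single grid cell, frequency, and polarization, the following uses ordinary least squares in plain Python. The function names and the sample values are hypothetical and chosen only for illustration; they are not taken from the authors' processing code, which the text does not show.

```python
from math import sqrt

def fit_calibration(tb_amsre, tb_amsr2):
    """Least-squares fit of TB_amsr2 = a * TB_amsre + b for one grid cell.

    Returns (a, b, r), where r is the Pearson correlation coefficient
    of the matched pairs.
    """
    n = len(tb_amsre)
    if n != len(tb_amsr2) or n < 2:
        raise ValueError("need at least two matched pairs of equal length")
    mean_x = sum(tb_amsre) / n
    mean_y = sum(tb_amsr2) / n
    sxx = sum((x - mean_x) ** 2 for x in tb_amsre)
    syy = sum((y - mean_y) ** 2 for y in tb_amsr2)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(tb_amsre, tb_amsr2))
    if sxx == 0 or syy == 0:
        raise ValueError("matched pairs have zero variance")
    a = sxy / sxx
    b = mean_y - a * mean_x
    r = sxy / sqrt(sxx * syy)
    return a, b, r

def apply_calibration(tb_amsre, a, b):
    """Apply Eq. (1) to a sequence of AMSR-E TB values (K)."""
    return [a * x + b for x in tb_amsre]

# Hypothetical matched pairs (K); the AMSR2 values are synthetic and
# exactly linear in the L1S values, so the fit recovers a and b.
l1s = [250.0, 255.0, 260.0, 265.0, 270.0]
amsr2 = [0.97 * x + 8.0 for x in l1s]
a, b, r = fit_calibration(l1s, amsr2)
calibrated = apply_calibration([240.0, 280.0], a, b)
```

In practice the same fit would be repeated for every grid cell, frequency, and polarization, using the AMSR-E L1S and AMSR2 matched pairs from the overlap period.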

Pre-process of AMSR-E L1S TB

The global distribution of the slow-rotation L1S terrestrial data is shown for one day in Fig. 2(a). While the total observed area is sparse for any day, locally there are many observations with overlapping footprints. Figure 2(b) displays the centers of those overlapping footprints in one 0.25 degree grid cell at Little Washita. First, we resampled those observed points to 0.25 degree.

Fig. 2

(a) Global daily map of AMSR-E L1S terrestrial TB (K) on April 11, 2015 (b) overlapping L1S TB (K) footprints for one overpass in one 0.25 degree grid cell covering Little Washita (LW) on April 11, 2015.

Inter-calibration over each grid cell

We carry out the inter-calibration over each grid cell, taking the grid cell of the Little Washita Experimental Watershed (LW) in southwestern Oklahoma, USA, as an example. In Fig. 3(a), the blue circles are TB of L1S and the red dots are TB of AMSR2. Figure 3(b) shows the calibration equation for H polarization at C band for the grid cell. We obtain the linear relationship using the matched-pair data (N = 46), with correction slope a = 0.9713 and regression R = 0.9820.

Fig. 3

Inter-calibration relation for the Little Washita Experimental Watershed: (a) time series of AMSR-E L1S TB and AMSR2 TB from Dec. 2012 to Dec. 2015, (b) scatterplot of the two TB sets together with the regression equation, (c) the global distribution of the number of matched pairs for C band and H-pol, (d) the global distribution of the regression correlation coefficient R for AMSR-E L1S TB and AMSR2 TB inter-calibration from Dec. 2012 to Dec. 2015.

The number of matched pairs at C band for H polarization is shown globally in Fig. 3(c). The number of matched pairs per grid cell is greater than 40 at mid-latitudes, which ensures sufficient statistical power to determine the regression relationship. The regression R for each grid cell is shown in Fig. 3(d). Here we only display the grid cells where R > 0.85; 99.92% of locations show statistically significant inter-calibration relationships (P = 0.05).

General inter-calibration equations

In the equatorial zone and around the land-sea interface we find some pixels with regression R less than 0.85. We deem this calibration approach for these grid cells to be uncertain. Those areas are shown in white in Fig. 3(d). To still provide an estimated cross-sensor inter-calibration for these grid cells with fewer matched-pairs, we use a general inter-calibration relation for each frequency and each polarization estimated using all global matched-pairs for the 2012–2015 period. The regression coefficients for these general relationships are shown in Table 2. Users investigating hydroclimate regime change or decadal shifts in mean state at the single-pixel level, however, should be aware of the methodological difference for these locations. A mask marking pixels using the global inter-calibration model is given with the dataset file.

Table 2 Global inter-calibration parameters for Eq. 1. Used for locations with unstable local inter-calibration. Location mask given in file Inter_cal_mask.nc.
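The fallback logic described above can be sketched as a small selection function. This is an illustration under assumptions: the function name, the tuple layout, the intercept in the usage example, and the treatment of a correlation exactly equal to 0.85 are hypothetical. The global coefficients shown are placeholders, not the Table 2 values.

```python
R_THRESHOLD = 0.85  # local fits with R below this are treated as uncertain

def choose_coefficients(local_fit, global_fit, r_threshold=R_THRESHOLD):
    """Pick the Eq. (1) coefficients for one grid cell.

    local_fit  : (a, b, r) from the per-cell regression, or None if no
                 local fit is available.
    global_fit : (a, b) general relation for the same frequency and
                 polarization.
    Returns ((a, b), used_global); used_global is the flag one would
    store in a mask such as the one shipped with the dataset.
    """
    if local_fit is not None:
        a, b, r = local_fit
        if r >= r_threshold:
            return (a, b), False
    return global_fit, True

# Placeholder global relation (NOT the Table 2 values).
GLOBAL_FIT = (1.0, 0.0)

# Slope and R from the Little Washita example; the intercept is made up.
coeffs, used_global = choose_coefficients((0.9713, 7.0, 0.9820), GLOBAL_FIT)
```

Returning the flag together with the coefficients keeps the mask consistent with the coefficients that were actually applied.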

ANN Training

In the ANN training period from March 2015 to December 2017, the input layer is comprised of reflectivity (R) and the microwave vegetation index (MVI)28 derived from AMSR2 TB; the output layer (training target) is the SMAPL3sm21. Ten neurons are used in our net and the training function is the default ‘trainlm’ function.

Data pre-processing

The descending SMAPL3sm data (at 06:00 local time) are selected in this study, corresponding to the AMSR-E/2 descending nighttime TB data (at 01:30 local time). The AMSR-E/2 Level 3 daily descending TB data at 0.25 degree resolution are resampled to the 36 km EASE-Grid 2.0 using linear interpolation. The reflectivity R is derived from the estimated surface temperature (Ts)29 and TB, and the MVI is derived from AMSR-E/2 TB at C band and X band:

$${{\rm{R}}}_{f}^{p}=1-\frac{T{B}_{f}^{p}}{{T}_{s}}$$
(2)
$${T}_{s}=1.11\times T{B}_{v}^{36.5}-15.2$$
(3)
$${\rm{MVI}}\left({{\rm{f}}}_{1},{{\rm{f}}}_{2}\right)=\frac{{{\rm{TB}}}_{v}\left({{\rm{f}}}_{2}\right)-{{\rm{TB}}}_{h}\left({{\rm{f}}}_{2}\right)}{{{\rm{TB}}}_{v}\left({{\rm{f}}}_{1}\right)-{{\rm{TB}}}_{h}\left({{\rm{f}}}_{1}\right)}$$
(4)

where f is the frequency, p is the polarization, v is vertical polarization and h is horizontal polarization.
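Eqs. (2) to (4) translate directly into code. The sketch below is illustrative only: the function names and TB values are hypothetical, and the assignment of C band to f1 and X band to f2 is an assumption, since the text does not state which band is which.

```python
def surface_temperature(tb_v_36):
    """Eq. (3): surface temperature (K) from 36.5 GHz V-pol TB (K)."""
    return 1.11 * tb_v_36 - 15.2

def reflectivity(tb, ts):
    """Eq. (2): reflectivity for one frequency and polarization."""
    return 1.0 - tb / ts

def mvi(tb_v_f1, tb_h_f1, tb_v_f2, tb_h_f2):
    """Eq. (4): ratio of the V-H polarization differences at f2 and f1.

    Returns None when the f1 polarization difference is zero.
    """
    denom = tb_v_f1 - tb_h_f1
    if denom == 0:
        return None
    return (tb_v_f2 - tb_h_f2) / denom

# Hypothetical TB values (K) for one grid cell and day.
ts = surface_temperature(270.0)   # about 284.5 K
refl_c_h = reflectivity(250.0, ts)
# Assumption: f1 is C band and f2 is X band.
mvi_cx = mvi(265.0, 245.0, 268.0, 252.0)
```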

ANN Training for each grid cell

To determine the local relationships between AMSR2-derived R/MVI and SMAPL3sm, we trained ANNs for every EASE-Grid 2.0 grid cell. Our ANN network was designed to minimize Mean Squared Error (MSE) of SMAPL3sm, using a random internal assignment of data into training/validation/testing categories (these assignments are distinct from our external data division and were applied only to our 2015–2017 testing data), with Levenberg-Marquardt (L-M) optimization used for back propagation.

During the training period from 2015 to 2017, the target SMAPL3sm and the inputs (R and MVI) were matched grid cell by grid cell. We removed cells with fewer than 50 matching pairs (N) of AMSR2-SMAP observations. We used the Ts derived from the AMSR2 TB as the freeze/thaw criterion and removed the data for frozen states. The ANNs were then trained grid cell by grid cell over the globe with the matching data.
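The per-cell data preparation described above (freeze screening, a minimum number of matched pairs, and a random internal division) can be sketched as follows. This is not the authors' MATLAB code. The function name is hypothetical, the 70/15/15 split ratios are an assumption (the source does not state them), and counting the pairs after removing frozen samples is also an assumption about the order of the two screening steps.

```python
import random

FREEZING_K = 273.15   # freeze/thaw threshold on Ts given in the text
MIN_PAIRS = 50        # cells with fewer matched pairs are dropped

def prepare_cell_samples(samples, seed=0, ratios=(0.70, 0.15, 0.15)):
    """Screen and randomly divide the matched pairs of one grid cell.

    samples : list of (inputs, target_ssm, ts_kelvin) tuples, where
              inputs holds the R and MVI values for that day.
    Returns a dict with 'train', 'val' and 'test' lists of
    (inputs, target_ssm) pairs, or None if the cell has too few pairs.
    """
    # Drop frozen states using the TB-derived surface temperature.
    thawed = [(x, y) for x, y, ts in samples if ts >= FREEZING_K]
    if len(thawed) < MIN_PAIRS:
        return None
    rng = random.Random(seed)   # seeded for reproducibility
    rng.shuffle(thawed)
    n_train = int(ratios[0] * len(thawed))
    n_val = int(ratios[1] * len(thawed))
    return {
        "train": thawed[:n_train],
        "val": thawed[n_train:n_train + n_val],
        "test": thawed[n_train + n_val:],
    }
```

Each retained cell would then get its own small network (ten hidden neurons in the study) fitted to the training subset.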

NNsm simulation for each grid cell

In the simulation period from 2002 to 2019, the input R/MVI were derived from consistent AMSR-E/2 TB, as described in data pre-processing. Over frozen soil, soil moisture is not retrieved, thus we did not simulate SSM for grid cells when Ts was lower than 273.15 K. We do not mask values based on surface water or vegetation water content, and highlight that the target (SMAP) values are less reliable for high values of each, meaning that these results could be screened with the same methods as used for the SMAP data itself. With the pre-processed input and the global ANNs model for each grid cell trained in the previous step, we derived the daily soil moisture from 2002 to 2019 on each grid.

Data Records

The data records30 contain global daily soil moisture data with a spatial resolution of 36 km, in m3/m3, from June 2002 to December 2019. These data are stored in NetCDF format with one file per day, defined by two dimensions (lat and lon, representing latitude and longitude, respectively) and one variable (soil_moisture). The file name is “yyyyddd.nc”, where “yyyy” stands for the year and “ddd” for the day of the year (referred to here as the Julian date). For example, “2003001.nc” contains the global soil moisture distribution on the first day of 2003.

Naming convention:

yyyyddd.nc

Variable: soil_moisture=volumetric soil moisture [m3/m3]
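For users who need to convert between calendar dates and file names, the naming convention can be handled with the Python standard library. The helper names below are hypothetical; only the “yyyyddd.nc” pattern comes from the dataset description.

```python
from datetime import date, datetime

def nnsm_filename(day):
    """Build the 'yyyyddd.nc' file name for a calendar date."""
    return f"{day.year:04d}{day.timetuple().tm_yday:03d}.nc"

def date_from_filename(name):
    """Recover the calendar date from a 'yyyyddd.nc' file name."""
    stem = name[:-3] if name.endswith(".nc") else name
    return datetime.strptime(stem, "%Y%j").date()

first_day_2003 = nnsm_filename(date(2003, 1, 1))    # "2003001.nc"
leap_day_end = date_from_filename("2004366.nc")     # 2004-12-31
```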

Technical Validation

Training and simulation performance

To confirm the reliability of the algorithm, we first present the performance of the ANN model during the training period from 2015 to 2017.

We evaluated the output of the trained ANN model, NNsm, against the target SMAPL3sm over the training period by analyzing the correlation coefficients and root-mean-squared errors between NNsm and SMAPL3sm for each grid cell. The accuracy of the predicted NNsm against SMAPL3sm is shown in Fig. 4.
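The two per-cell statistics named above can be computed as in the following sketch. It assumes that missing retrievals are stored as NaN and that only days with both values present are compared; the function name and sample values are hypothetical.

```python
from math import isnan, sqrt

def cc_and_rmse(estimate, reference):
    """Pearson correlation and RMSE over days where both series are valid."""
    pairs = [(e, r) for e, r in zip(estimate, reference)
             if not (isnan(e) or isnan(r))]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid pairs")
    mean_e = sum(e for e, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    see = sum((e - mean_e) ** 2 for e, _ in pairs)
    srr = sum((r - mean_r) ** 2 for _, r in pairs)
    ser = sum((e - mean_e) * (r - mean_r) for e, r in pairs)
    rmse = sqrt(sum((e - r) ** 2 for e, r in pairs) / n)
    # Correlation is undefined when either series is constant.
    cc = ser / sqrt(see * srr) if see > 0 and srr > 0 else float("nan")
    return cc, rmse

nan = float("nan")
# Hypothetical daily values (m3/m3) for one grid cell.
nnsm = [0.10, 0.20, 0.30, nan, 0.25]
smap = [0.12, 0.22, 0.32, 0.40, nan]
cc, rmse = cc_and_rmse(nnsm, smap)
```

In this toy series the three valid pairs differ by a constant 0.02 m3/m3, so the correlation is 1 while the RMSE equals the offset, which shows why the two metrics are reported together.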

Fig. 4

Global map of statistical comparison between NNsm and SMAPL3sm over the training period (2015–2017): (a) Correlation coefficient, (b) Root mean squared error (m3/m3), (c) Cumulative distribution functions of all of the daily data for NNsm and SMAPL3sm, (d) Global mask for regions with high vegetation water content, surface water, urban areas, and permanent ice/snow, (e) Masked correlation coefficients, (f) Masked RMSE, (g) Masked correlation coefficients (2018–2019), (h) Masked RMSE (2018–2019).

The trained NNsm values generally correlate well with the reference SMAPL3sm values over most of the globe, with a global mean of CC = 0.80, and a global mean of RMSE = 0.029 m3/m3. Globally, the CC values are high in regions with significant soil moisture dynamics. The RMSE values scale roughly with the local dynamic range of soil moisture and remain relatively small as a fraction of mean SSM values. Highly vegetated areas, including the Amazon and Congo rainforests, have correlations on the order of 0.5, presumably showing the AMSR-E/2 sensors’ diminished canopy penetration, as well as larger uncertainty in retrievals at all microwave frequencies for grid cells with high vegetation water content. Areas with commonly frozen ground also show reduced correlations and larger RMSEs. From the Cumulative Distribution Functions (CDF) of all of the daily data for the NNsm and SMAPL3sm in Fig. 4(c), we can see that the two CDF curves are very similar, with NNsm appearing to be a little drier than SMAPL3sm in the wettest conditions.

Figure 4(d) shows a global mask in black, where SMAP is expected to have errors less than 0.04 m3/m3. Surfaces with permanent ice and snow, urban areas, wetlands, and vegetated areas with vegetation water content >5 kg/m2 are excluded. The masked results of Fig. 4(a,b), removing the white areas from Fig. 4(d), are shown in Fig. 4(e,f). The spatial mean CC for Fig. 4(e) is 0.87 and the spatial mean RMSE for Fig. 4(f) is 0.022 m3/m3 as shown in Table 3.

Table 3 Global statistical comparison between NNsm and SMAPL3sm over the training period (2015–2017) and simulation period (2018–2019). Values are the spatial means of individual pixel values.

Data from 2018–2019 were used to validate the performance of the trained NNsm. As shown in Fig. 4(g,h), the correlation coefficients and RMSE between the trained NNsm and SMAPL3sm have a spatial pattern similar to that of 2015–2017, but accuracy is slightly lower than in the training period, with a spatial mean masked CC of 0.73 and a spatial mean masked RMSE of 0.033 m3/m3.

Validation using in situ observation

The performance of the simulated long-term NNsm was validated against in situ soil moisture observations for the period from 2002 to 2019. We use in situ soil moisture observations from 14 representative validation sites, including (a) 7 United States Department of Agriculture (USDA) watershed sites (Walnut Gulch, Little Washita, Fort Cobb, Little River, Saint Joseph’s, South Fork, and Reynolds Creek), (b) 2 Tibetan Plateau sites (Pali and Naqu), (c) 2 Australian Moisture Monitoring Network (OzNet) sites (Yanco and Kyeamba), (d) the REMEDHUS Network sites, and (e) 2 African Monsoon Multidisciplinary Analysis (AMMA) sites (Benin and Niger), as shown in Fig. 5 and Table 4. Data for the OzNet, REMEDHUS and AMMA sites are provided by the International Soil Moisture Network (ISMN) website (https://ismn.geo.tuwien.ac.at/)31,32,33,34.

Table 4 List of validation sites. Time series for bold-face sites are shown in Fig. 6.

These ground-based sites are major validation points for satellite soil moisture products, covering a wide variety of topography, land cover types and soil types around the world. These sites, which include dozens of instrumented stations each, are designed to provide reliable and representative soil moisture values for comparison against spatially-aggregated satellite footprints. Fig. 5 shows the location of these sites.

Fig. 5

Location of in situ soil moisture validation sites.

The performance of NNsm over the ground sites is shown as time series in Fig. 6 and as summary statistics in Table 5. For demonstration, we show time series for one site on each continent represented. The magnitude and variability of NNsm (blue dots) are consistent with those of SMAPsm (red dots) and in situ SM (Obs-sm, grey line), and NNsm captures both the daily and seasonal dynamics of in situ SM. The ANN performs stably for both the training period (2015–2017) and the broader simulation period (2002–2014 and 2018–2019), which can be seen both from the time series in Fig. 6 and from the statistics in Tables 5 and 6 (2002–2019, in the next section). Since the training target is SMAPL3sm, the performance of NNsm relative to in situ soil moisture (Obs-sm) is similar to that of SMAPsm over most sites, and NNsm can do no better than SMAP in conditions where satellite and in situ soil moisture diverge.

Fig. 6

Time series of NNsm (blue dots), SMAPL3sm (red dots), and in situ soil moisture (Obs-sm in grey lines) for 2002–2019. In situ sites shown: (a) USDA-Little Washita, (b) Tibet-Naqu, (c) OzNet-Yanco, (d) REMEDHUS, and (e) AMMA-CATCH-Benin sites.

Table 5 Statistical comparisons of NNsm and SMAPL3sm with the in situ soil moisture for the training period (2015–2017) and simulation period (2018–2019).
Table 6 Statistical comparisons of NNsm, JAXAsm and LPRMsm with the in situ soil moisture for 2002–2019.

Validation and comparison with AMSR-E/AMSR2 standard products

Moreover, to clarify the advantages of our algorithm and soil moisture products, we validated the performance of NNsm by comparing the simulated output with the satellite standard SSM products of AMSR-E/AMSR2 from JAXA and LPRM, JAXAsm and LPRMsm respectively, over the in situ sites. Results are shown in Fig. 7 and Table 6 (2002–2019). When calculating the statistical matrix, we use the intersection of the observational periods for all four datasets (NNsm, JAXAsm, LPRMsm, in situ), as shown in second column of Table 6. We also performed inter-comparisons separately over the AMSR-E period (2002–2011) and AMSR2 period (2012–2019) (see Supplementary Tables 1 and 2).

Fig. 7

Time series of NNsm (blue dots) against AMSRsm-JAXA (green dots), AMSRsm-LPRM (magenta dots), and the in situ soil moisture (Obs-sm in grey lines) for 2002–2019. In situ sites shown: (a) USDA-Little Washita, (b) Tibet-Naqu, (c) OzNet-Yanco, (d) REMEDHUS, and (e) AMMA-CATCH-Benin sites.

From the time series plots shown in Fig. 7 and the statistical comparison shown in Table 6, NNsm is generally consistent with in situ SSM, although it slightly underestimates or overestimates soil moisture at a few sites, with some scaling bias apparent (e.g., Fig. 7(e)). The JAXAsm and LPRMsm time series show biases as well: at most sites JAXAsm underestimates the in situ observations and LPRMsm overestimates them (Fig. 7(a–e), Table 6). LPRMsm also displays some large changes over time in variability and mean state (Fig. 7(a,c,e)) that appear to follow neither the in situ observations nor the main line of TB forcing that drives the JAXAsm and NNsm time series. Across the in situ validation sites, NNsm displays broadly lower biases, lower RMSEs and unbiased RMSEs (ubRMSE), and higher correlations with the in situ data than the JAXAsm and LPRMsm AMSR-E/2 soil moisture time series. This suggests that NNsm may be providing added value from interactions between the reflectivities and microwave vegetation indices used as inputs (Eqs. 2–4) beyond what is used in the JAXA and LPRM retrieval algorithms.

Comparison with SMOS and CCI soil moisture products

To quantify the utility of our algorithm and any benefits from creating an additional soil moisture product, we also compared NNsm with the satellite SMOS L3 SSM product (SMOSsm) (V300, https://www.catds.fr/) and the CCI soil moisture product (CCIsm) (V05.2, combined)35,36,37, in terms of both accuracy and data volume.

CCIsm merges AMSR-E, Windsat and ASCAT from 2002 to 2010, and also merges SMOS and SMAP after 2010 and 2015 respectively; NNsm is derived from AMSR-E/AMSR2 TB from 2002 to 2019; SMOSsm is available from 2010. For appropriate comparison across these data spans, we carried out the comparison in two periods: one from 2002 to 2009, and the second from 2010 to 2019.

First, we quantify the relative accuracies of the SSM products. For the two periods (2002–2009 and 2010–2019), joint density scatter plots are shown in Fig. 8 and summary statistics in Tables 7 and 8, respectively. We also show data product intercomparison scatter plots for each site in Figs. S1–2 of the Supplementary file. Results are all shown for the overlapping data period; for example, data from the Little Washita validation site in Fig. 8 and Table 7 span the overlapping data windows of the NNsm data, the CCIsm data, and the in situ data from 2007 to 2009.

Fig. 8

Joint density scatter plot of in situ soil moisture (x-axis) against (a) NNsm and (b) CCIsm for 2002–2009, and (c) NNsm, (d) SMOSsm and (e) CCIsm for 2010–2019 over all 14 validation sites.

Table 7 Statistical comparisons of NNsm, and CCIsm with in situ soil moisture for 2002–2009.
Table 8 Statistical comparisons of NNsm, SMOSsm and CCIsm with in situ soil moisture for the period 2010–2019.

2002 to 2009: CCIsm, NNsm, and in situ data

In general, as evident in Fig. 8(a,b) and Table 7, NNsm has substantially lower bias than CCIsm relative to the in situ observations, leading to a lower RMSE with roughly equivalent correlations. The ubRMSE is lower for CCIsm, suggesting that the CCIsm bias is a primary source of error in the data set. CCIsm tends to overestimate the in situ values, especially when soil is dry (SSM < ~0.2 m3/m3); the average bias across sites is 0.088 m3/m3. Scatter plots of CCIsm vs in situ observations by site also show some data processing artifacts (discretized SSM values evident in Supplementary Fig. 1.1, 1.7, and 1.12), which may introduce additional errors.

2010 to 2019: CCIsm, NNsm, SMOSsm, and in situ data

In the L-band era, as shown in Fig. 8(c,d,e) and Table 8, the three SSM products (NNsm, SMOSsm and CCIsm) have similar accuracies relative to the 14 in situ network sites. NNsm and SMOSsm have average biases of 0.021 m3/m3 and 0.013 m3/m3, and ubRMSEs of 0.051 m3/m3 and 0.071 m3/m3, respectively. Most NNsm and SMOSsm observations lie close to the one-to-one line, although both products have some overestimated outliers. CCIsm has the lowest ubRMSE, with an average value of 0.038 m3/m3, but again tends to overestimate the SM values at the in situ sites, particularly when soil is dry, with an average bias of 0.071 m3/m3.
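The contrast between bias and ubRMSE in these comparisons can be made explicit. Assuming the conventional definition of the unbiased RMSE (the text does not state its formula), the three metrics are related by

$${\rm{RMSE}}^{2}={\rm{bias}}^{2}+{\rm{ubRMSE}}^{2}$$

Purely as an illustration, treating the site-averaged values quoted above as if they belonged to a single site (they do not, so these are not values from Table 8):

$$\sqrt{0.071^{2}+0.038^{2}}\approx 0.081\ {\rm{m}}^{3}/{\rm{m}}^{3}\ ({\rm{CCIsm}}),\qquad \sqrt{0.021^{2}+0.051^{2}}\approx 0.055\ {\rm{m}}^{3}/{\rm{m}}^{3}\ ({\rm{NNsm}})$$

Under this reading, a product with the lowest ubRMSE can still have a larger total error when its bias dominates.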

Secondly, for the sake of application utility, we compare the data volumes of the three data sets. Example data volumes for the pre-L-band period (2002 to 2009) and the L-band period (2010 to 2019) are shown in Fig. 9(a–d) and Fig. 9(e–j), respectively. For the period from 2002 to 2009, we take 2003 as an example. As shown in Fig. 9(a), NNsm provides a global product in the boreal summer. In general, NNsm provides more than 200 soil moisture retrievals over each grid cell in the mid-latitudes and more than 100 in the high latitudes (Fig. 9(b)). CCIsm does not have soil moisture retrievals over much of the equatorial zone and most of Russia (Fig. 9(c)). In North America, Northern Europe, southeastern China, and the Tibetan Plateau, CCIsm has very few soil moisture retrievals, fewer than 50 per grid cell (Fig. 9(d)).

Fig. 9

Global map of daily soil moisture (20030701) and the number of soil moisture retrievals over each grid cell in one year (2003): (a) NNsm, (b) number of NNsm observations per grid cell, (c) CCIsm, and (d) number of CCIsm observations per grid cell. Global map of daily soil moisture (20100701) and the number of soil moisture retrievals over each grid cell in one year (2010): (e) NNsm, (f) number of NNsm observations per grid cell, (g) SMOSsm, (h) number of SMOSsm observations per grid cell, (i) CCIsm, and (j) number of CCIsm observations per grid cell.

For the 2010–2019 period, we highlight 2010 as an example. As shown in Fig. 9(e,g,i), the three products have a similar spatial pattern and dynamic range in boreal summer. NNsm typically provides more than 200 retrievals per year outside of regions with permanent snow and ice cover, with the Tibetan Plateau as a notable exception (Fig. 9(f)). SMOSsm has 150 soil moisture retrievals per year on average, with fewer retrievals in Asia, where it is significantly affected by RFI (Fig. 9(h)). CCIsm has a clear advantage in the number of retrievals after 2010, with particularly high data volumes in the subtropics, but it still has no retrievals over the equatorial zone (retrievals are masked for dense tropical vegetation) and fewer retrievals in cold regions, including most of Russia, North America, and the Tibetan Plateau (Fig. 9(j)).

Potential benefits and usages of this dataset

Complement to SMAP product

The NNsm dataset can be seamlessly merged with SMAP SSM at daily scale, providing greater spatial coverage and higher frequency observations, as well as serving as a complementary gap-filling product for SMAP SSM (e.g., when SMAP entered safe mode temporarily from June 20 to July 22 in 2019).

Figure 10 shows global maps of SMAPL3sm, NNsm, JAXAsm, LPRMsm, and a combined NNsm/SMAPL3sm. SMAP has limited daily spatial coverage and reaches global coverage only with a 3-day average, as shown in Fig. 10(a,b). NNsm derived from AMSR-E/2 has wider daily spatial coverage than SMAP (Fig. 10(c)). When NNsm is combined with SMAPsm for the same day (Fig. 10(d)), the combined map fills gaps in, for example, China, Eastern and Southern Europe, the United States, South America, Africa and Australia, and provides almost full global coverage with no obvious discrepancies or inconsistencies between the datasets. At global scale, JAXAsm (Fig. 10(e)) gives drier soil moisture estimates and LPRMsm (Fig. 10(g)) gives wetter ones. When either is merged with SMAPsm, the fused maps show obvious underestimation or overestimation, with noticeable striping in South America and East Asia (Fig. 10(f,h)).
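The same-day combination described above amounts to a cell-by-cell rule: keep the SMAP value where one exists and otherwise fall back to NNsm. The sketch below assumes that both maps are flattened to equal-length lists on the same 36 km grid and that missing retrievals are stored as NaN; the function name and values are hypothetical.

```python
from math import isnan

def gap_fill(primary, filler):
    """Keep primary values where valid; otherwise use the filler value."""
    if len(primary) != len(filler):
        raise ValueError("maps must be on the same grid")
    return [f if isnan(p) else p for p, f in zip(primary, filler)]

nan = float("nan")
# Hypothetical values (m3/m3) for four grid cells on one day.
smap_day = [0.21, nan, nan, 0.18]
nnsm_day = [0.20, 0.25, nan, nan]
combined = gap_fill(smap_day, nnsm_day)   # third cell stays missing
```

Because NNsm is trained against SMAPL3sm, this simple substitution does not need an extra rescaling step, which is the property the comparison with JAXAsm and LPRMsm illustrates.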

Fig. 10

Global map of SMAPL3sm, NNsm, JAXAsm, LPRMsm and gap-filled SM products for July 10, 2018 (m3/m3). (a) SMAPL3sm, (b) 3-day average of SMAPL3sm, (c) NNsm, (d) SMAPL3sm gap-filled with NNsm, (e) JAXAsm, (f) SMAPL3sm gap-filled with JAXAsm, (g) LPRMsm, and (h) SMAPL3sm gap-filled with LPRMsm.

SMAP was placed into safe mode and stopped capturing data for about one month, from June 20 to July 22 in 2019. During this period SMAP provides no product, as marked by the blue boxes in Fig. 11(a–e). NNsm remains consistent over the gap and captures the rainfall events with a one-day delay, since the observation time of NNsm is 01:30 local time. With an accuracy similar to that of SMAP, NNsm can provide complementary soil moisture for the SMAP SSM product.

Fig. 11

Time series of NNsm (blue dots), SMAPL3sm (red dots) and precipitation (blue bars) in 2019 when SMAP stopped capturing data temporarily (blue boxes): (a) USDA-Little Washita, (b) Tibet-Naqu, (c) OzNet-Yanco, (d) REMEDHUS and (e) AMMA-CATCH-Benin sites. Dry-down after rainfall: time series of NNsm (blue dots), SMAPL3sm (red dots) and precipitation (blue bars) in 2019 over USDA sites (f) Walnut Gulch, (g) Little Washita, and (h) Fort Cobb.

Application for the study of short-term moisture dynamics

NNsm provides more frequent soil moisture observations for studying land-atmosphere interactions. SMAP has a narrow swath and provides only about 10 measurements per month (or more, depending on latitude), while NNsm derived from AMSR-E/2 has a wider swath and provides measurements almost every day, as shown in Fig. 11(f–h). Combined with the standard SMAP product, NNsm can be used to extract post-rainfall dry-down curves with higher temporal resolution and to represent the dry-down process more accurately.

Near-Real-Time product and extension to AMSR3

Having created and trained the models, it is now possible to produce near-real-time data products into the future, provided that brightness temperature data are available. Forward simulation of SM from TB data using the NNsm models is fast and efficient, and requires no ancillary data streams. TB from future instruments can also be used, following a calibration to the existing AMSR2 data analogous to the AMSR-E/AMSR2 calibration performed with Eq. 1.

AMSR3 is scheduled to launch in 2023 as part of the GOSAT-GW mission and will provide similar C, X, and K band observations as a successor to AMSR2. Since our model only uses these observation bands as input, our method can be readily transferred to AMSR3, and a long-term soil moisture product can continue to be generated for stable and consistent climatological studies of the terrestrial water cycle.