Multi-mission satellite remote sensing data for improving land hydrological models via data assimilation

Satellite remote sensing offers valuable tools to study Earth and hydrological processes and improve land surface models. This is essential to improve the quality of model predictions, which are affected by various factors such as erroneous input data, the uncertainty of model forcings, and parameter uncertainties. Abundant datasets from multi-mission satellite remote sensing during recent years have provided an opportunity to improve not only the model estimates but also model parameters through a parameter estimation process. This study utilises multiple datasets from satellite remote sensing including soil moisture from Soil Moisture and Ocean Salinity Mission and Advanced Microwave Scanning Radiometer Earth Observing System, terrestrial water storage from the Gravity Recovery And Climate Experiment, and leaf area index from Advanced Very-High-Resolution Radiometer to estimate model parameters. This is done using the recently proposed assimilation method, unsupervised weak constrained ensemble Kalman filter (UWCEnKF). UWCEnKF applies a dual scheme to separately update the state and parameters using two interactive EnKF filters followed by a water balance constraint enforcement. The performance of multivariate data assimilation is evaluated against various independent data over different time periods over two different basins including the Murray–Darling and Mississippi basins. Results indicate that simultaneous assimilation of multiple satellite products combined with parameter estimation strongly improves model predictions compared with single satellite products and/or state estimation alone. This improvement is achieved not only during the parameter estimation period (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim $$\end{document}∼ 32% groundwater RMSE reduction and soil moisture correlation increase from \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim $$\end{document}∼ 0.66 to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim $$\end{document}∼ 0.85) but also during the forecast period (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim $$\end{document}∼ 14% groundwater RMSE reduction and soil moisture correlation increase from \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim $$\end{document}∼ 0.69 to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim $$\end{document}∼ 0.78) due to the effective impacts of the approach on both state and parameters.

Studying the terrestrial hydrology is facilitated by developments of land surface models. These models are important to simulate various terrestrial compartments over an extended period of time. Moreover, they are essential for predicting hydrological processes and water storage changes at various temporal and spatial resolutions. The performance of the land surface models, however, can be degraded caused by multiple factors such as uncertainties in model forcings, model parameters, initial and boundary conditions, and simplification of the representation of processes 1,2 . To address this, traditionally additional datasets are integrated with models to improve model estimates.
The data integration approaches have become more popular with the advent of satellite remote sensing. This is related to the satellite's extensive coverage and high spatial and temporal resolution, especially during the past few decades. Satellite data products can be used to constrain the models, e.g., via data assimilation [3][4][5][6][7][8][9][10][11][12] . A number of studies has shown that applying multivariate data assimilation using in-situ and reanalysis estimates [13][14][15][16][17][18][19] could be beneficial. However, despite a few efforts for using multi-mission satellite products for data assimilation [20][21][22] , the extent of the effectiveness of the approach has not yet been fully investigated 23 . Furthermore, while using the multivariate data assimilation was found to be effective for improving on-line model estimates, its impact on the (long-term) forecasting skill is normally limited if only initial states are updated. The main reason behind this is the important role of the model parameters for simulating fluxes and water storage as well as uncertainty with respect to model forcings (meteorology). Poorly defined parameters, which are not updated during the Materials Case studies. The two major river basins, Mississippi and Murray-Darling are selected for the experiment given the presence of in-situ measurements to assess the proposed multivariate data assimilation. The Murray-Darling basin is the biggest river system in Australia comprising many wetlands (i.e. more than 30,000) and rivers (i.e. 23), which provide freshwater for, e.g., agriculture, industry, and water use 54 . A large area in the eastern part of the country is covered by the basin, which contains a variety of natural environments, e.g., desert and dry regions (west), rainforest (north), snow covered areas and areas with a larger amount of surface water (south). Historically, the Murray-Darling basin has undergone various extreme droughts and floods. Furthermore, water storage within the basin has also shown large inter-annual and annual variabilities 53,54 . Temperature varies from 0 to 3 • C in the elevated areas in the southeast of the basin in July to 33 to 36 • C for the upper northern parts in January. The same rainfall spatial variability also exists within the Murray-Darling basin, i.e. annual rainfall less than 250 mm in the northwest and excess of 2000 mm in the elevated areas 55 .
Similarly, the Mississippi River basin is an important source of freshwater in North America, which provides water for more than 18 million people and different socioeconomic sectors. Temperature varies strongly within the basin, which leads to large spatial and temporal hydro-climatic variabilities 56,57 . For example, higher temperature ( 21 • C ) along with hot and humid condition exist in May to September while average low temperatures ( −3 • C ) in January are available in the north caused by various factors such as polar and subtropical jet streams and Arctic cold. Snow line has been progressively migrating northward across the basin 59,60 . Overall, Upper Mississippi areas (e.g., central Minnesota to central Wisconsin) has larger snow cover compared to the other parts of the Mississippi River basin (such as southeast Missouri and southwest Illinois). Showers (and thunderstorms) occur mostly in summers over different parts of the basins while winter precipitation varies from less than 25 mm for the western and northern Great Plains to 75  www.nature.com/scientificreports/ conditions vary over the different parts of the Mississippi basin and different times of the year [58][59][60] . This includes semiarid climates in the west, humid condition over the eastern parts, sub-humid climates in the south along with a large discharge rate and multiple flood events across the basin.
In-situ groundwater well data (derived from USGS and the New South Wales Government) are used over both basins to evaluate the estimated groundwater variations from the model with and without data assimilation. To this end, groundwater level data are converted into groundwater storage change values with the help of specific yield values of the basin 2,61-63 . In addition, soil moisture observations from in-situ stations are used to assess the soil moisture estimates at different depths. For this purpose, top, shallow and root-zone soil moisture from the model are compared against in-situ soil moisture measurements at corresponding depths. Over the Mississippi Basin, groundwater and soil moisture measurements are acquired from USGS (https ://water .usgs.gov/ogw/data. html) and the International Soil Moisture Network (https ://ismn.geo.tuwie n.ac.at/), respectively. For the Murray-Darling Basin, the measurements are acquired from New South Wales Government (http://water info.nsw. gov.au/pinne ena/gw.shtml ) and the moisture-monitoring network 64 . Model and data. Model. The World-Wide Water Resources Assessment model (W3RA), which was designed and developed by the Commonwealth Scientific and Industrial Research Organisation (CSIRO) is used. W3RA is a Global water balance model, which is distributed on grid basis and simulates water flows and water storage 65 . ERA-interim reanalysis data including meteorological fields of precipitation, maximum and minimum temperature, and downwelling short-wave radiation, are used as model forcings. The model presents the water balance of the soil, groundwater and surface water independently over each grid cell 66 . The water and energy fluxes between the water storages are also modelled for two hydrological response units (HRUs) which occupy different fractions of a grid cell, i.e., tall and deep-root vegetation in HRU1 and short and shallow root vegetation in HRU2. Correspondingly, parameterizations are applied at the sub-grid level 39 . Poovakka et al. 40 discussed the necessity of calibration for this model, as it is currently limited to a number of catchments where streamflow records and input forcing data are available. The model relies on a variety of parameters such as water holding capacity and effective soil parameters 67 . A detailed list of selected parameters for estimation is presented in Table 1.
These parameters influence mass balance equations underlying the model. Soil albedo and photosynthetic capacity index (PCI) parameters are used to model canopy albedo and outgoing shortwave radiation from the land. Initial retention capacity ( I 0 ) and reference event precipitation ( P ref ) are applied to derive surface runoff. These parameters are also connected to rainfall intensity and the soil infiltration distribution. Soil water drainage is estimated based on β and field capacity drainage fraction. F ER0 , W 0lim and maximum stomatal conductance ( G smax ) are applied for evaporation modelling, e.g., via rainfall interception evaporation, soil evaporation, and maximum transpiration, respectively. Open water evaporation scaling factor is used to derive open water evaporation, which can have higher uncertainties over large bodies of surface water. Specific leaf area and leaf area index parameters are developed to facilitate vegetation phenology computations 65 .
Satellite remote sensing. Three main satellite products are used for data assimilation to update states and estimate parameters. TWS changes are derived from level 2 (L2) GRACE products (up to degree and order 90). L2 coefficients and their associated full error covariance information are acquired from the ITSG-Grace2014 gravity field model 68 . Post-processing steps are done following Khaki et al. 69 and Khaki and Awange 70 to calculate TWS changes between 2003 and 2016. The data is then used to update the summation of different water storage components from the model including groundwater, different soil layers, and surface water storage (see details in "Methodology" section). TWS error covariances (to be used in data assimilation) are computed from potential coefficients following Schumacher et al. 46 .
The National Oceanic and Atmospheric Administration (NOAA) of LAI Climate Data Record (CDR; version 4) and Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) 71 are obtained for the period of 2003-2016. The data were produced by the University of Maryland and the NASA Goddard Space Flight Center (GSFC) on a daily 0.05 • × 0.05 • global scale. LAI products are used for data assimilation given their potential www.nature.com/scientificreports/ to improve modelling skills 72 . LAI has a major impact on estimating evapotranspiration (ET) and precipitation interception, thus, can be very useful for data assimilation 71 . Following Fox et al. 73 , a constant error standard deviation of 0.2 (m 2 m −2 ) is assumed for the LAI from satellite. Soil moisture products are achieved over the same period, i.e., 2003-2011 from AMSR-E (Level-3) 74 and 2011-2016 from SMOS (Level 3 Centre Aval de Traitement des Donnees SMOS) 75 . These data are used during assimilation to control the model surface soil moisture content. Regarding soil moisture measurement uncertainty, we followed Leroux et al. 76 and De Jeu et al. 77 and assumed 0.04 (m 3 m −3 ) error for SMOS and 0.05 (m 3 m −3 ) error for AMSR-E observations.
Water fluxes. Additional datasets of precipitation, total evapotranspiration, and water discharge are used to constrain the water balance through in the UWCEnKF implementation (see details in "Data assimilation" section). These data are derived from Khaki et al. 69 , in which data from different sources, e.g., satellite, reanalysis, and gauge-based measurements (from multiple sources over the Mississippi and Murray-Darling basins), are merged to achieve the best estimates over different basins. Note that the datasets applied here for water budget constraint are mostly independent from those applied for running the model except for precipitation, e.g., the Tropical Rainfall Measuring Mission (TRMM) is used in both ERA-interim forcing and the merged precipitation product for the water budget constraint. Nevertheless, this dependency between the products is not a limitation for our data assimilation experiments as it was shown that the water budget closure, where the water flux observations are used is not affected by this 69,78 .

Methodology
Sensitivity analysis. A sensitivity analysis following Cannavo 79 is carried out to measure the model response to parameter changes. This is done to identify the parameters that significantly affect the model output. The analysis will also increase our understanding of the impact of model parameters on model simulations. The selected approach here is a global sensitivity analysis that contrary to so-called local sensitivity analysis assesses sensitivity over the entire input parameter space. It is a variance-based method that investigates the contribution of each input parameter to the total variance of the output, i.e., y = f (X) and X = (x 1 , x 2 , . . . , x n ) with n being the number of input parameters ( x ). The objective is to measure the importance of input on the variance of the output, namely the sensitivity ( S i ) of y to x i through, This is known as the first order sensitivity index by Sobol 80 . Analytical solution of Eq. 1 for a non-linear high dimensional system is not possible, thus, a numerical approximation is needed. This can be facilitated using the Fourier Amplitude Sensitivity Test (FAST) and the Monte Carlo algorithm for numerical approximation. Here we apply the latter following Cannavo 79 , where a sequence of random points of length N can be used to approximate the solution for N → ∞ . This allows for evaluating a multidimensional integral using a Monte-Carlo technique. Consider two uniformly distributed independent random points A and B (with a size of N × n ). A = [α A , β A ] and B = [α B , β B ] are composed of N trial sets for the evaluation of y . The model ( f (X) ) can be evaluated in these two points: f(A) and f (α A , β B ) . Using this method, the influence of different variables and their subsets on the model can be analyzed. In practice, this method draws A and B and form C ( C i , i = 1, . . . , n ) in a way that its ith column is equal to the ith column of B, and its remaining columns are from A. Using these sample inputs, the model is run to derive corresponding model evaluations ( f (A), f (B), f (C i ) ). These are then used to calculate the sensitivity indices using, Using this method one can measure the sensitivity of the model to a given parameter based on its contribution to the variance of the model output see more details in 79 . Data assimilation. UWCEnKF. The main aim of UWCEnKF is to update the system state and its parameters in a dual way while accounting for water balance when incorporating new observations. Here, we present a summary of the approach and more details can be found in Khaki et al. 52 . Ait-El-Fquih et al. 45 proposed a new dual EnKF scheme following the one-step-ahead (OSA) smoothing and showed that this could improve data assimilation performance by imposing more information to the system. Their approach comprises two interactive EnKF filters for state-parameter estimation. Khaki et al. 52 extended this to a water balance system by enforcing an additional constraint. The approach includes different steps; it first uses the state forecast ensemble to update the parameters through EnKF-like update, as well as to compute the OSA smoothing ensemble. The updated parameters and state variables are then integrated with the model to obtain the next state forecast ensemble in the second EnKF, which will be used to acquire the analysis ensemble. Despite the addition of the second EnKF implementation compared to the traditional dual-EnKF due to the OSA smoothing part, it has been shown that this only increases the computational cost minimally while it considerably enhances the performance of the dual approach 44,45,52 . For the state-parameter estimation problem in a discrete-time dynamical system, one can write, Scientific Reports | (2020) 10:18791 | https://doi.org/10.1038/s41598-020-75710-5 www.nature.com/scientificreports/ where x t ∈ R n x is the system state vector (with dimension n x ) and y t ∈ R n y is the observation vector (with dimension n y ) at time t. θ ∈ R n θ represents the parameter vector of dimension n θ . In Eq. (3), the model operator is indicated by M t−1 (.) , which is used to forward the state vector from t − 1 to t, and the observational (design) operator at time t is shown by H t . The model and observation process noises are represented by ν t−1 ∼ N (0, Q t ) and w t ∼ N (0, R t ) , respectively, with state covariance matrix Q t and observation covariance matrix R t . To solve Eq. (3), UWCEnKF applies a dual EnKF scheme comprising two interactive EnKF filters for state-parameter estimation. Each step of the filter is presented below.
(with n being the ensemble number and a standing for analysis step), the process begins with integrating state and parameters within the model to derive forecast is then used to calculate the analysis parameter ensemble {θ with the sample forecast error covariance matrix P x f t and the sample cross-covariance matrix between the previous parameter vector and current forecast errors where S is ensemble perturbation and can be calculated as a difference between ensemble members and ensemble mean.
• State estimation. Traditionally, the analysis parameters are used to recalculate the forecast ensemble in the standard dual EnKF by integrating {x a,(i) t−1 } n i=1 into the model based on the updated parameters. Ait-El-Fquih et al. 45 showed that the implementation of the OSA smoothing step, which is a measurement update based on the current observation can lead to a better state estimate. The smoothing state {x with P x a t−1 ,x f t being the sample cross-covariance matrix, calculated from the analysis states at t − 1 and forecast states at t. Next, similar to the standard EnKF, the forecast step is applied but using the updated parameters to forward states in time (from t − 1 to t). This is done using {x Scientific Reports | (2020) 10:18791 | https://doi.org/10.1038/s41598-020-75710-5 www.nature.com/scientificreports/ where z t def = d t − p t + e t + q t is introduced as "pseudo-observation". In this equation, L is an n z × n x identity matrix, and G = −L (here, n z = n x ). Contrary to a standard EnKF that only computes states in the analysis step, UWCEnKF estimates pseudo-observation noise covariance along with the states. This leads to the computation of constrained states from unconstrained state analysis ( {x a,(i) t } n i=1 ) in a second analysis step. A recursive algorithm exists in UWCEnKF to efficiently compute the analysis state {x a,(i) t } n i=1 based on the pseudo-observation noise covariance matrix ( ˆ t ). The second update step, thus, involves cyclic iterations to adjust the analysis state for ℓ = 0 . . . L (with L being the iteration number) as, t is computed from the new state and it is then used again in Eqs. (14)-(16) 12 .
Data assimilation setup. An experiment is designed to monitor the performance of multivariate data assimilation. The study period is divided into three parts: 2000-2002 to generate the initial ensemble, 2003-2012 to assimilate observations and estimate model parameters (i.e. assimilation periods), and 2013-2016 to investigate the impact of the estimated parameters on model simulations in the absence of assimilation (i.e. forecasting period). The spin-up is made for m = 30 ensemble members for the period 2000-2002. This is done by perturbing the meteorological forcing fields, i.e. for precipitation: ×N (0, 0.3) , for shortwave radiation: +N (0, 50) , and for temperature: +N (0, 2) . Model errors are mainly caused by errors in the initial condition, forcing data, and model parameters. The above perturbation process accounts for the first two error sources while the model structure error is not considered here. Nevertheless, ensemble inflation applied in the assimilation process (explained below) allows the ensemble to largely account for this error 81,82 . A parameter ensemble is produced by drawing (30) random samples from each parameter's HRU defined range (cf. Table 1). The state vector for data assimilation includes soil moisture at three layers of top (up to 7-9 cm soil layer), shallow (up to 30 cm soil layer), and deep-zone layers (up to 100 cm soil layer), surface and snow water storage, groundwater and LAI. The observation vector contains GRACE TWS observations, satellite soil moisture, and LAI products. Cumulative distribution function (CDF) matching is used for rescaling the observations (TWS, soil moisture, and LAI) to match those from the model 22,83 .
The observational operator ( H t ) converts the state variables into the observation space by taking into account discrepancies between the model and observation spatial resolutions. It aggregates model state variables at multiple grid cells to 1 • to be updated by 1 • GRACE TWS data. Top layer soil moisture variables at every 0.25 • are updated by satellite soil moisture measurements (i.e. 0.25 • AMSR-E and SMOS). LAI observations are spatially averaged and assimilated at the same resolution as of model ( 0.125 • ). To deal with the observations different temporal resolution, all observations are rescaled to the monthly scale (same as GRACE products) and assimilated on a monthly basis. This scale is also selected because it allows easier water budget constraint implementation provided in the second step of UWCEnKF, where water balance equation is applied using TWS changes over consecutive months. The monthly corrections as a result of data assimilation are added as offsets to the state vectors at the last day of each month to generate the ensembles for the next month assimilation step 2,84 .
To enhance EnKF performance during assimilation ensemble inflation and localization are applied. It has been shown by literature 85,86 that ensemble-based data assimilation methods are sensitive to the size of ensemble. Generally, a larger number of ensemble members can better span the state-observation space and lead to better results but at the expense of strongly increased computation needs. To address this, ensemble inflation and localization methods are usually used to tackle filter divergent or inaccurate estimation 87 for a small ensemble size and to avoid filter inbreeding. Ensemble inflation increases ensemble deviation from the ensemble-mean by applying a small coefficient ( [1.1 − 1.3] for the parameter and state updates) to ensemble members 88 . Localization using the Local Analysis (LA) scheme is also applied. It performs by spatially limiting the assimilation process within a certain distance from a grid point 10,89 . The suggested values ( 3 • ) by Khaki et al. 10 are used as localization radii to achieve the best outcomes using a trial and error.
As mentioned, the experiment is undertaken over the Murray-Darling and Mississippi basins given good data availability. To assess the results, model simulations are spatially interpolated to the nearest in-situ stations (cf. "Case studies" section). Once the simulation time series are generated at these locations, three evaluation metrics including standard deviation (STD), Root-Mean-Squared Error (RMSE), and correlation values are calculated with respect to the independent in-situ measurements. RMSE and STD are particularly useful to respectively investigate the distance between the simulations and in-situ measurements and the spread of simulations around the mean. These show how accurate and precise the results are. Note that only time series anomalies (i.e. time series minus their temporal average) are used for the validation. To better investigate the (13)

Results
Sensitivity analysis. The results of the sensitivity analysis are shown in Fig. 1. Estimated sensitivity weights of parameters (cf. Table 1) for each iteration of total 100 different iterations (using 100 different sets of matrices, see "Sensitivity analysis" section) are spatially averaged to show the relative weights of the model parameters and their influence on model output. In addition, the average parameter weights (for 100 iterations) are also plotted in the figure by a solid black line. It can be seen that larger weights are assigned to a group of parameters including C SLA , ref , I 0 P ref , and β with C SLA and ref having the biggest weights amongst all parameters. This can show the fundamental impact of specific leaf area and its interaction with light and moisture (humidity) levels within the study area. These larger weights corresponding to more model sensitivity can be observed over a majority of iterations. Some of the other parameters such as G smax and F ER0 represent less impact on the model outputs. From Fig. 1, it can be seen that sensitivity of parameters (e.g., PCI and α dry ) differ between HRU's. These indicate the effect of the model parameter variations on the simulation results, which highlights the importance of an accurate selection of parameters for estimation.  www.nature.com/scientificreports/ In addition to the above variations, it is found that the sensitivity of parameters shows considerable variations over different grid points. This can be seen in Fig. 2, where the relationship between the average and STD of parameters over the grid points is shown. These variations indicate that defining fixed values (spatially and temporally) for parameters is not realistic as it does not reflect the characteristics of different regions (and over different time periods) and can be problematic. The large STD values for a majority of parameters such as I 0 , C SLA , P ref , and β can be explained by larger spatial variabilities of the parameters. This is also the case for some parameters with smaller weights, e.g., PCI (in HRU2 corresponding to short and shallow-rooted vegetation). It can also be seen that parameters with larger variabilities such as C SLA , ref , P ref , β , and I 0 demonstrate larger sensitivities too (cf. Fig. 1). This means that the model is largely sensitive to the variations of these parameters. A few parameters such as α dry , F loss,max , and W 0lim , on the other hand, show smaller spatial variabilities and can be considered spatially homogeneous. Based on this test, we focus only on the most sensitive and variable parameters including C SLA , ref , I 0 P ref , and β to be estimated. This allows to efficiently improve the model by avoiding estimation of all parameters.
Parameter estimation. The parameter estimation results are presented here. The adjusted parameters and their range of variations from the application of the assimilation approach are presented in Table 2. Figure 3 shows the time evolution of two sample parameters ( β and I 0 ) over the assimilation period. The variation of these two parameters represents their average at each month for the Mississippi basin. From the figure, it can be seen that the parameter estimation process converges the parameters for different assimilation cases, i.e. GRACE, soil moisture, LAI only experiments, as well as simultaneous data assimilation. Details of the converged parameters over both basins can be found in Table 2. The results are for both multivariate and univariate data assimilation scenarios. The STD values show the spread of parameters around the average value, which indicates the variabilities and corresponding uncertainties of the estimated parameters. Table 2 shows that some parameters have larger STDs, e.g., β , I 0 , ref , P ref , C SLA , which generally suggests more spatial variability. These results suggest the ability of the parameter estimation approach to derive different values for parameters by adequately spanning the parameter space. Spatially varying parameters can better capture the characteristics of areas with different  www.nature.com/scientificreports/ atmospheric and environmental conditions. Moreover, it is found that the estimated parameters are considerably different from the initial values, especially for the A/Par approach, which will consequently affect state estimates too. It can also be inferred from the table that each assimilation scenario results in different parameter estimation. Nevertheless, closer results can be found between the multivariate case (A/Par) and GRACE-only assimilation. This can be explained the larger impact of the GRACE data during the assimilation process compared to the other assimilated observation. This will be investigated more in the following section. Furthermore, to better explore the corresponding impact of the parameter estimation on model simulations, the simulations with (A/ Par) and without (A/O) the adjusted parameters are analyzed (cf. "Results validation-Observations impact" sections).

Results validation. Independent in-situ measurements over the Murray-Darling and Mississippi basins
are also used to evaluate the results for A/O and A/Par approaches. We compare the results of assimilating different observations, i.e. GRACE TWS only, satellite soil moisture only, LAI only, and simultaneous assimilation of all three data products. To this end, RMSE and STD values for both the assimilation and forecasting periods are computed (Fig. 4). We further compare RMSE values for groundwater wells and the different assimilation methods both for the assimilation and forecasting periods (Fig. 5). The figure shows the RMSE reduction for each scenario with respect to the open-loop results. Overall, the results highlight the effectiveness of the satellite data assimilation for improving the model simulations, especially over the assimilation period. Moreover, www.nature.com/scientificreports/ multivariate data assimilation clearly achieves the best results over both basins. This can clearly be seen for different locations in Fig. 5. Multivariate data assimilation performs reasonably consistent across the basin for both experiment periods. GRACE data assimilation reduces RMSE and STD more than soil moisture and LAI only assimilation experiments. This is expected due to the larger impact of GRACE TWS on groundwater storage during assimilation. Despite this, it is observed that simultaneous (multivariate) A/Par reduces groundwater RMSE 32% (on average) compared to the open-loop run, which is the best performance amongst the different assimilation cases. Similar performance can be observed for the two basins. The A/Par method also obtains slightly better results compared to the A/O method over the assimilation period. Over the forecasting period, however, the multivariate simultaneous data assimilation method performs substantially better, which is evident from smaller RMSE values in both basins compared to the open-loop and A/O results. This can also be seen in Fig. 5, where simultaneous data assimilation (and to a lesser degree also GRACE only assimilation using A/ Par) results in higher RMSE reductions than A/O. Such superiority can be explained by the positive impacts of the new method on model parameters, which allows the model to preserve the adjustment impact during the forecast period. www.nature.com/scientificreports/ The performance of the above data assimilation scenarios is further assessed against in-situ and independent satellite soil moisture measurements relying only on the correlation analysis. Correlations between simulated soil moisture (with and without data assimilation using different observations) and in situ measurements are calculated at different depths and average results are reported in Table 3. For this purpose, the top layer estimates are examined against in-situ measurements of 0-8 cm for Murray-Darling and 0-10 cm for Mississippi. The estimated top, shallow and a portion of deep-root soil layers are compared with in-situ measurements of deeper layers over the two basins (e.g., 0-30 cm and 0-90 cm for Murray-Darling, and 0-50 cm and 0-100 cm for Mississippi). A statistical test is also applied to measure the significance of the results at 0.05 level. In general, assimilating multiple observations simultaneously leads to higher correlation values, both for the A/Par and A/O methods compared to the open-loop results. Furthermore, top layer simulated soil moisture is compared with the surface soil moisture L2 product from the Advanced SCATterometer (ASCAT) over the same periods. The ASCAT soil moisture products provide an estimate of the water saturation of the 5 cm topsoil layer and are derived from the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT). The correlation between www.nature.com/scientificreports/ the open-loop soil moisture and the soil moisture of the data assimilation scenarios with ASCAT soil moisture data is then calculated to derive improvement values, i.e. the difference in correlation for a data assimilation approach and the open loop experiment, both for the assimilation and forecasting period (Fig. 6). According to Table 3, the multivariate data assimilation improves correlation values by 0.21 (on average for the cases with and without parameter estimation) over the Mississippi basin and by 0.17 (on average for the cases with and without parameter estimation) over the Murray-Darling basin. It can be seen that univariate satellite soil moisture data assimilation performs the best among the univariate data assimilation experiments by increasing the correlation values from 0.67 (on average for open-loop) to 0.76. Similar results can also be seen in Fig. 6, where the multivariate data assimilation obtains the highest correlation improvement followed by soil moisture only data assimilation. Limited impacts can be observed by GRACE data assimilation, especially over the assimilation period while the LAI only assimilation case has no considerable impact on the results. From Table 3, improvements can also be seen in soil moisture estimates from GRACE data assimilation. Overall, it is found that GRACE only data assimilation mainly affects the deep-root and shallow soil zones within the assimilation period (on average ∼ 9% more than top layer) while soil moisture data assimilation largely improves top layer estimates (on average ∼ 12% more than deep-root layer). The former can be explained by the larger impact of GRACE TWS data assimilation (as in uni-and multivariate cases) on deeper model soil layers. Satellite soil moisture measurements, on the other hand, mainly reflect the top few centimeter soil water variations and correspondingly impact the model top layer. The combination of observations in the simultaneous case leads to the better performance of the approach in both A/O and A/Par. Between the experiment periods, more correlation improvement (with respect to the open-loop results) is obtained during the forecasting period using A/Par ( ∼ 20% for simultaneous assimilation) than A/O ( ∼ 4% ). This shows the importance of multi-mission observations during data assimilation. Yet, estimating parameters along with the state effectively improves the state-parameter estimates when multivariate data assimilation is assumed. This effect can also be observed in Fig. 6. The simultaneous data assimilation, and to a lesser degree soil moisture only scenario positively impacts the model top layer simulation by estimating parameters along with states during the assimilation period.
Further result evaluation is done to assess the effect of satellite data assimilation, specifically from the LAI products. As shown in literature [90][91][92] , constraining land surface models with LAI observations could result in better evapotranspiration predictions. To explore this, the estimated LAI and evapotranspiration by the A/O and A/Par approach are compared with AVHRR LAI and evapotranspiration from the MODIS Global Evapotranspiration Project (MOD16) 93 . This is done also for all univariate and multivariate assimilation cases. Average correlation improvement with respect to the open-loop results for the Mississippi and Murray-Darling basins are depicted in Fig. 7. The analysis is done again separately for the assimilation and forecast periods. One can see that data assimilation effectively improves the estimates in most of the cases. The improvements are more pronounced for simultaneous and LAI only data assimilation. GRACE only and soil moisture only data assimilation cases lead to a small level of correlation enhancement, especially using the A/Par method, which can be explained by the updated parameters. Improvements in LAI simulations clearly lead to evapotranspiration estimates closer to MOD16. The improvements are found for both the assimilation with and without parameter estimation approaches, particularly over the assimilation period. The best results over the forecast period are found for the A/Par experiments and for simultaneous data assimilation. The A/O performance is clearly worse compared to the A/Par performance over the forecast period. These results are consistent with the previous assessments, stressing that multi-satellite data assimilation, especially along with parameter estimation considerably improves the model simulations by incorporating various observations. Due to the superiority of the multivariate data assimilation cases based on this section's results, in the following, we focus only on these approaches and especially A/Par to investigate their performance in more aspects.
Observations impact. The integration of multivariate satellite observations (GRACE TWS, soil moisture, and LAI simultaneously) during the assimilation process impacts model simulations. This effect can be seen in Fig. 8 over the Mississippi and Murray-Darling basins. In this figure, basin-averaged TWS variations from the open-loop run (no data assimilation) are compared with the assimilation results, as well as the GRACE TWS data. The error, measured as the absolute difference between the GRACE TWS data and model simulations (with and without assimilation) is also plotted in Fig. 8c,d. Note that the forecast period (2013-2016), when no assimilation is applied, is separated from the assimilation period (2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013). It can clearly be seen in Fig. 8 that data assimilation decreases misfits between the open-loop and observations over both basins. Smaller errors in Fig. 8c,d confirm the ability of the applied data assimilation method for decreasing discrepancies between the model and observations. This improvement can largely be seen for the assimilation period, and to a lesser degree, for the forecast period. Importantly, data assimilation leads to a better simulation of anomalies such as 2011-2012 over the Murray-Darling basin and 2012-2013 over the Mississippi basin.
To further investigate the impact of observations in the assimilation process, TWS ensemble spread over the basins is shown in Fig. 9. This is particularly of interest to monitor the influence of data assimilation on estimates in the assimilation period and its absence in the forecast period. TWS variations from individual ensemble members (shaded blue) and their average (solid blue) are displayed in Fig. 9. To better explore the effect, the comparison is done between the A/Par (Fig. 9a,b) and A/O (Fig. 9c,d) approaches. Both methods maintain the ensemble spread steadily during the assimilation period. While the pattern for both methods is similar over the assimilation period they differ in the forecast period. Larger spreads and corresponding uncertainties can be observed for the A/O results compared to the A/Par approach (cf. Fig. 9c,d). It can be inferred from the figure that the parameter estimation process along with the assimilation can extend the impact of data assimilation during the forecast period. This also reduces model uncertainties in that time period.

Scientific Reports
| (2020) 10:18791 | https://doi.org/10.1038/s41598-020-75710-5 www.nature.com/scientificreports/ Figure 10 shows the impact of data assimilation on soil moisture components from individual ensemble members to further investigate the simulation results (cf. Fig. 9). The correlation improvements over both basins are calculated with respect to the open-loop run, i.e., r c − r o with r c being the correlation coefficients between the assimilation (A/Par and A/O) results and satellite soil moisture observations and r o being the correlation coefficients between the open-loop results and satellite soil moisture observations. This is done separately for the assimilation and forecast periods. Figure 10 depicts correlation increases by both assimilation methods over the assimilation period. The A/Par method, however, obtains slightly better results, especially over the Mississippi basin, with an average increase of correlation of 0.29 compared to 0.25 for A/O. Over the forecast period, on the other hand, the new method performs remarkably better than the A/O approach over both basins, which is related to the estimated parameters. In addition, it can be seen from the figure that ensemble correlations show a larger spread over the forecast period, particularly for the A/O approach. This can indicate the larger stability of the A/Par method during the forecast period (as Fig. 9), which can result in smaller model state uncertainties. Now we explore the influence of the assimilated LAI data products on estimates. To this end, we compare the estimated LAI from two assimilation approaches by comparing it with LAI derived from AVHRR data. This is again explored over the Murray-Darling and Mississippi basins. Figure 11 shows the correlation improvement with respect to the open-loop run. Correlation values are computed for the assimilation period over each grid point. Land cover data acquired from Climate Change Initiative -European Space Agency (Version 2.0; http:// www.esa-landc over-cci.org/) is also presented in the figure for a better interpretation of LAI improvement results. From Fig. 11, the A/Par method increases the correlation compared to the open-loop results over both basins. The correlation improvement over the forecast period, however, is smaller, i.e. more than 0.4 improvement over  www.nature.com/scientificreports/ ∼31% of grid points (averaged over both basins) against ∼74% for the assimilation period. This is expected due to the absence of data assimilation. Nevertheless, correlation increase can be seen across the basins within the forecast period. More improvements can be seen over the vegetated areas (containing trees, vegetation, and shrubland) in both assimilation and forecasting periods compared to the cropland areas. This can be attributed to the higher capability of the assimilated data to reflect the variations of plant canopies. Overall, it can be  www.nature.com/scientificreports/ concluded that the method successfully incorporates observations during the filtering process and estimates the associated parameters. This is in correspondence with the previous findings as documented in Figs. 8, 9 and 10. Evaluation against water fluxes. A successful data assimilation approach for a water balance system not only improves the model simulations of various compartments but should also result in a better reproduction of water fluxes. To assess this, the updated TWS estimates are compared against flux observations of precipitation, evaporation, water discharge, and water storage changes using correlation analysis. Cross-correlation values are computed between the simulations (from the open-loop run, as well as assimilation with and without parameter estimation) and flux observations used in the second step of data assimilation filtering scheme (cf. "Model and data" section). Afterwards, improvement is calculated between the assimilation results with respect to the open-loop results for the assimilation (Fig. 12a) and forecasting (Fig. 12b) periods, separately for the Murray-Darling (indicated by 'MD') and Mississippi (indicated by 'MIS') basins. Both assimilation methods improve the agreement between measured and modelled flux components and storage over the assimilation period. The cross-correations increase stronger for evapotranspiration and water storage changes, which can be explained by the assimilation of TWS and LAI data from satellite products. The level of improvement over the forecasting period is much better for the A/Par approach than for the A/O approach. This can be seen clearly in Fig. 12b, where the A/Par results are approximately 12% (on average) better than those of A/O. This shows that the applied parameter estimation strategy has more pronounced impacts than A/O on results.
Climate variabilities. In this section, the ability of the multivariate data assimilation with parameter estimation technique (as the best method so far) to accurately reflect inter-annual weather variabilities as well as extreme events is assessed. Figure 13 plots average TWS variations from the open-loop and A/Par approach with respect to precipitation data over the Murray-Darling and Mississippi basins. This is done separately for the assimilation and forecasting periods. Better agreement between the two time series leads to a higher correlation between precipitation and TWS anomalies. The assimilation results show a better match than the open-loop results between the estimated TWS-variations and precipitation variations. This is clearer over the assimilation period, in which the A/Par method increases the correlation by 0.12 (on average) compared to the open-loop simulations. Improvement can also be found over the forecasting period over both basins (by 0.08) using the multivariate A/Par approach. These results demonstrate that the assimilation results better represent climateinduced variations compared to the open-loop run.
Another important aspect of successful model simulations is their ability to represent seasonal changes. This is evaluated by comparing seasonal variations of the open-loop and A/Par TWS results with those from GRACE data (Fig. 14). Results in Fig. 14 depict the average TWS seasonal amplitude (top panel) and TWS seasonal changes (middle and bottom panels) for the Murray-Darling and Mississippi basins over the assimilation and forecasting periods. Figure 14 illustrates that contrary to the open-loop result, assimilation results show not only similar seasonal amplitude but also closer range of variations compared to GRACE. Importantly, such an improvement can also be observed over the forecasting period (2013-2016), which is related to the estimated model parameters by remote sensing data assimilation.
Better agreement between the assimilation results and observations can also be seen in seasonal changes over both study periods. This is more evident for the Murray-Darling basin, where larger discrepancies exist between the open-loop results and GRACE data. Data assimilation, thus, has larger impacts in this case even www.nature.com/scientificreports/ over the forecasting period. It can be concluded that the assimilation results agree better to climatic variations due to their better performance in representing seasonal changes, which are triggered largely by climate-related components mainly through precipitation.
To further investigate the performance of data assimilation, soil moisture results are compared with average precipitation changes over two particular time periods, 2009-2013 (in assimilation period) and 2013-2016 (forecasting period) for the Murray-Darling basin. The former time period is selected due to the occurrence of an extreme (or irregular) climatic event namely high precipitation due to El Niño Southern Oscillation between 2010 and 2012 94 . The latter time period is selected to monitor the assimilation impacts on the forecastings. Figure     www.nature.com/scientificreports/ can be due to various factors such as erroneous model parameters, over-simplifying physical phenomena, and errors in its underlying equations. Better results for the A/Par approach suggest that estimating parameters through data assimilation can largely address the issue and consequently reflect anomalies. A similar performance can also be seen over the forecasting period, e.g., when positive anomaly in 2013 is clearer in the A/Par results. This again confirms the positive impact of A/Par on the parameter estimations.
To better illustrate this, the difference between average soil moisture content in March-April and January-February 2010 over the Murray-Darling basin is shown in Fig. 16, again both for the A/Par method and the open-loop run. This is done to investigate the impact of ENSO phenomena on soil moisture changes. Remarkably larger positive differences in the assimilation results indicate their better performance in representing the phenomena. These results show that assimilating multiple satellite data products can effectively improve the model skills to capture inter-annual weather anomalies.

Conclusions
The present study investigated the ability of multivariate satellite remote sensing data assimilation to improve predictions with a land surface model. Various observations including GRACE TWS, AMSR-E and SMOS soil moisture products, and AVHRR LAI were assimilated individually and simultaneously into the W3RA model using the recently proposed A/Par method, UWCEnKF. This was done for (i) state-parameter estimation over the assimilation period and (ii) for model predictions over the forecasting period. Different data sets were used to assess the data assimilation performance over the Murray-Darling and Mississippi basins. The major findings of this effort are: • In general, it was shown that the application of multi-mission satellite data can successfully improve the model's different estimates, both in the assimilation and forecasting periods. On the other hand, univariate data assimilation was found to mainly improve the model corresponding variable. Analysing the results against the assimilated observations shows that the A/Par method results in a closer correspondence to observation data, including independent, not assimilated, data. Thus, this study showed the importance of multivariate data assimilation when various water components are targeted combined with parameter estimation. • In the forecasting period, the joint assimilation and parameter estimation method still improves estimates considerably, but the A/O approach does not improve updates. Better TWS and LAI forecasts were obtained over the Murray-Darling and Mississippi basins by this method. The use of independent groundwater and soil moisture measurements also confirmed this. The UWCEnKF A/Par method demonstrated high capability to preserve the observations' impacts over a longer time period, which suggests that the method can successfully estimate the model parameters. Furthermore, multivariate assimilation along with parameter estimation shows promising performance in reflecting inter-annual weather variabilities as well as weather extremes into the state estimates over both assimilation and forecasting periods. Therefore, model parameter estimation during data assimilation is crucial for improved predictions.
Overall, based on both assessments against assimilated and independent observations, multivariate data assimilation with model parameter estimation remarkably improved model simulations, e.g., in terms of water storage accuracy and forecasting skill. Nevertheless, more investigation is required on the performance of the method on hyper-resolution models, where assimilating massive datasets can be problematic. Moreover, the method should be tested over various basins with different hydro-climatic conditions to further assess its impact on the simulations, especially for the forecasting periods.  www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.