Background & Summary

Accurate estimation of surface runoff is attracting increasing interest in hydrologic sciences due to its significance in water resources management. Surface and subsurface runoff quantification has many applications in hydrology: it is valuable for basin water monitoring, hydrologic modelling and design, water infrastructure design, flood risk assessment, and groundwater recharge, among other applications1. Acquiring runoff data from gauging stations is a challenging task that often entails high installation and maintenance costs2. Global, continuous, and, more importantly, reliable and easily accessible runoff data are therefore of increasing relevance and high value.

The Natural Resources Conservation Service Curve Number (NRCS-CN) method, previously known as the Soil Conservation Service Curve Number (SCS-CN) method, is a commonly used model for estimating rainfall runoff3. The method was developed by the USDA SCS in 1954 and has gained popularity since then due to its simplicity, reliability, credibility and its responsiveness to soil type, land use, surface condition and antecedent condition4. Originally, the purpose of the NRCS-CN method was to obtain runoff estimates for agricultural watersheds from storm rainfall events in the USA5. However, due to its convenience, water resources practitioners have adopted it in multiple regions and land uses6,7,8,9,10,11,12,13. In addition to its application in rainfall-runoff modelling, the NRCS-CN method has been successfully applied in urban hydrology14,15,16, water quality17,18,19, rainwater harvesting3,20,21,22, subsurface flow estimation23,24,25,26, and estimation of evapotranspiration (ET)27,28,29. This widespread implementation of the NRCS-CN method demonstrates its prominence in hydrologic applications. Many techniques to estimate global surface runoff are computationally demanding and time consuming. Satellite observations, acquired in near-real time, can provide reliable surface runoff estimates at a hydrologically relevant spatial resolution30. In this context, this paper proposes a robust approach to quantify surface runoff at the pixel, watershed, regional, and global scales in near-real time at a 250 m resolution using the NRCS-CN method. The generation of a near-real-time, terrestrial runoff time series is of great importance for better quantification of surface runoff and flood events, especially in ungauged watersheds. Applications of the runoff generator can help in agricultural planning, hydrological engineering, flood forecasting and monitoring water resources.

Recently, a global dataset of curve numbers at 250 m resolution (GCN250) has been published and is publicly available1. GCN250 has been validated against runoff generated from GLDAS (Global Land Data Assimilation System) but not against gauge runoff data. The dataset can benefit from Google Earth Engine (GEE)31, the leading planetary-scale geospatial analysis platform, by first adjusting the curve number for slope and second validating it against runoff data using remotely sensed real-time estimates of precipitation. Such an application will yield a global runoff generator building on the GCN250 dataset, which accounts for wet, average, and dry antecedent runoff conditions. Motivated by the generation of the GCN250m data, this paper attempts to create a synergistic global runoff generator (a GEE app) exploiting the surface soil moisture data product (SMAP)32, which became available in 2015. We use SMAP surface soil moisture anomalies and remotely sensed precipitation data from the Global Precipitation Measurement Mission (GPM) in near-real time to dynamically retrieve the appropriate curve number (wet, average, or dry) from the GCN250m dataset based on the antecedent runoff condition (ARC, previously known as antecedent moisture condition). ARC is an indication of the runoff potential before a storm occurs, and it accounts for the variation in CN caused by different storms4. We developed a readily usable method to estimate global runoff using GEE (Fig. 1). The developed runoff generator estimates surface runoff using the GCN250m, SMAP surface soil moisture anomalies, precipitation from GPM33, and elevation data from the Shuttle Radar Topography Mission (SRTM)34. The tool provides a simple and reliable solution for runoff generation, which will be of value and interest to the scientific community because it can be directly used to generate time-series runoff for any watershed (given the availability of a consistent SMAP soil moisture product from April 1, 2015, until near-real time, and rainfall data from GPM). We further validated our results with daily discharge data from USGS (2015–2020), and annual runoff data from NASA’s Global Land Data Assimilation System (GLDAS)35 and from the European Centre for Medium-Range Weather Forecasts (ECMWF)36.

Fig. 1
Workflow of generating global runoff data in Google Earth Engine.

Methods

Datasets acquisition

We used several global datasets to generate runoff: the recently developed GCN250m global curve number dataset, soil moisture from the Soil Moisture Active Passive (SMAP) mission, elevation from the Shuttle Radar Topography Mission (SRTM), and precipitation from the Global Precipitation Measurement Mission (GPM). The publicly available GCN250m dataset consists of three curve number maps accounting for three antecedent runoff conditions (dry, average, and wet) at a 250 m spatial resolution1. We imported the GCN250m dataset into GEE assets to use it in runoff calculations. The SMAP mission utilizes L-band radar and radiometer instruments and provides direct observations of global soil moisture every two to three days at a 9 km spatial resolution32. From the SMAP mission, we used the surface soil-moisture anomalies (ssma) determined from the climatology of the day of interest. SMAP was accessible on GEE with a one-week delay. We obtained daily precipitation (in mm) from the 3-hourly GPM product available on GEE in near-real time. All datasets were prepared using GEE. Streamflow data from the U.S. Geological Survey (USGS) Surface-Water Data for the USA were obtained for runoff validation. Results were compared against USGS runoff, European Centre for Medium-Range Weather Forecasts (ECMWF) runoff and Global Land Data Assimilation System (GLDAS) runoff.

Runoff estimation and adjustments

The NRCS-CN method

From daily rainfall, we estimate rainfall-runoff using the original NRCS-CN runoff equation37:

$$Q=\frac{{(P-{I}_{a})}^{2}}{(P-{I}_{a})+S}\quad P > {{\rm{I}}}_{{\rm{a}}},\quad {\rm{else}}\;Q=0$$
(1)

Where Q, P, Ia, and S are the runoff depth, rainfall depth, initial abstraction, and potential maximum retention, respectively. Initial abstraction, which consists of infiltration, surface depression storage and canopy interception during the early parts of a storm37, was assumed to be a function of the potential maximum retention (Ia = λS). The NRCS recommends λ = 0.2 for widespread use. We use λ = 0.2 because the original curve number calculation was based on λ = 0.2. The potential maximum retention, S, can be determined from the curve number dataset using the equation:

$$S=25.4\left(\frac{1000}{CN}-10\right)\quad \quad {\rm{For}}\;{\rm{S}}\;{\rm{in}}\;{\rm{mm}}$$
(2)

Because S is in millimeters, P and Q are also expressed in millimeters.
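As an illustration of Eqs. (1) and (2), the following minimal Python sketch computes runoff depth from a daily rainfall depth and a curve number. It uses the standard form of the NRCS-CN equation, with denominator (P − Ia) + S. The function names are hypothetical and are not taken from the app's code.

```python
def potential_retention_mm(cn: float) -> float:
    """Eq. (2): potential maximum retention S (mm) from a curve number."""
    if not 0 < cn <= 100:
        raise ValueError("CN must lie in (0, 100]")
    return 25.4 * (1000.0 / cn - 10.0)


def runoff_mm(p_mm: float, cn: float, lam: float = 0.2) -> float:
    """Eq. (1): runoff depth Q (mm) for a rainfall depth P (mm)."""
    s = potential_retention_mm(cn)
    ia = lam * s  # initial abstraction, Ia = lambda * S
    if p_mm <= ia:
        return 0.0  # rainfall does not exceed the initial abstraction
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


if __name__ == "__main__":
    # CN = 80 gives S = 63.5 mm and Ia = 12.7 mm.
    print(round(runoff_mm(50.0, 80.0), 2))
    print(runoff_mm(10.0, 80.0))
```

For CN = 80, a 10 mm storm produces no runoff because it falls below Ia = 12.7 mm, whereas a 50 mm storm yields roughly 13.8 mm.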

Slope adjustments

Terrain slope can affect runoff prediction by reducing infiltration and recession time. Here, we adjust CN values for slope. The average CN map obtained from GCN250m is appropriate for slopes up to 5%. We adjust the CN map for slope based on the Sharpley-Williams method38 presented in Eq. (3).

$$C{N}_{2\alpha }=\frac{1}{3}\left(C{N}_{3}-C{N}_{2}\right)\left(1-2{e}^{-13.86\alpha }\right)+C{N}_{2}$$
(3)

Where CN2α is the slope-adjusted CN for average ARC, CN3 is the CN for wet ARC, CN2 is the CN for average ARC, and α is the terrain slope in m/m.
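A minimal sketch of the slope adjustment in Eq. (3) is given below. The function name is hypothetical, and the inputs are assumed to be single pixel values rather than raster images.

```python
import math


def slope_adjusted_cn2(cn2: float, cn3: float, slope: float) -> float:
    """Eq. (3): slope-adjusted CN for average ARC; slope is in m/m."""
    return (cn3 - cn2) / 3.0 * (1.0 - 2.0 * math.exp(-13.86 * slope)) + cn2


if __name__ == "__main__":
    for alpha in (0.0, 0.05, 0.20):
        print(alpha, round(slope_adjusted_cn2(75.0, 90.0, alpha), 2))
```

Because 13.86 × 0.05 ≈ ln 2, the factor (1 − 2e^(−13.86α)) vanishes at α = 0.05, so the adjustment leaves CN2 unchanged at a 5% slope, lowers it on flatter terrain, and raises it on steeper terrain.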

After adjusting the CN map for the average ARC, we use the table in the National Engineering Handbook – Part 630 Hydrology to obtain values of CN1 and CN3. We apply the equations below37,38 to compute CN1 and CN3 from the slope-adjusted CN2:

$$C{N}_{1}=C{N}_{2\alpha }-\frac{20\left(100-C{N}_{2\alpha }\right)}{100-C{N}_{2\alpha }+{e}^{\left[2.53-0.0636\left(100-C{N}_{2\alpha }\right)\right]}}$$
(4)
$$C{N}_{3}=C{N}_{2\alpha }\times {e}^{\left[0.00673\left(100-C{N}_{2\alpha }\right)\right]}$$
(5)
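Eqs. (4) and (5) can be sketched as follows, using the coefficients exactly as printed above. The function names are hypothetical.

```python
import math


def cn1_from_cn2(cn2a: float) -> float:
    """Eq. (4): dry-ARC curve number from the slope-adjusted CN2."""
    d = 100.0 - cn2a
    return cn2a - 20.0 * d / (d + math.exp(2.53 - 0.0636 * d))


def cn3_from_cn2(cn2a: float) -> float:
    """Eq. (5): wet-ARC curve number from the slope-adjusted CN2."""
    return cn2a * math.exp(0.00673 * (100.0 - cn2a))


if __name__ == "__main__":
    cn2a = 75.0
    print(round(cn1_from_cn2(cn2a), 1), cn2a, round(cn3_from_cn2(cn2a), 1))
```

For a slope-adjusted CN2 of 75, this gives a CN1 near 57 and a CN3 near 89, so the three values keep the expected ordering CN1 < CN2 < CN3. At CN2 = 100 both equations return 100.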

Accounting for antecedent runoff condition

The antecedent runoff condition (ARC) is affected by rainfall intensity and duration, soil moisture conditions, vegetation cover, stage of growth and temperature1. The CN values vary depending on ARC. We use surface soil-moisture anomalies (ssma) obtained from SMAP to retrieve the appropriate curve number based on the ARC. Since SMAP data is available every two to three days, we interpolate the anomaly values on days where data is missing. We retrieve the appropriate CN value using an “if – then” algorithm:

If ssma ≤−1.5, then CN = CN1 (dry ARC);

If ssma ≥1.5, then CN = CN3 (wet ARC);

Otherwise, CN = CN2 (average ARC)
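The selection rule above, together with gap filling of the anomaly series, can be sketched as below. The text does not state which interpolation scheme is used, so linear interpolation between the nearest observations is an assumption here, as is holding the nearest value at the ends of the series. All function names are hypothetical.

```python
def fill_gaps(values):
    """Fill None entries by linear interpolation between known neighbours.

    Assumption: gaps at either end of the series take the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one observation is required")
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            lo, hi = before[-1], after[0]
            w = (i - lo) / (hi - lo)
            filled.append(values[lo] + w * (values[hi] - values[lo]))
        else:
            nearest = before[-1] if before else after[0]
            filled.append(values[nearest])
    return filled


def select_cn(ssma, cn1, cn2, cn3, threshold=1.5):
    """Pick the curve number that matches the antecedent runoff condition."""
    if ssma <= -threshold:
        return cn1  # dry ARC
    if ssma >= threshold:
        return cn3  # wet ARC
    return cn2      # average ARC


if __name__ == "__main__":
    ssma_series = fill_gaps([-2.0, None, None, 1.0, None, 1.8])
    print([select_cn(a, 57.0, 75.0, 89.0) for a in ssma_series])
```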

Development of the app and upstream watershed delineation

We developed an application that provides users with on-click pixel runoff at 250 m resolution and a time-series of CN-based runoff over a point or watershed of interest. Upstream basins are delineated automatically within the app using the HydroBASINS product. Three columns from the attribute table of the HydroBASINS product are used: the “main_bas”, the “hybas_id” and the “next_down” columns. Sub-basins that share the same main basin (i.e. “main_bas” value) are filtered based on the point of interest. The “next_down” column represents the ID of the next downstream polygon. We use it along with the “hybas_id” column to navigate downstream using a look-up dictionary. The resulting output is the upstream basin of the point of interest. Users have the option to either draw their point/basin of interest or use the upstream basin generation method in the app (which is based on the topological concept of the Pfafstetter coding system39). The app provides users with the option to export both the time-series runoff as csv and the delineated upstream basin as a shapefile.
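One way to realise the look-up described above is sketched below in plain Python. It is an illustration under stated assumptions, not the app's code: the records are a toy attribute table, the function name is hypothetical, and a “next_down” value of 0 is taken to mean that the polygon has no downstream neighbour. A sub-basin is kept when following “next_down” from it eventually reaches the sub-basin that contains the point of interest.

```python
def upstream_basins(records, target_id):
    """Return the hybas_id values that drain through the target sub-basin."""
    target = next(r for r in records if r["hybas_id"] == target_id)
    # Keep only sub-basins that share the target's main basin.
    same_main = [r for r in records if r["main_bas"] == target["main_bas"]]
    # Look-up dictionary: sub-basin ID -> ID of the next downstream polygon.
    next_down = {r["hybas_id"]: r["next_down"] for r in same_main}

    upstream = set()
    for start in next_down:
        current, visited = start, set()
        # Walk downstream until the outlet (0) or the target is reached.
        while current in next_down and current not in visited:
            if current == target_id:
                upstream.add(start)
                break
            visited.add(current)
            current = next_down[current]
    return upstream


if __name__ == "__main__":
    toy = [
        {"hybas_id": 1, "next_down": 0, "main_bas": 1},
        {"hybas_id": 2, "next_down": 1, "main_bas": 1},
        {"hybas_id": 3, "next_down": 2, "main_bas": 1},
        {"hybas_id": 4, "next_down": 2, "main_bas": 1},
        {"hybas_id": 5, "next_down": 1, "main_bas": 1},
        {"hybas_id": 6, "next_down": 0, "main_bas": 6},
    ]
    print(sorted(upstream_basins(toy, 2)))
```

In this toy network, sub-basins 3 and 4 drain into 2, so the upstream basin of sub-basin 2 is the union of polygons 2, 3 and 4.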

The application of Geographic Information Systems (GIS) in hydrologic modeling and water resources management requires large volumes of data and substantial computing resources. The developed runoff app takes advantage of GEE’s powerful cloud computing environment to process big data in minutes, thus overcoming the issue of limited computing capacity and saving time. The global runoff generator presented herein is based on the assimilation of three global datasets: the GCN250 curve number dataset, the GPM rainfall product, and the soil moisture product (SMAP). As an application in GEE, the GCN250 Runoff app can be used by users without prior knowledge of JavaScript, and without the need to import or store any data required for runoff calculation. The major capability of the GCN250 Runoff app is that a time-series of rainfall-runoff for any watershed can be generated for an extended period of record within minutes, which reduces the time and effort needed compared to other methods of runoff generation. This would reduce the workload of hydrologists and researchers who work on hydrologic modeling and runoff generation using the NRCS-CN model. Time-series daily runoff values are plotted over either a point, a basin selected from the HydroSHEDS basins collection, or an upstream basin generated from an outlet point of interest (Fig. 2). Users can use the HydroBasin Tool to choose the HydroBASIN level of interest, and the Upstream Basin Tool to delineate the upstream basin of a user-defined outlet point.

Fig. 2
GCN250 Runoff application interface in Google Earth Engine.

The output of the app includes mapping runoff, rainfall, and curve number values anywhere on land, exporting time-series values as csv, charting time-series (Fig. 2), and exporting the upstream basin as a shapefile. Other uses of the app include exploring runoff and rainfall time-series for different watersheds globally, visualizing daily runoff, rainfall and curve number values based on change in soil moisture anomalies, and exporting the time-series of daily runoff.

Figure 2 shows the GCN250 Runoff app in Google Earth Engine. Designed to automate the process of hydrologic analysis, the app facilitates the generation of rainfall-runoff using the curve number method. Obtaining time-series rainfall-runoff values on-click makes hydrologic modeling more efficient. In addition to generating runoff values, users can simultaneously observe the response of soil moisture anomalies, curve number values, and runoff to changes in rainfall at the pixel level (250 m resolution) and the watershed level, globally. The app requires only two inputs: a period of study (start and end date) and a watershed of interest (or point of interest). No prior knowledge of GEE or JavaScript is required to run the app. All the layers and datasets that are needed for runoff calculation are available as GEE datasets and need not be imported by the user. The Antecedent Runoff Condition (ARC) is automatically selected per pixel based on the surface soil moisture anomaly values.

Data Records

The 2021 mean annual GCN250 Runoff dataset at 250 m resolution is publicly available in the Figshare database (https://doi.org/10.6084/m9.figshare.19596157)40. The product is stored in GeoTIFF format at 7.5 arc-second (~250 m) spatial resolution in a geographic coordinate system using the World Geodetic System 1984 (WGS84) datum. All data can be generated by any user using the publicly available application and can be exported in csv format in Google Chrome (https://jaafarhadi.users.earthengine.app/view/runoff-from-gcn250). The monthly mean GCN Runoff data for 2015–2021 is hosted on GEE as an Image Collection (https://bit.ly/3T321KH). The pixel values need to be multiplied by 0.001 to obtain the runoff in mm. Figure 3 shows the GCN250 Runoff dataset for 2021.

Fig. 3
Generated annual mean runoff for 2021 using the GCN250 Runoff application.

Technical Validation

The generated runoff prediction accuracy was assessed by comparing it to the observed USGS streamflow data. It remains a challenge to directly compare runoff observations from USGS to our CN generated runoff. Pixel-based generated runoff provides information on the spatial distribution of runoff within the watersheds which is not the case when using the observed streamflow data from USGS. However, it is crucial that generated runoff is evaluated using the observed runoff. To quantitatively evaluate the generated rainfall-runoff accuracy, we used statistical performance indices in combination with graphical performance measures. The combination of quantitative statistical indices and graphical performance measures provides robust model accuracy and performance assessment41. The most common statistical indices were used: the Coefficient of determination (R2), Root Mean Square Error (RMSE), and Nash-Sutcliffe efficiency (NSE). We used time-series graphs for visual comparison and to better observe and identify model bias (i.e., when and where the model is performing inadequately).
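For reference, the three indices can be computed as in the sketch below. R2 is taken here as the squared Pearson correlation between observed and simulated values, which is an assumption about the exact definition used. The function names are hypothetical.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and simulated series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    sxy = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((s - ms) ** 2 for s in sim)
    return sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    simulated = [2.0, 3.0, 4.0, 5.0]  # constant bias of +1
    print(r_squared(observed, simulated), rmse(observed, simulated), nse(observed, simulated))
```

In this toy case a constant bias leaves R2 at 1 while NSE drops to 0.2, which illustrates why the indices are reported together rather than individually.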

The validation results show that the GCN250 Runoff app provides runoff data that correlates well with ground-truth discharge data from USGS. Further development of the app can be performed by integrating more functions and equations or applying automatic calibration to the model.

Runoff Validation with USGS discharge data

We validated the generated runoff results against 16 watersheds in the US from 8 hydrologic units having 4 different climatic characteristics for the period 2015–2020. The USGS national database includes a wide range of hydrologic data that is freely available to the public through their web interface (http://waterdata.usgs.gov). Sixteen sites were selected for validation (Fig. 4).

Fig. 4
Selected watershed locations used for validation.

We aggregated the daily runoff data, both observed and predicted, into monthly means to study the correlation between the observed and predicted runoff values (in mm/month). Watersheds in hydrologic unit 11 showed the largest correlation (goodness-of-fit, R2 = 0.8), along with one watershed from hydrologic unit 12 (R2 = 0.8). There was a good correlation (R2 > 0.5) in seven watersheds coming from different hydrologic units. Moderate correlation was found in five watersheds (R2 > 0.375), and weak correlation in one watershed (R2 = 0.2). As shown in Table 1, goodness-of-fit (R2) varies between watersheds within the same hydrologic unit which we believe is due to difference in specific watershed characteristics such as morphologic and vegetation factors.
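The aggregation step can be sketched as follows, assuming that each daily series is a list of (date, value) pairs and that the monthly value is the mean of the daily values in that month. Summing instead of averaging would give a monthly total. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_means(daily):
    """Group (date, value) pairs by (year, month) and average each group."""
    buckets = defaultdict(list)
    for day, value in daily:
        buckets[(day.year, day.month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


if __name__ == "__main__":
    series = [
        (date(2020, 1, 1), 1.0),
        (date(2020, 1, 2), 3.0),
        (date(2020, 2, 1), 5.0),
    ]
    print(monthly_means(series))
```

Applying the same function to the observed and the predicted series yields two monthly series with matching keys, which can then be compared directly.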

Table 1 Selected sites for validation and their corresponding hydrologic unit number, goodness of fit (R2) and root mean square error (RMSE).

Performance by hydrologic units, climate, and soil characteristics

Results showed a strong correlation for watersheds in hydrologic units 3, 10, 11 and 12, a moderate correlation for watersheds in hydrologic units 7, 8 and 13, and a weak correlation for watersheds in hydrologic unit 5. The strongest correlation (R2 = 0.898) was found in watersheds of hydrologic unit 11, characterized by impermeable soils and bedrock and a humid climate. Watersheds in hydrologic unit 12, characterized by permeable soils, impermeable bedrock and a semi-arid climate, also showed a strong correlation (R2 = 0.76). A strong correlation (R2 = 0.7) was found in watersheds of hydrologic units 3 (sub-humid climate) and 10 (arid climate), characterized by impermeable soils and permeable bedrock. A good correlation was found in watersheds of hydrologic units 8 and 13, characterized by impermeable soils and bedrock and a semi-arid climate. Watersheds in hydrologic unit 7, characterized by a humid climate and permeable soils and bedrock, showed a moderate correlation (R2 = 0.5). There was a weak correlation (R2 = 0.2) in hydrologic unit 5, characterized by permeable soils and bedrock and an arid climate. The Root Mean Square Error (RMSE) ranged between 0.01 (HUC 13) and 0.29 (HUC 3). Similarly, NSE values ranged between 0.01 (HUC 5) and 0.87 (HUC 11). The variations in R2, RMSE, and NSE for the different hydrologic units are shown in Fig. 5a.

Fig. 5
Comparison of USGS observed runoff and GCN250 generated runoff (mm/month) (a) Correlation results grouped by hydrologic unit code (HUC); (b) Correlation results grouped by climatic characteristics and soil permeability.

Climatic factors, such as rainfall and temperature, have a profound influence on runoff and groundwater. Thus, we group watersheds into four climate zones (arid, humid, semi-arid, and sub-humid) to study how well the model works under different climatic characteristics (Fig. 5b). Results showed a strong correlation in humid climates (R2 = 0.85, RMSE = 0.17, NSE = 0.82), followed by semi-arid (R2 = 0.73, RMSE = 0.12, NSE = 0.697) and sub-humid climates (R2 = 0.73, RMSE = 0.29, NSE = 0.673). The weakest correlation was observed in watersheds of arid climates (R2 = 0.49, RMSE = 0.1, NSE = 0.456).

Knowing that soil permeability is among the physical features that affect runoff, we divide the watersheds based on soils and bedrock permeability into four groups (Fig. 5b). Group 1: watersheds of impermeable soils and bedrock, group 2: watersheds of impermeable soils but permeable bedrock, group 3: watersheds of permeable soils but impermeable bedrock, and group 4: watersheds with permeable soils and bedrock. Results showed that watersheds from group 1 had the highest R2, NSE and lowest RMSE value (R2 = 0.89, RMSE = 0.12, NSE = 0.868). Similarly, watersheds of groups 2 and 3 showed strong correlation (R2 = 0.75). There was a weak correlation in watersheds of permeable soils and bedrock (R2 = 0.243).

We also examine how watershed area and slope affect the correlation results. The predicted runoff values show good correlation for watersheds of different sizes, with the lowest correlation (R2 = 0.6) for areas smaller than 128 sq. km and the greatest correlation for watersheds with areas between 128 and 140 sq. km. Runoff results from watersheds with steeper slopes (>7%) showed lower agreement (lower R2) with USGS observed runoff data than watersheds with slopes of 2.5–7% (R2 ≥ 0.65).

Validation with GLDAS and ECMWF Runoff

We further validate the GCN250 Runoff dataset (average runoff for 2021) against runoff from GLDAS and ECMWF for the following regions: the US excluding Alaska, Sacramento (USA), Congo (Africa), Danube (Europe), Amazon (South America), Euphrates and Tigris (Syria, Iraq, Turkey), and part of East Africa (Fig. 6). We also compare the runoff ratios (average runoff/average rainfall) for the same regions (Fig. 7). Average annual runoff ratios were generated using the average annual rainfall (mm/day) from GPM (for the GCN250 runoff ratio), GLDAS rain (for the GLDAS runoff ratio) and ECMWF precipitation (for the ECMWF runoff ratio). Results show good agreement between the ECMWF and GCN250 runoff ratios in the United States (Fig. 7a). For the Congo, Euphrates and Tigris, and East Africa regions, the GCN250 runoff ratios lie between those from GLDAS and ECMWF, with ECMWF having the larger values. For the Amazon and Danube basins, the runoff ratios from GCN250 and GLDAS show good agreement, both being smaller than the ECMWF runoff ratios.

Fig. 6
Comparison of average annual runoff (mm/year) for 2021 obtained from GLDAS, ECMWF and GCN250 Runoff for (a) United States, (b) Sacramento, (c) Congo, (d) Danube, (e) Amazon, (f) Euphrates and Tigris, and (g) East Africa. As shown in the figure, the GCN250 average annual runoff (mm/year) lies between the average annual runoff from GLDAS and ECMWF, except for (a) where it is slightly lower than GLDAS and ECMWF (51 mm/year compared to 55 mm/year), and (b) where it is slightly higher (21 mm/year).

Fig. 7
Mean annual runoff ratios of 2021 obtained from ECMWF, GCN Runoff and GLDAS for (a) United States, (b) Sacramento, (c) Congo, (d) Danube, (e) Amazon, (f) Euphrates and Tigris, and (g) East Africa.

We also estimate the global runoff ratio for 2021. The mean global runoff ratio from GCN Runoff was 7.9% compared to 7.7% from GLDAS. The estimated global mean runoff ratio covers latitudes up to 60° N and 60° S and excludes areas at higher latitudes.

The runoff app developed herein is a tool that makes runoff estimation faster and easier. Moreover, both the app and the generated global curve number runoff product are useful for several applications when combined with other data products. For example, users may want to utilize the output of the app (or the generated runoff product) in water balance applications. The app will provide them with the runoff and rainfall components, but they would still need to obtain other types of information, such as total water storage, recharge to the water table, change in soil moisture, or evapotranspiration for the area (or basin) of interest. Another useful application of the generated runoff product and app is flood risk assessment: if combined with slope/aspect data and upslope area, the global runoff data can be used for flood risk assessment. Users can use the app, the generated runoff collection, or the existing code40, which will save them considerable time and effort if they decide to build on this data for future work. The uncertainty of the GCN250 Runoff product is related to the uncertainty of the GPM rainfall dataset, the SMAP anomalies dataset, and the Curve Number (GCN250) dataset. In particular, the accuracy of the results depends on the accuracy of the rainfall data from the GPM dataset. The GCN250 app can be further developed into a hydrologic modeling tool where runoff is generated from different equations according to users’ choice. For example, different slope-adjustment equations can be added, such as the equations by Ajmal, et al.42 or Huang, et al.43; different CN conversion formulas (from CNI to CNII and CNIII) and different rainfall products can be tested; the initial abstraction value can become a user-defined input; and automatic calibration can be applied. Future work on the GCN250 Runoff app can use machine learning methods, such as random forest, to calibrate the NRCS-CN model. All the input parameters required to calibrate the model are found in GEE except for the in-situ discharge data, which can be imported by the user. In-situ discharge data need to be available for a prolonged period of record, and months with missing data need to be removed for better calibration results.