GCN250, new global gridded curve numbers for hydrologic modeling and design

The USDA curve-number (CN) method is fundamental for rainfall-runoff modeling. A global CN database is not currently available for geospatial hydrologic analysis at a resolution higher than 0.1°. We developed a globally consistent, gridded dataset defining CNs at the 250 m spatial resolution from new global land cover (300 m) and soils data (250 m). The resulting data product – GCN250 – represents runoff for a combination of the European space agency global land cover dataset for 2015 (ESA CCI-LC) resampled to 250 m and geo-registered with the hydrologic soil group global data product (HYSOGs250m) released in 2018. Our analysis indicated that medium to high runoff potential currently dominates the globe, with curve numbers ranging between 75 and 85. Global curve numbers were 62, 78, and 90 for dry, average, and wet antecedent runoff conditions, respectively. Australia has the highest runoff potential, while Europe has the lowest. Runoff ratios compare well with GLDAS. The potential application of this data includes hydrologic design, land management applications, flood risk assessment, and groundwater recharge modeling.

www.nature.com/scientificdata www.nature.com/scientificdata/ and PROBA-V data for 2013-2015. We generated the first global gridded CN dataset (GCN250) from the ESA CCI-LC maps (2015) and the HYSOGs250m soils data based on the USDA curve number tables 4 and plant functional types 10 . The GCN250 datasets represent the global curve numbers at approximately 250 m spatial resolution under dry, average, and wet antecedent runoff conditions (ARC). The soil was assumed undrained soil, and hence the CN of dual HSG were treated the same as the HSG class D.
We believe that this new CN dataset will be of value and interest to the scientific community because it can be directly used to assess time series changes in runoff at global, regional, and watershed scale, given the availability of a consistent ESA land cover product from 1992-2015 11 . The GCN250 dataset is valuable for hydrological analysis and design, flood risk assessment, and mapping, watershed water management, and other related applications. Rainfall-runoff modeling is a potential application given the available techniques in downscaling gridded precipitation data.

Methods
We used three main inputs to generate the GCN250 datasets ( Fig. 1): a land use/land cover map, a hydrologic soil group map, and three CN look-up tables. For the land cover product, we used the most recent ESA-CCI LC data of 2015 (ESA European Space Agency 9 ). The hydrologic soil groups were acquired from Ross, et al. 8 10 . The detailed description for each land cover class is provided by ESA European Space Agency 9 and Poulter, et al. 10 . We mapped the PFT into Land cover classes according to NEH-630 classification (Table 1). Because not all plant functions types exist in the NEH-630 classification, we made certain assumptions to make sure that all PFTs of the ESA land cover classes have a corresponding NEH-630 land cover class. Trees PFT were mapped as woods classes in NEH-630, while shrubs in PFT were mapped as desert shrubs and brushes classes in NEH-630. The hydrologic conditions of the wood and shrubs/brushes classes were used to distinguish between the PFT types of trees and shrubs. For example, broadleaf evergreen trees have a lower CN than broadleaf deciduous, and therefore we mapped them as woods with under good hydrologic conditions. Similarly, deciduous needle leaf trees have a higher runoff potential than any other tree type. Therefore, we mapped them as woods under a poor hydrologic condition. We mapped natural grass in the PFT class as the NEH-630 grasslands class under good hydrologic conditions. For managed grasslands (which are rain-fed croplands in ESA classes), we calculated CN values from the average CN of all cropland types in NEH-630 (row crops, small grains, and close-seeded or broadcast legumes or rotation meadow) under good hydrologic conditions. Bare soil CN values were matched with the fallow bare soil.
Composite curve number. Curve numbers used in generating GCN250 are presented in Online-only  where CN k is the curve number of the class for the hydrological soil group k, n is the total number of PFT types (i) present in the ESA-CCI LC class, CN i is the curve number for the individual PFT class -hydrological soil group k combination and FC i is the percentage of the PFT type i of the total LC class n based on ESA-CCI LC description detailed in the ESA LC product user guide 9 .
Some exceptions are listed hereafter. Curve numbers for irrigated croplands were assumed equal to those for managed grasslands under poor hydrologic conditions (which are expected to increase runoff and therefore emulate an irrigated field). For unconsolidated bare areas, CN values were matched with the fallow bare soil in NEH-630, as well for ESA bare areas LC type. For consolidated bare areas, CN values were averaged from impervious areas according to NEH-630. Zones that have dual hydrological soil groups were assumed to be undrained. The ESA CCI-LC dataset had a spatial resolution of 300 m, and it was resampled to 250 m using the nearest neighbor method to match the HYSOGs250m spatial resolution. We used two input datasets (HYSOGs250m and the resampled ESA CCI-LC) with three CN look-up tables (one for each ARC) to generate the curve number products at the 250 m resolution (GCN250).
Curve number for various antecedent runoff conditions. The CN values vary depending on antecedent runoff conditions (ARC), which is affected by the rainfall intensity and duration, total rainfall, soil moisture conditions, cover density, stage of growth, and temperature 14 . For this reason, we generated three curve number maps for three ARCs: dry (Fig. 2a), average (Fig. 2b), and wet ( Fig. 2c) ARC conditions. USDA 14 provided guidelines to convert CN values from average ARC conditions into wet and dry ARC conditions.

Data Records
There are three data records associated with this work: one raster dataset for each of the antecedent runoff conditions (ARCI = dry, ARCII = average, ARCIII = wet). The GCN250 product is publicly archived in Figshare 15 . The product is stored in GeoTiff format at 7.5 arc-second (~250 m spatial resolution) using the World Geodetic System 1984 (WGS84) datum geographic coordinate system. Table 2 presents a summary of the CN values mapping at the global and the continental scale. Figure 3 shows the distribution of the curve numbers for selected world river basin regions: (a) Amazon, (b) Mississippi, (c) Mekong, and (d) Nile and Tigris-Euphrates.

Technical Validation
The uncertainty of the GCN250 product is related to the uncertainty of the global ESA-CCI-LC dataset, the soils classification data, and the uncertainty in the CN look-up table. The compositing of CNs is affected by the accuracy of the Land cover classes, and the accuracy of hydrologic soil groups classifications in the HYSGOG250m database. Using gridded precipitation products from the Global Land Data Assimilation System (GLDAS) 16 , we compared the daily and the monthly runoff ratios resulting from the three GCN250 products with those of GLDAS runoff.
Land cover. The ESA CCI-LC was evaluated by independent means where validation was carried out by external partiesfnline 9 . When validating the 2015 ESA CCI-LC product, the accuracy level was found to be between 71.1% and 75.4%. User accuracy was high (83-97%) for croplands (irrigated and rainfed), broadleaved evergreen forests, urban areas, bare areas, water bodies, and permanent snow and ice. Low user accuracies were observed for the classes of natural vegetation, lichens and mosses, sparse vegetation, flooded forest, and forests (mixed broadleaf and needle leaf). We expect curve numbers of these areas to have a higher uncertainty www.nature.com/scientificdata www.nature.com/scientificdata/ compared to other areas with better LC classification accuracy. Moreover, because the CN method was originally developed based on observation of rainfall-runoff relationships in small agricultural watersheds, we expect some uncertainty in curve numbers developed for forested watersheds in humid environments and deep soils.  www.nature.com/scientificdata www.nature.com/scientificdata/ Hydrologists should proceed with caution when using these CNs for design and other hydrologic applications and should always compare generated runoff with observed values whenever possible.
Hydrologic soil groups. Ross,et al. 8 described the uncertainty assessment of the HYSOGs250m product that we used to generate the GCN250 dataset. The root means square error (RMSE) of the soil grids used as input to HYSOGs250m are between 9.5% and 13.1%. HYSOGs250 also included the groundwater table depth metric to capture broad-scale patterns of groundwater with a coefficient of variation of 9%. We expect these uncertainties to carry over into the GCN250 product.   www.nature.com/scientificdata www.nature.com/scientificdata/ Comparison with other curve number datasets. Table 3 shows a comparison of CN values between our GCN250 and CN reported by Zeng, et al. 5 for several large basins in the world. We believe that the (1-17%) higher values of composite GCN250 curve numbers compared to those reported in the Zeng, et al. 5 under average ARC conditions for the studied basins to be mainly due to the difference in the soils input map. According to Zeng, et al. 5 , the global soil distribution is dominated by moderately low runoff potential (37% soil group B), whereas it is dominated by moderately high runoff (57% soil group C) according to Ross,et al. 8 , and therefore the resulting CNs are higher for our results due to the prevalence of higher runoff soils. The CNs are highly impacted by soil groups. For example, woods in poor conditions in a B hydrologic soil group would have a CN of 66, while the CN in a C hydrologic soil group would be 77, which doubles the runoff from say a 75 mm rainfall event. Furthermore, Ross, et al. 8 factored the depth to impermeable layers (bedrock) and depth to the groundwater in the HYSOGs250m product, and both variables were absent from the Zeng, et al. 5 study. Zeng, et al. 5 used 17 classes from MODIS at 500 m resolution to generate the CN dataset, whereas we used the 36 classes of the 300-m ESA-CCI LC product to create the GCN250 product. We believe that the GCN250 dataset better captures land cover variations and soils hydrologic classifications than currently existing curve number datasets.
Comparison of GCN250 runoff with GLDAS runoff. We compared GCN250-generated runoff ratios to GLDAS runoff ratios (daily runoff/daily rainfall) for the following river basins: Amazon River Basin (South Fig. 4 Comparison between GCN250 and GLDAS daily runoff ratios derived from GLDAS rainfall for major world basins and East Africa. Wet = GCN250 ARCIII runoff ratio, Average = GCN250 ARCII runoff ratio, Dry = GCN250 ARCI runoff ratio. www.nature.com/scientificdata www.nature.com/scientificdata/ America), Colorado River basin (USA, Mexico), Mississippi (USA), Mekong River Basin (Cambodia, Vietnam, Thailand, Laos, Myanmar, China), Nile River Basin, Sacramento (USA), Tigris-Euphrates (Turkey, Iraq, Syria, Iran), Uruguay, Yangtze (China), also the area of East Africa (Fig. 3). Using daily gridded rainfall (aggregated from 3-hourly 0.25° × 0.25° rainfall) from the Global Land Data Assimilation System (GLDAS v2.1) 16 , we generated gridded daily runoff ratios from the GCN250 datasets (one for each of the three antecedent runoff conditions) and compared it to the aggregated daily runoff ratios (from the 3-hourly runoff) from GLDAS for 2015-2018. We applied the rainfall from GLDAS grid onto each CN pixel within that grid, generated the CN runoff, aggregated the runoff over the basin, and then calculated the mean runoff ratio by dividing the mean runoff by the mean rainfall over the basin for that day. We compared the time series of the daily mean runoff ratios from the two sets Comparison between GCN250 and GLDAS mean monthly runoff ratios derived from GLDAS rainfall for major world basins and East Africa. Wet = GCN250 ARCIII runoff ratio, Average = GCN250 ARCII runoff ratio, Dry = GCN250 ARCI runoff ratio.
www.nature.com/scientificdata www.nature.com/scientificdata/ (GLDAS and GCN250) (populated over all the basins individually) (Fig. 4) and the mean monthly runoff ratios (Fig. 5). Results show that correlations between mean monthly GLDAS runoff ratios and GCN250 runoff ratios varied by basin and by rainfall. For example, good agreement between GLDAS runoff ratio and GCN250 average CN runoff ratio was noticed in the Nile and the Amazon basins and also (to a lower degree) the Tigris-Euphrates, while for Sacramento River basin, the GLDAS runoff ratio was in agreement with the GCN250 wet CN runoff ratio. For the Yangtze River basin, we noticed good agreement between GCN250 wet CN runoff ratios and GLDAS runoff ratios for December and April of every year. In summer, GLDAS runoff ratios ranged between the GCN250 average CN runoff ratio and the GCN250 wet CN runoff ratio. In the Mississippi basin, GCN250 average CN runoff ratio agreed with GLDAS runoff ratio during the May-December. For the winter months, the GLDAS ratios were closer to the GCN250 wet CN runoff ratio. Basins dominated by snowmelt (such as the Colorado River basin) showed temporal disagreement. We attribute this variability in the results mainly to the assumption of a constant average initial abstraction of 20% of the CN method and to the variability in the ARC conditions within a watershed in space and in time. We note that the objective of this exercise is not to validate the CN method against other direct runoff data, but rather to provide insights into how close the GCN250-based runoff is to GLDAS and possibly guidance on how the dataset can be used in different geographic and climatic conditions. Users are encouraged to check local conditions and runoff trends whenever available in their area of interest before deciding on which of the three GCN250 products (ARCI, ARCII, and ARCIII) to use. The best approach would be to utilize a combination of watershed land use knowledge, rainfall intensity, plant growth stage (for agricultural watersheds), antecedent moisture (from gridded soil moisture datasets), and precipitation products to determine the antecedent runoff conditions.