Global gridded crop harvested area, production, yield, and monthly physical area data circa 2015

Here we provide an update to global gridded annual and monthly crop datasets. This new dataset uses the crop categories established by the Global Agro-Ecological Zones (GAEZ) Version 3 model, which is based on the Food and Agriculture Organization of the United Nations (FAO) crop production data. We used publicly available data from the FAOSTAT database as well as the GAEZ Version 4 global gridded dataset to generate circa 2015 annual crop harvested area, production, and yields by crop production system (irrigated and rainfed) for 26 crops and crop categories globally at 5-minute resolution. We additionally used available data on crop rotations, cropping intensity, and planting and harvest dates to generate monthly gridded cropland data for physical areas for the 26 crops by production system. These data are in standard georeferenced gridded format, and can be used by any global hydrology, land surface, or other earth system model that requires gridded annual or monthly crop data inputs.

FAOSTAT 5 (average of period 2009-2011) to 5-minute gridded products of irrigated and rainfed harvest area, production, and yield, as well as value of crop production. The downscaling process is a sequential, iterative rebalancing procedure that relies on optimization principles 8 suitable for situations when the aggregate observed information is available (here FAO national crop statistics), constrained by the priors (i.e., the intermediate spatial maps), other available statistical data, and expert opinion 6 . Here we present our methodology for updating the GAEZ 2010 products to 2015, providing more recent data on production, yield, and harvested areas; these data are available to the wider global modeling community. The new dataset is referred to here as GAEZ+ 2015.
Updating to 2015 was done using FAOSTAT 5 country-level information on crop harvest area, crop production, and livestock herd size changes from 2010 to 2015, along with the FAO GeoNetwork's Global Administrative Unit Layer (GAUL) 7 to map 5 minute grid cells to countries. We used the MIRCA2000 4 information on multi-cropping patterns and monthly crop calendars to convert the updated GAEZ+ 2015 crop harvest areas into monthly cropland physical areas by crop and production system. This method uses national-level data to update a gridded product; as such, it is important to note the limits of this data set. GAEZ+ 2015 is not based on remote sensing observations of cropland presence or absence in a given grid cell, and therefore represents aggregate changes in cropland extent but not high-resolution changes. Further, information on crop calendars and rotations is taken from the MIRCA2000 4 dataset, so GAEZ+ 2015 does not include information on changes to crop calendars or rotations in the past 15 years. This data product is meant to be an intermediate data set for use by those who need crop layers consistent with country level FAO-reported 2015 statistics while we await the release of other products such as the crop-and production-specific datasets that are promised from the Global Food Security-Support Analysis Data (GFSAD) 9 and MapSPAM 10 , both of which use remote sensing to better identify changes in crop locations and calendars.
The data provided here follow commonly used file formats (geotiff and netCDF) and metadata standards familiar to global gridded modeling communities. The re-use value is high, as there are a number of Global Hydrologic Models and Land Surface Models that use this type of data. Additionally, interdisciplinary research teams are increasingly using global gridded crop data, e.g., in global economic models such as SIMPLE-G 11 and integrated assessment models such as the Global Change Analysis Model GCAM 12 . Further, this dataset uses the GAEZ crop list and ensures consistency with FAO administrative level agricultural data, thereby connecting this dataset to already widely used crop categories and more aggregate publicly available data that is already in use. FAO is currently using the GAEZ+ 2015 data in an ongoing study.

Methods
Here we describe methods for the GAEZ+ 2015 Annual Crop Data, and the GAEZ+ 2015 Monthly Cropland Data. The Annual Crop Data was generated first, then the Monthly Cropland Data was calculated based on the Harvest Area results of the Annual Data (Fig. 1).

Fig. 1
Schematic overview of annual and monthly data production methods. The GAEZ+ 2015 products described in this paper are in dark blue boxes; publicly available data used are in light blue. Dark blue arrows indicate which data are used in each processing step, and grey arrows from steps to data show which steps result in final GAEZ+ 2015 data products. The processing steps listed here are referred to in the Methods section text.
www.nature.com/scientificdata
GAEZ+ 2015 Annual Crop Data Methods. The GAEZ+ 2015 Annual Crop Data updates the 2010 GAEZ v4 crop harvest area, yield, and production maps 6,7 (identified as Theme 5 in ref. 7 ) using national-scale data on the change in crop harvested area and livestock numbers from 2010 to 2015, based on statistics for 160 crop groups, and cattle and buffalo, from FAOSTAT 5 .
Three datasets were used to produce GAEZ+ 2015 Annual Crop Data:
1. FAOSTAT crop production domain: annual, country-level data on crop harvested area (H) and crop production (P) for each crop from the FAOSTAT database (Table 1)
2. GAEZ v4 6,7 gridded global annual harvested area, yield, and production by crop for the 26 FAOSTAT crops and crop categories at 5-minute resolution
3. Global Administrative Unit Layer (GAUL 2012) 13 data. GAUL 2012 reports the fraction of each global 5-minute grid cell that falls within a given country or disputed territory. There are 275 unique global administrative units.
Step 1. Calculate the country-level ratios of change from 2010 to 2015 in harvested area (rH c ) and production (rP c ) for each FAOSTAT crop c. This results in 160 rH and rP values per country. If harvest area and production values for a particular crop are zero or unreported in the FAOSTAT data, then rH c and rP c are both set to 1.0 (i.e., no change from 2010 to 2015). Three years of data are averaged (2009-2011 and 2014-2016) to account for missing data for some country/year combinations and to avoid emphasizing reported outliers.
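As an illustration only, the ratio calculation described above could be sketched in Python as follows. The function name, the use of None for unreported years, and the treatment of a zero mean in either three-year window as "no change" are assumptions of this sketch and are not taken from the published code.

```python
from statistics import mean


def change_ratio(values_2010, values_2015):
    """Return a 2010-to-2015 change ratio from two three-year windows.

    Each argument is a list of annual values (e.g. harvested area for
    2009-2011 and for 2014-2016); None marks an unreported year.
    """
    base = [v for v in values_2010 if v is not None]
    target = [v for v in values_2015 if v is not None]
    # Unreported in either window: assume no change (ratio of 1.0).
    if not base or not target:
        return 1.0
    base_mean = mean(base)
    target_mean = mean(target)
    # Zero in either window: also treated as no change (assumption).
    if base_mean == 0 or target_mean == 0:
        return 1.0
    return target_mean / base_mean


# Hypothetical values: one missing year in the 2010 window.
rH = change_ratio([100, None, 110], [120, 130, 125])
```

Averaging only the reported years keeps a single missing year from removing a country and crop combination entirely, which is the motivation given in the text for using three-year windows.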
Step 2. Aggregate FAOSTAT-based ratios to the GAEZ crop categories: We followed the crop aggregation methods of the GAEZ model to aggregate the FAOSTAT crop list (160 unique crops as of 2019) to 26 crops (see Table 1). For each of the 26 GAEZ crop categories, if there is more than one matching FAOSTAT crop (see Table 1) then we applied an area-weighted average (based on FAOSTAT year 2015 harvested area) of the FAOSTAT crops within each country to the rH and rP values for that crop and country. This results in 26 rH and rP values per country. There was one exception to this: the GAEZ_2010 crop category 'fodder crops' was an aggregate of 17 FAOSTAT crops (see Table 1) for which harvest area data are no longer reported on FAOSTAT; i.e., GAEZ_2010 had obtained FAOSTAT data on fodder crops circa 2010, but FAOSTAT no longer provides any data on fodder crops for any year. We assumed that the 2010 to 2015 fractional change in fodder crop harvest area in each country was proportional to the change in the FAOSTAT reported national herd sizes for cattle and buffalo livestock data 5 for that country, following the same methodology as for crop harvested area change (see Step 2 below). This method assumes a negligible international trade of fodder crops as indicated by bilateral trade matrices available from FAOSTAT.
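The area-weighted aggregation in Step 2 can be sketched as below. This is a minimal example under stated assumptions: the crop names are placeholders rather than entries from Table 1, and returning 1.0 when no 2015 harvested area is reported is a choice of this sketch.

```python
def aggregate_ratio(ratios, areas_2015):
    """Area-weighted mean of FAOSTAT crop ratios for one GAEZ category.

    ratios:     {faostat_crop: change ratio (rH or rP)} for one country
    areas_2015: {faostat_crop: 2015 harvested area} for the same country
    """
    total_area = sum(areas_2015.get(crop, 0.0) for crop in ratios)
    if total_area == 0:
        # Assumption of this sketch: no reported area means no change.
        return 1.0
    weighted = sum(r * areas_2015.get(crop, 0.0) for crop, r in ratios.items())
    return weighted / total_area


# Two hypothetical FAOSTAT crops that map to the same GAEZ category.
category_rH = aggregate_ratio(
    {"crop_a": 1.1, "crop_b": 0.8},
    {"crop_a": 300.0, "crop_b": 100.0},
)
```

Weighting by harvested area means that a large relative change in a crop with very little area has only a small effect on the ratio of the aggregate category.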
Step 3. Apply country-level ratios to grid cells: Calculated country-level ratios were then applied to each grid cell k, using the GAUL_2012 13 definitions for which grid cells fall within which countries. Some grid cells are split between two or more countries. In this case, all model output variables for the grid cell are divided between the countries based on the fraction of grid cell area falling within each country i, so that for harvested area $H_{k,2015} = H_{k,2010} \sum_i f_{k,i}\, rH_i$, and likewise for production $P_{k,2015} = P_{k,2010} \sum_i f_{k,i}\, rP_i$, where $f_{k,i}$ denotes the fraction of grid cell k that falls within country i.
Special Case: Sudan. FAOSTAT reports data for a single Sudan for years before 2011, and separately for Sudan and South Sudan after 2011. To compute the ratios for these grid cells, we split the 2010 data for Sudan into a virtual 'North' Sudan and 'South_Sudan', using the data for the year 2012, which was reported for both countries. We then used these generated 2010 data and applied the same methodology as described above to calculate changes in harvested areas and production in all grid cells in both countries.
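The grid cell update in Step 3 can be illustrated with a short sketch. It assumes that the country fractions for a cell sum to 1 and that a country without a ratio is treated as unchanged; the names and numbers are hypothetical.

```python
def update_cell(value_2010, country_fractions, country_ratios):
    """Scale one grid cell's 2010 value (area or production) to 2015.

    country_fractions: {country: fraction of the cell's area in that country}
    country_ratios:    {country: 2010-to-2015 change ratio for the crop}
    """
    updated = 0.0
    for country, fraction in country_fractions.items():
        # Countries without a ratio are left unchanged (assumption).
        ratio = country_ratios.get(country, 1.0)
        # The cell value is divided by area fraction, then scaled per country.
        updated += value_2010 * fraction * ratio
    return updated


# A border cell split 75% / 25% between two hypothetical countries.
h_2015 = update_cell(10.0, {"A": 0.75, "B": 0.25}, {"A": 1.2, "B": 0.8})
```

For a cell that lies entirely within one country, this reduces to multiplying the 2010 value by that country's ratio.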
Special Case: Small regions and islands. Forty-nine countries, generally small regions or islands, had no data reported for crop harvested area by FAOSTAT. We assumed that there was no change in crop harvested area for the grid cells in these countries. Note that many may have had zero ha as previously reported crop area in GAEZ v4. These countries are (the number following each region is the region's number in ADM0_CODE in the GAUL_2012 data 13 ): Anguilla (9), Aruba (14), Ashmore_and_Cartier_Islands (16), Azores_Islands (74578), Baker_Island (22)
Special Case: Disputed Areas. Some grid cells in the GAUL_2012 13 cell-table database are assigned to nine disputed areas, rather than to specific countries. We assumed that there was no change in crop harvested area or production from 2010 to 2015 for grid cells in these disputed areas. These areas are (the number following each region is the region's number of the ADM0_CODE in the GAUL_2012 13 data): Abyei (102)
This new data product consists of 156 data files in geotiff format: one rainfed file and one irrigated file for each of crop harvested area (1000 ha (10^7 m^2) per 5-minute grid cell), crop production (1000 tonnes (10^6 kg) per 5-minute grid cell), and crop yield (tonnes per ha (10^-1 kg m^-2) per 5-minute grid cell), for each of the 26 GAEZ crops or crop categories in Table 1.
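The file count and units above are mutually consistent, as the following sketch shows. It assumes that yield equals production divided by harvested area, which is the definition of yield rather than a statement about how the files were generated, and it returns zero yield where nothing is harvested, which is a choice of this sketch.

```python
# 26 crops x 3 variables (area, production, yield) x 2 production systems.
N_FILES = 26 * 3 * 2

# 1 tonne = 1000 kg and 1 ha = 10^4 m^2, so 1 t/ha = 0.1 kg per m^2.
T_PER_HA_TO_KG_PER_M2 = 1000.0 / 10000.0


def yield_t_per_ha(production_kt, harvested_area_kha):
    """Yield in tonnes per ha from production (1000 t) and area (1000 ha)."""
    if harvested_area_kha == 0:
        return 0.0  # assumption: zero yield where no area is harvested
    # The factors of 1000 cancel, so the quotient is already in t/ha.
    return production_kt / harvested_area_kha
```

Because production and area are both stored in thousands, no rescaling is needed before dividing one grid by the other.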
GAEZ+ 2015 monthly cropland area methods. Two datasets were used to produce monthly cropland area by crop and by irrigated vs rainfed management. These are:
1. GAEZ+ 2015 Annual Harvested Area 14 (as developed above)
2. MIRCA2000 cropland area 4
Step 5. Harmonize the GAEZ+ 2015 and MIRCA2000 crop lists: The MIRCA2000 4 cropland product provides monthly growing area grids (gridded physical cropland area) for 26 irrigated and rainfed crops and crop categories, as well as cropping calendars that identify the planting month and harvesting month for each crop (via 'subcrops', see below). However, the MIRCA2000 crop list is not the same as the GAEZ+ 2015 crop list; we matched each crop type in the GAEZ+ 2015 crop list to a crop type in the MIRCA2000 crop list to enable the application of MIRCA2000 crop calendars to GAEZ+ 2015 crops (Table 2). Out of the 26 GAEZ+ 2015 crops, 18 had clear 1:1 matching crop categories within MIRCA2000. The remaining 8 crops were matched based on general crop characteristics, i.e., annual vs. perennial, or to unmatched MIRCA2000 cereals.
An essential component of the MIRCA2000 cropland dataset is the identification of subcrop categories within each crop category to split crops into areas grown in different seasons, or crops with different planting and harvesting dates within the same season. Up to 5 subcrops can be defined to represent such multi-cropping practices. Below, we use the following notation:
Step 6. Apply MIRCA2000 monthly crop calendars to GAEZ+ 2015 annual data: To generate the monthly cropland physical area of GAEZ+ 2015 crops, we followed these steps for each GAEZ crop in each grid cell:
1. For a given GAEZ crop in a given grid cell, is the area reported >0 for the matching MIRCA2000 crop?
a. If YES, then use the MIRCA2000 data for the grid cell and crop considered.
b. If NO, then find the closest grid cell with the matching MIRCA2000 crop category, and apply the MIRCA2000 crop rotation from that grid cell to the given crop/grid cell combination for the following steps.
2. Does the matching MIRCA2000 crop category (
3. For each month and each grid cell, check if the sum of all crops (irrigated and rainfed) is greater than 99% of the area of the grid cell. We assume that at least 1% of land must be retained as non-cropland for agricultural infrastructure such as roads, buildings, irrigation infrastructure, and other landcovers (e.g., rivers, wetlands).
a. If NO, then no further processing is done.
b. If YES, then reduce crop area by the excess value based on a removal order (Table 2). Rainfed crops are removed first, in ascending removal order (starting with 1), before irrigated crops are removed, until the cell area is no longer exceeded. A large removal number (e.g., 20) indicates that the crop's land is unlikely to be removed. Large removal numbers are given to the staple crops to ensure these important food-producing lands are consistent with FAOSTAT country data. The maximum monthly amount of physical cropland that was removed by step 3 is 711,543 ha, which is 0.05% of total global cropland physical area.
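The truncation in step 3 can be sketched as follows. This is one possible reading of the removal rule, not the published implementation: it assumes that all rainfed crops are reduced before any irrigated crop, and that within each production system crops are reduced in ascending removal order. The crop names and removal numbers are hypothetical and are not taken from Table 2.

```python
def truncate_to_cell(areas, removal_order, cell_area, max_fraction=0.99):
    """Reduce crop areas so their sum stays within max_fraction of the cell.

    areas:         {(crop, system): area}, system is "rainfed" or "irrigated"
    removal_order: {crop: int}; lower numbers are removed first
    Returns a new dict; the input is left unmodified.
    """
    result = dict(areas)
    excess = sum(result.values()) - max_fraction * cell_area
    if excess <= 0:
        return result  # cell is not over-allocated
    # Rainfed entries sort before irrigated ones, then by removal order.
    queue = sorted(
        result, key=lambda k: (k[1] != "rainfed", removal_order[k[0]])
    )
    for key in queue:
        if excess <= 0:
            break
        cut = min(result[key], excess)
        result[key] -= cut
        excess -= cut
    return result


areas = {("maize", "rainfed"): 60.0, ("wheat", "rainfed"): 30.0,
         ("rice", "irrigated"): 20.0}
order = {"maize": 5, "wheat": 1, "rice": 20}
trimmed = truncate_to_cell(areas, order, cell_area=100.0)
```

In this example the cell is over-allocated by 11 area units, all of which are taken from the rainfed crop with the lowest removal number, leaving the other crops untouched.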
The resulting global gridded data from Step 6 are monthly time series of cropland physical area by crop, subcrop, and production system, called GAEZ+_2015 Monthly Cropland Data 17 . Combining the MIRCA2000 crop calendar and subcrop rotation information with the GAEZ+ 2015 annual data allows for the representation of crop seasonality; e.g., Fig. 2 shows the aggregate monthly cropland physical area for Rice 1 and Rice 2 (two sub-crops of rice) over the northern hemisphere, clearly illustrating the two main rice-growing seasons.
Note that the annual dataset and the monthly data set are stored in two different repositories. The annual dataset is on Harvard Dataverse and the monthly dataset is on GeoHub. The use of different repositories is based on funding and contract obligations.
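How a crop calendar turns subcrop areas into a monthly series can be illustrated with a small sketch. It assumes that a subcrop occupies its area in every month from planting through harvest inclusive, and it takes the per-subcrop areas as given; how those areas are derived from annual harvested area is not reproduced here. All values are hypothetical.

```python
def growing_months(plant_month, harvest_month):
    """Months (1-12) from planting through harvest, wrapping past December."""
    length = (harvest_month - plant_month) % 12 + 1
    return [(plant_month - 1 + i) % 12 + 1 for i in range(length)]


def monthly_area(subcrops):
    """Sum subcrop areas into a 12-element January-to-December series.

    subcrops: list of (area, plant_month, harvest_month) tuples.
    """
    series = [0.0] * 12
    for area, plant, harvest in subcrops:
        for month in growing_months(plant, harvest):
            series[month - 1] += area
    return series


# Two hypothetical subcrops: a main season and a season spanning year end.
series = monthly_area([(5.0, 4, 8), (3.0, 11, 2)])
```

A series built this way shows two distinct peaks when the subcrops do not overlap in time, which is the kind of seasonal pattern described for the two rice subcrops.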

Technical Validation
We compare GAEZ+ 2015 harvested area to FAOSTAT 5 reported harvested area by crop (Table 3) and by country (Online-only Table 1). It would be useful to also compare harvested area to other global gridded datasets, but at the time of this analysis, there is no other global gridded crop harvested area product for the year 2015. GAEZ+ 2015 crop yields and cropland physical area are compared to other publicly available global gridded datasets for the year 2015, as well as to one dataset for cropland presence/absence at the grid cell level for the year 2010. Validation of any global gridded crop dataset is challenged by universal uncertainties in underlying reported crop statistics and in remote sensing methods. Most, if not all, global gridded crop datasets make use of FAO-reported country level statistics, leading to consistency but not validation when datasets are compared. Here, we report comparisons to other datasets so users familiar with those data are aware of similarities and differences, and we compare spatial aggregates not used in the development of the data, e.g., sub-national boundaries and watershed boundaries, as well as grid cell values where available to illustrate the spatial scale at which these data are consistent or inconsistent.
Harvested area comparison. While the goal of this paper is to present year 2015 agricultural data, we first must note any biases apparent in the underlying year 2010 data upon which our product is based, as any bias in the 2010 data will necessarily be carried into the 2015 products. Globally, the sum of all GAEZ v4 2010 harvested areas is 4.2% lower than the FAOSTAT c.2010 total harvested area for all crops, based on all available matching country and crop combinations. The majority of this difference in harvested area is due to the Fruits_&_Nuts category, which is reported in FAOSTAT but not by GAEZ v4 2010. There are also large differences in the Crops_NES and Yams_and_other_roots categories, but in the opposite direction, partially cancelling out the Fruits_&_Nuts discrepancy. The four crops that account for the majority of the world's production (wheat, maize, rice, and soybeans) all match well, with differences between GAEZ v4 2010 and FAOSTAT c.2010 of <1%. Similarly, the top crop producing countries in the world match well between the two datasets, though notably both Indonesia and Thailand have ~10% less total harvested area in GAEZ v4 2010 than in FAOSTAT c.2010.
Since no other global gridded harvested area dataset exists at this time, we have only the FAOSTAT country-level and crop total data for 2015 as a basis for comparison. We do not expect the country or global crop aggregates to match exactly because the FAOSTAT data used to generate GAEZ+ 2015 Harvested Area provided only a change in crop harvested area by country; we did not target or calibrate to the FAOSTAT 2015 reported values, and so comparing these two datasets provides some evaluation of the combination of underlying GAEZ 2010 data with the FAOSTAT-based change ratio. Global harvested area by crop from GAEZ+ 2015 and FAOSTAT is shown in Table 3, and total crop harvested area by country from GAEZ+ 2015 and FAOSTAT is shown in Online-only Table 1. Table 3 presents crops in order of the largest to the smallest global harvested area, according to GAEZ+ 2015. The world's four staple crops (wheat, maize, rice, and soybeans) all have <1% difference in crop harvested area between FAOSTAT and GAEZ+ 2015. As can be seen in Table 1, the FAOSTAT "CropsNES" category is an aggregate of many crops, which individually have small global harvested areas, but collectively are the third largest harvested area category in the world. Despite challenges in reporting of small crop harvested areas, there is a 0.1% difference in global CropsNES harvested area between FAOSTAT 2015 and GAEZ+ 2015. Notably, GAEZ+ 2015 reports 7% (~5 Mha) more Pulses harvested area than FAOSTAT, a difference that is directly inherited from the 7% difference between GAEZ v4 2010 and FAOSTAT 2010 Pulses harvested area. Pulses are the 5th largest harvested area in the world, and an important food crop, particularly in developing countries; this crop warrants more attention from global crop researchers in light of this harvested area difference.
Other crop categories with large (>10%) differences are Other cereals, Yams & other roots, and Banana; these crops account for a small proportion of global crop harvested area, but should be evaluated more carefully for local studies of regions where these crops are nutritionally and/or economically important. The GAEZ+ 2015 data set includes 169,182 thousand hectares of fodder crops; this value is not included in Table 3 because the FAO does not report fodder crop area for 2015.
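The per-crop comparison reported in Table 3 amounts to a relative difference, which can be sketched as below. The dictionary layout and the crop names are assumptions of this sketch, and the numbers are placeholders rather than values from Table 3. Crops with no FAOSTAT value, such as fodder crops for 2015, are skipped.

```python
def percent_differences(gaez_area, faostat_area):
    """Percent difference of GAEZ+ 2015 relative to FAOSTAT, by crop.

    Crops that are missing or zero in faostat_area are omitted, because
    no relative difference can be defined for them.
    """
    result = {}
    for crop, gaez in gaez_area.items():
        fao = faostat_area.get(crop)
        if fao is None or fao == 0:
            continue
        result[crop] = 100.0 * (gaez - fao) / fao
    return result


# Placeholder areas; "fodder" has no FAOSTAT counterpart and is skipped.
diffs = percent_differences(
    {"crop_a": 107.0, "crop_b": 50.0, "fodder": 10.0},
    {"crop_a": 100.0, "crop_b": 50.0},
)
```

A positive value means GAEZ+ 2015 reports more harvested area than FAOSTAT for that crop.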
Online-only Table 1 presents harvested area summed for all crops by country, ordered from most to least GAEZ+ 2015 harvested area. The five countries with the most harvested area (India, China, United States of America, Russian Federation, and Brazil) all have less than or equal to 1% difference between GAEZ+ 2015 and FAOSTAT 2015. There are a few notable large countries for which there are larger differences between the datasets; Australia, Germany and France all have more than a 10% difference. There are 31 countries with >0 harvested area reported in FAOSTAT 2015, yet 0 harvested area in GAEZ+ 2015; all these countries have <100,000 ha of FAOSTAT-reported 2015 harvested area, indicating that GAEZ+ 2015 is missing information on many of the smallest agricultural production countries.
GAEZ+ 2015 grid cell crop yields are consistently lower than GDHY grid cell crop yields for all four crops (Fig. 3), with grid cell yield linear regression slope values of 0.5, 0.3, 0.9, and 0.4 for maize, rice, soybean, and wheat, respectively. While individual grid cell yield values are consistently lower in GAEZ+ 2015 than GDHY, the former reports a greater spatial extent, with more non-zero grid cell values; this can be seen in the large span of non-zero values along the y-axis of Fig. 3, as well as by comparing maps of the two products (Fig. 4). It is expected that the spatial extent of these two products is different, as the GDHY product uses crop harvested area data from year 2000 19 as a basis for crop distribution; this difference in distribution, especially the lower extent in GDHY compared to GAEZ+ 2015, explains the difference in grid cell level yields. Yield is a weight per area, so a smaller total area in GDHY necessitates a higher yield per area in order to achieve agreement with the FAO country level yield and production statistics also used by GDHY.
Cropland physical area comparison. Global total.
Annual cropland physical area extent (shown in Fig. 5) can be calculated from the GAEZ+ 2015 monthly cropland physical area data. Here we compare GAEZ+ 2015 cropland extent to FAOSTAT reported cropland area extent 5 . GAEZ+ 2015 minimum cropland extent is calculated by assuming the maximum possible re-use of cropland in multi-cropping systems, effectively using the maximum monthly growing area as the cropland extent. GAEZ+ 2015 maximum cropland extent is calculated by assuming the minimum possible re-use of cropland, effectively taking the minimum of the annual harvested area and the grid cell area as the cropland extent. Globally, we find GAEZ+ 2015 cropland extent is 4-8% lower than FAOSTAT cropland extent (Table 4). This result is consistent with our estimate that GAEZ v4 2010 total harvest area is 4.2% lower than FAOSTAT reported harvested area, and that GAEZ v4 2010 total minimum cropland extent (calculated in the same way as the GAEZ+ 2015 minimum cropland extent) is 10% lower than HYDE 3.2 1 year 2010 reported global cropland extent.
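The two extent bounds described above can be written compactly for a single grid cell. This is a sketch under the stated definitions; the argument names are assumptions, and the monthly series is taken to be the total over all crops in the cell.

```python
def cropland_extent_bounds(monthly_area, annual_harvested_area, cell_area):
    """Return (minimum, maximum) cropland extent for one grid cell.

    monthly_area: 12 monthly totals of cropland physical area in the cell.
    """
    # Maximum re-use of land: the busiest month sets the extent.
    minimum = max(monthly_area)
    # Minimum re-use of land: every harvest uses new land, capped by the cell.
    maximum = min(annual_harvested_area, cell_area)
    return minimum, maximum


# Hypothetical cell: two cropping seasons that share some land.
bounds = cropland_extent_bounds(
    [0, 0, 3, 8, 8, 8, 5, 5, 5, 0, 0, 0],
    annual_harvested_area=13,
    cell_area=10,
)
```

Summing each bound over all grid cells gives the range of global cropland extent that is compared against FAOSTAT.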
Spatial distribution of cropland physical area. To evaluate the accuracy of the GAEZ+ 2015 spatial patterns of cropland physical area, we compare total cropland physical area, irrigated cropland physical area, and rainfed cropland physical area (Fig. 6), as well as irrigated rice physical area and rainfed rice physical area (Fig. 7) to the HYDE 3.2 1 product. While HYDE 3.2 1 utilizes country-level information from FAOSTAT within its data generation algorithm, it bases the spatial distribution of cropland on remote sensing data, which provides an independent source of sub-national crop data. Validation was done at two spatial aggregates: (1) administrative units and (2) watersheds (see Fig. 8 for maps of the spatial units). This provides validation metrics at geophysical and socio-political scales relevant to the modelling communities that would use this dataset. We also compare grid cell values of physical areas for all categories available from HYDE (Fig. 9, Table 5).
Linear regression results (Table 5 and Figs. 6, 7, and 9) show that when aggregated spatially, GAEZ+ 2015 minimum cropland, irrigated, and rainfed physical area match well with HYDE 3.2 spatial distributions (r^2 ≥ 0.9 for all three), with results significant at the p < 0.001 level. With the exception of irrigated land, the slope of the regression of GAEZ+ 2015 as a function of HYDE 3.2 is slightly less than 1.0, indicating a consistent small underestimation of rainfed land, and small overestimation of irrigated land compared to HYDE 3.2. However, the strength of the linear regression shows that the spatial distributions are similar. Rice irrigated and rainfed physical areas match less well, with r^2 values of only 0.77 and 0.48 for irrigated and rainfed, respectively, at the administrative unit level (values are higher for the basin aggregations). Differences in rice physical area are not surprising, especially for rainfed rice, given the known challenges of mapping widely distributed and diverse rice cropping systems, e.g. 21,22 .
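For readers who want to reproduce this kind of comparison, the slope and r^2 of an ordinary least-squares fit can be computed as in the sketch below. The function is a generic illustration with hypothetical inputs and is not the analysis script used for Table 5.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, r2 is the squared correlation.
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


# Hypothetical aggregated areas: the reference product on x, GAEZ+ 2015 on y.
slope, intercept, r2 = linear_fit([1.0, 2.0, 3.0, 4.0], [0.9, 1.8, 2.7, 3.6])
```

A slope below 1.0 with a high r^2, as in this toy example, corresponds to a consistent proportional underestimate with a similar spatial pattern.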
While the linear regression shows agreement in aggregate, the grid cell comparison (Fig. 9) illustrates large scatter, especially in cropland physical area. This large scatter is consistent with the differences in underlying methods used to identify crop area presence/absence in the two datasets. We expect GAEZ+ 2015 to have 0 or lower values where HYDE reports non-zero values due to the use of the GAEZ v4 2010 crop map as a basis for the GAEZ+ 2015 data product.

Grid cell crop presence/absence comparison for year 2010.
Lastly, we compare the GAEZ v4 data that underlies GAEZ+ 2015 to the MapSPAM 10 crop data product. While this paper presents the new GAEZ+ 2015 dataset, note that the methods used to produce this gridded data restrict a given crop's presence (harvested area, yield, and physical area) to the grid cells in which that crop was present in the GAEZ v4 data product. Therefore, we find it informative to compare the GAEZ v4 crop-specific presence/absence to another data set that provides such information for the same year (2010); this will provide a view of how the underlying crop map data compares to other published data, and display the constraints placed on the GAEZ+ 2015 crop presence/absence in grid cells.
With a few notable exceptions, grid cell presence/absence of the world's four main crops -maize, rice, soybean, and wheat -mostly agree between GAEZ v4 and MapSPAM (Fig. 10). For all four crops, MapSPAM has presence in more grid cells than GAEZ v4 across Sub-Saharan Africa. There are also crop-specific inconsistencies across western Canada (Maize), Europe (Rice), Australia and Russia (Soybean), and India (Wheat).
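A grid cell presence/absence comparison of this kind can be summarized by counting cells in four classes. The sketch below assumes two equally sized sequences of per-cell crop area, with presence defined as area greater than zero; the labels and values are hypothetical.

```python
def presence_agreement(area_a, area_b):
    """Count grid cells by crop presence in two datasets (area > 0)."""
    counts = {"both": 0, "only_a": 0, "only_b": 0, "neither": 0}
    for a, b in zip(area_a, area_b):
        if a > 0 and b > 0:
            counts["both"] += 1
        elif a > 0:
            counts["only_a"] += 1
        elif b > 0:
            counts["only_b"] += 1
        else:
            counts["neither"] += 1
    return counts


# Four hypothetical cells for one crop in two datasets.
counts = presence_agreement([0, 2.0, 1.0, 0], [0, 3.0, 0, 0.5])
```

Mapping the "only_a" and "only_b" classes shows where one dataset places a crop that the other does not, which is the information conveyed by Fig. 10.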

Usage Notes
There are several known issues and limitations that users should be aware of. These are described in the following paragraphs.
Annual harvested area versus monthly cropland area discrepancies. As noted in the Methods section above, there is a small disparity between the annual harvested area and the monthly cropland area for some crops in the GAEZ+ 2015 product; the maximum difference in any month is 0.05% of global cropland.
Irrigated cassava files. The annual product includes a file for irrigated cassava harvested area. All grid cell areas for this crop are 0; this follows from zero irrigated cassava area in 2010 in GAEZ v4 data; therefore we did not include irrigated cassava in the monthly product.
Limitation of the country-level statistics approach. This product will become obsolete when a product becomes available that updates global data using the GAEZ or a similar methodology to account for sub-national shifts in the spatial distribution of cropland due to a range of potential factors, such as land protection, land degradation, urban expansion, infrastructure development such as roads or reservoirs and canals, or climate change. The use of country-level statistics in the methods presented here cannot capture these changes in spatial distribution; therefore, this product should be seen as a temporary tool to be used while researchers are waiting for updated products that can capture those changes.
Lack of new information on irrigation. There was insufficient post-2010 irrigation data available from FAO to update the distribution of irrigation activity among crop types and land areas, so the 2015 product uses the same gridded rainfed-to-irrigated crop harvest area and crop production ratios as reported in GAEZ v4 for 2010.

Lack of fallow land information.
Fallow land is not included as a category in this dataset. This is partly due to a compromise made in the development of the dataset: because the grid cells with presence of a given crop are restricted to the same grid cells from GAEZ v4, we allowed the total cropland extent within a grid cell to become large in order to best match national level statistics. This accommodates cases where the crop expanded to other grid cells, yet leaves little room for fallow land.
Lagging data usage. It is a challenge to keep global cropland products current -the global agricultural system is constantly in flux, yet global data products take time to develop, so their input data is necessarily from earlier years. Some challenging features to quantify (e.g., crop calendars, irrigation) are only re-constructed at the global scale very infrequently. For example, GAEZ+ 2015 monthly crop data relies on the MIRCA2000 data 4 to characterize sub-annual cropping activity; MIRCA2000 was published a decade ago and represents the situation c.2000, a decade and a half prior to the 2015 target year of this new data set.
Other uncertainties. Any global agricultural data set will contain errors. These could result from imperfect statistical reporting of underlying data, simplifications needed to successfully harmonize disparate data sets (e.g., cropland data and irrigation data), and inherent uncertainties in remote sensing characterization of global-scale land use.
This product necessarily carries forward any errors in the 2010 base product of GAEZ v4. Finally, we note that due to the limitations, especially the country-level statistics approach, model results generated using this dataset should be interpreted with care at the level of grid cells or small aggregates. While gridded input data is required for many global models, we recommend interpreting results at aggregate spatial levels such as the administrative units or watersheds presented in Fig. 8. Grid cell level data such as MIRCA2000, HYDE 3.2, and MapSPAM are routinely used in global hydrologic models attempting to simulate years not represented by those datasets; here, the GAEZ+ 2015 update to the GAEZ v4 product aims to incorporate updated crop data into an existing gridded product to reduce the bias resulting from using outdated data. This method generates its own, unique set of biases due to the limitations described above.
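Aggregating grid cell values to larger spatial units, as recommended above, can be sketched as follows. The cell identifiers, the unit labels, and the cell-to-unit lookup are hypothetical; in practice the lookup would come from an administrative or watershed mask on the same 5-minute grid.

```python
from collections import defaultdict


def aggregate_by_unit(cell_values, cell_units):
    """Sum grid cell values into spatial units.

    cell_values: {cell_id: value}, e.g. cropland physical area per cell
    cell_units:  {cell_id: unit_id}; cells without a unit are ignored
    """
    totals = defaultdict(float)
    for cell, value in cell_values.items():
        unit = cell_units.get(cell)
        if unit is not None:
            totals[unit] += value
    return dict(totals)


# Three hypothetical cells falling in two watersheds.
totals = aggregate_by_unit(
    {(0, 0): 1.5, (0, 1): 2.5, (1, 0): 4.0},
    {(0, 0): "basin_1", (0, 1): "basin_1", (1, 0): "basin_2"},
)
```

Working with such totals rather than individual cells reduces the influence of the fixed 2010 crop locations on the interpretation of model results.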

Code Availability
Code used in this paper is available here: https://github.com/wsag/GAEZ-_2015_code. This repository includes the scripts that:
a) Convert annual harvested area to monthly crop physical area
b) Aggregate the Hydrosheds 5-minute river network into 20-200 km^2 watersheds
c) Compare GAEZ+ 2015 harvested area with FAOSTAT harvested area
d) Compare GAEZ+ 2015 yield data to GDHY yield data
e) Compare GAEZ+ 2015 cropland physical area data to HYDE 3.2 cropland physical area data