Abstract
Identifying untapped opportunities to improve crop production on current cropland is crucial to guide food availability interventions. Here we integrated an agronomically robust bottom-up approach with machine learning to generate accurate, high-resolution (ca. 1 km² at the Equator) global maps of yield potential for maize, wheat and rice. These maps serve as a robust reference to benchmark farmers’ yields in the context of current cropping systems and water regimes, and can help to identify areas with large room to increase crop yields.
Data availability
The high-resolution global maps of yield potential have been deposited in Zenodo (https://doi.org/10.5281/zenodo.12209708) (ref. 19). Data on yield potential are available on the GYGA (https://www.yieldgap.org/). Global climatic data are available on WorldClim (https://www.worldclim.org/). Global gridded soil data are available on ISRIC (https://data.isric.org/). Global crop calendar data are available on SAGE, UW-Madison (https://sage.nelson.wisc.edu/data-and-models/datasets/crop-calendar-dataset/), RiceAtlas (https://www.nature.com/articles/sdata201774) and CropMonitor (https://cropmonitor.org/index.php/eodatatools/baseline-data/). Crop distribution maps are available on SPAM (https://mapspam.info). Source data are provided with this paper.
Code availability
The R code used in the current study is publicly available on GitHub (https://github.com/AramburuMerlos/gGYGA).
References
1. Cassman, K. G. & Grassini, P. A global perspective on sustainable intensification research. Nat. Sustain. 3, 262–268 (2020).
2. van Ittersum, M. K. et al. Can sub-Saharan Africa feed itself? Proc. Natl Acad. Sci. USA 113, 14964–14969 (2016).
3. Marin, F. R. et al. Protecting the Amazon forest and reducing global warming via agricultural intensification. Nat. Sustain. 5, 1018–1026 (2022).
4. van Ittersum, M. K. et al. Yield gap analysis with local to global relevance—a review. Field Crops Res. 143, 4–17 (2013).
5. FAO and IIASA. Global Agro Ecological Zones version 4 (GAEZ v4). http://www.fao.org/gaez/ Accessed 29 Sep 2023.
6. Rattalino Edreira, J. I. et al. Spatial frameworks for robust estimation of yield gaps. Nat. Food 2, 773–779 (2021).
7. Grassini, P. et al. How good is good enough? Data requirements for reliable crop yield simulations and yield-gap analysis. Field Crops Res. 177, 49–63 (2015).
8. Cedrez, C. B. & Hijmans, R. J. Methods for spatial prediction of crop yield potential. Agron. J. 110, 2322–2330 (2018).
9. Meyer, H. & Pebesma, E. Machine learning-based global maps of ecological variables and the challenge of assessing them. Nat. Commun. 13, 2208 (2022).
10. Milà, C., Mateu, J., Pebesma, E. & Meyer, H. Nearest neighbour distance matching leave-one-out cross-validation for map validation. Methods Ecol. Evol. 13, 1304–1316 (2022).
11. Meyer, H. & Pebesma, E. Predicting into unknown space? Estimating the area of applicability of spatial prediction models. Methods Ecol. Evol. 12, 1620–1633 (2021).
12. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
13. Jeong, J. H. et al. Random forests for global and regional crop yield predictions. PLoS ONE 11, e0156571 (2016).
14. van Wart, J. et al. Use of agro-climatic zones to upscale simulated crop yield potential. Field Crops Res. 143, 44–55 (2013).
15. van Bussel, L. G. J. et al. From field to atlas: upscaling of location-specific yield gap estimates. Field Crops Res. 177, 98–108 (2015).
16. Aramburu Merlos, F. & Hijmans, R. J. Potential, attainable, and current levels of global crop diversity. Environ. Res. Lett. 17, 044071 (2022).
17. International Food Policy Research Institute. Global spatially-disaggregated crop production statistics data for 2010 version 2.0. Harvard Dataverse https://doi.org/10.7910/DVN/PRFF8V (2019).
18. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2020).
19. Aramburu-Merlos, F., van Loon, M., van Ittersum, M. & Grassini, P. Global gridded maps of yield potential of the Global Yield Gap Atlas (GYGA). Zenodo https://doi.org/10.5281/zenodo.12209708 (2024).
Acknowledgements
This study was supported by the National Institute of Food and Agriculture of the United States Department of Agriculture (Hatch grant NEB-22-399 to P.G.) and the National Science Foundation (NSF #2214604 to P.G.). We thank M. Alimagham (Wageningen University and Research) for his feedback on an early version of this manuscript.
Author information
Contributions
F.A.-M., M.P.v.L., M.K.v.I. and P.G. conceived the research. F.A.-M. performed data acquisition, data processing, modelling and data analysis. F.A.-M., M.P.v.L., M.K.v.I. and P.G. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Food thanks Nimai Senapati, Francisco Villalobos, Bingfang Wu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Area of applicability of metamodel.
The metamodel area of applicability is shown in shades of gray. Red areas were excluded from metamodel predictions because they are highly dissimilar to the environments covered by the data used to train the metamodel. Darker shades indicate a larger harvested area of each crop and water regime combination, expressed as a percentage of the total area of each pixel. White areas have less than 0.5% of the given crop and water regime and were not considered. Crop harvested areas were retrieved from SPAM (ref. 17).
Extended Data Fig. 2 Global gridded yield potential (Ypot) comparison of two approaches.
Comparison of gridded Ypot predictions based on country-blind climate zone extrapolation (left panels) and the metamodel (right panels) versus site-level Ypot from the Global Yield Gap Atlas (GYGA Ypot) for three crops and two water regimes. Each point represents a simulation site (reference weather station). Predictions were derived following the nearest-neighbor-distance-matching leave-one-out cross-validation method. The root mean square error relative to the GYGA Ypot average (RMSE, %; also known as normalized RMSE) is shown for each method and crop combination. Other model performance metrics are shown in Supplementary Table 2.
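As a minimal sketch of the normalized RMSE described in this caption (RMSE divided by the average of the site-level GYGA Ypot values, expressed as a percentage), the calculation might look as follows. The function name and the yield values are hypothetical and serve only as an illustration; they are not taken from the study.

```python
import math


def nrmse_percent(observed, predicted):
    """RMSE expressed as a percentage of the mean observed value."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared difference between observed and predicted values.
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_observed = sum(observed) / len(observed)
    return 100.0 * math.sqrt(mse) / mean_observed


# Hypothetical site-level yield potential (t/ha), for illustration only.
gyga_ypot = [10.0, 8.0, 12.0, 6.0]
predicted_ypot = [9.0, 9.0, 11.0, 7.0]

print(round(nrmse_percent(gyga_ypot, predicted_ypot), 1))
```

Dividing by the mean observed value makes errors comparable across crops and water regimes whose absolute yield levels differ.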
Extended Data Fig. 3 National gridded yield potential (Ypot) comparison of two approaches.
Comparison of Ypot predictions based on the GYGA upscaling approach (left panels) and the metamodel (right panels) versus site-level Ypot from the Global Yield Gap Atlas (GYGA Ypot) for three crops and two water regimes. Each point represents a site (reference weather station). Predictions were derived following the nearest-neighbor-distance-matching leave-one-out cross-validation method, with the crop harvested area of countries included in GYGA as the target prediction area. The root mean square error relative to the GYGA Ypot average (RMSE, %) is shown for each method and crop combination.
Extended Data Fig. 4 Metamodel prediction uncertainty.
Expected normalized root mean square error (NRMSE) of global gridded yield potential estimates derived from the metamodel, expressed as percentage of the predicted yield potential.
Extended Data Fig. 5 Yield potential derived from different approaches.
Comparison of maize water-limited yield potential derived from a top-down approach (GAEZ, gaez.fao.org; ref. 5), a bottom-up approach (GYGA CZ, www.yieldgap.org), and a metamodel that integrates a bottom-up approach with machine learning (Metamodel) in East Africa.
Extended Data Fig. 6 Relation between annual precipitation and the yield potential derived from two approaches.
Water-limited yield potential (Yw) of rainfed maize as a function of annual precipitation in East Africa for two yield potential prediction approaches: a metamodel that integrates a bottom-up approach with machine learning (Metamodel) and a top-down approach (GAEZ, gaez.fao.org; ref. 5). Each point represents a 5-arc-minute grid cell with rainfed maize in East Africa. The red lines are local regression lines. Annual precipitation data were extracted from WorldClim (worldclim.org).
Extended Data Fig. 7 Negative yield gap assessment.
Yield gaps (Yg) between water-limited yield potential and farmers’ actual yield for rainfed maize in the US Midwest at county level for two yield potential prediction approaches: a metamodel that integrates a bottom-up approach with machine learning (Metamodel) and a top-down approach (GAEZ, gaez.fao.org; ref. 5). Average county-level farmers’ yield for rainfed maize between 2005 and 2015 was retrieved from USDA-NASS Quick Stats (quickstats.nass.usda.gov/). Only counties with less than 5% of irrigated area or reporting non-irrigated yields in 3 or more years were considered. In the histograms, the dashed vertical line indicates Yg = 0, that is, no difference between yield potential and actual yield, and the percentage of counties presenting negative Yg for each method is shown.
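The negative yield gap check described in this caption can be sketched as below: Yg is the water-limited yield potential minus the actual yield, and a negative Yg means the reported farmers' yield exceeds the estimated potential. The function names and county values are hypothetical and are not data from the study.

```python
def yield_gaps(yield_potential, actual_yield):
    """Yield gap per county: yield potential minus farmers' actual yield."""
    return [yp - ya for yp, ya in zip(yield_potential, actual_yield)]


def percent_negative(gaps):
    """Percentage of counties whose yield gap is below zero."""
    return 100.0 * sum(1 for g in gaps if g < 0) / len(gaps)


# Hypothetical county-level values (t/ha), for illustration only.
yw_estimate = [12.0, 11.0, 9.5, 10.0]
farmer_yield = [9.0, 11.5, 8.0, 10.5]

gaps = yield_gaps(yw_estimate, farmer_yield)
print(gaps, percent_negative(gaps))
```

A high share of negative gaps suggests that a method underestimates yield potential, since actual yields should not exceed it on average.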
Extended Data Fig. 8 Site-specific yield potential.
Yield potential of irrigated crops and water-limited yield potential of rainfed crops reported in the Global Yield Gap Atlas (GYGA, www.yieldgap.org) at reference weather station level for the three main cereal crops. Last access: July 10th, 2023.
Extended Data Fig. 9 Nearest-neighbor-distance-matching leave-one-out cross-validation (NNDM LOO CV) examples for rainfed wheat.
The top panel shows the distribution of site-specific water-limited yield potential (Yw) of rainfed wheat from the Global Yield Gap Atlas (GYGA) and the prediction grid (lands harvested with rainfed wheat as reported by SPAM; ref. 17). The lower left and middle panels show, for two iterations of the NNDM LOO CV (ref. 10), the GYGA Yw sites used for model testing and training and the sites excluded because of their proximity to the testing site. The neighbors to be excluded are defined so that the cumulative frequency of distances between testing sites and their nearest training site in the NNDM LOO CV procedure matches the cumulative frequency of distances between the prediction grid cells and their nearest GYGA Yw site, as shown in the lower right panel.
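The two cumulative frequency curves compared in the lower right panel can be illustrated with a small sketch. It computes nearest-neighbor distances from each test site to the remaining sites (plain leave-one-out, before any exclusion) and from each prediction cell to its nearest site, then evaluates both empirical cumulative distributions. It does not implement the NNDM exclusion step itself. Planar Euclidean distance and all coordinates are simplifying assumptions for illustration.

```python
import math


def nearest_distance(point, others):
    """Distance from a point to the closest point in `others`."""
    return min(math.dist(point, other) for other in others)


def ecdf(values, thresholds):
    """Fraction of values that are <= each threshold."""
    n = len(values)
    return [sum(1 for v in values if v <= t) / n for t in thresholds]


# Hypothetical planar coordinates, for illustration only.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
grid_cells = [(0.5, 0.5), (3.0, 3.0), (6.0, 5.0)]

# Plain leave-one-out: each site is tested against all remaining sites.
loo_distances = [
    nearest_distance(site, sites[:i] + sites[i + 1:])
    for i, site in enumerate(sites)
]
# Prediction task: each grid cell against all sites.
prediction_distances = [nearest_distance(cell, sites) for cell in grid_cells]

thresholds = [1.0, 3.0, 7.0]
print(ecdf(loo_distances, thresholds))
print(ecdf(prediction_distances, thresholds))
```

Where the two curves differ, the NNDM procedure excludes training sites near each test site until the test-to-train curve approaches the prediction-to-site curve.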
Extended Data Fig. 10 Relation between yield potential uncertainty and environmental dissimilarity.
Relationship between spatially cross-validated root mean square errors (RMSE) and dissimilarity indexes between testing and training sites for each crop and water regime. Values were derived following the nearest-neighbor-distance-matching leave-one-out cross-validation (NNDM LOO CV) method (ref. 10) and the dissimilarity index used to estimate the area of applicability of the metamodel (ref. 11). This association was used to estimate the expected RMSE of yield potential predictions from the dissimilarity index between the prediction area and the training sites. Pearson correlation coefficients (r) and their P values are shown.
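One way to picture how such an association could be turned into an expected RMSE is sketched below: compute the Pearson correlation between dissimilarity index and RMSE, fit a line, and evaluate it at a new dissimilarity value. The caption does not state the functional form used, so the straight-line fit is an assumption, and the numbers are invented for illustration. P values are not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept (assumed linear form)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x


# Hypothetical cross-validation results, for illustration only.
dissimilarity = [0.1, 0.2, 0.3, 0.4]
cv_rmse = [1.0, 1.5, 2.0, 2.5]  # t/ha

r = pearson_r(dissimilarity, cv_rmse)
slope, intercept = fit_line(dissimilarity, cv_rmse)

# Expected RMSE at a new, hypothetical dissimilarity index value.
expected_rmse = intercept + slope * 0.5
print(round(r, 3), round(expected_rmse, 2))
```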
Supplementary information
Supplementary Information
Supplementary Sections 1–5 and Tables 1–3.
Source data
Source Data Fig. 2
ML model training data.
Source Data Extended Data Fig. 1
Geotiff file with the area of applicability of the metamodel for each crop by water regime combination. The source data to train the ML model are provided as source data for Fig. 2.
Source Data Extended Data Fig. 2
Statistical source data: observed (GYGA yield potential at reference weather station level) and predicted yield potential (derived from the cross-validation procedure) for each crop, water regime and method combination.
Source Data Extended Data Fig. 3
Statistical source data: observed (GYGA yield potential at reference weather station level) and predicted yield potential (derived from the cross-validation procedure) for each crop, water regime and method combination.
Source Data Extended Data Fig. 4
Geotiff file with the metamodel prediction uncertainty for each crop and water regime combination. The source data to train the ML model are provided as source data for Fig. 2.
Source Data Extended Data Fig. 5
Geotiff file with the yield potential of rainfed maize in East Africa as predicted by three different approaches.
Source Data Extended Data Fig. 6
Statistical source data: yield potential of rainfed maize in East Africa as predicted by two different approaches and their corresponding annual precipitation values.
Source Data Extended Data Fig. 7
Statistical source data: actual yield of rainfed maize in the USA at the county level and average yield potential as predicted by two approaches.
Source Data Extended Data Fig. 8
Source data with site-specific yield potential for each crop and water regime combination from the GYGA.
Source Data Extended Data Fig. 9
Statistical source data with cumulative frequency of distances between training and testing sites in the spatial cross-validation procedure and training and prediction sites in the metamodel.
Source Data Extended Data Fig. 10
Statistical source data: RMSEs derived from the spatial cross-validation and their corresponding dissimilarity index between training and testing sites.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Aramburu-Merlos, F., van Loon, M.P., van Ittersum, M.K. et al. High-resolution global maps of yield potential with local relevance for targeted crop production improvement. Nat Food 5, 667–672 (2024). https://doi.org/10.1038/s43016-024-01029-3
DOI: https://doi.org/10.1038/s43016-024-01029-3