  • Brief Communication

High-resolution global maps of yield potential with local relevance for targeted crop production improvement

Abstract

Identifying untapped opportunities for crop production improvement in current cropland is crucial to guide food availability interventions. Here we integrated an agronomically robust bottom-up approach with machine learning to generate global maps of yield potential for maize, wheat and rice with high resolution (ca. 1 km² at the Equator) and high accuracy. These maps serve as a robust reference to benchmark farmers’ yields in the context of current cropping systems and water regimes and can help to identify areas with large room to increase crop yields.

Fig. 1: Schematic representation of the metamodel.
Fig. 2: Global gridded yield potential for the three main cereal crops around year 2020.

Data availability

The high-resolution global maps of yield potential have been deposited in Zenodo (https://doi.org/10.5281/zenodo.12209708) (ref. 19). Data on yield potential are available on the GYGA (https://www.yieldgap.org/). Global climatic data are available on WorldClim (https://www.worldclim.org/). Global gridded soil data are available on ISRIC (https://data.isric.org/). Global crop calendar data are available on SAGE, UW-Madison (https://sage.nelson.wisc.edu/data-and-models/datasets/crop-calendar-dataset/), RiceAtlas (https://www.nature.com/articles/sdata201774) and CropMonitor (https://cropmonitor.org/index.php/eodatatools/baseline-data/). Crop distribution maps are available on SPAM (https://mapspam.info). Source data are provided with this paper.

Code availability

The R code used in the current study is publicly available on GitHub (https://github.com/AramburuMerlos/gGYGA).

References

  1. Cassman, K. G. & Grassini, P. A global perspective on sustainable intensification research. Nat. Sustain. 3, 262–268 (2020).

  2. van Ittersum, M. K. et al. Can sub-Saharan Africa feed itself? Proc. Natl Acad. Sci. USA 113, 14964–14969 (2016).

  3. Marin, F. R. et al. Protecting the Amazon forest and reducing global warming via agricultural intensification. Nat. Sustain. 5, 1018–1026 (2022).

  4. van Ittersum, M. K. et al. Yield gap analysis with local to global relevance—a review. Field Crops Res. 143, 4–17 (2013).

  5. FAO and IIASA. Global Agro Ecological Zones version 4 (GAEZ v4). http://www.fao.org/gaez/ Accessed 29 Sep 2023.

  6. Rattalino Edreira, J. I. et al. Spatial frameworks for robust estimation of yield gaps. Nat. Food 2, 773–779 (2021).

  7. Grassini, P. et al. How good is good enough? Data requirements for reliable crop yield simulations and yield-gap analysis. Field Crops Res. 177, 49–63 (2015).

  8. Cedrez, C. B. & Hijmans, R. J. Methods for spatial prediction of crop yield potential. Agron. J. 110, 2322–2330 (2018).

  9. Meyer, H. & Pebesma, E. Machine learning-based global maps of ecological variables and the challenge of assessing them. Nat. Commun. 13, 2208 (2022).

  10. Milà, C., Mateu, J., Pebesma, E. & Meyer, H. Nearest neighbour distance matching leave-one-out cross-validation for map validation. Methods Ecol. Evol. 13, 1304–1316 (2022).

  11. Meyer, H. & Pebesma, E. Predicting into unknown space? Estimating the area of applicability of spatial prediction models. Methods Ecol. Evol. 12, 1620–1633 (2021).

  12. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).

  13. Jeong, J. H. et al. Random forests for global and regional crop yield predictions. PLoS ONE 11, e0156571 (2016).

  14. van Wart, J. et al. Use of agro-climatic zones to upscale simulated crop yield potential. Field Crops Res. 143, 44–55 (2013).

  15. van Bussel, L. G. J. et al. From field to atlas: upscaling of location-specific yield gap estimates. Field Crops Res. 177, 98–108 (2015).

  16. Aramburu Merlos, F. & Hijmans, R. J. Potential, attainable, and current levels of global crop diversity. Environ. Res. Lett. 17, 044071 (2022).

  17. Global spatially-disaggregated crop production statistics data for 2010 Version 2.0. Harvard Dataverse. International Food Policy Research Institute https://doi.org/10.7910/DVN/PRFF8V (2019).

  18. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2020).

  19. Aramburu-Merlos, F., van Loon, M., van Ittersum, M. & Grassini, P. Global gridded maps of yield potential of the Global Yield Gap Atlas (GYGA). Zenodo https://doi.org/10.5281/zenodo.12209708 (2024).

Acknowledgements

This study was supported by the National Institute of Food and Agriculture of the United States Department of Agriculture (Hatch grant NEB-22-399 to P.G.) and the National Science Foundation (NSF #2214604 to P.G.). We thank M. Alimagham (Wageningen University and Research) for his feedback on an early version of this manuscript.

Author information

Contributions

F.A.-M., M.P.v.L., M.K.v.I. and P.G. conceived the research. F.A.-M. performed data acquisition, data processing, modelling and data analysis. F.A.-M., M.P.v.L., M.K.v.I. and P.G. wrote the manuscript.

Corresponding author

Correspondence to Patricio Grassini.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Food thanks Nimai Senapati, Francisco Villalobos, Bingfang Wu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Area of applicability of metamodel.

The metamodel area of applicability is shown in shades of gray. Red areas were excluded from metamodel predictions because of their high dissimilarity to the environments represented in the data used to train the metamodel. Color darkness represents the harvested area of each crop and water regime combination as a percentage of the total area of each pixel. White areas have < 0.5% of the given crop and water regime and were not considered. Crop harvested areas were retrieved from SPAM (ref. 17).

Source data
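The area of applicability follows the dissimilarity-index idea of ref. 11: a location is predicted only if its environment is not too far, in predictor space, from the training sites. The sketch below illustrates that idea under stated assumptions: equal predictor weights (ref. 11 weights predictors by model importance), Euclidean distance in standardized predictor space, and a leave-one-out nearest-neighbor dissimilarity for the training sites instead of cross-validation folds. It is not the authors' gGYGA code; `area_of_applicability` and the toy covariates are hypothetical.

```python
import math
from statistics import mean, quantiles, stdev


def _scale(rows, mus, sds, weights):
    # Standardize each predictor with training statistics, then apply the weights.
    return [[w * (x - m) / s for x, m, s, w in zip(r, mus, sds, weights)] for r in rows]


def area_of_applicability(train, new, weights=None):
    """Return (DI of new points, DI threshold, inside-AOA flags).

    DI = distance to the nearest training point divided by the mean pairwise
    distance among training points, both in standardized predictor space.
    """
    k = len(train[0])
    weights = weights or [1.0] * k  # assumption: equal predictor weights
    mus = [mean([r[j] for r in train]) for j in range(k)]
    sds = [stdev([r[j] for r in train]) or 1.0 for j in range(k)]  # guard constant predictors
    tr = _scale(train, mus, sds, weights)
    nw = _scale(new, mus, sds, weights)

    # Mean pairwise distance among training points normalizes the DI.
    d_bar = mean([math.dist(a, b) for i, a in enumerate(tr) for b in tr[i + 1:]])

    # Training DI via leave-one-out nearest neighbor (simplification of CV folds).
    train_di = [min(math.dist(a, b) for j, b in enumerate(tr) if j != i) / d_bar
                for i, a in enumerate(tr)]
    q1, _, q3 = quantiles(train_di, n=4)
    upper = q3 + 1.5 * (q3 - q1)
    threshold = max(d for d in train_di if d <= upper)  # outlier-removed maximum

    new_di = [min(math.dist(p, t) for t in tr) / d_bar for p in nw]
    return new_di, threshold, [d <= threshold for d in new_di]


# Toy example: two hypothetical covariates sampled on a regular grid.
train = [(x, y) for x in range(5) for y in range(5)]
di, threshold, inside = area_of_applicability(train, [(2, 2), (2.5, 2.5), (40, 40)])
print(inside)  # [True, True, False]
```

The far-away point (40, 40) falls outside the area of applicability, which is the behavior the red areas in the figure represent.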

Extended Data Fig. 2 Global gridded yield potential (Ypot) comparison of two approaches.

Comparison of gridded Ypot predictions based on country-blind climate zone extrapolation (left panels) and the metamodel (right panels) versus site-level Ypot from the Global Yield Gap Atlas (GYGA Ypot) for three crops and two water regimes. Each point represents a simulation site (reference weather station). Predictions were derived following the nearest-neighbor-distance-matching leave-one-out cross-validation method. The root mean square error relative to the GYGA Ypot average (RMSE %, also known as normalized RMSE) is shown for each method and crop combination. Other model performance metrics are shown in Supplementary Table 2.

Source data
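The normalized RMSE used in this comparison divides the RMSE by the mean of the GYGA site-level values: NRMSE (%) = 100 × sqrt(mean((predicted − observed)²)) / mean(observed). A minimal sketch, assuming observed and predicted values are paired by site; the function name and the three-site example are hypothetical.

```python
import math
from statistics import mean


def nrmse_percent(observed, predicted):
    """RMSE of predictions relative to the mean observed value, in percent."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    rmse = math.sqrt(mean([(p - o) ** 2 for o, p in zip(observed, predicted)]))
    return 100 * rmse / mean(observed)


# Hypothetical site-level yield potential (t/ha): observed vs. cross-validated predictions.
observed = [10.0, 12.0, 8.0]
predicted = [11.0, 11.0, 9.0]
print(round(nrmse_percent(observed, predicted), 6))  # 10.0
```

Normalizing by the mean makes errors comparable across crops and water regimes whose absolute yield levels differ.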

Extended Data Fig. 3 National gridded yield potential (Ypot) comparison of two approaches.

Comparison of Ypot predictions based on the GYGA upscaling approach (left panels) and the metamodel (right panels) versus site-level Ypot from the Global Yield Gap Atlas (GYGA Ypot) for three crops and two water regimes. Each point represents a site (reference weather station). Predictions were derived following the nearest-neighbor-distance-matching leave-one-out cross-validation method, with the crop harvested area of countries included in GYGA as the target prediction area. The root mean square error relative to the GYGA Ypot average (RMSE, %) is shown for each method and crop combination.

Source data

Extended Data Fig. 4 Metamodel prediction uncertainty.

Expected normalized root mean square error (NRMSE) of global gridded yield potential estimates derived from the metamodel, expressed as a percentage of the predicted yield potential.

Source data

Extended Data Fig. 5 Yield potential derived from different approaches.

Comparison of maize water-limited yield potential in East Africa derived from a top-down approach (GAEZ, gaez.fao.org; ref. 5), a bottom-up approach (GYGA CZ, www.yieldgap.org) and a metamodel that integrates a bottom-up approach with machine learning (Metamodel).

Source data

Extended Data Fig. 6 Relation between annual precipitation and the yield potential derived from two approaches.

Water-limited yield potential (Yw) of rainfed maize as a function of annual precipitation in East Africa for two yield potential prediction approaches: a metamodel that integrates a bottom-up approach with machine learning (Metamodel) and a top-down approach (GAEZ, gaez.fao.org; ref. 5). Each point represents a 5-arc-minute grid cell with rainfed maize in East Africa. The red lines are local regression fits. Annual precipitation data were extracted from WorldClim (worldclim.org).

Source data

Extended Data Fig. 7 Negative yield gap assessment.

Yield gaps (Yg) between water-limited yield potential and farmers’ actual yield for rainfed maize in the US Midwest at county level for two yield potential prediction approaches: a metamodel that integrates a bottom-up approach with machine learning (Metamodel) and a top-down approach (GAEZ, gaez.fao.org; ref. 5). Average county-level farmers’ yield for rainfed maize between 2005 and 2015 was retrieved from USDA-NASS Quick Stats (quickstats.nass.usda.gov/). Only counties with less than 5% irrigated area, or reporting non-irrigated yields in 3 or more years, were considered. In the histograms, the dashed vertical line indicates Yg = 0, that is, no difference between yield potential and actual yield, and the percentage of counties with negative Yg is shown for each method.

Source data
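The negative-yield-gap check reduces to computing Yg = Yw − Ya per county and counting how often it falls below zero; a negative Yg means the yield potential estimate is lower than what farmers already achieve. A minimal sketch, assuming county values are held in dictionaries keyed by a county identifier and reading the caption's selection rule literally; `keep_county`, `yield_gaps` and all numbers are hypothetical.

```python
def keep_county(irrigated_fraction, years_with_rainfed_yield):
    # Selection rule as stated in the caption: < 5% irrigated area,
    # or non-irrigated yields reported in 3 or more years.
    return irrigated_fraction < 0.05 or years_with_rainfed_yield >= 3


def yield_gaps(yw, ya):
    """Yg = Yw - Ya for counties present in both dicts; also % with Yg < 0."""
    gaps = {c: yw[c] - ya[c] for c in yw if c in ya}
    if not gaps:
        raise ValueError("no counties in common")
    negative_pct = 100 * sum(g < 0 for g in gaps.values()) / len(gaps)
    return gaps, negative_pct


# Hypothetical counties: (irrigated fraction, years with non-irrigated yields).
counties = {"A": (0.01, 10), "B": (0.02, 8), "C": (0.20, 5), "D": (0.03, 11), "E": (0.40, 1)}
kept = [c for c, (irr, yrs) in counties.items() if keep_county(irr, yrs)]
yw = {"A": 11.0, "B": 10.0, "C": 9.5, "D": 12.0, "E": 13.0}  # water-limited yield potential, t/ha
ya = {"A": 10.0, "B": 10.5, "C": 9.0, "D": 11.0, "E": 12.0}  # mean actual yield, t/ha
gaps, negative_pct = yield_gaps({c: yw[c] for c in kept}, ya)
print(kept, negative_pct)  # ['A', 'B', 'C', 'D'] 25.0
```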

Extended Data Fig. 8 Site-specific yield potential.

Yield potential of irrigated crops and water-limited yield potential of rainfed crops reported in the Global Yield Gap Atlas (GYGA, www.yieldgap.org) at reference weather station level for the three main cereal crops. Last accessed: 10 July 2023.

Source data

Extended Data Fig. 9 Nearest-neighbor-distance-matching leave-one-out cross-validation (NNDM LOO CV) examples for rainfed wheat.

The top panel shows the distribution of site-specific water-limited yield potential (Yw) of rainfed wheat from the Global Yield Gap Atlas (GYGA) and the prediction grid (land harvested with rainfed wheat as reported by SPAM, ref. 17). For two iterations of the NNDM LOO CV (ref. 10), the lower left and middle panels show the GYGA Yw sites used for model testing and training, and the sites excluded because of their proximity to the testing site. The neighbors to be excluded are chosen so that the cumulative frequency of distances between testing sites and their nearest training site in the NNDM LOO CV procedure matches the cumulative frequency of distances between the prediction grid cells and their nearest GYGA Yw site, as shown in the lower right panel.

Source data
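The distance-matching rule in this caption can be illustrated with a simplified greedy sketch. Assumptions: planar (x, y) coordinates with Euclidean distance (global sites would need great-circle distances), and a loop that, while the leave-one-out nearest-neighbor distance ECDF lies above the prediction-grid ECDF, drops the nearest training neighbor of the test site with the shortest distance. The published NNDM algorithm of ref. 10 includes safeguards this sketch omits, so this is neither that implementation nor the authors' code; `nndm_exclusions` is a hypothetical name.

```python
import math


def ecdf(values, r):
    """Fraction of values <= r."""
    return sum(v <= r for v in values) / len(values)


def nndm_exclusions(sites, grid):
    """Greedy sketch of NNDM LOO CV neighbor exclusion.

    sites: (x, y) of sample sites; grid: (x, y) of prediction cells.
    Returns, for each site i, the set of site indices used for training when i is tested.
    """
    n = len(sites)
    # Nearest-neighbor distance from every prediction cell to the sample sites.
    pred_nnd = [min(math.dist(g, s) for s in sites) for g in grid]
    train = [set(range(n)) - {i} for i in range(n)]  # start from plain LOO CV

    def loo_nnd(i):
        return min(math.dist(sites[i], sites[j]) for j in train[i])

    for _ in range(n * n):  # safety cap; each pass removes one site from one training set
        d = [loo_nnd(i) for i in range(n)]
        # Test sites whose NND is "too short": LOO ECDF above prediction ECDF there.
        candidates = [i for i in range(n)
                      if ecdf(d, d[i]) > ecdf(pred_nnd, d[i]) and len(train[i]) > 1]
        if not candidates:
            break
        i = min(candidates, key=lambda k: d[k])
        nearest = min(train[i], key=lambda j: math.dist(sites[i], sites[j]))
        train[i].discard(nearest)  # exclude the closest training neighbor
    return train


sites = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0)]  # two tight clusters of sites
grid = [(5.0, 0.0), (5.0, 5.0)]  # prediction cells far from both clusters
print(nndm_exclusions(sites, grid))  # [{2, 3}, {2, 3}, {0, 1}, {0, 1}]
```

Because the prediction cells sit far from every site, each test site loses its close cluster mate; when prediction cells coincide with the sites, the procedure leaves plain LOO CV unchanged.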

Extended Data Fig. 10 Relation between yield potential uncertainty and environmental dissimilarity.

Relationship between spatially cross-validated root mean square errors (RMSE) and dissimilarity indexes between testing and training sites for each crop and water regime. Values were derived following the nearest-neighbor-distance-matching leave-one-out cross-validation (NNDM LOO CV) method (ref. 10) and the dissimilarity index used to estimate the area of applicability of the metamodel (ref. 11). This association was used to estimate the expected RMSE of yield potential predictions from the dissimilarity index between the prediction area and the training sites. Pearson correlation coefficients (r) and their P values are shown.

Source data
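One way to turn the RMSE–dissimilarity association into an expected error map is to fit a line to the cross-validated (DI, RMSE) pairs, evaluate it at each grid cell's DI, and express the result as a percentage of that cell's predicted yield. The sketch below assumes a simple linear least-squares relation; the caption reports only Pearson r and P values, so the functional form the authors used is not stated here. Function names and data are hypothetical.

```python
import math
from statistics import mean


def fit_rmse_vs_di(di, rmse):
    """Least-squares line RMSE = a + b * DI, plus the Pearson r of (DI, RMSE)."""
    mx, my = mean(di), mean(rmse)
    sxy = sum((x - mx) * (y - my) for x, y in zip(di, rmse))
    sxx = sum((x - mx) ** 2 for x in di)
    syy = sum((y - my) ** 2 for y in rmse)
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def expected_nrmse_percent(a, b, di_cell, predicted_yield):
    """Expected RMSE at a grid cell, expressed as % of its predicted yield."""
    return 100 * (a + b * di_cell) / predicted_yield


# Hypothetical cross-validation output: DI of test sites and their RMSE (t/ha).
di = [0.1, 0.2, 0.3, 0.4]
rmse = [0.5, 0.7, 0.9, 1.1]
a, b, r = fit_rmse_vs_di(di, rmse)
print(round(a, 6), round(b, 6), round(r, 6))  # 0.3 2.0 1.0
print(round(expected_nrmse_percent(a, b, 0.25, 8.0), 6))  # 10.0
```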

Supplementary information

Supplementary Information

Supplementary Sections 1–5 and Tables 1–3.

Reporting Summary

Source data

Source Data Fig. 2

ML model training data.

Source Data Extended Data Fig. 1

Geotiff file with the area of applicability of the metamodel for each crop and water regime combination. The source data to train the ML model are provided as source data for Fig. 2.

Source Data Extended Data Fig. 2

Statistical source data: observed (GYGA yield potential at reference weather station level) and predicted yield potential (derived from the cross-validation procedure) for each crop, water regime and method combination.

Source Data Extended Data Fig. 3

Statistical source data: observed (GYGA yield potential at reference weather station level) and predicted yield potential (derived from the cross-validation procedure) for each crop, water regime and method combination.

Source Data Extended Data Fig. 4

Geotiff file with the metamodel prediction uncertainty for each crop and water regime combination. The source data to train the ML model are provided as source data for Fig. 2.

Source Data Extended Data Fig. 5

Geotiff file with the yield potential of rainfed maize in East Africa as predicted by three different approaches.

Source Data Extended Data Fig. 6

Statistical source data: yield potential of rainfed maize in East Africa as predicted by two different approaches and their corresponding annual precipitation values.

Source Data Extended Data Fig. 7

Statistical source data: actual yield of rainfed maize in the USA at the county level and average yield potential as predicted by two approaches.

Source Data Extended Data Fig. 8

Source data with site-specific yield potential for each crop and water regime combination from the GYGA.

Source Data Extended Data Fig. 9

Statistical source data: cumulative frequency of distances between training and testing sites in the spatial cross-validation procedure, and between training and prediction sites in the metamodel.

Source Data Extended Data Fig. 10

Statistical source data: RMSEs derived from the spatial cross-validation and their corresponding dissimilarity index between training and testing sites.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Aramburu-Merlos, F., van Loon, M.P., van Ittersum, M.K. et al. High-resolution global maps of yield potential with local relevance for targeted crop production improvement. Nat Food 5, 667–672 (2024). https://doi.org/10.1038/s43016-024-01029-3

  • DOI: https://doi.org/10.1038/s43016-024-01029-3
