Abstract
Historical temperature measurements are the basis of global climate datasets like HadCRUT4. This dataset contains many missing values, particularly for periods before the mid-twentieth century, although recent years are also incomplete. Here we demonstrate that artificial intelligence can skilfully fill these observational gaps when combined with numerical climate model data. We show that recently developed image inpainting techniques perform accurate monthly reconstructions via transfer learning using either 20CR (Twentieth-Century Reanalysis) or the CMIP5 (Coupled Model Intercomparison Project Phase 5) experiments. The resulting global annual mean temperature time series exhibit high Pearson correlation coefficients (≥0.9941) and low root mean squared errors (≤0.0547 °C) as compared with the original data. These techniques also provide advantages relative to state-of-the-art kriging interpolation and principal component analysis-based infilling. When applied to HadCRUT4, our method restores a missing spatial pattern of the documented El Niño from July 1877. With respect to the global mean temperature time series, a HadCRUT4 reconstruction by our method points to a cooler nineteenth century, a less apparent hiatus in the twenty-first century, an even warmer 2016 being the warmest year on record and a stronger global trend between 1850 and 2018 relative to previous estimates. We propose image inpainting as an approach to reconstruct missing climate information and thereby reduce uncertainties and biases in climate records.
Data availability
A software snapshot, trained AI models (checkpoints), missing value masks and the HadCRUT4 reconstructions produced by the AI models can be downloaded at https://doi.org/10.5281/zenodo.3766741. Training data from 20CR and CMIP5 cannot be hosted for copyright reasons, but are available from the National Oceanic and Atmospheric Administration and the Earth System Grid Federation (ESGF) (see Methods). Contact kadow@dkrz.de for further information. Source Data are provided with this paper.
Code availability
All the code used in this project can be downloaded or cloned from https://github.com/FREVA-CLINT/climatereconstructionAI. This code will be updated and changed over time.
Acknowledgements
We thank the HPC-Service of ZEDAT, Freie Universität Berlin and the German Climate Computing Center (DKRZ) for the computational resources; the Climatic Research Unit (CRU) of the University of East Anglia (UEA) and the UK Met Office for providing the HadCRUT4 and HadSST4 datasets; the Earth System Grid Federation (ESGF) for providing the CMIP5 experiments; J. Marotzke (MPI-M), M. Schuster (FUB), E. Barnes (CSU) and K. Buscher (UKM) for discussions; N. Inoue (University of Tokyo) for providing the applicable code for image inpainting; A. Richling (FUB) for reproducing the Intergovernmental Panel on Climate Change trend, uncertainty and confidence values; and K. Cowtan, R. Way and the University of York for providing not only the reconstructed HadCRUT4 data (used in Fig. 3b), but also the software to apply the kriging scheme (used in Fig. 2). Support for the 20CR Project dataset is provided by the US Department of Energy, Office of Science Innovative and Novel Computational Impact on Theory and Experiment (DOE INCITE) programme, by the Office of Biological and Environmental Research (BER) and by the National Oceanic and Atmospheric Administration Climate Program Office.
Author information
Contributions
C.K. initiated the study design, coded the AI technology for climate research, performed the analysis and drafted the paper. D.M.H. supervised the NVIDIA AI technology and U.U. supervised the climate research results. All the authors discussed the results and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Primary Handling Editors: Stefan Lachowycz; Heike Langenberg.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Scheme for the study setup including training set.
Input for the AI models, training of the models and their output. HadCRUT4 data are shown in black, CMIP data or AI in red and 20CR data or AI in blue. The numbers at the bottom of the boxes give the number of ‘images’ (months, or time steps) used as input or produced as output (see Methods).
Extended Data Fig. 2 Detailed grid space evaluation of 20CR reconstruction.
Correlation (left) and root mean squared error in °C (right) comparing the 56th 20CR member as reconstructed by the 20crAI model with the original 56th 20CR member. Rows 1 and 2 compare all grid points in an annual and a monthly analysis, respectively. Rows 3 and 4 show the same analyses for the reconstructed grid points only, that is, without (w/o) the grid points that were available during reconstruction. Grey grid points indicate points that exist for the whole time series.
Extended Data Fig. 3 Detailed grid space evaluation of CMIP reconstruction.
Correlation (left) and root mean squared error in °C (right) comparing the 145th CMIP member as reconstructed by the cmipAI model with the original 145th CMIP member. Rows 1 and 2 compare all grid points in an annual and a monthly analysis, respectively. Rows 3 and 4 show the same analyses for the reconstructed grid points only, that is, without (w/o) the grid points that were available during reconstruction. Grey grid points indicate points that exist for the whole time series.
Extended Data Fig. 4 Time-series analysis and evaluation of AI model reconstruction.
As Fig. 2, but for the annual global mean temperature anomaly reconstructions (°C) of the 20CR (a, b) and CMIP (c, d) test suites, based on monthly grid reconstructions of the held-out 56th / 145th member using the HadCRUT4 missing value mask (1870–2005). Black: the original held-out member. Black dashed: the original but masked held-out member, to show the effect of the missing values. Blue/red: the reconstructed grid time series of 20crAI/cmipAI. Tables show the anomaly correlation (r) and root mean squared error (rmse) relative to the original dataset for four selected time ranges.
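The two scores reported in the tables can be illustrated with a minimal sketch. It assumes that the correlation is the Pearson coefficient named in the abstract and that annual values are plain averages of 12 consecutive monthly values; the function names are hypothetical and are not taken from the published code.

```python
import numpy as np

def annual_means(monthly):
    """Average consecutive blocks of 12 monthly values into annual values."""
    monthly = np.asarray(monthly, dtype=float)
    return monthly.reshape(-1, 12).mean(axis=1)

def pearson_r(reference, reconstruction):
    """Pearson correlation coefficient between two 1-D time series."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstruction, dtype=float)
    ref_dev = ref - ref.mean()
    rec_dev = rec - rec.mean()
    denom = np.sqrt((ref_dev ** 2).sum() * (rec_dev ** 2).sum())
    return float((ref_dev * rec_dev).sum() / denom)

def rmse(reference, reconstruction):
    """Root mean squared error of the reconstruction against the reference."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstruction, dtype=float)
    return float(np.sqrt(np.mean((rec - ref) ** 2)))
```

Both scores would be evaluated on the annual series of the held-out member and of its reconstruction, restricted to the time range of interest.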
Extended Data Fig. 5 Spatial evaluation of AI models over time.
Field correlation of the annual (a) and monthly (b) mean reconstruction of the 20CR 56th / CMIP 145th member by the 20crAI / cmipAI models with the original 20CR 56th / CMIP 145th member in blue / red. The solid line compares the full grid space, while the dashed line shows the respective analysis for the reconstructed grid points only, without (w/o) the grid points that were available during reconstruction.
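A field correlation for a single time step, either over the full grid or over the reconstructed grid points only, could be sketched as follows. This is an unweighted pattern correlation; whether the published analysis applies area weights is not stated here, so that choice, like the function name and the boolean mask convention, is an assumption.

```python
import numpy as np

def field_correlation(original, reconstructed, missing_mask=None):
    """Pattern correlation of two (lat, lon) fields for one time step.

    missing_mask is True where a grid point had to be reconstructed.
    If it is given, only those points enter the correlation.
    """
    orig = np.asarray(original, dtype=float)
    reco = np.asarray(reconstructed, dtype=float)
    if missing_mask is None:
        select = np.ones(orig.shape, dtype=bool)
    else:
        select = np.asarray(missing_mask, dtype=bool)
    a = orig[select] - orig[select].mean()
    b = reco[select] - reco[select].mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    # A constant field has no pattern to correlate with.
    return float((a * b).sum() / denom) if denom > 0 else float("nan")
```

Calling the function once per time step with and without `missing_mask` would give the two kinds of curves described in the caption.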
Extended Data Fig. 6 Evaluation on reconstructed grid points only.
Annual global mean temperature anomaly reconstruction (°C) of 20CR (a) and CMIP (b), based on monthly grid reconstructions of the held-out 56th / 145th member and using only the reconstructed missing values, with the HadCRUT4 missing value mask between 1870 and 2005. Black: the held-out member without (w/o) the existing grid points. Black dashed: the original, complete held-out member, to show the effect of the missing values. Blue/red: the reconstructed grid time series of the 20crAI/cmipAI models without (w/o) the existing grid points.
Extended Data Fig. 7 Reconstruction analysis of additional Hadley Centre products.
Annual global mean temperature anomaly time series between 1850 and 2018. (a) HadCRUT4 original (masked) 100-member data in black (median, 95th and 5th percentiles). The HadCRUT4 reconstructions of the 20crAI/cmipAI models are shown in blue/red (median, 95th and 5th percentiles). (b) HadCRUT4 original (masked) data in black, HadSST4 original (masked) data in pink and HadMIX original (masked) data in orange. The originals are dashed and the reconstructions are solid lines. HadMIX uses all available HadSST4 grid points; where a HadSST4 grid point is not available (usually over land), the HadCRUT4 grid point is used.
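The HadMIX merge rule and the ensemble summary in panel (a) can be written down in a few lines. The sketch below assumes that missing grid points are encoded as NaN and that the ensemble is stored as an array of shape (members, years); both conventions and the function names are illustrative assumptions.

```python
import numpy as np

def merge_hadmix(hadsst4, hadcrut4):
    """Use HadSST4 where it has a value, otherwise fall back to HadCRUT4."""
    sst = np.asarray(hadsst4, dtype=float)
    crut = np.asarray(hadcrut4, dtype=float)
    # Points missing in both inputs stay NaN in the merged field.
    return np.where(np.isnan(sst), crut, sst)

def ensemble_envelope(members):
    """Return the 5th percentile, median and 95th percentile per year.

    members has shape (n_members, n_years).
    """
    members = np.asarray(members, dtype=float)
    p5, median, p95 = np.percentile(members, [5, 50, 95], axis=0)
    return p5, median, p95
```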
Extended Data Fig. 8 HadCRUT4 trends of AI models in grid space.
Trends in surface temperature from Fig. 4 for 1901–2012. White areas indicate incomplete or missing data. Trends have been calculated only for those grid boxes with records that are more than 70% complete and with more than 20% data availability in the first and last deciles of the period. Black plus signs (+) indicate grid boxes where trends are significant (that is, a trend of zero lies outside the 90% confidence interval). The graphics are constructed for comparison with IPCC AR5 Chapter 2, Figure 2.21. HadCRUT4 version 4.6.0.0 is used here, whereas the IPCC report used version 4.1.1.
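The data-availability criteria for a single grid box can be sketched as below. The sketch assumes NaN marks a missing value, interprets a decile as the first or last tenth of the time steps, and uses an ordinary least-squares slope over the available values; the significance test behind the plus signs is not reproduced, and all names are hypothetical.

```python
import numpy as np

def enough_coverage(series, min_total=0.70, min_edge=0.20):
    """Check the completeness criteria for one grid-box time series."""
    valid = ~np.isnan(np.asarray(series, dtype=float))
    edge = max(1, valid.size // 10)  # number of time steps in one decile
    return bool(
        valid.mean() > min_total
        and valid[:edge].mean() > min_edge
        and valid[-edge:].mean() > min_edge
    )

def linear_trend(years, series):
    """Least-squares slope per year, or NaN if coverage is insufficient."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(series, dtype=float)
    if not enough_coverage(values):
        return float("nan")
    valid = ~np.isnan(values)
    # Centre the time axis to keep the fit well conditioned.
    centred = years[valid] - years[valid].mean()
    slope, _intercept = np.polyfit(centred, values[valid], 1)
    return float(slope)
```

Grid boxes for which `linear_trend` returns NaN would correspond to the white areas in the maps.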
Extended Data Fig. 9 Spatial reconstruction of an observed El Niño.
As Fig. 3, but with additional datasets. The HadSST4 dataset (b) was recently released as an update to HadSST3, the ocean component of HadCRUT4 (a). The kriging analysis of Cowtan & Way (c) is shown next to Berkeley Earth (d). In July 1877, HadSST4 has three new grid points, which show very high (warm) temperature anomalies in a region (further south than usual) where the PCA reconstructions of 20crPCA (e) and cmipPCA (f) show a weak signal. The neural network reconstructions of 20crAI (g) and cmipAI (h) show a strong signal of an El Niño-like temperature pattern.
Supplementary information
Supplementary Information
Supplementary Figs. 1–5.
Source data
Source Data Fig. 1
Temperature anomaly maps in NetCDF format.
Source Data Fig. 2
Temperature anomaly time series in NetCDF format.
Source Data Fig. 3
Temperature anomaly maps in NetCDF format.
Source Data Fig. 4
Temperature anomaly time series in NetCDF format.
Source Data Extended Data Fig. 2
Statistical Source Data on maps in NetCDF format.
Source Data Extended Data Fig. 3
Statistical Source Data on maps in NetCDF format.
Source Data Extended Data Fig. 4
Temperature anomaly time series in NetCDF format.
Source Data Extended Data Fig. 5
Statistical measure time series in NetCDF format.
Source Data Extended Data Fig. 6
Temperature anomaly time series in NetCDF format.
Source Data Extended Data Fig. 7
Temperature anomaly time series in NetCDF format.
Source Data Extended Data Fig. 8
Temperature trend maps in NetCDF format.
Source Data Extended Data Fig. 9
Temperature anomaly maps in NetCDF format.
Cite this article
Kadow, C., Hall, D.M. & Ulbrich, U. Artificial intelligence reconstructs missing climate information. Nat. Geosci. 13, 408–413 (2020). https://doi.org/10.1038/s41561-020-0582-5
DOI: https://doi.org/10.1038/s41561-020-0582-5