Applying big data beyond small problems in climate research


The commercial success of big data has led to speculation that big-data-like reasoning could partly replace theory-based approaches in science. Big data has typically been applied to ‘small problems’: well-structured cases in which predictions can be evaluated repeatedly. Here, we show that climate research contains intermediate categories between classical domain science and big data, and that big-data elements have also been applied in cases where repeated evaluation is not possible. Big-data elements can be useful for climate research beyond small problems if they are combined with more traditional approaches based on domain-specific knowledge. The biggest potential for big-data elements, we argue, lies in socioeconomic climate research.

Change history

  • 18 March 2019

    In the version of this Perspective originally published, the following ‘Journal peer review information’ was missing: ‘Nature Climate Change thanks Prabhat, Wendy Parker and other anonymous reviewer(s) for their contribution to this work.’ This statement has now been added.

We thank C. Beisbart, A. Merrifield, S. Sippel, R. McMahon and J. Lilliestam for discussions and comments that have improved the quality of this manuscript. The research was supported by the Swiss National Science Foundation, National Research Programme 75 Big Data, project no. 167215.

Author information

B.K. reviewed and classified the studies and led the writing with contributions from all authors. All authors contributed to the framing and the development of the ideas of the paper.

Correspondence to Benedikt Knüsel.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Journal peer review information: Nature Climate Change thanks Prabhat, Wendy Parker and other anonymous reviewer(s) for their contribution to this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
