Applying big data beyond small problems in climate research

This article has been updated


Commercial success of big data has led to speculation that big-data-like reasoning could partly replace theory-based approaches in science. Big data typically has been applied to ‘small problems’, which are well-structured cases characterized by repeated evaluation of predictions. Here, we show that in climate research, intermediate categories exist between classical domain science and big data, and that big-data elements have also been applied without the possibility of repeated evaluation. Big-data elements can be useful for climate research beyond small problems if combined with more traditional approaches based on domain-specific knowledge. The biggest potential for big-data elements, we argue, lies in socioeconomic climate research.


Change history

  • 18 March 2019

    In the version of this Perspective originally published, the following ‘Journal peer review information’ statement was missing: “Nature Climate Change thanks Prabhat, Wendy Parker and other anonymous reviewer(s) for their contribution to this work.” This statement has now been added.




We thank C. Beisbart, A. Merrifield, S. Sippel, R. McMahon and J. Lilliestam for discussions and comments that have improved the quality of this manuscript. The research was supported by the Swiss National Science Foundation, National Research Programme 75 Big Data, project no. 167215.

Author information




B.K. reviewed and classified the studies and led the writing with contributions from all authors. All authors contributed to the framing and the development of the ideas of the paper.

Corresponding author

Correspondence to Benedikt Knüsel.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Journal peer review information: Nature Climate Change thanks Prabhat, Wendy Parker and other anonymous reviewer(s) for their contribution to this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Knüsel, B., Zumwald, M., Baumberger, C. et al. Applying big data beyond small problems in climate research. Nat. Clim. Chang. 9, 196–202 (2019).
