  • Review Article
  • Published:

Big Data in Earth system science and progress towards a digital twin

Abstract

The concept of a digital twin of Earth envisages the convergence of Big Earth Data with physics-based models in an interactive computational framework that enables monitoring and prediction of environmental and social perturbations for use in sustainable governance. Although computational capabilities are advancing rapidly, digital twins of Earth have not yet been produced. In this Review, we summarize the methodological and cyberinfrastructure advances in Big Data that have driven progress towards a digital twin of Earth. Data assimilation provides the framework for incorporating high-resolution observations into Earth system models but lacks the decision-making interface and learning ability needed for the digital twin. Machine learning (particularly deep learning) in Earth system science is now better able to handle the high dimensionality, complexity and nonlinearity of real-life Earth systems and is expanding the ability to learn from Big Data. Advances in causal inference and reinforcement learning are, respectively, increasing the interpretability of Big Data and the ability of simulations to solve sequential decision-making problems. Social sensing data could provide inputs for multiagent deep reinforcement learning via feedback loops between agents and the environment, enabling large-scale applications in human system modelling. Future research must focus on finding the optimal way to integrate these individual methodologies to achieve digital twins.

Key points

  • The volume of Big Earth Data is increasing year on year across all categories (remote sensing, in situ, social sensing, and simulation and reanalysis), with the addition of social sensing data contributing the largest increase since the 2010s.

  • Big Data assimilation encapsulates the strengths of data-driven approaches and incorporates them into ultrahigh-resolution Earth system models, allowing the assimilation of multisource observations.

  • Combining machine learning with process-based models and causal inference can enhance the transferability, interpretability and predictability of Earth system science.

  • Deep reinforcement learning integrated with agent-based modelling provides a promising framework to address complex governance decision-making problems.

  • These advances, plus technological innovations in computer infrastructure, are allowing Earth system research to evolve towards a digital twin of Earth, a replication of the Earth system constrained by physical laws and available Big Earth Data.

  • Big Data and the development of the digital twin are helping the scientific community to comprehensively model the coevolution of humans and nature, and to address sustainable development issues at a planetary scale.
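
As a minimal sketch of what assimilating multisource observations involves, the Python snippet below applies the standard scalar Kalman analysis update sequentially to a model forecast. It is an illustration only, not the method of any system covered in this Review: the function name `assimilate`, the scalar state and the assumption of independent Gaussian observation errors are simplifications introduced for this example.

```python
def assimilate(forecast_mean, forecast_var, observations, obs_vars):
    """Fold independent scalar observations into a scalar forecast.

    Hypothetical helper for illustration: each observation is assumed to
    measure the state directly, with Gaussian error variance obs_vars[i].
    Returns the analysis (posterior) mean and variance.
    """
    mean, var = float(forecast_mean), float(forecast_var)
    for y, r in zip(observations, obs_vars):
        gain = var / (var + r)            # Kalman gain: weight given to the observation
        mean = mean + gain * (y - mean)   # move the estimate towards the observation
        var = (1.0 - gain) * var          # uncertainty shrinks after each update
    return mean, var


# Example: a model forecast of 10.0 (variance 4.0) combined with two
# observation sources of differing accuracy.
analysis_mean, analysis_var = assimilate(10.0, 4.0, [12.0, 11.0], [4.0, 2.0])
print(analysis_mean, analysis_var)
```

Each observation pulls the estimate towards itself in proportion to its relative precision, and the analysis variance decreases with every source added. Operational systems apply the same principle to very high-dimensional states, typically with ensemble or variational methods.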

Fig. 1: Transition of data use in Earth system science.
Fig. 2: Big Data assimilation into ultrahigh-resolution models.
Fig. 3: Interactions between deep learning, physics-informed machine learning, causal inference and reinforcement learning in Earth system science.
Fig. 4: Causal inference to determine causation, causal pathway and causal effect of the Walker circulation.
Fig. 5: Identification of policy pathways on sustainable development using deep reinforcement learning.
Fig. 6: Grand challenges of Big Data use in Earth system science.


    Article  Google Scholar 

  164. Zhu, R., Hou, Z., Guo, Z. & Wan, B. Summary of “The past, present and future of the habitable Earth: development strategy of Earth science”. Chin. Sci. Bull. 66, 4485–4490 (2021).

    Article  Google Scholar 

  165. Zhu, R., Zhao, G., Xiao, W., Chen, L. & Tang, Y. Origin, accretion, and reworking of continents. Rev Geophys. 59, e2019RG000689 (2021).

    Article  Google Scholar 

  166. Fan, J. et al. A high-resolution summary of Cambrian to Early Triassic marine invertebrate biodiversity. Science 367, 272 (2020).

    Article  Google Scholar 

  167. Wang, C. et al. The deep-time digital Earth program: data-driven discovery in geosciences. Natl Sci. Rev. 8, nwab027 (2021). A review of the current fundamental challenges of data-driven discoveries in the understanding of Earth’s evolution in deep time.

    Article  Google Scholar 

  168. Lewis, S. L. & Maslin, M. A. Defining the Anthropocene. Nature 519, 171–180 (2015).

    Article  Google Scholar 

  169. Ritchie, P. D. L., Clarke, J. J., Cox, P. M. & Huntingford, C. Overshooting tipping point thresholds in a changing climate. Nature 592, 517–523 (2021).

    Article  Google Scholar 

  170. Keys, P. W. et al. Anthropocene risk. Nat. Sustain. 2, 667–673 (2019).

    Article  Google Scholar 

  171. Otto, I. M. et al. Social tipping dynamics for stabilizing Earth’s climate by 2050. Proc. Natl Acad. Sci. USA 117, 2354–2365 (2020).

    Article  Google Scholar 

  172. Guo, H. et al. Measuring and evaluating SDG indicators with Big Earth Data. Sci. Bull. 67, 1792–1801 (2022).

    Article  Google Scholar 

  173. Fu, B. & Li, Y. Bidirectional coupling between the Earth and human systems is essential for modeling sustainability. Natl Sci. Rev. 3, 397–398 (2016).

    Article  Google Scholar 

  174. Liu, J. et al. Complexity of coupled human and natural systems. Science 317, 1513–1516 (2007).

    Article  Google Scholar 

  175. Cheng, G. & Li, X. Integrated research methods in watershed science. Sci. China Earth Sci 58, 1159–1168 (2015).

    Article  Google Scholar 

  176. DeFries, R. & Nagendra, H. Ecosystem management as a wicked problem. Science 356, 265–270 (2017).

    Article  Google Scholar 

  177. Grundmann, R. Climate change as a wicked social problem. Nat. Geosci. 9, 562–563 (2016).

    Article  Google Scholar 

  178. Li, X., Zheng, D., Feng, M. & Chen, F. Information geography: the information revolution reshapes geography. Sci. China Earth Sci 65, 379–382 (2022).

    Article  Google Scholar 

  179. Rittel, H. W. J. & Webber, M. M. Dilemmas in a general theory of planning. Policy Sci. 4, 155–169 (1973).

    Article  Google Scholar 

  180. Huang, Y., Zhang, Y., Youtie, J., Porter, A. L. & Wang, X. How does national scientific funding support emerging interdisciplinary research: a comparison study of Big Data research in the US and China. PLoS ONE 11, e0154509 (2016).

    Article  Google Scholar 

  181. Gorelick, N. et al. Google Earth engine: planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 202, 18–27 (2017).

    Article  Google Scholar 

  182. Bojer, C. S. & Meldgaard, J. P. Kaggle forecasting competitions: an overlooked learning opportunity. Int. J. Forecast. 37, 587–603 (2021).

    Article  Google Scholar 

  183. Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).

    Article  Google Scholar 

  184. Cannon, M., Kelly, A. & Freeman, C. Implementing an Open & FAIR data sharing policy — a case study in the Earth and environmental sciences. Learned Publ. 35, 56–66 (2022).

    Article  Google Scholar 

  185. Li, X. et al. Boosting geoscience data sharing in China. Nat. Geosci. 14, 541–542 (2021).

    Article  Google Scholar 

  186. National Academies of Sciences, Engineering, and Medicine. Open Science by Design: Realizing a Vision for 21st Century Research (National Academies Press, 2018).

  187. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

    Article  Google Scholar 

  188. Miyoshi, T. et al. “Big Data assimilation” revolutionizing severe weather prediction. Bull. Am. Meteorol. Soc. 97, 1347–1354 (2016). Exemplified the ability of Big Data assimilation for faster weather prediction with ultrahigh spatial–temporal resolution.

    Article  Google Scholar 

  189. Fan, J., Han, F. & Liu, H. Challenges of Big Data analysis. Natl Sci. Rev. 1, 293–314 (2014).

    Article  Google Scholar 

  190. Guo, H. Big Earth Data: A new frontier in Earth and information sciences. Big Earth Data 1, 4–20 (2017).

    Article  Google Scholar 

  191. Guo, H. et al. Big Earth Data: a new challenge and opportunity for digital Earth’s development. Int. J. Digital Earth 10, 1–12 (2017).

    Article  Google Scholar 

  192. Liang, J. & Gamarra, J. G. P. The importance of sharing global forest data in a world of crises. Sci. Data 7, 424 (2020).

    Article  Google Scholar 

  193. Klopper, K. B., de Witt, R. N., Bester, E., Dicks, L. M. T. & Wolfaardt, G. M. Biofilm dynamics: linking in situ biofilm biomass and metabolic activity measurements in real-time under continuous flow conditions. npj Biofilms Microbiomes 6, 1–10 (2020).

    Article  Google Scholar 

  194. Madaan, A., Sharma, V., Pahwa, P., Das, P. & Sharma, C. in Big Data Analytics (eds. Aggarwal, V. B. et al.) 47–54 (Springer, 2018).

  195. Li, J. et al. Social media: new perspectives to improve remote sensing for emergency response. Proc. IEEE 105, 1900–1912 (2017).

    Article  Google Scholar 

  196. Huang, Z., Qi, H., Kang, C., Su, Y. & Liu, Y. An ensemble learning approach for urban land use mapping based on remote sensing imagery and social sensing data. Remote Sens. 12, 3254 (2020).

    Article  Google Scholar 

Download references

Acknowledgements

The authors thank Y. Zen and G. Zhang for comments on the manuscript, X. Tian for suggestions on data assimilation, Y. Bai for suggestions on simulation and reanalysis data, C. Wang and K. Zhang for assistance in preparing the manuscript, Y. Ge and J. Qin for inspiring and improving figures, J. Runge for the PCMCI dataset, P. Bauer for sharing the Destination Earth figure, C. F. Mass and T. Miyoshi for permission to use their data in Fig. 2, and F. M. Strnad for providing the code and data in Fig. 5b. This work was jointly supported by the Strategic Priority Research Program of Chinese Academy of Sciences (XDA19070104) and the National Natural Science Foundation of China (41988101 and 42171140).

Author information

Contributions

X.L. conceptualized the Review. X.L. and M.F. led the discussions and coordinated inputs. X.L. and F.L. contributed the section on Big Data assimilation. Y.R., H.S., J.S., S.Y., Y.S. and C.H. contributed the section on machine and deep learning. M.F. and Q.X. contributed the digital twin section. All authors reviewed the manuscript before submission.

Corresponding authors

Correspondence to Xin Li or Min Feng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Earth & Environment thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Copernicus services: https://www.copernicus.eu/en/copernicus-services

Destination Earth: https://digital-strategy.ec.europa.eu/en/policies/destination-earth

Earth-2: https://blogs.nvidia.com/blog/2021/11/12/earth-2-supercomputer

eLTER: https://elter-ri.eu

National Science Foundation of the United States of America: https://www.nsf.gov/cise/bigdata/

Particulate Matter (PM) 2.5 sites in China: https://aqicn.org

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, X., Feng, M., Ran, Y. et al. Big Data in Earth system science and progress towards a digital twin. Nat Rev Earth Environ 4, 319–332 (2023). https://doi.org/10.1038/s43017-023-00409-w

  • DOI: https://doi.org/10.1038/s43017-023-00409-w
