A topography of climate change research


The massive expansion of scientific literature on climate change (ref. 1) poses challenges for global environmental assessments and our understanding of how these assessments work. Big data and machine learning can help us deal with large collections of scientific text, making the production of assessments more tractable and giving us better insights into how past assessments have engaged with the literature. We use topic modelling to draw a topic map, or topography, of over 400,000 publications from the Web of Science on climate change. We update current knowledge on the IPCC, showing that compared with the baseline of the literature identified, the social sciences are in fact over-represented in recent assessment reports. Technical, solutions-relevant knowledge—especially in agriculture and engineering—is under-represented. We suggest a variety of other applications of such maps, and our findings have direct implications for addressing growing demands for more solution-oriented climate change assessments that are also more firmly rooted in the social sciences (refs 2,3). The perceived lack of social science knowledge in assessment reports does not necessarily imply an IPCC bias, but rather suggests a need for more social science research with a focus on technical topics on climate solutions.


Fig. 1: Number of climate change documents in the Web of Science each year.
Fig. 2: A map of the literature on climate change.
Fig. 3: Evolution of the landscape of climate change literature.
Fig. 4: Representation in IPCC reports.

Data availability

The datasets from the study are available as Supplementary Information files.

Code availability

The code used to produce this paper is available at https://github.com/mcallaghan/cc-topography.


References

1. Minx, J. C., Callaghan, M., Lamb, W. F., Garard, J. & Edenhofer, O. Learning about climate change solutions in the IPCC and beyond. Environ. Sci. Policy 77, 252–259 (2017).
2. Kowarsch, M. et al. A road map for global environmental assessments. Nat. Clim. Change 7, 379–382 (2017).
3. Victor, D. G. Embed the social sciences in climate policy. Nature 520, 7–9 (2015).
4. Nunez-Mir, G. C. et al. Automated content analysis: addressing the big literature challenge in ecology and evolution. Methods Ecol. Evol. 7, 1262–1272 (2016).
5. Grieneisen, M. & Zhang, M. The current status of climate change research. Nat. Clim. Change 1, 72–73 (2011).
6. Haunschild, R., Bornmann, L. & Marx, W. Climate change research in view of bibliometrics. PLoS ONE 11, e0160393 (2016).
7. IPCC Climate Change 2014: Synthesis Report (eds Core Writing Team, Pachauri, R. K. & Meyer, L. A.) (IPCC, 2014).
8. Rao, V. B. et al. Future increase in extreme El Nino events under greenhouse warming increases Zika virus incidence in South America. npj Clim. Atmos. Sci. 2, 2–8 (2019).
9. IPCC Principles Governing IPCC Work (IPCC, 2013).
10. Chalmers, I., Hedges, L. V. & Cooper, H. A brief history of research synthesis. Eval. Health Prof. 25, 12–37 (2002).
11. Beller, E. et al. Making progress with the automation of systematic reviews: principles of the International Collaboration for the Automation of Systematic Reviews (ICASR). Syst. Rev. 7, 77 (2018).
12. Bjurström, A. & Polk, M. Physical and economic bias in climate change research: a scientometric study of IPCC Third Assessment Report. Climatic Change 108, 1–22 (2011).
13. Blei, D., Carin, L. & Dunson, D. Probabilistic topic models. IEEE Signal Process. Mag. 27, 55–65 (2010).
14. Lee, D. D. & Seung, H. S. Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999).
15. Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
16. Moss, R. H. et al. The next generation of scenarios for climate change research and assessment. Nature 463, 747–756 (2010).
17. Hulme, M. & Mahony, M. Climate change: what do we know about the IPCC? Prog. Phys. Geog. 34, 705–718 (2010).
18. Corbera, E., Calvet-Mir, L., Hughes, H. & Paterson, M. Patterns of authorship in the IPCC Working Group III report. Nat. Clim. Change 6, 94–99 (2016).
19. Overland, I. & Sovacool, B. K. The misallocation of climate research funding. Energy Res. Soc. Sci. 62, 101349 (2020).
20. Ford, J. D. et al. Including indigenous knowledge and experience in IPCC assessment reports. Nat. Clim. Change 6, 349–353 (2016).
21. Meehl, G. A. et al. The WCRP CMIP3 multimodel dataset: a new era in climatic change research. Bull. Am. Meteorol. Soc. 88, 1383–1394 (2007).
22. Le, Q. V. & Mikolov, T. Distributed representations of sentences and documents. In Proc. International Conference on Machine Learning Vol. 32 (eds Xing, E. P. & Jebara, T.) 1188–1196 (PMLR, 2014).
23. Khabsa, M. & Giles, C. L. The number of scholarly documents on the public web. PLoS ONE 9, e93949 (2014).
24. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
25. Chaney, A. J. B. & Blei, D. M. Visualizing topic models. In Proc. Sixth International AAAI Conference on Weblogs and Social Media 419–422 (Association for the Advancement of Artificial Intelligence, 2012).
26. Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L. & Blei, D. M. Reading tea leaves: how humans interpret topic models. Adv. Neural Inf. Process. Syst. 22, 288–296 (2009).
27. Hall, D., Jurafsky, D. & Manning, C. D. Studying the history of ideas using topic models. In Proc. Conference on Empirical Methods in Natural Language Processing 363–371 (Association for Computational Linguistics, 2008).



Acknowledgements

M.C. is supported by a PhD stipend from the Heinrich Böll Stiftung. J.M. acknowledges funding from the German Federal Ministry of Education and Research within the PEGASOS project (grant reference: 01LA1826A).

Author information




M.W.C. and J.C.M. designed the research. M.W.C. performed the analysis. M.W.C., J.C.M. and P.M.F. analysed the results. M.W.C. wrote the manuscript with contributions from all authors.

Corresponding author

Correspondence to Max W. Callaghan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Climate Change thanks Robin Haunschild, Hannah Hughes and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Disciplinary Entropy of Topics.

Coloured bars show the proportion of each topic made up of papers from each disciplinary category. Crosses show the disciplinary entropy of each topic (see Methods for details).
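A disciplinary entropy of this kind can be sketched as the Shannon entropy of a topic's distribution across disciplinary categories; the exact definition used in the paper is given in the Methods, and the proportions below are invented for illustration.

```python
import math

def entropy(proportions):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]       # topic spread evenly over 4 disciplines
concentrated = [0.97, 0.01, 0.01, 0.01]  # topic dominated by one discipline

print(entropy(uniform))        # maximal for 4 categories: log(4) ≈ 1.386
print(entropy(concentrated))   # much lower: topic is disciplinarily narrow
```

Higher entropy indicates a topic drawing evenly on many disciplines; entropy near zero indicates a topic confined to one.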

Extended Data Fig. 2 Topic make-up of a single document.

The Doc-Term Matrix shows the number of occurrences of each term in the document. The Topic-Term Matrix shows the topic score of each term–topic combination. The Doc-Topic Matrix shows the document–topic score for each topic. The topic make-up of the document shown is illustrated by the bars in the top left. Words that occur in the document and are highly associated with each topic are highlighted. All values are real, although the doc-term matrix is scaled by the inverse document frequency before being used in the model.

Extended Data Fig. 3 IPCC Representation by subfield.

Representation is the share of a subset among the documents cited by the IPCC divided by its share of the whole literature. We plot on a log scale so that 0.5 is as far from 1 as 2 is; plot labels show the untransformed values.
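The representation measure defined in this caption amounts to a ratio of shares; the counts below are invented for illustration.

```python
import math

cited_in_subset, cited_total = 150, 10_000   # documents cited by the IPCC
lit_in_subset, lit_total = 8_000, 400_000    # whole climate change literature

representation = (cited_in_subset / cited_total) / (lit_in_subset / lit_total)
print(representation)   # 0.75 -> the subset is under-represented

# On a log scale, 0.5 and 2 are equally far from 1:
assert math.isclose(abs(math.log(0.5)), abs(math.log(2)))
```

Values above 1 indicate over-representation relative to the literature baseline, values below 1 under-representation.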

Extended Data Fig. 4 Social science proportion and representation in topics across working groups.

Representation is the share of a subset among the documents cited by the IPCC divided by its share of the whole literature. The social science proportion is the proportion of the total document–topic score coming from documents in the social sciences.

Extended Data Fig. 5 Topic representation over different values of K (number of topics).

Topics in the upper or lower 6.66th percentile of either dimension are labelled. Representation is the share of a subset among the documents cited by the IPCC divided by its share of the whole literature. Assessment period occurrence refers to the centre of a topic's distribution across assessment periods (see Methods for further details).
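One plausible reading of the "assessment period occurrence" metric is the weighted mean (centre) of a topic's score distribution across assessment periods; the exact definition is in the Methods, and the period labels and scores below are invented.

```python
periods = [1, 2, 3, 4, 5]                 # IPCC assessment periods (illustrative)
topic_score = [0.0, 0.1, 0.2, 0.3, 0.4]   # topic's total score in each period

# Centre of the topic's distribution across assessment periods
centre = sum(p * s for p, s in zip(periods, topic_score)) / sum(topic_score)
print(centre)   # ≈ 4.0 -> topic concentrated in later assessment periods
```

A topic whose scores are concentrated in early periods gets a low centre; one growing in recent periods gets a high centre.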

Supplementary information

Supplementary Information

Supplementary Table 1.

Reporting Summary

Supplementary Data 1

A list of the documents considered in this study, along with basic metadata and their position on the map. For copyright reasons, the full metadata from WoS cannot be published. To reproduce the analysis, it would be necessary to download the abstracts for the papers shown, either using the WoS IDs provided or the query documented in ref. 5.

Supplementary Data 2

A list of the topics, along with their features discussed in this paper. The top ten words associated with each topic are also shown.

Supplementary Data 3

A list of document-topic scores, which can be cross-referenced with the document and topic IDs in docs.csv and topics.csv.
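Cross-referencing the document-topic scores with the document and topic tables can be done with two joins; the column names (doc_id, topic_id) and table contents here are assumptions for illustration, not the actual schema of the Supplementary Data files.

```python
import pandas as pd

# Stand-ins for docs.csv, topics.csv and the document-topic scores
docs = pd.DataFrame({"doc_id": [1, 2], "title": ["Paper A", "Paper B"]})
topics = pd.DataFrame({"topic_id": [10, 11],
                       "top_words": ["sea, level", "crop, yield"]})
scores = pd.DataFrame({"doc_id": [1, 1, 2], "topic_id": [10, 11, 10],
                       "score": [0.8, 0.2, 0.5]})

# Attach document metadata and topic labels to each score
merged = (scores
          .merge(docs, on="doc_id")
          .merge(topics, on="topic_id"))
print(merged)
```

Each row of the result pairs one document-topic score with the document's metadata and the topic's top words.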

Supplementary Data 4

Results from models with different numbers of topics, used to select the topic model analysed in this paper.


About this article


Cite this article

Callaghan, M.W., Minx, J.C. & Forster, P.M. A topography of climate change research. Nat. Clim. Chang. 10, 118–123 (2020). https://doi.org/10.1038/s41558-019-0684-5
