“We’ve been trying to do as good a job as we could, but in the end building these databases ends up being a daunting task.”

– Volker Krey, IIASA and Lead Author for Chapter 4, during a Webinar on AR6 Scenarios Databases (IPCC 2023).

Introduction

Emissions scenarios have been a crucial part of IPCC reports since the First Assessment Report in 1990. They serve as boundary objects that harmonize assumptions about the future across Working Groups. As such, they have been instrumental to the production of a shared assessment of climate research1,2. During the first three assessment cycles, reference emissions scenarios were produced within the remit of the IPCC. Starting from the late 1990s, with the development of Integrated Assessment Models (IAMs), an increasing number of mitigation scenarios were published in the literature – and so assessed by the IPCC. In 2006, the IPCC decided to delegate the development of reference scenarios to the research community, in part due to concerns about maintaining the separation between research and assessment. This delegation accompanied the emergence and organization of a research community that produces and analyses scenarios1,3. Over the same period, the number of scenarios available in the literature has grown steadily, from several hundred in 2000 to several thousand by the time of AR6.

The practice of collecting these scenarios in online databases goes back to 1997, when the first Emissions Scenarios Database was published on the web by CGER4, (p. 77). Since the Second Assessment Report, each Working Group III report has included a chapter dedicated to long-term scenarios, and, after 2001, based on scenarios databases (Chapter 10 in AR2, Chapter 2 in AR3, Chapter 3 in AR4, Chapter 6 in AR5, Chapter 3 in AR6; also Chapter 2 in SR1.5). Scenarios databases have been a consistent tool in IPCC assessments for 25 years, making it possible to collect and compare large ensembles of modeled futures. Over this timespan, they have considerably grown in size, scope, and ambitions. The AR6 database comprises 3131 scenarios from over 95 model families, including not only global pathways, but also national, regional, and sectoral pathways. The AR6 data template for global pathways lists around 2000 variables – there were 135 in the data template for the AR5 database. The Scenario Explorer5 was developed to be more user-friendly than previous interfaces, and communication from the IPCC on how to use the database signals an ambition to make it widely available and usable6.

The evolution of IPCC Scenarios Databases has been shaped by several interwoven processes, including the increase in the number of climate change research publications, the evolving climate policy landscape, the organization of the IAM community, and the lively discussions around the transparency, influence, and relevance of IAMs (e.g.)7,8,9,10,11,12,13. Because most of the scenarios they contain are produced by IAMs, and because the databases are mainly developed and assessed by researchers from the IAM community, their history is entangled with that of Integrated Assessment Modeling: they were shaped by the co-evolution of IAMs and the IPCC.

While these databases have helped establish IAM research as a pivotal and influential part of IPCC assessments, participation in the IPCC has also shaped the organization and agenda of IAM research. This is similar to the history of the Coupled Model Intercomparison Project (CMIP) in relation to the IPCC WGI, though CMIP is much larger in terms of scale, budget, complexity, and quantity of data14,15,16. Drouet et al. 16, (p. 14503) note that “the choice of the scenarios that serve as inputs for models within the CMIP is exogenous to the scientific enterprise and has been driven by the concern of producing information that is relevant for decision-making in the context of climate change”. The dynamic of co-evolution of research communities and infrastructures with the IPCC is thus not unique to IAMs. While there appears to be little literature on the history of CMIP and its co-evolution with the IPCC outside of CMIP publications themselves, debates around IAMs echo those around CMIP and Global Circulation Models (GCMs). Earlier research on the climate science-policy interface has for instance analyzed the establishment of GCMs as the dominant form of climate models and its influence on the framing of climate change17,18,19,20. Touzé-Peiffer et al. 15, (p. 12) similarly argue that CMIP has contributed to promoting climate research centered on GCMs as well as specific ways of using GCMs “as black boxes” by enabling people who have not participated in model development to analyze model results. On the other hand, Drouet et al. 16, (p. 14503) emphasize that CMIP has helped model improvement and increased the consistency, intelligibility, and relevance of IPCC reports, and that “together with the provision of a large database, the CMIP (…) promotes ARs accuracy”.

From the perspective of taking stock of the role and activities of the IPCC, studying Scenarios Databases is relevant in at least two respects. First, it contributes to the analysis of the co-evolution of the IPCC and of climate research. This is crucial to understand the role of IAMs within IPCC assessments, as well as the way criticism of IAMs is addressed in practice. Second, Scenarios Databases are a site from which to observe and analyze ongoing tensions regarding the role and mandate of the IPCC. At stake in the construction and use of these databases is the delicate combination of assessment, policy-relevance and transparency. For WGIII in the AR6 cycle, this translated into two ambitions expressed by the co-chairs: to enhance the transparency of the modeling assumptions and approaches assessed by the IPCC, and to better integrate global scenarios with a broader diversity of perspectives13,21. While this was part of a wider effort of the IPCC to enhance transparency for AR622,23,24, these ambitions also echoed two types of critiques of IAMs that emerged in the literature after AR5.

The first critique emphasises that the prominence of IAMs has tended to narrow down the “corridor” of possibilities11,12: the types of scenarios more frequently or more easily modeled by IAMs have taken center-stage, at the expense of alternative but less-modeled options such as post-growth scenarios9,25,26. The second issue is that of transparency. It has become more acute since IAMs have acquired a pivotal role at the science-policy interface. The complexity of IAMs means that their logic and functioning can be hard to grasp for outsiders, despite considerable efforts to improve model documentation and to open-source the codes since AR5 (for instance, the code of major IAMs such as MESSAGE, REMIND and GCAM is now open source, and the IAMC website and wiki provide documentation of most of the main IAMs). Critics of IAMs have expressed concerns about the difficulty of tracing input assumptions and underlying worldviews10,27,28.

During the AR6 cycle, two databases were developed and used: the IAMC 1.5°C Scenario Explorer29 and the AR6 Scenario Explorer5. This article analyses the process and practices of building these databases and situates them within the history of scenarios databases since the 1990s. It complements Peters et al.’s contribution to this collection14 with a socio-historical perspective on scenarios “data journeys”30. The article first retraces the interwoven histories of scenarios databases, the IPCC WGIII, and the IAM community, and then follows the construction of the AR6 Scenarios Database step by step. Given these interwoven histories and co-evolutions, the article outlines the prominence of IAMs in Scenarios Databases. The point, however, is not to contest the influence of IAMs. Rather, the article aims to analyze the challenges and trade-offs in building and broadening a data infrastructure in the context of IPCC assessments, so as to provide a basis for reflecting on how best to broaden and consolidate the IPCC Working Group III evidence base.

Scenarios databases as knowledge infrastructures

Scenarios databases are not a stand-alone product. They are part of a data infrastructure that has developed along with the IAM research community. Over the last two decades, the development of IAMs and the involvement of IAM research groups in multiple large model intercomparison projects (MIPs) have generated increasing numbers of scenarios. The models themselves have grown more complex, integrating more modules and details, hence more variables. This produces a huge amount of scenario data that is quite heterogeneous, insofar as it comes from models that have different architectures, assumptions, and solving mechanisms1. As a result, the IAM community has put a lot of effort into collecting, standardizing, and categorizing data: it has developed an infrastructure to work with and share scenario data.

This infrastructure relies on intercomparison protocols, data templates, model and scenario documentation, data reporting standards, open-source code packages31,32, tools and methodologies to analyze large scenarios ensembles, and classifications such as the RCP-SSP matrix or the climate categories and policy categories used in AR6. It is not stabilized: it evolves as models are developed and refined, as calls for more transparency arise, but also as new research questions are asked, new users become interested in the data, and new topics are brought within the scope of long-term global scenarios work. How are these databases developed in practice? How are they adapted to new ideas, projects, or expectations?

These questions are informed by a sociological and ethnographic approach to data, classification, and infrastructure which entails considering data infrastructures as practical achievements – as processes rather than end-products. In this perspective, knowledge infrastructures are understood as the combination of the technological and social elements (hardware, institutions, standards, etc.) that support the production and dissemination of knowledge33, (p. 1-25). Scholarship in Science and Technology Studies (STS) has explored the practical work of “sorting things out”34, maintaining standards and collecting, ordering, and exchanging data. It has also studied the conditions for data mobility and interoperability, comparing data infrastructures, movement, and practices in different scientific fields35. This literature has underlined the specific difficulties of building and using databases in environmental research, where data travel across disciplines36,37. It builds on and extends the tradition of laboratory ethnography38,39,40, and in particular its attention to writing and documentation in the making of scientific facts41,42.

The main tenets of this perspective can be summarised as follows. First, data do not already exist “out there”: they are produced. Making data is a process of literally forming the world; that is, reshaping things so that some of their features become information that can travel far and wide41. Second, data cannot always be fully extracted from their context of production: instrumentation, classification conventions, models, etc. leave their marks on data33. This becomes clear in attempts to collate past and present data, or data from different disciplines: differences (across time, space, or disciplines) in reporting standards, instruments, or research interests can impede commensurability36, generating what Edwards et al.37 have called “science friction”. It takes coordination work, informal exchanges, and careful maintenance to allow data to travel across communities. This also implies that knowledge infrastructures are constructed together with the communities that use them34. Third, constructing databases and information infrastructures is technical as much as political work. As Bowker writes, “information infrastructures such as databases should be read both discursively and materially; […] they are a site of political and ethical as well as technical work”36, (p. 647). Organising data takes part in organising the world, in deciding what is counted and what is not, what is named and what is not, what is bundled together and what is differentiated.

Such a perspective translates into an attention to the institutional as well as practical work that sustains data infrastructures, and to their political dimensions. This implies combining two lines of investigation. On the one hand, the paper seeks to clarify the co-evolution of scenarios data infrastructures and the IAM community, with a focus on the challenge of extending the reach of the data infrastructure beyond IAM research. On the other hand, it analyses the practicalities of building, but also filling and using Scenarios Databases in the context of IPCC assessments, which differs from that of IAM research projects, in order to understand how this contributes to organising the “possibility space”43 and framing the IPCC messages. From structuring data and metadata templates to extracting messages from the database, each step in the process is shaped by the requirements specific to IPCC assessments (representation of the literature, transparency, policy-relevance, integration across disciplines), along with technical, institutional and material constraints associated with the development and maintenance of data infrastructures.

Results

A history of scenarios databases as research instruments and assessment tools

Emissions scenarios databases are almost as old as the IPCC. The first one was collected by Tsuneyuki Morita in 1994 to inform the assessment of the IPCC 1992 emissions scenarios44,45. Collected using a questionnaire sent by fax to 34 experts, this database fits in fewer than twenty pages. Since then, scenarios databases have been a consistent feature of the work of WGIII. They have also become research instruments: they are now a staple of IAM research projects and a crucial device for coordination across modeling teams. This section retraces the history of scenarios databases to show the evolutions in their design, role and purpose.

Today, Scenarios Databases are hosted and managed at IIASA and data standards are overseen by the IAMC Scientific Working Group on Data Protocols and Management. This arrangement is the result of 30 years of iterative and collective work that has been part and parcel of the structuration of IAM research. It was shaped by four interwoven processes: the regular collection and assessment of published scenarios for the IPCC; the organization of research collaborations across modeling teams; the coordination and standardization of data and metadata formats, templates, and documentation, a task undertaken within the IAMC; and the development of the actual database software and services.

1994-2006: Mapping the literature and informing IPCC scenario development

Originally, scenarios databases were the answer to a practical need: collecting “both the main basic assumptions” of the models “and the latest outcomes” of the scenarios44, (p. 2) in order to complement the information published in the literature. As the number of quantified emissions scenarios increased, reviewing and assessing the literature required a centralized database to manage the “huge amount of data related to emissions scenarios of greenhouse gases” (http://sres.ciesin.columbia.edu/OpenProcess/htmls/MoritaDB.html [accessed 10/02/2023]). Nakicenovic et al.46 and Morita and Lee (http://sres.ciesin.columbia.edu/OpenProcess/htmls/MoritaDB.html [accessed 10/02/2023]) mention several databases of scenarios that informed the 1994 WGIII report, compiled by the Energy Modeling Forum as part of EMF12, the International Energy Workshop, and ETSAP. The survey by Morita in 1993-1994 appears as the first attempt to systematically review scenarios and their underpinning assumptions in order to inform the work of the IPCC.

Efforts to systematically collect scenarios were amplified during the preparation of the IPCC Special Report on Emission Scenarios (SRES). The first “IPCC Emission Scenarios Database”, sometimes referred to as “Morita’s Database”, was published on the web in 1997. It was hosted at the Centre for Global Environmental Research in Tsukuba, Japan, and designed as a relational database using MS Access ’97. In a progress report on the SRES process, Nakicenovic explained that the objective was “to expand the database into a scenario assessment tool in itself” (http://sres.ciesin.org/OpenProcess/htmls/prg_report.html [accessed 10/02/2023]). The database was first meant to help the review of the literature by collecting as many scenarios as possible. It was also used to perform statistical analyses of the data in order to map, not the full range of possible scenarios, but the full range of scenarios and assumptions in the literature so as to assist the conception of representative reference scenarios46. This database, which initially documented 428 global and regional scenarios, was maintained and expanded until 2009. It was used for the assessment of post-SRES scenarios in AR3 and AR44,47,48,49. In parallel, the SRES Writing Team designed scenarios templates and invited modeling groups to submit scenarios in an “open process”. Forty-one scenarios were collected in a web-based database.

The SRES process was rather complex: it aimed to collect, review, and standardize scenario data in order to ensure that the set of IPCC reference scenarios was representative of the ranges of assumptions and outputs in the literature. The design of the database reflects the ambition to reconcile several purposes. The open process was meant to enhance methodological and geographical pluralism; the templates allowed for standardization and documentation of reported data in order to ensure the comparability of scenarios; and publication on the web provided transparency by ensuring the accessibility of data (note that the databases are still available 25 years later at http://sres.ciesin.columbia.edu/ [accessed 22/09/2023]).

Until AR4, the IPCC produced its own Emissions Scenarios. ‘Morita’s Database’ and the SRES database were directly related to this activity: their purpose was to allow for the evaluation of IPCC scenarios against the literature. By collecting available scenarios within a single “space of calculation”50, (p. 195), IPCC authors could perform statistical analyses to assess whether the IPCC scenarios represented the range of assumptions and emissions in the published literature, and so to ensure that the IPCC scenarios were in line with the state of the academic conversation.

Up to this point, the development of scenarios databases was primarily oriented towards informing IPCC assessments and scenario development, already with concerns for transparency and pluralism. It contributed to the establishment of a specific method of literature assessment in Working Group III, based on the systematic, quantitative assessment of long-term global scenarios.

Since 2006: diversification and expansion of scenarios databases

In 2006, the IPCC decided to move the development of scenarios outside of the assessment fence51,52,53. Mikiko Kainuma (NIES), Nebojsa Nakicenovic (IIASA), and John Weyant (EMF), who all had experience with scenarios comparison and had been involved in the assessment of IAM scenarios for the IPCC, created the Integrated Assessment Modeling Consortium (IAMC) to organize and coordinate scenarios work within the research community54. As IAM-based research expanded, the research and assessment parts of scenarios work grew more intertwined, and scenarios databases found themselves at the heart of the knot. Their purpose diversified beyond supporting IPCC assessments as they became crucial instruments for cooperation across IAM groups.

The IAMC first convened in September 2008. This was also the period when IAM research started to organize around large MIPs: ADAM, the first of many EU-funded MIPs, lasted from 2006 to 2009; two other EU-funded MIPs, RECIPE and POeM, started in 2008, and the Asian Modeling Exercise in 2009. These projects intensified collaboration across IAM teams, and they generated huge amounts of scenarios data. Given the collaborative nature of these projects, the availability, comparability, and manageability of modeling outputs became an issue. Or, to quote the Data Protocols and Management presentation at the 2010 IAMC meeting, “Modeling comparison exercises and scenario reviews have expanded over the past years. Handling the data requests has become a pain for modeling teams”55. In 2009-2010, the IAMC developed a data template and a database prototype composed of a “data container (spreadsheet) and sets of variables (core and extended)”55. The IAMC Scientific Working Group on Data Protocols and Management was established to coordinate the standardization of reporting formats as well as the database submission system and the documentation of models.

Simultaneously, IIASA devoted resources to host and manage scenarios databases. While the databases were initially managed by a couple of IIASA researchers, there is now a “Scenario services and scientific software” team which is committed to open source and FAIR data principles (Interview 0043). The tools used have thus evolved towards open-source, transparent software, with a move to GitHub and the development of a “nomenclature” package for the validation and processing of scenario data, as well as the Python package “pyam” for analysis and visualization31,32. The choice to move to open-source tools, as opposed to the previously used proprietary MS Access 97, broadens access to the databases and increases transparency, since anyone with programming knowledge can consult and use the packages. At this point, the team at IIASA is mainly funded through EU project money, with limited stable funding.

The databases used for the IPCC assessments are constructed and managed as part of the same process and infrastructure as the project databases. The RCP Database served as a starting point for the development of the IAMC Database, and the (evolving) IAMC Data template forms the basis of IPCC assessment data templates – though it is adapted to meet the requirements of the assessment, for example by adding variables that are relevant to sectoral chapters. The SR1.5 Scenario Explorer exemplifies the entanglement of MIPs, IIASA infrastructural work, and IPCC assessment: the Explorer interface, as well as the pyam package, were developed at IIASA by the researcher in charge of the compilation of scenarios for the SR1.5, in order to manage and make available the scenarios collected for the Special Report. Both the Explorer interface and pyam are meant to be used for a broader range of purposes than IPCC assessment – and indeed they are (the Explorer interface is now the standard for project databases). Over the years, this work has been supported by several EU-funded projects, e.g. SENSES, CD-LINKS, and openENTRANCE. IPCC Scenarios Databases and the IAMC research and data infrastructure are thus hard to disentangle.

From the AR5 to the AR6 cycle

Since AR2, the reports of Working Group III have all included a chapter dedicated to integrated assessment and emissions scenarios, based on a database of scenarios published in the literature. There is a degree of consistency in author teams, with a few authors repeatedly involved in the long-term scenarios chapters of successive IPCC reports. As shown in the previous section, the format and scope of scenarios databases have evolved considerably from one assessment cycle to the next, but their relevance as a tool for the assessment of long-term global scenarios is well-established. That there would be a scenarios database for AR6 was thus not questioned. Indeed, the report of the 2017 Expert Meeting on Scenarios, which took place in the days before the AR6 Scoping meeting, notes that

“Given the large amount of scenarios and data from a large variety of sources, maintaining a database for the assessment is the only way to ensure full documentation and transparency of the assessment.”56, (p. 11).

The fact that the databases for the AR6 cycle would be hosted at IIASA and developed jointly by IIASA and IAMC does not appear to have been debated; interviews suggest that it seemed like the most efficient and perhaps only realistic option, given that the infrastructure was already there (Interviews 0046, 0048). This was decided early in the AR6 cycle, during the preparation of the Special Report on 1.5 °C. The conditions for the development of the databases were set in a collaboration agreement between the WGIII Co-Chairs, IAMC, and IIASA. The agreement establishes the responsibilities of the three parties and clearly states that the data infrastructure (including guaranteeing the long-term availability and archiving of the data) is the responsibility of IIASA – thus, the data infrastructure lies outside the IPCC – while the authors of the relevant chapters are responsible for the content.

The role and development of scenarios databases were openly considered during the AR6 cycle. Contrary to the AR5 cycle, several documents produced during the AR6 cycle address the importance of the databases and discuss their purpose and design, especially in relation to transparency and communication56,57. This signals a recognition of the centrality of scenarios databases as tools for assessment, but also of their implications for the transparency of assessments.

The increased focus on scenarios databases during the AR6 cycle is related to broader debates on the role of IAMs and long-term global scenarios in AR5. In the WGIII contribution to AR5, IAM-produced scenarios framed the message, and their results featured prominently in the Summary for Policymakers (SPM). The Chair’s vision paper prepared ahead of the Scoping meeting for AR6 noted that there had been “pushback against the dominance of IAMs in the messaging of the SPM and subsequent derived material”, which it attributed to “the perceived lack of transparency surrounding the assumptions and structure” of IAMs, to “the relative lack of attention to national and regional modeling in favor of global models”, to the weak linkages between global and bottom-up or sectoral perspectives, and to concern about the implications of large scale BECCS21, (p. 33). In the aftermath of AR5, criticism of BECCS indeed sparked wider debates about the assumptions underpinning IAM scenarios7,8,9. In this context, the new WGIII co-Chairs communicated two ambitions relevant to the place of IAMs and scenarios in the sixth assessment cycle: enhancing transparency, especially for policymakers, and ensuring that regional, national, and sectoral modeling would complement the hitherto dominant global IAM perspective. These ambitions were clear in the recommendations from the 2017 Expert Meeting on Scenarios56 and in the Chair’s Vision Paper21.

These ambitions for enhanced transparency and pluralism of modeling approaches translated into two elements in the AR6 outline. First, following the recommendation from the Expert Meeting on Scenarios to “suggest a separate forward-looking chapter on the medium term (up to 2030-2040?) and the long term (2100)”56, (p. 2), the AR6 outline included two distinct scenario-based chapters: Chapter 3, on “Mitigation pathways compatible with long-term goals”, was to assess global, long-term pathways – it was the traditional IAM chapter – while Chapter 4 on “Mitigation and development pathways in the near- to mid-term” would consider “mitigation efforts in the context of national and, where appropriate, subnational action plans and policies” and “national, regional and global modeling of mitigation and development pathways in relation to mid-century strategies” – and so, enhance attention to national and regional modeling58, (p. 3). Second, the outline also included an annex on “Scenarios and modeling methods” (Annex III in the final report), which the WGIII co-Chairs intended as a way to improve transparency, especially for policymakers58, (p. 9).

These ambitions also framed the development of the database. As mentioned above, the process was more transparent than for AR5, in that its purpose, framing, and ambitions were outlined in several publicly available documents where concern for transparency and pluralism is explicit. The Collaboration agreement between WGIII, IAMC and IIASA insists that “the database activities of the IAMC and IIASA […] will be a major asset for increasing the transparency of the underlying data sources of the AR6 and the Special Report on Global Warming of 1.5°C, and in order to achieve an effective dissemination of the scenario datasets underpinning the report” (https://data.ene.iiasa.ac.at/iamc-1.5c-explorer/static/files/collaboration_agreement_ipccwgiii_iamc_iiasa.pdf [accessed 09/03/2023]). In a letter inviting submissions to the Scenarios databases, the WGIII Co-Chairs also “would particularly encourage modeling teams who have not previously contributed to the AR5 and SR15 and similar databases to participate” so as to help “bring a wide range of scenarios, with global, national and sectoral detail, to the attention of policymakers”59. More concretely, the ambition to broaden the pool of modeling approaches and scenarios considered led to four distinct calls for scenarios: one for global emissions scenarios, one for national and regional scenarios, and two calls for sectoral scenarios, on buildings and transport. Once the outline was agreed upon, and the calls for scenarios were endorsed by the co-Chairs, the work fell into the hands of chapter teams.

Building the AR6 scenario infrastructure: process and challenges

As was the case in previous IPCC reports, the construction and analysis of the AR6 Scenarios Database involved the joint efforts of IPCC authors, IIASA, IAMC, and research teams. In many ways, there is a strong likeness between work around the AR6 Scenarios database and work in IAM modeling intercomparison projects: the data infrastructure is similar, many of the scenarios are the same, and a lot of IPCC authors in the scenarios-based chapters are part of the IAM community. For IAM authors, especially those in Chapter 3, the setting was thus somewhat familiar. Yet, it is crucial for the legitimacy of the IPCC that the distinction between research and assessment is maintained. Besides, the mandate for the database was framed by the AR6 outline and by the direction towards increased transparency and diversity of perspectives encouraged by the WGIII Co-Chairs. This makes the AR6 Scenarios Database a singular object. To analyze how it was constructed, shaped and used, I retrace the “data journey”30 of scenarios from modeling groups to IPCC reports: division of work between IIASA, IAMC, and IPCC, elaboration of the data and metadata templates, collection of scenarios, processing of the data, and formulation of messages.

Division of work

In the AR6 WGIII report outline, two chapters were explicitly concerned with mitigation pathways: Chapter 3 (Mitigation pathways compatible with long-term goals) and Chapter 4 (Mitigation and development pathways in the near- to mid-term). The selection of authors for these two chapters reflected the 2017 Expert Meeting’s recommendations to broaden the range of perspectives and to better integrate the global systems view that dominated AR5 with bottom-up sectoral approaches. In Chapter 3, which covers the IAM literature, the author team was dominated by IAM researchers, with representatives from the main IAM teams (AIM, COFFEE, GCAM, IMAGE, MESSAGE, REMIND, WITCH), but it also comprised non-IAMers, including critics of IAMs and experts in fields “where the credibility of IAMs has been questioned”13, (p. 6). The confrontation of IAM work with different methods and perspectives was expected to push for transparency. The appointment of Keywan Riahi, the Director of the Energy, Climate and Environment Program at IIASA, as Coordinating Lead Author (CLA) ensured a solid link between the chapter and the people at IIASA who would implement the database. Similarly, the Chapter 4 team included a number of modelers, as well as representatives from non-modeling approaches, and one Lead Author (LA) from IIASA and the IAMC Scientific Working Group on Data Protocols and Management who was able to liaise with the database team. To enable communication between the database team at IIASA and the IPCC chapter teams, a list of “scenario ambassadors” was compiled during the first WGIII Lead Author Meeting in Edinburgh in 2019, with at least one contact for each chapter (Interview 0044).

Most of the work related to the development, constitution, and use of the database was divided between IIASA and the Chapter 3 team (and to a lesser extent the Chapter 4 team, for the national and regional scenarios database). The IPCC procedures are meant to establish a clear distinction between research and assessment, and any research work is supposed to take place outside of the IPCC. The submission and acceptance cut-off dates for research papers to be considered in the report help maintain this distinction. In the case of Scenarios databases, the tripartite Collaboration Agreement between WGIII, IAMC, and IIASA made it clear where the fence was. Implementation and maintenance of the database happened at IIASA; discussions on the assessment and analysis of the database – including variable templates, metadata requirements, vetting, and classification of scenarios – took place among authors.

However, in practice, the line is sometimes blurry. Because IAM research is largely collaborative and based on multi-team projects, the IAM community is rather close-knit. The IAM CLAs and LAs in Chapter 3 knew each other well; seven of them are members of the IAMC Scientific Steering Committee. At IIASA, the team in charge of developing and implementing the database was in direct contact with LAs. The fact that the CLA for Chapter 3 was based at IIASA ensured a smooth process and facilitated communication, but, from the outside, it made it hard to disentangle IPCC activities from IIASA activities. For efficiency reasons related to database development, and for structural reasons due to the organization of IAM research, the overlap of Chapter 3, IIASA and IAMC was likely hard to avoid; it facilitated coordination among authors and ensured that the latest published IAM scenarios would be considered. At the same time, it created a peculiar dynamic within the LA team, with a majority of authors used to working together, and IAM community “outsiders” who were more or less well acquainted with IAM research (Interviews 0051, 0054, 0056). It remains to be investigated whether this is unique to Chapter 3 or occurred in other IPCC Chapters.

The data template: an IPCC-IAMC hybrid

The AR6 Scenarios database is, at its core, an IAM database primarily developed for Chapter 3, but adapted to the scope of the IPCC assessment. Its constitution can be described as an attempt to open up the IAMC infrastructure to new types of data, models, and users. As a result, it is an IPCC-IAMC hybrid. The database is framed by its variables template. For the AR6 global and sectoral database, it was based on the template for the SR1.5 Scenarios database, which itself followed the IAMC template. Its structure is thus similar to that of the databases used for IAM projects, but the variables list was expanded in conversation with the sectoral chapters. The template Excel file was emailed back and forth, with lots of input especially from the building, transport, and industry chapters. This resulted in a template that an IAM researcher described as “very, very ad-hoc” (Interview 0047): one that was based on an IAM project template, but with many additions specific to the assessment, for instance very detailed variables on transport or buildings, which meant it would no longer fully make sense as an IAM data template.

Collecting scenarios data and metadata

Beyond adapting the variables template, opening up the infrastructure involved getting a diversity of modeling teams to submit scenarios. The Calls for Scenarios were but the first step in that direction. The process also involved direct communication with research groups to invite them to submit (especially for the sectoral chapters that did not issue a call for scenarios), including phone calls to try to bring in data from sectoral models that were not well represented in the database. It further included presentations at the EGU Conference, promotion on Twitter and through the IPCC Newsletter, and a series of tutorial videos to assist with the scenario submission process (Interview 0044). One objective of the call was to extend the collection of scenarios beyond the literature already identified by the lead authors. IIASA dedicated substantial resources and time to assist in the database submission process. In terms of increasing the diversity of models and scenarios in the database, this was a success: about 100 of the models represented in the database were newcomers, leading its developer to describe it as “the widest and most inclusive scenario collection effort in the IPCC so far”6.

While the content of the database indeed grew in diversity, fully integrating non-IAM scenarios into the infrastructure proved challenging. Diversifying the database is not a matter of simply “plugging in” new models. For IAM teams, submitting to the database turned out to be, if not easier, then at least more clearly worth the effort.

The first step in scenario submission was model registration. For national and regional scenarios, the registration was simplified to filling in a short online form, but for global and sectoral scenarios, it required filling in an Excel file with 42 categories of metadata on the model and information on model regions. The point of model registration and metadata is not to limit access but to ensure the quality and usability of the database. The model metadata was necessary to document the models and warrant a degree of reproducibility, especially with a view to enhancing transparency. However, it required detailed knowledge of the model architecture and substantial work to fill in: modeling teams needed to organize and devote resources to it if they wanted to take part, and it was ultimately up to them whether they chose to do so.

Besides, not all categories necessarily made sense for all models. The information on model regions, on the other hand, was indispensable to ensure interoperability and comparability of the data by allowing the conversion of model-specific region nomenclatures to a single naming convention. Providing this information takes time and work for any team, though it only needs to be done from scratch once, and IIASA provided support.
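The role of the region information can be illustrated with a minimal sketch. It assumes that each registered model declares how its native regions map onto a common nomenclature; the model names, region codes, and the `to_common_regions` helper below are all hypothetical and do not reproduce the actual IIASA registration files or tooling.

```python
from collections import defaultdict

# Hypothetical registration data: each model declares how its native regions
# map onto a shared naming convention (all names invented for illustration).
REGION_MAPPINGS = {
    "ModelA": {"WEU": "Europe", "EEU": "Europe", "NAM": "North America"},
    "ModelB": {"EU27+UK": "Europe", "USA": "North America", "CAN": "North America"},
}


def to_common_regions(model, values_by_native_region):
    """Aggregate values reported on native regions onto the common regions.

    Summing is only meaningful for extensive quantities such as emissions;
    intensive quantities such as prices would need a weighted average.
    """
    mapping = REGION_MAPPINGS[model]
    totals = defaultdict(float)
    for native_region, value in values_by_native_region.items():
        # A KeyError here flags a region that was never registered.
        totals[mapping[native_region]] += value
    return dict(totals)
```

Once such a mapping exists for a model, data from that model can be compared region by region with data from any other registered model, which is why the mapping only needs to be produced from scratch once.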

After registration, the submission of scenarios itself was designed to be as straightforward as possible. All the same, this turned out to be easier for IAM scenarios than for non-IAM scenarios. A lot of the scenarios in the database were directly transferred from the SR1.5 database and from project databases already hosted by IIASA. The migration required consistency checks, for example on variable names, model names, or unit baselines, but this work was centralized and semi-automated. The scenarios that could be imported in this way were all from IAMs and from teams that, by definition, had experience with the IAMC and IPCC data templates.

Submitting data from standalone studies was more costly, and there are relatively few scenarios in the database that do not come from MIPs. Submission could require running the scenarios from a published paper again, and thus having access to the version of the model used when the study was carried out, to get the right output variables. IAM teams know that they will routinely use similar data templates, and that being visible in IPCC reports is valued in their community. They also have a sense of what type of scenarios and variables are interesting to submit to bring contrasting information to the database. For them, dedicating time and resources to archiving model versions or to developing data-extraction or data-formatting scripts is worth the investment (Interview 0053). However, for research groups outside the IAM community, the gains are limited compared to the amount of work required: they may have produced relevant scenarios as one-shot studies but mostly work on other topics or in other communities where participation in the IPCC is not as valued; they may also have been unsure how much their scenarios would be taken into account in the report27, (Interviews 0046, 0053).

For sectoral and national/regional scenarios, the data template reinforced this problem. There was value in starting from a similar template for all the calls: it ensured interoperability, and made it possible to compare data from sectoral and national/regional models with sectoral, national and regional data from global models. All the same, the number of variables could be overwhelming, especially for teams that were only submitting a couple of specific scenarios, while the template still failed to capture some variables of interest from certain models. Besides, the template included a very detailed “Scenario metadata sheet”, based on the IAMC template, but not suited to the diversity of models that the calls attempted to garner (Interview 0046). These difficulties persisted despite the support provided by the database and chapter teams. This may owe to the ambition to diversify the database: without knowing in advance which models and scenarios were out there, it was virtually impossible to design a template that would fit such a diversity of models on the first attempt. In other words, building and diversifying a database involves a learning process.

Processing the database

A database in itself does not say anything, especially when it contains such a large and diverse amount of data from a variety of model types. To extract meaning and messages out of it requires processing. This is a crucial step that determines what travels from the database into the IPCC report. The outcome of this processing is that not all data are equal. This is not a shortcoming, nor can it be avoided: processing data means deciding which data are sound, which are relevant to address a specific range of questions, and how to relate them and present them.

Perhaps one innovation in AR6, compared to previous assessments, is that several ways of processing scenario data were experimented with, besides using the database to map the IAM literature. In total, seven chapters used scenarios from the database, each in different ways60, (p. 1892). The efforts to build the Scenario Explorer interface and to make it accessible after the AR6 approval open up the database to a wide range of uses: e.g. a way to ensure transparency and check reproducibility of AR6 scenario-based messages; a tool for dissemination; a public repository of data for diverse research communities. A lot of the messages based on scenario data that made it into the SPM61, however, were produced by Chapter 3, which is the chapter that made the most extensive use of the database.

Chapter 3 processed the database in several ways: historical vetting, future vetting, climate categorization, collection of extra metadata, and selection of “Illustrative Pathways” (IPs)14,60,62,63. The main purpose of the historical vetting was to filter out scenarios that were not in line with the recent evolution of energy and emissions; the future vetting was only indicative and did not lead to excluding scenarios from the analysis. The vetting process was designed within the chapter team. The vetting ranges, for instance, were established iteratively, with trials to test how many scenarios different vetting ranges would filter out.
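The iterative calibration of vetting ranges described above can be sketched as follows. This is a minimal illustration and not the procedure actually used by the chapter team: the indicator names, the reference values, the relative-tolerance rule, and the choice to ignore missing indicators are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    reported: dict  # indicator name -> value reported for a historical reference year


def passes_vetting(scenario, reference, tolerance):
    """Return True if every reported indicator lies within +/- tolerance
    (as a fraction of the reference value) of the historical reference."""
    for indicator, ref_value in reference.items():
        value = scenario.reported.get(indicator)
        if value is None:
            continue  # assumption: an unreported indicator does not fail the check
        if abs(value - ref_value) > tolerance * abs(ref_value):
            return False
    return True


def count_filtered(scenarios, reference, tolerances):
    """For each candidate tolerance, count how many scenarios would be vetted out."""
    return {
        tol: sum(not passes_vetting(s, reference, tol) for s in scenarios)
        for tol in tolerances
    }


# Purely illustrative numbers, not taken from the AR6 database.
reference = {"co2_emissions": 40.0, "primary_energy": 580.0}
scenarios = [
    Scenario("A", {"co2_emissions": 41.0, "primary_energy": 600.0}),
    Scenario("B", {"co2_emissions": 50.0}),
    Scenario("C", {"co2_emissions": 36.0}),
]
trial = count_filtered(scenarios, reference, [0.05, 0.2, 0.4])
```

Running such a trial for several candidate tolerances shows the trade-off the authors faced: narrow ranges exclude many scenarios, while wide ranges let through scenarios that diverge from recent history.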

Vetting was a form of quality control to ensure that the scenarios considered for Chapter 3 were suitable for the questions this specific chapter was mandated to address. It did not lead to excluding scenarios from the database, so scenarios “vetted out” of Chapter 3 could still be used in other chapters. However, an effect of the processing of the database by Chapter 3 was to reduce it to its IAM core14,60 (p. 1887),62 (p. 310) – which was in fact the primary literature that the chapter intended to assess. IAM scenarios appear to have been better suited to the type of assessment carried out in Chapter 3. Several factors played into this.

First, the uncertainty as to which variables would actually be used in the report created a difficulty for the teams submitting scenarios, including IAM teams. The data template distinguished between high priority and lower priority variables, but still included a huge number of variables as a result of the invitation for all chapters to include additional variables relevant to their assessment. Deciding which of these to submit in order to maximize the chances that their scenarios would not only be visible, but would also meaningfully contrast with the rest of the data, was not a trivial task. Having access to insider information via LA colleagues could help teams get a sense of the graphs that would be in the report, and so of the type of variables that would be used. Similarly, estimating the vetting ranges required work – either consulting the first and second order drafts of the report, or spending time doing test submissions to see which values would be filtered out by the vetting (Interview 0053). If submitting scenarios to the database took time, ensuring that this work would pay off by contributing to the report’s message required even more commitment.

Second, the climate categorization of pathways added a filter. Often presented by interviewees as one of the achievements of this assessment cycle, the close collaboration between WGI and WGIII around climate emulators allowed for “a fully integrated temperature-based classification of mitigation scenarios” for the first time63, (p. 9080). The climate assessment workflow is now published and open source63. The climate assessment could only be performed for scenarios “that contained sufficient emission data across gases and sectors to provide full century climate outcomes”63, (p. 9077). Infilling tools enabled modelers to close data gaps for required variables not reported by their models60,63,64. In the end, only 16 models from 13 model families provided the 1202 scenarios that received a climate assessment60, (p. 1887). This reduction makes sense in terms of ensuring the quality, legitimacy, and credibility of the temperature classification of scenarios. This quality control was all the more important since the rationale for this temperature classification was to enhance policy relevance: “the Paris Agreement long-term global temperature goal makes a global temperature classification of emission scenarios is now directly relevant to informing policy decisions”63, (p. 9080). All the same, this is an example of how the IPCC mandate for policy-relevance can frame the processing of the database, and, ultimately, which scenarios are considered sound, useful, or relevant.

IPCC authors repeatedly acknowledge that scenarios databases are “ensembles of opportunity” and warn that they should not be used as statistical samples60,62,65,66. Annex III in AR6 discusses the “biases” in the ensemble60, (p. 1876). The vetting and climate assessment filters have tended to reinforce biases in the ensemble by increasing the weight of already dominant models and studies, with the model families REMIND (PIK) and MESSAGE (IIASA) and the scenarios produced for the ENGAGE projects dominating the database14,62 (p. 306-307). As a result, the statistical analysis of the database in Chapter 3 provides an assessment of only a subset of the scenarios published in the literature, contrary to the analysis of scenarios databases in earlier assessment reports which sought to assess the full ranges available in the published literature. This evolution needs to be considered in light of the continuous expansion of the scenarios literature. The combined increases in the number of scenarios available, in the diversity of types of models contributing to the database, and in the range of issues considered in scenarios imply that it may no longer be possible to carry out the kind of exhaustive statistical analysis that was done for the SRES.

Extracting meaning from the database

To produce messages representative of the literature with such an “ensemble of opportunity” and move beyond a statistical analysis of subsets of the database, authors relied on three main strategies. The first of these strategies was to select “Illustrative Pathways” (IPs) that allowed the confrontation of contrasting storylines. This was inspired by the four Illustrative Pathways of SR1.5, as well as by a tradition of using markers or reference scenarios that can be traced back to SRES. Chapter 3 authors thus selected nine Illustrative Pathways taken from the most recent literature (some of the Illustrative Mitigation Pathways, or IMPs, were published in a paper accepted just a couple of days before the WGIII cut-off date). The numbers and characteristics of the Illustrative Pathways were discussed with authors from the other chapters in WGIII during Lead Authors Meetings (Interview 0056). This made it possible to highlight key contrasts in the range of futures explored in the literature, for instance between high-CDR, high-renewables, and low-demand pathways, and so to provide a sharper message. At the same time, it further focused the message on the core set of IAMs: the models that provided IPs (REMIND, MESSAGE, IMAGE, AIM, COFFEE, GCAM, WITCH) are among the eight models with the most vetted scenarios in the database. The corresponding modeling teams were each represented by one LA or CLA in Chapter 3.

The second strategy was to gather more metadata in order to document not only the content of the database, but also the biases in the ensemble. The Database team thus provided Lead Authors with “metadata indicator sheets” with model and scenario information as well as additional calculations, such as cumulative emissions to 2050 and to 2100, year of net zero, peak temperature, vetting status, etc.60, (p. 1883). This metadata list was updated on the go following requests from authors. A survey on the models was also sent to the teams that had contributed to the database, the results of which are presented in Annex III. Model documentation is also available on the Scenario Explorer interface, though the amount of detail ultimately depended on the willingness of modeling teams to fill in the survey.
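Two of the indicators mentioned above, cumulative emissions and year of net zero, can be derived from a scenario's emissions time series. The sketch below is only an illustration under stated assumptions: it takes emissions reported at irregular time steps, interpolates linearly between them, and uses invented numbers. The function names are hypothetical, and the calculations actually performed for the AR6 metadata sheets may differ.

```python
def cumulative_emissions(series, start, end):
    """Integrate annual emissions between two years with the trapezoidal rule.

    `series` maps years to annual emissions; `start` and `end` are assumed
    to be reported years in the series.
    """
    years = sorted(y for y in series if start <= y <= end)
    total = 0.0
    for y0, y1 in zip(years, years[1:]):
        total += 0.5 * (series[y0] + series[y1]) * (y1 - y0)
    return total


def net_zero_year(series):
    """Return the first (linearly interpolated) year at which emissions
    reach zero or below, or None if they never do."""
    years = sorted(series)
    if series[years[0]] <= 0:
        return float(years[0])
    for y0, y1 in zip(years, years[1:]):
        e0, e1 = series[y0], series[y1]
        if e0 > 0 >= e1:
            return y0 + (y1 - y0) * e0 / (e0 - e1)
    return None


# Purely illustrative pathway (annual emissions per year, arbitrary units).
pathway = {2020: 40, 2030: 30, 2040: 15, 2050: 5, 2060: -5, 2100: -5}
to_2050 = cumulative_emissions(pathway, 2020, 2050)
to_2100 = cumulative_emissions(pathway, 2020, 2100)
year_net_zero = net_zero_year(pathway)
```

Indicators of this kind compress a full time series into a few comparable numbers, which is what allowed authors to sort and contrast thousands of scenarios without inspecting each trajectory.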

The third strategy was to use the database as one line of evidence among many others, and to go back to the literature for parts of the assessment. This was done in some parts of Chapter 3, for instance the sections on Sustainable Development and on the Economy, for which the database did not provide enough relevant indicators. Similarly, in Chapter 4, the database was considered as not representative of the literature, and most of the chapter in fact relied on direct assessment of publications. In Chapter 9 on Buildings, the authors decided to “supplement [the] dataset with a large number of scenarios separately pulled from the sectoral literature”60, (p. 1892), in part because the LAs had had difficulties in getting sectoral modeling teams to submit to the database27.

Discussion

This paper retraced the history of IPCC Scenarios Databases since the 1990s in order to shed light on the processes and challenges in developing the AR6 Scenarios Database. The socio-historical analysis of scenarios databases as knowledge infrastructures yields two main insights.

First, it shows that scenarios databases are a product of the co-evolution of IAM research and the IPCC WGIII. This co-evolution accounts for the current dominance of IAMs in the AR6 databases. As studies of knowledge infrastructures have shown, developing databases is not simply a matter of gathering data: it also involves developing a community and a shared understanding of what the data can be used for. In the case of the IPCC, this means translating requirements for transparency, pluralism, scientific credibility, and policy relevance into the development and processing of scenarios databases.

Second, the article documents the changes in the status and purpose of scenarios databases since the IPCC stopped producing its own scenarios. Up until SRES, scenarios databases were assessment tools that improved the transparency of IPCC scenarios and ensured they were in line with the literature. With the expansion of IAM research, databases have become instruments for cooperation and data sharing across IAM groups. The AR6 further expanded the purpose of Scenarios Databases, since they were intended to help broaden the evidence base of the IPCC report, to enhance transparency, and to be used for the dissemination of results.

Following criticism of the centrality of IAMs and long-term global pathways in AR5, there were efforts during the AR6 cycle to open up the scenarios data infrastructure to non-IAM scenarios. There was also experimentation with different ways to extract meaning from the database, within Chapter 3 (the chapter most closely associated with the database), in other chapters, and in the dissemination of the Scenario Explorer. These efforts started from an IAM core that itself took many years of work to establish. The result is an IPCC-IAMC hybrid: a database that is organized like an IAM database, but with extensions and adjustments to fit the IPCC AR6 mandate. Despite considerable efforts to open it up, the database structure, its content, and the messages that were extracted from it are still dominated by IAMs. Besides, given the co-evolution of the scenario data infrastructure, IAMs, and the IPCC, maintaining a clear distinction between research and assessment while ensuring the smooth development of scenarios databases remains challenging, and the division of work is not always clear to outsiders.

In terms of enhancing the diversity of perspectives and the transparency of scenarios, how did the database fare? It did, to an extent, succeed in diversifying the pool of scenarios and models considered. The database includes contributions from an unprecedented number of first-time participants. Though the pathways produced by sectoral and national or regional models were not used as much as the global pathways in the assessment, their inclusion in the database increases their visibility. It brings them into the same calculative space as IAM-produced long-term global scenarios and, in this way, enables a degree of comparability across diverse perspectives. The AR6 Scenarios database is also more flexible than its predecessor: the “Chapter 3-way” of using the database is not the only one; other chapters (e.g. Chapters 4 and 9) have used it in different ways. Dissemination via the Scenario Explorer opens the way to broader (re)interpretations and appropriation of its content.

When it comes to transparency, the outcome appears paradoxical. The database, the scenarios it contains, and the models that produced them are much better documented than was the case for AR5. Their shortcomings have also been discussed openly. At the same time, the infrastructure has become more complex and harder to understand. The database contains a vast amount of data from heterogeneous sources, so the comparability of the data is less straightforward than it would be in an IAM-pathways database. Annex III contains detailed information, but remains very technical. At this point, it is a two-tier infrastructure: it is, by construction, best suited to IAM results, and more easily used by IAM researchers, while researchers from other communities appear to have struggled to contribute to it and to exploit it.

As Paul Edwards has noted, “infrastructure formation is never tension-free”33, (p. 12). By analyzing the challenges and trade-offs in building the current IPCC Scenario data infrastructure, this article hopes to contribute to reflections as to how Scenarios Databases can help further pluralism and transparency in the IPCC WGIII. It outlines the importance of social, institutional, and organizational learning in the development of data infrastructures. This reflection can thus benefit from insights and inspiration from other “data journeys” in climate science and in other fields35,36,37.

Methods

To retrace a concrete and situated history of Scenarios Databases, the article draws on the qualitative analysis of a broad range of material. A total of 10 interviews were carried out with IPCC CLAs, LAs, and CSs from Chapters 3 and 4 of WGIII’s contribution to the AR6, with people from the IPCC WGIII and with IAM researchers who have worked on either side of the data pipelines, mostly at IIASA (all in 2022-2023). The interviews relied on the “text-author ensemble” method67, which consists of cross-analysing a scientific document (here the AR6 Scenarios Database and associated IPCC chapters) together with interviews with its authors to retrace its elaboration. The interview sample does not cover all relevant IPCC authors due to constraints on their availability, but nonetheless allows for a detailed reconstruction of the process. Interviewees were informed by email of the study objective and data handling procedures, as well as of the possibility to withdraw from the study upon request; they confirmed their consent orally at the beginning of the interviews. The interviews were anonymised and numbered. References to information obtained in the interviews are indicated by referring to the interview number (e.g. “Interview 0030”).

A similar set of interviews carried out between 2015 and 2017 allowed for a comparison with the AR5 cycle1. In addition, extensive documentation from the IPCC, scenarios research, and the IAMC on scenarios and scenario databases was collected and analyzed: IPCC reports and plenary decisions, reports from Expert meetings and workshops on scenarios, IAMC conference proceedings and recordings, scenarios data templates, model registration forms and instructions, IAM project documents, video tutorials and webinars on scenarios and scenarios databases. On this basis, a detailed timeline of IPCC-related scenarios work since the first assessment cycle was elaborated. The paper also relies on participant observation at the Scenarios Forum 2022 in Laxenburg, Austria and at the 2021 meeting of the IAMC.

In the article, the term “IPCC Scenarios Databases” is used to designate the databases that were used in the preparation of IPCC reports, so as to distinguish them from research project Scenarios Databases. It does not imply that they are products of the IPCC: the AR4, AR5, SR1.5, and AR6 Scenarios Databases were not developed by the IPCC, but by researchers working in liaison with the relevant chapter teams.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.