Introduction

Climate change is one of the most significant challenges of our time and demands immediate, concerted action by national governments. Effective government responses require not only the adoption of concrete policies (policy activity) but also the design of appropriate interventions (policy instruments) that are sufficiently ambitious (policy stringency). The urgency and global scope of climate change underline the need to comparatively assess the activity, instrumental design1,2, and stringency of governmental policy responses3.

To track progress in national climate policy efforts, various databases have been developed in recent years4,5. Among these repositories are the Climate Policy Database (CPDB)6,7,8, the Climate Change Laws of the World (CCLW) (Grantham Research Institute)9, the European Environment Agency’s (EEA) database on greenhouse gas policies and measures in Europe, the International Energy Agency’s (IEA) Policies and Measures Database, the OECD Climate Actions and Policies Measurement Framework (CAPMF)10, and the Climate Policy Portfolios Dataset (CLIMAPP), which was developed in the context of the ACCUPOL project11. The stated goals of these repositories are to describe and categorize countries’ responses to climate change and to provide an empirical basis for evaluating their effectiveness. While other datasets cover particular instrument types or sectors, these six datasets stand out because they aim to cover climate policy activity across countries comprehensively.

Although tremendous work has been undertaken to track climate policy developments, we do not know whether and to what extent these datasets provide accurate and consistent information about countries’ policy efforts and their effects. Some papers assess and compare more than one dataset4, but they do not compare several datasets with regard to multiple policy output dimensions. Ideally, the datasets would provide consistent answers to the following questions: Which governments are attempting to address climate change (description)? What exactly are they doing (taxonomy)? And how effective are their attempts (effects)? However, while all datasets share the goal of tracking national climate policies, there are notable differences between them. They not only differ in geographic, sectoral, and temporal coverage but also use different analytical approaches and classificatory schemes to capture climate policies. Moreover, while some datasets focus only on description and taxonomy, others additionally provide a basis for assessing climate policies’ effects by containing information about their stringency. Hence, it cannot be taken for granted that the existing databases provide consistent information on how governments respond to climate change and what difference it makes.

This review article addresses these knowledge gaps. It analyzes whether and how existing datasets map national climate policies with regard to three key dimensions: general policy activity, policy instruments, and policy stringency. Our analysis yields mixed results.

While the databases provide quite consistent information on countries’ general enactment of climate policies over time (description), notable differences between the databases emerge once we delve into features of instrument choice (taxonomy) and stringency (effects). Specifically, we identify disparities in how the databases record the (growing) diversity of the policy instruments that countries employ to tackle climate change. These disparities become even more pronounced when examining how the databases record policy stringency (i.e., the calibration of policy instruments).

Our findings reveal that all datasets agree at the aggregate level: they show that ever more climate policies are being adopted (what is being done?). However, they diverge significantly on more nuanced elements such as policy instrument types (how is it done?) and their stringency (to what effect?). The analysis thus highlights that while all databases can be used to gauge broader developments in climate policy, the choice among them becomes critically important when scholars move from a general assessment of climate policy activity to concrete inquiries into instrument choice and stringency. At this more intricate level, careful attention to each database’s inherent strengths and weaknesses is essential, and researchers should check whether their results and conclusions depend on the database selected for analysis. We demonstrate that some (pairs of) datasets are much more congruent than others. Moreover, existing databases are generally much better suited to explaining the adoption of climate policies than their effects. For evaluative studies, it is important to know not only that certain policies and policy instruments exist but also how strict they are. For instance, a common critique of carbon trading schemes is that their prices have often been too low or their coverage too narrow to address climate change effectively. With regard to stringency, the CLIMAPP and especially the OECD climate policy datasets are notable exceptions, as they provide information not only about the existence or absence of certain climate policy measures but also about (changes in) their stringency.

Our review’s main contribution is to show which research endeavors the existing datasets support and which they do not, and to provide concrete suggestions on how to enhance the reviewed datasets to make them even more useful for social science research on climate policy. Our article thus provides the most comprehensive and up-to-date source for scholars and practitioners interested in the comparative analysis of governmental climate policy efforts.

The article is structured as follows. We first delve into the criteria for evaluating climate policies and climate policy data repositories and map the existing datasets along the identified dimensions. We then compare the various datasets to determine the extent of their overall alignment, looking at general policy activity, instrument diversity, and policy stringency. In the concluding section, we reflect on the collective insights garnered from our review and tease out implications for future research in the field.

Criteria for evaluating climate policies and climate policy data repositories

At the most abstract level, any effort to fight climate change requires some policy activity/action. This means that policy targets—products or activities causing (mitigation) or being affected (adaptation) by climate change—are addressed by some governmental measures. The level of policy activity or density thus informs us about the extent to which a given sector is permeated by governmental interventions3,4. In general, climate policies target multiple sources of and activities causing greenhouse gas emissions, covering both economic and private activities. At this abstract level, the essential question is whether governments address certain issues, such as emissions from the transport or industrial sector. However, these policy targets can also be more granular and differentiate between emissions derived from various industrial plants (small vs. large combustion plants) or modes of transport (passenger cars vs. heavy-duty vehicles). To structure the large variety of targets that contemporary governments address with their climate policies, targets are often grouped into sectors or areas, such as transport, industry, energy, and buildings. In this context, it is important to clarify that the term policy “targets” does not refer to emission reduction targets commonly debated within the framework of the Paris Agreement but rather to the question of which specific climate challenges the government addresses.

The second, more specific dimension that can be analyzed is how governments seek to achieve their predefined policy goals. Governments can usually choose between different policy instruments to address policy targets1,12. Policy research has developed various ways to categorize policy instruments. For instance, an often-used categorization in the climate domain is by Vedung2, who distinguishes between regulations (“sticks”) such as emission standards and limit values, market-based instruments (“carrots”) such as emission trading or carbon taxes, and informational instruments (“sermons”) such as educational campaigns or performance labels (for other classifications, see13). A key insight from the literature is that all policy instruments have their particular strengths and weaknesses14. Consequently, governments need to choose from and craft “instrument mixes” that address the policy targets in the best possible way.

The third and most nuanced dimension is the stringency or intensity of public policies. This dimension refers to the exact degree of ambition underlying the government’s efforts to address a given issue3. Frequent examples in climate policy are the stringency of emission standards (e.g., 120 vs. 95 g CO2/km) or the price set by carbon taxes (e.g., EUR 20 vs. EUR 100 per tCO2). Stringency is an important policy dimension because it is necessary for evaluating policy effects and can serve as a proxy for effectiveness when an actual evaluation of effects is not feasible. Generally, the more ambitious a particular policy, the stronger its effects can be expected to be.

Table 1 provides an overview of the existing climate policy datasets along the key dimensions of policy activity, instruments, and stringency. It also indicates geographic coverage (the number of countries included in each database), sectoral coverage (the industries or areas of society covered, such as transportation, industry, or energy production), and temporal coverage (the range and distribution of data entries over time). All datasets include both mitigation and adaptation targets, except for the OECD dataset, which covers mitigation targets only.

Table 1 Overview of Existing Climate Policy Databases

Policy activity, instrument mixes, and stringency across the datasets

In the previous section, we mapped the existing datasets on climate policies. In this section, we delve deeper into identifying overall patterns, correlations, and disparities that exist across these datasets. We assess the general policy activity before delving into instrument mixes and stringency.

Climate policy activity

Figure 1 displays each dataset’s temporal trend in policy activity, i.e., the adoption of new climate policies over time. All the datasets examined show strong growth in the number of climate policies over time. This growth, however, is not linear but comes in surges, suggesting periodic intensifications in the production of climate policies. The datasets reveal notable peaks in the early 2000s, the mid-2010s, and the early 2020s. The peaks in the 2000s and 2010s roughly coincide with (the aftermath of) major international climate agreements: the Kyoto Protocol, which was adopted in 1997 and came into force in 2005, and the Paris Agreement, which was adopted in 2015. However, the timing of the peaks varies somewhat across datasets, indicating that despite their common growth trends, they capture somewhat different activities. This assessment is corroborated by the correlations between the datasets at the country-year level.

Fig. 1: Policy activity over time and across datasets.

Each bar represents the new policy activity recorded in the respective dataset in a given year. Because the databases differ in the number of countries and policies they contain, each database has its own vertical-axis scale. The figure includes all countries contained in the respective datasets; Figure S8.1 in the Supplementary Information presents only the joint country sample across the datasets. Source: CPDB, EEA, IEA, CCLW, CLIMAPP, OECD.

A challenge for examining correlations at the country-year level is that the datasets differ significantly in the number of policy targets and instruments they cover (see again Table 1). Consequently, some datasets appear to document more changes than others simply because they consider more policy targets and instruments. To address this issue, we transform all datasets into a two-dimensional portfolio space with all policy targets along one dimension and all instruments along the other. This “policy portfolio” approach has gained traction in recent years within public policy research to compare policy dynamics across countries and policy sectors15,16. The portfolio’s value is 100 percent when every potential target-instrument combination is covered and 0 percent when no policy action has been taken. Changes within this portfolio space can be quantified as percentage-point changes; for example, an increase from 10 to 15 percent coverage is recorded as a five-percentage-point change. This approach allows us to compare standardized portfolio growth rates at the country-year level irrespective of the number of policy targets and instruments covered by the respective datasets12. Fig. 2 illustrates the underlying logic. Using the CLIMAPP dataset, the figure presents Germany’s climate policy portfolio at two points in time: 2005 (black boxes) and 2022 (black and grey boxes). Between 2005 and 2022, the portfolio expanded from 48 to 114 covered spaces, an increase of 6.5 percentage points (from 4.7% to 11.2%). While 72 target-instrument combinations were introduced during this period, only 6, primarily subsidies, were abolished.
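The portfolio arithmetic can be sketched in a few lines. This is a minimal sketch, assuming a portfolio is represented as a set of covered (target, instrument) pairs; the helper names `coverage` and `compare` and the target and instrument labels are hypothetical illustrations, not CLIMAPP categories or code from the authors.

```python
def coverage(portfolio, targets, instruments):
    """Percentage of all target-instrument spaces covered by at least one policy."""
    space = {(t, i) for t in targets for i in instruments}
    if not portfolio <= space:
        raise ValueError("portfolio contains pairs outside the portfolio space")
    return 100 * len(portfolio) / len(space)


def compare(before, after, targets, instruments):
    """Percentage-point change in coverage, plus the spaces added and abolished."""
    added = after - before
    removed = before - after
    change_pp = (coverage(after, targets, instruments)
                 - coverage(before, targets, instruments))
    return change_pp, added, removed


# Hypothetical 3 x 2 portfolio space (illustrative labels only).
targets = ["transport", "industry", "buildings"]
instruments = ["tax", "standard"]
p_before = {("transport", "standard"), ("industry", "standard")}
p_after = {("transport", "standard"), ("industry", "tax"),
           ("buildings", "standard")}

change_pp, added, removed = compare(p_before, p_after, targets, instruments)
# Covered spaces reconcile as: before + added - removed = after.
assert len(p_after) == len(p_before) + len(added) - len(removed)
print(f"{change_pp:+.1f} pp, added={sorted(added)}, removed={sorted(removed)}")
```

The same bookkeeping underlies the German example in the text: 48 covered spaces plus 72 additions minus 6 abolitions yields 114 covered spaces.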

Fig. 2: German Climate Policy Portfolio in 2005 and 2022.

The horizontal axis represents the different policy targets, and the vertical axis the instruments employed. A white background means that no policy intervention is recorded in that space. A black square identifies spaces that were covered by policy intervention in 2005 and remain covered in 2022. A grey square identifies new policies added between 2005 and 2022. A dark red square identifies policies in place in 2005 that are no longer part of the portfolio. Source: CLIMAPP.

We were able to create portfolio spaces for all datasets except the OECD database. For the CPDB, targets are the cross-combination of sectors (Table S1.1 in the Supplementary Information) and types (Table S1.2 in the Supplementary Information). For the CCLW, targets correspond to the sectors (Table S2.1 in the Supplementary Information). For the EEA, targets are the cross-combination of sectors (Table S3.1 in the Supplementary Information) and the greenhouse gases affected (Table S3.2 in the Supplementary Information). For the IEA, targets correspond to sectors (Table S4.1 in the Supplementary Information); here we limit our analysis to targets (sectors) present in at least five countries and to instruments found in at least ten countries, because considering every potential combination of targets and instruments exceeds our computational capacity. For CLIMAPP, targets are shown in Table S6.1 in the Supplementary Information. Because the OECD dataset focuses on “policy actions,” it blends targets and instruments from the outset and does not consider the same instruments for all potential targets. Creating a (symmetric) two-dimensional portfolio space is therefore not possible. This does not imply that the OECD dataset is of inferior quality, only that it is harder to compare with the others under the portfolio approach. Excluding the OECD dataset from parts of the following analysis also addresses the issue that, unlike the others, it includes only mitigation targets. Figure 3 shows the pairwise correlations among the five remaining datasets. Given their different samples, each pairwise comparison includes only the observations that both datasets share.
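The country-count thresholds applied to the IEA data can be illustrated with a small filter. This is a minimal sketch, assuming the data arrive as (country, target, instrument) records; that record layout and the function names are hypothetical and do not reflect the IEA database’s actual fields. The toy example lowers the thresholds so that the effect is visible.

```python
from collections import defaultdict


def frequent_categories(records, min_target_countries=5, min_instrument_countries=10):
    """Targets and instruments observed in at least the given number of countries.

    `records` is an iterable of (country, target, instrument) tuples.
    """
    target_countries = defaultdict(set)
    instrument_countries = defaultdict(set)
    for country, target, instrument in records:
        target_countries[target].add(country)
        instrument_countries[instrument].add(country)
    targets = {t for t, cs in target_countries.items()
               if len(cs) >= min_target_countries}
    instruments = {i for i, cs in instrument_countries.items()
                   if len(cs) >= min_instrument_countries}
    return targets, instruments


def restrict(records, targets, instruments):
    """Drop records whose target or instrument falls below the thresholds."""
    return [r for r in records if r[1] in targets and r[2] in instruments]


# Hypothetical records; thresholds lowered to 2 and 3 countries for the demo.
records = [("A", "power", "tax"), ("B", "power", "tax"), ("C", "power", "tax"),
           ("A", "rail", "subsidy"), ("B", "power", "label")]
targets, instruments = frequent_categories(records, min_target_countries=2,
                                           min_instrument_countries=3)
kept = restrict(records, targets, instruments)
print(targets, instruments, len(kept))
```

Counting distinct countries (rather than records) keeps a single country with many entries from pushing a rare category over the threshold.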

Fig. 3: Cross-correlations in general policy activity.

The figure depicts the correlations between the datasets under study (except the OECD dataset). The diagonal shows the distribution of portfolio sizes in each database, the upper triangle the cross-correlations between databases, and the lower triangle the corresponding scatterplots. Source: CLIMAPP, CPDB, EEA, IEA, CCLW.

The strongest correlation is found between the IEA and the CPDB (0.882), the weakest between the EEA and the CPDB (0.274). On average, the datasets exhibit a correlation of 0.542. The average correlations per dataset are 0.652 for the IEA, 0.620 for the CCLW, 0.547 for the CPDB, 0.476 for CLIMAPP, and 0.417 for the EEA. Although correlations vary across datasets, they are mostly moderate to strong, with some pairings, such as IEA and CPDB, showing particularly strong correlations. A high correlation is not necessarily a quality indicator, as all datasets could be inaccurate in their assessments, but it does at least suggest some consistency across the datasets in tracking general policy activity over time.

Thus far, we have evaluated the datasets’ consistency by checking for cross-correlations (external validity). Another way to compare the datasets is to check whether climate policies are coded consistently within them (internal validity). For this purpose, we take advantage of the fact that EU Directives, unlike EU Regulations, must be transposed into national law. In a perfectly consistent dataset, all policy measures coded at the EU level should therefore also be coded at the member state level. Put differently, when the EU policy portfolio is “laid” over those of the member states, there should be a substantial, if not complete, overlap between EU Directives and their member state transpositions. To test this, we created a distinct EU portfolio for the CPDB, the IEA, and the CCLW. At the time of writing, the OECD’s CAPMF lists the EU as a separate unit of analysis, but all its entries are, unfortunately, empty. As depicted in Fig. 4, there is generally a high level of consistency between EU and member state coding. The average consistency reaches 98% for the IEA dataset and 94% for the CCLW. The coding in the CPDB is also quite good, with an overall consistency of 89%. However, consistency decreases as we shift focus from older to newer member states: for most member states that joined the EU in 2004 or later, the consistency between EU and national coding drops to around 80%. Because of the time lag between the adoption of an EU policy and its transposition into national law, some EU policies may not yet be represented in the national portfolios. To control for this possibility, Figure S8.2 in the Supplementary Information presents an additional analysis that sets 2018 as the upper time limit for policies included in the EU portfolio, so that possible transposition delays do not bias our assessment of internal validity. This adjustment improves the overlap by about 2 percentage points for the CPDB and the CCLW and leaves it almost unchanged for the IEA.
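The consistency check can be expressed as the share of EU-level target-instrument spaces that reappear in a member state’s portfolio, optionally ignoring EU policies adopted after a cutoff year. This is a minimal sketch under that assumption; the data layout, the function name, and the example portfolios are hypothetical rather than taken from the CPDB, IEA, or CCLW.

```python
def transposition_consistency(eu_portfolio, national_spaces, cutoff_year=None):
    """Percentage of EU-level spaces that also appear in a national portfolio.

    `eu_portfolio` maps (target, instrument) -> adoption year;
    `national_spaces` is any collection of (target, instrument) pairs.
    If `cutoff_year` is given, EU policies adopted after it are ignored.
    """
    eu_spaces = {space for space, year in eu_portfolio.items()
                 if cutoff_year is None or year <= cutoff_year}
    if not eu_spaces:
        raise ValueError("no EU spaces left to compare")
    matched = eu_spaces & set(national_spaces)
    return 100 * len(matched) / len(eu_spaces)


# Hypothetical EU and member-state portfolios.
eu = {("transport", "standard"): 2009, ("buildings", "standard"): 2010,
      ("industry", "trading"): 2003, ("transport", "label"): 2021}
national = {("transport", "standard"), ("buildings", "standard"),
            ("industry", "trading")}

print(transposition_consistency(eu, national))        # 75.0: all EU policies
print(transposition_consistency(eu, national, 2018))  # 100.0: EU policies up to 2018
```

In this toy case, the only unmatched EU policy dates from 2021, so the cutoff removes an apparent inconsistency that merely reflects the transposition lag.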

Fig. 4: Coding accuracy using EU policies as benchmark.

The figure depicts the consistency of different datasets in coding the EU and its member states. The horizontal axis shows the percentage of EU policies that are also recorded in each national portfolio; the vertical axis lists the member states. Each color represents a different database. Source: CPDB, IEA, CCLW.

In sum, the aggregate insight from the first part of our analysis on climate policy databases is that—despite their pronounced conceptual variation (see again Table 1)—they are rather consistent regarding their assessment of general trends in climate policy activity across countries.

Climate policy instrument mixes

But does this picture also hold if we move one step down the “ladder of analytical abstraction” and focus on policy instruments? A key insight from public policy research is that instrument choice and combination are crucial to governments’ ability to tackle societal challenges. Within this framework, the more quantitative policy design literature has developed concepts such as “instrument diversity”17 or “instrument balance”18 to evaluate instrument mixes. Essentially, both approaches assess the variety of the policy instruments in use.

In the following, we use the concept of average instrument diversity (AID) developed by Fernández-i-Marín et al.17 to compare the information the datasets provide on the composition of instrument mixes. Once the data are structured into policy targets and instruments, AID can be calculated with the R package PolicyPortfolios. Put simply, AID is calculated by selecting the instruments dedicated to specific targets (the vertical axis in Fig. 2) and then determining the average probability that the same instrument is picked when making “draws” across other policy targets (the horizontal axis in Fig. 2). For a more detailed explanation of this approach, please consult Chapter 7 of the Supplementary Information.
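The probabilistic intuition behind such diversity measures can be illustrated with a Gini-Simpson index: the probability that two random draws from the covered target-instrument spaces involve different instruments. This is a hypothetical, simplified stand-in and not the AID formula of Fernández-i-Marín et al.; for actual analyses, the PolicyPortfolios package should be used. The function name and example portfolio are illustrative assumptions.

```python
from collections import Counter


def simpson_instrument_diversity(portfolio):
    """Gini-Simpson diversity of instruments across covered target-instrument spaces.

    `portfolio` is an iterable of (target, instrument) pairs. The result is the
    probability that two random draws (with replacement) from the covered
    spaces involve different instruments: 0 when a single instrument is used
    everywhere, approaching 1 for a highly varied mix.
    """
    counts = Counter(instrument for _target, instrument in portfolio)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical portfolio: the tax covers two targets, the other instruments one each.
portfolio = {("transport", "tax"), ("industry", "tax"),
             ("buildings", "standard"), ("energy", "label")}
print(simpson_instrument_diversity(portfolio))
```

The design choice worth noting is that diversity is computed over covered spaces, so an instrument applied to many targets weighs more heavily than one applied to a single target.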

Figure 5 illustrates the correlations in AID among the five previously evaluated datasets for the years 2000 to 2023. We restrict the analysis to this period because the instrument diversity metric fluctuates strongly when portfolios are small. Correlations are substantially lower for the composition of instrument mixes than for general policy activity. The average correlations per database are 0.419 for the CPDB, 0.386 for the CCLW, 0.350 for CLIMAPP, 0.323 for the IEA, and 0.208 for the EEA. The average correlation across all datasets is 0.349, about two-thirds of that observed for general policy activity. Hence, while the datasets provide consistent information on whether climate action is taken in a particular country, they provide much more conflicting information on what governments are doing to tackle climate change.

Fig. 5: Cross-correlation for average instrument diversity.

The figure depicts the correlations in AID among the five previously evaluated datasets. The diagonal shows the distribution of average instrument diversity, the upper triangle the cross-correlations between databases, and the lower triangle the corresponding scatterplots. Only countries shared across databases are included. The figure covers the period 2000-2020. Source: CLIMAPP, CPDB, EEA, IEA, CCLW.

An obvious explanation for these discrepancies is that the datasets rely on different, more or less detailed, classifications of policy instruments. Datasets with more parsimonious instrument categorizations mechanically display lower levels of instrument diversity. However, our analysis indicates that the discrepancies may also stem from coding inconsistencies within individual datasets, i.e., not all policy instruments in a given dataset are coded at the same level of “detail”.

One way to examine how consistently a dataset classifies policies across countries and areas of intervention is to leverage the datasets’ distinction between instrument types and subtypes (three of the six datasets analyzed differentiate between types and subtypes of policy instruments) and to analyze their relations with alluvial diagrams. Figure 6 presents an alluvial diagram of the instrument types and subtypes considered in the CPDB. Green flows mark policies with complete information: the coding captures the full spectrum of detail, from the instrument category to the subcategory and the exact policy tool used. Red flows mark policies for which only the broad instrument category is coded, without the subcategory or the specific tool. Two shades of yellow mark partially complete information. One shade highlights instances where the subcategory is given but the instrument category is not specified; the other shows the inverse, where the instrument category is given but not the subcategory.

Fig. 6: Policy instrument types and subtypes considered in the CPDB.

Alluvial diagram of instrument types and subtypes in the CPDB, showing the correspondence between the entries’ “Category” (first column), “Subcategory” (second, middle column), and “Instrument” (third column). Colors indicate the coding status of each policy. Source: CPDB.

In essence, the figure shows that complete coding from the instrument type to the first subcategory is observed for only 45 percent of the policies in the dataset, and consistent coding from the first to the second subcategory for merely about 34 percent. The problem is most pronounced for the large group of regulatory instruments, of which only about 15 percent are coded across all three levels: category, subcategory, and individual instrument. The level of detail, or completeness, with which policy instruments are described is thus not uniform across the entries in the dataset.
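Completeness shares of this kind can be computed by checking, for each coded policy, which classification levels are filled. This is a minimal sketch, assuming each record is a dictionary with optional `category`, `subcategory`, and `instrument` entries; these field names and the example labels are hypothetical and do not reproduce the CPDB schema.

```python
from collections import Counter


def coded(record, *fields):
    """True if every named field is present and non-empty."""
    return all(record.get(f) for f in fields)


def completeness_shares(records):
    """Percentage of records coded down to the subcategory and to all three levels."""
    n = len(records)
    return {
        "category+subcategory":
            100 * sum(coded(r, "category", "subcategory") for r in records) / n,
        "all three levels":
            100 * sum(coded(r, "category", "subcategory", "instrument")
                      for r in records) / n,
    }


def full_coding_by_category(records):
    """Percentage of records per category that are coded across all three levels."""
    totals, complete = Counter(), Counter()
    for r in records:
        category = r.get("category") or "(missing)"
        totals[category] += 1
        complete[category] += coded(r, "category", "subcategory", "instrument")
    return {c: 100 * complete[c] / totals[c] for c in totals}


# Hypothetical records; field names and labels are illustrative only.
records = [
    {"category": "regulatory", "subcategory": "codes", "instrument": "building code"},
    {"category": "regulatory"},
    {"category": "regulatory", "subcategory": "obligations"},
    {"category": "economic", "subcategory": "fiscal", "instrument": "carbon tax"},
]
print(completeness_shares(records))
print(full_coding_by_category(records))
```

Breaking the shares down by category, as in the second function, is what reveals whether incomplete coding is concentrated in particular instrument groups, such as regulatory instruments.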

These inconsistencies matter for the analysis of instrument mixes and diversity, as the extent and quality of the information provided can lead to significantly different conclusions about a country’s policy instrument diversity. Consider an example in which three policy instruments are documented. In one instance, the information merely indicates that all three are regulatory instruments. In another, the policies are explicitly identified as auditing, building codes, and obligation schemes. In the first scenario, the AID would be zero, suggesting no diversity, because the same type of instrument appears to be applied repeatedly. In the second scenario, where specific types of regulatory instruments are identified, the AID would indicate very high diversity, as a range of different instruments is being employed.
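The effect of coding granularity in this example can be made concrete with a simplified Gini-Simpson index, used here as a hypothetical stand-in for the AID rather than its actual formula. The index measures the probability that two random draws (with replacement) from the coded instruments yield different labels. The labels follow the example in the text; the function name is an assumption.

```python
from collections import Counter


def gini_simpson(labels):
    """Probability that two draws (with replacement) yield different labels."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


coarse = ["regulatory", "regulatory", "regulatory"]
fine = ["auditing", "building codes", "obligation schemes"]

print(gini_simpson(coarse))  # one label repeated: no diversity
print(gini_simpson(fine))    # three distinct labels: maximum for three policies
```

The same three policies thus score 0 under coarse coding and 2/3, the highest value attainable with three policies, under detailed coding, which illustrates why uneven coding depth can distort cross-dataset comparisons of diversity.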

Climate policy stringency

The main takeaway from the preceding sections is that the datasets broadly agree on general policy activity, but notable differences emerge once we turn to policy instruments. Even greater discrepancies appear when we assess these instruments’ level of stringency. Of the six datasets analyzed, only two include policy stringency as an analytical dimension, and these two measure it in very different ways.

The OECD dataset measures policy stringency by averaging different “policy variables.” For example, the stringency of emissions trading schemes is assessed with two variables. The first captures the cost of annual allowances in USD per tCO2e, scaled between 0 and 10, where 10 corresponds to the highest and 0 to the lowest empirically observed value. The second captures GHG coverage, differentiated by CO2, CH4, N2O, and all other GHGs. The value assigned to each gas reflects its contribution to global GHG emissions: CO2 receives a value of 6, CH4 a value of 2, and N2O a value of 1. The final stringency value for CO2 emissions trading schemes is the mean of the two variables.
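The two-variable composite can be sketched as follows. This is a hedged illustration, not the CAPMF implementation. It rests on three assumptions that are not stated in the text: the price is rescaled linearly between the observed minimum and maximum, the coverage score is the sum of the values of the covered gases, and gases other than CO2, CH4, and N2O score 0 (the text does not give their value). The observed price range and the example scheme are hypothetical.

```python
# Gas values as stated in the text. The value for "all other GHGs" is not
# given there, so other gases score 0 in this sketch (an assumption).
GAS_VALUES = {"CO2": 6, "CH4": 2, "N2O": 1}


def minmax_score(value, observed_min, observed_max):
    """Linearly rescale a value to 0-10 between the lowest and highest observed values."""
    if observed_max <= observed_min:
        raise ValueError("observed_max must exceed observed_min")
    return 10 * (value - observed_min) / (observed_max - observed_min)


def ets_stringency(price_usd, observed_min, observed_max, covered_gases):
    """Mean of a price score and a GHG-coverage score (assumed: sum of gas values)."""
    price_score = minmax_score(price_usd, observed_min, observed_max)
    coverage_score = sum(GAS_VALUES.get(gas, 0) for gas in set(covered_gases))
    return (price_score + coverage_score) / 2


# Hypothetical scheme: USD 50/tCO2e allowances, observed range USD 0-100, CO2 only.
print(ets_stringency(50, 0, 100, ["CO2"]))  # (5.0 + 6) / 2 = 5.5
```

Because each variable is mapped onto its own 0 to 10 scale before averaging, a scheme can reach a mid-range score through very different combinations of price and coverage.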

The right-hand panel of Fig. 7 shows an increasing trend in average policy stringency over time, with values almost doubling from around 3.8 to roughly 7. However, each instrument has its own method for calculating stringency, which makes direct comparisons of stringency across different types of policy instruments difficult, if not impossible, and potentially misleading. This is not a shortcoming of the dataset but a consequence of the inherent diversity of climate policy instruments and their settings.

Fig. 7: Policy stringency in the OECD dataset.

The figure shows the policy stringency of climate actions in the OECD dataset. a presents the distribution of stringency values (between 0 and 10) in the dataset; b presents their distribution over time together with the temporal trend. Source: OECD.

The left-hand panel of Fig. 7 shows that the observed data points cluster around 5 and 10. This pattern stems from the scale used to measure most policy variables, which typically score either 0 or 10. When these values are aggregated at the (higher) policy level, an average score of 5 often results, indicating that one policy variable scored 0 and the other scored 10. These findings underscore that capturing the nuances of policy stringency remains challenging even with a sophisticated analytical approach.

The same difficulty is evident in the approach of the CLIMAPP dataset, which captures policy stringency in relative rather than absolute terms. It records modifications to existing policy instruments, registering whether they have become (1) stricter or (2) less strict over time. The dataset thus captures shifts in policy stringency rather than absolute stringency levels. Although its granularity may be coarser than that of the OECD dataset, it has the advantage of systematically distinguishing between changes in the level and in the scope of policy stringency. Specifically, the data allow researchers to examine whether modifications impose stricter requirements (e.g., emission limits, technologies, or taxes) or expand a policy’s reach, i.e., include more categories, such as additional automobiles or industrial plants. The dataset therefore provides a nuanced picture of shifts in policy stringency over time.
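A relative coding of this kind lends itself to simple tallies, for example the net tightening per year and dimension, or the number of level versus scope changes. This is a minimal sketch, assuming change records of the form (country, year, dimension, direction) with dimension “level” or “scope” and direction “stricter” or “less strict”; that record format, the function names, and the example records are hypothetical and may not match how CLIMAPP stores its data.

```python
from collections import Counter, defaultdict

# +1 for a tightening, -1 for a loosening (labels assumed for this sketch).
DIRECTION = {"stricter": 1, "less strict": -1}


def net_changes_by_year(records):
    """Net tightening per (year, dimension), where dimension is 'level' or 'scope'.

    `records` holds (country, year, dimension, direction) tuples.
    """
    net = defaultdict(int)
    for _country, year, dimension, direction in records:
        net[(year, dimension)] += DIRECTION[direction]
    return dict(net)


def changes_by_dimension(records):
    """Number of recorded stringency changes per dimension, regardless of direction."""
    return Counter(dimension for _country, _year, dimension, _direction in records)


# Hypothetical change records.
records = [("DE", 2019, "level", "stricter"), ("DE", 2019, "scope", "stricter"),
           ("FR", 2019, "level", "less strict"), ("FR", 2020, "level", "stricter")]
print(net_changes_by_year(records))
print(changes_by_dimension(records))
```

Counting changes and netting them are complementary: a year with many offsetting level changes shows high activity in the count but a net value near zero.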

For instance, the left panel of Fig. 8 shows that since the 1980s, countries have not only significantly increased their general policy activity but have also increasingly fine-tuned the adopted policies by adjusting their stringency. Moreover, as the right panel of Fig. 8 shows, level changes appear to be more widespread than scope changes.

Fig. 8: Changes in settings dimensions in comparison to general policy activity.

a shows the sum of policy activity over time, distinguishing between portfolio activity (black) and changes in settings (grey); b distinguishes between level changes (dark grey) and scope changes (light grey). Source: CLIMAPP.

Discussion and conclusion

In this paper, we conducted an in-depth review of all existing comprehensive, cross-national climate policy datasets. We evaluated the datasets’ overall level of agreement (or divergence) regarding broader policy growth dynamics, the use of different instrument types, and policy stringency. In this final section, we summarize and discuss the key takeaways from our analysis.

To begin, it is essential to acknowledge the exceptional wealth of data available; to our knowledge, the range of datasets in the field of climate policy is unprecedented, surpassing even long-established research areas such as welfare and environmental studies19,20. The sheer volume and variety of public policy datasets in this field represent a commendable achievement.

That said, our analysis suggests five key points to consider when using the available climate policy datasets. On the upside, we found a notable level of agreement between the datasets regarding general trends in climate policy development. Researchers who aim to examine and explain climate policy growth patterns over time can confidently choose any of the available datasets. The first key takeaway is thus that (1) all the existing datasets are well suited for examining questions such as why some countries have adopted particular climate policies while others have not. In this case, the choice of dataset can be driven primarily by the specific research scope, including the geographic regions, sectors, and time frame of interest.

However, when the focus turns to the analysis of policy instruments, we observe much more variability between the datasets. The second key takeaway from our analysis is thus that (2) researchers working in this area should proceed with caution, acknowledging that their findings could be influenced by their choice of dataset. In our analysis, the CPDB correlates most strongly with the other databases examined, followed by the CCLW, which also shows a significant degree of agreement. Nevertheless, regardless of the dataset used, researchers are well advised to validate their findings with multiple datasets whenever possible. In this regard, Fig. 4 serves as a valuable resource, helping researchers identify which datasets are most closely aligned with their chosen one and, conversely, which pose the greatest challenge when cross-checking or validating a given finding.
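The cross-checking recommended above can be sketched as a simple pairwise comparison. The following is a minimal illustration only, assuming that each dataset has already been reduced to per-country counts of one instrument type; the dataset names, the counts, and the choice of Pearson's r over the countries that two datasets share are all hypothetical and need not match the procedure behind Fig. 4.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def pairwise_agreement(counts):
    """counts maps dataset name -> {country: number of instruments of one type}.

    Each pair of datasets is compared only on the countries both of them cover.
    """
    result = {}
    for a, b in combinations(sorted(counts), 2):
        shared = sorted(counts[a].keys() & counts[b].keys())
        xs = [counts[a][c] for c in shared]
        ys = [counts[b][c] for c in shared]
        result[(a, b)] = pearson(xs, ys)
    return result


if __name__ == "__main__":
    # Invented counts for hypothetical datasets, for illustration only.
    counts = {
        "dataset_A": {"DE": 10, "FR": 8, "US": 4},
        "dataset_B": {"DE": 20, "FR": 16, "US": 8, "JP": 5},
        "dataset_C": {"DE": 4, "FR": 8, "US": 10},
    }
    for pair, r in pairwise_agreement(counts).items():
        print(pair, round(r, 3))
```

Restricting each comparison to shared countries keeps differences in geographic coverage from masquerading as disagreement; a rank-based measure would be a reasonable alternative if count scales differ strongly across datasets.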

When it comes to policy stringency, the divergence among datasets is most pronounced. The third takeaway is that (3) only two of the six datasets provide any metrics on policy stringency, and they measure it in quite distinct ways. The OECD dataset offers a robust measure of policy stringency levels, while the CLIMAPP dataset identifies whether, and in which direction, stringency has changed, but not the absolute level of stringency. When selecting a dataset for studying policy stringency, researchers should therefore consider carefully which aspect of stringency they wish to investigate. The OECD dataset is preferable if the analysis focuses on the impact of policies, such as their emissions reduction potential. In contrast, the CLIMAPP dataset offers more detailed insights into the diverse strategies policymakers employ to extend or modify existing policy instruments.

Contrasting this assessment with the information displayed in Fig. 1, particularly the number of countries covered, shows that (4) there is an inherent trade-off in moving from a predominantly descriptive or taxonomic approach to more evaluative datasets. Notably, the two datasets that include policy stringency (OECD, CLIMAPP) also have the smallest geographical coverage. This underscores the considerable data collection effort required to move from describing governments' climate policy efforts towards evaluating them.

When moving from descriptive to more evaluative uses of climate policy data, another aspect that needs consideration is the administrative capacities and structures that influence the implementation of these measures21. Even very stringent climate policies may fail to deliver if the administration lacks the resources to implement them. However, (5) all existing databases address climate administrative capacity only superficially: the CPDB has one instrument on "institutional creation", the IEA has one on "institutional mandates" and one on "institutions for sustainable finance", and the OECD has a variable on "climate advisory bodies". Yet these codings are far from capturing the complex administrative arrangements and capacities required to implement climate policies22,23,24. Future work on climate policies should thus aim not only to expand geographical scope or to capture policy nuances (stringency), but also to include more administrative aspects. A first step, following, for example, ref. 25, might be to code whether the legal acts that establish the respective climate policies contain provisions on implementation procedures, such as assigning responsible actors and rules or setting up a rigorous monitoring regime.

In conclusion, our findings highlight a persistent challenge in comparative public policy research, namely to capture and compare government actions across different policy measures and contexts and to identify and compare their effects3. We believe that the existence of multiple climate policy datasets, and the various approaches they offer for examining climate policy actions and their ambition, has clear advantages. It is this diversity that enables researchers to conduct different types of research and to answer different research questions. Nevertheless, further discussion is needed to understand how the selection of datasets may influence and possibly bias specific research findings. We hope that our review contributes to a thoughtful conversation on this important topic, encouraging awareness and transparency in data selection choices in future research.