Introduction

Climate change is a global phenomenon that manifests on regional to local scales1. Managing the risks of a changing climate thus requires accurate, high-resolution climate projections as well as an understanding of the associated uncertainties. One of our primary sources of information about future climate change is ensembles of coupled general circulation models (GCMs) run under various greenhouse gas emissions scenarios2. However, GCM projections of future climate are highly uncertain, owing to three primary factors: model uncertainty, arising from differences in the structures and parameters of GCMs and thus their responses to the same radiative forcing input; scenario uncertainty, arising from the range of possible future greenhouse gas emissions trajectories; and internal variability, arising from the chaotic nature of the Earth system.

Understanding the relative importance of each of these sources of uncertainty can help guide research agendas and inform the modeling choices of end-users. Several previous studies have made important progress towards this goal for a variety of both climate and socioeconomic outcomes3,4,5,6,7,8. Hawkins and Sutton3 (hereafter, HS09) use model outputs from the Coupled Model Intercomparison Project Phase 3 (CMIP3) to partition uncertainty in global and regional temperature projections, later extending their analysis to precipitation5. More recently, Lehner et al.6 (hereafter, L20) leverage single model initial condition large ensembles (SMILEs) alongside CMIP6 outputs to better characterize internal variability, particularly at regional to local scales where its influence can be dominant. Using a similar SMILE-based approach, Blanusa et al.7 (hereafter, B23) highlight the importance of internal variability in driving daily temperature and precipitation extremes.

While these works have led to many useful insights, they primarily rely on GCM outputs that are typically viewed as unsuitable for downstream analyses owing to their coarse spatial resolutions and systematic biases9. GCM outputs often need to be downscaled (to increase the spatial resolution) and bias-corrected (to remove systematic biases) before being considered suitable for the wide variety of end-uses in which they might be employed, including impact assessments10,11, adaptation planning12, infrastructure design13, and financial risk disclosures14. However, constructing a downscaled and bias-corrected ensemble requires making several methodological choices15,16 that can combine to produce considerable differences in the representation of temperature and precipitation, in particular for extremes17,18,19,20. Such differences can persist in impact assessments, for example, related to hydrology21,22,23 or ecosystem dynamics24. Due in part to these insights, a separate body of work has emerged that aims to quantify the importance of downscaling and bias-correction relative to other sources of uncertainty25,26,27,28,29,30,31. However, these studies often report mixed conclusions: for example, Chegwidden et al.27 analyze hydrologic variables in the Pacific Northwest region of North America and find that the choice of downscaling algorithm does not contribute meaningfully to projection spread; in contrast, Wootten et al.29 focus on meteorological variables in the southeastern United States and conclude that impact assessments using only a single set of downscaled and bias-corrected GCMs may suffer from overconfidence. Many of the conflicting results in this literature can be explained by different studies focusing on distinct and often small geographic regions, or on varying sets of meteorological or hydrological variables. Each study also relies on a unique sampling of GCMs, scenarios, and downscaling and bias-correction algorithms, which can lead to different uncertainty decompositions.

In this work, we aim to address the above literature gaps by quantifying the contribution of downscaling and bias-correction to projection uncertainty for a variety of climate metrics at a global scale. Following the simple variance decomposition approach of previous works29, we account for scenario uncertainty, model uncertainty, downscaling and bias-correction uncertainty, and interannual variability. Our approach involves calculating the time-evolving relative contribution of each source to the total projection spread (see “Methods”). We focus on statistically downscaled and bias-corrected ensembles and include, to our knowledge, all global, publicly available datasets with parent GCMs taken from the CMIP6 repository32. This leads to a super-ensemble comprising ~200 downscaled and bias-corrected model outputs across 4 emissions scenarios, 22 parent CMIP6 models, and 5 downscaling and bias-correction algorithms (Supplementary Table 1). Owing to data availability, we are restricted to analyzing metrics of climate change derived from daily maximum or minimum temperature and daily precipitation. Our selection of indicators includes annual temperature and precipitation averages as well as several indices of climate extremes due to their potential for large impacts on a broad variety of human–environment systems33.
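As a rough illustration of the kind of variance partitioning described above, the following Python sketch splits the spread of a single metric at a single grid cell into the four sources. It is a minimal sketch under stated assumptions, not the procedure from the Methods section: the balanced array layout, the function name `decompose_variance`, the use of a fourth-degree polynomial as the smooth response, and the time-constant estimate of interannual variability are all assumptions made for illustration, and the synthetic input data are invented.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def decompose_variance(x, years, deg=4):
    """Partition the spread of one metric at one grid cell (illustrative only).

    x     : array (n_scenario, n_model, n_method, n_year) of annual values;
            a fully balanced design is assumed here, unlike a real super-ensemble
    years : array (n_year,) of calendar years
    deg   : degree of the polynomial used as the smooth response (assumption)
    Returns a dict of arrays (n_year,) with each source's share of total variance.
    """
    n_year = x.shape[-1]
    # Rescale time so the polynomial fit is well conditioned.
    t = (years - years.mean()) / (years.max() - years.min())

    # Fit one polynomial per ensemble member; residuals stand in for
    # interannual variability.
    flat = x.reshape(-1, n_year)                   # (member, year)
    coef = P.polyfit(t, flat.T, deg)               # (deg + 1, member)
    forced = P.polyval(t, coef).reshape(x.shape)   # smooth response per member
    resid = x - forced

    parts = {
        # Variance across scenarios of the scenario-mean response.
        "scenario": forced.mean(axis=(1, 2)).var(axis=0),
        # Variance across GCMs, averaged over scenarios and methods.
        "model": forced.var(axis=1).mean(axis=(0, 1)),
        # Variance across downscaling methods, averaged over scenarios and GCMs.
        "downscaling": forced.var(axis=2).mean(axis=(0, 1)),
        # Residual variance, averaged over members and held constant in time.
        "interannual": np.full(n_year, resid.var(axis=-1).mean()),
    }
    total = sum(parts.values())
    return {name: value / total for name, value in parts.items()}


# Synthetic example: 4 scenarios, 5 GCMs, 3 downscaling methods.
rng = np.random.default_rng(0)
years = np.arange(2015, 2101)
ramp = np.linspace(0.0, 1.0, years.size)
x = (
    np.array([1.0, 2.0, 3.0, 4.0])[:, None, None, None] * ramp   # scenario signal
    + rng.normal(0.0, 0.5, 5)[None, :, None, None]               # GCM offsets
    + rng.normal(0.0, 0.3, 3)[None, None, :, None]               # method offsets
    + rng.normal(0.0, 0.4, (4, 5, 3, years.size))                # year-to-year noise
)
shares = decompose_variance(x, years)
```

By construction the four shares sum to one in every year, which is what allows them to be read as percentages of the total projection variance.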

Our uncertainty partitioning results are strongly heterogeneous across space, time, and climate metrics. However, in general, we find that downscaling and bias-correction contribute a non-negligible fraction of the total projection variance (typically no less than 25%, globally averaged). In many cases they represent the primary source of uncertainty. Downscaling and bias-correction are particularly important over the near term (early-to-mid 21st century), in projections of precipitation, in projections of extremes, in regions of complex terrain, and in regions where historical observations disagree. Our results corroborate previous works showing that in many instances, relying on a single set of downscaled and bias-corrected outputs can risk overconfidence29,34. For stakeholders or impact modelers who lack the computational capacity to extensively sample across all four sources of uncertainty, our results may also assist in deciding which factors to prioritize.

Results

Hereafter, to improve readability, we use the terms “downscaled” or “downscaling” to encompass the outputs or methods of downscaled and bias-corrected ensembles, unless the distinction between downscaling and bias-correction is important.

Variance decomposition of climate averages

We begin by analyzing indicators of long-term climatic change, namely annual average temperature and annual total precipitation. Before moving to the global picture, we focus on three example locations: New Delhi, India; Seattle, USA; and Lagos, Nigeria. In addition to being populous and economically important cities with distinct climates, these locations allow a comparison to previous works (L20, B23). The variance decomposition results for each city, as well as each individual downscaled projection, are shown in Fig. 1 (projections conditioned on each emissions scenario are shown in Supplementary Figs. 1–4). There is broad agreement on the sign of change for both temperature and precipitation, with average temperatures generally increasing in all locations (Fig. 1a–c) and total precipitation slightly increasing in New Delhi and Seattle (Fig. 1g, h) while remaining approximately constant in Lagos (Fig. 1i). However, there is considerable projection spread for all metrics and locations, and the resulting variance decompositions lead to different interpretations as to the driving factors. For temperature projections (Fig. 1d–f), the contribution of scenario uncertainty is similar in all three locations, starting small and only becoming non-negligible after around 2050. The reverse is true for interannual variability, which is more important in the first half of the century and declines over time. Similarly, the relative contribution of downscaling is largest over the near term and declines over time. However, there are considerable differences in magnitude across the three cities: temperature projections in New Delhi show little dependence on the choice of downscaled ensemble (Fig. 1d), whereas downscaling is the dominant uncertainty in Lagos long into the 21st century (Fig. 1f). For precipitation projections, a qualitatively different uncertainty decomposition emerges (Fig. 1j–l). Interannual variability is much more important in all locations, while the contribution of scenario uncertainty virtually disappears. In Seattle, downscaling is responsible for a substantial fraction of the variance of precipitation projections (Fig. 1k), model uncertainty contributes a small but perceptible fraction, and the overall decomposition changes little over time. This contrasts with New Delhi (Fig. 1j) and Lagos (Fig. 1l), where model uncertainty is relatively more important and grows over time.

Fig. 1: Projections and variance decomposition of climate averages for selected cities.

a–c Timeseries of annual average temperature from each downscaled model output. Gray lines show individual downscaled outputs and colored lines of different styles show associated ensemble-scenario means. Outputs for each city are taken from the single grid point encompassing their respective locations. d–f Variance decomposition of annual average temperatures corresponding to the timeseries plots in (a–c). The contribution of each uncertainty source is expressed as a percentage of the total variance. g–i Timeseries of annual total precipitation, similar to (a–c). j–l Variance decomposition of annual total precipitation, similar to (d–f).

Each variance decomposition shown in Fig. 1 arises from a combination of factors unique to each location. For example, the importance of downscaling uncertainty for Seattle precipitation may be related to its positioning in a mountainous region35, whereas the dominance of downscaling uncertainty in Lagos temperature projections may be driven by disagreements among the underlying observational datasets used to perform the downscaling (Supplementary Figs. 11 and 12). Fully explaining each uncertainty decomposition would require expertise regarding the many physical processes affecting each location’s climate, an understanding of their representations in the CMIP6 GCMs, and knowledge of how the resulting temperature and precipitation outputs are affected by each downscaling methodology. Although beyond the scope of the current work, these considerations are critical in determining which ensemble(s) and models therein to rely on for decision or risk analyses.

We now apply our variance decomposition globally, continuing to focus on climate averages. These results are shown in Fig. 2, where uncertainty sources are sorted along each column, and each row shows a 20-year averaging period representing the early, mid, or late 21st century. The global results are largely in keeping with those of the three example cities. For annual average temperature, across almost all regions of the globe, there is a marked increase in the contribution of scenario uncertainty over time and a corresponding decrease in downscaling uncertainty and interannual variability. This matches the behavior of each of the locations shown in Fig. 1, even if the magnitudes differ. For example, Lagos can be seen as an outlier in terms of the importance of downscaling uncertainty—by the late 21st century, downscaling still contributes around 25% of the total variance of Lagos temperature projections (Fig. 1f), almost double the global average. Figure 2a also shows that in many locations, model uncertainty grows to become the most important driver of variance by mid-century and continues to contribute a substantial fraction by late-century, though scenario uncertainty typically becomes larger. For annual total precipitation (Fig. 2b), interannual variability remains the dominant contributor, usually followed by downscaling uncertainty and model uncertainty, while scenario uncertainty is almost always negligible. As in Fig. 1, the precipitation decomposition changes little over time.

Fig. 2: Global variance decomposition of climate averages.

a Variance decomposition for annual average temperature. Each column shows the contribution from a different source of uncertainty, measured as the fraction of total variance. Each row depicts a 20-year averaging period, where the variance decomposition is performed annually, and the results are averaged over time. The purple dots in the upper left subplot show the locations of New Delhi, Seattle, and Lagos. b Variance decomposition for annual total precipitation in the same layout as (a). The gray boxes in the lower left of each subplot give the area-weighted global average of each decomposition. A version of this plot with a more granular colormap is available in the Supplementary Information (Supplementary Fig. 28).
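The area-weighted global averages quoted in the gray boxes can be approximated on a regular latitude-longitude grid by weighting each cell by the cosine of its latitude. The sketch below assumes such a grid and treats non-finite cells (for example, masked ocean points) as missing; the function name `area_weighted_mean` is hypothetical, and the exact weighting used for the figures is not specified in this section.

```python
import numpy as np


def area_weighted_mean(field, lat_deg):
    """Cosine-latitude weighted mean of a 2-D field (illustrative only).

    field   : array (n_lat, n_lon), e.g. the fraction of variance from one source
    lat_deg : array (n_lat,) of cell-center latitudes in degrees
    Non-finite cells are ignored.
    """
    # On a regular grid, cell area is proportional to cos(latitude).
    weights = np.broadcast_to(np.cos(np.deg2rad(lat_deg))[:, None], field.shape)
    valid = np.isfinite(field)
    return np.sum(field[valid] * weights[valid]) / np.sum(weights[valid])


# Tiny example: a value of 1 along the equator and 0 at 60 degrees N and S.
lat = np.array([-60.0, 0.0, 60.0])
field = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
weighted = area_weighted_mean(field, lat)
unweighted = field.mean()
```

In this toy case the weighted mean exceeds the unweighted mean because equatorial cells cover more area than high-latitude cells of the same angular size.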

The global results shown in Fig. 2 also reveal some important spatial patterns. For both temperature and precipitation projections, major mountain ranges including the Rocky Mountains, the Andes, and the Himalayas exhibit comparatively large downscaling uncertainties with correspondingly lower contributions from other sources. This could be due to topographic influences on atmospheric dynamics that are not well represented in coarse-resolution GCMs, leading to methodological differences in the downscaling algorithms being amplified into a larger spread in outcomes36. However, the same regions also tend to show larger disagreements in the historical record (Supplementary Fig. 15), which can drive differences in the projections37,38. Indeed, we find that at the grid point level, downscaling uncertainty is more strongly correlated with observational disagreement than are the other sources (Supplementary Figs. 5 and 6).

Our global results broadly agree with HS09 and L20: for temperature projections, we find that interannual variability is largest over the mid- and high-latitudes; for precipitation projections, we find that model uncertainty is larger in the tropics compared to other regions. One difference is that in our results, interannual variability remains considerably more important beyond the early 21st century. This arises because previous works apply decadal averages to each climate metric before performing the variance decomposition, whereas we do not average any climate indices over time, which ensures that our results remain sensitive to the entire distribution of possible outcomes in any given year. Applying long-term averaging before performing the variance partitioning would lead to a reduction in the importance of interannual variability, similar to what is observed in the Supplementary Information of B23.
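The effect of temporal averaging on interannual variability can be illustrated with a small numerical experiment. The sketch below assumes that year-to-year variability behaves like independent Gaussian noise with unit variance, which is a deliberate simplification: real climate variability is autocorrelated, so the reduction from averaging would generally be smaller. The ensemble size and window length are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)
n_members, n_years, window = 200, 86, 10

# Synthetic interannual variability: independent noise around a zero forced signal.
noise = rng.normal(0.0, 1.0, (n_members, n_years))

# Apply a 10-year running mean to every member.
kernel = np.ones(window) / window
smoothed = np.array([np.convolve(member, kernel, mode="valid") for member in noise])

# Spread across members in a typical year, before and after averaging.
var_annual = noise.var(axis=0).mean()
var_decadal = smoothed.var(axis=0).mean()
ratio = var_decadal / var_annual
```

For independent noise the variance of a 10-year mean is about one tenth of the annual variance, so any decomposition computed on decadal means would assign interannual variability a correspondingly smaller share.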

Variance decomposition of climate extremes

While long-term averages are important indicators of climatic change, climate and weather extremes play an outsized role in driving environmental and socioeconomic impacts39. In this section, we therefore apply our variance decomposition approach to a suite of indices measuring climate extremes, focusing first on annual 1-day maxima of daily maximum temperature and daily precipitation, shown in Fig. 3. The spatial patterns of these results are somewhat similar to those of annual averages (Pearson correlation coefficients calculated at the grid cell level typically range from 0.4 to 0.6, shown in Supplementary Figs. 9 and 10); regions of complex terrain and areas of relatively large observational disagreement are again typically associated with larger downscaling uncertainties (Supplementary Figs. 13, 14, 16). The temporal evolutions are also broadly similar: for both average metrics and 1-day maxima, the precipitation decomposition remains approximately constant over time, and the temperature decomposition shows a pattern of increasing relative contributions from model and scenario uncertainty at the expense of downscaling uncertainty and interannual variability. In terms of the magnitude of the relative contribution from each source, the decomposition for 1-day precipitation maxima (Fig. 3b) is very similar to that for annual totals (Fig. 2b). One of the few differences is that interannual variability becomes slightly more important at the expense of model uncertainty, particularly in the tropics. For temperature projections, there are notable differences. Downscaling and interannual variability play a more important role at longer time horizons for annual maximum temperatures (Fig. 3a) compared to annual average temperatures (Fig. 2a). Recall that for annual average temperatures, scenario and model uncertainty account for most of the variance by the late 21st century (77%, globally averaged; Fig. 2a). The corresponding late-century breakdown for maximum temperatures is qualitatively different as each source contributes approximately equally (Fig. 3a).

Fig. 3: Global variance decomposition of annual 1-day maxima.

a Variance decomposition for the annual maximum of daily maximum temperature. As in Fig. 2, columns delineate the contribution from each uncertainty source, and rows demonstrate the temporal evolution. b Variance decomposition for annual maximum 1-day precipitation, in the same layout as a. The gray boxes in the lower left of each subplot give the area-weighted global average of each decomposition. A version of this plot with a more granular colormap is available in the Supplementary Information (Supplementary Fig. 29).

We find qualitatively similar results for the annual maxima of daily average temperature and daily minimum temperature (Supplementary Fig. 31), although downscaling is slightly less important in both cases. We also consider how the uncertainty partitioning changes for temporally compounding extremes by repeating the calculation for 5-day maxima (Supplementary Fig. 32). This made very little difference for temperature projections; for precipitation, it led to a small decrease in the contribution from downscaling uncertainty and a corresponding increase in the importance of interannual variability.

There are several possible measures of climate extremes beyond annual 1-day maxima. Different end-users may care about distinct characteristics of a given hazard40, including its magnitude and timing in relation to relevant human or environmental thresholds, its correlation structure across space and time, and whether it co-occurs with another hazard41. Although mindful that any set of indices will neglect many aspects of climate extremes that are important for specific sectors, we now define and analyze a suite of metrics that aim to be as broad as possible. We analyze three threshold indices: the annual number of extremely hot days (defined as daily maximum temperature exceeding the local historical 99th percentile), the annual number of dry days (daily precipitation less than 1 mm), and the annual number of extremely wet days (daily precipitation exceeding the local historical 99th percentile). The resulting uncertainty decompositions are shown in Fig. 4.

Fig. 4: Global variance decomposition of threshold indices of climate extremes.

Variance decomposition for: a annual number of extremely hot days, b annual number of dry days, and c annual number of extremely wet days. As in Figs. 2 and 3, columns delineate the contribution from each uncertainty source, and rows demonstrate the temporal evolution. Extremely hot days and extremely wet days are defined to occur when daily maximum temperature and daily precipitation exceed their local 99th percentiles, respectively, where percentiles are calculated over 1980–2014 from the GMFD observational dataset (see “Methods”). Dry days are defined to occur when daily precipitation is less than 1 mm. The gray boxes in the lower left of each subplot give the area-weighted global average of each decomposition. A version of this plot with a more granular colormap is available in the Supplementary Information (Supplementary Fig. 30).
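The three threshold indices defined above can be computed from daily series along the following lines. This is a sketch for a single grid cell, assuming 365-day years stored as a two-dimensional array and a separate one-dimensional historical sample for the percentile thresholds. The function name `threshold_indices` is hypothetical, and taking the precipitation percentile over all days (rather than wet days only) is an assumption of the sketch.

```python
import numpy as np


def threshold_indices(tmax, pr, tmax_hist, pr_hist, q=0.99, dry_mm=1.0):
    """Annual counts of extremely hot, dry, and extremely wet days (illustrative).

    tmax, pr           : arrays (n_years, 365) of projected daily maximum
                         temperature and daily precipitation (mm)
    tmax_hist, pr_hist : 1-D arrays of historical daily values used to set the
                         local percentile thresholds (e.g. 1980-2014 observations)
    q                  : quantile defining "extreme" (0.99 = 99th percentile)
    dry_mm             : a day with less precipitation than this counts as dry
    """
    hot_threshold = np.quantile(tmax_hist, q)
    # Assumption: the wet-day percentile is taken over all days, dry or not.
    wet_threshold = np.quantile(pr_hist, q)
    return {
        "hot_days": (tmax > hot_threshold).sum(axis=1),
        "dry_days": (pr < dry_mm).sum(axis=1),
        "wet_days": (pr > wet_threshold).sum(axis=1),
    }


# Toy input: two years, with a few hand-placed exceedances.
tmax_hist = np.arange(1000.0)
pr_hist = np.arange(1000.0) / 10.0
tmax = np.zeros((2, 365))
tmax[0, :3] = 995.0          # three extremely hot days in the first year
pr = np.zeros((2, 365))
pr[1, :2] = 99.5             # two extremely wet days in the second year
pr[1, 2] = 5.0               # one ordinary wet day in the second year
counts = threshold_indices(tmax, pr, tmax_hist, pr_hist)
```

Because the thresholds are fixed from the historical sample, the counts respond directly to any shift in the projected distribution, which is the property discussed in the text for extremely hot days.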

Several insights emerge from Fig. 4. First, there remains a clear qualitative difference between the precipitation- and temperature-based indices. The decompositions for dry days (Fig. 4b) and extremely wet days (Fig. 4c) are roughly constant over time and largely dominated by downscaling uncertainty and interannual variability, while scenario uncertainty again contributes negligibly. In contrast, the results for extremely hot days (Fig. 4a) show a similar temporal pattern to previous temperature-derived metrics, where model and scenario uncertainty play an increasingly important role at longer time horizons. Second, note that in many regions, model uncertainty is the most important factor by the late 21st century in projecting extremely hot days, which contrasts with our results for the non-threshold metric of temperature extremes, annual maxima (Fig. 3a). This is likely related to the large spread in CMIP6 climate sensitivities42. Since we define an extremely hot day in reference to a constant (local) temperature threshold, higher-sensitivity GCMs will tend to cross that threshold earlier than lower-sensitivity GCMs, leading to a relative increase in model uncertainty. Third, of all metrics analyzed thus far, the annual number of dry days is by far the most sensitive to the choice of downscaled ensemble. This may be related to observational disagreements regarding the historical frequency of dry days (Supplementary Fig. 17) but could also be driven in part by methodological differences in whether and how the bias-correction algorithms adjust their outputs based on minimum precipitation thresholds43. Finally, our results for extremely hot days and extremely wet days are in reasonable qualitative agreement with those of B23, notwithstanding some differences in the magnitudes that arise due to our inclusion of downscaling uncertainty and our decision not to apply decadal averaging.

In the Supplementary Information, we test the sensitivity of these results to several different threshold definitions (Supplementary Figs. 33–45). Broadly, we find that downscaling becomes less important if daily average or minimum temperatures are considered instead of the daily maximum, and interannual variability becomes more important if more extreme thresholds are used. Calculating the historical quantiles from a separate observational dataset can lead to some differences in the contribution from downscaling uncertainty, but this does not change the qualitative results. We also include extensions to account for temporally compounding extremes by calculating the longest consecutive run of days crossing each threshold, the main effect of which is to increase the importance of interannual variability (Supplementary Figs. 33–45). Lastly, we investigate a simple multivariate metric, extremely hot and dry days (Supplementary Figs. 46–47), which shows a very similar decomposition to that for extremely hot days. This indicates that conditioning the occurrence of daily temperature extremes on concurrent low precipitation does little to alter the uncertainty decomposition, although it is unclear whether this result would hold over longer timescales.
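The longest consecutive run of days crossing a threshold can be computed with a single pass over one year of daily exceedance flags. The helper below is a minimal sketch; the name `longest_run` is hypothetical, and the handling of runs that span two calendar years (ignored here) is an assumption.

```python
import numpy as np


def longest_run(exceeds):
    """Length of the longest run of consecutive True values in a 1-D sequence."""
    best = current = 0
    for flag in exceeds:
        # Extend the current run on an exceedance day, otherwise reset it.
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


# Example: daily maximum temperatures (deg C) against a fixed 35 deg C threshold.
tmax = np.array([33.0, 36.0, 37.0, 34.0, 36.0, 38.0, 39.0, 31.0])
hot_spell = longest_run(tmax > 35.0)
```

Applying the same function to `pr < 1.0` would give the longest dry spell of the year.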

Implications for risk assessment

Our results so far have been presented in terms of the relative contribution of each uncertainty source. However, the magnitude of these contributions in physical units is also important, particularly for end-users who require decision-relevant information. In Fig. 5, we show the absolute uncertainty attributed to downscaling in the middle of the century for four previously defined indices of extremes. The absolute uncertainty is measured in physical units by computing the standard deviation across downscaled ensembles rather than the variance. Although the uncertainty decomposition only holds in variance space, Fig. 5 can provide a heuristic estimate of the extent to which the overall projection spread may be underestimated by relying on a single set of downscaled projections. Note that in contrast to the percentage shares, the absolute uncertainty contributed by each source tends only to grow over time (Supplementary Figs. 69–75).
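The distinction between variance space and physical units can be made concrete with a short sketch. The function `downscaling_std` is hypothetical and assumes a balanced array for one scenario and grid cell; taking the standard deviation across methods for each GCM and then averaging over GCMs is one plausible reading of the description above, not a confirmed detail. The final lines show why standard deviations are only a heuristic: variances of independent sources add, but standard deviations do not.

```python
import numpy as np


def downscaling_std(x):
    """Spread across downscaling methods in physical units (illustrative only).

    x : array (n_model, n_method, n_year) for one scenario and grid cell
    Returns an array (n_year,) in the same units as x.
    """
    per_model_std = x.std(axis=1)        # spread across methods for each GCM
    return per_model_std.mean(axis=0)    # average over GCMs (assumption)


# Toy input: two methods that always differ by 2 units for each of three GCMs.
x = np.zeros((3, 2, 4))
x[:, 1, :] = 2.0
spread = downscaling_std(x)

# Variances of independent sources add, standard deviations do not.
var_a, var_b = 9.0, 16.0
total_std = np.sqrt(var_a + var_b)               # 5.0
sum_of_stds = np.sqrt(var_a) + np.sqrt(var_b)    # 7.0
```

The per-source standard deviations are therefore best read individually, as in Fig. 5, rather than summed into a total.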

Fig. 5: The contribution of downscaling to absolute uncertainty.

Absolute uncertainty attributed to downscaling, averaged over 2050–2069, for: a annual number of extremely hot days, b annual maximum of daily maximum temperature, c annual number of extremely wet days, and d annual maximum 1-day precipitation. The absolute uncertainty is expressed via the standard deviation across ensembles at each grid point and is measured in physically meaningful units. The gray boxes in the lower left of each subplot give the area-weighted global average of each contribution.

The heterogeneity across locations and metrics demonstrated in Fig. 5 suggests that the relevance of downscaling uncertainty for local decision and risk analyses is highly contextual. To demonstrate this further, we provide a stylized example around characterizing mid-century hot and wet extremes in Seattle, shown in Fig. 6, which illustrates the effects of only sampling from one downscaled ensemble relative to the entire super-ensemble. Across most metrics and emissions scenarios shown in Fig. 6, key distributional statistics such as the upper percentiles and inter-percentile ranges can vary considerably among downscaled ensembles as well as in relation to the full ensemble. For the precipitation-based metrics (Fig. 6c, d), the differences among downscaled ensembles are larger than those induced by switching from the lowest to highest emissions scenario. Even for the temperature-based metrics that show strong sensitivities to emissions scenario (Fig. 6a, b), relying on different downscaled ensembles can lead to qualitatively different risk perceptions in relation to local thresholds. Consider, for example, the extraordinary 2021 Pacific Northwest heatwave, which has been extensively studied after breaking several temperature records throughout the region44,45,46,47, leading to widespread impacts across many sectors48. During this event, Seattle-Tacoma airport recorded a temperature of 42.2 °C49 (denoted by the dashed vertical line in Fig. 6b). Figure 6b shows that estimates of the likelihood of surpassing this record by mid-century depend on the choice of downscaled ensemble, as one ensemble projects that this record is unlikely to be broken by mid-century even under an extreme emissions scenario.

Fig. 6: Hazard characterization depends on modeling choices.

Comparison of the probability distribution generated by relying on the full ensemble (including all downscaled ensembles) versus any one downscaled ensemble, conditioned on the highest (SSP5-8.5) and lowest (SSP1-2.6) emissions scenarios. Distributions are constructed for the grid point containing Seattle over 2050–2069 for different metrics: a annual number of extremely hot days, b annual maximum of daily maximum temperature, c annual number of extremely wet days, and d annual maximum 1-day precipitation. Boxplot whiskers span the 99% range. The dashed vertical line in (b) denotes the highest temperature recorded at Seattle-Tacoma airport during the 2021 Pacific Northwest heatwave. Details on each downscaled ensemble and the SSP scenarios can be found in the Methods section and Supplementary Information. We neglect the carbonplan ensembles here since they contain a limited number of models.

Although we present here a highly simplified example that neglects many of the challenges of implementing risk assessments in a nonstationary climate50,51, it nonetheless serves to illustrate how modeling choices surrounding downscaled data sources can induce substantively different hazard characterizations. These results suggest that careful consideration should be given to the role of downscaling uncertainty within any broader framework as failure to do so may lead to decisions that are not robust to the full set of plausible climate futures.

Discussion

Our main finding, that downscaling and bias-correction often contribute considerable uncertainty in local climate projections, is robust to a number of methodological checks that we outline in the Methods section and Supplementary Information. There are nevertheless several possible avenues of future research. First, note that despite our simplified treatment of internal variability (see associated discussion in “Methods”), we find that interannual variability is an important driver of uncertainty for many metrics. For several precipitation-based metrics and indices of extremes, the combined contribution of interannual variability and downscaling drives a large share of the variance. This suggests that future work characterizing uncertainties around the role of internal variability at local scales would be valuable. The framework presented here could be extended to include downscaled initial condition ensembles52, but to our knowledge such an ensemble does not yet exist at global scale. Independent estimates of internal variability at local scales, potentially derived from hybrid statistical techniques53, could also be used to test for potential biases in the model-derived representation used here.

Second, one important limitation of this work is the necessarily unbalanced sample design. Our constraint of global spatial coverage led to the omission of many downscaled ensembles that are only available at continental or national scales. As such, many GCMs in our super-ensemble are only downscaled via two different methods, and our estimate of the downscaling uncertainty (the variance across downscaling methods) likely suffers from biases associated with this small sample size. We partially mitigate this bias by averaging each individual estimate across GCMs, but expanding the super-ensemble to include a greater variety of downscaling methods should lead to more robust estimates. Adding more ensembles to the uncertainty decomposition could increase or decrease the relative importance of downscaling29.

It is also important to highlight that our definition of downscaling uncertainty encompasses more factors than just the selection of each downscaling and bias-correction algorithm. In general, many decisions related to the development of a downscaled ensemble can contribute to observed differences in outputs. Such choices include the underlying observational dataset and the temporal extent used for training, and any re-gridding processes applied to the observations or native GCMs. Alternate configurations of the same downscaling or bias-correction algorithm, for example, related to the preservation of GCM-simulated trends, can also lead to considerable differences54. In the Supplementary Information, we provide qualitative evidence that in many cases considered here, downscaling uncertainty is related to disagreements in the historical record. However, additional research is needed to more precisely separate the effects of each component in the downscaling and bias-correction process.

Third, we make an implicit assumption that the outputs from each scenario, GCM, and downscaling method represent equally plausible realizations of future climate. Methods are emerging that aim to constrain climate projections by downweighting55,56 or sub-selecting57,58 GCMs based on their agreement with historical observations, potentially combined with probabilistic emissions constraints59,60,61. However, the extension of such techniques to downscaled outputs remains an area of active research62,63 and may be complicated by the presence of observational disagreements at local scales if downscaling algorithms rely on conflicting datasets64. The application of any such framework would decrease absolute uncertainty, but may not reduce the relative importance of downscaling uncertainty if some GCMs or scenarios are down-weighted or removed from the ensemble. Future work investigating these questions would be valuable.

Fourth, we again highlight that our selection of climate metrics is necessarily limited. Since all of the indices we analyze are calculated annually, we are unable to probe extremes that manifest on longer timescales (for example, the magnitude of a 10-year return period event) and we aggregate over seasonal information that is important for many sectors. A useful extension to this work could test how these aspects of climate hazards alter the variance decompositions. In addition, moving beyond standardized meteorological indices to analyze targeted metrics that are relevant for specific sectors may lead to qualitatively different results65.

Finally, note that variance decomposition is only one of many possible approaches to characterize uncertainty. More formal sensitivity analysis techniques can be applied to understand specific aspects of the outcome space66 and ensure that inferences are relevant for downstream decision analyses67. In addition, climate projections are often used to drive sectoral models that contain their own structural and parametric uncertainties68,69,70,71. Socioeconomic outcomes of interest may well be more sensitive to the representation of these environmental and/or human system dynamics, and sound risk management strategies should account for the uncertainty in each relevant system as well as their interactions72.

Our results have important implications for many users of downscaled climate products. Across the locations, time horizons, and indices of climatic change that we analyze, downscaling is rarely a negligible source of uncertainty. This implies that sampling from more than one downscaled ensemble is advisable during risk or impact analyses that are sensitive to low-probability climate hazards, as has been suggested elsewhere29,34. Such sampling may substantially increase data and computational requirements, so we emphasize that it may not be necessary in all cases. Our results can provide some initial heuristic guidance in this regard: they suggest that downscaling uncertainty is particularly important over the near term, in projections involving precipitation or climate extremes, and in regions of observational disagreement. We have also developed an interactive JupyterLab-based dashboard, deployed on the MSD-LIVE platform, that facilitates further exploration of our results: https://lafferty-sriver-2023-downscaling-uncertainty.msdlive.org. In general, we urge end-users to follow existing recommendations regarding the use of downscaled climate products16,73, including taking a process-informed approach and relying on expert knowledge of local weather and climate phenomena74. End-users may also consider whether downscaled projections are the most appropriate method of generating future climate information; other complementary approaches might include applying GCM-simulated changes to gridded historical data75 or developing a statistical model based on pointwise observations76.

This work also adds to a growing body of literature applying an increasingly diverse set of tools to characterize the uncertainties of a changing climate and the resulting environmental and socioeconomic impacts. Deliberate efforts to coordinate methodological comparisons would help build confidence in the insights derived from this line of research, which in turn will be necessary to guide best practices for the increasing number of both public and private actors who are incorporating climate projections into their decision-making processes.

Methods

Data sources

We leverage five ensembles of statistically downscaled and bias-corrected GCM outputs: NASA NEX-GDDP-CMIP677 (which we refer to as NEX-GDDP), CIL-GDPCIR78, ISIMIP3BASD79,80 (which we refer to as ISIMIP3b), and two ensembles from carbonplan81: GARD-SV82 and DeepSD-BC83. Some details on the configurations of each approach can be found in Supplementary Table 2. Each ensemble is filtered to ensure: (1) parent GCMs are available in at least two ensembles, (2) downscaled outputs for each GCM are available for at least 3 Shared Socioeconomic Pathways (SSPs)84, (3) downscaled outputs are missing no more than one variable (from tasmax, tasmin, and pr), and (4) downscaling is performed on the same simulation member of the parent GCM. Satisfying these requirements results in dropping 13 of 35 NEX-GDDP parent models and 8 of 25 CIL-GDPCIR parent models. All ISIMIP3b outputs are used. Additional outputs from different downscaling techniques are available in the carbonplan dataset but do not satisfy the above requirements. After calculating each metric in each ensemble, all outputs are conservatively re-gridded to a common 0.25° grid.
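The four filtering criteria above can be illustrated with a small Python sketch. The catalogue, the ensemble and GCM names, and the helper `passes_entry_checks` are all hypothetical. The sketch also assumes one particular reading of the criteria: criteria (2) and (3) are checked per ensemble entry, and criteria (1) and (4) are combined into the requirement that the same simulation member appears in at least two ensembles. It is not the code used to build the super-ensemble.

```python
from collections import defaultdict

REQUIRED = {"tasmax", "tasmin", "pr"}
SSPS = ("ssp126", "ssp245", "ssp370")

# Hypothetical catalogue: (ensemble, gcm, member) -> {ssp: available variables}.
catalogue = {
    ("ENS-A", "GCM-1", "r1i1p1f1"): {s: set(REQUIRED) for s in SSPS},
    ("ENS-B", "GCM-1", "r1i1p1f1"): {s: {"tasmax", "pr"} for s in SSPS},
    ("ENS-A", "GCM-2", "r1i1p1f1"): {s: set(REQUIRED) for s in SSPS[:2]},
    ("ENS-B", "GCM-2", "r1i1p1f1"): {s: set(REQUIRED) for s in SSPS},
    ("ENS-A", "GCM-3", "r1i1p1f1"): {s: set(REQUIRED) for s in SSPS},
    ("ENS-B", "GCM-3", "r2i1p1f1"): {s: set(REQUIRED) for s in SSPS},
}

def passes_entry_checks(ssps):
    """Criteria (2) and (3): at least 3 SSPs, each missing at most one variable."""
    usable = [s for s, variables in ssps.items() if len(REQUIRED - variables) <= 1]
    return len(usable) >= 3

# Group surviving entries by (GCM, member), so that criteria (1) and (4)
# reduce to "the same member appears in at least two ensembles" (assumption).
ensembles_by_run = defaultdict(set)
for (ens, gcm, member), ssps in catalogue.items():
    if passes_entry_checks(ssps):
        ensembles_by_run[(gcm, member)].add(ens)

kept = sorted(
    (ens, gcm, member)
    for (gcm, member), ens_set in ensembles_by_run.items()
    if len(ens_set) >= 2
    for ens in ens_set
)
```

In this toy catalogue only GCM-1 survives: GCM-2 has too few SSPs in one ensemble, and GCM-3 was downscaled from different simulation members.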

For the threshold metrics that require comparing projection outputs to historical quantiles, we rely on two observational datasets: the Global Meteorological Forcing Dataset (GMFD) for Land Surface Modeling85 and the ERA5 reanalysis from the European Centre for Medium-Range Weather Forecasts86. These products are chosen because they are available globally at 0.25° spatial resolution. GMFD is the training dataset for the NEX-GDDP ensemble, and ERA5 is the training dataset for the CIL-GDPCIR ensemble and both carbonplan ensembles, although with different temporal extents. The ISIMIP3b ensemble is trained on W5E5 v2.087,88, which is only available at 0.5° spatial resolution. The quantiles are calculated from daily data over 1980–2014. We conservatively re-grid both observational datasets to the native grid of each downscaled ensemble before calculating the threshold metrics. Our definition of extremely hot days and extremely wet days in the main results is based on daily maximum temperature and daily total precipitation exceeding the local 99th percentile from GMFD, respectively. In the Supplementary Information, we compare the GMFD-calculated quantiles to those obtained from ERA5 (Supplementary Figs. 18–21).
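As a minimal illustration of a threshold metric, the Python sketch below counts the days per year on which a projected series exceeds the historical 99th percentile at a single grid cell. The data are synthetic, and the sketch assumes that the quantile is computed by pooling all historical days (no seasonal stratification), which the text above does not specify.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily maximum temperature (deg C) at one grid cell.
n_hist_years, n_proj_years, days_per_year = 35, 3, 365
obs = rng.normal(25.0, 5.0, size=(n_hist_years, days_per_year))   # stands in for 1980-2014 observations
proj = rng.normal(27.0, 5.0, size=(n_proj_years, days_per_year))  # stands in for one downscaled output

# Local 99th percentile of all historical days pooled together (assumption).
threshold = np.quantile(obs, 0.99)

# Annual number of days exceeding the historical threshold.
hot_days_per_year = (proj > threshold).sum(axis=1)
```

The same pattern applies to extremely wet days, with daily total precipitation in place of daily maximum temperature.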

Uncertainty partitioning

Following previous works, we employ a simple variance decomposition approach to calculate the relative uncertainty arising from four sources: scenario uncertainty, model/GCM uncertainty, downscaling uncertainty, and interannual variability. Additionally, in a similar manner to Wootten et al.29, we employ a weighting strategy that accounts for data coverage. Our method is as follows: let x(t, s, m, d) represent a given climate metric in some location at year t from scenario s, parent GCM m, and downscaling method d. We first estimate the forced response \(\hat{x}(t,s,m,d)\) by fitting a 4th-order polynomial over 2015–2100. Interannual variability is then estimated as the centered rolling 11-year variance of the difference between the extracted forced response and the raw outputs, averaged over all outputs. The assumption of constant interannual variability was highlighted as one shortcoming of HS09, so in this work we allow the magnitude of interannual variability to evolve over time. The contribution of each remaining uncertainty source is calculated based on the forced response. Scenario uncertainty is estimated as the variance over scenarios of the multimodel, multi-method mean,

$$U_s(t)=\mathrm{var}_s\left[\frac{1}{N(s)}\sum_{m,d}\hat{x}(t,s,m,d)\right],$$
(1)

where N(s) is the total number of downscaled outputs available for scenario s. The above definition may underestimate the true scenario uncertainty when the multimodel, multi-method response is weak. Brekke and Barsugli89 propose taking the variance over scenarios before averaging to circumvent this issue:

$$U_s^{bb13}(t)=\frac{1}{N_m N_d}\sum_{m,d}\mathrm{var}_s\left[\hat{x}(t,s,m,d)\right].$$
(2)

Here, \(N_m\) and \(N_d\) are the numbers of distinct GCMs and downscaling methods in our super-ensemble, respectively. Our main results are based on the former definition of scenario uncertainty, following much of the existing literature. In the Supplementary Information, we show that scenario uncertainty is indeed larger under the Brekke and Barsugli definition, although this does not change the qualitative results (Supplementary Figs. 48–54). Model uncertainty is estimated as the weighted mean of the variance across models,

$$U_m(t)=\sum_{s,d}w_{s,d}\,\mathrm{var}_m\left[\hat{x}(t,s,m,d)\right].$$
(3)

The weights \(w_{s,d}\) are chosen such that if more parent GCMs are available for a given downscaling method and scenario (i.e., if the variance is calculated across more GCMs), those methods and scenarios are weighted higher:

$$w_{s,d}=\frac{m(s,d)}{\sum_{s,d}m(s,d)}.$$
(4)

Here, m(s, d) indicates the number of parent models that have been downscaled using method d for scenario s. Downscaling uncertainty is estimated as the weighted mean of the variance across methods:

$$U_d(t)=\sum_{s,m}w_{s,m}\,\mathrm{var}_d\left[\hat{x}(t,s,m,d)\right],$$
(5)

where the weights \(w_{s,m}\) are chosen such that if more downscaled outputs are available for a given GCM and scenario, those GCMs and scenarios are weighted higher:

$$w_{s,m}=\frac{d(s,m)}{\sum_{s,m}d(s,m)}.$$
(6)

Here, d(s, m) indicates the number of downscaled outputs available from parent GCM m and scenario s. The weighting strategy can be made more intuitive with an example: from Supplementary Table 1, there are five different downscaled outputs available from the CanESM5 parent GCM whereas only two different downscaled outputs are available from CMCC-ESM2 (neglecting SSP availability). The weighting strategy assumes that the estimated downscaling uncertainty from CanESM5 provides more information about the true uncertainty than the estimate from CMCC-ESM2. In this illustrative example, our estimate for the true downscaling uncertainty would be a weighted average of the two individual estimates, where the CanESM5 estimate is weighted higher by a factor of 5/2. In the Supplementary Information, we recalculate our main results without performing any weighting and show that the qualitative interpretations are unchanged (Supplementary Figs. 55–61).
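The two-GCM example above can be worked through numerically. In the Python sketch below, the output counts (five and two) come from the text, whereas the per-GCM variances in `var_d` are invented purely for illustration.

```python
# Number of downscaled outputs per parent GCM, as in the example above
# (SSP availability neglected).
n_outputs = {"CanESM5": 5, "CMCC-ESM2": 2}

# Weights proportional to the number of outputs, normalised to sum to one.
total_outputs = sum(n_outputs.values())
weights = {gcm: n / total_outputs for gcm, n in n_outputs.items()}

# Hypothetical variances across downscaling methods for each GCM
# (illustrative values only, not results from this study).
var_d = {"CanESM5": 1.2, "CMCC-ESM2": 0.4}

# Weighted average of the two individual estimates.
u_d = sum(weights[gcm] * var_d[gcm] for gcm in n_outputs)

# Relative weight of the CanESM5 estimate.
ratio = weights["CanESM5"] / weights["CMCC-ESM2"]
```

The weights are 5/7 and 2/7, so the CanESM5 estimate counts 2.5 times as much as the CMCC-ESM2 estimate.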

We assume that the total variance in each year is given by the sum of each individual variance estimate. Our main results show the relative contribution of each uncertainty source measured as a fraction of the total variance.
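The decomposition in Eqs. (1) and (3)-(6) can be sketched in Python for a single location. The sketch is illustrative rather than the analysis code behind the main results, and it makes several assumptions that the text does not specify: forced responses are stored in an array of shape (T, S, M, D) with NaN marking unavailable combinations, population variances are used, and cells with fewer than two members receive zero weight. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def masked_var(x, axis):
    """Population variance along `axis` ignoring NaNs, plus the member count."""
    mask = ~np.isnan(x)
    n = mask.sum(axis=axis)
    safe_n = np.maximum(n, 1)                          # avoid 0/0 for empty cells
    mean = np.where(mask, x, 0.0).sum(axis=axis) / safe_n
    dev = np.where(mask, x - np.expand_dims(mean, axis), 0.0)
    return (dev ** 2).sum(axis=axis) / safe_n, n

def weighted_mean_of_variance(x_hat, axis):
    """Eqs. (3)-(6): variance along `axis`, then a count-weighted mean
    over the two remaining non-time axes."""
    var, n = masked_var(x_hat, axis)                   # both have shape (T, a, b)
    w = np.where(n >= 2, n, 0).astype(float)           # assumption: need >= 2 members
    w /= w.sum(axis=(1, 2), keepdims=True)
    return (w * var).sum(axis=(1, 2))

def partition(x_hat, var_iav):
    """x_hat: forced responses, shape (T, S, M, D), NaN where unavailable.
    var_iav: interannual variability, shape (T,)."""
    mask = ~np.isnan(x_hat)
    # Eq. (1): variance over scenarios of the multimodel, multi-method mean.
    mean_s = np.where(mask, x_hat, 0.0).sum(axis=(2, 3)) / mask.sum(axis=(2, 3))
    return {
        "scenario": mean_s.var(axis=1),
        "model": weighted_mean_of_variance(x_hat, axis=2),
        "downscaling": weighted_mean_of_variance(x_hat, axis=3),
        "variability": np.asarray(var_iav, dtype=float),
    }

# Synthetic, unbalanced super-ensemble (illustrative values only).
rng = np.random.default_rng(42)
T, S, M, D = 86, 3, 4, 3
t = np.linspace(0.0, 1.0, T)[:, None, None, None]
x_hat = (
    t * rng.normal(3.0, 1.0, size=(1, S, 1, 1))        # scenario-dependent warming
    + t * rng.normal(0.0, 0.8, size=(1, 1, M, 1))      # GCM-dependent response
    + rng.normal(0.0, 0.3, size=(1, 1, 1, D))          # method-dependent offset
)
x_hat[:, :, 3, 2] = np.nan                             # GCM 3 lacks method 2
x_hat[:, 2, 0, 1] = np.nan                             # one GCM/method pair lacks scenario 2

parts = partition(x_hat, var_iav=np.full(T, 0.05))
total = sum(parts.values())                            # total variance, assumed additive
fractions = {name: u / total for name, u in parts.items()}
```

Each entry of `fractions` is a time series of length T, and the four series sum to one in every year by construction.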

Methodological caveats

Here we outline two additional methodological caveats associated with our main results. First, the 4th-order polynomial fit used to separate the forced response from interannual variability likely leads to an underestimate of the true extent of internal variability since the fit will interpret unforced fluctuations as being part of the forced response. L20 show that for coarse-resolution GCM outputs, this bias can be particularly acute at regional scales and for noisy output variables such as precipitation, reaching 50% of the total uncertainty in some cases. One approach to mitigate this bias is to average over large spatial scales, but this would considerably reduce the influence of downscaling, which is our primary focus in this work. Alternatively, using a large number of model outputs may achieve a more robust averaged estimate. Our inclusion of over 200 downscaled model outputs across 22 GCMs may be sufficient in many cases, but this is difficult to verify within the current framework. As noted in L20, more sophisticated methods of extracting the forced response could also be used (e.g., ref. 90).
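The separation of forced response and interannual variability discussed above can be sketched for a single annual series as follows. This is a minimal illustration on synthetic data; the use of the sample variance (`ddof=1`), the NaN padding at the ends of the record, and the function names are assumptions, and the averaging over all outputs described in the text is omitted.

```python
import numpy as np

def forced_response(years, values, order=4):
    """Fit a polynomial of the given order to one annual series."""
    # Centre and scale the years so the polynomial fit is well conditioned.
    t = (years - years.mean()) / (years.max() - years.min())
    coeffs = np.polynomial.polynomial.polyfit(t, values, order)
    return np.polynomial.polynomial.polyval(t, coeffs)

def rolling_variance(residuals, window=11):
    """Centred rolling variance; NaN where the full window does not fit."""
    half = window // 2
    out = np.full(residuals.shape, np.nan)
    for i in range(half, residuals.size - half):
        out[i] = residuals[i - half : i + half + 1].var(ddof=1)
    return out

# Synthetic series (illustrative only): a smooth trend plus white noise.
rng = np.random.default_rng(0)
years = np.arange(2015, 2101)
trend = 0.03 * (years - 2015) + 2e-4 * (years - 2015) ** 2
raw = trend + rng.normal(0.0, 0.5, size=years.size)

x_hat = forced_response(years, raw)          # estimated forced response
variability = rolling_variance(raw - x_hat)  # time-evolving interannual variance
```

Because the polynomial is fitted to the noisy series, part of the noise is absorbed into `x_hat`, which is the source of the underestimate described above.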

Second, our main results neglect interactions among uncertainty sources, which previous studies have shown to be significant in some instances91. To estimate the importance of interaction effects, we implement two checks. We first perform an ANOVA-based variance decomposition (described in the Supplementary Information) for all metrics across our three example cities. We find that interactions are small for projections of climate averages (Supplementary Fig. 64) but can sometimes be important for extremes (Supplementary Fig. 65). B23 note that accounting for the interaction between model and scenario uncertainty may alter their results, but we find this effect to be small—the interaction between model and downscaling uncertainty is typically larger. Our ANOVA results do not qualitatively alter the relative importance of each uncertainty source, but rather reassign the fractions of variance partitioned to each source to additional interaction terms. For example, when neglecting interactions we find that downscaling is the largest driver of variance for the annual number of extremely hot days in Lagos by the end of the century (Supplementary Fig. 63). The corresponding ANOVA results reveal large model-downscaling and scenario-downscaling interaction effects (Supplementary Fig. 65), but the overall (combined) influence of downscaling uncertainty remains important.
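To illustrate how an ANOVA-style decomposition assigns variance to an interaction term, the Python sketch below splits the sum of squares of a balanced model-by-method table into two main effects and their interaction. It is a simplified two-factor example on synthetic data for a single scenario and year, not the procedure described in the Supplementary Information, and the function name is hypothetical.

```python
import numpy as np

def two_way_fractions(x):
    """Fractions of the total sum of squares for x of shape (n_gcm, n_method),
    with exactly one value per GCM/method cell."""
    grand = x.mean()
    gcm_mean = x.mean(axis=1, keepdims=True)       # per-GCM means
    method_mean = x.mean(axis=0, keepdims=True)    # per-method means
    n_gcm, n_method = x.shape
    ss = {
        "model": n_method * ((gcm_mean - grand) ** 2).sum(),
        "downscaling": n_gcm * ((method_mean - grand) ** 2).sum(),
        "model x downscaling": ((x - gcm_mean - method_mean + grand) ** 2).sum(),
    }
    total = ((x - grand) ** 2).sum()
    return {name: value / total for name, value in ss.items()}

# Synthetic tables (illustrative only): one purely additive, one with interaction.
rng = np.random.default_rng(3)
additive = rng.normal(size=(5, 1)) + rng.normal(size=(1, 4))
interacting = additive + rng.normal(size=(5, 4))

frac_additive = two_way_fractions(additive)
frac_interacting = two_way_fractions(interacting)
```

For the additive table the interaction fraction is zero up to rounding, whereas for the second table part of the variance is assigned to the interaction term.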

We also test whether our assumed total uncertainty, the sum of each individual term, differs from true total uncertainty, given by the variance across all outputs:

$$U_{total}^{true}(t)=\mathrm{var}_{s,m,d}\left[x(t,s,m,d)\right].$$
(7)

Regions and metrics for which the true total uncertainty is considerably different from our assumed total uncertainty indicate that interaction effects may be important. In the Supplementary Information, we show the ratio of these two quantities for each metric globally (Supplementary Fig. 68) and for our example cities (Supplementary Figs. 66 and 67). Our independence assumption generally leads to small errors (≤10%) for annual averages and annual 1-day maxima. The Sahara Desert stands out as a region of potentially large interaction effects for the precipitation-based metrics. For the threshold metrics, our assumption is less defensible as the discrepancy can reach 20% over many regions. Interaction effects may be particularly important for temperature-based metrics over the Amazon rainforest, and the Sahara again shows comparatively large discrepancies for the precipitation-based metrics. Although these discrepancies do not necessarily indicate the presence of large interaction effects, future research could investigate them in more detail.
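The comparison between assumed and true total uncertainty can be illustrated with a toy example. The Python sketch below uses a balanced, unweighted ensemble at a single year and ignores interannual variability for brevity. The direction of the ratio (assumed divided by true) and all names are assumptions made for illustration.

```python
import numpy as np

def assumed_total(x):
    """Sum of scenario, model, and downscaling variances for x of shape (S, M, D)."""
    u_s = x.mean(axis=(1, 2)).var()    # variance over scenarios of the ensemble mean
    u_m = x.var(axis=1).mean()         # mean variance across GCMs
    u_d = x.var(axis=2).mean()         # mean variance across downscaling methods
    return u_s + u_m + u_d

# Synthetic ensembles (illustrative only).
rng = np.random.default_rng(7)
S, M, D = 3, 5, 4
additive = (
    rng.normal(size=(S, 1, 1)) + rng.normal(size=(1, M, 1)) + rng.normal(size=(1, 1, D))
)
interacting = additive + rng.normal(size=(1, M, D))   # adds a model-by-method interaction

# Ratio of the assumed total to the variance across all outputs, as in Eq. (7).
ratio_additive = assumed_total(additive) / additive.var()
ratio_interacting = assumed_total(interacting) / interacting.var()
```

In the purely additive case the ratio equals one up to rounding. In this toy case the model-by-method interaction is counted in both the model and downscaling terms, so the ratio exceeds one.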