Likelihood of unprecedented drought and fire weather during Australia’s 2019 megafires

Between June 2019 and March 2020, thousands of wildfires spread devastation across Australia at the tragic cost of many lives, vast areas of burnt forest, and estimated economic losses upward of AU$100 billion. Exceptionally hot and dry weather conditions, and preceding years of severe drought across Australia, contributed to the severity of the wildfires. Here we present analysis of a very large ensemble of initialized climate simulations to assess the likelihood of the concurrent drought and fire-weather conditions experienced at that time. We focus on a large region in southeast Australia where these fires were most widespread and define two indices to quantify the susceptibility to fire from drought and fire weather. Both indices were unprecedented in the observed record in 2019. We find that the likelihood of experiencing such extreme susceptibility to fire in the current climate was 0.5%, equivalent to a 200 year return period. The conditional probability is many times higher than this when we account for the states of key climate modes that impact Australian weather and climate. Drought and fire-weather conditions more extreme than those experienced in 2019 are also possible in the current climate.


INTRODUCTION
The 2019-2020 wildfire season in Australia was among the most catastrophic in recorded history, causing severe social, environmental, ecological and economic impacts across the continent. An area larger than the size of the United Kingdom was burned (estimates range from 24 to 34 million hectares 1 ), including at least 21 percent of Australia's temperate forests 2,3 and over 3000 homes 1 . Thirty-three deaths occurred as a direct result of the fires 4 . Hundreds more deaths, and thousands of hospital and emergency-department admissions, have been attributed to the extreme levels of air pollution resulting from the wildfire smoke 5,6 . The estimated death toll for animals is in the billions 1 , with fears that some species have been driven to extinction 7,8 . Recent estimates of the total economic loss to Australia resulting from the 2019-2020 wildfires are in the order of AU$100 billion 9,10 .
Two key factors have been linked to the severity of the 2019-2020 wildfires. First, the exceptionally dry conditions in the years and months leading up to the fire season produced very low fuel-moisture content, especially in eastern Australia [11][12][13][14] . The widespread drought conditions have been connected to the states of the El Niño Southern Oscillation (ENSO) and Indian Ocean Dipole (IOD) in the years and months preceding 2020 [15][16][17] . Second, the extremely hot and dry weather conditions experienced across Australia during the 2019-2020 summer were particularly favorable to fire ignition and spread 11,[18][19][20][21] . A number of extreme weather records were broken over this period, which included Australia's highest national daily-averaged temperature (41.9 o C) 15 and record-high values of the Forest Fire Danger Index (FFDI) in areas of all Australian States and Territories 19 . The Southern Annular Mode (SAM), which was strongly in its negative phase during the spring and summer of 2019, has been implicated in the unusually hot and dry conditions across eastern Australia 22 .
Many studies have investigated how climate and weather conditions favorable to wildfires in Australia have changed historically and how they will continue to change into the future. Paleoclimate records indicate an increase in the last century in the occurrence of the fire-promoting phases of both ENSO 23 and the IOD 15,24 . These increases may continue in the coming decades 25,26 . Observed records since the mid-twentieth century show a trend towards more dangerous fire-weather conditions for much of Australia [27][28][29] and a corresponding reduction in the time between major wildfires 30 . Future projections of Australian fire weather are strongly region-and model-dependent, but generally indicate increased severity in southeast Australia [31][32][33][34] .
Estimates of the likelihoods of increased susceptibility to fire from extreme climate and weather are essential for policy makers, contingency planners, and insurers. However, such likelihoods are difficult to quantify from observed records, which are limited to approximately the past century and thus provide few samples of extremely susceptible conditions in a given region 28,35,36 . There was, for example, no direct observational precedent for the high values of FFDI nor the low annual accumulated rainfall total experienced in southeast Australia in 2019 15 . Further, assessment of likelihoods is compounded by nonstationarity in the observed record, resulting, for example, from climate change. Even if past likelihoods could be well determined from the observed record, they may not be representative of current wildfire susceptibility.
Climate models can provide large samples of plausible conditions over short time periods that can be used to reduce uncertainties in quantifying risk. Previous studies have used ensemble seasonal and weather prediction systems to estimate return periods of surge levels in the Netherlands 37,38 and of significant wind and wave heights globally [39][40][41][42] . More recently, the approach of quantifying risks of extremes using ensemble climate simulations has been popularized under the acronym UNSEEN, standing for UNprecedented Simulated Extremes using ENsembles. The UNSEEN approach has been used to assess the risk of droughts and heat waves 43,44 and to quantify likelihoods of extreme meteorological events, such as instances of unprecedented rainfall 45,46 and temperature 47 , and sudden stratospheric warming in the Southern Hemisphere 48 .
In this paper, we use a decadal ensemble climate forecast model to quantify the likelihood of concurrent extreme drought and fire weather in a region of southeast Australia where the 2019-2020 wildfires burned significant area (Fig. 1). We focus on two indices averaged over this region (below, overlines denote regional averaging). Our drought index, DI, is defined as the total accumulated rainfall from January to December and quantifies how preconditioned for fires the landscape may be leading into the wildfire season of a given year. We use the December-average FFDI, FFDI Dec , to quantify how severe fire-weather conditions are near the peak of the fire season of a given year. These indices were selected to capture the unprecedented nature of the drought and fire weather in southeast Australia in 2019. Simultaneous low values of DI and high values of FFDI Dec indicate elevated susceptibility to wildfires in southeast Australia. Therefore, hereafter, we will refer to their vector as simply "fire susceptibility".
Our climate forecast dataset comprises 10-year long, daily forecasts, each with 96 ensemble members, initialized at the beginning of every May and November over the period 2005-2020. By pooling forecast ensemble members and lead times, these forecasts provide up to 1920 times more samples of DI and FFDI Dec in the current climate than are available from observed records ("Methods"). After first checking that these many samples provide accurate and independent representations of the real world ("Model fidelity"), we use them to estimate the likelihoods of exceeding extreme values of DI and FFDI Dec , including the unprecedented values experienced during the 2019-2020 wildfires ("Likelihoods of exceedance"). The very large number of forecast samples yields many years with FFDI Dec and DI that are simultaneously more severe than the observed 2019 values. This enables us to test the correspondence between unprecedented drought and fire weather in southeast Australia and the states of ENSO, IOD and SAM ("Extreme susceptibility to fire and climate drivers").

RESULTS
Historical record of fire susceptibility The historical record of FFDI Dec and DI is shown in Fig. 2. These data are calculated from high quality atmospheric reanalysis and gridded rainfall data ("Methods") and are referred to hereafter as "observations". The data points in Fig. 2 are shaded according to the year for which they are calculated, with colored shading for years in which severe wildfires occurred in summer in southeast Australia.
Severe fires have generally been associated with extreme values of FFDI Dec and DI (Fig. 2). The values of both indices recorded in 2019 were the most extreme in the 63 years (1958-2020) of observational data available for both indices (Fig. 2a). Indeed, there was no precedent for the 2019 DI values in the full 121 years (1900-2020) of data available for this index (see Supplementary  Fig. 1). A large proportion of Australia experienced unprecedented values of December-averaged FFDI in 2019 (Fig. 2b). Similarly, much of eastern and central Australia accumulated the lowest rainfall annually in 2019 relative to all other years in the joint historical record (Fig. 2c). Over much of our region of interest, the minimum DI occurred in 1980 and 1982, which were also years in which unplanned summer fires burned extensive areas of southeast Australia. We can quantify the joint extremity of DI and December-average FFDI in a given year as the normalized distance from the mean index values over 1958-2020: , where primes indicate the difference from the mean index over 1958-2020, normalized by the standard deviation of the index over the same period. Even at a local scale, this joint quantity was unprecedented in 2019 over much of southeast Australia (Fig. 2d).
The small number of samples in the observed record makes quantification of the probabilities of extreme FFDI Dec and DI events very difficult. It is not possible, for example, to directly determine the likelihood of an event more severe than that observed in 2019 simply because no such event has ever been observed. Statistical techniques enable extrapolation of fitted distributions, but require assumptions about the shape of the distribution and still suffer from large uncertainties when the sample size is small [49][50][51] . Issues with sampling become increasingly restrictive as dimensionality-that is, the number of variables -increases 52 . With its very large sample size, our forecast model (hereafter "model") provides many in-sample estimates of rare extreme events, allowing for the probabilities of these events to be determined directly from the empirical probability density function 53 .

Model fidelity
In order to provide reliable estimates of likelihoods of fire susceptibility, the model samples must be stable, independent and realistic estimates of the real world 46 . Stability here refers to an absence of systematic changes in the estimates of FFDI Dec and DI with model lead time. Model stability is necessary for pooling samples at different lead times. Dependence between model samples inflates the sample size without adding new information and arises at short lead times because the ensemble forecasts are initialized from similar initial conditions. We remove dependent samples by considering only model lead times for which ensemble members are uncorrelated (≥37 months: Fig. 3, see also "Methods").
To assess the model fidelity, we compare the modeled joint and marginal distributions of FFDI Dec and DI to the observed distributions over a common time period (Fig. 4). Initial  We therefore also assess the distributions computed from another set of forecast model data, denoted f 1980! 10 mem . These data are produced using the same decadal forecast system and initialization dataset as f 2005! 96 mem . However, they have only 10 ensemble members and they are initialized over the longer period 1980-2020. The latter enables comparison to a much larger set of observed values of FFDI Dec and DI. In Fig. 4d-f we compare the f 1980! 10 mem distributions over the period 1989-2020 (comprising 4480 samples) with observations over the same period ("Methods"). As for the f 2005! 96 mem data, the f 1980! 10 mem data distributions are stable and show good agreement with observations over the matched time period.
We use a two-dimensional, two-sample Kolmogorov-Smirnov (KS) test 54,55 to test the null hypothesis that the modeled and observed joint distributions (gray and white points in Fig. 4) are the same ("Methods"). Proxy time series are generated by randomly subsampling the model data for sets of equal length to the observed record. These sets are compared with the full model distribution to produce a null distribution for the KSstatistic, K. The KS-statistic calculated between the observed and modeled distributions, K obs , is compared with the null distribution. For both f 1980! 10 mem and f 2005! 96 mem , the observed KS-statistic falls below the 95 th percentile of the null distribution (p-value > 0.05) and hence the model is considered to provide values of FFDI Dec and DI that are consistent with the observed record (Fig. 4g). The KS test also confirms consistent modeled and observed distributions when applied using model data at each lead time independently (Fig. 4h), indicating that the independent model samples are both realistic and stable.

Likelihoods of exceedance
In 2019 unprecedented values of FFDI Dec and DI were observed that were respectively 19% higher and 8% lower than previous record values (since 1958). These record values coincided with one of the worst wildfire seasons in recorded history 1-3 . Our model simulations show that the likelihoods of experiencing FFDI Dec or DI values equal to or more extreme than those experienced in 2019 are 7.8% and 1.5%, respectively, in the current climate (2014-2023 comprising 13,440 samples, Fig. 5a, b). The likelihood of exceeding both simultaneously is roughly 0.5%, indicating a return period for the 2019 event of approximately 200 years (Fig. 5c). Note that for DI to be "more extreme" or to "exceed" is to have a lower value, since lower values of DI indicate drier conditions that are more conducive to wildfires.
Values of FFDI Dec and DI substantially more extreme than observed records occur within the model sample (Fig. 5d). This is especially true for FFDI Dec , for which nearly 8% of all model samples are more extreme than the observed 2019 value, with some model realizations up to twice as high. Indeed, there are realizations from the model where FFDI Dec and DI are and for all lead times together (black dashed). For the joint distribution, probability densities are from a twodimensional kernel density estimate and are presented with contour levels at 0.5e-4, 2e-4, and 4e-4 (enclosing approximately 92%, 63%, and 27% of the model data across all lead times). d-f As in a-c, but using f 1980! 10 mem data over the period 1989-2020. g Null distributions of the Kolmogorov-Smirnov statistic, K, resulting from bootstrapping the f 1980! 10 mem (purple shading) and f 2005! 96 mem (pink shading) datasets using all independent lead times. h As in g, but for each lead time separately. In g and h, the distributions are presented as a difference between K and the KS-statistic calculated between the observed and model data, K obs , such that the vertical black line (at K − K obs = 0) indicates the location of K obs in the distributions of K. Colored numbers show the right-tail p-value for the f 2005! 96 mem (pink) and f 1980! 10 mem (purple) distributions. Every sample from our model provides a simulated realization of the earth system, including the ocean and atmosphere. We can quantify for every modeled sample of FFDI Dec and DI the corresponding states of ENSO, IOD and SAM over the period leading into the wildfire season using the Nino 3.4, DMI and SAM I indices (defined in "Methods" and assessed in Supplementary Figs.  2 and 3). The model provides 66 simulated Earths over the period 2014-2023 with values of FFDI Dec and DI worse than the most extreme event in the observed record (2019). Of these samples, approximately 80% are associated with simultaneous positive ENSO, positive IOD and negative SAM states (Fig. 6a). Composites of sea surface temperature and 500 hPa geopotential height anomalies generated from years of unprecedented FFDI Dec and DI Dashed white lines show contours of a two dimensional kernel density estimate using the forecast data (levels at 2e-05, 1.8e-04, 3.4e-04, and 5e-04, enclosing approximately 97%, 68%, 39%, and 13% of the data, respectively). d As in c but showing only values of FFDI Dec and DI that are more extreme than the observed 2019 values. In all panels, likelihoods are calculated using model data over the period 2014-2023. Note that here "exceeding" a value of DI is defined as having a smaller magnitude since lower values of DI are indicative of increased susceptibility to fire.  exhibit the typical patterns associated with these states (Fig. 6b, c). While these results indicate a clear correspondence between extreme susceptibility to fire and the driver states, there is little evidence that the strength of the driver states is related to the joint magnitude of unprecedented FFDI Dec and DI (as quantified by the normalized distance from the mean-see the shading of points in Fig. 6a). That is, the severity of extreme wildfire susceptibility is apparently associated mostly with the concurrency of fire-conducive Nino 3.4, DMI, and SAM I states and not with their individual magnitudes, a characteristic of compound events previously described in frameworks for understanding extreme impacts [62][63][64] .
Conditioning on the states of ENSO, IOD, and SAM has a large impact on the likelihood of simultaneously experiencing unprecedented values of FFDI Dec and DI (Fig. 7). The SAM appears to be the strongest driver; when SAM I is strongly negative (< −1 standard deviation) over spring the likelihood of an unprecedented event is over 5 times higher (nearly 3%) than if the state of the SAM is not considered. Likelihoods are higher still when conditioned on more than one of the drivers being strongly in their fire-conducive phases (up to approximately 4% for strongly positive Nino 3.4 and strongly negative SAM I ). In 2019, both the IOD and SAM were strongly in their fire-conducive phases (Fig. 6a). According to the model, the likelihood under such conditions of the unprecedented values of FFDI Dec and DI in 2019 was approximately 3%.

DISCUSSION
We have used a very large ensemble of climate model simulations to quantify the likelihood of high susceptibility to wildfire from concurrent extreme fire-weather and drought in southeast Australia. Our analysis shows that the likelihood of experiencing the unprecedented conditions that occurred leading into and during the catastrophic 2019-2020 wildfire season was approximately 0.5% in the current climate. Substantially more extreme conditions are also realized by the model, and the impact of such conditions on the severity of wildfires in southeast Australia is a potential area for future research. A very high proportion (~80%) of the model realizations with more extreme fire susceptibility than that observed during 2019 occur when ENSO, IOD and SAM are all in their fire-promoting phases-positive, positive and negative, respectively-during the austral spring and early summer. Accounting for the observed phases and strengths of these climate modes, the likelihood of the fire-weather and drought conditions experienced in 2019 was approximately 3%.
ENSO and IOD are predictable on seasonal timescales, particularly during austral winter and spring when any event has already started to establish itself and persistence plays a first order role in predictability 65,66 . The neutral ENSO and positive IOD conditions experienced during spring (SON) of 2019, for example, were predicted by the Australian Bureau of Meteorology in June 67 and their implications on the fire season were foreshadowed in August 68 . SAM events are generally shorter lived and less predictable, although there is evidence that the significant stratospheric polar vortex weakening in spring 2019 and subsequent development of negative SAM was forecast as early as late July 58 . Thus the corresponding quantitative increases in likelihoods of extreme fire susceptibility demonstrated in this paper were potentially predictable months in advance of the peak in the 2019-2020 fire season.
The work in this paper builds on a growing area of research using climate simulations to assess and explore extreme events. There is enormous value to policy planners and decision makers in the ability to quantify the probability of impactful climate and weather events, particularly those that are unprecedented in the observed record. However, the use of climate models to quantify real-world risk is not without its difficulties. Foremost, the accuracy of estimates of probabilities is dependent entirely on the climate model's ability to realistically represent the full range of plausible states that could be experienced in the real world. This is inherently very difficult to test because limited observed records provide very few samples of real world states and one is left trying to verify model states that have never been observed. We designed a statistical test to check for consistency between our modeled and observed indices and applied this in a way to maximize the number of observations in the test period. However, our test still suffers from small numbers of observations, particularly of extreme events. Thus, there is still some inherent reliance on the model's ability to simulate the indices used.
Ideally, a climate model should be able to represent the real world without any correction, and indeed this can be the case for specific models simulating specific variables in specific regions 45,69 . But more generally, climate models have systematic biases that must be accounted for prior to their use. In this study we used relatively low resolution climate simulations because of the uniquely large number of realizations they provide of the current climate, enabling sufficient samples to empirically deduce likelihoods of rare concurrent extremes. Our model required minimal adjustment-only bias correction of the mean DI-to produce joint distributions of FFDI Dec and DI that are statistically consistent with the real world over our region of focus. However, this is not necessarily the case for other indices and/or regions. Where more sophisticated calibrations are necessary, one must be careful not to overly constrain the model to observations, since this could negate the purpose of using the model in the first place (to provide plausible, yet unobserved, realizations of the earth system). Ongoing improvements to climate models and their resolutions will likely reduce model biases, allowing analyses like ours to be applied more broadly and with more confidence.
Our focus here has been on susceptibility to wildfire in the current climate (2014-2023) because of our focus on the 2019-2020 wildfire season. It is important to note, however, that occurrences of extreme drought and fire weather are expected to increase over the coming century due to anthropogenic influences. Projections indicate that winter and spring rainfall over Australia's eastern seaboard will decrease 70 and that drought duration and frequency across southern Australia are likely to increase 71 . Increases in mean and extreme temperatures this century are virtually certain 70 and likely to contribute to increases in the number and severity of dangerous fire weather events 15,[31][32][33][34] and drier, more volatile, fuel loads 72 . Indeed, some studies suggest that temperature may play an increasing role over precipitation in global fire occurrence over the next century 18,73 . Exactly how these changes will impact wildfire risk is a potential area for future research. The burnt area data used in Fig. 2 are calculated from New South Wales National Parks and Wildlife Service Fire History data 75 . These data are provided as polygons of burnt areas of wildfires and prescribed burns, sometimes including associated start and/or end dates, over the period 01/ 01/1920-18/02/2021. The polygon data are converted to a gridded product with 0.05 o resolution in latitude and longitude. We consider only wildfire data and exclude data that have start or end dates that do not fall within 28 days of December or do not span a period encompassing December. The regional burnt areas used in Fig. 2 are calculated by summing the 0.05 o resolution data over the region shown in Fig. 1.

Calculation of forest fire and drought indices
The daily McArthur Forest Fire Danger Index 76,77 , FFDI, is defined here as: FFDI ¼ D 0:987 expð0:0338T À 0:0345H þ 0:0234W þ 0:243147Þ; (1) where T ( o C) is the maximum daily temperature; H (%) is the daily average relative humidity at 1000 hPa; W (km/h) is the daily average 10m wind speed; and D is the rolling 20-day total precipitation scaled to range between 0 and 10, with larger D for lower precipitation totals. Note that this formulation differs from standard formulations of the FFDI 76,77 in a number of ways, principally in its use of daily average humidity and wind speed. These changes were necessitated by the data that were available to us across the various datasets used in this paper. FFDI estimates herein are likely to be attenuated relative to the standard formulations as a result of these differences. We calculate the FFDI from equation (1) from both the forecast model data (see "Use of forecast model data") and the Japanese 55-year reanalysis (JRA-55) 78,79 which spans 1958-2020. For the latter, the 1.25 o resolution fields of the individual components in equation (1) are first interpolated linearly to the forecast model grid (the only exception to this is in Fig. 2b, d, where the FFDI is presented at the JRA-55 grid resolution). For both the forecast and JRA-55 data, the regional December-averaged FFDI, FFDI Dec , is calculated for a given year by averaging all daily December values of FFDI over the four model grid cells in Fig. 1.
The Drought Index, DI, of a given year is defined as the accumulated total precipitation (mm) between January and December (both inclusive) of that year. Thus, lower values of DI indicate drier conditions. We calculate DI using the daily forecast data and using data from the Australian Gridded Climate Dataset (AGCD), which provides interpolated in situ observations on a 0.05 o × 0.05 o grid over the period 1900-2020 80,81 . AGCD v1 data are used for the period 1900-2018 and AGCD v2 data are used for the period 2019-2020. In Fig. 2c, we show DI calculated from AGCD at the native grid resolution. We also spatially average DI calculated from AGCD to the JRA-55 grid resolution in order to generate the combined index presented in Fig. 2d. Everywhere else, we focus in this paper on the regional Drought Index, DI, which is defined for each dataset as the average DI over the region in Fig. 1.
The JRA-55 and AGCD data provide joint historical records of FFDI Dec and DI spanning 1958-2020 and are referred to herein as "observations". By pooling forecast ensemble members and lead times, the forecast model provides many estimates of plausible values of FFDI Dec and DI for every available forecast year.

Use of forecast model data
We use the Commonwealth Science and Industrial Research Organization (CSIRO) Climate Analysis Forecast Ensemble (CAFE) near-term climate prediction system to produce many simulations of contemporary climate. The system uses the Geophysical Fluid Dynamics Laboratory Coupled Model version 2.1 82 , with an upgraded oceanic component (MOM5.1) and an atmospheric model resolution of 2 o in latitude and 2. 5 o in longitude 83 . Retrospective forecasts were run from initial conditions taken from the CAFE60v1 reanalysis 84,85 , which provides an ensemble of estimates of the states of the atmosphere, ocean, land and sea-ice over the period 1960-2020 using the same underlying model as the forecasts.
The retrospective climate forecasts provide a very large sample of possible values of FFDI Dec and DI under contemporaneous anthropogenic and natural forcings that enable the likelihoods of exceeding rare events (like those in 2019) to be estimated empirically. Our methodology is similar to those introduced in previous papers using the UNSEEN approach 46 : 1. Remove model samples at short lead times where there is dependence between forecast ensemble members due to their similar initial conditions (see "Testing of ensemble member independence"). Dependence between samples artificially inflates the sample size without adding new information. 2. Test that the model provides stable (with lead time) and realistic estimates of FFDI Dec and DI. We apply a simple bias correction to the modeled DI (see "Bias correction") and check that the joint distributions of modeled FFDI Dec and DI are consistent with the observed record (see "Testing of model fidelity"). When using forecast data for the purpose of providing multiple realizations of a given time period, it is important that equal numbers of samples are included for each year in the period, since this avoids over/ under-sampling the conditions of a particular year. Figure 8 shows the number of samples per calendar year for the f 1980! 10 mem (pink labels) and f 2005! 96 mem (purple labels) datasets after removing lead times with dependent ensemble members. Fewer samples are available for calendar forecast years toward the start and end of each forecast period because these years have fewer lead times available. For both FFDI Dec and DI, we define the lead time of a given forecast as the number of elapsed months between initialization and December of the forecast year (e.g., the 2020 forecast initialised in Nov 2020 is at 1-month lead). Note that no DI forecast is available at 1-month lead, since this index requires the accumulation of rainfall from January to December. Thus, the shortest available lead time for DI is 13 months.

Testing of ensemble member independence
Each multi-member forecast is initialised from a set of initial conditions that seek to estimate the state of the climate at the time of initialization and the uncertainty about that state. As such, ensemble members of a given forecast at short lead times are strongly dependent on each other. Inclusion of dependent ensemble members in our analysis results in D.T. Squire et al.
artificial inflation of the sample size, without adding new information [39][40][41]46 . To determine the lead time at which the ensemble members can be considered independent, we apply a simple statistical test that the correlation between ensemble members at a given lead time is zero.
At Significance of ρ t is estimated using a permutation test, whereby 10,000 sets of 96 × 16 points are randomly drawn from the complete f 2005! 96 mem model dataset to produce 10,000 estimates of the mean Spearman correlation for each variable in the same manner as above. Because these estimates are constructed from randomly drawn data, they represent the distribution of mean correlation values for uncorrelated data (i.e., the null distribution). Ensemble members of each variable are considered to be dependent (i.e., the null hypothesis of independence is rejected) at a given lead time if ρ t falls outside of confidence intervals calculated from the randomly sampled distribution using a 5% significance level (Fig. 3). The test is very similar to that described in previous studies 46 , however here we test only the mean correlation over combinations of ensemble members rather than all boxand-whisker statistics. Our approach reduces the number of simultaneous tests and the associated issues with multiple testing 87 . Note that nonzero correlation between ensemble members at a given lead time can also come about because they share the same time-varying forcing. However, correlation from shared forcing is not a valid reason to reject ensemble members. Applying the same test as described above, but removing the ensemble mean temporal trend from each ensemble member prior to calculating ρ t produces negligible changes to Fig. 3.

Bias correction
All climate models have systematic biases relative to the real world. There is a very large range of existing methods, of varying levels of complexity, for correcting for climate model biases [88][89][90] . Generally, these methods involve building a transfer function between the distributions of observed and modeled variables over a particular period of time. All such methods include potentially ad hoc assumptions regarding, for example, the shape and stationarity of the observed/modeled distributions. In the present analysis, we seek to use our forecast model to learn about events that are unprecedented in the historical record and therefore have no observations to constrain their correction. Our approach to model correction is to find the simplest justifiable method that produces model distributions that are statistically consistent with the limited historical record. In doing so, we minimize the extent to which the forecast model data are manipulated, and thus rely as much as possible on the ability of the model to simulate the range of contemporaneous climate conditions.
It is necessary to bias-correct the forecast DI to ensure that the simulated joint distribution of FFDI Dec and DI is consistent with the real world. For each forecast lead time, we estimate the mean DI bias as the difference between the mean f 1980! 10 mem and observed DI over the period 1990-2020. These biases (which range between −141 mm and −68 mm, depending on the lead time) are subtracted from both the f 1980! 10 mem and f 2005! 96 mem forecasts to produce unbiased estimates of DI. No bias correction is necessary for FFDI Dec . Note that the f 1980! 10 mem model dataset is used here so that the biases can be estimated using a relatively long time period (31 years). The f 2005! 96 mem model dataset is initialised over a shorter period (2005-2020), so provides, for example, only seven years of data at 115 months lead (2014-2020) that could be compared with observations to estimate biases (see Fig. 8).

Testing of model fidelity
We test the ability of our forecast model to simulate the real world by comparing the forecast and observed distributions of FFDI Dec and DI over a common period of time. Previous studies assessing likelihoods of extremes using forecast ensembles have tested that the observed mean, standard deviation, skewness and kurtosis of the variable in question falls within 95% confidence intervals from bootstrapped distributions of each statistic computed from the forecast model [44][45][46][47][48]69 . We apply a different test for two reasons. First, our focus in this paper is on compound events and thus we seek to assess the fidelity of our model in simulating the joint distributions of FFDI Dec and DI. Second, because the approach of previous studies simultaneously tests multiple statistics, each with their own statistical significance, it suffers from issues with multiple testing 87 . Indeed, Monte-Carlo simulations applying the above test to samples and bootstrapped distributions drawn from the same Gaussian population show that the rejection rate is approximately 18% (not 5%), with little dependence on sample size. For two variables, the rejection rate is higher still.
For these reasons, we instead apply a two-dimensional Kolmogorov-Smirnov (KS) test 54,55 to compare the joint distributions of FFDI Dec and DI. The f 2005! 96 mem model dataset provides 96 (member) forecasts for the 7-year period 2014-2020 at all independent lead times (see Fig. 8). We calculate the two-dimensional KS statistic between the observed and forecast distributions, K obs , using all data in this period. To derive a p-value for this statistic, we bootstrap 10,000 7-year pseudo-timeseries of FFDI Dec and DI from all forecasts that fall within the same period. For each 96 mem (f 1980! 10 mem ) dataset and the numbers in or above each bar show the lead times that are available for each year (defined as the number of elapsed months between initialization and December of the forecast year). Where a range is specified, lead times are available in increments of 6 months.
bootstrapped sample, we calculate the two-dimensional KS statistic, K, relative to the full set of forecasts within the period, thus providing the null distribution for our KS test. If K obs falls below the 95 th percentile of the null distribution-i.e., the right-tail p-value is greater than 0.05-we cannot reject the null hypothesis that the joint distributions are the same. In this case, we consider that our forecast model provides a good representation of plausible values of FFDI Dec and DI.
The results of the two-dimensional KS test are shown for the biascorrected f 2005! 96 mem model data in Fig. 4 (pink shading). We run the test for all lead times together (Fig. 4g) and for each lead month separately (Fig. 4h). In the latter case, the period of time over which the test is applied is adjusted to maximize the number of observed points in the comparison.
For example, f 2005! 96 mem forecasts at 37-months lead span 2008-2020 (see Fig. 8) and thus the test at 37 months lead is applied over this period. We also apply the same KS test to the bias-corrected f 1980! 10 mem data, which span a longer period of time and hence allow for comparison to a larger sample of observations ( Fig. 4g and h, purple shading). For the f 1980! 10 mem data, all KS tests are applied over the period 1989-2020.

Calculation of likelihoods of exceedance
Likelihoods of exceeding a given event are calculated from the empirical probability distribution as the proportion of total f 2005! 96 mem forecast samples that are more extreme than the event in question. For example, Fig. 5a, b, respectively, show probabilities P(FFDI Dec > FFDI Dec,i ) and P(DI < DI i ) for every sample i of the 13,440 samples in the f 2005! 96 mem dataset over 2014-2023, and Fig. 5c similarly show P(FFDI Dec > FFDI Dec,i ∧ DI < DI i ). In the calculation of likelihoods of exceedance, we limit ourselves to the 10 year period, 2014-2023, since all independent lead times are available from the model for these years (Fig. 8).
Likelihood confidence bounds in Figs. 5 and 7 are constructed by repeatedly bootstrapping the set of FFDI Dec and DI values used to calculate the likelihood in question and recomputing the likelihood for each boostrapped sample to produce 10,000 resampled estimates of the likelihoods of exceedance. These resampled likelihoods are used to calculate the 2.5-97.5% percentile ranges shown in the figures. In Fig. 5c, d, the likelihoods of exceedance are interpolated onto a regular grid using the 'griddata' routine in the Python Scipy library.

Calculation of climate driver indices
We employ three simple indices for climate modes that impact Australia. To assess the strength and phase of the El Niño Southern Oscillation (ENSO), we use the Nino 3.4 index 91 , which is the average sea-surface temperature (SST) anomaly over the region 5 o N-5 o S, 120 o -170 o W. The Indian Ocean Dipole is quantified using the Dipole Mode Index, DMI 92 , which is the difference between the average SST anomalies over western (10 o  The climate mode indices are computed from the forecast model data and from reanalysis data: Hadley Centre Global Sea Ice and Sea Surface Temperature (HadISST1) 94 for Nino 3.4 and DMI; and JRA-55 for SAM I . Anomalies are computed relative to the climatological average over the period 1990-2020. In cases where forecast model data are used, a separate climatological average is constructed using the f 1980! 10 mem dataset and removed for each forecast lead time. This removes from the forecasts the mean model biases over the reference period at each lead time. We focus on the average Nino 3.4 and SAM I over September, October, November and December (SOND), and on the average DMI over September, October and November (SON), corresponding to when each index has its strongest influence on precipitation and FFDI in southeast Australia 15 . The fidelity of the model climate driver indices relative to observations and their relationships to FFDI Dec and DI are assessed in Supplementary Figs. 2 and 3.

DATA AVAILABILITY
FireCCI v5.1 and C3S v1.0 fire burned area data are openly available from the Copernicus Climate Data Store at https://doi.org/10.24381/cds.f333cf85. New South Wales National Parks and Wildlife Service Fire History data are available for download at https://data.nsw.gov.au/data/dataset/fire-history-wildfires-and-prescribed-burns-1e8b6. JRA-55 data are available from the University Corporation for Atmospheric Research Research Data Archive at https://doi.org/10.5065/D6HH6H41. HadISST1 data are available from the Met Office Hadley Centre at https://hadleyserver.metoffice.gov. uk/hadisst/data/download.html. AGCD v1 data are accessible from Australia's National Computational Infrastructure Data Catalogue at https://doi.org/10.25914/ 6009600b58196. AGCD v2 data are not publicly available, with access details provided by the Bureau of Meteorology at http://www.bom.gov.au/climate/ austmaps/metadata-monthly-rainfall.shtml. The Natural Earth data used to generate the shading in the inset of Fig. 1 are in the public domain and are available at https:// www.naturalearthdata.com/. The forecast datasets are not yet available publicly but are available from the corresponding author upon request, bearing in mind that these datasets comprise hundreds of terabytes of data. Postprocessed versions of all the variables and indices presented in this paper are available from the corresponding author upon request.