Introduction

As of January 2021, Australia had effectively controlled COVID-19 transmission, experiencing prolonged intervals of local elimination throughout 2020 despite a steady influx of imported cases, which have been effectively managed by Australia’s hotel quarantine system1. Scenario modelling conducted in February 2020 informed Australia’s rapid and strong initial response2, which was sufficient to control the import-driven first wave3. In particular, Australia’s definitive border measures, including mandatory 14-day hotel quarantine on arrival from overseas, was effective in limiting community exposure from infected international arrivals.

Nonetheless, in May 2020 a breakdown in infection prevention and control measures in the state of Victoria’s hotel quarantine system resulted in community exposure that developed into a second wave4,5. COVID-19 (ancestral strain) case incidence peaked at 450 locally-acquired infections per day in the last week of July 20205,6. This was substantially higher than the peak of 131 daily cases in the import-driven first wave. Despite the imposition of stringent inter-state movement restrictions, cross-border importation of cases into the state of New South Wales led to a long period of constrained community transmission in that state throughout 2020, kept in check by proactive public health case finding and quarantine without imposition of stringent social measures7. In Victoria, a combination of case finding, contact tracing, and prolonged restrictions on movement and gathering sizes were required to bring the second wave to an end in October 20205,8. Subsequent border incursions in various states and territories were associated with small clusters and in some cases localised community transmission, prompting the imposition of strong public health responses supported by variably stringent social restrictions over days and weeks. The daily incidence of locally-acquired COVID-19 cases in each jurisdiction over the study period is shown in Fig. 1.

The public health response to the Victorian second wave and outbreaks in other Australian jurisdictions was informed by the results of real-time analytics and an ensemble forecast of COVID-19 activity for each Australian state and territory, presented to key government advisory committees in weekly situational reports. Similar analytics and forecasts have supported public health responses around the world9,10,11,12,13. Here we detail one of the forecasting models included in the ensemble forecast, which was adapted from Australian seasonal influenza forecasts14,15,16 that we have deployed in near-real-time in collaboration with public health colleagues since 201517,18. We evaluate the model’s performance by measuring forecast skill relative to an historical benchmark forecast over the study period (May to October 2020) and demonstrate how these outputs supported public health responses during that period. In addition to using the model to predict future rates of locally-acquired COVID-19 cases in each Australian jurisdiction, we demonstrate how our methods were adapted and improved in response to the needs of government, in order to most effectively support the public health decision-making process.

We generated forecasts by combining a stochastic SEIR-type (Susceptible, Exposed, Infectious, Recovered) compartment model with daily COVID-19 case counts through the use of a bootstrap particle filter16. The forecasting model incorporated real-time estimates of the effective reproduction number \(R_\mathrm{eff}\) for the cohort of active cases (who may not be representative of the whole population), and the transmission potential (TP) averaged over the whole population (characterising the potential for wide-spread transmission)7.

In brief, \(R_\mathrm{eff}\) is the average number of infections caused by an active case in a population (where some people may be protected against infection) and reflects the transmission characteristics of active cases in that population. In contrast, TP characterises the expected number of infections caused by an average person in a population (even when there are no active cases in that population) and reflects whole-population behaviours. These TP estimates were informed by nationwide behavioural surveys and by mobility data from technology companies7,19.

We used the active-case \(R_\mathrm{eff}\) to characterise local transmission at the time of forecast, and assumed that transmission from future active cases would gradually become more representative of the whole population. That is, local transmission in the forecasting model gradually shifted from the active-case \(R_\mathrm{eff}\) to the whole-population transmission potential (TP) over the forecast horizon. We assess the impact of this assumption about active case heterogeneity by comparing the forecast skill against that of a separate suite of forecasts where local transmission was maintained at the current active-case \(R_\mathrm{eff}\).

One of the most significant challenges in using predictive models to support public health decision-making in real time is understanding how much confidence to place in the predictions20,21,22. We address the question “When should we trust these forecasts?” by showing how forecast skill can be assessed in real-time before the ground truth is known.

Figure 1
figure 1

Daily incidence of locally-acquired COVID-19 cases for each jurisdiction, as reported after the end of the study period (see "COVID-19 surveillance data" section). Most jurisdictions experienced a first wave in March and April, comprised largely of imported cases. From June to October, Victoria (VIC) experienced a large second wave and New South Wales (NSW) experienced a prolonged low-amplitude small second wave. Other jurisdictions are: Australian Capital Territory (ACT), Northern Territory (NT), Queensland (QLD), South Australia (SA), Tasmania (TAS), and Western Australia (WA). Vertical lines indicate the study period (1 May to 31 October). Note the different vertical scales for New South Wales and Victoria.

Results

At each week in the study period (May to October 2020) we produced forecasts for each jurisdiction that predicted daily COVID-19 case counts in that jurisdiction over the next four weeks. Note that each forecast was conditioned on the expected case counts by symptom onset date, after accounting for delayed case ascertainment (see "COVID-19 surveillance data" and "Forecasting COVID-19 case numbers in the community" sections  for methodological details).

We first examine how well these forecasts characterised future case incidence during the onset of the second wave in Victoria (In "Onset of the second wave in Victoria" section). Over this period, the timeliness and completeness of the near-real-time case data varied substantially, and we show how they impacted the forecast predictions.

We then demonstrate how we adapted the forecasts to support the gradual easing of restrictions in Victoria, as daily case incidence decreased in September and October but continued to persist at low levels (In "Likelihood of a large outbreak" section). Policy changes during this period were subject to reaching specific 14-day moving average case thresholds.

Finally, we present an analysis of the forecast performance across all Australian jurisdictions, in which we measure how accurately our forecasts predicted the future case numbers, relative to an historical benchmark forecast (In "Forecast skill" section). We calculated skill scores for our forecasts, where negative values indicate that the forecasts performed worse than the historical benchmark, zero indicates that the forecasts performed as well as the historical benchmark, and positive values indicate that the forecasts performed better than the historical benchmark. The maximum possible skill score is one, which indicates that a forecast yielded perfect predictions with no uncertainty. We identify the circumstances under which forecast skill was highest, evaluate the impact of key model features on forecast skill, and conclude by considering how to assess forecast skill in real-time.

Onset of the second wave in Victoria

Throughout May 2020, Victorian case incidence remained low and relatively stable (5.6 cases per day, on average). With such limited chains of local transmission, forecasts generated in May considered local extinction to be the most likely outcome and did not include the possibility of a large outbreak. These predictions were consistent with the findings of a subsequent genomic study, in which the month of May 2020 was characterised by near elimination of COVID-19 in Victoria23. However, a new cluster of local cases emerged at the end of May23, and by late June more widespread epidemic activity was clearly established, with more than 60 cases per day (Fig. 1).

Here we focus on the following key questions: “When did the forecasts first consider a large outbreak was possible?”; “When did the forecasts first consider a large outbreak was likely?”; and “How did the reported data available at each week influence these predictions?”.

Likelihood of a large outbreak

The forecast generated on 24 June was the first to indicate that a large outbreak might occur. The 95% credible interval included trajectories with increasing case counts, suggesting that there was least a 1-in-40 chance of a sustained outbreak, and the lower bounds of each credible interval was greater than zero. Forecasts generated on this date, and for the subsequent six weeks, are shown in Fig. 2, and a qualitative summary of each forecast is provided in Table 1.

The forecast generated on 1 July was the first to demonstrate confidence that a substantial outbreak would occur (the 50% credible interval included trajectories with increasing case counts). Its median trajectory exceeded 100 cases per day, the 50% credible interval (CrI) lower bound remained above 10 cases per day until mid-August, and the 50% CrI upper bound was in good agreement with the future case counts over the entire forecast horizon. As local transmission became more established, subsequent forecasts yielded increasingly confident and accurate predictions of the size and timing of this second wave. The 8 July forecast was a marked outlier from the other forecasts over this period, and we address this forecast separately in the next section.

Despite the 50% CrIs being reasonably narrow, the 95% CrIs remained very broad in all of these forecasts. The 95% CrI lower bounds decreased to zero cases per day roughly 2 weeks after the forecast date, and their upper bounds exceeded 1,000 cases per day. This diversity of epidemic trajectories (from the stochastic SEEIIR compartment model) illustrates the inherent uncertainty at the time of each forecast about the future course of this second wave, and what impact the imposed public health and social measures might have on local transmission over the forecast horizon.

Figure 2
figure 2

Forecasts of daily COVID-19 case incidence for Victoria, shown separately for each data date (right-hand axis labels). Each forecast is shown as a spread of credible intervals (blue shaded regions). Vertical dashed lines indicate the data date (i.e., when the forecast was generated). Black lines show the daily case counts at the time of forecasting, after accounting for delayed ascertainment, and dark yellow lines show the future case counts. Note that the daily case counts at the time of forecasting (black lines) end several days prior to the data date; see "Competing models" section for details. The y-axis is truncated at 800 cases; due to the potential for exponential growth in model cases, the upper bounds of the 95% credible intervals reached as high as 45,000 cases per day at the end of the 22 July forecast horizon.

Table 1 Qualitative summaries of the forecast credible intervals (CrIs) during the onset of the second wave, where “Date” is the date on which each forecast was generated.

The impact of near-real-time data

A natural consequence of working with near-real-time surveillance data was the challenge of interpreting the reported numbers of locally-acquired cases for recent dates, given the delay between symptom onset and cases being ascertained and reported. Fig. 3 shows the daily case counts reported at each data date, and the expected daily case counts after accounting for delayed ascertainment7 (see "Forecasting COVID-19 case numbers in the community" section for methodological details). Ascertainment delays are evident in the data reported on and after 24 June (i.e., once there was an indication of established local transmission) and became more pronounced as local case incidence increased. The most recent data in each extract (dashed lines) systematically undershoot the corresponding case counts in subsequent data extracts.

The forecast generated on 8 July is a particularly notable example of how data timeliness and completeness can impact model predictions. Its 50% CrI is much narrower than those of other forecasts and trends downward from the most recent observation (30 June), undershooting every future daily case count, even those before the data date itself. This forecast was influenced by an apparent decrease in daily cases reported over 25–30 June (Fig. 3), even after applying the modelled correction for delays between symptom onset and case ascertainment (based on data from earlier data extracts), and might have revealed an opportunity for achieving elimination. However, case counts for these dates were substantially higher in subsequent data extracts. While there were a number of local clusters in Victoria around this time, including in multiple schools and public housing sites24,25, to the best of our knowledge this ascertainment delay was not linked to any specific cluster. A plausible explanation is that unprecedented case numbers impacted various aspects of the public health response (e.g., laboratory testing, contact tracing, reporting). The Victorian Government subsequently received additional support from Australian Defence Force personnel26.

In contrast, the forecast generated on 5 August is in good agreement with the data, and its 50% CrI includes all of the future case numbers. Note the raw data available at the time of forecast suggested an epidemic in decline, but incidence remained near-maximal for the next two weeks and this was captured by the forecast.

These observations highlight the importance and the challenge of understanding the meaning and limitations of the available data in near-real-time. While correcting for delays (right censoring) is necessary, such corrections are not static through time. It is fundamentally difficult to estimate the completeness of these data in rapidly-evolving situations, such as during rapid epidemic growth, where bottlenecks and delays are not directly observable and are subject to change. Active communication with those responsible for the data is crucial to ensuring that the data and the resulting forecasts are interpreted appropriately, particularly when there are changes in surveillance, testing, and/or reporting.

Figure 3
figure 3

Daily incidence of COVID-19 cases for Victoria from mid-May until the end of June, shown separately for each data date (right-hand axis labels). Each data extract updated the entire time-series. Forecasts were conditioned on the expected case counts after accounting for delayed ascertainment (solid black lines); the reported case counts in the data extracts are also shown (dashed black lines). Updated case counts, as reported on 12 August, are shown as grey columns.

Policy context: easing restrictions in Victoria

In response to the imposed restrictions (“stage 3” and “stage 4”), daily case numbers in Victoria peaked in August and steadily decreased thereafter5,7. By early September the Victorian government had announced a plan to gradually ease these restrictions over a period spanning mid-September to late November, conditional on achieving specific case thresholds over 14-day windows ahead of each planned date for policy changes27. However, similar to the months of May and June, local cases remained at low but persistent levels in the second half of September and into October (see Fig. 4). The city of Melbourne had been under intense mobility and gathering restrictions (so-called “lockdown”) since 7 July, and the growing need to ease these restrictions had to be carefully balanced against the risk of resurgence. We used our forecasts to predict the probability that the 14-day moving average for incident cases would achieve these specific thresholds at each day of the forecasting period (see "Meeting case-number thresholds that trigger policy changes" section for methodological details).

Each of the four target case thresholds was achieved in the months of September and October 2020 (see Table 2). For each case threshold, we inspected the forecasts made in the weeks prior to that threshold being achieved, to see how far in advance the forecasts suggested that these thresholds would be achieved. The forecasts predicted a greater than 50% chance of achieving each threshold three to four weeks in advance (see Table 2 and Fig. 5). For example, the \(\le 50\) cases threshold was achieved on 13 September. Twenty five days earlier, the forecast generated on 19 August predicted a 46.8% chance of achieving this threshold by 13 September. The next forecast (generated on 26 August) predicted a 63.5% chance of achieving this threshold by 13 September.

The one exception was the 10-case threshold, which was reached on 9 October. In the weeks prior to this date, daily case counts fluctuated around 10 cases per day (see Fig. 4). Three weeks prior to this threshold being reached, the forecasts predicted only a 23% chance of reaching this threshold by 9 October. By the following week — two weeks in advance — the forecasts predicted a 68% chance of reaching this threshold by 9 October.

Figure 4
figure 4

Daily incidence of COVID-19 cases for Victoria in October, shown separately for each data date (right-hand axis labels). Forecasts were conditioned on the expected case counts after accounting for delayed ascertainment (solid black lines); the reported case counts in the data extracts are also shown (dashed black lines). Updated case counts, as reported after the study period, are shown as grey columns.

Table 2 Forecast predictions concerning the 14-day moving average case thresholds.
Figure 5
figure 5

Time series showing the projected probability of reaching each of the 14-day moving average case thresholds, shown separately for each data date (right-hand axis labels). Vertical dashed lines show when each threshold was first reached (with respect to symptom onset date, not reporting date). The black points show the projected probability of having achieved each threshold on the day that it was reached.

Forecast skill

Having described the behaviour and qualitative performance of our forecasts during periods of increasing, decreasing, and low levels of local transmission, we now consider their performance more precisely, through skill scores. We measured forecast skill relative to an historical benchmark, described in "Forecast targets and skill" section, which assumed that future case counts would follow the same distribution as the case counts observed to date. This provides a quantitative performance evaluation and helps to address a critical question when using forecasts to support public health decision-making: “When should we trust these forecasts?”.

Forecast uncertainty

To quantify forecast uncertainty we used forecast sharpness, a measure of the breadth of forecast outcomes (e.g., a sharpness of zero means there is a single possible outcome). For jurisdictions that reported substantial COVID-19 activity, as the lead time increased (i.e., looking further into the future) the forecasts exhibited little increase in uncertainty, or even grew more certain (Sect. S1). This trend reflects, in part, that as local epidemic activity increased there was more information available to inform the forecasts and constrain the range of plausible outcomes. During the second wave in Victoria, strong public health measures and social restrictions markedly reduced the TP, and this caused the forecasts to become sharper at longer lead times (Fig. S2). In contrast, assuming that transmission would be sustained at the current active-case \(R_\mathrm{eff}\) caused the forecasts to become more uncertain at longer lead times (Fig. S3). For jurisdictions that primarily reported zero cases on a given day, the forecast trajectories tended to local extinction, and forecast sharpness was close to zero (i.e., minimal uncertainty) for all lead times.

Forecast bias and case heterogeneity

The forecasts tended to under-estimate future case counts (negative bias, see Sect. S2). For jurisdictions which had achieved local elimination, such as South Australia in May, this reflects a deliberate model feature: our careful separation of the potential for reintroduction from the consequence of reintroduction. Without evidence of reintroduction into these jurisdictions, the forecasts did not admit the possibility of subsequent epidemic activity (see "Accounting for observed reintroduction" section) and predicted there would be zero reported cases each day. On the rare occasions where a jurisdiction subsequently reported one or two isolated cases (see Fig.  1) this resulted in a negative bias (under-estimation).

For Victoria, which experienced substantial COVID-19 activity during the study period, model assumptions about local transmission contributed to the underestimation of future case counts. Recall that the model used the active-case \(R_\mathrm{eff}\) to characterise local transmission at the time of forecast, with transmission reverting to the transmission potential (TP) over time (see "Forecasting COVID-19 case numbers in the community" section for details; example \(R_\mathrm{eff}\) trajectories are shown in Sect. S4). However the second wave in Victoria was dominated by localised outbreaks in health and aged care settings, and other essential services, where public health and social measures to constrain transmission had less impact than in the general population5. Over a 5-month period, the \(R_\mathrm{eff}\) was systematically greater than the TP, as described and evaluated in Golding et al.7. Accordingly, as shown in Fig. 2, the majority of the Victorian forecasts somewhat under-estimated future case counts. Despite this negative bias, the model consistently out-performed forecasts where local transmission was instead maintained at the current active-case \(R_\mathrm{eff}\) (see Sect. S3.1). Noting that \(R_\mathrm{eff}\) was consistently higher than the TP, this highlights the importance (and challenge) of accounting for heterogeneity of transmission in diverse settings and population sub-groups.

Forecast skill and the historical benchmark

Of the five jurisdictions that primarily reported zero cases on a given day, the forecasts consistently out-performed the historical benchmark for all lead times in the Australian Capital Territory and Western Australia, and out-performed the historical benchmark for 1-week lead times in South Australia and Tasmania. The historical benchmark out-performed the forecasts for all lead times in the Northern Territory, where the past case counts were extremely good predictors of the future case counts. For Queensland, which saw a sustained period of 1–5 cases per day (July to September 2020) the forecasts out-performed the historical benchmark for lead times of up to 2 weeks. For longer lead times the forecasts tended to include the possibility of a sustained increase in cases, which was inconsistent with the ground truth and reduced their skill relative to the historical benchmark.

For jurisdictions that experienced substantial COVID-19 activity (New South Wales, Victoria) the forecasts exhibited improved performance relative to the historical benchmark for shorter lead times (Sect. S3.3). For Victoria, which experienced the largest local outbreak, the forecasts substantially out-performed the historical benchmark for lead times of up to 4 weeks (Sect. S3.4).

Forecast skill across all jurisdictions was highest when daily case numbers were zero or at least ten (Sect. S3.1). The forecasts also out-performed the historical benchmark when there were 1–9 daily cases, but to a lesser degree. Across all jurisdictions, 1–9 daily cases comprised 18% of the daily observations; 4% in the state of Victoria, the other 14% in jurisdictions that did not experience sustained epidemic activity. These are the circumstances under which there is great uncertainty as to whether local transmission will become established and drive exponential growth in case numbers. The model forecasts began to include the possibility of a large outbreak in these circumstances (e.g., see the Victorian forecast for 24 June in Fig. 2) and many trajectories substantially exceeded the future case counts, since enacted control measures regularly curtailed transmission.

In contrast, the historical benchmark predictions were defined by the daily case counts reported to date and yielded much narrower predictions than the forecast model in these circumstances. The majority of the 1–9 daily case observations occurred in jurisdictions that did not experience sustained epidemic activity, for which the past observations were good predictors of the future case counts, and the historical benchmark performed well. For Victoria, the historical benchmark predictions substantially underestimated the future case counts in the second wave, and did not include the ground truth as a possibility.

Of course, when considering the model forecasts as a risk evaluation tool, their ability to include the potential for large outbreaks in their projections is a major advantage over the historical benchmark, as is their ability to predict peak timing and size.

Forecasting in near-real-time

We highlighted some challenges of using near-real-time data in  "Onset of the second wave in Victoria" section. By accounting for delayed ascertainment in the reported case counts (depicted in Figs. 3 and 4) we extracted additional information that, on average, increased the forecast skill (Sect. S3.2), even though it slightly decreased the forecast skill for days where 1–9 cases were reported (Sect. S3.1). Forecast performance was particularly sensitive to the imputed case counts when the reported case counts were low and a difference of, say, 1 case per day represented a large relative change (see Figs. 3 and 4 for examples).

When considering forecast skill as a function of the number of cases ultimately recorded for each day, as above, the results can only be evaluated in retrospect, since these numbers are unknown when the forecast is made. Accordingly, we also considered forecast skill as a function of the number of daily cases predicted by the forecast model (Sect. S3.5). In other words, before we know the ground truth, can we anticipate what the forecast skill is likely to be, based only on (a) the current forecast predictions; and (b) the forecast skill of previous forecasts (for which the ground truth is already known). Our intent was to address the question “What does this forecast mean?” before the ground truth is known, with the aim of providing assistance when interpreting the forecast outputs in near-real-time.

The skill scores were similar when aggregated by the number of cases ultimately reported for each day or by the median number of predicted case numbers, particularly when there were at least 5 cases per day (Fig. S17). This means that we can estimate the forecast skill by using the median forecast as a proxy for the ground truth; past performance appears to be indicative of future performance. This indicates that in decision-making contexts (i.e., when future case counts are unknown) we can still obtain a reasonable estimate of forecast skill and assess forecast reliability in real-time. These estimates are particularly reliable when local transmission is established.

Discussion

Principal findings

Forecasting infectious disease activity is a fundamentally difficult problem28, and even more so in unprecedented circumstances and in the face of regularly updated public health responses, public perceptions of risk, and changes in behaviour29,30,31. Despite these challenges, we were able to adapt our existing seasonal influenza forecasting methods14,15,16,17,18 to COVID-19 transmission in Australia, and provide the resulting forecasts to public health authorities to support the Australian COVID-19 response. We tailored our analyses in near-real-time to support public health decision-making in rapidly-evolving situations. This included providing public health authorities with estimated future epidemic activity, timelines for achieving case thresholds that would trigger easing of restrictions, and risk assessments based on repeated incursions of new chains of transmission in primarily infection-free jurisdictions.

Our forecasts exhibited the greatest skill in the state of Victoria, which experienced the overwhelming majority (96%) of locally-acquired cases reported in Australia over the study period, May–October 2020. Forecast skill was highest when there were at least 10 reported cases on a given day, the circumstances in which authorities were most in need of forecasts to aid in planning and response.

In considering several alternate forecast models, forecast skill was higher when we assumed that the active-case \(R_\mathrm{eff}\) characterised local transmission at the time of forecast and that, over the forecast horizon, local transmission would trend back towards the whole-population transmission potential (whether \(R_\mathrm{eff}\) was lower or higher than the TP). This improvement in skill was driven by characteristics of both the Victorian and New South Wales epidemics. In Victoria, while \(R_\mathrm{eff}\) remained consistently higher than the transmission potential, it decreased steadily over time7. In New South Wales, the public health response to incursions repeatedly drew \(R_\mathrm{eff}\) below the TP7. Although our model overestimated the rate at which local transmission would trend back towards the TP, it consistently out-performed forecasts generated under the assumption that local transmission would be sustained at the current active-case \(R_\mathrm{eff}\).

Forecast skill was also higher when the model was conditioned on the most recent data, even though these data were known to be incomplete. By imputing symptom onset dates and estimating the delay between symptom onset and case ascertainment7, we obtained additional information to that available in the reported time-series data, improving forecast skill. However, when the reported number of daily cases were low, forecast performance was particularly sensitive to the imputed case counts and there were occasions where forecast skill was higher when the model was not conditioned on the most recent data.

Study strengths

Australian jurisdictions maintained high testing levels over the study period. The proportion of infected persons that were identified as cases was likely to be both very high, and to remain relatively constant over the study period. These cases include asymptomatic individuals who were identified through large-scale and systematically exhaustive contact tracing programs5. In comparison, in recent influenza seasons the probability that a person with influenza-like illness (ILI) symptoms would seek healthcare and have a specimen collected for testing (as estimated from Flutracking32,33 survey participants) was around 3–8%16. But during the study period, 50–75% of Flutracking participants with ILI symptoms reported having a COVID-19 test (Sect. S5), suggesting high case ascertainment. The end of the second wave in Victoria provides yet further evidence of high case ascertainment. As the 14-day case thresholds for easing restrictions were achieved, there were fewer and fewer unlinked (“mystery”) cases reported. For example, the Victorian 19 October update reported an average of 7.7 cases per day in the past 14 days and a total of just 15 unlinked cases34, while the 26 October update reported an average of 3.6 cases per day in the past 14 days and a total of just 7 unlinked cases35. Elimination was achieved before the end of the study period, and so unidentified cases, if present, did not cause sustained chains of transmission. This is consistent with the TP, which was estimated to be less than one in Victoria from August to October 20207.

As described above, we had access to detailed line-listed data, and were able to estimate the delay distribution for the time between symptom onset and case ascertainment, and to impute the symptom onset date for cases where this had not yet been reported7.

We were able to incorporate the impact of public health interventions via near-real-time estimates of the effective reproduction number (\(R_\mathrm{eff}\)) and the whole-population transmission potential (TP), which were informed by nationwide behavioural surveys, population mobility data, and times from symptom onset to case detection7,19. Upon reintroduction of cases in jurisdictions that had achieved local elimination, estimates of the transmission potential allowed us to produce forecasts that were informed by local mobility and behaviour, before reliable estimates of the effective reproduction number were available.

Likewise, we were able to adapt the forecast outputs to support policy decisions in near-real-time for individual jurisdictions without incorporating additional effects into the underlying simulation model. This meant that we did not need to explicitly model control measures that were imposed and relaxed in each jurisdiction — such an effort would have been extremely complex and it would be impossible to account for decisions that occur within each forecast horizon and their impact on transmission dynamics. The fact that our forecasts exhibited high forecast skill, particularly during periods of sustained local transmission, demonstrates that increasing model complexity is not always necessary or appropriate12,22,36,37.

Study limitations

Australia had experienced limited COVID-19 activity prior to the study period, and so it was reasonable to assume that the entire population was susceptible to infection. Subsequent waves of local infection, as experienced in Australia after the study period, have resulted in a non-negligible proportion of the population having some degree of natural immunity. In these circumstances, the forecasts would become more sensitive to assumptions about case ascertainment. COVID-19 vaccination in Australia began in February 2021 (i.e., after the study period) and has mitigated the impact of subsequent circulation of the Delta and Omicron variants in 2021 and 2022. By February 2022, more than 94% of people over the age of 16 were fully vaccinated38. We have continued to refine and extend the methods presented here to account for changes in case ascertainment, vaccination coverage, booster vaccinations, and waning immunity. Results obtained with these refinements continue to be reported under Australia’s National Disease Surveillance Plan for COVID-1939 and will be described elsewhere.

Our forecast model only incorporated population heterogeneity by allowing local transmission to trend from the active-case \(R_\mathrm{eff}\) to the whole-population transmission potential (TP) over the forecast horizon. This limited granularity was appropriate for the purposes to which the forecasts were applied, but meant that the forecasts were not suitable for addressing more nuanced questions about, for example, targeted (geographic or socio-demographic) interventions to mitigate localised clusters40.

Meaning and implications

The first wave of COVID-19 in Australia was primarily driven by returning travellers who had acquired their infection overseas. In contrast, the second wave in Victoria was driven by local transmission. Cases in the second wave were also over-represented in essential services that involved frequent and intense contact, such as residential aged care facilities, healthcare, manufacturing, and meat processing5, and so were less able to reduce their contacts (and so transmission opportunities) than the general population. Accordingly, during the second wave in Victoria the active-case \(R_\mathrm{eff}\) was systematically higher than the whole-population transmission potential7, even though both decreased over time in response to public health interventions. Forecast bias was compounded by this disparity between \(R_\mathrm{eff}\) and transmission potential.

However, since the active-case \(R_\mathrm{eff}\) did decrease over time, forecast skill was highest when, over the forecast horizon, local transmission trended back towards the whole-population transmission potential. It is unclear how future trends in local transmission could be better modelled or predicted, highlighting fundamental challenges in regards to (a) understanding the transmission characteristics of persons who are currently infectious; and (b) predicting how the transmission characteristics of infectious persons will change into the future, particularly when case numbers are low. These are intrinsic sources of uncertainty for infectious diseases forecasting, and while they may conceivably be reduced, they cannot be eliminated.

There is an acknowledged need for effective and on-going collaborations between modellers and public health practitioners in order to take full advantage of modelling tools such as epidemic forecasts to support public health policy and decision-making41,42,43. The forecasts presented here were adapted from our Australian seasonal influenza forecasts14,15,16, which we have deployed in near-real-time in collaboration with public health colleagues since 201517,18. A number of modifications were required to adapt these forecasts to COVID-19 transmission in Australia, but developing and applying these methods to seasonal influenza activity in previous years has provided us with valuable experience in adapting to unprecedented circumstances, working with incomplete and evolving data, responding to public health needs, and effectively communicating forecast outputs to public health decision-makers.

Methods

Ethics statement

The study was undertaken as urgent public health action to support Australia’s COVID-19 pandemic response. The study used data from the Australian National Notifiable Disease Surveillance System (NNDSS) provided to the Australian Government Department of Health and Aged Care under the National Health Security Agreement for the purposes of national communicable disease surveillance. Data from the NNDSS were supplied after de-identification to the investigator team for the purposes of provision of epidemiological advice to government. Contractual obligations established strict data protection protocols agreed between the University of Melbourne and sub-contractors and the Australian Government Department of Health and Aged Care, with oversight and approval for use in supporting Australia’s pandemic response and for publication provided by the data custodians represented by the Communicable Diseases Network of Australia. The ethics of the use of these data for these purposes, including publication, was agreed by the Department of Health and Aged Care with the Communicable Diseases Network of Australia. All methods were carried out in accordance with the relevant guidelines and regulations.

COVID-19 surveillance data

We used line-lists of reported cases for each Australian state and territory, which were extracted from the National Notifiable Disease Surveillance System (NNDSS) on a weekly basis. The line-lists contained the date when the individual first reported exhibiting symptoms, the date when the case notification was received by the jurisdictional health department, and whether the infection was acquired locally or overseas.

We used a time-varying delay distribution to characterise the duration between symptom onset and case notification, which was estimated from the reporting delays observed during the study period7. Symptom onset dates were imputed for cases where this was not (yet) reported.

Forecasting COVID-19 case numbers in the community

We used a discrete-time stochastic SEEIIR model with gamma-distributed waiting times to characterise infection in each Australian jurisdiction. Let \(S(t)\) represent the number of susceptible individuals, \(E_1(t)+E_2(t)\) represent the number of exposed individuals, \(I_1(t)+I_2(t)\) represent the number of infectious individuals, and \(R(t)\) the number of removed individuals, at time \(t\). Symptom onset was assumed to coincide with the transition from \(I_1\) to \(I_2\). We used real-time estimates of the effective reproduction number \(R_\mathrm{eff}\)7, which were informed by nationwide behavioural surveys, mobility data from technology companies, and times from symptom onset to case detection7,19, and sampled a unique trajectory \(R^j_\mathrm{eff}(t)\) for each particle \(j\). In order to initialise the first epidemic wave in each jurisdiction, it was assumed that 10 exposures were introduced into the \(E_1\) compartment at time \(\tau\), to be inferred, giving initial conditions:

$$\begin{aligned} S(0)&= N - E_1(0) \\ E_1(0)&= 10 \\ E_2(0)&= 0 \\ I_1(0)&= 0 \\ I_2(0)&= 0 \\ R(0)&= 0 \\ \sigma (t)&= {\left\{ \begin{array}{ll} 0 &{} \text {if }t< \tau \\ \sigma &{} \text {if }t \ge \tau \end{array}\right. } \\ \gamma (t)&= {\left\{ \begin{array}{ll} 0 &{} \text {if }t < \tau \\ \gamma &{} \text {if }t \ge \tau \end{array}\right. } \\ \beta _j(t)&= R^j_\mathrm{eff}(t) \cdot \gamma (t). \end{aligned}$$

The number of individuals who leave each compartment on each time-step \(\Delta {}t\) follows a binomial distribution (see Table 3 for parameter definitions):

$$\begin{aligned} \lambda (t)&= \frac{\beta (t) \cdot \left[ I_1(t) + I_2(t)\right] + \hat{e}(t)}{N} \\ S^\mathrm{Pr}(t)&= 1 - \exp \left( - \Delta {}t\cdot \lambda (t) \right) \\ S^\mathrm{out}(t)&\sim {{\,\textrm{Bin}\,}}(S(t), S^\mathrm{Pr}(t)) \\ E_1^\mathrm{Pr}(t)&= 1 - \exp \left( - \Delta {}t\cdot 2 \cdot \sigma (t) \right) \\ E_1^\mathrm{out}(t)&\sim {{\,\textrm{Bin}\,}}(E_1(t), E_1^\mathrm{Pr}(t)) \\ E_2^\mathrm{Pr}(t)&= 1 - \exp \left( - \Delta {}t\cdot 2 \cdot \sigma (t) \right) \\ E_2^\mathrm{out}(t)&\sim {{\,\textrm{Bin}\,}}(E_2(t), E_2^\mathrm{Pr}(t)) \\ I_1^\mathrm{Pr}(t)&= 1 - \exp \left( - \Delta {}t\cdot 2 \cdot \gamma (t) \right) \\ I_1^\mathrm{out}(t)&\sim {{\,\textrm{Bin}\,}}(I_1(t), I_1^\mathrm{Pr}(t)) \\ I_2^\mathrm{Pr}(t)&= 1 - \exp \left( - \Delta {}t\cdot 2 \cdot \gamma (t) \right) \\ I_2^\mathrm{out}(t)&\sim {{\,\textrm{Bin}\,}}(I_2(t), I_2^\mathrm{Pr}(t)) \\ S(t + \Delta {}t)&= S(t) - S^\mathrm{out}(t) \\ E_1(t + \Delta {}t)&= E_1(t) + S^\mathrm{out}(t) - E_1^\mathrm{out}(t) \\ E_2(t + \Delta {}t)&= E_2(t) + E_1^\mathrm{out}(t) - E_2^\mathrm{out}(t) \\ I_1(t + \Delta {}t)&= I_1(t) + E_2^\mathrm{out}(t) - I_1^\mathrm{out}(t) \\ I_2(t + \Delta {}t)&= I_2(t) + I_1^\mathrm{out}(t) - I_2^\mathrm{out}(t) \\ R(t + \Delta {}t)&= R(t) + I_2^\mathrm{out}(t) \end{aligned}$$

We allowed for the introduction of exogenous exposures, which may be acquired by individuals who, e.g., visit an external location where infectious individuals are present, or encounter an infectious individual from an external location who has travelled to this location. The expected number of exogenous exposures, \(\hat{e}(t)\), is time-varying and independent trajectories can be sampled for each particle.

We modelled the relationship between model incidence and the observed daily COVID-19 case incidence (\(y_t\)) using a negative binomial distribution with dispersion parameter \(k\), since the data were non-negative integer counts and were over-dispersed when compared to a Poisson distribution. Let \(\textrm{X}(t)\) represent the state of the dynamic process and particle filter particles at time \(t\), and \(x_t\) represent a realisation; \(x_t=(s_t, e_{1t}, e_{2t}, i_{1t}, i_{2t}, r_t, \sigma _t, \gamma _t, j_t, \hat{e}_t)\). We assumed that cases were observed (i.e., were reported as a notifiable case) with probability \(p_\mathrm{obs}\) when they transitioned from \(I_1\) to \(I_2\). To account for incomplete observations due to, e.g., delays in testing and reporting for the most recent observations, we fitted a time-to-detection distribution and calculated the detection probability (\(p_\mathrm{det}\)) for each observation at each time \(t\)7. In order to improve the stability of the particle filter for very low (or zero) daily incidence, we also allowed for the possibility of a very small number of observed cases that were not directly a result of the community-level epidemic dynamics (\(bg_\mathrm{obs}\)). Accordingly, we defined the likelihood \(\mathcal {L}(y_t \mid x_t)\) of obtaining the observation \(y_t\) from the particle \(x_t\) as:

$$\begin{aligned} \mathcal {L}(y_t \mid x_t)&\sim {{\,\textrm{NegBin}\,}}(p_\mathrm{det}\cdot \mathbb {E}[y_t], k) \\ \mathbb {E}[y_t]&= (1 - n_\mathrm{inc}(t)) \cdot bg_\mathrm{obs} + n_\mathrm{inc}(t) \cdot p_\mathrm{obs} \cdot N \\ n_\mathrm{inc}(t)&= \frac{I_2(t) + R(t) - I_2(t - 1) - R(t - 1)}{N} \\ {\text {Var}}[y_t]&= p_\mathrm{det}\cdot \mathbb {E}[y_t] + \frac{\left( p_\mathrm{det}\cdot \mathbb {E}[y_t]\right) ^2}{k} \end{aligned}$$

The value of the dispersion parameter \(k\) was chosen so that when zero individuals enter the \(I_2\) compartment over one day, the 50% credible interval has an upper bound of zero observed cases and the 95% credible interval has an upper bound of one observed case.

To generate projected case counts at each day, we used a bootstrap particle filter with post-regularisation44 and a “deterministic” resampling method45, as previously described in the context of our Australian seasonal influenza forecasts14,15,16,17,18. We constructed a simulated trajectory \(\tilde{y}^i = \{\tilde{y}_0^i, \tilde{y}_1^i, \dots \}\) from each particle \(i\) in the ensemble by drawing a random sample \(\tilde{y}_t^i\) at each day \(t\):

$$\begin{aligned} \tilde{y}_t^i&\sim {{\,\textrm{NegBin}\,}}(\mu = \mathbb {E}[y_t^i], k). \end{aligned}$$

Parameters and model prior distributions

Table 3 Parameter values for (i) the transmission model; (ii) the observation model; and (iii) the bootstrap particle filter.
Table 4 The population sizes used for each jurisdiction. Values were obtained from Population Australia data for 2019 (https://www.population.net.au/).

Model and particle filter parameters are described in Table 3. Since Australia is one of the most urbanised countries in the world, for each jurisdiction we used capital city residential populations (including the entire metropolitan region, as listed in Table 4) in lieu of the residential population of each jurisdiction as a whole. Parameters \(\sigma\) and \(\gamma\) were sampled from a multivariate log-normal distribution that was defined to be consistent with a generation interval distribution with mean=4.7 and SD=2.97,46, and we sampled independent \(R^j_\mathrm{eff}(t)\) trajectories for each particle. The observation probability \(p_\mathrm{obs}\) was fixed at 0.8, assuming that 80% of infections would be detected47.

Competing models

We evaluated four competing models, by considering all combinations of:

  1. 1.

    Assuming that local transmission would trend back towards the whole-population transmission potential over the forecast horizon (“Trend to TP”), or that local transmission would instead be sustained at the active-case \(R_\mathrm{eff}\) distribution over the forecast horizon (“Fixed Reff”); and

  2. 2.

    Conditioning on all daily case counts where the detection probability \(p_\mathrm{det}\ge 0.5\), or only conditioning on all daily case counts where \(p_\mathrm{det}\ge 0.95\) (in order to reduce the influence of imputed symptom onset dates, and cases potentially not yet reported).

The labels used to identify each of these four models are listed in Table 5.

Table 5 The four competing forecast models.

Accounting for observed reintroduction

We deliberately treated the consequence of observed reintroduction separately from the potential for reintroduction. Here our interest lies in introducing new exposures to reflect a small number of reported cases in a jurisdiction after a sufficiently long period of no reported cases in that jurisdiction. This captures importation events where the particle ensemble contains few, if any, particles with any exposed or infectious individuals, and is fundamentally different from predicting the probability of a future reintroduction into a jurisdiction that currently has no active COVID-19 cases. We can realise this by defining exogenous exposure trajectories that are non-zero for a short period prior to the observed reintroduction.

For a jurisdiction that reports one or more COVID-19 cases at time \(\tau\) (\(y_\tau\)) after a prolonged period of no cases (e.g., 28 days or longer), we assume that the initial exposure(s) occurred \(\Delta = 5\) days prior to the case(s) being reported, and we account for the time-to-detection distribution (\(p_\mathrm{det}\)) and the observation probability (\(p_\mathrm{obs}\)):

$$\begin{aligned} \hat{e}(t)&= {\left\{ \begin{array}{ll} y_\tau \cdot \left( p_\mathrm{det} \cdot p_\mathrm{obs}\right) ^{-1} &{} \text {if } \tau = t + \Delta \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

We deliberately selected this approach instead of keeping \(\hat{e}(t)\) fixed at some positive value \(\varepsilon\), so that the forecasts for jurisdictions that had achieved local elimination did not suggest there would be subsequent epidemic activity without evidence of reintroduction. This decision was motivated by Australia’s international and domestic movement restrictions, which were effective in limiting community exposure from infected international arrivals and from travel between jurisdictions. The daily probability of an importation event was deemed to be sufficiently low that it would only serve to inflate the widest credible intervals in jurisdictions that had achieved elimination.

Forecast targets and skill

The forecast targets were the daily case counts (indexed by symptom onset date) for the 7 days prior to the data date (“back-cast”), for the data date itself (“now-cast”), and for the 28 days after the data date (“forecast”). Forecast performance was measured using Continuous Ranked Probability Scores (CRPS48). CRPS is a generalisation of the Mean Absolute Error (MAE) for probabilistic forecasts, which penalises forecasts for placing probability mass at a distance from the ground truth, and is one of the most widely used accuracy metrics for probabilistic forecasts. CRPS values increase as more probability mass is allocated to incorrect values, and as the distance between these incorrect values and the ground truth increases.

We calculated CRPS values for each observation \(y_\tau\) of COVID-19 cases on day \(\tau\), given observations \(y_{0:t}\) for days \(0 \dots t\) (inclusive):

$$\begin{aligned} P(Y_\tau \le y \mid y_{0:t})&= \sum _{i=1}^{N_\mathrm{px}} w_i \cdot \mathcal {H}(y - \tilde{y}_\tau ^i) \\ {{\,\textrm{CRPS}\,}}(y_\tau \mid y_{0:t})&= \sum _{y=0}^{Y_{\max }} \left[ P(Y_\tau \le y \mid y_{0:t}) - \mathcal {H}(y - y_\tau ) \right] ^2 \\ \mathcal {H}(x)&= {\left\{ \begin{array}{ll} 1 &{} \text {if } x \ge 0 \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

where \(\mathcal {H}\) denotes the Heaviside step function.

We compared the performance of each model \(\mathcal {M}\) to that of an historical benchmark forecast \(\mathcal {N}\), whose probability mass function \(\mathcal {N}(t)\) for the data date \(t\) was the distribution of the daily case counts reported in the jurisdiction at the time of forecast, accounting for the detection probability \(p_\mathrm{det}(\tau )\) for each observation \(y_\tau\):

$$\begin{aligned} Y_{t+h}&\sim \mathcal {N}(t) = \left\{ \bigg \lfloor \frac{y_\tau }{p_\mathrm{det}(\tau )}\bigg \rceil : \tau \in \{0, ..., t\} \wedge p_\mathrm{det}(\tau ) \ge 0.5 \right\} \\ h&\in \{-7, -6, \dots , 27, 28\}. \end{aligned}$$

We measured the performance of each model \(\mathcal {M}\) using CRPS skill scores49:

$$\begin{aligned} {\text {skill}}(\mathcal {M})&= \frac{{{\,\textrm{CRPS}\,}}_\mathcal {N} - {{\,\textrm{CRPS}\,}}_\mathcal {M}}{{{\,\textrm{CRPS}\,}}_\mathcal {N}}. \end{aligned}$$

We also measured forecast sharpness \(S(t)\) and forecast bias \(B(t)\) as per Funk et al.50:

$$\begin{aligned} S(t)&= \frac{\text {median}\left( |\tilde{y}_t - \text {median}\left( \tilde{y}_t\right) |\right) }{0.675}\\ B(t)&= 1 - \left[ P(Y_t \le y_t) + P(Y_t \le y_t - 1)\right] . \end{aligned}$$

Meeting case-number thresholds that trigger policy changes

We have already defined the probability that the case counts on day \(\tau\) will not exceed some threshold \(T\): \(P(Y_\tau \le T \mid y_{0:t})\). We can build on this definition to obtain the probability that the average daily case count over some window \(W\) up to, and including, day \(\tau\) does not exceed this threshold \(T\):

$$\begin{aligned} P(\mathbb {E}[Y_{\tau -W+1:\tau }] \le T \mid y_{0:t}) = \sum _{i=1}^{N_\mathrm{px}} w_i \cdot \mathcal {H}\left( W \cdot T - \sum _{j=0}^W \tilde{y}_{\tau -j}^i\right) . \end{aligned}$$

This was used to predict the probability of achieving each of the target case thresholds for relaxing measures in the second wave in Victoria, for each day in the forecast horizon.