CovidSim is an individual-based simulation code developed by the MRC Centre for Global Infectious Disease Analysis at Imperial College London. It is a modified version of an earlier model designed to support pandemic influenza planning1 and has now been used to explore various non-pharmaceutical interventions (NPIs) with the aim of reducing the transmission of the coronavirus, as documented in the key paper2 denoted Report 9. CovidSim played an important role in the United Kingdom in reorienting UK Government policy from herd immunity to a strategy focused on suppression of the viral infection. It should be noted, however, that many competitor models exist. Notable examples include the work performed at the London School of Hygiene and Tropical Medicine (see, for instance, refs. 3,4 and especially ref. 5), in which the effects of different NPIs in the UK are modelled. Another noteworthy model is CovaSim6, which is similar in structure to CovidSim in the sense that it models a population of individuals via discrete agents.

Likewise, CovidSim creates a network of individuals located in areas defined by high-resolution population density data. In the model, contacts with other individuals can be made in four different types of place, namely, within households, at schools, universities and work places. It is possible to model a combination of different NPIs, namely, general social distancing (SD), social distancing for those over 70 years of age (SDOL70), home isolation of suspected cases (CI), voluntary home quarantine (HQ) and place closure of universities and schools (PC) (see Table 2 of ref. 2). CovidSim contains over 900 input parameters, which are mainly located in two input files. Furthermore, a small number of parameters that define certain characteristics of the intervention scenario one wishes to study are supplied via the command line.

We investigated the reproducibility of the code, as has been done in past work7,8. That said, we especially focus on CovidSim’s robustness to uncertainty in the input parameters. By robustness in this context, we mean the extent to which the code amplifies uncertainties from the input to the output. Our main aim is thus to take the model as given and examine the uncertainty in its predictions when its parameters are treated as random variables instead of deterministic inputs. To handle the high-dimensional input space, we use a dimension-adaptive sampling method9. This type of anisotropic sampling method adaptively exploits a possible low effective dimension, where only a subset of all inputs has a substantial impact on the model output. Such dimension-adaptive samplers have been applied in a wide range of domains, for example, computational electromagnetism10, finance11,12 and natural convection problems13. Here we perform a validation study to examine the ability of the predicted output distribution to envelop the observed COVID-19 death count, conditional on a predefined intervention scenario.

Due to the large number of inputs, one cannot hope to obtain an accurate, data-informed value of all parameters in contention. Moreover, considering CovidSim’s influential status and its likely use in future COVID-19 predictions, it is important to assess the impact of parametric uncertainty on the model output. We will argue the case for the prediction of uncertainty in high-impact decision-making, after we first describe our results.


We have performed an analysis on the original closed-source version of the code; however, the majority of our sensitivity analysis and uncertainty quantification efforts lie with the current updated open-source release of CovidSim.

With respect to the original version, we have been able to achieve exact reproducibility of the results2 in Report 9, although only when running within an Azure cloud environment. Attempts to run the code on a Linux-based machine failed; we could not reproduce the same results here and as this version is no longer supported it was not investigated further.

Uncertainty in CovidSim

The predictions of most computational models are affected by uncertainty from a variety of different sources. We identify the following three sources of uncertainty in CovidSim; namely, parametric uncertainty, model structure uncertainty and scenario uncertainty. This breakdown is not uncommon (see, for example, refs. 14,15,16).

Parametric uncertainty arises due to imperfect knowledge of the model input parameters \({\boldsymbol{\xi }}\in {{\mathbb{R}}}^{d}\), described in the ‘CovidSim parameters’ section. Model structure uncertainty is more fundamental, as it relates to uncertainty about the appropriate mathematical structure of the model, denoted by \({\mathcal{M}}\); one can think of missing epidemiological processes that are not implemented in CovidSim (see the discussion in Supplementary Section 6). Finally, a scenario \({\mathcal{S}}\) is the set of conditions under which a model \({\mathcal{M}}\left({\boldsymbol{\xi }}\right)\) is applied. In the case of CovidSim, \({\mathcal{S}}\) includes the choice of NPI scenarios, the initialization of the model and the well-known reproduction number R0. Note that the actual implementation of \({\mathcal{S}}\) will be parameterized as well, and that we could technically lump these parameters in with ξ; however, the scenario parameters are of a different nature than the internal inputs ξ, and treating \({\mathcal{S}}\) as a separate category mirrors the way in which the results were presented in Report 9, which showed results for different NPIs and R0 values.

If we denote q as the predicted output quantity of interest, we therefore have \(q=q\left({\boldsymbol{\xi }},{\mathcal{M}},{\mathcal{S}}\right)\), where all three arguments are uncertain. As noted, our main goal is to quantify the impact of parametric uncertainty. By treating the inputs as random variables with probability density function (PDF) \(p\left({\boldsymbol{\xi }}\right)\), our mean prediction is given by

$${\mathbb{E}}\left[q| {\mathcal{M}},{\mathcal{S}}\right]:= {\int}_{{{{\Omega }}}_{{\boldsymbol{\xi }}}}q\left({\boldsymbol{\xi }},{\mathcal{M}},{\mathcal{S}}\right)p\left({\boldsymbol{\xi }}\right){\rm{d}}{\boldsymbol{\xi }},$$

where Ωξ is the support of p(ξ). The uncertainty in the prediction in equation (1) can be represented by either the corresponding variance or confidence intervals. It is important to note that our results are conditional on \({\mathcal{M}}\) and \({\mathcal{S}}\). We are not in a position to change the former and we illustrate the importance of scenario uncertainty by repeating the parametric uncertainty analysis for two different scenarios.
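As a minimal illustration of equation (1), the sketch below estimates the mean and variance of an output by plain Monte Carlo sampling of independent uniform inputs. The function `toy_model` and the input bounds are hypothetical stand-ins, not CovidSim or its parameters, and the study itself relies on a dimension-adaptive collocation method rather than Monte Carlo sampling.

```python
import random
import statistics


def toy_model(xi):
    """Hypothetical stand-in for q(xi, M, S); this is NOT CovidSim."""
    return 2.0 * xi[0] + xi[1] ** 2


def monte_carlo_moments(model, bounds, n_samples=20000, seed=0):
    """Estimate E[q] and V[q] for independent, uniformly distributed inputs.

    `bounds` holds one (low, high) pair per input, so the joint PDF is the
    product of the one-dimensional uniform densities.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        xi = [rng.uniform(low, high) for low, high in bounds]
        outputs.append(model(xi))
    return statistics.fmean(outputs), statistics.pvariance(outputs)


mean_q, var_q = monte_carlo_moments(toy_model, [(0.0, 1.0), (0.0, 1.0)])
print(f"mean = {mean_q:.3f}, variance = {var_q:.3f}")
```

For this toy function the exact mean is 4/3, which gives a simple check on the estimate. The Monte Carlo error decays only as the inverse square root of the number of samples, which is one reason to prefer structured sampling plans when each model run is expensive.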

Uncertainty propagation

We use EasyVVUQ17,18 from the Verified Exascale Computing for Multiscale Applications (VECMA) toolkit19 to propagate the input uncertainties through CovidSim. Templates from the CovidSim input files are generated to interface CovidSim with EasyVVUQ. In the process, a single file is generated, which contains all inputs, with their types and default values specified. Simply counting the number of entries in this file allows us to exactly determine the number of parameters present in the code, which is how we arrived at a number of 940 inputs.

We will not vary all 940 parameters (see the ‘CovidSim parameters’ section). We will instead assign an independent PDF to each of d input parameters ξi (with d ≪ 940), that is, ξi ∼ p(ξi); a d-dimensional sampling plan from the joint PDF, ∏ip(ξi), is then created, after which CovidSim is evaluated at each input point. We refine the sampling plan in a dimension-adaptive manner to handle the high-dimensional input space, the details of which can be found in the ‘Statistics’ section in the Methods.

CovidSim parameters

In this section we describe how we arrived at our selection of input parameters that we vary as part of our uncertainty quantification study. We have divided the parameters present in the input files into three groups:

  1. Group 1, intervention parameters: these are parameters meant to slow down the viral infection, which can still be varied for a fixed \({\mathcal{S}}\) (for example, the length of time households are quarantined when HQ is part of the selected NPI set).

  2. Group 2, disease parameters: related to the characteristics of COVID-19 (for example, the latent period).

  3. Group 3, spatial/geographic parameters: parameters that apply to the properties of the network (for example, the relative transmission rate for place types).

The purpose of this classification was to direct initial, exploratory uncertainty quantification (UQ) and sensitivity analysis (SA) campaigns on a coherent subset of parameters. The final UQ campaign contains parameters from all three groups. By campaign, we mean a single forward propagation step of uncertainty from the input to the output.

Before starting the UQ analysis we first performed a parameter study using, in part, expert domain knowledge from the CovidSim team at Imperial College London to reduce the number of inputs. We focus on a scenario based on the suppression release in the Report 9 folder on GitHub20, using the intervention setting that combines PC, CI, HQ and SD, as this class is the closest to actual NPIs that were implemented in the UK. In this case we have the aforementioned total of 940 parameters. Note that some input parameters are vectors, in which case we counted each entry as a separate input parameter. On top of our own initial selection, we received feedback over the course of our analysis from the developers of CovidSim as to the inclusion of given parameters in the UQ study.

Many of the parameters are (currently) not used in the case of COVID-19 simulation, such as numerous vaccination parameters. See the Supplementary Data for the full list with all input parameters, their default values, and the reasons for their inclusion or exclusion by the Imperial College London CovidSim team. This list also contains a short description of the parameters.

Although we made our own considerations and decisions as to which parameters to include in the UQ study, the large number of parameters in play requires expert knowledge to make a suitable initial selection. A total of 60 of these parameters were included in a UQ campaign at some point (these are displayed separately in the Supplementary Data). We choose uninformative uniform distributions to reflect our lack of knowledge in the most likely values of these inputs, with bounds either based on data or expert knowledge.

Any input that was selected at least once for refinement during the dimension-adaptive sampling in one of the three exploratory UQ campaigns of the ‘CovidSim parameters’ section was included in the final, large-scale UQ campaign. This led to a total of 19 final parameter distributions (see Supplementary Section 3).

Important scenario parameters are R0 and two trigger parameters, which are specified via the command line. In the case of modelling a suppression strategy, the SD and PC interventions are triggered when the weekly number of new intensive care unit (ICU) cases exceeds the value supplied by the first trigger. Likewise, they are suspended when this metric drops below the second specified trigger2. The results below are conditional on the selected NPI measures, as well as fixed values for R0 and the ICU triggers.
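The on/off trigger logic described above amounts to a switch with hysteresis. The following sketch is an illustration only: the function name and the weekly case numbers are made up, and it is not taken from the CovidSim source code.

```python
def intervention_schedule(weekly_icu_cases, on_trigger, off_trigger):
    """Return one boolean per week: True while SD and PC are active.

    The interventions switch on when the weekly number of new ICU cases
    exceeds `on_trigger` and switch off when it drops below `off_trigger`.
    """
    active = False
    schedule = []
    for cases in weekly_icu_cases:
        if not active and cases > on_trigger:
            active = True
        elif active and cases < off_trigger:
            active = False
        schedule.append(active)
    return schedule


# Made-up weekly ICU case counts, with the 60/15 triggers of one scenario.
weekly_cases = [10, 70, 50, 20, 10, 30, 65]
schedule = intervention_schedule(weekly_cases, on_trigger=60, off_trigger=15)
print(schedule)
```

Because the two thresholds differ, the interventions stay active while the case count sits between them, which avoids rapid toggling around a single threshold.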

Confidence intervals

Here we consider two different PC_CI_HQ_SD suppression scenarios. The results that follow were obtained using a computational budget of 3,000 CovidSim evaluations per scenario. Figure 1a shows the 68% and 95% confidence intervals of the cumulative death prediction for \({{\mathcal{S}}}_{1}\), with R0 = 2.4 and on/off ICU triggers of 60/15. Remember that the PC and SD interventions are turned on and off based on a specified number of new weekly ICU cases (60 and 15 new cases here, which is one of the scenarios considered in Report 9). The PDF of the total death count after 800 days is also plotted. The latter shows clear non-Gaussian behaviour, with a heavy tail towards a higher death count. The corresponding Report 9 total death count2 is 8,700, whereas the current version, which now supports averaging over stochastic realizations, predicts21 9,500. Our mean prediction from equation (1) is almost double this amount. The Report 9 predictions are still captured by the distribution (at approximately the lower boundary of the 68% confidence interval), but the distribution also supports low-probability events which are about five to six times higher than those given in Report 9.
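To make the notion of 68% and 95% intervals concrete, the sketch below computes central percentile intervals from an ensemble of output samples. The ensemble is a hypothetical log-normal sample chosen only because it is skewed; it is not CovidSim output, and the percentile construction is an assumption about how such intervals can be obtained, not a description of the exact procedure used for Fig. 1.

```python
import random
import statistics


def central_interval(samples, level):
    """Central interval holding a fraction `level` of the samples.

    Quantiles are obtained by linear interpolation between order statistics.
    """
    ordered = sorted(samples)

    def quantile(p):
        position = p * (len(ordered) - 1)
        lower = int(position)
        upper = min(lower + 1, len(ordered) - 1)
        fraction = position - lower
        return ordered[lower] * (1.0 - fraction) + ordered[upper] * fraction

    tail = (1.0 - level) / 2.0
    return quantile(tail), quantile(1.0 - tail)


rng = random.Random(1)
# Hypothetical skewed ensemble standing in for a total death count.
ensemble = [rng.lognormvariate(0.0, 0.75) for _ in range(3000)]

ci68 = central_interval(ensemble, 0.68)
ci95 = central_interval(ensemble, 0.95)
print("mean  :", statistics.fmean(ensemble))
print("median:", statistics.median(ensemble))
print("68% interval:", ci68)
print("95% interval:", ci95)
```

For a distribution with a heavy upper tail the mean lies above the median, and the upper ends of the intervals sit much further from the median than the lower ends.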

Fig. 1: Distribution of cumulative death predictions.

a,b, The mean cumulative death prediction for \({{\mathcal{S}}}_{1}\) (R0 = 2.4, ICU on/off triggers 60/15) (a) and \({{\mathcal{S}}}_{2}\) (R0 = 2.6, ICU on/off triggers 400/300) (b) plus confidence intervals. The PDFs of the total death count after 800 days are shown to the right. Day zero corresponds to 1 January 2020. We also plot the observed cumulative death count data for the UK (green squares) in both panels, which were obtained from ref. 22. The striped line is a single sample from CovidSim (current release), run with the baseline parameter values of Report 9.


Note that the output distribution conditioned on \({{\mathcal{S}}}_{1}\) clearly underpredicts the observed death count in the UK, which is also plotted in Fig. 1a. We therefore selected \({{\mathcal{S}}}_{2}\) using the parameters from Report 9 that gave the highest predicted mortality (R0 = 2.6 and ICU on/off triggers of 400/300 cases; Fig. 1b). The deterministic Report 9 prediction is still located at the 68% confidence interval lower border; however, the total death count PDF is notably less skewed, although still not exactly symmetric.

Figure 1 clearly indicates that the results are very sensitive to \({\mathcal{S}}\). As noted, we also plot the observed death count validation data from ref. 22 in both subfigures, which are evidently not captured well by the output distributions, although scenario \({{\mathcal{S}}}_{2}\) does perform better than \({{\mathcal{S}}}_{1}\). It is also plain from Fig. 1 that the rate of infection starts too slowly in both cases; it must be assumed that the epidemic started earlier than suggested in Report 9, which is in line with the findings of ref. 7. Hence, if one aims to validate CovidSim in a probabilistic sense (that is, obtaining a distribution that captures validation data with high probability), it is crucial to either tune the scenario parameters or to quantify the scenario uncertainty.

CovidSim also has a number of random seeds, whose influence on the death count is examined in Supplementary Section 5. See Supplementary Section 7 for confidence intervals on quantities of interest other than the cumulative death count.

Finally, we emphasize that the authors of Report 9 did not claim that their parameterization at the time would be able to match the death count data of the coming months. The main message was that it would “…be necessary to layer multiple interventions, regardless of whether suppression or mitigation is the overarching policy goal”2, and it also showed that doing nothing at all would have disastrous consequences.

Sensitivity analysis

With sensitivity analysis, the aim is to apportion the uncertainty of the model output to specific (combinations of) input parameter uncertainties. To this end, Sobol indices measure the fraction of the output variance that each combination of input parameters is responsible for when given a distribution on the inputs23. They can be computed in a post-processing step once the input uncertainties are propagated through the computational model24 (see the ‘Sobol index calculation’ section in the Methods).

The first-order Sobol indices (Si) are defined as \({S}_{i}:= {\mathbb{V}}\left[{q}_{i}\right]/{\mathbb{V}}\left[q\right]\in [0,1]\), for i = 1, …, d. Here \({\mathbb{V}}\left[q\right]\) is the total output variance and \({\mathbb{V}}\left[{q}_{i}\right]\) is the partial variance attributed to one particular input parameter23. Figure 2 displays the three Si with the highest values for \({{\mathcal{S}}}_{1}\) and \({{\mathcal{S}}}_{2}\) (see Supplementary Section 8 for more results). The Sobol indices are plotted against time, showing that the latent period (the period in which a patient is infected but not yet infectious) is the most influential at the beginning, although only for a short amount of time. A longer latent period therefore means that the rate of disease spread is slower in this early exponential growth stage, when there are still relatively few cases present.

Fig. 2: Fraction of variance from the model parameters.

a,b, First-order Sobol indices of the same three most dominant parameters for \({{\mathcal{S}}}_{1}\) (a) and \({{\mathcal{S}}}_{2}\) (b), plotted against time at one-month intervals. The plot shows the fraction of the variance that each parameter is responsible for, over time. Blue circles, relative spatial contact rate given social distancing; orange, delay to start case isolation; green, latent period; blue stars, the sum of all 19 first-order indices; purple diamonds, the sum of the three most dominant parameters.


The second important parameter is the relative spatial contact rate given social distancing parameter, which indicates the assumed effectiveness of social distancing. Finally, the third parameter (in both scenarios) to dominate the variance is the delay to start case isolation. The latent period originally belonged to the disease parameter group of the ‘CovidSim parameters’ section, whereas the other two inputs are intervention parameters. Overall, it can be said that the intervention parameters, which influence control measures and human behaviour, are most influential. The inputs from the spatial/geographic group have a comparatively small effect.

In Fig. 2 we also plot the sum of all 19 first-order Sobol indices. This shows that first-order effects (that is, the fraction of the variance obtained by varying individual parameters) account for a little under 80% of the variance in the case of \({{\mathcal{S}}}_{1}\), and roughly 90% for \({{\mathcal{S}}}_{2}\). Interaction effects between parameters therefore account for roughly 10–20% of the variance in our chosen scenarios. We also show the sum of the first-order indices for just the three most important parameters (that is, those actually plotted in Fig. 2), which already accounts for roughly 50% and 67% of the observed variance in cumulative deaths for \({{\mathcal{S}}}_{1}\) and \({{\mathcal{S}}}_{2}\), respectively.
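The relation between first-order indices and interaction effects can be checked on a toy problem. The sketch below computes first-order Sobol indices by brute-force averaging on a tensor grid of uniform inputs on [0, 1]; the two test functions are hypothetical, and this is not the post-processing route used in the study, where the indices are derived from the collocation expansion (see the ‘Sobol index calculation’ section in the Methods).

```python
import itertools
import statistics


def first_order_sobol(model, d, n=20):
    """First-order Sobol indices for d independent U(0, 1) inputs.

    Uses a midpoint grid with n nodes per input. S_i is the variance of the
    conditional mean E[q | xi_i] divided by the total variance of q.
    """
    nodes = [(k + 0.5) / n for k in range(n)]
    outputs = {x: model(x) for x in itertools.product(nodes, repeat=d)}
    total_variance = statistics.pvariance(list(outputs.values()))

    indices = []
    for i in range(d):
        conditional_means = []
        for value in nodes:
            # Average over all grid points that share the same i-th coordinate.
            slice_values = [q for x, q in outputs.items() if x[i] == value]
            conditional_means.append(statistics.fmean(slice_values))
        indices.append(statistics.pvariance(conditional_means) / total_variance)
    return indices


# Purely additive toy model: the first-order indices sum to one.
additive = first_order_sobol(lambda x: 3.0 * x[0] + x[1], d=2)

# Toy model with an interaction term: the sum falls below one, and the
# remainder is the share of variance due to interactions.
interacting = first_order_sobol(lambda x: x[0] * x[1], d=2)

print("additive   :", additive, "sum =", sum(additive))
print("interacting:", interacting, "sum =", sum(interacting))
```

The gap between one and the sum of the first-order indices is the quantity read off from the summed curves in Fig. 2.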

Uncertainty amplification

Although we based our input distributions (see Supplementary Table 1) on a combination of available data and expert knowledge, (in general) a certain level of ambiguity remains with respect to the choice of input distribution. We therefore devise a measure that examines the amplification of uncertainty in the outputs with respect to a given set of input distributions (as explained below). This relative measure of output-to-input variability is based on the coefficients of variation ratio (CVR), which serves as our robustness score and is given by

$${\mathrm{CVR}}:= {\mathrm{CV}}\left(\bar{q}\right)/{\mathrm{CV}}\left(\bar{{\boldsymbol{\xi }}}\right)=\left(\frac{1}{N}\mathop{\sum }\limits_{n=1}^{N}\frac{{\sigma }_{{q}_{n}}}{{\mu }_{{q}_{n}}}\right)\ /\ \left(\frac{1}{d}\mathop{\sum }\limits_{i=1}^{d}\frac{{\sigma }_{{\xi }_{i}}}{{\mu }_{{\xi }_{i}}}\right).$$

A coefficient of variation (CV) is a dimensionless quantity that measures the variability of a random variable with respect to its mean, and is defined as the standard deviation over the mean (σ/μ). In equation (2), \({\mathrm{CV}}(\bar{q})\) and \({\mathrm{CV}}(\bar{{\boldsymbol{\xi }}})\) are the mean CV of the output \(q\in {{\mathbb{R}}}^{N}\) and input \({\boldsymbol{\xi }}\in {{\mathbb{R}}}^{d}\), respectively. The results for CovidSim using equation (2) are displayed in Table 1, which shows that the uncertainty in the input is amplified by a factor of three for scenario 1. By contrast, CovidSim is more robust under \({{\mathcal{S}}}_{2}\), in which case the same input uncertainty is still amplified to the output, although now by a factor of two.
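Equation (2) can be evaluated in a few lines once the input distributions and an output ensemble are available. The sketch below assumes uniform inputs (for which the mean and standard deviation are known in closed form), reads N as the number of output entries (for example, time points) and uses the population standard deviation; the numbers are invented for illustration and are not those of Table 1.

```python
import math
import statistics


def uniform_cv(low, high):
    """Coefficient of variation sigma/mu of a U(low, high) random variable."""
    mean = 0.5 * (low + high)
    std = (high - low) / math.sqrt(12.0)
    return std / mean


def cvr(output_ensembles, input_bounds):
    """Coefficient-of-variation ratio: mean output CV over mean input CV.

    `output_ensembles` holds N lists, one per output entry q_n, each with the
    sampled values of that entry. `input_bounds` holds d (low, high) pairs.
    """
    output_cvs = [
        statistics.pstdev(samples) / statistics.fmean(samples)
        for samples in output_ensembles
    ]
    input_cvs = [uniform_cv(low, high) for low, high in input_bounds]
    return statistics.fmean(output_cvs) / statistics.fmean(input_cvs)


# Invented example: two output entries, one uniform input on [0, 2].
ratio = cvr([[1.0, 3.0], [2.0, 6.0]], [(0.0, 2.0)])
print("CVR =", ratio)
```

A ratio above one indicates that the relative spread of the output exceeds that of the inputs, that is, the model amplifies the input uncertainty.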

Table 1 The mean CV for the input and output, and the CVR

Note that \({{\mathcal{S}}}_{1}\) has a higher CVR while imposing stronger control than \({{\mathcal{S}}}_{2}\). Although the stronger control results in a much lower absolute number of predicted deaths, the output is more uncertain in a relative sense due to the long tail (see Fig. 1a), which results in a higher output CV and therefore a higher CVR.


Conditional on a given \({\mathcal{S}}\), we found that the Report 9 predictions are captured by the parametric uncertainty at the lower bound of the 68% confidence interval. The PDF of the total death count is skewed and can support low-probability events with a predicted death count that is about five to six times higher.

We find that CovidSim amplifies the input uncertainty by a factor of roughly two to three (see Table 1), depending on the chosen NPI scenario. Despite this amplification of uncertainty, the distribution of the output does not envelop available validation data well for the two scenarios we considered. We do note, however, that the predictions will be very sensitive to the chosen \({\mathcal{S}}\), which therefore must be tuned if one wishes to validate CovidSim against available data (see, for example, ref. 7). Tuning the ICU triggers alone is insufficient. In Supplementary Section 4 we show the results of an additional UQ campaign where we sought to extract the best-guess ICU trigger values from data. These results are similar to those presented in the main manuscript.

Predicting the uncertainty in computational models is already considered as vitally important in weather and climate models. For instance, the author of ref. 25 claims that “…no weather or climate prediction can be considered complete without a forecast of the associated flow-dependent predictability”. We also argue that, in the case of COVID-19 predictions, a single deterministic prediction paints an incomplete picture, as we showed that such a prediction is better viewed as only one member of a much wider distribution. Hence, some measure of uncertainty is required for a correct interpretation of the results, so that those tasked with policy-making are presented with a more complete picture of the outcomes that the model is capable of predicting.

For instance, if the policymaker is presented with just the deterministic model outcome of Fig. 1b, they may draw the conclusion that the UK will suffer 50,000 deaths after approximately 600 days by adopting scenario \({{\mathcal{S}}}_{2}\). However, by taking some reasonable input uncertainty into account, we see that the same model can also predict that number in less than 200 days with the same NPI settings. Another example concerns predictions with hard thresholds (such as the maximum number of available ICU beds). A single prediction might lie on the safe side of the threshold, yet the model may exhibit a considerable non-zero probability that this threshold can be exceeded, if it were admitted that the models are uncertain. We expect that such kinds of information pertaining to uncertainty would influence the decision-making process in an important way.

Let us briefly discuss applying the proposed method to models other than CovidSim, which may well be beneficial for the same reasons mentioned above. The dimension-adaptive sampling scheme treats the model as a black box, and can therefore be applied without modification to other models; however, note that EasyVVUQ requires a template of the input file to be created17. We used the FabSim3 automation toolkit to execute the ensembles on a supercomputer (in our case the PSNC Eagle machine26; see the Code Availability section for the relevant links to our software). In summary, (dimension-adaptive) parametric uncertainty propagation is general enough to be applied to other models and it is important to do so moving forward; however, although the dimension-adaptive approach is efficient, it is ultimately still limited by the dimension of the input space. We could not have applied our method to all inputs of CovidSim, for example.


To conclude, to retrofit the model’s outputs with the observed data requires additional post-hoc tuning of certain parameters that control the scenario in which the model is applied. These issues need to be addressed in seeking to provide a more quantitative albeit strongly probabilistic version of the code that might be suitable for its future application in healthcare and governmental decision-making. Our findings exemplify how sensitivity analysis and uncertainty quantification can help improve model development efforts, and in this case support the creation of epidemiological forecasting with quantified uncertainty.

As an alternative to retrofitting the scenario parameters, one could attempt to quantify the uncertainty related to the scenario the model is applied in. One such potential route for future research could involve creating cheap surrogate models for CovidSim, for example, in the stochastic space of the most influential parameters identified, which opens up the possibility of Bayesian inference27. This would allow us to update our assumptions on the input distributions and obtain posterior input distributions conditioned on observed data instead. Furthermore, such a statistical calibration can eliminate a bias between the mean prediction and real-world observations. Repeating the procedure for a discrete set of scenario parameters then allows for the combined estimation of the parametric and the scenario uncertainty using Bayesian ensemble methods (see, for example, refs. 15,16).


In this section we first describe our method for computing the statistical results and subsequently describe the uncertainty amplification factor.


Here we describe how we compute the probability distribution of the code output, the corresponding ensemble execution and how the Sobol indices are calculated.

Dimension-adaptive uncertainty propagation

The traditional forward uncertainty quantification methods present in EasyVVUQ (for example, stochastic collocation and polynomial chaos) are subject to the curse of dimensionality. To illustrate the problem, consider first the standard stochastic collocation (SC) method, which creates a polynomial approximation of the code output q, as a function of the uncertain inputs \({\boldsymbol{\xi }}=({\xi }_{1},\cdots \ ,{\xi }_{d})\in {{\mathbb{R}}}^{d}\):

$$q({\boldsymbol{\xi }})\approx \tilde{q}({\boldsymbol{\xi }})=\mathop{\sum }\limits_{{j}_{1}=1}^{{m}_{1}}\cdots \mathop{\sum }\limits_{{j}_{d}=1}^{{m}_{d}}q({\xi }_{{j}_{1}},\cdots \ ,{\xi }_{{j}_{d}})\ {a}_{{j}_{1}}({\xi }_{1})\otimes \cdots \otimes {a}_{{j}_{d}}({\xi }_{d})$$

Here, \(\tilde{q}\) denotes the polynomial approximation of q, and \(q({\xi }_{{j}_{1}},\cdots \ ,{\xi }_{{j}_{d}})\) is the actual code output, evaluated at some location inside of the stochastic domain of \({\boldsymbol{\xi }}\in {{\mathbb{R}}}^{d}\). Each input ξi ∈ ξ is assigned an independent PDF p(ξi), and the goal is to propagate these through CovidSim to examine the corresponding distribution of the output q. The basic building blocks for the SC method are one-dimensional quadrature and interpolation rules, which are extended to higher dimension through a tensor-product construction. In equation (3), \({a}_{{j}_{1}}({\xi }_{1})\otimes \cdots \otimes {a}_{{j}_{d}}({\xi }_{d})\) is the tensor product of one-dimensional Lagrange interpolation polynomials, used to interpolate the code outputs \(q({\xi }_{{j}_{1}},\cdots \ ,{\xi }_{{j}_{d}})\) to a (potentially) unsampled location ξ. Unlike in the Monte Carlo method, the sample locations \(({\xi }_{{j}_{1}},\cdots \ ,{\xi }_{{j}_{d}})\) are not random. Instead, each \({\xi }_{{j}_{i}}\) is a point drawn from a one-dimensional quadrature rule, used to approximate integrals weighted by the chosen input distribution p(ξi). The order of the quadrature rule for the ith input determines the number of points mi, and due to the tensor product construction the total number of code evaluations for d inputs equals \(M={m}_{1}{m}_{2}\cdots {m}_{d}\), or \(M={m}^{d}\) if all inputs receive the same quadrature order (see Supplementary Fig. 1 for an example). The exponential increase with d, known as the curse of dimensionality, renders the SC method intractable beyond d ≈ 10. Hence, although our parameter analysis in the main article indicates that only roughly 6% of the inputs will be varied at some point, due to the large number of inputs this is still far too many for such brute-force UQ methods.
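The tensor-product construction of equation (3) can be sketched in a few lines. The example below builds a full-tensor Lagrange interpolant for a hypothetical two-input polynomial and counts the required model evaluations; the node locations and the toy model are assumptions made for illustration, and no quadrature weights or input PDFs are involved.

```python
import itertools
import math


def lagrange_basis(nodes, j, x):
    """Value at x of the j-th Lagrange polynomial on the given 1D nodes."""
    value = 1.0
    for k, node in enumerate(nodes):
        if k != j:
            value *= (x - node) / (nodes[j] - node)
    return value


def sc_surrogate(model, nodes_per_dim):
    """Build a full-tensor interpolant; returns (surrogate, n_evaluations)."""
    index_ranges = [range(len(nodes)) for nodes in nodes_per_dim]
    samples = {}
    for multi_j in itertools.product(*index_ranges):
        point = [nodes[j] for nodes, j in zip(nodes_per_dim, multi_j)]
        samples[multi_j] = model(point)  # one code evaluation per grid point

    def surrogate(xi):
        total = 0.0
        for multi_j, q_value in samples.items():
            # Product of one-dimensional Lagrange polynomials.
            weight = math.prod(
                lagrange_basis(nodes, j, x)
                for nodes, j, x in zip(nodes_per_dim, multi_j, xi)
            )
            total += q_value * weight
        return total

    return surrogate, len(samples)


def toy_model(xi):
    """Hypothetical polynomial model; NOT CovidSim."""
    return xi[0] ** 2 + xi[0] * xi[1]


nodes = [-1.0, 0.0, 1.0]
q_tilde, n_evals = sc_surrogate(toy_model, [nodes, nodes])
print("evaluations for d = 2:", n_evals)
print("evaluations for d = 10 with 3 nodes each:", 3 ** 10)
print("surrogate error at (0.3, -0.7):",
      abs(q_tilde([0.3, -0.7]) - toy_model([0.3, -0.7])))
```

With three nodes per input the interpolant reproduces this quadratic toy model up to rounding error, but the grid size grows as 3^d, which is the exponential cost discussed above.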

A dimension-adaptive version of the stochastic collocation method (based on the work of refs. 9,28) has therefore been implemented in EasyVVUQ. It is reasonable to expect that the output q will not be equally sensitive to each input ξi. Hence, although our input space is d-dimensional, a dimension-adaptive approach banks on the existence of a lower effective dimension. The basic idea is to start with a zeroth-order quadrature rule for all inputs, and to adaptively rank order the inputs, keeping all ineffective inputs at a low (possibly zeroth) order, while increasing the order of those that are effective (see Supplementary Fig. 1 for an example in two dimensions).

The dimension-adaptive approach is explained in detail in ref. 9; here we only provide a general outline. Let Λ be the set containing all selected quadrature-order multi-indices (the grey squares of Supplementary Fig. 1), which is initialized as Λ := {(0, …, 0)}. Let the forward neighbours of any multi-index l be defined by the set {l + ei | 1 ≤ i ≤ d}, where ei is the elementary basis vector in the ith direction, for example, e2 = (0, 1, 0, …, 0). The forward neighbours of the set Λ are then the forward neighbours of all l ∈ Λ that are not already in Λ. Similarly, the backward neighbours of l are given by {l − ei | li > 0, 1 ≤ i ≤ d}. An index set Λ is said to be admissible if the backward neighbours of every l ∈ Λ are also in Λ.
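The neighbour and admissibility definitions translate directly into code. The following sketch represents multi-indices as tuples of non-negative integers; the function names are chosen here for illustration and do not refer to the EasyVVUQ API.

```python
def forward_neighbours(index):
    """All multi-indices l + e_i for 1 <= i <= d."""
    return [
        tuple(level + (1 if k == i else 0) for k, level in enumerate(index))
        for i in range(len(index))
    ]


def backward_neighbours(index):
    """All multi-indices l - e_i for which l_i > 0."""
    return [
        tuple(level - (1 if k == i else 0) for k, level in enumerate(index))
        for i in range(len(index))
        if index[i] > 0
    ]


def is_admissible(index_set):
    """True if every backward neighbour of every member is also a member."""
    return all(
        neighbour in index_set
        for index in index_set
        for neighbour in backward_neighbours(index)
    )


print(forward_neighbours((0, 0)))
print(backward_neighbours((2, 1)))
print(is_admissible({(0, 0), (1, 0)}))
print(is_admissible({(0, 0), (1, 1)}))
```

The second set is not admissible because (1, 1) has the backward neighbours (0, 1) and (1, 0), neither of which is in the set.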

To adaptively refine the sampling plan, a look-ahead step29 is executed, where the computational model is evaluated at the new unique sample locations generated by those forward neighbours l where Λ ∪ {l} remains an admissible set, corresponding to the × symbols of Supplementary Fig. 1. For each admissible forward neighbour l, a local error measure is computed. As proposed in ref. 10, we will base our error measure on the so-called hierarchical surplus, defined as the difference between the code output q and the surrogate prediction \(\tilde{q}\), evaluated at new sample locations of an admissible forward neighbour l,

$$s\left({{\boldsymbol{\xi }}}_{j}^{(l)}\right):= q\left({{\boldsymbol{\xi }}}_{j}^{(l)}\right)-{\tilde{q}}_{{{\Lambda }}}\left({{\boldsymbol{\xi }}}_{j}^{(l)}\right),\quad {{\boldsymbol{\xi }}}_{j}^{(l)}\in {X}_{l}\backslash {X}_{{{\Lambda }}}.$$

Here, XΛ is the sampling plan generated by the one-dimensional quadrature rules in Λ, and Xl is the sampling plan generated by Λ ∪ {l}. Furthermore, \({\tilde{q}}_{{{\Lambda }}}\) is the polynomial surrogate constructed from points in XΛ alone. A local error measure can now be defined as

$${\eta }^{(l)}:= \frac{1}{\#({X}_{l}\backslash {X}_{{{\Lambda }}})}\sum \nolimits_{{{\boldsymbol{\xi }}}_{j}^{(l)}\in {X}_{l}\backslash {X}_{{{\Lambda }}}}\parallel s\left({{\boldsymbol{\xi }}}_{j}^{(l)}\right)\parallel .$$

Note that other error measures, based on quadrature errors9,30, or Sobol sensitivity indices29 can also be defined. The admissible forward neighbour with the highest error measure η(l) is added to Λ, which can cause new forward neighbours to become admissible, and the algorithm repeats.
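The greedy structure of the refinement loop can be sketched independently of the model. In the example below the error measure is a placeholder function that decays geometrically with the quadrature order in each direction; it stands in for the hierarchical-surplus measure η(l), which in practice requires new model evaluations in the look-ahead step. All names and decay rates are hypothetical.

```python
def forward_neighbours(index):
    """All multi-indices l + e_i."""
    return [
        tuple(level + (1 if k == i else 0) for k, level in enumerate(index))
        for i in range(len(index))
    ]


def backward_neighbours(index):
    """All multi-indices l - e_i with l_i > 0."""
    return [
        tuple(level - (1 if k == i else 0) for k, level in enumerate(index))
        for i in range(len(index))
        if index[i] > 0
    ]


def admissible_candidates(index_set):
    """Forward neighbours that keep the index set admissible when added."""
    candidates = set()
    for index in index_set:
        for neighbour in forward_neighbours(index):
            if neighbour in index_set:
                continue
            if all(b in index_set for b in backward_neighbours(neighbour)):
                candidates.add(neighbour)
    return candidates


def refine(error_measure, d, n_iterations):
    """Greedy loop: repeatedly accept the candidate with the largest error."""
    index_set = {tuple([0] * d)}
    for _ in range(n_iterations):
        candidates = admissible_candidates(index_set)
        best = max(sorted(candidates), key=error_measure)
        index_set.add(best)
    return index_set


def placeholder_error(index, decay=(0.5, 0.01)):
    """Stand-in for eta(l): input 0 matters, input 1 barely does."""
    error = 1.0
    for level, rate in zip(index, decay):
        error *= rate ** level
    return error


print(sorted(refine(placeholder_error, d=2, n_iterations=3)))
```

With these decay rates the first three refinements all go to the first input, mirroring how the adaptive sampler spends its budget on the directions that matter most.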

Note that every index l = (l1, …, ld) ∈ Λ constitutes a separate tensor product of one-dimensional quadrature rules with orders given by l. Unlike the standard approach in equation (3), the SC expansion in the adaptive case is therefore constructed as a linear combination of tensor products, that is

$$q({\boldsymbol{\xi }})\approx \tilde{q}({\boldsymbol{\xi }})=\sum \nolimits_{{\bf{l}}\in {{\Lambda }}}{c}_{{\bf{l}}}\mathop{\sum }\limits_{{j}_{1}=1}^{{m}_{{l}_{1}}}\cdots \mathop{\sum }\limits_{{j}_{d}=1}^{{m}_{{l}_{d}}}q({{\boldsymbol{\xi }}}_{{\bf{j}}}^{({\bf{l}})})\ {a}_{{j}_{1}}^{({l}_{1})}({\xi }_{1})\otimes \cdots \otimes {a}_{{j}_{d}}^{({l}_{d})}({\xi }_{d}),$$

where \(q({{\boldsymbol{\xi }}}_{{\bf{j}}}^{({\bf{l}})})=q({\xi }_{{j}_{1}}^{({l}_{1})},\cdots \ ,{\xi }_{{j}_{d}}^{({l}_{d})})\), and \({m}_{{l}_{i}}\) is the number of points generated by a one-dimensional rule of order li. The coefficients cl are computed as

$${c}_{{\bf{l}}}=\mathop{\sum }\limits_{{k}_{1}=0}^{1}\cdots \mathop{\sum }\limits_{{k}_{d}=0}^{1}{\left(-1\right)}^{| {\bf{k}}{| }_{1}}\cdot \chi ({\bf{l}}+{\bf{k}}),\quad {\rm{where}}\quad \chi ({\bf{l}})=\left\{\begin{array}{ll}1&{\bf{l}}\in {{\Lambda }}\\ 0&{\rm{otherwise}}\end{array}\right.;$$

see ref. 28 for details.
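The combination coefficients can be evaluated directly from the index set. The sketch below is an illustration only, assuming that χ is the indicator function of membership in Λ, as in the standard combination technique; the function name is hypothetical.

```python
from itertools import product


def combination_coefficients(index_set):
    """Coefficient c_l for every l in the index set.

    c_l is the sum over k in {0, 1}^d of (-1)^{|k|_1} * chi(l + k), where
    chi(l + k) is 1 if l + k is in the index set and 0 otherwise.
    """
    index_set = set(index_set)
    d = len(next(iter(index_set)))
    coefficients = {}
    for l in index_set:
        c = 0
        for k in product((0, 1), repeat=d):
            shifted = tuple(li + ki for li, ki in zip(l, k))
            if shifted in index_set:
                c += (-1) ** sum(k)
        coefficients[l] = c
    return coefficients


# A small anisotropic-style set and a full tensor-product set in two dimensions.
sparse = combination_coefficients({(0, 0), (1, 0), (0, 1)})
full = combination_coefficients({(0, 0), (1, 0), (0, 1), (1, 1)})
print(sparse)
print(full)
```

For the full tensor-product set only the highest index (1, 1) keeps a non-zero coefficient, so the linear combination reduces to a single tensor product, in line with the standard, non-adaptive construction.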

As equation (6) consists of a linear combination of tensor products, the choice of the quadrature rule used to generate the one-dimensional points substantially affects the total number of code evaluations. It is common practice to select a nested rule, which has the property that a rule of a given order contains all points generated by that same rule at lower orders. When taking linear combinations of tensor products built from nested one-dimensional rules of different order, many points will overlap. This leads to a more efficient sparse sampling plan, especially in higher dimensions. For our calculations, we employ the well-known Clenshaw–Curtis quadrature rule (see, for example, ref. 10).
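The nestedness property can be checked numerically. The sketch below assumes a common growth rule for Clenshaw–Curtis points on [−1, 1], namely a single point at level 0 and 2^l + 1 points at level l > 0; this growth rule and the function names are assumptions made for illustration and are not necessarily the choices made in our software.

```python
import numpy as np


def clenshaw_curtis_points(level):
    """Clenshaw-Curtis points on [-1, 1] under the assumed growth rule."""
    if level == 0:
        return np.array([0.0])
    m = 2 ** level + 1
    return np.cos(np.pi * np.arange(m) / (m - 1))


def unique_points(levels):
    """Sorted unique points generated by the rules at the given levels."""
    points = set()
    for level in levels:
        # Rounding removes floating-point noise such as cos(pi / 2) != 0.
        points.update(np.round(clenshaw_curtis_points(level), 12).tolist())
    return sorted(points)


n_separate = sum(len(clenshaw_curtis_points(level)) for level in (0, 1, 2))
n_unique = len(unique_points([0, 1, 2]))
print(n_separate, n_unique)
```

Because every lower-level point reappears at the higher levels, the three rules together require only as many distinct evaluations as the finest rule alone.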

Ensemble execution

Consequently, through the use of adaptive methods we make the uncertainty analysis of CovidSim tractable, but our analysis nevertheless required us to perform thousands of runs, each with its own unique set of input parameters. Specifically, we used the Eagle supercomputer at the Poznan Supercomputing and Networking Center31, which has a track record of reliably supporting large ensemble calculations. The workflows associated with these UQ/SA procedures are large, multifaceted and iterative, and to handle and curate them efficiently, we rely on the FabSim3 automation toolkit26. FabSim3 allows us to capture commonly used workflow patterns in single-line bash commands, and it automatically captures all of the relevant input parameters, output data and variables of both the job submission environment and the local machine environment in which each simulation has been executed.

Sobol index calculation

Sobol indices are variance-based sensitivity measures of a function q(ξ) with respect to its inputs \({\boldsymbol{\xi }}\in {{\mathbb{R}}}^{d}\) (refs. 23,32). Let \({\mathbb{V}}\left[{q}_{{\bf{u}}}\right]\) be a so-called partial variance, where the multi-index u can be any subset of \({\mathcal{U}}:= \{1,2,\cdots \ ,d\}\). Each partial variance measures the part of the total variance in the output q that can be attributed to the input parameter combination indexed by u. The Sobol indices are defined as the normalized partial variances, that is

$${S}_{{\bf{u}}}:= \frac{{\mathbb{V}}\left[{q}_{{\bf{u}}}\right]}{{\mathbb{V}}[q]}$$

where \({\mathbb{V}}[q]={\sum }_{{\bf{u}}\subseteq {\mathcal{U}}}{\mathbb{V}}[{q}_{{\bf{u}}}]\) is the total variance of q (ref. 32). As all partial variances are non-negative and sum to the total variance, the sum of all possible Su equals 1.

To perform the Sobol sensitivity analysis, we employ the method described in ref. 24, which is an adaptation of a method originally proposed in ref. 33. The general idea is to transform the adaptive SC expansion into a polynomial chaos expansion (PCE) to facilitate the computation of the Sobol indices. The PCE equivalent of equation (3) reads

$$q({\boldsymbol{\xi }})\approx \tilde{q}({\boldsymbol{\xi }})=\sum \nolimits_{{\bf{k}}\in {\mathcal{K}}}{\eta }_{{\bf{k}}}\ {\phi }^{({k}_{1})}({\xi }_{1})\otimes \cdots \otimes {\phi }^{({k}_{d})}({\xi }_{d})$$

Here, the basis functions ϕk are usually constructed to be orthonormal to the input density, and the response coefficients ηk are normally computed via a spectral projection technique or via a regression method. Unlike equation (3), summation does not take place over the collocation points ξj. It instead takes place over multi-indices \({\bf{k}}=({k}_{1},\cdots \ ,{k}_{d})\in {\mathcal{K}}\), determined by a selected truncation scheme (see below). The PCE method is a well-known technique; please refer to refs. 34,35 for more details.

The PCE method is particularly suited for sensitivity analysis, as the Sobol indices can be calculated from the response coefficients ηk in a post-processing procedure36. The PCE mean and variance (when the ϕk are orthonormal), are given by34

$${\mathbb{E}}\left[\tilde{q}\right]={\eta }_{{\bf{0}}}\quad {\rm{and}}\quad {\mathbb{V}}\left[\tilde{q}\right]=\sum \nolimits_{\begin{array}{c}{\bf{k}}\in {\mathcal{K}}\\ {\bf{k}}\ne {\bf{0}}\end{array}}{\eta }_{{\bf{k}}}^{2}$$

Similarly, the partial variances can be computed with

$${\mathbb{V}}\left[{\tilde{q}}_{{\bf{u}}}\right]=\sum \nolimits_{{\bf{k}}\in {{\mathcal{K}}}_{{\bf{u}}}}{\eta }_{{\bf{k}}}^{2}\quad {\rm{where}}\quad {{\mathcal{K}}}_{{\bf{u}}}=\{{\bf{k}}\in {\mathcal{K}}\ |\ {k}_{i}>0\ {\rm{when}}\ i\in {\bf{u}},\ \ {k}_{j}=0\ {\rm{when}}\ j\notin {\bf{u}}\}.$$

The multi index set \({{\mathcal{K}}}_{{\bf{u}}}\) can be interpreted as the set of all multi indices corresponding to varying only the inputs indexed by u. That is, if, for instance, u = (1, 3), \({{\mathcal{K}}}_{{\bf{u}}}\) is the subset of \({\mathcal{K}}\), with all indices k where k1 > 0 and k3 > 0, with all other kj = 0. Note that with equations (10) and (11), the Sobol indices in equation (8) are readily available, provided we have the PCE coefficients ηk.
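As an illustration of this post-processing step, the sketch below computes all Sobol indices from a dictionary of PCE coefficients, assuming an orthonormal basis. The function name and the coefficient values are hypothetical.

```python
def sobol_indices(pce_coefficients):
    """Sobol indices S_u from PCE coefficients of an orthonormal basis.

    pce_coefficients maps each multi-index k (a tuple) to its coefficient eta_k.
    Each k contributes eta_k**2 to the partial variance of the input set
    u = {i : k_i > 0}, using 1-based input labels as in the text.
    """
    total_variance = sum(eta ** 2 for k, eta in pce_coefficients.items() if any(k))
    partial_variances = {}
    for k, eta in pce_coefficients.items():
        u = tuple(i + 1 for i, k_i in enumerate(k) if k_i > 0)
        if u:  # skip k = 0, which only carries the mean
            partial_variances[u] = partial_variances.get(u, 0.0) + eta ** 2
    return {u: v / total_variance for u, v in partial_variances.items()}


# Hypothetical coefficients for a model with two inputs.
coefficients = {(0, 0): 2.0, (1, 0): 3.0, (0, 2): 4.0, (1, 1): 0.0}
indices = sobol_indices(coefficients)
print(indices)
```

Here the mean is 2, the total variance is 3² + 4² = 25, and the first-order indices of the two inputs are 9/25 and 16/25. The sketch assumes a non-zero total variance.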

To compute the PCE coefficients from our anisotropic sparse grid, we can transform the Lagrange basis to a PCE basis on the level of the one-dimensional basis functions24. Applying this transformation \({\mathcal{T}}\) to equation (6) yields

$${\mathcal{T}}\left[\tilde{q}\right]=\sum \nolimits_{{\bf{l}}\in {{\Lambda }}}{c}_{{\bf{l}}}\ {\mathcal{T}}\left[\mathop{\sum }\limits_{{j}_{1}=1}^{{m}_{{l}_{1}}}\cdots \mathop{\sum }\limits_{{j}_{d}=1}^{{m}_{{l}_{d}}}q({{\boldsymbol{\xi }}}_{{\bf{j}}}^{({\bf{l}})})\ {a}_{{j}_{1}}^{({l}_{1})}({\xi }_{1})\otimes \cdots \otimes {a}_{{j}_{d}}^{({l}_{d})}({\xi }_{d})\right],$$

and so we have to apply the transformation separately to each tensor product. Equating a tensor product of equation (12) to a corresponding PCE expansion in equation (9) yields

$$\mathop{\sum }\limits_{{j}_{1}=1}^{{m}_{{l}_{1}}}\cdots \mathop{\sum }\limits_{{j}_{d}=1}^{{m}_{{l}_{d}}}q({{\boldsymbol{\xi }}}_{{\bf{j}}}^{({\bf{l}})})\ {a}_{{j}_{1}}^{({l}_{1})}({\xi }_{1})\otimes \cdots \otimes {a}_{{j}_{d}}^{({l}_{d})}({\xi }_{d})=\sum \nolimits_{{\bf{k}}\in {{{\Lambda }}}_{{\bf{l}}}}{\eta }_{{\bf{k}}}^{({\bf{l}})}\ {\phi }^{({k}_{1})}({\xi }_{1})\otimes \cdots \otimes {\phi }^{({k}_{d})}({\xi }_{d}),$$

where the PCE truncation is \({{{\Lambda }}}_{{\bf{l}}}:= \{{\bf{k}}\ |\ {\bf{k}}\le {\bf{l}}\}\) (ref. 24). By using the orthogonality property of the PCE basis functions (and the independence of the input distributions), we can find an expression for each coefficient \({\eta }_{{\bf{k}}}^{({\bf{l}})}\) as

$${\eta }_{{\bf{k}}}^{({\bf{l}})}=\mathop{\sum }\limits_{{j}_{1}=1}^{{m}_{{l}_{1}}}\cdots \mathop{\sum }\limits_{{j}_{d}=1}^{{m}_{{l}_{d}}}q({{\boldsymbol{\xi }}}_{{\bf{j}}}^{({\bf{l}})})\ {\nu }_{{k}_{1}}^{({l}_{1},{j}_{1})}\otimes \cdots \otimes {\nu }_{{k}_{d}}^{({l}_{d},{j}_{d})},$$

where each univariate transformation coefficient \({\nu }_{{k}_{i}}^{({l}_{i},{j}_{i})}\) is given by

$${\nu }_{{k}_{i}}^{({l}_{i},{j}_{i})}=\int {a}_{{j}_{i}}^{({l}_{i})}({\xi }_{i})\ {\phi }_{{k}_{i}}({\xi }_{i})\ p({\xi }_{i})\ {\rm{d}}{\xi }_{i},\quad i=1,\cdots \ ,d.$$

This is integrated over the support of p(ξi) using Gaussian quadrature. To generate the orthonormal \({\phi }_{{k}_{i}}\) we use the Chaospy package37.
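A one-dimensional transformation coefficient can be sketched as follows. This is an illustration under stated assumptions: the input is taken to be uniform on [−1, 1], so that p(ξ) = 1/2 and the orthonormal basis consists of scaled Legendre polynomials, and NumPy is used in place of Chaospy to keep the example self-contained. All function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def lagrange_basis(nodes, j, x):
    """Lagrange polynomial a_j(x) that is 1 at nodes[j] and 0 at the other nodes."""
    x = np.asarray(x, dtype=float)
    value = np.ones_like(x)
    for m, node in enumerate(nodes):
        if m != j:
            value *= (x - node) / (nodes[j] - node)
    return value


def orthonormal_legendre(k, x):
    """Legendre polynomial of degree k, orthonormal w.r.t. the density p = 1/2."""
    unit = np.zeros(k + 1)
    unit[k] = 1.0
    return np.sqrt(2 * k + 1) * legendre.legval(x, unit)


def transformation_coefficient(nodes, j, k):
    """Integral of a_j * phi_k * p over [-1, 1] by Gauss-Legendre quadrature."""
    # The integrand has degree len(nodes) - 1 + k, so this rule is exact.
    x, w = legendre.leggauss(len(nodes) + k)
    return float(np.sum(w * lagrange_basis(nodes, j, x)
                        * orthonormal_legendre(k, x) * 0.5))


# Project the interpolant of q(xi) = xi on three nodes onto phi_0 and phi_1.
nodes = [-1.0, 0.0, 1.0]
eta_0 = sum(node * transformation_coefficient(nodes, j, 0)
            for j, node in enumerate(nodes))
eta_1 = sum(node * transformation_coefficient(nodes, j, 1)
            for j, node in enumerate(nodes))
print(eta_0, eta_1)
```

For this example the resulting coefficients reproduce the mean of ξ (zero) and its variance η1² = 1/3, as expected for a uniform input on [−1, 1].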

Once in possession of the \({\eta }_{{\bf{k}}}^{({\bf{l}})}\), we can compute the statistics and the Sobol indices corresponding to an adaptive sparse grid. The first two moments are given by

$${\mathbb{E}}\left[\tilde{q}\right]=\sum \nolimits_{{\bf{l}}\in {{\Lambda }}}{c}_{{\bf{l}}}\cdot {\eta }_{{\bf{0}}}^{({\bf{l}})}\quad {\rm{and}}\quad {\mathbb{V}}\left[\tilde{q}\right]=\sum \nolimits_{\begin{array}{c}{\bf{l}}\in {{\Lambda }}\\ {\bf{l}}\ne {\bf{0}}\end{array}}{\left[\sum \nolimits_{{\bf{k}}\in {{\mathcal{K}}}_{{\bf{l}}}}{c}_{{\bf{k}}}{\eta }_{{\bf{l}}}^{({\bf{k}})}\right]}^{2}$$

where \({{\mathcal{K}}}_{{\bf{l}}}:= \{{\bf{k}}| {\bf{l}}\in {{{\Lambda }}}_{{\bf{k}}},\forall {\bf{k}}\in {{\Lambda }}\}\). The expression for the variance is obtained by (1) inserting equation (6) into \({\mathbb{E}}\left[{\tilde{q}}^{2}\right]-{\mathbb{E}}{\left[\tilde{q}\right]}^{2}\); (2) grouping all terms with like k in \({\mathbb{E}}\left[{\tilde{q}}^{2}\right]\), which is what \({{\mathcal{K}}}_{{\bf{l}}}\) indicates; and (3) using the orthogonality of the ϕk to remove all cross terms ϕkϕj, j ≠ k. The statistics in equation (16) represent a more general version than those given in equation (10), and will revert to these equations when given the combination coefficients cl corresponding to a standard, non-adaptive SC grid. The partial variances \({\mathbb{V}}\left[{\tilde{q}}_{{\bf{u}}}\right]\), and by extension the Sobol indices, are computed in the same way as before, namely by summing individual variance contributions indexed by the set \({{\mathcal{K}}}_{{\bf{u}}}\) shown in equation (11).

Uncertainty amplification factor

The aim here is to find a robustness score of a computational model under uncertainty in the input parameters. A simple (dimensionless) measure for variability in some random variable X is the coefficient of variation (CV), which is defined as the standard deviation over the mean, that is

$${\mathrm{CV}}(X)=\frac{{\sigma }_{X}}{{\mu }_{X}},\quad {\rm{if}}\ \ {\mu }_{X}\ne 0.$$

Any forward uncertainty propagation method approximates the first two moments of the output \(q\in {{\mathbb{R}}}^{N}\), and so \({\mathrm{CV}}(q)\in {{\mathbb{R}}}^{N}\) is readily available. Assuming we can (analytically) compute the first two moments of each input ξi ∈ ξ, i = 1, …, d, \({\mathrm{CV}}({\xi }_{i})\in {\mathbb{R}}\) is also easily computed. Although ξ may contain inputs defined on vastly different scales, this does not pose a problem because the CV is a dimensionless quantity. We propose to use the ratio of CV(q) and CV(ξ) as a relative measure of variability between the input and the output. To do so we first have to account for the fact that in general, d ≠ N. Here we choose to average over all points:

$${\mathrm{CVR}}:= {\mathrm{CV}}\left(\bar{q}\right)/{\mathrm{CV}}\left(\bar{{\boldsymbol{\xi }}}\right)=\left(\frac{1}{N}\mathop{\sum }\limits_{n=1}^{N}\frac{{\sigma }_{{q}_{n}}}{{\mu }_{{q}_{n}}}\right)\ /\ \left(\frac{1}{d}\mathop{\sum }\limits_{i=1}^{d}\frac{{\sigma }_{{\xi }_{i}}}{{\mu }_{{\xi }_{i}}}\right).$$

The basic idea of equation (18) is to say something about the robustness of the code to input uncertainty, given the fact that in all likelihood the choice of input distributions can be at least partly ambiguous. We have, for instance, prescribed an input distribution for the relative household contact rate after closure, with end points located at 20% of the default value (see Supplementary Table 1). Although this was within the range suggested by expert opinion, the number of 20% is still just a user-specified choice, and it might as well have been for instance 15%. It therefore makes sense to look at the relative input-to-output uncertainty; thus, when given a user-specified average input perturbation of say 20% (\({\mathrm{CV}}(\bar{{\boldsymbol{\xi }}})=0.2\)), equation (18) tells us to what extent the code (which is a nonlinear mapping from the input to the output) amplifies this assumed uncertainty. Relative damping of uncertainty is also possible, corresponding to CVR < 1.
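The ratio is simple to evaluate once the output and input moments are available. The following sketch uses hypothetical moment values and a hypothetical function name, and assumes that all means are non-zero.

```python
import numpy as np


def cv_ratio(q_mean, q_std, xi_mean, xi_std):
    """Ratio of the averaged output CV to the averaged input CV.

    q_mean and q_std hold the N output moments (for example, one per time
    point); xi_mean and xi_std hold the moments of the d uncertain inputs.
    """
    cv_q = np.mean(np.asarray(q_std, dtype=float) / np.asarray(q_mean, dtype=float))
    cv_xi = np.mean(np.asarray(xi_std, dtype=float) / np.asarray(xi_mean, dtype=float))
    return float(cv_q / cv_xi)


# Hypothetical moments: N = 2 output points and d = 2 inputs with CV = 0.2 each.
cvr = cv_ratio(q_mean=[100.0, 200.0], q_std=[30.0, 80.0],
               xi_mean=[1.0, 10.0], xi_std=[0.2, 2.0])
print(cvr)
```

In this example the averaged output CV is 0.35 and the averaged input CV is 0.2, giving a ratio of 1.75, which would indicate amplification of the assumed input uncertainty.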