## Main

Since the global outbreak of the SARS-CoV-2 virus, most countries have used a combination of non-pharmaceutical interventions (NPIs) to limit the rate of spread, flatten the curve and protect high-risk populations. Top-down policy interventions were supplemented with bottom-up precautionary measures taken by citizens in response to rising numbers of confirmed cases and deaths in their regions1. Despite a nearly universal reaction to the spread of the virus, countries (or regions) have had varying levels of success in containing the pandemic and limiting the rate of infection in the population2. Understanding the reasons for these variations is crucial to design and evaluate NPI policies, to quantify the trade-offs related to their timing and stringency, and to identify the extent to which policies that proved successful in one state or country can be adopted by others with little modification. Extant studies attribute such differences to factors such as the timing and stringency of social distancing policies3,4,5,6, behavioral factors such as mask-wearing practices7,8, the timing and intensity of voluntary responses8,9, the quality of public health infrastructure and testing10,11, geographical location6,12,13, and demographic composition (age structure, household distribution, ethnicity, gender, income and so on)8,10,14,15,16,17,18. These studies explain a substantial portion of the differences in success across countries, but some differences remain unaccounted for. Socio-cultural differences and variations in social interaction norms are sometimes cited as important contributors, as they can affect how governments design policies and how people respond to them19,20. Although behavioral and norm-related factors play a role in several of the factors described above, in this paper we focus on variations in the structure of social interaction across countries.

The importance of micro-level social interaction structures has previously been demonstrated in the epidemiology and network science literature21,22,23,24. More recently, micro-level network structures have been used to study tuberculosis transmission in South Africa25, the spread of HIV in Africa26, the efficacy of contact tracing for controlling COVID-1927,28, and the impact of contact patterns on the COVID-19 outbreak in China29. Recent studies have also provided empirical evidence of the importance of interdependencies and social networks in the transmission of COVID-1913. However, most of these studies do not present a model that is simultaneously data-driven and explainable, a combination that is essential for counterfactual analysis, which is at the heart of policy design. Moreover, few studies have modeled the effect of micro-level structural factors on the success of different countries.

Developing such explainable, data-driven models is challenging for two reasons. First, high-resolution data on relevant social interactions are scarce and difficult to acquire. For the spread of SARS-CoV-2, such data are particularly important, as infection is driven by in-person interactions lasting longer than several minutes (15 min, according to the Centers for Disease Control and Prevention definition of close contact). Geolocation data and mobility networks are sometimes used as proxies for such in-person interactions10,30,31. However, the same mobility network may correspond to substantially different patterns of in-person interaction. Alternatively, we can use aggregate data on in-person interactions and rely on national and regional averages, which are either already available or easier to obtain. It is, however, uncertain whether such low-resolution data can be useful in developing policy evaluation models. In this paper we show that such data can still be useful if used within the right modeling framework.

Second, even when high-resolution data are available, incorporating them into a dynamic explanatory model is difficult. These data can substantially improve the accuracy of machine learning models in predicting infection cases32,33,34. However, such models are better suited to situations where prediction, rather than intervention, is the primary goal, as the inclusion of high-resolution data increases model complexity and hinders their use for scenario analyses. Explainability stands in contrast to black-box thinking; it is the ability to understand and interpret the predictions of a computational model and to provide a causal understanding that links model inputs to prediction outcomes35. The theoretical foundation for trade-offs between accuracy and explainability is still a topic of debate and research, but most scholars have offered justifications and evidence for such trade-offs36,37. Without going into the details of these arguments, we stress that explainability is crucial when models are used to identify intervention trade-offs through what-if scenario studies, in which black-box predictability can be misleading.

In this paper we aim to study the effect of variations in micro-level social interaction structures on the differences in countries’ success in containing a pandemic, and to create an explainable model that can be used for high-level policy evaluations. The limited data and computational challenges described above led us to develop data-driven, scaled notional networks as a middle-ground approach between models based on abstract complex networks and large networks based on dynamic high-resolution data. Using a small set of network agents and relying on national averages (as opposed to entire distributions) both make the stochastic simulations of the SEIR models more efficient. To isolate the role of micro-level social interaction structures and minimize variation associated with cultural factors such as social norms or trust in government, we focus on a group of European countries that are relatively similar in some socioeconomic aspects (compared with the overall variation across the globe), but that differ sufficiently in micro-level social structures and have experienced varying levels of success in containing the pandemic. The central question of this paper is therefore: all else being equal, and with the timing and stringency of social distancing measures accounted for in the models, how much of the variation in the success of these countries can be explained by micro-level social interaction parameters?

The success of social distancing policies is largely determined by how much the physical contact network changes from before to after the intervention. We expect the magnitude of this change to be driven by a variety of factors; however, two micro-level structural factors stand out. First, we anticipate that average household size will play an important role, as within-household contacts are unlikely to change much under social distancing policies. The role of household size in shaping the dynamics of the COVID-19 pandemic has previously been examined using both observational and modeling methods15,16,18. Second, we expect the number of close in-person contacts to be important: in addition to shaping the dynamics of the disease before policies are implemented, it can substantially alter the difference between the networks before and after the policies take effect.

To build an explainable computational model, we first present an algorithm that builds a scaled, computationally efficient person-to-person interaction network for each country, based on two micro-level social interaction measures: the average household size and the average number of contacts. We then use a network-based SEIR model (see Methods) to simulate the disease dynamics for each country during the early stage of the COVID-19 pandemic. We incorporate the effect of social distancing policies based on the pandemic state and the stringency level of policies in different countries. All models are built using available national averages, and no additional model parameters or calibration processes are used. Crucially, our model does not adjust any parameters to match the simulated results with real data on the number of COVID-19 cases. This is a deliberate simplification that sacrifices some predictive power compared with more complex models. In exchange, it lets us illustrate how top-down policies interact with micro-level structural parameters and allows for counterfactual analysis. We also demonstrate that this simple model has sufficient predictive power to be validated against data from the first wave of the pandemic.

To validate our model, we used data from the first wave of the pandemic, the only wave during which most countries switched from having no social distancing policies to having some level of such policies. Focusing on the first wave also allowed us to control model complexity without adding parameters to account for heterogeneity in how countries reversed NPIs, variations in compliance due to social norms, and, later on, different vaccination policies. The results show that our model can reproduce the relative success of COVID-19 social distancing policies without any parameter tuning or post-simulation calibration. Based on simulations for these countries, we find that very strict policies are not always required to control the pandemic, and that somewhat weaker policies can also work, depending on two micro-level structural factors. The model thus suggests a possible strategy for designing social distancing policies for governments that have difficulty adopting high-stringency policies, or that are trying to balance the effects of the pandemic against other political and economic factors.

## Results

### Methods overview

Figure 1 shows the overall modeling strategy, which consists of four steps. First, we used social interaction data to build a representative scaled-down network for each country. The country-specific social interaction data include the average household size and the average daily number of close (person-to-person) contacts, both of which are crucial in shaping the rate of virus spread (see Methods for more details). We built this scaled-down network using a transformation that takes these two data points as input and generates a representative small-world network with a set of parameters specific to each country. Figure 2 presents a sample visual representation that illustrates the four-step network generation process more directly. In the second step, we modeled the spread of the virus on these representative networks using a compartmental model (SEIR), implemented with algorithms developed for efficient simulation on complex network structures. Third, we incorporated the effect of social distancing policies in terms of stringency and timing (both taken from real data with no calibration) into the model. Together, these three steps generated the trajectory of total confirmed cases for each country, taking into account the combination of micro-level network parameters and policy timing and stringency factors (fourth step). The model involves no adjustment of input parameters, and no error-minimization calibration was used to match the simulated results with real data on the number of COVID-19 cases.
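To make the pipeline concrete, here is a minimal sketch of the four steps in Python with networkx. It is an illustration under stated assumptions, not the authors' implementation: it assumes a discrete-time stochastic SEIR process with a daily per-contact transmission probability `beta`, daily E-to-I and I-to-R transition probabilities, and a single-shot policy at day `t_sd` that removes each non-household link with probability equal to the policy stringency. Households are hypothetically approximated as blocks of consecutive node indices, which are mostly adjacent in the Watts-Strogatz ring lattice. The function name `simulate_seir` and all default parameter values are placeholders.

```python
import random

import networkx as nx


def simulate_seir(n=2000, m=8, p=0.1, household=3, beta=0.05,
                  incubation=0.2, recovery=0.1, t_sd=20, stringency=0.9,
                  days=60, n_seed=5, seed=1):
    """Return cumulative confirmed cases per day (hypothetical sketch)."""
    rng = random.Random(seed)
    # Step 1: scaled-down small-world contact network (N, M, P).
    g = nx.watts_strogatz_graph(n, m, p, seed=seed)
    state = {v: "S" for v in g}
    for v in rng.sample(list(g), n_seed):
        state[v] = "I"
    total, cumulative = n_seed, []
    for day in range(days):
        # Step 3: single-shot social distancing at day t_sd. Links inside a
        # (hypothetical) household block of consecutive indices are kept.
        if day == t_sd:
            removed = [(u, v) for u, v in g.edges()
                       if u // household != v // household
                       and rng.random() < stringency]
            g.remove_edges_from(removed)
        # Step 2: synchronous SEIR update on the current network.
        new_state = dict(state)
        for v, s in state.items():
            if s == "S":
                k = sum(state[u] == "I" for u in g.neighbors(v))
                # Infected if at least one of k infectious contacts transmits.
                if k and rng.random() < 1 - (1 - beta) ** k:
                    new_state[v] = "E"
            elif s == "E" and rng.random() < incubation:
                new_state[v] = "I"
                total += 1  # count onset of infectiousness as a confirmed case
            elif s == "I" and rng.random() < recovery:
                new_state[v] = "R"
        state = new_state
        # Step 4: record the cumulative trajectory.
        cumulative.append(total)
    return cumulative


if __name__ == "__main__":
    early = simulate_seir(t_sd=10)
    late = simulate_seir(t_sd=25)
    print("early:", early[-1], "delayed:", late[-1])
```

Comparing the early (`t_sd=10`) and delayed (`t_sd=25`) runs mirrors, in miniature, the timing scenarios in the Policy implications section; because each run is stochastic, conclusions should rest on averages over many seeds rather than a single pair of runs.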

The model was validated using ten European countries, divided into two sets primarily on the basis of the source of micro-level social structure data. In addition to using the model to reproduce pandemic trends, we performed a series of counterfactual and scenario-based studies to isolate the role of each of these micro-level factors and their interactions with policy parameters. The results were also further validated using an array of robustness tests.

### Main results

We simulated the model on the corresponding small-world network of each country (ten European countries in total, divided into two sets) for the first wave of the pandemic. Our primary goal was to validate the model's ability to reconstruct the relative trends and ordinal ranking of these countries with respect to their case numbers at different points in time, before and after the implementation of social distancing policies, during the first two months, when the first wave was still rising.

In Fig. 3 we show the simulated cumulative confirmed cases for the first set of five countries (Fig. 3b), next to the real cumulative confirmed cases per million for the same countries (Fig. 3a)38. These five countries are Germany, the United Kingdom, the Netherlands, Italy and Belgium, for which the national average rate of physical interaction was available from survey data. The model reproduces the relative order of disease propagation over the study period. First, the relative order of the simulated final numbers of infected cases matches the real data. More importantly, the simulated results reconstruct some of the interesting dynamics during the transition from the outset of the pandemic to the end of the two months. For instance, the model correctly predicts that Belgium overtakes Italy, which previously ranked first in cases among the five countries, about a month after the start of the epidemic. This dynamic preservation of relative order is also evident in a more nuanced way: the per-capita cases in Germany, the Netherlands and the United Kingdom closely track each other but exhibit a few notable crossing points in the second month, which the model also replicates. Finally, although the goal of the model was to reconstruct the ordinal ranking, comparing the final numbers of cases in Fig. 3a,b suggests that the model also has decent predictive power for the ratios of cases among countries and achieves a relatively low error rate (simulation versus data), especially given its simplicity (see Supplementary Section 9 for a discussion of the simple match score). The average relative error across all countries is below 10%.
In a sense, a model that dynamically preserves the ordinal ranking of different trajectories can be expected, as a corollary, to also replicate the ratios between pairs of point estimates with some accuracy. Although this is not the subject of this paper, these encouraging observations warrant further exploration in future studies.
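The two evaluation criteria used here, agreement in ordinal ranking over time and the average relative error of final case counts, can be sketched as follows. This is one plausible reading of those criteria, not the exact match score defined in Supplementary Section 9; the function names are hypothetical, and the country labels and trajectories are toy values, not the paper's data.

```python
def ranking_at(series, day):
    """Countries ordered from most to fewest cumulative cases on `day`."""
    return sorted(series, key=lambda c: series[c][day], reverse=True)


def ranking_agreement(simulated, observed):
    """Fraction of days on which the simulated ordinal ranking matches the data."""
    days = len(next(iter(observed.values())))
    hits = sum(ranking_at(simulated, d) == ranking_at(observed, d)
               for d in range(days))
    return hits / days


def mean_relative_error(simulated, observed, day=-1):
    """Average over countries of |sim - obs| / obs on `day` (default: last day)."""
    errors = [abs(simulated[c][day] - observed[c][day]) / observed[c][day]
              for c in observed]
    return sum(errors) / len(errors)


# Toy cumulative-case trajectories; purely illustrative.
observed = {"A": [1, 4, 9], "B": [2, 3, 5], "C": [0, 1, 2]}
simulated = {"A": [1, 5, 10], "B": [2, 3, 6], "C": [0, 1, 2]}
print(ranking_agreement(simulated, observed))              # 1.0
print(round(mean_relative_error(simulated, observed), 4))  # 0.1037
```

Ranking agreement rewards getting crossing points (such as Belgium overtaking Italy) right on the correct days, whereas the relative error only inspects the final values; the two metrics therefore capture different aspects of fit.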

Further validation of the model with other sets of countries is challenging because of the lack of data on national face-to-face contact rates; there are also questions regarding the reliability of confirmed case counts in many countries in the early days of the pandemic (see Supplementary Section 8 for more information). With these data issues in mind, we ran the simulation for a second set of countries from Central and Eastern Europe (Hungary, Austria, Poland, Slovenia and Slovakia), using face-to-face contact rates estimated by a trained model rather than the more reliable survey-based data used for the first set (the results are shown in Fig. 4). As expected, the prediction accuracy for the second set was lower than for the first (an error of 10% for the first set versus 15% for the second, after 60 days); nevertheless, the model still replicated the ordinal ranking of confirmed cases during the initial wave (see Supplementary Section 9 for details on calculating the error rate and country-specific error contributions).

The model also helps reveal the effect of different micro-level parameters and their interactions with the timing and stringency of macro-level policies. As expected, a denser contact pattern (greater mean reported contacts, M) generally increases the rate of virus spread. However, for a fixed average number of contacts, we expect the ratio of familial to long-range contacts (that is, strong to weak ties) also to play a role: in the absence of any NPI (and also when an NPI is lenient or implemented too late), a higher share of long-range contacts (which translates into fewer familial contacts) imposes a higher risk on the network. In the next section we further explore the relative roles of strong (within-household) and weak contacts, and how these roles depend on the timing and stringency of NPIs. The simulated trajectory of each country is thus determined by the combination of four country-specific parameters (mean reported contact number, average household size, policy stringency index and policy implementation time), while the virus’s transmission rate is the same for all countries. In other words, the relative order of total confirmed cases in these five countries is approximately reproduced using only these four parameters, all of which are obtained by scaling the real data with a systematic scaling method (as opposed to calibration) and the same scaling factors for all countries.

### Model validation with longer periods

Although the model is primarily designed for the first 60 days of the first wave, we extended the period to up to 120 days. The simulated confirmed cases at 90 days (until 1 June 2020) and 120 days (until 1 July 2020) show general agreement with the real data (see Supplementary Figs. 8 and 9), but the model errors rise to 16.72% and 22.66%, respectively, for the first set of countries, and to 27% and 31%, respectively, for the second set (see Supplementary Section 9).

We believe there are two reasons for this decline in accuracy. The first is variation in the level of voluntary social distancing across these countries, especially once the relatively homogeneous response of the early days gives way to more heterogeneous reversals of peak-time behavior. The second is the way we model social distancing policies, that is, with a single-shot approximation, which differs from the multi-stage implementation of such policies in reality.

As for the first factor, one could incorporate variations in voluntary social distancing into the network by empirically separating the effect of policy from voluntary behavioral change39. This would improve the longer-term predictive power of the model, but it would add layers of complexity that fall outside the scope of this paper. Whereas including voluntary social distancing requires additional parameters and assumptions, a multi-stage implementation of policies could fit within the current framework. The single-shot implementation, however, is a deliberate choice to keep the model simple and to show the power of this simple model over the study period. Naturally, this strategy becomes less effective if we extend the period past the first peak, into periods in which countries (heterogeneously) start rolling back their policies, which would require more parameters per country to capture the timing and degree of NPI relaxation. Once again, we opt for a shorter validation period to keep the number of model parameters small and to avoid the unintended calibration that a larger number of degrees of freedom could introduce.
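The difference between the single-shot approximation and a multi-stage policy can be expressed as two stringency schedules. This is a minimal sketch; the function names and the example stages are hypothetical, and a simulator would query the schedule each day and remove or restore links to match the returned stringency.

```python
def single_shot(day, t_start, stringency):
    """Step function: no policy before t_start, constant stringency afterwards."""
    return stringency if day >= t_start else 0.0


def multi_stage(day, stages):
    """Piecewise-constant schedule; `stages` is a list of (start_day, stringency)
    pairs sorted by start_day, and later stages override earlier ones."""
    level = 0.0
    for start, stringency in stages:
        if day >= start:
            level = stringency
    return level


# Hypothetical schedule: partial measures, full lockdown, then partial relaxation.
stages = [(10, 0.5), (20, 0.9), (60, 0.4)]
print([multi_stage(d, stages) for d in (5, 15, 30, 70)])  # [0.0, 0.5, 0.9, 0.4]
print([single_shot(d, 20, 0.9) for d in (5, 15, 30, 70)])  # [0.0, 0.0, 0.9, 0.9]
```

The single-shot form needs two numbers per country (timing and stringency), while each additional stage adds two more; this is the growth in degrees of freedom that the shorter validation period avoids.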

### Robustness tests

We performed a series of robustness tests to check some of the underlying assumptions of the model and to test its sensitivity to variations in some of the key parameters. Most of the details of these tests are presented in Supplementary Section 6. Here we briefly discuss the implications of each test.

One of the key input parameters of our model is the average number of close contacts (M). The survey-based data used in this model may have introduced systematic biases into the reported M values; we therefore performed a series of simulations to test the sensitivity of the model to moderate variations in the absolute M value of each country while keeping the relative M values fixed. As the results demonstrate (see Supplementary Section 6.1), the model is robust to variations of ±20% in M, and still performs acceptably even when M has an error of up to +50%. We also examined the sensitivity of the model to the infection rate (β) and found that the results are robust to moderate variations in the rate of infection (Supplementary Section 6.2).
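The M-sensitivity test amounts to multiplying every country's mean degree by a common factor, which shifts absolute values while approximately preserving their ratios. A minimal sketch, assuming hypothetical country labels and base degrees (not the paper's values) and a hypothetical helper `scaled_degrees`:

```python
BASE_M = {"A": 8, "B": 12, "C": 20}  # hypothetical mean degrees


def scaled_degrees(base_m, factor):
    """Scale every mean degree by the same factor and round to an integer."""
    # max(2, ...) keeps at least one ring neighbor on each side of every node.
    return {c: max(2, round(m * factor)) for c, m in base_m.items()}


for factor in (0.8, 1.0, 1.2, 1.5):  # -20%, baseline, +20%, +50%
    scaled = scaled_degrees(BASE_M, factor)
    order = sorted(scaled, key=scaled.get)
    print(factor, scaled, order)
```

Each scaled set of degrees would then be fed to the network construction and SEIR simulation, and the resulting country rankings compared with those obtained at the baseline factor.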

### Policy implications

To determine the policy implications of these findings, we conducted a series of counterfactual analyses and scenario studies using the proposed modeling framework. First, we ran several counterfactual analyses for all countries in the first set and verified that it is in fact the interaction of policy factors with micro-level parameters, rather than the policy parameters or the micro-level structure alone, that is responsible for the performance of the model (see Supplementary Section 11). We then used the framework for a series of scenario-based studies to determine the relative importance of the model’s input parameters. We focused on scenarios that capture the interaction between a composite micro-level parameter and a policy parameter. The former is the ratio of the average household size to the number of daily contacts (hereafter, the household to total contact ratio or HTTCR), and the latter is the timing of social distancing policies, with the level of stringency fixed across scenarios.

Consider three hypothetical countries with the same average contact rate but low, medium and high HTTCRs, respectively. We set their average number of daily contacts to ten and their average household sizes to 1.9, 2.9 and 3.9, which results in HTTCR values of 0.19, 0.29 and 0.39, respectively. We then test two scenarios: early (after 10 days) and delayed (after 25 days) implementation of a social distancing policy of relatively high stringency (σSD = 0.9) in these countries.
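The scenario grid above is small enough to enumerate explicitly. The sketch below computes each HTTCR as household size divided by daily contacts and crosses the three HTTCR levels with the two implementation dates; the variable and function names are hypothetical, and each resulting dictionary would parameterize one simulation run.

```python
from itertools import product

DAILY_CONTACTS = 10
HOUSEHOLD_SIZES = {"low": 1.9, "medium": 2.9, "high": 3.9}
IMPLEMENTATION_DAYS = {"early": 10, "delayed": 25}
STRINGENCY = 0.9  # sigma_SD, fixed across scenarios


def httcr(household_size, daily_contacts):
    """Household to total contact ratio."""
    return household_size / daily_contacts


scenarios = [
    {"httcr_level": level, "httcr": round(httcr(h, DAILY_CONTACTS), 2),
     "timing": timing, "t_sd": day, "stringency": STRINGENCY}
    for (level, h), (timing, day) in product(HOUSEHOLD_SIZES.items(),
                                             IMPLEMENTATION_DAYS.items())
]
for s in scenarios:
    print(s)
```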

Figure 5 shows the dynamics of daily cases. The results for the case with no social distancing policy (Fig. 5c) are as expected: for a fixed contact density, a lower proportion of within-household contacts speeds up the spread of the virus. Once social distancing policies are implemented (Fig. 5a,b), we can see how their timing changes the trajectory of the pandemic for countries with different HTTCRs. Notably, early implementation has a stronger effect on countries with smaller HTTCRs. This is more evident in Fig. 5d, which shows the ratio of the cumulative number of cases in the high-HTTCR country to that in the low-HTTCR country after 50 and 100 days, as a function of the social distancing implementation date. The graph shows that early implementation benefits the low-HTTCR country (ratio above 1 on the y-axis), but this advantage fades after a certain date, and the ratio can turn substantially in favor of the high-HTTCR country, especially toward the end of the simulation period.

These results provide some insight into the comparative dynamics of the countries in this study. Italy initially had the most confirmed cases because it had the highest contact density and the lowest proportion of familial contacts, which accelerated the pandemic. However, because it implemented the strictest social distancing policies early, growth in cases leveled off after a short period of rapid increase. As our model suggests, and as demonstrated in past studies6, early implementation of social distancing policies is especially important for Italy, given its high level of pre-pandemic average daily contacts and low proportion of familial contacts. We expect a similar pattern to hold for metropolitan areas such as New York City, which has a relatively low ratio of household size to average contacts8. Belgium, by contrast, has a mid-level contact density, and its relatively weak and late policies led to the worst outcome among these five countries during the first wave.

The simulation for Germany is particularly interesting and shows how much the country benefited from a low contact density and a high proportion of familial contacts in slowing the pandemic. Compared with the other four countries in the first set, Germany had the least stringent policies and implemented them late, yet its cumulative confirmed-case curve became concave relatively early, both in simulation and in data. The UK and the Netherlands had quite similar contact patterns and policies: the Netherlands has a slightly higher contact density but implemented its policies earlier than the UK. This similarity is evident in both simulation and data.

In summary, a high-stringency, early social distancing policy is sufficient and effective to curb the pandemic (as in Italy), but it is not always necessary (as in Germany). The contact pattern in a country also critically influences the spread of the pandemic. A less stringent, later policy can still control the pandemic if the contact density (number of contacts) is low or ties are highly clustered (high HTTCR).

## Discussion

This paper demonstrates an explainable model that strikes a balance between the predictive power of complex granular models and the computational efficiency and interpretability of SIR/SEIR network models. The key to this balance is a set of algorithms that transform micro-structural data into a small-scale social network. Such country-specific explainable models can then be used to capture the interaction between top-down policies and bottom-up structural factors, and used in counterfactual analysis and in the early stages of policy design at the national or regional level. Such models are also easier to build (especially for developing countries) because aggregate-level data are more likely to be available.

However, these models have a number of limitations that are a direct result of their simplicity. First, to simplify the model, any heterogeneity in parameters other than micro-scale structural variations and policy parameters is ignored. Specifically, we ignore the distributions of household sizes and in-person social contacts and rely solely on averages. This decision was partly due to data availability and partly to demonstrate how much of the micro-level factors can be captured using only national average data. Because such national average data are more accessible for most countries, this model shows that policy-makers can still benefit from them even in the absence of higher-resolution data. Still, we expect this work to benefit from micro-level data at higher geographical resolution, as well as from distribution parameters. Moreover, we use a single step function to implement social distancing policies, which differs from the often multi-step implementation in the real world. We expect that more gradual implementation of NPIs will need to be modeled for subsequent waves of a pandemic.

Second, by linking the removal of social contact links to the stringency of top-down policies, we implicitly ignore the role of policy compliance and voluntary social distancing, both of which have been shown to be very important during the COVID-19 pandemic. This is acceptable, however, because we model relative case trajectories, provided that compliance and voluntary measures are relatively similar across the countries in each set.

Finally, this model is designed for the early stage of the pandemic, when the primary policy instrument was social distancing. As countries entered later stages, other policy instruments, such as mask requirements and, later, vaccination policies, played crucial roles in putting countries on different trajectories. Although the model is flexible enough to accommodate some of these factors, whether average national data on mask policy, mask compliance and vaccination rates can provide sufficient explanatory power for future generations of this model remains a topic for future studies.

Overall, some of these limitations can be addressed by adding first-level heterogeneity, including multi-step policy implementation schemes, and by changing the scaling in the network transformation algorithms. Others, however, are inherent to this type of model, and addressing them would require adding many details, which runs contrary to the reason for the model’s existence.

## Methods

### Physical interaction network model

COVID-19 spreads mainly from person to person through interactions in close physical proximity. Consequently, one would ideally build a physical contact network based on micro-level geographical co-location data for the entire population (or at least a representative sample). However, collecting data on all physical contacts between all individuals in a population is impractical and time-consuming40. Moreover, even with access to all co-location data, the size of the resulting network would raise computational issues, especially for policy applications that require analyzing many what-if scenarios. To address these issues we use scaled, notional networks, constructed from real demographic data, as the initial (that is, pre-pandemic co-location) network of in-person contacts. Such networks provide a balance between the diversity across countries enabled by real data and the practicality and computational efficiency needed for scenario studies.

But which class of notional networks is suitable for this purpose? The structure of physical contact networks (geographic co-location) is quite different from that of online social networks and other networks whose primary function is the exchange of information. Unlike the latter, in which (often) strong preferential attachment can produce high-degree hubs and power-law degree distributions, the degrees in physical contact networks are tightly bounded by nodes’ capacity in terms of time and mobility. Small-world networks41, on the other hand, have been shown to capture several empirical features of contact patterns, such as heterogeneous numbers of contacts, people living in small overlapping groups, and people making contacts across long distances through work and social relations42. Similar small-world properties are also found in a simulated physical contact network created by Eubank30, which was built from large-scale individual-based urban traffic simulations using census, land-use and population-mobility data. The degree distribution and diameter of this simulated network suggest that the people-contact graph is closer to a small-world graph than to a random graph. Evidence of small-world structure has also been found in mobility networks. For example, the mobility network in Germany generated by Schlosser43 shows the hallmarks of a small-world network: a large clustering coefficient and a small average shortest path length. In this work we use small-world networks to represent the pre-pandemic contact network of a given country; in the next section we discuss how we set the parameters of these networks.
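The two hallmarks cited above, high clustering and short average paths, are easy to check numerically. The sketch below (networkx assumed, parameters arbitrary and purely illustrative) compares a Watts-Strogatz graph with an unrewired ring lattice and an Erdős-Rényi random graph with the same number of edges.

```python
import networkx as nx

n, k, p = 500, 8, 0.1  # hypothetical size, mean degree and rewiring probability

lattice = nx.watts_strogatz_graph(n, k, 0.0)  # ring lattice, no rewiring
small_world = nx.connected_watts_strogatz_graph(n, k, p, tries=100, seed=42)
random_graph = nx.gnm_random_graph(n, small_world.number_of_edges(), seed=42)
# The random graph may be disconnected; measure paths on its largest component.
giant = random_graph.subgraph(max(nx.connected_components(random_graph), key=len))

c_sw = nx.average_clustering(small_world)
c_rand = nx.average_clustering(random_graph)
l_lattice = nx.average_shortest_path_length(lattice)
l_sw = nx.average_shortest_path_length(small_world)
l_rand = nx.average_shortest_path_length(giant)

print(f"clustering: small-world {c_sw:.3f}, random {c_rand:.3f}")
print(f"mean path:  lattice {l_lattice:.1f}, small-world {l_sw:.1f}, random {l_rand:.1f}")
```

With a small amount of rewiring, the average path length collapses toward that of the random graph while clustering stays far above it, which is the combination the empirical contact and mobility studies report.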

#### Generating pre-pandemic notional contact networks using demographic data

Building the notional small-world network requires three parameters41: the number of nodes, N, the mean number of neighbors of each node, M, and the probability of each link being rewired, P. In our model, we fix the number of nodes, which represent individuals in our notional model, at the same value for all countries and use the other two parameters (M and P) to introduce key variations across countries. Although this decision is not strictly necessary, it facilitates comparison of the effects of different policies across countries.

We use the next parameter, M, to capture the average number of in-person contacts per individual per unit of time. Although these data are not available for all countries, survey-based data exist for a number of them. Such surveys44,45,46,47 provide a large-scale quantitative basis for studying infections transmitted through in-person contact. For this work, we use data from Mossong and colleagues' population-based survey46, which includes a total of 7,290 participants and 97,904 records of physical contacts with different individuals during one day, across five targeted European countries. To use these data, we simply round the mean reported number of contacts for each country in the survey to the closest integer to approximate the mean degree (M) of the corresponding simulated contact network. For example, the survey reports that the mean daily number of contacts in Germany, including familial contacts (within-household interactions) and external contacts (out-of-household in-person social interactions), is 7.96, so the mean degree is set to 8 in the simulated small-world network of Germany.
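The rounding rule amounts to a one-line function. This is a sketch: the function name is hypothetical, and rounding exact halves upward is our assumption, since the text only specifies the closest integer.

```python
import math


def mean_degree(mean_daily_contacts: float) -> int:
    """Round a country's mean reported daily contacts to the nearest integer,
    which is used as the mean degree M of its notional small-world network."""
    # floor(x + 0.5) rounds exact halves upward; Python's round() would send
    # 8.5 to 8 (round-half-to-even), so it is avoided here.
    return math.floor(mean_daily_contacts + 0.5)


m_germany = mean_degree(7.96)  # Germany: 7.96 mean daily contacts in the survey
```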

Once the values of N and M are set for each country, we are left with setting P. This parameter can drastically modulate the speed of disease transmission41,48, but how should its value be chosen? The answer is not evident because, unlike the other two parameters, P has no direct counterpart in real networks, and to the best of our knowledge, no previous work maps real data on physical contacts to an estimate of the rewiring probability in small-world networks. This is even more challenging in our work because we deliberately avoid estimating this parameter by fitting the simulation to real virus spread data. Here we propose an algorithm that transforms survey-based data on average household size, previously shown to be important in shaping disease dynamics16,18, together with the average number of physical interactions used in the previous step, into an estimate of the rewiring probability. Data on average household size are readily available for most countries and are regularly updated by the organizations that collect them (see, for example, the 2019 report in ref. 49). The basic idea behind this transformation is to assume a limited capacity for close interactions (in terms of time and cognitive capacity) for each individual50: if people have a fixed number of contacts and more of them are with family members, they have fewer external connections to distant groups. In the small-world network, this corresponds to a lower probability of rewiring links from neighbors to more distant nodes.

We use a four-step process to translate the country-specific average household size into the rewiring probability of the representative small-world network. First, we build a small-world network with the already determined N and M and an initial estimate of P, which is adjusted in the final step. Second, we partition the nodes, in node order, into N/K groups, where K is a new parameter defined as the smallest integer larger than the average household size. The links within each group are designated as strong links (that is, familial contacts), which are stable and, in our model, are assumed not to change during the social distancing phase. The remaining links are considered weak links (external contacts) and may be disconnected by social distancing policies. Third, we consider the full social distancing case, in which all weak links are removed as a result of NPIs. This hypothetical transformation leaves many disconnected sub-networks, each of which (a connected component, but not necessarily a complete graph) contains at least one node and, by the construction in step 2, at most K nodes. Given how these sub-networks are constructed, we treat each as a household, with its number of nodes as the household size, and we compute the average household size of the small-world network by averaging the sizes of all sub-networks generated in step 3. In the final step, we adjust the value of P for each country so that the average size of these sub-networks matches the average household size from the survey data.
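The four steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: a Watts-Strogatz-style generator (a ring lattice with M/2 neighbors on each side, M even, followed by random rewiring), households formed from consecutive node indices, and a plain grid search that averages several random networks per candidate P. The function names, the grid and the number of repetitions are hypothetical choices, not the settings used in this study.

```python
import math
import random


def small_world(n, m, p, rng):
    """Step 1: ring lattice of even degree m, then rewire each lattice link
    with probability p, avoiding self-loops and duplicate links."""
    if m % 2 or m >= n:
        raise ValueError("m must be even and smaller than n")
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, m // 2 + 1):
            adj[v].add((v + j) % n)
            adj[(v + j) % n].add(v)
    for j in range(1, m // 2 + 1):
        for v in range(n):
            u = (v + j) % n
            if u in adj[v] and rng.random() < p and len(adj[v]) < n - 1:
                w = rng.randrange(n)
                while w == v or w in adj[v]:
                    w = rng.randrange(n)
                adj[v].discard(u)
                adj[u].discard(v)
                adj[v].add(w)
                adj[w].add(v)
    return adj


def mean_household_size(adj, k):
    """Steps 2-3: group nodes in index order into blocks of k, keep only
    within-block (strong) links and return the mean connected-component size."""
    group = {v: v // k for v in adj}
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first search restricted to strong links
            v = stack.pop()
            size += 1
            for u in adj[v]:
                if group[u] == group[v] and u not in seen:
                    seen.add(u)
                    stack.append(u)
        sizes.append(size)
    return sum(sizes) / len(sizes)


def calibrate_p(n, m, household_size, grid=None, reps=5, seed=0):
    """Step 4: pick the P on a grid whose simulated mean household size,
    averaged over `reps` random networks, is closest to the survey value."""
    k = math.floor(household_size) + 1  # smallest integer larger than it
    grid = grid or [i / 100 for i in range(101)]
    rng = random.Random(seed)
    best_p, best_err = None, math.inf
    for p in grid:
        sim = sum(mean_household_size(small_world(n, m, p, rng), k)
                  for _ in range(reps)) / reps
        if abs(sim - household_size) < best_err:
            best_p, best_err = p, abs(sim - household_size)
    return best_p, k
```

Because the mean sub-network size falls roughly monotonically as P grows (more within-household links are rewired away), a bisection search would also work; a grid search is used here only because it tolerates the noise from sampling random networks.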

### Contagion model

Similar to many recent studies on the spread of COVID-19, we model virus transmission as a continuous-time stochastic SEIR process2,3,4,10,12,27,51. In the SEIR model, every individual in the contact network is in one of four possible states: susceptible (S), exposed (E), infected (I) and recovered (R). Compared with the other classic compartmental model in epidemiology, the SIR (susceptible, infectious, recovered) model, SEIR adds a state, E, for infected individuals who are in the incubation period, that is, the transition time between the exposed and infected states. Including this state is crucial for many applications, given that the incubation period of COVID-19 can last up to 14 days52.

The infectiousness of pre-symptomatic cases has been the subject of many studies over the last year. Although some scholars assume that individuals in the incubation period of COVID-19 are infected but not infectious2,3,4,51,52, others argue that familial clusters of COVID-19 infection indicate that transmission is possible during the incubation period10,53,54,55. In the work of Aleta et al.10, which uses an SLIR model, individuals in state L (defined similarly to state E) are infected but not infectious. However, Aleta et al. add one more state (P, for pre-symptomatic) between L and I to represent infected, infectious and pre-symptomatic individuals. Furthermore, their work splits the infected state (I) into two sub-states: one infectious and symptomatic, the other infectious but asymptomatic. In other words, Aleta et al. consider infected individuals who are asymptomatic or pre-symptomatic to be infectious.

In our model, exposed nodes are pre-symptomatic and infectious. Under this setting, our model shows that even when social distancing policies were applied early, when the number of reported cumulative confirmed cases (the sum of I and R nodes) was small, the epidemic still grew quickly because of a large number of undetected infectious cases (E nodes). In our SEIR model, a susceptible individual (S node) can be infected either by a pre-symptomatic but infectious individual (E node) or by a symptomatic infectious individual (I node), and the infection rates of E and I nodes are denoted βE and βI, respectively. The main reason for setting two different infection rates is that pre-symptomatic individuals cannot be identified. Because they show no symptoms, pre-symptomatic individuals can easily transmit the virus, as no protective measures are taken during contact. By contrast, symptomatic individuals are identifiable, so infection-prevention measures can be applied during contact.

We assume that an exposed individual infects each of its susceptible neighbors after a waiting time that follows an exponential distribution with rate βE. In other words, a susceptible node will be infected by one exposed neighbor after an average of $$\frac{1}{{\beta }_{E}}$$ days. Following the same logic, a susceptible node will be infected by one infected neighbor after an average of $$\frac{1}{{\beta }_{I}}$$ days. Aggregating the contributions of exposed and infected neighbors, a given susceptible node, z, is infected by one of its exposed or infected neighbors at the weighted infection rate $${\beta }_{W}^{z}$$, where

$${\beta }_{W}^{z}={\beta }_{I}\times {\mathrm{number}}\,{\mathrm{of}}\,{\mathrm{infected}}\,{\mathrm{neighbors}}+{\beta }_{E}\times {\mathrm{number}}\,{\mathrm{of}}\,{\mathrm{exposed}}\,{\mathrm{neighbors}}$$

After an average of $$\frac{1}{{\beta }_{W}^{z}}$$ days, node z moves to the exposed state (E). In the early stages of the pandemic, we assume that βE and βI are equal because special precautions, such as social distancing or mask-wearing, were not yet adequately observed. Once exposed individuals (E) show symptoms or test positive, they move to the infected state (I). In the early stages of the outbreak, without sufficient testing capacity, the incubation period follows an exponential distribution with a mean of $$\frac{1}{\alpha }$$ days. Similarly, the time it takes for an infectious individual to move into the recovered state follows an exponential distribution with rate γ. Once an I node recovers, it moves to the R state, after which it can no longer be infected.
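For reference, the per-node transitions above can be restated compactly in our own notation. Summing neighbor contributions in $${\beta }_{W}^{z}$$ is justified because, if node z faces independent exponential waiting times $$T_j$$ with rates $$\beta_j$$ (one per exposed or infected neighbor j, with $$\beta_j$$ equal to βE for an exposed neighbor and βI for an infected one), the first infection time is itself exponential with the summed rate:

$$
\Pr\Big(\min_j T_j > t\Big)=\prod_j e^{-\beta_j t}=e^{-\beta_W^z t},
\qquad
S_z \xrightarrow{\;\beta_W^z\;} E_z \xrightarrow{\;\alpha\;} I_z \xrightarrow{\;\gamma\;} R_z
$$

Since $${\beta }_{W}^{z}$$ changes whenever a neighbor changes state, the mean waiting time $$\frac{1}{{\beta }_{W}^{z}}$$ applies only while the neighborhood of z is unchanged.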

For the SEIR simulation, we use the Gillespie algorithm56,57, an efficient method for simulating stochastic processes that has been adapted to temporal-network simulations58. Because the method operates in continuous time, it makes it easier to set the NPI timing precisely in the simulation, which matters because this timing is a function of the number of cases (see Supplementary Section 1 for details on the implementation of the SEIR model).
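The direct-method variant of the Gillespie algorithm can be sketched as follows for the network SEIR model described above. This is a minimal illustration under our own assumptions (a static adjacency dict, per-node rates recomputed locally after each event, no NPI trigger), not the implementation described in Supplementary Section 1; the function name and arguments are hypothetical.

```python
import random


def gillespie_seir(adj, beta_e, beta_i, alpha, gamma, initial_exposed,
                   t_max, seed=None):
    """Direct-method Gillespie simulation of an SEIR process on a static network.

    adj: dict mapping node -> set of neighbours.
    Exposed (E) nodes infect susceptible neighbours at rate beta_e, infected (I)
    nodes at rate beta_i; E -> I occurs at rate alpha and I -> R at rate gamma.
    Returns a list of (time, S, E, I, R) records, one per event.
    """
    rng = random.Random(seed)
    next_state = {"S": "E", "E": "I", "I": "R"}
    state = {v: "S" for v in adj}
    for v in initial_exposed:
        state[v] = "E"

    def rate(v):
        # Total rate at which node v leaves its current state.
        s = state[v]
        if s == "S":  # weighted infection rate beta_W^z
            n_e = sum(state[u] == "E" for u in adj[v])
            n_i = sum(state[u] == "I" for u in adj[v])
            return beta_i * n_i + beta_e * n_e
        return {"E": alpha, "I": gamma}.get(s, 0.0)

    rates = {v: rate(v) for v in adj}
    counts = {s: sum(1 for x in state.values() if x == s) for s in "SEIR"}
    t = 0.0
    history = [(t, counts["S"], counts["E"], counts["I"], counts["R"])]
    while True:
        total = sum(rates.values())
        if total <= 0:  # no E or I nodes left: the outbreak is over
            break
        t += rng.expovariate(total)  # exponential time to the next event
        if t > t_max:
            break
        # Choose the node that fires, with probability proportional to its rate.
        r, acc, chosen = rng.random() * total, 0.0, None
        for v, rv in rates.items():
            if rv > 0:
                chosen, acc = v, acc + rv
                if acc >= r:
                    break
        old = state[chosen]
        state[chosen] = next_state[old]
        counts[old] -= 1
        counts[state[chosen]] += 1
        # Only the chosen node and its neighbours can have new rates.
        for u in (chosen, *adj[chosen]):
            rates[u] = rate(u)
        history.append((t, counts["S"], counts["E"], counts["I"], counts["R"]))
    return history
```

Because each event changes the state of a single node, only that node and its neighbors need their rates recomputed; a larger-scale implementation would also replace the linear scan used to select the next event with a faster structure such as a binary tree of cumulative rates.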

### Modeling NPIs

In most countries, the first wave of the pandemic quickly caught the attention of local and federal governments, which responded to the rapid spread of the virus by implementing a set of NPIs. In the early months of 2020, these interventions were mostly social distancing policies, ranging from capacity limits for stores and restaurants all the way to partial and total lockdowns. Differences in which social distancing measures each country included in its NPI portfolio in the early stages of the pandemic, and for how long, created heterogeneity in the aggregate strength of NPIs across countries. Building a compact model requires a composite measure of policy stringency that incorporates all the essential sources of this heterogeneity. Moreover, once the stringency measure is quantified, we need to apply it to the country-specific small-world network to build a post-NPI network. In this section we discuss how we tackle these two questions.

It has been empirically demonstrated that social distancing policies modified people's contact patterns and reduced social interactions, even after accounting for people's voluntary responses to the pandemic39. In our model, the direct impact of social distancing policies must be translated into a modification of the simulated contact network. One possible approach is to infer policy stringency from changes in mobility data, using resources provided by companies such as Google and SafeGraph. Although many scholars have used these datasets to estimate the degree of social distancing6,10,12,13,39, their complexity and scale call for elaborate multi-level models, which runs counter to the main goal of this work. Such elaborate models have been successfully implemented by other scholars: for instance, in Aleta and colleagues' research10, a weighted multi-layer synthetic network is built to represent interactions in different settings (school, workplace and community) by combining mobility, census and demographic data for the Boston metropolitan area. In another work, Lai et al.12 analyzed mobility changes over a network of 340 prefecture-level cities in mainland China, using a dataset of near-real-time daily relative outbound and inbound flows of smartphone users for each city in 2020.

The goal of this paper, however, is to build a simple compact model using high-level, country-level input data rather than detailed local data. To maintain this simplicity, we capture the impact of policies on social interactions through the Government Stringency Index (GSI), defined and calculated by the Oxford COVID-19 Government Response Tracker59,60, which tracks government policies and interventions across countries using a standardized series of indicators over the course of the COVID-19 pandemic and produces country-specific GSIs that measure the extent of governments' responses. More details on the components of the GSI and the GSI trajectories of the targeted countries can be found in Supplementary Sections 2 and 3.

Social distancing policies are dynamically adjusted according to the state of the epidemic, and this dynamic is reflected in the trajectory of the GSI60. In the early stage of the pandemic, the GSI in each country rose until it reached its maximum. To simulate the effect of policies on the pandemic, applying a GSI that is adjusted at multiple points in time would indeed be more realistic. Doing so, however, also increases model complexity, because it requires more GSI inputs and their corresponding timings. To avoid this in favor of simplicity, and since our primary goal is to explain and predict the relative trajectories of different countries, we focus on the strongest index for each country and the time at which the index jumped to its strongest level. This assumption is especially reasonable for the early stages of the pandemic, during which the stringency index was a monotonically non-decreasing function of time. This level can also be interpreted as the accumulation of all previous policies up to the point of the strongest measures.
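Selecting the strongest index and the time it is first reached can be sketched as follows, together with a check of the early-stage monotonicity assumption. The function names and the toy GSI series are hypothetical and serve only to illustrate the selection rule.

```python
def is_non_decreasing(series):
    """Check the early-stage assumption that the GSI never decreases."""
    return all(a <= b for a, b in zip(series, series[1:]))


def strongest_policy(gsi_series):
    """Return (peak GSI, index of the first day the GSI reaches its peak).

    gsi_series: daily GSI values in chronological order.
    """
    peak = max(gsi_series)
    return peak, gsi_series.index(peak)


toy_gsi = [0.0, 10.0, 40.0, 70.0, 70.0]  # hypothetical daily values
peak, day = strongest_policy(toy_gsi)
```

Under the monotonicity assumption, the peak value summarizes all earlier policies, so a single (level, time) pair per country is enough to drive the network modification.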

Once the level and timing of NPIs for each country are determined, we need to change the small-world contact network accordingly. This involves deciding on the timing, the types and the ratio of link removal. Given our decision to use a single accumulated change in the contact network, we make this change once the GSI reaches a critical point, GSIt. We then scale GSIt to a ratio σSD ∈ [0, 1]. To decide which links to remove, we focus only on weak links, as previous empirical studies have demonstrated that a substantial portion of long-distance connections (weak links) disappeared after policies were implemented, at least during the early stages of the pandemic43. Consequently, we randomly remove a proportion σSD of the weak links, as defined in the previous section. For example, if we set GSIt as the maximum value of the GSI in the study period, then GSIt for Germany is 76.85, which is scaled to σSD = 76.85%, the proportion of weak links that are randomly disconnected after the social distancing policy is implemented in Germany. Importantly, σSD is derived from GSIt using the same scale for every country, in line with our goal of avoiding any country-specific calibration. This proportional removal of close social contacts makes particular sense for the early stages of the pandemic, before other factors such as lockdown fatigue, social learning or the arrival of vaccines change the correlation between the two. We validated this assumption by comparing country-level GSI stringency with the Google mobility index that measures how much time people spend at their places of residence (see Supplementary Section 4). After the removal of a subset of weak links, the modified contact network in our model exhibits a reduced small-world property, which is also captured by Schlosser's empirical research43.
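The link-removal step can be sketched as follows, assuming an adjacency-dict network in which households are consecutive blocks of K nodes, as in the household construction described earlier. Drawing exactly round(σSD × number of weak links) links without replacement is our assumption; removing each weak link independently with probability σSD would be an equally plausible reading of "randomly remove the proportion σSD". The function name is hypothetical.

```python
import random


def apply_social_distancing(adj, k, gsi_t, seed=None):
    """Randomly remove a fraction sigma_SD = GSI_t / 100 of the weak links.

    adj: dict mapping node -> set of neighbours; node v belongs to household
    v // k. Strong (within-household) links are never removed.
    Returns (modified adjacency, sigma_SD).
    """
    sigma = gsi_t / 100.0  # the GSI ranges from 0 to 100, so sigma is in [0, 1]
    weak = sorted({(min(u, v), max(u, v))
                   for v in adj for u in adj[v] if u // k != v // k})
    rng = random.Random(seed)
    removed = rng.sample(weak, round(sigma * len(weak)))
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}  # leave input intact
    for u, v in removed:
        new_adj[u].discard(v)
        new_adj[v].discard(u)
    return new_adj, sigma
```

With GSIt = 76.85 for Germany, σSD = 0.7685, so roughly 77% of the weak links are disconnected while every strong (household) link is kept.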

After the scaled GSIt (σSD) is set as the input for simulating the effect of social distancing policies on each country's contact network, we need to set the intervention time in the simulation. In our model, we use the cumulative number of confirmed infected cases as a proportion of the whole population (as recorded by the Oxford Martin School38), rather than the calendar date, as the metric that determines when the intervention happens42. That is, in the simulation model, the contact network is modified when the cumulative number of confirmed infected nodes reaches a threshold proportion of the whole population. Moreover, because the simulated population is smaller than the real population of each country, we scale this threshold down to ω, where ω ∈ (0, 1). As with σSD, ω is derived from each country's threshold using the same scale. More details on the selection of country-specific intervention timing, including a figure illustrating it for all countries, can be found in Supplementary Section 3.
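In an event-driven simulation, the trigger can be checked after each event. The sketch below scans a recorded trajectory for the first time at which confirmed cases (I plus R nodes, as defined above) reach the fraction ω of the simulated population; the (t, S, E, I, R) record format and the function name are hypothetical.

```python
def intervention_time(history, n_nodes, omega):
    """Return the first event time at which confirmed cases (I + R) reach
    omega * n_nodes, or None if the threshold is never reached.

    history: (t, S, E, I, R) tuples in chronological order.
    """
    for t, _s, _e, i, r in history:
        if i + r >= omega * n_nodes:
            return t
    return None


toy_history = [(0.0, 10, 0, 0, 0), (1.5, 8, 1, 1, 0), (2.0, 7, 1, 1, 1)]
t_npi = intervention_time(toy_history, n_nodes=10, omega=0.2)
```

Checking the condition after every event, rather than on a daily grid, is what the continuous-time Gillespie method makes possible: the network can be modified at the exact moment the threshold is crossed.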

### Model validation with a longer period

Our model is validated using data from the first wave of the pandemic, focusing on the first two months. This choice of modeling period was deliberate: our goal is to capture the effect of the interaction between social distancing policies and micro-level factors. To capture as much of this interaction as possible with a simple model, we focused on the first wave of COVID-19, for which clear data from the pre-treatment period are available, both for social interactions and for policies. Importantly, this was the only period in which all countries started from no policy and moved to some level of policy, whereas subsequent waves had different starting points in different countries. Moreover, to limit the number of confounding factors, we focused on the period in which policies were changing monotonically, although at heterogeneous rates in different countries. This is because capturing the ramping down of policies in the model would require a more explicit way of accounting for country-to-country variations in voluntary social distancing after the number of cases starts falling. This leads to a modeling period running from the beginning of the first wave until the situation came under control in most countries and they started reversing some of their policies.

With this rationale in mind, we extended the modeling period to 90 days and 120 days and report the corresponding results. We did not run the model for longer than 120 days because, after the first four months, the number of cases in most of these countries was quite low and most social distancing restrictions had been lifted, leaving little room for our model to capture the interaction effect.