Introduction

Since the World Health Organization (WHO) declared coronavirus disease 2019 (COVID-19) a global pandemic on 11 March 2020, there have been more than 80 million confirmed cases and more than 1.8 million deaths worldwide as of 31 December 2020. Many countries have imposed preventive measures, such as city lockdowns, travel restrictions, quarantine, wider social distancing and enhanced personal hygiene, to stop or slow down further transmission of the disease. While these measures continue, it is also important to assess pandemic risk in order to set up preparedness plans and control measures for COVID-191, especially in countries with a high risk of widespread transmission2.

A common approach to pandemic risk assessment is epidemiological modeling, in which the population is divided into at least three groups: susceptible, infected, and recovered3,4,5,6. This approach has been successful in understanding disease transmission within a region through a mathematical formulation of the human interactions that may lead to infection and of the rate of recovery from infection. However, the susceptible-infected-recovered (SIR) approach was fundamentally designed for closed populations7 and is therefore largely restricted to regional studies. To supplement epidemiological modeling, it is helpful to combine data from different regions or countries to study the pandemic situation, i.e. the status of disease transmission in multiple regions or worldwide, and to quantify the potential risk arising from the COVID-19 outbreak. Two issues are of particular interest: prediction and control. Prediction concerns whether we can learn from data to produce early warning signals of a pandemic and to support preparedness8,9,10. Control refers to assessing the severity of the current pandemic risk in order to decide on proper measures during a global pandemic11,12.

In this paper, we propose using network analysis13 to aggregate data from around the world and construct dynamic pandemic networks for COVID-19. Network analysis has been applied to scientific collaborations14, to econometrics for assessing systemic risk and contagion effects in financial markets15,16, to medical research on gene co-expression, disease co-occurrence and global epidemics17,18,19,20, and to text analysis21. We extend the network approach in the literature9,22 to study the topological properties of COVID-19. Our proposed network analysis has three main features. First, we use publicly available data, namely the daily number of confirmed cases and the daily accumulated number of infected people in each country, to learn the topological properties of dynamic pandemic networks and to visualize the propagation of COVID-19 for risk prediction and control. Second, we construct two pandemic risk scores and determine the risk contributions of individual countries for pandemic prediction and control. These risk contributions are helpful for setting preparedness plans and for assessing the severity of the COVID-19 outbreak in specific regions. Finally, we compare the topological features of the COVID-19 networks with those of an independent network, the Erdos–Renyi model23, to understand the current pandemic situation and to look for any signal that the COVID-19 outbreak is subsiding (which would support restarting economic activities).

Figure 1

Workflow diagram of the pandemic risk assessment system.

From a healthcare management perspective, it is important, yet challenging, to assess the pandemic risk of COVID-19 globally. In this study, we propose a network analysis and two pandemic risk scores based on publicly available information on the number of confirmed COVID-19 cases and the estimated number of currently infected people. We provide evidence that network statistics can yield early warning signals of the COVID-19 outbreak. The time series of the two pandemic risk scores and the respective risk contributions of countries can help to predict, or ‘stress test’, the potential risk of releasing travel restrictions between countries, and to estimate the severity of the pandemic for epidemic control. Simulations from a network with independent links between countries show no sign that COVID-19 will go away within a few months. Figure 1 summarizes the workflow of the pandemic risk assessment system developed in this paper. We start from publicly available data on confirmed cases, recovered cases and deaths due to COVID-19, construct dynamic pandemic networks and their network statistics, and finally compile the preparedness risk score (PRS) and the severity risk score (SRS) for risk assessment. Details of the workflow are given in “Methods”. To track the current pandemic risk of 164 countries, we provide the most up-to-date risk scores and risk contributions at http://covid-19-dev.github.io/.

Results

Network connectedness

As described in “Methods”, we followed the literature9,22 to construct a time series of pandemic networks of 164 countries from February 2020 to December 2020. Two countries are linked on a given day when the correlation between their daily increases in confirmed cases over the past 14 days exceeds a threshold. This ‘co-movement’ of newly confirmed COVID-19 cases defines the links between countries and forms the pandemic networks. To quantify network connectedness, we calculated the network statistics presented in “Methods”. Figure 2 shows the pandemic networks on 4 February 2020, 11 March 2020, and 11 April 2020; the networks visibly become denser from February to April 2020. This visual increase in connectedness was documented as an early warning signal of the COVID-19 pandemic in the literature22. The left panel of Fig. 3 presents the number of edges, network density, clustering coefficient and assortativity coefficient of the COVID-19 pandemic networks from February 2020 to December 2020. The right panel of Fig. 3 gives the corresponding statistics for an independent network, the Erdos–Renyi model, which are examined in “Discussion”. In the left panel, the number of edges in the pandemic network increases steadily from late February 2020, rises sharply after the WHO’s declaration of the COVID-19 pandemic on 11 March 2020, and peaks on 18 March 2020. The sharp increase in the number of edges in the week of 12–18 March 2020 is partly attributable to more countries reporting confirmed COVID-19 cases, and partly to more edges being formed by the ‘common trend’ in confirmed cases across more countries. The pattern of the number of edges in this week is consistent with the pandemic declared by the WHO.
The downward trend in the number of edges after 18 March 2020 coincides with the period in which countries sought to control the further spread of COVID-19 through measures such as quarantine rules, travel restrictions and wider social distancing. It also reflects the effect of these measures on pandemic control, which resulted in less coherence in the occurrence of confirmed cases across countries. The number of edges was stable from April to June 2020, but rose from July to December 2020.

Previous research9 proposed using the network density, \(D_t\) in Fig. 3, to generate early warning signals of the COVID-19 pandemic. In the left panel of Fig. 3, \(D_t\) reaches its first peak on 26 or 27 February 2020, analogous to the ‘pre-earthquake phenomena’ that can serve as an alarm for a large earthquake some days later. This first peak, induced by the high connectedness of changes in confirmed COVID-19 cases across countries in late February, probably indicated that a pandemic was under way: the stronger co-occurrence of rising confirmed cases in many countries produced a sharp increase in network density within just two to three days. The network density peaks again in mid-March, after the WHO’s declaration of the COVID-19 pandemic on 11 March 2020. As with the number of edges, the network density is fairly stable from April to June and increases from July to December 2020.

Figure 2

The COVID-19 pandemic networks on 4 February 2020, 11 March 2020, and 11 April 2020.

Figure 3

The time series plots of the network statistics, including the number of edges, network density, clustering coefficient and assortativity coefficient. The left column gives the network statistics for the COVID-19 pandemic networks, and the right column gives the corresponding statistics from an independent network, the Erdos–Renyi model (simulated networks).

The clustering coefficient at time t measures how close the pandemic network at time t is to a perfectly-linked network in which all countries are linked together. In the extreme case of a perfectly-linked network, where the confirmed cases of all countries increase simultaneously over the past 14 days, the countries are fully connected and can be viewed as one big pandemic region. Therefore, the higher the clustering coefficient, the more similar the pandemic network is to this perfectly-linked situation, indicating stronger evidence of a global pandemic. In the clustering time series in the third row of the left panel of Fig. 3, the highest value occurs on 26 February 2020, the date of the first peak in network density, so the clustering coefficient also provides an early warning signal of the global COVID-19 pandemic. Unlike the network density, however, the clustering coefficient shows no obvious second peak in mid-March. Although substantial numbers of confirmed COVID-19 cases were still being reported in various countries, the clustering coefficient around 18 March (the date of the second peak in network density) is smaller than on 26 February 2020, implying that the pandemic network was then less similar to the perfectly-linked network. The downward trend in the clustering coefficient probably indicates that the global outbreak of COVID-19 was gradually brought under control from early March, although it takes time for the number of confirmed cases to fall substantially. The clustering coefficient lies between 0.25 and 0.50 from April to September 2020, with an increasing trend after June 2020, and exceeds 0.5 for a few weeks from late October to early November 2020.

The assortativity coefficient, \(AS_t\) in Eq. (5), assesses the assortative mixing of the vertices, or countries, in the pandemic networks. The last row of the left panel of Fig. 3 presents \(AS_t\) from February 2020 to December 2020, where \(AS_t\) jumps to around 0.5 in late February. This substantial increase implies that countries with many connections tend to link with other high-connection countries. Such association of ‘similar’ countries (in terms of the degree \(k_{it}\) in Eq. (3)) can indicate possible infection risk in social networks24. In the current COVID-19 pandemic, we can view a high \(AS_t\) as an indication of disease outbreak: when a global pandemic occurs, high-connection countries are likely to be linked together because they may experience simultaneous increases in confirmed COVID-19 cases over a short period. It is therefore not surprising that the time series pattern of \(AS_t\) resembles that of \(C_t\) in the third row of Fig. 3. The assortativity coefficient decreases from early March to early May 2020 and lies between 0.25 and 0.5 most of the time from May to September 2020. It stays above 0.5 in November and early December 2020, indicating fairly high assortative mixing in these two months.

Pandemic risk scores

The preparedness risk score (PRS), \(S_{1t}\), and the severity risk score (SRS), \(S_{2t}\), are computed for t from 4 February 2020 to 29 December 2020, with both \(\omega _t\) and \(n_t\) measured in millions of people. Figure 4 (left top panel) plots the standardized \(S_{1t}\), i.e. \(S_{1t}\) divided by the total number of possible interactions (when all countries are directly linked and \(\omega _t\) is taken as the population sizes). The first peak appears on 2 March 2020, when the standardized \(S_{1t}\) increases sharply from around zero to close to 0.1 within a few days. Taking 0.05, i.e. 5% of the total interactions between people in the 164 countries, as a reference, this sharp increase is the first time the risk score exceeds the reference, and it can be regarded as an early warning signal of the COVID-19 pandemic. The second peak occurs on 18 March 2020, a week after the WHO’s declaration of COVID-19 as a global pandemic on 11 March 2020. This spike reaches close to 0.4, accounting for more than one-third of the total interactions between people in the 164 countries. Thereafter, probably because of stringent measures imposed by various countries, including travel restrictions, community lockdowns and enhanced social distancing, the standardized \(S_{1t}\) drops, but only slowly, in mid-April and May. The PRS stays mostly above 0.05 until mid-June, indicating that the pandemic risk remained substantial even three months after the WHO’s declaration, despite tremendous measures and efforts by various countries to prevent the transmission of COVID-19. The standardized PRS shows an increasing trend from June to November 2020, reaching a local maximum above 0.2. A concern is whether the standardized \(S_{1t}\) will rise again after December 2020 if some countries relax travel restrictions.
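The standardization described above is a ratio of two counts of pairwise interactions. The equation defining \(S_{1t}\) is not reproduced in this section, so the following is a hypothetical sketch only: it assumes that \(S_{1t}\) sums the products of the susceptible population sizes (the \(\omega\) values) over linked pairs of countries, and that the standardizer is the same sum over all pairs of countries with \(\omega\) replaced by the population sizes. The function name `standardized_prs` is illustrative.

```python
def standardized_prs(susceptible, population, edges):
    """Hypothetical standardized S_1t (assumed form, not the paper's equation).

    susceptible: {country: susceptible population size}, the assumed omega values
    population:  {country: population size}
    edges:       set of linked pairs (i, j), each undirected pair listed once
    """
    # ASSUMPTION: S_1t = sum over linked pairs of omega_i * omega_j
    s1 = sum(susceptible[i] * susceptible[j] for i, j in edges)
    # Standardizer: every pair of countries linked, omega replaced by population sizes
    countries = sorted(population)
    total = sum(population[a] * population[b]
                for k, a in enumerate(countries) for b in countries[k + 1:])
    return s1 / total
```

Because both the numerator and the denominator are products of two population sizes, measuring them in millions (as in the text) leaves the ratio unchanged.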

Figure 4 (left bottom panel) displays the time series of the standardized \(S_{2t}\), i.e. \(S_{2t}\) divided by the total number of possible interactions (when all countries are directly linked, \(\omega _t\) is taken as the population sizes and \(n_t\) as 0.1% of the population sizes). Because its construction is more sensitive to the confirmed COVID-19 cases, \(S_{2t}\) can serve as a measure of the severity of the COVID-19 outbreak. The first peak of \(S_{2t}\), signifying the first wave of the pandemic, occurs on 12–13 April 2020, roughly a month after the WHO’s declaration on 11 March 2020. The obvious drop in \(S_{2t}\) in late April and early May reflects progress in reducing the number of confirmed cases, and possibly also the connectedness of the pandemic network. However, another local peak occurs on 30 May 2020, likely indicating a second wave of the outbreak in the second half of May. The increasing trend of \(S_{2t}\) from late June to December 2020 probably implies that a third wave of the COVID-19 outbreak has arrived. The situation may worsen if no preventive measures are taken to stop further transmission of the disease in local communities, or if travel restrictions are suspended or lifted.

Figure 4

Plots of the standardized \(S_{1t}\) (left top), standardized by all possible interactions between people in the 164 countries, and of the standardized \(S_{2t}\) (left bottom), standardized by all possible interactions between people in the 164 countries with \(n_t\) taken as \(0.1\%\) of the population sizes. The right panels show the regional risk contributions based on \(f_{it}\) (top) and \(g_{it}\) (bottom).

Risk contributions from countries

Figure 4 (right top panel) gives the risk contributions of Africa, America, Asia, Eastern Mediterranean and Europe, obtained by grouping \(f_{it}\), the risk contribution based on the first pandemic risk score \(S_{1t}\) in Eq. (8), into these five regions. Asia contributes most of the pandemic risk associated with \(S_{1t}\) in February 2020, and Asia’s contribution stays mostly between 0.25 and 0.5 after February. From a predictive perspective, reducing the mobility of Asia’s population could be a reasonable preventive measure during the pandemic. America’s contribution rises to close to 0.5 from mid-June to July 2020, and Europe’s contribution rises above 0.25 in late August and December 2020, indicating a relatively higher risk contribution to disease transmission through possible interactions between susceptible populations.
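Grouping country-level contributions into regions is a simple aggregation. A minimal sketch, assuming that each regional value in Fig. 4 is the sum of the country contributions \(f_{it}\) (or \(g_{it}\)) of its member countries on a given day; the regional values lying between 0 and 1 suggest shares that add up to one, but that is an assumption here. The function name and the country labels c1, c2, c3 are hypothetical toy inputs, not data from the paper.

```python
from collections import defaultdict


def regional_contributions(contributions, region_of):
    """Sum country-level risk contributions (e.g. f_it or g_it on one day) by region."""
    totals = defaultdict(float)
    for country, value in contributions.items():
        totals[region_of[country]] += value
    return dict(totals)


# Toy example with hypothetical countries c1, c2, c3
f_day = {"c1": 0.25, "c2": 0.25, "c3": 0.5}
regions = {"c1": "Asia", "c2": "Asia", "c3": "Europe"}
print(regional_contributions(f_day, regions))  # {'Asia': 0.5, 'Europe': 0.5}
```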

Figure 5 presents the heatmap of \(f_{it}\). Because \(S_{1t}\) is constructed mainly from the interactions between two susceptible populations, this risk contribution relates mainly to preventive preparation, should people from two countries have the chance to interact. The risk contribution \(f_{it}\) is thus a kind of ‘stress test’ of the pandemic risk. In early February, the risk score is mainly contributed by Asia, with the risk contributions of India and the Philippines around 0.3. From late February to early March, countries outside Asia, e.g. the United States in America, Iran and Pakistan in the Eastern Mediterranean, and Italy, Germany and France in Europe, began to contribute significantly to the risk score. In particular, \(f_{it}\) of the United States has been quite high since mid-February, signifying the early stage of the pandemic in that country. In late May and June, Ethiopia and Nigeria in Africa and Brazil in America also show contributions of around 0.1, which warrant attention. The risk contributions \(f_{it}\) of India, Indonesia and Japan in Asia, Brazil and the United States in America, and France, Italy and Russia in Europe are relatively high from July to November 2020.

Figure 5

The heatmap of the risk contribution \(f_{it}\) based on the preparedness risk score \(S_{1t}\) for the 164 countries from February to December 2020.

Figure 6

The heatmap of the risk contribution \(g_{it}\) based on the severity risk score \(S_{2t}\) for the 164 countries from February to December 2020.

Figure 4 (right bottom panel) gives the risk contributions of Africa, America, Asia, Eastern Mediterranean and Europe, obtained by grouping \(g_{it}\), the risk contribution based on the second pandemic risk score \(S_{2t}\) in Eq. (9). Since \(S_{2t}\) is derived from the number of possible interactions between people of a susceptible population and people of an infected population, the number of confirmed cases plays an important role in determining the risk contribution \(g_{it}\). In Fig. 4, the pandemic risk based on \(S_{2t}\) was contributed mostly by Asia in February 2020 and mostly by Europe and America in March and April 2020. After April, the risk contribution of America climbed to about 0.5, and it exceeded 0.75 in July 2020. In September and October 2020, the risk contribution is dominated by Asia. The risk contributions of America and Europe are high in November and December 2020, with the onset of winter. Figure 6 presents the heatmap of \(g_{it}\). This risk contribution reflects the severity of the COVID-19 pandemic risk from each country and is thus a viable measure for guiding the control of further transmission of the disease. In early February, the risk contributions of Japan, the Philippines, Korea, Singapore and Thailand are quite high. The risk contributions are then dominated by Korea, Iran and Italy in early March, followed by the United States and Spain in late March. Note that \(f_{it}\) of the United States is already high in late February, earlier than the large values in \(g_{it}\), suggesting that the United States had an opportunity to set up stringent control measures in late February, before the widespread transmission later reflected in \(g_{it}\). By contrast, India announced a national lockdown on 24 March 202025.
The lockdown may account for the fact that, even though India’s \(f_{it}\) was quite high in February, its risk contribution \(g_{it}\) is very low in March and April. The cases of the United States and India suggest that \(f_{it}\) could be used as a measure for setting up timely stringent measures. From April to June, the risk contribution \(g_{it}\) is dominated by Brazil, Peru and the United States in America, and Italy, Russia and Spain in Europe. The risk contributions of India and South Africa increased in July and August, respectively. The risk contributions of European countries are quite high in December 2020. The severity of the COVID-19 pandemic risk in countries with relatively high \(g_{it}\) cannot be ignored.

Discussion

To assess the current pandemic status of COVID-19, we simulate networks from the Erdos–Renyi model23. Specifically, at each time t, we generate 10,000 networks whose edges are independent of each other. To maintain the same level of network density, two vertices in a simulated network are linked with probability \(D_t\). The right panel of Fig. 3 displays the time series of the averages of the network statistics over the 10,000 simulated networks. A disease outbreak is likely to give the pandemic network structure that deviates from the Erdos–Renyi model. For example, in the COVID-19 pandemic, a common trend in confirmed COVID-19 cases between countries i and j and between countries j and k may imply a relatively higher chance of a common trend between countries i and k as well (the transitivity property of social networks). Therefore, by comparing the time series patterns of \(C_t\) and \(AS_t\) of the COVID-19 pandemic networks with those of the Erdos–Renyi model, we can find hints as to when the global COVID-19 pandemic will go away. In Fig. 3, the clustering coefficients of the pandemic networks are well above those of the Erdos–Renyi model with the same network density \(D_t\). Although the assortativity coefficient of the pandemic networks approaches zero, the assortativity coefficient of the Erdos–Renyi model, in June and July, it rises again after July 2020 and reaches a local maximum in November 2020. Comparing \(D_t\), \(C_t\) and \(AS_t\) of the pandemic networks with the same network statistics simulated from the Erdos–Renyi model, there is no strong evidence that COVID-19 will go away shortly after December 2020.
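The Erdos–Renyi comparison can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: the function names are hypothetical, `erdos_renyi` draws a network in which each pair of vertices is linked independently with probability p = \(D_t\), `global_clustering` computes \(C_t\) of Eq. (4) as \(\sum_i e_{it} / \sum_i \binom{k_{it}}{2}\), and the number of draws defaults to 100 rather than the 10,000 used in the text to keep run time short. Under this model the expected clustering is close to p, so a pandemic network whose \(C_t\) is well above its density \(D_t\) contains more triangles than independent links would produce.

```python
import random
from math import comb


def erdos_renyi(n, p, rng):
    """G(n, p): each of the n(n - 1)/2 vertex pairs is linked independently with probability p."""
    adj = {v: set() for v in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def global_clustering(adj):
    """C_t of Eq. (4), computed as sum_i e_it / sum_i binom(k_it, 2)."""
    linked_pairs = sum(1 for nb in adj.values() for a in nb for b in nb
                       if a < b and b in adj[a])            # sum of e_it
    triples = sum(comb(len(nb), 2) for nb in adj.values())  # sum of binom(k_it, 2)
    return linked_pairs / triples if triples else float("nan")


def simulated_clustering(n_vertices, density, n_sims=100, seed=0):
    """Average clustering over n_sims Erdos-Renyi networks with link probability D_t."""
    rng = random.Random(seed)
    values = [global_clustering(erdos_renyi(n_vertices, density, rng)) for _ in range(n_sims)]
    values = [c for c in values if c == c]  # drop NaN (networks without connected triples)
    return sum(values) / len(values) if values else float("nan")
```

Calling `simulated_clustering(V_t, D_t)` for each day t gives a baseline series comparable to the right panel of Fig. 3; the degree assortativity of such independent-link networks is likewise expected to be close to zero.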

The preparedness risk score \(S_{1t}\) accounts for the possible interactions between the susceptible populations of countries linked together in the pandemic networks, and can be used to quantify the risk of transmission of COVID-19 between countries. This transmission risk arises because a substantial proportion of transmission is asymptomatic or presymptomatic26. Therefore, \(S_{1t}\) is a kind of ‘stress testing’ measure27 that evaluates the potential transmission risk if travel restrictions are canceled or relaxed. In the time series plot of the standardized \(S_{1t}\) in Fig. 4, the transmission risk increases sharply, to 0.08 on 2 March 2020 and to 0.36 on 18 March 2020, the latter amounting to more than one-third of all possible interactions between the susceptible populations of the 164 countries. The first peak on 2 March 2020, well before the WHO’s declaration on 11 March 2020, can be regarded as an early warning signal of the COVID-19 global pandemic. The second peak on 18 March 2020 marks the rapid spread of the disease in March and early April, when the numbers of cases in Italy, Spain, France and other European countries increased substantially3,28,29. After this peak, the transmission risk went down in April and May, probably because a series of travel restrictions and other lockdown measures almost completely stopped the mobility of susceptible populations between countries. However, \(S_{1t}\) did not fall further and stayed around 0.05 in April and May, implying that the transmission risk had not been reduced to a level that would allow travel restrictions to be lifted. Worse still, the transmission risk began rising again in late June and July and remained quite severe through December 2020.

The risk contribution \(f_{it}\) in Eq. (8), based on \(S_{1t}\), quantifies the relative impact of country i at time t on the transmission of COVID-19 through interactions between susceptible populations. The higher \(f_{it}\), the higher the potential impact of country i at time t on transmission risk. The risk contribution \(f_{it}\) can therefore be interpreted as a relative transmission risk measure that governments can consider when periodically revisiting the epidemic control policies they apply to different regions. For example, \(f_{it}\) of Nigeria and South Africa (Africa), Brazil, Mexico and the United States (America), India and Indonesia (Asia), Pakistan (Eastern Mediterranean), and Russia and Spain (Europe) are relatively high in June and July. Governments are advised to be especially alert when revisiting preventive measures concerning countries with high \(f_{it}\).

The severity risk score \(S_{2t}\) counts the possible interactions between the susceptible populations and the currently infected populations of countries linked together in the pandemic networks. The score reflects the current severity of the COVID-19 outbreak due to transmission from infected people who may have symptoms, or due to presymptomatic transmission. The evolution of pandemic risk severity reflected in the time series of \(S_{2t}\) provides a reference for governments revisiting their current epidemic control measures, to judge whether these measures need to be revised or strengthened to prevent their medical systems from being overloaded or breaking down. In Fig. 4, \(S_{2t}\) shows the first wave of the COVID-19 outbreak in early to mid-April, when confirmed COVID-19 cases in Europe increased substantially28. There is an obvious drop in \(S_{2t}\) from late April to mid-May, probably due to lockdowns in European countries30. The second wave of the COVID-19 outbreak occurred in late May, when gradual economic reactivation measures were being implemented. Another sharp increase in the risk score started in late June and continued from July to December 2020, signifying a third wave of the outbreak. High values of \(S_{2t}\) can alert governments to the need to slow down the resumption of social activities or the revival of the economy.

Although \(g_{it}\) refers to the risk contribution due to interactions between the infected and susceptible populations of countries, a high \(g_{it}\) also implies a high risk of ‘domestic transmission’ if people from different regions of a country have physical contact, for example through school resumption or social or religious activities. Before and during the first wave of the COVID-19 outbreak in March and April, Fig. 6 shows relatively high \(g_{it}\) for France, Germany, Italy, Spain, the United Kingdom and the United States. After strict nationwide lockdowns were imposed, the risk contributions of many European countries, except Russia, went down in May. In terms of \(g_{it}\), the pandemic risk severity of the United States did not decrease significantly in May. We also see the transmission of the disease to South America, where the risk contribution of Peru is quite high in May. In July, Brazil and the United States together account for around 80% of the overall \(S_{2t}\), and the risk contribution of South Africa also increases substantially. The risk contribution of India is the highest in September and October 2020. The risk contribution \(g_{it}\) rose again in the United States and European countries in November and December 2020. Governments and healthcare specialists should not ignore relatively high \(g_{it}\), which may indicate the relative severity of COVID-19 pandemic risk in a country, especially if that country is a financial, economic or travel hub in its region.

To track the latest COVID-19 pandemic status, we provide the most up-to-date risk scores and risk contributions at http://covid-19-dev.github.io/.

Methods

As presented in Fig. 1, the first step of the proposed pandemic risk assessment system is to construct dynamic pandemic networks using publicly available confirmed COVID-19 cases.

Network statistics

Let \(X_{i,t}\) be the number of confirmed COVID-19 cases of country i on day t. Following the literature9,22, we obtain the daily changes for each country as \(Y_{i,t}= \sqrt{X_{i,t}}-\sqrt{X_{i,t-1}}\). To construct dynamic pandemic networks, we calculate \(\rho _{ij,t}\), the correlation between the daily changes of country i and country j on day t, using the 14 observations \((Y_{i,t-k}, Y_{j,t-k})\), for \(k = 0, \ldots , 13\). We then construct the pandemic network at time t by defining \(A_{ij,t}\), the (i, j)th element of the adjacency matrix of the pandemic network at time t:

$$\begin{aligned} A_{ij,t} = \left\{ \begin{array}{ll} 1 &{} \quad \text{ if } \rho _{ij,t} > r,\\ 0 &{} \quad \text{ otherwise }. \end{array} \right. \end{aligned}$$
(1)

In other words, an edge (link) between country i and country j is formed in the pandemic network at time t if \(\rho _{ij,t} > r\). Since the WHO’s earliest report of confirmed case numbers in late January 2020, more and more countries have reported \(X_{i,t}\). To study the statistical properties of the pandemic network from February to December 2020, we first record the number of countries with confirmed-case records at time t, denoted by \(V_t\), which is at most 164 (country names are listed in Figs. 5 and 6). Statistically, \(V_t\) is the number of vertices of the pandemic network at time t. A country with zero \(Y_{i,s}\) on 14 consecutive days (\(s=t-13, \ldots , t\)), whose correlation with other countries is then undefined, is excluded from \(V_t\). Figure 2 shows the pandemic networks at \(t=1\) (4 February 2020), \(t=37\) (11 March 2020) and \(t=68\) (11 April 2020), which exhibit an increasing level of connectedness from early February to early April22. To quantify network connectedness and summarize other network properties, we consider four network statistics at time t: \(E_t\) (the number of edges), \(D_t\) (the edge density), \(C_t\) (the global clustering coefficient), and \(AS_t\) (the assortativity coefficient). Since an edge of the pandemic network is defined by the event that the correlation \(\rho _{ij,t}\) exceeds the threshold r, the number of edges, \(E_t\), is the number of pairs of countries showing a ‘common trend’ in confirmed COVID-19 cases over the past 14 days, including day t. In this paper, we follow the literature9 and set \(r=0.5\). The network density, \(D_t\), defined by

$$\begin{aligned} D_t = \frac{2 E_t}{V_t (V_t - 1)}, \end{aligned}$$
(2)

measures how dense the pandemic network is at time t. The literature9 presents visual evidence for using \(E_t\) to give an early warning signal of the COVID-19 pandemic.
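The construction of the pandemic network in Eq. (1) and its density in Eq. (2) can be sketched in pure Python. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, `cases` is assumed to map each country to its series of cumulative confirmed counts, and a pair whose 14-day correlation is undefined (a constant, non-zero series of changes) is assumed to be unlinked.

```python
import math


def window_changes(x, t, window=14):
    """Y_{i,s} = sqrt(X_{i,s}) - sqrt(X_{i,s-1}) for s = t-window+1, ..., t; None if unavailable."""
    if t - window + 1 < 1 or t >= len(x):
        return None
    return [math.sqrt(x[s]) - math.sqrt(x[s - 1]) for s in range(t - window + 1, t + 1)]


def pearson(x, y):
    """Sample Pearson correlation; None if either series is constant (correlation undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def pandemic_network(cases, t, window=14, r=0.5):
    """Vertices, edges (Eq. (1)) and density D_t (Eq. (2)) of the pandemic network on day t.

    cases maps a country name to its cumulative confirmed counts X_{i,0}, X_{i,1}, ...
    """
    changes = {}
    for country, x in cases.items():
        y = window_changes(x, t, window)
        if y is not None and any(v != 0 for v in y):  # drop countries with 14 zero changes
            changes[country] = y
    vertices = sorted(changes)
    edges = set()
    for a, i in enumerate(vertices):
        for j in vertices[a + 1:]:
            rho = pearson(changes[i], changes[j])
            if rho is not None and rho > r:  # A_{ij,t} = 1 iff rho_{ij,t} > r
                edges.add((i, j))
    v, e = len(vertices), len(edges)  # V_t and E_t
    density = 2 * e / (v * (v - 1)) if v > 1 else float("nan")
    return vertices, edges, density
```

For scale, with \(V_t = 164\) countries there are 164 × 163 / 2 = 13,366 possible pairs, so \(D_t\) is \(E_t\) divided by that number.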

To capture how strongly vertices tend to cluster together, we calculate a clustering coefficient of the pandemic network at time t. For vertex i, define its local clustering coefficient at time t as

$$\begin{aligned} c_{it} = \frac{e_{it}}{\binom{k_{it}}{2}}, \end{aligned}$$
(3)

where \(e_{it}\) is the number of connected pairs among the neighbors of vertex i at time t, and \(k_{it}\) is the number of neighbors (or degree) of vertex i at time t. The numerator in Eq. (3), \(e_{it}\), also counts the number of triangles containing vertex i in the pandemic network at time t, and \(\binom{k_{it}}{2}\) is the number of connected triples (subgraphs of three vertices connected by two edges) centered at vertex i at time t. Thus \(c_{it}\) measures the tendency of vertex i to form triangles with its neighbors. The clustering coefficient, denoted by \(C_t\), is the weighted average

$$\begin{aligned} C_t = \frac{\sum _{i=1}^{V_t} \binom{k_{it}}{2} c_{it}}{\sum _{i=1}^{V_t} \binom{k_{it}}{2}}. \end{aligned}$$
(4)

The clustering coefficient \(C_t\) measures how strongly the vertices/countries in the pandemic network at time t are bound together in clusters. In the extreme case \(C_t = 1\), every pair of neighbors of every vertex is itself linked, so that each connected component of the pandemic network is a complete subgraph in which all pairs of countries are connected by edges. Hence, the higher the \(C_t\), the stronger the evidence of a global pandemic revealed by the pandemic network.
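Equations (3) and (4) can be checked numerically with a short NumPy sketch; the function name clustering_coefficients is hypothetical, and setting \(c_{it} = 0\) for vertices with fewer than two neighbors is an assumed convention (such vertices carry zero weight in Eq. (4) anyway). The sketch uses the standard identity that the ith diagonal entry of \(A_t^3\) equals twice the number of triangles containing vertex i, so the weighted average in Eq. (4) reduces to the total of the \(e_{it}\) divided by the total number of connected triples.

```python
import numpy as np


def clustering_coefficients(A):
    """Local c_it (Eq. 3) and weighted global C_t (Eq. 4) for a 0/1 adjacency matrix."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                    # degrees k_it
    # diag(A^3)_i counts closed 3-walks from i: two per triangle containing i.
    e = np.diag(A @ A @ A) / 2           # e_it: linked pairs among neighbours of i
    triples = k * (k - 1) / 2            # binom(k_it, 2): connected triples centred on i
    c = np.zeros_like(k)
    has_triples = triples > 0
    # Assumed convention: c_it = 0 when k_it < 2.
    c[has_triples] = e[has_triples] / triples[has_triples]
    total = triples.sum()
    # Weighting c_it by binom(k_it, 2) makes C_t = sum(e_it) / sum(binom(k_it, 2)).
    C = float((triples * c).sum() / total) if total > 0 else 0.0
    return c, C
```

For a triangle with one pendant vertex attached to a triangle corner, the corner of degree 3 has \(c_{it} = 1/3\), the other two triangle vertices have \(c_{it} = 1\), and \(C_t = 3/5\).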

$$\begin{aligned} AS_t = \frac{(2E_t)^{-1} \sum _{i=1}^{V_t} \sum _{j=1}^{V_t} k_{it} k_{jt} I(A_{ij,t}=1) - \left[ (2E_t)^{-1} \sum _{i=1}^{V_t} \sum _{j=1}^{V_t} \frac{1}{2} (k_{it}+ k_{jt}) I(A_{ij,t}=1)\right] ^2}{(2E_t)^{-1} \sum _{i=1}^{V_t} \sum _{j=1}^{V_t} \frac{1}{2} (k_{it}^2 + k_{jt}^2) I(A_{ij,t}=1) - \left[ (2E_t)^{-1} \sum _{i=1}^{V_t} \sum _{j=1}^{V_t} \frac{1}{2} (k_{it}+ k_{jt}) I(A_{ij,t}=1)\right] ^2}, \end{aligned}$$
(5)

In addition to the network density and clustering, we also consider the assortativity of networks31, which describes assortative mixing in networks. Specifically, we evaluate the tendency of countries with similar degrees to link together in the pandemic networks. The degree of country i at time t, \(k_{it}\), is defined as the number of countries connected to that country. In network terminology, \(k_{it}\) is the number of edges incident to vertex/country i. From an epidemiological perspective, the risk of individuals’ infection may be related to assortativity (high-degree vertices tending to link with other high-degree vertices) in social networks24. In this paper, we apply this assortative mixing concept to the constructed pandemic networks to understand how their assortative mixing properties change over time during the COVID-19 pandemic. Statistically speaking, we calculate an assortativity coefficient, which measures the correlation between the degrees of the two vertices at either end of an edge in the pandemic networks. At time t, the assortativity coefficient is defined by Eq. (5), where I(E) is an indicator function equal to one if the event E is true and zero otherwise. We examine whether this assortativity coefficient offers early warning signals of a global pandemic like COVID-19.
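The assortativity coefficient can be sketched as the Pearson correlation of the degrees at the two ends of each edge, counting every edge once in each direction, which is Newman's degree assortativity. This is an illustrative NumPy sketch with a hypothetical function name, not the authors' implementation; it assumes a symmetric 0/1 adjacency matrix with zero diagonal and returns NaN when the coefficient is undefined (no edges, or all endpoint degrees equal as in a regular network).

```python
import numpy as np


def degree_assortativity(A):
    """Degree assortativity of an undirected 0/1 adjacency matrix with zero diagonal."""
    A = np.asarray(A)
    k = A.sum(axis=1).astype(float)        # degrees k_it
    i, j = np.nonzero(A)                   # ordered pairs with A_ij = 1; there are 2*E_t of them
    x, y = k[i], k[j]                      # degrees at the two ends of each ordered pair
    m = x.size
    if m == 0:
        return float("nan")                # no edges: coefficient undefined
    mean = x.sum() / m                     # equals the mean of (k_i + k_j)/2, since A is symmetric
    num = (x * y).sum() / m - mean ** 2
    den = (x ** 2).sum() / m - mean ** 2   # equals the mean of (k_i^2 + k_j^2)/2
    if den == 0:
        return float("nan")                # all endpoint degrees equal
    return float(num / den)
```

For a star network, where the hub links only to degree-one leaves, the sketch returns -1 (perfectly disassortative); for a path of four vertices it returns -0.5.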

Pandemic risk scores

In this paper, we propose two pandemic risk scores defined by the dynamic pandemic networks represented by \(A_{ij,t}\) in Eq. (1), the adjacency matrix at time t. Specifically, we aggregate the link information in \(A_{ij,t}\), susceptible population sizes (or population at risk) of each country, and the number of confirmed cases, \(X_{i,t}\), to construct two scores which help us to assess the potential transmission risk across countries, and the severity of the global pandemic risk due to COVID-19. Define the first risk score as the preparedness risk score (PRS)

$$\begin{aligned} S_{1t} = \omega _t^\top A_t \omega _t, \end{aligned}$$
(6)

where \(\omega _t\) is the vector of population sizes of the countries (based on the World Bank population figures in 2018) minus the total number of confirmed cases in each country up to time t. In other words, the ith element of \(\omega _t\), denoted by \(\omega _{it}\), represents the population of country i at risk, or susceptible, at time t. Since \(S_{1t}= \sum _{i=1}^{V_t} \sum _{j=1}^{V_t} \omega _{it} A_{ij, t} \omega _{jt}\), \(S_{1t}\) in Eq. (6) counts the total number of possible interactions between susceptible populations over all pairs of countries that are linked at time t. This PRS accounts for the risk of asymptomatic or presymptomatic transmission32 arising from possible interactions between people in two linked countries. These two kinds of transmission are usually hidden in the population, so quantifying them is useful for assessing pandemic risk, especially when transmission between two countries cannot be completely stopped by travel restrictions or other infection prevention policies. Therefore, the risk score \(S_{1t}\) can help us project the severity of the COVID-19 outbreak, for preparedness, if existing travel restrictions or lockdown schemes are relaxed.
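A minimal NumPy sketch of Eq. (6) follows. It assumes that the adjacency matrix is symmetric with zero diagonal and that the population and cumulative confirmed-case arrays are aligned over the \(V_t\) active countries; the function name preparedness_risk_score is hypothetical.

```python
import numpy as np


def preparedness_risk_score(A, population, cumulative_cases):
    """Return S_1t = w^T A w and the susceptible vector w (Eq. 6)."""
    A = np.asarray(A, dtype=float)          # assumed symmetric, zero diagonal
    # w_it: population at risk = population minus confirmed cases up to time t
    w = np.asarray(population, dtype=float) - np.asarray(cumulative_cases, dtype=float)
    S1 = float(w @ A @ w)                   # sum_i sum_j w_i A_ij w_j
    return S1, w


A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])                   # only countries 1 and 2 are linked
S1, w = preparedness_risk_score(A, [100, 200, 300], [10, 20, 30])
```

With only the first two countries linked, the score is \(2 \times 90 \times 180 = 32400\); the unlinked third country contributes nothing, however large its population at risk.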

We define the second risk score as the severity risk score (SRS)

$$\begin{aligned} S_{2t} = \omega _t^\top A_t n_t, \end{aligned}$$
(7)

where \(n_t\) is the vector of the number of currently infected people in each country at time t. In other words, the ith element of \(n_t\), denoted by \(n_{it}\), is calculated as \(n_{it}= \sum _{s=1}^t X_{i,s} - AR_{i,t} - AD_{i,t}\), where \(AR_{i,t}\) and \(AD_{i,t}\) are, respectively, the accumulated number of recovered cases and the accumulated number of deaths due to COVID-19 in country i up to time t. It is easy to see that \(S_{2t}= \sum _{i=1}^{V_t} \sum _{j=1}^{V_t} \omega _{it} A_{ij, t} n_{jt}.\) The rationale behind \(S_{2t}\) is to count the number of possible interactions between the currently infected cases in one country and all people at risk in another country, if the two countries are linked. The product \(\omega _{it} n_{jt}\) mirrors the classical SIR model7, in which the number of susceptible people at time t multiplied by the number of infected people at time t determines the rate of change of the number of susceptible people. This SRS accounts for presymptomatic transmission due to possible interaction between the susceptible population of one country and currently infected cases in another country before those cases are confirmed and isolated or quarantined. The SRS, \(S_{2t}\), can help assess the severity of COVID-19 given the current level of infection and help plan outbreak control measures. The steps for computing the PRS and SRS are summarized in the workflow diagram in Fig. 1.
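Eq. (7) can be sketched the same way. In this minimal NumPy illustration (hypothetical function name; adjacency matrix assumed symmetric with zero diagonal), the vector of people at risk \(\omega _t\) is passed in directly, and \(n_t\) is built from cumulative confirmed cases, accumulated recoveries and accumulated deaths.

```python
import numpy as np


def severity_risk_score(A, w, cumulative_cases, recovered, deaths):
    """Return S_2t = w^T A n and the currently-infected vector n (Eq. 7)."""
    A = np.asarray(A, dtype=float)          # assumed symmetric, zero diagonal
    w = np.asarray(w, dtype=float)          # people at risk, omega_t
    # n_it = cumulative confirmed - accumulated recovered - accumulated deaths
    n = (np.asarray(cumulative_cases, dtype=float)
         - np.asarray(recovered, dtype=float)
         - np.asarray(deaths, dtype=float))
    S2 = float(w @ A @ n)                   # sum_i sum_j w_i A_ij n_j
    return S2, n


A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
S2, n = severity_risk_score(A, [90, 180, 270], [10, 20, 30], [5, 10, 10], [1, 0, 5])
```

Here \(n_t = (4, 10, 15)\) and the single linked pair contributes \(90 \times 10 + 180 \times 4 = 1620\): unlike the PRS, each term pairs one country's people at risk with the other country's current infections.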

As in the literature33, we define the risk contribution of country i to the PRS as

$$\begin{aligned} f_{it} = \frac{ \omega _{it} \frac{\partial \sqrt{S_{1t}}}{\partial \omega _{it}} }{ \sqrt{S_{1t}} } = \frac{\omega _t^\top A_t {\mathbf {1}}_i {\mathbf {1}}_i^\top \omega _t}{\omega _t^\top A_t \omega _t}, \end{aligned}$$
(8)

where \({\mathbf {1}}_i\) is a column vector with the ith entry equal to 1 and all other entries equal to 0. Because \(\sqrt{S_{1t}}\) is homogeneous of degree one in \(\omega _t\), Euler's theorem for homogeneous functions gives \(\sum _i \omega _{it} \partial \sqrt{S_{1t}}/\partial \omega _{it} = \sqrt{S_{1t}}\), and hence \(\sum _i f_{it} = 1\). From Eq. (8), \(f_{it}\) is approximately equal to the ratio of the percentage change in \(\sqrt{S_{1t}}\) to the percentage change in \(\omega _{it}\). If a particular country i has a high \(f_{it}\), reducing the mobility of the people at risk in that country may lead to a substantial reduction in \(S_{1t}\), thereby lowering the global pandemic risk. Similarly, we can define the risk contribution of country i to the SRS as

$$\begin{aligned} \frac{ n_{it} \frac{\partial S_{2t}}{\partial n_{it}} }{ S_{2t} } = \frac{ n_{it} \sum _{j=1}^{V_t} \omega _{jt} A_{ji,t} }{ S_{2t} } = g_{it}. \end{aligned}$$
(9)

Again, since \(S_{2t}\) is linear in \(n_t\), Euler's theorem gives \(\sum _i g_{it}=1\). Equation (9) implies that \(g_{it}\) is approximately equal to the ratio of the percentage change in \(S_{2t}\) to the percentage change in \(n_{it}\). If a particular country i has a high \(g_{it}\), quarantining its currently infected cases and strengthening social distancing measures as far as possible can help ‘block’ the interaction of infected people with people at risk, and may lead to a substantial reduction in \(S_{2t}\), thereby achieving better control of the global pandemic. In summary, policymakers may refer to \(S_{1t}\) for outbreak prediction or prevention, whereas \(S_{2t}\) is more meaningful for outbreak control.
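The risk contributions in Eqs. (8) and (9) reduce to elementwise products, so they can be computed for all countries at once. The following NumPy sketch is illustrative only: the function names risk_contributions and elasticity_sqrt_prs are hypothetical, and it assumes a symmetric adjacency matrix with zero diagonal and nonzero PRS and SRS. The second function compares \(f_{it}\) with a finite-difference elasticity of \(\sqrt{S_{1t}}\), which is the interpretation given after Eq. (8).

```python
import numpy as np


def risk_contributions(A, w, n):
    """Country shares f_it of the PRS (Eq. 8) and g_it of the SRS (Eq. 9)."""
    A = np.asarray(A, dtype=float)
    w = np.asarray(w, dtype=float)
    n = np.asarray(n, dtype=float)
    Aw = A @ w                  # (A w)_i = sum_j A_ij w_j
    f = w * Aw / (w @ Aw)       # f_it = w_i (A w)_i / S_1t
    ATw = A.T @ w               # sum_j w_j A_ji
    g = n * ATw / (ATw @ n)     # g_it = n_i sum_j w_j A_ji / S_2t
    return f, g


def elasticity_sqrt_prs(A, w, i, eps=1e-7):
    """Finite-difference elasticity of sqrt(S_1t) with respect to w_i."""
    A = np.asarray(A, dtype=float)
    w = np.asarray(w, dtype=float)
    w2 = w.copy()
    w2[i] *= 1 + eps            # small relative change in w_i
    return float((np.sqrt(w2 @ A @ w2) / np.sqrt(w @ A @ w) - 1) / eps)


A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])       # country 2 is linked to both others
f, g = risk_contributions(A, [90, 180, 270], [4, 10, 15])
```

With this toy input, \(f_t = (0.125, 0.5, 0.375)\), both sets of shares sum to one, and the middle country, which is linked to both others, receives the largest share of both scores.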

The risk contributions \(f_{it}\) and \(g_{it}\) with respect to the PRS and SRS can be computed at the global, regional, and country levels (see Fig. 1). For active pandemic risk monitoring, the most up-to-date PRS, SRS, \(f_{it}\), and \(g_{it}\) can be accessed at http://covid-19-dev.github.io/.

Sensitivity analysis

The construction of the dynamic pandemic network in Eq. (1) requires the specification of the threshold value r. To perform a sensitivity analysis on the choice of r, we study the pandemic network properties for \(r = 0.4, 0.5\) and 0.6. Figure 7 shows the PRS, SRS, network density, clustering coefficient and assortativity based on the pandemic networks constructed using \(r = 0.4, 0.5\) and 0.6. A smaller r yields more connections, so the three network density time series in Fig. 7 differ, with the highest at \(r=0.4\). Although the three density time series have different levels, they exhibit very similar trends. In addition, both the clustering coefficient and the assortativity are insensitive to the choice of r, implying that the major properties of pandemic network connectedness are not sensitive to the threshold r within the range 0.4 to 0.6. A reviewer suggested that the adjacency matrix could be weighted to reflect the different levels of connection between countries. Following this suggestion, we also compute the PRS and SRS with the adjacency matrix defined by \(A_{ij,t}= \rho _{ij,t}\) if \(\rho _{ij,t} > 0\) and \(A_{ij,t}=0\) otherwise. In Fig. 7, all four versions of the PRS and of the SRS (the three thresholds and the weighted adjacency matrix) exhibit similar temporal patterns, indicating that the \(A_t\) defined by different r and the \(A_t\) weighted by \(\rho _{ij,t}\) give consistent pandemic risk assessments. The above sensitivity analysis demonstrates the robustness of our proposed pandemic network methodology, which allows users to choose their own threshold within this range without changing the major conclusions for pandemic risk prediction and assessment of COVID-19.
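The binary and weighted adjacency matrices compared in the sensitivity analysis can both be built from the same correlation matrix. In this sketch (hypothetical function name), passing a threshold r gives \(A_{ij,t} = 1\) if \(\rho _{ij,t} > r\), while passing None gives the weighted version \(A_{ij,t} = \rho _{ij,t}\) if \(\rho _{ij,t} > 0\); setting the diagonal to zero is an assumption, since Eq. (1) is not reproduced in this section. The toy correlation matrix and susceptible populations are made up for illustration.

```python
import numpy as np


def adjacency_from_correlation(rho, r=None):
    """Binary A (rho > r) if r is given; weighted A (rho where rho > 0) if r is None."""
    rho = np.asarray(rho, dtype=float)
    if r is None:
        A = np.where(rho > 0, rho, 0.0)   # weighted version
    else:
        A = (rho > r).astype(float)       # thresholded version
    np.fill_diagonal(A, 0.0)              # assumed: no self-loops
    return A


rho = np.array([[1.0, 0.45, 0.7],
                [0.45, 1.0, -0.2],
                [0.7, -0.2, 1.0]])        # toy correlation matrix
w = np.array([1.0, 2.0, 3.0])             # toy susceptible populations
prs = {r: float(w @ adjacency_from_correlation(rho, r) @ w) for r in (0.4, 0.5, 0.6, None)}
print(prs)
```

In this toy case the pair with correlation 0.45 is linked only at \(r = 0.4\), so that threshold gives the largest PRS, while \(r = 0.5\), \(r = 0.6\) and the weighted matrix give similar values.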

It was also suggested that the SRS would be better calculated using the number of local COVID-19 cases rather than the total number of confirmed cases. This is a good point. However, most countries do not provide information on local cases. Since the percentage of imported cases has been relatively small under the extensive travel restrictions imposed after March 202034, imported cases should contribute very little to the counts entering the SRS, and so the general trend of the SRS will likely remain the same even if the number of confirmed cases is replaced by the number of local cases. Therefore, we keep the definition of the SRS in Eq. (7) in our application to the COVID-19 pandemic. Another related issue in calculating the SRS is the availability of the number of recovered cases. As the WHO data contain only the official numbers of COVID-19 confirmed cases and deaths, the number of recovered cases can be missing or under-reported in some countries35,36,37. To deal with incomplete recovery information, we perform an additional analysis to estimate the trend of the world COVID-19 recovery rate over time. We first set a threshold \(r_R\) for classifying whether the number of recovered cases is under-reported or missing. Define \(R_{i,t}^{(R)}\) as the reported recovery rate of country i at time t, i.e. the total reported number of recovered cases up to time t divided by the total number of confirmed cases up to time t. We estimate the world recovery rate by filtering out those countries with \(R_{i,t}^{(R)} < r_R\), and calculate

$$\begin{aligned} R_{w,t} = \frac{\sum _{i: R_{i,t}^{(R)} \ge r_R} AR_{i,t}^{(R)}}{\sum _{i: R_{i,t}^{(R)} \ge r_R} \sum _{s=1}^t X_{i,s}}, \end{aligned}$$
(10)

where \(AR_{i,t}^{(R)}\) is the accumulated number of recovered cases of country i reported up to time t. Therefore, \(R_{w,t}\) is an estimated world recovery rate at time t based on those countries whose reported recovery rate at time t is at least \(r_R\). Figure 8 presents the estimated world recovery rate based on Eq. (10) with \(r_R = 0.5\) to 0.8. The four time series in the top panel show an increasing trend in the recovery rate from May 2020 to December 2020. By mid-December 2020, the four estimated recovery rates based on different \(r_R\) are consistent in that they seem to converge to around 0.9, which is in line with some published recovery rates38,39,40.
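Eq. (10) amounts to a filtered ratio of sums. The minimal NumPy sketch below (hypothetical function name) assumes that cumulative confirmed cases are positive for every active country and that missing recovery counts may be encoded as NaN, which the comparison with \(r_R\) automatically filters out.

```python
import numpy as np


def world_recovery_rate(recovered, cumulative_cases, r_R=0.7):
    """Estimate R_w,t (Eq. 10) from countries whose reported recovery rate is >= r_R."""
    AR = np.asarray(recovered, dtype=float)         # AR^{(R)}_{i,t}; NaN if missing (assumed)
    X = np.asarray(cumulative_cases, dtype=float)   # sum_s X_{i,s}, assumed > 0
    reported_rate = AR / X                          # R^{(R)}_{i,t}
    keep = reported_rate >= r_R                     # NaN compares False, so it is dropped
    if not keep.any():
        return float("nan")                         # no country passes the filter
    return float(AR[keep].sum() / X[keep].sum())


R_w = world_recovery_rate([80, 10, 900], [100, 100, 1000], r_R=0.7)
```

With reported recovery rates of 0.8, 0.1 and 0.9, the second country is filtered out at \(r_R = 0.7\) and \(R_{w,t} = 980/1100 \approx 0.89\); raising \(r_R\) to 0.85 would leave only the third country.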

For countries with \(R_{i,t}^{(R)} < r_R\), we regard the number of recovered cases as either under-reported or missing, and we impute it by taking \(AR_{i,t} = R_{w,t} \times \sum _{s=1}^t X_{i,s}\). That is, we replace the under-reported or missing recovery rates at time t by \(R_{w,t}\). For countries with \(R_{i,t}^{(R)} \ge r_R\), we take \(AR_{i,t} = AR_{i,t}^{(R)}\); that is, we regard the reported accumulated numbers of recovered cases as valid official values to be adopted in the SRS calculation. The bottom panel of Fig. 8 gives the SRS based on the four values of \(r_R\). Again, the four SRS time series show a similar trend, with local peaks in early October, late October, late November and mid-December 2020. We adopt \(r_R = 0.7\) in Figs. 4 and 6 because the time series of \(R_{w,t}\) is relatively more stable when \(r_R = 0.7\). Given that the SRS is designed to capture the relative severity of the pandemic risk for nowcasting the COVID-19 pandemic, the robustness of the SRS to the choice of \(r_R\) gives us flexibility in adjusting \(r_R\) in real applications.
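The imputation rule can be sketched in a few lines. In this illustration (hypothetical function name), the world recovery rate \(R_{w,t}\) is passed in as a number, and a missing recovery count encoded as NaN is treated as under-reported, which is an assumption about how missing values are stored.

```python
import numpy as np


def impute_recovered(recovered, cumulative_cases, R_w, r_R=0.7):
    """Accumulated recovered cases AR_{i,t} used in the SRS, after imputation."""
    AR = np.asarray(recovered, dtype=float)         # reported AR^{(R)}_{i,t}
    X = np.asarray(cumulative_cases, dtype=float)   # sum_s X_{i,s}, assumed > 0
    reported_rate = AR / X                          # R^{(R)}_{i,t}
    # Under-reported or missing (rate < r_R, or NaN): use R_w * cumulative cases.
    under_reported = ~(reported_rate >= r_R)
    return np.where(under_reported, R_w * X, AR)


AR_imputed = impute_recovered([80, 10, 900], [100, 100, 1000], R_w=980 / 1100)
```

Only the second country, whose reported recovery rate of 0.1 falls below \(r_R = 0.7\), is imputed, receiving \(100 \times R_{w,t} \approx 89.1\) recovered cases; the reported counts of the other two countries are kept.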

Figure 7

The left panel shows the PRS and SRS, and the right panel shows the network density, clustering coefficient and assortativity for \(r = 0.4, 0.5\) and 0.6.

Figure 8

The top panel shows the estimated world recovery rate based on \(r_R = 0.5\) to 0.8. The bottom panel shows the SRS based on the imputed number of recovered cases with \(r_R = 0.5\) to 0.8.