Impact of the Euro 2020 championship on the spread of COVID-19

Large-scale events like the UEFA Euro 2020 football (soccer) championship offer a unique opportunity to quantify the impact of gatherings on the spread of COVID-19, as the number and dates of matches played by participating countries resemble a randomized study. Using Bayesian modeling and the gender imbalance in COVID-19 data, we attribute 840,000 (95% CI: [0.39M, 1.26M]) COVID-19 cases across 12 countries to the championship. The impact depends non-linearly on the initial incidence, the reproduction number R, and the number of matches played. The strongest effects are seen in Scotland and England, where as many as 10,000 primary cases per million inhabitants arose from championship-related gatherings. The average match-induced increase in R was 0.46 [0.18, 0.75] on match days, but important matches caused an increase as large as +3. Altogether, our results provide quantitative insights that help judge and mitigate the impact of large-scale events on pandemic spread.


• Spain: Ministry of Public Health
Aggregated by COVerAGE-DB from https://cnecovid.isciii.es/covid19/

To estimate the deaths associated with the Euro 2020 cases, we calculated the case fatality risk from the numbers of deaths and cases reported by Our World in Data (OWD) [2].
For showcasing the stringency of governmental measures (panel C in Figs. S24-S36), we used data from the Oxford COVID-19 Government Response Tracker [3] and the public health and social measures (PHSM) severity index [4] from the World Health Organization (WHO). For our correlational analysis of cases and human mobility (Figs. 3B and S4), we used data from the COVID-19 Community Mobility Reports [5] provided by Google. For the correlation with pre-Euro 2020 incidences (Fig. S6B), we used case numbers reported by Johns Hopkins University (JHU) [6]. Lastly, we used data from Google Trends [7] to investigate people's interest in the Euro 2020 (Fig. S20).
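The case fatality risk calculation described above can be sketched as a simple ratio. This is a minimal illustration under our own assumptions, not the exact procedure used for the paper: we assume cumulative deaths and cases are taken over the same time window with no lag between case reporting and death, and the function names `case_fatality_risk` and `deaths_attributable` are hypothetical.

```python
def case_fatality_risk(deaths, cases):
    """Ratio of cumulative deaths to cumulative cases over one window.

    Assumption (illustrative): deaths and cases cover the same window,
    with no lag between case reporting and death.
    """
    total_cases = sum(cases)
    if total_cases == 0:
        raise ValueError("no cases in the selected window")
    return sum(deaths) / total_cases


def deaths_attributable(attributed_cases, cfr):
    """Expected deaths among cases attributed to the championship."""
    return attributed_cases * cfr


if __name__ == "__main__":
    # Made-up daily counts, for illustration only.
    cfr = case_fatality_risk(deaths=[1, 2, 3], cases=[100, 200, 300])
    print(cfr)                              # 0.01
    print(deaths_attributable(5000, cfr))   # 50.0
```

A more careful estimate would shift the death counts by a reporting delay; that refinement is left out of this sketch.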

S2 Supplementary analysis: our results in context
Supplementary Figure S1: Our results in context: How much of an effect do short but strong increases of transmission have? A-C: Treating Euro 2020 matches as point interventions, in which the reproduction number is allowed to increase drastically from its base level R_base for one day (∆R = 2.0, yellow curve), we compare their cumulative effect with different scenarios of lifting restrictions. These effects are of the same order of magnitude as those reported in the literature [8]. The purple curve represents the same total increase distributed over one week (∆R = 0.28 ≈ 2/7), while the red curve represents a permanent lifting of those restrictions. The effects of the yellow and purple interventions are similar for t ≤ 2 weeks because the product of ∆R and the duration of the intervention is the same. D: We observe long-term effects of consecutive interventions even when R_base is lower than one (red dotted line). The impact of these effects increases exponentially with R_base. E: Similarly, the final incidence (after six weeks) increases with R_base. The red dotted line indicates that an incidence ratio larger than one can already result from values of R_base smaller than one. Altogether, the cumulative effect of short but strong interventions (such as Euro 2020 matches) can be compared to lifting all bans on gatherings for a certain period of time. Curves were generated using a linear SEIS model without immunity for illustrative purposes.
To put our results in context, we compare the impact that different hypothetical scenarios of lifting restrictions would have on case numbers (Fig. S1). Using a linear SEIS model for illustrative purposes, we evaluate three scenarios: i) recurrent large events every two weeks (period T = 2 weeks) that strongly increase the reproduction number over its base level R_base for one day by ∆R_s = 2.0 (yellow curves); ii) a temporary one-week lifting of restrictions whose total effect equals that of a single-day large event, obtained by distributing the increase over a week: ∆R_w = 0.28 ≈ 2/7 (purple curves); iii) a permanent lifting of restrictions to the level of the second scenario: ∆R_p = 0.28 for the considered time span (red curves). The value of ∆R_s in the first scenario is comparable to the largest effects found for the England-Scotland matches, while those in the second and third scenarios are similar to the effect of banning all private gatherings of 2 people or more as reported in [8].
The effects of interventions are comparable whenever the products of ∆R and the duration of the interventions are the same (e.g., yellow and purple curves for t ≤ 2 weeks in Fig. S1A, B). In other words, the cumulative effect of short but strong interventions (such as Euro 2020 matches) can be compared to lifting all bans on gatherings for a certain period of time. However, for regularly recurring interventions of size ∆R_s, we observe permanent long-term effects when R_base + ∆R_s/T ≥ 1; the impact of recurring interventions increases disproportionately over time (Fig. S1A-C). Controlling the long-term effect of recurrent increases of the reproduction number is possible if the underlying reproduction number R_base is small enough. Small changes of R_base substantially affect the outcome, and they do so exponentially, even below the threshold R_base = 1 (Fig. S1D, E). This underlines the importance of control strategies if large-scale events are expected to temporarily increase the spread of COVID-19.
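The scenarios of Fig. S1 can be reproduced qualitatively with a short simulation. The sketch below is a discrete-time linear SEIS model with one-day steps; the latent and infectious rates (σ = γ = 0.5 per day, which gives a mean generation interval of about four days in this discretization), the starting state, the event day, and R_base = 0.9 are our own illustrative assumptions, not the parameters behind Fig. S1. We read the period T in days, so that ∆R_s/T = 2/14 ≈ 0.14 is the time-averaged increase from fortnightly events. In this discrete toy model, the threshold for sustained growth under recurrent events lies close to, but not exactly at, R_base + ∆R_s/T = 1.

```python
# Minimal discrete-time linear SEIS sketch (illustrative assumptions only).
R_BASE = 0.9                 # assumed base reproduction number
DELTA_R_S = 2.0              # one-day increase (single large event)
DELTA_R_W = DELTA_R_S / 7    # same total increase spread over a week, ~0.28


def simulate_linear_seis(r_of_t, days, sigma=0.5, gamma=0.5, e0=0.0, i0=1.0):
    """Return daily new infections of a linear SEIS model.

    'Linear' means the susceptible pool is treated as constant, so new
    infections are R(t) * gamma * I(t). Without immunity, recovered people
    simply return to the (untracked) susceptible pool.
    """
    exposed, infectious = e0, i0
    incidence = []
    for t in range(days):
        new_infections = r_of_t(t) * gamma * infectious
        # Update both compartments from their values at the start of day t.
        exposed, infectious = (
            exposed + new_infections - sigma * exposed,
            infectious + sigma * exposed - gamma * infectious,
        )
        incidence.append(new_infections)
    return incidence


def baseline(t):
    return R_BASE


def single_event(t):
    # Scenario with one large event on day 7.
    return R_BASE + (DELTA_R_S if t == 7 else 0.0)


def one_week(t):
    # Same total increase, spread over days 7-13.
    return R_BASE + (DELTA_R_W if 7 <= t < 14 else 0.0)


def permanent(t):
    # Increase of DELTA_R_W from day 7 onwards.
    return R_BASE + (DELTA_R_W if t >= 7 else 0.0)


def recurrent(r_base, period=14, delta=DELTA_R_S):
    """Events of size `delta` every `period` days on top of `r_base`."""
    return lambda t: r_base + (delta if t % period == 0 else 0.0)


if __name__ == "__main__":
    days = 42  # six weeks
    base_total = sum(simulate_linear_seis(baseline, days))
    for name, scenario in [("single", single_event),
                           ("week", one_week),
                           ("permanent", permanent)]:
        extra = sum(simulate_linear_seis(scenario, days)) - base_total
        print(f"{name:>9}: extra infections = {extra:.2f}")
```

Running the script prints the additional infections over six weeks for each scenario: the single-day event and the one-week lifting give similar totals, while the permanent lifting gives more.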
Quantitatively, the expected size z of an infection chain depends on the effective reproduction number R_eff. As long as R_eff is larger than one, infection chains can become arbitrarily large. But even if R_eff < 1, a single infection is expected to cause a total of z = (1 − R_eff)^(−1) infections (including itself) before the chain dies out. For example, if R_eff = 0.9, a single infection caused by the Euro 2020 implies z = 10 infections in the total chain. Thus, primary cases account for only a small part of the total; most of the impact of an event like the Euro 2020 comes from subsequent infections spreading into the general population (e.g., Fig. 2A).
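The chain-size argument above is the geometric series 1 + R_eff + R_eff^2 + ... = (1 − R_eff)^(−1), where each term is the expected number of infections in one generation. A minimal sketch, with hypothetical function names, that evaluates both the closed form and a truncated sum over generations:

```python
import math


def expected_chain_size(r_eff):
    """Expected total infections in a chain started by one infection.

    Closed form of the geometric series 1 + R + R^2 + ..., which counts
    the initial infection itself. For R_eff >= 1 the expectation diverges.
    """
    if r_eff >= 1.0:
        return math.inf
    return 1.0 / (1.0 - r_eff)


def chain_size_by_generation(r_eff, n_generations):
    """Truncated sum: expected infections in generations 0..n_generations."""
    return sum(r_eff ** k for k in range(n_generations + 1))


if __name__ == "__main__":
    print(expected_chain_size(0.9))            # approximately 10
    print(chain_size_by_generation(0.9, 50))   # partial sum, still below 10
```

The truncated sum shows how slowly the chain converges when R_eff is close to one: many generations of subsequent infections contribute before the total approaches z.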

S4 Supplementary Figures
Supplementary Figure S2: Overview of the sum of primary and subsequent cases attributable to the Euro 2020. Calculations account for cases until July 31st, i.e., about three weeks after the championship finished.
In the Netherlands (⋆), the "freedom day" occurred at the same time as the Euro 2020. This event also produced a gender imbalance, making it hard for our model to extract the Euro 2020 effect (see Fig. S31). White dots represent median values, black bars and whiskers correspond to the 68% and 95% credible intervals (CI), respectively, and the distributions in color (truncated at 99% CI) represent the differences by gender (n = 12 countries).

Supplementary Figure S3: Overview of cases in all considered countries apart from the Netherlands. We split the observed incidence (black diamonds) of the three countries with the largest effect size into i) cases independent of Euro 2020 matches (gray area), ii) primary cases (directly associated with Euro 2020 matches, red area), and iii) subsequent cases (additional infection chains started by primary cases, orange area). See Figure 2 for more details. The turquoise shaded areas correspond to 95% CI. In the box plots, white dots represent median values, turquoise bars and whiskers correspond to the 68% and 95% credible intervals (CI), respectively.
Supplementary Figure S4: We found no significant correlation between cases arising from the Euro 2020 and human mobility. Using mobility data from the "Google COVID-19 Community Mobility Reports" [5], we tested for correlation against the fraction of Euro 2020-related cases. We found no significant correlation for any of the categories (A-F) of the Mobility Report. The gray line and area are the median and 95% credible interval of the linear regression (n = 11 countries; the Netherlands was excluded from this analysis). Whiskers denote one standard deviation.
Supplementary Figure S5: We found no significant correlation between cases arising from the Euro 2020 and the stringency of governmental interventions. We correlated the average stringency index of the Oxford COVID-19 Government Response Tracker [3] in the two weeks before the championship with the total number of cases per million inhabitants related to football gatherings. The gray line and area are the median and 95% credible interval of the linear regression (n = 11 countries; the Netherlands was excluded from this analysis). Whiskers denote one standard deviation.

Supplementary Figure S6: We found slight trends in the correlations between the impact of the Euro 2020 and both the base reproduction number and country popularity. While these correlations are below the classical significance threshold of 0.05, they are less explanatory than the potential for spread (defined in Fig. 3). There was no significant correlation between the initial COVID-19 incidence and the impact of the Euro 2020. The gray line and area are the median and 95% credible intervals of the linear regression (n = 11 countries; the Netherlands was excluded from this analysis). Whiskers denote one standard deviation.
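With only n = 11 countries, correlation estimates are noisy. As a complementary frequentist check (a swapped-in technique, not the Bayesian linear regression used for Figs. S4-S6), one could compute a Pearson correlation and a permutation p-value. The sketch below is our own illustration; `pearson_r` and `permutation_p_value` are hypothetical helpers, and the inputs would be per-country summaries such as the fraction of Euro 2020-related cases.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation.

    Shuffling y breaks any association with x; the p-value is the share of
    shuffles whose |r| is at least the observed |r| (with +1 smoothing so
    it is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)
```

The permutation approach makes no normality assumption, which is convenient for so few data points, but it tests only for a linear association and does not provide the credible band shown in the figures.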
Supplementary Figure S7

S4.1 Model including the effect of stadiums
Supplementary Figure S9: Including in our model the potential local transmission around the stadium where the matches occur does not significantly increase the overall effect. In addition to the effect of football-related gatherings (A), we extended our model to include an additive effect on the reproduction number when a country hosted a match (B) (for those countries that hosted matches, i.e., n = 6 countries). We assume that local transmissions in and around the stadium would be detected mainly in the venue's country. However, in countries where matches contribute significantly to COVID-19 spread, football-related cases are tied to the dates of matches played by the country's team (A) and not to the country of the stadium venue (B); this is especially visible for England and Scotland. This also explains why previous attempts at measuring Euro 2020-related cases that focused on stadium venues were inconclusive. For Spain, an increase in the base reproduction number close to the date of a match makes the model inconclusive. The transparent region marks the part of the posterior where we suppose that the model identifies the increase incorrectly, that is, where the posterior delay is smaller than 5.5 days. White dots represent median values, black bars and whiskers correspond to the 68% and 95% credible intervals (CI), respectively, and the distributions in color (truncated at 99% CI) represent the differences by gender.

S4.2 Testing the detection of a null-effect
Supplementary Figure S11: Shifting the match days by a large offset results in a non-significant effect. To test the reliability of our results, we ran counterfactual scenarios in which the match dates were moved outside the championship period. As expected, such offsets lead to non-significant average effect sizes across countries. White dots represent median values, black bars and whiskers correspond to the 68% and 95% credible intervals (CI) (n = 11 countries; the Netherlands was excluded from this analysis).

S4.3 Robustness of parameters
Supplementary Figure S12: Robustness test for the effect of the temporal association between matches and cases, obtained by varying the effective delay. We shifted all match days by a small offset in the positive or negative direction. Under these relatively small variations of the delay, the gender imbalance is strong enough to yield a stable effect size, as the prior of the delay still allows for a sufficient shift of the posterior delay. The model run for France with a 1-day offset is missing because of a sampling error of unknown origin. White dots represent median values, black bars and whiskers correspond to the 68% and 95% credible intervals (CI), respectively, and the distributions in color (truncated at 99% CI) represent the differences by gender (n = 11 countries; the Netherlands was excluded from this analysis).
Supplementary Figure S13: Robustness test for the effect of the width of the delay kernel. In this robustness test, we varied the prior for the width of the delay kernel from the country-specific default (green) towards smaller (yellow) and larger (purple) widths (left column). In the violin plots, the left side is the distribution for men; the right side for women. The right column shows the priors and resulting posteriors of the standard deviation of the delay kernel σ_D. Except for England and Scotland (A2, D2), the data do not constrain this parameter. Changing the prior assumptions on this parameter does not significantly modify the results in any country (left column). On average, allowing for larger widths increases the effect size over the reported results. White dots represent median values, black bars and whiskers correspond to the 68% and 95% credible intervals (CI), respectively, and the distributions in color (truncated at 99% CI) represent the differences by gender.
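The roles of the delay D and the kernel width σ_D can be illustrated by convolving daily match-related infections with a discretized delay kernel. The Gaussian shape, the truncation at 20 days, and the helper names below are assumptions made for illustration only; this section does not specify the kernel shape used in the model.

```python
import math


def delay_kernel(mean_delay, sd, max_lag=20):
    """Discretized, normalized Gaussian weights for lags 0..max_lag (days).

    The Gaussian shape is an illustrative assumption.
    """
    weights = [math.exp(-0.5 * ((lag - mean_delay) / sd) ** 2)
               for lag in range(max_lag + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def delayed_cases(infections, kernel):
    """Spread each day's infections over later reporting days."""
    reported = [0.0] * (len(infections) + len(kernel) - 1)
    for day, count in enumerate(infections):
        for lag, weight in enumerate(kernel):
            reported[day + lag] += count * weight
    return reported


if __name__ == "__main__":
    # Hypothetical match-day infections: 100 infections on day 2 only.
    infections = [0, 0, 100, 0, 0]
    narrow = delayed_cases(infections, delay_kernel(mean_delay=5, sd=1))
    wide = delayed_cases(infections, delay_kernel(mean_delay=5, sd=3))
    print(narrow.index(max(narrow)))   # 7: the peak is shifted by the delay
    print(max(narrow) > max(wide))     # True: a wider kernel lowers the peak
```

A wider kernel spreads the same number of cases over more days and lowers the peak, which weakens the temporal alignment between matches and reported cases.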
Supplementary Figure S14: Robustness test for the effect of the allowed variability of the base reproduction number. We compare models with three different intervals between change points of the base reproduction number: 6 days (yellow), 10 days (green), and 20 days (purple). In the violin plots, the left side is the distribution for men; the right side for women. We find no significant differences in the fraction of football-related cases (left column) or in the base reproduction numbers R_base (right column). On average, allowing less variation in R_base (i.e., removing the freedom of the model to absorb potential gender-symmetric and non-time-resolved cases related to football matches into short-timescale variations of R_base) increases the effect size over the reported baseline results. Shaded areas in panels *2 correspond to 95% CI.
White dots represent median values, black bars and whiskers correspond to the 68% and 95% credible intervals (CI), respectively, and the distributions in color (truncated at 99% CI) represent the differences by gender.

Supplementary Figure S15: Robustness test for the effect of the fraction of female participation in football-related gatherings. The default model employs a relatively constraining prior for the fraction of female participation in football-related gatherings (green), motivated by [9]. To check the influence of this assumption, in an alternative model we assume a less informative prior with a mean female participation of 50% (purple) instead of 20% (green) (A2-G2). We do not find large differences in the results. On average, the total fraction of cases attributed to football matches grows when larger female participation in fan gatherings is allowed.
Hence, more cases are attributed to the Euro 2020 overall than in the baseline model. At the same time, a constraint that the model uses to associate cases with matches is relaxed. Thus, on average, the uncertainty of the posterior grows slightly (A1-G1). White dots represent median values, black bars and whiskers correspond to the 68% and 95% credible intervals (CI), respectively, and the distributions in color (truncated at 99% CI) represent the differences by gender.

Supplementary Figure S16: Robustness test for the effect of the generation interval. We compare models with three different generation intervals, with a mean of 4 days (yellow), 5 days (green), and 6 days (purple). The lack of a significant difference in the fraction of football-related cases (left column) shows that our conclusions do not change if we assume longer generation intervals than our base assumption of 4 days. The base reproduction number R_base (right column) increases with a longer assumed generation interval, which is expected because the observed growth of cases that needs to be modeled stays fixed. In the violin plots, the left side is the distribution for men; the right side for women. Shaded areas in the right column correspond to 95% CI. White dots represent median values, black bars and whiskers correspond to the 68% and 95% credible intervals (CI), respectively, and the distributions in color (truncated at 99% CI) represent the differences by gender.
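Why R_base rises with the assumed generation interval can be seen from the standard relation between a fixed exponential growth rate r and the reproduction number, R = 1/M(−r), where M is the moment-generating function of the generation-interval distribution (Wallinga and Lipsitch, 2007). Assuming, for illustration only, a gamma-distributed generation interval with shape k and mean μ, this gives R = (1 + r·μ/k)^k. The shape k = 4 and the growth rate below are hypothetical values, not those of the model.

```python
def reproduction_number(growth_rate, mean_gi, shape=4.0):
    """R implied by growth rate r for a gamma generation interval.

    Uses R = 1 / M(-r), where M(s) = (1 - scale * s) ** (-shape) is the
    moment-generating function of a gamma distribution with the given
    shape and scale = mean_gi / shape. Hence R = (1 + r * scale) ** shape.
    """
    scale = mean_gi / shape
    if 1.0 + growth_rate * scale <= 0.0:
        raise ValueError("growth rate outside the domain of the MGF")
    return (1.0 + growth_rate * scale) ** shape


if __name__ == "__main__":
    r = 0.05  # hypothetical daily growth rate of cases
    for mean_gi in (4.0, 5.0, 6.0):
        print(mean_gi, round(reproduction_number(r, mean_gi), 3))
```

For a growing epidemic (r > 0), R increases with the mean generation interval; for a shrinking one (r < 0), it moves further below one.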
Supplementary Figure S20: Relative popularity of the search term "football" in England and Scotland, measured using Google Trends [7] in the category "sport news". Vertical red lines mark the final and the Scotland vs England match, respectively.
Supplementary Figure S21: The male-female imbalance over time shows the largest deviations during the championship. We plotted the gender imbalance directly from our data (left column). All countries that showed significant effects had their largest imbalance change during or slightly after the championship (red), as did a number of countries without significant effects. In addition, the standard deviation of the imbalance during the championship (red) was on average larger than before the championship (orange, right column). This indicates that the large changes in imbalance during the championship were highly unusual and cannot be attributed to chance alone. The red period comprises the 30 days of the tournament plus the 5 days after it; the orange period comprises the 35 days before the tournament.
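The comparison of imbalance variability before and during the championship can be sketched as follows. We assume, for illustration, that the daily imbalance is the relative excess of male cases, (m − f)/(m + f); the definition used in the model may differ, and the helper names are hypothetical. The windows follow the caption: 35 days before the tournament start, and 35 days from the start (30 tournament days plus 5 days after).

```python
import statistics


def gender_imbalance(male_cases, female_cases):
    """Hypothetical daily imbalance: (m - f) / (m + f).

    Days with zero total cases would need special handling (not done here).
    """
    return [(m - f) / (m + f) for m, f in zip(male_cases, female_cases)]


def imbalance_std_before_during(male_cases, female_cases, start, window=35):
    """Standard deviation of the imbalance before vs. from the tournament start.

    start: index of the first tournament day; window: days per period
    (35 = 30 tournament days + 5 days after, following the caption).
    """
    if start < window or start + window > len(male_cases):
        raise ValueError("series too short for the requested windows")
    values = gender_imbalance(male_cases, female_cases)
    before = values[start - window:start]
    during = values[start:start + window]
    return statistics.stdev(before), statistics.stdev(during)
```

With real data, one would compute these two numbers per country and compare them across countries, as in the right column of the figure.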

S4.4 Further analyses
Supplementary Figure S22: The inferred noise terms do not depend strongly on the length of the analyzed time period. We plotted the size of our gender noise term σ_∆γ and the size of the change points of the base reproduction number σ_∆γ. When the model run begins a month earlier (blue), the noise terms do not change significantly compared to our base model (orange). White dots represent median values, colored bars and whiskers correspond to the 68% and 95% credible intervals (CI).
Supplementary Figure S23: The inferred effect size (percentage of football-related primary infections) does not depend strongly on the length of the analyzed time period. To show that the total length of the analyzed period does not significantly change our results, we compare the percentage of football-related primary infections in one-month-longer runs (blue) with our base model (orange). White dots represent median values, colored bars and whiskers correspond to the 68% and 95% credible intervals (CI).

S4.5 Posterior of parameters
Supplementary Figure S24: Overview of the posterior for England. We show (A) the time dependence of the incidence before, during (blue shaded area), and after the championship; (B) the gender imbalance of observed cases; (C) the Oxford COVID-19 Government Response Tracker (OxCGRT) [3] and the public health and social measures severity index (PHSM) [4] (not part of the model); (D) the gender-symmetric base reproduction number R_base; (E) the gender-asymmetric football reproduction number R_football; (F) the gender-asymmetric noise-related reproduction number R_noise; and (G) to (Q) the prior and posterior of various parameters. In mid-July, the incidence starts dropping. In contrast, the number of deaths continues to increase. Together, this indicates that the testing policy was changed around that time. England is one of the two countries where the delay D and the female participation in the fan activities that dominate the additional transmission can be measured and significantly constrained by the data relative to the prior distribution (G and I). Red diamonds show data not used for the analysis. This comes with an increase in the uncertainty of the model prediction. Two slight bumps of the base reproduction number are visible: one during and one after the end of the championship. The first bump may indicate that our model is not able to fully attribute a part of the effective reproduction number to ∆R_football and instead attributes the effect of England's matches in the group phase to the base reproduction number. The second bump might be explained as follows: during the championship, there may generally be more social contacts that are not synchronized with the matches and are therefore explained not by ∆R_football but by R_base. Hence, after the championship the base reproduction number decreases, and it increases again when measures are lifted (C). The turquoise shaded areas correspond to 95% credible intervals.
Supplementary Figure S25. The overall incidence is relatively low, which increases the noisiness of the data. This is especially apparent in the gender imbalance (B). The base reproduction number increases slowly during the analyzed time period, which can be partially explained by a decrease of the stringency index (C). The match effects are greater for later matches, from the last group match until the quarterfinals (E), which is the expected variation. The turquoise shaded areas correspond to 95% credible intervals.
Supplementary Figure S28. Hardly any significant effects are observed, apart from a small but long-lasting increase in R_base. The turquoise shaded areas correspond to 95% credible intervals.