Risk factors for territorial spreading of SARS-CoV-2 in North-eastern Italy

The impact of specific risk factors for SARS-CoV-2 infection spread was investigated among the 215 municipalities in north-eastern Italy. SARS-CoV-2 incidence was gathered fortnightly since April 1, 2020 (21 consecutive periods) to depict three indicators of virus spreading from hierarchical Bayesian maps. Eight explanatory features of the municipalities were obtained from official databases (urbanicity, population density, active population on total, hosting schools or nursing homes, proportion of commuting workers or students, and percent of > 75 years population on total). Multivariate Odds Ratios (ORs), and corresponding 95% Confidence Intervals (CIs), quantified the associations between municipality features and virus spreading. The municipalities hosting nursing homes showed an excess of positive tested cases (OR = 2.61, ever versus never, 95% CI 1.37;4.98), and displayed repeated significant excesses: OR = 5.43, 3–4 times versus 0 (95% CI 1.98;14.87) and OR = 6.10, > 5 times versus 0 (95% CI 1.60;23.30). Municipalities with an active population > 50% were linked to a unique statistical excess of cases (OR = 3.06, 1 time versus 0, 95% CI 1.43;6.57) and were inversely related to repeated statistically significant excesses (OR = 0.25, > 5 times versus 0; 95% CI 0.06;0.98). We highlighted specific municipality features that give clues about SARS-CoV-2 prevention.


Official data sources.
A. Data related to cases were originally gathered by the 18 territorial Prevention Departments of the FVG region, and successively coded, validated, and registered by the Regional Health Center Directorate, and made publicly available weekly through the Regional Department of Civil Protection (RDCP) website 6 by means of a comma delimited text file. Data were disentangled at municipality level (N = 215), the smallest Italian administrative area, without reference to age or sex of SARS-CoV-2 cases. The header of the text file reported the variable names only in the Italian language. Three relevant variables for the study were extracted from the file: 1) the numeric code of each municipality of the FVG region (reported as "ISTAT", i.e. the code defined by the Italian National Institute of Statistics); 2) the calendar date of the data collection (reported as "Data"); and 3) the cumulative number of alive persons confirmed positive and still without a negative SARS-CoV-2 test (reported as "Attualmente positivi"). It should be noted that the exact date of diagnosis of the cases was not reported in the file. It has been described that patients with mild-to-moderate SARS-CoV-2 remained infectious no longer that 10 days after symptom onset 7 , while most patients with severeto-critical SARS-CoV-2 remained infectious no longer that 20 days, except immunocompromized patients who remained infectious beyond 20 days. The severe-to-critical patients were those requiring hospitalization, intensive care, or ventilation support (e.g. patients that remained in strict isolation). Given these assumptions and the type of available data, we chose a priori to examine data fortnightly (the first and 15th day of each month, consecutively for 21 times since April 1st, 2020), empirically assuming that a 2-week lag (precisely: 15.3 ± 1.4 days) between analyses was adequate to limit the number of overlapping cases in two consecutive periods. B. The explanatory features of the municipalities were gathered from three different open-access data sources that are routinely collected. In order to reduce as much as possible the uncertainty of mixing different kind of data sources, we accessed only official, reliable, and well-referenced public data at Italian institutional level. Features have always been recorded in two groups due to the low number of municipalities in FVG region (N = 215). Features consisting of continuous data were split into two groups by setting the median as the cut-off. The feature: Municipalities hosting nursing homes (coded: no/yes) was directly provided by the FVG region 8 , last update, mid-February 2019. The other features were downloaded from the Italian National Institute of Statistics (ISTAT) 9 . In particular: 1) The percentage of the population aged between 20 and 59 years to total population, used as a generic indicator of people with potential and generic mobility (recoded as < 50% and ≥ 50%); 2) the percentage of the population aged above 75 years to the total population, used as an indicator of population at high risk of Covid-19 infection (recoded as < 14% and ≥ 14%); 3) the percentage of commuting students (recoded as < 14.4%. and ≥ 14.4%) and the percentage of commuting workers (recoded as < 50%. and ≥ 50%) to the total population, both used as indicators of people crossing the border of their municipality of residence on daily basis-these two indicators were released during the 2011 Italian national census; 4) the population density (recoded as < 100 and ≥ 100 inhabitants/km 2 ), and the urban-rural gradient both used as indicators of high risk municipalities; and 5) the municipalities hosting schools of any grade 10 . C. The number of resident FVG population (around 1,200,000 inhabitants) by municipality was abstracted from the same publicly available database of the cases (i.e. RDCP data source). For each 14-day period the expected numbers of cases for each municipality were computed using the sum of cases as the reference. Areaspecific heterogeneity was accounted for by considering two-component random effects: 1) an unstructured effect (uncorrelated heterogeneity); and 2) a group of neighboring random effects (correlated heterogeneity). The model parameters were estimated using Gibbs sampling. Model fitting was carried out using three separate Markov chains starting from different initial values. The first 60,000 samples from each chain were discarded as burn-in, and the following 40,000 iterations were sampled. Convergence was checked by visual inspection of the time series samples plots and by examination of four diagnostic indicators (Geweke, Gelman-Rubin, Raferty-Lewis, and Heidelberg-Welch diagnostics) produced by the coda package 12 . RRs and PPs were graphically displayed using two separate maps at municipality level. In one map, RR values were grouped in five categories (< 0.90, 0.90-0.94, 0.95-1.04, 1.05-1.09, and ≥ 1.10) with a color scheme of red tonalities. In a second map, PPs were grouped in five categories (< 2.5%, 2.5-9%, 10-89%, 90-97.5%, ≥ 97.5%) with gray tonalities. The boundaries of the four provinces of FVG region (i.e. administrative areas with one main urban center) were added to the maps. In order to calculate the three indicators of SARS-CoV-2 spreading, we exploited the whole set of 21 consecutive maps to group municipalities a priori according to the characteristics of the statistically significant excesses of the RRs. Specifically, in each municipality and for every 21 fortnightly maps we counted the number of times risk www.nature.com/scientificreports/ of SARS-CoV-2 presented a statistically significant excess (i.e., the times RR was significantly above 1 and with a PP ≥ 97.5%). In other words, this count of significant excesses allowed to link cases to specific municipalities repeatedly across examined periods and to measure their association with the studied features. Three indicators were obtained from the counts: 1) Statistically significant excesses (coded as "Never" for 0 excesses out of 21 maps, "Ever" for at least one excess in 21 maps) used to detect directly high risk municipalities; 2) peak periods (coded as "Never" for 0 excesses in 21 maps), "First peak" for an excess exclusively during the first peak (between 2020/04/01 and 2020/07/01), "Second peak" for an excess exclusively during the second peak (after 2020/07/01), and "Both" for at least one excess in both peaks) used as a sensitivity analysis of the study across peaks; and 3) number of repeated statistically significant excesses (Never, 1, 2, 3-4, and ≥ 5) used to measure the gravity of the virus spread by means of a frequency in each municipality across 21 maps.

Statistical analyses.
We computed crude rates (CR) per 10,000 (number of cases / population per 10,000) and counted the number of municipalities that displayed statistically significant excesses (RR > 1 with a PP ≥ 97.5%). Odds ratios (OR), and their corresponding 95% confidence intervals (CI), were computed using unconditional multiple logistic regression models 13 to quantify the association between the explanatory municipalities features and the COVID-19 spreading indicators. A p-value < 0.05 was considered statistically significant. All statistical analyses were carried out using SAS (version 9.4 SAS Institute, Cary, NC, USA). Figure 1 displays the spatial distribution and the descriptive indicators (number of cases and crude rates) associated with confirmed SARS-CoV-2 positive cases among the 215 municipalities of FVG region, during 21 consecutive fortnightly periods from April 1, 2020 to February 1, 2021. The first map showed that the number of confirmed SARS-CoV-2 positive cases was 1400 corresponding to a CR of 11.5/10,000 inhabitants. Cases increased only during the following 14-days (N = 1653, CR = 13.5) to decline continuously up to July 1, 2020 (N = 38, CR = 0.3). Afterwards, the number of cases increased up to January 15, 2021 (N = 17,661, CR = 144.7) and subsequently plateaued. Based on these observations, we decided to split the study period into two peaks (one: from April 1, 2020 to July 1, 2020, and the second peak thereafter). A tenfold increase in the number of cases occurred between the tip of the first peak and that of the second peak (1653 vs. 17,661 respectively).

Results
The varying spatial distribution of cases highlighted three main patterns ( Fig. 1). Firstly, in the lower right corner of the maps, five municipalities tended to cluster around the capital city of FVG region. Secondly, from November 1, 2020 part of the excesses seemed to move counterwise from North-east in the upper right corner of the map) to South-west (lower left). Thirdly, when jointly examining the 21 maps, 104 municipalities (out of 215) never displayed a statistically significant excess. Table 1 gives the distribution of the explanatory features of the 215 municipalities of the FVG region (i.e. population density, urban-rural gradient, hosting of nursing homes, percent of ≥ 75 years population to total population, percent of 20-59 years population to total population, percent of commuting workers to total population, hosting schools of any grade, and percent of commuting students to total population) according to the three dimensions of the COVID-19 pandemic spread deducted from the 21 fortnightly Bayesian maps. Municipalities hosting nursing homes, hosting schools of any grade, and with a percentage of 20-59 years population higher or equal than 50%, were associated (p-value < 0.05) with all the three dimensions of COVID-19 pandemic spread. Table 2 gives the univariate ORs of the studied features of the municipalities according to the three dimensions of the COVID-19 spread. Municipalities hosting nursing homes showed a statistically significant excess of COVID-19 cases (OR = 2.74 for ever versus never; 95% CI: 1.50-5.00) with a repeated number of statistical excesses: OR = 6.21, for 3-4 times vs. 0, in 21 fortnightly periods (95% CI: 2.40-16.08) and OR = 9.32, for ≥ 5 times vs. 0 (95% CI: 2.67-32.57). Municipalities with a percentage of a population of 20-59 years above or equal 50% displayed a statistically significant excess of COVID-19 positive cases (OR = 1.73 for ever versus never; 95% CI: 1.01-2.97) linked to a unique statistical excess (OR = 2.75, for one time vs. 0; 95% CI: 1.34-5.64). Municipalities that hosted schools of any grade showed a repeated number of excesses: OR = 2.71, for 3-4 times vs. 0 (95% CI: 1.09-6.74) and OR = 4.89, for ≥ 5 times vs. 0 (95% CI: 1.51-15.84). Population density above or equal 100 inhabitants/km 2 was associated with a repeated number of statistically significant excesses of COVID-19 (OR = 3.96, for ≥ 5 times vs. 0; 95% CI: 1.04-15.02). Similarly, urbanicity was associated with a repeated number of statistically significant excesses of COVID-19 (OR = 5.69, for ≥ 5 times vs. 0; 95% CI: 1.74-15.55). The first peak was merely associated with four municipalities excesses, which indicated a low power of the analyses. Although not statistically significant, the risk pattern of the features did not diverge between the two peaks.
These associations were further examined by means of a multivariate model that included the five statistically significant explanatory features emerged in the univariate model (Table 3). Hosting nursing homes and the proportion of 20-59 years population on total were confirmed as associated with COVID-19 spreading. In particular, municipalities hosting nursing homes showed a statistically significant excess of SARS-CoV-2 positive cases (OR = 2.61 for ever versus never; 95% CI: 1.37-4.98), and displayed repeated significant excesses: OR = 5.43 for 3-4 times vs. 0 (95% CI: 1.98-14.87) and OR = 6.10, for ≥ 5 times vs. 0 (95% CI: 1.60-23.30). Municipalities with a proportion of 20-59 years population ≥ 50% were linked to a unique number of statistical excesses in 21 periods (OR = 3.06; 95% CI: 1.43-6.57), and they were inversely related to repeated statistically significant excesses of SARS-CoV-2 positive cases (OR = 0.25, for ≥ 5 times vs. 0; 95% CI: 0.06-0.98). Finally, the associations observed in the univariate analysis for population density, urbanicity, and hosting schools of any grade lost their effect in the multivariate model.

Discussion
Our observational study, which was based on all confirmed SARS-CoV-2 positive cases diagnosed in a welldefined population during 21 fortnightly periods, with data centrally validated since the beginning of the pandemic, showed that municipalities hosting nursing homes besides displaying a statistically significant excess of positive tested cases were also at risk of at least 3-4 repeated excesses. By contrast, municipalities with the higher proportion of a population aged 20-59 years to total (i.e., the population in the most active age) were   www.nature.com/scientificreports/ linked to a unique statistically significant excess in the 21 fortnightly periods, and they were inversely and significantly associated with five or more repeated excesses of SARS-CoV-2 positive cases. Finally, population density, urbanicity and hosting schools of any grade lost their effect when standardized for the aforementioned features of the municipalities (i.e., hosting nursing homes or a higher proportion of active population). These results, which are broadly consistent with the findings from other investigations, add further information on the features associated to sporadic or repeated excesses of the virus spreading and on the features that may have been contrasted by lockdown policies. The observed pattern gives clues to prevent spread of SARS-CoV-2 infection in municipalities with well-defined features. Several plausible biological mechanisms linked  www.nature.com/scientificreports/ to population demography 5 and to specific SARS-CoV-2 infection risk factors 14 may elucidate the associations measured by our study. Municipalities hosting nursing homes showed an excess of SARS-CoV-2 positive cases along with repeated significant excesses (of at least 3-4 times over the 21 fortnightly periods). Assuming that the excesses observed in our study were partially or almost totally linked to the nursing homes located within these municipalities, our observation is consistent with previous papers published in a number of countries 15,16 . In particular, in the US, the vulnerability of nursing homes in relation to SARS-CoV-2 infection 17 has been defined as the "Perfect Storm" 18 . Moreover, our observation of repeated excesses in the same municipalities of FVG suggested, firstly, that once the virus entered the nursing homes, it was difficult to eradicate it in a short period of time and, secondly, that lockdown policies regarding access of relatives or external visitors unaffected the virus infection course. It is well-known that nursing homes accommodate vulnerable persons into shared living or close quarters or communal spaces. These persons generally presented chronic comorbidities (such as, diabetes, cardiovascular diseases, chronic respiratory diseases, cerebrovascular diseases, malignancies or dementia) and various degrees of disability, which may hinder preventive health measures, such as personal hand washing 19 . An alternate explanation of the pattern observed is a higher rate of SARS-CoV-2 testing in nursing homes than in the general population; however, this hypothesis should be tested with a different study design.
Municipalities with a ≥ 50% proportion of 20-59-year old population were linked to a unique statistically significant excess of tested cases for SARS-CoV-2 over the 21 periods, but they were also inversely related to ≥ 5 statistically significant excesses. The age range studied is a proxy of the most active portion of the population that is also linked to population mobility 20 . This direct association has been previously reported in nearly 52 countries 20,21 together with the benefits of the reduction in mobility in the virus transmission 21 . Thus, in FVG region, lockdown policies should have limited the mobility patterns and reduced transmissibility of the virus. However, it seems that the mobility containment unaffected sporadic clusters temporally limited to 14-days while these measures avoided statistically significant repeated excesses in the municipalities with a higher proportion of active population. Moreover, the excesses limited to a 14-day period suggested that some kind of isolation or quarantine may have limited the further spreading of the infection.
In our study, municipalities hosting schools were associated with repeated significant excesses (of at least 3-4 times or more in the 21 periods examined). After adjusting for all significant features mututally, the hosting schools feature lost its effects. There is limited evidence that schools had a relevant role in SARS-CoV-2 transmission in the general population [22][23][24] , although indications that community transmission can be imported into the school setting has been described 25 . Evidence of increased risk of reported SARS-CoV-2 infection and COVID-19 outcomes has been reported among adults living with children during the second peak. However, this did not translate into a materially increased risk of covid-19 mortality, and absolute increases in risk were small 26 . In any case, further studies are needed to elucidate the role of schools in the ongoing pandemic in the light of potentially low social distancing between scholars (during lessons or travel from/to school).
By univariate analysis, municipalities with a population density higher than or equal to 100 inhabitants/km 2 (vs. lower than 100), or classified as urban (vs. rural) showed a statistically significant association only between repeated excesses (higher or equal than 5 times) of SARS-CoV-2 positive cases. This observation was not confirmed in the multivariate model. It is known that SARS-CoV-2 is an aerial pathogen, and population density or urbanicity might have played a significant role in the acceleration of transmission, together with overcrowding influx following social events [27][28][29] . Our result suggested that lockdown policies contained the natural spreading of the virus.
Several strengths of this investigation deserve attention. First, the present study took advantage of a regional database containing information about SARS-CoV-2 infection of all residents. Second, this database met Italian standards for validation, high-quality, and comparability of the methodology of registration, thus included in the national COVID-19 database. Third, the features of the municipalities were obtained from official open-access data sources that are routinely collected for national censuses. Fourth, we assumed the residential location of the participants as a proxy of individual exposure. Our study population was relatively stable, with nearly 93% of the FVG population living in their area of residence for more than 15 years 30 .
Conversely, the study suffered from some worth noting limitations, some common to other observational studies. First the lack of availability of individual level data such as age, sex, date of diagnosis or time spent in the municipality of residence avoided intragroup comparison and the computation of spatio-temporal models. However, in our study the lack of this type of information should have only flattened the risk estimates if random with respect to virus spreading. Second, the data were gathered territorially by multiple teams and no information was available about the type of population tested (i.e. on voluntary basis, screened or symptomatic). Third, undiagnosed patients (i.e. asymptomatic, misdiagnosed, or dead individuals) might flatten our estimates; however, due to the mode of virus transmission, it is unlikely that these cloaked positive individuals had different risk factors than those reported in the official data. Thus, although the maps represented strictly infected patients with a positive test, they can give clues about the whole pattern of infection. Fourth, lockdown policies should have influenced the spread pattern of SARS-CoV-2, though whether the policies were homogeneously applied in the whole FVG region is an unanswered question.
In conclusion, while waiting for the global vaccination against COVID-19, the best way to prevent infection is by case identification and isolation until contagion becomes unlikely. However, the implementation of centered measures of prevention require knowledge of the routes of transmission. We implemented key parameters able to describe the virus spread at municipality level and quantified their association with several explanatory features linked to municipalities. These findings give clues for better preventing the SARS-CoV-2 pandemic at local level.

Data availability
The datasets used and/or analyzed during the current study are available from the first author or by download of the dataset from the reported links.