Modeling community COVID-19 transmission risk associated with U.S. universities

The ongoing COVID-19 pandemic is among the worst in recent history, resulting in excess of 520,000,000 cases and 6,200,000 deaths worldwide. The United States (U.S.) has recently surpassed 1,000,000 deaths. Individuals who are elderly and/or immunocompromised are the most susceptible to serious sequelae. Rising sentiment often implicates younger, less-vulnerable populations as primary introducers of COVID-19 to communities, particularly around colleges and universities. Adjusting for more than 32 key socio-demographic, economic, and epidemiologic variables, we (1) implemented regressions to determine the overall community-level, age-adjusted COVID-19 case and mortality rate within each American county, and (2) performed a subgroup analysis among a sample of U.S. colleges and universities to identify any significant preliminary mitigation measures implemented during the fall 2020 semester. From January 1, 2020 through March 31, 2021, a total of 22,385,335 cases and 374,130 deaths were reported to the CDC. Overall, counties with increasing numbers of university enrollment showed significantly lower case rates and marginal decreases in mortality rates. County-level population demographics, and not university level mitigation measures, were the most significant predictor of adjusted COVID-19 case rates. Contrary to common sentiment, our findings demonstrate that counties with high university enrollments may be more adherent to public safety measures and vaccinations, likely contributing to safer communities.

addition to high COVID-19 test specificity, remote instruction and a high capacity for student quarantine were critical to successfully managing outbreaks. Findings from an agent-based model also suggested that reducing direct contact in residence halls and classrooms through class size caps or hybrid options, combined with masking and social distancing, could keep the campus viral reproductive rate at or below 1.0 11 . Additionally, evidence suggested that frequent testing and strict social distancing measures may prevent university shutdowns and the economic penalties associated with such a disruption, potentially outweighing the costs of implementing such surveillance 12 . However, given that providing masks, implementing remote learning, and conducting regular testing require up-front and ongoing funding that not all colleges and universities have, modification of these mitigation strategies is often needed 13 .
The aims of this investigation were twofold: (1) to estimate U.S. county-level COVID-19 case and mortality rates and compare those of counties with and without college campuses, and (2) to model the impact of mitigation efforts on case rates of COVID-19 in a sample of U.S. counties with college campuses. We hypothesized that colleges and universities with robust COVID-19 response plans (e.g., those with increasing adherence to mask wearing, social distancing, vaccination, frequent testing, education campaigns, etc.) would have a protective effect both on campus members and their surrounding communities, and that the presence of universities would not negatively impact the surrounding community overall.

Methods
Data sources. Cases and mortalities. SARS-CoV-2 laboratory-confirmed case and mortality data were acquired from the Centers for Disease Control and Prevention (CDC) data archive 14 . We selected the time period of January 1, 2020 to March 30, 2021 to address Aim 1 for three reasons: (1) the Spring (approximately January-May) and Fall (approximately September-December) periods capture the entirety of a typical university calendar year, providing a control period to adequately measure COVID-19 transmission in local communities, (2) the commencement of the fall semester (September 1, 2020) is a critical point as it is approximately the median time point for this analysis, and (3) the U.S. experienced three distinctive COVID-19 "waves" over this period, creating a large representative dataset for comparing counties with campuses across the country.
Each case in this dataset contained 32 elements, including demographics (e.g., age, race and ethnicity, and sex) and county/state of residence. Numerous reporting agencies provide this information to the CDC and, as a result, lineage-specific COVID-19 designation, symptom onset, and/or test positive dates were often incomplete, inconsistent, and/or provisional. All cases were aggregated with no sequencing-specific lineage designation. If multiple dates were listed, a single case-positive date was selected using the same order of preference as the CDC: symptom onset date, test positive date, and CDC report date. If no date or county of residence information was provided, the case was excluded. All personally identifiable information (names, addresses, etc.) was removed by the CDC prior to public release of the data. Geographic locations for each case were aggregated to the county level for additional privacy protection.
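The date-selection and exclusion rules described above can be illustrated with a minimal Python sketch. The field names (`symptom_onset_date`, `test_positive_date`, `cdc_report_date`, `county`) and the example records are hypothetical and are not the column names or contents of the CDC dataset; only the order of preference and the exclusion criteria are taken from the text.

```python
from typing import Optional

# Order of preference stated in the text: symptom onset, test positive, CDC report.
# These field names are hypothetical placeholders, not actual CDC column names.
DATE_PREFERENCE = ("symptom_onset_date", "test_positive_date", "cdc_report_date")


def select_case_date(case: dict) -> Optional[str]:
    """Return the first available date in the order of preference, or None."""
    for field in DATE_PREFERENCE:
        value = case.get(field)
        if value:  # skips missing keys, None, and empty strings
            return value
    return None


def keep_case(case: dict) -> bool:
    """Retain a case only if it has a usable date and a county of residence."""
    return select_case_date(case) is not None and bool(case.get("county"))


# Hypothetical records for illustration only.
cases = [
    {"symptom_onset_date": "2020-09-03", "test_positive_date": "2020-09-05",
     "cdc_report_date": "2020-09-07", "county": "A"},
    {"symptom_onset_date": None, "test_positive_date": "2020-10-01",
     "cdc_report_date": "2020-10-02", "county": "B"},
    {"symptom_onset_date": None, "test_positive_date": None,
     "cdc_report_date": None, "county": "C"},  # excluded: no date
    {"cdc_report_date": "2020-11-11", "county": ""},  # excluded: no county
]

retained = [(c["county"], select_case_date(c)) for c in cases if keep_case(c)]
print(retained)
```

In this sketch the first record keeps its symptom onset date even though later dates are present, and the last two records are dropped for lacking a date or a county, respectively.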
University enrollment. University enrollment (UE) data were acquired from the Integrated Postsecondary Education Data System (IPEDS) 15 . Total enrollment by postsecondary institution for the Fall semester of the 2018-2019 academic year was aggregated by county and divided into four categories. Counties with total enrollment x ≥ 15,000, 15,000 > x > 5,000, 5,000 ≥ x > 0, or no enrollment were labeled large (n = 253), medium (n = 361), small (n = 792), and absent (n = 1641), respectively. These cut points were selected based on natural breaks (Jenks optimization method) in the numerically-aligned data.
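As a minimal sketch of the aggregation and labeling step, the following Python example sums institution-level enrollment by county and applies the cut points given above. The county identifiers and enrollment figures are hypothetical; only the thresholds (15,000 and 5,000) and the four labels come from the text.

```python
from collections import defaultdict


def enrollment_category(total_enrollment: int) -> str:
    """Label a county by total enrollment using the cut points in the text."""
    if total_enrollment >= 15_000:
        return "large"
    if total_enrollment > 5_000:
        return "medium"
    if total_enrollment > 0:
        return "small"
    return "absent"


# Hypothetical institution-level records: (county identifier, fall enrollment).
institutions = [
    ("county_a", 12_000),
    ("county_a", 4_500),
    ("county_b", 5_000),
    ("county_c", 5_001),
]

# Aggregate enrollment to the county level.
county_totals = defaultdict(int)
for county, enrollment in institutions:
    county_totals[county] += enrollment

# Counties with no institutions are treated as having zero enrollment.
all_counties = ["county_a", "county_b", "county_c", "county_d"]
categories = {c: enrollment_category(county_totals.get(c, 0)) for c in all_counties}
print(categories)
```

Note that a county with exactly 5,000 enrolled students is labeled small and one with exactly 15,000 is labeled large, following the inequalities as written. The four reported group sizes (253, 361, 792, and 1641) sum to 3,047 counties, which matches the county total given in the Discussion.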
Other factors of interest in evaluating COVID-19 among populations at the county level included median household income 16 , unemployment 17 , COVID-19 community vulnerability index (CCVI) 18 , percentage of population that was vaccinated as of March 30, 2021 (at least 1 dose) 19 , and percentage vote by party candidate in the 2020 U.S. Presidential Election 20 .
University COVID-19 mitigation and containment policies. Highly incongruous COVID-19 policies were implemented across U.S. universities during the Fall 2020 semester reopening. In light of this, to address Aim 2 we selected aggregated variables collected by the College Crisis Initiative at Davidson College that could be compared widely across a sample of institutions of various size and 21 . Only data from four-year universities were included in the analysis, and schools were organized by the enrollment size categories used in Aim 1. Reopening plan variables included the "Mode of Instruction" (MOI) as of September 1, 2020 and the proposed on-campus COVID-19 testing strategy. Other institution factors (e.g., land grant university status and degree of campus urbanization) 15 were included in the analysis due to the potential impact of funding mechanisms and population density structures. Additionally, we evaluated county-aggregate factors such as self-reported masking adherence, state-instituted mask mandates 22 , median household income 16 , and unemployment rates 17 .
All statistical comparisons of groups for Aim 1 were assessed using JMP® (Version 16.0) 26 . Group means were compared using Student's t-test or Dunnett's Test (control group = counties without university enrollment). Comparisons were significant at p < 0.05. Age-adjusted case and mortality rates were modeled separately as dependent variables. Models were stratified by county university enrollment size, totaling eight standard least squares regression models. Final model selections were made using backwards Bayesian Information Criterion (BIC)-based stepwise regression. Overall significance of an independent variable was assessed by its frequency of inclusion in each of the four county university enrollment types, as well as by averaging the logWorth (calculated as $-\log_{10}(p)$, where $p$ is the p value) of that variable across all four university county types.
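The logWorth summary described above can be sketched as follows. The p values are hypothetical, and the treatment of a covariate dropped from a model (here, averaging only over the models that retained it) is an assumption made for illustration; the text does not state how excluded covariates entered the average.

```python
import math
from typing import Dict, Optional


def log_worth(p_value: float) -> float:
    """logWorth = -log10(p value); larger values indicate stronger evidence."""
    return -math.log10(p_value)


# Hypothetical p values for one covariate in the four enrollment-stratified
# models. None marks a model from which stepwise selection dropped it.
p_values: Dict[str, Optional[float]] = {
    "absent": 0.001,
    "small": 0.01,
    "medium": 0.05,
    "large": None,
}

retained = [p for p in p_values.values() if p is not None]

# Frequency of inclusion across the four county enrollment types.
inclusion_frequency = len(retained) / len(p_values)

# Assumption: the average is taken over models that retained the covariate.
mean_log_worth = sum(log_worth(p) for p in retained) / len(retained)

print(inclusion_frequency, round(mean_log_worth, 3))
```

On this scale, the significance threshold of p < 0.05 corresponds to a logWorth above roughly 1.3, and each additional unit of logWorth represents a tenfold smaller p value.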
Aim 2: Since the college reopening dataset contained both quantitative and qualitative variables, a factor analysis of mixed data using the FactoMineR package 27 in R (version 4.1.2) was conducted to identify important variables related to university reopening plans. Once the contributions of these variables to the overall variance of the dataset were assessed, a hierarchical cluster analysis was performed (using Bartlett's test of sphericity, p < 0.05) to identify similar school mitigation strategy clusters associated with population-adjusted COVID-19 cases at the county level.
Finally, due to overdispersion in the case data, we fit negative binomial models using the MASS package 28 to identify university COVID-19 mitigation strategies and other county-level variables that significantly predicted population-adjusted county COVID-19 cases in the Fall 2020 semester. Model fit was evaluated with the performance package 29 , which gives a performance score based on the model's BIC, Nagelkerke's R², and root-mean-square error.
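The motivation for a negative binomial model can be illustrated with a simple overdispersion check. Under a Poisson model the variance equals the mean, whereas the negative binomial form used by MASS allows Var(Y) = μ + μ²/θ. The sketch below uses hypothetical county case counts and a method-of-moments estimate of θ; it is not the fitting procedure used in the analysis, which relied on the MASS package in R.

```python
import statistics

# Hypothetical county-level case counts, for illustration only.
counts = [12, 30, 7, 95, 41, 18, 260, 5, 66, 23]

mean = statistics.mean(counts)
variance = statistics.variance(counts)  # sample variance (n - 1 denominator)

# A variance-to-mean ratio well above 1 indicates overdispersion
# relative to a Poisson model, for which the ratio would be about 1.
dispersion_ratio = variance / mean

# Method-of-moments estimate of theta from Var = mu + mu^2 / theta.
# Only defined when the variance exceeds the mean.
theta = mean ** 2 / (variance - mean) if variance > mean else float("inf")

print(f"mean={mean:.1f}, variance={variance:.1f}, "
      f"ratio={dispersion_ratio:.1f}, theta={theta:.2f}")
```

A small θ corresponds to strong overdispersion; as θ grows large, the negative binomial variance approaches the Poisson variance.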

Results
Overall impact of universities on county-level COVID-19 cases and deaths (Aim 1). By March 30, 2021 a total of 22,385,335 cases and 374,130 deaths were reported to the CDC. Between January 1, 2020 and March 30, 2021, increasing county university enrollment was associated with significant reductions in COVID-19 case rates, but only slight differences in mortality rates (Table 1, Fig. 1). Compared to counties with no universities, this equated to a 1% reduction in cases among counties with small university enrollments, an 8% reduction among counties with medium university enrollments, and a 16% reduction among counties with large university enrollments. A comparison of standardized case and mortality rates by age group showed similar patterns across all enrollment types: 20-59-year-olds accounted for most cases, with the highest case rate in the 20-29-year-old age group (Fig. 2). Despite having a lower case rate than adults under 50, adults over 50 experienced the highest mortality rates regardless of county university enrollment type, and this rate increased with age (Fig. 3). On average, those > 80 had nearly twice the risk of death from COVID-19 of 70-79-year-olds, 2.5 times the risk of 60-69-year-olds, and 5 times the risk of 50-59-year-olds.
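The relative reductions reported above are percentage differences in rates against the reference group of counties with no university enrollment. A minimal sketch of that arithmetic follows; the rates per 100,000 are hypothetical values chosen only so that the output mirrors the reported 1%, 8%, and 16% reductions, and they are not the rates from Table 1.

```python
def rate_per_100k(events: int, population: int) -> float:
    """Crude rate per 100,000 population."""
    return 100_000 * events / population


def percent_reduction(rate: float, reference_rate: float) -> float:
    """Percentage by which `rate` falls below `reference_rate`."""
    return 100 * (reference_rate - rate) / reference_rate


# Hypothetical case rates per 100,000 by county enrollment type.
rates = {"absent": 10_000.0, "small": 9_900.0, "medium": 9_200.0, "large": 8_400.0}

reductions = {
    group: percent_reduction(rate, rates["absent"])
    for group, rate in rates.items()
    if group != "absent"
}
print(reductions)
```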
Across the three wave periods there was a significant reduction in both case and mortality rates with increasing county UE, compared to counties with no university enrollment (Figs. 4 and 5). Case and mortality rates remained below 105 and 2.94 (per 100,000), respectively, for all county types through waves 1 and 2, but markedly increased in wave 3. These rapid increases in both case and mortality rates were experienced across all county types but were less severe as university enrollment increased. The Fall 2020 semester coincided with the beginning of wave 3, and case and mortality rates followed similar trends across all county university types (Figs. 4 and 6).
The effects of each included covariate were similar across county university enrollment types (Table 2). Each of the 18 covariates evaluated in this analysis was included in at least one final best-fit model; variables included in ≥ 75% of all models were 2019 population, CCVI, % receiving at least one vaccination dose, and % of time in which childcare facilities, nursing home visitations, and restaurants were closed at the county level. Statistically weighted averages of logWorth values show that the enforced closures of nursing home visitations and childcare facilities were the two most important covariates for age-adjusted case rates, while 2019 county population and CCVI were the two most important covariates for age-adjusted mortality rates.

Factor and cluster analyses (Aim 2).
A total of 1,568 colleges and universities across 740 counties were included in the sub-analysis of college reopening plans for the Fall of 2020 based on availability of reported COVID-19 mitigation plans for the Fall 2020 semester (Table 3). Of these, 235 counties had large enrollments, 196 counties had medium, 302 had small enrollment totals, and seven counties had no reported higher education enrollments.
The multivariate factor analysis revealed that the first three dimensions (for which eigenvalues were above one) only explained 20% of the cumulative variance in population-adjusted COVID-19 cases during the Fall of 2020 (Table 4). However, within those dimensions, county-level descriptions (i.e., whether there was a state mask mandate, self-reported mask wearing, and county-level enrollment size) contributed 15% more than expected (~ 40% of overall contribution) to the first dimension. The second dimension's contribution to the variance in population-adjusted COVID-19 cases was strongly defined by institutional factors (~ 50%) (i.e., institutional category, size category, land grant status, and school location description), and campus COVID-19 mitigation strategies (~ 35%) (i.e., MOI category and testing strategy). The third dimension was almost entirely defined by campus COVID-19 mitigation strategies (~ 65% of total contribution). Grouped variables contributing more than expected to these three dimensions (Table 5) were retained for both the cluster analysis and modeling.
From these factors, we constructed a hierarchical cluster analysis that identified three distinct school clusters (Table 6) that were differentially associated with population-adjusted COVID-19 cases. All factors with a positive value test score above 1.96 (p value ≤ 0.05) were retained. Clusters were most heavily defined by the school location setting (e.g., rural vs urban), the county enrollment size, and the size category of the institution. Cluster 1 represented suburban and city-based small private institutions in counties with large student enrollments that did not report COVID-19 measures for the Fall 2020 semester. Cluster 2 represented larger, city-based land grant public institutions that remained online or in hybrid instruction mode for the Fall 2020 semester, and that offered multiple testing options for students.
Schools in clusters 1 and 2 were in counties under mask mandates and that reported high mask usage. The third cluster was defined by small schools in counties with either no or small student enrollments, that held in-person instruction during the Fall of 2020, offered voluntary or no testing
Table 1. Summary of United States population (U.S. Census 2010) and key socioeconomic, demographic, and political variables by race & ethnicity and county total university enrollment (none, small, medium, and large) (Aim 1). Cases and deaths labeled as unknown/missing denote individuals where gender was not provided. *Data describing trends from January 1, 2020-March 30, 2021. a Comparison of means analyzed using Dunnett's Test (race & ethnicity control = white population; county university enrollment control = none). b Asterisks correspond to the following statistical values: * = p < 0.05, ** = p < 0.01, *** = p < 0.001. c Unable to statistically analyse.
variables) and economic/county-level covariates (mask usage, mask mandates, median household income) with population-adjusted cases in the Fall of 2020 showed that the most strongly contributive factors to increased county population-adjusted case numbers were low and moderate mask usage, lack of state mask mandates, and median household income (Table 6). For the school-related covariates alone, the overall best fit model predicted that larger schools, schools located in rural areas, and non-land-grant institutions were associated with more county-level cases (i.e., schools that were part of Cluster 2 were more likely to be located in counties with lower population-adjusted COVID-19 case rates, while those in Clusters 1 and 3 were associated with higher county-level cases).
Overall, MOI and testing strategies were not significantly predictive of county-level case rates during the Fall 2020 semester, except that schools that employed mandatory testing of students were associated with counties with lower population-adjusted case numbers.
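The retention rule used in the cluster analysis above (a value test score above 1.96, corresponding to a p value ≤ 0.05) follows from the two-sided tail probability of a standard normal distribution. The short sketch below shows that correspondence, assuming the value test score can be treated as a standard normal deviate; the function names are illustrative and are not part of FactoMineR.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal deviate z."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_retained(value_test_score: float, threshold: float = 1.96) -> bool:
    """Retain factors with a positive value test score above the threshold."""
    return value_test_score > threshold


for score in (1.50, 1.96, 2.58):
    print(f"score={score:.2f}, p={two_sided_p(score):.4f}, "
          f"retained={is_retained(score)}")
```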

Discussion
It had been speculated that counties with large university enrollments were at higher risk for COVID-19 outbreaks in the U.S. In this analysis, we evaluated a total of 22,385,335 cases reported to the CDC, representing 3,047 U.S. counties from January 1, 2020 through March 30, 2021. After all cases and deaths were aggregated to the county level and categorized by their total university enrollment sizes, this analysis found that the presence of large university enrollments was associated with lower county COVID-19 case rates. This retrospective analysis of 15 months of COVID-19 cases and deaths revealed small but significant reductions in cases among counties with increasing university enrollments. However, little to no change was noted with respect to mortality rates. Further analyses focused on differences between age groups also revealed little to no variation in the case and mortality rates of each group across university enrollment size. However, notable age-related trends were discovered. First, the highest case rates were among young to middle-aged adults (20-59), with 20-29-year-olds experiencing higher case rates than any other age group, a finding also reported by Monod et al. 30 . Second, although the 0-9 and 80+ year old age groups experienced the lowest case rates, the 80+ year old age group's risk of death was substantially higher than that of all other age groups. These trends corroborate the widely observed findings pertaining to COVID-19 fatality 31,32 .
The time series plots of daily aggregated county averages of case and mortality rates (Fig. 4) show observable differences in the magnitude of peaks with increasing county university enrollment. However, when evaluating COVID-19 transmission by wave, critical differences become more apparent. Three COVID-19 waves occurred prior to May 2021, with each wave more severe than the wave before. In particular, the third wave saw a ~3-6-fold increase in cases and deaths as compared to waves 1 and 2. Although all counties experienced this drastic increase in COVID-19, those with more university enrollment had significantly lower case rates (5.3, 10.6, and 27.2% lower for counties with small, medium, and large university enrollments, respectively) compared to counties with no university enrollment. Counties with medium or large university enrollments experienced significantly lower COVID-19 death rates (averaging 12.8 and 29.8% lower, respectively) compared to counties with low or no university enrollments. The second epidemiologic period of interest was among counties before and after the Fall 2020 semester. As seen in the wave analysis, all counties experienced a rapid increase in both COVID-19 cases and deaths after the start of the Fall 2020 semester. However, counties with increasing university enrollment experienced significantly lower case rates (3.7, 10.8, and 27.1% lower for counties with small, medium, and large university enrollments, respectively) compared to counties with no university enrollment. Counties with medium and large university enrollments also experienced significantly lower death rates (averaging 13.2 and 30.2% lower, respectively) compared to counties with low or no university enrollment.
Our sub-group analysis of COVID-19 mitigation strategies for the Fall 2020 semester provides further evidence that colleges and universities were not associated with increases in county-level cases, even before cohesive containment plans were established. The cluster analysis showed that schools in Cluster 2 tended to be larger, land grant universities that maintained entirely online or hybrid course instruction during the Fall 2020 semester and were likely to be in counties with lower population-adjusted COVID-19 cases. These schools were more likely to have mandatory student testing, which was significantly associated with lower overall county case numbers. During the Fall 2020 semester, county and state-level factors (e.g., mask usage, mask mandates, and median household income) were far more significantly predictive of overall county-level cases, which held true during the larger analysis period.
Overall, the COVID-19 pandemic through March 30, 2021 rapidly spread through all U.S. counties with similar patterns in the timing and intensity of cases and deaths, regardless of the size of university enrollment. However, the magnitude by which cases and deaths affected counties is strongly associated with university enrollment. Although there were minimal differences among death rates by university enrollment, large enrollment universities were most affected by COVID-19 in the early stages of the pandemic, a suspected driving force behind the lack of significant differences in overall mortality rates. However, as the pandemic progressed in intensity (cases and deaths per day), counties with increasing university enrollments experienced decreased risks of acquiring, and dying from, COVID-19.
Together, these two analyses strongly suggest that community-level variables, and not universities, drove COVID-19 cases during this time period. Despite having larger populations, counties with large university enrollments fared better than counties with little or no university enrollments, especially as COVID-19 cases surged through the winter of 2020-2021 (wave 3). In comparison to counties with little or no university enrollment, large university enrollment counties had higher household incomes, less unemployment, and higher vaccination rates (% with at least 1 dose). These counties also tended to enforce statewide mandates more frequently and for longer throughout the pandemic compared to counties with little to no university enrollments (Tables S1 and S2).
In addition to the differences noted above, public health decisions were dependent on several political, economic, and social factors. It is apparent that this pandemic has fueled divisions along political lines, which influenced both public health decisions and compliance. From adherence to social distancing and mask wearing to vaccination rates, political associations appear to be a strong influencing force 33 . Using the 2020 U.S. Presidential election as a proxy for a county's political affiliation, we found that political affiliation was moderately correlated with COVID-19 cases and deaths (Figs. S1 and S2). Counties with higher university enrollments are also correlated with increased overall education rates, which have been found to be associated with use of pandemic control measures 34-36 .
To date, several studies have analyzed the risk of transmission associated with universities and/or college-aged populations, but all have been limited to no more than 4-5 institutions 2,7,37,38 . Furthermore, very few studies have estimated the attributable risk between universities and students within the communities in which they reside 39 . Studies that did address community transmission and associations with students were systematic reviews, mathematical simulations, and/or limited to elementary or primary student ages 10,40,41 . Our retrospective analysis is novel in that all data collected are specific to our central question, narrowing the scope of our investigation to produce tangible estimates of transmission risk at high spatial and temporal scales.
This study is limited in its analysis by aggregating non-lineage-specific individual-level COVID-19 case data to county of residence. By doing so, generalizations are made across varying population sizes and COVID-19 strains, and subtle differences cannot be captured. Numerous reporting agencies submit data of varying completeness to the CDC, leading to a high potential for ascertainment bias. Additionally, due to the rapid onset of cases, particularly through peak wave periods, significant delays in reporting also occurred. This analysis attempted to establish etiologic-specific onset dates given the data made available. In some cases, differences of more than one week between the reported onset date and the true onset date likely exist and are unavoidable. However, the data account for a significant portion of the U.S. population, thus reducing the errors in generalization and representing the best source for such a study. Limitations to the sub-group analysis included an incomplete picture of mitigation strategies for some of the universities (e.g., Cluster 1 was defined by schools that did not fully report their COVID-19 plans). Counties without higher-education enrollments were also severely limited in this dataset, which could have impacted the analysis. We also assumed the mitigation plans to be static over the course of the Fall 2020 semester due to the lack of available data on changes. Given that this was the first semester with some schools in person, we felt it was likely that a university would maintain proposed plans, and that if changes occurred, they would be to bolster mitigation, not reduce it.
This study incorporates key differences among American counties stratifying across a spectrum of enrolled university students. In general, counties with increasing enrollment populations tend to be more populated and urban. As such, several potentially important factors associated with COVID-19 transmission, reporting, and knowledge, attitudes, and practices have been generalized or overlooked. Future studies would provide a great service to public health by expanding on the methodologies and results of this study on several of these key factors. For example, counties with large university enrollments appear to have many key protective factors in place to mitigate COVID-19. However, these counties also contain significantly larger populations of Black, Indigenous, and people of color (BIPOC) that suffer from disproportionate racial and health inequities. Consequently, social vulnerabilities are significantly higher in counties with large university enrollments. Preliminary analyses of this dataset, with respect to COVID-19 outcomes by race and ethnicity, suggest corroborative evidence of these inequities (data not shown). Future studies should also evaluate the co-evolution of and/or competition between COVID-19 and other respiratory viruses, particularly seasonal influenza and respiratory syncytial virus, using a longitudinal analysis similar to that of this study.

Conclusions
1. An increase in a county's university enrollment is associated with a lower rate of COVID-19 cases and deaths compared to counties with no university enrollments.
2. Counties with the largest universities tend to have greater population sizes, higher household income, lower unemployment, a higher percentage of people partially vaccinated against COVID-19, and a higher proportion of the population voting for Biden in the 2020 presidential election, but also a higher social vulnerability, than those with no university enrollment.
3. Counties with university enrollments tend to have higher enforcement of statewide closing mandates than those without.
4. Despite having high population densities and high social vulnerabilities, individuals in counties with large university enrollments had lower COVID-19 case and mortality rates than those in counties without university enrollments, especially in the third wave of the pandemic.
5. Increasing education levels, enforcement of closing mandates, adherence to public health recommendations, and political affiliation were all associated with lower COVID-19 case and mortality rates.
6. There was no significant impact on community cases from universities that returned for the Fall 2020 semester; county-level factors were the leading predictors of cases.
7. These results offer evidence against the presumption that universities increase risk of COVID-19 community transmission.
Table 6. Best overall fit model that predicts population-adjusted county-level COVID-19 cases during the fall of 2020 (September 1st-December 31st) including college/university characteristics and COVID-19 mitigation strategies. Increasing numbers of asterisks correspond to increasingly significant statistical values (e.g., * = p < 0.05, ** = p < 0.01, *** = p < 0.001).

Data availability
All data used in this analysis are free and available for public use. Individuals interested in COVID-19 case data can apply and request access via https://data.cdc.gov/.