Introduction

In late 2019, the novel coronavirus disease 2019 (COVID-19) emerged as a worldwide pandemic threat1. By early 2020, virus transmission had begun across North America. Transmission and cases of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection developed earlier and in more rapid succession in certain cities with dense and marginalized populations2,3. Other countries had some lead time to prepare for SARS-CoV-2 infections, including Canada, which has an overall smaller and more geographically distributed population and a lower scale of international travel, factors that could influence the regional epidemiology of COVID-19.

Ontario is the most populous province in Canada with diversity in age, socioeconomic status, and race/ethnicity across the province’s counties, townships, and municipalities4. While the presence of certain clinical characteristics, particularly a history of cardiovascular and renal disease or obesity, may increase the likelihood of SARS-CoV-2 infection5,6,7,8,9,10,11,12,13, other data show sociodemographic characteristics, including race and socioeconomic status, may be stronger drivers of infection risk as seen with other communicable diseases14,15,16,17,18,19,20,21,22,23,24,25. However, earlier data may be biased if cases were selected only among individuals presenting to hospital, with limited or no controls. Furthermore, estimates of risk may be biased by the propensity for or against exposure and testing over time. Therefore, we sought to evaluate the association of sociodemographic and clinical risk factors with the likelihood of laboratory-confirmed SARS-CoV-2 infection before and after the peak of the first wave and until the end of 2020 in Ontario, by analyzing linked population-based health databases among individuals tested for SARS-CoV-2. Given that the time period surrounding the peak of the first wave coincided with the introduction of a broad suite of public health measures aimed at mitigating the spread of the pandemic, we further describe whether instituting these public health interventions in March/April of 2020 (Supplemental Fig. 1) was effective at mitigating both sociodemographic and clinical risk factors.

Methods

Study design and population

Ontario has a publicly funded health care system with universal access to care without user fees at the point of service. We assembled a population-based retrospective cohort of all Ontarians aged 18 years and older who were eligible for the province’s universal Ontario Health Insurance Plan (OHIP), alive as of January 1, 2020, and who underwent testing for SARS-CoV-2 up to December 31, 2020 (Supplemental Fig. 2). This cohort was created at ICES, a non-profit research institute whose legal status under Ontario’s health information privacy law allows it to collect and analyze health care and demographic data, without consent, for health system evaluation and improvement. The cohort was created through linkage of multiple provincial and federal health care related databases (e.g., hospital discharge abstracts, physician claims, chronic disease registries, health survey, laboratory, and drug dispensing data), as well as the Immigration, Refugees and Citizenship Canada (IRCC) Permanent Resident database26. These datasets were linked using unique, encoded identifiers and analyzed at ICES as previously described and validated27. We required individuals to be eligible for health insurance one year prior to the index date of study as determined from the Ontario Registered Persons Database (RPDB), which includes basic demographic information about anyone who has ever had an Ontario health insurance number. Individuals who were not residents of Ontario on the index date were excluded. Since baseline health status and congregate living arrangements of long-term care residents differ substantially from community-dwelling individuals, they were also excluded from these analyses. 
The index date for study inclusion was the date of a first SARS-CoV-2 test, as recorded in the Ontario Laboratories Information System (OLIS; which consolidated the majority [≥ 88%] of results of COVID-19 testing during the first wave in Ontario) and/or the Case and Contact Management System (CCM).

Exposure variables

Testing date was divided into calendar weeks. We obtained data on age, sex, and community-dwelling characteristics from the RPDB. Communities were categorized by regional public health units (PHUs; also referred to as “regions” throughout), geographic location, and size according to Statistics Canada’s Census data28. Communities with fewer than 10,000 residents were classified as rural. Median neighborhood income was categorized by quintile according to national Census data. Individuals who immigrated to Ontario as their first place of landing in Canada between 1985 and 2017 were identified via the IRCC Permanent Resident database.

Clinical comorbidities were identified using previously validated case definition algorithms for Canadian administrative databases based on hospitalization and emergency department records from the Canadian Institute for Health Information Discharge Abstract Database (CIHI DAD) and National Ambulatory Care Reporting System (NACRS), respectively, using International Classification of Diseases, Tenth Revision, Canada (ICD-10-CA) coding, hospital and physician procedure coding, and chronic disease diagnoses from the OHIP database (Supplemental Table 1). In addition to the number of hospitalization or emergency department episodes in the prior year, we also included the following characteristics within the previous 5 years: history of coronary artery disease (CAD; defined as a prior myocardial infarction, percutaneous or surgical coronary revascularization); hospitalization for heart failure (HF) or stroke; and history of liver disease, chronic lung disease (including pneumonia, tuberculosis, asthma or chronic obstructive pulmonary disease), organ transplantation, atrial fibrillation, chronic kidney disease (CKD) or malignant cancer. Any prior history and duration of hypertension and any history of diabetes or human immunodeficiency virus (HIV) were also assessed. Frailty was defined using the Johns Hopkins Adjusted Clinical Groups (ACG®) Version 10 frailty indicator29,30. Sex-specific and, when available, age-standardized regional rates of smoking, obesity, and racial/ethnic diversity were calculated at the public health unit level using national census and survey data available from Public Health Ontario due to a lack of individual-level administrative data on income and race in Ontario and Canada more generally4,31,32,33,34.
Racial/ethnic diversity was defined as the estimated regional visible minority proportion of individuals who self-identified as Black, South Asian, Chinese, Filipino, Latin American, Arab, Southeast Asian, West Asian, Korean, or Japanese according to national Census data35. The receipt of influenza vaccination within the past year was determined from the OHIP and Ontario Drug Benefit (ODB) databases among eligible individuals.

Outcomes

Our outcome of interest was laboratory-confirmed SARS-CoV-2 infection, via the proxy of testing positive for SARS-CoV-2 by reverse transcription polymerase chain reaction (RT-PCR). As individuals could undergo multiple tests over the study period, if any test was positive, they were classified as having the outcome. Otherwise, individuals testing negative for SARS-CoV-2 in all tests comprised the uninfected group.
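The index-date and outcome rules described above can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical per-test record layout of (person identifier, test date, positive flag); the function name and the example records are invented for illustration and do not reflect the OLIS or CCM data structures or the SAS code used in the study.

```python
from datetime import date

def summarize_tests(tests):
    """Collapse per-test records into one record per person.

    `tests` is an iterable of (person_id, test_date, is_positive) tuples
    (a hypothetical layout). Returns a dict mapping each person_id to
    {"index_date": date of first test, "infected": True if any test positive}.
    """
    summary = {}
    for person_id, test_date, is_positive in tests:
        rec = summary.setdefault(
            person_id, {"index_date": test_date, "infected": False}
        )
        # The index date is the earliest test date on record for the person.
        if test_date < rec["index_date"]:
            rec["index_date"] = test_date
        # Any positive test classifies the person as having the outcome.
        rec["infected"] = rec["infected"] or is_positive
    return summary

# Invented example records, not study data.
example_tests = [
    ("A", date(2020, 4, 2), False),
    ("A", date(2020, 5, 10), True),
    ("B", date(2020, 6, 1), False),
    ("B", date(2020, 3, 15), False),
]
result = summarize_tests(example_tests)
```

In this example, person A is classified as infected with an index date of April 2, 2020, and person B falls in the uninfected group with an index date of March 15, 2020.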

Statistical analysis

Sociodemographic and clinical characteristics between individuals with and without laboratory-confirmed SARS-CoV-2 infection were compared using chi-square tests for categorical variables and one-way analysis of variance for continuous variables.
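For readers less familiar with the categorical comparison, the following is a minimal sketch of a Pearson chi-square test on a 2×2 table (a binary characteristic by infection status). It assumes no continuity correction, which the text does not specify, and uses invented counts; the study itself was analyzed in SAS, so this is an illustration of the technique rather than the authors' code.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for a 2x2 table [[a, b], [c, d]].

    Rows are the two levels of a characteristic; columns are infected /
    not infected. No continuity correction is applied (an assumption).
    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Expected counts under independence: row total * column total / n.
    expected = [
        row1 * col1 / n, row1 * col2 / n,
        row2 * col1 / n, row2 * col2 / n,
    ]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Invented counts, not study data.
stat, p_value = chi_square_2x2(30, 70, 20, 80)
```

With these invented counts the statistic is 8/3 (about 2.67), which would not reach the P < 0.05 threshold used in the study.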

To evaluate the association between baseline characteristics and risk of SARS-CoV-2 infection, we fit multivariable logistic regression models to determine the odds ratio (OR) of testing positive. We regressed the outcome (testing positive vs. testing negative) on sociodemographic and clinical characteristics, incorporating PHU-specific random effects to account for clustering of individuals within communities. In these multivariable analyses, we adjusted for calendar week of testing, sociodemographic factors (age, sex, rural/urban residence, immigrant status, neighborhood income quintile, and regional racial/ethnic diversity rate), and the aforementioned clinical risk factors. An interaction term was introduced to test for heterogeneity in the association of age with the likelihood of SARS-CoV-2 infection over testing week. Given that significant heterogeneity was detected, we stratified reporting of results into three time periods: one prior to the peak of the first wave of the pandemic, one immediately following the peak, and one covering the latter half of 2020.
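As a reading aid for the odds ratios and 95% confidence intervals reported later, the sketch below computes an unadjusted OR with a Wald interval from a 2×2 table. It is deliberately simpler than the models described above: it includes no covariate adjustment and no PHU-specific random effects, the function name is hypothetical, and the counts are invented.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a = exposed and positive,   b = exposed and negative,
    c = unexposed and positive, d = unexposed and negative.
    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Invented counts, not study data.
odds_ratio, lower, upper = odds_ratio_wald(60, 940, 40, 960)
```

Because a Wald interval is symmetric on the log scale, the point estimate equals the geometric mean of its bounds, which offers a quick consistency check when reading a reported OR against its interval.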

Sociodemographic and clinical risk factors that were independently associated with laboratory-confirmed SARS-CoV-2 infection were then used to calculate an integer risk score for each individual by summing the number of risk factors the individual had. Separately for the periods prior to and following the peak of the first wave of the pandemic, we then calculated absolute infection rates and the odds ratio of SARS-CoV-2 infection in Ontario stratified by quartiles of regional rates of racial/ethnic diversity (Ontario median 4.0%, interquartile range, 2.5–17.6%; Supplemental Fig. 3). Associations of the integer scores with SARS-CoV-2 infection were assessed using logistic regression models. As few individuals had no risk factors following the peak of the pandemic, we combined the presence of 0 to 1 risk factors as the lowest risk group for the post-peak period. Trend in infection risk with increasing integer score was assessed using the Cochran-Armitage trend test. All P values were 2-sided with P < 0.05 considered significant. The study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline. All data were analyzed at ICES using SAS version 9.4 (SAS Institute, Cary, NC).
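The integer risk score and the trend test can be illustrated with a short sketch. It assumes equally spaced scores (0, 1, 2, ...) for the risk groups and the usual large-sample normal approximation for the Cochran-Armitage statistic; the function names, the example risk-factor flags, and the counts are hypothetical and are not taken from the study.

```python
import math

def risk_score(flags):
    """Count how many risk factors are present for one individual."""
    return sum(1 for present in flags.values() if present)

def cochran_armitage(cases, totals, scores=None):
    """Cochran-Armitage test for trend in proportions across ordered groups.

    cases[i] and totals[i] are the infected count and group size for
    risk group i. Scores default to 0, 1, 2, ... (an assumption).
    Returns (z statistic, two-sided p-value).
    """
    if scores is None:
        scores = list(range(len(cases)))
    n_total = sum(totals)
    p_bar = sum(cases) / n_total  # overall infection proportion
    # Score-weighted deviation of observed from expected cases.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, cases, totals))
    sum_ns = sum(n * s for s, n in zip(scores, totals))
    sum_ns2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented individual and counts, not study data.
example_person = {"male": True, "immigrant": False,
                  "hypertension": True, "diabetes": False}
example_score = risk_score(example_person)

cases = [10, 20, 30, 40]            # infections among those with 0, 1, 2, 3 factors
totals = [1000, 1000, 1000, 1000]   # individuals tested in each group
z, p_value = cochran_armitage(cases, totals)
```

In this invented example the infection proportion rises steadily with the score, giving a z statistic of about 4.5 and a two-sided P value well below 0.05.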

Ethical statement

ICES is a prescribed entity under section 45 of Ontario’s Personal Health Information Protection Act. Section 45 authorizes ICES to collect personal health information, without consent, for the purpose of analysis or compiling statistical information with respect to the management of, evaluation or monitoring of, the allocation of resources to, or planning for all or part of the health system. Projects conducted under section 45, by definition, do not require review by a research ethics board (for use of anonymized data). The data access and analysis for this study was conducted under section 45 and all relevant protocols/materials were approved by ICES’ Privacy and Compliance Office. All methods were carried out in accordance with locally relevant guidelines and regulations.

Results

The number of study participants and the reasons for inclusion and exclusion are summarized in Supplemental Fig. 2. Overall, 3,167,753 eligible community-dwelling patients underwent RT-PCR testing for SARS-CoV-2 infection between January 1 and December 31, 2020, of whom 142,814 (4.5%) had confirmed test positivity (Table 1 and Supplemental Table 2). The weekly number and rate of community-dwelling individuals first testing positive for SARS-CoV-2 by age group are presented in Fig. 1. The weekly number of community-dwelling individuals tested for SARS-CoV-2 and the share of tests that were positive, stratified by age group, are presented in Supplemental Fig. 4. During the first wave, the peak number and proportion of individuals with laboratory-confirmed SARS-CoV-2 infection occurred during the week starting April 12, 2020, three weeks following a province-wide lockdown, which is consistent with the expected latency period for measuring the impact of public health measures on infection rates (Supplemental Fig. 1)36.

Table 1 Baseline characteristics of community-dwelling patients with and without SARS-CoV-2 infection during the first wave of the pandemic in Ontario, Canada.
Figure 1

Weekly number and rate of community-dwelling individuals with SARS-CoV-2 infection stratified by age group during 2020, prior to (A Weeks of January 1–April 12) and following (B Weeks of April 19–June 7 and C Weeks of June 14–December 27) the peak of the first wave of the pandemic in Ontario, Canada. Time represented by the start of the calendar week with the weeks of January 1 through March 8 consolidated given low initial infection counts. Age groups: 18–45 years (dark green); 45–65 years (light green); 65–75 years (light blue); 75–85 years (pink); 85 years and older (orange). The proportion of weekly counts attributable to an age group is represented in each column with the percentage indicated on each section. The peak of the first wave of the pandemic regarding community-dwelling infections, using the proxy of test positivity in Ontario, correlated with the week of April 12, 2020. Ages 75 and up are combined to suppress small cells for weeks Jan 1 – Mar 8 and June 7, per ICES' reidentification risk assessment procedures.

Baseline sociodemographic and clinical characteristics among community-dwelling individuals with and without SARS-CoV-2 infection during the first wave (i.e., the first half of 2020) are presented in Table 1. Compared with individuals without SARS-CoV-2 infection, community-dwelling individuals with SARS-CoV-2 infection were overall younger, more frequently male, more frequently immigrants, and more often residing in racially/ethnically diverse, large, urban, low-income communities. This trend continued until the end of 2020 (Supplemental Table 2). Furthermore, individuals with SARS-CoV-2 infection had higher rates of diabetes initially, but otherwise had lower rates of most clinical comorbidities and lower rates of recent hospitalization or ED visits, which remained consistent until the end of 2020.

Independent predictors of laboratory-confirmed SARS-CoV-2 infection among community-dwelling individuals tested are presented in Table 2. Results are stratified into the period prior to and immediately following the peak of the first wave of the pandemic in the province, as well as the remainder of 2020, given the detection of significant heterogeneity in the risk of SARS-CoV-2 infection by age across time (P-interaction < 0.0001). During the period leading up to the peak of the first wave of the pandemic, the likelihood of SARS-CoV-2 infection progressively increased across age groups (age 45–65, OR 1.30, 95% CI 1.24–1.37; age 65–75, OR 1.38, 95% CI 1.26–1.51; age 75–85, OR 1.46, 95% CI 1.30–1.63; age 85 and older, OR 1.60, 95% CI 1.41–1.81) compared with the youngest individuals aged 18–45 years. Other independent risk factors for SARS-CoV-2 infection included male sex (OR 1.45, 95% CI 1.31–1.60), residing in the lowest quintile of neighborhood income (OR 1.09, 95% CI 1.01–1.17), residing in more racially/ethnically diverse communities (OR per 1% increase in regional racial/ethnic diversity 1.02, 95% CI 1.01–1.03), immigration to Canada (OR 1.53, 95% CI 1.45–1.61), frailty (OR 1.31, 95% CI 1.01–1.25), hypertension (OR 1.10, 95% CI 1.04–1.17), and diabetes (OR 1.12, 95% CI 1.05–1.20).

Table 2 Predictors of SARS-CoV-2 infection prior to (weeks of January 1 to April 12, 2020) and following (weeks of April 19 to June 7, 2020; weeks of June 14 to December 27, 2020) the peak of the first wave of the pandemic among community-dwelling individuals in Ontario, Canada.

Immediately following the peak of the first wave of the pandemic, the likelihood of laboratory-confirmed SARS-CoV-2 infection across age groups in community-dwelling individuals reversed (Supplemental Fig. 4). Thereafter, the oldest individuals aged ≥ 85 years had the lowest likelihood of SARS-CoV-2 infection (Table 2), with a progressive increase in infection risk as age declined compared with individuals aged ≥ 85 years (age 75–85, OR 1.11, 95% CI 0.97–1.28; age 65–75, OR 1.23, 95% CI 1.08–1.40; age 45–65, OR 1.43, 95% CI 1.27–1.62; and age 18–45, OR 1.58, 95% CI 1.40–1.80). Additionally, there was a progressive increased risk of SARS-CoV-2 infection across all lower quintiles of neighborhood income compared with the highest quintile (quintile 1, OR 2.01, 95% CI 1.87–2.16; quintile 2, OR 1.53, 95% CI 1.43–1.65; quintile 3, OR 1.46, 95% CI 1.36–1.57; quintile 4, OR 1.27, 95% CI 1.17–1.37). Other independent risk factors included male sex (OR 1.67, 95% CI 1.52–1.83), residing in more racially/ethnically diverse communities (OR per 1% increase 1.05, 95% CI 1.03–1.06), immigration to Canada (OR 1.82, 95% CI 1.75–1.90), history of hypertension (OR 1.07, 95% CI 1.01–1.12), and history of diabetes (OR 1.32, 95% CI 1.24–1.40).

Results through the remainder of 2020 are reported in Table 2. Younger age remained an independent risk factor: individuals aged 18–45 years (OR 1.09, 95% CI 1.03–1.14) had the highest likelihood of SARS-CoV-2 infection compared with older age groups (age 66–75, OR 0.87, 95% CI 0.83–0.92; and age 76–85, OR 0.85, 95% CI 0.81–0.90). There continued to be a progressive increase in the risk of infection in the lower quintiles of neighborhood income (quintile 1, OR 1.64, 95% CI 1.60–1.67; quintile 2, OR 1.45, 95% CI 1.42–1.48; quintile 3, OR 1.39, 95% CI 1.36–1.42; quintile 4, OR 1.17, 95% CI 1.15–1.19). Other independent risk factors included male sex (OR 1.24, 95% CI 1.20–1.27), residing in more racially/ethnically diverse communities (OR per 1% increase 1.04, 95% CI 1.03–1.05), immigration to Canada (OR 1.99, 95% CI 1.96–2.01), history of hypertension (OR 1.08, 95% CI 1.07–1.10), and history of diabetes (OR 1.33, 95% CI 1.30–1.36).

The absolute and relative risk of laboratory-confirmed SARS-CoV-2 infection among community-dwelling individuals according to the number of independent risk factors identified above (e.g., age category, male sex, residing in a lower income neighborhood, Canadian immigrant status, hypertension, diabetes, and, prior to the peak of the pandemic, a history of frailty), degree of regional racial/ethnic diversity, and time period are shown in Fig. 2 and Table 3. Prior to the peak of the pandemic, SARS-CoV-2 infection rates were generally greater across communities with more racial/ethnic diversity and among individuals with a higher number of risk factors such that individuals living in the most racially/ethnically diverse communities without any other risk factors had a similar rate of infection as individuals living in the least racially/ethnically diverse communities with 3 or more risk factors (Fig. 2a). Individuals with 1, 2, or ≥ 3 risk factors had progressively higher odds of SARS-CoV-2 infection compared with individuals without risk factors across regions of racial/ethnic diversity.

Figure 2

SARS-CoV-2 infection rates by number of risk factors and degree of regional racial/ethnic diversity, prior to (A Weeks of January 1–April 12, 2020) and following (B Weeks of April 19–June 7, 2020 and C Weeks of June 14–December 27, 2020) the peak of the first wave of the pandemic in Ontario, Canada. Individuals were classified according to the number of potential risk factors present. Risk factors prior to the peak of the first wave comprised male sex, age > 45 years, lowest quintile of neighborhood income, Canadian immigrant, history of frailty, hypertension, and diabetes; risk factors following the peak comprised male sex, age < 85 years, neighborhood income quintiles 1–4, Canadian immigrant, hypertension, and diabetes. Region was categorized by quartiles of community racial/ethnic diversity rate.

Table 3 Rates and odds ratios of SARS-CoV-2 infection by number of risk factors and degree of regional racial/ethnic diversity, prior to (weeks of January 1 to April 12, 2020) and following (weeks of April 19 to June 7, 2020; weeks of June 14 to December 27, 2020) the peak of the first wave of the pandemic among community-dwelling individuals in Ontario, Canada.

Following the peak of the first wave of the pandemic in mid-April and continuing for the remainder of 2020, infection rates declined across Ontario; however, there was less impact in regions with higher degrees of racial/ethnic diversity. Additionally, while accumulation of more risk factors remained associated with a higher risk of SARS-CoV-2 infection in the most racially/ethnically diverse communities, the risk factors now included all age groups < 85 years (Fig. 2b and c; Table 3). Immediately following the first-wave peak, individuals living in the most racially/ethnically diverse communities with 2, 3, or ≥ 4 risk factors had ORs of 1.89, 3.07, and 4.73 for SARS-CoV-2 infection compared to lower risk individuals in their community with 0–1 risk factors. In contrast, in the least racially/ethnically diverse communities, there was little to no gradient in infection rates across risk strata. For the remainder of 2020, although the absolute infection rates increased across all four quartiles in comparison to immediately following the first-wave peak, the ORs for SARS-CoV-2 infection among individuals living in the most racially/ethnically diverse communities with 2, 3, or ≥ 4 risk factors remained significantly elevated at 1.66, 2.48, and 3.70, with little gradient in infection rates across risk strata among the least racially/ethnically diverse communities.

Discussion

We observed three dynamic factors during the first wave and first year of the pandemic associated with laboratory-confirmed SARS-CoV-2 infection among community-dwelling individuals in Ontario that merit consideration for risk determination, response planning, and health policy. First, we detected significant time-varying heterogeneity in the association between age and the odds of SARS-CoV-2 infection. Initially, increasing age was independently associated with higher odds of SARS-CoV-2 infection, consistent with age as a biologic risk factor marking a more vulnerable population susceptible to viral infection. After the implementation of public health measures in late March 2020, the association reversed, with the odds of SARS-CoV-2 infection increasing across progressively younger age groups compared with community-dwelling individuals 85 years and older. Second, the number of independent risk factors was associated with a stepwise increase in the odds of SARS-CoV-2 infection. Across Ontario, the odds of infection were initially increased in the presence of more clinical risk factors (such as diabetes and hypertension), an association that was accentuated in regions with higher racial/ethnic diversity. After public health measures were implemented, the absolute and relative risk associated with the accumulation of risk factors diminished overall. However, among the regions of Ontario with the highest rates of racial/ethnic diversity, the relative odds of infection associated with a higher number of risk factors remained elevated. Third, we observed that the risk of SARS-CoV-2 infection was associated with higher regional racial/ethnic diversity. Following public health measures, regional racial/ethnic diversity remained independently associated with higher odds of SARS-CoV-2 infection, though absolute rates were reduced.

We analyzed cumulative risk stratified across quartiles of community racial/ethnic diversity to ascertain to what degree the presence of an increasing burden of sociodemographic and clinical risk factors was predictive of laboratory-confirmed SARS-CoV-2 infection independent of residential factors. Regions of Ontario with the highest racial/ethnic diversity correlate with the largest communities and the highest neighborhood density, household crowding, and deprivation37. Lower infection rates in the period following the pandemic’s initial peak correlated with implementation of national and provincial restrictions, including restrictions in international travel and closure of schools and non-essential businesses. These data suggest that these broad public health interventions were associated with changes in the likelihood of infection by age but had little impact on the other risk factors driving virus transmission. These risk factors may represent characteristics of individuals at higher risk of exposure, such as those working in essential services or living in densely populated housing, or both, rather than a higher biologic susceptibility to infection. Systemic and structural inequities in these determinants of health are likely associated with higher residential and occupational risk of viral transmission and reduced ability to comply with isolation orders. These data may help inform policy to protect more vulnerable populations including essential workers, such as the implementation of paid sick leave, targeted screening in heavily impacted neighborhoods, and “wrap-around” services, to ensure those infected have a safe place to isolate and not infect others in the home.

There are several strengths of our study. To our knowledge, this is the first North American population-based analysis of the cumulative effect of clinical and sociodemographic risk factors stratified by community racial/ethnic diversity and time. This approach unmasked considerable differences in the age-related susceptibility to SARS-CoV-2 infection before and after the peak of the first wave of the pandemic that continued until the end of 2020, and the residual risk that remained despite broad public health measures in large, urban, racially/ethnically diverse regions of the province most impacted by COVID-19. These findings have implications for the effectiveness of current public health measures to restrict SARS-CoV-2 infection, particularly among young male immigrants living in lower socioeconomic neighborhoods who are likely working or living in conditions not conducive to adherence to these measures. There are important limitations to acknowledge as well. Access to testing for SARS-CoV-2 infection varied over time, including restriction of testing during the earlier periods to the highest risk patients, and some patients with severe COVID-19 illness may have died before testing. Over time, testing capacity increased and broader testing occurred. As a result, our analysis focused on individuals who underwent SARS-CoV-2 testing, to whom our results apply, in order to reduce ascertainment bias. In addition, a small number of hospital-based SARS-CoV-2 test results were not available for this analysis, but this low percentage of missingness would not be expected to materially impact the results of available data. Since we did not have individual-level data on weight, smoking status, income, and race/ethnicity, we relied on community-level variables as proxies, which may underestimate the extent of socioeconomic and racial/ethnic inequities in SARS-CoV-2 infection.
These analyses would also have benefited from incorporating other individual-level socioeconomic data that are currently not routinely captured by the province, such as data on missed workdays and contributing reasons (e.g., quarantine, isolation, etc.), which may relate to infection. During both time periods, a number of chronic conditions were significantly associated with a lower risk of laboratory-confirmed SARS-CoV-2 infection, likely reflecting collider/screening bias among asymptomatic patients undergoing regular care38. Finally, the analyses presented here do not address the downstream clinical impact of SARS-CoV-2 infection, as this will be reported in a subsequent paper.

Despite the dramatic impact of a provincial lockdown, following the peak of the initial wave of the pandemic in mid-April, the highest likelihood of SARS-CoV-2 infection emerged among clusters of people characterized by younger age, male sex, immigration to Canada, hypertension or diabetes, and residence in the most racially/ethnically diverse, urban, and socioeconomically disadvantaged communities of Ontario. Further efforts appear necessary to reduce the risk of SARS-CoV-2 infection among the highest risk individuals residing in these communities.