Introduction

The COVID-19 pandemic, caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has affected over 200 million individuals across the globe and killed over 4.4 million as of August 20211. Previous studies have examined risk factors for infection and predictors of disease severity. Established medical risk factors predicting infection and severity encompass age ≥ 65, diabetes mellitus, obesity, smoking, chronic pulmonary disease, cardiovascular disease, chronic kidney disease, malignancy and chronic HIV infection2,3,4,5,6,7. The development of asthma, cardiovascular disease, hypertension, chronic kidney disease and obesity have been shown to be influenced by social determinants of health8. Social determinants of health comprise systemic social inequality, specifically, discrepancies in access to resources that impact health such as education, economic stability, safe living conditions and healthcare9.

Proton pump inhibitors (PPIs) and H2-receptor antagonists (H2RAs) are among the most commonly prescribed medications, used for gastroesophageal reflux disease (GERD) and peptic ulcer disease10, 11. Prior work investigating the link between PPIs and aerodigestive infections such as pneumonia has driven the initial hypothesis that PPIs could potentially compromise physiological barriers to SARS-CoV-2 infectivity11, 12. Although current medical literature has focused on the identification and quantification of medical-related risk factors, there is a paucity of data on how socioeconomic and behavioral influences on infection risk compare to conventional medical factors2,3,4,5,6,7. Here, we sought to investigate if there is an association between chronic acid suppression and SARS-CoV-2 infection in patients seen at three urban health systems in California while accounting for concomitant social determinants of health.

Results

Of 900 medical records initially selected for manual review, twenty were excluded because they only contained the SARS-CoV-2 antibody test (not the nasopharyngeal PCR test), and ten were from patients less than 18 years old at the time of testing (Fig. 1). Five individuals with high degrees of missing data (> 40%) and four with unknown residential zip codes were excluding due to inability to map geocoded social determinants of health. 861 medical records were retained for downstream analyses.

Figure 1
figure 1

Selection of medical records for study. Selection of medical records was performed independently at three academic hospitals of the University of California. Each site uniformly sampled 150 COVID-19 positive (+) and 150 COVID-19 negative (−) patients. Among these, 861 medical records were retained for downstream analyses.

The number of included medical records was similar across the three sites. In total there were 428 (49.7%) documented positive tests (Fig. 2). Approximately half were female, and one-fifth were ≥ 65 years old. The majority self-identified as white, and 22% of the total sample identified as Latinx. Most individuals were calculated to have a low Charlson Comorbidity Index (CCI; 0–2) and were not obese based on body mass index (BMI). Diabetes (13.7%) and immunocompromised status (22.0%) were the most prevalent comorbidities. Nearly one-fifth tested were healthcare workers, and the majority of people lived in single family homes. During the period sampled, the probability of a positive SARS-CoV2 test decreased significantly after March (adjusted odds ratios of testing positive for SARS-CoV2 of 0.81, 0.67, and 0.79 in April, May, and June respectively), likely reflecting the adoption of widespread testing (Fig. 3).

Figure 2
figure 2

Medical and sociodemographic characteristics of medical records included in the analysis. BMI body mass index, GERD gastroesophageal reflux disease, CKD chronic kidney disease, SNF skilled nursing facility.

Figure 3
figure 3

Features identified as correlating with the SARS-CoV-2 testing result. Adjusted odds ratios and P-values were calculated when compared to baseline (ref).

Chronic acid suppression and SARS-CoV-2 risk

To evaluate the effect of chronic acid suppression on SARS-CoV-2 infection risk, logistic model selection using 62 candidate features including chronic PPI use and chronic H2RA use was performed. This process identified neither medication class as significant influences on infection risk (Fig. 3). This finding was unchanged even after manually incorporating chronic acid suppression use (combination of both PPI and H2RA) into the model (adjusted odds ratio 1.04, 95% CI 0.92 to 1.17, P = 0.515; Supplementary Fig. S1). Interestingly, GERD was associated with a slightly lower odds of COVID-19 in both models.

Medical conditions associated with SARS-CoV-2 risk

Several medical comorbidities were identified as significant predictors of incident infection. These included a BMI ≥ 30 (adjusted odds ratio 1.11, 95% CI 1.03 to 1.19, P = 0.0054; Fig. 3), and dementia (adjusted odds ratio 1.32, 95% CI 1.09 to 1.6, P = 0.0042; Fig. 3). Unexpectedly, non-hematogenous malignancy (adjusted odds ratio 0.86, 95% CI 0.77 to 0.95, P = 0.0050; Fig. 3) and history of alcoholic use disorder (adjusted odds ratio 0.81, 95% CI 0.69 to 0.95, P = 0.0108; Fig. 3) were associated with a negative test. Other comorbidities including heart failure and stroke were not found to be statistically significant predictors of COVID-19.

Sociodemographic factors and SARS-CoV-2 risk

Females (adjusted odds ratio 0.89, 95% CI 0.83–0.95, P = 0.000213; Fig. 3) were less likely to test positive, but individuals of Latinx ethnicity (adjusted odds ratio 1.25, 95% CI 1.16 to 1.35, P = 4.36e-08; Fig. 3) were more likely to have positive result. Finally, those who resided in localities with more than 4.4% of the adult population utilizing public transportation as their main means of commute also were more likely to test positive (adjusted odds ratio 1.1, 95% CI 1.03 to 1.18, P = 0.0034; Fig. 3). Among our sample, healthcare worker status did not strongly correlate with testing outcome, and neither did “social deprivation index” and “time spent at home” (Fig. 3).

Discussion

We identified no association between long-term PPI or H2RA use and COVID-19 after controlling for medical and sociodemographic predictors of infection risk. Our results are consistent with, and complement, two previously published studies utilizing a clinical informatics approach. A case–control study using South Korean health insurance claims reported no association between current and past use of PPIs and COVID-1913. A cohort study using the United Kingdom’s Biobank also reported no increased risk of COVID-19 among patients who reported taking PPI or H2RA14. Of note, medication data collection period was 2006–2010 for this study, therefore the study investigators could not confirm that participants were still taking these medications following COVID-19 outbreak. By contrast, a single large United States (US) survey-based study found a positive association between adults self-reporting COVID-19 and daily PPI use, but did not find an association with H2RA15. A notable limitation of this study is the reliance of self-reported COVID-19 when testing was available. The study reported a COVID-19 prevalence of 6.4% which was higher than the national estimates at the time, raising concerns regarding the accuracy of self-report. Furthermore, the survey was administered in English, functionally excluding the near quarter of US Census respondents who report limited to no English proficiency16.

Consistent with previous publications, our data found an increased risk of COVID-19 and a BMI ≥ 3017,18,19,20,21. Furthermore, patients with dementia were more likely to test positive, possibly reflecting frequent interactions with the healthcare system or caregivers and a decreased ability to socially distance. Interestingly, those with underlying solid tumors or alcohol use disorder had a lower risk of infection with SARS-CoV-2. We surmised that this may be due to reduced exposure to infected individuals by spending significant time at home and adhering more strictly to precautions such as mask wearing, handwashing and social distancing, but still having a high need for medical services and therefore receiving more testing. Finally, the negative association between COVID-19 positivity and GERD may reflect functional heartburn, which is comorbid with anxiety and possible proclivity to be tested22.

An important finding of our study is that members of the Latinx community are at a disproportionate risk of contracting SARS-CoV-2, even after controlling for a variety of medical, socioeconomic and behavioral factors. Current literature in the US and globally has commented mainly on the increased risk of infection, severe disease and death in black patients23,24,25,26,27,28,29,30. Our California-based cohort showed that being of Latinx ethnicity is associated with higher rates of infection with SARS-CoV-2. In addition to data from New York City reporting 34% of COVID-19 deaths in Latinx people despite only representing 29% of the population, our California-based cohort enriched in Latinx patients highlights increased risk30, 31. In the 45 states reporting data by ethnic group, 20 report the proportion of cases among Latinx people is twice as high and 11 report it is three times as high as would be expected based on the general population32, 33. These collective data underscore the substantial vulnerability of this ethnic group.

Patient zip codes and geolocation data provided indirect insight into the physical environments and behaviors of the communities in which the patients reside and evaluated effects of social determinants of health on COVID-19 risk. We found patients living in zip codes with increased public transportation reliance were associated with SARS-CoV-2 positivity. Since the ability to work from home, avoid public transportation, financially accommodate for furloughs from work are not equally distributed, the ability to social distance effectively may be one of privilege30, 33, 34. Although these differences in social determinants of health are receiving specific attention in the current pandemic, they speak to important and systematic disparities across healthcare that need to be better understood and addressed. In our cohort, females had a significantly lower probability of infection with SARS-CoV-2. It is currently unclear if there are immunologic or behavioral explanations why female sex may be protective from infection35. It has been proposed that sex hormone regulation of angiotensin-converting enzyme 2 and transmembrane serine protease-2 may be implicated; however, future work is needed to clarify this finding36.

Our study has several strengths. The multi-center design incorporates patients from three large urban academic health systems spanning Northern and Southern California providing care for a diverse patient cohort. Data included confirmed SARS-CoV-2 by PCR test results in addition to medications and comorbidities data collected and curated by physicians to improve accuracy. Application of geolocation using zip code enabled identification of sociodemographic determinants of infection risk. An important limitation of our study is possible selection bias. Our study cohort consisted only of tested patients. Attempts to correct this included incorporating multiple predictors of the probability of testing as well as the probability of infection itself; nonetheless we could not exclude residual bias. Other potential limitations include possible measurement bias (e.g. discordance between EHR documentation and actual patient use of medications), ecological fallacy (i.e. attribution of locality-level data to individuals), and cohort unrepresentativeness (i.e. differences between urban, tertiary-care cohorts and the general US population). It should be noted that testing was performed both for symptom-triggered testing and pre-procedural or hospitalization screening. Thus, the pretest probability was not homogeneous. Associations identified in this observational study should not be assumed causal and require prospective validation.

This multicenter case–control study demonstrated no association between chronic acid suppression use and risk of COVID-19. In addition, we found that Latinx ethnicity and residing in a zip code with high public transportation usage were associated with increased risk of COVID-19 which is in keeping with growing evidence that this pandemic has disproportionately affected vulnerable communities in our society. Further work is needed to address reducing the burden of the SARS-CoV-2 pandemic on these communities.

Methods

Study design

This is a case–control study involving the review of medical records from three campuses of the University of California (UC) Health system: San Francisco, San Diego, and Los Angeles. The study was approved by the Institutional Review Board (IRB) of the University of California, San Francisco (UCSF; IRB #20-30549), which was accepted and approved by the IRB of the University of California, San Diego (UCSD) via a memorandum of understanding (UC IRB Reliance #3499). Informed consent was exempted by the IRB of UCSF and UCSD. The IRB of the University of California, Los Angeles (UCLA) determined that the study did not constitute human subjects research and therefore waived approval and informed consent. The study was performed in accordance with relevant guidelines and regulations of the three respective academic medical institutions and their IRB committees. The STROBE guideline for case–control studies was referenced for study design and reporting the findings (Supplementary Table S1)37.

At the time of the study’s inception in May 2020, de-identified databases of electronic health records (EHR) data at UC Health were queried to support sample size calculations. At that time, the estimated prevalence of positivity for SARS-CoV-2 virus was 5.6% among those tested across all campuses of UC Health. At UCSF, the prevalence of PPI use was estimated to be 7.5%. We obtained approximate effect sizes from a prior study that assessed the relative risk of acute gastroenteritis among chronic PPI users, motivated in part by work suggesting the possible enteric transmission of the SARS-CoV-2 virus38, 39. We calculated that at least 416 cases and controls would be needed to detect two-fold relative risk among chronic PPI users compared to non-users with 80% power. These numbers were rounded up to obtain 450 cases and controls each, or 900 medical records in total. These were divided evenly across all three participating sites (300 records each).

Patient selection and data collection

Records of all individuals who underwent SARS-CoV-2 nasopharyngeal PCR testing at any of the three health systems (UCSF, UCSD, UCLA) from March 1, 2020 to June 10, 2020 were separately retrieved via EHR database query. Stratified uniform sampling was performed to yield 150 cases and 150 controls for manual review at each site. A standardized case report form and accompanying data dictionary were developed. Data elements included PPI and H2RA use, Charlson Comorbidity Index (CCI), additional medical comorbidities and immunocompromised states, and sociodemographic factors (Supplementary Methods). Variable definitions and data extraction process was determined and fixed a priori and based on consensus among the co-authors.

For eligibility criteria of both cases and controls (collectively referred to as patients), medical records of all patients ≥ 18 years old at the time of their initial SARS-CoV-2 test were included in the study database. Result of the SARS-CoV-2 PCR testing was determined by documentation in the EHR of each site. For patients who were tested more than once during the data collection period and were consistently negative, the date of the first result was recorded. Patients who tested positive after initial negative test(s) were recorded as positive, and the date of the first positive test was recorded. Chronic acid suppression was defined as the use of PPI or H2RA daily or as needed for at least four weeks, documented by medical providers in charts or prescription refill records. The decision for this is based on studies demonstrating the incubation period of SARS-CoV-2 to possibly be two to three weeks or more, thus a minimal of four weeks ensures adequate acid suppression prior to SARS-CoV-2 exposure40, 41.

All medical history was reviewed and annotated by physicians (BZ, AS, SB, SD, DR). Disagreements during extraction pertaining to the coding of specific variables were adjudicated through discussion and resolved through consensus voting (Supplementary Methods). The extraction sheet was modified once to better capture race and ethnicity. Following the completion of data abstraction, datasheets were compiled and subject to quality control measures via both manual (JS) and informatics methods in order to ensure consistent and accurate data capture across all three sites.

Geomapping social determinants of health

Patient records were mapped to geographic areas using postal zip codes corresponding to their home address. However, most geocoded data including reports from the United States Census utilize zip code tabulation areas (ZCTAs). Postal zip codes were mapped to zip code tabulation areas (ZCTAs), with the method implemented by Williams-Holt42,43,44.

Geocoded covariates related to population and housing density were obtained from the 2010 United States Decennial Census45. Additional geocoded covariates were obtained from the 2018 United States American Community Survey (ACS) using the R package tidycensus v0.9.9.246. For each ZCTA, median household income, public transportation as means of transportation to work for workers 16 and over, and walking as means of transportation to work for workers 16 and over were extracted.

The United States Census Bureau provides Community Resilience Estimates (CREs) at the census tract level defining what percentage of residents in a census tract have 3 + risk factors, 1–2 risk factors, or 0 risk factors47. These risk factors are derived from the 2018 American Community Survey (ACS) and the 2018 National Health Interview Survey (NHIS).

Mask-wearing by county was obtained from a survey conducted by The New York Times48. Briefly, survey responses were collected from 250,000 individuals between 7/2/2020 and 7/14/2020 who were asked: “How often do you wear a mask in public when you expect to be within six feet of another person?” The possible responses were: never, rarely, sometimes, frequently, and always.

Daily social distancing mobility data was obtained from the data company SafeGraph49, 50. Briefly, daily anonymized and aggregated mobile device data were obtained from SafeGraph from 2-12-2020 through 7/12/2020. Mobile devices were aggregated by Census Block Groups (CBGs) based on home location and the median percentage of time a device was observed at home versus observed at all during a given time period was calculated for all observed devices in a CBG each day.

SARS-CoV-2 Community Mobility Reports were obtained from Google from February 2020 through July 202051. Briefly, these reports quantify how visits and length of stay at various locations changed compared to a baseline using aggregated and anonymized data from users who opted into Google Location History. Google reported daily values for time spent at the following locations aggregated at the county level: grocery & pharmacy, parks, transit stations, retail & recreation, residential, and workplaces. The time spent at a location was represented as a percent change of time spent at the location compared to the baseline.

The Social Deprivation Index (SDI) has previously been used to quantify levels of disadvantage and evaluate their associations with health outcomes52. The SDI used in this study was developed by the Robert Graham Center and is comprised of seven demographic characteristics from the 2015 ACS53. These characteristics include: income, education, employment, housing, household characteristics, transportation, and demographics. Reported SDI measures for ZCTAs were used in this analysis.

Outcomes

The primary outcome was association between chronic acid suppression (PPIs, H2RAs) and risk of SARS-CoV-2 infection. Secondary outcomes included the association between SARS-CoV-2 infection and the month of testing, demographics, comorbid medical conditions, immunocompromised state, CCI, and social and mobility variables (Supplementary Table S2).

Statistical analysis

Medical records and covariates with a high degree of missing data (> 40%) were excluded from subsequent analysis. Multiple imputation was then performed using the R package mice v3.11.0 to impute 10 datasets with 20 iterations per dataset to ensure convergence54.

Continuous covariates were converted to binary variables for interpretability in the logistic regression model and corresponding forest plot. To identify binary cut points which may be associated with an increased chance of a positive SARS-CoV-2 test, we conducted a grid search of all possible values between the 10th and 90th percentiles for each continuous covariate. The binary cut point associated with the lowest p-value resulting from a chi-square test was used to binarize each continuous covariate.

Prior to regression analysis, chi-squared and ANOVA tests were performed where appropriate to test for associations between all variables and positivity for SARS-CoV2 (Supplementary Table S3). Subsequently, to identify and select the subset of input covariates that were most relevant to a positive SARS-CoV-2 test, a multistep process was employed. First, a stepwise linear model to predict a positive SARS-CoV-2 test was fit separately for each of the 10 imputed datasets. Both forward and backward selection using the Akaike Information Criterion were used to identify the subset of covariates to be included in the final model for each imputed dataset. Next, covariates which were included in all 10 reduced models were chosen to be included in the final pooled model. Covariates which appeared in ≥ 50% of the reduced models were considered individually for inclusion in the final pooled model with a Wald test. The selected covariates were used in the final reduced logistic regression model. We initially fit a mixed-effects model on the combined dataset using study site as a random intercept55. These results suggested a low degree of heterogeneity attributable to study site (i2 statistic = 0%). Consequently, all subsequent results were obtained by fixed-effect (pooled) models.