Citywide serosurveillance of the initial SARS-CoV-2 outbreak in San Francisco using electronic health records

Routledge, Isobel; Epstein, Adrienne; Takahashi, Saki; Janson, Owen; Hakim, Jill; Duarte, Elias; Turcios, Keirstinne; Vinden, Joanna; Sujishi, Kirk; Rangel, Jesus; Coh, Marcelina; Besana, Lee; Ho, Wai-Kit; Oon, Ching-Ying; Ong, Chui Mei; Yun, Cassandra; Lynch, Kara; Wu, Alan H. B.; Wu, Wesley; Karlon, William; Thornborrow, Edward; Peluso, Michael J.; Henrich, Timothy J.; Pak, John E.; Briggs, Jessica; Greenhouse, Bryan; Rodriguez-Barraquer, Isabel

doi:10.1038/s41467-021-23651-6

Published: 11 June 2021

Isobel Routledge ORCID: orcid.org/0000-0002-0354-4076¹^na1,
Adrienne Epstein ORCID: orcid.org/0000-0002-8253-6102¹^na1,
Saki Takahashi¹^na1,
Owen Janson¹,
Jill Hakim¹,
Elias Duarte¹,
Keirstinne Turcios¹,
Joanna Vinden¹,
Kirk Sujishi¹,
Jesus Rangel¹,
Marcelina Coh¹,
Lee Besana¹,
Wai-Kit Ho¹,
Ching-Ying Oon¹,
Chui Mei Ong¹,
Cassandra Yun ORCID: orcid.org/0000-0002-6971-2060¹,
Kara Lynch¹,
Alan H. B. Wu¹,
Wesley Wu²,
William Karlon¹,
Edward Thornborrow¹,
Michael J. Peluso ORCID: orcid.org/0000-0003-0585-6230¹,
Timothy J. Henrich¹,
John E. Pak²,
Jessica Briggs ORCID: orcid.org/0000-0002-8078-3898¹,
Bryan Greenhouse ORCID: orcid.org/0000-0003-0287-9111¹ &
Isabel Rodriguez-Barraquer¹

Nature Communications volume 12, Article number: 3566 (2021)

Abstract

Serosurveillance provides a unique opportunity to quantify the proportion of the population that has been exposed to pathogens. Here, we developed and piloted Serosurveillance for Continuous, ActionabLe Epidemiologic Intelligence of Transmission (SCALE-IT), a platform through which we systematically tested remnant samples from routine blood draws in two major hospital networks in San Francisco for SARS-CoV-2 antibodies during the early months of the pandemic. Importantly, SCALE-IT allows for algorithmic sample selection and rich data on covariates by leveraging electronic health record data. We estimated overall seroprevalence at 4.2%, corresponding to a case ascertainment rate of only 4.9%, and identified important heterogeneities by neighborhood, homelessness status, and race/ethnicity. Neighborhood seroprevalence estimates from SCALE-IT were comparable to local community-based surveys, while providing results encompassing the entire city that have been previously unavailable. Leveraging this hybrid serosurveillance approach has strong potential for application beyond this local context and for diseases other than SARS-CoV-2.

Introduction

The rapid spread of the SARS-CoV-2 virus has laid bare important gaps in routine infectious disease surveillance. Serological data, particularly when collected at high spatial and temporal resolution, are a key resource for addressing many epidemiological questions because they directly quantify the proportion of the population that has been infected by a pathogen^1,2. For SARS-CoV-2, serology is particularly useful given the high levels of disease under-ascertainment. Serologic surveillance is the gold standard for estimating attack rates (the proportion of the population that has been infected). It is also highly complementary to virologic and syndromic surveillance systems in showing where a population is along the epidemic curve³. Population-based serosurveys that employ a probabilistic sampling frame are considered the gold standard for estimating seroprevalence. However, large population-based serosurveys can be too resource-intensive to initiate swiftly or to repeat, especially during an ongoing outbreak. This is shown by the relative sparsity of population-based serosurveys for SARS-CoV-2 compared with convenience-sampled ones³. For example, no population-based serosurveys have yet been conducted for the city of San Francisco or the wider Bay Area, and few have been conducted in the United States. This limits our ability to identify risk factors for infection, to understand population-level immunity, and to determine which populations and localities may need targeted public health resources such as testing, contact tracing, or vaccine allocation⁴.

Residual blood samples from readily available sources (e.g., blood donors or remnant samples collected during routine medical care visits) provide a unique opportunity to address these limitations, especially when linked to individual-level metadata. They can be used to survey a population for antibodies efficiently over an extended period of time^5,6. Such studies proved useful in the 2009 H1N1 influenza pandemic^7,8,9,10,11,12,13, facilitating analyses on a broader spatial and temporal scale than typical cross-sectional serological surveys allow. However, in most studies that use residual blood samples the source population is unknown¹⁴. This is a major limitation: when it is not known whether the sampled population is representative of the population of interest, the results are difficult to interpret.

The San Francisco Bay Area has been widely recognized for its early and proactive response to COVID-19. San Francisco Bay Area counties introduced a shelter-in-place order on 17 March 2020, requiring residents to remain at home except for essential activities. Relative to many other US cities, few cases were detected in San Francisco during the early months of the epidemic, a pattern that continued as the pandemic progressed¹⁵. However, as in many other areas, a high proportion of asymptomatic infections and limited access to diagnostic testing during this time make these numbers difficult to interpret. An early San Francisco seroprevalence study was conducted on convenience samples from late March to early April 2020. It suggested that <1% of the population had been infected overall¹⁶, consistent with a survey of a rural Bay Area community¹⁸. By contrast, a community study focusing on a specific neighborhood estimated a seroprevalence of >6%, particularly among the Hispanic/Latinx population¹⁷. The lack of citywide, representative seroprevalence estimates during this period makes it hard to determine to what degree these discrepancies reflect heterogeneous exposure or differences in study design.

Here we present a blueprint and the early results of the ongoing SCALE-IT study (Serosurveillance for Continuous, ActionabLe Epidemiologic Intelligence of Transmission). SCALE-IT leverages residual serum samples from two large hospital systems in San Francisco, California, to quantify the prevalence of SARS-CoV-2 antibodies. Importantly, these remnant samples are linked to electronic health records (EHRs). This enables careful algorithmic selection based on demographic and clinical variables, which improves their representativeness of the general population. We tested over 5000 samples collected from San Francisco residents between late March and June 2020, and calculated raw and adjusted seroprevalence estimates across space, time, and socio-demographic indicators. These data provide estimates of overall seroprevalence in San Francisco during the initial phase of the local SARS-CoV-2 outbreak and highlight spatial and demographic heterogeneities in transmission across the city.

Results

Between March 28, 2020, and June 26, 2020, we collected a total of 5244 samples, representing 4735 individual patients, from UCSF Health (n = 3037 patients) and ZSFG (n = 1698 patients) (Fig. 1, Supplementary Fig. 1). By design, the age distribution of sampled individuals remained consistent throughout the study period, and the geographic distribution of sampled residents matched the proportion of the San Francisco population living in each zip code (Fig. 2). We did not achieve the target sample size for the youngest age group because few children receive routine phlebotomy in the UCSF and ZSFG health systems (Table 1). Our sample was relatively representative of the San Francisco population by race and ethnicity. However, it overrepresented those who identified as Black/African American and slightly underrepresented those who identified as Asian.

**Fig. 1: Flow diagram of sampling algorithm.**

**Fig. 2: Distributions of SCALE-IT samples.**

Table 1 Socio-demographic characteristics of patients sampled in SCALE-IT and of the San Francisco population (2019).

Overall, 192 of the 4735 unique patients sampled (5244 samples) were seropositive, for a raw seroprevalence of 4.1%. We then weighted for age group and sex to match the population structure of San Francisco and corrected for test performance characteristics (overall sensitivity of 93.7% and specificity of 99.6%). This gives an estimated population seroprevalence of 4.2% (95% Credible Interval [CrI]: 2.1–6.3%). Based on the number of cases reported during the period covered by the study, we estimate that only 4.9% of all infections were ascertained by the reporting system (95% CrI: 3.3–9.9%) (Supplementary Methods 1). Amongst pregnant women seeking routine care (N = 268), the raw seroprevalence was 3.4% (9/268 seropositive). After adjusting for test performance characteristics, we estimate a seroprevalence of 3.5% (95% CrI: 1.1–6.4%) in this group. This estimate in our sentinel population is consistent with the estimate for our overall sample.
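The sketch below makes the adjustment concrete with two standard building blocks. These are not the authors' Bayesian model. The first is post-stratification, which weights stratum-specific positivity by population shares. The second is the Rogan-Gladen correction for imperfect sensitivity and specificity. The `Stratum` type and its fields are hypothetical.

Applied to the overall raw positivity (192/4735) with the reported sensitivity and specificity, the Rogan-Gladen point estimate is about 3.9%. The published 4.2% additionally reflects age/sex weighting and a Bayesian model that accounts for conditional dependence between the two assays.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stratum:
    """Hypothetical age-by-sex cell of the sample."""
    n_tested: int
    n_positive: int
    pop_share: float  # share of the San Francisco population in this cell


def poststratified_prevalence(strata: list[Stratum]) -> float:
    """Weight each stratum's raw positivity by its population share."""
    total_share = sum(s.pop_share for s in strata)
    return sum((s.n_positive / s.n_tested) * (s.pop_share / total_share) for s in strata)


def rogan_gladen(apparent: float, sensitivity: float, specificity: float) -> float:
    """Correct an apparent prevalence for imperfect sensitivity and specificity."""
    corrected = (apparent + specificity - 1.0) / (sensitivity + specificity - 1.0)
    # The estimator can fall outside [0, 1] at low prevalence, so clip it.
    return min(max(corrected, 0.0), 1.0)


# Overall raw positivity from the text, corrected without age/sex weighting.
raw = 192 / 4735
adjusted = rogan_gladen(raw, sensitivity=0.937, specificity=0.996)
print(f"raw {raw:.1%}, test-adjusted {adjusted:.1%}")
```

The correction goes negative whenever apparent prevalence falls below the false-positive rate (1 − specificity). Clipping to zero is a common, if crude, fix, and is one reason Bayesian approaches are preferred at low prevalence.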

We did not observe statistically significant differences in seroprevalence by age (Fig. 3a) or hospital system (Supplementary Table 2, Supplementary Data 1). Seroprevalence was nearly twice as high in uninsured individuals (6.3%, 95% CrI: 3.1–9.9%) as in those with some form of insurance (Private/Commercial: 3.4%, 95% CrI: 1.6–4.7%; Government: 4.0%, 95% CrI: 2.3–5.0%) (Fig. 3b). With respect to race/ethnicity, seroprevalence was highest in those identifying as Hispanic (6.3%, 95% CrI: 4.4–8.3%), followed by Black or African American (4.8%, 95% CrI: 2.8–7.0%), and lowest in those identifying as Asian (2.3%, 95% CrI: 0.8–3.5%) (Fig. 3c). Seroprevalence was almost twice as high in those identifying as male (5.3%, 95% CrI: 3.7–6.6%) as in those identifying as female (2.7%, 95% CrI: 1.1–3.6%) (Fig. 3d). These samples were obtained over a 3-month collection period. Even so, the relatively low attack rate during these initial stages of the pandemic in San Francisco meant that we could not detect meaningful differences in seroprevalence over time (Supplementary Table 2, Supplementary Figs. 2 and 3).

**Fig. 3: Stratified seroprevalence by demographic group.**

Geographically, seroprevalence was highest in the Bayview neighborhood in the southeast of the city, at 8.1% (95% CrI: 4.6–12.3%) (Fig. 4a, Supplementary Table 3, Supplementary Data 2). Several other neighborhoods had similarly high seroprevalence, but these estimates were much more uncertain (Fig. 4b). These findings are consistent with patterns of incidence in the city during this period (Fig. 4c). We identified 157 individuals experiencing homelessness in our study, and in this group seroprevalence was estimated to be 10.8% (95% CrI: 6.1–16.5%).

**Fig. 4: Multi-panel map of seroprevalence by geography.**

To validate the representativeness of our approach using curated remnant samples, we compared our results with two contemporaneous community-based serosurveys conducted in specific neighborhoods of San Francisco. The first was a cross-sectional serosurvey carried out in a census tract within the Mission District (census tract 022901, zip code 94110) between April 25 and April 28, 2020¹⁷. Chamie et al. tested 2545 census tract residents for SARS-CoV-2 antibodies and estimated seroprevalence to be 3.1% (95% CI: 2.5–3.9%). This is consistent with our estimate of 3.8% (95% CrI: 1.8–6.3%) between April and June 2020 in the broader Mission District neighborhood. The second was a cross-sectional serosurvey carried out between May 30 and June 2, 2020 in two census tracts in San Francisco’s 10th District (https://unitedinhealth.org/sf-district-10), located in the Bayview neighborhood. Among the nearly 1600 individuals tested for antibodies, seroprevalence was estimated at 5.6% in Hispanic participants (n = 320), 2.3% in Black participants (n = 397), and 0.4% in white participants (n = 231). The relatively high seroprevalence we detected in the Bayview neighborhood is comparable to the results of this community-based study. The disparities by race/ethnicity it identified were also similar in direction to those found in our remnant sample study, though different in magnitude. Note that the community studies available for comparison also rely on convenience sampling, because participation was voluntary. They may therefore have inherent selection biases of their own.

Discussion

In this study, we developed and piloted SCALE-IT, a scalable and systematic pipeline that uses remnant samples from two major hospital networks in San Francisco to select, collect, and test specimens for SARS-CoV-2 antibodies. Through this effort, we estimated seroprevalence during the early months of the epidemic to be relatively low throughout San Francisco (4.2%). This still corresponds to more than 20 times the number of infections identified as PCR-confirmed cases at that time. The gap may reflect the limited availability of PCR testing early in the pandemic and the lack of testing of asymptomatic individuals. We also identified important disparities in seroprevalence at the neighborhood level, with the highest seroprevalence in the Bayview neighborhood in the southeast of the city. Seroprevalence was also disproportionately higher in individuals experiencing homelessness and in those identifying as Hispanic, Black/African American, or male. This hybrid serosurveillance approach has potential for broad application beyond this local context and for diseases other than SARS-CoV-2.
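The "more than 20 times" figure follows from the ascertainment rate by simple arithmetic. This is a minimal sketch, assuming ascertainment is defined as reported cases $C$ divided by estimated infections $\hat{\pi} N$, where $\hat{\pi}$ is seroprevalence and $N$ is population size. The exact definition and uncertainty propagation are in Supplementary Methods 1.

```latex
a = \frac{C}{\hat{\pi} N}
\qquad\Longrightarrow\qquad
\frac{\hat{\pi} N}{C} = \frac{1}{a} = \frac{1}{0.049} \approx 20.4
```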

We observed heterogeneities in seroprevalence by race/ethnicity and socio-economic status, here obtained from EHR data on health insurance status and on whether individuals were housed. These echo patterns that have been highlighted over the course of the pandemic at national and global levels^19,20. Specific to San Francisco, our results provide estimates of cumulative SARS-CoV-2 exposure at a granular spatial resolution across the entire city. Despite low overall seroprevalence, we identified specific neighborhoods with disproportionately higher seroprevalence. We also found seroprevalence to be approximately twice as high in those identifying as male as in those identifying as female. One potential explanation is differential pathogen exposure by sex, which is supported by other studies elsewhere¹⁴ and in San Francisco. In San Francisco, PCR positivity was 1.2% (20/1658) in women and 3.3% (63/1908) in men, with an odds ratio of 2.71 (1.64–4.69) for PCR positivity in males. In addition, the majority (74%) of those who tested positive by PCR or were seropositive for SARS-CoV-2 were frontline workers unable to shelter in place¹⁷. Males and females have been found to differ in their immune responses and in infection severity²¹, which could affect assay sensitivity. However, this is unlikely to explain the large difference in our estimates. We did not see sex-based differences in the sensitivity of our assay on the positive controls used in the study, which represent a range of disease severities.

A key strength of our approach was leveraging residual sera from two large health system networks and using EHR data to select samples algorithmically. However, this type of surveillance has limitations that require consideration. Most obviously, patient samples may not be fully representative of the underlying population. This may be particularly true during “shelter-in-place” periods, when behavioral changes may affect the availability and characteristics of the patient population. These issues can be partly mitigated by careful sample selection, as done here by focusing on a subset of outpatients. Selection could be refined further with additional criteria (e.g., restricting or weighting sampling by specific visit types or underlying conditions). The representativeness of the serosurveillance system could also be enhanced by including a broader network of local health systems. The generalizability of our findings may also differ by age group. It is likely to be lower in children, who were underrepresented in our sample set despite the stratified sampling framework. Additional study designs, such as school-based serosurveys, could augment these data to prospectively assess seroprevalence in specific age groups, possibly using non-invasive, saliva-based antibody testing²². Despite including over 5000 samples, our study was not powered to detect differences between covariates or over time in a multiple regression framework. This is partly because San Francisco succeeded in maintaining low transmission, and thus low seroprevalence, during this period. Lastly, we validated our estimates against available community-based studies, but further validation against independent data would help confirm our findings.

Our estimates of seroprevalence in the Mission and Bayview districts were consistent with community studies, and we found similar disparities by demographic group. Even so, our seroprevalence estimates were slightly higher overall. This was particularly true for the Bayview/Sunnydale surveys. We estimate a seroprevalence of 8.1% (95% CrI: 4.6–12.3%) for the Bayview/Hunter’s Point neighborhood. By contrast, a community survey in census tract 231.02, which lies within the neighborhood, found a raw seroprevalence of 24/784 (3.06%). This difference may be due to heterogeneity within the neighborhood (i.e., higher seroprevalence in census tracts not sampled in the community survey) or to differences in the underlying populations sampled. The timing of sample collection also differed: we collected samples until the end of June 2020, whereas the community survey was conducted between May 30 and June 2, 2020. In addition, our study may have sampled individuals at higher risk of exposure than the community surveys did. It is also interesting to compare our results with other serosurveys that sampled the wider San Francisco Bay Area during the early months of the pandemic. One such serosurvey found a seroreactivity of 0.1% in 1000 blood donors and 0.26% in 387 hospitalized patients admitted for non-respiratory indications in early April 2020²³. Another study of residual sera in the San Francisco Bay Area between 23 and 27 April 2020 found a seroprevalence of 1.0%¹⁴. These results are substantially lower than our estimate for April of 4.6% (2.7–6.3%). However, they are not directly comparable, because the source populations for these studies are not fully characterized and are unlikely to be representative of the San Francisco population. In addition, samples in both studies included residents from outside San Francisco County, including counties known to have experienced very low transmission of SARS-CoV-2 during this period.

We did not find a clear increase in seropositivity over time, although case counts in San Francisco did increase, albeit slowly, during the observation period. This may reflect changes in the demographics of our sample population over time (Supplementary Fig. 2, Supplementary Table 4). The proportion of samples from patients who identified as white or female, or who had private insurance, increased over the period of sample collection, and we found lower seroprevalence in all of these groups. It could also be explained by a lack of power to detect small changes at such low seroprevalence. Our approach could be implemented in a setting with more power to detect changes and/or to stratify by additional demographic variables when selecting samples. It could then provide valuable data on other questions of public health interest, such as the impact of interventions and changes in ascertainment rates over time.
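A standard normal-approximation formula for comparing two proportions shows why small changes at low seroprevalence are hard to detect. The 3% and 5% inputs below are hypothetical. The formula also ignores test error and population weighting, both of which would typically increase the required sample size.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Samples per period to detect a change from p1 to p2 (two-sided z-test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    spread = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    )
    return ceil(spread**2 / (p1 - p2) ** 2)


# Hypothetical scenario: true seroprevalence rising from 3% to 5% between two months.
print(n_per_group(0.03, 0.05))
```

Under these assumptions, detecting even a two-percentage-point rise needs on the order of 1500 samples in each period. The requirement grows quickly as the change gets smaller.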

In this pilot study, we developed and implemented a SARS-CoV-2 serosurveillance system to detect population-level pathogen exposure in near-real time. We demonstrated that data collected through this platform were comparable to incidence data and to results from more resource-intensive community-based serological studies. The appeal of this hybrid approach is that it retains many of the strengths of population-based surveys and provides rich data. At the same time, it leverages existing infrastructure to achieve the efficiency typical of convenience sampling. Using EHR data, we were able to develop a stratified sampling frame. This improved the representativeness of the results compared with serosurveys performed using convenience samples that lack this information¹⁴. We also used these data to identify important spatial and demographic heterogeneities in seroprevalence within our study site, whereas serosurveys performed on residual samples are often limited to coarser meta-data on the sampled population²⁴. Because SCALE-IT is relatively easy to implement, it can be deployed over a broad geographic scale, run continuously over time, and dynamically adjusted to address specific surveillance needs.

We envision multiple directions for future work. First, the samples that we have selected, collected, and processed in this work could serve as a valuable biorepository for future applications. The ability to link rich EHR data to a large bank of well-curated serum samples opens up opportunities for additional analyses, including longitudinal studies of patients. Second, serosurveillance will be fundamental for monitoring SARS-CoV-2 transmission and evaluating the impact of control interventions (both non-pharmaceutical and pharmaceutical) over the coming months and years. Future work could use these and prospective serological data to parametrize mechanistic models and study the effects of control strategies on infection rates. Third, as discussed by others^1,2, our local SCALE-IT platform could easily be expanded to contribute to a ‘Global Immunological Observatory’ that performs serosurveillance for pathogens beyond SARS-CoV-2. Data generated by such an observatory could address specific public health gaps, including serosurveillance for seasonal pathogens such as influenza or for emerging infections. Lastly, the insights gained from developing this platform could serve as a blueprint for adoption by other health systems in various contexts.

Methods

Data source

Residual serum samples from routine blood draws from the University of California, San Francisco (UCSF) and San Francisco Department of Public Health (SFDPH) inpatient and outpatient healthcare systems were sampled from March 28, 2020, onward. UCSF Medical Center is a network of three hospitals with ~1.8 million outpatient visits annually (https://www.ucsfhealth.org/about/annual-reports). The SFDPH hospital, Zuckerberg San Francisco General Hospital (ZSFG), is a city hospital that provides trauma, medical, and surgical services to a heterogeneous population of largely un- or underinsured patients, including the city’s homeless population, and serves roughly 100,000 patients per year (https://zsfg.ucsf.edu/about-ucsf-zsfg).

We obtained daily EHRs for all patients in these networks undergoing routine blood testing, defined as blood chemistries and tests for sexually transmitted infections and rubella. EHR data included information on patient demographics, address, insurance provider, and diagnoses. We also obtained information on all tests for respiratory infections (including SARS-CoV-2) performed on patients in the 6 months prior to the blood draw.

Sampling methodology

We aimed to collect 2000 samples monthly, a sample size chosen based on considerations of both statistical power and feasibility. Estimating seroprevalence with an absolute error of 5% at a Type I error rate of 5%, assuming a prior seroprevalence of 20%, requires testing 246 individuals each month. We determined that a minimum overall sample size of 1230 samples per month (5 × 246) would allow stratification of results by five age groups (0–19, 20–39, 40–59, 60–79, 80+ years).
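The text does not name the formula used. However, the usual normal-approximation formula for estimating a single proportion, n = z² p(1 − p) / d², reproduces both figures, with prior p = 0.20, absolute error d = 0.05 and two-sided α = 0.05. The function name below is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_for_prevalence(prior: float, abs_error: float, alpha: float = 0.05) -> int:
    """Sample size to estimate a proportion within +/- abs_error (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return ceil(z**2 * prior * (1 - prior) / abs_error**2)


per_age_group = n_for_prevalence(prior=0.20, abs_error=0.05)
monthly_minimum = 5 * per_age_group  # five age strata
print(per_age_group, monthly_minimum)  # -> 246 1230
```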

From the full list of available residual serum samples, we restricted our sampling frame to samples from individuals undergoing routine blood testing. We included patients residing in San Francisco, including those experiencing homelessness. We excluded individuals who were tested for SARS-CoV-2 during the visit at which their blood was drawn, unless the test was for routine purposes such as testing prior to an elective procedure or hospital admission. We did not apply any exclusion criteria based on previous visits or previous SARS-CoV-2 tests, whatever the severity of illness. For adults, we restricted our sample to outpatient and emergency department visits. For the youngest age group, we included both inpatient and outpatient visits because few samples were available. Finally, we excluded samples if a sample from the same patient had been selected within the previous 30 days.
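The eligibility rules above can be expressed as a single filter. This sketch uses a hypothetical record type whose field names are illustrative, not actual EHR variables. The text leaves two details open, and both are treated as assumptions, flagged in the comments. The first is whether emergency visits count for the youngest group. The second is how the 30-day boundary is handled.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

ADULT_VISITS = {"outpatient", "emergency"}
# Assumption: the youngest age group (0-19) may also contribute inpatient draws;
# whether ED visits count for this group is not stated, so they are kept here.
YOUTH_VISITS = {"outpatient", "emergency", "inpatient"}


@dataclass
class Draw:
    """Hypothetical EHR-derived record for one residual serum sample."""
    patient_id: str
    draw_date: date
    age: int
    visit_type: str  # "outpatient", "emergency" or "inpatient"
    routine_blood_test: bool
    lives_in_sf: bool  # True also for people experiencing homelessness in SF
    covid_test_at_visit: bool
    covid_test_routine: bool  # e.g. pre-procedure or pre-admission screening


def is_eligible(d: Draw, last_selected: dict[str, date]) -> bool:
    if not (d.routine_blood_test and d.lives_in_sf):
        return False
    # A diagnostic SARS-CoV-2 test at the same visit excludes the draw.
    if d.covid_test_at_visit and not d.covid_test_routine:
        return False
    allowed = YOUTH_VISITS if d.age < 20 else ADULT_VISITS
    if d.visit_type not in allowed:
        return False
    # Exclude patients selected within the previous 30 days (boundary is an assumption).
    previous = last_selected.get(d.patient_id)
    return previous is None or (d.draw_date - previous).days > 30
```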

After obtaining the list of eligible samples according to the above criteria, we selected serum samples for the study using a sampling algorithm aimed to ensure an adequate sample size for each of five age strata and to maximize geographic representativity. After setting a daily target sample size for our overall population, we divided this equally between five age bins to set a target sample size for each age bin. We also set a target sample size for each zip code, which was proportional to its population size. For each zip code with a larger number of eligible samples than its target size, we kept all samples from age groups with sample sizes below or at their target and obtained a random sample from any age group that had an eligible sample size above the target size. We intentionally oversampled pregnant women as a healthy sentinel population by aiming to obtain up to 10% of the samples from pregnant women undergoing routine care, as defined by ICD-10 codes.
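The following is a minimal sketch of one possible reading of the selection rules. It rests on three assumptions. The input format is hypothetical. Each zip-code target is split equally across the five age bins and rounded up, so small zip codes can slightly exceed their target. The oversampling of pregnant women is omitted.

```python
import math
import random
from collections import defaultdict

AGE_BINS = ("0-19", "20-39", "40-59", "60-79", "80+")


def select_samples(eligible, daily_target, zip_pop_share, rng=random):
    """Pick samples so each zip code gets a population-proportional share,
    spread across the five age bins.

    eligible: list of (sample_id, zip_code, age_bin) tuples (hypothetical format).
    zip_pop_share: zip code -> share of the city population.
    """
    by_zip = defaultdict(lambda: defaultdict(list))
    for sample_id, zip_code, age_bin in eligible:
        by_zip[zip_code][age_bin].append(sample_id)

    selected = []
    for zip_code, by_age in by_zip.items():
        zip_target = round(daily_target * zip_pop_share.get(zip_code, 0.0))
        n_available = sum(len(ids) for ids in by_age.values())
        if n_available <= zip_target:
            # Supply at or below target: keep every eligible sample.
            for ids in by_age.values():
                selected.extend(ids)
            continue
        # Assumption: split the zip target equally across age bins (rounded up).
        age_target = math.ceil(zip_target / len(AGE_BINS))
        for ids in by_age.values():
            if len(ids) <= age_target:
                selected.extend(ids)  # under-filled age bin: keep all
            else:
                selected.extend(rng.sample(ids, age_target))  # random subsample
    return selected
```

Passing a seeded `random.Random` instance as `rng` makes a day's selection reproducible, which is convenient for auditing which samples were pulled.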

Sample processing

Remnant samples were stored at +4 °C in outpatient laboratories at UCSF and ZSFG and were collected by our study team twice weekly. After collection, samples were centrifuged for 15 min at 3500 × g. A 300 μL working stock was then aliquoted into 96-well-format barcoded tubes, diluted 1:1 in HEPES storage buffer, and stored at +4 °C. The remainder of each sample was aliquoted into 1.4 mL barcoded tubes and stored at −20 °C.

Serologic assays and validation data

We used two serologic assays in this study to maximize assay specificity. First, we screened all samples using an in-house ELISA. We then performed confirmatory testing with an in-house Luminex assay on the subset of samples above a threshold value. The ELISA detected IgG to the receptor-binding domain (RBD) of the spike (S) protein. It was based on published protocols²⁵ with minor modifications, described briefly here. Each well of 384-well high-binding plates was coated with 1 μg of RBD. The secondary antibody (Southern Biotech #2048-05) was diluted 1:5000, and plates were developed with OPD (o-phenylenediamine dihydrochloride). Concentration values were calculated from the ELISA optical density using a plate-specific standard curve from serial dilutions of a pool of positive control samples²⁶. Samples with an ELISA concentration value above 0.049 were selected for confirmatory testing (see Supplementary Methods 1, Supplementary Tables 5 and 6).
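The text does not specify the functional form of the plate-specific standard curve. A four-parameter logistic (4PL) curve is a common choice for ELISA dilution series, so the sketch below assumes one, with hypothetical parameters. It shows how an optical density is converted into a concentration and compared with the 0.049 screening threshold.

```python
def four_pl(x: float, a: float, b: float, c: float, d: float) -> float:
    """Four-parameter logistic: signal at concentration x.

    a: signal at zero concentration, d: signal at saturation,
    c: inflection point, b: slope factor.
    """
    return d + (a - d) / (1.0 + (x / c) ** b)


def od_to_concentration(od: float, a: float, b: float, c: float, d: float) -> float:
    """Invert the 4PL to read a concentration off the plate's standard curve."""
    if not min(a, d) < od < max(a, d):
        raise ValueError("OD lies outside the range covered by the standard curve")
    return c * ((a - d) / (od - d) - 1.0) ** (1.0 / b)


ELISA_CUTOFF = 0.049  # screening threshold quoted in the text

# Hypothetical parameters, standing in for a fit to one plate's dilution series.
plate = {"a": 0.05, "b": 1.2, "c": 0.5, "d": 3.0}

od = 0.30
conc = od_to_concentration(od, **plate)
send_to_luminex = conc > ELISA_CUTOFF
print(f"OD {od} -> concentration {conc:.4f}; confirm: {send_to_luminex}")
```

Rejecting optical densities outside the curve's range avoids extrapolating beyond the dilution series. A real pipeline would also need a rule for such samples.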

For confirmatory testing, we used a multiplex microsphere assay (Luminex platform) to detect IgG against the SARS-CoV-2 S protein, RBD, and nucleocapsid (N) protein, based on a standardized serology protocol with minor modifications²⁷. Briefly, plasma samples were diluted 1:100 in blocking buffer A (1× PBS, 0.05% Tween, 0.5% bovine serum albumin, 0.02% sodium azide). Antigen concentrations were as follows: S, 4 μg/mL; RBD, 2 μg/mL; and N, 3 μg/mL. As above, concentration values were calculated from the Luminex median fluorescence intensity using a plate-specific standard curve from serial dilutions of a pool of positive control samples. A logistic regression model including the concentration values of all three antigens for each sample had the highest cross-validation accuracy for classification. It was therefore used to establish a cutoff for positivity (see Supplementary Methods 1).
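The sketch below shows how a fitted logistic regression turns the three Luminex concentrations into a positive/negative call. The coefficients and the 0.5 probability cutoff are hypothetical placeholders. The actual model, its cross-validation, and the chosen cutoff are described in Supplementary Methods 1.

```python
from __future__ import annotations

from math import exp

# Hypothetical coefficients; the fitted model is described in Supplementary Methods 1.
INTERCEPT = -6.0
COEFFICIENTS = {"S": 2.0, "RBD": 1.5, "N": 1.0}


def prob_seropositive(conc: dict[str, float]) -> float:
    """Logistic model on the three Luminex antigen concentrations."""
    z = INTERCEPT + sum(beta * conc[antigen] for antigen, beta in COEFFICIENTS.items())
    return 1.0 / (1.0 + exp(-z))


def is_positive(conc: dict[str, float], cutoff: float = 0.5) -> bool:
    """Call a sample positive when its predicted probability reaches the cutoff."""
    return prob_seropositive(conc) >= cutoff


print(is_positive({"S": 3.0, "RBD": 2.0, "N": 1.0}))
```

Combining the three antigens in one model lets a strong signal on one antigen compensate for a weak signal on another, rather than requiring each antigen to pass its own threshold.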

Serologic assays were optimized using positive and negative controls from several sources. Serum samples from 127 patients with PCR-confirmed SARS-CoV-2 infections were obtained from the Long-term Impact of Infection with Novel Coronavirus (LIINC) study (https://www.liincstudy.org/) and used as positive controls. These comprised 266 samples in total, with 1–4 longitudinal monthly time points per individual beginning 3 weeks after symptom onset. Importantly, participants in this cohort represent a range of infection severities (from asymptomatic to severe), ages, sexes, races, and ethnicities. Serum samples from 119 individuals obtained before the emergence of SARS-CoV-2 were used as negative controls. On these positive and negative controls, the overall sensitivity of our serial testing approach was 93.7% (95% CrI: 89.0–97.2%) and its specificity was 99.6% (95% CrI: 98.2–100.0%) (Supplementary Tables 1, 5 and 6, Supplementary Methods 1).
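Screening every sample with the ELISA and confirming only ELISA-positives on Luminex amounts to a serial rule: a sample is called positive only if both assays are positive. The sketch below shows how per-assay performance combines under that rule. The per-assay values are hypothetical, since the text reports only the combined performance measured on controls. The covariance terms are defined simply as deviations from conditional independence. This is a simplified parameterization, and Gardner et al.²⁸ give the formal treatment.

```python
def serial_performance(se1, sp1, se2, sp2, cov_pos=0.0, cov_neg=0.0):
    """Sensitivity and specificity when a sample is positive only if both tests are.

    cov_pos: P(T1+, T2+ | infected) - se1 * se2
    cov_neg: P(T1+, T2+ | not infected) - (1 - sp1) * (1 - sp2)
    Both are zero when the tests err independently given true status.
    """
    sensitivity = se1 * se2 + cov_pos
    specificity = 1.0 - ((1.0 - sp1) * (1.0 - sp2) + cov_neg)
    return sensitivity, specificity


# Hypothetical per-assay values for a screening and a confirmatory assay.
se, sp = serial_performance(se1=0.98, sp1=0.95, se2=0.96, sp2=0.97)
print(f"serial sensitivity {se:.4f}, specificity {sp:.4f}")
```

Under independence, serial testing trades sensitivity for specificity. This pattern is consistent with the very high combined specificity and somewhat lower sensitivity reported above.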

Analytic methods

Raw seropositivity was calculated as the proportion of samples from unique individuals that tested positive on the confirmatory assay. We then produced estimates of seroprevalence adjusted for the sensitivity and specificity of the serial testing approach. These incorporated potential conditional dependence between the tests as described in Gardner et al.²⁸ (see Supplementary Methods 1). We stratified by covariates (age, sex, insurance status, ethnicity, and neighborhood) to obtain seroprevalence estimates for each stratum. To identify neighborhoods, we geocoded sample addresses with the Google Cloud Geocoding API via the ggmap R package²⁹. Samples from 365 unique individuals were excluded from neighborhood-level estimates of seroprevalence, because they could not be geocoded to rooftop level (n = 261) and/or were from homeless individuals (n = 157). Seroprevalence was instead estimated separately for homeless individuals and reported alongside the neighborhood-level estimates. All analyses were conducted using the R statistical software³⁰ and the Stan programming language³¹. Code and data to reproduce all analyses are available at https://github.com/EPPIcenter/scale-it³².
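The sketch below is a simplified form of Bayesian adjustment for test performance. The probability that a test reads positive is π·Se + (1 − π)(1 − Sp). With a uniform prior, a grid approximation then gives the posterior for the true prevalence π. Unlike the authors' Stan model, the sketch fixes sensitivity and specificity at their point estimates and ignores conditional dependence and age/sex weighting. It should therefore yield a narrower interval than the reported 2.1–6.3%.

```python
from math import exp, log


def prevalence_posterior(positives, n, se, sp, grid_size=4001):
    """Grid posterior for true prevalence given `positives` out of `n` tests.

    Uniform prior on [0, 1]; sensitivity (se) and specificity (sp) are fixed
    and must lie strictly between 0 and 1 so every log term stays finite.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_lik = []
    for pi in grid:
        # Probability that a randomly drawn person tests positive.
        p_pos = pi * se + (1.0 - pi) * (1.0 - sp)
        log_lik.append(positives * log(p_pos) + (n - positives) * log(1.0 - p_pos))
    peak = max(log_lik)  # subtract the max before exponentiating for stability
    weights = [exp(ll - peak) for ll in log_lik]
    total = sum(weights)
    return grid, [w / total for w in weights]


def summarize(grid, post, level=0.95):
    """Posterior mean and equal-tailed credible interval."""
    mean = sum(pi * w for pi, w in zip(grid, post))
    tail = (1.0 - level) / 2.0
    cumulative, lower, upper = 0.0, None, None
    for pi, w in zip(grid, post):
        cumulative += w
        if lower is None and cumulative >= tail:
            lower = pi
        if upper is None and cumulative >= 1.0 - tail:
            upper = pi
    return mean, lower, upper


grid, post = prevalence_posterior(192, 4735, se=0.937, sp=0.996)
mean, lower, upper = summarize(grid, post)
print(f"posterior mean {mean:.2%} (95% CrI {lower:.2%}-{upper:.2%})")
```

The same function can be applied stratum by stratum (age, sex, insurance, neighborhood) to mirror the stratified estimates, with the same caveats.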

Institutional Review Board (IRB) approval

This study received expedited review approval from the UCSF IRB (#20-30379; Serological Surveillance of SARS-CoV-2 in Residual Serum/Plasma Samples). The IRB did not require patient contact or written consent for the use of residual sera. The LIINC study, which provided the positive control samples, was approved by the UCSF IRB (#20-30479). Pre-pandemic samples used as negative controls came from the New York Blood Bank; they were de-identified and were not subject to IRB review for use in this study.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

To avoid identifiability and to comply with institutional data-privacy policy, we have provided data summarized by demographic group and neighborhood, rather than individual-level data; these summaries were used to generate Fig. 4a, b. We also provide the posterior values for seroprevalence by demographic group used to generate Fig. 3a–d. The aggregated data used for this analysis can be found on Github at https://github.com/EPPIcenter/scale-it/ (DOI:10.5281/zenodo.4695335)²⁶. Maps were created in QGIS (QGIS.org, QGIS Geographic Information System. QGIS Association. http://www.qgis.org, 2021) using shapefiles in the public domain (Fig. 2c: California. Metropolitan Transportation Commission. Census Zip Code Tabulation Areas, 2000 - San Francisco Bay Area, California. Retrieved from https://earthworks.stanford.edu/catalog/stanford-df986nv4623, 2002; Fig. 4a–d: City of San Francisco, SF data (2019) Planning Neighborhood Groups Map, https://data.sfgov.org/Geographic-Locations-and-Boundaries/Planning-Neighborhood-Groups-Map/iacs-ws63, 2019). Cumulative incidence by planning neighborhood from March to June 2020 in Fig. 4c used publicly available data from the San Francisco Department of Public Health (https://data.sfgov.org/COVID-19/COVID-19-Cases-by-Geography-and-Date/d2ef-idww). Figures 3 and 4 visualize Supplementary Tables 2 and 3. Figure 2 visualizes the distribution of samples; because the underlying raw data for Fig. 2 are at the individual level, they have not been shared with the manuscript for ethical reasons. However, the summarized demographic distributions of the samples are included in the manuscript (Table 1), and access to the full raw data can be requested from the authors by contacting Bryan Greenhouse. Data for poverty rates shown in Fig. 2c come from the American Community Survey 2019 (https://data.census.gov/cedsci/).

Code availability

The code used for this analysis can be found on Github at https://github.com/EPPIcenter/scale-it/ (DOI:10.5281/zenodo.4695335)²⁶.

References

Metcalf, C. J. E. et al. Use of serological surveys to generate key insights into the changing global landscape of infectious disease. Lancet 388, 728–730 (2016).
Mina, M. J. et al. A global immunological observatory to meet a time of pandemics. eLife 9, e58989 (2020).
Arora, R. K. et al. SeroTracker: a global SARS-CoV-2 seroprevalence dashboard. Lancet Infect. Dis. 21, e75–e76 (2020).
Bubar, K. M. et al. Model-informed COVID-19 vaccine prioritization strategies by age and serostatus. Science 371, 916–921 (2020).
Metcalf, C. J. E., Mina, M. J., Winter, A. K. & Grenfell, B. T. Opportunities and challenges of a World Serum Bank – Authors’ reply. Lancet 389, 252 (2017).
Clapham, H. et al. Seroepidemiologic study designs for determining SARS-COV-2 transmission and immunity. Emerg. Infect. Dis. 26, 1978–1978 (2020).
Bandaranayake, D. et al. Risk factors and immunity in a nationally representative population following the 2009 influenza A(H1N1) pandemic. PLoS ONE 5, e13211 (2010).
Gilbert, G. L. et al. Influenza A (H1N1) 2009 antibodies in residents of New South Wales, Australia, after the first pandemic wave in the 2009 southern hemisphere winter. PLoS ONE 5, e12562 (2010).
Dowse, G. K. et al. Incidence of pandemic (H1N1) 2009 influenza infection in children and pregnant women during the 2009 influenza season in Western Australia - a seroprevalence study. Med. J. Aust. 194, 68–72 (2011).
Reed, C., Katz, J. M., Hancock, K., Balish, A. & Fry, A. M. Prevalence of seropositivity to pandemic influenza A/H1N1 virus in the United States following the 2009 pandemic. PLoS ONE 7, e48187 (2012).
Waalen, K. et al. High prevalence of antibodies to the 2009 pandemic influenza A(H1N1) virus in the Norwegian population following a major epidemic and a large vaccination campaign in autumn 2009. Eurosurveillance 15, 19633 (2010).
Hoschler, K. et al. Seroprevalence of influenza A(H1N1)pdm09 virus antibody, England, 2010 and 2011. Emerg. Infect. Dis. 18, 1894–1897 (2012).
Mak, G. C. et al. Sero-immunity and serologic response to pandemic influenza A (H1N1) 2009 virus in Hong Kong. J. Med. Virol. 82, 1809–1815 (2010).
Havers, F. P. et al. Seroprevalence of antibodies to SARS-CoV-2 in 10 sites in the United States, March 23-May 12, 2020. JAMA Intern. Med. 180, 1576–1586 (2020).
Dong, E., Du, H. & Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 20, 533–534 (2020).
Ng, D. L. et al. SARS-CoV-2 seroprevalence and neutralizing activity in donor and patient blood. Nat. Commun. 11, 4698 (2020).
Chamie, G. et al. Community transmission of severe acute respiratory syndrome coronavirus 2 disproportionately affects the latinx population during shelter-in-place in San Francisco. Clin. Infect. Dis. Aug 21, ciaa1234, https://doi.org/10.1093/cid/ciaa1234 (2020).
Appa, A. et al. Universal PCR and antibody testing demonstrate little to no transmission of SARS-CoV-2 in a rural community. Open Forum Infect. Dis. https://doi.org/10.1093/ofid/ofaa531 (2020).
Gross, C. P. et al. Racial and ethnic disparities in population-level Covid-19 mortality. J. Gen. Intern. Med. 35, 3097–3099 (2020).
Pan, D. et al. The impact of ethnicity on clinical outcomes in COVID-19: a systematic review. EClinicalMedicine 23, 100404 (2020).
Takahashi, T. et al. Sex differences in immune responses that underlie COVID-19 disease outcomes. Nature 588, 315–320 (2020).
Cooch, P. B. et al. Supervised self-collected SARS-CoV-2 testing in classroom-based summer camps to inform safe in-person learning. Journal of Pediatrics, Perinatology and Child Health 5, 075–093 (2021).
Ng, D. L. et al. SARS-CoV-2 seroprevalence and neutralizing activity in donor and patient blood. Nat. Commun. 11, 4698 (2020).
Anand, S. et al. Prevalence of SARS-CoV-2 antibodies in a large nationwide sample of patients on dialysis in the USA: a cross-sectional study. Lancet 396, 1335–1344 (2020).
Roy, V. et al. SARS-CoV-2-specific ELISA development. J. Immunol. Methods 484, 112832 (2020).
Geralovina, I. EPPIcenter/flexfit: Flexible format standard curve fitting and data processing, Repository: https://github.com/EPPIcenter/flexfit (https://doi.org/10.5281/zenodo.4706072) (2021).
Wu, L. et al. Optimisation and standardisation of a multiplex immunoassay of diverse Plasmodium falciparum antigens to assess changes in malaria transmission using sero-epidemiology. Wellcome Open Res. 4, 26 (2020).
Gardner, I. A., Stryhn, H., Lind, P. & Collins, M. T. Conditional dependence between tests affects the diagnosis and surveillance of animal diseases. Prev. Vet. Med. 45, 107–122 (2000).
Kahle, D. & Wickham, H. ggmap: Spatial Visualization with ggplot2. R. J. 5, 144–161 (2013).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2017).
Stan Development Team. Stan Modeling Language Users Guide and Reference Manual. https://mc-stan.org (2020).
Routledge, I., Epstein, A. & Takahashi, S. Citywide serosurveillance of the initial SARS-CoV-2 outbreak in San Francisco using electronic health records, Repository: www.github.com/EPPIcenter/scale-it (https://doi.org/10.5281/zenodo.4695335) (2021).

Acknowledgements

We acknowledge the significant contribution to this work made by the following persons and organizations: Dr. Kim Rhoads, Dr. Diane Havlir and the Unidos en Salud United in Health partnership, the Office of Community Engagement at the UCSF Helen Diller Family Comprehensive Cancer Center, and the District 10 community partners and participants at the Rafiki Coalition for Health and Wellness, J & J Community Resource Center, The Samoan Community Development Center, and the Young Community Developers, for providing information from community-based testing and response efforts in the Bayview neighborhood. We also acknowledge Jennifer Creasman, Dalia Martinez, and Susan Sudduth at the UCSF Clinical & Translational Science Institute (CTSI) and Janet Nguyen at ZSFG for their valuable assistance in accessing the EHR databases. We also acknowledge the clinical research, laboratory, and epidemiology teams for collecting valuable samples and data from the LIINC cohort. Sources of support included funding from the Schmidt Science Fellows, in partnership with the Rhodes Trust (S.T.), the Chan Zuckerberg Biohub Investigator program (B.G.), the ZSFG Department of Medicine and Division of HIV, ID, and Global Medicine, the MIDAS Coordination Center (MIDASNI2020 5), supported by a grant from the National Institute of General Medical Sciences (3U24GM132013-02S2), and the National Institutes of Health/National Institute of Allergy and Infectious Diseases (NIH/NIAID 3R01AI141003-03S1).

Author information

These authors contributed equally: Isobel Routledge, Adrienne Epstein, Saki Takahashi.

Authors and Affiliations

University of California San Francisco, San Francisco, CA, USA
Isobel Routledge, Adrienne Epstein, Saki Takahashi, Owen Janson, Jill Hakim, Elias Duarte, Keirstinne Turcios, Joanna Vinden, Kirk Sujishi, Jesus Rangel, Marcelina Coh, Lee Besana, Wai-Kit Ho, Ching-Ying Oon, Chui Mei Ong, Cassandra Yun, Kara Lynch, Alan H. B. Wu, William Karlon, Edward Thornborrow, Michael J. Peluso, Timothy J. Henrich, Jessica Briggs, Bryan Greenhouse & Isabel Rodriguez-Barraquer
Chan Zuckerberg Biohub, San Francisco, CA, USA
Wesley Wu & John E. Pak

Contributions

O.J., J.H., E.D., K.T., and J.V. contributed equally and should be considered joint second authors. I.R.B. and B.G. contributed equally and should be considered joint final authors. I.R., A.E., S.T., B.G., J.B., and I.R.B. conceived the study. I.R. and A.E. managed sample selection activities with support from J.V. Plasma specimens were collected by K.S., J.R., M.C., L.B., W.K.H., C.Y.O., C.M.O., C.Y., K.L., A.W., and W.K. O.J., J.H., E.D., K.T., and J.V. performed antibody assays with proteins provided by J.P. and W.W. M.J.P. and T.J.H. provided and analyzed serum from positive controls. I.R. and S.T. performed data analyses with support from A.E. The manuscript and figures were prepared by I.R., A.E., and S.T., with additional input from B.G. and I.R.B. All authors contributed to the interpretation of the results and edited the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Isobel Routledge.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks Gabriel Chodick, Oliver Laeyendecker, and the other, anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Descriptions of Additional Supplementary Files

Dataset 1

Dataset 2

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Routledge, I., Epstein, A., Takahashi, S. et al. Citywide serosurveillance of the initial SARS-CoV-2 outbreak in San Francisco using electronic health records. Nat Commun 12, 3566 (2021). https://doi.org/10.1038/s41467-021-23651-6

Received: 28 January 2021
Accepted: 29 April 2021
Published: 11 June 2021
DOI: https://doi.org/10.1038/s41467-021-23651-6
