Serosurveillance provides a unique opportunity to quantify the proportion of the population that has been exposed to pathogens. Here, we developed and piloted Serosurveillance for Continuous, ActionabLe Epidemiologic Intelligence of Transmission (SCALE-IT), a platform through which we systematically tested remnant samples from routine blood draws in two major hospital networks in San Francisco for SARS-CoV-2 antibodies during the early months of the pandemic. Importantly, SCALE-IT allows for algorithmic sample selection and rich data on covariates by leveraging electronic health record data. We estimated overall seroprevalence at 4.2%, corresponding to a case ascertainment rate of only 4.9%, and identified important heterogeneities by neighborhood, homelessness status, and race/ethnicity. Neighborhood seroprevalence estimates from SCALE-IT were comparable to local community-based surveys, while providing results encompassing the entire city that have been previously unavailable. Leveraging this hybrid serosurveillance approach has strong potential for application beyond this local context and for diseases other than SARS-CoV-2.
The rapid spread of the SARS-CoV-2 virus has laid bare important gaps in routine infectious disease surveillance. Serological data, particularly when collected at high spatial and temporal resolutions, are a key resource for addressing many epidemiological questions since they directly quantify the proportion of the population that has been infected by a pathogen1,2. For SARS-CoV-2, serology is particularly useful given the high levels of disease under-ascertainment: serologic surveillance is the gold standard for estimating attack rates (the proportion of the population that has been infected) and highly complementary to virologic and syndromic surveillance systems for providing vital information on where a population is along the epidemic curve3. Population-based serosurveys that employ a probabilistic sampling frame are considered to be the gold standard for estimating seroprevalence. However, performing large population-based serosurveys can be prohibitively resource-intensive to initiate swiftly or perform repeatedly, especially during an ongoing outbreak, as demonstrated by the relative sparsity of population-based versus convenience-sampled serosurveys for SARS-CoV-2 that have been conducted to date3. For example, to date, no population-based serosurveys have been conducted for the city of San Francisco or wider Bay Area, and few have been conducted in the United States, limiting our ability to identify risk factors for infection, understand population-level immunity, and determine which populations and localities may be in need of targeted public health resources such as testing, contact tracing, or vaccine allocation4.
Residual blood samples from readily available sources (e.g., blood donors or remnant samples collected from routine medical care visits), especially when linked to individual-level meta-data, provide a unique opportunity to address these limitations and to efficiently survey a population for antibodies over an extended period of time5,6. Such studies were found to be useful in the 2009 H1N1 influenza pandemic7,8,9,10,11,12,13, facilitating analyses on a broader spatial and temporal scale than typical cross-sectional serological surveys allow. However, in most studies that use residual blood samples the source population is unknown14. This presents a major limitation, as the results are difficult to interpret when it is not known whether the sampled population is representative of the population of interest.
The San Francisco Bay Area has widely been recognized for taking an early and proactive response to COVID-19. San Francisco Bay Area counties introduced a shelter-in-place order on 17 March 2020, requiring residents to remain at home unless leaving the house for essential activities. Relative to many other US cities, few cases were detected in San Francisco during the early months of the epidemic, a pattern which continued as the pandemic progressed15. However, as in many other areas, a high proportion of asymptomatic infections and limited access to diagnostic testing during this time make these numbers difficult to interpret. Results from an early San Francisco seroprevalence study conducted on convenience samples in late March to early April 2020 suggested that <1% of the population had been infected overall16, in contrast to a seroprevalence of >6% estimated by a community study focusing on a specific neighborhood, particularly among the Hispanic/Latinx population17 but consistent with a survey of a rural Bay Area community18. The lack of citywide, representative seroprevalence estimates during this time period limits the ability to determine to what degree these discrepancies reflect heterogeneous exposure or differences in study design.
Here we present a blueprint and the early results of the ongoing SCALE-IT study (Serosurveillance for Continuous, ActionabLe Epidemiologic Intelligence of Transmission), leveraging residual serum samples from two large hospital systems in San Francisco, California to quantify the prevalence of SARS-CoV-2 antibodies. Importantly, these remnant samples are linked to electronic health records (EHRs), enabling careful algorithmic selection based on demographic and clinical variables and improving their representativeness of the general population. We tested over 5000 samples collected from late March to June 2020 from San Francisco residents, and calculated raw and adjusted seroprevalence estimates over space, time, and socio-demographic indicators. These data provide estimates of the overall seroprevalence in San Francisco during the initial phase of the local SARS-CoV-2 outbreak and highlight spatial and demographic heterogeneities in transmission across the city.
Between March 28, 2020, and June 26, 2020, we collected a total of 5244 samples, representing 4735 individual patients, from UCSF Health (n = 3037 patients) and ZSFG (n = 1698 patients) (Fig. 1, Supplementary Fig. 1). By design, the age distribution of sampled individuals remained consistent throughout the study period, and the geographic distribution of residents matched the proportion of the San Francisco population living in each zip code (Fig. 2). Our sample did not achieve the target sample size for the youngest age group due to the limited number of children receiving routine phlebotomy in the UCSF and ZSFG health systems (Table 1). Our results were relatively representative of the San Francisco population by race and ethnicity, although our sample overrepresented those who identified as Black/African American and slightly underrepresented those who identified as Asian.
Overall, from 5244 samples we identified 192/4735 positive samples from unique patients for a raw seroprevalence of 4.1%. After weighting for age group and sex to match the population structure of San Francisco and correcting for test performance characteristics (overall sensitivity of 93.7% and specificity of 99.6%), this corresponds to an estimated population seroprevalence of 4.2% (95% Credible Interval [CrI]: 2.1–6.3%). Based on the number of cases reported during the period covered by the study, we estimate that only 4.9% of all infections were ascertained by the reporting system (95% CrI: 3.3–9.9%) (Supplementary Methods 1). Amongst pregnant women seeking routine care (N = 268), we estimated a raw seroprevalence of 3.4% (9/268 seropositive), and after adjusting for test performance characteristics we estimate 3.5% (95% CrI: 1.1–6.4%) seroprevalence amongst this group. This estimate in our sentinel population group is consistent with the estimates across our overall population of samples.
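As a rough illustration of how test performance enters such an estimate, the sketch below applies the classical Rogan-Gladen correction to the raw counts reported above. This is a simpler frequentist stand-in, not the weighted Bayesian model used in the study, so it is not expected to reproduce the 4.2% figure. The function name `rogan_gladen` is our own, and the last line simply inverts the reported 4.9% ascertainment rate.

```python
def rogan_gladen(raw_prevalence, sensitivity, specificity):
    """Point estimate of true prevalence from apparent (raw) prevalence.

    Assumes a single imperfect test with known sensitivity and specificity;
    no weighting and no uncertainty propagation (unlike the study's model).
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must perform better than chance")
    adjusted = (raw_prevalence + specificity - 1.0) / denom
    # Clamp to [0, 1], since the estimator can fall outside that range.
    return min(max(adjusted, 0.0), 1.0)


raw = 192 / 4735                       # raw seroprevalence from the text
adjusted = rogan_gladen(raw, sensitivity=0.937, specificity=0.996)
print(f"raw: {raw:.3%}, Rogan-Gladen adjusted: {adjusted:.3%}")

# An ascertainment rate of 4.9% implies roughly 1 / 0.049 infections
# for every reported case.
infections_per_reported_case = 1 / 0.049
print(f"infections per reported case: {infections_per_reported_case:.1f}")
```

The gap between this unweighted point estimate and the reported 4.2% reflects the age and sex weighting and the Bayesian treatment of assay uncertainty described in the Methods.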
We did not observe statistically significant differences in seroprevalence by age (Fig. 3a) or hospital system (Supplementary Table 2, Supplementary Data 1). We found seroprevalence to be nearly twice as high in uninsured individuals (6.3%, 95% CrI: 3.1–9.9%) as in those with some form of insurance [Private/Commercial: 3.4% (95% CrI: 1.6–4.7%); Government: 4.0% (95% CrI: 2.3–5.0%)] (Fig. 3b). With respect to race/ethnicity, seroprevalence was highest in those identifying as Hispanic (6.3%, 95% CrI: 4.4–8.3%) followed by Black or African American (4.8%, 95% CrI: 2.8–7.0%), and lowest in those who identified as Asian (2.3%, 95% CrI: 0.8–3.5%) (Fig. 3c). Seroprevalence was almost twice as high in those identifying as Male (5.3%, 95% CrI: 3.7–6.6%) compared to Female (2.7%, 95% CrI: 1.1–3.6%) (Fig. 3d). Although these samples were obtained over a 3-month collection period, given the relatively low attack rate during these initial stages of the pandemic in San Francisco, we were not able to detect meaningful differences in seroprevalence over time (Supplementary Table 2, Supplementary Figs. 2 and 3).
Geographically, we found seroprevalence to be highest in the Bayview neighborhood in the southeast region of the city, at 8.1% (95% CrI: 4.6–12.3%) (Fig. 4a, Supplementary Table 3, Supplementary Data 2). Although several other neighborhoods had similarly high seroprevalences, there was much more uncertainty around these estimates (Fig. 4b). These findings are consistent with patterns of incidence in the city during this period of time (Fig. 4c). We identified 157 individuals who were homeless in our study, and amongst this group seroprevalence was estimated to be 10.8% (95% CrI: 6.1–16.5%).
As validation of the representativity of our approach using curated remnant samples, we compared results from this study to two contemporaneous community-based serosurveys conducted in specific neighborhoods of San Francisco. First, we compared these results to a cross-sectional serosurvey carried out in a census tract within the Mission District (census tract 022901, zip code 94110) between April 25 and April 28, 2020. Chamie et al.17 tested 2545 census tract residents for SARS-CoV-2 antibodies and estimated seroprevalence to be 3.1% (95% CI: 2.5–3.9%). This is consistent with our finding of 3.8% seroprevalence (95% CrI: 1.8–6.3%) between April and June 2020 in the broader Mission District neighborhood. Second, we compared our results to a cross-sectional serosurvey carried out in two census tracts in San Francisco’s 10th District, located in the Bayview neighborhood, between May 30 and June 2, 2020 (https://unitedinhealth.org/sf-district-10). Among the nearly 1600 individuals tested for antibodies, seroprevalence was estimated at 5.6% in Hispanic participants (n = 320), 2.3% in Black participants (n = 397) and 0.4% in white participants (n = 231). The relatively high seroprevalence we detected in the Bayview neighborhood is comparable to the results of this community-based study, and the disparities by race/ethnicity were similar in direction, though different in magnitude, to those identified through our remnant sample study. It is worth noting that the community studies available for comparison also relied upon convenience sampling, as participation was voluntary, and therefore may contain inherent selection biases themselves.
In this study, we developed and piloted a scalable and systematic pipeline using remnant samples from two major hospital networks in San Francisco to select, collect, and test specimens for SARS-CoV-2 antibodies (SCALE-IT). Through this effort, we estimated seroprevalence during the early months of the epidemic to be relatively low throughout San Francisco (4.2%), but still representing more than 20 times the number of infections identified by PCR-confirmed cases at that time. This may be due to the limited availability of PCR testing during the beginning of the pandemic, and the lack of testing of asymptomatic individuals. We also identified important disparities in seroprevalence at the neighborhood level, with the highest seroprevalence in the Bayview neighborhood in the southeast region of the city, as well as disproportionately higher seroprevalence in individuals experiencing homelessness and those identifying as Hispanic, Black/African American, or male. Leveraging this hybrid serosurveillance approach has potential for broad application beyond this local context and for diseases other than SARS-CoV-2.
The heterogeneities in seroprevalence we observed by race/ethnicity and socio-economic status (here obtained from EHR data on health insurance status and whether individuals were housed) echo the patterns that have been highlighted over the course of the pandemic at national and global levels19,20. Specific to San Francisco, our results provide estimates of SARS-CoV-2 cumulative exposure at a granular spatial resolution with a scope covering the entire city; despite low overall seroprevalence, we identified specific neighborhoods with disproportionately higher seroprevalence. Interestingly, we also found seroprevalence to be approximately twice as high in those identifying as male compared to female. Potential explanations for this difference include differential pathogen exposure by sex, which is supported by findings of studies elsewhere14 and in San Francisco; the latter found PCR positivity rates of 1.2% (20/1658) in women and 3.3% (63/1908) in men, with an odds ratio of 2.71 (1.64–4.69) for PCR positivity in males, and also found that the majority (74%) of those who tested positive by PCR or were seropositive for SARS-CoV-2 were frontline workers and unable to shelter-in-place17. Males and females have been found to differ in immune response and infection severity21, which could affect assay sensitivity; however, we believe this is unlikely to explain the large difference we see in our estimates, as we do not see sex-based differences in the sensitivity of our assay on the positive controls used in the study, which represent a range of disease severities.
While a key strength of our approach was leveraging residual sera from two large health system networks and using data from EHRs to algorithmically select samples for inclusion, there are limitations to this type of surveillance that require consideration. Most obviously, patient samples may not be fully representative of the underlying population. This may be particularly true during “shelter-in-place” periods, when behavioral changes may affect the availability and characteristics of the patient population. These issues can ideally be mitigated by careful sample selection, as done here by focusing on a subset of outpatients, with the possibility of further refinement by inclusion of additional selection criteria (e.g., by restricting or weighting sampling to consider specific visit types or underlying conditions). Representativity of the serosurveillance system could also be enhanced by including a broader network of local health systems. We also recognize that the generalizability of our findings may differ by age groups, and is likely to be lower in children who were underrepresented in our sample set despite the stratified sampling framework. Additional study designs, such as school-based serosurveys, could be leveraged to augment these data to prospectively assess seroprevalence in specific age groups, possibly by using non-invasive, saliva-based antibody testing22. Despite including over 5000 samples, our study was not powered to detect differences between covariates or by time in a multiple regression framework, in part due to San Francisco’s success in maintaining low transmission and thus low seroprevalence during this time period. Lastly, while we validated our estimates against results from available community-based studies, further validation would be ideal to assess validity of results and findings.
Whilst our estimates of seroprevalence in the Mission and Bayview districts were consistent with community studies and we found similar disparities by demographics, we did find slightly higher seroprevalences overall. This was particularly true for the Bayview/Sunnydale surveys: we estimated a seroprevalence of 8.1% (95% CrI: 4.6–12.3%) for the Bayview/Hunter’s Point neighborhood, whilst a community survey in census tract 231.02, which lies within the neighborhood, found a raw seroprevalence of 24/784 (3.06%). This difference may be due to heterogeneity within the neighborhood, i.e., higher seroprevalences in other census tracts not sampled in the community survey, or differences in the underlying population sampled. There are also differences in the timing of sample collection; we collected samples up until the end of June 2020, whereas the community survey was conducted between May 30 and June 2, 2020. In addition, the difference could be caused by our study sampling individuals more at risk of exposure than the community surveys. It is also interesting to compare our results to other serosurveys that sampled the wider San Francisco Bay Area during the early months of the pandemic. One such serosurvey23, conducted in early April 2020, found a seroreactivity of 0.1% in 1000 blood donors and 0.26% in 387 hospitalized patients admitted for non-respiratory indications. An additional study of residual sera14 in the San Francisco Bay Area between 23 and 27 April found a seroprevalence of 1.0%. These results are considerably lower than our estimate for April of 4.6% (2.7–6.3%), but are not directly comparable, as the source populations drawn for these studies are not fully characterized and are unlikely to be representative of the San Francisco population. In addition, samples in both studies included residents from outside of San Francisco county, including counties known to have experienced very low transmission of SARS-CoV-2 during this time period.
We did not find a clear increase in seropositivity over time, whilst case counts in San Francisco did increase, albeit slowly, during the observation period. This lack of increase in seroprevalence over time may be the result of changes in some of the demographics of our sample population over time (Supplementary Fig. 2, Supplementary Table 4), as the proportion of samples from patients who identify as white, female and who have private insurance (all of which we found had lower seroprevalence) increased over the period of sample collection. This could also be explained by a lack of power to detect small changes at such low seroprevalence. If implemented in a context where there was more power to detect changes and/or stratify by additional demographic variables when selecting samples, then our approach could provide valuable data to explore additional questions of public health interest, such as the impact of interventions and changes in ascertainment rates over time.
In this pilot study, we developed and implemented a SARS-CoV-2 serosurveillance system to detect population-level pathogen exposure in near-real time, and demonstrated how data collected through this platform were comparable to results from more resource-intensive community-based serological studies and incidence data. The appeal of this hybrid approach is that it achieves many of the strengths of population-based surveys and provides rich data, while leveraging existing infrastructure to allow for much greater efficiencies often seen in convenience sampling approaches. Using EHR data, we were able to develop a stratified sampling frame, ensuring improved representativeness of the results in contrast to serosurveys performed using convenience samples without these key pieces of information14. At the same time, we used these data to identify important spatial and demographic heterogeneities in seroprevalence within our study site; serosurveys performed on residual samples are often limited to coarser levels of meta-data on the sampled population24. The relative ease with which SCALE-IT can be implemented means that it can be deployed over a broad geographic scale, continuously over time, and dynamically adjusted to address specific surveillance needs.
We envision multiple lines of work for future directions. First, the samples that we have selected, collected, and processed in this work could serve as a valuable biorepository for future applications. The ability to link rich EHR data to a large bank of well-curated serum samples opens up opportunities for additional analysis, including longitudinal studies of patients. Second, as serosurveillance efforts will be fundamental to monitor SARS-CoV-2 transmission rates and evaluate the impact of control interventions (both non-pharmaceutical and pharmaceutical) over the coming months and years, future work could leverage these and prospective serological data to parametrize mechanistic models and to study the effects of control strategies on infection rate. Third, as discussed by others1,2, our local SCALE-IT platform could easily be expanded to contribute to a ‘Global Immunological Observatory’ to perform serosurveillance for pathogens beyond the SARS-CoV-2 virus. Data generated by such an observatory could be used to address specific public health gaps, including serosurveillance for seasonal pathogens such as influenza or for emerging infections. Lastly, the insights gained from developing this platform could serve as a blueprint for adoption by other health systems in various contexts.
Residual serum samples from routine blood draws from the University of California, San Francisco (UCSF) and San Francisco Department of Public Health (SFDPH) inpatient and outpatient healthcare systems were sampled from March 28, 2020, onward. UCSF Medical Center is a network of three hospitals with ~1.8 million outpatient visits annually (https://www.ucsfhealth.org/about/annual-reports). The SFDPH hospital, Zuckerberg San Francisco General Hospital (ZSFG), is a city hospital that provides trauma, medical, and surgical services to a heterogeneous population of largely un- or underinsured patients, including the city’s homeless population, and serves roughly 100,000 patients per year (https://zsfg.ucsf.edu/about-ucsf-zsfg).
We obtained daily EHRs for all patients in these networks undergoing routine blood testing, defined as blood chemistries and tests for sexually transmitted infections and rubella. EHR data included information on patient demographics, address, insurance provider, and diagnoses. We also obtained information on all tests for respiratory infections (including SARS-CoV-2) performed on patients in the 6 months prior to the blood draw.
We aimed to collect 2000 samples monthly. We determined this sample size based on considerations of both statistical power and feasibility. To estimate seroprevalence with an absolute error of 5% and a Type I error of 5%, assuming a prior of 20% seroprevalence, a sample size of 246 individuals would need to be tested each month. We determined that an overall sample size of a minimum of 1230 samples per month would be sufficient to allow stratification of results by five age groups (0–19, 20–39, 40–59, 60–79, 80+ years).
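The figures above are consistent with the standard normal-approximation formula for estimating a single proportion, n = z² p(1 − p) / d². The sketch below assumes that this is the formula that was applied, which the text does not state explicitly; the function name is our own.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(expected_prevalence, absolute_error, alpha=0.05):
    """Sample size needed to estimate a proportion to within +/- absolute_error.

    Uses the normal-approximation formula n = z^2 * p * (1 - p) / d^2,
    rounded up to the next whole individual.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value, ~1.96
    n = z ** 2 * expected_prevalence * (1 - expected_prevalence) / absolute_error ** 2
    return math.ceil(n)


per_stratum = sample_size_for_proportion(expected_prevalence=0.20, absolute_error=0.05)
monthly_minimum = per_stratum * 5             # five age strata
print(per_stratum, monthly_minimum)
```

Under these assumptions the calculation gives 246 per stratum and 1230 across the five age groups, matching the values quoted in the text.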
From the full list of residual serum samples that were available, we restricted our sampling frame to samples from individuals undergoing routine blood testing. We included patients residing in San Francisco, including those experiencing homelessness. We excluded individuals who were tested for SARS-CoV-2 during the visit when they received their blood draw (except if the test was for routine purposes, such as testing prior to an elective procedure or admittance to the hospital). We did not have any exclusion criteria for previous visits or tests for SARS-CoV-2 of any severity. We restricted our sample to outpatient and emergency department visits for adults; for the youngest age group, we included both inpatient and outpatient visits due to small numbers of available samples. Finally, we excluded samples if a sample from the same patient had been selected within the previous 30 days.
After obtaining the list of eligible samples according to the above criteria, we selected serum samples for the study using a sampling algorithm aimed to ensure an adequate sample size for each of five age strata and to maximize geographic representativity. After setting a daily target sample size for our overall population, we divided this equally between five age bins to set a target sample size for each age bin. We also set a target sample size for each zip code, which was proportional to its population size. For each zip code with a larger number of eligible samples than its target size, we kept all samples from age groups with sample sizes below or at their target and obtained a random sample from any age group that had an eligible sample size above the target size. We intentionally oversampled pregnant women as a healthy sentinel population by aiming to obtain up to 10% of the samples from pregnant women undergoing routine care, as defined by ICD-10 codes.
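The selection step described above can be sketched roughly as follows. This is an illustrative reading of the text rather than the study's actual code: the record fields (`id`, `zip`, `age_bin`), the function name, the equal split of each zip code's target across age bins, and the rounding rule are all assumptions, and the oversampling of pregnant women is omitted. The zip code populations in the example are hypothetical.

```python
import math
import random
from collections import defaultdict

AGE_BINS = ["0-19", "20-39", "40-59", "60-79", "80+"]


def select_samples(eligible, zip_population, daily_target, seed=None):
    """Pick samples so that zip codes are represented in proportion to
    population and no age bin dominates an oversupplied zip code."""
    rng = random.Random(seed)
    total_pop = sum(zip_population.values())

    by_zip = defaultdict(list)
    for sample in eligible:
        by_zip[sample["zip"]].append(sample)

    selected = []
    for zip_code, samples in by_zip.items():
        # Target for this zip code, proportional to its population (assumed rounding up).
        zip_target = math.ceil(daily_target * zip_population[zip_code] / total_pop)
        if len(samples) <= zip_target:
            selected.extend(samples)          # undersupplied: keep everything
            continue
        # Assumption: the zip target is split equally between the age bins.
        age_target = math.ceil(zip_target / len(AGE_BINS))
        by_age = defaultdict(list)
        for sample in samples:
            by_age[sample["age_bin"]].append(sample)
        for group in by_age.values():
            if len(group) <= age_target:
                selected.extend(group)        # at or below target: keep all
            else:
                selected.extend(rng.sample(group, age_target))  # random subsample
    return selected


# Hypothetical eligible samples and zip code populations.
eligible = (
    [{"id": f"a{i}", "zip": "94110", "age_bin": "20-39"} for i in range(30)]
    + [{"id": f"b{i}", "zip": "94110", "age_bin": "0-19"} for i in range(2)]
    + [{"id": f"c{i}", "zip": "94124", "age_bin": "60-79"} for i in range(3)]
)
zip_population = {"94110": 70000, "94124": 30000}
selected = select_samples(eligible, zip_population, daily_target=20, seed=1)
print(len(selected))
```

In this toy example the scarce 0–19 samples are all retained while the abundant 20–39 group is randomly thinned, which is the behaviour the text describes for zip codes with more eligible samples than their target.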
Remnant samples were stored at +4 °C in outpatient laboratories at UCSF and ZSFG, and collected by our study team twice every week. After collection, samples were centrifuged for 15 min at 3500 g before aliquoting a working stock of 300 μL into 96 well-barcoded tubes, diluting in 1:1 HEPES storage buffer, and storing at +4 °C. The remainder of the sample was aliquoted into 1.4 mL barcoded tubes and stored at −20 °C.
Serologic assays and validation data
We used two serologic assays for this study in order to maximize assay specificity. First, we screened all samples using an in-house ELISA assay and then performed confirmatory testing on a subset of samples above a threshold value using an in-house Luminex assay. The ELISA assay detected IgG to the receptor-binding domain (RBD) of the spike (S) protein, based on published protocols25 with minor modifications, described here briefly. 1 μg of RBD was used to coat each well of 384-well high binding plates, secondary antibody was diluted 1:5000 (Southern Biotech #2048-05), and OPD was used to develop the plates. Concentration values were calculated from the ELISA optical density using a plate-specific standard curve from serial dilutions of a pool of positive control samples26. Samples with an ELISA concentration value above 0.049 were selected for confirmatory testing (see Supplementary Methods 1, Supplementary Tables 5 and 6).
For confirmatory testing, we used a multiplex microsphere assay (Luminex platform) to detect IgG against the SARS-CoV-2 S protein, RBD, and the nucleocapsid (N) protein, based on a standardized serology protocol with minor modifications27. Briefly, plasma samples were diluted to 1:100 in blocking buffer A (1× PBS, 0.05% Tween, 0.5% bovine serum albumin, 0.02% sodium azide). Antigen concentrations used were as follows: S: 4 μg/mL, RBD: 2 μg/mL, and N: 3 μg/mL. As above, concentration values were calculated from the Luminex median fluorescent intensity using a plate-specific standard curve from serial dilutions of a pool of positive control samples. A logistic regression model including the concentration values of the three antigens for each sample was determined to have the highest cross-validation accuracy for classification and was used to establish a cutoff for positivity (see Supplementary Methods 1).
Serologic assays were optimized using positive and negative controls from several sources. Serum samples from 127 patients with PCR-confirmed SARS-CoV-2 infections (representing 266 total samples, with 1-4 longitudinal monthly time points per individual beginning at 3 weeks post-symptom onset) were obtained from the Long-term Impact of Infection with Novel Coronavirus (LIINC) study (https://www.liincstudy.org/) and used as positive controls. Importantly, participants in this cohort represent a range of infection severities (ranging from asymptomatic to severe), age, sex, and ethnicity and race. Serum samples from 119 individuals obtained prior to the emergence of SARS-CoV-2 were used as negative controls. The overall sensitivity of our serial testing approach using positive and negative controls was 93.7% (95% CrI = 89.0%, 97.2%) and specificity was 99.6% (95% CrI = 98.2%, 100.0%) (Supplementary Tables 1, 5 and 6, Supplementary Methods 1).
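To illustrate why a screen-then-confirm design raises specificity at some cost to sensitivity, the sketch below combines two tests interpreted in series, with optional covariance terms for conditional dependence between them. The per-assay values used here are hypothetical placeholders, not the study's ELISA or Luminex performance figures (which are given in the Supplementary material), and the function name is our own.

```python
def serial_test_performance(sens1, spec1, sens2, spec2, cov_pos=0.0, cov_neg=0.0):
    """Sensitivity and specificity of two tests interpreted in series
    (a sample is called positive only if both tests are positive).

    cov_pos / cov_neg are the covariances between the two test results among
    truly positive / truly negative samples; zero means conditional independence.
    """
    sensitivity = sens1 * sens2 + cov_pos
    specificity = 1.0 - ((1.0 - spec1) * (1.0 - spec2) + cov_neg)
    return sensitivity, specificity


# Hypothetical screening and confirmatory assay characteristics.
sens, spec = serial_test_performance(sens1=0.97, spec1=0.95, sens2=0.96, spec2=0.99)
print(f"serial sensitivity: {sens:.4f}, serial specificity: {spec:.4f}")

# Positive dependence among true positives recovers some sensitivity.
sens_dep, _ = serial_test_performance(0.97, 0.95, 0.96, 0.99, cov_pos=0.01)
print(f"with dependence: {sens_dep:.4f}")
```

Under conditional independence the combined specificity exceeds that of either assay alone, while the combined sensitivity falls below both. Allowing for dependence between assays moderates both effects.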
Raw seropositivity was determined as the proportion of all samples from unique individuals that tested positive on the confirmatory assay. We then produced estimates of seroprevalence adjusted for the sensitivity and specificity of the serial testing approach, incorporating potential conditional dependence of the tests as described in Gardner et al.28 (see Supplementary Methods 1). We stratified by covariates to obtain seroprevalence estimates for each stratum (age, sex, insurance status, ethnicity, and neighborhood). To identify neighborhoods, we geocoded sample addresses with the Google Cloud Geocoding API via the ggmap R package29. Samples (n = 365 unique individuals) that could not be geocoded to rooftop (n = 261) and/or were from homeless individuals (n = 157) were excluded from neighborhood-level estimates of seroprevalence; however, estimates of seroprevalence were calculated for homeless individuals separately and provided alongside the neighborhood-level estimates. All analysis was conducted using the R statistical software30 and the Stan programming language31. Code and data to reproduce all analyses are available on GitHub32 at https://github.com/EPPIcenter/scale-it.
Institutional Review Board (IRB) approval
This study received expedited review approval by the UCSF IRB (#20-30379, Serological Surveillance of SARS-CoV-2 in Residual Serum/Plasma Samples). The IRB did not require patient contact or written consent to use residual sera. The LIINC study (providing positive control samples) was approved by the UCSF IRB (#20-30479). Pre-pandemic samples used as negative controls came from the New York Blood Bank, and were de-identified and not subject to IRB review for use in this study.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
To avoid identifiability of data and to comply with institutional policy around data privacy, we have provided summarized data by demographic group and neighborhood instead of individual-level data, used to generate Fig. 4a, b, as well as posterior values for seroprevalence by demographic group used to generate Fig. 3a–d. The aggregated data used for this analysis can be found on GitHub at https://github.com/EPPIcenter/scale-it/ (DOI:10.5281/zenodo.4695335)26. Maps were created in QGIS (QGIS.org, QGIS Geographic Information System. QGIS Association. http://www.qgis.org, 2021) using shapefiles in the public domain (Fig. 2c: California. Metropolitan Transportation Commission. Census Zip Code Tabulation Areas, 2000 - San Francisco Bay Area, California. Retrieved from https://earthworks.stanford.edu/catalog/stanford-df986nv4623, 2002) (Fig. 4a–d: City of San Francisco, SF data (2019) Planning Neighborhood Groups Map, https://data.sfgov.org/Geographic-Locations-and-Boundaries/Planning-Neighborhood-Groups-Map/iacs-ws63, 2019). Cumulative incidence by planning neighborhood from March to June 2020 in Fig. 4c used publicly available data from the San Francisco Department of Public Health (https://data.sfgov.org/COVID-19/COVID-19-Cases-by-Geography-and-Date/d2ef-idww). Figures 3 and 4 visualize Supplementary Tables 2 and 3. Figure 2 visualizes the distribution of samples. Because the underlying raw data for Fig. 2 are at the individual level, they have not been shared with the manuscript for ethical reasons; the summarized demographic distributions of the samples are included in the manuscript (Table 1), and access to full raw data can be requested from the authors by contacting Bryan Greenhouse. Data for poverty rates shown in Fig. 2c come from the American Community Survey 2019 (https://data.census.gov/cedsci/).
Metcalf, C. J. E. et al. Use of serological surveys to generate key insights into the changing global landscape of infectious disease. Lancet 388, 728–730 (2016).
Mina, M. J. et al. A global immunological observatory to meet a time of pandemics. eLife 9, e58989 (2020).
Arora, R. K. et al. SeroTracker: a global SARS-CoV-2 seroprevalence dashboard. Lancet Infect. Dis. 21, e75–e76 (2020).
Bubar, K. M. et al. Model-informed COVID-19 vaccine prioritization strategies by age and serostatus. Science 371, 916–921 (2020).
Metcalf, C. J. E., Mina, M. J., Winter, A. K. & Grenfell, B. T. Opportunities and challenges of a World Serum Bank – Authors’ reply. Lancet 389, 252 (2017).
Clapham, H. et al. Seroepidemiologic study designs for determining SARS-COV-2 transmission and immunity. Emerg. Infect. Dis. 26, 1978–1986 (2020).
Bandaranayake, D. et al. Risk factors and immunity in a nationally representative population following the 2009 influenza A(H1N1) pandemic. PLoS ONE 5, e13211 (2010).
Gilbert, G. L. et al. Influenza A (H1N1) 2009 antibodies in residents of New South Wales, Australia, after the first pandemic wave in the 2009 southern hemisphere winter. PLoS ONE 5, e12562 (2010).
Dowse, G. K. et al. Incidence of pandemic (H1N1) 2009 influenza infection in children and pregnant women during the 2009 influenza season in Western Australia - a seroprevalence study. Med. J. Aust. 194, 68–72 (2011).
Reed, C., Katz, J. M., Hancock, K., Balish, A. & Fry, A. M. Prevalence of seropositivity to pandemic influenza A/H1N1 virus in the United States following the 2009 pandemic. PLoS ONE 7, e48187 (2012).
Waalen, K. et al. High prevalence of antibodies to the 2009 pandemic influenza A(H1N1) virus in the Norwegian population following a major epidemic and a large vaccination campaign in autumn 2009. Eurosurveillance 15, 19633 (2010).
Hoschler, K. et al. Seroprevalence of influenza A(H1N1)pdm09 virus antibody, England, 2010 and 2011. Emerg. Infect. Dis. 18, 1894–1897 (2012).
Mak, G. C. et al. Sero-immunity and serologic response to pandemic influenza A (H1N1) 2009 virus in Hong Kong. J. Med. Virol. 82, 1809–1815 (2010).
Havers, F. P. et al. Seroprevalence of antibodies to SARS-CoV-2 in 10 sites in the United States, March 23-May 12, 2020. JAMA Intern. Med. 180, 1576–1586 (2020).
Dong, E., Du, H. & Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 20, 533–534 (2020).
Ng, D. L. et al. SARS-CoV-2 seroprevalence and neutralizing activity in donor and patient blood. Nat. Commun. 11, 4698 (2020).
Chamie, G. et al. Community transmission of severe acute respiratory syndrome coronavirus 2 disproportionately affects the Latinx population during shelter-in-place in San Francisco. Clin. Infect. Dis. Aug 21, ciaa1234, https://doi.org/10.1093/cid/ciaa1234 (2020).
Appa, A. et al. Universal PCR and antibody testing demonstrate little to no transmission of SARS-CoV-2 in a rural community. Open Forum Infect. Dis. https://doi.org/10.1093/ofid/ofaa531 (2020).
Gross, C. P. et al. Racial and ethnic disparities in population-level Covid-19 mortality. J. Gen. Intern. Med. 35, 3097–3099 (2020).
Pan, D. et al. The impact of ethnicity on clinical outcomes in COVID-19: a systematic review. EClinicalMedicine 23, 100404 (2020).
Takahashi, T. et al. Sex differences in immune responses that underlie COVID-19 disease outcomes. Nature 588, 315–320 (2020).
Cooch, P. B. et al. Supervised self-collected SARS-CoV-2 testing in classroom-based summer camps to inform safe in-person learning. J. Pediatr. Perinatol. Child Health 5, 75–93 (2021).
Ng, D. L. et al. SARS-CoV-2 seroprevalence and neutralizing activity in donor and patient blood. Nat. Commun. 11, 4698 (2020).
Anand, S. et al. Prevalence of SARS-CoV-2 antibodies in a large nationwide sample of patients on dialysis in the USA: a cross-sectional study. Lancet 396, 1335–1344 (2020).
Roy, V. et al. SARS-CoV-2-specific ELISA development. J. Immunol. Methods 484, 112832 (2020).
Wu, L. et al. Optimisation and standardisation of a multiplex immunoassay of diverse Plasmodium falciparum antigens to assess changes in malaria transmission using sero-epidemiology. Wellcome Open Res. 4, 26 (2020).
Gardner, I. A., Stryhn, H., Lind, P. & Collins, M. T. Conditional dependence between tests affects the diagnosis and surveillance of animal diseases. Prev. Vet. Med. 45, 107–122 (2000).
Kahle, D. & Wickham, H. ggmap: Spatial Visualization with ggplot2. R J. 5, 144–161 (2013).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2017).
Stan Development Team. Stan Modeling Language Users Guide and Reference Manual. https://mc-stan.org (2020).
Routledge, I., Epstein, A. & Takahashi, S. Citywide serosurveillance of the initial SARS-CoV-2 outbreak in San Francisco using electronic health records, Repository: www.github.com/EPPIcenter/scale-it (https://doi.org/10.5281/zenodo.4695335) (2021).
We acknowledge the significant contributions to this work made by the following persons and organizations: Dr. Kim Rhoads, Dr. Diane Havlir and the Unidos en Salud United in Health partnership, the Office of Community Engagement at the UCSF Helen Diller Family Comprehensive Cancer Center, and the District 10 community partners and participants at the Rafiki Coalition for Health and Wellness, J & J Community Resource Center, The Samoan Community Development Center, and the Young Community Developers, for providing information from community-based testing and response efforts in the Bayview neighborhood. We also acknowledge Jennifer Creasman, Dalia Martinez, and Susan Sudduth at the UCSF Clinical & Translational Science Institute (CTSI) and Janet Nguyen at ZSFG for their valuable assistance in accessing the EHR databases. We also acknowledge the clinical research, laboratory, and epidemiology teams for collecting valuable samples and data from the LIINC cohort. Sources of support included funding from the Schmidt Science Fellows, in partnership with the Rhodes Trust (S.T.), the Chan Zuckerberg Biohub Investigator program (B.G.), the ZSFG Department of Medicine and Division of HIV, ID, and Global Medicine, the MIDAS Coordination Center (MIDASNI2020 5) by a grant from the National Institute of General Medical Sciences (3U24GM132013-02S2), and the National Institutes of Health/National Institute of Allergy and Infectious Diseases (NIH/NIAID 3R01AI141003-03S1).
The authors declare no competing interests.
Peer review information Nature Communications thanks Gabriel Chodick, Oliver Laeyendecker, and the other, anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Routledge, I., Epstein, A., Takahashi, S. et al. Citywide serosurveillance of the initial SARS-CoV-2 outbreak in San Francisco using electronic health records. Nat Commun 12, 3566 (2021). https://doi.org/10.1038/s41467-021-23651-6