Seroprevalence of SARS-CoV-2 antibodies in Saint Petersburg, Russia: a population-based study

A properly conducted serological survey can help determine the true spread of an infectious disease. This study aims to estimate the seroprevalence of SARS-CoV-2 antibodies in Saint Petersburg, Russia, accounting for non-response bias. A sample of adults was recruited by random digit dialling, interviewed and invited for anti-SARS-CoV-2 antibody testing. Seroprevalence was corrected with the aid of a bivariate probit model that jointly estimated individual propensity to agree to participate in the survey and seropositivity. Of 66,250 individuals contacted, 6,440 adults agreed to be interviewed, and blood samples were obtained from 1,038 participants between May 27 and June 26, 2020. Naïve seroprevalence corrected for test characteristics was 9.0% (7.2–10.8) by CMIA and 10.5% (8.6–12.4) by ELISA. Correction for non-response decreased the estimates to 7.4% (5.7–9.2) and 9.1% (7.2–10.9) for CMIA and ELISA, respectively. The most pronounced decrease in bias-corrected seroprevalence was attributable to a history of any illness in the past 3 months and of COVID-19 testing. Seropositivity was negatively associated with current smoking, a self-reported history of allergies and changes in hand-washing habits. These results suggest that even low estimates of seroprevalence can be overestimates. Serosurvey design should attempt to identify characteristics that are associated both with participation and with seropositivity.

Procedures. Random digit dialling (RDD) was carried out using the area prefixes of mobile phone numbers so as to include only mobile phone users in St. Petersburg. Individuals who answered the call were asked 25 questions in a computer-assisted telephone interview (CATI) on demographics, marital status, education level, income level, past history of illnesses, travelling abroad, household size, social contacts and visits to public places during lockdown (see the full questionnaire in the study protocol). Refusal to participate in blood sampling was also recorded. We also randomly incentivised respondents to participate in the study by offering complimentary taxi transport to and from the clinic test site to approximately 25% of those who agreed to complete the CATI.
Those who agreed to take part in antibody testing were later contacted by the clinic call center and assigned an appointment date for blood sampling. Participants signed informed consent forms and filled out additional paper-based survey forms in the clinic on the day of the visit. The forms included questions on medical history, history of allergies, smoking, alcohol consumption, chronic diseases and regularly taken medication. Blood sampling started on May 27, 2020 and was planned for two weeks, but was extended until June 26, 2020 because of low participation rates.

Laboratory tests.
We assessed anti-SARS-CoV-2 antibodies using two tests. Serum samples were tested using the Abbott Architect SARS-CoV-2 IgG chemiluminescent microparticle immunoassay (CMIA) on the Abbott ARCHITECT i2000sr platform (Abbott Laboratories, Chicago, USA), which detects immunoglobulin class G (IgG) antibodies to the nucleocapsid protein of SARS-CoV-2 (cutoff for positivity 1.4). In addition, samples were tested by enzyme-linked immunosorbent assay (ELISA) using the CoronaPass total antibodies test (Genetico, Moscow, Russia), which detects total antibodies (cutoff for positivity 1.0) and is based on the recombinant receptor-binding domain of the SARS-CoV-2 spike protein (Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA). We report seroprevalence based on both CMIA and ELISA.
Sample size. The initial sample size of 1,550 participants was calculated assuming a prevalence of 20%, CMIA sensitivity of 100% and specificity of 99.6%, and a sampling error of 2% at the 95% confidence level (see Supplementary Appendix Fig. A1) 17. After receiving preliminary results for 500 individuals, we reduced the sample size by assuming a prevalence of 10%, which gave a target sample size of 882 participants, rounded up to 1,000.
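The sample-size reasoning can be sketched with a common formula for estimating true prevalence with an imperfect test: the binomial variance of the apparent (test-positive) prevalence is inflated by 1/(Se + Sp - 1)². This formula is an assumption, since the exact calculation (reference 17, Supplementary Appendix Fig. A1) is not given here; it yields about 1,570 and 900, close to but not identical with the reported 1,550 and 882. The helper name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_imperfect_test(prevalence, se, sp, margin, confidence=0.95):
    """Sample size to estimate true prevalence within +/- margin.

    Hypothetical helper: the binomial variance of the apparent prevalence
    is divided by the squared Youden index (se + sp - 1).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    # Expected share of test-positive results given true prevalence, se and sp.
    apparent = prevalence * se + (1 - prevalence) * (1 - sp)
    youden = se + sp - 1
    n = z**2 * apparent * (1 - apparent) / (margin**2 * youden**2)
    return math.ceil(n)


# Inputs from the text: CMIA sensitivity 100%, specificity 99.6%, 2% margin.
print(sample_size_imperfect_test(0.20, se=1.0, sp=0.996, margin=0.02))  # 1568
print(sample_size_imperfect_test(0.10, se=1.0, sp=0.996, margin=0.02))  # 900
```

Lowering the assumed prevalence from 20% to 10% shrinks p(1 - p) from 0.16 to 0.09, which is why the preliminary results allowed the target to be cut by almost half.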

Statistical analysis.
The primary aim of the study was to assess the seroprevalence of antibodies to SARS-CoV-2 in serum samples based on CMIA and ELISA tests, accounting for non-response bias and test characteristics (sensitivity and specificity). Seroprevalence was defined as the proportion of tested participants who tested positive. Non-response was assessed by comparing the CATI answers of those who visited the test site with those of all other respondents.
To understand the direction of non-response bias in our data, we estimated a binomial probit regression of individual agreement to participate in the study and provide a blood sample on observable characteristics. We used this fitted model to compute the conditional probability of participating in the study for each variable in turn, holding all other variables at their mean levels. Our bivariate probit model is formally introduced in the Statistical Appendix.
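The "vary one variable, hold the rest at their means" calculation can be sketched for a probit model P(participate) = Φ(x′β). The regressor names, coefficients and respondent records below are hypothetical placeholders, not the study's estimates; the sketch only shows the mechanics of computing conditional probabilities from a fitted probit.

```python
from statistics import NormalDist, fmean

# Hypothetical fitted probit coefficients for agreeing to give blood.
COEFS = {"const": -1.2, "age": -0.01, "female": 0.15, "ill_3m": 0.35}


def participation_probability(row, coefs=COEFS):
    """Probit prediction: Phi(const + sum of coefficient * regressor)."""
    xb = coefs["const"] + sum(coefs[k] * row[k] for k in coefs if k != "const")
    return NormalDist().cdf(xb)


def conditional_probability(data, var, value, coefs=COEFS):
    """Set `var` to `value` and hold every other regressor at its sample mean."""
    row = {k: fmean(r[k] for r in data) for k in coefs if k != "const"}
    row[var] = value
    return participation_probability(row, coefs)


# Tiny hypothetical respondent data set.
data = [
    {"age": 30, "female": 1, "ill_3m": 0},
    {"age": 55, "female": 0, "ill_3m": 1},
    {"age": 42, "female": 1, "ill_3m": 0},
    {"age": 67, "female": 0, "ill_3m": 0},
]
p_ill = conditional_probability(data, "ill_3m", 1)
p_not_ill = conditional_probability(data, "ill_3m", 0)
print(round(p_ill, 3), round(p_not_ill, 3))
```

Comparing the two probabilities shows which way a characteristic pushes participation; a positive gap for recent illness would mean ill respondents are over-represented among those tested.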
In secondary analyses we also assessed seroprevalence by week based on the date of interview and the date of blood sampling. In subgroup analysis we first compared seroprevalence estimates corrected for non-response between groups of individuals defined by their CATI answers. To explore individual risk factors for test positivity and obtain prevalence ratios, we estimated a generalised linear model with a Poisson distribution and a log link, restricted to participants who completed the clinic paper-based survey. We considered using a robust variance-covariance matrix in our adjusted prevalence ratio analysis. However, this adjustment narrowed the confidence intervals, rendering our adjusted estimates less conservative 18. For this reason we report confidence intervals from the unadjusted (model-based) variance-covariance matrix.
www.nature.com/scientificreports/
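Why the robust variance narrows the intervals can be seen in the simplest case of one binary exposure, where both estimators have closed forms. The Poisson model assumes Var(y) equals the mean p, whereas a 0/1 outcome has variance p(1 - p), so the model-based standard error of log(PR) is always larger than the sandwich one. The counts and function name below are hypothetical, and a real adjusted analysis would fit a full GLM rather than use these formulas.

```python
import math


def prevalence_ratio(pos_exp, n_exp, pos_unexp, n_unexp, z=1.96):
    """Prevalence ratio for one binary exposure from a Poisson log-link model.

    With a single binary regressor the Poisson MLE equals the ratio of the
    observed proportions, so both variance estimates have closed forms.
    Returns (PR, model-based 95% CI, robust 95% CI).
    """
    pr = (pos_exp / n_exp) / (pos_unexp / n_unexp)
    # Model-based variance of log(PR): Poisson assumes Var(y) = mean.
    se_model = math.sqrt(1 / pos_exp + 1 / pos_unexp)
    # Robust (sandwich) variance: uses the empirical variance of the 0/1 outcome.
    se_robust = math.sqrt(1 / pos_exp - 1 / n_exp + 1 / pos_unexp - 1 / n_unexp)

    def ci(se):
        return pr * math.exp(-z * se), pr * math.exp(z * se)

    return pr, ci(se_model), ci(se_robust)


# Hypothetical counts: 45/300 positive among exposed, 28/700 among unexposed.
pr, ci_model, ci_robust = prevalence_ratio(45, 300, 28, 700)
print(f"PR {pr:.2f}, model-based CI {ci_model[0]:.2f}-{ci_model[1]:.2f}, "
      f"robust CI {ci_robust[0]:.2f}-{ci_robust[1]:.2f}")
```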
In sensitivity analyses we explored how including different sets of observable characteristics (namely, travel history, face mask use, public transport use, visits to public places and others) in the model that corrected seroprevalence for non-response influenced the results. We also applied alternative definitions of seroprevalence (a combination of the two tests favouring either sensitivity or specificity). To account for possible sample non-representativeness, we computed raking weights to match the survey age-group and educational-attainment proportions to those of a 2016 representative survey of the adult city population (see Supplementary Appendix Table A3 for a description of this survey and the target proportions). The R package anesrake was used to compute the weights 19. We then estimated seroprevalence on the re-weighted data.
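Raking adjusts unit weights one variable at a time until the weighted sample margins match the population targets. The study used the R package anesrake; the following is a minimal standard-library Python sketch of the same idea (iterative proportional fitting), not anesrake's implementation, and it omits the weight trimming that dedicated packages typically offer. Respondents and target proportions are hypothetical.

```python
def rake(sample, targets, max_iter=1000, tol=1e-10):
    """Raking (iterative proportional fitting) of unit weights to margins.

    sample:  list of dicts mapping variable name -> category
    targets: {variable: {category: population proportion}}, each summing to 1
    Returns weights rescaled to average 1.
    """
    weights = [1.0] * len(sample)
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            # Current weighted total of each category of this variable.
            current = dict.fromkeys(shares, 0.0)
            for w, row in zip(weights, sample):
                current[row[var]] += w
            # Scale units so this variable's weighted shares hit the targets.
            for i, row in enumerate(sample):
                factor = shares[row[var]] * total / current[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:
            break
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]


# Hypothetical respondents (age group, education, test result) and targets.
sample = [
    {"age": "18-39", "edu": "higher", "pos": 1},
    {"age": "18-39", "edu": "higher", "pos": 0},
    {"age": "18-39", "edu": "other", "pos": 0},
    {"age": "40+", "edu": "higher", "pos": 0},
    {"age": "40+", "edu": "other", "pos": 1},
    {"age": "40+", "edu": "other", "pos": 0},
]
targets = {"age": {"18-39": 0.35, "40+": 0.65},
           "edu": {"higher": 0.30, "other": 0.70}}
weights = rake(sample, targets)
weighted_prev = sum(w * r["pos"] for w, r in zip(weights, sample)) / sum(weights)
print(round(weighted_prev, 3))
```

If seropositivity is unrelated to the raking variables, the weighted and unweighted prevalences will be similar, which matches the pattern the authors later report.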
We treated refusals to answer particular phone or paper-based survey questions as missing data; all subsequent results are therefore based on listwise deletion of observations with missing values.
All reported seroprevalence results were also corrected for test characteristics using the manufacturers' validation data: sensitivity of 100% and 98.7% and specificity of 99.6% and 100% for the CMIA and ELISA tests, respectively 20. Standard errors were computed with the delta method. A detailed description of the statistical analysis is provided in the Statistical Appendix.
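A standard way to apply such a correction is the Rogan-Gladen estimator, p = (AP + Sp - 1)/(Se + Sp - 1), where AP is the apparent (raw) positive share; we assume here that this is the form used. The sketch treats Se and Sp as fixed constants, so the delta method reduces to dividing SE(AP) by (Se + Sp - 1); the study's appendix may also propagate uncertainty in Se and Sp. Counts are hypothetical.

```python
import math


def rogan_gladen(positives, n, se, sp, z=1.96):
    """Correct apparent prevalence for test sensitivity (se) and specificity (sp).

    se and sp are treated as known constants, so the delta-method standard
    error is SE(apparent) multiplied by the derivative 1 / (se + sp - 1).
    Returns (corrected prevalence, (lower, upper)) clipped to [0, 1].
    """
    apparent = positives / n
    youden = se + sp - 1
    corrected = (apparent + sp - 1) / youden
    se_apparent = math.sqrt(apparent * (1 - apparent) / n)
    se_corrected = se_apparent / youden
    lower = max(corrected - z * se_corrected, 0.0)
    upper = min(corrected + z * se_corrected, 1.0)
    return min(max(corrected, 0.0), 1.0), (lower, upper)


# Hypothetical: 100 positives among 1,000 tested, with the manufacturers' values.
print(rogan_gladen(100, 1000, se=1.00, sp=0.996))  # CMIA
print(rogan_gladen(100, 1000, se=0.987, sp=1.00))  # ELISA
```

Imperfect specificity pulls the estimate down (false positives removed), while imperfect sensitivity pushes it up (missed positives restored).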
Data sharing. All analyses were conducted in R 21 with the aid of the GJRM package 22; study data and code are available online (https://github.com/eusporg/spb_covid_study20).

Ethical considerations and study registration. The study was approved by the Research Planning Board of European University at St. Petersburg (on May 20, 2020) and the Ethics Committee of the Clinic "Scandinavia" (on May 26, 2020). All research was performed in accordance with the relevant guidelines and regulations. Informed consent was obtained from all participants. The study was registered with Clinicaltrials.gov (NCT04406038; submitted on May 26, 2020, registered on May 28, 2020) and the ISRCTN registry (ISRCTN11060415; submitted on May 26, 2020, registered on May 28, 2020).

Results
Participation rates. Between May 21 and June 25, 2020, 66,250 individuals were reached using RDD. Of the 13,071 respondents who agreed to participate in the CATI, 6,671 were excluded for various reasons (see Fig. 1). The remaining 6,400 individuals responded to the CATI questionnaire (see Supplementary Appendix Table A2 for details regarding missing records on variables of interest). The respondents were representative of the city population in terms of gender, employment status and household size, but were younger than the adult city population as of 2016 and had higher levels of educational attainment (see Supplementary Appendix Table A3).
Of the surveyed individuals, 3,390 agreed to receive a phone call from the clinic and to schedule a visit for antibody testing. Between May 27 and June 26, 2020, only 1038 individuals who satisfied the eligibility criteria visited the clinic and provided blood samples (16.2% of those interviewed and 30.6% of those who agreed to participate in the serosurvey). The rest declined the invitation or did not show up at the test site. 1038 CMIA tests [...] (see Supplementary Appendix Table A2 for summary statistics on phone survey respondents and tested individuals).
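The participation percentages follow directly from the counts reported above; a quick arithmetic check (variable names are ours):

```python
# Counts reported in the text.
agreed_cati = 13_071     # agreed to take part in the CATI
excluded = 6_671         # excluded for various reasons (Fig. 1)
agreed_testing = 3_390   # agreed to be called by the clinic
tested = 1_038           # visited the clinic and gave blood

interviewed = agreed_cati - excluded
print(interviewed)                                   # 6400
print(f"{tested / interviewed:.1%} of interviewed")  # 16.2% of interviewed
print(f"{tested / agreed_testing:.1%} of those who agreed to testing")  # 30.6% of those who agreed to testing
```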
In the course of the study we observed gradual attrition of participants. Compared with individuals who limited their participation to the CATI, participants who took part in antibody testing were younger and more likely to be female, to report a higher education level, to have experienced illness in the previous 3 months, to report previous COVID-19 testing and to report a change in their hand-washing habits during the epidemic. Our attempt to incentivise participation by randomly offering respondents a taxi did not achieve its purpose (see Supplementary Appendix Fig. A2a). [...] (see Table 1). When we accounted for non-response bias with respect to demographic and socioeconomic characteristics, our seroprevalence point estimates did not change considerably. Including characteristics associated with seroprevalence as regressors in our single imputation model shifted the point estimates of seroprevalence downwards, and after adjustment for all aforementioned characteristics seroprevalence was 7.4% (95% CI 5.7-9.2) for CMIA and 9.3% (7.4-11.2) for ELISA.

Secondary subgroup analysis.
Seroprevalence was similar between men and women and was slightly lower in the oldest (65+) age group (see Table 2). Seroprevalence was higher among individuals who reported a past history of illness (15.1% (95% CI 11.6-18.6) for CMIA and 20.0% (95% CI 14.8-25.2) for ELISA) than among those who did not (3.8% (95% CI 2.1-5.5) for CMIA and 7.4% (95% CI 5.4-9.3) for ELISA). It was also higher among individuals who reported a past history of COVID-19 testing, but slightly lower among those who reported washing their hands more often since the onset of the pandemic and among those who lived alone. There was noticeable variation in seropositivity between city districts (see Fig. 2).
We observed a slight increase in seroprevalence by week of the phone interview (see Fig. 3a) and by week of the blood draw (see Fig. 3b).

Discussion
Our study aimed to assess the spread of the epidemic in St. Petersburg, the fourth largest city in Europe. This is the first population-based serological survey estimating COVID-19 spread in Russia and one of the few representative population-based studies in Europe. Although the seroprevalence estimate varied with the test used and the type of correction applied, the proportion of the population with detectable antibodies was still far lower than that needed for herd immunity. An overall seroprevalence between 7% and 10% is in line with the results of previous studies and suggests similar epidemic development across the world, with less than one tenth of the population affected in the first months 5,6.
To the best of our knowledge, this is the first seroprevalence survey of COVID-19 that applied a correction based on characteristics associated with the risk of seropositivity in combination with incentivised participation. Early COVID-19 serological surveys are likely to exhibit high sampling error because of their recruitment methods 27. Population-based studies with random sampling relied on probability weighting derived from comparison with the source population 5-7. Our findings show that even low estimates of seroprevalence (around or below 10%) obtained in population surveys can be overestimates in populations at high risk of non-response bias.
We detected only a slight change in the seroprevalence estimate when we corrected for non-response bias with respect to demographic or socioeconomic characteristics, but a considerably larger difference when several behavioural characteristics were included in the models and applied in the correction.
In general, our analysis shows that naïve estimates that do not account for non-response bias tend to overstate prevalence. In contrast to findings in the literature examining non-response bias in HIV serosurveys, on average participants who are more likely to have antibodies are more likely to participate in COVID-19 surveys 16,28. In our study, participants with a history of illness or of COVID-19 testing in the last 3 months were more likely to agree to antibody testing, probably seeking external confirmation.
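The direction of this bias can be illustrated with a small simulation of the data-generating process that a bivariate probit selection model assumes: latent errors for seropositivity and participation that are correlated. This is a hypothetical illustration with made-up parameters (7% true prevalence, 20% participation), not the GJRM model fitted in the study; it only shows that a positive error correlation inflates the naïve prevalence among participants.

```python
import random
from statistics import NormalDist


def simulate_selection(rho, n=100_000, true_prev=0.07, participation=0.2, seed=1):
    """Simulate seropositivity and participation with correlated probit errors.

    Latent indices: seropositive if a + u > 0, participates if b + v > 0,
    where (u, v) are standard bivariate normal with correlation rho.
    Returns (population prevalence, naive prevalence among participants).
    """
    rng = random.Random(seed)
    a = NormalDist().inv_cdf(true_prev)      # P(a + u > 0) = true_prev
    b = NormalDist().inv_cdf(participation)  # P(b + v > 0) = participation
    n_pos = n_part = n_pos_part = 0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        # Build v so that corr(u, v) = rho and v stays standard normal.
        v = rho * u + (1 - rho**2) ** 0.5 * rng.gauss(0.0, 1.0)
        pos = a + u > 0
        part = b + v > 0
        n_pos += pos
        n_part += part
        n_pos_part += pos and part
    return n_pos / n, n_pos_part / n_part


for rho in (0.0, 0.3):
    population, naive = simulate_selection(rho)
    print(f"rho={rho}: population {population:.3f}, participants {naive:.3f}")
```

With rho = 0 the participants' prevalence tracks the population value; with rho > 0 it is biased upward, which is the pattern a correction for non-response is meant to remove.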
In our sample we found only a slight age difference in seropositivity rates and no difference between men and women, in line with previous findings 6. However, we observed several clear differences in seroprevalence estimates in the subgroup analysis. First, we detected elevated seroprevalence in participants who reported a history of illness or of any COVID-19 test in the last 3 months; this association was seen regardless of the modelling approach. Second, seroprevalence was lower in participants who lived alone and in those who reported that they started to wash their hands more often. Third, in the secondary analysis of tested participants, seroprevalence was lower in current smokers than in never smokers, and also lower in participants who reported a past history of allergies.
None of the associations revealed in our study should be immediately regarded as causal, owing to limitations in study design and analysis. The associations with history of testing and illness in the last 3 months are easily interpreted. Seroprevalence among those reporting a history of COVID-19 testing was relatively low (around 20%); this may be explained by the large scale of testing in Russia since the onset of the epidemic. However, our study is not direct evidence of the effectiveness of hand hygiene, as a self-reported change in habits can reflect other differences between sub-populations. There is limited and conflicting evidence about smoking rates in COVID-19 patients 29,30. While our study is one of the first to compare population-based seroprevalence estimates between smokers and non-smokers, more studies are needed to confirm this finding 9. There are many [...]. It is also tempting to search immediately for a biological explanation linking allergy status and risk of infection 32. However, we should be very cautious given the limitations of the study design and other possible explanations; for example, people who self-report being allergic may behave in ways that minimise their risk of infection. The question about allergy in our paper-based survey was very general, which also limits the value of this finding.
An important source of bias in serological studies is the performance and nature of the serological tests 33. One possible explanation for the difference between the two tests in our study is the different immunoglobulin classes detected: IgG for CMIA and IgG+IgM+IgA for ELISA. However, given a total seroprevalence of no more than 10%, the absence of IgM and IgA detection in the CMIA test can only partially explain the difference. A recent study showed that seroconversion started on day 5 after disease onset and that IgG levels rose even earlier than IgM 34. Another possible explanation for the different seroprevalence estimates of the two tests is the nature of the antigen. SARS-CoV-2 antibody responses specific to the spike (S) and/or nucleocapsid (N) proteins are equally sensitive in the acute phase of infection 35. However, compared with anti-S responses, antibodies against the N protein appear to wane after infection 36. Recent evaluations of the CMIA test used in our study reported sensitivity far below the 100% reported by the manufacturer, which may also explain the difference 37,38. Independent validation of the serological assays used in our study is required, and it should take into account the fact that sensitivity may decline over time. Another source of underestimation is the proportion of infected individuals who do not seroconvert. Straightforward adjustments for these biases are not available without additional laborious testing 39.
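To see why a sensitivity below the manufacturer's figure matters, the test correction can be re-run under a range of assumed sensitivities. The sketch assumes the Rogan-Gladen form p = (AP + Sp - 1)/(Se + Sp - 1), a hypothetical apparent CMIA positivity of 9%, and specificity fixed at 99.6%; none of these outputs are study results.

```python
def rogan_gladen(apparent, se, sp):
    """True-prevalence estimate from apparent prevalence, clipped to [0, 1]."""
    return min(max((apparent + sp - 1) / (se + sp - 1), 0.0), 1.0)


apparent = 0.09  # hypothetical raw CMIA positivity
for se in (1.00, 0.95, 0.90, 0.85):
    corrected = rogan_gladen(apparent, se, sp=0.996)
    print(f"assumed sensitivity {se:.2f}: corrected prevalence {corrected:.3f}")
```

Because the denominator Se + Sp - 1 shrinks as sensitivity falls, assuming 100% sensitivity gives the lowest corrected prevalence, so any true loss of sensitivity (for example through waning anti-N antibodies) would mean the CMIA-based estimate is too low.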
Our study has several other important limitations. We addressed seroprevalence in adults only, whereas previous studies also included participants younger than 18 years 5,6. We report prevalence over a period of more than two months, which may not reflect the point prevalence at the end of the study period. Our study had a relatively low participation rate given the existing propensity to answer phone calls in the city. However, the majority of phone numbers generated through random digit dialling were not reached, rather than declined to participate. Of the 6,671 excluded, 3,048 (45.7%) were actually ineligible. We assumed missingness at random for those who did not complete the interview or did not pick up the phone. Comparison with the previous representative city survey showed that our sample was representative in terms of gender, employment status and household size, although respondents were younger and more educated (see Supplementary Appendix Table A3). We also excluded distant city districts from our sampling. Even though we observed statistically significant differences in seroprevalence between districts, the lion's share of city residents (about 4.3 mln of 5.2 mln) live in the surveyed districts. Our randomised incentivisation scheme was not successful: the randomly assigned taxi offer was not associated with agreement to participate and therefore could not serve as a valid exclusion restriction. In our main analysis we did not apply the post-stratification methods adopted previously 5. However, applying raking weights estimated to match targets from a representative survey of the adult city population produced little to no change in the weighted seroprevalence estimates. We attribute this to the weak or absent association between seroconversion and age or education level. Finally, we report cross-sectional results; longitudinal data are needed to offer additional insights into waning immunity and long-term protection against re-infection.

Table 3. Prevalence ratios for self-reported characteristics of tested individuals in phone and paper-based surveys. * "Cold symptoms in the last 3 months" was used in the paper-based survey instead of "Past history of illness in the last 3 months" in the phone-based interview.
Conclusion. The COVID-19 pandemic has already affected at least 300,000 residents of St. Petersburg, which, if extrapolated, would correspond to millions in the whole country. However, the vast majority of the population does not carry antibodies to SARS-CoV-2. This highlights the need for further high-quality population-based studies that can provide evidence for measures to diminish the impact of the pandemic.