Prevalence of SARS-CoV-2 antibodies in France: results from nationwide serological surveillance

Assessment of the cumulative incidence of SARS-CoV-2 infections is critical for monitoring the course and extent of the COVID-19 epidemic. Here, we report estimated seroprevalence in the French population and the proportion of infected individuals who developed neutralising antibodies at three points throughout the first epidemic wave. Testing 11,000 residual specimens for anti-SARS-CoV-2 IgG and neutralising antibodies, we find nationwide seroprevalence of 0.41% (95% CI: 0.05–0.88) mid-March, 4.14% (95% CI: 3.31–4.99) mid-April and 4.93% (95% CI: 4.02–5.89) mid-May 2020. Approximately 70% of seropositive individuals have detectable neutralising antibodies. Infection fatality rate is 0.84% (95% CI: 0.70–1.03) and increases exponentially with age. These results confirm that the nationwide lockdown substantially curbed transmission and that the vast majority of the French population remained susceptible to SARS-CoV-2 in May 2020. Our study shows the progression of the first epidemic wave and provides a framework to inform the ongoing public health response as viral transmission continues globally.

After the first case of SARS-CoV-2 infection was reported in France on 24 January 2020, authorities largely relied on confirmed case counts to monitor the unfolding epidemic 1 . Case-based surveillance focused primarily on symptomatic patients or those with severe disease, and access to biological confirmation was initially limited. The surge in COVID-19 hospitalisations and deaths, particularly in the eastern and Paris regions, led the French authorities to implement a national lockdown from 17 March to 11 May 2020.
It is now clear that a substantial fraction of infected individuals develop only mild symptoms or even remain asymptomatic [2][3][4][5] . For this reason, the actual proportion of the French population infected during the first epidemic wave remains elusive. Estimating the prevalence of past or current infections is critical to understanding the course and extent of the epidemic.
Since a serological response is likely to occur in all SARS-CoV-2-infected individuals, and the corresponding serological markers should persist for at least some time, the prevalence of SARS-CoV-2 antibodies can be used to estimate cumulative population incidence. Such an estimate can be obtained from seroepidemiological studies, provided that the antibody detection method is accurate enough, even in a low-prevalence context, and that the results from the study sample can reasonably be extrapolated to the population. In addition, such studies can measure the proportion of infected individuals who developed neutralising and potentially protective antibodies, which is particularly important in the absence of a vaccine 6 . To the best of our knowledge, few seroprevalence studies have included detection of SARS-CoV-2 neutralising antibodies, and none at a national level 2,[7][8][9][10] .
To estimate the fraction of the French population infected with SARS-CoV-2 over time as well as the proportion of individuals having developed neutralising antibodies, we implemented serological surveillance based on serial cross-sectional sampling of residual sera obtained from clinical laboratories. Here, we present nationwide estimates of seroprevalence in the French population, with estimates stratified by age, sex and region, from three collection periods prior to, during, and following the lockdown.

Results
Sampled population. A total of 9184 residual sera for Metropolitan France were randomly selected from available sera at the three collection periods (3221 samples from 9 to 15 March 2020, 3084 samples from 6 to 12 April 2020 and 2879 samples from 11 to 17 May 2020). For the French overseas departments, 613, 511 and 713 samples were included, respectively, for the three collection periods (Mayotte was excluded from the analysis due to an insufficient number of available samples). The age, sex and regional distribution of the sample population is shown in Supplementary Table 4). When taking into account the inherent delay between infection and IgG-mediated antibody responses, this estimate reflects the number of infections that occurred ~2 weeks before the collection periods 11 . The prevalence of pseudo-neutralising antibodies against the SARS-CoV-2 S protein rose from 0.06% (95% CI: 0.00–0.17) to 3.33% (95% CI: 2.66–4.07) over the same period (Table 1). The raw proportions of positive sera for each individual test are detailed in Supplementary Table 3. Seroprevalence increased significantly between March and April, with a ten-fold increase in relative risk, but plateaued from April to May 2020 (Fig. 1a).

Discussion
Nationwide serological surveillance in France measures the extent of the epidemic during a period when case-based surveillance prioritised assessment of symptomatic cases and testing capacity was limited. We show that following the first wave of the COVID-19 epidemic, seroprevalence remained low, with about 5% of the population having developed a detectable humoral response to the virus. This level is of the same order of magnitude as that found in studies carried out at comparable epidemic stages in Europe 4,12,13 . Estimates at multiple points of the French epidemic show a sharp increase between the first two collection periods, immediately preceding and during the generalised lockdown, followed by little progression at the final collection period just after lockdown ended. This confirms the substantial impact of the lockdown, which almost halted community transmission.
The overall IFR estimated from hospital deaths is in line with previous estimates 14,15 , but is considerably higher when deaths in nursing homes are accounted for, as found in other countries 16 . Biological analyses of the institutionalised elderly population in France are typically carried out in clinical laboratories, so this population should be represented in our sampling and seroprevalence estimates. As IFR is not determined solely by pathogenesis and may evolve as health systems improve their care strategies, it is essential to re-evaluate this metric as the epidemic progresses 17 .
One of the primary strengths of our study is the inclusion of individuals of all ages, notably children under 10 years old. Understanding how susceptible school-aged children are to infection remains particularly important given the continuing challenges facing public health decisions about school settings. Seroprevalence was lowest in primary school-aged children, suggesting limited susceptibility and/or transmissibility in this age group. This finding is compatible with a previous cohort study in France, which concluded that primary school-aged children were poor drivers of SARS-CoV-2 transmission amongst themselves or to teachers 18 .
As expected, regional results show significantly higher seroprevalence where circulation occurred earlier and was more intense, notably in Île-de-France and Grand-Est. A large religious gathering in early March in the Grand-Est region triggered intense regional circulation of the virus and was responsible for secondary cases throughout Metropolitan France and in French Guiana 19 . Estimates for other French regions confirm widespread but less intense SARS-CoV-2 circulation at the end of lockdown. To date, few seroprevalence studies have included detection of neutralising antibodies, which are theoretically correlated with protection 2,3,7,10 . Importantly, seroprevalence of neutralising antibodies has not previously been estimated at a nationwide scale. As of 17 May 2020, we find that ~70% of seropositive individuals had detectable pseudo-neutralising antibodies, with large variation across age categories and regions. Several studies have similarly reported that only a variable fraction of seropositive individuals had detectable levels of neutralising antibodies 2,3,7-9 . This finding could be explained by differences in antibody kinetics, with delayed appearance of neutralising antibodies 20 .
There are three additional factors that should be taken into account when interpreting our results. First, we set the positivity thresholds of our assays to achieve 100% specificity. While this rules out the risk of false positives, it could preclude the detection of the lowest antibody levels. In particular, our in-house tests were calibrated on a series of confirmed, hospitalised COVID-19 cases, which may have limited the assessment of sensitivity. As a result, and even though the model corrected for imperfect sensitivity, we may still be underestimating the proportion of individuals with mild or asymptomatic infections, who may develop a weaker or more short-lived humoral response [21][22][23] . Second, possible differential waning of antibody levels, affecting mainly anti-N and pseudo-neutralising antibodies, could also lead to underestimation of seroprevalence long after the epidemic waves, but this should be negligible within our relatively short surveillance period 20,24,25 . To facilitate the interpretation of SARS-CoV-2 seroprevalence as the pandemic progresses, longitudinal serological studies documenting symptoms and immune responses remain essential. Finally, the urgency of providing estimates of the infected population, as well as logistical constraints during the lockdown period, prevented the use of census or address-based sampling frames. Although the use of residual sera limits the risk of self-selection bias, it may introduce bias if individuals who required laboratory tests differ from the general population in their risk of infection. If the sampled individuals required routine monitoring for chronic health problems, they may have taken greater precautions and reduced their exposure to the virus, leading to underestimation of seroprevalence relative to the general population.
However, our estimates are comparable to those reported from serological studies conducted in large pre-existing representative cohorts in Île-de-France, Grand-Est and Nouvelle-Aquitaine 2 . Additionally, when comparing our regional estimates with COVID-19 mortality rates, a surveillance indicator with low susceptibility to reporting bias that should correlate with population exposure, we find a strong correlation, with French Guiana largely influencing the overall coefficient (Fig. 2b). This discrepancy between virus circulation and mortality rate in French Guiana seems to be explained by the age structure of its infected population, which is skewed towards younger ages 26 . These assessments against external data suggest that using residual sera can be a robust and cost-effective approach for serological surveillance.
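How strongly one region can drive a correlation coefficient, as noted above for French Guiana, can be checked with a leave-one-out analysis. The sketch below is illustrative only: the region labels and values are hypothetical rather than study data, and the use of Pearson's coefficient is our assumption, since the text does not name the correlation measure.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def leave_one_out(seroprev: Dict[str, float], mortality: Dict[str, float]) -> Dict[str, float]:
    """Correlation recomputed with each region dropped in turn."""
    regions = list(seroprev)
    out = {}
    for dropped in regions:
        keep = [r for r in regions if r != dropped]
        out[dropped] = pearson([seroprev[r] for r in keep], [mortality[r] for r in keep])
    return out


# Hypothetical data: region "E" has high seroprevalence but low mortality
# (e.g. a young infected population), so it dominates the overall coefficient.
sero = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 10.0}
mort = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 1.0}

print("all regions:", round(pearson(list(sero.values()), list(mort.values())), 3))
for region, r in leave_one_out(sero, mort).items():
    print(f"without {region}: {r:.3f}")
```

A region whose removal changes the coefficient far more than any other is the kind of influential point the text describes.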
The availability of residual sera made it possible to quickly implement sample collection early in the epidemic, providing a background seroprevalence estimate prior to the peak, and to observe epidemic dynamics throughout the first wave by including multiple collection periods. Our seroprevalence estimates, including the proportion of the population having produced pseudo-neutralising antibodies, confirmed that post-lockdown, the vast majority of the French population remained susceptible to SARS-CoV-2, even in regional hotspots. We find that a seroprevalence of at most 9% in certain regions yielded enough hospitalisations to overwhelm the healthcare system. Our results provide a critical understanding of the progression of the first epidemic wave and a framework to inform the ongoing public health response as viral transmission continues in France and globally. Serological surveillance based on residual sera will continue to be used to provide timely seroprevalence estimates as the pandemic evolves and through 2021 to monitor the progression of population-level immunity and guide the public health response.

Methods
Design and population. Serological surveillance used repeated cross-sectional sampling of residual sera obtained from biobanks of the two largest centralising laboratories in France, which cover all regions and account for ~80% of the market share in specialty clinical diagnostic testing according to the Autorité de la concurrence (French competition regulator) 27 . Residual sera included specimens from individuals of all ages undergoing routine diagnosis and monitoring in all medical specialties (such as biochemistry, immunology and allergy) except infectious diseases and obstetrics.
Sample selection and preparation. Specimens were collected over three 1-week periods: before (9–15 March 2020), during (6–12 April 2020) and after (11–17 May 2020) the nationwide lockdown. To obtain subgroup estimates with sufficient precision, we randomly sampled available sera at the biobanks. Sampling was stratified by sex, 10-year age group (0–9 years to ≥80 years) and region. Due to the limited number of sera available for the French overseas departments (Guadeloupe, Martinique, Mayotte, French Guiana, La Réunion), all available sera were included. Based on early modelling of the COVID-19 epidemic, which estimated an expected prevalence of 3% as of 28 March 2020, we calculated a target sample size of 3500 per collection period, with a margin of error of 0.55% 28 . After selection, blood samples were centrifuged, and sera were transferred into 96-well microplates and then frozen at −20 °C before transport.
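The link between expected prevalence, sample size and margin of error can be illustrated with the normal-approximation (Wald) formula for a proportion. This is a sketch under that assumption: the text does not state which formula, confidence level or design effect the authors used, and with z = 1.96 the Wald formula gives a margin of about 0.57% at n = 3500, close to but not exactly the 0.55% reported.

```python
import math


def wald_margin_of_error(prevalence: float, n: int, z: float = 1.96) -> float:
    """Half-width of a Wald confidence interval for a proportion (z = 1.96 for 95%)."""
    return z * math.sqrt(prevalence * (1.0 - prevalence) / n)


def required_sample_size(prevalence: float, margin: float, z: float = 1.96) -> int:
    """Smallest n whose Wald margin of error does not exceed `margin`."""
    return math.ceil(z ** 2 * prevalence * (1.0 - prevalence) / margin ** 2)


# Expected prevalence of 3% and target n = 3500, as stated in the text.
moe = wald_margin_of_error(0.03, 3500)
print(f"margin of error at n = 3500: {moe:.3%}")
print("n needed for a 0.55% margin:", required_sample_size(0.03, 0.0055))
```

Stratified sampling with unequal cell sizes would generally inflate the variance relative to this simple-random-sampling formula, so the figure is best read as an approximate design target.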
SARS-CoV-2 antibody testing. All serological analyses were conducted with the National Reference Centre for Respiratory Infection Viruses including Influenza at the Institut Pasteur in Paris. Three novel serological assays were developed: two Luciferase-Linked ImmunoSorbent Assays (LuLISAs), detecting the nucleoprotein (LuLISA N) and spike (LuLISA S) protein of SARS-CoV-2, respectively, and a pseudo-neutralisation assay (PNT) 20,29 . The two LuLISA assays have a wide dynamic range (4 log) and a high throughput capacity (2300 assays/h) 30 . In LuLISA, all four anti-N or anti-S IgG subclasses are detected using a single alpaca anti-human IgG VhH (single variable heavy chain antibody domain), consisting of an IgG-binding moiety directed against the Fc domain of human IgG. This VhH is expressed in fusion with the NanoKAZ luciferase, whose bioluminescent activity is measured. The full description of the in-house anti-hIgG VhH is provided in Anna et al. 20 . Serum samples are considered positive when the relative light units per second (RLU/s) value is above the threshold determined for each of the LuLISA IgG/N and IgG/S assays from a pre-pandemic serum collection. The PNT mimics the SARS-CoV-2 entry step in HEK 293T cells stably expressing the human SARS-CoV-2 spike receptor ACE2 on their surface. It uses a lentiviral vector pseudotyped with the SARS-CoV-2 spike protein, which enters cells in an ACE2-dependent manner and then expresses a firefly luciferase reporter. When serum neutralising antibodies block spike-mediated lentiviral entry, the bioluminescence signal, expressed as RLU/s, is reduced. This test makes it possible to estimate the prevalence of potentially neutralising anti-S antibodies, although the effective level of protection conferred by neutralising antibodies remains unclear.
Assay calibration. Individual test characteristics were assessed using sets of pre-pandemic sera collected before 04/09/2019 from healthy individuals in the collection of the ICAReB biobanking platform at Institut Pasteur, and sera from hospitalised cases of COVID-19 confirmed by RT-PCR, with mostly moderate illness, sampled between 8 and 36 (median = 16) days after symptom onset (Fig. 2a).
For LuLISA, serum samples are considered positive when the RLU/s value is above the threshold determined for each of the LuLISA IgG/N and IgG/S assays from a pre-pandemic serum collection. For PNT, samples are considered positive when their values fall below a threshold set as the mean minus three standard deviations of values measured on a collection of pre-pandemic sera, assuming a normal distribution (Shapiro–Wilk normality test W = 0.9943, p = ns). This threshold discriminates sera with significant anti-SARS-CoV-2 neutralising activity from those of naïve individuals with a 99% confidence index, ensuring 100% specificity on pre-pandemic sera.
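The PNT positivity rule can be sketched as follows. The signal values are hypothetical and on an arbitrary RLU/s scale, and the use of the sample standard deviation (n − 1 denominator) is our assumption, as the text does not specify it. Note the direction of the rule: for PNT a low signal indicates neutralisation, whereas for LuLISA a high signal indicates antibody binding.

```python
import statistics
from typing import Sequence


def pnt_threshold(prepandemic_signal: Sequence[float], k: float = 3.0) -> float:
    """Cut-off = mean - k * SD of pre-pandemic PNT signals (normality assumed)."""
    mu = statistics.mean(prepandemic_signal)
    sd = statistics.stdev(prepandemic_signal)  # sample standard deviation
    return mu - k * sd


def pnt_positive(signal: float, threshold: float) -> bool:
    """Neutralising antibodies block pseudovirus entry, so LOW luminescence is positive."""
    return signal < threshold


def in_sample_specificity(prepandemic_signal: Sequence[float], threshold: float) -> float:
    """Fraction of pre-pandemic (true negative) sera classified as negative."""
    negatives = sum(not pnt_positive(s, threshold) for s in prepandemic_signal)
    return negatives / len(prepandemic_signal)


# Hypothetical pre-pandemic signals, not data from the study.
prepandemic = [100.0, 110.0, 90.0, 105.0, 95.0]
cut = pnt_threshold(prepandemic)
print(f"threshold: {cut:.1f}")
print("in-sample specificity:", in_sample_specificity(prepandemic, cut))
print("serum at 50 RLU/s positive?", pnt_positive(50.0, cut))
```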
In a context of low expected prevalence of infection, we set the thresholds defining a positive test result so as to obtain an in-sample empirical specificity of 100%, reducing the risk of false positives. This led to suboptimal sensitivities for each individual testing method, ranging from 85 to 96% (Supplementary Table 1). Since individuals exposed to SARS-CoV-2 do not all mount the same type of immune response, the results of three different but complementary serological tests provided a more precise assessment of population exposure to the virus. We defined seroprevalence as the proportion of individuals who tested positive for SARS-CoV-2 antibodies on at least one of the three tests. This combination led to perfect classification of our set of reference samples (223 pre-pandemic subjects and 45 hospitalised confirmed cases of COVID-19) (Supplementary Table 1).
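The "at least one positive" rule trades specificity for sensitivity: a case missed by one assay can still be detected by another, but any false positive in any assay becomes a false positive of the combination, which is why each individual threshold was set for 100% in-sample specificity. A minimal sketch on a hypothetical reference panel (not the study's 223 controls and 45 cases):

```python
from typing import Callable, List, Tuple

AssayResults = Tuple[bool, bool, bool]  # (LuLISA N, LuLISA S, PNT) positivity


def combined_positive(r: AssayResults) -> bool:
    """Seropositive if at least one of the three assays is positive."""
    return any(r)


def se_sp(rule: Callable[[AssayResults], bool],
          cases: List[AssayResults],
          controls: List[AssayResults]) -> Tuple[float, float]:
    """Empirical sensitivity on confirmed cases and specificity on pre-pandemic sera."""
    se = sum(rule(r) for r in cases) / len(cases)
    sp = sum(not rule(r) for r in controls) / len(controls)
    return se, sp


# Hypothetical panel: each assay misses one case, but no case is missed by all three.
cases = [(True, True, False), (False, True, True), (True, False, True), (True, True, True)]
controls = [(False, False, False)] * 5

for name, idx in [("LuLISA N", 0), ("LuLISA S", 1), ("PNT", 2)]:
    print(name, se_sp(lambda r, i=idx: r[i], cases, controls))
print("any positive", se_sp(combined_positive, cases, controls))
```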
Overview of statistical methods. Our aim is to infer the probability of SARS-CoV-2 seropositivity in the population using (1) tests results from three serological assays in specimens sampled from the population, (2) assay properties from the calibration study on known control (pre-pandemic) and case specimens and (3) post-stratification variables to account for demographics and geographic differences between the sample and population structure.
We infer seroprevalence in a Bayesian framework by fitting a generalised linear mixed model of seropositivity with sex, age, region and collection period as predictors 31 . We then compute the fraction of infections reported as cases, the infection hospitalisation rate (IHR) and the infection fatality rate (IFR) per 100 infections using national surveillance data.
Datasets. Our study data consist of three sets. The first contains serological results for n patients along with their sex (2 levels), age class (9), region (17) and collection period (3). The second contains, for the three assays, the number of pre-pandemic samples tested, $N_{pp}$, of which $TN$ have true negative results, and the number of samples from confirmed cases, $N_{cc}$, of which $TP$ have true positive results. Finally, we use population counts by sex, age class and region, defining 306 (2 × 9 × 17) poststratification cells for each collection period 32 .
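Treating the calibration counts as binomial, TN ~ Binomial(N_pp, sp) and TP ~ Binomial(N_cc, se), shows that perfect classification of a finite reference panel does not imply that sensitivity and specificity equal 1. The sketch below assumes a uniform Beta(1, 1) prior for both quantities, which is our assumption since the priors are not given in this text, and uses the conjugate Beta posterior with Monte Carlo percentiles.

```python
import random
from typing import Tuple


def beta_posterior(successes: int, trials: int,
                   a0: float = 1.0, b0: float = 1.0) -> Tuple[float, float]:
    """Conjugate Beta(a, b) posterior for a binomial proportion under a Beta(a0, b0) prior."""
    return a0 + successes, b0 + (trials - successes)


def summarise_beta(a: float, b: float, draws: int = 100_000, seed: int = 1):
    """Exact posterior mean and Monte Carlo 2.5th/97.5th percentiles."""
    rng = random.Random(seed)
    xs = sorted(rng.betavariate(a, b) for _ in range(draws))
    return a / (a + b), xs[int(0.025 * draws)], xs[int(0.975 * draws)]


# Combined-test calibration counts from the text: all 223 pre-pandemic sera
# negative (TN = N_pp) and all 45 confirmed cases positive (TP = N_cc).
sp_a, sp_b = beta_posterior(successes=223, trials=223)
se_a, se_b = beta_posterior(successes=45, trials=45)
print("specificity: mean %.4f, 95%% CrI %.4f-%.4f" % summarise_beta(sp_a, sp_b))
print("sensitivity: mean %.4f, 95%% CrI %.4f-%.4f" % summarise_beta(se_a, se_b))
```

With only 45 confirmed cases, the lower bound for sensitivity stays well below 1, which is the uncertainty the model propagates into the seroprevalence estimates.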
Modelling seroprevalence. First, we assume that the three serological assays performed on all specimens provide three complementary markers indicative of infection. We therefore consider a test $t$ combining the three results, whereby a specimen with any positive result among the three is deemed positive, i.e. has a binary response $y_{ti} = 1$, and specimens negative on all three assays are classified as $y_{ti} = 0$. We assessed the sensitivity $se_t$ and specificity $sp_t$ of this combined test. Letting $p_t$ denote the probability of a positive result for test $t$, test results are modelled as a Bernoulli process:

$$y_t \sim \mathrm{Bernoulli}(p_t).$$

Actual seroprevalence is derived from the frequency of positive tests, using estimates and associated uncertainties for sensitivity and specificity obtained from the calibration study. Accounting for test performance, $p_t$ is related to the prevalence of SARS-CoV-2 antibodies $\pi$ by 33 :

$$p_t = se_t\,\pi + (1 - sp_t)(1 - \pi).$$

Sensitivity and specificity are defined by the following binomial processes:

$$TN_t \sim \mathrm{Binomial}(N_{pp}, sp_t), \qquad TP_t \sim \mathrm{Binomial}(N_{cc}, se_t),$$

with subscripts pp for pre-pandemic and cc for confirmed cases. We derived seroprevalence from regression coefficients estimated from:

$$\pi = \mathrm{logit}^{-1}\left(\beta X + \alpha_{\mathrm{age}}\sigma_{\mathrm{age}} + \alpha_{\mathrm{region}}\sigma_{\mathrm{region}} + \alpha_{\mathrm{period}}\sigma_{\mathrm{period}}\right),$$

where $\beta$ comprises the fixed overall intercept and the coefficient for sex, with prior $\beta \sim \mathcal{N}(0, 1)$, and $\alpha_*$, with $* \in \{\mathrm{age}, \mathrm{region}, \mathrm{period}\}$, are varying intercepts with hierarchical hyperpriors. We use the resulting probabilities of seropositivity $\pi_j$ in each stratum $j$ to derive poststratified estimates for the total population or for a subgroup $S$:

$$\pi_S = \frac{\sum_{j \in S} N_j\,\pi_j}{\sum_{j \in S} N_j},$$

using national census population counts $N_j$ stratified by sex, 10-year age bands and region 32 .
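The forward pieces of this model, the inverse-logit link, the relation between test positivity and true prevalence, and census-weighted poststratification, can be sketched in a few lines. This is not the authors' Stan model: the linear predictors and census counts below are hypothetical, and the Rogan–Gladen inversion is included only as the classical point-estimate analogue of the test-performance correction, not as the Bayesian fit used in the study.

```python
import math
from typing import Dict, Hashable


def inv_logit(x: float) -> float:
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-x))


def prob_test_positive(pi: float, se: float, sp: float) -> float:
    """Probability of a positive combined test given true seroprevalence pi."""
    return se * pi + (1.0 - sp) * (1.0 - pi)


def rogan_gladen(p_obs: float, se: float, sp: float) -> float:
    """Classical inversion of the relation above (point estimate, no uncertainty)."""
    return (p_obs + sp - 1.0) / (se + sp - 1.0)


def poststratify(pi_by_cell: Dict[Hashable, float], n_by_cell: Dict[Hashable, int]) -> float:
    """Census-weighted average of stratum-level seroprevalence over the given cells."""
    total = sum(n_by_cell[c] for c in pi_by_cell)
    return sum(n_by_cell[c] * pi_by_cell[c] for c in pi_by_cell) / total


# Hypothetical linear predictors for two cells; in the model each would be
# beta*X + alpha_age*sigma_age + alpha_region*sigma_region + alpha_period*sigma_period.
eta = {("F", "20-29"): -3.0, ("M", "80+"): -3.5}
pi = {cell: inv_logit(e) for cell, e in eta.items()}
census = {("F", "20-29"): 4_000_000, ("M", "80+"): 1_500_000}  # hypothetical counts

print("poststratified prevalence:", round(poststratify(pi, census), 4))
p = prob_test_positive(0.05, se=0.95, sp=0.995)
print("expected positive rate:", round(p, 5), "-> recovered pi:", round(rogan_gladen(p, 0.95, 0.995), 5))
```

In the full Bayesian model, these calculations would be repeated for every posterior draw, so the resulting poststratified estimates carry the uncertainty from both the regression coefficients and the test characteristics.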
Using posterior estimation of regression coefficients, we calculate the risk of having been infected relative to a reference category for each predictor.
The model is specified using RStan 2.21.2 34 and all data processing uses R 3.6.2 35 . Code is publicly available at https://github.com/slevu/serpico2. Estimates are reported as the mean of the posterior probability distributions over 10,000 iterations, with credible intervals given by the 2.5th and 97.5th percentiles.
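Relative risks and their credible intervals can be computed draw by draw from the posterior, then summarised by the posterior mean and the 2.5th and 97.5th percentiles as described above. The sketch below is an assumption about how such a ratio might be formed (ratio of inverse-logit prevalences for a group versus its reference category); the "posterior draws" are simulated from invented normal distributions standing in for Stan output.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def inv_logit(x: float) -> float:
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-x))


def relative_risk_draws(eta_group: Sequence[float], eta_ref: Sequence[float]) -> List[float]:
    """Per-draw ratio of seroprevalence in a group to that in the reference category."""
    return [inv_logit(g) / inv_logit(r) for g, r in zip(eta_group, eta_ref)]


def summarise(draws: Sequence[float]) -> Tuple[float, float, float]:
    """Posterior mean with 2.5th and 97.5th percentiles (95% credible interval)."""
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points at 2.5% steps
    return statistics.fmean(draws), cuts[0], cuts[-1]


# Hypothetical posterior draws of the linear predictor for two collection
# periods; means and SDs are invented for illustration only.
rng = random.Random(42)
eta_april = [rng.gauss(-3.2, 0.05) for _ in range(10_000)]
eta_march = [rng.gauss(-5.5, 0.30) for _ in range(10_000)]
mean, lo, hi = summarise(relative_risk_draws(eta_april, eta_march))
print(f"RR (April vs March): {mean:.1f} (95% CrI {lo:.1f}-{hi:.1f})")
```

Computing the ratio within each draw, rather than dividing two summarised means, keeps the correlation between parameters and gives a properly propagated credible interval.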
Fraction of reported infections. Using seroprevalence estimates, we first infer the cumulative number of infected individuals, placing their exposure 20 days before the sampling dates. We consider a mean incubation period of 5 days 36 and a mean delay between symptom onset (if any) and detectable seropositivity of 15 days 29 . We quantify the observable fraction of the infected population from national surveillance as the ratio of documented confirmed cases to estimated infected individuals, accounting for a reporting delay of 10 days 37,38 . The total number of confirmed cases per day was obtained from Etalab (https://dashboard.covid19.data.gouv.fr/) 39 .
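The date arithmetic behind this ratio can be sketched as below. Two points are our reading rather than stated facts: that confirmed cases are cumulated up to the exposure date plus the 10-day reporting delay, and that each collection period is represented by a single sampling date (for example its midpoint). The population and case counts are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Tuple

INCUBATION_DAYS = 5          # mean incubation period (from the text)
ONSET_TO_SEROPOSITIVE = 15   # mean delay from symptom onset to detectable antibodies (from the text)
REPORTING_DELAY_DAYS = 10    # delay applied to confirmed-case reports (from the text)


def exposure_date(sampling_date: date) -> date:
    """Infections detected at sampling are placed 5 + 15 = 20 days earlier."""
    return sampling_date - timedelta(days=INCUBATION_DAYS + ONSET_TO_SEROPOSITIVE)


def ascertained_fraction(seroprevalence: float, population: int,
                         cumulative_cases: Dict[date, int],
                         sampling_date: date) -> Tuple[float, date]:
    """Ratio of cumulative confirmed cases (delay-adjusted) to estimated infections."""
    report_date = exposure_date(sampling_date) + timedelta(days=REPORTING_DELAY_DAYS)
    infected = seroprevalence * population
    return cumulative_cases[report_date] / infected, report_date


# Hypothetical inputs: toy population, toy cumulative case count, mid-May sampling date.
frac, used = ascertained_fraction(0.05, 1_000_000, {date(2020, 5, 4): 5_000}, date(2020, 5, 14))
print(f"cases counted up to {used}: {frac:.1%} of estimated infections")
```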
Infection fatality and infection hospitalisation rates. We use the number of deaths recorded in hospitals, stratified by age and region 40 , and overall deaths in nursing homes (obtained from national surveillance 37 ) to derive the IFR by age. The age distribution of deaths in nursing homes during the first epidemic wave was obtained separately from a sample of 312 facilities. Deaths were dated assuming a time lag of 20 days from infection to death 14,38 . Hospital admission data were obtained from national surveillance 37 , assuming a delay of 10 days from infection to hospitalisation 38 .
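The age-specific IFR combines hospital deaths with nursing-home deaths allocated by age, divided by the estimated number of infections. The following is a minimal sketch with hypothetical counts, seroprevalence and population sizes. Allocating the nursing-home total proportionally to the sampled age distribution is our assumption about how the 312-facility sample is used. The same rate function gives the IHR when hospital admissions are passed instead of deaths.

```python
from typing import Dict


def deaths_by_age(hospital: Dict[str, int], nursing_home_total: int,
                  nursing_home_age_share: Dict[str, float]) -> Dict[str, float]:
    """Hospital deaths plus nursing-home deaths allocated by a sampled age distribution."""
    ages = set(hospital) | set(nursing_home_age_share)
    return {age: hospital.get(age, 0) + nursing_home_total * nursing_home_age_share.get(age, 0.0)
            for age in ages}


def rate_per_100_infections(events: float, seroprevalence: float, population: int) -> float:
    """Events (deaths or admissions) per 100 estimated infections, i.e. IFR or IHR in %."""
    return 100.0 * events / (seroprevalence * population)


# Hypothetical inputs. Deaths would be counted 20 days after the exposure date
# and hospital admissions 10 days after it, as described in the text.
hospital_deaths = {"60-69": 300, "70-79": 600, "80+": 900}
nh_share = {"60-69": 0.05, "70-79": 0.15, "80+": 0.80}
deaths = deaths_by_age(hospital_deaths, nursing_home_total=1_000, nursing_home_age_share=nh_share)
seroprev = {"60-69": 0.04, "70-79": 0.035, "80+": 0.03}
population = {"60-69": 8_000_000, "70-79": 5_500_000, "80+": 4_000_000}

for age in sorted(deaths):
    ifr = rate_per_100_infections(deaths[age], seroprev[age], population[age])
    print(f"{age}: IFR {ifr:.2f}%")
```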
Ethical considerations. Authorisation for conservation and preparation of elements of the human body for scientific use was granted to the two biobanks by the bioethics committee of the General Board for Research and Innovation (DGRI) of the French Ministry of Higher Education and Research (approvals Nos. AC-2015-2418 and AC-2018-3329). Information regarding secondary use of de-identified residual sera for approved research studies was systematically displayed and orally communicated at the primary clinical laboratories. The Ethics Committee (Comité de Protection des Personnes Ile-de-France VI, CHU Pitié-Salpêtrière Hospital, Paris, France) waived the need for ethical approval for the collection, analysis and publication of the retrospectively obtained and anonymised specimens and data for this study. This work was carried out following the regulations of the French Public Health Code (articles L. 1413-7 and L. 1413-8) and the French Commission for Data Protection (CNIL).
Reporting summary. Further information on experimental design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
All data are present in the article and its Supplementary Information files or are available upon reasonable request from the corresponding author, although requests for data might require partial aggregation or downsampling to protect patient privacy. Source data are provided with this paper.