Surveillance and detection aim to rapidly identify and isolate cases to prevent onward transmission of SARS-CoV-2 in the community and to avoid a substantial resurgence of cases of COVID-19. After an initial period in which limited capacity meant that testing for SARS-CoV-2 infections focused mainly on severely ill patients, a new testing policy was implemented in France to systematically screen for potential infections with SARS-CoV-2 and enable lifting of the lockdown restrictions on 11 May 2020 (ref. 8).

The specific characteristics of COVID-19, however, hinder the identification of cases9,10,11. Large proportions of asymptomatic infectious individuals12, and the presence of mild or paucisymptomatic infections that easily go unobserved9,11, present serious challenges to the detection and control of SARS-CoV-2 (refs. 9, 10, 13). Missing a substantial portion of infectious individuals compromises the control effort, enabling the virus to silently spread10,11,12. Synthesizing evidence from virological3 and participatory syndromic surveillance4 with mathematical models2,14 that account for behavioural data15,16,17,18, we assessed the performance of the new testing policy in France and identified its main limitations to guide actionable improvements.

COVID-19 surveillance

Management of the COVID-19 pandemic in France after lockdown in spring (May–June) 2020 involved the generation of a centralized database that collected all data on virological testing (SI-DEP3, the information system for testing). All individuals with symptoms that were compatible with COVID-19 (ref. 19) were invited to consult their general practitioner and obtain a prescription for a virological test8. Contacts of confirmed cases were traced and tested. A total of 20,777 virologically confirmed cases were notified from 13 May (week 20) to 28 June (week 26) in mainland France. These cases included individuals who tested positive for SARS-CoV-2 with or without symptoms at the time of testing, as well as individuals who tested positive but for whom information on clinical status at the time of testing was missing (Extended Data Fig. 1). Accounting for presymptomatic individuals among those presenting with no symptoms at the time of testing and after imputation of missing data (Methods), an estimated 16,165 (95% confidence interval, 16,101–16,261) symptomatic cases were tested in the study period (Fig. 1a). The average delay from symptom onset to testing decreased from 12.5 days in week 20 to 2.8 days in week 26 (Fig. 1b and Extended Data Fig. 1). Accounting for this delay (Methods and Extended Data Fig. 2), we estimated that 14,061 (13,972–14,156) virologically confirmed symptomatic cases had an onset of symptoms in the study period, showing a decreasing trend over time (2,493 in week 20, 1,647 in week 26). The test positivity rate decreased in the first weeks and stabilized at around 1.2% (mean over weeks 24–26).

Fig. 1: Virological surveillance, participatory syndromic surveillance and behavioural data for model parameterization.

a, Estimated number of virologically confirmed symptomatic cases in mainland France by week of testing and week of onset (bar graphs), and test positivity rate (line graphs). Estimates are based on the imputation of individuals without symptoms who tested positive at the time of testing into asymptomatic or presymptomatic; imputation of missing data on clinical status at the time of testing into asymptomatic, presymptomatic or symptomatic; and imputation of the date of onset of symptoms for presymptomatic and symptomatic cases (Methods). Imputations were performed n = 100 times. Uncertainties (black bars) correspond to the 95% confidence intervals. Test positivity rates were computed for cases with complete information. Data for weeks 20–26 were consolidated in week 30. b, Breakdown of virologically confirmed cases with symptoms and complete information in the SI-DEP database by week of testing according to the declared onset of symptoms (left y axis; n = 5,514). The estimated time from onset to testing is also shown (right y axis; median and 95% confidence interval obtained from n = 100 imputations of the onset date). c, Weekly incidence of suspected cases of COVID-19 (median (dashed line), 95% confidence interval (shaded area) and 3-week moving average (solid line)), and percentage of individuals seeking healthcare (median and 95% confidence interval), estimated from the participatory surveillance system, (average weekly n = 7,481). d, The number of suspected cases of COVID-19 in the participatory cohort who sought healthcare, and among those individuals, the number of individuals who received a prescription and performed a virological test when given the prescription. e, Estimated change in presence at workplace locations over time and by region based on Google location history data17. Region acronyms are listed in Table 1. 
f, Percentage of individuals avoiding physical contact with respect to lockdown, estimated from a large-scale survey conducted by Santé publique France18.


A digital participatory system was additionally considered for COVID-19 syndromic surveillance in the general population20, including those who did not consult a doctor. It was adapted from the GrippeNet.fr platform (which is dedicated to the surveillance of influenza-like illnesses4) to respond to the COVID-19 health crisis in early 2020. It relies on volunteers who self-declare their symptoms each week, along with sociodemographic information. On the basis of symptoms declared by an average of 7,500 participants each week, the estimated incidence of suspected cases of COVID-19 (ref. 19) decreased from about 1% to 0.8% over time (Fig. 1c). Of 524 suspected cases, 162 (31%) consulted a doctor in the study period. Among them, 89 (55%) received a prescription for a test, resulting in the screening of 50 individuals (56% of those given the prescription) (Fig. 1d).
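
The healthcare-seeking cascade reported above can be checked with a few lines of arithmetic. The sketch below uses only the four counts quoted in the text; the function name `cascade` and the step labels are illustrative, not part of the study.

```python
def cascade(counts):
    """Share of individuals retained at each step, relative to the previous step."""
    labels = list(counts)
    return {cur: counts[cur] / counts[prev] for prev, cur in zip(labels, labels[1:])}


# Counts quoted in the text for the participatory cohort during the study period.
counts = {"suspected": 524, "consulted": 162, "prescribed": 89, "tested": 50}

shares = cascade(counts)
for step, share in shares.items():
    print(f"{step}: {share:.0%} of previous step")

# Share of all suspected cases that ended up being tested (not stated in the text).
overall = counts["tested"] / counts["suspected"]
print(f"tested among all suspected cases: {overall:.1%}")
```

Multiplying the three step-wise shares shows that fewer than one in ten suspected cases in the cohort completed the full path from symptoms to a virological test.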

COVID-19 pandemic trajectories and detection rates

We used stochastic discrete age-stratified epidemic models2,14 based on demography, age profile21 and social contact data15 of the 12 regions of mainland France to account for age-specific contact activity and role in COVID-19 transmission. Disease progression is specific to COVID-19 (refs. 2, 14) and parameterized using current knowledge to include presymptomatic transmission22, and asymptomatic12 and symptomatic infections with different degrees of severity9,11,23,24. The model was shown to capture the transmission dynamics of the pandemic in Île-de-France in the first wave and was used to assess the effect of lockdown and exit strategies2,14. Full details are reported in the Methods.

Intervention measures were modelled as mechanistic modifications of the contact matrices, accounting for a reduction in the number of contacts engaged in specific settings, and were informed by empirical data. Lockdown data were obtained from previously published studies2,14. The exit phase was modelled considering region-specific data of school attendance based on the data from the Ministry of Education16, partial presence at workplaces based on estimates from location history data of mobile phones17 (Fig. 1e), a reduction in the adoption of physical distancing over time and the increased risk aversion of older individuals based on survey data18 (Fig. 1f), and the partial reopening of activities. A sensitivity analysis was performed on the reopening of activities, as data were missing for an accurate parameterization of associated contacts. Testing and isolation of detected cases were implemented by considering a 90% reduction in contacts for the virologically confirmed cases of COVID-19 (refs. 2, 14). Region-specific models were fitted to regional hospital admission data (Fig. 2) using a maximum likelihood approach. Further details are reported in the Methods and Supplementary Information.

Fig. 2: Hospital admissions and number of new symptomatic cases.

a–c, Hospital admissions over time; data (points) and simulations (median and 95% confidence intervals) for Île-de-France (a), Pays de la Loire (b) and Normandy (c). Hospital admission data up to week 27 (consolidated in week 28) were used to infer parameter values. d–f, Projected number of new symptomatic cases over time (median and 95% confidence interval) and estimated number of virologically confirmed symptomatic cases by week of onset (points), for the same regions (Île-de-France (d), Pays de la Loire (e) and Normandy (f)) (left y axis). The estimated detection probability of symptomatic cases (%) is also shown (red points, median and 95% confidence interval; right y axis). In all panels, 95% confidence intervals were obtained from n = 500 independent stochastic runs. Plots for the remaining regions are shown in Extended Data Fig. 3.


The projected number of cases decreased over time in all regions, in agreement with the decreasing tendency reported in hospital admissions during the study period (Fig. 2 and Extended Data Fig. 3). Overall, 103,907 (95% confidence interval, 90,216–116,377) new symptomatic infections were predicted in mainland France in weeks 20–26 (from 35,704 (30,290–40,748) in week 20 to 4,319 (3,773–4,760) in week 26). Île-de-France was the region with the largest predicted number of symptomatic cases (from 12,427 (8,104–14,136) to 1,704 (1,258–2,004) from week 20 to week 26), followed by Grand Est and Hauts-de-France (Table 1 and Extended Data Table 1).

Table 1 Population, confirmed and projected symptomatic cases, estimated detection rate and trends

Projections were substantially higher than the number of virologically confirmed cases (Figs. 2, 3). The estimated detection rate for symptomatic infections in mainland France in the period of weeks 20–26 was 14% (12–16%), suggesting that about 9 out of 10 new cases with symptoms were not identified by the surveillance system. A lower detection rate was found for asymptomatic infections (Extended Data Fig. 5). The estimated detection rate increased over time (7% (6–8%) in week 20, 38% (35–44%) in week 26) (Table 1). By the end of June, five regions had a median detection rate above 50%, and six regions had a detection rate within the confidence interval of model projections (Fig. 3b–d). All regions except Brittany displayed average increasing trends in the estimated detection rate in June compared with May. We did not find any significant associations between the detection rate and the number of detected cases, or the test positivity rate (Extended Data Fig. 4). However, the detection rate was negatively associated with model-predicted incidence (Spearman correlation, r = −0.75, $P < 10^{-15}$) (Fig. 3f). In addition, the data followed a power-law function, $\pi = 66 \times i^{-0.51}$, where π is the weekly detection rate of symptomatic cases (expressed as a percentage) and i the projected weekly incidence (number of cases per 100,000). This function quantifies the relationship between the detection capacity of the test–trace–isolate system and the circulation of the virus in the population. It clearly shows that the detection capacity rapidly decreases as the incidence of COVID-19 increases.
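
As a rough consistency check, the reported detection rates can be approximated by dividing the confirmed symptomatic cases (by week of onset) by the median projected symptomatic cases quoted in the text. This is only a sketch: the study derives medians and intervals from 500 stochastic runs, and a ratio of point values need not equal the median of the per-run ratios. The function name is illustrative.

```python
def detection_rate_percent(confirmed, projected):
    """Detection rate of symptomatic cases as a percentage of projected cases."""
    return 100.0 * confirmed / projected


# Point values quoted in the text (confirmed by week of onset, projected medians).
week_20 = detection_rate_percent(2_493, 35_704)
week_26 = detection_rate_percent(1_647, 4_319)
weeks_20_to_26 = detection_rate_percent(14_061, 103_907)

print(f"week 20: {week_20:.1f}%")
print(f"week 26: {week_26:.1f}%")
print(f"weeks 20-26: {weeks_20_to_26:.1f}%")
```

The three ratios fall inside the reported intervals (6–8%, 35–44% and 12–16%, respectively).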

Fig. 3: Detection rate and incidence.

a, Projected number of new symptomatic cases over time (median and 95% confidence interval) and estimated number of virologically confirmed symptomatic cases by week of onset (points) in mainland France (left y axis). The estimated detection rate of symptomatic cases (%) is also shown (red points, median and 95% confidence interval; right y axis). b, Estimated detection rate of symptomatic cases (%) and 95% confidence intervals over time for mainland France (red dots and bars), and for all regions (grey lines, only median values are shown for visualization). c, Map of the estimated detection rate (%) by region in week 26 (22–28 June 2020). d, Estimated detection rate per region compared to the national estimate. Regions are ranked by increasing median detection rate. Box plots represent the median (line in the middle of the box), interquartile range (box limits) and 2.5th and 97.5th percentiles (whiskers). e, Predicted percentage of the population infected (median and 95% confidence interval) compared with estimates from the serological study EpiCov26 performed on a representative sample of the population in mainland France. f, Estimated detection rate of symptomatic cases (%) by region and by week compared with the projected incidence by region and by week. The curve shows the result of a least-square fit to the data with a power-law function, $\pi = a \times i^{-b}$, where π is the detection rate (expressed as a percentage), i is the weekly incidence (cases per 100,000), a = 66 (95% confidence interval, 52–85) and b = 0.51 (0.41–0.60). g, Estimated incidence of symptomatic cases and 95% confidence intervals in mainland France in week 26 from different sources: virological surveillance data (SI-DEP), participatory surveillance data (two estimates) and model projections. h, Projected incidence per region compared to the national estimate. Regions are ranked as in d. Box plots are as defined in d. 
In all panels, medians and 95% confidence intervals for model projections were obtained from n = 500 independent stochastic runs.


Validation of the model was performed in two ways. First, we compared our model projections of the percentage of the population infected with the results of three independent seroprevalence studies performed after the first wave in France7,25,26 (Methods). Modelling results are in agreement with serological estimates at the national and regional level (Fig. 3e and Extended Data Fig. 6). Second, we compared the projected incidence of symptomatic cases of COVID-19 in week 26 (6.69 (5.84–7.37) cases per 100,000) with the value obtained from the number of virologically confirmed cases (2.55 (2.48–2.61) cases per 100,000) and two estimates based on participatory surveillance data (Fig. 3g). The first estimate applies the measured test positivity rate to the incidence of self-reported suspected cases of COVID-19 (estimate 1, which yielded 8.6 (95% confidence interval, 6.2–11.5) cases per 100,000); the second additionally assumes that only 55% would be confirmed as a suspected case by a physician and prescribed a test (according to participatory surveillance data; estimate 2, which yielded 4.7 (3.4–6.3) cases per 100,000). Our projections are in line with plausible estimates from participatory surveillance, and suggest that, on average, at least 80% of suspected cases should be tested to reach the predicted incidence.

Sensitivity analysis showed that the findings were robust to elements of the contact matrices that could not be informed by empirical data (Supplementary Figs. 8, 9). Furthermore, a model selection analysis showed that changes in contact patterns over time due to restrictions and the activities of individuals of different age classes after lockdown (for example, partial attendance at school and remote working) are needed to accurately capture the transmission dynamics (Supplementary Table 2 and Supplementary Fig. 5).


Despite a test positivity rate in mainland France well below the recommendations (5%) of the WHO5, a substantial proportion of symptomatic cases (9 out of 10) remained undetected in the first 7 weeks after lockdown.

Low detection rates in mid-May were in line with estimates for the same period from a seroprevalence study in Switzerland27. Surveillance improved substantially over time, leading to half of the French regions reporting numbers of cases that were compatible with model projections. The framework progressively strengthened with increasing resources over time, as shown by a more-rapid detection of cases (78% reduction in the average delay from symptom onset to testing from May to June). At the same time, the system benefited from a substantial and concurrent decrease in epidemic activity in all regions.

Despite this positive trend, our findings highlight structural limitations and a critical need for improvement. Some areas remained with limited diagnostic exhaustiveness. This is particularly concerning in those regions that were predicted to have large numbers of weekly infections (Île-de-France, in which only one out of three symptomatic cases was detected by the end of June, and Grand Est, in which one out of five was detected). Almost all patients (92%) who were clinically diagnosed by sentinel general practitioners as suspected cases of COVID-19 were prescribed a test20. However, only 31% of individuals with COVID-19-like symptoms consulted a doctor according to participatory surveillance data. Overall, these figures suggest that a large number of symptomatic cases of COVID-19 were not screened because they did not seek medical advice despite the recommendations. This was confirmed by serological studies. In France, only 48% of symptomatic participants with antibodies against SARS-CoV-2 reported consulting a general practitioner7; in Spain, between 16% and 20% of individuals with antibodies against SARS-CoV-2 reported a previous virological screening6. By combining estimates from virological and participatory surveillance data, we extrapolated an incidence rate from crowdsourced data that is compatible with model projections, under the hypothesis that the large majority of suspected cases would get tested (>80%). This finding further supports testing of all suspected cases of COVID-19. Large-scale communication campaigns should reinforce recommendations to raise awareness in the population and strongly encourage healthcare-seeking behaviour especially in patients with mild symptoms. At the same time, investigations to identify reasons for not consulting a doctor could be quickly performed through the participatory surveillance system.

Red tape might have contributed to low testing rates. Prescription of a test was deemed compulsory in the new testing policy to prevent misuse of diagnostic resources8; however, this involved consultation, prescription and a laboratory appointment, which may have discouraged mildly affected individuals who do not require medical assistance. To facilitate access, testing should not require a prescription, as later established by authorities28. Some local initiatives emerged over summer that increased the number of drive-through testing facilities, promoted massive screening in certain areas and offered mobile testing facilities to increase proximity to the population29. The use of antigen tests will further facilitate access. These initiatives are particularly relevant to counteract socioeconomic inequalities in access to care in populations that are vulnerable to COVID-1930,31. However, such strategies should not hinder a testing protocol that targets suspected index cases. Our results show that high testing efforts, measured by low test positivity rates, are not associated with high rates of detection. This was also observed in the UK during the first wave, when detection remained low despite large numbers of tests and a low positivity rate32. Without strong case-based surveillance, the risk is to disperse resources towards random individuals without symptoms who are unlikely to be positive. This could saturate the test–trace–isolate system, as observed during summer33, without slowing down the circulation of SARS-CoV-2 that is required to safeguard the hospital system.

Given presymptomatic transmission, notification of contacts should be almost immediate to enable the effective interruption of transmission chains22. For testing to be an actionable tool to control the transmission of SARS-CoV-2, delays should be suppressed and screening rates greatly increased but better targeted. Over May–June, the average weekly number of tests was 250,000, remaining well below the objective that was originally set by authorities (700,000 tests). The number of tests increased over summer, but proportionally to the increased circulation of the virus. The capacity of detection of the test–trace–isolate system scaled as the inverse of the square root of the incidence, already deteriorating rapidly at low incidence levels. More aggressive testing that targets suspected index cases should be performed at low viral circulation to avoid case resurgence. The system was predicted to be able to detect more than two out of three cases (rate >66%) only if the incidence was lower than one symptomatic case per 100,000, a figure that is 50 times smaller than estimated at the exit from lockdown. As detection of at least 50% of cases is needed to control the pandemic while avoiding strict social distancing2, these results indicate that the system was insufficient to perform comprehensive case-based surveillance, as has been recommended when aiming to phase out restrictions5. Current restrictions applied in Europe to curb the second wave offer a second opportunity to improve testing policies and support the lifting of these measures in the upcoming weeks. Failing to do so may lead to a rapid and uncontrolled increase in the number of cases of COVID-19 (refs. 2, 34). Such risk is even stronger in the winter season and with the existing fatigue with regard to adherence to the restrictions18.
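
The scaling argument in this paragraph can be explored numerically with the fitted power law, using the point estimates a = 66 and exponent −0.51 reported for the fit. The sketch below is illustrative only: the function names and the cap at 100% are assumptions added here, and the fit's confidence intervals are ignored.

```python
A = 66.0          # fitted prefactor (detection rate in % at 1 case per 100,000)
EXPONENT = -0.51  # fitted exponent, close to an inverse square root


def detection_rate(incidence_per_100k):
    """Detection rate (%) predicted by the power law, capped at 100% (assumption)."""
    return min(100.0, A * incidence_per_100k ** EXPONENT)


def incidence_for_rate(rate_percent):
    """Weekly incidence (per 100,000) at which the power law gives the given rate."""
    return (rate_percent / A) ** (1.0 / EXPONENT)


# At one symptomatic case per 100,000 the law returns the prefactor itself, 66%.
print(detection_rate(1.0))

# Incidence below which the law predicts detection of at least half of the cases.
print(incidence_for_rate(50.0))
```

Because the exponent is close to −0.5, quadrupling the incidence roughly halves the predicted detection rate, which is why detection deteriorates quickly even at low incidence.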

Models were region-based and did not consider a possible coupling between regional epidemics caused by mobility. This choice was supported by stringent movement restrictions during lockdown30, and by the limited mobility increase in May–June, before important inter-regional displacements took place at the start of the summer holidays in July. Foreign importations of the virus35 were neglected as France reopened its borders with EU member states on 15 June, and the Schengen area remained closed until July. The participatory cohort is not representative of the general population; however, a previous study on influenza-like illnesses has shown that the adjusted incidence was in good agreement with sentinel estimates4. Underdetection may also arise from the imperfect characteristics of the reverse-transcription PCR tests used to identify infections of SARS-CoV-2 (ref. 36). Some cases tested for SARS-CoV-2 could have had false-negative results, for example, because they were tested too early after the infection, thus further increasing the rate of underdetection. Previous work assessed the rate of underdetection in 210 countries32, but that study mainly focused on the early global dynamics. Our model trades geographical extent for higher data quality in a specific country, providing a synthesis of data sources that characterizes human behaviour over time and space together with virological and participatory surveillance data to identify the weak links in the pandemic response.

Our findings identify critical needs for the improvement of the test–trace–isolate response system to control the COVID-19 pandemic. Substantially more aggressive and efficient testing that targets suspected cases of COVID-19 needs to be achieved to act as a way to control the COVID-19 pandemic. Associated communication and logistical needs should not be underestimated. These elements should be considered to enable the lifting of restrictive measures that are currently used to curb the second wave of COVID-19 in Europe.


No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Virological surveillance data

The centralized database SI-DEP for virological surveillance3 collects all tests performed in France for any reason. In the period under study, guidelines recommended individuals to consult a general practitioner at the first sign of COVID-19-like symptoms and to obtain a prescription for a virological test (a prescription was compulsory to access the test)8. In addition, routine testing was performed for patients admitted to the hospital with any diagnosis, healthcare personnel and individuals at other facilities (for example, in some care homes for older people or long-term healthcare facilities). Data include detailed information for the individuals tested in France, including (1) the date of the test; (2) the result of the test (positive or negative); (3) location (region); (4) the absence or presence of symptoms at the time of testing; (5) self-declared delay between onset and test in presence of symptoms. The delay is provided with the following breakdown: onset date occurring 0–1 day before date of test, 2–4 days before, 5–7 days before, 8–15 days before, or more than 15 days before. For some tests, information on points (4) and (5) is missing. The SI-DEP database provided complete information for 23,210 (66%) out of 35,264 laboratory-confirmed cases of COVID-19 tested between week 20 (11–17 May) and week 30 (19–26 July), with an increasing trend of complete information over time (from 49% in week 20 to 76% in week 30) (Extended Data Fig. 1). Among confirmed cases with complete information, 12,716 (55%) showed no symptoms at the time of testing (Extended Data Fig. 1). The study referred to the period from week 20 to week 26. Data up to week 30 were used to consolidate the data in the study period accounting for the delays.

Imputation of asymptomatic versus presymptomatic cases, onset date and missing information

Individuals who tested positive on a given date were recorded in the SI-DEP database as: cases with symptoms at the time of testing, with a self-declared delay from onset of symptoms; cases without symptoms at the time of testing; or cases with no information on presence or absence of symptoms at the time of testing. These three subsets of cases were analysed to account for the presence of presymptomatic individuals among those with no symptoms at the time of testing, imputation of missing data and the estimation of dates of infection or symptom onset.

For laboratory-confirmed cases of COVID-19 who had symptoms at the time of testing, we estimated their date of onset using the information on the date of test and the time interval of onset-to-test delay, which was self-declared by the patients (Fig. 1b). In the time period between weeks 20 and 30, 20% of cases had an onset-to-test delay of ≤1 day, 63% had a delay of ≤4 days, 83% had a delay of ≤7 days and 88% had a delay of ≤15 days (Extended Data Fig. 1). We fitted a Gamma distribution to the onset-to-test delay data with a maximum likelihood approach, using three different periods of time (May, June and July), to account for changes in the distribution of self-declared delays over time (that is, longer delays at the beginning of the study period, shorter delays at its end) (Extended Data Fig. 2). The estimated average delay in May, June and July was 12.9 (95% confidence interval, 7.0–16.1), 5.1 (3.7–6.3) and 2.7 (2.0–3.1) days, respectively. July data were used to consolidate data corresponding to infections with onset in June and tested with delay. Given a confirmed case with symptoms testing on a specific date, we assigned the onset date by sampling the onset-to-testing delay from the fitted distribution for that period, conditional to the fact that the delay lies in the corresponding time interval declared by the patient. We assumed that onset did not occur before the implementation of the national lockdown, on 17 March 2020 (week 12); we therefore truncated the Gamma distribution accordingly, when assigning the date of onset for cases with onset-to-test delay >15 days. The imputation procedure was carried out 100 times. Results were aggregated by week of onset.
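
The onset-date imputation described above amounts to sampling a delay from a Gamma distribution conditional on the delay falling in the interval declared by the patient. A minimal sketch using rejection sampling from the Python standard library follows. The shape parameter, the mapping of declared day ranges to half-open continuous intervals, and the `max_delay` argument (standing in for the truncation at the start of lockdown) are assumptions made for illustration; the text reports only the fitted mean delays.

```python
import random

# Declared onset-to-test categories mapped to half-open ranges in days (assumption).
INTERVALS = {
    "0-1": (0.0, 2.0),
    "2-4": (2.0, 5.0),
    "5-7": (5.0, 8.0),
    "8-15": (8.0, 16.0),
    ">15": (16.0, float("inf")),
}


def sample_delay(interval, mean_delay, rng, shape=2.0, max_delay=None, max_tries=100_000):
    """Draw an onset-to-test delay from Gamma(shape, mean_delay / shape),
    conditional on the delay lying in the declared interval."""
    low, high = INTERVALS[interval]
    if max_delay is not None:
        high = min(high, max_delay)  # e.g. onset cannot precede the lockdown date
    if high <= low:
        raise ValueError("empty interval after truncation")
    scale = mean_delay / shape       # Gamma mean equals shape * scale
    for _ in range(max_tries):
        delay = rng.gammavariate(shape, scale)
        if low <= delay < high:
            return delay
    raise RuntimeError("no accepted sample; interval too unlikely under this fit")


rng = random.Random(0)
# Mean delays quoted in the text: 12.9 days (May), 5.1 (June), 2.7 (July).
june_delay = sample_delay("2-4", mean_delay=5.1, rng=rng)
may_long_delay = sample_delay(">15", mean_delay=12.9, rng=rng, max_delay=60.0)
```

The imputed onset date is then the test date minus the sampled delay; repeating the draw (100 times in the study) propagates the uncertainty into the weekly counts.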

For laboratory-confirmed cases of COVID-19 with no symptoms at the time of testing, we assumed that on average 40% of them were asymptomatic12 (see the ‘Transmission model summary’ section), whereas the remaining 60% were presymptomatic individuals who were tested early thanks to contact tracing. Imputation was done by sampling from a binomial distribution and repeated 100 times. Data on contact tracing could not be used to inform data on infection or symptom onset, because the national regulatory framework on privacy prevented the matching of the two databases (virological tests and contact tracing). Given the low sensitivity of PCR tests in the early phase of the incubation period, we considered that imputed presymptomatic cases belonged to the prodromic phase. Onset date for presymptomatic cases was estimated by sampling from an exponential distribution with a mean of 1.5 days, corresponding to the duration of the prodromic phase in our model (Supplementary Table 1). For imputed asymptomatic cases, we assumed the same delay from infection to testing as in cases with symptoms. Given the structure of our compartmental model and to match the definition of the time used for symptomatic individuals (week of onset), we considered a delay in the detection of asymptomatic individuals starting from the end of the prodromic phase (corresponding to the symptom onset time for symptomatic infections) to the date of testing. We assigned this date by sampling the delay from the monthly Gamma distribution. Imputation of the dates was repeated 100 times.
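
The split of cases without symptoms at testing can be sketched as one binomial draw followed by exponential sampling of the time to onset. The example below uses the standard library only; the function name is illustrative, and treating the sampled time as days from test to symptom onset is an assumption about how the prodromic-phase duration is applied.

```python
import random

P_ASYMPTOMATIC = 0.40       # assumed share of asymptomatic infections (from the text)
PRODROMIC_MEAN_DAYS = 1.5   # mean of the exponential distribution (from the text)


def impute_no_symptom_cases(n_cases, rng):
    """Split cases without symptoms at testing into asymptomatic and presymptomatic,
    and draw a time to symptom onset (days) for each presymptomatic case."""
    # One binomial draw, written as a sum of Bernoulli trials.
    n_asymptomatic = sum(rng.random() < P_ASYMPTOMATIC for _ in range(n_cases))
    n_presymptomatic = n_cases - n_asymptomatic
    # expovariate takes the rate, i.e. 1 / mean.
    days_to_onset = [rng.expovariate(1.0 / PRODROMIC_MEAN_DAYS)
                     for _ in range(n_presymptomatic)]
    return n_asymptomatic, days_to_onset


rng = random.Random(42)
# 12,716 confirmed cases with complete information had no symptoms at testing.
n_asym, days_to_onset = impute_no_symptom_cases(12_716, rng)
mean_days = sum(days_to_onset) / len(days_to_onset)
```

Repeating the call with different seeds mirrors the 100 repetitions used in the study to obtain confidence intervals.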

For laboratory-confirmed cases of COVID-19 with no information on symptoms at the time of testing, missing data were imputed by sampling from a multinomial distribution with probabilities equal to the rate of occurrence of the outcomes (asymptomatic, presymptomatic or symptomatic with five possible time intervals for the onset-to-test delay) reported for cases with complete information and assuming the imputation of cases without symptoms into asymptomatic and presymptomatic, as described above. Imputation was performed by region and by week and repeated 100 times. Presymptomatic and symptomatic individuals were aggregated together by onset date (Fig. 1a) to estimate the rate of detection of symptomatic cases.

Participatory surveillance data and analysis

The participatory surveillance platform is an online system for the surveillance of COVID-19. It was adapted from GrippeNet.fr4 to respond to the COVID-19 health crisis in March 2020. GrippeNet.fr is a participatory system for the surveillance of influenza-like illnesses available in France since 2011 through a collaboration between Inserm, Sorbonne Université and Santé publique France, supplementing sentinel surveillance4,37. The system is based on a dedicated website to conduct syndromic surveillance through self-reported symptoms volunteered by participants resident in France. Data are collected on a weekly basis; participants also provide detailed profile information at enrolment38. In addition to tracking the incidence of influenza-like illnesses4,37, GrippeNet.fr was used to estimate vaccine coverage in specific subgroups39, individual perceptions towards vaccination40 and healthcare-seeking behaviour41. It was also used to assess behaviours and perceptions related to diseases other than influenza42, including COVID-19 (ref. 43).

Participants are on average older and include a larger proportion of women compared to the general population38,44. The participating population is, however, representative in terms of health indicators such as diabetes and asthma conditions. Despite these discrepancies, trends of the estimated incidence of influenza-like illnesses from participant reports compared well with those of the national sentinel system4,37. All analyses were adjusted by age and sex of participants.

To monitor suspected cases of COVID-19 in the general population, we used the expanded case definition recommended by the High Council of Public Health for systematic testing and described in their 20 April 2020 notice19, which included either of the two following definitions: (1) (sudden onset of symptoms OR sudden onset of fever) AND (fever OR chills) AND (cough OR shortness of breath OR (chest pain AND age > 5 years old)) or (2) (sudden onset of symptoms) OR (sudden onset of fever AND fever); and one of the three following conditions: (i) (age > 5 years old) AND ((feeling tired or exhausted) OR (muscle/joint pain) OR (headache) OR (loss of smell WITHOUT runny or blocked nose) OR (loss of taste)); or (ii) ((age ≥ 80 years old) OR (age < 18 years old)) AND (diarrhoea); or (iii) (age < 3 months old) AND (fever WITHOUT other symptoms).
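
The case definition above is a nested Boolean expression and can be written as a small predicate. The grouping in the text is ambiguous, so this sketch assumes that definition (1) stands alone and that the onset clause of definition (2) must be combined with at least one of conditions (i) to (iii). The `Report` fields, including `other_symptoms`, are hypothetical names, not the survey's actual variables.

```python
from dataclasses import dataclass


@dataclass
class Report:
    age_years: float
    sudden_onset_symptoms: bool = False
    sudden_onset_fever: bool = False
    fever: bool = False
    chills: bool = False
    cough: bool = False
    shortness_of_breath: bool = False
    chest_pain: bool = False
    tired: bool = False
    muscle_joint_pain: bool = False
    headache: bool = False
    loss_of_smell: bool = False
    runny_or_blocked_nose: bool = False
    loss_of_taste: bool = False
    diarrhoea: bool = False
    other_symptoms: bool = False  # hypothetical flag for "fever WITHOUT other symptoms"


def is_suspected_case(r):
    """Apply the expanded case definition under the grouping assumed above."""
    definition_1 = (
        (r.sudden_onset_symptoms or r.sudden_onset_fever)
        and (r.fever or r.chills)
        and (r.cough or r.shortness_of_breath or (r.chest_pain and r.age_years > 5))
    )
    onset_clause = r.sudden_onset_symptoms or (r.sudden_onset_fever and r.fever)
    cond_i = r.age_years > 5 and (
        r.tired or r.muscle_joint_pain or r.headache
        or (r.loss_of_smell and not r.runny_or_blocked_nose)
        or r.loss_of_taste
    )
    cond_ii = (r.age_years >= 80 or r.age_years < 18) and r.diarrhoea
    cond_iii = r.age_years < 0.25 and r.fever and not r.other_symptoms
    return definition_1 or (onset_clause and (cond_i or cond_ii or cond_iii))
```

For example, `is_suspected_case(Report(age_years=30, sudden_onset_symptoms=True, headache=True))` evaluates to `True` through the second definition, whereas a headache alone without sudden onset does not qualify.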

Two independent estimates of the incidence of symptomatic cases in week 26, obtained from cohort data, are shown in Fig. 3. These estimates were computed as follows. Estimate 1 = (estimated incidence of suspected cases in week 26) × (test positivity rate from SI-DEP in week 26); estimate 2 = (estimated incidence of suspected cases in week 26) × (estimated proportion screened and confirmed as a suspected case of COVID-19 by a physician, and prescribed a test, as estimated from the cohort) × (test positivity rate from SI-DEP in week 26). The two estimates were used to validate model projections and identify the specific surveillance mechanisms that needed improvement.
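The arithmetic behind the two estimates can be sketched as follows. The function and argument names are hypothetical, and the numbers in the usage note are purely illustrative placeholders, not values from the study.

```python
def estimate_1(suspected_incidence, positivity_rate):
    """Suspected-case incidence scaled by the SI-DEP test positivity rate."""
    return suspected_incidence * positivity_rate


def estimate_2(suspected_incidence, proportion_prescribed, positivity_rate):
    """As estimate 1, further scaled by the proportion of suspected cases
    that were confirmed as suspected by a physician and prescribed a test."""
    return suspected_incidence * proportion_prescribed * positivity_rate
```

With an illustrative suspected-case incidence of 1,000 per 100,000, a positivity rate of 0.012 and a prescribed proportion of 0.3, `estimate_1(1000, 0.012)` is about 12 and `estimate_2(1000, 0.3, 0.012)` about 3.6 per 100,000. Estimate 2 is never larger than estimate 1, because the additional factor is a proportion.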

Ethics statement

The participatory surveillance system was reviewed and approved by the French Advisory Committee for research on information treatment in the health sector (CCTIRS, authorization 11.565) and by the French National Commission on Informatics and Liberty (CNIL, authorization DR-2012–024), the authorities ruling on all matters related to ethics, data and privacy in the country. Informed consent was provided by each participant at enrolment, according to regulations.

Transmission models summary

We used a stochastic discrete age-stratified transmission model for each region based on demographic, contact15 and age profile data of French regions21. Models were region-specific to account for the geographically heterogeneous epidemic situation in the country and given the mobility restrictions limiting inter-regional movement fluxes. The study focused on mainland France, where the epidemic situation was comparable across regions, and excluded Corsica, which reported very limited epidemic activity, and overseas territories, which were characterized by increasing transmission20.

Four age classes were considered: [0–11), [11–19), [19–65) and 65+ years old, referred to as children, adolescents, adults and older individuals. Transmission dynamics follow a compartmental scheme specific to COVID-19, in which individuals were divided into susceptible, exposed, infectious and hospitalized (Supplementary Information and Supplementary Figs. 1, 2). We did not consider further progression from hospitalization (for example, admission to intensive care units, recovery or death2) as it was not needed for the objective of the study. The infectious phase is divided into two steps: a prodromic phase (Ip) and a phase during which individuals may remain either asymptomatic (Ia, with probability12 pa = 40%) or develop symptoms. In the latter case, we distinguished between different degrees of severity of symptoms9,11,23,24, ranging from paucisymptomatic (Ips) to infectious individuals with mild (Ims) or severe (Iss) symptoms. Prodromic, asymptomatic and paucisymptomatic individuals have a reduced transmissibility rβ = 0.55, as estimated previously11, and in agreement with evidence from the field45,46,47. A reduced susceptibility was considered for children and adolescents, along with a reduced relative transmissibility of children, following available evidence from household studies, contact-tracing analyses, serological investigations and modelling studies48,49,50,51,52,53. A sensitivity analysis was performed on the relative susceptibility and transmissibility of children, and on the proportion of asymptomatic infections (Supplementary Figs. 10–13). Full details are reported in the Supplementary Information.
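To illustrate how the reduced transmissibility rβ enters an age-stratified model, the sketch below computes a per-age-class force of infection. The functional form is a common textbook choice assumed here for illustration, not taken from the paper; the function name and data layout are hypothetical, and the age-specific susceptibility and transmissibility of children and adolescents are omitted for brevity.

```python
R_BETA = 0.55  # relative transmissibility of Ip, Ia and Ips (value from the text)


def force_of_infection(beta, contacts, infectious, population):
    """Per-age-class force of infection (illustrative form, not the authors' code).

    beta        -- transmission rate per contact
    contacts    -- contacts[i][j]: mean contacts of class i with class j
    infectious  -- infectious[j]: counts keyed by "Ip", "Ia", "Ips", "Ims", "Iss"
    population  -- population[j]: size of age class j
    """
    lam = []
    for row in contacts:
        total = 0.0
        for c_ij, inf_j, n_j in zip(row, infectious, population):
            # Prodromic, asymptomatic and paucisymptomatic cases transmit less.
            reduced = inf_j["Ip"] + inf_j["Ia"] + inf_j["Ips"]
            full = inf_j["Ims"] + inf_j["Iss"]
            total += c_ij * (R_BETA * reduced + full) / n_j
        lam.append(beta * total)
    return lam
```

In this form, ten prodromic individuals contribute as much to transmission as 5.5 individuals with mild or severe symptoms.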

The study was not extended to the summer months, because of (1) the challenge of mechanistically parameterizing the contact matrices during summer; (2) the increase of movement fluxes across regions weakening our assumption of region-specific models; and (3) the interruption of surveillance during the summer break, which prevented the identification of the key factors behind case underascertainment.

Contact matrices

Age-stratified transmission uses a social contact matrix that reports the average contact rates between different age classes in France15. This refers to the baseline condition, that is, before lockdown. The contact matrix includes the following layers: contacts at home, school, workplace, transport, leisure activities and other activities, and discriminates between physical and non-physical contacts. To account for the change of contact patterns over time, contact matrices are mechanistically parameterized, by region and over time, with different data sources informing on the percentage of students going to school16, the percentage of workers going to the workplace17 and the compliance with preventive measures18, which was higher in older individuals18. Information on the progressive reopening of activities indicates that leisure and other activities were only partially open in the study period. Data, however, are not fine-grained enough to parameterize our model, so we assume a 50% opening of these activities and explore variations in the sensitivity analysis.

School attendance

School reopening was parameterized by considering the percentage of reported attendance at school (pre-school and primary school; middle and high school) provided by the Ministry of Education16 (Supplementary Fig. 3). The number of contacts in the school matrix was modified to account for the attendance of students in each school level provided by data. That is, attendance of 14.5%, referring, for example, to the attendance registered in Île-de-France in pre-schools and primary schools, corresponds to a reduction of 85.5% in the number of contacts established at school by students belonging to that school level. Contacts for different modes of transport were modified accordingly.
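The attendance-based rescaling described above can be sketched as a row-wise scaling of the school layer of the contact matrix. This is a minimal sketch under stated assumptions: the function name is hypothetical, the matrix entries are placeholders, and scaling each row by the attendance of the corresponding age class is one simple reading of the text.

```python
def scale_school_contacts(school_layer, attendance):
    """Scale row i of the school contact layer by attendance[i].

    school_layer -- school_layer[i][j]: baseline school contacts of class i with class j
    attendance   -- attendance[i]: fraction (0 to 1) of class i attending school
    """
    return [
        [attendance[i] * c for c in row]
        for i, row in enumerate(school_layer)
    ]
```

With an attendance of 0.145 for the first age class, a baseline of 10 school contacts becomes 1.45, that is, a reduction of 85.5%.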

Presence at work

To account for the percentage of individuals at work, given recommendations on remote working and activities that were not yet reopened, we used the estimated variation of presence at workplaces based on mobile phone location data provided by Google Mobility Trends17. Contacts at work and for different modes of transport were therefore modified according to this percentage, as described for contacts at school. Household contacts were increased proportionally to each adult staying at home based on statistics comparing weekend versus weekday contacts15 and the proportion of adults working during the weekend54, as done previously2.

Adoption of physical distancing

Our previous work showed that physical contacts during lockdown were fully avoided2, in agreement with data collected afterwards18. To account for individual adoption of preventive behaviour after lockdown, we used the percentage of population avoiding physical contacts estimated from a large-scale survey conducted by Santé publique France (CoviPrev18). Data were fitted with a linear regression (Fig. 1) to provide the weekly percentage of individuals avoiding physical contacts. We therefore modified our contact matrices over time, removing the percentage of physical contacts corresponding to the survey estimates for that week.

Increased risk aversion of older individuals

Data from the Santé publique France survey CoviPrev18 also show that older individuals protected themselves further relative to other age classes. On average, they respected physical distancing 28% more than the other age classes (Supplementary Fig. 4). For this reason, we considered a further reduction of 30% in contacts for older individuals in the exit phase, informed by survey data.
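The two behavioural adjustments above (removal of physical contacts according to the fitted survey trend, and the additional 30% reduction for older individuals) can be sketched as follows. This is an illustration only: function names and the matrix layout are hypothetical, and applying the extra reduction to the rows of the older age class is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apply_distancing(physical, non_physical, frac_avoiding,
                     senior_index=None, senior_reduction=0.30):
    """Remove the avoided share of physical contacts, then reduce seniors' contacts.

    physical, non_physical -- contact matrices split by contact type
    frac_avoiding          -- fraction of the population avoiding physical contacts
    senior_index           -- row index of the older age class (None to skip)
    """
    adjusted = []
    for i, (p_row, n_row) in enumerate(zip(physical, non_physical)):
        row = [n + (1.0 - frac_avoiding) * p for p, n in zip(p_row, n_row)]
        if i == senior_index:
            # Assumption: the extra reduction scales all contacts of older individuals.
            row = [(1.0 - senior_reduction) * c for c in row]
        adjusted.append(row)
    return adjusted
```

The weekly value of `frac_avoiding` would come from evaluating the fitted line at the week of interest, for example `slope * week + intercept`.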

Inference framework

The parameters of the transmission models to be estimated are specific to each pandemic phase.

Before lockdown, {β, t0}, where β is the transmission rate per contact and t0 is the date of the start of the simulation, seeded with 10 infectious individuals.

During lockdown, {αLD, tLD}, where αLD is the scaling factor of the transmission rate per contact and tLD is the date when lockdown effects on hospitalization data became visible.

After lockdown, {αexit, πa(w), πs(w)}, where αexit is the scaling factor of the transmission rate per contact, and πa(w) and πs(w) are the proportion of asymptomatic and symptomatic cases tested in week w of the exit phase, respectively. Detected cases in the simulations had their contacts reduced by 90% to mimic isolation, as done in previous studies2,14.

We used simulations of the stochastic model to predict values for all quantities of interest (500 simulations each time). We fitted the model to the daily count of hospitalizations Hobs(d) on day d throughout the period and the number of people testing positive by week of onset, split according to disease status (symptomatic or asymptomatic), denoted Test_{s,obs}(w) and Test_{a,obs}(w) in week w of the exit phase. We used hospital admission data up to week 27 (29 June–5 July) to account for the average delay from infection to hospitalization. Data in week 27 were consolidated by waiting for one additional week to account for updates and missing data (week 28, 6–12 July 2020).

We assumed a Poisson distribution for hospitalizations and a binomial distribution for the number of people tested. The likelihood function is therefore

$$L(\mathrm{Data}\mid\varTheta)=\prod_{d=t_{0}}^{t_{n}}P_{\mathrm{Poisson}}\big(H_{\mathrm{obs}}(d);H_{\mathrm{pred}}(d),\beta,t_{0},\alpha_{\mathrm{LD}},t_{\mathrm{LD}},\alpha_{\mathrm{exit}},\pi_{\mathrm{a}}(w_{d}),\pi_{\mathrm{s}}(w_{d})\big)\times\prod_{w\in\mathrm{exit}}P_{\mathrm{Binomial}}\big(\mathrm{Test}_{\mathrm{s,obs}}(w);i_{\mathrm{s,pred}}(w),\pi_{\mathrm{s}}(w)\big)\times\prod_{w\in\mathrm{exit}}P_{\mathrm{Binomial}}\big(\mathrm{Test}_{\mathrm{a,obs}}(w);i_{\mathrm{a,pred}}(w),\pi_{\mathrm{a}}(w)\big)$$

where Θ = {β, t0, αLD, tLD, αexit, {πa(w)}, {πs(w)}} indicates the set of parameters to be estimated, Hpred(d) is the model-predicted number of hospital admissions on day d, i_{s,pred}(w) and i_{a,pred}(w) are the model-predicted weekly incidences of symptomatic and asymptomatic cases, respectively, in week w of the exit phase, PPoisson is the probability mass function of a Poisson distribution and PBinomial that of a binomial distribution, [t0, tn] is the time window considered for the fit, and w is the week in the exit phase (weeks 20–26).
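In practice such a likelihood is evaluated on the log scale, where the products become sums. The sketch below shows this for given model predictions; it is not the authors' code. Function and argument names are hypothetical, predicted hospital admissions are assumed strictly positive, predicted incidences are assumed to be integer counts, and testing proportions are assumed to lie strictly between 0 and 1.

```python
import math


def poisson_logpmf(k, mu):
    """Log of the Poisson probability mass function (requires mu > 0)."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def binomial_logpmf(k, n, p):
    """Log of the binomial probability mass function (requires 0 < p < 1)."""
    if k < 0 or k > n:
        return -math.inf
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def log_likelihood(h_obs, h_pred,
                   test_s_obs, i_s_pred, pi_s,
                   test_a_obs, i_a_pred, pi_a):
    """Sum of the Poisson term over days and the two binomial terms over weeks."""
    ll = sum(poisson_logpmf(k, mu) for k, mu in zip(h_obs, h_pred))
    ll += sum(binomial_logpmf(k, n, p)
              for k, n, p in zip(test_s_obs, i_s_pred, pi_s))
    ll += sum(binomial_logpmf(k, n, p)
              for k, n, p in zip(test_a_obs, i_a_pred, pi_a))
    return ll
```

Here `h_pred`, `i_s_pred` and `i_a_pred` would be produced by a model simulation for a given parameter set, and an optimizer would search for the parameters that maximize the returned value.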

We reduced the required computations with an optimization procedure in two steps, first maximizing the likelihood function in the pre-lockdown and lockdown phase to estimate the first four parameters, and then maximizing the likelihood in the exit phase by fixing the first four parameters that describe the epidemic trajectory before the exit phase to their maximum likelihood estimators (MLEs). This second step was further simplified through an iterative procedure, and we show through simulations that the simplified optimization procedure is consistent and well-defined. The parameter space was explored using NOMAD software55. Fisher’s information matrix was estimated at the MLE value to obtain the corresponding confidence intervals. Simulations were then parameterized with 500 parameter sets obtained from the joint distribution of transmission parameters at MLE (one stochastic simulation for each parameter set). A Bayesian estimate of the posterior parameter distribution using Markov chain Monte Carlo (MCMC) would also have been an alternative to maximum likelihood and confidence interval estimation. In this case, however, MCMC would have considerably slowed down parameter exploration, with negligible added value to the fitting procedure.

We repeated model fitting from several starting points and with different random number streams. Values of fitted parameters and full details on the different steps and the tests performed are reported in the Supplementary Information (Supplementary Figs. 6, 7 and Supplementary Table 3).

Simulation details

Simulations are initialized with 10 infected adults in the Ip compartment at time t0. We obtained 500 parameter sets from the joint distribution of transmission parameters at MLE and ran one stochastic simulation for each parameter set. Therefore, errors in the detection rates computed in the output account for the variability of the estimate of the parameters, in addition to the stochastic fluctuations of the model. We find that the errors in the estimation of the detection rates obtained including the variability of the parameters are slightly larger than the ones obtained with only stochastic fluctuation, suggesting that the stochasticity of the model is the main source of error in the estimation of the detection rate.

Model selection analysis

To assess the role of the mechanistic modification of the contact matrix informed by the different data sources in the exit phase, we compared our model to a simplified version assuming that contact patterns in the exit phase do not change from pre-epidemic conditions, and that all changes in the epidemic trajectory are explained exclusively by the transmissibility per contact. This is equivalent to normalizing the contact matrix to its largest eigenvalue and estimating the reproductive ratio over time. We compared the two models with the Akaike information criterion and found that accounting for changes in contacts better describes the epidemic trajectory (Supplementary Table 2 and Supplementary Fig. 5).
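For reference, the Akaike information criterion is AIC = 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood; the model with the lower value is preferred. A minimal sketch of the comparison follows, with hypothetical function names and no values from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def preferred_model(models):
    """Return the name of the model with the lowest AIC.

    models -- dict mapping model name to (maximized log-likelihood, number of parameters)
    """
    return min(models, key=lambda name: aic(*models[name]))
```

Because the penalty term grows with the number of parameters, a more complex model is preferred only if its gain in log-likelihood outweighs the added parameters.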

Comparison with serological estimates

We compared model projections with serological estimates from three independent studies7,25,26 (Fig. 3e and Extended Data Fig. 6).

Estimates by Carrat et al.7 used ELISA-S tests and ELISA-NP tests. The sample was not representative of the population, and estimates were weighted to account for this bias. In the comparisons, we used the results from a multiple imputation method performed by the authors, which assigned each participant a likelihood of positivity based on observed test results and covariates (see ref. 7 for more details).

Estimates by Santé publique France25 are based on at least one positive result in one of the following three tests: ELISA-S, ELISA-NP and a pseudo-neutralization test that detects pseudo-neutralizing antibodies, taken as a proxy for the neutralizing antibodies that confer protection against infection. Analyses were performed on residual sera obtained from clinical laboratories, and estimates were weighted to account for the lack of representativeness.

Estimates by EpiCoV26 (Enquête Épidémiologie et Conditions de vie liées à la Covid-19) used ELISA-S tests and further validated these with a seroneutralizing antibody test at higher specificity (see ref. 26 for more details). This was the only seroprevalence survey that was conducted in a representative sample of the population. For this reason, we used it as the reference study.

For all studies, we report in Fig. 3e and Extended Data Fig. 6 the estimates 14 days before the last blood collection to account for the time needed to mount a detectable presence of antibodies. For the EpiCoV survey, we used the last date at which samples were sent back to the laboratory.

Modelling results are in good agreement with the serological estimates at the national level (Fig. 3e) and in the large majority of the regions (Extended Data Fig. 6). Projections tend to be systematically smaller than serological estimates in two regions that were weakly affected by the epidemic (Pays de la Loire and Brittany), although they remained compatible with observations.

Overall differences may be due to the limitations of the methods involved. First, the type of tests, the specificity levels, the samples of the population tested, and the weighting and imputation approaches considered in each serological study could lead to differences across the three investigations. We note, for example, that larger discrepancies are observed between EpiCoV and Santé publique France results in those regions that experienced smaller epidemics. We used EpiCoV as the reference study as it was the only study conducted on a representative sample of the population. Second, there are limitations to the dataset of hospital admissions used to calibrate the models: the database infrastructure for data collection became operational in mid-March and was filled in retrospectively. Notification biases would inevitably alter the inference of parameters in the pre-lockdown phase. These biases may have differed region by region, but we have no way to control for them; possible errors would have affected regions with small-size epidemics more than others. In support of this hypothesis, we note that a similar but independent mathematical model fitted to regional hospitalization data24 in the first wave predicted small epidemics in Pays de la Loire and Brittany, similarly to our model.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.