Nationally representative SARS-CoV-2 antibody prevalence estimates after the first epidemic wave in Mexico

Seroprevalence surveys provide estimates of the extent of SARS-CoV-2 infections in the population, regardless of disease severity and test availability. In Mexico in 2020, COVID-19 cases peaked in July and again in December. We aimed to estimate the national and regional seroprevalence of SARS-CoV-2 antibodies across demographic and socioeconomic groups in Mexico after the first wave, from August to November 2020. We used nationally representative survey data including 9,640 blood samples. Seroprevalence was estimated by socioeconomic and demographic characteristics, adjusting for the sensitivity and specificity of the immunoassay test. The national seroprevalence of SARS-CoV-2 antibodies was 24.9% (95%CI 22.2, 26.7), and it was lower for adults 60 years and older. We found higher seroprevalence in urban and metropolitan areas and among people with low socioeconomic status, people with low education, and workers. Among seropositive people, 67.3% were asymptomatic. Social distancing, lockdown measures and vaccination programs need to consider that vulnerable groups are more exposed to the virus and unable to comply with lockdown measures.


Study population included in Ensanut COVID 2020
1 Only for individuals 15 years of age and older. 2 Worker with access to social security services or private medical insurance.

Summary
ENSANUT COVID-19 is a probabilistic survey designed to achieve two goals: a) to study the impact of SARS-CoV-2 on the health and nutrition of the Mexican population, and b) to estimate the trends of the main chronic diseases (diabetes, hypertension, and obesity). Probabilistic surveys are exercises in statistical inference; i.e., they make inferences from a sample to the population. Statistical inferences from a survey can be expressed through confidence intervals, and the validity of those intervals can be supported in two ways. First, if a survey is probabilistic and measurements have no error, the 95% confidence intervals for a parameter θ will contain the parameter 95% of the time. We will describe the sampling procedure to show that ENSANUT COVID-19 is a probabilistic survey, which supports the validity of the confidence intervals when measurements have no error. Second, we will verify that the ENSANUT COVID-19 estimators for parameters that change slowly over time are similar to the estimators of other surveys.
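The coverage property stated above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: it draws simple random samples with replacement rather than reproducing the multistage design of the survey, it uses the normal-approximation interval for a proportion, and the true value θ = 0.25, the sample size, and the function names are hypothetical.

```python
import math
import random

def wald_interval(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(theta, n, replications, seed=0):
    """Share of simulated samples whose interval contains the true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replications):
        # Each sampled individual is positive with probability theta.
        successes = sum(rng.random() < theta for _ in range(n))
        low, high = wald_interval(successes, n)
        hits += low <= theta <= high
    return hits / replications

if __name__ == "__main__":
    # Hypothetical setting: true prevalence 0.25, samples of 400 people.
    print(coverage(theta=0.25, n=400, replications=2000))
```

The printed share should be close to 0.95; it will not be exactly 0.95 because the interval is approximate and the number of replications is finite.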

Sampling design
The usual way to conduct a probabilistic survey is to define a sampling frame, allocate probabilities of selection, and select a sample. The first step of ENSANUT COVID-19 was the construction of a sampling frame.

Sampling frame of primary sampling units (PSU)
The sampling frame of PSU was a list of Basic Geostatistical Areas (AGEBs) built by the National Institute of Statistics and Geography (INEGI). In urban localities (2,500 or more inhabitants), where a locality usually has more than one AGEB, the AGEBs of the 2010 Census were used as PSU. In contrast, the AGEBs of the 2005 Population and Housing Count were used as PSU for the rural localities (1 to 2,499 inhabitants) because the rural AGEBs of the 2010 Census are not publicly available. The AGEBs of the 2005 Population and Housing Count were updated as follows: localities that were new in the 2010 Census were added to the rural AGEBs, and towns that had disappeared by the 2010 Census were removed from them.

Sampling frame of secondary sampling units (SSU)
For urban AGEBs, we used the list of urban blocks that INEGI makes publicly available as the sampling frame of SSU. For rural AGEBs, we used the list of rural localities as the sampling frame of SSU.

Sampling frame of tertiary sampling units (TSU)
The National Institute of Public Health (INSP) constructed the sampling frames of TSU within the SSU selected for the ENSANUT COVID-19 survey. In urban blocks, INSP made a list of households, and in rural localities, INSP made a list of clusters of households. The INSP team that constructed the sampling frame of TSU (INSP-Cartography) was unrelated to and independent of the household interviewers.

Sampling frame of individuals in households
Household interviewers made a list of all individuals in the households.

Domains of study
The sample size was set to allow inferences for 9 regions of Mexico. Regions were defined as sets of contiguous entities. The resulting regions were: North-Pacific (Baja California, Baja California Sur, Nayarit, Sinaloa, Sonora), Border (Chihuahua, Coahuila, Nuevo León, Tamaulipas

Selection of primary sampling units
The primary sampling units (PSUs) were classified into three strata based on the size of the locality: rural (1 to 2,499 inhabitants), urban (2,500 to 99,999 inhabitants), and metropolitan (100,000 or more inhabitants). PSUs were selected with probability proportional to their population, and the sample size was allocated proportionally to the population in each cell of the contingency table defined by the cross-classification of entities and locality size.
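Selection with probability proportional to population can be carried out in several ways, and the text does not say which procedure was used. The sketch below shows one common option, systematic PPS sampling, as an assumption for illustration only; the function name and the population figures in the usage note are hypothetical.

```python
import random
from itertools import accumulate

def pps_systematic(sizes, n_select, rng):
    """Select n_select unit indices with probability proportional to size.

    Units are laid end to end on a line whose length is the total size, and
    selection points are taken at a fixed interval from a random start.
    Assumes no unit is larger than the interval, so that no unit can be
    selected more than once.
    """
    total = sum(sizes)
    interval = total / n_select
    if max(sizes) > interval:
        raise ValueError("a unit exceeds the sampling interval")
    start = rng.random() * interval              # random start in [0, interval)
    upper_bounds = list(accumulate(sizes))       # cumulative sizes
    selected, unit = [], 0
    for k in range(n_select):
        point = start + k * interval
        # Advance to the unit whose cumulative range contains the point.
        while unit < len(sizes) - 1 and upper_bounds[unit] <= point:
            unit += 1
        selected.append(unit)
    return selected

if __name__ == "__main__":
    # Hypothetical populations of eight PSUs within one stratum.
    populations = [120, 300, 80, 500, 250, 150, 400, 200]
    print(pps_systematic(populations, 3, random.Random(2020)))
```

Under this scheme a PSU with population 500 out of a total of 2,000 is included with probability 3 × 500 / 2,000 = 0.75 when three PSUs are drawn.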

Selection of secondary sampling units
The selection scheme depended on the type of stratum. In the PSUs of the urban and metropolitan strata, 5 blocks were selected with probability proportional to the population of the block according to the Census. Then, in each selected block, 6 households were selected using systematic sampling with a random start; the selection of households was carried out in the field by INSP-Cartography using a computer. In rural PSUs, 2 localities were selected with probability proportional to their size (total population). Later, during the field visit of INSP-Cartography, clusters of approximately 50 households were formed in each locality; 1 cluster was then selected within each locality through simple random sampling (SRS), and 1 sub-cluster of approximately 15 households was selected within the selected cluster, again through SRS.
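Systematic sampling with a random start, as used for the 6 households per urban block, can be sketched as follows. The function name, the block of 40 households, and the handling of a fractional interval are assumptions made for illustration; the field software used by INSP-Cartography is not described in the text.

```python
import random

def systematic_sample(units, n_select, rng):
    """Systematic sample of n_select units using a random start.

    The sampling interval is len(units) / n_select; a fractional interval is
    handled by flooring each selection point.
    """
    if n_select > len(units):
        raise ValueError("cannot select more units than are listed")
    interval = len(units) / n_select
    start = rng.random() * interval              # random start in [0, interval)
    last = len(units) - 1
    return [units[min(int(start + k * interval), last)] for k in range(n_select)]

if __name__ == "__main__":
    # Hypothetical block with 40 listed households, numbered 1 to 40.
    households = list(range(1, 41))
    print(systematic_sample(households, 6, random.Random(2020)))
```

Because the interval is at least 1, the selected positions are distinct and spread evenly along the household list.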

Selection of people inside the households
The selection of participants within the households consisted of two stages. In the first stage, all households in a dwelling were identified and a household questionnaire was applied to each household (in Mexico, more than one family or household may live in the same dwelling). The household questionnaire listed all inhabitants, who were stratified into six age groups. Supplementary table 2 specifies the sampling fraction for the age groups. ENSANUT COVID-19 also selected a sample of health service users who had received medical care in the last three months.

Supplementary table 2. Sampling fraction for individuals in the household
Group: Fraction of selection
Preschool children, 0 to 4 years old: All
School children, 5 to 9 years old: One per household

Sampling weights
ENSANUT COVID-19 selected individuals with a known probability, which was used to calculate the sampling weights. The sampling weights of ENSANUT COVID-19 were calculated on the basis of: a) probabilities of selection, b) response rates, and c) Census results on the total population of Mexico. We expect ENSANUT COVID-19 to produce unbiased estimators because the weights are derived from probabilities of selection, and the ENSANUT COVID-19 estimators were consistent with estimators from previous surveys, as is exemplified next.
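The three ingredients listed above are usually combined in a standard way: the inverse of the selection probability, a nonresponse adjustment, and a calibration to a known population total. The sketch below follows that generic recipe as an assumption; the text does not give the exact formulas used by the survey, and all function names and numbers here are hypothetical.

```python
def base_weight(selection_probabilities):
    """Inverse of the overall selection probability across sampling stages."""
    overall = 1.0
    for p in selection_probabilities:
        overall *= p
    return 1.0 / overall

def adjust_for_nonresponse(weight, response_rate):
    """Inflate the weight so respondents also represent nonrespondents."""
    return weight / response_rate

def calibrate_to_total(weights, census_total):
    """Rescale weights so that they sum to a known population total."""
    factor = census_total / sum(weights)
    return [w * factor for w in weights]

if __name__ == "__main__":
    # Hypothetical respondent: PSU, SSU, household, and person-level
    # selection probabilities, followed by a 73% response rate.
    w = base_weight([0.02, 0.25, 0.15, 0.5])
    w = adjust_for_nonresponse(w, 0.73)
    print(calibrate_to_total([w, 2 * w, 0.5 * w], census_total=20_000))
```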

Validation against external sources
We compared the ENSANUT COVID-19 estimators against external sources. We present only three items for validation: the age pyramid, the prevalence of food insecurity, and the prevalence of diabetes. Supplementary figure 1 compares the age pyramid of ENSANUT COVID-19 with the results of the 2020 Census. No differences greater than 1% were observed. Furthermore, ENSANUT COVID-19 and the Census practically coincide in the percentage of men in households: 48% (ENSANUT COVID-19) and 49% (Census).
Supplementary figure 1. Comparison of the age pyramids of the household population of ENSANUT COVID-19 and the 2020 Census (N=36,024). The numbers are percentages of each age group and sex from the total population.

Context
The Health Secretary announced that, starting in August 2020, data for the National Health and Nutrition Survey 2020 (Ensanut 2020 Covid-19) would be collected at the national level and for 9 regions: North-Pacific, Border, Center-Pacific, Center-North, Center, CDMX, State of Mexico, South-Pacific, and Peninsula. Under the coordination of the National Institute of Public Health (INSP), this survey aims to provide information on the family experience of the pandemic and its effects on income, food security, diet quality, and access to health services, and to measure SARS-CoV-2 antibodies to estimate the percentage of the population that has been exposed to the coronavirus.
The Institute for Epidemiological Diagnosis and Reference (InDRE), together with the INSP, worked on the processing of serological samples to evaluate the presence of specific antibodies against the SARS-CoV-2 virus. Commercially available kits must be evaluated for sensitivity, specificity, positive and negative predictive values (PPV, NPV), and ROC curve before being used in serological diagnosis studies. Here, we report the results of an evaluation of three commercial tests aimed at detecting IgG antibodies against SARS-CoV-2.

Study population
• Elisa Anti SARS-CoV-2 (IgG) from the company EUROIMMUN (sensitivity 94% and specificity 99.6%; the manufacturer does not report confidence intervals).

Biological samples collection
The medical staff of the UMF 198 was in charge of collecting the pharyngeal and nasopharyngeal exudates to detect SARS-CoV-2 by real-time reverse transcription polymerase chain reaction (rRT-PCR), and of drawing blood samples to obtain serum. Pharyngeal and nasopharyngeal swab samples were processed by the IMSS, and serum samples were sent to the InDRE SeroSurvey Laboratory for processing.
rRT-PCR results were obtained during the acute phase of infection (0-7 days). Blood samples were collected from patients with an initial diagnosis by PCR and from patients recovered from infection at least 22 days after the onset of COVID-19 symptoms. The sera were processed with three different commercial kits for the determination of IgG class antibodies against SARS-CoV-2. Negative controls were provided by the INSP's ENSANUT-2018 biobank. A total of 210 samples were randomly selected from the 32 states of Mexico, including areas where malaria is endemic, to account for possible cross-reactivity. These samples do not have an rRT-PCR result; however, because they were collected before the emergence of SARS-CoV-2, they were assumed to be negative for infection.

Validation results
We evaluated the following parameters of the three different kits for serological analyses: sensitivity, specificity, positive predictive values, negative predictive values, the ROC curve, and the area under the curve.
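For reference, the first four parameters follow directly from the 2×2 contingency table of each kit against rRT-PCR. The sketch below shows the standard definitions; the counts in the usage note are invented for illustration and are not the study's results, and the ROC curve and area under the curve are not covered.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table against a reference test.

    tp, fn: reference-positive samples that tested positive / negative.
    fp, tn: reference-negative samples that tested positive / negative.
    """
    return {
        "sensitivity": tp / (tp + fn),   # positives correctly detected
        "specificity": tn / (tn + fp),   # negatives correctly ruled out
        "ppv": tp / (tp + fp),           # positive results that are true
        "npv": tn / (tn + fn),           # negative results that are true
    }

if __name__ == "__main__":
    # Hypothetical counts, not taken from the evaluation.
    for name, value in diagnostic_metrics(tp=90, fp=5, fn=10, tn=195).items():
        print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the predictive values depend on the share of positives among the samples evaluated, so they do not transfer directly to a population with a different prevalence.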

Supplementary table 5. Contingency table for the Elecsys Anti-SARS-CoV-2 assay against real-time RT-PCR (table body not recovered).
In figure 2, the area shaded in blue represents the confidence intervals.

Comparison of tests
The ROCHE laboratory test was compared with the tests of the EUROIMMUN and ABBOTT laboratories to evaluate possible significant differences (Supplementary figure 5). No significant difference was observed between EUROIMMUN and ROCHE (p=0.110). However, a significant difference was found between the ROCHE and ABBOTT laboratory tests (p=0.009).

Supplementary figure 5. Test performance between A) ROCHE and EUROIMMUN and B) ROCHE and ABBOTT
For the results with available CT values (n = 84), the ROCHE test results were adjusted by grouping the CT values into three groups (0-20, 21-30, and 31-40). No significant differences were found between the CT groups in the sensitivity and specificity results for any of the three tests.

Processing costs
The unit price of the Elecsys Anti SARS-CoV-2 test from the ROCHE laboratory was $48.16 MXN. Each kit can process 200 tests, so the total cost of each kit was $9,632.27 MXN. The cost per test for the "SARS-CoV-2 IgG" assay from the ABBOTT laboratory was $259.89 MXN; each kit contains 100 tests, and the total cost per kit was $25,989.02 MXN. Finally, the "Elisa Anti SARS-CoV-2 (IgG)" test from the EUROIMMUN laboratory had a unit cost of $218.59 MXN; each kit can process 96 tests, so the cost per kit was $20,984.40 MXN. All costs include VAT. Among the three tests, the ROCHE laboratory test had the lowest cost.
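The unit costs above can be recomputed from the kit prices and the number of tests per kit. The short check below uses only the figures given in this section; the variable and function names are illustrative, and rounding half up to two decimals is an assumption about how the unit prices were rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

# Kit price in MXN (VAT included) and number of tests per kit, from the text.
KITS = {
    "ROCHE": (Decimal("9632.27"), 200),
    "ABBOTT": (Decimal("25989.02"), 100),
    "EUROIMMUN": (Decimal("20984.40"), 96),
}

def unit_cost(kit_price, tests_per_kit):
    """Cost per test, rounded to cents."""
    cost = kit_price / tests_per_kit
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

UNIT_COSTS = {name: unit_cost(price, tests) for name, (price, tests) in KITS.items()}

if __name__ == "__main__":
    for name, cost in UNIT_COSTS.items():
        print(f"{name}: ${cost} MXN per test")
    print("lowest cost:", min(UNIT_COSTS, key=UNIT_COSTS.get))
```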

Conclusions
The three tests evaluated show adequate performance in detecting positive cases (sensitivity from 91.4% to 92%) and in discriminating between true negatives and false positives (specificity from 97% to 99%). Among the three, the Elecsys Anti-SARS-CoV-2 test from the ROCHE laboratory obtained the best sensitivity and specificity (92% and 99.4%), and the SARS-CoV-2 IgG test from ABBOTT performed worst (sensitivity of 91.4% and specificity of 97.14%). The most important difference between the tests is in the false positives among the ENSANUT-2018 samples, which were collected before the appearance of SARS-CoV-2: the ROCHE test produced only one false positive, ABBOTT six, and EUROIMMUN four false positives and two indeterminate results. Estimates of false negatives should be interpreted with caution, as not all people infected with SARS-CoV-2 generate antibodies.
Whereas the ROCHE and ABBOTT tests target the nucleocapsid, the EUROIMMUN "Elisa Anti SARS-CoV-2 (IgG)" test targets the S protein, so its results should be more specific for SARS-CoV-2. However, the unit cost of the EUROIMMUN test is 4.5 times the unit cost of the ROCHE test. Although all the tests show adequate sensitivity and specificity, the ROCHE test has the better cost-benefit performance. Therefore, this evaluation recommends the "Elecsys Anti-SARS-CoV-2" test from the ROCHE laboratory for the processing and detection of anti-SARS-CoV-2 IgG antibodies in the samples obtained through the ENSANUT COVID 2020 survey.
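The sensitivity and specificity estimated in this evaluation matter downstream because the seroprevalence estimates are adjusted for test performance. The text does not say which correction was applied; one widely used option is the Rogan-Gladen estimator, sketched here as an assumption. The apparent prevalence of 0.23 is a hypothetical input, while 92% and 99.4% are the ROCHE values reported above.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct an observed positive fraction for imperfect test accuracy.

    adjusted = (apparent + specificity - 1) / (sensitivity + specificity - 1),
    truncated to the interval [0, 1].
    """
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("test must perform better than chance")
    adjusted = (apparent_prevalence + specificity - 1) / denominator
    return min(max(adjusted, 0.0), 1.0)

if __name__ == "__main__":
    # Hypothetical apparent prevalence; test accuracy from the ROCHE kit.
    print(rogan_gladen(0.23, sensitivity=0.92, specificity=0.994))
```

With high specificity and sensitivity below 100%, the adjusted value is somewhat higher than the apparent one, because missed infections outweigh false positives.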

Selection bias quantification
We evaluated the possibility of selection bias given the low response rate in the serologic sample. First, we selected variables associated with seroprevalence that could affect the probability of agreeing to participate in the serologic subsample: age, sex, region, education, employment, contact with a suspected case of COVID-19, having a respiratory disease, and having experienced COVID-19-related symptoms. Then, we compared the distributions of those variables between the household questionnaire (36,024 subjects) and the serologic subsample (9,464 subjects). In that comparison, we considered the household sample to be more representative of the population for two reasons: its sample size and the proportion of people who agreed to participate (serologic vs household sample: 44% vs 73%). We found that the serologic subsample had a lower proportion of students and a higher proportion of people reporting contact with a suspected case, having a respiratory disease or symptoms, and reporting difficulty breathing (Supplementary table 11). We used raking, a sample balancing method, to replicate the distribution of the key variables from the household sample in the serologic subsample. Our first approach was to use as few key variables as possible, because the variables could be correlated and using too many increases complexity and reduces efficiency. Using the household distribution of "symptoms" by age and region to adjust the sample distribution, we found that the distributions of all the serologic sample variables, except students, now fall within the 95% CI of the distribution of the household sample (see Supplementary table 11).
The sample size of independent individuals is n=9,464. Error bars represent 95% confidence intervals. Muscle and joint pain were asked in the same question.
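Raking, the balancing method used in the selection bias analysis, repeatedly rescales the weights so that the weighted distribution of each key variable matches a target distribution. The sketch below is a minimal generic version, not the authors' implementation; the variables, categories, target shares, and function names are hypothetical.

```python
def weighted_share(records, weights, variable, category):
    """Weighted proportion of records in one category of a variable."""
    in_category = sum(w for r, w in zip(records, weights) if r[variable] == category)
    return in_category / sum(weights)

def rake(records, weights, targets, iterations=100, tol=1e-10):
    """Iterative proportional fitting of weights to target margins.

    records: list of dicts, one per respondent.
    targets: {variable: {category: target share}}, shares summing to 1.
    """
    weights = list(weights)
    for _ in range(iterations):
        max_change = 0.0
        for variable, shares in targets.items():
            total = sum(weights)
            current = {}
            for rec, w in zip(records, weights):
                current[rec[variable]] = current.get(rec[variable], 0.0) + w
            for i, rec in enumerate(records):
                category = rec[variable]
                # Scale so this category's weighted share equals its target.
                factor = shares[category] * total / current[category]
                max_change = max(max_change, abs(factor - 1))
                weights[i] *= factor
        if max_change < tol:
            break
    return weights

if __name__ == "__main__":
    # Hypothetical subsample that over-represents people with symptoms.
    sample = (
        [{"symptoms": "yes", "region": "A"}] * 3
        + [{"symptoms": "yes", "region": "B"}] * 2
        + [{"symptoms": "no", "region": "A"}] * 2
        + [{"symptoms": "no", "region": "B"}] * 3
    )
    margins = {"symptoms": {"yes": 0.3, "no": 0.7}, "region": {"A": 0.6, "B": 0.4}}
    raked = rake(sample, [1.0] * len(sample), margins)
    print(weighted_share(sample, raked, "symptoms", "yes"))
```

Keeping the number of raking variables small, as the text describes, limits how far the weights are stretched and therefore how much precision is lost.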