Abstract
Susceptible-Exposed-Infected-Removed (SEIR)-type epidemiologic models, modeling unascertained infections latently, can predict unreported cases and deaths assuming perfect testing. We apply a method we developed to account for the high false negative rates of diagnostic RT-PCR tests for detecting an active SARS-CoV-2 infection in a classic SEIR model. The number of unascertained cases and false negatives being unobservable in a real study, population-based serosurveys can help validate model projections. Applying our method to training data from Delhi, India, during March 15–June 30, 2020, we estimate the underreporting factor for cases at 34–53 (deaths: 8–13) on July 10, 2020, largely consistent with the findings of the first round of serosurveys for Delhi (done during June 27–July 10, 2020) with an estimated 22.86% IgG antibody prevalence, yielding estimated underreporting factors of 30–42 for cases. Together, these imply that approximately 96–98% of cases in Delhi remained unreported (July 10, 2020). Updated calculations using training data during March 15–December 31, 2020 yield an estimated underreporting factor for cases of 13–22 (deaths: 3–7) on January 23, 2021, which is again consistent with the latest (fifth) round of serosurveys for Delhi (done during January 15–23, 2021) with an estimated 56.13% IgG antibody prevalence, yielding an estimated range for the underreporting factor for cases of 17–21. Together, these updated estimates imply that approximately 92–96% of cases in Delhi remained unreported (January 23, 2021). Such model-based estimates, updated with the latest data, provide a viable alternative to repeated resource-intensive serosurveys for tracking unreported cases and deaths and gauging the true extent of the pandemic.
Introduction
COVID-19 was first diagnosed in Wuhan, China in December 2019 and was declared a pandemic by the World Health Organization on March 11, 2020^{1}. The first case in India was reported on January 30, 2020, and as of April 4, 2021, there have been 12,587,921 cases and 165,132 deaths reported^{2}. India responded quickly, instituting a nationwide lockdown on March 25, 2020, when there were only 657 cases and 11 deaths^{2,3}. Epidemiologic models can be used to monitor disease rates and inform public health interventions, but data quality will impact the ability of models to make accurate predictions. Underreporting of cases and deaths attributable to SARS-CoV-2 infection has hindered modeling efforts. This underreporting is primarily due to limited testing, deficiencies in the reporting infrastructure and a large number of asymptomatic infections.
Classical epidemiologic models, such as the Susceptible-Exposed-Infected-Removed (SEIR) compartmental model, have been used to predict the trajectory of the COVID-19 pandemic. For example, a modification of the standard SEIR model applied to Wuhan data, accounting for pre-symptomatic infectiousness, time-varying ascertainment rates, transmission rates and population, identified that the outbreak had high covertness and high transmissibility^{4}. This work estimated that 87% (with a lower bound of 53%) of the infections in Wuhan before March 8 were unascertained^{4}. However, traditional SEIR models do not account for imperfect testing^{5,6,7}. Individuals with a false negative diagnostic test will also remain unascertained and contribute to the compartment of latent unreported cases in a SEIR model.
It is important to clarify that two classes of tests are discussed in the literature and are relevant to this paper: diagnostic tests and antibody tests. A diagnostic test (typically an RT-PCR test) is used to identify the presence of SARS-CoV-2, indicating an active infection^{8}. An antibody test (i.e., a serology test) looks for the presence of antibodies, the body’s immune response to fight off SARS-CoV-2, indicating a past infection^{9}. Figure 1 presents a timeline in terms of when these tests are administered during the course of an infection. Due to a large number of asymptomatic cases and a limited number of tests, many infections do not get detected. Population-based seroprevalence surveys, therefore, give us an idea about the “true” number of infections including reported and unreported cases, and consequently, the ascertainment rate^{10}. Thus, adjusted estimates of the total number of cases and ascertainment rates based on serological surveys, when available, provide an option to validate model-based estimates of unreported cases and ascertainment rates. These estimates would usually be impossible to validate (except in a simulation study) since these numbers are not observable in the real data.
Both diagnostic and antibody tests suffer from the issue of false negatives and false positives. For the RT-PCR test, a false negative is more worrisome since that means giving an infected person a false assurance of safety. In contrast, a false positive from an antibody test is of greater concern, since it gives the false impression that the person has been infected in the past, has gained some protection from the virus, and is unlikely to be infected again. The RT-PCR test is quoted to have a high false negative rate, ranging from 15 to 30% (i.e., low sensitivity, 70–85%), and a low false positive rate of around 1–4% (i.e., high specificity, 96–99%)^{11}. The antibody test assays are more precise: the commercial assays have sensitivity around 97.6% and specificity of 99.3% (DiaSorin) at about 15 days after infection^{10}.
To address these data quality issues and the high rate of asymptomatic COVID-19 cases, we develop an extension to a standard SEIR model incorporating false negative rates in diagnostic testing to predict the numbers of unreported cases and deaths and to estimate the rate at which COVID-19 cases and deaths are being underreported (unascertained). Our method segregates the traditional infected compartment into tested/untested and true positive/false negative compartments, thus accounting directly for misclassifications due to imperfections in the RT-PCR diagnostic tests. We apply this false negative-adjusted SEIR model to predict the transmission dynamics of SARS-CoV-2 in Delhi, the national capital region of India and one of the hotspots of COVID-19 in the country, using data from March 15 to June 30, 2020 for our original set of calculations and an extended range from March 15 to December 31, 2020 for an updated set of calculations. We make predictions across a range of possible sensitivities for the diagnostic test, all assuming perfect specificity.
To understand the true extent of spread of the novel coronavirus, the National Centre for Disease Control (NCDC) in India has performed five rounds of serological surveys in Delhi, among several such studies conducted across the world (Table 1)^{12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,44}. While limited on reported details, the first round of the Delhi Serology Study collected 21,387 random samples across 11 districts in Delhi between June 27 and July 10, 2020 and found COVID-19 antibodies present in 22.86% of samples^{12,18}. This seroprevalence is the highest among the studies till July 2020 summarized in Table 1 but is similar to that found in New York City (22.70%), another large, densely populated area^{42}. This indicates that Delhi, during July 2020, had high seroprevalence, even compared to worldwide epicenters and hotspots of COVID-19. The fifth and latest round of the Delhi serology study, which collected 28,000 random samples across all the 272 municipal wards of Delhi between January 15 and 23, 2021, found an even higher seroprevalence, estimated at 56.13%^{20}. This seroprevalence is the third highest among those summarized in Table 1, falling only behind the serosurvey in the UK and that in Paris, France (among the working population)^{31,39}. These numbers show that Delhi continued to have high seroprevalence among all the COVID-19 hotspots worldwide even in very recent times.
A simple proportional estimate based on these reported seroprevalences would tell us that Delhi, with approximately 19.8 million people, had around 4.6 million cumulative cases by July 10, 2020, and around 11.1 million cumulative cases by January 23, 2021. These numbers contrast sharply with the 109,140 cumulative cases (3,300 total deaths) reported in Delhi as of July 10, 2020 and the 633,739 cumulative cases (10,799 total deaths) as of January 23, 2021, which represent, respectively, approximately 0.55% and 3.20% of Delhi’s population. This disparity suggests that only about 2.4% of cases were being detected (underreporting factor of about 42) as of July 10, 2020, and that, as of January 23, 2021, that percentage had improved to only 5.7% (underreporting factor of about 18). The seroprevalence estimate also implies that the infection fatality rate (IFR) for Delhi was of the order of 0.07% or 717 per million as of July 10, 2020, which updates to 0.10% as of January 23, 2021. This IFR seems low compared to estimates worldwide^{45} and as such it may be reasonable to argue that some COVID-related deaths may also be unreported, or the cause of death misclassified. Uncertainty regarding the reporting of death data is further supported by the very small fraction of deaths in India that are medically reported^{46} and by the fact that the IFR estimates for SARS-CoV-2 from other studies in the world^{45} appear to be higher than that of influenza (as of 2018–19, the infection fatality rate of influenza is 961 per million or around 0.1%)^{47}.
The availability of several rounds of seroprevalence estimates from the Delhi serology study provides a unique opportunity to validate model-predicted rates of latent unreported infections for our proposed false negative-adjusted SEIR model. The ELISA assay used in the Delhi serosurvey is a customized assay, some discussions on the development and imperfections of which are available both in recent literature and in the public media domain^{48,49}. Based on these known imperfections, we perform adjustments of the reported case counts/infection estimates under different sensitivity and specificity assumptions for both the diagnostic and antibody (Ab) tests and compare the model-based estimates for the extent of underreporting to those obtained from the seroprevalence-based calculations. Other derived metrics such as case fatality rates and infection fatality rates are also presented. We use reported COVID-19 case and death count data from covid19india.org^{2}. This framework can be adapted and applied to any set of reported case counts where imperfect and limited testing exists.
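The proportional estimate above is plain arithmetic, and a short sketch can make each step explicit. The function and key names below are illustrative assumptions, not taken from the authors' code; the inputs are the figures quoted in the text.

```python
# Worked version of the proportional estimate, using figures quoted in the text.
# Function and key names are illustrative, not taken from the authors' code.

POPULATION = 19.8e6  # approximate population of Delhi used in the text


def proportional_estimate(seroprevalence, reported_cases, reported_deaths,
                          population=POPULATION):
    """Scale seroprevalence to the population and derive simple ratios."""
    infections = seroprevalence * population
    return {
        "infections": infections,
        "underreporting_factor": infections / reported_cases,
        "fraction_detected": reported_cases / infections,
        "ifr": reported_deaths / infections,
    }


july_2020 = proportional_estimate(0.2286, 109_140, 3_300)
january_2021 = proportional_estimate(0.5613, 633_739, 10_799)

for label, est in [("July 10, 2020", july_2020),
                   ("January 23, 2021", january_2021)]:
    print(
        f"{label}: {est['infections'] / 1e6:.1f}M infections, "
        f"URF {est['underreporting_factor']:.0f}, "
        f"detected {est['fraction_detected']:.1%}, "
        f"IFR {est['ifr']:.2%}"
    )
```

With unrounded inputs the July 2020 figures come out slightly below the rounded values quoted in the text (roughly 4.5 million infections and a factor near 41), which is consistent with the text's use of "around" and "about".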
Results
Extended SEIR model adjusted for misclassification
Figure 2 provides a schematic diagram of the proposed false negative-adjusted SEIR model. Under low (0.7), medium (0.85), meta-analyzed (0.952)^{50} and perfect (1) sensitivity, and perfect (1) specificity assumptions for the RT-PCR diagnostic test, we predict the total (reported and unreported) cases and deaths for Delhi using the proposed extended SEIR model.
Using data through June 30, 2020, this model estimates 4.8 million cases and 33,165 deaths on July 10, 2020 if we assume the RT-PCR test has a sensitivity of 0.85, and those predicted counts become 4.2 million and 28,499, respectively, if the sensitivity is assumed to be 1.0 (Fig. 3a). In contrast, the reported case and death counts in Delhi are 109,140 and 3,300 as of July 10, 2020^{2}. Examining the ratio of the predicted total number of cases to the predicted number of reported cases on July 10, 2020, the estimated case underreporting factor is within the range of 34–53 and the same quantity for deaths is between 8 and 13 (Supplementary Table 1). According to this model, 97–98% of Delhi’s cases remain undetected as of July 10, 2020. The model predictions under the different scenarios considered and the results relative to daily reported/total case and death counts are summarized in Supplementary Figs. 1a and 2a.
Using data through December 31, 2020, this model estimates 10.2 million cases and 45,004 deaths on January 23, 2021 if we assume the RT-PCR test has a sensitivity of 0.85, and those predicted counts become 8.0 million and 34,949, respectively, if the sensitivity is assumed to be 1.0 (Fig. 3b). In contrast, the reported case and death counts in Delhi are 633,739 and 10,799 as of January 23, 2021^{2}. Examining the ratio of the predicted total number of cases to the predicted number of reported cases on January 23, 2021, the estimated case underreporting factor is within the range of 13–22 and the same quantity for deaths is between 3 and 7 (Supplementary Table 1). According to this model, 92–96% of Delhi’s cases remain undetected as of January 23, 2021. The model predictions under the different scenarios considered and the results relative to daily reported/total case and death counts are summarized in Supplementary Figs. 1b and 2b.
Future projections and variation of the underreporting factor through the course of the pandemic
We extended our projections of unreported case counts and the underreporting factors prospectively. Our projections for August 15, 2020 predict between 6.5 and 9.6 million cumulative (reported and unreported) cases in Delhi (across low to high false negative rate scenarios for the diagnostic test) (Supplementary Table 1). This provides us with a range of 35–54 for the underreporting factors for cases and a range of 8–13 for underreporting of deaths on August 15, 2020 (Supplementary Table 1). The temporal changes in the daily estimated case underreporting factors throughout the course of the pandemic are another crucial feature captured by our projections, as can be seen in Supplementary Fig. 3a. For the low (0.7) sensitivity scenario, the estimated case underreporting factor is 34 on June 1, 2020, the beginning of the first unlock period. This increases to 49 for June 20, 2020, when the daily number of tests and reported cases both increased.
The updated set of projections for February 15, 2021 predict between 8.1 and 13.8 million cumulative (reported and unreported) cases in Delhi (across low to high false negative rate scenarios for the diagnostic test) (Supplementary Table 1). This provides us with a range of 13–22 for the underreporting factors for cases and a range of 3–7 for underreporting of deaths on February 15, 2021 (Supplementary Table 1). Notably, our projections indicate that the underreporting factor for total cases is approximately constant over the period of January 1 to March 15, 2021, as can be seen in Supplementary Fig. 3b. For the low (0.7) sensitivity scenario, the estimated case underreporting stays at 22 throughout this period, and for the perfect (1.0) sensitivity scenario, this number decreases to 13.
Naïve corrections to reported test results using known misclassification rates for tests
Since the total (reported and unreported) number of cases and subsequently, the underreporting factor, are not part of the observed data and therefore our SEIR model estimates cannot directly be validated, we validate these estimates using the estimated number of true infections predicted by the serosurvey data. However, the antibody tests are also imperfect and as such we also correct the seroprevalence estimates for imperfect testing.
Using varying (low to perfect) sensitivities and specificities for the diagnostic and antibody tests, we estimate that the true case count in Delhi as of July 10, 2020, lies between 4.4 and 4.6 million, which represents 30 to 42 times the number of reported cases (Table 2a). These estimates strongly agree with the model-based findings reported in the previous subsection, indicating that 96–97% of cases in Delhi were unreported. Our updated estimate for the true case count in Delhi as of January 23, 2021 lies between 11.1 and 11.9 million, representing 17 to 21 times the number of reported cases (Table 2b). Again, these estimates are in agreement with the model-based estimates from the previous subsection, indicating that 94–95% of cases in Delhi remained unreported even as recently as January 23, 2021.
Case fatality rate (CFR) and infection fatality rate (IFR)
The sensitivity and specificity of the diagnostic test impact our estimate of the case-fatality rate (\(\frac{\#deaths}{\#reported cases}\)), but not the infection-fatality rate (\(\frac{\#deaths}{\#true infections}\)). We estimate that the CFR lies between 2.24 and 3.06% as of July 10, 2020 (Table 2a), and between 1.40 and 1.91% as of January 23, 2021 (Table 2b). On the other hand, the sensitivity and specificity of the antibody test impact our estimates of the IFR. We estimate that the IFR lies between 0.07 and 0.08% based on the reported death counts as of July 10, 2020, and between 0.09 and 0.10% based on that as of January 23, 2021 (Table 2).
If we consider the hypothetical scenario of tenfold underreporting of deaths, as suggested by the SEIR model outputs (a range of 8–13), the infection-fatality rate estimate increases to 0.7–0.8% for July 10, 2020 (Table 2a). The updated SEIR model outputs indicate a range of 3–7 for the underreporting factor for deaths, and assuming fivefold underreporting of deaths, the adjusted infection-fatality rate estimate lies between 0.4 and 0.5% for January 23, 2021 (Table 2b). We are not able to perform any validation for the estimated underreporting factor for deaths as we do not have estimates of true death rates or excess deaths.
Discussion
We developed an extension of the standard SEIR compartmental model to adjust for imperfect diagnostic testing. Applying our model to publicly available case and death count data for Delhi, we estimated the underreporting factor for cases to be somewhere between 34 and 53 and that for deaths to be somewhere between 8 and 13 on July 10, 2020 (with updated estimated ranges of 13–22 and 3–7 respectively on January 23, 2021). We obtained adjusted estimates of the underreporting factor for cases using the seroprevalence study (30–42 on July 10, 2020 and 17–21 on January 23, 2021), which largely agreed with those estimated from the model. Further, the estimated underreporting factors were seen to be more stable over an extended period of time with the new set of training data and testing period compared to the original calculations. Having an accurate idea about the underreporting factor and the extent of spread is extremely helpful in terms of tracking the growth of the pandemic and determining intervention policies. Since repeated serological surveys to track the ever-evolving seroconversion scenario are rarely viable options due to high expense in terms of cost, resources, and time, model estimates updated regularly with new incoming data provide an opportunity to monitor the underreporting factor and unreported cases and deaths.
Limitations
(1) Our SEIR model incorporates only false negatives of the diagnostic tests but not false positives. We are more concerned about false negatives as they give a false sense of safety to a patient and may increase the likelihood that the person will engage in activities that spread the disease. In addition, the false positive rates are quite low for PCR tests^{11}. (2) We have refrained from incorporating a time-varying recovery rate in our model for several reasons. First, recovery data from India is not quite accurate and there is often a “catch up” period. The definition of recovery (e.g., negative COVID test, no symptoms) is also variable. As such, this may induce more noise. Second, modeling recoveries better would change our estimate for “active” cases but does not affect what we consider in this paper, cumulative cases reported up to a given date. Third, including more time-varying parameters in the model will complicate the model further, and depending on the availability and quality of the recovery data, it may yield unstable/questionable fits. Finally, without directly considering the recovery rate to be time-varying, it is possible to effectively capture changes in the recovery rate by modifying one of the other parameters affecting the recovery rate, like the mortality rate, on which we have more data. For instance, one further generalized version of our model offers an option for a time-varying mortality rate, which has the potential to capture time-varying recovery^{51}. (3) We used the seroprevalence estimate as a parallel, independent way of validating our model findings. An alternative approach for using serosurvey data is to introduce quarantine and immune compartments in the model structure and assume that symptomatic individuals are identified and successfully isolated with a given average delay from the onset of symptoms and that recovered individuals are never susceptible to an infection again^{52}. We have not compared our method with this approach.
(4) The implications of any such model-based adjustments depend heavily upon the reliability of the reported seroprevalence information. To that end, it is important to mention that many pertinent details were not released publicly in the first and fifth (latest) phases of the Delhi NCDC serology survey, such as the response and positivity rates stratified by age, sex, job type, district; sampling design and so on. A single reported number for the seroprevalence (22.86% and 56.13% respectively for the 1st and the 5th Delhi serosurveys) without sufficient detail on the survey design and assay used has limited use. (5) We do not know if individuals with antibodies are protected from reinfection, how long this protection lasts, the antibody levels needed to protect us from reinfections^{53}, or whether a person with the antibody can still be contagious or show severe symptoms. The positive news from our estimates is that a large number of people in Delhi had the infection without feeling severe symptoms or needing clinical care.
Conclusion
There have been debates about the path towards achieving herd immunity in India. The estimated range for the herd immunity threshold lies within 44–73% (based on a worldwide estimated basic reproduction number of 1.8–3.8)^{54,55}. For Delhi, and possibly even more so for other parts of India, herd immunity seems to be attainable as of recent dates but is certainly not a panacea we can rely on. Even based on the IFR obtained without adjusting for potential death underreporting and trusting the reported death counts as of January 23, 2021 (Table 2), if 50% of the 1.38 billion people in India get infected (a concept that many proponents of herd immunity have suggested), this would imply an estimated 690,000 deaths. This estimate skyrockets to a staggering 3.0–3.5 million deaths if we believe the current estimated underreporting factor for death from our proposed model. Although we could not validate the estimated underreporting factor for death, the quality of the reported death data is questionable. For example, a mid-2020 study attempting to model COVID-19 fatalities stratified by age group indicates that at least 1500–2500 deaths in Delhi in the 60+ age group have not been reported^{56}. The high estimate of fatalities when adjusted for underreporting, along with this evidence for underreporting of deaths in India, calls for cautious actions, as India is beginning to see a second wave of the pandemic as recently as the beginning of April 2021^{57}. Strong policy decisions directed towards containment of the new surge in infections and logistically efficient vaccination strategies are the need of the hour in this regard.
The appearance and spread of COVID-19 have taken the entire world by storm, but a large number of examples from all across the world clearly show that we can change the narrative and course of this virus through extensive testing, contact tracing, use of masks, hand hygiene and social distancing. For example, Delhi has seen tremendous success in turning the corner of the virus curve, with the time-varying reproduction number staying below unity for the larger part of the period between September 2020 and February 2021 (Supplementary Figs. 4–5). This trend of improved containment, however, seems to have reversed in recent times, with the estimated time-varying reproduction number undergoing an alarming increase above unity during March–April 2021 (Supplementary Fig. 5). Several factors including public complacency, waning immunity that was acquired from past infections and the emergence of new variants may have contributed to this surge^{58}. The appearance of these escalated numbers also calls for closer inspection of the serosurvey-based estimates, since a \(>50\%\) seroprevalence and a spike in the number of new infections are theoretical antipodes in the context of a pandemic. Multiple potential reasons behind emerging biases in serosurvey estimates, including non-representative sampling and assay characteristics, have been discussed in recent literature, alongside possible ways of adjusting for such bias^{59,60}.
Rapid and significant scientific advancements in both clinical and public health interventions have been made over the past year^{61}. Data-driven policy decisions are crucial at this juncture. Our analytical framework for integrating diagnostic testing imperfections in the context of estimating unreported cases provides an alternative to conducting frequent serosurveys in Delhi. Validation of epidemiological model outputs against seroprevalence estimates inspires confidence in our inference and will hopefully prove to be a useful strategy for other case studies.
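The arithmetic behind these figures can be sketched briefly. The sketch assumes the standard herd immunity threshold formula, 1 − 1/R0, which the text does not state explicitly, and multiplies population, attack rate and IFR for the death projection; the function names are illustrative.

```python
def herd_immunity_threshold(r0):
    """Standard threshold 1 - 1/R0 (an assumption; not spelled out in the text)."""
    return 1.0 - 1.0 / r0


def projected_deaths(population, attack_rate, ifr, death_underreporting=1.0):
    """Deaths implied if a given share of the population is infected."""
    return population * attack_rate * ifr * death_underreporting


low = herd_immunity_threshold(1.8)
high = herd_immunity_threshold(3.8)

# IFR of about 0.10% (January 23, 2021) and 50% of 1.38 billion people infected.
baseline = projected_deaths(1.38e9, 0.5, 0.001)
# Same scenario under the hypothetical fivefold underreporting of deaths.
adjusted = projected_deaths(1.38e9, 0.5, 0.001, death_underreporting=5)

print(f"threshold range: {low:.1%} to {high:.1%}")
print(f"baseline deaths: {baseline:,.0f}; adjusted: {adjusted:,.0f}")
```

Using the lower end of the unadjusted IFR range (0.09%) with the same fivefold factor gives roughly 3.1 million, which together with the value above spans the 3.0–3.5 million range quoted.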
Methods
Extended SEIR model adjusted for misclassification
We developed an extension of a standard SEIR model. In this model, the susceptible individuals (S) become exposed (E) when they are infected. After a latency period, exposed individuals are able to infect other susceptible individuals and are either tested (T) with probability \(r\) or untested (U) with probability \(1-r\). Tested individuals enter either the false negative compartment (F) with probability \(f\) or the (true) positive compartment (P) with probability \(1-f\). Individuals who are in the untested and the false negative compartments are considered unreported COVID-19 cases and enter either the recovered unreported (RU) or death unreported (DU) compartments. Similarly, those who tested positive move to either a recovered reported (RR) or death reported (DR) compartment. Figure 2 represents the SEIR model schematic, with arrows representing the possible transitions individuals in each compartment can undergo. The corresponding system of differential equations is presented below. The parameters and their initialization values used are described in Supplementary Table 2.

\(\frac{\partial S}{\partial t}=-\beta \frac{S(t)}{N}\left({\alpha }_{P}P(t)+{\alpha }_{U}U(t)+F(t)\right)+\lambda -\mu S(t).\)

\(\frac{\partial E}{\partial t}=\beta \frac{S(t)}{N}\left({\alpha }_{P}P(t)+{\alpha }_{U}U(t)+F(t)\right)-\frac{E(t)}{{D}_{e}}-\mu E(t).\)

\(\frac{\partial U}{\partial t}=\frac{(1-r)E(t)}{{D}_{e}}-\frac{U(t)}{{\beta }_{1}{D}_{r}}-{\delta }_{1}{\mu }_{c}U(t)-\mu U(t).\)

\(\frac{\partial P}{\partial t}=\frac{r(1-f)E(t)}{{D}_{e}}-\frac{P(t)}{{D}_{r}}-{\mu }_{c}P(t)-\mu P(t).\)

\(\frac{\partial F}{\partial t}=\frac{rfE(t)}{{D}_{e}}-\frac{{\beta }_{2}F(t)}{{D}_{r}}-\frac{{\mu }_{c}F(t)}{{\delta }_{2}}-\mu F(t).\)

\(\frac{\partial RU}{\partial t}=\frac{U(t)}{{\beta }_{1}{D}_{r}}+\frac{{\beta }_{2}F(t)}{{D}_{r}}-\mu RU(t).\)

\(\frac{\partial RR}{\partial t}=\frac{P(t)}{{D}_{r}}-\mu RR(t).\)

\(\frac{\partial DU}{\partial t}={\delta }_{1}{\mu }_{c}U(t)+\frac{{\mu }_{c}F(t)}{{\delta }_{2}}.\)

\(\frac{\partial DR}{\partial t}={\mu }_{c}P(t).\)
Here, \(X(t)\) denotes the number of individuals in the compartment of interest \(X\) at time \(t\). Based on this set of differential equations, we calculate the basic reproduction number of the proposed model using the Next Generation Matrix Method^{62}. The expression for \({R}_{0}\) turns out to be the following:
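As a derivation sketch (an assumption-laden reconstruction from the system of equations, not a quotation of the authors' displayed formula): take E, U, P and F as the infected compartments, let new infections enter only E, and replace \(\frac{S(t)}{N}\) by its disease-free value \({S}_{0}\). The next generation matrix then has a single nonzero eigenvalue, which is the transmission rate multiplied by the probability of surviving the latent stage and by the expected infectiousness-weighted time spent in each of U, P and F:

\({R}_{0}=\frac{\beta {S}_{0}}{1+\mu {D}_{e}}\left(\frac{{\alpha }_{U}(1-r)}{\frac{1}{{\beta }_{1}{D}_{r}}+{\delta }_{1}{\mu }_{c}+\mu }+\frac{{\alpha }_{P}r(1-f)}{\frac{1}{{D}_{r}}+{\mu }_{c}+\mu }+\frac{rf}{\frac{{\beta }_{2}}{{D}_{r}}+\frac{{\mu }_{c}}{{\delta }_{2}}+\mu }\right).\)

Each denominator is the total exit rate of the corresponding compartment in the differential equations, and each numerator is the fraction of exposed individuals routed into that compartment, weighted by its relative infectiousness.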
Here, \({S}_{0}=\frac{\lambda }{\mu }=1\), since we have assumed the natural birth and death rates to be equal within this short period of time. In this setting, both \(\beta\) and \(r\) are time-varying parameters which are estimated using the Metropolis–Hastings MCMC method^{63}. To estimate the parameters, we first need to be able to solve the differential equations, which is difficult to perform in this continuous-time setting. It is also worth noting that we do not require the values of the variables for each time point. Instead, we only need their values at discrete time steps, i.e., for each day. Thus, we approximate the above set of differential equations by a set of recurrence relations. For any compartment \(X\), the instantaneous rate of change with respect to time \(t\) (given by \(\frac{\partial X}{\partial t}\)) is approximated by the difference between the counts of that compartment on the \({\left(t+1\right)}^{th}\) day and the \({t}^{th}\) day, that is \(X(t+1)-X(t)\). Starting with an initial value for each of the compartments on Day 1 and using the discrete-time recurrence relations, we can then obtain the solutions of interest. Some examples of these discrete-time recurrence relations are presented below.

\(E(t+1)-E(t)=\beta \frac{S(t)}{N}\left({\alpha }_{P}P(t)+{\alpha }_{U}U(t)+F(t)\right)-\frac{E(t)}{{D}_{e}}-\mu E(t),\)

\(U(t+1)-U(t)=\frac{(1-r)E(t)}{{D}_{e}}-\frac{U(t)}{{\beta }_{1}{D}_{r}}-{\delta }_{1}{\mu }_{c}U(t)-\mu U(t),\)

\(P(t+1)-P(t)=\frac{r(1-f)E(t)}{{D}_{e}}-\frac{P(t)}{{D}_{r}}-{\mu }_{c}P(t)-\mu P(t),\)

\(F(t+1)-F(t)=\frac{rfE(t)}{{D}_{e}}-\frac{{\beta }_{2}F(t)}{{D}_{r}}-\frac{{\mu }_{c}F(t)}{{\delta }_{2}}-\mu F(t).\)
The rest of the differential equations can each be similarly approximated by a discrete-time recurrence relation. These parameters are estimated using training data from Delhi from March 15 to June 30, 2020 for our first set of analyses, and from March 15 to December 31, 2020 for our updated set of analyses. The training data were divided into multiple periods, in accordance with the lockdown and unlock procedures employed by the government of India, as described in Supplementary Table 3. Using these, we obtained predictions for dates ranging from June 1 through August 15, 2020, for the first set of analyses, and from January 1 through March 15, 2021 for the updated set of analyses. Since we used an MCMC algorithm to estimate the parameters and the posterior means of the compartment sizes, it is easy to obtain empirical posterior credible intervals based on the full set of MCMC draws to quantify the uncertainty associated with these estimates and projections. However, we deliberately refrained from reporting the uncertainty estimates in this paper to avoid intricacies in presentation of the results that may obscure the central message. Further, we assumed the RT-PCR test specificity to be 1 and did not incorporate false positives arising from the diagnostic test to avoid additional assumptions for model identifiability.
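The daily recurrence relations can be sketched as a simple forward iteration. This is a minimal illustration only: every parameter value below is a hypothetical placeholder rather than a value from Supplementary Table 2, \(\beta\) and \(r\) are held constant instead of being estimated by MCMC, and births and natural deaths are set to zero so that the population is closed.

```python
from dataclasses import dataclass


@dataclass
class Params:
    # All values below are hypothetical placeholders for illustration.
    beta: float = 0.5      # transmission rate
    r: float = 0.2         # probability of being tested, as in the equations
    f: float = 0.15        # false negative probability (1 - sensitivity)
    alpha_p: float = 0.5   # relative infectiousness of true positives
    alpha_u: float = 0.7   # relative infectiousness of the untested
    beta_1: float = 0.6    # scales recovery time of the untested
    beta_2: float = 0.7    # scales recovery rate of false negatives
    delta_1: float = 0.3   # scales death rate of the untested
    delta_2: float = 0.7   # scales death rate of false negatives
    d_e: float = 5.2       # latency period in days
    d_r: float = 17.8      # infectious period in days
    mu_c: float = 0.002    # daily death rate among reported cases
    lam: float = 0.0       # births per day (closed population here)
    mu: float = 0.0        # natural death rate (closed population here)


NAMES = ("S", "E", "U", "P", "F", "RU", "RR", "DU", "DR")


def step(x, p, n):
    """Advance all nine compartments by one day."""
    new_inf = p.beta * x["S"] / n * (
        p.alpha_p * x["P"] + p.alpha_u * x["U"] + x["F"])
    out_e = x["E"] / p.d_e
    rec_u = x["U"] / (p.beta_1 * p.d_r)
    rec_p = x["P"] / p.d_r
    rec_f = p.beta_2 * x["F"] / p.d_r
    die_u = p.delta_1 * p.mu_c * x["U"]
    die_p = p.mu_c * x["P"]
    die_f = p.mu_c * x["F"] / p.delta_2
    return {
        "S": x["S"] - new_inf + p.lam - p.mu * x["S"],
        "E": x["E"] + new_inf - out_e - p.mu * x["E"],
        "U": x["U"] + (1 - p.r) * out_e - rec_u - die_u - p.mu * x["U"],
        "P": x["P"] + p.r * (1 - p.f) * out_e - rec_p - die_p - p.mu * x["P"],
        "F": x["F"] + p.r * p.f * out_e - rec_f - die_f - p.mu * x["F"],
        "RU": x["RU"] + rec_u + rec_f - p.mu * x["RU"],
        "RR": x["RR"] + rec_p - p.mu * x["RR"],
        "DU": x["DU"] + die_u + die_f,
        "DR": x["DR"] + die_p,
    }


def simulate(days, n=1_000_000, initial_exposed=100, params=None):
    """Iterate the recurrences from a small seed of exposed individuals."""
    p = params or Params()
    x = dict.fromkeys(NAMES, 0.0)
    x["S"], x["E"] = n - initial_exposed, float(initial_exposed)
    for _ in range(days):
        x = step(x, p, n)
    return x


def underreporting_factor(x):
    """Ratio of total (reported + unreported) to reported cumulative cases."""
    reported = x["P"] + x["RR"] + x["DR"]
    total = reported + x["U"] + x["F"] + x["RU"] + x["DU"]
    return total / reported


final = simulate(days=120)
print(f"URF after 120 days: {underreporting_factor(final):.1f}")
```

In this sketch the underreporting factor is computed the way the Results describe it, as the ratio of predicted total cases to predicted reported cases; fitting \(\beta\) and \(r\) per lockdown period would replace the fixed values used here.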
Naïve corrections to reported test results using known misclassification rates
Notations: Let N = population size, X = number of true cases in the population (hence N – X = number of non-cases in the population), T = number of people tested, S = number of true cases tested (hence T – S = number of non-cases tested, X – S = number of true cases not tested, N – X – T + S = number of non-cases not tested), P = number of positive tests (also, therefore, cumulative number of reported cases, hence T – P = number of negative tests). Note that X and S are the only two unknowns in this setting. Also, let us assume that the sensitivity of the test of interest is \(\alpha\) and the specificity of the same is \(\beta\). With that, we can set up the following equation, because there are two ways a test can be positive (a true case correctly detected, or a non-case falsely flagged), as can be seen in Supplementary Fig. 6.

\(P=\alpha S+(1-\beta )(T-S).\)
Adjusting the terms, we get the following expression for \(S\).

\(S=\frac{P-(1-\beta )T}{\alpha +\beta -1}.\)
Assuming that the proportion of cases among those tested stays the same as in the original population (random and hence homogeneous testing), we can replace \(S\) by \(\frac{TX}{N}\), which will lead to the following updated equation.

\(P=\alpha \frac{TX}{N}+(1-\beta )\left(T-\frac{TX}{N}\right).\)
Solving this, we get the following expression for \(X\).

\(X=N\,\frac{\frac{P}{T}-(1-\beta )}{\alpha +\beta -1}.\)
Thus, for a given pair of \(\alpha\) and \(\beta\), these two expressions give us the corrected number of reported cases (\(S\)) and the estimated number of true (reported and unreported) cases (\(X\)). For the computation of \(S\), we use \(\frac{P}{T}=\frac{109{,}140}{747{,}109}\approx 0.146\), the test positive rate of the RT-PCR tests in Delhi as of July 10, 2020^{2}. For the computation of \(X\), we use \(\frac{P}{T}=\frac{4{,}889}{21{,}387}\approx 0.229\), the positive rate reported by the first round of the Delhi serological survey^{12,13,14}. For the updated analysis based on more recent data, these numbers are updated to \(\frac{644{,}064}{10{,}289{,}461}\approx 0.063\) and \(\frac{15{,}716}{28{,}000}\approx 0.561\), respectively. Once we have these estimates, we can compute the adjusted underreporting factor as \(URF=\frac{X}{S}\). Letting \(D\) denote the cumulative number of deaths up to a date of interest, we can compute the corrected case fatality rate and infection fatality rate as \(CFR=\frac{D}{S}\) and \(IFR=\frac{D}{X}\), respectively. Further, to adjust for a potential scenario in which only 1 of every \(M\) deaths due to COVID-19 is observed (\(M\)-fold underreporting of deaths), we can update the IFR estimate as \(IFR=\frac{MD}{X}\). We calculate our adjusted IFR estimates with \(M=10\) for the July 10, 2020 computations and \(M=5\) for the January 23, 2021 computations. Based on the data from Delhi, we use \(D=3{,}300\) for July 10, 2020, and \(D=10{,}994\) for January 23, 2021^{2}. We also use a population size of \(N=1.98\times {10}^{7}\) based on recent population data^{64}, since the last official census in India was performed in 2011 and the number reported there may not be representative of the current population.
A critical question here is the choice of \(\alpha\) and \(\beta\) for the two tests, so that our adjustments reflect sensible and realistic scenarios. Based on previously reported sensitivity and specificity levels for the diagnostic test^{10,49}, we used the combinations \(\alpha =\beta =1\) (perfect test), \(\alpha =0.952\) and \(\beta =0.99\), \(\alpha =0.85\) and \(\beta =0.99\), and \(\alpha =0.7\) and \(\beta =0.99\). The serological assay used by NCDC is a customized assay; drawing on existing literature and publicly available discussions of this particular assay, alongside literature on serological assays in general^{48,49}, we decided to use the combinations \(\alpha =\beta =1\) (perfect test), \(\alpha =0.976\) and \(\beta =0.993\), and \(\alpha =0.92\) and \(\beta =0.97\).
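The naive correction described in this section can be sketched in a few lines of Python. The sketch assumes the two expressions take the standard misclassification-correction form implied by the notation above, namely \(S=\frac{P-(1-\beta )T}{\alpha +\beta -1}\) and \(X=N\frac{P/T-(1-\beta )}{\alpha +\beta -1}\). The function names are hypothetical, and the inputs are the July 10, 2020 figures quoted in the text with one (\(\alpha\), \(\beta\)) pair per test taken from the combinations listed above. It is an illustration under these assumptions, not the authors' code.

```python
def corrected_reported_cases(P, T, alpha, beta):
    """Corrected number of true cases among those tested (S).

    Assumed form: S = (P - (1 - beta) * T) / (alpha + beta - 1).
    """
    return (P - (1.0 - beta) * T) / (alpha + beta - 1.0)


def estimated_true_cases(positive_rate, N, alpha, beta):
    """Estimated number of true cases in the population (X).

    Assumed form: X = N * (P/T - (1 - beta)) / (alpha + beta - 1).
    """
    return N * (positive_rate - (1.0 - beta)) / (alpha + beta - 1.0)


# Inputs quoted in the text for July 10, 2020.
N = 1.98e7                        # population size of Delhi
D = 3300                          # cumulative reported deaths
M = 10                            # assumed M-fold underreporting of deaths
P_PCR, T_PCR = 109_140, 747_109   # RT-PCR positives and tests
SERO_RATE = 4_889 / 21_387        # serosurvey positive rate

# One (alpha, beta) pair per test, chosen from the combinations in the text.
S = corrected_reported_cases(P_PCR, T_PCR, alpha=0.85, beta=0.99)
X = estimated_true_cases(SERO_RATE, N, alpha=0.976, beta=0.993)

urf = X / S               # underreporting factor for cases
cfr = D / S               # corrected case fatality rate
ifr = D / X               # corrected infection fatality rate
ifr_adjusted = M * D / X  # IFR adjusted for M-fold death underreporting

print(f"S = {S:.0f}, X = {X:.0f}, URF = {urf:.1f}")
print(f"CFR = {cfr:.4f}, IFR = {ifr:.5f}, adjusted IFR = {ifr_adjusted:.4f}")
```

With a perfect test (\(\alpha =\beta =1\)) the two functions reduce to \(S=P\) and \(X=N\frac{P}{T}\), which is a convenient sanity check before trying the other sensitivity and specificity combinations.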
Data availability
All data used in our analyses are available at http://covind19.org.
Code availability
All our computational codes are available at http://covind19.org.
Change history
20 August 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41598-021-96603-1
References
Hui, D. et al. The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health: The latest 2019 novel coronavirus outbreak in Wuhan, China. Int. J. Infect. Dis. 91, 264–266 (2020).
Coronavirus in India: Latest Map and Case Count. Covid19india.org. https://covid19india.org/ (2020).
Chauhan, N. After Covid-19 lockdown, plan to unlock India in phases. In Hindustan Times. https://www.hindustantimes.com/indianews/afterlockdownplantounlockindiainphases/storyvsK1wGQ7moLTMjlKkUelHP.html (2020).
Hao, X. et al. Reconstruction of the full transmission dynamics of COVID-19 in Wuhan. Nature https://doi.org/10.1038/s4158602025548 (2020).
Godio, A., Pace, F. & Vergnano, A. SEIR modeling of the Italian epidemic of SARS-CoV-2 using computational swarm intelligence. Int. J. Environ. Res. Public Health 17, 3535 (2020).
Li, J. & Cui, N. Dynamic analysis of an SEIR model with distinct incidence for exposed and infectives. Sci. World J. 2013, 1–5 (2013).
Zhang, J., Li, J. & Ma, Z. Global dynamics of an SEIR epidemic model with immigration of different compartments. Acta Math. Sci. 26, 551–567 (2006).
Udugama, B. et al. Diagnosing COVID-19: The disease and tools for detection. ACS Nano 14, 3822–3835 (2020).
Peeling, R. et al. Serology testing in the COVID-19 pandemic response. Lancet Infect. Dis. https://doi.org/10.1016/s14733099(20)30517x (2020).
Tran, N., Cohen, S., Waldman, S. & May, L. Review of COVID-19 testing methods. In Laboratory Best Practice Blog. https://blog.ucdmc.ucdavis.edu/labbestpractice/index.php/2020/06/16/reviewofcovid19testingmethods/ (2020).
Woloshin, S., Patel, N. & Kesselheim, A. False negative tests for SARS-CoV-2 infection: Challenges and implications. N. Engl. J. Med. https://doi.org/10.1056/nejmp2015897 (2020).
Saxena, A. Explained: Here are the key takeaways from Delhi’s serological survey. In The Indian Express. https://indianexpress.com/article/explained/delhiserologicalsurveyshowsantibodiesin23participantswhatdoesthismean6516512/ (2020).
Murhekar, M. et al. Prevalence of SARS-CoV-2 infection in India: Findings from the national serosurvey, May–June 2020. Indian J. Med. Res. 152, 48 (2020).
Indian Council for Medical Research. ICMR second serosurvey for SARS-CoV-2 infection. In Static.pib.gov.in. https://static.pib.gov.in/WriteReadData/userfiles/Modified%20ICMR_SecondSerosurvey_MMSP%20(1).pdf (2021).
ICMR sero survey: One in five Indians exposed to Covid-19. In BBC News. https://www.bbc.com/news/worldasiaindia55945382 (2021).
Selvaraju, S. et al. Population-based serosurvey for severe acute respiratory syndrome coronavirus 2 transmission, Chennai, India. Emerg. Infect. Dis. 27, 586–589 (2021).
Gupta, R. et al. Seroprevalence of antibodies to SARS-CoV-2 in healthcare workers & implications of infection control practice in India. Indian J. Med. Res. 153, 207 (2021).
Babu, N. Percentage of people with antibodies high, shows Delhi serological survey. In The Hindu. https://www.thehindu.com/news/cities/Delhi/percentageofpeoplewithantibodieshigh/article32156162.ece (2020).
Sharma, N. et al. The seroprevalence and trends of SARS-CoV-2 in Delhi, India: A repeated population-based seroepidemiological study. medRxiv. https://doi.org/10.1101/2020.12.13.20248123 (2020).
Goswami, S. Delhi’s 5th sero survey: Over 56% people have antibodies against Covid-19. In Hindustan Times. https://www.hindustantimes.com/cities/delhinews/delhis5thserosurveyover56peoplehaveantibodiesagainstcovid19101612264534349.html (2021).
Mohanan, M., Malani, A., Krishnan, K. & Acharya, A. Prevalence of SARS-CoV-2 in Karnataka, India. JAMA 325, 1001 (2021).
Department of Health & Family Welfare, Government of Kerala. Technical paper COVID 19: ICMR—Serological surveillance report round 3. In Health.kerala.gov.in. https://health.kerala.gov.in/pdf/TechnicalpaperCOVID19SeroSurveillanceRound3ICMR.pdf (2021).
Barnagarwala, T. Coronavirus: What Mumbai serosurvey shows about gender differences in infection, mortality and herd immunity. In The Indian Express. https://indianexpress.com/article/explained/mumbaisserosurveywhatitshowsaboutgenderdifferencesininfectionmortalityandherdimmunity6529186/ (2020).
Ghose, A. et al. Community prevalence of antibodies to SARS-CoV-2 and correlates of protective immunity in an Indian metropolitan city. medRxiv. https://doi.org/10.1101/2020.11.17.20228155 (2020).
Malani, A. et al. SARS-CoV-2 seroprevalence in Tamil Nadu in October–November 2020. medRxiv. https://doi.org/10.1101/2021.02.03.21250949 (2021).
Hallal, P. et al. SARS-CoV-2 antibody prevalence in Brazil: Results from two successive nationwide serological household surveys. Lancet Glob. Health 8, e1390–e1398 (2020).
Xu, X. et al. Seroprevalence of immunoglobulin M and G antibodies against SARS-CoV-2 in China. Nat. Med. https://doi.org/10.1038/s4159102009496 (2020).
Public Health England. Weekly coronavirus disease 2019 (COVID-19) surveillance report: Summary of COVID-19 surveillance systems. In Assets.publishing.service.gov.uk. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/888254/COVID19_Epidemiological_Summary_w22_Final.pdf (2021).
Ward, H. et al. SARS-CoV-2 antibody prevalence in England following the first peak of the pandemic. Nat. Commun. 12 (2021).
Vu, S. et al. Prevalence of SARS-CoV-2 antibodies in France: Results from nationwide serological surveillance. medRxiv. https://doi.org/10.1101/2020.10.20.20213116 (2020).
Roederer, T. et al. Seroprevalence and risk factors of exposure to COVID-19 in homeless people in Paris, France: A cross-sectional study. Lancet Public Health 6, e202–e209 (2021).
Korth, J. et al. SARS-CoV-2-specific antibody detection in healthcare workers in Germany with direct contact to COVID-19 patients. J. Clin. Virol. 128, 104437 (2020).
Poustchi, H. et al. SARS-CoV-2 antibody seroprevalence in the general population and high-risk occupational groups across 18 cities in Iran: A population-based cross-sectional study. Lancet Infect. Dis. 21, 473–481 (2021).
Shakiba, M. et al. Seroprevalence of COVID-19 virus infection in Guilan province, Iran. https://doi.org/10.1101/2020.04.26.20079244 (2020).
Doi, A. et al. Estimation of seroprevalence of novel coronavirus disease (COVID-19) using preserved serum at an outpatient setting in Kobe, Japan: A cross-sectional study. https://doi.org/10.1101/2020.04.26.20079822 (2020).
Pollán, M. et al. Prevalence of SARS-CoV-2 in Spain (ENE-COVID): A nationwide, population-based seroepidemiological study. Lancet https://doi.org/10.1016/s01406736(20)314835 (2020).
Public Health Agency Sweden. Första Resultaten Från Pågående Undersökning av Antikroppar för Covid-19-Virus. (2020).
Stringhini, S. et al. Seroprevalence of anti-SARS-CoV-2 IgG antibodies in Geneva, Switzerland (SEROCoV-POP): A population-based study. Lancet https://doi.org/10.1016/s01406736(20)313040 (2020).
Gaskell, K. et al. Extremely high SARS-CoV-2 seroprevalence in a strictly-Orthodox Jewish community in the UK. medRxiv. https://doi.org/10.1101/2021.02.01.21250839 (2021).
Angulo, F., Finelli, L. & Swerdlow, D. Estimation of US SARS-CoV-2 infections, symptomatic infections, hospitalizations, and deaths using seroprevalence surveys. JAMA Netw. Open 4, e2033706 (2021).
Sood, N. et al. Seroprevalence of SARS-CoV-2-specific antibodies among adults in Los Angeles County, California, on April 10–11, 2020. JAMA 323, 2425 (2020).
Rosenberg, E. et al. Cumulative incidence and diagnosis of SARS-CoV-2 infection in New York. Ann. Epidemiol. https://doi.org/10.1016/j.annepidem.2020.06.004 (2020).
Ng, D. et al. SARS-CoV-2 seroprevalence and neutralizing activity in donor and patient blood from the San Francisco Bay Area. https://doi.org/10.1101/2020.05.19.20107482 (2020).
Bendavid, E. et al. COVID-19 antibody seroprevalence in Santa Clara County, California. https://doi.org/10.1101/2020.04.14.20062463 (2020).
Ioannidis, J. The infection fatality rate of COVID-19 inferred from seroprevalence data. https://doi.org/10.1101/2020.05.13.20101253 (2020).
Roy, L. Infected India: The true toll of coronavirus in the world’s 2nd-most populated country. In Forbes. https://www.forbes.com/sites/lipiroy/2020/06/25/infectedindiathetruetollofcoronavirusintheworlds2ndmostpopulatedcountry/#4cf904c850fb (2020).
Burden of Influenza. Centers for Disease Control and Prevention. https://www.cdc.gov/flu/about/burden/index.html (2020).
Sapkal, G. et al. Development of indigenous IgG ELISA for the detection of anti-SARS-CoV-2 IgG. Indian J. Med. Res. 151, 444 (2020).
The Print India. Serosurveys—Pure Science. https://www.facebook.com/1733495223546925/posts/3183938748502558/ (2020).
Dinnes, J. et al. Rapid, point-of-care antigen and molecular-based tests for diagnosis of SARS-CoV-2 infection. Cochrane Database Syst. Rev. https://doi.org/10.1002/14651858.cd013705 (2020).
Bhaduri, R. et al. Extending the susceptible-exposed-infected-removed (SEIR) model to handle the high false negative rate and symptom-based administration of Covid-19 diagnostic tests: SEIR-fansy. medRxiv. https://doi.org/10.1101/2020.09.24.20200238 (2020).
Mandal, S., Das, H., Deo, S. & Arinaminpathy, N. When to relax a lockdown? A modelling-based study of testing-led strategies coupled with sero-surveillance against SARS-CoV-2 infection in India. https://doi.org/10.1101/2020.05.29.20117010 (2020).
Kirkcaldy, R., King, B. & Brooks, J. COVID-19 and postinfection immunity. JAMA 323, 2245 (2020).
Britton, T., Ball, F. & Trapman, P. A mathematical model reveals the influence of population heterogeneity on herd immunity to SARS-CoV-2. Science https://doi.org/10.1126/science.abc6810 (2020).
Randolph, H. & Barreiro, L. Herd immunity: Understanding COVID-19. Immunity 52, 737–741 (2020).
Chakravarty, S. Estimating missing deaths in Delhi's COVID-19 data. https://doi.org/10.1101/2020.07.29.20164392 (2020).
Zargar, A. India sees record daily number of COVID infections as 2nd wave prompts tougher restrictions. In Cbsnews.com. https://www.cbsnews.com/news/indiacovid19recordcoronaviruscases2ndwavenewrestrictions/ (2021).
Kuchay, B. Why is India staring at a ‘second peak’ of COVID cases? In Aljazeera.com. https://www.aljazeera.com/news/2021/3/19/whyisindiastaringatasecondpeakofcovidcases (2021).
Accorsi, E. et al. How to detect and reduce potential sources of biases in studies of SARS-CoV-2 and COVID-19. Eur. J. Epidemiol. 36, 179–196 (2021).
Takahashi, S., Greenhouse, B. & Rodríguez-Barraquer, I. Are seroprevalence estimates for severe acute respiratory syndrome coronavirus 2 biased? J. Infect. Dis. 222, 1772–1775 (2020).
COVID-19 Vaccine and Therapeutic Drugs Tracker. In COVID-19 Vaccine and Therapeutic Drugs Tracker. https://biorender.com/covidvaccinetracker (2020).
Diekmann, O., Heesterbeek, J. & Roberts, M. The construction of next-generation matrices for compartmental epidemic models. J. R. Soc. Interface 7, 873–885 (2009).
Hastings, W. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109 (1970).
BBC India. India coronavirus: Nearly one in four in Delhi had Covid-19, study says. In BBC News. https://www.bbc.com/news/worldasiaindia53485039 (2021).
Acknowledgements
The authors would like to thank the Center for Precision Health Data Sciences at the University of Michigan School of Public Health, the University of Michigan Rogel Cancer Center and the Michigan Institute of Data Science for internal funding that supported this research. The authors are grateful to Professors Eric Fearon, Aubree Gordon and Parikshit Ghosh for useful conversations that helped formulate the ideas in this manuscript. The research was supported by NSF DMS 1712933.
Author information
Contributions
Ru.B. prepared the initial draft, carried out the naïve misclassification correction to case counts and led the composition of the final draft. Ri.B., R.K., L.J.B. and B.M. developed the extended SEIR model with misclassification. Ri.B. and R.K. implemented the extended SEIR model. D.R. carried out the literature review pertaining to the national and worldwide serosurveys and constructed the corresponding tables and summaries. M.S. carried out an extensive literature search and visualization and participated in the writing of the manuscript. B.M. conceptualized the project and oversaw the research. All authors read and approved the final version of the draft.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this Article was revised: The original version of this Article contained errors in Figure 2, where the colours within each circle and dashed line arrows were omitted.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bhattacharyya, R., Kundu, R., Bhaduri, R. et al. Incorporating false negative tests in epidemiological models for SARS-CoV-2 transmission and reconciling with seroprevalence estimates. Sci Rep 11, 9748 (2021). https://doi.org/10.1038/s41598-021-89127-1