Lies, Gosh Darn Lies, and not enough good statistics: why epidemic model parameter estimation fails

We sought to investigate whether epidemiological parameters that define epidemic models could be determined from the epidemic trajectory of infections, recovery, and hospitalizations prior to peak, and also to evaluate the comparability of data between jurisdictions reporting their statistics. We found that, analytically, the pre-peak growth of an epidemic underdetermines the model variates, and that the rate limiting variables are dominated by the exponentially expanding eigenmode of their equations. The variates quickly converge to the ratio of eigenvector components of the positive growth mode, which determines the doubling time. Without a sound epidemiological study framework, measurements of infection rates and other parameters are highly corrupted by uneven testing rates, uneven counting, and under reporting of relevant values. We argue that structured experiments must be performed to estimate these parameters in order to perform genetic association studies, or to construct viable models accurately predicting critical quantities such as hospitalization loads.


Scientific Reports
| (2021) 11:408 | https://doi.org/10.1038/s41598-020-79745-6 www.nature.com/scientificreports/ incubation period, up to 14 days in some cases after infection, that lasts until the latent exposed individuals become infectious. There have been some early estimates based on confirmed cases 14,19 with more evidence of pre-symptomatic transmission being noted as well as and some evidence of asymptomatic transmission 20,21 . Pre-symptomatic incubation to infectious status is shorter than incubation to symptomatic status since patients are often infectious before symptoms emerge. Some of those asymptomatic people remain asymptomatic until they are non-contagious 12 . The SARS-COV-2 incubation period may partly account for the observed lag when social distancing or other viral spread prevention policies are imposed. Patients may still be infectious for several days after symptomatic recovery. Symptomatic patients likely to be hospitalized are hospitalized more quickly than non-hospitalized patients recover. Hospitalized patients in Intensive Care Units (ICU) or that required immediate ventilation tend to experience a longer time to recovery than non-hospitalized patients. Those that stay on the ventilator for long periods tend to have a high mortality rate, and may stay on the ventilator for many weeks prior to dying 9 . A compartmental model that captures the stages relevant to infectious transmissions as represented in published staging times 9,12,19,20,22,23 , and durations counts susceptible population members S , latent exposed E , infectious asymptomatic I A , infectious symptomatic I S , infectious people who will be hospitalized I H , those hospitalized who recover I HR , and hospitalized leading to mortality I HM . Recoveries are R , and mortalities are R M . The conversion between compartments involves a number of variables which are assumed to be uncontrolled and random. There will be a distribution of times that individuals remain in a compartment, assumed to yield an overall average rate of conversion. The time from exposed to infectious is ≈ α −1 , where α is partitioned into contributions to asymptomatic infectious I A , symptomatic infectious I S , and infectious that will be hospitalized I H , so that α = α IS + α IA + α I H . This model, dubbed SEAIRH, extends the SEAIR model 24 . Total removal time among asymptotic infectious is γ −1 I A , with a fraction ζ going to infectious symptomatic. Infectious symptomatic removal time is γ −1 I S . The period prior to hospitalization is α I HR + α I HM −1 . The rate that the proportion that recovers is α I HR , and that which dies is α I HM . Figure 1 graphically highlights the interactions described above, and outlines the equations, below, that quantify the graphically represented relationships. The ovals represent compartments (e.g. susceptibles S , and exposed incubators E ). The solid arrows represent conversion flows between compartments (e.g. expressions such as ζ γ I A I A ). The rectangles represent conversions due to infection (e.g. converting S to E due to one of the infectious groups, such as β I A S I A N ). We note that, as in basic conversions between compartments, the infection rate includes social effects of contacts, including stochastic high impact super-spreader events 25 , the tendency for symptomatic people to isolate themselves or to be isolated, and on physiological aspects of transmission (sneezing and coughing). The dotted lines represent infectious groups that infect susceptibles leading to incubations (e.g. the I A in β I A S I A N ). The model equations, reflecting an underlying Markov chain with R and R M being absorbing, expressing these connections and rates are: www.nature.com/scientificreports/ These include: susceptible, latent exposed, infectious who have been identified or are symptomatic and isolated, infectious who have not been identified, and who are pre-symptomatic, asymptomatic, infectious who will be hospitalized as severe or critical, those who are hospitalized but will recover, those who are hospitalized who will succumb, those who recovered, and those who passed away. Solid arrows represent conversions from one state to another, with a fixed rate. Dotted arrows represent infectious interactions that promote conversion from susceptible to latent exposed. Each dotted arrow provides a component I N to the conversion rate β I N for that particular infectious group. β is reduced for isolated individuals compared to pre-symptomatic and asymptomatic spreaders. Rectangular boxes reflect the flow through mediated by dotted arrow input: β I N S. www.nature.com/scientificreports/ Note that dN dt = 0 , indicating conversions of all individuals in the system are accounted for. Parameter values derived from publications are listed in Table 1.
The rate of infection for a susceptible individual depends on the probability that an infectious viral load is transferred, multiplied by the rate of encounters a susceptible individual has. The encounters can involve: other susceptible individuals, or symptomatic infectious people, which as a group tends to be isolated with a corresponding depressed rate of encounters β I S , and undetected presymptomatic and asymptomatic (those who are infected, infectious, but never display symptoms until recovery) infectious people whose interaction rate β I A is substantially higher, subject to social distancing regulations, since they are never discovered and fully isolated (Fig. 1). The fraction of infectious symptomatic individuals that a given susceptible individual may encounter is β I S I S N , and the total number of susceptible individuals exposed to infectious symptomatic cases is β I S I S N S . Likewise, that for presymptomatic and asymptomatic cases ( I A ), the rate of symptomatic infections is β I A I A N S . These terms drive the creation of new infections in the population. The force of the symptomatic group is the coefficient of I S , or β I S S N . The number of the susceptible group that an individual can infect over their entire period τ I S = 1 γ I S of infectiousness is the reproduction number R t = τ I S · β I S S(t) N = β I S S(t) γ I S N , and similarly for the asymptomatic infectious group R t = β I A S(t) γ I A N . If almost all of the population is susceptible so that S ≈ N , the basic reproduction numbers for symptomatic and asymptomatic individuals are R 0 = β I S γ I S = β I S τ I S and R 0 = respectively. These are consistent with other definitions describing basic reproduction numbers. For the growth eigenvector, the ratios among the compartments determine an average overall R 0 . Multiple mode reproduction numbers in COVID-19 are also noted 26,27 . These numbers primarily drive the rate of growth of the infection in the population, which early in the expansion is measured by the doubling time.
Early in the evolution of the infection, which may be defined as when N − S ≪ N , the variables immediately involved in the feedback loop determine the rate limiting step. Therefore, identifying and the equation governing the system in this regime is  www.nature.com/scientificreports/ This has solutions of the form X(t) = e Mt X(0) . The M may be diagonalized by a matrix U so that This is an eigen equation, where the κ J s determine the time rate of exponential growth or decay with doubling time τ j = ln 2 κ j , and the eigenvectors represent the linear combinations of E , I A ,I S , and I H that grow or decay with that eigenvalue. The combinations of eigenmodes is determined by initial conditions. The leading eigenvalue will dominate with exponential growth yielding fixed proportions of each of the E , I A ,I S , and I H to each other. The other terms turn out to identify rates related to the delay time for the system to respond to changes in distancing policy due to incubation time, to imbalances between symptomatic and asymptomatic patients, and to the decay of I H .
Data from New York State were obtained from The COVID Tracking Project 28 .
Recent results indicate that some individuals may become reinfected 29 . Given some rate of reinfection, the inclusion of flow from recovered/removed back to susceptible compartments admits an eigenvector with eigenvalue κ = 0 assuming no alternative interventions.

Results
Testing in New York State, starting on 03/04/2020, labeled as day 1. On 3/13, day 10, NY State received permission to contract for its own SARS-COV-2 testing. Statewide "distancing" started on 3/20, day 17, with the signing of the "New York State on Pause" bill. Prior to that, local jurisdictions had already been imposing local ordinances against assembly, and started closing schools. Figure 2 shows the cumulative total testing and positive test numbers indexed by day for New York State. Testing has been driven by tracking contacts of discovered cases which is reflected heavily in the close alignment of total tests and positive tests. On 3/13, the total number of tests increased from 308 to 3200, with surges to the 5000 level, then 7000, then 14,000 showing rapid subsequent growth.
Early in the testing, from day 1 to 19, the rate of growth of positive cases was κ = 0.3519 ± 0.01390 , corresponding to a doubling time of 1.97 ± 0.08 . From day 20 to day 30, the rate of growth of positive cases was κ = 0.2027 ± 0.0076 , corresponding to a doubling time of 3.42 ± 0.13 . These numbers suggested very high rates of contagious transmission. These doubling times were reported by the New York State Governor in some of his earliest briefings.
If, as tracking numbers increased, testing surveillance was broad enough to pick up community spread individuals proportional to total numbers of tests applied, then the proportion of positives from the tests may reflect population rates. However, if rates are tightly limited to immediate known cases, then the reported positives will be a better estimate of underlying population, since the fraction of those seeking medical assistance should be proportional to the exposed number in the population. When available tests increased, the apparent rate grew substantially. Therefore, infected population growth may be more closely reflected in the fraction of positive results normalized by total number of tests applied, in spite of very highly biased sampling selection. Inclusion of these counts necessarily depended on the availability of test kits; yet the number of patients qualifying for www.nature.com/scientificreports/ testing was also increasing; information required to resolve these conditions is missing. For a given proportion of ill patients who seek help, this should track with the fraction of the population who is ill. However, this may be subject to growing awareness of the population to get help with SARS-COV-2 infections. First, consider the idea that tests may be broad enough to sample spread across a population. When test numbers were low, the likelihood that targeted testing would reflect the general population was also low and sampling uncertainties large. Therefore, a lower bound on testing levels was applied, excluding samples prior to 3/20. Later, test ratios started to demonstrate a downwards bend. This shoulder was cut for samples beyond 3/30. Figure 3a shows a regression of a primarily exponential segment in the proportion of positive cases to total tests. Figure 3b shows that segment in the context of the full range of the time series of the proportion of positives to total tests. New York doubling time was estimated from a χ 2 regression between the log of positive test ratios versus time, yielding κ = 0.0471 ± 0.0095 with a doubling time of 14.7 ± 3.0 adjusting for testing counts. In the alternative scenario, positive samples reflect the proportion of symptomatic patients seeking medical aid, a possibility since the testing was so closely tied to diagnosed patients plus contact surveillance. A regression was performed on the cumulative positive counts shown in Fig. 3c) yielding κ = 0.1170 ± 0.0021 per day, with a doubling time of 5.9 ± 0.1 days.
Taking guidance from Table 1 Figure 4 shows a log-linear plot of the rate-limiting variables for a numerical integration of the entire system of differential equations. The pre-peak segment shows a clear view of how the system is dominated by the leading exponential eigenmode of the growth, including the proportions between variables represented in the eigenvector of the leading eigenvalue, which determines the slope. Figure 5 shows the evolution of the system variables in a linear-linear plot. The lags in the peak variables shown in Fig. 5a identify the peak pulse through the system of linear equations. The "est" entries in Table 1 for α H represent values commensurate with (but not a fit to) the New York hospitalization levels 28 . They are a factor of 12 smaller than those fitting the Wuhan hospitalization rate 9 . As such, it is clear that the impact of SARS-COV-2 and COVID-19, the disease it causes, on features such as progression to hospitalization, response to treatment for symptomatic patients, whether patients are identified in time to stop progression to serious or critical stages may impact survivability. The model predicts 3294 fatalities per million, peak recovering hospitalizations of 3347 on day 111, and peak mortality hospitalization (primarily long-term ventilator load) of 1732 on day 114. Figure 5b includes susceptible S and recovered R variables. The range of variation of these variables appears to dwarf the fraction of the population that is latent exposed, infected, or involved with hospital load. One feature of the equations is that the rate of flow of individuals through a compartment may not be reflected in the total number in the compartments at any given time, even at their peaks. At the end, these rates would leave 24,738 per million uninfected and susceptible, with 971,967 recovered per million.
The difficulty in understanding how the testing protocol impacts estimations of rates is illustrated in the New York State rates along with Lebanon's and Australia's rates is shown in Table 2. Considering cases as a representative sample of a fixed proportion of the infected population argues for computing a rate based on cumulative cases. If, on the one hand, the testing generated a random sample of the broader population, more testing would identify more individuals simply because there were more tests. If so, the proportion of positives to total tests may be a closer approximation to the population, and the total positives would be proportional to the square of the actual proportion of diseases, resulting in a doubling of κ . That seems to be roughly what was observed between the two New York State regressions. On the other hand, cumulative rates for two other jurisdictions, Lebanon and New South Wales, Australia, show rates similar to each of the two New York State numbers. And while the New York State proportional model gives an expected factor of 2 in the rate, it is the cumulative rate that more closely resembles the growth and peak in New York, not the relative proportion rate. More, the shifts in test availability and distancing initiation are all visible in the New York data, which contributes to the difficulty even of identifying exponential growth regimes, much less identifying an exponential rate that constrains the available model parameter space. Lebanon's rates seem comparatively low, possibly suggesting throttling based on limited test kit availability. However, the hospitalization rates are commensurately low which may be the result of the early  www.nature.com/scientificreports/ confinement, schools and universities closure, and other social distancing strategies employed by the Lebanese government in late February, 2020. These difficulties highlight the impact of different testing between populations limiting the possibility of comparing rates obtained from testing protocols from different jurisdictions.

Discussion
One of the major goals of epidemic modeling is to predict mortality and resource load on community medical facilities: how many beds, how many ventilators, how much pharmaceuticals, as well as other resources will be needed to get through the epidemic. Early epidemic growth for this system is dominated by the largest eigenvalue of 9 coefficients governing the rate-limiting variables. This eigenvalue determines the doubling time of the growth, and imposes one constraint on those coefficients; the eigenvectors impose three more constraints on the system, leaving five coefficients undetermined. Essentially, all of the rate-limited relevant epidemic variables grow at the same rate maintaining fixed ratios. However, as they near peak, the variable trajectories become more differentiated, with lagging or leading peaks emerging as the impact of S N filters through the system of equations. However, at peak, it is already too late to allow time to acquire and deploy needed resources to hospitals and clinics. By itself, the trajectory of these models in pre-peak growth offers little hint as to final needs. Further, there are a number of combinations of parameters that would yield the same leading eigenvector and eigenvalue. This issue is not specific to the model presented here, but holds for almost any compartmental model more complicated than SIR.
More so, the parameters that govern these epidemic models tend to reflect physiological rates of how the disease expresses itself in individuals, as well as effects that are moderated by demic characteristics. Examples are age structure in the population, which impacts both asymptomatic cases 12,30 and severity of disease 9 . Identification of asymptomatic/pre-symptomatic cases has been problematic since testing protocols tended to require symptoms, or contacts with known infected people. One case in California went untested for 10 days because she had no known contacts. Positive to test ratios for PCR vs. randomly sampled immunoglobin tests reported in New York State shows large differences highlighting the bias in PCR testing. Cases that advance to severe or critical depend on other factors, such as treatment modalities prior to development of advanced symptoms. The rate of transmission depends on physiological parameters as well as normal social distance and social distancing response to an epidemic, how public institutions such as schools are run, how grocery shopping interactions are handled, whether known infections are isolated and other factors specific to each community. Given how widely these parameters may vary from population to population, and the mechanics of how they vary: how they depend on the geographically specific dominating SARS-COV-2 lineages dominant within a given geography 5,6 , and how they depend on behavioral, social, age structure, and other factors of a population. It is worth seeking whether and how these factors relate to the expressed epidemic model rate parameters as phenotypes.
Since the problem of identifying rate limiting parameters prior to peak is underdetermined, these rates must be determined elsewhere. Most statistical reporting does not provide nearly enough information to extract these factors, even at an environmental (quasi-) epidemiological experimental design standards. Further, jurisdictions are applying tests to try to identify new cases that are related to other identified cases through contact. The testing "enrollment protocol" was not designed to understand the spread in the population, but rather to try to identify patients and remove them from circulation by isolating them. More and broader testing is applied as test kits become more available, complicating the basis for interpreting positive counts. Test kits may not be uniform with loss of sensitivity depending on the stage of the infection and/or the type of swab taken (Nasal, nasopharyngeal or sputum). From jurisdiction to jurisdiction, testing and reporting protocols vary, making it difficult to compare jurisdictions, or even the same jurisdiction to itself from day to day. The rate of growth and doubling time may reflect availability and levels of testing more than the actual disease in the population.
Perhaps the best way to acquire the necessary parameters would be a prospective longitudinal cohort study coordinated across multiple jurisdictions. Enrollment should be randomized, reflect regional characteristics such as sex and age structure, and the criteria should be shared across populations participating in the study. During the course of the study, status will be clearly defined (thresholds for "asymptomatic, " defining "recovery, " etc.), and subjects will be monitored for changes in status (a) from susceptible to latent exposed, recording dates of exposure (if possible), (b) to infectious (symptomatic, pre-symptomatic or asymptomatic, with a clearly defined standard for determining possible "infectious" condition) conversion and dates, (c1) for pre-symptomatic to symptomatic conversions and dates or (c2) recovery dates, (d) symptomatic to recovery conversion dates, or (e1) hospitalization dates, (e2) recovery from hospitalization dates, (e3) ICU admission dates, (e4) ICU recovery date, (e5) ventilator treatment start date, (e6) ventilator recovery date, (e7) date of death. A record of how each subject moves through the model compartments, together with time distributions, can provide phenotypic parameters that modelling alone cannot, offering insight into the biology, response of the disease to medications, comorbid conditions, demic characterizations (age is important in determining asymptomatic, symptomatic, hospitalization, and mortality rates), and other features relevant to the impact of SARS-COV-2.
Further, systematically measured parameters provide a uniform basis for comparisons between populations necessary for complete model constructions that yield distributions of trajectories and confidence intervals for timing and peak loads, and which can provide a full epidemiological exploration of how individual subject phenotypes respond to environmental, genetic, comorbid, and behavioral factors that may yield valuable information for biological, clinical and pharmaceutical development. As such, these models may be used as independent triangulating tests and measurement verifications of physiological parameters, and to identify evidence whether some factors that are strong enough to generate deviations are missing.

Conclusions
A response to an article in Nature 31 stated: "A well-known lawyer, now a judge, once grouped witnesses into three classes: simple liars, damned liars, and experts. He did not mean that the expert uttered things which he knew to be untrue, but that by the emphasis which he laid on certain statements, and by what has been defined as a highly cultivated faculty of evasion, the effect was actually worse than if he had". The statement was applied to the specific issue of expert forensic testimony, but adapted for elucidating the duties of a chemist to report their procedures and results adequately. The statement has been restated as "lies, damn lies, and statistics. " The message serves as a warning that statistics collected for certain purposes may not be suited to other purposes. That unsuitability does not reflect any attempt at obfuscation, yet may lead to confusion. The availability of real-time information of testing results appeared to be a boon for epidemic modelers. But, in this case, the use of testing, positive test counts, etc. are tilted towards identifying patients who are likely to have specific treatment needs, and to try to identify contacts to stop epidemic spread. While this serves to save lives and represents the most obvious value for limited resources, these uses render the reported statistics problematic for modeling, or for appropriate epidemiological description of how the disease behaves in populations in response to demographic, dynamics, social, demic, and genetic factors. Physiological parameters based primarily on patients may be biased in terms of those patients who were identified, and the methods by which they were identified. Further, protocols shifted over time within jurisdictions as previously unrecognized community spread and asymptomatic individuals were recognized to be significant contributors to viral spread. While population based surveillance is getting some attention [32][33][34] , the need to understand how asymptomatic patients transmit the virus, and whether they sustain any cryptic physiological damage has driven scientists to random sampling of the population to identify these subjects 35,36 . While the range of physiological impact is being expanded, most such studies are focused on staging for identifying treatment efficacy of disease [37][38][39][40] . Sampling has continued the tendency to be opportunistic, based on inpatient contact 39,40 . Temporal information about viral shedding and stage conversion times have not shown as much interest as therapeutic response studies 39,41 . Further, timing information has presented novel features, such as timing of symptoms and system physiology impact with overlapping time intervals rather than compartmental staging classifications 39 . Pediatric studies also required population sampling [42][43][44] since surveillance testing protocols tended to exclude children.
Finally, modeling not only can provide important information planners need for capacity loads, but models can also test whether parameters, obtained from formally designed epidemiological studies, describe how the disease behaves in a population. Current planning to ensure funds, resources, and designs, are available for conducting this type of multinational study is currently lacking. The most visible impact was both underestimation and overestimation in different regions of what the epidemic impact would be for COVID-19. Both types of failures led to expenses far larger than the cost of running well designed studies, and maintaining resources ready to face this continuing challenge as well as future challenges. www.nature.com/scientificreports/