A novel benchmark for COVID-19 pandemic testing effectiveness enables the accurate prediction of new Intensive Care Unit admissions

The positivity rate of testing is currently used both as a benchmark of testing adequacy and for assessing the evolution of the COVID-19 pandemic. However, since the former is a prerequisite for the latter, its interpretation is often conflicting. We propose as a benchmark for COVID-19 testing effectiveness a new metric, termed ‘Severity Detection Rate’ (SDR), that represents the daily needs for new Intensive Care Unit (ICU) admissions, per 100 cases detected (t − i) days ago, per 10,000 tests performed (t − i) days ago. Based on the announced COVID-19 monitoring data in Greece from May 2020 until August 2021, we show that beyond a certain threshold of daily tests, SDR reaches a plateau of very low variability that begins to reflect testing adequacy. Due to the stabilization of SDR, it was possible to predict with great accuracy the daily needs for new ICU admissions, 12 days ahead of each testing data point, over a period of 10 months, with Pearson r = 0.98 (p = 10⁻¹⁹⁷), RMSE = 7.16. We strongly believe that this metric will help guide the timely decisions of both scientists and government officials to tackle pandemic spread and prevent ICU overload by setting effective testing requirements for accurate pandemic monitoring. We propose further study of this novel metric with data from more countries to confirm the validity of the current findings.


Scientific Reports | (2021) 11:20308 | https://doi.org/10.1038/s41598-021-99543-y | www.nature.com/scientificreports/

[…] and therefore the interpretation would be conflicting; indeed, the health administration of a country should be confident that a sufficient number of tests is performed to effectively track the virus spread. However, if such a metric also incorporated measurable outcomes of the pandemic in the community (e.g., number of deaths, number of ICU admissions, etc.), these could introduce, by their more factual nature, a link between expectation and actuality, since the outcomes of COVID-19 are inherently tied to the virus's pathogenesis. Therefore, such a link could, in theory, introduce a benchmarkable step of convergence towards a soft cap (threshold) that would in turn reflect testing adequacy, e.g., a maximized or minimized value, or a state of minimized variation.
In this report, we present an easy-to-implement metric that we developed while independently monitoring and analyzing COVID-19 pandemic evolution in Greece, which considers outcomes that are already monitored in most countries, such as the daily numbers of deaths, COVID-19 patients in the Intensive Care Units (ICU), and patients who are being discharged from the ICU. In our example we show that this metric displays remarkable output stability when a certain threshold of daily testing is reached, which in our view clearly reflects testing adequacy. Furthermore, we validated its benchmarking efficiency by forecasting, with both high accuracy and high precision, the total daily needs for new ICU admissions, roughly 2 weeks in advance, over a period of 10 months.

Methods
The national monitoring data for the evolution of the COVID-19 pandemic in Greece were retrieved from the Hellenic National Public Health Organization 3 and Greek Government's official daily announcements 4. Specifically, the daily official announcements included the following parameters: (a) number of new COVID-19 cases detected, (b) number of deaths due to COVID-19, (c) total number of COVID-19 ICU patients, (d) total number of COVID-19 patients discharged from ICU, (e) total number of SARS-CoV-2 PCR tests performed 5, and (f) total number of SARS-CoV-2 rapid antigen tests performed 6. Based on the available data, we defined the daily needs for new COVID-19 ICU admissions as number U:

$$U = d + e + x_0 - x_{t-1}$$

where d is today's number of deaths due to COVID-19, e is today's number of COVID-19 patients discharged from ICU, x_0 is today's total number of COVID-19 ICU patients, and x_{t−1} is yesterday's total number of COVID-19 ICU patients. This number U represents the actual daily new COVID-19 ICU admissions, plus those patients who died in the community (not in ICU), whom we theorize to have required ICU admission, hence the definition of the daily needs for new COVID-19 ICU admissions.
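The definition of number U can be illustrated with a minimal Python sketch. It assumes that U is today's deaths plus today's ICU discharges plus the net change in ICU occupancy since yesterday, which is our reading of the definitions above; the function name and the sample numbers are hypothetical and are not taken from the Greek dataset.

```python
def daily_icu_needs(deaths, discharged, icu_total):
    """Return number U for each day from the second day onward.

    deaths[k]     : COVID-19 deaths announced on day k (d)
    discharged[k] : patients discharged from ICU on day k (e)
    icu_total[k]  : total COVID-19 ICU patients on day k (x)
    """
    needs = []
    for k in range(1, len(icu_total)):
        # Net change in ICU occupancy, plus discharges, plus all deaths
        # (deaths in the community are counted as unmet ICU needs).
        u = deaths[k] + discharged[k] + icu_total[k] - icu_total[k - 1]
        needs.append(u)
    return needs


# Made-up numbers, for illustration only.
example_u = daily_icu_needs(
    deaths=[3, 4, 2],
    discharged=[1, 2, 3],
    icu_total=[50, 53, 52],
)
print(example_u)
```

The first day is skipped because it has no "yesterday" value for the ICU occupancy.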
Next, we defined the Severity Detection Rate with a time lag (t − i) (SDR_i) as a metric that represents the percentage of patients who require ICU admission, per new cases detected (t − i) days ago, per 10,000 tests performed (t − i) days ago:

$$\mathrm{SDR}_i = \frac{U}{c_{t-i} \times n_{t-i}} \times 100 \times 10{,}000$$

where U is today's rolling 7-day average of new daily needs for COVID-19 ICU, c_{t−i} is the rolling 7-day average of detected COVID-19 cases (t − i) days ago, and n_{t−i} is the rolling 7-day average of the total number of COVID-19 tests (t − i) days ago.
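A minimal sketch of how SDR_i could be computed from daily series is given below. It assumes trailing 7-day averages and the scaling implied by the definition (ICU needs per 100 cases, per 10,000 tests); the helper names and the constant sample series are hypothetical.

```python
def rolling_mean(values, window=7):
    """Trailing rolling mean; entries before a full window are None."""
    out = []
    for k in range(len(values)):
        if k + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[k + 1 - window:k + 1]) / window)
    return out


def sdr(u_avg, cases_avg, tests_avg, lag):
    """SDR_i in percent: today's ICU needs per 100 cases and per 10,000 tests,
    where cases and tests are taken `lag` days earlier."""
    out = []
    for t in range(len(u_avg)):
        s = t - lag
        if s < 0 or None in (u_avg[t], cases_avg[s], tests_avg[s]):
            out.append(None)  # not enough history yet
        elif cases_avg[s] == 0 or tests_avg[s] == 0:
            out.append(None)  # metric undefined without cases or tests
        else:
            out.append(u_avg[t] / (cases_avg[s] / 100) / (tests_avg[s] / 10_000))
    return out


# Constant made-up series, for illustration only.
n_days = 20
u_avg = rolling_mean([30] * n_days)
cases_avg = rolling_mean([1500] * n_days)
tests_avg = rolling_mean([24_000] * n_days)
sdr_12 = sdr(u_avg, cases_avg, tests_avg, lag=12)
print(sdr_12[-1])
```

With a 7-day window and a 12-day lag, the first defined value appears on the 19th day of the series, since both the lagged averages and today's average need a full window.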
Tests in Greece were performed freely by any individual who wanted to get tested, in selected hospitals, or in most private diagnostic centers and clinics, or in mobile testing hubs, dispatched by the public healthcare administration. Also, an individual may get tested in regular intervals (e.g., up to twice per week), as requested by their employer or the administration, due to the nature of their profession. To the best of our knowledge, only one swab is taken from the individual per test, in Greece. Furthermore, the reported COVID-19 cases detected, and daily tests performed, are used for the official calculation of positivity rate, announced routinely by the country's healthcare administration 3 ; if multiple tests per individual were simply added to the total daily number, this would constitute a systematic error in the calculation of positivity rate. Therefore, for the reasons explained above, for this analysis, the daily number of tests reported publicly is presumed to represent unique individuals.
For initial data exploration, the lag of the Severity Detection Rate (SDR) metric was set to 14 days, which means that the current day's critical outcomes of COVID-19 (i.e., ICU admission or death in the community) were attributed to COVID-19 cases detected 14 days ago. To identify the optimal lag between the critical outcomes of COVID-19 and the detected cases, we searched within an interval of 7 to 21 days, in the period 17/10/20 to 31/1/21 of the dataset, for the most stable correlation between the numerator (number U) and the denominator ([cases_{t−i} × tests_{t−i}]) of the metrics studied. The best correlation was obtained for a lag of 12 days (i = 12) (see "Discussion" section) and therefore, for consistency, all charts and tables reflect this optimal time lag (i = 12).
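The lag search described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: it takes "best correlation" to mean the highest Pearson r between U on day t and the product of cases and tests on day t − i, and it runs on a synthetic series built so that the true lag is 12 days. All names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_lag(u, cases, tests, lags=range(7, 22)):
    """Return (lag, r) with the highest Pearson r between U[t] and
    cases[t - lag] * tests[t - lag], searching lags of 7 to 21 days."""
    best = None
    for lag in lags:
        product = [cases[t - lag] * tests[t - lag] for t in range(lag, len(u))]
        r = pearson_r(product, u[lag:])
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Synthetic data, for illustration only: U follows the lagged product by 12 days.
days = 60
cases = [100 + (7 * k * k) % 53 for k in range(days)]
tests = [10_000 + 100 * k for k in range(days)]
u = [0.0] * 12 + [cases[t - 12] * tests[t - 12] / 1e5 for t in range(12, days)]

lag, r = best_lag(u, cases, tests)
print(lag, round(r, 3))
```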
Finally, for completeness of the study, we also defined the ICU admission Rate with a time lag (t − i) (henceforth "ICU Rate", IR_i) as a metric that represents the percentage of patients who require ICU admission, per new cases detected (t − i) days ago:

$$\mathrm{IR}_i = \frac{U}{c_{t-i}} \times 100$$

where U is today's rolling 7-day average of new daily needs for COVID-19 ICU, and c_{t−i} is the rolling 7-day average of detected COVID-19 cases (t − i) days ago.
The IR metric is essentially a simpler form of the SDR metric, which does not take into account the number of daily tests performed. As we also wanted to evaluate its predictive performance, we repeated on the IR metric every piece of analysis performed on the SDR metric. The related charts and tables are not part of the "Results" section; they are provided as Supplementary Information.

Results
For observation, the daily evolution of SDR_12, from the 7th of May 2020 onwards, was traced on the same chart versus the observed number of daily ICU needs, the positivity rate and the corresponding number of testing samples (Fig. 1). Compared to the other quantities, the SDR metric shows a remarkable stabilization after approximately 20/8/2020, which also corresponds to the attainment of an average daily testing number of 10,000/day. From that point forward, the observed daily ICU needs, the positivity rate and the testing rate continue to fluctuate independently and considerably, but without perturbing the SDR stabilization.
Tripling the average daily rate of testing (from 4 to 12 K) in the second (ii) interval brought a sevenfold lower average value of SDR (20.1%/2.7% ≈ 7.4), with a remarkable 20-fold decrease (19.6%/1.0%) in the Standard Deviation (SD) of SDR, and a concomitant threefold decrease in the Coefficient of Variation (CV) of SDR (0.97/0.36 ≈ 2.7). Further doubling of the average daily number of tests (from 12 to 24 K) in the third (iii) interval again brought a marked decrease in the SDR SD (1.0%/0.4% = 2.5), although the average value of SDR was now only moderately diminished, by a factor of approximately 1.3 (2.7%/2.1% ≈ 1.29), indicating a tendency towards stabilization of the SDR value and a continuous reduction of the SD. Overall, it is noteworthy that the average and SD values of SDR continued to drop consistently in all 6 periods.
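The ratios quoted in this paragraph can be re-derived with a few lines of Python. The sketch below uses only the rounded means and SDs quoted in the text for intervals (i) to (iii), so a CV computed here can differ in the second decimal from the values quoted above; the function names are hypothetical.

```python
def coefficient_of_variation(mean, sd):
    """CV = SD / mean (dimensionless)."""
    return sd / mean


def fold_change(before, after):
    """How many times smaller `after` is compared with `before`."""
    return before / after


# (mean SDR %, SD of SDR %) per testing interval, as quoted in the text.
periods = {"i": (20.1, 19.6), "ii": (2.7, 1.0), "iii": (2.1, 0.4)}

mean_fold_i_ii = fold_change(periods["i"][0], periods["ii"][0])
sd_fold_i_ii = fold_change(periods["i"][1], periods["ii"][1])
mean_fold_ii_iii = fold_change(periods["ii"][0], periods["iii"][0])
sd_fold_ii_iii = fold_change(periods["ii"][1], periods["iii"][1])
cv_i = coefficient_of_variation(*periods["i"])

print(round(mean_fold_i_ii, 2), round(sd_fold_i_ii, 1))
print(round(mean_fold_ii_iii, 2), round(sd_fold_ii_iii, 1))
print(round(cv_i, 3))
```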
We then traced the values of the SDR metric against the daily number of tests. The SDR values display a strong correlation with the daily number of tests, employing power regression (Spearman r = −0.90, p = 10⁻¹⁶⁷, N = 451), and suggest that beyond a threshold of daily tests performed, SDR becomes significantly stabilized (Fig. 2); for Greece, this stabilization begins once the number of daily tests exceeds the mark of 10,000 per day.
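A power regression of SDR against the daily number of tests can be sketched as an ordinary least-squares fit on log-transformed values. Whether the original fit was obtained this way is an assumption on our part, the function names are hypothetical, and the data are synthetic. The rank correlation is not computed here.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def power_fit(tests, sdr_values):
    """Fit sdr ~ a * tests**b by least squares in log-log space.
    Requires strictly positive inputs."""
    b, log_a = linear_fit(
        [math.log(x) for x in tests],
        [math.log(y) for y in sdr_values],
    )
    return math.exp(log_a), b


# Synthetic, noise-free data for illustration only: sdr = 500 * tests**-0.8.
tests = [1_000, 2_000, 5_000, 10_000, 20_000]
sdr_values = [500 * x ** -0.8 for x in tests]
a, b = power_fit(tests, sdr_values)
print(round(a, 3), round(b, 3))
```

A negative exponent b corresponds to SDR falling steeply at low testing volumes and flattening as the number of daily tests grows.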
The next step was to study the correlation between the numerator (number U) and the denominator ([cases_{t−i} × tests_{t−i}]) of the SDR metric, for the period 17/10/2020 to 8/8/2021 (Fig. 3) […] to be the same as the start of testing period (iii) (Table 1). Before that date, both the numbers of new daily needs for ICU and daily cases were relatively low (Fig. 1 and Data TAB in Supplementary Workbook) and therefore of smaller interest to the specific study, i.e., when added to the rest of the data, the respective correlation is innately stronger due to the near-baseline nature of the data points prior to 17/10/2020.

Finally, we applied the linear regression equations to forecast the rolling 7-day average daily needs for new ICU admissions, 12 days ahead of each data point of daily announced cases and tests, for the corresponding periods, i.e., from 17/10/2020 to 8/8/2021. The forecast employing the SDR regression equations (Fig. 3) proved […] (Fig. 4). Expectedly, as can be noticed in Fig. 4B, most of the few intense discrepancies in the fitted values are observed around dates of transition from one regression equation to another; a rolling regression window could possibly help improve the forecast of even these phases. Overall, forecasting with the use of the Severity Detection Rate proved to be functional, as it indicated a very strong agreement between the predicted and observed values for a period of nearly 10 months, which included the two major pandemic waves in the country thus far.
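The forecasting step can be sketched as a simple linear regression of U on the lagged product (cases × tests), followed by an RMSE check of predicted against observed values. This is a minimal illustration with made-up numbers; the per-period regression equations of the study are not reproduced, and all names are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def forecast_u(cases, tests, slope, intercept):
    """Predicted ICU needs i days ahead of each (cases, tests) data point."""
    return [slope * c * n + intercept for c, n in zip(cases, tests)]


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


# Made-up training pairs: product of cases and tests on day t - i,
# and the ICU needs observed on day t.
products = [1e6, 2e6, 3e6, 4e6]
observed_u = [12, 22, 32, 42]
slope, intercept = linear_fit(products, observed_u)

# Forecast for a new data point (1,000 cases and 5,000 tests today).
prediction = forecast_u([1000], [5000], slope, intercept)[0]
print(round(prediction, 2))
```

In practice one regression line would be fitted per period (or per rolling window), and the RMSE and Pearson r would be computed on the whole forecast series.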

Discussion
We have shown that beyond a threshold of daily tests performed, SDR reaches a plateau that displays very low variation. This threshold appears roughly around the 10,000 daily samples mark in Greece, a country of approximately 11 million people, but this number is expected to vary greatly from country to country depending on the total population, rural density, societal particularities, population immune profile, and sampling strategies 7 .
Reaching that threshold should not mean that there is no need for a further increase in the number of daily tests, as the data strongly suggest that the more tests a country performs, the more informative the results are about the actual viral spread in the community, and consequently health administrations are in a better position to respond accordingly. In terms of the SDR metric, more daily tests appear to further decrease its variation (Table 1). The weaker its variation, the stronger the correlation coefficient between the numerator and denominator of SDR, i.e., number U versus the product (cases_{t−i} × tests_{t−i}), and therefore, the more accurately we can predict the number of daily needs for new ICU admissions, i days in advance. In the studied example, predictions were highly accurate with an average daily number of tests as high as 24,000 (Table 1), which resulted in an SD of the SDR of 0.4%. As the SD of the SDR showed a consistent decrease over a period of 15 months in our studied example (Table 1), we propose that it can possibly act as an actual numerical threshold that denotes the attainment of the SDR plateau. As a direct consequence of this potential predictability, when SDR establishes a plateau, we consider that the bulk of daily tests is returning a set of positive cases that is stably representative of the current spread of the virus. Therefore, the SDR metric constitutes a benchmark of testing effectiveness. The metric is potentially efficient at a local level as well, if cases that require delocalization, e.g., due to lack of available ICU beds locally, are effectively tracked and taken into account.
As the full segmentation of the necessary data was not available at a local level for the present study, it was not possible to evaluate the effects of viral spread uniformity across the country and, more specifically, the metric's behavior due to disproportionate testing intensities locally, e.g., higher number of tests in districts with lower viral load, and relatively lower numbers of daily tests in districts with higher true viral load. In such a case, it would be helpful to apply the SDR monitoring at a local level.

The metric's median value is expected to decrease monotonically and with decreasing variation as daily tests increase, or due to the gradual containment of the virus, immunization of the population thanks to an efficient vaccination program, improvement of therapeutic protocols that reduce the number of very severe cases, or even a significant reduction in the average age of infected individuals due to the efficient protection of the elderly. Conversely, the metric's median value may increase (interrupting the plateau) if the viral spread becomes greatly enhanced with time, e.g., due to the prevalence of a new more infectious variant 8–10, without the testing levels catching up. In such a case the SDR's median will increase disproportionately and beyond its expected variability.
In order to comprehend the nontrivial nature of the plateau attainment and retainment in the plot of SDR versus the number of daily tests (Fig. 2), it is useful to look more carefully at some notable boundaries of the SDR metric. For instance, if it was possible to test the entire population every day for newly infected individuals (minus the individuals that are already known to be infected), then the "discovery" of every new infection case would be guaranteed (assuming 100% accurate tests). With a number of daily tests as big as the entire population and with the highest possible number of detected cases (i.e., equal to the actual cases), the SDR value becomes [U/ ((actual new cases) × population)] with the denominator assuming its greatest possible value, hence producing the lowest possible SDR.
In a different approach that hypothetically guarantees the detection of all the actual new infected cases (without testing the entire population), we can consider testing all the newly infected individuals, and only them, so that the number of daily tests becomes equal to the number of new infections (again, assuming 100% testing accuracy). In this case the SDR value becomes [U/(actual cases × actual cases) = U/(actual cases)²]. Whether the possible values of the SDR metric can be bigger or smaller than the value obtained in this second hypothetical scenario depends on whether the product [cases_{t−i} × tests_{t−i}] is smaller or bigger than the square of the number of actual new infection cases (see mathematical demonstration, below). Finally, as the theoretical maximum of all the possible SDR values we may consider the case where the denominator [cases_{t−i} × tests_{t−i}] is equal to 1, and therefore SDR would be equal to number U. Specifically: […]

Inequality (5) is trivial, as the number of actual new cases (α_{t−i}) and the entire population of the country, or area of interest, are by definition the highest possible values of the product (cases_{t−i} × tests_{t−i}). However, inequality (6) describes a situation where the number of tests can only be equal to or greater than α_{t−i}²/cases_{t−i}, and which may increase up to the number of the entire population, causing the reduction of the SDR value to its described minimum of U/(α_{t−i} × population). Inequality (7), inversely, describes a situation where the number of tests can only be equal to or lower than α_{t−i}²/cases_{t−i}, and which may decrease to as low as 1 test, causing the increase of the SDR value to its maximum, which equals the number U.
Therefore, because of this demonstrated relationship between the number of daily tests and the number of actual new infections, we theorize that in a plot of SDR versus the number of daily tests, the observed plateau is a consequence of the SDR starting to adopt values that are smaller than U/α². Inversely, we observe values outside the plateau as long as SDR adopts values greater than U/α². This is potentially what happened around the mark of 10,000 tests in our studied example (roughly around 20/8/2020), with the product (cases × tests) increasing almost tenfold within a few days and presumably becoming greater than the square of the actual new cases, thus collapsing the SDR variability into the observed plateau (Figs. 2, 5). The importance of the plateau lies, as previously explained, in the reduction of the metric's variability (i.e., Standard Deviation), which enables a correspondingly robust forecasting of ICU needs, i days ahead of each data point.
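The boundary values discussed above can be made concrete with a short numerical sketch. It uses the unscaled form SDR = U/(cases × tests), as in the boundary discussion, and treats the number of actual new cases α as known, which in reality it is not; all names and numbers are hypothetical and chosen only for illustration.

```python
def sdr_bounds(u, actual_cases, population):
    """Return (minimum, pivot, maximum) of the unscaled SDR = U / (cases * tests).

    minimum: everyone is tested and every actual case is detected
    pivot:   exactly the newly infected individuals are tested (U / alpha**2)
    maximum: the product cases * tests equals 1
    """
    minimum = u / (actual_cases * population)
    pivot = u / actual_cases ** 2
    maximum = u
    return minimum, pivot, maximum


def below_pivot(detected_cases, tests, actual_cases):
    """True when cases * tests >= alpha**2, i.e. SDR <= U / alpha**2,
    the regime in which the plateau is theorized to appear."""
    return detected_cases * tests >= actual_cases ** 2


# Made-up numbers, for illustration only.
minimum, pivot, maximum = sdr_bounds(u=30, actual_cases=2_000, population=11_000_000)
print(minimum, pivot, maximum)
print(below_pivot(detected_cases=1_500, tests=24_000, actual_cases=2_000))
print(below_pivot(detected_cases=100, tests=4_000, actual_cases=2_000))
```

The pivot always lies between the two extremes as long as the number of actual new cases is between 1 and the population size.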
In the context of the regression analysis of the daily needs for new ICU admissions (U) vs. the product of [Detected Cases × Performed Tests] (Fig. 3), significant changes in the SDR median would be reflected as changes in the slope and/or the intercept of the regression line. Specifically, changes in the slope most likely translate into two possibilities: (A) a change in virulence (i.e., how many individuals per group of 100 positive cases, per 10,000 tests, are expected to develop very severe COVID-19, given a theoretical zero regression intercept), or (B) a modification in sampling parameters (e.g., testing more or fewer asymptomatic persons, or testing a younger subset of the population). Accordingly, a change in the intercept will likely signify either (a) changes in viral prevalence 7,11, as the intercept represents a fixed number U for a theoretical x = 0 (i.e., a number of individuals with very severe COVID-19, while no cases are detected), or (b) changes in testing accuracy 7,11, with intercept values closer to zero reflecting optimal accuracy. Rolling 3-week regression windows could be employed to […]. […] is required to discern which exact change is responsible for the observed new disease dynamics, and the SDR-derived regression analysis can provide significant hints as to the direction of the change. In any of the above cases, an important shift of the SDR would signify an important change in the pandemic parameters, which in turn would dictate a specific course of action for the authorities, appropriate for each case.
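A rolling 3-week regression window, as mentioned above, could be sketched as follows: for every window of 21 consecutive data points, the slope and intercept of U versus the product (cases × tests) are re-estimated so that shifts in either parameter can be followed over time. The window length follows the text; the function names and the synthetic data are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rolling_regression(products, u_values, window=21):
    """Return one (slope, intercept) pair per full trailing window."""
    fits = []
    for end in range(window, len(products) + 1):
        fits.append(
            linear_fit(products[end - window:end], u_values[end - window:end])
        )
    return fits


# Synthetic, noise-free data for illustration only: U = 2 * x + 1.
products = [float(k) for k in range(30)]
u_values = [2 * x + 1 for x in products]
fits = rolling_regression(products, u_values)
print(len(fits), fits[0], fits[-1])
```

On real data, a sustained drift of the slope or intercept across successive windows would be the signal to investigate, along the lines of possibilities (A), (B), (a) and (b) above.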
In Table 2 we contrast the regression parameters (i.e., slope, intercept and R²) against important factors of the ongoing pandemic, such as Delta variant prevalence, vaccination levels, and lockdown periods 12–17. What is most notable is the steady slope decrease of the regression equations over all 6 periods examined, which is compatible with a decrease in population-level severity/virulence. This is to be expected, given the long periods of the applied lockdown measures and the ongoing mass vaccination program in the country (reaching 50% population coverage of fully vaccinated individuals on 8/8/2021). As presented in the previous paragraph, another factor that can possibly lower the SDR slope is a significant change in sampling parameters, such that the group of asymptomatic individuals being tested becomes considerably larger, a situation that inherently results in fewer detected cases than testing the group of symptomatic individuals. While it is hard to discern the potential contribution of each factor with just the publicly available data, it is, nonetheless, possible to calculate a 9.5-fold total drop in the observed severity between the beginning and ending of the six periods (17/10/2020 → 8/8/2021), after adjusting for the obvious contribution of the change in the average number of cases and tests (Table 2): […]

By contrast, the intercept oscillates considerably between periods, ranging from +28.6 to −3.9. As explained previously, increases of the intercept may be attributed to greater viral spread in the community, as was the case in the second period (11/2/2021–21/4/2021), when Athens, the capital, saw a great increase in infected cases, which signaled the beginning of the 3rd wave of the pandemic in Greece.
Besides viral spread, the other factor that influences the intercept is the accuracy of the tests performed, i.e., potential false positives and false negatives, due to poor test specificity, poor test sensitivity, or as yet undetectable levels of the virus in asymptomatic infected individuals who simply got tested too early in the course of the disease. Regarding the Delta variant (B.1.617.2), which represented 90% of cases in Greece on 8/8/2021, it does not appear to be affecting the severity of the disease (i.e., there is no slope increase); however, with its greater transmissibility potential, as reported by other studies 18, it is possibly contributing to the intercept increase from 16/6/2021 onwards. Overall, the slope and intercept of SDR-based regression equations offer an additional layer of information, which, in conjunction with other metrics and parameters, may create a better understanding of the pandemic's dynamics.
We called this new metric Severity Detection Rate, as its representation of the percentage of very severe COVID-19 outcomes is modulated by the number of tests performed. It is essentially a standardization of the ratio of very severe cases over the infected individuals, with the rate of daily testing. In other words, the Severity Detection Rate becomes representative of the proportion of people who need ICUs out of the total cases once a sufficient threshold of daily testing rate (hence 'detection rate') is achieved. As presented in the "Methods" section, for a more complete examination, we also defined the percentage of patients who require ICU admission, per new cases detected (t − i) days ago, as ICU Rate (IR). If, in theory, the total number of tests became equal to the entire population of a country (or the area of interest), then the SDR metric would be the same as the IR metric, as the 'number of tests' parameter would be removed from the denominator (as redundant), and both would practically represent the true percentage of critical patients per infected individuals. In order to assess the predictive potential of the IR metric, we have repeated for IR every piece of analysis that was performed on the SDR metric throughout this study and included all the related charts and tables as Supplementary Information (file: "supplementary _charts_tables_IR_assessment.docx"). Regarding forecasting, the conclusion drawn by this parallel analysis is that the IR metric performed as well as the SDR metric in the analyzed example (Supplementary Figs. 3-S, 4-S, Supplementary Table 2-S). On top of this, the IR metric would probably have the advantage of simplicity when communicated to the general public, as it represents a more comprehensible concept: the number of very severe cases per infected individuals. We therefore believe that the IR metric may be used in cases where the population-level COVID-19 testing surveillance of the pandemic is well established, by efficient and sufficient testing. Nonetheless, we maintain that by including the number of daily tests performed, the SDR metric is inherently more suitable for a wider range of surveillance scenarios, e.g., when the testing strategies and pandemic parameters (e.g., number and type of tests, geographical/occupational/age targeting, contact tracing efficacy, transmissibility of the virus, etc.) are more volatile in time. In different countries, or in specific areas of interest, it is still possible for the IR-based monitoring to fail to return regression coefficients as strong as in our studied example. In those cases, it would be necessary to switch to SDR-based monitoring to ensure that a threshold of sufficient testing has been reached (i.e., plateau formation). In any case, although more studied examples are required to better understand the potential practical differences between the two metrics, since they both showed equal forecasting performances, we believe that SDR is the more well-rounded metric, which can be efficiently used in potentially very diverse situations of pandemic surveillance.

Figure 6. Recommended population-level surveillance of COVID-19 pandemic using the Severity Detection Rate (SDR) metric. Monitor the following numbers daily: number of tests (PCR + Quick Antigen assay), number of positive cases, number of deaths from COVID-19, number of current ICU patients, and number of patients discharged from ICU. Calculate SDR: number of today's ICU needs / 100 positive cases / 10,000 tests performed t − i days ago.

Conclusions
Taken together, the monitoring of the Severity Detection Rate and the forecasting of number U (i.e., daily needs for ICU) should be viewed as integral parts of the currently employed epidemiological toolbox, i.e., the positivity rate, efficient contact tracing for determination of the basic reproduction number R₀ 19,20, and wastewater-based surveillance 21,22. The metric introduces the goal for authorities to minimize its variation by means of a sufficient number of daily tests and an adequate sampling strategy. Once this goal is achieved, accurate forecasting of daily needs for new ICU admissions becomes possible. With accurate forecasting, number U becomes in essence a quantitative metric for the severity of the pandemic.
In Fig. 6 we detail all the proposed steps for population-level surveillance of the COVID-19 pandemic using the Severity Detection Rate metric. For monitoring the SDR Standard Deviation, a rolling window of at least 3 weeks is suggested empirically, as this interval includes the roughly 2-week lag period between case detection and ICU intubation. The recommended surveillance model provides three distinct advantages: (1) a measurable threshold for adequacy of tests performed, (2) important qualitative information regarding the current dynamics of the pandemic (virulence, prevalence, testing accuracy, etc.) that is reflected by changes in the slope/intercept of the regression analysis, and (3) the ability to accurately predict the ICU needs, i days ahead.
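Monitoring the SDR Standard Deviation over a rolling 3-week window can be sketched as below. The choice of the sample standard deviation, the threshold value passed to the check and the sample series are all assumptions made for illustration; the text proposes the SD as a possible numerical threshold but does not fix its value.

```python
import statistics


def rolling_sd(values, window=21):
    """Sample standard deviation of each full trailing window."""
    return [
        statistics.stdev(values[end - window:end])
        for end in range(window, len(values) + 1)
    ]


def plateau_reached(sdr_values, sd_threshold, window=21):
    """True when the most recent rolling SD is at or below the threshold.
    `sd_threshold` is a hypothetical parameter chosen by the analyst."""
    sds = rolling_sd(sdr_values, window)
    return bool(sds) and sds[-1] <= sd_threshold


# Made-up SDR series (in %), for illustration only.
stable_sdr = [2.0, 2.2] * 15      # small day-to-day variation
unstable_sdr = [1.0, 20.0] * 15   # large day-to-day variation

print(plateau_reached(stable_sdr, sd_threshold=0.4))
print(plateau_reached(unstable_sdr, sd_threshold=0.4))
```

A series shorter than the window yields no rolling SD at all, in which case the check returns False rather than signalling a plateau.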
We strongly believe that the explicit tracking of this novel metric enhances the visibility of viral spread and dynamics and may provide an accurate outlook of the upcoming needs for ICU admissions well in advance, which should serve as an early warning system for COVID-19 health establishments and resources. We therefore suggest further study of the Severity Detection Rate with data from more countries, as well as at a local level wherever possible, to confirm the proposed functionality and utility of this metric.