Predicting mortality for Covid-19 in the US using the delayed elasticity method

The evolution of the pandemic caused by COVID-19, with its high reproductive number and the associated clinical needs, is overwhelming national health systems. We propose a method for predicting the number of deaths that will enable the health authorities of the countries involved to plan the resources needed to face the pandemic as many days in advance as possible. We employ OLS to perform the econometric estimation. Using the RMSE, MSE, MAPE, and SMAPE forecast performance measures, we select the best lagged predictor of both dependent variables. Our objective is to estimate a leading indicator of clinical needs. Having a forecast model available several days in advance can enable governments to address more effectively the gap between needs and resources triggered by the outbreak, and thus reduce the deaths caused by COVID-19.

www.nature.com/scientificreports/
After estimating equations using different lags, we select the one with the best forecast performance, that is, the one that minimizes forecasting errors. We calculate RMSE (ref. 16) as an indicator of predictive accuracy. Other indicators, such as MAE, MAPE or SMAPE, can also be used. RMSE is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$

where N is the number of out-of-sample observations used to measure forecast performance, ŷ is the estimated value of the dependent variable, and y is the actual value. Finally, we select the estimate whose lagged explanatory variable shows the lowest RMSE; its lag determines the prediction window, and we make the corresponding prediction in total values.
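The selection step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names `rmse` and `select_best_lag` are hypothetical, and the numbers are invented purely to show the mechanics of picking the lag with the lowest out-of-sample RMSE.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over the N out-of-sample observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def select_best_lag(actual, forecasts_by_lag):
    """Return the lag whose forecasts have the lowest RMSE, plus all scores."""
    scores = {lag: rmse(actual, preds) for lag, preds in forecasts_by_lag.items()}
    best_lag = min(scores, key=scores.get)
    return best_lag, scores


# Hypothetical out-of-sample values and forecasts (not data from the paper).
actual = [100.0, 120.0, 150.0]
forecasts_by_lag = {
    7: [90.0, 130.0, 140.0],
    9: [98.0, 123.0, 149.0],
}

best_lag, scores = select_best_lag(actual, forecasts_by_lag)
print(best_lag, scores)
```

The same structure works with MAE, MAPE or SMAPE: only the scoring function passed over the forecasts would change.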

Results
The estimation sample spans from March 4 to March 29, 2020 (26 observations). We left March 30, March 31 and April 1 (3 observations) as out-of-sample observations in order to measure the forecast performance of the estimated model. Estimation is performed with the OLS estimator. We select the model with a nine-day lag, since it shows the lowest RMSE value. The model with the best forecast performance has the log-log form

$$\ln(D_t) = \alpha + \beta \ln(C_{t-9})$$

where D_t is the number of new deaths on day t, C_{t-9} is the number of new confirmed cases nine days earlier, and α is the intercept. The delayed elasticity, β = 0.7839, means that a 1% increase in the number of infected cases predicts a 0.78% increase in the number of deaths nine days later. The estimate presents a high goodness of fit, with an R-square of 0.98. Figure 1 displays the evolution of daily new confirmed deaths and new confirmed cases nine days earlier. Table 1 shows the number of actual and estimated deaths, as well as the errors for each time period. To obtain the total number of deaths, we convert the estimated daily values into cumulative totals.
We also applied the DEM to smaller areas, specifically the State of California and the city of New York, using data for the same period, in order to test the robustness of the method. For the number of deaths in California and the numbers of deaths and infected cases in New York city, we add 1 to the actual values before taking logarithms, which solves the missing value problem generated by observations equal to 0. For the city of New York, the DEM offers a seven-day window, while for the State of California the prediction window is nine days, the same as for the US as a whole. As regards the delayed elasticity parameters, we found a value of 0.896 for California, higher than the 0.7839 found for the US, while New York city shows a lower delayed elasticity of 0.7627.
Finally, both the California and New York city models present an R-square of 0.98, similar to the 0.98 shown by the US model.
As already pointed out, the aim of the DEM is to fill the predictive gaps in the early stages of the pandemic. However, its predictive accuracy also holds in the long run. The initial estimates of this work were performed on April 2; since this study was revised in early October, enough observations are now available to evaluate its long-run predictive accuracy. Figure 2 displays actual and estimated COVID-19 deaths extended up to August 31, 2020 (data extracted from Johns Hopkins University CSSE as of October 5, 2020 are included in Table C1 in the annex). As can be seen, the model estimates remain within an error range of below 10% until July 1: that is, we are able to estimate with a high degree of accuracy for just over three months, having used only 26 observations to obtain the equation used for the prediction.
In sum, the results show that in the case of the US the deaths for the following nine days could have been predicted with a high level of accuracy during the expansive stage of the pandemic. The results also show that the DEM can be applied to different territorial levels, so that in the case of California, the predictions would also be nine days in advance, and seven days in New York city. We have also verified that the model remains stable over three months. In other words, without re-estimating the model, we could have maintained the same equation during the whole expansive stage of the pandemic to make the predictions.
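As an illustration of the kind of estimation reported in this section, the following sketch fits a log-log regression of deaths on lagged cases by ordinary least squares and uses it to forecast. It is a simplified reading of the method, not the authors' implementation: the function names, the `offset` parameter (standing in for the "add 1" adjustment used for California and New York city) and the synthetic series are all assumptions made for the example.

```python
import math


def fit_delayed_elasticity(cases, deaths, lag, offset=0.0):
    """OLS fit of ln(deaths_t + offset) = alpha + beta * ln(cases_{t-lag} + offset).

    Returns (alpha, beta); beta plays the role of the delayed elasticity.
    """
    if len(cases) != len(deaths):
        raise ValueError("cases and deaths must have the same length")
    if not 0 <= lag < len(cases) - 1:
        raise ValueError("lag must leave at least two paired observations")
    # Pair deaths on day t with cases on day t - lag.
    x = [math.log(c + offset) for c in cases[:len(cases) - lag]]
    y = [math.log(d + offset) for d in deaths[lag:]]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def predict_deaths(alpha, beta, lagged_cases, offset=0.0):
    """Forecast deaths `lag` days after the day on which lagged_cases was observed."""
    return math.exp(alpha + beta * math.log(lagged_cases + offset)) - offset


# Synthetic series built so that the true relationship is known (not real data):
# deaths_t = exp(-2) * cases_{t-3} ** 0.8, with three arbitrary leading values.
LAG = 3
cases = [10.0 * 1.3 ** t for t in range(20)]
deaths = [1.0] * LAG + [math.exp(-2.0) * c ** 0.8 for c in cases[:-LAG]]

alpha, beta = fit_delayed_elasticity(cases, deaths, LAG)
forecast = predict_deaths(alpha, beta, cases[-1])
print(alpha, beta, forecast)
```

Because the most recent `lag` days of case counts have no matching deaths yet, each of them yields one forecast, which is what gives the method its prediction window.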

Discussion
The DEM is a model that does not require long time series and is therefore easily applicable in the early stages of the pandemic, when there is a lack of available data and when authorities need urgent and reliable predictions to fill the gap between the clinical needs caused by the pandemic and the available resources. In addition, it provides a prediction window, which in our case is nine days for deaths from Covid-19 in the US.
The DEM is applicable to other areas, as we have shown at the state and city level. It is also applicable by age group if data for both variables are available. Disaggregation, as we have shown, yields different delayed elasticities, given that the pandemic does not evolve in the same way in different locations and that the disease does not affect different age groups in the same way. Moreover, the DEM is versatile and can be applied to different types of clinical needs associated with the pandemic, such as hospitalizations, admissions to ICUs or ventilator needs.
Together with the possibilities indicated above, the DEM presents certain difficulties that must be taken into account. First, the model is sensitive to any factor that affects the counts of either variable. For example, in the first stage of the Covid-19 pandemic, the health system did not have the means to perform all the tests needed, so the model will clearly lose accuracy as testing capacity improves. There might also be differences and/or changes in the death count. States or territories may define deaths differently, which will affect the comparability of the results; in addition, the authorities might alter the criteria for counting deaths, which would imply greater model inaccuracy. Second, we must also take into account changes of all kinds that affect the relationship between infections and deaths during the evolution of the pandemic. These include the collapse of health systems; changes in treatments or clinical innovations; changes in environmental factors or mutations of the virus that may alter its lethality; and changes in the characteristics of the infected population, such as the age pyramid of those infected. These factors make it advisable to use short time series and the closest observations in time, so that the above-mentioned changes are kept to a minimum.
Given that all the aforementioned factors may alter the accuracy of the model throughout the pandemic, it will need to be recalculated. To do this, in addition to re-estimating the model each time we become aware of any of the previously mentioned special circumstances, we can also establish systematic recalculation criteria, such as recalculating when the model presents a sustained prediction error above a certain percentage (for example, 5% or 10%), or recalculating on a fixed calendar (for example, every week).
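The two systematic criteria mentioned above could be checked along the following lines. This is only a sketch under stated assumptions: a "continuous" error is interpreted here as the absolute percentage error exceeding the threshold on each of the last `run_length` observations, and the function names, default values and sample numbers are hypothetical rather than taken from the study.

```python
from datetime import date


def needs_recalculation(actual, predicted, threshold=0.10, run_length=3):
    """True if the absolute percentage error exceeded `threshold`
    on each of the last `run_length` observations (assumed rule)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    recent = list(zip(actual, predicted))[-run_length:]
    if len(recent) < run_length:
        return False  # not enough observations to judge yet
    # Observations with a non-positive actual value never count as exceeding.
    return all(a > 0 and abs(p - a) / a > threshold for a, p in recent)


def calendar_due(last_fit, today, every_days=7):
    """True if at least `every_days` days have passed since the last fit."""
    return (today - last_fit).days >= every_days


# Hypothetical actual and predicted deaths (not data from the paper).
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [98.0, 100.0, 105.0, 112.0]

print(needs_recalculation(actual, predicted, threshold=0.10, run_length=2))
print(needs_recalculation(actual, predicted, threshold=0.05, run_length=3))
print(calendar_due(date(2020, 4, 2), date(2020, 4, 9)))
```

In practice the two checks could be combined, re-estimating whenever either one fires.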
As stated, the DEM is linked to epidemiological models because it estimates the relationship between two epidemiological variables. For this reason, it opens up possibilities for further research on the relationship between delayed elasticity and the parameters of epidemiological models. Clarifying this possible relationship could allow the DEM to be integrated into epidemiological models in order to improve the latter's predictive capacity. Due to its simplicity, versatility and predictive accuracy, the DEM can be applied to make predictions in other areas of clinical care where there are related cause-effect variables. This would substantially expand the research field of the methodology presented in this paper.

Data availability
All data generated or analysed during this study are included in this published article (and its Supplementary Information files).