Supplementary materials: Forecasting COVID-19 spreading through an ensemble of classical and machine learning models: Spain’s case study

This document contains the supplementary materials for the manuscript “Forecasting COVID-19 spreading through an ensemble of classical and machine learning models: Spain’s case study”.

The main point to highlight in the previous figure is that, in most cases, the ensemble of models (with any of the three forms of aggregation) still improves on the MAPE of the two families of models taken individually. This is particularly clear in Autonomous Communities such as Aragón, Cantabria, Castilla y León, Castilla La Mancha, Cataluña or País Vasco.
Whether Machine Learning (ML) or population models obtain better results depends on the autonomous community. In some of them, ML models always obtain a better MAPE than population models (e.g. Madrid, Castilla La Mancha and La Rioja), while in others population models obtain better results than ML models (e.g. Cantabria, Castilla y León and Cataluña).

Supplementary figure 2. Performance of some of the additional ML configurations we tried, for the Spain case in the test split. Models are aggregated with mean aggregation.

[Figure panels: Test (full); Test (no Omicron)]
... Figure 9 because the split is not as stable as validation (i.e. Omicron is slowly appearing, driving ML models to underestimation).
Supplementary figure 5. Mean MAPE and RMSE, for each model, for the Spain case in the test split. ML models are trained in Scenario 4.
Supplementary figure 6. SHAP dependence plots for non-case features. For each feature, we plot its raw value against its associated SHAP value. SHAP values are averaged across all ML models. All models are trained in Scenario 4 for the Spain case. We display values for the whole dataset (train + val + test).

Supplementary figure 7. SHAP dependence plots for case-lag features. For each feature, we plot its raw value against its associated SHAP value. SHAP values are averaged across all ML models. All models are trained in Scenario 4 for the Spain case. We display values for the whole dataset (train + val + test).

Explicit solution of the ODE of the Gompertz model and estimation of the initial parameters
Recall that the ODE which defines the Gompertz model is given by:

$$\frac{dp}{dt} = a\,p(t) - b\,p(t)\log(p(t)),$$

where $p(t)$ is the population at time $t$, and $a$ and $b$ are two parameters to be determined.

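Taking $y(t) = \log(p(t))$, the ODE becomes the linear equation $y'(t) = a - b\,y(t)$, and we obtain its explicit solution as follows (note that we write $y'(t) := \partial y / \partial t$ in order to simplify the notation):

$$
\begin{aligned}
y'(t) + b\,y(t) = a
&\implies e^{bt}\,y'(t) + e^{bt}\,b\,y(t) = e^{bt}\,a \\
&\implies \left(e^{bt}\,y(t)\right)' = e^{bt}\,a \\
&\implies e^{bt}\,y(t) = a \int e^{bt}\,dt = \frac{a}{b}\,e^{bt} + c \\
&\implies y(t) = \frac{a}{b} + c\,e^{-bt}
\implies p(t) = \exp\left(\frac{a}{b} + c\,e^{-bt}\right),
\end{aligned}
$$

where $c$ is the constant of integration.

In order to estimate the parameters $a$, $b$ and $c$, we fix three time instants $t_i$, $t_j$ and $t_k$ verifying $h = t_j - t_i$ and $2h = t_k - t_i$. Let

$$\alpha = \frac{\log(p(t_j)) - \log(p(t_i))}{\log(p(t_k)) - \log(p(t_i))}.$$

Substituting the explicit solution, we get:

$$\alpha = \frac{e^{-bh} - 1}{e^{-2bh} - 1} = \frac{1}{1 + e^{-bh}}
\implies b = \frac{1}{h}\log\left(\frac{\alpha}{1 - \alpha}\right),$$

$$c = \frac{\log(p(t_j)) - \log(p(t_i))}{e^{-b t_j} - e^{-b t_i}},
\qquad
a = b\left(\log(p(t_i)) - c\,e^{-b t_i}\right).$$

The process to obtain the initial parameters for the Logistic and Bertalanffy models is analogous to the one presented above for the Gompertz case.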
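The three-point estimation for the Gompertz case can be illustrated with a minimal Python sketch. It assumes the explicit solution has the form $p(t) = \exp(a/b + c\,e^{-bt})$, which follows from integrating $y'(t) = a - b\,y(t)$, and that the three observations are strictly positive and equally spaced. The function names `gompertz` and `estimate_gompertz_params` are hypothetical and chosen for illustration only; this is not the code used in the manuscript.

```python
import math


def gompertz(t, a, b, c):
    """Explicit Gompertz solution p(t) = exp(a/b + c * exp(-b * t))."""
    return math.exp(a / b + c * math.exp(-b * t))


def estimate_gompertz_params(t_i, h, p_i, p_j, p_k):
    """Three-point estimate of (a, b, c).

    p_i, p_j, p_k are observations at t_i, t_i + h and t_i + 2h.
    Illustrative helper (assumption): not taken from the manuscript.
    """
    if h <= 0 or min(p_i, p_j, p_k) <= 0:
        raise ValueError("need h > 0 and strictly positive observations")
    y_i, y_j, y_k = math.log(p_i), math.log(p_j), math.log(p_k)
    if y_k == y_i:
        raise ValueError("alpha is undefined when log p(t_k) == log p(t_i)")

    # alpha = (y_j - y_i) / (y_k - y_i) = 1 / (1 + exp(-b * h))
    alpha = (y_j - y_i) / (y_k - y_i)
    if not 0.0 < alpha < 1.0 or alpha == 0.5:
        raise ValueError("data are not compatible with a Gompertz curve")

    # Invert alpha for b, then recover c and a from the explicit solution.
    b = math.log(alpha / (1.0 - alpha)) / h
    t_j = t_i + h
    c = (y_j - y_i) / (math.exp(-b * t_j) - math.exp(-b * t_i))
    a = b * (y_i - c * math.exp(-b * t_i))
    return a, b, c


# Usage on synthetic, noise-free data generated from known parameters.
true_a, true_b, true_c = 1.2, 0.1, -5.0
t0, step = 3.0, 7.0
obs = [gompertz(t0 + k * step, true_a, true_b, true_c) for k in range(3)]
a_hat, b_hat, c_hat = estimate_gompertz_params(t0, step, *obs)
print(a_hat, b_hat, c_hat)
```

On noise-free data the estimates should match the generating parameters up to floating-point error. With real, noisy case counts the three-point values are only a starting point for a subsequent fit, and the check on $\alpha$ may reject triples that do not follow a Gompertz-like shape.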