Correction to: Nature Energy https://doi.org/10.1038/s41560-020-0662-1, published online 17 July 2020.
In the version of this Article originally published, a comprehensive analysis of the model performance was not provided; thus, to avoid potential confusion over the model validation procedure and to provide a better representation of the model performance, the rolling-window cross-validation and out-of-sample testing results have now been included in the corrected Article and its Supplementary Information.
In the Methods section ‘The Mobility Dynamic Index Forecast Module’, the sentence describing the cross-validation method “In addition, cross validation is adopted to search the optimal network structure and avoid overfitting, in which the datasets are divided into training and test datasets by a ratio of 2:1.” has been changed to “In addition, the rolling-window cross-validation is adopted to search the optimal network structure, which is detailed in Supplementary Note 5. Out-of-sample testing is also performed for the selected neural network structure to estimate the performance of the model in predicting future mobility.”
In the Supplementary Information, the original Supplementary Fig. 10 that used R² to describe the random-split cross-validation results has been replaced by the corrected version that uses the root mean square error (RMSE) to describe the out-of-sample testing results, and the caption has accordingly been updated to read “Root Mean Square Error (RMSE) of the neural network model with 2 hidden layers and 25 nodes. The data before May 15 were used as the training dataset, and the data between May 25 and May 31 were used as the out-of-sample testing dataset. (a) Google mobility: workplaces; (b) Google mobility: retail and recreation; (c) Google mobility: grocery and pharmacy; (d) Google mobility: parks; (e) Apple Mobility.”
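For reference, the RMSE is conventionally defined as below. The symbols are assumptions introduced here for illustration and do not appear in the correction: y_t is the observed mobility value on day t, ŷ_t is the model prediction for that day, and n is the number of days in the evaluated dataset.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}
```

Unlike R², the RMSE is expressed in the same units as the mobility data, and a lower value indicates a closer fit.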
Additionally, the original Supplementary Table 4 that used R² to select the neural network structure has been replaced by the corrected version that uses the RMSE instead, and the caption has been updated accordingly to read “Rolling-window cross-validation of the neural network model for different combinations of hidden layers and nodes. The data in the table show the Root Mean Square Error (RMSE) of the training dataset and cross-validation dataset. Yellow highlighted text indicates the layers and the nodes are adopted in the neural network in the PODA model.”
Furthermore, discussion of the rolling-window cross-validation and out-of-sample testing results has been added in Supplementary Note 5: the first paragraph, starting “Supplementary Figure 10 compares the historical mobility data with the results predicted by the trained model…”, has been rewritten to read:
“Multiple regularization techniques were adopted to avoid overfitting. We used weight-decaying (equivalent to L2 regularization) to penalize large neural network weights and enforce model parameter sparsity and such to avoid overfitting. We also used mini-batch with Adam optimizer to train the neural network. Mini-batch training can offer a regularizing effect since it adds noise to the learning process. In addition, early stopping was employed to avoid overfitting.
The rolling-window cross-validation was performed to study the effect of the number of layers and nodes on the performance of the neural network model. Supplementary Table 4 lists the Root Mean Square Error (RMSE) of the training datasets and cross-validation datasets. For each combination of layer and node, two evaluations were performed with training dataset to be before April 15 and April 29, respectively. For each run, the model was trained using 2/3 of the randomly selected data from the training dataset. The “Validation dataset” listed in Supplementary Table 4 was used for cross-validation. Generally speaking, the neural network models with 1-hidden-layer and 2-hidden-layer achieve better performance than the 3-hidden-layer and 4-hidden-layer models. They are relatively insensitive to the number of nodes. Overall, the neural network models with 1-layer-30-node, 1-layer-35-node, 2-layer-25-node, and 2-layer-30-node are top performers. The 2-layer-25-node neural network is adopted in the PODA model for this work.
Supplementary Figure 10 shows the out-of-sample testing of the neural network model with 2 hidden layers and 25 nodes. Data before May 15 was used for model training, and the data between May 25 and May 31 for model testing. The trained model well predicts the future mobility related to “workplaces”, “retail and recreation”, and “grocery and pharmacy”. The relatively poor performance in predicting “Google parks” and “Apple mobility” is due to the high day-to-day variations. There is no obvious over-fitting as the performance in the testing dataset is comparable to the training dataset. Finally, the neural network model was retrained with 2/3 of random-sampled all of available data before June 11 to capture the latest pattern.”
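The rolling-window procedure quoted above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the daily series is synthetic, the 14-day validation horizon is hypothetical (the correction does not state the validation window length), and a trailing-mean forecaster stands in for the neural network, with its window length playing the role of the layer and node combination being selected. The sketch also omits the random 2/3 subsampling of the training data. Only the two training cutoffs (April 15 and April 29) and the use of RMSE as the selection metric come from the text.

```python
import math
from datetime import date, timedelta


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rolling_window_splits(dates, cutoffs, horizon_days):
    """Yield (train_idx, val_idx) for each cutoff date.

    Training uses every day strictly before the cutoff; validation uses the
    following `horizon_days` days, so no future data leaks into training.
    """
    for cutoff in cutoffs:
        end = cutoff + timedelta(days=horizon_days)
        train_idx = [i for i, d in enumerate(dates) if d < cutoff]
        val_idx = [i for i, d in enumerate(dates) if cutoff <= d < end]
        yield train_idx, val_idx


def trailing_mean_forecast(train_values, window, steps):
    """Stand-in model (assumption): repeat the mean of the last `window` days."""
    recent = train_values[-window:]
    level = sum(recent) / len(recent)
    return [level] * steps


# Synthetic daily series (assumption): a linear trend plus a weekend bump.
dates = [date(2020, 3, 1) + timedelta(days=i) for i in range(80)]
values = [
    -50.0 + 0.4 * i + (5.0 if d.weekday() >= 5 else 0.0)
    for i, d in enumerate(dates)
]

cutoffs = [date(2020, 4, 15), date(2020, 4, 29)]  # cutoffs named in the text
candidates = [3, 7, 14]  # hypothetical hyperparameter grid

scores = {}
for window in candidates:
    fold_rmse = []
    for train_idx, val_idx in rolling_window_splits(dates, cutoffs, horizon_days=14):
        train = [values[i] for i in train_idx]
        actual = [values[i] for i in val_idx]
        predicted = trailing_mean_forecast(train, window, len(actual))
        fold_rmse.append(rmse(actual, predicted))
    # Average the validation RMSE over the two rolling windows.
    scores[window] = sum(fold_rmse) / len(fold_rmse)

# Select the candidate with the lowest mean validation RMSE.
best_window = min(scores, key=scores.get)
print(scores, best_window)
```

The same loop structure would apply to an out-of-sample test: train on all data before a final cutoff and report the RMSE on a later, untouched window.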
These corrections have been peer reviewed.
Cite this article
Ou, S., He, X., Ji, W. et al. Author Correction: Machine learning model to project the impact of COVID-19 on US motor gasoline demand. Nat Energy (2020). https://doi.org/10.1038/s41560-020-00711-7