COVID-19 forecasts using Internet search information in the United States

As the COVID-19 ravaging through the globe, accurate forecasts of the disease spread are crucial for situational awareness, resource allocation, and public health decision-making. Alternative to the traditional disease surveillance data collected by the United States (US) Centers for Disease Control and Prevention (CDC), big data from Internet such as online search volumes also contain valuable information for tracking infectious disease dynamics such as influenza epidemic. In this study, we develop a statistical model using Internet search volume of relevant queries to track and predict COVID-19 pandemic in the United States. Inspired by the strong association between COVID-19 death trend and symptom-related search queries such as “loss of taste”, we combine search volume information with COVID-19 time series information for US national level forecasts, while leveraging the cross-state cross-resolution spatial temporal framework, pooling information from search volume and COVID-19 reports across regions for state level predictions. Lastly, we aggregate the state-level frameworks in an ensemble fashion to produce the final state-level 4-week forecasts. Our method outperforms the baseline time-series model, while performing reasonably against other publicly available benchmark models for both national and state level forecast.


Introduction
COVID-19, an acute respiratory syndrome disease caused by novel coronavirus SARS-CoV-2, has spread to more than 200 countries worldwide, leading to more than 224 million confirmed cases and 4.98 million deaths as of Oct 25, 2021 [1]. Understanding how the disease spread dynamics progress over time is much needed, given the fluid situation and the potential rapid growth of COVID-19 infections. The implementation of efficient intervention policy and the allocation of emergency resources all depend on the accurate forecasts of the disease situation [2]. Currently, machine learning methods [3,4,5] and compartmental models [6,7,8,9,10] are the most popular and prevailing approaches for the publicly-available COVID-19 spread forecasts, according to the weekly forecast reports compiled by the Centers for Disease Control and Prevention (CDC) [11]. On the other hand, statistical models utilizing internet search behaviors for COVID-19 predictions have not attracted much attention.
In the last decade, numerous studies have shown that Internet-based big data could be a valuable complementary data source to monitor the prevalence of infectious diseases and provide near real-time disease estimations [12,13,14,15], alternative to the traditional surveillance approach. For example, Yang et al. [12] provided a robust way to estimate real-time influenza situation in the United States using Google search data; Ning et al. [16] and Yang et al. [17] further extended the method for the regional-level and the state-level influenza estimation using the Google search data at the finer resolution; Yang et al. [18] demonstrated the success of search data to track dengue fever in five tropical countries. Other types of internet-based data include cloud-based electronic health records [19] and social media messages [20]. Currently, during the ongoing COVID-19 pandemic, a large amount of pandemic-related online searches are generated, indicating actual infections and general concerns, which might contain useful information to estimate and predict the disease spread.
However, building COVID-19 prediction model with online search data is undoubtedly challenging. First of all, COVID-19 pandemic is a novel disease outbreak with rapid development, which creates difficulties in identifying relevant keyword queries of the search data. Even with the expert-curated queries, the online search frequency data can contain a high level of noise with many unusual spikes, due to the general searches driven by non-disease factors such as media coverage or public concern. Besides, the ground truth data compiled by CDC could also be noisy with frequent retrospective revisions of daily/weekly cases accounting for the mistakes in data collection and reporting.
The relevant search queries in existing literature include general COVID-19 terms such as "coronaviours" or "COVID" [27], public safety precautions such as "handwashing" [28], symptom-related queries such as "loss of smell" [35,30]. However, most of the existing articles focus on a handful of query terms under a specific class, without a large-scale data-driven query identification process.
While most of existing articles argue for the potential of online search data for COVID-19 forecasts [23,27,28,29,30,33,31,24,35], only a few actually build the prediction models [34,26,32,25]. Specifically, Mavragani and Gkillas [34] demonstrates the search data predictive power via quantile regression in U.S. states. Prasanth et al. [26] uses the search data from selected queries on a long-short term memory (LSTM) framework for U.S, U.K and India. Ayyoubzadeh et al. [32] takes it further by combining linear regression with LSTM model to provide short-term COVID-19 cases forecast in Iran. Lampos et al. [25] conducts a prediction study in several European countries, using transfer-learning and Gaussian processes.
However, none of the articles above fully utilizse the predictive power of internet search data by accounting for the spatial-temporal structure, including the time series information of COVID-19 or internet searches in near-by regions/areas. Other qualitative analysis, ad-hoc correlation exercise, or off-the-shelf model application are even further away from a real impact. The only internet-search-based model, to the best of our knowledge, that accounts for spatial structure is ARGONet [22], which uses a clustering and L 1 -penalized data augmentation technique for 2-day ahead COVID-19 cases forecast in China. So far, none of the existing Internet-search-based methods provides robust weeks-ahead forecasts for different geographical areas in the United States that account for spatial correlations.

Our contribution
In this paper, we propose a simple framework with Google search queries for United States national and state level 4-week-ahead COVID-19 deaths forecasts. In particular, we identify relevant queries through a large-scale cross-correlation exercise, and de-noise the search frequency data from unusual spikes using inter-quantile-range method. We detect strong correlation between lagged Google search data and COVID-19 deaths with symptom-related terms, and select important Google search queries among the large-batch of related terms to increase the robustness and interpretability of our forecasts. We then utilize the detected predictive power and combine the selected Google search data and lagged COVID-19 time series information to produce national level forecasts. We further incorporate a multi-resolution spatial temporal framework for state-level forecasts, leveraging cross-state, cross-region COVID-19 and Google search information, accounting for the geographical proximity and correlations in infections. Unlike the clustering technique in ARGONet [22], our state level method exhibits stronger internal spatial structure and better model interpretability. Lastly, we incorporate a winner-takes-all mechanism to generate more coherent final United States national and state level predictions. Numerical comparisons show that our method performs competitively with other publicly available COVID-19 forecasts. The success of our method demonstrates that previously-developed models for influenza predictions [12,17] using online search data can be re-purposed for accurate and robust forecasts of COVID-19, further emphasizing the general applicability of our method and the power of big data disease detection. national and state level. Regional level Google search queries are obtained by simply summing up the state level Google search query volumes that are in the region. Note that some of the search terms might seem identical, e.g. "coronavirus vaccine" and "covid 19 vaccine.", but we treat them separately in our model due to linguistic heterogeneity, as terms with similar meaning but differently phrased are embedded with different search frequencies by the public due to different linguistic preferences.
Google Trends also truncates data to 0 if the search volume for the query is too low. Consequently, for a given query and state, the zeros in Google Trends data indicate missing data due to low volume of searches for the the specified query and state, which is very common in practice. We account for the high level of sparsity in the state level data by borrowing information from regional level to "enrich" state-level sparsity through a weighted average of state-level search frequency (2/3 weight) and regional-level search frequency (1/3 weight) [17].

Inter-Quantile Range (IQR) Filter for Google Search Data
Instability and sudden spikes/drops in Google search volume data can due to natural noises in Google Trends' sampling approach. Meanwhile, sparsity might still exist in some states' search queries after regionalenrichment of state-level Google search data. Such instability and sparsity severely reduce the prediction accuracy at the national, regional and state level. Therefore, we introduce an Inter-Quantile Range (IQR) based data filtering mechanism to reduce the noise in the data.
We first drop the Google search terms that have frequencies lower than the median number of all other Google search queries frequencies. Then, we identify the outliers of a Google search query. The large-valued outlier are those above 99.9 percent quantile and also three standard deviations above past-week rolling average. The small-value outliers aer those below 1 percent quantile. We overwrite the outlier values to be the past three-day average.
The reason behind different data processing approaches for large and small valued outliers lies in the hypothesis that an sudden increase of search volume is more probable to be true than a drop in search frequency of a Google search query, which is possibly due to inadequate search intensity. For instance, sudden increases in search frequencies of COVID-19 related terms occur when COVID-19 first hit the U.S. in mid March 2020, while decreases/sparsities in search query volumes are resulted from low search volume and missing data, especially in state-level search queries. Therefore, the IQR filter is "looser" on large valued outliers and "more strict" on small valued outliers. As an result, we believe that a large valued outlier is indeed an "unreasonable" spike if they are significantly larger than search frequencies from the other days in the same week.
This IQR inspired filtering mechanism is able to further account for the sparsity in Google search queries as well as removing unusual spikes, which improves our model's forecast accuracy.

Optimal Lag
It is typical to see the peak of COVID-19 search volume ahead of the peak in reported cases or deaths, see Fig 1 for an illustration for query "loss of taste". One hypothesis is that the early-stage infected people could search for COVID-19 related information online before their arrival at a clinic or tested positive.
As such, using delayed Google search frequencies for forecast is essential for our predictions. One simple way is to enumerate all possible delaying lags of Google search frequencies as exogenous feature variables. Yet, this will significantly increase the number of exogenous variables and impact prediction accuracy which could result in over-fitting. Thus, we derive the optimal lag for each Google search query and only consider those optimal lags in forecasting model.
We use the period from April 1st 2020 to June 30th 2020 to find the optimal lags. We will use the period after July 1st 2020 for comparing forecast accuracy. Because media-driven or information-seeking searches are very common at the beginning of the pandemic [39,40], we exclude period prior to April 1st 2020 in the analysis, so that the query terms identified are more likely to be driven by actual infection. For each query, we fit a linear regression of COVID-19 daily death against lagged Google search frequency, considering a range of lags (4 to 35 days). We select the lagged Google search frequency that has the lowest mean square error (MSE) as the optimal lag for that query. Table S3 in supplementary materials shows all the optimal lags for the selected important Google search terms, ranked by their optimal lags. For national, regional and state level Google search information, we consider the same optimal lag displayed in table S3 throughout this study. Figure 1: Google search query "loss of taste" and COVID-19 weekly incremental death Illustration of delay in peak between Google search query search frequencies (Loss of Taste in red) and COVID-19 national level weekly incremental death (blue). Y-axis are adjusted accordingly.

Highly Correlated 23 Terms
Though we removed low frequency queries and sparsity in the remaining queries through the IQR filter, some remaining queries might still exhibit high variability and do not obtain a clear trend comparing to COVID-19 death. To further obtain most useful terms for predicting COVID-19 death and eventually reduce our model complexity, we computed Pearson correlation coefficient between each of the optimal lagged search term and COVID-19 daily death during the period from April 1st 2020 to June 30th 2020, where the detail derivation of optimal delay is shown in section 2.2.2. Table S2 in supplementary materials lists all the Google search  query terms and their Pearson correlation coefficient against COVID-19 daily death, in which only positive Pearson correlation terms are displayed.
We select the 23 terms that have Pearson correlations above 0.5 as "important terms" and only use them in our forecast model, shown in table S3.

ARGO Inspired Prediction
Let X i,t,m be the Google Trends data of search term i day t of area m; y t,m be the New York Times COVID-19 death increment at day t of area m; c t,m be the New York Times COVID-19 confirmed case increment at day t of area m, where the area m can refer to the entire nation, one specific HHS region (such as New England), or one specific state (such as Georgia). Let O k be the optimal lag for the kth Google search term, which is the same for all area m. Let I {t,r} be the weekday r indicator for t (i.e., I {t,1} indicates day t being Monday, and I {t,6} indicates day t being Saturday), which accounts for the weekday seasonality in COVID-19 incremental death time series.

ARGOX Inspired State Level Prediction
In the previous section, our prediction is based on Google search terms and past COVID-19 cases and death information. The predictions perform competitively on national level, but fall short in regional and state level due to the spatial correlation induced by geographical proximity, transportation connectivity and region-state-wise related spread. Thus, we incorporate ARGOX [17], a unified spatial-temporal statistical framework that combines multi-resolution, multi-source information while maintaining coherency with regional and national COVID-19 death. ARGOX [17] operates in two steps: the first step extracts state level internet search information via LASSO, and the second step enhances the estimates using cross-region, cross-resolution spatial temporal framework. The second step takes a dichotomous approach for the joint states and alone states, identified through geographical separations and multiple correlations on COVID-19 growth trends. In particular, it incorporates the cross-state, cross-source correlations for the joint model of connected states, while leaving the freedom for stand-alone modeling for a handful of isolated states. For the connected states, it gathers raw estimates for state/regional/national-level weekly COVID-19 incremental deaths and past time step's COVID-19 incremental deaths, and applies a penalized best linear predictor to get the final state level estimates. See SI for detailed formulation in our particular setting of joint model and stand-alone model.

State Level Forecasting with National Constraint
Although ARGOX is a consistent framework in cascading fashion, the exact predictions from ARGOX are not consistent through aggregation from bottom up. There exists inconsistency between the aggregation of state-level ARGOX prediction and national level prediction ARGO prediction, illustrated in fig S1 and S2 in supplementary materials, which shows that the sum of ARGOX 51 states' COVID-19 incremental death predictions does not equal the national level ARGO prediction. Furthermore, our ARGO inspired national level predictions (section 3.1) are more accurate for the national level groundtruth than the sum of ARGOX 51 states' predictions. Therefore, we propose a constrained second step to ARGOX inspired state level prediction, by restricting the sum of state-level ARGOX predictions to be close to the ARGO inspired national prediction.
After the first step in section 3.2, instead of separating the 51 US states into "joint" and "alone" states and estimating state-level COVID-19 deaths separately, we treat them as a whole (except HI and VT) as we are constraining the sum of all states' death estimations. We separate out HI and VT, since these two states have the lowest incremental COVID-19 deaths, and such sparsity causes instability when deriving the covariance matrices. We estimate HI and VI COVID-19 weekly incremental death using the ARGO inspired state-level estimates through equation (2).
For the 49 states (except HI and VT), our raw estimates for the state-level weekly COVID-19 incremental death where r m is the region number for state m. Here, we denote state and reg to be state/regional estimates with internet search information only, where the detailed derivations can be found in SI, and nat to be national estimates from equation (1). Similar to the second step in section 3.2 and in ARGOX [17], we denote the state-level death increment at week τ as Z τ = ∆y τ = y τ − y τ −1 and it has four predictors: (i) We denoteŷ * ARGO τ,nat as the week τ national level ARGO-inspired COVID-19 incremental death estimation excluding HI and VT. To predict the week τ state-level COVID-19 death increment, we solve the constrained optimization problem below that minimizes the variance between groundtruth and our predictor, subject to the constraint that the sum of our predictor ought to be close to national level COVID-19 incremental death estimations.
where 1 = (1, . . . , 1) is a length 49 vector of 1s, and µ Z is the mean of Z τ . In particular if the above optimization problem is unconstrained, the solution will be the best-linear predictor (no ridge-penalty) in ARGOX [17]. The detail derivation of the optimization problem in equation (3) as well as the final closed-form solution can be found in the supplementary materials.

Winner-takes-all State Level Ensemble Forecast
To further boost the state-level COVID-19 death prediction accuracy, we incorporate an ensemble framework that combines our previous estimations and selects the best predictor for each week. For all 51 U.S. states, we denote the ARGO-inspired state-level prediction for week τ as "ARGO" (section 3.1), ARGOX-inspired joint-alone state prediction as "ARGOX-2step" (section 3.2), and ARGOX-inspired national constrained prediction as "ARGOX-NatConstraint" (section 3.3). For a training period of 15 weeks, we evaluate each predictor with mean squared error (MSE) and select the one with lowest MSE as the ensemble predictor for week τ + 1, τ + 2, τ + 3, and τ + 4. Such winner-takes-all approach has been previously shown to be effective for influenza estimation [14].

Evaluation Metrics
We use three metrics to evaluate the accuracy of an estimate of COVID-19 death against the actual COVID-19 Death published by John Hopkins University (JHU): the root mean squared error (RMSE), the mean absolute error (MAE), and the Pearson correlation (Correlation). RMSE between an estimateŷ t and the true value y t MAE between an estimateŷ t and the true value y t over period Correlation is the Pearson correlation coefficient betweenŷ = (ŷ 1 , . . . ,ŷ T ) and y = (y 1 , . . . , y T ).

National Level
We focus on United State's 51 states/districts (including Washington DC) for all comparisons in this section. In this section, we conduct sensitivity analysis by comparing our own methods on national level and state level for 1 to 4 weeks ahead COVID-19 incremental death prediction.
For national level, we compare with four other simplified models (i) persistence (Naive), (ii) ARGO prediction (section 3.1), (iii) AR-only prediction, and (iv) GT-only prediction. The Naive (persistence) predictions use current week's incremental death counts from New York Times (NYT) as next 1 to 4 weeks' estimation, since NYT does not retrospectively correct past data. AR-only prediction uses the model setup in section 3.1 but only with lagged death and cases information, whereas GT-only prediction is using Google search information only. For fair comparisons, both AR-only and GT-only predictions include weekday indicators, and use 56 days training period. Daily COVID-19 incremental deaths are estimated using the three methods above, and aggregated into weekly incremental deaths for the time period of July 4, 2020 to Oct 9th, 2021. The period before July 4, 2020 is excluded for comparison analysis, as the optimal Google search terms' lags are selected using that period. The groundtruth is COVID-19 weekly incremental death from JHU COVID-19 dataset (section 2.1).    The results presented above demonstrate ARGO's accuracy and robustness. The optimal lagged 23 important Google search terms appear to be key factors in the enhanced accuracy of ARGO, as well as the past week's COVID-19 deaths, as shown in Fig S3 and S4 in supplementary materials, which reflect a strong temporal autocorrelation after the feature selection of L 1 penalty.

State Level
For state level, we compare persistence (Naive) predictions, ARGO prediction, ARGOX-2step prediction, ARGOX-NatConstraint and Winner-takes-all Ensemble. ARGO predictions for each state are obtained from a L 1 penalized regression, utilizing the lagged state-level COVID-19 cases, death and optimal lagged Google search terms (section 3.1). ARGOX-2step combines cross-state cross-region information to obtain model outputs as predictions (section 3.2). ARGOX-NatConstraint, revised upon ARGOX-2step, constrains the sum of state-level prediction to align with national-level prediction while pooling cross-resolution cross-temporal information (section 3.3). ARGO daily predictions use 56 days as training period, and are aggregated into 1 to 4 weeks ahead weekly incremental death predictions. ARGOX-2step, ARGOX-NatConstraint, and winner-takes-all ensemble method all use 30 consecutive and overlapping weeks as training period.
We conduct retrospective estimation of the 1 to 4 weeks ahead 51 U.S. state level COVID-19 incremental deaths, for the period of July 4th 2020 to Oct 9th 2021. To evaluate the accuracy of our estimation, we conduct sensitivity analysis among our methods by comparing their estimations with the actual COVID-19 weekly incremental death released by JHU dataset in multiple error metrics, including RMSE, MAE, and the Pearson correlation (section 4.1).   Detailed numerical results for each state are reported in tables S5-S55 and fig S14-S64 in supplementary material, where the ensemble approach demonstrates its accuracy in majority of the states. Notice that JHU ground-truth data exists jumps and spikes in some states due to retrospective edition. Such noisy ground-truth data is a key challenge to forecasting tasks. Our ensemble framework, on the other hand, reveals its robustness over geographical variability and extracts a strong combination from all other ARGO, ARGOX approaches.
To show detail break down of methods contributing to the ensemble approach, we display the ensemble approach's selection proportion among ARGO, ARGOX-2step and ARGOX-NatConstraint 1 to 4 weeks ahead predictions for all 51 U.S. states from 2020-07-04 to 2021-10-09, in table 3. Throughout the period for all 51 states, the ensemble approach selects state-level ARGO predictions the most (around 40%), and ARGOX-NatConstraint the least (around 25%) throughout 1 to 4 weeks ahead estimations. This is evident as  In addition to the weekly estimates, ARGOX Ensemble also gives confidence intervals, by simply taking the confidence intervals of selected method. Table S4 (in supplementary materials) shows the coverage of the confidence intervals for all 51 states. The nominal 95% confidence interval has an actual 88.2% coverage on average for 1 week ahead predictions, suggesting that our confidence intervals reasonably measure the accuracy of our weekly estimates, albeit with over-confidence.

Comparison with Other Publicly Available Methods
We obtain all available CDC published forecasts from teams making projections of COVID-19 cumulative and incident deaths for comparison [41,11,42,43,44,45]. CDC official predictions are a compilation of predictions from all teams that submit their weekly predictions every Monday since January 15th 2020, contributed by different research groups or individuals. Note that all prediction models have their strengths in different date ranges and are providing robust and consistent predictions.

Discussion
While our ARGOX-Ensemble approach shows strong results, its accuracy and robustness depends on the reliability of its inputs. One limitation of our method is that the Google search query volumes are sensitive to media coverage, and such instability could propagate into our COVID-19 death predictions. Fortunately, media driven searches die down as pandemic progresses. In addition, our model also mitigate such instability via adaptive training.  We use the summer period to identify the optimal lag and the 23 highly correlated queries. Such idea of optimal lag captures the intuition that people tend to search before clinic visits. It is interesting to observe different indications of epidemiological plausibility and infections from the optimal delays of the queries (table  S3). COVID-19 incremental death trend has the longest delay from COVID-19 cases and tests related queries (more than 4 weeks), and follows by mild (3-4 weeks) and severe symptoms (2-3 weeks) related queries, indicating symptom to death is in moderate horizon and symptom's severity increases while the optimal lag decreases. On the other hand, there are still some Google search queries affected by media coverage and general public fear. Namely, vaccination related queries have the shortest delays (table S3), which is a short term signal as well as general concern, intensified by news media coverage and spikes in cases or death trends, since the vaccination is not yet available in summer 2020. Nonetheless, our model is able to robustly capture the COVID-19 death trend, by determining the optimal lags during the summer period.
Information in Google search data deteriorates as forecast horizons expands, which could potentially impact the robustness and accuracy of our 4 weeks ahead predictions (Fig 2 and Tab 4). Nevertheless, the L1 penalty and the dynamic training are able to capture the most relevant search terms and time series information for COVID-19 national level death estimation, and our model is still better than GT-only or AR-only models ( For state level, besides producing ARGO state level predictions using the same framework as national level ARGO, we effectively combine state, regional, and national level publicly available data from Google searches and delayed COVID-19 cases and death to produce ARGOX-2Step state level estimations. ARGOX-NatConstraint improve upon ARGOX-2Step by restricting the sum of state level death predictions to be similar to national level death predictions, as ARGO national level predictions have already shown its strength. Both ARGOX-2Step and ARGOX-NatConstraint incorporate geographical and temporal correlation of COVID-19 death to provide accurate, reliable 1 to 4 weeks ahead estimations. To further improve accuracy and robustness, we combine all three methods and produce winner-takes-all ensemble forecast for 1 to 4 weeks ahead state level deaths. ARGO and ARGOX-2Step are unified frameworks adapted directly from influenza prediction with minimal changes, which demonstrates their robustness and general applicability, while reducing the possibility of over-fitting. Furthermore, the winner-takes-all ensemble approach efficiently combines all three frameworks and is able to outperform the constituent models for all states in all 1 to 4 weeks ahead predictions. Our national model and state-level performances are competitive to other state-of-arts models from CDC. Thus, we have shown that adapting ARGOX framework to COVID-19 can achieve accurate and robust results, and our model could serve as a valuable input for the CDC's current ensemble forecast.

Concluding Remarks
In this paper, we demonstrate that methods for influenza prediction method using online search data [12,16,17] can be re-purposed for COVID-19 prediction. Specifically, by incorporating Google search information and autoregressive information, we could achieve strong performance on national level deaths predictions, while aggregating Google search information and cross-state-regional-national data could achieve competitive performance on state level death predictions, for 1 to 4 weeks ahead COVID-19 death forecasts, compared with other existing COVID-19 methods submitted to CDC. The combination of COVID-19 cases and deaths with optimally delayed Google search information, as well as the utilization of geographical structure, appear to be key factors in the enhanced accuracy of ARGO in national and state level predictions, demonstrating great additional insights which could assist and complement current CDC forecasts. This Supplementary Material is organized as following:

Implementation detail for ARGOX 2-Step
In the first step, we use LASSO to aggregate the search volume information in the corresponding area. In the second step, we take a dichotomous approach for the 51 US states/districts, setting apart seven states: AK, HI, DE, KY, VT and ME. We first set apart AK and HI, since they are geographically separated from the contiguous US. Then, we determine the rest by computing multiple correlation in COVID-19 incremental death count of each state to the COVID-19 incremental death counts of entire nation, the COVID-19 incremental death counts of the other regions (excluding the region that the state belongs) and the COVID-19 incremental death counts other states. DE, KY, VT and ME are the 4 states that have the lowest multiple correlations. A relatively low multiple correlation of a state implies that the state's COVID-19 death growth trend is not well aligned with other states', other regions' or the whole nation, indicating that information cross the other states or other regions might not help the stand-alone 7 states' death prediction. Therefore, we incorporate the dichotomous approach from ARGOX [17] on the 45 "joint" states, and 6 "alone" states.

First
Step For the first step, using the same notation as in section 3.1, we extract region/state level internet search information in region/state m for day T + l by estimatingŷ T +l,m using Google search terms with equation where X i,t,m is the Google Trends data of search term i day t of region/state m and I {T +l,r} is a weekday r indicator for the forecast date T + l. The coefficients {µ y,m , δ = (δ 1,m , . . . , δ K,m ), γ = (γ 1,m , . . . , γ 6,m } are obtained via argmin µy,m,δ,γm,λ We set M = 56 days for training, and λ = {λ δ , λ γ } through cross-validation, where we let λ δ = λ γ for simplicity. Additionally, we use K = 23 highly correlated Google search terms, and letÔ k = max (O k , l) as the adjusted optimal lag of kth Google search term subject to l th day ahead prediction. Denote the regional estimates obtained as (ŷ reg T,1 , . . . ,ŷ reg T,10 ) and state estimates obtained as (ŷ GT T,1 , . . . ,ŷ GT T,51 ). Lastly, we obtain national COVID-19 death estimateŷ nat T using equation (1). Since all the estimates obtained above are daily COVID-19 incremental death, we aggregate them into 1 to 4 weeks total incremental death and work on 1 to 4 weeks ahead death forecast separately using the following steps. We denote index τ for weekly indexing and t for daily indexing.

Second Step
For the 45 joint states, we gather the raw estimates for state/regional/national-level weekly COVID-19 incremental deaths from the first step to obtain the best linear predictor with ridge-regression inspired shrinkage for state-level COVID-19 increnental death estimate for week τ via ARGOX [17], using four predictors. Specifically, we use our raw estimates for the state-level weekly COVID-19 incremental deaths: 45 ) , expanded national and regional level COVID-19 death estimates:ŷ nat τ = (ŷ nat τ , . . . ,ŷ nat τ ) andŷ reg τ = (ŷ reg τ,r1 , . . . ,ŷ reg τ,r45 ) , where r m is the region number for state m, and the previous week state-level groudtruth, with 30-weeks training window for parameter estimations.
For the 6 alone states, we take a stand-alone modeling approach [17], focusing on estimating the individual state's COVID-19 1 to 4 weeks ahead incremental death by integrating the within-state and national information in the second step. Specifically, we use 3 predictors, previous week state-level groundtruth, state level and national level COVID-19 estimates,ŷ GT τ,m ,ŷ nat τ , for m ∈ {AK, HI, DE, KY, VT and ME}, where the regional terms are dropped. Similarly, we use the best linear predictor with ridge-regression inspired shrinkage to get the final estimates [17], with 30-weeks training period.

Detail Derivation for ARGOX-Nat-Constraint
The detailed derivation for constrained optimization problem in equation (3) is shown as follows. For simplicity, we dropped week index τ and are solving the optimization problem in equation (3) to estimate the predictor for week t. We assume that Z τ , W τ are demeaned in this case. Additionally, we rewrite the constraint in equation (3) as 1 AW τ =ỹ for simplicity, by denotingỹ =ŷ * ARGO We first re-write the original optimization as a Lagrangian function after some simplification as : where Σ ZZ = Var(Z), Σ W W = Var(W ) and Σ ZW = Cov(Z, W ) and are constructed through equation ARGOX's setup [17]. Then, we can re-write the original problem in equation (3) as: By taking derivative with respect to A and λ, we have After setting them to 0 for optimally condition and solve for A and λ, we have: where n = 49 is the length of vector 1. Let W τ = W τ − µ W be the demeaned predictor. Our estimate for the increment at week t iŝ Thus, our final prediction for state level COVID-19 week t incremental death iŝ Moreover, we use the ridge-regression inspired shrinkage to modify the estimate, by replacing Σ ZW as 1 2 Σ ZW and Σ W W as ( Therefore, our final prediction for state-level COVID-19 week t incremental death with ridge inspired shrinkage is:ŷ Table S1 lists the selected 23 important terms used in this study. They are selected through optimal lag selections.  Table S2 shows the optimal lag delayed Google search queries' Pearson correlation with COVID-19 daily incremental death.  Table S3 shows the selected 23 Google search queries' optimal lag (delay), selected through fitting regression of lagged terms against COVID-19 death trend and select the lag with minimal mean-squared error. A preprint -November 4, 2021

ARGOX-Nat-Constrained motivation
FigS1a, S1b, S2a, and S2b show the inconsistency of aggregated state level ARGOX predictions comparing against ARGO national level predictions for future 1-4 weeks ahead predictions.       Fig S8 to fig S13 shows the teams comparison in three error metrics as heatmaps throughout 1 to 4 weeks ahead forecasts.