Revisiting the use of web search data for stock market movements

Advances in Big Data make it possible to produce short-term forecasts of market trends from previously unexplored sources. Trading strategies were recently developed by exploiting a link between market movements and the online search activity for certain terms semantically related to finance. Here we build on these earlier results by exploring a data-driven strategy which adaptively leverages the Google Correlate service and automatically chooses a new set of search terms for every trading decision. In a backtesting experiment run from 2008 to 2017 we obtained a 499% cumulative return, which compares favourably with benchmark strategies. A crowdsourcing exercise reveals that the term selection process preferentially selects highly specific terms semantically related to finance (e.g. Wells Fargo Bank), which may capture the transient interests of investors, but at the cost of a shorter span of validity. The adaptive strategy quickly updates the set of search terms when a better combination is found, leading to more consistent predictability. We anticipate that this adaptive decision framework can be of value not only for financial applications, but also in other areas of computational social science, where linkages between facets of collective human behavior and online searches can be inferred from digital footprint data.

feature elimination technique (see Methods in the paper for more details). Fig. S1 shows the histogram of the number of terms selected by the curation process. Over 50% of the decisions are based on the search volume of just one or two terms; moreover, 77% of the decisions are made using fewer than 5 search terms. The curation process never selected more than 20 terms.

Figure S1. Occurrence of the number of search terms selected by the automated curation process from the 100 terms returned by the Google Correlate™ service.

Semantic relatedness to finance
To investigate whether there is any semantic difference between the terms selected and rejected by the curation process, we set up a crowdsourcing job on the Figure Eight® platform¹. We asked workers located in the U.S. to rate the relatedness to finance of the 10 most frequently selected terms and the 10 most frequently rejected ones, assigning each term to one of four categories: 'nil', 'weak', 'medium', or 'strong'. To rule out unreliable responses, we mixed the 20 original terms with a set of 10 quality control terms, consisting of 5 terms with an obviously strong relatedness to finance (i.e., 'Wall Street', 'Hedge Fund', 'Foreign Exchange', 'Financial Crisis', 'Stock Price') and 5 terms with apparently nil connection to finance (i.e., 'Name', 'Traffic Light', 'Ocean', 'Tree', 'Cat'). The correct rating for the former 5 terms and the latter 5 terms is 'strong' and 'nil', respectively. We collated 100 ratings for each of the 20 original terms from workers who correctly rated at least 8 out of the 10 quality control terms. Fig. S2 shows the number of ratings assigned to each of the four categories of relatedness to finance for the 10 most frequently selected terms (blue) and the 10 most frequently rejected ones (red). The 10 most frequently selected terms received over 200 more 'strong' ratings and over 250 fewer 'nil' ratings than the 10 most frequently rejected ones.
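The quality-control filter described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the data layout (one dictionary of term-to-rating per worker) and the function names `is_reliable` and `tally` are assumptions made for the example; only the control terms, their expected ratings, and the 8-out-of-10 threshold come from the text.

```python
from collections import Counter

# Quality control terms and their expected ratings, as listed in the text.
CONTROL_ANSWERS = {
    "Wall Street": "strong",
    "Hedge Fund": "strong",
    "Foreign Exchange": "strong",
    "Financial Crisis": "strong",
    "Stock Price": "strong",
    "Name": "nil",
    "Traffic Light": "nil",
    "Ocean": "nil",
    "Tree": "nil",
    "Cat": "nil",
}
MIN_CORRECT = 8  # a worker must rate at least 8 of the 10 control terms correctly


def is_reliable(ratings):
    """Return True if a worker passes the control check.

    `ratings` maps term -> category for one worker (hypothetical layout).
    """
    correct = sum(
        1 for term, answer in CONTROL_ANSWERS.items() if ratings.get(term) == answer
    )
    return correct >= MIN_CORRECT


def tally(workers, terms):
    """Count the categories assigned to `terms`, using reliable workers only."""
    counts = Counter()
    for ratings in workers:
        if not is_reliable(ratings):
            continue
        for term in terms:
            if term in ratings:
                counts[ratings[term]] += 1
    return counts
```

Calling `tally` once with the selected terms and once with the rejected terms would give the two sets of category counts that a figure such as Fig. S2 compares.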

Figure S2. The total number of 'nil', 'weak', 'medium', and 'strong' ratings received by the 10 most frequently selected terms and the 10 most frequently rejected terms.

[Figure legend: holding long from t + 1 to t + 2; holding short from t + 1 to t + 2.]

Selection of window size
The window size in our adaptive strategy controls the length of historical data used to query the Google Correlate service (GCS) and to train the linear regression model. We split our data into a validation period (January 20, 2008 to January 1, 2011) and a testing period (January 1, 2011 to March 26, 2017). The validation period is used to select the optimal window size from four discrete values (52, 104, 156, and 208 weeks). Fig. S4 illustrates the effect of window size on the cumulative return at the end of the validation period. The window size of 208 weeks yields the best validation performance and is therefore used for trading in the testing period.
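The selection step amounts to a small grid search over the four candidate window sizes. The sketch below assumes a hypothetical callable `backtest(window)` that runs the strategy over the validation period and returns its cumulative return; that callable and the name `select_window` are illustrative and are not defined in the paper.

```python
CANDIDATE_WINDOWS = (52, 104, 156, 208)  # window sizes in weeks, from the text


def select_window(backtest, candidates=CANDIDATE_WINDOWS):
    """Pick the window size with the highest validation cumulative return.

    `backtest` is a hypothetical callable: window size (weeks) -> cumulative
    return at the end of the validation period.
    Returns the best window and the score of every candidate.
    """
    scores = {window: backtest(window) for window in candidates}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy stand-in for a real backtest; these numbers are NOT from the paper.
    toy_returns = {52: 0.10, 104: 0.25, 156: 0.30, 208: 0.45}
    best, scores = select_window(toy_returns.get)
    print(best, scores)
```

Keeping the testing period out of `backtest` is the point of the design: the window size is fixed using validation data only, so the reported test performance is not tuned on the data it is measured on.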

Independent validation on an individual stock
The adaptive strategy, with the same parameters used in the trading experiment on the DJIA, is applied to trade an individual stock (IBM) as an independent validation. The benchmark strategies proposed by Preis et al.¹ and Heiberger² are also replicated on the IBM stock, again with the same settings used in the trading experiment on the DJIA. The benchmark strategy proposed by Kristoufek³ is not considered in this experiment, since it diversifies a portfolio and is therefore not suitable for direct application to an individual stock. Fig. S5 illustrates the cumulative return from January 20, 2008 to March 26, 2017. The adaptive strategy outperforms both the buy-and-hold baseline and the benchmark strategies by a comfortable margin, obtaining a 234.6% return at the end of the experiment. In contrast, the best performing benchmark strategy (Heiberger) obtains a 75.6% return.
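For readers who want to reproduce a comparison of this kind, the sketch below shows one common way to compute the cumulative return of a weekly long/short strategy. It assumes simple weekly returns and multiplicative compounding; the paper may define cumulative return differently (for example, via log returns), so this is an illustration under stated assumptions, and the function name `cumulative_return` is hypothetical.

```python
def cumulative_return(weekly_returns, positions):
    """Compound weekly returns under a sequence of long/short positions.

    weekly_returns: simple returns of the asset for each week (e.g. 0.02 = +2%).
    positions: +1 for holding long, -1 for holding short in the same week.
    Assumes multiplicative compounding with no transaction costs.
    """
    if len(weekly_returns) != len(positions):
        raise ValueError("weekly_returns and positions must have the same length")
    wealth = 1.0
    for ret, pos in zip(weekly_returns, positions):
        wealth *= 1.0 + pos * ret
    return wealth - 1.0


if __name__ == "__main__":
    returns = [0.10, -0.10]  # toy data, not from the paper
    print(cumulative_return(returns, [1, -1]))  # long, then short
    print(cumulative_return(returns, [1, 1]))   # buy-and-hold baseline
```

Under this convention the buy-and-hold baseline is simply the case where every position is +1.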