Population’s health information-seeking behaviors and geographic variations of stroke in Malaysia: an ecological correlation and time series study

Stroke has emerged as a major public health concern in Malaysia. We aimed to determine the trends and temporal associations of real-time health information-seeking behaviors (HISB) and stroke incidence in Malaysia. We conducted a countrywide ecological correlation and time series study using a novel internet multi-timeline data stream of 6,282 hit searches and conventional surveillance data of 14,396 stroke cases. We searched popular search terms related to stroke in Google Trends between January 2004 and March 2019. We explored trends by comparing average relative search volumes (RSVs) by month and weather through linear regression with bootstrapping. Geographical variations between regions and states were determined through spatial analytics. Ecological correlations between RSVs and stroke incidence were determined via Pearson’s correlations. A forecasting model was generated through exponential smoothing. HISB showed both cyclical and seasonal patterns. Average RSV was significantly higher during the Northeast Monsoon than during the Southwest Monsoon (P < 0.001). “Red alerts” were found in specific regions and states. Significant correlations existed among stroke-related queries and between queries and actual stroke cases. The forecasted model showed that as HISB continues to rise, stroke incidence may decrease or reach a plateau. The results provide valuable insights for immediate public health policy interventions.


Stroke as a public health issue in Malaysia. The epidemiological literature of stroke in Malaysia was scarce until the implementation of the National Neurology Registry (NNEUR) of Malaysia in 2009 25,26 . Malaysia has witnessed an escalating incidence of stroke, which is the third most common cause of mortality and tops the nation's disability rate 27 . In 2016 alone, stroke accounted for 11,284 cases, mostly affecting men (55%) and those aged 60 years or older (60%) 26 . Age-standardized stroke mortality rates were 103 per 100,000 in men and 97 per 100,000 in women 25 . Significant functional disabilities and psychiatric morbidities posed a substantial burden to patients, caregivers, healthcare systems and providers 25 , thus escalating the economic burden 28 .
Google Trends related studies. Google Trends has been valuable for exploring trends, seasonality and correlations for a variety of neurological and non-communicable diseases. Walcott et al 29 used Google data to determine the prevalence of stroke in the USA. They found that disease-specific search queries related to stroke correlated well with geographical differences across states and that the correlation model provided a metric to evaluate health disparities 29 . Senecal et al 30 hypothesized that online symptom searches are important for the early identification of cardiovascular diseases. They found that online searches for the symptom of chest pain correlated with coronary heart disease epidemiology 30 . Kumar et al 2 sought to determine whether temporal and geographical interest in seeking cardiovascular disease (CVD) information online would follow a seasonal or geographical pattern similar to those observed in real-world data. They performed an ecological correlation study using online search queries from Google Trends and age-adjusted estimates of mortality associated with heart disease, heart failure and stroke per 100,000 persons. They found that query volumes followed strong seasonal patterns and that state-level search query volumes showed moderate to strong positive correlations with mortality rates 2 .
Bragazzi 6 explored internet usage data for seeking health materials for self-care and self-management purposes in monitoring multiple sclerosis using Google Trends. The study concluded that Google Trends was a reliable tool for monitoring multiple sclerosis with significant correlations found between clinical manifestations and treatment across different states in Italy 6 .

Motivations of the current study
As conventional epidemiological data collection and analysis is labor intensive and time consuming, Google Trends offers an alternative that provides real-time data. Such alternatives, being part of PHDS, have given public health advocates an opportunity to generate immediate evidence for crafting disease control and prevention strategies. The diversity of subjects in which Google Trends can examine changes in search interest over time, together with the usefulness of this tool in assessing human behavior, makes it evident that online search traffic data analytics correlated with conventional epidemiological data will be valuable to explore, predict and forecast health behavioral changes amongst populations 4 . Given the high prevalence of stroke in Malaysia in recent years, it is timely to offer this novel epidemiological surveillance data analytics tool at the population level for faster evidence synthesis.

Methods
Study population and design. This countrywide ecological correlation and time series study covered the period from January 2004 to March 2019 and employed digital and spatial epidemiological analytics to study stroke HISB and the incidence of stroke among the Malaysian population. Digital epidemiology adopts the concepts of "infodemiology" and "infoveillance," recently described as the "new public health," to study online HISB of health-related conditions and disease patterns, distributions, trends, variations, and correlations by using novel internet data streams 31 . While "infodemiology" has been defined as the science of the distribution and determinants of information in an electronic medium, specifically the internet (Google Trends), with the ultimate aim of informing public health policy, "infoveillance" has been conceptualized as the longitudinal tracking of "infodemiology" metrics for surveillance and trend analysis. Spatial epidemiological analytics that utilized geographic information systems (GIS) were employed to understand the distribution of HISB and stroke incidence across regions, cities and states in Malaysia.
Data source. Online HISB of stroke was retrieved from Google Trends multi-timeline search queries data. Google Trends, an online tracking system of internet search volumes that merged with Google Insights for Search (Google Inc.) 32 , was searched from 2004 until 31st March 2019 for the terms "stroke," "strok" (Malay), "angin-ahmar" (Malay), "cerebrovascular accident," and "CVA" in Malaysia. Related domains of "stroke and organ affected," "stroke types," "stroke symptoms," "stroke signs," "stroke risk factors," "stroke treatment" and "stroke prevention" were also explored. Google Trends automatically normalizes the data by the overall number of searches and provides values as relative search volumes (on a scale from 0 to 100; a value of 0 does not necessarily indicate no searches, but rather a very low search volume that is not included in the results) in order to compare variations of different search terms across geographical settings and periods. This approach has been applied and validated. All queries and search volumes related to stroke were downloaded in .csv file format.
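The 0 to 100 scaling described above can be illustrated with a small sketch. It assumes, for illustration only, that each period's value is the term's share of all searches in that period, rescaled so that the peak share equals 100. The function name, the counts and the rounding rule are hypothetical, and the raw counts behind Google Trends are not publicly released.

```python
def to_relative_search_volume(term_counts, total_counts):
    """Rescale a term's share of all searches so that the peak period equals 100."""
    if len(term_counts) != len(total_counts):
        raise ValueError("both series must cover the same periods")
    # Share of all searches in each period (0 when a period has no searches at all).
    shares = [t / n if n else 0.0 for t, n in zip(term_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0 for _ in shares]
    # Relative search volume: every period expressed as a percentage of the peak.
    return [round(100 * s / peak) for s in shares]


# Hypothetical monthly counts, not real Google data.
term = [120, 150, 90, 300]
total = [10_000, 10_000, 9_000, 12_000]
rsv = to_relative_search_volume(term, total)
```

Because the values are relative to the peak within the chosen region and period, an RSV of 50 means half the peak share of searches, not a fixed number of queries.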
Conventional surveillance data of actual stroke counts in the country were obtained from the NNEUR, a prospective, multicenter hospital-based registry that captures data of acute stroke patients admitted to Ministry of Health Malaysia hospitals nationwide. The registry is an ongoing effort funded by the government of Malaysia and consists of fifteen participating stroke hospitals across Peninsular Malaysia and the Borneo region. The registry aims to capture comprehensive epidemiological surveillance data of stroke in the country. NNEUR participating stroke hospitals enroll confirmed hospitalized stroke patients within two weeks of symptom onset 26,33 . Actual stroke counts that were available between 2012 and March 2019 across states were retrieved and tabulated.
Procedure. The procedure of data retrieval, exploration and analysis was conducted based on the validated methodological framework proposed by Mavragani and Ochoa 34 . It includes four major steps as follows:
I. Step 1: Measurement of online search interests (data overview). We explored online interest for different terms or keywords (up to five) in the same region for the same period, namely "stroke," "strok," "angin ahmar," "CVA," and "cerebrovascular accident" in Malaysia from January 01, 2004, to March 31, 2019. Related domains of stroke were also explored. As some of our search terms are correct in Malay but would be treated as misspellings in English (e.g. "strok" is correct in Malay but would be considered a misspelling of "stroke" in English), we utilized the "+" feature during searches to aggregate the result volumes without eliminating either spelling.
II. Step 2: Explore seasonality or variations. This step aimed to detect variations or seasonality of web-based interest. It establishes whether the data are suitable for proceeding to examine relations between online search interests and actual events or disease cases.
III. Step 3: Finding correlations. This step correlates web-based queries with one another or with official data on actual cases. The official actual stroke count data in Malaysia was obtained from the NNEUR.
IV. Step 4: Predict and forecast. This final step aimed to predict and forecast stroke HISB together with the future incidence of stroke.
Statistical methods. Statistical analysis was conducted using R version 3.5.1 35 and IBM SPSS Statistics version 22.0 36 . We conducted time series analytics to explore trends of HISB of stroke in Malaysia. Seasonality over time and variations by month and weather, coupled with top search queries and flux volumes, were determined through Google Trends multi-timeline data. To test for differences in mean search volumes across weather and month, we used linear regression analysis with season or month as a categorical predictor, with the 95% CIs for percentage change being bootstrapped with 1,000 random samples. Correlograms to check the significance of autocorrelation and adjusted partial autocorrelation for the time series were generated using Wessa Time Series 37 . In addition, we assessed the randomness of the data through the series of time lags at which points reached zero or near-zero in the correlograms. Whether the points decay to near zero rapidly or slowly indicates whether the data are stationary or non-stationary, respectively.
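As a rough illustration of the seasonal comparison described above, the sketch below estimates the percentage change in mean search volume between two seasons and a percentile bootstrap 95% CI from 1,000 resamples. With a single two-level categorical predictor, the regression slope equals the difference in group means, which is what the sketch relies on. The function names, the resampling scheme (within groups), the seed and the data are assumptions for illustration and are not taken from the study.

```python
import random
from statistics import mean


def pct_change(group, reference):
    """Percentage change in mean search volume of `group` relative to `reference`.

    For a two-level categorical predictor, the regression slope is the
    difference in group means and the intercept is the reference mean.
    """
    return 100.0 * (mean(group) - mean(reference)) / mean(reference)


def bootstrap_ci(group, reference, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the percentage change (resampling within groups)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        g = rng.choices(group, k=len(group))          # resample with replacement
        r = rng.choices(reference, k=len(reference))
        stats.append(pct_change(g, r))
    stats.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return stats[lo_idx], stats[hi_idx]


# Hypothetical monthly RSVs for two seasons, for illustration only.
northeast = [62, 58, 65, 70, 61, 66, 59, 64]
southwest = [50, 47, 55, 52, 49, 51, 53, 48]
estimate = pct_change(northeast, southwest)
low, high = bootstrap_ci(northeast, southwest)
```

An interval that excludes zero would point to a difference in mean search volume between the two seasons. A month-level analysis would repeat the comparison for each month against the chosen reference month.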
Choropleth maps for the spatial epidemiological analysis were generated from data merged with the Global Administrative Database (GADM-Level 1 Data-Malaysia), available from the Center of Spatial Sciences 38 . A list of stroke attributes and the flux volumes of their related terms were correlated with their hit search data using Pearson's correlation coefficient analysis. Pearson's correlation analysis measures the linear correlation between two continuous variables 39,40 ; in this study, the "stroke" search term was the dependent variable and the stroke-related terms retrieved from Google Trends search queries were the independent variables. The analysis yields Pearson's correlation coefficient (r), which ranges between −1 and 1 39,40 . A correlation of −1 indicates that the two variables are perfectly negatively linearly related, a correlation of 0 means that the two variables have no linear relation, and a correlation coefficient of 1 means that the two variables are perfectly positively linearly related 40,41 . Consistent with these statistical theories, we followed recent time series studies that utilized Google Trends to explore correlations within search terms or between search terms and count data of different diseases by employing Pearson's correlation analyses 2,7,8,10 . Subsequently, we performed an ecological correlation analysis 42,43 to test whether search volumes were correlated with the actual incidence of stroke at state and country level using Pearson's correlation coefficient analysis. The significance level was set at P < 0.05 (two-tailed).
Scientific Reports | (2020) 10:11353 | https://doi.org/10.1038/s41598-020-68335-1 www.nature.com/scientificreports/
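The Pearson coefficient used for the ecological correlation can be written out in a few lines. The sketch below is a minimal standard-library version with hypothetical yearly values. The function names and data are illustrative assumptions, and the t statistic is shown only to indicate how a two-tailed test of zero correlation would be set up (it would be compared against a t distribution with n − 2 degrees of freedom).

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length (at least 3 points)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                    # variation in x
    syy = sum((b - my) ** 2 for b in y)                    # variation in y
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for the null hypothesis of zero correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical yearly mean RSVs and stroke counts, for illustration only.
rsv = [40, 55, 60, 72, 80]
cases = [1500, 1450, 1400, 1300, 1350]
r = pearson_r(rsv, cases)
t = t_statistic(r, len(rsv))
```

In this made-up example search interest rises while counts fall, so r is negative. Because the unit of analysis is a state or country, such a coefficient describes populations and not individuals.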
Finally, we built a forecasting model using exponential smoothing with the Winters additive method to yield Malaysia's Stroke 2.0, which aims to forecast HISB and the projected incidence of stroke within the next 3 years. Forecasting and modelling methods in principle follow two general approaches: exponential smoothing or moving averages 44 . Which of the two approaches is suitable depends on the stationarity and seasonality of the time series data 44-46 . Moving averages are well suited to stationary time series 44 . As our time series data showed seasonality and was non-stationary, we opted for exponential smoothing 44 . The literature has identified that Holt-Winters exponential smoothing (a stochastic procedure of observations over time) is better and more widely used due to its flexibility with seasonal variations 45,47 . The method assigns exponentially greater weights to observations closer to the current state, with older observations being assigned relatively lesser weights 47 . The Winters method offers two ways to execute the forecasting analysis: the additive method or the multiplicative method 45,46,48,49 . The additive method is used when the data show seasonality that is roughly constant, while the multiplicative method is used when seasonal variations change proportionally and rapidly with the level of the time series 44-46 . As our data were more inclined to the former, we used the additive method. In the mathematical formulation of the method, α, γ and δ denote the smoothing parameters, and S_t, T_t and I_t represent the smoothing equations for level, trend and seasonality. The observed values (X_t) are projected through the forecasting Eq. (4), k steps ahead, to yield the prediction X_t(k) 46 .

Results
Trends of stroke health information-seeking behaviors. The most common search query was the English term 'stroke.' Between January 2004 and 31st March 2019 (n = 183), a total of 6,282 'stroke' hit search queries were generated through Google Trends in Malaysia. The interest over time of internet search queries showed a cyclical pattern within a 2-year interval, and subsequently exhibited seasonality over the years (Fig. 1). Correlograms with autocorrelation and partial autocorrelation plots showed statistical significance across a series of time lags, and the dataset showed randomness (Fig. 2).
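To make the correlogram reading concrete, the sketch below computes a sample autocorrelation function and the approximate 95% bounds under a white-noise assumption (±1.96/√n), the assumption stated for the confidence interval in Fig. 2. The function names and the synthetic 12-month seasonal series are illustrative assumptions and are not the study data.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    m = sum(series) / n
    dev = [x - m for x in series]            # deviations from the mean
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def white_noise_bound(n, z=1.96):
    """Approximate 95% bound for the autocorrelations of a white-noise series of length n."""
    return z / math.sqrt(n)


# Synthetic monthly series with a 12-month cycle, 183 points as in the study period.
series = [50 + 10 * math.sin(2 * math.pi * t / 12) for t in range(183)]
r = acf(series, 12)
bound = white_noise_bound(len(series))
```

Lags whose autocorrelation falls outside ±bound would be flagged as significant in a correlogram. A seasonal series such as this one shows a strong positive spike at the seasonal lag and a negative one at half the cycle.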

Variations of search volumes by months and weather. The mean percentage of stroke search volume was significantly higher for the periods of January to April and June to December than for the month of May (P < 0.01 for January-February, April, June-October and December compared to May; P = 0.016 for March vs May; P = 0.014 for November vs May) (Table 1). When analyzed by weather, average search volume was higher during the Northeast Monsoon than during the Southwest Monsoon (P < 0.001) (Table 1). Figure 3 illustrates a choropleth map that exhibits the geo-spatial distribution of 'stroke' HISB across all states in Malaysia.
Figure 2. Correlograms were plotted using the wessa.net time series function 37 . Yielded parameters: lambda = 1, d = 0, and D = 0 indicated that no transformation or differencing was applied before the PACF was computed. The 95% confidence interval (CI) was computed assuming a white noise time series. ACF, autocorrelation function; PACF, partial autocorrelation function.
Correlations of stroke-related Google Trends search queries. Table 2 exhibits correlations between stroke-related Google Trends search queries. Stroke symptoms and signs and risk factors were the most searched stroke-related terms in the population. Most stroke-related search queries showed positive correlations with statistical significance (P < 0.05). Across all search queries, "stroke and weakness" showed the strongest positive relationship (r = 0.851, P = 0.014) followed by the risk factor "stroke and family" (r = 0.401, P < 0.001).

Correlations of stroke Google Trends search query and stroke counts. Most states in Malaysia showed statistically significant correlations between the 'stroke' Google Trends search query and actual counts of stroke. From the countrywide perspective, Malaysia showed a statistically significant negative correlation between the 'stroke' Google Trends search query and actual stroke counts.
Table 3. Correlations between stroke-related search query and actual stroke counts data. Data was mined from 2012 until 31st March 2019 for compatibility with official stroke count registry data. Official count data was retrieved with permissions from NNEUR Malaysia. Malaysia's stroke count data included nine states (excluded Federal Territories of Kuala Lumpur and Putrajaya, Negeri Sembilan, Johor, Melaka and Pahang due to minimal or unavailability of data for inclusion into analysis). Most correlations were statistically significant, yielding evidence that online HISB follows actual count data for further selection into the forecasting model. *Denotes statistical significance at P < 0.05. **Denotes statistical significance at P < 0.01.
Forecasting model of stroke in Malaysia. Figure 5 shows an estimated forecasting model of stroke in Malaysia. The initial correlograms showed that the decay of points across the series of time lags to near-zero was slow, suggesting that the data were non-stationary. We subsequently assessed stationarity using unit-root tests. The Augmented Dickey Fuller test, whose null hypothesis is the presence of a unit root, was not statistically significant (P = 0.722), while the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, whose null hypothesis is stationarity, was statistically significant (P = 0.001). Both results indicate non-stationarity, and we therefore subjected our model to exponential smoothing. The forecasted model obtained using the Winters additive method was statistically significant (P = 0.001), accounting for 62.7% of the total variance explained. The multi-fitted data within the 95% confidence interval showed that the 'stroke' Google Trends search query would continue to rise but the incidence of stroke may decrease slightly or reach a plateau within the next 3 years (Fig. 5).
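For readers unfamiliar with the Winters additive method named in the Methods, the sketch below implements the standard textbook recursions in plain Python. It is an illustration under stated assumptions and not the procedure run in the statistical software used for the study: the mapping of α, γ and δ to level, trend and season follows the order in which the Methods list them, the initialisation from the first two seasonal cycles is one simple choice among several, and the series and parameter values are hypothetical.

```python
def holt_winters_additive(x, period, alpha, gamma, delta, horizon):
    """Forecast `horizon` steps ahead with additive Winters smoothing.

    Standard recursions, with p = period:
        S_t = alpha * (X_t - I_{t-p}) + (1 - alpha) * (S_{t-1} + T_{t-1})   level
        T_t = gamma * (S_t - S_{t-1}) + (1 - gamma) * T_{t-1}               trend
        I_t = delta * (X_t - S_t)     + (1 - delta) * I_{t-p}               season
        X_t(k) = S_t + k * T_t + I_{t-p+k}   (for k up to p)                forecast
    """
    if len(x) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    first, second = x[:period], x[period:2 * period]
    mean_first = sum(first) / period
    # Assumed initialisation: average per-step change between the first two cycles.
    trend = (sum(second) - sum(first)) / period ** 2
    # Level at the end of the first cycle, and detrended seasonal indices.
    level = mean_first + trend * (period - 1) / 2
    season = [first[i] - (mean_first + trend * (i - (period - 1) / 2))
              for i in range(period)]
    for t in range(period, len(x)):
        i = t % period                      # phase within the seasonal cycle
        prev_level = level
        level = alpha * (x[t] - season[i]) + (1 - alpha) * (prev_level + trend)
        trend = gamma * (level - prev_level) + (1 - gamma) * trend
        season[i] = delta * (x[t] - level) + (1 - delta) * season[i]
    last = len(x) - 1
    # Each forecast reuses the latest seasonal index for the matching phase.
    return [level + k * trend + season[(last + k) % period]
            for k in range(1, horizon + 1)]


# Hypothetical series: linear trend plus a fixed 4-period seasonal pattern.
pattern = [5.0, -5.0, 3.0, -3.0]
series = [100 + 2 * t + pattern[t % 4] for t in range(16)]
forecast = holt_winters_additive(series, period=4, alpha=0.3, gamma=0.1,
                                 delta=0.2, horizon=4)
```

On this noise-free example the recursions reproduce the trend and seasonal pattern, so the forecasts continue the series. With real monthly data the period would be 12 and the smoothing parameters would normally be chosen by minimising the in-sample forecast error.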

Discussion
This countrywide ecological correlation and time series study combined 'digital epidemiology', through a novel data stream (Google Trends internet data), with 'classical epidemiology', through surveillance count data from a disease registry, with the explicit aim of developing a comprehensive population health-forecasting model of stroke in Malaysia. With rising stroke incidence, we set out to address the Malaysian population's HISB of stroke in real-time situations, how these behaviors changed over time with weather variations and geographic gradients, and how Malaysians would be impacted by the current stroke scenario in the future. The trends and patterns yielded by this preliminary spatial epidemiological and time series analytical approach from the Malaysian perspective would set the direction of preventive public health policy measures and tertiary-level management guidelines for stroke in the country. We observed one significant peak of hit searches in 2016. The relatively high search volumes of 'stroke' in 2016 could be attributed to the initiation of massive, rigorous campaigns and interventions at the hospital and community level nationwide. In 2015, stroke emerged as the second highest non-communicable disease afflicting Malaysians. Malaysia's leading efforts in combating stroke were recognized by the World Stroke Organization in 2016, when the country's sole rehabilitation hospital was named the best institutional campaigner to prevent stroke in the low and middle income country category 51 .
From the public health perspective, advocates called for the immediate unification of various stakeholders from governmental, private and non-governmental organizations to integrate the nationwide hypertension campaign called "The Morning Hype Campaign" with the "My Stroke Story Photo Exhibition Campaign," the largest ever representation, which involved thirty-one stroke survivors who were empowered to submit photo stories depicting their personal journeys of stroke survival with the desire to live life to the fullest 52 . A touching event that grabbed media attention in 2016 was the news of a Malaysian who suffered a stroke in London and whose family was hit with an excruciatingly high hospital bill, halting further treatment for stroke. Malaysians were moved, and an online fundraising campaign was launched to allow donors to channel donations and to follow the health progress of the stroke survivor 53 . These events may have triggered the spike of multiple hit searches of stroke in Google across Malaysia in 2016.
Over the study period of just over 15 years (January 2004 to March 2019), we observed that the population's HISB of stroke showed a cyclical pattern within a 2-year interval, and subsequently extended to a seasonality trend over the recent years (as evident from Fig. 1). As borderless internet connectivity allows accessibility across all regions in Malaysia with the emergence of the Internet of Things (IoT), the cyclical pattern yielded through the trend series analysis could be attributed to immediate HISB by stroke-afflicted patients, patients' relatives, family members, colleagues or friends to explore further information about stroke. Google has acknowledged the significance of online health searches and has prioritized the delivery of medically accurate and reliable information 30 . People searching for information on stroke and its outcomes may do so at the time they are experiencing symptoms and may believe that the information provided by Google is accurate for the next course of action. Two possible postulations could be derived from the temporal patterns exhibited in our trend analyses. The first is that people may search for symptoms at the time they are experiencing some discomfort such as limb weakness or slurred speech during the onset of stroke or transient ischemic attack. Such searches could be carried out by the patients themselves at the early onset of symptoms or by their representatives when their clinical conditions deteriorate further. Secondly, seasonality patterns that extend over months or years could be attributed to searches carried out by post-stroke survivors to explore disease prognosis, quality of life, disabilities, treatment strategies and cure. Searches during this period could also be conducted by patients' family members, relatives, friends or colleagues to provide social and functional support in view of the debilitating nature of stroke, which impairs activities of daily living (ADL) in post-stroke survivors. These situations may have driven the frequent periodic rises and falls in 'stroke' hit searches seen in Google Trends. Similar patterns have been observed in studies of passively generated Google Trends search queries that evaluated seasonal patterns in HISB for a variety of non-communicable diseases 2,54-56 .
HISB showed variations between months and weather. We observed greater peaks of hit search volumes between November and April annually, which paralleled the Northeast Monsoon season, suggesting that stroke-related information-seeking behaviors may be causally linked to, and mediated by, the higher incidence of stroke during the Northeast Monsoon (6,155 cases) as compared to the Southwest Monsoon (5,878 cases). Interestingly, the links between HISB and incidence of stroke during the Northeast Monsoon were consistent with the geo-spatial distribution shown in the choropleth maps. Regions affected during this weather season were the East Coast Region and the Northern Region of Peninsular Malaysia. "Red alerts" were conveyed through the distribution maps, which showed that the states in these two regions, namely Kelantan, Terengganu and Perlis, had a high prevalence of both stroke incidence and stroke search queries in the country. A previous state-specific study showed that Terengganu had a relatively high number of stroke cases 33 .
The current study was the first from the Asian perspective to offer three anticipated relationships in a spatial epidemiological analysis, showing consistencies between HISB and actual stroke count data with month, weather and geographical variations in the country. Although these findings were consistent with previous studies that explored HISB for a variety of non-communicable diseases from an ecological perspective 33,54-56 , those studies were limited to only two associations: the relationship of online HISB with either the incidence of the disease or seasonal variations. They could not coherently link these attributes with both the pattern of seasonal variations and the geographical distribution. A substantial body of literature has found considerable evidence that meteorological, temperature or weather variations pose a greater risk for the occurrence of stroke 17-19,57-59 . More specifically, stroke was more likely to occur during colder months 60-65 . These trends were consistent with the findings of our current study that stroke incidence, coupled with high HISB, was more prevalent during the colder Northeast Monsoon season. A plausible explanation for this association is that seasonal changes from warmer to cooler temperatures cause increased blood viscosity or vasoconstriction, a major predictor of stroke 59,64 .
Brigo and colleagues postulated that people with chronic health conditions will frequently use search engines to look for terms related to their disease definitions, etiologies, risk factors, symptoms, treatment and prevention strategies 66 . Our findings were in line with this hypothesis, as stroke-related Google Trends search queries showed positive correlations with disease pathology, risk factors, symptoms, signs, treatment and prevention. Similar consistencies were observed in online search queries of other diseases or health conditions, namely status epilepticus 7 , multiple sclerosis 6 and systemic lupus erythematosus 8 . We also found correlations between HISB and actual stroke incidence across states and in the countrywide estimate. Although statistically significant, most state-level and countrywide associations between HISB and actual stroke incidence were negative. A plausible explanation lies in the nature of the disease or health-related states being studied: correlations involving non-communicable diseases are highly complex to decipher because a number of environmental and lifestyle factors that directly affect the disease states need to be controlled for, such as geography, ethnicity, physical activity, eating habits and social interactions. Although knowledge of stroke may improve as online search queries rise, lifestyle behaviors could mediate a bidirectional effect of socio-economic status and health. The geographical setting of certain states with lower socio-economic status may lead to weaker motivation and inadequate resources to maintain a healthy lifestyle. This theoretical model was advocated by Wang & Geng 67 . We also took note of region-specific estimates that were collectively occupied by certain states.
HISB seemed to correlate well with actual stroke cases across regions, but the correlation of HISB with state-specific counts showed some inconsistencies, as discussed earlier. A similar finding was observed in a previous study from the USA 29 . Plausible explanations include: (1) the state-specific data captured from the registry dataset used for comparison was itself limited; (2) when aggregated to regions, states within a particular region are pooled together, yet states with higher socio-economic status or urban areas have better internet penetration, giving rise to greater search queries and yielding positive relationships with actual stroke cases; and (3) geographic differences (at either state or region level) in actual stroke risk factors such as ethnicity, diet, obesity, diabetes mellitus or socio-economic status may serve as surrogate markers for greater internet search interest among the population at risk 29 .
For the first time, we incorporated spatial epidemiology with time series analytics by utilizing both novel internet data streams and conventional surveillance data of non-communicable diseases. We forecasted a combined impact model that predicted Malaysia's Stroke 2.0 of HISB and incidence of stroke for the next 3 years. The forecasted model found that, as HISB of stroke continues to rise, the incidence of stroke may slightly decrease or reach a plateau over the next 3 years. Since the spurious peak of stroke searches in 2016, and coupled with ongoing rigorous stroke campaigns, we believe that people have tended to explore more about stroke online consistently, thus gaining appropriate up-to-date knowledge on the treatment, control and early prevention of stroke. This could be why actual stroke cases appeared stationary over the subsequent years and are projected to reach a plateau or decline over the next 3 years in our forecasted model. This gives the important impression that as people explore more information about stroke on the internet, they tend to improve their knowledge and understanding of stroke, which in turn triggers self-care efforts and control measures to prevent themselves from being afflicted with stroke. We recommend that this promising observation be examined urgently through robust analytics and study designs to test possible variables that may influence it. We believe that internet resources have enhanced stroke knowledge and that this, coupled with the efforts of stroke advocates who are currently drafting policies for a paradigm shift in stroke care from vertical to horizontal prevention strategies through campaigns, community screenings and surveillance efforts, may explain the observations in the forecasted model.

Public health implications
Internet data analytics is real-time as compared to conventional surveillance or registry data. This tackles the issue of delayed data collection, analysis, forecasting and interpretation of evidence to inform urgent public health policy. Our analysis identified geographic variations of stroke HISB and actual stroke counts across different states in Malaysia. This approach provides a metric to evaluate health disparities among populations at the national level, informing public health practitioners and advocates in the country so that they can direct community health programs and interventions using a targeted approach, such as accelerating stroke risk-factor prevention programs and education measures in disproportionately affected states. Temporal trends in query volumes, coupled with their geographic distribution, could yield a quantifiable and valuable measure of public attention to, and information needs about, stroke. The current results, which integrated internet data analytics with conventional registry data, would create great opportunities for public health agencies to disseminate health information rapidly, efficiently and cost-effectively, provided that reliable news is shared with the population. It would be timely to see the acceleration of public health informatics applications, where the explosion of new technology within the population, as seen through Google Trends, could be used as a proxy for proper diffusion strategies based on health education messages, thus filling the translational gap between best evidence and practice. Stakeholders from the public health domain could leverage these new technologies and information overload to plan proper communication strategies for the prevention of stroke.

Study strengths and limitations
The current study used time series analytics on novel internet data streams (conceptualized as digital epidemiology through the application of infodemiology and infoveillance methodologies) and has offset several disadvantages of conventional epidemiological approaches. Digital epidemiology provides real-time information on the population's HISB at the national level. Paired with spatial epidemiological approaches, it allows disease states and risk factors to be detected in high-risk areas or regions for quick intervention. The approach is cost-effective and quick to carry out, so that public health advocates can be notified for rapid policy drafting and implementation.
Internet data have certain limitations that call for caution during interpretation. The first is ambiguity of search keywords, as Google Trends monitors only queries carried out in the Google search engine. Search terms may not be a proxy for individuals with stroke or at high risk of stroke, as academics or professionals who are merely interested or curious may also contribute search hits. The anonymity of Google Trends data limits the exploration of stroke HISB across specific demographics and subpopulations, and of disparities among populations. This is important as the incidence of stroke is stratified across age, ethnicity, gender and socio-economic characteristics 26 . Understanding local HISB of stroke is crucial, but Google Trends data are not available for geographical areas smaller than the state or city/town level, depending on the search volumes yielded. Google Trends eliminates repeated queries from the same user over a short period of time to reduce counts of continued searching, and applies a threshold of traffic volume so that very new search terms are assigned a value of zero, although this could change rapidly. As such, the data may not be independently verifiable or reliable, and investigators have limited control over the data, making quality control difficult.
With the revolution of big public health data, the most popular tool to date for analyzing HISB using web-based data is Google Trends 4 . Online search traffic data have been recommended as a good indicator of internet behavior, and Google Trends has acted as a reliable tool for predicting changes in human behavior, subject to careful selection of search terms 4 . With the selection of valid search terms, Google data can accurately measure a population's interest and behavior 68 . As we explored and forecasted a particular disease attribute (in this case "stroke"), the search terms and queries remain constant over time. With such valid and consistent terms used to explore disease attributes (e.g. symptoms and signs, risk factors, treatment, etc.), the search terms and analysis are replicable in future research, thus ensuring reliability. Moreover, our search term exploration technique was based on the validated model proposed by Mavragani & Ochoa 13 .
Due to the nature of the ecological correlation study design, our results may be subject to ecological fallacy, as conclusions about individual-level stroke epidemiological associations cannot be reliably drawn from group-level data. However, it is a unique and more appropriate study design for exploring trends and patterns and for observing correlations of exposures at the population level when investigating a particular disease or public health phenomenon. The current study may also be subject to "mixing", as migration of the population between states may dilute differences between the groups in our study population. To be consistent with epidemiological concepts in determining disease distribution and determinants, future research using Google Trends data should incorporate individual tracing when users are logged in to their accounts, thus enabling the retrieval and analysis of user characteristics such as age, gender and ethnicity. This would allow more meaningful interpretations based on risk stratification of stroke. The opportunity and usefulness of Google Trends data should be maximized to facilitate public health interventions, health education and promotion, but the data should be used with caution and with relevant privacy safeguards assured.
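To make the group-level nature of an ecological correlation concrete, the following sketch computes Pearson's correlation coefficient between state-level average RSVs and state-level stroke counts. Each data point is a whole state, so the coefficient describes populations and not individuals. The function name `pearson_r` and the toy values are hypothetical and are not taken from the study data.

```python
import math


def pearson_r(x, y):
    """Return Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical state-level aggregates, for illustration only.
    avg_rsv_by_state = [62, 48, 75, 55, 80]
    stroke_counts_by_state = [410, 520, 350, 470, 300]
    print(round(pearson_r(avg_rsv_by_state, stroke_counts_by_state), 3))
```

A strong coefficient here would say only that states with higher search volumes tend to have lower (or higher) counts; it would not show that the individuals who search are the ones who avoid stroke.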

Conclusion
The current study has provided insights into trends in stroke HISB from internet data, which showed possible associations with weather and geographical variations through time series analytics and spatial epidemiology approaches. Search queries were correlated positively with disease characteristics but negatively with actual stroke counts. Our forecast model showed that HISB will continue to rise but stroke incidence may reach a plateau within the next 3 years. The study offers a new real-time surveillance tool and approach to alert public health systems and policy makers when planning appropriate resources for stroke detection and prevention in the country. Future studies should validate internet-based data against external datasets to ensure reliable use of such approaches.