Instagram, Flickr, or Twitter: Assessing the usability of social media data for visitor monitoring in protected areas

Social media data is increasingly used as a proxy for human activity in different environments, including protected areas, where collecting visitor information is often laborious and expensive, but important for management and marketing. Here, we compared data from Instagram, Twitter and Flickr, and assessed systematically how park popularity and temporal visitor counts derived from social media data perform against high-precision visitor statistics in 56 national parks in Finland and South Africa in 2014. We show that social media activity is highly associated with park popularity, and social media-based monthly visitation patterns match relatively well with the official visitor counts. However, there were considerable differences between platforms as Instagram clearly outperformed Twitter and Flickr. Furthermore, we show that social media data tend to perform better in more visited parks, and should always be used with caution. Based on stakeholder discussions we identified potential reasons why social media data and visitor statistics might not match: the geography and profile of the park, the visitor profile, and sudden events. Overall the results are encouraging in broader terms: Over 60% of the national parks globally have Twitter or Instagram activity, which could potentially inform global nature conservation.


S1. Estimating the temporal autocorrelation in the data
The Pearson correlation coefficient is a widely used statistical measure of the linear relationship between two variables, i.e. of whether the variables are correlated with each other. With time-series data, however, Pearson correlation estimates may be inflated (i.e. show a stronger relationship than actually exists) because of temporal autocorrelation in measurements taken at the same site at regular intervals. Because our data contain monthly measurements from the same locations (national parks), there is a chance of temporal autocorrelation in the data that could influence the correlations. Hence, we used the autocorrelation function (ACF) and the partial autocorrelation function (PACF) to estimate the temporal autocorrelation in the datasets (official visitor statistics and social media user days (SUD)). We plotted the correlograms (autocorrelation plots) for each park in South Africa (Figure S1) and Finland (Figure S2) with up to 11 lags, using a 95 % confidence interval. If the autocorrelation falls outside the confidence limits (i.e. outside the blue area in Figures S1 and S2), there is temporal autocorrelation in the data. In such cases, the Pearson correlation coefficient should be interpreted with caution, as the correlation might be inflated. We removed such parks from the further analyses (Figures 5 and 6 in the main article).
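
The screening step described above can be illustrated with a minimal sketch. It is not the code used in the study: the function names and the two monthly series are hypothetical, only the ACF (not the PACF) is computed, and the confidence band is the simple white-noise approximation of plus or minus 1.96 divided by the square root of n, which is an assumption. Plotting libraries may draw wider, lag-dependent bands.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def has_autocorrelation(series, max_lag=11, z=1.96):
    """True if any lag falls outside the approximate 95 % white-noise band.

    Assumption: the band is +/- z / sqrt(n), the same for every lag.
    """
    bound = z / math.sqrt(len(series))
    return any(abs(r) > bound for r in acf(series, max_lag))


# Hypothetical monthly values for two parks (12 months each).
seasonal_park = [10, 12, 20, 35, 60, 90, 120, 110, 70, 40, 20, 12]
irregular_park = [51, 52, 49, 48, 52, 49, 51, 48, 49, 52, 48, 51]

print(has_autocorrelation(seasonal_park))   # True: lag 1 exceeds the band
print(has_autocorrelation(irregular_park))  # False: all lags stay inside
```

With only 12 monthly values per park, the higher lags rest on very few pairs of observations (lag 11 uses a single pair), so a flag raised at low lags is the more informative signal in this sketch.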

S2. Platform comparisons including all parks
In the main text of the article we excluded parks with temporal autocorrelation, which reduced the number of parks included in the analyses from 21 to 18 in South Africa and from 35 to 18 in Finland (altogether n=36). In South Africa, where only a few parks had temporal autocorrelation, this exclusion did not have a dramatic effect on the results. However, in Finland, where almost half of the parks were excluded, the platform comparisons produced slightly different results. Hence, in Figures S3 and S4 we also report the results including all national parks in Finland and South Africa. Figure S3 (compare Figure 5 in the article) shows that the differences between platforms are statistically significant also in Finland, with a p-value of 0.008 when tested with the Kruskal-Wallis test. Dunn's test with Holm-Sidak adjusted p-values reveals that the differences between Instagram and Twitter (p-value: 0.008) and between Instagram and Flickr (p-value: 0.009) are both significant. The performance of Instagram is rather similar in South Africa and in Finland, with a median correlation of 0.70. With all parks included, interestingly, the median correlations of Instagram and Twitter are the same in both countries (platform-wise), which might indicate the general robustness of social media for predicting visitor rates in an equivalent manner in very different and distinct regions (here South Africa vs Finland). However, these results might be biased because of temporal autocorrelation. It is also noteworthy that Flickr does not follow the same pattern, as it has clearly better correlations in Finland than in South Africa.
Figure S4 investigates whether social media data tend to work better in more visited parks and in parks with more social media content and users. The results reveal that the correlations of Instagram are clustered around a correlation coefficient of 0.75, whereas the correlations of Twitter and Flickr are more scattered when all national parks are included. When considering all national parks, the trendline does not have as strong a slope as in Figure 6 (main text), where only parks without temporal autocorrelation were reported.
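
The platform comparison above (a Kruskal-Wallis test followed by Dunn's test with Holm-Sidak adjusted p-values) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names and the example correlations are hypothetical, the Kruskal-Wallis p-value uses the chi-square approximation that is exact in closed form only for three groups, and Dunn's p-values are assumed to be two-sided.

```python
import math
from itertools import combinations


def rank_with_ties(values):
    """Return average ranks (1-based) and the sizes of the tied groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, tie_sizes, i = [0.0] * len(values), [], 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def holm_sidak(pvals):
    """Step-down Holm-Sidak adjustment of a dict of raw p-values."""
    m, running_max, adjusted = len(pvals), 0.0, {}
    for i, (key, p) in enumerate(sorted(pvals.items(), key=lambda kv: kv[1])):
        running_max = max(running_max, 1 - (1 - p) ** (m - i))
        adjusted[key] = min(1.0, running_max)
    return adjusted


def kruskal_dunn(groups):
    """groups: dict mapping a platform name to its per-park correlations.

    Returns (H, Kruskal-Wallis p-value, Holm-Sidak adjusted Dunn p-values).
    """
    names = list(groups)
    if len(names) != 3:
        raise ValueError("this sketch handles exactly three groups (df = 2)")
    pooled = [v for name in names for v in groups[name]]
    n = len(pooled)
    ranks, ties = rank_with_ties(pooled)
    tie_sum = sum(t ** 3 - t for t in ties)

    size, mean_rank, start = {}, {}, 0
    for name in names:
        size[name] = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size[name]]) / size[name]
        start += size[name]

    h = 12 / (n * (n + 1)) * sum(size[g] * mean_rank[g] ** 2 for g in names)
    h = (h - 3 * (n + 1)) / (1 - tie_sum / (n ** 3 - n))  # tie correction
    p_kw = math.exp(-h / 2)  # chi-square survival function for df = 2

    variance = n * (n + 1) / 12 - tie_sum / (12 * (n - 1))
    raw = {}
    for a, b in combinations(names, 2):
        se = math.sqrt(variance * (1 / size[a] + 1 / size[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        raw[(a, b)] = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p
    return h, p_kw, holm_sidak(raw)


# Hypothetical per-park correlations, for illustration only.
example = {
    "instagram": [0.7, 0.8, 0.9],
    "twitter": [0.4, 0.5, 0.6],
    "flickr": [0.1, 0.2, 0.3],
}
h, p_kw, adjusted = kruskal_dunn(example)
print(h, p_kw, adjusted)
```

For real analyses, an established statistics package is preferable to a hand-written routine. The sketch is only meant to make the order of the steps explicit: pooled ranking, the omnibus test, pairwise z-scores, and then the step-down adjustment.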

S3. Potential reasons for difference between official statistics and social media data
In the group discussions in South Africa and Finland, the stakeholders identified hypothetical reasons that could explain the differences between social media posts and visitor numbers. We organised the potential reasons presented in the group discussions under four main categories (Figure 7 in the article). Below and in Figure S5, we provide some concrete examples taken from the stakeholder discussions. Finally, social and phenological events may affect the relative number of social media posts. For example, festivals can cause peaks in the temporal pattern of social media data (e.g. Tankwa Karoo). Moreover, rain and poor weather conditions may discourage people from taking photos, while a particularly beautiful spring flower bloom may attract more tourists to parks and increase their willingness to post.
The discussion above is based on expert knowledge and reasoning in the stakeholder workshops. These reasons should be investigated with more thorough analyses in future studies of the factors influencing the use of social media in the parks.