Introduction

According to the United Nations, sustainability refers to what makes it possible to meet current human needs without compromising the ability of future generations to meet their own needs1. Sustainability is an issue of international concern. As a consequence, in 2015, world leaders adopted a set of 17 global goals (these objectives were: SDG1: no poverty, SDG2: zero hunger, SDG3: good health and well-being, SDG4: quality education, SDG5: gender equality, SDG6: clean water and sanitation, SDG7: affordable and clean energy, SDG8: decent work and economic growth, SDG9: industry, innovation and infrastructure, SDG10: reduced inequalities, SDG11: sustainable cities and communities, SDG12: responsible consumption and production, SDG13: climate action, SDG14: life below water, SDG15: life on land, SDG16: peace, justice, and strong institutions, SDG17: partnerships for the goals) to eradicate poverty, protect our planet and ensure prosperity, constituting the 2030 Agenda for Sustainable Development2. Several studies describe approaches to achieving environmental protection, the so-called "triple-D": decarbonisation, detoxification, and dematerialisation3. In addition to empirical analyses of companies' environmental responsibility over certain time periods, previous work has also examined how media attention affects the relationship between environmental protection and sustainable development4,5,6.

Social networks, due to their large number of users, are powerful tools for understanding public perception. In particular, the online social site Twitter had 229 million monetizable daily active users in 2022 (Monetizable Daily Active Users (mDAU) denotes the number of unique users who access and interact with the platform on a given day)7. Twitter is characterised by the fact that the majority of its user accounts are public (Twitter offers its users two main privacy settings: public and private. In the case of private accounts, only the followers approved by the user are able to read their messages. It is important to note that Twitter does not allow different privacy settings for individual messages.), which, in turn, makes their messages public. By contrast, other social networks (for example, Facebook) have mostly private accounts, which makes their content accessible only to each user’s network of friends. In particular, Facebook allows users to choose from a variety of privacy options, allowing them to have a fully visible profile, a profile only viewable by recognised friends, or anything in between. Users can modify the privacy settings for each specific post, making it public, visible only to friends, private, or restricted to a custom audience. This is an intrinsic feature of both social networks, which follow different approaches8.

For researchers, Twitter has the advantage that it provides a powerful Application Programming Interface (API) which enables access to the platform in advanced ways9,10,11. Using this API, it is possible to obtain a large amount of data concerning each tweet, the user profile and the place from which a tweet was sent, among other supplemental information9,10,11. Since September 2017, Twitter users have been able to post messages of up to 280 characters each (previously the limit was 140). Twitter users can publish new tweets, reply, retweet and quote9. Furthermore, it must be noted that, in other networks such as Facebook, which are more private in nature, accessing the information provided by their APIs is more complex9,11. In particular, on Facebook, retrieving many status messages is more complicated than the message retrieval that is possible on Twitter. Consequently, Twitter’s API provides greater potential to access all the information about a topic or discussion9,10,11,12.

With regard to sustainability discourse on Twitter, some research has analysed the practice of tweeting about Corporate Social Responsibility and Sustainability issues, focusing on the most relevant topics and on who is leading the debates on these issues13,14,15. Previous research also explores how climate leadership and environmental messages affect companies’ stock prices16. Other studies focus their examination on companies within a particular country17.

Multiple approaches have been used to analyse social networks in depth and with the purpose of understanding their mechanisms of operation18,19,20, as well as to examine public perception on various topics of social interest21,22,23,24, including political electoral campaigns24,25, and polarisation issues26,27,28. Certain mechanisms governing the propagation of information from bots, and their influence on public opinion formation, have also been explored29,30,31,32,33,34,35,36. There are also several studies that have analysed the behaviour of bots on Twitter by examining temporal patterns that describe their activity37,38,39, such as entropy, existence of motifs (repetitive sub-sequences in the interaction time series), unusual periods of inactivity (discords) and detection of periodicities in the posting of messages.

Machine learning models, including Random Forest, Generalised Linear and Support Vector Machine models, as well as neural networks, particularly those using deep learning methodologies such as transformer-based models, have been widely used in various fields, such as medicine40,41,42, economics43,44,45, the environment10,46,47, industry47,48,49,50, and food security51, among others.

Taking into consideration all of the above, we aim to answer the following research questions:

  1. What is the optimal model for identifying account typologies? Can existing models be improved?

  2. Regarding sustainability, are there differences in the behavioural patterns on Twitter of humans and bots?

  3. Are there various patterns of activity? What are their characteristics?

  4. What is the sentiment on sustainability? Are there differences between humans and bots? How has the sentiment evolved over time?

  5. What are the most relevant words used in the tweet text? Are there differences between humans and bots?

In this paper, we present an analysis of the messages posted on Twitter about sustainability with the goal of furthering the understanding of social opinion on this issue. In particular, the said analysis can accurately identify the account typologies (bots or humans) that at some time made posts related to sustainability. Furthermore, useful information can be collected from knowing interaction patterns and understanding the characteristics of these tweets. Specifically, the novelties of this research are: (i) The large database on which the study is built allows us to better understand the patterns of messages published by both humans and bots dealing with a specific topic (in particular, sustainability). Notice that previous studies, such as the one mentioned above, used a much smaller data set. (ii) Building a model based on the digital footprint of each user, which represents all types of messages that a user sends, in a similar way as previously proposed34. In addition, we go one step further, utilising various compression algorithms (gzip, zlib, bzip2, lzma and smaz) and comparing them. Analogously to52, the model is built using attributes related to the user’s profile and their activity, which is described through parameters associated with the messages sent by each author (see Sect. "Activity patterns"). At the same time, as a novelty, our model utilises these parameters together with others referring to the user’s digital footprint. Consequently, a robust model is developed, providing optimal results. We apply our model to Twitter data corresponding to users who at some point during the period from 2016 to 2022 sent messages (tweets, retweets, quotes, or replies) related to sustainability. (iii) On the basis of the time series describing the activity of each user, behavioural patterns are detected using clustering analysis and statistical tests. As a novelty with respect to previous research37,38,39, the behaviour of humans and bots is studied based on parameters that globally characterise the time series corresponding to tweets, retweets, replies and quotes (see Sect. "Activity patterns"). (iv) We determine the most relevant sentiments and words, in addition to detecting some differences according to the activity pattern, the type of account, as well as the polarity of the messages.

Results

Compression algorithms

In this section, based on Twitter data downloaded for the period 2006-2022, we present the main results obtained in our study. The tweets used are those containing the following keywords: sustainable agriculture, sustainable food, renewable energy, green urban, sustainable transport, pollution, sustainable city, and sustainable industry (see Sect. "Building the dataset"). We used 38,615 users and 96,252,871 tweets (40,000 users were processed, but only the 38,615 users with a public account were taken into account in the analysis). The users for the analysis were randomly selected among all those who at some time sent a tweet with any of the above-mentioned keywords related to sustainability. Then, a digital footprint was built for each user (see Sect. "Building the dataset"). It consists of a string of characters that represent the different interaction mechanisms used on Twitter: posting a tweet (’A’), retweeting (’T’), replying (’C’) and quoting (’G’). As an example, a user’s footprint could be a string of the form “ACTCATTTTAG”, meaning that the user performed the following actions: post a tweet, reply, retweet, reply, post a tweet, 4 retweets, post a tweet and quote.
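As an illustration, the following minimal Python sketch shows how such a footprint string could be assembled from a user’s message history; the message representation (a list of records with a type and a timestamp) and the function name are ours and purely illustrative.

    # Minimal sketch: build a user's digital footprint from their messages.
    # The message structure (a list of dicts with "created_at" and "type") is illustrative.
    TYPE_TO_CHAR = {"tweet": "A", "retweet": "T", "reply": "C", "quote": "G"}

    def build_footprint(messages):
        """Concatenate one character per message, in chronological order."""
        ordered = sorted(messages, key=lambda m: m["created_at"])
        return "".join(TYPE_TO_CHAR[m["type"]] for m in ordered)

    # Reproduces the example footprint "ACTCATTTTAG" given in the text.
    actions = ["tweet", "reply", "retweet", "reply", "tweet",
               "retweet", "retweet", "retweet", "retweet", "tweet", "quote"]
    messages = [{"created_at": i, "type": t} for i, t in enumerate(actions)]
    print(build_footprint(messages))  # -> ACTCATTTTAG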

After creating the users’ digital footprints, a compression algorithm was applied. We applied different compression methods (gzip, zlib, bzip2, lzma and smaz) in order to find the most appropriate one for our study. We found that the gzip algorithm performed best (see Sect. "Building the model").
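The sketch below illustrates how the compression ratio used throughout this section can be computed with the standard Python libraries mentioned above (gzip, zlib, bz2 and lzma); smaz is a third-party binding designed for short texts and is omitted here. The function name and the example footprint are illustrative.

    import bz2
    import gzip
    import lzma
    import zlib

    def compression_ratios(footprint: str) -> dict:
        """Compression ratio = raw footprint size (bytes) / compressed size (bytes)."""
        raw = footprint.encode("utf-8")
        compressed = {
            "gzip": gzip.compress(raw, compresslevel=9),
            "zlib": zlib.compress(raw, -1),   # -1: default speed/ratio trade-off
            "bzip2": bz2.compress(raw, 9),
            "lzma": lzma.compress(raw),
        }
        # smaz (third-party, designed for short texts) could be added here if installed.
        return {name: len(raw) / len(data) for name, data in compressed.items()}

    print(compression_ratios("ACTCATTTTAG" * 100))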

Figure 1 displays the compression ratio (size of the raw digital footprint (in bytes)/size of the compressed digital footprint (in bytes)) as a function of the size of the raw footprint (in bytes) for bot and human users, using the gzip method; both magnitudes are highly relevant34. It can be seen that this ratio showed a gap between human (dots in blue) and bot (dots in red) users. For human users with footprints smaller than 3,265 bytes, the compression ratio was in most cases less than 10. However, for sizes larger than 3,265 bytes, the ratio showed large variability. This is in line with the fact that information with high entropy cannot be compressed/decompressed with optimal efficiency53. For bots with footprint sizes lower than 3,265 bytes, the compression ratio showed linear growth with size. The linear fit, in addition to the raw footprint size that resulted in the highest variability of the compression ratio, has been included in the Supplementary Material Document (Tables S1 and S2). It can be observed that bots exhibited a higher median raw footprint size (see statistical quartiles in Table S3) than human users.

Figure 1
figure 1

Compression ratio as a function of the size of the raw footprint for the gzip algorithm. The compression ratio is calculated as: size of the raw digital footprint (in bytes)/size of the compressed digital footprint (in bytes). The arrow points to the critical point of 3,265 bytes.

Building the model

We built a supervised learning model based on the use of labelled data sets with the purpose of classifying the Twitter data used, discerning whether a user is a bot or not. With the aim of labelling the users as humans or bots, the Botometer tool was utilised (see "Methods" Section). Botometer is a very appropriate resource for identifying bots on social networks such as Twitter. However, because bot detection is a complex task, it has limitations: the accuracy of identification is limited by the quality of the data and the algorithms used for the analysis. Botometer provides a probability score that a user account is a bot, which can result in false positives and false negatives in detection. As bots become more sophisticated, they imitate human behaviour more effectively. This fact makes it difficult for bot detection tools to keep pace with the changing strategies of malicious users. There are also human user accounts that tend to be identified as bots, because they exhibit repetitive behaviours or utilise tools to automate certain tasks54,55. There is also a linguistic and cultural bias in bot detection algorithms, which implies that tools are not equally effective in all languages55. In order to mitigate the limitations of Botometer, we carried out the following actions: (i) selected only tweets in English, (ii) chose only Twitter users who have a public account and (iii) adjusted the threshold value for labelling users as bot or non-bot so that the resulting proportion of bots matches the values indicated as typical for Twitter in other studies.

With respect to the model, it was built using a variety of approaches that correspond to the following three alternatives:

  • Alternative 1: in this alternative, as input variables, we consider both the raw size and the compression ratio of the user’s footprint, similarly to34, but additionally we include several compression methods.

  • Alternative 2: in this alternative, analogously to52, certain parameters referring to the messages from an author, as well as to the user profile, were considered as input attributes. They were: the ratio between the number of accounts followed and the number of followers, and the percentages of retweets, replies, quotes and tweets over total messages. Also included were the ratio of tweets with multimedia content and the maximum time gap between messages (in hours). No parameters related to compression algorithms were used.

  • Alternative 3: in this option, both alternatives 1 and 2 were taken together.

Various mathematical models were implemented (see Sect. "Building the model"): generalised linear (GLM), random forest (RFM) and support vector machine (SVM) models. They were evaluated utilising several performance metrics. The best-performing mathematical model and compression method were random forest and gzip, respectively, although lzma, bz2 and zlib also showed good accuracy (higher than 0.80). For each option of explanatory variables and mathematical model, the best hyperparameters (where they existed) and performance metrics are given in the Supplementary Material Document (see Tables S4–S17). The main performance metrics for the three aforementioned alternatives, the gzip compression algorithm, and the GLM, RFM and SVM procedures are given in Table 1.

Table 1 Performance metrics of the validation set for the models: generalised linear (GLM), random forest (RFM) and support vector machine (SVM) models, for the three alternatives considered.

Activity patterns

With the purpose of unveiling the activity patterns of both humans and bots, a clustering analysis using the \(K-Means\) method was implemented for A (tweet), T (retweet), C (reply) and G (quote) interactions. For each user, the following attributes were taken into consideration: the mean, standard deviation, median, mode, maximum and minimum number of tweets posted daily, as well as the maximum lag order used and the p-value obtained in the Augmented Dickey-Fuller (ADF) test56,57,58 (see Sect. "Activity patterns").

For the different types of messages analysed, 100 experiments were performed in order to study the interactions of human users. In each trial, a number of human users identical to the number of bots was randomly selected with replacement. The clustering tendency was evaluated using the Hopkins statistic (HS). Silhouette and Dunn indexes were also computed to obtain the optimal number of clusters (see Supplementary Material Document, Sect. S2.3).

The results obtained for the different types of interactions are described below.

Type A interactions

Data from 3,421 bots and 34,229 humans was used in the examination of type A messages. Based on the values provided by both the Silhouette and Dunn indexes, the human users could be grouped into two clusters. The average value of the HS was 0.01060, with a standard deviation equal to \(1.1540 \times {10^{-05}}\), showing evidence of the existence of clusters (the closer this statistic is to 0, the stronger the evidence for the existence of clusters in the data). The average values corresponding to the Silhouette and Dunn indexes were 0.89673 and 0.02408, respectively. Taking the 100 experiments into account, the Kruskal-Wallis test59 showed that there was no difference between the groups marked as 1 (group 1), nor between the groups marked as 2 (group 2). The average percentages of human users in groups 1 and 2 were 2.24% and 97.76%, respectively. Regarding bots, according to the Silhouette and Dunn indexes, two clusters were also detected, containing 8.39% and 91.61% of the bots, respectively.

For human users, Table 2 displays the values of the median of the parameters exhibited by the centroids of each cluster in the 100 experiments performed. Table 2 shows magnitudes corresponding to centroids in each cluster for both humans and bots.

Type C interactions

In order to study the type C messages, 25,264 users (159 bots and 25,105 humans) were analysed. Different trials returned different optimal numbers of clusters according to the values of the Silhouette and Dunn indexes. Because of this, we were not able to perform an activity pattern analysis.

Type T interactions

In relation to the type T messages, 202 bots and 28,866 humans were explored. The HS, calculated over the 100 experiments, provided an average value equal to 0.06259 with a standard deviation equal to 0.00160. According to the Silhouette and Dunn indexes, 2 was the optimum number of clusters in all trials. On average, the clusters included 7.83% and 92.17% of the total human users. Regarding bots, 3 clusters containing 14%, 4%, and 82% of the total users were found.

For human users, Table 2 shows the median of the parameters associated with the centroids of each cluster in the 100 experiments. For bot users, the attributes corresponding to the centroids of each cluster are also displayed.

Type G interactions

For Type G messages, 3,626 bot and 34,970 human users were examined. Notice that, although there is a considerably higher number of bots than in the other types of interactions, the analysis did not detect any cluster either for humans or for bots. For human users, the HS showed average and standard deviation values equal to 0.15265 and 0.0011, respectively. For bots this statistic was 0.13584.

Table 2 Results of the clustering analysis for the Types of interactions A (post tweets) and T (retweets).

Account typologies

This section aims to investigate whether differences exist between human and bot behaviour in tweeting. To do this, we analyse the values of different statistics such as the mean, standard deviation, median, mode, maximum and minimum number of tweets sent daily, in addition to the lag order used and the p-value obtained in the ADF test. For A and T messages, the Kruskal-Wallis test demonstrated that, considering the above-mentioned factors individually, there were no dissimilarities between the 100 trials performed (p-\(value>0.05\)). Therefore, it was possible, without loss of generality, to utilise a single experiment to examine differences with the bot group.

For Type A messages, a p-\(value > 0.05\) was obtained in the Kruskal-Wallis test for all considered parameters except the median, showing that a differentiation between humans and bots exists only for this magnitude. In relation to Type C, G and T messages, a p-\(value>0.05\) was obtained for all parameters individually analysed.

Sentiment analysis

With the purpose of examining the sentiment of the interactions on sustainability, similarly to9,11, only type A messages were taken into consideration, because they are the only ones that convey new personal opinions.

The sentiment of the original tweets posted during the period 2006-2022 was analysed. According to the procedure indicated in Sect. "Building the dataset", 40,000 users were selected, but because some of them had an empty author field, only 37,650 were considered. For bots and humans, the average text size of each tweet was 114 and 130 characters, respectively.

In the same way as9,11, for each tweet the sentiment is calculated as the average of the sentiment of the words included in it. The computed sentiment is typified as positive (when its value is in the interval (0, 1]), neutral (if it is 0) and negative (when its value is in the interval [−1, 0)). Only tweets with a subjectivity higher than 0 were considered.

We observed that bot users posted a higher number of neutral tweets than human users. For each user typology and sentiment type (positive, negative and neutral), we built a 17-dimensional vector \(v_{tucs}\), in which each component is the annual average sentiment. Thus, for each user typology and the period analysed, we created three vectors, describing the positive, negative and neutral sentiments.

Sentiments by user typology, ut, humans or bots:

$${\text{positive}}: \,\vec{v_{{utpos}}} = (v_{{utpos1}} ,v_{{utpos2}} ,...,v_{{utpos16}} ,v_{{utpos17}} )$$
$${\text{negative: }}\vec{v_{{utneg}}} = (v_{utneg1}, v_{utneg2},..., v_{utneg16}, v_{utneg17})$$
$${\text{neutral: }} \vec{v_{{utneu}}} = (v_{utneu1}, v_{utneu2},..., v_{utneu16}, v_{utneu17})$$

In order to unveil differences between human and bot users behaviour, the cosine similarity between vector pairs corresponding to each sentiment type was calculated. This metric, which presents values in the range [−1, 1], is defined as follows60:

$$\begin{aligned} Cosine \, Similarity \ (a,b) =\frac{\sum _{i=1}^{N}a_{i}b_{i}}{\sqrt{\sum _{i=1}^{N} a_{i}^{2}}\sqrt{\sum _{i=1}^{N} b_{i}^{2}}} \end{aligned}$$
(1)

where a and b symbolise two N-dimensional vectors, \(a_{i}\) and \(b_{i}\) represent the i-th coordinates of each vector, and N is the dimension of each vector.

When the angle between the two vectors a and b is small, the cosine is high (close to 1), indicating high similarity.
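As a minimal illustration, Eq. (1) can be computed with NumPy as follows; the 17-component example vectors are random placeholders, not data from the study.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two N-dimensional vectors, Eq. (1)."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Illustrative 17-component annual-average vectors (one value per year, 2006-2022);
    # the numbers are random placeholders, not results from the study.
    v_pos_humans = np.random.default_rng(0).uniform(0.0, 0.3, size=17)
    v_pos_bots = np.random.default_rng(1).uniform(0.0, 0.3, size=17)
    print(round(cosine_similarity(v_pos_humans, v_pos_bots), 3))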

It was observed that, despite the relevant differences in the number of tweets between user typologies (during the analysed period, a maximum 80,000 messages were posted by humans in a year and 14,000 by bots), the cosine similarity was 0.95 for negative and positive sentiment vectors. Neutral sentiment showed a higher difference between both user typologies with a value equal to 0.84. Figure 2 depicts the historical evolution of the sentiment about sustainability (utilising the chosen keywords for the tweets downloaded).

Figure 2
figure 2

Evolution of sentiment polarity obtained from tweets posted by 37,650 users about sustainability: (A) all users, (B) bots, (C) humans.

Figure 3 shows the evolution of the average daily sentiment during the first four months of 2022 for both human and bot users. It can be seen that the messages posted about sustainability by bots exhibited a higher range of variability in the daily average sentiment. It varied between a value of 0.001, which is very close to neutrality, up to a value of 0.158. In contrast, the daily average sentiment of human users varied between 0.045 and 0.149. The bots also presented more messages whose average sentiment was located in the aforementioned extreme values (see Supplementary Material Document, Table S18).

Figure 3
figure 3

Evolution of the sentiment about sustainability obtained from tweets posted by 37,650 users during the first 4 months of 2022.

Keyword analysis

For each tweet, we obtained the most frequent words, as well as the most relevant ones according to centrality metrics61 (see Sect. "Keywords analysis"). The latter can be likened to the process that a human being performs when reading a text and deducing its most significant content.

For each user typology and cluster, Table 3 shows the top 20 keywords. It can be observed that there are 5 words that are common to all clusters; they form the terms “air pollution” and “new renewable energy”, which represent important topics in sustainability matters. There are also other words common across clusters if the results derived from the centrality metrics are compared with those obtained by applying the counting method. It can be seen that some low-frequency words are, nevertheless, a key element for understanding the content of the tweets (see Table 3 and, in the Supplementary Material Document, Table S20).

Table 3 Top keywords related to sustainability obtained from tweets posted by 37,650 users, by cluster and typology (TKG method).

Discussion

This research makes a relevant contribution to its application area by extending the work of34 and52, implementing a model that distinguishes between human and bot accounts on Twitter. The incorporation, as explanatory variables, of the parameters related to footprint compression, profile characteristics and user activity improves the model’s performance. The model was tested with five different compression algorithms (gzip, zlib, bzip2, lzma and smaz), showing good results (accuracy \(>0.80\)) with four of them and slightly favouring gzip as the best performer. The sample taken into consideration for the construction of the model consisted of 38,615 users, with the percentage of bots ranging from 9 to 15 percent62.

The time series characterisation associated with each user typology allowed us to perform a clustering analysis for both type A and type T messages. For bot and human users, the analysis of type A messages demonstrated the existence of two clusters, pointing out that there were two different ways of operating on Twitter. For type T messages, bot and human users exhibited 3 and 2 clusters, respectively.

The analysis of the evolution of the sentiment of the posted tweets during the period from 2006 to 2022 exhibited interesting patterns. Bots, despite having shown a generally higher percentage of positive posts, exhibited a large number of posts with extreme values, a higher overall polarity and a higher proportion of neutral posts compared to human users. This sentiment characteristic remained relatively stable in the first four months of 2022 for both account types.

If we focus on the keyword analysis, especially that based on the centrality metric, we gain valuable information about the content of the messages. Human clusters show high similarity with each other (0.96). However, the bot clusters, especially cluster 1, exhibited the highest difference with respect to the rest of the clusters (\(<0.86\)). In spite of that, five words present in the aforementioned cluster 1 are common to all the others.

This research not only expands our understanding of sustainability by analysing patterns of activity, relevant words and sentiment, but also provides practical information for distinguishing between human users and bots on Twitter. Below, we summarise certain limitations of the study carried out in this document, which were previously explained in detail.

Bot detection is a complex task, and the accuracy of bot detection tools is limited by the quality of the data and the algorithms used. In order to mitigate this, with reference to Botometer, several actions were carried out, such as selecting only English tweets, choosing only users with public accounts, and establishing a threshold value for labelling in accordance with the values provided by other investigations. With respect to the analysis of sentiments, we used TextBlob due to its simplicity of use, its efficiency, and its processing times, which are very appropriate for the volume of data used in this research.

Analysing human and bot activity patterns on Twitter was one of the primary objectives of this investigation, but the implications of the research findings extend beyond this social network. The high importance of sustainability issues in national and international contexts makes understanding the behaviour patterns of human and bot accounts very relevant. This knowledge would enable institutions and authorities, such as policy makers and decision makers, to have a more effective influence in achieving more sustainable behaviour. Regarding sustainability matters, companies could also benefit from knowledge of human and bot activity patterns, as well as from the sentiment analysis of messages posted on social networks, in order to make their communication strategies more effective. Comprehending how bots can influence opinion on sustainability topics could also be of interest to citizens. All of the above allows us to raise awareness about the need to critically analyse the information published on social networks.

In future research, as a continuation of the investigation described in this document, we plan to implement a comparative sentiment analysis of the tweets utilising different sentiment analysers. Additionally, based on time series, we could implement sentiment prediction models. The procedures could include artificial neural networks such as Long Short-Term Memory (LSTM), Adaptive Wavelet Neural Network (AWNN) and Elman Recurrent Neural Networks (ERNN), or classical models such as autoregressive (AR) and Autoregressive Integrated Moving Average (ARIMA) models, among others. Tweets in Spanish could be used to expand the analysis executed here. The influence of bots on the perception of human users could also be explored through contagion models63.

Methods

Compression algorithms

The compression ratio of the raw user footprint was used as one of the explanatory variables of the model; to compute it, several compression algorithms, namely gzip64, zlib65, bzip266, lzma67 and smaz68, were applied. The gzip method provided the best performance metrics, followed by zlib, bzip2 and lzma. The smaz algorithm exhibited the worst results, which seems to be because it was specially designed for short texts69, while a large variability of digital footprint sizes is handled in this research.

For the purpose of comparing the methods, the gzip, bz2, lzma, smaz and zlib libraries in PYTHON were used. They were applied considering the default parameters. In the gzip algorithm the parameters were: mode=’rb’ (it represents the mode of reading or writing the input/output file, ’r’: read, ’b’: binary) and compresslevel=9 (it can take the value 0 (no compression) or any integer value in the range [1-9]; the higher the value, the higher the compression ratio and the lower the processing speed; the default value is 9).

The zlib65 method was used considering \(level=-1\) (this attribute can take the value 0 (no compression), -1 (trade-off between speed and compression ratio), or any integer value in the range [1-9]; the higher the value, the more compression and the slower the computation; the default value is -1, which is currently equivalent to 6) and \(wbits=MAX\_WBITS=15\) (it allows us to manage the window size utilised during the compression process, as well as whether both header and trailer must be included in the output; it can take values in the intervals [+9, +15], [-9, -15] and [+25, +31]; the default value is 15).

bzip266 was applied with mode=’rb’ and compresslevel=9 (these parameters have a similar meaning to those explained for the gzip algorithm).

lzma67 was applied with mode=’rb’ (this parameter has an analogous meaning to that used in the gzip algorithm) and check=-1 (it symbolises the type of integrity check to consider for the compressed information; it can take the values: CHECK\(\_\)NONE (-1): no integrity check is carried out; CHECK\(\_\)CRC32: a 32-bit Cyclic Redundancy Check is performed; CHECK\(\_\)CRC64: a 64-bit Cyclic Redundancy Check is applied; CHECK\(\_\)SHA256: a 256-bit Secure Hash Algorithm check is carried out).

The smaz68 algorithm was implemented with mode=’r’ and buffering=1 (buffering refers to the mechanism of storing the input data prior to carrying out the compression process; this parameter allows us to manage the buffering policy, and a value equal to 1 means that line buffering is selected).

All the aforementioned algorithms are described in the Supplementary Material Document (see Sect. S2.1).

Time series characterisation

For each user, in order to characterise the time series describing the number of daily tweets posted from 2006 to 2022, several parameters were calculated: the mean, median, mode, standard deviation, and maximum and minimum number of messages. Certain data regarding the stationarity of the series were also estimated utilising the Augmented Dickey-Fuller (ADF) test56: the maximum lag order used, which was estimated according to70, and the p-value. It must be noted that, with the objective of obtaining correct results in the test, an appropriate selection of the lag order had to be carried out58.
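A possible implementation of this characterisation, assuming the daily-count series is available as a NumPy array, is sketched below using the adfuller function of statsmodels, whose autolag="AIC" option selects the lag order by the AIC criterion and returns the p-value; the function name and the synthetic example are illustrative.

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    def characterise_series(daily_counts: np.ndarray) -> dict:
        """Summary statistics plus the AIC-selected ADF lag order and p-value."""
        values, counts = np.unique(daily_counts, return_counts=True)
        _, p_value, used_lag, *_ = adfuller(daily_counts, autolag="AIC")
        return {
            "mean": float(np.mean(daily_counts)),
            "median": float(np.median(daily_counts)),
            "mode": float(values[np.argmax(counts)]),
            "std": float(np.std(daily_counts)),
            "max": float(np.max(daily_counts)),
            "min": float(np.min(daily_counts)),
            "adf_used_lag": int(used_lag),
            "adf_p_value": float(p_value),
        }

    # Example with a synthetic daily-count series (one value per day).
    print(characterise_series(np.random.default_rng(42).poisson(3, size=365)))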

Building the dataset

In order to build a relevant dataset on sustainability, the keywords used to select tweets had to be obtained; to do this task, we gathered a small group of persons. More than 35 keyword candidates were proposed, to which a clustering and affinity diagram was applied. After this, the obtained keywords were sorted by applying a multiple voting system. Through this procedure, those words that reinforce each other were linked. Finally, the following keywords were chosen for downloading tweets: sustainable agriculture, sustainable food, renewable energy, green urban, sustainable transport, pollution, sustainable city, and sustainable industry.

The reason for reducing the number of words from 35 to 8, as described above, was determined by the keyword selection system itself. The selection method used has shown optimal results in other studies9,11. The number of words to choose was also conditioned by the feasibility of obtaining, processing and analysing the data. Downloading and processing the tweets from 2006 to 2022 containing these 8 keywords took approximately 12 months, resulting in a volume of 233 GB of raw data and a similar magnitude of processed data. Using 35 words would have lengthened downloading and processing times by involving a much larger volume of data, which could have compromised the novelty of the research. Eight was considered an appropriate number of words to achieve an optimal compromise between the amount of information downloaded from Twitter and the processing times needed to conduct the investigation.

In the same way as9,10,11, the twarc2 software was utilised to download the Twitter messages. Then, the T-Hoarder tool71 was applied (see Supplementary Material Document, Sect. S1) to convert the messages provided by twarc2 from JSON to csv format. The procedure to build the dataset used in this study can be summarised as follows:

  • 40,000 users were randomly selected among all those who at some time sent a tweet with any of the aforementioned keywords. Next, all their messages (tweets, replies, retweets and quotes) from 2006 to 2022 were downloaded. It must be noted that 2006 was the year in which Twitter was created. Of the 40,000 users, only those with a public account were taken into consideration. In our study we considered 38,615 users, who posted 96,252,871 tweets.

  • In addition to the above, all posted tweets in 2022, including the previously mentioned keywords, were downloaded. 684,090 tweets were utilised, which corresponded to 259,170 users.

Preparation of tweets for analysis

For each of the 38,615 selected users, a footprint was built. All messages corresponding to a user were coded according to their type: tweet as ’A’, retweet as ’T’, reply as ’C’ and quote as ’G’, resulting in a string of variable size. This footprint can be considered an attribute of each user, similar to human DNA.

In order to build a model that allows us to distinguish between humans and bots, the digital footprint was compressed utilising the algorithms described in Sect. "Compression algorithms".

In conjunction with the above, in order to generate an indicator revealing whether the user was a human or a bot, the Botometer API was utilised72. This information was used to build the model. The Botometer API provides a value in the [0,1] range, based on different bot identification methodologies. We labelled as bots those users characterised by a value \(>0.9\). Although other values were tested, this threshold, as already mentioned, provided a percentage of bots in line with that indicated by existing pieces of research62.

Additionally, the text of each tweet was also processed. In the same way as9,10,11, various replacements were made: all URLs were changed to the term “LINK”, all user mentions were replaced by the term “USER”, and hashtags were replaced by the term “HASHTAG”. Emoticons were transformed into their textual meaning. In addition to the above, we also carried out various modifications on the text of the tweet: elimination of conjunctions, terms shorter than 3 characters, and punctuation symbols. We also replaced the different forms corresponding to the same term by its lemma, corrected misspelt words and transformed the text to lowercase9,10,11. The text of the tweets corresponding to 2022 was processed identically.
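The following sketch illustrates the kind of replacements described above using the re and emoji packages; the regular expressions are ours, and further steps used in the study (stopword removal and lemmatisation with nltk, spelling correction) are omitted for brevity.

    import re
    import emoji  # third-party package used in this study for emoticon handling

    def preprocess_tweet(text: str) -> str:
        """Apply the replacements described above; the regular expressions are illustrative."""
        text = re.sub(r"https?://\S+", "LINK", text)        # URLs -> "LINK"
        text = re.sub(r"@\w+", "USER", text)                 # user mentions -> "USER"
        text = re.sub(r"#\w+", "HASHTAG", text)              # hashtags -> "HASHTAG"
        text = emoji.demojize(text, delimiters=(" ", " "))   # emoticons -> textual meaning
        text = re.sub(r"[^\w\s]", " ", text)                 # drop punctuation symbols
        tokens = [t for t in text.lower().split() if len(t) >= 3]  # drop very short terms
        return " ".join(tokens)

    print(preprocess_tweet("Loving #renewableenergy! More at https://example.org @someuser"))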

Building the model

As explained in Sect. "Building the model", in order to detect possible differences between tweets coming from bot and human users, a model to identify the users’ categorisation was implemented. The model used several features (according to the selection alternative chosen) and applied various mathematical procedures: Random Forest (RFM), Generalised Linear (GLM) and Support Vector Machine (SVM) algorithms.

Prior to the construction of the model it was necessary to check the correlation between input variables. The Pearson or Spearman method would be applied depending on whether the variables were normally distributed or not, which was examined using the Anderson-Darling test73. The following hypotheses were utilised, with a significance level equal to 0.05:

  1. (i)

    \(H_{0}\): “The sample derived from a normal distribution”

  2. (ii)

    \(H_{a}\): “The sample did not derive from a normal distribution”.

If p-\(value<0.05\), \(H_{0}\) would be rejected.

Those variables that exhibited a correlation of more than 0.75 with another variable were eliminated.
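A possible implementation of this filtering step, assuming the input attributes are stored in a pandas DataFrame, is sketched below; it uses the Anderson-Darling test available in statsmodels to choose between Pearson and Spearman correlations and then drops variables correlated above 0.75. Function and variable names are illustrative.

    import numpy as np
    import pandas as pd
    from statsmodels.stats.diagnostic import normal_ad  # Anderson-Darling normality test

    def drop_correlated(X: pd.DataFrame, threshold: float = 0.75) -> pd.DataFrame:
        """Remove variables correlated above the threshold with another variable."""
        # Pearson if every column passes the normality test (alpha = 0.05), Spearman otherwise.
        all_normal = all(normal_ad(X[c].to_numpy())[1] >= 0.05 for c in X.columns)
        corr = X.corr(method="pearson" if all_normal else "spearman").abs()
        upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
        to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
        return X.drop(columns=to_drop)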

Similarly to10,74,75, in order to build the model, a cross-validation procedure was performed in which NF folds were used. The model was trained NF times, where each time 1 fold was taken as the test set and the remaining \(NF-1\) folds were utilised as the training set. To estimate the model’s appropriateness, the mean of an estimated metric (ESTMET) was calculated75:

$$\begin{aligned} <ESTMET>=\frac{1}{NF} \sum _{i=1} ^{i=NF} ESTMET_{i} \end{aligned}$$
(2)

ESTMET represents Accuracy, Sensitivity, Specificity and kappa. A separate final estimation of the aforementioned metrics was also computed utilising the validation set. NF took a value equal to 5. In order to execute the CV process, 80% of the samples were used as the training set and 20% as the validation set.
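The models themselves were built in R (see Sect. "Software programs"); purely as an illustration of the cross-validation scheme of Eq. (2), the following Python sketch averages accuracy, sensitivity, specificity and kappa over NF = 5 folds using a random forest classifier from scikit-learn, assuming binary labels 0 (human) and 1 (bot). The hyperparameters and function name are placeholders, not those used in the study.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, cohen_kappa_score, recall_score
    from sklearn.model_selection import StratifiedKFold, train_test_split

    def cross_validate_rf(X: np.ndarray, y: np.ndarray, n_folds: int = 5) -> dict:
        """Average accuracy, sensitivity, specificity and kappa over NF folds, Eq. (2)."""
        # 80/20 split: the 20% validation set is kept apart for the final estimation.
        X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=0)
        scores = {"accuracy": [], "sensitivity": [], "specificity": [], "kappa": []}
        folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
        for train_idx, test_idx in folds.split(X_tr, y_tr):
            model = RandomForestClassifier(n_estimators=500, random_state=0)
            model.fit(X_tr[train_idx], y_tr[train_idx])
            pred, truth = model.predict(X_tr[test_idx]), y_tr[test_idx]
            scores["accuracy"].append(accuracy_score(truth, pred))
            scores["sensitivity"].append(recall_score(truth, pred, pos_label=1))
            scores["specificity"].append(recall_score(truth, pred, pos_label=0))
            scores["kappa"].append(cohen_kappa_score(truth, pred))
        return {name: float(np.mean(vals)) for name, vals in scores.items()}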

A short description of the GLM, RFM and SVM has been included in the Supplementary Material Document (Sect. S2.2).

Account typologies

With the aim of describing the activity corresponding to each user in relation to tweeting, retweeting, replying and quoting, the following attributes were analysed:

  • Stationarity of the time series. Parameters related to the Augmented Dickey-Fuller test, which allowed us to examine the existence of this property in the user’s activity. These were: (i) the maximum lag order used, computed as in70, where several models with different lag orders are fitted and the lag order of the model with the lowest AIC value is selected; and (ii) the obtained p-value.

  • For each user, various statistical magnitudes such as number, mean, median, mode, standard deviation, as well as maximum, and minimum number of daily tweets were considered.

In order to examine whether there are differences in the above-mentioned variables between humans and bots, the following procedure was performed (a sketch implementing this decision procedure is given after the list).

  • The normality of distributions of each variable was examined utilising the D’Agostino test76. A significance level equal to 0.05 was considered.

  • If normality existed, the existence of homoscedasticity in each variable should be checked through the Breusch-Pagan test. A significance level equal to 0.05 was utilised.

  • If normality and homoscedasticity existed, the analysis of variance (ANOVA) method should be executed.

  • By contrast, if there was no normality or homoscedasticity in the distributions, the Kruskal-Wallis test would be computed.
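A sketch of this decision procedure for two groups (humans and bots) is given below; it uses scipy’s normaltest (an implementation of D’Agostino and Pearson’s test), a Breusch-Pagan test computed on a regression of the variable on a group indicator (one possible way to operationalise the homoscedasticity step), and ANOVA or Kruskal-Wallis accordingly. Names other than the 0.05 significance level are illustrative.

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import f_oneway, kruskal, normaltest
    from statsmodels.stats.diagnostic import het_breuschpagan

    def compare_groups(humans: np.ndarray, bots: np.ndarray, alpha: float = 0.05) -> dict:
        """Choose ANOVA or Kruskal-Wallis following the decision procedure above."""
        normal = (normaltest(humans).pvalue >= alpha and
                  normaltest(bots).pvalue >= alpha)
        homoscedastic = False
        if normal:
            # Breusch-Pagan on a regression of the variable on a group indicator.
            y = np.concatenate([humans, bots])
            exog = sm.add_constant(np.r_[np.zeros(len(humans)), np.ones(len(bots))])
            resid = sm.OLS(y, exog).fit().resid
            homoscedastic = het_breuschpagan(resid, exog)[1] >= alpha
        if normal and homoscedastic:
            test, p_value = "ANOVA", f_oneway(humans, bots).pvalue
        else:
            test, p_value = "Kruskal-Wallis", kruskal(humans, bots).pvalue
        return {"test": test, "p_value": float(p_value), "different": p_value < alpha}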

Activity patterns

With the goal of detecting behavioural patterns in both humans and bots, we use time series clustering, which is an unsupervised data mining technique that groups data points based on their similarity.

For each of the type A, C, G and T messages, the parameters indicated in the previous section were considered. Accordingly, we applied the K-Means method to a set of data points SDP: \(x_{1}, x_{2},..., x_{n}\), in which \(\mathbf {x_{i}} = (x_{i1}, x_{i2},..., x_{id})\) symbolises a vector in \(R^{d}\), and then we obtain the data distributed into k clusters. Each cluster is represented by a centroid. Given a value of k, the K-Means method operates as follows77,11:

  • Step 1: In order to obtain the initial centroids, k data points are picked at random.

  • Step 2: Each data point is assigned to the nearest centroid.

  • Step 3: Considering the present cluster memberships, we compute the centroids again

  • Step 4: If the convergence criterion is not met, steps 2 and 3 are repeated.

The following convergence criterion is considered77,11:

  • There is no, or only a very small, re-assignment of data points to different clusters, or

  • There is no modification of centroids or there is a minimal change.

  • There is a very small reduction in the sum of squared errors (SSE).

    $$\begin{aligned} SSE=\sum _{j=1}^{k}\sum _{x \in C_{j}}d(x,cen_{j})^{2} \end{aligned}$$
    (3)
  • where77,11 \(C_{j}\) is the \(j^{th}\) cluster,

  • \(cen_{j}\) is the centroid of cluster \(C_{j}\) (the mean vector of all the data points in \(C_{j}\)),

  • \(d(x, cen_{j})\) is the Euclidean distance between data point x and centroid \(cen_{j}\).

  • k is the number of clusters.

For human and bot users, the optimal number of clusters is estimated utilising the Silhouette and Dunn indexes, which are described in the Supplementary Material Document (see Sect. S2.3).
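The clustering itself was performed in R (see Sect. "Software programs"); as an analogous illustration, the Python sketch below standardises the per-user attributes, fits K-Means for several candidate values of k and keeps the solution with the best Silhouette index (the Dunn index and the Hopkins statistic are omitted). The synthetic data and function name are illustrative.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score
    from sklearn.preprocessing import StandardScaler

    def cluster_activity(features: np.ndarray, k_candidates=range(2, 6)):
        """Fit K-Means for several k and keep the solution with the best Silhouette index."""
        X = StandardScaler().fit_transform(features)
        best = (None, -1.0, None)  # (k, silhouette, fitted model)
        for k in k_candidates:
            model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
            score = silhouette_score(X, model.labels_)
            if score > best[1]:
                best = (k, score, model)
        return best

    # Example with synthetic per-user attributes
    # (mean, std, median, mode, max, min, ADF lag order, ADF p-value).
    k, score, model = cluster_activity(np.random.default_rng(0).normal(size=(500, 8)))
    print(k, round(score, 3))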

Sentiment analysis

The analysis of the sentiment and subjectivity of each tweet was carried out using the Python TextBlob78 library. This library examines the text of each tweet, providing a value for the sentiment and the subjectivity of the tweet in the ranges [-1,1] and [0,1], respectively. This analysis is based on a word lexicon in which each word is described by its positivity/negativity as well as by its subjectivity (subjectivity measures the degree of personal opinion that the text of the tweet contains: the higher the subjectivity, the more the text reflects personal opinion rather than objective information9,11). As already explained, the sentiment and subjectivity corresponding to each tweet are calculated by averaging the sentiment and subjectivity of all words included in the tweet. Only English-language tweets were used.

It should be noted that, although recent methods such as BERT have proven to be very appropriate for the analysis of complex texts79,80,81,82,83, we have used the TextBlob library because, in addition to its simplicity of use, it has demonstrated optimal results9,11,84,85,86,87. In this research, as mentioned above, the average sentiment of each tweet was calculated from the sentiment of each word in the text of each processed tweet. Because of this, and the fact that we handle a large volume of data, with TextBlob being faster and less computationally expensive (CPU and GPU) than BERT88, we chose TextBlob as the sentiment analyser.
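A minimal sketch of this step with TextBlob is shown below; the polarity and subjectivity attributes are those provided by the library, while the labelling thresholds follow the intervals described in Sect. "Sentiment analysis" of the Results. The function name and example tweet are illustrative.

    from textblob import TextBlob

    def tweet_sentiment(text: str):
        """Return (polarity, subjectivity, label) for a processed tweet."""
        sentiment = TextBlob(text).sentiment  # polarity in [-1, 1], subjectivity in [0, 1]
        if sentiment.subjectivity == 0:
            return sentiment.polarity, sentiment.subjectivity, "discarded (subjectivity = 0)"
        if sentiment.polarity > 0:
            label = "positive"
        elif sentiment.polarity < 0:
            label = "negative"
        else:
            label = "neutral"
        return sentiment.polarity, sentiment.subjectivity, label

    print(tweet_sentiment("renewable energy is a great step towards sustainable cities"))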

Keywords analysis

A keyword analysis was carried out using two procedures, which allowed us to compare the results obtained with each method.

Method 1. In this procedure, once the tweets were processed as indicated in Sect. "Preparation of tweets for analysis", a word count was performed, and the most repeated words were considered as the most important.

Method 2. In this approach, a procedure similar to the one described in61 was performed. It is named TKG, and it is based on the creation of word graphs, which are examined utilising certain centrality metrics.

Once the tweets were processed as indicated in Sect. "Preparation of tweets for analysis", they were sorted into groups of 30 elements each (this value was taken considering an adequate compromise between network size and computational speed). After this, two types of graphs were constructed from the words included in each tweet. The type 1 graph connected two words provided one followed the other in the text of the tweet (\(TKG_1\)). The type 2 graph connected all words that were in the same group of 30 elements (\(TKG_2\)). It must be noted that, in the type 1 graph, in order to establish the weights of the links between two nodes i, j, two mechanisms were used: (mechanism 1) the co-occurrence frequency was taken as the link weight, that is, \(w_{ij} = f_{ij}\), and (mechanism 2) the inverse co-occurrence frequency was considered as the weight, that is, \(w_{ij} = 1/f_{ij}\). In the type 2 graph only mechanism 1 was taken into consideration. A sketch of this graph construction and of the centrality computation is given after the list of metrics below.

The following centrality metrics were considered:

  • Degree centrality (DC) of a node i, which is described as:89

    $$\begin{aligned} DC_{i} =deg_i \end{aligned}$$
    (4)

    where \(deg_i\) symbolises the number of links connected to the node \(i\), and

  • Closeness centrality (CC) of a node i, which is defined as:90

    $$\begin{aligned} CC_{i}=\frac{1}{\sum _{j\in V}d_{ij}} \end{aligned}$$
    (5)

    where \(V\) is the set of nodes and \(d_{ij}\) is the shortest path between the nodes \(i\) and \(j\).
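The graphs and centrality metrics were computed in R with igraph (see Sect. "Software programs"); the following Python sketch, using networkx, builds a \(TKG_1\)-style graph with mechanism 1 weights and computes degree and closeness centrality. Note that networkx returns the normalised variants of Eqs. (4) and (5); the function name and example tweets are illustrative.

    from collections import Counter
    from itertools import chain

    import networkx as nx

    def build_tkg1(tweets):
        """TKG_1-style graph: consecutive words linked, weighted by co-occurrence frequency."""
        pairs = Counter(chain.from_iterable(
            zip(t.split(), t.split()[1:]) for t in tweets))
        graph = nx.Graph()
        for (w1, w2), freq in pairs.items():
            graph.add_edge(w1, w2, weight=freq)  # mechanism 1: w_ij = f_ij
        return graph

    tweets = ["air pollution harms sustainable cities",
              "new renewable energy reduces air pollution"]
    g = build_tkg1(tweets)
    degree = nx.degree_centrality(g)        # normalised variant of Eq. (4)
    closeness = nx.closeness_centrality(g)  # normalised variant of Eq. (5)
    print(sorted(degree, key=degree.get, reverse=True)[:5])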

Software programs

Several functionalities were coded in the R language: (1) construction of the model, where the caret and vip packages were used; (2) cluster and tendency analyses, performed utilising the NbClust and clustertend packages; (3) execution of statistical tests, in which the nortest and tseries packages were used; (4) keyword analysis (network building and calculation of centrality metrics), carried out using the igraph and NLP packages; and (5) plotting of graphs, implemented using the ggplot2 package.

Various other functionalities were implemented in the PYTHON language: (1) the compression algorithms gzip, bzip2, zlib, lzma and smaz, where the gzip, bz2, zlib, lzma and smaz packages were utilised; (2) emoji analysis, where the emoji and demoji libraries were used to select and replace the emojis with text; (3) keyword examination, where the re package was utilised to correct misspellings as well as to find and extract the hashtags from each tweet. For the removal of irrelevant words, the nltk package was used in conjunction with the stopwords package. For splitting text into tokens and for the lemmatisation of words, the tokenize and wordnet extensions of the nltk package were used.

Ethics approval

The research complied with all the relevant national regulations. We only used public tweets.

Consent to participate

There is consent from the authors to participate in the manuscript.