Introduction

With rapidly diminishing global boundaries, readily available communication devices, and increasing popularity of social media, a new trend of being “extreme” is becoming normal in today’s attention-driven society. In domains like religion and politics, this trend is evident, whereas in areas like music, lifestyle, food, healthcare, and other day-to-day choices, it is not as visible but still present covertly. The ubiquitous presence of such trends makes it important to understand their causes and effects and identify methods to intervene when necessary and appropriate.

Religious and violent extremism has been a well-discussed topic in the literature due to their global presence and destructive outcomes1,2,3,4,5. However, there are other categories of extremism that grow gradually over time and remain unnoticed until their ultimate consequences are visible. We use the term “eccentricity” to distinguish such extremism from traditional political and religious extremism. The impact of eccentricity can be perceived in various forms. By spreading hate and disharmony, eccentric behavior polarizes society6,7. Fanaticism for favorite celebrities, singers, and politicians often leads to personal rivalry, threats, and cyber-bullying8,9. In some other scenarios, eccentricity can be more injurious and fatal too. Such outcomes have been observed in anti-vaccine movements10,11,12 and firearms-related discourses13,14. The impact of eccentric opinions and behaviors is not only limited to social and individual issues, but it influences financial and economic domains too. Industries often utilize this behavioral tendency to change the dynamics of the market. The popularity of pure vegetarian/non-vegetarian food15,16, vegan diets, and GMO/anti-GMO foods17 is the result of such exploitation by the related corporations and social organizations. These examples suggest that a systematic investigation and research on this category of extremism (eccentricity) is required.

Understanding the genesis and evolution of extremism has always been of great interest to researchers18,19,20,21,22. Factors like gender, race, education, and upbringing do affect the thought process of an individual23,24,25. However, the influence of society, social interactions, and information exposure plays a dominant role in building or altering opinions21,26,27,28,29,30,31,32. To understand the factors and underlying opinion dynamics researchers have developed different strategies33,34. Researchers have applied statistical physics and other mathematical models to examine the dynamics of extreme idea generation35,36,37. Simulation models like agent-based modeling have also been utilized in several studies21,36,38,39. In this study, we uniquely investigated this problem using multiple empirical data of human idea generation and evolution: experimental data obtained through human-subject experiments of idea generation40,41 and online ideation and interaction data obtained from public social media GAB42. We analyzed both datasets and compared the results to identify common patterns of how eccentricity may arise in social networks. To our knowledge, this is the first study that investigated the eccentric idea generation and evolution dynamics using extensive real-world empirical data obtained from diverse sources.

To understand the inception and development of eccentricity, the first step is to establish a quantitative, measurable definition of eccentricity. There are several works that define traditional extremism, but these definitions are restricted to the application and domain of their work and cannot be directly used to define eccentricity in a broader context43,44,45. Each definition covers some aspects of extremism and ignores the rest46,47. As most researchers consider extremism in terms of violence, they weigh an opinion based on whether the opinion will result in an action or not4,45. If the opinion results in an action (mostly destructive), the opinion is considered extreme, e.g., those leading to terrorism or a coup. The major drawback of this popular definition is that it does not quantify the strength of opinions. As this definition makes a binary classification about extremism or not, it would miss the scenarios where extremism might not result in direct actions but may cause more gradual escalation of opinions.

Another common way to define eccentricity is using the threshold method in continuous models18,19. In this method, an opinion’s strength is quantified in a range of possible values and if the strength is greater than a predefined threshold, that opinion is considered extreme or eccentric26,48. One of the major challenges with this method is deciding the threshold. The threshold value varies for different environments, different backgrounds and different tasks, and the same opinion will not qualify as extreme in one setting versus another. Even in the same setting, with time, the opinions that were initially classified eccentric may sound sensible in presence of more eccentric ideas later, and vice versa.

In our study, we chose the simplest dictionary definition of eccentricity: “deviation from the norm”. The norm of a conversation or discussion is defined as the center of all opinions in a social neighborhood, and the eccentricity is quantified as the distance from the norm, both in a semantic metric space that is obtained using machine learning techniques for semantic embedding. This approach makes our definition of eccentricity parameter-free and avoids the problem of binary classification. As the metric is continuous and context-free, the method can be applied to any domain or task without the need for domain-specific knowledge or expertise.

The concept of eccentricity and its quantification, as introduced in this research, holds significant potential for integration into contemporary social media algorithms and various applications where the visibility or impact of individuals or ideas is a critical factor. For instance, in today's social media platforms, content recommendation algorithms rely on metrics like the number of likes or comments to suggest posts and users to others. Similarly, in the context of talk shows or debates, success is often gauged by the volume of applause or views garnered. However, for those seeking to cultivate platforms that prioritize the meaningfulness and relevance of content, the consideration of opinion eccentricity and discourse distinctiveness becomes a valuable tool. By factoring in eccentricity metrics, these platforms can effectively filter out noise and prioritize content that fosters meaningful engagement, ultimately contributing to a more constructive and purposeful digital landscape.

Ideas that become more popular are intrinsically more eccentric

We focus on understanding the influence of the eccentricity of an idea on the amount of attention the idea receives. From two different data sources (human-subject experiments40,41 and online social media GAB42), we collected three types of data: (1) text posts collected from an online experiment for a laptop tagline writing (high collaboration) task, (2) text posts collected from an online experiment for story writing (low collaboration) task, and (3) social media posts from GAB (see "Methods" for details). The number of likes on a post recorded in these datasets is considered a representation of the amount of attention the post received. The eccentricity of a posted opinion is measured by the semantic distance between the idea and the center of all other ideas in the individual’s social neighborhood.

The social neighborhood of an individual, \(u\), is his or her ego network, which comprises all other individuals \(u\) is following and user \(u\) herself. User \(u\) gets exposed to the opinions posted in her social neighborhood (Fig. 1a). Recent opinions posted (we used recent 5 days) in the social neighborhood of \(u\) constitute a knowledge base of \(u\) at each time point \(t\) (Fig. 1b). The rationale behind maintaining a knowledge base that contains only recent opinions is rooted in the belief that exposure to contemporary ideas holds a greater potential for inspiring novel insights. Recent opinions tend to align more closely with current trends and prevailing sentiments. Opinions can undergo significant changes, especially within the dynamic realm of social media. For instance, consider the events of April 15, 2019, when the tragic fire at the iconic Notre-Dame Cathedral in Paris became a focal point of discussion on both news outlets and social media. Just a few days later, on April 18, 2019, the release of the Mueller Report on Russian interference in the 2016 U.S. Presidential election took center stage. Within a week, the collective attention on social media shifted from discussions about the Notre-Dame fire to in-depth analysis of the report. Over extended periods, topics and discussions can undergo drastic transformations, often losing relevance and impact on shaping new opinions. By prioritizing recent ideas in building our knowledge base, we can effectively sidestep outdated information and extraneous noise, thereby enhancing the model's efficiency.

Figure 1
figure 1

Illustrating the Step-by-Step Process of Eccentricity Calculation Within a Social Network. Participants in a social network are connected to a subset of other participants forming a social neighborhood. (a) Each participant can access the ideas generated in the social neighborhood. (b) The ideas are vectorized using Doc2Vec. Recent ideas posted in the social neighborhood of a participant (by the participant and their neighbors) constitute the Knowledge Base for the participant. (Every participant has a separate knowledge base, which is updated when a new idea is posted in the neighborhood). (c) To measure the eccentricity of a newly generated idea, distance between new idea (vector) and the center of the knowledge base is measured. The intensity of red color represents the distance from the center of the knowledge base (which is, by definition, the eccentricity).

In the next step, we convert text ideas in the knowledge base into numerical vectors using the Doc2Vec method49. Principal component analysis is performed on these numerical vectors to reduce the dimensionality (see "Methods" for details). The knowledge base is different for each individual and gets updated when there is a new idea in the neighborhood. The eccentricity of a new opinion \({O}_{u}(t)\), posted by \(u\) at time \(t\), is measured by the distance of \({O}_{u}(t)\) from the center of \(u\)’s knowledge base at that time \(t\) (Fig. 1c) (see "Methods" for mathematical definitions).

Posted ideas are partitioned into different popularity levels according to the number of likes they received. We compare the probability distributions of eccentricity for different popularity levels to find the relationship between eccentricity and popularity. We use the kernel density estimation method with a Gaussian filter50 to estimate the probability distribution for each popularity level. Figure 2a and b represent popularity distributions for the posts in the laptop tagline writing task and the posts in the short story writing task, respectively.

Figure 2
figure 2

Probability distributions of the eccentricity of the posted ideas at different popularity levels (number of likes, i.e., amount of attention they attracted). Popularity levels are represented in different color bins, low (blue) and high (orange) (and, in case of GAB data, medium (green)). The vertical dashed lines show the average value of eccentricity for each popularity level. (a) Plots for the dataset collected in the laptop tagline writing experiment (high collaboration task). Two-sample Anderson–Darling test with Bonferroni correction on unequal sample sizes (n>2 = 59, n<=2 = 815) (b) Plots for the dataset collected in the short story writing experiment (low collaboration task). Two-sample Anderson–Darling test with Bonferroni correction on unequal sample sizes (n<=2 = 669, n>2 = 38) (c) Plots for the dataset collected from GAB. Two-sample Anderson–Darling test with Bonferroni correction on unequal sample sizes (n<=10 = 130,234, n11-100 = 1866, n>10090). In the plots of high popularity levels, the tail of the probability density function becomes broader in all three data sources. The average eccentricity also increases as the popularity level goes up in all cases.

The data obtained from the online experiments has a limited range of likes (0–5), hence ideas are partitioned into two popularity levels: High (> 2) and Low (< = 2). Figure 2c shows the probability distribution for GAB social media posts. GAB posts, which have a wide range of the number of likes (0 to 500 +), are partitioned into three popularity levels: Low (< = 10), Medium (11–100), and High (> 100). For all the three datasets, the right tail of the distribution gets broader for the higher popularity levels, indicating that more eccentric posts attract greater attention and popularity. The pattern of increasing average eccentricity with increasing popularity levels is consistent across all the datasets, despite the different nature of the sources. The average eccentricity and eccentricity distributions for different popularity levels are significantly different from each other (p-values obtained by Anderson–Darling test using Bonferroni correction method are shown on each plot).

Movement of neighborhood ideas conceals own ideas’ deviation

Furthermore, we propose another variation of eccentricity measure named “self-eccentricity”. Self-eccentricity is the eccentricity of opinions with respect to the previous opinions made by the same author of the opinion in question. In other words, the self-eccentricity of an individual’s new idea is the distance of the new idea from the center of that individual’s previous ideas. Whereas eccentricity is an indicator of deviation from the common consensus or core of the discussion in the social neighborhood at a certain time point, self-eccentricity measures departure from one's own previous ideas over time. We applied this measure to the GAB dataset that had sufficient historical data of users’ opinions. We observed that the eccentricities of posts made by an individual did not remain constant but kept changing. We quantified the temporal change of self-eccentricity and analyzed the dynamics of eccentricity change for each user.

In order to quantify the overall evolution of an individual's eccentricity, we have adapted the F-score and G-score metrics proposed by Mall et al.51 for our analysis. The modified definitions can be found in the "Methods" section. The F-score for a user represents a weighted average of the change in their opinion eccentricity, irrespective of the direction of change. It measures a user’s stability in terms of their opinion eccentricity over the period of study. On the other hand, the G-score considers both the change in eccentricity and the direction of that change. It quantifies the extent of average increase or decrease in opinion eccentricity. Both the F-score and G-score definitions incorporate a decay term that factors in the time delay between any two consecutive opinions and the overall average time delay between opinions. This adjustment accounts for changes in an individual's state of mind over time. When combined, the F-score and G-score provide a comprehensive characterization of an individual's behavior. Furthermore, we calculate the F-score and G-score based on the self-eccentricity for each user. The combined F-score and G-score for self-eccentricity help us understand how a user's current opinion deviates from their past opinions.

In Fig. 3a, each individual is represented in a 2-D space based on their F-score and G-score calculated based on the eccentricity (relative to the social neighborhood). The distribution is symmetric, with individuals spread on both the positive and negative sides of the G-score axis. Most individuals are concentrated around the G-score zero line, suggesting that there is no consistent directional trend in idea eccentricity concerning their social neighborhood. In simpler terms, individuals' ideas tend to align with those of their social network.

Figure 3
figure 3

Distribution of GAB users’ idea eccentricity dynamics. Dynamics of user opinion eccentricity is explained using two metrics: F-score which quantifies average absolute change and G-score which measures average directional change in idea eccentricities. Each dot represents a user in the data. (a) F-scores and G-scores of users’ eccentricities in their respective social neighborhood, i.e., deviation of their ideas from the center of their neighbors’ ideas. Users are color coded according to their F-score, G-score (calculated based on social neighborhood eccentricity) coordinates as shown in color bars. (b) F-scores and G-scores of users’ self-eccentricities, i.e., deviation of their idea from their own recent ideas. Each user in (b) is color coded using the same color scheme as in (a) to show the correspondence between the two plots. (c) Distributions for eccentricity G-score (relative to the social neighborhood) and self-eccentricity G-score (relative to own past). The G-score distributions are significantly different (Mann–Whitney test with equal sample size of 318) from each other. The G-score (social neighborhood) mean is lower than the G-score (self-eccentricity) mean (− 0.0092 vs 0.201).

However, the plot takes on a different pattern in Fig. 3b when users are positioned in the F-score and G-score space of their self-eccentricity. Each user in Fig. 3b is color coded using the same color scheme as in Fig. 3a to show the correspondence between the two plots. Here, the distribution is noticeably skewed toward positive G-scores. This shift indicates that users are generally moving away from their previous ideas, signifying an increase in self-eccentricity over time in relation to their past opinions. This trend holds true regardless of their eccentricity within their social neighborhood, as indicated by the marker colors. The distributions of G-scores for eccentricity and G-scores for self-eccentricity are represented in Fig. 3c. Mann–Whitney test (with an equal sample size) shows a significant difference in these two distributions with G-scores for self-eccentricity higher than G-scores for eccentricity.

These two results, when interpreted together, deliver the following key finding of our work: Individuals are turning more eccentric with time in terms of their own previous ideas (Fig. 3b); however, as everyone in the neighborhood of these individuals is also shifting from their prior opinions, the change in individual’s ideas may not be noticeable (Fig. 3a). Such spontaneous yet unrecognized increase of idea eccentricity can be driven by the positive correlation between eccentricity and attention described earlier (Fig. 2).

Discussion

In today’s heavily interconnected world, we notice a trend that a large section of society is increasingly opting for eccentric choices that stand out. We have explored a few insights about this behavior with multiple real-world empirical data. Our first finding shows that the deviation of opinions from the norm helps attract the attention of other individuals. Several studies have shown that serious adverse effects of social media usage are cravings to receive social acceptance and attract the attention of friends and acquaintances52,53,54. Our finding indicates that such human desire may naturally lead to generation of more eccentric opinions. This behavior may scale up to other contexts in the real world beyond online social media.

Another crucial implication about eccentricity obtained in this study is that the overall collective shift of ideas in our social neighborhood may create an illusion of consistency in our own opinions. This can be understood using an analogy of multiple passengers riding on an elevator. In a smooth-moving elevator, any change in elevation is not felt directly by the people using the elevator because the only reference points to assess one’s position are fellow individuals riding on the same elevator. As everyone is moving in the same direction at the same speed, it feels like everyone is standing still and not moving. Similarly, we may not feel the shift in eccentricity of our opinions as our social neighbors show similar shifts.

This study draws a picture of how extreme ideas and opinions may spontaneously arise in society. Everyone wants to gain social acceptance and become popular and influential. As being eccentric in opinions helps attract neighbors’ attention, people start expressing out-of-center opinions. And as most of the social neighbors do the same, it would be difficult to notice that one’s opinions are becoming more eccentric compared to others. If people do not recognize the heat, there would be little feedback mechanisms to stop them from becoming more eccentric. These behavioral patterns form a cycle and may reinforce each other. These conclusions illustrate the need for further study of how to detect such spontaneous escalation dynamics in society, and if appropriate, how to implement effective interventions so that it will not cause undesirable negative impacts on our lives.

One notably direct and consequential application of this concept is its potential role in the analysis and regulation of our own behavior within the realm of social communication. As demonstrated in our research, the pursuit of attention often leads individuals to express opinions that deviate from the norm. Paradoxically, such behavior, when exhibited by our peers, tends to camouflage our own eccentricities. Without the means to measure and alert us to these behavioral shifts, they may potentially yield adverse long-term consequences.

Drawing from the insights presented in this study, there is an opportunity to develop tools and mechanisms capable of monitoring and notifying individuals about such behavioral changes. These tools could provide individuals with an opportunity to reevaluate and recalibrate their communication patterns. Similarly, such tools can find application within organizational settings to track and analyze communication behavior among members, or in educational contexts to scrutinize student behavior in the classroom, thus affording opportunities for early intervention to prevent situations from becoming irreparable.

Methods

Online experiments and data collection

We collected data from two different sources. One is a human-subject online experiments performed at a mid-sized US university, where students in different majors were recruited to participate in a collaborative textual design task on a Twitter-like online experimental platform40,41. Participants were linked to a subset of other participants like a social network setting. There were two types of tasks used for the experiment: (i) writing laptop marketing taglines (high collaboration task) and (ii) writing short fictional stories (low collaboration task). Like in typical social media platforms, participants in these experiments could see only the ideas posted by their social neighbors, like those neighbors’ ideas, and add comments to them. Details of the experiments, idea generation process and method of visualizations can be found in these papers40,41. All experiments were carried out in accordance with relevant guidelines and regulations and informed consent was obtained from all subjects who participated in the experiments.

The other data source we utilized for data collection is the GAB social media. GAB is a social networking service particularly popular among the far-right people in the US. The data from the GAB is freely available and can be scraped from their website. We used the snowball sampling method to collect data from GAB. We collected around 30 M posts and connections information between authors of these posts. As we are interested in understanding eccentricity, which requires a connected network of individuals, we selected the largest connected component (LCC) from the user network. A subgraph induced by randomly selected 10% of the users in the LCC was used for data analysis to keep the computational demand at a manageable level. This gave us a dataset of around 3,000 GAB users with about 147,000 posts for the analysis. The dataset consists of posts that were made between August 2016 and January 2021.

Text embedding and dimensionality reduction

The first step in analyzing textual data is to convert it into numerical form, a process called text embedding. Before converting text ideas into numerical vectors, each text idea was cleaned to remove stop words, punctuation marks and digits. We also used word stemming to convert different forms of a word to a standard form. In the subsequent step, the Doc2Vec method was used to convert the cleaned text ideas into numerical vectors. Doc2Vec first creates a vocabulary using the text corpus (all ideas combined in this study), trains a model, and infers a numerical vector for each text idea. For the GAB data, we set the inferred vector size to be 300, whereas for online experiment data it was set to 90, given the difference in the data sizes. The 300-dimensional numerical vectors for the GAB data were further transformed to lower dimensional vectors (115 dimensions) using the principal component analysis preserving 90% variance in the data. We call the resulting numerical vector an “idea vector”.

Eccentricity and self-eccentricity calculation

To calculate eccentricity of an idea, we first create a social neighborhood for each user, which is a directed network between a user and all other users he or she is following. In our study, we assume the collaboration network remains unchanged for each user during the study. The collaboration network is used to create a knowledge base for each user. A knowledge base is the collection of recent ideas (recent 5 days) made in the neighborhood. The ideas in knowledge base are converted to numerical vectors using Doc2Vec method. The mean vector of a vectorized knowledge base is the center of knowledge base, and the distance of an idea vector from the center is the eccentricity of that idea. It is essential to emphasize that each individual has a unique knowledge base that evolves with the introduction of new ideas in their social neighborhood. Consequently, the eccentricity of a particular opinion is determined based on the individual's current knowledge base at the time when that opinion was generated. This highlights the dynamic and personalized nature of eccentricity assessment.

The mathematical process for determining eccentricity involves a series of steps described as follows: To calculate the eccentricity of an opinion \({O}_{u}(t)\) (opinion posted at time \(t\) by user \(u\)), we need a knowledge base \(K{B}_{u}(t)\) (Knowledge base of user \(u\) at time point \(t\)). \(K{B}_{u}(t)\) is the collection of all opinions in the social neighborhood of user \(u\) posted before time \(t\). Social \(neighborhood \left(u\right)\) contains all the users \(u\) is following (or is connected to, in case of undirected network) and user \(u\).

$$K{B}_{u}\left(t\right)=\left\{{O}_{v }(k) \right| v \in\, neighborhood \left(u\right), \,k<t \& \left(t-k\right)<5\, day.$$
(1)

Eccentricity of the opinion \({O}_{u}(t)\), \(Ecc({O}_{u}\left(t\right))\) is the distance (\(d\)) of \({O}_{u}(t)\) from the center of \(K{B}_{u}(t)\),

$$Ecc \left({O}_{u}\left(t\right)\right)=d \left({O}_{u}\left(t\right),center \left(K{B}_{u} \left(t\right)\right)\right).$$
(2)

Self-eccentricity is the deviation from the center of one’s own recent previous ideas (recent 5 days). For each individual, a self-archive of recent ideas is maintained, and each new idea is evaluated against the self-archive. Self-archive \({SA}_{u}(t)\) of user \(u\) at time \(t\) can be defined as:

$${SA}_{u}\left(t\right)={O}_{u}\left(k\right) | k<t \& \left(t-k\right)<5 \,days.$$
(3)

Self-eccentricity (\(Self\_Ecc({O}_{u}\left(t\right))\)) of an opinion \({O}_{u}(t)\) is the calculated as the distance of \({O}_{u}(t)\) from the center of self-archive of the user \(u\) at time \(t\).

$$Sel{f}_{Ecc\left({O}_{u}\left(t\right)\right)}=d \left({O}_{u}\left(t\right),center\left(S{A}_{u}\left(t\right)\right)\right).$$
(4)

In our work, we have used Euclidean distance (\({L}^{2}\) norm) as the distance metric. We have estimated the center of the knowledge bases and the self-archives with the mean of all the contained vectors.

Popularity levels and eccentricity distribution

Ideas are categorized into different popularity levels based on the number of likes they received. The online experiments had a limited number of participants, and each session ran only for two weeks, so the maximum number of likes is less than 6 for this dataset (laptop taglines data < 5, story writing data < 6)40,41. Due to the limited number of data points in online experiment dataset, we have opted to categorize the online experiment data into only two groups to ensure the robustness of our claims: ideas having fewer than or equal to two likes (Low popularity) and ideas having more than two likes (High popularity). In the case of the GAB dataset, the range of number of likes is much wider (0 to 500 +). The range of the number of likes differs significantly between the online experiment data and GAB social media data. To make both datasets comparable, we employ a logarithmic scale to normalize the 'number of likes' attribute in the GAB social media data, resulting in a range of 0 to 2.6 on the logarithmic scale. We then define distinct popularity categories for logarithmic likes [0, 1] (original scale 0–10), (1, 2] (11–100), and > 2(> 100).

For each popularity level, a probability distribution of eccentricity is constructed. The kernel density estimation method is used to estimate the probability distribution. We have used a Gaussian kernel with bandwidth of five to smoothen the curve. The mean value of eccentricity is also calculated for each popularity class.

F-score and G-score calculation

For each user \(u\), F-score (\(F(u)\)) and G-score (\(G(u)\)) are calculated by taking a weighted average of their eccentricity changes. For F-score calculation, we only consider the magnitude of change of eccentricity not whether it is increasing or decreasing (Eq. (5)). In the G-score calculation, we retain the sign of change of eccentricity to understand the average increase or decrease of eccentricity (Eq. (6)). The additional decay term accounts for delay between two consecutive opinions (\({t}_{k+1}-{ t}_{k}\)) and overall average delay between opinions (\(\alpha \)). \({N}_{u}\) is the total number of opinions posted by the user \(u\).

$$F\left(u\right)=\sum_{k=1}^{{N}_{u}-1}\frac{{e}^{\left(-\frac{{t}_{k+1}-{ t}_{k} }{\alpha }\right)}\left( \left|Ecc\left({O}_{u}\left({t}_{k+1}\right)\right)-Ecc({O}_{u}({t}_{k})\right| \right)}{{N}_{u}-1}$$
(5)
$$G\left(u\right)=\sum_{k=1}^{{N}_{u}-1}\frac{{e}^{\left(-\frac{{t}_{k+1}-{ t}_{k} }{\alpha }\right)}\left( Ecc\left({O}_{u}\left({t}_{k+1}\right)\right)-Ecc({O}_{u}({t}_{k}) \right)}{{N}_{u}-1}$$
(6)

Implementation details

Python 3.8 was used to implement all the data analysis procedures of the project. To embed text ideas into numerical space, Gensim Doc2Vec library55 was used. We have used the Distributed Bag of Words (DBOW) model in the Doc2Vec method. The Doc2Vec parameters are different between the online experiment dataset and the GAB dataset as the size of training data is different. For the GAB dataset, the document vector size is 300 and the min_count parameter is set to 10; meanwhile, for the online experiment dataset, the document vector size is 90 and the min_count parameter is set to 7. The NetworkX library56 was used to create and maintain collaboration networks.

Human research participants

Experiments were conducted after an approval from the Institutional review board at Binghamton University, NY, USA.