Cross-platform social dynamics: an analysis of ChatGPT and COVID-19 vaccine conversations

The role of social media in information dissemination and agenda-setting has significantly expanded in recent years. By offering real-time interactions, online platforms have become invaluable tools for studying societal responses to significant events as they unfold. However, online reactions to external developments are influenced by various factors, including the nature of the event and the online environment. This study examines the dynamics of public discourse on digital platforms to shed light on this issue. We analyzed over 12 million posts and news articles related to two significant events: the release of ChatGPT in 2022 and the global discussions about COVID-19 vaccines in 2021. Data was collected from multiple platforms, including Twitter, Facebook, Instagram, Reddit, YouTube, and GDELT. We employed topic modeling techniques to uncover the distinct thematic emphases on each platform, which reflect their specific features and target audiences. Additionally, sentiment analysis revealed various public perceptions regarding the topics studied. Lastly, we compared the evolution of engagement across platforms, unveiling unique patterns for the same topic. Notably, discussions about COVID-19 vaccines spread more rapidly due to the immediacy of the subject, while discussions about ChatGPT, despite its technological importance, propagated more gradually.


Introduction
Social media have markedly reshaped global information access, sharing, and consumption, thereby redefining the dynamics of information dissemination and, consequently, agenda-setting dynamics (56; 48; 16).The spread and consumption of information on online social media may be influenced by several factors such as biases (45; 10), platform designs, and algorithms (22; 12).Typically, online users are inclined towards information that resonates with their viewpoints (4), often dismissing opposing data (67), leading to the formation of likeminded user groups supporting a common narrative (16).The dynamics of these interactions may vary across social media platforms due to differences in business models and content selection algorithms (64).Online discourse frequently centers around controversial or timely topics such as political elections (37; 2), natural events (23), or significant global occurrences (6).
Recent advancements in Large Language Models (LLMs) have attracted significant attention due to their potential impact on various sectors (49).These models, trained on extensive datasets, can process and generate text similar to human communication, exhibiting quick and effective adaptability to new tasks (7).A notable instance is ChatGPT, launched by OpenAI on November 30, 2022 1 , which has catalyzed discussions regarding its capabilities and associated risks, including misinformation, ethical considerations, and broader AI implications (51).LLMs like ChatGPT have displayed remarkable competence in diverse tasks, ranging from creative writing to complex problem-solving (26; 25), and their usage has proliferated across various professional domains and among the general public.In particular, the accessibility of ChatGPT to the general public triggered a substantial volume of posts across multiple social media platforms shortly after its release (18).While many discussions were positive, growing concerns regarding the potential risks such as misinformation dissemination, cybersecurity threats, and adverse impacts on the labor market also fueled the discourse (33; 17; 29).These concerns have also given rise to alternative discussions emphasizing the limitations of LLMs in precise planning and problem-solving (19; 65; 43).
The COVID-19 pandemic emphasized the challenges posed by information saturation and highlighted the central role of social media in its dissemination (66; 11; 6).Analyzing the spreading patterns of innovations like ChatGPT across digital platforms, and comparing these dynamics with other significant events, is crucial for evaluating the social media platforms' impact in our society.
In this study, we investigate the trajectory of ChatGPT discourse across online platforms, using data from five major social media platforms -Facebook, Twitter, Instagram, Reddit, and YouTube -alongside global news coverage captured by the GDELT dataset.We capture user engagements, tracing the rise in interest and participation across diverse platforms while characterizing the debate surrounding LLMs by identifying dominant themes and sentiments.Additionally, we model the growth trajectory of user engagement within the LLM discourse and compare it with the user growth pattern related to COVID-19 vaccination discussions, a well-documented controversy (11).This comparison aims to elucidate the differences in information dissemination dynamics across global topics.
Although some recent studies have looked into online discussions about Chat-GPT (60; 41; 30; 35), they did not provide a comparison across different social media or considered global news coverage.The nature of debates can change based on the platform they occur on (12; 11), so analyzing discussions in different online settings is crucial to gaining a thorough understanding.Our analysis fills this gap by examining discussions on multiple social media platforms and news outlets.
In this study, we identified a concise set of relevant topics from comments about LLMs, like risks, health, education, finance, and technical discussions.We found that users on different platforms focus markedly on different topics, reflecting the distinctive nature of each platform.By performing sentiment analysis, we were also able to identify specific themes that most represent concern and excitement toward the recent deployment of AI.When we modeled the user growth pattern on each platform, we found that users engaged faster with discussion about ChatGPT on Twitter, YouTube, and Reddit compared to Facebook and Instagram.We also noticed that COVID-19 vaccine debates spread faster than those about ChatGPT on all platforms.In both cases, discussions on social media spread faster than in news articles.
Our research underscores the importance of understanding online discussions within their unique contexts.It highlights the factors affecting how information spreads across various platforms and topics.The findings from our study have implications for how we perceive the spread of information online, especially during critical global events.Our findings are also in line with key communication and media theories such as selective exposure theory (57; 59), the agenda-setting function of media (44; 55) and the role of framing in decision making (21; 9).Evidence that people consume information that aligns with their existing beliefs and attitudes, posited by selective exposure theory, emerges from our results as users on different platforms gravitated towards discussing aspects of ChatGPT that interest them or align with their perspective.Moreover, the agenda-setting theory suggests that the media significantly influences what issues are important to the public based on the coverage they receive.The heightened discussion around ChatGPT across all platforms following its release is a classic example of this theory in play.Lastly, the importance of framing is also evident in our research.How topics related to LLMs and COVID-19 vaccines are presented on different platforms can significantly affect perceptions and resultant user engagements.Analyzing these frames can provide insights into how these topics can be most effectively communicated.By recognizing the diverse focus and engagement patterns on different platforms, stakeholders, including policymakers, educators, and the tech industry, can better anticipate and respond to public reactions and concerns in a digitally connected world.

ChatGPT Discussion in Online Platforms
We begin our analysis by outlining the discussion around ChatGPT across different platforms.Our dataset includes about 3 million news articles and posts from November 25, 2022, to February 25, 2023 (see Methods for more details).Figure 1.a displays the cumulative count of content related to ChatGPT over three months, highlighting a sharp increase in early December followed by steady growth.Twitter and Facebook are identified as the most active platforms, while the discussion has a lower volume of news articles coverage.Figure 1.b shows the distribution of interactions by platform, where interactions consist of likes, comments, shares, and platform-specific metrics.Despite differences in platforms, all interaction distributions exhibit a long tail pattern consistent with previous studies (11), indicating that a small number of posts receive the most interactions while the majority receive minimal consideration.This pattern of engagement confirms the skewed nature of online discussions, where only a few posts dominate the conversation and receive disproportionate attention across different social media platforms.

Platform-Specific Dynamics
In this section, we apply topic modeling techniques to identify the main themes in ChatGPT discussion across platforms (see Methods for more details to resonate more with users on YouTube, Twitter, and Reddit, suggesting a user base eager to discuss the working mechanism of these language models.Despite the observed difference, the implications of AI on the job market emerged as a consistent theme across platforms.In the topic of "Job Market", comments covered the potential increased productivity while also addressing concerns about human job replacement.Users on Twitter relatively discussed more the topic related to the potential risks associated with LLMs.In summary, the topic of "Risks" is about posts discussing issues from data misuse, jailbreaking, potential biases, and societal impacts of AI.Finally, we note how the topic "Health" got less interest in the early weeks since the launch of ChatGPT.This lack of interest is surprising considering present public health community concerns about its role in substituting healthcare experts (32) and other global phenomena such as Infodemic (15).Further, we analyzed the sentiment tone distribution around topics by leveraging data from Global Database of Events, Language, and Tone (GDELT), a comprehensive resource that systematically gathers news content from various news outlets (39; 40).We tied the sentiment tone of news articles to the comments that mentioned them, using these articles as a proxy for the sentiment tone of the comments.Figure 2.b shows the sentiment distribution for each topic.The breakdown of distribution statistics is available in table 3. We obtained further evidence about both public concern and enthusiasm regarding the use of ChatGPT, showing that AI's effects on education and writing are perceived as negative.On the other hand, ChatGPT was mostly discussed in a positive way in image generation, financial, and technical conversations.Nonetheless, the widespread sentiment distribution is common to all the extracted topics and underlines the potential controversy it generated in the public debate.

Modeling User Engagement Growth across Platforms
After identifying differences and similarities in ChatGPT discussions across various platforms and noting their potential to spark debate, we aim to track how new users post ChatGPT-related content.By examining how the number of users grows over time, we can quantify the differences between platforms in the evolution of the discourse.
We processed the data to consider the cumulative number of unique users up to each day, counting each user that participated in the debate once, marked on their first appearance day.We modeled users' growth using logistic function (for more details about the model refer to the Methods section).The model's main coefficients, α and β, represent the growth rate and the time half of the unique users were engaged, respectively.A higher growth rate indicates a more steep increase in users' volume and is quantified by a higher α value, while a lower β value implies that it takes a shorter time to engage 50% of the final user base.We also measured the growth speed through the Speed Index (SI ) at which the model reaches its plateau and can be interpreted as how fast the discussion saturates among users.The speed index provides a comparative measure of user engagement dynamics across platforms because of its normalized value.Figure 3 depicts a series of plots showing the cumulative sum of unique users engaged in ChatGPT-related topics for each platform, while for news articles, it shows the cumulative sum of published articles present in the GDELT dataset.In each plot, we report the curve obtained by fitting the logistic function Eq. 1 to model the users' growth, while the parameters of the fits (α, β, and SI ) are detailed in Table 1.Twitter, YouTube, and Reddit exhibit similar growth rates (α values) and times to reach half of the unique users (β values).They also have higher Speed Index values with respect to other platforms, indicating faster growth in unique users compared to Instagram and Facebook.Conversely, Instagram and Facebook demonstrate steeper user growth (higher α values) but require more time to engage half of the unique users (higher β values).Remarkably, Facebook has the second largest user volume, but a SI lower than all other platforms except Instagram, suggesting that a delayed interest in the user base is independent of the size.Regarding news articles, it seems that social media users are leading the way in content generation during this period, a finding that aligns with previous studies underscoring the leading role of social media in shaping the news landscape (62; 54).
We observed platforms may exhibit different user engagement patterns as the discussion on a topic evolves.Understanding these dynamics is crucial for planning the dissemination of information and managing online discussions about major events.

Comparing ChatGPT Discourse with COVID-19 Vaccine Discussions
To further clarify the impact of ChatGPT in the public discourse on various platforms we compare its growth pattern with another topic that got significant attention, namely discussions surrounding COVID-19 vaccines.The discourse on vaccines, inflated by urgent health concerns during the pandemic and spread quickly across various media platforms (63).Conversely, ChatGPT represents a different kind of subject, being a notable technological advancement of potentially comparable resonance (36; 15).This analysis compares the rates at which different content spreads across diverse social media platforms.Additionally, we include news articles in the analysis to grasp ChatGPT's impact on global media coverage.To carry out this comparison, a dataset on COVID-19 vaccine discussions was curated across the same platforms and within a comparable time range as our ChatGPT dataset.This specific timeframe was selected due to the substantial debates and conversations surrounding COVID-19 vaccines, making it a suitable benchmark for comparison with the ChatGPT discourse.Figure 3 displays a series of plots showcasing the cumulative count of unique users engaged in vaccine-related discussions across each platform, while for news articles, it illustrates the cumulative total of published pieces.Accompanying each plot is a curve representing the fitted logistic function, providing a model for the diffusion process underlying the growth of the number of unique users in the vaccine debate.Table 1 reports the parameters for fitting these logistic curves.The fitted results of ChatGPT-related data are also included for a visual comparison.The normalized curves account for variations in user base sizes and allow for a direct comparison across platforms.In light of the comparative analysis, several key takeaways emerge regarding the evolution of ChatGPT and COVID-19 vaccine debates across different platforms.In all platforms we observed that COVID-19 vaccine debate exhibits a considerably higher Speed Index than the ChatGPT discourse.Notably, more than the others, Twitter and Reddit exhibit a higher similarity in the engagement patterns between the two topics.This disparity underscores the faster and wider spread of information about COVID-19 vaccines with respect to ChatGPT discourse.This aligns with previous research, which found that online users are prone to engage with highly controversial topics such as climate change or health-related subjects (23; 12).Moreover, the analysis underlines the dependency of information spreading patterns on the type of environment and audience, with platforms' user bases showing different interest levels in various topics.

Conclusion
The spread of information on social media platforms and how the public receives it often aligns with the principles of agenda-setting theory, showcasing what issues gain prominence and trigger discussions in digital public realms.This study, by comparing the discourse around ChatGPT with the discussions on COVID-19 vaccines, highlights the notable differences in information diffusion depending on the nature and global relevance of the topics.In more detail, the analysis of online discussion dynamics regarding Large Language Models, particularly ChatGPT, across different social media platforms and news articles reveals a complex picture.The posts that gain the most interaction usually align with users' majority interest on each platform, possibly reflecting the selective exposure theory.Meanwhile, the prominence of ChatGPT discussions following its release showcases the agenda-setting potential of new technological developments.Similarly, the rapid and extensive spread of discussions around COVID-19 vaccines highlights the urgency and global concern tied to the pandemic, aligning with traditional agenda-setting models where pressing issues dominate public discourse.Our cross-platform analysis reveals distinct engagement dynamics across platforms, emphasizing the importance of the platform environment and user base in digital agenda-setting.Moreover, the difference in information spread between these two topics underlines the unique opportunity that massive global events offer in studying societal engagement on digital platforms.With their real-time and global reach, social media have reshaped collective participation in response to global events and opened new ways to analyze these engagements through data.This study highlights a crucial aspect of our digital age: the complex interplay between the nature of information, the dynamics of social media platforms, and the collective engagement of the user base in agenda-setting, particularly during globally significant events.Furthermore, the insights from such analyses present a vital pathway to explore how social media platforms can be leveraged to promote informed discourse and engagement.As the frontier of AI technologies like ChatGPT continues to advance, comprehending the discourse surrounding them, how it is shaped, and how it disseminates across various platforms becomes crucial.Despite the fears surrounding the misuse of such models, especially in the potential creation of misinformation, the discussions generally centered around the technical aspects and potential usage of LLMs, such as in creative writing -an important insight for AI developers and policymakers.Future research could investigate such patterns in other areas of AI or technology innovations to provide a broader understanding of discourse dynamics around emerging technologies.Furthermore, other research efforts can cover more platforms, analyzing different temporal frames and comparing other significant global events.This will enhance our understanding of digital agenda-setting and inform effective communication, public engagement, and policy-making strategies in our increasingly interconnected digital society.These insights can guide AI and technology companies, educators, and policymakers in effective communication and anticipating public responses to new technological developments.The lessons learned here underscore the importance of a varied communication strategy tailored to the dynamics of different platforms.This study offers a crucial lens into the communication dynamics of controversial or complex digital developments, providing a vital foundation for future research and practice.

Data Collection
Our approach to data collection involved utilizing two specific keywords: "Ope-nAI" and "ChatGPT", in a case-insensitive manner, over a time span from November 25, 2022, to February 25, 2023.When it came to the information related to the COVID-19 vaccine, we built upon the previously accumulated data (63), further expanding this data by using the same keywords reported to extract relevant news articles from the GDELT dataset, as well as posts from Instagram, Reddit and YouTube.Each platform required a unique method to extract the necessary content.For Facebook and Instagram, we employed CrowdTangle(61), a tool that offers social media analytics by tracking public content on different social media platforms.In the case of the Reddit dataset, we initially integrated the use of both CrowdTangle and Pushshift.However, we resort to Reddit dump after encountering issues with the Pushshift API.The Twitter dataset was compiled using its official API, prior to enforcement of rate alterations.As for the YouTube data, this was collected via the YouTube data API, and news articles were retrieved from the GDELT's Global Knowledge Graph (GKG) table by using Google's BigQuery service2 .Table 2 shows a detailed breakdown of the data.Our data collection was not without limitations.
For instance, CrowdTangle can only track data to a certain limit3 .As for the GDELT news collection, it was dependent on whether the keyword was present in the article's URL.There is no doubt that all datasets contain some amount of spam.However, in the case of Facebook, the presence of spam was apparent from the begining of our analysis.Consequently, we decided to filter out these spam posts that were hijacking ChatGPT hashtag, which in turn reduced the size of Facebook's dataset to one-fourth (800k to 200k).This process was based on the simultaneous usage of these hashtags: #reeel, #cr7, #chatgpt, #fyp, #viral (see SI fo further details).

Topic Descriptions
In this section, we outline the key topics identified from our analysis and provide a description for each, highlighting the main discussions shared by users.These topics are as follows: • The AI Growth topic covers comments highlighting the rapid increase in users drawn to OpenAI's ChatGPT technology.
• AI Rivalry captures comments on the competitive positions taken by major tech entities like Google, Baidu, Microsoft, Meta, Amazon, and Nvidia towards the rise of OpenAI's ChatGPT.
• Topic Access ChatGPT is about discussions surrounding the methods of accessing ChatGPT service.It consists of conversations about countryspecific restrictions, the use of alternative means like fake phone numbers, and details related to Plus subscribers.
• The fourth topic, Creative Writing, spans a broad spectrum of artistic expression.It encapsulates discussions and requests related to various forms of written art, such as poetry, songs, screenplays, in addition to crafting jokes, designing itineraries, and writing books.
• The topic of Cryptocurrency focuses on discussions about digital currencies, mainly Bitcoin, Ethereum, and Dogecoin, and their price predictions.While this topic could have been combined with "financial discussions", it was kept separate due to its significant size.Thematically, cryptocurrency is also separate from the realm of finance.
• Education is one of the main topics discussed in our datasets and covers a wide range of sub-topics.Discussions often touch on issues like plagiarism and AI-generated essays, students leveraging LLMs to cheat on assignments, and schools taking measures to ban the use of such models.There's also a keen interest in how LLMs perform on exams and their utility in answering questions in fields like math and physics.The topic further extends to language learning and LLMs' role in translation.Notably, this topic has been the focus of other research(34; 50).
• The Entertainment topic branches into three main sub-categories: recipe ideas, sports, and gaming.The first category captures requests related to new food or cocktail ideas.In sports, users often seek ChatGPT's insights on players, teams, and strategies across various leagues like the NBA, F1, and football.On the gaming front, discussions revolve around enhancing game designs and tips for achieving higher scores in specific games.
• A significant portion of the comments are dedicated to Financial Discussions topic, emphasizing the convergence of AI and finance.Users mainly discussed AI's influence in marketing, stock trading, and catalyzing business growth in addition to the potential of LLMs to revolutionize entrepreneurship and enhance SEO practices.Discussions often touched on optimizing business strategies and leveraging ChatGPT to boost sales.The potential impact of integrating LLMs in the financial sector has been discussed by many studies (42).
• The topic Health captures comments about well-being and medical matters.Users discussed various medical issues; therapeutic conversations with the bot, personality assessments, exercise routines, and relationship insights.The significance of health discussions in the context of AI has been detailed in other studies(1; 38).
• Image Generation centers on comments about the use of advanced tools for creating visuals.Users discussed tools like Midjourney, Stable Diffusion, and DALLE-2, highlighting their capabilities and applications in crafting compelling images.
• The Job Market topic captures comments about the impact of LLMs on the job market.Users discussed both the positive aspects, such as enhanced productivity and innovation, and concerns regarding the potential of human job displacement (46; 5).The dual-edged role of automation in recruitment was also highlighted, with candidates using it to refine applications and employers leveraging it for assessment.

Topic Modeling
In our study, we employed BERTopic (27) for our topic modeling tasks which is a technique that combines the capabilities of transformer models, such as BERT (Bidirectional Encoder Representations from Transformers), and traditional topic modeling techniques like Latent Dirichlet Allocation (LDA).The advantage of BERTopic is that it leverages the context-capturing capabilities of transformer models, which are superior to traditional techniques when it comes to understanding semantic meanings of words (20).Initially, the dataset was filtered to include only English-language comments.For language detection, we employed the xlm-roberta-base-language-detection model( 13) for all platforms, with the exception of Twitter, as the raw Twitter data was already sorted by language.This model achieved an average accuracy of 99.6% on a benchmark of 20 languages (31).Next, we preprocessed the data by removing URLs and stop words to reduce noise.For embedding the sentences, we used the all-MiniLM-L6-v2 model from the Sentence Transformers library (52).The BERTopic parameters were then fine-tuned depending on the size of each social media platform's dataset.
After applying the BERTopic model to our datasets, we obtained a diverse range of topics.Our aim was to establish a consistent set of topics across all datasets, necessitating a careful review and reorganization of the results.Hence, we post-processed the BERTopic results by performing explanatory mixed methods analysis (24).This entailed a qualitative aggregation of the numerous topics identified by BERTopic into a smaller number of general topics (i.e.themes).This process drew from an iterative qualitative refinement approach inspired by grounded theory principles (3).Given the unsupervised nature of the BERTopic model, it was vital to conduct a thorough evaluation of the interpretability of the generated topics.We engaged in reviewing the keywords provided for each topic (i.e.Interpretability evaluation).Due to the often nuanced nature of these keywords, it was also necessary for some instances to examine a sample of 30 to 50 comments assigned to the topic to gain a deeper understanding of its context and relevance (i.e.Content analysis).To come up with a consistent list of topics across five datasets, we embarked on the iterative process of aligning the topics from each dataset to a standardized set of around 14 common topics.This process was dynamic, as our understanding of the overall topic space deepened with the review of each dataset (i.e.Iterative refinement).This occasionally led to the modification of our set of common topics to encapsulate better the spectrum of themes presented in the data.A short description of these finalized common topics can be found in the topic description section, highlighting the key themes and considerations for each one.
Throughout the process, in order to ensure consistency, we needed to make some compromises which was an inevitable consequence of reducing a highdimensional space to a lower dimension.We adopted a pragmatic approach to streamline the diverse range of themes that emerged from our data, all revolving around the central theme of artificial intelligence, machine learning, and chat bots.Topics that were too broad and failed to deliver meaningful insight were discarded.We manually assigned clear labels to distinguishable topics; for example, topics characterized by keywords such as "poem -write poem -poetry -write -ask write", "music -song -band -sound -album", "valentines -daylove -valentine -valentines day", "email -cold -cold email -lead -write" were categorized under Creative Writing.We labeled topics related to financial discussions, identified by keywords like "marketing -business -customer -product -brand", "stock -ai stock -investor -investorideas -stock directory", "moneymake money -make -fiverr -money online", as Financial Discussions.However, certain topics proved challenging to categorize due to their inherent complexity or vagueness.For instance, topics characterized by keywords like "artificialartificial intelligence -intelligence -won't believe -dangerous Kansas", "chatbot -ai chatbot -chatbots -ai -ai chatbots", "language -model -language modellarge language -transformer", "chat gpt -gpt -chat -gpt chat -ai chat", "know -need know -ask -question -need" were less straightforward.These topics, often driven by short or overly complex comments, were labeled as outliers and removed from our analysis.We were skeptical of such broad topics, to be more specific, a topic such as Q&A often revolves around specific subjects, and if the topic of the question and answer isn't detected, then these comments require further processing, especially if they contain screenshots of the conversations, making them outliers in our analysis.We recognize that this approach entails a certain degree of subjectivity and could potentially eliminate some relevant information.
Despite the superior performance of BERTopic over many conventional topic modeling techniques and the process of manual review and adjustment we used, we must acknowledge the limitations of the process.These limitations arise from the inherent approximations made by the model and the unavoidable subjectivity in human judgment during the labeling process.Our endeavor involved the modeling and manual review of around 1.8 million comments.Regardless of these challenges, we strived to provide a coherent set of topics that offer meaningful insights into our data.

Sentiment Tones for Topics
In our study to understand the sentiment tone associated with topics across different social media platforms, we combined results from the topic modeling section with analysis from the GDELT dataset.The GDELT Project is a global database that tracks news and provides sentiment data on these articles among other features.We focused on news articles published across the same timeframe.We then selected posts across all platforms that contained a URL matching an entry in the GDELT dataset.GDELT's sentiment analysis is based on two main metrics: the Positive Score -representing the percentage of words in an article with a positive emotional connotation, which ranges from 0 to +100 -and the Negative Score -indicating the percentage of words with a negative connotation, also ranging from 0 to +100.The overall sentiment tone is calculated by subtracting the Negative Score from the Positive Score, producing a range from -100 (very negative) to +100 (very positive), with 0 being neutral. 4After merging the topic modeling dataset with GDELT, we identified 31,208 URLs.Each of these URLs is linked to a post or comment referencing a specific news article with an associated topic.Table 3 displays the number of URLs for each topic and provides statistics for the sentiment tone distribution, including minimum, maximum, 25 th percentile, 75 th percentile, and mean values.

Logistic Function
We apply the logistic function (commonly known as s-curve) to model the growth trajectory of users engaged with the ChatGPT and COVID-19 discussions.In this model that was originally devised to model the population growth ( 14), the initial stage of growth is approximately exponential; then, as saturation begins, the growth slows to linear, and at maturity, growth stops.The role of logistic function has been emphasized in new product adaptation (53), transport infrastructures' evolution (28), and interplay between technological revolutions and financial capital (47).Recently, this function has been integrated into online social network analysis (58), and adapted to analyze the user's engagement dynamics across different topics (22).To fit the data, we used a logistic function with the following formula: where α is the slope and corresponds to the user growth rate, while β is the point at which the function attains a value of 0.5, indicating when half of the overall unique users have engaged with the subject.The value of α quantifies how fast the number of users is growing.A higher α means that the user numbers are growing at a faster rate, while lower values indicate a less pronounced growth.The value of β measures how long it takes for half of the total unique users to engage.A lower β means that it takes less time to reach half of the total unique users, implying quicker engagement.Finally, we utilized the speed index function (22), which measures the normalized area under the curve and is defined as follows: This index captures how fast the function arrives at its peak, spanning from 0 to 1.It can be used to compare how quickly user engagement dynamics stabilize across different platforms and topics.If the SI value is high, it means the discussion topic saturates quickly among users.In other words, the topic reaches its peak popularity rapidly and then doesn't grow much after that.Conversely, a low SI indicates a slower yet constant growth, when users keep joining the conversation for a longer time.

Facebook Spam Content
Upon analyzing data related to ChatGPT discussions on Facebook, we observed some sharp strange dents in the cumulative number of posts and cumulative number of users.We then checked the dataset and found out there are many posts with similar styles publishing explicit or totally unrelated content in mass numbers that was inevitable to ignore.Our investigation resulted in flagging posts that used either of the following strings in their content: "#reels #chatgpt", "Video Funny Amazing #fyp #viral", and "#reeel #cr7 #chatgpt".We then proceed with plotting these flagged posts alongside the normal ones as shown in Figure 5.We observed orders of magnitude differences in the number of published posts per day and received interactions for these spam content, indicating that our efforts in removing these spam content were successful.

Figure 1 :
Figure 1: Cumulative number of unique posts about ChatGPT discussion across various platforms (a) and distribution of interaction volume versus the number of posts on different platforms (b).The nature of interactions varies among platforms; For instance, on Twitter, interactions are the sum of likes, quotes, retweets and replies, while on Instagram and YouTube, interactions are the sum of likes and comments.

bFigure 2 :
Figure 2: (a) Proportion of comments for each topic by platform.The cell color intensity corresponds to the proportion of comments discussing a given topic; a higher percentage results in a darker hue.(b) Box plots distributions of the sentiment tone across topics.On the x-axis, sentiment tones are represented as values.A negative value indicates a negative sentiment, while a positive value suggests the opposite.The further away from zero the value is, the stronger the sentiment.The vertical red dashed line at the 0 mark, differentiates positive tones from negative tones.Black diamonds inside the boxes indicate the average sentiment tone for each topic.

Figure 3 :
Figure 3: Cumulative number of unique users with logistic fits by platform.Each plot shows the cumulative count of unique users engaged in ChatGPT and COVID-19 vaccination-related topics over time.The fitted curve corresponds to a logistic function used to model the diffusion of unique users.

Figure 4 :
Figure 4: Cumulative number of unique posts about COVID vaccine discussion across various platforms (a) and distribution of interaction volume versus number of posts on different platforms (b).

Figure 5 :
Figure 5: Number of daily posts and interactions for discussions about Chat-GPT on Facebook.Content that we flagged as spam shows a totally different behavior.

Table 1 :
Logistic function fitting parameters for various datasets.The table reports the α (growth rate), β (point of half-saturation), and SI (Speed Index) values, which are derived from fitting the data for each platform into a logistic function.SI measures the normalized area under the curve.Root mean square error (RMSE) quantifies the model accuracy in fitting data.The smaller the value, the closer the predictions are to the observations.

Table 2 :
Data Breakdown • Topic Public Figures included comments centered on leading figures in tech and business.Notable figures discussed include Sam Altman, Elon Musk, Stephen Wolfram, Larry Page, Sergey Brin, Jordan Peterson, Yann LeCun, Lex Fridman, Marc Andreessen, and Bill Gates.• The Risks topic is segmented into two main areas: security concerns and accuracy & bias.Security concerns cover issues like data privacy, which pertains to the potential misuse of private user data; malicious activities, which involve the exploitation of the model for harmful purposes; and jailbreaking ChatGPT, which refers to unauthorized uses of the model.In accuracy & bias, posts address societal concerns like misinformation and inherent biases in the model, leading to issues such as culture wars, gender, political and religious biases.Additionally, it touches on broader societal impacts when users discuss contentious subjects such as veganism, climate change, and geopolitical tensions like the Russia-Ukraine conflict.• As the final topic, Technical Discussions covers the technical sides of AI and LLMs.Users shared insights on AI tutorials, discussed LLM training techniques, looked into open-source models, and explored different ways that AI can be integrated into various tasks such as bot development and speech processing.

Table 3 :
Statistical metrics of sentiment tone distribution for each topics