Introduction

The sudden emergence of COVID-19 at the end of 2019 caused people to rethink the importance of public health crisis management. This includes establishing emergency response organizations, improving epidemic awareness, evaluating the effectiveness of institutional arrangements, enhancing scientific decision-making and implementation, and addressing psychological crises during the pandemic (Nuzzo et al., 2020). However, one aspect is frequently overlooked: eliminating the “social stigma directed toward the source of the outbreak.”

Social stigma, prejudice, and punitive remarks can destroy goodwill and may diminish the willingness to maintain transparency in epidemic regions (Smallman, 2015; WHO, 2020; Chapelan, 2021). History has recorded such hostility countless times. The pandemic that swept through Europe from 1918 to 1919 became known as the Spanish flu; the prion disease affecting cattle that emerged in the 1980s is widely referred to as British mad cow disease; and Ebola, which struck much of West Africa from 2014 to 2015, is named after a river in the Democratic Republic of the Congo. Other examples include Mexican influenza (H1N1 influenza) and the Zika virus, named after the Zika Forest in Uganda, where it was first discovered. Stigmatizing the source of a disease outbreak often undermines disease control, incites hostility toward infected people or regions, and leads to local protectionism fueled by fear of the disease (Abdelhafiz & Alorabi, 2020; UNICEF, 2020; UNICEF Sudan, 2021; Rewerska-Juśko & Rejdak, 2022; Yousefi et al., 2022). Social stigma can therefore greatly hinder efforts to fight pandemics, which makes appropriate government intervention critical to alleviating or eliminating it.

The World Health Organization (WHO) states that “Governments, citizens, media, key influencers, and communities have an important role to play in preventing and stopping stigma surrounding people from China and Asia in general” (UNICEF, 2020). This article focuses on official intervention. Using the Baidu Index, which is built on the behavioral data of hundreds of millions of internet users, it examines how the use of potentially discriminatory language such as “Wuhan pneumonia” in internet searches changed following official intervention, including the official naming of the disease. By analyzing the usage patterns of the keywords “Wuhan pneumonia” and “novel coronavirus pneumonia,” this study aims to shed light on the impact of official intervention on the use of discriminatory language in internet searches.

The analysis in this article has three main focuses. The first is to highlight the importance of government intervention in reducing social stigma. The second is to examine how searches using potentially discriminatory language changed over time, comparing search behavior before and after official intervention; this part also uses provincial panel data to analyze the socioeconomic factors that may limit the effect of official intervention on such searches. The third is to propose coping strategies for reducing social stigma directed toward the source of the outbreak.

Social stigma and official intervention

An individual can be stigmatized due to various factors, including race, culture, gender, intelligence, and health, leading others to perceive them as different (Goffman, 1963). Social stigma in the field of infectious disease prevention and control, as defined by the World Health Organization (WHO) and related United Nations organizations, refers to the labeling, stereotyping, or isolation of individuals affected by a disease or regions experiencing an outbreak. This stigma extends not only to those directly affected by the disease but also to individuals residing in areas with infectious disease outbreaks, resulting in discrimination and isolation (Abdelhafiz & Alorabi, 2020; WHO, 2020; UNICEF, 2020; CDC USA, 2021; UNICEF Sudan, 2021).

On 11 February 2020, the WHO officially named the disease caused by the novel coronavirus COVID-19, short for Coronavirus Disease 2019. WHO Director-General Tedros Adhanom Ghebreyesus (2020) emphasized that the name deliberately avoided any reference to a person, place, or animal associated with the virus in order to prevent stigmatization. In a tweet, he stated, “We now have a name for the disease caused by the novel coronavirus: COVID-19. Having a name matters to prevent the use of other names that can be inaccurate or stigmatizing” (Tedros, 2020). This tweet marked the first instance in which the significance of disease naming was conveyed to the global public, and for many people it was the first time they became aware of the issue of social stigma related to the naming of infectious diseases.

When society stigmatizes a disease, those who are infected are likely to be reluctant to confront the illness and may avoid seeking treatment to escape negative labeling (Abdelhafiz & Alorabi, 2020; Chew et al., 2021; UNICEF Sudan, 2021). These consequences resemble the stigma faced by African residents due to AIDS (Shilts, 2011). Sontag also points out that patients with certain diseases, such as cancer and AIDS, were deliberately isolated and attacked, leaving them feeling guilty. Because the public lacks understanding of the disease, societal symbols and myths can prevent patients from effectively confronting the illness, ultimately leading to higher mortality rates (Sontag, 1978; 1997). In China, the influence of ancient Chinese philosophies and attitudes contributes to regional prejudices and social stigma toward ethnic groups in southwest China (Zhang, 2005).

Beyond diseases such as AIDS and cancer, stigma associated with infectious diseases is widespread and poses a serious barrier to disease prevention and treatment. Once a disease name is established, it is very difficult to alter or remove. Historically, this has happened repeatedly, even in societies that place a high value on empathy, respect for others, and human rights. For instance, the term “Spanish flu” fueled irrational fears and stigma (Hoppe, 2018). In West Africa, Ebola survivors have faced long-term humiliation and stigmatization within their communities (Overholt et al., 2018). Furthermore, the term “Mexican influenza” (H1N1) influenced some Americans’ perceptions of Latinos in the United States, adversely affecting their social welfare (McCauley et al., 2013).

Following the outbreak of COVID-19 in early 2020, the use of the term “Wuhan pneumonia,” including in internet searches, associated the city of Wuhan with the disease. The unknown, and fear of the unknown, are key drivers of social stigma and discrimination (UNICEF, 2020; UNICEF Sudan, 2021). In addition, because officials and the public initially knew little about COVID-19, and because of characteristics such as its rapid spread and relatively high mortality, the disease quickly became a stigmatizing marker. As a result, individuals infected with the disease and residents of Wuhan had to deal not only with the physical suffering of the disease but also with societal pressure. This stigma then extended to groups associated with those afflicted by COVID-19, typically starting with those closest to the patients and then spreading to people in the same geographical area. Such stigma can lead to a reluctance to undergo COVID-19 testing, posing challenges for disease control authorities in their epidemic prevention efforts (Chew et al., 2021).

Furthermore, because the COVID-19 virus is highly contagious, any stigmatization of an area or an infected person that leads people to avoid testing hampers the efforts of health organizations in various countries to contain the virus’s spread (Althoff et al., 2021).

To prevent social stigma, the WHO (2015) has stated that newly identified human diseases should be given socially acceptable names that do not offend individuals, regions, or countries and that avoid reference to animals. Names chosen for convenience or drawn from existing stereotypes may inadvertently promote stigma and symbolic associations, fostering public alienation and hostility toward stigmatized groups and hindering the effectiveness of public health measures during an epidemic. The WHO has stated that government intervention is effective in reducing social stigma during pandemics (UNICEF, 2020). Scholars have also pointed out that official intervention helped reduce social stigma during the COVID-19 epidemic (Abdelhafiz & Alorabi, 2020; Huda et al., 2020). Moreover, both the British and Australian governments have made specific commitments and implemented practical policies to reduce social stigma during pandemics (UK Parliament, 2015; Watkins, 2020).

The initial COVID-19 outbreak escalated suddenly in China in early 2020. In January and February, cities were successively placed under lockdown, and neither the government nor ordinary people understood the nature of the virus. This situation matched the conditions under which, according to the WHO, disease-related social stigma arises (UNICEF, 2020; UNICEF Sudan, 2021). This article uses changes in the Baidu Index to compare people’s search behavior before and after the Chinese government’s intervention in order to understand the effect of official intervention.

Measurement of searches using potentially discriminatory language

The COVID-19 pandemic provided material for understanding the use of potentially discriminatory language among ordinary people in China. Internet search behavior is an important source of online big data (Liu and Xu, 2015). The Baidu Index reflects both users’ subjective preferences and the influence of the macro-social environment on online searching. This paper used the Baidu Index, a public big data analytics tool derived from the Baidu search engine and Baidu News. The index represents the query volume for specific keywords within a given area, reflecting how popular these terms are among users and users’ search preferences and tendencies at a specific moment (Liu and Liao, 2021).

In the early days of the COVID-19 outbreak in January 2020, given the limited understanding of the virus, the sudden lockdown of Wuhan followed by lockdowns in a growing number of cities and provinces, and the high death toll among infected individuals, fear of the disease was an understandable reaction when people searched for information about the virus online. According to the World Health Organization (WHO) and UNICEF (2021), this fear often extended to infected individuals and even to the city where the outbreak originated.

Discriminatory behavior concerning a disease’s origin can manifest itself in how the disease and its place of origin are named, discussed, and searched for. While search behavior cannot capture every behavioral trait, it is an integral part of how individuals respond to a disease and form attitudes toward it. In China especially, search behavior can drive trending topics; Weibo trending topics, for example, are ranked according to users’ search click-through rates. This dynamic can inadvertently fuel discrimination related to the place of origin, continuously reinforcing prejudiced perceptions of the disease and its geographical source. In the early days of the outbreak, demand for searches related to “Wuhan pneumonia” arose after Wuhan was locked down and the Chinese government officially disclosed the disease. As the WHO and UNICEF (2021) showed, this unprecedented lockdown of a major Chinese city had a significant psychological impact on the population, shaping biases about the disease and the city of Wuhan that surfaced in searches.

However, we recognize that a causal link between searching online and holding discriminatory attitudes cannot be deduced from internet search behavior alone. Consequently, the primary objective of this study, using the Baidu Index as the main research tool, is to investigate how search behavior using potentially socially discriminatory language manifested in the initial stages of the pandemic and how official intervention affected it.

This paper measured the tendency toward discrimination against the source of an outbreak as follows: internet users who used the term “Wuhan pneumonia” in Baidu searches were treated as displaying a potentially discriminatory tendency toward the source of the outbreak, whereas the neutral term “novel coronavirus pneumonia” does not carry the same tendency toward social stigma. To explore fluctuations in social stigmatization across time periods and to analyze how macro social-environmental factors influenced these changes, this paper selected the three-week period from January 19 to February 8 as the study period. Landmark events during this period were used as breakpoints dividing it into distinct periods, as presented in Table 1.

Table 1 Explanation of the measurement of potentially discriminatory language toward the source of the outbreak in internet searches.

Data sources and research methods

Search engine data: the Baidu Index

The Baidu search engine is the most influential internet search tool in China, handling more than seven billion searches per day on average (Meng and Zhao, 2019; Fang et al., 2021). This paper used the most recent version of the Baidu Index (Baidu Index, 2019), which is similar to Google Trends, to extract relevant search data. The Baidu Index is a data analysis platform built on the behavioral data of hundreds of millions of internet users, which makes it an important statistical tool for research on internet behavior. Its personal computer (PC) trends cover June 2006 to the present, and its mobile trends cover January 2011 to the present.

The Baidu search index used in this paper is based on the search volume of regular internet users on Baidu. It is calculated as a weighted sum of the search frequencies of each keyword in Baidu web searches.

The Baidu Index has compiled an information database reflecting internet users’ degree of interest in social issues, evolving trends, regional distribution, and search preferences across different periods (Meng and Zhao, 2019). Depending on the search source, it is divided into a PC search index and a mobile search index; the data used in this article combine the two. The Baidu Index also has ancillary features, such as audience demographics covering two social attributes, age distribution and gender ratio, and a media index that counts the media articles appearing most frequently following a search.

Building on this measure of social stigma toward the source of the outbreak, and because provinces differ markedly in population and number of internet users, this paper expressed searches using potentially discriminatory language toward the outbreak’s source in each region as a share: on each day, the search volume (PC and mobile) for “Wuhan pneumonia” was divided by the combined search volume for “Wuhan pneumonia” and “novel coronavirus pneumonia.” Table 2 presents descriptive statistics for this measure.

Table 2 Descriptive statistical analysis of potentially discriminatory language toward the source of the outbreak in internet searches.
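The share described above can be sketched in a few lines. This is a minimal illustration, assuming the two inputs are combined PC-plus-mobile Baidu Index values for the same area and day; the function name `discrimination_index` is hypothetical. With the peak volumes reported in the timeline analysis below (613,238 and 162,438), it reproduces the index of 0.7906.

```python
import math


def discrimination_index(wuhan_volume: float, ncp_volume: float) -> float:
    """Share of "Wuhan pneumonia" among searches for either keyword.

    wuhan_volume: Baidu Index (PC + mobile) for "Wuhan pneumonia"
    ncp_volume:   Baidu Index (PC + mobile) for "novel coronavirus pneumonia"
    Returns a value in [0, 1], or NaN when neither term was searched.
    """
    total = wuhan_volume + ncp_volume
    if total == 0:
        return math.nan  # undefined: no searches for either keyword
    return wuhan_volume / total


# Peak volumes reported in the timeline analysis.
print(round(discrimination_index(613_238, 162_438), 4))  # 0.7906
```

Dividing by the sum of the two keywords removes differences in population and number of internet users between provinces, which is the motivation given in the text. A value above 0.5 means “Wuhan pneumonia” was searched more often than the neutral term.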

Collection of data on macro-influencing factors

To examine the impact of macro-social environmental factors on discriminatory views toward the source of the disease, this paper collected recent economic and social development variables. Data on population density, GDP, CPI, the registered unemployment rate, and the proportion of fiscal expenditure were sourced from the National Bureau of Statistics’ 2018 statistics.

Land area data were drawn from the Ministry of Civil Affairs’ national administrative division information query platform. Internet penetration rate data were obtained from Wangsu’s 2018 China Internet Development Report and calculated by dividing the number of internet users in each province by the population of that province.

Temporal and regional analysis and media effects

This section examines changes in social stigma toward the source of the outbreak across time periods, focusing on its regional and temporal distribution and on the media’s role in promoting or curbing discriminatory views.

Regional and temporal analysis of potentially discriminatory language toward the source of the outbreak in internet searches

This paper examined the evolution of social stigma toward the source of the outbreak over a 21-day period. During this period, the index fell from a peak of 0.933 to 0.179 (dotted line in Fig. 1). Combining this with the Baidu Index search trends for the keywords “Wuhan pneumonia” (solid black line in Fig. 1) and “novel coronavirus pneumonia” (solid gray line in Fig. 1), we can identify several key points:

Fig. 1: Search trends for potentially discriminatory language. Searches using potentially discriminatory language toward the source of the outbreak and keyword search trends on the Baidu Index.

1. After the lockdown of Wuhan on January 23, the search volume for “Wuhan pneumonia” peaked, with a value of 613,238, while the search volume for “novel coronavirus pneumonia” at the same point was 162,438, and the discrimination index was 0.7906.

2. Between January 29 and January 30, the search volume for “novel coronavirus pneumonia” overtook that for “Wuhan pneumonia,” and the discrimination index dropped from 0.7422 on January 29 to 0.2574 on January 30, suggesting that official intervention in the early days of the pandemic significantly affected people’s behavior.

3. The search volume for “novel coronavirus pneumonia” remained high after January 30. After the Joint Prevention and Control Mechanism of the State Council officially named the disease on February 8, searches for “novel coronavirus pneumonia” exceeded those for “Wuhan pneumonia” by a large margin, and the discrimination index continued to decline.
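Because the index is the share of “Wuhan pneumonia” among the two keywords, the day on which “novel coronavirus pneumonia” overtakes it is simply the first day the index falls below 0.5. The sketch below, with hypothetical helper names, finds that crossover and the largest decline between consecutive observed days. The example uses only the two index values quoted for January 29 and 30; a full analysis would pass all 21 daily values.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def first_day_below_half(index_by_day: Dict[date, float]) -> Optional[date]:
    """First day the index drops below 0.5, i.e. the neutral term
    "novel coronavirus pneumonia" is searched more than "Wuhan pneumonia"."""
    for day in sorted(index_by_day):
        if index_by_day[day] < 0.5:
            return day
    return None


def largest_drop(index_by_day: Dict[date, float]) -> Tuple[date, date, float]:
    """Largest decline between consecutive observed days (earlier minus later)."""
    days = sorted(index_by_day)
    if len(days) < 2:
        raise ValueError("need at least two observed days")
    start, end = max(
        zip(days, days[1:]),
        key=lambda pair: index_by_day[pair[0]] - index_by_day[pair[1]],
    )
    return start, end, index_by_day[start] - index_by_day[end]


# Index values quoted in the text; other days are omitted here.
reported = {date(2020, 1, 29): 0.7422, date(2020, 1, 30): 0.2574}
print(first_day_below_half(reported))  # 2020-01-30
start, end, drop = largest_drop(reported)
print(start, end, round(drop, 4))  # 2020-01-29 2020-01-30 0.4848
```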

The timeline analysis shows that the sudden shift from January 28 to January 29 reflected the significant effect of the government’s initial intervention on behavioral choices. We found that while official guidance to the media played a major role during this period, the shift in views was also affected by macro-social factors in each region.

We observed regional differences in searches using potentially discriminatory language toward the source of the outbreak at different time points, but also some common patterns across the study period (see Table 3 and Fig. 2):

Table 3 Regional ranking of searches using potentially discriminatory language toward the source of the outbreak in different periods.
Fig. 2: Searches using potentially discriminatory language toward the source of the outbreak in different provinces and periods.

1. Shanghai consistently ranked in the top five in searches using potentially discriminatory language: It was fourth in the initial stage, third before official guidance was issued, fourth after official guidance was issued, and fifth following the official naming of the disease. It was also fourth in the 3-week average.

2. Guangdong and Beijing also ranked highly in the initial stage, but unlike Shanghai, they showed a significant fall after the official guidance was issued and were no longer in the top ten when the disease was named.

3. Qinghai and Tibet showed the opposite trend to Guangdong and Beijing: searches using potentially discriminatory language there actually increased after the official guidance was issued. In other words, the significant decline observed in other provinces did not occur in these western inland regions.

The analysis also points to two aspects of these regional differences:

First, it is often assumed that those in positions of dominance, whether economic, political, or cultural, are more likely to succumb to regional social stigma (Connerton, 1989; Halbwachs, 1992; Harvey and Bourhis, 2012; Kuppens et al., 2018). This paper’s analysis of regional differences confirms that this phenomenon exists, albeit with variations. In Guangdong and Beijing, for example, searches using potentially discriminatory language were prominent in the initial stage of the outbreak but declined sharply once official guidance was issued. The assumption that those in dominant positions succumb to regional social stigma may therefore be only partially true. Search behavior can follow different trajectories over time, closely related to factors such as the local internet penetration rate and exposure to official news media. Although those in a dominant position are the most prone to using potentially discriminatory language when searching online, they also have the greatest potential for change.

Second, the opposite pattern occurred in Qinghai and Tibet, which we attribute to a combination of limited exposure to official news media and low internet penetration, resulting in a sense of political alienation. As a result, the official guidance failed to significantly reduce discrimination toward the source of the outbreak in these regions.

Media effects on searches using potentially discriminatory language toward the source of the outbreak

Internet searches conducted by the general public reflect their personal inclinations. However, the news content they encounter often amplifies this search behavior, potentially deepening their perceptions and reinforcing their viewpoints, thereby creating a cumulative effect. This study therefore analyzed the media landscape from 23 January to 8 February 2020 using the most frequently accessed information from these searches, with supplementary material from the Baidu Media Index.

Tables 4 and 5 present the top three news stories and their search index at several key time points, based on the Baidu Media Index. The types of media outlets were as follows. Of the fifteen news articles with “Wuhan pneumonia” in the headline, those from portal websites, including Sina, NetEase, Sohu, and Tencent, accounted for 53%. Only the articles from BanYueTan and the Beijing News, outlets sponsored by the government or under the direct control of the Central Committee of the Communist Party of China (CPC), accounted for 13.3%.

Table 4 The most frequently accessed information from searches for “Wuhan pneumonia”.
Table 5 The most frequently accessed information from searches for “novel coronavirus pneumonia”.

After February 8, no newspaper or publication under the direct control of the CPC Central Committee used the term “Wuhan pneumonia” in a headline. Of the 18 news articles with “novel coronavirus pneumonia” in the main title, 10 came from party newspapers and publications directly under the CPC Central Committee or from media sponsored by the Chinese government, including five articles from CCTV News, one from People.cn, and one from the central government; there was also one article on the internet, one on China Economic Net, one on Huanqiu.com, and one on Beijing News. These accounted for 55.6% of the total. Among these headlines, the term “novel coronavirus pneumonia” was used most often by the National Health Commission, appearing five times.

The headlines and statistics of these reports suggest that China’s central media (including CPC and government media) handled discourse related to local social stigma more cautiously, both in the initial stage of the outbreak and later on. Web portals, by contrast, prioritized the speed of information dissemination and catchy headlines, and were correspondingly less cautious.

These results show how the government and official media intervened to limit the use of potentially socially discriminatory language. The intervention appears to have made the public aware that such language might exacerbate social stigma. Once the government confirmed the “correct” terminology for COVID-19, people’s search behavior changed significantly.

Regression analysis of provincial data

The preceding section found regional differences in the effect of official intervention. Regression analysis is needed to determine whether the effect of official intervention on the use of discriminatory language was influenced by regional or socioeconomic factors.

The preceding analysis measured searches using potentially discriminatory language toward the source of the outbreak in four periods; these measures serve as the dependent variable. Previous research offers some guidance on how to analyze the effect of socioeconomic factors on this variable.

Demographic and geographical factors, economic development, and media orientation are likely to be the most important variables. Some scholars have proposed that the fundamental drivers of regional social stigma are differences in the natural environment, cultural and environmental factors, and individual environmental perception (Chen and Xie, 2017; Ran et al., 2021). Others have emphasized the guiding role of mass media, even above socioeconomic factors (Cattaneo, 2014). This section therefore combines these variables to analyze the sources of discrimination against people in different regions, running regressions on provincial cross-sectional data for the four periods to investigate the effect of official intervention on the use of potentially discriminatory language in searches.

Definition of variables

This paper used both cross-sectional and panel data. The dependent variable was the use of potentially discriminatory language toward the source of the outbreak in internet searches in each province in each of four periods: the initial stage of the outbreak, the period before official guidance on the use of discriminatory language was issued, the period after that guidance was issued, and the period following the official naming of the disease. The official intervention variables, Official guidance and the interaction Official guidance × Average number of official social media posts, enter only the final model (final column of Table 7).

The regional variables included the population and geographical factors of each province (land area, population density), the level of economic development (GDP, the proportion of fiscal expenditure), livelihood factors (CPI, registered unemployment rate), the education level (years of education per capita), and media effect factors (internet penetration rate, average number of official social media posts). Table 6 presents the descriptive statistical analysis of variables.

Table 6 Descriptive statistical analysis of variables.

Regression findings of provincial data

For a more effective analysis, this paper first ran regressions on the cross-sectional data on searches using potentially discriminatory language toward the source of the outbreak for each of the four periods.

The advantage of cross-sectional data is that they form a one-dimensional data set of different regions (objects) at a single point in time. This approach highlights spatial differences in a social phenomenon at a specific moment and shows the uniqueness of each sample. This paper analyzed the reasons for the different patterns of social stigma across 30 provinces within each time period.

The results in Table 7 indicate that the proposed model has sufficient explanatory power in the initial stage of the outbreak and in the period before official guidance was issued, with an adjusted R-squared reaching 0.90. During this time, many factors, including population, geography, economic development, livelihood, education, and media effects, had a significant impact on social stigma.

Table 7 Regression results of factors influencing searches using potentially discriminatory language toward the source of the outbreak.

In the initial stage of the outbreak, population density, the level of economic development, and per capita years of education had positive effects on discriminatory search behavior; that is, higher values of these factors were associated with more such searches in each province. Livelihood factors, such as the consumer price index (CPI) and the registered unemployment rate, and media-effect factors, such as the internet penetration rate and the average number of official social media posts, were associated with fewer searches using potentially discriminatory language in each province.

The effect of GDP on searches using potentially discriminatory language in each province was linked to the local internet penetration rate, indicating that internet penetration strengthens the effect of low economic development on social stigma. The F-test for the regression in this period yielded a P-value of 0.000, indicating that the model is statistically significant overall. The heteroscedasticity test yielded a P-value of 0.6415, well above 0.05, so there is no evidence of heteroscedasticity. This model is therefore well suited to analyzing the factors influencing the public’s social stigma toward the source of the outbreak before official guidance was issued. However, its explanatory power was poor in the periods after the official guidance was issued and after the disease was named.

We infer that, following the official guidance, searches using potentially discriminatory language declined rapidly and the structure of the data changed, rendering the original model inapplicable. By the time the official name was established, the P-value of the model’s F-test was 0.3282, so the regressors were no longer jointly significant.
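With only 30 provinces in each cross-section, the adjusted R-squared and the overall F-test both depend heavily on how many regressors the model uses. The sketch below applies the standard OLS formulas (model with an intercept) linking both statistics to R-squared, the number of observations n, and the number of slope coefficients k. The values k = 10 and R-squared = 0.93 in the example are hypothetical inputs, not the paper’s estimates. Turning the F statistic into a P-value, as in the tests reported above, additionally requires the F distribution with k and n − k − 1 degrees of freedom (for example from a statistics library), which is omitted here to keep the sketch dependency-free.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k slope coefficients."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2: float, n: int, k: int) -> float:
    """F statistic for H0: all k slope coefficients are zero,
    with k and n - k - 1 degrees of freedom."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical inputs for illustration: 30 provinces, 10 regressors.
n, k, r2 = 30, 10, 0.93
print(round(adjusted_r2(r2, n, k), 3))  # 0.893
print(round(overall_f(r2, n, k), 2))  # 25.24
```

The penalty term (n − 1)/(n − k − 1) grows quickly as k approaches n, which is why a high adjusted R-squared with 30 observations indicates a genuinely strong fit.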

To better explain the model’s failure in the second half of the timeline, this paper calculated, for each province, the decline in searches using potentially discriminatory language toward the source of the outbreak between adjacent periods among the four periods above, and combined these declines with the independent variables of the original model to form panel data. The analysis of these data focused on whether official directives led to a rapid decline in potentially discriminatory search behavior toward the source of the outbreak and which factors worked together with the official intervention.
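The restructuring described above can be sketched as follows, assuming each province has one index value per period (four values in chronological order) and that the provincial covariates do not change over the study period. The function names, field names, and example values are hypothetical. With 30 provinces and four periods, this layout would give 30 × 3 = 90 province-by-transition observations.

```python
from typing import Any, Dict, List


def declines_between_periods(values: List[float]) -> List[float]:
    """Decline between adjacent periods (earlier minus later); positive = reduction."""
    return [earlier - later for earlier, later in zip(values, values[1:])]


def build_panel(
    index_by_period: Dict[str, List[float]],
    covariates: Dict[str, Dict[str, float]],
) -> List[Dict[str, Any]]:
    """Long-format panel with one row per province and pair of adjacent periods.

    index_by_period maps a province to its index values in chronological order;
    covariates maps a province to its time-invariant explanatory variables.
    """
    rows: List[Dict[str, Any]] = []
    for province, values in index_by_period.items():
        for step, decline in enumerate(declines_between_periods(values), start=1):
            row: Dict[str, Any] = {"province": province, "step": step, "decline": decline}
            row.update(covariates[province])  # attach the province's regressors
            rows.append(row)
    return rows


# Hypothetical values for a single province, for illustration only.
panel = build_panel(
    {"Province A": [0.80, 0.75, 0.30, 0.20]},
    {"Province A": {"internet_penetration": 0.6, "avg_official_posts": 12.0}},
)
for row in panel:
    print(row["step"], round(row["decline"], 2))
```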

To choose among the fixed-effects, random-effects, and mixed (pooled) regression models for the panel data, this paper applied the Hausman test, the LM test, and the F-test. Based on the preceding discussion, this paper concluded that the key factor was official intervention in guiding public opinion. Therefore, January 29, the day the official intervention began, was set as the boundary, and a dummy variable capturing the effect of official intervention was created to construct a more suitable model.
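The dummy and the interaction term listed among the official intervention variables (Official guidance × Average number of official social media posts) might be coded as below. This is a sketch under stated assumptions: the dummy is assigned by date, January 29 itself counts as “after,” and the names and the example post count are hypothetical. How each between-period transition in the panel is mapped to a date is not specified in the text.

```python
from datetime import date
from typing import Dict

# Boundary from the text; treating the day itself as "after" is an assumption.
GUIDANCE_START = date(2020, 1, 29)


def official_guidance(day: date) -> int:
    """Dummy variable: 1 on or after the start of official intervention, else 0."""
    return int(day >= GUIDANCE_START)


def intervention_terms(day: date, avg_official_posts: float) -> Dict[str, float]:
    """Official guidance dummy plus its interaction with a province's
    average number of official social media posts."""
    guidance = official_guidance(day)
    return {
        "official_guidance": guidance,
        "guidance_x_posts": guidance * avg_official_posts,
    }


# Hypothetical province averaging 12.5 official posts.
print(intervention_terms(date(2020, 1, 28), 12.5))  # {'official_guidance': 0, 'guidance_x_posts': 0.0}
print(intervention_terms(date(2020, 2, 8), 12.5))  # {'official_guidance': 1, 'guidance_x_posts': 12.5}
```

Before the boundary the interaction is zero regardless of how many posts a province published, so its coefficient captures how much official posting adds only once guidance is in force.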

The final column of Table 7 shows that the official intervention had a positive and significant effect on reducing social stigma toward the source of the outbreak. After the government began issuing guidance, the number of official social media posts in each province also contributed substantially to the decline in potentially discriminatory search behavior.

Socioeconomic factors, such as income and education, had significant positive effects on searches using potentially discriminatory language during the initial stage of the outbreak, before official guidance on the use of discriminatory language was released. After that guidance was released, however, people with higher incomes and more education were no longer more likely to use potentially discriminatory language when searching online. In China, official propaganda has considerable influence on people's cognition (Wu et al., 2021; Piao & Wu, 2023). Therefore, once the official guidance was released, these groups may quickly have learned from official propaganda which language to avoid and adjusted their search behavior accordingly.

Overall, the panel model explained about 50% of the variation in the reduction of social stigma toward the source of the outbreak over this short period, and the P-value of the F-test was 0.0000, indicating that the explanatory variables are jointly significant.
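The link between the share of variation explained and the overall F-test can be illustrated with the textbook formula for a regression with an intercept. This sketch assumes a pooled OLS-style regression; panel estimators such as fixed effects adjust the degrees of freedom differently. Only R² = 0.5 comes from the text; the sample size and number of regressors below are hypothetical, and the function name is illustrative.

```python
def overall_f_statistic(r_squared: float, n_obs: int, n_regressors: int) -> float:
    """Overall F-statistic for H0: all slope coefficients are zero.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with k slope coefficients
    (excluding the intercept) and n observations. The p-value is the upper
    tail of F(k, n - k - 1), e.g. scipy.stats.f.sf(F, k, n - k - 1) if SciPy
    is available (not used here, to keep the sketch dependency-free).
    """
    df_resid = n_obs - n_regressors - 1
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    if n_regressors < 1 or df_resid < 1:
        raise ValueError("need at least one regressor and one residual degree of freedom")
    return (r_squared / n_regressors) / ((1.0 - r_squared) / df_resid)


# Hypothetical sample size and regressor count; only R^2 = 0.5 comes from the text.
f_stat = overall_f_statistic(0.5, n_obs=93, n_regressors=6)
print(round(f_stat, 3))
```

With R² = 0.5, the statistic reduces to the ratio of residual degrees of freedom to the number of regressors, which is why even a moderate R² can produce a very small P-value when the sample is reasonably large.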

Discussion

The analysis in this paper highlights several key points that may serve as an important reference for understanding how discriminatory tendencies toward the source of an outbreak emerge and evolve.

With respect to how social stigma changes over time, official intervention influences discriminatory tendencies toward the source of an outbreak, especially in today's internet age. Whether in the early stages of an outbreak or later, the government can help increase public awareness of potentially discriminatory language and reduce the number of searches using such language. Meanwhile, a region with high internet penetration and a large number of official social media posts can suppress discriminatory tendencies toward the source of an outbreak in its early stages. During the period of state regulation of discriminatory public opinion, these two factors played complementary roles in reducing discriminatory tendencies toward the source of the outbreak. However, this effect did not hold in provinces with larger geographical areas, whereas areas with high population density showed the opposite pattern.

Official intervention also weakened the effects of socioeconomic factors. Among regional factors, economic development significantly increased people's discriminatory tendencies. Notably, the data also revealed that areas with high unemployment rates or high CPI had relatively low levels of discrimination. These initial-stage findings may be related to experimental findings in social psychology that highly educated and wealthier individuals are often more discriminatory in their attitudes (Harvey and Bourhis, 2012; Kuppens et al., 2018). Conversely, relatively disadvantaged people in society, who "can understand what it is like to be stigmatized and form a circle of interconnection" (Goffman, 1963), are more aware of the harm of discrimination. Having often been victims of discrimination themselves, they may be less inclined to discriminate. However, these effects were reversed after the official intervention, which demonstrates the importance of official attitudes and actions in preventing disease stigma.

Conclusion

In the face of the COVID-19 pandemic, using big data effectively for public opinion monitoring and behavior prediction, so as to provide data analysis and policy recommendations for crisis response, has become an important task for the academic community. This paper first reviewed the literature on the concept of discriminatory tendencies toward the source of an outbreak and its evolution, as well as the importance of government intervention in increasing public awareness of social stigma.

The study's findings highlight the effectiveness of the official intervention and of the official naming of the disease as "COVID-19." The sudden change observed between January 28 and January 29 shows that the official intervention significantly affected people's behavioral choices. Shanghai exhibited the highest discriminatory tendencies among the provincial-level units in the study. Guangdong and Beijing also ranked highly in the initial period, but their rankings fell sharply after official guidance was issued, and they dropped out of the top ten after the disease was officially named. In terms of official media effects, official party media were more cautious in handling discourse related to local social stigma, whereas portal websites prioritized speed of dissemination and headline attractiveness.

Although official public opinion guidance helped reduce discriminatory language in internet searches, the panel data analysis also revealed some limitations in its effectiveness. Factors such as the CPI, the registered unemployment rate, the internet penetration rate, and the number of official social media posts helped reduce potentially discriminatory searching behavior. The effect of GDP on discriminatory tendencies depended on each province's internet penetration rate. In provinces with low internet penetration, such as Qinghai and Tibet, residents may have had little exposure to official news media; after the official public opinion guidance was issued, potentially discriminatory searching behavior in these provinces increased slightly rather than subsiding.

Based on the regression analysis and big data description, this paper proposes the following suggestions to address social stigma toward the source of an outbreak:

First, official intervention is effective in reducing discriminatory language in internet searches. Our analysis showed that a larger number of official social media posts in a region significantly reduced potentially discriminatory searches. Therefore, when governments seek to diminish discriminatory tendencies, official guidance plays a crucial role, and issuing it is a necessary task for officials seeking to enhance the effectiveness of epidemic prevention.

Second, news media should adhere to the World Health Organization's guidelines on disease naming when referring to diseases, especially epidemic diseases, and avoid linking a disease to a specific geographical location, population, or animal. Failing to do so can lead to negative outcomes, including discrimination against people in the outbreak location and potential concealment of infections, which hinders epidemic prevention efforts.

Third, countries should ensure open and transparent communication of disease control information. By promoting knowledge of infectious diseases and ensuring transparency in epidemic prevention information, governments can help the general public overcome their fears and understand diseases more objectively.