Introduction

The COVID-19 pandemic has been spreading since early 2020. The corona virus is contagious and has mutated into multiple variants. The global community has suffered multiple waves of newly confirmed cases of the corona virus variants. Herd immunity against the corona virus builds on naturally infected groups and vaccinated groups. However, herd immunity for COVID-19 must protect against multiple variants, from Alpha through Omicron (Moghnieh et al. 2022). In particular, the absence of vaccines for the corona virus and its variants has exacerbated the pandemic worldwide. The variants have decreased the probability of achieving herd immunity.

The spread of corona virus variants has significantly increased public risk perception, leading people to avoid in-person activities (Dryhurst et al. 2020). Markets have responded to these changes in the socioeconomic landscape by rapidly adopting digital transformation, which in turn boosted online shopping platforms. The public has come to prefer online shopping to in-person shopping, particularly when the number of infected people increases (Grashuis et al. 2020; Li et al. 2020; Mouratidis and Papagiannakis, 2021; Pham et al. 2020). This shift in the public’s lifestyle provides an opportunity to understand the impact of the COVID-19 pandemic on socioeconomic change via big social monitoring data on online information-seeking activities.

The impact of the COVID-19 pandemic can be examined by comparing socioeconomic activities before and after the pandemic. However, the long-lasting crisis makes it difficult to investigate the time-varying impact of the pandemic. Few studies have considered temporal changes in the impact of COVID-19 across its multiple waves because of the cost of collecting relevant data and the time-consuming data preprocessing. Online social monitoring data, which are cost-efficient and available in real time, enable us to investigate the impact of the multiple waves of the corona virus variants and the relevant prevention policies on online socioeconomic activities. Recent studies have investigated changes in online activity patterns during the COVID-19 pandemic (Gu et al. 2021; Lampos et al. 2021; Nasser et al. 2021). However, the socioeconomic impact of the multiple waves of the corona virus variants remains unknown.

During the COVID-19 pandemic, online shopping patterns have been investigated in various ways. A previous study discussed how the pandemic offers a chance for old purchasing habits formed around in-person activities to die or be modified and for new habits to be created (Sheth, 2020). The new habits are likely to be influenced by socioeconomic constraints, such as public policy, technology, and changing demographics. Another study described this change in behavior patterns during the COVID-19 pandemic through the “react”, “cope”, and “adapt” phases of the Reacting Coping Adapt (RCA) framework (Kirk and Rifkin, 2020). In the “react” phase, the public changes its purchasing behavior based on pandemic risk perception as a social response to dynamic social distancing policies. In the “cope” phase, people start adopting new purchasing patterns based on the level of public policy. In the “adapt” phase, they establish and stabilize new purchasing patterns and become less reactive to the pandemic situation (Guthrie et al. 2021; Kirk and Rifkin, 2020). The RCA framework has been validated with online shopping patterns in France before, during, and after the COVID-19 pandemic (Guthrie et al. 2021). Applications of the RCA framework to other countries and social behaviors are still lacking.

Nowadays, Internet service providers monitor and record online search activities through data logging. They analyze these data to detect changes in users’ interests and to optimize their search algorithms so that the most relevant information is delivered in a timely manner. For example, increased online search activity about a specific shopping product hints at an emerging demand for that product, which is practical information for inventory and supply chain management.

Online social network data, such as Twitter data, have already been used to predict stock market price changes (Almehmadi, 2021). Online information search activity data, such as Google Trends, have been used to forecast the near-term values of economic indicators (Carrière‐Swallow and Labbé, 2013; Choi and Varian, 2012), private consumption (Vosen and Schmidt, 2011), and epidemics (Carneiro and Mylonakis, 2009; Teng et al. 2017). Recently, the utility of these data has been examined in investigating spatiotemporal changes in social responses to natural disasters, such as earthquakes and droughts (Gizzi et al. 2020; Kam et al. 2021; Kam et al. 2019; Kim et al. 2019). However, these big social monitoring data have been underutilized for investigating changes in socioeconomic activities during the multiple waves of the corona virus variants.

The NAVER Shopping website is the most popular online shopping platform among the citizens of the Republic of Korea, with online sales valued at about 2.7 billion KRW in the third quarter of 2021 (2.3 million USD) (https://www.wiseapp.co.kr/insight/detail/89). Online shopping activities on the NAVER Shopping website can therefore capture the major modes of online shopping activities of Koreans. For example, increased online search activity relating to a specific shopping product hints at an emerging demand among NAVER users for that product (Woo and Owen, 2019). Rumors about an emerging topic can affect the public’s social behavior patterns via social media (Alkhodair et al. 2020). However, the quality of social monitoring data determines the appropriate spatial scale of analysis, and careful design of data preprocessing is necessary for quality control (Wilcoxson et al. 2020). Recently, it has been found that public interest in nationwide natural disasters and global pandemics can reduce the impact of rumors on social media and online information-seeking activities because the rumors can be verified through the public’s direct and indirect experience of the disaster or pandemic (Park et al. 2022; Kam et al. 2021).

Recent studies found a relationship between decision making and consumer behavior patterns at the individual level during the COVID-19 pandemic (Birtus and Lăzăroiu, 2021; Smith and Machova, 2021; Vătămănescu et al. 2021). State-wise changes in sentiment have also been found in the public’s complaints about water pollution during the COVID-19 pandemic (Liu et al. 2023). However, the impact of the COVID-19 pandemic and the associated prevention policies on national-level social behavior patterns remains unknown. Online social monitoring data provide a unique opportunity to examine the relationship between decision making and consumer behavior in response to changes in the COVID-19 pandemic prevention policies.

This study aims to investigate the impact of the multi-year COVID-19 pandemic using the NAVER DataLab Shopping Insight (NDLSI) data provided by the NAVER Corporation. The data provide online search activity volumes for more than 1,800 shopping products at the national level, which can reveal emerging changes in the online purchasing activities of Koreans. The NAVER Corporation has operated its online search engine since 1999, and it is the most popular internet search engine platform in South Korea. It had 1.2 billion visits from August through October 2022, and 94% of these visits came from the Republic of Korea (https://www.similarweb.com/website/naver.com/#traffic). The NAVER Corporation has provided weekly online search activity volume data for 1,800 shopping items since 2017 via the NDLSI platform. Such big social monitoring data provide a unique research opportunity to examine the COVID-19 impact on the online shopping activities of Koreans within the RCA framework by answering the following questions:

  1. What are the major components of the dynamic patterns of online search activities before and after COVID-19?

  2. How have the social behavior patterns related to online shopping search activities changed along multiple waves of corona virus variants?

  3. Which prevention policies are key factors of the temporal changes of online shopping search activities during the COVID-19 pandemic?

To answer these questions, this study extracts the major modes of information-seeking behavior patterns relating to shopping products from the NDLSI data (2017–2021) via singular value decomposition (SVD)-based Principal Component Analysis (PCA). Furthermore, the RCA framework is validated against the major modes of the NDLSI data during the multiple waves of the COVID-19 pandemic. The PCA of the NDLSI data advances the current understanding of changes in e-commerce before and after the two-year-long COVID-19 pandemic.

Data and methods

NAVER DataLab Shopping Insight (NDLSI) data

The NDLSI data include the number of clicks on 1,837 shopping products on the NAVER Shopping platform. This study uses NDLSI data covering 214 weeks of online search activities for these 1,837 shopping products (July 31, 2017 through August 30, 2021). Weekly relative search activity volumes range from 0 to 100 (the number of clicks is normalized by the maximum number of clicks during the search period and multiplied by 100). The NDLSI data are classified at three levels: 11 categories at the first level, 204 categories at the second level, and 1,837 items at the third level (see Table S1 in Supplementary Material). These shopping product categories are provided by the NAVER Shopping platform and are based on the merchant category codes (MCCs) that a credit card issuer uses to categorize the transactions that consumers complete with a particular card. MCCs classify merchants and businesses by the type of goods or services they provide in order to keep track of transactions. Recently, changes in credit/debit card spending in the MCCs have been analyzed during the COVID-19 pandemic (Darougheh, 2021; Dunphy et al. 2022). The first-level categories include Fashion clothing, Fashion Miscellaneous Goods, Cosmetics/Beauty, Digital/Home Appliance, Furniture/Interior, Childbirth/Childcare, Food, Sports/Leisure, Life/Health, Leisure/Life convenience, and Duty-free shops. The category and product names are provided in Korean. In this study, they were translated into English using Google Translate.
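The 0 to 100 scaling described above can be illustrated with a minimal sketch. The function name `relative_volume` and the click counts are hypothetical and used only for illustration; the actual normalization is performed by the NDLSI platform, and its exact implementation is not described in this study.

```python
def relative_volume(clicks):
    """Scale weekly click counts to 0-100 relative to the period maximum."""
    peak = max(clicks)
    if peak == 0:
        # No clicks at all in the period: every week maps to zero.
        return [0.0 for _ in clicks]
    return [100.0 * count / peak for count in clicks]


# Hypothetical weekly click counts for one shopping product.
weekly_clicks = [120, 300, 450, 90]
scaled = relative_volume(weekly_clicks)
print(scaled)
```

Because each product is scaled by its own maximum, the resulting series describe relative changes within a product rather than absolute volumes across products.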

Six COVID-19 metrics

This study uses six COVID-19 metrics from the Center for Systems Science and Engineering at Johns Hopkins University (JHU CSSE) COVID-19 dataset (Dong et al. 2020): new confirmed cases, stringency index, residential index, vaccination index, new death cases, and fatality. New confirmed and death cases are the numbers of the corresponding cases in Korea over the study period. The stringency index is estimated from nine metrics: school closures, workplace closures, cancellation of public events, restrictions on public gatherings, closures of public transport, stay-at-home requirements, public information campaigns, restrictions on internal movements, and international travel controls. The stringency index quantifies the strictness of the government prevention policies (Dong et al. 2020). Its value ranges from 0 (lowest stringency) to 100 (highest stringency); higher values represent stricter prevention policies. The residential index shows the number of people who spend more time at home after the COVID-19 pandemic than before. The vaccination index is a partial vaccination index that represents the percentage of people who have been vaccinated at least once. The fatality index is the ratio of the number of new death cases to the number of new confirmed cases. Although these six metrics are available daily, this study computes and uses the weekly sums of new confirmed cases and the weekly averages of the other five COVID-19 metrics, which matches the weekly temporal scale of the NDLSI data analysis. The Korea Meteorological Administration (KMA) provides the historical meteorological data of the Republic of Korea through the Open MET Data Portal platform (https://data.kma.go.kr/cmmn/main.do). In this study, weekly temperature averages over the 95 stations in the Republic of Korea are computed to extract the seasonality of the regional climate system.
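The weekly aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the row layout `(day, new_cases, stringency)`, the Monday-based week boundary, the function names, and the sample values are all hypothetical and are not taken from the study's actual processing code.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week containing `day` (assumed week boundary)."""
    return day - timedelta(days=day.weekday())


def weekly_metrics(daily_rows):
    """Aggregate (day, new_cases, stringency) rows into weekly values.

    New confirmed cases are summed per week; the stringency index is averaged.
    """
    case_sums = defaultdict(int)
    index_values = defaultdict(list)
    for day, new_cases, stringency in daily_rows:
        key = week_start(day)
        case_sums[key] += new_cases
        index_values[key].append(stringency)
    return {
        key: (case_sums[key], sum(values) / len(values))
        for key, values in sorted(index_values.items())
    }


# Hypothetical daily records for two weeks starting Monday, March 2, 2020.
first_day = date(2020, 3, 2)
daily_rows = [
    (first_day + timedelta(days=i), 10 if i < 7 else 20, 50.0 if i < 7 else 60.0)
    for i in range(14)
]
weekly = weekly_metrics(daily_rows)
for monday, (cases, stringency) in weekly.items():
    print(monday, cases, stringency)
```

The other averaged metrics (residential index, vaccination index, new death cases, and fatality) would follow the same averaging path as the stringency index in this sketch.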

Singular value decomposition (SVD)-based principal component analysis

In the machine learning field, principal component analysis (PCA) is a popular unsupervised learning method. PCA is known as a data compression technique that extracts key features of high-dimensional data. The singular value decomposition (SVD) algorithm can be used to extract the major PCA modes (Vosen and Schmidt, 2011; Wilks, 2011). The SVD-based PCA decomposes a covariance matrix A into three matrices; for an A matrix of dimension m × n (n < m; Eq. 1), these are the U matrix (an m by m matrix), the Σ matrix (an m by n diagonal matrix), and the V transpose matrix (an n by n matrix). The diagonal entries of the Σ matrix have a one-to-one correspondence with the columns of the U matrix. The columns of the U matrix are the orthogonal eigenvectors, which are known as the principal components (PCs).

$$\mathrm{A} = \mathrm{U}\Sigma\mathrm{V}^{\mathrm{T}} \tag{1}$$

In this study, the SVD algorithm is applied to the covariance matrix of the NDLSI data over five different periods: the period before the COVID-19 pandemic (July 31, 2017–December 31, 2019), Wave 1 (July 31, 2017–May 25, 2020), Wave 2 (July 31, 2017–October 19, 2020), Wave 3 (July 31, 2017–March 1, 2021), and Wave 4 (July 31, 2017–August 31, 2021) of the corona virus variants. Here, the waves are defined based on the surges in new confirmed cases. To explore shopping products with increasing or decreasing public interest during each wave of the COVID-19 pandemic, the SVD analysis period for the wave of interest extends up to the emergence of the next wave and thus includes the analysis period of the previous wave. This overlap enables us to compare the impact of the wave of interest on the public interest in shopping products with that of the previous wave.

Two major modes are found before the COVID-19 pandemic: an increasing linear trend and a seasonality pattern in the online search activities. Before applying the SVD-based PCA to the NDLSI data, the linear trend and the seasonality are removed from the NDLSI matrices over the periods of Waves 1, 2, 3, and 4. The detrended NDLSI data over the different periods enable us to investigate changes in online search activities relating to shopping products over the multiple waves of COVID-19. Missing (not available) values in the NDLSI data were replaced with zeros. Because the covariance matrix is symmetric, the U and V matrices contain the same eigenvectors of the covariance matrix, and the Σ matrix contains the eigenvalues. The diagonal values of the Σ matrix quantify the contribution of the corresponding eigenvector to the total variance of the covariance matrix.
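The preprocessing and decomposition steps described in this section can be sketched with NumPy as follows. This is a minimal illustration on synthetic data, not the study's code. Several choices are assumptions made for the sketch: the matrix is arranged as weeks by items, the covariance is taken between weeks so that the leading column of U is a weekly time series, and only the linear trend is removed (the seasonality removal is omitted because its exact procedure is not specified here).

```python
import numpy as np

rng = np.random.default_rng(0)
n_weeks, n_items = 60, 25
t = np.arange(n_weeks)

# Synthetic stand-in for the NDLSI matrix (weeks x items): a shared
# wave-like bump, an item-specific linear trend, and noise.
shared = np.exp(-0.5 * ((t - 40) / 4.0) ** 2)
loadings = rng.uniform(0.5, 1.5, n_items)
data = (
    30.0 * np.outer(shared, loadings)
    + np.outer(t, rng.uniform(0.0, 0.3, n_items))
    + rng.normal(0.0, 1.0, (n_weeks, n_items))
)
data[3, 5] = np.nan  # one missing value, for illustration

# Replace missing values with zeros, as described in the text.
data = np.nan_to_num(data, nan=0.0)

# Remove a least-squares linear trend from each item's time series.
slope, intercept = np.polyfit(t, data, deg=1)  # each has shape (n_items,)
detrended = data - (np.outer(t, slope) + intercept)

# Covariance between weeks (assumed orientation): rows are variables.
cov = np.cov(detrended)  # shape (n_weeks, n_weeks)

# SVD of the symmetric covariance matrix: the singular values are its
# eigenvalues, and the columns of U are the eigenvectors.
U, s, Vt = np.linalg.svd(cov)
explained = s / s.sum()  # fraction of total variance per mode
pc1 = U[:, 0]            # leading mode as a weekly time series

print(f"PC1 explains {explained[0]:.1%} of the total variance")
```

In this synthetic setting the leading mode recovers the shared bump, up to an arbitrary sign and scale, which mirrors how PC1 is compared with the time series of new confirmed cases in the Results.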

Spearman’s rank correlation

Spearman’s rank correlation is a non-parametric metric that measures the relationship between two variables based on their ranks (Spearman, 1904). This study uses Spearman’s rank correlation because the online search activities of most items in the NDLSI data are not normally distributed. Spearman’s rank correlation coefficient ranges from −1 (perfect negative relation between two variables) to +1 (perfect positive relation). In this study, Spearman’s rank correlation is used to trace users’ interest in shopping products that are associated with the waves of corona virus variants. Furthermore, Spearman’s rank correlation is computed between the COVID-related metrics and the NDLSI data to examine which socio-economic factors are associated with e-commerce search activities. First, Spearman’s rank correlation coefficients are computed between the first principal component (PC1; one time series) and the search activities of the more than 1,800 shopping products (>1,800 time series) during the periods of Wave 1 through 4. Then, the distribution of Spearman’s rank correlation coefficients is constructed by the kernel density estimate (KDE) method using the JoyPy Python package (https://github.com/leotac/joypy). The number of shopping products with a high coefficient increased from Wave 1 through 4 (Fig. S1). In this study, a Spearman’s coefficient of 0.45 is used as the threshold value, which detects up to 20% of the items as associated with the PC1 mode during the COVID-19 pandemic.
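The rank-based correlation and the 0.45 threshold selection described above can be sketched in plain Python. The helper names, the short `pc1` series, and the three item series are hypothetical and chosen only for illustration; ties are handled with average ranks, which is the usual convention but an assumption about the study's implementation.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical PC1 series and weekly search volumes for three items.
pc1 = [0.1, 0.4, 0.3, 0.9, 0.7]
items = {
    "item_a": [5, 20, 15, 80, 60],   # moves with PC1
    "item_b": [50, 40, 45, 10, 20],  # moves against PC1
    "item_c": [10, 10, 30, 20, 40],  # weakly related, with one tie
}
THRESHOLD = 0.45  # threshold value stated in the text

coefficients = {name: spearman(pc1, series) for name, series in items.items()}
selected = [name for name, rho in coefficients.items() if rho >= THRESHOLD]
print(coefficients)
print(selected)
```

Only items at or above the threshold are treated as associated with the PC1 mode in this sketch; negatively correlated items are excluded.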

Quantile-Quantile plot (QQ plot)

The number of PC-associated shopping products affects the construction of reliable correlation distributions with the COVID-19 metrics. A Quantile-Quantile (QQ) plot is a common visualization method to determine whether two data sets come from the same distribution. Despite the different numbers of COVID-19-associated shopping products during each wave period, the QQ plot can detect the stability of the shape of the correlation distribution. The QQ plot is based on the ranks of each data set, so two data sets can still be compared even when their sample sizes differ. The one-to-one line is the reference line of the QQ plot. When the quantile line of the two data sets is close to the reference line, the two samples can be considered to come from the same distribution (Nist, 2006). In this study, QQ plots are constructed for the sensitivity analysis of the stability of the correlation distribution shape to the number of shopping products. This analysis determines how many shopping products are needed to generate reliable distributions of their correlation with the waves of corona virus variants (Figs. S2 and S3).
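The quantile comparison behind a QQ plot can be sketched without plotting. The sketch below pairs the quantiles of two samples of different sizes at common probabilities and reports the largest departure from the one-to-one line. The function names, the linear-interpolation quantile rule, and the sample values are assumptions made for illustration; the study's own QQ plots may use a different quantile convention.

```python
def quantile(sorted_values, p):
    """Quantile of pre-sorted data at probability p, by linear interpolation."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def qq_points(sample_a, sample_b, n_points=9):
    """Pair the quantiles of two samples at the same probabilities."""
    a, b = sorted(sample_a), sorted(sample_b)
    probabilities = [(k + 1) / (n_points + 1) for k in range(n_points)]
    return [(quantile(a, p), quantile(b, p)) for p in probabilities]


def max_deviation(points):
    """Largest vertical distance of the QQ points from the one-to-one line."""
    return max(abs(x - y) for x, y in points)


# Hypothetical correlation coefficients for 50 and 100 items with the same
# distribution shape, plus a shifted copy for contrast.
top_50 = [i / 49 for i in range(50)]
top_100 = [i / 99 for i in range(100)]
shifted = [value + 0.3 for value in top_100]

same_shape = max_deviation(qq_points(top_50, top_100))
different = max_deviation(qq_points(top_50, shifted))
print(same_shape, different)
```

A small deviation, as in the first comparison, suggests that the distribution shape is stable with respect to the number of items, while the shifted sample departs from the reference line by the size of the shift.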

Results

Principal components of NDLSI data

Before the COVID-19 pandemic (hereafter, Wave 0), the first and second Principal Component (PC) modes (PC1 and PC2, respectively) explained around 15% and 10% of the total variance, respectively. PC1 was a monotonically increasing trend of online search activities for shopping products. PC2 was strongly associated with the seasonality of weekly mean temperature; however, the seasonal cycle of online search activities relating to shopping products was four weeks ahead of the seasonality of the temperature (Fig. 1A, B). Based on Spearman’s rank correlation coefficients with PC1 and PC2, the top 10 shopping products showed that these two major modes captured well the increasing trend of product-specific e-commerce and the seasonality of online searches relating to shopping items during Wave 0 (Fig. 1C–F).

Fig. 1: Two major modes of online search activities relating to shopping products of the Koreans.

Weekly time series of Principal Component 1 (PC1) of the NDLSI data before COVID-19 (A) and PC2 (B), with heatmaps of the correlation coefficients of the top 10 correlated items. Online search activities of the top 10 shopping products associated with the PC1 time series (A): positive (C) and negative correlation (D). Online search activities of the top 10 shopping products associated with seasonality (B): summer- and winter-related shopping products in (E) and (F), respectively.

Results showed that the top 10 PC1-related shopping products included toothpaste, table tennis shoes, cleaning tissue, packed lunch, and hair spray. Shopping products with a negative correlation coefficient with the PC1 mode included (car) hands free, sea fishing, Random Access Memory (RAM), and Network Attached Storage (NAS). Shopping products with a positive (negative) correlation coefficient with the PC2 mode (the seasonality of temperature, leading by four weeks) were summer (winter) season shopping products. Based on the correlation coefficients with PC2, summer season shopping products included fan, parasol, yeolmu kimchi (a type of kimchi for summer), and tarp. Winter season shopping products included brooch, beanie, and neck cape. These PC2-based items were well-known popular shopping products for summer and winter, respectively, confirming that the PCA technique is useful for extracting and interpreting key features in the NDLSI data when a major mode is associated with a certain temporal pattern (herein, the seasonality of temperature).

Flow of PC1 related items during the COVID-19 pandemic

Results from the PCA of the detrended NDLSI data showed that PC1 resembled the new confirmed cases of COVID-19 over the four waves of the corona virus variants (Fig. 2). The percentage of variance explained by the PC1 mode increased from the first wave (20%) through the fourth wave (27%), which means that the number of shopping products associated with the corona virus variants increased during the COVID-19 pandemic. The first-level category shopping products associated with the PC1 mode showed temporal changes from Wave 1 through 4 (Fig. 3). For visualization, a Sankey diagram was constructed; this diagram type is often used as an efficient visualization of changes in the flow or volume of data (Lupton and Allwood, 2017).

Fig. 2: First major modes of the detrended NDLSI data of the four waves of the corona virus variants.

Weekly time series of the PC1 mode of the detrended NDLSI data up to Wave 1 through Wave 4 (gray dashed lines) along with South Korea’s COVID-19 new confirmed cases (sky-blue line).

Fig. 3: Changes in the first-level category shopping products associated with the corona virus variants during the four waves.

Sankey diagram of COVID-19 associated shopping products during the four waves.

Based on the variance explained by the PC1 mode (around 20% of the total variance), changes in online search activities were analyzed for shopping products with a correlation coefficient of 0.45 or higher (close to 20% of all items). Overall, life/health and digital/home appliance items accounted for a large percentage during the study periods. Outdoor activity-related category items, including cosmetics/beauty, fashion clothing, and fashion miscellaneous goods, accounted for smaller portions than other category items. The number of items associated with the corona virus variants more than doubled from Wave 1 through Wave 4 (from 327 to 714). After the first wave, 241 new shopping products showed a correlation coefficient of 0.45 or above. This inflow of online search activities was associated with shopping items in the categories of life/health (25%), digital/home appliances (15%), and food (15%) (Fig. 4).

Fig. 4: Changes in new shopping products associated with the corona virus variants during the period of Wave 2 through 4.

Percentages of the first-level shopping product categories of inflow after Wave 2, 3, and 4.

After Wave 2, the inflow decreased to 125 items, which included life/health (29%), digital/home appliance (19%), and childbirth/childcare (12%) items. After Wave 3, the inflow of 190 items included life/health (22%), digital/home appliance (17%), and childbirth/childcare (19%) items. Interestingly, duty-free shopping products and leisure/life convenience items first appeared after Wave 2 and Wave 4, respectively. The leisure/life convenience category items included workout classes (fitness/personal training and Pilates) and overseas travel items (overseas travel packages, airline tickets, and Wi-Fi/Universal Subscriber Identity Module (USIM)). The increasing online search activities relating to workout classes may stem from health concerns under a strict quarantine policy. The increased interest in overseas travel after Wave 4 suggests that the public in South Korea might have had a low perceived risk of the COVID-19 pandemic and begun to consider the pandemic over.

To investigate the temporal change of the third-level (product-specific) shopping products associated with the waves of the corona virus variants, the top 10 items were selected for each wave and the changes in their correlation coefficients were examined (Fig. 5). The results showed that 31 shopping products were associated with the PC1 component throughout the four waves. More than 32% of these shopping products were in the life/health category. These 31 items can be classified into two groups: items whose correlation coefficient increased over time and items whose correlation coefficient decreased. The first group included minidisc player, monitor arms, webcam, interphone box, fabric, handicraft supplies/subsidiary materials, character card/ticket, processed snacks, cooking oil/oil, bread, tuning supplies, craft, feed, seeds/seedlings, water aperture, gravel/sands/soil, and landscape tree/sapling. These first-group shopping products showed a persistent increase in the correlation coefficient through the multiple waves. The second group included gas range, microwave, toothbrush, and hula hoop. These second-group shopping products showed a decrease in the correlation coefficient (Fig. 5).

Fig. 5: Heatmap of Spearman’s rank correlation coefficients between the 31 items and PC1 from Wave 1 to 4.

The numbers in the Wave 1 through 4 heatmaps are Spearman’s rank correlation coefficients of the shopping products with the PC1 mode. The Wave 2 to 4 heatmaps depict the percent changes of Spearman’s rank correlation coefficients relative to the correlation coefficients after Wave 1, ((CorrX − Corr1)/Corr1) × 100, where X denotes the wave order (X = 2, 3, and 4).

Fig. 6: Changes in the six COVID-19 metrics over the four waves.

Weekly time series of the COVID-19 new confirmed cases (A), the stringency (B), residential index (C), vaccinated rate (D), new deaths by the corona virus (E), and fatality (F).

These two shopping product groups might originate from the different social response to the strictness of prevention policies. During the first wave, the government forced the public to stay at home to minimize the risk of being exposed to the corona virus. However, the prevention policies became less strict at Wave 4 to account for the fatigue of the public from the multi-year pandemic and revive local business and industry sectors. While the first group items have become more associated with the waves of the corona virus variants, the second group items no longer showed a high correlation coefficient with the corona virus variants.

Association with the six COVID-19 metrics

A surge of new confirmed cases of corona virus variants can influence social behavior patterns relating to e-commerce in a different way due to a different level of the COVID-19 prevention policy and the easy access of online shopping activities. In this study, Spearman’s rank correlation coefficients between the six COVID-19 metrics and the NDLSI data are computed to investigate potential causes of changes in online search activity volumes of shopping products (Fig. 6).

The NDLSI data showed different correlation distributions with the six COVID-19 metrics (Fig. 7). As a sensitivity test of the correlation distribution shape to the number of shopping products, Quantile-Quantile (QQ) plots were made for different numbers of shopping items (see Figs. S2 and S3). According to the QQ plots, the top 50 items were chosen to construct the correlation distributions. The correlation coefficients of the top 50 shopping products with the vaccination index were widely distributed, indicating a relatively weak association with online search activities relating to the shopping products (Fig. 7A). The correlation distributions with the stringency and fatality indices showed a low variance with high correlation coefficients above 0.8. The correlation distribution with the residential index showed relatively lower correlation coefficients than those with the stringency and fatality indices. The correlation distributions with new confirmed and death cases showed a relatively higher variance than those with the fatality and stringency indices. The categories of the top 50 shopping products included life/health (20%), digital/home appliance (16%), and food (16%) (Fig. 7B).

Fig. 7: Correlation distributions of the six COVID-19 metrics with the first mode of online search activities relating to shopping products.

Distributions of Spearman’s rank correlation coefficient of top 50 items related to the COVID-19 pandemic with six COVID-19 metrics (A), and the pie chart of first category percentage of items of top 50 items (B).

To investigate associations of online search activities relating to shopping products with the six COVID-19 metrics, Spearman’s rank correlation coefficients between the metrics and the 31 PC1-associated items were computed (Fig. 8). New confirmed cases, stringency, residential index, new death cases, and fatality showed a high correlation coefficient with most of the top 10 shopping products. The vaccination index showed no significant correlation with the top 10 shopping products. Gas range, baby walker, and toothbrush items showed a relatively lower correlation with the COVID-19 metrics than other shopping products. Online search activities relating to these shopping items showed a decreasing correlation during the COVID-19 pandemic (see Fig. 5); that is, these items no longer showed a significant effect of the COVID-19 pandemic after Wave 4.

Fig. 8: Associations of online search activities relating to shopping items with the six COVID-19 metrics.

Heatmap of Spearman’s rank correlation coefficient between COVID-19 metrics and the 31 shopping products.

Overall, the stringency and fatality metrics have a generally high association with the changes in online search activity patterns for shopping products. Stringency reflects how strictly the government controls the public, and fatality reflects the seriousness of the pandemic. The results indicate that consumer behavior responds sensitively to the extent of restriction policies and the seriousness of the pandemic.

Discussion

This study used the NDLSI data on online search activity volumes for shopping products, not actual purchasing data. Online search activity data can provide evidence of emerging purchasing patterns of the public in the next regime, implying that the public might tend to purchase the items that were most searched in the previous timeframe (Chen et al. 2017). Lately, credit card data (Footnote 1) and barcode data (Footnote 2) have included records of actual purchase activities. Integrating the actual purchase data and online search activity data can provide more practical guidelines and plans for socio-economic changes not only during the COVID-19 pandemic but also in the post-pandemic period.

This study revealed that the public interest in online shopping products changed not only after the first wave of the COVID-19 pandemic but also during the following three waves. These dynamic patterns of public interest can possibly be explained by the RCA framework (Kirk and Rifkin, 2020). The RCA framework consists of reacting, coping, and adapting phases, and significant changes in social behavior patterns are expected during a transition from one phase to another. The first wave was a typical ‘react’ phase because people responded to the pandemic situation. A large inflow volume after Wave 2 (241 items) indicated a coping phase. The new confirmed cases were relatively low during Wave 2 (see the sky-blue line in Fig. 2) compared with those during other waves. Inflow of online search activities relating to shopping products was the minimum after Wave 2. This finding suggests that a transition from the ‘react’ to the ‘coping’ phase might have occurred between Wave 2 and Wave 3. After Wave 3, the public coped with the long-term pandemic. During Wave 4, the categories related to outdoor activities showed a low percentage, indicating a low level of public interest in outdoor activities due to the COVID-19 quarantine policy. The inflow of online search activities relating to leisure/life convenience items (workout classes, overseas travel) at Wave 4 indicates that the public became less reactive to the waves of the corona virus variants, which hints at an emerging signal of a low perceived risk of the COVID-19 pandemic after Wave 4. Therefore, the transition to the ‘adapt’ phase is expected after Wave 4.

Understanding the public’s purchasing patterns amid a global crisis via big social monitoring data is critical from the risk management perspective. Risk control (e.g., self-protection) and financing (e.g., insurance) strategies can be improved for the next global crisis by understanding and predicting changes in social behaviors. This study found that the shopping products attracting increased public interest changed during the two-year-long COVID-19 pandemic, which can be explained by different stages of the RCA framework. The social behavior patterns found in this study have also been reported in observations of reacting and coping consumer behaviors in mass media and online, and of reactive public behavior toward social distancing during the COVID-19 pandemic (Guthrie et al. 2021; Kirk and Rifkin, 2020; Tintori et al. 2020). Specifically, better understanding and prediction of which products are in emerging high or low demand can help markets manage the inventory of shopping products throughout different regimes of the crisis. This study found that the associations of these products were clearer when they were used as self-protection measures (e.g., facial masks in the COVID-19 pandemic).

Governments and authorities can accordingly implement changes in the public’s actions to prevent potential market failures, for example, an insufficient supply of self-protection measures or big market players using their power to dominate necessity markets (Stiglitz, 2021). These responses from the public and private sectors can be optimized with prevention plans in a timely manner across different waves of the crisis by analyzing big social monitoring data. This study found changes in the interest in and demand for shopping products related to self-protection measures during the COVID-19 pandemic, which hints at how to use big social monitoring data to mitigate the adverse effects of daily infections. Furthermore, this information can help insurance industries, which can offer risk transfer measures, manage systematic risks that cannot be fully controlled by individuals or other industry sectors (Alonso et al. 2020; Harris et al. 2021; Peiffer-Smadja et al. 2020; Rita et al. 2019). This study also found a strong association between changes in the public’s online search activities relating to shopping items and perceived risk, which was previously found in travel insurance purchasing patterns (Al Mamun et al. 2022; Tan and Caponecchia, 2021). This information can give insight into how to increase the public’s willingness to prepare for the next pandemic.

Search engine optimization (SEO) algorithms for searching items have been developed, particularly in the e-commerce sector, to increase customer satisfaction and loyalty (Husain et al. 2020; Liu et al. 2008; Pratminingsih et al. 2013). Some online search engine platforms collect data on users’ online activities and optimize a customized recommendation algorithm that can give more relevant search results. In particular, e-commerce sites such as Amazon have developed such customized SEO algorithms to increase the chance that products are purchased (Heng et al. 2018; Linden et al. 2003). This study reported observational evidence of the COVID-19 impact on online search activities about shopping products, which was also found in online shopping patterns for apparel (Watanabe et al. 2021). SEO algorithms developed with data from before the COVID-19 pandemic increased user complaints threefold (Dahiya et al. 2021), implying that the COVID-19 pandemic was an unprecedented event since the advent of the Internet and presumably caused a drastic change in context. Therefore, SEO algorithms need to be updated until sufficient post-pandemic data are available. Furthermore, the expected continued growth of online commerce industries requires coping strategies that adapt to an increasing trend of not only pandemics but also other disasters such as climatic extremes, war, and terror.

This study provides insight into how big social monitoring data can help authorities better understand the social response to COVID-19 in near real time. In this study, the NDLSI data on online search activity volumes relating to shopping products, not actual purchase data, were used. The NDLSI data analysis provided possible evidence of an emerging change in the public’s purchasing patterns at the shopping product level. Previously, it was found that the public tended to purchase the shopping products that were most searched in the previous timeframe (Chen et al. 2017). Associations between the public interest in shopping products and purchase records can be explored using credit card data (Footnote 3) and barcode data (Footnote 4). These data have been used to investigate changes in spending associated with stringent nonpharmaceutical interventions during the COVID-19 pandemic (Horvath et al. 2023). Integrating actual purchase data with online search activity data can give more practical guidelines and plans for socioeconomic changes not only during the COVID-19 pandemic but also in the post-pandemic period. Furthermore, the e-commerce sector can harness big social monitoring data to develop strategic plans for supply chain management for the next pandemic.

This study also explored the associations between changes in online search activity patterns and the COVID-19 metrics. The results showed that the COVID-19 metrics, except for vaccination, were strongly associated with changes in online search activity patterns relating to shopping products. The stringency index was a reliable indicator of the strictness of the government’s response to the COVID-19 pandemic and had a significant impact on social behavior patterns, which is in line with the finding of Makki et al. (2020) that the timing and duration of stringency implementation are key factors in preventing the spread of the corona virus variants. Furthermore, a recent study found that policy perceptions affect the practice of voluntary prevention behaviors, such as mask wearing and social distancing (Lee et al. 2021). That study found that perceived policy stringency was associated with actual risk and political ideology, causing noncompliance in communities during the COVID-19 pandemic.
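As an illustration of how an association between a COVID-19 metric and a search activity pattern could be quantified, the following minimal sketch computes a rank correlation between two short series. It assumes a Spearman rank correlation, which may or may not match the measure actually used in this study. The function names, the variable names, and all numbers are hypothetical and were invented for the example; they are not NDLSI data or results of this study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical weekly values, invented for illustration only.
stringency_index = [20, 35, 50, 65, 60, 45, 70, 80]
search_pattern_score = [0.1, 0.3, 0.2, 0.9, 0.7, 0.4, 1.1, 1.5]

rho = spearman_rho(stringency_index, search_pattern_score)
print(f"Spearman rho = {rho:.3f}")
```

A value of rho close to 1 or -1 in such a sketch indicates a strong monotonic association only; it does not identify the metric as a cause of the change in search activity.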

The proposed methods in this study have some limitations. For example, the results based on the correlation analysis provide potential, not actual, triggers of changes in social behavior patterns during the COVID-19 pandemic, which is a well-known caveat of correlation analysis (Haley and Drazen, 1998; Stigler, 2005). The findings of this study about potential triggers can, however, help design more effective and efficient interview and survey questionnaires to investigate the true triggers of changes in the public interest in shopping products. Combining information from big social monitoring data and survey/interview data will create new knowledge about the dynamics of social behavior patterns and help develop reliable social behavior prediction models.

Conclusions

This study succeeded in extracting the major modes of the public’s interest in shopping products and in investigating how online search activities relating to the associated shopping products changed with the COVID-19 pandemic. The SVD-based PCA of the NDLSI data showed the dynamic patterns of online search activities relating to shopping products during the two-year-long COVID-19 pandemic. Before the COVID-19 pandemic, an increasing trend and seasonality of online search activity volumes about shopping products were the major modes of the NDLSI data. After the onset of the COVID-19 pandemic, the impact of COVID-19 on online search activities relating to shopping products varied across the four waves of the corona virus variants, particularly when the objective risk increased dramatically. Changes in the online search activity patterns were associated with changes in the COVID-19 prevention policy and in the objective risk of being exposed to the corona virus variants. This study attempted to explain the changes in these online search activity patterns within the RCA framework by identifying the react, coping, and adapt phases.
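For readers unfamiliar with the technique, the following minimal sketch shows how the major modes of a weeks-by-items matrix can be extracted with an SVD-based PCA. It is an illustration under stated assumptions, not the implementation used in this study: the matrix is synthetic (a linear trend plus a yearly cycle plus noise), and the matrix size, loadings, and variable names are all hypothetical.

```python
import numpy as np

# Synthetic stand-in for a search-volume matrix: 104 weeks x 6 items.
# All values are invented for illustration; they are not NDLSI data.
rng = np.random.default_rng(0)
weeks = np.arange(104)
trend = 0.02 * weeks                        # slowly increasing volume
seasonal = np.sin(2 * np.pi * weeks / 52)   # yearly cycle
trend_loadings = np.array([1.0, 0.8, 0.6, 0.2, 0.1, 0.0])
seasonal_loadings = np.array([0.0, 0.1, 0.3, 0.7, 0.9, 1.0])
X = (
    np.outer(trend, trend_loadings)
    + np.outer(seasonal, seasonal_loadings)
    + 0.05 * rng.standard_normal((104, 6))
)

# PCA via SVD: center each item (column), then decompose.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

explained = s**2 / np.sum(s**2)   # variance fraction explained by each mode
scores = U * s                    # time series (PC scores) of each mode
loadings = Vt                     # rows give each mode's weight per item

print("explained variance ratio:", np.round(explained, 3))
```

In this sketch the rows of `loadings` indicate which items contribute to each mode, and the columns of `scores` show how each mode evolves week by week.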

This study highlights the utility of online social monitoring data in developing strategic plans for preparation, mitigation, and recovery policies for the next pandemic. Furthermore, the findings of this study can guide the design of interview and survey questionnaires to investigate the actual drivers of social behavior changes during the COVID-19 pandemic. Integrated studies using online social monitoring data together with survey and interview data will advance the current knowledge of, and skill in predicting, social behavior changes, which can provide actionable information to mitigate the adverse effects of a pandemic for the sustainable development of our communities.