Forecasting consumer confidence through semantic network analysis of online news

This research studies the impact of online news on social and economic consumer perceptions through semantic network analysis. Using over 1.8 million online articles on Italian media covering four years, we calculate the semantic importance of specific economic-related keywords to see if words appearing in the articles could anticipate consumers’ judgments about the economic situation and the Consumer Confidence Index. We use an innovative approach to analyze big textual data, combining methods and tools of text mining and social network analysis. Results show a strong predictive power for the judgments about the current households and national situation. Our indicator offers a complementary approach to estimating consumer confidence, lessening the limitations of traditional survey-based methods.

www.nature.com/scientificreports/

using a methodology that relies on an indicator that calculates the importance of economic-related keywords (ERKs) appearing on digital news media. This is a departure from the time-consuming manual content analysis of economic news used in the past 18. Our focus is on the Italian Consumer Confidence Climate index, which provides an indication of the optimism and pessimism of consumers who evaluate the Italian general economic situation and report their expectations for the future. We chose an indicator of semantic importance, called Semantic Brand Score (SBS), which calculates the relative importance of one or more keywords in the news 19. We selected this indicator because of its ability to forecast various outcomes, from financial market trends 20 to election results 21 and tourism demand 22. Based on methodologies drawn from social network analysis and text mining, the semantic importance of keywords is calculated in terms of their prevalence, i.e., frequency of word occurrences; connectivity, i.e., degree of centrality of a word in the discourse; and diversity, i.e., richness and distinctiveness of textual associations. The approach we use in our study is different from past research that focused on the evaluation of news sentiment, e.g., 23,24. We have implemented a new integrated semantic index as a measure of semantic significance. This metric has been proven to be more informative than sentiment analysis, which can be subject to variable error rates and reliability issues 25, and represents a valuable tool for analyzing and understanding relationships among words in a corpus 20,22.
This study contributes to the discussion on online media's role in shaping consumer confidence. By providing an alternative method based on semantic network analysis, we investigate the antecedents of consumer confidence in terms of current and future economic expectations. Our approach is not intended to replace the information obtained from traditional tools but rather to supplement them. For instance, we may use consumer surveys in conjunction with our methods to gain a more comprehensive understanding of the market.
Section "The connection between news and consumer confidence" delves into the impact of news on consumers' perceptions of the economy. Section "Research design" outlines the methodology and research design employed in our study. Section "Results" showcases the primary findings, subsequently analyzed in Section "Discussion and conclusions".

The connection between news and consumer confidence
Effective news coverage plays a crucial role in shaping the current and future expectations of individuals. Both digital and mainstream media provide information that can significantly impact people's economic evaluations of present and future conditions and influence economic decisions. The information disseminated through news channels can significantly impact the way people perceive the economy, leading to changes in their spending habits, investment decisions, and overall economic behavior 26,27. The news may influence consumer confidence, especially when people are exposed to ambiguous messages 11, or when media coverage does not fully reflect economic conditions, or when it is biased by partisanship 9,28. For example, Damstra and Boukes 29 investigated the impact of the real economy on economic news in Dutch newspapers and confirmed that the description of economic reality offered by the media is skewed to the negative, which in turn affects people's economic expectations about the future, but not their current evaluations. Other studies show the role of rumors in shaping consumer response and spending 30, while others demonstrate how the tone of economic news may influence consumer confidence, with a slight difference between prospective versus retrospective economic evaluations 18,31. For example, Boukes et al. 18 found that consumers' retrospective evaluations were not influenced by the tone of the news. Other studies explored the effect of the negativity bias on consumer confidence and demonstrated how consumers react only to bad news 10. The negativity bias, well documented in social psychology, political science, and economics 32,33, is at the basis of this asymmetry in response to bad versus good news: negative information often has a more profound effect on the formation of impressions than positive information. As a result, negative information can have a lasting impact on our perceptions and judgments. Other scholars have challenged the negativity bias and the asymmetric response of consumers. In a study examining the relationship between media reporting of economic news and consumer confidence in the United States, Casey and Owen 31 found evidence of positive and negative consumer confidence asymmetries.
Empirical studies have demonstrated how alternative methods based on textual analysis are more reliable and could complement and reduce the limitations of survey-based methods to describe current economic conditions and better predict a household's future economic activity. For example, a recent study conducted on the accuracy of Swiss opinion surveys revealed that the level of survey bias varies significantly depending on the policy areas being measured. The study found that the strongest biases were observed in areas related to immigration, the environment, and specific types of regulation.
This information is crucial for policymakers and researchers who rely on public opinion surveys to inform their decisions. By understanding the potential biases in survey results, they can make more informed decisions and develop more effective policies 34. Song and Shin 35 have recently conducted a study on sentiment analysis of South Korean news articles using a lexicon approach. Their findings have demonstrated the potential of news as a valuable source for developing alternative economic indicators that can supplement traditional Consumer Confidence indices. News data is not only cheaper to acquire: its advantages, compared to monthly national surveys, include the ability to observe consumer trends at a more granular level, with more data points, and the ability to capture the social and economic impact of specific issues through a broader perspective 36. Additional empirical evidence confirms the complicated relationship between consumers and news reported by the media. Through an investigation of the association between consumer spending for durable goods and consumer confidence, Ahmed and Cassou 37 found that news has a relevant impact on confidence during economic expansions, though it is generally not important during economic recessions.
Contributing to this stream of research, we use a novel indicator of semantic importance to evaluate the possible impact of news on consumers' confidence.

Research design
Consumer confidence index survey and selection of keywords. Consumer confidence climate is a monthly economic indicator that measures the degree of optimism perceived by consumers regarding the overall state of the economy and their financial situation, evaluated through their saving and spending habits. Its value is high when consumers spend more and save less and low when consumers save more and spend less. Its trend typically increases when the economy expands and decreases when the economy contracts, reflecting the outlook of consumers with respect to their ability to find and retain good jobs according to their perception of the current state of the economy and their financial situation.
In Italy, the Consumer Confidence Climate survey is composed of a set of questions designed to assess consumers' perceived optimism or pessimism around the Italian economic situation and their expectations for the future. Survey participants provide their opinion about future unemployment, current and future households' financial situation, current and future possibility of savings, current opportunities for durable goods purchases, and current family budget. The answers to nine questions are aggregated, and the result is reported in a seasonally adjusted index 38. The Consumer Confidence Climate can be broken down into four sub-indices released by the Italian Institute of Statistics (ISTAT). These indices are: the Economic Climate, the Personal Climate, the Current Climate, and the Future Climate. The Economic Climate Index considers consumers' current assessment and future expectations regarding the general economic situation in Italy, as well as their outlook on future unemployment. The index of Personal Climate takes into account various factors that impact a household's financial well-being. These include the current financial situation, savings, significant purchases of durable goods, and the family budget. The Current Climate index analyzes various factors that impact the Italian economy, including the current financial situation of households, their savings, expenditures on durable goods, and family budget. Finally, the Future Climate includes questions related to the foreseen future of the Italian general economic situation, the households' financial situation, unemployment expectations, and savings. We downloaded the target series data from the Italian National Institute of Statistics (ISTAT) website (https://www.istat.it).
From the Consumer Confidence Climate survey, we extracted economic keywords that were recurring in the survey's questions. We then extended this list by adding other relevant keywords that matched the economic literature and the independent assessment of three economics experts. The inclusion of external experts to validate the selection of keywords is aligned with the methodology used in similar studies 39. These keywords, translated from Italian, include home, rent, income, pensions, savings, credit, loans, interest rates, prices, market, job, competition, economy, public sector, politics, institutions, basic necessities, global, family, trust, discomfort/distress, consumer, education degree, purchase, car, PC, and holidays. These keywords provide insight into the concerns and priorities of Italian society. From the basic necessities of home and rent to the complexities of the economy and politics, these words refer to some of the challenges and opportunities individuals and institutions face. We also considered their synonyms and, drawing from past research 20,40, we considered additional sets of keywords related to the economy or the Covid emergency, including singletons (i.e., individual words) such as Covid and lockdown.
Table 1 shows the full list of ERKs, with the RelFreq column indicating the ratio of the number of times they appear in the text to the total number of news articles.
Computational methods have been recognized as unable to understand human communication and language in all its richness and complexity 41. Aligned with contemporary approaches to semantic analysis 39,42, we have integrated computational methods with traditional techniques to analyze online text. Our methodology incorporates algorithmic measures to systematically gather news data.
Telpress International B.V., a company that collects online news from multiple web sources, including mainstream media sites and blogs, provided access to online news data. The final sample comprised over 1,808,000 news articles published between January 2, 2017, and August 30, 2020. Our textual analysis focused solely on the initial 30% of each news article, including the title and lead. This decision aligns with previous research 21 and is based on the understanding that online news readers tend only to skim the beginning of an article, paying particular attention to the title and opening paragraphs 43,44. As a robustness check, we ran our models on the full text of the articles but found no significant improvement in results. Restricting the analysis to the opening portion of each article also enables faster data analysis.
A new index of importance for economic keywords. The Semantic Brand Score (SBS) is a composite indicator measuring semantic importance, which combines text mining and social network analysis methods. It is applied to (big) textual data to evaluate the importance of one or more 'brands' or, more in general, words or groups of keywords 19. Its analytical power extends beyond commercial brands. A brand may refer to commercial products, personal brands, a company's core values, or concepts related to societal trends 21. The SBS indicator is composed of three dimensions: prevalence, diversity, and connectivity. This index builds upon the relationships among words in any given text. Its first component, prevalence, measures how frequently an economic-related keyword is used in the online discourse. The more a word appears in online news, the more readers will remember and recognize it, which could ultimately influence their opinions and behaviors. For instance, consider the scenario where the phrase "economic crisis" is repeatedly featured in the media. This could potentially instill a sense of fear and uncertainty among the general public, leading them to believe that their employment status or financial stability is in jeopardy. However, the importance of a keyword does not only depend on its frequency of occurrence but also on its association with other keywords in the text. For instance, the utilization of the term "economic crisis" in a broad context and its association with various other words can significantly impact people's emotions and actions. Conversely, if the term is solely linked to a job crisis or a war occurring in a far-off land, it can also trigger a shift in people's attitudes and behaviors. To account for these scenarios, the SBS indicator includes other dimensions, diversity and connectivity, which help evaluate how heterogeneous and strong the associations to an ERK are and how much that concept can bridge connections among other terms/concepts in the discourse. The concept is that the greater the frequency of a keyword within a discourse, and the more it is enriched with associations, the more it will be retained and have a significant impact.
To calculate diversity and connectivity, we analyzed the semantic networks generated from online texts using the SBS BI web application 45, and we relied on the computing resources of the ENEA/CRESCO infrastructure 46. The first step of the computational process was to apply common text pre-processing routines 47, such as tokenization, removal of stop-words, and removal of word affixes, known as stemming 48. The second step was to build a social network of co-occurring words for each week of news and study them through social network analysis 49. Figure 1 illustrates an example of output visualized using the following sentence attributed to Adam Smith (The Theory of Moral Sentiments, 1759): "The same principle, the same love of system, the same regard to the beauty of order, of art and contrivance, frequently serves to recommend those institutions which tend to promote the public welfare". In order to improve readability, we labeled nodes before stemming, and we set the co-occurrence threshold to a maximum of three words. Diversity is a dimension of the SBS index that considers the relationship of economic keywords with the other words in the text. This is related to the construct of brand image 50 and to the idea that, when associations are less common and in a high number, the keyword is more important 19,51. We operationalized diversity through the following formula, based on the metric of distinctiveness centrality 52:

$$ DI(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} \log_{10}\left(\frac{n-1}{g_j}\right) I(w_{ij} > 0) $$

In general, we consider a graph G, made of n nodes (words) and E edges (word links), associated with a set of connection weights W. In the formula, $g_j$ is the degree of node j, which is one of the neighbors of node i (the one for which diversity is calculated). $I(w_{ij} > 0)$ is an indicator function that is equal to 1 when the edge connecting nodes i and j exists, i.e., when $w_{ij} > 0$, and is equal to 0 when this edge is missing.
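The two steps described above (building a co-occurrence network and scoring a keyword's diversity) can be illustrated with a minimal sketch. It assumes documents are already tokenized, stop-word filtered, and stemmed; the function names and the `max_dist` parameter (our reading of the "maximum three words" co-occurrence threshold) are hypothetical and do not reproduce the SBS BI application.

```python
import math
from collections import defaultdict


def cooccurrence_network(docs, max_dist=3):
    """Build an undirected weighted graph as {word: {neighbor: weight}}.

    docs: list of token lists (assumed already pre-processed).
    max_dist: two tokens are linked when they are at most `max_dist`
    positions apart (an assumption about the co-occurrence threshold).
    """
    weights = defaultdict(int)
    for tokens in docs:
        for i, a in enumerate(tokens):
            for b in tokens[i + 1 : i + 1 + max_dist]:
                if a != b:
                    weights[frozenset((a, b))] += 1
    adj = defaultdict(dict)
    for pair, w in weights.items():
        a, b = tuple(pair)
        adj[a][b] = w
        adj[b][a] = w
    return dict(adj)


def diversity(adj, node):
    """Sum of log10((n - 1) / g_j) over the neighbors j of `node`.

    n counts the words that have at least one link; g_j is the degree
    of neighbor j. Only existing edges (w_ij > 0) are stored in `adj`,
    so the indicator function is satisfied by construction.
    """
    n = len(adj)
    return sum(math.log10((n - 1) / len(adj[j])) for j in adj[node])
```

In this toy form, a keyword linked to many words that themselves have few other links scores higher than a keyword linked only to hub words.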
The last dimension of the SBS, connectivity, is measured as the weighted betweenness centrality of the ERKs 53,54 and represents their 'brokerage power', i.e., how much each keyword can serve as a bridge to connect other terms and topics in the discourse 19. The connectivity formula is based on the analysis of the shortest paths connecting each pair of nodes 49:

$$ CO(i) = \sum_{j < k} \frac{d_{jk}(i)}{d_{jk}} $$

where $d_{jk}$ is the number of shortest network paths connecting nodes j and k (calculated using edge weights) and $d_{jk}(i)$ is the number of those paths that include node i.
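As an illustration of the shortest-path counting behind connectivity, the sketch below implements betweenness centrality with Brandes' algorithm. It is a simplification: edge weights are ignored (every link has length 1), whereas the paper uses the weighted version. The graph format `{node: {neighbor: weight}}` and the function name are assumptions made for this example.

```python
from collections import deque


def betweenness(adj):
    """Unweighted betweenness centrality of every node (Brandes' algorithm).

    adj: undirected graph as {node: {neighbor: weight}}; weights are
    ignored here, which is a simplification of the weighted measure.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph each pair (j, k) is visited from both ends
    return {v: b / 2 for v, b in score.items()}
```

A weighted variant would replace the breadth-first search with Dijkstra's algorithm on link distances derived from the co-occurrence weights.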
The final SBS indicator was calculated by summing the standardized scores of its components, considering all the words in the corpus for each timeframe. Aligned with past studies, e.g. 19,21, we used an equal weighting scheme and carried out standardization by subtracting the mean and dividing by the standard deviation, as in the following formula:

$$ SBS(i) = \frac{PR(i) - \overline{PR}}{\operatorname{std}(PR)} + \frac{DI(i) - \overline{DI}}{\operatorname{std}(DI)} + \frac{CO(i) - \overline{CO}}{\operatorname{std}(CO)} $$

where PR is prevalence, DI is diversity, and CO is connectivity. We also tested different approaches, such as subtracting the median and dividing by the interquartile range, which did not yield better results.
Lastly, we calculated the language sentiment of all articles as a control variable and a possible additional predictor of the Consumer Confidence Index and its dimensions. Sentiment was computed using the SBS BI web app 45, which uses a lexicon similar to VADER 55 for the Italian language. Sentiment scores range from −1 to +1, with −1 indicating very negative article content and +1 the opposite.
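The equal-weight aggregation described above can be sketched in a few lines. The dictionaries of per-word scores and the function names are hypothetical; the use of the population standard deviation, and the choice to map a constant component to zero, are assumptions not stated in the text.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score a {word: score} mapping across all words in one timeframe."""
    scores = list(values.values())
    m, s = mean(scores), pstdev(scores)  # population std is an assumption
    # A constant component carries no information, so it contributes 0
    return {k: (v - m) / s if s > 0 else 0.0 for k, v in values.items()}


def sbs(prevalence, diversity, connectivity):
    """Sum of the standardized prevalence, diversity, and connectivity."""
    pr = standardize(prevalence)
    di = standardize(diversity)
    co = standardize(connectivity)
    return {w: pr[w] + di[w] + co[w] for w in prevalence}
```

Because each component is centered on the corpus-wide mean, a keyword's SBS is positive when it is, on balance, more frequent, more distinctively associated, and more central than the average word in that week.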

Granger causal relationships between keywords and consumer confidence. The Consumer Confidence series have a monthly frequency, whereas our predictor variables are weekly data series. In order to use the leading information coming from ERKs, we transformed the monthly time series into weekly data points using a temporal disaggregation approach 56. The primary objective of temporal disaggregation is to obtain high-frequency estimates under the restriction of the low-frequency data, which exhibit long-term movements of the series. Given that the Consumer Confidence surveys are conducted within the initial 15 days of each month, we conducted a temporal disaggregation to ensure that the initial values of the weekly series were in line with the monthly series. To obtain weekly values, we applied a cubic spline interpolation 57-59. Figure 2 illustrates the disaggregated series we obtained.
To measure whether the SBS indicators offered relevant information to anticipate our economic variables, we performed Granger Causality tests. In general, a time series is said to Granger-cause another time series if the former has incremental predictive power on the latter. Therefore, Granger causality provides an indication of whether one event or variable occurs prior to another. We also looked at the cross-correlation of the target series with our predictors (i.e., ERKs series) to see if they were in phase (positive signs of cross-correlation) or out of phase (negative sign) 60,61.
Figure 3 outlines the methodology employed in our research design. We started by identifying the Economic Related Keywords (singletons or word sets). We then calculated the SBS indicators to measure the keyword's importance and applied Granger causality methods to predict the consumer confidence indicators.
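To make the interpolation step concrete, the following sketch fits a natural cubic spline through monthly values and evaluates it at weekly positions. It is plain interpolation under stated assumptions: the natural boundary condition, the placement of each monthly value at a single weekly position, and the function name are all choices made for this example and are not taken from the paper's temporal disaggregation procedure.

```python
import bisect


def natural_cubic_spline(x, y):
    """Return a function evaluating the natural cubic spline through (x, y).

    x must be strictly increasing; second derivatives are zero at both
    ends (the "natural" boundary condition, assumed here).
    """
    n = len(x) - 1
    h = [x[i + 1] - x[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (y[i + 1] - y[i]) / h[i] - 3 * (y[i] - y[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the quadratic coefficients
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (x[i + 1] - x[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (y[j + 1] - y[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(t):
        # Locate the interval containing t (clamped to the outer intervals)
        j = min(max(bisect.bisect_right(x, t) - 1, 0), n - 1)
        dx = t - x[j]
        return y[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


# Hypothetical monthly index values placed every four weeks
spline = natural_cubic_spline([0, 4, 8, 12], [100.0, 102.5, 101.0, 104.0])
weekly = [spline(week) for week in range(13)]
```

The weekly series passes exactly through the monthly observations and varies smoothly in between, which is the property the disaggregation relies on.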
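The sign check on cross-correlations can be sketched as follows. This is only the lagged Pearson correlation, not a Granger causality test, and the function name and lag convention (a positive lag means the predictor leads the target) are assumptions for illustration.

```python
from statistics import mean


def cross_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag].

    A positive lag pairs earlier values of x with later values of y,
    i.e., x is treated as the leading series.
    """
    if lag < 0:
        return cross_correlation(y, x, -lag)
    a = list(x[: len(x) - lag]) if lag else list(x)
    b = list(y[lag:])
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5
```

A positive value at the chosen lag indicates that the keyword series and the confidence series move in phase, a negative value that they move out of phase.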

Results
In this section, we discuss the signs of cross-correlation and the results of the Granger causality tests used to identify the indicators that could anticipate the consumer confidence components (see Table 2). In line with past research, e.g. 62,63, we dynamically selected the number of lags using the Bayesian Information Criteria. The models indicate that 61% of the semantic importance series of ERKs Granger-cause the Personal component of the Consumer Climate index, while only 34% Granger-cause the Future component and 27% the Current component. It is not surprising that average consumers have a better understanding of their personal situation when responding to questions but may be less informed about economic cycles. When answering questions about their own financial situation, individuals are likely to have a more accurate understanding of their personal circumstances. However, when it comes to broader economic trends and cycles, the average consumer may not have the same level of knowledge or expertise. This is understandable, as economic cycles can be complex and difficult to understand without specialized training or experience. Interestingly, this representation of the current situation comes from online news, which may report what is currently happening more than depicting future scenarios, which may directly impact consumers' opinions and economic decisions. Among the most significant concepts strongly associated with the consumers' confidence in the future, we find keywords such as educational degree, purchasing ability, and a list of European programs to support Italy during the recession (e.g., Troika, Sure). Among the political and institutional keywords primarily associated with a perceived deterioration of consumers' economic conditions, we found Politics, European Union, and the National Retirement System/INPS (see their negative signs in Table 3).
It is unsurprising to note a significant negative Granger causality between the Covid keyword and the consumer evaluation of the economic climate.This implies that as the Covid term becomes more prevalent and widespread in online discussions, consumers' assessments and expectations of the Italian economic situation become increasingly pessimistic, with a bleak outlook on future employment prospects.
Another interesting result is the strong negative Granger causality between the keywords educational degree, unemployment, purchase, and the Climate's Future index, describing the expectations on the Italian economic situation, households' financial situation, unemployment expectations, and savings. The findings suggest that when education takes center stage in online discussions, consumers tend to have a less optimistic outlook on the future. This could indicate a decrease in trust regarding the effectiveness of education and obtaining degrees in shaping future prospects. Similarly, there seems to be a more pessimistic response from consumers when the news reports frequently and consistently about purchasing ability and fighting unemployment. This finding is consistent with another result that indicates a negative Granger causality between media coverage of the national retirement system and individuals' current and future personal situations. If we consider that the unemployment rate in Italy from 2016 to 2020 went from 11.7 to 9.3% (https://www.statista.com/statistics/531010/unemployment-rate-italy/, accessed April 16, 2021), this pessimistic view is consistent with the inception and spread of an economic downturn, partially ascribed to the Covid-19 pandemic. This result seems to confirm the negativity bias: when the present looks good, people feel more optimistic about the current and future economy 10.
In summary, the findings presented in Table 2 indicate that 27% of the selected keywords have a Granger-causal relationship with the aggregate Climate. This percentage is consistent with the results obtained when evaluating Granger causality for the Current dimension of the survey. These results suggest that a significant portion of the selected keywords can be used to predict changes in the Climate dimension, providing valuable insights for future research and decision-making. Our tests indicate that a higher number of keywords could impact how consumers perceive the Future situation. However, the most significant impact appears to be on the personal climate, as evidenced by 61% of significant Granger causality tests.
Finally, it is worth noting that the sentiment variable exhibits a significant correlation solely with the Personal component of the Consumer Confidence Index.
Table 3 provides a breakdown of the nine questions and offers a more granular view of the impact of ERKs on the adjusted consumer confidence index. Five of the nine questions pertain to the present circumstances, either of the household or the country, while the remaining four relate to the future expectations of both the nation and the consumers themselves. The importance of some keywords seems to be more impactful when associated with the single questions than with the aggregate climate measures presented in Table 2. In particular, all the keywords are strongly significant and improve the predictability of the household's economic situation index. Moreover, keywords related to Covid-19 appear highly predictive of consumers' current and future evaluation of the household economic situation and future unemployment.
In line with the findings presented in Table 2, it appears that ERKs have a greater influence on current assessments than on future projections. This is aligned with the current debate in the literature on consumer confidence, as it is still unclear whether surveys merely reflect current or past events or provide useful information about the future of household spending 8.
The Granger causality tests for sentiment indicate significance only for the second question, which pertains to the assessment of the household's economic situation.
Forecasting. As an additional step in our analysis, we conducted a forecasting exercise to examine the predictive capabilities of our new indicators in forecasting the Consumer Confidence Index. Our sample size is limited, which means that our analysis only serves as an indication of the potential of textual data to predict consumer confidence information. It is important to note that our findings should not be considered a final answer to the problem.
We performed monthly out-of-sample forecasting of the five main CCI indices (Climate Overall, Future Climate, Current Climate, Personal Climate, and Economic Climate). Our benchmark was an autoregressive model, AR(2), whose h-step ahead forecast is given by:

$$ y_{t+h} = \sum_{i=1}^{p} \phi_i \, y_{t-i+1} + \varepsilon_{t+h} $$

where $y_t$ is the target series, h represents the number of steps ahead to forecast, $\phi_i$ is the ith coefficient of the autoregressive model of order p = 2, and $\varepsilon_{t+h}$ represents a serially uncorrelated white-noise error. We also tested other models with more lags, without obtaining better forecasting results.
To combine information from our large set of economic-related keywords and use it for forecasting, we created a factor-augmented autoregressive model (FAAR, also indicated as SBS ERK model) whose h-step ahead forecast is given by the following equation:

$$ y_{t+h} = \sum_{i=1}^{p} \phi_i \, y_{t-i+1} + \xi' F_t + \varepsilon_{t+h} $$

where $F_t$ represents an R × 1 vector of factors and $\xi$ a coefficient vector. All our models include the AR(2) component. Model estimation was carried out using an initial window of 60 weekly observations, expanded at each regression step. The total out-of-sample period comprised 30 monthly forecasts. The optimal number of factors was estimated dynamically through Partial Least Squares and using the Bayesian Information Criterion (BIC), with a maximum number of 4 factors 64.
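The expanding-window scheme can be illustrated for the autoregressive benchmark alone. The sketch below is a simplified stand-in, not the estimation code used in the study: it adds an intercept (an assumption), fits the coefficients by ordinary least squares through the normal equations, and omits the factor term of the FAAR model. All function names are hypothetical.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations.

    Solved with Gauss-Jordan elimination; adequate for tiny designs only.
    """
    k = len(X[0])
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def expanding_ar2_forecasts(series, h=1, initial_window=60):
    """Direct h-step AR(2) forecasts re-estimated on an expanding window.

    Returns {target_index: forecast}. Each forecast uses only the
    observations available before the target date.
    """
    forecasts = {}
    for end in range(initial_window, len(series) - h + 1):
        train = series[:end]
        # Regress y[t + h] on an intercept, y[t], and y[t - 1]
        rows = range(1, end - h)
        X = [[1.0, train[t], train[t - 1]] for t in rows]
        y = [train[t + h] for t in rows]
        beta = ols(X, y)
        forecasts[end - 1 + h] = beta[0] + beta[1] * train[-1] + beta[2] * train[-2]
    return forecasts
```

Extending this to the factor-augmented case would mean appending the factor values at time t as extra columns of `X`, with the factors themselves re-estimated inside each window.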
We compared the forecasting results of our new models (which comprise the SBS scores of economic-related keywords) with those of our benchmark (the autoregressive model) and of two other models including the sentiment indicator (i.e., sentiment together with the AR(2) terms and the SBS ERK model also including sentiment).
Moreover, to go beyond the aggregate measures and get a complete picture of the SBS performance, we investigated the individual components (prevalence, diversity, and connectivity) separately.
Lastly, we considered a model based on BERT encodings 65 as an additional forecasting baseline.
In particular, this model was based on a neural network that processed encodings extracted by a pre-trained BERT model.In the following, the encodings extraction stage is first detailed, and then the neural network structure and its optimization are described.
Since the news articles considered in this work are written in Italian, we used a BERT tokenizer to preprocess the news articles and a BERT model to encode them, both pre-trained on a corpus including only Italian documents.
The BERT model for the computation of the encodings processes input vectors with a maximum of 512 tokens.Therefore, a strategy to handle vectors with more than 512 elements is necessary.In this work, we considered and compared two variants.
The first (referred to as BERT-truncated) considered only the first 30% of the tokens resulting from the tokenization procedure of the input news article.We truncated or padded the token vector with zeros to get 510 elements and added the classification [CLS] and separation [SEP] tags.The resulting vector was fed into a pretrained BERT encoder, which computed a 768-element encoding vector for each token.Among these, we only considered the encoding of the [CLS] token to represent the news article, as it captures BERT's understanding at the news level.
In a second approach (referred to as BERT-chunk), we divided the token vector of each article into chunks of 510 elements, adding at the beginning and the end the [CLS] and [SEP] tags, respectively.The last chunk was padded to 512, if necessary.The BERT model then processed each chunk to extract the embeddings associated with the [CLS] tag, as in the BERT-truncated case.The embeddings of the [CLS] tags of all the chunks were then averaged to obtain a vector representing the full news article.
In both cases, the encodings of the [CLS] tokens for all the news articles in a week were averaged to obtain a vector summarizing the information for that week.
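The two strategies for fitting articles into the 512-token limit, and the averaging of [CLS] encodings, can be sketched without loading a model. The special-token ids below are hypothetical placeholders (real values depend on the tokenizer's vocabulary), and the function names are introduced only for this illustration; the encoder call itself is omitted.

```python
# Hypothetical ids for [CLS], [SEP], and padding; real ids come from
# the vocabulary of the pre-trained tokenizer.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def truncate_tokens(token_ids, keep_ratio=0.3, body_len=510):
    """BERT-truncated input: first 30% of tokens, cut or zero-padded to
    510 elements, then wrapped with [CLS] and [SEP]."""
    kept = token_ids[: int(len(token_ids) * keep_ratio)][:body_len]
    kept = kept + [PAD_ID] * (body_len - len(kept))
    return [CLS_ID] + kept + [SEP_ID]


def chunk_tokens(token_ids, body_len=510):
    """BERT-chunk inputs: consecutive 510-token chunks, each wrapped with
    [CLS] and [SEP]; the last chunk is padded to 512 if needed."""
    chunks = []
    for start in range(0, len(token_ids), body_len):
        chunk = [CLS_ID] + token_ids[start : start + body_len] + [SEP_ID]
        chunk += [PAD_ID] * (body_len + 2 - len(chunk))
        chunks.append(chunk)
    return chunks


def mean_vector(vectors):
    """Element-wise average of equally sized vectors, e.g. the [CLS]
    encodings of an article's chunks or of all articles in a week."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]
```

Under this scheme `mean_vector` is applied twice in the BERT-chunk variant: once over the chunks of an article and once over the articles of a week.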
To nowcast CCI indexes, we trained a neural network that took the BERT encoding of the current week and the last available CCI index score (of the previous month) as input.The network comprised a hidden layer with ReLU activation, a dropout layer for regularization, and an output layer with linear activation that predicts the CCI index.
As with the other forecasting models, we implemented an expanding window approach to generate our predictions. Specifically, we started with an initial subset of data to train the neural network and make a first prediction. Additionally, we tested a neural network architecture with recurrent layers to explicitly model temporal dependencies. However, the performance we obtained was worse than the non-recurrent version we reported in the Results section. This is probably due to the limited number of training samples, which are insufficient to optimize the more complex recurrent model.
Table 4 illustrates the mean square forecasting errors (MSFEs) relative to the AR(2) forecasts. The numbers in the table represent the forecasting error of each model with respect to the AR(2) forecasting error. We used the Diebold-Mariano test 66 to determine if the forecasting errors of each model were statistically worse (in italic) than the best model, whose RMSFEs are highlighted in bold.
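The relative error measure reported in Table 4 amounts to a ratio of two mean square errors. A minimal sketch, with hypothetical function names and no significance testing (the Diebold-Mariano test is not implemented here):

```python
def msfe(actual, forecast):
    """Mean square forecasting error over paired observations."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def relative_msfe(actual, model_forecast, benchmark_forecast):
    """MSFE of a model divided by the MSFE of the benchmark.

    Values below 1 mean the model forecasts better than the benchmark.
    """
    return msfe(actual, model_forecast) / msfe(actual, benchmark_forecast)
```

Read this way, an entry below 1 in the table indicates an improvement over the AR(2) benchmark, and an entry above 1 a deterioration.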
The empirical findings indicate that SBS ERK models produce the most accurate forecasts for Climate Overall, Personal, and Economic Climate, while adding sentiment leads to the best forecasting of Future Climate.
Looking at the SBS components, all of them are equally accurate in forecasting Personal Climate, while connectivity is also the best performer for Economic and Current Climate (for the latter, together with diversity). Note that both the AR and BERT models always differ significantly from the best performer, while AR(2) + Sentiment performs worse than the best model for 3 of the 5 variables.
The worse performance of the BERT models can be attributed to the insufficient number of training samples, which hinders the neural network's ability to learn the forecasting task and generalize to unseen samples. A much larger dataset would be required to effectively leverage the high dimensionality of BERT encodings and model the complex dependencies between news and CCI indexes. Interestingly, the BERT-chunk model performed approximately the same as the BERT-truncated one. This is in line with the idea that most of the relevant information of a news article is contained at its beginning or that online readers focus mainly on the headline and the lead 67.

Discussion and conclusions
Our research sheds light on the importance of incorporating diverse data sources in economic analysis and highlights the potential of text mining in providing valuable insights into consumer behavior and market trends. Through the use of semantic network analysis of online news, we conducted an investigation into consumer confidence. Our findings revealed that media communication significantly impacts consumers' perceptions of the state of the economy. Figure 4 shows the economic-related keywords that can have a major role in influencing consumer confidence (those with the most significant Granger-causality scores, as presented in Section "Results"). Through a granular analysis of the dimensions of consumer confidence, we found that the extent to which the news impacts consumers' economic perception changes depending on whether we consider people's current or prospective judgments. Our forecasting results demonstrate that the SBS indicator predicts most consumer perception categories better than the language sentiment expressed in the articles. ERKs seem to have a greater impact on the Personal climate, i.e., consumers' perception of their current ability to save, purchase durable assets, and feel economically stable. In addition, we find a disconnect between the ERKs' impact on the current and future assessments of the economy, which is aligned with other studies 68,69. While the Consumer Confidence Index has often been considered a suitable predictor of economic growth and a good indicator of consumers' optimism about the current economy, short-term estimations may show deviations from long-term trends, likely caused by nonsystematic shocks.
Lastly, keywords associated with national or European political decisions seem to lead to more uncertainty and pessimism. This is consistent with other empirical evidence demonstrating how the conduct of politics (in our case, at both the national and European levels) plays a role in determining how consumers feel about the economy's future in both the long and short run 9. The higher prominence and predictive power of political keywords, whether they refer to economic or non-economic concerns, have been considered in past research among the key determinants of consumers' perception of the future of the economy 26,70.
These results are aligned with previous studies showing how exposure to uncertain information makes people feel uncertain and more pessimistic about their future 11,14,71. People's reaction was more positive when keywords were associated with clear financial concepts (e.g., gold or monetary policy). When keywords were related to political discussions or concepts like rent, the role of Europe, or retirement, people's reaction was more negative. Interestingly, the keyword "gold" had an impact on determining consumer confidence in six of the nine questions: Evaluation of the Economic situation in Italy; Evaluation of the household economic situation; Evaluation of the household budget; Current Opportunities for Savings; Current Opportunities of Purchasing Durable Goods; and Expectations on the economic situation of Italy. During economic downturns, such as the 2018 and 2019 recession in Italy, financial institutions often increase their holdings of gold as reserve assets. This may be due to the perception that gold is a safe and stable investment during times of economic uncertainty. As a result, consumers may view this move positively, as it signals financial stability and security within the institution. As demonstrated by a study commissioned by the IMF 72, macroeconomic announcements have a significant impact on both the price of gold and consumer confidence.
Even if exploratory in nature, our study suggests that news has important implications for consumer confidence during economic recessions, not only during an economic expansion, as suggested by recent research 37. Overall, our models confirm the important role played by the media in shaping current judgments and future expectations 11, and the impact that national and European politics have on shaping these assessments 9.
This article investigates the antecedents of consumer confidence by analyzing the importance of economic-related keywords as reported in online news. After mining online Italian news over a period of four years, we found that most of the selected keywords impact how consumers perceive their personal economic situation.
Overall, this study offers valuable insights into the potential of semantic network analysis in economic research and underscores the need for a multidimensional approach to economic analysis. This study contributes to consumer confidence and news literature by illustrating the benefits of adopting a big data approach to describe current economic conditions and better predict a household's future economic activity. The methodology in this article uses a new indicator of semantic importance applied to economic-related keywords, which promises to offer a complementary approach to estimating consumer confidence, lessening the limitations of traditional survey-based methods. The potential benefits of utilizing text mining of online news for market prediction are undeniable, and further research and development in this area will undoubtedly yield exciting results. For example, future studies could consider exploring other characteristics of news and textual variables connected to psychological aspects of natural language use 73 or consider measures such as language concreteness 74.

Figure 2. Temporal disaggregation of consumer confidence series.

Table 2. Granger causality tests and cross-correlation signs of ERKs series and Consumer Climate (and its four components). *p < .10. **p < .05. ***p < .01.

Table 4. Forecasting results. Bold figures indicate the best forecasting models. Italic figures are those for which the Diebold-Mariano test rejects the null hypothesis of equal predictive accuracy with respect to the best forecasting model at a significance level of 0.1 (*), 0.05 (**), or 0.01 (***).