The language of happiness in self-reported descriptions of happy moments: Words, concepts, and entities

This article attempts to study the language of happiness from a double perspective. First, the impact and relevance of sentiment words and expressions in self-reported descriptions of happiness are examined. Second, the sources of happiness mentioned in such descriptions are identified. A large sample of “happy moments” from the HappyDB corpus is processed using advanced text analytics techniques. The sentiment analysis results reveal that positive lexical items play a limited role in the description of happy moments. For the second objective, unsupervised machine learning algorithms are used to extract and cluster keywords, and the resulting semantic classes are labeled manually. Results indicate that these classes, linguistically materialized in compact lexical families, accurately describe the sources of happiness. This result is reinforced by our named entity analysis, which also reveals the important role that commercial products and services play as a source of happiness. Thus, this study attempts to provide methodological underpinnings for the automatic processing of self-reported happy moments and contributes to a better understanding of the linguistic expression of happiness, with interdisciplinary implications for fields such as affective content analysis, sentiment analysis, and cultural, social and behavioural studies.


Introduction
Happiness has been the focus of a large body of research across many fields, from sociology and psychology to philosophy and economics, with an equally broad range of perspectives, objectives, and methodologies. Social surveying in the form of self-reported assessment of happiness is perhaps the traditional way in which researchers have attempted to measure this human emotion, particularly in psychology. This method, however, has been shown to have serious pitfalls: people tend to base their judgements on events and moods they have experienced recently and consequently disregard their life experience as a whole (Zajchowski et al. 2017). In addition, self-reports are bound to their context, as they cannot measure people's actual well-being equally across languages and cultures (Wierzbicka, 2004, 2011). The language of happiness is thus not exempt from conceptual issues: words such as happiness and happy evoke particular cognitive scenarios, which not only distinguish them from each other but also set them apart from their conceptual counterparts in other languages. As a result, these words have no exact semantic equivalents in other languages and cannot be translated interchangeably. The English happiness and the German Glück, for example, do not share the same meanings, as the latter is often used to refer to good fortune rather than to happiness itself, and its use is closer to that of the French bonheur than to that of the English happiness.
With the advent of social media and the availability of large amounts of user-generated content, researchers have turned to readily available text as raw data from which to monitor opinions, emotions, and, generally speaking, sentiment. This is the obvious advantage of leveraging social media to measure social happiness, as acquiring such data would otherwise involve costly and time-consuming processes.
Studies that illustrate this trend include Kramer (2010), who attempted to measure users' happiness by analyzing their Facebook status updates and identifying the positive and negative words found in them, and Dodds et al. (2011), who carried out a large-scale hedonometric study of 4.6 billion tweets over a 33-month period, measuring social happiness based on the presence of a set of 10,000 words whose happiness scores had been obtained through crowdsourcing. The assumption in these and similar studies is that positive words are used to describe happiness-related events. Since the volume of text that social media produces vastly exceeds the limits of qualitative techniques, researchers rely on software tools and resources such as sentiment dictionaries, either compiled ad hoc or readily available.
Examples of such sentiment dictionaries include the Harvard General Inquirer (Stone and Hunt, 1963), MPQA (Wilson et al. 2005), Bing Liu's Opinion Lexicon (Hu and Liu, 2004), SentiWordNet (Baccianella et al. 2010), SO-CAL (Taboada et al. 2011), EmoLex (Mohammad and Turney, 2010), VADER (Hutto and Gilbert, 2014), and SenticNet (Cambria et al. 2020). These resources have different features and thus lend themselves to different tasks and purposes. Generally, they are word lists with varying degrees of sentiment information, although some (e.g., SenticNet) are conceptually structured and offer a rich array of emotion-related data. The simplest ones, such as the Harvard General Inquirer, consist of a polarity lexicon of English single-word items classified as either positive or negative; others also encode the intensity of the polarity, sometimes referred to as valence (e.g., VADER); and yet others offer richer information that attempts to tackle emotion categorization (EmoLex). Accordingly, their applications range from simple binary polarity classification to emotion detection and classification.
Reliance on this type of lexical resource, nevertheless, is not without its issues. The most obvious limitation is that coverage largely determines the quality of the results: if a certain lexical item is not included in the sentiment dictionary, it will not be identified by the search algorithm. Multiword expressions also pose a well-known challenge, as they have traditionally been neglected by the creators of such resources (Constant et al. 2017); in fact, none of the above-mentioned lexica contain such multiword units. A related issue is the format in which lexical items are listed in the dictionary (as word forms, lemmas, or stems), which determines the search strategy and may ultimately skew results.
The second problem is harder to overcome: the semantics of individual words and phrases may be altered by the context in which they appear, sometimes to the point that they come to mean the opposite of what they initially denote, which is especially true of sentiment words. For example, the polarity of the positive word "happy" is inverted simply by a preceding negative adverb such as "not" or "never". A good lexicon-based sentiment analysis system therefore needs to account for the impact of such contextual shifters, which is not an easy task. For instance, we can implement a rule that inverts the sentiment of "happy" when it is preceded by "never" within a span of 3 words. This rule would correctly classify expressions such as "I never was really happy there" as negative, but it would also incorrectly classify cases like "I've never been so happy before" as negative. Various systems of contextual shifters have been developed within the field of sentiment analysis (e.g., Kennedy and Inkpen, 2006; Moreno-Ortiz and Pérez-Hernández, 2018; Polanyi and Zaenen, 2006). However, the difficulty posed by sentence-level context pales in comparison to that of higher-order levels of linguistic analysis: discourse-related phenomena such as the metaphorical use of words, irony, sarcasm, understatement, or humblebragging (all of which are pervasive in social media) constitute a serious problem for which no practical solutions are available at present.
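To make the limitation concrete, the following minimal sketch implements the 3-word "never" rule described above. The two-word lexicon, the negator set, and the whitespace tokenization are illustrative assumptions, not taken from Lingmotif or any published system. The rule yields the same (negative) result for both example sentences, even though only the first one is actually negative.

```python
# Toy polarity lexicon and negator set: illustrative assumptions only.
LEXICON = {"happy": 1, "sad": -1}
NEGATORS = {"not", "never"}


def naive_polarity(sentence: str, window: int = 3) -> int:
    """Sum word polarities, inverting a word's polarity when a negator
    occurs among the `window` tokens immediately preceding it."""
    tokens = [t.strip(".,!?") for t in sentence.lower().split()]
    score = 0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0)
        preceding = set(tokens[max(0, i - window):i])
        if polarity and NEGATORS & preceding:
            polarity = -polarity  # shifter found inside the window: invert
        score += polarity
    return score


print(naive_polarity("I never was really happy there"))   # negative, correctly
print(naive_polarity("I've never been so happy before"))  # negative, wrongly
```

Fixing the second case would require knowledge the window rule lacks (here, that "never ... so ... before" expresses an unprecedented degree rather than a negation), which is why such rules tend to multiply into long lists of exceptions.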
Yet another important caveat to the sentiment-lexicon approach, regardless of the quality of the sentiment dictionary employed and the sophistication of the context-processing algorithm, is that the expression of happy events or situations might not contain any sentiment-laden word at all. For example, the question "what makes you happy?" may elicit responses such as "going for a walk in the park" or "watching the rain with a cup of coffee in my hand", where no positive (or negative) words are present. It follows that the language of happiness involves much more than the presence or absence of sentiment-carrying words.
In this work we set out to investigate the sources of happiness mentioned in self-reported happy moments by identifying the relevant concepts and entities. Seligman (2002) posited the existence of three main sources of happiness in his theory of authentic happiness, based on the usage of this word in the English language: happiness as pleasure (a hedonic enjoyment based on the quest for comfort), happiness as engagement (a state of focus on a specific task), and happiness as meaning (a connection to something that goes beyond oneself through the use of one's capabilities). Myers (2000) and Argyle (2001) also emphasized the role of social interactions, work, and leisure as variables that influence happiness. While social relationships are thought to be a source of joy, work can positively influence one's self-esteem, and leisure activities can provide a sense of identity and relaxation (Crossley and Langdridge, 2005). Visakko and Voutilainen (2020) add mental states, social entities, and material possessions to this list. Furthermore, they distinguish between happiness itself and happiness as an experience, and argue that although models of happiness target private mental processes, "talk about happiness may occur independently of those objects of experience". Kahneman (2011) also established a distinction that affects these sources of happiness and how they are expressed: the experiencing self (who lives in the present and describes the current or general emotional state) and the remembering self (the storyteller who recalls happy and/or unhappy memories). The main objective of this study is to offer an analysis of the language of happiness in self-reported happy moments written in English that takes the issues discussed above into account and is grounded in quantitative evidence.
Our work, however, is not aimed at measuring human emotions but at understanding and documenting their verbal expression, through the identification and quantification of linguistic elements such as reported entities or activity types. It must also be noted that this study concerns the English language only. The derived concepts are therefore relevant to English exclusively, as cross-linguistic research has shown that the concepts associated with the expression of happiness in English cannot be extended to other languages (Wierzbicka, 2011; Kavanova et al. 2021). Specifically, we aim to answer the following questions:
Research question 1: What is the precise relevance of sentiment-laden words and expressions in self-reported descriptions of happy moments?
Research question 2: Are there significant differences in the types of sentiment words used to express the various categories of happy moments?
Research question 3: What sources of happiness are mentioned in self-reported happy moments, and how are they materialized in language?

Method
Our sample is HappyDB (Asai et al. 2018), a freely available, crowdsourced corpus of over 100,000 expressions of happiness compiled to serve as a linguistic resource for research on the language of happiness in English. 1 Our analysis is meant to be as thorough as possible and to employ state-of-the-art methodologies and tools. Given the size of the corpus (n = 100,922 happy moments), we mainly use quantitative Natural Language Processing (NLP) techniques, but we also provide a number of qualitative observations on the results of the quantitative analysis. We employ the following text analysis techniques and methodologies: basic text statistics; sentiment analysis (using both machine-learning and lexicon-based approaches); keyword extraction and classification (using word embeddings, graph-based ranking algorithms, and clustering algorithms); and entity analysis, using a combination of neural NER (Named Entity Recognition) and morpho-syntactic pattern matching.

Data preparation
The authors of HappyDB recruited 10,843 workers through Amazon Mechanical Turk (MTurk) over a three-month period, who generated an average of 9.31 moments per worker. Workers on MTurk are paid for their contributions, so this compilation method skews the sample in certain ways, as the authors acknowledge, especially in terms of age: nearly half of the informants are between 20 and 30 years old, and almost 30 percent are in the 30-40 range. Furthermore, although most informants are US residents (86 percent), no indication is given as to whether they are native speakers or as to their social status in terms of income. The former variable may have an impact on the vocabulary used to express ideas and emotions (Adib et al. 2019), whereas the latter may determine the types of activities (e.g., leisure, family) that are described. 2 The original responses were automatically cleaned and spell-checked by the authors. However, on closer examination, we found further issues with the dataset, derived mainly from the collection method. Many informants simply copied and pasted text from online sources, such as blog entries, resulting in a large number of invalid examples, some of them repeated dozens of times. These "fake" happy moments amounted to over 9000 entries (for example, the Wikipedia entry for "happiness" was found 42 times in the "cleaned_hm.csv" dataset). Moreover, many of them consisted of very long paragraphs and even full essays, which gives them a disproportionate impact on the overall dataset. We performed a semi-automatic cleansing of the dataset, in which we also removed a number of responses that had obviously been produced by informants with very limited English literacy. We thus ended up with 91,608 happy moments, down from the 100,922 reported by the authors.
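The automatic half of such a semi-automatic cleansing could look like the sketch below, which flags verbatim repeats and unusually long responses for manual review. It assumes a Pandas DataFrame whose text column is named `cleaned_hm` (as in the released CSV file); the thresholds `min_repeats` and `max_words` are hypothetical values chosen for illustration, not the criteria actually applied in this study, and the manual review step is not shown.

```python
import pandas as pd


def flag_suspect_moments(df: pd.DataFrame, text_col: str = "cleaned_hm",
                         min_repeats: int = 5, max_words: int = 100) -> pd.DataFrame:
    """Return a copy of `df` with helper columns and a boolean `suspect` flag.

    A moment is flagged when its normalized text occurs at least `min_repeats`
    times in the dataset (likely copy-pasted) or when it is longer than
    `max_words` whitespace-delimited words (likely an essay, not a moment).
    Both thresholds are hypothetical defaults.
    """
    normalized = df[text_col].str.strip().str.lower()
    out = df.copy()
    out["repeat_count"] = normalized.map(normalized.value_counts())
    out["n_words"] = df[text_col].str.split().str.len()
    out["suspect"] = (out["repeat_count"] >= min_repeats) | (out["n_words"] > max_words)
    return out
```

Flagging rather than deleting keeps a human in the loop: short, genuinely shared moments (e.g., "I went to the gym.") can legitimately recur, so repetition alone should not remove them automatically.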
3 A relevant feature of HappyDB is the classification of moments into a set of categories based on (in the words of the authors) "research in positive psychology that also reflects the contents of [...]": ACHIEVEMENT, AFFECTION, BONDING, ENJOY THE MOMENT, EXERCISE, LEISURE, and NATURE. 4 Rather than classifying the complete set of moments by hand, the authors performed the classification automatically by training a logistic regression classifier on a subset of 15,000 moments manually labeled by 5 workers, 3 of whom had to agree for a label to be considered valid. The performance figures the authors report for this classification task vary considerably, from F1 = 0.92 for AFFECTION to just 0.54 for ENJOY THE MOMENT.
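The labeling scheme above (a label is valid only if at least 3 of the 5 annotators agree, then a logistic regression classifier is trained on the agreed labels) can be sketched with scikit-learn as follows. The TF-IDF features, the annotated examples, and their votes are assumptions made for illustration; the HappyDB authors' actual feature set and training data are not reproduced here.

```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def majority_label(votes, min_agree=3):
    """Return the label chosen by at least `min_agree` annotators, or None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None


# Hypothetical annotated sample: (moment, votes from five annotators).
annotated = [
    ("I got promoted at work", ["ACHIEVEMENT"] * 4 + ["BONDING"]),
    ("I finished my thesis", ["ACHIEVEMENT"] * 3 + ["LEISURE"] * 2),
    ("My daughter hugged me", ["AFFECTION"] * 5),
    ("My husband cooked dinner for me", ["AFFECTION"] * 3 + ["ENJOY THE MOMENT"] * 2),
    ("I watched a movie", ["LEISURE", "LEISURE", "ENJOY THE MOMENT", "ENJOY THE MOMENT", "BONDING"]),
]

# Keep only the moments whose label reached the agreement threshold.
labeled = [(text, majority_label(votes)) for text, votes in annotated]
labeled = [(text, label) for text, label in labeled if label is not None]
texts, labels = zip(*labeled)

# TF-IDF features feeding a logistic regression classifier.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(texts, labels)
```

In this toy sample the movie moment is discarded because no label reaches 3 votes, which mirrors how low agreement on a category shrinks its training data; that is one plausible contributor to a weak F1 score such as the one reported for ENJOY THE MOMENT, though the text does not establish the cause.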
As for the analysis of sentiment and emotions, the authors offer two studies of the dataset: one based on running a small sample (n = 500) of moments through the Linguistic Inquiry and Word Count (LIWC) tool (Pennebaker et al. 2015), and the other using the Valence-Arousal-Dominance (VAD) model of emotion (Warriner et al. 2013). The former analysis concludes (unsurprisingly) that the HappyDB corpus is very disclosing and honest, which makes it useful for the study of human emotions. The VAD scores produced by the latter, which are also released with the corpus, point in the same direction.
Since its release, other studies have used the HappyDB corpus or related resources. The CL-Aff shared task on affective content analysis focused on the prediction of thematic labels (agency and sociality) and the characterization of happy moments in terms of affect, emotion, participants, and content (Jaidka et al. 2019). The corpus has also been used in studies that relate sociological and demographic features of the contributors (such as age, gender, marital status, or country of origin) to specific moments and categories of happiness found in the corpus (Adib et al. 2019). These authors studied, for instance, the correlation between gender and number of contributions (as participants generated multiple contributions to the dataset). They found that male participants contributed more to the dataset (an average of 11 observations) than female participants (an average of 8), even though the numbers of female and male participants were similar, and that "people in the married group is happier when they eat steak whereas people in the single group prefers to eat pizza" (Adib et al. 2019, p. 659).
Data analysis methods. As a first approximation, we compute basic text statistics on the cleaned dataset, including more linguistically oriented data than those offered in the original paper. We then perform more sophisticated analyses, which we describe below.
Sentiment analysis. Sentiment analysis, or opinion mining, is the field of NLP that analyzes "people's opinion, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes" (Liu, 2011, p. 459). Its basic tasks are polarity detection and emotion recognition, both typically framed as classification tasks (Cambria et al. 2017). Generally speaking, corpus-based (i.e., machine-learning) approaches predominate in both industry and research, since they have been shown to excel in classification tasks. Lexicon-based approaches, which use a sentiment lexicon to assign a polarity to each word or phrase found in the text, are also employed. The current state of the art in sentiment classification is offered by neural language models based on the Transformer architecture (Vaswani et al. 2018), which are pretrained on large amounts of unlabeled text and then fine-tuned on labeled data for the target task. Such models, starting with BERT (Devlin et al. 2019), have been shown to improve on previous top benchmark scores across numerous NLP tasks, in both natural language understanding and generation, including sentiment analysis (Wolf et al. 2020).
Machine-learning sentiment classifiers, regardless of their accuracy, pose an important limitation for our objectives, however. They can tell us what proportion of happy moments are classified as positive, but this is only a proxy for the main questions we attempt to answer in this section: what proportion of the words in happy moments are positive, and which words these are. A sentiment classifier is a predictive model that acts as a black box: it provides no explanation of how a classification result was arrived at, only (in certain cases) a confidence score. This is true of all machine-learning classifiers, but especially of neural networks. This is where lexicon-based sentiment analysis systems are useful, since they determine the overall sentiment of a text by identifying specific sentiment words and phrases, which they can list in their output along with the classification. These systems require rich lexical knowledge to achieve good results across domains, and the quality and coverage of their lexicons are the most decisive factors in their performance.
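The following minimal sketch shows why lexicon-based output is explainable: along with the label, it returns the exact lexicon items that produced the score, including a multiword expression. The mini-lexicon, its values, and the "NEUTRAL" label for matched-but-balanced texts are hypothetical; this is not Lingmotif's algorithm, which also applies sentiment shifters and a far larger lexicon.

```python
import re

# Hypothetical mini-lexicon; real resources such as Lingmotif-lex are far larger.
LEXICON = {
    "happy": 2, "enjoy": 2, "tired": -1, "nice": 1, "sick": -2,
    "over the moon": 3,  # multiword expression
}


def lexicon_sentiment(text):
    """Return (label, score, matches), where matches lists (item, value) pairs found."""
    text = text.lower()
    matches = []
    # Try longer entries first so a multiword expression wins over its parts.
    for entry in sorted(LEXICON, key=len, reverse=True):
        pattern = r"\b" + re.escape(entry) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            matches.extend([(entry, LEXICON[entry])] * hits)
            text = re.sub(pattern, " ", text)  # consume the span so it is not counted twice
    score = sum(value for _, value in matches)
    if not matches:
        label = "NONE"  # no sentiment words at all
    elif score > 0:
        label = "POSITIVE"
    elif score < 0:
        label = "NEGATIVE"
    else:
        label = "NEUTRAL"  # hypothetical label for balanced texts
    return label, score, matches


print(lexicon_sentiment("Going for a walk in the park"))
print(lexicon_sentiment("I was over the moon even though I was tired"))
```

The first call illustrates the problem discussed earlier (a happy moment with no sentiment words is labeled NONE), while the second shows the explanatory advantage: the matched items are returned alongside the label.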
We first carry out a sentiment classification task using both types of tool and compare their results. We then use the lexicon-based tool to offer an in-depth study of the sentiment-carrying words and expressions found in HappyDB's happy moments. Our machine-learning classifier is the Hugging Face Transformers (Wolf et al. 2020) implementation of the DistilBERT (Sanh et al. 2019) language model, fine-tuned for sentiment analysis on the Stanford Sentiment Treebank v. 2 (Socher et al. 2013). We used the built-in pipeline released with Hugging Face Transformers, which returns, for each text input, a binary classification (POSITIVE, NEGATIVE) and a confidence score. As for the lexicon-based sentiment analysis system, we used version 2.0 of Lingmotif (Moreno-Ortiz, 2017), a multilingual, multi-platform, lexicon-based tool that can be used with general-language and domain-specific texts. Lingmotif 2.0 is implemented as a Python 3 library, accessed through a REST API with a user-friendly web-based interface. 5 Lingmotif determines the semantic orientation of a text by detecting linguistic expressions of polarity, but it also offers a rich set of quantitative data and graphic visualizations of the sentiment metrics. 6 The lexical resources contained in the application constitute the core of the sentiment engine, named Lingmotif-lex (Moreno-Ortiz and Pérez-Hernández, 2018), a manually curated, wide-coverage, domain-neutral English sentiment lexicon that contains over 28,000 single-word forms and over 38,000 multiword expressions. Lingmotif also allows the use of optional user-generated plugin lexicons to account for domain specificity: lexical information contained in a domain-specific plugin lexicon overrides that included in Lingmotif's core lexicon. In addition, the system implements a comprehensive set of sentiment shifters to account for context-dependent modifications of valence.
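The override rule for plugin lexicons (plugin entries take precedence over core entries) maps naturally onto a layered lookup. The sketch below uses Python's `collections.ChainMap`; the entries and values are hypothetical, and Lingmotif's real lexicon format is not shown in the text.

```python
from collections import ChainMap

# Hypothetical core entries (not Lingmotif-lex's actual format or values).
CORE_LEXICON = {"happy": 2, "cold": -1}
# Hypothetical food-and-drink plugin, where "cold" (as in "a cold beer") is positive.
PLUGIN_LEXICON = {"cold": 1}

# Lookups search the plugin first and fall back to the core lexicon.
LEXICON = ChainMap(PLUGIN_LEXICON, CORE_LEXICON)

print(LEXICON["cold"])   # value taken from the plugin
print(LEXICON["happy"])  # value taken from the core lexicon
```

A `ChainMap` leaves the core lexicon untouched, so several domain plugins can be swapped in and out without copying or editing the shared resource.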
Keyword analysis. Keyword identification and extraction is a well-known method for summarizing the contents of texts and identifying the topics being discussed. Quantitative and qualitative keyword analyses of our corpus allow us to (i) categorize the thematic areas relevant to the informants' descriptions of happiness, and (ii) examine which thematic areas, which we term conceptual classes, can be found in each HappyDB category, their lexical tightness, and their distribution across categories. Ultimately, this should help us identify the sources of happiness attested in our corpus.
Several keyword extraction methods have been proposed; they are usually classified according to the approach they employ. Siddiqi and Sharan (2015) mention the following: linguistic (i.e., rule-based), statistical, machine learning (both supervised and unsupervised), and domain-specific approaches. Our extraction technique of choice is the unsupervised machine-learning algorithm known as TextRank (Mihalcea and Tarau, 2004). This decision was made after comparing its results with those of two other methods: the statistical approach commonly used in Corpus Linguistics research, which compares word frequencies in the focus corpus with those in a reference corpus, and the RAKE algorithm (Rose et al. 2010).
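The statistical (reference-corpus) approach mentioned above can be sketched as follows. The text does not say which keyness measure was used in the comparison; we assume the log-likelihood statistic, a common choice in Corpus Linguistics, and the word counts are invented for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for a word with frequency a in the focus corpus
    (size c) and frequency b in the reference corpus (size d)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the focus corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(focus: Counter, reference: Counter, top=10):
    """Rank words over-represented in the focus corpus by log-likelihood."""
    c, d = sum(focus.values()), sum(reference.values())
    scored = [
        (word, log_likelihood(a, reference.get(word, 0), c, d))
        for word, a in focus.items()
        if a / c > reference.get(word, 0) / d  # keep positive keywords only
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


focus = Counter({"happy": 30, "the": 100, "daughter": 20})
reference = Counter({"happy": 5, "the": 200, "daughter": 2})
print(keywords(focus, reference))
```

Unlike TextRank, this method needs a suitable reference corpus, and its results depend heavily on how that corpus is chosen.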
TextRank employs a graph-based ranking model, which makes it language-independent, and it produces good results in general. We used the PyTextRank implementation (Nathan, 2016), which includes several improvements over the original algorithm; notably, it uses lemmas instead of stems to construct the graphs, and it includes verbs in addition to nouns. We did, however, modify its output slightly to filter out some unwanted results derived from the corpus compilation method (e.g., "amazon", "mechanical", "turk"), as well as punctuation marks, which were often returned as lemmas in keyphrases.
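The core ranking step of TextRank can be sketched in plain Python: candidate lemmas become nodes, co-occurrence within a small window creates undirected edges, and a PageRank-style iteration scores the nodes. This sketch is our simplification, not PyTextRank's code, which additionally selects candidates by part of speech and assembles ranked keyphrases from noun chunks. The blocklist reproduces the output filter described above; the window size, damping factor, and iteration count are common defaults assumed here.

```python
from collections import defaultdict

# Terms filtered out because they stem from the collection method (see above).
BLOCKLIST = {"amazon", "mechanical", "turk"}


def candidates(lemmas):
    """Lowercase lemmas, dropping blocklisted terms and punctuation-only tokens."""
    return [
        lemma.lower() for lemma in lemmas
        if lemma.lower() not in BLOCKLIST and any(ch.isalnum() for ch in lemma)
    ]


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score tokens with a PageRank-style iteration over an undirected graph
    linking tokens that co-occur within `window` positions."""
    neighbors = defaultdict(set)
    for i, tok in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != tok:
                neighbors[tok].add(tokens[j])
                neighbors[tokens[j]].add(tok)
    scores = dict.fromkeys(neighbors, 1.0)
    for _ in range(iterations):
        # Each node receives a share of its neighbors' scores.
        scores = {
            v: (1 - damping) + damping * sum(scores[u] / len(neighbors[u]) for u in neighbors[v])
            for v in neighbors
        }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = candidates(["Dinner", ",", "family", "dinner", "friend", "dinner", "movie", "Amazon"])
print(textrank(tokens)[0])
```

The most connected candidate ("dinner", which co-occurs with every other word here) receives the highest score. This is the intuition behind TextRank: a word is important if it is linked to many other words in the text.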
Our procedure for analyzing keywords followed these steps:
1. Extraction of keywords by category: PyTextRank returns a ranked list of candidate keywords and keyphrases, together with a keyness score in the range 0-1. An additional complication is that it runs as a pipeline component of the SpaCy (Honnibal et al. 2020) NLP package, which has a limit of 100,000 characters for Doc objects. We therefore extracted keywords in batches and then aggregated the results by adding up the scores of individual keywords. We filtered the output as mentioned above and saved the full output. 7 The number of keywords extracted in this step is proportional to category size (see Fig. 1), ranging from over 13,000 for ACHIEVEMENT to under 1000 for EXERCISE.
2. Vectorization of keywords: in order to study the semantics of keywords, we obtained the word embeddings for the top n keywords in each category. We used the "en_core_web_lg" language model to get the vectors from SpaCy's Doc objects for each keyword/keyphrase; the vectors therefore had 300 dimensions. Since our objective is to identify the sources of happiness within each of the HappyDB categories, and these are bound to be materialized in language as either nouns or verbs, we filtered out "noise" words from the keywords before obtaining their vectors. Thus, the keyphrase "a new cell phone" becomes "cell phone" before its vector is obtained.
3. Clustering of keywords within each category: in order to gain deeper insight into the sources of happiness in each category, we applied the K-means clustering algorithm to each set of vectors and generated 2-dimensional visualizations using t-SNE. The number of clusters for each category was arrived at by calculating the Silhouette Coefficient. We used the K-means, t-SNE, and silhouette implementations found in the scikit-learn package (Buitinck et al. 2013), and plotted the 2D visualizations with Matplotlib. The suggested number of clusters was then refined by means of informal qualitative examination. Finally, we checked each cluster for conceptual consistency and assigned it a suitable label.
ARTICLE HUMANITIES AND SOCIAL SCIENCES COMMUNICATIONS | https://doi.org/10.1057/s41599-022-01202-8
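Steps 1 and 3 can be sketched as below: splitting a category's moments into batches that respect the 100,000-character limit, summing keyword scores across batches, and choosing the number of K-means clusters by the mean silhouette coefficient. The batching helper and the candidate range of k are our assumptions; the t-SNE plots and the manual refinement of k are not shown.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def batch_texts(texts, max_chars=100_000):
    """Yield newline-joined batches no longer than max_chars
    (a single text that is itself longer is yielded on its own)."""
    batch, size = [], 0
    for text in texts:
        if batch and size + len(text) + 1 > max_chars:
            yield "\n".join(batch)
            batch, size = [], 0
        batch.append(text)
        size += len(text) + 1  # +1 accounts for the joining newline
    if batch:
        yield "\n".join(batch)


def aggregate_scores(batch_results):
    """Sum each keyword's score across batches (each result: dict keyword -> score)."""
    total = Counter()
    for result in batch_results:
        total.update(result)
    return total


def best_k(vectors, k_range=range(2, 11), seed=0):
    """Return the k with the highest mean silhouette coefficient, plus all scores."""
    scores = {}
    for k in k_range:
        if k >= len(vectors):  # silhouette requires at most n_samples - 1 clusters
            break
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(vectors)
        scores[k] = silhouette_score(vectors, labels)
    return max(scores, key=scores.get), scores
```

In practice, each batch would be passed through the SpaCy pipeline with PyTextRank, and `best_k` would be called on the 300-dimensional keyword vectors of one category. The silhouette coefficient only suggests a starting value, which, as described in step 3, was then adjusted by inspecting the clusters.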
Entity extraction and clustering. In addition to keywords, named entities can also provide clues to the sources of happiness. Entities such as commercial brands may point to certain types of consumer products, while cities or countries may be mentioned when discussing travel destinations. Entities add very specific information to keyword clusters as sources of happiness, as they can draw an accurate picture of our habits and preferences, mainly in terms of consumption tendencies and the identification of certain products, celebrations, or places as essential elements in our descriptions of happiness. Named Entity Recognition (NER) is one of the most relevant tasks in NLP and has received a great deal of attention over the years; as a result, current techniques offer high accuracy. Our approach to entity identification is hybrid, combining a state-of-the-art, neural, transition-based NER architecture (Lample et al. 2016) with a linguistic method based on the identification of part-of-speech patterns (e.g., sequences of proper nouns).
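The linguistic half of the hybrid approach can be sketched as a search for maximal runs of consecutive proper nouns, whose results are merged with the NER output. The input format, (text, POS) pairs using Universal Dependencies tags, and the simple union merge are assumptions. With SpaCy, the pairs could be built as `[(t.text, t.pos_) for t in doc]` and the NER spans as `[e.text for e in doc.ents]`.

```python
def propn_runs(tagged):
    """Return maximal runs of consecutive proper nouns as candidate entities."""
    runs, current = [], []
    for text, pos in tagged:
        if pos == "PROPN":
            current.append(text)
        elif current:
            runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs


def merge_entities(ner_spans, pattern_spans):
    """Union of NER output and pattern matches, keeping first-seen order."""
    return list(dict.fromkeys([*ner_spans, *pattern_spans]))


# Hypothetical tagged moment and NER output.
tagged = [("I", "PRON"), ("bought", "VERB"), ("a", "DET"), ("new", "ADJ"),
          ("Nintendo", "PROPN"), ("Switch", "PROPN"), ("at", "ADP"),
          ("Best", "PROPN"), ("Buy", "PROPN")]
print(merge_entities(["Best Buy"], propn_runs(tagged)))
```

The pattern step recovers product names such as "Nintendo Switch", which a statistical NER model may miss or mislabel, while the NER step contributes entities that do not surface as proper-noun sequences.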

Analysis and results
Basic text metrics. Table 1 offers descriptive statistics on a number of basic text metrics. All counts were obtained with the SpaCy (Honnibal et al. 2020) NLP package using the "en_core_web_lg" model, and statistics were computed with the Pandas Python library. The number of sentences should not be considered very accurate, owing to punctuation misuse (e.g., the use of commas instead of periods is common in user-generated text). The most relevant information in this table is the great variability in the length of the responses (termed "moments" by the authors), as indicated by the high variance and standard deviation of the number of tokens and words (tokens include all character sequences and punctuation marks; words are tokens consisting of alphanumeric characters, including stopwords). Another interesting datum is the relatively low average number of adjectives per moment, especially considering that users are meant to be describing emotionally relevant events. This is a first indication that the proportion of polarity-carrying words may be lower than expected, a point for which clearer evidence is offered below.
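The token and word counts behind such statistics can be sketched as follows. A crude regular-expression tokenizer stands in for SpaCy's tokenizer (an assumption that changes counts for contractions such as "I've"), while the token/word distinction follows the definition above: words are tokens made of alphanumeric characters.

```python
import re

import pandas as pd

# Crude stand-in for SpaCy's tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def text_metrics(moments):
    """Per-moment token and word counts, summarized as mean, std, and variance."""
    rows = []
    for moment in moments:
        tokens = TOKEN_RE.findall(moment)
        words = [t for t in tokens if t.isalnum()]  # alphanumeric tokens, stopwords included
        rows.append({"tokens": len(tokens), "words": len(words)})
    return pd.DataFrame(rows).agg(["mean", "std", "var"])


print(text_metrics(["I got a new job!", "My dog is happy."]))
```

Pandas computes the sample standard deviation and variance (ddof = 1) by default, which is the appropriate choice when the moments are treated as a sample.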
Specific text metrics for the seven HappyDB categories mentioned above (ACHIEVEMENT, AFFECTION, BONDING, ENJOY THE MOMENT, EXERCISE, LEISURE, and NATURE) are provided in the supplementary material (Supplementary Tables S1, S2).
As can be seen in Fig. 1, the categories are disproportionate in size, with ACHIEVEMENT and AFFECTION making up nearly 68 percent of the total, which suggests that these two categories are the most salient in defining people's happiness. The average number of words per sentence is also a revealing figure, as this metric is commonly used in readability formulas (higher numbers suggesting greater sentence complexity). Although other variables need to be accounted for, all things being equal, EXERCISE and LEISURE have substantially lower scores, which suggests that moments in these categories are more easily described.
As a first approximation to the lexical content of the dataset, Supplementary Table S3 shows the 50 most frequent (lexical) words in it. This is not a word list in the traditional sense, since we generated it by grouping word forms by lemma, excluding stopwords (very frequent grammatical items), and including only the four main lexical parts of speech: nouns, lexical verbs (excluding auxiliaries), adjectives, and adverbs. Most of these words can be grouped into a number of conceptual classes that recur in this corpus and already point to potential sources of happiness: friends and family, activities, feelings and emotions, food, and time references. The keyword analysis we carry out later refines the information offered in this table and allows us to cluster words around such conceptual classes so as to identify the most relevant ones.
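The filtering behind such a lemma-based frequency list can be sketched as follows. The function only relies on the `lemma_`, `pos_`, and `is_stop` attributes that SpaCy tokens expose, so it could be fed `doc` tokens directly. The `SimpleNamespace` tokens used here are stand-ins so that the sketch runs without a SpaCy model. Under Universal Dependencies tagging, auxiliaries are tagged AUX rather than VERB, which is one way of excluding them.

```python
from collections import Counter
from types import SimpleNamespace

# Universal Dependencies tags for the four main lexical parts of speech.
# Auxiliaries are tagged AUX, so they are excluded automatically.
LEXICAL_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def lemma_frequencies(tokens, top=50):
    """Count lemmas of non-stopword nouns, lexical verbs, adjectives, and adverbs."""
    counts = Counter(
        t.lemma_.lower() for t in tokens
        if t.pos_ in LEXICAL_POS and not t.is_stop
    )
    return counts.most_common(top)


# Stand-in tokens mimicking SpaCy's Token attributes.
tokens = [SimpleNamespace(lemma_=lemma, pos_=pos, is_stop=stop) for lemma, pos, stop in [
    ("go", "VERB", False), ("friend", "NOUN", False), ("be", "AUX", True),
    ("friend", "NOUN", False), ("really", "ADV", True), ("the", "DET", True),
    ("fun", "ADJ", False),
]]
print(lemma_frequencies(tokens))
```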
The sentiment of happy moments. Figure 2 displays the classification results of both sentiment analysis systems described above. Comparing these results, however, is not straightforward. The Transformers model produces a binary classification, whereas Lingmotif produces a three-class classification in which texts that contain no sentiment words at all are classified as "NONE". To make the results more directly comparable, we classified as "NONE" those cases in which the Transformers model produced a confidence score below 0.9. It is therefore the percentage of positive items that should be considered directly comparable, and, as the figure shows, this percentage is very similar for both systems.
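The mapping of the binary classifier's output onto three classes can be sketched as below. The pipeline returns one dictionary per input with `label` and `score` keys. The model identifier is the public DistilBERT checkpoint fine-tuned on SST-2, which we assume matches the one used. Loading is deferred to a helper so that the mapping itself can be used without downloading the model.

```python
def to_three_class(predictions, threshold=0.9):
    """Keep the binary label only when the confidence reaches the threshold;
    otherwise map the prediction to NONE."""
    return [p["label"] if p["score"] >= threshold else "NONE" for p in predictions]


def classify_moments(moments, threshold=0.9):
    """Classify moments with a DistilBERT model fine-tuned on SST-2."""
    from transformers import pipeline  # imported lazily; downloads the model on first use

    classifier = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )
    return to_three_class(classifier(moments), threshold)


print(to_three_class([{"label": "POSITIVE", "score": 0.98},
                      {"label": "NEGATIVE", "score": 0.61}]))
```

The 0.9 cut-off is a pragmatic choice rather than a calibrated one: raising it moves more moments into NONE, so the comparison with Lingmotif's NONE class depends on this parameter.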
If we bear in mind that this dataset contains expressions of happiness exclusively, the results are surprising, since we would expect the vast majority of happy moments to be classified as positive. This does not happen because of the absence of positive terms and/or the presence of negative items in the sentences, which leads to errors from a sentiment analysis perspective: instances that Lingmotif classified as neutral or negative but that should have been understood as positive, or at least as describing a happy event. Under this assumption, the Transformers classifier performs better than the lexicon-based one, but only by a very small margin, especially if we consider the number of moments classified as negative.
It is also worth looking at classification results by category. Table 2 shows the statistics resulting from Lingmotif's classification.
The result of the chi-square goodness-of-fit test (χ²(6) = 423.10, p < 0.001) provides evidence of what is already rather obvious: there are highly significant differences among categories in the distribution of "errors" (i.e., happy moments not classified as positive). Certain categories (AFFECTION, BONDING, ACHIEVEMENT) display a much lower error rate than others, meaning that they are expressed using more positive words than the rest (and are therefore classified as positive by the lexicon-based sentiment analyzer). Conversely, the LEISURE, ENJOY THE MOMENT, EXERCISE, and NATURE categories (in this order) display much higher error rates, meaning that more non-sentiment language (or negative words and expressions) is used to describe moments in these categories. It is also worth noting that these error rates do not correlate with sample sizes.
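One way to set up such a goodness-of-fit test is to compare the observed error counts per category with the counts expected if every category had the overall error rate. The text does not state the expected distribution, so this null hypothesis is our assumption. The counts below are invented, not the values in Table 2. With seven categories, the test has 6 degrees of freedom, matching the reported χ²(6).

```python
def chi_square_gof(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def error_gof(errors, sizes):
    """Test whether errors are spread across categories in proportion to their sizes.

    errors[i]: moments in category i not classified as positive
    sizes[i]:  total moments in category i
    Returns (statistic, degrees of freedom).
    """
    total_errors, total = sum(errors), sum(sizes)
    expected = [total_errors * n / total for n in sizes]  # overall error rate x category size
    return chi_square_gof(errors, expected), len(errors) - 1


# Invented counts for three equally sized categories.
print(error_gof([10, 20, 30], [100, 100, 100]))
```

Because the expected counts sum to the observed total by construction, `scipy.stats.chisquare(f_obs, f_exp)` would compute the same statistic and also return the p-value (the chi-square survival function at this statistic with k - 1 degrees of freedom).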
As for the identified sentiment items, Lingmotif finds mainly general positive terms associated with happiness and satisfaction across the seven categories (e.g., happy, happiness, good, nice, enjoy, love). However, some expressions are specifically related to one of these categories: Supplementary Table S4 shows the 40 most frequent positive items related to just one of the categories. The items in this list show how the concept of happiness is instantiated in events, properties, or objects that are inherent to specific categories but irrelevant in others. They also show that it is possible to identify semantically related items, which we analyze in detail in the following section. We find references to work-related concepts in ACHIEVEMENT, affective (both emotional and physical) expressions in AFFECTION, words associated with friendship in BONDING, words related to indulging in food in ENJOY THE MOMENT, health-related terms in EXERCISE, entertaining activities in LEISURE, and weather conditions in NATURE. As with the positive items, some negative items are shared by all categories, particularly those with a higher frequency. But while the most frequent positive items are semantically generic ("happy", "nice", "good", etc.), the negative ones are content-specific. Moreover, they are not the obvious antonyms of the frequent positive terms: instead of general negative expressions such as "unhappy" or "unpleasant", we find references to rather precise negative situations, such as "sick", "stress", or "tired". Supplementary Table S5 shows the diversity and lexical precision of the most frequent negative items in each category.
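A list of items related to just one category can be derived from per-category frequency counts. The sketch below assumes that "related to just one category" means "occurring in exactly one category". The text does not spell out the criterion, and the counts are invented.

```python
from collections import Counter


def category_specific(items_by_category, top=40):
    """Return (item, category, frequency) for items that occur in exactly one
    category, ranked by their frequency in that category."""
    presence = Counter()
    for counts in items_by_category.values():
        presence.update(set(counts))  # count each item once per category
    ranked = [
        (item, category, n)
        for category, counts in items_by_category.items()
        for item, n in counts.items()
        if presence[item] == 1
    ]
    return sorted(ranked, key=lambda row: row[2], reverse=True)[:top]


# Invented frequencies of positive items identified per category.
data = {
    "ACHIEVEMENT": Counter({"happy": 50, "promotion": 12}),
    "NATURE": Counter({"happy": 20, "sunny": 8}),
}
print(category_specific(data))
```

A stricter criterion that tolerates rare occurrences in other categories (e.g., a minimum share of an item's total frequency) would be a reasonable alternative on noisy data.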
Overall, both sentiment analysis techniques show that the language of happiness is not exclusively expressed through positive words, but rather relies on constructions that include neutral or negative expressions, usually to provide the background needed for a positive event to be understood.
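The mechanism behind these "errors" can be illustrated with a deliberately tiny lexicon-based classifier. This is not Lingmotif, whose lexicon and rules are far richer; the lexicon below is a hypothetical stand-in. A happy moment described purely through an activity contains no polarity-bearing words and is therefore not labeled positive, and a moment that sets a good event against a negative background can cancel out.

```python
import re

# Hypothetical toy lexicon; real sentiment lexicons are far larger and
# handle multiword expressions, negation, and intensification.
TOY_LEXICON = {"happy": 1, "great": 1, "enjoy": 1, "love": 1, "nice": 1,
               "sick": -1, "tired": -1, "stress": -1, "injury": -1}


def classify(text, lexicon=TOY_LEXICON):
    """Return (label, share of tokens that are positive lexicon items)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    score = sum(hits)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    positive_share = sum(1 for h in hits if h > 0) / len(tokens) if tokens else 0.0
    return label, positive_share


for moment in ["I finally ran five miles without stopping.",
               "I was sick all week but today I felt great.",
               "I love my new bike."]:
    print(moment, classify(moment))
```

The first two moments are plausibly happy, yet neither is labeled positive, which is the pattern the error rates above reflect at scale.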
Keyword analysis. Before analyzing the conceptual groupings found in the seven categories, we must look at the complete picture to get a general idea of how well our keywords reflect the different categories in HappyDB. Figure 3 shows a visualization of the top 1000 keywords for each category. This visualization, generated using the TensorFlow Embedding Projector, is a three-dimensional rendering of all keywords using the UMAP (McInnes et al. 2018) dimension reduction and projection algorithm. In it, each dot represents a keyword's position in the overall vector space, and each color represents a HappyDB category (note 8). Although the static 2D image loses some interpretability, the clustering is quite apparent, which suggests that each category is associated with certain conceptual classes.
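The Embedding Projector loads a vectors file and a metadata file, both tab-separated: one vector per row with no header, and, when the metadata has more than one column, a header row followed by one row per vector. The sketch below writes keyword vectors and their HappyDB category in that layout, so that UMAP can be run in the Projector and points colored by category. The input tuple format and the column names are assumptions; the vectors and metadata files in the repository listed under Data availability may be organized differently.

```python
import csv
import io


def write_projector_files(keywords, vectors_out, metadata_out):
    """Write Embedding Projector TSVs.

    keywords: iterable of (keyword, category, vector) tuples.
    vectors_out / metadata_out: text file-like objects.
    """
    vec_writer = csv.writer(vectors_out, delimiter="\t", lineterminator="\n")
    meta_writer = csv.writer(metadata_out, delimiter="\t", lineterminator="\n")
    meta_writer.writerow(["keyword", "category"])  # header row: 2 metadata columns
    dims = None
    for word, category, vector in keywords:
        if dims is None:
            dims = len(vector)
        elif len(vector) != dims:
            raise ValueError(f"inconsistent dimensionality for {word!r}")
        vec_writer.writerow([f"{x:.6f}" for x in vector])
        meta_writer.writerow([word, category])


vectors, metadata = io.StringIO(), io.StringIO()
write_projector_files(
    [("gym", "EXERCISE", [0.1, 0.2]), ("pizza", "ENJOY_THE_MOMENT", [0.3, -0.4])],
    vectors, metadata,
)
print(vectors.getvalue())
print(metadata.getvalue())
```

For real files, pass handles opened with `open("vectors.tsv", "w", newline="", encoding="utf-8")` and the equivalent for the metadata file.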
Many keywords are bound to be included in several categories. For example, food-related words are likely to be keywords in ENJOY THE MOMENT, LEISURE, and EXERCISE. In the vector space, however, such semantically related keywords will be located near one another, since their vectors have high cosine similarity. What we seek to discover here is whether certain regions of the space are occupied predominantly by keywords belonging to a certain category. In Fig. 4, the red dots represent the keywords for the EXERCISE category, which are scattered all over the embedding space; nevertheless, the area in the red oval shows that keywords directly related to physical exercise make up the majority there.
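Whether a region of the space is dominated by one category can be quantified, for instance, by checking how many of a keyword's nearest neighbors (by cosine similarity) share its category. This neighbor-agreement measure is an illustrative choice, not a procedure described in the section, and the two-dimensional toy vectors stand in for real keyword embeddings.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def neighbor_agreement(points, index, k):
    """Share of the k nearest neighbors of points[index] in the same category.

    points: list of (keyword, category, vector) tuples.
    """
    _, category, vector = points[index]
    ranked = sorted(
        ((cosine(vector, other[2]), j) for j, other in enumerate(points) if j != index),
        reverse=True,
    )
    neighbors = [points[j][1] for _, j in ranked[:k]]
    return sum(1 for c in neighbors if c == category) / len(neighbors)


def category_agreement(points, k):
    """Mean neighbor agreement per category (higher = more clustered)."""
    scores = defaultdict(list)
    for i, (_, category, _) in enumerate(points):
        scores[category].append(neighbor_agreement(points, i, k))
    return {c: sum(s) / len(s) for c, s in scores.items()}


toy = [("run", "EXERCISE", [1.0, 0.0]), ("gym", "EXERCISE", [0.9, 0.1]),
       ("yoga", "EXERCISE", [0.8, 0.2]), ("pizza", "ENJOY", [0.0, 1.0]),
       ("taco", "ENJOY", [0.1, 0.9])]
print(category_agreement(toy, k=2))
```

The brute-force neighbor search is quadratic in the number of keywords; for thousands of keywords an approximate nearest-neighbor index would be the practical choice.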
The analysis of keyword clusters by category (detailed as step 3 above) allows us to identify which conceptual classes are relevant in each category. Figure 5 shows the clusters generated for ACHIEVEMENT, to each of which we have manually added a label that singles out the semantic class encompassing the meaning of its items. As these keyword cluster representations show, the number of conceptual classes identified varies from one category to another, which may hint at the lexical richness of the texts and at the variety of elements respondents considered relevant in their descriptions of happiness. We identified 12 different conceptual classes in ACHIEVEMENT, the category with the highest number of clusters (and the highest number of keywords). An important subset is related to consumer products, such as mobile devices and electronic items, lifestyle items, vehicles, food, and household commodities. Two other classes are related to the consumption of commercial audio and visual content (entertainment products). Two basic areas of daily life, work and school, are also relevant to happiness as described in relation to achievements. Perhaps surprisingly, only three classes refer to more abstract concepts related to achievement: health, money, and goals. Food, work, and household commodities are the largest conceptual classes and display a more compact configuration, whereas health and communication and media are smaller and less compact clusters. This is particularly noticeable in the case of communication and media, whose items appear scattered across the vector space.
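The contrast between compact classes (food, work, household commodities) and dispersed ones (communication and media) can be expressed numerically. One simple option, assumed here rather than taken from the study, is the mean cosine similarity between each keyword vector and the normalized centroid of its cluster: values near 1 indicate a tight cluster. The toy clusters are invented.

```python
import math


def _unit(v):
    """Scale v to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def compactness(vectors):
    """Mean cosine similarity of cluster members to their (normalized) centroid."""
    if not vectors:
        raise ValueError("empty cluster")
    units = [_unit(v) for v in vectors]
    dim = len(units[0])
    centroid = _unit([sum(u[i] for u in units) / len(units) for i in range(dim)])
    return sum(sum(a * b for a, b in zip(u, centroid)) for u in units) / len(units)


# Toy 2-D clusters, invented for illustration.
tight = [[1.0, 0.1], [1.0, 0.0], [0.9, 0.1]]
loose = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(round(compactness(tight), 3), round(compactness(loose), 3))
```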
Although the categories AFFECTION and BONDING (Fig. 6) could be viewed as similar at first sight, their keywords reveal that they refer to different types of happy moments. We begin with the former, in which we identified 9 keyword clusters. Two of the largest are related to close and extended family, and are also connected to particular events (family celebrations, family activities, and trips). Food- and leisure-related items are also relevant, complementing the activities carried out at family events. In addition, time spent with family, as well as time dedicated to school and work, accounts for a large part of the keywords found in this category. A fairly large set of items related to how people feel while celebrating with their families is also present, although this conceptual class is not very dense. In BONDING, on the other hand, we find time spent with friends at food hangouts, on vacations, at celebrations, or at school. This confirms what the creators of HappyDB stated about the difference between AFFECTION and BONDING (the former related to family, the latter to friendship), but it also adds information about the types of activities that involve friends. Even though BONDING is the most compact category, in which most items in keyword clusters were correctly assigned to a single semantic label, some conceptual classes, such as entertainment or communication and media, include outlying items, as the dispersion of orange and green dots shows.
ENJOY THE MOMENT and EXERCISE are presented in Fig. 7. The former presents 7 keyword clusters, none of which refers to people who might be involved in happy moments (e.g., family or friends). Happiness is thus viewed as made up of rewarding moments associated with different types of food (understood as a recreational activity) and entertainment. In this category, 87.91% of the items identified in the keyword clusters fit correctly in their semantic class. Although EXERCISE is one of the categories with the most clusters and the greatest dispersion, all of its clusters are related to various aspects of physical activity. As Fig. 7 shows, however, the clusters of words referring to workout routines, exercise types, and muscle names display greater compactness, owing to the lexical specificity of their items. This category also includes references to the places and times where and when exercise takes place, the feelings it arouses, and the goals achieved. References to weight and diet are also found.
Finally, LEISURE and NATURE (Fig. 8) present the lowest number of conceptual classes and, unlike the rest, do not contain any references to people other than the respondents themselves. Food is one of the smallest semantic classes in LEISURE, and it refers to events associated with food consumption rather than to products. It is thus quite different from the conceptual class of food found in ENJOY THE MOMENT, which mainly involves comfort or fast food, and from the food-related words involved in celebrations in the BONDING and AFFECTION categories. In addition, LEISURE is lexically dominated by entertainment-related words, with a strong focus on movies, games, and TV shows. Food (58.49%) and music (61.11%) are the classes with the lowest proportion of items that fit correctly in their semantic label, as their items were mixed with time and place references and with a few items referring to books or written material. NATURE, on the other hand, represents happiness as pleasant weather conditions, natural landscapes, and outdoor activities.
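The fit percentages reported for these categories (e.g., 87.91% for ENJOY THE MOMENT, and 58.49% and 61.11% for food and music in LEISURE) are proportions of manually validated items. The sketch below assumes, hypothetically, that each clustered keyword carries an annotator's yes/no judgment on whether it fits its class label, and computes per-class and pooled rates; the toy judgments are invented.

```python
from collections import defaultdict


def fit_rates(judgments):
    """Per-class and pooled percentage of keywords that fit their class label.

    judgments: iterable of (keyword, class_label, fits) with fits a bool.
    """
    total = defaultdict(int)
    fitting = defaultdict(int)
    for _, label, fits in judgments:
        total[label] += 1
        fitting[label] += int(fits)
    per_class = {label: 100 * fitting[label] / total[label] for label in total}
    pooled = 100 * sum(fitting.values()) / sum(total.values())
    return per_class, pooled


toy = [("movie", "entertainment", True), ("netflix", "entertainment", True),
       ("game", "entertainment", True), ("tonight", "entertainment", False),
       ("concert", "music", True), ("book", "music", False)]
per_class, pooled = fit_rates(toy)
print({k: round(v, 2) for k, v in per_class.items()}, round(pooled, 2))
```

The pooled rate (about 66.67% here) differs from the average of the class rates (62.5%), so a category-level figure should state which of the two it reports.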
The clustering and labeling methodology employed to analyze keywords provides a global picture of conceptual relationships and their relevance across categories. In total, we identified 42 semantic classes, 10 of which are shared by two or more categories in HappyDB, as shown in Table 3.
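Counting how many classes are shared across categories, as summarized in Table 3, is a matter of inverting the category-to-class mapping. A minimal sketch with invented class assignments (a handful of hypothetical classes, not the study's actual mapping of 42):

```python
from collections import defaultdict


def shared_classes(classes_by_category, min_categories=2):
    """Return (number of distinct classes, {class: sorted categories}) for
    classes that occur in at least min_categories categories."""
    owners = defaultdict(set)
    for category, classes in classes_by_category.items():
        for cls in classes:
            owners[cls].add(category)
    shared = {cls: sorted(cats) for cls, cats in owners.items()
              if len(cats) >= min_categories}
    return len(owners), shared


toy = {
    "ACHIEVEMENT": {"food", "work", "money"},
    "AFFECTION": {"food", "work", "family"},
    "BONDING": {"food", "school"},
}
n_classes, shared = shared_classes(toy)
print(n_classes, shared)
```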
In conclusion, every category determines a distinct set of conceptual classes that summarize the main sources of happiness evoked by the respondents, and those classes are shaped by the thematic area of the category they belong to. Moreover, some of these classes are shared by more than two categories, which suggests that they may be more prevalent sources of happiness than the rest. Finally, the presence of the miscellaneous class indicates that, although the vast majority of keywords can be efficiently clustered, some remain without semantic links.
Named entities. Table 4 shows the results of our named entity extraction process, including the distribution of entity labels by category. The label ORG refers to companies, agencies, or institutions; UNK to entities identified by the extractor but not assigned any label; GPE to countries or cities; PERSON to people, including fictional ones; PRODUCT to objects or food; WORK_OF_ART to titles of books or songs; LOC to non-GPE locations, such as mountain ranges or bodies of water; FAC to city infrastructure; and EVENT to named hurricanes, battles, sports events, etc.
The most common type of entity found in HappyDB is ORG. Although this label nominally designates organizations, it includes companies, so in practice it often refers to consumer products and services, which can be misleading. Supplementary Table S6 contains the top five entities for each of the main labels. Entities appear in all seven categories, but their frequency per happy moment varies markedly, with large deviations from the mean (x̄ = 0.19) in NATURE (−0.15) and LEISURE (+0.24). In NATURE, the most frequent label is GPE, since the named entities refer to the places where experiences in nature take place, while all other labels are very infrequent. The exceptionally high proportion of entities in the LEISURE category, together with its label distribution (very high in ORG, PRODUCT, and WORK_OF_ART), suggests that respondents' leisure experience relies heavily on consumer products.
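The per-category figures above (entities per happy moment, deviation from the mean, and the dominant label) can be computed from the extractor's output with a few counters. The sketch assumes each happy moment has already been reduced to a list of entity labels (for example, by an NER tool using the label set described above), and it takes the mean as the unweighted mean of the category rates; the study may instead have used the overall rate across all moments, which differs when category sizes are unequal. The toy input is invented.

```python
from collections import Counter, defaultdict


def entity_profile(moments):
    """Entities per moment by category, deviation from the mean rate, top label.

    moments: iterable of (category, [entity labels found in that moment]).
    The mean is the unweighted mean of category rates (an assumption).
    """
    n_moments = Counter()
    n_entities = Counter()
    labels = defaultdict(Counter)
    for category, found in moments:
        n_moments[category] += 1
        n_entities[category] += len(found)
        labels[category].update(found)
    rates = {c: n_entities[c] / n_moments[c] for c in n_moments}
    mean = sum(rates.values()) / len(rates)
    deviation = {c: r - mean for c, r in rates.items()}
    top_label = {c: (labels[c].most_common(1)[0][0] if labels[c] else None)
                 for c in n_moments}
    return mean, deviation, top_label


toy = [("NATURE", []), ("NATURE", ["GPE"]),
       ("LEISURE", ["ORG", "WORK_OF_ART"]), ("LEISURE", ["ORG"])]
mean, deviation, top_label = entity_profile(toy)
print(mean, deviation, top_label)
```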
The most prominent entities in ACHIEVEMENT are those associated with electronic devices and products of popular technology brands (iPad, PlayStation 4, Kindle, MacBook Pro). We also find references to e-commerce companies (eBay, Etsy, Amazon, Costco) and to social media (Instagram, Facebook, YouTube). Four very popular online games are mentioned: Heroes of Storm, Legend of Zelda, World of Warcraft, and League of Legends. A number of entities are related to the food keyword cluster, represented through restaurant names (Taco Bell, Olive Garden, Starbucks) and brands of popular drinks (Diet Coke, Frappuccino, Mountain Dew). There are also some places (Florida, New York, India, Las Vegas) and sports events (Olympics, Boston Marathon, World Cup) with wide media coverage.
AFFECTION and BONDING share entities that refer to keywords related to celebrations with relatives (Valentine's Day, Easter holiday, Mother's Day). Facebook, Skype, and FaceTime are entities with a high frequency, together with a number of cities and countries. The two categories also share places that people generally visit for fun during the holidays, which relate to the keywords representing leisure activities and trips to popular locations (Disneyland, Grand Canyon, Great Wolf Lodge, Nicco Park).
The entities in ENJOY THE MOMENT and LEISURE include sports events (Olympics, World Cup, NBA) and food, both in the form of events (food festival event, strawberry festival, taco festival) and fast-food restaurants (McDonald's, Starbucks, Subway). Top entities also include social media (Facebook, Twitter, Instagram) and streaming platforms, such as Netflix. Moreover, LEISURE includes entities based on TV shows, films, and series (Star Wars, Walking Dead, Game of Thrones).
Some of the entities in EXERCISE refer to time and distance measurements (a mile, a ton, 30 min), which correspond to two keyword clusters already identified. We also find places (Hawaii, Himalaya) and types of workouts (crossfit, yoga). NATURE also includes places where one can exercise, but these are more generally open spaces (Appalachian, the Grand Canyon, the Haleakala National Park) or nature-related facilities (zoo, botanical garden).

Discussion and conclusions
The results of the different analyses we have performed shed considerable light on our research questions. Regarding our first question, which concerns the weight and relevance of sentiment-laden words and expressions in self-reported descriptions of happy moments, the results of the polarity classification of our data reinforce the idea that happiness is not expressed exclusively through positive language, especially in certain categories of our corpus. Instead, other mechanisms that go beyond positive evaluative language seem to come into play when speakers talk about their satisfaction with life and the sources of their happiness: they may refer to daily actions that have made them feel fulfilled but that are not described in what we would consider positive words. This explains why both sentiment classifiers fail to identify such a high percentage of instances in categories like LEISURE, ENJOY THE MOMENT, EXERCISE, or NATURE. In these four categories, happiness (or at least the language used to talk about it) cannot be equated with emotions or affect. Our results are in agreement with those of Tanzer and Weyandt (2020, p. 2693) in their study on imaging happiness, who concluded that, in the English language, "happiness is not best understood as an affective state, but better understood within its behavioral context, as an emergent property of activity".
The answer to our second research question points in the same direction. In our analysis of sentiment words, the most frequent positive items summarize how the general concept of happiness is related to specific events, properties, or objects that are inherent to one category but irrelevant in others. Nevertheless, the observed difference in lexical specificity between frequent positive and frequent negative items supports the notion of negativity bias (Kanouse and Hanson, 1972), according to which events of a negative nature have a bigger impact on one's psychological state than neutral or positive ones. Negative differentiation, one of the four elements of negativity bias proposed by Rozin and Royzman (2001), indicates that negative vocabulary is not only more numerous in language but also more richly descriptive. In addition, the negative words found in our corpus seem to encapsulate the most salient "happiness-threatening" events, feelings, or conditions specific to each category: in the case of ACHIEVEMENT, having "problems" or "struggling" because of "debt" or "pain"; in the case of EXERCISE, being "tired", having an "injury", or being "sick". It may seem startling that people would talk about such things when asked what makes them happy, but it shows how much weight negative events carry relative to positive ones. Research on decision-making, attribution of intention, learning, and other psychological processes points to what Baumeister et al. (2001) call "the great power of bad events over good ones". In this particular case, it shows that even when talking about happiness, we tend to recall unpleasant situations that may prevent good things from happening, and that the narrative of a bad event, even if it is felt as overcome, is relevant in our descriptions of present happiness.
As for our third research question, regarding the sources of happiness in self-reported descriptions of happy moments, our keyword analysis yields very rich information, which suggests that it is possible to distinguish between two types of happiness, depending on the conceptual classes from which happiness is obtained: an external and an internal one. The first was already framed by Visakko and Voutilainen (2020), who named it happiness "as an intersubjective phenomenon", rooted in social interactions with others in such a way that "the self's happiness and other's happiness are seen as interconnected" (p. 48). Myers (2000) and Argyle (2001) also argued that social activities are related to happiness, thus hinting at the importance of external social input as a key source of this type of happiness. The categories AFFECTION and BONDING are associated with external happiness because both contain keyword clusters that refer to social activities traditionally celebrated with friends and family members (e.g., weddings, birthday parties, family reunions). Both also include keywords that refer to people: close family in AFFECTION, and friends in BONDING. Although ENJOY THE MOMENT and LEISURE contain no references to people or family, they do refer to places, events, and pastimes that generate happiness because of their cultural and societal nature; as such, they could also be classified as a kind of external happiness. On the other hand, the internal type of happiness focuses on accomplishments, tasks, and activities that make people feel happy without external interaction. The remaining categories in HappyDB (ACHIEVEMENT, NATURE, and EXERCISE) conform to this description, as they do not entail direct social interaction with others but rather actions that make people feel fulfilled in a more personal sense and that are carried out individually (or are presented as if they were).
This distinction between internal and external happiness is also related to the psychological concepts of agency and communion proposed by Paulhus and Trapnell (2008), which were explored in the classification task of the CL-Aff HappyDB dataset in terms of agency (cases in which the author is in control) and sociality (instances that involve a social situation with other people). The predictive validity of this classification has been further studied by Jaidka et al. (2020), although they correlate agency and sociality with the use of individual words included in emotion dictionaries (note 9) rather than with the compact keyword clusters we have identified in the seven original categories established in HappyDB.
Additionally, our analysis of named entities highlights the informants' preferences as consumers of commercial products of all sorts: electronic devices, foods, TV shows, social media, and even travel destinations. Thus, consumerism and materialism play a very important role in the corpus: respondents named a large number of trademarks and entertainment corporations associated with buying (and receiving) presents, which connects consumer habits to the pursuit of happiness. In relation to this, we should recall that "new" is the 7th most frequent word in our corpus, as shown in Table 6. Acquiring new products, living new experiences, and visiting new places therefore appear to be important sources of happiness. This relation has been studied recently, particularly with reference to conspicuous consumption and to consumption that increases social connectedness as factors that increase happiness (Gokdemir, 2015; Wang et al. 2019), which applies to most of the entities found in our corpus: they name where we eat, what we celebrate, how we are told to keep fit and healthy, where we go, what we buy, and the type of online media we consume. To some extent, the entities we have extracted from the corpus seem to represent a type of happiness that is sponsored by companies, corporations, entertainment platforms, or tourism trends. Although the results of these studies leave the door open to competing hypotheses that need further research, there is consensus that consumerism has an important impact on our (perceived and portrayed) social status and, as a consequence, on our happiness level.
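A word's rank in a frequency list, such as the position of "new" in Table 6, depends on tokenization and on whether function words are excluded. A minimal sketch, assuming a simple lowercase regex tokenizer and a caller-supplied stopword set (both hypothetical; the study's preprocessing may differ), with alphabetical tie-breaking so that the ranking is deterministic:

```python
import re
from collections import Counter


def frequency_rank(texts, word, stopwords=frozenset()):
    """1-based rank of word by frequency across texts, or None if absent.

    Ties are broken alphabetically so that the ranking is deterministic.
    """
    counts = Counter(
        token
        for text in texts
        for token in re.findall(r"[a-z']+", text.lower())
        if token not in stopwords
    )
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for rank, (token, _) in enumerate(ranked, start=1):
        if token == word:
            return rank
    return None


moments = ["I bought a new phone", "New shoes!", "My new job starts today",
           "A great day at the beach"]
print(frequency_rank(moments, "new", stopwords={"i", "a", "my", "at", "the"}))
```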
In conclusion, describing happiness does not only involve positive expressions: it also involves neutral and negative words that help construct the narrative of the happy moment. Therefore, the proportion of positive lexical items should be regarded only as a proxy metric for the automatic measurement of self-reported happiness in user-generated text. Further, we have demonstrated how certain NLP techniques can be applied to effectively identify sources of happiness in such texts, which go beyond family and social relationships and leisure activities: newness stands out as a cross-categorial feature that signals happiness, and points to the importance of owning new products and experiencing things for the first time. Food, in any of its manifestations, including food consumption events, appears to be another common source of happiness, as references to it pervade the corpus. Finally, the role of commercial products and services as sources of happiness should not be underestimated, given the large number of such references in the corpus.

Data availability
The original dataset and other related resources can be found at https://github.com/megagonlabs/HappyDB. The Lingmotif output for the dataset (in both HTML and JSON formats), the keyword lists, and the vectors and metadata files are available at https://github.com/Diverking/HappyDB.

Notes
1 The dataset and other related resources can be found at https://github.com/megagonlabs/HappyDB. In this research we use the original, complete dataset (minus the entries we removed as indicated), although a smaller collection of 17,215 happy moments was compiled for the CL-Aff Shared Task @AAAI-19, which the authors call the "CL-Aff HappyDB dataset" (Jaidka et al. 2019).
2 Adib et al. (2019) discuss in detail the usefulness of crowd-sourcing platforms as data sources and address potential problems related to the diversity of the contributors to the sample, possible biases in data collection, and other ethical questions.
3 Claeser (2019) also identified duplicate or near-duplicate happy moments in the above-mentioned training sample of HappyDB used in the CL-Aff Happiness shared task. The authors consider data cleansing necessary for future data releases (Jaidka et al. 2019).
4 We use SMALL CAPS for mentions of these categories.
5 http://www.lingmotif.com.