Natural emotion vocabularies as windows on distress and well-being

To date we know little about natural emotion word repertoires, and whether or how they are associated with emotional functioning. Principles from linguistics suggest that the richness or diversity of individuals’ actively used emotion vocabularies may correspond with their typical emotion experiences. The current investigation measures active emotion vocabularies in participant-generated natural speech and examines their relationships to individual differences in mood, personality, and physical and emotional well-being. Study 1 analyzes stream-of-consciousness essays by 1,567 college students. Study 2 analyzes public blogs written by over 35,000 individuals. The studies yield consistent findings that emotion vocabulary richness corresponds broadly with experience. Larger negative emotion vocabularies correlate with more psychological distress and poorer physical health. Larger positive emotion vocabularies correlate with higher well-being and better physical health. Findings support theories linking language use and development with lived experience and may have future clinical implications pending further research.

Reviewer #1 (Remarks to the Author):
I had a couple of suggestions that might improve this report. First, I wondered about the joint contributions of positive and negative unique emotion vocabularies (either additively or in interaction) in predicting the mental health outcomes described in Study 1. Although the clarity of correlations and partial correlations is a lovely aspect of the paper, it would be interesting to know whether there is a potential buffering effect of highly differentiated positive emotion vocabulary that might moderate the association between negative emotion vocabulary and distress. Are mixed emotions good? Bad? Neither?
For Study 2, I was confused by the presentation of the N for the study. Were the blogs written by the same underlying sample of people? I assume that the 35,000+ refers to people (who wrote blogs of varying lengths) but it is not clear in the way it is presented. I think the source of confusion is that the term "unique blogs" sounds more like blog posts rather than whole blogs (which is what I think the authors did). This should be clarified.
The authors do not note that generally emotion researchers have long discussed the fact that the English language has more words for negative states than positive states. I wondered if they have considered the implications of this for their findings.
I think the Discussion might incorporate a bit more about the difference between emotion vocabularies that might be measured using something like an emotional intelligence measure or an emotion vocabulary test and the data they have here. It would be interesting to probe the link between emotional experience and the expression of emotion in writing in the context of such information. Are those who are not distressed lacking these words or do they simply have no reason to deploy them in the moment? It might be interesting for future research to employ mood inductions to help illuminate the nature of rich emotion language. It seems there may still be a functional role of these rich vocabularies. If a person is experiencing distress, might it be better, in a long-term way, for the person to have names for these experiences? Over the long haul, it may be that those who are able to label their feelings with precision are better off than others.
This is a signed review by Laura King.

Reviewer #2 (Remarks to the Author):
This paper offers an interesting, novel approach to investigating how emotional vocabularies used in natural language can predict mental and physical health, with Study 2 supplementing Study 1's findings with a larger sample and wider range of written texts. In light of the innovativeness of its emotional vocabulary (EV) methodology, this work is potentially highly impactful and may jumpstart a new line of research harnessing big data to identify markers of health and well-being (and ill-being).
To improve the paper, I list a few mostly minor questions below (not in order of importance).
1) The introduction and literature review of this paper seemed a bit disjointed and did not have ideal flow. At times the main ideas were not clear and were somewhat confusing, especially when transitioning between previous findings about active and passive vocabularies. For example, the authors include text about indigenous hunter-gatherers in the Philippines that does not appear to be relevant for the rationale of the study.
2) What was the purpose of students in Study 1 completing two essays? Was it just to compute test-retest reliability? It was unclear how the two sets of essays were used or treated differently in analyses.
3) Was it possible that students in the class used in Study 1, who completed all kinds of measures (as well as the two essays) over the course of the semester, could have guessed or become aware of the aims of the study? What was the cover story used? Was any material relevant to the study (emotion, emotion differentiation, LIWC, natural language processing) covered in the course?
4) When did participants complete the depression measure in Study 1? Also, as the authors are aware, the one-item global measure of physical health is not ideal. Thus, interpretations relevant to health (as opposed to well-being/mental health) should be taken with caution.
5) For text-derived individual differences, the authors account for general vocabulary size, but do not assess or mention participants' education level for either study. This may influence their findings, especially with the significantly wider age range in Study 2 (ages 13 to 48).
6) The writing samples for Study 1 included in the supplemental materials are very helpful and improve understanding of the findings. It would be beneficial to include classification information for each sample. Specifically, the authors could note or rank which samples have a larger negative emotion vocabulary relative to their positive emotion vocabulary.
7) The authors list a formula used to assess EV while controlling for total word count. A very similar formula is used for analyzing general vocabulary size. It would be helpful to elaborate a little on the difference between "unique words" and "unique emotion words." The authors address what classifies unique emotion words, but do not list the inclusion criteria for unique words.

Reviewer #3 (Remarks to the Author):
In principle, we know little about natural emotion word repertoires or spontaneously produced emotion language in real world situations, and the distinction between active and passive vocabularies is of particular value in this regard. That being said, the manuscript appears to be framed in an either-or sort of fashion which rings false. The overall impression is that the work is trying to overturn existing findings rather than contextualize them, and in doing this overstates its own conclusions. Even the title is misleading given the findings: it is mainly diverse vocabularies for negative emotions that appear to be linked to self-reported negative outcomes (i.e., 'markers of distress' are not objective, whereas the work being criticized measured outcomes in a more objective fashion). In addition, the authors' own findings suggest that diverse vocabularies for positive emotions are linked to positive aspects of self-reported well-being.
Beyond this change in framing, there are several concerns with regard to method and interpretation that should be addressed in a substantive revision:
Method:
1. The authors do not provide any details on how emotion words were selected for use in their custom dictionary. Were these based on previous literature, normative ratings, corpus data (e.g., frequency of use)? Many of the words included (e.g., alone, bad, bitter) are not solely applied to mental life or emotional experience. Further, it seems possible that the extreme imbalance between the number of negative and positive emotion words may have artificially inflated negative EV scores relative to positive EV scores, casting doubt on whether the two indices can be directly compared with each other and with other text-derived and self-report variables. I would like to see the authors redo their analyses with a comparable number of emotion words in each index, and provide evidence of principled inclusion criteria.
2. The use of a coarse-grained, lexico-centric means of linguistic analysis is limited in the insights it can provide. More nuanced features of the language used in relation to emotion may be uncovered by a deeper analysis of themes, such as those attainable through topic modeling and distributional semantics. While the authors mention that there was a wide range of thematic content, this diversity is not quantified in any way. What does a thematic analysis suggest, and how do the present results relate to common themes identified? The relationships between the EV indices and other text-derived measures seem like obvious illustrations (e.g., negative emotion words are linked to health issues, positive words to achievement).
3. The choice of criterion validity measures in Study 1, and the lack of criterion validity measures in Study 2, diminishes the impact and utility of these findings. In Study 1, text-derived measures are not linked to robust measures that are necessary to fully contextualize the present findings. For example, it seems possible that responses to a single-item self-report measure of physical health may be influenced by current mood in a way that longer-form measures or certainly objective measures would not be. While the authors discuss the possibility that words influence our experience of the world (i.e., a moderate linguistic relativity hypothesis), they do not sufficiently demonstrate that language has any bearing on non-linguistic variables. Without any non-text-derived measures of mood, health, and wellbeing, a critical review of Study 2 results in particular would suggest they merely demonstrate the effect of language on language.
Interpretation:
4. The logic linking the effect and importance of words/concepts for emotion is not clear and even seems contradictory. To what extent do the authors hold that words can be adaptive in their segmenting of emotional experience? Given the current findings, is this only applicable to positive experiences, and what is the mechanism underlying that distinction? I would like to see the authors specifically state their alternative hypotheses for how language may (or may not) support emotional health and well-being.
5. Word usage (and therefore active vocabulary) isn't just a matter of individual 'comfort' or 'interest': there are complex processes underlying both lexical selection (within the individual brain) as well as communities of speech (between individual brains). Factors such as recency and frequency, prestige and affiliation, are also involved in the creation and maintenance of word repertoires. The authors' assertion that people who are suffering from distress are more interested in negative emotion words (p. 9, line 157) should be discussed along these lines.
6. The authors criticize previous findings without offering clear recommendations for how emotion language should be acquired and used. They suggest that it is not vocabulary size that needs to be increased, but stop short of proposing other features of language or the conceptual system that may be driving positive outcomes. Existing lines of research do investigate these mechanisms, yet are criticized by the authors for employing constrained/passive methods. This seems to me an opportunity to acknowledge that a multi-method approach may be necessary to fully investigate the underlying relationship between emotional experience and language use.
Miscellaneous:
7. Can the authors indicate how the sample texts in the supplemental material were selected? It is not clear if they are intended as representative cases, outliers, or random samples.
8. Scatterplots of key correlations would be useful so the reader can examine the distributions.

Reviewer #1:
General Comments: This paper presents two very interesting studies examining the relationship of using a rich/varied emotional vocabulary to psychological functioning. The studies and findings are highly novel. Conceptually, the paper examines the possibility that emotion words are acquired to explain experiences. As such, those with more experience of sorrow, distress, etc. should have richer emotional vocabularies than others. The results support this idea in an innovative way, by examining the actual use of unique positive and negative emotion words in stream of consciousness writing and in an enormous sample of blog posts. It is just very interesting that emotional states give rise to emotion words and then contribute to those moods, in turn. Although I have some suggestions and concerns, overall I found this work to be fascinating.
Comment #1: I had a couple of suggestions that might improve this report. First, I wondered about the joint contributions of positive and negative unique emotion vocabularies (either additively or in interaction) in predicting the mental health outcomes described in Study 1. Although the clarity of correlations and partial correlations is a lovely aspect of the paper, it would be interesting to know whether there is a potential buffering effect of highly differentiated positive emotion vocabulary that might moderate the association between negative emotion vocabulary and distress. Are mixed emotions good? Bad? Neither?
Author reply: At the reviewer's suggestion, we have explored this idea with additional moderation analyses in the Study 1 sample. Using multiple regression, we examined the effect of the interaction of Positive and Negative EV in predicting self-reported depression scores, controlling for both EV main effects and the three covariates used in the manuscript's partial correlations (negative and positive emotional tone and general vocabulary). Results revealed a significant moderation suggestive of a buffering effect of positive emotion vocabularies, b = -.84, SE = .40, p = .035, with this interaction accounting for 3.4% of depression variance. As Figure S5 shows, for individuals with small negative emotion word repertoires, depression symptoms were lower and did not depend on Positive EV. By contrast, for individuals with large Negative EVs, there was a buffering effect of Positive EV, such that individuals with restricted Positive EV were more depressed. Interestingly, however, this buffering effect reached significance only at exceedingly high levels of Negative EV (i.e., above values of 1.27, or above the 97.3rd percentile).
The reviewer wonders whether mixed emotions are good or bad. Given the meaning of the EV index, we cannot draw conclusions about mixed emotional states per se, only about co-occurring broad vocabularies. Based on the moderation findings, we cautiously suggest that possessing and/or using a varied positive emotion vocabulary mitigates the relationship between varied negative vocabularies and depression, but we cannot comment on the causality in this effect. Given the preliminary nature of these findings, their subtlety, and the reviewer's comment favoring the clarity of the correlational approach of the manuscript, we present these results in the Supplemental materials (S5), and we direct the reader to them in Footnote 4 of the Study 1 results (p. 14).
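For readers unfamiliar with moderation analysis, the general technique described in this reply can be sketched as follows. This is a minimal illustration on simulated data and is not the authors' analysis: the function names, the simulated coefficients, and the use of plain NumPy least squares are all assumptions, and the three covariates are omitted for brevity.

```python
import numpy as np

def fit_moderation(neg_ev, pos_ev, outcome):
    """OLS of outcome on mean-centered neg_ev, pos_ev, and their product."""
    neg_c = neg_ev - neg_ev.mean()
    pos_c = pos_ev - pos_ev.mean()
    design = np.column_stack([np.ones_like(neg_c), neg_c, pos_c, neg_c * pos_c])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    names = ["intercept", "neg_ev", "pos_ev", "interaction"]
    return dict(zip(names, coefs))

def simple_slope_of_pos(coefs, neg_c_value):
    """Slope of the outcome on Positive EV at a given centered Negative EV."""
    return coefs["pos_ev"] + coefs["interaction"] * neg_c_value

# Simulated data (hypothetical values, chosen only to produce a buffering pattern).
rng = np.random.default_rng(0)
n = 5000
neg_ev = rng.normal(0.5, 0.3, n)
pos_ev = rng.normal(0.6, 0.3, n)
depression = (
    10.0
    + 3.0 * neg_ev
    - 1.0 * pos_ev
    - 2.0 * (neg_ev - 0.5) * (pos_ev - 0.6)  # negative product term = buffering
    + rng.normal(0.0, 1.0, n)
)

coefs = fit_moderation(neg_ev, pos_ev, depression)
slope_low_neg = simple_slope_of_pos(coefs, -0.6)   # small Negative EV
slope_high_neg = simple_slope_of_pos(coefs, 0.6)   # large Negative EV
```

A negative interaction coefficient means the slope of depression on Positive EV becomes more negative as Negative EV rises, which is the pattern the reply describes as buffering. Testing where that slope reaches significance would additionally require standard errors, which this sketch does not compute.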
Comment #2: For Study 2, I was confused by the presentation of the N for the study. Were the blogs written by the same underlying sample of people? I assume that the 35,000+ refers to people (who wrote blogs of varying lengths) but it is not clear in the way it is presented. I think the source of confusion is that the term "unique blogs" sounds more like blog posts rather than whole blogs (which is what I think the authors did). This should be clarified.
Author reply: We have clarified "unique blogs" to indicate that the N of 35,000+ refers to whole blogs (i.e., all content ever posted by 35,000+ individual people, each to their own personal blog), not separate blog posts. The revised sentence now reads: "The final corpus contained the full content of blogs by 35,385 individuals, ranging in total length from 107 to 481,983 words (M = 3,142; SD = 6,572)."
Comment #3: The authors do not note that generally emotion researchers have long discussed the fact that the English language has more words for negative states than positive states. I wondered if they have considered the implications of this for their findings.
Author reply: We thank the reviewer for this suggestion, which is also relevant to ideas raised by Reviewer 3. We have incorporated a discussion of this imbalance in negative and positive emotion words in the lexicon (pp. 24-25).
Comment #4: I think the Discussion might incorporate a bit more about the difference between emotion vocabularies that might be measured using something like an emotional intelligence measure or an emotion vocabulary test and the data they have here. It would be interesting to probe the link between emotional experience and the expression of emotion in writing in the context of such information. Are those who are not distressed lacking these words or do they simply have no reason to deploy them in the moment? It might be interesting for future research to employ mood inductions to help illuminate the nature of rich emotion language. It seems there may still be a functional role of these rich vocabularies. If a person is experiencing distress, might it be better, in a long-term way, for the person to have names for these experiences? Over the long haul, it may be that those who are able to label their feelings with precision are better off than others.
Author reply: Thank you for sharing these speculations, which we found very interesting. We have elaborated the manuscript's Discussion in several ways throughout, inspired directly by these reviewer comments. We now directly discuss the need for future experimental work to better understand the nature of emotional vocabulary diversity and its complex relationship to mood and experience.

Reviewer #2:
General Comments: This paper offers an interesting, novel approach to investigating how emotional vocabularies used in natural language can predict mental and physical health, with Study 2 supplementing Study 1's findings with a larger sample and wider range of written texts. In light of the innovativeness of its emotional vocabulary (EV) methodology, this work is potentially highly impactful and may jumpstart a new line of research harnessing big data to identify markers of health and well-being (and ill-being).
To improve the paper, I list a few mostly minor questions below (not in order of importance).

Comment #1:
The introduction and literature review of this paper seemed a bit disjointed and did not have ideal flow. At times the main ideas were not clear and were somewhat confusing, especially when transitioning between previous findings about active and passive vocabularies. For example, the authors include text about indigenous hunter-gatherers in the Philippines that does not appear to be relevant for the rationale of the study.
Author reply: Thank you, we agree there were ways in which the flow of ideas in the introduction could have been clearer, and we have made changes to the introduction to address this issue that were informed by comments from each reviewer.

Comment #2:
What was the purpose of students in Study 1 completing two essays? Was it just to compute test-retest reliability? It was unclear how the two sets of essays were used or treated differently in analyses.
Author reply: Yes, the second essay was used to compute test-retest reliability in Study 1. We have clarified this in the manuscript by replacing the term "temporal stability" with "test-retest reliability" in the introduction to Study 1 (p. 6), and by revising the paragraph in the Study 1 procedure so that it links the Time 2 essay explicitly to its function for these analyses: "Undergraduates enrolled in a large online introductory psychology class completed identical writing assignments in mid-September (Time 1) and, for test-retest reliability analysis, again in early December (Time 2)." (p. 7).

Comment #3:
Was it possible that students in the class used in Study 1, who completed all kinds of measures (as well as the two essays) over the course of the semester, could have guessed or become aware of the aims of the study? What was the cover story used? Was any material relevant to the study (emotion, emotion differentiation, LIWC, natural language processing) covered in the course?
Author reply: It is not possible that the students guessed or became aware of the aims of this study, as the present study was conducted archivally; the research question was only conceptualized in 2015 after the data were collected in 2014. Questionnaires were completed to introduce students to topics as they were being taught in the course, such as issues related to self-report methodologies and cognition. Essays were initially collected as part of the introductory psychology course material to give students an opportunity to learn about mind-wandering, consciousness, and attention and gain an appreciation of William James's stream of consciousness ideas. Topics related to emotion and language were covered during the course of the semester as well. To address the broader possibility that students may have anticipated their language would be analyzed, we have now read, at random, 80 (5%) of the Time 1 essays. Of those, 19 essays (24%) made some reference to possible readers and/or use in research (e.g., "Is anyone even going to read this? ........IS ANYONE OUT THERE????"; "Sorry I'm making this so long, whoever has to read this probably isn't having a brilliant time"; "well anyways i hope this is some help to researchers because i actually tried"). Notably, only one text of the 80 (1.25%) gave any indication that the writer thought word choice might be analyzed (i.e., "I don't know what the purpose of this assignment is, but maybe it'll be scanned for keywords or something."). Given this low rate, the archival nature of the study, and the fact that students were never introduced to the idea of emotional vocabulary diversity, we believe that students' expectations almost certainly could not have biased study findings.

Comment #4:
When did participants complete the depression measure in Study 1? Also, as the authors are aware, the one-item global measure of physical health is not ideal. Thus, interpretations relevant to health (as opposed to well-being/mental health) should be taken with caution.
Author reply: The depression measure was administered on November 12, 2015, which was about 3 weeks before the end of the course (and about 2-3 weeks before the Time 2 essay used for test-retest reliability). We have added a caveat regarding the interpretive caution required for the one-item health measure to the manuscript (manuscript p. 23).

Comment #5:
For text-derived individual differences, the authors account for general vocabulary size, but do not assess or mention participants' education level for either study. This may influence their findings, especially with the significantly wider age range in Study 2 (ages 13 to 48).
Author reply: We also thought that emotion vocabularies could have been affected by education level, and this was the motivation behind the creation of the general vocabulary index we tested as a covariate. We have now clarified in the manuscript that the general vocabulary index was intended to measure general verbal ability, a widely-used proxy for education level (see, e.g., Keuleers, Stevens, Mandera, & Brysbaert, 2015), and we discuss the limitations related to absence of an education variable (p. 23).
In the Study 1 (college) sample, the education levels of participants were highly homogeneous: 99.2% were undergraduates (59.7% freshmen; 26.1% sophomores). For the blog sample, educational data were not available.
To shed further light on the possible role of educational attainment in the observed effects, we now report the average Age of Acquisition (AoA) for each word included in the EV computations (norms are in the new Supplemental Table S6, and we refer to them in the manuscript in Footnote 2). These AoA data were obtained from a published corpus of AoA norms (Kuperman, Stadthagen-Gonzalez, & Brysbaert, 2012). This reputable database of AoAs reflects receptive knowledge of 30,000 high-frequency English words. These AoA norms support the idea that the words in our EV dictionary are overwhelmingly learned in childhood or adolescence and are familiar to most native speakers by an average age of 9 years (average AoA for negative words 8.9 yrs, SD = 2.79 yrs; average AoA for positive words 8.21 yrs, SD = 2.63 yrs). Thus, it is likely that most (i.e., virtually all) of the college sample (Study 1) and the broader-aged sample (Study 2) knew the words contained in the dictionary and were able to use them with understanding and intent.

Comment #6:
The writing samples for Study 1 included in the supplemental materials are very helpful and improve understanding of the findings. It would be beneficial to include classification information for each sample. Specifically, the authors could note or rank which samples have a larger negative emotion vocabulary relative to their positive emotion vocabulary.
Author reply: We are glad that the samples (we understand this to mean in Supplement S1) were helpful in illustrating both the concept of active EVs and the general patterns in our results. In S1, each sample now appears together with the writer's EV scores and corresponding percentile ranks. We have also taken the opportunity to improve the interpretability of sample words captured in Supplement S3, where we have added EV scores to allow readers several options for how to interpret the examples, including how negative emotion vocabulary compares to positive emotion vocabulary within samples. Because word count is also indicated, readers can grasp by example how the EV scores are adjusted for word count. To improve the interpretability of the EV scores in S3, we also now repeat there the sample's range and central tendency information, for greater context.

Comment #7:
The authors list a formula used to assess EV while controlling for total word count. A very similar formula is used for analyzing general vocabulary size. It would be helpful to elaborate a little on the difference between "unique words" and "unique emotion words." The authors address what classifies unique emotion words, but do not list the inclusion criteria for unique words.
Author reply: Thank you for pointing out this opportunity for greater clarity. We now state in the manuscript (p. 10) that "unique words" refers to the number of words that appear at least once in any given text. We additionally provide an example sentence that illustrates the concept of what constitutes a "unique word" as well as explicit description of what classes of words were included in these formulae (e.g., exclusion of function words, which are not diagnostic of vocabulary abilities).
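The distinction between "unique words" and "unique emotion words" drawn in this reply can be illustrated with a short sketch. The tokenizer, the function-word list, and the emotion word list below are small hypothetical stand-ins chosen for illustration; they are not the authors' dictionary or their Vocabulate implementation.

```python
import re

# Hypothetical, deliberately tiny lists for illustration only.
FUNCTION_WORDS = {"i", "the", "a", "and", "but", "was", "so", "then", "to", "of"}
NEGATIVE_EMOTION_WORDS = {"sad", "angry", "afraid", "lonely"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def unique_words(text, exclude=FUNCTION_WORDS):
    """Distinct words appearing at least once, with function words excluded."""
    return {word for word in tokenize(text) if word not in exclude}

def unique_emotion_words(text, lexicon):
    """Distinct words from the text that appear in the given emotion lexicon."""
    return set(tokenize(text)) & lexicon

sample = "I was sad and sad and then angry, but the day was so long."
general = unique_words(sample)
negative = unique_emotion_words(sample, NEGATIVE_EMOTION_WORDS)
```

Here "sad" occurs twice but counts once, so the sample contains four unique content words ("sad", "angry", "day", "long"), two of which are unique negative emotion words.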

Reviewer #3:

Comment #1 (Framing):
In principle, we know little about natural emotion word repertoires or spontaneously produced emotion language in real world situations, and the distinction between active and passive vocabularies is of particular value in this regard. That being said, the manuscript appears to be framed in an either-or sort of fashion which rings false. The overall impression is that the work is trying to overturn existing findings rather than contextualize them, and in doing this overstates its own conclusions. Even the title is misleading given the findings: it is mainly diverse vocabularies for negative emotions that appear to be linked to self-reported negative outcomes (i.e., 'markers of distress' are not objective, whereas the work being criticized measured outcomes in a more objective fashion). In addition, the authors' own findings suggest that diverse vocabularies for positive emotions are linked to positive aspects of self-reported well-being.
Author reply: We agree that one of the primary contributions of the current research is to shed light on how active, unprompted emotion vocabularies behave, as well as their psychological relevance. Given the general lack of research on this topic, we also agree that there is considerable value in expanding our current understanding of emotions and affective language, building on the extensive research that has been conducted on passive emotion vocabularies. In our revision, we have made it clear that we are not making an "either-or" distinction, and we have used our revisions to the Discussion (detailed in other replies above and below) to contextualize our findings within past work. We also acknowledge that humans possess and make use of active and passive vocabularies, likely in different ways (as our findings suggest), and that our current findings are not inherently incongruent with past work found in the emotion term recognition/labeling literatures. Instead, we now explicitly call attention to the fact that active EVs constitute another side of the coin, which has seen little to no empirical, psychological research to date (pp. 22-23). To improve the alignment between the title and study findings, we have also changed the manuscript title.

Comment #2 (Method):
The authors do not provide any details on how emotion words were selected for use in their custom dictionary. Were these based on previous literature, normative ratings, corpus data (e.g., frequency of use)? Many of the words included (e.g., alone, bad, bitter) are not solely applied to mental life or emotional experience. Further, it seems possible that the extreme imbalance between the number of negative and positive emotion words may have artificially inflated negative EV scores relative to positive EV scores, casting doubt on whether the two indices can be directly compared with each other and with other text-derived and self-report variables. I would like to see the authors redo their analyses with a comparable number of emotion words in each index, and provide evidence of principled inclusion criteria.
We now include a thorough discussion of these concerns in the manuscript, including the selection of the words and the concern that words are not applied solely to mental life or emotional experience (see pp. 24). Regarding the imbalance between the number of negative and positive emotion words in the emotion vocabulary calculations, the reviewer raises an interesting and complex issue, which we answer in three parts: 1. This imbalance is to be expected, based on the long-observed fact that the English language has more words for negative states than positive states (as Reviewer 1 brought up in Comment #3). Moreover, it appears that the imbalance between negative and positive words in the lexicon may transcend cultures/languages (Shrauf & Sanchez, 2004 (Piantadosi, 2014;Zipf, 1935Zipf, , 1949. To highlight this principle in our own samples, we provide a few examples here. In the student sample (Study 1), we find that the majority of words counted (74%) were used by fewer than 1% of students, and 25% were only used once in the entire corpus. The low impact of adding words to the lists is also evidenced by the new additions to our word lists (which we made during the course of this revision; see the following response point), which affected scoring for very few individuals. For instance, the added word "enthralled" occurred in < 0.5% of the blogs (Study 2) and not at all in the student writing (Study 1). (By comparison "love," "happy," and "excited" occurred,respectively,in 65%,47%,and 22% of blogs,and 46%,35%,and 24% of student texts.) Thus, although lengthening the positive word list raises the ceiling for possible positive EV scores, this affects a particularly small minority of individuals, and therefore does not alter the descriptive, sample-wide characteristics and relationships discussed in our work. 
In this way, word-counting-based approaches, while they would be imprecise for individual-level diagnostic assessment, are robust against measurement noise at the group level. We now comment on the negligible effects of word list length and balance in the supplement (S8), and underscore the appropriate uses of the EV approach, for this and other reasons, in the Discussion (pp. 25-26).
3. Lastly, to address this issue as thoroughly as possible, we revisited the word lists to improve the balance as much as possible. We added 13 more positive words to the positive word list, re-ran all study analyses, and fully updated the manuscript. The updated word list has replaced the previous version found in Supplement S1 and is also included in the free Vocabulate software available for reader download. As a result of this change, the maximum possible Positive EV score was increased for each participant, and the possible mean Positive EV increased accordingly. However, as expected, these additions resulted in only small descriptive changes, and in no changes at all to the broader patterns of results as observed through a variety of indicators. Old and revised positive EV scores correlated with each other at r=.97 (Study 1) and r=.98 (Study 2), and Positive EV scores increased for fewer than 23% of students (Study 1, Time 1) and for 23% of blog writers (Study 2). The magnitudes of any changes were extremely small: on average, positive EV increased in Study 1 by .04 points (SD increase = .09), and in Study 2 by .02 points (SD increase = .01). The largest increase in either study was for a student whose unique positive emotion count went up by 1 word, from 4 to 5 unique words; her positive EV score went up accordingly from 2.86 to 3.57 (a .71 difference), although she retained her relative rank in the sample as having the second highest positive EV.
Most importantly, the primary study outcomes (correlation coefficients) involving positive EV shifted by a maximum of .02 in Study 1 and .02 in Study 2, resulting in no changes to the patterns of results.
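As an illustration of the arithmetic in the worked example above (4 to 5 unique words; 2.86 to 3.57), the following minimal sketch assumes that an EV score is the number of unique emotion words used per 100 words of text. That formula and the 140-word text length are inferred from the reported numbers and are not stated in this letter. The function name `ev_score`, the tiny word lists, and the added word "delighted" are hypothetical stand-ins, not the Vocabulate dictionary or the actual 13 additions.

```python
import re

# Hypothetical mini word lists for illustration only (NOT the Vocabulate dictionary).
OLD_LIST = {"love", "happy", "excited", "calm"}
NEW_LIST = OLD_LIST | {"delighted"}  # "delighted" is an assumed stand-in for an added word

def ev_score(text, word_list):
    """Unique list words used, per 100 words of text (assumed scoring formula)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    unique_hits = set(tokens) & word_list
    return 100.0 * len(unique_hits) / len(tokens)

# Toy 140-word text: five distinct positive words plus 135 neutral filler words.
text = " ".join(["love", "happy", "excited", "calm", "delighted"] + ["the"] * 135)

print(round(ev_score(text, OLD_LIST), 2))  # 2.86 (4 unique words counted)
print(round(ev_score(text, NEW_LIST), 2))  # 3.57 (5 unique words counted)
```

Under these assumptions, a single newly counted word moves the score by about .71 only because the text is short; a text that contains none of the added words is scored identically under both lists.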

Comment #3 (Method):
The use of a coarse-grained, lexico-centric means of linguistic analysis is limited in the insights it can provide. More nuanced features of the language used in relation to emotion may be uncovered by a deeper analysis of themes, such as those attainable through topic modeling and distributional semantics. While the authors mention that there was a wide range of thematic content, this diversity is not quantified in any way. What does a thematic analysis suggest, and how do the present results relate to common themes identified? The relationships between the EV indices and other text-derived measures seem like obvious illustrations (e.g., negative emotion words are linked to health issues, positive words to achievement).
Author reply: We agree that thematic analysis is an interesting avenue for developing an initial sense of contexts that may elicit EV variations, and we have thus added extensive analyses on thematic dimensions of the texts to the Supplements (new section S7). Principally, we used the meaning extraction method (MEM) topic modeling approach (see, e.g., Argamon, Koppel, Pennebaker, & Schler, 2007; Boyd, 2017; Chung & Pennebaker, 2008) to extract and quantify topics from each study in a data-driven manner. Unlike the LIWC categories used in the present study, which are determined a priori, MEM themes are derived in a bottom-up fashion from the emergent language patterns of the texts. By reducing the semantic dimensionality of words used in each corpus, we established and subsequently quantified the overarching "themes" or "topics" present in the writing samples from each study, respectively, which could then be statistically examined for their relationships to participant EVs.
To explore the reviewer's question, we evaluated the correlations between MEM topic scores and EV scores. As the tables in S1 show, there were clear patterns of correlation, such that broader EVs were generally correlated in interesting and often intuitive ways with various topics. For example, in Study 1, students who more prominently invoked the topic of college (characterized by high loadings of the words "year," "campus," "college," "degree," "student," "major") used less diverse negative EVs (r=-.08, p=.001) and more diverse positive EVs (r=.15, p<.001). Students invoking the topic of sleep ("day," "early," "exhaust," "late," "hour," "nap," "bed") used less diverse positive EVs (r=-.13, p<.001). In Study 2, bloggers who more frequently showed a poetic theme ("heart," "soul," "eye," "tear," "darkness," "deep," "light," "dream," "sky") used more diverse emotion vocabularies of both valences, while bloggers who wrote more on the theme of recipes ("pepper," "recipe," "butter," "salad," "tomato," "cut," "stir") used less diverse EVs of both valences. These results echo the primary findings in the manuscript, in that they give an impression of diversity in emotion language mirroring concerns with themes germane to distress and wellbeing.
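For readers who want to see the kind of computation behind the topic-by-EV correlations reported above, here is a minimal sketch of a Pearson correlation between per-person topic scores and EV scores. It is not the analysis pipeline used in the supplement: the helper `pearson_r` and the toy numbers are hypothetical and carry no empirical meaning.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

# Toy data (invented for illustration): one topic score and one EV score per writer.
topic_scores = [0.1, 0.4, 0.2, 0.9, 0.5]
positive_ev = [1.0, 1.5, 1.2, 2.4, 1.4]

print(round(pearson_r(topic_scores, positive_ev), 2))
```

With samples as large as those in Studies 1 and 2, even small coefficients (such as r=-.08) can reach conventional significance, so the magnitude of r is the more informative quantity when reading such tables.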

Comment #4 (Method):
The choice of criterion validity measures in Study 1, and the lack of criterion validity measures in Study 2, diminishes the impact and utility of these findings. In Study 1, text-derived measures are not linked to robust measures that are necessary to fully contextualize the present findings. For example, it seems possible that responses to a single-item self-report measure of physical health may be influenced by current mood in a way that longer-form measures or certainly objective measures would not be. While the authors discuss the possibility that words influence our experience of the world (i.e., a moderate linguistic relativity hypothesis), they do not sufficiently demonstrate that language has any bearing on non-linguistic variables. Without any non-text-derived measures of mood, health, and well-being, a critical review of Study 2 results in particular would suggest they merely demonstrate the effect of language on language.

[...] (Boster, 2017; Johnstone & Scherer, 2000). We agree with the existing framework that we are fitter as a species thanks to language, without which there would be only incommunicable, unsustainable "sensorimotor toil" that would ultimately interfere with reproduction (Cangelosi & Harnad, 2001). The segmentation of emotional experience (positive and negative) can be thought to operate along the same principles of adaptiveness as all forms of categorization in cognition (Harnad, 2017); i.e., humans would be disadvantaged by an absence of any language for positive and negative states. Recent findings of cross-cultural universals in the structure of emotion semantics underscore the fundamental species-level adaptiveness of a lexical system for naming both positive and negative states (Jackson et al., 2019).
That said, bringing to bear evolutionary-level knowledge on trait-level variability, while potentially useful, poses complex theoretical and methodological challenges not to be undertaken lightly, and which the current study is not designed to address (see Brown & Richerson, 2014; Buss, 2015; Loehlin, 1992; Tooby & Cosmides, 1990). Our hypothesis here is that there will be a broad correspondence between vocabulary richness and experience. This is fundamentally concerned with patterning of individual differences in vocabulary size within species, which we now explain in the manuscript does not depend on the overall adaptiveness of language in general (pp. 21-22). In the process of making these changes, we have also added an elaboration to the discussion that places the current findings in deeper conversation with current constructivist theories of emotion language, and we show how current results and their implications may be consistent with constructivist theories in nuanced ways (p. 21).

Comment #6 (Interpretation):
Word usage (and therefore active vocabulary) isn't just a matter of individual 'comfort' or 'interest': there are complex processes underlying both lexical selection (within the individual brain) as well as communities of speech (between individual brains). Factors such as recency and frequency, prestige and affiliation, are also involved in the creation and maintenance of word repertoires. The authors' assertion that people who are suffering from distress are more interested in negative emotion words (p. 9, line 157) should be discussed along these lines.
Author reply: This is a good point. We did not intend to suggest that those individuals suffering from distress are more interested in negative emotion words. We have now clarified that, in accordance with cognitive and linguistic theories on vocabulary enrichment, the psychological process of being interested in or attentive to one's own affective states can serve as a mechanism by which the individual is motivated to acquire and deploy more varied ways of describing their emotions. Additionally, we now explicitly incorporate past research that touches on these other psycholinguistic and sociolinguistic factors that are well-established and known to be involved in vocabulary development and maintenance: "In addition to well-established determinants of vocabulary acquisition and maintenance (e.g., Kenji & D'andrea, 1992; Van Overschelde, 2002), we similarly suggest that preoccupation with or interest in one's own affective states could contribute to the development of increasingly diverse affective taxonomies and lexica." (p. 5).

Comment #7 (Interpretation):
The authors criticize previous findings without offering clear recommendations for how emotion language should be acquired and used. They suggest that it is not vocabulary size that needs to be increased, but stop short of proposing other features of language or the conceptual system that may be driving positive outcomes. Existing lines of research do investigate these mechanisms, yet are criticized by the authors for employing constrained/passive methods. This seems to me an opportunity to acknowledge that a multimethod approach may be necessary to fully investigate the underlying relationship between emotional experience and language use.
Author reply: It is true that we had deliberately steered away from making explicit recommendations on how emotion language should be acquired and used, because the current correlational methods preclude such inferences. However, we share the reviewer's interest in this question, and we are familiar with the existing research that begins to speak to it. Upon reflection, we have decided that readers are likely to be wondering the same thing, and thus we now explicitly discuss the temptation of, and reasons to refrain from, extrapolating from the current findings to form applied recommendations for acquiring/using emotion language. In doing so, we have named several other features that may drive positive outcomes (pp. 26-27).

Comment #8 (Miscellaneous):
Can the authors indicate how the sample texts in the supplemental material were selected? It is not clear if they are intended as representative cases, outliers, or random samples.