Introduction

Memories of the personal past that are retrieved unintentionally and effortlessly have been termed involuntary autobiographical memories (IAMs1). Recent evidence suggests that some IAMs are experienced recurrently—that is, episodes of the same event can be retrieved repetitively and involuntarily2. Past studies indicate that recurrent IAMs are commonly experienced in everyday life: large proportions (52–55%) of general populations (e.g., undergraduates, nationally representative samples, community-dwelling older adults) have endorsed experiencing at least one recurrent IAM within the past year2,3,4. However, recurrent IAMs have also been conceptualized as harmful or characteristic of psychiatric disorders5,6, despite their prevalence among the public. For instance, these memories have been described as a transdiagnostic component of many clinical disorders (e.g., depression, anxiety, PTSD), acting as a mechanism by which psychopathology emerges or is maintained5,7. As such, recurrent IAMs have been simultaneously characterized as maladaptive or clinically relevant on one hand5 and benign or pleasant on the other2.

This discrepancy has been acknowledged by some researchers, who have suggested that while many people experience recurrent IAMs, it may be the case that only some subset of IAMs are dysfunctional or related to poor mental health8,9,10,11. Indeed, our recent work has supported these hypotheses, finding that the maladaptive subset of recurrent IAMs could be those that are negative in valence: participants who experienced recurrent IAMs with self-reported negative valence had elevated symptoms of depression, PTSD, social anxiety, and general anxiety compared to those who experienced neutral, positive, or no recurrent IAMs3,4. However, solely relying on self-reported ratings of memories’ phenomenological properties (e.g., valence) is a major limitation of this field to date. For one, valence ratings are confounded with content (i.e., what people report remembering) since certain events are likely to be more negative or positive than others. Without examining the events that participants are remembering, we cannot conclude as to whether it is negative valence per se that characterizes maladaptive recurrent IAMs or that maladaptive recurrent IAMs tend to involve certain types of content. Prior attempts to characterize recurrent IAMs have been done without knowing what these memories are actually about, prompting us to ask if content analysis could help explain differences between maladaptive and benign recurrent memories.

Past content analyses of autobiographical memories have been fruitful at describing the events that participants remember, revealing common topics such as accidents, holidays, and interpersonal relationships12,13,14,15. Importantly, some suggest that content in recurrent IAMs might change as a function of mental health status, and could even provide insights into how recurrent IAMs might diverge across different disorders5,6,16,17,18. In other words, researchers have hypothesized that content in AMs (especially recurrent IAMs) differs between (1) those experiencing high versus low levels of psychopathology, as well as between (2) those with different mental health disorders (disorder-specific content). For example, early work has reported that recurrent IAMs about abuses or assaults were significantly related to greater depression severity, whereas recurrent IAMs about other topics (e.g., “illness or death”, “relationships/family”) were not significantly related to psychopathology19. More generally, some studies have found that AM content differed between those high versus low in symptoms, or with versus without diagnoses: patients with severe health anxiety have been more likely to report AMs about disease, illness, or death compared to participants without diagnoses20, and participants with high social anxiety have been more likely to report social anxiety-related AMs than participants with low social anxiety21. Researchers have also observed linguistic differences in AMs produced by individuals with versus without psychiatric diagnoses. For instance, participants with social anxiety disorder have been found to use more self-referential, anxiety-related, and sensory language compared to nonpsychiatric controls22. Beyond distinguishing between high versus low levels of psychopathology, these content differences in AMs have also been theorized to be disorder-congruent, in that recurrent IAMs should differ across disorders in terms of their themes or topics5. Indeed, linguistic variables (e.g., sensory-somatic and self-referential language) have been shown to differentiate AMs produced by those with bipolar disorder, unipolar depression, or no diagnoses23.

Though these results have supported that there are associations between AM content and mental health, numerous studies have not reached such conclusions. Other researchers have observed no significant differences in IAM content as a function of psychopathology, including between dysphoric versus nondysphoric participants24 and between participants with high versus low social anxiety25. Previous work has also failed to find evidence of disorder-congruent content: while AM content from participants with anxiety disorders was significantly different from nonclinical controls, content did not significantly differ across various disorders (i.e., social anxiety disorder versus panic disorder26). Taken together, it remains inconclusive as to whether AM content varies across symptom severity or across disorders. It is possible that some of these mixed findings could have arisen due to limitations such as (1) relatively small sample sizes, (2) modeling content as pure categories (e.g., labeling each memory as containing a single topic), and (3) valence being entangled with content.

First, sample sizes have typically been relatively small (<100) due to the populations being studied (e.g., people with psychiatric diagnoses), which limits recruitment. Sample sizes also have an upper limit when conducting traditional, manual content analyses, since large volumes of text can quickly become unfeasible to code manually. While data from relatively small sample sizes have provided valuable foundations for this research area, it is yet to be seen whether past findings generalize to larger, nonclinical samples. Second, the prevailing method of content analysis—single, mutually exclusive content labels being assigned to each document (i.e., single-membership models)—is a rather coarse measure of content. Evidence suggests that mixed-membership models (i.e., assuming that each document is a mixture of topics) offer a more granular measure of content that can illustrate topic structures distinct from those produced by single-membership models27,28. Third, past studies have not typically disentangled content and valence. We believe examining both content and valence simultaneously is an important open research question because while some types of content likely involve congruent valence (e.g., deaths and negative valence), content and valence can also be relatively independent. A participant might recall a relatively unpleasant event yet ascribe neutral or positive valence to the memory (e.g., failing a test, which subsequently led to improvements in study habits); conversely, a memory involving a relatively mundane event can feel highly distressing (e.g., a family dinner, which evokes feelings of homesickness after having moved out). Some have suggested that content is relatively unimportant compared to feelings and thoughts evoked by one’s IAMs. For example, individuals’ negative appraisals of their IAMs have been found to be better predictors of depression symptoms than experimenter-rated severity of the recalled event29. Given this, we asked whether recalled details of the event still matter after accounting for the valence ascribed to the memory. While many previous studies have measured content and valence in AMs and incorporated these variables into their analyses (e.g., matching AMs for emotional intensity22), to the best of our knowledge, studies to date have not simultaneously examined the relationships of IAM content and valence with psychopathology.

Here, we tested the hypothesis that content in recurrent IAMs would be associated with symptoms of mental health disorders, unique from previously observed links between self-reported valence and psychopathology3,4. Further, we also asked whether any relationships between content and symptoms would be distinct across different disorders (e.g., disorder-specific content). To address these questions, we analyzed both content and self-reported valence in recurrent IAMs experienced by a large nonclinical sample.

Methods

In a previous study, we conducted the first large-scale content analysis of recurrent IAMs using computational techniques (e.g., machine learning, natural language processing) and highlighted the validity of using semi-automated methods for content analysis15. Here, we used the same approach to assess how symptoms of mental health disorders (i.e., depression, posttraumatic stress, social anxiety, general anxiety) might uniquely predict the use of different content categories (i.e., topics) within recurrent IAMs. By using computational methods, the current study analyzed data from a much larger sample size (N = 6187) than previous work and allowed us to ask more nuanced questions about content in recurrent IAMs (e.g., modeling topics as continuous variables rather than categorizing memories as containing single topics).

Participants

As part of a previous study15, a convenience sample of undergraduate students was recruited at the University of Waterloo, who participated in return for course credit. Data were collected in five waves between September 2018 and February 2020, with each wave occurring at the start of an academic term (i.e., Fall/September, Winter/January, Spring/May). In total, 6187 unique individuals participated, and they produced 3624 text responses (not all participants experience recurrent IAMs, so not all participants can produce text responses; see Recurrent Memory Scale under Materials). Of these participants, 71% were women, 28% were men, and 1% were nonbinary, genderqueer, or gender nonconforming. Participants were mostly White/Caucasian (39%), East Asian (23%), or South Asian (19%), and were primarily born in Canada (66%), China (9%), or India (5%). Mean age was 19.9 (SD = 3.3, range = 16–49).

Materials

Recurrent Memory Scale

The Recurrent Memory Scale3 was used to assess participants’ recurrent IAMs. Participants indicated if they had experienced at least one recurrent IAM within the past year, not within the past year, or never2. If they had experienced at least one within the past year, they wrote a brief description of their one most frequently recurring IAM and rated it on ten 5-point Likert scales (e.g., frequency of recurring, valence3). For instance, valence of their most frequently recurring IAM was assessed using the item “Is the recollection emotionally very positive, positive, neutral, negative, or very negative?” (−2 = very negative, 0 = neutral, 2 = very positive). While participants were administered the full scale, here we focus on participants’ text descriptions of their recurrent IAMs (content) and self-reported valence ratings.

Depression Anxiety Stress Scales

The Depression Anxiety Stress Scales-21 (DASS-2130) consists of 21 items with three subscales: depression (DASS-D), anxiety (DASS-A), and stress (DASS-S). Internal consistency was high in the current sample for the full scale (α = .95) and the subscales for depression (α = .91), anxiety (α = .87), and stress (α = .89).

Posttraumatic Stress Disorder Checklist for DSM-5

The Posttraumatic Stress Disorder Checklist for DSM-5 (PCL-531) consists of 20 items assessing symptoms of PTSD. Participants indicated the degree to which they experienced symptoms in the past month following any very stressful event of their choosing. Internal consistency was high in the current sample (α = .96).

Social Phobia Inventory

The Social Phobia Inventory (SPIN32) consists of 17 items assessing fear, anxiety, and physical discomfort experienced during social situations. Internal consistency was high in the current sample (α = .95).

State-Trait Inventory of Cognitive and Somatic Anxiety—Trait Version

The State-Trait Inventory of Cognitive and Somatic Anxiety—Trait Version (STICSA-T33) consists of 21 items assessing cognitive and somatic aspects of trait anxiety. Internal consistency was high in the current sample (α = .94).

Procedure

Participants opted into completing this study, for which they received course credit towards undergraduate psychology courses. It consisted of a 60-minute online survey completed in a single session. This survey was used by the University of Waterloo’s Department of Psychology to characterize students volunteering to participate in psychology-related studies. After providing informed consent, participants voluntarily completed a battery of questionnaires in a randomized order, including the Recurrent Memory Scale and mental health indices (DASS-21, PCL-5, SPIN, and STICSA-T). All other measures in the online survey were unrelated to the current study (e.g., administered by other researchers at the University of Waterloo). All procedures were approved by the University of Waterloo’s Office of Research Ethics (Protocol #40049).

Data preparation

Prior to analysis, we first used supervised machine learning (ML) to detect and remove invalid texts; these include “don’t know” responses, incomprehensible or nonsensical responses, or responses that are irrelevant to the question (e.g., describing dreams when the question asked about memories34). Removing invalid texts is a recommended step in text analysis35,36 because invalid texts are unrelated to the construct in question (here, recurrent IAMs), and excluding them reduces noise in the data. Previous work has confirmed that ML-based methods can be more effective at identifying invalid text responses compared to other existing approaches, such as response length or time34. Here, the ML-based approach identified 202 texts as invalid (71 human labeled, 131 model predicted15), all of which were excluded from further analyses.

Valid texts were then preprocessed following current recommendations35,36,37, including tokenization, cleaning, stop word removal38, vocabulary pruning39, and lemmatization40. Texts were represented using a bag-of-words, unigram approach41, which decomposes texts into singular words without retaining information about word order.

Topic modeling

We discovered topics in participants’ descriptions of their recurrent memories using structural topic modeling (STM42,43). STM is a method of unsupervised machine learning that estimates hidden topic structures that could have plausibly produced the observed set of documents (i.e., corpus). By using texts as the input, topic modeling can output topics, or groups of words that can be interpreted as themes in the input texts42,44,45. Given the success of past work highlighting STM as a valid method of semi-automated content analysis with AM texts (see15 for details about preprocessing, model selection, and validation with human judgment), we extended our prior approach to answer novel questions about recurrent IAMs and psychopathology.

In previous work, we constructed topic models based on this dataset using only one covariate: participants’ self-reported ratings of the memory’s valence15. Here, we analyzed this dataset in conjunction with mental health-related covariates: participants’ current symptoms of depression (DASS-D), PTSD (PCL-5), social anxiety (SPIN), and general anxiety (STICSA-T). Data and code supporting the findings of this study are openly available on the Open Science Framework (https://doi.org/10.17605/OSF.IO/GUR5V).

Results

Recurrent IAM valence predicts symptoms of mental health disorders

We replicated our previous findings in that negative valence in recurrent memories was significantly related to greater symptoms of depression, PTSD, social anxiety, and general anxiety3,4 (Fig. 1). Elevated symptoms were associated with more negative valence in recurrent IAMs. Correlations between recurrent IAM valence and symptoms of mental health disorders were all significant (ps < .001, rs < −.16).

Fig. 1: Self-reported valence ratings of recurrent IAMs and symptoms of mental health disorders.
figure 1

Note. DASS-D Depression, Anxiety, Stress Scales—Depression Subscale, PCL-5 PTSD Checklist for DSM-5, SPIN Social Phobia Inventory, STICSA-T State-Trait Inventory of Cognitive and Somatic Anxiety—Trait Version, IAM involuntary autobiographical memory. Shaded ribbons represent 95% confidence intervals. ***p < .001.

As an exploratory analysis, we also examined whether these patterns held when excluding participants who scored above clinically relevant cutoffs on any of the mental health indices31,46,47,48 (Fig. S1 in Supplementary materials). Even following these exclusions (nexcluded = 1562; 63%), all patterns remained significant (ps < .03, rs < −.07).

Topic structure and modeling

To examine content, we implemented structural topic modeling (STM) using the {stm} package in R43. Model selection and validation procedures are reported elsewhere in detail15. In brief, researchers must select an a priori number of topics to be identified when using STM43. To select an appropriate number of topics, we simulated and inspected models with the same parameters (e.g., covariates) across a varying number of topics (5–25 topics36,49). We then selected a final number of topics using a two-stage approach41,50. First, internal validation (based on computed metrics derived from the data) guided the initial selection of three candidate models out of the twenty simulated models. Second, external validation (based on human judgment and performance measures) guided the selection of the final model out of the three candidate models15. Previous work has indicated consistency between the current topics (discovered via semi-automated methods) and topics typically found in AMs (using manual methods15). The final topic structure obtained is shown in Table 1. Inclusion of the additional mental health-related covariates (participants’ scores on the DASS-D, PCL-5, STICSA-T, and SPIN) did not alter the topic structure obtained during the original study15, which only included valence as a covariate. Correlations between topics are also described in this previous study15.

Table 1 Topics in recurrent IAMs.

Predicting topic prevalence using symptoms of mental health disorders

Symptoms of mental health disorders significantly accounted for unique variance in topic prevalence, even when controlling for valence ratings and symptoms of all other disorders (see https://doi.org/10.17605/OSF.IO/GUR5V for details). We found unique relationships between specific topics and specific symptoms of disorders, above and beyond how positive or negative a memory was rated (Table 2).

Table 2 Significant predictors of topic prevalence in recurrent IAMs

Depression

Depression symptoms were significantly and uniquely predictive of greater use of topic 8 (“Abuse and trauma”; Fig. 2).

Fig. 2: Predicted topic prevalence using depression symptoms.
figure 2

DASS-D Depression, Anxiety, Stress Scales—Depression Subscale. Different panels represent different topics, denoted by topic numbers and most representative words at the top of each panel. **p < .01. Shaded ribbons represent 95% confidence intervals.

PTSD

PTSD symptoms were significantly and uniquely predictive of greater use of topic 2 (“Negative past relationships”). Further, PTSD symptoms were significantly and uniquely predictive of less use of topic 4 (“Embarrassing events”), topic 9 (“Conversations”), topic 11 (“Interactions with friends”; Fig. 3).

Fig. 3: Predicted topic prevalence using PTSD symptoms.
figure 3

PCL-5 = PTSD Checklist for DSM-5. Different panels represent different topics, denoted by topic numbers and most representative words at the top of each panel. **p < .01, *p < .05. Shaded ribbons represent 95% confidence intervals.

Social anxiety

Social anxiety symptoms were significantly and uniquely predictive of greater use of topic 12 (“Communication and miscommunication”) and topic 16 (“Reflections on decisions”). Further, social anxiety symptoms were significantly and uniquely predictive of less use of topic 2 (“Negative past relationships”), topic 8 (“Abuse and trauma”; Fig. 4).

Fig. 4: Predicted topic prevalence using social anxiety symptoms.
figure 4

SPIN Social Phobia Inventory. Different panels represent different topics, denoted by topic numbers and most representative words at the top of each panel. **p < .01, *p < .05. Shaded ribbons represent 95% confidence intervals.

General anxiety

General anxiety symptoms were significantly and uniquely predictive of greater use of topic 9 (“Conversations”; Fig. 5).

Fig. 5: Predicted topic prevalence using general anxiety symptoms.
figure 5

STICSA-T State-Trait Inventory of Cognitive and Somatic Anxiety—Trait Version. Different panels represent different topics, denoted by topic numbers and most representative words at the top of each panel. *p < .05. Shaded ribbons represent 95% confidence intervals.

Discussion

Controversy surrounds the basic nature of recurrent IAMs. What are they typically about? Which of them—if any—are dysfunctional (i.e., related to worse mental health)? Some authors have speculated that only a subset of recurrent IAMs is maladaptive8,9. Evidence suggests that this maladaptive subset could be characterized by negative valence3,4,51,52,53,54. However, valence is potentially entangled with content (i.e., what people report remembering), since valence is related but dissociable from content. In other words, while some events or topics might typically be associated with certain valences (e.g., “Experiences with family members” and positive valence), content can also be relatively independent from valence. It remains relatively unexplored whether content (one’s reconstruction of the event) is related to psychopathology, after accounting for valence (emotional responses to IAMs). In the current study, we examined whether the content and valence of recurrent IAMs could differentiate between memories that are related to worse mental health status versus those that are not. By analyzing content in large samples of recurrent IAMs using structural topic modeling15,42,43, our work indicates that both the valence and content of one’s recurrent IAMs are linked to symptoms of mental health disorders.

Specifically, we replicated the relationship between negative IAM valence and greater symptoms of all disorders3,4. This result is consistent with past work showing more negative AMs in individuals with major depressive disorder52, PTSD54, social anxiety disorder53, or generalized anxiety disorder51. It also replicates a significant association between more negative AM valence and greater depression symptoms55. Interestingly, Rubin et al.15 also report a nonsignificant correlation between AM valence and PTSD symptoms (r = −.12). Here, we found significant negative correlations between recurrent IAM valence and depression, as well as PTSD symptoms, potentially because of the much larger sample size (and statistical power to detect effects). Alternatively, the current work may have found these effects because it focused on recurrent IAMs whereas Rubin et al.15 examined a variety of AMs (e.g., both voluntary and involuntary ones); it may be that involuntary or recurrent memories have stronger relationships to mental health than voluntary memories. The current results provide empirical support for theoretical models wherein recurrent IAMs, and the emotions they evoke, are a transdiagnostic process involved in psychopathology5,7, even in large, nonclinical samples.

In addition, our topic model allowed us to test hypotheses about whether one’s level of psychopathology is associated with recurrent memory content2,3,5. We used participants’ symptoms of mental health disorders (i.e., depression, PTSD, social anxiety, general anxiety) to predict the content (i.e., topic prevalence) within recurrent IAMs. What we have shown here is that some topics were seemingly benign: some topics such as “Physical activities and performance” and “Environment and locations” were not significantly related to any mental health index. Critically, specific topics were significantly associated with symptoms of some mental health disorders, but not others; no topics were universally related to symptoms of all mental health disorders (i.e., depression, PTSD, social anxiety, general anxiety), suggesting that recurrent IAM content is disorder-specific.

One of our key findings was that each mental health index uniquely predicted the prevalence of distinct topics. For example, while PTSD symptoms were significant positive predictors of the “Negative past relationships” topic (e.g., “relationship”, “negative”, “situation”, “traumatic”), depressive symptoms were significant positive predictors of the “Assaults and abuse” topic (e.g., “assault”, “abuse”, “trauma”, “fail”). Furthermore, social anxiety symptoms were significant positive predictors of the “Communication and miscommunication” topic (e.g., “question”, “ask”, “teacher”, “class”), and general anxiety symptoms were significant positive predictors of the “Conversations” topic (e.g., “someone”, “conversation”, “say”, “person”). These findings support the idea that there is indeed disorder-specific content in recurrent IAMs6,16, and that recurrent IAMs containing certain types of content are more likely to reflect psychopathology than other types of content5,19. In fact, we replicated a significant positive relationship between use of the “Assaults and abuse” topic and depression symptoms19,56, but with a much larger sample size and in a nonclinical sample. Our current evidence also lends support to hypotheses about the nature of emotional memory processes in PTSD. In particular, models of PTSD have suggested that difficulties in retrieving positive memories could underlie the development or maintenance of PTSD57,58,59,60. In line with these ideas, our results indicated that greater PTSD symptoms were related to less use of a positive topic (“Interactions with friends”) in recurrent IAMs. Topics significantly related to social anxiety (e.g., topic 12: “Communication and miscommunication”, e.g., incorrectly answering a question aloud during a class) seem to reflect ideas that recurrent IAMs in social anxiety disorder might focus on specific, aversive social events that individuals have experienced53,61. Similarly, topics significantly related to general anxiety (topic 9: “Conversations”) might reflect the high prevalence of social/interpersonal concerns as a worry topic in generalized anxiety disorder62,63.

Overall, our results suggest that while it is accurate to say that negative recurrent IAMs are consistently related to increased symptoms of psychopathology3,4, this statement can now be refined. Here, we show that both valence and specific types of content in recurrent IAMs are related to symptoms; self-reported valence as well as the use of specific topics in recurrent IAMs were significantly related to mental health indices. Moreover, many negative topics were not significantly related to symptoms of any disorder (e.g., “Stressful events”, “Confrontations, fights, and arguments”). Though emotions evoked by recurrent IAMs were an important component in the relationship between these memories and psychopathology, our study suggests that content (e.g., types of events described, how the individual reconstructs them) is also vital to consider and provides unique insight into mental health status. This is notable because it suggests that the emotional valence ascribed to specific memories is not entirely sufficient to distinguish maladaptive recurrent memories from benign ones. Based on our analyses, even if participants attributed the same level of valence to their memories (e.g., “very negative”), the way they reconstructed the memory (i.e., content) was still significantly related to their current symptoms of mental health disorders.

In terms of limitations, we only investigated recurrent IAMs—a specific type of AM—in the present study. While there is theoretical precedent claiming that these memories are particularly relevant to psychopathology5,7, the current results could be compared against voluntary AMs to test these previous hypotheses. For instance, it would be valuable to examine if there are dissociations or similarities between voluntary and involuntary AM content in relation to PTSD symptoms (as in55). Also due to the current study’s focus on recurrent IAMs, our data were largely limited to a past-focused temporal orientation, as participants were instructed to only describe/rate memories. While participants were free to describe future-oriented content, this would have been incidental (at the discretion of the participant) and related to their past-oriented memories. Future studies could compare memories to other forms of spontaneous or clinically relevant thought (such as episodic future thoughts, ruminations, or worries) to investigate how content and/or valence might have varying relationships with psychopathology depending on the type of thought. Alternatively, studies could qualitatively classify topics in terms of whether they are more past- or future-oriented.

Another relevant design choice was to focus on participants’ one most frequently recurring IAM. This approach is consistent with past literature2,3,4, which facilitates comparison with prior studies. However, past work has also found that participants report experiencing many different recurrent IAMs (M = 7.33). As such, future studies could ask participants to list all (or at least multiple) recurrent IAMs they have experienced recently. By asking participants to describe and/or rate these less frequent recurrent IAMs, one could potentially better capture the variety of memories that individuals experience in their daily lives.

Additionally, it is worth noting that the cross-sectional nature of the current study also introduces limitations. Participants completed all measures in a single, asynchronous, online session. Potentially, any given participant could have been in a negative mood during the study, which could have led to greater accessibility of negative recurrent IAMs, or more negative ratings of these memories’ valence1. At the same time, negative mood could have inflated participants’ scores on mental health indices64. While mood is indeed part of theoretical models of why recurrent IAMs persist and correlate with mental health outcomes29,65, the current work cannot conclusively discern the directionality of effects observed. Future work could consider methods such as ecological momentary assessment66 to unravel temporal dynamics of recurrent IAMs and the emotions preceding/following them.

We also used a large, nonclinical convenience sample of undergraduates in the current study. While it can be assumed that these participants were overall nonclinical (e.g., not patients with psychological/psychiatric diagnoses), based on general base rates of psychopathology, it is probable that some number of our participants were experiencing clinically relevant mental health challenges. Future work could recruit large samples of clinical and/or nonclinical participants and potentially examine whether the current observations are consistent across populations. In a similar vein, the current study examined symptoms as a dimensional measure of psychopathology, rather than a categorical distinction (e.g., individuals with disorders vs. individuals without). It remains possible that psychopathology could have nonlinear relationships with recurrent IAMs that emerge when comparing those with or without diagnosed mental health disorders.

Finally, future work could extend the computational approach taken in the current study. Here, we used a unigram, bag-of-words approach of topic modeling, which is common in the natural language processing literature67. While this approach typically performs well and effectively captures content41,42, it has a few noteworthy drawbacks. Unigram approaches can miss multiword phrases (e.g., “family” and “member” being highly representative of topic 6, rather than “family member”), which could potentially make a topic difficult for researchers to interpret (e.g., seeing “member” without “family”). Moreover, the bag-of-words approach does not have access to information about word order41, which can sometimes change the meaning of a document (e.g., “is it good?” vs. “it is good”). Developments in natural language processing and associated software could enable future studies to incorporate these additional features into topic models.

In conclusion, our current study is the first to provide a comprehensive description of how both recurrent IAM valence and content predict symptoms of mental health disorders in a large, nonclinical sample. While recurrent IAM valence seems to have relatively homogenous relationships with symptoms across various disorders, recurrent IAM content appears to differentiate between disorders. By using machine learning and natural language processing techniques in a novel application (i.e., autobiographical memory), we present a robust and reproducible topic model that reveals unique relationships between specific topics and specific mental health indices. Our work shows that AM phenomenology (e.g., valence) and content can be analyzed in tandem, and at much larger scales than previously thought possible, to answer critical questions about the fundamental nature of recurrent IAMs. Topics in recurrent IAMs—and their links to mental health—are identifiable, distinguishable, and quantifiable.