Mechanisms upholding the persistence of stigma across 100 years of historical text

Today, many social groups face negative stereotypes. Is such negativity a stable feature of society and, if so, what mechanisms maintain stability both within and across group targets? Answering these theoretically and practically important questions requires data on dozens of group stereotypes examined simultaneously over historical and societal scales, which is only possible through recent advances in Natural Language Processing. Across two studies, we use word embeddings from millions of English-language books over 100 years (1900–2000) and extract stereotypes for 58 stigmatized groups. Study 1 examines aggregate, societal-level trends in stereotype negativity by averaging across these groups. Results reveal striking persistence in aggregate negativity (no meaningful slope), suggesting that society maintains a stable level of negative stereotypes. Study 2 introduces and tests a new framework identifying potential mechanisms upholding stereotype negativity over time. We find evidence of two key sources of this aggregate persistence: within-group “reproducibility” (e.g., stereotype negativity can be maintained by using different traits with the same underlying meaning) and across-group “replacement” (e.g., negativity from one group is transferred to other related groups). These findings provide novel historical evidence of mechanisms upholding stigmatization in society and raise new questions regarding the possibility of future stigma change.


Study 1: examining aggregate negative stereotypes across 100 years of text
Study 1 examines whether the aggregate negativity in stereotypes of stigmatized groups has decreased, increased, or remained stable in English-language book text from 1900 to 2000. Past work suggests that any of these three patterns is empirically and theoretically possible. First, a decrease in negativity might be expected given results from repeated cross-sectional surveys of explicit and implicit attitudes showing slow but steady drops in negative representations for some target groups between the early 2000s and 2020 [20][21][22]. Alternatively, increasing negativity might be expected, based on findings that as the number and visibility of several stigmatized groups increased over the past century, so too has the perceived threat of those groups, perhaps prompting negative backlash 23. Indeed, recent history has seen rising hate crimes and legislation targeting stigmatized groups 24,25. Finally, negativity may have remained persistent throughout the past 100 years. As reviewed above, some social science theories [9][10][11]26 posit that stigmatization serves multiple evolutionary, psychological, and social functions. Thus, societies may maintain a relatively stable level of stigmatization because it allows individuals and groups to attain relevant goals.

Results
For each of the 58 groups (represented by group label lists from historical thesauruses; Appendix), we ranked cosine similarities between the group and a list of 414 traits available across the 100 years (from a larger list of ~600 traits 27). We then identified the top ten traits associated with each of the 58 groups in each decade. From these top-associated traits we also extract our primary metric of interest, stereotype negativity, by taking the historically-contextualized (Appendix) valence scores of these traits 19. For example, in 1900, the group Homeless was most associated with traits including helpless, heartless, lonely, disorderly, and thoughtless, which had an average valence score of −0.10 (corresponding to the 18th most negative group); in 1950, the group was associated with traits including helpless, careless, inquisitive, impetuous, and cruel, with an average valence score of −0.11 (the 17th most negative). In this way, each of the 58 groups ends up with a timeseries of 11 valence scores (all decades from 1900 to 2000). Additionally, to have a measure of whether the stereotype was stable in latent semantic meaning, we transformed the top-associated traits into scores of stereotype warmth and competence 28,29, a widely used typology of stereotype content. We again did so using historically-contextualized scores (Appendix) of each trait along these latent semantic dimensions. In summary, our analyses focus on the 58 timeseries (one for each group) of latent valence, warmth, and competence, in addition to changes in the top-associated traits themselves (i.e., the top-10 trait content).
For our first result, we inspect the average stereotype negativity aggregating across the 58 stigmatized groups over 100 years of English-language books. Bayesian mixed-effects models (Methods) showed an aggregate slope that was close to zero, b = −0.0030, 95% credible interval (CI) [−0.0042, −0.0017] (Fig. 1), indicating only a slight movement towards more negative representations of stigmatized groups over the past century. Indeed, inference using the Region of Practical Equivalence (ROPE) 30 showed that 100% of the posterior estimates for the aggregate slope fell within a region that would reasonably be said to be a "null" effect. Thus, over 100 years of English-language text, negative stereotypes of stigmatized groups have remained, on aggregate, remarkably stable.
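The ROPE check described above can be illustrated with a minimal sketch. This is not the authors' analysis code: the ROPE bounds (±0.01) are hypothetical because the section does not state them, and the posterior is approximated by normal draws whose mean and spread are back-calculated from the reported slope and 95% CI.

```python
import random

def fraction_in_rope(samples, low, high):
    """Share of posterior draws that fall inside the ROPE [low, high]."""
    inside = sum(1 for s in samples if low <= s <= high)
    return inside / len(samples)

# Toy posterior: normal approximation built from the reported estimate
# b = -0.0030, 95% CI [-0.0042, -0.0017] (an assumption, not the real draws).
rng = random.Random(0)
posterior_sd = (-0.0017 - (-0.0042)) / (2 * 1.96)
toy_posterior = [rng.gauss(-0.0030, posterior_sd) for _ in range(4000)]

# Hypothetical ROPE bounds; the values used in the paper are not given here.
ROPE_LOW, ROPE_HIGH = -0.01, 0.01

share = fraction_in_rope(toy_posterior, ROPE_LOW, ROPE_HIGH)
print(f"{share:.0%} of draws fall inside the ROPE")
```

Under these assumptions every draw falls inside the region, which is the sense in which a slope whose credible interval excludes zero can still be treated as practically null.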

Robustness checks
We ensured that central conclusions were robust to various methodological choices (Appendix). First, because the Google Books corpus changed over time in the proportion of non-fiction scientific texts 31, we replicated all analyses with word embeddings trained on the smaller, genre-balanced (i.e., consistent balance of fiction and non-fiction texts over time) Corpus of Historical American English (COHA) 32. Although COHA is substantially smaller (<1% of the size of Google Books), we still find consistent conclusions with both corpora, ruling out concerns that the observed stability in stereotype negativity is due merely to changes in genre composition. Second, we ensured robustness across frequentist modeling approaches, finding identical conclusions regardless of model specifications.
Third, for a subset of groups we had scores on (1) how much the meaning of the group labels had changed across time (known as semantic drift; e.g., changes in the meaning of Gay), (2) how many meanings the group labels had other than the group-related meaning (known as polysemy), and (3) the frequency of these labels. For groups with these available data, we computed an additional regression that directly controlled for drift, polysemy, and frequency of group labels, and the main conclusions of aggregate stability remained. Further, none of these three variables showed significant interactions with change, indicating that these covariates did not moderate the conclusions of how groups are changing. Fourth, because the current methods rely on choices of how to represent the social groups in question, we tested whether changing the lists of group labels altered the key results. Even when using only the four most central and frequent group labels to represent a sample of the groups, we again found aggregate stability.

Study 2: mechanisms upholding aggregate negativity over 100 years of text
Study 1 showed that aggregate stereotype negativity was relatively stable over 100 years of English-language book text, raising the question of what societal mechanisms might maintain such stigmatization. Here, we introduce and test the Stigma Stability Framework (Fig. 2) to propose two complementary mechanisms of reproducibility (within groups) and replacement (transfer across groups), each enacted in three empirical patterns.

Reproducibility
We refer to the first mechanism as reproducibility, in which stereotype negativity is reproduced (repeated) towards a single target group. The idea of reproducibility emerges from the theory of stigma as a fundamental cause 26, which posits that, if the underlying motivations to stigmatize (e.g., to dominate) have not been addressed, advantaged groups will continually reproduce stigma, often by developing new means to stigmatize the same group via interchangeable, mutually reinforcing mechanisms. For example, historical analyses show the changing means used to sustain stigmatization of Black people in the United States, moving from slavery to explicit forms of discrimination (e.g., Jim Crow laws) to more covert expressions, such as aversive 33 and "laissez faire" racism 34.
Empirically, reproducibility can be enacted through three patterns. First, a pattern we term "deep stability" occurs when a stereotype is repeated across time (e.g., a group is "lazy" in 1900 and "idle" in 2000), with the same underlying negativity, latent semantic meaning (i.e., warmth and competence dimension), and semantically-similar top associated traits (i.e., near synonyms with high cosine similarities). A second pattern, which we term "valence + semantic stability", occurs when the same underlying negativity and semantic meaning is reproduced but new traits emerge; for example, a group is "lazy" in 1900 but "helpless" in 2000, with "helpless" being similar to "lazy" in average warmth and competence but not a direct semantic synonym as in the first pattern (i.e., they

Replacement
The second complementary mechanism upholding aggregate stereotype negativity is stigma replacement, in which the negativity towards one group is transferred across group lines such that, on aggregate, patterns of change may "cancel out". As one historical example, replacement is seen in increasing negativity towards Black Americans following the Great Migration into the Northern US in 1915-1930 that occurred alongside decreasing negativity towards European immigrant groups into those same areas 35. That is, negativity historically held towards immigrants was transferred to more newly-arrived Black Americans. More broadly, the idea of replacement is also conceptually supported by the theory of stigma as a fundamental cause 26: if underlying stigma motivations remain, but the permissibility of stigmatizing a given target changes (e.g., it is no longer permissible to stigmatize immigrant groups), then advantaged groups may seek a new target for their negativity (e.g., Black Americans).
Replacement, understood as the transfer or generalization of negativity, can similarly follow three empirical patterns. First, replacement could occur by transferring stereotype negativity across groups that share some semantic relationship, essentially in a hydraulic manner. For example, lessening negativity towards Asexual people may be transferred towards a group that shares similar semantic meanings of warmth and competence (i.e., is represented close in semantic space), such that a group like Infertile experiences a corresponding strengthening in negativity. In this case, groups that are more semantically similar in the 1900s would have very different (and, perhaps in some cases, even opposing) slopes from 1900 to 2000, resulting in a negative correlation between semantic similarity between groups and similarity in their change.
Second, the transferal of negativity could occur through other, non-semantic processes. Empirically, this would be observed if the semantic similarity between groups did not significantly predict the similarity of negativity slopes across groups. Indeed, the above example of transferring prejudice between immigrant groups and Black Americans 35 is less about shared semantics than it is about other shared characteristics, such as geographic location. Further explanations for why negativity is transferred between groups could include the relative prevalence of the groups (e.g., when a group switches from the second to the first most prevalent minority group in society it could "acquire" the brunt of prejudice 25) or the shared functions of the groups (e.g., both satisfy the need for exploitation 36).
Finally, also within the general umbrella of replacement is a pattern that we term "generalization", which is the idea that some semantically-related groups may experience similar patterns of lessening negativity; in short, a pattern of change in one group "generalizes" to a similar second group, such that there would be a negative correlation between the semantic similarity between groups and their differences in change. To be clear, this pattern is not a hydraulic relationship (i.e., one group lessens, another group strengthens) like the other two empirical patterns of replacement, and thus it is not strictly a means of maintaining aggregate stable negativity. In fact, observing "generalization" would result in an aggregate change in the societal-level of negativity because multiple groups are changing in similar ways and similar directions. Nevertheless, we include this last empirical

Overall prevalence of mechanisms
Using these criteria, we found that over half of the individual group-level slopes (33/58 groups, or 57%; Table 1) revealed little meaningful change, a result consistent with the reproducibility mechanism. The remaining groups showing change (25/58, or 43%) suggest replacement (transfer) of stereotype negativity.

Empirical patterns of stability through reproducibility
Starting with the 33 groups indicating reproducibility, we find evidence for all three proposed patterns, each occurring in approximately equal proportion. The first pattern ("deep stability") is descriptively the most common, observed in 13/33 groups (39%). For example, negative stereotypes of the group Mute had top traits including [silent, listless, dull] in 1900 and 2000, with both timepoints reflecting near-identical negative representations (with traits that had high cosine similarities) and reflecting the same latent stereotype meanings of coldness and incompetence across time.
A second set of groups, 10/33 (30%), followed the "valence + semantic stability" pattern in which the actual top-associated traits turned over across time (i.e., traits had low cosine similarities) but latent valence and warmth and/or competence were stable. The negative stereotypes towards Black illustrate this empirical pattern: top traits in 1900 included [coarse, reckless, irresponsible, helpless, honest] but in 2000 included [sloppy, belligerent, thoughtless, and respectable]. Although the traits themselves changed, the negativity was reproduced via stable latent sources (i.e., the average latent warmth was stable, b = 0.0024 [−0.0001, 0.0050], as was average competence, b = −0.0002 [−0.0031, 0.0026]).
The final set of groups, 10/33 (30%), followed the "valence stability" pattern, wherein latent valence was stable and reproduced across time, but the source of that valence varied (i.e., the latent semantics of warmth and competence shifted, possibly also with changes in the top traits). For example, stereotypes of Criminal were persistent in negative valence (b = 0.0025 [−0.0011, 0.0061]) but the traits also showed an increase in latent warmth (b = 0.0036 [0.0009, 0.0063]; see Appendix). That is, although the new top-associated traits were (relatively) warmer (e.g., no longer harsh and cruel but now inept and immature), they continued to reproduce negative valence through other meanings, such as by increasing in negative competence, negative morality, and assertiveness 37. In sum, for these latter groups we find that, when a new trait does emerge, it likely brings new meaning along latent axes of warmth/competence or some other dimension, but always reproduces the underlying negative valence.

Empirical patterns of stability through replacement
Twenty-five groups (43%) changed meaningfully in stereotype negativity, prompting the next investigation on which empirical patterns of replacement they follow (Fig. 2). The first possibility is a "transfer" of stigma via shared semantics, in which a strengthening negativity towards one target group corresponds to diminishing negativity in secondary groups that are semantically related. Such a pattern was notably rare in the groups we examined. Indeed, using our current empirical operationalization (Methods, Appendix), only one group, Asexual, suggested transfer via shared semantics (i.e., warmth/competence) with other groups: Asexual showed a strong negative slope, b = −0.012, while semantically-similar groups including Infertile (b = 0.0014) and Atheist (b = 0.0026) had slopes that were null but trended towards more positivity over time. In short, the initial tests appear to suggest transfer via shared semantics is a relatively rare mechanism in historical patterns of stigma negativity, although it could be observed more widely for other groups using different empirical criteria.
In contrast, most of the changing groups (19/25, or 76%) suggested other processes of transferring negativity that were not predicted by simple semantic relationships. For instance, the increasing negativity towards the group target Aboriginal did not correspond to lessened negativity towards semantically-related groups of Indian or Middle-eastern, suggesting that many changing groups may be sharing/transferring negativity through processes not reducible to shared semantics.
Finally, we found that a handful of the changing groups (5/25, or 20%) showed "generalization" of negativity, in which semantically-similar groups are changing in similar ways (e.g., similar strengthening in negativity). For instance, increasing negativity towards Smoker (b = −0.0060) was similar and shared across semantically-similar groups including Alcoholic (b = −0.0023). Such a finding could help explain why the overall, aggregate trend showed a slight movement towards more negative representations in general. We nevertheless emphasize that this empirical pattern of generalization is uncommon (only 5 groups out of the possible 58), thereby underscoring that mechanisms prompting widespread change in societal negativity are rare in the current set of stigmatized groups.
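The three replacement patterns differ in how pairwise semantic similarity relates to pairwise similarity in change. The sketch below shows one way such a relationship could be computed; it is an illustration rather than the operationalization used in the paper (which is described in the Methods and Appendix). The group names, slopes, and similarity values are toy numbers, and "similarity in change" is assumed here to be the negative absolute difference between two groups' slopes.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def similarity_vs_change(slopes, semantic_sim):
    """Correlate pairwise semantic similarity with pairwise similarity in slopes.

    Assumption: similarity in change = -|slope_a - slope_b|.
    """
    pairs = list(combinations(sorted(slopes), 2))
    sims = [semantic_sim[pair] for pair in pairs]
    change_sims = [-abs(slopes[a] - slopes[b]) for a, b in pairs]
    return pearson(sims, change_sims)

# Toy valence slopes for three hypothetical groups.
toy_slopes = {"G1": -0.010, "G2": 0.008, "G3": -0.009}

# "Hydraulic" toy case: the most similar pair (G1, G2) moves in opposite directions.
hydraulic_sim = {("G1", "G2"): 0.9, ("G1", "G3"): 0.2, ("G2", "G3"): 0.3}
# "Generalization" toy case: the most similar pair (G1, G3) changes alike.
generalizing_sim = {("G1", "G2"): 0.2, ("G1", "G3"): 0.9, ("G2", "G3"): 0.3}

r_hydraulic = similarity_vs_change(toy_slopes, hydraulic_sim)
r_generalizing = similarity_vs_change(toy_slopes, generalizing_sim)
print(round(r_hydraulic, 2), round(r_generalizing, 2))
```

A negative coefficient corresponds to transfer via shared semantics, a positive one to generalization, and a coefficient indistinguishable from zero to transfer through non-semantic processes. With only three pairs the toy values carry no inferential weight; they only show the direction of each pattern.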

General discussion
Using 100 years of English-language book text and the largest sample of negatively stigmatized groups studied via NLP methods to date, the current research contributes new understanding to the persistence of aggregate negativity in social group stereotypes. Study 1 shows that, over the past 100 years, societies have maintained a
First, the reproducibility mechanism is relatively more prevalent than replacement, with 57% of groups showing individual stable slopes, suggesting that negativity itself is reproduced towards individual group targets. Within these stable groups, approximately one-third showed "deep stability" (i.e., all metrics we investigated were stable), as in the case of several disability-related stigmas. The remaining two-thirds of stable groups showed patterns of reproducibility that suggested shifting sources of negativity. For instance, for groups such as Alcoholic or Black, the top-associated traits might have shifted over time, but the underlying latent valence was always the same general level of negativity. Such dynamic reproducibility suggests that society may be inventing new means (e.g., new words or new meanings) to repeatedly stigmatize the same groups across time 38.
At the same time, a handful of groups did show some meaningful change in stereotype negativity, underscoring that change for some groups is possible, if far from assured. Such change in negativity suggests the operation of a complementary replacement mechanism, in which negativity is transferred across group lines. Notably, however, we found little evidence that the transfer of stigma was falling along predictable lines of semantically-similar groups (e.g., there was no evidence of a transfer between Gay and Transgender 39). Instead, the data suggest that transfer of stigma is more likely to occur through means other than simply semantic relations. These findings set the stage for future research to identify non-semantic replacement mechanisms, such as groups that appear in the same geographic locations 35, that fulfill the same function 36, or that switch their relative ranking in terms of numerical prevalence 25.
Finally, for a small handful of changing groups, we found that increasing negativity towards one target group appeared to cascade through semantically-related groups, an empirical pattern that could help explain the slight aggregate trend towards increasingly negative representations over time. That is, while the current work focuses on the mechanisms upholding stigma stability, we also show the utility of the current methods for uncovering means by which society may, in the future, show aggregate change in stigmatization. Although such generalization of stigma is obviously concerning in the case of increasing negativity, it could be possible that, for other groups not studied here (e.g., groups that are not as ubiquitously stigmatized), generalization mechanisms could operate to cascade positivity throughout the network (e.g., as in the "secondary transfer effects" of intergroup contact 40).
Of course, there are limitations to using text analysis for social science inquiries. For instance, when it comes to the words used to operationalize groups, factors such as semantic drift, polysemy, and frequency 14 can confound inferences. In the Appendix, we show that the primary conclusions are not altered after controlling for the drift, polysemy, or frequency of group labels, or after using shorter lists of only four central group words. Additionally, when it comes to the underlying text, the current study focused on the (limited) Google Books English corpus 31. Although conclusions were robust in a complementary book text source, variation is likely to arise in different media sources or languages. For instance, stigma may be more persistent in some societies than others, such as those with stronger collective norms that require more conformity 41. We look forward to testing such questions following continued innovations in natural language processing and the availability of archived text data across cultures, geographic locations, and diverse languages.
Finally, the current work was limited in focusing on only one dimension of stigma, negativity in stereotypes, leaving open the question of how other aspects of stigma, such as the initial act of labeling or behavioral dimensions of discrimination 1, might persist or change over time. Although labeling and behavior are more difficult to address using historical language, researchers may successfully merge the current data with other 42,43 or human attitude data 22,44 to better understand the persistence and change of interacting components of stigma 6.
2): "1" = reproducibility via "deep stability," with all metrics showing stability over time; "2" = reproducibility via "valence + semantic stability," despite change in the top traits; "3" = reproducibility via "valence stability," despite changing semantic sources of the valence; "4" = replacement via "shared semantics"; "5" = replacement or transfer via other, "non-semantic" means; "6" = the only empirical pattern that might produce change, by producing "generalization" via shared semantics, where a change in one group cascades to similar changes among semantically-related groups. Estimated changes in latent valence are the group-level random slopes, from the Bayesian mixed-effects model predicting valence from the fixed effect of time. 95% HDI = 95% Highest Density Interval. Top traits are the top-10 traits ranked as the most associated with the group target (i.e., have the highest average cosine similarities), within the listed example years.

Conclusion
The results reported here fall between the hopes of optimists that we might gradually increase in positivity towards all groups 22 and the fears of pessimists that society will continue to grow in hostility and negativity 24. Instead, the current data seem to suggest a stasis, in which the aggregate negativity of today is not so different from that of the past. Most critically, by expanding beyond traditional social science methods to consider stereotype negativity towards a large, diverse set of stigmatized groups across an unprecedented timespan of 100 years of books, we can also newly observe how stigmatization persists in society. Our hope is that introducing the Stigma Stability Framework, alongside a methodological toolkit to test its predictions, will provide a clearer path to explore the mechanisms (specifically, reproducibility and replacement) upholding persistent negativity. Only by understanding the pernicious ways that stigma endures both within and across groups can we, as researchers and societal actors, be equipped to durably reduce the multifaceted processes of stigmatization.

Text data sources
We used word embeddings trained using the word2vec algorithm (a neural network method to compute vector representations of word meaning 45) on book text obtained from Google Books and the Corpus of Historical American English (COHA) text data 14.

Selecting and representing stigmatized groups in text
A study of whether stigma is stable or changing in society requires the best approximation of a large, diverse set of stigmatized groups. To that end, we selected an established list of 93 stigmatized identities, characteristics, and statuses 46. Because we use single word embeddings, a subset of these 93 groups were indistinguishable from one another with the current methods. Thus, we collapsed these into a single identity; for example, both "symptomatic" (e.g., bipolar symptomatic) and "remitted" identities (e.g., bipolar remitted) were combined, as were various forms of cancer (e.g., breast cancer current, breast cancer remitted, colorectal cancer current, and so on). We recognize this as a limitation of the current methods, since these groups do indeed differ in how they are perceived in society as well as in their social, health, and economic consequences.
To identify group stereotypes in text, we need to use multiple terms to represent a single group and thereby ensure that the representation of a group triangulates on the group-specific meaning rather than some other polysemous meaning of a single term (e.g., "Alien" alone could refer to aliens from outer space, rather than to the intended meaning of a non-citizen or immigrant). Thus, for each of the stigmatized groups, we generated lists of single word synonyms using both historical and contemporary thesauruses (e.g., Oxford Historical Thesaurus, Thesaurus.com). Table S1 in the Appendix lists the chosen synonyms for each group. Using only the uniquely distinguishable groups, and those groups that could be represented in a list of single word synonyms available across all decades of text, ultimately left us with a final list of 58 stigmatized groups (Table 1).

Extracting stereotype content and valence
To compute stereotype valence (positivity/negativity), we begin by extracting the stereotype content (top-ten traits associated with each group). Using a list of 414 traits, all available traits in the corpus of text 27, we computed the average cosine similarity between a given target trait (e.g., "untrustworthy") and a group representation (e.g., Dealer), by averaging across the pairwise cosine similarities between the trait and all group synonyms (e.g., "untrustworthy"-dealer, "untrustworthy"-peddler, "untrustworthy"-narcotic, "untrustworthy"-supplier, and so on). Then, all traits were ranked according to how strongly associated they were with the group, and the top-ten traits were used as the stereotype content for that group in a given decade. Additional details are provided in the Appendix.
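The trait-ranking step described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, and the two-dimensional vectors are toy values standing in for the 300-dimensional decade-specific embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def group_trait_association(trait_vec, synonym_vecs):
    """Average the pairwise cosine similarities between one trait and every group synonym."""
    return sum(cosine(trait_vec, s) for s in synonym_vecs) / len(synonym_vecs)

def top_traits(trait_vectors, synonym_vecs, k=10):
    """Rank traits by their association with the group and keep the k strongest."""
    scores = {
        trait: group_trait_association(vec, synonym_vecs)
        for trait, vec in trait_vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data: two synonym vectors for one group and three trait vectors.
toy_synonyms = [[1.0, 0.0], [0.8, 0.6]]
toy_traits = {
    "helpless": [1.0, 0.1],
    "cheerful": [-1.0, 0.2],
    "lonely": [0.5, 0.5],
}

ranked = top_traits(toy_traits, toy_synonyms, k=2)
print(ranked)
```

Averaging over all synonyms before ranking is what lets the group representation settle on the shared group meaning rather than on an unrelated sense of any single label.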
After identifying the top-10 trait associates for each group in each decade, we replaced the traits with their corresponding valence rating that was contextualized to that specific decade. Specifically, rather than assume that a single rating of valence (e.g., from valence rating norms) was applicable across 100 years, we allowed the valence of traits to vary across time. To do so, we first created lists of 25 words that strongly (and stably) signaled positivity/negativity, drawn from the lists used for the Implicit Association Test and the Word Embeddings Association Test. Then, we took each of the traits and looked at its relative cosine similarity to these positive and negative words within each decade of text. We used these historically-contextualized valence scores of each trait within a decade of text and took the average across all the top-10 traits within a decade. For example, imagine the top ranked traits for Aboriginal include [hostile, rebellious, adventurous, superstitious]. The corresponding historically-contextualized valence ratings for each of these traits in 1900 are [−0.13, −0.18, 0.05, −0.19] and in 2000 are [−0.14, −0.11, −0.02, −0.15]. Taking the average across these traits returns an average valence for Aboriginal of −0.11 in 1900 and −0.11 in 2000. We repeat this computation for all 11 decades (1900-2000), resulting in an 11-decade long timeseries of average historically-contextualized valence scores for each stigmatized group.
We followed a similar process to create the timeseries for the average historically-contextualized latent warmth and competence scores for each stigmatized group. We use a set of "anchor" words (Appendix) from automated
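The Aboriginal example above can be reproduced with a short sketch. The averaging step follows the text directly; the `trait_valence` helper is an assumption, since the section only says that "relative" cosine similarity to positive and negative words was used, and the difference of mean similarities is one plausible reading rather than the confirmed formula.

```python
from statistics import mean

def trait_valence(sims_to_positive, sims_to_negative):
    """Assumed reading of 'relative' similarity: mean similarity to the positive
    anchor words minus mean similarity to the negative anchor words."""
    return mean(sims_to_positive) - mean(sims_to_negative)

def stereotype_valence(top_trait_valences):
    """Average the decade-specific valence scores of a group's top traits."""
    return mean(top_trait_valences)

# Valence ratings quoted in the text for [hostile, rebellious, adventurous, superstitious].
aboriginal_1900 = [-0.13, -0.18, 0.05, -0.19]
aboriginal_2000 = [-0.14, -0.11, -0.02, -0.15]

v1900 = stereotype_valence(aboriginal_1900)
v2000 = stereotype_valence(aboriginal_2000)
print(v1900, v2000)
```

The four listed ratings average to roughly −0.11 in both decades, in line with the values reported in the text. Repeating this for each of the 11 decades yields one valence timeseries per group.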

Figure 1. Trajectories of stereotype valence (positivity/negativity) towards 58 stigmatized groups. The dark black line indicates the aggregate (averaged) trajectory from raw values, showing stability in aggregate trends of stereotype negativity across 58 stigmatized groups over the past century. Individual colored lines show the individual group trajectories. Y-axis indicates the stereotype valence score (historically-contextualized valence scores averaging across the top 10 traits in each decade), with higher scores indicating more positive trait representations and lower scores indicating more negative trait representations. X-axis indicates the decade of the Google Books text.

Figure 2. Visual overview of the Stigma Stability Framework. The framework proposes two complementary classes of mechanisms (replacement and reproducibility) to explain aggregate (averaged) persistence of negative stereotypes towards stigmatized groups at a societal level. The general mechanisms are, in turn, empirically enacted in six empirical patterns, as described in the figure. Gray numbers and percentages indicate the number of groups, in the current sample and with the current methods, that followed each empirical pattern.
pattern under the umbrella of a "replacement" mechanism, because it conceptually also involves a transfer or generalization of negativity across groups. Study 2 used the same data and general methods as Study 1 to provide initial empirical tests of the prevalence of mechanisms in the Stigma Stability Framework, looking across all 58 groups and 100 years of English-language book text. A group is classified as showing reproducibility if the random slope estimates from the Bayesian regression model are null (i.e., the Highest Density Interval includes zero). Conversely, a group is classified as showing replacement if the random slopes are not null, since replacement requires that the target group be changing in stereotype negativity for some transfer to occur.
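The classification rule above can be sketched as follows, assuming posterior draws of each group's random slope are available. The draws here are simulated from normal approximations: the first uses the estimate reported for Criminal (b = 0.0025 [−0.0011, 0.0061]), and the second uses the slope reported for Asexual (b = −0.012) with a hypothetical spread, since its interval is not given in this section. The HDI is computed as the narrowest interval containing 95% of the draws.

```python
import math
import random

def hdi(samples, mass=0.95):
    """Narrowest interval that contains the requested share of the draws."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of draws the interval must cover
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]

def classify(samples):
    """Reproducibility if the 95% HDI of the slope includes zero, else replacement."""
    low, high = hdi(samples)
    return "reproducibility" if low <= 0.0 <= high else "replacement"

rng = random.Random(1)
# Simulated draws (assumptions, not the model's actual posterior).
stable_draws = [rng.gauss(0.0025, 0.0018) for _ in range(4000)]
changing_draws = [rng.gauss(-0.012, 0.002) for _ in range(4000)]  # spread is hypothetical

labels = (classify(stable_draws), classify(changing_draws))
print(labels)
```

Groups labeled as replacement would then be examined further to decide which of the three replacement patterns they follow.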
Scientific Reports | (2024) 14:11069 | https://doi.org/10.1038/s41598-024-61044-z | www.nature.com/scientificreports/
relatively stable level of stereotype negativity, as revealed from the aggregate trend across 58 stigmatized groups. A key contribution of the current work is going beyond this aggregate persistence to also consider what societal mechanisms may uphold such negativity. Study 2 provided a first attempt at conceptualizing and empirically testing a novel theoretical framework for addressing this question. We propose two overarching classes of mechanisms (reproducibility of negativity towards individual group targets, and replacement, or transfer, of negativity across group lines) as a framework to understand how stereotype negativity persists on aggregate. The initial empirical tests of this framework suggest three key take-aways.

Table 1. Estimated change in latent valence and identified top traits for 58 stigmatized groups across 100 years of book text. "Mech. class" indicates the class of mechanism for each group, with codes as follows (see also Fig.
Standard hyperparameters were used (e.g., 300 dimensions, a context window size of 4 words on either side of the target training word), and only words appearing at least 500 times were included in training. The entirety of the Google Books corpus (across 200 years available, from 1800 to 2000) consists of ~850 billion tokens and 500 million books, while the COHA corpus is much smaller, consisting of ~410 million tokens, but it is balanced in the composition of text genres across history (equivalent balance of fiction and non-fiction texts).