Introduction

What is fiction about, and what is it good for? The stories that humans tell one another can be described or defined in many ways. This paper focuses on stories, understood as fictional verbal narratives: verbal retelling of events involving intentional actions, carrying little or no guarantee that the events in question ever occurred in reality. Fiction in this sense is probably present in all human cultures (Brown, 1991; Scalise Sugiyama, 2001) but the causes of its cultural success remain disputed. One popular explanation starts by noticing the similarities between narrative fiction and play-acting, dreams, or non-verbal fictions such as pantomime, theatrical play, storytelling by visual means, etc. All these activities involve some kind of simulation: the imaginary representation of actions that may not have any reality. Simulations allow us to anticipate mentally events that could occur in the future, and imagine possible reactions to them. Owing to this feature, simulations are often deemed to be cognitively beneficial. Going one step further, theorists of an evolutionary bent speculate that the benefits provided by simulations were fitness-enhancing in our evolutionary past, leading to the natural selection of simulation-specific mental adaptations (Tooby and Cosmides, 2001; Mar and Oatley, 2008; Boyd, 2009). The fun that we take in play and storytelling could be one such adaptation.

Such an adaptive simulation perspective is well accepted as an explanation for the evolution of play in several animal lineages (Lancy, 1980; Piaget, 1999). A similar perspective has been defended for dreaming (Piaget, 1999; Revonsuo, 2000). Could the adaptive simulation perspective apply to verbal, narrative fiction as well? Many authors, often belonging to the “literary Darwinism” school of thought, claim as much (Carroll, 2011; Gottschall, 2013; Oatley, 2011; Boyd, 2009). Our goal in this paper is to formulate a version of the view that fictions are adaptive simulations that is as precise and refutable as possible.

We shall not concern ourselves with other adaptationist accounts of fiction, those that argue that storytelling has adaptive benefits not related to simulation—for instance, as a vehicle for subsistence-related information (Scalise Sugiyama, 2001), as a device to enhance social cohesion (Dissanayake, 1979), or as a way of making sense of events (Bietti et al., 2018). These accounts may or may not be accurate, but they can usually be applied beyond fiction sensu stricto, to linguistic communication or to the arts in general (Mellmann, 2012). Accordingly, these accounts usually dwell on broader categories such as “storytelling” in a sense that includes non-fictional narratives, gossip, etc. (Bietti et al., 2018). This paper will focus on narrative fiction in a narrow sense, not including things like rumours, incorrect hypotheses, counterfactual reasoning, conceptual abstractions, reasoning by analogy, over-reaction to possible threats, self-deception, etc.

The adaptive simulation perspective, as we view it, holds that narrative fiction plays upon a set of cognitive dispositions to produce and enjoy simulations, which may also be recruited by play or by dreams. If fiction is an adaptive simulation, what is it a simulation of? There exists a variety of answers to this question. Some say fiction should focus specifically on social life, others on threats, yet others on fitness-relevant events in general.

The view that fiction prepares us for social life seems dominant: fiction might recruit and train our capacity to imagine other people’s thoughts (Boyd, 2009; Mar and Oatley, 2008; Zunshine, 2006). Mar and Oatley (2008) suggest that the function of narrative or fictional simulation is to improve a reader’s empathy or theory of mind, broadly construed (Kidd and Castano, 2013). This view raises two kinds of issues, empirical and theoretical. From an empirical standpoint, the evidence for fiction improving theory of mind is still shaky. An attempted replication of Kidd and Castano (2013) failed to find the expected beneficial effects (Camerer et al., 2018). It is unclear to what extent the development of mentalising skills in young children benefits from the children’s exposure to stories involving deceptions or lies: some evidence suggests that young children are unable to understand the basic conceit that tales like Little Red Riding Hood are based on (Peskin, 1996), just as they have trouble understanding deception in general (Mascaro et al., 2017). A more fundamental problem is theoretical. No convincing case has been made that fiction, as distinct from conversation or day-to-day interactions, recruits our theory of mind capacities in uniquely specific, intensive, or useful ways. There is room for doubt. Narrative fiction simulates rare and implausible interactions (as we shall see below) that we are unlikely to encounter in real life. It typically leaves its consumers with few opportunities to react to fictive events and get feedback on their reactions. Contingent feedback is a key component of any learning process that is amply provided by real-world interactions, or interactive play (pretend play, board games, etc.), but not by fiction. A convincing adaptationist account should compare fiction with other plausible ways of honing mentalising skills.

Another candidate adaptation is the simulation of threats. The claim that fictional narrative prepares us for dangers by simulating them has been put forward by several authors (van Krieken, 2018; Clasen et al., 2018; Gottschall, 2013). In their view, “horror simulations may […] serve the adaptive function of preparation for real-world encounters with negative emotions and/or hostile others” (Clasen et al., 2018). A similar view has been defended concerning dreams (Revonsuo, 2000; an explicit inspiration to Gottschall’s theory of fiction in his 2013 book). Revonsuo’s “threat simulation hypothesis” holds that dreams play an adaptive function as danger simulators, a view borne out by content analyses of dreams in multiple cultures (Revonsuo, 2000; Zadra et al., 2006).

Put this way, however, a threat simulation hypothesis for fiction may suffer from the same flaw as the “theory of mind training” hypothesis: it does not specify how fiction compares to other ways of preparing for potential dangers. It also leaves the notion of a threat rather unspecified. The next section proposes a more specific, thus hopefully more testable version.

The ordeal simulation hypothesis

This hypothesis starts from the assumption that simulation is a useful way to prepare for some kinds of threats, but not for all threats. Borrowing a distinction from Boyer and Liénard (2006), we distinguish two ways an organism may avoid a threat. One may detect and react to the threat when it has become manifest—by fighting or fleeing, for instance. Or one may prevent the threat from occurring in the first place, by inferring its existence before it becomes a manifest danger. We add that some threats are easier to react to than to prevent, while others are easier to prevent than to react to. The latter are preventable threats; we will call the former “reactable” threats, to coin a term.

A typical preventable threat is pathogen contamination. It can be detected in others, or in one’s environment, before one gets contaminated. At this stage it is not too late to take a series of precautionary measures (washing hands, being careful about one’s food, etc.). Once contamination is manifest, however, an infection has taken hold. It might well be too late for an adaptive reaction. A typical “reactable” threat is aggression, or predation. A rich behavioural repertoire allows us to counter these threats when they become manifest, by fighting, fleeing, or freezing (Duntley and Buss, 2011). Confrontation with a reactable threat can be trained for, but that is not the same thing as preventing such threats from occurring. Some reactable threats are common or benign enough that day-to-day experience adequately trains us for them. Most of us have had enough encounters with mosquitoes, spiders, or upset dogs to learn the adequate reactions. Other reactable threats, however, are difficult to train for with ordinary experience alone: they may be too rare to be encountered frequently, or too dangerous to be experienced many times without lethal consequences. For such cases, mock training in play or in imagination provides the ideal preparation.

We call “ordeals” these reactable threats that are too rare and too impactful for us to train for confronting them by ordinary experience alone. We take inspiration from Symons’s remark that animal and human play tends to simulate rare, high-risk events, for the same reason that drills simulate rare emergencies (Symons, 1978). A typical ordeal is seldom encountered but even one occurrence can have momentous effects—making real-world training an unlikely option. Alongside dangerous ordeals, one can also conceive of “positive” ordeals: events that could massively increase one’s fitness if reacted to appropriately. A typical positive ordeal would be the choice of a mate in a social context where long-term monogamy is strictly enforced. The concentration of huge risks and opportunities upon a few rare events, and the possibility of reacting to them adaptively, provide both a possibility and a reason to prepare for ordeals. Other proponents of the adaptive simulation perspective have studied in greater depth the ways that fiction influences our behaviour by preparing us for future contingencies (Carroll, 2011; Carroll et al., 2017; Clasen et al., 2018; see also Pinker, 1997: pp. 542–543). The ordeal simulation hypothesis elaborates upon this claim by specifying which precise kind of event fiction should prepare us for.

Like other accounts of fiction taking the adaptive simulation perspective, this hypothesis claims that fiction targets a sense of narrative imagination also recruited by play and (possibly) dreaming. It adds that this sense’s chief evolutionary function is to train our minds for future ordeals (as distinct from dangers in general), by simulating them. In other words, the enjoyment that we derive from play or fiction is in part due to an evolved reward system that drives us to seek simulated ordeals, be they positive or negative. This hypothesis does not merely underscore similarities between fiction, play and dreams, nor does it claim that fiction should be focused on dangers in general. Not all dangers are “ordeals”, and not all ordeals are dangers. The ordeal simulation hypothesis makes rather specific predictions concerning the kind of dangers that one should encounter in fiction: they should be rare, severe, reactable threats. This excludes threats that are more or less beyond our control (death from a lightning strike or a ruptured aneurysm); typical preventable threats such as food poisoning, epidemic or cardiovascular diseases, etc.; as well as benign or frequent reactable risks (e.g., mosquito bites). Accidents, encounters with predators, and social interactions turned violent are ordeals: when they occur, an appropriate reaction can save us, while a clumsy move may kill us. Homicides are ordeals par excellence: they combine two types of ordeal of high evolutionary relevance: predation and social aggression (Boyer and Liénard, 2006; Barrett, 2015).

The importance of aggression and predation in play, dreams, and fiction, is often noted, but its theoretical implications are not necessarily drawn. Aggression and predation (being attacked or chased by social antagonists or predators) make up between 41 and 52% of the threats encountered in dreams (Revonsuo and Valli, 2000; Zadra et al., 2006). Yet the threat simulation hypothesis for dreams does not explain why aggression and predation, of all possible dangers, should enjoy any prominence at all. Likewise, Scalise Sugiyama’s intriguing analysis of Little Red Riding Hood (Scalise Sugiyama, 2004) argues that predators loom large in fictional narratives across cultures because they constitute an important adaptive risk, not considering the popularity of predator tales in today’s industrial societies where animal attacks present no threat at all. In contrast, the literature on play specifically explains why aggression-related events should be a focus of play (Symons, 1978; Fry, 1990; Lancy, 1980).

Other authors have noticed fiction’s fascination for high-stakes events with major fitness impact, a phenomenon that Daniel Nettle (2005a, 2005b) studied in depth. Interestingly, Nettle does not endorse an adaptive simulation perspective, proposing instead a view of fiction as “supernormal conversation”. In this hypothesis, events with major fitness impact (such as murders or high-stakes marriages) feature prominently in fiction for the same reason that they figure prominently in gossip or journalism: because their social impact makes them useful things for a social animal to keep track of. Fiction, however, makes these events much less relevant than their real-life equivalent would be, since their relevance is normally due to their real-life consequences. Fiction here is thought to exploit psychological proclivities that are adapted to non-fictional stimuli. As a consequence, it must emphasise the most extreme fitness changes. The abnormal prevalence of killings in fiction, thus, is due to the fact that fictional accounts must make up for their lack of real-world relevance by emphasising high-stakes events. Although starting from different premises, the ordeal simulation hypothesis and the supernormal conversation hypothesis make rather similar predictions. We did not attempt to differentiate them in this paper, but as we shall see, some of our findings do not fit easily with either hypothesis.

Overview

We tested the ordeal simulation hypothesis in two studies. Study 1 compares the occurrence of agentive deaths in fiction with real life, whereas Study 2 compares mentions of death in fictional vs. non-fictional texts. Study 1 asked whether agentive deaths, i.e. deaths caused by a homicidal intention (a typical ordeal), are over-represented in twentieth century American novels, compared to real-world statistics for that time and place. Using Wikipedia summaries of 744 US novels (1900–1999), we found that agentive deaths were largely over-represented, while other types of death, also highly frequent in fiction compared to reality, were less prominent. This appears to vindicate the ordeal simulation hypothesis’s main prediction. However, we had yet to see whether this focus on agentive death was specific to fictional, as opposed to non-fictional, narratives. In Study 2, we used automatic text analysis to extract the frequency of words related to agentive and natural death, in two distinct corpora, matched for cultural background and author’s gender, one consisting of novels, the other of private correspondence and diary entries. Contrary to what the ordeal simulation hypothesis would predict, we found no textual indication that mentions of agentive mortality were specifically over-represented in fictional narratives as opposed to private correspondence or diary entries. This will lead us to propose another interpretation for the prominence of agentive death in fictional narratives, in the general discussion that serves as a conclusion.

All reported studies were preregistered, in three waves (two for Study 1, one for Study 2), on the Open Science Framework. We append as Electronic Supplementary Material a “research diary” that contains all our preregistered material, as well as the results, data, and code. This research diary includes material not presented here, being peripheral to our test of the ordeal simulation hypothesis.

Study 1: The Ultimate Spoiler: Mortality and causes of death in 744 novels

This study considers three types of mortality (natural, agentive, and accidental) as they occur in the real-world statistics of twentieth century USA, and in the plots of novels produced in the USA in the twentieth century, as studied through Wikipedia summaries. This choice of setting, which carries obvious cultural biases, was entirely due to data availability: excellent Wikipedia summaries exist for American novels from any decade of the twentieth century. The ordeal simulation hypothesis predicts that two types of deaths should be specifically over-represented in fiction as compared to reality: agentive deaths (i.e., deaths caused by a human agent) and, to a lesser extent, accidental deaths. Agentive death includes suicides, peacetime homicides, and war-related killings. Accidents in the definition that we used include injuries not caused by (communicable or not communicable) diseases: deaths by fire, by drowning, by transportation accidents, etc. Following an evolutionary-psychological logic, the ordeal simulation hypothesis also predicts that very specific, extremely rare types of deaths should be over-represented in fiction, namely, death by animal predator attacks (a frequent threat in our ancestors’ environment, now negligible), and death by capital punishment (a type of death that combines two major threats, social exclusion and violent aggression). Rates of natural death may or may not be exaggerated in fiction as compared to reality, but should be more realistic than both agentive and accidental death rates. (Natural deaths include any death caused by disease and not due to an external injury, be it an unintentional accident or an intentionally caused wound; see World Health Organisation, 2004.)

Methods

Selection of novel summaries

Wikipedia, the online encyclopedia, was searched manually (between March and June 2014) for suitable summaries of twentieth century novels. Our selection criteria excluded the following:

  • Novels not written by an American citizen writing in the English language.

  • Novels whose Wikipedia summary was shorter than 200 words, or was incomplete (introducing the plot instead of summarising it, or leaving out the end part).

  • Unfinished novels or short story compilations.

  • Novels written by multiple authors, by anonymous authors, or by authors without a Wikipedia entry specifying their birth date (this criterion was included to ensure the quality of our Wikipedia sources, and also because we originally planned to study the authors’ demographic information).

  • To maximise our sample’s diversity while minimising coding time, only one novel per author was retained. When several were available we picked the earliest novel satisfying the above criteria.

  • Novels for children and Young Adults were not included. They appear to be rather different from the rest of the sample in handling explicit topics like violent mortality, and this, coupled with their massive overrepresentation in Wikipedia summaries, might have biased the study.

A first wave of selection retained a list of 846 items, which we preregistered. More thorough reading during coding revealed that 102 of these items did not satisfy one of the above criteria. This left us with 744 novels. They were sorted into four different categories or genres, based on consensual estimation by two authors (AA and OM): General fiction (n = 349), Violent fiction (including crime novels, spy novels, thrillers, and war novels) (n = 156), Science Fiction (n = 167) and Fantasy (n = 72).

Real-world statistics

We used data from the Institute for Health Metrics and Evaluation (IHME), made available through the Global Health Data Exchange (GHDx) (Global Burden of Disease Collaborative Network, 2017), which centralises data from World Health Organisation documents, as well as the Global Burden of Disease study. Using their data exportation tool (http://ghdx.healthdata.org/gbd-results-tool, accessed March 2019), we obtained death rates for the year 1990, broken down by cause. We consider two broad types of causes, the agentive (suicides, homicides, and war-related deaths) and the accidental (other types of death, including accidental deaths). Predator attacks and executions are both types of agentive causes, but are treated separately due to their extreme rarity.

This data was collected for the year 1990 (the earliest year for which this source has available data), and for the following locations: Afghanistan, Mexico, Russia, South Africa, a selection of countries considered by the OESC to be low-income countries, and the United States. Apart from the USA, these countries were chosen for their relatively high mortality rates—one of our goals being to obtain an upper bound for real-world mortality rates. For each location and each cause of death we considered the yearly rate for all age groups, and separately, the age group of 15 to 49 year-olds. For each of the causes of death we considered, we then extracted three rates:

  • A “realistic” rate, obtained by considering the yearly death rate for Americans aged 15–49, taking our source’s mid-range estimate. This rate was meant to reflect what realistic mortality chances would have been for typical twentieth century American adults.

  • An “upper bound” rate, obtained by getting the maximum of all upper bound estimates for each of the locations that we searched. Our searches through the literature (summarised in sections B and D of the supplementary material) suggest that other datasets are unlikely to exceed these maximum estimates by a large margin.

  • A “lower bound” rate, obtained by getting the minimum of all lower bound estimates for each of the locations that we searched.
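As an illustration only, the three rates described above could be derived as in the following Python sketch. The record layout (`Estimate`), the function name `three_rates`, the placeholder location name, and all numbers are hypothetical; the sketch also assumes that the maximum and minimum are taken over every location and both age groups, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """One cause-of-death estimate (hypothetical layout, deaths per 100,000 per year)."""
    location: str
    age_group: str  # "all ages" or "15-49"
    lower: float    # lower-bound estimate
    mid: float      # mid-range estimate
    upper: float    # upper-bound estimate


def three_rates(estimates):
    """Return the realistic, upper-bound, and lower-bound rates for one cause."""
    # Realistic rate: mid-range estimate for Americans aged 15-49.
    realistic = next(
        e.mid for e in estimates
        if e.location == "United States" and e.age_group == "15-49"
    )
    return {
        "realistic": realistic,
        # Assumption: extremes are taken over all locations and age groups.
        "upper_bound": max(e.upper for e in estimates),
        "lower_bound": min(e.lower for e in estimates),
    }


# Made-up numbers for a single cause of death; NOT the study's data.
example = [
    Estimate("United States", "15-49", 9.0, 10.0, 11.0),
    Estimate("United States", "all ages", 8.0, 9.0, 10.0),
    Estimate("Location B", "15-49", 30.0, 40.0, 55.0),
    Estimate("Location B", "all ages", 20.0, 25.0, 35.0),
]
rates = three_rates(example)
```

With these invented inputs, the realistic rate comes from the US 15–49 row, while the two bounds come from whichever rows hold the extreme estimates.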

Coding the summaries

Each summary was coded to obtain the number of characters in the novel, as well as the number of characters dying in some way during the period of time covered by the plot. Only characters that were individually named (e.g., “Victor Frankenstein”) or otherwise identified (e.g., “the doctor’s creature”) were taken into account. Characters presented as undifferentiated groups (e.g., “Victor’s relatives”, “the ship’s crew”) were not considered. Five types of deaths were coded: Suicide, Homicide, War-related (these three making up the “Agentive death” category), Accidental, and “Other”. Coding was performed by two authors (OM and AA) and one independent coder (blind to the study’s hypotheses). A sample of 20 books was coded by all three to estimate inter-rater agreement. The intra-class correlation for the general death rate was 0.854 (one-way random, absolute agreement, average), and 0.812 for the proportion of agentive deaths. In total, 489 novels contained at least one death.
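For readers unfamiliar with the agreement statistic, a one-way random, average-measures intra-class correlation can be computed from the between-item and within-item mean squares as (MSB − MSW) / MSB. The Python sketch below shows this textbook formula on invented ratings; the function name and the data are hypothetical, and nothing here reproduces the study's own computation or values.

```python
def icc_one_way_average(ratings):
    """One-way random, average-measures ICC.

    `ratings` is a list of rows, one per rated item (e.g., a novel),
    each row holding the scores given by the k raters.
    """
    n = len(ratings)       # number of rated items
    k = len(ratings[0])    # number of raters per item
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Mean square between items and mean square within items.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / ms_between


# Invented scores from two raters on three items; NOT the study's data.
toy_ratings = [[1, 2], [3, 4], [5, 6]]
toy_icc = icc_one_way_average(toy_ratings)
```

The statistic approaches 1 when raters differ little within each item relative to the spread between items.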

The plots of our 744 novels cover widely variable amounts of time. Looking for some basis for comparing absolute death rates in fiction and reality, two authors each selected the novels whose plot, in their personal estimate, spanned one year or less (82 general, mainstream fiction, 77 violent fiction, 7 science-fiction, 14 fantasy). Estimating this from the Wikipedia summaries was difficult, and inter-rater reliability was low (kappa = 0.58), which led us to discard all the novels on whose chronology we disagreed. In the end 180 novels were retained as having a plot spanning one year or less. We calculated mortality rates for these 180 novels, based on the assumption that their plot covers exactly one year. This assumption is of course highly uncertain, and the figures given below must thus be taken as mere proxies. However, given the magnitude of the differences reported below, our basic conclusions would still hold even if we underestimate the duration of all plots by a factor of 2 or 3.
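The arithmetic behind such a proxy can be sketched as follows, under the assumption (not spelled out in the text) that deaths and characters are pooled across novels before dividing. The function name `deaths_per_100k` and the counts are hypothetical; the second call shows how the rate shrinks if every plot is assumed to last three years instead of one.

```python
def deaths_per_100k(deaths, characters, plot_years=1.0):
    """Yearly deaths per 100,000 individuals for a pooled set of characters."""
    return deaths * 100_000 / (characters * plot_years)


# Invented counts, NOT the study's data: 50 deaths among 1,000 coded characters.
baseline = deaths_per_100k(50, 1000)                 # plots assumed to last one year
tripled = deaths_per_100k(50, 1000, plot_years=3.0)  # plots assumed to last three years
```

Because the assumed plot duration only divides the rate, tripling it lowers the estimate by a factor of three, which cannot cancel a gap of one or more orders of magnitude.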

Results

Death rates in novels vs. reality

Death rates are vastly greater in fiction as compared to reality, in general and for all the specific causes of death that we considered (Table 1). Fictional rates systematically range at least one order of magnitude above real rates. The only exception is natural death. There, rates are still higher in fiction than in reality, but the orders of magnitude are the same. All differences are large enough that our estimate for the amount of time covered by all the novels’ plots may be off by a factor of 2 or 3 without affecting our results. Confirming our previous result, we find that the discrepancy between real and fictional rates is most pronounced by far for agentive deaths, murders in particular. This remains true for non-genre “general” fiction.

Table 1 Mortality rates, expressed as one death per 100,000 individuals for a given year, for various causes of death

Relative weight of agentive death

We focused on the 489 novels whose summary mentions at least one death (241 general fiction, 124 violent fiction, 80 science fiction, 37 fantasy) to explore the relative weight of various causes of death (Table 2). Relatively speaking, only homicides are vastly over-represented in fiction compared to reality. Other causes of death are on the high end of a realistic range. This, at least, is true if we consider adults between ages 15 and 49 to establish our baseline. Including senior mortality figures would change this pattern drastically, but novel protagonists tend to be relatively young. The disproportion of agentive deaths compared to other deaths is almost entirely accounted for by the excessive frequency of homicides (as distinct from suicides or war-related deaths). The relative frequency of fictional homicides exceeds even the highest estimates in the anthropological literature, including Chagnon’s controversial estimate of one in three deaths among the Yanomamö due to homicide (Duntley and Buss, 2011). Deaths as a result of predator aggressions are notable by their presence in fictional data, given their near absence in the real world. Here again this is not specific to genre fiction: animal predator attacks also occur in two “general fiction” novels, sometimes at the cost of rather artificial plot contortions (see, e.g., The Prince of Tides (Conroy, 1986), a bleak romantic drama where a pet tiger is set loose on a group of intruders in a suburban home). Another type of death not infrequent in novels, but highly unlikely in reality, is capital punishment.

Table 2 The share of several causes of death, relative to the total number of deaths, in real-world data compared to novel summaries

Discussion

The data just described may not surprise a regular consumer of novels. Yet, it does contradict the view that fiction provides a realistic simulation of social life. To cite one prominent adaptationist theory:

“… most fiction strives for realism in the most important aspects of human experience: the psychological and the social. Even novels with fantastical themes and settings (e.g., science-fiction or fantasy novels) strive for verisimilitude with respect to human emotions and interpersonal interactions”. (Mar and Oatley, 2008: p. 185).

Pace Mar and Oatley (whose work cannot be reduced to the claim just cited), we identify one area where fiction blatantly flouts realism with respect to interpersonal interactions. In fiction, one’s chances of getting killed by one’s fellow humans are a hundred times higher than in reality.

The general prevalence of death from natural causes, on the other hand, was not predicted by the hypothesis. It is coherent with the finding that negative content enjoys cultural success, as compared to accounts of positive or neutral events (Fessler et al., 2014; Blaine and Boyer, 2018; Boyer and Parren, 2015; Barrett et al., 2016; Bebbington et al., 2017). This phenomenon, however, is by no means specific to fiction, as opposed to information bearing on the real world. This raises the question of whether the overrepresentation of agentive deaths specifically affects fiction, as opposed to other types of verbal production. According to the ordeal simulation hypothesis, fiction should specifically be concerned with such ordeals as agentive killings, a prediction also made by Nettle’s Supernormal Conversation Hypothesis (Nettle 2005a, 2005b). Study 1 did not speak to this prediction. Study 2 tests it.

Study 2: Ordeal simulation or social relevance? Mentions of death in private documents

The ordeal simulation hypothesis is meant to explain the content of narrative fiction as distinct from other kinds of verbal productions. Yet it is not the only possible explanation for the prominence of agentive death in fiction. Agentive death rates might be boosted, instead, by a conjunction of two biases: a preference for negative over positive information (Baumeister et al., 2001) and a preference for information bearing on the social world (Mesoudi et al., 2006; Stubbersfield et al., 2015). In this account, the cultural appeal of agentive death may not be specific to fiction, contrary to what the ordeal simulation hypothesis implies.

To check for this possibility, Study 2 used automatic text analysis to extract the frequency of words related to agentive or natural death, in two distinct corpora, one consisting of novels written by English-speaking women (1751–1953), another of private correspondence and diary entries, also written by English-speaking women (all with ties to the United States of America, or the colonies that preceded it), between 1675 and 1953. We looked for private documents written by amateurs, not intended for the general public, and recounting the kind of real, quotidian events that happened to the writer over the preceding weeks or months. This material is more suitable than, for instance, newspapers, as journalists may be too close to fiction writers to provide a sound basis for comparison. They may use similar strategies of narration as the authors of fictional stories, since many readers arguably get the same kind of excitement from reading true facts as from reading fiction. The private letters and diary entries selected for this study, by contrast, were not meant by their authors to be published or shared beyond their immediate correspondents. Following the ordeal simulation hypothesis, we predicted that words relating to agentive death (homicide, suicide, or war) should be over-represented in novels as compared to private documents, but that vocabulary related to natural death should not be. We added words related to accidental deaths to our investigation, for the sake of completeness and without a clear prediction in mind.

Methods

The “Letters and Diaries” corpus

This corpus was built from the Letters and Diaries of American Women corpus (Rhind-Tutt et al., 2001). We assembled it by selecting from the primary documents all the letters and diary entries written by an author who had a biographical notice in the database, with birth date. This criterion was due to the fact that this corpus was also being used by another study. We also took care to include only intimate documents not intended for a broad audience. Any author who had already been published in her lifetime (as per the information given on her biographical notice) was not retained. Memoirs, as distinct from diaries or letters, were excluded. These book-length documents might have been written with future publication in mind. The resulting corpus comprises 10,810 documents (6095 letters, 4715 diary entries) written by 156 authors, from 1675 to 1953.

The “Novels” corpus

This corpus was built from the Gutenberg online repository of free-of-rights literary works. The list was assembled between 2014 and 2015. We went by the following inclusion criteria:

  • The books were written by women. Books written by women with male co-authors were discarded. In case of doubt, the book was discarded.

  • All co-authored books were discarded.

  • The author had English as native language. (A US or Commonwealth nationality was assumed to indicate a native English speaker, unless the name or content suggested otherwise.)

  • The book was a work of fiction. We excluded: poetry, scientific books and research, travel diaries, memoirs, historical books, pedagogical books, cooking and housekeeping books, spiritual books, books about manners, business books, books on how to build relationships, medical books, engineering books. Books that were presented as autobiographical, memoirs, or true accounts of historical events, were also excluded.

  • Only works precisely dated (to the year) were considered. Books published after 1953 were not considered, since our ‘Letters and Diaries’ corpus material ends at this date (only very few books were found after that date in any case).

  • Because of their peculiarity, and also to be consistent with other published work (Morin et al., 2016), we tried to exclude books intended for children (estimated from the title, the author, the book’s length). However, given the great number of such works in the repository, we were not entirely successful, and realised after a first wave of data collection that many such works had been collected. These are explicitly signalled.

We ended up with 811 books (of which 188 are books written for children), written by 500 authors, between 1751 and 1953.

Word lists

We built up three word lists from the section “death” of the Linguistic Inquiry and Word Count (“LIWC”, Pennebaker et al., 2007), a standard tool for lexicometric investigations. The “death” section of the LIWC was divided into three subsections, mirroring the classification of death causes used in Study 1: “general death” for words that do not connote any threat or danger (this is the list referred to as “natural death” in the Results); “accidents” for words linked to unspecified or accidental death risks; and “agentive” for words that clearly connote agentive deaths.
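As a rough illustration of how a word-list frequency of this kind can be computed, the Python sketch below returns the share of a document's tokens that match a list. The tokenizer, the function names and the three-entry list are assumptions made for the example; they are not the actual LIWC entries or the pipeline used in this study. A trailing asterisk marks a stem, as in drown*.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def compile_word_list(entries):
    """Turn entries such as 'drown*' (stem) or 'war' (exact word) into one regex."""
    parts = []
    for entry in entries:
        if entry.endswith("*"):
            # Stem entry: the stem followed by any run of word characters.
            parts.append(re.escape(entry[:-1]) + r"[a-z']*")
        else:
            # Exact entry: the whole token must equal the word.
            parts.append(re.escape(entry))
    return re.compile("(?:" + "|".join(parts) + ")")

def list_proportion(text, entries):
    """Proportion of the tokens in `text` that match the word list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    pattern = compile_word_list(entries)
    hits = sum(1 for tok in tokens if pattern.fullmatch(tok))
    return hits / len(tokens)

# Hypothetical mini-list, for illustration only.
example_list = ["drown*", "war", "kill*"]
sample = "The war ended. Two sailors drowned; nobody was killed on land."
proportion = list_proportion(sample, example_list)  # 3 matching tokens out of 11
```

Matching whole tokens rather than substrings keeps an exact entry such as "war" from counting "warning", while stem entries still catch inflected forms.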

Results

All analyses were carried out in R (version 3.4.3). We built a series of linear mixed effects models using the lme4 package in R (Bates et al., 2019). Model comparison was performed using Akaike’s information criterion (AIC): models with a lower AIC were considered more informative. Each model tried to predict the proportion of words in the word list of interest present in the documents. A first model, the “null model”, simply nested the data points (the individual novels, letters, or diary entries) according to the author’s identity. A second model, the “best model without test variables”, was then generated by adding a series of control variables to the model (the document’s length, the document’s date, the document’s vocabulary, the author’s age), retaining only the variables that made the model strictly more informative. As a third step, we added our variable of interest: which corpus the document was from (“Letters and Diaries” or “Novels”), a variable called “corpus”.
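The comparison rule can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the R code used for the analyses: the function names are assumptions, and the log-likelihood passed to aic() is a made-up number. The three AIC values are those reported further down for the natural-death word list.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion, 2k - 2 ln(L); lower means more informative."""
    return 2 * n_params - 2 * log_likelihood

def is_more_informative(current_aic, candidate_aic):
    """A candidate model is retained only if its AIC is strictly lower."""
    return candidate_aic < current_aic

# Hypothetical log-likelihood, only to show how the criterion is computed.
example_aic = aic(log_likelihood=-100.0, n_params=3)  # 2*3 - 2*(-100) = 206.0

# AIC values reported in the Results for the natural-death word list.
best_without_test_variables = -75894
with_corpus = -75883
with_category = -75866

keep_corpus = is_more_informative(best_without_test_variables, with_corpus)      # False
keep_category = is_more_informative(best_without_test_variables, with_category)  # False
```

Because AIC penalises each additional parameter, a predictor that barely improves the fit raises the criterion instead of lowering it, which is the pattern seen when “corpus” or “category” is added.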

For all three variables of interest, a substantial amount of variation was linked to the two categories of documents contained within each corpus. The “Letters and Diaries” corpus contains letters (“letter”) and diary entries (“diary”). The “Novels” corpus contains books written primarily for children (children’s novels or “children” for short) and books written primarily for adults (adult’s novels, “adults” for short—these are not adult novels in the more specific sense of that term). We built a fourth model using these four sub-categories, a variable henceforth called “category”, instead of “corpus”.

Words related to natural death

We attempted to predict the proportion of words from our “natural death” word list among the word tokens present in each document. Our best model without test variables (AIC = −75,894) included a positive effect for each document’s vocabulary size (i.e., the ratio of the number of word types over the number of word tokens; henceforth called vocabulary), in addition to the random intercept for authors. Documents with richer vocabulary are more likely to include words related to natural death. Adding a document’s “corpus” to this model did not result in a more informative model (AIC = −75,883). The resulting model included a small positive effect for “corpus”—i.e., “Novels” as contrasted with “Letters and Diaries”. Novels, as compared to private documents, were slightly more likely to include words related to natural death (fixed effect estimate for the corpus being “Novels”: Beta = 0.00001, SE = 0.00005, t = 1.8). Replacing “corpus” with “category” (whether the document is a letter, a diary entry, a novel for adult or for children) did not produce a model more informative than either the preceding model or the best model without test variables (AIC = −75,866). This pattern of results remained robust when excluding nine outliers (documents where the proportion of death-related words was >15%), but the effect of “corpus” (“Novels” as opposed to “Letters and Diaries”) became even weaker. Overall, natural death-related terms were not markedly more frequent in fictional documents, as predicted (Fig. 1).
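The vocabulary measure used as a control variable here is a type-token ratio. A minimal Python sketch follows, assuming a simple lowercase tokenizer (the tokenization actually used in the study is not specified); the function name and the sample sentences are invented for the example.

```python
import re

def type_token_ratio(text):
    """Number of distinct word types divided by the number of word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

short_text = "the rain fell on the roof"        # 5 types, 6 tokens
long_text = " ".join([short_text] * 10)         # 5 types, 60 tokens

short_ratio = type_token_ratio(short_text)      # 5/6
long_ratio = type_token_ratio(long_text)        # 5/60
```

The ratio tends to fall as a text gets longer, since common words recur, so short documents score higher on this measure for reasons of length alone.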

Fig. 1

The frequency of words related to natural or agentive deaths, in four types of documents. Error bars stand for 95% confidence intervals. The frequency of words related to natural death in diary entries (on the far right) is partially inflated, because diary entries, being shorter, have richer vocabularies relative to their length. The fact that our word list for words related to natural death contains more words (137) than the one for agentive death (81) does not explain away the difference between the two proportions. Even when controlling for this, words related to natural death are still much more frequent than words related to agentive death in all four types of documents.

Words related to agentive death

Our best model without test variables (AIC = −108,647) included only the random intercept for authors. No control variable was found to make the model more informative. Adding “corpus” to this model did not produce a more informative model (AIC = −108,629). The model with “corpus” included a small negative estimate for the effect of “corpus” (Beta = −0.000004, SE = 0.000009, t = −0.4). In other words, being a novel as opposed to a private document made the occurrence of words related to agentive death slightly less probable, contrary to our prediction. Adding the “category” variable did not improve the model (AIC = −108,598). This pattern of results remained robust when excluding two outliers (documents where the proportion of death-related words was >10%). Our main prediction was thus refuted.

Words related to accidental death

Our best model without test variables (AIC = −140,812) included only the random intercept for authors. No control variable was found to make the model more informative. Adding “corpus” to this model did not produce a more informative model (AIC = −140,790). The model with “corpus” included a small positive estimate for the effect of “corpus” (Beta = 0.0000006, SE = 0.000005, t = 0.3). Replacing “corpus” with the “category” variable did not improve the model (AIC = −140,750). This pattern of results remained robust when excluding eight outliers (all the documents where the proportion of death-related words was >1%). More importantly, we realised that words related to accidental deaths were extremely rare because only one lexeme from our LIWC-based word list was present in the documents (drown*). This was not the case for the other two word lists.

Discussion

Study 2 suggests that mentions of death, both violent and non-violent, are about equally frequent in fictional and non-fictional material, contradicting the ordeal simulation hypothesis’s prediction. We could attempt to explain away this result by noting that text-mining does not necessarily capture a text’s content. The word “dead”, for instance, might be used in set phrases whose meaning is anything but lethal (“dead reckoning”, “we are dead-set on this task”, etc.). Our impression, however, is that death-related vocabulary substantially correlates with actual mentions of death. To verify this, one of us (OS) systematically went through all the passages in our “Letters and Diaries” corpus that featured one of the words in our agentive death-related word list (e.g., “execution”, “killer”, “war”). He found that, in more than three quarters of cases, such words did refer to actual executions, killers, wars, etc. They were not used as part of a set phrase, nor did they relate to a death that the author had read about in the newspaper, or in a novel. (Pennebaker et al., 2003).

Our results thus tend to support the view that death-related themes are about as prevalent in private letters or diaries as they are in fiction. This is surprising, and not simply because it contradicts our hypothesis. Novels, after all, are commercial products. We may presume that their authors tried to appeal to their readership’s preferences as best they could. Assuming that death-related information appeals to most readers, we would expect professional novelists to emphasise it in their commercial productions. Here, they do not seem to.

Like those of Study 1, the results of this study apply to one particular culture only—American culture, chosen for reasons of data availability and to maintain consistency with Study 1. Although most students of the topic seem to agree that agentive death also has appeal in other cultural traditions (Scalise Sugiyama, 2004; Boyer and Parren, 2015), more work would be needed to generalise our findings.

General discussion

Theories of narrative fiction since Aristotle (Poetics, Aristotle 1996) have grappled with two slightly contradictory facts. On the one hand, fiction is a form of pretence that can reach a high degree of realism. In itself, a fictional narrative bears no indication of its own untruth (Goodman, 1978), and many fictions are quite believable. On the other hand, even the most realistic fictions depict dramatic events that most of us hardly ever have to go through. Fiction, in other words, is a believable depiction of unbelievably strange events. The classic solution to this puzzle sees fiction as an instrument made to generate a vicarious experience: a simulation. This paper proposed and tested a more specific version of the “fiction as simulation” theory, which we term the “ordeal simulation hypothesis”. It states that fictional narrative simulates primarily “ordeals”, situations where a person’s reaction might dramatically improve or decrease her fitness. Examples include deadly aggressions by predators or humans, or decisions on long-term matrimonial commitments. Experience does not prepare us well for these rare, high-stakes occasions, in contrast with situations that are just as fitness-relevant but occur more frequently (e.g., risks of catching an infectious disease, or opportunities for casual sex). This hypothesis differs from the view that fiction should prepare us for social life, for threats or dangers in general, or for any and all fitness-relevant events.

It accounts for several aspects of the psychology of fiction that would otherwise remain puzzling. It explains the prominence of narrative fiction over other kinds of verbal fictions: the fact that most verbal fiction concentrates upon the goal-directed actions of intentional beings (Propp, 2010; Bruner, 2004). No doubt experimental poetry featuring descriptions of empty imaginary landscapes, imaginary fruits and vegetables, alternative laws of physics, etc. may have been produced, but it is unlikely to find avid consumers. The hypothesis explains why sex and murder are more central to fiction than they are to reality, which is not the case for other biologically important activities (like food procurement or gestation). It accounts for the similarities between play, dreams, and fiction. Moreover, it explains why people die in novels, in the specific ways that they do: fiction emphasises not merely mortality, but agentive mortality specifically.

Explanatory adequacy does not make a theory true, however. The ordeal simulation hypothesis is not specific enough to make unique true predictions concerning the prominence of violent deaths in fictional narratives as distinct from non-fictional texts. Other hypotheses, bearing not on fiction but more generally on socially acquired information, also predict that accounts of agentive deaths are more prominent even in non-fictional material, for two reasons: the relevance of danger-related information in general (Fessler et al., 2014; Blaine and Boyer, 2018; Boyer and Parren, 2015; Barrett et al., 2016; Bebbington et al., 2017), and the specific appeal of social information (Mesoudi et al., 2006; Stubbersfield et al., 2015). We found no evidence for a specific differentiation in mentions of agentive death for novels as compared to private documents. Mortality, both agentive and non-agentive, seems as relevant for non-fiction as it is for fiction.

No single theory seems able to explain what could make death-related themes so prevalent both in fictional and non-fictional material. Theorists trying to account for the cultural appeal of negative content claim that threat-related information is more believable for three main reasons (Fessler, 2019; Blaine and Boyer, 2018). One is the “smoke-detector principle” (Nesse, 2001): the risks of failing to ward off a possible danger are much greater than those of excessive caution. The second is a matter of evidence asymmetry: negative evidence for the absence of a threat is less easily encountered than positive cues. Lastly, warning others of threats may be a useful way to boost one’s standing in a coalition, especially when the warnings identify possible enemies. Tellingly, each of these explanations applies to the credence that we attach to information concerning real-world dangers. They are not meant to explain the appeal of fictional dangers, which (in the case of novels at least) no one ever takes seriously as real possibilities. Moreover, as we saw, a fictional context does not render gruesome events any less interesting. As argued by several authors (McCauley, 1998; Tooby and Cosmides, 2001), the opposite is more likely. In a fictional context, we enjoy events that, considered as real possibilities, repulse us.

Our search for a specific signature of fictional content remains nonetheless uncompleted. This in itself does not suffice to refute the ordeal simulation hypothesis, or the more general view that fictions are adaptive simulations; but it does call for more theoretical work specifying the kind of content that an adaptive simulation device should focus on. More generally, it underscores the need for any adaptationist theory of fiction to come up with precise predictions focusing specifically on fiction as opposed to other kinds of speech or writing. Until this is done, discussions debating whether fiction, as distinct from other kinds of verbal productions, is rooted in an adaptation or emerges as a by-product of other cognitive activities (Pinker, 2007; Mellmann, 2012) are likely to stall.