Main

Predictions about upcoming events, and their eventual outcomes, have a substantial effect on learning and memory1,2,3,4,5,6,7. A wealth of research has demonstrated that momentary prediction errors, or surprises, are particularly well remembered1,2,7,8,9,10,11,12 and that an array of physiological and mental processes seem to underpin these memory benefits. For instance, surprise engages the dopaminergic and serotonergic midbrain systems7,10,13,14,15,16,17, alters and enhances hippocampal activity16,18,19,20,21,22,23,24, increases pupil dilation7,25,26,27,28,29 and enhances perceptual13,16,30,31 and attentional processing8,32,33,34,35,36,37,38,39 of the surprising stimulus. Surprises can be further differentiated into events that are better than expected for the agent because they signal reward (+ reward prediction error, or + signed surprise) and others that are worse than expected (− signed surprise)40,41,42,43,44,45. The physiological response to these events is critical for learning and for updating the values of stimuli, which in turn supports optimal behaviour: approaching rewards and avoiding punishments41. With regard to memory, outcomes with + signed surprise are better remembered in some models3,40,46,47, whereas in others, the magnitude rather than the direction of surprise (or unsigned surprise) drives memory1,2,7,11,40.

The prevailing focus in these studies is how surprise affects learning and memory for momentary events on the scale of seconds; however, humans can clearly predict outcomes well beyond the upcoming moment48. A far from exhaustive list of things we can make predictions about beyond the next event includes a series of upcoming stimuli49,50, the linguistic content of not just upcoming words51,52,53 but entire paragraphs54,55, attributes of oneself months and years into the future56,57,58,59, political elections months and years into the future60, and the winners of upcoming sports games61,62 and championships63. Given that we commonly make long-term predictions and that surprise is linked to better memory, it is worth asking whether long-term predictions that turn out to be wrong (long-term surprises) also correlate with memory.

The problem of long-term predictions is intriguing in domains such as politics or sports that feature probabilistic updates about a single outcome over long stretches of time. Because humans can represent information at multiple timescales of granularity, it is unclear which timescale should be operative at any given time, whether errors accumulate over time and whether each of these variables impacts memory. Consider the following example: imagine that you followed the campaign of Bernie Sanders for the Democratic nomination for US president starting in late January 2020, when his chances were roughly 25% according to FiveThirtyEight, a website focused on analysing political, societal and sports data (https://projects.fivethirtyeight.com/2020-primary-forecast/). In this hypothetical scenario, imagine that throughout February and March 2020, Sanders won state after state, so that his likelihood inched upwards every few days until it reached 98% by 1 April. Had Sanders won one more state on 7 April and secured the nomination (100%), would it be surprising? The answer depends on the timescale. Judged from 1 April, no: with only 2% surprise, the outcome was essentially a foregone conclusion. Judged from late January, however, with 75% surprise, the answer is undoubtedly yes. These kinds of events are both long-term (the interval between an initial prediction and the final outcome spans more than a single moment) and composed of several subevents (predictions are updated numerous times before the outcome).
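Written as a worked equation, the example reduces to one quantity evaluated from two reference points. In our own illustrative notation (an assumption, not the paper's), let p_τ be the forecast probability, at reference time τ, of the outcome that eventually occurred (coded as 1); surprise judged from τ is then the gap between the outcome and that forecast:

```latex
S_\tau = 1 - p_\tau, \qquad
S_{\text{1 April}} = 1 - 0.98 = 0.02, \qquad
S_{\text{late January}} = 1 - 0.25 = 0.75
```

The same outcome therefore carries very different surprise depending on which earlier forecast serves as the reference.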

Here our primary focus was to determine whether surprise occurring across multiple timescales and events correlates with better memory. We framed our questions around events spanning multiple moments, such that the events could still be surprising as a collective even if their individual moments were not. To assess this, we asked basketball fans about their most positive and negative memories of entire basketball games and seasons. To link these results with the literature on momentary surprise, we also asked fans about individual plays, allowing us to replicate findings in this literature and extend them to real-world memories (see also refs. 64,65). We chose the sports domain for three main reasons. First, the recent explosion in publicly available sports data (for example, ref. 66) allowed us to precisely quantify predictions about the likelihood that a given team would win a game or championship. Using such predictions, we were able to estimate surprise following the outcome of a play, game or season. Second, sports contain a natural hierarchy of timescales. In the National Basketball Association (NBA), which is the focus here, the broadest scale over which teams compete is the season; seasons consist of collections of games, which in turn consist of collections of plays. Therefore, we could locate participant memories and situate them within a multi-scale predictive framework with respect to game and championship likelihoods. Third, affiliating with sports teams often provides fans with a strong sense of identity67 and, as a result, sports memories can be highly arousing and vivid68,69. Emotional events are recollected more vividly70,71,72,73, more often74,75,76 and over longer intervals than neutral events74,77,78, and they often form the basis of studies of highly vivid, ‘flashbulb’ memories of public events79,80,81,82. We therefore anticipated that participants would be able to readily access these memories.
We predicted that surprise across all three timescales would be associated with enhanced memory and, critically, that the contributions of surprise at the longer timescales could not be attributed to surprise at shorter scales.

Results

Characterizing participant responses

Participants (N = 122, 34 female) completed a survey asking about their most positive and negative memories of individual plays, games and seasons as basketball fans (Fig. 1a). We asked them to include as much detail as possible, emphasizing that we wanted to be able to find each memory later through an internet search. Our goal was to identify precisely the play, game or season from which each memory came, so that we could quantify its surprise-related attributes. We intentionally left the question open-ended and avoided implying that participants should use surprise as a heuristic. Precise plays and games were slightly more difficult to identify than seasons: in total, we identified 70 positive NBA plays, 74 negative plays, 77 positive games, 74 negative games, 96 positive seasons and 83 negative seasons.

Fig. 1: Memory task, basic response characteristics and analytical approach.

a, Participants were asked to freely respond to prompts about their most positive and negative memories of plays, games and seasons as basketball fans. We then linked these responses to specific teams and times across multiple timescales (moments in games, games within seasons, seasons). b,c, We characterized the response frequencies by the primary NBA team’s geographical location for plays, games and seasons (b) as well as their temporal distributions by month and year of play and game responses (c, top) and starting year of season responses (c, bottom). Because of the COVID-19 pandemic, the end of the NBA season was shifted by 4 months in 2020 and 1 month in 2021 (white arrows in c). d,e, We imported play-by-play data for all games across 17 seasons (2004–2005 to 2020–2021) (d) to find the game context of all participant responses and to train a win probability model (e). The win probability model computed the likelihood of winning the game given each game context, including the strength of the two teams and the current score (which were combined into an adjusted score), the time remaining in the game and the team in possession of the ball (only ‘Home’ shown in e). For instance, the play from a in which LeBron James blocked a shot corresponds to the win probability in the black square in e.

Twenty-nine of the 30 NBA teams served as the primary team of interest in participant responses (Fig. 1b and Extended Data Fig. 1a). We conducted the study in a location more than 320 km (200 miles) from the nearest NBA team stadium to reduce bias towards a single fanbase, although a bias towards the nearest teams remained (responses across all types: the San Francisco area-based Golden State Warriors, 261; the Los Angeles Lakers, 83) (Fig. 1b). Memories for plays and games were biased towards recent years (after 2013) and towards the end of the postseason, which typically occurs in June but was moved to October in 2020 and July in 2021 because of COVID-19 pandemic rescheduling (Fig. 1c, top). Memories for seasons were similarly biased towards recent years (Fig. 1c, bottom). Because data were collected from February until November 2021, we also plotted the age of each memory, that is, the interval between the study date and the date of the remembered event, which showed a similar bias (Extended Data Fig. 1b,c).

Remembered plays were linked with high surprise

We began by examining surprise for positive and negative plays, which resembles the momentary surprise studied in many human memory models (for example, ref. 11) and in traditional reinforcement learning models (for example, ref. 43). Our main analytical approach for play and game memories relied on scraping play-by-play data from the NBA API (Fig. 1d). These data included more than 5.6 million plays from more than 22,000 games between 2004 and 2021. We derived momentary surprise from the outcome variable of primary relevance to sports fans: which team wins a given game. One can conceptualize watching a basketball game as navigating a state space of predictions about the eventual winner, which is updated with each change in game context (Fig. 1e and Extended Data Fig. 2). Four factors influenced win probability in our model: the score difference between the two teams (oriented as positive for the home team), the relative strength of the teams, the amount of game time remaining and the team with possession of the ball. Context changes included scores, timeouts and three types of play that can change possession (turnovers, rebounds and tip-offs). We then computed surprise as the change in win probability across successive game contexts, that is, the discrete derivative of the win probability time course (Fig. 2a). This construct was split into an unsigned value, the absolute value of this change, and a signed value oriented positively for changes favouring each participant’s preferred team and negatively for changes favouring their non-preferred team. We validated that the surprise metrics for all positive and negative plays from our algorithm corresponded closely with those from an expert website (https://www.inpredictable.com) (Extended Data Fig. 3).
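A minimal Python sketch of the play-surprise step described above, assuming a game has already been reduced to a sequence of context changes that each carry a home-team win probability from some model. The `ContextChange` type, the `play_surprises` helper and the probability values are hypothetical illustrations, not the authors' pipeline or the NBA API's data format.

```python
from dataclasses import dataclass


@dataclass
class ContextChange:
    seconds_remaining: float  # game clock remaining at this context change
    home_win_prob: float      # model's home-team win probability after the change


def play_surprises(trace, preferred_is_home):
    """Unsigned and signed surprise for every context change after the first.

    Surprise is the first difference of the home win-probability time course;
    the signed version is flipped for away-team fans so that positive values
    always favour the participant's preferred team.
    """
    unsigned, signed = [], []
    for prev, curr in zip(trace, trace[1:]):
        delta_home = curr.home_win_prob - prev.home_win_prob
        unsigned.append(abs(delta_home))
        signed.append(delta_home if preferred_is_home else -delta_home)
    return unsigned, signed


# Hypothetical final minute in which the home team wins; the fan supports the away team.
trace = [
    ContextChange(60.0, 0.40),
    ContextChange(45.0, 0.55),
    ContextChange(20.0, 0.30),
    ContextChange(0.0, 1.00),
]
u, s = play_surprises(trace, preferred_is_home=False)
```

The last change (0.30 to 1.00) yields the largest unsigned value and, for an away-team fan, a strongly negative signed value.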
As we did not have direct access to fans’ subjective likelihoods during viewing, this analytical approach assumed that fans’ subjective win likelihood estimates and objective estimates (based on our algorithm) were similar, an assumption consistent with a previous report7.

Fig. 2: Surprise predicts memory for positive and negative plays.

a, We computed play surprise as the difference in win probabilities across successive moments, shown here using Game 7 of the 2016 NBA Finals between the Cleveland Cavaliers and Golden State Warriors. We computed an unsigned version of this metric using the absolute value and a signed version by orienting this value with respect to the participant’s preferred team (for example, positive for a preferred outcome). b, Positive and negative responses were strongly biased towards the end of games, with positive responses more biased (positive N = 70; negative N = 74; Mann–Whitney U = 1904.5, P = 0.006, r (effect size) = 0.265, 95% CI (−110, −1)). 1Q–4Q, first to fourth quarter; OT, first overtime. c, Positive and negative responses were also strongly biased towards the end of the corresponding team’s season, with negative responses more strongly biased (Mann–Whitney U = 1,922, P = 0.006, r = 0.26, 95% CI (−10.7, 0)). d, Positive and negative responses corresponded with moments of higher unsigned surprise than a null distribution of all basketball plays (top) (positive versus null Mann–Whitney U = 3.4 × 10⁸, P < 0.001, r = 0.74, 95% CI (0.13, 0.33); negative versus null Mann–Whitney U = 3.2 × 10⁸, P < 0.001, r = 0.55, 95% CI (0.07, 0.11)) and a more focused null distribution of plays from the fourth quarter of the games chosen by participants (bottom) (positive versus null Mann–Whitney U = 3 × 10⁵, P < 0.001, r = 0.66, 95% CI (0.12, 0.30); negative versus null Mann–Whitney U = 2 × 10⁵, P < 0.001, r = 0.43, 95% CI (0.04, 0.10)).
e, Positive and negative responses corresponded to higher and lower signed surprise, respectively, than null distributions of all basketball plays (top) (positive versus null Mann–Whitney U = 3 × 10⁷, P < 0.001, r = 0.71, 95% CI (0.15, 0.35); negative versus null Mann–Whitney U = 6 × 10⁷, P < 0.001, r = 0.69, 95% CI (−0.13, −0.09)) and the fourth quarter of the games chosen by participants (bottom) (positive versus null Mann–Whitney U = 3 × 10⁵, P < 0.001, r = 0.70, 95% CI (0.18, 0.34); negative versus null Mann–Whitney U = 6 × 10⁴, P < 0.001, r = 0.63, 95% CI (−0.14, −0.10)). f, The spectacularity of a given play, which captures an alternative way in which a basketball play can be surprising that is unrelated to win probability, was higher for positive than negative play responses (positive N = 45; negative N = 48; Mann–Whitney U = 1.5 × 10⁴, P = 0.001, r = 0.37, 95% CI (0, 1)).

Before characterizing surprise, we investigated when these plays occurred within games and seasons. Both positive and negative plays tended to come from the very end of the fourth quarter and from overtime periods, and positive plays tended to have less time remaining in the game than negative plays (positive median 5 s, N = 70; negative median 53 s, N = 74; Mann–Whitney U = 1904.5, P = 0.006, r (effect size) = 0.265, 95% confidence interval (CI) (−110, −1)) (Fig. 2b). Examining the mean and standard deviation of surprise across all games revealed that mean surprise declined modestly over the course of the average game, reflecting the large number of games in which the outcome is all but certain by the end (‘blowouts’) (Extended Data Fig. 4a). However, the standard deviation of surprise increased exponentially towards the end of the game, suggesting that the final moments offer the largest potential for surprise (Extended Data Fig. 4b). We also calculated the percentile of the game in which each play occurred relative to the chosen team’s season; for example, the first game of the season was at the 0th percentile and the last game at the 100th percentile. Mirroring the bias towards the final months of the season (Fig. 1c), these responses tended to come from the very end of the primary team’s season. In this case, negative plays tended to come closer to the end of a team’s season than positive plays (median percentile, positive 89.3rd; negative 99.0th; Mann–Whitney U = 1,922, P = 0.006, r = 0.26, 95% CI (−10.7, 0)) (Fig. 2c).
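The season-percentile measure amounts to a linear rescaling of a game's position in a team's schedule. A small sketch follows; the function name, the zero-based indexing and the choice to count postseason games in the schedule are assumptions made for illustration.

```python
def season_percentile(game_index, n_games):
    """Percentile of a game within a team's season (0 = first game, 100 = last).

    game_index is zero-based; n_games counts every game the team played,
    here assumed to include any postseason games.
    """
    if n_games < 2:
        raise ValueError("need at least two games to define a percentile")
    if not 0 <= game_index < n_games:
        raise ValueError("game_index out of range")
    return 100.0 * game_index / (n_games - 1)


# A hypothetical team that played 82 regular-season and 16 postseason games.
first = season_percentile(0, 98)
last = season_percentile(97, 98)
```

For that 98-game season, the opener maps to the 0th percentile and the final game to the 100th.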

On the basis of previous results showing that momentary surprise influences memory1,2,7,11,40, we predicted that memories for positive and negative plays would preferentially be selected from a subset of the most surprising plays. We first tested this by examining whether unsigned surprise for the remembered plays was higher than a null distribution of all plays from all games. Note that the null distribution was non-normal, because most plays had little bearing on the eventual outcome: 32.4% and 78.6% of plays had less than 0.01 and 0.05 surprise, respectively, whereas only 0.1% had more than 0.25 surprise. Surprise for remembered plays was greater than this null distribution (median, positive 0.31; negative 0.11; null 0.019; positive versus null Mann–Whitney U = 3.4 × 10⁸, P < 0.001, r = 0.74, 95% CI (0.13, 0.33); negative versus null Mann–Whitney U = 3.2 × 10⁸, P < 0.001, r = 0.55, 95% CI (0.07, 0.11); positive versus negative Mann–Whitney U = 3,372, P = 0.002, r = 0.30, 95% CI (0.02, 0.15)), with 78.5% and 51.4% of positive plays and 71.6% and 17.6% of negative plays exceeding 0.05 and 0.25 surprise, respectively (Fig. 2d, top). We next reasoned that sports fans have limited time and probably do not watch every game, or every part of every game. Instead, to maximize their experience of surprise83, which acts as a reward in itself in low-stakes contexts such as sports games7,62,84,85,86,87,88, they may prioritize by watching select games or parts of games. If this were the case, a null distribution more representative of the plays that participants actually experienced would comprise the plays from the games they reported, or even just the plays from the fourth (final) quarter of those games.
Positive and negative play surprise was higher than the null distribution drawn only from the games that participants reported (null median 0.025; positive versus null Mann–Whitney U = 1 × 10⁶, P < 0.001, r = 0.7, 95% CI (0.13, 0.32); negative versus null Mann–Whitney U = 1 × 10⁶, P < 0.001, r = 0.5, 95% CI (0.06, 0.10)) and also higher than the null distribution of fourth-quarter plays from those games (null median 0.036; positive versus null Mann–Whitney U = 3 × 10⁵, P < 0.001, r = 0.66, 95% CI (0.12, 0.30); negative versus null Mann–Whitney U = 2 × 10⁵, P < 0.001, r = 0.43, 95% CI (0.04, 0.10)) (Fig. 2d, bottom).
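The comparisons above pair a Mann–Whitney U test with an effect size r. The text does not state how r was computed, so the sketch below assumes the rank-biserial correlation, r = 2U/(n₁n₂) − 1, which is one common choice. It relies on `scipy.stats.mannwhitneyu`, whose returned statistic is U for the first sample in SciPy 1.7 and later; the helper name `compare_to_null` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def compare_to_null(memory_values, null_values):
    """Two-sided Mann-Whitney U test of remembered events against a null set.

    Returns U for the memory group, the P value and the rank-biserial
    correlation r = 2U / (n1 * n2) - 1, which is positive when remembered
    values tend to exceed null values.
    """
    x = np.asarray(memory_values, dtype=float)
    y = np.asarray(null_values, dtype=float)
    res = mannwhitneyu(x, y, alternative="two-sided")
    r = 2.0 * res.statistic / (x.size * y.size) - 1.0
    return res.statistic, res.pvalue, r


# Synthetic example: a heavily skewed null of play surprises and a small set
# of "remembered" plays drawn from a distribution with larger values.
rng = np.random.default_rng(0)
null_plays = rng.exponential(scale=0.03, size=100_000)
remembered = rng.exponential(scale=0.2, size=70)
u_stat, p_value, r = compare_to_null(remembered, null_plays)

# Share of null plays below the small-surprise thresholds used in the text.
share_below = {t: float(np.mean(null_plays < t)) for t in (0.01, 0.05)}
```

Because U depends only on ranks, the test is insensitive to the strong skew of the null, which is one reason a rank-based test suits these data.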

We next asked whether these memories represented particularly positive or negative signed surprises with respect to the participant’s preferred team7,43,89. For our null distribution, we used all plays from all games and entered each play twice, once oriented towards each team, as if a hypothetical fan cheered for each team in each game. Positive memories were more positive and negative memories were more negative than the null distribution (median, positive 0.29; negative −0.11; null 0.0; positive versus null Mann–Whitney U = 3 × 10⁷, P < 0.001, r = 0.71, 95% CI (0.15, 0.35); negative versus null Mann–Whitney U = 6 × 10⁷, P < 0.001, r = 0.69, 95% CI (−0.13, −0.09)) (Fig. 2e). Therefore, participants tended to report plays with exceptionally good or bad win probability swings for their preferred teams. In sum, the influence of unsigned and signed surprise on memory for plays replicates previous findings on the influence of momentary surprise on memory7,40 and extends them to real-world domains64,65.
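The doubled null used for signed surprise can be sketched in a few lines: each play's home-oriented surprise is entered once as is and once with its sign flipped, as if one hypothetical fan supported the home team and another the away team. The function name is hypothetical, and this is an illustration under that reading of the text rather than the authors' code.

```python
import numpy as np


def symmetric_signed_null(home_oriented_surprise):
    """Doubled null for signed surprise: each value once per team's fan.

    Entering every value as +s (home fan) and -s (away fan) makes the null
    symmetric about zero by construction, so its median is 0.
    """
    s = np.asarray(home_oriented_surprise, dtype=float)
    return np.concatenate([s, -s])


null = symmetric_signed_null([0.10, -0.30, 0.02])
```

The same construction carries over to game-level signed surprise, where each game is counted once for each team.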

Finally, we also considered that our operationalization of surprise as a change in win probability captures only one dimension along which basketball can be surprising7. One might instead consider a play surprising in terms of how it unfolds against a lifetime of viewing other plays, such as a spectacular or improbable shot. Because viewing feats of athleticism may be one of the primary reasons one watches sports, we hypothesized that such plays would be more likely to be categorized as positive, although we acknowledge the alternative possibility that an improbable shot from an opposing team may ‘sting’ more deeply. We operationalized this type of surprise as the ‘spectacularity’ of a given play, which we rated separately without regard to the game context. We rated plays as (1) a routine play such as a layup or baseline jump shot, (2) a tough play such as a long shot or a fadeaway jump shot that might be considered among the top three in a given game, (3) a particularly athletic play that might be featured on a ‘top ten plays of the day’ news segment across all sports, such as a half-court shot or a behind-the-basket, hand-switching, mid-air layup or (4) a truly sensational play that might be included in a year-end ‘top ten plays of the year’ segment, such as a full-court shot or a famous dunk by Michael Jordan off a missed free throw attempt. We found that positive plays had higher spectacularity than negative plays (positive, median 2, mean 2.0, N = 45; negative, median 1, mean 1.6, N = 48; Mann–Whitney U = 1.5 × 10⁴, P = 0.001, r = 0.37, 95% CI (0, 1)), indicating that spectacularity was another dimension correlating with participant memories and that it was higher for positive than negative memories (Fig. 2f).

Negative game memories were associated with high surprise

Next, we aimed to determine whether surprise on a longer timescale, lasting beyond a single moment, would also influence memory. For this analysis, we considered two long-term surprise metrics calculated over the course of games. The first was full-game surprise, the difference between the pregame win probability (that is, with 2,880 seconds remaining) and the final result90,91. The second was within-game surprise, the difference between the lowest win probability for the eventual winner and the final result (Fig. 3a). This metric captures the familiar concept of a ‘comeback’ by indexing how wrong one’s predictions were at their worst during the game. We validated that both metrics corresponded strongly with those from an expert sports analyst for both positive and negative game memories (Extended Data Fig. 5a,b). We also validated that full-game surprise from our algorithm corresponded strongly with full-game surprise based on pregame betting odds (Extended Data Fig. 5c).
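Both game-level metrics can be computed from a single game's win-probability trace. The sketch below assumes that the trace is expressed for the home team, that it starts at the pregame value (2,880 s remaining) and that it excludes the final result, which is coded as 1 for the winner. The function name and the example trace are hypothetical.

```python
import numpy as np


def game_surprises(home_win_prob, home_won):
    """Full-game and within-game ('comeback') surprise for one game.

    home_win_prob: home-team win probabilities from the pregame value onwards.
    home_won: True if the home team won the game.
    """
    p = np.asarray(home_win_prob, dtype=float)
    winner_p = p if home_won else 1.0 - p  # win probability of the eventual winner
    full_game = 1.0 - winner_p[0]          # final result (1) minus pregame probability
    within_game = 1.0 - winner_p.min()     # final result minus the winner's lowest probability
    return full_game, within_game


# Hypothetical game: the home team was favoured and led late, but the away team won.
full, within = game_surprises([0.70, 0.80, 0.90, 0.40, 0.20], home_won=False)
```

Because the winner's lowest probability can never exceed its pregame probability, within-game surprise is always at least as large as full-game surprise under these definitions.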

Fig. 3: Full- and within-game surprises predict memory for positive and negative games.

a, We computed full-game surprise as the difference between the pregame and final win probability and within-game (‘comeback’) surprise as the difference between the lowest and final win probability for the game winner. Additionally, we found the maximum play surprise for each game. b, Both positive and negative game responses were biased towards the end of the corresponding team’s season, and negative responses were more biased (positive N = 77; negative N = 74; Mann–Whitney U = 2,225, P = 0.01, r = 0.22, 95% CI (−2.9, 0)). c, The game time at which the winning team had its lowest win probability, marking the onset of a comeback, was later in the game than in the null distribution for both positive and negative games (positive versus null Mann–Whitney U = 9.9 × 10⁵, P < 0.001, r = 0.14, 95% CI (16, 381); negative versus null Mann–Whitney U = 1.1 × 10⁶, P < 0.001, r = 0.32, 95% CI (296, 839)). These times were not concentrated at the end of the game, which suggests that comebacks were driven by more than a single play and therefore unfolded over a longer timescale. d, Unsigned full-game (top) and within-game (bottom) surprise were greater than the null distribution (all games) for negative, but not positive, game responses (full-game statistics: positive versus null Mann–Whitney U = 8.7 × 10⁵, P = 0.998, r = 0.0, 95% CI (−0.05, 0.05); negative versus null Mann–Whitney U = 5.5 × 10⁵, P < 0.001, r = 0.345, 95% CI (−0.18, −0.09); positive versus negative Mann–Whitney U = 1980, P = 0.001, r = 0.305, 95% CI (−0.2, −0.06)).
e, Signed full-game (top) (positive versus null Mann–Whitney U = 9.9 × 10⁵, P < 0.001, r = 0.43, 95% CI (0.26, 0.52); negative versus null Mann–Whitney U = 2.6 × 10⁶, P < 0.001, r = 0.55, 95% CI (−0.58, −0.31); positive versus negative Mann–Whitney U = 5,245, P < 0.001, r = 0.84, 95% CI (0.81, 0.995)) and within-game (bottom) (positive versus null Mann–Whitney U = 5.3 × 10⁵, P < 0.001, r = 0.69, 95% CI (0.47, 0.68); negative versus null Mann–Whitney U = 2.9 × 10⁶, P < 0.001, r = 0.74, 95% CI (−0.74, −0.53)) surprise values were higher and lower for positive and negative games, respectively, than the null distribution of all games. f, The spectacularity of a given game was higher for positive than negative game responses (positive N = 61; negative N = 74; Mann–Whitney U = 3.4 × 10⁴, P < 0.001, r = 0.49, 95% CI (1, 1)).

As with play memories, both positive and negative game memories were biased towards the end of a team’s season, and negative games were more likely than positive games to be recalled from closer to the end of a team’s season (median percentile, positive 98.8th, N = 77; negative 100th, N = 74; Mann–Whitney U = 2,225, P = 0.01, r = 0.22, 95% CI (−2.9, 0)) (Fig. 3b). We also identified the time at which the winning team had its lowest win probability. We compared these times with those from a null distribution of all games, which peaked early in the game and was skewed towards the end (Fig. 3c). For both positive and negative game memories, these times occurred later in the game than in the null distribution (median game time remaining, positive 1,364 s, N = 77; negative 1,364 s, N = 74; null 2,081 s, N = 22,539; positive versus null Mann–Whitney U = 9.9 × 10⁵, P < 0.001, r = 0.14, 95% CI (16, 381); negative versus null Mann–Whitney U = 1.1 × 10⁶, P < 0.001, r = 0.32, 95% CI (296, 839)), showing that the eventual winner reached its lowest chance of winning later in these games than is typical. There was also a trend for these low win probability time points to come later in the game for negative than for positive games (positive versus negative Mann–Whitney U = 3,331, P = 0.07, r = 0.17, 95% CI (0, 640)). Notably, these times for negative games were not heavily clustered around the final moments of the game, which suggests that negative game memories were not solely driven by games lost in the final seconds. Rather, the events leading to the loss appear to have been spread across many accumulated, albeit small, surprises over time.
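Locating the onset of a comeback amounts to finding when the eventual winner's win probability bottomed out. A hypothetical sketch, assuming aligned arrays of seconds remaining and home-team win probability; when the minimum occurs more than once, `np.argmin` returns the first occurrence, which here means the earliest time (an arbitrary tie-breaking choice made for illustration).

```python
import numpy as np


def time_of_winner_low(seconds_remaining, home_win_prob, home_won):
    """Seconds remaining when the eventual winner's win probability was lowest."""
    t = np.asarray(seconds_remaining, dtype=float)
    p = np.asarray(home_win_prob, dtype=float)
    winner_p = p if home_won else 1.0 - p  # orient probabilities to the winner
    return float(t[np.argmin(winner_p)])


clock = [2880, 2000, 1364, 600, 10]
# Hypothetical home win: the home team bottomed out with 1,364 s remaining.
home_low = time_of_winner_low(clock, [0.60, 0.50, 0.20, 0.70, 0.95], home_won=True)
# Hypothetical away win: the away team bottomed out with 2,000 s remaining.
away_low = time_of_winner_low(clock, [0.40, 0.90, 0.60, 0.30, 0.10], home_won=False)
```

Applying the same function to every game in the dataset would yield a null distribution of these times against which remembered games can be compared.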

We again asked whether game memories correlated with unsigned surprise, in this case surprise aggregated over multiple events spanning a timescale of minutes to hours. As in our play analyses, we compared full-game and within-game unsigned surprise values for positive and negative game memories against a null distribution of these values across all games (Fig. 3d, top). The null distribution was biased towards surprise values less than 0.5 (median 0.41), as one should expect if pregame probabilities hold predictive power. Full-game surprise for negative game memories was greater than that for the null distribution and for the positive game memories, whereas positive memories did not differ from the null (median full-game surprise, positive 0.41; negative 0.60; positive versus null Mann–Whitney U = 8.7 × 10⁵, P = 0.998, r = 0.0, 95% CI (−0.05, 0.05); negative versus null Mann–Whitney U = 5.5 × 10⁵, P < 0.001, r = 0.345, 95% CI (−0.18, −0.09); positive versus negative Mann–Whitney U = 1980, P = 0.001, r = 0.305, 95% CI (−0.2, −0.06)). To assess full-game surprise independently using a more established source, we also computed it from pregame Vegas betting odds and found similar results (median full-game surprise, null 0.42; positive versus null Mann–Whitney U = 6.8 × 10⁵, P = 0.86, r = 0.01, 95% CI (−0.06, 0.045); negative versus null Mann–Whitney U = 4.3 × 10⁵, P < 0.001, r = 0.35, 95% CI (−0.19, −0.09)). These results indicate that negative game memories preferentially came from games in which the participant’s preferred team was expected to win. Next, we compared within-game surprise for all games against that for positively and negatively remembered games. In the null distribution, these values were typically above 0.50 (mean 0.64, median 0.65) (Fig. 3d, bottom).
To see why, note that scoring in a basketball game is a stochastic process: a game between two equally strong teams resembles a contest of 100 coin flips in which heads wins if it comes up more often than tails. The eventual winner of such a contest is likely to have been trailing at some point, so within-game surprise averages more than 0.50. Positively remembered games did not differ from the null distribution of all games, whereas negatively remembered games had higher within-game surprise than both the null distribution and the positive memory distribution (median within-game surprise, positive 0.62; negative 0.84; positive versus null Mann–Whitney U = 8.3 × 10⁵, P = 0.556, r = 0.04, 95% CI (−0.07, 0.04); negative versus null Mann–Whitney U = 5.4 × 10⁵, P < 0.001, r = 0.35, 95% CI (−0.17, −0.08); positive versus negative Mann–Whitney U = 2,055, P = 0.003, r = 0.28, 95% CI (−0.19, −0.025)). These results indicate that, as for full-game surprise, the preferred team was leading by a larger-than-average margin at some point during negatively remembered games (that is, they blew the lead).
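The coin-flip intuition can be checked with a short simulation. As an assumption for illustration, the sketch uses 101 flips rather than 100 so that no game can end tied, computes the exact win probability from the binomial distribution of the remaining flips and records 1 minus the lowest probability the eventual winner ever had. All names are hypothetical.

```python
import random
from functools import lru_cache
from math import comb

N_FLIPS = 101  # odd number of flips, so no simulated game can end in a tie


@lru_cache(maxsize=None)
def heads_win_prob(lead, flips_left):
    """Exact P(heads finishes ahead) given the current heads-minus-tails lead.

    If k of the remaining flips are heads, the final lead is
    lead + 2k - flips_left; every sequence of remaining flips is equally likely.
    """
    favourable = sum(
        comb(flips_left, k)
        for k in range(flips_left + 1)
        if lead + 2 * k - flips_left > 0
    )
    return favourable / 2 ** flips_left


def within_game_surprise(rng, n_flips=N_FLIPS):
    """Simulate one fair game; return 1 minus the winner's lowest win probability."""
    lead = 0
    heads_trace = [heads_win_prob(0, n_flips)]  # pregame probability (0.5)
    for flips_done in range(1, n_flips + 1):
        lead += 1 if rng.random() < 0.5 else -1
        heads_trace.append(heads_win_prob(lead, n_flips - flips_done))
    winner_trace = heads_trace if lead > 0 else [1.0 - p for p in heads_trace]
    return 1.0 - min(winner_trace)


rng = random.Random(0)
surprises = [within_game_surprise(rng) for _ in range(1000)]
mean_surprise = sum(surprises) / len(surprises)
```

Because the pregame probability is exactly 0.5, every simulated value is at least 0.5, and the average lies above 0.5 because the eventual winner often trails at some point.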

We next examined an alternative hypothesis: because games are composed of plays, negative game memories could simply be driven by single plays rather than by the longer-term surprise of multiple accumulated events that are not particularly surprising on their own. The times at which the lowest win probability occurred were not clustered near the end of the game (when the largest surprises tend to occur) (Fig. 3c), which argues against single plays driving game memories. Nevertheless, to further address this concern, we found the most surprising play in each positive and negative game (Fig. 3a) and compared these values to a distribution of maximum play surprise values for all games. Because this null distribution contains only the maximum surprise value from each game rather than all surprise values, it is higher than the play surprise distribution (median 0.118). Both positive and negative maximum play surprise distributions were higher than the null distribution and did not differ from each other (median maximum play surprise, positive 0.134; negative 0.179; positive versus null Mann–Whitney U = 6.9 × 10⁵, P = 0.002, r = 0.20, 95% CI (−0.03, −0.007); negative versus null Mann–Whitney U = 5.8 × 10⁵, P < 0.001, r = 0.30, 95% CI (−0.05, −0.02); positive versus negative Mann–Whitney U = 2,660, P = 0.48, r = 0.07, 95% CI (−0.03, 0.01)) (Extended Data Fig. 6a), potentially leaving this as an alternative account of the full- and within-game surprise results. To rule out this account, we first resampled our data with replacement until we obtained 100 instances in which each maximum play surprise distribution did not differ from the null (P > 0.10) (Extended Data Fig. 6a). We then performed independent-samples t-tests on each instance to ask whether these subsets of games still showed differential levels of full- and within-game unsigned surprise relative to the null.
Negative game memories within these subsets had significantly higher full- and within-game unsigned surprise (Extended Data Fig. 6b) than the null distribution (see Extended Data Fig. 6c,d for distributions averaged across samples) (median statistics across 100 instances for full-game, N = 74; t = 6.0; P < 0.001; d = 0.70; 95% CI (−0.17, 0.08); medians for within-game, N = 74; t = 4.6; P < 0.001; d = 0.53; 95% CI (0.07, 0.15)). Positive game memories did not differ from the null (median statistics across 100 instances for full-game, N = 77; t = 0.05; P = 0.5; d = 0.005; 95% CI (−0.05, 0.05); medians for within-game, N = 77; t = 0.3; P = 0.49; d = 0.03; 95% CI (−0.06, 0.05)). Therefore, while large momentary surprises may partially contribute to memories for events aggregated across numerous moments such as games, they cannot fully explain them, because the effect of game surprise persisted after controlling for play surprise. This indicates that participants preferentially remembered negative games in which surprise aggregated over multiple events.
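The resampling control can be sketched as follows. The text specifies bootstrap resampling until 100 instances show no difference in maximum play surprise from the null (P > 0.10), followed by independent-samples t-tests; using a Mann–Whitney test for the matching step, the function name and the synthetic data below are all assumptions.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind


def matched_resamples(max_play, game_surprise, null_max_play, null_game_surprise,
                      n_instances=100, p_floor=0.10, seed=0, max_draws=100_000):
    """Bootstrap resamples whose maximum play surprise matches the null.

    A resample of games (drawn with replacement) is kept only if its maximum
    play surprise does not differ from the null (Mann-Whitney P > p_floor).
    For each kept resample, the same games' game-level surprise (full- or
    within-game) is compared with the null by an independent-samples t-test.
    Returns an array of shape (n_instances, 2) holding (t, P) per instance.
    """
    rng = np.random.default_rng(seed)
    max_play = np.asarray(max_play, dtype=float)
    game_surprise = np.asarray(game_surprise, dtype=float)
    results = []
    for _ in range(max_draws):
        idx = rng.integers(0, max_play.size, size=max_play.size)
        matched = mannwhitneyu(max_play[idx], null_max_play, alternative="two-sided")
        if matched.pvalue <= p_floor:
            continue  # still differs on play surprise; discard this resample
        t_res = ttest_ind(game_surprise[idx], null_game_surprise)
        results.append((t_res.statistic, t_res.pvalue))
        if len(results) == n_instances:
            return np.array(results)
    raise RuntimeError("too few matched resamples; consider relaxing p_floor")


# Synthetic illustration: remembered games whose maximum play surprise mirrors
# the null quantiles but whose game-level surprise is shifted upwards.
data_rng = np.random.default_rng(1)
null_max_play = data_rng.exponential(scale=0.12, size=5000)
null_game = data_rng.uniform(0.0, 1.0, size=5000)
games_max_play = np.quantile(null_max_play, np.linspace(0.01, 0.99, 74))
games_surprise = np.linspace(0.4, 1.0, 74)
instances = matched_resamples(games_max_play, games_surprise, null_max_play, null_game)
```

Summarizing the returned t and P values by their medians across instances would mirror how the statistics are reported above.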

We next asked whether positive and negative game memories were drawn from games in which participants’ preferred teams overperformed or underperformed. We again created a null distribution by considering full- and within-game surprise values as if a hypothetical fan cheered once for each team per game. Full-game signed surprise for positive and negative game memories was higher and lower than this null distribution, respectively (Fig. 3e, top) (median, positive 0.41; negative −0.54; null 0.0; positive versus null Mann–Whitney U = 9.9 × 10⁵, P < 0.001, r = 0.43, 95% CI (0.26, 0.52); negative versus null Mann–Whitney U = 2.6 × 10⁶, P < 0.001, r = 0.55, 95% CI (−0.58, −0.31); positive versus negative Mann–Whitney U = 5,245, P < 0.001, r = 0.84, 95% CI (0.81, 0.995)). We found similar results for within-game surprise (Fig. 3e, bottom) (median, positive 0.55; negative −0.74; null 0.0; positive versus null Mann–Whitney U = 5.3 × 10⁵, P < 0.001, r = 0.69, 95% CI (0.47, 0.68); negative versus null Mann–Whitney U = 2.9 × 10⁶, P < 0.001, r = 0.74, 95% CI (−0.74, −0.53); positive versus negative Mann–Whitney U = 5,188, P < 0.001, r = −0.82, 95% CI (1.2, 1.36)). Because positive game memories did not have higher unsigned full- and within-game surprise values than the null distribution, this analysis supports the idea that these games were good for the preferred team but not especially surprising overall. Conversely, negative game memories had higher unsigned surprise than expected, and these surprises came in games in which the preferred team underperformed expectations.

Finally, just as one might consider a spectacular play to be surprising in a way not captured by our win-probability-derived metric, we also independently rated the spectacularity of games against a hypothetical corpus of other games. We operationalized game spectacularity as (1) a routine game where nothing especially notable occurred, (2) a good solo performance or strong media narrative leading up to the game (for example, the final game between two historic players), (3) a top ten performance by a particular player in a given week or (4) a top ten performance by a player in a given year. An example positive game memory along these lines, reported by six participants, was from 23 January 2015, when Golden State Warriors player Klay Thompson broke a record by scoring 37 points in a single quarter. We found that spectacularity for games, like that for plays, was higher for positive than negative memories (positive, median 3, mean 2.8, N = 61; negative, median 2, mean 1.9, N = 74; Mann–Whitney U = 3.4 × 10⁴, P < 0.001, r = 0.49, 95% CI (1,1)) (Fig. 3f).
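Below is a sketch of a Mann–Whitney comparison for ordinal ratings such as spectacularity. Tied values receive average ranks, and Z comes from the tie-corrected normal approximation. The effect size is computed as r = |Z|/√N. This is a common convention, but it is assumed here rather than confirmed for these game statistics. The ratings are toy values, not study data.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U for x, Z, r = |Z| / sqrt(N)) from the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: each group of t tied values shrinks the variance.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return u, z, abs(z) / math.sqrt(n)


# Toy spectacularity ratings on the 1-4 scale (not study data).
positive = [3, 4, 3, 2, 4]
negative = [1, 2, 1, 2, 1]
u, z, r = mann_whitney(positive, negative)
print(u, round(z, 2), round(r, 2))  # 24.0 2.48 0.78
```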

Season memories had high surprise

To assess whether memories could be influenced by prediction errors across far longer intervals, such as months, we assessed positive and negative participant memories of entire seasons. This analysis involved a different approach: it used sports betting odds to model prediction rather than deriving predictions from play-by-play data. Betting odds are released at multiple time points throughout a season, and bettors can use them to place money on which team will win the championship. We used these odds to predict final season outcomes for all NBA teams and examined how positive and negative memories for seasons unfolded relative to these predictions (Fig. 4a). To assess final season outcomes, we devised a system that ordinally ranked teams according to how far they advanced through the season and postseason, and we broke ties using their regular season records (see Methods for more details). Under this system, the championship winner was assigned no. 1 and the team with the worst record was assigned no. 30 (Fig. 4b).
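The ordinal season ranking described above can be sketched as a two-key sort. Teams are ordered first by how far they advanced and then by regular-season record. The integer encoding of playoff stages and the field names are hypothetical. The Methods may also handle edge cases, such as identical records, differently; this sketch simply keeps input order for exact ties.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    team: str
    # Hypothetical stage encoding: 0 = missed playoffs, 1 = lost first round,
    # 2 = lost conference semifinals, 3 = lost conference finals,
    # 4 = lost the Finals, 5 = won the championship.
    stage: int
    win_pct: float  # regular-season winning percentage, used to break ties


def final_season_ranks(seasons):
    """Return {team: rank}, with rank 1 for the team that lasted longest."""
    ordered = sorted(seasons, key=lambda s: (-s.stage, -s.win_pct))
    return {s.team: rank for rank, s in enumerate(ordered, start=1)}


toy = [
    TeamSeason("A", 5, 0.60),  # champion
    TeamSeason("B", 4, 0.70),  # lost in the Finals
    TeamSeason("C", 0, 0.40),
    TeamSeason("D", 0, 0.20),  # worst record, so ranked last
    TeamSeason("E", 1, 0.55),
]
print(final_season_ranks(toy))  # {'A': 1, 'B': 2, 'E': 3, 'C': 4, 'D': 5}
```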

Fig. 4: Surprise within and across full seasons predicts positive and negative season memories.

a,b, To calculate surprise across seasons, we first found Vegas championship betting odds updated monthly and before and after each playoff round, plotted here for the 2018–2019 NBA champion Toronto Raptors (a). We then computed a metric that ranks teams by their success in the playoffs and their regular season record. b, This metric is shown for all teams at the end of the 2018–2019 season. c, For positive season memories, participants chose seasons in which their preferred team was ranked higher than the average ranking (dotted line) (N = 96; Wilcoxon signed-rank Z = 8.6, P < 0.001, r = 0.88, 95% CI (1.5, 2)), whereas negative seasons did not differ from average (N = 83; Wilcoxon signed-rank Z = 0.65, P = 0.52, r = 0.07, 95% CI (13.75, 16.75)). d–f, To assess how season memories may have differed on the basis of how outcomes deviated from predictions, we used linear regression to predict final season rankings from the following (top graphs): preseason championship probability (d), preseason ranking (e) and highest within-regular-season championship probability (f). Top plots show the regressions, whereas bottom plots show the residuals between final outcomes and expectations based on each regression. Positive and negative season memories were better and worse than these expectations, respectively, for each residual metric (positive: log preseason odds, Wilcoxon signed-rank Z = 8.4, P < 0.001, r = 0.86, 95% CI (−6.0, −4.7); preseason ranking, Wilcoxon signed-rank Z = 8.5, P < 0.001, r = 0.87, 95% CI (−5.7, −4.9); log highest within-season odds, Wilcoxon signed-rank Z = 4.9, P < 0.001, r = 0.50, 95% CI (−2.7, −1.3); negative: log preseason odds, Wilcoxon signed-rank Z = 2.4, P = 0.01, r = 0.27, 95% CI (0.38, 6.9); preseason ranking, Wilcoxon signed-rank Z = 2.8, P < 0.004, r = 0.305, 95% CI (1.1, 7.6); log highest within-season odds, Wilcoxon signed-rank Z = 5.1, P < 0.001, r = 0.56, 95% CI (2.2, 7.4)).

First, we asked whether final outcomes influenced memories independently of predictions. In this case, we used Wilcoxon signed-rank tests with the average final season ranking (15.5) as the hypothesized population value. We did not compare these results against the null distribution of a ranking system, which is a uniform distribution. We found that positive season memories had better rankings for the participant’s preferred team than the average season (median 1, mean 2.8, N = 96; Wilcoxon signed-rank Z = 8.6, P < 0.001, r = 0.88, 95% CI (1.5, 2)), whereas negative seasons did not differ from the average (median 16, mean 15.5, N = 83; Wilcoxon signed-rank Z = 0.65, P = 0.52, r = 0.07, 95% CI (13.75, 16.75)) (Fig. 4c). To assess the role of prediction in season memories, we first measured how well three variables linearly predicted final season outcomes: two constituting full-season surprise and one constituting within-season surprise. These variables were the logs of preseason championship probabilities, preseason rankings (which ranked teams ordinally according to the preseason probabilities) and the logs of the highest within-season championship probabilities (excluding the playoffs). All three metrics predicted final season outcome (Fig. 4d–f, top) (r² in predicting final rankings: log preseason odds, 0.37; preseason rankings, 0.43; log highest within-season probability, 0.47; all P < 0.001). Next, we used the residuals from these linear relationships to assess how positive and negative season memories measured up against what was predicted before and during the season.
For all three metrics, positive season memories were better than predicted (log preseason odds, median −4.9, mean −5.6, Wilcoxon signed-rank Z = 8.4, P < 0.001, r = 0.86, 95% CI (−6.0, −4.7); preseason ranking, median −4.9, mean −5.6, Wilcoxon signed-rank Z = 8.5, P < 0.001, r = 0.87, 95% CI (−5.7, −4.9); log highest within-season odds, median −1.8, mean −2.2, Wilcoxon signed-rank Z = 4.9, P < 0.001, r = 0.50, 95% CI (−2.7, −1.3)). Negative memories were worse than predicted (log preseason odds, median 1.3, mean 4.0, Wilcoxon signed-rank Z = 2.4, P = 0.01, r = 0.27, 95% CI (0.38, 6.9); preseason ranking, median 4.1, mean 4.4, Wilcoxon signed-rank Z = 2.8, P < 0.004, r = 0.305, 95% CI (1.1, 7.6); log highest within-season odds, median 1.9, mean 5.6, Wilcoxon signed-rank Z = 5.1, P < 0.001, r = 0.56, 95% CI (2.2, 7.4)) (Fig. 4d–f, bottom).
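The regression-and-residual step can be sketched with ordinary least squares on a single predictor. Because rank 1 is best, a negative residual (observed minus predicted rank) means a team finished better than expected. This matches the sign of the positive-memory residuals reported above. The toy probabilities and ranks are invented for illustration, and the use of natural logarithms is an assumption.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b, sxy ** 2 / (sxx * syy)


def rank_residuals(x, y, a, b):
    """Observed minus predicted final rank: negative = better than expected (rank 1 is best)."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Toy team seasons: preseason championship probability and final season rank.
preseason_prob = [0.30, 0.15, 0.05, 0.01, 0.001]
final_rank = [1, 3, 2, 5, 4]
log_prob = [math.log(p) for p in preseason_prob]

a, b, r2 = fit_line(log_prob, final_rank)
res = rank_residuals(log_prob, final_rank, a, b)
print(round(b, 3), round(r2, 3), [round(e, 2) for e in res])
```

In the analysis described here, the fit would use all team seasons. The residuals for the seasons named in memories would then be tested against zero.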

We next examined whether our results were driven by participants reporting championships won or lost by their favourite team, which may play a special role for sports fans beyond simply winning games63. We therefore reran these analyses excluding both teams that reached the Finals. In all cases, positive (log preseason odds, median −5.9, mean −6.0, one-sample t(14) = 3.6, P = 0.003, dz = 0.98, 95% CI (−9.5, −2.5); preseason ranking, median −4.5, mean −4.6, one-sample t(14) = 3.8, P = 0.002, dz = 1.02, 95% CI (−7.2, −2.0); log highest within-season odds, median −3.3, mean −4.9, one-sample t(14) = 3.5, P = 0.004, dz = 0.93, 95% CI (−8.0, −1.9)) and negative (log preseason odds, median 5.6, mean 8.2, Wilcoxon signed-rank Z = 5.0, N = 56, P < 0.001, r = 0.67, 95% CI (5.1, 11.7); preseason ranking, median 7.6, mean 9.0, Wilcoxon signed-rank Z = 5.4, P < 0.001, r = 0.72, 95% CI (6.3, 12.4); log highest within-season odds, median 5.7, mean 8.5, Wilcoxon signed-rank Z = 5.2, P < 0.001, r = 0.69, 95% CI (5.6, 11.5)) seasons were better and worse than expected, respectively.
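The Wilcoxon signed-rank tests in this section can be sketched with the normal approximation. The reported effect sizes appear consistent with r = |Z|/√N (for example, 8.6/√96 ≈ 0.88 and 0.65/√83 ≈ 0.07), so the sketch uses that convention. Two choices are assumptions: the variance carries no tie or continuity correction, and the input ranks are toy values.

```python
import math


def wilcoxon_signed_rank(x, mu0=0.0):
    """Normal-approximation Wilcoxon signed-rank test of x against mu0.

    Returns (W_plus, Z, r) with r = |Z| / sqrt(N), where N counts non-zero
    differences. Tied |differences| share average ranks; no tie or
    continuity correction is applied to the variance.
    """
    d = [xi - mu0 for xi in x if xi != mu0]
    n = len(d)
    order = sorted(range(n), key=lambda k: abs(d[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(rank for rank, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, z, abs(z) / math.sqrt(n)


# Toy final rankings for positive season memories, tested against the average rank 15.5.
toy_ranks = [1, 1, 2, 1, 4, 1, 3, 20]
w_plus, z, r = wilcoxon_signed_rank(toy_ranks, mu0=15.5)
print(w_plus, round(z, 2), round(r, 2))  # 1.0 -2.38 0.84

# The reported effect size is consistent with r = Z / sqrt(N):
print(round(8.6 / math.sqrt(96), 2))  # 0.88
```

Integer ranks never equal 15.5, so no differences are dropped here and N equals the sample size.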

Next, just as numerous plays compose a game, numerous plays and games compose a season. We therefore asked whether these data could be alternatively explained by especially surprising plays or games. For this analysis, we found the largest play surprise, full-game surprise and within-game surprise values pooled across the entire season. We did this for each team in the corpus and for the teams represented by positive and negative season memories. Maximum play unsigned surprise across the season did not differ between the null distribution of all team seasons and those from positive and negative season memories (median, positive 0.6, negative 0.595, null 0.58; positive versus null Mann–Whitney U = 2.5 × 10⁴, P = 0.72, r = 0.02, 95% CI (−0.05, 0.03); negative versus null Mann–Whitney U = 2.2 × 10⁴, P = 0.33, r = 0.07, 95% CI (−0.06, 0.02)). Maximum full-game unsigned surprise values were higher for both positive and negative season memories than the null distribution (median, positive 0.86, negative 0.85, null 0.83; positive versus null Mann–Whitney U = 1.6 × 10⁴, P < 0.001, r = 0.34, 95% CI (0.01, 0.04); negative versus null Mann–Whitney U = 2.8 × 10⁴, P < 0.001, r = 0.30, 95% CI (0.005, 0.02)). Maximum within-game unsigned surprise values were lower for both positive and negative season memories than the null distribution (median, positive 0.97, negative 0.96, null 0.97; positive versus null Mann–Whitney U = 1.9 × 10⁴, P < 0.001, r = 0.24, 95% CI (−0.01, −0.004); negative versus null Mann–Whitney U = 1.5 × 10⁴, P < 0.001, r = 0.29, 95% CI (−0.02, 0.005)). This may have arisen because participants generally chose seasons from teams that performed at the extremes of team strength (Fig. 4c), and therefore the potential for larger surprise values was higher.
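Pooling the largest surprise per team season, as described above, amounts to a group-by maximum. The row format is hypothetical. In the analysis, the rows would come from the play-by-play corpus, with one row per play (for play surprise) or per game (for full- and within-game surprise).

```python
from collections import defaultdict


def season_max_surprise(rows):
    """rows: (team, season, unsigned_surprise) tuples, one per play or per game.

    Returns {(team, season): largest unsigned surprise in that season}.
    """
    best = defaultdict(float)  # unsigned surprise is >= 0, so 0.0 is a safe floor
    for team, season, surprise in rows:
        key = (team, season)
        best[key] = max(best[key], surprise)
    return dict(best)


rows = [
    ("TOR", "2018-19", 0.42),
    ("TOR", "2018-19", 0.91),
    ("GSW", "2018-19", 0.67),
]
print(season_max_surprise(rows))
# {('TOR', '2018-19'): 0.91, ('GSW', '2018-19'): 0.67}
```

The maxima for seasons named in memories would then be compared against the maxima for all team seasons, for example with a Mann–Whitney U test.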
Nevertheless, after resampling 100 subsets of these distributions such that they no longer differed from the null distribution, we still found that positive and negative season memories were statistically significantly better and worse than predicted, respectively (median statistics across 100 instances controlling for positive full-game surprise, N = 32; t = 4.8; P < 0.001; dz = 0.6; 95% CI (−3.1, −2.1); medians controlling for negative full-game surprise, N = 55; t = 2.7; P = 0.01; dz = 0.36; 95% CI (0.9, 5.9); medians controlling for positive within-game surprise, N = 64; t = 11.2; P < 0.001; dz = 1.4; 95% CI (−6.8, −4.8); medians controlling for negative within-game surprise, N = 55; t = 3.0; P = 0.003; dz = 0.41; 95% CI (1.4, 6.8)) (Extended Data Fig. 7). Therefore, surprise aggregated across a full season could not be explained by a single play or game.
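The resampling control can be sketched as rejection sampling, in four steps:

1. Repeatedly draw random subsets of memory-linked seasons.
2. Keep only subsets whose covariate (for example, maximum full-game surprise) no longer exceeds the null.
3. Test the outcome on each kept subset.
4. Report the median statistic across 100 instances.

The acceptance rule used here, that the subset median is not above the null median, is a hypothetical stand-in for "no longer differed from the null distribution". The helper names and toy data are likewise invented.

```python
import math
import random
import statistics


def one_sample_t(values):
    """t statistic for the mean of `values` against zero."""
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(len(values)))


def matched_subset(items, null_median, subset_size, rng, max_tries=10_000):
    """Rejection-sample a subset whose covariate median does not exceed the null median.

    The median criterion is a hypothetical stand-in for a formal
    'no longer differs from the null' test.
    """
    for _ in range(max_tries):
        subset = rng.sample(items, subset_size)
        if statistics.median(cov for cov, _ in subset) <= null_median:
            return subset
    raise RuntimeError("no matched subset found; relax the criterion or subset size")


def median_t_across_instances(items, null_median, subset_size, n_instances=100, seed=0):
    """Median one-sample t of the outcome across `n_instances` matched subsets.

    items: (covariate, outcome) pairs, e.g. (max full-game surprise, season rank residual).
    """
    rng = random.Random(seed)
    ts = [
        one_sample_t([out for _, out in matched_subset(items, null_median, subset_size, rng)])
        for _ in range(n_instances)
    ]
    return statistics.median(ts)


# Toy data: covariates 0.80-0.89, residuals from -7 to -3 (better than predicted).
items = [(0.80 + 0.01 * (i % 10), -3.0 - (i % 5)) for i in range(60)]
print(median_t_across_instances(items, null_median=0.845, subset_size=30))
```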

Finally, just as with game and play memories, we independently assessed the spectacularity of reported seasons. We operationalized the season spectacularity of a team as (1) a typical season where nothing especially notable occurred, (2) a season in which a remarkable pre- or mid-season event occurred, such as a top-25 player leaving or joining the team, (3) a season in which the team was the most discussed across the NBA during the year, such as the 2010–2011 Miami Heat or (4) an all-time, historic season in terms of media hype or narrative, such as Michael Jordan’s last season with the Chicago Bulls. In contrast to games and plays, we found that season spectacularity did not differ for positive and negative memories (positive median 2, mean 1.93, N = 96; negative median 2, mean 1.97, N = 83; Mann–Whitney U = 3.9 × 10⁴, P = 0.77, r = 0.02, 95% CI (0,0)).

Altogether, participants reported positive and negative season memories, which involved numerous accumulated games and plays over the course of months, that substantially exceeded and fell below long-term predictions, respectively. These findings cannot simply be explained by considering championship teams or surprises on a shorter timescale, such as especially surprising plays or games. These results were captured pointedly by one participant annotating their own memory of a relatively, although not objectively, positive season for their favourite team: “The [Sacramento] Kings have been a bad team ever since I can remember, so my emotions correlating to my favourite team have more to do with my experience than with the performance of the team”.

Discussion

We found that participants’ reported memories correlated with events in which surprise unfolded on the scale of seconds, hours and months. Fans enjoy good outcomes and dislike bad outcomes for their teams relative to expectations, just as humans show pleasure or displeasure relative to expectations in a range of domains92,93,94,95. One might therefore expect participants to recall disproportionately high positive and negative signed surprises. However, reported memories also had disproportionately high unsigned surprises, suggesting that reported memories were not simply due to fan preference. Individually surprising moments (in this case, individual plays) were well-remembered, which replicates previous results on momentary surprise1,2,3,7,11,96 and extends them into real-world domains64,65. This extension is important because extending ideas from laboratory-based experiments in reinforcement learning into real-world contexts has been a central challenge97. Laboratory experiments consist of numerous repetitions of nearly identical trials, whereas real-world situations may not be precisely repeated. Moreover, surprising games and seasons, which involved surprise accumulated over multiple subevents and longer timescales, were also well-remembered. Importantly, surprises at these longer scales could not be explained by particularly large surprises at shorter scales (plays in the case of games; plays and games in the case of seasons).

We predicted that participants would preferentially remember games with high full-game or within-game surprise, which relate to the common terms of ‘upsets’ and ‘comebacks’, respectively. However, this prediction was only supported for negative game memories. We offer three possible reasons for this. First, when reporting positive games compared to negative games, participants relied more heavily on spectacularity, an alternative form of surprise. This may have led them to de-emphasize win probability-based forms of surprise as a heuristic in reporting their memories. Although this may mask a positive result, future studies with more leading questions may show the importance of this metric (for example, ‘What was your most positive and surprising sports memory for an individual game?’). Second, participants often chose games in which their team won a championship (35 out of 77 positive games), which is a primary goal of sports63. Teams that win the championship are often already favoured to win the final game owing to their superiority. Because they are expected to outscore the other team throughout the game, both full- and within-game surprise would be reduced. From a surprise perspective, this result may be puzzling, as it is unclear why such an incremental increase in championship likelihood, even to 100%, should be so positive. We speculate that the jump from a near-certainty to a certainty—and all of the surrounding accolades, bragging rights and victory parades that follow, perhaps after years or decades of unfulfilled fandom—is positive enough and possibly reflects a strong enough reward magnitude that fans value these games in spite of the previously high expectations. Factoring the subjective reward magnitude of winning a championship into an expected value of magnitude times probability may account for this preference98.

A final possibility relates to the phenomenon of ‘wishful thinking’ on the part of sports fans: sports fans of a preferred team often provide biased game reports99,100 and overestimate their likelihood of winning61,101 (for this effect in other contexts, see refs. 56,60,102,103,104). We estimated prediction using our objective win probability model, but wishful thinking could bias this prediction. If the subjective prediction were higher than the objective one for the preferred team, the positive surprise for preferred outcomes would be dampened and the negative surprise would be exacerbated. This would lead to greater disappointment for sports fans of the losing team105,106,107 and could be partially responsible for the asymmetries we observed in game memories. This possibility also suggests that future studies should collect prediction data during real-world experiences to verify how deviations between subjective and objective estimates alter emotions and memory.

In spite of the overall findings consistent with surprise being linked to better long-term memories, many negative season memories exceeded preseason predictions but corresponded to seasons in which the preferred team lost the championship (season rank no. 2 in Fig. 4c). These ‘near-misses’108,109 have been reported as aversive110,111,112 and increase with a higher perceived likelihood of the preferred outcome113. Yet, it is notable that many championship losses were also regarded as positive season memories. We speculate that this may come down to the perspective (or timescale) one takes on such a season, how one realistically values alternatives114, and the specific value one places on championships versus other outcomes63. In other words, it may rely on whether it was enjoyable to see a team do so well from the vantage point of the beginning of the season or agonizing to see them fail after being so close to being champions.

To explain the long-term surprise findings in this paper, we argue that sports fans must hold some lingering trace of their predictions from previous time points, which is compared against much later outcomes. This presents an interesting puzzle for how these memory biases might arise computationally. Although speculative, consider that an experienced sports fan learns the value of game states in a way that resembles a reinforcement learning agent trained via temporal difference learning (hereafter, TD-RL agent). That is, when presented with a successive series of states before some final reward outcome, a TD-RL agent learns to predict the value of each state on the basis of its history of reward probabilities and magnitudes41,115. Similarly, experienced sports fans can learn to associate game states with win probabilities7. Next, consider the following scenarios. In one scenario, the agent’s preferred team has a 98% win probability with 10 min remaining in a game, and in another scenario, they have a 10% win probability. After this, both cases converge, in that the win probability gradually decreases or increases to 30% with 1 min remaining. Then, the preferred team loses by the same score in both scenarios. In either scenario, the experienced sports fan, like the TD-RL agent, will have updated their state probabilities (to 30%) by the 1-min time point, and therefore the final negative signed surprise is equal in both cases. Additionally, with the same final score, the difference between the scenarios cannot be explained by the overall accumulation of positive and negative events. Therefore, for long-term surprise to affect memory, we speculate that the sports fan must not simply update their predictions in a memoryless116,117 and rapid manner but instead must keep traces of predictions beyond the immediate moment, such as from the 10-min mark in these scenarios.
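The two scenarios above reduce to simple arithmetic. A predictor that keeps only its latest estimate assigns the same surprise to both losses (0 − 0.30 = −0.30). A predictor that still holds the 10-min prediction assigns −0.98 in the favoured case and −0.10 in the underdog case. The sketch below makes this explicit. `final_surprise` and the indexing of a stored prediction are illustrative, not a model fitted in this study.

```python
def final_surprise(outcome, win_probs, reference_index=-1):
    """Signed surprise of the final outcome (1 = win, 0 = loss) against one stored prediction.

    reference_index=-1 uses only the latest prediction, like a memoryless,
    fully updated predictor; reference_index=0 instead compares against the
    earliest stored prediction (a hypothetical 'lingering trace').
    """
    return outcome - win_probs[reference_index]


# Win probabilities with 10 min and 1 min remaining, from the two scenarios.
favoured_then_slipped = [0.98, 0.30]
underdog_then_improved = [0.10, 0.30]
LOSS = 0

print(final_surprise(LOSS, favoured_then_slipped), final_surprise(LOSS, underdog_then_improved))
# -0.3 -0.3
print(final_surprise(LOSS, favoured_then_slipped, 0), final_surprise(LOSS, underdog_then_improved, 0))
# -0.98 -0.1
```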

It is intriguing to consider which neural processes could produce such long-term surprise effects. In sporting events, performance differences accrue over time to decide a winner. This arrangement resembles many laboratory-based evidence accumulation tasks featuring a series of stimuli (for example, clicks, visual flashes, visual motion) that unfold over time and create a series of states that later become mapped non-linearly onto some outcome variable (for example, win or lose, left or right reward trial)118,119. Recent findings suggest evidence accumulates and affects behaviour across multiple events and at multiple timescales in regions across the cortex120,121,122,123,124,125. Therefore, a future study could, for example, examine whether midbrain or cortical areas respond differentially to well-controlled trials that manipulate the long-term, multi-event surprise, such as by manipulating the overall architecture of the trial (for example, more right clicks at the beginning, creating a 98% probability of a right reward trial, versus more left clicks, creating a 10% probability) while controlling for the overall amount of evidence accrued for each possibility.
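The proposed trial manipulation can be sketched as two click trains with identical totals but opposite early structure. Each train has 7 right and 6 left clicks, so the final evidence favours right in both. The click strings are hypothetical. Mapping evidence to reward probabilities such as 98% or 10% would require a model that is not specified here.

```python
def cumulative_evidence(clicks):
    """Running right-minus-left click count for a string of 'R'/'L' clicks."""
    total, trace = 0, []
    for click in clicks:
        total += 1 if click == "R" else -1
        trace.append(total)
    return trace


early_right = "RRRRRR" + "LLLLLL" + "R"  # early evidence favours right
early_left = "LLLLLL" + "RRRRRR" + "R"   # early evidence favours left

for name, train in [("early_right", early_right), ("early_left", early_left)]:
    trace = cumulative_evidence(train)
    # name, total right clicks, total left clicks, evidence after 6 clicks, final evidence
    print(name, train.count("R"), train.count("L"), trace[5], trace[-1])
# early_right 7 6 6 1
# early_left 7 6 -6 1
```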

Another relevant neural effect follows from findings that neurons and neuronal areas integrate evidence at different timescales126,127,128,129,130,131. In the above 98% versus 10% scenario, a long timescale representation could be formed with 10 min remaining and, because it is not immediately updated due to its long timescale, surprise is higher in the 98% than the 10% case when the final result unfolds. However, whether neural integration—which has generally been studied on the scale of seconds to minutes—operates on scales of months (as in the season data here) remains unclear.
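The timescale argument can be illustrated with leaky integrators fed the two win-probability time courses. A short time constant tracks the latest value, so both traces end near 0.30. A long time constant still carries the early 98% or 10% estimate when the game ends. The step-based update rule, the toy time courses and the time constants are assumptions chosen for illustration, not fitted neural parameters.

```python
def leaky_integrate(signal, tau):
    """Leaky (exponential) integration; each step moves 1/tau of the way to the input."""
    estimate = signal[0]
    trace = []
    for x in signal:
        estimate += (x - estimate) / tau
        trace.append(estimate)
    return trace


favoured = [0.98] * 10 + [0.30] * 10  # toy win-probability time courses
underdog = [0.10] * 10 + [0.30] * 10

for tau in (1, 20):
    fav = leaky_integrate(favoured, tau)[-1]
    und = leaky_integrate(underdog, tau)[-1]
    print(tau, round(fav, 3), round(und, 3))
# 1 0.3 0.3
# 20 0.707 0.18
```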

The dopamine system is also relevant to consider. Dopamine neurons respond with respect to deviations from predictions, such that better outcomes are associated with higher responses and worse outcomes with lower responses43,132. Additionally, recent studies have shown that dopaminergic regions also respond to unsigned prediction errors7,14,15 and changes in beliefs133,134. Dopamine neuron responses are modulated by the reward histories of multiple past trials, indicating that they can integrate information beyond the immediately preceding moment to create predictions135. However, this integration is distinct from how a momentary response might differ on the basis of multiple accumulated events in the past. Additionally, it has been shown that dopamine neuron responses reflect updated predictions on a nearly moment-by-moment basis, in line with temporal difference learning136. As noted above, an overtrained TD-RL agent would make optimal predictions but would not respond differently to different histories that lead to the same momentary state. Therefore, it is unclear whether dopamine would respond differently in the 98% versus 10% cases in the final moments and whether it could play a role in these memory effects.

The ability to form long-term memories for arbitrary, non-repeated events is most intimately linked with the function of the hippocampus137,138. Regarding the computational approach to this problem, some recent artificial intelligence models have included episodic memory modules that simulate the hippocampus by saving contextual attributes of the current state in an ongoing fashion for later reference95,139,140,141,142. Here, a reasonable solution to how long-term surprise could influence memory is by maintaining previous predictions by continuously saving them and comparing them against later outcomes. A possible future test of this idea is whether a hippocampal amnesiac patient would differ from controls in having a smaller bias towards long-term surprising information but relatively intact memory for momentary surprising information.

One limitation here is that we asked participants to report only a single positive and negative memory at each timescale. We therefore showed that surprise correlates with memory accessibility but not necessarily its availability75,143. Probing all or a larger subset of basketball memories could determine whether surprise affects availability as well. Antony et al.7 asked basketball fans to freely recall basketball game clips and found that surprise also drove memory for individual plays when participants were encouraged to respond with everything they could remember. Therefore, this concern has been partially addressed for momentary surprise in laboratory settings, but future studies could examine whether surprise plays a role by probing a broader swath of memories64,65. Another limitation concerns the context surrounding surprising events. In some cases, participants spontaneously provided the surrounding context for plays, games or seasons (Extended Data Table 1), but we did not rigorously examine whether information before or after surprising events is similarly well-remembered. In laboratory studies, salient events (such as surprise) boost surrounding memories in some models24,144,145, whereas in others they impair them146 or have no effect3,96 (see ref. 147 for a short review). It would be worthwhile to examine this more rigorously for these real-world contexts148. Finally, the main limitation in this study is that there was no experimental manipulation, meaning that findings were correlational and causal inferences cannot be directly drawn.

The greatest factor in determining whether an event is surprising is the high-level context149,150 or schema151,152. This factor guides predictions of what should be expected in a particular situation, and therefore surprise varies across domains153. Indeed, the broader literature on surprise includes domains as wide-ranging as music154, language51, politics64,65 and narratives55,151,155,156. In this paper, we used the term ‘surprise’ to refer to changes in game or championship probability. Our data indicate that memories also correlated with other variables, such as seeing spectacular feats of athleticism (which we assess in our spectacularity measure and argue provides another form of surprise). Even so, who wins a game or championship may be the most relevant variable for sports fans. If the primary task or context differed, for example if participants were instead asked to count the number of passes or listen carefully to the commentary, we would expect other factors to drive memory. Just as a routine play (for example, a layup) occurring at a critical moment would be considered highly surprising in our model, an everyday word would be surprising in a particular context (for example, ‘Tornado wind speeds reach over 100 dogs per hour’).

The outcomes of sports contests can be instrumental to one’s sense of identity67 and mood68, influencing neural substrates heavily linked with emotion7,91,157 and physiological factors such as testosterone158 and creating lasting memories69. In fact, our participants often noted that both positive and negative outcomes were among the best or worst things that happened to them in a given month or year (and in a few cases, even a lifetime) (Extended Data Fig. 8). Dovetailing with a previous report showing that surprise influences a host of factors related to memory and emotion7, these findings underscore the importance of initial predictions in determining how events are remembered in the real world. Our approach of probing memories over multiple timescales and events shows that long-term surprises correlate with memory in a manner similar to momentary surprise, widening the door for more behavioural and neural questions investigating their underpinnings and highlighting an overlooked factor in studies of prediction and memory.

Methods

Participants

This study was reviewed and approved by the California Polytechnic State University, San Luis Obispo Institutional Review Board as IRB no. 2020-068, and all participants gave informed consent to take part. Undergraduate participants (N = 122, 34 female, 18–28 years old, mean 19.5) were recruited at California Polytechnic State University, San Luis Obispo (Cal Poly). This school is located along the central coast of California and therefore conveniently draws students who grew up as fans from at least four major NBA fan bases (Los Angeles Lakers, Los Angeles Clippers, Sacramento Kings and the San Francisco Bay Area-based Golden State Warriors). Participants took the study for course credit among a number of alternative studies. All participants considered themselves basketball fans and attested to having seen or played in more than 50 basketball games.

Questionnaire

Before the study, participants were screened as basketball fans. At the beginning of the study, they were told they would be responding with autobiographical positive and negative memories of plays, games and seasons. We specifically asked them to recall everything from memory and not to look up any information. Each prompt began with an open-ended question about the memory, for example, ‘As a FAN, what is your MOST POSITIVE memory of a SINGLE PLAY in a college or professional basketball game?’ We asked them to describe in as much specific detail as possible what happened on the court (the play, game or season itself), any contextual information about when it occurred, such as plays, games or seasons leading up to the chosen memory, and then information about their experience, such as where they were, how they watched it or tended to watch it and who they were with or tended to watch it with. To ensure we had as many identifying details as possible for plays and games and to provide for as rich an account as possible given the multidimensional nature of memories159, we separately asked numerous clarifying follow-up questions. We asked about the teams involved and when the memories occurred, asking participants to be as specific as possible without looking anything up. We then asked about fandom, including the team for which they were cheering, how big of a fan they were (1, almost indifferent; 2, preferred team in that sport; 3, favourite team in that sport and 4, favourite team in any sport), whether they were also specifically cheering against the other team (or teams) and how much they disliked that team or teams (1, indifferent or almost indifferent; 2, generally root against the team; 3, least favourite team in that sport and 4, least favourite team in any sport). Next, we asked about the emotional impact of the memory, including to rate it with respect to other memories in their life. 
They rated each memory on an 11-point scale (−5 to +5), as the worst or best thing that happened ever or in a given year, month, week or day or that it was a neutral memory (rating of 0). They also estimated how long the play affected them on a 5-point scale, with 1–4 indicating minutes, hours, days and months, and 5 indicating that they were still affected. Finally, in light of data suggesting that positive sports memories may be revisited more often than negative ones160,161 and the general fact that rehearsing autobiographical information improves memory for it162, we also asked how many times they estimated they re-watched the play (for play memory), the full game or parts of the game (for game memory) or parts of the season (for season memory). Note that for each of our analyses, we used memories from our sample participants as representative of basketball memories in the full population.
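The two rating scales can be written down as small lookup structures. The duration labels follow the text directly. The assignment of ±1 to ±5 to day, week, month, year and ever is an assumption inferred from the order in which the options are listed, and the function name is hypothetical.

```python
# Duration scale, as described in the text.
DURATION = {1: "minutes", 2: "hours", 3: "days", 4: "months", 5: "still affected"}

# Assumed magnitude-to-scope mapping for the 11-point (-5 to +5) impact scale.
IMPACT_SCOPE = {1: "day", 2: "week", 3: "month", 4: "year", 5: "ever"}


def describe_impact(rating: int) -> str:
    """Translate an impact rating into its (assumed) verbal anchor."""
    if rating not in range(-5, 6):
        raise ValueError("impact rating must be an integer from -5 to +5")
    if rating == 0:
        return "neutral memory"
    kind = "best" if rating > 0 else "worst"
    scope = IMPACT_SCOPE[abs(rating)]
    if scope == "ever":
        return f"{kind} thing that ever happened"
    return f"{kind} thing that happened in a given {scope}"


print(describe_impact(-3))  # worst thing that happened in a given month
print(describe_impact(5))   # best thing that ever happened
```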

After responding to each of these questions, we also collected data on participants’ favourite and least favourite men’s and women’s National Collegiate Athletic Association (NCAA) basketball teams, favourite and least favourite NBA and Women’s National Basketball Association teams, their favourite sport, their favourite team in any sport, an estimation of the number of basketball games they had watched (0–50, 50–200, 200–1,000, 1,000+), whether they had played basketball (in an organized fashion, only recreationally (for example, pickup games), only a little (for example, gym class) or never) and if so, how long they had played.

Participants were given the option to respond as fans of any major college or professional American sports league (men’s and women’s NCAA, NBA and Women’s National Basketball Association). While we originally anticipated a more even breakdown of college and professional responses, they skewed heavily towards the NBA in every major category: of the 122 responses, we were able to identify 70 as positive NBA plays, 77 positive games, 96 positive seasons, 74 negative plays, 74 negative games and 83 negative seasons. Therefore, we chose to focus our analyses on the NBA for simplicity and to avoid under-powering other analyses, although surprise drives memory in men’s NCAA basketball7 and possibly other leagues.

NBA play-by-play data

Play-by-play data from 17 seasons (2004–2005 to 2020–2021) were first retrieved from an NBA API (https://github.com/swar/nba_api). These data included information about each game, including the date of the game, the teams involved, quarter and game time updates, score changes and descriptions of each play. We pared down the data to game events involving a change in context, which included scores, rebounds, turnovers and tips from jump balls. In total, these data included more than 5.6 million plays from more than 22,000 games. We focused on these 17 seasons because nearly all responses were from this time period and they occurred after the NBA expansion from 29 to 30 teams, simplifying the analyses. All geographical moves by teams (for example, the Seattle SuperSonics moving to become the Oklahoma City Thunder) were reconfigured to use the current team location and NBA three-letter code.
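The paring-down and relocation steps can be sketched as a filter plus a code lookup. The event labels and the `pare_down` helper are hypothetical, since the raw API uses its own event codes. Only the relocation named in the text is filled in: Seattle to Oklahoma City, assumed here to be coded SEA and OKC.

```python
# Hypothetical event labels standing in for the API's own event codes.
CONTEXT_CHANGING = {"score", "rebound", "turnover", "jump_ball_tip"}

# Historical -> current team codes; only the relocation named in the text is listed.
RELOCATIONS = {"SEA": "OKC"}


def pare_down(plays):
    """Keep context-changing events and relabel relocated teams with current codes."""
    kept = []
    for play in plays:
        if play["event"] not in CONTEXT_CHANGING:
            continue
        kept.append(dict(play, team=RELOCATIONS.get(play["team"], play["team"])))
    return kept


plays = [
    {"event": "score", "team": "SEA"},
    {"event": "foul", "team": "BOS"},
    {"event": "rebound", "team": "BOS"},
]
print(pare_down(plays))
# [{'event': 'score', 'team': 'OKC'}, {'event': 'rebound', 'team': 'BOS'}]
```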

Win probability model

For the main play and game data analyses, we operationalized predictions by creating a win probability metric based on four factors from the play-by-play data: the score difference between the two teams (positive when the home team was winning and negative when it was losing), the relative strength of the teams, the amount of game time remaining and the team in possession of the ball7. In practice, the first two factors were combined into a single ‘expected score difference’ by the end of the game. Using publicly available data (https://www.basketball-reference.com), we computed for each team and season the average number of points scored minus the average number of points allowed; the difference between the two teams’ net values indicated their relative strength over an entire game (see ref. 163 for a similar method using betting odds). This measure does not capture fluctuations in team strength within a season, whether due to injuries, player arrivals and departures or other factors. To address this limitation, we separately showed that surprise based on publicly available pregame betting odds also influenced memory (see below). We then divided the difference in team strength by the total number of seconds per game (to estimate how much the stronger team should outscore the weaker team, in points per second), multiplied it by the number of seconds left in the game and added this projected margin to the current score difference, with both oriented from the home team’s perspective. For instance, if in one season the Milwaukee Bucks scored an average of 106 points and allowed an average of 101 points (+5 net) and the Boston Celtics scored an average of 98 points and allowed an average of 97 points (+1 net), the Bucks would be expected to outscore the Celtics by 4 points over an entire game. The expected score difference at the beginning of the game would therefore be 4 points in the Bucks’ favour. If the Celtics led by 1 at halftime, the Bucks would nonetheless be expected to outscore them by 2 points over the second half (a 4-point per-game advantage × 50% of the game remaining), so the expected score difference would be 1 point in the Bucks’ favour.
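
The expected score difference can be written as a short function. This sketch assumes the home-minus-away orientation described above, a 2,880-s regulation game and hypothetical argument names; it reproduces the Bucks–Celtics example if the Bucks are treated as the home team.

```python
SECONDS_PER_GAME = 48 * 60  # 2,880 s of regulation play


def expected_score_diff(current_diff, home_net, away_net, seconds_left):
    """Home-minus-away score difference projected to the end of the game.

    current_diff: current home-minus-away score.
    home_net, away_net: season-average points scored minus points allowed.
    """
    strength_diff = home_net - away_net  # expected home margin over a full game
    # Scale the full-game margin by the fraction of the game still to be played.
    projected_margin = strength_diff * seconds_left / SECONDS_PER_GAME
    return current_diff + projected_margin


# Worked example from the text, with the Bucks (+5 net) as the home team.
print(expected_score_diff(0, 5, 1, 2880))   # tip-off: 4.0
print(expected_score_diff(-1, 5, 1, 1440))  # Celtics up 1 at halftime: 1.0
```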

To create the model, we divided possible game states into a three-dimensional grid: the expected home-minus-away score difference, from −30 to 30; the number of seconds remaining in the game, in 6-s increments (for example, 2,880–2,874 s, 2,874–2,868 s) until the final 6 s, after which we used 2-s increments (6–4 s, 4–2 s and 2–0 s); and whether the home team possessed the ball (1 or 0). Game states from overtime periods were treated identically to fourth-quarter states with the same number of seconds remaining163. We smoothed the data in the temporal dimension but not in the score dimension. For most time points, we pooled three adjacent time increments (for example, 2,874–2,856 s); to avoid edge effects, we pooled only the first two increments for the first time point (2,880–2,868 s). We used finer increments near the end of the game because of the rapid swings in context that occur in NBA games with very little time remaining. Accordingly, we did not temporally smooth the data for the final time bin (2–0 s). To compute win probabilities, we found all historical game states matching each cell and ‘peeked ahead’ to see how often the home team went on to win. Probabilities for states with an expected score difference beyond ±30 points were set to 0.998 (home team ahead) or 0.002 (home team behind). This model therefore resembles a ‘look-up table’ rather than an algorithm such as logistic regression; we chose this approach because of non-linearities in the state space163. For example, with the home team in possession and 5 s remaining, win probabilities for home-team score differences from −4 to +4 points are 0.018, 0.097, 0.197, 0.36, 0.652, 0.876, 0.938, 0.983 and 0.998. Furthermore, the win probability benefit of possessing the ball is not uniform across score differences and game times but peaks when the score is close near the end of the game (Extended Data Fig. 2)163. To validate this approach, we trained linear and logistic regression models on the same factors; both yielded higher win prediction error, defined as the mean difference between all predictions and outcomes (linear regression 0.355; logistic regression 0.316; look-up method 0.304).
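
A minimal sketch of the look-up table follows, assuming NumPy and hypothetical input arrays with one entry per historical game state, each labelled with whether the home team eventually won. Several details are assumptions rather than the authors’ specification: expected score differences are rounded to the nearest integer, overtime clocks are mapped onto fourth-quarter seconds before the call, and temporal smoothing pools each bin with its immediate neighbours, including across the boundary between the 6-s and 2-s bins.

```python
import numpy as np

MAX_DIFF = 30  # expected score differences beyond +/-30 are clamped
# Ascending time-bin edges (seconds remaining): 2-s bins for the final 6 s,
# then 6-s bins up to 2,880 s. Overtime is assumed to be pre-mapped onto this clock.
EDGES = np.array([0, 2, 4] + list(range(6, 2881, 6)))
N_TIME = len(EDGES) - 1


def time_index(seconds_left):
    """Time-bin index: 0 is the final 2-0 s bin, N_TIME - 1 is tip-off."""
    idx = np.searchsorted(EDGES, seconds_left, side="right") - 1
    return np.clip(idx, 0, N_TIME - 1)


def build_table(exp_diff, seconds_left, home_poss, home_won):
    """Empirical home win probability for every (score, time, possession) cell."""
    d = np.rint(np.asarray(exp_diff, dtype=float)).astype(int)
    keep = np.abs(d) <= MAX_DIFF
    cells = (d[keep] + MAX_DIFF,
             time_index(np.asarray(seconds_left)[keep]),
             np.asarray(home_poss, dtype=int)[keep])
    shape = (2 * MAX_DIFF + 1, N_TIME, 2)
    wins, totals = np.zeros(shape), np.zeros(shape)
    np.add.at(wins, cells, np.asarray(home_won, dtype=float)[keep])  # 'peek ahead'
    np.add.at(totals, cells, 1.0)
    # Temporal smoothing: pool counts from each bin and its neighbours (two bins
    # at the tip-off edge); the final 2-0 s bin (index 0) stays unsmoothed.
    sm_w, sm_n = wins.copy(), totals.copy()
    for j in range(1, N_TIME):
        hi = min(j + 1, N_TIME - 1)
        sm_w[:, j] = wins[:, j - 1:hi + 1].sum(axis=1)
        sm_n[:, j] = totals[:, j - 1:hi + 1].sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sm_w / sm_n  # NaN where no historical state matched


def win_prob(table, exp_diff, seconds_left, home_poss):
    """Look up the home win probability, clamping extreme score differences."""
    d = int(round(exp_diff))
    if d > MAX_DIFF:
        return 0.998
    if d < -MAX_DIFF:
        return 0.002
    return float(table[d + MAX_DIFF, time_index(seconds_left), int(home_poss)])
```

Pooling win and game counts across neighbouring bins, rather than averaging per-bin probabilities, weights each bin by how many historical states it contains; this is also a design choice of the sketch.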

Note that our emphasis is not on the process by which participants learn the value (win probability) of game states, as a reinforcement learning agent would. Rather, we start from the assumption that participants have predictive mental models that approximate the true win probabilities, as previously validated7. A rare, surprising event, such as a team coming back to win after having only a 1% win probability, does not mean that the prediction was wrong if the same game state occurred 99 other times in which the team lost.

Aligning participant responses with NBA play-by-play data

To obtain win probabilities, and subsequently surprise values, for play memories using our algorithm, we logged participant responses by game date, teams involved, quarter and seconds remaining. We then searched the play-by-play database for the win probability values immediately before and after the chosen play, using the number of seconds remaining when the play ended. In many cases, multiple plays occurred within a single second (for example, free throws, during which several plays elapse with no time expiring, or shots taken with only a fraction of a second left on the game clock). In these cases, we manually parsed the data to find the correct moment.
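
The matching step can be sketched as a filter over play records. The record fields (`date`, `home`, `away`, `quarter`, `clock_seconds`) and the team codes in the usage example are hypothetical. When more than one play matches, as with free throws, the sketch returns every candidate rather than choosing one, leaving the final choice to the manual disambiguation described above.

```python
from datetime import date


def find_candidate_plays(plays, game_date, teams, quarter, clock_seconds):
    """Return all plays matching a participant's reported moment.

    More than one match means several plays share the same clock second,
    which still has to be resolved by inspecting the play descriptions.
    """
    return [
        play for play in plays
        if play["date"] == game_date
        and {play["home"], play["away"]} == set(teams)
        and play["quarter"] == quarter
        and play["clock_seconds"] == clock_seconds
    ]


# Two free throws at the same clock second both come back as candidates.
plays = [
    {"date": date(2010, 1, 5), "home": "HOM", "away": "AWY", "quarter": 4,
     "clock_seconds": 10, "desc": "free throw 1 of 2"},
    {"date": date(2010, 1, 5), "home": "HOM", "away": "AWY", "quarter": 4,
     "clock_seconds": 10, "desc": "free throw 2 of 2"},
    {"date": date(2010, 1, 5), "home": "HOM", "away": "AWY", "quarter": 4,
     "clock_seconds": 50, "desc": "rebound"},
]
matches = find_candidate_plays(plays, date(2010, 1, 5), ("AWY", "HOM"), 4, 10)
```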

To find win probabilities for game memories, we logged participant responses by the game date and teams involved. To find championship probabilities, we logged participant responses by the season and team involved.

Surprise metrics

Numerous ways of quantifying surprise have been devised (for a systematic review, see ref. 153). All surprise constructs used here were probabilistic point estimates of how outcomes differed from previous predictions. To compute play surprise, we took the change in win probability from immediately before to immediately after the chosen play. These values were oriented as positive for the home team and sign-flipped if the participant supported the away team, so that positive signed surprise always favoured the participant’s preferred team. For unsigned play surprise, we used the absolute values. To externally validate our play surprise metric, we looked up the signed change in win probability (signed surprise) for the same game context on an expert analyst’s site (https://www.inpredictable.com) (Extended Data Fig. 3). This expert used many of the same factors to determine win probability (https://www.inpredictable.com/2015/02/updated-nba-win-probability-calculator.html), including game time remaining, score, possession and team strength (although team strength was based on pregame Vegas odds). The expert also treated the end of the game specially, abandoning a logistic regression model in favour of a decision tree, which resembles our look-up table approach. We compared our surprise metrics with the expert’s using Pearson correlations.
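
Play surprise and its comparison with the expert values can be sketched as below, assuming NumPy. The function and argument names are illustrative, and `np.corrcoef` stands in for whichever Pearson routine was actually used.

```python
import numpy as np


def play_surprise(wp_before, wp_after, fan_of_home):
    """Signed and unsigned surprise for one play.

    wp_before, wp_after: home win probability just before and after the play.
    Signed surprise is re-oriented so positive values favour the fan's team.
    """
    signed = wp_after - wp_before  # home-oriented change in win probability
    if not fan_of_home:
        signed = -signed
    return signed, abs(signed)


def agreement(ours, expert):
    """Pearson correlation between two sets of signed surprise values."""
    return float(np.corrcoef(ours, expert)[0, 1])
```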

To compute full-game surprise with our algorithm, we took the difference between the pregame win probability (with 2,880 s remaining) and the final outcome (1 or 0). To compute within-game surprise, we first found the lowest in-game win probability of the eventual winner and computed the difference between this value and the outcome. For instance, if the home team won after having a pregame win probability of 0.54 and a lowest win probability of 0.22 in the third quarter, full-game surprise would be 1 − 0.54 = 0.46 and within-game surprise would be 1 − 0.22 = 0.78. As with play surprise, both values were oriented as positive for the home team and flipped if the participant was cheering for the away team. We also computed unsigned versions of these metrics using absolute values. To externally validate our full- and within-game surprise values, we similarly obtained the pregame win probability and the minimum win probability of the winning team from the same expert analyst’s site (Extended Data Fig. 5). As a further measure of external validity, we also obtained pregame Vegas betting odds for each game. These odds are posted according to a convention such that, if they are above zero, the implied win probability is

$$\frac{100}{100+{\mathrm{odds}}}$$

and if they are below zero, the implied win probability is

$$\frac{\left|{\mathrm{odds}}\right|}{100+\left|{\mathrm{odds}}\right|}.$$

From these probabilities, we computed an alternative measure of full-game surprise. This approach has the advantage that win probabilities are updated before every game rather than once per season, capturing within-season fluctuations in team strength, but the disadvantage that they come from a closed, undisclosed algorithm. Nevertheless, full-game surprise values from our algorithm and from the betting odds were highly correlated for the chosen games (r = 0.79 for positive and r = 0.89 for negative memories) (Extended Data Fig. 5), and both measures of full-game surprise produced qualitatively similar results.
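
The odds conversion and the two game-level surprise measures can be sketched together. The function names and the representation of a game as a list of home win probabilities are assumptions. Expressing the winner’s lowest probability on the home scale (the home maximum when the away team wins, because the away team’s win probability is one minus the home team’s) is our reading of the definition above.

```python
def implied_prob(odds):
    """Implied win probability from posted odds, per the two formulas above."""
    if odds > 0:
        return 100 / (100 + odds)
    if odds < 0:
        return abs(odds) / (100 + abs(odds))
    raise ValueError("odds of exactly 0 are undefined in this convention")


def game_surprise(pregame_home_wp, home_wp_series, home_won, fan_of_home=True):
    """Signed full-game and within-game surprise, oriented towards the fan's team."""
    outcome = 1.0 if home_won else 0.0
    full = outcome - pregame_home_wp
    # Winner's lowest win probability on the home scale: the home minimum if the
    # home team won, otherwise the home maximum (= 1 - away team's minimum).
    extreme = min(home_wp_series) if home_won else max(home_wp_series)
    within = outcome - extreme
    sign = 1.0 if fan_of_home else -1.0
    return sign * full, sign * within
```

Applied to the example in the text (pregame 0.54, lowest 0.22, home win), `game_surprise` returns values of approximately 0.46 and 0.78; unsigned versions are the absolute values of the two outputs.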

To compute season surprise, we created a final season ranking metric as the outcome variable. It ranked teams by how far they advanced through the season and playoffs, with ties split by regular season record. NBA champions were ranked no. 1, runners-up no. 2, conference finalists no. 3–4 (ordered by regular season record, with the better record receiving the higher ranking), conference semifinalists no. 5–8 and so on. If two teams advanced equally far and had the same regular season record, their rankings were averaged (for example, if the no. 3 and no. 4 teams had the same record, each received a ranking of 3.5). This metric is more fine-grained than one analogous to game outcomes, in which winners are assigned 1 and losers 0; at the season level, that would give the champion a 1 and every other team a 0. An alternative metric would be regular season record; however, (1) many fans would consider it unsatisfactory because the regular season does not determine the champion, and (2) to prevent injuries in the playoffs, many NBA teams in recent seasons have questioned the value of maximizing their regular season record, preferring to play well enough to position themselves for a playoff run while managing player workloads164,165. This metric also resembles how other major sports leagues determine the order in which teams draft new players (https://operations.nfl.com/journey-to-the-nfl/the-nfl-draft/the-rules-of-the-draft/).
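
The ranking rule, including averaged rankings for exact ties, can be sketched as below. The numeric encoding of playoff progress and the use of regular season win totals are hypothetical choices for illustration.

```python
def season_rankings(teams):
    """Final season ranking; teams with identical keys share their average rank.

    teams maps a name to (rounds_reached, regular_season_wins). rounds_reached is
    a hypothetical encoding in which larger means the team advanced further
    (e.g. 5 = champion, 4 = runner-up, 3 = conference finalist, ...).
    """
    # Further playoff progress first, then better regular season record.
    order = sorted(teams, key=lambda t: (-teams[t][0], -teams[t][1]))
    ranks, i = {}, 0
    while i < len(order):
        j = i
        # Extend the block while the next team has an identical (round, wins) key.
        while j + 1 < len(order) and teams[order[j + 1]] == teams[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks
```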

We used three measures to model participants’ predictions at the beginning of and throughout the season. Each relied on the Vegas championship odds for each team and season, as they were updated throughout the year. Before 2009, these odds were updated only before the season and before each round of the playoffs. After 2009, they were updated monthly before and during the season and sometimes during the mid-season All-Star break. We converted these odds to probabilities in the same way as the pregame betting odds above. Our first predictor was the final preseason odds. Our second predictor was the same, except that teams were ranked ordinally by their preseason odds (so that predictor and outcome used the same ordinal metric). Our third predictor was each team’s best odds across all regular season updates. We used linear regression to find the best fit between each predictor and the final season ranking metric, using the logarithm of the first and third predictors because doing so produced a more linear fit. We then used the residuals from these regression lines to determine whether teams did better or worse than expected given this predictive relationship.
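
The residual-based season surprise can be sketched with an ordinary least-squares line from NumPy. This assumes that the predictor is the implied championship probability computed from the odds (so that its logarithm is defined) and uses hypothetical argument names.

```python
import numpy as np


def season_surprise(predictor, final_rank, log_predictor=True):
    """Residuals of a linear fit of final season ranking on a predictor.

    Because no. 1 is the best ranking, a negative residual means a team
    finished better than predicted and a positive one means it finished worse.
    """
    x = np.asarray(predictor, dtype=float)
    if log_predictor:
        x = np.log(x)  # first and third predictors are log-transformed
    y = np.asarray(final_rank, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # least-squares straight line
    return y - (intercept + slope * x)
```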

Spectacularity metrics

We devised a spectacularity scale from 1 to 4 to assess spectacular or otherwise notable and rare plays, games and seasons. See the main text for descriptions of each rubric.

Statistical analyses

Our comparisons fell into three main types. First, we compared attributes of positive and negative reported memories against null distributions from the NBA play-by-play corpus and betting odds. When a shorter-timescale form of surprise also differed from the null distribution for a given memory type (for example, maximum play surprise for game memories), we resampled the data with replacement 100 times to find resamples in which the shorter-timescale effect was no longer statistically significant, re-tested the effect of interest in those resamples and reported the median statistical values across them. Second, we directly compared attributes of positive and negative reported memories with each other. For each test, we first conducted skewness and kurtosis tests of normality. When these tests indicated a violation of normality, we used two-sided Mann–Whitney U-tests for independent-sample comparisons and two-sided Wilcoxon signed-rank tests for one-sample comparisons; otherwise, we used paired or unpaired t-tests. Third, we compared distributions of positive and negative responses on ordinal non-surprise attributes, such as spectacularity, using two-sided Mann–Whitney U-tests.
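
The decision rule for independent-sample comparisons can be sketched with SciPy. As an assumption, `scipy.stats.normaltest` (the D’Agostino–Pearson test, which combines skewness and kurtosis tests) stands in for the separate normality tests, with an illustrative alpha of 0.05. The one-sample and paired branches would follow the same pattern with `stats.wilcoxon` and `stats.ttest_1samp` or `stats.ttest_rel`.

```python
import numpy as np
from scipy import stats


def compare_independent(a, b, alpha=0.05):
    """Two-sided comparison of two independent samples.

    Uses an unpaired t-test if both samples pass the normality screen,
    otherwise a Mann-Whitney U-test. Returns (test name, p-value).
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    normal = all(stats.normaltest(x).pvalue >= alpha for x in (a, b))
    if normal:
        return "t-test", float(stats.ttest_ind(a, b).pvalue)
    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "mann-whitney", float(result.pvalue)
```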

The analyses in this study were not preregistered.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.