Introduction

A central theme in the study of decision-making is the limited-capacity nature of human information processing1. A classic example of choice behavior shaped jointly by presumed processing limitations and the demands of the decision environment is the context-dependent nature of preferences: the relative value of an option depends not only on the option in question but also on the other options in the choice set, or context2,3,4. This sort of context-dependent valuation explains a number of interesting patterns of choice both in humans5,6,7,8,9 and animals10.

Context-dependent—rather than stable—valuation famously challenges classical theories of rational choice11, which posit that value is a static quantity independent of the context, such that preference between two options should not depend on contextual factors like the other available options. At the same time, relative (versus absolute) valuation could be understood, computationally, as a rescaling of reward values relative to the other values present in the context, allowing an individual’s representation of value to adjust to the current value context5. Indeed, recent theoretical treatments have argued that contextual—versus absolute—value representations permit a decision-maker to discriminate between two low-value options in one context and, at the same time, two high-value options in another context12,13.

Context effects have been observed both in real-world choices—with respect to the temporal history of information viewed, or past choices faced by decision-makers14,15—and in tightly constrained, small-sample demonstrations of “independence of irrelevant alternatives” (IIA) violations16,17. While a body of work has examined context effects arising from choice set composition—probing how the distribution of option values faced by the decision-maker systematically alters choices and subjective valuations of options—limited work has investigated whether these sorts of context effects hold in real-world settings18. Unlike tightly-controlled laboratory settings, the choice sets facing real-world decision-makers are arbitrarily large and composed of options that may be valued along multiple dimensions, each learned via rich histories of direct experience. Beyond the possible practical implications of any real-world context effects—which hold sizable economic consequences both for consumers and retail businesses—understanding how these real-world context effects unfold is important for understanding the boundary conditions of these well-studied laboratory effects. For example, it is possible that these sorts of context effects do not scale to more naturalistic settings, where individuals may resort to heuristics (e.g., ‘taking the best’). Supporting this view, more naturalistic presentation of options has been shown to reduce apparent suboptimalities in choice19.

More generally, our understanding of context effects in choice might be inherently limited by the artificial, albeit well-constrained, nature of the tasks employed in laboratory choice studies20,21,22,23. Accordingly, we reasoned that we could leverage large-scale datasets that reflect real-world choices in diverse settings to glean novel insights about how context effects play out ‘in the wild’24,25, thus directly evaluating whether these effects are meaningful and general26. Here we examine whether context effects can be observed in actual choices with real-world consequences—for both the decision-maker and the restaurant27,28—at a scale and breadth sufficient to permit detection of context effects while averaging over potential option- and subject-specific effects.

We demonstrate the generality of context-dependent valuation—both in choices and self-reported valuation—by analyzing a massive real-world restaurant rating dataset. Here we examine user behavior on Yelp, a web-based application, particularly popular in North America, that maintains crowd-sourced reviews of local businesses. Our analysis focused specifically on restaurants because, in most cases, people visit one restaurant per meal—ensuring that options effectively compete with each other—and because the categories of restaurants are relatively well-defined in our dataset. With the presumption that each Yelp user’s rating evidences both (1) a decision to visit the rated restaurant over some competing set of options29 and (2) the user’s subjective valuation of their consummatory experience30, this rating data permits large-scale examination of contextual effects on both inferred choices and valuations. In our analysis, we view each restaurant decision as one made from a “choice set”—defined as similar, geographically proximate restaurants—with the goal of choosing the restaurant with the highest utility31,32.

Our large-scale real-world analysis of ratings addresses two empirical questions. First, what is the impact of the local value context upon choice efficiency—namely, do Yelp users make fewer ratings-maximizing choices in sets with higher overall ratings? Based on laboratory results indicating that choice contexts with more valuable ‘distractor’ items (i.e., options lower in value than the highest-rated options in the set) decrease individuals’ propensity to choose the highest-valued item20,22, we hypothesized that we would see fewer choices of the highest-rated option in higher-valued contexts (but see Gluth et al.21 and Webb et al.33 for discussions of other possibly coexisting effects). One mechanism underpinning such a choice context effect would be if options were selected on the basis of expected values that are themselves context-dependent. To shed light on these intermediate computations, we performed a second, laboratory-based study, testing the hypothesis that if decision-makers’ subjective valuations are context-dependent, then their ratings of the expected quality of a given restaurant should also exhibit context dependence in accordance with the value distribution of the choice set, even in the absence of explicit choice. Taken together, we find evidence for context-dependent decision-making and valuation in real-world behavior (manifesting both in real-world restaurant choices and ratings) and, furthermore, laboratory-based evidence suggesting that these context effects in choice and subjective valuation are closely related.

Study 1: Real-world choice and rating data

Methods

Data sources and selection of users and restaurants

Real-world data analysis focused on two data sources in the Yelp Open Dataset (obtained from https://www.yelp.com/dataset): (1) reviews of businesses, in which each record consists of a unique user ID, the unique ID of the business reviewed, the date and time of the review, the user’s given star rating (ranging from 1 to 5 in whole stars), and the text of the user’s review; and (2) data describing each business, consisting of unique business IDs, business names, GPS coordinates (latitude and longitude), and restaurant categories. We note that the reviews in this dataset reflect the output of Yelp’s recommendation software, which filters out suspect or biased reviews on the basis of quality, reliability, and user activity patterns.

We examined only restaurants in predominantly English-speaking US and Canadian Metropolitan Statistical Areas (MSAs) with at least 1000 restaurants, leaving eight MSAs in the analysis. Six were in the US: Charlotte-Concord-Gastonia NC-SC, Cleveland-Elyria OH, Las Vegas-Henderson-Paradise NV, Madison WI, Phoenix-Mesa-Scottsdale AZ, and Pittsburgh PA; two were in Canada: Calgary AB and Toronto ON. We only examined choices pertaining to restaurant businesses—that is, all business entries containing the category “Restaurants” in the Yelp data, which totaled 59,315 restaurants in the US and Canada. Each business was mapped to an MSA using the US Census Bureau’s Core-Based Statistical Areas (in the US) or Statistics Canada’s Census Metropolitan Areas (in Canada). We examined user reviews posted between January 1, 2012 and November 14, 2018 (the temporal end of the Yelp dataset).

Calculation of restaurant categories and neighborhood clusters

As businesses in the Yelp dataset are assigned multiple, unordered categories (e.g., “Restaurants, Italian, Pizza”), we inferred the single category that best described each restaurant. For each restaurant, we tabulated the occurrences of each Yelp-assigned category term in the text of the user reviews associated with that restaurant, and used the most frequently-occurring category term (e.g., “Pizza”). This ensured that each restaurant occurred in at most one possible consideration set—defined by restaurant category and neighborhood—preventing a restaurant from occurring in multiple consideration sets (e.g., both “Pizza” and “Italian”).
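To make the procedure concrete, here is a minimal sketch of this majority-vote inference in Python (function and variable names are ours, for illustration only; the dataset’s actual schema may differ):

```python
from collections import Counter

def infer_primary_category(categories, review_texts):
    """Return the Yelp-assigned category term that appears most often
    in a restaurant's review texts (e.g., 'Pizza' for a business
    tagged 'Restaurants, Italian, Pizza')."""
    counts = Counter()
    for text in review_texts:
        lowered = text.lower()
        for category in categories:
            counts[category] += lowered.count(category.lower())
    best, n_occurrences = counts.most_common(1)[0] if counts else (None, 0)
    # Fall back to the first listed category if no term ever appears.
    return best if n_occurrences > 0 else categories[0]

# Hypothetical usage:
# primary = infer_primary_category(["Italian", "Pizza"], texts_for_business)
```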

To define spatial neighborhoods within each MSA—which, alongside the restaurant categories, jointly defined the restaurant consideration sets—we employed the DBSCAN algorithm34, a density-based spatial clustering algorithm that permitted us to define “clusters” based on users’ restaurant visit patterns, allowing neighborhoods of arbitrary shape. The DBSCAN algorithm, implemented in the scikit-learn package (https://scikit-learn.org) for Python35, took as input the GPS coordinates of each restaurant review (e.g., 1,191,427 data points in the Phoenix-Mesa-Scottsdale, AZ MSA) and two parameters: a search radius (in decimal degrees) and a minimum number of data points within the search radius. Because the restaurant density and overall size of the MSAs considered vary considerably, the search radius and the minimum number of points were tuned separately for each MSA by performing nearest-neighbor analysis on the restaurants’ GPS coordinates and the total number of restaurants in each MSA, following previous work36. Across all MSAs, the average number of restaurants per neighborhood (i.e., cluster), irrespective of category, was 11.389 (SD = 1.346; see Table S1 and Fig. S1 for statistics describing the resultant category and neighborhood sizes).
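For illustration, the clustering step might look like the following sketch; the eps and min_samples values shown are placeholders, not the per-MSA tuned parameters described above:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# One (latitude, longitude) row, in decimal degrees, per restaurant
# review in a given MSA (toy coordinates shown here).
coords = np.array([
    [33.4484, -112.0740],
    [33.4500, -112.0755],
    [33.6054, -111.8867],
])

# eps is the search radius in decimal degrees; min_samples is the
# minimum number of points within that radius. Both would be tuned
# per MSA via nearest-neighbor analysis.
clustering = DBSCAN(eps=0.005, min_samples=2).fit(coords)

# Cluster labels index neighborhoods; the label -1 marks noise points
# falling outside any dense region.
neighborhood_labels = clustering.labels_
```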

Selection of user data

We only examined the choice and review behavior of users with at least 100 reviews in the Yelp dataset, yielding 4591 unique users. Within each user’s series of reviews, we only considered restaurants in that user’s most frequently-visited (i.e., most frequently rated) MSA, where we assume users have the most knowledge of the restaurant options. The resultant distribution of users per MSA was as follows: Phoenix-Mesa-Scottsdale AZ: 29.84%, Las Vegas-Henderson-Paradise NV: 28.33%, Toronto ON: 21.26%, Charlotte-Concord-Gastonia NC-SC: 7.382%, Pittsburgh PA: 4.849%, Cleveland-Elyria OH: 4.324%, Calgary AB: 2.078%, and Madison WI: 1.935%.

Calculation of choice sets and choice and rating measures

For every restaurant that a user reviewed, we analyzed that restaurant in the context of the restaurant’s category and its inferred geographic cluster—which, together, define a choice set. Our analysis only considered choices made within sets composed of 3 or more restaurants with at least 3 unique ratings (the context effects of interest hold when we consider all possible choice sets), resulting in 53,118 choices examined in the final analysis. For each option in the choice consideration set, we computed the mean star rating of that option on the date of the user’s review—that is, considering reviews up to, but not including, that review, to capture the user’s possible informational state at the time of choice—rounded to the nearest half-star, as seen by Yelp users27. Our analyses also took into consideration the number of ratings for the highest-rated option in the set and the price (expressed as the number of dollar signs, ranging from 1 to 4) of the highest-rated option in the set. Following previous work21,22, we computed the probability of choosing the (possibly non-unique) highest-rated restaurant in the choice set as a function of the mean star rating of the choice set, computed over these rounded star ratings.
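A sketch of these per-review computations follows; the column names (business_id, stars, date) are assumptions for illustration, not the dataset’s actual schema, and the rounding rule approximates what Yelp displays:

```python
import numpy as np
import pandas as pd

def half_star(mean_ratings):
    """Round mean ratings to the nearest half star, approximating what
    Yelp displays to users (exact tie-breaking may differ)."""
    return np.round(mean_ratings * 2) / 2

def choice_measures(set_reviews, user_review):
    """For one user review (the presumed choice), compute the mean star
    rating of the choice set and whether the user chose a highest-rated
    option, using only reviews posted before the user's review."""
    prior = set_reviews[set_reviews["date"] < user_review["date"]]
    displayed = half_star(prior.groupby("business_id")["stars"].mean())
    chose_target = displayed.get(user_review["business_id"]) == displayed.max()
    return displayed.mean(), bool(chose_target)
```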

In our analyses of post-choice ratings, the Rating Deviation measure was computed as the difference between the user’s rating and the mean rating of the chosen restaurant prior to that user’s rating. To ensure that user ratings measured pure context effects, the average star rating of the choice set (the abscissa in Fig. 4B) omitted the chosen (i.e., rated) option. The residuals plotted in Figs. 4B and 6A,B are estimated from the mixed-effects regression models reported in Tables 2 and 3, omitting the mean star rating of the choice set as a predictor variable.
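These two measures reduce to a simple difference and a leave-one-out mean; a sketch, continuing the assumed schema above:

```python
def rating_deviation(user_stars, prior_mean_of_chosen):
    """User's own star rating minus the chosen restaurant's mean rating
    over reviews left before this one (the quantity in Fig. 4A)."""
    return user_stars - prior_mean_of_chosen

def context_excluding_chosen(displayed_ratings, chosen_id):
    """Mean star rating of the choice set omitting the rated option
    (the abscissa in Fig. 4B). `displayed_ratings` is a pandas Series
    of half-star ratings indexed by business ID."""
    return displayed_ratings.drop(chosen_id).mean()
```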

Choice replication dataset: Yelp check-ins

We also analyzed the patterns of check-ins—time-stamped, geolocated records of a user having been at a particular business—as a proxy for visits to restaurants in the Yelp dataset. Using the Yelp mobile app, a user can only “check into” a business (an event shared with their Yelp social network, i.e., their “friends”) if their current GPS coordinates are proximate to the coordinates of that business. Unlike ratings, the aggregate number of check-ins to a business is not visible to other users, which mitigates the possibility that check-ins could be used to bias the rating distribution or popularity of a particular restaurant.

The Yelp dataset contains a list of timestamped check-ins for each business (check-ins per restaurant M = 8503; note that these are not linked to specific users), which we analyzed in the same manner as the ratings-inferred choices. For the 565,661 check-ins considered in the analysis, which spanned the years 2014–2016, we determined the surrounding choice set for each check-in using the same neighborhood and category definitions as in the ratings-based choice analysis, and tabulated whether the check-in was made to the highest-rated option in that choice set. We then examined whether the mean star rating of the choice set influenced the probability of checking into the highest-rated option.

Choice replication dataset: Deliveroo orders

In our second conceptual replication dataset, we analyzed the choice patterns of customers ordering food on Deliveroo, an online food delivery service popular in several countries outside of North America31. Our dataset consisted of 1,613,968 food orders, placed by 195,333 users ordering from 30,552 restaurants of 180 cuisine types across 197 cities, spanning February 2018 to March 2018. Similar to Yelp, the dataset included each restaurant’s star rating (1–5 stars), the number of previous ratings for the restaurant at the time of the order, and the price of the restaurant. Here, we defined choice sets as unique combinations of city and cuisine type (e.g., “Mexican” in Coventry, UK). We filtered out choice sets with fewer than 5 restaurants, leaving 821 unique choice sets in total. We calculated the probability of a user ordering from the highest-rated restaurant, separately for each choice set, over the course of the two months. Mirroring our analysis of the Yelp datasets, we then examined whether the mean star rating of each choice set influenced the probability of ordering from the highest-rated restaurant in the set.

Inferential statistics

In both the real-world and laboratory data, we estimated mixed-effects regression models (logistic regression in models predicting choices) via Markov chain Monte Carlo with uninformative priors, using the MCMCglmm package (version 2.32; https://cran.r-project.org/web/packages/MCMCglmm/index.html) for R37. In our regression models, all fixed effects were also taken as random effects over users (or participants); the resultant fixed-effect estimates are reported in the regression tables. In all applicable models, we log-transformed the length of the text review, the number of reviews for the chosen option, and the number of options in the current choice set to mitigate the skew apparent in these variables.
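As a rough Python analogue of such a model (the analyses themselves used MCMCglmm in R), one could fit a Bayesian mixed-effects logistic regression with statsmodels. The column names and toy data below are illustrative assumptions, and random slopes are omitted for brevity:

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# Toy stand-in for the real choice table (assumed schema).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "chose_target": rng.integers(0, 2, n),        # 1 = chose highest-rated
    "mean_set_rating": rng.uniform(2.0, 5.0, n),  # the context variable
    "log_n_options": np.log(rng.integers(3, 12, n)),
    "log_n_reviews_top": np.log(rng.integers(1, 500, n)),
    "price_top": rng.integers(1, 5, n),
    "user_id": rng.integers(0, 50, n),
})

model = BinomialBayesMixedGLM.from_formula(
    "chose_target ~ mean_set_rating + log_n_options"
    " + log_n_reviews_top + price_top",
    vc_formulas={"user": "0 + C(user_id)"},  # random intercepts over users
    data=df,
)
result = model.fit_vb()  # variational Bayes approximation
print(result.summary())
```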

Results

We examined signatures of context-dependent choice and subjective valuation using a dataset from Yelp, a website and smartphone app popular in North America, on which users review local businesses such as restaurants and retail stores. Each rating is made on a scale of one to five stars, with more stars indicating a more favorable evaluation. For every Yelp user considered, we analyzed each user-provided restaurant rating—which, following previous work29,31, we took as a proxy for a user’s choice to visit that restaurant at the time of the review—with respect to the competing restaurant options in that neighborhood (i.e., the choice set).

This choice set was defined as all restaurants belonging to the chosen restaurant’s category within the same neighborhood (e.g., “Pizza” in a particular neighborhood in Pittsburgh). These geographic neighborhoods were defined by a density-based spatial clustering algorithm, taking all user reviews as input (Fig. 1A; see “Methods” section). Thus, the 8 cities considered contained multiple geographically-defined choice sets for each restaurant category, and these sets varied considerably in their rating distributions (Fig. 1B,C). Taking this approach, we analyzed each user’s choices (N = 53,118 choices) in the context of similar restaurants that were geographically proximate38,39. Critically, this analysis considered each option’s average rating at the time of the user’s presumed choice—rounded to the nearest half-star, as presented to users on Yelp—in order to capture the decision-maker’s information state at the time of restaurant selection.

Figure 1

(A) Example of spatially-defined choice sets for Pizza restaurants in Pittsburgh, as computed by the density-based clustering algorithm. Each choice set is denoted by a unique color. (B) Distributions of option ratings for two example choice sets in the Yelp dataset, corresponding to the orange and purple clusters on the map. Notably, the mean star rating (red dashed line) differs across these two choice sets. (C) Distribution of the mean star rating of the choice set, across all choice sets analyzed. (D) Proportion of ratings-maximizing (or ‘target’) choices as a function of the difference in ratings between the highest-rated and second-highest-rated option.

Context effects in choices

We first sought to establish that users’ choices were sensitive to option ratings, as well as the ratings of alternatives in the choice set. Consistent with the idea that our operationalization of choice and definition of choice sets capture a value-comparison process, we observed that the likelihood that a user chooses the highest-rated option (e.g. a 4.5-star option) increased as a function of the difference between the highest-rated option and the second-highest-rated option (e.g., a 4-star option) in the choice set (Fig. 1D; logistic regression β = 0.5182, 95% CI 0.4145–0.6149, p < 0.0001; Table S2).

We next examined whether the value context—i.e., the star-rating composition of the choice set—affected individuals’ propensity to choose the highest-rated restaurant. We found that as the mean star rating of the choice set increased, choices of the highest-rated option (i.e., the target) decreased markedly (Fig. 2A, logistic regression β = −0.0846, 95% CI −0.1425, −0.0181, p = 0.002; Table 1). In other words, the likelihood that a user chose the best-rated target depended strongly upon the rating composition of the choice set, with the “target” choice becoming less likely as the quality of the other options in the set increased, demonstrating a pronounced context effect4,22. Intuitively, higher-rated targets were more likely to be chosen29, and in choice sets with more options, the choice of the highest-rated option was less likely (log number of options β = −1.0066, 95% CI −1.0776, −0.9454, p < 0.0001), reflecting the well-documented set-size effect (Webb et al.33). Importantly, our analysis controlled for the number of reviews40 and the price of the highest-rated option31 by taking them as covariates in the regression. The context effect remained significant (β = −0.1335, 95% CI −0.1924, −0.0761, p < 0.0001) even when controlling for the price of the second-highest-rated option, for which we found that more expensive second-highest-rated options predicted less likely choices of the highest-rated option (β = −0.1793, 95% CI −0.2246, −0.1447, p < 0.0001). Furthermore, this context effect—that is, the negative effect of mean choice-set star rating upon ratings-maximizing choice—remained statistically significant when controlling for the variance of choice-set star ratings (which itself exerted a significant positive effect upon ratings-maximizing choice; β = 0.5727, 95% CI 0.4926, 0.6557, p < 0.0001) or when using different geospatial definitions of choice sets (see Supplementary Materials).

Figure 2

(A) Proportion of choices to the highest-rated option, inferred from Yelp ratings, as a function of the mean star rating of the choice set, for all choice sets (left panel) and broken down by choice set size (right panel). (B) Proportion of choices to the highest-rated option, inferred from Yelp Check-ins, as a function of the mean star rating of the choice set, for all choice sets (left panel) and broken down by choice set size (right panel). (C) Proportion of choices to the highest-rated option, determined from food orders on Deliveroo, as a function of the mean star rating of the choice set, for all choice sets (left panel) and broken down by choice set size (right panel). Across all three datasets, we observed a marked context effect such that participants became less likely to make ratings-maximizing choices as the mean rating of the choice set increased.

Table 1 Coefficient estimates for the mixed-effects logistic regression predicting ratings-maximizing choices as a function of the mean star rating of the choice set (i.e., the context effect) and log-transformed choice set size, controlling for the price of the highest-rated option and the log-transformed number of reviews of the highest-rated option in the Yelp choice dataset.

As we observed a modest correlation between the difference between the two highest-rated options and the mean star rating of the choice set (Spearman ρ = −0.1717, p < 0.0001), we sought to rule out the possibility that the apparent suboptimality in making ratings-maximizing choices in sets with higher average option ratings (Fig. 2A) simply arose from highly-rated second-best options. To do this, we repeated the regression, taking the difference between the ratings of the highest and second-highest options—demonstrated above to predict choice accuracy (Fig. 1D)—as a covariate, finding that the effect of the mean star rating of the choice set (i.e., context) remained statistically significant (β = −0.1175, 95% CI −0.186, −0.0611, p < 0.001; Table S3).

Importantly, this analysis takes a user’s rating as a proxy for their choice to visit a restaurant, so we sought to test whether these choice context effects hold in two independent replication datasets that operationalize choice differently. First, we examined a larger dataset of ‘Check-ins’ on Yelp, which are time-stamped, geolocated records of a user having visited a particular restaurant, using the same choice set definition as the previous analysis (N = 565,661 choices; see Supplementary Materials). Examining users’ propensity to ‘check into’ the highest-rated option as a function of the mean star rating of the choice set (Fig. 2B), we found a similarly pronounced context effect: users were significantly less likely to choose the highest-rated ‘target’ option as the quality of the other options in the set increased (β = −0.017767, 95% CI −0.025576, −0.00987, p < 0.001; Table S4). Unlike the ratings-based choices examined above, however, the variance of the star-rating distribution exerted no discernible effect upon ratings-maximizing choices (β = −0.001514, 95% CI −0.009204, 0.006273, p = 0.711).

As another means of conceptual replication, we examined whether context effects could also be observed in actual purchase decisions from the online food delivery service Deliveroo (N = 1,613,968 orders; see Supplementary Materials), a dataset recently used to understand exploratory behavior in consumer choice (Schulz et al.31). As in the two Yelp datasets, we again found a marked context effect (Fig. 2C): users became less likely to order from the highest-rated restaurant in a choice set as the average rating of the choice set increased (β = −0.01777, 95% CI −0.0256, −0.0099, p < 0.001; Table S5), but again, the variance of star ratings exhibited no significant effect upon choice (β = −0.0015, 95% CI −0.00920, 0.00627, p = 0.711). Taken together, these two independent replication datasets support the generality of these context effects across different settings and operationalizations of choice.

Following previous laboratory-based work21,22, we performed an additional analysis of choices in the Yelp datasets (inferred from Ratings and Check-ins), examining the ratio of choices made between the two top-rated options as a function of the quality of the “distractor” items inferior to the two highest-rated options. This analysis provides a more direct examination of possible violations of independence of irrelevant alternatives (IIA), according to which the choice ratio between two options should not depend on the quality of other “irrelevant” options41. Importantly, this analysis only considered the subset of choices made to the highest- or second-highest-rated options in each choice set (75% and 72% of choices in the ratings-based and Check-ins-based choice datasets, respectively), probing whether the relative ratio of choices made to the best option varied as a function of the average ratings of the distractor items (which are, by definition, rated worse than the best two options). Conceptually replicating Louie et al.22 and echoing the results of our prior analysis, we observed that choice sets with more valuable inferior options led to significant decreases in the relative ratio of ratings-maximizing choices both in ratings-based choices (Fig. 3A; β = −0.482, 95% CI −0.5566, −0.4, p < 0.0001; Table S6) and in Check-in-based choices (Fig. 3B; β = −0.497, 95% CI −0.5131, −0.4802, p < 0.0001; Table S7). In other words, users’ ability to choose between high-quality restaurants was systematically diminished by the quality of putatively irrelevant, lower-quality options, a hallmark of context-dependent decision-making.

Figure 3

(A) Proportion of choices to the highest-rated option, inferred from Yelp ratings and conditioned on users choosing one of the two highest-rated options in the set, as a function of the mean star rating of the inferior “distractor” items (the options rated below the top two), for all choice sets (left panel) and broken down by choice set size (right panel). (B) Proportion of choices to the highest-rated option, inferred from Yelp Check-ins and conditioned on users choosing one of the two highest-rated options in the set, as a function of the mean star rating of the inferior “distractor” items, for all choice sets (left panel) and broken down by choice set size (right panel).

Context effects in subjective valuation: Yelp rating behavior

Interestingly, the Yelp ratings examined here—which reflect a user’s consumption experience with a particular restaurant—also afford the opportunity to probe how choice context modulates a user’s subjective valuation of an option following a presumed choice. Intuitively, a user’s restaurant ratings correlated with the average star ratings left by other users before the current rating was made (r = 0.175, p < 0.0001), indicating that higher-rated restaurants were more likely to receive higher star ratings after being chosen. Taking this dependency into consideration, we focused on predicting the deviation between the user’s rating and the average restaurant rating at the presumed time of choice (Fig. 4A). In other words, we asked whether context reliably influenced how a user’s rating differed from the average of the ratings left prior to that user’s rating. We observed that higher average option ratings of the choice set—i.e., higher-valued contexts—were associated with larger rating deviations (Fig. 4B; β = 0.0576, 95% CI 0.037, 0.0753, p < 0.0001). Importantly, our regression model controlled for a number of variables, including the number of previous reviews, the price of the chosen option, and the average previous rating of the chosen restaurant (Table 2). At first blush, the directionality of this context effect appears at odds with the context effects observed on choice (Fig. 2A–C), where more valuable (or higher-rated) contexts reduce overall choice efficiency. Following influential accounts of context effects on choice17,20,42, which posit that the subjective value of an option is scaled relative to the values of the set of options faced, one might instead have expected rating deviations to decrease with more valuable (higher-rated) contexts.

Figure 4

(A) Distribution of deviations between the user’s rating and the average rating of the chosen option at the time of rating for all ratings analyzed. (B) Deviation between the user’s rating and the average rating of the chosen option as a function of the mean star rating of the choice set (excluding the rated option) for all choice sets (left panel) and broken down by choice set size (right panel), after controlling for a number of nuisance variables (see main text). We observed a marked context effect such that higher-rated choice sets resulted in a more positive rating deviation.

Table 2 Coefficient estimates for the mixed-effects regression predicting the difference between the user rating and the mean star rating of the chosen option at the time of choice (rating deviation in Fig. 4) in the Yelp dataset, as a function of the mean star rating of the choice set excluding the chosen option (i.e., the context effect), controlling for the rating of the chosen option at the time of choice, log-transformed choice set size, the price of the chosen option, and the log-transformed number of reviews of the chosen option.

However, it is possible to view ratings not as a pure reflection of the user’s consumption utility of the restaurant—i.e., how good or bad the restaurant was, in absolute terms—but instead as the outcome of a comparison with the user’s contextually-bound expectations about the restaurant’s quality43,44, formed before making a rating (and presumably before visiting the restaurant). More formally, we conjecture that the subjective valuation of an option, reflected in a user’s rating of that option, may be computed as the difference between the experienced utility of the restaurant and the contextually-computed expectation of that option (akin to a “prediction error”45). On this view, choice contexts with higher average ratings should reduce the user’s expectation of an option, in proportion to the average rating of the choice set. Assuming that the user’s experienced utility for an option is a symmetric random variable centered on that option’s ground-truth utility (here, its average rating before the user’s rating is made), an option rated in a higher-valued context should receive higher ratings relative to the option’s ground-truth value. Indeed, this exact pattern manifests in the rating deviations (Fig. 4B): more valuable choice sets increase the deviation between a user’s subjective valuation of an option and the average rating of that option.
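Stated compactly, in our own notation and as a hedged formalization of this conjecture rather than a model fit to the data:

```latex
% q_i        : option i's ground-truth utility (its mean prior rating)
% \bar{v}_C  : mean star rating of choice set C (the value context)
% E_i(C)     : the user's contextually computed expectation of option i
r_{u,i} \;=\; \underbrace{q_i + \varepsilon_{u,i}}_{\text{experienced utility}}
        \;-\; \underbrace{E_i(\mathcal{C})}_{\text{expectation}},
\qquad \mathbb{E}[\varepsilon_{u,i}] = 0,
\qquad \frac{\partial E_i(\mathcal{C})}{\partial \bar{v}_{\mathcal{C}}} < 0
```

Under these assumptions, the expected rating deviation, E[r_{u,i} − q_i] = −E_i(C), increases with the mean rating of the choice set, matching the pattern observed in Fig. 4B.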

Study 2: Choice experiment

A limitation inherent to the real-world rating data is that only the user’s final rating is observable, not the user’s contextually computed expectation of a restaurant’s quality, which is a key component of the presumed comparison process reflected in Fig. 4A. To demonstrate more conclusively that the context effects observed in real-world ratings stem from contextually-bound expectations, we conducted a laboratory experiment in which participants (N = 62) viewed choice sets drawn from the real-world dataset, designed to emulate aspects of the choice and rating experience faced by Yelp users (Fig. 5A). In half of the 270 non-catch trials, participants made hypothetical restaurant choices within a given choice set (e.g., pizza in a particular neighborhood); in the other half, participants reported their expectation of the quality of a randomly selected restaurant on the basis of that restaurant’s rating and the surrounding choice set.

Figure 5

(A) Task screenshots for the laboratory experiment. On each trial, participants view a choice set consisting of 3–7 options from a single restaurant category, and either choose a restaurant (Choice trials; left panel) or indicate their expected satisfaction with a randomly selected restaurant (Rating trials; right panel). (B) Proportion of choices of the highest-rated option as a function of the mean star rating of the choice set, for all choice sets (left panel) and broken down by choice set size (right panel). We observed a marked context effect in the experiment, such that participants were less likely to make a ratings-maximizing choice in sets with higher mean star ratings.

Method

Participants

We collected data from 100 participants on Amazon Mechanical Turk, using the psiTurk package46 (version 2.3.8; https://psiturk.org). This sample size, which we have adopted as a standard for online studies employing within-subject designs, ensures adequate statistical power to detect meaningful differences across conditions while protecting against false-positive results and inflated effect sizes, which can result from small samples. Experimental protocols were approved by the McGill University Research Ethics Board (REB), and the methods described hereafter were carried out in accordance with REB guidelines. Participants provided informed consent in accordance with the McGill University REB and were paid $4.00 USD for their participation. Thirty-eight participants were excluded from the sample for failing to respond correctly to at least 60% of catch trials (see below), leaving a final sample size of 62 (age M = 41.88, SD = 12.00, 32% female). Note that the exclusion of these participants does not affect the significance of the results reported below.

Choice sets and procedure

Our experiment utilized 275 choice sets extracted from the real-world analysis described above. Each choice set was presented to participants in a 4 × 3 grid of restaurants depicting an image of each restaurant (from the Yelp website), the average star rating of each restaurant (1–5 stars in increments of half stars, following the Yelp interface), the number of reviews used to compute the average star rating, the price of each restaurant represented as dollar signs ranging from one to four, and the category (i.e., cuisine) of the restaurant (see Fig. 5A). The size of each choice set varied from 3 (71% of the choice sets) to 7 (1%) options. Following our real-world analysis, restaurants in each choice set were always drawn from the same category (e.g., ‘pizza’). Each participant viewed 275 choice sets, which were randomly divided into 135 choice trials, 135 rating trials, and 5 catch trials.

Choice trials required participants to answer the prompt “Which restaurant would you choose if you were going to eat at a [category] restaurant?” among the available options. Participants indicated their choice by clicking on a cell in the grid with their mouse. On rating trials, a randomly chosen restaurant from the choice set was highlighted, and participants provided an expectation in response to the prompt “Imagine you ended up at [highlighted restaurant]. How satisfied do you think you would be with this restaurant if you were going to eat at a [cuisine type] restaurant?” Participants responded to this question using a slider bar that ranged from “Not satisfied” to “Very Satisfied” (the output of which was not visible to participants, but ranged from 0 to 100). Finally, each participant completed 5 catch trials (used to identify participants who did not follow instructions), in which they were required to choose the restaurant with the highest star rating. Importantly, no specific directions were provided concerning the information participants should use to make choices and ratings.

For all trial types, the available options were presented on the screen for 5 s, during which time participants could not respond; following this viewing period, participants had 5 s to make a response. Following each choice and rating, participants rated their confidence in the previous response using a scale that ranged from 1 (“Not at all confident”) to 7 (“Very confident”). Participants had 5 s to make a confidence response. Confidence data were not examined in the present analyses.

Inferential statistics

Following the approach taken with the real-world dataset, inferential statistics for choices and expected satisfaction ratings were computed from mixed-effects regressions, taking all predictor variables as fixed effects and as random effects at the participant level. Participant-level effect sizes for the choice and rating context effects were computed as per-subject regression coefficients from the group analysis (conditioned on the group-level estimates), superimposed on the estimated group-level effect.

Results

Context effects in choice

Analyzing choices, we observed that participants were significantly less likely to choose the highest-rated ‘target’ option in choice sets with higher mean ratings (Fig. 5B; β = −0.8538, 95% CI −1.321, −0.4892, p < 0.0001; Table S8), but again, the variance of the star ratings of the choice set exerted no appreciable influence on ratings-maximizing choices (β = 0.0116, 95% CI −0.2272, 0.2906, p = 0.462). In other words, our laboratory experiment replicated the choice context effect observed in real-world choice behavior: more valuable contexts were associated with less likely choices of the highest-rated option.

Context effects in ratings

Turning to the rating trials, we observed a striking context effect upon participants’ expected satisfaction ratings (Fig. 6A): more valuable contexts reduced participants’ expectations (β = −2.7376, 95% CI −3.3713, −2.0962, p < 0.001; Table 3). This analysis, as with the real-world ratings, controlled for a number of variables that may have influenced participants’ responses, including the star rating and price of the selected option (see Table 3). In our account of the real-world ratings (Fig. 4B), these (unobserved) contextually-computed expectancies are the basis of the comparison with a decision-maker’s true experience of the rated option. In our experiment, which examined choice and rating behavior in many of the same choice sets that Yelp users presumably faced in the real world, the observation that these expectations decrease sharply with the average value of the context supports our account of this comparison process: as contextually-bound expectations decrease, the user’s real-world rating—conceptualized as the true consumption utility minus expectations—increases. Put another way, these experimentally elicited expectations—which are subject to contextual modulation by the surrounding choice set (Fig. 6A)—provide a compelling explanation for the context effects observed in the real-world rating data.

Figure 6

(A) Expected satisfaction ratings for the selected option, after controlling for a number of nuisance variables, as a function of the mean rating of the choice set, collapsed over all choice sets (left panel) and grouped by the rating of the selected option (right panel). We observed a marked context effect such that participants gave lower expected satisfaction ratings in higher-rated choice sets. (B) Context effects for participants whose choice context effect sizes were weak (top row) and strong (bottom row). We observed that these rating context effects were stronger in participants with stronger context effects in choice. (C) Correlation between subject-level choice context effects (estimated from the group-level mixed effects regression) and the participant-level effect of mean rating of choice set (i.e., context effect with respect to ratings). Participants who exhibited stronger (negative) context effects in choice also exhibited a stronger (negative) context effect in expected satisfaction ratings, statistically confirmed by robust regression.

Table 3 Coefficient estimates for mixed-effects regression predicting expected satisfaction ratings as a function of the mean star rating of choice set excluding the selected option (i.e., the context effect), controlling for the rating of the selected option at the time of choice, log-transformed choice set size, the price of the selected option, and the log-transformed number of reviews of the selected option.

Concordance between choice- and ratings-based context effects

Finally, we probed the relationship between individual differences in the extent of context-dependence manifested in choice—quantified by the subject-level effect of mean choice-set rating upon the probability of ratings-maximizing choice—and context-dependence evident in the separately-measured expectation ratings. To do this, we examined context-dependence in expectation ratings separately for participants with small versus large context effects in choice (Fig. 6B; via median split), finding that context dependence in ratings was markedly stronger in individuals whose choices also exhibited more context dependence. That is, the slope of the relationship between the mean value of the choice set (i.e., context) and the rated expectation was more pronounced for participants whose choices exhibited stronger context effects—though, critically, these context effects on choice and ratings were observable at the aggregate, sample level. Further illustrating the concordance between context-dependence in choice and expectancy ratings, we found that the strength of context dependence in choice (quantified by the subject-level effect of mean choice-set rating upon ratings-maximizing choices) predicted the degree to which ratings depended on the average value of the choice set (Fig. 6C; quantified by the subject-level random effect of mean choice-set rating; robust regression; β = 3.7073, p = 0.0104). In other words, the degree of context-dependence in an individual’s choices strongly related to the degree of context-dependence in their ratings, supporting the generality of these context effects across both choice and subjective valuation.

Discussion

Here we examined whether real-world choices and valuation, evidenced in a large dataset of online restaurant reviews, are subject to context effects that have previously been observed in tightly-constrained laboratory settings20,22. We observed that ratings-maximizing choice behavior was systematically affected by choice context, and importantly, this robust pattern of context-dependent choice replicated in two separate datasets, which operationalize choice differently. Furthermore, we found that subjective ratings were systematically biased by the rating distributions of the choice sets faced by decision-makers. This observed context-dependence appeared to be driven by the values (i.e., ratings) of the options alone, suggesting that this real-world rating information was evaluated—and acted upon, insofar as directing users’ choices—in a relative and context-dependent, as opposed to absolute, manner.

Further corroborating this account, in a laboratory experiment we found evidence for context dependence in decision-makers’ choices and in their subjective expectations of options’ values, which, beyond replicating the real-world choice context effects in a controlled setting, demonstrates that the value composition of a choice set can systematically alter decision-makers’ expectancy of an option’s value. Moreover, the concordance between the extent of context-dependence observed in individuals’ choices and in their ratings (Fig. 6C) supports the idea that contextually-computed subjective valuations—observable in expectancy ratings—serve as input to choices22. Taken together, the marked context dependence we observe ‘in the wild’ and experimentally further buttresses the idea that an option’s value is computed relative to its context, an idea pervasive across neuroscience, psychology, and behavioral economics3,4,47. Computationally, relative—versus absolute—value representation holds the benefit that information processing resources are allocated efficiently to the encoding of subjective values12,13.

Our results raise a number of questions about the nature of these observed context effects. While our account postulates that the expectations informing the real-world ratings are computed in a relative, context-dependent fashion—which finds more direct support in our laboratory experiment—an open question concerns whether the subjective utility of consumption is also computed contextually, relative to the value composition of the choice set (or some other reference point). However, such effects could also arise from processes operating at consumption (or feedback, e.g.48). Future work should examine, more directly, the extent to which subjective (consumption) utility might also exhibit context dependence.

While our analysis makes the core assumption that Yelp users’ choices (and ratings) are informed by the ratings users observe on the Yelp website (or smartphone app) at the time of presumed choice, it may be that these online ratings act as a proxy for users’ knowledge about the value of options in the environment rather than directly informing the observed choices. In other words, some users’ (absolute) valuations of options could be informed by direct experience with options or by ‘word of mouth’ rather than by star ratings. While user ratings are often a valid—if noisy—indicator of the “true” quality of the options, we remain agnostic about which source(s) of information form the basis for these choices. The qualitative concordance between real-world choice patterns (Fig. 2A) and laboratory choices (Fig. 5B), which were made in the absence of knowledge about the options’ ‘true’ values and instead only on the basis of rating information, hints that the rating information on Yelp might plausibly have been the basis for real-world users’ choices. Moreover, the low rates of ratings-maximizing choice in the real-world choice data (in contrast to the experimental choice data) likely reflect other, possibly idiosyncratic considerations that decision-makers face in restaurant choice—for example, personal experience or other contextual factors not represented in the data, such as geographic, time, or social constraints.

It is also worth noting that our analysis approach—as in any real-world choice study—relies critically on the operationalization of choice sets, which in our case were spatially-defined clusters of restaurants belonging to a single category. This simplified our analysis by guaranteeing that each restaurant appears in at most one choice set and that restaurant options only competed with each other within a single category. Of course, it is possible that an individual considers multiple restaurant categories when choosing between restaurants31, or that, in choosing across different categories, a restaurant could present as an option in multiple choice sets (e.g., “Pizza” and “Italian” restaurants). However, any such construction would presumably be orthogonal to the context effects observed here, absent any asymmetric relationship between restaurant rating and category membership. More directly, given the laboratory-based replication of these choice patterns, it is unlikely that these simplifying assumptions produced the robust context effects observed here as artifacts. Nevertheless, future work could endeavor to understand how the choice-set rating distributions of different categories (and possibly even of other choice sets of the same category) might bear on context-dependent choices and ratings. Finally, while a follow-up analysis revealed that these context effects were robust to the spatial definition of choice sets (see Supplementary Materials), we note that individuals might, at times, strive to choose the more globally “optimal” restaurant with respect to an entire city (or a large region of a city) rather than a single neighborhood.