In uncertain environments, effective decision makers balance exploiting options that are currently preferred against exploring alternative options that may prove superior. For example, a honeybee foraging for nectar must decide whether to continue exploiting the current patch or move to a new location.
Effective decision-making requires balancing exploratory and exploitative behaviour. For example, finding a restaurant that is better than one’s current favourite requires some exploration. The timing of exploration is also critical. Normatively, the rate of exploration should increase as uncertainty about the relative goodness of options increases. For example, one may give a restaurant a second chance after a year has passed because the service could have improved in the interim. People in laboratory studies with objective rewards (for example, money) behave in a manner consistent with the ideal actor.
These coherency-seeking tendencies in subjective choice have implications for exploratory behaviour. Most choices, like those in a supermarket, involve subjective interpretation of reward. If people alter their preferences to match their choices, then patterns of exploration should be the opposite of those found with objective monetary rewards. With objective rewards, the likelihood of exploring increases the longer it has been since exploring (Fig. 1a). We refer to this manner of exploration as uncertainty-minimizing, as it responds to the possibility of missing changes in the choice environment while exploiting preferred options. If instead preferences conform to choices, then people should become less likely to explore the more they exploit (Fig. 1b), which we refer to as coherency-maximizing. In coherency maximization, the longer people repetitively exploit an option, the more entrenched their preference becomes. Such increased liking for chosen options strengthens coherence between preference and past behaviour, while also promoting coherent future behaviour based on this preference. Unlike common approaches to balancing exploration and exploitation in machine learning (ref. 24), both views predict that exploration is structured and non-random in that the likelihood of exploring varies with recent choice history. Although these two views of exploration differ in their predictions for the local timing of exploratory choice, they both predict that the global exploration frequency should be stable over longer timescales. For example, under coherency maximization, once one eventually explores and discovers a new choice to exploit, the entrenchment process starts anew. In effect, the exploratory choice reduces the burden of continuing to choose coherently to justify past choices, thereby resetting people’s preferences to the level before the entrenchment started. This makes it possible to settle on a new choice once the exploitation streak of a former choice has ended.
Whereas laboratory studies with objective rewards find uncertainty-minimizing exploration, we predict that coherency-maximizing exploration will dominate with subjective rewards. To test this hypothesis, we evaluated how people explore with subjective rewards by examining shoppers’ behaviour in the supermarket. Tesco, a major UK supermarket chain, provided approximately 283,000 fully anonymized datasets, each representing the purchases of a shopper within a specific product category over a period of 250 weeks, involving 152.2 (s.d. = 89.9) store visits on average. We examined how individual shoppers explored product options within six different product categories: beers, breads, coffees, toilet papers, washing detergents and yogurts. For example, a shopper may prefer and exploit beer brand A for a number of store visits before exploring brand B. Exploration and exploitation coding was based on repetition—repeated choices (that is, purchases) were coded as exploitations, whereas explorations involved non-repetitive (that is, switching) choice (see Methods for further details).
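The repetition-based coding described above can be illustrated with a minimal Python sketch. The function names, the toy purchase sequence and the exact definition of streak length (the number of consecutive exploitations immediately before a choice) are assumptions made for illustration; they are not taken from the authors' published code.

```python
from typing import List


def code_choices(purchases: List[str]) -> List[str]:
    """Label every purchase after the first as 'exploit' (same product as
    the previous purchase) or 'explore' (a switch to a different product)."""
    labels = []
    for prev, curr in zip(purchases, purchases[1:]):
        labels.append("exploit" if curr == prev else "explore")
    return labels


def streak_before_each_choice(labels: List[str]) -> List[int]:
    """For each coded choice, count the consecutive exploitations that
    immediately preceded it (assumed definition of streak length)."""
    streaks = []
    run = 0
    for label in labels:
        streaks.append(run)
        run = run + 1 if label == "exploit" else 0
    return streaks


# Hypothetical purchase history within one product category.
purchases = ["A", "A", "A", "B", "B", "A", "C", "C", "C", "C"]
labels = code_choices(purchases)
streaks = streak_before_each_choice(labels)
exploration_rate = labels.count("explore") / len(labels)
```

In this toy history, three of the nine coded choices are switches, so the shopper's relative exploration frequency is one third.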
On average, people explored with a relative frequency of 0.404 and this global tendency to explore was stable over time (Fig. 2a,b), mirroring the results in laboratory studies using objective rewards (refs 8,9). Both uncertainty minimization and coherency maximization (Fig. 1) anticipate this result while also predicting that local patterns of exploration should be non-random. Indeed, people’s patterns of exploratory purchases were non-random, as evidenced by exploitation streaks that were longer (mean (M) = 8.56 purchases, s.d. = 18.33 purchases) than expected in 92.8% of cases by a permutation test (Supplementary Information). This result indicates that people systematically explore when shopping. The key question is whether people’s local exploration patterns are more akin to those predicted by uncertainty minimization (Fig. 1a) or coherency maximization (Fig. 1b). As predicted and consistent with the coherency-maximizing view, shoppers were less likely to explore the longer they had been exploiting a product (Fig. 2c,d). This result is in stark contrast to studies with objective rewards that find uncertainty-minimizing exploration.
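The details of the permutation test are given in the Supplementary Information and are not reproduced here. As a rough sketch of the general idea, one could shuffle the order of a shopper's coded choices and ask how often the shuffled sequence yields a mean exploitation streak at least as long as the observed one. The test statistic, the number of permutations and the toy data below are all assumptions.

```python
import random
from typing import List


def exploitation_streaks(labels: List[int]) -> List[int]:
    """Lengths of maximal runs of exploitation (0) in a 0/1 sequence,
    where 1 marks an exploratory choice."""
    streaks, run = [], 0
    for x in labels:
        if x == 0:
            run += 1
        elif run > 0:
            streaks.append(run)
            run = 0
    if run > 0:
        streaks.append(run)
    return streaks


def mean_streak(labels: List[int]) -> float:
    s = exploitation_streaks(labels)
    return sum(s) / len(s) if s else 0.0


def permutation_p_value(labels: List[int], n_perm: int = 2000, seed: int = 0) -> float:
    """Share of shuffled sequences whose mean exploitation streak is at
    least as long as the observed one (one-sided, add-one corrected)."""
    rng = random.Random(seed)
    observed = mean_streak(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_streak(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical shopper whose exploitations are strongly clustered.
clustered = [0] * 12 + [1] * 4 + [0] * 12 + [1] * 4
p = permutation_p_value(clustered)
```

Shuffling keeps each shopper's overall exploration frequency fixed and disturbs only the ordering, so a small value of `p` points to streaks that are longer than chance ordering would produce.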
Model-based analyses, which treat exploitation streak length as a continuous predictor of exploration rate, corroborated the conclusion that people are coherency maximizers. Choices were modelled with logistic regression to predict the probability of exploration given the current exploitation streak length (Fig. 3a). The results showed that the impact of exploitation streak length on the probability of exploring was negative for 79.3% of the shopper datasets, implying that people explored less the longer they had been exploiting. A permutation test for all regression slopes revealed that 82.6% were lower than expected (Fig. 3b). The findings suggest non-random exploration in line with the predictions for subjective outcomes and coherency maximization.
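To make the model concrete, the following sketch fits a logistic regression of exploration (1) versus exploitation (0) on the current exploitation streak length for a single hypothetical shopper. The Newton-Raphson fitting routine and the toy data are assumptions for illustration; the source does not state which software or estimation settings were used.

```python
import math
from typing import List, Tuple


def fit_logistic(x: List[float], y: List[int], n_iter: int = 25) -> Tuple[float, float]:
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson.
    Returns the intercept b0 and the slope b1."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # entries of the negative Hessian
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break                   # degenerate data; stop updating
        # Newton step: beta <- beta + H^{-1} g for the 2 x 2 system
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical shopper: explorations out of 10 choices at each streak length.
explore_counts = {0: 6, 1: 5, 2: 4, 3: 3, 4: 2, 5: 1}
x, y = [], []
for streak, n_explore in explore_counts.items():
    x += [float(streak)] * 10
    y += [1] * n_explore + [0] * (10 - n_explore)

b0, b1 = fit_logistic(x, y)
```

A negative slope `b1`, as in this toy case, corresponds to the coherency-maximizing pattern in which exploration becomes less likely as the streak grows; a positive slope would correspond to uncertainty minimization.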
In other domains, exploratory behaviour is viewed as a stable characteristic of individuals and groups. For example, individuals’ strategies tend to agree across internal (for example, memory retrieval) and external (for example, foraging) search tasks (ref. 25), and exploratory behaviour has been found to systematically vary with factors such as impulsivity, genotype, depressive symptoms and age (refs 7,9,14,26). Analogously, we considered whether people’s pattern of coherency-maximizing exploration is consistent at the individual level across different products. Using the model-based estimates, we found that individuals’ patterns of coherency maximization were consistent across the product categories considered. For example, for 20.3% more shoppers than expected by chance, either all or none of five product category datasets were associated with strong coherency-maximizing behaviour (Supplementary Information), which is remarkable given the diversity of the product categories considered.
One controversial aspect of the coherency-maximizing view is that preferences may follow from choices. We assessed this possibility by examining consumers’ choices with coupon offers. First, we analysed how customers reacted to product coupons, where they received points on a bonus card or price discounts for buying a promoted product. If people’s preferences change with exploitation streak length, they should prefer coupons to exploit or explore products differently at different stages (that is, lengths) of their exploitation streaks. Based on 69,664 coupon redemptions in our choice datasets (Fig. 4a), we observed that customers redeemed coupons to explore products more quickly when they were on short exploitation streaks (M = 27.0 days) compared with long ones (M = 29.8 days). Conversely, customers redeemed coupons to exploit more quickly on long exploitation streaks (M = 24.4 days) and more slowly on short streaks (M = 25.7 days). This strong interaction is predicted by the coherency-maximization view.
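The crossover in these four means can be summarized with a simple difference-of-differences. The snippet below uses only the mean redemption times reported above; the contrast itself is a descriptive illustration and is not the statistical test used by the authors.

```python
# Mean days to coupon redemption, as reported in the text.
mean_days = {
    ("explore", "short"): 27.0,
    ("explore", "long"): 29.8,
    ("exploit", "short"): 25.7,
    ("exploit", "long"): 24.4,
}

# Change in redemption time from short to long streaks, per coupon type.
explore_change = round(mean_days[("explore", "long")] - mean_days[("explore", "short")], 1)
exploit_change = round(mean_days[("exploit", "long")] - mean_days[("exploit", "short")], 1)

# Difference-of-differences: how much more long streaks slow down
# exploratory redemptions relative to exploitative ones.
interaction_days = round(explore_change - exploit_change, 1)
```

On long streaks, exploratory coupons are redeemed 2.8 days later and exploitative coupons 1.3 days sooner than on short streaks, giving a contrast of 4.1 days in the direction predicted by coherency maximization.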
Second, rather than relying on existing data, we conducted a follow-up coupon study in which we issued coupons for instant coffee to 8,623 randomly selected households who regularly buy instant coffees. A logistic regression model was fit to the group to predict coupon redemption probability based on exploitation streak length. Consistent with the previous coupon analysis and coherency-maximizing exploration, the results of the logistic regression revealed a significant interaction term of coupon type (that is, whether the coupon meant exploration or exploitation to the customer) and current exploitation streak length, |z| = 3.623, P < 0.001 (Fig. 4b; details in Supplementary Information). Hence, we find support for the idea that people’s choices induced preference changes, as their interest in coupon rewards depended on how well the coupon matched their recent choices (that is, their exploitation streak length).
The overall pattern of results strongly indicates that shoppers are coherency-maximizing explorers, which is striking given that research with objective rewards (for example, money) finds the opposite: uncertainty-minimizing exploration.
We analysed 282,972 anonymous datasets containing the chronologically ordered purchases of individual supermarket customers at Tesco, a UK supermarket chain, regarding one of six different product categories. Tesco provided access to these datasets in collaboration with dunnhumby, a customer science company and subsidiary of Tesco (see ‘Data availability’ section for requests). Customers’ product choices were recorded in a database every time they checked out using a personalized bonus card. This data use was in accord with the card agreement, which stipulated that anonymized shopping data would be used and shared outside of Tesco. Individuals in this database can be identified only by an anonymous dataset number, not by any personal information. Thus, the analysed datasets contained only purchase-related information (for example, quantities, prices and discounts), but no personal information about the shopper. Our sample was restricted to people with at least 50 purchases within a specific product category, which was necessary to model individual behaviour and to select from loyal customers (that is, those for whom we have good coverage of their purchases). We did not select customers who never explored or who explored more than 75% of the time. We coded exploration and exploitation based on repetition, where repeated choices were coded as exploitations and non-repetitions as explorations (Fig. 3a).
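The inclusion criteria above could be expressed as a simple filter over each shopper's purchase sequence. This is a hedged sketch: the function names are hypothetical, and computing the exploration rate over choices after the first purchase is an assumption, since the exact denominator is not specified here.

```python
from typing import List


def exploration_rate(purchases: List[str]) -> float:
    """Share of choices (after the first purchase) that switch product."""
    switches = sum(1 for prev, curr in zip(purchases, purchases[1:]) if curr != prev)
    return switches / (len(purchases) - 1)


def is_eligible(purchases: List[str], min_purchases: int = 50, max_rate: float = 0.75) -> bool:
    """Keep shoppers with at least `min_purchases` purchases who explored
    at least once but on no more than `max_rate` of their choices."""
    if len(purchases) < min_purchases:
        return False
    rate = exploration_rate(purchases)
    return 0.0 < rate <= max_rate


# Hypothetical purchase histories within one product category.
never_explores = ["A"] * 60                  # excluded: exploration rate of 0
always_switches = ["A", "B"] * 30            # excluded: exploration rate above 0.75
mostly_loyal = (["A"] * 9 + ["B"]) * 6       # included
too_few = ["A", "B", "A"] * 5                # excluded: fewer than 50 purchases

eligible = [is_eligible(p) for p in (never_explores, always_switches, mostly_loyal, too_few)]
```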
Computer code to replicate the findings of the present study can be found online at the Open Science Framework: https://osf.io/e76wy.
The datasets generated and analysed for the present study are available from dunnhumby, a customer science company and subsidiary of Tesco, upon request: firstname.lastname@example.org. Further information about the data and analyses is available online at the Open Science Framework: https://osf.io/e76wy.
How to cite this article: Riefer, P. S., Prior, R., Blair, N., Pavey, G. & Love, B. C. Coherency-maximizing exploration in the supermarket. Nat. Hum. Behav. 1, 0017 (2017).
We thank P. Todd for comments. This work was supported by the Leverhulme Trust grant RPG-2014-075, National Institutes of Health (grant 1P01HD080679) and Wellcome Trust Senior Investigator Award WT106931MA to B.C.L. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. At the time of submission, R.P., N.B. and G.P. were employed by dunnhumby Ltd. This work was carried out as part of P.S.R.’s PhD thesis, which was co-sponsored by dunnhumby Ltd. and UCL. dunnhumby Ltd. did not place any restrictions on the design, data collection and analysis, decision to publish or preparation of the manuscript, beyond the requirement that this work was to be done in compliance with its data policy.
Supplementary Data and Analyses, Supplementary Tables 1–4, Supplementary References