Gambling-like behavior in pigeons: ‘jackpot’ signals promote maladaptive risky choice

Individuals often face choices that have uncertain outcomes and important consequences. As a model of this environment, laboratory experiments often offer a choice between an uncertain, large reward that varies in its probability of delivery and a certain but smaller reward, as a measure of an individual’s risk aversion. Generally lacking from these procedures are gambling-related cues that may moderate risk preferences. The present experiment offered pigeons choices between unreliable and certain rewards and, for the Signaled group, presented a ‘jackpot’ signal before reward delivery on winning choices. The Unsignaled group received an ambiguous stimulus that was not informative of choice outcomes. For the Signaled group, presenting win signals effectively blocked value discounting of the large, uncertain outcome as the probability of a loss increased, whereas the Unsignaled group showed typical preference changes similar to those in previous research lacking gambling-related cues. These maladaptive choices were further shown to be unaffected by more salient loss signals and resistant to response cost increases. The results suggest that an individual’s sensitivity to outcome-correlated cues plays an important role in risky choice and may moderate gambling behaviors in humans, particularly in casinos and other gambling-specific environments.

Individuals are often faced with choices involving uncertain outcomes that can have critical consequences, such as predation in the wild or large financial losses. In the laboratory, risky environments are often modeled by offering a choice between an uncertain large reward (UL) and a certain but smaller reward (CS), where the odds against receiving the UL reward are systematically increased to determine how the value of the UL changes. Under this probability discounting (PD) procedure, the rate at which individuals discount the value of the UL as the odds against its receipt increase 1 indexes their risk tolerance, a measure of their propensity to take future risks 2 . Indeed, individual differences in risk tolerance are an important factor in risky decision making, as these measures have shown clinical relevance through associations with gambling 3-5 , smoking 6,7 , and internet gaming 8 behaviors, as well as obesity 9 . Additionally, as the DSM-5 has classified gambling disorder as an addictive disorder 10 , and a high prevalence of negative outcomes (monetary or otherwise) is associated with gambling [11][12][13] , determining the underlying processes involved in risky decision making may aid in understanding these maladaptive behaviors.
In a PD procedure, optimal decision makers should maximize their expected reward, as described by normative theories such as expected value [14][15][16] and optimal foraging theory 17 . Evidence suggests, however, that optimal choice does not always occur (e.g., refs 14 and 15). In PD experiments, choice often appears hyperbolic and is well described by Equation 1:

$$Y = \frac{A}{1 + h\Theta} \qquad (1)$$

where Y is the proportion of UL choices, Θ = (1 − p)/p is the odds against receiving the UL reward at delivery probability p, A is the intercept, and h is the discounting rate.

Figure 2 shows the proportion of UL choices as a function of the odds against obtaining the UL reward, averaged over the last five sessions of training, while Supplementary Fig. S1 shows individual fits for all conditions. The Unsignaled group showed decreased choice of the UL as the reward rate began to favor the CS alternative (see Fig. 1c), indicative of sensitivity to changes in primary reinforcement. As an index of these changes, the Unsignaled group crossed a 0.5 proportion of UL choices at a point suggesting that one certain pellet was approximately equal in value to a 20% chance at four pellets. An indifference point of 20% is very close to the point of equivalent expected values at 25% reinforcement (see Fig. 1c), suggesting that the Unsignaled group nearly optimally tracked changes in the rate of primary reinforcement. The Signaled group, however, showed no apparent change in choice, even when the reinforcement rates heavily favored the CS. Indeed, a nonlinear mixed effects (nlme) analysis using a shared A parameter (A = 0.99, see Methods below) confirmed significant differences in discounting rates between the Signaled (h < 0.01, SEM = 0.01) and Unsignaled (h = 0.24, SEM = 0.07) groups, F(1, 38) = 12.98, p = 0.001, and further indicated that the h parameter for the Signaled group was not significantly different from zero, p > 0.999. These results show that discounting of the UL outcome, which in the Unsignaled group resembled that reported in previous studies (e.g., refs 40 and 41), was completely blocked in the Signaled group.
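As a rough check on how Equation 1 behaves, the sketch below assumes it takes the standard hyperbolic form Y = A / (1 + hΘ), with Θ = (1 − p)/p the odds against the UL reward at delivery probability p. It plugs in the shared-A fits reported above (A = 0.99; h = 0.24 for the Unsignaled group, and h approximated as 0 for the Signaled group, whose estimate was below 0.01) and solves for the delivery probability at which predicted UL choice falls to 0.5. The function names are illustrative and are not taken from the study's analysis code.

```python
# Sketch: predictions of an assumed hyperbolic form of Equation 1, Y = A / (1 + h * theta).

def odds_against(p):
    """Odds against receiving the UL reward at delivery probability p."""
    return (1.0 - p) / p

def predicted_ul_choice(theta, A, h):
    """Predicted proportion of UL choices at odds against theta."""
    return A / (1.0 + h * theta)

def indifference_probability(A, h):
    """Delivery probability at which predicted UL choice equals 0.5 (needs h > 0, A > 0.5)."""
    theta = (2.0 * A - 1.0) / h   # solve A / (1 + h * theta) = 0.5 for theta
    return 1.0 / (1.0 + theta)    # convert odds against back to a probability

# UL delivery probabilities across the five within-session blocks
block_probs = [1.0, 0.5, 0.25, 0.125, 0.0625]

for group, h in [("Unsignaled", 0.24), ("Signaled", 0.0)]:
    preds = [predicted_ul_choice(odds_against(p), A=0.99, h=h) for p in block_probs]
    print(group, [round(y, 2) for y in preds])

print(round(indifference_probability(A=0.99, h=0.24), 3))  # → 0.197
```

With these inputs the predicted crossing falls at p ≈ 0.197, close to the roughly 20% indifference point described above, while the h ≈ 0 curve stays flat at A across all blocks.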

Reversal.
A potential limitation in interpreting the reduced discounting is the use of a visual/spatial discrimination and the presentation of reinforcement probabilities in a single, decreasing order. Spatial discriminations can confound spatial preferences (a pre-experimental preference for the left or right alternative) with preference for a choice alternative 28,37,38 . Additionally, with similar procedures, the order of probabilities has been shown to alter preferences 44 . To address these issues, we used the same procedures as above but reversed the contingencies such that the UL alternative was now presented in the opposite location and the previous Signaled group became the Unsignaled group, and vice versa (see supplemental materials). If the large differences in preferences were indeed a product of the signal for wins and not a procedural artifact, the pigeons previously in the Unsignaled condition should now show attenuated discounting. Figure 3 shows the proportion of UL choices as a function of the odds against obtaining its reward, averaged over the last five sessions of training, while Supplementary Fig. S1 shows individual fits. The Signaled group, which previously discounted the UL, now showed minimal changes in UL preference as the odds against its delivery increased (see Fig. 1c), while the Unsignaled group, which previously did not discount the UL, showed a large preference reversal. Additionally, the Unsignaled group's indifference point (where UL preference crosses 0.5) was at a 25% probability of reinforcement; this indicates that one certain pellet was approximately equal in value to a 25% chance of 4 pellets, the exact point at which the expected values of the two alternatives become equivalent (see Fig. 1c). These trends were also confirmed by the nlme analysis with a shared A parameter (A = 1, see Methods below), in which the Unsignaled group (h = 0.29, SEM = 0.05) had a significantly steeper slope than the Signaled group (h = 0.01, SEM = 0.01), F(1, 38) = 37.28, p < 0.001.
Furthermore, the slope parameter for the Signaled group was again not significantly different from zero, p = 0.201, indicating a lack of discounting of the UL alternative's value for the Signaled group.
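The expected-value benchmark used in these comparisons follows from equating expected pellets per choice, with four pellets delivered at probability p on the UL and one pellet delivered with certainty on the CS:

$$
\begin{aligned}
E[\text{UL}] &= 4p, \qquad E[\text{CS}] = 1,\\
4p &= 1 \;\Rightarrow\; p = 0.25, \qquad \Theta = \frac{1 - p}{p} = \frac{0.75}{0.25} = 3 .
\end{aligned}
$$

Above p = 0.25 (odds against below 3) the UL yields more expected food per choice; below it, the CS does.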
Is it Suboptimal?
Animals are thought to have been pressured by their environments to behave optimally in order to survive 17 and, as such, they should prefer choice alternatives that produce higher probabilities of primary reinforcement 45 . This was the case in the Unsignaled conditions, where pigeons showed preference changes and indifference points that generally followed the scheduled reinforcement rates and tracked the alternatives' relative expected values (see Fig. 1c). To confirm that preference for the UL in the signaled conditions was actually suboptimal, we examined the rewards obtained by all birds in both conditions on choice trials over the last five sessions (Table 1) and found that obtained reinforcement was reduced in the signaled conditions. Pigeons' preference for the UL outcome in the signaled conditions produced just over half (M = 51.2, SEM = 1.79) the reward earned in the Unsignaled conditions (M = 91.5, SEM = 6.12), clearly exemplifying suboptimal choice in the signaled conditions. Furthermore, Fig. 4 illustrates a positive correlation between discounting rates and obtained food rewards for all birds under both conditions, r² = 0.95, p < 0.001, suggesting that, within the range observed here, steeper discounting led to increased reward.
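To make the optimality comparison concrete, the sketch below computes the expected pellets earned on free-choice trials in one session under three fixed policies: always choosing the UL, always choosing the CS, and choosing whichever alternative has the higher expected value in each block. It assumes the session structure given in the General Methods (five blocks of five free-choice trials, UL delivery probabilities from 100% down to 6.25%, four versus one pellet) and ignores forced trials. These are scheduled expectations under idealized policies, not the obtained values summarized in Table 1, and the policy functions are hypothetical.

```python
# Expected pellets on free-choice trials in one session under fixed choice policies.
UL_PELLETS, CS_PELLETS = 4, 1
FREE_TRIALS_PER_BLOCK = 5
BLOCK_PROBS = [1.0, 0.5, 0.25, 0.125, 0.0625]  # UL delivery probability in each block

def always_ul(p):
    return "UL"

def always_cs(p):
    return "CS"

def ev_maximizing(p):
    # choose the UL while its expected value (4p) is at least the CS's (1 pellet)
    return "UL" if UL_PELLETS * p >= CS_PELLETS else "CS"

def expected_pellets(policy):
    """Sum expected pellets over the five blocks for a policy mapping p -> 'UL' or 'CS'."""
    total = 0.0
    for p in BLOCK_PROBS:
        per_choice = UL_PELLETS * p if policy(p) == "UL" else CS_PELLETS
        total += FREE_TRIALS_PER_BLOCK * per_choice
    return total

for name, policy in [("always UL", always_ul), ("always CS", always_cs),
                     ("EV-maximizing", ev_maximizing)]:
    print(name, expected_pellets(policy))
# → always UL 38.75
# → always CS 25.0
# → EV-maximizing 45.0
```

At p = 0.25 the two alternatives tie (5 expected pellets either way), so the tie-breaking direction in `ev_maximizing` does not change the total. Exclusive UL preference earns less than the expected-value-maximizing policy, which is the sense in which it is suboptimal here.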
Explicit Signaling of Losses.
In the previous phases of this experiment there was a signal for winning outcomes but no signal for losses. Although there is a growing body of evidence to suggest that losses minimally influence pigeons' preferences 30,34,35 , translational procedures with rats have suggested that, if there is inhibition to loss signals, suboptimal choice does not occur 46 , and humans may show differential discounting of wins and losses 21,22 . Although the absence of the 'jackpot' signal on loss trials likely serves as a signal for a loss, we introduced a more salient signal for loss outcomes in the Signaled group to determine whether salient losses would influence discount rates (see supplemental materials). Figure 5 shows the proportion of UL choices including the novel signaled losses (dashed lines), as well as the proportion of UL choices from the reversal (solid lines) for comparison. Despite losses now being cued, the Signaled group did not show any apparent change in UL preference, while the Unsignaled group showed a slight decrease in discounting. Nlme analysis using a shared A parameter (A = 1, see Methods below) revealed
Increasing the Cost to Gamble.
As signaling losses did not reduce the Signaled group's preference for the UL, we next asked whether there were conditions under which the UL could be devalued for the Signaled group. Previous research has shown that lengthening the delay to reinforcement for the UL relative to the CS 28 , decreasing the duration of the win signal prior to reinforcement 27,28 , and decreasing the salience of the win signal 31 can all decrease the effectiveness of signaled win outcomes. An alternative method is to increase the 'cost', or effort, required to choose the UL.
To assess the effect of changing the cost on choice, we systematically increased the number of pecks required to choose the UL from 1 to 2, 4, 8, and 16 across session blocks while the cost of the CS remained at one peck (see supplementary materials). If an alternative has greater relative value, as the signaled UL appears to in the present experiment, preference for it should decrease at a relatively slower rate, often described as being less elastic 47,48 . We therefore predicted that the Signaled group would show less elastic preference for the UL with increasing cost, relative to the Unsignaled group.
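Elasticity is commonly expressed as the slope of log consumption against log price. As a generic illustration of that idea (not the Equation 2 fit used in the study), the sketch below computes arc elasticities of UL choice percentage between successive peck requirements. The helper `arc_elasticities` is hypothetical, and the input curve is synthetic, chosen so that the correct answer is known in advance.

```python
import math

# UL response requirements used across session blocks (CS stayed at one peck)
COSTS = [1, 2, 4, 8, 16]

def arc_elasticities(costs, percent_ul):
    """Log-log slope of UL choice percentage between successive cost levels.

    Values near 0 mean preference falls slowly as cost rises (less elastic);
    more negative values mean preference falls quickly (more elastic).
    All percentages must be positive because logarithms are taken.
    """
    points = list(zip(costs, percent_ul))
    slopes = []
    for (c1, y1), (c2, y2) in zip(points, points[1:]):
        slopes.append((math.log(y2) - math.log(y1)) / (math.log(c2) - math.log(c1)))
    return slopes

# Synthetic check only, not study data: y = 80 * c**-0.5 has constant elasticity -0.5
synthetic = [80 * c ** -0.5 for c in COSTS]
print([round(e, 3) for e in arc_elasticities(COSTS, synthetic)])  # → [-0.5, -0.5, -0.5, -0.5]
```

Under the prediction above, a Signaled curve would yield elasticities closer to zero than an Unsignaled curve over the same cost range.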
UL choice proportions at response costs of 1 and 16 as a function of the odds against receiving its reward are shown in the top row of Fig. 6; additional cost comparisons can be found in Supplementary Fig. S2. As the UL cost increased, both groups' choice allocations showed increased discounting of the UL and a lowered intercept by the final cost of 16 responses, indicating that the increase in cost decreased the value of the UL alternative. The parameter estimates for A and h as a function of UL cost are also shown in the bottom row of Fig. 6 and illustrate these changes. Nlme analysis that included cost as an additional fixed factor and allowed the A parameter to vary for both groups also confirmed these effects as indicated by a significant Group × Cost interaction on discounting (h parameter), F(1, 233) = 14.24, p = 0.002, and main effects of group, F(1, 233) = 6.16, p = 0.0138, and cost, F(1, 233) = 31.01, p < 0.001, on the intercept (A parameter).
As predicted, the Unsignaled group showed a faster increase in discounting rates with increased cost. Both groups also showed decreased intercepts, indicating that when the cost was high enough, four pellets at 100% probability lost value relative to one pellet at a cost of one response. To better illustrate the changes in preference, the data were restructured as the average percent UL choice across the last five sessions of each UL peck requirement and fit with Equation 2 47,48 :

Discussion
Although components of the present results have been reported in previous experiments, the current work advances our understanding of suboptimal choice by encompassing past and predicted results within one model. Similar to previous work [27][28][29][30][31]43 , signaling uncertain choice outcomes prior to reward delivery greatly increased risk preferences. Strong suboptimal preferences have generally been reported, however, when the predictive value of the signal following the UL is greater than that of the signal following the CS 26,37 . In the present experiments, the UL and CS signals were equally predictive, a condition that can lead to indifference or relatively weaker preferences 37,39 . With the addition of a magnitude difference, strong suboptimal choice was found even when the UL and CS signals were equally predictive. Within the framework of PD, the interaction of an increased reward magnitude and the predictive value of the UL 'jackpot' signal blocked discounting of the UL's value, which, we believe, is the first demonstration of such an effect in the literature. While pigeons and starlings have previously been shown to be insensitive to signaled probabilities of reinforcement 28,29,31,38 and suboptimal choice has been found with magnitude differences 43 , their combination had not been tested and here led to the blocking of PD. The choice behavior of the Unsignaled group with uninformative signals stands in stark contrast to that of the Signaled group. The Unsignaled group served as a control for how risky choice tasks are often modeled without signals 41,49 ; it discounted the UL more optimally and earned nearly twice as much reward as the Signaled group. Reversing the conditions 26,27 and providing a more salient loss signal 30,34,35 further revealed that the difference between the two groups was not due to procedural artifacts, consistent with previous research.
Finally, a novel finding was that when the cost of UL choices was increased, demand for the UL was found to be more inelastic for the Signaled group.
Why signaling win outcomes reduced loss aversion to such an extent, however, is still unclear. For example, we have interpreted the group effects as the 'jackpot' signal reducing discounting (as do current theories of suboptimal choice), but it may also be that presenting a probabilistic cue that does not always produce food (as in the Unsignaled group) increases discounting. In either case, while the discounting Equation 1 is useful in characterizing differences between signaled and unsignaled conditions, it does not explain why the differences occur. Several variables influencing the effectiveness of win cues have been identified previously 26,27,31 , such as their predictive utility for reward, the duration of their appearance prior to reward, and their overall conditioned reinforcement value, and these have led to different hypotheses. One hypothesis stems from the value of the information provided by the win signal 30,31 : the appearance of a signal for reward reduces the time spent uncertain about the reward. In the present experiment, however, the signaled condition had equally informative cues for the UL and CS alternatives. The fact that suboptimal preferences still emerged may therefore challenge this interpretation.
Alternative hypotheses stem from the value of win signals as conditioned reinforcers [27][28][29] . The stimulus value hypothesis 29 , based on the contextual choice model 42 , posits that the multidimensional conditioned reinforcement strength 37 of the win signal (magnitude, predictive utility, cost, etc.) drives suboptimal choice (see supplementary materials and Fig. S3). As the CS and UL had equally informative cues for reward in the signaled conditions, only the relative probabilities and magnitudes of reinforcement differed. Given that the pigeons were insensitive to the probabilities of reinforcement, the stimulus value hypothesis suggests that group differences in the present experiment were due to an increased sensitivity to the reward magnitude of the UL relative to the CS. As the actual UL reward on win trials was 4 pellets in both the signaled and unsignaled conditions, however, it is inferred instead that the 'jackpot' cues in the signaled condition acted by effectively increasing the magnitude of the UL reward.
The hyperbolic decay model has also been applied to risky choice 28,50 . Hyperbolic decay suggests that the value of a choice alternative is determined by its delay to reinforcement. For probabilistic choices, however, an alternative's value decays only while a signal predicting reinforcement is present. The value of a signal is initially set to 1 and decays the longer the signal is present without reinforcement, with the time in its presence summing across non-reinforced trials. For example, in the unsignaled conditions, the CS signal is always followed by reinforcement 10 s after it appears, which equates to 10 s of devaluation. The UL signal, however, is only sometimes followed by food; this means the UL signal can be present for a total of 10, 20, or 30 s (etc.) across multiple trials prior to reinforcement.
Greater UL devaluation is therefore consistent with the UL discounting seen in the unsignaled conditions and predicts increased CS preference as the probability of UL reinforcement declines. For the signaled conditions, the CS signal is also always followed by reinforcement, but the UL signal only appears when reinforcement will follow. Thus, even across diminishing UL reinforcement probabilities, the CS and UL signals are equally subject to 10 s of devaluation, and individuals should be indifferent between them. While individuals in the signaled condition were indeed unaffected by diminishing UL reinforcement probabilities, they showed a strong preference for the UL rather than indifference between the UL and CS. To account for the present findings, a magnitude term would need to be added 50 . Upon doing so, the initial values for the UL and CS become 4 and 1, respectively. Thus, the hyperbolic decay model is consistent with the present findings and predicts that the current group differences are due to the combined effects of a signal occurring only when reinforcement follows and the magnitude of the UL being greater than that of the CS. Additionally, it may be possible for the hyperbolic decay model to account for the cost manipulation conducted here by incorporating the increased time it takes to complete the response requirement 50 .
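The hyperbolic-decay account above can be illustrated numerically. This sketch assumes V = M / (1 + K·T), where M is reward magnitude (4 pellets for the UL, 1 for the CS, following the magnitude term described above), T is the cumulative time the reward-predicting signal is present before food (10 s per presentation), and K = 1 per second is a hypothetical decay rate chosen only for illustration, not an estimate from this study. For the unsignaled UL, it assumes the number of signal presentations before food is geometric with probability p and averages V over that distribution; this averaging is one common way to handle probabilistic delays, not necessarily the formulation in ref 50.

```python
# Sketch of a hyperbolic-decay account, assuming V = M / (1 + K * T).
K = 1.0          # hypothetical decay rate (per second); illustrative, not estimated
SIGNAL_S = 10.0  # each terminal-link signal presentation lasts 10 s
BLOCK_PROBS = [1.0, 0.5, 0.25, 0.125, 0.0625]

def value_always_reinforced(magnitude):
    """Signal is followed by food after a single 10-s presentation (T = 10 s)."""
    return magnitude / (1.0 + K * SIGNAL_S)

def value_unsignaled_ul(magnitude, p, max_trials=2000):
    """Average value when food first follows the n-th signal, n ~ geometric(p).

    Signal time accumulates across non-reinforced trials, so T = 10 * n seconds.
    The sum is truncated at max_trials, which is ample for p >= 0.0625.
    """
    total = 0.0
    for n in range(1, max_trials + 1):
        prob_n = p * (1.0 - p) ** (n - 1)  # probability food first arrives on signal n
        total += prob_n * magnitude / (1.0 + K * SIGNAL_S * n)
    return total

for p in BLOCK_PROBS:
    cs = value_always_reinforced(1)           # certain small: 1 pellet
    ul_signaled = value_always_reinforced(4)  # 'jackpot' signal appears only on wins
    ul_unsignaled = value_unsignaled_ul(4, p)
    print(f"p={p}: CS={cs:.3f}  UL signaled={ul_signaled:.3f}  UL unsignaled={ul_unsignaled:.3f}")
```

Under these assumptions the signaled UL value stays fixed at 4/(1 + 10K) across blocks, above the CS value of 1/(1 + 10K), while the unsignaled UL value declines as p falls; the block at which it drops below the CS value depends on the arbitrary choice of K.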
Finally, the contrast 26 and signal for good news (SiGN) 27 hypotheses suggest that it is the change from a state of uncertainty (when making a probabilistic choice) to a state of certainty (when the signaled win outcome appears) that produces the suboptimal choice effect. As the outcome of the CS in the present experiments can be predicted at the time of choice (a certain one pellet), this alternative produces no contrast and, in the case of the SiGN hypothesis, would not serve as a conditioned reinforcer. The UL outcome, however, cannot be predicted at the time of choice, and the appearance of the 'jackpot' signal generates contrast, or an increase in reinforcement value, that leads to suboptimal preference. In their present form, the contrast and SiGN hypotheses would both predict suboptimal preferences in the present experiment. The SiGN hypothesis also states that, because the UL win signal appears temporally closer to reinforcement than the CS choice stimulus, the appearance of the UL win signal reduces the delay to reinforcement. With this added component, the SiGN hypothesis can account for changes in suboptimal preferences based on changing delays to reinforcement 51 and for the cost manipulation conducted here (as it increases the UL's delay to reinforcement), which the contrast explanation currently cannot. Neither hypothesis, however, makes any assertion about the role of differential magnitudes of reinforcement, although it stands to reason that signals predicting greater magnitudes of reinforcement could be conceptualized as producing greater contrast and/or reinforcement value than signals predicting smaller magnitudes. Still, the present results require that these models better specify how other dimensions of reinforcement may interact.
Although the present experiments cannot clearly distinguish between these different models, the results presented here better support hypotheses based on the reinforcing value of the 'jackpot' signal than on its information value. The general finding that 'jackpot' cues following a risky choice influence preference is a robust phenomenon in animal models 24,26,31,[52][53][54] , implicating an important role of cues in an individual's risky decision making. Laboratory measures of risk taking in humans, however, do not often assess the role that cues may have on risk.
If laboratory measures such as PD are to inform other risky decisions such as gambling in humans 2 , these measures should also take into account the individual's sensitivity to 'jackpot' signals. While evidence exists that human gamblers show increased physical arousal or gambling intentions 55, 56 and regional fMRI brain activation in response to gambling-related scenarios or stimuli 57,58 , fewer experiments have examined the role of outcome-correlated cues in modulating gambling behavior 23 , although one study, using a reinforcement learning model, indicated that cues can affect choices when reinforcement rates were equivalent 59 .
Human gamblers have also shown reduced fMRI reward pathway activation to risky choice outcomes relative to healthy controls 60,61 . This has led to the suggestion that, similar to substance abusers, gamblers seek highly rewarding events to compensate for a hypoactive reward system. Additionally, there is evidence that, relative to controls, gamblers show increased brain activity during anticipation of an expected win following risky choices 62 , and both humans and animals have shown increased neuronal activity during uncertainty prior to receiving a reward 63 . These findings suggest that the period after choice but prior to the outcome is an important factor in biasing risk preferences. Indeed, a procedure analogous to the signaled outcomes used here showed that self-described gamblers increased their choice of gambling-like alternatives 36 . These results suggest that outcome-correlated cues may indeed modulate human risk sensitivities relevant to certain behaviors (e.g., gambling), but this needs to be verified in future research. Additionally, the effect of outcome-correlated cues may differ depending on whether they precede or occur simultaneously with the outcome, and future research should take this into consideration.
The present experiments show that signaling a win prior to receipt of its outcome effectively increases risk taking and can block PD in pigeons. Furthermore, signaling losses does not attenuate the effect, and the value added by these signaled wins is resistant to increases in cost. Collectively, the results suggest that, when making risky decisions, stimuli correlated with win outcomes can increase risk taking to the point of suboptimality. Indeed, numerous examples of signaling stimuli prior to gambling outcomes occur in casinos, such as the images on the reels of a slot machine, the ball on a roulette wheel, and matching numbers on lottery and Powerball tickets. That pigeons in the Signaled group were also willing to pay an increased cost for the chance to obtain the 'jackpot' reward may also indicate why some individuals expend increasing resources on gambling. Future gambling research and laboratory measures of an individual's risk sensitivity should therefore assess the effect of such cues by controlling for their presence (and absence) to further determine their influence on decision making.

General Methods
Ten White Carneau pigeons, approximately 8-12 years old and originally purchased from the Palmetto Pigeon Plant (Sumter, SC), with previous experience in suboptimal choice tasks and no systematic differences in experience, were used in the experiment. Subjects were housed in individual cages measuring 28 × 38 × 30.5 cm and maintained at 80-85% of their free-feeding weight on a 12:12 light-dark cycle (lights off at 7 pm) with free access to grit and water. All research was approved by the University of Kentucky Institutional Animal Care and Use Committee (Protocol 01029L2006) and was conducted according to the 2010 NIH Guide for the Care and Use of Laboratory Animals (8th edition).
The experiment was conducted in a Med Associates (St. Albans, VT) modular operant chamber (ENV-008) measuring 30.5 × 25.5 × 33 cm inside a noise attenuating box. The pigeons responded to three circular keys approximately 21.5 cm above the floor, 2.5 cm across, and 5 cm apart. A 12-stimulus inline projector (Industrial Electronics Engineering, Van Nuys, CA) behind each key projected one of four stimuli (red, green, or three white horizontal or vertical lines on a dark background) onto the left and right response keys and a white light onto the center key. Reinforcement was delivered to a magazine tray at the base of the response panel in the form of a 45-mg pellet from a dispenser (ENV-45 Med Associates, Fairfax, VT) behind the response keys. The chamber was illuminated by a 28 V, 0.1 A house light centered over the chamber. White noise was generated from outside the chamber and a computer in an adjacent room controlled the experiment using Med-PC IV.
Subjects were first trained using an autoshaping procedure in which one of four stimuli was illuminated randomly on either the left or right response key; the white light was presented only on the center response key. Following either 30 s or a peck to the stimulus, whichever came first, the house light illuminated and a single pellet was delivered into the magazine. The house light remained illuminated for 5 s and then turned off for 5 s, resulting in a 10-s intertrial interval (ITI). This procedure for reinforcement and the ITI remained consistent throughout the experiment.
Following pretraining, subjects were trained on a visual/spatial one- versus four-pellet magnitude discrimination. All trials began with a white orienting stimulus on the center key. A response to the center key turned off the orienting stimulus and began either a forced or a free choice trial. On free choice trials, concurrently available initial-link stimuli (three horizontal or vertical white lines on a black background) appeared on each side key. Choice of the uncertain large (UL) alternative led to a terminal-link stimulus (red or green) for 10 s, after which four pellets were delivered to the magazine. Choice of the certain small (CS) alternative led to a different terminal-link stimulus (red or green) for 10 s, after which a single pellet was delivered to the magazine. Forced choice trials were identical to free choice trials except that only one alternative appeared, on either the left or right key. Sessions consisted of 65 trials, 25 free and 40 forced, divided into five 13-trial blocks. The first eight trials of each block were forced and the last five were free choice. All initial- and terminal-link stimuli (including their spatial locations) were counterbalanced across subjects. Magnitude training continued until all subjects chose the UL alternative at least 95% of the time for two consecutive sessions. Subjects were then randomly assigned to the Signaled and Unsignaled groups and trained on a PD procedure structured similarly to magnitude training. Each block began with eight forced trials followed by five free choice trials. The first block of trials of each session was the same as in magnitude training. In subsequent blocks, the probability of receiving the UL reward when chosen decreased from 100% to 50%, 25%, 12.5%, and 6.25%. For the Signaled group, choice of the UL in these subsequent blocks led either to the predictive terminal-link stimulus (or 'jackpot' signal) for 10 s followed by four pellets, or to a blackout period for 10 s.
For the Unsignaled group, choice of the UL was always followed by the nonpredictive terminal-link stimulus for 10 s, which was followed by the four-pellet reward according to the probability of reinforcement associated with that block. Training continued until the slope of a line fit to the session-by-session h estimates was not statistically different from zero in both groups over five sessions, for a total of 30 sessions.
Data Analysis.
Data were fit with Equation 1 using nonlinear mixed effects (nlme) modeling with the nlme package in R 64 . Estimates for both A and h parameters were generated treating group as a nominal factor and subject as a random factor. Two models were run: one allowing the A (intercept) parameter to vary for each group and one with a global A parameter shared by both groups. Model selection was based on the Akaike information criterion (AIC), with a model required to have an AIC at least 4 units lower to be selected 65 (data not shown). As h estimates appeared non-linear in form, correlations including this measure used Spearman's rank correlation.
Data Availability.
All data presented in the main document can be found as an online supplementary file.
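Two pieces of the analysis described above can be sketched in plain Python: Spearman's rank correlation (the Pearson correlation of average ranks, so tied values share a rank) and an AIC-difference rule for choosing between the shared-A and group-specific-A models. The function names, and the assumption that the shared-A model is kept unless the group-specific model's AIC is at least 4 units lower, are illustrative rather than taken from the study; in R, `cor(x, y, method = "spearman")` computes the same correlation.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def choose_model(aic_shared_a, aic_group_a, min_improvement=4.0):
    """Hypothetical rule: keep the shared-A model unless the group-A model's AIC is >= 4 lower."""
    if aic_shared_a - aic_group_a >= min_improvement:
        return "group-specific A"
    return "shared A"

# Toy values only (not study data): a monotonic but curved relation gives rho = 1
print(spearman_rho([0.01, 0.05, 0.2, 0.3], [40, 45, 80, 95]))  # → 1.0
print(choose_model(aic_shared_a=100.0, aic_group_a=98.5))     # → shared A
```

Because Spearman's correlation depends only on rank order, it is insensitive to the non-linear form of the h estimates, which is the reason given above for using it.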