Abstract
Cholinergic neurotransmission affects decision-making, notably through the modulation of perceptual processing in the cortex. In addition, acetylcholine acts on value-based decisions through as yet unknown mechanisms. We found that nicotinic acetylcholine receptors (nAChRs) expressed in the ventral tegmental area (VTA) are involved in the translation of expected uncertainty into motivational value. We developed a multi-armed bandit task for mice with three locations, each associated with a different reward probability. We found that mice lacking the nAChR β2 subunit showed less uncertainty-seeking than their wild-type counterparts. Using model-based analysis, we found that reward uncertainty motivated wild-type mice, but not mice lacking the nAChR β2 subunit. Selective re-expression of the β2 subunit in the VTA was sufficient to restore spontaneous bursting activity in dopamine neurons and uncertainty-seeking. Our results reveal an unanticipated role for subcortical nAChRs in motivation induced by expected uncertainty and provide a parsimonious account for a wealth of behaviors related to nAChRs in the VTA expressing the β2 subunit.
References
Everitt, B.J. & Robbins, T.W. Central cholinergic systems and cognition. Annu. Rev. Psychol. 48, 649–684 (1997).
Dani, J.A. & Bertrand, D. Nicotinic acetylcholine receptors and nicotinic cholinergic mechanisms of the central nervous system. Annu. Rev. Pharmacol. Toxicol. 47, 699–729 (2007).
Guillem, K. et al. Nicotinic acetylcholine receptor β2 subunits in the medial prefrontal cortex control attention. Science 333, 888–891 (2011).
Rangel, A., Camerer, C. & Montague, P.R. A framework for studying the neurobiology of value-based decision making. Nat. Rev. Neurosci. 9, 545–556 (2008).
Fobbs, W.C. & Mizumori, S.J. Cost-benefit decision circuitry: proposed modulatory role for acetylcholine. Prog. Mol. Biol. Transl. Sci. 122, 233–261 (2014).
Kolokotroni, K.Z., Rodgers, R.J. & Harrison, A.A. Acute nicotine increases both impulsive choice and behavioral disinhibition in rats. Psychopharmacology (Berl.) 217, 455–473 (2011).
Mendez, I.A., Gilbert, R.J., Bizon, J.L. & Setlow, B. Effects of acute administration of nicotinic and muscarinic cholinergic agonists and antagonists on performance in different cost-benefit decision making tasks in rats. Psychopharmacology (Berl.) 224, 489–499 (2012).
McGrath, D.S. & Barrett, S.P. The comorbidity of tobacco smoking and gambling: a review of the literature. Drug Alcohol Rev. 28, 676–681 (2009).
Schultz, W. Multiple dopamine functions at different time courses. Annu. Rev. Neurosci. 30, 259–288 (2007).
Waelti, P., Dickinson, A. & Schultz, W. Dopamine responses comply with basic assumptions of formal learning theory. Nature 412, 43–48 (2001).
Montague, P.R., Dayan, P. & Sejnowski, T.J. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996).
Berridge, K.C. From prediction error to incentive salience: mesolimbic computation of reward motivation. Eur. J. Neurosci. 35, 1124–1143 (2012).
Maskos, U. et al. Nicotine reinforcement and cognition restored by targeted expression of nicotinic receptors. Nature 436, 103–107 (2005).
Mameli-Engvall, M. et al. Hierarchical control of dopamine neuron-firing patterns by nicotinic receptors. Neuron 50, 911–921 (2006).
Grace, A.A., Floresco, S.B., Goto, Y. & Lodge, D.J. Regulation of firing of dopaminergic neurons and control of goal-directed behaviors. Trends Neurosci. 30, 220–227 (2007).
Daw, N.D., O'Doherty, J.P., Dayan, P., Seymour, B. & Dolan, R.J. Cortical substrates for exploratory decisions in humans. Nature 441, 876–879 (2006).
Frank, M.J., Doll, B.B., Oas-Terpstra, J. & Moreno, F. Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nat. Neurosci. 12, 1062–1068 (2009).
Gittins, J.C. & Jones, D.M. A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika 66, 561–565 (1979).
Scott, P.D. & Markovitch, S. Learning novel domains through curiosity and conjecture. IJCAI (US) 1, 669–674 (1989).
Kaelbling, L.P. Learning in Embedded Systems (MIT Press, 1993).
Meuleau, N. & Bourgine, P. Exploration of multi-state environments: Local measures and back-propagation of uncertainty. Mach. Learn. 35, 117–154 (1999).
Yu, A.J. & Dayan, P. Uncertainty, neuromodulation, and attention. Neuron 46, 681–692 (2005).
Bach, D.R. & Dolan, R.J. Knowing how much you don't know: a neural organization of uncertainty estimates. Nat. Rev. Neurosci. 13, 572–586 (2012).
Oudeyer, P.-Y. & Kaplan, F. What is intrinsic motivation? A typology of computational approaches. Front. Neurorobot. 1, 6 (2007).
Fiorillo, C.D., Tobler, P.N. & Schultz, W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 (2003).
Schuck-Paim, C., Pompilio, L. & Kacelnik, A. State-dependent decisions cause apparent violations of rationality in animal choice. PLoS Biol. 2, e402 (2004).
Carlezon, W.A. Jr. & Chartoff, E.H. Intracranial self-stimulation (ICSS) in rodents to study the neurobiology of motivation. Nat. Protoc. 2, 2987–2995 (2007).
Kobayashi, T., Nishijo, H., Fukuda, M., Bureš, J. & Ono, T. Task-dependent representations in rat hippocampal place neurons. J. Neurophysiol. 78, 597–613 (1997).
Funamizu, A., Ito, M., Doya, K., Kanzaki, R. & Takahashi, H. Uncertainty in action-value estimation affects both action choice and learning rate of the choice behaviors of rats. Eur. J. Neurosci. 35, 1180–1189 (2012).
Anselme, P., Robinson, M.J.F. & Berridge, K.C. Reward uncertainty enhances incentive salience attribution as sign-tracking. Behav. Brain Res. 238, 53–61 (2013).
Sutton, R.S. & Barto, A.G. Reinforcement Learning: an introduction (MIT Press, 1998).
Kakade, S. & Dayan, P. Dopamine: generalization and bonuses. Neural Netw. 15, 549–559 (2002).
Herrnstein, R.J. Relative and absolute strength of response as a function of frequency of reinforcement. J. Exp. Anal. Behav. 4, 267–272 (1961).
Ishii, S., Yoshida, W. & Yoshimoto, J. Control of exploitation-exploration meta-parameter in reinforcement learning. Neural Netw. 15, 665–687 (2002).
Yeomans, J. & Baptista, M. Both nicotinic and muscarinic receptors in ventral tegmental area contribute to brain-stimulation reward. Pharmacol. Biochem. Behav. 57, 915–921 (1997).
Serreau, P., Chabout, J., Suarez, S.V., Naudé, J. & Granon, S. Beta2-containing neuronal nicotinic receptors as major actors in the flexible choice between conflicting motivations. Behav. Brain Res. 225, 151–159 (2011).
Krugel, L.K., Biele, G., Mohr, P.N., Li, S.-C. & Heekeren, H.R. Genetic variation in dopaminergic neuromodulation influences the ability to rapidly and flexibly adapt decisions. Proc. Natl. Acad. Sci. USA 106, 17951–17956 (2009).
Niv, Y., Edlund, J.A., Dayan, P. & O'Doherty, J.P. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain. J. Neurosci. 32, 551–562 (2012).
Balasubramani, P.P., Chakravarthy, V.S., Ravindran, B. & Moustafa, A.A. An extended reinforcement learning model of basal ganglia to understand the contributions of serotonin and dopamine in risk-based decision making, reward prediction, and punishment learning. Front. Comput. Neurosci. 8, 47 (2014).
Granon, S., Faure, P. & Changeux, J.-P. Executive and social behaviors under nicotinic receptor regulation. Proc. Natl. Acad. Sci. USA 100, 9596–9601 (2003).
Picciotto, M.R. et al. Abnormal avoidance learning in mice lacking functional high-affinity nicotine receptor in the brain. Nature 374, 65–67 (1995).
Maubourguet, N., Lesne, A., Changeux, J.-P., Maskos, U. & Faure, P. Behavioral sequence analysis reveals a novel role for β2* nicotinic receptors in exploration. PLoS Comput. Biol. 4, e1000229 (2008).
Gordon, G., Fonio, E. & Ahissar, E. Emergent exploration via novelty management. J. Neurosci. 34, 12646–12661 (2014).
Payzan-LeNestour, E. & Bossaerts, P. Risk, unexpected uncertainty and estimation uncertainty: Bayesian learning in unstable settings. PLoS Comput. Biol. 7, e1001048 (2011).
Redgrave, P. & Gurney, K. The short-latency dopamine signal: a role in discovering novel actions? Nat. Rev. Neurosci. 7, 967–975 (2006).
Bromberg-Martin, E.S. & Hikosaka, O. Midbrain dopamine neurons signal preference for advance information about upcoming rewards. Neuron 63, 119–126 (2009).
Rice, M.E. & Cragg, S.J. Nicotine amplifies reward-related dopamine signals in striatum. Nat. Neurosci. 7, 583–584 (2004).
Addicott, M.A., Pearson, J.M., Wilson, J., Platt, M.L. & McClernon, F.J. Smoking and the bandit: a preliminary study of smoker and nonsmoker differences in exploratory behavior measured with a multiarmed bandit task. Exp. Clin. Psychopharmacol. 21, 66–73 (2013).
Galván, A. et al. Greater risk sensitivity of dorsolateral prefrontal cortex in young smokers than in nonsmokers. Psychopharmacology (Berl.) 229, 345–355 (2013).
Paxinos, G. & Franklin, K.B. The Mouse Brain in Stereotaxic Coordinates (Gulf Professional Publishing, 2004).
Grace, A.A. & Bunney, B.S. Intracellular and extracellular electrophysiology of nigral dopaminergic neurons--1. Identification and characterization. Neuroscience 10, 301–315 (1983).
Rokosik, S.L. & Napier, T.C. Intracranial self-stimulation as a positive reinforcer to study impulsivity in a probability discounting paradigm. J. Neurosci. Methods 198, 260–269 (2011).
D'Acremont, M. & Bossaerts, P. Neurobiological studies of risk assessment: a comparison of expected utility and mean-variance approaches. Cogn. Affect. Behav. Neurosci. 8, 363–374 (2008).
Behrens, T.E.J., Woolrich, M.W., Walton, M.E. & Rushworth, M.F.S. Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221 (2007).
Daw, N.D. Trial-by-trial data analysis using computational models. in Decision Making, Affect, and Learning: Attention and Performance XXIII (eds. Delgado, M.R., Phelps, E.A. & Robbins, T.W.) 3–38 (2011).
McClure, S.M., Daw, N.D. & Montague, P.R. A computational substrate for incentive salience. Trends Neurosci. 26, 423–428 (2003).
Acknowledgements
We thank E. Guigon for discussions, C. Prévost-Solié for technical support, and J.-P. Changeux, E. Ey, G. Dugué and A. Boo for comments on the manuscript. This work was supported by the Centre National de la Recherche Scientifique CNRS UMR 8246, the University Pierre et Marie Curie (Programme Emergence 2012 for J.N. and P.F.), the Agence Nationale pour la Recherche (ANR Programme Blanc 2012 for P.F., ANR JCJC to A.M.), the Neuropole de Recherche Francilien (NeRF) of Ile de France, the Foundation for Medical Research (FRM, Equipe FRM DEQ2013326488 to P.F.), the Bettencourt Schueller Foundation (Coup d'Elan 2012 to P.F.), the Ecole des Neurosciences de Paris (ENP) to P.F., the Fondation pour la Recherche sur le Cerveau (FRC et les rotariens de France, “espoir en tête” 2012) to P.F. and the Brain & Behavior Research Foundation for a NARSAD Young Investigator Grant to A.M. The laboratories of P.F. and U.M. are part of the École des Neurosciences de Paris Ile-de-France RTRA network. P.F. and U.M. are members of the Laboratory of Excellence, LabEx Bio-Psy, and P.F. is a member of the DHU Pepsy.
Contributions
J.N. and P.F. designed the study. S.T. and J.N. performed the virus injections. M.D., N.T., G.R. and J.N. performed the behavioral experiments. S.V. and F.M. performed the electrophysiological recordings. S.P. and U.M. provided the genetic tools. J.N., F.M. and P.F. analyzed the data. J.N., A.M. and P.F. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Analysis of locomotion in the ICSS-based bandit task
(a) Dwell times (see Methods) were shorter in the certain setting (CS) than in the uncertain setting (US) (T(18)=3.67, p=0.002, paired t-test). In the US, the reward probability of the target had no effect on dwell times (F(2,18)=0.2, p=0.82, one-way ANOVA). (b) Variation of the instantaneous speed in the CS: the maximal speed of WT mice depended on the ICSS intensity (F(2,18)=13.2, p<0.001, one-way ANOVA), in contrast to what was observed in the US with different probabilities of reward (see Fig. 1e).
Supplementary Figure 2 Comparison of models of decision-making and locomotion
(a) Bayesian Information Criterion (BIC) computed for three classical models of action selection that ignore uncertainty (matching law, epsilon-greedy, softmax; see Methods) and two alternative models that take uncertainty into account (softmax with an uncertainty bonus, or softmax with an uncertainty-modulated temperature parameter; see Methods). The smaller BIC value of the uncertainty-bonus model indicates that it provided a better fit. (b) BIC derived from multiple linear regression (see Methods) for exploratory locomotion models embedding an increasing number of explanatory variables. The red star and crosses indicate the winning model, which incorporates the reward history (R(t-1)) and the expected value (E(R)) and uncertainty (σ²(R)) of the chosen goal (indexed A), but not those of the alternative goal (indexed B).
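The model comparison described in this caption can be illustrated with a minimal sketch. The Methods are not reproduced on this page, so everything below is an assumption made for illustration rather than the authors' implementation: uncertainty is taken as the Bernoulli reward variance p(1 − p), the bonus weight is written phi (ϕ), beta (β) is treated as an inverse temperature, and the function names are hypothetical. The BIC formula itself is the standard one, k·ln(n) − 2·ln(L).

```python
import math


def bernoulli_variance(p):
    """Variance of a Bernoulli reward with probability p (assumed uncertainty measure)."""
    return p * (1.0 - p)


def softmax_uncertainty_bonus(probs, beta, phi):
    """Choice probabilities for a softmax rule with an additive uncertainty bonus.

    probs: reward probability of each available option
    beta:  inverse temperature (assumption: larger values mean more deterministic choice)
    phi:   weight of the uncertainty bonus (phi = 0 gives a plain softmax)
    """
    utilities = [p + phi * bernoulli_variance(p) for p in probs]
    # Subtract the maximum before exponentiating for numerical stability.
    scaled = [beta * u for u in utilities]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian Information Criterion; smaller values indicate a better fit."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical gamble between a 50% option and a 100% option.
plain = softmax_uncertainty_bonus([0.5, 1.0], beta=2.0, phi=0.0)
bonus = softmax_uncertainty_bonus([0.5, 1.0], beta=2.0, phi=2.0)
```

In this toy gamble, a positive phi raises the probability of choosing the uncertain 50% option relative to the plain softmax, which is the qualitative signature of uncertainty-seeking that such a model is meant to capture.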
Supplementary Figure 3 Robustness of the parameters derived from the decision-making model
(a) Comparison between transitions in the last two sessions (#9-10), displayed in the main part of the results, and transitions measured two sessions earlier (#7-8). Transitions from these two data sets were not significantly different for any gamble (G1: T(18)=0.44, p=0.67; G2: T(18)=-1.36, p=0.19; G3: T(18)=-1.64, p=0.12, paired t-tests), indicating that the results are stable across sessions and that the mice's decisions had reached a steady state in this setting. (b, c, d) Proportions of exploitative choices (choice of the most valuable alternative) of the mice for the three gambles in different sets of reward probabilities: {25%, 50%, 75%} (b); {50%, 75%, 100%} (c); {25%, 75%, 100%} (d). (e) Parameters (ϕ and β) derived from the model-based analysis (uncertainty model) of the transition functions, for the probabilities used in the main text (black) and in the present panels (b, green; c, purple; d, light blue). In each case the uncertainty-seeking parameter was significantly positive, showing that the parameters derived from the model provide a robust characterization (across probability sets) of the influence of uncertainty on the decision-making process.
Supplementary Figure 4 β2*-nAChRs are not involved in motivation by certain rewards
(a) Learning of the task in the deterministic setting (DS), with performance increasing across learning sessions for both groups (session effect: F=12.16, p<0.001) and no difference between β2KO and WT mice (genotype effect: F=0.04, p=0.84; genotype x session interaction: F=0.99, p=0.45, two-way ANOVA). (b) In the DS, the rate of ICSS behavior (number of ICSS per minute) scaled with the intensity of current pulses up to a plateau for both groups (intensity effect: p<0.001, two-way ANOVA). When tested with different intensities of current pulses, β2KO mice performed the task with the same level of performance (genotype effect: F=1.22, p=0.27; genotype x intensity interaction: F=0.73, p=0.74, two-way ANOVA). (c) In the DS, when the ICSS intensity increases (I(ICSS) = {40,80,120} µA), the speed profile of β2KO mice is affected, with higher maximal speed at higher intensities of ICSS (F(2,10)=36.35, p<0.001, one-way ANOVA). (d) Maximal speeds corresponding to the three ICSS intensities ({40,80,120} µA) did not differ significantly between β2KO and WT mice (genotype effect: F=0.86, p=0.36; genotype x intensity interaction: F=0.16, p=0.86). Note that the values given in (d) do not correspond to the peaks in the speed profiles because the maximum of the average speed profile does not necessarily correspond to the average of the maximal speeds.
Supplementary Figure 5 Analysis of locomotion in β2KO and β2VEC mice
(a) The speed profile of β2KO mice was not significantly modified by the reward probability of the target (F(2,10)=0.08, p=0.93). (b) β2KO mice travelled the same distance whatever the target probability (F(2,10)=0.14, p=0.87); hence, the relation between the reward probability of the target place and the cumulative distance travelled was altered in β2KO mice. (c) The speed profiles of β2VEC mice were similar irrespective of the probability of the next reward (F(2,11)=0.21, p=0.81). (d) When going towards less likely ICSS, β2VEC mice tended to travel more (F(2,11)=6.2, p=0.005), showing that β2 nAChRs in the VTA are sufficient to restore the balance between exploiting the task and exploring the open field.
Supplementary Figure 6 Additional measures of restoration of functional β2*-nAChRs by the lentiviral injection
(a-d) Example of a recorded neuron: (a) Neurobiotin, (b) eGFP and (c) tyrosine hydroxylase identify, respectively, DA cells (green), the neuron re-expressing the β2 subunit (red), and a recorded cell (blue). eGFP, enhanced green fluorescent protein. (e) Mean ± s.e.m. increase in DA cell firing frequency after injection of 30 µg/kg nicotine, in WT (n=46, gray), β2KO (n=20, red) and β2VEC (n=45, black) mice. (f) Same for the proportion of spikes within bursts (%SWB). The vertical dashed bar indicates the nicotine injection.
Supplementary Figure 7 Model comparison and robustness in β2KO and β2VEC mice
(a,b) Bayesian Information Criterion (BIC) computed using the four models of action selection (matching law, epsilon-greedy, softmax, softmax with an uncertainty bonus; see Methods) for (a) β2KO mice and (b) β2VEC mice. In each case, the uncertainty model yielded the smallest BIC, which indicates the best fit. (c, d, e) Proportions of exploitative choices (choice of the most valuable alternative) of β2KO mice for the three gambles in different sets of reward probabilities: {25%, 50%, 75%} (c); {50%, 75%, 100%} (d); {25%, 75%, 100%} (e). (f) Parameters derived from the model-based analysis (uncertainty model) of the transition functions of β2KO mice, for the probabilities used in the main text (black) and in the present panels (c, green; d, purple; e, light blue). The model parameters did not significantly differ between probability sets (for ϕ, F(3,37)=0.32, p=0.81; for β, F(3,37)=0.26, p=0.85).
Supplementary Figure 8 Learning phase in the probabilistic task: experimental data and model comparison
(a,b) Evolution of the proportion of choices of the three rewarded locations in the uncertain setting, across the learning sessions, for WT (a) and β2KO (b) mice. (c,d) Difference in Bayesian information criterion (compared to the standard RL model) of models including an expected uncertainty bonus (“uncertainty”), an adaptive learning rate (“adaptive LR”) and an unexpected uncertainty bonus, for WT (c) and β2KO (d) mice. (e,f) Model fits of the experimental data shown in (a,b) for the winning models, i.e. expected uncertainty for WT mice, and standard model for β2KO mice.
Supplementary Figure 9 Model comparison in the dynamic foraging task
(a) Computational models of reinforcement learning and decision-making used to analyze the behavioral data, summarizing whether sensitivity to uncertain outcomes arises from learning, decision, or both processes. (b,c) Bayesian Information Criterion (BIC) for the standard reinforcement-learning model and alternative models: standard model with asymmetric learning rates (L) for positive and negative outcomes; uncertainty model with a single learning rate for value and uncertainty (bonus); uncertainty model with separate learning rates for value and uncertainty; uncertainty model with three learning rates (for positive outcomes, negative outcomes, and uncertainty). A smaller BIC value indicates a better fit. The best-fitting model was the uncertainty model with separate learning rates for value and uncertainty for WT mice (b) and the standard reinforcement-learning model for β2KO mice (c).
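As a rough illustration of what an uncertainty model "with separate learning rates for value and uncertainty" could look like, the sketch below uses delta-rule updates in which the value tracks the reward and the uncertainty tracks the squared prediction error. This is one common formulation chosen here as an assumption; the exact equations belong to the Methods, which are not part of this page, and the names update_value_and_uncertainty, alpha_value and alpha_unc are hypothetical.

```python
def update_value_and_uncertainty(value, uncertainty, reward, alpha_value, alpha_unc):
    """One trial of a delta-rule learner that tracks both value and uncertainty.

    Assumed update rules (illustrative, not taken from the paper):
      value       <- value + alpha_value * (reward - value)
      uncertainty <- uncertainty + alpha_unc * ((reward - value)**2 - uncertainty)
    The prediction error is computed once, from the value held before the update.
    """
    prediction_error = reward - value
    new_value = value + alpha_value * prediction_error
    new_uncertainty = uncertainty + alpha_unc * (prediction_error ** 2 - uncertainty)
    return new_value, new_uncertainty


def run_trials(rewards, alpha_value, alpha_unc, value=0.0, uncertainty=0.0):
    """Apply the update over a sequence of outcomes (1 = rewarded, 0 = not rewarded)."""
    for reward in rewards:
        value, uncertainty = update_value_and_uncertainty(
            value, uncertainty, reward, alpha_value, alpha_unc
        )
    return value, uncertainty


# Hypothetical alternating outcomes, roughly mimicking a 50% rewarded location.
final_value, final_uncertainty = run_trials([1, 0] * 50, alpha_value=0.1, alpha_unc=0.1)
```

Setting alpha_unc equal to alpha_value recovers a single-learning-rate variant, and splitting alpha_value by the sign of the prediction error would give an asymmetric variant, which is how the model families listed in the caption relate to one another under these assumptions.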
Supplementary Figure 10 Alternative models for the spatial learning and passive avoidance tasks
(a) Variations of the temperature parameter (β) in the simulation of the spatial learning task using the standard reinforcement-learning model. Original experimental data are represented (mean ± s.e.m.) by dots (black for WT, red for β2KO). The curves represent the modeling of the data with an increased value of β (from top to bottom, black to dark blue). (b) Variations of the initial value (V0) of the rewarding arm in the simulation of the spatial learning task using the standard reinforcement-learning model. Same presentation as (a). (c) Variations of the learning rate (α) in the simulation of the spatial learning task using the standard reinforcement-learning model. Same presentation as (a). (d) In the simulation of the spatial learning task using the standard reinforcement-learning model, combined modifications of initial value and learning rate hardly explain the WT data. Data are shown as dots with error bars (mean + s.e.m.), simulations as stripes. (e) Variations of the temperature parameter (β) in the simulation of the passive avoidance task using a sequential reinforcement-learning model. Same presentation as (a). (f) Variations of the baseline activity (θ) in the simulation of the passive avoidance task using a sequential reinforcement-learning model. Same presentation as (a). Data in (a-d) adapted with permission from Ref. 42. Data in (e,f) adapted with permission from Ref. 43.
Supplementary Figure 11 Model simulation: open-fields without rewards and object recognition
(a) Decomposition of behavior in an open field. Locomotion in the open field is transformed into four states, resulting from the distinction between active (A) and inactive (I) states (depending on the velocity) and between periphery (P) and center (C) zones. (b) Discretized representation of the behavior based on the four-state decomposition, used for model simulation. Possible transitions are represented by plain arrows and forbidden transitions by dashed arrows. (c,d) Simulation of transition probabilities between “center-active” (CA) and “center-inactive” (CI) states (c), and between “periphery-active” (PA) and “center-active” states (d), for WT (black, model with uncertainty bonus) and β2KO (red, model without uncertainty bonus) mice. (e) Simulation of total time spent in inactive states (PI and CI) for WT (black) and β2KO (red) mice. (f) Object recognition in an open field. Two states represent the object areas; the rest of the open field is modeled as 25 discrete states. (g) Total time spent in the “object areas” states for WT (black, model with uncertainty bonus) and β2KO (red, model without uncertainty bonus) mice. Data in (c-e) adapted with permission from Ref. 13. Data in (g) adapted with permission from Ref. 42.
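The four-state decomposition in panel (a) can be sketched as follows. The two-letter labels (CA, CI, PA, PI) follow the caption, but the thresholds and function names are hypothetical: the caption states only that activity depends on velocity and that the zones are periphery and center, so the speed cut-off and center radius below are placeholders, and a circular center zone is assumed for simplicity.

```python
from collections import Counter, defaultdict


def classify_state(speed, distance_from_center, speed_threshold, center_radius):
    """Map one tracking sample to one of the four states CA, CI, PA, PI.

    speed_threshold and center_radius are illustrative parameters, not values
    from the paper.
    """
    zone = "C" if distance_from_center <= center_radius else "P"
    activity = "A" if speed >= speed_threshold else "I"
    return zone + activity


def transition_probabilities(states):
    """Empirical probability of moving from each state to a different state.

    Consecutive repeats of the same state are collapsed first, so only actual
    state changes are counted.
    """
    collapsed = [s for i, s in enumerate(states) if i == 0 or s != states[i - 1]]
    counts = defaultdict(Counter)
    for source, target in zip(collapsed, collapsed[1:]):
        counts[source][target] += 1
    return {
        source: {target: n / sum(targets.values()) for target, n in targets.items()}
        for source, targets in counts.items()
    }


# Hypothetical samples as (speed, distance from center) pairs.
samples = [(5.0, 10.0), (0.5, 12.0), (0.2, 12.0), (6.0, 15.0), (7.0, 40.0)]
sequence = [classify_state(s, d, speed_threshold=2.0, center_radius=20.0) for s, d in samples]
probabilities = transition_probabilities(sequence)
```

Transition probabilities estimated this way are the kind of quantity that panels (c) and (d) compare between simulated WT and β2KO mice.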
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–11 (PDF 2272 kb)
Supplementary Methods Checklist
(PDF 494 kb)
Cite this article
Naudé, J., Tolu, S., Dongelmans, M. et al. Nicotinic receptors in the ventral tegmental area promote uncertainty-seeking. Nat Neurosci 19, 471–478 (2016). https://doi.org/10.1038/nn.4223
DOI: https://doi.org/10.1038/nn.4223