Nicotinic receptors in the ventral tegmental area promote uncertainty-seeking


Cholinergic neurotransmission affects decision-making, notably through the modulation of perceptual processing in the cortex. In addition, acetylcholine acts on value-based decisions through as yet unknown mechanisms. We found that nicotinic acetylcholine receptors (nAChRs) expressed in the ventral tegmental area (VTA) are involved in the translation of expected uncertainty into motivational value. We developed a multi-armed bandit task for mice with three locations, each associated with a different reward probability. We found that mice lacking the nAChR β2 subunit showed less uncertainty-seeking than their wild-type counterparts. Using model-based analysis, we found that reward uncertainty motivated wild-type mice, but not mice lacking the nAChR β2 subunit. Selective re-expression of the β2 subunit in the VTA was sufficient to restore spontaneous bursting activity in dopamine neurons and uncertainty-seeking. Our results reveal an unanticipated role for subcortical nAChRs in motivation induced by expected uncertainty and provide a parsimonious account for a wealth of behaviors related to nAChRs in the VTA expressing the β2 subunit.

Figure 1: Decisions under uncertainty in a mouse bandit task using intracranial self-stimulations.
Figure 2: Model-based analysis of decisions shows motivation for expected uncertainty.
Figure 3: β2*-nAChRs in the VTA affect choices and locomotion.
Figure 4: Model-based analysis reveals a role for VTA β2-nAChR in uncertainty-driven motivation.
Figure 5: β2*-nAChRs affect decision-making under uncertainty in a dynamical foraging task.
Figure 6: New interpretation of behaviors related to VTA nAChRs using the uncertainty model.

We thank E. Guigon for discussions, C. Prévost-Solié for technical support, and J.-P. Changeux, E. Ey, G. Dugué and A. Boo for comments on the manuscript. This work was supported by the Centre National de la Recherche Scientifique CNRS UMR 8246, the University Pierre et Marie Curie (Programme Emergence 2012 for J.N. and P.F.), the Agence Nationale pour la Recherche (ANR Programme Blanc 2012 for P.F., ANR JCJC to A.M.), the Neuropole de Recherche Francilien (NeRF) of Ile de France, the Foundation for Medical Research (FRM, Equipe FRM DEQ2013326488 to P.F.), the Bettencourt Schueller Foundation (Coup d'Elan 2012 to P.F.), the Ecole des Neurosciences de Paris (ENP) to P.F., the Fondation pour la Recherche sur le Cerveau (FRC et les rotariens de France, “espoir en tête” 2012) to P.F. and the Brain & Behavior Research Foundation for a NARSAD Young Investigator Grant to A.M. The laboratories of P.F. and U.M. are part of the École des Neurosciences de Paris Ile-de-France RTRA network. P.F. and U.M. are members of the Laboratory of Excellence, LabEx Bio-Psy, and P.F. is a member of the DHU Pepsy.

J.N. and P.F. designed the study. S.T. and J.N. performed the virus injections. M.D., N.T., G.R. and J.N. performed the behavioral experiments. S.V. and F.M. performed the electrophysiological recordings. S.P. and U.M. provided the genetic tools. J.N., F.M. and P.F. analyzed the data. J.N., A.M. and P.F. wrote the manuscript.

Correspondence to Philippe Faure.

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Analysis of locomotion in the ICSS-based bandit task

(a) Dwell times (see Methods) were shorter (T(18)=3.67, p=0.002, paired t-test) in the certain setting (CS) than in the uncertain setting (US). In the US, the reward probability of the target had no effect on dwell times (F(2,18)=0.2, p=0.82, one-way ANOVA). (b) Variation of the instantaneous speed in the certain setting: in the CS, the maximal speed of WT mice depended on the ICSS intensity (F(2,18)=13.2, p<0.001, one-way ANOVA), in contrast to what was observed in the US with different reward probabilities (see Fig. 1e).

Supplementary Figure 2 Comparison of models of decision-making and locomotion

(a) Bayesian Information Criterion (BIC) computed using three classical models of action selection that ignore uncertainty (matching law, epsilon-greedy, softmax; see Methods) and two alternative models that take uncertainty into account (softmax with an uncertainty bonus, or softmax with an uncertainty-modulated temperature parameter; see Methods). A smaller BIC value indicates a better fit; the softmax model with an uncertainty bonus provided the best fit. (b) BIC derived from multiple linear regression (see Methods) for exploratory locomotion models embedding an increasing number of explanatory variables. The red star and crosses indicate the winning model, which incorporates the reward history (R(t-1)) and the expected value (E(R)) and uncertainty (σ²(R)) of the chosen goal (indexed A), but not those of the alternative goal (indexed B).
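As a rough illustration of the comparison in (a), the sketch below computes choice probabilities under a softmax rule with an uncertainty bonus, together with the BIC of a fitted model. It assumes Bernoulli rewards, so that expected uncertainty is the variance p(1-p). The function names, the parameter names `phi` and `beta`, and the convention that `beta` multiplies the values (an inverse temperature) are illustrative assumptions, not the authors' implementation (see Methods for the actual models).

```python
import math

def softmax_uncertainty(probs, phi, beta):
    """Choice probabilities when each option is valued as p + phi * p * (1 - p).

    probs: reward probability of each available option (Bernoulli rewards assumed).
    phi:   weight of the uncertainty bonus (phi = 0 gives a plain softmax).
    beta:  inverse temperature (assumed convention: larger = more deterministic).
    """
    values = [p + phi * p * (1.0 - p) for p in probs]
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - top)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L); smaller is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# A gamble between a 50% option and a 100% option.
plain = softmax_uncertainty([0.5, 1.0], phi=0.0, beta=3.0)
bonus = softmax_uncertainty([0.5, 1.0], phi=1.0, beta=3.0)
```

With a positive `phi`, the 50% option (the only one with non-zero variance here) is chosen more often than under the plain softmax, which is the qualitative signature of uncertainty-seeking.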

Supplementary Figure 3 Robustness of the parameters derived from the decision-making model

(a) Comparison between transitions in the last two sessions (#9-10) displayed in the main part of the results, versus transitions measured two sessions before (#7-8). Transitions from these two data sets were not significantly different for any of the gambles (G1: T(18)=0.44, p=0.67; G2: T(18)=-1.36, p=0.19; G3: T(18)=-1.64, p=0.12, paired t-tests), indicating that the results are stable across sessions and that the mice's decisions had reached a steady state in this setting. (b, c, d) Proportions of exploitative choices (choice of the most valuable alternative) of the mice for the three gambles in different sets of reward probabilities: {25%, 50%, 75%} (b); {50%, 75%, 100%} (c); {25%, 75%, 100%} (d). (e) Parameters (ϕ and β) derived from the model-based analysis (uncertainty model) of the transition functions, for the probabilities used in the main text (black) and in the present panels (b, green; c, purple; d, light blue). In each case the uncertainty-seeking parameter was significantly positive, showing that the parameters derived from the model provide a robust characterization (across probability sets) of the influence of uncertainty on the decision-making process.

Supplementary Figure 4 β2*-nAChRs are not involved in motivation by certain rewards

(a) Learning of the task in the deterministic setting (DS), with performance increasing across learning sessions for both groups (session effect: F=12.16, p<0.001) and no difference between β2KO and WT mice (genotype effect: F=0.04, p=0.84, genotype x session interaction: F=0.99, p=0.45, two-way ANOVA). (b) In the DS, the rate of ICSS behavior (number of ICSS per minute) scaled with the intensity of current pulses up to a plateau for both groups (intensity effect: p<0.001, two-way ANOVA). When tested with different intensities of current pulses, β2KO mice performed the task at the same level as WT mice (genotype effect: F=1.22, p=0.27, genotype x intensity interaction: F=0.73, p=0.74, two-way ANOVA). (c) In the DS, when the ICSS intensity increased (I(ICSS) = {40,80,120} µA), the speed profile of β2KO mice was affected, with higher maximal speed (F(2,10)=36.35, p<0.001, one-way ANOVA) at higher ICSS intensities. (d) Maximal speeds corresponding to the three ICSS intensities ({40,80,120} µA) did not differ significantly between β2KO and WT mice (genotype effect: F=0.86, p=0.36; genotype x intensity interaction: F=0.16, p=0.86). Note that the values given in (d) do not correspond to the peaks in the speed profiles because the maximum of the average speed profile does not necessarily correspond to the average of the maximal speeds.
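The closing note in (d) can be checked with a toy example. The sketch below uses two hypothetical speed profiles (made-up numbers, arbitrary units, not data from the study) that peak at different times, and compares the peak of the average profile with the average of the per-trial peaks.

```python
def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)

# Two hypothetical speed profiles (arbitrary units) that peak at different times.
trial_1 = [0.0, 10.0, 2.0, 0.0]
trial_2 = [0.0, 2.0, 10.0, 0.0]
trials = [trial_1, trial_2]

# Average speed profile: mean across trials at each time point.
average_profile = [mean(samples) for samples in zip(*trials)]
peak_of_average = max(average_profile)

# Average of the per-trial maximal speeds.
average_of_peaks = mean([max(trial) for trial in trials])
```

Because the peaks are not aligned in time, averaging the profiles first flattens them: here the peak of the average profile is 6.0 while the average of the maximal speeds is 10.0. The peak of the average can never exceed the average of the peaks.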

Supplementary Figure 5 Analysis of locomotion in β2KO and β2VEC mice

(a) The speed profile of β2KO mice was not significantly modified by the reward probability of the target (F(2,10)=0.08, p=0.93). (b) β2KO mice travelled the same distance whatever the target probability (F(2,10)=0.14, p=0.87), hence the relation between the reward probability of the target place and the cumulative distance travelled was altered in β2KO mice. (c) The speed profiles of β2VEC mice were similar irrespective of the probability of the next reward (F(2,11)=0.21, p=0.81). (d) When going towards less likely ICSS, β2VEC mice tended to travel more (F(2,11)=6.2, p=0.005), showing that β2 nAChRs in the VTA are sufficient to restore the balance between exploiting the task and exploring the open field.

Supplementary Figure 6 Additional measures of restoration of functional β2*-nAChRs by the lentiviral injection

(a-d) Example of a recorded neuron: (a) Neurobiotin, (b) eGFP and (c) tyrosine hydroxylase identify, respectively, DA cells (green), the neuron re-expressing the β2 subunit (red), and a recorded cell (blue). eGFP, enhanced green fluorescent protein. (e) Mean ± s.e.m. increase in DA cell firing frequency after injection of 30 µg/kg nicotine in WT (n=46, gray), β2KO (n=20, red) and β2VEC (n=45, black) mice. (f) Same for the proportion of spikes within bursts (%SWB). The vertical dashed bar indicates the nicotine injection.

Supplementary Figure 7 Model comparison and robustness in β2KO and β2VEC mice

(a,b) Bayesian Information Criterion (BIC) computed using the four models of action selection (matching law, epsilon-greedy, softmax, softmax with an uncertainty bonus, see Methods) for (a) β2KO mice, (b) β2VEC mice. In each case, the uncertainty model provided smaller BIC, which indicates better fit. (c, d, e) Proportions of exploitative choices (choice of the most valuable alternative) of β2KO mice for the three gambles in different sets of reward probabilities: {25%, 50%, 75%} (c); {50%, 75%, 100%} (d); {25%, 75%, 100%} (e). (f) Parameters derived from the model-based analysis (uncertainty model) of the transition functions of β2KO mice, for the probabilities used in the main text (black) and in the present panels (b, green; c, purple; d, light blue). The model parameters did not significantly differ between probability sets (for ϕ, F(3,37)=0.32, p=0.81; for β, F(3,37)=0.26, p=0.85).

Supplementary Figure 8 Learning phase in the probabilistic task: experimental data and model comparison

(a,b) Evolution of the proportion of choices of the three rewarded locations in the uncertain setting, across the learning sessions, for WT (a) and β2KO (b) mice. (c,d) Difference in Bayesian information criterion (compared to the standard RL model) of models including an expected uncertainty bonus (“uncertainty”), an adaptive learning rate (“adaptive LR”) and an unexpected uncertainty bonus, for WT (c) and β2KO (d) mice. (e,f) Model fits of the experimental data shown in (a,b) for the winning models, i.e. expected uncertainty for WT mice, and standard model for β2KO mice.

Supplementary Figure 9 Model comparison in the dynamic foraging task

(a) Computational models of reinforcement-learning and decision-making used to analyze the behavioral data, summarizing whether sensitivity to uncertain outcomes arises from learning, decision, or both processes. (b,c) Bayesian Information Criterion (BIC) for the standard reinforcement learning model and alternative models: standard model with asymmetric learning (L) rates for positive and negative outcomes, uncertainty model with a single learning rate for value and uncertainty (bonus), uncertainty model with separate learning rates for value and uncertainty, uncertainty model with three learning rates (for positive and negative outcomes, and for uncertainty). A smaller BIC value indicates a better fit: the best-fitting model was the uncertainty model with separate learning rates for value and uncertainty for WT mice (b) and the standard reinforcement learning model for β2KO mice (c).
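To make the idea of "separate learning rates for value and uncertainty" concrete, here is a minimal sketch of one possible update rule. It assumes a delta rule for the value and, as a further assumption, lets the uncertainty estimate track the squared reward prediction error. The names `update`, `run`, `alpha_value` and `alpha_uncertainty` are hypothetical, and the authors' actual equations are given in their Methods.

```python
def update(value, uncertainty, reward, alpha_value, alpha_uncertainty):
    """One trial of learning for the visited option."""
    delta = reward - value  # reward prediction error
    new_value = value + alpha_value * delta
    # Assumption: uncertainty tracks the squared prediction error.
    new_uncertainty = uncertainty + alpha_uncertainty * (delta ** 2 - uncertainty)
    return new_value, new_uncertainty

def run(rewards, alpha_value=0.1, alpha_uncertainty=0.1):
    """Apply the update over a sequence of rewards, starting from zero estimates."""
    value, uncertainty = 0.0, 0.0
    for reward in rewards:
        value, uncertainty = update(
            value, uncertainty, reward, alpha_value, alpha_uncertainty
        )
    return value, uncertainty

# An always-rewarded option versus one rewarded on every other visit.
sure_value, sure_uncertainty = run([1.0] * 200)
risky_value, risky_uncertainty = run([1.0, 0.0] * 100)
```

Under these assumptions the always-rewarded option ends with a value near 1 and an uncertainty near 0, whereas the option rewarded half of the time keeps a value near 0.5 and a clearly positive uncertainty estimate. Setting both learning rates equal recovers the single-learning-rate variant.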

Supplementary Figure 10 Alternative models for the spatial learning and passive avoidance tasks

(a) Variations of the temperature parameter (β) in the simulation of the spatial learning task using the standard reinforcement-learning model. Original experimental data are represented (mean ± s.e.m.) by dots (black for WT, red for β2KO). The curves represent the modeling of the data with an increased value of β (from top to bottom, black to dark blue). (b) Variations of the initial value (V0) of the rewarding arm in the simulation of the spatial learning task using the standard reinforcement-learning model. Same presentation as (a). (c) Variations of the learning rate (α) in the simulation of the spatial learning task using the standard reinforcement-learning model. Same presentation as (a). (d) In the simulation of the spatial learning task using the standard reinforcement-learning model, combined modifications of initial value and learning rate hardly explain the WT data. Data are shown as dots with error bars (mean ± s.e.m.), simulations as stripes. (e) Variations of the temperature parameter (β) in the simulation of the passive avoidance task using a sequential reinforcement-learning model. Same presentation as (a). (f) Variations of the baseline activity (θ) in the simulation of the passive avoidance task using a sequential reinforcement-learning model. Same presentation as (a). Data in (a-d) adapted with permission from Ref. 42. Data in (e,f) adapted with permission from Ref. 43.

Supplementary Figure 11 Model simulation: open-fields without rewards and object recognition

(a) Decomposition of behavior in an open field. Locomotion in the open field is transformed into four states, resulting from the distinction between active (A) and inactive (I) states (depending on the velocity) and between periphery (P) and center (C) zones. (b) Discretized representation of the behavior based on the four-state decomposition, used for model simulation. Possible transitions are represented by plain arrows and forbidden transitions by dashed arrows. (c,d) Simulation of transition probabilities between “center-active” (CA) and “center-inactive” (CI) states (c), and between “periphery-active” (PA) and “center-active” (CA) states (d), for WT (black, model with uncertainty bonus) and β2KO (red, model without uncertainty bonus) mice. (e) Simulation of total time spent in inactive states (PI and CI) for WT (black) and β2KO (red) mice. (f) Object recognition in an open field. Two states represent the object areas; the rest of the open field is modeled as 25 discrete states. (g) Total time spent in the “object areas” states for WT (black, model with uncertainty bonus) and β2KO (red, model without uncertainty bonus) mice. Data in (c-e) adapted with permission from Ref. 13. Data in (g) adapted with permission from Ref. 42.
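The four-state decomposition in (a) can be sketched as follows. This is only an illustration: the speed threshold, the circular definition of the center zone, the sample values, and the function names are all assumptions, and the sketch counts every observed transition without modeling the forbidden transitions shown in (b).

```python
from collections import Counter

def classify(speed, distance_from_center, speed_threshold, center_radius):
    """Label one tracking sample as PA, PI, CA or CI.

    Zone: center (C) if within center_radius of the arena center, else periphery (P).
    Activity: active (A) if speed is at or above speed_threshold, else inactive (I).
    """
    zone = "C" if distance_from_center <= center_radius else "P"
    activity = "A" if speed >= speed_threshold else "I"
    return zone + activity

def transition_probabilities(states):
    """Empirical probability of each observed transition between consecutive samples."""
    counts = Counter(zip(states, states[1:]))
    totals = Counter(states[:-1])
    return {(a, b): n / totals[a] for (a, b), n in counts.items()}

# Hypothetical samples: (speed, distance from center), arbitrary units.
samples = [(5.0, 30.0), (0.5, 30.0), (6.0, 30.0), (7.0, 5.0), (0.2, 4.0)]
states = [classify(s, d, speed_threshold=1.0, center_radius=10.0) for s, d in samples]
probabilities = transition_probabilities(states)
```

For these made-up samples the state sequence is PA, PI, PA, CA, CI, so the periphery-active state leads to periphery-inactive and to center-active with probability 0.5 each.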


Naudé, J., Tolu, S., Dongelmans, M. et al. Nicotinic receptors in the ventral tegmental area promote uncertainty-seeking. Nat Neurosci 19, 471–478 (2016).

