There is increased appreciation that dopamine neurons in the midbrain respond not only to reward1 and reward-predicting cues1,2, but also to other variables such as the distance to reward3, movements4,5,6,7,8,9 and behavioural choices10,11. An important question is how the responses to these diverse variables are organized across the population of dopamine neurons. Whether individual dopamine neurons multiplex several variables, or whether there are subsets of neurons that are specialized in encoding specific behavioural variables remains unclear. This fundamental question has been difficult to resolve because recordings from large populations of individual dopamine neurons have not been performed in a behavioural task with sufficient complexity to examine these diverse variables simultaneously. Here, to address this gap, we used two-photon calcium imaging through an implanted lens to record the activity of more than 300 dopamine neurons from the ventral tegmental area of the mouse midbrain during a complex decision-making task. As mice navigated in a virtual-reality environment, dopamine neurons encoded an array of sensory, motor and cognitive variables. These responses were functionally clustered, such that subpopulations of neurons transmitted information about a subset of behavioural variables, in addition to encoding reward. These functional clusters were spatially organized, with neighbouring neurons more likely to be part of the same cluster. Together with the topography between dopamine neurons and their projections, this specialization and anatomical organization may aid downstream circuits in correctly interpreting the wide range of signals transmitted by dopamine neurons.
The code for the encoding model and the motion correction is available on GitHub (https://github.com/benengx). All other code is available upon reasonable request.
All data are available from the corresponding author upon reasonable request.
Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B. & Uchida, N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
Howe, M. W., Tierney, P. L., Sandberg, S. G., Phillips, P. E. M. & Graybiel, A. M. Prolonged dopamine signalling in striatum signals proximity and value of distant rewards. Nature 500, 575–579 (2013).
Howe, M. W. & Dombeck, D. A. Rapid signalling in distinct dopaminergic axons during locomotion and reward. Nature 535, 505–510 (2016).
Barter, J. W. et al. Beyond reward prediction errors: the role of dopamine in movement kinematics. Front. Integr. Neurosci. 9, 39 (2015).
Dodson, P. D. et al. Representation of spontaneous movement by dopaminergic neurons is cell-type selective and disrupted in parkinsonism. Proc. Natl Acad. Sci. USA 113, E2180–E2188 (2016).
da Silva, J. A., Tecuapetla, F., Paixão, V. & Costa, R. M. Dopamine neuron activity before action initiation gates and invigorates future movements. Nature 554, 244–248 (2018).
Coddington, L. T. & Dudman, J. T. The timing of action determines reward prediction signals in identified midbrain dopamine neurons. Nat. Neurosci. 21, 1563–1573 (2018).
Kremer, Y., Flakowski, J., Rohner, C. & Lüscher, C. VTA dopamine neurons multiplex external with internal representations of goal-directed action. Preprint at https://www.bioRxiv.org/content/10.1101/408062v1 (2018).
Howard, C. D., Li, H., Geddes, C. E. & Jin, X. Dynamic nigrostriatal dopamine biases action selection. Neuron 93, 1436–1450.e8 (2017).
Parker, N. F. et al. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target. Nat. Neurosci. 19, 845–854 (2016).
Steinberg, E. E. et al. A causal link between prediction errors, dopamine neurons and learning. Nat. Neurosci. 16, 966–973 (2013).
Bayer, H. M. & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47, 129–141 (2005).
Lak, A., Nomoto, K., Keramati, M., Sakagami, M. & Kepecs, A. Midbrain dopamine neurons signal belief in choice accuracy during a perceptual decision. Curr. Biol. 27, 821–832 (2017).
Pinto, L. et al. An accumulation-of-evidence task using visual pulses for mice navigating in virtual reality. Front. Behav. Neurosci. 12, 36 (2018).
Barretto, R. P. J., Messerschmidt, B. & Schnitzer, M. J. In vivo fluorescence imaging with high-resolution microlenses. Nat. Methods 6, 511–512 (2009).
Carelli, R. M. Nucleus accumbens cell firing and rapid dopamine signaling during goal-directed behaviors in rats. Neuropharmacology 47 (Suppl. 1), 180–189 (2004).
Hamid, A. A. et al. Mesolimbic dopamine signals the value of work. Nat. Neurosci. 19, 117–126 (2016).
Kim, H. F., Ghazizadeh, A. & Hikosaka, O. Dopamine neurons encoding long-term memory of object value for habitual behavior. Cell 163, 1165–1175 (2015).
Slonim, N., Atwal, G. S., Tkacik, G. & Bialek, W. Information-based clustering. Proc. Natl Acad. Sci. USA 102, 18297–18302 (2005).
Cox, J., Pinto, L. & Dan, Y. Calcium imaging of sleep-wake related neuronal activity in the dorsal pons. Nat. Commun. 7, 10763 (2016).
Eshel, N., Tian, J., Bukwich, M. & Uchida, N. Dopamine neurons share common response function for reward prediction error. Nat. Neurosci. 19, 479–486 (2016).
Joshua, M. et al. Synchronization of midbrain dopaminergic neurons is enhanced by rewarding events. Neuron 62, 695–704 (2009).
Kim, Y., Wood, J. & Moghaddam, B. Coordinated activity of ventral tegmental neurons adapts to appetitive and aversive learning. PLoS ONE 7, e29766 (2012).
Pillow, J. W. et al. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature 454, 995–999 (2008).
Beier, K. T. et al. Circuit architecture of VTA dopamine neurons revealed by systematic input-output mapping. Cell 162, 622–634 (2015).
Lammel, S. et al. Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system. Neuron 57, 760–773 (2008).
Tsai, H.-C. et al. Phasic firing in dopaminergic neurons is sufficient for behavioral conditioning. Science 324, 1080–1084 (2009).
Surmeier, D. J., Ding, J., Day, M., Wang, Z. & Shen, W. D1 and D2 dopamine-receptor modulation of striatal glutamatergic signaling in striatal medium spiny neurons. Trends Neurosci. 30, 228–235 (2007).
Panigrahi, B. et al. Dopamine is required for the neural representation and control of movement vigor. Cell 162, 1418–1430 (2015).
Lammel, S. et al. Diversity of transgenic mouse models for selective targeting of midbrain dopamine neurons. Neuron 85, 429–438 (2015).
Daigle, T. L. et al. A suite of transgenic driver and reporter mouse lines with enhanced brain-cell-type targeting and functionality. Cell 174, 465–480.e22 (2018).
Dombeck, D. A., Khabbaz, A. N., Collman, F., Adelman, T. L. & Tank, D. W. Imaging large-scale neural activity with cellular resolution in awake, mobile mice. Neuron 56, 43–57 (2007).
Harvey, C. D., Coen, P. & Tank, D. W. Choice-specific sequences in parietal cortex during a virtual-navigation decision task. Nature 484, 62–68 (2012).
Low, R. J., Gu, Y. & Tank, D. W. Cellular resolution optical access to brain regions in fissures: imaging medial prefrontal cortex and grid cells in entorhinal cortex. Proc. Natl Acad. Sci. USA 111, 18739–18744 (2014).
Aronov, D. & Tank, D. W. Engagement of neural circuits underlying 2D spatial navigation in a rodent virtual reality system. Neuron 84, 442–456 (2014).
Pologruto, T. A., Sabatini, B. L. & Svoboda, K. ScanImage: flexible software for operating laser scanning microscopes. Biomed. Eng. Online 2, 13 (2003).
Sage, D. & Unser, M. Teaching image-processing programming in Java. IEEE Signal Process. Mag. 20, 43–52 (2003).
Chen, T.-W. et al. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature 499, 295–300 (2013).
Kerlin, A. M., Andermann, M. L., Berezovskii, V. K. & Reid, R. C. Broadly tuned response properties of diverse inhibitory neuron subtypes in mouse visual cortex. Neuron 67, 858–871 (2010).
Pinto, L. & Dan, Y. Cell-type-specific activity in prefrontal cortex during goal-directed behavior. Neuron 87, 437–450 (2015).
Fürth, D. et al. An interactive framework for whole-brain maps at cellular resolution. Nat. Neurosci. 21, 139–149 (2018).
Runyan, C. A., Piasini, E., Panzeri, S. & Harvey, C. D. Distinct timescales of population coding across cortex. Nature 548, 92–96 (2017).
Mereu, G. et al. Spontaneous bursting activity of dopaminergic neurons in midbrain slices from immature rats: role of N-methyl-d-aspartate receptors. Neuroscience 77, 1029–1036 (1997).
Lein, E. S. et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature 445, 168–176 (2007).
We thank J. Y. Choi, S. S. H. Wang, J. Pillow, D. Witten, L. Pinto, S. Bolkan, D. Lee, N. Engelhard, B. Deverett, A. Song, B. Briones and C. Brody, as well as the BRAINCOGS team, the Bezos Center, the McDonnell Center, and the Witten and Tank laboratories. We also thank E. Engel for reagents. Funding was from ELSC and EMBO (B.E.); NYSCF, Pew, McKnight, NARSAD, and Sloan Foundation (I.B.W.); ARO grants: W911NF-16-1-0474 (N.D.D.), W911NF-17-1-0554 (I.B.W.), and NIH grants U19 NS104648-01, DP2 DA035149-01, 1R01DAA047869-01 and 5R01MH106689-02 (I.B.W.). I.B.W. is a New York Stem Cell Foundation—Robertson Investigator.
Nature thanks Regina M. Carelli, Patricia Janak and the other anonymous reviewer(s) for their contribution to the peer review of this work.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Features of the virtual-reality task, encoding model predictions and selection of the encoding model.
a, Example screenshots of the virtual world presented to the mouse in different positions along the maze. b, Activity trace during six consecutive trials of an example neuron that was significantly modulated by position in the central stem (P = 0, statistical significance was assessed by comparing the F-statistic obtained from a nested model comparison with or without position to a distribution of the same F-statistic obtained from shuffled data; see Methods). The coloured strip below the trace describes the trial epochs. Reward delivery is denoted by a water droplet. c, ∆F/F traces for ten example neurons during six consecutive trials (green). Overlaid are the predictions of the behavioural model for these trials (blue). The coloured strip below each trace denotes the trial epochs. Reward delivery is denoted by a water droplet. d, Mean (across neurons) of percentage variance explained (tested on held-out data with fivefold cross-validation) by the final model (red) and other models in which a variable was either removed (blue) or added (green). See Methods for descriptions of all variables tested. All models for which a variable was removed from the final model performed significantly worse, based on comparing R² for all neurons (P < 2 × 10⁻⁶, two-sided paired t-test, n = 303, Holm–Bonferroni correction for all model comparisons). For models in which variables were added to those in the final model, the performance either did not exhibit a significant difference, or was degraded. See Methods for complete description of all models. e, Comparison of performance for all neurons of the final model (x axis) and all the other models. Each panel shows the comparison with one model; significance of the two-sided paired t-test (after Holm–Bonferroni correction) is shown in each panel. n = 303 in all cases.
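The Holm–Bonferroni step-down correction applied to these model comparisons can be illustrated with a minimal Python sketch. The function name, the significance level of 0.05 and the example P values are illustrative assumptions and are not taken from the paper.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per test: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Visit the P values from smallest to largest, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest P value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down rule: all remaining (larger) P values are retained
    return reject


# Hypothetical P values, for illustration only.
example_p = [0.001, 0.04, 0.03, 0.20]
decisions = holm_bonferroni(example_p)
```

With these hypothetical inputs only the first test survives correction, because the second-smallest P value (0.03) exceeds 0.05/3.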
Extended Data Fig. 2 Simultaneous calcium imaging and cell-attached recording in DA neurons in the VTA of Ai148×DAT::cre mice.
a, Relative change in fluorescence (top) and cell-attached current (bottom) recorded simultaneously. b, Average spike-triggered fluorescence (averaged over n = 126 spikes) for the cell in a. c, Magnified spike waveform for cell in a. d, Examples of bursts from three different DA cells, showing cell-attached current (top) and change in fluorescence (bottom). The spike times are shown with black bars under the fluorescence trace. The red horizontal bars under the current traces show the timing of NMDA puffs (see Methods).
We developed a custom motion-correction procedure to compensate for both non-rigid slow drift of the field of view (timescale of tens of minutes) and non-rigid fast motion (timescale of tens of milliseconds). Importantly, the procedure avoids any use of interpolation, which can produce artefacts. The procedure consists of the following main steps. (1) Blue box. The entire movie is divided into non-overlapping 50-s clips; in each clip we perform rigid motion correction using standard cross-correlation methods (on the red channel). The template for each clip is calculated by dividing the clip into non-overlapping sections of 100 frames, calculating the mean image of each section, and obtaining the median of the mean images. (2) Red box. We use a non-rigid algorithm for image registration to align all the templates. The algorithm outputs shift parameters for every pixel and template. Separately, we manually draw patches that include neurons of interest in the first template. For each template, we use the shift parameters of all the pixels in each patch to estimate the average motion of the patch. We use that information to crop the patch from each 50-s clip of the movie. (3) Orange box. We perform rigid motion correction (as above) on the concatenated patch movies, down-sample by a factor of two (to increase the signal strength) and then perform rigid motion correction again. (4) Green box. We extract the patch templates by using the mean projection, and hand-draw ROIs of the neurons. See Methods for a detailed explanation of the motion-correction algorithm, and Supplementary Video 2 for an example video before and after correction. Code is available at https://github.com/benengx/Deep-Brain-Motion-Corr.
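The template construction in step (1) and an interpolation-free rigid alignment can be sketched in Python as follows. This is a simplified illustration rather than the released code: the function names, the brute-force whole-pixel search, the unnormalized cross-correlation score and the max_shift parameter are all assumptions made for the example.

```python
from statistics import median


def mean_image(frames):
    """Pixel-wise mean of a list of equally sized 2D frames (lists of lists)."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def clip_template(frames, section_len=100):
    """Per-pixel median of the mean images of non-overlapping sections."""
    sections = [frames[i:i + section_len]
                for i in range(0, len(frames), section_len)]
    means = [mean_image(s) for s in sections]
    rows, cols = len(means[0]), len(means[0][0])
    return [[median(m[r][c] for m in means) for c in range(cols)]
            for r in range(rows)]


def best_integer_shift(frame, template, max_shift=2):
    """Whole-pixel (dy, dx) that maximises the raw cross-correlation between
    the shifted frame and the template; no interpolation is involved."""
    rows, cols = len(template), len(template[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for r in range(rows):
                for c in range(cols):
                    rr, cc = r - dy, c - dx  # source pixel in the unshifted frame
                    if 0 <= rr < rows and 0 <= cc < cols:
                        score += frame[rr][cc] * template[r][c]
            if score > best_score:
                best, best_score = (dy, dx), score
    return best
```

Taking the median of section means makes the template robust to a minority of sections that contain large motion, and restricting shifts to whole pixels keeps the original pixel values intact.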
Extended Data Fig. 4 Recovered neuron locations and validation of the spatial organization of neural responses.
a, Example of lens location recovery. Coronal histological slices stained for tyrosine hydroxylase were aligned to the Allen Brain Atlas45 using the Wholebrain software package42. The centre of the lens was marked and its position in common coordinates was recovered using the software. SNC, substantia nigra pars compacta. b, Left, recovered centres of GRIN lenses from all mice (black ellipses) are shown on top of the atlas images. Right, recovered locations of all neurons that entered the clustering analysis based on an encoding model R2 during the cue period >5% (n = 233; see Methods for details of location recovery). Neurons are colour-coded according to their cluster identity. A/P, anteroposterior. c, Relative contributions of each behavioural variable as a function of neuron location along the anteroposterior, mediolateral and dorsoventral axes. In each row, the relative contribution of a behavioural variable is correlated with the anteroposterior (left), mediolateral (middle) or dorsoventral (right) locations. The correlation value and significance (after Holm–Bonferroni correction for all tests) is shown in the panel (n = 233 in all cases). The linear fit of the entire population is shown by a black line, and linear fits of neurons belonging to individual mice (which had more than five neurons) are shown by grey lines. d, Statistical tests of the spatial organization of responses to different behavioural variables that account for individual differences across mice. The table lists the P values and F-statistics obtained for three statistical tests for the spatial organization of the cue-period variables. The first test was a mixed effect model that included all neurons that had good fit to the behavioural model during the cue period (R2 > 5%, n = 233). 
In this model, the relative contribution for a given variable to each neuron was the dependent variable; the anteroposterior, mediolateral and dorsoventral locations and their pairwise interactions were independent fixed effects; and the mouse identity was a random effect for the intercepts (MATLAB code: model = fitglme(Data,'variable~ml*ap*dv-ml:ap:dv+(1|mouseID)')). For this test, the degrees of freedom for the numerator and denominator were 6 and 226, respectively. In the field of view (FOV) tests, for every variable we averaged the relative contributions of all neurons in a given FOV (for mice that had two FOVs, we combined neurons from the two FOVs). A regression was run with the average relative contributions as the dependent variable, and the anteroposterior, mediolateral and dorsoventral lens locations and their pairwise interactions as independent fixed effects (n = 19). In the weighted version of the FOV test, we also weighted each FOV observation by the number of neurons in that FOV. For these two tests, the degrees of freedom for the numerator and denominator were 6 and 12, respectively. In all cases, the listed P values correspond to the F-test for the fixed effects. M/L, mediolateral. D/V, dorsoventral.
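The fixed-effect terms implied by the model formula, and the reported degrees of freedom, can be checked with a short Python sketch. The helper names are hypothetical, and the residual rule used here (observations minus the number of fixed-effect terms minus one for the intercept) is an assumption that happens to reproduce the reported values; it is not taken from the Methods.

```python
from itertools import combinations


def expand_full_crossing(factors):
    """All main effects and interactions implied by 'a*b*c' in Wilkinson notation."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms


def f_test_dof(n_obs, n_terms):
    """Numerator and denominator degrees of freedom for the fixed-effects F-test
    (assumed rule: residual d.f. = observations - terms - intercept)."""
    return n_terms, n_obs - (n_terms + 1)


terms = expand_full_crossing(["ml", "ap", "dv"])
terms.remove("ml:ap:dv")  # the formula subtracts the three-way interaction

neuron_dof = f_test_dof(233, len(terms))  # mixed-effect model over neurons
fov_dof = f_test_dof(19, len(terms))      # FOV-level regressions
```

The six remaining terms are the three main effects and the three pairwise interactions, which gives (6, 226) for the neuron-level test and (6, 12) for the FOV tests.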
Extended Data Fig. 5 Average activity and relative contributions of different behavioural variables for several example cells.
The panels show activity averages time-locked to different behavioural variables for six example cells. The percentage of relative contribution of the corresponding behavioural variable to the activity of each cell is shown.
a, Distributions of the negative log-likelihood of the clustering model (Fig. 3) for shuffled (grey) versus real (red) data indicate a significant fit of the clustering model. Shuffling of relative contributions across variables (top) and across neurons (bottom). b, Prediction of choice and accuracy from neurons in each cluster. For each neuron, decoding was performed by logistic regression using the average cue period activity (on a trial-by-trial basis) to predict choice or accuracy. Regression was performed using tenfold cross-validation (over trials). Separate decoders were trained to predict either choice or accuracy. Weighted decoding was used to control for the different number of trials of each type (left/right choices or correct/incorrect trials; see Methods). Each panel shows a histogram of the decoding performance for a given variable (left column: choice, right column: accuracy) and a given cluster (rows). Grey vertical lines indicate 50% performance (chance level). Vertical yellow lines indicate the median of the distribution. Significance was assessed by a two-sided Wilcoxon signed-rank test after a Holm–Bonferroni correction for the ten tests. For clusters 1 to 5, n = 74, 36, 27, 27 and 26, respectively. The predictive power of the different clusters is broadly consistent with their association with the different behavioural variables: choice was significantly predicted by neurons belonging to clusters 1 (associated primarily with kinematics, which contains view angle) and 3 (associated primarily with cues, which determine choice for successful trials). The strongest predictive power for the mice’s accuracy is exhibited by cluster 5, which is primarily associated with accuracy. c, Noise correlations estimated by an alternative method. Here, noise correlations were estimated by calculating the increase in variance explained by the behavioural-only encoding model when the second neuron’s activity was added to it as a predictor25,43. 
The noise correlation estimate is shown for all neuronal pairs (n = 1,492) during the cue period (left) and outcome period (right). d, To investigate the possible effect of neuropil contamination on the observed relationship between pairwise correlations and distance (Fig. 4), we recalculated the relationship between correlations and interneuronal distance as a function of the neuropil correction factor. In all cases, we find a similar pattern to that presented in Fig. 4. e, To investigate the relationship between task performance and neural encoding, mice were divided into two groups based on their task performance. The relative contributions of the behavioural variables were averaged separately for neurons belonging to the mice in each group. Consistent with modulation by reward expectation, we found that cue-related activity was stronger and reward responses were weaker in the top-performing mice. Previous reward (which does not provide useful information for task performance) was more strongly represented in the bottom-performing mice (two-sided Wilcoxon signed-rank test, n1 = 129 neurons in the top-performing mice, n2 = 104 in the bottom-performing mice, with Holm–Bonferroni correction for the six tests). f, To investigate the relationship between instantaneous performance and neural encoding, for each session, all trials were grouped into blocks of ten consecutive trials with no overlap; these blocks were split into two groups based on whether the average performance in the block was greater or less than the median performance across all blocks in that session. The relative contributions of all behavioural variables calculated separately for the better- or worse-performance blocks are shown. The results did not show a significant difference for any of the variables (two-sided Wilcoxon signed-rank test, n1 = n2 = 233 neurons, with Holm–Bonferroni correction for the six tests).
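The block-wise split in panel f can be sketched in Python as follows. The function name is hypothetical, and two details are assumptions not stated in the caption: an incomplete trailing block is dropped, and blocks whose performance equals the median are assigned to neither group.

```python
from statistics import mean, median


def split_blocks_by_performance(correct, block_len=10):
    """Group trials into non-overlapping blocks and split the blocks around
    the median block performance for the session.

    correct: list of 0/1 outcomes, one per trial, in session order.
    """
    n_blocks = len(correct) // block_len  # assumption: drop an incomplete last block
    performance = [mean(correct[b * block_len:(b + 1) * block_len])
                   for b in range(n_blocks)]
    session_median = median(performance)
    better = [b for b, p in enumerate(performance) if p > session_median]
    worse = [b for b, p in enumerate(performance) if p < session_median]
    return performance, better, worse


# Hypothetical session of 40 trials (four blocks), for illustration only.
session = [1] * 10 + [0] * 10 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
performance, better_blocks, worse_blocks = split_blocks_by_performance(session)
```

The relative contributions would then be estimated separately from the trials in the better and worse blocks.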
a, Summary of average relative contributions of the different behavioural variables for neurons belonging to each cluster as calculated by the no-refitting approach (see Methods). Left, average relative contributions of cue period behavioural variables to neural activity for each cluster. Right, average relative contribution of reward for each cluster. b, As in a, but for the clustering analysis performed on the contributions calculated using the refitting approach (see Methods). c, Normalized confusion matrix for the cluster identities of each neuron, obtained by comparing the clustering of the relative contributions based on either the no-refitting or the refitting approach (see Methods for description of two approaches). The main diagonal represents neurons for which the cluster identities matched (97.8%). d, Average relative contributions of clusters obtained by separately analysing two random halves of the trials for each neuron. Correlations between the average relative contributions in each cluster across the two sets are as follows (n = 5 in all cases). Position: ρ = 0.99, P < 8 × 10−5. Cues: ρ = 0.99, P < 4 × 10−4. Kinematics: ρ = 0.99, P < 2 × 10−4. Accuracy: ρ = 0.99, P < 3 × 10−4. Previous reward: ρ = 0.99, P < 0.001. Reward response: ρ = 0.48, P < 0.42. e, Normalized confusion matrix for the cluster identities of each neuron, obtained by clustering the two random halves of the data. The main diagonal represents neurons for which the cluster identities matched (79.1%). Note that the chance level of matching is 20%. The matrix was calculated for neurons for which a cluster was assigned in the procedures for both halves of the data (>75% probability to belong to a cluster, n = 91). f, Average absolute value of the correlations for all pairs of predictors across all behavioural variables during the cue period (average across all predictor pairs and mice). 
g, Average relative contributions assessed separately using three different approaches: no refitting (NR; used in the paper); no refitting + Lasso regularization (NR + L); and refitting (R). Correlations between the results of the different approaches are as follows: ρ(NR, NR + L) = 1, P < 7 × 10−9. ρ(NR, R) = 0.99, P < 1 × 10−4. ρ(NR + L, R) = 0.99, P < 8 × 10−5 (n = 6 in all cases). When omitting the reward response contributions: ρ(NR, NR + L) = 1, P < 2 × 10−5. ρ(NR, R) = 0.91, P < 0.04. ρ(NR + L, R) = 0.92, P < 0.03 (n = 5 in all cases). Lasso regularization was applied using the lasso function in MATLAB; the mean square error (MSE) of the model was estimated using fivefold cross-validation, and we chose the lambda value that minimized the MSE. The results with lasso regularization were almost identical to the result without regularization, which suggests that there was not significant overfitting in our model. h, Average relative contributions assessed separately using two random halves of the data. For each neuron, we randomly divided all the trials in which the neuron was recorded into two separate subsets while matching the number of rewarded and previously rewarded trials between the subsets. Each subset of trials was then used to calculate the relative contributions of the behavioural variables (ρ = 0.99, P < 3 × 10−4 for all behavioural variables (n = 6), ρ = 0.8, P < 0.11 when omitting the reward response contributions (n = 5)). i, We tested the robustness of the clustering results by performing an alternative clustering procedure based on the predicted neuronal traces. After determining the regression weights for all neurons, behavioural predictors from one session were used to generate predicted activity traces for all neurons. A similarity matrix was constructed by taking the absolute correlation between the predicted traces for each neuronal pair. 
The similarity matrix was clustered using information-based clustering20 (see Methods) and ordered by the obtained clusters (right panel; cluster identity for each neuron depicted by a coloured stripe to the right). j, Normalized confusion matrix for the cluster identities of each neuron, comparing clustering of the relative contributions (method used in the main text; Fig. 3) and the alternative method described here. The two clustering methods involve conceptual differences that may result in different clustering organizations. For example, the method used in Fig. 3, which clusters the relative contributions of the behavioural variables, is independent of a particular tuning for these variables, whereas the method presented here should be affected by such tuning (for example, upward versus downward position ramps). Nevertheless, we find a similar overall clustering structure between the two methods, with the main differences as follows. Original clusters 3 and 5 (associated with previous reward and accuracy) are joined in a single cluster (new cluster 5); and original cluster 1 (associated with kinematics) is now split into two clusters (new clusters 1 and 3). Further investigation of the split of the kinematics cluster showed that the neurons that split from the main kinematics cluster have stronger modulation for the view angle component of kinematics (based on the regression coefficient values). Such a split could not occur in the formulation used in the main text, which combined all the kinematics components (speed, acceleration and view angle). k, Further validation of the encoding model by simulating data with known relative contributions of the different behavioural variables. We replaced the activity of each neuron by a simulated trace that was computed using known relative contributions of the different behavioural variables as follows: first, the predictors corresponding to each behavioural variable were summed, resulting in one predictor per variable. 
Each predictor was z-scored and multiplied by a different relative contribution (taken from the values obtained for the real data). The scaled predictors were then summed, resulting in a single vector that forms the basis of the firing rate of the simulated neuron. To this vector, we added a constant to obtain an average firing rate close to 5 Hz (which was observed in in vivo electrophysiological recordings22). After zeroing negative values of this firing rate vector, we used it to generate a spike train using a Poisson process. Finally, the spike train was convolved with an approximate GCaMP kernel (see Methods). We proceeded to estimate the relative contributions for the simulated trace using the encoding model procedure. The relative contributions used to simulate the traces (x axis) and the recovered contributions (y axis) for a given behavioural variable are shown; the correlation between the original and recovered relative contributions and its associated P value are denoted (n = 233 in all cases).
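The simulation in panel k can be approximated by the following Python sketch. It assumes that the predictors have already been summed to one time series per behavioural variable, and several choices are placeholders rather than values from the paper: the time step, the single-exponential kernel standing in for the GCaMP kernel, its time constant and the function names.

```python
import math
import random
from statistics import mean, pstdev


def zscore(values):
    """Standardize a predictor (assumes it is not constant)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def poisson_sample(rng, lam):
    """Draw one Poisson count with mean lam (Knuth's multiplication method)."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def simulate_trace(predictors, contributions, dt=0.1, base_rate_hz=5.0,
                   kernel_tau_s=1.0, seed=0):
    """Simulate a calcium-like trace from per-variable predictors."""
    rng = random.Random(seed)
    n = len(predictors[0])
    # Scale each z-scored predictor by its relative contribution and sum.
    drive = [0.0] * n
    for predictor, weight in zip(predictors, contributions):
        for t, z in enumerate(zscore(predictor)):
            drive[t] += weight * z
    # Add a constant for a mean rate near 5 Hz, then zero negative values.
    rate = [max(0.0, d + base_rate_hz) for d in drive]
    # Poisson spike counts in each time bin.
    spikes = [poisson_sample(rng, r * dt) for r in rate]
    # Convolve with an assumed exponential kernel (placeholder for GCaMP).
    kernel = [math.exp(-k * dt / kernel_tau_s)
              for k in range(round(5 * kernel_tau_s / dt))]
    trace = [sum(spikes[t - k] * kernel[k]
                 for k in range(min(t + 1, len(kernel))))
             for t in range(n)]
    return rate, spikes, trace
```

The resulting trace can be passed through the same encoding-model procedure as the real data, so that recovered contributions can be compared with the ones used to generate it.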
a, Schematic of the shaping protocol. Training consisted of nine mazes with increasing task difficulty. In the first five mazes, cues were permanent and were visible from the beginning of the trial (but still became progressively bigger as the mouse approached them). From maze 6 onward, cues only appeared when the mouse approached within 10 cm of their location. From maze 7 onward, cues could also appear on the unrewarded side. Cues were randomly distributed along the cue region. The number of cues on each side was sampled from a Poisson distribution with the mean indicated for each maze. b, Task performance, model fit and relative contributions of the behavioural variables throughout learning. The total number of neurons, the number of neurons with good model fit during the cue period (R2 > 5%; these were used to calculate the relative contributions of the behavioural variables during the cue period), and the number of mice analysed in each training stage are indicated at the top. Shaded colours are s.e.m. The results show that task performance increased steadily across the permanent cue mazes, and then dropped in the first transient cue maze, probably owing to the working memory component that is added in the transient cue mazes. The overall R2 value of the behavioural model increased across learning, indicating that over the course of training, neural activity could be better explained by the measured behavioural variables. Notably, the relative contribution of position increased monotonically during the permanent cue mazes, but then dropped during the transient cue mazes, similar to the performance of the mice across the mazes. This is consistent with the interpretation of positional ramps as reflecting a value signal3,18, because the expected value at each position is closely related to reward expectation for that session, and reward expectation is determined by average task performance. 
The relative contributions for cues also increased during early learning, consistent with being a reflection of the strength of the cue-reward association. Note that this value is decreased in the last maze, in which (because of the increased task difficulty) each cue has a lower predictive power with respect to reward. The relative contribution of previous reward decreased across the permanent cue mazes, then transiently increased during the first transient cue session. Because relying on previous reward is the wrong strategy in this task, this decrease in the relative contribution of previous reward may relate to mice weighting previous reward more heavily during the major steps in training when they have not yet learned the correct strategy for solving the task. The relative contribution of kinematics declined over the training procedure. This may be due to the kinematic aspect of the behaviour becoming less variable over training, as the mouse’s motor skills improved for virtual-reality navigation. The relative contribution of trial accuracy was significantly higher during the transient cue mazes than the permanent cue mazes. This result potentially suggests that DA activity is correlated with task performance preferentially when there is a working memory component. The reward response declined during the permanent cue mazes, and remained relatively consistent during the transient cue mazes; this is consistent with an RPE signal, as RPE indicates negative modulation of reward responses by reward expectation (and reward expectation is related to task performance). c, Proportion of neurons that were significantly modulated by the different behavioural variables throughout learning (see Methods). Shaded colours show the 1 s.d. confidence intervals for a binomial distribution calculated using Jeffreys method. d, Details of the shaping procedure. The table lists the parameters of the mazes progressively used during the shaping of the behaviour. 
The ‘permanent cues’ field indicates whether the cues were presented at the beginning of the trial; otherwise, each cue was presented when the mouse was 10 cm away from its location. ‘High- (and low-) cue-probability side mean’ indicates the means of the Poisson distributions from which the numbers of cues presented on each side were drawn (at least one cue was always drawn); ‘none’ indicates that no cues were presented for the low-probability side on any trial in that maze. The mice were automatically advanced to the next maze if the following criteria were met: (1) their performance was above a predetermined threshold (‘minimum performance for advancing’ field) for a given number of trials (‘number of trials to calculate performance’ field); and (2) they completed at least n sessions in the current maze, in which n is given by the ‘minimum number of sessions for advancing’ field.
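The automatic advancement rule in panel d can be illustrated with a minimal Python sketch. It is not the authors' code: the class and function names are hypothetical, the numbers in the usage note are invented, and evaluating performance over the most recent trials is an assumption, because the caption says only that performance was computed over "a given number of trials".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MazeCriteria:
    """Hypothetical container mirroring three fields of the shaping table."""
    min_performance: float         # 'minimum performance for advancing' (fraction correct)
    n_trials_for_performance: int  # 'number of trials to calculate performance'
    min_sessions: int              # 'minimum number of sessions for advancing'


def should_advance(trial_outcomes: Sequence[bool],
                   sessions_completed: int,
                   criteria: MazeCriteria) -> bool:
    """Return True when both advancement criteria are met.

    trial_outcomes holds one boolean per trial in the current maze
    (True = correct), ordered from oldest to newest.
    """
    n = criteria.n_trials_for_performance
    if n <= 0:
        raise ValueError("n_trials_for_performance must be positive")

    # Criterion (2): enough sessions completed in the current maze.
    if sessions_completed < criteria.min_sessions:
        return False

    # Criterion (1): performance over the last n trials exceeds the threshold.
    # Assumption: the window is the most recent n trials.
    if len(trial_outcomes) < n:
        return False
    window = trial_outcomes[-n:]
    performance = sum(window) / n
    return performance > criteria.min_performance
```

For example, with the hypothetical values `MazeCriteria(0.8, 10, 2)`, a mouse with nine correct trials out of its last ten and two completed sessions would advance, whereas the same record after a single session would not.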
a, For each behavioural variable (position, cues and accuracy), each heat map contains all significant neurons for that maze, with each row representing the average response of one neuron (the activity of each neuron is normalized by its peak). Statistical significance is assessed by comparing the F-statistic obtained from a nested model comparison with or without each behavioural variable to a distribution of the same F-statistic obtained from shuffled data (see Methods). In the case of position and accuracy, the averaging is over trials. In the case of cues, the averaging is across cue occurrences, and the average baseline activity (in the second preceding the cue occurrence) was subtracted. The numbers of significant and total neurons for that variable and maze are indicated at the top of each heat map. The height of the heat maps for each maze is proportional to the average fraction of significant neurons (across variables) for that maze. b, Changes in tuning across learning. Left, percentage of neurons with significant responses to position that exhibited a positive slope in their average response. Middle, percentage of neurons with significant responses to cues that exhibited a higher response to contralateral cues (compared to ipsilateral cues). Right, percentage of neurons with significant responses to accuracy that exhibited a higher response in error trials (compared to correct trials). Shaded colours show the 1 s.d. confidence intervals for a binomial distribution calculated using Jeffreys method. The horizontal dotted lines indicate 50% in each panel. Early in training, position-selective neurons exhibited more downward ramps than upward ramps (left, mazes 2 and 3). Because upward, and not downward, ramps are consistent with a value signal3,18, this result suggests that the specific tuning of this variable, and not only the strength of its representation, evolved towards one that is consistent with a value signal.
Throughout training, cue-selective neurons were mostly selective for either contralateral or ipsilateral cues, and the preferential representation of contralateral cues developed late in training. This is notable because selectivity for contralateral versus ipsilateral cues is not a prediction of the RPE framework. Accuracy-selective neurons exhibited a strong bias towards elevated activity in error trials versus correct trials, which was evident by the last permanent cue maze.
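The binomial confidence bands in panel b use the Jeffreys method, which takes quantiles of a Beta(k + 1/2, n − k + 1/2) distribution for k significant neurons out of n. The following standard-library Python sketch shows one way to compute such an interval. It is illustrative only, not the authors' code: the function names are hypothetical, and reading "1 s.d." as the central 68.27% of the distribution (equal tails) is an assumption.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x: float, a: float, b: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def beta_quantile(p: float, a: float, b: float) -> float:
    """Invert beta_cdf by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def jeffreys_interval(k: int, n: int, coverage: float = math.erf(1.0 / math.sqrt(2.0))):
    """Equal-tailed Jeffreys interval for k successes out of n trials.

    The default coverage (about 0.6827) is the assumed meaning of '1 s.d.'.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    alpha = 1.0 - coverage
    a, b = k + 0.5, n - k + 0.5
    return beta_quantile(alpha / 2.0, a, b), beta_quantile(1.0 - alpha / 2.0, a, b)
```

Calling `jeffreys_interval(k, n)` returns the lower and upper bounds as proportions; multiplying by 100 gives the percentage scale used in the panels.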
Extended Data Fig. 10 Specific expression of GCaMP6f in midbrain dopamine neurons in the Ai148×DAT::cre mouse line.
a, Example GCaMP6f expression (green) and tyrosine hydroxylase (TH) antibody staining (red). Square indicates location of high-magnification view of GCaMP expression in TH+ neurons. Scale bars, 500 µm (top) and 100 µm (bottom). b, Quantification of penetrance and specificity of the Ai148×DAT::cre mouse line. Penetrance is the percentage of TH+ neurons that also express GCaMP (mean: 95.2%; s.e.m.: 1.52%; n = 11 sections; 1,082 cells, 2 mice). Specificity is the percentage of GCaMP+ neurons that are also TH+ (mean: 96.7%; s.e.m.: 0.74%; n = 11 sections; 1,075 cells, 2 mice). c, Examples of lesions caused by GRIN lens implants (left). Insets are higher-magnification images of the regions in which TH+ neurons were counted underneath the lens and compared to counts contralateral to the lens. Scale bars, 50 µm. White overlay indicates location of the lesion. Cells were counted in 50 µm by 50 µm squares from 0–300 µm below the lens. d, Average number of TH+ neurons per 50 µm × 50 µm square by distance from the bottom of the lens. Orange denotes average count under the lens; grey denotes average count from the contralateral hemisphere. Shading denotes s.e.m. n = 11 mice.
Three sample trials showing the mouse’s perspective view of the visual field during the task.
Raw (left) and motion-corrected (right) video of DA neurons imaged through a GRIN lens during behaviour. For visualization, the movie was sped up by a factor of 5 by taking a rolling average of 5 frames and down-sampling by a factor of 5.
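The speed-up described for the video (a rolling average of 5 frames followed by down-sampling by a factor of 5) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' processing code: the function names are hypothetical, each frame is represented as a flat list of pixel intensities, and the alignment of the averaging window (here, each window covers a frame and the frames that follow it) is not specified in the caption.

```python
from typing import List, Sequence

Frame = List[float]  # one frame, flattened to a list of pixel intensities


def rolling_average(frames: Sequence[Frame], window: int) -> List[Frame]:
    """Average every run of `window` consecutive frames, pixel by pixel.

    Only complete windows are used, so the output has
    len(frames) - window + 1 frames.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averaged = []
    for start in range(len(frames) - window + 1):
        chunk = frames[start:start + window]
        # zip(*chunk) groups the values of the same pixel across the window.
        averaged.append([sum(pixel_values) / window for pixel_values in zip(*chunk)])
    return averaged


def speed_up(frames: Sequence[Frame], factor: int = 5) -> List[Frame]:
    """Smooth with a rolling average of `factor` frames, then keep every `factor`-th frame."""
    smoothed = rolling_average(frames, factor)
    return smoothed[::factor]
```

With this alignment, each retained frame is the mean of a non-overlapping block of 5 raw frames, so the output plays 5 times faster at the original frame rate while reducing frame-to-frame noise.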
About this article
Cite this article
Engelhard, B., Finkelstein, J., Cox, J. et al. Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons. Nature 570, 509–513 (2019). https://doi.org/10.1038/s41586-019-1261-9