Cell-type-specific asynchronous modulation of PKA by dopamine in learning

Abstract

Reinforcement learning models postulate that neurons that release dopamine encode information about action and action outcome, and provide a teaching signal to striatal spiny projection neurons in the form of dopamine release¹. Dopamine is thought to guide learning via dynamic and differential modulation of protein kinase A (PKA) in each class of spiny projection neuron². However, the real-time relationship between dopamine and PKA in spiny projection neurons remains untested in behaving animals. Here we monitor the activity of dopamine-releasing neurons, extracellular levels of dopamine and net PKA activity in spiny projection neurons in the nucleus accumbens of mice during learning. We find positive and negative modulation of dopamine that evolves across training and is both necessary and sufficient to explain concurrent fluctuations in the PKA activity of spiny projection neurons. Modulations of PKA in spiny projection neurons that express type-1 and type-2 dopamine receptors are dichotomous, such that these neurons are selectively sensitive to increases and decreases, respectively, in dopamine that occur at different phases of learning. Thus, PKA-dependent pathways in each class of spiny projection neuron are asynchronously engaged by positive or negative dopamine signals during learning.

Fig. 1: FLiP reveals bidirectional changes in PKA activity in SPNs in vivo.
Fig. 2: Plasticity of DA release and DAN activity dynamics across learning.
Fig. 3: PKA activities in D1R- and D2R-SPNs are dynamically modulated and follow patterns of DA during learning.
Fig. 4: Transient changes in DAN activity are sufficient to modulate PKA activity in SPNs.

Data availability

All data (MATLAB data files) are available online via the public repository managed by Harvard Medical School (https://sharehost.hms.harvard.edu/neurobiology/?sabatini/DA_PKA). Any other relevant data are available from the corresponding author upon reasonable request.

Code availability

Custom MATLAB code is available online via the public repository managed by Harvard Medical School (https://sharehost.hms.harvard.edu/neurobiology/?sabatini/DA_PKA).

References

  1. Bromberg-Martin, E. S., Matsumoto, M. & Hikosaka, O. Dopamine in motivational control: rewarding, aversive, and alerting. Neuron 68, 815–834 (2010).

  2. Kravitz, A. V. & Kreitzer, A. C. Striatal mechanisms underlying movement, reinforcement, and punishment. Physiology (Bethesda) 27, 167–177 (2012).

  3. Vidal-Gadea, A. G. & Pierce-Shimomura, J. T. Conserved role of dopamine in the modulation of behavior. Commun. Integr. Biol. 5, 440–447 (2012).

  4. Steinberg, E. E. et al. Positive reinforcement mediated by midbrain dopamine neurons requires D1 and D2 receptor activation in the nucleus accumbens. PLoS ONE 9, e94771 (2014).

  5. Hikida, T., Kimura, K., Wada, N., Funabiki, K. & Nakanishi, S. Distinct roles of synaptic transmission in direct and indirect striatal pathways to reward and aversive behavior. Neuron 66, 896–907 (2010).

  6. Tsai, H. C. et al. Phasic firing in dopaminergic neurons is sufficient for behavioral conditioning. Science 324, 1080–1084 (2009).

  7. Steinberg, E. E. et al. A causal link between prediction errors, dopamine neurons and learning. Nat. Neurosci. 16, 966–973 (2013).

  8. Saunders, B. T., Richard, J. M., Margolis, E. B. & Janak, P. H. Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat. Neurosci. 21, 1072–1083 (2018).

  9. Coddington, L. T. & Dudman, J. T. The timing of action determines reward prediction signals in identified midbrain dopamine neurons. Nat. Neurosci. 21, 1563–1573 (2018).

  10. Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).

  11. Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B. & Uchida, N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).

  12. Eshel, N., Tian, J., Bukwich, M. & Uchida, N. Dopamine neurons share common response function for reward prediction error. Nat. Neurosci. 19, 479–486 (2016).

  13. Day, J. J., Roitman, M. F., Wightman, R. M. & Carelli, R. M. Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat. Neurosci. 10, 1020–1028 (2007).

  14. Shen, W., Flajolet, M., Greengard, P. & Surmeier, D. J. Dichotomous dopaminergic control of striatal synaptic plasticity. Science 321, 848–851 (2008).

  15. Gerfen, C. R. et al. D1 and D2 dopamine receptor-regulated gene expression of striatonigral and striatopallidal neurons. Science 250, 1429–1432 (1990).

  16. Kupchik, Y. M. et al. Coding the direct/indirect pathways by D1 and D2 receptors is not valid for accumbens projections. Nat. Neurosci. 18, 1230–1232 (2015).

  17. Skeberdis, V. A. et al. Protein kinase A regulates calcium permeability of NMDA receptors. Nat. Neurosci. 9, 501–510 (2006).

  18. Lee, H. K. et al. Phosphorylation of the AMPA receptor GluR1 subunit is required for synaptic plasticity and retention of spatial memory. Cell 112, 631–643 (2003).

  19. Yagishita, S. et al. A critical time window for dopamine actions on the structural plasticity of dendritic spines. Science 345, 1616–1620 (2014).

  20. Iino, Y. et al. Dopamine D2 receptors in discrimination learning and spine enlargement. Nature 579, 555–560 (2020).

  21. Lau, G. C., Saha, S., Faris, R. & Russek, S. J. Up-regulation of NMDAR1 subunit gene expression in cortical neurons via a PKA-dependent pathway. J. Neurochem. 88, 564–575 (2004).

  22. Nayak, A., Zastrow, D. J., Lickteig, R., Zahniser, N. R. & Browning, M. D. Maintenance of late-phase LTP is accompanied by PKA-dependent increase in AMPA receptor synthesis. Nature 394, 680–683 (1998).

  23. Lee, S. J., Chen, Y., Lodder, B. & Sabatini, B. L. Monitoring behaviorally induced biochemical changes using fluorescence lifetime photometry. Front. Neurosci. 13, 766 (2019).

  24. Chen, Y., Saulnier, J. L., Yellen, G. & Sabatini, B. L. A PKA activity sensor for quantitative analysis of endogenous GPCR signaling via 2-photon FRET-FLIM imaging. Front. Pharmacol. 5, 56 (2014).

  25. Chen, Y. et al. Endogenous Gαq-coupled neuromodulator receptors activate protein kinase A. Neuron 96, 1070–1083.e5 (2017).

  26. Mohebi, A. et al. Dissociable dopamine dynamics for learning and motivation. Nature 570, 65–70 (2019).

  27. Dana, H. et al. Sensitive red protein calcium indicators for imaging neural activity. eLife 5, e12727 (2016).

  28. Patriarchi, T. et al. Ultrafast neuronal imaging of dopamine dynamics with designed genetically encoded sensors. Science 360, eaat4422 (2018).

  29. Klapoetke, N. C. et al. Independent optical excitation of distinct neural populations. Nat. Methods 11, 338–346 (2014).

  30. Mahn, M. et al. High-efficiency optogenetic silencing with soma-targeted anion-conducting channelrhodopsins. Nat. Commun. 9, 4125 (2018).

  31. Howe, M. W., Tierney, P. L., Sandberg, S. G., Phillips, P. E. M. & Graybiel, A. M. Prolonged dopamine signalling in striatum signals proximity and value of distant rewards. Nature 500, 575–579 (2013).

  32. Matamales, M. et al. Local D2- to D1-neuron transmodulation updates goal-directed learning in the striatum. Science 367, 549–555 (2020).

  33. Jiang, S. Z. et al. NCS-Rapgef2, the protein product of the neuronal Rapgef2 gene, is a specific activator of D1 dopamine receptor-dependent ERK phosphorylation in mouse brain. eNeuro 4, ENEURO.0248-17.2017 (2017).

  34. Ilango, A. et al. Similar roles of substantia nigra and ventral tegmental dopamine neurons in reward and aversion. J. Neurosci. 34, 817–822 (2014).

  35. Goto, A. et al. Circuit-dependent striatal PKA and ERK signaling underlies rapid behavioral shift in mating reaction of male mice. Proc. Natl Acad. Sci. USA 112, 6718–6723 (2015).

  36. Yamaguchi, T. et al. Role of PKA signaling in D2 receptor-expressing neurons in the core of the nucleus accumbens in aversive learning. Proc. Natl Acad. Sci. USA 112, 11383–11388 (2015).

  37. Ma, L. et al. A highly sensitive A-kinase activity reporter for imaging neuromodulatory events in awake mice. Neuron 99, 665–679.e5 (2018).

  38. Collins, A. G. E. & Frank, M. J. Opponent actor learning (OpAL): modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psychol. Rev. 121, 337–366 (2014).

  39. Gurney, K. N., Humphries, M. D. & Redgrave, P. A new framework for cortico-striatal plasticity: behavioural theory meets in vitro data at the reinforcement-action interface. PLoS Biol. 13, e1002034 (2015).

  40. Gerfen, C. R. & Surmeier, D. J. Modulation of striatal projection systems by dopamine. Annu. Rev. Neurosci. 34, 441–466 (2011).

  41. Gerfen, C. R., Paletzki, R. & Heintz, N. GENSAT BAC Cre-recombinase driver lines to study the functional organization of cerebral cortical and basal ganglia circuits. Neuron 80, 1368–1383 (2013).

  42. Bäckman, C. M. et al. Characterization of a mouse strain expressing Cre recombinase from the 3′ untranslated region of the dopamine transporter locus. Genesis 44, 383–390 (2006).

  43. Lee, S. J., Escobedo-Lozoya, Y., Szatmari, E. M. & Yasuda, R. Activation of CaMKII in single dendritic spines during long-term potentiation. Nature 458, 299–304 (2009).

  44. Pnevmatikakis, E. A. et al. Simultaneous denoising, deconvolution, and demixing of calcium imaging data. Neuron 89, 285–299 (2016).

  45. Motulsky, H. J. How to report the methods used for the mixed model analysis https://www.graphpad.com/guides/prism/8/statistics/stat_how-to-report-the-methods-used.htm (2020).

Acknowledgements

This work was supported by NIH (B.L.S., U19NS113201 and R35NS105107; Y.C., F32DA035543; and L.T., U01NS013522 and U01NS090604), Howard Hughes Medical Institute (B.L.S.), Sackler Scholar Programme in Psychology (S.J.L.), Schuurman Schimmel van Outeren Stichting (B.L.) and Hendrik Muller fonds (B.L.). Graphical illustration was provided by Sigrid Knemeyer (sigrid@scistories.com).

Author information

Contributions

S.J.L. designed the study, developed the photometry system, collected and analysed data and wrote the manuscript. B.L. collected and analysed data, and edited the manuscript. Y.C. conceived the study and edited the manuscript. T.P. and L.T. developed the dLight sensor and shared expertise. B.L.S. designed and supervised the study and wrote the manuscript.

Corresponding author

Correspondence to Bernardo L. Sabatini.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Michael Frank, Charles Gerfen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Multipurpose photometry system for FLiP, fibre photometry and optogenetics.

The system consists of two independent multicolour photometry units. The top photometry unit consists of three subcomponents used for: (1) red channel fluorescence photometry, (2) Chrimson optogenetic laser activation and (3) green channel fluorescence lifetime and intensity photometry. For (1), red channel photometry was accomplished using a fibre-coupled 565-nm LED (M565F3, Thorlabs) for excitation with output collimated in free-space by L2 and filtered by F1 (554/23, Semrock). Red fluorescence was separated from the excitation light by dichroic D1 (573LP, Semrock), filtered by F2 (630/60, Semrock) and focused onto a PMT (H10770(P)A-40, Hamamatsu) by L3. For (2), Chrimson optogenetic light was provided by a fibre-coupled 593.5-nm laser (SKU: YL-593-00100-CWM-SD-03-LED-F, Optoengine) with output collimated by L1 and combined with the red photometry path via M2, a mirror that can be inserted or removed, respectively, for Chrimson optogenetic stimulation or red channel photometry. For the green channel fluorescence lifetime measurement mode of (3), a 50-MHz 473-nm pulsed laser (BDS-473-SM-FBE, Becker and Hickl) was fed through a rotating neutral density filter for power adjustment, reflected by D3 (488LP dichroic, Semrock) and focused onto a patch cable by L6. Emission light was passed through D3, reflected by D2 (532LP dichroic, Semrock), filtered by F4 (517/22, Semrock) and focused by L5 to a high-speed hybrid PMT (HPM-100-07-Cooled, Becker and Hickl). The hybrid PMT was connected to a time-correlated single-photon counting board (SPC-830, Becker and Hickl) for fluorescence lifetime measurements. For the green channel fluorescence intensity measurement mode of (3), a fibre-coupled 470-nm LED (M470F3, Thorlabs) was collimated by L4, filtered by F3 (482/18, Semrock) and reflected by a removable mirror (M4); emission light was detected by a PMT (H7422-40, Hamamatsu). 
Alternatively, when fluorescence lifetime measurements were not needed, the bottom photometry unit was used. This simple ‘dual-colour fluorescence intensity photometry’ unit consists of 470-nm and 565-nm LEDs (Thorlabs), two PMTs (H10770(P)A-40, Hamamatsu) and a Doric Minicube (FMC5_E1(465-480)_F1(500-540)_E2(555-570)_F2(580-680)_S, Doric Lenses) that are connected by patch cables. For both photometry units, LEDs were driven by a digital signal processor system (RX8-5-12, Tucker-Davis Technologies) for frequency modulation to carry out lock-in amplification of sensor signals detected by PMTs. In addition to the two main photometry units, a 593.5-nm laser (SKU: YL-593-00100-CWM-SD-03-LED-F, Optoengine) and a 473-nm laser (MBL-III-473, Optoengine) with independent patch cable connections were installed for Chrimson optogenetics and stGtACR2 optogenetics, respectively, for VTA DAN activity manipulation while monitoring the NAc.
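
The lock-in step mentioned in this legend can be illustrated with a minimal sketch. It assumes sinusoidal modulation and a single detector trace carrying two carriers; the carrier frequencies, sampling rate and function names are illustrative choices, not values or code from the paper.

```python
import math

def lock_in_amplitude(samples, fs, f_mod):
    """Estimate the amplitude of the component of `samples` at f_mod (Hz).

    The trace is multiplied by in-phase and quadrature references and
    averaged over the whole window, which acts as a crude low-pass filter.
    """
    n = len(samples)
    i_sum = 0.0
    q_sum = 0.0
    for k, x in enumerate(samples):
        phase = 2.0 * math.pi * f_mod * k / fs
        i_sum += x * math.cos(phase)
        q_sum += x * math.sin(phase)
    # The factor 2 recovers A from the averages of A*sin(wt + phi).
    return 2.0 * math.hypot(i_sum / n, q_sum / n)

def simulate(n=1000, fs=10000.0, f_a=200.0, f_b=300.0):
    """Hypothetical detector trace: DC offset plus two modulated channels."""
    return [
        0.5
        + 1.0 * math.sin(2.0 * math.pi * f_a * k / fs)
        + 0.3 * math.sin(2.0 * math.pi * f_b * k / fs + 0.7)
        for k in range(n)
    ]
```

Demodulating the same trace at each carrier frequency separates the two channels, because the carriers complete a whole number of cycles in the window and are therefore orthogonal to each other and to the offset.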

Extended Data Fig. 2 Plasticity in DA release and DAN activity dynamics.

a, Behavioural parameters demonstrating that mice are able to learn the visual-cue-guided operant conditioning described in Fig. 2a. Top: from the left, success rate (number of rewarded trials/total number of trials, one-way repeated-measures ANOVA, F(6.298, 384.2) = 107.8, P < 0.0001), entering failure rate (number of receptacle zone entering failure trials/total number of trials, one-way repeated-measures ANOVA, F(5.782, 352.7) = 103.8, P < 0.0001), occupancy failure rate (number of premature receptacle zone exit trials/number of receptacle zone entering success trials, one-way repeated-measures ANOVA, F(4.253, 225.4) = 7.324, P < 0.0001), time spent in zone (time spent in a zone/total session time, receptacle: one-way repeated-measures ANOVA, F(3.171, 193.4) = 51.12, P < 0.0001; trigger: F(4.544, 277.2) = 110.9, P < 0.0001), average speed (one-way repeated-measures ANOVA, F(3.969, 242.1) = 15.26, P < 0.0001). Bottom: from the left, entering latency (delay to enter the receptacle zone after the LED cue, one-way repeated-measures ANOVA, F(1.760, 107.4) = 9.652, P = 0.0003), zone occupancy (time spent in the receptacle zone after entering the zone during a trial, 3 s = maximum, one-way repeated-measures ANOVA, F(5.229, 277.1) = 3.420, P = 0.0045). Last three graphs depict success rate (one-way repeated-measures ANOVA, F(1.727, 72.54) = 668.6, P < 0.0001), entering failure rate (one-way repeated-measures ANOVA, F(1.328, 55.79) = 244.5, P < 0.0001), and occupancy failure rate (one-way repeated-measures ANOVA, F(1.295, 54.40) = 61.04, P < 0.0001) comparisons for regular, reward-omission and rewarded LED-omission sessions of expert mice. n = 64 mice from all photometry behaviour experiments. Plotted as mean ± s.e.m. across mice and dots = mouse averages. ***P < 0.001 for Bonferroni-corrected post hoc comparisons. b, DAN activity across learning. 
The average responses for beginner, intermediate, expert, reward-omission (of expert mice) and rewarded LED-omission (of expert mice) trials are shown in red, orange, green, blue and purple, respectively. Dashed vertical lines indicate the behavioural time stamps (T = trigger zone entry, L = LED on, Z = receptacle zone entry, D = pellet dispensing, R = receptacle entry). Top: normalized Δf/f of VTA jRCaMP signal showing VTA DAN soma activity. Bottom: normalized Δf/f of NAc jRCaMP signal showing VTA DAN terminal activity. n = 10 mice. Plotted as mean ± s.e.m. across mice. c, VTA jRCaMP response (mean of normalized signal) to LED (mean of 0–3 s after LED cue) during training. Left, individual mouse averages for different training periods (beginner, intermediate and expert) plotted as mean ± s.e.m. across trials (one-way repeated-measures ANOVA, F(1.870, 14.96) = 73.76, P < 0.0001). Right, daily average of LED response versus success rate, in which each dot represents a daily (session) measurement of a mouse plotted as linear regression fit ± its 95% confidence interval. d, VTA jRCaMP response to reward (mean of 3 s around the peak after receptacle entry) during training plotted as in c (one-way repeated-measures ANOVA, F(1.835, 14.68) = 40.40, P < 0.0001). e, VTA jRCaMP response in trained mice. Left, response to reward omission (RO-dip, mean of 3–6 s (shifted for slow soma jRCaMP signal) after expected time of pellet dispensing, one-sample t-test, P = 0.013) and to LED omission (LO-LED, mean of 0–3 s after expected time of LED onset, one-sample t-test, P = 0.045) plotted as mean ± s.e.m. across mice. Right, response to reward (mean of 0–3 s after pellet dispensing) in regular success (suc) and LED-omission (LO) trials (paired t-test, P = 0.001) plotted as in c. f, As in c, for NAc jRCaMP (one-way repeated-measures ANOVA, F(1.986, 15.89) = 98.52, P < 0.0001). g, As in d, for NAc jRCaMP (one-way repeated-measures ANOVA, F(1.939, 15.51) = 33.95, P < 0.0001). 
h, As in e, NAc jRCaMP response in trained mice. Left, response to reward omission (RO-dip, mean of 0–3 s after expected time of pellet dispensing, one-sample t-test, P < 0.0001) and to LED omission (LO-LED, mean of 0–3 s after expected time of LED onset, one-sample t-test, P = 0.189). Right, response to reward (mean of 0–3 s after pellet dispensing) in regular success (suc) and LED-omission (LO) trials (paired t-test, P < 0.0001). i, Daily average response of VTA jRCaMP, NAc jRCaMP and dLight response to LED and reward across training for individual mice. From the left, one-way mixed-effects analysis, F(1.589, 13.35) = 25.07, P < 0.0001; F(2.339, 18.71) = 11.70, P = 0.0003; F(1.720, 14.45) = 25.75, P < 0.0001; F(2.984, 23.87) = 15.69, P < 0.0001; F(1.806, 15.17) = 35.41, P < 0.0001; F(2.835, 22.68) = 12.73, P < 0.0001. Plotted as mean ± s.e.m. across trials for each mouse. *P < 0.05, **P < 0.01, ***P < 0.001 for one-sample t-tests, and paired t-tests, Bonferroni-corrected post hoc comparisons. All t-tests are two-sided.
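
The three behavioural rates defined in panel a can be written out as a short sketch. The per-trial labels (`success`, `enter_fail`, `occupancy_fail`) and the function name are hypothetical encodings chosen for illustration, and the sketch assumes that every trial falls into exactly one of these three outcomes.

```python
def session_rates(outcomes):
    """Compute per-session rates from a list of per-trial outcome labels.

    Labels (hypothetical): 'success', 'enter_fail' (never entered the
    receptacle zone), 'occupancy_fail' (entered but left prematurely).
    """
    total = len(outcomes)
    n_success = outcomes.count("success")
    n_enter_fail = outcomes.count("enter_fail")
    n_occ_fail = outcomes.count("occupancy_fail")
    # Trials in which the mouse did enter the receptacle zone.
    n_entered = total - n_enter_fail
    nan = float("nan")
    return {
        "success_rate": n_success / total if total else nan,
        "entering_failure_rate": n_enter_fail / total if total else nan,
        # The denominator is entering-success trials, as in the legend.
        "occupancy_failure_rate": n_occ_fail / n_entered if n_entered else nan,
    }
```

Note that the occupancy failure rate uses a different denominator from the other two rates, so the three values need not sum to 1.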

Extended Data Fig. 3 Relationship between DA signal and behavioural parameters.

a, Schematics of building a generalized linear model that relates user-controlled stimuli and behavioural parameters to fluorescence signals. In brief, there are three types of explanatory (independent) variables in the model. Continuous variables (speed, acceleration, rotation and position) continuously change their values as time passes. Event variables (movement initiation, cue, reward delivery and receptacle entry) are 0 except at a time point of an event when they temporarily change their value to 1. Whole trial variables (accuracy = 0 for current trial failure, 1 for current trial success; previous trial = 0 for previous trial failure, 1 for previous trial success) change their values in the beginning of a trial and stay constant until the next trial. b, Comparison of average variable contributions for VTA jRCaMP (red), NAc jRCaMP (orange) and dLight (green) for beginner (left), intermediate (middle), and expert (right) sessions. Contribution of each category was calculated by a method described in a. Kinematic variables include speed, acceleration and rotation variables. Other categories are assigned to an individual variable (a set of time-shifted variables). Mean contributions to three signals were compared by one-way repeated-measures ANOVA for each variable category. Plotted as mean ± s.e.m. across mice (n = 10 mice). c, Comparison of model fits for VTA jRCaMP (red), NAc jRCaMP (orange) and dLight (green) for beginner (left), intermediate (middle), and expert (right) sessions plotted as mean ± s.e.m. across mice (n = 10 mice). Model fit was estimated by the correlation between actual signals and predicted signals from the model. Left set of bars represent correlations during a full duration (about −40 to 80 s with respect to the trigger zone entry). Right set of bars represent correlations during a trial duration (−5 to 15 s with respect to the trigger zone entry). Model fits for three signals were compared by one-way repeated-measures ANOVA. 
(*)P < 0.10 for one-way repeated-measures ANOVA (Bonferroni-corrected).
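
The three variable types and the correlation-based fit measure described in this legend can be sketched as follows. This is an illustration under stated assumptions, not the authors' MATLAB code: it assumes a Gaussian model with identity link fitted by ordinary least squares, uses one continuous variable (speed), one event variable (cue) expanded into five time shifts, and one whole-trial variable (accuracy), and all names and numbers are hypothetical.

```python
import random

def shifted(event, shifts):
    """One column per shift; a positive shift delays the event pulse."""
    n = len(event)
    return [[event[t - s] if 0 <= t - s < n else 0.0 for t in range(n)]
            for s in shifts]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(columns, y):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)

def predict(columns, w):
    return [sum(wj * col[t] for wj, col in zip(w, columns))
            for t in range(len(columns[0]))]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def demo(n=200, seed=0):
    """Build a toy design matrix, simulate a noise-free signal and refit it."""
    rng = random.Random(seed)
    speed = [rng.random() for _ in range(n)]                      # continuous
    cue = [1.0 if t in (20, 80, 140) else 0.0 for t in range(n)]  # event
    accuracy = [1.0 if (t // 60) % 2 == 0 else 0.0 for t in range(n)]  # whole trial
    columns = [[1.0] * n, speed] + shifted(cue, range(5)) + [accuracy]
    true_w = [0.1, 0.2, 0.0, 1.0, 2.0, 1.0, 0.5, 0.5]
    signal = predict(columns, true_w)
    w = fit_ols(columns, signal)
    return w, pearson(signal, predict(columns, w))
```

The set of time-shifted copies of the cue column lets the fit recover a response kernel that unfolds over several samples after the event, and the Pearson correlation between the actual and predicted signals plays the role of the model-fit measure in panel c.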

Extended Data Fig. 4 Bidirectional modulation of DAN activity during behaviour.

a, Schematic describing the experimental procedure. Left, expression of stGtACR2, ChrimsonR and GCaMP6f in DANs. Middle, injection of three viral vectors into VTA of a DAT-IRES-cre mouse. Right, the NAc fibre was used to collect terminal GCaMP signal. The VTA fibre was used to collect somatic GCaMP signal and to optogenetically activate (ChrimsonR) or inactivate (stGtACR2) DANs. b, VTA GCaMP (top) and NAc GCaMP (middle) response (normalized to 99th percentile of the single session) to unpredicted rewards in naive (untrained) mice. Mean NAc GCaMP signal normalized to the control response was compared between control and inactivation trials (bottom, paired t-test, P = 0.002). Dotted line represents time of pellet dispensing. c, VTA GCaMP (top) and NAc GCaMP (middle) response (normalized to 99th percentile of trained mouse sessions) to reward predictive LED cue in trained mice. Comparison of mean NAc GCaMP signal during control and inactivation trials (bottom, paired t-test, P = 0.027). Dotted line represents time of LED onset. d, VTA GCaMP (top) and NAc GCaMP (middle) response (normalized to 99th percentile of trained mouse sessions) to predicted reward (reward following LED cue) in trained mice. Comparison of mean NAc GCaMP signal during control and inactivation trials (bottom, paired t-test, P = 0.065). Dotted line represents time of pellet dispensing. e, VTA GCaMP (top) and NAc GCaMP (middle) response (normalized to 99th percentile of trained mouse sessions) to reward omission in trained mice. Comparison of mean NAc GCaMP signal during control and activation trials (bottom, 2 of 3 lines overlapping, paired t-test, P = 0.029). Dotted line represents time of expected reward delivery. (*)P < 0.10, *P < 0.05, **P < 0.01 for paired t-test. All graphs are plotted as mean ± s.e.m. across mice (n = 3 mice) and dot = mouse average. Average of signal (0–10 s) normalized to the control response was used for the comparison between control and optogenetic trials. 
The average response of each mouse was calculated from 5 or 6 trials. Blue and red bars indicate the periods of blue-laser illumination for stGtACR2 and red-laser illumination for ChrimsonR, respectively. VTA GCaMP signal could not be collected for blue-laser illumination period owing to optical crosstalk. All t-tests are two-sided.
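
The two normalization steps in this legend (scaling to the 99th percentile of a session, then expressing the mean 0–10 s signal relative to the control response) can be sketched as below. The percentile definition (linear interpolation between order statistics) and all function names are assumptions made for illustration.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def normalize_to_p99(trace):
    """Scale a session trace so that its 99th percentile equals 1."""
    p99 = percentile(trace, 99.0)
    return [v / p99 for v in trace]

def window_mean(trace, times, t0, t1):
    """Mean of samples whose time stamp t satisfies t0 <= t < t1."""
    vals = [v for v, t in zip(trace, times) if t0 <= t < t1]
    return sum(vals) / len(vals)

def relative_response(control, opto, times, t0=0.0, t1=10.0):
    """Mean optogenetic-trial signal in the window, relative to control."""
    return window_mean(opto, times, t0, t1) / window_mean(control, times, t0, t1)
```

A value of `relative_response` below 1 would then indicate a smaller response on optogenetic trials than on control trials within the same window.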

Extended Data Fig. 5 Movement and optical artefacts cannot explain dLight and jRCaMP signal patterns.

a, Δf/f (%) of different controls. The average signals for beginner, intermediate, expert, reward-omission (of expert mice) and rewarded LED-omission (of expert mice) trials are shown in red, orange, green, blue and purple, respectively. Dashed vertical lines indicate the behavioural time stamps (T = trigger zone entry, L = LED on, Z = receptacle zone entry, D = pellet dispensing, R = receptacle entry). Top, Δf/f (%) of eGFP signal from the VTA of DAT-IRES-cre mice (n = 4 mice) that were injected with AAV1-Cag-FLEX-eGFP into the VTA. Middle, Δf/f (%) of eGFP signal from the NAc of DAT-IRES-cre mice (n = 4 mice) that were injected with AAV1-Cag-FLEX-eGFP into the VTA. Bottom, Δf/f (%) of DA-binding mutant dLight (D103A mutation) signal from the NAc of C57BL/6J mice (n = 8 mice) that were injected with AAV9-hSyn-dLightD103A into the NAc. b, Δf/f (%) of different controls that are magnified in Δf/f axis and demagnified in time axis. VTA eGFP (left, n = 4 mice), NAc eGFP (middle, n = 4 mice) and NAc mutant dLight (right, n = 8 mice) signal aligned to the time of trigger zone entry (dashed vertical line). There was a minor (compared to sensor responses) change in NAc eGFP and mutant dLight signal that develops across learning (possibly owing to haemodynamic effects). c, Test for the optical crosstalk between green and red spectrum for simultaneous dual-colour photometry for dLight and jRCaMP. Mice were given unexpected free food pellets, and signal was aligned to the time of pellet dispensing. Left, raw fluorescence signal in red and green spectrum from NAc of C57BL/6J mice (n = 3 mice, 10 trials per mouse) injected with AAV9-hSyn-dLight1.1 into the NAc. Right, raw fluorescence signal in red and green spectrum from the NAc of DAT-IRES-cre mice (n = 3 mice, 10 trials per mouse) injected with AAV1-hSyn-FLEX-NES-jRCaMP1b into the VTA. 
d, Baseline (pretrial) raw fluorescence, used to estimate the change in signal strength due to photobleaching and changes in viral expression across days. Raw fluorescence was normalized by the maximum value of each mouse across all sessions. All graphs are plotted as mean ± s.e.m. across mice.

Extended Data Fig. 6 Bilateral dLight measurement and mutant FLIM-AKAR control experiments.

a, Schematic describing a strategy to measure DA level in both hemispheres. AAV9-hSyn-dLight1.1 was bilaterally injected into the NAc of C57BL/6J mice. Then, two optical fibres were implanted 200 μm above the injection sites in two hemispheres. b, Relationship between dLight signals from two hemispheres. Left, average correlation between two dLight signals (Δf/f) during trial duration (−5 to 15 s with respect to the trigger zone entry) and all time (analysed in 20-s time window for each linear fit) plotted as mean ± 95% confidence interval across mouse averages (two-sided paired t-test, P = 0.002). Middle, correlation of individual mouse for trial duration plotted as mean ± 95% confidence interval across session averages for each mouse. Right, correlation of individual mouse for all time plotted as in middle. n = 7 mice. c, Comparison between signals of FLIM-AKAR and FLIM-AKAR(T391A), which has a point mutation at the PKA phosphorylation site. AAV1-FLEX-FLIM-AKAR or AAV1-FLEX-FLIM-AKAR(T391A) was injected into the NAc of Drd1a-cre or Adora2a-cre mice for these experiments. From the top, D1R-SPN FLIM-AKAR (D1R-SPN AKAR), D1R-SPN FLIM-AKAR(T391A) (D1R-SPN mAKAR), D2R-SPN FLIM-AKAR (D2R-SPN AKAR), D2R-SPN FLIM-AKAR(T391A) (D2R-SPN mAKAR). From the left, signals were aligned to the time (dashed vertical line) of ‘LED on’ for success and failure trials separately, ‘receptacle entry’ for success trials and ‘pellet dispensing’ for success, reward-omission and LED-omission trials. Signals for beginner, intermediate, expert, reward-omission (of expert mice) and rewarded LED-omission (of expert mice) trials are shown in red, orange, green, blue and purple, respectively. Plotted as mean ± s.e.m. across mice. n = 14 mice (D1R-SPN AKAR), 7 mice (D1R-SPN mAKAR), 18 mice (D2R-SPN AKAR), 6 mice (D2R-SPN mAKAR).
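
The windowed comparison of the two hemispheres in panel b can be sketched as a Pearson correlation computed in consecutive 20-s windows. The legend does not state whether the windows overlap, so non-overlapping windows are assumed here; the sampling rate and function names are likewise illustrative.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def windowed_correlation(left, right, fs, window_s=20.0):
    """Correlation of two traces in consecutive, non-overlapping windows.

    fs is the sampling rate in Hz; a trailing partial window is dropped.
    """
    step = int(round(fs * window_s))
    return [
        pearson(left[start:start + step], right[start:start + step])
        for start in range(0, len(left) - step + 1, step)
    ]

def mean_correlation(left, right, fs, window_s=20.0):
    """Average of the per-window correlations."""
    rs = windowed_correlation(left, right, fs, window_s)
    return sum(rs) / len(rs)
```

Computing the correlation per window rather than over the whole session keeps slow drifts, such as photobleaching, from dominating the estimate.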

Extended Data Fig. 7 Plasticity in patterns of PKA activity in SPNs during learning.

a, Heat maps of SPN FLIM-AKAR response for success trials during learning. Each row represents an individual trial. Red lines or dots indicate behavioural time stamps (T = trigger zone entry, L = LED on, Z = receptacle zone entry, D = pellet dispensing, R = receptacle entry). Different colours in mouse ID columns represent different mice for an individual row. Top, D1R-SPN FLIM-AKAR responses of Drd1a-cre mice. n = 98 trials (beginner), 418 trials (intermediate), 873 trials (expert) from 14 mice. Bottom, D2R-SPN FLIM-AKAR responses of Adora2a-cre mice. n = 134 trials (beginner), 596 trials (intermediate), 1,152 trials (expert) from 18 mice. b, Heat maps of SPN FLIM-AKAR response for failure trials during learning. Plotted as in a. Top, D1R-SPN FLIM-AKAR responses of Drd1a-cre mice. n = 497 trials (beginner), 402 trials (intermediate), 122 trials (expert) from 14 mice. Bottom, D2R-SPN FLIM-AKAR responses of Adora2a-cre mice. n = 528 trials (beginner), 416 trials (intermediate), 218 trials (expert) from 18 mice. c, Heat maps of SPN FLIM-AKAR response for reward-omission trials (left) and rewarded LED-omission trials (right). Plotted as in a. Top, D1R-SPN FLIM-AKAR responses of Drd1a-cre mice. n = 69 trials (reward omission) from 14 mice, 25 trials (rewarded LED-omission trials) from 6 mice. Bottom, D2R-SPN FLIM-AKAR responses of Adora2a-cre mice. n = 91 trials (reward omission) from 18 mice, 35 trials (rewarded LED-omission trials) from 10 mice.

Extended Data Fig. 8 Transient change in DAN activity is sufficient to modulate PKA activity in SPNs in the NAc.

a, Left, dLight responses in DAT-IRES-cre mice (n = 3 mice) to DAN activation (20 Hz, 2-ms pulse width, 14.3 mW illumination) for different durations of illumination (red = 0.5 s, orange = 1 s, green = 5 s, blue = 10 s, one-way repeated-measures ANOVA, F(1.064, 2.127) = 35.89, P = 0.023). Middle, D2R-SPN FLIM-AKAR responses in DAT-IRES-cre;Adora2a-cre mice (n = 4 mice) plotted in the same way as the left (one-way repeated-measures ANOVA, F(1.353, 4.059) = 4.140, P = 0.109). Right, D2R-SPN FLIM-AKAR responses in DAT-IRES-cre;Adora2a-cre mice (n = 4 mice) to 10 s illumination without (black) and with (red) intraperitoneal injection of D2R antagonist at least 10 min before recording (paired t-test, valley: P = 0.338, peak: P = 0.033). Statistics were performed on mean dLight (0–10 s) and AKAR (end of illumination to 20 s after the end of illumination, except for valley estimation (0–10 s) in the D2R antagonist experiment) signals. To test whether D2R-SPN PKA activity can respond to DAN activation at all, we activated DANs for 10 s, which increases DA levels far more than does a natural food reward response (b, left). This nonphysiological level of DA release results in a bidirectional modulation of D2R-SPN net PKA activity (b, right) with net PKA activity slightly decreasing and then increasing. However, given that the D2R antagonist does not significantly affect the initial reduction in PKA activity, this reduction is unlikely to be D2R-mediated. On the other hand, the delayed activation of PKA was blunted by the D2R antagonist, suggesting a contribution of indirect circuit mechanisms, such as modulation of the activity of D2R-expressing cholinergic interneurons in the NAc. b, dLight and D1R-SPN FLIM-AKAR responses to DAN terminal stimulation (20 Hz, 2-ms pulse width) in the NAc (red = VTA DAN stimulation for 1 s and 10.5 mW, orange = DAN terminal stimulation for 1 s and 7.7 mW). Left, dLight responses in DAT-IRES-cre mice (n = 3 mice, paired t-test, P = 0.429). 
Right, D1R-SPN FLIM-AKAR responses in DAT-IRES-cre;Drd1a-cre mice (n = 6 mice, paired t-test, P = 0.597). Statistics were performed on mean dLight (0–3 s) and AKAR (end of illumination to 20 s after the end of illumination) signals. c, Optogenetic induction of ramping DA level change and consequent PKA activity change in SPNs for different illumination durations. Left, dLight responses in DAT-IRES-cre mice (n = 3 mice) to ramping DAN activation for different ramping durations (red = 3 s, orange = 5 s, green = 7 s ramping activation) and to food reward (blue) (one-way repeated-measures ANOVA, F(1.123, 2.246) = 4.260, P = 0.163). To induce ramping DA level change, the frequency of stimulation was gradually increased from 24 Hz to 34 Hz for 3 s (at 10.5 mW), 16 Hz to 34 Hz for 5 s (at 6.1 mW), and 4 Hz to 30 Hz for 7 s (at 10.5 mW). Middle, D1R-SPN FLIM-AKAR responses in DAT-IRES-cre;Drd1a-cre mice (n = 4 mice) plotted in the same way as the left (one-sample t-test on 7-s ramp, P = 0.038). Right, D2R-SPN FLIM-AKAR responses in DAT-IRES-cre;Adora2a-cre mice (n = 4 mice) plotted in the same way as the left (one-sample t-test on 7-s ramp, P = 0.779). d, D1R-SPN PKA activation versus DA release analysis. Left, mean of D1R-SPN AKAR (0–80 s) versus mean of dLight (0–20 s) for different stimulations (reward, 3-s ramp, 5-s ramp and 7-s ramp). Right, peak of D1R-SPN AKAR (0–80 s) versus mean of dLight (0–20 s) for different stimulations. Each data point represents the average and the s.e. across mice (n = 3 mice for dLight, n = 4 mice for AKAR). All t-tests are two-sided. All graphs are plotted as mean ± s.e.m. (if shaded) across mice. Dashed vertical line = illumination onset. The average response of each mouse was calculated from 10 trials. Blue bars indicate the periods of laser illumination (NAc) for ChrimsonR during which accurate FLIM-AKAR measurements were not possible. 
Pellet response for dLight was aligned to the peak after receptacle entry and time-shifted so that the upward slope starts near 0 s. Pellet response for D1R-SPN AKAR was aligned to the receptacle entry. Statistics for c were performed on mean dLight signal (0–20 s) and mean AKAR signal (0–80 s).
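
The ramping stimulation in c can be sketched as a pulse train whose rate rises over the ramp. The legend says only that the frequency was "gradually increased", so the linear ramp below is an assumption, and `ramp_pulse_times` is a hypothetical helper, not part of the published protocol.

```python
import math

def ramp_pulse_times(f_start_hz, f_end_hz, duration_s):
    """Pulse onset times (s) for a pulse train whose rate rises linearly.

    Assumes instantaneous frequency f(t) = f_start + (f_end - f_start) * t / duration.
    Pulse k is emitted when the accumulated phase (the integral of f) reaches k.
    """
    if f_start_hz <= 0 or duration_s <= 0:
        raise ValueError("start frequency and duration must be positive")
    if f_end_hz < f_start_hz:
        raise ValueError("this sketch only handles non-decreasing ramps")
    a = (f_end_hz - f_start_hz) / (2.0 * duration_s)  # quadratic phase coefficient
    times = []
    k = 0
    while True:
        if a == 0:
            t = k / f_start_hz  # constant-rate train
        else:
            # Non-negative root of a*t^2 + f_start*t - k = 0.
            t = (-f_start_hz + math.sqrt(f_start_hz ** 2 + 4.0 * a * k)) / (2.0 * a)
        if t >= duration_s:
            break
        times.append(t)
        k += 1
    return times

# 3-s ramp from 24 Hz to 34 Hz, as listed in the legend.
pulses_3s = ramp_pulse_times(24.0, 34.0, 3.0)
intervals_3s = [b - a for a, b in zip(pulses_3s, pulses_3s[1:])]
```

Under this assumption the number of pulses is close to the mean frequency multiplied by the duration (about 29 Hz × 3 s for the shortest ramp), and the inter-pulse interval shrinks steadily from just under 1/24 s towards 1/34 s.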

Extended Data Fig. 9 Selective PKA inhibition in SPNs slows learning.

a, Schematic describing a strategy to investigate the effect of D1R-SPN or D2R-SPN PKA inhibition on behaviour. AAV1-FLEX-PKIalpha-IRES-nls-mRuby2 was injected into the NAc of Drd1a-cre or Adora2a-cre mice to selectively inhibit PKA in D1R-SPN or D2R-SPN, respectively. For control groups, AAV1-Cag-FLEX-eGFP was injected instead. Ten to fourteen days after surgery, mice were started on a behaviour schedule that includes 1 day of habituation (day 0) and 11 days of training (days 1 to 11). b, Effect of D1R-SPN PKA inhibition on the fraction of time spent in trigger zone (time spent in trigger zone/total session time). Daily average time (left, two-way repeated-measures ANOVA day × group interaction, F(10, 140) = 5.565, P < 0.0001) and multiday average (right, unpaired t-test, P = 0.012, 0.017, 0.323). c, Effect of D1R-SPN PKA inhibition on the speed after LED (average speed during 0–1.2 s after LED onset; 1.2 s is the minimum latency to enter the receptacle zone after LED cue). Daily average speed after LED onset (left, two-way repeated-measures ANOVA day × group interaction, F(10, 140) = 2.923, P = 0.002) and multiday average (right, unpaired t-test, P = 0.046, >0.999, >0.999). d, Effect of D1R-SPN PKA inhibition on the entering failure rate (number of entering failure trials/total number of trials). Daily average entering failure rate (left, two-way repeated-measures ANOVA day × group interaction, F(10, 140) = 2.591, P = 0.007) and multiday average (right, unpaired t-test, P = 0.043, 0.367, 0.073). e, Effect of D2R-SPN PKA inhibition on the time spent in trigger zone. Daily average time (left, two-way repeated-measures ANOVA day × group interaction, F(10, 140) = 1.322, P = 0.224) and multiday average (right, unpaired t-test, P > 0.999, >0.999, >0.999). f, Effect of D2R-SPN PKA inhibition on the speed after LED. 
Daily average speed after LED onset (left, two-way repeated-measures ANOVA day × group interaction, F(10, 140) = 3.124, P = 0.001) and multiday average (right, unpaired t-test, P = 0.938, 0.011, 0.145). g, Effect of D2R-SPN PKA inhibition on the entering failure rate. Daily average entering failure rate (left, two-way repeated-measures ANOVA day × group interaction, F(10, 140) = 1.951, P = 0.043) and multiday average (right, unpaired t-test, P > 0.999, 0.011, >0.999). h, Occupancy failure rate (number of premature receptacle zone exit trials/number of receptacle zone entering success trials) of D1R-SPN PKA inhibition experiments (one-way repeated-measures ANOVA on GFP group, F(3.213, 22.49) = 1.918, P = 0.153; one-way repeated-measures ANOVA on PKI group, F(2.424, 16.97) = 0.352, P = 0.747). i, Left, average speed (total distance travelled/total session time, two-way repeated-measures ANOVA group effect, F(1, 14) = 6.506, P = 0.023). Right, average speed during baseline period (from 20 s before trigger zone entry to trigger zone entry, two-way repeated-measures ANOVA group effect, F(1, 14) = 4.304, P = 0.057). j, DAPI cell counting for D1R-SPN PKA inhibition (unpaired t-test, P = 0.517). k, Occupancy failure rate of D2R-SPN PKA inhibition experiments (one-way repeated-measures ANOVA on GFP group, F(2.293, 16.05) = 2.062, P = 0.155; one-way repeated-measures ANOVA on PKI group, F(1.598, 11.19) = 1.482, P = 0.264). l, Left, average speed (two-way repeated-measures ANOVA group effect, F(1, 14) = 0.011, P = 0.916). Right, average speed during baseline period (two-way repeated-measures ANOVA group effect, F(1, 14) = 0.211, P = 0.653). m, DAPI cell counting for D2R-SPN PKA inhibition (unpaired t-test, P = 0.598). *P < 0.05 for Bonferroni-corrected unpaired t-tests. All t-tests are two-sided. Each dot on the bar graphs represents one mouse. All graphs are plotted as mean ± s.e.m. across mice (n = 8 for each group).
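
The behavioural measures in b, d and h are ratios of per-session tallies, and the multiday comparisons use Bonferroni-corrected t-tests. A minimal sketch follows. The field names and all numbers are hypothetical, entering successes are assumed to equal total trials minus entering failures, and the number of comparisons (three) is an assumption inferred from the three P values listed per panel.

```python
from dataclasses import dataclass

@dataclass
class SessionCounts:
    """Per-session tallies; field names are illustrative, not taken from the paper."""
    time_in_trigger_zone_s: float
    total_session_time_s: float
    n_trials: int
    n_entering_failures: int
    n_premature_exits: int

    def trigger_zone_fraction(self) -> float:
        # time spent in trigger zone / total session time
        return self.time_in_trigger_zone_s / self.total_session_time_s

    def entering_failure_rate(self) -> float:
        # number of entering failure trials / total number of trials
        return self.n_entering_failures / self.n_trials

    def occupancy_failure_rate(self) -> float:
        # premature receptacle zone exits / receptacle zone entering successes
        # (assumes entering successes = total trials - entering failures)
        entering_successes = self.n_trials - self.n_entering_failures
        return self.n_premature_exits / entering_successes

def bonferroni(p_values):
    """Bonferroni adjustment: multiply each P value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical session and hypothetical raw P values for three multiday bins.
session = SessionCounts(
    time_in_trigger_zone_s=600.0,
    total_session_time_s=2400.0,
    n_trials=50,
    n_entering_failures=10,
    n_premature_exits=4,
)
adjusted = bonferroni([0.004, 0.5, 0.02])
```

Capping the adjusted value at 1 is what produces entries such as ">0.999" when a raw P value is large relative to the number of comparisons.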

Extended Data Fig. 10 Model of DA action on PKA activity of SPNs.

Overview of DA action on SPN PKA activity. Top, when DAN activity increases in response to a reward, reward predictive cue or optogenetic activation of DANs, more DA is released from DAN terminals in the NAc. This increase in DA level allows DA to bind to D1R, which increases the activity of adenylyl cyclase, the level of cAMP and—ultimately—the activity of PKA in D1R-SPNs. By contrast, the increase in DA level has a minimal effect on D2R, which is occupied by the basal level of DA. Bottom, when DAN activity decreases in response to a reward omission or optogenetic inhibition of DANs, DA release from DAN terminals in the NAc decreases below the baseline. This decrease in DA level has a minimal effect on D1R, which is not occupied by the basal DA. By contrast, the decrease in DA level allows the basal DA to unbind from D2R, which disinhibits PKA activity in D2R-SPNs.
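
The asymmetry described in this model can be illustrated with a simple single-site binding calculation. This is a sketch under stated assumptions, not the authors' model: the dissociation constants and DA concentrations below are hypothetical placeholders chosen only so that D2R is largely occupied and D1R largely unoccupied at baseline, as the legend describes.

```python
def occupancy(da_nm, kd_nm):
    """Fractional receptor occupancy for single-site binding: DA / (DA + Kd)."""
    return da_nm / (da_nm + kd_nm)

# All values below are hypothetical and for illustration only.
KD_D1R_NM = 1000.0   # assumed low-affinity receptor
KD_D2R_NM = 2.0      # assumed high-affinity receptor
BASAL_DA_NM = 20.0   # assumed basal DA concentration
HIGH_DA_NM = 500.0   # assumed DA level during a transient increase
LOW_DA_NM = 2.0      # assumed DA level during a transient dip

def occupancy_change(da_nm, kd_nm, basal_nm=BASAL_DA_NM):
    """Change in occupancy relative to the basal DA level."""
    return occupancy(da_nm, kd_nm) - occupancy(basal_nm, kd_nm)

# DA increase: large change at D1R, small change at the nearly saturated D2R.
d1_on_increase = occupancy_change(HIGH_DA_NM, KD_D1R_NM)
d2_on_increase = occupancy_change(HIGH_DA_NM, KD_D2R_NM)

# DA decrease: small change at the nearly empty D1R, large change at D2R.
d1_on_decrease = occupancy_change(LOW_DA_NM, KD_D1R_NM)
d2_on_decrease = occupancy_change(LOW_DA_NM, KD_D2R_NM)
```

With these placeholder values, a rise in DA mostly changes D1R occupancy, whereas a dip mostly changes D2R occupancy, which matches the qualitative picture in the legend. The quantitative outcome depends entirely on the assumed affinities and concentrations.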

Supplementary information

Supplementary Table 1

P-values for statistical comparisons

Reporting Summary

About this article

Cite this article

Lee, S.J., Lodder, B., Chen, Y. et al. Cell-type-specific asynchronous modulation of PKA by dopamine in learning. Nature 590, 451–456 (2021). https://doi.org/10.1038/s41586-020-03050-5

  • DOI: https://doi.org/10.1038/s41586-020-03050-5
