Behaviors are influenced by rewards to both oneself and others, but the neurons and neural connections that monitor and evaluate rewards in social contexts are unknown. To address this issue, we devised a social Pavlovian conditioning procedure for pairs of monkeys. Although forthcoming self-rewards were constant in amount and probability, their subjective value, as indexed by licking and choice behaviors, decreased as partner-reward probability increased. This value modulation was absent when the conspecific partner was replaced by a physical object. Medial prefrontal cortex neurons selectively monitored self-reward and partner-reward information, whereas midbrain dopaminergic neurons integrated this information into a subjective value. Recordings of local field potentials revealed that responses to reward-predictive stimuli in medial prefrontal cortex started before those in dopaminergic midbrain nuclei and that neural information flowed predominantly in a medial prefrontal cortex-to-midbrain direction. These findings delineate a dedicated pathway for subjective reward evaluation in social environments.
The data that support the findings of this study are available from the corresponding author upon reasonable request.
The authors thank M. Ullsperger, A. Fischer, R. Burnside, M. Yoshida, and Y. Kobayashi for their helpful discussions; M. Matsumoto for physiological identification of dopaminergic neurons; and M. Togawa, Y. Yamanishi, and A. Shibata for technical assistance. Japanese monkeys used in this study were provided by the National BioResource Project ‘Japanese Macaques’ of Japan Agency for Medical Research and Development, AMED. This research was supported in part by a Grant-in-Aid for Japan Society for the Promotion of Science KAKENHI (15H04262), the Strategic International Research Cooperative Program from AMED (JP17jm0310011), and the Strategic Research Program for Brain Sciences from AMED (JP18dm0107145) to M.I.
The authors declare no competing interests.
Integrated supplementary information
a, Three possible reward outcomes resulting from behavioral constraints mimicking resource limitation (left) and the number of each outcome for each stimulus (right). Each block consisted of 120 trials and each stimulus was presented equally often (n = 40 trials). b,c, Changes in M1’s reward probability in the self-variable block (b) and the partner-variable block (c). P(M1), M1-reward probability. P(M2), M2-reward probability. P(¬M2), M2-no-reward probability. Note that P(M1) after the outcome to M2 represents the conditional probability of M1’s reward. Grayed areas indicate the stimulus period (1 s) on which the analysis was focused. The numbers in parentheses indicate the number of corresponding cases (see a, right).
The probability structure is described in detail here, taking the partner-variable block as an example. In the partner-variable block, three different stimuli were used, that is, stimulus D (P = 0.25 for M2), stimulus E (P = 0.5 for M2), and stimulus F (P = 0.75 for M2), which were presented equally often (40 trials each; see a, right). M1 was rewarded in 8 of the 40 trials for every stimulus (8/40 = 0.2; see ‘M1-rewarded’ in a, right). Thus, during the stimulus presentation, the reward probability for M1 was, on average, 0.2 (= 20%) regardless of which stimulus was presented. This is shown schematically in c by a horizontal line in each stimulus condition with a gray background. Once the reward outcome for M2 was revealed, there were two different scenarios. The first scenario occurred after M2 was rewarded. The number of trials in which this scenario occurred was 10 for stimulus D, 20 for stimulus E, and 30 for stimulus F (see ‘M2-rewarded’ in a, right). In these trials, M1 was never rewarded; as explained in the main text, there was no trial in which both animals were rewarded. Therefore, the conditional probability of M1 receiving a reward was 0 (= 0%; 0/10 for stimulus D, 0/20 for stimulus E, and 0/30 for stimulus F). The second scenario occurred after M2 was unrewarded. The number of trials in which this scenario occurred was 30 for stimulus D, 20 for stimulus E, and 10 for stimulus F (see ‘M1-rewarded’ plus ‘Neither-rewarded’ in a, right). In these trials, M1 had a chance to receive a reward. Specifically, the conditional probability of M1 receiving a reward was, on average, 0.27 (8/30 trials) for stimulus D, 0.4 (8/20 trials) for stimulus E, and 0.8 (8/10 trials) for stimulus F.
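The probability structure described above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the trial counts stated in the caption (40 trials per stimulus, 8 M1-rewarded trials, and 10, 20, or 30 M2-rewarded trials). The variable and function names are illustrative and are not taken from the study's analysis code.

```python
from fractions import Fraction

TRIALS_PER_STIMULUS = 40
M1_REWARDED = 8  # identical for every stimulus in the partner-variable block

# Number of M2-rewarded trials per stimulus, as stated in the caption.
M2_REWARDED = {"D": 10, "E": 20, "F": 30}


def m1_probabilities(m2_rewarded):
    """Return P(M1), P(M1 | M2 rewarded) and P(M1 | M2 unrewarded)."""
    m2_unrewarded = TRIALS_PER_STIMULUS - m2_rewarded
    p_m1 = Fraction(M1_REWARDED, TRIALS_PER_STIMULUS)
    # The two animals were never rewarded on the same trial.
    p_m1_given_m2 = Fraction(0, m2_rewarded)
    # All M1 rewards fall in the trials where M2 was unrewarded.
    p_m1_given_not_m2 = Fraction(M1_REWARDED, m2_unrewarded)
    return p_m1, p_m1_given_m2, p_m1_given_not_m2


for stimulus, n_m2 in M2_REWARDED.items():
    p, p_yes, p_no = m1_probabilities(n_m2)
    print(f"stimulus {stimulus}: P(M1)={float(p):.2f}, "
          f"P(M1|M2)={float(p_yes):.2f}, P(M1|not M2)={float(p_no):.2f}")
```

Exact fractions are used so that the values quoted in the caption (0.2, 0, and 8/30, 8/20, 8/10) can be compared without rounding error.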
a, An example of M1’s gaze behavior obtained during a single session (501–1,000 ms after stimulus offset). ROI, region of interest. In the figure, M1 was positioned on the near side (not shown) and M2 was positioned on the far side. b, Proportion of gaze locations entered in the ROI (501–1,000 ms after stimulus offset). **P < 0.01; two-tailed Welch’s t-test. Self-variable block; monkey S, t(3519) = 17.32, P = 1.41 × 10^−64 (M2 unrewarded, n = 2596 trials; M2 rewarded, n = 649 trials); monkey H, t(1390) = 8.1, P = 1.21 × 10^−15 (M2 unrewarded, n = 2193 trials; M2 rewarded, n = 550 trials). Partner-variable block; monkey S, t(3522) = 5.39, P = 7.61 × 10^−8 (M2 unrewarded, n = 1956 trials; M2 rewarded, n = 1974 trials); monkey H, t(809) = 3.55, P = 4.07 × 10^−4 (M2 unrewarded, n = 1253 trials; M2 rewarded, n = 1253 trials). Red bars, mean. c, A heat map showing the proportion of monkey S’s gaze locations during the stimulus period. Monkey S was positioned on the near side (not shown). Note that this monkey spontaneously looked at the reward-predicting stimuli most of the time, although it was not instructed to do so (see Methods). d, Proportion of monkey S’s gaze locations entered in the ROI during the early and late epochs. Data are plotted for each probability condition. Early, P = 0.68; late, P = 0.47; Spearman rank correlation test (n = 15102, 15112 and 15114 trials for 0.25, 0.50 and 0.75, respectively, in both early and late epochs). The proportion of gaze in the ROI was significantly smaller in the late epoch than in the early epoch (two-tailed paired t-test; t(45327) = 32.68, P = 1.5 × 10^−231).
M2 did not show differential anticipatory licking behavior in the self-variable block, where M2’s reward probability was the same and M1’s reward probability was variable (see Fig. 1b). Note that M2 received reward outcomes before M1. Monkey B was paired with monkey S, and monkey D was paired with monkey H. **P < 0.01; n.s., not significant (P > 0.8); Spearman rank correlation test. Monkey B: self-variable block, ρ = −3.3 × 10^−3, P = 0.88 (n = 690 blocks); partner-variable block, ρ = 0.12, P = 1.37 × 10^−7 (n = 690 blocks). Monkey D: self-variable block, ρ = −3.1 × 10^−3, P = 0.83 (n = 322 blocks); partner-variable block, ρ = 0.12, P = 2.42 × 10^−4 (n = 322 blocks). Center and error bars indicate mean ± s.e.m.
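For readers unfamiliar with the statistic reported here, Spearman's ρ is the Pearson correlation computed on ranks, with tied values sharing their average rank. The following is a minimal standard-library sketch under that textbook definition. The data are hypothetical, the function names are illustrative, and no P value is computed, so it should not be read as the authors' analysis pipeline.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: reward probability per block and a made-up licking measure.
probability = [0.25, 0.25, 0.5, 0.5, 0.75, 0.75]
licking = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rho = spearman_rho(probability, licking)
print(f"rho = {rho:.3f}")
```

Because only three probability levels exist, ties are unavoidable in data of this kind, which is why the sketch uses average ranks rather than the tie-free shortcut formula.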
a, Four reward outcomes were possible in the noncontingent condition of the social Pavlovian procedure (left). The number of each outcome for each stimulus is shown (right). b,c, No change in M1-reward probability following stimulus offset either in the self-variable block (b) or in the partner-variable block (c). The same conventions are used as in Supplementary Fig. 1. d, Lack of subjective value difference in the partner-variable block when the outcome contingency was absent. The monkeys were first conditioned in the noncontingent condition and then in the contingent condition. **P < 0.01; *P < 0.05; n.s., not significant; Spearman rank correlation test. Center and error bars indicate mean ± s.e.m. Monkey S, contingency (−); self-variable block: ρ = 0.18, P = 5.01 × 10^−19 (days 1–3, n = 2505 trials); ρ = 0.27, P = 1.14 × 10^−51 (days 4–7, n = 3120 trials); partner-variable block: ρ = 0.006, P = 0.76 (days 1–3, n = 2580 trials); ρ = −7.8 × 10^−3, P = 0.66 (days 4–7, n = 3300 trials). Monkey S, contingency (+); self-variable block: ρ = 0.33, P = 1.72 × 10^−67 (days 1–3, n = 2580 trials); ρ = 0.42, P = 4.63 × 10^−93 (days 4–6, n = 2160 trials); partner-variable block: ρ = −0.03, P = 9.19 × 10^−2 (days 1–3, n = 2700 trials); ρ = −5.29 × 10^−2, P = 1.39 × 10^−2 (days 4–6, n = 2160 trials). Monkey H, contingency (−); self-variable block: ρ = 6.43 × 10^−2, P = 1.7 × 10^−3 (days 1–3, n = 2375 trials); ρ = 0.13, P = 1.39 × 10^−13 (days 4–7, n = 3026 trials); partner-variable block: ρ = 1.38 × 10^−2, P = 0.5 (days 1–3, n = 2383 trials); ρ = 0.3 × 10^−3, P = 0.99 (days 4–7, n = 3336 trials). Monkey H, contingency (+); self-variable block: ρ = 0.12, P = 8.71 × 10^−4 (days 1–3, n = 715 trials); ρ = 0.13, P = 0.4 × 10^−2 (days 4–5, n = 479 trials); partner-variable block: ρ = −0.01, P = 0.77 (days 1–3, n = 716 trials); ρ = −0.2, P = 1.09 × 10^−5 (days 4–5, n = 479 trials).
a, Definition of spike duration. The spike duration was measured from the peak of the first negative component to the peak of the subsequent positive component. b, Distribution of the spike duration for dopaminergic (DA) neurons. Filled red bars indicate dopaminergic neurons with activity that negatively correlated with the partner-reward probability (partner effect). Open black bars indicate dopaminergic neurons without the significant partner effect. For comparison, the spike duration for presumed substantia nigra pars reticulata (SNr) neurons is shown in blue (mean spike duration, 0.23 ms; mean baseline firing rate, 77 Hz; n = 31). c, Baseline firing rate of dopamine neurons was calculated during inter-trial intervals. Values denote the mean and values in parentheses denote s.d. There was no significant difference between neurons with (n = 31 neurons) and without (n = 169 neurons) the significant partner effect in any of the parameters (two-tailed Wilcoxon signed-rank test; firing rate in the self-variable block, P = 0.35; firing rate in the partner-variable block, P = 0.11; spike duration, P = 0.45).
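The definition of spike duration in panel a (first negative peak to the subsequent positive peak) can be illustrated with a short sketch. The waveform values and the 40 kHz sampling rate below are hypothetical, and the function takes the global minimum as the first negative peak, which is a simplifying assumption rather than the procedure used in the study.

```python
def spike_duration_ms(waveform, sampling_rate_hz):
    """Time in ms from the negative peak (trough) to the following positive peak."""
    # Assumption: the global minimum marks the first negative component.
    trough = min(range(len(waveform)), key=lambda i: waveform[i])
    # Search for the largest sample at or after the trough.
    peak = max(range(trough, len(waveform)), key=lambda i: waveform[i])
    return (peak - trough) / sampling_rate_hz * 1000.0


# Hypothetical spike waveform in arbitrary units, sampled at 40 kHz.
waveform = [0, -2, -8, -3, 1, 4, 6, 5, 3, 1, 0]
duration = spike_duration_ms(waveform, 40_000)
print(f"spike duration: {duration:.3f} ms")
```

In this toy waveform the trough and the following peak are four samples apart, so the measured duration is four sampling intervals.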
Scatter plots of sensitivity to the self-reward probability measured in the self-variable block (abscissa) and sensitivity to the partner-reward probability measured in the partner-variable block (ordinate) in the early (left) and late (right) epochs. Each dot represents the data for each neuron (n = 207). The number of each neuronal type is provided in the inset. See Methods for the definition of each neuronal type.
a, Spike density functions for a population of negative self-type MPFC neurons in the early (n = 21 neurons, top) and late (n = 19 neurons, bottom) epochs. b, Spike density functions for a population of negative partner-type MPFC neurons in the early (n = 11 neurons, top) and late (n = 39 neurons, bottom) epochs. Gray rectangles indicate the analysis epochs. See Methods for the definition of positive and negative types.
a, The temporal sequence of events was identical to the procedure shown in Fig. 1d, except that M2’s reward spout was removed. Under this condition, M2 was unable to receive a reward even if the low-pitched tone was presented. b, Choice bias measured for each stimulus pair. The same conventions were used as shown in Fig. 1e. *P < 0.05; **P < 0.01; n.s., not significant; two-tailed paired t-test. Self-variable block (n = 14 blocks for each probability condition): 0.25 vs. 0.5, t(13) = 28.88, P = 2.21 × 10^−11; 0.25 vs. 0.75, t(13) = 173.86, P = 2.84 × 10^−23; 0.5 vs. 0.75, t(13) = 96.06, P = 6.32 × 10^−20. Partner-variable block (n = 13 blocks for each probability condition): 0.25 vs. 0.5, t(12) = 5.22 × 10^−2, P = 0.96; 0.25 vs. 0.75, t(12) = 3.12 × 10^−2, P = 0.98; 0.5 vs. 0.75, t(12) = 2.6, P = 2.33 × 10^−2. Note that the choice bias indicated by * in the partner-variable block is opposite to the choice bias in the social condition (Fig. 1e).
a, Single-trial raw LFP for each probability condition in the self-variable block (top) and the partner-variable block (bottom). b, Block-averaged LFP for each probability condition in the self-variable block (top) and partner-variable block (bottom). Each probability condition consisted of 40 trials. Note different gains (ordinate) between a and b.
Supplementary Figure 10 Comparison of single-unit spike latencies between dopaminergic neurons and MPFC neurons.
Dopaminergic neurons generally had latencies longer than 100 ms. Approximately 20% of the MPFC neurons had latencies shorter than 100 ms.
About this article
Cite this article
Noritake, A., Ninomiya, T. & Isoda, M. Social reward monitoring and valuation in the macaque brain. Nat Neurosci 21, 1452–1462 (2018). https://doi.org/10.1038/s41593-018-0229-7