Introduction

Organisms are motivated to seek pleasurable experiences (eg, food, sex, drugs of abuse), and receipt of these positive/rewarding stimuli reinforces behavior. As such, neuroscientists have sought to identify the neural underpinnings of reward and reinforcement for decades. Olds and Milner (1954) observed the first instance of brain stimulation reward (BSR) and subsequently determined that rats would work for this stimulation: that is, electrical current delivered to a discrete brain region could serve as a positive reinforcer. These findings spurred the search for the brain’s ‘reward center’. Since then, numerous nuclei and neurotransmitter systems have been implicated in reward processing and reinforcement; however, none more so than the mesolimbic dopamine (DA) system (Ikemoto, 2007; Taber et al, 2012; Wise and Rompre, 1989). An established body of evidence shows that natural reinforcers such as food, as well as drugs of abuse and BSR, support operant behaviors through their ability to activate the mesolimbic system (Di Chiara et al, 2004; Hernandez and Hoebel, 1988a; Yoshida et al, 1992). Moreover, DAergic lesions or receptor antagonism attenuate approach toward, or responding for, these stimuli (Ettenberg and Camp, 1986a; Mora et al, 1975; Robledo et al, 1992).

Accumulating evidence from the last few decades, however, has refocused DA’s role in reward. Although research suggests that DA controls several aspects of reinforcement, such as motivation (Salamone and Correa, 2002), incentive salience (Berridge and Robinson, 1998), and prediction error (Schultz et al, 1997), the pleasurable experience produced by rewarding stimuli (ie, hedonia) does not appear to be strictly DA-dependent. In support of this view, DA depletion spares orofacial ‘liking’ responses to sweet tasting solutions in rats (Berridge and Robinson, 1998), and DA receptor antagonism in human subjects does not reduce reported ratings of pleasure produced by food or drugs (Brauer and De Wit, 1997; Meyers et al, 2010). Rather, research suggests that the subjective pleasure that characterizes ‘reward’ stems from activation of the endogenous opioid system. For example, opioid receptor agonism enhances orofacial liking responses to sweet solutions in rats (Pecina, 2005), and human studies report that systemic opioid antagonism attenuates the experience of pleasure following physical activity (Daniel et al, 1992) or food consumption (Yeomans and Gray, 1996). Nevertheless, the neurobiology underlying these effects remains complicated, since opioid and DA systems are anatomically connected and blockade of opioid signaling also reduces DAergic activity (Spanagel et al, 1992; Taber et al, 1998).

Interestingly, both DAergic and opioid systems are strongly influenced by endocannabinoid (eCB) signaling, and mounting evidence indicates that the eCB system is critical in DAergic and opioid control of reinforcement and reward. This review will explore how these neural systems interact in the reward and reinforcement produced by exogenous and endogenous CBs, BSR, and food. We will attempt to differentiate between aspects of reward and reinforcement when possible; however, in animal behavioral tests these properties can be difficult to disentangle. In general, we will treat reward as hedonia or pleasure. BSR and place conditioning are the most widely used paradigms to assess reward, so we will focus on results from these models. Stimuli described here as rewards decrease thresholds for BSR and induce conditioned place preferences (CPPs) (Fountain et al, 1990; Tzschentke, 2007). Reinforcement, however, is a process separate from, albeit usually interconnected with, reward. We will discuss reinforcement as the ability of a stimulus to reinforce or support operant behavior. Self-administration paradigms are considered optimal for measuring drug reinforcement, while operant responding for natural stimuli, such as food, is used to assess their reinforcing value. These distinctions are important because not all rewards are capable of promoting operant behavior and not all reinforcers produce hedonia, suggesting that these processes are likely subserved by different brain systems.

The eCB system in brief

The eCB system is so named because it provides the binding site for exogenous CBs (chemical constituents of the cannabis plant or their synthetic analogs). This neuromodulatory system consists of two well-characterized cannabinoid (CB) receptors (CB1 and CB2), as well as their endogenous ligands and the proteins responsible for ligand synthesis, reuptake, and degradation. Both CB1 and CB2 are Gi/o-coupled; however, they differ in both anatomical distribution and function. CB1 is expressed predominantly in the central and peripheral nervous systems (Herkenham et al, 1991), whereas CB2 has traditionally been found mainly in peripheral and brain immune cells (Galiègue et al, 1995; Núñez et al, 2004). However, recent evidence suggests that CB2 is also expressed in neurons and glial cells (Gong et al, 2006; Xi et al, 2011), although CB2 receptors are less widely distributed than CB1 and are expressed at much lower levels (Atwood and Mackie, 2010). CB1 receptors are located primarily on presynaptic terminals of glutamate and GABA cells, where ligand binding decreases neurotransmitter release (Katona et al, 1999), whereas CB2 is mainly expressed at post-synaptic sites, where ligand binding hyperpolarizes the post-synaptic membrane (Zhang et al, 2014). Despite the identification of neuronal CB2 receptors, studies of their function have been controversial due to a lack of selective CB2 antibodies and knockout models, as well as CB1/CB2 heterodimerization (see review by Chen et al, 2017). For these reasons, much less information is available regarding the role of CB2 receptors in reward/reinforcement (but see Zhang et al, 2014). It should also be noted that, although this review will focus on effects at CB1 receptors, eCBs also bind to various ligand-gated ion channels and other G-protein coupled receptors (Ryberg et al, 2009; Szallasi and Di Marzo, 2000).

The primary endogenous ligands of CB receptors are N-arachidonylethanolamine (anandamide; AEA), a partial agonist at CB1 receptors, and 2-arachidonylglycerol (2-AG), a full agonist at both CB1 and CB2 receptors (Devane et al, 1992; Mechoulam et al, 1995). These eCBs are synthesized by neurons ‘on demand’ following activation of Gq/11-coupled receptors or heightened cell activity that produces an influx of Ca2+. Once synthesized, these lipid messengers readily diffuse through the post-synaptic membrane and interact with CB receptors on nearby cells. Thus, brain eCBs act primarily as retrograde messengers, signaling from post- to pre-synaptic neurons and providing negative feedback to presynaptic cells (Alger, 2002). Following release, AEA and 2-AG signaling is quickly terminated through cellular reuptake and hydrolysis by the enzymes fatty acid amide hydrolase (FAAH) and monoacylglycerol lipase (MAGL), respectively (FAAH can also hydrolyze 2-AG). Although other eCBs have been identified, AEA and 2-AG remain the best studied.

The mesolimbic DA system

The mesolimbic DA system is composed of DAergic cell bodies in the ventral tegmental area (VTA) that send diffuse projections to cortical and limbic regions, including the nucleus accumbens (NAc), a region heavily implicated in reward and reinforcement (Swanson, 1982). Midbrain DA neurons are believed to signal reward-related stimuli through changes in their firing patterns. In general, these neurons fire in a tonic, low-frequency (1–5 Hz) pacemaker manner, which produces a baseline DAergic ‘tone’ on high-affinity D2-like DA receptors (Grace, 1991). However, the presentation of rewarding/reinforcing stimuli is accompanied by phasic, high-frequency burst firing (20 Hz; Grace, 1991), which increases terminal DA sufficiently to occupy low-affinity excitatory D1-like receptors (Dreyer et al, 2010). Their function is not limited to reward, however: these cells also transiently burst fire to environmental stimuli with no apparent affective valence (Horvitz, 2000) and, in some cases, in response to aversive events/stimuli (Budygin et al, 2012). Indeed, Brischoux et al (2009) found that although a majority of midbrain DA cells are inhibited by or show no response to aversive stimuli, a subset of VTA DAergic cells are excited by the delivery of electrical shock to the hindpaw. However, in that study, as in a number of other studies examining DA neuron responses to aversive stimuli (Mantz et al, 1989; Schultz and Romo, 1987; Ungless et al, 2010), animals were tested under anesthesia, which may have altered the electrophysiological response. Regardless, these findings provide evidence for two functional DA systems within the VTA: one responsible for reward-related signals, and another theorized to be activated by all salient stimuli, regardless of valence (Bromberg-Martin et al, 2010; Ikemoto, 2007; Lammel et al, 2011; Redgrave et al, 1999).

Reward-related burst firing of midbrain DA cells occurs in a pattern consistent with ‘reward prediction’. That is, presentation of hedonic/rewarding stimuli (‘rewards’) causes burst firing of VTA DA neurons, and with repeated stimulus presentation this phasic DA signal shifts from reward receipt to the presentation of reward-predictive cues (Romo and Schultz, 1990). When the probability of reward delivery is high, burst activity is greater in response to the reward-predictive cues, but when the probability of reward is low, DAergic cell activity is greater during reward receipt. This suggests that midbrain DA neurons signal an error term reflecting the difference in value between ‘expected’ and ‘received’ rewards, ie, a reward prediction error (Romo and Schultz, 1990). Conversely, reward omission or aversive stimuli cause a transient pause in midbrain DA neuron activity, resulting in a negative prediction error (Schultz, 1998). Thus, phasic midbrain DAergic signaling carries information about past and current reward conditions, making this form of signaling particularly important for the cost-benefit analyses that shape reinforced behaviors.
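The prediction-error account above is often formalized with simple reinforcement-learning rules. The sketch below is a minimal, hypothetical illustration, not a model used in the studies cited: it assumes a Rescorla-Wagner-style value update with learning rate `alpha`, a reward of size 1, and a cue that itself arrives unpredictably; the function `simulate` and its parameters are introduced here for illustration only. It reproduces three features described in the text: the error shifts from reward to cue with training, its split between cue and reward tracks reward probability, and omission of an expected reward yields a negative error.

```python
# Toy reward-prediction-error model (Rescorla-Wagner-style value update).
# Assumptions (illustrative only): one cue always precedes one outcome; the
# cue arrives unpredictably, so the error at cue onset equals the cue's
# learned value; reward magnitude is 1.


def simulate(n_trials, reward_prob, alpha=0.1):
    """Return per-trial (cue_error, rewarded_error, omitted_error) tuples.

    The learned value moves toward reward_prob using the expected
    (trial-averaged) update, which keeps the sketch deterministic.
    """
    value = 0.0  # learned value of the cue (expected reward)
    history = []
    for _ in range(n_trials):
        cue_error = value - 0.0        # unpredicted cue: value minus baseline of 0
        rewarded_error = 1.0 - value   # positive error if the reward arrives
        omitted_error = 0.0 - value    # negative error if the reward is omitted
        history.append((cue_error, rewarded_error, omitted_error))
        value += alpha * (reward_prob - value)  # learning step toward reward_prob
    return history


if __name__ == "__main__":
    # Before learning, the whole error occurs at reward delivery.
    print("first trial:", simulate(200, 1.0)[0])
    # After learning, compare certain (p = 1.0) and uncertain (p = 0.25) reward.
    for p in (1.0, 0.25):
        cue, rewarded, omitted = simulate(200, p)[-1]
        print(f"p={p}: cue={cue:.2f}, rewarded={rewarded:.2f}, omitted={omitted:.2f}")
```

With certain reward, the trained model shows an error of about 1 at the cue, about 0 at a delivered reward, and about -1 when the reward is omitted. With a reward probability of 0.25, the cue error settles near 0.25 while a delivered reward still produces an error near 0.75, mirroring the probability effect described above.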

The functional consequence of midbrain DA cell activation is DA release at terminal regions. Microdialysis studies demonstrate a clear correlation between presentation of positive/rewarding stimuli and DA release. For example, DA levels are elevated in the NAc following the delivery of food (Hernandez and Hoebel, 1988b), water (Yoshida et al, 1992), or drugs of abuse (Church et al, 1987). Studies using fast-scan cyclic voltammetry (FSCV) show that stimuli known to make VTA DA cells burst fire (ie, food or food-predictive cues) enhance transient DA concentrations within the NAc (Roitman, 2004). A large body of FSCV data similarly supports a role for reward-evoked striatal DA release as a prediction error signal. Specifically, unexpected reward delivery or presentation of reward-predictive cues results in phasic DA release (Brown et al, 2011; Cheer et al, 2007; Roitman, 2004; Sunsay and Rebec, 2008), whereas omission of an expected reward or presentation of an aversive stimulus results in decreased extracellular DA in the ventral striatum (Gentry et al, 2016; Oleson et al, 2012; Roitman et al, 2008).

Burst firing of DA neurons requires glutamatergic input (Charlety et al, 1991), and conversely, GABAergic input to midbrain DA neurons dampens burst firing and returns the cell to baseline activity (Engberg et al, 1993). Therefore, DAergic response to rewarding stimuli requires orchestration of glutamatergic and GABAergic inputs to the ventral midbrain—the eCB system is uniquely positioned to serve this function.

eCB modulation of DA transmission

The administration of exogenous CBs, such as Δ9-tetrahydrocannabinol (THC), the primary psychoactive constituent of the cannabis plant, elevates extracellular DA concentrations in the ventral striatum (Cheer et al, 2004; Chen et al, 1990; Tanda et al, 1997). This DAergic enhancement is dependent on CB1 receptor activation, as pretreatment with the CB1 antagonist/inverse agonist SR141716A (rimonabant) blocks this effect (Tanda et al, 1997). Single-unit recording studies show that CBs enhance extracellular DA concentrations in the NAc by increasing both the baseline firing rate and burst frequency of midbrain DA neurons (French et al, 1997), in a CB1-dependent manner (Gessa et al, 1998). Interestingly, recent evidence suggests that eCBs, such as 2-AG, can also enhance DA neuron excitability through direct interaction with ion channels (Gantz and Bean, 2017). Additional research is required to determine the extent to which this mechanism is recruited in vivo and how it influences reward and reinforcement.

Midbrain DA neurons do not express CB1 receptors, suggesting that CBs must excite VTA DA cells indirectly. The VTA is composed largely of DA neurons (~60%; Swanson, 1982), a smaller population of GABA cells (~30%), and even fewer glutamate neurons (~3%) (Dobi et al, 2010). The VTA also receives glutamatergic and GABAergic afferents from several limbic and sensory regions. These inputs to DA cells express CB1; therefore, presynaptic eCB modulation can alter VTA DA cell activity (Melis et al, 2004; Riegel and Lupica, 2004). In vitro administration of the GABAA receptor antagonist bicuculline causes VTA DA neurons to burst fire, suggesting that relief of tonic GABA inhibition on VTA DA cells facilitates phasic activation (Cheer et al, 2000). Further, the synthetic CB1/CB2 agonist WIN 55,212-2 (WIN) reduces electrically evoked GABAA-mediated inhibitory post-synaptic currents (IPSCs) in DA neurons in VTA slices, and these effects are blocked by rimonabant (Szabo et al, 2002). Taken together, these findings suggest that enhanced activation of VTA DA neurons promotes synthesis and ‘on demand’ release of eCBs from DA neurons. These lipophilic messengers then diffuse out of the post-synaptic cell to act on presynaptic inputs, where inhibition of local GABA release disinhibits DA cells. Indeed, GABA receptor-mediated IPSCs in VTA DA neurons are inhibited by presynaptic CB1 signaling (Lupica and Riegel, 2005; Riegel and Lupica, 2004). Additionally, activation of CB1 receptors on GABA terminals within the NAc augments local DA terminal release (Sperlágh et al, 2009) to influence reward-related behavior. Alternatively, glutamatergic and cholinergic cells of the NAc also express CB1, and CB1 activation on these cells may decrease NAc DA concentration (Fusco et al, 2004). See Figure 1.

Figure 1

Schematic of proposed eCB and opioid interaction with the mesolimbic dopamine system in the ventral tegmental area and the nucleus accumbens. (a) Glutamatergic and GABAergic terminals in the ventral tegmental area (VTA) express mu opioid receptors (MOPR) and cannabinoid type-1 receptors (CB1). Glutamatergic activation of VTA dopamine (DA) neurons likely promotes synthesis and ‘on demand’ release of eCBs, which diffuse out of the post-synaptic cell and bind to CB1 to further disinhibit DA release via presynaptic GABA inhibition. Likewise, MOPR agonists (exogenous opioids or endogenous opioid peptides) disinhibit VTA DA cells through inhibition of GABA neurons that synapse on VTA DA cells or on glutamate projection neurons. (b) NAc DA release can occur independently of VTA DA cell body excitation. The VTA sends GABAergic projections to the NAc, which synapse onto cholinergic interneurons, inhibiting excitatory cholinergic (ACh) input onto DA terminals. CB1- or MOPR-mediated inhibition of these GABA cells may disinhibit ACh release, resulting in DA terminal stimulation. However, ACh interneurons express MOPR and CB1, suggesting that direct opioid or eCB inhibition of these cells may decrease DA concentration in the NAc. Glutamatergic and GABAergic terminals in the NAc may also directly modulate DA activity, as NAc glutamate and GABA cells express MOPR and/or CB1. Thus, CB1 or MOPR agonism at GABA inputs to NAc DA terminals could enhance DA release, while CB1- or MOPR-induced inhibition of glutamatergic inputs may dampen NAc DA release.


Endogenous opioids and reward

Like the mesolimbic DA system, endogenous opioid signaling is implicated in reward. The opioid system owes its name to its ability to bind opioids, including opiates derived from the opium poppy (eg, morphine), as well as semi-synthetic derivatives such as heroin. Opioids exert their characteristic effects (eg, analgesia, respiratory depression, euphoria) by binding to three principal classes of opioid GPCRs (Gi/o): the mu opioid receptor (MOPR), the delta opioid receptor (DOPR), and the kappa opioid receptor (KOPR). The MOPR, DOPR, and KOPR are chiefly activated by three groups of endogenous opioid peptides in a semi-specific manner: endorphins, which primarily bind to MOPRs; enkephalins, which preferentially bind to DOPRs; and dynorphins, which typically bind to KOPRs. A fourth opioid receptor, the nociceptin receptor (NOPR), has been identified more recently. The NOPR has little affinity for classic opioid peptides and is instead bound by the peptide orphanin FQ/nociceptin. Altogether, these peptides and their receptors are expressed widely throughout the brain, including reward-related regions such as the VTA and NAc. In general, administration of MOPR, and in some cases DOPR, agonists produces reward and supports operant behaviors (see reviews by Le Merrer et al, 2009; Shippenberg et al, 2008). However, KOPR and NOPR are involved in counter-reward mechanisms. Indeed, KOPR agonist administration induces aversion, and while NOPR agonism is not inherently aversive, stimulation of NOPRs opposes the rewarding action of opioids and other drugs of abuse (Chefer et al, 2013; Di Giannuario and Pieretti, 2000). These effects are likely due to the ability of KOPR and NOPR agonists to inhibit mesolimbic DA (for more in-depth analysis, see Lalanne et al, 2014; Witkin et al, 2014). This review will focus mainly on opioid action at MOPRs, given their primary role in reward-related processes.
Indeed, MOPR agonists, including morphine and endorphins, are readily self-administered by animals (Bozarth and Wise, 1981; Thompson and Schuster, 1964) and their administration supports CPP (Hnasko et al, 2005) and decreases thresholds for BSR (van Wolfswinkel and van Ree, 1985).

MOPR agonists enhance DAergic activity

Systemic or intra-VTA MOPR agonism increases the firing rate of VTA DA neurons (Melis et al, 2000) and enhances DA release in the NAc (Spanagel et al, 1992), suggesting that MOPR agonists influence reward and reinforcement through activation of the mesolimbic system. In support of these findings, several studies have shown that DA antagonists block CPP for opioids (Acquas and Di Chiara, 1994; Bozarth and Wise, 1981; Leone and Di Chiara, 1987). MOPRs are located both pre- and post-synaptically within the VTA and NAc, and ligand binding results in inhibition of neurotransmitter release and membrane hyperpolarization (Fields and Margolis, 2015). Therefore, as with eCBs, it is likely that within the VTA, MOPR agonists disinhibit DA cells through inhibition of GABA release. Indeed, the selective MOPR agonist DAMGO inhibits presynaptic GABAergic inputs onto VTA DA neurons (Zhang et al, 2015). However, recent evidence demonstrates that MOPR activation also disinhibits glutamatergic input to VTA DA cells via presynaptic GABA inhibition, suggesting that disinhibition of presynaptic glutamate release similarly works to enhance VTA DA neuron activity (Chen et al, 2015) (Figure 1a). Striatal DA release can also occur independently of VTA DA cell body excitation. Indeed, the VTA also sends GABAergic projections to the NAc, which synapse onto cholinergic interneurons, inhibiting their excitatory input onto DA terminals (Figure 1b). Thus, opioid inhibition of VTA GABA cells may functionally disinhibit striatal cholinergic neurons to augment NAc DA release independent of DA cell firing (Fields and Margolis, 2015). However, whether and to what extent this occurs in vivo remains unknown, and recent studies show that MOPR agonists can act directly on cholinergic interneurons, thereby reducing NAc terminal DA release (Yorgason et al, 2017).
Additionally, the mechanism by which endogenous opioid peptides exert their effects, as well as the precise time course of their actions in the mesolimbic system, remains unclear, as the literature to date has relied solely on the application of exogenous compounds. Future research should employ optogenetic and chemogenetic techniques to examine the role of opioid systems in reward and reinforcement.

Interestingly, behavioral evidence suggests that opioids can exert rewarding/reinforcing effects independent of DA function. For example, Ettenberg et al (1982) found that systemic administration of low doses of the DA receptor antagonist alpha-flupenthixol enhanced cocaine self-administration but did not affect self-administration of heroin, while high doses abolished self-administration of cocaine, but not heroin. Furthermore, DA-deficient mice still acquire CPP for morphine (Hnasko et al, 2005), although at higher doses than those employed in other studies. While the mechanisms underlying DA-independent opioid reward remain unclear, Laviolette et al (2002) showed that intra-NAc DA receptor antagonism blocked morphine CPP in dependent, but not naive, rats. This suggests that opioid reward may shift between DA-dependent and DA-independent mechanisms depending on an organism’s motivational state. However, the neural underpinnings of this phenomenon remain unknown.

Exogenous CBs and reward

A wide body of literature demonstrates the powerful rewarding and reinforcing properties of exogenous CBs in human subjects. However, the animal literature presents a complicated picture. While some studies suggest that CB administration does not affect (Arnold et al, 2001) or attenuates BSR (Vlachou et al, 2005), others report that low doses of THC reduce BSR thresholds (Lepore et al, 1996). A report by Katsidoni et al (2013) indicates biphasic, dose-dependent effects of THC on BSR, with a low dose of THC decreasing BSR thresholds and a higher dose increasing them. Both effects were reversed by pretreatment with rimonabant, suggesting that CB1 receptor signaling is required for these dose-dependent rewarding and aversive actions of THC. Similar discrepancies exist in the CB CPP literature. Indeed, while some studies report THC-induced CPP in rodents (Valjent and Maldonado, 2000), others report CB-induced conditioned place aversion (CPA) (Parker and Gillies, 1995; Cheer et al, 2000). It has been hypothesized that this discrepancy also depends on dose, with the timing of injections playing a major role (Gardner, 2005). Lepore et al (1995) found that when THC CPP pairings were 24 h apart, only higher doses of THC produced CPP; however, when THC pairings were 48 h apart, lower doses of THC produced CPP and higher THC doses produced CPA. The authors attributed this difference to THC withdrawal-induced dysphoria. That is, when THC pairings occur at 24-h intervals, they coincide with withdrawal produced by the previous THC administration, and thus a higher dose of THC is required to overcome withdrawal effects and produce reward. However, when pairings occur at 48-h intervals, they do not overlap with THC withdrawal, allowing lower doses to produce reward and causing higher doses to produce aversion (Gardner, 2005).
Lepore et al cite unpublished observations of acute THC withdrawal-induced increases in BSR thresholds as evidence for their conclusions; however, peer-reviewed data demonstrating this phenomenon are lacking. Other studies indicate that pre-exposing an animal to THC in its home cage before place conditioning promotes the development of CB CPP, purportedly by attenuating the unconditioned aversive effects of these drugs (Valjent and Maldonado, 2000). Thus, when dose and injection timing are taken into account, animal models reveal rewarding properties of CB drugs.

eCBs and reward

Findings on the rewarding properties of eCBs, however, are much less clear. Blockade of eCB signaling with rimonabant has been reported to increase BSR thresholds (Deroche-Gamonet et al, 2001) or to produce no change (Arnold et al, 2001; Oleson et al, 2012). Effects of rimonabant on BSR may arise from its inverse agonism at CB1 receptors rather than from blockade of eCB signaling. Indeed, a recent report suggests that systemic administration of the neutral CB1 antagonists AM4113 and PIMSR1 has no effect on BSR (Gardner et al, 2016), suggesting that eCB signaling is not necessary for BSR. Additionally, inhibition of AEA degradation with the FAAH inhibitor URB597 does not enhance BSR (Vlachou et al, 2006). Similarly, FAAH inhibition does not result in CPP (Gobbi et al, 2005), nor does administration of exogenous AEA (Mallet and Beninger, 1998). However, rimonabant delivered directly into the NAc, but not the dorsal striatum, supports CPP. This, too, may reflect rimonabant’s action as an inverse agonist, as the CPP was abolished by blockade of AMPA glutamate receptors (Ramiro-Fuentes et al, 2010). Altogether, more research, particularly using neutral CB1 antagonists, is necessary to determine the role of eCB signaling in reward.

The eCB system and reinforcement

Similar to studies on CB reward, examination of CB reinforcement using self-administration has yielded mixed results. A number of early studies showed that IV THC was not self-administered by rats or rhesus monkeys, but it was successfully self-administered by squirrel monkeys (Justinova et al, 2003; Tanda et al, 2000; but see review by Tanda, 2016). Importantly, THC self-administration in squirrel monkeys was blocked by rimonabant (Tanda et al, 2000), evidence of a CB1-dependent mechanism. A number of factors could contribute to a lack of THC self-administration in animal models, including differences in the route of administration (humans typically smoke cannabis, while animal models rely on IV delivery of drug solutions) and in the chemical constituents of the drug (cannabis smoke contains hundreds of CB and non-CB chemical entities, while animals are typically given access to one CB compound in isolation). Additionally, CBs may produce locomotor and working memory side effects at higher doses, which could confound task performance. Furthermore, as discussed above, THC’s initial aversive/anxiogenic effects may punish rather than reinforce self-administration. Indeed, rats readily learned to self-administer THC directly into the VTA or the NAc, presumably because this route of administration bypasses the sites mediating THC’s aversive effects (Zangen, 2006). Melis et al (2017) recently showed that pre-exposure to vapor containing a 10 : 1 ratio of THC to another chemical constituent of cannabis, cannabidiol (CBD), facilitates self-administration of THC+CBD in rats. This may be due to the ability of CBD to alleviate the aversive effects of THC (Russo and Guy, 2006). In addition, several groups report IV self-administration of synthetic CBs (eg, WIN and JWH018) in rodents, and this self-administration is blocked by CB1 receptor antagonism (De Luca et al, 2015; Fattore et al, 2001; Lefever et al, 2014; Martellotta et al, 1998).
Interestingly, the eCB 2-AG supports self-administration in both rodents and squirrel monkeys, and squirrel monkeys also readily self-administer AEA (De Luca et al, 2014; Justinova et al, 2005, 2011). These effects are blocked by pretreatment with rimonabant. Finally, the AEA transport inhibitor AM404 is self-administered by squirrel monkeys in a CB1 receptor-dependent manner, suggesting that endogenously released AEA is reinforcing (Schindler et al, 2016). Therefore, stimulation of CB1 receptors with either exogenous or endogenous CBs serves as a reinforcer in animal models.

Interaction between eCB and opioid systems

The eCB and endogenous opioid systems share similar pharmacological characteristics. For instance, CB1 and MOPR are both Gi/o-coupled receptors, and ligand binding leads to inhibition of adenylyl cyclase and voltage-gated calcium channels, as well as activation of potassium channels and mitogen-activated protein kinase signaling (Childers, 1991; Howlett, 1995). Agonism of these receptors likewise produces similar behavioral outcomes, including analgesia, sedation, and reward/reinforcement. Interestingly, a growing body of literature illustrates a functional connection between these two neuromodulatory systems. For example, CB administration increases endogenous opioid levels in the NAc (Valverde et al, 2001), and, reciprocally, opioid administration increases eCB levels (Caille et al, 2007). Further, chronic administration of either opioid or CB drugs results in cross-tolerance (Newman et al, 1974), as well as alterations in receptor density and activation (Fattore et al, 2007). The mechanism underlying this functional interaction remains unclear; however, one possible explanation is interaction between the receptors themselves. CB1 and MOPR are similarly distributed throughout the brain, including regions subserving reward and reinforcement. Indeed, CB1 and MOPR co-localize on GABA neurons of the NAc (Pickel et al, 2004), and co-localization may result in the formation of heterodimers. Data suggest that in the NAc, CB1 and MOPR may heterodimerize, and stimulation of these receptor complexes can cause synergistic inhibition of GABA release (Schoffelmeer et al, 2006). However, additional research is necessary to determine the extent to which this mechanism functions in vivo. For a more in-depth review of opioid–eCB interactions, see Parolaro et al (2010), Robledo et al (2008), and Vigano et al (2005).

MOPR and CB1 also functionally interact to mediate reward and reinforcement. MOPR knockout or antagonism (via systemic administration of naloxone) blocks CB-induced CPP (Braida et al, 2001a; Ghozland et al, 2002), while CB1 knockout or systemic rimonabant administration blocks the acquisition of opioid CPP or self-administration (Ledent et al, 1999; Martin et al, 2000; Navarro et al, 2001, 2004). Additionally, THC self-administration is attenuated by naloxone (Braida et al, 2001b; Justinova et al, 2004), and rimonabant blocks heroin self-administration (Caille and Parsons, 2005). These effects may depend on the ability of these systems to modulate mesolimbic DA (but see Caille and Parsons, 2005). In support of this, THC-induced DA release in the NAc is attenuated by systemic or intra-VTA MOPR antagonism (Chen et al, 1990; Tanda et al, 1997). However, heroin-induced enhancement of NAc DA concentrations was not diminished by systemic rimonabant (Tanda et al, 1997). Interestingly, MOPR antagonism does not affect THC-induced VTA DA cell firing (French, 1997), and CB1 antagonism does not affect activation of midbrain DA cells induced by morphine (Melis et al, 2000). These data suggest that opioid and eCB systems interact primarily within the NAc to augment mesolimbic DA. See Figure 1 and Table 1 for a summary of neurotransmitter system interactions.

Table 1 Interactions between Endocannabinoid, Dopamine, and Opioid Systems

Critical roles for DA and endogenous opioids in food reward

Food acts as a potent natural reward/reinforcer. Food seeking and consumption are sustained not only by metabolic need, but also by motivation for food and by food’s hedonic properties, which rely on DA and opioid systems. Several studies show that DAergic antagonism or lesions of the mesolimbic system attenuate food seeking and operant responding for food/food-associated cues, but do not abolish feeding (Baldo et al, 2002; Cousins and Salamone, 1994; Koob et al, 1978). Moreover, DA depletion or antagonism spares the orofacial liking response to sucrose (Treit and Berridge, 1990). Thus, while mesolimbic DA mediates the effort exerted to obtain food, it does not appear to mediate food’s hedonic properties. Conversely, MOPR antagonism decreases food intake, while MOPR agonism enhances food consumption (Bodnar, 2015). Interestingly, opioid antagonists preferentially attenuate intake of highly palatable food (eg, foods that are sweet or high in fat) (Yeomans and Gray, 1996), suggesting that MOPR signaling mediates the hedonic properties of food. Furthermore, in specific subregions of the NAc shell (‘hotspots’), pharmacological stimulation of MOPR, DOPR, or KOPR enhances orofacial liking responses to sucrose (Castro and Berridge, 2014). Barbano et al (2009) found that systemic naloxone decreases food intake in sated, but not hungry, rats, and similarly reduces the effort rats are willing to exert for palatable food reinforcers. In contrast, administration of the DA receptor antagonist flupenthixol attenuates how hard an animal will work for palatable food but does not affect food intake, suggesting that DA mediates the cost-benefit calculation of performing an action to obtain a reinforcer, but not necessarily hedonic value. Therefore, DA and endogenous opioids signal distinct facets of food reward and reinforcement (for a recent comprehensive review, see Baldo et al, 2013).

eCBs and food reward

Stimulation of the eCB system promotes food reward and reinforcement, partly through actions within mesolimbic regions (Di Marzo and Matias, 2005). Abel (1975) first documented the ability of cannabis to enhance appetite, especially for sweet foods. Since then, animal studies have shown that THC enhances food intake in sated rats (Williams et al, 1998) and increases the motivation to obtain food reinforcers (Solinas and Goldberg, 2005). Importantly, these effects are blocked by pretreatment with either CB1 or MOPR antagonists, suggesting a role for both the eCB and opioid systems in the motivation to work for food reinforcers (Solinas and Goldberg, 2005). These effects are likely due to the ability of THC to increase the hedonic properties of food, as THC enhances orofacial ‘liking’ reactions to sucrose and decreases aversive reactions to bitter quinine solutions (Jarrett et al, 2005, 2007). THC’s effects on these taste reactivity responses are similarly blocked by CB1 receptor antagonism.

eCBs also modulate food reward and reinforcement. Intra-NAc shell administration of 2-AG enhances food intake (Kirkham et al, 2002), as does systemic and intra-NAc delivery of AEA (Hao et al, 2000; Mahler et al, 2007). Furthermore, infusion of AEA into NAc shell hotspots enhances ‘liking’ reactions to sucrose solutions (Mahler et al, 2007). In support of a role for AEA in food reward, Williams and Kirkham (1999, 2002) found that systemic delivery of AEA results in significant overeating in sated rats, and this effect is blocked by pretreatment with rimonabant or naloxone, but not by CB2 antagonism. AEA-induced hyperphagia is not affected by the serotonergic anorectic dexfenfluramine, leading the authors to conclude that eCB signaling promotes feeding by enhancing food reward rather than by inhibiting serotonergic satiety mechanisms (Williams and Kirkham, 2002; but see Thompson et al, 2016). Indeed, rats readily develop a CPP for palatable foods, and this CPP is enhanced by intracranial AEA and blocked by intracranial CB1 receptor antagonism (Mendez-Diaz et al, 2012). However, pharmacological inhibition of AEA uptake with VDM11, which would be expected to raise endogenous AEA levels, does not increase food intake, suggesting that endogenous AEA may be insufficient to drive food consumption (Chambers et al, 2004). Transgenic mice with selectively reduced forebrain 2-AG levels do not develop a CPP for palatable food, although they do develop a CPP for cocaine (Wei et al, 2016), suggesting a role for endogenous 2-AG in palatable food reward specifically.

Disruption of eCB signaling via CB1 receptor antagonism disturbs food reward and reinforcement. CB1 receptor antagonist administration or CB1 receptor knockout reduces the intake of food and sweet solutions (Arnone et al, 1997; Di Marzo and Matias, 2005; Thornton-Jones et al, 2005), and CB1 receptor antagonists reduce self-administration of palatable food in both food-restricted and sated rats, suggesting a role in the hedonic properties of food (Fois et al, 2016). CB1 receptor antagonism also diminishes how hard an animal is willing to work for a food reinforcer (Solinas and Goldberg, 2005), although this could also stem from a lessened hedonic impact of food reinforcers that decreases their value. Indeed, systemic rimonabant or CB1 receptor knockout abolishes the ability of conditioned food reward to attenuate acoustic startle, while treatment with WIN enhances this ‘pleasure-attenuated startle’ (Friemel et al, 2014). Similarly, CB1 knockout mice show attenuated motivation for sucrose and a lessened sucrose preference (Sanchis-Segura et al, 2004). Interestingly, Skelly et al (2010) found that intra-NAc shell delivery of WIN or rimonabant alone had no effect on the consumption of highly palatable food; however, intra-NAc DAMGO enhanced food consumption, and this enhancement was potentiated by WIN and abolished by rimonabant. These data support an interaction between NAc CB1 and MOPR signaling in the facilitation of food hedonics.

The role of eCB signaling in food reward and reinforcement suggests that this system may be an effective target for the treatment of eating disorders, such as binge eating disorder (BED). BED is characterized by episodes of compulsive overconsumption of highly palatable food (ie, binges) followed by distress. Human studies show that individuals with BED have enhanced craving for palatable food reinforcers (Joyner et al, 2015), which is matched by augmented DA signaling in response to food-related stimuli (Wang et al, 2011). However, BED sufferers report decreased pleasure following eating (Klatzkin et al, 2016). These data suggest similarities between drug addiction and binge eating, such as an enhanced cue-induced DA response, increased craving, and decreased pleasure produced by ingestion. In an animal model of BED, experimental rats exhibited decreased striatal DA receptor number and enhanced striatal MOPR levels compared to controls (Heal et al, 2017), suggesting pathological function of these systems. The eCB system is similarly implicated in BED. Women with BED have elevated plasma AEA levels (Monteleone et al, 2005), which could lead to enhanced DAergic reactivity to food stimuli. Indeed, in rat models, rimonabant dose-dependently reduces binge eating (Scherma et al, 2013) and also decreases palatable food-induced NAc DA release (Melis et al, 2007). Similar results supported the use of rimonabant as a weight-loss drug in human clinical trials. However, rimonabant administration resulted in depressed mood and anxiety, making the drug poorly suited as a therapeutic (Christensen et al, 2007). Both the intended effects and the side effects may be due to rimonabant’s inhibition of mesolimbic DA signaling. Therefore, CB drugs capable of reducing the DAergic response to palatable food without globally dampening DA signaling could provide treatment for BED. One such candidate is the CB1 receptor antagonist SM-11, which attenuates WIN-induced single-spike and burst firing of VTA-NAc DA neurons but does not affect baseline DAergic activity (Fois et al, 2016).

Conclusions and future directions

In summary, the vast majority of research supports a role for eCB signaling in reward and reinforcement. Administration of CB drugs produces reward and reinforcement in both human users and animal models. Nevertheless, future research should work to improve the face validity of rodent models of CB self-administration by better modeling human cannabis use, with attention to the route of administration and the chemical components of the CB drugs utilized. By contrast, the role of endogenous eCB activity in reward remains unclear. Additional investigation utilizing neutral CB1 antagonists, along with the development of tools for opto- and chemogenetic targeting of eCB machinery, will help to elucidate eCB function. Further, CB reward and reinforcement are blocked by MOPR antagonism, demonstrating a critical interaction between these systems in both hedonia and motivation. This relationship may be due to CB1 and MOPR interaction in mesolimbic regions such as the NAc, but further study is necessary to explore these mechanisms in vivo and their role in behavior.

A wide body of evidence indicates that eCB signaling is integral to food reward and food-maintained behavior. Exogenous CBs, like THC, increase food intake, enhance the amount of work an organism will perform for food, and augment CPP and orofacial liking reactions for palatable food. eCBs likely regulate food reward and reinforcement through interactions with DA and opioid systems in the VTA and NAc. MOPR antagonism blocks CB-induced enhancement of food seeking and hyperphagia, suggesting that endogenous opioid system activation underlies CB-induced food reward/reinforcement. Reciprocally, opioid-induced increases in food intake are blocked by CB1 antagonism, suggesting therapeutic potential for CB drugs in the treatment of eating disorders. However, with the failure of rimonabant, there are currently no CB pharmacotherapies for eating disorders, emphasizing the need to develop CB1 receptor antagonists without detrimental side effects. Likewise, the mechanism by which opioid and eCB systems interact in eating disorders merits further investigation, with particular attention to sex differences. Sufferers of eating disorders are predominantly female (Hudson et al, 2007); however, investigations into eCB–opioid–DA interactions have all been performed in male rats. Altogether, future investigations should examine how these systems interact to mediate reward/reinforcement and how pathological interactions may contribute to psychiatric disorders.

Funding and disclosure

Funding was provided by NIH grants DA039690 to JMW and DA022340/DA042595 to JFC. The authors declare no conflict of interest.