It has been proposed that addictive drugs act on the same neurophysiological pathways that mediate reinforcement learning. In a recent paper in Science, Redish uses this hypothesis to generate a computational model of addiction.

One of the effects of many drugs of abuse is an increase in dopamine levels in the brain, which is thought to contribute to the addictive quality of these drugs. Natural rewards are also accompanied by an increase in dopamine, although with learning this increase shifts from the time of reward to the time of the cuing stimulus.

Reinforcement learning occurs as a result of an individual's interaction with the environment, that is, in response to experience rather than explicit teaching. This process can be modelled using temporal-difference reinforcement learning (TDRL), a reinforcement learning algorithm that relies on a reward-prediction-error signal. TDRL models aim to attain the maximum future reward, and learn, by iteratively updating their predictions of reward value, to act accordingly. In these models, learning occurs only when a reward is incorrectly predicted: a correctly predicted reward produces no error signal, and therefore no learning.
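
To make this concrete, the following sketch implements tabular TDRL on a single cue-then-reward episode. It is a minimal illustration of the algorithm described above, not code from the paper; the variable names and parameter values are our own.

```python
# Minimal tabular temporal-difference (TD) value learning on a fixed
# cue -> delay -> reward episode. All names and parameter values are
# illustrative, not taken from Redish's paper.
GAMMA = 0.95   # discount factor applied to future reward
ALPHA = 0.1    # learning rate
N_STEPS = 5    # time steps per episode; cue at t=0, reward at t=4

values = [0.0] * (N_STEPS + 1)  # V(t); the terminal state has value 0

def run_episode(values):
    """One pass through the episode; returns the TD error at each step."""
    errors = []
    for t in range(N_STEPS):
        reward = 1.0 if t == N_STEPS - 1 else 0.0  # reward only at the end
        # TD error: (reward + discounted next prediction) minus prediction
        delta = reward + GAMMA * values[t + 1] - values[t]
        values[t] += ALPHA * delta  # update only when prediction is wrong
        errors.append(delta)
    return errors

first = run_episode(values)
for _ in range(499):
    last = run_episode(values)

# Early in training the error is large at the time of reward; once the
# reward is correctly predicted, the error (and hence learning) vanishes.
print([round(e, 2) for e in first])
print([round(e, 2) for e in last])
```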

Using dopamine as the reward-prediction-error signal, the computational model established by Redish shows what happens when a positive error signal, much like the dopamine surge that would accompany the use of a drug such as cocaine, is introduced neuropharmacologically rather than arising from an unexpected natural reward or cue stimulus. This surge overrides the reward predictions of the TDRL model and, because it is produced regardless of the other factors included in the model's calculations, it can never be cancelled by learning: the model remains unable to predict such rewards. As a result, the value assigned to drug-seeking actions keeps growing, and the likelihood of the model selecting a pathway that leads to drug reward increases with its number of drug experiences.
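
A minimal sketch of this noncompensable error signal is given below. The key assumption, following the account above, is that the drug adds a fixed surge D to the error signal that no prediction can cancel, so the error never falls below D; the constants are illustrative rather than taken from the paper.

```python
# Sketch of a drug-modified TD error for a single-step choice: the
# drug contributes a dopamine surge D that prediction cannot cancel,
# so the error signal never falls below D. The value D = 0.2 and the
# other constants are illustrative.
ALPHA = 0.1   # learning rate
D = 0.2       # assumed pharmacological dopamine surge

v_natural = 0.0  # learned value of a natural reward of size 1
v_drug = 0.0     # learned value of a drug reward of size 1

for trial in range(1000):
    # Natural reward: the error shrinks as the prediction improves,
    # so the learned value converges to the true reward size.
    v_natural += ALPHA * (1.0 - v_natural)

    # Drug reward: however good the prediction, the error is at least
    # D, so the learned value grows without bound.
    v_drug += ALPHA * max(1.0 - v_drug + D, D)

print(round(v_natural, 2), round(v_drug, 2))  # ~1.0 vs. ever-growing
```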

With learning, TDRL models usually achieve a stable response to natural rewards. This response depends on the delay to the reward, its magnitude and any discounting factors, which decrease the expected value of the reward. The sensitivity of demand for a reward to its cost is called elasticity. In Redish's modified TDRL, demand for the drug reward increases disproportionately with experience, so that although the process still shows some elasticity, it is inelastic compared with demand for natural rewards. However, this does not necessarily mean that a drug reward would always be selected over a non-drug reward, as selection would depend on the size of the non-drug reward relative to that of the drug reward.
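
The sketch below illustrates this relative inelasticity under the same assumed drug term as above: because the drug's learned value keeps rising with experience, an ever-larger natural reward is needed to outcompete it. All numbers are hypothetical.

```python
# Hypothetical illustration of inelastic demand for the drug reward.
# Uses the same noncompensable error term sketched above; all
# parameter values are illustrative.
ALPHA = 0.1  # learning rate
D = 0.2      # assumed drug-induced dopamine surge

def drug_value(n_experiences):
    """Learned value of the drug after n experiences."""
    v = 0.0
    for _ in range(n_experiences):
        v += ALPHA * max(1.0 - v + D, D)
    return v

for n in (10, 100, 1000):
    v = drug_value(n)
    # The model chooses the drug unless the competing natural reward's
    # value exceeds the drug's; that threshold rises with experience.
    print(f"after {n:4d} experiences, a natural reward must be worth "
          f"more than {v:.2f} to be chosen over the drug")
```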

It is hoped that computational models of addiction such as this one will help us to understand the mechanisms and factors that are involved in addiction. Such models could be used to help explain and confirm observations, and to make further predictions that can be tested experimentally.