A core and disturbing feature of substance-related and addictive disorders is their persistence despite real and serious negative consequences to the individual. These consequences can include relationship breakdown, job loss, adverse medical and health outcomes and incarceration. They represent a large part of the human toll of addiction.

Following the groundbreaking work of Deroche-Gamonet et al. [1], it has become increasingly popular to model this aspect of human addiction in non-human animals by punishing the drug-seeking response. Here, laboratory animals are initially trained to self-administer a drug and then this behaviour is punished, typically using footshock or another aversive event. This approach makes sense. It can be used to understand why some laboratory animals discontinue drug-seeking in the face of adversity, and it can be further used to identify individuals that persist in drug-seeking and drug-taking despite the punishment, and so express a punishment-resistant, compulsive or ‘addiction-like’ phenotype.

In a clever series of experiments, Durand and colleagues [2] report how sensitivity to punishment changes with experience. They first show that mild footshock punishment is ineffective at reducing cocaine self-administration. Rats punished with this mild footshock would continue to seek cocaine. This is unsurprising; the rats had received many days of cocaine self-administration training and the cost (punishment) of responding was relatively low. Then, Durand et al. increased the intensity of punishment. Eventually, there was near complete suppression of cocaine-seeking. The interesting finding was that when rats were allowed to recover their responses in the absence of punishment and then re-tested at the previously ineffective lower intensity punishment, these lower intensities could now suppress cocaine-seeking.

Follow-up experiments provided key insights into this phenomenon. The effect was observed only if the rats were punished with the higher shock intensities, not if these shocks were delivered response-independently. This rules out mechanisms that are independent of the punishment contingency (e.g., sensitisation to shock, fear conditioning). The effect did not generalise to a different punisher (histamine injections). Furthermore, it persisted if tested intermittently, but not if tested daily with the lower intensity shock punisher.

The finding that milder punishers are more effective after intense punishment matches observations from previous studies using natural rewards [3]. It is reminiscent of hysteresis, a sensitivity to history seen in biological systems such as cell signalling and gene expression. Moreover, the opposite can also be true: intense punishment can be less effective following weak punishment [3]. So, the effectiveness of punishment is not invariant within an individual.

The underlying mechanism(s) for these findings remain elusive. The simplest explanation is that the rats sensitise to footshock. However, this is unlikely based on Durand and colleagues’ findings (see above). An alternative possibility is that this effect depends on how and when action values are retrieved and updated. For example, severe punishment may update the punished action’s value. Subsequent weak punishers that share sensory properties with severe punishers can invoke this updated value. Regular testing with weak punishment promotes a re-updating of action value and a return to more modest suppression, whereas intermittent testing only permits retrieval of previously learned values without updating. Clearly, more work is needed to understand this.
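
One way to make this retrieval-versus-updating account concrete is a toy simulation. The sketch below is purely illustrative and is not a model from Durand et al. [2]: the delta-rule update, the learning rate, the reward value and the shock costs are all assumed for the example, and the claim that intermittent tests retrieve a stored cost without updating it is simply encoded as a flag rather than explained. The shared sensory properties of weak and severe punishers are not modelled.

```python
def run_sessions(learned_cost, shock_costs, update, alpha=0.5, reward_value=1.0):
    """Toy model of a punished action (all parameters are hypothetical).

    learned_cost : stored estimate of the punisher's cost at the start
    shock_costs  : cost actually experienced in each session
    update       : if True, the stored cost is revised after each session
                   (delta rule); if False, it is only retrieved
    Returns (net action value retrieved in each session, final stored cost).
    """
    net_values = []
    for cost in shock_costs:
        # Value retrieved at the start of the session guides responding.
        net_values.append(reward_value - learned_cost)
        if update:
            # Move the stored cost part of the way towards the experienced cost.
            learned_cost += alpha * (cost - learned_cost)
    return net_values, learned_cost


# Assumed costs: mild shock is small relative to reward, severe shock exceeds it.
MILD, SEVERE = 0.2, 1.2

# 1. Initial mild punishment, with updating.
mild_first, cost = run_sessions(0.0, [MILD] * 5, update=True)

# 2. Severe punishment, with updating: the stored cost rises sharply.
_, cost_after_severe = run_sessions(cost, [SEVERE] * 5, update=True)

# 3a. Intermittent mild tests: stored cost is retrieved but not revised.
intermittent, _ = run_sessions(cost_after_severe, [MILD] * 5, update=False)

# 3b. Daily mild tests: stored cost is revised towards the mild shock.
daily, _ = run_sessions(cost_after_severe, [MILD] * 5, update=True)
```

Under these assumptions, a net value above zero stands for continued responding and a value below zero for suppression. Net value stays positive during the initial mild punishment, stays negative across the intermittent mild tests, and climbs back above zero across the daily mild tests, which mirrors the qualitative pattern described above without establishing that this is the mechanism.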

The findings of Durand et al. have important implications for how we think about punishment and how we use it to make inferences about the underlying causes of behaviour. Persistent drug-taking may well shift an individual’s punishment-responsivity curve towards insensitivity, but the effects of a given punisher on cocaine-seeking, or indeed any behaviour, are not invariant. Rather, these effects depend on complex learning and motivational processes that determine whether, when, and by how much punishers will suppress a behaviour.

This conclusion will come as no surprise to students of animal learning, who have long recognised that the contingencies and circumstances within which punishment is experienced determine whether and how punishment suppresses behaviour [3]. Indeed, profound insensitivity to punishment has been observed under conditions of relatively severe punishment and clear aversive motivation [4], an effect attributable to failed response-punisher association learning. So, insensitivity to punishment can emerge for a number of reasons.

Punished drug-seeking remains an important model for the field. But, as the findings of Durand et al. underscore, perhaps it is time to examine the root causes of behaviour in these tasks. A profitable line of enquiry may be to understand the extent to which drug-seeking becomes punishment-resistant through dysfunctional motivations [1], impaired encoding of punishment associations [4], and/or as an emergent property of reduced behavioural control or steeper punishment discounting (see ref. [5]), amongst other potential mechanisms. This subtle shift in focus towards behavioural mechanisms may move us closer to a deeper and more accurate understanding of, and treatments for, decision making in substance-related and addictive disorders.