How cocaine addiction can change the brain


Editor's Introduction

Carrots and sticks fail to change behavior in cocaine addiction

annotated by
Shaimar Gonzalez Morales

Drug addiction is a disease that affects many people. What is it that makes it so hard for those affected to give up drug use? This study investigates the changes in brain physiology and behavior that occur when someone becomes dependent on drugs. The authors show that, over time, those who become addicted behave more habitually and are less able to change their behavior in response to the environment. The findings could help improve medical interventions and treatments.

Paper Details

Original title
Carrots and sticks fail to change behavior in cocaine addiction
Karen D. Ersche Laetitia H. E. Ward
Original publication date
Vol. 352, Issue 6292, pp. 1468-1471
Issue name


Cocaine addiction is a major public health problem that is particularly difficult to treat. Without medically proven pharmacological treatments, interventions to change the maladaptive behavior of addicted individuals mainly rely on psychosocial approaches. Here we report on impairments in cocaine-addicted patients to act purposefully toward a given goal and on the influence of extended training on their behavior. When patients were rewarded for their behavior, prolonged training improved their response rate toward the goal but simultaneously rendered them insensitive to the consequences of their actions. By contrast, overtraining of avoidance behavior had no effect on patient performance. Our findings illustrate the ineffectiveness of punitive approaches and highlight the potential for interventions that focus on improving goal-directed behavior and implementing more desirable habits to replace habitual drug-taking.


Why do some people take drugs by any possible means, seemingly without regard for the consequences? Actions normally constrained by their outcome become “out of control” in drug-addicted individuals, who fail to stop taking drugs despite being aware that continuing drug use provides little pleasure while inflicting considerable damage on their lives. Even the prospect of contracting an infectious disease fails to deter these individuals from sharing drug paraphernalia. Such maladaptive and ill-judged behaviors may be explained in terms of aberrant learning processes (1), where drug-taking is a learned behavior initially directed toward a conscious desire to enjoy a rush or avoid feelings of discomfort. Such goal-directed actions, whether appetitive or avoidant, are modulated by their outcomes. Following extended practice, however, drug-taking may deteriorate into a stimulus-driven habit that is elicited by antecedent stimuli and is thus performed regardless of any goals (2). This proposal is consistent with the notion of behavior being jointly regulated by goal-directed and habitual brain systems (3, 4) and the disruption of this balance during the course of addiction (1).

Maladaptive behavior in drug-addicted individuals may thus result from impairments in goal-directed control, an enhanced propensity to develop stimulus-driven habits, or a combination of these factors. Preclinical evidence supports both accounts. Exposure to either cocaine or stress amplifies the transition from goal-directed to stimulus-driven behavior (5, 6). Cocaine administration also diminishes information processing about consequences, leading to failures to adjust behavior during goal reevaluation (7).

We studied 125 participants to determine whether a newly learned behavior is under voluntary (goal-directed) or habitual (stimulus-driven) control using both positive and negative reinforcement. Seventy-two individuals met the DSM-IV-TR criteria for cocaine dependence and were actively using cocaine, as verified by urine screen (8), whereas 53 healthy control volunteers had no history of chronic drug or alcohol abuse (table S1). Participants learned by trial and error that an action was associated with a particular outcome, such as earning points toward a monetary reward (Fig. 1A) or avoiding an unpleasant electrical shock (Fig. 2, A and B). We then reduced the value of previously reinforcing outcomes by discontinuing point allocation for certain outcomes in the appetitive task (Fig. 1B) and physically disconnecting participants from the electrical stimulator in the avoidance task (Fig. 2C). We then tested whether participants made fewer responses to obtain or avoid the (now) devalued outcome, reflecting a goal-directed strategy, or whether they maintained their previously learned behavior despite outcome devaluation, as an index of habit.


Fig. 1. Appetitive instrumental learning task. (A) Participants learned by trial and error which response associated with an animal picture gained them points. Feedback was provided by a picture of another animal, coupled with a number of points, or an empty box with no points. Goal-directed discrimination learning performance improved steadily in all participants over eight training blocks (F(6,684) = 43.98, P < 0.001), but performance accuracy in individuals with CUD was reduced compared with that in control volunteers (F(1,121) = 20.19, P < 0.001). (B) Participants were instructed that some of the pictures that were previously associated with points would no longer lead to point increases. Sensitivity to outcome devaluation was tested by simultaneous presentation of two outcome-related pictures and the instruction to select the response leading to a valued outcome without providing performance feedback. CUD patients showed significant impairments when outcome-action knowledge was tested behaviorally (t(88.2) = 3.83, P < 0.001). (C) Slip-of-action test to determine the balance between goal-directed and habitual responses: Participants were asked to selectively respond to those stimuli still associated with reward and to withhold responding to stimuli that had been devalued. (For demonstration only, we indicated “go” and “no-go” below the pictures to denote the correct response.) Habitual behavior is reflected by continued responses to devalued outcomes, implying reduced sensitivity to outcome value. We observed a highly significant group–by–outcome-value interaction (F(1,121) = 18.24, P < 0.001). CUD patients responded significantly more often than controls to the stimuli associated with the devalued outcome (t(121) = –4.72, P < 0.001), whereas the level of responding toward valued outcomes did not differ between the groups (t(121) = –0.65, P = 0.520). (D) Immediately after the slip-of-action test, a control task was introduced: Participants were instructed to respond only to those stimuli still associated with reward and to withhold responding to devalued stimuli. All participants responded more frequently to stimuli associated with the valued rather than the devalued outcome (F(1,121) = 111, P < 0.001), but this difference was significantly smaller in CUD patients (F(1,121) = 42.10, P < 0.001). (In all panels, error bars denote SEM, ns indicates P > 0.05, and asterisks indicate P < 0.05.) Analysis of covariance showed that executive impairments in the control task were not sufficient to account for the impaired “slip-of-action” performance. The significant group–by–outcome-value interactions in (C) survived statistical correction (F(1,120) = 8.79, P = 0.004), indicating enhanced habitual control (see text).

Panel A

Instrumental Appetitive Learning Performance


Participants were presented with animal pictures on a screen and asked to push a button to determine if the image was associated with a value or not. Over the course of eight blocks (96 trials), the participants learned to associate certain images with a reward.


The authors used SPSS (a software package used for statistical analysis) to analyze their data.

The authors performed a two-step hierarchical multiple regression to test participants’ association of images and outcomes. This method allowed the authors to test how different factors independently affect an outcome.

In the first step, the analysis was geared to test how a factor such as dependency status (dependent or not dependent) on cocaine, opiates, and alcohol affects the response of the individual. For the second step, they included other factors such as impulsivity, compulsivity, stressful life events, and the mean for learning accuracy during the 96 trials to assess if any of these variables influenced the individuals’ responses.
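The two-step idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are randomly generated, the predictor names (`cocaine`, `stress`, `accuracy`) and the outcome `habit` are hypothetical stand-ins, and the authors ran their analysis in SPSS, not with this code. The sketch fits an ordinary least-squares model with the step-one predictor only, then refits with the step-two predictors added, and reports how much the explained variance (R²) grows.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(predictors, y):
    """R^2 of a least-squares fit of y on the predictors plus an intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Made-up data for illustration only (not the study's data).
rng = random.Random(0)
n = 120
cocaine = [rng.choice([0, 1]) for _ in range(n)]   # hypothetical dependency status
stress = [rng.gauss(0, 1) for _ in range(n)]       # hypothetical stress score
accuracy = [rng.gauss(0, 1) for _ in range(n)]     # hypothetical learning accuracy
habit = [0.5 * c + 0.4 * s - 0.6 * a + rng.gauss(0, 1)
         for c, s, a in zip(cocaine, stress, accuracy)]

# Step 1: dependency status only. Step 2: add the other factors.
r2_step1 = r_squared([[c] for c in cocaine], habit)
r2_step2 = r_squared([[c, s, a] for c, s, a in zip(cocaine, stress, accuracy)], habit)
r2_change = r2_step2 - r2_step1

print(f"step 1 R^2 = {r2_step1:.3f}")
print(f"step 2 R^2 = {r2_step2:.3f} (change = {r2_change:.3f})")
```

Because the step-two model contains every step-one predictor, its R² can never be smaller; the size of the increase is what tells you how much the added factors explain beyond dependency status.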



All participants gradually learned which animal pictures were associated with rewards. The authors found that cocaine use disorder (CUD) patients had lower learning performance than control volunteers.

Panel B

Behavioral Test of Outcome-Action Learning


In the second test, the authors wanted to test participants’ sensitivity to outcome devaluation (when the reward is reduced or removed from an outcome). To see how well participants retained their associations, the authors presented them with two pictures: one of the animal pictures with an X in front of the picture (the devalued choice) and one without an X. The participants were instructed to choose the one without an X to receive a reward.


The authors used t-tests and Mann-Whitney U tests.

The t-test is used to compare the means of two groups; in this case, the mean accuracy of the CUD group and that of the control group. The t-test is a “parametric” test, which means that it is appropriate when the data follow a normal distribution. When the data do not follow a normal distribution, a non-parametric analysis such as a Mann-Whitney U test can be used instead; it compares the two groups on the ranks of their values rather than on their means.
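To make the difference between the two tests concrete, here is a minimal Python sketch using only the standard library. The accuracy scores are invented for illustration, and the function names are hypothetical. The t statistic is computed in its unequal-variance (Welch) form, which produces fractional degrees of freedom like the t(88.2) reported in the caption; that the authors used this form is an inference from the fractional value, not something the source states.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def mann_whitney_u(a, b):
    """U for sample a: number of (x, y) pairs with x > y, ties counting one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Invented accuracy scores, for illustration only.
controls = [0.9, 0.85, 0.95, 0.8, 0.9]
patients = [0.7, 0.75, 0.6, 0.85, 0.65]

t, df = welch_t(controls, patients)
u_controls = mann_whitney_u(controls, patients)
u_patients = mann_whitney_u(patients, controls)

print(f"Welch t = {t:.2f}, df = {df:.1f}")
print(f"U (controls) = {u_controls}, U (patients) = {u_patients}")
```

The t statistic depends on the actual distances between scores, whereas U depends only on their order: it counts how often a score from one group beats a score from the other. Turning either statistic into a P value requires the matching reference distribution, which a statistics package would supply.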



The authors found that the longer the training phase of the appetitive learning task, the less sensitive CUD patients were to changes in the value of a stimulus. This suggests an impairment in their outcome-action knowledge.

Panel C

Test of Stimulus-Response Habits (Slip-of-Action)


In this task, participants were presented with all possible outcomes, with two pictures covered by an X to indicate that those stimuli had no value. Participants were instructed to choose pictures that were associated with a reward and not choose those that had no reward.


To test how the participants responded to outcome devaluation, the authors used repeated-measures ANOVA. An ANOVA with repeated measures allows the same subjects to be measured under different conditions and makes comparisons between groups of subjects as well as between conditions. In this case, it was a two-factor ANOVA, meaning that two factors were compared: the between-subjects factor (group: CUD patients or control volunteers) and the within-subjects factor (outcome value: valued or devalued stimuli).
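For a design with two groups and two conditions, the group-by-condition interaction can be illustrated with a short Python sketch. It relies on a standard identity: in a 2 (between) × 2 (within) design, the interaction F equals the square of an equal-variance t-test comparing the groups on each participant's difference score (valued minus devalued). The response counts below are invented, and `interaction_f` is a hypothetical helper name; this is not the authors' SPSS analysis.

```python
from statistics import mean, variance

def interaction_f(group_a, group_b):
    """Interaction F for a 2 x 2 mixed design.

    Each group is a list of (valued, devalued) response counts, one pair
    per participant. Returns F and its degrees of freedom.
    """
    diff_a = [valued - devalued for valued, devalued in group_a]
    diff_b = [valued - devalued for valued, devalued in group_b]
    na, nb = len(diff_a), len(diff_b)
    # Pooled variance of the difference scores (equal-variance t-test).
    pooled = ((na - 1) * variance(diff_a) + (nb - 1) * variance(diff_b)) / (na + nb - 2)
    t = (mean(diff_a) - mean(diff_b)) / (pooled * (1 / na + 1 / nb)) ** 0.5
    return t * t, (1, na + nb - 2)

# Invented response percentages, for illustration only.
controls = [(90, 10), (85, 20), (95, 15), (80, 5)]
patients = [(88, 50), (90, 60), (85, 40), (92, 70)]

f_value, dof = interaction_f(controls, patients)
print(f"interaction F{dof} = {f_value:.2f}")
```

In these made-up numbers both groups respond about equally to valued stimuli, but the second group keeps responding to devalued ones, so its difference scores are much smaller. That gap between the groups' difference scores is exactly what a significant interaction reflects.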


CUD patients continued to respond to stimuli even after they were devalued, suggesting that the brain systems that control habitual behaviors were influencing behavior more than those that control goal-directedness. This further suggests that CUD patients had difficulty integrating new information. They continued to respond to valued outcomes similarly to the control group.

Panel D

Test of Working Memory/Disinhibition


In the final task, participants were presented with all possible outcomes. Some images that were previously associated with a reward were marked with an X, indicating that they no longer had any value. The participants were asked to choose images that were associated with a reward. This trial was presented immediately after the slip-of-action test to see whether participants could remember what they just learned.


The authors again used ANOVA to analyze the data from this task. They compared how often participants responded to stimuli that were still valued with how often they responded to stimuli that had been devalued.


Patients with CUD showed a smaller difference between their responses to valued stimuli and their responses to devalued stimuli. Analysis of covariance showed that executive functions were impaired, but that this impairment did not fully explain the poorer slip-of-action performance (meaning that something besides executive impairment, here stronger habitual control, contributed to it).

Ethics of data collection, analysis, and presentation

Studies that include human subjects need to be approved before they can be conducted. This is to safeguard the health and rights of participants.

The protocol for this study was approved by the National Research Ethics Committee under code 12/EE/0519; PI: KDE.


Fig. 2. Avoidance instrumental learning task. (A) Participants were trained to associate distinctive visual stimuli with an electrical shock to one wrist or the other. (B) Participants were instructed to avoid receiving shocks by pressing a foot-pedal on the side corresponding to the wrist where they were expecting to receive an electrical shock in response to the appearance of the CS. Individuals with CUD made significantly fewer successful avoidance responses compared with controls (F(1,121) = 11.28, P = 0.001). No group differences in skin conductance responses to the CS were observed (F(1,89) = 0.71, P = 0.401). (C) In the outcome devaluation procedure, we disconnected one wrist from the electrical stimulator (devalued) while leaving the other wrist connected (valued). Participants were made explicitly aware that one wrist previously associated with an electrical shock was now safe. (D) During the extinction procedure, the number of unnecessary foot-pedal presses to avoid shocks from the now disconnected electrical stimulator was measured. The events discussed in (C) and (D) were conducted twice: once after a short period of training and again after overtraining to promote habit formation. All participants made a greater number of successful avoidance responses to the CS associated with the valued outcome compared with the devalued outcome (F(1,121) = 20.05, P < 0.001). This difference was marginally smaller in CUD patients compared with controls (F(1,121) = 3.23, P = 0.075). Consistent with their poor performance during the training phases, individuals with CUD remained less successful than controls in avoiding shocks. Skin conductance increased in all participants in response to the CS associated with the valued outcome compared with the devalued outcome (F(1,88) = 8.23, P = 0.005), but this did not differ between the groups (F(1,88) = 0.29, P = 0.592). [Results were statistically corrected for group differences in subjective shock intensity. In (B) and (D), error bars denote SEM, SQRT signifies square-root transformation, ns indicates P > 0.05, and asterisks denote P < 0.05.]

Panel A

Classical Conditioning

Similar to the appetitive learning task (see Figure 1), participants were presented with images and learned associations between certain stimuli and outcomes.

In this case, the outcomes were not rewards, but electric shocks to either the left or right wrist.

Panel B

Instrumental Avoidance Learning Performance

After the training (Panel A), the authors introduced a pedal that participants could use to avoid receiving a shock. Participants had to press the pedal on the side that corresponded to the wrist they expected to receive a shock. If they pressed the correct pedal in 750 milliseconds or less, they avoided the shock.

CUD patients were less successful at avoiding the electric shock, and extensive training did not have an effect on how successfully they avoided the shock.

Panel C

Outcome Devaluation

In the third part of the experiment, the authors removed the electrical shock from one of the stimuli (devalued the stimulus). Participants were told which stimulus was being devalued, and they were re-trained similarly to the first phase.

Panel D

Test of Habitual Responding in Extinction

The final phase was similar to the second phase: Participants were instructed to use the correct foot pedal to avoid a shock. However, they did not need to use the pedal to avoid shocks from stimuli that had been devalued. The authors measured the number of successful avoidance responses and the number of unnecessary foot pedal presses (when participants pressed the pedal associated with the devalued stimulus).

This is called “extinction” because one of the outcomes has been extinguished.

The authors conducted the phases in C and D twice: once with a short training period, and once after an extensive training period that promoted habit formation.

All participants showed more successful avoidance responses (foot pedal presses) for the valued stimulus than for the devalued stimulus, but this difference was smaller in CUD patients. Overall, CUD patients were less successful at avoiding shocks than control volunteers. Similar to the appetitive learning task, this suggests that the CUD patients were less successful at integrating new information.

In participants with cocaine use disorder (CUD), instrumental learning performance fell significantly short of that of control volunteers, irrespective of whether the goal was to make responses to obtain symbolic rewards or to avoid electrical shocks (Figs. 1A and 2B). However, depending on the type of reinforcement, prolonged training had a differential effect on the behavior of these individuals. For appetitive behavior, extensive training rendered CUD patients less sensitive to outcome devaluation (Fig. 1B). They persistently responded to stimuli previously associated with reward, irrespective of whether their behavior was actually rewarded or not (Fig. 1C). In fact, the shift toward habitual responding improved their response rate to the valued outcome (Fig. 1C). The strong habit bias in the slip-of-action test was not due to executive impairments (9, 10), which were assessed separately in a control task (Fig. 1D) and included as a covariate in the statistical model.

By contrast, overtraining avoidance behavior had no effect on task performance in individuals with CUD. Despite intact fear conditioning (Fig. 2B), CUD patients continued to show attenuated avoidance responses to the conditioned stimulus (CS) associated with a shock, even after extended training (Fig. 2D). Such impairments in the initiation of goal-directed avoidance behavior have previously been reported in animals after dopamine receptor blockade (11) or experimental lesions of dopamine neurons (12). Although CUD patients undervalued the aversive outcome, overtraining did not change their sensitivity to outcome devaluation, either in terms of behavior or skin conductivity. As shown in Fig. 2D, CUD patients’ responses were comparable to controls when the CS was no longer associated with a shock.

In light of the high prevalence of comorbid addictions in CUD, we sought to determine the extent to which the increased formation of appetitive habits and the persistent deficiencies in avoiding aversive outcomes resulted from cocaine addiction specifically or from addiction to other drugs. We also assessed the influence of vulnerability factors such as impulsivity-compulsivity traits, stress, and poor instrumental learning performance (8). Addiction to cocaine, but not to other drugs, explained ~13% of the variance of appetitive habits in the slip-of-action test (coefficient of determination R² = 0.13; F(4,117) = 4.48, P = 0.002). However, reduced performance accuracy during training (β = –0.410, P < 0.001) and higher numbers of stressful life events (β = 0.30, P = 0.015) were factors of even greater weight in the model, accounting for one-third of the variance (R² = 0.31; F(8,113) = 6.32, P < 0.001). Hence, our results suggest that, in individuals with prior exposure to cocaine and stress, impairments in instrumental learning lead to a shift from goal-directed to goal-independent habitual behavior.
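As a rough arithmetic aid for reading these statistics, the overall F of a regression model can be recovered from its R² and degrees of freedom with the textbook relation F = (R²/k) / ((1 − R²)/df), where k is the number of predictors and df the residual degrees of freedom. The Python sketch below applies it to the values quoted above; the function name is hypothetical, and because the published R² values are rounded to two decimals, the results only approximate the reported F values.

```python
def f_from_r2(r2, k, df_resid):
    """Overall F statistic of a regression with k predictors and df_resid residual df."""
    return (r2 / k) / ((1.0 - r2) / df_resid)

# Values quoted in the paragraph above (R^2 rounded to two decimals).
f_step1 = f_from_r2(0.13, 4, 117)   # reported F(4,117) = 4.48
f_step2 = f_from_r2(0.31, 8, 113)   # reported F(8,113) = 6.32

print(f"step 1: F is about {f_step1:.2f}")
print(f"step 2: F is about {f_step2:.2f}")
```

The small gaps between these approximations and the reported values are what one would expect from rounding R² before the calculation.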

We also applied a similar model to examine attenuated avoidance responses to the valued CS in extinction (table S2), revealing that addiction to cocaine (but not to other drugs) accounted for only 9% of the variance (R² = 0.09; F(4,119) = 2.82, P = 0.028). High levels of impulsivity (β = 0.18, P = 0.047) and low avoidance accuracy during overtraining (β = –0.67, P < 0.001), both associated with reduced striatal dopaminergic neurotransmission (12, 13), were the strongest predictors in this model, accounting for more than half of the variance of attenuated avoidance (R² = 0.52; F(8,115) = 15.85, P < 0.001). These results are consistent with preclinical evidence for impulsivity predicting compulsive cocaine-seeking, even in the face of aversive consequences (14).

Our data provide compelling evidence for impairments in instrumental learning in CUD, regardless of affective valence and whether rewards were primary (shock) or secondary (monetary). In the case of appetitive learning, increased habitual responding may either be an indirect consequence of poor goal-directed action (7) or result from stronger habit learning. Both explanations would be consistent with disruptions of the balance between goal-directed and habitual control hypothesized to underlie compulsive cocaine-seeking (1). By contrast, impaired performance for instrumental avoidance in CUD patients occurred in the context of intact fear conditioning and was not accompanied by habit learning. This could be interpreted as a motivational impairment that is consistent with theories of the role of dopamine in motivational processes (11, 12) and with reports of reduced striatal dopamine function in CUD (15, 16). Our findings are also in line with evidence indicating that manipulations of dopamine neurotransmission alter instrumental learning (17) and shift the balance between goal-directed and habitual responding (18, 19).

Although the observed appetitive habit bias was specific to cocaine addiction, the main contributory factors were impaired goal-directed learning and accumulated life stress. We also report evidence of additional executive impairments consistent with previous findings (9); however, these were insufficient to explain the increased goal-to-habit shift in appetitive behavior. Nonetheless, impulsivity and instrumental learning impairments are critical factors in explaining the reduced propensity to avoid aversive outcomes.

How can these findings be applied to other addictive and compulsive behaviors? Emerging evidence in alcoholism has already shown disruptions in the balance of action control for appetitive behavior (20, 21). Avoidance habits might be more relevant for opiate addiction, given that the avoidance of unpleasant withdrawal symptoms is thought to play an important role in its development. Although we did not find supportive evidence in our comorbid sample, this hypothesis should be tested in opiate-addicted patients without such comorbidity. The performance profile of CUD patients in the appetitive condition may reflect a transdiagnostic risk factor for developing compulsive habits, as was recently shown to explain common deficits seen in obsessive-compulsive disorder (OCD), alcohol addiction, and eating disorders (22, 23). Notably, however, our data show that this pattern may not hold in the context of avoidance behavior, where, for example, OCD patients (unlike our CUD sample) exhibit greater habitual learning (24).

Our findings illustrate the particular difficulty of treating CUD: The persistent deficits in avoiding aversive consequences highlight the ineffectiveness of punitive interventions for cocaine addiction. Moreover, the tendency of patients to perform a rewarded behavior in an automatic fashion, irrespective of its consequences, is unlikely to be affected by cognitive interventions that target the enhancement of alternative outcomes. Treatment of cocaine addiction should thus focus on training desirable habits that replace habitual drug-taking while protecting CUD patients from aversive consequences that they may fail to avoid.

Supplementary Materials

Materials and Methods

Supplementary Text

Tables S1 and S2

References (25–30)

References and Notes

  1. B. J. Everitt, T. W. Robbins, Nat. Neurosci. 8, 1481–1489 (2005).
  2. F. J. Miles, B. J. Everitt, A. Dickinson, Behav. Neurosci. 117, 927–938 (2003).
  3. A. Dickinson, Philos. Trans. R. Soc. London Ser. B 308, 67–78 (1985).
  4. B. W. Balleine, J. P. O’Doherty, Neuropsychopharmacology 35, 48–69 (2010).
  5. L. H. Corbit, B. C. Chieng, B. W. Balleine, Neuropsychopharmacology 39, 1893–1901 (2014).
  6. E. Dias-Ferreira et al., Science 325, 621–625 (2009).
  7. G. Schoenbaum, B. Setlow, Cereb. Cortex 15, 1162–1169 (2005).
  8. Supplementary materials are available on Science Online.
  9. K. D. Ersche et al., Science 335, 601–604 (2012).
  10. R. Z. Goldstein, N. D. Volkow, Nat. Rev. Neurosci. 12, 652–669 (2011).
  11. R. J. Beninger, S. T. Mason, A. G. Phillips, H. C. Fibiger, J. Pharmacol. Exp. Ther. 213, 623–627 (1980).
  12. J. D. Salamone, M. Correa, Behav. Brain Res. 137, 3–25 (2002).
  13. J. W. Dalley et al., Science 315, 1267–1270 (2007).
  14. D. Belin, A. C. Mar, J. W. Dalley, T. W. Robbins, B. J. Everitt, Science 320, 1352–1355 (2008).
  15. N. D. Volkow et al., Nature 386, 830–833 (1997).
  16. D. Martinez et al., Am. J. Psychiatry 166, 1170–1177 (2009).
  17. M. J. Frank, L. C. Seeberger, R. C. O’Reilly, Science 306, 1940–1943 (2004).
  18. S. de Wit et al., Psychopharmacology 219, 621–631 (2012).
  19. K. Wunderlich, P. Smittenaar, R. J. Dolan, Neuron 75, 418–424 (2012).
  20. Z. Sjoerds et al., Transl. Psychiatry 3, e337 (2013).
  21. J. M. Barker, J. R. Taylor, Neurosci. Biobehav. Rev. 47, 281–294 (2014).
  22. C. M. Gillan et al., Am. J. Psychiatry 168, 718–726 (2011).
  23. C. M. Gillan, M. Kosinski, R. Whelan, E. A. Phelps, N. D. Daw, eLife 5, e11305 (2016).
  24. C. M. Gillan et al., Biol. Psychiatry 75, 631–638 (2014).
  25. Acknowledgements: We thank all volunteers for their participation in this study, as well as the staff at the Mental Health Research Network and the Cambridge BioResource for their assistance with volunteer recruitment. We are especially grateful to N. Flake and S. Whittle for their exceptional commitment in this regard. We also thank the staff at the National Institute for Health Research (NIHR) Clinical Research Facility at Addenbrooke’s Hospital for their support throughout this study. We are grateful to S. Abbott, R. Lumsden, J. Arlt, C. Whitelock, I. Lee, and M. Pollard for their assistance. C.M.G. is supported by a Sir Henry Wellcome Postdoctoral Fellowship (101521/Z/12/Z). This work was funded by a grant from the Medical Research Council (MR/J012084/1) and was conducted within the NIHR Cambridge Biomedical Research Centre and the Behavioral and Clinical Neuroscience Institute, which is jointly funded by the Medical Research Council and the Wellcome Trust. The data described in this paper are stored at the University of Cambridge's institutional repository, Apollo (