Abstract
Learning that certain actions lead to risky rewards is critical for biological, social, and economic survival, but the precise neural mechanisms of such reward-guided learning remain unclear. Here, we show that the human nucleus accumbens plays a key role in learning about risks by representing reward value. We recorded electrophysiological activity directly from the nucleus accumbens of five patients undergoing deep brain stimulation for treatment of refractory major depression. Patients engaged in a simple reward-learning task in which they first learned stimulus-outcome associations (learning task), and then were able to choose from among the learned stimuli (choosing task). During the learning task, nucleus accumbens activity reflected potential and received reward values both during cue presentation and during feedback. During the choosing task, there was no cue-related nucleus accumbens activity, but feedback-related activity was pronounced and similar to that during the learning task. This pattern of results is inconsistent with a prediction error response. Finally, analyses of cross-correlations between the accumbens and simultaneous recordings of medial frontal cortex suggest a dynamic interaction between these structures. The high spatial and temporal resolution of these recordings provides novel insights into the timing of activity in the human nucleus accumbens, its functions during reward-guided learning and decision-making, and its interactions with medial frontal cortex.
Main
William Congreve, the seventeenth century English playwright, wrote: ‘Uncertainty and expectation are the joys of life.’ Risk-taking, perhaps the epitome of uncertainty and expectation, is an integral component of economic, social, and biological decision-making, and typically incites positive and appetitive emotions. Although functional MRI has provided evidence that the human nucleus accumbens becomes active during reward learning and risk-taking (Breiter and Rosen, 1999; Delgado et al, 2000; Knutson et al, 2001; Matthews et al, 2004; O'Doherty et al, 2004), many questions about the functional properties of the human nucleus accumbens remain debated or unknown. Some of these properties are difficult or impossible to ascertain using functional MRI, given limitations of measuring slow changes in blood flow rather than rapid changes in neural activity. In addition, the neuronal processes within this structure cannot be assessed directly by functional MRI. Here, we recorded electrophysiological activity directly from the nucleus accumbens of five awake human patients who underwent deep brain stimulation surgery for treatment of major depression (Schlaepfer et al, 2008). This provided a rare opportunity to examine the functioning of the human nucleus accumbens. We addressed the following issues.
First, although the nucleus accumbens clearly is involved in reinforcement processing, different accounts of what information it represents have been put forth. Some have proposed that nucleus accumbens activity reflects errors in reward prediction (McClure et al, 2003; O'Doherty et al, 2004; Abler et al, 2006; Rodriguez et al, 2006), that is, the difference between the reward or reward cue experienced and the reward that was predicted. Others have suggested that it represents the rewarding value of reinforcements such as food or money (Knutson et al, 2001; Small et al, 2001; Cromwell et al, 2005). Although not typically dissociated in experiments, these two accounts can make different predictions. For example, a perfectly predictable reward has a prediction error of zero but a reward value greater than zero. Relatedly, some have argued for a ventral/dorsal striatum distinction, with the ventral region being involved in learning about rewards and the dorsal regions being involved in selecting actions based on learned reward values (O'Doherty et al, 2004; Morgane et al, 2005; Atallah et al, 2007).
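To make this dissociation concrete, the following is a minimal simulation (ours, for illustration; not part of the original analyses) of a Rescorla-Wagner learner receiving a perfectly predictable reward. Across trials, the prediction error decays toward zero while the learned reward value converges on the reward magnitude and stays positive, so the two accounts predict different responses to a fully learned reward.

```python
# Illustration only: a Rescorla-Wagner value learner for a perfectly
# predictable reward. The prediction error (delta) shrinks toward zero,
# but the learned reward value (v) does not.
reward = 0.06   # euros; the magnitude of the 'safe' reward in the task below
alpha = 0.2     # learning rate (arbitrary choice for illustration)
v = 0.0         # initial value estimate

for trial in range(30):
    delta = reward - v     # prediction error on this trial
    v += alpha * delta     # value update

print(f"learned value: {v:.4f}, final prediction error: {delta:.4f}")
# -> learned value: ~0.0600, final prediction error: ~0.0000
# A value-coding region should still respond to this reward;
# a pure prediction-error signal should have returned to baseline.
```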
Second, it is unknown how quickly the human nucleus accumbens can process task-related information. Scalp-recorded EEG studies have demonstrated that the medial frontal cortex can process reinforcement information as early as 200 ms after feedback onset (Gehring and Willoughby, 2002), but nucleus accumbens activity cannot be measured from the scalp, so the timing of activity in the human nucleus accumbens, relative to that of other structures including the medial frontal cortex, is unknown. Whereas functional MRI can resolve activity changes only on the order of seconds, EEG can provide insight into the speed with which information is processed within the nucleus accumbens (eg on the order of tens or hundreds of milliseconds).
Finally, the electrophysiological interactions with the medial frontal cortex, which has direct efferent projections to the nucleus accumbens (Finch, 1996; Haber and McFarland, 1999; Morgane et al, 2005), are largely unknown in humans. Functional MRI studies have suggested that functional connectivity between the nucleus accumbens and the medial frontal cortex increases in response to risks and rewards (Cohen et al, 2005). However, the latency and direction of the functional interactions between the nucleus accumbens and medial frontal cortex, which may occur within tens or hundreds of milliseconds (Kasanetz et al, 2006), are below the temporal resolution of functional MRI. Intracranial EEG provides a unique opportunity to examine the time courses of the interactions between these regions with high temporal precision. Thus, a final goal of the study was to examine the latency and direction of interactions between the electrophysiological responses of the nucleus accumbens and medial frontal cortex (measured via surface EEG).
MATERIALS AND METHODS
Patients
All five patients (3 male; aged 37–55 years, average 45) suffered from treatment-refractory (refractory to multiple medications, psychotherapy, and electroconvulsive therapy) major depression. Electrode placement was planned using MRIs, as described elsewhere (Sturm et al, 2003; Schlaepfer et al, 2008). The target structure was the postero-ventro-medial part of the nucleus accumbens (see Figure 1d). Electrode placement was determined entirely on clinical grounds. This experiment, and the larger clinical study of the use of deep brain stimulation as a treatment option for major depression, was approved by the ethics committees of the Universities of Bonn and Cologne. The study is registered at www.clinicaltrials.gov under the number NCT00122031.
EEG Recording
Electroencephalogram recordings were conducted in a quiet testing room the day after surgical implantation of the DBS electrodes. The DBS electrodes were Medtronic model 3387, made of a platinum/iridium (90/10%) mixture. At this time, electrode leads remained externalized and could be connected to our mobile EEG recording system. Continuous EEG data were sampled at 1000 Hz with a 300 Hz anti-alias filter and referenced to linked mastoids. The recordings reported here were taken from the ventral-most contact in the left hemisphere; results were similar for other contacts. Anatomically, this contact is located in the purported shell of the nucleus accumbens, as confirmed by visual inspection of postoperative X-ray scans. However, it is not possible for us to determine whether the activity we observed was generated by neurons located within the shell subregion, or whether other neurons around the ventral striatum contributed to the signal. Further, because of the surface area of the contacts, it is likely that many neurons within the ventral striatum contributed to the signal recorded here. We also recorded from several surface EEG electrodes, including Cz. Patients had taken their standard antidepressant medication, but were not sedated. Patients sat in a comfortable chair in front of a desk and performed the experiment on a laptop computer. The laptop was equipped with a parallel output cable that delivered triggers to the EEG recording system at the onset of each visual stimulus and button press with millisecond precision.
Event-Related Potential Statistical Analyses
Electroencephalogram data were band-pass filtered off-line from 0.1 to 15 Hz using filtering methods provided by the eeglab software (www.sccn.ucsd.edu/eeglab/). Event-related potentials (ERPs) were computed by taking EEG windows from −200 to +1000 ms around the onset of each event of interest (eg cue). Single-trial epochs were baseline corrected from −200 to 0 ms. Analyses were conducted using ANOVAs of voltage changes from 400–600 ms post-cue or 300–400 ms post-outcome in the SPSS 12.0 software package. Task-related data for Cz are presented in the Supplementary Information section.
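For concreteness, the following sketch outlines the epoching and baseline-correction step in Python with NumPy; it is illustrative only (the original analyses used eeglab and SPSS), and the function name and edge handling are our assumptions. It assumes a 1000 Hz sampling rate, so one sample corresponds to 1 ms.

```python
import numpy as np

FS = 1000  # sampling rate in Hz; at 1000 Hz, one sample = 1 ms

def epoch_and_baseline(signal, event_samples, pre_ms=200, post_ms=1000):
    """Cut epochs around event markers and baseline-correct each trial.

    signal: 1-D continuous EEG time series from one channel.
    event_samples: sample indices of the events of interest (eg cue onsets),
        assumed to lie at least pre_ms samples from the recording edges.
    Returns an array of shape (n_trials, pre_ms + post_ms).
    """
    epochs = []
    for s in event_samples:
        epoch = signal[s - pre_ms : s + post_ms].astype(float)
        epoch -= epoch[:pre_ms].mean()   # subtract the -200 to 0 ms mean
        epochs.append(epoch)
    return np.asarray(epochs)

# The ERP is the trial average; the value entering the ANOVA is then a
# window mean, eg 300-400 ms post-outcome:
# erp = epoch_and_baseline(eeg, outcome_onsets).mean(axis=0)
# window_mean = erp[200 + 300 : 200 + 400].mean()
```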
Cross-correlation vectors were computed as follows. First, we calculated the correlation coefficient between the nucleus accumbens and Cz time series within a window of 0 to +1000 ms around each triggering event (eg feedback onset). This is the cross-correlation with a time lag of 0 ms. Next, we shifted one time course with respect to the other by 1 ms; the resulting correlation coefficient is the cross-correlation with a time lag of 1 ms. This procedure was repeated in 1 ms steps from −500 to +500 ms. The result is a vector of correlation coefficients, where the coefficient at each time lag represents the extent to which activity at one electrode can be predicted from the lagged activity at the other electrode. Positive values indicate that Cz activity predicts future nucleus accumbens activity, and negative values indicate that nucleus accumbens activity predicts future Cz activity. This procedure was carried out for each trial; the resulting vectors were then averaged within each condition and, finally, across conditions. To assess the statistical significance of the cross-correlations, we used the following bootstrapping procedure. We computed all cross-correlations from each patient's data, but randomized (1) the trial pairing within each category (eg nucleus accumbens activity from trial 1 might be cross-correlated with Cz activity from trial 14), and (2) the onset time used to compute the correlations. Thus, the temporal characteristics of both time courses are preserved, but their positions in time with respect to each other are randomized. This bootstrapping procedure was repeated 100 times for each condition and for each patient; Figure 5 displays the grand average.
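A schematic implementation of this lag analysis follows (our sketch, with assumed function names; again, one sample = 1 ms at 1000 Hz). The bootstrap shown randomizes only the trial pairing; the original procedure additionally randomized onset times, which is omitted here for brevity.

```python
import numpy as np

def lagged_xcorr(cz, nacc, max_lag=500):
    """Pearson correlation between a Cz window and an accumbens window
    at every lag from -max_lag to +max_lag ms. Positive lags correlate
    earlier Cz samples with later accumbens samples, so a peak at a
    positive lag means Cz activity predicts future accumbens activity.
    """
    lags = np.arange(-max_lag, max_lag + 1)
    cc = np.empty(lags.size)
    for i, lag in enumerate(lags):
        if lag >= 0:
            a, b = cz[: len(cz) - lag], nacc[lag:]
        else:
            a, b = cz[-lag:], nacc[:lag]
        cc[i] = np.corrcoef(a, b)[0, 1]
    return lags, cc

def shuffled_null(cz_trials, nacc_trials, rng, max_lag=500):
    """One bootstrap iteration: randomize which accumbens trial is paired
    with which Cz trial, then average the lagged correlations over trials.
    Repeating this (100 times per condition and patient in the paper)
    builds the null distribution against which observed values are compared.
    """
    order = rng.permutation(len(nacc_trials))
    return np.mean([lagged_xcorr(cz_trials[i], nacc_trials[j], max_lag)[1]
                    for i, j in enumerate(order)], axis=0)
```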
Experimental Protocol
On the day following surgery, we were able to obtain electrophysiological recordings while patients were awake, outside of surgery, not sedated, and engaged in a reward-learning task. Before the start of the experiment, the procedures were explained to patients, who then signed informed consent documents. The entire experiment, including instructions and breaks, lasted ∼30 min.
Our reward-learning task was designed to allow a distinction between learning stimulus-action-reward associations and freely choosing between actions with already learned associations (O'Doherty et al, 2003; Morris et al, 2006; Atallah et al, 2007). This distinction is important because learning these associations and using them to guide behavior might be subserved by different neural circuits (Cardinal, 2006; Atallah et al, 2007). The experiment thus comprised two tasks, a ‘learning’ task and a ‘choosing’ task. In the learning task, patients pressed the left or right mouse button when a visual stimulus (cue) appeared on the left or right side of a computer monitor. The cues remained on screen until patients made a response (all patients responded correctly on 100% of trials). After a 1500 ms delay, feedback appeared on the screen to indicate whether patients won or lost a small amount of money (see Figure 1a). The feedback remained onscreen for 2000 ms. The amount won or lost was determined by reinforcement contingencies (Figure 1), which we did not describe to patients, but which they quickly learned and could spontaneously report following the experiment. Specifically, there was a ‘safe’ cue, which predicted a future reward of 0.06 € with 100% certainty, and a ‘risky’ cue, which predicted a 75% chance of a 0.12 € reward, but also a 25% chance of losing 0.12 €. Note that the expected value of these two options is the same; this is important for distinguishing prediction errors from reward signals, as described below. There were 152 trials during the learning task (76 of each cue type). A pleasant or unpleasant sound (for wins or losses) was played simultaneously to provide poly-modal reinforcement: a game show-type buzzer for losses, the Windows chime for safe rewards, and the Windows ‘Tada!’ sound for risky wins. Patients started each trial by pressing the space bar.
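Incidentally, the equality of the two cues' expected values can be verified directly from the stated contingencies (a minimal check, in euros):

```python
# Expected value of each cue under the stated contingencies (in euros)
ev_safe  = 1.00 * 0.06                    # certain 0.06 € win
ev_risky = 0.75 * 0.12 + 0.25 * (-0.12)   # 0.09 - 0.03
assert abs(ev_safe - ev_risky) < 1e-12    # both equal 0.06 €
```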
When the learning phase finished, patients were told that there was a second part to the experiment, in which they would see both shapes at the same time and were free to choose whichever they wanted (but could only choose one per trial), and that sometimes they would win money and sometimes they would lose money (Figure 1b). Patients were not told that how much money they could win or lose was related to which shape was displayed or chosen, nor were they told that the reinforcement contingencies were the same as experienced in the learning task. Patients began this task when they indicated they understood the instructions and were ready to begin. There were 200 trials in the choosing task. On each trial, patients saw both stimuli on the screen. When patients indicated their choice with a button press, a black box appeared behind the chosen stimulus for 500 ms to provide visual confirmation of the response. After a 1500 ms delay, feedback was presented for 2000 ms. The experiment was programmed using Presentation software (www.neuro-bs.com). Patients were paid what they won immediately following the experiment (9.00 € during the learning task, and an average of 10.85 € during the choosing task).
RESULTS
Behavior
During the learning task, there were no differences in reaction time to the two cues (mean for risky/safe: 613/577 ms, repeated-measures ANOVA: F1,4=0.288, p=0.62). During the choosing task, patients selected the safe option (mean±std: 68±8%) more often than the risky option (32±8%). Reaction times during the choosing task were numerically but not significantly faster for risky than for safe choices (mean for risky/safe: 498/683 ms, F1,4=1.53, p=0.28; Figure 1c). We also examined whether feedback during the choosing task influenced the decision patients made on the subsequent trial. Although patients were more likely to choose the risky option after a risky win or loss than after a safe win (percent choosing risky following safe win, risky win, risky loss: 20, 58, 48%; see Figure 1c), these differences were not statistically significant (F1.5,6.3=2.55, p=0.158).
Nucleus Accumbens Activity During Learning
As seen in Figure 2a, nucleus accumbens activity was significantly greater following the risky than the safe cues. This difference began around 400 ms after onset of the cue and continued until around 700 ms (ANOVA; F1,4=8.74, p=0.042; see Figure 2b), and was observed individually in four of five patients (see Supplementary Information). During feedback, nucleus accumbens activity following safe rewards was significantly greater than baseline activity (t4=−2.79, p<0.025), as was the increase in activity from risky losses to safe rewards to risky rewards (linear increase across conditions: F1,4=23.07, p<0.01; Figure 2c). The loss-reward difference was maximal around 300 ms (Figure 2d), but inverted at ∼450–500 ms because of a sharp negative-going potential following losses. This inversion effect (more negative for loss compared to win from 450–550 ms) was significant across patients (t4=−2.15, p=0.048). Finally, to examine learning effects throughout the experiment, we plotted separately the first and last 10 trials of each condition. Activity increased slightly from the beginning of the experiment to the end (Figure 3).
Learning vs Choosing
We next examined whether the nucleus accumbens is equally sensitive to rewards and their predictors during both learning and reward-guided decision-making. We thus examined accumbens activity during the choosing task, which was similar to the learning task except that patients were free to choose which of the two cues they wanted on each trial. In contrast to the cue-related activity observed during the learning task, we observed no changes in activity following the cue during the choosing task (Figure 4a and b). Indeed, activity did not even reliably depart from baseline at the time of the peak response in the learning phase (p>0.5; see Supplementary Information for activity plotted separately for each patient). In contrast to these cue-locked differences between the learning and choosing tasks, feedback-locked ERPs during the choosing task were similar to those recorded during the learning task (Figure 4c and d). Safe rewards continued to elicit activity greater than baseline (t4=−2.77, p<0.025), and the increase in activity from losses to safe rewards to risky rewards was also significant (F1,4=11.36, p<0.001; Figure 4c). Further, the later increase in activity for losses was visually similar to that observed during the learning phase and was marginally significant (t4=−1.53, p=0.10). Finally, we examined whether feedback-related ERPs predicted the choice patients made on the subsequent trial, but no reliable results emerged.
Functional Interactions Between the Nucleus Accumbens and Medial Frontal Cortex
To investigate possible directional relationships between the accumbens and medial frontal cortex, we computed cross-correlations, in which time courses of activity in the accumbens and scalp electrode Cz were repeatedly correlated, each time lagging one time course relative to the other by 1 ms (see Materials and Methods). Cross-correlation patterns were similar across all conditions, so we averaged them together; separate plots are displayed in the Supplementary Information. The cross-correlation peaked at zero time lag, and was significant during both cue (F1,4=44.5, p=0.003) and feedback (F1,4=9.34, p=0.038), but was stronger during cue than feedback (F1,4=8.79, p=0.04; Figure 5a). This zero time lag suggests that much of the coactivation between the medial frontal cortex and nucleus accumbens is simultaneous, consistent with common input from a third region. To investigate whether this connectivity also contained top-down and bottom-up processes, we examined the asymmetry in the cross-correlations by subtracting the correlation value at each negative time lag from the correlation value at the corresponding positive time lag (Figure 5b). In this subtraction, positive values indicate that medial frontal activity predicts future nucleus accumbens activity, and negative values indicate that nucleus accumbens activity predicts future medial frontal activity. Here, we observed a dynamic relationship between the medial frontal cortex and nucleus accumbens: medial frontal cortex initially preceded nucleus accumbens activity with a lag from 0 to 100 ms that peaked at 46 ms on average (standard deviation: 14 ms), similar to the time lag observed in rats (Kasanetz et al, 2006). However, the direction of this effect reversed later in time, with nucleus accumbens activity preceding medial frontal activity from around 200–400 ms with a peak at 302 ms (standard deviation: 90 ms). This correlation reversal was significant during both the cue (F1,4=122, p<0.001) and feedback (F1,4=7.45, p=0.05) phases.
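Given a cross-correlation vector of the kind described in Materials and Methods, this asymmetry measure reduces to subtracting each mirrored negative-lag coefficient from its positive-lag counterpart; a minimal sketch with our naming (not the authors' code):

```python
import numpy as np

def xcorr_asymmetry(cc, max_lag=500):
    """Asymmetry of a cross-correlation vector cc spanning lags
    -max_lag..+max_lag ms (length 2 * max_lag + 1).

    For each lag 1..max_lag, returns the positive-lag coefficient minus
    the mirrored negative-lag coefficient. With Cz entered as the leading
    series, positive values indicate that medial frontal activity predicts
    future accumbens activity; negative values indicate the reverse.
    """
    center = max_lag                # index of the 0 ms lag
    pos = cc[center + 1 :]          # lags +1 .. +max_lag
    neg = cc[center - 1 :: -1]      # lags -1 .. -max_lag, mirrored
    return pos - neg
```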
DISCUSSION
Our findings provide novel evidence for the function of the nucleus accumbens in representing risk and rewards, and its interactions with the medial frontal cortex during reward processes. In particular, this study revealed four primary findings.
First, nucleus accumbens activity was increased during risky compared to safe reward cues, and exhibited a linear increase in activity during feedback from losses to safe and risky outcomes that extended through most of the feedback presentation, with the exception of the brief inversion between risky losses and wins, which was driven by a sharp negative deflection to losses at around 450 ms (discussed further below). These findings are consistent with a role of the nucleus accumbens in encoding the potential and received value of a reward, rather than prediction error. For example, reward prediction errors should be zero during safe rewards, because these rewards can be perfectly predicted. Additionally, the prediction error at the time of cue is equal for risky and safe trials, because their expected value is the same. Our results are in accord with suggestions that the nucleus accumbens is sensitive to the magnitude of rewards (eg Knutson et al, 2001; Small et al, 2001; Cromwell et al, 2005).
In addition to the potential value of the reward, the risky and safe cues were also associated with differences in variance (ie uncertainty or risk) of the associated outcome. Could this have explained the difference in activity seen in Figure 2a and b? Although we cannot rule out the possibility that variance contributed to this effect, several considerations suggest that variance alone did not drive the risky vs safe cue ERP difference. First, cue-related spiking activity in midbrain dopamine neurons (Fiorillo et al, 2003), and the BOLD response in the striatum (Preuschoff et al, 2006), increase for increasingly predictive cues, which is the exact opposite of the pattern of results we found. This suggests that our findings reflected reward magnitude, not reward predictability. Second, if the nucleus accumbens encoded the variance of associated outcomes rather than the reward magnitude, one would expect the response to safe cues to decrease over the course of the learning task, as patients learned that the safe cue was associated with a zero-variance outcome. However, we observed the opposite result (Figure 3). Thus, although it is possible that variance contributed to the results, it seems likely that, as in other studies (Schultz et al, 1992; Bowman et al, 1996; Shidara et al, 1998; Knutson et al, 2001; Yacubian et al, 2006), nucleus accumbens cue-related activity was sensitive to the magnitude of the potential future reward. One further possibility is that if patients had ‘hoped for’ the risky cue, the appearance of the safe cue would trigger a negative prediction error. However, this explanation is unlikely, for two reasons. First, during the choosing task, patients exhibited a preference for the safe over the risky option (the safe option was selected on 68% of trials), arguing against an expectation of, or hope for, the risky cue during the learning task. Second, if patients did experience a subjectively negative prediction error during the safe cue in the learning task, one might expect the ERP response to have an inverted polarity, as is seen following losses (ie a true negative prediction error). Instead, the ERP to the safe cue exhibited a similar-looking response at ∼50% of the amplitude of the risky-cue response, mirroring the 50% smaller potential reward value (0.06 vs 0.12 €).
One curious and unexpected finding was the increase in activity following losses that peaked around 450–500 ms postfeedback. This was observed in both the learning and choosing tasks, and can be seen within individual patients (see Supplementary Information). The nucleus accumbens is known to increase activity following punishments and aversive stimuli (Becerra et al, 2001; Young, 2004; Roitman et al, 2005), and this loss-related increase may be a reflection of this. It is also possible that this peak reflects the added salience or low probability of risky losses. Because of the novelty of this finding, and because it has no surface ERP correlate that we are aware of, further experimentation will be required to more confidently attach psychological significance to this effect.
The second finding was that the nucleus accumbens was significantly more active during the cue phase in the learning task than during the choosing task, suggesting a pronounced role of the nucleus accumbens in learning stimulus-reward mappings (Pothuizen et al, 2005; Cardinal, 2006). Why was the nucleus accumbens not active during the cue phase of the choosing task? This effect is unlikely to reflect initial learning of cue-reward associations: ERPs to safe cues did not change over the course of the learning task, suggesting that much of the learning took place in the first few trials. The primary difference between the learning and choosing tasks is that the learning task was Pavlovian (passive learning), whereas in the choosing task, outcomes depended on patients' behavior. Evidence from rat lesion and human neuroimaging studies suggests that the nucleus accumbens is involved in Pavlovian conditioning as well as initial learning during instrumental tasks (Atallah et al, 2007; Day and Carelli, 2007). In contrast, evidence suggests that the dorsal striatum is more involved in reward-guided free-choice selection (Lauwereyns et al, 2002; Samejima et al, 2005). Curiously, feedback-locked ERPs were similar across the two tasks. These findings suggest that during cue presentation, the nucleus accumbens helps to learn and maintain stimulus-reward associations, and is no longer needed during the choosing task, when actions must be taken based on learned associations. During feedback, the nucleus accumbens used reward information to update learned representations of these associations, possibly mediated by the dorsal striatum (Lauwereyns et al, 2002; Samejima et al, 2005). It is also possible that these differences were driven by the difference in the number of visual cues presented (one during the learning phase but two during the choosing phase), although it is not clear why this would produce no change in activity (instead of the average of the activity seen in the learning task), nor is it clear that patients processed only the visually presented cue (as opposed to thinking about the other cue as well) during the learning phase. Finally, it is possible that during the learning task, the cues themselves acquired rewarding properties, thus eliciting a ‘reward’ response, although in this case, it is unclear why the cues during the choosing task would fail to elicit a reward-like response, or to reflect the decision chosen on that trial.
The third main finding was the timing with which the nucleus accumbens processes reward-related information. Cue-related ERPs, and the difference between risky and safe conditions, peaked at 554 ms, whereas feedback-related ERPs peaked at 328 ms during the learning task and 315 ms during the choosing task (see Figures 2b, d and 4b, d). This difference could not be explained by visual input, because there was more visual information to decode in the feedback phase (see Figure 1). The time course of the feedback-locked findings provides a novel link to the medial frontal activity observed in previous scalp EEG studies (Gehring and Willoughby, 2002; Nieuwenhuis et al, 2004; Frank et al, 2005; Cohen and Ranganath, 2007), in which ERPs distinguish positive from negative feedback at around 200–300 ms.
The fourth finding was of functional connectivity between the nucleus accumbens and medial frontal cortex. Functional interactions between these areas are expected given the anatomical projections from the medial prefrontal cortex to the nucleus accumbens (Haber et al, 2000, 2006). It is also consistent with functional MRI studies suggesting enhanced connectivity between the medial frontal cortex and nucleus accumbens during risk-taking (Cohen et al, 2005). The strongest correlations were observed at a time lag of 0 ms, suggesting that the two time courses share temporally simultaneous variance. Most likely, these zero-lag coactivations were driven by a third region projecting to both the nucleus accumbens and the medial frontal cortex. One possibility is the mediodorsal nucleus of the thalamus, which projects to both structures (Berendse and Groenewegen, 1990) and is necessary for some reward-related behavioral adaptations (Block et al, 2007). Another possible route is through the midbrain dopamine centers, which receive projections from the nucleus accumbens (via the globus pallidus) and project back to medial frontal cortex (Powell and Leman, 1976; Mogenson et al, 1983; Haber and McFarland, 1999).
Aside from the zero-lag correlations, we also observed significant asymmetry in the cross-correlations. Specifically, at short delays (∼100 ms), medial frontal activity predicted future nucleus accumbens activity, whereas at longer delays of ∼300 ms, nucleus accumbens activity predicted future medial frontal activity. This seems consistent with the proposal that the nucleus accumbens acts as a dynamic gateway that integrates reinforcement signals to bias reward-seeking behavior. Specifically, multiple afferent signals from medial frontal structures about possible actions, and from the amygdala and orbitofrontal cortex about possible rewards, converge in the nucleus accumbens, likely providing synaptic inputs to the same accumbens neurons (Groenewegen et al, 1999). With the LTP-potentiating action of dopamine (Wolf et al, 2003), the nucleus accumbens forms stimulus-reward associations, and in turn biases or reinforces actions that might lead to larger rewards (Redgrave and Gurney, 2006). This possibility is consistent with the anatomical connectivity of this circuit: medial frontal cortex directly projects to, and can modulate neural activity and dopamine levels in, the nucleus accumbens (Jackson et al, 2001; Brady and O'Donnell, 2004). The nucleus accumbens in turn can project back to the medial frontal cortex via indirect connections such as the ventral tegmental area (Carr and Sesack, 2000; Haber et al, 2000).
The DBS electrodes measure local field potentials, which largely comprise the sum of dendritic activity. Although several reports have linked local field potential activity to the functional MRI BOLD response (Logothetis, 2002), some recent reports demonstrate that findings from single-unit recordings, EEG, and MEG (which measures magnetic fields closely related to electrical potentials) do not always conform to patterns of BOLD activity (Maier et al, 2008). Thus, although it is likely that our findings are applicable to interpreting functional MRI studies, it is possible that some aspects of intracranial EEG findings might not map perfectly onto patterns of activation from functional MRI studies. However, the observation that activity closely followed the value or magnitude of reward is consistent with some functional MRI results (Breiter and Rosen, 1999; Knutson et al, 2001; Ernst et al, 2005).
One must ask how generalizable the accumbens electrophysiological findings are, given that these patients might have dysfunctional reward systems. It is not possible to measure electrophysiological activity from the nucleus accumbens of healthy humans, so we cannot determine whether patients' depression influenced our results. However, it is not clear that electrophysiological functions of the nucleus accumbens investigated here are pathological in these patients. That is, although it is clear that DBS to this region alleviates symptoms of depression (Schlaepfer et al, 2008), the mechanism of this improvement remains unknown. For example, it is possible that DBS-driven overstimulation of nucleus accumbens target or afferent regions (McIntyre et al, 2004; McCracken and Grace, 2007) drives the efficacy of this procedure. Consistent with this ‘network modulation’ idea, DBS to other brain regions such as the subgenual cingulate is also effective at alleviating depression symptoms (Mayberg et al, 2005). Further, although some studies show differences in ventral striatal activation between depression and control subjects (Epstein et al, 2006), other studies have found no differences between depression and controls during simple reward tasks (Knutson et al, 2008). In the present dataset, we found no significant correlations between the presurgery Hamilton depression scores and the voltage change of ERP components (all p's>0.2), although it is possible that depression severity-ERP links are too subtle to be detected with a small sample size. Nonetheless, several aspects of our findings are consistent with those of animal studies, particularly that activity reflected the value of cued and received rewards (Schultz et al, 1992; Carelli and Deadwyler, 1997; Albertin et al, 2000; Cardinal and Howes, 2005; Cromwell et al, 2005). Patients' behavioral performance and self-reported motivation, and our anecdotal observation of them, suggest that they were engaged in the task. The best route to knowing whether depression contributed to our findings would be to conduct similar studies in rats or nonhuman primates, for example, in rat models of depression (Overstreet et al, 2005). At the time of recording, stimulation had not yet begun, so any possible longer-lasting effects of DBS could not have influenced our findings.
References
Abler B, Walter H, Erk S, Kammerer H, Spitzer M (2006). Prediction error as a linear function of reward probability is coded in human nucleus accumbens. Neuroimage 31: 790–795.
Albertin SV, Mulder AB, Tabuchi E, Zugaro MB, Wiener SI (2000). Lesions of the medial shell of the nucleus accumbens impair rats in finding larger rewards, but spare reward-seeking behavior. Behav Brain Res 117: 173–183.
Atallah HE, Lopez-Paniagua D, Rudy JW, O'Reilly RC (2007). Separate neural substrates for skill learning and performance in the ventral and dorsal striatum. Nat Neurosci 10: 126–131.
Becerra L, Breiter HC, Wise R, Gonzalez RG, Borsook D (2001). Reward circuitry activation by noxious thermal stimuli. Neuron 32: 927–946.
Berendse HW, Groenewegen HJ (1990). Organization of the thalamostriatal projections in the rat, with special emphasis on the ventral striatum. J Comp Neurol 299: 187–228.
Block AE, Dhanji H, Thompson-Tardif SF, Floresco SB (2007). Thalamic-prefrontal cortical-ventral striatal circuitry mediates dissociable components of strategy set shifting. Cereb Cortex 17: 1625–1636.
Bowman EM, Aigner TG, Richmond BJ (1996). Neural signals in the monkey ventral striatum related to motivation for juice and cocaine rewards. J Neurophysiol 75: 1061–1073.
Brady AM, O'Donnell P (2004). Dopaminergic modulation of prefrontal cortical input to nucleus accumbens neurons in vivo. J Neurosci 24: 1040–1049.
Breiter HC, Rosen BR (1999). Functional magnetic resonance imaging of brain reward circuitry in the human. Ann NY Acad Sci 877: 523–547.
Cardinal RN (2006). Neural systems implicated in delayed and probabilistic reinforcement. Neural Netw 19: 1277–1301.
Cardinal RN, Howes NJ (2005). Effects of lesions of the nucleus accumbens core on choice between small certain rewards and large uncertain rewards in rats. BMC Neurosci 6: 37.
Carelli RM, Deadwyler SA (1997). Cellular mechanisms underlying reinforcement-related processing in the nucleus accumbens: electrophysiological studies in behaving animals. Pharmacol Biochem Behav 57: 495–504.
Carr DB, Sesack SR (2000). Projections from the rat prefrontal cortex to the ventral tegmental area: target specificity in the synaptic associations with mesoaccumbens and mesocortical neurons. J Neurosci 20: 3864–3873.
Cohen MX, Heller AS, Ranganath C (2005). Functional connectivity with anterior cingulate and orbitofrontal cortices during decision-making. Brain Res Cogn Brain Res 23: 61–70.
Cohen MX, Ranganath C (2007). Reinforcement learning signals predict future decisions. J Neurosci 27: 371–378.
Cromwell HC, Hassani OK, Schultz W (2005). Relative reward processing in primate striatum. Exp Brain Res 162: 520–525.
Day JJ, Carelli RM (2007). The nucleus accumbens and Pavlovian reward learning. Neuroscientist 13: 148–159.
Delgado MR, Nystrom LE, Fissell C, Noll DC, Fiez JA (2000). Tracking the hemodynamic responses to reward and punishment in the striatum. J Neurophysiol 84: 3072–3077.
Epstein J, Pan H, Kocsis JH, Yang Y, Butler T, Chusid J et al (2006). Lack of ventral striatal response to positive stimuli in depressed vs normal subjects. Am J Psychiatry 163: 1784–1790.
Ernst M, Nelson EE, Jazbec S, McClure EB, Monk CS, Leibenluft E et al (2005). Amygdala and nucleus accumbens in responses to receipt and omission of gains in adults and adolescents. Neuroimage 25: 1279–1291.
Finch DM (1996). Neurophysiology of converging synaptic inputs from the rat prefrontal cortex, amygdala, midline thalamus, and hippocampal formation onto single neurons of the caudate/putamen and nucleus accumbens. Hippocampus 6: 495–512.
Fiorillo CD, Tobler PN, Schultz W (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299: 1898–1902.
Frank MJ, Woroch BS, Curran T (2005). Error-related negativity predicts reinforcement learning and conflict biases. Neuron 47: 495–501.
Gehring WJ, Willoughby AR (2002). The medial frontal cortex and the rapid processing of monetary gains and losses. Science 295: 2279–2282.
Groenewegen HJ, Wright CI, Beijer AV, Voorn P (1999). Convergence and segregation of ventral striatal inputs and outputs. Ann NY Acad Sci 877: 49–63.
Haber SN, Fudge JL, McFarland NR (2000). Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. J Neurosci 20: 2369–2382.
Haber SN, Kim KS, Mailly P, Calzavara R (2006). Reward-related cortical inputs define a large striatal region in primates that interface with associative cortical connections, providing a substrate for incentive-based learning. J Neurosci 26: 8368–8376.
Haber SN, McFarland NR (1999). The concept of the ventral striatum in nonhuman primates. Ann NY Acad Sci 877: 33–48.
Jackson ME, Frost AS, Moghaddam B (2001). Stimulation of prefrontal cortex at physiologically relevant frequencies inhibits dopamine release in the nucleus accumbens. J Neurochem 78: 920–923.
Kasanetz F, Riquelme LA, O'Donnell P, Murer MG (2006). Turning off cortical ensembles stops striatal up states and elicits phase perturbations in cortical and striatal slow oscillations in rat in vivo. J Physiol 577: 97–113.
Knutson B, Adams CM, Fong GW, Hommer D (2001). Anticipation of increasing monetary reward selectively recruits nucleus accumbens. J Neurosci 21: RC159.
Knutson B, Bhanji JP, Cooney RE, Atlas LY, Gotlib IH (2008). Neural responses to monetary incentives in major depression. Biol Psychiatry 63: 686–692.
Lauwereyns J, Watanabe K, Coe B, Hikosaka O (2002). A neural correlate of response bias in monkey caudate nucleus. Nature 418: 413–417.
Logothetis NK (2002). The neural basis of the blood-oxygen-level-dependent functional magnetic resonance imaging signal. Philos Trans R Soc Lond B Biol Sci 357: 1003–1037.
Maier A, Wilke M, Aura C, Zhu C, Ye FQ, Leopold DA (2008). Divergence of fMRI and neural signals in V1 during perceptual suppression in the awake monkey. Nat Neurosci 11: 1193–1200.
Matthews SC, Simmons AN, Lane SD, Paulus MP (2004). Selective activation of the nucleus accumbens during risk-taking decision making. Neuroreport 15: 2123–2127.
Mayberg HS, Lozano AM, Voon V, McNeely HE, Seminowicz D, Hamani C et al (2005). Deep brain stimulation for treatment-resistant depression. Neuron 45: 651–660.
McClure SM, Berns GS, Montague PR (2003). Temporal prediction errors in a passive learning task activate human striatum. Neuron 38: 339–346.
McCracken CB, Grace AA (2007). High-frequency deep brain stimulation of the nucleus accumbens region suppresses neuronal activity and selectively modulates afferent drive in rat orbitofrontal cortex in vivo. J Neurosci 27: 12601–12610.
McIntyre CC, Savasta M, Kerkerian-Le Goff L, Vitek JL (2004). Uncovering the mechanism(s) of action of deep brain stimulation: activation, inhibition, or both. Clin Neurophysiol 115: 1239–1248.
Mogenson GJ, Swanson LW, Wu M (1983). Neural projections from nucleus accumbens to globus pallidus, substantia innominata, and lateral preoptic-lateral hypothalamic area: an anatomical and electrophysiological investigation in the rat. J Neurosci 3: 189–202.
Morgane PJ, Galler JR, Mokler DJ (2005). A review of systems and networks of the limbic forebrain/limbic midbrain. Prog Neurobiol 75: 143–160.
Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H (2006). Midbrain dopamine neurons encode decisions for future action. Nat Neurosci 9: 1057–1063.
Nieuwenhuis S, Holroyd CB, Mol N, Coles MG (2004). Reinforcement-related brain potentials from medial frontal cortex: origins and functional significance. Neurosci Biobehav Rev 28: 441–448.
O'Doherty JP, Dayan P, Friston K, Critchley H, Dolan RJ (2003). Temporal difference models and reward-related learning in the human brain. Neuron 38: 329–337.
O'Doherty JP, Dayan P, Schultz J, Deichmann R, Friston K, Dolan RJ (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304: 452–454.
Overstreet DH, Friedman E, Mathe AA, Yadid G (2005). The Flinders Sensitive Line rat: a selectively bred putative animal model of depression. Neurosci Biobehav Rev 29: 739–759.
Pothuizen HH, Jongen-Relo AL, Feldon J, Yee BK (2005). Double dissociation of the effects of selective nucleus accumbens core and shell lesions on impulsive-choice behaviour and salience learning in rats. Eur J Neurosci 22: 2605–2616.
Powell EW, Leman RB (1976). Connections of the nucleus accumbens. Brain Res 105: 389–403.
Preuschoff K, Bossaerts P, Quartz SR (2006). Neural differentiation of expected reward and risk in human subcortical structures. Neuron 51: 381–390.
Redgrave P, Gurney K (2006). The short-latency dopamine signal: a role in discovering novel actions? Nat Rev Neurosci 7: 967–975.
Rodriguez PF, Aron AR, Poldrack RA (2006). Ventral-striatal/nucleus-accumbens sensitivity to prediction errors during classification learning. Hum Brain Mapp 27: 306–313.
Roitman MF, Wheeler RA, Carelli RM (2005). Nucleus accumbens neurons are innately tuned for rewarding and aversive taste stimuli, encode their predictors, and are linked to motor output. Neuron 45: 587–597.
Samejima K, Ueda Y, Doya K, Kimura M (2005). Representation of action-specific reward values in the striatum. Science 310: 1337–1340.
Schlaepfer T, Cohen MX, Frick C, Kosel M, Brodesser D, Axmacher N et al (2008). Deep brain stimulation to reward circuitry alleviates anhedonia in refractory major depression. Neuropsychopharmacology 33: 368–377.
Schultz W, Apicella P, Scarnati E, Ljungberg T (1992). Neuronal activity in monkey ventral striatum related to the expectation of reward. J Neurosci 12: 4595–4610.
Shidara M, Aigner TG, Richmond BJ (1998). Neuronal signals in the monkey ventral striatum related to progress through a predictable series of trials. J Neurosci 18: 2613–2625.
Small DM, Zatorre RJ, Dagher A, Evans AC, Jones-Gotman M (2001). Changes in brain activity related to eating chocolate: from pleasure to aversion. Brain 124: 1720–1733.
Sturm V, Lenartz D, Koulousakis A, Treuer H, Herholz K, Klein JC et al (2003). The nucleus accumbens: a target for deep brain stimulation in obsessive-compulsive- and anxiety-disorders. J Chem Neuroanat 26: 293–299.
Wolf ME, Mangiavacchi S, Sun X (2003). Mechanisms by which dopamine receptors may influence synaptic plasticity. Ann NY Acad Sci 1003: 241–249.
Yacubian J, Glascher J, Schroeder K, Sommer T, Braus DF, Buchel C (2006). Dissociable systems for gain- and loss-related value predictions and errors of prediction in the human brain. J Neurosci 26: 9530–9537.
Young AM (2004). Increased extracellular dopamine in nucleus accumbens in response to unconditioned and conditioned aversive stimuli: studies using 1 min microdialysis in rats. J Neurosci Methods 138: 57–63.
Acknowledgements
This study was partly funded by Medtronic Inc. and a predoctoral NIDA NRSA to MXC. We thank Caroline Frick, Markus Kosel, and Daniela Rottländer for assistance with the patients, and Michael Frank for useful and insightful discussions.
Disclosure/Conflict of Interest
The authors declare no conflict of interest.
Supplementary Information accompanies the paper on the Neuropsychopharmacology website (http://www.nature.com/npp)