REVIEWS

Neural correlations, population coding and computation

Bruno B. Averbeck*, Peter E. Latham‡ and Alexandre Pouget*
Abstract | How the brain encodes information in population activity, and how it combines and
manipulates that activity as it carries out computations, are questions that lie at the heart of
systems neuroscience. During the past decade, with the advent of multi-electrode recording
and improved theoretical models, these questions have begun to yield answers. However, a
complete understanding of neuronal variability, and, in particular, how it affects population
codes, is missing. This is because variability in the brain is typically correlated, and although
the exact effects of these correlations are not known, it is known that they can be large. Here,
we review studies that address the interaction between neuronal noise and population
codes, and discuss their implications for population coding in general.

As in any good democracy, individual neurons count for little; it is population activity that matters. For example, as with control of eye1,2 and arm3 movements, visual discrimination in the primary visual cortex (V1) is much more accurate than would be predicted from the responses of single neurons4. This is, of course, not surprising. As single neurons are not very informative, to obtain accurate information about sensory or motor variables some sort of population averaging must be performed. Exactly how this averaging is carried out in the brain, however, and especially how population codes are used in computations (such as reaching for an object on the basis of visual cues, an action that requires a transformation from population codes in visual areas to those in motor areas), is not fully understood.

Part of the difficulty in understanding population coding is that neurons are noisy: the same pattern of activity never occurs twice, even when the same stimulus is presented. Because of this noise, population coding is necessarily probabilistic. If one is given a single noisy population response, it is impossible to know exactly what stimulus occurred. Instead, the brain must compute some estimate of the stimulus (its best guess, for example), or perhaps a probability distribution over stimuli.

The inability of population activity to perfectly represent variables raises two questions. First, just how accurately can variables be represented? And second, how does the presence of noise affect computations? Not surprisingly, the answer to both depends strongly on the nature of the neuronal noise, and especially on whether or not the noise is correlated (see BOX 1 for definitions). If the noise is uncorrelated, meaning the fluctuations in the response of one neuron around its average are not correlated with the fluctuations of other neurons, population coding is relatively well understood. Specifically, we know which factors control the amount of information a population code contains4–7, how networks that receive population codes as input can be constructed so that they carry out computations optimally8, and even how information in a population code increases as a result of learning or attention9–12. Unfortunately, noise in the brain is correlated, and because of this we need to take a second look at the results that have been obtained under the assumption of independent noise. (As discussed in BOX 1, ‘correlated’ in this article means ‘noise correlated’.) For the computational work, this means extending the theories to take into account correlated noise, and for the empirical work this means assessing how attention and learning affect not only single neuron properties, such as tuning curves, but also how they affect correlations in the noise.

For these reasons, it is essential that we gain a thorough understanding of both the correlational structure in the brain and its impact on population coding. Progress has been made on both fronts by adopting two complementary perspectives13. One focuses on encoding, and asks whether adding correlations to a population of neurons without modifying single neuron responses (so that the correlated and uncorrelated populations would be indistinguishable on the basis of single neuron recordings) increases or decreases the amount of information in the population. The goal of this approach is to determine whether there are any general principles that relate correlations to increases or decreases in the amount of information, and to assess whether the information

*Department of Brain and Cognitive Sciences and Center for Visual Science, University of Rochester, Rochester, New York 14627, USA. ‡Gatsby Computational Neuroscience Unit, University College London, 17 Queen Square, London WC1N 3AR, UK. Correspondence to A.P. e-mail: alex@bcs.rochester.edu. doi:10.1038/nrn1888

358 | MAY 2006 | VOLUME 7 www.nature.com/reviews/neuro



computed from single neurons recorded separately can substitute for the true information in a population (a common technique used in the analysis of experimental data9,14–17). The other focuses on decoding, and more generally on computation. It is driven by the fact that if one wants to extract all the information from a population of correlated neurons (that is, decode optimally), the strategy for doing so can be quite complicated. From this perspective, we can ask how well decoding strategies that ignore correlations, and, therefore, are relatively simple, compare with optimal, but more complex, strategies.

Here, we summarize the empirical and theoretical work that has been carried out in relation to these two perspectives, and discuss what they tell us about the interplay between correlations, population coding and computation.

The encoding perspective: ∆Ishuffled

Perhaps the most straightforward question that can be asked about correlations is whether or not they affect the amount of information in a population code. Although the question is straightforward, the answer is not. For some correlational structures information goes up, for others it goes down, and for still others it stays the same. Although this is somewhat disappointing, because it means that the details of correlations matter, it is important to know for two reasons. First, it raises a cautionary note, as it implies that, in general, the amount of information in a population cannot be computed without knowing the correlational structure (see below). Second, because details matter, we are forced to pay attention to them — general statements such as ‘correlations always hurt so the brain should eliminate them’ or ‘correlations always help so the brain should use them’ can be ruled out.

To determine whether correlations affect the amount of information in a population code, it is necessary to compute the amount of information in the correlated responses, denoted I, and compare this with the amount of information that would be in the responses if they were uncorrelated, denoted Ishuffled (the name Ishuffled is derived from the fact that, in experiments, responses are decorrelated by shuffling trials). The difference, ∆Ishuffled (≡ I–Ishuffled), is a measure of the effect of correlations on the amount of information in a population code13. An important aspect of this approach is that information is quantifiable, and so can be computed from data.

We can develop most of the intuition necessary to understand how correlations affect the amount of information in a population code by considering a two neuron, two stimulus example. Although this is a small population code, it retains several of the features of larger ones. In particular, each stimulus produces a (typically different) set of mean responses, and around those means there is noise in the form of trial-to-trial fluctuations. Because of the noise, any response could be produced by either stimulus, so a response does not tell us definitively which stimulus occurred. Therefore, the noise reduces the information in the responses, with the degree of the reduction depending on both the correlations in the noise and their relationship to the average responses.

To understand the relationship between signal, noise and information in pairs of neurons, we can plot the correlated and uncorrelated response distributions and examine their features. In the left column of FIG. 1 we show a set of correlated responses, and in the right column we show the associated uncorrelated distributions. The response distributions in this figure are indicated schematically by ellipses, which represent 95% confidence intervals (FIG. 1).
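In simulation, this comparison can be carried out directly: generate correlated responses, destroy the correlations by shuffling trials within each stimulus condition, and measure the information in both cases. The sketch below (Python with NumPy; the means and covariance are invented for illustration, and information is measured with the linear Fisher approximation, d′², rather than the full Shannon or Fisher information) mirrors the case in which signal and noise correlations are both positive, so ∆Ishuffled comes out negative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two neurons, two stimuli. Positive signal correlation (both means increase
# from s1 to s2) and positive noise correlation (off-diagonal 0.6).
mu = {1: np.array([1.0, 1.0]), 2: np.array([2.0, 2.0])}
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])

n_trials = 20_000
trials = {s: rng.multivariate_normal(mu[s], cov, n_trials) for s in (1, 2)}

def linear_fisher(trials):
    """Linear Fisher information (d'^2) for two stimuli: dmu^T Sigma^-1 dmu."""
    dmu = trials[2].mean(axis=0) - trials[1].mean(axis=0)
    sigma = 0.5 * (np.cov(trials[1].T) + np.cov(trials[2].T))
    return float(dmu @ np.linalg.solve(sigma, dmu))

def shuffle(trials, rng):
    """Destroy noise correlations by independently permuting each neuron's
    trials within each stimulus condition (trial shuffling)."""
    return {s: np.column_stack([rng.permutation(r[:, j])
                                for j in range(r.shape[1])])
            for s, r in trials.items()}

I = linear_fisher(trials)                         # true information, about 1.25 here
I_shuffled = linear_fisher(shuffle(trials, rng))  # about 2.0 here
dI_shuffled = I - I_shuffled                      # negative in this geometry
```

Reversing one neuron's mean change (for example, mu[2] = np.array([2.0, 0.0])) makes the signal correlation negative while leaving the noise correlation positive, and flips the sign of ∆Ishuffled.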

Box 1 | Population codes, noise correlation and signal correlation

Population codes are often characterized by the ‘tuning curve plus noise’ model. In this model, the tuning curve represents the average response of a neuron to a set of stimuli, with the average taken across many presentations of each stimulus, and the noise refers to the trial-to-trial variability in the responses. In panel a, tuning curves are shown for two neurons that have slightly different preferred stimuli. In panel b, we show two hypothetical scatter plots of the single trial responses for this pair of neurons, in response to the repeated presentation of a single stimulus s1 (arrow in panel a). Ellipses represent 95% confidence intervals. The example on the left illustrates positive noise correlation and the example on the right illustrates negative noise correlation. Responses also show a second sort of correlation known as signal correlation61. These are correlations in the average response. Neurons with similar tuning curves (panel a) typically have positive signal correlations, because when s increases, the mean responses of both neurons tend to increase, or decrease, together. Conversely, neurons with dissimilar tuning curves typically have negative signal correlations. Unless stated otherwise, ‘correlated’ in this article means ‘noise correlated’.

In panels c and d we illustrate the response of a population of neurons. The x-axis corresponds to the preferred orientation of the neuron, the response of which is plotted on the y-axis. Each dot corresponds to the firing rate of one neuron in this example trial, and the purple curve shows the average response of each neuron in the population. Although the neurons in both panels c and d exhibit noise fluctuations, there is a difference in the structure of those fluctuations: on individual trials the responses of nearby neurons in panel c are uncorrelated (fluctuating up and down independently), whereas in panel d they are correlated (tending to fluctuate up and down together). Note that nearby neurons in panel d are positively correlated (as in panel b, left) whereas those that are far apart are negatively correlated (as in panel b, right).

[Panels not reproduced: a | Tuning curves of neurons 1 and 2. b | Examples of noise correlation at s1 (positive, left; negative, right). c | Uncorrelated population response. d | Correlated population response.]
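The ‘tuning curve plus noise’ model in this box is straightforward to simulate. The sketch below (Python with NumPy; the tuning-curve shape, peak rate, Poisson-like variance and correlation length are all invented for illustration, and, unlike panel d, all correlations here are positive with limited range) draws one trial of correlated and one trial of independent population activity around the same mean.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100
prefs = np.linspace(-90.0, 90.0, n)          # preferred orientations (degrees)

def tuning(s):
    """Gaussian tuning curve: 2 spikes baseline, 30 spikes peak, 20 deg width."""
    return 2.0 + 30.0 * np.exp(-0.5 * ((prefs - s) / 20.0) ** 2)

f = tuning(0.0)                              # mean population response to s = 0

# Limited-range noise correlations: neurons with similar preferred
# orientations fluctuate together, distant ones nearly independently.
corr = np.exp(-np.abs(prefs[:, None] - prefs[None, :]) / 20.0)
sd = np.sqrt(f)                              # Poisson-like variance (var = mean)
cov = corr * np.outer(sd, sd)

trial_correlated = rng.multivariate_normal(f, cov)     # as in panel d
trial_independent = f + sd * rng.standard_normal(n)    # as in panel c
```

On the correlated trial, neighbouring neurons deviate from the purple tuning-curve profile in the same direction; on the independent trial the deviations are unstructured.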


Figure 1 | Effects of correlations on information encoding. In all three cases, we show the response distributions for two neurons that respond to two different stimuli. The panels on the left show the unshuffled responses, those on the right show the shuffled responses. Each ellipse (which appears as a circle in the uncorrelated plots) indicates the 95% confidence interval for the responses. Each diagonal line shows the optimal decision boundary — that is, responses falling above the line are classified as stimulus 2 and responses below the line are classified as stimulus 1. The x-axis is the response of neuron 1, the y-axis the response of neuron 2. a | A larger fraction of the ellipses lie on the ‘wrong’ side of the decision boundary for the true, correlated responses than for the independent responses, so ∆Ishuffled <0. b | A smaller fraction of the ellipses lie on the wrong side of the decision boundary for the correlated responses, so ∆Ishuffled >0. c | The same fraction of the ellipses lies on the wrong side of the decision boundary for both the correlated and independent responses, so ∆Ishuffled = 0. Ishuffled, uncorrelated information; ∆Ishuffled, I–Ishuffled.

The larger the overlap of the ellipses, the more mistakes are made during decoding, and the less information is contained in the neural code. Therefore, these plots allow us to see, graphically, how correlations affect the information in the neuronal responses.

An important point about correlations is that the interaction between the signal correlations (which in this case correspond to the relative positions of the mean responses, see BOX 1 for definitions) and the noise correlations controls whether correlations increase or decrease information. To illustrate this, in FIG. 1a we have constructed responses such that the signal and noise correlations are both positive. This leads to larger overlap between the ellipses for the correlated than for the uncorrelated responses, which makes the correlated responses harder to decode. The correlated responses carry less information, so ∆Ishuffled <0. In FIG. 1b, on the other hand, the signal is negatively correlated whereas the noise is positively correlated. Here, there is less overlap in the correlated than the uncorrelated responses, which makes the correlated responses easier to decode. In this figure, then, the correlated responses carry more information, and ∆Ishuffled >0. Importantly, there is also an intermediate regime (FIG. 1c) in which I and Ishuffled are the same (∆Ishuffled = 0). So, the presence of correlations does not guarantee an effect on the amount of information encoded. The decrease in information when the signal and noise are both positively correlated (or both negatively correlated) and the increase when the signal and noise have opposite correlations is a general feature of information coding in pairs of neurons, and has been observed by a number of authors18–20.

These examples illustrate two important points. First, if we know only the individual responses of each neuron in a pair, and not their correlations, we do not know how much information they encode. Second, just because neuronal responses are correlated does not necessarily mean that they contain more (or less) information. This is important, as it has been suggested that correlations between neurons provide an extra channel of information14,21.

In all of the examples shown in FIG. 1, the correlations are the same for both stimuli, meaning the ellipses in each panel have the same size and orientation. However, it is possible for the correlations to depend on the stimulus, in which case the ellipses would have different sizes or orientations. Such correlations are often referred to as stimulus-modulated correlations22, and they affect information encoding in the same way as the examples discussed above: if the correlations increase the overlap, the information goes down, whereas if the correlations decrease the overlap then the information goes up. In extreme cases, it is even possible for neurons to have identical mean responses to a pair of stimuli, but different correlations (for example, one ellipse at +45° and the other at –45°). Although beyond the scope of this review, the effect of these stimulus-modulated correlations, which are just beginning to be investigated23, can be large.

What is actually observed in the brain? Do correlations increase or decrease the amount of available information? Various empirical studies have measured ∆Ishuffled in pairs of neurons and have found that it is small in the rat barrel cortex24, and macaque V1 (REFS 25,26), prefrontal27 and somatosensory cortices28. The results of these studies have also shown that ∆Ishuffled can be either positive or negative, which means that in real neurons — not just in theory — noise correlations can either increase or decrease the amount of information encoded in pairs of simultaneously recorded neurons. Overall, however, the observed effects have been quite small29.
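The three regimes of FIG. 1 can also be verified analytically. Under a Gaussian model with equal variances, the linear Fisher information of the correlated pair is ∆μᵀΣ⁻¹∆μ, while the shuffled pair uses only the diagonal of Σ, so the sign of ∆Ishuffled follows from the angle between the signal direction ∆μ and the noise correlation. A sketch (the specific numbers are invented):

```python
import numpy as np

def delta_I_shuffled(dmu, cov):
    """Linear Fisher information of correlated responses minus that of
    trial-shuffled (diagonal-covariance) responses, for two stimuli."""
    I = dmu @ np.linalg.solve(cov, dmu)
    I_shuf = dmu @ (dmu / np.diag(cov))
    return float(I - I_shuf)

cov = np.array([[1.0, 0.5],
                [0.5, 1.0]])                # positive noise correlation

# FIG. 1a regime: positive signal correlation -> correlations hurt
a = delta_I_shuffled(np.array([1.0, 1.0]), cov)              # < 0

# FIG. 1b regime: negative signal correlation -> correlations help
b = delta_I_shuffled(np.array([1.0, -1.0]), cov)             # > 0

# FIG. 1c regime: an intermediate signal direction where the two match
# exactly (15 degrees here, since cos(t)*sin(t) = rho/2 with rho = 0.5)
t = np.pi / 12
c = delta_I_shuffled(np.array([np.cos(t), np.sin(t)]), cov)  # = 0
```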


These results have direct bearing on (and are at odds with) the binding-by-synchrony hypothesis. This hypothesis, originally put forward by Milner30 and von der Malsburg31, and championed by Singer and colleagues32–34, states that noise correlations (more specifically, synchronous spikes) could solve the binding problem35 by signalling whether different features in a visual scene belong to the same object. Specifically, they suggested that the number of synchronous spikes across a pair of neurons depends on whether the pair represents the same or different objects. If this hypothesis were true, it would imply that ∆Ishuffled would be large and positive, at least for some pairs, because shuffling data removes synchronous spikes. To test this directly, Golledge et al.25 calculated ∆Ishuffled (Icor in their study) using an experimental paradigm similar to that used by Singer and colleagues. They found that shuffling the data eliminated little information about whether two features in a visual scene belonged to the same object, a finding that argues against the binding-by-synchrony hypothesis.

These empirical studies suggest that in vivo correlations have little impact on the amount of information in pairs of neurons. Whether this holds for large populations, however, is unknown. In fact, as pointed out by Zohary and colleagues36, small effects of correlations in pairs can have large effects in populations. But, as with the two neuron example given above, the effect can be either positive or negative. To illustrate this, consider a population of neurons with bell-shaped tuning curves in which neurons with similar tuning curves are more strongly correlated than neurons with dissimilar tuning curves. As, in this example, neurons with similar tuning curves show positive signal correlations, we expect, on the basis of our two neuron, two stimulus example above, that positive noise correlations will lead to a reduction in information and negative correlations to an increase. This is exactly what is found. Specifically, as the number of neurons increases, Ishuffled (FIG. 2a, correlation coefficient (c) = 0) becomes much larger than I when noise correlations are positive (FIG. 2a, c = 0.01 or c = 0.1) and much smaller when they are negative (FIG. 2a, c = –0.005). Interestingly, however, these effects are small for pairs of neurons, and only become pronounced at the population level. FIGURE 2b shows how Ishuffled compares with I as the number of neurons increases. For a model in which the maximum correlations are 0.1, the difference between Ishuffled and I is minimal (<1%) for a pair of neurons (n = 2). However, as the size of the population grows to only a few thousand neurons, correlations begin to have a large effect on the encoded information, reducing it by a factor of almost 25 relative to Ishuffled. Although it is not yet clear whether this model accurately reflects the effects of noise correlation in the brain, it provides us with an important lesson: small, perhaps undetectable, correlations in pairs of neurons can have a large effect at the population level. Therefore, it may be typical for Ishuffled and I to be very different. This, in turn, implies that studies14,37–44 in which Ishuffled is used as a surrogate for the true information, I, should be treated with caution.

Figure 2 | Information, I, and ∆Ishuffled versus population size. a | Information, I, versus population size, for different correlation coefficients, c. Positive correlations (c = 0.01 or c = 0.1) decrease information with respect to the uncorrelated (c = 0) case. Furthermore, for positive correlations, information saturates as the number of neurons increases. b | ∆Ishuffled/I versus population size. An important feature of this plot is that correlations have large effects at the population level even though ∆Ishuffled/I is small for individual neuronal pairs. Ishuffled, uncorrelated information; ∆Ishuffled, I–Ishuffled. Encoding model in panels a and b was taken from REF. 45 and the information measure was Fisher.

A corollary of these results is that noise correlations can cause the amount of information in a population of neurons to saturate as the number of neurons approaches infinity36,45–47 (FIG. 2a). One of the first studies to address this question empirically suggested that the pattern of noise correlations observed in the medial temporal visual area (MT) was such that information would saturate36. This was subsequently challenged by theoretical studies46,47 that pointed out that the correlations measured in MT do not necessarily imply that the information would saturate as the number of neurons increased.

Although the question of whether or not a particular correlational structure will cause the information to saturate is interesting from a theoretical perspective, it may not be so relevant to networks in the brain. This is because the nervous system can extract only a finite amount of information about sensory stimuli, and, in subsequent stages of processing, the amount of information cannot exceed the amount extracted by, for example, the retina or the cochlea. Therefore, as the number of neurons increases, the correlations must be such that information
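The saturation behaviour in FIG. 2a can be reproduced with a toy model. The sketch below computes the linear Fisher information f′ᵀΣ⁻¹f′ for a population with identical tuning derivatives, unit variances and uniform pairwise correlation c. This is a simplification of the limited-range model of REF. 45 used in the figure, but it shows the same qualitative behaviour: for c > 0, information approaches the ceiling 1/c instead of growing linearly with n, even though the effect for a single pair is tiny.

```python
import numpy as np

def fisher_info(n, c, g=1.0):
    """Linear Fisher information f'^T Sigma^-1 f' for n neurons with identical
    tuning derivative g, unit variance and uniform pairwise correlation c.
    Closed form: g**2 * n / (1 - c + c * n)."""
    sigma = (1.0 - c) * np.eye(n) + c * np.ones((n, n))
    fprime = g * np.ones(n)
    return float(fprime @ np.linalg.solve(sigma, fprime))

for c in (0.0, 0.01, 0.1):
    info = [fisher_info(n, c) for n in (2, 10, 100, 1000)]
    # c = 0 grows linearly with n; c > 0 saturates near 1/c.
    print(c, [round(x, 2) for x in info])
```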


Glossary

No-sharpening model. A model in which the orientation tuning curves of cortical cells are solely the result of the converging afferents from the LGN, without further sharpening in the cortex.

Sharpening model. A model in which the LGN afferents provide broad tuning curves to orientation that are sharpened in the cortex through lateral interactions.

Box 2 | Assuming independence when decoding

What do we mean by ‘ignoring correlations when decoding’? Consider the following situation: a machine generates a number, x, which we would like to know. Unfortunately, every time we query the machine, the sample it produces is corrupted by independent, zero mean noise. To reduce the noise, we collect 10,000 samples. As the samples are independent, the best estimate of x is a weighted sum, with each sample weighted by 1/10,000.

Imagine now that the machine gets lazy, and only the first two samples are independent; the other 9,998 are the same as the second. In this case, the optimal strategy is to weight the first sample by 1/2 and the other 9,999 by a set of weights that adds up to 1/2. If, however, we decide not to measure the correlations, and assume instead that the samples are independent, we would assign a weight of 1/10,000 to all samples. This is of course suboptimal, as the first sample should be weighted by 1/2, not 1/10,000. The difference in performance of the optimal strategy (weight of 1/2 on the first sample) versus the suboptimal strategy (weights of 1/10,000 for all samples) is what ∆Idiag measures.

But why should we settle for the suboptimal strategy? The answer is that the suboptimal strategy is simple: the weights are determined by the number of samples, which is easy to compute. For the optimal strategy, on the other hand, it is necessary to measure the correlations. In this particular example, the correlations are so extreme that we would immediately notice that the last 9,999 examples are perfectly correlated. In general, however, measuring correlations is hard, and requires large amounts of data. Therefore, when choosing a strategy, there is a trade-off between performance and how much time and data one is willing to spend measuring correlations.

Neurons face the same situation: they compute some function of the variables encoded in their inputs, and to perform this computation optimally they must know the correlations in the ~10,000 inputs that they receive71. If they ignore the correlations, they may — or may not — pay a price in the form of suboptimal computations.
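The machine example in BOX 2 can be simulated in a few lines (the value of x and the number of repeat runs below are arbitrary). The ‘independent’ estimator weights every sample by 1/10,000, while the optimal one puts weight 1/2 on each of the two genuinely independent samples; the error variance of the former comes out roughly twice that of the latter.

```python
import numpy as np

rng = np.random.default_rng(2)

x = 5.0            # the unknown number (arbitrary)
n = 10_000         # samples per query run
n_runs = 20_000    # repeat runs, to estimate error variances

# Only two independent draws per run; samples 2..10,000 all equal the second.
first = x + rng.standard_normal(n_runs)
second = x + rng.standard_normal(n_runs)

est_independent = (first + (n - 1) * second) / n  # weight 1/10,000 on every sample
est_optimal = 0.5 * first + 0.5 * second          # weight 1/2 on each distinct sample

var_independent = np.var(est_independent - x)     # close to 1, dominated by 'second'
var_optimal = np.var(est_optimal - x)             # close to 1/2
```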

saturates. As such, the question is not whether information saturates in the nervous system — it does — it’s how quickly it saturates as the number of neurons increases, and whether it saturates at a level well below the amount of information available in the input. These remain open experimental and theoretical questions45.

Pitfalls of using Ishuffled in place of I

As we have seen, the value of Ishuffled compared with I quantifies the impact of correlations on information in population codes. However, this is not the only use of Ishuffled. This measure is also commonly used as a surrogate for the true information14, primarily because estimating the true information in a large neuronal population would require simultaneous recordings of all of the neurons, whereas Ishuffled requires only single cell recordings, as well as fewer trials. Similarly, correlations are often ignored in theoretical and computational work, as they can be difficult to model37–43. Instead, information is often estimated under the assumption of independent noise, which is an estimate of Ishuffled rather than I.

Unfortunately, using Ishuffled instead of the true information can be very misleading because, as discussed in the previous section, Ishuffled is not guaranteed to provide a good estimate of I. Orientation selectivity provides a good example of the problem that can arise. Two types of model have been proposed to explain the emergence of orientation selectivity in V1. One is a no-sharpening model in which the tuning to orientation is due to the convergence of lateral geniculate nucleus (LGN) afferents onto cortical neurons (this is essentially the model that was proposed by Hubel and Wiesel48). The other is a sharpening model in which the LGN afferents produce only weak tuning, which is subsequently sharpened by lateral connections in the cortex. It is possible to build these models in such a way that they produce identical tuning curves. Can we conclude from this that they contain the same amount of information about orientation?

If we were to use Ishuffled as our estimate of information, we would answer ‘yes’. For instance, if we assume that the noise is independent and Poisson in both models, identical tuning curves imply identical information. How about the true information, I? To compute the true information, we need to know the correlations. Seriès et al.49 have simulated these two models in a regime in which the tuning curves and the variability were matched on average. They then estimated the true information, I, and found that, across many architectures, the no-sharpening models always contained more information than the sharpening models, despite identical tuning curves. The difference in information is the result of using different architectures, which lead to different neuronal dynamics and, therefore, different correlations. This point is lost if only Ishuffled is measured.

Similar problems often emerge in other models. For example, one approach to modelling the neural basis of attention is to simulate a network of analogue neurons and modify the strength of the lateral connections to see if this increases information44. If the information is computed under the assumption of independent Poisson noise, these simulations only reveal whether Ishuffled increases. Unfortunately, as we have shown above, without knowing the correlations, the true information might have either increased or decreased.

A common theme in these examples is that the noise correlations are not independent of the architecture of the network. If the architecture changes, so will the correlations. Assuming independence before and after the change is not a valid approximation, and can therefore lead to the wrong conclusions.

The decoding perspective: ∆Idiag

Above, we asked how correlations affect the total amount of information in a population code. Our ultimate interest, however, is in how the brain computes with population codes, so what we really want to know is how correlations affect computations. This, however, requires us to specify a computation, and to also specify how it is to be performed. To avoid such details, and also to derive a measure that is computation and


implementation independent, we ask instead about decoding, and, in particular, whether downstream neurons have to know about correlations to extract all the available information. We focus on this question because its answer places bounds on computations. Specifically, if ignoring correlations means a decoder loses, for example, half the information in a population code, then a computation that ignores correlations will be similarly impaired. This does not mean that decoding is a perfect proxy for computing; the effect of correlations on decoding will always depend, at least to some degree, on the computation being performed. However, there is one fact that we can be sure of: if all the information in a population can be extracted without any knowledge of the correlations, then, formally, any computation can perform optimally without knowledge of the correlations.

To investigate the role of correlations in decoding, then, we can measure the difference between the information in a population code, I, and the information, denoted Idiag, that would be extracted by a decoder optimized on the shuffled data but applied to the original correlated data (BOX 2). We refer to this difference as ∆Idiag (∆Idiag = I–Idiag), although it has been given different names depending on the details of how it is measured. The name ∆Idiag is often used when working with Fisher information13,50, whereas ∆I (REFS 51,52) and Icor-dep (REF. 22) have been used for Shannon information53 (both ∆I (REF. 52) and Icor-dep (REF. 22), which are identical, are upper bounds on the cost of using a decoder optimized on shuffled data; see REF. 52 for details). For this discussion the details of the information measure are not important.

Although the encoding perspective (discussed above) and the decoding perspective are related13, they are not as tightly coupled as might be expected. For example, ∆Ishuffled can be non-zero — even very far from zero — when correlations have no effect on decoding (∆Idiag = 0). The opposite is also possible: ∆Ishuffled can be zero when correlations both exist and have a large effect on decoding (∆Idiag >0)13,54. To understand this intuitively, let us investigate how correlations affect decoding for our two neuron, two stimuli example (FIG. 3). In general, a decoder is just a decision boundary, and in FIG. 3, in which we have only two stimuli and the correlational structure is fairly simple, the decision boundary is a line. Examining the panels in FIG. 3a, we see that the decision boundaries are the same whether they are estimated on shuffled (left column) or correlated (right column) responses. However, in the example shown in FIG. 3b,

[Panels not reproduced. Left column: estimate wdiag on shuffled responses; right column: apply to unshuffled responses (measures Idiag). a | wdiag = woptimal. b | wdiag ≠ woptimal.]

Figure 3 | Effects of correlations on information decoding. The panels on the left show the shuffled responses, those on the right show the unshuffled responses. Each ellipse (which appears as a circle in the uncorrelated plots) indicates the 95% confidence interval for the responses. Each diagonal line shows the optimal decision boundary — that is, responses falling above the line are classified as stimulus 2 and responses below the line are classified as stimulus 1. The x-axis is the response of neuron 1, the y-axis the response of neuron 2. The panels on the left show the decoding boundary (black line) constructed using the uncorrelated responses (green and yellow circles). The panels on the right show the decoding boundary (red line) constructed using the correlated responses (green and yellow ellipses). This is the optimal decoding boundary. The ‘independent’ decoding boundary is included on this panel for easy comparison. a | The two decoding boundaries (indicated by a dashed red and black line) are identical, so the fraction of trials decoded correctly is the same whether or not the decoding algorithm was constructed using the correlated responses, and ∆Idiag = 0. b | The two decoding boundaries are different, so fewer trials are decoded correctly using the decoding algorithm constructed from the shuffled responses, and ∆Idiag >0. I, information; Idiag, information that would be extracted by a decoder optimized on the shuffled data but applied to the original correlated data; ∆Idiag, I–Idiag; wdiag, decoding boundary estimated on shuffled data.

[…] example in FIG. 3b, ∆Idiag is greater than zero, even though the effect of correlations on encoding is rather small
using a decision boundary based on shuffled responses (∆Ishuffled is close to zero, and could be made exactly zero
(black line) can lead to a strongly suboptimal decoding by adjusting the angle of the ellipses).
algorithm, as it would produce wrong answers much So how much information is lost when neural
more often than the optimal decision boundary (red line responses measured in the brain are decoded using algo-
in FIG. 3b). (Although ∆Idiag = 0 in FIG. 3a, for technical, rithms that ignore correlations? To our knowledge, the
Fisher information
Measures the variance of an but potentially important, reasons, correlations can be first researchers to address this question were Dan et al.55,
optimal estimator. crucial for decoding in this case; in fact, ∆I ≠ 0. A discus- who asked whether or not synchronous spikes in the
sion of this issue is beyond the scope of this review, but LGN carried additional information. They found that
Shannon information see REFS 52,54 for details.) pairs of synchronous spikes did carry extra information:
Measures how much one’s
uncertainty about the stimuli
Importantly, although ∆Idiag is zero in FIG. 3a, the corre- for their most correlated pairs, 20–40% more informa-
decreases after receiving lations clearly affect the amount of information encoded tion was available from a decoder that took synchronous
responses. (∆Ishuffled <0, as can be seen in FIG. 1a). Conversely, in the spikes into account than a decoder that did not.
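The geometry of FIG. 3 is easy to check numerically. The sketch below (with made-up mean responses and a made-up correlation coefficient, not the values used for the figure) draws correlated Gaussian responses to two stimuli and compares the optimal linear decoder, which knows the full covariance matrix, with the diagonal decoder wdiag, which is fitted as if the responses had been shuffled:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-neuron example in the spirit of FIG. 3b: neuron 1 is tuned
# to the stimulus, neuron 2 is untuned but shares correlated trial-to-trial noise.
mu1, mu2 = np.array([1.0, 2.0]), np.array([3.0, 2.0])   # mean responses to stimuli 1 and 2
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])                            # noise covariance (correlation 0.6)

n = 50_000
resp1 = rng.multivariate_normal(mu1, cov, n)            # trials for stimulus 1
resp2 = rng.multivariate_normal(mu2, cov, n)            # trials for stimulus 2

d_mu = mu2 - mu1
# Optimal linear boundary has normal vector w = Sigma^-1 (mu2 - mu1)...
w_opt = np.linalg.solve(cov, d_mu)
# ...whereas the decoder fitted to shuffled data sees only the variances,
# so it ignores the off-diagonal covariance terms (wdiag).
w_diag = d_mu / np.diag(cov)

def accuracy(w):
    """Fraction of trials falling on the correct side of w . (r - midpoint) = 0."""
    midpoint = (mu1 + mu2) / 2.0
    correct1 = np.mean((resp1 - midpoint) @ w < 0)      # stimulus 1 projects negative
    correct2 = np.mean((resp2 - midpoint) @ w > 0)      # stimulus 2 projects positive
    return (correct1 + correct2) / 2.0

acc_opt, acc_diag = accuracy(w_opt), accuracy(w_diag)
print(f"covariance-aware decoder: {acc_opt:.3f}")
print(f"diagonal decoder:         {acc_diag:.3f}")      # lower, so Delta-I_diag > 0 here
```

With these parameters the covariance-aware boundary classifies roughly 89% of trials correctly and the diagonal boundary roughly 84%, so ∆Idiag > 0, the situation of FIG. 3b. Giving the two neurons identical tuning instead makes the two boundaries coincide, the situation of FIG. 3a, in which ∆Idiag = 0 despite the correlations.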

NATURE REVIEWS | NEUROSCIENCE VOLUME 7 | MAY 2006 | 363



Almost all subsequent studies found that the maximum value of ∆Idiag across many pairs of neurons was small, of the order of 10% of the total information. This has been shown in the mouse retina51, rat barrel cortex24, and the supplementary motor area13,56, V1 (REF. 25) and motor cortex57 of the macaque. So, almost all of the empirical data suggest that little additional information is available in the noise correlations between neurons.

Do the small values of ∆Idiag that have been observed experimentally extrapolate to populations? Amari and colleagues58 were the first to study this question theoretically. They looked at several correlational structures and tuning curves, and in most cases found that ∆Idiag was small compared with the total information in the population. These results, however, should not be taken to imply that ∆Idiag is always small for populations. In FIG. 4 we plot ∆Idiag as a function of population size. As was the case with ∆Ishuffled, the effect of correlations on decoding increases for larger populations.

Figure 4 | ∆Idiag/I versus population size. As was the case in FIG. 2b, correlations can have a small effect when decoding pairs of neurons, but a large effect when decoding populations. c, correlation coefficient (the plotted curve uses c = 0.1); I, information; Idiag, information that would be extracted by a decoder optimized on the shuffled data but applied to the original correlated data; ∆Idiag, I – Idiag. Encoding model taken from REF. 45.

This analysis tells us that the effect of correlations on decoding strategies can be anything from no effect at all to a large effect. In some sense the observation that correlations can be present and large and ∆Idiag can be small or even zero is the most surprising. This has the important ramification that, when studying population codes, one has to go beyond simply showing that noise correlations exist: their effect on decoding spike trains must be directly measured.

Conclusions
During the past decade it has become increasingly clear that if we want to understand population coding we need to understand neuronal noise. This is not just because noise makes population coding probabilistic, but also because correlated noise has such a broad range of effects. First, correlations can either increase or decrease the amount of information encoded by a population of neurons. Importantly, decreases can have especially severe effects in large populations, as many correlational structures cause information to saturate as the number of neurons becomes large36,45,46. Such correlations, if they occur in the brain, would place fundamental constraints on the precision with which variables can be represented36,45–47. Second, correlations might or might not affect computational strategies of networks of neurons. A decoder that can extract all the information from a population of independent neurons may extract little when the neurons are correlated, or it may extract the vast majority54,58. Incidentally, other measures of information coding in populations, including synergy, have been put forward, but they do not directly address the questions we are considering here (BOX 3).

Box 3 | Other measures of the impact of correlations
An information-theoretic measure that has been applied to pairs of neurons is redundancy20,22,27,61–63. This quantity is the sum of the information from the individual cells minus the total information,

∆Iredundancy = Σi Ii – I   (1)

where Ii is the Shannon information from neuron i and I is the total information24,27,61,63–65. The negative of ∆Iredundancy is known as ∆Isynergy, and neural codes with positive ∆Isynergy (negative ∆Iredundancy) are referred to as synergistic.
The quantity ∆Iredundancy is often taken to be a measure of the extent to which neurons transmit independent messages, an interpretation based on the observation that if neurons do transmit independent messages, then ∆Iredundancy is zero. However, this interpretation is somewhat problematic, because the converse is not true: ∆Iredundancy can be zero even when neurons do not transmit independent messages51,52. Therefore, despite the fact that ∆Iredundancy has been extensively used24,27,61–65, its significance for population coding is not clear (for a more detailed discussion, see REFS 51,52).
More recently, ∆Iredundancy has also been interpreted63,65 as a measure of how well neurons adhere to the famous redundancy reduction hypothesis of Attneave, Barlow and others66–69. However, this interpretation is due to a rather unfortunate duplication of names; in fact ∆Iredundancy is not the same as the redundancy referred to in this hypothesis. For Barlow, as for Shannon53, redundancy is defined to be 1 – H/Hmax, where Hmax is the maximum entropy of a discrete distribution (subject to constraints) and H is the observed entropy. This definition has also been extended to continuous distributions70, for which redundancy is 1 – I/Imax, where Imax is the channel capacity. The redundancy given in equation 1, however, corresponds to neither of these definitions, nor does its normalized version, ∆Iredundancy/I. Measuring ∆Iredundancy therefore sheds little, if any, light on the redundancy reduction hypothesis.
Studies that have estimated synergy or redundancy have in general, but not always63, found that pairs of neurons can be either redundant or synergistic24,27,61,63–65, whereas larger populations are almost always redundant. The latter result is not surprising: populations typically use many neurons to code for a small number of variables, so the marginal contribution of any one neuron to the information becomes small as the population size increases63.

These two aspects of correlations — how they affect the amount of information encoded by a population and how they affect decoding of the information from that population — can be quantified by the measures ∆Ishuffled and ∆Idiag, respectively. The first of these, ∆Ishuffled, is the difference between the information in a population code when correlations are present and the information in a population code when the correlations are removed. This measure can be either greater than or less than zero46,47. The second, ∆Idiag, which is more subtle, measures the difference between the amount of information that could be extracted from




a population by a decoder with full knowledge of the correlations, and the amount that could be extracted by a decoder with no knowledge of the correlations. Because correlations are not removed from the responses when computing ∆Idiag, this quantity is very different from ∆Ishuffled. In particular, unlike ∆Ishuffled, it can never be negative, because no decoder can extract more information than one with full knowledge of the correlations. So, if ∆Idiag is zero, correlations are not important for decoding, and if ∆Idiag is positive, they are. In the latter case, the ratio ∆Idiag/I quantifies just how important correlations are52.

A somewhat counterintuitive result that has emerged from quantitative studies of ∆Ishuffled and ∆Idiag is that the two are not necessarily related: ∆Ishuffled can be either positive or negative when ∆Idiag is zero, and ∆Ishuffled can be zero when ∆Idiag is positive52,54. So, these two quantities answer different questions, and the use of both of them together can provide deeper insight into population codes.

Both ∆Ishuffled and ∆Idiag have usually been found to be <10% for pairs of neurons. Therefore, it would seem that correlations are not important, and, in particular, that correlations caused by synchronous spikes — the type of correlations implicated in the binding by synchrony hypothesis — do not carry much extra information. However, whether correlations are important for populations, which is the relevant question for the brain, remains an open question, because even small correlations can have a significant effect in large populations23,36 (FIGS 2b,4).

In summary, theoretical studies have greatly increased our understanding of the effects of correlations on the amount of information encoded in a population36,45–47, and we are even beginning to understand how to build networks that could extract a large fraction of the information from a population in cases in which correlations are important23. The somewhat optimistic nature of both of these statements should, however, be tempered by two observations. First, essentially all these studies assumed Gaussian noise, and it is not clear how well they generalize to the non-Gaussian noise found in the brain. Second, experimentally quantifying the role of correlations in large populations has proved extremely difficult: we have good measurements only for pairs of neurons, and, as discussed above, results for pairs do not give much indication of what is going on at the population level. It is therefore crucial that we develop methods that can be used to study large populations experimentally. Because of data limitations it is not possible to directly compute information59,60, so we are left with two options. One is to assess the role of correlations by decoding spike trains using algorithms that do and do not take correlations into account, and comparing their performance. The other is to develop better models of how noise is correlated in populations, and to carry out theoretical computations based on those noise models. By applying both methods, we should ultimately understand the role of correlated noise in the brain, and, in particular, how the brain carries out computations efficiently in the presence of this noise.
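The population-size trend of FIG. 4 can also be sketched directly from the standard Gaussian (linear) Fisher-information expressions: the optimal decoder achieves I = f'ᵀΣ⁻¹f', whereas a correlation-blind linear decoder with weights D⁻¹f' achieves (f'ᵀf')²/(f'ᵀΣf') when the variances are equal (compare REF. 58). The von Mises tuning curves, the limited-range correlation structure and the parameters c = 0.1 and L = 1 below are illustrative assumptions, a simplified stand-in for the encoding model of REF. 45 rather than the model itself:

```python
import numpy as np

def delta_I_diag_ratio(n_neurons, c=0.1, L=1.0, width=1.0):
    """Fractional information lost by a correlation-blind decoder, Delta-I_diag / I."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_neurons, endpoint=False)
    # Derivatives of von Mises tuning curves, evaluated at the stimulus s = 0.
    f_prime = np.sin(theta) * np.exp((np.cos(theta) - 1.0) / width**2) / width**2
    # Limited-range noise correlations: strongest between similarly tuned neurons.
    d = np.abs(theta[:, None] - theta[None, :])
    d = np.minimum(d, 2.0 * np.pi - d)            # circular distance between preferred stimuli
    cov = c * np.exp(-d / L)                      # unit variances assumed
    np.fill_diagonal(cov, 1.0)
    # Linear Fisher information of the optimal decoder: f'^T Sigma^-1 f' ...
    info = f_prime @ np.linalg.solve(cov, f_prime)
    # ... and of the correlation-blind decoder with weights D^-1 f' = f'.
    info_diag = (f_prime @ f_prime) ** 2 / (f_prime @ cov @ f_prime)
    return (info - info_diag) / info

ratios = {n: delta_I_diag_ratio(n) for n in (8, 32, 128, 512)}
for n, r in ratios.items():
    print(f"N = {n:4d}   Delta-I_diag / I = {r:.3f}")
```

For small groups the loss is a fraction of a per cent, but with these parameters it climbs to roughly 10% by N = 512, echoing the caution raised above: small pairwise values of ∆Idiag say little about its value at the population level.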

References
1. Lee, C., Rohrer, W. H. & Sparks, D. L. Population coding of saccadic eye movements by neurons in the superior colliculus. Nature 332, 357–360 (1988).
2. Sparks, D. L., Holland, R. & Guthrie, B. L. Size and distribution of movement fields in the monkey superior colliculus. Brain Res. 113, 21–34 (1976).
3. Georgopoulos, A. P., Schwartz, A. B. & Kettner, R. E. Neuronal population coding of movement direction. Science 233, 1416–1419 (1986).
4. Paradiso, M. A. A theory for the use of visual orientation information which exploits the columnar structure of striate cortex. Biol. Cybern. 58, 35–49 (1988).
5. Pouget, A., Dayan, P. & Zemel, R. Information processing with population codes. Nature Rev. Neurosci. 1, 125–132 (2000).
6. Seung, H. S. & Sompolinsky, H. Simple models for reading neuronal population codes. Proc. Natl Acad. Sci. USA 90, 10749–10753 (1993).
7. Salinas, E. & Abbott, L. F. Vector reconstruction from firing rates. J. Comput. Neurosci. 1, 89–107 (1994).
8. Deneve, S., Latham, P. E. & Pouget, A. Reading population codes: a neural implementation of ideal observers. Nature Neurosci. 2, 740–745 (1999).
9. McAdams, C. J. & Maunsell, J. H. Effects of attention on the reliability of individual neurons in monkey visual cortex. Neuron 23, 765–773 (1999).
10. Schoups, A., Vogels, R., Qian, N. & Orban, G. Practising orientation identification improves orientation coding in V1 neurons. Nature 412, 549–553 (2001).
11. Yang, T. & Maunsell, J. H. The effect of perceptual learning on neuronal responses in monkey visual area V4. J. Neurosci. 24, 1617–1626 (2004).
12. Ghose, G. M., Yang, T. & Maunsell, J. H. Physiological correlates of perceptual learning in monkey V1 and V2. J. Neurophysiol. 87, 1867–1888 (2002).
13. Averbeck, B. B. & Lee, D. Effects of noise correlations on information encoding and decoding. J. Neurophysiol. (in the press).
    Combines a theoretical and empirical examination of the way in which studies of information encoding and decoding are related, as well as investigating the role of stimulus-modulated correlations.
14. Hung, C. P., Kreiman, G., Poggio, T. & DiCarlo, J. J. Fast readout of object identity from macaque inferior temporal cortex. Science 310, 863–866 (2005).
15. Rolls, E. T., Treves, A. & Tovee, M. J. The representational capacity of the distributed encoding of information provided by populations of neurons in primate temporal visual cortex. Exp. Brain Res. 114, 149–162 (1997).
16. Gochin, P. M., Colombo, M., Dorfman, G. A., Gerstein, G. L. & Gross, C. G. Neural ensemble coding in inferior temporal cortex. J. Neurophysiol. 71, 2325–2337 (1994).
17. Georgopoulos, A. P. & Massey, J. T. Cognitive spatial-motor processes. 2. Information transmitted by the direction of two-dimensional arm movements and by neuronal populations in primate motor cortex and area 5. Exp. Brain Res. 69, 315–326 (1988).
18. Oram, M. W., Foldiak, P., Perrett, D. I. & Sengpiel, F. The 'Ideal Homunculus': decoding neural population signals. Trends Neurosci. 21, 259–265 (1998).
19. Johnson, K. O. Sensory discrimination: decision process. J. Neurophysiol. 43, 1771–1792 (1980).
20. Panzeri, S., Schultz, S. R., Treves, A. & Rolls, E. T. Correlations and the encoding of information in the nervous system. Proc. R. Soc. Lond. B 266, 1001–1012 (1999).
    Although somewhat technical, this was one of the first studies to clearly define a set of measures that can be used to assess the role of correlations in information coding. The basic approach presented in this manuscript was further elaborated in reference 22.
21. Engel, A. K., Konig, P. & Singer, W. Direct physiological evidence for scene segmentation by temporal coding. Proc. Natl Acad. Sci. USA 88, 9136–9140 (1991).
22. Pola, G., Thiele, A., Hoffmann, K. P. & Panzeri, S. An exact method to quantify the information transmitted by different mechanisms of correlational coding. Network 14, 35–60 (2003).
23. Shamir, M. & Sompolinsky, H. Nonlinear population codes. Neural Comput. 16, 1105–1136 (2004).
24. Petersen, R. S., Panzeri, S. & Diamond, M. E. Population coding of stimulus location in rat somatosensory cortex. Neuron 32, 503–514 (2001).
25. Golledge, H. D. et al. Correlations, feature-binding and population coding in primary visual cortex. Neuroreport 14, 1045–1050 (2003).
26. Panzeri, S., Golledge, H. D., Zheng, F., Tovée, M. J. & Young, M. P. Objective assessment of the functional role of spike train correlations using information measures. Vis. Cogn. 8, 531–547 (2001).
27. Averbeck, B. B., Crowe, D. A., Chafee, M. V. & Georgopoulos, A. P. Neural activity in prefrontal cortex during copying geometrical shapes. II. Decoding shape segments from neural ensembles. Exp. Brain Res. 150, 142–153 (2003).
28. Romo, R., Hernandez, A., Zainos, A. & Salinas, E. Correlated neuronal discharges that increase coding efficiency during perceptual discrimination. Neuron 38, 649–657 (2003).
29. Averbeck, B. B. & Lee, D. Coding and transmission of information by neural ensembles. Trends Neurosci. 27, 225–230 (2004).
30. Milner, P. M. A model for visual shape recognition. Psychol. Rev. 81, 521–535 (1974).
31. von der Malsburg, C. The correlation theory of brain function. Internal Report, Dept Neurobiology, MPI for Biophysical Chemistry (1981).
32. Singer, W. & Gray, C. M. Visual feature integration and the temporal correlation hypothesis. Annu. Rev. Neurosci. 18, 555–586 (1995).
33. Schnitzer, M. J. & Meister, M. Multineuronal firing patterns in the signal from eye to brain. Neuron 37, 499–511 (2003).
34. Vaadia, E. et al. Dynamics of neuronal interactions in monkey cortex in relation to behavioural events. Nature 373, 515–518 (1995).
35. Roskies, A. L. The binding problem. Neuron 24, 7–9 (1999).
36. Zohary, E., Shadlen, M. N. & Newsome, W. T. Correlated neuronal discharge rate and its implications for psychophysical performance. Nature 370, 140–143 (1994).
    The first study to show, in the context of neural coding, that small correlations between neurons can have a large effect on the ability of a population of neurons to encode information. The main conclusion of this manuscript was that the correlations in MT cause information to saturate as the population reaches ~100 neurons. Whether or not this is correct remains an open experimental question (see also references 45 and 46).
37. Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
38. Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
39. Hyvarinen, A. & Hoyer, P. O. A two-layer sparse coding model learns simple and complex cell receptive fields and topography from natural images. Vision Res. 41, 2413–2423 (2001).
40. Hyvarinen, A., Karhunen, J. & Oja, E. Independent Component Analysis (John Wiley and Sons, New York, 2001).
41. Bell, A. J. & Sejnowski, T. J. An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7, 1129–1159 (1995).
42. Nadal, J. P. & Parga, N. Nonlinear neurons in the low-noise limit: a factorial code maximizes information transfer. Network 5, 565–581 (1994).
43. Gold, J. I. & Shadlen, M. N. Neural computations that underlie decisions about sensory stimuli. Trends Cogn. Sci. 5, 10–16 (2001).
44. Lee, D. K., Itti, L., Koch, C. & Braun, J. Attention activates winner-take-all competition among visual filters. Nature Neurosci. 2, 375–381 (1999).
45. Sompolinsky, H., Yoon, H., Kang, K. & Shamir, M. Population coding in neuronal systems with correlated noise. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 64, 051904 (2001).
46. Abbott, L. F. & Dayan, P. The effect of correlated variability on the accuracy of a population code. Neural Comput. 11, 91–101 (1999).
    One of the most influential theoretical studies of the effect of noise correlations on information encoding.
47. Wilke, S. D. & Eurich, C. W. Representational accuracy of stochastic neural populations. Neural Comput. 14, 155–189 (2002).
48. Hubel, D. H. & Wiesel, T. N. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.) 160, 106–154 (1962).
49. Seriès, P., Latham, P. E. & Pouget, A. Tuning curve sharpening for orientation selectivity: coding efficiency and the impact of correlations. Nature Neurosci. 7, 1129–1135 (2004).
    One of the first papers to show that manipulations that increase the information in single cells (through, for example, sharpening tuning curves) can, because the manipulation modifies correlations, reduce the information in the population.
50. Casella, G. & Berger, R. L. Statistical Inference (Duxbury Press, Belmont, California, 1990).
51. Nirenberg, S., Carcieri, S. M., Jacobs, A. L. & Latham, P. E. Retinal ganglion cells act largely as independent encoders. Nature 411, 698–701 (2001).
52. Latham, P. E. & Nirenberg, S. Synergy, redundancy, and independence in population codes, revisited. J. Neurosci. 25, 5195–5206 (2005).
53. Shannon, C. E. & Weaver, W. The Mathematical Theory of Communication (Univ. Illinois Press, Urbana, Illinois, 1949).
54. Nirenberg, S. & Latham, P. E. Decoding neuronal spike trains: how important are correlations? Proc. Natl Acad. Sci. USA 100, 7348–7353 (2003).
55. Dan, Y., Alonso, J. M., Usrey, W. M. & Reid, R. C. Coding of visual information by precisely correlated spikes in the lateral geniculate nucleus. Nature Neurosci. 1, 501–507 (1998).
56. Averbeck, B. B. & Lee, D. Neural noise and movement-related codes in the macaque supplementary motor area. J. Neurosci. 23, 7630–7641 (2003).
57. Oram, M. W., Hatsopoulos, N. G., Richmond, B. J. & Donoghue, J. P. Excess synchrony in motor cortical neurons provides redundant direction information with that from coarse temporal measures. J. Neurophysiol. 86, 1700–1716 (2001).
58. Wu, S., Nakahara, H. & Amari, S. Population coding with correlation and an unfaithful model. Neural Comput. 13, 775–797 (2001).
    The first study to theoretically investigate the effects of ignoring correlations when decoding a large population of neurons. As such it is the decoding complement to reference 46.
59. Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R. & Bialek, W. Entropy and information in neural spike trains. Phys. Rev. Lett. 80, 197–200 (1998).
60. Treves, A. & Panzeri, S. The upward bias in measures of information derived from limited data samples. Neural Comput. 7, 399–407 (1995).
61. Gawne, T. J. & Richmond, B. J. How independent are the messages carried by adjacent inferior temporal cortical neurons? J. Neurosci. 13, 2758–2771 (1993).
62. Schneidman, E., Bialek, W. & Berry, M. J. Synergy, redundancy, and independence in population codes. J. Neurosci. 23, 11539–11553 (2003).
63. Narayanan, N. S., Kimchi, E. Y. & Laubach, M. Redundancy and synergy of neuronal ensembles in motor cortex. J. Neurosci. 25, 4207–4216 (2005).
64. Gawne, T. J., Kjaer, T. W., Hertz, J. A. & Richmond, B. J. Adjacent visual cortical complex cells share about 20% of their stimulus-related information. Cereb. Cortex 6, 482–489 (1996).
65. Puchalla, J. L., Schneidman, E., Harris, R. A. & Berry, M. J. Redundancy in the population code of the retina. Neuron 46, 493–504 (2005).
66. Attneave, F. Informational aspects of visual perception. Psychol. Rev. 61, 183–193 (1954).
67. Barlow, H. B. in Current Problems in Animal Behaviour (eds Thorpe, W. H. & Zangwill, O. L.) 331–360 (Cambridge Univ. Press, Cambridge, 1961).
68. Srinivasan, M. V., Laughlin, S. B. & Dubs, A. Predictive coding: a fresh view of inhibition in the retina. Proc. R. Soc. Lond. B 216, 427–459 (1982).
69. Barlow, H. Redundancy reduction revisited. Network 12, 241–253 (2001).
70. Atick, J. J. & Redlich, A. N. Towards a theory of early visual processing. Neural Comput. 2, 308–320 (1990).
71. Braitenberg, V. & Schüz, A. Anatomy of the Cortex (Springer, Berlin, 1991).

Acknowledgements
P.E.L. was supported by the Gatsby Charitable Foundation, London, UK, and a grant from the National Institute of Mental Health, National Institutes of Health, USA. A.P. was supported by grants from the National Science Foundation. B.B.A. was supported by a grant from the National Institutes of Health.

Competing interests statement
The authors declare no competing financial interests.

