Neuron 51, 359–368, August 3, 2006 ©2006 Elsevier Inc. DOI 10.1016/j.neuron.2006.06.030
Reduction of Information Redundancy
in the Ascending Auditory Pathway
Gal Chechik,1,5,* Michael J. Anderson,4
Omer Bar-Yosef,2 Eric D. Young,4 Naftali Tishby,1,3
and Israel Nelken1,2
1Interdisciplinary Center for Neural Computation
2Department of Neurobiology
3School of Computer Science and Engineering
Hebrew University of Jerusalem
Jerusalem 91904, Israel
4Department of Biomedical Engineering
Johns Hopkins University
Baltimore, Maryland 21205
Summary
Information processing by a sensory system is reflected in the changes in stimulus representation along
its successive processing stages. We measured information content and stimulus-induced redundancy in
the neural responses to a set of natural sounds in three
successive stations of the auditory pathway—inferior
colliculus (IC), auditory thalamus (MGB), and primary
auditory cortex (A1). Information about stimulus identity was somewhat reduced in single A1 and MGB neurons relative to single IC neurons, whether information was measured using spike counts, latency, or temporal spiking patterns. However, most of this difference
was due to differences in firing rates. On the other
hand, IC neurons were substantially more redundant
than A1 and MGB neurons. IC redundancy was largely
related to frequency selectivity. Redundancy reduction
may be a generic organization principle of neural
systems, allowing for easier readout of the identity of
complex stimuli in A1 relative to IC.
Introduction
Over the last 40 years, various general principles of information processing in sensory systems have been
suggested based on theoretical considerations. These
include effective information transmission (Becker and
Hinton, 1992; Linsker, 1988), efficient use of storage
(Barlow, 1961; Miller, 1956) or energy resources (Levy
and Baxter, 1996, 2002), achieving sparse codes (Olshausen and Field, 1996), and extraction of behaviorally
relevant stimulus properties (Escabi et al., 2003; Fritz
et al., 2003; Rieke et al., 1995). Each of these proposed
principles predicts specific transformations of stimulus
representations along the processing hierarchy, but
the experimental evidence required to assess any of
them is still very limited.
*Correspondence: gal@stanford.edu
5Present address: Computer Science Department, 353 Serra Mall, Stanford University, Stanford, California 94305.
Among the potential changes in stimulus representations, of special interest is the way groups of neurons interact to code information about the stimuli. These interactions can be synergistic, in which case they increase the amount of information carried by the group compared with the same neurons considered independently of each other. The interactions can also be redundant, in which case the group carries less information than the sum over the neurons considered in isolation, because different neurons convey overlapping information. At the
receptor level, neurons are often highly redundant since
each point in the sensory epithelium is represented by
a large number of neurons with overlapping receptive
fields. Barlow (1961) advocated the idea that redundancies in stimulus representation are reduced as the
stimuli are successively processed at different stations.
As a result, neurons at higher processing stations may
become largely independent to allow for easier readout
and more efficient use of coding resources. This idea,
together with other theoretical principles, can be investigated experimentally by comparing stimulus representations along a hierarchy of processing stations.
To investigate how the stimulus representation
changes along processing stations, it is necessary to
use stimuli that potentially engage nontrivial processing
mechanisms at all levels of the auditory pathway. This
requirement poses opposing constraints on the stimuli:
on the one hand, the stimuli have to be rich enough to
activate interesting central processing mechanisms,
and on the other hand, their peripheral representations
must be similar enough to make the task of distinguishing between them nontrivial. To satisfy these two requirements, we designed a set of stimuli that was based
on natural bird vocalizations that contain rich and complex acoustic structures. To these we added systematically modified variants that shared similar spectro-temporal structures (Figure 1). These are expected to elicit
high redundancies in the auditory periphery, although
they are clearly different perceptually. Furthermore, we
have previously demonstrated that these stimuli evoke
rich and complex responses in auditory cortex
(Bar-Yosef et al., 2002). These stimuli are therefore suitable to test the fate of stimulus-induced redundancy in
the ascending auditory system.
To quantify changes in stimulus representations, we
used measures of information content (Borst and Theunissen, 1999; Rieke et al., 1997) and stimulus-induced
informational redundancy of neural responses in three
subsequent stations in the core auditory pathway: the
inferior colliculus (IC), medial geniculate body of the
thalamus (MGB), and primary auditory cortex (A1).
Results
All recordings were performed in halothane-anesthetized cats using a single set of stimuli consisting of natural and modified bird vocalizations (Bar-Yosef et al.,
2002). Figure 2 shows examples of three representative
stimuli, together with the neuronal responses they elicited in cells from different brain areas. The A1 neurons
(Figures 2F and 2G) often responded differently to the
full sound (left column) and to the main chirp component
of the sound (center column), in which the echoes and
Figure 1. Spectrograms of the Stimuli Used in this Study
Five variants (rows) were created out of three different bird chirps
(columns). The variants were Natural: the full sound; Main: main
chirp component after removing echoes and background noise;
Noise: sound after removing the main chirp; Echo: the echo parts
of the noise; Back: the background remaining after removing the
echo from Noise.
background noise were removed. Responses to the full
natural sound and to the background noise and echoes
(right column) were often similar (Figures 2F and 2G),
even though the echoes were 15–20 dB weaker than
the main chirp and had different temporal envelopes.
In contrast, IC neurons (Figures 2B and 2C) responded
similarly to the full sound and to the main chirp, but responded weakly to the noise. MGB neurons were intermediate (Figures 2D and 2E). In this study we quantify
these complex response properties using information
theoretic measures.
Information about Stimulus Identity
The relations between neural responses and the identity
of the stimulus are often of a complex nature: they typically involve complex and stochastic patterns of activity
that are not well characterized by linear correlations
alone. High-order correlations between neural activity
and stimuli can be quantitatively evaluated using the
mutual information (MI) I(S;R) (Cover and Thomas,
1991; Shannon, 1948) between the stimuli S and the responses R (see the Experimental Procedures and the
Supplemental Data). The MI is a function of the joint distribution of stimuli and responses that has several alternative interpretations. First, the MI can be interpreted as
quantifying the differences between the responses to
different stimuli (‘‘stimulus effect’’). Whereas a stimulus
effect is usually quantified by simple measures such as
changes in average spike rate, the MI measure is sensitive to additional changes in the distribution of the responses. For instance, two stimuli that give rise to the
same average spike counts but with different standard
deviations yield a nonzero MI. Furthermore, the MI is
free of any assumption on the shape of the distribution
of the responses to each stimulus, such as normality
or equal variance, and can be used to quantify depen-
Figure 2. Samples of Stimuli and Neural Responses
(A) Three typical stimuli: A bird chirp in its full natural form (left), the
main chirp component after removing the echoes and the background noise (center), and the echoes and background (right) ([B]–
[G]). Responses in different brain regions are displayed as dot rasters. In the right column, frequency response areas with the stimulus
spectra superimposed (white lines) are displayed. The frequency response areas show discharge rate (from blue [low] to red [high]) in
response to tones of various frequencies (kHz, abscissa) and sound
levels (dB SPL, ordinate). (B and C) Two IC cells (best frequencies
are 8.8 and 10.6 kHz). Stimulus spectra were shifted into the neuronal response areas in (B) and (C) by increasing the sampling rate by
2.4 and 5.3, respectively. (D and E) Two MGB cells (both best frequencies are 5.9 kHz). ([F]–[H]) Three A1 cells (best frequencies are
3.6 and 5.9 kHz). In (D)–(H), the original sampling rates were used.
dence between categorical variables such as stimuli
and spike patterns, where measures such as averages
cannot be meaningfully computed. On the downside,
the MI is substantially more difficult to estimate reliably.
An alternative interpretation of MI, rooted in information
theory, sees MI as the average reduction in the uncertainty about the stimulus after observing a single response (Cover and Thomas, 1991; Shannon, 1948).
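As an illustration (ours, not part of the original analyses), the plug-in estimate of MI from an empirical joint distribution of stimuli and responses can be sketched in a few lines of Python; the example also verifies the claim above that two stimuli with identical mean spike counts but different spreads yield nonzero MI:

```python
import numpy as np

def mutual_information(joint):
    """Plug-in MI estimate (in bits) from an empirical joint distribution.

    joint[s, r] holds the fraction of trials on which stimulus s
    co-occurred with response r; entries sum to 1.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()            # guard against rounding
    ps = joint.sum(axis=1, keepdims=True)  # stimulus marginal p(s)
    pr = joint.sum(axis=0, keepdims=True)  # response marginal p(r)
    nz = joint > 0                         # 0 log 0 = 0 by convention
    return float((joint[nz] * np.log2(joint[nz] / (ps @ pr)[nz])).sum())

# Two equiprobable stimuli with the same mean spike count (2 spikes) but
# different spreads: stimulus A always yields 2 spikes, stimulus B yields
# 0 or 4 spikes with equal probability. Columns index counts 0..4.
joint = np.array([
    [0.00, 0.0, 0.5, 0.0, 0.00],  # stimulus A
    [0.25, 0.0, 0.0, 0.0, 0.25],  # stimulus B
])
print(mutual_information(joint))  # 1.0 bit: counts fully identify the stimulus
```

Because the two count distributions do not overlap, the full entropy of two equiprobable stimuli (one bit) is recovered despite the identical means.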
In practice, since the responses R are complex and
high-dimensional in nature, they are usually quantified
using simplified representations of the spike trains.
Common representations are the spike counts during
the stimulus, the first spike latency, or the set of spike
patterns coded as binary words at a given resolution
(e.g., 4 ms bins). Choosing how spike trains are represented typically has a large effect on the level of information that can be extracted from the spike train. More
complex representations (such as binary spike patterns)
can extract more information from the responses, but it
is not clear to what extent this information is used by
Figure 3. Information in the Coding of Stimulus Identity
([A]–[D]) Illustration of matrix-based estimation of MI between stimulus identity and spike counts for a single MGB cell. (A) Five illustrative stimuli.
(B) Raster plots of the responses to 20 repeats of each of the stimuli in (A). (C) Histogram of the spike count distribution across trials presented in (B).
(D) Color-coded histograms of spike counts for all the 15 stimuli (rows). The naive MI estimator is the MI over this empirical joint distribution matrix.
(E) Color-coded histogram of spike patterns’ occurrence for all 15 stimuli. For the purpose of this illustration, spikes in a window of 64 ms were considered, and their response times were discretized into 8 ms bins, yielding 8 bins, each containing no spikes (0) or at least one spike (1).
([F]–[H]) Mutual information and firing rates. Each point shows the average firing rate of a neuron to the stimulus ensemble (ordinate) plotted against
the MI between spike counts and stimulus identity (abscissa). Large symbols denote the mean over a brain region. (F) MI using counts. (G) MI using
latencies. (H) MI using spike patterns.
downstream neurons, whose readout mechanism may
be limited. To address this issue we analyzed several response representations, each taking into account some
different aspects of the responses (Nelken et al., 2005),
and we report results obtained with several response
representations.
We used the MI as a tool to quantify how the representation of the above set of stimuli changes between IC,
MGB, and A1. We started by estimating the levels of information about stimulus identity that are conveyed by
single neurons. Figures 3A–3D illustrate how MI is estimated from spike counts: the responses (Figure 3B)
for each stimulus (Figure 3A) are summarized using the
spike count, and the distribution of the counts is calculated for each stimulus (Figure 3C). The empirical joint
distribution of stimuli and counts (Figure 3D) can be
used to estimate the MI (see the Experimental Procedures). Similarly, the responses can be represented using other statistics like temporal firing patterns. The corresponding MI values can be calculated based on the
distribution of these patterns and the stimuli (Figure 3E).
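The three response summaries used in this study (counts, latencies, and spike patterns as binary words) can be sketched as follows; this is a hypothetical illustration, with window and bin sizes taken from the 64 ms window and 8 ms bins of Figure 3E, and all function names our own:

```python
def spike_count(spike_times_ms, window_ms=64):
    """Total number of spikes within the analysis window."""
    return int(sum(t < window_ms for t in spike_times_ms))

def first_spike_latency(spike_times_ms, window_ms=64):
    """Latency of the first spike; None serves as a 'no spike' symbol."""
    inside = [t for t in spike_times_ms if t < window_ms]
    return min(inside) if inside else None

def binary_word(spike_times_ms, window_ms=64, bin_ms=8):
    """Spike pattern as a binary word: bit k is 1 if bin k holds >= 1 spike."""
    word = 0
    for t in spike_times_ms:
        if t < window_ms:
            word |= 1 << int(t // bin_ms)
    return word  # integer in [0, 2 ** (window_ms // bin_ms))

spikes = [3.2, 5.1, 21.7, 40.0]      # hypothetical spike times (ms)
print(spike_count(spikes))           # 4
print(first_spike_latency(spikes))   # 3.2
print(bin(binary_word(spikes)))      # bins 0, 2, and 5 occupied -> 0b100101
```

The MI for each representation is then estimated from the joint distribution of stimuli and the corresponding summary values.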
On average, we found that individual IC neurons conveyed 2- to 4-fold more information about the identity of
the stimuli than did A1 and MGB neurons. This was
observed both for the information conveyed by spike
counts (IC: 0.68 bits/trial, n = 39; MGB: 0.16 bits/trial,
n = 36; A1: 0.18 bits/trial, n = 45), spike latency (0.75,
0.36, 0.39 bits/trial in IC, MGB, and A1, respectively),
and spike patterns (0.88, 0.38, 0.41 bits/trial; see the
Supplemental Data for more details). The MI estimated
by spike counts was strongly correlated with MI estimates using latency or spike patterns. The ratios of the firing rates across the three stations had about the same magnitudes as the ratios of the information values, and as a result, information per spike was rather similar in the three stations (mean MI per spike [± standard deviation] was 0.38 ± 0.26, 0.44 ± 0.28, and 0.28 ± 0.18 bits/spike using spike patterns in IC, MGB, and A1,
respectively; see Figures 3F–3H and the Supplemental
Data). These differences show that stimuli typically elicited responses that were more easily differentiated in
IC neurons than in A1 neurons. However, most of this
difference could be accounted for by the higher firing
rate of IC neurons, which made individual responses
overall more discriminable. The 2-fold reduction in single-neuron information that we observed between IC
and A1 is counterbalanced by the substantially larger
number of neurons in A1. Thus, if such differences in
information levels are indeed typical for general sets of
natural stimuli, they are not expected to affect the total
representational capacity of A1 relative to the IC.
To better understand the meaning of the absolute MI
values reported here, we consider the MI as a reduction
in uncertainty. The total uncertainty of the stimulus
ensemble, as quantified by the stimulus entropy, is
log2 (15) = 3.91 bits. Since the average A1 neuron carried
0.41 bits/trial, about ten independent A1 neurons would
be enough to eliminate stimulus uncertainty. In other
words, if information was additive across neurons,
the identity of the stimulus could have been completely
specified, on a trial-by-trial basis, using ten neurons only
(and a correspondingly smaller number of IC neurons).
Thus, the seemingly low information values computed
here nevertheless imply that surprisingly small populations of neurons could be enough to discriminate
between the stimuli used in this study.
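The arithmetic behind this estimate is simply:

```python
import math

stimulus_entropy = math.log2(15)     # total stimulus uncertainty, ~3.91 bits
bits_per_a1_neuron = 0.41            # average A1 MI per trial (spike patterns)
# Under the (idealized) assumption that information adds across neurons:
neurons_needed = math.ceil(stimulus_entropy / bits_per_a1_neuron)
print(round(stimulus_entropy, 2), neurons_needed)  # 3.91 10
```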
Informational Redundancy
The above calculation estimates information carried by
single neurons, but the information carried by a population of neurons could also depend on the relationships
between the responses of different neurons (Pouget
et al., 2003; Schneidman et al., 2003). It is customary
to separate these relationships into two types. Signal
correlations are due to a similarity in the neuronal responses across different stimuli. These occur, for example, when several neurons have the same response profile (mean spike counts as a function of stimulus identity)
over the stimulus ensemble. Noise correlations are due
to common fluctuations in the responses to a given stimulus across repeated presentations. For example, when
a stronger-than-average response of one neuron at
a certain trial tends to occur with a stronger-than-average response of the other neuron in the same trial, their
correlation is referred to as a noise correlation. As a rule,
signal correlations always lead to redundancy, whereas
noise correlations may lead either to redundancy or to
its opposite, synergy.
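As an illustrative sketch (not the authors' code), the two correlation types can be separated from spike-count arrays of shape (stimuli, trials), one per neuron: signal correlation compares the per-stimulus mean response profiles, while noise correlation compares trial-by-trial fluctuations around those means. The simulated pair below shares a tuning profile but has independent trial noise:

```python
import numpy as np

def signal_and_noise_correlation(r1, r2):
    """r1, r2: arrays of shape (n_stimuli, n_trials) of spike counts.

    Signal correlation: Pearson correlation of the mean response profiles.
    Noise correlation: correlation of within-stimulus fluctuations,
    pooled across stimuli after subtracting each stimulus mean.
    """
    m1, m2 = r1.mean(axis=1), r2.mean(axis=1)
    signal = np.corrcoef(m1, m2)[0, 1]
    f1 = (r1 - m1[:, None]).ravel()
    f2 = (r2 - m2[:, None]).ravel()
    noise = np.corrcoef(f1, f2)[0, 1]
    return signal, noise

rng = np.random.default_rng(0)
profile = rng.poisson(5.0, size=15).astype(float)  # shared tuning over 15 stimuli
r1 = rng.poisson(profile[:, None], size=(15, 20))  # 20 trials, independent noise
r2 = rng.poisson(profile[:, None], size=(15, 20))
sig, noi = signal_and_noise_correlation(r1, r2)
print(round(sig, 2), round(noi, 2))
```

With 20 trials per stimulus, the shared profile yields a signal correlation near 1, while the independent trial-to-trial variability keeps the noise correlation near 0.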
Before we discuss how signal and noise correlations
can be quantified, we demonstrate the effect of signal
correlations using spike-counts response profiles. We
focus on signal correlations and assume for the moment
that the noise correlations are negligible (this assumption will be experimentally verified below). Examples of
response profiles are displayed for one pair of A1 neurons (Figure 4A) and one pair of IC neurons (Figure 4B).
Whereas the responses of the IC neurons in this example exhibited covariation across the stimulus set, the
A1 neurons did so to a much lesser extent. The same
data can be studied from the view of the readout of spike
responses. In this setting, the observed responses are
used to infer which stimulus was presented. It is reasonable to assume that in this case, observing a response of
the second IC neuron does not add much to the information supplied by the first, indicating that these two neurons provide redundant information. In contrast, observing the second A1 neuron may help proportionately
more in identifying the stimulus. It will be shown below
that this contrast between IC and A1 neuronal pairs is
common, although we used a different way to quantify
this difference.
The natural tools to study synergy and redundancy in
these terms are again information theoretic. Figures 4C
and 4D show the joint distribution of the spike counts
for the two pairs of neurons. The joint distribution of
the IC pair (Figure 4D) shows a clear interdependence
between the two neurons: large spike counts in one neu-
Figure 4. Joint Distributions of Spike Counts
(A and B) Spike counts across the stimulus ensemble for a pair of IC
cells (best frequencies 5.5 and 6.1 kHz) and a pair of A1 cells (both
best frequencies are 5.1 kHz). Error bars = SEMs of the spike counts,
for 20 repeats of the ensemble for A1 neurons and ten repeats for IC
neurons. The sampling rate of the stimuli for the IC neurons was increased to place the center frequency of the chirp at BF.
(C and D) Joint distribution of spike counts across all repeats of all
stimuli, of the same two pairs of IC and A1 neurons.
ron tend to occur with large spike counts in the other
neuron. In the A1 pair (Figure 4C), this dependence is
much weaker if present at all. Thus, the dependence between the responses of the two neurons is another way
of uncovering redundancies. In contrast with the mean-count response profiles, which are based on average
spike counts and require ad hoc measures in order to
quantify the redundancy, the degree of dependence in
the joint distribution of the responses can be measured
by the MI without any distributional assumptions. In addition, MI is not limited to spike counts, and we can calculate the distribution of other statistics of the responses, like the joint distribution of stimuli and spike
patterns or latency as shown above.
Most importantly, rather than investigating similarity
in neuronal responses, we focus on informational redundancy, which quantifies the similarities between the sets
of stimuli that can be discriminated using neuronal responses. To clarify the difference, consider the following
example. A phasic neuron codes the identity of the stimulus in the timing of its burst. Another, tonic neuron codes the identity of the stimulus in its overall spike count.
The pair of neurons can be redundant if the timing of the
phasic response is highly correlated with the number of
spikes elicited in the tonic neuron by the same stimulus.
In such a case, although each neuron has a very different
response pattern and requires a different decoding
method, the information they convey about the stimuli
is similar. Therefore, nonredundancy is inherently different from ‘‘distinct tuning curves.’’ To address such potential heterogeneity in coding, we quantified redundancy conveyed through various aspects of spike
trains: spike counts, latency, and spike patterns. Spike
patterns are actually sensitive to both latency and total
spike counts.
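The phasic/tonic example can be made concrete with a toy simulation (entirely hypothetical numbers): each neuron codes the stimulus through a different statistic, yet because both statistics are tied to stimulus identity, the MI between the two responses equals the full stimulus entropy, i.e., the pair is completely redundant:

```python
import numpy as np
from collections import Counter

def mi_bits(pairs):
    """Plug-in MI (bits) between two discrete variables given as (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * np.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Hypothetical pair: a phasic neuron codes the stimulus in its burst latency,
# a tonic neuron codes the same stimulus in its spike count. Their preferred
# statistics differ, yet the stimulus ties them together.
stimuli = list(range(5)) * 40
latency_phasic = [10 + 4 * s for s in stimuli]  # latency tracks stimulus identity
count_tonic = [2 * s + 1 for s in stimuli]      # count tracks the same identity

# Between-response MI equals the full stimulus entropy, log2(5) ~ 2.32 bits:
print(round(mi_bits(list(zip(latency_phasic, count_tonic))), 2))  # 2.32
```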
We quantified the signal correlations by the mutual
information between responses of different neurons,
I(X1; X2), using joint distributions as in Figures 4C and
4D. Noise correlations can be quantified by the stimulus-conditioned information I(X1; X2 | S), in which the MI
is first estimated using the joint distribution of the responses for each stimulus separately, then averaged
across stimuli. Estimating noise correlations requires
more stimulus repetitions because the joint distributions
of the responses are estimated from substantially
smaller numbers of trials. In order to have a reliable estimate of the noise correlations, we measured the responses of a subset of neurons in A1 with 100 repetitions per stimulus. Noise correlations in these data
were negligible (see Figure S1 in the Supplemental
Data). Noise correlations are believed to be mostly due
to network interactions, and are therefore expected to
be more pronounced in higher processing stations.
Since these correlations were negligible in A1, we conclude that they are of minor importance in MGB and
a fortiori also in IC. Thus, in the data discussed here, signal correlations seem to be dominant, leading to a possible predominance of information redundancy between
neurons. This result allowed us to approximate the
response distributions of several neurons as being conditionally independent given the stimulus (see Experimental Procedures). The stimulus-conditioned independence approximation has been used previously (Reich
et al., 2001) when the within-stimulus correlations between simultaneously-recorded neurons are small, as
they are here (Figure S1). This approximation has several
considerable practical advantages: it allows us to use
nonsimultaneously measured pairs of neurons and to
couple them as if noise correlations were absent. It also
provides a more reliable redundancy estimation, and
therefore allows using fewer stimulus repeats or estimating higher-order redundancies.
To quantify the redundancy among larger groups of
neurons caused by between-stimuli covariation, we
used the measure of multi-information, a natural extension of mutual information, defined as
I(X1; …; Xn) = Σ_{x1,…,xn} p(x1, …, xn) log [ p(x1, …, xn) / (p(x1) · … · p(xn)) ]

(Studený and Vejnarová, 1998). Redundancy was then defined as the normalized multi-information

I(X1; …; Xn) / Σi I(Xi; S)
where all joint distributions were approximated under
the stimulus-conditioned independence approximation.
Normalization was performed as in Brenner et al. (2000)
and Reich et al. (2001) and is required in order to bring
measures from different auditory stations to a unified
scale (see the Experimental Procedures). Using unnormalized measures yields considerably more pronounced effects, which are shown in Figure S2.
We first discuss redundancy when the neural response is summarized using the total spike counts,
and we then discuss extensions to more complex measures of the responses further below. Neurons in A1 and
MGB were found to be significantly less redundant than
neurons in IC in the way they code the stimulus identity
(Figure 5A). The median normalized redundancy in IC
was 0.13 (with a median absolute deviation from the median of 0.07), whereas in MGB it was 0.02 (±0.015) and in A1 0.03 (±0.015) (t test, p < 10^−10 for both IC-MGB and IC-A1 comparisons, not significant for the A1-MGB comparison, p > 0.8). This phenomenon is even more pronounced when considering triplets of neurons (Figure 5B), where median normalized redundancies were
0.34, 0.03, and 0.05 in IC, MGB, and A1, respectively.
These results suggest that information processing in
the auditory pathway operates to achieve a neural representation in which neurons are tuned for independent
stimulus properties.
The size of the redundancy could be strongly affected
by how responses are represented. In particular, decoding spike trains using total spike counts neglects information conveyed by the temporal structure of the spike
train. Higher and more accurate estimates of the MI may
be obtained by using other statistics of the spike trains
(de Ruyter van Steveninck et al., 1997; Panzeri et al.,
1999; Victor, 2002; Nelken et al., 2005), since these may take into account temporal structures and high-order correlations within spike trains. To test how redundancies depend on spike train representations, we
further estimated MI conveyed by three other statistics
of the spike trains: distribution of spike patterns viewed
as binary words (de Ruyter van Steveninck et al., 1997;
Strong et al., 1998), first spike latency, and binless estimation based on embedding in Euclidean spaces (Victor, 2002) (see Experimental Procedures and the Supplemental Data). The size of the redundancy remained
essentially the same. For example, Figure 5C displays
the distribution of normalized pair redundancy calculated using the spike patterns as binary words (as in
de Ruyter van Steveninck et al., 1997; Strong et al.,
1998; Nelken et al., 2005). These show that, as in the
case of spike count information, IC neurons are significantly more redundant than A1 and MGB neurons.
Controls: Stimulus Bandwidth, Anatomical Location,
and Frequency Selectivity
In V1, redundancy has been shown to decrease when
increasing the size of the visual stimulus (Vinje and
Gallant, 2000, 2002). One possible analog of increasing
the spatial size of a visual image is to increase the bandwidth of an auditory stimulus. In order to check the effect
of bandwidth on redundancy reduction, we computed
the redundancies elicited by a subset of stimuli consisting of all the narrowband versions (Main, Echo, and
Main + Echo). These redundancies were also substantially larger in IC than in MGB and A1 (Figure 5D). Similar
results were obtained when using only the remaining
stimuli (data not shown). Thus, the large decrease in
redundancy in MGB and A1 relative to the IC is not
due only to the inclusion of both narrowband and wideband stimuli in the stimulus set.
The higher redundancy in IC could result from undersampling, because neurons recorded in the same electrode penetrations could share more response properties and therefore show higher redundancy. Neurons in
IC are known to differ by a number of response properties, such as temporal response patterns, width of tuning curves, best modulation frequency, and strength of
Figure 5. Informational Redundancy in the
Coding of Stimulus Identity
(A) Distribution of normalized pairs’ redundancy I(X1; X2)/[I(X1; S) + I(X2; S)] based on spike
counts for cells in IC, MGB, and A1. Arrows
denote group means.
(B) Distribution of normalized triplets’ redundancy I(X1; X2; X3)/[I(X1; S) + I(X2; S) + I(X3; S)] based on
spike counts.
(C) Normalized pair redundancy for spike patterns coded as binary words.
(D) Distribution of redundancies as in (A) for
a stimulus set restricted to stimuli having energy in a narrow frequency range (rows 2–3,
Figure 1).
(E) Average spike count redundancies between pairs in three different proximity classes as explained in the text, in the three auditory stations. Error bars denote the SEM of
each group, the number above each bar denotes the number of pairs in each group,
and p is the p value for a one-way ANOVA
test for the difference between the groups’
means.
inhibition. At least some of these properties are organized in dorso-ventral columns (Ehret et al., 2003;
Schreiner and Langner, 1997), which was also the direction of our electrode penetrations in most experiments.
To address this problem, redundancy was analyzed
separately in three proximity classes: from neurons recorded in the same penetration, in the same animal
but in different penetrations, and in different animals
(Figure 5E). MGB and A1 redundancy was found to depend somewhat on proximity class, although rather
weakly (one-way ANOVA, p = 0.03 and p = 0.06, respectively). Redundancy in IC was largely independent of the
proximity class. Thus, at least in MGB, neurons recorded in the same penetration, and therefore largely
within the same MGB subdivision (as judged by anatomical reconstruction of the penetrations), were somewhat
more redundant than neurons across penetrations.
More importantly, however, redundancy in IC was significantly larger in all proximity classes than in any of the
MGB and A1 proximity classes. Thus, anatomical considerations cannot explain the higher redundancy in IC
relative to MGB and A1.
When probed with pure tones, auditory neurons often
exhibit high sensitivity to a specific frequency, termed
their best frequency (BF). Common BF is therefore a potential source for redundancy between auditory neurons. Neurons with the same BF would be expected to
respond strongly to stimuli containing energy near their
BF and respond weakly to other stimuli, generating stimulus-induced correlations as in Figure 4B. Figure 6B
plots the normalized redundancy (whose distribution is
presented in Figure 5A) between pairs of A1 neurons ordered by their BFs. Figure 6A plots the same measure
for numerical simulations of auditory nerve fibers
(ANFs; see Experimental Procedures) having the same
set of BFs and responding to the same stimuli. Large
redundancy values are observed for simulated ANF
pairs with similar BFs, especially in the frequency range
where the stimuli contain most of their energy. In contrast, A1 neurons in the same frequency range show
essentially no redundancy. Figures 6C–6F quantify this effect by plotting pairs’ redundancy as a function of the difference between BFs in the three auditory stations and in the ANF simulation. BF similarity is correlated with strong redundancy in ANF simulations (regression slope of −0.088 bits/octave, n = 45, p < 10^−6) and in IC (slope −0.037 bits/octave, n = 39, p < 10^−6). This correlation is smaller in MGB and absent in A1 (MGB slope −0.0028 bits/octave, n = 36, p < 0.001; A1 slope −0.013 bits/octave, n = 45, not significant).
The existence of these strong correlations in IC, and at
the same time, weak correlations in MGB and A1, does
not mean that all IC neurons with the same BF are redundant, as can be clearly seen in Figure 6D. Rather, there
are many IC neurons with strong redundancy in their responses to this set of sounds, and the common feature
of these pairs is a similar BF. The average dependence
of redundancy on BF difference is similar in IC and in the
ANF simulations. On the other hand, neither in MGB nor
in A1 did we find any pair of neurons, even among those
with similar BF, that had redundancy as large as in IC.
Discussion
By comparing information levels and informational redundancy across a sensory processing pathway, we
identified a dramatic change in stimulus representation
that reflects the characteristics of stimulus representation in these stations. Starting with a set of sounds that
was designed to induce high informational redundancy
in the auditory periphery (Figure 6A), we found that the
redundancy was still substantial in IC but essentially disappeared in MGB and in A1. This reduction in redundancy was observed for any response representation,
Figure 6. Redundancy and Frequency Selectivity
(A) Normalized redundancy plotted as a function of cells’ best frequencies. Simulated responses of auditory nerve fibers (ANFs) following the hair-cell model by Meddis (1986) as implemented by Slaney (Auditory toolbox ver2, 1998). The number of model ANFs and
their best frequencies were matched to the A1 cells (B). Diagonal
values (white) are omitted, since these values measure the redundancy of a cell with itself. (B) Same plot as in (A), but for A1 neurons.
(C) Normalized redundancy between pairs of ANF model neurons as
a function of BF difference. (D) Same as in (C), for recorded IC neurons. (E) Same as in (C), for recorded MGB neurons. (F) Same as in
(C), for recorded A1 neurons.
including spike counts, latency, or temporal spike patterns. While IC redundancies were correlated with the
frequency sensitivity of IC neuronal pairs, this was not
the case in A1 and MGB.
The IC integrates essentially all lower processing
streams, and thus contains neurons which are potentially selective for complex features; it is also believed
to contain a detailed representation of sounds in terms
of their physical features, possibly in overlapping parameter maps (Casseday et al., 2002). As expected,
our findings suggest that this representation contains
relatively high informational redundancies when stimuli
have only small variations in their spectro-temporal
structure. Above the IC, the representation of the spectro-temporal structure of sounds is degraded (see also
Miller et al., 2002). Although one might expect an associated reduction in the ability of cortical neurons to encode the identity of sounds, we demonstrate that this
reduction is rather small. While we do not know the nature of processing that MGB and A1 perform on the outputs of IC, we do show here that it results in reduced
informational redundancy. Therefore, although A1 neurons respond to a wide range of stimuli in a way that is
seemingly not stimulus-specific (Bar-Yosef et al., 2002;
Middlebrooks et al., 1994; Schnupp et al., 2001) (see
also Figures 3F–3H and Figure 4A), the information conveyed by different neurons is largely independent.
The reason for this effect is that although A1 cells may
respond similarly to some stimuli, the subsets of stimuli
that evoke similar responses change from one A1 neuron to another. For example, some A1 neurons responded similarly to both the Natural and Noise variants
of the stimulus (Figure 2G), while others responded similarly to Natural and Main but differently to the Noise variant (Figure 2H). Thus, each of these neurons groups the
set of stimuli through a different criterion. Such distributed coding provides good discrimination between
stimuli when observing multiple neurons, since together
they partition the set of possible stimuli into small enough
sets in an efficient way, given the partition capabilities of
the single neurons. This interpretation again reflects the
point of view that focuses on partitions of stimulus
space rather than similarities of the responses.
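A toy numerical illustration of this partitioning argument (the groupings below are hypothetical, chosen only to mirror the Figure 2G/2H examples, and are not the recorded data): two neurons that each confuse a different pair of stimuli still identify every stimulus uniquely when read out jointly.

```python
# Hypothetical response classes: each "neuron" assigns every stimulus to one
# of two coarse categories, i.e., it defines a partition of stimulus space.
neuron_a = {"Natural": 0, "Noise": 0, "Main": 1}   # lumps Natural with Noise
neuron_b = {"Natural": 0, "Main": 0, "Noise": 1}   # lumps Natural with Main

# The joint response label combines both partitions.
joint = {s: (neuron_a[s], neuron_b[s]) for s in neuron_a}

# Each neuron alone distinguishes only 2 response classes,
# but the pair separates all 3 stimuli.
print(len(set(neuron_a.values())), len(set(joint.values())))  # 2 3
```

Because the two partitions cut the stimulus set along different boundaries, the joint code is finer than either single-neuron code, even though each neuron by itself is poorly selective.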
In the primary visual cortex, redundancy between neurons has been studied as a function of the size of the
stimulated visual field (Vinje and Gallant, 2000, 2002),
and was found to differ from auditory redundancies in
two crucial aspects. Firstly, redundancies between neurons in V1 have been shown to decrease with increase in
the spatial size of the stimulated visual field. One possible analog of a restricted stimulated field in vision is a narrowband stimulus in audition. However, using only narrowband stimuli reproduced the same decrease in
redundancy in MGB and A1 relative to IC (Figure 5D).
Thus, increase in bandwidth, at least between the narrow- and wideband stimuli used here, does not have
the same effect in A1 as an increase in stimulated visual
field has in V1: A1 neurons, like V1 neurons, show little redundancy with broadband (or large-field) stimuli. However, in A1, the low redundancy is retained when stimuli
are narrowband, while in V1, smaller stimuli are associated with increased redundancy (Vinje and Gallant,
2000, 2002). Secondly, visual neural responses showed
an increase in selectivity and the formation of a sparse representation of visual scenes. In A1, the redundancy was
much reduced compared with the IC (Figure 5D), but A1
neurons responded to many different sounds in the set
(Figures 3F and 3G and Figure 4A), and were actually
less selective to stimulus identity than IC neurons.
Thus, redundancy in visual cortex processing seems to
operate differently than in A1 with regard to the size of
the stimulus, although low redundancy may be achieved
under the appropriate conditions (Reich et al., 2001).
What could be the computational advantages of such
a redundancy reduction process? A possible outcome is
a "splitting" of the information inside a single frequency
channel as suggested by de Cheveigne (2001). This addresses the difficult problem of segregating the spectrotemporal representation of complex soundscapes into
distinct components that belong to different auditory
objects—a segregation that is achieved by the auditory
system in spite of possible overlaps between objects
both in time and in frequency.
More generally, Barlow suggested that reducing the
redundancy between computing elements reflects a process where the system extracts meaningful structures in
signals and codes them independently (Barlow, 2001).
Indeed, reducing redundancy during information processing, by mapping stimuli to a higher-dimensional feature space, is known to provide better discrimination
among complex inputs—as is done in independent component analysis (Bell and Sejnowski, 1995) and support
vector machines (Vapnik, 1995). The increased coding
independence of A1 neurons, compared to the IC, may
thus reflect the extraction of relevant information from
acoustic stimuli. This view is supported by our finding
that A1 cells carry considerably less information about
the spectro-temporal structure (relative to IC cells) than
about the more abstract notion of stimulus identity
(data not shown). Similar processes may characterize
other modalities, as for example in inferotemporal visual
neurons that are sensitive to the more abstract notion of
a face, but less sensitive to its physical details. The observations presented here raise the hypothesis that obtaining representations with reduced redundancies in
high processing stations is a generic organizational principle of sensory systems that allows easier readout of
behaviorally relevant aspects of the natural scene.
Experimental Procedures

Information about Stimulus Identity
The mutual information between responses R and a set of stimuli S is defined in terms of their joint distribution p(S,R). When this distribution is known exactly, the MI can be calculated as

\[ I(S;R) = \sum_{s,r} p(s,r) \log \frac{p(s,r)}{p(s)\,p(r)} \]

where

\[ p(s) = \sum_{r} p(s,r), \qquad p(r) = \sum_{s} p(s,r) \]

are the marginal distributions over the stimuli and responses, respectively. See the Supplemental Data for a more detailed description of how the MI is calculated in practice.

Information about stimulus identity was estimated using several methods. The MI between spike counts and stimuli was estimated using the histograms of the count distribution for each stimulus. The bins of the histogram were chosen to achieve a near-uniform marginal distribution, and the number of bins was chosen to maximize the bias-corrected information (using the method of Treves and Panzeri, 1995) conveyed by each cell (see Nelken et al., 2005 for more details). Latency information was similarly computed using a histogram estimate of the latency distribution for each stimulus. MI about counts and latencies was also estimated using a binless method (Victor, 2002), with essentially identical results (correlation coefficients between binless and binned estimates of MI across populations of neurons were 0.85, 0.93, and 0.96 in A1, MGB, and IC, respectively). In addition, MI was estimated using the distribution of binary words, following the method of de Ruyter van Steveninck et al. (1997) and Strong et al. (1998). To this end, each spike train was discretized at several temporal resolutions of 2, 4, 8, 16, and 32 ms (yielding 3–60 bins per word), and the resolution and temporal window that yielded the maximal (bias-corrected) MI were selected (usually 4 ms resolution). MI was also calculated by embedding spike trains in Euclidean spaces and using binless estimation strategies with the method of Victor (2002), and by using the 2nd-order expansion of Panzeri et al. (2001), again yielding similar results. In A1 and MGB, the MI in first spike latencies and in binary words yielded about double the information conveyed by spike counts. In IC, using binary words yielded about 30% more information than spike counts.

For a finite sample size, mutual-information estimators that are based on an estimated joint distribution are biased, having on average positive information even when the two variables are independent. We estimated this bias by shuffling neural responses among all trials. This bias estimator was found to be consistent with the analytical approximation derived in Panzeri and Treves (1996) and Treves and Panzeri (1995). The resulting baseline information was subtracted from all information calculations. In all calculations of MI based on scalar statistics of the spike trains, the maximal magnitude of the biases did not exceed 15% of the information.

Redundancy Quantification
Informational redundancy between pairs of neurons can be quantified by the difference between the information conveyed by a group of neurons and the sum of the information conveyed by those neurons individually:

\[ I(X_1,\ldots,X_n;S) - \sum_{i=1}^{n} I(X_i;S) \]

is a measure of redundancy previously used for pairs of neurons (Brenner et al., 2000; Gat and Tishby, 1999; Narayanan et al., 2005; Rieke et al., 1997; Rolls and Treves, 1998; Schneidman et al., 2003; Warland et al., 1997). This can also be presented as the difference between two multi-information terms,

\[ I(X_1,\ldots,X_n \mid S) - I(X_1,\ldots,X_n) \]

where multi-information is a natural extension of mutual information, defined as follows (Studený and Vejnarová, 1998):

\[ I(X_1,\ldots,X_n) = \sum_{x_1,\ldots,x_n} p(x_1,\ldots,x_n) \log \frac{p(x_1,\ldots,x_n)}{p(x_1)\cdots p(x_n)} \]

The first term, the stimulus-conditioned multi-information, is large when neuronal responses are correlated for each given stimulus, and is zero only when the neuronal responses are independent given the stimulus. In our data, we found that the first term was small for pairs of neurons (see Figure S1), meaning that the joint distribution can be well approximated as being independent when conditioned on the stimulus. Formally, the joint conditional distribution

\[ p(x_1,\ldots,x_n \mid s) \]

is approximated by the product of the conditional marginals

\[ \prod_{i=1}^{n} p(x_i \mid s) \]

for every stimulus s. Note that this does not imply unconditional independence:

\[ p(x_1,\ldots,x_n) \neq \prod_{i=1}^{n} p(x_i). \]

To quantify the redundancy between a pair of neurons caused by between-stimuli covariation, we used the mutual information I(X_1;X_2), where the neurons are coupled under the stimulus-conditioned independence approximation. For groups of neurons, we used the negative of their multi-information, again under the stimulus-conditioned independence approximation.

Since redundancy tends to grow when single-unit information about the stimulus grows, the varying information levels in the different auditory stations required normalizing the redundancies to a unified scale. Under conditional independence, the redundancy is limited by the sum of the single-unit information terms:

\[ I(X_1,\ldots,X_n) \le I(X_1,\ldots,X_n;S) + I(X_1,\ldots,X_n) = \sum_{i=1}^{n} I(X_i;S). \]

The redundancy was therefore normalized as follows (as in Brenner et al., 2000; Reich et al., 2001):

\[ \frac{I(X_1,\ldots,X_n)}{\sum_i I(X_i;S)}. \]
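As a concrete sketch of these estimators, the following toy example computes the plug-in MI, subtracts a shuffle-based bias estimate, and forms the normalized redundancy for a pair of model neurons. The synthetic Poisson responses and all variable names are ours, not the recorded data; the two "neurons" are conditionally independent given the stimulus by construction, matching the approximation used in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def mutual_info(x, y):
    """Plug-in mutual information estimate (in bits) between two
    discrete-valued response arrays of equal length."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)          # joint histogram
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)    # marginal over x
    py = joint.sum(axis=0, keepdims=True)    # marginal over y
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def shuffle_bias(x, y, n_shuffles=100):
    """Estimate the positive sampling bias of the plug-in MI by shuffling
    responses among trials, as in the bias-correction procedure above."""
    return float(np.mean([mutual_info(rng.permutation(x), y)
                          for _ in range(n_shuffles)]))

# Synthetic data: 8 stimuli, 250 trials each; two conditionally independent
# "neurons" with similar stimulus tuning (a redundant pair).
stimuli = np.repeat(np.arange(8), 250)
counts1 = rng.poisson(1 + stimuli)
counts2 = rng.poisson(1 + stimuli)

i1 = mutual_info(counts1, stimuli) - shuffle_bias(counts1, stimuli)
i2 = mutual_info(counts2, stimuli) - shuffle_bias(counts2, stimuli)
i12 = mutual_info(counts1, counts2) - shuffle_bias(counts1, counts2)

# Normalized redundancy, I(X1;X2) / (I(X1;S) + I(X2;S)).
print(round(i12 / (i1 + i2), 3))
```

Because the two model neurons share their stimulus tuning, their between-neuron MI is well above zero, and the normalized ratio falls strictly between 0 and 1, as the bound above requires.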
The effects described in this paper are considerably more pronounced for the unnormalized measures (see Figure S2).
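The binary-word discretization used for the temporal-pattern information estimates (spike trains binned at 2–32 ms resolution into words of 0/1 symbols) can be sketched generically; the function name and toy spike times here are ours, not the original analysis code.

```python
import numpy as np

def spike_words(spike_times_ms, window_ms, bin_ms):
    """Discretize a spike train into a binary word.

    spike_times_ms : spike times within the analysis window (ms)
    window_ms      : length of the temporal window (ms)
    bin_ms         : temporal resolution (e.g., 2, 4, 8, 16, or 32 ms)
    Returns a tuple of 0/1 symbols, one per bin (1 = at least one spike).
    """
    n_bins = int(np.ceil(window_ms / bin_ms))
    word = np.zeros(n_bins, dtype=int)
    idx = (np.asarray(spike_times_ms) // bin_ms).astype(int)
    word[idx[idx < n_bins]] = 1
    return tuple(word)

# A toy spike train discretized at 4 ms resolution over a 32 ms window:
print(spike_words([1.5, 6.0, 7.2, 30.0], window_ms=32, bin_ms=4))
# -> (1, 1, 0, 0, 0, 0, 0, 1)
```

In the analysis described above, the MI between such words and stimulus identity would then be estimated from the empirical word distribution per stimulus, scanning resolutions and keeping the one that maximizes the bias-corrected MI.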
Electrophysiological Recordings
For detailed methods, see Bar-Yosef et al. (2002). Extracellular recordings were made in A1 of nine halothane-anesthetized cats, in
medial geniculate body of two halothane-anesthetized cats, and in inferior colliculus of nine isoflurane-anesthetized and two halothane-anesthetized cats. Anesthesia was induced by ketamine and xylazine and maintained with halothane (0.25%–1.5%; all A1 and MGB cats, and two IC cats) or isoflurane (0.1%–2%; nine IC cats) in 70% N2O using standard protocols authorized by the committee for animal care and ethics of the Hebrew University Hadassah Medical
School (A1, MGB, and IC recordings) and Johns Hopkins University
(IC recordings). Single neurons were recorded using metal microelectrodes and an online spike sorter (MSD, Alpha-Omega) or a
Schmitt trigger. MGB neurons were further sorted offline. All neurons were well separated. In total we used data from 45 A1 neurons,
36 MGB neurons, and 39 IC neurons. In A1, penetrations were performed over the whole dorso-ventral extent of the appropriate frequency slab (between about 2 and 8 kHz). In MGB, all penetrations
were vertical, traversing a number of isofrequency laminae, and recording locations have been histologically localized in all divisions.
In IC, vertical penetrations were used in all experiments except
one, in which electrode penetrations were performed at a shallow
angle through the cerebellum, traversing the IC in a caudo-rostral
axis. We tried to map the full medio-lateral extent of the nucleus,
but in each animal only a small number of electrode penetrations
were performed. Based on the sequence of best frequencies along
the track, the IC recordings are most likely in the central nucleus.
Stimuli were presented 20 times (A1 and MGB recordings and IC recordings in 12 neurons) and 5–20 times (IC recordings in 27 neurons). For 13 IC neurons, the sampling rate of the stimuli was increased to place the center frequency of the chirp at their BF.
Signals were presented to the animals using sealed, calibrated earphones at 60–80 dB SPL, at the preferred aurality of the neurons as
determined using broadband noise bursts. Sounds are from the Cornell Laboratory of Ornithology and have been selected and modified
as in Bar-Yosef et al. (2002). The responses to the Natural and Main
versions in A1 have been described in Bar-Yosef et al. (2002); the
rest of the data in A1 and all the MGB and IC responses are new.
ANF Simulations
Responses of auditory nerve fibers were simulated using an auditory
toolbox for Matlab by Slaney (Auditory Toolbox, ver. 2, technical report, 1998). The peripheral filters are gammatone filters. They are followed
by half-wave rectification and low-pass filtering as implemented in
a version of the Meddis hair-cell model (Meddis, 1986). Spikes
were generated by a nonhomogeneous Poisson generator, using
the output of the hair-cell stage as a rate function.
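The last stage can be sketched as a Bernoulli-per-bin approximation of a nonhomogeneous Poisson process driven by the hair-cell rate output. This is a minimal generic stand-in of our own, not Slaney's toolbox code; the function name, bin width, and example rate profile are assumptions for illustration.

```python
import numpy as np

def poisson_spikes(rate_hz, dt=1e-4, rng=None):
    """Draw a spike train from a nonhomogeneous Poisson process.

    rate_hz : array of instantaneous rates (e.g., hair-cell model output),
              one value per time bin of width dt seconds.
    Returns a binary array with at most one spike per bin (a valid
    approximation when rate_hz * dt << 1).
    """
    if rng is None:
        rng = np.random.default_rng()
    p = np.clip(np.asarray(rate_hz) * dt, 0.0, 1.0)  # per-bin spike probability
    return (rng.random(p.shape) < p).astype(int)

# Example: a 100 ms "response" whose rate ramps from 0 to 200 spikes/s,
# in 1000 bins of 0.1 ms; the expected count is ~10 (mean rate 100 Hz).
rate = np.linspace(0.0, 200.0, 1000)
spikes = poisson_spikes(rate, dt=1e-4, rng=np.random.default_rng(1))
print(spikes.sum())
```

Per-bin Bernoulli sampling of the rate function is the standard discrete-time approximation of an inhomogeneous Poisson generator and converges to it as dt shrinks.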
Supplemental Data
The Supplemental Data for this article can be found online at http://www.neuron.org/cgi/content/full/51/3/359/DC1/.

Acknowledgments
This work has been supported by a grant from the Human Frontiers Science Program and by a grant from the Israeli Science Foundation (ISF). G.C. was supported by a grant from the Israeli Ministry of Science.

Received: November 3, 2005
Revised: May 1, 2006
Accepted: June 28, 2006
Published: August 2, 2006

References

Barlow, H.B. (1961). Possible principles underlying the transformation of sensory messages. In Sensory Communication, W.A. Rosenblith, ed. (Cambridge, MA: MIT Press), pp. 217–234.

Barlow, H. (2001). Redundancy reduction revisited. Network 12, 241–253.

Bar-Yosef, O., Rotman, Y., and Nelken, I. (2002). Responses of neurons in cat primary auditory cortex to bird chirps: effects of temporal and spectral context. J. Neurosci. 22, 8619–8632.

Becker, S., and Hinton, G.E. (1992). Self-organizing neural network that discovers surfaces in random-dot stereograms. Nature 355, 161–163.

Bell, A.J., and Sejnowski, T.J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7, 1129–1159.

Borst, A., and Theunissen, F.E. (1999). Information theory and neural coding. Nat. Neurosci. 2, 947–957.

Brenner, N., Strong, S.P., Koberle, R., Bialek, W., and de Ruyter van Steveninck, R.R. (2000). Synergy in a neural code. Neural Comput. 12, 1531–1552.

Casseday, J.H., Fremouw, T., and Covey, E. (2002). The inferior colliculus: a hub for the central auditory system. In Integrative Functions in the Mammalian Auditory Pathway, D. Oertel, R.R. Fay, and A.N. Popper, eds. (New York: Springer), pp. 238–318.

Cover, T., and Thomas, J. (1991). Elements of Information Theory (New York: Wiley and Sons).

de Cheveigne, A. (2001). The auditory system as a "separation machine". In Physiological and Psychophysical Bases of Auditory Function, D.J. Breebart, A.J.M. Houtsma, A. Kohlrausch, V.F. Prijs, and R. Schoonhoven, eds. (Maastricht, The Netherlands: Shaker Publishing), pp. 453–460.

de Ruyter van Steveninck, R.R., Lewen, G.D., Strong, S.P., Koberle, R., and Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science 275, 1805–1808.

Ehret, G., Egorova, M., Hage, S.R., and Muller, B.A. (2003). Spatial map of frequency tuning-curve shapes in the mouse inferior colliculus. Neuroreport 14, 1365–1369.

Escabi, M.A., Miller, L.M., Read, H.L., and Schreiner, C.E. (2003). Naturalistic auditory contrast improves spectrotemporal coding in the cat inferior colliculus. J. Neurosci. 23, 11489–11504.

Fritz, J., Shamma, S., Elhilali, M., and Klein, D. (2003). Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nat. Neurosci. 6, 1216–1223.

Gat, I., and Tishby, N. (1999). Synergy and redundancy among brain cells of behaving monkeys. In Advances in Neural Information Processing Systems (Denver, CO: MIT Press).

Levy, W.B., and Baxter, R.A. (1996). Energy efficient neural codes. Neural Comput. 8, 531–543.

Levy, W.B., and Baxter, R.A. (2002). Energy-efficient neuronal computation via quantal synaptic failures. J. Neurosci. 22, 4746–4755.

Linsker, R. (1988). Self-organization in a perceptual network. IEEE Computer 21, 105–117.

Meddis, R. (1986). Simulation of mechanical to neural transduction in the auditory receptor. J. Acoust. Soc. Am. 79, 702–711.

Middlebrooks, J.C., Clock, A.E., Xu, L., and Green, D.M. (1994). A panoramic code for sound location by cortical neurons. Science 264, 842–844.

Miller, G.A. (1956). The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol. Rev. 63, 81–97.

Miller, L.M., Escabi, M.A., Read, H.L., and Schreiner, C.E. (2002). Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex. J. Neurophysiol. 87, 516–527.

Narayanan, N., Kimchi, E., and Laubach, M. (2005). Redundancy and synergy of neuronal ensembles in motor cortex. J. Neurosci. 25, 4207–4216.

Nelken, I., Chechik, G., Mrsic-Flogel, T.D., King, A.J., and Schnupp, J. (2005). Encoding stimulus information by spike numbers and mean response time in primary auditory cortex. J. Comput. Neurosci. 19, 199–221.

Olshausen, B.A., and Field, D.J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609.

Panzeri, S., and Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network 7, 87–101.

Panzeri, S., Schultz, S.R., Treves, A., and Rolls, E.T. (1999). Correlations and the encoding of information in the nervous system. Proc. R. Soc. Lond. B Biol. Sci. 266, 1001–1012.

Panzeri, S., Petersen, R.S., Schultz, S.R., Lebedev, M., and Diamond, M.E. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron 29, 769–777.

Pouget, A., Dayan, P., and Zemel, R.S. (2003). Inference and computation with population codes. Annu. Rev. Neurosci. 26, 381–410.

Reich, D.S., Mechler, F., and Victor, J.D. (2001). Independent and redundant information in nearby cortical neurons. Science 294, 2566–2568.

Rieke, F., Bodnar, D.A., and Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B Biol. Sci. 262, 259–265.

Rieke, F., Warland, D., de Ruyter van Steveninck, R., and Bialek, W. (1997). Spikes (Cambridge, MA: MIT Press).

Rolls, E.T., and Treves, A. (1998). Neural Networks and Brain Function (Oxford, England: Oxford University Press).

Schneidman, E., Bialek, W., and Berry, M.J., 2nd. (2003). Synergy, redundancy, and independence in population codes. J. Neurosci. 23, 11539–11553.

Schnupp, J.W., Mrsic-Flogel, T.D., and King, A.J. (2001). Linear processing of spatial cues in primary auditory cortex. Nature 414, 200–204.

Schreiner, C.E., and Langner, G. (1997). Laminar fine structure of frequency organization in auditory midbrain. Nature 388, 383–386.

Shannon, C.E. (1948). A mathematical theory of communication. Bell Sys. Tech. J. 27, 379–423.

Strong, S.P., Koberle, R., de Ruyter van Steveninck, R., and Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett. 80, 197–200.

Studený, M., and Vejnarová, J. (1998). The multiinformation function as a tool for measuring stochastic dependence. In Learning in Graphical Models, M.I. Jordan, ed. (Dordrecht, The Netherlands: Kluwer Academic Publishers), pp. 261–297.

Treves, A., and Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comput. 7, 399–407.

Vapnik, V. (1995). The Nature of Statistical Learning Theory (New York: Springer).

Victor, J.D. (2002). Binless strategies for estimation of information from neural data. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 66, 051903.

Vinje, W.E., and Gallant, J.L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science 287, 1273–1276.

Vinje, W.E., and Gallant, J.L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci. 22, 2904–2915.

Warland, D., Reinagel, P., and Meister, M. (1997). Decoding visual information from a population of retinal ganglion cells. J. Neurophysiol. 78, 2336–2350.