Neuron 51, 359–368, August 3, 2006 ©2006 Elsevier Inc. DOI 10.1016/j.neuron.2006.06.030

Reduction of Information Redundancy in the Ascending Auditory Pathway

Gal Chechik,1,5,* Michael J. Anderson,4 Omer Bar-Yosef,2 Eric D. Young,4 Naftali Tishby,1,3 and Israel Nelken1,2

1 Interdisciplinary Center for Neural Computation
2 Department of Neurobiology
3 School of Computer Science and Engineering
Hebrew University of Jerusalem, Jerusalem 91904, Israel
4 Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21205

Summary

Information processing by a sensory system is reflected in the changes in stimulus representation along its successive processing stages. We measured information content and stimulus-induced redundancy in the neural responses to a set of natural sounds in three successive stations of the auditory pathway: inferior colliculus (IC), auditory thalamus (MGB), and primary auditory cortex (A1). Information about stimulus identity was somewhat reduced in single A1 and MGB neurons relative to single IC neurons, when information is measured using spike counts, latency, or temporal spiking patterns. However, most of this difference was due to differences in firing rates. On the other hand, IC neurons were substantially more redundant than A1 and MGB neurons. IC redundancy was largely related to frequency selectivity. Redundancy reduction may be a generic organization principle of neural systems, allowing for easier readout of the identity of complex stimuli in A1 relative to IC.

Introduction

Over the last 40 years, various general principles of information processing in sensory systems have been suggested based on theoretical considerations.
These include effective information transmission (Becker and Hinton, 1992; Linsker, 1988), efficient use of storage (Barlow, 1961; Miller, 1956) or energy resources (Levy and Baxter, 1996, 2002), achieving sparse codes (Olshausen and Field, 1996), and extraction of behaviorally relevant stimulus properties (Escabi et al., 2003; Fritz et al., 2003; Rieke et al., 1995). Each of these proposed principles predicts specific transformations of stimulus representations along the processing hierarchy, but the experimental evidence required to assess any of them is still very limited.

Among the potential changes in stimulus representations, of special interest is the way groups of neurons interact to code information about the stimuli. These interactions can be synergistic, in which case the group carries more information than the same neurons considered independently of each other. The interactions can also be redundant, in which case the group carries less information than the same neurons considered independently, because different neurons convey overlapping information. At the receptor level, neurons are often highly redundant since each point in the sensory epithelium is represented by a large number of neurons with overlapping receptive fields. Barlow (1961) advocated the idea that redundancies in stimulus representation are reduced as the stimuli are successively processed at different stations. As a result, neurons at higher processing stations may become largely independent to allow for easier readout and more efficient use of coding resources. This idea, together with other theoretical principles, can be investigated experimentally by comparing stimulus representations along a hierarchy of processing stations.

*Correspondence: gal@stanford.edu
5 Present address: Computer Science Department, 353 Serra Mall, Stanford University, Stanford, California 94305.
To investigate how the stimulus representation changes along processing stations, it is necessary to use stimuli that potentially engage nontrivial processing mechanisms at all levels of the auditory pathway. This requirement poses opposing constraints on the stimuli: on the one hand, the stimuli have to be rich enough to activate interesting central processing mechanisms, and on the other hand, their peripheral representations must be similar enough to make the task of distinguishing between them nontrivial. To satisfy these two requirements, we designed a set of stimuli that was based on natural bird vocalizations that contain rich and complex acoustic structures. To these we added systematically modified variants that shared similar spectro-temporal structures (Figure 1). These are expected to elicit high redundancies in the auditory periphery, although they are clearly different perceptually. Furthermore, we have previously demonstrated that these stimuli evoke rich and complex responses in auditory cortex (Bar-Yosef et al., 2002). These stimuli are therefore suitable to test the fate of stimulus-induced redundancy in the ascending auditory system. To quantify changes in stimulus representations, we used measures of information content (Borst and Theunissen, 1999; Rieke et al., 1997) and stimulus-induced informational redundancy of neural responses in three subsequent stations in the core auditory pathway: the inferior colliculus (IC), medial geniculate body of the thalamus (MGB), and primary auditory cortex (A1). Results All recordings were performed in halothane-anesthetized cats using a single set of stimuli consisting of natural and modified bird vocalizations (Bar-Yosef et al., 2002). Figure 2 shows examples of three representative stimuli, together with the neuronal responses they elicited in cells from different brain areas. 
The A1 neurons (Figures 2F and 2G) often responded differently to the full sound (left column) and to the main chirp component of the sound (center column), in which the echoes and background noise were removed. Responses to the full natural sound and to the background noise and echoes (right column) were often similar (Figures 2F and 2G), even though the echoes were 15–20 dB weaker than the main chirp and had different temporal envelopes. In contrast, IC neurons (Figures 2B and 2C) responded similarly to the full sound and to the main chirp, but responded weakly to the noise. MGB neurons were intermediate (Figures 2D and 2E). In this study we quantify these complex response properties using information theoretic measures.

Figure 1. Spectrograms of the Stimuli Used in this Study
Five variants (rows) were created out of three different bird chirps (columns). The variants were Natural: the full sound; Main: main chirp component after removing echoes and background noise; Noise: sound after removing the main chirp; Echo: the echo parts of the noise; Back: the background remaining after removing the echo from Noise.

Information about Stimulus Identity

The relations between neural responses and the identity of the stimulus are often of a complex nature: they typically involve complex and stochastic patterns of activity that are not well characterized by linear correlations alone. High-order correlations between neural activity and stimuli can be quantitatively evaluated using the mutual information (MI) I(S;R) (Cover and Thomas, 1991; Shannon, 1948) between the stimuli S and the responses R (see the Experimental Procedures and the Supplemental Data). The MI is a function of the joint distribution of stimuli and responses that has several alternative interpretations. First, the MI can be interpreted as quantifying the differences between the responses to different stimuli (“stimulus effect”).
Whereas a stimulus effect is usually quantified by simple measures such as changes in average spike rate, the MI measure is sensitive to additional changes in the distribution of the responses. For instance, two stimuli that give rise to the same average spike counts but with different standard deviations yield a nonzero MI. Furthermore, the MI is free of any assumption on the shape of the distribution of the responses to each stimulus, such as normality or equal variance, and can be used to quantify dependence between categorical variables such as stimuli and spike patterns, where measures such as averages cannot be meaningfully computed. On the downside, the MI is substantially more difficult to estimate reliably.

Figure 2. Samples of Stimuli and Neural Responses
(A) Three typical stimuli: A bird chirp in its full natural form (left), the main chirp component after removing the echoes and the background noise (center), and the echoes and background (right).
(B–G) Responses in different brain regions are displayed as dot rasters. In the right column, frequency response areas with the stimulus spectra superimposed (white lines) are displayed. The frequency response areas show discharge rate (from blue [low] to red [high]) in response to tones of various frequencies (kHz, abscissa) and sound levels (dB SPL, ordinate).
(B and C) Two IC cells (best frequencies are 8.8 and 10.6 kHz). Stimulus spectra were shifted into the neuronal response areas in (B) and (C) by increasing the sampling rate by 2.4 and 5.3, respectively.
(D and E) Two MGB cells (both best frequencies are 5.9 kHz).
(F–H) Three A1 cells (best frequencies are 3.6 and 5.9 kHz). In (D)–(H), the original sampling rates were used.

An alternative interpretation of MI, rooted in information theory, sees MI as the average reduction in the uncertainty about the stimulus after observing a single response (Cover and Thomas, 1991; Shannon, 1948).
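The uncertainty-reduction reading of MI can be made concrete with a small sketch. It is an illustration only, not the estimator used in the paper: the function names and the toy joint distribution are hypothetical. The toy case uses two equiprobable stimuli with the same mean spike count (2 spikes) but different spread, the situation described above in which the MI is nonzero even though the averages match.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (bits) of a probability vector; zero entries are skipped."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mi_as_uncertainty_reduction(joint):
    """I(S;R) = H(S) - H(S|R) for a table joint[s, r] of probabilities or counts."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    p_s = joint.sum(axis=1)   # stimulus marginal
    p_r = joint.sum(axis=0)   # response marginal
    h_s_given_r = 0.0
    for r, pr in enumerate(p_r):
        if pr > 0:
            # uncertainty about the stimulus that remains after observing response r
            h_s_given_r += pr * entropy_bits(joint[:, r] / pr)
    return entropy_bits(p_s) - h_s_given_r

# Hypothetical toy data: columns are spike counts 0, 2, 4.
joint = [[0.00, 0.50, 0.00],   # stimulus A: always 2 spikes
         [0.25, 0.00, 0.25]]   # stimulus B: 0 or 4 spikes, mean also 2
mi = mi_as_uncertainty_reduction(joint)   # 1.0 bit: each response identifies the stimulus
```

In this extreme case a comparison of mean spike counts would report no stimulus effect, whereas the MI equals the full stimulus entropy of 1 bit.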
In practice, since the responses R are complex and high-dimensional in nature, they are usually quantified using simplified representations of the spike trains. Common representations are the spike counts during the stimulus, the first spike latency, or the set of spike patterns coded as binary words at a given resolution (e.g., 4 ms bins). Choosing how spike trains are represented typically has a large effect on the level of information that can be extracted from the spike train. More complex representations (such as binary spike patterns) can extract more information from the responses, but it is not clear to what extent this information is used by downstream neurons, whose readout mechanism may be limited.

Figure 3. Information in the Coding of Stimulus Identity
(A–D) Illustration of matrix-based estimation of MI between stimuli identity and spike counts for a single MGB cell.
(A) Five illustrative stimuli.
(B) Raster plots of the responses to 20 repeats of each of the stimuli in (A).
(C) Histogram of the spike count distribution across trials presented in (B).
(D) Color-coded histograms of spike counts for all the 15 stimuli (rows). The naive MI estimator is the MI over this empirical joint distribution matrix.
(E) Color-coded histogram of spike patterns’ occurrence for all 15 stimuli. For the purpose of this illustration, spikes in a window of 64 ms were considered, and their response times were discretized into 8 ms bins, yielding 8 bins, each containing no spikes (0) or at least one spike (1).
(F–H) Mutual information and firing rates. Each point shows the average firing rate of a neuron to the stimulus ensemble (ordinate) plotted against the MI between spike counts and stimulus identity (abscissa). Large symbols denote the mean over a brain region. (F) MI using counts. (G) MI using latencies. (H) MI using spike patterns.
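The binary-word representation described for Figure 3E (a 64 ms window cut into 8 ms bins, each marked 0 or 1) can be sketched as below. This is a minimal illustration under stated assumptions: the function name is hypothetical, spike times are taken to be in milliseconds relative to the start of the analysis window, and spikes outside the window are simply ignored.

```python
import numpy as np

def spike_word(spike_times_ms, window_ms=64.0, bin_ms=8.0):
    """Code one trial as a binary word: 1 if a bin holds at least one spike, else 0.

    Defaults follow the illustration in Figure 3E (64 ms window, 8 ms bins).
    """
    n_bins = int(round(window_ms / bin_ms))
    word = np.zeros(n_bins, dtype=int)
    t = np.asarray(spike_times_ms, dtype=float)
    t = t[(t >= 0) & (t < window_ms)]       # keep only spikes inside the window
    word[(t // bin_ms).astype(int)] = 1     # several spikes in one bin still give 1
    return tuple(int(b) for b in word)

# Hypothetical trial: spikes at 3, 10, 12 and 63.9 ms fall in bins 0, 1, 1 and 7.
example = spike_word([3.0, 10.0, 12.0, 63.9])   # (1, 1, 0, 0, 0, 0, 0, 1)
```

Returning a tuple makes each word hashable, so words can be tallied per stimulus into a joint histogram like the one in Figure 3E.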
To address this issue we analyzed several response representations, each taking into account some different aspects of the responses (Nelken et al., 2005), and we report results obtained with several response representations. We used the MI as a tool to quantify how the representation of the above set of stimuli changes between IC, MGB, and A1. We started by estimating the levels of information about stimulus identity that are conveyed by single neurons. Figures 3A–3D illustrate how MI is estimated from spike counts: the responses (Figure 3B) for each stimulus (Figure 3A) are summarized using the spike count, and the distribution of the counts is calculated for each stimulus (Figure 3C). The empirical joint distribution of stimuli and counts (Figure 3D) can be used to estimate the MI (see the Experimental Procedures). Similarly, the responses can be represented using other statistics like temporal firing patterns. The corresponding MI values can be calculated based on the distribution of these patterns and the stimuli (Figure 3E). On average, we found that individual IC neurons conveyed 2- to 4-fold more information about the identity of the stimuli than did A1 and MGB neurons. This was observed for the information conveyed by spike counts (IC: 0.68 bits/trial, n = 39; MGB: 0.16 bits/trial, n = 36; A1: 0.18 bits/trial, n = 45), spike latency (0.75, 0.36, 0.39 bits/trial in IC, MGB, and A1, respectively), and spike patterns (0.88, 0.38, 0.41 bits/trial; see the Supplemental Data for more details). The MI estimated by spike counts was strongly correlated with MI estimates using latency or spike patterns. The ratios of the firing rates over all stimuli had about the same magnitudes, and as a result, information per spike was rather similar in the three stations (mean MI per spike [± standard deviation] was 0.38 ± 0.26, 0.44 ± 0.28, and 0.28 ± 0.18 bits/spike using spike patterns in IC, MGB, and A1, respectively; see Figures 3F–3H and the Supplemental Data).
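The matrix-based estimate illustrated in Figures 3A–3D can be sketched as follows. This is only the naive (plug-in) estimator over the empirical joint count matrix. Plug-in estimates are known to be biased upward when trials are few, and whatever correction the paper applies is described in its Experimental Procedures and is not reproduced here. The function names and the toy spike counts are hypothetical.

```python
import numpy as np

def joint_count_table(counts_per_stimulus):
    """Build the stimulus-by-spike-count histogram (rows: stimuli, columns: counts 0..max)."""
    max_count = max(max(trials) for trials in counts_per_stimulus)
    table = np.zeros((len(counts_per_stimulus), max_count + 1))
    for s, trials in enumerate(counts_per_stimulus):
        for c in trials:
            table[s, c] += 1
    return table

def naive_mi_bits(table):
    """Plug-in MI (bits) between row and column variables of a count table."""
    p = table / table.sum()
    p_s = p.sum(axis=1, keepdims=True)   # stimulus marginal
    p_r = p.sum(axis=0, keepdims=True)   # response marginal
    indep = p_s * p_r                    # product of marginals
    nz = p > 0                           # empty cells contribute nothing
    return float(np.sum(p[nz] * np.log2(p[nz] / indep[nz])))

# Hypothetical data: two stimuli, four repeats each, perfectly separable counts.
table = joint_count_table([[0, 0, 0, 0], [3, 3, 3, 3]])
mi = naive_mi_bits(table)   # 1.0 bit
```

The same `naive_mi_bits` call applies to any discrete response code, for example a table whose columns index distinct binary words instead of spike counts.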
These differences show that stimuli typically elicited responses that were more easily differentiated in IC neurons than in A1 neurons. However, most of this difference could be accounted for by the higher firing rate of IC neurons, which made individual responses overall more discriminable. The 2-fold reduction in single-neuron information that we observed between IC and A1 is counterbalanced by the substantially larger number of neurons in A1. Thus, if such differences in information levels are indeed typical for general sets of natural stimuli, they are not expected to affect the total representational capacity of A1 relative to the IC.

To better understand the meaning of the absolute MI values reported here, we consider the MI as a reduction in uncertainty. The total uncertainty of the stimulus ensemble, as quantified by the stimulus entropy, is log2(15) = 3.91 bits. Since the average A1 neuron carried 0.41 bits/trial, about ten independent A1 neurons would be enough to eliminate stimulus uncertainty. In other words, if information was additive across neurons, the identity of the stimulus could have been completely specified, on a trial-by-trial basis, using ten neurons only (and a correspondingly smaller number of IC neurons). Thus, the seemingly low information values computed here nevertheless imply that surprisingly small populations of neurons could be enough to discriminate between the stimuli used in this study.

Informational Redundancy

The above calculation estimates information carried by single neurons, but the information carried by a population of neurons could also depend on the relationships between the responses of different neurons (Pouget et al., 2003; Schneidman et al., 2003). It is customary to separate these relationships into two types. Signal correlations are due to a similarity in the neuronal responses across different stimuli.
These occur, for example, when several neurons have the same response profile (mean spike counts as a function of stimulus identity) over the stimulus ensemble. Noise correlations are due to common fluctuations in the responses to a given stimulus across repeated presentations. For example, when a stronger-than-average response of one neuron at a certain trial tends to occur with a stronger-than-average response of the other neuron in the same trial, their correlation is referred to as a noise correlation. As a rule, signal correlations always lead to redundancy, whereas noise correlations may lead either to redundancy or to its opposite, synergy. Before we discuss how signal and noise correlations can be quantified, we demonstrate the effect of signal correlations using spike-counts response profiles. We focus on signal correlations and assume for the moment that the noise correlations are negligible (this assumption will be experimentally verified below). Examples of response profiles are displayed for one pair of A1 neurons (Figure 4A) and one pair of IC neurons (Figure 4B). Whereas the responses of the IC neurons in this example exhibited covariation across the stimulus set, the A1 neurons did so to a much lesser extent. The same data can be studied from the view of the readout of spike responses. In this setting, the observed responses are used to infer which stimulus was presented. It is reasonable to assume that in this case, observing a response of the second IC neuron does not add much to the information supplied by the first, testifying that these two neurons provide redundant information. In contrast, observing the second A1 neuron may help proportionately more in identifying the stimulus. It will be shown below that this contrast between IC and A1 neuronal pairs is common, although we used a different way to quantify this difference. The natural tools to study synergy and redundancy in these terms are again information theoretic. 
Figures 4C and 4D show the joint distribution of the spike counts for the two pairs of neurons. The joint distribution of the IC pair (Figure 4D) shows a clear interdependence between the two neurons: large spike counts in one neuron tend to occur with large spike counts in the other neuron. In the A1 pair (Figure 4C), this dependence is much weaker if present at all. Thus, the dependence between the responses of the two neurons is another way of uncovering redundancies. In contrast with the mean-count response profiles, which are based on average spike counts and require ad hoc measures in order to quantify the redundancy, the degree of dependence in the joint distribution of the responses can be measured by the MI without any distributional assumptions. In addition, MI is not limited to spike counts, and we can calculate the distribution of other statistics of the responses, like the joint distribution of stimuli and spike patterns or latency as shown above.

Figure 4. Joint Distributions of Spike Counts
(A and B) Spike counts across the stimulus ensemble for a pair of IC cells (best frequencies 5.5 and 6.1 kHz) and a pair of A1 cells (both best frequencies are 5.1 kHz). Error bars = SEMs of the spike counts, for 20 repeats of the ensemble for A1 neurons and ten repeats for IC neurons. The sampling rate of the stimuli for the IC neurons was increased to place the center frequency of the chirp at BF.
(C and D) Joint distribution of spike counts across all repeats of all stimuli, of the same two pairs of IC and A1 neurons.

Most importantly, rather than investigating similarity in neuronal responses, we focus on informational redundancy, which quantifies the similarities between the sets of stimuli that can be discriminated using neuronal responses. To clarify the difference, consider the following example. A phasic neuron codes the identity of the stimulus in the timing of its burst.
Another, tonic neuron codes the identity of the stimulus in its overall spike count. The pair of neurons can be redundant if the timing of the phasic response is highly correlated with the number of spikes elicited in the tonic neuron by the same stimulus. In such a case, although each neuron has a very different response pattern and requires a different decoding method, the information they convey about the stimuli is similar. Therefore, nonredundancy is inherently different from “distinct tuning curves.” To address such potential heterogeneity in coding, we quantified redundancy conveyed through various aspects of spike trains: spike counts, latency, and spike patterns. Spike patterns are actually sensitive to both latency and total spike counts.

We quantified the signal correlations by the mutual information between responses of different neurons, I(X1; X2), using joint distributions as in Figures 4C and 4D. Noise correlations can be quantified by the stimulus-conditioned information I(X1; X2 | S), in which the MI is first estimated using the joint distribution of the responses for each stimulus separately, then averaged across stimuli. Estimating noise correlations requires more stimulus repetitions because the joint distributions of the responses are estimated from substantially smaller numbers of trials. In order to have a reliable estimate of the noise correlations, we measured the responses of a subset of neurons in A1 with 100 repetitions per stimulus. Noise correlations in these data were negligible (see Figure S1 in the Supplemental Data). Noise correlations are believed to be mostly due to network interactions, and are therefore expected to be more pronounced in higher processing stations. Since these correlations were negligible in A1, we conclude that they are of minor importance in MGB and a fortiori also in IC.
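The two quantities just defined can be sketched for a pair of simultaneously recorded neurons. This is a plug-in illustration under stated assumptions, not the paper's procedure: the function names are hypothetical, responses are integer spike counts arranged as (stimulus, trial), stimuli are taken to be equiprobable so that the average over stimuli is unweighted, and no bias correction is applied.

```python
import numpy as np

def mi_bits(x, y):
    """Plug-in MI (bits) between two equal-length sequences of non-negative integers."""
    x = np.asarray(x)
    y = np.asarray(y)
    table = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(table, (x, y), 1)          # joint histogram of paired observations
    p = table / table.sum()
    indep = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / indep[nz])))

def pooled_information(counts1, counts2):
    """I(X1;X2): MI of the joint distribution pooled over all stimuli and repeats."""
    return mi_bits(np.ravel(counts1), np.ravel(counts2))

def stimulus_conditioned_information(counts1, counts2):
    """I(X1;X2|S): MI computed within each stimulus, then averaged across stimuli."""
    return float(np.mean([mi_bits(a, b) for a, b in zip(counts1, counts2)]))

# Hypothetical pair: responses vary across stimuli but not across trials.
counts1 = [[1, 1], [3, 3]]   # neuron 1, shape (n_stimuli, n_trials)
counts2 = [[2, 2], [5, 5]]   # neuron 2, same trials
pooled = pooled_information(counts1, counts2)                     # 1.0 bit
conditioned = stimulus_conditioned_information(counts1, counts2)  # 0.0 bits
```

In this toy case all of the dependence between the two neurons comes from covariation across stimuli, so the pooled information is 1 bit while the stimulus-conditioned term vanishes.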
Thus, in the data discussed here, signal correlations seem to be dominant, leading to a possible predominance of information redundancy between neurons. This result allowed us to approximate the response distributions of several neurons as being conditionally independent given the stimulus (see Experimental Procedures). The stimulus-conditioned independence approximation has been used previously (Reich et al., 2001) when the within-stimulus correlations between simultaneously-recorded neurons are small, as they are here (Figure S1). This approximation has several considerable practical advantages: it allows us to use nonsimultaneously measured pairs of neurons and to couple them as if noise correlations were absent. It also provides a more reliable redundancy estimation, and therefore allows using fewer stimulus repeats or estimating higher-order redundancies.

To quantify the redundancy among larger groups of neurons caused by between-stimuli covariation, we used the measure of multi-information, a natural extension of mutual information, defined as

$$I(X_1;\ldots;X_n) = \sum_{x_1,\ldots,x_n} p(x_1,\ldots,x_n)\,\log\left(\frac{p(x_1,\ldots,x_n)}{p(x_1)\cdots p(x_n)}\right)$$

(Studený and Vejnarová, 1998). Redundancy was then defined as the normalized multi-information

$$\frac{I(X_1;\ldots;X_n)}{\sum_i I(X_i;S)}$$

where all joint distributions were approximated under the stimulus-conditioned independence approximation. Normalization was performed as in Brenner et al. (2000) and Reich et al. (2001) and is required in order to bring measures from different auditory stations to a unified scale (see the Experimental Procedures). Using unnormalized measures yields considerably more pronounced effects, which are shown in Figure S2. We first discuss redundancy when the neural response is summarized using the total spike counts, and we then discuss extensions to more complex measures of the responses further below.
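A minimal sketch of the normalized multi-information under the stimulus-conditioned independence approximation is given below. It assumes that each neuron is summarized by a table of conditional response probabilities p(x | s), that the joint distribution is built as the stimulus-weighted sum of products of these conditionals, that stimuli are equiprobable unless stated otherwise, and that the normalizer is the sum of single-neuron informations as written in the text. All function names and the toy neurons are hypothetical, and no bias correction is included.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (bits) of a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def single_neuron_mi(cond, p_s):
    """I(X;S) = H(X) - H(X|S) for cond[s, x] = p(x | s)."""
    h_cond = sum(ps * entropy_bits(row) for ps, row in zip(p_s, cond))
    return entropy_bits(p_s @ cond) - h_cond

def normalized_redundancy(conds, p_s=None):
    """Multi-information I(X1;...;Xn) divided by the sum of I(Xi;S)."""
    conds = [np.asarray(c, dtype=float) for c in conds]
    n_stim = conds[0].shape[0]
    p_s = np.full(n_stim, 1.0 / n_stim) if p_s is None else np.asarray(p_s, dtype=float)
    # Conditional independence: p(x1..xn) = sum_s p(s) * prod_i p(xi | s)
    joint = 0.0
    for s in range(n_stim):
        outer = conds[0][s]
        for c in conds[1:]:
            outer = np.multiply.outer(outer, c[s])
        joint = joint + p_s[s] * outer
    # Multi-information equals the sum of marginal entropies minus the joint entropy.
    multi_info = sum(entropy_bits(p_s @ c) for c in conds) - entropy_bits(joint)
    return multi_info / sum(single_neuron_mi(c, p_s) for c in conds)

# Hypothetical neurons with binary responses to four equiprobable stimuli.
first_bit = [[1, 0], [1, 0], [0, 1], [0, 1]]    # separates stimuli {0,1} from {2,3}
second_bit = [[1, 0], [0, 1], [1, 0], [0, 1]]   # separates stimuli {0,2} from {1,3}
same = normalized_redundancy([first_bit, first_bit])         # 0.5
different = normalized_redundancy([first_bit, second_bit])   # 0.0
```

With this normalizer, two identical noiseless neurons score 0.5 and two neurons that partition the stimuli along independent criteria score 0. Because the approximation needs only per-neuron conditionals, the neurons in a group do not have to be recorded simultaneously.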
Neurons in A1 and MGB were found to be significantly less redundant than neurons in IC in the way they code the stimulus identity (Figure 5A). The median normalized redundancy in IC was 0.13 (with a median absolute deviation from the median of 0.07), whereas in MGB it was 0.02 (±0.015) and in A1 0.03 (±0.015) (t test, p < 10^-10 for both IC-MGB and IC-A1 comparisons, not significant for A1-MGB comparison, p > 0.8). This phenomenon is even more pronounced when considering triplets of neurons (Figure 5B), where median normalized redundancies were 0.34, 0.03, and 0.05 in IC, MGB, and A1, respectively. These results suggest that information processing in the auditory pathway operates to achieve a neural representation in which neurons are tuned for independent stimulus properties.

The size of the redundancy could be strongly affected by how responses are represented. In particular, decoding spike trains using total spike counts neglects information conveyed by the temporal structure of the spike train. Higher and more accurate estimates of the MI may be obtained by using other statistics of the spike trains (de Ruyter van Steveninck et al., 1997; Panzeri et al., 1999; Victor, 2002; Nelken et al., 2005), since these may take into account temporal structures and high-order correlations within spike trains. To test how redundancies depend on spike train representations, we further estimated MI conveyed by three other statistics of the spike trains: distribution of spike patterns viewed as binary words (de Ruyter van Steveninck et al., 1997; Strong et al., 1998), first spike latency, and binless estimation based on embedding in Euclidean spaces (Victor, 2002) (see Experimental Procedures and the Supplemental Data). The size of the redundancy remained essentially the same.
For example, Figure 5C displays the distribution of normalized pair redundancy calculated using the spike patterns as binary words (as in de Ruyter van Steveninck et al., 1997; Strong et al., 1998; Nelken et al., 2005). These show that as in the case of spike count information, IC neurons are significantly more redundant than A1 and MGB neurons.

Controls: Stimulus Bandwidth, Anatomical Location, and Frequency Selectivity

In V1, redundancy has been shown to decrease when increasing the size of the visual stimulus (Vinje and Gallant, 2000, 2002). One possible analog of increasing the spatial size of a visual image is to increase the bandwidth of an auditory stimulus. In order to check the effect of bandwidth on redundancy reduction, we computed the redundancies elicited by a subset of stimuli consisting of all the narrowband versions (Main, Echo, and Main + Echo). These redundancies were also substantially larger in IC than in MGB and A1 (Figure 5D). Similar results were obtained when using only the remaining stimuli (data not shown). Thus, the large decrease in redundancy in MGB and A1 relative to the IC is not due only to the inclusion of both narrowband and wideband stimuli in the stimulus set.

The higher redundancy in IC could result from undersampling, because neurons recorded in the same electrode penetrations could share more response properties and therefore show higher redundancy. Neurons in IC are known to differ by a number of response properties, such as temporal response patterns, width of tuning curves, best modulation frequency, and strength of inhibition. At least some of these properties are organized in dorso-ventral columns (Ehret et al., 2003; Schreiner and Langner, 1997), which was also the direction of our electrode penetrations in most experiments. To address this problem, redundancy was analyzed separately in three proximity classes: from neurons recorded in the same penetration, in the same animal but in different penetrations, and in different animals (Figure 5E). MGB and A1 redundancy was found to depend somewhat on proximity class, although rather weakly (one-way ANOVA, p = 0.03 and p = 0.06, respectively). Redundancy in IC was largely independent of the proximity class. Thus, at least in MGB, neurons recorded in the same penetration, and therefore largely within the same MGB subdivision (as judged by anatomical reconstruction of the penetrations), were somewhat more redundant than neurons across penetrations. More importantly, however, redundancy in IC was significantly larger in all proximity classes than in any of the MGB and A1 proximity classes. Thus, anatomical considerations cannot explain the higher redundancy in IC relative to MGB and A1.

Figure 5. Informational Redundancy in the Coding of Stimulus Identity
(A) Distribution of normalized pairs’ redundancy I(X1; X2)/I(X1; X2; S) based on spike counts for cells in IC, MGB, and A1. Arrows denote group means.
(B) Distribution of normalized triplets’ redundancy I(X1; X2; X3)/I(X1; X2; X3; S) based on spike counts.
(C) Normalized pair redundancy for spike patterns coded as binary words.
(D) Distribution of redundancies as in (A) for a stimulus set restricted to stimuli having energy in a narrow frequency range (rows 2–3, Figure 1).
(E) Average spike count redundancies between pairs in three different proximity classes as explained in the text, in the three auditory stations. Error bars denote the SEM of each group, the number above each bar denotes the number of pairs in each group, and p is the p value for a one-way ANOVA test for the difference between the groups’ means.

When probed with pure tones, auditory neurons often exhibit high sensitivity to a specific frequency, termed their best frequency (BF). Common BF is therefore a potential source for redundancy between auditory neurons.
Neurons with the same BF would be expected to respond strongly to stimuli containing energy near their BF and respond weakly to other stimuli, generating stimulus-induced correlations as in Figure 4B. Figure 6B plots the normalized redundancy (whose distribution is presented in Figure 5A) between pairs of A1 neurons ordered by their BFs. Figure 6A plots the same measure for numerical simulations of auditory nerve fibers (ANFs; see Experimental Procedures) having the same set of BFs and responding to the same stimuli. Large redundancy values are observed for simulated ANF pairs with similar BFs, especially in the frequency range where the stimuli contain most of their energy. In contrast, A1 neurons in the same frequency range show essentially no redundancy. Figures 6C–6F quantify this effect by plotting pairs’ redundancy as a function of the difference between BFs in the three auditory stations and in the ANF simulation. BF similarity is correlated with strong redundancy in ANF simulations (regression slope of −0.088 bits/octave, n = 45, p < 10^-6) and in IC (slope −0.037 bits/octave, n = 39, p < 10^-6). This correlation is smaller in MGB and absent in A1 (MGB slope −0.0028 bits/octave, n = 36, p < 0.001; A1 slope −0.013 bits/octave, n = 45, not significant). The existence of these strong correlations in IC, and at the same time, weak correlations in MGB and A1, does not mean that all IC neurons with the same BF are redundant, as can be clearly seen in Figure 6D. Rather, there are many IC neurons with strong redundancy in their responses to this set of sounds, and the common feature of these pairs is a similar BF. The average dependence of redundancy on BF difference is similar in IC and in the ANF simulations. On the other hand, neither in MGB nor in A1 did we find any pair of neurons, even among those with similar BF, that had redundancy as large as in IC.
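The dependence of pair redundancy on BF difference (Figures 6C–6F) can be illustrated with an ordinary least-squares fit. The sketch below is an assumption-laden illustration and not the paper's analysis: it takes the BF difference to be the absolute log2 ratio of the two best frequencies, uses a plain linear fit without any significance test, and runs on made-up numbers. The function names are hypothetical.

```python
import numpy as np

def bf_difference_octaves(bf1_khz, bf2_khz):
    """Absolute distance between two best frequencies in octaves (factors of two)."""
    ratio = np.asarray(bf1_khz, dtype=float) / np.asarray(bf2_khz, dtype=float)
    return np.abs(np.log2(ratio))

def redundancy_slope(bf_diff_octaves, redundancy):
    """Least-squares line through (BF difference, redundancy); returns (slope, intercept)."""
    slope, intercept = np.polyfit(bf_diff_octaves, redundancy, deg=1)
    return float(slope), float(intercept)

# Made-up pairs whose redundancy falls by 0.05 per octave of BF separation.
diffs = np.array([0.0, 0.5, 1.0, 2.0])
redundancies = 0.2 - 0.05 * diffs
slope, intercept = redundancy_slope(diffs, redundancies)   # about (-0.05, 0.2)
```

A negative slope means that pairs with similar BFs are the more redundant ones, which is the pattern reported above for the ANF simulations and for IC.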
Discussion

By comparing information levels and informational redundancy across a sensory processing pathway, we identified a dramatic change in stimulus representation that reflects the characteristics of stimulus representation in these stations. Starting with a set of sounds that was designed to induce high informational redundancy in the auditory periphery (Figure 6A), we found that the redundancy was still substantial in IC but essentially disappeared in MGB and in A1. This reduction in redundancy was observed for any response representation, including spike counts, latency, or temporal spike patterns. While IC redundancies were correlated with the frequency sensitivity of IC neuronal pairs, this was not the case in A1 and MGB.

Figure 6. Redundancy and Frequency Selectivity
(A) Normalized redundancy plotted as a function of cells’ best frequencies. Simulated responses of auditory nerve fibers (ANFs) following the hair-cell model by Meddis (1986) as implemented by Slaney (Auditory toolbox ver2, 1998). The number of model ANFs and their best frequencies were matched to the A1 cells (B). Diagonal values (white) are omitted, since these values measure the redundancy of a cell with itself.
(B) Same plot as in (A), but for A1 neurons.
(C) Normalized redundancy between pairs of ANF model neurons as a function of BF difference.
(D) Same as in (C), for recorded IC neurons.
(E) Same as in (C), for recorded MGB neurons.
(F) Same as in (C), for recorded A1 neurons.

The IC integrates essentially all lower processing streams, and thus contains neurons which are potentially selective for complex features; it is also believed to contain a detailed representation of sounds in terms of their physical features, possibly in overlapping parameter maps (Casseday et al., 2002).
As expected, our findings suggest that this representation contains relatively high informational redundancies when stimuli have only small variations in their spectro-temporal structure. Above the IC, the representation of the spectro-temporal structure of sounds is degraded (see also Miller et al., 2002). Although we could expect an associated reduction in the ability of cortical neurons to encode the identity of sounds, we demonstrate that this reduction is rather small. While we do not know the nature of processing that MGB and A1 perform on the outputs of IC, we do show here that it results in reduced informational redundancy. Therefore, although A1 neurons respond to a wide range of stimuli in a way that is seemingly not stimulus-specific (Bar-Yosef et al., 2002; Middlebrooks et al., 1994; Schnupp et al., 2001) (see also Figures 3F–3H and Figure 4A), the information conveyed by different neurons is largely independent. The reason for this effect is that although A1 cells may respond similarly to some stimuli, the subsets of stimuli that evoke similar responses change from one A1 neuron to another. For example, some A1 neurons responded similarly to both the Natural and Noise variants of the stimulus (Figure 2G), while others responded similarly to Natural and Main but differently to the Noise variant (Figure 2H). Thus, each of these neurons groups the set of stimuli through a different criterion. Such distributed coding provides good discrimination between stimuli when observing multiple neurons, since together they partition the set of possible stimuli into small enough sets in an efficient way, given the partition capabilities of the single neurons. This interpretation again reflects the point of view that focuses on partitions of stimulus space rather than similarities of the responses.
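The partition argument above can be made concrete with a toy calculation. The sketch below is an illustration under stated assumptions, not the authors' analysis code: it uses hypothetical deterministic binary "neurons" responding to four equiprobable stimuli, assumes stimulus-conditional independence (as in Experimental Procedures), and evaluates the pairwise normalized redundancy I(X1; X2) / (I(X1; S) + I(X2; S)). All function and variable names are invented for the example.

```python
import numpy as np


def mutual_information_bits(joint):
    """Mutual information (bits) of a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    rows = joint.sum(axis=1, keepdims=True)
    cols = joint.sum(axis=0, keepdims=True)
    independent = rows * cols
    mask = joint > 0  # zero-probability cells contribute nothing
    return float(np.sum(joint[mask] * np.log2(joint[mask] / independent[mask])))


def normalized_redundancy(p_s, p_x1_given_s, p_x2_given_s):
    """I(X1;X2) / (I(X1;S) + I(X2;S)), assuming X1 and X2 are independent given S.

    p_s: stimulus probabilities, shape (n_stimuli,).
    p_x*_given_s: response probabilities, shape (n_stimuli, n_responses).
    Undefined when both neurons carry zero information about the stimulus.
    """
    p_s = np.asarray(p_s, dtype=float)
    a = np.asarray(p_x1_given_s, dtype=float)
    b = np.asarray(p_x2_given_s, dtype=float)
    info_1 = mutual_information_bits(p_s[:, None] * a)
    info_2 = mutual_information_bits(p_s[:, None] * b)
    # p(x1, x2) = sum_s p(s) p(x1|s) p(x2|s)
    joint_12 = np.einsum("s,si,sj->ij", p_s, a, b)
    return mutual_information_bits(joint_12) / (info_1 + info_2)


# Four equiprobable stimuli; columns are p(response = 0 | s), p(response = 1 | s).
P_S = [0.25, 0.25, 0.25, 0.25]
NEURON_A = [[0, 1], [0, 1], [1, 0], [1, 0]]  # groups stimuli {0, 1} vs. {2, 3}
NEURON_B = [[0, 1], [1, 0], [0, 1], [1, 0]]  # groups stimuli {0, 2} vs. {1, 3}

different_partitions = normalized_redundancy(P_S, NEURON_A, NEURON_B)
same_partition = normalized_redundancy(P_S, NEURON_A, NEURON_A)
```

In this toy case each neuron alone conveys 1 bit about the stimulus. When the two neurons partition the stimuli differently, their responses are unconditionally independent and the pair identifies the stimulus exactly; when they share the same partition, the second neuron adds nothing and the normalized redundancy is at its largest value for a pair.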
In the primary visual cortex, redundancy between neurons has been studied as a function of the size of the stimulated visual field (Vinje and Gallant, 2000, 2002), and was found to differ from auditory redundancies in two crucial aspects. Firstly, redundancies between neurons in V1 have been shown to decrease with increase in the spatial size of the stimulated visual field. One possible analog of a restricted stimulated field in vision is a narrowband stimulus in audition. However, using only narrowband stimuli reproduced the same decrease in redundancy in MGB and A1 relative to IC (Figure 5D). Thus, increase in bandwidth, at least between the narrow- and wideband stimuli used here, does not have the same effect in A1 as an increase in stimulated visual field has in V1: A1 neurons, like V1 neurons, show little redundancy with broadband (or large-field) stimuli. However, in A1, the low redundancy is retained when stimuli are narrowband, while in V1, smaller stimuli are associated with increased redundancy (Vinje and Gallant, 2000, 2002). Secondly, visual neural responses showed an increase in selectivity and a formation of a sparse representation of visual scenes. In A1, the redundancy was much reduced compared with the IC (Figure 5D), but A1 neurons responded to many different sounds in the set (Figures 3F and 3G and Figure 4A), and were actually less selective to stimulus identity than IC neurons. Thus, redundancy in visual cortex processing seems to operate differently than in A1 with regard to the size of the stimulus, although low redundancy may be achieved under the appropriate conditions (Reich et al., 2001). What could be the computational advantages of such a redundancy reduction process? A possible outcome is a ‘‘splitting’’ of the information inside a single frequency channel as suggested by de Cheveigne (2001). 
This addresses the difficult problem of segregating the spectrotemporal representation of complex soundscapes into distinct components that belong to different auditory objects, a segregation that is achieved by the auditory system in spite of possible overlaps between objects both in time and in frequency. More generally, Barlow suggested that reducing the redundancy between computing elements reflects a process where the system extracts meaningful structures in signals and codes them independently (Barlow, 2001). Indeed, reducing redundancy during information processing, by mapping stimuli to a higher-dimensional feature space, is known to provide better discrimination among complex inputs, as is done in independent component analysis (Bell and Sejnowski, 1995) and support vector machines (Vapnik, 1995). The increased coding independence of A1 neurons, compared to the IC, may thus reflect the extraction of relevant information from acoustic stimuli. This view is supported by our finding that A1 cells carry considerably less information about the spectro-temporal structure (relative to IC cells) than about the more abstract notion of stimulus identity (data not shown). Similar processes may characterize other modalities, as for example in inferotemporal visual neurons that are sensitive to the more abstract notion of a face, but less sensitive to its physical details. The observations presented here raise the hypothesis that obtaining representations with reduced redundancies in high processing stations is a generic organizational principle of sensory systems that allows easier readout of behaviorally relevant aspects of the natural scene.
Experimental Procedures

Information about Stimulus Identity
The mutual information (MI) between responses R and a set of stimuli S is defined in terms of their joint distribution p(S,R). When this distribution p(S,R) is known exactly, the MI can be calculated as

$$I(S;R) = \sum_{s,r} p(S,R) \log \frac{p(S,R)}{p(S)\,p(R)}$$

where $p(S) = \sum_{r} p(S,R)$ and $p(R) = \sum_{s} p(S,R)$ are the marginal distributions over the stimuli and responses, respectively. See the Supplemental Data for a more detailed description of how the MI is calculated in practice.

Information about stimulus identity was estimated using several methods. The MI between spike counts and stimuli was estimated using the histograms of the count distribution per each stimulus. The bins of the histogram were chosen to achieve near uniform marginal distribution, and the number of bins was chosen to maximize the bias-corrected information (using the method of Treves and Panzeri, 1995) conveyed by each cell (see Nelken et al., 2005 for more details). Latency information was similarly computed using a histogram estimation of latency distribution per stimulus. MI about counts and latencies was also estimated using a binless method (Victor, 2002), with essentially identical results (correlation coefficients between binless and binned estimations of MI across populations of neurons were 0.85, 0.93, and 0.96 in A1, MGB, and IC, respectively). In addition, MI was estimated using the distribution of binary words, following the method of de Ruyter van Steveninck et al. (1997) and Strong et al. (1998). To this end, each spike train was discretized in several temporal resolutions of 2, 4, 8, 16, and 32 ms (yielding 3–60 bins per word), and the resolution and temporal windows that yielded maximal (bias-corrected) MI were selected (usually 4 ms resolution). MI was also calculated by embedding spike trains in Euclidean spaces and using binless estimation strategies with the method of Victor (2002) and using the 2nd order expansion of Panzeri et al. (2001), again yielding similar results. In A1 and MGB, the MI in first spike latencies and binary words (de Ruyter van Steveninck et al., 1997) yielded about double the information that is conveyed by spike counts. In IC, using binary words (de Ruyter van Steveninck et al., 1997) yielded about 30% more information than spike counts.

For a finite sample size, mutual-information estimators that are based on an estimated joint distribution are biased, having on average positive information even when the two variables are independent. We estimated this bias by shuffling neural responses among all trials. This bias estimator was found to be consistent with the analytical approximation derived in Panzeri and Treves (1996) and Treves and Panzeri (1995). The resulting baseline information was subtracted from all information calculations. In all calculations of MI based on scalar statistics of the spike trains, the maximal magnitude of the biases did not exceed 15% of the information.

Redundancy Quantification
Informational redundancy between pairs of neurons can be quantified by the difference between information conveyed by a group of neurons and the sum of information conveyed by those neurons individually. The difference

$$I(X_1,\ldots,X_n;S) - \sum_{i=1}^{n} I(X_i;S)$$

is a measure of redundancy previously used for pairs of neurons (Brenner et al., 2000; Gat and Tishby, 1999; Narayanan et al., 2005; Rieke et al., 1997; Rolls and Treves, 1998; Schneidman et al., 2003; Warland et al., 1997). This can be also presented as the difference between two multi-information terms

$$I(X_1;\ldots;X_n \mid S) - I(X_1;\ldots;X_n)$$

where multi-information is a natural extension of mutual information, defined as the following (Studenty and Vejnarova, 1998):

$$I(X_1;\ldots;X_n) = \sum_{x_1,\ldots,x_n} p(x_1,\ldots,x_n) \log \frac{p(x_1,\ldots,x_n)}{p(x_1)\cdots p(x_n)}$$

The first term, stimulus-conditioned information, is large when neuronal responses are correlated per each given stimulus, and is zero only when the neuronal responses are independent given the stimulus. In our data, we found that the first term was small for pairs of neurons (see Figure S1), meaning that the joint distribution can be well approximated as being independent when conditioned on the stimulus. Formally, the joint conditional distribution $p(x_1,\ldots,x_n \mid s)$ is approximated by the product of the conditional marginals $\prod_{i=1}^{n} p(x_i \mid s)$ for every stimulus s. Note that this does not imply unconditional independence:

$$p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i).$$

To quantify the redundancy between a pair of neurons caused by between-stimuli covariation, we used the mutual information $I(X_1;X_2)$, where the neurons are coupled under the stimulus-conditioned independence approximation. For groups of neurons, we used the negative of their multi-information, again under the stimulus-conditional independence approximation. Since redundancy tends to grow when single-unit information about the stimulus grows, the varying information levels in the different auditory stations required normalizing the redundancies to a unified scale. Under conditional independence the redundancy is limited by the sum of single-unit information terms:

$$I(X_1;\ldots;X_n) \le I(X_1;\ldots;X_n;S) = \sum_{i=1}^{n} I(X_i;S).$$

The redundancy was therefore normalized as the following (as in Brenner et al., 2000; Reich et al., 2001):

$$\frac{I(X_1;\ldots;X_n)}{\sum_{i} I(X_i;S)}.$$

The effects described in this paper are considerably more pronounced for the unnormalized measures (see Figure S2).

Electrophysiological Recordings
For detailed methods, see Bar-Yosef et al. (2002). Extracellular recordings were made in A1 of nine halothane-anesthetized cats, in medial geniculate body of two halothane-anesthetized cats, and inferior colliculus of nine isoflurane-anesthetized and two halothane-anesthetized cats.
Anesthesia was induced by ketamine and xylazine and maintained with halothane (0.25%–1.5%, all A1 and MGB cats, and two IC cats) or isoflurane (0.1%–2%, nine IC cats) in 70% N2O using standard protocols authorized by the committee for animal care and ethics of the Hebrew University Hadassah Medical School (A1, MGB, and IC recordings) and Johns Hopkins University (IC recordings). Single neurons were recorded using metal microelectrodes and an online spike sorter (MSD, Alpha-Omega) or a Schmitt trigger. MGB neurons were further sorted offline. All neurons were well separated. In total we used data from 45 A1 neurons, 36 MGB neurons, and 39 IC neurons. In A1, penetrations were performed over the whole dorso-ventral extent of the appropriate frequency slab (between about 2 and 8 kHz). In MGB, all penetrations were vertical, traversing a number of isofrequency laminae, and recording locations have been histologically localized in all divisions. In IC, vertical penetrations were used in all experiments except one, in which electrode penetrations were performed at a shallow angle through the cerebellum, traversing the IC in a caudo-rostral axis. We tried to map the full medio-lateral extent of the nucleus, but in each animal only a small number of electrode penetrations were performed. Based on the sequence of best frequencies along the track, the IC recordings are most likely in the central nucleus. Stimuli were presented 20 times (A1 and MGB recordings and IC recordings in 12 neurons) and 5–20 times (IC recordings in 27 neurons). For 13 IC neurons, the sampling rate of the stimuli was increased to place the center frequency of the chirp at their BF. Signals were presented to the animals using sealed, calibrated earphones at 60–80 dB SPL, at the preferred aurality of the neurons as determined using broadband noise bursts. Sounds are from the Cornell Laboratory of Ornithology and have been selected and modified as in Bar-Yosef et al. (2002).
The responses to the Natural and Main versions in A1 have been described in Bar-Yosef et al. (2002); the rest of the data in A1 and all the MGB and IC responses are new.

ANF Simulations
Responses of auditory nerve fibers were simulated using an auditory toolbox for Matlab by Slaney (Auditory toolbox ver2. Technical report. 1998). The peripheral filters are gammatone filters. They are followed by half-wave rectification and low-pass filtering as implemented in a version of the Meddis hair-cell model (Meddis, 1986). Spikes were generated by a nonhomogeneous Poisson generator, using the output of the hair-cell stage as a rate function.

Supplemental Data
The Supplemental Data for this article can be found online at http://www.neuron.org/cgi/content/full/51/3/359/DC1/.

Acknowledgments
This work has been supported by a grant from the Human Frontiers Science Program and by a grant from the Israeli Science Foundation (ISF). G.C. was supported by a grant from the Israeli Ministry of Science.

Received: November 3, 2005
Revised: May 1, 2006
Accepted: June 28, 2006
Published: August 2, 2006

References
Barlow, H. (2001). Redundancy reduction revisited. Network 12, 241–253.
Barlow, H.B. (1961). Possible principles underlying the transformation of sensory messages. In Sensory Communication, I.W. Rosenblith, ed. (Cambridge, MA: MIT Press), pp. 217–234.
Bar-Yosef, O., Rotman, Y., and Nelken, I. (2002). Responses of neurons in cat primary auditory cortex to bird chirps: effects of temporal and spectral context. J. Neurosci. 22, 8619–8632.
Becker, S., and Hinton, G.E. (1992). Self-organizing neural network that discovers surfaces in random-dot stereograms. Nature 355, 161–163.
Bell, A.J., and Sejnowski, T.J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7, 1129–1159.
Borst, A., and Theunissen, F.E. (1999). Information theory and neural coding. Nat. Neurosci. 2, 947–957.
Brenner, N., Strong, S.P., Koberle, R., Bialek, W., and de Ruyter van Steveninck, R.R. (2000). Synergy in a neural code. Neural Comput. 12, 1531–1552.
Casseday, J.H., Fremouw, T., and Covey, E. (2002). The Inferior Colliculus: A Hub for the Central Auditory System. In Integrative Functions in the Mammalian Auditory Pathway, D. Oertel, R.R. Fay, and A.N. Popper, eds. (New York: Springer), pp. 238–318.
Cover, T., and Thomas, J. (1991). Elements of Information Theory (New York: Wiley and Sons).
de Cheveigne, A. (2001). The auditory system as a ‘‘separation machine’’. In Physiological and Psychophysical Bases of Auditory Function, D.J. Breebart, A.J.M. Houtsma, A. Kohlrausch, V.F. Prijs, and R. Schoonhoven, eds. (Maastricht, The Netherlands: Shaker Publishing), pp. 453–460.
de Ruyter van Steveninck, R.R., Lewen, G.D., Strong, S.P., Koberle, R., and Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science 275, 1805–1808.
Ehret, G., Egorova, M., Hage, S.R., and Muller, B.A. (2003). Spatial map of frequency tuning-curve shapes in the mouse inferior colliculus. Neuroreport 14, 1365–1369.
Escabi, M.A., Miller, L.M., Read, H.L., and Schreiner, C.E. (2003). Naturalistic auditory contrast improves spectrotemporal coding in the cat inferior colliculus. J. Neurosci. 23, 11489–11504.
Fritz, J., Shamma, S., Elhilali, M., and Klein, D. (2003). Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nat. Neurosci. 6, 1216–1223.
Gat, I., and Tishby, N. (1999). Synergy and redundancy among brain cells of behaving monkeys. Paper presented at: Advances in Neural Information Processing Systems (Denver, CO: MIT Press).
Levy, W.B., and Baxter, R.A. (1996). Energy efficient neural codes. Neural Comput. 8, 531–543.
Levy, W.B., and Baxter, R.A. (2002). Energy-efficient neuronal computation via quantal synaptic failures. J. Neurosci. 22, 4746–4755.
Linsker, R. (1988). Self-organization in a perceptual network. IEEE Computer 21, 105–117.
Meddis, R. (1986). Simulation of mechanical to neural transduction in the auditory receptor. J. Acoust. Soc. Am. 79, 702–711.
Middlebrooks, J.C., Clock, A.E., Xu, L., and Green, D.M. (1994). A panoramic code for sound location by cortical neurons. Science 264, 842–844.
Miller, G.A. (1956). The magical number seven plus or minus two: some limits on our capacity for processing information. Psychol. Rev. 63, 81–97.
Miller, L.M., Escabi, M.A., Read, H.L., and Schreiner, C.E. (2002). Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex. J. Neurophysiol. 87, 516–527.
Narayanan, N., Kimchy, E., and Laubach, M. (2005). Redundancy and synergy of neuronal ensembles in motor cortex. J. Neurosci. 25, 4207–4216.
Nelken, I., Chechik, G., Mrsic-Flogel, T.D., King, A.J., and Schnupp, J. (2005). Encoding stimulus information by spike numbers and mean response time in primary auditory cortex. J. Comp. Neurosci. 19, 199–221.
Olshausen, B.A., and Field, D.J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609.
Panzeri, S., and Treves, A. (1996). Analytical estimates of limited sampling biases in different information measures. Network 7, 87–101.
Panzeri, S., Schultz, S.R., Treves, A., and Rolls, E.T. (1999). Correlations and the encoding of information in the nervous system. Proc. R. Soc. Lond. B Biol. Sci. 266, 1001–1012.
Panzeri, S., Petersen, R.S., Schultz, S.R., Lebedev, M., and Diamond, M.E. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron 29, 769–777.
Pouget, A., Dayan, P., and Zemel, R.S. (2003). Inference and computation with population codes. Annu. Rev. Neurosci. 26, 381–410.
Reich, D.S., Mechler, F., and Victor, J.D. (2001). Independent and redundant information in nearby cortical neurons. Science 294, 2566–2568.
Rieke, F., Bodnar, D.A., and Bialek, W. (1995). Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B Biol. Sci. 262, 259–265.
Rieke, F., Warland, D., de Ruyter van Steveninck, R., and Bialek, W. (1997). Spikes (Cambridge, MA: MIT Press).
Rolls, E.T., and Treves, A. (1998). Neural Networks and Brain Function (Oxford, England: Oxford University Press).
Schneidman, E., Bialek, W., and Berry, M.J., 2nd. (2003). Synergy, redundancy, and independence in population codes. J. Neurosci. 23, 11539–11553.
Schnupp, J.W., Mrsic-Flogel, T.D., and King, A.J. (2001). Linear processing of spatial cues in primary auditory cortex. Nature 414, 200–204.
Schreiner, C.E., and Langner, G. (1997). Laminar fine structure of frequency organization in auditory midbrain. Nature 388, 383–386.
Shannon, C.E. (1948). A mathematical theory of communication. Bell Sys. Tech. J. 27, 379–423.
Strong, S.P., Koberle, R., de Ruyter van Steveninck, R., and Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett. 80, 197–200.
Studenty, M., and Vejnarova, J. (1998). The multiinformation function as a tool for measuring stochastic dependence. In Learning in Graphical Models, M.I. Jordan, ed. (Dordrecht, The Netherlands: Kluwer Academic Publishers), pp. 261–297.
Treves, A., and Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Comput. 7, 399–407.
Vapnik, V. (1995). The Nature of Statistical Learning Theory (New York: Springer).
Victor, J.D. (2002). Binless strategies for estimation of information from neural data. Phys. Rev. E. Stat. Nonlin. Soft Matter Phys. 66, 51903–51918.
Vinje, W.E., and Gallant, J.L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science 287, 1273–1276.
Vinje, W.E., and Gallant, J.L. (2002). Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci. 22, 2904–2915.
Warland, D., Reinagel, P., and Meister, M. (1997). Decoding visual information from a population of retinal ganglion cells. J. Neurophysiol. 78, 2336–2350.