Predictive information in a sensory population

Abstract
Prediction is an essential part of life. However, are we really “good” at making predictions? More specifically, are pieces of our brain close to being optimal predictors? To assess the efficiency of prediction, we need to measure the information that neurons carry about the future of our sensory experiences. We show how to do this, at least in simplified contexts, and find that groups of neurons in the retina indeed are close to maximally efficient at separating predictive information from the nonpredictive background. Efficient coding of predictive information is a principle that can be applied at every stage of neural computation.
Guiding behavior requires the brain to make predictions about the future values of sensory inputs. Here, we show that efficient predictive computation starts at the earliest stages of the visual system. We compute how much information groups of retinal ganglion cells carry about the future state of their visual inputs and show that nearly every cell in the retina participates in a group of cells for which this predictive information is close to the physical limit set by the statistical structure of the inputs themselves. Groups of cells in the retina carry information about the future state of their own activity, and we show that this information can be compressed further and encoded by downstream predictor neurons that exhibit feature selectivity that would support predictive computations. Efficient representation of predictive information is a candidate principle that can be applied at each stage of neural computation.
Almost all neural computations involve making predictions. Whether we are trying to catch prey, avoid predators, or simply move through a complex environment, the data we collect through our senses can guide our actions only to the extent that these data provide information about the future state of the world. Although it is natural to focus on the prediction of rewards (1), prediction is a much broader problem, ranging from the extrapolation of the trajectories of moving objects to the learning of abstract rules that describe the unfolding pattern of events around us (2–4). An essential aspect of the problem in all these forms is that not all features of the past carry predictive power. Because there are costs associated with representing and transmitting information, it is natural to suggest that sensory systems have optimized coding strategies to keep only a limited number of bits of information about the past, ensuring that these bits are maximally informative about the future. This principle can be applied at successive stages of signal processing, as the brain attempts to predict future patterns of neural activity. We explore these ideas in the context of the vertebrate retina, provide evidence for near-optimal coding, and find that this performance cannot be explained by classical models of ganglion cell firing.
The structure of the prediction problem depends on the structure of the world around us. In a world of completely random stimuli, for example, prediction is impossible. Consider a simple visual world such that, in the small patch of space represented by the neurons from which we record, there is just one object (a dark horizontal bar against a light background) moving along a trajectory x(t), as in Fig. 1.
If we look at neural responses in small windows of time, the joint pattern of spiking and silence across a group of cells in each window defines a binary "word" w.
We can make these ideas precise by estimating, in bits, the information I(w; x(t + δt)) that the words w carry about the position of the bar at a time shifted by δt relative to the response, where negative δt probes the encoding of the past and positive δt probes prediction of the future.
As expected, the retina is most informative about the position of the object roughly 100 ms in the past, reflecting the delays inherent in retinal processing (Fig. 1B).
Even if we keep a perfect record of everything we have experienced until the present moment, we cannot make perfect predictions: all of the things we observe are influenced by causal factors that we cannot observe, and from our point of view the time evolution of our sensory experience thus has some irreducible level of stochasticity. Formally, we imagine that we are sitting at time t, having observed the entire past of the sensory input, X_past, and trying to say something about its future, X_future; the most we can ever capture is the predictive information I(X_past; X_future).
Not all aspects of our past experience are useful in making predictions. Suppose that we build a compressed representation Z of our past experience, keeping some features and throwing away others. We can ask how much predictive information is captured by these features, I(Z; X_future), and compare this with the cost of the representation, the number of bits it retains about the past, I(Z; X_past); the maximally efficient predictor captures as much information about the future as possible at fixed cost.
Building the maximally efficient predictor is nontrivial, even in seemingly simple cases. For an object with trajectories as in Fig. 1, knowledge of the object’s position and velocity at time t provides all of the information possible about the future trajectory. However, knowing position and velocity exactly requires an infinite amount of information. If, instead, we know the position and velocity only with some errors, we can draw an error ellipse in the position–velocity plane, as shown in Fig. 2B, and the area of this ellipse is related to the information that we have captured about the past. Points inside the error ellipse extrapolate forward to a cloud of possible futures. The key point is that error ellipses with the same area but different shapes or orientations—using, for example, the limited number of available bits to provide information about position vs. velocity—extrapolate forward to clouds of different sizes. Thus, to make the best predictions, we have to be sure that our budget of bits about the past is used most effectively, and this is true even when prediction is “just” extrapolation.
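The tradeoff among error ellipses can be made concrete with a small numerical sketch. Under spring-and-damping dynamics of the kind used for the bar (the parameter values below are illustrative, not the experimental ones), two Gaussian uncertainty ellipses with the same area, and hence the same number of bits about the past, extrapolate to very different uncertainties about the future position:

```python
import numpy as np
from scipy.linalg import expm

# Linearized dynamics of a bar bound by a spring with damping:
# d/dt [x, v] = A0 [x, v] + noise. Parameter values are illustrative.
w, G = 2 * np.pi * 1.5, 20.0          # spring frequency (rad/s), damping (1/s)
A0 = np.array([[0.0, 1.0], [-w**2, -G]])
A = expm(A0 * 0.1)                    # deterministic propagator over 100 ms

def future_pos_var(C):
    """Variance of the extrapolated future position, given a Gaussian
    error ellipse (covariance C) over present position and velocity."""
    return (A @ C @ A.T)[0, 0]

# Two error ellipses with the same area (same determinant, hence the
# same number of bits about the past for a Gaussian) but different shapes:
C_pos = np.diag([0.01, 1.0])          # precise position, vague velocity
C_vel = np.diag([1.0, 0.01])          # vague position, precise velocity
print(future_pos_var(C_pos), future_pos_var(C_vel))
```

With these particular (overdamped) parameters, the position-precise ellipse extrapolates to a far tighter prediction, illustrating that a fixed budget of bits about the past must be allocated carefully.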
The statement that the neural response w provides information about a feature f in the stimulus means that there is a reproducible relationship between these two variables (5–8). To probe this reproducibility, we must present the same features many times, and sample the distribution of responses P(w|f). The average information then is

I(w; f) = Σ_f P(f) Σ_w P(w|f) log2 [ P(w|f) / P(w) ],

where the features are drawn from the distribution P(f) and P(w) = Σ_f P(f) P(w|f) is the overall distribution of responses.
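This average over features and responses can be computed directly from the distributions. A minimal sketch in Python, using made-up two-state distributions purely to check the limiting cases:

```python
import numpy as np

def mutual_information(p_f, p_w_given_f):
    """I(w; f) in bits, from P(f) (shape F) and P(w|f) (shape F x W)."""
    p_joint = p_f[:, None] * p_w_given_f          # joint P(f, w)
    p_w = p_joint.sum(axis=0, keepdims=True)      # marginal P(w)
    denom = p_f[:, None] * p_w                    # P(f) P(w)
    m = p_joint > 0
    return float(np.sum(p_joint[m] * np.log2(p_joint[m] / denom[m])))

p_f = np.array([0.5, 0.5])
perfect = np.eye(2)              # response reproduces the feature exactly
noisy = np.full((2, 2), 0.5)     # response independent of the feature
print(mutual_information(p_f, perfect), mutual_information(p_f, noisy))
# -> 1.0 bit for the perfect code, 0.0 bits for the independent one
```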
In the case of interest here, the feature f is the future of the stimulus. To measure the information that neural responses carry about the future, we thus need to repeat the future. More precisely, we need to generate stimulus trajectories that are different but converge onto the same future. Given that we can write the distribution of trajectories as depending on the past only through the current state of the bar (its position and velocity), pasts drawn independently but conditioned to end in the same state can all be spliced onto a single common future.
If trajectories converge onto a common future at time t = 0, then the words written by the retina around this moment differ from trial to trial, and we can measure, exactly as above, the information that these words carry about which future will unfold.
The results for the five-cell group in Fig. 3C, which has a modest amount of information about the future, are not unusual. For each of the 53 neurons in the population that we monitor, we can find the group of cells, including this neuron, that has the most future information. These groups also operate close to the bound in the information plane (Fig. 3D).
Could near-optimal prediction result from known receptive field properties of these cells? To test this, we have made conventional linear/nonlinear (LN) models of the individual neurons in our dataset (21, 22): image sequences are projected linearly onto a template (spatiotemporal receptive field), and the probability of spiking is a nonlinear function of this projection. We fit these models to the responses of each neuron to a long movie with the same statistics as in Fig. 3, and we adjust the nonlinearity to match the mean spike probability and the information captured about the past by single cells (details in Linear–Nonlinear Model). We then analyze the performance of the model populations in exactly the same way that we analyze the real populations. Populations of LN neurons fall far below the bound on predictive information, and this gap grows with the number of neurons (Fig. S1), in marked contrast to the real data (Fig. 3D). Interestingly, the models are not so far from the performance of an optimal system that has access only to data from ~100 ms in the past, comparable to the delay one might guess from Fig. 1B. Rather than being a consequence of the receptive fields for individual neurons, the near-optimal performance that we see in Fig. 3D thus is evidence that conventional models are missing the ability of the retina to overcome apparent delays in the encoding of dynamic inputs.
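To make the model class concrete, here is a minimal sketch of an LN neuron; the filter, the logistic form of the nonlinearity, and all parameter values are placeholders rather than fits to the recorded cells:

```python
import numpy as np

rng = np.random.default_rng(0)

def ln_spike_prob(stimulus, receptive_field, gain=5.0, threshold=1.0):
    """Linear/nonlinear (LN) model: project the recent stimulus history
    onto a spatiotemporal receptive field, then pass the result through
    a saturating nonlinearity to obtain a per-bin spike probability."""
    g = np.dot(stimulus.ravel(), receptive_field.ravel())   # linear stage
    return 1.0 / (1.0 + np.exp(-gain * (g - threshold)))    # logistic nonlinearity

# Hypothetical example: 20 time bins x 10 spatial positions
rf = rng.standard_normal((20, 10)) * 0.1    # placeholder receptive field
stim = rng.standard_normal((20, 10))        # one stimulus history segment
p = ln_spike_prob(stim, rf)
spike = rng.random() < p                    # Bernoulli spiking given p
print(p, spike)
```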
It seems natural to phrase the problem of prediction in relation to the visual stimulus, as in Fig. 3, but the brain has no direct access to visual stimuli except that provided by the retina. Could the brain learn, in an unsupervised fashion, to predict the future of retinal outputs? More precisely, if we observe that a population of retinal ganglion cells generates the word w at one moment, how much information does this provide about the word that the same population will generate a time δt later?
In Fig. 4A, we show an example of this internal predictive information, the information that the current word carries about the word the same cells will generate a time δt in the future.
With these naturalistic stimuli, larger groups of cells carry predictive information for hundreds of milliseconds, as shown in Fig. 4C, and the maximum predictive information is above 1 bit/spike on average across the thousands of groups that we sampled. Smaller groups of cells do not carry long-term predictive power, and for short-term predictions they carry roughly one-half the information per spike that we see in larger groups.
The large amounts of predictive information that we see in neural responses are tied to the structure of the sensory inputs (Fig. 4D). Naturalistic movies generate the strongest and longest-ranged predictability; the responses to random checkerboard movies lose predictability within a few frames; and motion of a single object (as in Fig. 1) gives intermediate results. The internal dynamics of the retina could generate predictable patterns of activity even in the absence of predictable structure in the visual world, but this does not seem to happen. This raises the possibility that trying to predict the future state of the retina from its current state can lead us (or the brain) to focus on patterns of activity that are especially informative about the visual world.
The predictive information carried by N neurons is more than N times the information carried by single neurons: the code for the future is synergistic, so predictions must be read out from combinations of spiking and silence across cells, and it is natural to ask whether this information can be compressed into a simpler signal.
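Synergy of this kind is easy to illustrate with a minimal toy code (an assumed XOR construction, not the recorded data): each cell alone carries nothing about the future bit, while the pair together carries a full bit.

```python
import numpy as np

def info_bits(p_joint):
    """Mutual information (bits) of a joint probability table."""
    px = p_joint.sum(axis=1, keepdims=True)
    py = p_joint.sum(axis=0, keepdims=True)
    m = p_joint > 0
    return float(np.sum(p_joint[m] * np.log2(p_joint[m] / (px * py)[m])))

# Assumed toy code: two cells spike independently and uniformly, and the
# "future" bit is their XOR. Joint P((w1, w2), future):
p_pair = np.zeros((4, 2))          # rows: words (w1, w2) = 00, 01, 10, 11
for w1 in (0, 1):
    for w2 in (0, 1):
        p_pair[2 * w1 + w2, w1 ^ w2] = 0.25
# Marginalize to the single cells:
p_cell1 = np.array([p_pair[0] + p_pair[1], p_pair[2] + p_pair[3]])
p_cell2 = np.array([p_pair[0] + p_pair[2], p_pair[1] + p_pair[3]])
print(info_bits(p_cell1), info_bits(p_cell2), info_bits(p_pair))
# -> 0.0, 0.0, 1.0: each cell alone is uninformative; the pair carries 1 bit
```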
Compressing our description of input words down to 1 bit means sorting the words w into two groups, which we can identify with the spiking and silence of a downstream “predictor” neuron; the optimal sorting is the one for which this single bit captures as much information as possible about the future state of the retinal population.
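As a sketch of this construction, the following code exhaustively searches binary sortings of the present words to maximize the information that the one-bit output carries about the future word. The joint distribution over words is an assumed toy example, and exhaustive search is feasible only for small word sets; real codebooks would require a smarter optimization.

```python
import itertools
import numpy as np

def info_bits(p_joint):
    """Mutual information (bits) of a joint probability table."""
    px = p_joint.sum(axis=1, keepdims=True)
    py = p_joint.sum(axis=0, keepdims=True)
    m = p_joint > 0
    return float(np.sum(p_joint[m] * np.log2(p_joint[m] / (px * py)[m])))

def best_predictor(p_words):
    """Sort present words into two groups (spike vs. silence of a
    'predictor neuron') so that this single bit is maximally informative
    about the future word. p_words[i, j] = P(w_now = i, w_future = j)."""
    n = p_words.shape[0]
    best_sigma, best_i = None, -1.0
    for labels in itertools.product([0, 1], repeat=n):
        sigma = np.array(labels)
        # Joint P(sigma, w_future): pool present-word rows within each group
        p_sf = np.stack([p_words[sigma == s].sum(axis=0) for s in (0, 1)])
        i = info_bits(p_sf)
        if i > best_i:
            best_sigma, best_i = sigma, i
    return best_sigma, best_i

# Assumed toy joint distribution over 4 words, in which words {0,1} and
# words {2,3} each tend to be followed by words from their own group:
p = np.array([[0.20, 0.10, 0.00, 0.00],
              [0.10, 0.20, 0.00, 0.00],
              [0.00, 0.00, 0.20, 0.00],
              [0.00, 0.00, 0.00, 0.20]])
sigma, i = best_predictor(p)
print(sigma, i)   # the optimal split separates {0, 1} from {2, 3}
```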
Predictor neurons are constructed without reference to the stimulus—just as the brain would have to do—but by repeating the same naturalistic movie many times, we can measure the information that the spiking of a predictor neuron carries about the visual input, using standard methods (15, 23). As we see in Fig. 5C, model neurons that extract more predictive information also provide more information about the visual inputs. There is some saturation in this relationship, perhaps because the most effective predictor neurons are more efficient in selecting the relevant bits of the past. Nonetheless, it is clear that, by solving the prediction problem, the brain can “calibrate” the combinations of spiking and silence in the ganglion cell population, grouping them in ways that capture more information about the visual stimulus.
If we return to the simple world of a single bar moving on the screen, as above, then we can see that the spikes in predictor neurons are associated with interesting patterns of motion. One example is in Fig. 5D, where we see that a spike corresponds to an exceptionally long period of nearly constant velocity motion, followed by a reversal. Other examples include periods of high speed, independent of direction, or moments where the bar is located at a particular position with very high precision (see Feature Selectivity in Predictor Neurons and Fig. S3 for details). These results, which need to be explored more fully, support the intuition that the visual system computes motion not for its own sake, but because, in a world with inertia, motion estimation provides an efficient way of representing the future state of the world.
Information theory defines the capacity of a signal to carry information (the entropy), but information itself is always information about something; successful applications of information theoretic ideas to biological systems are cases where it is clear which information is relevant. However, how can we use information theory to think about neural coding and computation more generally? It is difficult to guess how organisms will value information about particular features of the world, but value can be attached only to bits that have the power to predict the organism’s future experience. Estimating how much information neural responses carry about the future of sensory stimuli, even in a simple world, we have found evidence that the retina provides an efficient, and perhaps nearly optimal, representation of predictive information (Fig. 3).
Efficient representation of predictive information is a principle that can be applied at every layer of neural processing. As an illustration, we consider the problem of a single neuron that tries to predict the future of its inputs from other neurons, and encodes its prediction in a single output bit—spiking or silence. This provides a way of analyzing the responses from a population of neurons that makes no reference to anything but the responses themselves, and in this sense provides a model for the kinds of computations that the brain can do. Predictive information in the patterns of activity is coded synergistically (Fig. 4), maximally efficient representations of this information involve spiking at reasonable rates, without any further constraints, and the optimal predictor neurons are efficient transmitters of information about the sensory input, even though the rules for optimal prediction are found without looking at the stimulus (Fig. 5). Thus, solving the prediction problem would allow the brain to identify features of the retina’s combinatorial code that are especially informative about the visual world, without any external calibration.
The idea that neural coding of sensory information might be efficient, or even optimal, in some information theoretic sense, is not new. Individual neurons have a capacity to convey information that depends on the time resolution with which spikes are observed, and one idea is that this capacity should be used efficiently (24, 25), in part by adapting coding strategies to the distribution of the inputs (26–28). Another idea is that neighboring cells in the retina should not waste their capacity by transmitting redundant signals, and minimizing this redundancy may drive the emergence of spatially differentiating receptive fields (29, 30). Similarly, temporal filtering may serve to minimize redundancy in time (31), and this is sometimes called “predictive coding” (32). Reducing redundancy requires removing any predictable components of the input, keeping only the deviations from expectation. In contrast, immediate access to predictive information requires an encoding of those features of the past that provide the basis for optimal prediction. The retina actively responds to predictable features of the visual stimulus (33) and, in the case of smooth motion, can anticipate an object’s location in a manner that corrects for its own processing delay (34, 35). Our current results suggest that, even for irregular motion, the retina can efficiently extract the features of the stimulus that allow it to encode all available predictive information. Efficient coding of predictive information is therefore a very different principle from most of those articulated previously, and one that illustrates the surprising computational powers of local neural circuits, like the retina.
Although there has been much interest in the brain’s ability to predict particular things, our approach emphasizes that prediction is a general problem, which can be stated in a unified mathematical structure across many contexts, from the extrapolation of trajectories to the learning of rules (20). Our results on the efficient representation of predictive information in the retina thus may hint at a much more general principle.
Multielectrode Recordings.
Data were recorded from larval tiger salamander retina using dense 252-electrode arrays with 30-μm spacing, as described in ref. 36. A piece of retina was freshly dissected and pressed onto the multielectrode array. While the tissue was perfused with Ringer’s solution, images from a computer monitor were projected onto the photoreceptor layer via an objective lens. Voltages were recorded from the 252 electrodes at 10 kHz throughout the experiments, which lasted 4–6 h. Spikes were sorted conservatively (36), yielding populations of 49 or 53 identified cells from two experiments, from which groups of different sizes were drawn for analysis.
Stimulus Generation and Presentation.
Movies were presented to the retina from a computer monitor, with images projected onto the photoreceptor layer through an objective lens, as described above.
Motion Trajectories.
The moving-bar stimulus was generated by a stochastic process that is equivalent to the Brownian motion of a particle bound by a spring to the center of the display: the position x and velocity v of the bar at each time t were updated according to the following:

x(t + dt) = x(t) + v(t) dt,
v(t + dt) = v(t) − [Γ v(t) + ω² x(t)] dt + ξ(t) √dt,

where Γ is a damping constant, ω is the natural frequency of the spring, and ξ(t) is a zero-mean Gaussian random force, drawn independently at each time step, whose variance sets the amplitude of the motion.
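A minimal simulation of such a trajectory, using an Euler–Maruyama discretization with illustrative parameter values (not those of the experiment):

```python
import numpy as np

def bar_trajectory(T, dt=1/60, w=2 * np.pi * 1.5, G=20.0, D=200.0, seed=0):
    """Simulate a bar bound by a spring to the center of the display:
    dx = v dt,  dv = (-G v - w^2 x) dt + sqrt(2 D dt) * xi,
    with xi a unit Gaussian per step. Parameters are illustrative."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    x, v = np.zeros(n), np.zeros(n)
    for t in range(n - 1):
        x[t + 1] = x[t] + v[t] * dt
        v[t + 1] = v[t] + (-G * v[t] - w**2 * x[t]) * dt \
                   + np.sqrt(2 * D * dt) * rng.standard_normal()
    return x, v

x, v = bar_trajectory(T=10.0)
print(x.std(), v.std())   # stationary spread of position and velocity
```

For this overdamped parameter choice the stationary variances are approximately D/(G w^2) for position and D/G for velocity, which the simulated spreads should roughly match.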
Common Futures.
To create trajectories in which several independent pasts converge onto a common future, we first generated a single very long trajectory. Many independent past segments were then generated so as to terminate in the same state (position and velocity) of the bar, and each of these pasts was joined to the same future segment.
Estimating Information.
For all mutual information measures, we followed ref. 37: data were subsampled via a bootstrap technique for different fractions f of the data, with 50 bootstrap samples taken at each fraction. For each sample, we identify frequencies with probabilities, and plug into the definition of mutual information to generate estimates of the information at each data fraction; these estimates are then extrapolated linearly in the inverse of the sample size to remove the leading finite-size bias.
Information Bottleneck.
Information about the future of the stimulus is bounded by the optimal compression of the past, for each given compression amount. Formally, we want to solve the “bottleneck problem” (18):

min over P(Z | X_past) of [ I(Z; X_past) − β I(Z; X_future) ],

where we map pasts X_past onto a compressed variable Z, and the Lagrange multiplier β sets the tradeoff between the cost of the representation and the predictive information that it retains; sweeping β traces out the bound.
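For discrete variables, the self-consistent equations of ref. 18 can be iterated directly. A minimal sketch (random initialization, fixed β, small alphabets; not the solver used for the results in the paper):

```python
import numpy as np

def info(p_joint):
    """Mutual information (bits) of a joint probability table."""
    px = p_joint.sum(axis=1, keepdims=True)
    py = p_joint.sum(axis=0, keepdims=True)
    m = p_joint > 0
    return float(np.sum(p_joint[m] * np.log2(p_joint[m] / (px * py)[m])))

def bottleneck(p_xy, n_z=2, beta=5.0, n_iter=300, seed=0):
    """Iterate the self-consistent bottleneck equations: a soft mapping
    p(z|x) of pasts x onto a compressed variable z trades the cost
    I(Z; X_past) against the predictive value I(Z; X_future)."""
    rng = np.random.default_rng(seed)
    p_xy = (p_xy + 1e-12) / (p_xy + 1e-12).sum()   # keep logs finite
    p_x = p_xy.sum(axis=1)
    p_y_x = p_xy / p_x[:, None]                    # p(y|x)
    q = rng.random((len(p_x), n_z))
    q /= q.sum(axis=1, keepdims=True)              # random initial p(z|x)
    for _ in range(n_iter):
        p_z = p_x @ q
        p_y_z = (q * p_x[:, None]).T @ p_y_x / p_z[:, None]
        # D_KL[ p(y|x) || p(y|z) ] for every (x, z) pair, in nats
        dkl = (p_y_x[:, None, :]
               * np.log(p_y_x[:, None, :] / p_y_z[None, :, :])).sum(axis=2)
        q = np.maximum(p_z[None, :] * np.exp(-beta * dkl), 1e-12)
        q /= q.sum(axis=1, keepdims=True)
    p_z = p_x @ q
    p_y_z = (q * p_x[:, None]).T @ p_y_x / p_z[:, None]
    return info(p_x[:, None] * q), info(p_z[:, None] * p_y_z)

# Assumed toy joint: a 4-state "past" that noisily repeats as the "future"
p_xy = (np.eye(4) * 0.85 + (1 - np.eye(4)) * 0.05) / 4.0
i_past, i_future = bottleneck(p_xy)
print(i_past, i_future)   # I(Z; past) and I(Z; future), in bits
```

By the data processing inequality the solution must satisfy I(Z; future) ≤ I(Z; past), and with a binary Z the cost can never exceed 1 bit.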
We thank E. Schneidman, G. J. Stephens, and G. Tkačik for useful discussions; and G. W. Schwartz, D. Amodei, and F. S. Soo for help with the experiments. We also thank the Aspen Center for Physics, supported by National Science Foundation (NSF) Grant PHY-1066293, for its hospitality. The work was supported by NSF Grants IIS-0613435, PHY-0957573, PHY-1305525, and CCF-0939370; by National Institutes of Health Grant EY-014196; by Novartis (through the Life Sciences Research Foundation); by the Swartz Foundation; and by the W. M. Keck Foundation.
The authors declare no conflict of interest.
This article contains supporting information online at www.pnas.org/lookup/suppl/10.1073/pnas.1506855112/-/DCSupplemental.
Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
Read article at publisher's site: https://doi.org/10.1073/pnas.1506855112