RESEARCH ARTICLE
Abstract
Background: Explaining how brain processing can be so fast remains an open problem (van Hemmen JL, Sejnowski T., 2004). Thus, the analysis of neural transmission processes (Shannon CE, Weaver W., 1963) essentially focuses on the search for effective encoding and decoding schemes. According to the Shannon fundamental theorem, mutual information plays a crucial role in characterizing the efficiency of communication channels. It is well known that this efficiency is determined by the channel capacity, which is defined as the maximal mutual information between the input and output signals. On the other hand, intuitively speaking, when input and output signals are more correlated, the transmission should be more efficient. A natural question therefore arises about the relation between mutual information and correlation. We analyze the relation between these quantities using the binary representation of signals, which is the most common approach taken in studying the neuronal processes of the brain.
Results: We present binary communication channels for which mutual information and correlation coefficients
behave differently both quantitatively and qualitatively. Despite this difference in behavior, we show that the
noncorrelation of binary signals implies their independence, in contrast to the case for general types of signals.
Conclusions: Our research shows that mutual information cannot be replaced by sheer correlations. Our results indicate that neuronal encoding has a more complicated nature, which cannot be captured by straightforward correlations between input and output signals, since the mutual information takes into account the structure and patterns of the signals.
Keywords: Shannon information, Communication channel, Entropy, Mutual information, Correlation, Neuronal
encoding
Background
A huge effort has been undertaken to analyze neuronal coding, its high efficiency, and the mechanisms governing it [1]. Claude Shannon published his famous paper on communication theory in 1948 [2,3]. In that paper, he formulated in a rigorous mathematical way the intuitive concepts concerning the transmission of information in communication channels. The occurrences of the input symbols transmitted via the channel and of the output symbols are described by random variables X (input) and Y (output). An important task is the determination of an efficient decoding scheme; i.e., a procedure that allows a decision to be made about the sequence (message) input to the channel from the output sequence of symbols. This is the essence of the Shannon fundamental theorem.
*Correspondence: jszczepa@ippt.pan.pl
Institute of Fundamental Technological Research, Polish Academy of Sciences,
Pawinskiego 5B, Warsaw, PL
Methods
The communication channel is a device that acts on the input to produce the output [3,17,21]. In mathematical language, the communication channel is defined as a matrix of conditional probabilities describing the transitions between input and output symbols, possibly depending on the internal structure of the channel. In the neuronal communication systems of the brain, information is transmitted by means of small electric currents, and the timing of the action potentials (mV), known in the literature as a spike train [1], plays a crucial role. Spike trains can be encoded in many ways. The most common encoding proposed in the literature is binary encoding, which is the most effective and natural method [11,22-26]. It is physically justified that spike trains, as observed, are detected with some limited time resolution Δt, so that in each time slice (bin) a spike is either present or absent. If we think of a spike as representing a "1" and no spike as representing a "0", then, looking at a time interval of length T, each possible spike train is equivalent to a binary number with T/Δt digits. In [26] it was shown that transient responses in auditory cortex can be described as a binary process, rather than as a highly variable Poisson process.
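As a small illustration of this binning, the following sketch (ours, not from the original article) converts a list of spike times into a binary word for a given time resolution; spike times are assumed to be in seconds.

```python
import numpy as np

def bin_spike_train(spike_times, T, dt):
    """Binary word for a window of length T at resolution dt:
    a bin gets 1 if at least one spike falls into it, otherwise 0."""
    n_bins = int(round(T / dt))
    word = np.zeros(n_bins, dtype=int)
    idx = np.floor(np.asarray(spike_times) / dt).astype(int)
    word[idx[(idx >= 0) & (idx < n_bins)]] = 1
    return word

# A 100 ms window with 5 ms bins yields a 20-digit binary number.
print(bin_spike_train([0.003, 0.012, 0.040, 0.041, 0.087], T=0.1, dt=0.005))
```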
Thus, in this paper, we analyze binary information sources
and binary channels [25]. Such channels are described by
a 2 × 2 matrix:

C = \begin{pmatrix} p_{0|0} & p_{0|1} \\ p_{1|0} & p_{1|1} \end{pmatrix} ,   (1)

where p_{j|i} denotes the conditional probability p(Y = j|X = i) of receiving the symbol j given that the symbol i was sent, so that p_{0|0} + p_{1|0} = 1 and p_{0|1} + p_{1|1} = 1. The joint probabilities of the input and output symbols are

p_{ji} := p(X = i ∧ Y = j) = p(Y = j|X = i)\, p(X = i)  for i, j = 0, 1 ,   (2)

and the marginal distributions of the input and output are

p^X_i = \sum_{j=0}^{1} p_{ji}  for i = 0, 1 ,  and  p^Y_j = \sum_{i=0}^{1} p_{ji}  for j = 0, 1 .   (3)
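To make formulas (1)-(3) concrete, the following sketch (with purely illustrative numbers) builds the joint distribution p_{ji} and the marginals p^X_i, p^Y_j from a channel matrix C and an input distribution.

```python
import numpy as np

# Channel matrix C[j, i] = p(Y = j | X = i); each column sums to 1, cf. (1).
C = np.array([[0.9, 0.2],
              [0.1, 0.8]])
pX = np.array([0.7, 0.3])       # input distribution p(X = i)

P = C * pX                      # joint p_ji = p(Y = j | X = i) p(X = i), cf. (2)
pY = P.sum(axis=1)              # output marginal p^Y_j, cf. (3)
pX_rec = P.sum(axis=0)          # recovers the input marginal p^X_i

print(P)
print(pY, pX_rec)
```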
The quantities pX1 and pY1 can be interpreted as the firing rates of the input and output spike trains. We will use these probability distributions to calculate the mutual information (between input and output signals), which is expressed in terms of the entropies of the input itself, of the output itself, and of the joint probability of input and output, where the entropy of, e.g., the input is

H(X) := -\sum_{i \in Is} p(X = i) \log p(X = i) .   (4)

In the following, we consider two random variables X (the input signal to the channel) and Y (the output from the channel), both assuming only the two values 0 and 1 and formally both defined on the same probability space. It is well known that the correlation coefficient for any independent random variables X and Y is zero [14], but in general it is not true that ρ(X, Y) = 0 implies the independence of the random variables. However, for our specific random variables X and Y, which are of binary type, the most common in communication systems, we show the equivalence of independence and noncorrelation (see Appendix). The basic idea of introducing the concept of mutual information is to determine the reduction of uncertainty (measured by entropy) of the random variable X provided that we know the values of the discrete random variable Y. The mutual information (MI) is defined as
MI(X, Y) := H(X) - H(X|Y) = H(Y) - H(Y|X) ,   (5)

H(Y|X) = \sum_{i \in Is} p(X = i)\, H(Y|X = i) ,   (6)

where

H(Y|X = i) := -\sum_{j \in Os} p(Y = j|X = i) \log p(Y = j|X = i) ,   (7)

Is and Os are, in general, the sets of input and output symbols, p(X = i) and p(Y = j) are the probability distributions of the random variables X and Y, and p(X = i ∧ Y = j) is the joint probability distribution of X and Y. Estimation of mutual information requires knowledge of the probability distributions, which may be easily estimated for two-dimensional binary distributions, but in real applications it poses multiple problems [30]. Since, in practice, the knowledge of the probability distributions is often restricted, more advanced tools must be applied, such as effective entropy estimators [24,30-33].
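As a minimal illustration of the plug-in approach for a fully known two-dimensional binary distribution (not one of the advanced estimators cited in [24,30-33]), the sketch below computes MI as H(X) + H(Y) - H(X, Y); the joint matrix reuses the illustrative numbers from the previous sketch.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(P):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y) for P[j, i] = p(X = i, Y = j)."""
    return entropy(P.sum(axis=0)) + entropy(P.sum(axis=1)) - entropy(P)

P = np.array([[0.63, 0.06],
              [0.07, 0.24]])    # illustrative joint distribution
print(mutual_information(P))
```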
The relative mutual information RMI(X, Y ) [34]
between random variables X and Y is defined as the ratio
of MI(X, Y ) and the average of information transmitted
by variables X and Y :
RMI(X, Y) := \frac{MI(X, Y)}{\left[ H(X) + H(Y) \right]/2} ,   (8)

which, written in terms of the probability distributions, reads

RMI(X, Y) = \frac{ -\sum_{i=0}^{1} p^X_i \log p^X_i - \sum_{j=0}^{1} p^Y_j \log p^Y_j + \sum_{i,j=0}^{1} p_{ji} \log p_{ji} }{ \left[ -\sum_{i=0}^{1} p^X_i \log p^X_i - \sum_{j=0}^{1} p^Y_j \log p^Y_j \right] / 2 } .   (9)
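A direct transcription of (8)-(9) as a sketch, with the same illustrative joint matrix as above; the helper names are ours.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def relative_mutual_information(P):
    """RMI(X, Y) = MI(X, Y) / ((H(X) + H(Y)) / 2), cf. (8)-(9)."""
    hx, hy = entropy(P.sum(axis=0)), entropy(P.sum(axis=1))
    return (hx + hy - entropy(P)) / ((hx + hy) / 2.0)

P = np.array([[0.63, 0.06],
              [0.07, 0.24]])
print(relative_mutual_information(P))
```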
The standard definition of the Pearson correlation coefficient ρ(X, Y) of random variables X and Y is

\rho(X, Y) := \frac{E\left[ (X - EX)(Y - EY) \right]}{\sqrt{V(X)\, V(Y)}}   (10)

= \frac{E(XY) - EX \cdot EY}{\sqrt{E\left[ (X - EX)^2 \right] E\left[ (Y - EY)^2 \right]}} .   (11)
It follows that the Pearson correlation coefficient ρ(X, Y) is by no means a general measure of dependence between two random variables X and Y. ρ(X, Y) is connected with the linear dependence of X and Y. That is, the well-known theorem [15] states that the value of this coefficient is always between -1 and 1, and that it assumes -1 or 1 if and only if there exists a linear relation between X and Y.
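For 0/1-valued variables the moments in (10)-(11) reduce to probabilities, E(XY) = p11, EX = pX1 and EY = pY1, so ρ can be evaluated directly from the joint matrix; below is a sketch with illustrative numbers.

```python
import numpy as np

def pearson_binary(P):
    """Pearson rho for 0/1 variables, with P[j, i] = p(X = i, Y = j), cf. (10)-(11)."""
    pX1 = P[:, 1].sum()          # EX = p(X = 1)
    pY1 = P[1, :].sum()          # EY = p(Y = 1)
    cov = P[1, 1] - pX1 * pY1    # E(XY) - EX * EY
    return cov / np.sqrt(pX1 * (1 - pX1) * pY1 * (1 - pY1))

print(pearson_binary(np.array([[0.63, 0.06],
                               [0.07, 0.24]])))
```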
The essence of correlation, when we simultaneously describe the input to and the output from neurons, may be expressed as the difference between the probabilities of coincident and independent spiking, relative to independent spiking. To realize this idea, we use a quantitative neuroscience spike-train correlation (NSTC) coefficient:

NSTC(X, Y) := \frac{p_{11} - p^X_1 p^Y_1}{p^X_1 p^Y_1} .   (12)
Such a correlation coefficient, with this normalization, seems to be more natural than the Pearson coefficient in neuroscience. A similar idea was developed in [35], where the raw cross-correlation of simultaneous spike trains was referred to the square root of the product of the firing rates. Moreover, it turns out that the NSTC coefficient has an important property: once we know the firing rates pX1 and pY1 of the individual neurons and the coefficient, we can determine the joint probabilities of firing:
p_{00} = (1 - p^X_1)(1 - p^Y_1) + NSTC \cdot p^X_1 p^Y_1 ,
p_{01} = (1 - p^Y_1)\, p^X_1 - NSTC \cdot p^X_1 p^Y_1 ,
p_{10} = p^Y_1 (1 - p^X_1) - NSTC \cdot p^X_1 p^Y_1 ,
p_{11} = p^X_1 p^Y_1 + NSTC \cdot p^X_1 p^Y_1 .   (13)
Since p_{11} ≥ 0, formula (12) yields the lower bound NSTC ≥ -1. The upper bound is unlimited for the general class (2) of joint probabilities, though not in the important special case when the communication channel is effective.
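As a consistency check of (12) and (13), the sketch below computes NSTC from an illustrative joint matrix and then rebuilds the four joint probabilities from the two firing rates and the coefficient; function names and numbers are ours.

```python
import numpy as np

def nstc(P):
    """NSTC(X, Y) = (p11 - pX1*pY1) / (pX1*pY1), cf. (12)."""
    pX1, pY1 = P[:, 1].sum(), P[1, :].sum()
    return (P[1, 1] - pX1 * pY1) / (pX1 * pY1)

def joint_from_rates(pX1, pY1, c):
    """Rebuild the joint matrix from the firing rates and NSTC, cf. (13)."""
    p11 = pX1 * pY1 + c * pX1 * pY1
    p01 = (1 - pY1) * pX1 - c * pX1 * pY1
    p10 = pY1 * (1 - pX1) - c * pX1 * pY1
    p00 = (1 - pX1) * (1 - pY1) + c * pX1 * pY1
    return np.array([[p00, p01], [p10, p11]])

P = np.array([[0.63, 0.06],
              [0.07, 0.24]])
c = nstc(P)
print(c)
print(joint_from_rates(P[:, 1].sum(), P[1, :].sum(), c))   # recovers P
```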
Results
As a first example, we consider the family of joint probability matrices

M(\epsilon) = \begin{pmatrix} \frac{7}{15} - \epsilon & \frac{1}{5} + \epsilon \\ \frac{2}{15} - \epsilon & \frac{1}{5} + \epsilon \end{pmatrix} .   (14)

In this case, the family of the communication channels for each parameter 0 < \epsilon < \frac{2}{15} is given by the conditional probability matrix C(\epsilon):

C(\epsilon) = \begin{pmatrix} \dfrac{7/15 - \epsilon}{3/5 - 2\epsilon} & \dfrac{1/5 + \epsilon}{2/5 + 2\epsilon} \\ \dfrac{2/15 - \epsilon}{3/5 - 2\epsilon} & \dfrac{1/5 + \epsilon}{2/5 + 2\epsilon} \end{pmatrix} .   (15)
Next, we consider the family of joint probability matrices

M(\epsilon) = \begin{pmatrix} \frac{1}{4} & \frac{7}{20} - \epsilon \\ \frac{1}{20} + 2\epsilon & \frac{7}{20} - \epsilon \end{pmatrix} ,   (16)

and the information source probabilities are p^X_0 = \frac{3}{10} + 2\epsilon and p^X_1 = \frac{7}{10} - 2\epsilon for 0 < \epsilon < \frac{7}{20}. Here the communication channels C(\epsilon) are of the form

C(\epsilon) = \begin{pmatrix} \dfrac{1/4}{3/10 + 2\epsilon} & \dfrac{7/20 - \epsilon}{7/10 - 2\epsilon} \\ \dfrac{1/20 + 2\epsilon}{3/10 + 2\epsilon} & \dfrac{7/20 - \epsilon}{7/10 - 2\epsilon} \end{pmatrix} .   (17)
Finally, we consider the family of joint probability matrices

M(\epsilon) = \begin{pmatrix} \frac{1}{10} & \frac{1}{20} - \epsilon \\ \frac{4}{5} & \frac{1}{20} + \epsilon \end{pmatrix} ,   (18)
for which the communication channels C(\epsilon) take the form

C(\epsilon) = \begin{pmatrix} \dfrac{1}{9} & \dfrac{1/20 - \epsilon}{1/10} \\ \dfrac{8}{9} & \dfrac{1/20 + \epsilon}{1/10} \end{pmatrix} ,   (19)

and the information source probabilities are p^X_0 = \frac{9}{10} and p^X_1 = \frac{1}{10} for 0 < \epsilon < \frac{1}{20}. It turns out that the NSTC coefficient increases linearly from large negative values below -0.4 to a positive value of 0.1. Simultaneously, ρ is practically zero and RMI is small (below 0.1), but it varies in a non-monotonic way, having a noticeable minimum (Figure 3). Moreover, observe that for small ε the RMI (equal to 0.1) is visibly larger than zero, which suggests that the communication efficiency is relatively good, while at the same time the Pearson correlation coefficient (equal to -0.03) is very close to zero, indicating that the input and output signals are almost uncorrelated (independent for binary channels). This suggests that these measures describe different qualitative properties. Figure 3 shows the behaviors of RMI, ρ, and the NSTC coefficient.
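The kind of comparison reported above can be reproduced numerically by sweeping a channel parameter and evaluating RMI, the Pearson coefficient ρ, and NSTC on each joint matrix. The one-parameter family M(eps) below is purely illustrative (it is not one of the families (14)-(19)), and the helper names are ours.

```python
import numpy as np

def measures(P):
    """Return (RMI, Pearson rho, NSTC) for a binary joint matrix P[j, i] = p(X = i, Y = j)."""
    pX, pY = P.sum(axis=0), P.sum(axis=1)
    h = lambda q: -sum(x * np.log2(x) for x in np.ravel(q) if x > 0)
    rmi = (h(pX) + h(pY) - h(P)) / ((h(pX) + h(pY)) / 2)
    cov = P[1, 1] - pX[1] * pY[1]
    rho = cov / np.sqrt(pX[1] * (1 - pX[1]) * pY[1] * (1 - pY[1]))
    return rmi, rho, cov / (pX[1] * pY[1])

def M(eps):
    """A purely illustrative one-parameter family of joint matrices."""
    return np.array([[0.20, 0.10 - eps],
                     [0.60, 0.10 + eps]])

for eps in np.linspace(0.0, 0.09, 4):
    rmi, rho, c = measures(M(eps))
    print(f"eps={eps:.2f}  RMI={rmi:.3f}  rho={rho:+.3f}  NSTC={c:+.3f}")
```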
Conclusions
To summarize, we show that the straightforward, intuitive approach of estimating the quality of communication channels from the correlations between input and output signals alone is often ineffective. In other words, we refute the intuitive hypothesis that the more the input and output signals are correlated, the more efficient the transmission is (i.e., the more effective a decoding scheme can be found). This intuition could be supported by two facts:
1. for noncorrelated binary variables (ρ(X, Y) = 0), which are shown in the Appendix to be independent, one has RMI = 0;
2. for fully correlated random variables (|ρ(X, Y)| = 1), which are linearly dependent, one has RMI = 1.
Both facts are checked numerically in the sketch below. We introduce a few communication channels for which the correlation coefficients behave completely differently from the mutual information, which shows that this intuition is erroneous.
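The two facts can be verified directly on two illustrative joint matrices, an independent pair and a fully correlated pair; the helper below recomputes RMI and ρ from a joint matrix (names are ours).

```python
import numpy as np

def rmi_and_rho(P):
    """Return (RMI, Pearson rho) for a binary joint matrix P[j, i] = p(X = i, Y = j)."""
    pX, pY = P.sum(axis=0), P.sum(axis=1)
    h = lambda q: -sum(x * np.log2(x) for x in np.ravel(q) if x > 0)
    rmi = (h(pX) + h(pY) - h(P)) / ((h(pX) + h(pY)) / 2)
    rho = (P[1, 1] - pX[1] * pY[1]) / np.sqrt(pX[1] * (1 - pX[1]) * pY[1] * (1 - pY[1]))
    return rmi, rho

# Fact 1: an independent (hence uncorrelated) binary pair gives rho = 0 and RMI = 0.
P_indep = np.outer([0.3, 0.7], [0.6, 0.4])     # p(X = i, Y = j) = p(Y = j) p(X = i)
# Fact 2: a fully correlated pair (Y = X) gives rho = 1 and RMI = 1.
P_equal = np.array([[0.5, 0.0],
                    [0.0, 0.5]])

print(rmi_and_rho(P_indep))   # approximately (0.0, 0.0)
print(rmi_and_rho(P_equal))   # (1.0, 1.0)
```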
In particular, we present the realizations of channels
characterized by high mutual information for input and
output signals but at the same time featuring very low
correlation between these signals. On the other hand, we
find channels featuring quite the opposite behavior; i.e.,
having very high correlation between input and output
signals while the mutual information turns out to be very
low. This is because the mutual information, which in fact
is a crucial parameter characterizing neuronal encoding,
takes into account structures (patterns) of the signals and
not only their statistical properties, described by firing
rates. Our research shows that neuronal encoding has a
much more complicated nature that cannot be captured
by straightforward correlations between input and output
signals.
Appendix
The theorem states that independence and noncorrelation are equivalent for random variables that take only two
values.
Theorem 1. Let X and Y be random variables which take only two real values, ax, bx and ay, by, respectively. Let M be the joint probability matrix

M = \begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix} ,   (20)

where

p_{00} = p(X = ax ∧ Y = ay) , \quad p_{01} = p(X = bx ∧ Y = ay) ,
p_{10} = p(X = ax ∧ Y = by) , \quad p_{11} = p(X = bx ∧ Y = by) ,
p_{00} + p_{01} + p_{10} + p_{11} = 1 ,

and the marginal probabilities are

p^X_i = p(X = ax) for i = 0 , \quad p^X_i = p(X = bx) for i = 1 ,
p^Y_j = p(Y = ay) for j = 0 , \quad p^Y_j = p(Y = by) for j = 1 .   (21)

Then X and Y are independent if and only if ρ(X, Y) = 0.
To prove this Theorem 1, we first show the following
particular case for binary random variables.
Lemma 1. Let X1 and Y1 be two random variables which take only the two values 0 and 1. Let M1 be the joint probability matrix

M_1 = \begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix} ,   (22)

where

p_{ji} = p(X_1 = i ∧ Y_1 = j)  for i, j = 0, 1 ,   (23)

and the marginal probabilities are

p^{X_1}_i = \sum_{j=0}^{1} p_{ji}  for i = 0, 1 , \quad p^{Y_1}_j = \sum_{i=0}^{1} p_{ji}  for j = 0, 1 .   (24)

Then X1 and Y1 are independent if and only if ρ(X1, Y1) = 0.
Proof of Lemma 1. Since X1 and Y1 take only the values 0 and 1, we have E(X1) = p^{X_1}_1, E(Y1) = p^{Y_1}_1 and E(X1 Y1) = p_{11}, so ρ(X1, Y1) = 0 is equivalent to

p_{11} = p^{X_1}_1 p^{Y_1}_1 .

Thus, we have

p_{01} = p^{X_1}_1 - p_{11} = p^{X_1}_1 (1 - p^{Y_1}_1) = p^{X_1}_1 p^{Y_1}_0 .

Similarly, we have p_{10} = p^{Y_1}_1 - p_{11} = p^{X_1}_0 p^{Y_1}_1 and p_{00} = 1 - p_{01} - p_{10} - p_{11} = p^{X_1}_0 p^{Y_1}_0. Hence each joint probability p_{ji} factorizes into the product p^{X_1}_i p^{Y_1}_j of the corresponding marginal probabilities, i.e., X1 and Y1 are independent. Conversely, if X1 and Y1 are independent, then p_{11} = p^{X_1}_1 p^{Y_1}_1, so the covariance, and hence ρ(X1, Y1), equals 0. Theorem 1 now follows by applying Lemma 1 to the variables X1 = (X - ax)/(bx - ax) and Y1 = (Y - ay)/(by - ay), since these affine transformations preserve both independence and the vanishing of the correlation coefficient.
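A small randomized check of Lemma 1 (ours): for 0/1-valued variables, imposing zero covariance on the joint distribution forces it to factorize into the product of its marginals.

```python
import numpy as np

rng = np.random.default_rng(0)

for _ in range(1000):
    pX1, pY1 = rng.uniform(0.05, 0.95, size=2)
    p11 = pX1 * pY1              # zero covariance: E(X1 Y1) = E(X1) E(Y1)
    p01 = pX1 - p11              # marginal constraint p(X1 = 1) = p01 + p11
    p10 = pY1 - p11              # marginal constraint p(Y1 = 1) = p10 + p11
    p00 = 1.0 - p01 - p10 - p11
    P = np.array([[p00, p01],
                  [p10, p11]])
    # Independence: the joint matrix equals the outer product of its marginals.
    assert np.allclose(P, np.outer([1 - pY1, pY1], [1 - pX1, pX1]))

print("zero correlation implied independence in all sampled cases")
```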
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
JS and AP planned the study, participated in the interpretation of data and
were involved in the proof of the Theorem. AP and EW carried out the
implementation and participated in the elaboration of data. EW participated in
the proof of the Theorem. All authors drafted the manuscript and read and
approved the final manuscript.
Acknowledgements
We gratefully acknowledge financial support from the Polish National Science
Centre under grant no. 2012/05/B/ST8/03010.