Shannon Information Measures¶
The pyinform.shannon module provides a collection of entropy and information measures on discrete probability distributions (pyinform.dist.Dist). This module forms the core of PyInform, as all of the time series analysis functions are built upon it.
Examples¶
Example 1: Entropy and Random Numbers¶
The pyinform.shannon.entropy() function allows us to calculate the Shannon entropy of a distribution. Let’s try generating a random distribution and see what the entropy looks like.
import numpy as np
from pyinform.dist import Dist
from pyinform.shannon import entropy

np.random.seed(2019)
xs = np.random.randint(0, 10, 10000)
d = Dist(10)
for x in xs:
    d.tick(x)
print(entropy(d))
print(entropy(d, b=10))
3.3216276921709724
0.9999095697715877
This is exactly what you should expect: the pseudo-random number generator does a decent job of producing integers in a uniform fashion.
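For reference, a perfectly uniform distribution over ten symbols has entropy
\[-\sum_{x=0}^{9} \tfrac{1}{10} \log_2 \tfrac{1}{10} = \log_2 10 \approx 3.3219\]
bits (exactly 1 when the entropy is taken in base 10), so the values above sit just shy of the theoretical maximum.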
Example 2: Mutual Information¶
How correlated are consecutive integers? Let’s find out using mutual_info().
import numpy as np
from pyinform.dist import Dist
from pyinform.shannon import mutual_info

np.random.seed(2019)
obs = np.random.randint(0, 10, 100)

p_xy = Dist(100)
p_x = Dist(10)
p_y = Dist(10)

for x in obs[:-1]:
    for y in obs[1:]:
        p_x.tick(x)
        p_y.tick(y)
        p_xy.tick(10*x + y)

print(mutual_info(p_xy, p_x, p_y))
print(mutual_info(p_xy, p_x, p_y, b=10))
1.3322676295501878e-15
4.440892098500626e-16
Due to the subtleties of floating-point computation we don’t get exactly zero. Really, though, the mutual information is zero.
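By contrast, a perfectly dependent pair has mutual information equal to the entropy of either variable. Here is a minimal sketch (not part of the original example) that reuses the obs array above and pairs each observation with itself; the q_* names are introduced just for this check.

q_xy = Dist(100)
q_x = Dist(10)
q_y = Dist(10)

for x in obs:
    # The joint distribution is supported only on the diagonal (y == x).
    q_x.tick(x)
    q_y.tick(x)
    q_xy.tick(10*x + x)

# I(X;X) = H(X), so this should equal entropy(q_x) up to floating-point rounding.
print(mutual_info(q_xy, q_x, q_y))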
Example 3: Relative Entropy and Biased Random Numbers¶
Okay. Now let’s generate some binary sequences. The first will be roughly uniform, but the second will be biased toward 0.
import numpy as np
from pyinform.dist import Dist
from pyinform.shannon import relative_entropy

p = Dist(2)
q = Dist(2)

np.random.seed(2019)

ys = np.random.randint(0, 2, 100)
for y in ys:
    p.tick(y)

xs = np.random.randint(0, 6, 100)
for i, _ in enumerate(xs):
    xs[i] = (((xs[i] % 5) % 4) % 3) % 2
    q.tick(xs[i])

print(relative_entropy(q, p))
print(relative_entropy(p, q))
0.3810306585586593
0.4924878808808457
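As a sanity check (not part of the original example), the same values can be recovered directly from the empirical frequencies, assuming the ys and xs arrays from the snippet above are still in scope:

# Empirical frequencies of the two binary sequences.
p_hat = np.bincount(ys, minlength=2) / len(ys)
q_hat = np.bincount(xs, minlength=2) / len(xs)

# D_KL(q || p) and D_KL(p || q) from the definition; these should agree with
# relative_entropy(q, p) and relative_entropy(p, q) up to floating-point rounding.
print(np.sum(q_hat * np.log2(q_hat / p_hat)))
print(np.sum(p_hat * np.log2(p_hat / q_hat)))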
API Documentation¶
- pyinform.shannon.entropy(p, b=2.0)[source]¶
Compute the base-b Shannon entropy of the distribution p.
Taking \(X\) to be a random variable with \(p_X\) a probability distribution on \(X\), the base-\(b\) Shannon entropy is defined as
\[H(X) = -\sum_{x} p_X(x) \log_b p_X(x).\]
Examples:
>>> d = Dist([1,1,1,1])
>>> shannon.entropy(d)
2.0
>>> shannon.entropy(d, 4)
1.0
>>> d = Dist([2,1])
>>> shannon.entropy(d)
0.9182958340544896
>>> shannon.entropy(d, b=3)
0.579380164285695
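The base-2 value in the second example follows directly from the definition: Dist([2,1]) has probabilities \(p_X = (2/3, 1/3)\), so
\[H(X) = -\tfrac{2}{3}\log_2\tfrac{2}{3} - \tfrac{1}{3}\log_2\tfrac{1}{3} = \log_2 3 - \tfrac{2}{3} \approx 0.9183.\]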
See [Shannon1948a] for more details.
- Parameters
  - p (pyinform.dist.Dist) – the distribution
  - b (float) – the logarithmic base
- Returns
the Shannon entropy of the distribution
- Return type
float
- pyinform.shannon.mutual_info(p_xy, p_x, p_y, b=2.0)[source]¶
Compute the base-b mutual information between two random variables.
Mutual information provides a measure of the mutual dependence between two random variables. Let \(X\) and \(Y\) be random variables with probability distributions \(p_X\) and \(p_Y\) respectively, and \(p_{X,Y}\) the joint probability distribution over \((X,Y)\). The base-\(b\) mutual information between \(X\) and \(Y\) is defined as
\[\begin{split}I(X;Y) &= \sum_{x,y} p_{X,Y}(x,y) \log_b \frac{p_{X,Y}(x,y)}{p_X(x)p_Y(y)}\\ &= H(X) + H(Y) - H(X,Y).\end{split}\]
Here the second line takes advantage of the properties of logarithms and the definition of Shannon entropy, entropy(). To some degree one can think of mutual information as a measure of the (linear and non-linear) correlations between random variables.
See [Cover1991a] for more details.
Examples:
>>> xy = Dist([10,70,15,5])
>>> x = Dist([80,20])
>>> y = Dist([25,75])
>>> shannon.mutual_info(xy, x, y)
0.21417094500762912
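This value can be checked against the entropy form of the definition; with the entropies rounded to four decimal places,
\[I(X;Y) = H(X) + H(Y) - H(X,Y) \approx 0.7219 + 0.8113 - 1.3190 \approx 0.2142.\]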
- Parameters
  - p_xy (pyinform.dist.Dist) – the joint distribution
  - p_x (pyinform.dist.Dist) – the x-marginal distribution
  - p_y (pyinform.dist.Dist) – the y-marginal distribution
  - b (float) – the logarithmic base
- Returns
the mutual information
- Return type
float
- pyinform.shannon.conditional_entropy(p_xy, p_y, b=2.0)[source]¶
Compute the base-b conditional entropy given joint (p_xy) and marginal (p_y) distributions.
Conditional entropy quantifies the amount of information required to describe a random variable \(X\) given knowledge of a random variable \(Y\). With \(p_Y\) the probability distribution of \(Y\), and \(p_{X,Y}\) the joint distribution over \((X,Y)\), the base-\(b\) conditional entropy is defined as
\[\begin{split}H(X|Y) &= -\sum_{x,y} p_{X,Y}(x,y) \log_b \frac{p_{X,Y}(x,y)}{p_Y(y)}\\ &= H(X,Y) - H(Y).\end{split}\]
See [Cover1991a] for more details.
Examples:
>>> xy = Dist([10,70,15,5])
>>> x = Dist([80,20])
>>> y = Dist([25,75])
>>> shannon.conditional_entropy(xy, x)
0.5971071794515037
>>> shannon.conditional_entropy(xy, y)
0.5077571498797332
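These values are just \(H(X,Y) - H(X)\) and \(H(X,Y) - H(Y)\) for the distributions above; with the entropies rounded to four decimal places,
\[H(X,Y) - H(X) \approx 1.3190 - 0.7219 \approx 0.5971, \qquad H(X,Y) - H(Y) \approx 1.3190 - 0.8113 \approx 0.5078.\]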
- Parameters
  - p_xy (pyinform.dist.Dist) – the joint distribution
  - p_y (pyinform.dist.Dist) – the marginal distribution
  - b (float) – the logarithmic base
- Returns
the conditional entropy
- Return type
float
- pyinform.shannon.conditional_mutual_info(p_xyz, p_xz, p_yz, p_z, b=2.0)[source]¶
Compute the base-b conditional mutual information given the joint (p_xyz) and marginal (p_xz, p_yz, p_z) distributions.
Conditional mutual information was introduced by [Dobrushin1959] and [Wyner1978], and more or less quantifies the average mutual information between random variables \(X\) and \(Y\) given knowledge of a third, \(Z\). Following the same notation as in conditional_entropy(), the base-\(b\) conditional mutual information is defined as
\[\begin{split}I(X;Y|Z) &= \sum_{x,y,z} p_{X,Y,Z}(x,y,z) \log_b \frac{p_{X,Y|Z}(x,y|z)}{p_{X|Z}(x|z)p_{Y|Z}(y|z)}\\ &= \sum_{x,y,z} p_{X,Y,Z}(x,y,z) \log_b \frac{p_{X,Y,Z}(x,y,z)p_{Z}(z)}{p_{X,Z}(x,z)p_{Y,Z}(y,z)}\\ &= H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z).\end{split}\]
Examples:
>>> xyz = Dist([24,24,9,6,25,15,10,5])
>>> xz = Dist([15,9,5,10])
>>> yz = Dist([9,15,10,15])
>>> z = Dist([3,5])
>>> shannon.conditional_mutual_info(xyz, xz, yz, z)
0.12594942727460334
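As a further illustration (not part of the original documentation), here is a minimal sketch that builds the joint and marginal distributions from synthetic data in which \(Y\) is the exclusive-or of \(X\) and \(Z\); the variable names and the event encodings (4*x + 2*y + z and so on) are arbitrary choices made for this example.

import numpy as np
from pyinform.dist import Dist
from pyinform.shannon import conditional_mutual_info

np.random.seed(2019)
xs = np.random.randint(0, 2, 1000)
zs = np.random.randint(0, 2, 1000)
ys = xs ^ zs  # y is fully determined by x and z together

p_xyz, p_xz, p_yz, p_z = Dist(8), Dist(4), Dist(4), Dist(2)
for x, y, z in zip(xs, ys, zs):
    p_xyz.tick(4*x + 2*y + z)
    p_xz.tick(2*x + z)
    p_yz.tick(2*y + z)
    p_z.tick(z)

# Given z, observing y reveals x completely, so I(X;Y|Z) should be close to 1 bit.
print(conditional_mutual_info(p_xyz, p_xz, p_yz, p_z))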
- Parameters
  - p_xyz (pyinform.dist.Dist) – the joint distribution
  - p_xz (pyinform.dist.Dist) – the x,z-marginal distribution
  - p_yz (pyinform.dist.Dist) – the y,z-marginal distribution
  - p_z (pyinform.dist.Dist) – the z-marginal distribution
  - b (float) – the logarithmic base
- Returns
the conditional mutual information
- Return type
float
- pyinform.shannon.relative_entropy(p, q, b=2.0)[source]¶
Compute the base-b relative entropy between posterior (p) and prior (q) distributions.
Relative entropy, also known as the Kullback-Leibler divergence, was introduced by Kullback and Leibler in 1951 ([Kullback1951a]). Given a random variable \(X\) and two probability distributions \(p_X\) and \(q_X\), relative entropy measures the information gained in switching from the prior \(q_X\) to the posterior \(p_X\):
\[D_{KL}(p_X || q_X) = \sum_x p_X(x) \log_b \frac{p_X(x)}{q_X(x)}.\]
Many of the information measures, e.g. mutual_info(), conditional_entropy(), etc., amount to applications of relative entropy for various prior and posterior distributions.
Examples:
>>> p = Dist([4,1])
>>> q = Dist([1,1])
>>> shannon.relative_entropy(p,q)
0.27807190511263774
>>> shannon.relative_entropy(q,p)
0.3219280948873624
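The first value follows directly from the definition, with posterior \(p_X = (4/5, 1/5)\) and prior \(q_X = (1/2, 1/2)\):
\[D_{KL}(p_X || q_X) = \tfrac{4}{5}\log_2\tfrac{4/5}{1/2} + \tfrac{1}{5}\log_2\tfrac{1/5}{1/2} \approx 0.5425 - 0.2644 \approx 0.2781.\]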
>>> p = Dist([1,0])
>>> q = Dist([1,1])
>>> shannon.relative_entropy(p,q)
1.0
>>> shannon.relative_entropy(q,p)
nan
The second call returns nan because the posterior (q) assigns positive probability to an event to which the prior (p) assigns none, so the relative entropy diverges.
- Parameters
p (
pyinform.dist.Dist
) – the posterior distributionq (
pyinform.dist.Dist
) – the prior distributionb (float) – the logarithmic base
- Returns
the relative entropy
- Return type
float
References¶
- Cover1991a
T.M. Cover and J.A. Thomas (1991). “Elements of information theory” (1st ed.). New York: Wiley. ISBN 0-471-06259-6.
- Dobrushin1959
Dobrushin, R. L. (1959). “General formulation of Shannon’s main theorem in information theory”. Uspekhi Mat. Nauk. 14: 3-104.
- Kullback1951a
Kullback, S.; Leibler, R.A. (1951). “On information and sufficiency”. Annals of Mathematical Statistics. 22 (1): 79-86. doi:10.1214/aoms/1177729694. MR 39968.
- Shannon1948a
Shannon, Claude E. (July-October 1948). “A Mathematical Theory of Communication”. Bell System Technical Journal. 27 (3): 379-423. doi:10.1002/j.1538-7305.1948.tb01448.x.
- Wyner1978
Wyner, A. D. (1978). “A definition of conditional mutual information for arbitrary ensembles”. Information and Control. 38 (1): 51-59. doi:10.1016/S0019-9958(78)90026-8.