
Chapter Six: Information Theory

Adama Science and Technology University

January 10, 2022



Outline

1 Introduction

2 Information sources

3 Source Coding

4 Entropy Coding



Introduction

Performance of a Communication System

Effectiveness (validity)
Reliability

Effectiveness (validity):
Bandwidth
Transmission Rate
Information Transmission Rate R_b (bps): measured in bits per second
Symbol Transmission Rate R_B (Bauds): measured in symbols per second

R_b = R_B \log_2 M, \qquad M = 2^N    (1)

Example:
Transmit 7 symbols in 1 second
{Red, Orange, Yellow, Green, Cyan, Blue, Purple}
R_B = 7 Bauds



Cont. . . .

Using R_b:
{000, 001, 010, 011, 100, 101, 110}
R_b = 7 × 3 = 21 bps

If the information transmission rate of M-ary PSK is 1500 bps and M = 8, determine the symbol transmission rate.
If the symbol transmission rate of QPSK is 3000 Bauds, determine the information transmission rate.

Bandwidth Efficiency
Bit-rate (bandwidth) efficiency: η_b = R_b / B, measured in bps/Hz
Symbol-rate efficiency: η_B = R_B / B, measured in Bauds/Hz
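
As a quick check of Eq. (1), the following sketch (an added illustration, not part of the slides) works out the two rate-conversion questions above:

```python
import math

def symbol_rate(bit_rate_bps: float, M: int) -> float:
    """Symbol rate R_B = R_b / log2(M), from Eq. (1)."""
    return bit_rate_bps / math.log2(M)

def bit_rate(symbol_rate_bauds: float, M: int) -> float:
    """Information rate R_b = R_B * log2(M), from Eq. (1)."""
    return symbol_rate_bauds * math.log2(M)

print(symbol_rate(1500, M=8))   # 8-PSK carrying 1500 bps -> 500 Bauds
print(bit_rate(3000, M=4))      # QPSK (M = 4) at 3000 Bauds -> 6000 bps
```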



Cont. . . .
Reliability
Signal-to-noise ratio (SNR)
Bit/Symbol Error Rate (Probability of Error)
Bit Error Rate (BER): P_eb = error bits / total bits
Symbol Error Rate (SER): P_eB = error symbols / total symbols
Table of Comparison





Information Sources

What is Information?
An information source may be viewed as an object which produces an
event, the outcome of which is selected at random according to a
probability distribution.
A discrete information source is a source which has only a finite set of
symbols as possible outputs.
The set of source symbols is called the source alphabet, and the
elements of the set are called symbols or letters.
Information sources can be classified as having memory or being
memoryless



Cont. . . .

A source with memory is one for which the current symbol depends on


the previous symbols.
A memoryless source is one for which each symbol produced is
independent of the previous symbols.



Cont. . . .

Analog Information Sources


Communication systems are designed to transmit the information
generated by a source to some destination.
The outputs of many sources (e.g., speech, music, or video) are analog
signals, and such sources are therefore called analog sources.
Discrete Information Sources
If the source of information is a digital computer or a storage device such
as magnetic tape or an optical disk, then its output is in discrete form,
such as binary or ASCII.
Sources producing a discrete output are known as discrete sources.



Cont. . . .

Discrete Memoryless Sources (DMS)


Output sequences from the source are statistically independent
(the current output does not depend on any of the past outputs).
A DMS is characterized by the list of symbols, the probability assignment
to these symbols, and the rate at which the source generates these symbols.
For example, a binary source that generates a random sequence such as
101101000..., in which each binary digit is statistically independent of all
the other digits, is a DMS.
Stationary Sources
If the output of a discrete source is statistically dependent on its past or
future outputs, the source has memory; when, in addition, its statistics do
not change with a shift of the time origin, the source is said to be stationary.
A typical example of such a discrete source is one generating English text.



Mathematical Models for the Information Sources
The most important property of an information source is that it
produces an output that is random (which cannot be predicted
exactly).
The simplest type of discrete source is a binary source, which transmits a
binary sequence of the form 11001011....
The transmitted signal consists of only two letters, {0, 1}.
More generally, a discrete information source has an alphabet of m possible
letters, say {x_1, x_2, ..., x_m}, and transmits a sequence of letters
selected from this alphabet.
Each letter x_i in the alphabet has a probability of occurrence P(x_i).
Mathematically, a discrete source is therefore described by

P(x_i) = P(X = x_i), \quad 1 \le i \le m, \qquad \sum_{i=1}^{m} P(x_i) = 1    (2)
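
The model in Eq. (2) can be made concrete with a short simulation; the sketch below (an added illustration, with made-up symbol names and probabilities) draws i.i.d. symbols from a given alphabet and probability assignment:

```python
import random

# Hypothetical alphabet and probability assignment satisfying Eq. (2);
# the symbol names and probabilities are illustrative only.
alphabet = ["x1", "x2", "x3", "x4"]
probs = [0.5, 0.25, 0.125, 0.125]
assert abs(sum(probs) - 1.0) < 1e-12  # the probabilities must sum to 1

# A DMS emits each symbol independently of all previously emitted symbols.
sequence = random.choices(alphabet, weights=probs, k=20)
print(sequence)
```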



Cont. . . .

Logarithmic measure of information


Consider two discrete random variables X and Y such that

X = {x1 , x2 , . . . xn } and Y = {y1 , y2 , . . . ym } (3)

Suppose we observe Y = yj and wish to determine, quantitatively, the


amount of information Y = yj provides about the event X = xi ,
i = 1, 2, . . . n
Note:
If X and Y are statistically independent Y = yj provides no
information about the occurrence of X = xi
If they are fully dependent Y = yj determines the occurrence of X = xi
The information content is simply that provided by X = xi
A suitable measure that satisfies these conditions is the logarithm of the
ratio of the conditional probability P{X = x_i | Y = y_j} = P(x_i|y_j)
to the a priori probability P{X = x_i} = P(x_i).



Cont. . . .
Information content provided by the occurrence of Y = yj about the
event X = xi is defined as
I(x_i, y_j) = \log \frac{P(x_i|y_j)}{P(x_i)}

I (xi , yj ) is called the mutual information between xi and yj


The unit of the information measure is the nat if the natural logarithm
is used and the bit if base 2 is used.
Note: ln a = (ln 2) \log_2 a = 0.69315 \log_2 a
If X and Y are independent, I(x_i, y_j) = 0.
If the occurrence of Y = y_j uniquely determines the occurrence of
X = x_i, then p(x_i|y_j) = 1 and I(x_i, y_j) = \log (1/p(x_i)),
i.e., I(x_i, y_j) = -\log p(x_i) = I(x_i), the self-information of the event X = x_i.
Note: a high-probability event conveys less information than a low-probability event.
Cont. . . .

For a sure event x with p(x) = 1, I(x) = 0: the occurrence of a sure event
does not convey any information.

I(x_i, y_j) = \log \frac{p(x_i|y_j)}{p(x_i)} = \log \frac{p(x_i|y_j) p(y_j)}{p(x_i) p(y_j)} = \log \frac{p(x_i, y_j)}{p(x_i) p(y_j)}

And since

p(y_j|x_i) = \frac{p(x_i, y_j)}{p(x_i)}

I(x_i, y_j) = \log \frac{p(y_j|x_i)}{p(y_j)} = I(y_j, x_i)

The information provided by y_j about x_i is therefore the same as that
provided by x_i about y_j.
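
This symmetry is easy to check numerically; the following sketch (an added illustration with a made-up joint distribution) computes I(x_1, y_1) once from p(x_i|y_j)/p(x_i) and once from p(y_j|x_i)/p(y_j) and obtains the same value:

```python
import math

# Toy joint distribution p(x, y); the numbers are illustrative only.
p_xy = {("x1", "y1"): 0.4, ("x1", "y2"): 0.1,
        ("x2", "y1"): 0.2, ("x2", "y2"): 0.3}

# Marginals p(x) and p(y) obtained by summing the joint distribution.
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

x, y = "x1", "y1"
info_x_given_y = math.log2((p_xy[(x, y)] / p_y[y]) / p_x[x])  # log2 p(x|y)/p(x)
info_y_given_x = math.log2((p_xy[(x, y)] / p_x[x]) / p_y[y])  # log2 p(y|x)/p(y)
print(info_x_given_y, info_y_given_x)  # both about 0.415 bits: I(x1, y1) = I(y1, x1)
```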



Cont. . . .

Using the same procedure as before, we can also define the conditional
self-information as

I(x_i|y_j) = \log_2 \frac{1}{p(x_i|y_j)} = -\log_2 p(x_i|y_j)

Now consider the following relationship:

I(x_i, y_j) = I(x_i) - I(x_i|y_j)

which follows by writing

I(x_i, y_j) = \log \frac{p(x_i|y_j)}{p(x_i)} = \log p(x_i|y_j) - \log p(x_i) = -I(x_i|y_j) + I(x_i)



Cont. . . .

Properties of I (xi )
The information content of a symbol xi , denoted by I (xi ) satisfies the
following properties
I (xi ) = 0 for P(xi ) = 1
I (xi ) ≥ 0
I (xi ) > I (xj ) if P(xi ) < P(xj )
I (xi , xj ) = I (xi ) + I (xj ) if xi and xj are independent
I (xi ) must satisfy the following requirements:
I(x_i) must approach 0 as P(x_i) approaches 1
The information content I (xi ) must be a non-negative quantity since
each message contains some information. In the worst case, I (xi ) can
be equal to zero
The information content of a message having higher probability of
occurrence is less than the information content of a message having
lower probability



Cont. . . .

Exercise -1:
1. A source produces one of four possible symbols during each interval
with probabilities P(x1) = 0.5, P(x2) = 0.25, P(x3) = P(x4) = 0.125.
Obtain the information content of each of these symbols.
2. Calculate the amount of information if it is given that P(xi) = 0.25.
3. In binary PCM, if '0' occurs with probability 0.25 and '1' occurs with
probability 0.75, calculate the amount of information carried by each digit.
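
The values asked for above follow directly from I(x_i) = -\log_2 P(x_i); a minimal sketch (added for illustration):

```python
import math

def self_information(p: float) -> float:
    """I(x) = -log2 P(x), in bits."""
    return -math.log2(p)

# Item 1: four symbols
for p in (0.5, 0.25, 0.125, 0.125):
    print(f"P = {p}: I = {self_information(p):.3f} bits")  # 1, 2, 3, 3 bits

# Item 3: binary PCM digits
print(self_information(0.25))  # '0' carries 2 bits
print(self_information(0.75))  # '1' carries about 0.415 bits
```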



Cont. . . .
Average Mutual Information
Average mutual information between X and Y is given by:

I(X; Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} p(x_i, y_j) I(x_i, y_j) = \sum_{i=1}^{n} \sum_{j=1}^{m} p(x_i, y_j) \log_2 \frac{p(x_i, y_j)}{p(x_i) p(y_j)} \ge 0    (4)

Similarly, average self-information is given by:

H(X) = \sum_{i=1}^{n} p(x_i) I(x_i) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)    (5)

Where X represents the alphabet of possible output letters from a


source.
H(X ) represents the average self information per source letter and is
called the entropy of the source.
Entropy is defined as the average information per message.
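Eq. (5) translates directly into a short function; the sketch below (added for illustration) also previews the equally likely case discussed on the next slide:

```python
import math
from typing import Sequence

def entropy(probs: Sequence[float]) -> float:
    """H(X) = -sum p_i log2 p_i, in bits per symbol (Eq. (5))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol
print(entropy([0.25] * 4))                 # 2.0 bits/symbol = log2(4), the uniform case
```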
Cont. . . .

If p(x_i) = 1/n for all i, then the entropy of the source becomes

H(X) = \sum_{i=1}^{n} \frac{1}{n} \log_2 n = \log_2 n

In general, H(X) \le \log_2 n for any given set of source letter probabilities.

The entropy of a discrete source is maximum when the output letters
are all equally probable.
The conditional entropy (average conditional self-information) is defined as follows:

H(X|Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} p(x_i, y_j) \log_2 \frac{1}{p(x_i|y_j)}    (6)



Cont. . . .

Conditional entropy is the information (or uncertainty) in X after Y is
observed.
From the definition of average mutual information one can show that:

I(X; Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} p(x_i, y_j) \left[ \log p(x_i|y_j) - \log p(x_i) \right] = -H(X|Y) + H(X)    (7)

Thus I(X; Y) = H(X) - H(X|Y) and H(X) \ge H(X|Y), with equality
when X and Y are independent.
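
The identity in Eq. (7) can be verified numerically; the sketch below (an added illustration, reusing the toy joint distribution from the earlier sketch) computes H(X), H(X|Y) and I(X;Y) and checks that I(X;Y) = H(X) - H(X|Y):

```python
import math

# Toy joint distribution p(x, y); the numbers are illustrative only.
p_xy = {("x1", "y1"): 0.4, ("x1", "y2"): 0.1,
        ("x2", "y1"): 0.2, ("x2", "y2"): 0.3}

p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

H_X = -sum(p * math.log2(p) for p in p_x.values())
# H(X|Y) = sum p(x,y) log2(1 / p(x|y)), with p(x|y) = p(x,y) / p(y)   (Eq. (6))
H_X_given_Y = sum(p * math.log2(p_y[y] / p) for (x, y), p in p_xy.items())
I_XY = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

print(H_X, H_X_given_Y, I_XY)
assert abs(I_XY - (H_X - H_X_given_Y)) < 1e-9   # Eq. (7)
```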



Cont. . . .

H(X|Y) is called the equivocation.

It is interpreted as the amount of average uncertainty remaining in X
after observation of Y.
H(X): average uncertainty prior to observation.
I(X; Y): average information provided about the set X by the
observation of the set Y.
Information Rate
If the time rate at which source X emits symbols is r (symbols/s), the
information rate R (b/s) of the source is:

R = r H(X)    (8)

where R is the information rate (information bits per second),
H(X) is the entropy (average information per symbol),
and r is the rate (symbols per second) at which symbols are generated.



Cont. . . .

Exercise-2:
1. A DMS X has four symbols x1, x2, x3, x4 with probabilities
P(x1) = 0.4, P(x2) = 0.3, P(x3) = 0.2, P(x4) = 0.1. Calculate
a. H(X).
b. Find the amount of information contained in the messages x1 x2 x1 x3
and x4 x3 x3 x2, and compare with the H(X) obtained in part (a).
2. For a source transmitting two independent messages m1 and m2 having
probabilities of p and (1 - p) respectively, prove that the entropy is
maximum when both messages are equally likely. Also, plot the
variation of entropy H as a function of the probability p of one of the
messages.
3. A discrete source emits one of five symbols once every millisecond with
probabilities 1/2, 1/4, 1/8, 1/16 and 1/16, respectively. Determine the
source entropy and information rate.
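
A quick numerical companion to items 2 and 3 (an added sketch, assuming the probabilities listed above):

```python
import math

def entropy(probs):
    """H(X) = -sum p log2 p, in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Item 2: the binary entropy H(p) peaks at p = 0.5
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p}: H = {entropy([p, 1 - p]):.4f} bits")

# Item 3: five symbols, one emitted every millisecond, so r = 1000 symbols/s
probs = [1/2, 1/4, 1/8, 1/16, 1/16]
H = entropy(probs)     # 1.875 bits/symbol
R = 1000 * H           # information rate R = r * H(X), Eq. (8) -> 1875 bits/s
print(H, R)
```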



Source Coding

An objective of source coding is to minimize the average bit rate


required for representation of the source by reducing the redundancy
of the information source.
A conversion of the output of a discrete memoryless source (DMS)
into a sequence of binary symbols (i.e., binary code word) is called
source coding .
The device that performs this conversion is called the source encoder



Cont. . . .

Terms Related to Source Coding Process


Codeword Length: the number of binary digits in the codeword.
Average Codeword Length (L): the average number of bits per
source symbol in the source coding process,

L = \sum_{i=1}^{m} P(x_i) n_i

where n_i is the length of the codeword assigned to symbol x_i.

Code Efficiency (η):

\eta = \frac{L_{min}}{L}

where L_min is the minimum possible value of L.



Cont. . . .

Code Redundancy (γ):

\gamma = 1 - \eta

The source coding theorem states that, for a DMS X with entropy H(X),
the average codeword length L per symbol is bounded as

L \ge H(X)

and, further, L can be made as close to H(X) as desired for some
suitably chosen code. Thus, with L_min = H(X), the code efficiency
can be written as

Code Efficiency = \frac{H(X)}{L} \times 100\%    (9)
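
As an illustration of Eq. (9), the following sketch (with made-up codeword lengths, not a code from the slides) computes L, H(X), the efficiency and the redundancy:

```python
import math

def code_efficiency(probs, lengths):
    """Return (L, H, efficiency, redundancy) for a given code."""
    L = sum(p * n for p, n in zip(probs, lengths))       # average codeword length
    H = -sum(p * math.log2(p) for p in probs if p > 0)   # source entropy, bits/symbol
    eta = H / L
    return L, H, eta, 1 - eta

# Hypothetical example: four symbols and a variable-length code of lengths 1, 2, 3, 3
probs = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]
print(code_efficiency(probs, lengths))  # L = 1.75 = H(X), so the efficiency is 100 %
```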



Cont. . . .
Classification of Codes
Fixed-Length Codes: A fixed-length code is one whose codeword length is fixed.
Code 1 and code 2 of the table below are fixed-length codes with length 2.
Variable-Length Codes: A variable-length code is one whose codeword length is not fixed.
All codes of the table below except codes 1 and 2 are variable-length codes.
Distinct Codes: A code is distinct if each codeword is distinguishable from the other codewords.
All codes of the table below except code 1 are distinct codes; notice the codes for x1 and x3.
Prefix-Free Codes: A code in which no codeword can be formed by adding code symbols
to another codeword (equivalently, no codeword is a prefix of another codeword) is
called a prefix-free code.
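
The prefix-free property can be checked mechanically; a minimal sketch (the example codes are mine, not the codes of the table referred to above):

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of another codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))   # True: prefix-free
print(is_prefix_free(["0", "01", "011", "0111"]))  # False: "0" is a prefix of "01"
```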
Cont. . . .

Exercise -4: Given a DMS X with two symbols x1 and x2 and


P(x1 ) = 0.9, P(x2 ) = 0.1. Symbols x1 and x2 are encoded as follows:

Find the efficiency and the redundancy of this code.



Entropy Coding

The design of a variable-length code such that its average codeword
length approaches the entropy of the DMS is often referred to as entropy
coding.
In this section, we present two examples of entropy coding:
Shannon-Fano coding and
Huffman coding



Shannon-Fano Coding algorithm
Shannon showed: "To reliably store the information generated by some
random source X, you need, on average, no more and no less than
H(X) bits for each outcome."
Using an ASCII representation, a computer needs 8 bits = 1 byte to store
each outcome; a file of 1,000,000 outcomes would then have a size of 8,000,000 bits.
Shannon-Fano Coding algorithm (a sketch in Python follows this list):
1 List the source symbols in order of decreasing probability.
2 Partition the set into two sets that are as close to equiprobable as
possible, and assign 0 to the upper set and 1 to the lower set.
3 Continue this process, each time partitioning the sets with as nearly
equal probabilities as possible, until further partitioning is not possible.
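
A minimal Python sketch of this partitioning procedure (an added illustration; the recursive split mirrors steps 1-3 above):

```python
from typing import Dict, List, Tuple

def shannon_fano(symbols: List[Tuple[str, float]]) -> Dict[str, str]:
    """Assign Shannon-Fano codewords to (symbol, probability) pairs."""
    # Step 1: list the symbols in order of decreasing probability.
    items = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    codes = {sym: "" for sym, _ in items}

    def split(group: List[Tuple[str, float]]) -> None:
        if len(group) < 2:
            return
        total = sum(p for _, p in group)
        # Step 2: find the split that makes the two subsets as close to
        # equiprobable as possible.
        running, best_k, best_diff = 0.0, 1, float("inf")
        for k in range(1, len(group)):
            running += group[k - 1][1]
            diff = abs(running - (total - running))
            if diff < best_diff:
                best_diff, best_k = diff, k
        upper, lower = group[:best_k], group[best_k:]
        for sym, _ in upper:
            codes[sym] += "0"   # 0 to the upper set
        for sym, _ in lower:
            codes[sym] += "1"   # 1 to the lower set
        # Step 3: keep partitioning until no further split is possible.
        split(upper)
        split(lower)

    split(items)
    return codes

# Usage: five equally likely symbols (cf. Exercise-5 at the end of the chapter)
print(shannon_fano([(f"x{i}", 0.2) for i in range(1, 6)]))
```

For the five equiprobable symbols this particular split gives codeword lengths {2, 2, 2, 3, 3}, i.e. an average length of 2.4 bits/symbol against an entropy of log_2 5, roughly 2.32 bits/symbol.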



Huffman Coding Algorithm

1 List the source symbols in order of decreasing probability.


2 Combine the probabilities of the two symbols having the lowest
probabilities, and reorder the resultant probabilities; this step is called
reduction 1. The same procedure is repeated until there are two
ordered probabilities remaining.
3 Start encoding with the last reduction, which consists of exactly two
ordered probabilities. Assign 0 as the first digit in the codewords for all
the source symbols associated with the first probability; assign 1 to the
second probability.
4 Now go back and assign 0 and 1 to the second digit for the two
probabilities that were combined in the previous reduction step,
retaining all assignments made in Step 3.
5 Keep regressing this way until the first column is reached.



Cont. . . .

Example: Given a DMS with seven source letters x1 , x2 , . . . , x7 with


probabilities 0.35, 0.30, 0.20, 0.10, 0.04, 0.005, 0.005, respectively.
Order the symbols in decreasing order of probabilities.
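
The reduction table for this example is not reproduced in this extract; as a stand-in, the following sketch (an added illustration using Python's heapq) builds a Huffman code for these probabilities and compares the average codeword length with the source entropy:

```python
import heapq
import math
from itertools import count

def huffman(prob_map):
    """Return a dict mapping each symbol to its Huffman codeword."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {s: ""}) for s, p in prob_map.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # the two least probable groups
        p1, _, c1 = heapq.heappop(heap)
        for s in c0:                      # extend their codewords from the front
            c0[s] = "0" + c0[s]
        for s in c1:
            c1[s] = "1" + c1[s]
        heapq.heappush(heap, (p0 + p1, next(tiebreak), {**c0, **c1}))
    return heap[0][2]

probs = {"x1": 0.35, "x2": 0.30, "x3": 0.20, "x4": 0.10,
         "x5": 0.04, "x6": 0.005, "x7": 0.005}
code = huffman(probs)
L = sum(probs[s] * len(code[s]) for s in probs)      # average codeword length
H = -sum(p * math.log2(p) for p in probs.values())   # source entropy
print(code)
print(f"L = {L:.3f} bits/symbol, H(X) = {H:.3f} bits/symbol")
```

Tie-breaking may yield different but equally valid codeword assignments; the average codeword length is the same and, as the source coding theorem requires, it never falls below H(X).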





Cont. . . .

Exercise-5:
A Discrete Memoryless Source X has five equally likely symbols.
a. Construct a Shannon-Fano code for X, and calculate the efficiency of
the code.
b. Construct another Shannon-Fano code and compare the results.
c. Repeat for the Huffman code and compare the results.



Thank You
