DC Digital Communication
Module IV - Part I: INFORMATION THEORY

Information theory and coding: discrete messages, amount of information, entropy, information rate. Coding: Shannon's theorem, channel capacity, capacity of the Gaussian channel, bandwidth-S/N trade-off, use of orthogonal signals to attain Shannon's limit, efficiency of orthogonal signal transmission.
NOTES

If we use a binary PCM code to represent the M messages, the number of binary digits required to represent all the M = 2^N messages is N. That is, when there are M = 2^N equally likely messages, the amount of information conveyed by each message is equal to the number of binary digits needed to represent all the messages.

When two independent messages mk and ml are correctly identified, the amount of information conveyed is the sum of the information associated with each of the messages individually:

    Ik = log2(1/pk),   Il = log2(1/pl)

When the messages are independent, the probability of the composite message is pk·pl, so

    Ik,l = log2(1/(pk·pl)) = log2(1/pk) + log2(1/pl) = Ik + Il

EXAMPLE 1

A source produces one of four possible symbols during each interval, having probabilities p(x1) = 1/2, p(x2) = 1/4, p(x3) = p(x4) = 1/8. Obtain the information content of each of these symbols.

ANS:
    I(x1) = log2 2 = 1 bit
    I(x2) = log2 4 = 2 bits
    I(x3) = log2 8 = 3 bits
    I(x4) = log2 8 = 3 bits
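The computation in Example 1 is easy to check numerically. A minimal Python sketch (the probabilities are those given above) that evaluates I(x) = log2(1/p) for each symbol:

```python
from math import log2

probs = {"x1": 1/2, "x2": 1/4, "x3": 1/8, "x4": 1/8}
for sym, p in probs.items():
    # information content of a symbol of probability p
    print(f"I({sym}) = {log2(1/p):g} bits")
# I(x1) = 1, I(x2) = 2, I(x3) = 3, I(x4) = 3 bits
```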
EXAMPLE 2

The probabilities of five possible outcomes of an experiment are given as

    P(x1) = 1/2, P(x2) = 1/4, P(x3) = 1/8, P(x4) = P(x5) = 1/16

Determine the entropy and information rate if there are 16 outcomes per second.

    H(X) = Σ P(xi) log2(1/P(xi)) bits/symbol
         = (1/2) log2 2 + (1/4) log2 4 + (1/8) log2 8 + (1/16) log2 16 + (1/16) log2 16
         = 1/2 + 2/4 + 3/8 + 4/16 + 4/16 = 15/8 = 1.875 bits/outcome

Rate of outcomes r = 16 outcomes/sec.

Rate of information R = rH(X) = 16 × 15/8 = 30 bits/sec.

EXAMPLE 3

An analog signal band limited to 10 kHz is quantized into 8 levels of a PCM system with probabilities 1/4, 1/5, 1/5, 1/10, 1/10, 1/20, 1/20 and 1/20 respectively. Find the entropy and rate of information.

    fm = 10 kHz, so fs = 2 × 10 kHz = 20 kHz

Rate at which messages are produced: r = fs = 20 × 10^3 messages/sec.

    H(X) = (1/4) log2 4 + ((1/5) log2 5) × 2 + ((1/10) log2 10) × 2 + ((1/20) log2 20) × 3
         ≈ 2.74 bits/message

    R = rH(X) = 20000 × 2.7414 ≈ 54829 bits/sec
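Both examples follow the same recipe, H(X) = Σ p log2(1/p) and R = rH(X). A short Python sketch reproducing them (it gives ≈ 2.7414 bits/message for Example 3, confirming the corrected value above):

```python
from math import log2

def entropy(probs):
    """H(X) = sum of p*log2(1/p), in bits per symbol."""
    return sum(p * log2(1/p) for p in probs if p > 0)

# Example 2: five outcomes, r = 16 outcomes/sec
H2 = entropy([1/2, 1/4, 1/8, 1/16, 1/16])
print(H2, 16 * H2)        # 1.875 bits/outcome, 30.0 bits/sec

# Example 3: 8 PCM levels, r = fs = 20000 messages/sec
H3 = entropy([1/4, 1/5, 1/5, 1/10, 1/10, 1/20, 1/20, 1/20])
print(H3, 20000 * H3)     # ~2.7414 bits/message, ~54829 bits/sec
```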
SOURCE CODING

The conversion of the output of a DMS into a sequence of binary symbols (binary codes) is called source coding. The device that performs this is called a source encoder.

If some symbols are known to be more probable than others, then we may assign short code words to frequent source symbols and long code words to rare source symbols. Such a code is called a variable length code. As an example, in Morse code the letter E is encoded into a single dot, whereas the letter Q is encoded as '_ _ . _'. This is because in the English language the letter E occurs more frequently than the letter Q.

The source coding theorem states that for a DMS X with entropy H(X), the average code word length L per symbol is bounded as

    L ≥ H(X)

and L can be made as close to H(X) as desired for some suitably chosen code. When Lmin = H(X), the code efficiency is

    η = H(X)/L

No code can achieve efficiency greater than 1, but for any source there are codes with efficiency as close to 1 as desired. The proof does not give a method to find the best codes; it just sets a limit on how good they can be.
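To make the efficiency definition concrete, here is a minimal Python sketch; the source probabilities and codeword lengths are illustrative assumptions (a dyadic source, for which a code with L equal to H(X) exists), not values from the notes:

```python
from math import log2

def efficiency(probs, lengths):
    """Code efficiency eta = H(X) / L_bar."""
    H = sum(p * log2(1/p) for p in probs)               # source entropy
    L_bar = sum(p * n for p, n in zip(probs, lengths))  # average codeword length
    return H / L_bar

# dyadic source: lengths n_k = -log2(p_k), e.g. codewords 0, 10, 110, 111
print(efficiency([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3]))   # 1.0
```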
CLASSIFICATION OF CODES

Consider a source with four symbols and the following candidate codes:

    Symbol   Code 1   Code 2   Code 3   Code 4   Code 5   Code 6
    x1       00       00       0        0        0        1
    x2       01       01       1        10       01       01
    x3       00       10       00       110      011      001
    x4       11       11       11       111      0111     0001

    Fixed length codes: 1, 2
    Variable length codes: 3, 4, 5, 6
    Distinct codes: 2, 3, 4, 5, 6
    Prefix codes: 2, 4, 6
    Uniquely decodable codes: 2, 4, 6
    Instantaneous codes: 2, 4, 6

PREFIX CODING (INSTANTANEOUS CODING)

Consider a discrete memoryless source of alphabet {x0, x1, ..., xm-1} with statistics {p0, p1, ..., pm-1}. Let the code word assigned to source symbol xk be denoted by (mk1, mk2, ..., mkn), where the individual elements are 0s and 1s and n is the code word length. The initial part of the code word is represented by mk1, ..., mki for some i ≤ n, and any sequence made up of the initial part of the code word is called a prefix of the code word.

A prefix code is defined as a code in which no code word is a prefix of any other code word. It has the important property that it is always uniquely decodable. But the converse is not always true: a code that does not satisfy the prefix condition may still be uniquely decodable.
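The prefix condition is mechanical to test: no codeword may begin another codeword. A minimal sketch, applied to Codes 2, 3 and 4 from the table above:

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of any other codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_code(["00", "01", "10", "11"]))    # Code 2: True
print(is_prefix_code(["0", "1", "00", "11"]))      # Code 3: False ('0' begins '00')
print(is_prefix_code(["0", "10", "110", "111"]))   # Code 4: True
```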
EXAMPLE

We are transmitting 3.6fm bits/second. There are four levels; these four levels may be coded using binary PCM as shown below.

    Symbol   Probability   Binary digits
    Q1       1/8           00
    Q2       3/8           01
    Q3       3/8           10
    Q4       1/8           11

Two binary digits are needed to send each symbol. Since symbols are sent at the rate 2fm symbols/sec, the transmission rate of binary digits will be

    Binary digit rate = 2 binary digits/symbol × 2fm symbols/second = 4fm binary digits/second

Since one binary digit is capable of conveying 1 bit of information, the above coding scheme is capable of conveying 4fm information bits/sec. But we have seen earlier that we are transmitting only 3.6fm bits of information per second. This means that the information carrying ability of binary PCM is not completely utilized by this transmission scheme.
A channel with m inputs and n outputs is described by the channel matrix

    [P(Y|X)] = | P(y1|x1)  P(y2|x1)  ...  P(yn|x1) |
               | P(y1|x2)  P(y2|x2)  ...  P(yn|x2) |
               | ..................................|
               | P(y1|xm)  P(y2|xm)  ...  P(yn|xm) |

If the input probabilities P(X) are represented by the row matrix

    [P(X)] = [p(x1) p(x2) ... p(xm)]

and the output probabilities P(Y) are represented by the row matrix

    [P(Y)] = [p(y1) p(y2) ... p(yn)]

then the output probabilities may be expressed in terms of the input probabilities as

    [P(Y)] = [P(X)] [P(Y|X)]
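Since [P(Y)] = [P(X)][P(Y|X)] is just a row-vector/matrix product, it is easy to compute numerically. A sketch using NumPy, taking as test data the channel of Example 1 below (0.9/0.1 and 0.2/0.8 transitions):

```python
import numpy as np

P_X   = np.array([0.5, 0.5])            # input probabilities [P(X)]
P_YgX = np.array([[0.9, 0.1],           # channel matrix [P(Y|X)]
                  [0.2, 0.8]])

P_Y  = P_X @ P_YgX                      # [P(Y)] = [P(X)][P(Y|X)]
P_XY = np.diag(P_X) @ P_YgX             # joint matrix [P(X,Y)] = [P(X)]_d [P(Y|X)]
print(P_Y)    # [0.55 0.45]
print(P_XY)   # [[0.45 0.05]
              #  [0.1  0.4 ]]
```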
[Figure: a noiseless channel with m inputs and m outputs; each input xi is connected only to the corresponding output yi, with transition probability 1.]

EXAMPLE 1

[Figure: a binary channel with inputs x1 = 0, x2 = 1 and outputs y1 = 0, y2 = 1. Each input is received correctly with probability 1 - p and in error with probability p; this common transition probability is represented by p. The numerical values used in the solution below are p(y1|x1) = 0.9, p(y2|x1) = 0.1, p(y1|x2) = 0.2, p(y2|x2) = 0.8.]

(i) Find the channel matrix of the binary channel.
(ii) Find p(y1) and p(y2) when p(x1) = p(x2) = 0.5.
(iii) Find the joint probabilities p(x1, y2) and p(x2, y1) when p(x1) = p(x2) = 0.5.
SOLUTION

Channel matrix:

    [P(Y|X)] = | p(y1|x1)  p(y2|x1) | = | 0.9  0.1 |
               | p(y1|x2)  p(y2|x2) |   | 0.2  0.8 |

    [P(Y)] = [P(X)][P(Y|X)] = [0.5 0.5] | 0.9  0.1 | = [0.55 0.45]
                                        | 0.2  0.8 |

so p(y1) = 0.55, p(y2) = 0.45.

    [P(X,Y)] = [P(X)]_d [P(Y|X)] = | 0.5  0   | | 0.9  0.1 | = | 0.45  0.05 |
                                   | 0    0.5 | | 0.2  0.8 |   | 0.10  0.40 |

i.e. p(x1,y1) = 0.45, p(x1,y2) = 0.05, p(x2,y1) = 0.1, p(x2,y2) = 0.4.

EXAMPLE 2

Two binary channels of the above example are connected in cascade. Find the overall channel matrix and draw the resultant equivalent channel diagram. Find p(z1) and p(z2) when p(x1) = p(x2) = 0.5.

[Figure: two identical binary channels in cascade, x → y → z, each with correct-transition probabilities 0.9 and 0.8.]
SOLUTION

    [P(Z|X)] = [P(Y|X)][P(Z|Y)]

             = | 0.9  0.1 | | 0.9  0.1 | = | 0.83  0.17 |
               | 0.2  0.8 | | 0.2  0.8 |   | 0.34  0.66 |

    [P(Z)] = [P(X)][P(Z|X)] = [0.5 0.5] | 0.83  0.17 | = [0.585 0.415]
                                        | 0.34  0.66 |

[Figure: the equivalent single channel from x to z, with p(z1|x1) = 0.83, p(z2|x1) = 0.17, p(z1|x2) = 0.34, p(z2|x2) = 0.66.]
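Cascading channels is again a matrix product, [P(Z|X)] = [P(Y|X)][P(Z|Y)]. A short NumPy sketch checking the numbers in the solution above:

```python
import numpy as np

P_YgX = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
P_ZgY = P_YgX                           # second, identical channel in the cascade

P_ZgX = P_YgX @ P_ZgY                   # overall channel matrix [P(Z|X)]
P_Z   = np.array([0.5, 0.5]) @ P_ZgX    # [P(Z)] = [P(X)][P(Z|X)]
print(P_ZgX)   # [[0.83 0.17]
               #  [0.34 0.66]]
print(P_Z)     # [0.585 0.415]
```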
EXAMPLE 3

A channel has the channel matrix

    [P(Y|X)] = | 1-p   p   0   |
               | 0     p   1-p |

(i) Draw the channel diagram.
(ii) If the source has equally likely outputs, compute the probabilities associated with the channel outputs for p = 0.2.

SOLUTION

[Figure: channel diagram with x1 = 0 → y1 = 0 (probability 1-p) and x1 = 0 → y2 = e (probability p); x2 = 1 → y3 = 1 (probability 1-p) and x2 = 1 → y2 = e (probability p).]

This channel is known as the binary erasure channel (BEC). It has two inputs x1 = 0 and x2 = 1 and three outputs y1 = 0, y2 = e and y3 = 1, where e denotes erasure; the output is in doubt and hence should be erased.

    [P(Y)] = [0.5 0.5] | 0.8  0.2  0   | = [0.4 0.2 0.4]
                       | 0    0.2  0.8 |

EXAMPLE

[Figure: a three-input, three-output channel; the transition probabilities in the original diagram (values such as 1/2, 1/3, 1/4, 3/4) are not reliably recoverable from the extracted text.]

(i) Find the channel matrix.
(ii) Find the output probabilities if p(x1) = 1/2, p(x2) = p(x3) = 1/4.
(iii) Find the output entropy H(Y).
The mutual information is

    I(X;Y) = Σ_{i=0}^{m-1} p(xi) log2(1/p(xi)) - Σ_{i=0}^{m-1} Σ_{j=0}^{n-1} p(xi, yj) log2(1/p(xi|yj))

           = H(X) - H(X|Y)

H(X|Y) represents the average loss of information about a transmitted symbol when a symbol is received. It is called the equivocation of X with respect to Y.
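I(X;Y) = H(X) - H(X|Y) can be evaluated directly from the joint probabilities. A sketch for the binary channel of Example 1 above (equiprobable inputs); the numeric result, about 0.397 bits/symbol, is produced by the code rather than quoted from the notes:

```python
import numpy as np

def mutual_information(P_X, P_YgX):
    """I(X;Y) = H(X) - H(X|Y), in bits, for a discrete channel."""
    P_XY = np.diag(P_X) @ P_YgX              # joint p(x, y)
    P_Y  = P_X @ P_YgX                       # output probabilities p(y)
    H_X  = -np.sum(P_X * np.log2(P_X))
    P_XgY = P_XY / P_Y                       # p(x|y); columns indexed by y
    H_XgY = -np.sum(P_XY * np.log2(P_XgY))   # equivocation H(X|Y)
    return H_X - H_XgY

print(mutual_information(np.array([0.5, 0.5]),
                         np.array([[0.9, 0.1],
                                   [0.2, 0.8]])))   # ~0.397 bits/symbol
```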
SHANNON'S THEOREM

i. Given a source of M equally likely messages, with M >> 1, which is generating information at a rate R, and given a channel with channel capacity C; then if R ≤ C, there exists a coding technique such that the output of the source may be transmitted over the channel with a probability of error in the received message which may be made arbitrarily small.

ii. Given a source of M equally likely messages, with M >> 1, which is generating information at a rate R; then if R > C, the probability of error is close to unity for every possible set of M transmitter signals.

DIFFERENTIAL ENTROPY

Consider a continuous random variable X with probability density function fX(x). By analogy with the entropy of a discrete random variable we can introduce the definition

    h(X) = ∫_{-∞}^{∞} fX(x) log2(1/fX(x)) dx

h(X) is called the differential entropy of X, to distinguish it from the ordinary or absolute entropy H(X). The difference between h(X) and H(X) can be explained as below.
EXAMPLE

A signal amplitude X is a random variable uniformly distributed in the range (-1, 1). The signal is passed through an amplifier of gain 2. The output Y is also a random variable, uniformly distributed in the range (-2, +2). Determine the differential entropies of X and Y.

    h(X) = ∫_{-1}^{1} (1/2) log2 2 dx = (1/2)[x]_{-1}^{1} = 1 bit

    h(Y) = ∫_{-2}^{2} (1/4) log2 4 dy = (1/4) × 2 × [y]_{-2}^{2} = 2 bits

As Δx, Δy → 0,

    log2(Δy/Δx) = log2(dy/dx) = log2 2 = 1 bit, i.e., R1 = R2 + 1 bit

R1, the reference entropy of X, is higher than the reference entropy R2 of Y. Hence if X and Y have equal absolute entropies, their differential entropies must differ by 1 bit.

MUTUAL INFORMATION OF A CONTINUOUS CHANNEL

At the receiver we observe Y and attempt to determine the transmitted value X. Consider the event that at the transmitter a value of X in the interval (x, x+Δx) has been transmitted (Δx → 0). The amount of information transmitted is log2[1/(fX(x)Δx)], since the probability of the above event is fX(x)Δx.

Let the value of Y at the receiver be y, and let fX(x|y) be the conditional pdf of X given Y. Then fX(x|y)Δx is the probability that X will lie in the interval (x, x+Δx) when Y = y, provided Δx → 0. There remains an uncertainty about the event that X lies in the interval (x, x+Δx). This uncertainty, log2[1/(fX(x|y)Δx)], arises because of channel noise and therefore represents a loss of information.

Comparing with the discrete case, we can write the mutual information between the random variables X and Y as

    I(X;Y) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} fXY(x, y) log2[fX(x|y)/fX(x)] dx dy
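The two differential entropies in the example can also be obtained by integrating the definition h(X) = ∫ f log2(1/f) dx numerically. A minimal sketch (midpoint rule; the uniform pdfs are those of the example):

```python
from math import log2

def diff_entropy(pdf, a, b, n=100_000):
    """h(X) = integral of f(x) log2(1/f(x)) dx over (a, b), midpoint rule."""
    dx = (b - a) / n
    h = 0.0
    for i in range(n):
        f = pdf(a + (i + 0.5) * dx)
        if f > 0:
            h += f * log2(1 / f) * dx
    return h

print(diff_entropy(lambda x: 0.5,  -1, 1))   # h(X) = 1.0 bit
print(diff_entropy(lambda x: 0.25, -2, 2))   # h(Y) = 2.0 bits
```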
The channel capacity is

    C = max[I(X;Y)]

the maximization being over all possible input distributions. It is found that the PDF that maximizes h(X) is the Gaussian distribution, given by

    fX(x) = (1/(√(2π)σ)) e^{-(x-μ)²/(2σ²)}

Also, the random variables X and Y must have the same mean μ and the same variance σ².
Using the properties

    ∫_{-∞}^{∞} fY(x) dx = 1 and ∫_{-∞}^{∞} (x-μ)² fY(x) dx = σ²

we have

    h(Y) ≤ -∫_{-∞}^{∞} fY(x) log2(e) log_e[(1/(√(2π)σ)) e^{-(x-μ)²/(2σ²)}] dx

         ≤ -log2(e) ∫_{-∞}^{∞} fY(x) [-(x-μ)²/(2σ²) - log_e √(2π)σ] dx

         ≤ -log2(e) { ∫_{-∞}^{∞} fY(x) (-(x-μ)²/(2σ²)) dx - ∫_{-∞}^{∞} fY(x) log_e √(2π)σ dx }

         ≤ -log2(e) [-1/2 - log_e √(2π)σ]

         ≤ (1/2) log2 e + log2(e) log_e √(2π)σ

         ≤ (1/2) log2 e + (1/2) log2 2πσ²

    h(Y) ≤ (1/2) log2(2πσ²e)
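A quick Monte Carlo check that a Gaussian actually attains the bound h(Y) = (1/2) log2(2πσ²e); the sample size, seed and σ = 2 are arbitrary choices of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 2.0
x = rng.normal(mu, sigma, 1_000_000)

# h = -E[log2 f(X)] estimated from samples of X ~ N(mu, sigma^2)
log_f = -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)
h_mc = -np.mean(log_f) / np.log(2)                # convert nats -> bits

h_bound = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
print(h_mc, h_bound)   # both ~3.047 bits for sigma = 2
```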
CHANNEL CAPACITY OF A BAND LIMITED AWGN CHANNEL (SHANNON-HARTLEY THEOREM)

Because the channel is band limited, both the signal x(t) and the noise n(t) are band limited to B Hz; y(t) is also band limited to B Hz. All these signals are therefore completely specified by samples taken at the uniform rate of 2B samples/second. Now we have to find the maximum information that can be transmitted per sample. Let x, n and y represent samples of x(t), n(t) and y(t).

The information I(X;Y) transmitted per sample is given by

    I(X;Y) = h(Y) - h(Y|X)

By definition,

    h(Y|X) = ∫∫ fXY(x, y) log2(1/fY(y|x)) dx dy
           = ∫∫ fX(x) fY(y|x) log2(1/fY(y|x)) dx dy
           = ∫ fX(x) dx ∫ fY(y|x) log2(1/fY(y|x)) dy

For a given x, y is equal to a constant x + n. Hence the distribution of Y, when X has a given value, is identical to that of n except for a translation by x. If fN represents the PDF of the noise sample n,

    fY(y|x) = fN(y - x)

    ∫ fY(y|x) log2(1/fY(y|x)) dy = ∫ fN(y - x) log2(1/fN(y - x)) dy

Putting y - x = z,

    ∫ fY(y|x) log2(1/fY(y|x)) dy = ∫ fN(z) log2(1/fN(z)) dz = h(N)

so that h(Y|X) = h(N), the differential entropy of the noise.
Carrying the derivation through (with signal power S and noise power ηB in the bandwidth B) leads to the Shannon-Hartley law C = B log2(1 + S/(ηB)). As B → ∞, x = S/(ηB) → 0, and

    C∞ = lim_{x→0} (S/η) log2(1 + x)^{1/x} = (S/η) log2 e = 1.44 S/η

For a maximum C we can trade off S/N and B. If S/N is reduced we have to increase the bandwidth; if the bandwidth is to be reduced, we have to increase S/N, and so on.
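The trade-off and the B → ∞ limit are easy to see numerically from C = B log2(1 + S/N). A sketch (the bandwidths and S/N values are illustrative assumptions only):

```python
from math import log2

def capacity(B, snr):
    """Shannon-Hartley capacity C = B log2(1 + S/N), bits/second."""
    return B * log2(1 + snr)

# same capacity from different (B, S/N) pairs: the trade-off
print(capacity(3000, 1023))   # 3 kHz, S/N = 1023 -> 30000 b/s
print(capacity(5000, 63))     # 5 kHz, S/N = 63   -> 30000 b/s

# B -> infinity with fixed S/eta = 1000: C approaches 1.44 * S/eta
for B in (1e3, 1e4, 1e5, 1e6):
    print(capacity(B, 1000 / B))   # tends to ~1443 b/s
```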
ORTHOGONAL FUNCTIONS

A set of functions g_n(x) is orthogonal over the interval (x1, x2) if

    ∫_{x1}^{x2} g_n(x) g_m(x) dx = 0 for m ≠ n

A function f(x) may be expanded over such a set, the coefficient of g_n(x) being

    c_n = ∫_{x1}^{x2} f(x) g_n(x) dx / ∫_{x1}^{x2} g_n²(x) dx

A set of functions which are both orthogonal and normalised is called an orthonormal set.
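As an illustration of the coefficient formula, a sketch expanding f(x) = x over the orthogonal set {sin(nπx)} on (0, 1); the choice of f and of the set is this sketch's assumption:

```python
from math import sin, pi

def coeff(f, g, a, b, n=10_000):
    """c = (integral of f*g) / (integral of g^2) over (a, b), midpoint rule."""
    dx = (b - a) / n
    num = den = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        num += f(x) * g(x) * dx
        den += g(x) ** 2 * dx
    return num / den

# Fourier sine coefficients of f(x) = x: exact values 2(-1)^(n+1)/(n*pi)
for k in (1, 2, 3):
    print(coeff(lambda x: x, lambda x, k=k: sin(k * pi * x), 0.0, 1.0))
# ~0.6366, -0.3183, 0.2122
```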
MATCHED FILTER RECEPTION OF M-ARY FSK

Let a message source generate M messages, each with equal likelihood. Let each message be represented by one of the orthogonal set of signals s1(t), s2(t), ..., sM(t); the message interval is T. The signals are transmitted over a communication channel where they are corrupted by additive white Gaussian noise (AWGN).

At the receiver, a determination of which message has been transmitted is made through the use of M matched filters or correlators. Each correlator consists of a multiplier followed by an integrator; the local inputs to the multipliers are the signals si(t).

[Figure: the received signal from the source of M messages is applied to a bank of M correlators with local references s1(t), s2(t), ..., sM(t); each product is integrated over (0, T), giving the outputs e1, e2, ..., eM.]

Suppose that in the absence of noise the signal si(t) is transmitted and the output of each integrator is sampled at the end of a message interval.
Then, because of the orthogonality condition, all the integrators will have zero output except the ith integrator, whose output will be

    ∫_0^T si²(t) dt

It is adjusted to produce an output of Es, the symbol energy.

In the presence of an AWGN waveform n(t), the output of the lth correlator (l ≠ i) will be

    el = ∫_0^T (si(t) + n(t)) sl(t) dt = ∫_0^T sl(t) n(t) dt + ∫_0^T si(t) sl(t) dt = ∫_0^T n(t) sl(t) dt ≡ nl

The quantity nl is a Gaussian random variable with mean value zero and mean square value σ² = ηEs/2.

The correlator corresponding to the transmitted message will have an output

    ei = ∫_0^T (si(t) + n(t)) si(t) dt = ∫_0^T si²(t) dt + ∫_0^T n(t) si(t) dt = Es + ni

To determine which message has been transmitted we compare the matched filter outputs e1, e2, ..., eM. We decide that si(t) has been transmitted if the corresponding output ei is larger than the output of every other filter. The probability that some arbitrarily selected output el is less than the output ei is given by

    p(el < ei) = (1/(√(2π)σ)) ∫_{-∞}^{ei} e^{-el²/(2σ²)} del ................(1)
The probability that e1 and e2 are both smaller than ei is

    p(e1 < ei and e2 < ei) = p(e1 < ei) p(e2 < ei) = [p(el < ei)]²

since the noise outputs are identically distributed. The probability pL that ei is the largest of the outputs is therefore

    pL = p(ei > e1, e2, e3, ..., eM) = [p(el < ei)]^{M-1}

       = [ (1/(√(2π)σ)) ∫_{-∞}^{ei} e^{-el²/(2σ²)} del ]^{M-1}

       = [ (1/(√(2π)σ)) ∫_{-∞}^{Es+ni} e^{-el²/(2σ²)} del ]^{M-1}

Let el/(√2 σ) = x, so that del = √2 σ dx. When el = -∞, x = -∞; when el = Es + ni, x = Es/(√2 σ) + ni/(√2 σ). Then

    pL = [ (1/√π) ∫_{-∞}^{Es/(√2σ) + ni/(√2σ)} e^{-x²} dx ]^{M-1}
If y = ni/(√2 σ), then averaging pL over the Gaussian noise sample ni, and using σ² = ηEs/2 so that Es/(√2 σ) = √(Es/η), the probability of a correct decision is

    pC = (1/√π)^M ∫_{-∞}^{∞} e^{-y²} [ ∫_{-∞}^{√(Es/η)+y} e^{-x²} dx ]^{M-1} dy

and the probability of error is

    pe = 1 - pC

[Figure: pe versus Si/(ηR) on a logarithmic scale (10^-5 to 1), with curves for M = 2, 4, 16, 1024, 2048 and the limit M → ∞; the abscissa markings run from about 0.71 to 20. As M increases the curves steepen, and in the limit M → ∞ they approach a step at Si/(ηR) = ln 2 ≈ 0.69: error-free transmission is possible for Si/(ηR) > ln 2, which is Shannon's limit. With R = (log2 M)/T, the ratio Si/(ηR) equals Es/(η log2 M).]
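The double integral for pC can be evaluated numerically, using (1/√π) ∫_{-∞}^{u} e^{-x²} dx = (1 + erf(u))/2 for the inner integral. A sketch with SciPy (the finite integration limits ±10 are a numerical convenience of this sketch, not part of the derivation):

```python
from math import erf, exp, sqrt, pi
from scipy.integrate import quad

def pe_orthogonal(Es_over_eta, M):
    """Symbol error probability for M orthogonal signals in AWGN."""
    def integrand(y):
        inner = 0.5 * (1 + erf(sqrt(Es_over_eta) + y))  # (1/sqrt(pi)) * inner integral
        return exp(-y * y) * inner ** (M - 1)
    pc, _ = quad(integrand, -10, 10)   # integrand is negligible beyond |y| = 10
    return 1 - pc / sqrt(pi)

print(pe_orthogonal(4.0, 2))    # ~0.0228, i.e. Q(2), for M = 2 at Es/eta = 4
print(pe_orthogonal(4.0, 16))   # larger for M = 16 at the same Es/eta
```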