

DSP: Huffman coding for data compression

1. Introduction
Data compression is a crucial aspect of modern computing, enabling efficient storage and transmission of information. Huffman coding is a widely used technique for lossless data compression. In this report, we explore Huffman data compression, including its algorithm, implementation in MATLAB/Python, experimental results, and observations.

2. Problem Description
The goal of Huffman coding is to efficiently represent data by assigning variable-length codes to input symbols based on their frequencies. Symbols with higher frequencies are assigned shorter codes, reducing the overall size of the encoded data.

3. Algorithm (Pseudocode)
1. Read the number of symbols and their corresponding probabilities.
2. Sort the probabilities in descending order.
3. Create a priority queue and insert the symbols with their probabilities.
4. Repeat until the priority queue contains only one element:
   a. Remove the two elements with the lowest probabilities from the priority queue.
   b. Create a new node whose probability is the sum of the two and whose symbol is the concatenation of their symbols.
   c. Insert the new node into the priority queue.
5. The remaining element in the priority queue is the root of the Huffman tree.
6. Generate the Huffman codebook by traversing the Huffman tree.
7. Encode the input signal using the Huffman codebook.
8. Decode the encoded signal using the Huffman tree.
9. Check that the decoded signal equals the original input signal.
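Steps 3 through 6 above, the priority-queue merging and the tree traversal, can be sketched in Python (the report's MATLAB implementation appears in Section 4; the function and variable names here are illustrative):

```python
import heapq
from itertools import count

def huffman_codebook(probs):
    """Build a Huffman codebook {symbol: bitstring} from {symbol: probability}."""
    # Priority queue of (probability, tiebreak, tree); a tree is either a
    # leaf symbol or a (left, right) pair for an internal node.
    tie = count()
    heap = [(p, next(tie), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)  # two lowest-probability nodes
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tie), (t1, t2)))
    codebook = {}
    def walk(tree, prefix):
        # Label left branches 0 and right branches 1 while descending.
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codebook[tree] = prefix or "0"  # lone-symbol edge case
    walk(heap[0][2], "")
    return codebook

codes = huffman_codebook({1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1})
# More probable symbols receive shorter (or equal-length) codewords.
assert len(codes[1]) <= len(codes[4])
```

Note that sorting the probabilities beforehand (step 2) is not strictly required in this sketch, since the heap always yields the two smallest entries.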


Algorithm 1: User-Interactive Huffman Encoding

1. Prompt the user to enter the number of symbols (x).
2. Create a vector n with values from 1 to x (the symbols).
3. Prompt the user to enter the probabilities for each symbol (p).
4. Sort probabilities in descending order (optional, for visualization).
5. Create a Huffman code dictionary (dict) using `huffmandict(n, p)`.
6. (Optional) Calculate the minimum bits per symbol (bps).
7. Generate a random symbol sequence (inputsig) with a defined length.
8. Encode the symbol sequence using the dictionary to get encoded data (code) with `huffmanenco(inputsig, dict)`.
9. (Optional) Display the encoded data.

This pseudocode highlights the user interaction for defining symbols and probabilities.

Algorithm 2: Core Huffman Encoding Process

1. Input: number of symbols (x), symbol probabilities (p).
2. Create a vector n containing symbols from 1 to x.
3. Build a Huffman tree using the probabilities (internal process of `huffmandict`).
4. Generate Huffman codewords for each symbol based on the tree (internal process of `huffmandict`).
5. Create a dictionary (dict) to map symbols to codewords.
6. Output: Huffman code dictionary (dict).

This focuses on the core logic of Huffman coding without user interaction details.

Algorithm 3: Encoding a Symbol Sequence with Huffman Coding

1. Input: symbol sequence (inputsig), Huffman code dictionary (dict).
2. Loop through each symbol in inputsig:
   - Find the corresponding Huffman codeword in dict.
   - Append the codeword to the encoded data (code).
3. Output: encoded data (code) as a sequence of 0s and 1s.

This pseudocode outlines the process of encoding a symbol sequence using a predefined Huffman
code dictionary.
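This lookup-and-append loop can be sketched in Python. The codebook below is a small hand-written example prefix code for illustration, not output from `huffmandict`:

```python
def huffman_encode(symbols, codebook):
    """Concatenate the codeword for each symbol into one bitstring."""
    return "".join(codebook[s] for s in symbols)

def huffman_decode(bits, codebook):
    """Scan the bitstring, emitting a symbol whenever a codeword matches.
    This works because a Huffman code is prefix-free: no codeword is a
    prefix of another."""
    inverse = {code: sym for sym, code in codebook.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out

# Example prefix code: more probable symbols get shorter codewords.
codebook = {1: "0", 2: "10", 3: "110", 4: "111"}
msg = [1, 3, 2, 1, 4]
bits = huffman_encode(msg, codebook)  # "0110100111"
assert huffman_decode(bits, codebook) == msg
```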
Program flowchart (summarized): START → (1) Enter no. of symbols and probabilities → (2) Display symbols and probabilities → (3) Sort probabilities in descending order → (4) Generate Huffman dictionary → (5) Generate random input signal → (6) Encode the signal → (7) Decode the signal → (8) Check if input equals decoded signal → (9) Convert input & encoded signal to binary → (10) Calculate sequence length → (11) Calculate encoded length → (12) Display results → END
4. MATLAB Implementation

x = input('enter the number of symbols ');
n = 1:x;
disp('the number of symbols are n:')
disp(n)
p = input('enter the probabilities ');
disp(p)
s = sort(p,'descend');
disp('the sorted probabilities are ')
disp(s)
dict = huffmandict(n,p);
bps = ceil(log2(max(n)));           % bits per symbol for a fixed-length code
inputsig = randsrc(100,1,[n;p]);    % random test sequence drawn from [n;p]
code = huffmanenco(inputsig,dict);  % Huffman-encode the sequence
sig = huffmandeco(code,dict);       % decode it again
isequal(inputsig,sig)               % displays 1 if decoding is lossless
binarySig = int2bit(inputsig,bps);  % fixed-length binary form of the input
seqLen = numel(binarySig)           % bits in the fixed-length encoding
encodedLen = numel(code)            % bits in the Huffman encoding (code is already binary)

5. Experimental Results and Observations


Sample Input: Let's assume we have 5 symbols with probabilities [0.4, 0.3, 0.2, 0.1, 0.0].

Experimental Results: After applying Huffman coding, the average code length reduces, and the encoded data size decreases significantly.

Observations: Huffman coding effectively reduces redundancy in the data by assigning shorter codes to more probable symbols and longer codes to less probable symbols.
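The size reduction for the sample input can be quantified with a short sketch (Python, illustrative; not part of the report's MATLAB code). It derives the Huffman codeword lengths by repeatedly merging the two lowest probabilities, then compares the average code length with the 3 bits per symbol a fixed-length code needs for 5 symbols:

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    """Return the Huffman codeword length for each probability."""
    # Heap entries carry (probability, tiebreak, indices of merged symbols).
    tie = count()
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, idx1 = heapq.heappop(heap)
        p2, _, idx2 = heapq.heappop(heap)
        for i in idx1 + idx2:  # every symbol under the merge gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), idx1 + idx2))
    return lengths

p = [0.4, 0.3, 0.2, 0.1, 0.0]
lengths = huffman_lengths(p)      # [1, 2, 3, 4, 4]
avg = sum(pi * li for pi, li in zip(p, lengths))
# avg is about 2.0 bits/symbol, versus 3 bits for a fixed-length code.
```

For these probabilities the average drops from 3 to about 2 bits per symbol, a one-third reduction.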
THEORY
1. x = input('enter the number of symbols ');

This function prompts the user to enter the number of symbols in your data. Without a second 's' argument, MATLAB's input evaluates the entry as an expression, so the result is stored as a number in the variable x.

It's mandatory because you need to define the number of symbols you're working with to create
the Huffman code dictionary.

2. n = 1:x;

This creates a row vector n containing sequential integers from 1 to the value stored in x.

It's typically used to represent the symbols in your data. Here, it assumes your symbols are
simply numbered from 1 to x.

3. disp('the number of symbols are n:');


disp(n);

These lines use the disp function to display informative messages to the user.

The first line displays "the number of symbols are n:" on the console.

The second line displays the contents of the vector n, showing the user the defined symbols.

These are not mandatory for functionality but improve clarity and user interaction.

4. p = input('enter the probabilities ');


disp(p);

Similar to input, this prompts the user to enter the probabilities associated with each symbol. It
stores the user's input as a row vector in p.

The values in p should sum to 1 and represent the likelihood of each symbol appearing in your
data.

This is mandatory because Huffman coding uses probabilities to create optimal codewords.

5. s = sort(p,'descend'); disp('the sorted probabilities are ');


disp(s);

sort(p,'descend') sorts the elements in p in descending order and stores the result in s.

disp functions again display messages and the sorted probabilities for user reference.

Sorting probabilities can be helpful for visualization and understanding the distribution of symbol
occurrences, but it's not strictly necessary for Huffman encoding itself.
6. dict = huffmandict(n, p);

This line is where the core Huffman coding happens.


- huffmandict is a MATLAB function that creates a Huffman code dictionary from the symbols (n) and their probabilities (p).
- The internal process (not directly accessible) builds a Huffman tree from the probabilities; the tree structure determines the optimal codewords (binary representations) that minimize the average codeword length.
- The output, dict, is a cell array with two columns:
  - the first column lists the symbols (values from n);
  - the second column contains the corresponding Huffman codewords as row vectors of 0s and 1s.
- This is mandatory because the Huffman code dictionary is essential for encoding and decoding your data.

7. bps = ceil(log2(max(n)));

This calculates the minimum number of bits required to represent the symbols in binary
form using ceil(log2(max(n))).
log2 calculates the base-2 logarithm, and ceil rounds the result up to the nearest integer.
It's informative to know the minimum bits needed, but not crucial for encoding/decoding.
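For the 5-symbol sample input this evaluates to 3 bits, as a quick Python check (for illustration) confirms:

```python
import math

n_symbols = 5  # symbols 1..5, as in the sample input of Section 5
bps = math.ceil(math.log2(n_symbols))  # minimum fixed-length bits per symbol
assert bps == 3
```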

8. inputsig = randsrc(100, 1, [n; p]);

This generates a random sequence of symbols (inputsig) with a length of 100.


- randsrc is a random number generator for symbols.
- The first argument (100) specifies the number of symbols to generate.
- The second argument (1) sets the number of output columns (a single column here).
- The third argument is a two-row matrix [n; p]:
  - the first row (n) lists the possible symbols (same as before);
  - the second row (p) gives the probability of each symbol.

This step might not be mandatory if you have your own data sequence, but it's useful for demonstration and testing.
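An equivalent weighted random draw in Python uses `random.choices` (an illustrative stand-in for randsrc; the symbols and probabilities below are sample values):

```python
import random

symbols = [1, 2, 3, 4]
probs = [0.4, 0.3, 0.2, 0.1]
random.seed(0)  # fixed seed so the demo run is reproducible
# Draw 100 symbols, each chosen with its given probability.
inputsig = random.choices(symbols, weights=probs, k=100)
assert len(inputsig) == 100
```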

9. code = huffmanenco(inputsig, dict);

This encodes the random symbol sequence (inputsig) using the Huffman code dictionary (dict).

- huffmanenco is another MATLAB function that performs the encoding based on the codewords in dict.
- The internal process (not directly accessible) replaces each symbol in inputsig with its corresponding Huffman codeword from dict.
- The output, code, is a vector containing the encoded data as a sequence of 0s and 1s.
- This is mandatory for encoding data using Huffman coding.
Example 1 and Example 2: sample program runs (figures).
6. Conclusion

Huffman coding is a powerful technique for lossless data compression. This report presented an overview of Huffman coding, its algorithm, implementation, and experimental results. Huffman coding efficiently reduces the size of data by assigning shorter codes to frequently occurring symbols, thus improving storage and transmission efficiency.

