LECTURE NOTES ON DIGITAL IMAGE PROCESSING
UNIT-I
INTRODUCTION TO IMAGE PROCESSING
1.1 Introduction:
Digital image processing deals with developing a digital system that performs operations on a digital image. An image is nothing more than a two-dimensional signal. It is defined by the mathematical function f(x,y), where x and y are the two coordinates, horizontal and vertical, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point. When x, y and the amplitude values of f are all finite discrete quantities, we call the image a digital image. The field of digital image processing refers to the processing of digital images by means of a digital computer. A digital image is composed of a finite number of elements, each of which has a particular location and value; these elements are referred to as picture elements, image elements, pels, or pixels.
Digital image processing deals with manipulation of digital images through a digital computer. It is a subfield of signals and systems but focuses particularly on images. DIP focuses on developing a computer system that is able to perform processing on an image. The input of that system is a digital image; the system processes that image using efficient algorithms and gives an image as output. The most common example is Adobe Photoshop, one of the most widely used applications for processing digital images.
Applications:
Some of the major fields in which digital image processing is widely used are mentioned
below
ii) Specialized image processing hardware: It consists of the digitizer just mentioned, plus hardware that performs other primitive operations, such as an arithmetic logic unit (ALU), which performs arithmetic operations such as addition and subtraction and logical operations in parallel on images.
iii) Computer: It is a general-purpose computer and can range from a PC to a supercomputer depending on the application. In dedicated applications, sometimes specially designed computers are used to achieve a required level of performance.
iv) Software: It consists of specialized modules that perform specific tasks. A well-designed package also includes the capability for the user to write code that, as a minimum, utilizes the specialized modules. More sophisticated software packages allow the integration of those modules.
vi) Image display: Image displays in use today are mainly color TV monitors. These monitors are driven by the outputs of image and graphics display cards that are an integral part of the computer system.
vii) Hardcopy devices: The devices for recording images include laser printers, film cameras, heat-sensitive devices, inkjet units and digital units such as optical and CD-ROM disks. Film provides the highest possible resolution, but paper is the obvious medium of choice for written applications.
viii) Networking: It is almost a default function in any computer system in use today because of the large amount of data inherent in image processing applications. The key consideration in image transmission is bandwidth.
The eye is nearly a sphere with an average diameter of approximately 20 mm. The eye is enclosed by three membranes:
a) The cornea and sclera: The cornea is a tough, transparent tissue that covers the anterior surface of the eye. The rest of the optic globe is covered by the sclera.
b) The choroid: It contains a network of blood vessels that serve as the major source of nutrition to the eye. It also helps to reduce the amount of extraneous light entering the eye. It has two parts: (1) the iris diaphragm, which contracts or expands to control the amount of light that enters the eye, and (2) the ciliary body.
c) The retina: It is the innermost membrane of the eye. When the eye is properly focused, light from an object outside the eye is imaged on the retina. There are various light receptors over the surface of the retina. The two major classes of receptors are:
1) Cones: These number about 6 to 7 million and are located in the central portion of the retina, called the fovea. They are highly sensitive to color. Humans can resolve fine details with the cones because each one is connected to its own nerve end. Cone vision is called photopic or bright-light vision.
2) Rods: These are far more numerous, about 75 to 150 million, and are distributed over the entire retinal surface. The large area of distribution, and the fact that several rods are connected to a single nerve end, give a general, overall picture of the field of view. They are not involved in color vision and are sensitive to low levels of illumination. Rod vision is called scotopic or dim-light vision. The region of the retina where receptors are absent is called the blind spot.
The major difference between the lens of the eye and an ordinary optical lens is that the former is flexible. The shape of the lens of the eye is controlled by tension in the fibers of the ciliary body. To focus on a distant object, the controlling muscles cause the lens to be relatively flattened; to focus on an object near the eye, they allow the lens to become thicker. The distance between the center of the lens and the retina (the focal length) varies from about 17 mm to about 14 mm as the refractive power of the lens increases from its minimum to its maximum. When the eye focuses on an object farther away than about 3 m, the lens exhibits its lowest refractive power; when the eye focuses on a nearby object, the lens is most strongly refractive. The retinal image is focused primarily on the region of the fovea. Perception then takes place by the relative excitation of light receptors, which transform radiant energy into electrical impulses that are ultimately decoded by the brain.
Digital images are displayed as a discrete set of intensities. The range of light intensity levels to which the human visual system can adapt is enormous, on the order of 10^10, from the scotopic threshold to the glare limit. Experimental evidence indicates that subjective brightness is a logarithmic function of the light intensity incident on the eye.
The curve represents the range of intensities to which the visual system can adapt. But the visual system cannot operate over such a dynamic range simultaneously. Rather, it accomplishes this by changing its overall sensitivity, a phenomenon called brightness adaptation. For any given set of conditions, the current sensitivity level of the visual system is called the brightness adaptation level, Ba in the curve. The small intersecting curve represents the range of subjective brightness that the eye can perceive when adapted to this level. It is restricted at level Bb, at and below which all stimuli are perceived as indistinguishable blacks. The upper portion of the curve is not actually restricted but simply corresponds to raising the adaptation level higher than Ba. The ability of the eye to discriminate between changes in light intensity at any specific adaptation level is also of considerable interest. Take a flat, uniformly illuminated area large enough to occupy the entire field of view of the subject. It may be a diffuser, such as an opaque glass, that is illuminated from behind by a light source whose intensity I can be varied. To this field is added an increment of illumination ΔI in the form of a short-duration flash that appears as a circle in the center of the uniformly illuminated field. If ΔI is not bright enough, the subject cannot see any perceivable change.
As ΔI gets stronger, the subject may indicate a perceived change. ΔIc is the increment of illumination discernible 50% of the time with background illumination I. The quantity ΔIc/I is
called the Weber ratio. A small value means that a small percentage change in intensity is discernible, representing "good" brightness discrimination. A large value of the Weber ratio means that a large percentage change in intensity is required, representing "poor" brightness discrimination.
In optical illusions, the eye fills in non-existing information or wrongly perceives the geometrical properties of objects.
There are two broad categories of steps involved in image processing:
(1) Methods whose inputs and outputs are images, and
(2) Methods whose inputs may be images but whose outputs are attributes extracted from those images.
i) Image acquisition: It could be as simple as being given an image that is already in digital form. Generally, the image acquisition stage also involves preprocessing such as scaling.
ii) Image Enhancement: It is among the simplest and most appealing areas of digital image processing. The idea behind this is to bring out details that are obscured, or simply to highlight certain features of interest in an image. Image enhancement is a very subjective area of image processing.
iii) Image Restoration: It deals with improving the appearance of an image. It is an objective approach, in the sense that restoration techniques tend to be based on mathematical or probabilistic models of image degradation. Enhancement, on the other hand, is based on human subjective preferences regarding what constitutes a "good" enhancement result.
iv) Color image processing: It is an area that has been gaining importance because of the increasing use of digital images over the internet. Color image processing basically deals with color models and their implementation in image processing applications.
v) Wavelets and Multiresolution Processing: These are the foundation for representing
image in various degrees of resolution.
vi) Compression: It deals with techniques for reducing the storage required to save an image, or the bandwidth required to transmit it over a network. It has two major approaches: a) lossless compression and b) lossy compression.
vii) Morphological processing: It deals with tools for extracting image components that are
useful in the representation and description of shape and boundary of objects. It is majorly
used in automated inspection applications.
viii) Representation and Description: It almost always follows the output of the segmentation step, which is raw pixel data constituting either the boundary of a region or all the points in the region itself. In either case, converting the data to a form suitable for computer processing is necessary.
ix) Recognition: It is the process that assigns a label to an object based on its descriptors. It is the last step of image processing, and it uses artificial intelligence software.
Knowledge base:
Knowledge about a problem domain is coded into an image processing system in the form of a knowledge base. This knowledge may be as simple as detailing regions of an image where the information of interest is known to be located, thus limiting the search that has to be conducted in seeking that information. The knowledge base can also be quite complex, such as an interrelated list of all major possible defects in a materials inspection problem, or an image database containing high-resolution satellite images of a region in connection with change-detection applications.
An image is denoted by a two-dimensional function of the form f(x, y). The value or amplitude of f at spatial coordinates (x, y) is a positive scalar quantity whose physical meaning is determined by the source of the image. When an image is generated by a physical process, its values are proportional to energy radiated by a physical source. As a consequence, f(x,y) must be nonzero and finite; that is, 0 < f(x,y) < ∞. The function f(x,y) may be characterized by two components:
(a) The amount of the source illumination incident on the scene being viewed.
(b) The amount of the source illumination reflected back by the objects in the scene
These are called the illumination and reflectance components and are denoted by i(x,y) and r(x,y) respectively. The two functions combine as a product to form f(x,y). We call the intensity of a monochrome image at any coordinates (x,y) the gray level l of the image at that point, l = f(x,y), with
Lmin ≤ l ≤ Lmax
The interval [Lmin, Lmax] is called the gray scale. Common practice is to shift this interval numerically to the interval [0, L-1], where l = 0 is considered black and l = L-1 is considered white on the gray scale. All intermediate values are shades of gray varying from black to white.
To create a digital image, we need to convert the continuous sensed data into digital form. This involves two processes: sampling and quantization. An image may be continuous with respect to the x and y coordinates and also in amplitude. To convert it into digital form, we have to sample the function in both coordinates and in amplitude.
To sample this function, we take equally spaced samples along a line AB. The location of each sample is given by a vertical tick mark in the bottom part of the figure. The samples are shown as small blocks superimposed on the function; the set of these discrete locations gives the sampled function.
In order to form a digital image, the gray-level values must also be converted (quantized) into discrete quantities. Suppose we divide the gray-level scale into eight discrete levels, ranging from black to white. The continuous gray levels are quantized simply by assigning one of the eight discrete gray levels to each sample. The assignment is made depending on the vertical proximity of a sample to a vertical tick mark.
Starting at the top of the image and carrying out this procedure line by line produces a two-dimensional digital image.
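As a rough illustration of the sampling and quantization steps just described, the following Python sketch takes equally spaced samples along a line and assigns each sample to one of eight discrete gray levels. The continuous "scan line" used here is an assumed test function, not one taken from the notes.

```python
import numpy as np

# Illustrative continuous "scan line" (an assumed test function, not from the notes).
def f(x):
    return 0.5 + 0.4 * np.sin(2 * np.pi * x) + 0.1 * np.sin(10 * np.pi * x)

num_samples = 16              # sampling: equally spaced locations along the line AB
num_levels = 8                # quantization: eight discrete gray levels

x = np.linspace(0.0, 1.0, num_samples)     # sample locations (the "tick marks")
samples = f(x)                             # sampled, but still continuous in amplitude

# Quantize each sample by assigning the nearest of the eight allowed levels in [0, 1].
levels = np.linspace(0.0, 1.0, num_levels)
quantized = levels[np.argmin(np.abs(samples[:, None] - levels[None, :]), axis=1)]

print(np.round(samples, 3))
print(np.round(quantized, 3))
```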
A digital image f(m,n) described in a 2D discrete space is derived from an analog image
f(x,y) in a 2D continuous space through a sampling process that is frequently referred to as
digitization. The mathematics of that sampling process will be described in subsequent
Chapters. For now we will look at some basic definitions associated with the digital image.
The effect of digitization is shown in figure.
The 2D continuous image f(x,y) is divided into N rows and M columns. The intersection of a row and a column is termed a pixel. The value assigned to the integer coordinates (m,n), with m = 0, 1, 2, ..., N-1 and n = 0, 1, 2, ..., M-1, is f(m,n). In fact, in most cases f(m,n) is actually a function of many variables, including depth, color and time (t).
1) Low-level processes: These involve primitive operations such as image preprocessing to reduce noise, contrast enhancement and image sharpening. These kinds of processes are characterized by the fact that both inputs and outputs are images.
2) Mid-level image processing: It involves tasks like segmentation, description of objects to reduce them to a form suitable for computer processing, and classification of individual objects. The inputs to the process are generally images, but the outputs are attributes extracted from those images.
The result of sampling and quantization is a matrix of real numbers. Assume that an image f(x,y) is sampled so that the resulting digital image has M rows and N columns. The values of the coordinates (x,y) now become discrete quantities; thus the value of the coordinates at the origin becomes (x,y) = (0,0), and the next coordinate value along the first row is (x,y) = (0,1). This does not mean that these are the actual values of the physical coordinates when the image was sampled.
Each element of this matrix array is called an image element, picture element, pixel or pel. The matrix can be represented in the following form as well. The sampling process may be viewed as partitioning the xy plane into a grid, with the coordinates of the center of each grid element being a pair of elements from the Cartesian product Z², which is the set of all ordered pairs of elements (Zi, Zj) with Zi and Zj being integers from Z. Hence f(x,y) is a digital image if it assigns a gray level (that is, a real number from the set of real numbers R) to each distinct pair of coordinates (x,y). This functional assignment is the quantization process. If the gray levels are also integers, Z replaces R, and a digital image becomes a 2D function whose coordinates and amplitude values are integers. Due to processing, storage and hardware considerations, the number of gray levels typically is an integer power of 2:
L = 2^k
Then, the number, b, of bits required to store a digital image is b = M × N × k.
When M = N, this equation becomes b = N² × k.
When an image can have 2^k gray levels, it is referred to as a "k-bit image". An image with 256 possible gray levels is called an "8-bit image" (256 = 2^8).
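A quick numerical check of the storage relation b = M × N × k; the image sizes used below are arbitrary examples.

```python
def storage_bits(M, N, k):
    """b = M * N * k: bits needed to store an M x N image with 2**k gray levels."""
    return M * N * k

# Example 8-bit (k = 8) square images; the sizes are arbitrary examples.
for n in (256, 512, 1024):
    b = storage_bits(n, n, 8)
    print(f"{n} x {n}, 8-bit: {b} bits = {b // 8} bytes")
```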
1.9 Spatial and Gray level resolution:
Spatial resolution is the smallest discernible detail in an image. Suppose a chart is constructed with vertical lines of width w, with the space between them also having width w; a line pair then consists of one such line and its adjacent space. Thus the width of a line pair is 2w, and there are 1/2w line pairs per unit distance. Resolution is simply the smallest number of discernible line pairs per unit distance.
Gray-level resolution refers to the smallest discernible change in gray level. Measuring discernible changes in gray level is a highly subjective process. Reducing the number of bits k while keeping the spatial resolution constant creates the problem of false contouring.
It is caused by the use of an insufficient number of gray levels in the smooth areas of the digital image. It is called so because the ridges resemble topographic contours in a map. It is generally quite visible in images displayed using 16 or fewer uniformly spaced gray levels.
(i) Neighbors of a pixel:
A pixel p at coordinates (x,y) has four horizontal and vertical neighbors whose coordinates are given by
(x+1, y), (x-1, y), (x, y+1), (x, y-1)
This set of pixels, called the 4-neighbors of p, is denoted by N4(p). Each of them is a unit distance from (x,y), and some of the neighbors of p lie outside the digital image if (x,y) is on the border of the image. The four diagonal neighbors of p have coordinates
(x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1)
and are denoted by ND(p). These points, together with the 4-neighbors, are called the 8-neighbors of p, denoted by N8(p).
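A minimal sketch of these neighborhood definitions: given a coordinate (x, y) it lists N4(p), ND(p) and N8(p), discarding coordinates that fall outside an assumed image size, in line with the remark that some neighbors may lie outside the image.

```python
def n4(x, y):
    """4-neighbors of the pixel at (x, y)."""
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def nd(x, y):
    """Diagonal neighbors of the pixel at (x, y)."""
    return [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]

def n8(x, y):
    """8-neighbors: union of the 4-neighbors and the diagonal neighbors."""
    return n4(x, y) + nd(x, y)

def inside(coords, rows, cols):
    """Discard neighbors that fall outside an image with the given dimensions."""
    return [(i, j) for (i, j) in coords if 0 <= i < rows and 0 <= j < cols]

# Example: neighbors of the border pixel (0, 2) in an assumed 5 x 5 image.
print(inside(n4(0, 2), 5, 5))
print(inside(nd(0, 2), 5, 5))
print(inside(n8(0, 2), 5, 5))
```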
(ii) Adjacency:
Let V be the set of gray-level values used to define adjacency; in a binary image, V = {1} if we are referring to adjacency of pixels with value 1. Three types of adjacency are defined:
4-adjacency: two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).
8-adjacency: two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).
m-adjacency: two pixels p and q with values from V are m-adjacent if
(i) q is in N4(p), or
(ii) q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose values are from V.
(iii) Distance measures:
For pixels p, q and z with coordinates (x,y), (s,t) and (v,w) respectively, D is a distance function or metric if
(a) D(p,q) ≥ 0 (D(p,q) = 0 iff p = q),
(b) D(p,q) = D(q,p), and
(c) D(p,z) ≤ D(p,q) + D(q,z).
The Euclidean distance between p and q is defined as
De(p,q) = [(x-s)² + (y-t)²]^(1/2)
The D4 (city-block) distance between p and q is defined as
D4(p,q) = |x-s| + |y-t|
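The two distance measures can be computed directly from their definitions; the coordinates below are arbitrary examples.

```python
import math

def d_euclidean(p, q):
    """Euclidean distance between pixels p = (x, y) and q = (s, t)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def d4(p, q):
    """D4 (city-block) distance: |x - s| + |y - t|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

p, q = (2, 3), (5, 7)          # example coordinates, chosen arbitrarily
print(d_euclidean(p, q))       # 5.0
print(d4(p, q))                # 7
```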
The types of images in which we are interested are generated by the combination of an
“illumination” source and the reflection or absorption of energy from that source by the
elements of the “scene” being imaged. We enclose illumination and scene in quotes to
emphasize the fact that they are considerably more general than the familiar situation in
which a visible light source illuminates a common everyday 3-D (three-dimensional) scene.
For example, the illumination may originate from a source of electromagnetic energy such as
radar, infrared, or X-ray energy. But, as noted earlier, it could originate from less traditional
sources, such as ultrasound or even a computer-generated illumination pattern. Similarly, the
scene elements could be familiar objects, but they can just as easily be molecules, buried rock
formations, or a human brain. We could even image a source, such as acquiring images of the
sun. Depending on the nature of the source, illumination energy is reflected from, or
transmitted through, objects. An example in the first category is light reflected from a planar
surface. An example in the second category is when X-rays pass through a patient’s body for
the purpose of generating a diagnostic X-ray film. In some applications, the reflected or
transmitted energy is focused onto a photo converter (e.g., a phosphor screen), which
converts the energy into visible light. Electron microscopy and some applications of gamma
imaging use this approach. The idea is simple: Incoming energy is transformed into a voltage
by the combination of input electrical power and sensor material that is responsive to the
particular type of energy being detected. The output voltage waveform is the response of the
sensor(s), and a digital quantity is obtained from each sensor by digitizing its response. In this
section, we look at the principal modalities for image sensing and generation.
The figure shows the components of a single sensor. Perhaps the most familiar sensor of this type is the
photodiode, which is constructed of silicon materials and whose output voltage waveform is
proportional to light. The use of a filter in front of a sensor improves selectivity. For example,
a green (pass) filter in front of a light sensor favors light in the green band of the color
spectrum. As a consequence, the sensor output will be stronger for green light than for other
components in the visible spectrum.
In order to generate a 2-D image using a single sensor, there has to be relative displacements
in both the x- and y-directions between the sensor and the area to be imaged. Figure shows an
arrangement used in high-precision scanning, where a film negative is mounted onto a drum
whose mechanical rotation provides displacement in one dimension. The single sensor is
mounted on a lead screw that provides motion in the perpendicular direction. Since
mechanical motion can be controlled with high precision, this method is an inexpensive (but
slow) way to obtain high-resolution images. Other similar mechanical arrangements use a flat
bed, with the sensor moving in two linear directions. These types of mechanical digitizers
sometimes are referred to as microdensitometers.
A geometry that is used much more frequently than single sensors consists of an in-line arrangement of sensors in the form of a sensor strip, as the figure shows. The strip provides imaging
elements in one direction. Motion perpendicular to the strip provides imaging in the other
direction. This is the type of arrangement used in most flat bed scanners. Sensing devices
with 4000 or more in-line sensors are possible. In-line sensors are used routinely in airborne
imaging applications, in which the imaging system is mounted on an aircraft that flies at a
constant altitude and speed over the geographical area to be imaged. One dimensional
imaging sensor strips that respond to various bands of the electromagnetic spectrum are
mounted perpendicular to the direction of flight. The imaging strip gives one line of an image
at a time, and the motion of the strip completes the other dimension of a two-dimensional
image. Lenses or other focusing schemes are used to project the area to be scanned onto the
sensors. Sensor strips mounted in a ring configuration are used in medical and industrial
imaging to obtain cross-sectional (“slice”) images of 3-D objects.
The individual sensors arranged in the form of a 2-D array. Numerous electromagnetic and
some ultrasonic sensing devices frequently are arranged in an array format. This is also the
predominant arrangement found in digital cameras. A typical sensor for these cameras is a
CCD array, which can be manufactured with a broad range of sensing properties and can be packaged in rugged arrays of 4000 × 4000 elements or more. CCD sensors are used widely in digital
cameras and other light sensing instruments. The response of each sensor is proportional to
the integral of the light energy projected onto the surface of the sensor, a property that is used
in astronomical and other applications requiring low noise images. Noise reduction is
achieved by letting the sensor integrate the input light signal over minutes or even hours. Since the sensor array is two-dimensional, its key advantage is that a complete image can be obtained by focusing the energy pattern onto the surface of the array. Motion obviously is not necessary, as it is with the single-sensor and sensor-strip arrangements. The figure shows the energy from an illumination source being
reflected from a scene element, but, as mentioned at the beginning of this section, the energy
also could be transmitted through the scene elements. The first function performed by the
imaging system is to collect the incoming energy and focus it onto an image plane. If the
illumination is light, the front end of the imaging system is a lens, which projects the viewed
scene onto the lens focal plane. The sensor array, which is coincident with the focal plane,
produces outputs proportional to the integral of the light received at each sensor. Digital and
analog circuitry sweep these outputs and convert them to a video signal, which is then
digitized by another section of the imaging system.
To create a digital image, we need to convert the continuous sensed data into digital form.
This involves two processes: sampling and quantization. Consider a continuous image, f(x, y), that we want to convert to digital form. An image may be continuous with respect to the x- and y-
coordinates, and also in amplitude. To convert it to digital form, we have to sample the
function in both coordinates and in amplitude. Digitizing the coordinate values is called
sampling. Digitizing the amplitude values is called quantization.
A digital image is a finite collection of discrete samples (pixels) of any observable object. The pixels represent a two- or higher-dimensional "view" of the object, each pixel having its own discrete value in a finite range. The pixel values may represent the amount of visible light, infrared light, absorption of X-rays, electrons, or any other measurable value such as ultrasound wave impulses. The image does not need to have any visual meaning; it is sufficient that the samples form a two-dimensional spatial structure that may be illustrated as an image.
The images may be obtained by a digital camera, scanner, electron microscope, ultrasound
stethoscope, or any other optical or non-optical sensor. Examples of digital image are:
digital photographs
satellite images
Computer graphics, CAD drawings, and vector graphics in general are not considered in this
course even though their reproduction is a possible source of an image. In fact, one goal of
intermediate level image processing may be to reconstruct a model (e.g. vector
representation) for a given digital image.
1.14 Digitization:
A digital image consists of N × M pixels, each represented by k bits. A pixel can thus have 2^k different values, typically illustrated using different shades of gray (see the figure). In practical applications, the pixel values are considered as integers varying from 0 (black pixel) to 2^k − 1 (white pixel).
The images are obtained through a digitization process, in which the object is covered by a
two-dimensional sampling grid. The main parameters of the digitization are the sampling resolution (the number of pixels in the grid) and the number of bits used to represent each pixel.
These two parameters have a direct effect on the image quality but also on the storage size of the image (Table 1.1). In general, the quality of the images increases as the resolution and the bits per pixel increase. There are a few exceptions when reducing the number of bits increases the image quality because of increased contrast. Moreover, in an image with a very high resolution only very few gray levels are needed. In some applications it is more important to have a high resolution for detecting details in the image, whereas in other applications the number of different levels (or colors) is more important for a better outlook of the image. To sum up, if we have a certain amount of bits to allocate for an image, it makes a difference how we choose the digitization parameters.
The properties of human eye imply some upper limits. For example, it is known that the
human eye can observe at most one thousand different gray levels in ideal conditions, but in
any practical situations 8 bits per pixel (256 gray level) is usually enough. The required levels
decrease even further as the resolution of the image increases. In laser-quality printing, as in these lecture notes, even 6 bits (64 levels) gives quite a satisfactory result. On the other
hand, if the application is e.g. in medical imaging or in cartography, the visual quality is not
the primary concern. For example, if the pixels represent some physical measure and/or the
image will be analyzed by a computer, the additional accuracy may be useful. Even if human
eye cannot detect any differences, computer analysis may recognize the difference. The
requirement of the spatial resolution depends both on the usage of the image and the image
content. If the default printing (or display) size of the image is known, the scanning resolution
can be chosen accordingly so that the pixels are not seen and the image appearance is not
jagged (blocky). However, the final reproduction size of the image is not always known, and images are often acquired just for "later use". Thus, once the image is digitized, it will most likely (according to Murphy's law) later be edited and enlarged beyond what was allowed by the original resolution. The image content also sets some requirements on the resolution. If the image has very fine structure exceeding the sampling resolution, it may cause a so-called aliasing effect, where the digitized image has patterns that do not exist in the original.
(i)Point Operations: map each input pixel to output pixel intensity according to an intensity
transformation. A simple linear point operation which maps the input gray level f(m,n) to an
output gray level g(m,n)is given by:
g(m, n) = a·f(m, n) + b
where a and b are chosen to achieve a desired intensity variation in the image. Note that the output g(m,n) here depends only on the input f(m,n) at the same location.
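A minimal sketch of such a linear point operation on an 8-bit image stored as a NumPy array; the values of a and b are arbitrary examples, and the output is clipped back to the valid range.

```python
import numpy as np

def linear_point_op(f, a=1.5, b=-20):
    """g(m,n) = a * f(m,n) + b, clipped back to the valid 8-bit range."""
    g = a * f.astype(np.float64) + b
    return np.clip(g, 0, 255).astype(np.uint8)

# Example on a small synthetic image; a and b are arbitrary choices.
f = np.array([[10, 50, 90],
              [130, 170, 210]], dtype=np.uint8)
print(linear_point_op(f))
```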
(ii) Local Operations: determine the output pixel intensity as some function of a relatively small neighborhood of input pixels in the vicinity of the output location. A general linear operator can be expressed as a weighted sum of picture elements within a local neighborhood N. Simple local smoothing (for noise reduction) and sharpening (for deblurring or edge enhancement) operators can be both linear and non-linear.
(iii)Global Operations: the outputs depend on all input pixels values. If linear, global
operators can be expressed using two-dimensional convolution.
Median/order statistics
Homomorphic filters
In addition to enhancement and restoration, image processing generally includes issues of
representations, spatial sampling and intensity quantization, compression or coding and
segmentation. As part of computer vision, image processing leads to feature extraction and
pattern recognition or scene analysis.
UNIT–II
IMAGE TRANSFORMS
UNIT-III
IMAGE ENHANCEMENT
IMAGE ENHANCEMENT IN SPATIAL DOMAIN:
3.1. Introduction
The principal objective of enhancement is to process an image so that the result is more suitable than the original image for a specific application. Image enhancement approaches fall into two broad categories: spatial domain methods and frequency domain methods.
The term spatial domain refers to the image plane itself, and approaches in this category are based on direct manipulation of the pixels in an image. Spatial domain processes are denoted by the expression
g(x,y)=T[f(x,y)]
The neighborhood of a point (x,y) can be defined by using a square or rectangular sub-image area centered at (x,y).
The center of the sub-image is moved from pixel to pixel, starting at the top left corner. The operator T is applied at each location (x,y) to find the output g at that location. The process utilizes only the pixels in the area of the image spanned by the neighborhood.
s = T(r)
where r and s denote the gray levels of f(x,y) and g(x,y) at any point (x,y). Because enhancement at any point in the image depends only on the gray level at that point, techniques in this category are referred to as point processing.
It produces an image of higher contrast than the original one. The operation is performed by
darkening the levels below m and brightening the levels above m in the original image.
In this technique the values of r below m are compressed by the transformation function into a narrow range of s, towards black; the opposite effect takes place for the values of r above m. In the limiting case, T(r) produces a two-level (binary) image: the values below m are transformed to black and the values above m are transformed to white.
The negative of an image with gray levels in the range [0, L-1] is obtained by the transformation
s = L-1-r
Reversing the intensity levels of an image in this manner produces the equivalent of a photographic negative. This type of processing is particularly suited for enhancing white or gray details embedded in dark regions of an image, especially when the black areas are dominant in size.
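A one-line sketch of the negative transformation for an 8-bit image (L = 256); the sample pixel values are arbitrary.

```python
import numpy as np

def negative(f, L=256):
    """s = (L - 1) - r, applied to every pixel of an 8-bit image."""
    return (L - 1 - f.astype(np.int32)).astype(np.uint8)

f = np.array([[0, 64, 128],
              [192, 255, 30]], dtype=np.uint8)   # example pixel values
print(negative(f))
```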
The log transformation, s = c log(1 + r), maps a narrow range of low gray-level values in the input image into a wider range of output gray levels; the opposite is true for higher values of the input levels. We would use this transformation to expand the values of dark pixels in an image while compressing the higher-level values. The opposite is true for the inverse log transformation. The log transformation has the important characteristic that it compresses the dynamic range of images with large variations in pixel values.
A variety of devices used for image capture, printing and display respond according to a power law, s = c·r^γ. The process used to correct this power-law response phenomenon is called gamma correction. For example, CRT devices have an intensity-to-voltage response that is a power function. Gamma correction is important if displaying an image accurately on a computer screen is of concern. Images that are not corrected properly can look either bleached out or too dark. Color phenomena also use this concept of gamma correction, which is becoming more popular due to the use of images over the internet and is important in general-purpose contrast manipulation. A gamma value γ > 1 darkens an image, while γ < 1 brightens it.
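A short sketch of the power-law (gamma) transformation s = c·r^γ applied to normalized intensities; the gamma values and the test image are arbitrary examples (γ > 1 darkens, γ < 1 brightens).

```python
import numpy as np

def gamma_correct(f, gamma, c=1.0):
    """Power-law transformation s = c * r**gamma on a normalized 8-bit image."""
    r = f.astype(np.float64) / 255.0          # normalize to [0, 1]
    s = c * np.power(r, gamma)
    return np.clip(s * 255.0, 0, 255).astype(np.uint8)

f = np.array([[10, 60, 120],
              [180, 220, 250]], dtype=np.uint8)   # example pixel values
print(gamma_correct(f, gamma=0.4))   # gamma < 1 brightens the dark pixels
print(gamma_correct(f, gamma=2.5))   # gamma > 1 darkens the image
```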
The principal advantage of piecewise linear functions is that they can be arbitrarily complex, but their specification requires considerably more user input.
Contrast stretching is the simplest piecewise linear transformation function. We may have various low-contrast images, which might result from various causes such as lack of illumination, problems in the imaging sensor, or a wrong setting of the lens aperture during image acquisition. The idea behind contrast stretching is to increase the dynamic range of the gray levels in the image being processed.
The location of the points (r1,s1) and (r2,s2) controls the shape of the transformation curve:
a) If r1 = r2 and s1 = s2, the transformation is a linear function that produces no change in gray levels.
b) If r1 = r2, s1 = 0 and s2 = L-1, the transformation becomes a thresholding function that creates a binary image.
c) Intermediate values of (r1, s1) and (r2, s2) produce various degrees of spread in the gray levels of the output image, thus affecting its contrast.
Generally, r1 ≤ r2 and s1 ≤ s2 is assumed so that the function is single valued and monotonically increasing.
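A sketch of the piecewise-linear contrast stretching controlled by (r1, s1) and (r2, s2); the control points and the test image used here are assumed for illustration.

```python
import numpy as np

def contrast_stretch(f, r1, s1, r2, s2, L=256):
    """Piecewise-linear mapping through (0,0), (r1,s1), (r2,s2) and (L-1,L-1)."""
    out = np.interp(f.astype(np.float64), [0, r1, r2, L - 1], [0, s1, s2, L - 1])
    return out.astype(np.uint8)

f = np.array([[70, 90, 110],
              [130, 150, 170]], dtype=np.uint8)   # low-contrast example image
print(contrast_stretch(f, r1=70, s1=10, r2=170, s2=245))
```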
Gray-level slicing highlights a specific range of gray levels in an image. There are two basic approaches:
(1) One method is to display a high value for all gray levels in the range of interest and a low value for all other gray levels.
(2) The second method is to brighten the desired range of gray levels but preserve the background and gray-level tonalities in the image.
In bit-plane slicing, we consider the contribution of individual bits to the image. The high-order bits contain the majority of the visually significant data, while the lower-order bits contribute the more subtle details in the image. Separating a digital image into its bit planes is useful for analyzing the relative importance played by each bit of the image, and it helps in determining the adequacy of the number of bits used to quantize each pixel. It is also useful for image compression.
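A sketch of bit-plane slicing: each plane is obtained by masking out a single bit of every pixel; the sample pixel values are arbitrary.

```python
import numpy as np

def bit_plane(f, k):
    """Return the k-th bit plane (k = 0 is the least significant bit) as 0/1 values."""
    return (f >> k) & 1

f = np.array([[0, 85, 170],
              [255, 128, 1]], dtype=np.uint8)   # example pixel values
for k in range(8):
    print(f"plane {k}:\n{bit_plane(f, k)}")
```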
The histogram of a digital image with gray levels in the range [0, L-1] is a discrete function of the form
h(rk) = nk
where rk is the k-th gray level and nk is the number of pixels in the image having the level rk. A normalized histogram is given by the equation
p(rk) = nk / n
where n is the total number of pixels in the image. p(rk) gives an estimate of the probability of occurrence of gray level rk.
In a dark image the components of the histogram are concentrated on the low (dark) side of the gray scale. In a bright image the histogram components are biased towards the high side of the gray scale. The histogram of a low-contrast image will be narrow and centered towards the middle of the gray scale.
The components of the histogram in the high contrast image cover a broad range of the gray
scale. The net effect of this will be an image that shows a great deal of gray levels details and has
high dynamic range.
Histogram equalization is a common technique for enhancing the appearance of images. Suppose we have an image which is predominantly dark. Then its histogram would be skewed towards the lower end of the grey scale, and all the image detail would be compressed into the dark end of the histogram. If we could 'stretch out' the grey levels at the dark end to produce a more uniformly distributed histogram, the image would become much clearer.
Let r denote the gray levels of the continuous image to be enhanced. The range of r is [0, 1], with r = 0 representing black and r = 1 representing white. The transformation function is of the form
s = T(r), 0 ≤ r ≤ 1
It produces a level s for every pixel value r in the original image. The transformation function is assumed to fulfill two conditions: (a) T(r) is single valued and monotonically increasing in the interval 0 ≤ r ≤ 1, and (b) 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1. The single-valued condition ensures that the inverse transformation exists, and the monotonically increasing condition preserves the increasing order from black to white in the output image. The second condition guarantees that the output gray levels will be in the same range as the input levels. The gray levels of the image may be viewed as random variables in the interval [0, 1]. The most fundamental descriptor of a random variable is its probability density function (PDF). Let pr(r) and ps(s) denote the probability density functions of the random variables r and s respectively. A basic result from elementary probability theory states that if pr(r) and T(r) are known and T⁻¹(s) satisfies condition (a), then the probability density function ps(s) of the transformed variable s is given by the formula
ps(s) = pr(r) |dr/ds|
Thus the PDF of the transformed variable s is determined by the gray-level PDF of the input image and by the chosen transformation function.
An important point here is that although T(r) depends on pr(r), the resulting ps(s) is always uniform and independent of the form of pr(r). For discrete values, we deal with probabilities and summations instead of probability density functions and integrals. The probability of occurrence of gray level rk in an image is approximated by
pr(rk) = nk / N
where N is the total number of pixels in the image.
Thus a processed image is obtained by mapping each pixel with level rk in the input image into a corresponding pixel with level sk in the output image. A plot of pr(rk) versus rk is called a histogram. The transformation function given by the above equation is called histogram equalization or linearization. Given an image, the process of histogram equalization consists simply of implementing the transformation function, which is based on information that can be extracted directly from the given image, without the need for further parameter specification.
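A compact sketch of discrete histogram equalization as described above: compute the normalized histogram pr(rk), form its cumulative sum as the transformation function, and map every pixel through it. The small test image is an assumed example.

```python
import numpy as np

def histogram_equalize(f, L=256):
    """Discrete histogram equalization of an 8-bit image f."""
    hist = np.bincount(f.ravel(), minlength=L)         # nk for each gray level rk
    p = hist / f.size                                   # pr(rk) = nk / N
    cdf = np.cumsum(p)                                  # sk = sum of pr(rj) for j <= k
    mapping = np.round((L - 1) * cdf).astype(np.uint8)  # scale back to [0, L-1]
    return mapping[f]                                   # map each input level rk to sk

# Example: a predominantly dark image (values chosen for illustration).
f = np.array([[10, 12, 14],
              [15, 20, 40]], dtype=np.uint8)
print(histogram_equalize(f))
```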
In some cases it may be desirable to specify the shape of the histogram that we wish the
processed image to have. Histogram equalization does not allow interactive image
enhancement and generates only one result: an approximation to a uniform histogram.
Sometimes we need to be able to specify particular histogram shapes capable of highlighting
certain gray-level ranges. The method use to generate a processed image that has a specified
histogram is called histogram matching or histogram specification.
Algorithm:
1. Compute sk = T(rk), k = 0, ..., L-1, the histogram-equalization transformation of the input image.
2. Compute G(zk), k = 0, ..., L-1, the transformation function obtained from the given (specified) histogram hz.
3. Compute G⁻¹(sk) for each k = 0, ..., L-1 using an iterative method (iterate on z), or, in effect, directly compute G⁻¹(pf(k)).
Spatial filtering is an example of a neighborhood operation: the operations are performed on the values of the image pixels in a neighborhood together with the corresponding values of a sub-image that has the same dimensions as the neighborhood. This sub-image is called a filter, mask, kernel, template or window; the values in the filter sub-image are referred to as coefficients rather than pixels. Spatial filtering operations are performed directly on the pixel values (amplitude/gray scale) of the image. The process consists of moving the filter mask from point to point in the image. At each point (x,y) the response is calculated using a predefined relationship.
For linear spatial filtering the response is given by a sum of products of the filter coefficients and the corresponding image pixels in the area spanned by the filter mask. The result R of linear filtering with the filter mask at point (x,y) in the image is the sum of products of the mask coefficients with the corresponding pixels directly under the mask. The coefficient w(0,0) coincides with the image value f(x,y), indicating that the mask is centered at (x,y) when the computation of the sum of products takes place. For a mask of size m×n we assume m = 2a+1 and n = 2b+1, where a and b are nonnegative integers; that is, all masks are of odd size. In general, linear filtering of an image f of size M×N with a filter mask of size m×n is given by the expression
To generate a complete filtered image this equation must be applied for x = 0, 1, 2, ..., M-1 and y = 0, 1, 2, ..., N-1. Thus the mask processes all the pixels in the image. The process of linear filtering is similar to the frequency-domain concept called convolution. For this reason, linear spatial filtering often is referred to as convolving a mask with an image, and filter masks are sometimes called convolution masks.
Where w’s are mask coefficients and z’s are the values of the image gray levels
corresponding to those coefficients, mn is the total number of coefficients in the mask.
An important point in implementing neighborhood operations for spatial filtering is the issue
of what happens when the center of the filter approaches the border of the image. There are
several ways to handle this situation.
i) Limit the excursion of the center of the mask so that it stays at a distance of at least (n-1)/2 pixels from the border. The resulting filtered image will be smaller than the original, but all the pixels it contains will have been processed with the full mask.
ii) Filter all pixels only with the section of the mask that is fully contained in the image. This will create bands of pixels near the border that are processed with a partial mask.
iii) Pad the image by adding rows and columns of 0's, or pad by replicating rows and columns. The padding is removed at the end of the process.
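A direct (unoptimized) sketch of linear spatial filtering with an m×n mask, using zero padding at the border as in option iii); the 3×3 averaging mask and the test image are arbitrary examples.

```python
import numpy as np

def spatial_filter(f, w):
    """Linear spatial filtering: sum of products of the mask w and the pixels under it."""
    m, n = w.shape                          # mask size, assumed odd (m = 2a+1, n = 2b+1)
    a, b = m // 2, n // 2
    padded = np.pad(f.astype(np.float64), ((a, a), (b, b)), mode="constant")  # option iii)
    g = np.zeros(f.shape, dtype=np.float64)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            g[x, y] = np.sum(w * padded[x:x + m, y:y + n])
    return g

f = np.arange(25, dtype=np.float64).reshape(5, 5)   # example image
w = np.ones((3, 3)) / 9.0                           # 3x3 averaging (smoothing) mask
print(spatial_filter(f, w))
```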
vi) Multiply the result in (v) by (-1)^(x+y).
H(u,v) is called a filter because it suppresses certain frequencies in the image while leaving others unchanged.
Edges and other sharp transitions in the gray levels of an image contribute significantly to the high-frequency content of its Fourier transform. Hence smoothing is achieved in the frequency domain by attenuating a specified range of high-frequency components in the transform of a given image. The basic model of filtering in the frequency domain is
G(u,v) = H(u,v)F(u,v)
where F(u,v) is the Fourier transform of the image to be smoothed. The objective is to find a filter function H(u,v) that yields G(u,v) by attenuating the high-frequency components of F(u,v).
There are three types of low pass filters:
There are three types of low pass filters
1. Ideal
2. Butterworth
3. Gaussian
The ideal low pass filter is the simplest of the three. It cuts off all high-frequency components of the Fourier transform that are at a distance greater than a specified distance D0 from the origin of the (centered) transform. It is called a two-dimensional ideal low pass filter (ILPF) and has the transfer function
H(u,v) = 1 if D(u,v) ≤ D0, and H(u,v) = 0 if D(u,v) > D0
where D0 is a specified nonnegative quantity and D(u,v) is the distance from the point (u,v) to the center of the frequency rectangle. If the size of the image is M×N, the filter will also be of the same size, so the center of the frequency rectangle is at (u,v) = (M/2, N/2) because the transform has been centered. Because it is an ideal filter, all frequencies inside the circle of radius D0 are passed without any attenuation, whereas all frequencies outside the circle are completely attenuated. For an ideal low pass filter cross section, the point of transition between H(u,v) = 1 and H(u,v) = 0 is called the "cutoff frequency".
The Butterworth low pass filter has a parameter called the filter order. For high values of the filter order it approaches the form of the ideal filter, whereas for lower order values it is closer to a Gaussian filter; it may thus be viewed as a transition between the two extremes. The transfer function of a Butterworth low pass filter (BLPF) of order n with cutoff frequency at distance D0 from the origin is defined as
H(u,v) = 1 / [1 + (D(u,v)/D0)^(2n)]
The most appropriate value of n is 2. Unlike the ILPF, which establishes a clear cutoff between passed and filtered frequencies, the BLPF does not have a sharp discontinuity, so defining a cutoff frequency is a main concern with these filters. This filter gives a smooth transition in blurring, but ringing increases as a function of the filter order (higher orders lead to negative values).
The transfer function of a Gaussian low pass filter (GLPF) is
H(u,v) = e^(−D²(u,v) / (2 D0²))
where D(u,v) is the distance of the point (u,v) from the center of the transform and D0 is the cutoff frequency. This filter has the important characteristic that its inverse Fourier transform is also Gaussian.
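The three low pass transfer functions can be written down directly from their definitions. The sketch below builds H(u,v) for the ideal, Butterworth and Gaussian cases on a centered frequency rectangle; the test image and the cutoff D0 used in the usage lines are assumptions for illustration.

```python
import numpy as np

def distance_grid(M, N):
    """D(u,v): distance of each point from the center of an M x N frequency rectangle."""
    u = np.arange(M) - M / 2
    v = np.arange(N) - N / 2
    V, U = np.meshgrid(v, u)
    return np.sqrt(U ** 2 + V ** 2)

def ilpf(M, N, D0):
    """Ideal LPF: pass everything inside the circle of radius D0, cut everything outside."""
    return (distance_grid(M, N) <= D0).astype(np.float64)

def blpf(M, N, D0, n=2):
    """Butterworth LPF of order n."""
    return 1.0 / (1.0 + (distance_grid(M, N) / D0) ** (2 * n))

def glpf(M, N, D0):
    """Gaussian LPF."""
    D = distance_grid(M, N)
    return np.exp(-(D ** 2) / (2.0 * D0 ** 2))

# Usage: G(u,v) = H(u,v) F(u,v) on a centered spectrum, then invert (test image assumed).
f = np.random.default_rng(0).random((64, 64))
F = np.fft.fftshift(np.fft.fft2(f))
g = np.real(np.fft.ifft2(np.fft.ifftshift(glpf(64, 64, D0=10) * F)))
```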
Image sharpening can be achieved by a high pass filtering process, which attenuates the low
frequency components without disturbing high-frequency information. These are radially
symmetric and completely specified by a cross section. If we have the transfer function of a
low pass filter, the corresponding high pass filter can be obtained using the equation Hhp(u,v) = 1 − Hlp(u,v).
This filter is opposite of the Ideal Low Pass filter and has the transfer function of the form
The transfer function of Butterworth High Pass filter of order n is given by the equation
The transfer function of a Gaussain High Pass Filter is given by the equation
Homomorphic filters are widely used in image processing for compensating the effect of non-uniform illumination in an image. Pixel intensities in an image represent the light reflected from the corresponding points in the objects. As per the image model, an image f(x,y) may be characterized by two components: (1) the amount of source light incident on the scene being viewed, and (2) the amount of light reflected by the objects in the scene. These portions of light are called the illumination and reflectance components, and are denoted i(x,y) and r(x,y) respectively. The functions i(x,y) and r(x,y) combine multiplicatively to give the image function f(x,y):
f(x,y) = i(x,y) r(x,y)
where 0 < i(x,y) < ∞ and 0 < r(x,y) < 1. Homomorphic filters are used in situations where the image is subjected to multiplicative interference or noise, as depicted in this equation. We cannot easily use the above product to operate separately on the frequency components of illumination and reflectance, because the Fourier transform of f(x,y) is not separable; that is,
F{f(x,y)} ≠ F{i(x,y)} F{r(x,y)}
We can separate the two components by taking the logarithm of both sides:
ln f(x,y) = ln i(x,y) + ln r(x,y)
Taking the Fourier transform of both sides gives a function F that is the sum of the Fourier transforms of ln i(x,y) and ln r(x,y) respectively. The function F therefore represents the Fourier transform of the sum of two images: a low-frequency illumination image and a high-frequency reflectance image. If we now apply a filter with a transfer function that suppresses low-frequency components and enhances high-frequency components, then we can suppress the illumination component and enhance the reflectance component. Taking the inverse transform of the filtered result gives
f'(x,y) = i'(x,y) + r'(x,y)
in the logarithmic domain, and the anti-logarithm (exponential) of f'(x,y) yields the enhanced image.
Color of an object is determined by the nature of the light reflected from it. When a beam of
sunlight passes through a glass prism, the emerging beam of light is not white but consists
instead of a continuous spectrum of colors ranging from violet at one end to red at the other.
As shown in the figure, the color spectrum may be divided into six broad regions: violet, blue, green, yellow, orange, and red. When viewed in full color, no color in the spectrum ends
abruptly, but rather each color blends smoothly into the next.
Fig: Color spectrum seen by passing white light through a prism.
Radiance:
Radiance is the total amount of energy that flows from the light source, and it is usually measured in watts (W).
Luminance:
Luminance, measured in lumens (lm), gives a measure of the amount of energy an observer
perceives from a light source. For example, light emitted from a source operating in the far
infrared region of the spectrum could have significant energy (radiance), but an observer
would hardly perceive it; its luminance would be almost zero.
Brightness:
Brightness is a subjective descriptor of light perception that is practically impossible to measure; it embodies the achromatic notion of intensity.
The figure shows average experimental curves detailing the absorption of light by the red, green, and blue cones in the eye. Due to these absorption characteristics of the human eye, colors are seen as variable combinations of the so-called primary colors red (R), green (G), and blue (B). The primary colors can be added to produce the secondary colors of light: magenta (red plus blue), cyan (green plus blue), and yellow (red plus green). Mixing the three primaries, or a secondary with its opposite primary color, in the right intensities produces white light. The characteristics generally used to distinguish one color from another are brightness, hue, and saturation. Brightness embodies the achromatic notion of intensity. Hue is an attribute associated with the dominant wavelength in a mixture of light waves; it represents the dominant color as perceived by an observer. Saturation refers to the relative purity, or the amount of white light mixed with a hue. The pure spectrum colors are fully saturated; colors such as pink (red and white) and lavender (violet and white) are less saturated, with the degree of saturation being inversely proportional to the amount of white light added. Hue and saturation taken together are called chromaticity, and, therefore, a color may be characterized by its brightness and chromaticity.
UNIT-IV
b) Acquisition mode, or
c) Processing mode
a) Sensor noise
e) Others
The degradation process is modeled as a degradation function that, together with an additive noise term, operates on an input image f(x,y) to produce a degraded image g(x,y); the noise term is represented as η(x,y). If we are given g(x,y), some knowledge about the degradation function H, and some knowledge about the additive noise term η(x,y), the objective of restoration is to obtain an estimate f'(x,y) of the original image. We want the estimate to be as close as possible to the original image: the more we know about H and η, the closer f'(x,y) will be to f(x,y). If H is a linear, position-invariant process, then the degraded image is given in the spatial domain by
g(x,y) = f(x,y)*h(x,y) + η(x,y)
and in the frequency domain by
G(u,v) = F(u,v)H(u,v) + N(u,v)
The terms in capital letters are the Fourier transforms of the corresponding terms in the spatial domain.
The image restoration process can be achieved by inverting the image degradation process, i.e.,
F'(u,v) = G(u,v) / H(u,v)
where 1/H(u,v) is the inverse filter and F'(u,v) is the Fourier transform of the recovered image. Although the concept is relatively simple, the actual implementation is difficult to achieve, as it requires prior knowledge or identification of the unknown degradation function and the unknown noise source. In the following sections, common noise models and methods of estimating the degradation function are presented.
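A sketch of direct inverse filtering under an assumed, known degradation function H(u,v); the small threshold on |H| guards against division by near-zero values, and all test data are assumptions chosen for illustration.

```python
import numpy as np

def inverse_filter(g, H, eps=1e-3):
    """Estimate F(u,v) = G(u,v) / H(u,v), guarding against division by near-zero H."""
    G = np.fft.fft2(g)
    H_safe = np.where(np.abs(H) < eps, eps, H)
    return np.real(np.fft.ifft2(G / H_safe))

# Example with an assumed, known Gaussian-shaped degradation function H(u,v).
M = N = 64
u = np.fft.fftfreq(M)[:, None]
v = np.fft.fftfreq(N)[None, :]
H = np.exp(-(u ** 2 + v ** 2) * 10.0)

rng = np.random.default_rng(2)
f = rng.random((M, N))                              # placeholder "original" image
g = np.real(np.fft.ifft2(H * np.fft.fft2(f)))       # degraded image (no noise added)
f_hat = inverse_filter(g, H)
print(np.max(np.abs(f - f_hat)))                    # essentially zero in this noise-free case
```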
The principal source of noise in digital images arises during image acquisition and /or
transmission. The performance of imaging sensors is affected by a variety of factors, such as
environmental conditions during image acquisition and by the quality of the sensing elements
themselves. Images are corrupted during transmission principally due to interference in the
channels used for transmission. Since the main sources of noise in digital images result from atmospheric disturbance and image sensor circuitry, the following assumptions can be made: the noise is independent of spatial coordinates, and it is uncorrelated with respect to the image itself.
(i) Gaussian Noise: The PDF of a Gaussian random variable z is given by
p(z) = (1 / (√(2π) σ)) e^(−(z−μ)² / (2σ²))
where z represents the gray level, μ is the mean of z, and σ is its standard deviation.
(ii)Rayleigh Noise:
(iii)Gamma Noise:
The PDF of Erlang noise is given by
Its shape is similar to the Rayleigh distribution. This equation is referred to as the gamma density; it is correct only when the denominator is the gamma function.
(iv)Exponential Noise:
Exponential distribution has an exponential shape. The PDF of exponential noise is given as
(v)Uniform Noise:
(Multiplicative noise, by contrast, is signal dependent: the noise is multiplied with the image rather than added to it.)
When the only degradation present is additive noise, the model reduces to
g(x,y) = f(x,y) + η(x,y), or G(u,v) = F(u,v) + N(u,v)
In principle N(u,v) could be subtracted from G(u,v) to obtain an estimate of the original image, but the noise term is generally unknown, so spatial filtering is the method of choice when only additive noise is present. The following techniques can be used to reduce the noise effect:
(a) Arithmetic mean filter:
It is the simplest mean filter. Let Sxy represent the set of coordinates in a sub-image window of size m×n centered at point (x,y). The arithmetic mean filter computes the average value of the corrupted image g(x,y) in the area defined by Sxy; the value of the restored image f at any point (x,y) is the arithmetic mean computed using the pixels in the region defined by Sxy. This operation can be implemented using a convolution mask in which all coefficients have value 1/mn. A mean filter smooths local variations in an image, and noise is reduced as a result of blurring: every pixel value is replaced by the (weighted) mean of its neighboring pixels, which produces a smoothing effect in the image.
(b) Geometric mean filter:
Here, each restored pixel is given by the product of the pixels in the sub-image window, raised to the power 1/mn. A geometric mean filter achieves smoothing comparable to the arithmetic mean filter, but it tends to lose less image detail in the process.
(c) Harmonic mean filter:
The harmonic mean filter works well for salt noise but fails for pepper noise. It also does well with other types of noise, such as Gaussian noise.
(e)Median filter:
It is the best-known order-statistic filter; it replaces the value of a pixel by the median of the gray levels in the neighborhood of that pixel. The original value of the pixel is included in the computation of the median. Median filters are quite popular because, for certain types of random noise, they provide excellent noise-reduction capabilities with considerably less blurring than linear smoothing filters of similar size. They are particularly effective in the presence of bipolar and unipolar impulse noise.
Using the 100th percentile of a ranked set of numbers gives the max filter,
f(x,y) = max{ g(s,t) : (s,t) ∈ Sxy }
It is useful for finding the brightest points in an image. Because pepper noise has very low values, it is reduced by this filter through the max selection process in the sub-image area Sxy.
The 0th percentile filter is the min filter,
f(x,y) = min{ g(s,t) : (s,t) ∈ Sxy }
This filter is useful for finding the darkest points in an image. Also, it reduces salt noise as a result of the min operation.
(f)Midpoint filter:
The midpoint filter simply computes the midpoint between the maximum and minimum values in the area encompassed by the filter. It combines order statistics and averaging, and it works best for randomly distributed noise like Gaussian or uniform noise.
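A sketch of the arithmetic mean, geometric mean, median, max, min and midpoint filters over a 3×3 neighborhood, written directly from the definitions above; edge padding at the border and the small salt-and-pepper test image are implementation choices assumed for illustration.

```python
import numpy as np

def apply_filter(f, reducer, m=3, n=3):
    """Apply a mean or order-statistic reducer over every m x n neighborhood (edge padded)."""
    a, b = m // 2, n // 2
    padded = np.pad(f.astype(np.float64), ((a, a), (b, b)), mode="edge")
    g = np.zeros(f.shape, dtype=np.float64)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            g[x, y] = reducer(padded[x:x + m, y:y + n])
    return g

f = np.array([[10, 10, 250],
              [10, 10, 10],
              [10, 0, 10]], dtype=np.float64)        # one salt (250) and one pepper (0) pixel

print(apply_filter(f, np.mean))                                    # arithmetic mean
print(apply_filter(f, lambda w: np.prod(w) ** (1.0 / w.size)))     # geometric mean (a zero drives it to 0)
print(apply_filter(f, np.median))                                  # median
print(apply_filter(f, np.max))                                     # max (reduces pepper noise)
print(apply_filter(f, np.min))                                     # min (reduces salt noise)
print(apply_filter(f, lambda w: 0.5 * (w.max() + w.min())))        # midpoint
```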
Band reject filters remove or attenuate a band of frequencies about the origin of the Fourier transform; in their transfer functions, D(u,v) is the distance from the origin of the centered frequency rectangle. These filters are mostly used when the location of the noise components in the frequency domain is known. Sinusoidal noise can be easily removed by using these kinds of filters, because such noise shows up as two impulses that are mirror images of each other about the origin of the frequency transform.
The function of a band pass filter is opposite to that of a band reject filter: it allows a specific frequency band of the image to be passed and blocks the rest of the frequencies. The transfer function of a band pass filter can be obtained from a corresponding band reject filter with transfer function Hbr(u,v) by using the equation
Hbp(u,v) = 1 − Hbr(u,v)
These filters cannot be applied directly on an image, because they may remove too much image detail, but they are quite effective in isolating the effect on an image of selected frequency bands.
This filter incorporates both degradation function and statistical behavior of noise into the
restoration process. The main concept behind this approach is that the images and noise
are considered as random variables and the objective is to find an estimate f of the
uncorrupted image f such that the mean sequence error between then is minimized.
Assuming that the noise and the image are uncorrelated and that one or the other has zero
mean, the minimum of this error function is given in the frequency domain by the expression

F̂(u,v) = [ H*(u,v) / ( |H(u,v)|^2 + Sη(u,v)/Sf(u,v) ) ] G(u,v)

where
H(u,v) - degradation function
H*(u,v) - complex conjugate of H(u,v)
|H(u,v)|^2 = H*(u,v) H(u,v)
Sη(u,v) - power spectrum of the noise
Sf(u,v) - power spectrum of the undegraded image
G(u,v) - transform of the degraded image

The product of a complex quantity with its conjugate is equal to the magnitude of the complex
quantity squared. This result is known as the Wiener filter, named after its inventor,
N. Wiener. The term in the brackets is known as the minimum mean square error filter.
The restored image in the spatial domain is given by the inverse Fourier transform of the
frequency-domain estimate F̂(u,v).
This shows an interesting result: even if we know the degradation function, we cannot recover
the undegraded image exactly because N(u,v), the transform of the noise, is not known. If the
degradation function has zero or very small values, then the ratio N(u,v)/H(u,v) could easily
dominate the estimate F̂(u,v).
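As a hedged illustration of the frequency-domain expression above, the sketch below applies a
Wiener filter with NumPy; the degradation function H and the noise-to-signal power ratio
K = Sη/Sf are assumed to be known (in practice K is often treated as a constant and tuned
interactively):

import numpy as np

def wiener_restore(g, H, K=0.01):
    # g: degraded image (spatial domain); H: degradation function in the frequency domain,
    # same shape as the image; K: assumed noise-to-signal power ratio.
    G = np.fft.fft2(g)
    F_hat = (np.conj(H) / (np.abs(H) ** 2 + K)) * G   # Wiener estimate in the frequency domain
    return np.real(np.fft.ifft2(F_hat))               # restored image in the spatial domain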
IMAGE SEGMENTATION:
If an image has been preprocessed appropriately to remove noise and artifacts, segmentation
is often the key step in interpreting the image. Image segmentation is a process in which
regions or features sharing similar characteristics are identified and grouped together. Image
segmentation may use statistical classification, thresholding, edge detection, region detection,
or any combination of these techniques. The output of the segmentation step is usually a set
of classified elements, Most segmentation techniques are either region-based or edge based.
(i) Region-based techniques rely on common patterns in intensity values within a cluster of
neighboring pixels. The cluster is referred to as the region, and the goal of the segmentation
algorithm is to group regions according to their anatomical or functional roles.
(ii) Edge-based techniques rely on discontinuities in image values between distinct regions,
and the goal of the segmentation algorithm is to accurately demarcate the boundary
separating these regions.
Segmentation is a process of extracting and representing information from an image by grouping
pixels together into regions of similarity. Region-based segmentation methods attempt to
partition or group regions according to:
(a) Intensity values from the original images, or computed values based on an image operator.
Simpler systems may be restricted to a minimal set of properties, depending on the type of
data available.
Gray level thresholding is the simplest segmentation process. Many objects or image regions
are characterized by constant reflectivity or light absorption of their surface. Thresholding is
computationally inexpensive and fast. Thresholding can easily be done in real time using
specialized hardware. Complete segmentation can result from thresholding in simple scenes.
Search all the pixels f(i,j) of the image f. An image element g(i,j) of the segmented image is
an object pixel if f(i,j) >= T, and is a background pixel otherwise.
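A minimal sketch of this decision rule, assuming a NumPy grayscale image f and a threshold T
chosen from the histogram:

import numpy as np

def threshold_segment(f, T):
    # Object pixels become 1 where f(i, j) >= T, background pixels become 0 otherwise.
    return (f >= T).astype(np.uint8)

# e.g. g = threshold_segment(f, 128)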
Thresholding with a single global threshold is successful only under very unusual
circumstances; gray-level variations are likely due to non-uniform lighting, non-uniform input
device parameters, or a number of other factors. The threshold is therefore usually determined
from the image itself, T = T(f).
If some property of an image after segmentation is known a priori, the task of threshold
selection is simplified, since the threshold is chosen to ensure this property is satisfied. A
printed text sheet may be an example if we know that characters of the text cover 1/p of the
sheet area.
P-tile thresholding: choose a threshold T (based on the image histogram) such that 1/p of the
image area has gray values less than T and the rest has gray values larger than T.
In text segmentation, prior information about the ratio between the sheet area and character
area can be used.
If such a priori information is not available, another property can be used instead - for
example the average width of lines in drawings - and the threshold can be determined to
provide the required line width in the segmented image.
Bimodal histogram: if objects have approximately the same gray level that differs from the
gray level of the background, the resulting histogram is bimodal.
Mode Method: find the highest local maxima first and detect the threshold as a
minimum between them. To avoid detection of two local maxima belonging to the
same global maximum, a minimum distance in gray levels between these maxima is
usually required or techniques to smooth histograms are applied.
Region-based methods group pixels on the basis of homogeneity; commonly used homogeneity
criteria include:
• gray level
• color
• texture
• shape
• model
The basic purpose of region growing is to segment an entire image R into smaller sub-images
Ri, i = 1, 2, …, N, which satisfy the following conditions:
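The list of conditions itself is not reproduced above; the set usually required (stated here
as an assumption, following the standard region-based formulation) is:
(i) R1 ∪ R2 ∪ … ∪ RN = R (the regions cover the entire image),
(ii) each Ri is a connected region,
(iii) Ri ∩ Rj = ∅ for i ≠ j (the regions are disjoint),
(iv) P(Ri) = TRUE for every region Ri, and
(v) P(Ri ∪ Rj) = FALSE for adjacent regions Ri and Rj,
where P(Ri) is a logical homogeneity predicate defined over the points of Ri.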
(b)Region Splitting:
The basic idea of region splitting is to break the image into a set of disjoint regions, which are
coherent within themselves:
• Look at the area of interest and decide if all pixels contained in the region satisfy some
similarity constraint.
• If TRUE then the area of interest corresponds to an entire region in the image.
• If FALSE split the area of interest (usually into four equal subareas) and consider each
of the sub-areas as the area of interest in turn.
• This process continues until no further splitting occurs. In the worst case this
happens when the areas are just one pixel in size.
If only a splitting schedule is used then the final segmentation would probably contain many
neighboring regions that have identical or similar properties. We need to merge these regions.
(c)Region merging:
The result of region merging usually depends on the order in which regions are merged. The
simplest methods begin merging by starting the segmentation using regions of 2x2, 4x4 or
8x8 pixels. Region descriptions are then based on their statistical gray level properties. A
region description is compared with the description of an adjacent region; if they match, they
are merged into a larger region and a new region description is computed. Otherwise regions
are marked as non-matching. Merging of adjacent regions continues between all neighbors,
including newly formed ones. If a region cannot be merged with any of its neighbors, it is
marked `final' and the merging process stops when all image regions are so marked.
Merging Heuristics:
• Two adjacent regions are merged if a significant part of their common boundary
consists of weak edges
• Two adjacent regions are also merged if a significant part of their common boundary
consists of weak edges, but in this case not considering the total length of the region
borders.
Of the two given heuristics, the first is more general and the second cannot be used
alone because it does not consider the influence of different region sizes.
The last case is called the "Split and Merge" method. Region merging methods generally use
similar criteria of homogeneity as region splitting methods, and differ only in the direction
of their application.
To illustrate the basic principle of split and merge methods, let us consider an
imaginary image.
• Not all the pixels in Fig (a) are similar. So the region is split as in Fig. (b).
• Assume that all pixels within each of the regions I1, I2 and I3 are similar, but those in
I4 are not.
• Now assume that all pixels within each region are similar with respect to that region,
and that after comparing the split regions, regions I43 and I44 are found to be identical.
• This pair of regions is thus merged together, as shown in Fig. (d).
A combination of splitting and merging may result in a method with the advantages of both
the approaches. Split-and-merge approaches work using pyramid image representations.
Regions are square-shaped and correspond to elements of the appropriate pyramid level. If
any region in any pyramid level is not homogeneous (excluding the lowest level), it is split
into four sub-regions -- these are elements of higher resolution at the level below. If four
regions exist at any pyramid level with approximately the same value of homogeneity
measure, they are merged into a single region in an upper pyramid level. We can also
describe the splitting of the image using a tree structure, called a modified quad tree. Each
non-terminal node in the tree has at most four descendants, although it may have fewer due to
merging. Quad tree decomposition is an operation that subdivides an image into blocks that
contain "similar" pixels. Usually the blocks are square, although sometimes they may be
rectangular. For our purposes, pixels in a block are said to be "similar" if the range of
pixel values in the block is not greater than some threshold. Quad tree decomposition is used
in a variety of image analysis and compression applications. An unpleasant drawback of
segmentation quad trees is the square region shape assumption; it is not possible to merge
regions which are not part of the same branch of the segmentation tree.
Because both split-and-merge processing options are available, the starting segmentation
does not have to satisfy any of the homogeneity conditions. The segmentation process can be
understood as the construction of a segmentation quad tree where each leaf node represents a
homogeneous region. Splitting and merging corresponds to removing or building parts of the
segmentation quad tree.
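A compact sketch of quadtree splitting under the range-based similarity test described above;
it assumes a square NumPy image whose side is a power of two, and it omits the merging pass
for brevity:

import numpy as np

def quadtree_split(img, threshold, x=0, y=0, size=None, blocks=None):
    # Recursively split a block until the range of its pixel values is <= threshold.
    if size is None:
        size = img.shape[0]
    if blocks is None:
        blocks = []
    block = img[y:y + size, x:x + size]
    if size == 1 or block.max() - block.min() <= threshold:
        blocks.append((x, y, size))            # homogeneous block: keep as a leaf
    else:
        half = size // 2
        for dx, dy in [(0, 0), (half, 0), (0, half), (half, half)]:
            quadtree_split(img, threshold, x + dx, y + dy, half, blocks)
    return blocks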
(e)Region growing:
The region growing approach is the opposite of the split-and-merge approach:
• Start by choosing an arbitrary seed pixel and compare it with neighboring pixels
• Region is grown from the seed pixel by adding in neighboring pixels that are
similar, increasing the size of the region.
• When the growth of one region stops we simply choose another seed pixel which does
not yet belong to any region and start again.
• This whole process is continued until all pixels belong to some region.
• A bottom up method.
Region growing methods often give very good segmentations that correspond well to the
observed edges.
However starting with a particular seed pixel and letting this region grow completely before
trying other seeds biases the segmentation in favour of the regions which are segmented
first. This can have several undesirable effects:
• Problems can occur if the (arbitrarily chosen) seed point lies on an edge.
To counter the above problems, simultaneous region growing techniques have been
developed.
• Similarities of neighboring regions are taken into account in the growing process.
• Control of these methods may be quite complicated but efficient methods have
been developed.
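A simplified sketch of growing one region from a seed, assuming a grayscale NumPy image and a
similarity test based on the absolute difference from the running region mean; the outer loop
that keeps picking new seeds until every pixel is assigned is omitted:

import numpy as np
from collections import deque

def grow_region(img, seed, tol=10):
    # Grow a single region from `seed` by adding 4-connected neighbors whose gray level
    # is within `tol` of the current region mean.
    h, w = img.shape
    visited = np.zeros((h, w), dtype=bool)
    region = [seed]
    visited[seed] = True
    total = float(img[seed])
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        mean = total / len(region)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not visited[nr, nc]:
                if abs(float(img[nr, nc]) - mean) <= tol:
                    visited[nr, nc] = True
                    region.append((nr, nc))
                    total += float(img[nr, nc])
                    queue.append((nr, nc))
    return region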
(a)Edge detection:
Edges are places in the image with strong intensity contrast. Since edges often occur at image
locations representing object boundaries, edge detection is extensively used in image
segmentation when we want to divide the image into areas corresponding to different objects.
Representing an image by its edges has the further advantage that the amount of data is
reduced significantly while retaining most of the image information.
The Canny edge detector is optimal for step edges corrupted by white noise. Optimality is
related to three criteria:
o detection criterion ... important edges should not be missed, there should be no
spurious responses
o localization criterion ... distance between the actual and located position of the
edge should be minimal
o one response criterion ... minimizes multiple responses to a single edge (also partly
covered by the first criterion, since when there are two responses to a single edge, one of
them should be considered as false)
1) The edge detector was expressed for a 1D signal and the first two optimality criteria. A
closed form solution was found using the calculus of variations.
2) If the third criterion (multiple responses) is added, the best solution may be found by
numerical optimization. The resulting filter can be approximated effectively, with error less
than 20%, by the first derivative of a Gaussian smoothing filter with standard deviation σ;
the reason for doing this is the existence of an efficient implementation.
3) The detector is then generalized to two dimensions. A step edge is given by its
position, orientation, and possibly magnitude (strength).
Suppose G is a 2D Gaussian and assume we wish to convolve the image with an operator Gn which
is a first derivative of G in the direction n. The direction n is not known in advance;
however, a robust estimate of it based on the smoothed gradient direction is available.
The edge location is then at the local maximum, in the direction n, of the operator Gn
convolved with the image g.
This equation shows how to find local maxima in the direction perpendicular to the edge; this
operation is often referred to as non-maximum suppression.
o The strength of the edge (the magnitude of the gradient of the image intensity function g)
is measured as |Gn * g| = |∇(G * g)|.
Streaking means breaking up of the edge contour caused by the operator fluctuating
above and below the threshold.
o Individual weak responses usually correspond to noise, but if these points are
connected to any of the pixels with strong responses they are more likely to
be actual edges in the image.
o Such connected pixels are treated as edge pixels if their response is above a
low threshold.
o The low and high thresholds are set according to an estimated signal to noise
ratio.
5) The correct scale for the operator depends on the objects contained in the image.
The solution to this unknown is to use multiple scales and aggregate information from
them.
Different scale for the Canny detector is represented by different standard deviations
of the Gaussians.
There may be several scales of operators that give significant responses to edges
(i.e., signal to noise ratio above the threshold); in this case the operator with the
smallest scale is chosen as it gives the best localization of the edge.
o All significant edges from the operator with the smallest scale are marked
first.
o Edges of a hypothetical operator with a larger σ are synthesized from them (i.e., a
prediction is made of how the larger operator should perform on the evidence gleaned
from the smaller one).
o Then the synthesized edge response is compared with the actual edge response for the
larger σ.
o Additional edges are marked only if they have significantly stronger response
than that predicted from synthetic output.
This procedure may be repeated for a sequence of scales, a cumulative edge map is
built by adding those edges that were not identified at smaller scales.
Canny edge detector - algorithm:
1. Repeat steps (2) till (6) for ascending values of the standard deviation σ.
2. Convolve the image f with a Gaussian of scale σ.
3. Estimate local edge normal directions n for each pixel in the image.
4. Find the location of the edges (non-maximal suppression).
5. Compute the magnitude of the edges.
6. Threshold edges in the image with hysteresis to eliminate spurious responses.
7. Aggregate the final information about edges at multiple scales using the feature synthesis
approach.
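For reference, a hedged end-to-end usage sketch with OpenCV; cv2.Canny computes Sobel
gradients, applies non-maximum suppression and performs hysteresis thresholding, while the
Gaussian smoothing step (the scale σ) is applied explicitly beforehand. The file names, σ and
the thresholds are illustrative only:

import cv2

img = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)   # hypothetical input file
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)          # smoothing at an illustrative scale sigma
edges = cv2.Canny(blurred, 50, 150)                   # low/high hysteresis thresholds
cv2.imwrite('edges.png', edges)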
(b)Edge Operators:
Since edges consist of mainly high frequencies, we can, in theory, detect edges by applying a
high pass frequency filter in the Fourier domain or by convolving the image with an
appropriate kernel in the spatial domain. In practice, edge detection is performed in the
spatial domain, because it is computationally less expensive and often yields better results.
Since edges correspond to strong illumination gradients, we can highlight them by
calculating the derivatives of the image. We can see that the position of the edge can be
estimated with the maximum of the 1st derivative or with the zero-crossing of the 2nd
derivative. Therefore we want to find a technique to calculate the derivative of a two-
dimensional image. For a discrete one dimensional function f(i), the first derivative can be
approximated by f'(i) ≈ f(i+1) - f(i).
Calculating this formula is equivalent to convolving the function with [-1 1]. Similarly, the
2nd derivative can be estimated by convolving f(i) with [1 -2 1]. Different edge detection kernels
which are based on the above formula enable us to calculate either the 1st or the 2nd
derivative of a two-dimensional image. There are two common approaches to estimate the 1st
derivative in a two-dimensional image: Prewitt compass edge detection and gradient edge
detection.
Prewitt compass edge detection involves convolving
the image with a set of (usually 8) kernels, each of which is sensitive to a different edge
orientation. The kernel producing the maximum response at a pixel location determines the
edge magnitude and orientation. Different sets of kernels might be used: examples include
Prewitt, Sobel, Kirsch and Robinson kernels. Gradient edge detection is the second and more
widely used technique. Here, the image is
convolved with only two kernels, one estimating the gradient in the x-direction, Gx, the other
the gradient in the y-direction, Gy. The absolute gradient magnitude is then given by
|G| = sqrt(Gx^2 + Gy^2).
In many implementations, the gradient magnitude is the only output of a gradient edge
detector; however, the edge orientation might be calculated with θ = arctan(Gy/Gx). The most
common kernels
used for the gradient edge detector are the Sobel, Roberts Cross and Prewitt operators. After
having calculated the magnitude of the 1st derivative, we now have to identify those pixels
corresponding to an edge. The easiest way is to threshold the gradient image, assuming that
all pixels having a local gradient above the threshold must represent an edge. An alternative
technique is to look for local maxima in the gradient image, thus producing one pixel wide
edges. A more sophisticated technique is used by the Canny edge detector. It first applies a
gradient edge detector to the image and then finds the edge pixels using non-maximal
suppression and hysteresis tracking.
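A small sketch of gradient edge detection with the Sobel kernels, assuming a NumPy grayscale
image; the magnitude image would then be thresholded as described above:

import numpy as np
from scipy.ndimage import convolve

def sobel_gradient(img):
    img = img.astype(float)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])   # gradient in the x-direction (Gx)
    ky = kx.T                                              # gradient in the y-direction (Gy)
    gx = convolve(img, kx, mode='reflect')
    gy = convolve(img, ky, mode='reflect')
    magnitude = np.hypot(gx, gy)                           # |G| = sqrt(Gx^2 + Gy^2)
    orientation = np.arctan2(gy, gx)                       # edge orientation
    return magnitude, orientation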
An operator based on the 2nd derivative of an image is the Marr edge detector, also known
as the zero crossing detector. Here, the 2nd derivative is calculated using a Laplacian of
Gaussian (LoG) filter. The Laplacian has the advantage that it is an isotropic measure of the
2nd derivative of an image, i.e. the edge magnitude is obtained independently from the edge
orientation by convolving the image with only one kernel. The edge positions are then given
by the zero-crossings in the LoG image. The scale of the edges which are to be detected can
be controlled by changing the variance of the Gaussian. A general problem for edge
detection is its sensitivity to noise, the reason being that calculating the derivative in the
spatial domain corresponds to accentuating high frequencies and hence magnifying noise.
This problem is addressed in the Canny and Marr operators by convolving the image with a
smoothing operator (Gaussian) before calculating the derivative.
(c)Line detection:
While edges (i.e., boundaries between regions with relatively distinct gray levels) are by far
the most common type of discontinuity in an image, instances of thin lines in an image occur
frequently enough that it is useful to have a separate mechanism for detecting them. A
convolution based technique can be used which produces an image description of the thin
lines in an input image. Note that the Hough transform can be used to detect lines; however,
in that case, the output is a parametric description of the lines in an image. The line
detection operator consists of a convolution kernel tuned to detect the presence of lines of a
particular width n, at a particular orientation ө. Figure below shows a collection of four such
kernels, which each respond to lines of single pixel width at the particular orientation shown.
Four line detection kernels which respond maximally to horizontal, vertical and oblique (+45
and - 45 degree) single pixel wide lines. These masks above are tuned for light lines against a
dark background, and would give a big negative response to dark lines against a light
background. If we are only interested in detecting dark lines against a light background, then
we should negate the mask values. Alternatively, we might be interested in either kind of
line, in which case, we could take the absolute value of the convolution output.
If Ri denotes the response of kernel i, we can apply each of these kernels across an image,
and for any particular point, if Ri > Rj for all j ≠ i, that point is more likely to contain a
line whose orientation (and width) corresponds to that of kernel i. One usually thresholds Ri
to eliminate
weak lines corresponding to edges and other features with intensity gradients which have a
different scale than the desired line width. In order to find complete lines, one must join
together line fragments, ex: with an edge tracking operator.
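Since the figure with the four kernels is not reproduced here, the sketch below uses the
commonly quoted single-pixel-wide line masks (assumed, not taken from the original figure);
the response maps Ri are obtained by convolution and the strongest response selects the line
orientation:

import numpy as np
from scipy.ndimage import convolve

# Standard line-detection masks for light lines on a dark background.
kernels = {
    'horizontal': np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]]),
    'plus45':     np.array([[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]]),
    'vertical':   np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]]),
    'minus45':    np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]]),
}

def line_responses(img):
    img = img.astype(float)
    return {name: convolve(img, k, mode='reflect') for name, k in kernels.items()}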
(d)Corner detection:
Input to the corner detector is the gray-level image. Output is an image in which values are
proportional to the likelihood that the pixel is a corner. The Moravec detector is maximal in
pixels with high contrast. These points are on corners and sharp edges.
Using the ZH (Zuniga-Haralick) operator,
The image function f is approximated in the neighborhood of the pixel (i,j) by a cubic
polynomial with coefficients c_k. This is a cubic facet model. The ZH operator estimates
the corner strength based on the coefficients of the cubic facet model.
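A slow but direct sketch of the Moravec corner response, assuming a grayscale NumPy image; the
response at a pixel is the minimum sum of squared differences between its window and windows
shifted in a few directions (the shift set below is an assumption), so it is large only at
corners:

import numpy as np

def moravec_response(img, window=3):
    img = img.astype(float)
    h, w = img.shape
    r = window // 2
    response = np.zeros((h, w))
    shifts = [(1, 0), (0, 1), (1, 1), (1, -1)]    # assumed shift directions
    for y in range(r + 1, h - r - 1):
        for x in range(r + 1, w - r - 1):
            patch = img[y - r:y + r + 1, x - r:x + r + 1]
            ssd = [np.sum((img[y - r + dy:y + r + 1 + dy,
                               x - r + dx:x + r + 1 + dx] - patch) ** 2)
                   for dy, dx in shifts]
            response[y, x] = min(ssd)             # corner strength: minimum SSD over the shifts
    return response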
UNIT-V
IMAGE COMPRESSION
1.1 Introduction:
In recent years, there have been significant advancements in algorithms and architectures for
the processing of image, video, and audio signals. These advancements have proceeded along
several directions. On the algorithmic front, new techniques have led to the development of
robust methods to reduce the size of the image, video, or audio data. Such methods are
extremely vital in many applications that manipulate and store digital data. Informally, we
refer to the process of size reduction as a compression process. We will define this process in
a more formal way later. On the architecture front, it is now feasible to put sophisticated
compression processes on a relatively low-cost single chip; this has spurred a great deal of
activity in developing multimedia systems for the large consumer market.
1.2 Background:
Example 1: Let us consider facsimile image transmission. In most facsimile machines, the
document is scanned and digitised. Typically, an 8.5x11 inches page is scanned at 200 dpi;
thus, resulting in 3.74 Mbits. Transmitting this data over a low-cost 14.4 kbits/s modem
would require 5.62 minutes. With compression, the transmission time can be reduced to 17
seconds. This results in substantial savings in transmission costs.
Image, video, and audio signals are amenable to compression due to the factors below.
1. Within a single image or a single video frame, there exists significant correlation among
neighbor samples. This correlation is referred to as spatial correlation.
2. For data acquired from multiple sensors (such as satellite images), there exists significant
correlation amongst samples from these sensors. This correlation is referred to as spectral
correlation.
3. For temporal data (such as video), there is significant correlation amongst samples in
different segments of time. This is referred to as temporal correlation.
4. Some data tends to have high-level features that are redundant across space and time;
that is, the data is of a fractal nature.
For a given application, compression schemes may exploit any one or all of the above factors
to achieve the desired compression data rate.
There are many applications that benefit from data compression technology. Table shown
above lists a representative set of such applications for image, video, and audio data, as well
as typical data rates of the corresponding compressed bit streams. Typical data rates for the
uncompressed bit streams are also shown.
The core of the encoder is the source coder. The source coder performs the compression
process by reducing the input data rate to a level that can be supported by the storage or
transmission medium. The bit rate output of the encoder is measured in bits per sample or bits
per second. For image or video data, a pixel is the basic element; thus, bits per sample is also
referred to as bits per pixel or bits per pel. In the literature, the term compression ratio,
denoted as cr, is also used instead of bit rate to characterise the capability of the
compression system. In a practical system, the source coder is usually followed by a second
level of coding:
the channel coder. The channel coder translates the compressed bit stream into a signal
suitable for either storage or transmission. In most systems, source coding and channel
coding are distinct processes. In recent years, methods to perform combined source and
channel coding have also been developed. Note that, in order to reconstruct the image, video,
or audio signal, one needs to reverse the processes of channel coding and source coding. This
is usually performed at the decoder.
From a system design viewpoint, one can restate the compression problem as a bit rate
minimization problem, where several constraints may have to be met, including the
following:
• Specified level of signal quality. This constraint is usually applied at the decoder.
• Communication delay. This constraint refers to the end-to-end delay, and is measured
from the start of encoding a sample to the complete decoding of that sample.
Note that, these constraints have different importance in different applications. For example,
in a two-way teleconferencing system, the communication delay might be the major
constraint, whereas, in a television broadcasting system, signal quality and decoder
complexity might be the main constraints.
Compression methods fall into two broad categories:
• Lossless compression
• Lossy compression
(i)Lossless compression:
In many applications, the decoder has to reconstruct without any loss the original data. For a
lossless compression process, the reconstructed data and the original data must be identical in
value for each and every data sample. This is also referred to as a reversible process. In
lossless compression, for a specific application, the choice of a compression method involves
a trade-off along the three dimensions depicted in Figure that is, coding efficiency, coding
complexity, and coding delay.
Coding Efficiency:
This is usually measured in bits per sample or bits per second (bps). Coding efficiency is
usually limited by the information content or entropy of the source. In intuitive terms, the
entropy of a source X provides a measure for the "randomness" of X. From a compression
theory point of view, sources with large entropy are more difficult to compress (for example,
random noise is very hard to compress).
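A small numeric illustration of the entropy bound mentioned above, assuming a discrete source
with a known symbol probability distribution:

import math

def entropy(probabilities):
    # H(X) = -sum p_i * log2(p_i), the lower bound in bits per symbol for lossless coding.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits/symbol
print(entropy([0.25] * 4))                  # 2.0 bits/symbol (a uniform source is hardest to compress)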
Coding Complexity :
Coding Delay:
A complex compression process often leads to increased coding delays at the encoder and the
decoder. Coding delays can be alleviated by increasing the processing power of the
computational engine; however, this may be impractical in environments where there is a
power constraint or when the underlying computational engine cannot be improved.
Furthermore, in many applications, coding delays have to be constrained; for example, in
interactive communications. The need to constrain the coding delay often forces the
compression system designer to use a less sophisticated algorithm for the compression
processes.
From this discussion, it can be concluded that these trade-offs in coding complexity, delay,
and efficiency are usually limited to a small set of choices along these axes. In a subsequent
section, we will briefly describe the trade-offs within the context of specific lossless
compression methods.
(ii)Lossy compression:
The majority of the applications in image or video data processing do not require that the
reconstructed data and the original data are identical in value. Thus, some amount of loss is
permitted in the reconstructed data. A compression process that results in an imperfect
reconstruction is referred to as a lossy compression process. This compression process is
irreversible. In practice, most irreversible compression processes rapidly degrade the signal
quality when they are repeatedly applied on previously decompressed data.
The choice of a specific lossy compression method involves trade-offs along the four
dimensions shown in Figure. Due to the additional degree of freedom, namely, in the signal
quality, a lossy compression process can yield higher compression ratios than a lossless
compression scheme.
Signal Quality: This term is often used to characterize the signal at the output of the decoder.
One measure that is often cited is the signal-to-noise ratio, which can be expressed as
SNR = 10 log10 (encoder input signal energy / noise signal energy).
The noise signal energy is defined as the energy measured for a hypothetical signal that is
the difference between the encoder input signal and the decoder output signal. Note that, SNR as
defined here is given in decibels (dB). In the case of images or video, PSNR (peak signal-to-
noise ratio) is used instead of SNR . The calculations are essentially the same as in the case of
SNR , however, in the numerator, instead of using the encoder input signal one uses a
hypothetical signal with a signal strength of 255 (the maximum decimal value of an unsigned
8-bit number, such as in a pixel). High SNR or PSNR values do not always correspond to
signals with perceptually high quality. Another measure of signal quality is the mean opinion
score, where the performance of a compression process is characterized by the subjective
quality of the decoded signal. For instance, a five point scale such as very annoying,
annoying, slightly annoying, perceptible but not annoying, and imperceptible might be used
to characterize the impairments in the decoder output.
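A sketch of the SNR and PSNR computations described above for 8-bit images, assuming the
encoder input and decoder output are available as NumPy arrays:

import numpy as np

def snr_db(original, decoded):
    noise = original.astype(float) - decoded.astype(float)
    return 10 * np.log10(np.sum(original.astype(float) ** 2) / np.sum(noise ** 2))

def psnr_db(original, decoded):
    mse = np.mean((original.astype(float) - decoded.astype(float)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)   # 255 is the peak value of an unsigned 8-bit sample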
In either lossless or lossy compression schemes, the quality of the input data affects the
compression ratio. For instance, acquisition noise, data sampling timing errors, and even the
analogue-to-digital conversion process affects the signal quality and reduces the spatial and
temporal correlation. Some compression schemes are quite sensitive to the loss in correlation
and may yield significantly worse compression in the presence of noise.
In this chapter, we have introduced some fundamental concepts related to image, video, and
audio compression. When choosing a specific compression method, one should consider the
following issues:
Interplay with other data modalities, such as audio and video. In a system where
several data modalities have to be supported, the compression methods for each
modality should have some common elements. For instance, in an interactive
videophone system, the audio compression method should have a frame structure that
is consistent with the video frame structure. Otherwise, there will be unnecessary
requirements on buffers at the decoder and a reduced tolerance to timing errors.
In this course we are interested in exploring various compression techniques referring to the
source coder only, where the image compression takes place. A general procedure for image
data compression is shown in the following block diagram.
(a)Lossless compression:
Lossless compression refers to compression methods for which the original uncompressed
data set can be recovered exactly from the compressed stream. The need for lossless
compression arises from the fact that many applications, such as the compression of digitized
medical data, require that no loss be introduced from the compression method. Bitonal image
transmission via a facsimile device also imposes such requirements. In recent years, several
compression standards have been developed for the lossless compression of such images. We
discuss these standards later. In general, even when lossy compression is allowed, the overall
compression scheme may be a combination of a lossy compression process followed by a
lossless compression process. Various image, video, and audio compression standards follow
this model, and several of the lossless compression schemes used in these standards are
described in this section. The general model of a lossless compression scheme is as depicted
in the following figure.
Given an input set of symbols, a modeler generates an estimate of the probability distribution
of the input symbols. This probability model is then used to map symbols into code words.
The combination of the probability modeling and the symbol-to-codeword mapping functions
is usually referred to as entropy coding. The key idea of entropy coding is to use short code
words for symbols that occur with high probability and long code words for symbols that
occur with low probability.
The probability model can be derived either from the input data or from a priori assumptions
about the data. Note that, for decodability, the same model must also be generated by the
decoder. Thus, if the model is dynamically estimated from the input data, causality
constraints require a delay function between the input and the modeler. If the model is
derived from a priori assumptions, then the delay block is not required; furthermore, the
model function need not have access to the input symbols. The probability model does not
have to be very accurate, but the more accurate it is, the better the compression will be. Note
that, compression is not always guaranteed. If the probability model is wildly inaccurate, then
the output size may even expand. However, even then the original input can be recovered
without any loss.
Decompression is performed by reversing the flow of operations shown in the above Figure.
(i)Differential coding:
Another preprocessing technique that improves the compression ratio is differential coding.
Differential coding skews the symbol statistics so that the resulting distribution is more
amenable to compression. Image data tend to have strong inter-pixel correlation. If, say, the
pixels in the image are in the order x1, x2, …, xN, then instead of compressing these pixels,
one might process the sequence of differentials yi = xi - xi-1, where i = 1, 2, …, N and x0 = 0.
Let symbol Si have a probability of occurrence Pi. From coding theory, the ideal symbol-to-
codeword mapping function will produce a codeword requiring log2(1/Pi) bits. A distribution
close to uniform for Pi, such as the one shown in the left plot of Figure, will result in
codewords that on the average require eight bits; thus, no compression is achieved. On the
other hand, for a skewed probability distribution, such as the one shown in the right plot of
Figure, the symbol-to-codeword mapping function can on the average yield codewords
requiring less than eight bits per symbol and thereby achieve compression.
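A short sketch of the differential preprocessing step, assuming a 1-D sequence of pixel
values; the mapping is exactly reversible and the differentials typically have a much more
skewed distribution than the raw pixels:

import numpy as np

def to_differentials(x):
    # y_i = x_i - x_(i-1), with x_0 = 0 so the first sample is kept as-is.
    return np.diff(x, prepend=0)

def from_differentials(y):
    # Exact inverse: a cumulative sum recovers the original pixels.
    return np.cumsum(y)

x = np.array([100, 102, 101, 101, 105])
y = to_differentials(x)          # [100, 2, -1, 0, 4]
assert np.array_equal(from_differentials(y), x)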
We will understand these concepts better in the following Huffman encoding section.
(ii)Huffman coding:
In 1952, D. A. Huffman developed a code construction method that can be used to perform
lossless compression. In Huffman coding, the modeling and the symbol-to-codeword
mapping functions of Figure 1.1 are combined into a single process. As discussed earlier, the
input data are partitioned into a sequence of symbols so as to facilitate the modeling
process. In most image and video compression applications, the size of the alphabet
composing these symbols is restricted to at most 64000 symbols. The Huffman code
construction procedure proceeds along the following steps:
1. Order the symbols according to their probabilities. For Huffman code construction, the
frequency of occurrence of each symbol must be known a priori. In practice, the frequency of
occurrence can be estimated from a training set of data that is representative of the data to be
compressed in a lossless manner. If, say, the alphabet is composed of N distinct symbols
s1, s2, …, sN and the probabilities of occurrence are p1, p2, …, pN, then the symbols are
rearranged so that p1 >= p2 >= … >= pN.
2. Apply a contraction process to the two symbols with the smallest probabilities. Suppose
the two symbols are sN-1 and sN. We replace these two symbols by a hypothetical symbol, say
HN-1, whose probability of occurrence is pN-1 + pN.
3. We repeat the previous step 2 until the final set has only one member.
The recursive procedure in step 2 can be viewed as the construction of a binary tree, since at
each step we are merging two symbols. At the end of the recursion process, all the symbols
s1, s2, …, sN will be leaf nodes of this tree. The codeword for each symbol si is obtained by
traversing the binary tree from its root to the leaf node corresponding to si.
We illustrate the code construction process with the following example depicted in Figure.
The input data to be compressed is composed of symbols in the alphabet k, l,u,w,e, r,? . First
we sort the probabilities. In Step 1, we merge the two symbols k and w to form the new
symbol (k,w) . The probability of occurrence for the new symbol is the sum of the
probabilities of occurrence for k and w. We sort the probabilities again and perform the merge
on the pair of least frequently occurring symbols which are now the symbols (k,w) and ? . We
repeat this process through Step 6. By visualizing this process as a binary tree as shown in
this figure and traversing the process from the bottom of the tree to the top, one can determine
the code words for each symbol. For example, to reach the symbol u from the root of the tree,
one traverses nodes that were assigned the bits 1,0 and 0 . Thus, the codeword for u is 100.
In this example, the average codeword length is 2.6 bits per symbol. In general, the average
codeword length is defined as lavg = Σi pi li,
where li is the codeword length (in bits) for the codeword corresponding to symbol si and pi
is its probability of occurrence. The
average codeword length is a measure of the compression ratio. Since our alphabet has seven
symbols, a fixed-length coder would require at least three bits per codeword. In this example,
we have reduced the representation from three bits per symbol to 2.6 bits per symbol; thus,
the corresponding compression ratio can be stated as 3/2.6=1.15. For the lossless
compression of typical image or video data, compression ratios in excess of two are hard to
come by.
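A compact sketch of the construction procedure above using a binary heap; the symbol
probabilities used in the usage line are made up for illustration (the values from the
original figure are not preserved in these notes), and ties are broken arbitrarily, so the
code words may differ from the figure while the average length stays optimal:

import heapq

def huffman_code(probabilities):
    # probabilities: dict mapping symbol -> probability of occurrence.
    heap = [[p, i, {s: ''}] for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)     # the two least probable (possibly merged) symbols
        p2, _, codes2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in codes1.items()}
        merged.update({s: '1' + c for s, c in codes2.items()})
        heapq.heappush(heap, [p1 + p2, counter, merged])
        counter += 1
    return heap[0][2]

probs = {'e': 0.3, 'l': 0.2, 'r': 0.2, 'u': 0.1, '?': 0.1, 'k': 0.05, 'w': 0.05}  # illustrative
codes = huffman_code(probs)
avg = sum(p * len(codes[s]) for s, p in probs.items())   # average codeword length, bits/symbol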
Standards related to the coding and transmission of signals over public telecommunication
channels are developed under the auspices of the telecommunication standardization sector of
the International Telecommunication Union (ITU-T). This sector was formerly known as the
CCITT. The first standards for lossless compression were developed for facsimile
applications. Scanned images used in such applications are bitonal, that is, the pixels take on
one of two values, black or white, and these values are represented with one bit per pixel.
(iii)Run-length coding:
In every bitonal image there are large regions that are either all white or all black. For
instance, in Figure, we show a few pixels of a line in a bitonal image. Note that, the six
contiguous pixels of the same color can be described as a run of six pixels with value 0 .
Thus, if each pixel of the image is remapped from say, its (position, value) to a run and
value, then a more compact description can be obtained. In our example, no more than four
bits are needed to describe the six-pixel run. In general, for many document type images,
significant compression can be achieved using such preprocessing. Such a mapping scheme is
referred to as a run-length coding scheme.
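A minimal sketch of the run-length mapping for one scanline of a bitonal image; each run is
described by its (value, length) pair, which would then be entropy coded as described next:

def run_length_encode(line):
    # line: non-empty sequence of 0/1 pixel values for one scanline.
    runs = []
    count = 1
    for prev, cur in zip(line, line[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append((prev, count))
            count = 1
    runs.append((line[-1], count))
    return runs

print(run_length_encode([0, 0, 0, 0, 0, 0, 1, 1, 0]))   # [(0, 6), (1, 2), (0, 1)]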
The combination of a run-length coding scheme followed by a Huffman coder forms the
basis of the image coding standards for facsimile applications. These standards include the
following:
ITU-T Rec. T.4 (also known as Group 3). There are two coding approaches within
this standard.
1. Modified Huffman (MH) code. The image is treated as a sequence of scanlines, and a
runlength description is first obtained for each line. A Huffman code is then applied to
the (run, value) description. A separate Huffman code is used to distinguish between
black and white runs, since the characteristics of these runs are quite different. The
Huffman code table is static: that is, it does not change from image to image. For
error-detection purposes, after each line is coded, an EOL (end of line) codeword is
inserted.
2. Modified Read (MR) code. Here, pixel values in a previous line are used as
predictors for the current line. This is followed by a run-length description and a static
Huffman code as in the MH code. An EOL codeword is also used. To prevent error
propagation, MR coding is mixed with MH coding periodically.
ITU-T Rec. T.6 (also known as Group 4). The coding technique used here is referred
to as a Modified Modified Read (MMR) code. This code is a simplification of the MR
code, wherein the error-protection mechanisms in the MR code are removed so as to
improve the overall compression ratio.
These compression standards yield good compression (20:1 to 50:1) for business-type
scanned documents. For images composed of natural scenes and rendered as bitonal images
using a halftoning technique the compression ratio is severely degraded. In Figure shown, the
image on the left possesses characteristics representative of business document images
whereas the image on the right is a typical halftone image. The former is characterized by
long runs of black or white, and the static Huffman code in the facsimile compression
standards is matched to these run-lengths. In the latter image, the run-lengths are relatively
short, spanning only one to two pixels and the static Huffman code is not matched to such
runs. An adaptive arithmetic coder is better suited for such images.
We also provide compression ratios for two typical images: a 202 Kbyte halftone image and a
1 Mbyte image, primarily comprised of text. The latter image is referred to as letter in the
table. For business-type documents, JBIG yields 20 percent to 50 percent more compression
than the facsimile compression standards ITU-T Rec. T.4 and Rec. T.6. For halftone images,
compression ratios with JBIG are two to five times more than those obtained with the
facsimile compression standards. However, software implementations of JBIG compression
on a general purpose computer are two to three times slower than implementations of the
ITU-T Rec. T.4 and T.6 standards. The JBIG standard can also handle grayscale images by
processing each bit-plane of a grayscale image as separate bitonal images.
Most people know JPEG as a transform-based lossy compression standard. JPEG (Joint
Photographic Experts Group), like JBIG, has been developed jointly by both the ITU-T and
the ISO. We will describe this standard in greater detail in a subsequent section; however,
here, we describe briefly the lossless mode of compression supported within this standard.
The lossless compression method within JPEG is fully independent from transform-based
coding. Instead, it uses differential coding to form prediction residuals that are then coded
with either a Huffman coder or an arithmetic coder. As explained earlier, the prediction
residuals usually have a lower entropy; thus, they are more amenable to compression than the
original image pixels. In lossless JPEG, one forms a prediction residual using previously
encoded pixels in the current line and/or the previous line. Denoting by a the pixel to the
left of the current pixel x, by b the pixel above it in the previous line, and by c the pixel
above and to the left, the prediction residual for pixel x is r = y - x, where the prediction
y can be any of the following functions:
y = 0
y = a
y = b
y = c
y = a + b - c
y = a + (b - c)/2
y = b + (a - c)/2
y = (a + b)/2
Note that the pixel values at positions a, b and c are available to both the encoder and the
decoder prior to processing pixel x. The particular choice for the prediction function y is
defined in the scan
header of the compressed stream so that both the encoder and the decoder use identical
functions. Divisions by two are computed by performing a one-bit right shift.
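A hedged sketch of residual formation for one of the predictors listed above (predictor
y = (a + b)/2), assuming an 8-bit NumPy image; border pixels are handled here by zero padding,
which the actual JPEG standard treats differently:

import numpy as np

def lossless_jpeg_residuals(img):
    img = img.astype(np.int32)
    a = np.zeros_like(img); a[:, 1:] = img[:, :-1]   # left neighbor
    b = np.zeros_like(img); b[1:, :] = img[:-1, :]   # neighbor in the previous line
    y = (a + b) >> 1                                 # prediction; division by two as a right shift
    return y - img                                   # prediction residual r = y - x, as in the text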
The prediction residual is computed modulo 2^16. This residual is not directly Huffman coded.
Instead, it is expressed as a pair of symbols: the category and the magnitude. The first symbol
represents the number of bits needed to encode the magnitude. Only this value is Huffman
coded. The magnitude categories for all possible values of the prediction residual are shown
in Table. If, say, the prediction residual for x is 42, then from Table 4.2 we determine that
this value belongs to category 6; that is, we need an additional six bits to uniquely determine
the value 42. The prediction residual is then mapped into the two-tuple (6, 6-bit code for 42).
Category 6 is Huffman coded, and the compressed representation for the prediction residual
consists of this Huffman codeword followed by the 6-bit representation for the magnitude. In
general, if the value of the residual is positive, then the code for the magnitude is its direct
binary representation. If the residual is negative, then the code for the magnitude is the one's
complement of its absolute value. Therefore, code words for negative residuals always start
with a zero bit.
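A sketch of the (category, magnitude) mapping just described, assuming the category is simply
the number of bits needed for the absolute value of the residual and that negative values use
the one's complement of that representation:

def residual_category_and_code(r):
    magnitude = abs(r)
    category = magnitude.bit_length()                 # number of extra bits needed (0 for r == 0)
    if category == 0:
        return 0, ''
    code = format(magnitude, 'b')
    if r < 0:                                         # one's complement for negative residuals
        code = ''.join('1' if bit == '0' else '0' for bit in code)
    return category, code

print(residual_category_and_code(42))    # (6, '101010')
print(residual_category_and_code(-42))   # (6, '010101')  -- starts with a zero bit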
At various points in their study of the image coding literature, readers will come across the
application of rate-distortion theory, R(D), or its equivalent, distortion-rate theory. The
significant introduction by Shannon (1959) of a fidelity criterion into his work on coding
theory led to an extensive body of work which sought to characterize the relationship between
coding rate and measured distortion in a consistent way. Briefly, the general form of the R(D)
curve is as shown in the following figure where, as expected, for smaller levels of
distortion we require a higher coding rate. If the attainable level of distortion is no smaller
than Dmax then no information need be sent anyway. For an initially analogue input signal,
as the distortion falls to zero the quantisation intervals must have a width which tends to zero
and so the rate curve moves towards infinity (dotted curve in Figure). For a discrete signal we
know that we can encode at a rate equivalent to the entropy [R(0) = H] and incur zero
distortion, at least in principle. We may therefore set our operating point anywhere
within the region 0 <= D <= Dmax. Deviations between the results achieved with practical
algorithms and the theory are to be expected, and are usually due to lack of knowledge of the
true source distribution (and the associated R(D) relation) and an inability to allocate
fractional numbers of bits to encoded symbols (some kinds of block schemes, vector
quantisation, for example, allow this to be done).
Fig: Rate-distortion relationship. For a discrete signal, zero-distortion coding is achieved
when R(0) equals the source entropy H. For a continuous source, zero distortion implies that
the rate rises without limit (---).