
H.261 is an earlier digital video compression standard. Because its principle of
motion-compensation-based compression is very much retained in all later video compression
standards, we will start with a detailed discussion of H.261.

The International Telegraph and Telephone Consultative Committee (CCITT) initiated
development of H.261 in 1988. The final recommendation was adopted by the Telecommunication
Standardization Sector of the International Telecommunication Union (ITU-T), formerly
CCITT, in 1990.

The standard was designed for videophone, videoconferencing, and other audiovisual
services over ISDN telephone lines. Initially, it was intended to support multiples (from 1 to
5) of 384 kbps channels. In the end, however, the video codec supports bitrates of p x 64
kbps, where p ranges from 1 to 30. Hence the standard was once known as p * 64,
pronounced "p star 64". The standard requires the video encoder's delay to be less than 150
msec, so that the video can be used for real-time, bidirectional videoconferencing.
H.261 belongs to the following set of ITU recommendations for visual telephony systems:
1. H.221. Frame structure for an audiovisual channel supporting 64 to 1,920 kbps
2. H.230. Frame control signals for audiovisual systems
3. H.242. Audiovisual communication protocols
4. H.261. Video encoder/decoder for audiovisual services at p x 64 kbps
5. H.320. Narrowband audiovisual terminal equipment for p x 64 kbps transmission

Table: Video formats supported by H.261

The above table lists the video formats supported by H.261. Chroma subsampling in H.261
is 4:2:0. Considering the relatively low bitrate in network communications at the time,
support for CCIR 601 QCIF is specified as required, whereas support for CIF is optional.

The following figure illustrates a typical H.261 frame sequence. Two types of image frames
are defined: intra-frames (I-frames) and inter-frames (P-frames).
I-frames are treated as independent images. Basically, a transform coding method similar
to JPEG is applied within each I-frame, hence the name "intra".

P-frames are not independent. They are coded by a forward predictive coding method in
which current macroblocks are predicted from similar macroblocks in the preceding I- or
P-frame, and the differences between the macroblocks are coded. Temporal redundancy
removal is hence included in P-frame coding, whereas I-frame coding performs only spatial
redundancy removal. It is important to remember that prediction from a previous P-frame is
allowed (not just from a previous I-frame).
The interval between pairs of I-frames is variable and is determined by the encoder.
Usually, an ordinary digital video has a couple of I-frames per second. Motion vectors in
H.261 are always measured in units of full pixels and have a limited range of ±15 pixels;
that is, p = 15.
Figure: H.261 frame sequence

Intra-Frame (I-Frame) Coding

Figure: I-frame coding

Macroblocks are of size 16 x 16 pixels for the Y frame of the original image. For the Cb and
Cr frames, they correspond to areas of 8 x 8, since 4:2:0 chroma subsampling is employed.
Hence, a macroblock consists of four Y blocks, one Cb block, and one Cr block, each of size
8 x 8.
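
The macroblock layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_macroblock` and the list-of-lists sample layout are assumptions made here, not part of H.261, and the chroma planes are assumed to be already subsampled to 4:2:0.

```python
def split_macroblock(y, cb, cr):
    """Split one macroblock into six 8x8 blocks: four Y blocks, then Cb, then Cr.

    y  : 16x16 list of lists of luminance samples
    cb : 8x8 list of lists (chroma, assumed already 4:2:0 subsampled)
    cr : 8x8 list of lists (chroma, assumed already 4:2:0 subsampled)
    """
    assert len(y) == 16 and all(len(row) == 16 for row in y)
    assert len(cb) == 8 and len(cr) == 8
    blocks = []
    # Y blocks in raster order: top-left, top-right, bottom-left, bottom-right.
    for by in (0, 8):
        for bx in (0, 8):
            blocks.append([row[bx:bx + 8] for row in y[by:by + 8]])
    blocks.append(cb)
    blocks.append(cr)
    return blocks


# Hypothetical sample data: a luminance ramp and flat chroma.
y = [[r * 16 + c for c in range(16)] for r in range(16)]
cb = [[128] * 8 for _ in range(8)]
cr = [[128] * 8 for _ in range(8)]
blocks = split_macroblock(y, cb, cr)
```

Each of the six resulting blocks is what the DCT stage operates on.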

For each 8 x 8 block, a DCT is applied. As in JPEG, the DCT coefficients go through a
quantization stage. Afterwards, they are zigzag-scanned and eventually entropy-coded.
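
As a rough illustration of the zigzag scan, the sketch below generates the visiting order for an 8 x 8 block by walking its anti-diagonals in alternating directions, as in JPEG. The helper names `zigzag_order` and `zigzag_scan` are chosen here for illustration only.

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of a zigzag scan over an n x n block."""
    order = []
    for s in range(2 * n - 1):              # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked bottom-to-top, odd diagonals top-to-bottom.
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Flatten a square block (list of lists) into zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


order = zigzag_order(8)
```

The scan places low-frequency coefficients first, so the zeros produced by quantization tend to cluster at the end of the sequence.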

Inter-Frame (P-Frame) Predictive Coding

The following figure shows the H.261 P-frame coding scheme based on motion
compensation. For each macroblock in the Target frame, a motion vector is allocated by
one of the search methods discussed earlier. After the prediction, a difference
macroblock is derived to measure the prediction error. It is also carried in the form of four Y
blocks, one Cb block, and one Cr block. Each of these 8 x 8 blocks goes through DCT,
quantization, zigzag scan, and entropy coding. The motion vector is also coded.
Sometimes a good match cannot be found; that is, the prediction error exceeds a certain
acceptable level. The macroblock itself is then encoded (treated as an intra macroblock)
and in this case is termed a non-motion-compensated macroblock.
P-frame coding encodes the difference macroblock (not the Target macroblock itself).
Since the difference macroblock usually has a much smaller entropy than the Target
macroblock, a large compression ratio is attainable.
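
To make the idea concrete, here is a minimal sketch of motion estimation followed by computation of the difference macroblock. It assumes an exhaustive (full) search with the sum of absolute differences (SAD) as the matching cost; both choices, and all function names, are illustrative assumptions rather than anything mandated by H.261. The demo uses a tiny 4 x 4 block and a search range of 2 so that it is easy to follow; H.261 macroblocks are 16 x 16 with a range of ±15.

```python
def sad(target, ref, tx, ty, rx, ry, n):
    """Sum of absolute differences between the n x n block of `target` at (tx, ty)
    and the n x n block of `ref` at (rx, ry)."""
    return sum(abs(target[ty + j][tx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(target, ref, x, y, n=16, p=15):
    """Return ((dx, dy), cost): the displacement within [-p, p] minimizing SAD."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(target, ref, x, y, x, y, n)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue                    # candidate lies outside the reference frame
            cost = sad(target, ref, x, y, rx, ry, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def difference_block(target, ref, x, y, mv, n=16):
    """Prediction error: target block minus the motion-compensated reference block."""
    dx, dy = mv
    return [[target[y + j][x + i] - ref[y + dy + j][x + dx + i]
             for i in range(n)] for j in range(n)]


# Hypothetical 12 x 12 frames: the target is the reference shifted right 1, down 2.
ref = [[0] * 12 for _ in range(12)]
ref[4][3], ref[5][4] = 100, 50
target = [[ref[yy - 2][xx - 1] if yy >= 2 and xx >= 1 else 0 for xx in range(12)]
          for yy in range(12)]
mv, cost = full_search(target, ref, 4, 4, n=4, p=2)
diff = difference_block(target, ref, 4, 4, mv, n=4)
```

In this example the content moved right by 1 and down by 2, so the best match in the reference lies at displacement (-1, -2) and the difference block is all zeros, which is far cheaper to code than the target block itself.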
In fact, even the motion vector is not directly coded. Instead, the difference, MVD, between
the motion vectors of the preceding macroblock and current macroblock is sent for entropy
coding:

\[ MVD = MV_{\text{Preceding}} - MV_{\text{Current}} \]
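
A small sketch of this differential coding of motion vectors follows. It assumes the difference is taken as preceding minus current, following the order in which the text names the two vectors, and that the predictor starts at (0, 0); both are assumptions made for illustration, and the function names are hypothetical.

```python
def encode_mvd(motion_vectors):
    """Map a sequence of (x, y) motion vectors to differences MV_prev - MV_cur."""
    prev = (0, 0)                           # assumption: the predictor starts at zero
    mvds = []
    for mv in motion_vectors:
        mvds.append((prev[0] - mv[0], prev[1] - mv[1]))
        prev = mv
    return mvds


def decode_mvd(mvds):
    """Invert encode_mvd: recover the motion vectors from their differences."""
    prev = (0, 0)
    mvs = []
    for d in mvds:
        mv = (prev[0] - d[0], prev[1] - d[1])
        mvs.append(mv)
        prev = mv
    return mvs


mvs = [(3, -1), (4, -1), (4, 0), (2, 0)]
mvds = encode_mvd(mvs)
```

Because neighboring macroblocks tend to move together, the differences are usually small values near zero, which entropy coding can represent with short codes.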

Figure: H.261 P-frame coding based on motion compensation

Quantization in H.261
The quantization in H.261 does not use 8 x 8 quantization matrices, as in JPEG and MPEG.
Instead, it uses a constant, called stepsize, for all DCT coefficients within a macroblock.
According to the need (e.g., bitrate control of the video), stepsize can take on any one of the
31 even values from 2 to 62. One exception, however, is made for the DC coefficient in intra
mode, where a step size of 8 is always used. If we use DCT and QDCT to denote the DCT
coefficients before and after quantization, then for DC coefficients in intra mode,

\[ QDCT = \mathrm{round}\left(\frac{DCT}{stepsize}\right) = \mathrm{round}\left(\frac{DCT}{8}\right) \]

and for all other coefficients,

\[ QDCT = \left\lfloor \frac{DCT}{stepsize} \right\rfloor = \left\lfloor \frac{DCT}{2 \times scale} \right\rfloor \]

where scale is an integer in the range [1, 31].
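
The uniform quantizer described here can be sketched as follows. The sketch assumes rounding to the nearest integer for the intra DC coefficient (fixed step size 8) and flooring for all other coefficients (step size 2 x scale); the exact rounding and reconstruction rules of the standard are not reproduced, and the function names are illustrative.

```python
import math


def quantize(dct, scale, intra_dc=False):
    """Quantize one DCT coefficient with a uniform step size."""
    if intra_dc:
        return math.floor(dct / 8 + 0.5)    # assumed: round to nearest, step size 8
    if not 1 <= scale <= 31:
        raise ValueError("scale must be in [1, 31]")
    return math.floor(dct / (2 * scale))    # assumed: floor, step size = 2 * scale


def dequantize(qdct, scale, intra_dc=False):
    """Simplified inverse: multiply the quantized value back by the step size."""
    step = 8 if intra_dc else 2 * scale
    return qdct * step


step_sizes = [2 * s for s in range(1, 32)]  # the 31 even values from 2 to 62
q_dc = quantize(100, scale=1, intra_dc=True)
q_ac = quantize(37, scale=4)
```

A larger scale gives a coarser quantizer and therefore fewer bits, at the cost of a larger reconstruction error.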


H.261 Encoder and Decoder
The following figure shows a relatively complete picture of how the H.261 encoder and
decoder work. Here, Q and Q⁻¹ stand for quantization and its inverse, respectively.
Switching between the intra- and inter-frame modes can be readily implemented by a
multiplexer. To avoid propagation of coding errors,
1. An I-frame is usually sent a couple of times in each second of the video.
2. As discussed earlier, decoded frames (not the original frames) are used as
reference frames in motion estimation.

Figure: H.261: (a) encoder; (b) decoder

Table: Data flow at the observation points in H.261 encoder

Table: Data flow at the observation points in H.261 decoder


To illustrate the operational detail of the encoder and decoder, let's use a scenario where
frames I, P1, and P2 are encoded and then decoded. The data that goes through the
observation points, indicated by the circled numbers in the above figure, is summarized in
the above tables. We will use I, P1, P2 for the original data; Ĩ, P̃1, P̃2 for the decoded data
(usually a lossy version of the original); and P'1, P'2 for the predictions in the Inter-frame
mode.
For the encoder, when the Current Frame is an Intra-frame, Point 1 receives macroblocks
from the I-frame. These go through the DCT, Quantization, and Entropy Coding steps, and the
result is sent to the Output Buffer, ready to be transmitted.

Meanwhile, the quantized DCT coefficients for I are also sent to Q⁻¹ and IDCT and hence
appear at Point 4 as the decoded frame Ĩ. Combined with a zero input from Point 5, the data
at Point 6 remains Ĩ, and this is stored in Frame Memory, waiting to be used for Motion
Estimation and Motion-Compensation-based Prediction for the subsequent frame P1.
Quantization Control serves as feedback: when the Output Buffer is too full, the
quantization step size is increased, so as to reduce the size of the coded data. This is
known as an encoding rate control process.
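
The feedback idea can be sketched as a very small control rule. Everything specific here is a hypothetical choice for illustration: the function name `next_scale`, the fullness thresholds, the one-step adjustment, and the lowering of the step size when the buffer runs low (the text only describes raising it when the buffer is too full).

```python
def next_scale(scale, buffer_fullness, high=0.8, low=0.2):
    """Pick the quantizer scale for the next macroblock from buffer occupancy.

    buffer_fullness is the fraction of the Output Buffer in use, in [0, 1].
    The thresholds `high` and `low` are illustrative, not taken from H.261.
    """
    if buffer_fullness > high:
        scale += 1                          # coarser quantization, fewer bits
    elif buffer_fullness < low:
        scale -= 1                          # assumed: finer quantization when there is room
    return max(1, min(31, scale))           # scale must stay within [1, 31]
```

For example, with these thresholds a buffer that is 90% full moves the scale from 10 to 11, which corresponds to the step size growing from 20 to 22.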
When the subsequent Current Frame P1 arrives at Point 1, the Motion Estimation process is
invoked to find the motion vector for the best-matching macroblock in the decoded frame Ĩ
for each of the macroblocks in P1. The estimated motion vector is sent to both
Motion-Compensation-based Prediction and Variable-Length Encoding (VLE). The MC-based
Prediction yields the best-matching macroblock for each macroblock in P1. This prediction is
denoted P'1 and appears at Point 2.
At Point 3, the "prediction error" is obtained, which is D1 = P1 - P'1. Now D1 undergoes DCT,
Quantization, and Entropy Coding, and the result is sent to the Output Buffer. As before, the
quantized DCT coefficients for D1 are also sent to Q⁻¹ and IDCT and appear at Point 4 as the
decoded prediction error D̃1.
Added to P'1 at Point 5, this gives P̃1 = P'1 + D̃1 at Point 6. This is stored in Frame Memory,
waiting to be used for Motion Estimation and Motion-Compensation-based Prediction for
the subsequent frame P2. The steps for encoding P2 are similar to those for P1, except
that P2 will be the Current Frame and the decoded frame P̃1 becomes the Reference Frame.
For the decoder, the input code for frames is decoded first by Entropy Decoding, Q⁻¹, and
IDCT. For Intra-frame mode, the first decoded frame appears at Point 1 and then at Point 4
as the decoded frame Ĩ. It is sent as the first output and at the same time stored in the
Frame Memory.
Subsequently, the input code for Inter-frame P1 is decoded, and the decoded prediction
error D̃1 is received at Point 1. Since the motion vector for the current macroblock is also
entropy-decoded and sent to Motion-Compensation-based Prediction, the corresponding
predicted macroblock P'1 can be located in frame Ĩ and will appear at Points 2 and 3.
Combined with D̃1, we have P̃1 = P'1 + D̃1 at Point 4, and it is sent out as the decoded
frame and also stored in the Frame Memory. Again, the steps for decoding P2 are similar to
those for P1.
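
The following toy model illustrates why the encoder keeps decoded frames, not originals, in its Frame Memory: the encoder's reconstruction then matches the decoder's output exactly. It is a heavily simplified sketch under stated assumptions: frames are short lists of samples, motion is ignored (the prediction is simply the previous decoded frame), and a single quantize-then-dequantize step with a hypothetical step size stands in for DCT, Q, Q⁻¹, and IDCT.

```python
import math

STEP = 8  # hypothetical step size for the stand-in lossy stage


def lossy(values):
    """Quantize then dequantize: models the lossy path through Q and its inverse."""
    return [STEP * math.floor(v / STEP + 0.5) for v in values]


def encode(frames):
    """Return the decoded residuals that are 'sent' and the encoder's reconstructions."""
    memory, sent, recon = None, [], []
    for frame in frames:
        # Intra frame: zero prediction. Inter frame: previous decoded frame.
        pred = [0] * len(frame) if memory is None else memory
        residual = [f - p for f, p in zip(frame, pred)]
        decoded_residual = lossy(residual)
        sent.append(decoded_residual)
        memory = [p + d for p, d in zip(pred, decoded_residual)]  # decoded = pred + error
        recon.append(memory)
    return sent, recon


def decode(sent):
    """Rebuild the frames from the residuals, mirroring the encoder's loop."""
    memory, out = None, []
    for decoded_residual in sent:
        pred = [0] * len(decoded_residual) if memory is None else memory
        memory = [p + d for p, d in zip(pred, decoded_residual)]
        out.append(memory)
    return out


frames = [[10, 52, 97], [13, 50, 101], [17, 49, 108]]  # stand-ins for I, P1, P2
sent, recon = encode(frames)
decoded = decode(sent)
```

Because both sides predict from the same decoded data, the quantization error of one frame does not accumulate into the next; each reconstructed sample stays within half a step of the original.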

A Glance at the H.261 Video Bitstream Syntax


Let's take a brief look at the H.261 video bitstream syntax. This consists of a hierarchy of
four layers: Picture, Group of Blocks (GOB), Macroblock, and Block.
1. Picture layer. Picture Start Code (PSC) delineates boundaries between
pictures. Temporal Reference (TR) provides a timestamp for the picture. Since
temporal subsampling can sometimes be invoked such that some pictures will not
be transmitted, it is important to have TR to maintain synchronization with
audio. Picture Type (PType) specifies, for example, whether it is a CIF or QCIF picture.
2. GOB layer. H.261 pictures are divided into regions of 11 x 3 macroblocks (i.e.,
regions of 176 x 48 pixels in luminance images), each of which is called a Group of
Blocks (GOB). For instance, the CIF image has 2 x 6 GOBs, corresponding to its image
resolution of 352 x 288 pixels.
Each GOB has its Start Code (GBSC) and Group number (GN). The GBSC is unique and can
be identified without decoding the entire variable-length code in the bitstream. In case a
network error causes a bit error or the loss of some bits, H.261 video can be recovered and
resynchronized at the next identifiable GOB, preventing the possible propagation of errors.
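
The GOB arithmetic can be checked with a short sketch. The constants follow the figures in the text (16 x 16 macroblocks, 11 x 3 macroblocks per GOB); the function name `gob_grid` is an illustrative assumption.

```python
MB_SIZE = 16                 # macroblock side in luminance pixels
GOB_MBS_W, GOB_MBS_H = 11, 3 # macroblocks per GOB, horizontally and vertically


def gob_grid(width, height):
    """Return (columns, rows) of GOBs for a luminance image of the given size."""
    gob_w = GOB_MBS_W * MB_SIZE   # 176 pixels
    gob_h = GOB_MBS_H * MB_SIZE   # 48 pixels
    return width // gob_w, height // gob_h


cif = gob_grid(352, 288)     # CIF luminance resolution from the text
qcif = gob_grid(176, 144)    # QCIF is half of CIF in each dimension
```

For CIF this gives a 2 x 6 arrangement, that is 12 GOBs of 33 macroblocks each; for QCIF it gives 1 x 3, that is 3 GOBs.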
Figure: Syntax of H.261 video bitstream

Figure: Arrangement of GOBs in H.261 luminance images

GQuant indicates the quantizer to be used in the GOB, unless it is overridden by any
subsequent Macroblock Quantizer (MQuant). GQuant and MQuant are referred to
as scale.
3. Macroblock layer. Each macroblock (MB) has its own Address, indicating its position
within the GOB, quantizer (MQuant), and six 8 x 8 image blocks (4 Y, 1 Cb, 1 Cr). Type
denotes whether it is an Intra- or Inter-, motion-compensated or non-motion-compensated
macroblock. Motion Vector Data (MVD) is obtained by taking the difference between the
motion vectors of the preceding and current macroblocks. Moreover, since some blocks in
the macroblocks match well and some match poorly in Motion Estimation, a bitmask Coded
Block Pattern (CBP) is used to indicate this information. Only well-matched blocks will
have their coefficients transmitted.
4. Block layer. For each 8 x 8 block, the bitstream starts with the DC value, followed by
pairs of the length of a zero-run (Run) and the subsequent nonzero value (Level) for ACs,
and finally the End of Block (EOB) code. The range of "Run" is [0, 63]. "Level" reflects
quantized values; its range is [-127, 127], and Level ≠ 0.
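
The block-layer structure (DC value, then Run/Level pairs, then EOB) can be sketched as below. This models only the symbol sequence, not the actual variable-length code tables; the function names and the use of the string "EOB" as the end marker are assumptions made for illustration. The input is assumed to be the 64 quantized coefficients already in zigzag order.

```python
def run_level_encode(scanned):
    """Turn zigzag-ordered coefficients into [DC, (Run, Level), ..., 'EOB']."""
    symbols = [scanned[0]]                  # the DC value comes first
    run = 0
    for coeff in scanned[1:]:
        if coeff == 0:
            run += 1                        # extend the current run of zeros
        else:
            symbols.append((run, coeff))    # Level is always nonzero
            run = 0
    symbols.append("EOB")                   # trailing zeros are implied by EOB
    return symbols


def run_level_decode(symbols, size=64):
    """Rebuild the zigzag-ordered coefficient list from the symbol sequence."""
    out = [symbols[0]]
    for sym in symbols[1:]:
        if sym == "EOB":
            break
        run, level = sym
        out.extend([0] * run)
        out.append(level)
    out.extend([0] * (size - len(out)))     # fill the rest of the block with zeros
    return out


scanned = [12, 5, 0, 0, -3, 0, 1] + [0] * 57
symbols = run_level_encode(scanned)
```

Here the 64 coefficients collapse to five symbols, since the long tail of zeros after the last nonzero value is covered by the EOB code alone.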
