Content Repurposing For Mobile TV Networks
Abstract
Today, little content exists specifically for Mobile TV broadcast, and there is a strong need
to adapt existing content from different formats to fit handsets' small screens. This paper
highlights the major technical challenges in providing the highest possible quality while
transcoding content from SD to Mobile TV video formats. Different video coding formats will
be introduced according to the target Mobile TV networks, along with the main constraints of
watching TV programs on a mobile phone. Dedicated techniques and algorithms have to be
used to get the most out of the Mobile TV transcoding chain, and a set of solutions to
optimize transcoding processing for a smooth integration in a Mobile TV system will be
presented.
.1.1 Digital TV
Digital TV was introduced in the late 1990s and progressively replaced previous analog TV
broadcast systems. Digital TV brings more efficiency, more flexibility and also provides
better quality for TV delivery.
Digital TV technologies provide ways to transmit and transport the TV signal digitally. The
company consortium called the DVB Project (DVB: Digital Video Broadcasting) defined and
published a set of open standards that were internationally accepted as the de-facto digital
TV standards. These standards cover all the aspects needed to ensure interoperability of
digital TV around the world. Some major elements of these standards are based on work
from MPEG (Moving Picture Experts Group), the well-known working group of the ISO/IEC
organizations.
ENENSYS Technologies SA Tel: +33 1 70 72 51 70
Le Germanium Fax: +33 2 99 36 03 84
80, avenue des Buttes de Coesmes
35700 RENNES - FRANCE e-mail: contact@enensys.com
Like DVB, ATSC (Advanced Television Systems Committee) published digital TV standards
used in the United States and some other countries.
.1.2 Coding standards
Compression - or encoding - has always been a requirement to digitally transport video and
audio data.
MPEG notably developed video and audio encoding standards. For broadcast-quality
television, MPEG video compression has been used in all existing digital television
systems. The first MPEG standard, MPEG-1 (ISO/IEC 11172), was defined in 1993 and was
mainly targeted at Video CD compression. The MPEG-2 standard (ISO/IEC 13818), published
in 1995, is capable of coding standard-definition television at bit rates of about 3-15 Mbit/s
and high-definition television at 15-30 Mbit/s. MPEG-2 also extends the stereo audio
capabilities of MPEG-1 to multi-channel surround sound coding.
In short, the goal of compression algorithms is simple: reduce the bit rate needed to carry
the video data while preserving the quality. The bit rate reduction is obtained by taking
advantage of two kinds of redundancy present in video signals:
- Temporal and spatial redundancy: Image pixels are not independent: pixels are correlated
with their neighbors both within the same frame (spatial redundancy) and across frames
(temporal redundancy). To some extent, the values of neighboring (in time or in space)
pixels help to predict other pixel values.
- Psycho-visual redundancy: Because the human eye has limited sensitivity to some details
(object edges, ...), it is possible to introduce impairments (artifacts) in a compression/
decompression chain without disturbing the perceived quality.
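As a toy illustration of spatial redundancy, predicting each pixel from its left neighbor leaves small residuals that span a much narrower range than the raw values, which entropy coding can then exploit. The sample scanline below is hypothetical data, not taken from any real frame.

```python
# Toy illustration of spatial redundancy: neighboring pixels are correlated,
# so predicting each pixel from its left neighbor leaves small residuals
# that are cheaper to encode than the raw pixel values.
# The sample "scanline" is hypothetical data, not from any real frame.

scanline = [100, 102, 101, 104, 107, 107, 109, 112, 115, 114]

# Residuals: first pixel is sent as-is, the rest as differences
# from the left neighbor (the predictor).
residuals = [scanline[0]] + [
    cur - prev for prev, cur in zip(scanline, scanline[1:])
]

print("raw values    :", scanline)
print("residuals     :", residuals)
print("raw range     :", max(scanline) - min(scanline))
print("residual range:", max(residuals[1:]) - min(residuals[1:]))
```

The residual range is far smaller than the raw range, so fewer bits suffice on average; real codecs generalize this idea to 2-D spatial prediction and motion-compensated temporal prediction.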
.1.3 Video formats
The digital TV transition initially had to keep many traditional analog TV constraints, to
avoid replacing all the TV receivers when switching from analog to digital broadcast. For
the video format, Standard Definition (SD) has been used, broadcasting at the same
resolution as analog systems. For color encoding, it means that the two main formats used
worldwide in analog TV continued to exist in the digital world: NTSC and PAL.
This is a concern when we talk about repurposing content, because the two color systems
differ in vertical resolution and in temporal resolution. The vertical resolution is measured
in the number of lines that compose the image. The temporal resolution is measured in
frames per second, or more exactly in fields per second. We will go through this
fields/frames aspect in a later section about video interlacing.
NTSC, used in North America, Central America, Japan and some other countries, provides
images composed of 525 lines at about 30 frames per second (with 486 visible lines, the
rest being used for other information such as sync data and captioning). PAL, used in large
parts of the world (Europe, Asia, Africa, ...), provides images composed of 625 lines at 25
frames per second (576 visible lines). So for comparison, PAL has a higher vertical
resolution, but a lower temporal resolution, than NTSC. For the horizontal resolution, both
use 720 columns in digital format.
More recently, digital TV has been extended to offer higher resolutions than analog
systems, commonly referred to as High Definition, which is composed of multiple formats
(720p, 1080i, ...).
.1.4 Interlacing
Interlacing is a method invented in the 1930s to improve the picture quality of a video
signal by removing some flickering artifacts on cathode ray tube screens. It consists of
painting half-resolution images alternately, taking advantage of human eye persistence.
Instead of painting every single frame at about 25-30 frames per second (called
progressive scan), the image is refreshed about 50-60 times per second, but with only half
of the image each time (odd and even lines alternately).
[Figure: fields A1/A2 and B1/B2 - the odd-line half frame and even-line half frame combine to form full frames 1 and 2.]
Even though cathode ray tubes are being replaced with progressive displays (computer
screens, LCD TVs, ...) all over the world, current digital TV signals - in standard definition -
are still in interlaced formats. Smart methods are therefore required to "remove" the
interlacing effects to display the images efficiently on new displays.
The goal is to repurpose TV content to these mobile resolutions while preserving the best possible user experience.
.2.2 Coding standards
New mobile TV standards brought new compression technologies. In December 2001, MPEG
and the ITU-T Video Coding Experts Group (VCEG) formed the Joint Video Team (JVT) with
the purpose of developing a new video coding standard, now usually known as MPEG-4 AVC
or H.264. The main objective was to at least double the compression efficiency, for the
same video quality, compared to all the available video coding standards.
While the MPEG-2 coding standard was standardized mainly targeting the broadcasting
market, with bit rates of around 4 Mbps for standard-definition video, the H.264 coding
standard is flexible and offers a number of compression tools to support a range of
applications with very low as well as very high bit rate requirements. It is now observed
that the H.264 standard gives equivalent video quality at up to 50% of MPEG-2 bit rates.
For comparison, typical bit rates used in mobile TV deployment are from 200 to 400 kbps.
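Some rough capacity arithmetic makes the gap concrete. Using the illustrative figures above (around 4 Mbps for an SD MPEG-2 service, 200-400 kbps for a mobile service), one can estimate how many mobile TV services fit in the bandwidth of a single SD slot; the exact numbers below are assumptions for illustration only.

```python
# Rough capacity arithmetic (illustrative figures only): how many mobile TV
# services fit in the bandwidth of one SD MPEG-2 service?

sd_mpeg2_kbps = 4000    # typical SD MPEG-2 broadcast bit rate (~4 Mbps)
mobile_h264_kbps = 300  # typical mobile TV bit rate (200-400 kbps range)

services = sd_mpeg2_kbps // mobile_h264_kbps
print(f"~{services} mobile services per SD MPEG-2 slot")
```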
[Figure: resolution comparison - SD (PAL): 720x576; HD: 1920x1080 (1080i) and 1280x720 (720p); mobile: 320x240 (QVGA).]
Mobile handsets typically use 320x240 (QVGA), 352x288 (CIF) or even smaller resolutions.
We quickly notice that we will need to rescale/resize the image to adapt to these new
resolutions.
In image processing, many algorithms exist in this spatial resolution change domain. If you
have ever used image manipulation software to modify your personal pictures, you have
probably already used one of these image scaling algorithms.
Several methods exist. Let's review three of them. The simplest method is called the "nearest
neighbor interpolation", where the algorithm simply selects the value of the nearest point.
This is a very fast algorithm but produces undesirable “stairway effects”. Another method to
consider is called "bilinear interpolation". This is far better than the "nearest" method, but
the resulting image can still be jagged. A third method, called "bicubic interpolation", gives
even better and sharper images. It requires, however, more processing power than "bilinear
interpolation".
Which method should we use? It is always a trade-off of performance versus quality. If you
want to get the best quality for the target video, always prefer methods that give results
similar to "bicubic interpolation".
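The first two methods can be sketched in a few lines. The pure-Python functions below resize a tiny grayscale "image" (a list of rows); real transcoders would of course use optimized filters, and preferably bicubic or better, as noted above.

```python
# Sketch of two classic image-scaling methods on a tiny grayscale "image"
# (a list of rows of pixel values). Pure Python, no libraries.

def nearest_resize(img, new_w, new_h):
    """Nearest-neighbor: pick the closest source pixel (fast but blocky)."""
    h, w = len(img), len(img[0])
    return [
        [img[int(y * h / new_h)][int(x * w / new_w)] for x in range(new_w)]
        for y in range(new_h)
    ]

def bilinear_resize(img, new_w, new_h):
    """Bilinear: weighted average of the 4 surrounding source pixels."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(new_h):
        fy = y * (h - 1) / max(new_h - 1, 1)
        y0, ty = int(fy), fy - int(fy)
        y1 = min(y0 + 1, h - 1)
        row = []
        for x in range(new_w):
            fx = x * (w - 1) / max(new_w - 1, 1)
            x0, tx = int(fx), fx - int(fx)
            x1 = min(x0 + 1, w - 1)
            top = img[y0][x0] * (1 - tx) + img[y0][x1] * tx
            bot = img[y1][x0] * (1 - tx) + img[y1][x1] * tx
            row.append(top * (1 - ty) + bot * ty)
        out.append(row)
    return out

img = [[0, 100], [100, 200]]      # 2x2 toy image
print(nearest_resize(img, 4, 4))  # blocky "stairway" output
print(bilinear_resize(img, 3, 3)) # smoother gradient
```

The nearest-neighbor output simply repeats source pixels, while the bilinear output interpolates intermediate values between them, which is exactly the smoothing difference described above.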
.3.2 Coding standards
As we reviewed in previous sections, we need to move from MPEG-2 to MPEG-4. But this is
of course far more complicated than just changing one digit.
MPEG-2 and MPEG-4 share the same basic structure; more precisely, MPEG-2 can be seen
as a subset of MPEG-4. MPEG-4 brings major compression efficiency improvements over
MPEG-2, but at the cost of a much higher encoding complexity. For many applications, like
real-time encoding, MPEG-4 requires large amounts of processing power, and any way to
simplify and reduce the encoding cost was studied. That was the first target of MPEG-2 to
MPEG-4 transcoding: MPEG-4 encoding with minimal complexity. The first methods
experimented with made only the minimal changes needed for the MPEG-2 bitstream to
become a valid MPEG-4 bitstream. This requires translating MPEG-2 coding tools into their
MPEG-4 counterparts. It has the side effect of limiting the MPEG-4 tools to what was
allowed in MPEG-2 and of not using any of the key improvements of MPEG-4 (better
inter-picture prediction, new entropy coding systems, in-loop deblocking filter, ...). The
result was a valid MPEG-4 stream, but without the expected quality improvements.
In SD to Mobile repurposing, as we intend not only to change the video coding standard but
also the resolution and the bit rate, one pragmatic method is recommended: combining a
full MPEG-2 decoder and a full MPEG-4 encoder chained together. This requires more
processing power, but it allows an optimal MPEG-4 encoding for the target bit rate and also
allows some image processing between the two video coding standards.
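The shape of this chain can be sketched as follows. All the stage functions here are hypothetical stubs standing in for real decoder/encoder libraries (the dictionaries merely model their inputs and outputs); only the decode, process, re-encode structure is the point.

```python
# Sketch of the recommended transcoding chain: full MPEG-2 decode, image
# processing in the uncompressed domain, then a full H.264/MPEG-4 AVC encode.
# Every function is a hypothetical stub, not a real codec API.

def mpeg2_decode(bitstream):
    # Stub: a real decoder would return uncompressed (interlaced) frames.
    return {"frames": bitstream["payload"], "interlaced": True}

def image_process(video, width=320, height=240):
    # Stub: de-interlacing, resizing and framing would happen here,
    # between the two coding standards.
    return {"frames": video["frames"], "size": (width, height)}

def h264_encode(video, bitrate_kbps=300):
    # Stub: a real encoder would rate-control toward the target bit rate.
    return {"payload": video["frames"], "codec": "h264",
            "bitrate_kbps": bitrate_kbps, "size": video["size"]}

def transcode(sd_stream):
    # The pragmatic chain: decode fully, process, re-encode fully.
    return h264_encode(image_process(mpeg2_decode(sd_stream)))

out = transcode({"payload": "...", "codec": "mpeg2"})
print(out["codec"], out["size"], out["bitrate_kbps"])
```

Keeping the three stages separate is precisely what allows the image processing steps described in the next sections (framing, de-interlacing, aspect ratio handling) to be inserted in the uncompressed domain.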
.3.3 Framing
Anyone who has tried to watch a sports event, such as a soccer game, originally shot for an
SD or HD television screen, on a mobile screen can attest that the experience can be less
than desirable:
- The ball can be impossible to track when the camera shows the complete game field,
- Players are often impossible to recognize.
This will remain a big concern until broadcast studios put in place dedicated cameras or
appropriate editing systems targeted for mobile video.
The key solution to this problem is to isolate the best region of interest in every image and
zoom to this region. The system must also preserve a smooth video experience and avoid
zooms that change too fast over time.
Thanks to advanced image tracking solutions, algorithms that identify regions of interest in
an image have been applied to video sequences, providing ways to automatically zoom to
such regions of interest. It is, however, unlikely that a completely automatic system will
exist in the near future. Each type of content (sport, news, movies, ...) would imply really
different mechanisms to isolate regions of interest, and only human eyes and brains can
really say "this part of the image is important".
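The "avoid too fast changing zooms" requirement can be addressed with a simple low-pass filter on the detected region-of-interest positions. The sketch below applies exponential smoothing to a hypothetical sequence of ROI centers (the `detected_centers` values and the `alpha` setting are made-up examples, not output of any real tracker).

```python
# Sketch: smooth the detected region-of-interest centers over time with
# exponential smoothing, so the virtual camera pan never jumps abruptly.
# detected_centers is hypothetical tracker output, in pixel coordinates.

def smooth_roi(centers, alpha=0.2):
    """Exponentially smooth a sequence of (x, y) ROI centers.

    Lower alpha = smoother, slower camera; higher alpha = more reactive.
    """
    sx, sy = centers[0]
    out = [(sx, sy)]
    for x, y in centers[1:]:
        sx = alpha * x + (1 - alpha) * sx
        sy = alpha * y + (1 - alpha) * sy
        out.append((sx, sy))
    return out

detected_centers = [(360, 288), (500, 288), (520, 300), (200, 280)]
smoothed = smooth_roi(detected_centers)
print(smoothed)  # the pan follows the ROI, without the raw abrupt jumps
```

Note how the large raw jump in the last detection (520 to 200 horizontally) becomes a much smaller step after smoothing; the trade-off is a slight lag behind fast action.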
.3.5 De-interlacing
As we analyzed in the first section, most existing digital TV content is in the traditional
interlaced format, so repurposing and transforming the content requires dealing with this
aspect. As we will never see a mobile handset with a cathode ray tube, all the mobile TV
standards use progressive video formats (the opposite of interlaced).
Several methods (called de-interlacing) exist that recombine the two interlaced fields into
one frame. As the two fields have a slight time difference, this may result in "tearing"
effects (alternate lines displaced from each other). Over the years, multiple algorithms have
been devised to avoid this effect. From the basic method that simply blends the two fields,
to advanced methods using motion compensation to better align them, abundant literature
about de-interlacing methods can be found.
We can do better: when rescaling SD images to resolutions of half or less of the original
height, the simplest and most efficient approach is to simply use one out of the two fields.
This method, sometimes called half-sizing, is perfect when we intend to rescale the image
to CIF or lower resolutions, as in our problem.
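Both ideas fit in a few lines each. The sketch below works on a toy frame represented as a list of lines, where odd lines belong to one field and even lines to the other (the pixel values are made-up examples).

```python
# Two simple de-interlacing sketches on a toy frame (a list of lines).
# Even-index lines belong to one field, odd-index lines to the other.

frame = [[10, 10], [30, 30], [12, 12], [32, 32]]  # hypothetical 4-line frame

def blend_deinterlace(frame):
    """Naive field blending: average each line with the next one."""
    return [
        [(a + b) / 2
         for a, b in zip(frame[i], frame[min(i + 1, len(frame) - 1)])]
        for i in range(len(frame))
    ]

def half_size(frame):
    """Keep only one field (every other line): trivially progressive,
    ideal when downscaling to half the vertical resolution or less."""
    return frame[::2]

print(blend_deinterlace(frame))
print(half_size(frame))  # [[10, 10], [12, 12]]
```

Blending suppresses tearing at the cost of some softness, while half-sizing avoids the problem entirely whenever the target height is at most half the source height, which is the case for SD-to-QVGA or SD-to-CIF conversion.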
.3.6 Aspect ratios
As we noticed in previous sections, nearly all mobile TV handsets are designed to display
resolutions with the traditional 4/3 aspect ratio. However, many SD contents and all HD
contents are now produced with widescreen aspect ratios like 16/9.
There is no magic trick to preserve widescreen aspect ratios on mobile TV: we have to use
the same proven methods used to display 16/9 content on a 4/3 TV set: the letterbox or pan
and scan methods.
Letterbox is the simplest method: we simply add black bars at the top and the bottom of the
picture. The whole image is preserved, but we can easily imagine that having two black
borders on a 2-to-3-inch LCD screen is a real waste of space.
The well-known pan and scan method consists in intelligently panning and scanning
horizontally across the widescreen picture to keep the action in the middle of the screen.
This avoids wasting screen space, but it results in the loss of about 25% of a 16/9 image
(and more for wider film formats).
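The geometry behind both figures is simple to check. The sketch below assumes square pixels for clarity (broadcast SD actually uses non-square pixels, so real implementations must account for the pixel aspect ratio as well).

```python
# Quick geometry for the two widescreen-to-4/3 methods described above.
# Square pixels assumed for simplicity.

from fractions import Fraction

def letterbox_bars(dst_w, dst_h, src_aspect=Fraction(16, 9)):
    """Height of each black bar when fitting a widescreen image full-width."""
    scaled_h = int(dst_w / src_aspect)
    return (dst_h - scaled_h) // 2

def pan_scan_loss(src_w, src_h, dst_aspect=Fraction(4, 3)):
    """Fraction of the widescreen image lost when cropping to dst_aspect."""
    crop_w = int(src_h * dst_aspect)
    return 1 - crop_w / src_w

print(letterbox_bars(320, 240))  # bars on a 320x240 (QVGA) screen
print(pan_scan_loss(1280, 720))  # 16/9 source cropped to 4/3
```

On a QVGA screen, letterboxing a 16/9 image leaves a 30-pixel black bar above and below (a quarter of the screen wasted), and pan and scan on a 16/9 source discards a quarter of the picture width: the two trade-offs described above, made explicit.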