
Video Encoding Standards



Introduction

Multiple protocols handle audio and video over digital infrastructures. "Handling" encompasses some or all of the following functions: encoding, compression, storage, and transmission. Among the best-known protocols that include this functionality are Intel's Indeo; Apple's QuickTime; ISO/IEC's MPEG-1, MPEG-2, and MPEG-4; the ITU's H.320; JPEG, a joint ISO/IEC and ITU standard; GI's Digicipher (1 and 2); and proprietary protocols by PictureTel and Compression Labs, among many others. These protocols perform slightly different tasks, so they differ by design and make the usual tradeoffs among bandwidth, processor costs, memory requirements, and commercial interests. This paper summarizes some of these protocols (where information is available), with particular focus on MPEG-2.

Functional Differences -- Video

Still Pictures

JPEG is the joint ISO/IEC and International Telecommunication Union (ITU) standard for encoding and compression of still pictures, either color or black and white. In most environments (but not all), it offers higher quality, more colors, and greater compression than GIF or similar encoding/compression protocols. In the broadcast environment, JPEG can be used to show images that do not move, such as Home Shopping Network zirconia.

Videoconferencing

Videoconferencing, unlike broadcast television, has a low requirement for motion. That is, videoconferences are not sporting events, so motion prediction algorithms can assume relatively slow and restricted movement. For example, H.261, the ITU videoconferencing standard, assumes movement of plus or minus 15 pixels from frame to frame. If movement is quicker than that, some picture information is lost, which shows up as jerky movement. Also, sound quality is generally lower than telephone toll quality.
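The effect of a bounded search range can be illustrated with a small sketch. The code below is not the H.261 algorithm itself; it is a minimal exhaustive block-matching search, assuming a hypothetical 4x4 block, frames stored as lists of rows, and illustrative function names. It shows that a block that moved 10 pixels is found exactly, while a block that moved 20 pixels falls outside the plus or minus 15 pixel window and leaves a nonzero matching error.

```python
# Illustrative sketch only: exhaustive block matching with a +/-15 pixel
# search window. Function names and the 4x4 block size are assumptions.

SEARCH_RANGE = 15  # maximum displacement per axis, as stated in the text


def make_frame(width, height, sq_x, sq_y, size=4, value=200):
    """Build a black frame containing one bright square (test data)."""
    frame = [[0] * width for _ in range(height)]
    for y in range(sq_y, sq_y + size):
        for x in range(sq_x, sq_x + size):
            frame[y][x] = value
    return frame


def block_cost(prev, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `prev`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
    return total


def find_motion_vector(prev, cur, bx, by, size=4, search=SEARCH_RANGE):
    """Return (dx, dy, cost) of the best match inside the search window."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the previous frame.
            if bx + dx < 0 or bx + dx + size > width:
                continue
            if by + dy < 0 or by + dy + size > height:
                continue
            cost = block_cost(prev, cur, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


prev = make_frame(48, 48, 10, 10)
slow = make_frame(48, 48, 20, 10)  # square moved 10 pixels to the right
fast = make_frame(48, 48, 30, 10)  # square moved 20 pixels to the right

slow_match = find_motion_vector(prev, slow, 20, 10)
fast_match = find_motion_vector(prev, fast, 30, 10)
print("slow:", slow_match)  # perfect match found inside the window
print("fast:", fast_match)  # no perfect match: movement exceeds the window
```

In this sketch the vector points from the block in the current frame back to its source in the previous frame, which is one common convention among several.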

H.261, PictureTel, Compression Labs, and Indeo are basically videoconferencing protocols and are generally useful for remote meetings but not for broadcast quality. The vendor protocols do not interoperate directly; instead, vendors use H.261 as a least common denominator to interoperate with each other's equipment.

Progressive Scanned Video

Progressive scanning is the term applied to a particular method of monitor display. In progressive scanning, the first line is displayed on the monitor, then the second, then the third, and so forth until the frame is completely painted. Intuitively, this process is very simple.

The advantage of progressive scanning is that it is relatively simple to compress a single frame. For any pixel at a particular location, there is a high probability that all eight contiguous pixels have the same value. This information is used when compressing a single frame (spatial compression).

Progressive scanning is also used for videoconferencing, computer monitors, and motion pictures. MPEG-1 was designed for use by progressively scanned media, such as CD-ROM.

Interlaced Presentation

Decades ago, the broadcast television industry elected to use another display technique. Instead of displaying each line in order, it uses interlacing, which displays the odd-numbered lines first. After all the odd-numbered lines are displayed, the even-numbered lines are displayed. This technique takes advantage of the fact that the human eye cannot discern flicker at 1/30th of a second. Further, viewers do not need every line painted on every pass. Interlacing halves the rate at which complete frames are painted.

The problem with interlacing is that it is more difficult to compress spatially. For any pixel at a particular position, only the pixels before and after it on the same line are displayed within the same scan of 1/30th of a second. The other six neighboring pixels are displayed 1/30th of a second later. So normal spatial compression algorithms are a bit more complicated, although not impossibly so.
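The split into odd and even lines can be sketched in a few lines of code. The example below is only an illustration, assuming a frame stored as a list of scan lines and hypothetical function names. It separates a frame into its two fields and weaves them back together, which makes it easy to see that vertically adjacent lines of the full frame end up in different fields.

```python
# Illustrative sketch: splitting a frame into interlaced fields.
# Lines are numbered from 1 as in the text, so list index 0 is line 1.


def split_fields(frame):
    """Return (odd_field, even_field) for a frame given as a list of lines."""
    odd_field = frame[0::2]   # lines 1, 3, 5, ...
    even_field = frame[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field


def weave(odd_field, even_field):
    """Rebuild the full frame by alternating lines from the two fields."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(odd_line)
        frame.append(even_line)
    # A frame with an odd number of lines has one extra odd-field line.
    if len(odd_field) > len(even_field):
        frame.append(odd_field[-1])
    return frame


# Each line is filled with its own line number to make the split visible.
frame = [[line_no] * 4 for line_no in range(1, 7)]
odd_field, even_field = split_fields(frame)
print("odd field lines: ", [line[0] for line in odd_field])
print("even field lines:", [line[0] for line in even_field])
```

Within one field, the line directly above or below a pixel is two lines away in the full frame, which is why a spatial compressor sees weaker vertical correlation.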

MPEG-2 is designed, among other things, for compression of interlaced displays. MPEG-1 is not suitable for television broadcast.

High-Definition Television

All consumer television today is analog. Digital technology gives rise to the possibility that there could be television with bigger screens, finer resolution, brighter colors, and studio-quality sound. High-definition television (HDTV) refers to a series of standards that define finer-resolution digital television.

Industry views about HDTV have changed recently. It is no longer viewed as an option for consumers. The television sets would be too expensive for too long, viewers cannot differentiate the quality on anything less than a 50-inch monitor, and programmers would rather sell more channels than better quality. If HDTV is used at all, it will be in commercial applications where large screens are required, such as air traffic control, network management monitors, baseball/football stadiums, and the like.

MPEG-2 was chosen as the encoding, compression, and transmission format for HDTV, in part because of its multiplexing and encryption features.

More on MPEG

It is possible to display a video with a sequence of JPEG pictures at 24 pictures per second (the rate used in motion pictures) or 30 frames per second (the rate used in U.S. broadcast television). However, this would not provide optimal compression. At these rates, any picture is very much like the picture either immediately preceding or immediately following it. This information should be used to transmit (and store) fewer bits. MPEG provides this interframe compression, called temporal compression.
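The idea that consecutive pictures are nearly identical can be shown with a toy example. The sketch below uses plain frame differencing, which is a deliberate simplification and not MPEG's actual prediction scheme; the frame contents and function names are made up for illustration. It shows that the difference between two similar frames contains far fewer nonzero values than the frame itself, and that the second frame can be rebuilt from the first plus the difference.

```python
# Illustrative sketch: temporal redundancy via simple frame differencing.
# This is a simplification for illustration, not the MPEG algorithm.


def frame_delta(prev, cur):
    """Per-pixel difference cur - prev for two frames of equal size."""
    return [[c - p for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, cur)]


def apply_delta(prev, delta):
    """Rebuild the current frame from the previous frame and the delta."""
    return [[p + d for p, d in zip(prow, drow)]
            for prow, drow in zip(prev, delta)]


def count_nonzero(frame):
    """Number of nonzero values, a rough stand-in for data to be coded."""
    return sum(1 for row in frame for value in row if value != 0)


# A small bright object shifts one pixel to the right between frames.
prev = [[10, 10, 10, 10],
        [10, 50, 50, 10],
        [10, 10, 10, 10]]
cur = [[10, 10, 10, 10],
       [10, 10, 50, 50],
       [10, 10, 10, 10]]

delta = frame_delta(prev, cur)
print("nonzero values in full frame:", count_nonzero(cur))
print("nonzero values in delta:     ", count_nonzero(delta))
```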

To achieve temporal compression, some frames are computed from other frames. The technique is to define three different kinds of frames. First, there are Intraframes, or I frames. These are much like fully coded JPEG pictures. Next, there are Predicted frames, or P frames. These are predicted from I frames or other P frames. Finally, there are Bidirectional frames, or B frames. B frames are interpolated from I and/or P frames.

The process is as follows. The encoder sends an I frame. Then a P frame is sent, perhaps 100 ms later. The time interval is set by configuration. The decoder cannot display the two pictures consecutively, because a 100-ms gap would not provide a smooth picture. So the pictures in between are computed (interpolated) from the two. The sequence of frames in a video may be similar to the following:

  ------------------------------
  Time (ms)               Frame
  ==============================
  0                       I
  30                      B
  60                      B
  90                      P
  120                     B
  150                     B
  180                     P
  210                     I
  Repeat ...
  ------------------------------

This example is for illustration purposes only. By convention, I frames are sent roughly every 400 ms. Also by convention, there are generally 10 to 12 frames between I frames. The mix of B frames and P frames is variable. Some users have elected not to use B frames at all but to use more P frames instead.
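The arithmetic behind these conventions, and the frame table above, can be reproduced with a short sketch. The function names and the repeating pattern string "IBBPBBP" are assumptions read off the illustrative table, not values fixed by MPEG. The 30 ms step is the rounded frame interval used in the table.

```python
# Illustrative sketch: frame-count arithmetic and a repeating frame pattern.
# The pattern and the helper names are assumptions taken from the table.


def frames_per_interval(interval_ms, frames_per_second):
    """Approximate number of frames that fit in one I-frame interval."""
    return round(interval_ms * frames_per_second / 1000)


def frame_timeline(pattern, step_ms, count):
    """Return (time_ms, frame_type) pairs for a repeating frame pattern."""
    return [(i * step_ms, pattern[i % len(pattern)]) for i in range(count)]


# One I frame roughly every 400 ms at the two frame rates named in the text.
print("30 frames/s:", frames_per_interval(400, 30), "frames per I-frame interval")
print("24 frames/s:", frames_per_interval(400, 24), "frames per I-frame interval")

# Regenerate the illustrative table.
for time_ms, frame_type in frame_timeline("IBBPBBP", 30, 8):
    print(f"{time_ms:<8}{frame_type}")
```

Under these assumptions the two frame rates give about 12 and about 10 frames per interval, which is consistent with the range quoted above.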

B frames tend to make pictures smoother on playback while consuming less bandwidth. The problem is that they force the decoder to buffer P frames and compute B frames. This requirement increases decoder costs, which is a particular problem for cable TV set top manufacturers. General Instrument's Digicipher protocol has specifically excluded B frames in an effort to keep costs down.
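The buffering requirement follows from the order in which frames must arrive. Because a B frame is interpolated from the frames on either side of it, the later I or P frame has to reach the decoder before the B frames that depend on it. The sketch below illustrates this reordering under simplifying assumptions: frames are plain labels, each B frame depends only on its nearest neighboring I or P frames, and the function name is hypothetical.

```python
# Illustrative sketch: display order versus transmission order.
# Frame labels combine the frame type with its display position.


def transmission_order(display_order):
    """Move each I or P frame ahead of the B frames that precede it in
    display order, since those B frames cannot be decoded without it."""
    sent = []
    held_b_frames = []
    for frame in display_order:
        if frame.startswith("B"):
            # Hold B frames until the next I or P frame is known.
            held_b_frames.append(frame)
        else:
            sent.append(frame)
            sent.extend(held_b_frames)
            held_b_frames = []
    # Trailing B frames are flushed as-is in this simplified sketch.
    sent.extend(held_b_frames)
    return sent


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
sent = transmission_order(display)
print("display order:     ", display)
print("transmission order:", sent)
```

The decoder has to hold P3 while it computes B1 and B2, which is the memory and processing cost the text refers to. A stream with no B frames needs no such reordering.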

This architecture also has implications for networking. I frames anchor picture quality, because ultimately P and B frames are derived from them. Therefore, it is important that I frames be transmitted with higher reliability than P or B frames. Thus, when transmitting MPEG frames over ATM or Frame Relay, it is advisable that I frames be given priority.
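One way to picture this prioritization is a send queue that always releases I frames first. The sketch below is a generic software priority queue, not an ATM or Frame Relay mechanism. The class name is hypothetical, and ranking P frames above B frames is an added assumption; the text only requires that I frames come first.

```python
# Illustrative sketch: a send queue that favors I frames.
# Ranking P above B is an assumption beyond what the text states.

import heapq
import itertools

PRIORITY = {"I": 0, "P": 1, "B": 2}  # lower number is sent first


class FrameQueue:
    """Priority queue that keeps arrival order within each frame type."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def push(self, frame_type, label):
        # The arrival counter breaks ties so equal priorities stay FIFO.
        entry = (PRIORITY[frame_type], next(self._arrival), label)
        heapq.heappush(self._heap, entry)

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = FrameQueue()
for frame_type, label in [("B", "B1"), ("P", "P3"), ("I", "I7"), ("B", "B2")]:
    queue.push(frame_type, label)

drained = [queue.pop() for _ in range(len(queue))]
print(drained)
```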

Video encoders should react to network congestion by dynamically altering the mix of I and B frames.

MPEG Types

There are three types of MPEG: MPEG-1, MPEG-2, and MPEG-4. They address different issues. MPEG-1 video is optimized for T1/E1 speeds, single programs in a stream, and progressive scanning. MPEG-1 audio provides CD-ROM-quality stereo sound.

MPEG-2 is enhanced to handle HDTV. It supports higher speeds, multiple programs in a single stream, and interlaced as well as progressive images. MPEG-2 multiplexing provides data transmission, which may be necessary for home shopping. MPEG-2 audio supports MPEG-1 and has options for lower-quality sound, such as secondary audio channels for television broadcast.

MPEG-4 is designed for DS0 audio/video, such as MIME messages. MPEG-4 work is still in progress.

MPEG-2 is rapidly assuming a centerpiece role in broadband networking. It is backward compatible with MPEG-1, which means that MPEG-2 decoders can display MPEG-1 encoded files. It has full functionality for video on demand, television broadcast, and Mosaic-type data services. MPEG-2 chips exist that permit real-time encoding, and there is a specification for MPEG-2 adaptation over ATM AAL5.


Posted: Fri Mar 19 09:54:40 PST 1999
Copyright 1996 © Cisco Systems Inc.