H.262
H.262 or MPEG-2 Part 2 is a digital video compression and encoding standard developed and maintained jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). It is the second part of the ISO/IEC MPEG-2 standard. The ITU-T Recommendation H.262 and ISO/IEC 13818-2 documents are identical. The standard is available for a fee from the ITU-T and ISO.
Overview
MPEG-2 Video is similar to MPEG-1, but also provides support for interlaced video (the format used by analog broadcast TV systems). MPEG-2 video is not optimized for low bit-rates (less than 1 Mbit/s), but outperforms MPEG-1 at 3 Mbit/s and above. All standards-conforming MPEG-2 Video decoders are fully capable of playing back MPEG-1 Video streams.
History
The ISO/IEC approval process was completed in November 1994. The first edition was approved in July 1995 and published by ITU-T and ISO/IEC in 1996.

In 1996 it was extended by two amendments to include the Registration of Copyright Identifiers and the 4:2:2 Profile. ITU-T published these amendments in 1996 and ISO in 1997.
Further amendments were published later by ITU-T and ISO.
Editions
Edition | Release date | Latest amendment | ISO/IEC standard | ITU-T Recommendation |
---|---|---|---|---|
First edition | 1995 | 2000 | ISO/IEC 13818-2:1996 | H.262 (07/95) |
Second edition | 2000 | 2010 (2011) | ISO/IEC 13818-2:2000 | H.262 (02/00) |
Video coding
An HDTV camera generates a raw video stream of 149,299,200 (= 24 × 1920 × 1080 × 3) bytes per second for 24 fps video. This stream must be compressed if digital TV is to fit in the bandwidth of available TV channels and if movies are to fit on DVDs. Fortunately, video compression is practical because the data in pictures is often redundant in space and time. For example, the sky can be blue across the top of a picture and that blue sky can persist for frame after frame. Also, because of the way the eye works, it is possible to delete some data from video pictures with almost no noticeable degradation in image quality.

TV cameras used in broadcasting usually generate 25 pictures a second (in Europe) or 29.97 pictures a second (in North America). Digital television requires that these pictures be digitized so that they can be processed by computer hardware. Each picture element (a pixel) is then represented by one luma number and two chrominance numbers. These describe the brightness and the color of the pixel (see YCbCr). Thus, each digitized picture is initially represented by three rectangular arrays of numbers.
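As a rough illustration of these numbers, here is a minimal sketch (the helper name is ours, not from the standard) of the raw data-rate arithmetic, assuming one luma and two full-resolution chrominance samples of one byte each per pixel:

```python
def raw_rate_bytes_per_second(width: int, height: int, fps: float,
                              bytes_per_pixel: int = 3) -> float:
    """Uncompressed data rate, assuming one luma and two full-resolution
    chrominance samples (3 bytes total) per pixel."""
    return width * height * bytes_per_pixel * fps

# 1080p at 24 fps, as in the text above: 149,299,200 bytes per second.
print(f"{raw_rate_bytes_per_second(1920, 1080, 24):,.0f} bytes/s")
# A standard-definition PAL picture (720 x 576 at 25 fps) for comparison.
print(f"{raw_rate_bytes_per_second(720, 576, 25):,.0f} bytes/s")
```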
A common (and old) trick to reduce the amount of data is to separate each picture into two fields upon broadcast/encoding: the "top field," which is the odd numbered horizontal lines, and the "bottom field," which is the even numbered lines. Upon reception/decoding, the two fields are displayed alternately with the lines of one field interleaving between the lines of the previous field. This format is called interlaced video; two successive fields are called a frame. The typical field rate is then 50 (Europe/PAL) or 59.94 (US/NTSC) fields per second. If the video is not interlaced, then it is called progressive video and each picture is a frame. MPEG-2 supports both options.
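A minimal sketch of the field split, assuming a frame stored as a NumPy array whose lines are numbered from 1 (so the odd-numbered lines sit at even array indices):

```python
import numpy as np

frame = np.arange(6 * 4).reshape(6, 4)   # a toy 6-line "frame"

top_field = frame[0::2]      # odd-numbered lines (1st, 3rd, 5th, ...)
bottom_field = frame[1::2]   # even-numbered lines (2nd, 4th, 6th, ...)

# Interleaving the two fields again reconstructs the frame exactly.
rebuilt = np.empty_like(frame)
rebuilt[0::2], rebuilt[1::2] = top_field, bottom_field
assert (rebuilt == frame).all()
```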
Another common practice to reduce the data rate is to "thin out" or subsample the two chrominance planes. In effect, the remaining chrominance values represent the nearby values that are deleted. Thinning works because the eye better resolves brightness details than chrominance details. The 4:2:2 chrominance format indicates that half the chrominance values have been deleted. The 4:2:0 chrominance format indicates that three quarters of the chrominance values have been deleted. If no chrominance values have been deleted, the chrominance format is 4:4:4. MPEG-2 allows all three options.
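A sketch of 4:2:0 thinning under one simple assumption: each deleted 2 × 2 neighborhood of chrominance values is represented by its average (real encoders may filter differently):

```python
import numpy as np

def subsample_420(chroma: np.ndarray) -> np.ndarray:
    """Delete three quarters of the chrominance values by replacing each
    2 x 2 neighborhood with its average."""
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

luma = np.random.rand(576, 720)               # full-resolution luma plane
cb = subsample_420(np.random.rand(576, 720))  # quarter-size chroma planes
cr = subsample_420(np.random.rand(576, 720))
print(luma.size, cb.size, cr.size)            # 414720 103680 103680
```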
MPEG-2 specifies that the raw frames be compressed into three kinds of frames: intra-coded frames (I-frames), predictive-coded frames (P-frames), and bidirectionally-predictive-coded frames (B-frames).
An I-frame is a compressed version of a single uncompressed (raw) frame. It takes advantage of spatial redundancy and of the inability of the eye to detect certain changes in the image. Unlike P-frames and B-frames, I-frames do not depend on data in the preceding or the following frames. Briefly, the raw frame is divided into 8 pixel by 8 pixel blocks. The data in each block is transformed by a discrete cosine transform. The result is an 8 by 8 matrix of coefficients. The transform converts spatial variations into frequency variations, but it does not change the information in the block; the original block can be recreated exactly by applying the inverse cosine transform. The advantage of doing this is that the image can now be simplified by quantizing the coefficients. Many of the coefficients, usually the higher frequency components, will then be zero. The penalty of this step is the loss of some subtle distinctions in brightness and color. If one applies the inverse transform to the matrix after it is quantized, one gets an image that looks very similar to the original image but that is not quite as nuanced. Next, the quantized coefficient matrix is itself compressed. Typically, one corner of the quantized matrix is filled with zeros. By starting in the opposite corner of the matrix, then zigzagging through the matrix to combine the coefficients into a string, then substituting run-length codes for consecutive zeros in that string, and then applying Huffman coding to that result, one reduces the matrix to a smaller array of numbers. It is this array that is broadcast or that is put on DVDs. In the receiver or the player, the whole process is reversed, enabling the receiver to reconstruct, to a close approximation, the original frame.
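The following sketch walks through the transform, quantization, and scan steps for one block. It is deliberately simplified: a single uniform quantizer step stands in for the standard's quantization matrices, and the run-length/Huffman stage is only indicated:

```python
import numpy as np
from scipy.fft import dctn, idctn  # type-II DCT and its inverse

block = np.random.randint(0, 256, (8, 8)).astype(float)  # one 8x8 block

coeffs = dctn(block, norm="ortho")    # spatial -> frequency variations
step = 16.0                           # illustrative quantizer step size
quantized = np.round(coeffs / step).astype(int)

# Zigzag scan: walk the anti-diagonals so the low-frequency coefficients
# come first and the trailing zeros form long, run-length-codable runs.
zigzag = sorted(((i, j) for i in range(8) for j in range(8)),
                key=lambda p: (p[0] + p[1],
                               p[0] if (p[0] + p[1]) % 2 else -p[0]))
scanned = [quantized[i, j] for i, j in zigzag]

# Decoding reverses the steps; the result is close to, but not exactly,
# the original block, because quantization discarded information.
approx = idctn(quantized * step, norm="ortho")
print(np.abs(approx - block).max())
```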
Typically, every 15th frame or so is made into an I-frame. P-frames and B-frames might follow an I-frame like this, IBBPBBPBBPBB(I), to form a Group Of Pictures (GOP); however, the standard is flexible about this.
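Because a B-frame depends on the reference frame that follows it in display order, frames are transmitted out of display order. A sketch of the reordering, with the simplifying assumption that B-frames at the end of a GOP are appended rather than held for the next GOP's I-frame:

```python
def coded_order(display_order: str) -> str:
    """Reorder a GOP from display order to transmission (coded) order:
    each run of B-frames is sent after the reference frame (I or P)
    that follows it in display order."""
    out, pending_b = [], []
    for frame_type in display_order:
        if frame_type == "B":
            pending_b.append(frame_type)
        else:                        # I or P: a reference frame
            out.append(frame_type)
            out.extend(pending_b)
            pending_b = []
    return "".join(out + pending_b)

print(coded_order("IBBPBBPBBPBB"))   # -> IPBBPBBPBBBB
```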
Macroblocks
P-frames provide more compression than I-frames because they take advantage of the data in a previous I-frame or P-frame (a reference frame). To generate a P-frame, the previous reference frame is reconstructed, just as it would be in a TV receiver or DVD player. The frame being compressed is divided into 16 pixel by 16 pixel macroblocks. Then, for each of those macroblocks, the reconstructed reference frame is searched to find that 16 by 16 macroblock that best matches the macroblock being compressed. The offset is encoded as a "motion vector." Frequently, the offset is zero. But, if something in the picture is moving, the offset might be something like 23 pixels to the right and 4 pixels up. The match between the two macroblocks will often not be perfect. To correct for this, the encoder takes the difference of all corresponding pixels of the two macroblocks, and on that macroblock difference then computes the strings of coefficient values as described above. This "residual" is appended to the motion vector and the result sent to the receiver or stored on the DVD for each macroblock being compressed. Sometimes no suitable match is found. Then, the macroblock is treated like an I-frame macroblock.
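A sketch of the macroblock search under two assumptions: an exhaustive search over a small window, and the sum of absolute differences (SAD) as the match criterion (real encoders use faster search strategies and half-pixel refinement):

```python
import numpy as np

def find_motion_vector(ref: np.ndarray, cur: np.ndarray,
                       bx: int, by: int, search: int = 7):
    """For the 16x16 macroblock of `cur` at (bx, by), find the offset into
    the reconstructed reference frame `ref` with the lowest SAD."""
    block = cur[by:by + 16, bx:bx + 16].astype(int)
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            # Candidates that fall outside the frame are simply skipped.
            if 0 <= y and 0 <= x and y + 16 <= ref.shape[0] and x + 16 <= ref.shape[1]:
                sad = np.abs(ref[y:y + 16, x:x + 16].astype(int) - block).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dx, dy)
    dx, dy = best
    residual = block - ref[by + dy:by + dy + 16, bx + dx:bx + dx + 16].astype(int)
    return best, residual   # motion vector plus the residual to be coded
```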
The processing of B-frames is similar to that of P-frames except that B-frames use the picture in a subsequent reference frame as well as the picture in a preceding reference frame. As a result, B-frames usually provide more compression than P-frames. B-frames are never reference frames.
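A sketch of the bidirectional case: the prediction is the rounded average of the forward and backward candidates, which often leaves a smaller residual than either reference alone (the rounding convention here is illustrative):

```python
import numpy as np

def bipredict(forward: np.ndarray, backward: np.ndarray) -> np.ndarray:
    """Average the forward and backward predictions, rounding halves up
    (an illustrative convention)."""
    return (forward.astype(int) + backward.astype(int) + 1) // 2

fwd = np.random.randint(0, 256, (16, 16))
bwd = np.random.randint(0, 256, (16, 16))
cur = (fwd + bwd) // 2                 # a block halfway between the two
residual = cur - bipredict(fwd, bwd)   # near zero for smooth motion
print(np.abs(residual).max())          # at most 1 in this toy case
```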
While the above generally describes MPEG-2 video compression, many details are not discussed here, including the handling of fields, chrominance formats, responses to scene changes, special codes that label the parts of the bitstream, and other pieces of information.
Video profiles and levels
MPEG-2 video supports a wide range of applications, from mobile devices to high-quality HD editing. For many applications it is unrealistic and too expensive to support the entire standard, so the standard defines profiles and levels that let an application implement only a subset of it. A profile defines a subset of features, such as the compression algorithm and the chroma format; a level defines a subset of quantitative capabilities, such as the maximum bit rate and the maximum frame size.
An MPEG application then specifies its capabilities in terms of a profile and a level. For example, a DVD player may say it supports up to main profile and main level (often written as MP@ML), meaning it can play back any MPEG stream encoded as MP@ML or lower.
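A sketch of that "MP@ML or lower" rule, assuming a simple linear ordering of the non-scalable profiles and of the levels (the standard's actual conformance rules are per-combination tables, so treat this as illustrative only):

```python
PROFILES = ["SP", "MP", "422", "HP"]   # roughly increasing feature sets
LEVELS = ["LL", "ML", "H-14", "HL"]    # increasing quantitative limits

def can_decode(decoder: str, stream: str) -> bool:
    """True if a decoder declaring `decoder` (e.g. "MP@ML") can play a
    stream encoded at `stream`, under the simple ordering above."""
    dec_p, dec_l = decoder.split("@")
    str_p, str_l = stream.split("@")
    return (PROFILES.index(str_p) <= PROFILES.index(dec_p)
            and LEVELS.index(str_l) <= LEVELS.index(dec_l))

print(can_decode("MP@ML", "SP@LL"))   # True: within the player's limits
print(can_decode("MP@ML", "MP@HL"))   # False: the level is too high
```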
The tables below summarize the limitations of each profile and level. There are many other constraints not listed here, and not all profile and level combinations are permissible.
Abbr. | Name | Picture Coding Types | Chroma Format | Aspect Ratios | Scalable modes | Intra DC Precision |
---|---|---|---|---|---|---|
SP | Simple profile | I, P | 4:2:0 | square pixels, 4:3, or 16:9 | none | 8, 9, 10 |
MP | Main profile | I, P, B | 4:2:0 | square pixels, 4:3, or 16:9 | none | 8, 9, 10 |
SNR | SNR Scalable profile | I, P, B | 4:2:0 | square pixels, 4:3, or 16:9 | SNR (signal-to-noise ratio) scalable | 8, 9, 10 |
Spatial | Spatially Scalable profile | I, P, B | 4:2:0 | square pixels, 4:3, or 16:9 | SNR- or spatial-scalable | 8, 9, 10 |
HP | High profile | I, P, B | 4:2:2 or 4:2:0 | square pixels, 4:3, or 16:9 | SNR- or spatial-scalable | 8, 9, 10, 11 |
422 | 4:2:2 profile | I, P, B | 4:2:2 or 4:2:0 | square pixels, 4:3, or 16:9 | none | 8, 9, 10, 11 |
MVP | Multi-view profile | I, P, B | 4:2:0 | square pixels, 4:3, or 16:9 | Temporal | 8, 9, 10 |
Exempting scalability (a rarely used feature where one MPEG-2 stream augments another), the following are some of the constraints on levels:
Abbr. | Name | Frame rates (Hz) | Max horizontal resolution | Max vertical resolution | Max luminance samples per second (approximately height x width x framerate) | Max bit rate in Main profile (Mbit/s) |
---|---|---|---|---|---|---|
LL | Low Level | 23.976, 24, 25, 29.97, 30 | 352 | 288 | 3,041,280 | 4 |
ML | Main Level | 23.976, 24, 25, 29.97, 30 | 720 | 576 | 10,368,000, except in High profile, where constraint is 14,475,600 for 4:2:0 and 11,059,200 for 4:2:2 | 15 |
H-14 | High 1440 | 23.976, 24, 25, 29.97, 30, 50, 59.94, 60 | 1440 | 1152 | 47,001,600, except that in High profile with 4:2:0, constraint is 62,668,800 | 60 |
HL | High Level | 23.976, 24, 25, 29.97, 30, 50, 59.94, 60 | 1920 | 1152 | 62,668,800, except that in High profile with 4:2:0, constraint is 83,558,400 | 80 |
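A sketch checking a format against the Main-profile figures in the table above (the High-profile exceptions, bit-rate limits, and the other constraints mentioned in the text are omitted):

```python
LEVEL_LIMITS = {   # level: (max width, max height, max luma samples/s)
    "LL":   (352, 288, 3_041_280),
    "ML":   (720, 576, 10_368_000),
    "H-14": (1440, 1152, 47_001_600),
    "HL":   (1920, 1152, 62_668_800),
}

def fits_level(level: str, width: int, height: int, fps: float) -> bool:
    """True if the picture size and rate stay within the level's bounds."""
    max_w, max_h, max_samples = LEVEL_LIMITS[level]
    return (width <= max_w and height <= max_h
            and width * height * fps <= max_samples)

print(fits_level("ML", 720, 576, 25))   # True: exactly 10,368,000 samples/s
print(fits_level("ML", 720, 576, 30))   # False: exceeds the sample budget
```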
Profile @ Level | Resolution (px) | Max framerate (Hz) | Sampling | Bitrate (Mbit/s) | Example Application |
---|---|---|---|---|---|
SP@LL | 176 × 144 | 15 | 4:2:0 | 0.096 | Wireless handsets |
SP@ML | 352 × 288 / 320 × 240 | 15 / 24 | 4:2:0 | 0.384 | PDAs |
MP@LL | 352 × 288 | 30 | 4:2:0 | 4 | Set-top boxes (STB) |
MP@ML | 720 × 480 / 720 × 576 | 30 / 25 | 4:2:0 | 15 (DVD: 9.8) | DVD, SD-DVB |
MP@H-14 | 1440 × 1080 / 1280 × 720 | 30 / 30 | 4:2:0 | 60 (HDV: 25) | HDV |
MP@HL | 1920 × 1080 / 1280 × 720 | 30 / 60 | 4:2:0 | 80 | ATSC 1080i, 720p60, HD-DVB (HDTV); bitrate for terrestrial transmission is limited to 19.39 Mbit/s |
422P@LL | | | 4:2:2 | | |
422P@ML | 720 × 480 / 720 × 576 | 30 / 25 | 4:2:2 | 50 | Sony IMX (I-frame only), broadcast "contribution" video (I&P only) |
422P@H-14 | 1440 × 1080 / 1280 × 720 | 30 / 60 | 4:2:2 | 80 | Potential future MPEG-2-based HD products from Sony and Panasonic |
422P@HL | 1920 × 1080 / 1280 × 720 | 30 / 60 | 4:2:2 | 300 | Potential future MPEG-2-based HD products from Panasonic |
DVD
The DVD-Video standard uses MPEG-2 video but imposes some restrictions (a small validation sketch follows the list):
- Allowed Resolutions
- 720 × 480, 704 × 480, 352 × 480, 352 × 240 pixel (NTSC)
- 720 × 576, 704 × 576, 352 × 576, 352 × 288 pixel (PAL)
- Allowed aspect ratios (display AR)
- 4:3
- 16:9
- (1.85:1 and 2.35:1, among others, are often listed as valid DVD aspect ratios, but are actually just a 16:9 image with the top and bottom of the frame masked in black)
- Allowed Frame rates
- 29.97 frame/s (NTSC)
- 25 frame/s (PAL)
- 23.976 frame/s through use of 3:2 pulldown (FILM)
- Audio+video bitrate
- Video peak 9.8 Mbit/s
- Total peak 10.08 Mbit/s
- Minimum 300 kbit/s
- Y'CbCr 4:2:0 chroma subsampling
- GOP structure
- Sequence header must be present at the beginning of every GOP
- Maximum frames per GOP: 18 (NTSC) / 15 (PAL), i.e. 0.6 seconds both
- Closed GOP required for multiple-angle DVDs
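A sketch of these restrictions as a validity check; it models only the resolution, frame-rate, aspect-ratio, and video bit-rate rules listed above:

```python
NTSC_RES = {(720, 480), (704, 480), (352, 480), (352, 240)}
PAL_RES = {(720, 576), (704, 576), (352, 576), (352, 288)}

def dvd_video_ok(width: int, height: int, fps: float,
                 video_mbps: float, aspect: str) -> bool:
    """True if the format satisfies the listed DVD-Video restrictions."""
    if (width, height) in NTSC_RES:
        rate_ok = fps in (29.97, 23.976)   # NTSC, or film via 3:2 pulldown
    elif (width, height) in PAL_RES:
        rate_ok = fps == 25                # PAL
    else:
        return False
    return rate_ok and aspect in ("4:3", "16:9") and video_mbps <= 9.8

print(dvd_video_ok(720, 576, 25, 8.0, "16:9"))    # True: typical PAL disc
print(dvd_video_ok(1280, 720, 25, 8.0, "16:9"))   # False: not a DVD size
```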
HD DVD & Blu-ray
The HD DVD and Blu-ray standards use MPEG-2 video along with the MPEG-4 AVC (H.264) and VC-1 video standards.
External links
- Official MPEG web site
- MPEG-2 Video Encoding (H.262) - The Library of Congress