Sum of absolute differences
Encyclopedia
Sum of absolute differences (SAD) is a widely used, extremely simple algorithm for measuring the similarity between image blocks
. It works by taking the absolute difference
between each pixel
in the original block and the corresponding pixel in the block being used for comparison. These differences are summed to create a simple metric of block similarity, the L1 norm
of the difference image.
The sum of absolute differences may be used for a variety of purposes, such as object recognition, the generation of disparity maps
for stereo
images, and motion estimation
for video compression.
from 0 to 9.
There are exactly three unique locations within the search image where the template may fit: the left side of the image, the center of the image, and the right side of the image. To calculate the SAD values, the absolute value of the difference between each corresponding pair of pixels is used: the difference between 2 and 2 is 0, 4 and 1 is 3, 7 and 8 is 1, and so forth.
Calculating the SAD values for each of these locations gives the following:
For each of these three image patches, the absolute differences are added together, giving a SAD value of 20, 25, and 17, respectively. From these SAD values, it is apparent that the right side of the search image is the most similar to the template image, because it has the least difference as compared to the other locations.
, to improve the reliability of results.
in a block. Therefore it is very effective for a wide motion search of many different blocks. SAD is also easily parallelizable
since it analyzes each pixel separately, making it easily implementable with such instructions as MMX and SSE2
. For example, SSE has packed sum of absolute differences instruction (PSADBW) specifically for this purpose. Once candidate blocks are found, the final refinement of the motion estimation process is often done with other slower but more accurate metrics, which better take into account human perception
. These include the sum of absolute transformed differences
(SATD), the sum of squared differences (SSD), and rate-distortion optimization.
Macroblock
Macroblock is an image compression component and technique based on discrete cosine transform used on still images and video frames. Macroblocks are usually composed of two or more blocks of pixels. In the JPEG standard macroblocks are called MCU blocks....
. It works by taking the absolute difference
Absolute difference
The absolute difference of two real numbers x, y is given by |x − y|, the absolute value of their difference. It describes the distance on the real line between the points corresponding to x and y...
between each pixel
Pixel
In digital imaging, a pixel, or pel, is a single point in a raster image, or the smallest addressable screen element in a display device; it is the smallest unit of picture that can be represented or controlled....
in the original block and the corresponding pixel in the block being used for comparison. These differences are summed to create a simple metric of block similarity, the L1 norm
Lp space
In mathematics, the Lp spaces are function spaces defined using a natural generalization of the p-norm for finite-dimensional vector spaces...
of the difference image.
The sum of absolute differences may be used for a variety of purposes, such as object recognition, the generation of disparity maps
Binocular disparity
Binocular disparity refers to the difference in image location of an object seen by the left and right eyes, resulting from the eyes' horizontal separation. The brain uses binocular disparity to extract depth information from the two-dimensional retinal images in stereopsis...
for stereo
Computer stereo vision
Computer stereo vision is the extraction of 3D information from digital images, such as obtained by a CCD camera. By comparing information about a scene from two vantage points, 3D information can be extracted by examination of the relative positions of objects in the two panels...
images, and motion estimation
Motion estimation
Motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another; usually from adjacent frames in a video sequence. It is an ill-posed problem as the motion is in three dimensions but the images are a projection of the 3D scene onto a 2D...
for video compression.
Example
This example uses the sum of absolute differences to identify which part of a search image is most similar to a template image. In this example, the template image is 3 by 3 pixels in size, while the search image is 3 by 5 pixels in size. Each pixel is represented by a single integerInteger
The integers are formed by the natural numbers together with the negatives of the non-zero natural numbers .They are known as Positive and Negative Integers respectively...
from 0 to 9.
Template Search image
2 5 5 2 7 5 8 6
4 0 7 1 7 4 2 7
7 5 9 8 4 6 8 5
There are exactly three unique locations within the search image where the template may fit: the left side of the image, the center of the image, and the right side of the image. To calculate the SAD values, the absolute value of the difference between each corresponding pair of pixels is used: the difference between 2 and 2 is 0, 4 and 1 is 3, 7 and 8 is 1, and so forth.
Calculating the SAD values for each of these locations gives the following:
Left Center Right
0 2 0 5 0 3 3 3 1
3 7 3 3 4 5 0 2 0
1 1 3 3 1 1 1 3 4
For each of these three image patches, the absolute differences are added together, giving a SAD value of 20, 25, and 17, respectively. From these SAD values, it is apparent that the right side of the search image is the most similar to the template image, because it has the least difference as compared to the other locations.
Object recognition
The sum of absolute differences provides a simple way to automate the searching for objects inside an image, but may be unreliable due to the effects of contextual factors such as changes in lighting, color, viewing direction, size, or shape. The SAD may be used in conjunction with other object recognition methods, such as edge detectionEdge detection
Edge detection is a fundamental tool in image processing and computer vision, particularly in the areas of feature detection and feature extraction, which aim at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities...
, to improve the reliability of results.
Video compression
SAD is an extremely fast metric due to its simplicity; it is effectively the simplest possible metric that takes into account every pixelPixel
In digital imaging, a pixel, or pel, is a single point in a raster image, or the smallest addressable screen element in a display device; it is the smallest unit of picture that can be represented or controlled....
in a block. Therefore it is very effective for a wide motion search of many different blocks. SAD is also easily parallelizable
Parallel processing
Parallel processing is the ability to carry out multiple operations or tasks simultaneously. The term is used in the contexts of both human cognition, particularly in the ability of the brain to simultaneously process incoming stimuli, and in parallel computing by machines.-Parallel processing by...
since it analyzes each pixel separately, making it easily implementable with such instructions as MMX and SSE2
SSE2
SSE2, Streaming SIMD Extensions 2, is one of the Intel SIMD processor supplementary instruction sets first introduced by Intel with the initial version of the Pentium 4 in 2001. It extends the earlier SSE instruction set, and is intended to fully supplant MMX. Intel extended SSE2 to create SSE3...
. For example, SSE has packed sum of absolute differences instruction (PSADBW) specifically for this purpose. Once candidate blocks are found, the final refinement of the motion estimation process is often done with other slower but more accurate metrics, which better take into account human perception
Visual system
The visual system is the part of the central nervous system which enables organisms to process visual detail, as well as enabling several non-image forming photoresponse functions. It interprets information from visible light to build a representation of the surrounding world...
. These include the sum of absolute transformed differences
Sum of absolute transformed differences
Sum of absolute transformed differences is a widely used video quality metric used for block-matching in motion estimation for video compression. It works by taking a frequency transform, usually a Hadamard transform, of the differences between the pixels in the original block and the...
(SATD), the sum of squared differences (SSD), and rate-distortion optimization.
See also
- Computer stereo visionComputer stereo visionComputer stereo vision is the extraction of 3D information from digital images, such as obtained by a CCD camera. By comparing information about a scene from two vantage points, 3D information can be extracted by examination of the relative positions of objects in the two panels...
- Hadamard transformHadamard transformThe Hadamard transform is an example of a generalized class of Fourier transforms...
- Motion compensationMotion compensationMotion compensation is an algorithmic technique employed in the encoding of video data for video compression, for example in the generation of MPEG-2 files. Motion compensation describes a picture in terms of the transformation of a reference picture to the current picture. The reference picture...
- Motion estimationMotion estimationMotion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another; usually from adjacent frames in a video sequence. It is an ill-posed problem as the motion is in three dimensions but the images are a projection of the 3D scene onto a 2D...
- Object recognition (computer vision)
- Rate-distortion optimization