Template matching - AbsoluteAstronomy.com

Template matching is a technique in digital image processing for finding small parts of an image which match a template image. It can be used in manufacturing as a part of quality control, a way to navigate a mobile robot, or as a way to detect edges in images.

Approach

Template matching can be subdivided into two approaches: feature-based and template-based matching. The feature-based approach uses the features of the search and template image, such as edges or corners, as the primary match-measuring metrics to find the best matching location of the template in the source image. The template-based, or global, approach uses the entire template, generally with a sum-comparing metric (SAD, SSD, cross-correlation, etc.) that determines the best location by testing all or a sample of the viable test locations within the search image that the template image may match up to.

Feature-based approach

If the template image has strong features, a feature-based approach may be considered; it may prove especially useful if the match in the search image might be transformed in some fashion. Since this approach does not consider the entirety of the template image, it can be more computationally efficient when working with source images of larger resolution, as the alternative, template-based approach may require searching a potentially large number of points in order to determine the best matching location.

Template-based approach

For templates without strong features, or when the bulk of the template image constitutes the matching image, a template-based approach may be effective. As mentioned above, template-based matching may require sampling a large number of points. The number of sampling points can be reduced in two ways: by reducing the resolution of the search and template images by the same factor and performing the operation on the resulting downsized images (multiresolution, or pyramid, image processing), or by providing a search window of data points within the search image so that the template does not have to be tested at every viable data point. The two can also be combined.

Motion tracking and occlusion handling

In instances where the template may not provide a direct match, it may be useful to implement the use of eigenspaces – templates that detail the matching object under a number of different conditions, such as varying perspectives, illuminations, color contrasts, or acceptable matching object “poses”. For example, if the user was looking for a face, the eigenspaces may consist of images (templates) of faces in different positions to the camera, in different lighting conditions, or with different expressions.

It is also possible for the matching image to be obscured, or occluded by an object; in these cases, it is unreasonable to provide a multitude of templates to cover each possible occlusion. For example, the search image may be a playing card, and in some of the search images, the card is obscured by the fingers of someone holding the card, or by another card on top of it, or any object in front of the camera for that matter. In cases where the object is malleable or poseable, motion also becomes a problem, and problems involving both motion and occlusion become ambiguous. In these cases, one possible solution is to divide the template image into multiple sub-images and perform matching on each subdivision.
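The subdivision idea can be sketched as follows. This is only one possible reading of it, under stated assumptions: the template is split into four equal quadrants, each quadrant is matched with SAD on its own, and the template origin implied by the most quadrants wins. The quadrant split, the majority vote, and the names `position`, `match_block` and `match_by_quadrants` are hypothetical and do not come from the article. Images are assumed to be row-major arrays of grey levels, and the template dimensions are assumed to be even.

```c
#include <stddef.h>
#include <float.h>

typedef struct { size_t row, col; } position;   /* hypothetical helper type */

/* Find the top-left position in S with the lowest SAD against the
 * b_rows x b_cols block of T that starts at (tr, tc).
 * t_stride is the number of columns in the full template. */
static position match_block(const double *S, size_t s_rows, size_t s_cols,
                            const double *T, size_t t_stride,
                            size_t tr, size_t tc,
                            size_t b_rows, size_t b_cols)
{
    position best = {0, 0};
    double min_sad = DBL_MAX;
    for (size_t x = 0; x + b_rows <= s_rows; x++) {
        for (size_t y = 0; y + b_cols <= s_cols; y++) {
            double sad = 0.0;
            for (size_t i = 0; i < b_rows; i++) {
                for (size_t j = 0; j < b_cols; j++) {
                    double d = S[(x + i) * s_cols + (y + j)]
                             - T[(tr + i) * t_stride + (tc + j)];
                    sad += d < 0 ? -d : d;
                }
            }
            if (sad < min_sad) {
                min_sad = sad;
                best.row = x;
                best.col = y;
            }
        }
    }
    return best;
}

/* Match the four quadrants of T separately. Each quadrant votes for the
 * template origin its best match implies; the origin with the most votes
 * is written to *out and its vote count is returned.
 * Assumes t_rows and t_cols are even, at least 2, and fit inside S. */
int match_by_quadrants(const double *S, size_t s_rows, size_t s_cols,
                       const double *T, size_t t_rows, size_t t_cols,
                       position *out)
{
    size_t hr = t_rows / 2, hc = t_cols / 2;
    position origin[4] = {{0, 0}, {0, 0}, {0, 0}, {0, 0}};
    int valid[4];

    for (int q = 0; q < 4; q++) {
        size_t tr = (size_t)(q / 2) * hr;   /* quadrant offset in T */
        size_t tc = (size_t)(q % 2) * hc;
        position p = match_block(S, s_rows, s_cols, T, t_cols, tr, tc, hr, hc);
        /* A vote only counts if the implied origin lies inside the image. */
        valid[q] = (p.row >= tr && p.col >= tc);
        if (valid[q]) {
            origin[q].row = p.row - tr;
            origin[q].col = p.col - tc;
        }
    }

    int best_votes = 0;
    for (int q = 0; q < 4; q++) {
        if (!valid[q]) continue;
        int votes = 0;
        for (int k = 0; k < 4; k++)
            if (valid[k] && origin[k].row == origin[q].row
                         && origin[k].col == origin[q].col)
                votes++;
        if (votes > best_votes) {
            best_votes = votes;
            *out = origin[q];
        }
    }
    return best_votes;
}
```

With this scheme, a match in which one quadrant is hidden can still be located as long as the remaining quadrants agree on the same origin. A low vote count can be treated as a sign that the match is unreliable.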

Template-based matching and convolution

A basic method of template matching uses a convolution mask (template), tailored to a specific feature of the search image, which we want to detect. This technique can be easily performed on grey images or edge images. The convolution output will be highest at places where the image structure matches the mask structure, where large image values get multiplied by large mask values.

This method is normally implemented by first picking out a part of the search image to use as a template. We will call the search image S(x, y), where (x, y) represents the coordinates of each pixel in the search image. We will call the template T(x_t, y_t), where (x_t, y_t) represents the coordinates of each pixel in the template. We then simply move the center (or the origin) of the template T(x_t, y_t) over each (x, y) point in the search image and calculate the sum of products between the coefficients in S(x, y) and T(x_t, y_t) over the whole area spanned by the template. As all possible positions of the template with respect to the search image are considered, the position with the highest score is the best position. This method is sometimes referred to as 'Linear Spatial Filtering' and the template is called a filter mask.
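The sum-of-products search described above can be sketched in C as follows. This is a minimal illustration under assumptions that are not in the article: images are stored as row-major arrays of grey levels, the template origin is taken to be its top-left corner, only positions where the template fits entirely inside the search image are tested, and the function names `sum_of_products` and `best_correlation` are hypothetical.

```c
#include <stddef.h>

/* Sum of products between the template T (t_rows x t_cols) and the window
 * of S whose top-left corner is at row x, column y. */
static double sum_of_products(const double *S, size_t s_cols,
                              const double *T, size_t t_rows, size_t t_cols,
                              size_t x, size_t y)
{
    double score = 0.0;
    for (size_t i = 0; i < t_rows; i++)
        for (size_t j = 0; j < t_cols; j++)
            score += S[(x + i) * s_cols + (y + j)] * T[i * t_cols + j];
    return score;
}

/* Slide the template over every position where it fits inside S and report
 * the position with the highest score.
 * Returns 0 on success, -1 if the template is empty or larger than S. */
int best_correlation(const double *S, size_t s_rows, size_t s_cols,
                     const double *T, size_t t_rows, size_t t_cols,
                     size_t *best_row, size_t *best_col)
{
    if (t_rows == 0 || t_cols == 0 || t_rows > s_rows || t_cols > s_cols)
        return -1;

    int found = 0;
    double best = 0.0;
    for (size_t x = 0; x + t_rows <= s_rows; x++) {
        for (size_t y = 0; y + t_cols <= s_cols; y++) {
            double score = sum_of_products(S, s_cols, T, t_rows, t_cols, x, y);
            if (!found || score > best) {
                found = 1;
                best = score;
                *best_row = x;
                *best_col = y;
            }
        }
    }
    return 0;
}
```

Because the score is an unnormalized sum of products, uniformly bright regions of the search image can outscore the true match, which is one reason normalized variants of cross-correlation are often preferred in practice.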

For example, one way to handle translation problems in images using template matching is to compare the intensities of the pixels using the SAD (sum of absolute differences) measure.

A pixel in the search image with coordinates (x_s, y_s) has intensity I_s(x_s, y_s) and a pixel in the template with coordinates (x_t, y_t) has intensity I_t(x_t, y_t). Thus the absolute difference in the pixel intensities is defined as Diff(x_s, y_s, x_t, y_t) = | I_s(x_s, y_s) - I_t(x_t, y_t) |.

The idea of looping through the pixels in the search image, translating the origin of the template to each pixel and taking the SAD measure there, can be written as:

SAD(x, y) = \sum_{i=0}^{T_{rows}-1} \sum_{j=0}^{T_{cols}-1} Diff(x+i, y+j, i, j)

where the template origin (x, y) ranges over 0 \le x \le S_{rows} - T_{rows} and 0 \le y \le S_{cols} - T_{cols}. S_rows and S_cols denote the rows and the columns of the search image, and T_rows and T_cols denote the rows and the columns of the template image, respectively.
In this method the lowest SAD score gives the estimate for the best position of the template within the search image. The method is simple to implement and understand, but it is one of the slowest methods.

Example

Implementation

In this simple implementation, it is assumed that the method described above is applied to grey images, which is why Grey is used as the pixel intensity. The final position in this implementation gives the top-left location at which the template image best matches the search image.

// Assumed declarations (not part of the original listing): S and T are 2-D
// arrays of pixel structs with a Grey member, and VALUE_MAX is the largest
// value that SAD can hold.
minSAD = VALUE_MAX;

// loop through the search image (x indexes rows, y indexes columns)
for ( int x = 0; x <= S_rows - T_rows; x++ ) {
    for ( int y = 0; y <= S_cols - T_cols; y++ ) {
        SAD = 0.0;

        // loop through the template image
        for ( int i = 0; i < T_rows; i++ ) {
            for ( int j = 0; j < T_cols; j++ ) {
                pixel p_SearchIMG = S[x+i][y+j];
                pixel p_TemplateIMG = T[i][j];

                SAD += abs( p_SearchIMG.Grey - p_TemplateIMG.Grey );
            }
        }

        // save the best found position
        if ( minSAD > SAD ) {
            // minSAD now holds the lowest SAD seen so far
            minSAD = SAD;
            position.bestRow = x;
            position.bestCol = y;
            position.bestSAD = SAD;
        }
    }
}

One way to perform template matching on color images is to decompose the pixels into their color components and measure the quality of match between the color template and search image using the sum of the SAD computed for each color separately.
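The per-channel sum can be sketched as below. The `rgb_pixel` type, the three-channel RGB layout, the row-major storage and the function name `color_sad` are assumptions made for illustration; the article does not specify a color representation.

```c
#include <stdlib.h>
#include <stddef.h>

typedef struct { unsigned char r, g, b; } rgb_pixel;   /* hypothetical pixel type */

/* SAD between the color template T (t_rows x t_cols) and the window of S
 * whose top-left corner is at row x, column y. The score is the sum of the
 * absolute differences of the red, green and blue components. */
unsigned long color_sad(const rgb_pixel *S, size_t s_cols,
                        const rgb_pixel *T, size_t t_rows, size_t t_cols,
                        size_t x, size_t y)
{
    unsigned long sad = 0;
    for (size_t i = 0; i < t_rows; i++) {
        for (size_t j = 0; j < t_cols; j++) {
            const rgb_pixel *p = &S[(x + i) * s_cols + (y + j)];
            const rgb_pixel *q = &T[i * t_cols + j];
            sad += (unsigned long)(abs(p->r - q->r)
                                 + abs(p->g - q->g)
                                 + abs(p->b - q->b));
        }
    }
    return sad;
}
```

This function can stand in for the grey-level difference in the search loop of the implementation above: the position with the lowest combined score is still taken as the best match.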

Speeding up the Process

In the past, this type of spatial filtering was normally only used in dedicated hardware solutions because of the computational complexity of the operation. However, this complexity can be lessened by filtering in the frequency domain of the image, referred to as 'frequency domain filtering'; this is done through the use of the convolution theorem.
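As a hedged illustration of how the convolution theorem applies here, the sum-of-products score for a real-valued template can be written in the frequency domain as shown below. The symbols are notation introduced for this sketch: c(x, y) is the score at template origin (x, y), \mathcal{F} is the two-dimensional discrete Fourier transform, and the overline is complex conjugation. It is assumed that both images are zero-padded to a common size so that the circular correlation computed by the transform does not wrap around.

```latex
c(x, y) = \sum_{i}\sum_{j} S(x+i,\, y+j)\, T(i, j)
\quad\Longleftrightarrow\quad
c = \mathcal{F}^{-1}\!\left\{ \mathcal{F}\{S\} \cdot \overline{\mathcal{F}\{T\}} \right\}
```

The direct sum costs on the order of S_rows x S_cols x T_rows x T_cols multiplications, whereas the transform route costs on the order of N log N for a padded image of N pixels, which is where the saving comes from for large templates.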

Another way of speeding up the matching process is through the use of an image pyramid. This is a series of images, at different scales, which are formed by repeatedly filtering and subsampling the original image in order to generate a sequence of reduced resolution images. These lower resolution images can then be searched for the template (with a similarly reduced resolution), in order to yield possible start positions for searching at the larger scales. The larger images can then be searched in a small window around the start position to find the best template location.
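A two-level version of this coarse-to-fine search can be sketched as follows. Several choices here are assumptions, not taken from the article: a single pyramid level, 2x2 block averaging as the filter-and-subsample step, SAD as the match measure, a refinement window of one pixel on each side of the coarse estimate, and even image dimensions. The function names are hypothetical.

```c
#include <stdlib.h>
#include <stddef.h>
#include <float.h>

/* Halve an image by averaging 2x2 blocks; rows and cols must be even. */
static void halve(const double *src, size_t rows, size_t cols, double *dst)
{
    size_t hr = rows / 2, hc = cols / 2;
    for (size_t r = 0; r < hr; r++)
        for (size_t c = 0; c < hc; c++)
            dst[r * hc + c] = (src[(2 * r) * cols + 2 * c]
                             + src[(2 * r) * cols + 2 * c + 1]
                             + src[(2 * r + 1) * cols + 2 * c]
                             + src[(2 * r + 1) * cols + 2 * c + 1]) / 4.0;
}

/* Lowest-SAD position with rows in [r0, r1] and columns in [c0, c1]. */
static void search_window(const double *S, size_t s_cols,
                          const double *T, size_t t_rows, size_t t_cols,
                          size_t r0, size_t r1, size_t c0, size_t c1,
                          size_t *best_r, size_t *best_c)
{
    double best = DBL_MAX;
    for (size_t x = r0; x <= r1; x++) {
        for (size_t y = c0; y <= c1; y++) {
            double sad = 0.0;
            for (size_t i = 0; i < t_rows; i++) {
                for (size_t j = 0; j < t_cols; j++) {
                    double d = S[(x + i) * s_cols + (y + j)] - T[i * t_cols + j];
                    sad += d < 0 ? -d : d;
                }
            }
            if (sad < best) { best = sad; *best_r = x; *best_c = y; }
        }
    }
}

/* Search the half-resolution images exhaustively, then refine at full
 * resolution in a small window around the scaled-up coarse position.
 * Returns 0 on success, -1 on unsupported sizes or allocation failure. */
int pyramid_match(const double *S, size_t s_rows, size_t s_cols,
                  const double *T, size_t t_rows, size_t t_cols,
                  size_t *best_r, size_t *best_c)
{
    if (s_rows % 2 || s_cols % 2 || t_rows % 2 || t_cols % 2 ||
        t_rows < 2 || t_cols < 2 || t_rows > s_rows || t_cols > s_cols)
        return -1;

    size_t hsr = s_rows / 2, hsc = s_cols / 2;
    size_t htr = t_rows / 2, htc = t_cols / 2;
    double *hS = malloc(hsr * hsc * sizeof *hS);
    double *hT = malloc(htr * htc * sizeof *hT);
    if (!hS || !hT) { free(hS); free(hT); return -1; }

    halve(S, s_rows, s_cols, hS);
    halve(T, t_rows, t_cols, hT);

    /* Coarse pass: every viable position in the reduced image. */
    size_t cr = 0, cc = 0;
    search_window(hS, hsc, hT, htr, htc, 0, hsr - htr, 0, hsc - htc, &cr, &cc);
    free(hS);
    free(hT);

    /* Fine pass: one pixel either side of the scaled-up coarse position,
     * clamped to the range of viable template origins. */
    size_t max_r = s_rows - t_rows, max_c = s_cols - t_cols;
    size_t r0 = 2 * cr > 0 ? 2 * cr - 1 : 0;
    size_t c0 = 2 * cc > 0 ? 2 * cc - 1 : 0;
    size_t r1 = 2 * cr + 1 < max_r ? 2 * cr + 1 : max_r;
    size_t c1 = 2 * cc + 1 < max_c ? 2 * cc + 1 : max_c;
    search_window(S, s_cols, T, t_rows, t_cols, r0, r1, c0, c1, best_r, best_c);
    return 0;
}
```

The coarse pass tests roughly a quarter as many positions with a quarter as many pixels per test, and the fine pass tests at most nine positions. The trade-off is that the coarse pass can settle on the wrong neighborhood when the template's distinguishing detail is lost by the averaging, so the result is an estimate rather than a guaranteed global minimum.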

Other methods can handle problems such as translation, scale and image rotation.

Improving the accuracy of the matching

Improvements can be made to the matching method by using more than one template (eigenspaces); these other templates can have different scales and rotations.

It is also possible to improve the accuracy of the matching method by hybridizing the feature-based and template-based approaches. Naturally, this requires that the search and template images have features that are apparent enough to support feature matching.

Similar Methods

Other methods which are similar include 'Stereo matching', 'Image registration' and 'Scale-invariant feature transform'.

Examples of Use

Template matching has various applications and is used in such fields as face recognition (see facial recognition system) and medical image processing. Systems have been developed and used in the past to count the number of faces that walk across part of a bridge within a certain amount of time. Other systems include automated calcified nodule detection within digital chest X-rays.

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.