Blob detection
Encyclopedia
In the area of computer vision
, blob detection refers to visual modules that are aimed at detecting points and/or regions in the image that differ in properties like brightness or color compared to the surrounding. There are two main classes of blob detectors (i) differential methods based on derivative expressions and (ii) methods based on local extrema in the intensity landscape. With the more recent terminology used in the field, these operators can also be referred to as interest point operators, or alternatively interest region operators (see also interest point detection
and corner detection
).
There are several motivations for studying and developing blob detectors. One main reason is to provide complementary information about regions, which is not obtained from edge detectors
or corner detectors
. In early work in the area, blob detection was used to obtain regions of interest for further processing. These regions could signal the presence of objects or parts of objects in the image domain with application to object recognition
and/or object tracking
. In other domains, such as histogram analysis, blob descriptors can also be used for peak detection with application to segmentation
. Another common use of blob descriptors is as main primitives for texture analysis and texture recognition. In more recent work, blob descriptors have found increasingly popular use as interest points
for wide baseline stereo matching
and to signal the presence of informative image features for appearance-based object recognition based on local image statistics. There is also the related notion of ridge detection
to signal the presence of elongated objects.
by a Gaussian kernel
at a certain scale to give a scale-space representation . Then, the Laplacian operator
is computed, which usually results in strong positive responses for dark blobs of extent and strong negative responses for bright blobs of similar size. A main problem when applying this operator at a single scale, however, is that the operator response is strongly dependent on the relationship between the size of the blob structures in the image domain and the size of the Gaussian kernel used for pre-smoothing. In order to automatically capture blobs of different (unknown) size in the image domain, a multi-scale approach is therefore necessary.
A straightforward way to obtain a multi-scale blob detector with automatic scale selection is to consider the scale-normalized Laplacian operator
and to detect scale-space maxima/minima, that are points that are simultaneously local maxima/minima of with respect to both space and scale (Lindeberg 1994, 1998). Thus, given a discrete two-dimensional input image a three-dimensional discrete scale-space volume is computed and a point is regarded as a bright (dark) blob if the value at this point is greater (smaller) than the value in all its 26 neighbours. Thus, simultaneous selection of interest points and scales is performed according to.
Note that this notion of blob provides a concise and mathematically precise operational definition of the notion of "blob", which directly leads to an efficient and robust algorithm for blob detection. Some basic properties of blobs defined from scale-space maxima of the normalized Laplacian operator are that the responses are covariant with translations, rotations and rescalings in the image domain. Thus, if a scale-space maximum is assumed at a point then under a rescaling of the image by a scale factor , there will be a scale-space maximum at in the rescaled image (Lindeberg 1998). This in practice highly useful property implies that besides the specific topic of Laplacian blob detection, local maxima/minima of the scale-normalized Laplacian are also used for scale selection in other contexts, such as in corner detection
, scale-adaptive feature tracking (Bretzner and Lindeberg 1998), in the scale-invariant feature transform
(Lowe 2004) as well as other image descriptors for image matching and object recognition
.
it follows that the Laplacian of the Gaussian operator can also be computed as the limit case of the difference between two Gaussian smoothed images (scale-space representations).
In the computer vision literature, this approach is referred to as the Difference of Gaussians
(DoG) approach. Besides minor technicalities, however, this operator is in essence similar to the Laplacian and can be seen as an approximation of the Laplacian operator. In a similar fashion as for the Laplacian blob detector, blobs can be detected from scale-space extrema of differences of Gaussians.
where denotes the Hessian matrix
of and then detecting scale-space maxima of this operator one obtains another straightforward differential blob detector with automatic scale selection which also responds to saddles (Lindeberg 1994, 1998).
The blob points and scales are also defined from an operational differential geometric definitions that leads to blob descriptors that are covariant with translations, rotations and rescalings in the image domain. In terms of scale selection, blobs defined from scale-space extrema of the determinant of the Hessian (DoH) also have slightly better scale selection properties under non-Euclidean affine transformations than the more commonly used Laplacian operator (Lindeberg 1994, 1998). In simplified form, the scale-normalized determinant of the Hessian computed from Haar wavelet
s is used as the basic interest point operator in the SURF
descriptor (Bay et al. 2006) for image matching and object recognition.
This operator has been used for image matching, object recognition as well as texture analysis.
to a blob descriptor, where the shape of the smoothing kernel is iteratively warped to match the local image structure around the blob, or equivalently a local image patch is iteratively warped while the shape of the smoothing kernel remains rotationally symmetric (Lindeberg and Garding 1997; Baumberg 2000; Mikolajczyk and Schmid 2004, Lindeberg 2008/2009). In this way, we can define affine-adapted versions of the Laplacian/Difference of Gaussian operator, the determinant of the Hessian and the Hessian-Laplace operator (see also Harris-Affine and Hessian-Affine).
It was proposed that regions of interest and scale descriptors obtained in this way, with associated scale levels defined from the scales at which normalized measures of blob strength assumed their maxima over scales could be used for guiding other early visual processing. An early prototype of simplified vision systems was developed where such regions of interest and scale descriptors were used for directing the focus-of-attention of an active vision system. While the specific technique that was used in these prototypes can be substantially improved with the current knowledge in computer vision, the overall general approach is still valid, for example in the way that that local extrema over scales of the scale-normalized Laplacian operator are nowadays used for providing scale information to other visual processes.
Lindeberg developed an algorithm based on pre-sorting the pixels,
alternatively connected regions having the same intensity, in
decreasing order of the intensity values.
Then, comparisons were made between nearest neighbours of either pixels or connected regions.
For simplicity, let us consider the case of detecting bright grey-level blobs and
let the notation "higher neighbour" stand for "neighbour pixel having a higher grey-level value".
Then, at any stage in the algorithm (carried out in decreasing order of intensity values)
is based on the following classification rules:
Compared to other watershed methods, the flooding in this algorithm stops once the intensity level falls below the intensity value of the so-called delimiting saddle point associated with the local maximum. However, it is rather straightforward to extend this approach the other types of watershed constructions. For example, by proceeding beyond the first delimiting saddle point a "grey-level blob tree" can be constructed. Moreover, the grey-level blob detection method was embedded in a scale-space representation and performed at all levels of scale, resulting in a representation called the scale-space primal sketch.
This algorithm with its applications in computer vision is described in more detail in Lindeberg's thesis as well as the monograph on scale-space theory partially based
on that work. Earlier presentations of this algorithm can also be found in. More detailed treatments of applications of grey-level blob detection and the scale-space primal sketch to computer vision and medical image analysis are given in.
There are close relations between this notion and the above mentioned notion of grey-level blob tree. The maximally stable extremum regions can be seen as making a specific subset of the grey-level blob tree explicit for further processing.
Computer vision
Computer vision is a field that includes methods for acquiring, processing, analysing, and understanding images and, in general, high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the forms of decisions...
, blob detection refers to visual modules that are aimed at detecting points and/or regions in the image that differ in properties like brightness or color compared to the surrounding. There are two main classes of blob detectors (i) differential methods based on derivative expressions and (ii) methods based on local extrema in the intensity landscape. With the more recent terminology used in the field, these operators can also be referred to as interest point operators, or alternatively interest region operators (see also interest point detection
Interest point detection
Interest point detection is a recent terminology in computer vision that refers to the detection of interest points for subsequent processing...
and corner detection
Corner detection
Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosaicing, panorama stitching, 3D modelling and object...
).
There are several motivations for studying and developing blob detectors. One main reason is to provide complementary information about regions, which is not obtained from edge detectors
Edge detection
Edge detection is a fundamental tool in image processing and computer vision, particularly in the areas of feature detection and feature extraction, which aim at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities...
or corner detectors
Corner detection
Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosaicing, panorama stitching, 3D modelling and object...
. In early work in the area, blob detection was used to obtain regions of interest for further processing. These regions could signal the presence of objects or parts of objects in the image domain with application to object recognition
Object recognition
Object recognition in computer vision is the task of finding a given object in an image or video sequence. Humans recognize a multitude of objects in images with little effort, despite the fact that the image of the objects may vary somewhat in different view points, in many different sizes / scale...
and/or object tracking
Video tracking
Video tracking is the process of locating a moving object over time using a camera. It has a variety of uses, some of which are: human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, medical imaging and video editing...
. In other domains, such as histogram analysis, blob descriptors can also be used for peak detection with application to segmentation
Segmentation (image processing)
In computer vision, segmentation refers to the process of partitioning a digital image into multiple segments . The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze...
. Another common use of blob descriptors is as main primitives for texture analysis and texture recognition. In more recent work, blob descriptors have found increasingly popular use as interest points
Interest point detection
Interest point detection is a recent terminology in computer vision that refers to the detection of interest points for subsequent processing...
for wide baseline stereo matching
Image registration
Image registration is the process of transforming different sets of data into one coordinate system. Data may be multiple photographs, data from different sensors, from different times, or from different viewpoints. It is used in computer vision, medical imaging, military automatic target...
and to signal the presence of informative image features for appearance-based object recognition based on local image statistics. There is also the related notion of ridge detection
Ridge detection
The ridges of a smooth function of two variables is a set of curves whose points are, in one or more ways to be made precise below, local maxima of the function in at least one dimension. For a function of N variables, its ridges are a set of curves whose points are local maxima in N-1 dimensions...
to signal the presence of elongated objects.
The Laplacian of Gaussian
One of the first and also most common blob detectors is based on the Laplacian of the Gaussian (LoG). Given an input image , this image is convolvedConvolution
In mathematics and, in particular, functional analysis, convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions. Convolution is similar to cross-correlation...
by a Gaussian kernel
at a certain scale to give a scale-space representation . Then, the Laplacian operator
is computed, which usually results in strong positive responses for dark blobs of extent and strong negative responses for bright blobs of similar size. A main problem when applying this operator at a single scale, however, is that the operator response is strongly dependent on the relationship between the size of the blob structures in the image domain and the size of the Gaussian kernel used for pre-smoothing. In order to automatically capture blobs of different (unknown) size in the image domain, a multi-scale approach is therefore necessary.
A straightforward way to obtain a multi-scale blob detector with automatic scale selection is to consider the scale-normalized Laplacian operator
and to detect scale-space maxima/minima, that are points that are simultaneously local maxima/minima of with respect to both space and scale (Lindeberg 1994, 1998). Thus, given a discrete two-dimensional input image a three-dimensional discrete scale-space volume is computed and a point is regarded as a bright (dark) blob if the value at this point is greater (smaller) than the value in all its 26 neighbours. Thus, simultaneous selection of interest points and scales is performed according to.
Note that this notion of blob provides a concise and mathematically precise operational definition of the notion of "blob", which directly leads to an efficient and robust algorithm for blob detection. Some basic properties of blobs defined from scale-space maxima of the normalized Laplacian operator are that the responses are covariant with translations, rotations and rescalings in the image domain. Thus, if a scale-space maximum is assumed at a point then under a rescaling of the image by a scale factor , there will be a scale-space maximum at in the rescaled image (Lindeberg 1998). This in practice highly useful property implies that besides the specific topic of Laplacian blob detection, local maxima/minima of the scale-normalized Laplacian are also used for scale selection in other contexts, such as in corner detection
Corner detection
Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosaicing, panorama stitching, 3D modelling and object...
, scale-adaptive feature tracking (Bretzner and Lindeberg 1998), in the scale-invariant feature transform
Scale-invariant feature transform
Scale-invariant feature transform is an algorithm in computer vision to detect and describe local features in images. The algorithm was published by David Lowe in 1999....
(Lowe 2004) as well as other image descriptors for image matching and object recognition
Object recognition
Object recognition in computer vision is the task of finding a given object in an image or video sequence. Humans recognize a multitude of objects in images with little effort, despite the fact that the image of the objects may vary somewhat in different view points, in many different sizes / scale...
.
The difference of Gaussians approach
From the fact that the scale-space representation satisfies the diffusion equationit follows that the Laplacian of the Gaussian operator can also be computed as the limit case of the difference between two Gaussian smoothed images (scale-space representations).
In the computer vision literature, this approach is referred to as the Difference of Gaussians
Difference of Gaussians
In computer vision, Difference of Gaussians is a grayscale image enhancement algorithm that involves the subtraction of one blurred version of an original grayscale image from another, less blurred version of the original. The blurred images are obtained by convolving the original grayscale image...
(DoG) approach. Besides minor technicalities, however, this operator is in essence similar to the Laplacian and can be seen as an approximation of the Laplacian operator. In a similar fashion as for the Laplacian blob detector, blobs can be detected from scale-space extrema of differences of Gaussians.
The determinant of the Hessian
By considering the scale-normalized determinant of the Hessian, also referred to as the Monge–Ampère operator,where denotes the Hessian matrix
Hessian matrix
In mathematics, the Hessian matrix is the square matrix of second-order partial derivatives of a function; that is, it describes the local curvature of a function of many variables. The Hessian matrix was developed in the 19th century by the German mathematician Ludwig Otto Hesse and later named...
of and then detecting scale-space maxima of this operator one obtains another straightforward differential blob detector with automatic scale selection which also responds to saddles (Lindeberg 1994, 1998).
The blob points and scales are also defined from an operational differential geometric definitions that leads to blob descriptors that are covariant with translations, rotations and rescalings in the image domain. In terms of scale selection, blobs defined from scale-space extrema of the determinant of the Hessian (DoH) also have slightly better scale selection properties under non-Euclidean affine transformations than the more commonly used Laplacian operator (Lindeberg 1994, 1998). In simplified form, the scale-normalized determinant of the Hessian computed from Haar wavelet
Haar wavelet
In mathematics, the Haar wavelet is a certain sequence of rescaled "square-shaped" functions which together form a wavelet family or basis. Wavelet analysis is similar to Fourier analysis in that it allows a target function over an interval to be represented in terms of an orthonormal function basis...
s is used as the basic interest point operator in the SURF
SURF
SURF is a robust image detector & descriptor, first presented by Herbert Bay et al. in 2006, that can be used in computer vision tasks like object recognition or 3D reconstruction. It is partly inspired by the SIFT descriptor...
descriptor (Bay et al. 2006) for image matching and object recognition.
The hybrid Laplacian and determinant of the Hessian operator (Hessian-Laplace)
A hybrid operator between the Laplacian and the determinant of the Hessian blob detectors has also been proposed, where spatial selection is done by the determinant of the Hessian and scale selection is performed with the scale-normalized Laplacian (Mikolajczyk and Schmid 2004):This operator has been used for image matching, object recognition as well as texture analysis.
Affine-adapted differential blob detectors
The blob descriptors obtained from these blob detectors with automatic scale selection are invariant to translations, rotations and uniform rescalings in the spatial domain. The images that constitute the input to a computer vision system are, however, also subject to perspective distortions. To obtain blob descriptors that are more robust to perspective transformations, a natural approach is to devise a blob detector that is invariant to affine transformations. In practice, affine invariant interest points can be obtained by applying affine shape adaptationAffine shape adaptation
Affine shape adaptation is a methodology for iteratively adapting the shape of the smoothing kernels in an affine group of smoothing kernels to the local image structure in neighbourhood region of a specific image point...
to a blob descriptor, where the shape of the smoothing kernel is iteratively warped to match the local image structure around the blob, or equivalently a local image patch is iteratively warped while the shape of the smoothing kernel remains rotationally symmetric (Lindeberg and Garding 1997; Baumberg 2000; Mikolajczyk and Schmid 2004, Lindeberg 2008/2009). In this way, we can define affine-adapted versions of the Laplacian/Difference of Gaussian operator, the determinant of the Hessian and the Hessian-Laplace operator (see also Harris-Affine and Hessian-Affine).
Grey-level blobs, grey-level blob trees and scale-space blobs
A natural approach to detect blobs is to associate a bright (dark) blob with each local maximum (minimum) in the intensity landscape. A main problem with such an approach, however, is that local extrema are very sensitive to noise. To address this problem, Lindeberg (1993, 1994) studied the problem of detecting local maxima with extent at multiple scales in scale-space. A region with spatial extent defined from a watershed analogy was associated with each local maximum, as well a local contrast defined from a so-called delimiting saddle point. A local extremum with extent defined in this way was referred to as a grey-level blob. Moreover, by proceeding with the watershed analogy beyond the delimiting saddle point, a grey-level blob tree was defined to capture the nested topological structure of level sets in the intensity landscape, in a way that is invariant to affine deformations in the image domain and monotone intensity transformations. By studying how these structures evolve with increasing scales, the notion of scale-space blobs was introduced. Beyond local contrast and extent, these scale-space blobs also measured how stable image structures are in scale-space, by measuring their scale-space lifetime.It was proposed that regions of interest and scale descriptors obtained in this way, with associated scale levels defined from the scales at which normalized measures of blob strength assumed their maxima over scales could be used for guiding other early visual processing. An early prototype of simplified vision systems was developed where such regions of interest and scale descriptors were used for directing the focus-of-attention of an active vision system. While the specific technique that was used in these prototypes can be substantially improved with the current knowledge in computer vision, the overall general approach is still valid, for example in the way that that local extrema over scales of the scale-normalized Laplacian operator are nowadays used for providing scale information to other visual processes.
Lindeberg's watershed-based grey-level blob detection algorithm
For the purpose of detecting grey-level blobs (local extrema with extent) from a watershed analogy,Lindeberg developed an algorithm based on pre-sorting the pixels,
alternatively connected regions having the same intensity, in
decreasing order of the intensity values.
Then, comparisons were made between nearest neighbours of either pixels or connected regions.
For simplicity, let us consider the case of detecting bright grey-level blobs and
let the notation "higher neighbour" stand for "neighbour pixel having a higher grey-level value".
Then, at any stage in the algorithm (carried out in decreasing order of intensity values)
is based on the following classification rules:
- If a region has no higher neighbour, then it is a local maximum and will be the seed of a blob.
- Else, if it has at least one higher neighbour, which is background, then it cannot be part of any blob and must be background.
- Else, if it has more than one higher neighbour and if those higher neighbours are parts of different blobs, then it cannot be a part of any blob, and must be background.
- Else, it has one or more higher neighbours, which are all parts of the same blob. Then, it must also be a part of that blob.
Compared to other watershed methods, the flooding in this algorithm stops once the intensity level falls below the intensity value of the so-called delimiting saddle point associated with the local maximum. However, it is rather straightforward to extend this approach the other types of watershed constructions. For example, by proceeding beyond the first delimiting saddle point a "grey-level blob tree" can be constructed. Moreover, the grey-level blob detection method was embedded in a scale-space representation and performed at all levels of scale, resulting in a representation called the scale-space primal sketch.
This algorithm with its applications in computer vision is described in more detail in Lindeberg's thesis as well as the monograph on scale-space theory partially based
on that work. Earlier presentations of this algorithm can also be found in. More detailed treatments of applications of grey-level blob detection and the scale-space primal sketch to computer vision and medical image analysis are given in.
Maximally stable extremum regions (MSER)
Matas et al. (2002) were interested in defining image descriptors that are robust under perspective transformations. They studied level sets in the intensity landscape and measured how stable these were along the intensity dimension. Based on this idea, they defined a notion of maximally stable extremum regions and showed how these image descriptors can be used as image features for stereo matching.There are close relations between this notion and the above mentioned notion of grey-level blob tree. The maximally stable extremum regions can be seen as making a specific subset of the grey-level blob tree explicit for further processing.
See also
- Blob extraction
- Corner detectionCorner detectionCorner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosaicing, panorama stitching, 3D modelling and object...
- Affine shape adaptationAffine shape adaptationAffine shape adaptation is a methodology for iteratively adapting the shape of the smoothing kernels in an affine group of smoothing kernels to the local image structure in neighbourhood region of a specific image point...
- Scale-space
- Ridge detectionRidge detectionThe ridges of a smooth function of two variables is a set of curves whose points are, in one or more ways to be made precise below, local maxima of the function in at least one dimension. For a function of N variables, its ridges are a set of curves whose points are local maxima in N-1 dimensions...
- Interest point detectionInterest point detectionInterest point detection is a recent terminology in computer vision that refers to the detection of interest points for subsequent processing...
- Feature detection (computer vision)
- Harris-Affine
- Hessian-Affine
- PCBRPrincipal Curvature-Based Region DetectorThe principal curvature-based region detector, also called PCBR is a feature detector used in the fields of computer vision and image analysis...