Color histogram
Encyclopedia
In image processing
Image processing
In electrical engineering and computer science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or, a set of characteristics or parameters related to the image...

 and photography
Photography
Photography is the art, science and practice of creating durable images by recording light or other electromagnetic radiation, either electronically by means of an image sensor or chemically by means of a light-sensitive material such as photographic film...

, a color histogram is a representation of the distribution of colors in an image
Image
An image is an artifact, for example a two-dimensional picture, that has a similar appearance to some subject—usually a physical object or a person.-Characteristics:...

. For digital images, a color histogram represents the number of pixel
Pixel
In digital imaging, a pixel, or pel, is a single point in a raster image, or the smallest addressable screen element in a display device; it is the smallest unit of picture that can be represented or controlled....

s that have colors in each of a fixed list of color ranges, that span the image's color space
Color space
A color model is an abstract mathematical model describing the way colors can be represented as tuples of numbers, typically as three or four values or color components...

, the set of all possible colors.

The color histogram can be built for any kind of color space, although the term is more often used for three-dimensional spaces like RGB
RGB color space
An RGB color space is any additive color space based on the RGB color model. A particular RGB color space is defined by the three chromaticities of the red, green, and blue additive primaries, and can produce any chromaticity that is the triangle defined by those primary colors...

 or HSV. For monochromatic images, the term intensity histogram may be used instead. For multi-spectral images, where each pixel is represented by an arbitrary number of measurements (for example, beyond the three measurements in RGB), the color histogram is N-dimensional, with N being the number of measurements taken. Each measurement has its own wavelength range of the light spectrum, some of which may be outside the visible spectrum.

If the set of possible color values is sufficiently small, each of those colors may be placed on a range by itself; then the histogram is merely the count of pixels that have each possible color. Most often, the space is divided into an appropriate number of ranges, often arranged as a regular grid, each containing many similar color values. The color histogram may also be represented and displayed as a smooth function
Function (mathematics)
In mathematics, a function associates one quantity, the argument of the function, also known as the input, with another quantity, the value of the function, also known as the output. A function assigns exactly one output to each input. The argument and the value may be real numbers, but they can...

 defined over the color space that approximates the pixel counts.

Like other kinds of histogram
Histogram
In statistics, a histogram is a graphical representation showing a visual impression of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson...

s, the color histogram is a statistic
Statistic
A statistic is a single measure of some attribute of a sample . It is calculated by applying a function to the values of the items comprising the sample which are known together as a set of data.More formally, statistical theory defines a statistic as a function of a sample where the function...

 that can be viewed as an approximation of an underlying continuous distribution
Frequency distribution
In statistics, a frequency distribution is an arrangement of the values that one or more variables take in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way, the table summarizes the distribution of...

 of colors values.

Overview

Color histograms are flexible constructs that can be built from images in various color space
Color space
A color model is an abstract mathematical model describing the way colors can be represented as tuples of numbers, typically as three or four values or color components...

s, whether RGB
RGB color space
An RGB color space is any additive color space based on the RGB color model. A particular RGB color space is defined by the three chromaticities of the red, green, and blue additive primaries, and can produce any chromaticity that is the triangle defined by those primary colors...

, rg chromaticity or any other color space of any dimension. A histogram of an image is produced first by discretization of the colors in the image into a number of bins, and counting the number of image pixels in each bin. For example, a Red–Blue chromaticity histogram can be formed by first normalizing color pixel values by dividing RGB values by R+G+B, then quantizing the normalized R and B coordinates into N bins each. A two-dimensional histogram of Red-Blue chromaticity divide in to four bins (N=4) might yield a histogram that looks like this table:
  red
0-63 64-127 128-191 192-255
blue 0-63 43 78 18 0
64-127 45 67 33 2
128-191 127 58 25 8
192-255 140 47 47 13


A histogram can be N-dimensional. Although harder to display, a three-dimensional color histogram for the above example could be thought of as four separate Red-Blue histograms, where each of the four histograms contains the Red-Blue values for a bin of green (0-63, 64-127, 128-191, and 192-255).

The histogram provides a compact summarization of the distribution of data in an image. The color histogram of an image is relatively invariant with translation and rotation about the viewing axis, and varies only slowly with the angle of view. By comparing histograms signatures of two images and matching the color content of one image with the other, the color histogram is particularly well suited for the problem of recognizing an object of unknown position and rotation within a scene. Importantly, translation of an RGB image into the illumination invariant rg-chromaticity space allows the histogram to operate well in varying light levels.

Example Histogram Data

Given the following image of a cat (an original version and a version that has been reduced to 256 colors for easy histogram purposes), the following data represents a color histogram in the RGB color space, using four bins. Bin 0 corresponds to intensities 0-63, bin 1 is 64-127, bin 2 is 128-191, and bin 3 is 192-255.
Red Green Blue Pixel Count
0 0 0 7414
0 0 1 230
0 0 2 0
0 0 3 0
0 1 0 8
0 1 1 372
0 1 2 88
0 1 3 0
0 2 0 0
0 2 1 0
0 2 2 10
0 2 3 1
0 3 0 0
0 3 1 0
0 3 2 0
0 3 3 0
1 0 0 891
1 0 1 13
1 0 2 0
1 0 3 0
1 1 0 592
1 1 1 3462
1 1 2 355
1 1 3 0
1 2 0 0
1 2 1 101
1 2 2 882
1 2 3 16
1 3 0 0
1 3 1 0
1 3 2 0
1 3 3 0
2 0 0 1146
2 0 1 0
2 0 2 0
2 0 3 0
2 1 0 2552
2 1 1 9040
2 1 2 47
2 1 3 0
2 2 0 0
2 2 1 8808
2 2 2 53110
2 2 3 11053
2 3 0 0
2 3 1 0
2 3 2 170
2 3 3 17533
3 0 0 11
3 0 1 0
3 0 2 0
3 0 3 0
3 1 0 856
3 1 1 1376
3 1 2 0
3 1 3 0
3 2 0 0
3 2 1 3650
3 2 2 6260
3 2 3 109
3 3 0 0
3 3 1 0
3 3 2 3415
3 3 3 53929

Drawbacks and other approaches

The main drawback of histograms for classification is that the representation is dependent of the color of the object being studied, ignoring its shape and texture. Color histograms can potentially be identical for two images with different object content which happens to share color information. Conversely, without spatial or shape information, similar objects of different color may be indistinguishable based solely on color histogram comparisons. There is no way to distinguish a red and white cup from a red and white plate. Put another way, histogram-based algorithms have no concept of a generic 'cup', and a model of a red and white cup is no use when given an otherwise identical blue and white cup. Another problem is that color histograms have high sensitivity to noisy interference such as lighting intensity changes and quantization errors. High dimensionality (bins) color histograms are also another issue. Some color histogram feature spaces often occupy more than one hundred dimensions.

Some of the proposed solutions have been color histogram intersection, color constant indexing, cumulative color histogram, quadratic distance, and color correlogram
Correlogram
In the analysis of data, a correlogram is an image of correlation statistics. For example, in time series analysis, a correlogram, also known as an autocorrelation plot, is a plot of the sample autocorrelations r_h\, versus h\, ....

s. Although there are drawbacks of using histograms for indexing and classification, using color in a real-time system has several advantages. One is that color information is faster to compute compared to other invariants. It has been shown in some cases that color can be an efficient method for identifying objects of known location and appearance.

Further research into the relationship between color histogram data to the physical properties of the objects in an image has shown they can represent not only object color and illumination but relate to surface roughness and image geometry and provide an improved estimate of illumination and object color.

Usually, Euclidean distance, histogram intersection, or cosine or quadratic distances are used for the calculation of image similarity ratings. Any of these values do not reflect the similarity rate of two images in itself; it is useful only when used in comparison to other similar values. This is the reason that all the practical implementations of content-based image retrieval must complete computation of all images from the database, and is the main disadvantage of these implementations.

Another approach to representative color image content is two-dimensional color histogram. A two-dimensional color histogram considers the relation between the pixel pair colors (not only the lighting component). A two-dimensional color histogram is a two-dimensional array. The size of each dimension is the number of colors that were used in the phase of color quantization. These arrays are treated as matrices, each element of which stores a normalized count of pixel pairs, with each color corresponding to the index of an element in each pixel neighborhood. For comparison of two-dimensional color histograms it is suggested calculating their correlation, because constructed as described above, is a random vector (in other words, a multi-dimensional random value). While creating a set of final images, the images should be arranged in decreasing order of the correlation coefficient.

The correlation coefficient may also be used for color histogram comparison. Retrieval results with correlation coefficient are better than with other metrics.

Intensity histogram of continuous data

The idea of an intensity histogram can be generalized to continuous data,
say audio signals represented by real functions or images represented by functions with two-dimensional domain.

Let (see Lebesgue space), then the cumulative histogram operator can be defined by:.
is the Lebesgue measure
Lebesgue measure
In measure theory, the Lebesgue measure, named after French mathematician Henri Lebesgue, is the standard way of assigning a measure to subsets of n-dimensional Euclidean space. For n = 1, 2, or 3, it coincides with the standard measure of length, area, or volume. In general, it is also called...

 of sets.
in turn is a real function.
The (non-cumulative) histogram is defined as its derivative
Derivative
In calculus, a branch of mathematics, the derivative is a measure of how a function changes as its input changes. Loosely speaking, a derivative can be thought of as how much one quantity is changing in response to changes in some other quantity; for example, the derivative of the position of a...

..

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK