Visual descriptors
In computer vision, visual descriptors or image descriptors are descriptions of the visual features of the contents of images or videos, or the algorithms or applications that produce such descriptions. They describe elementary characteristics such as shape, color, texture, or motion, among others.
Introduction
As a result of new communication technologies and the massive use of the Internet in our society, the amount of audio-visual information available in digital format is increasing considerably. It has therefore become necessary to design systems that describe the content of several types of multimedia information so that it can be searched and classified.
Audio-visual descriptors are in charge of describing the content. They capture the objects and events found in a video, image, or audio recording, and they allow quick and efficient searches of audio-visual content.
This system can be compared to search engines for textual content. Although it is relatively easy to find text with a computer, it is much more difficult to find specific audio and video segments. For instance, imagine somebody searching for a scene of a happy person. Happiness is a feeling, and it is not evident how to describe it in terms of shape, color, and texture in images.
Describing audio-visual content is not a trivial task, and it is essential for the effective use of this type of archive. The standard that deals with audio-visual descriptors is MPEG-7, a multimedia content description standard from the Moving Picture Experts Group (ISO/IEC 15938).
Types of visual descriptors
Descriptors are the first step in finding the connection between the pixels contained in a digital image and what humans recall some minutes after having observed an image or a group of images.
Visual descriptors are divided into two main groups:
- General information descriptors: low-level descriptors that describe color, shape, regions, textures, and motion.
- Specific domain information descriptors: descriptors that give information about objects and events in the scene. A concrete example would be face recognition.
General information descriptors
General information descriptors consist of a set of descriptors that cover basic and elementary features such as color, texture, shape, motion, and location, among others. This description is generated automatically by means of signal processing.
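As a rough illustration of how a low-level description can be generated automatically from pixel data, the following Python sketch computes a coarse color histogram and picks the most populated bin as a dominant color. It is only a simplified stand-in and not the normative MPEG-7 extraction; the function names `color_histogram` and `dominant_color`, the `levels` parameter, and the toy pixel list are assumptions made for this example.

```python
from collections import Counter


def color_histogram(pixels, levels=4):
    """Normalized histogram over levels**3 coarse RGB bins.

    pixels: non-empty iterable of (r, g, b) tuples with 8-bit channels (0..255).
    levels: number of quantization steps per channel (illustrative choice).
    """
    counts = Counter(
        (r * levels // 256, g * levels // 256, b * levels // 256)
        for r, g, b in pixels
    )
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}


def dominant_color(pixels, levels=4):
    """Center of the most populated bin, returned as an (r, g, b) tuple."""
    hist = color_histogram(pixels, levels)
    best = max(hist, key=hist.get)
    # Map the bin index back to the middle of its value range.
    return tuple((2 * b + 1) * 256 // (2 * levels) for b in best)


# Toy "image": four reddish pixels and two bluish ones.
image = [(250, 10, 10), (240, 20, 5), (255, 0, 0), (230, 30, 20),
         (10, 10, 250), (0, 20, 240)]
hist = color_histogram(image)
print(dominant_color(image))  # → (224, 32, 32)
```

The histogram plays the role of a color-distribution description, while the single dominant value is a much more compact summary of the same data.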
- COLOR: the most basic quality of visual content. Five tools are defined to describe color. The first three represent the color distribution, and the last two describe the spatial layout of color and the color relation across sequences or groups of images:
  - Dominant Color Descriptor (DCD)
  - Scalable Color Descriptor (SCD)
  - Color Structure Descriptor (CSD)
  - Color Layout Descriptor (CLD)
  - Group of Frames (GoF) or Group of Pictures (GoP) color descriptor
- TEXTURE: another important quality for describing an image. The texture descriptors characterize image textures or regions. They consider the homogeneity of regions and the histograms of region edges. The set of descriptors consists of:
  - Homogeneous Texture Descriptor (HTD)
  - Texture Browsing Descriptor (TBD)
  - Edge Histogram Descriptor (EHD)
- SHAPE: carries important semantic information because humans are able to recognize objects by their shape. However, this information can only be extracted by means of a segmentation similar to the one that the human visual system performs. Such a segmentation system is not yet available, but there is a series of algorithms that are considered a good approximation. These descriptors describe regions, contours, and shapes for 2D images and 3D volumes. The shape descriptors are the following:
  - Region-based Shape Descriptor (RSD)
  - Contour-based Shape Descriptor (CSD)
  - 3-D Shape Descriptor (3-D SD)
- MOTION: defined by four descriptors that describe motion in a video sequence. Motion relates both to the movement of objects in the sequence and to the movement of the camera. Camera motion information is provided by the capture device, whereas the rest is obtained by means of image processing. The descriptor set is the following:
  - Motion Activity Descriptor (MAD)
  - Camera Motion Descriptor (CMD)
  - Motion Trajectory Descriptor (MTD)
  - Warping and Parametric Motion Descriptor (WMD and PMD)
- LOCATION: the location of elements in the image is used to describe them in the spatial domain. Elements can also be located in the temporal domain:
  - Region Locator Descriptor (RLD)
  - Spatio-Temporal Locator Descriptor (STLD)
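The edge-based texture description mentioned in the list above can be sketched as follows. This simplified Python example classifies non-overlapping 2x2 luminance blocks into five edge types, in the spirit of an edge histogram, and counts them over the whole image. The filter coefficients, the `threshold` value, and the use of a single global histogram are assumptions for illustration; the procedure defined in the standard differs in details such as how the image is partitioned.

```python
import math

SQRT2 = math.sqrt(2)

# 2x2 filters applied to (top-left, top-right, bottom-left, bottom-right).
FILTERS = {
    "vertical":        (1, -1, 1, -1),
    "horizontal":      (1, 1, -1, -1),
    "diagonal_45":     (SQRT2, 0, 0, -SQRT2),
    "diagonal_135":    (0, SQRT2, -SQRT2, 0),
    "non_directional": (2, -2, -2, 2),
}


def edge_histogram(gray, threshold=20.0):
    """Count the dominant edge type of every non-overlapping 2x2 block.

    gray: list of equal-length rows of luminance values.
    Blocks whose strongest filter response is below `threshold`
    are treated as edge-free and are not counted.
    """
    counts = {name: 0 for name in FILTERS}
    for y in range(0, len(gray) - 1, 2):
        for x in range(0, len(gray[0]) - 1, 2):
            block = (gray[y][x], gray[y][x + 1],
                     gray[y + 1][x], gray[y + 1][x + 1])
            responses = {
                name: abs(sum(c * v for c, v in zip(coeffs, block)))
                for name, coeffs in FILTERS.items()
            }
            best = max(responses, key=responses.get)
            if responses[best] >= threshold:
                counts[best] += 1
    return counts


# Toy 4x4 image: a dark first column next to a bright area (a vertical edge).
image = [[0, 255, 255, 255] for _ in range(4)]
hist = edge_histogram(image)
print(hist["vertical"])  # → 2
```

Only the two blocks that straddle the dark-to-bright transition are counted; the uniform blocks produce no filter response and are skipped.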
Specific domain information descriptors
These descriptors, which give information about objects and events in the scene, are not easy to extract, especially when the extraction is to be done automatically. Nevertheless, they can be produced manually.
As mentioned before, face recognition is a concrete example of an application that tries to obtain this information automatically.
Descriptor applications
Among all applications, the most important ones are:
- Search engines and classifiers for multimedia documents.
- Digital libraries: visual descriptors allow a very detailed and specific search of any video or image by means of different search parameters, for instance, searching for films in which a known actor appears or for videos containing Mount Everest.
- Personalized electronic news services.
- Automatic connection to a TV channel broadcasting a soccer match, for example, whenever a player approaches the goal area.
- Control and filtering of specific audio-visual content, such as violent or pornographic material, as well as authorization for some multimedia content.
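A search over a collection of described items, as in the first two applications above, can be pictured as ranking stored items by the distance between their descriptors and the descriptor of a query. The sketch below is a minimal illustration, assuming each item is already represented by a short normalized histogram; the file names, the vectors, and the choice of L1 distance are hypothetical.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_similarity(query, database):
    """Return item names sorted from most to least similar to the query."""
    return sorted(database, key=lambda name: l1_distance(query, database[name]))


# Hypothetical three-bin color descriptors for three stored images.
database = {
    "sunset.jpg": [0.7, 0.2, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "ocean.jpg":  [0.1, 0.2, 0.7],
}
query = [0.6, 0.3, 0.1]
ranking = rank_by_similarity(query, database)
print(ranking)  # → ['sunset.jpg', 'forest.jpg', 'ocean.jpg']
```

A linear scan like this is enough for a toy collection; a real search engine would typically add an index so that not every stored descriptor has to be compared with the query.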
External links
- Multimedia Content Analysis Using both Audio and Video Clues: http://vision.poly.edu:8080/~jhuang/Publication/Content_Analysis_Wang2000SP.pdf
- Relating Visual and Semantic Image Descriptors: http://www.acemedia.org/aceMedia/files/document/wp7/2004/ewimt04-dcuThom.pdf
- Fusing MPEG-7 visual descriptors for image classification: http://www.acemedia.org/aceMedia/files/document/wp7/2005/icann05-iti.pdf
- MPEG-7 Quick Reference: http://gondolin.rutgers.edu/MIC/text/how/mpeg7ref.pdf