Video tracking
Encyclopedia
Video tracking is the process of locating a moving
object (or multiple objects) over time using a camera. It has a variety of uses, some of which are: human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, medical imaging and video editing. Video tracking can be a time consuming process due to the amount of data that is contained in video. Adding further to the complexity is the possible need to use object recognition
techniques for tracking.
. Another situation that increases the complexity of the problem is when the tracked object changes orientation over time. For these situations video tracking systems usually employ a motion model which describes how the image of the target might change for different possible motions of the object.
Examples of simple motion models are:
Target representation and localization is mostly a bottom-up process. These methods give a variety of tools for identifying the moving object. Locating and tracking the target object successfully is dependent on the algorithm. For example, using blob tracking is useful for identifying human movement because a person's profile changes dynamically. Typically the computational complexity for these algorithms is low. The following are some common target representation and localization algorithms:
Filtering and data association is mostly a top-down process, which involves incorporating prior information about the scene or object, dealing with object dynamics, and evaluation of different hypotheses. These methods allow the tracking of complex objects along with more complex object interaction like tracking objects moving behind obstructions. Additionally the complexity is increased if the video tracker (also named TV tracker or target tracker) is not mounted on rigid foundation (on-shore) but on a moving ship (off-shore), where typically an inertial measurement system is used to pre-stabilize the video tracker to reduce the required dynamics and bandwidth of the camera system..
The computational complexity for these algorithms is usually much higher. The following are some common filtering algorithms:
Motion (physics)
In physics, motion is a change in position of an object with respect to time. Change in action is the result of an unbalanced force. Motion is typically described in terms of velocity, acceleration, displacement and time . An object's velocity cannot change unless it is acted upon by a force, as...
object (or multiple objects) over time using a camera. It has a variety of uses, some of which are: human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, medical imaging and video editing. Video tracking can be a time consuming process due to the amount of data that is contained in video. Adding further to the complexity is the possible need to use object recognition
Object recognition
Object recognition in computer vision is the task of finding a given object in an image or video sequence. Humans recognize a multitude of objects in images with little effort, despite the fact that the image of the objects may vary somewhat in different view points, in many different sizes / scale...
techniques for tracking.
Objective
The objective of video tracking is to associate target objects in consecutive video frames. The association can be especially difficult when the objects are moving fast relative to the frame rateFrame rate
Frame rate is the frequency at which an imaging device produces unique consecutive images called frames. The term applies equally well to computer graphics, video cameras, film cameras, and motion capture systems...
. Another situation that increases the complexity of the problem is when the tracked object changes orientation over time. For these situations video tracking systems usually employ a motion model which describes how the image of the target might change for different possible motions of the object.
Examples of simple motion models are:
- When tracking planar objects, the motion model is a 2D transformation (affine transformationAffine transformationIn geometry, an affine transformation or affine map or an affinity is a transformation which preserves straight lines. It is the most general class of transformations with this property...
or homography) of an image of the object (e.g. the initial frame). - When the target is a rigid 3D object, the motion model defines its aspect depending on its 3D position and orientation.
- For video compression, key frameKey frameA key frame in animation and filmmaking is a drawing that defines the starting and ending points of any smooth transition. They are called "frames" because their position in time is measured in frames on a strip of film...
s are divided into macroblockMacroblockMacroblock is an image compression component and technique based on discrete cosine transform used on still images and video frames. Macroblocks are usually composed of two or more blocks of pixels. In the JPEG standard macroblocks are called MCU blocks....
s. The motion model is a disruption of a key frame, where each macroblock is translated by a motion vector given by the motion parameters. - The image of deformable objects can be covered with a mesh, the motion of the object is defined by the position of the nodes of the mesh.
Algorithms
To perform video tracking an algorithm analyzes sequential video frames and outputs the movement of targets between the frames. There are a variety of algorithms, each having strengths and weaknesses. Considering the intended use is important when choosing which algorithm to use. There are two major components of a visual tracking system: target representation and localization and filtering and data association.Target representation and localization is mostly a bottom-up process. These methods give a variety of tools for identifying the moving object. Locating and tracking the target object successfully is dependent on the algorithm. For example, using blob tracking is useful for identifying human movement because a person's profile changes dynamically. Typically the computational complexity for these algorithms is low. The following are some common target representation and localization algorithms:
- Blob tracking: segmentation of object interior (for example blob detectionBlob detectionIn the area of computer vision, blob detection refers to visual modules that are aimed at detecting points and/or regions in the image that differ in properties like brightness or color compared to the surrounding...
, block-based correlation or optical flowOptical flowOptical flow or optic flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and the scene. The concept of optical flow was first studied in the 1940s and ultimately published by American psychologist James J....
) - Kernel-based tracking (mean-shift tracking): an iterative localization procedure based on the maximization of a similarity measure (Bhattacharyya coefficient).
- Contour tracking: detection of object boundary (e.g. active contours or Condensation algorithmCondensation algorithmThe condensation algorithm is a computer vision algorithm. The principal application is to detect and track the contour of objects moving in a cluttered environment. Object tracking is one of the more basic and difficult aspects of computer vision and is generally a prerequisite to object...
) - Visual feature matching: registrationImage registrationImage registration is the process of transforming different sets of data into one coordinate system. Data may be multiple photographs, data from different sensors, from different times, or from different viewpoints. It is used in computer vision, medical imaging, military automatic target...
Filtering and data association is mostly a top-down process, which involves incorporating prior information about the scene or object, dealing with object dynamics, and evaluation of different hypotheses. These methods allow the tracking of complex objects along with more complex object interaction like tracking objects moving behind obstructions. Additionally the complexity is increased if the video tracker (also named TV tracker or target tracker) is not mounted on rigid foundation (on-shore) but on a moving ship (off-shore), where typically an inertial measurement system is used to pre-stabilize the video tracker to reduce the required dynamics and bandwidth of the camera system..
The computational complexity for these algorithms is usually much higher. The following are some common filtering algorithms:
- Kalman filterKalman filterIn statistics, the Kalman filter is a mathematical method named after Rudolf E. Kálmán. Its purpose is to use measurements observed over time, containing noise and other inaccuracies, and produce values that tend to be closer to the true values of the measurements and their associated calculated...
: an optimal recursive Bayesian filter for linear functions subjected to Gaussian noise. - Particle filterParticle filterIn statistics, particle filters, also known as Sequential Monte Carlo methods , are sophisticated model estimation techniques based on simulation...
: useful for sampling the underlying state-space distribution of nonlinear and non-Gaussian processes.
See also
- Match movingMatch movingIn cinematography, match moving is a visual-effects, cinematic techniques that allows the insertion of computer graphics into live-action footage with correct position, scale, orientation, and motion relative to the photographed objects in the shot...
- Motion captureMotion captureMotion capture, motion tracking, or mocap are terms used to describe the process of recording movement and translating that movement on to a digital model. It is used in military, entertainment, sports, and medical applications, and for validation of computer vision and robotics...
- Motion estimationMotion estimationMotion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another; usually from adjacent frames in a video sequence. It is an ill-posed problem as the motion is in three dimensions but the images are a projection of the 3D scene onto a 2D...
- SwistrackSwistrackSwisTrack is a tool for tracking robots, humans, animals and objects using a camera or a recorded video as input source. It uses Intel's OpenCV library for fast image processing and contains interfaces for USB, FireWire and GigE cameras, as well as AVI files....
- Single particle trackingSingle particle trackingSingle particle tracking is the observation of the motion of individual particles within a medium. The coordinates over a series of time steps is referred to as a trajectory. The trajectory can be analyzed to identify modes of motion or heterogeneities in the motion such as obstacles or regions...