Pose (computer vision)
Encyclopedia
In computer vision
and in robotics
, a typical task is to identify specific objects in an image and to determine each object's position and orientation relative to some coordinate system. This information can then be used, for example, to allow a robot to manipulate an object or to avoid moving into the object. The combination of position and orientation is referred to as the pose of an object, even though this concept is sometimes used only to describe the orientation. Exterior orientation and Translation are also used as synonyms to pose.
The image data from which the pose of an object is determined can be either a single image, a stereo image pair, or an image sequence where, typically, the camera is moving with a known speed. The objects which are considered can be rather general, including a living being or body parts, e.g., a head or hands. The methods which are used for determining the pose of an object, however, are usually specific for a class of objects and cannot generally be expected to work well for other types of objects.
The pose can be described by means of a rotation and translation transformation which brings the object from a reference pose to the observed pose. This rotation transformation can be represented in different ways, e.g., as a rotation matrix or a quaternion
.
Computer vision
Computer vision is a field that includes methods for acquiring, processing, analysing, and understanding images and, in general, high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the forms of decisions...
and in robotics
Robotics
Robotics is the branch of technology that deals with the design, construction, operation, structural disposition, manufacture and application of robots...
, a typical task is to identify specific objects in an image and to determine each object's position and orientation relative to some coordinate system. This information can then be used, for example, to allow a robot to manipulate an object or to avoid moving into the object. The combination of position and orientation is referred to as the pose of an object, even though this concept is sometimes used only to describe the orientation. Exterior orientation and Translation are also used as synonyms to pose.
The image data from which the pose of an object is determined can be either a single image, a stereo image pair, or an image sequence where, typically, the camera is moving with a known speed. The objects which are considered can be rather general, including a living being or body parts, e.g., a head or hands. The methods which are used for determining the pose of an object, however, are usually specific for a class of objects and cannot generally be expected to work well for other types of objects.
The pose can be described by means of a rotation and translation transformation which brings the object from a reference pose to the observed pose. This rotation transformation can be represented in different ways, e.g., as a rotation matrix or a quaternion
Quaternion
In mathematics, the quaternions are a number system that extends the complex numbers. They were first described by Irish mathematician Sir William Rowan Hamilton in 1843 and applied to mechanics in three-dimensional space...
.
Pose Estimation
The specific task of determining the pose of an object in an image (or stereo images, image sequence) is referred to as pose estimation. The pose estimation problem can be solved in different ways depending on the image sensor configuration, and choice of methodology. Three classes of methodologies can be distinguished:- Analytic or geometric methods: Given that the image sensor (camera) is calibrated the mapping from 3D points in the scene and 2D points in the image is known. If also the geometry of the object is known, it means that the projected image of the object on the camera image is a well-known function of the object's pose. Once a set of control points on the object, typically corners or other feature points, has been identified it is then possible to solve the pose transformation from a set of equations which relate the 3D coordinates of the points with their 2D image coordinates.
- Genetic algorithmGenetic algorithmA genetic algorithm is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems...
methods: If the pose of an object does not have to be computed in real-time a genetic algorithmGenetic algorithmA genetic algorithm is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems...
may be used. This approach is robust especially when the images are not perfectly calibrated. In this particular case, the pose represent the genetic representationGenetic representationGenetic representation is a way of representing solutions/individuals in evolutionary computation methods. Genetic representation can encode appearance, behavior, physical qualities of individuals. Designing a good genetic representation that is expressive and evolvable is a hard problem in...
and the error between the projection of the object control points with the image is the fitness functionFitness functionA fitness function is a particular type of objective function that is used to summarise, as a single figure of merit, how close a given design solution is to achieving the set aims....
.
- Learning based methods: These methods use artificial learning-based system which learn the mapping from 2D image features to pose transformation. In short, this means that a sufficiently large set of images of the object, in different poses, must be presented to the system during a learning phase. Once the learning phase is completed, the system should be able to present an estimate of the object's pose given an image of the object.
See also
- homographyHomographyHomography is a concept in the mathematical science of geometry.A homography is an invertible transformation from a projective space to itself that maps straight lines to straight lines...
- camera calibrationCamera resectioningCamera resectioning is the process of finding the true parameters of the camera that produced a given photograph or video. Usually, the camera parameters are represented in a 3 × 4 matrix called the camera matrix....
- structure from motionStructure from motionIn computer vision structure from motion refers to the process of finding the three-dimensional structure of an object by analyzing local motion signals over time....
- 3D Pose Estimation3D Pose Estimation3D pose estimation is the problem of determining the transformation of an object in a 2D image which gives the 3D object. The need for 3D pose estimation arises from the limitations of feature based pose estimation. There exist environments where it is difficult to extract corners or edges from...