Object recognition

Object recognition in computer vision is the task of finding a given object in an image or video sequence. Humans recognize a multitude of objects in images with little effort, even though the image of an object may vary in viewpoint, size, and scale, or when the object is translated or rotated. Objects can even be recognized when they are partially obstructed from view. This task is still a challenge for computer vision systems in general.

Approaches based on CAD-like object models

Edge detection, primal sketch, Marr, Mohan and Nevatia, Lowe, Faugeras

Recognition by parts

Binford (generalized cylinders), Biederman (geons), Dickinson, Forsyth and Ponce

Appearance-based methods

- Use example images (called templates or exemplars) of the objects to perform recognition

- Objects look different under varying conditions:
  • Changes in lighting or color
  • Changes in viewing direction
  • Changes in size / shape


- A single exemplar is unlikely to succeed reliably; however, it is also impossible to represent all appearances of an object

1. Edge matching
• Uses edge detection techniques, such as Canny edge detection, to find edges.
• Changes in lighting and color usually don’t have much effect on image edges
• Strategy:
  1. Detect edges in template and image
  2. Compare edge images to find the template
  3. Must consider range of possible template positions

• Measurements:
  • Good – count the number of overlapping edges. Not robust to changes in shape

  • Better – count the number of template edge pixels within some distance of an edge in the search image

  • Best – determine probability distribution of distance to nearest edge in search image (if template at correct position). Estimate likelihood of each template position generating image
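
A minimal chamfer-style sketch of the "better" measure above, in Python with OpenCV and NumPy (the file names and Canny thresholds are illustrative assumptions, not part of any particular system):

```python
import cv2
import numpy as np

# Hypothetical input files; Canny thresholds are illustrative.
image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
img_edges = cv2.Canny(image, 50, 150)
tpl_edges = cv2.Canny(template, 50, 150)

# Distance from every pixel to the nearest edge pixel in the search image.
dist = cv2.distanceTransform(cv2.bitwise_not(img_edges), cv2.DIST_L2, 3)

ys, xs = np.nonzero(tpl_edges)          # template edge pixel coordinates
h, w = tpl_edges.shape

def score(y0, x0):
    """Mean distance from the template's edge pixels (placed at y0, x0)
    to the nearest edge in the search image; lower is better."""
    return dist[ys + y0, xs + x0].mean()

# Consider the full range of possible template positions, keep the best.
best = min(((score(y, x), (y, x))
            for y in range(image.shape[0] - h + 1)
            for x in range(image.shape[1] - w + 1)),
           key=lambda t: t[0])
print("best score %.3f at %s" % best)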


2. Divide-and-Conquer search
• Strategy:
  • Consider all positions as a set (a cell in the space of positions)
  • Determine lower bound on score at best position in cell
  • If bound is too large, prune cell
  • If bound is not too large, divide cell into subcells and try each subcell recursively
  • Process stops when cell is “small enough”

• Unlike multi-resolution search, this technique is guaranteed to find all matches that meet the criterion (assuming that the lower bound is accurate)

• Finding the Bound:
  • To find the lower bound on the best score, look at score for the template position represented by the center of the cell
  • Subtract maximum change from the “center” position for any other position in cell (occurs at cell corners)

• Complexities arise from determining bounds on distance
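
A minimal branch-and-bound sketch over translations, reusing the chamfer-style score(y, x) from the previous example. The pruning step relies on that score changing by at most one unit of distance per pixel of translation, which is why subtracting the distance to the farthest cell corner gives a valid lower bound:

```python
import math

def branch_and_bound(score, cell, threshold):
    """Yield every position in cell = (y0, y1, x0, x1) (half-open ranges)
    whose score is below threshold."""
    y0, y1, x0, x1 = cell
    cy, cx = (y0 + y1) // 2, (x0 + x1) // 2
    # The maximum change from the centre position occurs at a cell corner.
    radius = math.hypot(max(cy - y0, y1 - 1 - cy), max(cx - x0, x1 - 1 - cx))
    # Lower bound on the best score anywhere in the cell; prune if too large.
    if score(cy, cx) - radius >= threshold:
        return
    if y1 - y0 == 1 and x1 - x0 == 1:
        yield (y0, x0)                    # cell is "small enough"
        return
    # Otherwise divide the cell into subcells and recurse on each.
    my, mx = (y0 + y1 + 1) // 2, (x0 + x1 + 1) // 2
    for sy in ((y0, my), (my, y1)):
        for sx in ((x0, mx), (mx, x1)):
            if sy[1] > sy[0] and sx[1] > sx[0]:
                yield from branch_and_bound(score, (sy[0], sy[1], sx[0], sx[1]), threshold)

# e.g. matches = list(branch_and_bound(score, (0, 200, 0, 300), 2.0))
```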


3. Greyscale matching
• Edges are (mostly) robust to illumination changes; however, they throw away a lot of information
• Must compute pixel distance as a function of both pixel position and pixel intensity
• Can be applied to color also


4. Gradient matching
• Another way to be robust to illumination changes without throwing away as much information is to compare image gradients
• Matching is performed like matching greyscale images
• Simple alternative: Use (normalized) correlation
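
A minimal sketch of the normalized-correlation alternative applied to gradient-magnitude images, using OpenCV (the file names are illustrative assumptions):

```python
import cv2

def gradient_magnitude(img):
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)   # horizontal derivative
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)   # vertical derivative
    return cv2.magnitude(gx, gy)

scene = gradient_magnitude(cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE))
tmpl = gradient_magnitude(cv2.imread("template.png", cv2.IMREAD_GRAYSCALE))

# Normalized cross-correlation of the gradient images; the peak of the
# response map is the most likely template position.
response = cv2.matchTemplate(scene, tmpl, cv2.TM_CCOEFF_NORMED)
_, peak, _, top_left = cv2.minMaxLoc(response)
print("peak correlation %.3f at %s" % (peak, top_left))
```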


5. Large modelbases
• One approach to efficiently searching the modelbase for a specific image is to use eigenvectors of the templates (called eigenfaces)
• Modelbases are a collection of geometric models of the objects that should be recognised
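
A minimal eigen-template sketch with NumPy: the modelbase templates are projected onto a few principal eigenvectors, so a query window can be compared against the whole modelbase by distances in a low-dimensional space (the variable names and the choice of k are assumptions):

```python
import numpy as np

def eigen_basis(templates, k=10):
    """Compute the mean template and the top-k eigen-templates from a
    list of same-sized grayscale template images (NumPy arrays)."""
    X = np.stack([t.ravel().astype(np.float64) for t in templates])
    mean = X.mean(axis=0)
    # Rows of Vt are the eigenvectors (eigenfaces, for face templates).
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(img, mean, basis):
    """Coordinates of an image in the low-dimensional eigenspace."""
    return basis @ (img.ravel().astype(np.float64) - mean)

# Recognition: project a query window and pick the nearest template in
# eigenspace instead of comparing full images, e.g.:
#   mean, basis = eigen_basis(templates)
#   coords = [project(t, mean, basis) for t in templates]
#   q = project(query, mean, basis)
#   best = min(range(len(coords)), key=lambda i: np.linalg.norm(coords[i] - q))
```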

Feature-based methods

- A search is used to find feasible matches between object features and image features.

- The primary constraint is that a single position of the object must account for all of the feasible matches.

- These methods extract features from the objects to be recognized and from the images to be searched, for example:
  • surface patches
  • corners
  • linear edges


1. Interpretation trees
• A method for searching for feasible matches is to search through a tree.
• Each node in the tree represents a set of matches.
  • Root node represents empty set
  • Each other node is the union of the matches in the parent node and one additional match.
  • Wildcard is used for features with no match
• Nodes are “pruned” when the set of matches is infeasible.
  • A pruned node has no children
• Historically significant and still used, but less commonly
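
A minimal interpretation-tree sketch in Python; `feasible` stands for a hypothetical geometric-consistency predicate (not defined here), and the wildcard marks image features left unmatched:

```python
WILDCARD = None   # image features with no model match

def interpretations(image_feats, model_feats, feasible, matches=()):
    """Depth-first search of the interpretation tree: each node extends
    its parent's match set by one (image feature, model feature) pair,
    and infeasible nodes are pruned (given no children)."""
    if not image_feats:
        yield matches                   # a complete, feasible interpretation
        return
    first, rest = image_feats[0], image_feats[1:]
    for mf in list(model_feats) + [WILDCARD]:
        extended = matches + ((first, mf),)
        if mf is WILDCARD or feasible(extended):
            yield from interpretations(rest, model_feats, feasible, extended)
```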


2. Hypothesize and test
• General Idea:
  • Hypothesize a correspondence between a collection of image features and a collection of object features
  • Then use this to generate a hypothesis about the projection from the object coordinate frame to the image frame
  • Use this projection hypothesis to generate a rendering of the object. This step is usually known as backprojection
  • Compare the rendering to the image, and, if the two are sufficiently similar, accept the hypothesis

• Obtaining Hypothesis:
  • There are a variety of different ways of generating hypotheses.
  • When camera intrinsic parameters are known, the hypothesis is equivalent to a hypothetical position and orientation – pose – for the object.
  • Utilize geometric constraints
  • Construct a correspondence for small sets of object features to every correctly sized subset of image points. (These are the hypotheses)

• Three basic approaches:
  • Obtaining Hypotheses by Pose Consistency
  • Obtaining Hypotheses by Pose Clustering
  • Obtaining Hypotheses by Using Invariants

• This search is expensive and redundant, but it can be improved using Randomization and/or Grouping
  • Randomization
§ Examining small sets of image features until likelihood of missing object becomes small
§ For each set of image features, all possible matching sets of model features must be considered.
§ Formula:

(1 – W^c)^k = Z

W = the fraction of image points that are "good" (W ≈ m/n)
c = the number of correspondences necessary
k = the number of trials
Z = the probability of every trial using one (or more) incorrect correspondences

  • Grouping
§ If we can determine groups of points that are likely to come from the same object, we can reduce the number of hypotheses that need to be examined
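
Each trial uses only correct correspondences with probability W^c, so (1 – W^c)^k in the formula above is the chance that all k trials fail; solving for k gives the number of trials needed. A minimal sketch with illustrative numbers (the values of W, c, and Z are assumptions, not from any particular system):

```python
import math

def trials_needed(W, c, Z):
    """Smallest k such that (1 - W**c)**k <= Z."""
    return math.ceil(math.log(Z) / math.log(1.0 - W ** c))

# e.g. 50% good image points, 3 correspondences per hypothesis, and at
# most a 1% chance that every trial was corrupted:
print(trials_needed(W=0.5, c=3, Z=0.01))   # -> 35
```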


3. Pose consistency
• Also called Alignment, since the object is being aligned to the image
• Correspondences between image features and model features are not independent – Geometric constraints
• A small number of correspondences yields the object position – the others must be consistent with this
• General Idea:
  • If we hypothesize a match between a sufficiently large group of image features and a sufficiently large group of object features, then we can recover the missing camera parameters from this hypothesis (and so render the rest of the object)

• Strategy:
  • Generate hypotheses using small number of correspondences (e.g. triples of points for 3D recognition)
  • Project other model features into image (backproject) and verify additional correspondences
• Use the smallest number of correspondences necessary to achieve discrete object poses
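
A minimal pose-consistency (alignment) sketch using OpenCV's PnP solver: hypothesize a pose from a small correspondence set, then backproject the remaining model features to verify them. The camera matrix and the pixel tolerance are illustrative assumptions, and note that OpenCV's default solver needs at least four point correspondences rather than the triples mentioned above:

```python
import cv2
import numpy as np

# Hypothetical camera intrinsics (known, as assumed above).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def hypothesize_pose(model_pts, image_pts):
    """Recover a pose hypothesis (rotation, translation) from a small
    set of 3D-model-to-image correspondences (at least four points)."""
    ok, rvec, tvec = cv2.solvePnP(np.float32(model_pts),
                                  np.float32(image_pts), K, None)
    return (rvec, tvec) if ok else None

def consistent(pose, other_model_pts, other_image_pts, tol=3.0):
    """Backproject the remaining model features and verify that they
    land within tol pixels of their hypothesized image matches."""
    rvec, tvec = pose
    proj, _ = cv2.projectPoints(np.float32(other_model_pts), rvec, tvec, K, None)
    err = np.linalg.norm(proj.reshape(-1, 2) - np.float32(other_image_pts), axis=1)
    return bool(np.all(err < tol))
```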


4. Pose clustering
• General Idea:
  • Each object leads to many correct sets of correspondences, each of which has (roughly) the same pose
  • Vote on pose. Use an accumulator array that represents pose space for each object
  • This is essentially a Hough transform

• Strategy:
  • For each object, set up an accumulator array that represents pose space – each element in the accumulator array corresponds to a “bucket” in pose space.
  • Then take each image frame group, and hypothesize a correspondence between it and every frame group on every object
  • For each of these correspondences, determine pose parameters and make an entry in the accumulator array for the current object at the pose value.
  • If there are large numbers of votes in any object’s accumulator array, this can be interpreted as evidence for the presence of that object at that pose.
  • The evidence can be checked using a verification method

• Note that this method uses sets of correspondences, rather than individual correspondences
  • Implementation is easier, since each set yields a small number of possible object poses.

• Improvement

  • The noise resistance of this method can be improved by not counting votes for objects at poses where the vote is obviously unreliable
§ For example, cases where the object frame group would be invisible if the object were at that pose.
  • These improvements are sufficient to yield working systems
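
A minimal pose-clustering sketch for translation-only 2D poses: each hypothesized frame-group correspondence votes into a coarse accumulator over pose space, and heavily voted buckets are kept for verification (the bucket size and vote threshold are assumptions):

```python
from collections import Counter

def cluster_poses(pose_votes, bucket=10.0, min_votes=5):
    """pose_votes: iterable of (dy, dx) translations, one pose estimate
    per hypothesized frame-group correspondence."""
    acc = Counter()
    for dy, dx in pose_votes:
        # Quantize the pose into a bucket of the accumulator array.
        acc[(round(dy / bucket), round(dx / bucket))] += 1
    # Large vote counts are evidence for an object at that pose; these
    # candidates would then be checked with a verification method.
    return [(by * bucket, bx * bucket)
            for (by, bx), n in acc.items() if n >= min_votes]

# e.g. cluster_poses([(12, 31), (11, 29), (9, 30), (10, 32), (10, 30), (87, 3)])
```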


5. Invariance
• There are geometric properties that are invariant to camera transformations
• Most easily developed for images of planar objects, but can be applied to other cases as well
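
One classic example is the cross-ratio of four collinear points, which is unchanged by any projective transformation; a minimal sketch with illustrative numbers:

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by 1D coordinates."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

pts = [0.0, 1.0, 2.0, 4.0]
print(cross_ratio(*pts))                              # 1.5

# The same points after a projective map x -> (2x + 1) / (x + 3):
mapped = [(2 * x + 1) / (x + 3) for x in pts]
print(cross_ratio(*mapped))                           # 1.5 again
```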


6. Geometric hashing
• An algorithm that uses geometric invariants to vote for object hypotheses
• Similar to pose clustering, however instead of voting on pose, we are now voting on geometry
• A technique originally developed for matching geometric features (uncalibrated affine views of plane models) against a database of such features
• Widely used for pattern-matching, CAD/CAM, and medical imaging.
• It is difficult to choose the size of the buckets
• It is hard to be sure what "enough" means. Therefore there may be some danger that the table will get clogged.
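
A minimal geometric-hashing sketch for 2D point models under similarity transforms (translation, rotation, scale); the bucket size below is an arbitrary assumption, illustrating exactly the tuning difficulty noted above:

```python
import numpy as np
from collections import defaultdict
from itertools import permutations

BUCKET = 0.25          # quantization step; hard to choose, as noted above

def basis_coords(pts, i, j):
    """Coordinates of all points in the frame defined by basis pair (i, j);
    these are invariant to translation, rotation, and scale."""
    origin, u = pts[i], pts[j] - pts[i]
    v = np.array([-u[1], u[0]])            # perpendicular second axis
    rel = pts - origin
    return np.stack([rel @ u, rel @ v], axis=1) / (u @ u)

def build_table(models):
    """Off-line step: hash the quantized coordinates of every model point
    under every ordered basis pair. models: dict name -> (N, 2) array."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            for q in np.round(basis_coords(pts, i, j) / BUCKET).astype(int):
                table[tuple(q)].append((name, i, j))
    return table

def recognize(table, scene_pts, i=0, j=1):
    """On-line step: pick one scene basis pair and let every scene point
    vote for the (model, basis) entries sharing its hash bucket."""
    votes = defaultdict(int)
    for q in np.round(basis_coords(scene_pts, i, j) / BUCKET).astype(int):
        for entry in table.get(tuple(q), []):
            votes[entry] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None
```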


7. Scale-invariant feature transform (SIFT)
• Keypoints of objects are first extracted from a set of reference images and stored in a database
• An object is recognized in a new image by individually comparing each feature from the new image to this database and finding candidate matching features based on Euclidean distance of their feature vectors.
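
A minimal SIFT matching sketch with OpenCV (assumes opencv-python 4.4 or later, where SIFT is in the main module; the file names and the 0.75 ratio-test threshold are conventional assumptions):

```python
import cv2

reference = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)   # hypothetical files
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_ref, des_ref = sift.detectAndCompute(reference, None)   # keypoint "database"
kp_scene, des_scene = sift.detectAndCompute(scene, None)

# Compare each new-image feature to the database by Euclidean (L2)
# distance and keep candidate matches that pass the ratio test.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des_scene, des_ref, k=2)
        if m.distance < 0.75 * n.distance]
print("%d candidate matches" % len(good))
```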


8. Speeded Up Robust Features (SURF)
• A robust image detector and descriptor
• The standard version is several times faster than SIFT and claimed by its authors to be more robust against different image transformations than SIFT
• Based on sums of approximated 2D Haar wavelet responses and makes efficient use of integral images

Other approaches

Template matching, gradient histograms, intraclass transfer learning, explicit and implicit 3D object models, global scene representations, shading, reflectance, texture, grammars, topic models, biologically inspired object recognition

Window-based detection, 3D cues, context, leveraging internet data, unsupervised learning, fast indexing

Applications

Object recognition methods have the following applications:
  • Image panoramas
  • Image watermarking
  • Global robot localization
  • Face Detection
  • Optical Character Recognition
  • Manufacturing Quality Control
  • Content-Based Image Indexing
  • Object Counting and Monitoring
  • Automated vehicle parking systems
  • Visual Positioning and tracking
  • Video Stabilization

See also

  • 3D single object recognition
  • Scale-invariant feature transform (SIFT)
  • SURF
  • Histogram of oriented gradients
  • Boosting methods for object categorization
  • Bag of words model in computer vision
  • Feature detection (computer vision)
  • Interest point detection
  • OpenCV
  • Pattern recognition
  • Template matching