Margin classifier
Encyclopedia
In machine learning
, a margin classifer is a classifier
which is able to give an associated distance from the decision boundary for each example. For instance, if a linear classifier
(e.g. perceptron
or linear discriminant analysis
) is used, the distance (typically euclidean distance
, though others may be used) of an example from the separating hyperplane is the margin of that example.
The notion of margin is important in several machine learning classification algorithms, as it can be used to bound the generalization error
of the classifier. These bounds are frequently shown using the VC dimension
. Of particular prominence is the generalization error bound on boosting
algorithms and support vector machine
s.
s and maximum-margin hyperplane
for details.
algorithm given a set of examples with two classes can be defined as follows. The classifier is given an example pair where is a domain space and is the label of the example. The iterative boosting algorithm then selects a classifier at each iteration where is a space of possible classifiers that predict real values. This hypothesis is then weighted by as selected by the boosting algorithm. At iteration , The margin of an example can thus be defined as
By this definition, the margin is positive if the example is labeled correctly and negative is the example is labeled incorrectly.
This definition may be modified and is not the only way to define margin for boosting algorithms. However, there are reasons why this definition may be appealing (see the Schapire et al. paper for details ).
Many boosting
algorithms rely on the notion of a margin to give weights to examples. If a convex loss is utilized (as in AdaBoost
, LogitBoost
, and all members of the AnyBoost family of algorithms) then an example with higher margin will receive less (or equal) weight than an example with lower margin. This leads the boosting algorithm to focus weight on low margin examples. In nonconvex algorithms (e.g. BrownBoost
), the margin still dictates the weighting of an example, though the weighting is non-monotone with respect to margin. There exists boosting algorithms that provably maximize the minimum margin (e.g. see ).
Support vector machine
s provably maximize the margin of the separating hyperplane. Support vector machines that are trained using noisy data (there exists no perfect separation of the data in the given space) maximize the soft margin. More discussion of this can be found in the support vector machine
article.
The voted-perceptron algorithm is a margin maximizing algorithm based on an iterative application of the classic perceptron
algorithm.
may be bound by parameters of the algorithm and a margin term. An example of such a bound is for the AdaBoost algorithm. Let be a set of examples sampled independently at random from a distribution . Assume the VC-dimension of the underlying base classifier is and . Then with probability we have the bound
for all .
Machine learning
Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases...
, a margin classifer is a classifier
Classifier
Classifier may refer to:*Classifier *Classifier *Classifier *Hierarchical classifier*Linear classifier...
which is able to give an associated distance from the decision boundary for each example. For instance, if a linear classifier
Linear classifier
In the field of machine learning, the goal of statistical classification is to use an object's characteristics to identify which class it belongs to. A linear classifier achieves this by making a classification decision based on the value of a linear combination of the characteristics...
(e.g. perceptron
Perceptron
The perceptron is a type of artificial neural network invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. It can be seen as the simplest kind of feedforward neural network: a linear classifier.- Definition :...
or linear discriminant analysis
Linear discriminant analysis
Linear discriminant analysis and the related Fisher's linear discriminant are methods used in statistics, pattern recognition and machine learning to find a linear combination of features which characterizes or separates two or more classes of objects or events...
) is used, the distance (typically euclidean distance
Euclidean distance
In mathematics, the Euclidean distance or Euclidean metric is the "ordinary" distance between two points that one would measure with a ruler, and is given by the Pythagorean formula. By using this formula as distance, Euclidean space becomes a metric space...
, though others may be used) of an example from the separating hyperplane is the margin of that example.
The notion of margin is important in several machine learning classification algorithms, as it can be used to bound the generalization error
Generalization error
The generalization error of a machine learning model is a function that measures how far the student machine is from the teacher machine in average over the entire set of possible data that can be generated by the teacher after each iteration of the learning process...
of the classifier. These bounds are frequently shown using the VC dimension
VC dimension
In statistical learning theory, or sometimes computational learning theory, the VC dimension is a measure of the capacity of a statistical classification algorithm, defined as the cardinality of the largest set of points that the algorithm can shatter...
. Of particular prominence is the generalization error bound on boosting
Boosting
Boosting is a machine learning meta-algorithm for performing supervised learning. Boosting is based on the question posed by Kearns: can a set of weak learners create a single strong learner? A weak learner is defined to be a classifier which is only slightly correlated with the true classification...
algorithms and support vector machine
Support vector machine
A support vector machine is a concept in statistics and computer science for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis...
s.
Support vector machine definition of margin
See support vector machineSupport vector machine
A support vector machine is a concept in statistics and computer science for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis...
s and maximum-margin hyperplane
Maximum-margin hyperplane
In geometry, a maximum-margin hyperplane is a hyperplane which separates two 'clouds' of points and is at equal distance from the two. The margin between the hyperplane and the clouds is maximal. See the article on Support Vector Machines for more details....
for details.
Boosting definition of margin
The margin for an iterative boostingBoosting
Boosting is a machine learning meta-algorithm for performing supervised learning. Boosting is based on the question posed by Kearns: can a set of weak learners create a single strong learner? A weak learner is defined to be a classifier which is only slightly correlated with the true classification...
algorithm given a set of examples with two classes can be defined as follows. The classifier is given an example pair where is a domain space and is the label of the example. The iterative boosting algorithm then selects a classifier at each iteration where is a space of possible classifiers that predict real values. This hypothesis is then weighted by as selected by the boosting algorithm. At iteration , The margin of an example can thus be defined as
By this definition, the margin is positive if the example is labeled correctly and negative is the example is labeled incorrectly.
This definition may be modified and is not the only way to define margin for boosting algorithms. However, there are reasons why this definition may be appealing (see the Schapire et al. paper for details ).
Examples of margin-based algorithms
Many classifiers can give an associated margin for each example. However, only some classifiers utilize information of the margin while learning from a data set.Many boosting
Boosting
Boosting is a machine learning meta-algorithm for performing supervised learning. Boosting is based on the question posed by Kearns: can a set of weak learners create a single strong learner? A weak learner is defined to be a classifier which is only slightly correlated with the true classification...
algorithms rely on the notion of a margin to give weights to examples. If a convex loss is utilized (as in AdaBoost
AdaBoost
AdaBoost, short for Adaptive Boosting, is a machine learning algorithm, formulated by Yoav Freund and Robert Schapire. It is a meta-algorithm, and can be used in conjunction with many other learning algorithms to improve their performance. AdaBoost is adaptive in the sense that subsequent...
, LogitBoost
LogitBoost
LogitBoost is a boosting algorithm formulated by Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The original paper casts the AdaBoost algorithm into a statistics framework...
, and all members of the AnyBoost family of algorithms) then an example with higher margin will receive less (or equal) weight than an example with lower margin. This leads the boosting algorithm to focus weight on low margin examples. In nonconvex algorithms (e.g. BrownBoost
BrownBoost
BrownBoost is a boosting algorithm that may be robust to noisy datasets. BrownBoost is an adaptive version of the boost by majority algorithm. As is true for all boosting algorithms, BrownBoost is used in conjunction with other machine learning methods...
), the margin still dictates the weighting of an example, though the weighting is non-monotone with respect to margin. There exists boosting algorithms that provably maximize the minimum margin (e.g. see ).
Support vector machine
Support vector machine
A support vector machine is a concept in statistics and computer science for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis...
s provably maximize the margin of the separating hyperplane. Support vector machines that are trained using noisy data (there exists no perfect separation of the data in the given space) maximize the soft margin. More discussion of this can be found in the support vector machine
Support vector machine
A support vector machine is a concept in statistics and computer science for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis...
article.
The voted-perceptron algorithm is a margin maximizing algorithm based on an iterative application of the classic perceptron
Perceptron
The perceptron is a type of artificial neural network invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. It can be seen as the simplest kind of feedforward neural network: a linear classifier.- Definition :...
algorithm.
Generalization error bounds
One theoretical motivation behind margin classifiers is that their generalization errorGeneralization error
The generalization error of a machine learning model is a function that measures how far the student machine is from the teacher machine in average over the entire set of possible data that can be generated by the teacher after each iteration of the learning process...
may be bound by parameters of the algorithm and a margin term. An example of such a bound is for the AdaBoost algorithm. Let be a set of examples sampled independently at random from a distribution . Assume the VC-dimension of the underlying base classifier is and . Then with probability we have the bound
for all .