Boosting
Boosting is a machine learning meta-algorithm for performing supervised learning. Boosting is based on the question posed by Kearns: can a set of weak learners create a single strong learner? A weak learner is defined to be a classifier which is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification.
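
In the probably approximately correct (PAC) formulation discussed below, these notions can be made precise. The following is a sketch of the usual definitions rather than a quotation from any one source: a weak learner has some fixed edge gamma > 0 over random guessing, while a strong learner can be driven to arbitrarily low error.

    % Weak learner: error bounded away from 1/2 by a fixed edge \gamma > 0
    \Pr_{(x,y) \sim D}\bigl[h(x) \neq y\bigr] \le \tfrac{1}{2} - \gamma

    % Strong learner: for any accuracy \varepsilon > 0 and confidence \delta > 0,
    % with probability at least 1 - \delta over the training sample,
    \Pr_{(x,y) \sim D}\bigl[H(x) \neq y\bigr] \le \varepsilon

Here D is the distribution over labeled examples, h is the weak hypothesis, and H is the strong hypothesis.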

Schapire's affirmative answer to Kearns' question has had significant ramifications in machine learning and statistics, most notably leading to the development of boosting.

Boosting algorithms

While boosting is not algorithmically constrained, most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are typically weighted in some way that is usually related to the weak learners' accuracy. After a weak learner is added, the data is reweighted: examples that are misclassified gain weight and examples that are classified correctly lose weight (some boosting algorithms actually decrease the weight of repeatedly misclassified examples, e.g., boost by majority and BrownBoost). Thus, future weak learners focus more on the examples that previous weak learners misclassified.
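
As a concrete illustration, the following is a minimal Python sketch of this reweighting loop in the style of AdaBoost (discussed below). The choice of scikit-learn decision stumps as weak learners is an illustrative assumption for the example, not a requirement of boosting in general:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def boost(X, y, n_rounds=50):
        # AdaBoost-style loop; labels y must be in {-1, +1}.
        n = len(y)
        w = np.full(n, 1.0 / n)                  # start with a uniform distribution
        learners, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)     # weak learner on the current distribution
            pred = stump.predict(X)
            err = w[pred != y].sum()             # weighted training error
            if err >= 0.5:                       # no better than random guessing: stop
                break
            alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # learner's vote weight
            w *= np.exp(-alpha * y * pred)       # misclassified examples gain weight
            w /= w.sum()                         # renormalize to a distribution
            learners.append(stump)
            alphas.append(alpha)
        return learners, alphas

    def predict(learners, alphas, X):
        # The final strong classifier is a weighted majority vote of the weak learners.
        return np.sign(sum(a * h.predict(X) for h, a in zip(learners, alphas)))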

There are many boosting algorithms. The original ones, proposed by Robert Schapire (a recursive majority gate formulation) and Yoav Freund (boost by majority), were not adaptive and could not take full advantage of the weak learners.

Only algorithms that are provable boosting algorithms in the probably approximately correct learning formulation are called boosting algorithms. Other algorithms that are similar in spirit to boosting algorithms are sometimes called "leveraging algorithms", although they are also sometimes incorrectly called boosting algorithms.

Examples of boosting algorithms

The main variation among boosting algorithms is their method of weighting training data points and hypotheses. AdaBoost is very popular and perhaps the most significant historically, as it was the first algorithm that could adapt to the weak learners. However, there are many more recent algorithms such as LPBoost, TotalBoost, BrownBoost, MadaBoost, LogitBoost, and others. Many boosting algorithms fit into the AnyBoost framework, which shows that boosting performs gradient descent in function space using a convex cost function. In 2008 Phillip Long (at Google) and Rocco A. Servedio (Columbia University) published a paper at the 25th International Conference on Machine Learning suggesting that these algorithms are provably flawed in that "convex potential boosters cannot withstand random classification noise," thus making the applicability of such algorithms to real-world, noisy data sets questionable.

See also

  • AdaBoost
  • Alternating decision tree
  • Bootstrap aggregating
  • Cascading classifiers
  • BrownBoost
  • CoBoosting
  • GentleBoost
  • LPBoost
  • Logistic regression
  • Maximum entropy methods
  • Neural networks
  • Support vector machines
  • Gradient boosting
  • RankBoost
  • Margin classifiers
  • Cross-validation
  • Machine learning
  • Boosting methods for object categorization


Implementations

  • Orange, a free data mining software suite, module orngEnsemble
  • Weka, a machine learning set of tools that offers various implementations of boosting algorithms such as AdaBoost and LogitBoost
  • R package GBM (Generalized Boosted Regression Models), which implements extensions to Freund and Schapire's AdaBoost algorithm and Friedman's gradient boosting machine

References

  • Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997. http://www.cse.ucsd.edu/~yfreund/papers/adaboost.pdf
  • Robert E. Schapire and Yoram Singer. Improved Boosting Algorithms Using Confidence-Rated Predictors. Machine Learning, 37(3):297–336, 1999. http://citeseer.ist.psu.edu/schapire99improved.html

