Computational learning theory
In theoretical computer science, computational learning theory is a
mathematical field related to the analysis of machine learning
algorithms.

Overview

Theoretical results in machine learning mainly deal with a type of
inductive learning called supervised learning. In supervised
learning, an algorithm is given samples that are labeled in some
useful way. For example, the samples might be descriptions of
mushrooms, and the labels could be whether or not the mushrooms are
edible. The algorithm takes these previously labeled samples and
uses them to induce a classifier. This classifier is a function that
assigns labels to samples, including samples that the algorithm has
never seen before. The goal of the supervised learning algorithm is to
optimize some measure of performance, such as minimizing the number of
mistakes made on new samples.
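
As a concrete illustration, the Python sketch below induces a
classifier from labeled samples; a one-nearest-neighbor rule stands in
for the learning algorithm, and the mushroom features and labels are
invented for illustration.

  # A classifier induced from labeled samples: "training" for
  # one-nearest-neighbor simply memorizes the data; prediction labels a
  # new sample with the label of the closest training sample.

  def train_1nn(samples, labels):
      def classify(x):
          # Hamming distance over binary feature vectors.
          distances = [sum(a != b for a, b in zip(x, s)) for s in samples]
          return labels[distances.index(min(distances))]
      return classify

  # Hypothetical binary features: (has_ring, cap_is_red, grows_on_wood).
  samples = [(1, 0, 0), (1, 1, 0), (0, 1, 1), (0, 0, 1)]
  labels = ["edible", "edible", "poisonous", "poisonous"]

  classifier = train_1nn(samples, labels)
  print(classifier((1, 0, 1)))  # labels a sample never seen in training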

In addition to performance bounds, computational learning theorists
study the time complexity and feasibility of learning. In
computational learning theory, a computation is considered feasible if
it can be done in polynomial time. There are two kinds of time
complexity results:
  • Positive results: showing that a certain class of functions is learnable in polynomial time (a classic example is sketched after this list).
  • Negative results: showing that certain classes cannot be learned in polynomial time.
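
A classic positive result of this kind is that monotone conjunctions
are learnable in polynomial time. Below is a minimal Python sketch of
the standard elimination algorithm; the sample data is invented.

  # Elimination algorithm for monotone conjunctions over n Boolean
  # variables: start with the conjunction of all variables and drop
  # every variable that is 0 in some positive example. Each example is
  # processed in O(n) time, so the algorithm runs in polynomial time.

  def learn_monotone_conjunction(n, positive_examples):
      hypothesis = set(range(n))  # x0 AND x1 AND ... AND x(n-1)
      for example in positive_examples:
          hypothesis -= {i for i in hypothesis if example[i] == 0}
      return hypothesis

  # Hidden target concept: x0 AND x2 (unknown to the learner).
  positives = [(1, 1, 1, 0), (1, 0, 1, 1), (1, 1, 1, 1)]
  print(sorted(learn_monotone_conjunction(4, positives)))  # [0, 2]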

Negative results can typically be proven only under unproven but widely believed assumptions. The assumptions that are common in negative results are:
  • Computational complexity: P ≠ NP;
  • Cryptography: one-way functions exist.


There are several different approaches to computational learning
theory. These differences are based on making assumptions about the
inference principles used to generalize from limited data. This
includes different definitions of probability (see frequency
probability and Bayesian probability) and different assumptions on the
generation of samples. The different approaches include:
  • Probably approximately correct learning (PAC learning), proposed by Leslie Valiant;
  • VC theory, proposed by Vladimir Vapnik and Alexey Chervonenkis;
  • Bayesian inference, arising from work first done by Thomas Bayes;
  • Algorithmic learning theory, from the work of E. M. Gold;
  • Online machine learning, from the work of Nick Littlestone.
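
To give a flavor of the quantitative guarantees these frameworks
provide, the sketch below evaluates the standard PAC sample-complexity
bound for a consistent learner over a finite hypothesis class; the
parameter values are illustrative, not canonical.

  # Sample size sufficient for a learner that outputs any hypothesis
  # consistent with the data, over a finite class H, to reach error at
  # most eps with probability at least 1 - delta:
  #     m >= (ln|H| + ln(1/delta)) / eps.
  import math

  def pac_sample_bound(hypothesis_count, eps, delta):
      return math.ceil((math.log(hypothesis_count)
                        + math.log(1 / delta)) / eps)

  # Monotone conjunctions over n variables: |H| = 2**n, so the bound
  # grows only linearly in n.
  n = 20
  print(pac_sample_bound(2 ** n, eps=0.1, delta=0.05))  # 169 examples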


Computational learning theory has led to several practical
algorithms. For example, PAC theory inspired boosting, VC theory led
to support vector machines, and Bayesian inference led to belief
networks (by Judea Pearl).
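
A minimal sketch of boosting in practice, assuming scikit-learn is
available; the synthetic dataset and the hyperparameters below are
arbitrary illustrative choices, not part of the theory.

  # Boosting combines many weak learners (here, depth-1 decision trees,
  # scikit-learn's default) into a single strong learner.
  from sklearn.datasets import make_classification
  from sklearn.ensemble import AdaBoostClassifier
  from sklearn.model_selection import train_test_split

  X, y = make_classification(n_samples=500, n_features=10, random_state=0)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  model = AdaBoostClassifier(n_estimators=50, random_state=0)  # 50 rounds
  model.fit(X_train, y_train)
  print(model.score(X_test, y_test))  # accuracy on held-out samples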

Surveys

  • Angluin, D. 1992. Computational learning theory: Survey and selected bibliography. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing (May 1992), pp. 351--369. http://portal.acm.org/citation.cfm?id=129712.129746
  • D. Haussler. Probably approximately correct learning. In AAAI-90 Proceedings of the Eighth National Conference on Artificial Intelligence, Boston, MA, pages 1101--1108. American Association for Artificial Intelligence, 1990. http://citeseer.ist.psu.edu/haussler90probably.html

VC dimension

  • V. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264--280, 1971.

Feature selection

  • A. Dhagat and L. Hellerstein. PAC learning with irrelevant attributes. In Proceedings of the IEEE Symp. on Foundations of Computer Science, 1994. http://citeseer.ist.psu.edu/dhagat94pac.html

Inductive inference

  • E. M. Gold. Language identification in the limit. Information and Control, 10:447--474, 1967.

Optimal O notation learning

  • O. Goldreich, D. Ron. On universal learning algorithms. http://citeseer.ist.psu.edu/69804.html

Negative results

  • M. Kearns and L. G. Valiant. 1989. Cryptographic limitations on learning boolean formulae and finite automata. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing, pages 433--444, New York. ACM. http://citeseer.ist.psu.edu/kearns89cryptographic.html

Boosting

  • Robert E. Schapire. The strength of weak learnability. Machine Learning, 5(2):197--227, 1990. http://citeseer.ist.psu.edu/schapire90strength.html

Occam's Razor

  • A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Occam's razor. Information Processing Letters, 24:377--380, 1987.
  • A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929--965, 1989.

Probably approximately correct learning

  • L. Valiant. A Theory of the Learnable. Communications of the ACM, 27(11):1134--1142, 1984.

Error tolerance

  • Michael Kearns and Ming Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4):807--837, August 1993. http://citeseer.ist.psu.edu/kearns93learning.html
  • Kearns, M. (1993). Efficient noise-tolerant learning from statistical queries. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pages 392--401. http://citeseer.ist.psu.edu/kearns93efficient.html

Equivalence

  • D. Haussler, M. Kearns, N. Littlestone, and M. K. Warmuth. Equivalence of models for polynomial learnability. In Proc. 1st ACM Workshop on Computational Learning Theory, pages 42--55, 1988.
  • L. Pitt and M. K. Warmuth. Prediction-preserving reducibility. Journal of Computer and System Sciences, 41(3):430--467, 1990.


A description of some of these publications can be found in the list of important publications in machine learning.
