Computational learning theory
In theoretical computer science, computational learning theory is a mathematical field related to the analysis of machine learning algorithms.
Overview
Theoretical results in machine learning mainly deal with a type of inductive learning called supervised learning. In supervised learning, an algorithm is given samples that are labeled in some useful way. For example, the samples might be descriptions of mushrooms, and the labels could be whether or not the mushrooms are edible. The algorithm takes these previously labeled samples and uses them to induce a classifier. This classifier is a function that assigns labels to samples, including samples that the algorithm has never previously seen. The goal of the supervised learning algorithm is to optimize some measure of performance, such as minimizing the number of mistakes made on new samples.
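To make the idea concrete, the following is a minimal sketch of a supervised learner in Python. It induces the simplest possible classifier, a single-feature rule (a "decision stump"), from labeled samples; the mushroom features and data here are hypothetical illustrations, not taken from this article.

```python
# Minimal supervised-learning sketch: induce a one-feature classifier
# (a "decision stump") from labeled samples. Features and data are
# hypothetical; real learners search far richer hypothesis classes.
from collections import Counter

# Each sample is a dict of categorical features; labels are "edible"/"poisonous".
samples = [
    ({"odor": "almond", "cap": "smooth"}, "edible"),
    ({"odor": "foul",   "cap": "smooth"}, "poisonous"),
    ({"odor": "almond", "cap": "scaly"},  "edible"),
    ({"odor": "foul",   "cap": "scaly"},  "poisonous"),
]

def induce_stump(samples):
    """Pick the single feature whose values best predict the label."""
    best = None
    for feature in samples[0][0]:
        # For each value of this feature, record the majority label.
        by_value = {}
        for x, y in samples:
            by_value.setdefault(x[feature], Counter())[y] += 1
        rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        mistakes = sum(rule[x[feature]] != y for x, y in samples)
        if best is None or mistakes < best[0]:
            best = (mistakes, feature, rule)
    _, feature, rule = best
    # The induced classifier assigns labels to samples, including unseen ones.
    return lambda x: rule.get(x[feature], "unknown")

classify = induce_stump(samples)
print(classify({"odor": "foul",   "cap": "smooth"}))  # -> "poisonous"
print(classify({"odor": "almond", "cap": "warty"}))   # -> "edible"
```

A practical learning algorithm searches a much larger hypothesis class, but the shape is the same: labeled samples in, a label-assigning function out.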
In addition to performance bounds, computational learning theorists
study the time complexity and feasibility of learning. In
computational learning theory, a computation is considered feasible if
it can be done in polynomial time. There are two kinds of time
complexity results:
- Positive results: showing that a certain class of functions is learnable in polynomial time.
- Negative results: showing that certain classes cannot be learned in polynomial time.
Negative results are often proven only under widely believed but unproven assumptions, such as:
- Computational complexity: P ≠ NP;
- Cryptography: one-way functions exist.
There are several different approaches to computational learning theory. These differences are based on making assumptions about the inference principles used to generalize from limited data. This includes different definitions of probability (see frequency probability, Bayesian probability) and different assumptions on the generation of samples. The different approaches include:
- Probably approximately correct learning (PAC learning), proposed by Leslie Valiant (a formal definition is sketched after this list);
- VC theory, proposed by Vladimir Vapnik;
- Bayesian inference, arising from work first done by Thomas Bayes;
- Algorithmic learning theory, from the work of E. Mark Gold;
- Online machine learning, from the work of Nick Littlestone.
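For the first of these frameworks, the standard formal definition is compact enough to state here. The following is the usual formulation of PAC learnability in conventional notation, summarized rather than quoted from this article.

```latex
% PAC learnability (Valiant, 1984), in conventional notation.
% A concept class $C$ over an instance space $X$ is PAC-learnable if
% there exist an algorithm $A$ and a polynomial $p$ such that for every
% target concept $c \in C$, every distribution $D$ on $X$, and all
% $\varepsilon, \delta \in (0,1)$: given a sample $S$ of
% $m \ge p(1/\varepsilon, 1/\delta, \mathrm{size}(c))$ examples drawn
% i.i.d. from $D$ and labeled by $c$, $A$ outputs a hypothesis $h$ with
\Pr_{S \sim D^m}\!\bigl[\, \operatorname{err}_D(h) \le \varepsilon \,\bigr] \ge 1 - \delta,
\qquad \text{where } \operatorname{err}_D(h) = \Pr_{x \sim D}\bigl[\, h(x) \ne c(x) \,\bigr].
% For efficient PAC learning, $A$ must also run in time polynomial in
% the same parameters.
```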
Computational learning theory has led to several practical algorithms. For example, PAC theory inspired boosting, VC theory led to support vector machines, and Bayesian inference led to belief networks (by Judea Pearl).
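To illustrate the first of these connections, the sketch below is a compact, self-contained AdaBoost-style booster in Python, combining weighted threshold stumps on a one-dimensional toy dataset. It is an illustration of the boosting idea under the usual AdaBoost update rule, not code from this article; the data and parameter choices are made up.

```python
# Minimal AdaBoost sketch: weighted weak learners (threshold stumps on a
# single real feature) combined into a strong classifier. Data and
# parameters are illustrative.
import math

X = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]   # one-dimensional inputs
y = [+1,  +1,  -1,  -1,  +1,  -1]    # target labels in {-1, +1}

def best_stump(X, y, w):
    """Return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for t in X:
        for polarity in (+1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if polarity * (1 if xi < t else -1) != yi)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n                 # uniform initial sample weights
    ensemble = []                     # list of (alpha, threshold, polarity)
    def strong(x):
        vote = sum(a * s * (1 if x < t else -1) for a, t, s in ensemble)
        return +1 if vote >= 0 else -1
    for _ in range(rounds):
        err, t, s = best_stump(X, y, w)
        if err >= 0.5:                # weak learner no better than chance
            break
        err = max(err, 1e-10)         # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, s))
        # Reweight: misclassified samples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * s * (1 if xi < t else -1))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
        if all(strong(xi) == yi for xi, yi in zip(X, y)):
            break                     # training error reached zero
    return strong

h = adaboost(X, y)
print([h(x) for x in X] == y)         # True: the ensemble fits the training data
```

Each round forces the weak learner to focus on the samples the current ensemble gets wrong, which is the mechanism behind Schapire's weak-to-strong learnability result cited below.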
Surveys
- Angluin, D. 1992. Computational learning theory: Survey and selected bibliography. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing (May 1992), pp. 351--369. http://portal.acm.org/citation.cfm?id=129712.129746
- D. Haussler. Probably approximately correct learning. In AAAI-90 Proceedings of the Eighth National Conference on Artificial Intelligence, Boston, MA, pages 1101--1108. American Association for Artificial Intelligence, 1990. http://citeseer.ist.psu.edu/haussler90probably.html
VC dimension
- V. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264--280, 1971.
Feature selection
- A. Dhagat and L. Hellerstein. PAC learning with irrelevant attributes. In Proceedings of the IEEE Symp. on Foundation of Computer Science, 1994. http://citeseer.ist.psu.edu/dhagat94pac.html
Inductive inference
- E. M. Gold. Language identification in the limit. Information and Control, 10:447--474, 1967.
Optimal O notation learning
- O. Goldreich, D. Ron. On universal learning algorithms. http://citeseer.ist.psu.edu/69804.html
Negative results
- M. Kearns and L. G. Valiant. 1989. Cryptographic limitations on learning boolean formulae and finite automata. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing, pages 433--444, New York. ACM. http://citeseer.ist.psu.edu/kearns89cryptographic.html
Boosting
- Robert E. Schapire. The strength of weak learnability. Machine Learning, 5(2):197--227, 1990. http://citeseer.ist.psu.edu/schapire90strength.html
Occam's Razor
- A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Occam's razor. Information Processing Letters, 24:377--380, 1987.
- A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929--965, 1989.
Probably approximately correct learning
- L. Valiant. A Theory of the Learnable. Communications of the ACM, 27(11):1134--1142, 1984.
Error tolerance
- Michael Kearns and Ming Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4):807--837, August 1993. http://citeseer.ist.psu.edu/kearns93learning.html
- Kearns, M. (1993). Efficient noise-tolerant learning from statistical queries. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pages 392--401. http://citeseer.ist.psu.edu/kearns93efficient.html
Equivalence
- D. Haussler, M. Kearns, N. Littlestone, and M. K. Warmuth. Equivalence of models for polynomial learnability. In Proc. 1st ACM Workshop on Computational Learning Theory, pages 42--55, 1988.
- L. Pitt and M. K. Warmuth. Prediction-preserving reducibility. Journal of Computer and System Sciences, 41:430--467, 1990.
A description of some of these publications is given at important publications in machine learning.
External links
- On-line book: Information Theory, Inference, and Learning Algorithms, by David MacKay, gives a detailed account of the Bayesian approach to machine learning.
- Review of An Introduction to Computational Learning Theory
- Review of The Nature of Statistical Learning Theory
- Basics of Bayesian inference