Inductive inference
Encyclopedia
Around 1960, Ray Solomonoff
founded the theory of universal inductive inference, the theory of prediction based on observations; for example, predicting the next symbol based upon a given series of symbols. The only assumption is that the environment follows some unknown but computable probability distribution
.
Fundamental ingredients of the theory are the concepts of algorithmic probability
and Kolmogorov complexity
. The universal prior probability
of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem
can be used to predict the yet unseen parts of x in optimal fashion.
This is a mathematically formalized Occam's razor
: shorter (Kolmogorov complexity
) computable theories have more weight when calculating the probability of the next observation, using all computable theories which perfectly describe previous observations.
Marcus Hutter
's universal artificial intelligence builds upon this to calculate the expected value
of an action.
Another direction of inductive inference is based on E. Mark Gold's model of learning in the limit
from 1967 and has developed since then more and more models of learning. The general scenario is the following: Given a class S of computable functions, is there a learner (that is, recursive functional) which for any input of the form (f(0),f(1),...,f(n)) outputs a hypothesis (an index e with respect to a previously agreed on acceptable numbering of all computable functions; the indexed function should be consistent with the given values of f). A learner M learns a function f if almost all its hypotheses are the same index e, which generates the function f; M learns S if M learns every f in S. Basic results are that all recursively enumerable classes of functions are learnable while the class REC of all computable functions is not learnable. Many related models have been considered and also the learning of classes of recursively enumerable sets from positive data is a topic studied from Gold's pioneering paper in 1967 onwards.
A far reaching extension of the Gold’s approach is developed by Schmidhuber's theory of generalized Kolmogorov complexities, which are kinds of super-recursive algorithm
s.
Ray Solomonoff
Ray Solomonoff was the inventor of algorithmic probability, and founder of algorithmic information theory, He was an originator of the branch of artificial intelligence based on machine learning, prediction and probability...
founded the theory of universal inductive inference, the theory of prediction based on observations; for example, predicting the next symbol based upon a given series of symbols. The only assumption is that the environment follows some unknown but computable probability distribution
Probability distribution
In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values....
.
Fundamental ingredients of the theory are the concepts of algorithmic probability
Algorithmic probability
In algorithmic information theory, algorithmic probability is a method of assigning a probability to each hypothesis that explains a given observation, with the simplest hypothesis having the highest probability and the increasingly complex hypotheses receiving increasingly small probabilities...
and Kolmogorov complexity
Kolmogorov complexity
In algorithmic information theory , the Kolmogorov complexity of an object, such as a piece of text, is a measure of the computational resources needed to specify the object...
. The universal prior probability
Prior probability
In Bayesian statistical inference, a prior probability distribution, often called simply the prior, of an uncertain quantity p is the probability distribution that would express one's uncertainty about p before the "data"...
of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem
Bayes' theorem
In probability theory and applications, Bayes' theorem relates the conditional probabilities P and P. It is commonly used in science and engineering. The theorem is named for Thomas Bayes ....
can be used to predict the yet unseen parts of x in optimal fashion.
This is a mathematically formalized Occam's razor
Occam's razor
Occam's razor, also known as Ockham's razor, and sometimes expressed in Latin as lex parsimoniae , is a principle that generally recommends from among competing hypotheses selecting the one that makes the fewest new assumptions.-Overview:The principle is often summarized as "simpler explanations...
: shorter (Kolmogorov complexity
Kolmogorov complexity
In algorithmic information theory , the Kolmogorov complexity of an object, such as a piece of text, is a measure of the computational resources needed to specify the object...
) computable theories have more weight when calculating the probability of the next observation, using all computable theories which perfectly describe previous observations.
Marcus Hutter
Marcus Hutter
Marcus Hutter is a German computer scientist and professor at the Australian National University. Hutter was born and educated in Munich, where he studied physics and computer science...
's universal artificial intelligence builds upon this to calculate the expected value
Expected value
In probability theory, the expected value of a random variable is the weighted average of all possible values that this random variable can take on...
of an action.
Another direction of inductive inference is based on E. Mark Gold's model of learning in the limit
Language identification in the limit
Language identification in the limit is a formal model for inductive inference. It was introduced by E. Mark Gold in his paper with the same title . In this model, a learner is provided with presentation of some language. The learning is seen as an infinite process. Each time an element of the...
from 1967 and has developed since then more and more models of learning. The general scenario is the following: Given a class S of computable functions, is there a learner (that is, recursive functional) which for any input of the form (f(0),f(1),...,f(n)) outputs a hypothesis (an index e with respect to a previously agreed on acceptable numbering of all computable functions; the indexed function should be consistent with the given values of f). A learner M learns a function f if almost all its hypotheses are the same index e, which generates the function f; M learns S if M learns every f in S. Basic results are that all recursively enumerable classes of functions are learnable while the class REC of all computable functions is not learnable. Many related models have been considered and also the learning of classes of recursively enumerable sets from positive data is a topic studied from Gold's pioneering paper in 1967 onwards.
A far reaching extension of the Gold’s approach is developed by Schmidhuber's theory of generalized Kolmogorov complexities, which are kinds of super-recursive algorithm
Super-recursive algorithm
In computability theory, super-recursive algorithms are a generalization of ordinary algorithms that are more powerful, that is, compute more than Turing machines. The term was introduced by Mark Burgin, whose book "Super-recursive algorithms" develops their theory and presents several mathematical...
s.
See also
- Mill's methodsMill's MethodsMill's Methods are five methods of induction described by philosopher John Stuart Mill in his 1843 book A System of Logic. They are intended to illuminate issues of causation.-Direct method of agreement:...
- Confirmation biasConfirmation biasConfirmation bias is a tendency for people to favor information that confirms their preconceptions or hypotheses regardless of whether the information is true.David Perkins, a geneticist, coined the term "myside bias" referring to a preference for "my" side of an issue...
- Language identification in the limitLanguage identification in the limitLanguage identification in the limit is a formal model for inductive inference. It was introduced by E. Mark Gold in his paper with the same title . In this model, a learner is provided with presentation of some language. The learning is seen as an infinite process. Each time an element of the...
- Minimum description lengthMinimum description lengthThe minimum description length principle is a formalization of Occam's Razor in which the best hypothesis for a given set of data is the one that leads to the best compression of the data. MDL was introduced by Jorma Rissanen in 1978...
- Minimum message lengthMinimum message lengthMinimum message length is a formal information theory restatement of Occam's Razor: even when models are not equal in goodness of fit accuracy to the observed data, the one generating the shortest overall message is more likely to be correct...