Likelihood principle

In statistics, the likelihood principle is a controversial principle of statistical inference which asserts that all of the information in a sample is contained in the likelihood function.

A likelihood function arises from a conditional probability distribution considered as a function of its distributional parameterization argument, conditioned on the data argument. For example, consider a model which gives the probability density function of an observable random variable X as a function of a parameter θ.
Then for a specific value x of X, the function L(θ | x) = P(X=x | θ) is a likelihood function of θ: it gives a measure of how "likely" any particular value of θ is, if we know that X has the value x. Two likelihood functions are equivalent if one is a scalar multiple of the other. The likelihood principle states that all information from the data relevant to inferences about the value of θ is found in the equivalence class. The strong likelihood principle applies this same criterion to cases such as sequential experiments, where the sample of data that is available results from applying a stopping rule to the observations earlier in the experiment.

Example

Suppose
  • X is the number of successes in twelve independent Bernoulli trials with probability θ of success on each trial, and
  • Y is the number of independent Bernoulli trials needed to get three successes, again with probability θ of success on each trial.


Then the observation that X = 3 induces the likelihood function

  L(θ | X = 3) = C(12,3) θ^3 (1 − θ)^9 = 220 θ^3 (1 − θ)^9,

where C(n,k) denotes the binomial coefficient, and the observation that Y = 12 induces the likelihood function

  L(θ | Y = 12) = C(11,2) θ^3 (1 − θ)^9 = 55 θ^3 (1 − θ)^9.

These are equivalent because each is a scalar multiple of the other. The likelihood principle therefore says the inferences drawn about the value of θ should be the same in both cases.
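
The equivalence can be checked directly. The sketch below (Python, standard library only; the function names are illustrative) evaluates both likelihood functions at several values of θ and confirms that their ratio is the constant 220/55 = 4 throughout.

    # Minimal sketch: the binomial and negative-binomial likelihoods from the
    # example differ only by a constant factor, whatever the value of theta.
    from math import comb

    def binomial_likelihood(theta, n=12, k=3):
        # P(X = k | theta): k successes in n independent trials, n fixed in advance
        return comb(n, k) * theta**k * (1 - theta)**(n - k)

    def negative_binomial_likelihood(theta, r=3, n=12):
        # P(Y = n | theta): the r-th success occurs on trial n
        return comb(n - 1, r - 1) * theta**r * (1 - theta)**(n - r)

    for theta in (0.1, 0.25, 0.5, 0.9):
        ratio = binomial_likelihood(theta) / negative_binomial_likelihood(theta)
        print(theta, ratio)   # the ratio is 220/55 = 4.0 for every theta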

The difference between observing X = 3 and observing Y = 12 is only in the design of the experiment: in one case, one has decided in advance to try twelve times; in the other, to keep trying until three successes are observed. The outcome is the same in both cases.

The law of likelihood

A related concept is the law of likelihood, the notion that the extent to which the evidence supports one parameter value or hypothesis against another is equal to the ratio of their likelihoods. That is,

  L(a | x) / L(b | x) = P(x | a) / P(x | b)

is the degree to which the observation x supports parameter value or hypothesis a against b. If this ratio is 1, the evidence is indifferent, and if it is greater or less than 1, the evidence supports a against b, or vice versa. The use of Bayes factors can extend this by taking account of the complexity of different hypotheses.
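
As a worked illustration using the data from the example above (a sketch in Python; the helper name is chosen for illustration), the likelihood ratio of θ = 0.25 against θ = 0.5 for 3 successes in 12 trials comes out at about 4.8, so by the law of likelihood the observations support θ = 0.25 over θ = 0.5 by roughly that factor.

    # Worked illustration for the example above: the likelihood ratio of
    # theta = 0.25 against theta = 0.5, given 3 successes in 12 trials.
    # The binomial coefficient is common to both likelihoods and cancels.
    def likelihood(theta):
        return theta**3 * (1 - theta)**9

    print(likelihood(0.25) / likelihood(0.5))   # about 4.8, favouring theta = 0.25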

Combining the likelihood principle with the law of likelihood yields the consequence that the parameter value which maximizes the likelihood function is the value which is most strongly supported by the evidence.
This is the basis for the widely used method of maximum likelihood.
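
For the example above this can be seen directly: the likelihood is proportional to θ^3 (1 − θ)^9, and setting the derivative of its logarithm to zero, 3/θ − 9/(1 − θ) = 0, gives the maximum likelihood estimate 3/12 = 1/4, the observed proportion of successes.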

Historical remarks

The likelihood principle was first identified by that name in print in 1962 (Barnard et al., Birnbaum, and Savage et al.), but arguments for the same principle, unnamed, and the use of the principle in applications go back to the works of R. A. Fisher in the 1920s. The law of likelihood was identified by that name by I. Hacking (1965). More recently the likelihood principle as a general principle of inference has been championed by A. W. F. Edwards. The likelihood principle has been applied to the philosophy of science by R. Royall.

Birnbaum proved that the likelihood principle follows from two more primitive and seemingly reasonable principles, the conditionality principle and the sufficiency principle. The conditionality principle says that if an experiment is chosen by a random process independent of the states of nature θ, then only the experiment actually performed is relevant to inferences about θ. The sufficiency principle says that if T(X) is a sufficient statistic for θ, and if in two experiments with data x₁ and x₂ we have T(x₁) = T(x₂), then the evidence about θ given by the two experiments is the same.
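
For example, with a fixed number of Bernoulli trials the total number of successes is a sufficient statistic for θ, so two sequences of twelve trials that each contain three successes carry the same evidence about θ, regardless of the order in which the successes occurred.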

Arguments for and against the likelihood principle

The likelihood principle is not universally accepted.
Some widely used methods of conventional statistics, for example many significance tests, are not consistent with the likelihood principle.
Let us briefly consider some of the arguments for and against the likelihood principle.

Experimental design arguments on the likelihood principle

Unrealized events do play a role in some common statistical methods.
For example, the result of a significance test depends on the probability of a result as extreme or more extreme than the observation, and that probability may depend on the design of the experiment. Thus, to the extent that such methods are accepted, the likelihood principle is denied.

Some classical significance tests are not based on the likelihood.
A commonly cited example is the optional stopping problem.
Suppose I tell you that I tossed a coin 12 times and in the process observed 3 heads.
You might make some inference about the probability of heads and whether the coin was fair.
Suppose now I tell you that I tossed the coin until I observed 3 heads, and that I tossed it 12 times in all. Would you now make a different inference?

The likelihood function is the same in both cases: it is proportional to p^3 (1 − p)^9, where p is the probability of heads.

According to the likelihood principle, the inference should be the same in either case.

Suppose a number of scientists are assessing the probability of a certain outcome (which we shall call 'success') in experimental trials. Conventional wisdom suggests that if there is no bias towards success or failure then the success probability would be one half. Adam, a scientist, conducted 12 trials, obtained 3 successes and 9 failures, and then left the lab.

Bill, a colleague in the same lab, continued Adam's work and published Adam's results, along with a significance test. He tested the null hypothesis that p, the success probability, is equal to a half, versus the alternative p < 0.5. The probability that 3 or fewer of the 12 trials were successes (i.e. a result as extreme as, or more extreme than, the one observed), if H0 is true, is

  P(X ≤ 3 | p = 1/2) = [C(12,0) + C(12,1) + C(12,2) + C(12,3)] (1/2)^12,

which is 299/4096 = 7.3%. Thus the null hypothesis is not rejected at the 5% significance level.

Charlotte, another scientist, reads Bill's paper and writes a letter, saying that it is possible that Adam kept trying until he obtained 3 successes, in which case the probability of needing to conduct 12 or more experiments is given by

  P(Y ≥ 12 | p = 1/2) = [C(11,0) + C(11,1) + C(11,2)] (1/2)^11,

which is 67/2048 = 134/4096 = 3.27%. Now the result is statistically significant at the 5% level.
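
Both figures are easy to verify numerically; the following is a small Python check (standard library only) of the two tail probabilities quoted above.

    # Check of the two p-values quoted above, under H0: p = 1/2.
    from math import comb

    # Bill: probability of 3 or fewer successes in 12 trials, with n fixed at 12
    p_bill = sum(comb(12, k) for k in range(4)) / 2**12
    print(p_bill)        # 299/4096, about 0.073

    # Charlotte: probability that the 3rd success needs 12 or more trials,
    # i.e. at most 2 successes occur in the first 11 trials
    p_charlotte = sum(comb(11, k) for k in range(3)) / 2**11
    print(p_charlotte)   # 67/2048 = 134/4096, about 0.0327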

To these scientists, whether a result is significant or not seems to depend on the original design of the experiment, not just the likelihood of the outcome.

Paradoxical results of this kind are considered by some as arguments against the likelihood principle. For others it exemplifies the value of the likelihood principle and is an argument against significance tests which, for them, resolves the paradox.

It is worth noting that there is no real contradiction in this example. Bill's result is the probability of obtaining 3 or fewer successes in 12 trials. Charlotte's result is the probability that the 3rd success will occur on the 12th or later trial. These are fundamentally different things. The probability of obtaining 3 or fewer successes in 12 trials properly maps to the probability that the 4th success occurs after the 12th trial. By taking only the cases in which the 3rd success occurs on the 12th trial or later, those cases are ignored in which the 3rd success happens earlier and is followed by a string of failures up to the 12th trial; this accounts for the difference in the calculations. Another way of looking at this is that Charlotte's calculation implicitly assumes that the 3rd success occurs on the 12th trial, something that is not clear in the problem statement and that is not assumed in Bill's calculation. From Bill's perspective, Charlotte's calculation is the same as the probability of obtaining 2 or fewer successes in 11 trials, given a success on the 12th trial.

Similar themes appear when comparing Fisher's exact test with Pearson's chi-squared test.

The voltmeter story

An argument in favor of the likelihood principle is given by Edwards in his book Likelihood. He cites the following story from J.W. Pratt, slightly condensed here. Note that the likelihood function depends only on what actually happened, and not on what could have happened.
An engineer draws a random sample of electron tubes and measures their voltage. The measurements range from 75 to 99 volts. A statistician computes the sample mean and a confidence interval for the true mean. Later the statistician discovers that the voltmeter reads only as far as 100, so the population appears to be 'censored'. This necessitates a new analysis, if the statistician is orthodox. However, the engineer says he has another meter reading to 1000 volts, which he would have used if any voltage had been over 100. This is a relief to the statistician, because it means the population was effectively uncensored after all. But the next day the engineer informs the statistician that this second meter was not working at the time of the measurements. The statistician ascertains that the engineer would not have held up the measurements until the meter was fixed, and informs him that new measurements are required. The engineer is astounded. "Next you'll be asking about my oscilloscope."


One might proceed with this story, and consider the fact that in general the actual situation could have been different. For instance, high-range voltmeters do not break at predictable moments, but rather at unpredictable ones, so the second meter could have been broken with some probability. On sampling-theory reasoning, the distribution of the voltage measurements would then depend on the probability that an instrument not used in this experiment was broken at the time.

This story can be translated to Adam's stopping rule above, as follows. Adam stopped immediately after 3 successes, because his boss Bill had instructed him to do so. Adam did not die. After the publication of the statistical analysis by Bill, Adam discovers that he has missed a second instruction from Bill to conduct 12 trials instead, and that Bill's paper is based on this second instruction. Adam is very glad that he got his 3 successes after exactly 12 trials, and explains to his friend Charlotte that by coincidence he executed the second instruction. Later, he is astonished to hear about Charlotte's letter explaining that now the result is significant.

Optional stopping in clinical trials

The fact that Bayesian and frequentist arguments differ on the subject of optional stopping has a major impact on the way that clinical trial data can be analysed. In the frequentist setting there is a major difference between a design which is fixed and one which is sequential, i.e. consisting of a sequence of analyses. Bayesian statistics is inherently sequential, and so there is no such distinction.

In a clinical trial it is strictly not valid to conduct an unplanned interim analysis of the data by frequentist methods, whereas this is permissible by Bayesian methods. Similarly, if funding is withdrawn part way through an experiment, and the analyst must work with incomplete data, this is a possible source of bias for classical methods but not for Bayesian methods, which do not depend on the intended design of the experiment. Furthermore, as mentioned above, frequentist analysis is open to unscrupulous manipulation if the experimenter is allowed to choose the stopping point, whereas Bayesian methods are immune to such manipulation.
