Hyperparameter
In Bayesian statistics, a hyperparameter is a parameter of a prior distribution; the term distinguishes such parameters from the parameters of the model for the underlying system under analysis. Hyperparameters arise particularly in the use of conjugate priors.
For example, if one is using a beta distribution to model the distribution of the parameter p of a Bernoulli distribution, then:
- p is a parameter of the underlying system (Bernoulli distribution), and
- α and β are parameters of the prior distribution (beta distribution), hence hyperparameters.
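The two roles in this example can be made concrete with a minimal sketch. The hyperparameter values, the sample size, and the function name `sample_bernoulli_data` are illustrative assumptions, not part of the article.

```python
import random

# Hyperparameters: parameters of the beta prior.
# These particular values are assumed for illustration only.
ALPHA, BETA = 2.0, 5.0


def sample_bernoulli_data(alpha, beta, n, rng):
    """Draw p from Beta(alpha, beta), then n Bernoulli(p) observations."""
    # p is the parameter of the underlying system (Bernoulli distribution).
    p = rng.betavariate(alpha, beta)
    # Each observation is 1 with probability p and 0 otherwise.
    data = [1 if rng.random() < p else 0 for _ in range(n)]
    return p, data


rng = random.Random(0)  # fixed seed so the sketch is repeatable
p, data = sample_bernoulli_data(ALPHA, BETA, 10, rng)
```

Here `ALPHA` and `BETA` never appear in the Bernoulli model itself; they only shape the distribution from which `p` is drawn.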
One may take a single value for a given hyperparameter, or one can iterate and take a probability distribution on the hyperparameter itself, called a hyperprior.
Purpose
One often uses a prior that comes from a parametric family of probability distributions. This is done partly for explicitness (one can write down a distribution and choose its form by varying the hyperparameter, rather than trying to produce an arbitrary function), and partly so that one can vary the hyperparameter, particularly in the method of conjugate priors or for sensitivity analysis.
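As an illustration of choosing the form of a prior by varying the hyperparameter, the sketch below summarizes three members of the beta family by their mean and variance, using the standard closed-form expressions for the beta distribution. The three hyperparameter settings and their labels are assumptions chosen for illustration.

```python
def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))


# Hypothetical hyperparameter settings, each giving a differently shaped prior.
settings = {
    "uniform": (1.0, 1.0),
    "skewed toward 0": (2.0, 8.0),
    "concentrated near 0.5": (50.0, 50.0),
}

# Map each label to the (mean, variance) of the corresponding prior.
summary = {
    name: (beta_mean(a, b), beta_variance(a, b))
    for name, (a, b) in settings.items()
}
```

The same two-parameter formula covers a flat prior, a skewed one, and a sharply peaked one; only the hyperparameters change.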
Conjugate priors
When using a conjugate prior, the posterior distribution is from the same family as the prior but has different hyperparameters, which reflect the added information from the data: in subjective terms, one's beliefs have been updated. For a general prior distribution, computing the posterior is very involved, and the posterior may have an unusual or hard-to-describe form. With a conjugate prior, there is generally a simple formula relating the hyperparameters of the posterior to those of the prior, so the posterior distribution is very easy to compute. For example, a beta prior with hyperparameters α and β on the parameter p of a Bernoulli distribution, updated on s successes and f failures, gives a beta posterior with hyperparameters α + s and β + f.
Sensitivity analysis
A key concern of users of Bayesian statistics, and a common criticism from its critics, is the dependence of the posterior distribution on one's prior. Hyperparameters address this by making it easy to vary the prior and see how the posterior distribution (and statistics of it, such as credible intervals) changes: one can see how sensitive one's conclusions are to one's prior assumptions. This process is called sensitivity analysis.
Similarly, one may use a prior distribution with a range of values for a hyperparameter, perhaps reflecting uncertainty about the correct prior to take, and carry this through to a corresponding range in the final uncertainty.
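A sensitivity analysis of this kind can be sketched for the beta and Bernoulli pairing, where the standard conjugate update adds the number of successes to α and the number of failures to β. The observed counts, the list of candidate priors, and the function names are hypothetical, chosen only to illustrate the procedure.

```python
def posterior_hyperparameters(alpha, beta, successes, failures):
    """Conjugate update for a beta prior with Bernoulli observations."""
    return alpha + successes, beta + failures


def posterior_mean(alpha, beta, successes, failures):
    """Mean of the beta posterior after the conjugate update."""
    a, b = posterior_hyperparameters(alpha, beta, successes, failures)
    return a / (a + b)


# Hypothetical data: 7 successes and 3 failures.
successes, failures = 7, 3

# Hypothetical range of priors, from weak to fairly strong.
priors = [(1.0, 1.0), (0.5, 0.5), (2.0, 2.0), (10.0, 10.0)]

# Posterior mean under each prior, and how far the conclusions move.
means = [posterior_mean(a, b, successes, failures) for a, b in priors]
spread = max(means) - min(means)
```

A small `spread` suggests the conclusion is robust to the choice of prior; a large one suggests the data are not yet dominating the prior. The same loop could report credible intervals instead of means.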
Hyperpriors
Instead of using a single value for a given hyperparameter, one can take a probability distribution on the hyperparameter itself; this is called a hyperprior. In principle, one may iterate this, calling parameters of a hyperprior hyperhyperparameters, and so forth.
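The idea of a hyperprior can be sketched with a deliberately simple case: a discrete distribution over a few candidate (α, β) pairs for a beta prior. The candidate pairs, their probabilities, and the function names are assumptions for illustration; in practice a hyperprior is often continuous.

```python
import random

# Hypothetical discrete hyperprior: ((alpha, beta), probability) entries.
HYPERPRIOR = [
    ((1.0, 1.0), 0.5),
    ((2.0, 8.0), 0.3),
    ((8.0, 2.0), 0.2),
]


def sample_p(hyperprior, rng):
    """Draw hyperparameters from the hyperprior, then p from the beta prior."""
    pairs = [pair for pair, _ in hyperprior]
    weights = [weight for _, weight in hyperprior]
    alpha, beta = rng.choices(pairs, weights=weights, k=1)[0]
    return rng.betavariate(alpha, beta)


def marginal_prior_mean(hyperprior):
    """Prior mean of p after averaging over the hyperprior."""
    return sum(weight * a / (a + b) for (a, b), weight in hyperprior)


p_draw = sample_p(HYPERPRIOR, random.Random(1))
prior_mean = marginal_prior_mean(HYPERPRIOR)
```

The probabilities attached to each pair play the role of hyperhyperparameters here: they are parameters of the hyperprior rather than of the prior.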