Bayes estimator
In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function (i.e., the posterior expected loss). Equivalently, it maximizes the posterior expectation of a utility function. An alternative way of formulating an estimator within Bayesian statistics is maximum a posteriori (MAP) estimation.
Definition
Suppose an unknown parameter $\theta$ is known to have a prior distribution $\pi$. Let $\hat{\theta} = \hat{\theta}(x)$ be an estimator of $\theta$ (based on some measurements $x$), and let $L(\theta, \hat{\theta})$ be a loss function, such as squared error. The Bayes risk of $\hat{\theta}$ is defined as $E_\pi\left(L(\theta, \hat{\theta})\right)$, where the expectation is taken over the probability distribution of $\theta$: this defines the risk function as a function of $\hat{\theta}$. An estimator $\hat{\theta}$ is said to be a Bayes estimator if it minimizes the Bayes risk among all estimators. Equivalently, the estimator which minimizes the posterior expected loss $E\left(L(\theta, \hat{\theta}) \mid x\right)$ for each $x$ also minimizes the Bayes risk and therefore is a Bayes estimator.
If the prior is improper then an estimator which minimizes the posterior expected loss for each x is called a generalized Bayes estimator.
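As an illustrative sketch of this definition, a Bayes action can be approximated numerically by discretizing $\theta$ on a grid, forming the posterior, and picking the action with the smallest posterior expected loss. The normal likelihood, flat prior, grid resolution, loss, and numeric values below are assumptions chosen purely for illustration.

```python
import numpy as np

def bayes_action(x, grid, prior, likelihood, loss):
    """Return the action on `grid` minimizing the posterior expected loss for observation x."""
    post = prior * likelihood(x, grid)                    # unnormalized posterior on the grid
    post /= post.sum()                                    # normalize
    exp_loss = np.array([np.sum(loss(grid, a) * post) for a in grid])
    return grid[np.argmin(exp_loss)]                      # Bayes action for this x

# Illustrative choices: x | theta ~ N(theta, 1), flat prior on the grid, squared-error loss.
grid = np.linspace(-5, 5, 2001)
prior = np.ones_like(grid) / grid.size
likelihood = lambda x, th: np.exp(-0.5 * (x - th) ** 2)
sq_loss = lambda th, a: (th - a) ** 2

print(bayes_action(x=1.3, grid=grid, prior=prior, likelihood=likelihood, loss=sq_loss))
# Under squared error the minimizer is the posterior mean (about 1.3 here, since the prior is flat).
```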
Minimum mean square error estimation
The most common risk function used for Bayesian estimation is the mean square error (MSE), also called squared error risk. The MSE is defined by $\mathrm{MSE} = E\left[(\hat{\theta}(x) - \theta)^2\right]$, where the expectation is taken over the joint distribution of $\theta$ and $x$.
Using the MSE as risk, the Bayes estimate of the unknown parameter is simply the mean of the posterior distribution, $\hat{\theta}(x) = E[\theta \mid x] = \int \theta \, p(\theta \mid x)\, d\theta.$
This is known as the minimum mean square error (MMSE) estimator. The Bayes risk, in this case, is the posterior variance.
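As a small simulation sketch (the normal prior and likelihood and all numeric values below are assumptions chosen for illustration), one can check that the posterior mean attains a lower MSE than using the raw observation:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, tau = 0.0, 1.0            # assumed prior: theta ~ N(mu, tau^2)
sigma = 2.0                   # assumed likelihood: x | theta ~ N(theta, sigma^2)

theta = rng.normal(mu, tau, size=100_000)
x = rng.normal(theta, sigma)

# Posterior mean for this conjugate model shrinks x toward the prior mean.
w = tau**2 / (tau**2 + sigma**2)
theta_mmse = w * x + (1 - w) * mu

print("MSE of x alone        :", np.mean((x - theta) ** 2))           # about sigma^2 = 4
print("MSE of posterior mean :", np.mean((theta_mmse - theta) ** 2))  # about sigma^2*tau^2/(sigma^2+tau^2) = 0.8
```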
Bayes estimators for conjugate priors
If there is no inherent reason to prefer one prior probability distribution over another, a conjugate prior is sometimes chosen for simplicity. A conjugate prior is defined as a prior distribution belonging to some parametric family, for which the resulting posterior distribution also belongs to the same family. This is an important property, since the Bayes estimator, as well as its statistical properties (variance, confidence interval, etc.), can all be derived from the posterior distribution.
Conjugate priors are especially useful for sequential estimation, where the posterior of the current measurement is used as the prior in the next measurement. In sequential estimation, unless a conjugate prior is used, the posterior distribution typically becomes more complex with each added measurement, and the Bayes estimator cannot usually be calculated without resorting to numerical methods.
Following are some examples of conjugate priors.
- If $x \mid \theta$ is normal, $x \mid \theta \sim N(\theta, \sigma^2)$, and the prior is normal, $\theta \sim N(\mu, \tau^2)$, then the posterior is also normal and the Bayes estimator under MSE is given by $\hat{\theta}(x) = \frac{\sigma^2}{\sigma^2 + \tau^2}\,\mu + \frac{\tau^2}{\sigma^2 + \tau^2}\,x.$
- If $x_1, \ldots, x_n$ are iid Poisson random variables, $x_i \mid \theta \sim P(\theta)$, and if the prior is Gamma distributed, $\theta \sim G(a, b)$ (shape $a$, scale $b$), then the posterior is also Gamma distributed, and the Bayes estimator under MSE is given by $\hat{\theta}(x) = \frac{a + \sum_{i=1}^{n} x_i}{n + 1/b}.$
- If $x_1, \ldots, x_n$ are iid uniformly distributed, $x_i \mid \theta \sim U(0, \theta)$, and if the prior is Pareto distributed, $\theta \sim Pa(\theta_0, a)$, then the posterior is also Pareto distributed, and the Bayes estimator under MSE is given by $\hat{\theta}(x) = \frac{(a + n)\,\max(\theta_0, x_1, \ldots, x_n)}{a + n - 1}.$
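As a small sketch of why conjugacy is convenient for sequential estimation (the starting prior, noise variance, and measurements below are assumed values chosen for illustration), the normal-mean case can be updated in closed form after every measurement, with the posterior of one step serving as the prior of the next:

```python
def normal_update(mu, tau2, x, sigma2):
    """Conjugate update for x | theta ~ N(theta, sigma2) with prior theta ~ N(mu, tau2).
    The posterior is again normal; its mean is the Bayes estimate under MSE."""
    post_tau2 = 1.0 / (1.0 / tau2 + 1.0 / sigma2)
    post_mu = post_tau2 * (mu / tau2 + x / sigma2)
    return post_mu, post_tau2

mu, tau2, sigma2 = 0.0, 10.0, 1.0          # assumed starting prior and known noise variance
for x in [1.2, 0.7, 1.5, 0.9]:             # assumed measurements
    mu, tau2 = normal_update(mu, tau2, x, sigma2)
    print(f"posterior mean = {mu:.3f}, posterior variance = {tau2:.3f}")
```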
Alternative risk functions
Risk functions are chosen depending on how one measures the distance between the estimate and the unknown parameter. The MSE is the most common risk function in use, primarily due to its simplicity. However, alternative risk functions are also occasionally used. The following are several examples of such alternatives. We denote the posterior generalized distribution function by $F$.
- A "linear" loss function, with $a > 0$, which yields the posterior median as the Bayes estimate: $L(\theta, \hat{\theta}) = a\,|\theta - \hat{\theta}|$, so that $F(\hat{\theta}(x) \mid X) = \tfrac{1}{2}$.
- Another "linear" loss function, which assigns different "weights" $a, b > 0$ to overestimation and underestimation. It yields a quantile from the posterior distribution, and is a generalization of the previous loss function: $L(\theta, \hat{\theta}) = a\,|\theta - \hat{\theta}|$ for $\theta - \hat{\theta} \ge 0$ and $b\,|\theta - \hat{\theta}|$ for $\theta - \hat{\theta} < 0$, so that $F(\hat{\theta}(x) \mid X) = \tfrac{a}{a+b}$.
- The following loss function is trickier: it yields either the posterior mode, or a point close to it, depending on the curvature and properties of the posterior distribution. Small values of the parameter $K > 0$ are recommended, in order to use the mode as an approximation (with $L > 0$): $L(\theta, \hat{\theta}) = 0$ if $|\theta - \hat{\theta}| < K$ and $L(\theta, \hat{\theta}) = L$ if $|\theta - \hat{\theta}| \ge K$.
Other loss functions can be conceived, although the mean squared error is the most widely used and validated.
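As a hedged numerical illustration of the loss functions above (the Gamma posterior used here is an arbitrary assumption), posterior draws make the corresponding Bayes estimates easy to read off: absolute error gives the posterior median, and the asymmetric linear loss gives the $a/(a+b)$ posterior quantile.

```python
import numpy as np

rng = np.random.default_rng(1)
post_draws = rng.gamma(shape=3.0, scale=2.0, size=200_000)   # assumed posterior samples

# Absolute-error ("linear") loss -> posterior median.
print("posterior median:", np.median(post_draws))

# Asymmetric linear loss, weight a on underestimation and b on overestimation
# -> the a/(a+b) posterior quantile.
a, b = 3.0, 1.0
print("a/(a+b) quantile:", np.quantile(post_draws, a / (a + b)))
```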
Generalized Bayes estimators
The prior distribution has thus far been assumed to be a true probability distribution, in that $\int \pi(\theta)\, d\theta = 1.$ However, occasionally this can be a restrictive requirement. For example, there is no distribution (covering the set $\mathbb{R}$ of all real numbers) for which every real number is equally likely. Yet, in some sense, such a "distribution" seems like a natural choice for a non-informative prior, i.e., a prior distribution which does not imply a preference for any particular value of the unknown parameter. One can still define a function $\pi(\theta) = 1$ for all $\theta$, but this would not be a proper probability distribution since it has infinite mass, $\int \pi(\theta)\, d\theta = \infty.$
Such measures, which are not probability distributions, are referred to as improper priors.
The use of an improper prior means that the Bayes risk is undefined (since the prior is not a probability distribution and we cannot take an expectation under it). As a consequence, it is no longer meaningful to speak of a Bayes estimator that minimizes the Bayes risk. Nevertheless, in many cases, one can define the posterior distribution $\pi(\theta \mid x) = \frac{p(x \mid \theta)\, \pi(\theta)}{\int p(x \mid \theta)\, \pi(\theta)\, d\theta}.$
This is a definition, and not an application of Bayes' theorem, since Bayes' theorem can only be applied when all distributions are proper. However, it is not uncommon for the resulting "posterior" to be a valid probability distribution. In this case, the posterior expected loss $\int L(\theta, a)\, \pi(\theta \mid x)\, d\theta$
is typically well-defined and finite. Recall that, for a proper prior, the Bayes estimator minimizes the posterior expected loss. When the prior is improper, an estimator which minimizes the posterior expected loss is referred to as a generalized Bayes estimator.
Example
A typical example concerns the estimation of a location parameter with a loss function of the type $L(a - \theta)$. Here $\theta$ is a location parameter, i.e., $p(x \mid \theta) = f(x - \theta)$.
It is common to use the improper prior $\pi(\theta) = 1$ in this case, especially when no other more subjective information is available. This yields the posterior $\pi(\theta \mid x) = f(x - \theta),$
so the posterior expected loss equals $E[L(a - \theta) \mid x] = \int L(a - \theta)\, f(x - \theta)\, d\theta.$
The generalized Bayes estimator is the value $a(x)$ which minimizes this expression for each $x$. This is equivalent to minimizing $\int L(a - \theta)\, f(x - \theta)\, d\theta$ for each $x$.  (1)
It can be shown that, in this case, the generalized Bayes estimator has the form $x + a_0$, for some constant $a_0$. To see this, let $a_0$ be the value minimizing (1) when $x = 0$. Then, given a different value $x_1$, we must minimize $\int L(a - \theta)\, f(x_1 - \theta)\, d\theta = \int L(a - x_1 - \theta')\, f(-\theta')\, d\theta'.$  (2)
This is identical to (1), except that $a$ has been replaced by $a - x_1$. Thus, the minimizing value satisfies $a - x_1 = a_0$, so that the optimal estimator has the form $a(x) = a_0 + x.$
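The shift-invariance argument above can also be checked numerically. In the sketch below, the density, loss function, and grids are assumptions chosen for illustration; the posterior expected loss is minimized on a grid for two different observations, and the minimizers differ exactly by the difference in $x$, confirming the form $a(x) = a_0 + x$.

```python
import numpy as np

theta = np.linspace(-10, 10, 4001)
loss = lambda u: np.where(u >= 0, 2.0 * u, -u)        # assumed asymmetric linear loss L(a - theta)

def gen_bayes(x):
    """Grid-minimize the posterior expected loss under the flat (improper) prior."""
    post = np.exp(-0.5 * (x - theta) ** 2)            # assumed f: standard normal kernel, posterior f(x - theta)
    actions = np.linspace(x - 5, x + 5, 2001)
    exp_loss = [np.sum(loss(a - theta) * post) for a in actions]
    return actions[np.argmin(exp_loss)]

a0 = gen_bayes(0.0)
print(a0, gen_bayes(3.0) - 3.0)   # the two printed values agree: the estimator is x + a0
```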
Empirical Bayes estimators
A Bayes estimator derived through the empirical Bayes method is called an empirical Bayes estimator. Empirical Bayes methods enable the use of auxiliary empirical data, from observations of related parameters, in the development of a Bayes estimator. This is done under the assumption that the estimated parameters are obtained from a common prior. For example, if independent observations of different parameters are performed, then the estimation performance of a particular parameter can sometimes be improved by using data from other observations.
There are parametric and non-parametric approaches to empirical Bayes estimation. Parametric empirical Bayes is usually preferable since it is more applicable and more accurate on small amounts of data.
Example
The following is a simple example of parametric empirical Bayes estimation. Given past observations $x_1, \ldots, x_n$ having conditional distribution $f(x_i \mid \theta_i)$, one is interested in estimating $\theta_{n+1}$ based on $x_{n+1}$. Assume that the $\theta_i$'s have a common prior $\pi$ which depends on unknown parameters. For example, suppose that $\pi$ is normal with unknown mean $\mu_\pi$ and variance $\sigma_\pi^2$. We can then use the past observations to determine the mean and variance of $\pi$ in the following way.
First, we estimate the mean $\mu_m$ and variance $\sigma_m^2$ of the marginal distribution of $x_1, \ldots, x_n$ using the maximum likelihood approach: $\hat{\mu}_m = \frac{1}{n} \sum_i x_i, \qquad \hat{\sigma}_m^2 = \frac{1}{n} \sum_i (x_i - \hat{\mu}_m)^2.$
Next, we use the relations $\mu_m = E_\pi[\mu_f(\theta)], \qquad \sigma_m^2 = E_\pi[\sigma_f^2(\theta)] + E_\pi[(\mu_f(\theta) - \mu_m)^2],$
where $\mu_f(\theta)$ and $\sigma_f(\theta)$ are the mean and standard deviation of the conditional distribution $f(x_i \mid \theta_i)$, which are assumed to be known. In particular, suppose that $\mu_f(\theta) = \theta$ and that $\sigma_f^2(\theta) = K$; we then have $\mu_\pi = \mu_m, \qquad \sigma_\pi^2 = \sigma_m^2 - K.$
Finally, we obtain the estimated moments of the prior, $\hat{\mu}_\pi = \hat{\mu}_m, \qquad \hat{\sigma}_\pi^2 = \hat{\sigma}_m^2 - K.$
For example, if $x_i \mid \theta_i \sim N(\theta_i, 1)$, and if we assume a normal prior (which is a conjugate prior in this case), we conclude that $\theta_{n+1} \sim N(\hat{\mu}_\pi, \hat{\sigma}_\pi^2)$, from which the Bayes estimator of $\theta_{n+1}$ based on $x_{n+1}$ can be calculated.
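A minimal simulation sketch of this procedure follows; the true prior parameters, the choice $K = 1$, the sample size, and the new observation are assumed values used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setup mirroring the example: x_i | theta_i ~ N(theta_i, K) with K = 1 known,
# and theta_i drawn from an unknown common prior N(mu_pi, sigma_pi^2).
mu_pi_true, sigma_pi_true, K = 2.0, 1.5, 1.0
n = 5000
theta = rng.normal(mu_pi_true, sigma_pi_true, size=n)
x = rng.normal(theta, np.sqrt(K))

# Step 1: ML estimates of the marginal mean and variance of the x_i.
mu_m = x.mean()
var_m = x.var()                        # marginal variance is sigma_pi^2 + K

# Step 2: moment relations give estimates of the prior parameters.
mu_pi_hat = mu_m
var_pi_hat = max(var_m - K, 0.0)

# Step 3: empirical Bayes (posterior mean) estimate of a new theta from a new observation.
x_new = 4.0
w = var_pi_hat / (var_pi_hat + K)
theta_hat = w * x_new + (1 - w) * mu_pi_hat
print(mu_pi_hat, var_pi_hat, theta_hat)
```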
Admissibility
Bayes rules having finite Bayes risk are typically admissible. The following are some specific examples of admissibility theorems.
- If a Bayes rule is unique then it is admissible. For example, as stated above, under mean squared error (MSE) the Bayes rule is unique and therefore admissible.
- If θ belongs to a discrete set, then all Bayes rules are admissible.
- If θ belongs to a continuous (non-discrete) set, and if the risk function R(θ,δ) is continuous in θ for every δ, then all Bayes rules are admissible.
By contrast, generalized Bayes rules often have undefined Bayes risk in the case of improper priors. These rules are often inadmissible and the verification of their admissibility can be difficult. For example, the generalized Bayes estimator of a location parameter θ based on Gaussian samples (described in the "Generalized Bayes estimators" section above) is inadmissible for dimension $p > 2$; this is known as Stein's phenomenon.
Asymptotic efficiency
Let θ be an unknown random variable, and suppose that $x_1, x_2, \ldots$ are iid samples with density $f(x_i \mid \theta)$. Let $\delta_n = \delta_n(x_1, \ldots, x_n)$ be a sequence of Bayes estimators of θ based on an increasing number of measurements. We are interested in analyzing the asymptotic performance of this sequence of estimators, i.e., the performance of $\delta_n$ for large n.
To this end, it is customary to regard θ as a deterministic parameter whose true value is $\theta_0$. Under specific conditions, for large samples (large values of n), the posterior density of θ is approximately normal. In other words, for large n, the effect of the prior probability on the posterior is negligible. Moreover, if $\delta_n$ is the Bayes estimator under MSE risk, then it is asymptotically unbiased and it converges in distribution to the normal distribution: $\sqrt{n}\,(\delta_n - \theta_0) \to N\!\left(0, \tfrac{1}{I(\theta_0)}\right),$
where $I(\theta_0)$ is the Fisher information of $\theta_0$.
It follows that the Bayes estimator δn under MSE is asymptotically efficient.
Another estimator which is asymptotically normal and efficient is the maximum likelihood estimator (MLE). The relations between the maximum likelihood and Bayes estimators can be shown in the following simple example.
Consider the estimator of θ based on a binomial sample x ~ b(θ,n), where θ denotes the probability of success. Assuming θ is distributed according to the conjugate prior, which in this case is the Beta distribution B(a,b), the posterior distribution is known to be B(a+x, b+n-x). Thus, the Bayes estimator under MSE is $\delta_n(x) = E[\theta \mid x] = \frac{a + x}{a + b + n}.$
The MLE in this case is $\delta_{\mathrm{MLE}} = x/n$, and so we get $\delta_n(x) = \frac{a + b}{a + b + n}\, E[\theta] + \frac{n}{a + b + n}\, \delta_{\mathrm{MLE}},$ where $E[\theta] = \frac{a}{a+b}$ is the mean of the prior distribution.
The last equation implies that, for n → ∞, the Bayes estimator (in the described problem) is close to the MLE.
On the other hand, when n is small, the prior information is still relevant to the decision problem and affects the estimate. To see the relative weight of the prior information, assume that a=b; in this case each measurement brings in 1 new bit of information; the formula above shows that the prior information has the same weight as a+b bits of the new information. In applications, one often knows very little about fine details of the prior distribution; in particular, there is no reason to assume that it coincides with B(a,b) exactly. In such a case, one possible interpretation of this calculation is: "there is a non-pathological prior distribution with the mean value 0.5 and the standard deviation d which gives the weight of prior information equal to 1/(4d2)-1 bits of new information."
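The convergence of the Bayes estimator toward the MLE in this binomial example can be observed directly; in the sketch below, the true success probability and the Beta(a, b) prior parameters are assumed values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true, a, b = 0.3, 4.0, 4.0          # assumed truth and prior

for n in [10, 100, 10_000]:
    x = rng.binomial(n, theta_true)
    mle = x / n
    bayes = (a + x) / (a + b + n)         # posterior mean of Beta(a + x, b + n - x)
    print(f"n={n:6d}  MLE={mle:.4f}  Bayes={bayes:.4f}")
# For large n the two estimates coincide; for small n the prior pulls the Bayes
# estimate toward the prior mean a/(a+b) = 0.5.
```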
Practical example of misapplication of Bayes estimators
For many years, the Internet Movie Database has used a formula for calculating the Top Rated 250 Titles which is claimed to give "a true Bayesian estimate": $W = \frac{Rv + Cm}{v + m},$
where:
- $W$ = weighted rating
- $R$ = average rating for the movie as a number from 0 to 10 (mean) = (Rating)
- $v$ = number of votes for the movie = (votes)
- $m$ = minimum votes required to be listed in the Top 250 (currently 3000)
- $C$ = the mean vote across the whole report (currently 6.9)
For the Top 250, only votes from regular voters are considered.
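A direct transcription of the formula (the ratings and vote counts in the usage example are made-up values; $m$ and $C$ are the figures quoted above):

```python
def imdb_weighted_rating(R, v, m=3000, C=6.9):
    """W = (v/(v+m))*R + (m/(v+m))*C: the film's mean vote shrunk toward the report-wide
    mean C, with m controlling the weight given to the prior information."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# A film rated 9.2 by 4000 voters versus one rated 9.6 by only 500 voters (made-up numbers):
print(imdb_weighted_rating(R=9.2, v=4000))   # ~8.21: many votes, stays close to its own mean
print(imdb_weighted_rating(R=9.6, v=500))    # ~7.29: few votes, pulled strongly toward C
```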
Comparing this formula with the one in the preceding section, one can see that m must have been related to the relative weight of the prior information in units of the new information given by one vote. Hence C must be the mean vote across the movies with more than 3000 votes, and m should be related to the deviation of votes in this pool.
For example, assume that a new vote brings in about 2 bits of information (one bit for whether it is above or below the average, and one bit for how far from the average it is; this assumes that votes are close to the average, but not very close). Then having m = 3000 corresponds to prior information weighting 6000 bits. To illustrate what kind of prior distribution such a giant weight might correspond to, consider again the distribution of the preceding section (it describes a very different measurement process, but the order of magnitude should be close); 6000 bits then correspond to a standard deviation d of about 1/155 of the possible range (1 to 10), i.e., about 1/17 of one vote.
Needless to say, such a small deviation is absurd: the minimal possible deviation, given integer votes and an average of 6.9, is 0.3. For example, if the actual deviation is about 0.7, this corresponds to prior information weighting close to 40 bits, and m being about 20. Of course, with such a small value of m, the formula above becomes practically indistinguishable from the common-sense formula W = R that one would expect anyway with such a high entrance threshold as 3000 votes.
As used, all that the formula does is give a major boost to films with significantly more than 3000 votes.

See also
- Admissible decision rule
- Recursive Bayesian estimation
- Empirical Bayes method
- Conjugate prior
- Generalized expected utility