Rao–Blackwell theorem
In statistics, the Rao–Blackwell theorem, sometimes referred to as the Rao–Blackwell–Kolmogorov theorem, is a result which characterizes the transformation of an arbitrarily crude estimator into an estimator that is optimal by the mean-squared-error criterion or any of a variety of similar criteria.

The Rao–Blackwell theorem states that if g(X) is any kind of estimator of a parameter θ, then the conditional expectation of g(X) given T(X), where T is a sufficient statistic, is typically a better estimator of θ, and is never worse. Sometimes one can very easily construct a very crude estimator g(X), and then evaluate that conditional expected value to get an estimator that is in various senses optimal.

The theorem is named after Calyampudi Radhakrishna Rao and David Blackwell. The process of transforming an estimator using the Rao–Blackwell theorem is sometimes called Rao–Blackwellization. The transformed estimator is called the Rao–Blackwell estimator.
Definitions
- An estimator δ(X) is an observable random variable (i.e. a statistic) used for estimating some unobservable quantity. For example, one may be unable to observe the average height of all male students at the University of X, but one may observe the heights of a random sample of 40 of them. The average height of those 40, the "sample average", may be used as an estimator of the unobservable "population average".
- A sufficient statistic T(X) is a statistic calculated from data X to estimate some parameter θ, with the property that no other statistic which can be calculated from the data X provides any additional information about θ. It is defined as an observable random variable such that the conditional probability distribution of all observable data X given T(X) does not depend on the unobservable parameter θ, such as the mean or standard deviation of the whole population from which the data X was taken. In the most frequently cited examples, the "unobservable" quantities are parameters that parametrize a known family of probability distributions according to which the data are distributed.
- In other words, a sufficient statistic T(X) for a parameter θ is a statistic such that the conditional distribution of the data X, given T(X), does not depend on the parameter θ.
- A Rao–Blackwell estimator δ₁(X) of an unobservable quantity θ is the conditional expected value E(δ(X) | T(X)) of some estimator δ(X) given a sufficient statistic T(X). Call δ(X) the "original estimator" and δ₁(X) the "improved estimator". It is important that the improved estimator be observable, i.e. that it not depend on θ. Generally, the conditional expected value of one function of these data given another function of these data does depend on θ, but the very definition of sufficiency given above entails that this one does not.
- The mean squared error of an estimator is the expected value of the square of its deviation from the unobservable quantity being estimated.
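In symbols, the last two definitions amount to the following (a restatement of the list above, with θ the quantity being estimated):

\[
\delta_1(X) = \operatorname{E}\!\left[\delta(X) \mid T(X)\right],
\qquad
\operatorname{MSE}(\delta) = \operatorname{E}\!\left[(\delta(X) - \theta)^2\right].
\]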
Mean-squared-error version
One case of the Rao–Blackwell theorem states:

- The mean squared error of the Rao–Blackwell estimator does not exceed that of the original estimator.
In other words,

\[
\operatorname{E}\!\left[(\delta_1(X) - \theta)^2\right] \le \operatorname{E}\!\left[(\delta(X) - \theta)^2\right].
\]
The essential tools of the proof, besides the definition above, are the law of total expectation and the fact that for any random variable Y, E(Y²) cannot be less than [E(Y)]². That inequality is a case of Jensen's inequality, although it may also be shown to follow instantly from the frequently mentioned fact that

\[
0 \le \operatorname{Var}(Y) = \operatorname{E}(Y^2) - [\operatorname{E}(Y)]^2.
\]
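Applied conditionally on T(X), these two tools give the whole argument; the following chain is a standard sketch rather than a quotation of the original proof:

\[
\begin{aligned}
\operatorname{E}\!\left[(\delta_1(X) - \theta)^2\right]
&= \operatorname{E}\!\left[\left(\operatorname{E}\!\left[\delta(X) - \theta \mid T(X)\right]\right)^2\right] \\
&\le \operatorname{E}\!\left[\operatorname{E}\!\left[(\delta(X) - \theta)^2 \mid T(X)\right]\right] \\
&= \operatorname{E}\!\left[(\delta(X) - \theta)^2\right],
\end{aligned}
\]

where the inequality is the conditional form of E(Y²) ≥ [E(Y)]² with Y = δ(X) − θ, and the final equality is the law of total expectation.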
Convex loss generalization
The more general version of the Rao–Blackwell theorem speaks of the "expected loss" or risk function:

\[
\operatorname{E}\!\left[L(\delta_1(X))\right] \le \operatorname{E}\!\left[L(\delta(X))\right],
\]
where the "loss function" L may be any convex function
Convex function
In mathematics, a real-valued function f defined on an interval is called convex if the graph of the function lies below the line segment joining any two points of the graph. Equivalently, a function is convex if its epigraph is a convex set...
. For the proof of the more general version, Jensen's inequality cannot be dispensed with.
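A minimal sketch of that step, assuming only that L is convex so that the conditional Jensen's inequality applies:

\[
L\!\left(\operatorname{E}\!\left[\delta(X) \mid T(X)\right]\right) \le \operatorname{E}\!\left[L(\delta(X)) \mid T(X)\right],
\]

and taking expectations of both sides yields the risk inequality above.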
Properties
The improved estimator is unbiased if and only if the original estimator is unbiased, as may be seen at once by using the law of total expectation. The theorem holds regardless of whether biased or unbiased estimators are used.
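Explicitly, the law of total expectation gives

\[
\operatorname{E}\!\left[\delta_1(X)\right] = \operatorname{E}\!\left[\operatorname{E}\!\left[\delta(X) \mid T(X)\right]\right] = \operatorname{E}\!\left[\delta(X)\right],
\]

so the two estimators always have the same expected value and hence the same bias.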
The theorem seems very weak: it says only that the Rao–Blackwell estimator is no worse than the original estimator. In practice, however, the improvement is often enormous.
Example
Phone calls arrive at a switchboard according to a Poisson process at an average rate of λ per minute. This rate is not observable, but the numbers X₁, ..., Xₙ of phone calls that arrived during n successive one-minute periods are observed. It is desired to estimate the probability e^−λ that the next one-minute period passes with no phone calls.
An extremely crude estimator of the desired probability is

\[
\delta_0 = \begin{cases} 1 & \text{if } X_1 = 0, \\ 0 & \text{otherwise,} \end{cases}
\]

i.e., it estimates this probability to be 1 if no phone calls arrived in the first minute and zero otherwise. Despite the apparent limitations of this estimator, the result given by its Rao–Blackwellization is a very good estimator.
The sum

\[
S_n = \sum_{i=1}^n X_i
\]

can be readily shown to be a sufficient statistic for λ, i.e., the conditional distribution of the data X₁, ..., Xₙ depends on λ only through this sum. Therefore, we find the Rao–Blackwell estimator

\[
\delta_1 = \operatorname{E}\!\left[\delta_0 \mid S_n = s_n\right].
\]
After doing some algebra we have

\[
\delta_1 = \operatorname{E}\!\left[\mathbf{1}\{X_1 = 0\} \,\middle|\, S_n = s_n\right]
= P\!\left(X_1 = 0 \mid S_n = s_n\right)
= \left(1 - \frac{1}{n}\right)^{s_n},
\]

using the fact that, given Sₙ = sₙ, each of the sₙ calls falls in the first minute independently with probability 1/n, so that X₁ is binomially distributed with parameters sₙ and 1/n.
Since the average number of calls arriving during the first n minutes is nλ, one might not be surprised if this estimator has a fairly high probability (if n is big) of being close to

\[
\left(1 - \frac{1}{n}\right)^{n\lambda} \approx e^{-\lambda}.
\]
So δ₁ is clearly a very much improved estimator of that last quantity. In fact, since Sₙ is complete and δ₀ is unbiased, δ₁ is the unique minimum-variance unbiased estimator by the Lehmann–Scheffé theorem.
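The improvement is easy to see numerically. The following is a minimal Monte Carlo sketch, not part of the article itself; it assumes NumPy is available, and the values λ = 2 and n = 10 are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)

lam, n, trials = 2.0, 10, 100_000      # assumed rate, minutes observed, repetitions
target = np.exp(-lam)                  # the quantity being estimated, e^-lambda

# Each row is one run of the experiment: counts X_1, ..., X_n of calls per minute.
X = rng.poisson(lam, size=(trials, n))

delta0 = (X[:, 0] == 0).astype(float)  # crude estimator: 1 if no calls in minute 1
Sn = X.sum(axis=1)                     # sufficient statistic S_n
delta1 = (1.0 - 1.0 / n) ** Sn         # Rao-Blackwell estimator (1 - 1/n)^{S_n}

for name, est in (("delta0", delta0), ("delta1", delta1)):
    print(f"{name}: mean = {est.mean():.4f}, MSE = {((est - target) ** 2).mean():.5f}")

Both estimators average out near e^−λ (both are unbiased), but the mean squared error of δ₁ is far smaller than that of δ₀.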
Idempotence
In case the sufficient statistic is also a complete statistic, i.e., one which "admits no unbiased estimator of zero", the Rao–Blackwell process is idempotent. Using it to improve the already improved estimator does not obtain a further improvement, but merely returns as its output the same improved estimator.
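In the simplest case, where one conditions again on the same statistic, this is immediate: δ₁(X) is already a function of T(X), so

\[
\operatorname{E}\!\left[\delta_1(X) \mid T(X)\right] = \delta_1(X).
\]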
Lehmann–Scheffé minimum variance
If the conditioning statistic is both complete and sufficient, and the starting estimator is unbiased, then the Rao–Blackwell estimator is the unique "best unbiased estimator": see the Lehmann–Scheffé theorem.
See also
- Basu's theorem – another result on complete sufficient and ancillary statistics