Wald's equation
In probability theory, Wald's equation, Wald's identity or Wald's lemma is an important identity that simplifies the calculation of the expected value of the sum of a random number of random quantities. In its simplest form, it relates the expectation of a sum of randomly many finite-mean, identically distributed random variables to the expected number of terms in the sum and the random variables' common expectation, under the condition that the number of terms in the sum is independent of the summands. The equation is named after the mathematician Abraham Wald. An identity for the second moment is given by the Blackwell–Girshick equation.

Statement

Let {Xn ; n ∈ ℕ} be an infinite sequence of real-valued, finite-mean random variables and let N be a nonnegative integer-valued random variable. Assume that:

1. N has finite expectation,
2. the random variables {Xn ; n ∈ ℕ} all have the same expectation,
3. E[Xn 1{N ≥ n}] = E[Xn] P(N ≥ n) for every natural number n, and
4. the infinite series satisfies

\sum_{n=1}^\infty\operatorname{E}\bigl[|X_n|1_{\{N\ge n\}}\bigr]<\infty.

Then the random sum

S:=\sum_{n=1}^N X_n

is integrable and

\operatorname{E}[S]=\operatorname{E}[N]\,\operatorname{E}[X_1].
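When N is independent of the summands and the Xn are i.i.d., the identity is easy to probe numerically. The following Monte Carlo sketch (the distributions, seed, and sample size are illustrative choices, not from the article) compares the empirical mean of S with E[N] E[X1]:

```python
import random

def sample_S(rng):
    """Draw N uniform on {0, ..., 10}, independent of the summands,
    then sum N i.i.d. exponential variables with mean 2."""
    n = rng.randint(0, 10)                              # E[N] = 5
    return sum(rng.expovariate(0.5) for _ in range(n))  # E[X1] = 2

rng = random.Random(12345)
trials = 200_000
empirical = sum(sample_S(rng) for _ in range(trials)) / trials
# Wald's equation predicts E[S] = E[N] * E[X1] = 5 * 2 = 10
print(empirical)  # close to 10
```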

Discussion of assumptions

Clearly, assumptions (1) and (2) are needed to formulate Wald's equation. Assumption (3) controls the amount of dependence allowed between the sequence (Xn)n∈ℕ and the number N of terms; see the counterexample below for the necessity. Assumption (4) is of a more technical nature, implying absolute convergence and therefore allowing arbitrary rearrangement of an infinite series in the proof. Assumption (4) can be strengthened to the simpler condition

5. there exists a constant C such that E[|Xn| 1{N ≥ n}] ≤ C P(N ≥ n) for all natural numbers n.

Indeed, using assumption (5),

\sum_{n=1}^\infty\operatorname{E}\bigl[|X_n|1_{\{N\ge n\}}\bigr]\le C\sum_{n=1}^\infty\operatorname{P}(N\ge n),

and the last series equals the expectation of N, which is finite by assumption (1). Therefore, (1) and (5) imply assumption (4). Assume in addition to (1) and (2) that
6. N is independent of the sequence (Xn)n∈ℕ and
7. there exists a constant C such that E[|Xn|] ≤ C for all natural numbers n.

Then all the assumptions (1)–(3) and (5), hence also (4), are satisfied. In particular, the conditions (2) and (7) are satisfied if
8. the random variables (Xn)n∈ℕ all have the same distribution.

Note that the random variables of the sequence (Xn)n∈ℕ don't need to be independent. The interesting point is to admit some dependence between the random number N of terms and the sequence (Xn)n∈ℕ. A standard version is to assume (1), (2), (7) and the existence of a filtration (Fn)n∈ℕ0 such that
9. N is a stopping time with respect to the filtration, and
10. Xn and Fn–1 are independent for every n ∈ ℕ.

Then (9) implies that the event {N ≥ n} = {N ≤ n – 1}^c is in Fn–1, hence by (10) independent of Xn. Together with (7), this implies (3) and (5). For convenience (see the proof below using the optional stopping theorem), and to specify the relation of the sequence (Xn)n∈ℕ to the filtration (Fn)n∈ℕ0, the following additional assumption is often imposed:
11. the sequence (Xn)n∈ℕ is adapted to the filtration (Fn)n∈ℕ, meaning that Xn is Fn-measurable for every n ∈ ℕ.

Note that (10) and (11) together imply that the random variables (Xn)n∈ℕ are independent.

Application

An application is in actuarial science when considering the total claim amount

S=\sum_{n=1}^N X_n

within a certain time period, say one year, arising from a random number N of individual insurance claims, whose sizes are described by the random variables (Xn)n∈ℕ. Under the above assumptions, Wald's equation can be used to calculate the expected total claim amount when information about the average claim number per year and the average claim size is available. Under stronger assumptions, and with more information about the underlying distributions, Panjer's recursion can be used to calculate the distribution of S.
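As a sketch of this application (the specific numbers are illustrative assumptions, not from the article): suppose the annual claim count N is Poisson with mean 4, independent of the i.i.d. claim sizes, which are exponential with mean 1000. Wald's equation then predicts an expected total claim amount of 4 × 1000 = 4000, which a simulation reproduces:

```python
import math
import random

def poisson(rng, lam):
    """Knuth's multiplicative method for a Poisson(lam) sample."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def total_claims(rng):
    n = poisson(rng, 4.0)                                    # E[N] = 4
    return sum(rng.expovariate(1 / 1000) for _ in range(n))  # mean claim size 1000

rng = random.Random(0)
trials = 100_000
mean_total = sum(total_claims(rng) for _ in range(trials)) / trials
print(mean_total)  # close to E[N] * E[X1] = 4000
```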

Example with dependent terms

Let N be an integrable, ℕ0-valued random variable, which is independent of the integrable, real-valued random variable Z with E[Z] = 0. Define Xn = (–1)^n Z for all n ∈ ℕ. Then assumptions (1), (2), (6), and (7) with C := E[|Z|] are satisfied, hence also (3) and (5), and Wald's equation applies. If the distribution of Z is not symmetric, then (8) does not hold. Note that, when Z is not almost surely equal to the zero random variable, then (10) and (11) cannot hold simultaneously for any filtration (Fn)n∈ℕ, because Z cannot be independent of itself: independence would imply E[Z²] = (E[Z])² = 0, hence Z = 0 almost surely.
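A small simulation of this example (the particular distributions of Z and N are arbitrary illustrative choices): let Z take the value 2 with probability 1/3 and –1 with probability 2/3, so that E[Z] = 0 but Z is not symmetric, and let N be uniform on {0, …, 5}, independent of Z. Wald's equation predicts E[S] = E[N]·E[X1] = 0:

```python
import random

def sample_S(rng):
    z = 2 if rng.random() < 1 / 3 else -1   # E[Z] = 2/3 - 2/3 = 0
    n = rng.randint(0, 5)                   # N independent of Z
    # X_k = (-1)^k * Z, so S = z * sum_{k=1}^{n} (-1)^k
    return z * sum((-1) ** k for k in range(1, n + 1))

rng = random.Random(7)
trials = 100_000
empirical = sum(sample_S(rng) for _ in range(trials)) / trials
print(empirical)  # close to 0, as Wald's equation predicts
```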

Example where the number of terms depends on the sequence

Let (Xn)n∈ℕ be a sequence of independent, symmetric, {–1,+1}-valued random variables. For every n ∈ ℕ let Fn be the σ-algebra generated by X1, ..., Xn, and define N = n when Xn is the first random variable taking the value +1. Note that P(N = n) = 1/2^n, hence E[N] < ∞ by the ratio test. The assumptions (1), (8), hence (2) and (7) with C = 1, (9), (10), and (11) hold, hence also (3) and (5), and Wald's equation applies. However, (6) does not hold, because N is defined in terms of the sequence (Xn)n∈ℕ. Intuitively, one might expect to have E[S] > 0 in this example, because the summation stops right after a +1, thereby apparently creating a positive bias. However, Wald's equation shows that this intuition is misleading.
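The failure of the positive-bias intuition can be checked directly. This sketch (seed and trial count are illustrative choices) simulates the stopped sum; Wald's equation predicts E[S] = E[N]·E[X1] = 2·0 = 0:

```python
import random

def stopped_sum(rng):
    """Sum fair ±1 variables, stopping right after the first +1."""
    total = 0
    while True:
        x = 1 if rng.random() < 0.5 else -1
        total += x
        if x == 1:
            return total

rng = random.Random(42)
trials = 100_000
empirical = sum(stopped_sum(rng) for _ in range(trials)) / trials
print(empirical)  # close to 0, despite the apparent positive bias
```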

A counterexample illustrating the necessity of assumption (3)

Consider a sequence (Xn)n∈ℕ of i.i.d. random variables, taking each of the two values 0 and 1 with probability ½ (actually, only X1 is needed in the following). Define N = 1 – X1. Then S is identically equal to zero, hence E[S] = 0, but E[X1] = ½ and E[N] = ½ and therefore Wald's equation does not hold. Indeed, the assumptions (1), (2) and (4) are satisfied, however, the equation in assumption (3) holds for all n ∈ ℕ except for n = 1.
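Since only X1 matters here, the failure can be verified by exhaustive enumeration rather than simulation (a minimal sketch of the computation):

```python
# X1 takes the values 0 and 1, each with probability 1/2; N = 1 - X1.
# S = sum_{i=1}^N X_i reduces to X1 when N >= 1 and to 0 when N = 0.
outcomes = [0, 1]
E_S = sum(0.5 * (x1 if (1 - x1) >= 1 else 0) for x1 in outcomes)
E_N = sum(0.5 * (1 - x1) for x1 in outcomes)
E_X1 = sum(0.5 * x1 for x1 in outcomes)
print(E_S, E_N * E_X1)  # prints 0.0 0.25: Wald's equation fails
```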

A counterexample illustrating the necessity of assumption (4)

Very similar to the second example above, let (Xn)n∈ℕ be a sequence of independent, symmetric random variables, where Xn takes each of the values 2^n and –2^n with probability ½. Let N be the first n ∈ ℕ such that Xn = 2^n. Then, as above, N has finite expectation, hence assumption (1) holds. Since E[Xn] = 0 for all n ∈ ℕ, assumption (2) holds. However, since S = 2 almost surely, Wald's equation cannot hold. Since N is a stopping time with respect to the filtration generated by (Xn)n∈ℕ, assumption (3) holds, see above. Therefore, only assumption (4) can fail; and indeed, since

\{N\ge n\}=\{X_i=-2^i \text{ for } i=1,\ldots,n-1\}

and therefore P(N ≥ n) = 1/2^{n–1} for every n ∈ ℕ, it follows that

\sum_{n=1}^\infty\operatorname{E}\bigl[|X_n|1_{\{N\ge n\}}\bigr]=\sum_{n=1}^\infty 2^n\,\operatorname{P}(N\ge n)=\sum_{n=1}^\infty 2=\infty.
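The divergence in assumption (4) can be made concrete: every term of the series E[|Xn| 1{N ≥ n}] = 2^n P(N ≥ n) equals 2. A short exact computation with fractions (a sketch of the arithmetic, checking the first few terms):

```python
from fractions import Fraction

# X_n takes the values 2**n and -2**n with probability 1/2 each,
# and N is the first n with X_n = 2**n, so P(N >= n) = (1/2)**(n-1).
terms = [2 ** n * Fraction(1, 2) ** (n - 1) for n in range(1, 6)]
print(terms)  # every term equals 2, so the series diverges
```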

A proof using the optional stopping theorem

Assume (1), (2), (7), (9), (10) and (11). Define the sequence of random variables

M_n = \sum_{i=1}^n (X_i - \operatorname{E}[X_i]),\quad n\in{\mathbb N}_0.

Assumption (10) implies that the conditional expectation of Xn given Fn–1 equals E[Xn] almost surely for every n ∈ ℕ, hence (Mn)n∈ℕ0 is a martingale with respect to the filtration (Fn)n∈ℕ0. Assumptions (7) and (9) make sure that we can apply the optional stopping theorem, hence

\operatorname{E}\biggl[\sum_{i=1}^N(X_i - \operatorname{E}[X_i])\biggr] = \operatorname{E}[M_N] = \operatorname{E}[M_0] = 0.

Rearranging and using assumption (2), it follows that

\operatorname{E}\biggl[\sum_{i=1}^NX_i\biggr]=\operatorname{E}[N]\,\operatorname{E}[X_1].

Remark: If we drop assumption (2), i.e. we no longer assume that all the variables Xn have the same expectation, the same argument still yields

\operatorname{E}\biggl[\sum_{i=1}^NX_i\biggr]=\operatorname{E}\biggl[\sum_{i=1}^N\operatorname{E}[X_i]\biggr].

General proof

This proof uses only Lebesgue's monotone and dominated convergence theorems. We prove the statement as given above in two steps.

Step 1: We first show that the random sum S is integrable. Define the partial sums

S_i=\sum_{n=1}^i X_n,\quad i\in{\mathbb N}_0.

Since N takes its values almost surely in ℕ0 and since S0 = 0, it follows that

|S|=\sum_{i=1}^\infty|S_i|\,1_{\{N=i\}}\quad\text{almost surely.}

The Lebesgue monotone convergence theorem implies that

\operatorname{E}[|S|]=\sum_{i=1}^\infty\operatorname{E}[|S_i|\,1_{\{N=i\}}].

By the triangle inequality,

|S_i|\le\sum_{n=1}^i|X_n|,\quad i\in{\mathbb N}.

Using this upper estimate and changing the order of summation (which is permitted because all terms are non-negative), we obtain

\operatorname{E}[|S|]\le\sum_{n=1}^\infty\sum_{i=n}^\infty\operatorname{E}[|X_n|\,1_{\{N=i\}}].

Using the monotone convergence theorem again, this simplifies to

\operatorname{E}[|S|]\le\sum_{n=1}^\infty\operatorname{E}[|X_n|\,1_{\{N\ge n\}}].

By assumption (4), this infinite series converges, hence S is integrable.

Step 2: To prove Wald's equation, we essentially go through the same steps again without the absolute value, making use of the integrability of the random sum S. Using the dominated convergence theorem and the definition of the partial sums, it follows that

\operatorname{E}[S]=\sum_{i=1}^\infty\operatorname{E}[S_i 1_{\{N=i\}}]=\sum_{i=1}^\infty\sum_{n=1}^i\operatorname{E}[X_n 1_{\{N=i\}}].

Due to the absolute convergence proved above using assumption (4), we may rearrange the summation and obtain

\operatorname{E}[S]=\sum_{n=1}^\infty\sum_{i=n}^\infty\operatorname{E}[X_n 1_{\{N=i\}}]=\sum_{n=1}^\infty\operatorname{E}[X_n 1_{\{N\ge n\}}],

where we used the dominated convergence theorem for the second equality. Using first assumption (3) and then assumption (2), it follows that

\operatorname{E}[S]=\sum_{n=1}^\infty\operatorname{E}[X_n]\operatorname{P}(N\ge n)=\operatorname{E}[X_1]\sum_{n=1}^\infty\operatorname{P}(N\ge n).

The remaining series is the expectation of N, which is finite by assumption (1). This completes the proof.

Generalizations

• Wald's equation can be transferred to R^d-valued random variables (Xn)n∈ℕ by applying the one-dimensional version to every component.
• If (Xn)n∈ℕ are Bochner-integrable random variables taking values in a Banach space, then the general proof above can be adjusted accordingly.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 