Wald's equation
In probability theory, Wald's equation, Wald's identity or Wald's lemma is an important identity that simplifies the calculation of the expected value of the sum of a random number of random quantities. In its simplest form, it relates the expectation of a sum of randomly many finite-mean, identically distributed random variables to the expected number of terms in the sum and the random variables' common expectation, under the condition that the number of terms in the sum is independent of the summands. The equation is named after the mathematician Abraham Wald. An identity for the second moment is given by the Blackwell–Girshick equation.

Statement

Let {Xn ; n ∈ ℕ} be an infinite sequence of real-valued, finite-mean random variables and let N be a nonnegative integer-valued random variable. Assume that:

1. N has finite expectation,
2. the random variables {Xn ; n ∈ ℕ} all have the same expectation,
3. E[Xn 1{N ≥ n}] = E[Xn] P(N ≥ n) for every natural number n, and
4. the infinite series satisfies

\sum_{n=1}^\infty\operatorname{E}\bigl[|X_n|1_{\{N\ge n\}}\bigr]<\infty.

Then the random sum

S:=\sum_{n=1}^N X_n

is integrable and

\operatorname{E}[S]=\operatorname{E}[N]\,\operatorname{E}[X_1].
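When N is independent of the summands and the Xn are i.i.d., the identity is easy to probe numerically. The following Monte Carlo sketch (the distributions, seed, and sample size are illustrative choices, not from the article) compares the empirical mean of S with E[N] E[X1]:

```python
import random

def sample_S(rng):
    """Draw N uniform on {0, ..., 10}, independent of the summands,
    then sum N i.i.d. exponential variables with mean 2."""
    n = rng.randint(0, 10)                              # E[N] = 5
    return sum(rng.expovariate(0.5) for _ in range(n))  # E[X1] = 2

rng = random.Random(12345)
trials = 200_000
empirical = sum(sample_S(rng) for _ in range(trials)) / trials
# Wald's equation predicts E[S] = E[N] * E[X1] = 5 * 2 = 10
print(empirical)  # close to 10
```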

Discussion of assumptions

Clearly, assumptions (1) and (2) are needed to formulate Wald's equation. Assumption (3) controls the amount of dependence allowed between the sequence (Xn)n∈ℕ and the number N of terms; see the counterexample below for the necessity. Assumption (4) is of a more technical nature, implying absolute convergence and therefore allowing arbitrary rearrangement of an infinite series in the proof. Assumption (4) can be strengthened to the simpler condition

5. there exists a constant C such that E[|Xn| 1{N ≥ n}] ≤ C P(N ≥ n) for all natural numbers n.

Indeed, using assumption (5),

\sum_{n=1}^\infty\operatorname{E}\bigl[|X_n|1_{\{N\ge n\}}\bigr]\le C\sum_{n=1}^\infty\operatorname{P}(N\ge n),

and the last series equals the expectation of N, which is finite by assumption (1). Therefore, (1) and (5) imply assumption (4). Assume in addition to (1) and (2) that
6. N is independent of the sequence (Xn)n∈ℕ and
7. there exists a constant C such that E[|Xn|] ≤ C for all natural numbers n.

Then all the assumptions (1)–(3) and (5), hence also (4), are satisfied. In particular, the conditions (2) and (7) are satisfied if
8. the random variables (Xn)n∈ℕ all have the same distribution.

Note that the random variables of the sequence (Xn)n∈ℕ don't need to be independent. The interesting point is to admit some dependence between the random number N of terms and the sequence (Xn)n∈ℕ. A standard version is to assume (1), (2), (7) and the existence of a filtration (Fn)n∈ℕ0 such that
9. N is a stopping time with respect to the filtration, and
10. Xn and Fn–1 are independent for every n ∈ ℕ.

Then (9) implies that the event {N ≥ n} = {N ≤ n – 1}^c is in Fn–1, hence by (10) independent of Xn. Together with (7), this implies (3) and (5). For convenience (see the proof below using the optional stopping theorem), and to specify the relation of the sequence (Xn)n∈ℕ to the filtration (Fn)n∈ℕ0, the following additional assumption is often imposed:
11. the sequence (Xn)n∈ℕ is adapted to the filtration (Fn)n∈ℕ, meaning that Xn is Fn-measurable for every n ∈ ℕ.

Note that (10) and (11) together imply that the random variables (Xn)n∈ℕ are independent.

Application

An application is in actuarial science when considering the total claim amount

S=\sum_{n=1}^N X_n

within a certain time period, say one year, arising from a random number N of individual insurance claims, whose sizes are described by the random variables (Xn)n∈ℕ. Under the above assumptions, Wald's equation can be used to calculate the expected total claim amount when information about the average claim number per year and the average claim size is available. Under stronger assumptions, and with more information about the underlying distributions, Panjer's recursion can be used to calculate the distribution of S.
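As a sketch of this application (the specific numbers are illustrative assumptions, not from the article): suppose the annual claim count N is Poisson with mean 4, independent of the i.i.d. claim sizes, which are exponential with mean 1000. Wald's equation then predicts an expected total claim amount of 4 × 1000 = 4000, which a simulation reproduces:

```python
import math
import random

def poisson(rng, lam):
    """Knuth's multiplicative method for a Poisson(lam) sample."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def total_claims(rng):
    n = poisson(rng, 4.0)                                    # E[N] = 4
    return sum(rng.expovariate(1 / 1000) for _ in range(n))  # mean claim size 1000

rng = random.Random(0)
trials = 100_000
mean_total = sum(total_claims(rng) for _ in range(trials)) / trials
print(mean_total)  # close to E[N] * E[X1] = 4000
```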

Example with dependent terms

Let N be an integrable, ℕ0-valued random variable, which is independent of the integrable, real-valued random variable Z with E[Z] = 0. Define Xn = (–1)^n Z for all n ∈ ℕ. Then assumptions (1), (2), (6), and (7) with C := E[|Z|] are satisfied, hence also (3) and (5), and Wald's equation applies. If the distribution of Z is not symmetric, then (8) does not hold. Note that, when Z is not almost surely equal to the zero random variable, then (10) and (11) cannot hold simultaneously for any filtration (Fn)n∈ℕ, because Z cannot be independent of itself: independence would imply E[Z²] = (E[Z])² = 0, hence Z = 0 almost surely.
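A small simulation of this example (the particular distributions of Z and N are arbitrary illustrative choices): let Z take the value 2 with probability 1/3 and –1 with probability 2/3, so that E[Z] = 0 but Z is not symmetric, and let N be uniform on {0, …, 5}, independent of Z. Wald's equation predicts E[S] = E[N]·E[X1] = 0:

```python
import random

def sample_S(rng):
    z = 2 if rng.random() < 1 / 3 else -1   # E[Z] = 2/3 - 2/3 = 0
    n = rng.randint(0, 5)                   # N independent of Z
    # X_k = (-1)^k * Z, so S = z * sum_{k=1}^{n} (-1)^k
    return z * sum((-1) ** k for k in range(1, n + 1))

rng = random.Random(7)
trials = 100_000
empirical = sum(sample_S(rng) for _ in range(trials)) / trials
print(empirical)  # close to 0, as Wald's equation predicts
```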

Example where the number of terms depends on the sequence

Let (Xn)n∈ℕ be a sequence of independent, symmetric, {–1,+1}-valued random variables. For every n ∈ ℕ let Fn be the σ-algebra generated by X1, ..., Xn, and define N = n when Xn is the first random variable taking the value +1. Note that P(N = n) = 1/2^n, hence E[N] < ∞ by the ratio test. The assumptions (1), (8), hence (2) and (7) with C = 1, (9), (10), and (11) hold, hence also (3) and (5), and Wald's equation applies. However, (6) does not hold, because N is defined in terms of the sequence (Xn)n∈ℕ. Intuitively, one might expect to have E[S] > 0 in this example, because the summation stops right after a +1, thereby apparently creating a positive bias. However, Wald's equation shows that this intuition is misleading.
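The failure of the positive-bias intuition can be checked directly. This sketch (seed and trial count are illustrative choices) simulates the stopped sum; Wald's equation predicts E[S] = E[N]·E[X1] = 2·0 = 0:

```python
import random

def stopped_sum(rng):
    """Sum fair ±1 variables, stopping right after the first +1."""
    total = 0
    while True:
        x = 1 if rng.random() < 0.5 else -1
        total += x
        if x == 1:
            return total

rng = random.Random(42)
trials = 100_000
empirical = sum(stopped_sum(rng) for _ in range(trials)) / trials
print(empirical)  # close to 0, despite the apparent positive bias
```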

A counterexample illustrating the necessity of assumption (3)

Consider a sequence (Xn)n∈ℕ of i.i.d. random variables, taking each of the two values 0 and 1 with probability ½ (actually, only X1 is needed in the following). Define N = 1 – X1. Then S is identically equal to zero, hence E[S] = 0, but E[X1] = ½ and E[N] = ½ and therefore Wald's equation does not hold. Indeed, the assumptions (1), (2) and (4) are satisfied, however, the equation in assumption (3) holds for all n ∈ ℕ except for n = 1.
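Since only X1 matters here, the failure can be verified by exhaustive enumeration rather than simulation (a minimal sketch of the computation):

```python
# X1 takes the values 0 and 1, each with probability 1/2; N = 1 - X1.
# S = sum_{i=1}^N X_i reduces to X1 when N >= 1 and to 0 when N = 0.
outcomes = [0, 1]
E_S = sum(0.5 * (x1 if (1 - x1) >= 1 else 0) for x1 in outcomes)
E_N = sum(0.5 * (1 - x1) for x1 in outcomes)
E_X1 = sum(0.5 * x1 for x1 in outcomes)
print(E_S, E_N * E_X1)  # prints 0.0 0.25: Wald's equation fails
```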

A counterexample illustrating the necessity of assumption (4)

Very similar to the second example above, let (Xn)n∈ℕ be a sequence of independent, symmetric random variables, where Xn takes each of the values 2^n and –2^n with probability ½. Let N be the first n ∈ ℕ such that Xn = 2^n. Then, as above, N has finite expectation, hence assumption (1) holds. Since E[Xn] = 0 for all n ∈ ℕ, assumption (2) holds. However, since S = 2 almost surely, Wald's equation cannot hold. Since N is a stopping time with respect to the filtration generated by (Xn)n∈ℕ, assumption (3) holds, see above. Therefore, only assumption (4) can fail; and indeed, since

\{N\ge n\}=\{X_i=-2^i \text{ for } i=1,\ldots,n-1\}

and therefore P(N ≥ n) = 1/2^{n–1} for every n ∈ ℕ, it follows that

\sum_{n=1}^\infty\operatorname{E}\bigl[|X_n|1_{\{N\ge n\}}\bigr]=\sum_{n=1}^\infty 2^n\,\operatorname{P}(N\ge n)=\sum_{n=1}^\infty 2=\infty.
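The divergence in assumption (4) can be made concrete: every term of the series E[|Xn| 1{N ≥ n}] = 2^n P(N ≥ n) equals 2. A short exact computation with fractions (a sketch of the arithmetic, checking the first few terms):

```python
from fractions import Fraction

# X_n takes the values 2**n and -2**n with probability 1/2 each,
# and N is the first n with X_n = 2**n, so P(N >= n) = (1/2)**(n-1).
terms = [2 ** n * Fraction(1, 2) ** (n - 1) for n in range(1, 6)]
print(terms)  # every term equals 2, so the series diverges
```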

A proof using the optional stopping theorem

Assume (1), (2), (7), (9), (10) and (11). Define the sequence of random variables

M_n = \sum_{i=1}^n (X_i - \operatorname{E}[X_i]),\quad n\in{\mathbb N}_0.

Assumption (10) implies that the conditional expectation of Xn given Fn–1 equals E[Xn] almost surely for every n ∈ ℕ, hence (Mn)n∈ℕ0 is a martingale with respect to the filtration (Fn)n∈ℕ0. Assumptions (7) and (9) make sure that we can apply the optional stopping theorem, hence

\operatorname{E}\biggl[\sum_{i=1}^N(X_i - \operatorname{E}[X_i])\biggr] = \operatorname{E}[M_N] = \operatorname{E}[M_0] = 0.

Rearranging and using assumption (2), it follows that

\operatorname{E}\biggl[\sum_{i=1}^NX_i\biggr]=\operatorname{E}[N]\,\operatorname{E}[X_1].

Remark: If we drop assumption (2), i.e. we no longer assume that all the variables Xn have the same expectation, the same argument still yields

\operatorname{E}\biggl[\sum_{i=1}^NX_i\biggr]=\operatorname{E}\biggl[\sum_{i=1}^N\operatorname{E}[X_i]\biggr].

General proof

This proof uses only Lebesgue's monotone and dominated convergence theorems. We prove the statement as given above in two steps.

Step 1: We first show that the random sum S is integrable. Define the partial sums

S_i=\sum_{n=1}^i X_n,\quad i\in{\mathbb N}_0.

Since N takes its values almost surely in ℕ0 and since S0 = 0, it follows that

|S|=\sum_{i=1}^\infty|S_i|\,1_{\{N=i\}}\quad\text{almost surely.}

The Lebesgue monotone convergence theorem implies that

\operatorname{E}[|S|]=\sum_{i=1}^\infty\operatorname{E}[|S_i|\,1_{\{N=i\}}].

By the triangle inequality,

|S_i|\le\sum_{n=1}^i|X_n|,\quad i\in{\mathbb N}.

Using this upper estimate and changing the order of summation (which is permitted because all terms are non-negative), we obtain

\operatorname{E}[|S|]\le\sum_{n=1}^\infty\sum_{i=n}^\infty\operatorname{E}[|X_n|\,1_{\{N=i\}}].

Using the monotone convergence theorem again, this simplifies to

\operatorname{E}[|S|]\le\sum_{n=1}^\infty\operatorname{E}[|X_n|\,1_{\{N\ge n\}}].

By assumption (4), this infinite series converges, hence S is integrable.

Step 2: To prove Wald's equation, we essentially go through the same steps again without the absolute value, making use of the integrability of the random sum S. Using the dominated convergence theorem and the definition of the partial sums, it follows that

\operatorname{E}[S]=\sum_{i=1}^\infty\operatorname{E}[S_i 1_{\{N=i\}}]=\sum_{i=1}^\infty\sum_{n=1}^i\operatorname{E}[X_n 1_{\{N=i\}}].

Due to the absolute convergence proved above using assumption (4), we may rearrange the summation and obtain

\operatorname{E}[S]=\sum_{n=1}^\infty\sum_{i=n}^\infty\operatorname{E}[X_n 1_{\{N=i\}}]=\sum_{n=1}^\infty\operatorname{E}[X_n 1_{\{N\ge n\}}],

where we used the dominated convergence theorem for the second equality. Using first assumption (3) and then assumption (2), it follows that

\operatorname{E}[S]=\sum_{n=1}^\infty\operatorname{E}[X_n]\operatorname{P}(N\ge n)=\operatorname{E}[X_1]\sum_{n=1}^\infty\operatorname{P}(N\ge n).

The remaining series is the expectation of N, which is finite by assumption (1). This completes the proof.

Generalizations

• Wald's equation can be transferred to R^d-valued random variables (Xn)n∈ℕ by applying the one-dimensional version to every component.
• If (Xn)n∈ℕ are Bochner-integrable random variables taking values in a Banach space, then the general proof above can be adjusted accordingly.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 