Karhunen-Loève theorem - AbsoluteAstronomy.com

Stochastic process

In probability theory, a stochastic process , or sometimes random process, is the counterpart to a deterministic process...

es, the Karhunen–Loève theorem (named after Kari Karhunen

Kari Karhunen

Kari Karhunen was a probabilist and a mathematical statistician, of Finnish origin. His name is known to probabilists and statisticians because of the Karhunen–Loève theorem and Karhunen–Loève transform....

and Michel Loève

Michel Loève

Michel Loève was a French American probabilist and a mathematical statistician, of Palestinian Jewish origin. His name is known to probabilists and statisticians because of the Karhunen–Loève theorem and Karhunen–Loève transform.Michel Loève was born in Jaffa, Palestine in 1907, during the Ottoman...

) is a representation of a stochastic process as an infinite linear combination of orthogonal functions, analogous to a Fourier series

Fourier series

In mathematics, a Fourier series decomposes periodic functions or periodic signals into the sum of a set of simple oscillating functions, namely sines and cosines...

representation of a function on a bounded interval. Stochastic processes given by infinite series of this form were considered earlier by Damodar Dharmananda Kosambi. There exist many such expansions of a stochastic process: if the process is indexed over [a, b], any orthonormal basis

Orthonormal basis

In mathematics, particularly linear algebra, an orthonormal basis for inner product space V with finite dimension is a basis for V whose vectors are orthonormal. For example, the standard basis for a Euclidean space Rn is an orthonormal basis, where the relevant inner product is the dot product of...

of L²([a, b]) yields an expansion thereof in that form. The importance of the Karhunen–Loève theorem is that it yields the best such basis in the sense that it minimizes the total mean squared error

Mean squared error

In statistics, the mean squared error of an estimator is one of many ways to quantify the difference between values implied by a kernel density estimator and the true values of the quantity being estimated. MSE is a risk function, corresponding to the expected value of the squared error loss or...

.

In contrast to a Fourier series where the coefficients are real numbers and the expansion basis consists of sinusoidal functions

Trigonometric function

In mathematics, the trigonometric functions are functions of an angle. They are used to relate the angles of a triangle to the lengths of the sides of a triangle...

(that is, sine

Sine

In mathematics, the sine function is a function of an angle. In a right triangle, sine gives the ratio of the length of the side opposite to an angle to the length of the hypotenuse.Sine is usually listed first amongst the trigonometric functions....

and cosine functions), the coefficients in the Karhunen–Loève theorem are random variable

Random variable

In probability and statistics, a random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. Formally, it is a function from a probability space, typically to the real numbers, which is measurable functionmeasurable...

s and the expansion basis depends on the process. In fact, the orthogonal basis functions used in this representation are determined by the covariance function

Covariance function

In probability theory and statistics, covariance is a measure of how much two variables change together and the covariance function describes the variance of a random variable process or field...

of the process. One can think that the Karhunen–Loève transform adapts to the process in order to produce the best possible basis for its expansion.

In the case of a centered stochastic process {X_t}_{t ∈ [a, b]} (where centered means that the expectations E(X_t) are defined and equal to 0 for all values of the parameter t in [a, b]) satisfying a technical continuity condition, X_t admits a decomposition

where Z_{k are pairwise uncorrelatedUncorrelated
In probability theory and statistics, two real-valued random variables are said to be uncorrelated if their covariance is zero. Uncorrelatedness is by definition pairwise; i.e...

random variables and the functions e_k are continuous real-valued functions on [a, b] that are pairwise orthogonal in L²[a, b]. It is therefore sometimes said that the expansion is bi-orthogonal since the random coefficients Z_{k are orthogonal in the probability space while the deterministic functions e_k are orthogonal in the time domain. The general case of a process X_t that is not centered can be brought back to the case of a centered process by considering (X_t − E(X_t)) which is a centered process.

Moreover, if the process is GaussianGaussian process
In probability theory and statistics, a Gaussian process is a stochastic process whose realisations consist of random values associated with every point in a range of times such that each such random variable has a normal distribution...

, then the random variables Z_k are Gaussian and stochastically independent. This result generalizes the Karhunen–Loève transform. An important example of a centered real stochastic process on [0,1] is the Wiener processWiener process
In mathematics, the Wiener process is a continuous-time stochastic process named in honor of Norbert Wiener. It is often called standard Brownian motion, after Robert Brown...

; the Karhunen–Loève theorem can be used to provide a canonical orthogonal representation for it. In this case the expansion consists of sinusoidal functions.

The above expansion into uncorrelated random variables is also known as the Karhunen–Loève expansion or Karhunen–Loève decomposition. The empiricalStatistic
A statistic is a single measure of some attribute of a sample . It is calculated by applying a function to the values of the items comprising the sample which are known together as a set of data.More formally, statistical theory defines a statistic as a function of a sample where the function...

version (i.e., with the coefficients computed from a sample) is known as the Karhunen–Loève transform (KLT), principal component analysis, proper orthogonal decomposition (POD), Empirical orthogonal functionsEmpirical orthogonal functions
In statistics and signal processing, the method of empirical orthogonal function analysis is a decomposition of a signal or data set in terms of orthogonal basis functions which are determined from the data. It is the same as performing a principal components analysis on the data, except that the...

(a term used in meteorologyMeteorology
Meteorology is the interdisciplinary scientific study of the atmosphere. Studies in the field stretch back millennia, though significant progress in meteorology did not occur until the 18th century. The 19th century saw breakthroughs occur after observing networks developed across several countries...

and geophysicsGeophysics
Geophysics is the physics of the Earth and its environment in space; also the study of the Earth using quantitative physical methods. The term geophysics sometimes refers only to the geological applications: Earth's shape; its gravitational and magnetic fields; its internal structure and...

), or the HotellingHarold Hotelling
Harold Hotelling was a mathematical statistician and an influential economic theorist.He was Associate Professor of Mathematics at Stanford University from 1927 until 1931, a member of the faculty of Columbia University from 1931 until 1946, and a Professor of Mathematical Statistics at the...

transform.

Formulation

Throughout this article, we will consider a square integrable zero-mean random process X_t defined over a probability space (Ω,F,P) and indexed over a closed interval [a, b], with covariance function K_X(s,t). We thus have:

We associate to K_X a linear operator T_{K_X} defined in the following way:
Since T_{K_X} is a linear operator, it makes sense to talk about its eigenvalues λ_k and eigenfunctions e_k, which are found solving the homogeneous Fredholm integral equationIntegral equation
In mathematics, an integral equation is an equation in which an unknown function appears under an integral sign. There is a close connection between differential and integral equations, and some problems may be formulated either way...

of the second kind

Statement of the theorem
Theorem. Let X_t be a zero-mean square integrable stochastic process defined over a probability space (Ω,F,P) and indexed over a closed and bounded interval [a, b], with continuous covariance function K_X(s,t).

Then K_X(s,t) is a Mercer kernelMercer's theorem
In mathematics, specifically functional analysis, Mercer's theorem is a representation of a symmetric positive-definite function on a square as a sum of a convergent sequence of product functions. This theorem, presented in , is one of the most notable results of the work of James Mercer...

and letting e_k be an orthonormal basis of L²([a, b]) formed by the eigenfunctions of T_{K_X} with respective eigenvalues λ_k, X_t admits the following representation

where the convergence is in L², uniform in t and

Furthermore, the random variables Z_k have zero-mean, are uncorrelated and have variance λ_k

Note that by generalizations of Mercer's theorem we can replace the interval [a, b] with other compact spaces C and the Lebesgue measure on [a, b] with a Borel measure whose support is C.

Proof

The covariance function K_X satisfies the definition of a Mercer kernel. By Mercer's theoremMercer's theorem
In mathematics, specifically functional analysis, Mercer's theorem is a representation of a symmetric positive-definite function on a square as a sum of a convergent sequence of product functions. This theorem, presented in , is one of the most notable results of the work of James Mercer...

, there consequently exists a set {λ_k,e_k(t)} of eigenvalues and eigenfunctions of T_{K_X} forming an orthonormal basis of L²([a,b]), and K_X can be expressed as

The process X_t can be expanded in terms of the eigenfunctions e_k as:
where the coefficients (random variables) Z_k are given by the projection of X_t on the respective eigenfunctions

We may then derive
and:
where we have used the fact that the e_k are eigenfunctions of T_{K_X} and are orthonormal.

Let us now show that the convergence is in L²:
let .
which goes to 0 by Mercer's theorem.

Special case: Gaussian distribution
Since the limit in the mean of jointly Gaussian random variables is jointly Gaussian, and jointly Gaussian random (centered) variables are independent if and only if they are orthogonal, we can also conclude:

Theorem. The variables Z_i have a joint Gaussian distribution and are stochastically independent if the original process {X_t}_t is Gaussian.

In the gaussian case, since the variables Z_i are independent, we can say more:

almost surely.

The Karhunen-Loève transform decorrelates the process
This is a consequence of the independence of the Z_k.

The Karhunen-Loève expansion minimizes the total mean square error
In the introduction, we mentioned that the truncated Karhunen-Loeve expansion was the best approximation of the original process in the sense that it reduces the total mean-square error resulting of its truncation. Because of this property, it is often said that the KL transform optimally compacts the energy.

More specifically, given any orthonormal basis {f_k} of L²([a, b]), we may decompose the process X_t as:

where

and we may approximate X_t by the finite sum for some integer N.

Claim.

Of all such approximations, the KL approximation is the one that minimizes the total mean square error (provided we have arranged the eigenvalues in decreasing order).

Explained variance
An important observation is that since the random coefficients Z_k of the KL expansion are uncorrelated, the Bienaymé formula asserts that the variance of X_t is simply the sum of the variances of the individual components of the sum:

Integrating over [a, b] and using the orthonormality of the e_k, we obtain that the total variance of the process is:

In particular, the total variance of the N-truncated approximation is . As a result, the N-truncated expansion explains of the variance; and if we are content with an approximation that explains, say, 95% of the variance, then we just have to determine an such that .

The Karhunen-Loève expansion has the minimum representation entropy property
Principal Component Analysis
We have established the Karhunen-Loève theorem and derived a few properties thereof. We also noted that one hurdle in its application was the numerical cost of determining the eigenvalues and eigenfunctions of its covariance operator through the Fredholm integral equation of the second kind .

However, when applied to a discrete and finite process , the problem takes a much simpler form and standard algebra can be used to carry out the calculations.

Note that a continuous process can also be sampled at N points in time in order to reduce the problem to a finite version.

We henceforth consider a random N-dimensional vector . As mentioned above, X could contain N samples of a signal but it can hold many more representations depending on the field of application. For instance it could be the answers to a survey or economic data in an econometrics analysis.

As in the continuous version, we assume that X is centered, otherwise we can let (where is the mean vector of X) which is centered.

Let us adapt the procedure to the discrete case.

Covariance matrix
Recall that the main implication and difficulty of the KL transformation is computing the eigenvectors of the linear operator associated to the covariance function, which are given by the solutions to the integral equation written above.

Define Σ, the covariance matrix of X. Σ is an N by N matrix whose elements are given by:

Rewriting the above integral equation to suit the discrete case, we observe that it turns into:

where is an N-dimensional vector.

The integral equation thus reduces to a simple matrix eigenvalue problem, which explains why the PCA has such a broad domain of applications.

Since Σ is a positive definite symmetric matrix, it possesses a set of orthonormal eigenvectors forming a basis of , and we write this set of eigenvalues and corresponding eigenvectors, listed in decreasing values of λ_i. Let also be the orthonormal matrix consisting of these eigenvectors:

Principal Component Transform
It remains to perform the actual KL transformation which we will call Principal Component Transform in this case. Recall that the transform was found by expanding the process with respect to the basis spanned by the eigenvectors of the covariance function. In this case, we hence have:

In a more compact form, the Principal Component Transform of X is defined by:

The i-th component of Y is , the projection of X on and the inverse transform yields the expansion of on the space spanned by the :

As in the continuous case, we may reduce the dimensionality of the problem by truncating the sum at some such that where α is the explained variance threshold we wish to set.

The Wiener process
There are numerous equivalent characterizations of the Wiener processWiener process
In mathematics, the Wiener process is a continuous-time stochastic process named in honor of Norbert Wiener. It is often called standard Brownian motion, after Robert Brown...

which is a mathematical formalization of Brownian motionBrownian motion
Brownian motion or pedesis is the presumably random drifting of particles suspended in a fluid or the mathematical model used to describe such random movements, which is often called a particle theory.The mathematical model of Brownian motion has several real-world applications...

. Here we regard it as the centered standard Gaussian process W_t with covariance function

We restrict the time domain to [a,b]=[0,1] without loss of generality.

The eigenvectors of the covariance kernel are easily determined. These are

and the corresponding eigenvalues are

This gives the following representation of the Wiener process:

Theorem. There is a sequence {Z_i}_i of independent Gaussian random variables with mean zero and variance 1 such that

Note that this representation is only valid for On larger intervals, the increments are not independent. As stated in the theorem, convergence is in the L² norm and uniform in t.

The Brownian bridge
Similarly the Brownian bridgeBrownian bridge
A Brownian bridge is a continuous-time stochastic process B whose probability distribution is the conditional probability distribution of a Wiener process W given the condition that B = B = 0.The expected value of the bridge is zero, with variance t, implying that the most...

which is a stochastic processStochastic process
In probability theory, a stochastic process , or sometimes random process, is the counterpart to a deterministic process...

with covariance function

can be represented as the series

Applications
Adaptive opticsAdaptive optics
Adaptive optics is a technology used to improve the performance of optical systems by reducing the effect of wavefront distortions. It is used in astronomical telescopes and laser communication systems to remove the effects of atmospheric distortion, and in retinal imaging systems to reduce the...

systems sometimes use K–L functions to reconstruct wave-front phase information (Dai 1996, JOSA A).

See also

Principal component analysis
Proper orthogonal decomposition
Polynomial chaosPolynomial chaos
Polynomial chaos , also called "Wiener Chaos expansion", is a non-sampling based method to determine evolution of uncertainty in dynamical system, when there is probabilistic uncertainty in the system parameters....

External links

Mathematica KarhunenLoeveDecomposition function.
E161: Computer Image Processing and Analysis notes by Pr. Ruye Wang at Harvey Mudd CollegeHarvey Mudd College
Harvey Mudd College is a private residential liberal arts college of science, engineering, and mathematics, located in Claremont, California. It is one of the institutions of the contiguous Claremont Colleges, which share adjoining campus grounds....

http://fourier.eng.hmc.edu/e161/lectures/klt/klt.html

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.}}