Covariance matrix
In probability theory and statistics, a covariance matrix (also known as dispersion matrix) is a matrix whose element in the i, j position is the covariance between the i th and j th elements of a random vector (that is, of a vector of random variables). Each element of the vector is a scalar random variable, either with a finite number of observed empirical values or with a finite or infinite number of potential values specified by a theoretical joint probability distribution of all the random variables. Intuitively, the covariance matrix generalizes the notion of variance to multiple dimensions. As an example, the variation in a collection of random points in two-dimensional space cannot be characterized fully by a single number, nor would the variances in the x and y directions contain all of the necessary information; a 2×2 matrix would be necessary to fully characterize the two-dimensional variation. Just as a Hessian matrix is needed to fully describe the concavity of a multivariate function, a covariance matrix is needed to fully describe the variation in a distribution.

Definition

Throughout this article, boldfaced unsubscripted X and Y are used to refer to random vectors, and unboldfaced subscripted X_i and Y_i are used to refer to random scalars. If the entries in the column vector \mathbf{X} = \begin{bmatrix}X_1 \\ \vdots \\ X_n \end{bmatrix} are random variables, each with finite variance, then the covariance matrix \Sigma is the matrix whose (i, j) entry is the covariance

\Sigma_{ij} = \operatorname{cov}(X_i, X_j) = \mathrm{E}\left[ (X_i - \mu_i)(X_j - \mu_j) \right],

where

\mu_i = \mathrm{E}(X_i)\,

is the expected value of the i th entry in the vector X. In other words, we have

\Sigma = \begin{bmatrix} \mathrm{E}[(X_1 - \mu_1)(X_1 - \mu_1)] & \mathrm{E}[(X_1 - \mu_1)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_1 - \mu_1)(X_n - \mu_n)] \\ \mathrm{E}[(X_2 - \mu_2)(X_1 - \mu_1)] & \mathrm{E}[(X_2 - \mu_2)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_2 - \mu_2)(X_n - \mu_n)] \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{E}[(X_n - \mu_n)(X_1 - \mu_1)] & \mathrm{E}[(X_n - \mu_n)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_n - \mu_n)(X_n - \mu_n)] \end{bmatrix}.

The inverse of this matrix, \Sigma^{-1}, is the inverse covariance matrix, also known as the concentration matrix or precision matrix. The elements of the precision matrix have an interpretation in terms of partial correlations and partial variances.

Generalization of the variance

The definition above is equivalent to the matrix equality

\Sigma = \mathrm{E} \left[ \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right) \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right)^\top \right].

This form can be seen as a generalization of the scalar-valued variance to higher dimensions. Recall that for a scalar-valued random variable X,

\sigma^2 = \operatorname{var}(X) = \mathrm{E}\left[ (X - \mathrm{E}(X))^2 \right].
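The outer-product form lends itself to a vectorized computation. A minimal sketch (not from the original article, assuming NumPy, with sample averages standing in for expectations):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 3))
    Xc = X - X.mean(axis=0)          # center the data: X - E[X]

    # Sigma = E[(X - E[X])(X - E[X])^T], estimated as an average of outer products
    Sigma = Xc.T @ Xc / len(X)

    assert np.allclose(Sigma, np.cov(X, rowvar=False, bias=True))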

Conflicting nomenclatures and notations

Nomenclatures differ. Some statisticians, following the probabilist William Feller, call this matrix the variance of the random vector X, because it is the natural generalization to higher dimensions of the 1-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector X. Thus

\operatorname{var}(\textbf{X}) = \mathrm{E} \left[ (\textbf{X} - \mathrm{E} [\textbf{X}]) (\textbf{X} - \mathrm{E} [\textbf{X}])^\top \right].

However, the notation for the cross-covariance between two vectors is standard:

\operatorname{cov}(\textbf{X},\textbf{Y}) = \mathrm{E} \left[ (\textbf{X} - \mathrm{E}[\textbf{X}]) (\textbf{Y} - \mathrm{E}[\textbf{Y}])^\top \right].

The var notation is found in William Feller's two-volume book An Introduction to Probability Theory and Its Applications, but both forms are quite standard and there is no ambiguity between them. The matrix \Sigma is also often called the variance-covariance matrix, since the diagonal terms are in fact variances.

Properties

For \Sigma=\mathrm{E} \left[ \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right) \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right)^\top \right] and \mu = \mathrm{E}(\textbf{X}), where X is a random p-dimensional variable and Y a random q-dimensional variable, the following basic properties apply:

  1. \Sigma = \mathrm{E}(\mathbf{X X^\top}) - \mathbf{\mu}\mathbf{\mu^\top}
  2. \Sigma \, is positive-semidefinite and symmetric.
  3. \operatorname{cov}(\mathbf{A X} + \mathbf{a}) = \mathbf{A}\, \operatorname{cov}(\mathbf{X})\, \mathbf{A^\top}
  4. \operatorname{cov}(\mathbf{X},\mathbf{Y}) = \operatorname{cov}(\mathbf{Y},\mathbf{X})^\top
  5. \operatorname{cov}(\mathbf{X}_1 + \mathbf{X}_2,\mathbf{Y}) = \operatorname{cov}(\mathbf{X}_1,\mathbf{Y}) + \operatorname{cov}(\mathbf{X}_2, \mathbf{Y})
  6. If p = q, then \operatorname{var}(\mathbf{X} + \mathbf{Y}) = \operatorname{var}(\mathbf{X}) + \operatorname{cov}(\mathbf{X},\mathbf{Y}) + \operatorname{cov}(\mathbf{Y}, \mathbf{X}) + \operatorname{var}(\mathbf{Y})
  7. \operatorname{cov}(\mathbf{AX}, \mathbf{B}^\top\mathbf{Y}) = \mathbf{A}\, \operatorname{cov}(\mathbf{X}, \mathbf{Y}) \,\mathbf{B}
  8. If \mathbf{X} and \mathbf{Y} are independent, then \operatorname{cov}(\mathbf{X}, \mathbf{Y}) = 0

Here \mathbf{X}, \mathbf{X}_1 and \mathbf{X}_2 are random p×1 vectors, \mathbf{Y} is a random q×1 vector, \mathbf{a} is a q×1 vector, and \mathbf{A} and \mathbf{B} are q×p matrices.

The covariance matrix is a useful tool in many different areas. From it a transformation matrix can be derived that allows one to completely decorrelate the data or, from a different point of view, to find an optimal basis for representing the data in a compact way (see Rayleigh quotient for a formal proof and additional properties of covariance matrices); a sketch of such a decorrelating transformation is given below. This is called principal components analysis (PCA) or the Karhunen–Loève transform (KL-transform).
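As an illustration of property 3 and of the decorrelating transformation just mentioned, here is a minimal sketch (not from the original article, assuming NumPy); the eigendecomposition-based whitening used here is one common choice among several:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.multivariate_normal(mean=[0, 0, 0],
                                cov=[[4, 2, 0], [2, 3, 1], [0, 1, 2]],
                                size=5000)
    Sigma = np.cov(X, rowvar=False, bias=True)

    # Property 3: cov(AX + a) = A cov(X) A^T, checked on the sample covariance
    A = rng.normal(size=(2, 3))
    a = rng.normal(size=2)
    Y = X @ A.T + a
    assert np.allclose(np.cov(Y, rowvar=False, bias=True), A @ Sigma @ A.T)

    # Decorrelation (whitening): with Sigma = V diag(w) V^T, the transform
    # W = diag(w^{-1/2}) V^T maps the centered data to identity covariance.
    w, V = np.linalg.eigh(Sigma)
    W = np.diag(1.0 / np.sqrt(w)) @ V.T
    Z = (X - X.mean(axis=0)) @ W.T
    assert np.allclose(np.cov(Z, rowvar=False, bias=True), np.eye(3))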

As a linear operator

Applied to one vector, the covariance matrix maps a linear combination, c, of the random variables, X, onto a vector of covariances with those variables:

\mathbf c^\top\Sigma = \operatorname{cov}(\mathbf c^\top\mathbf X,\mathbf X).

Treated as a bilinear form, it yields the covariance between the two linear combinations:

\mathbf d^\top\Sigma\mathbf c = \operatorname{cov}(\mathbf d^\top\mathbf X,\mathbf c^\top\mathbf X).

The variance of a linear combination is then \mathbf c^\top\Sigma\mathbf c, its covariance with itself. Similarly, the (pseudo-)inverse covariance matrix provides an inner product, \langle c-\mu|\Sigma^+|c-\mu\rangle, which induces the Mahalanobis distance, a measure of the "unlikelihood" of c.
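A brief sketch of these computations (not from the original article, assuming NumPy; the pseudo-inverse handles a possibly singular \Sigma):

    import numpy as np

    Sigma = np.array([[4.0, 2.0], [2.0, 3.0]])
    mu = np.array([1.0, -1.0])
    c = np.array([0.5, 2.0])
    d = np.array([1.0, 0.0])

    var_c = c @ Sigma @ c      # variance of the linear combination c^T X
    cov_dc = d @ Sigma @ c     # covariance between d^T X and c^T X

    # Mahalanobis distance of the point c from the mean mu, via the pseudo-inverse
    diff = c - mu
    mahalanobis = np.sqrt(diff @ np.linalg.pinv(Sigma) @ diff)
    print(var_c, cov_dc, mahalanobis)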

Which matrices are covariance matrices?

From the identity just above (let \mathbf{b} be a (p \times 1) real-valued vector)

\operatorname{var}(\mathbf{b}^\top\mathbf{X}) = \mathbf{b}^\top \operatorname{var}(\mathbf{X}) \mathbf{b},\,

the fact that the variance of any real-valued random variable is nonnegative, and the symmetry of the covariance matrix's definition, it follows that only a positive-semidefinite matrix can be a covariance matrix. The answer to the converse question, whether every symmetric positive-semidefinite matrix is a covariance matrix, is "yes." To see this, suppose M is a p×p symmetric positive-semidefinite matrix. From the finite-dimensional case of the spectral theorem, it follows that M has a nonnegative symmetric square root, which we may call M^{1/2}. Let \mathbf{X} be any p×1 column-vector-valued random variable whose covariance matrix is the p×p identity matrix. Then

\operatorname{var}(M^{1/2}\mathbf{X}) = M^{1/2} \operatorname{var}(\mathbf{X}) M^{1/2} = M.\,
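The constructive half of this argument can be mimicked numerically. A minimal sketch (not from the original article, assuming NumPy) builds the symmetric square root from an eigendecomposition and applies it to a vector with identity covariance:

    import numpy as np

    rng = np.random.default_rng(3)
    M = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 2.0]])   # a symmetric positive-semidefinite matrix

    # Nonnegative symmetric square root via the spectral theorem: M = V diag(w) V^T
    w, V = np.linalg.eigh(M)
    M_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

    # X has (approximately) identity sample covariance; M^{1/2} X then has covariance near M
    X = rng.normal(size=(100000, 3))
    Y = X @ M_half.T
    print(np.cov(Y, rowvar=False))   # close to M for a large sample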

How to find a valid covariance matrix

In some applications (e.g. building data models from only partially observed data) one wants to find the “nearest” covariance matrix to a given symmetric matrix (e.g. of observed covariances). In 2002, Higham formalized the notion of nearness using a weighted Frobenius norm and provided a method for computing the nearest covariance matrix.
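Higham's method uses a weighted Frobenius norm and an alternating-projections iteration; the following is only a much simpler sketch (not Higham's algorithm, assuming NumPy) that projects a symmetric matrix onto the positive-semidefinite cone in the unweighted Frobenius norm by clipping negative eigenvalues:

    import numpy as np

    def nearest_psd(A):
        """Nearest symmetric positive-semidefinite matrix to A in the
        unweighted Frobenius norm (a simplification of the problem Higham treats)."""
        B = (A + A.T) / 2.0                 # symmetrize
        w, V = np.linalg.eigh(B)
        w = np.clip(w, 0.0, None)           # zero out negative eigenvalues
        return V @ np.diag(w) @ V.T

    A = np.array([[1.0, 0.9, 0.7],
                  [0.9, 1.0, -0.9],
                  [0.7, -0.9, 1.0]])        # symmetric but indefinite
    print(nearest_psd(A))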

Complex random vectors

The variance of a complex scalar-valued random variable with expected value μ is conventionally defined using complex conjugation:

\operatorname{var}(z) = \operatorname{E} \left[ (z-\mu)(z-\mu)^{*} \right],

where the complex conjugate of a complex number z is denoted z^{*}; thus the variance of a complex random variable is a real number. If Z is a column vector of complex-valued random variables, then the conjugate transpose is formed by both transposing and conjugating, giving the square matrix

\operatorname{E} \left[ (Z-\mu)(Z-\mu)^{H} \right],

where Z^{H} denotes the conjugate transpose, which is applicable to the scalar case since the transpose of a scalar is still a scalar. The matrix so obtained is Hermitian positive-semidefinite, with real numbers on the main diagonal and complex numbers off-diagonal.
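A short sketch with complex-valued data (not from the original article, assuming NumPy); the conjugate transpose makes the resulting matrix Hermitian with a real diagonal:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 10000
    Z = rng.normal(size=(n, 2)) + 1j * rng.normal(size=(n, 2))   # rows are observations
    Zc = Z - Z.mean(axis=0)

    # Entry (i, j) is the sample average of (Z_i - mu_i) * conj(Z_j - mu_j),
    # i.e. an estimate of E[(Z - mu)(Z - mu)^H]
    Sigma = Zc.T @ Zc.conj() / n

    print(np.allclose(Sigma, Sigma.conj().T))         # Hermitian
    print(np.allclose(np.imag(np.diag(Sigma)), 0.0))  # real diagonal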

Estimation

The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle. See estimation of covariance matrices.
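For reference, a minimal sketch of the two standard sample-based estimates (not from the original article, assuming NumPy and i.i.d. rows): the maximum-likelihood form for a multivariate normal divides by n, while the usual unbiased sample covariance divides by n − 1:

    import numpy as np

    rng = np.random.default_rng(5)
    true_cov = np.array([[2.0, 0.5], [0.5, 1.0]])
    X = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=200)

    Xc = X - X.mean(axis=0)
    n = len(X)
    Sigma_ml = Xc.T @ Xc / n               # maximum-likelihood estimate
    Sigma_unbiased = Xc.T @ Xc / (n - 1)   # unbiased sample covariance
    print(Sigma_ml, Sigma_unbiased, sep="\n")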

Probability density function

If a vector of n possibly correlated random variables is jointly normally distributed, or more generally elliptically distributed, then its probability density function can be expressed in terms of the covariance matrix.
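In the jointly normal case with nonsingular \Sigma, for instance, the density takes the standard form

f(\mathbf{x}) = (2\pi)^{-n/2} \det(\Sigma)^{-1/2} \exp\!\left( -\tfrac{1}{2} (\mathbf{x}-\mu)^\top \Sigma^{-1} (\mathbf{x}-\mu) \right).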

See also

  • Estimation of covariance matrices
  • Multivariate statistics
  • Sample covariance matrix
  • Gramian matrix
  • Eigenvalue decomposition
  • Quadratic form (statistics)
  • Sum of normally distributed random variables
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 