Covariance and correlation
In probability theory and statistics, the mathematical descriptions of covariance and correlation are very similar. Both describe the degree to which two random variables, or sets of random variables, tend to deviate from their expected values in similar ways.
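For two random variables X and Y with expected values $\mu_X$ and $\mu_Y$, the covariance and correlation are

$$\operatorname{cov}(X,Y) = \sigma_{XY} = E[(X-\mu_X)(Y-\mu_Y)],$$

$$\operatorname{corr}(X,Y) = \rho_{XY} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y},$$

where $E$ denotes the expected value and $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y respectively. Notably, correlation is dimensionless, while covariance is in units obtained by multiplying the units of the two variables. The covariance of a variable with itself (that is, when X = Y) is called the variance. The correlation of a variable with itself is always 1, except in the degenerate case where its variance is zero, in which case the correlation does not exist.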
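These relationships can be checked numerically. The following Python sketch is illustrative only. The helpers `covariance` and `correlation` are hypothetical names, not a library API. The sketch uses the population form, which divides by n; a sample estimator dividing by n - 1 would change the covariance but not the correlation, because the factor cancels. The height and weight values are made-up data. Converting heights from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged. The covariance of a list with itself is its (population) variance.

```python
import math


def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Population covariance: the average of (x - mean_x) * (y - mean_y)."""
    if not xs or len(xs) != len(ys):
        raise ValueError("need two non-empty sequences of equal length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sx = math.sqrt(covariance(xs, xs))  # cov(X, X) is the variance
    sy = math.sqrt(covariance(ys, ys))
    if sx == 0 or sy == 0:
        # Degenerate case: a constant variable has no defined correlation.
        raise ValueError("correlation undefined when a variance is zero")
    return covariance(xs, ys) / (sx * sy)


# Illustrative (made-up) data, not taken from the source.
heights_m = [1.60, 1.70, 1.80, 1.75]
weights_kg = [55.0, 65.0, 80.0, 70.0]
heights_cm = [100 * h for h in heights_m]

# Covariance carries units (cm*kg vs m*kg), so it scales with the units.
print(covariance(heights_cm, weights_kg) / covariance(heights_m, weights_kg))
# Correlation is dimensionless, so the change of units does not affect it.
print(correlation(heights_cm, weights_kg), correlation(heights_m, weights_kg))
```

The first line printed is 100 up to floating-point rounding. The two correlations agree to the same precision.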
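In the case of a stationary time series, both the means and the variances are constant over time. The covariance and correlation, here called the cross-covariance and cross-correlation, are then functions only of the difference m between the indices:

$$\sigma_{XY}(m) = E[(X_n-\mu_X)(Y_{n+m}-\mu_Y)],$$

$$\rho_{XY}(m) = \frac{\sigma_{XY}(m)}{\sigma_X \sigma_Y},$$

where $\mu_X$ and $\mu_Y$ are the constant means and $\sigma_X$ and $\sigma_Y$ are the constant standard deviations of the two series.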
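For finite stationary series, the cross-covariance and cross-correlation at lag m can be estimated from each series' sample mean and standard deviation. The sketch below is one possible approach, not a definitive implementation, and it rests on three assumptions:

- It uses the common biased estimator, which divides by the full series length N rather than by the number of overlapping terms.
- `cross_covariance` and `cross_correlation` are hypothetical helper names.
- Negative lags are handled through the identity $\sigma_{XY}(-m) = \sigma_{YX}(m)$.

In the example, Y is X delayed by two steps, so the estimated cross-correlation peaks at m = 2.

```python
import math


def cross_covariance(xs, ys, m):
    """Biased estimate of sigma_XY(m) = E[(X_n - mu_X)(Y_{n+m} - mu_Y)].

    Assumes both series are stationary, so one sample mean per series is
    used for every index n. Divides by the full length N rather than by
    the number of overlapping terms N - |m|.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty series of equal length")
    if not -n < m < n:
        raise ValueError("lag m must satisfy -N < m < N")
    if m < 0:
        # sigma_XY(-m) = sigma_YX(m): swap the roles of the two series.
        return cross_covariance(ys, xs, -m)
    mx = sum(xs) / n
    my = sum(ys) / n
    total = sum((xs[i] - mx) * (ys[i + m] - my) for i in range(n - m))
    return total / n


def cross_correlation(xs, ys, m):
    """Cross-covariance at lag m divided by both standard deviations."""
    sx = math.sqrt(cross_covariance(xs, xs, 0))  # lag-0 auto-covariance = variance
    sy = math.sqrt(cross_covariance(ys, ys, 0))
    if sx == 0 or sy == 0:
        raise ValueError("cross-correlation undefined for a constant series")
    return cross_covariance(xs, ys, m) / (sx * sy)


# Y is X delayed by two steps, so Y_{n+2} = X_n for every overlapping n.
xs = [0, 0, 1, 0, 0, 0, 0, 0]
ys = [0, 0, 0, 0, 1, 0, 0, 0]
best_lag = max(range(-7, 8), key=lambda m: cross_correlation(xs, ys, m))
print(best_lag)  # 2
```

Dividing by N rather than N - |m| keeps every estimated cross-correlation within [-1, 1], by the Cauchy-Schwarz inequality. The cost is a bias toward zero at large lags.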
Although the values of the theoretical covariances and correlations are linked in the above way, the probability distributions of sample estimates of these quantities are not linked in any simple way, and they generally need to be treated separately. These distributions depend on the joint distribution of the pair of random quantities (X, Y) when the pairs are assumed to be independent of one another. In the case of a time series, they depend on the joint distribution of the whole time series.