Hotelling's T-square distribution
Encyclopedia
In statistics
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

 Hotelling's T-squared distribution is important because it arises as the distribution
Probability distribution
In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values....

 of a set of statistics which are natural generalisations of the statistics underlying Student's t distribution. In particular, the distribution arises in multivariate statistics
Multivariate statistics
Multivariate statistics is a form of statistics encompassing the simultaneous observation and analysis of more than one statistical variable. The application of multivariate statistics is multivariate analysis...

 in undertaking tests of the differences between the (multivariate) means of different populations, where tests for univariate problems would make use of a t-test. It is proportional to the F distribution.

The distribution is named for Harold Hotelling
Harold Hotelling
Harold Hotelling was a mathematical statistician and an influential economic theorist.He was Associate Professor of Mathematics at Stanford University from 1927 until 1931, a member of the faculty of Columbia University from 1931 until 1946, and a Professor of Mathematical Statistics at the...

, who developed it as a generalization of Student's t distribution.

The distribution

If the notation is used to denote a random variable having Hotelling's T-squared distribution with parameters and then, if a random variable has Hotelling's T-squared distribution,
then
where is the F-distribution with parameters and .

Hotelling's T-squared statistic

Hotelling's T-squared statistic is a generalization of Student's t statistic that is used in multivariate hypothesis testing, and is defined as follows.

Let denote a -variate normal distribution with location
Location parameter
In statistics, a location family is a class of probability distributions that is parametrized by a scalar- or vector-valued parameter μ, which determines the "location" or shift of the distribution...

  and covariance
Covariance matrix
In probability theory and statistics, a covariance matrix is a matrix whose element in the i, j position is the covariance between the i th and j th elements of a random vector...

 . Let


be independent random variable
Random variable
In probability and statistics, a random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. Formally, it is a function from a probability space, typically to the real numbers, which is measurable functionmeasurable...

s, which may be represented as column vectors of real numbers. Define


to be the sample mean. It can be shown that


where is the chi-squared distribution with degrees of freedom. However, is often unknown and we wish to do hypothesis testing on the location .

Define


to be the sample covariance. Here we denote transpose
Transpose
In linear algebra, the transpose of a matrix A is another matrix AT created by any one of the following equivalent actions:...

 by an apostrophe
Apostrophe
The apostrophe is a punctuation mark, and sometimes a diacritic mark, in languages that use the Latin alphabet or certain other alphabets...

. It can be shown that is positive-definite
Positive-definite matrix
In linear algebra, a positive-definite matrix is a matrix that in many ways is analogous to a positive real number. The notion is closely related to a positive-definite symmetric bilinear form ....

 and follows a -variate Wishart distribution with degrees of freedom. Hotelling's T-squared statistic is then defined to be


because it can be shown that


i.e.


where is the F-distribution with parameters and . In order to calculate a p value, multiply the statistic by the above constant and use the F distribution.

Hotelling's two-sample T-squared statistic

If and , with the samples independently
Statistical independence
In probability theory, to say that two events are independent intuitively means that the occurrence of one event makes it neither more nor less probable that the other occurs...

 drawn from two independent
Statistical independence
In probability theory, to say that two events are independent intuitively means that the occurrence of one event makes it neither more nor less probable that the other occurs...

 multivariate normal distributions with the same mean and covariance, and we define

as the sample means, and
as the unbiased pooled covariance matrix estimate, then Hotelling's two-sample T-squared statistic is


and it can be related to the F-distribution by


The non-null distribution of this statistic is the noncentral F-distribution
Noncentral F-distribution
In probability theory and statistics, the noncentral F-distribution is a continuous probability distribution that is a generalization of the F-distribution...

 (the ratio of a non-central Chi-squared random variable and an independent central Chi-squared
Chi-squared
In statistics, the term chi-squared has different uses:*chi-squared distribution, a continuous probability distribution;*chi-squared statistic, a statistic used in some statistical tests;...

 random variable)
with
where is the difference vector between the population means.

See also

  • Student's t-test
    Student's t-test
    A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is supported. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known...

     in univariate statistics
  • Student's t-distribution in univariate probability theory

  • Multivariate Student distribution.
  • F-distribution (commonly tabulated or available in software libraries, and hence used for testing the T-squared statistic using the relationship given above)
  • Wilks' lambda distribution
    Wilks' lambda distribution
    In statistics, Wilks' lambda distribution , is a probability distribution used in multivariate hypothesis testing, especially with regard to the likelihood-ratio test and Multivariate analysis of variance...

     (in multivariate statistics
    Multivariate statistics
    Multivariate statistics is a form of statistics encompassing the simultaneous observation and analysis of more than one statistical variable. The application of multivariate statistics is multivariate analysis...

     Wilks'
    Samuel S. Wilks
    Samuel Stanley Wilks was an American mathematician and academic who played an important role in the development of mathematical statistics, especially in regard to practical applications....

      is to Hotelling's as Snedecor's
    George W. Snedecor
    George Waddel Snedecor was an American mathematician and statistician. He contributed to the foundations of analysis of variance, data analysis, experimental design, and statistical methodology. Snedecor's F distribution and the George W...

      is to Student's
    William Sealy Gosset
    William Sealy Gosset is famous as a statistician, best known by his pen name Student and for his work on Student's t-distribution....

    in univariate statistics).

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK