Student's t-statistic
Encyclopedia
In statistics
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

, the t-statistic is a ratio of the departure of an estimated parameter from its notional value and its standard error
Standard error (statistics)
The standard error is the standard deviation of the sampling distribution of a statistic. The term may also be used to refer to an estimate of that standard deviation, derived from a particular sample used to compute the estimate....

. It is used in hypothesis testing
Statistical hypothesis testing
A statistical hypothesis test is a method of making decisions using data, whether from a controlled experiment or an observational study . In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold...

, for example in the Student's t-test
Student's t-test
A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is supported. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known...

, in the augmented Dickey–Fuller test, and in bootstrapping
Bootstrapping (statistics)
In statistics, bootstrapping is a computer-based method for assigning measures of accuracy to sample estimates . This technique allows estimation of the sample distribution of almost any statistic using only very simple methods...

.

Definition

Let \scriptstyle\hat\beta be an estimator of parameter β in some statistical model
Statistical model
A statistical model is a formalization of relationships between variables in the form of mathematical equations. A statistical model describes how one or more random variables are related to one or more random variables. The model is statistical as the variables are not deterministically but...

. Then a t
-statistic for this parameter is any quantity of the form

where β0 is a non-random, known constant, and \scriptstyle s.e.(\hat\beta) is the standard error
Standard error (statistics)
The standard error is the standard deviation of the sampling distribution of a statistic. The term may also be used to refer to an estimate of that standard deviation, derived from a particular sample used to compute the estimate....

 of the estimator \scriptstyle\hat\beta. By default, statistical packages report t-statistic with (these t-statistics are used to test the significance of corresponding regressor). However, when t-statistic is needed to test the hypothesis of the form , then a non-zero β0 may be used.

If \scriptstyle\hat\beta is an ordinary least squares
Ordinary least squares
In statistics, ordinary least squares or linear least squares is a method for estimating the unknown parameters in a linear regression model. This method minimizes the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear...

 estimator in the classical linear regression model (that is, with normally distributed and homoskedastic
Homoscedasticity
In statistics, a sequence or a vector of random variables is homoscedastic if all random variables in the sequence or vector have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity...

 error terms), and if the true value of parameter β is equal to β0, then the sampling distribution
Sampling distribution
In statistics, a sampling distribution or finite-sample distribution is the probability distribution of a given statistic based on a random sample. Sampling distributions are important in statistics because they provide a major simplification on the route to statistical inference...

 of the t-statistic is the Student’s t-distribution with degrees of freedom, where n is the number of observations, and k is the number of regressors (including the intercept).

In the majority of models the estimator \scriptstyle\hat\beta is consistent for β and distributed asymptotically normally. If the true value of parameter β is equal to β0 and the quantity \scriptstyle s.e.(\hat\beta) correctly estimates the asymptotic variance of this estimator, then the t-statistic will have asymptotically the standard normal distribution.

In some models the distribution of t-statistic is different from normal, even asymptotically. For example, when a time series
Time series
In statistics, signal processing, econometrics and mathematical finance, a time series is a sequence of data points, measured typically at successive times spaced at uniform time intervals. Examples of time series are the daily closing value of the Dow Jones index or the annual flow volume of the...

 with unit root
Unit root
In time series models in econometrics , a unit root is a feature of processes that evolve through time that can cause problems in statistical inference if it is not adequately dealt with....

 is regressed in the augmented Dickey–Fuller test, the test t-statistic will asymptotically have one of the Dickey–Fuller distributions (depending on the test setting).

Use

Most frequently, t-statistics are used in Student's t-test
Student's t-test
A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is supported. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known...

s, a form of statistical hypothesis testing
Statistical hypothesis testing
A statistical hypothesis test is a method of making decisions using data, whether from a controlled experiment or an observational study . In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold...

, and in the computation of certain confidence interval
Confidence interval
In statistics, a confidence interval is a particular kind of interval estimate of a population parameter and is used to indicate the reliability of an estimate. It is an observed interval , in principle different from sample to sample, that frequently includes the parameter of interest, if the...

s.

The key property of the t-statistic is that it is a pivotal quantity
Pivotal quantity
In statistics, a pivotal quantity or pivot is a function of observations and unobservable parameters whose probability distribution does not depend on unknown parameters....

 – while defined in terms of the sample mean, its sampling distribution does not depend on the sample parameters, and thus it can be used regardless of what these may be.

One can also divide a residual
Errors and residuals in statistics
In statistics and optimization, statistical errors and residuals are two closely related and easily confused measures of the deviation of a sample from its "theoretical value"...

 by the sample standard deviation:
to compute an estimate for the number of standard deviations a given sample is from the mean, as a sample version of a z-score, the z-score requiring the population parameters.

Prediction

Given a normal distribution with unknown mean and variance, the t-statistic of a future observation after one has made n observations, is an ancillary statistic
Ancillary statistic
In statistics, an ancillary statistic is a statistic whose sampling distribution does not depend on which of the probability distributions among those being considered is the distribution of the statistical population from which the data were taken...

 – a pivotal quantity (does not depend on the values of μ and σ2) that is a statistic (computed from observations). This allows one to compute a frequentist prediction interval
Prediction interval
In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval in which future observations will fall, with a certain probability, given what has already been observed...

 (a predictive confidence interval
Confidence interval
In statistics, a confidence interval is a particular kind of interval estimate of a population parameter and is used to indicate the reliability of an estimate. It is an observed interval , in principle different from sample to sample, that frequently includes the parameter of interest, if the...

), via the following t-distribution:
Solving for yields the prediction distribution
from which one may compute predictive confidence intervals – given a probability p, one may compute intervals such that 100p% of the time, the next observation will fall in that interval.

History

The term "t-statistic" is abbreviated from "test statistic", while "Student" was the pen name of William Sealy Gosset
William Sealy Gosset
William Sealy Gosset is famous as a statistician, best known by his pen name Student and for his work on Student's t-distribution....

, who introduced the t-statistic and t-test in 1908, while working for the Guinness
Guinness
Guinness is a popular Irish dry stout that originated in the brewery of Arthur Guinness at St. James's Gate, Dublin. Guinness is directly descended from the porter style that originated in London in the early 18th century and is one of the most successful beer brands worldwide, brewed in almost...

 brewery
Brewery
A brewery is a dedicated building for the making of beer, though beer can be made at home, and has been for much of beer's history. A company which makes beer is called either a brewery or a brewing company....

 in Dublin, Ireland.

Related concepts

z-score: If the population parameters are known, then rather than computing the t-statistic, one can compute the z-score; analogously, rather than using a t-test, one uses a z-test
Z-test
A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution. Due to the central limit theorem, many test statistics are approximately normally distributed for large samples...

. This is rare outside of standardized testing.
Studentized residual
Studentized residual
In statistics, a studentized residual is the quotient resulting from the division of a residual by an estimate of its standard deviation. Typically the standard deviations of residuals in a sample vary greatly from one data point to another even when the errors all have the same standard...

: In regression analysis
Regression analysis
In statistics, regression analysis includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables...

, the standard errors of the estimators at different data points vary (compare the middle versus endpoints of a simple linear regression
Simple linear regression
In statistics, simple linear regression is the least squares estimator of a linear regression model with a single explanatory variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that makes the sum of squared residuals of the model as...

), and thus one must divide the different residuals by different estimates for the error, yielding what are called studentized residual
Studentized residual
In statistics, a studentized residual is the quotient resulting from the division of a residual by an estimate of its standard deviation. Typically the standard deviations of residuals in a sample vary greatly from one data point to another even when the errors all have the same standard...

s.

See also

  • F-test
    F-test
    An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis.It is most often used when comparing statistical models that have been fit to a data set, in order to identify the model that best fits the population from which the data were sampled. ...

  • Student's t-distribution
  • Student's t-test
    Student's t-test
    A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is supported. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known...


External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK