Test statistic
Encyclopedia
In statistical hypothesis testing
, a hypothesis test is typically specified in terms of a test statistic, which is a function of the sample; it is considered as a numerical summary of a set of data
that
reduces the data to one or a small number of values that can be used to perform a hypothesis test. Given a null hypothesis
and a test statistic T, we can specify a "null value" T0 such that values of T close to T0 present the strongest evidence in favor of the null hypothesis, whereas values of T far from T0 present the strongest evidence against the null hypothesis. An important property of a test statistic is that we must be able to determine its sampling distribution
under the null hypothesis, which allows us to calculate p-values.
For example, suppose we wish to test whether a coin is fair (i.e. has equal probabilities of producing a head or a tail). If we flip the coin 100 times and record the results, the raw data can be represented as a sequence of 100 Heads and Tails. If our interest is in the marginal
probability of obtaining a head, we only need to record the number T out of the 100 flips that produced a head, and use T0 = 50 as our null value. In this case, the exact sampling distribution of T is the binomial distribution, but for larger sample sizes the normal approximation can be used. Using one of these sampling distributions, it is possible to compute either a one-tailed or two-tailed
p-value for the null hypothesis that the coin is fair. Note that the test statistic in this case reduces a set of 100 numbers to a single numerical summary that can be used for testing.
A test statistic shares some of the same qualities of a descriptive statistic
, and many statistics can be used as both test statistics and descriptive statistics. However a test statistic is specifically intended for use in statistical testing, whereas the main quality of a descriptive statistic is that it is easily interpretable. Some informative descriptive statistics, such as the sample range
, do not make good test statistics since it is difficult to determine their sampling distribution.
Statistical hypothesis testing
A statistical hypothesis test is a method of making decisions using data, whether from a controlled experiment or an observational study . In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold...
, a hypothesis test is typically specified in terms of a test statistic, which is a function of the sample; it is considered as a numerical summary of a set of data
Data
The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which...
that
reduces the data to one or a small number of values that can be used to perform a hypothesis test. Given a null hypothesis
Null hypothesis
The practice of science involves formulating and testing hypotheses, assertions that are capable of being proven false using a test of observed data. The null hypothesis typically corresponds to a general or default position...
and a test statistic T, we can specify a "null value" T0 such that values of T close to T0 present the strongest evidence in favor of the null hypothesis, whereas values of T far from T0 present the strongest evidence against the null hypothesis. An important property of a test statistic is that we must be able to determine its sampling distribution
Sampling distribution
In statistics, a sampling distribution or finite-sample distribution is the probability distribution of a given statistic based on a random sample. Sampling distributions are important in statistics because they provide a major simplification on the route to statistical inference...
under the null hypothesis, which allows us to calculate p-values.
For example, suppose we wish to test whether a coin is fair (i.e. has equal probabilities of producing a head or a tail). If we flip the coin 100 times and record the results, the raw data can be represented as a sequence of 100 Heads and Tails. If our interest is in the marginal
Marginal distribution
In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. The term marginal variable is used to refer to those variables in the subset of variables being retained...
probability of obtaining a head, we only need to record the number T out of the 100 flips that produced a head, and use T0 = 50 as our null value. In this case, the exact sampling distribution of T is the binomial distribution, but for larger sample sizes the normal approximation can be used. Using one of these sampling distributions, it is possible to compute either a one-tailed or two-tailed
Two-tailed test
The two-tailed test is a statistical test used in inference, in which a given statistical hypothesis, H0 , will be rejected when the value of the test statistic is either sufficiently small or sufficiently large...
p-value for the null hypothesis that the coin is fair. Note that the test statistic in this case reduces a set of 100 numbers to a single numerical summary that can be used for testing.
A test statistic shares some of the same qualities of a descriptive statistic
Descriptive statistics
Descriptive statistics quantitatively describe the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics , in that descriptive statistics aim to summarize a data set, rather than use the data to learn about the population that the data are...
, and many statistics can be used as both test statistics and descriptive statistics. However a test statistic is specifically intended for use in statistical testing, whereas the main quality of a descriptive statistic is that it is easily interpretable. Some informative descriptive statistics, such as the sample range
Range (statistics)
In the descriptive statistics, the range is the length of the smallest interval which contains all the data. It is calculated by subtracting the smallest observation from the greatest and provides an indication of statistical dispersion.It is measured in the same units as the data...
, do not make good test statistics since it is difficult to determine their sampling distribution.
See also
- Sufficiency (statistics)Sufficiency (statistics)In statistics, a sufficient statistic is a statistic which has the property of sufficiency with respect to a statistical model and its associated unknown parameter, meaning that "no other statistic which can be calculated from the same sample provides any additional information as to the value of...
- Neyman–Pearson lemma