Statistical theory
The theory of statistics provides a basis for the whole range of techniques, in both study design and data analysis, that are used within applications of statistics. The theory covers approaches to statistical-decision problems and to statistical inference, and the actions and deductions that satisfy the basic principles stated for these different approaches. Within a given approach, statistical theory gives ways of comparing statistical procedures; it can find a best possible procedure within a given context for given statistical problems, or can provide guidance on the choice between alternative procedures.
Apart from philosophical considerations about how to make statistical inferences and decisions, much of statistical theory consists of mathematical statistics, and is closely linked to probability theory, to utility theory, and to optimization.
Scope
Statistical theory provides an underlying rationale and a consistent basis for the choice of methodology used in applied statistics.
Modelling
Statistical models describe the sources of data and can have different types of formulation corresponding to these sources and to the problem being studied. Such problems can be of various kinds:
- Sampling from a finite population
- Measuring observational error and refining procedures
- Studying statistical relations
Statistical models, once specified, can be tested to see whether they provide useful inferences for new data sets. Testing a hypothesis using the data that was used to specify the model is a fallacy, according to the natural science of Bacon and the scientific method of Peirce.
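The distinction between specifying a model and testing it on new data can be illustrated with a hold-out check. The following is a minimal sketch in Python, not a prescribed procedure: the function names (`fit_line`, `mean_squared_error`, `holdout_check`), the straight-line model, and the even split between fitting and testing data are all assumptions chosen for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_squared_error(a, b, xs, ys):
    """Average squared prediction error of the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def holdout_check(xs, ys, seed=0):
    """Fit on a random half of the data and score on the unseen half."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    half = len(indices) // 2
    train, test = indices[:half], indices[half:]
    train_x = [xs[i] for i in train]
    train_y = [ys[i] for i in train]
    test_x = [xs[i] for i in test]
    test_y = [ys[i] for i in test]
    a, b = fit_line(train_x, train_y)
    # The first error is optimistic (same data used to fit);
    # the second is measured on data the fit has not seen.
    return (mean_squared_error(a, b, train_x, train_y),
            mean_squared_error(a, b, test_x, test_y))
```

A held-out error much larger than the fitting error suggests that the model has been tailored to the data used to specify it.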
Data collection
Statistical theory provides a guide to comparing methods of data collection, where the problem is to generate informative data using optimization and randomization while measuring and controlling for observational error. Optimization of data collection reduces the cost of data while satisfying statistical goals, while randomization allows reliable inferences. Statistical theory provides a basis for good data collection and the structuring of investigations in the topics of:
- Design of experiments to estimate treatment effects, to test hypotheses, and to optimize responses
- Survey sampling to describe populations
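The role of randomization in both topics can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `randomize_assignment` and `simple_random_sample` are hypothetical, and a completely randomized design and simple random sampling without replacement are assumed as the simplest cases.

```python
import random


def randomize_assignment(units, n_treated, seed=None):
    """Split units into treatment and control groups by a random permutation."""
    if not 0 <= n_treated <= len(units):
        raise ValueError("n_treated must be between 0 and the number of units")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return shuffled[:n_treated], shuffled[n_treated:]


def simple_random_sample(population, sample_size, seed=None):
    """Draw a sample without replacement; every subset of that size is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), sample_size)
```

Fixing the seed makes an allocation reproducible, which can help when a study protocol must be audited later.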
Summarising data
The task of summarising statistical data in conventional forms (also known as descriptive statistics) is considered in theoretical statistics as a problem of defining what aspects of statistical samples need to be described and how well they can be described from a typically limited sample of data. Thus the problems theoretical statistics considers include:
- Choosing summary statistics to describe a sample
- Summarising probability distributions of sample data while making limited assumptions about the form of distribution that may be met
- Summarising the relationships between different quantities measured on the same items with a sample
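As a concrete illustration of these summarising tasks, the sketch below computes a few conventional summaries with Python's standard `statistics` module. The choice of summaries and the helper names `summarize` and `pearson_correlation` are assumptions made for this example; which summaries are appropriate depends on the data and the question.

```python
import statistics


def summarize(sample):
    """Return common location, spread, and quantile summaries of a sample."""
    data = sorted(sample)
    # Quartiles make no assumption about the form of the distribution.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),
        "min": data[0],
        "q1": q1,
        "median": statistics.median(data),
        "q3": q3,
        "max": data[-1],
    }


def pearson_correlation(xs, ys):
    """Summarise the linear relationship between two quantities measured on the same items."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```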
Interpreting data
Besides the philosophy underlying statistical inference, statistical theory has the task of considering the types of questions that data analysts might want to ask about the problems they are studying and of providing data analytic techniques for answering them. Some of these tasks are:
- Summarising populations in the form of a fitted distribution or probability density function
- Summarising the relationship between variables using some type of regression analysis
- Providing ways of predicting the outcome of a random quantity given other related variables
- Examining the possibility of reducing the number of variables being considered within a problem (the task of dimension reduction)
When a statistical procedure has been specified in the study protocol, statistical theory provides well-defined probability statements for the method when applied to all populations that could have arisen from the randomization used to generate the data. This provides an objective way of estimating parameters, estimating confidence intervals, testing hypotheses, and selecting the best among competing alternatives. Even for observational data, statistical theory provides a way of calculating a value that can be used to interpret a sample of data from a population. It can also provide a means of indicating how well that value is determined by the sample, and thus a means of saying whether corresponding values derived for different populations are as different as they might seem. However, the reliability of inferences from post-hoc observational data is often worse than for planned randomized generation of data.
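One way to see how probability statements can follow from the randomization itself is a permutation test, sketched below in Python. This is an illustration under stated assumptions rather than the method of any particular protocol: the test statistic (absolute difference in group means), the Monte Carlo approximation, and the name `permutation_p_value` are all choices made for this example.

```python
import random
import statistics


def permutation_p_value(treated, control, n_resamples=10000, seed=0):
    """Approximate two-sided p-value for a difference in means.

    Group labels are re-shuffled many times, mimicking the allocations
    that the original randomization could have produced.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(treated) - statistics.mean(control))
    pooled = list(treated) + list(control)
    k = len(treated)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:k]) - statistics.mean(pooled[k:]))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Adding one to numerator and denominator counts the observed
    # allocation itself, so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

A small value indicates that few re-allocations produce a difference as large as the one observed, which is the sense in which the randomization supplies the reference distribution.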
Applied statistical inference
Statistical theory provides the basis for a number of data analytic methods that are common across scientific and social research. Interpreting data is an important objective of statistical research. Some of these methods are:
- Estimating parameters
- Testing statistical hypotheses
- Providing a range of values instead of a point estimate
- Regression analysis
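The difference between a point estimate and a range of values can be sketched for the simplest case, the mean of a sample. The example below is a hedged illustration: it assumes a normal approximation to the sampling distribution of the mean (a t-based interval would usually be preferred for small samples), and the name `mean_with_interval` is hypothetical.

```python
import math
import statistics


def mean_with_interval(sample, level=0.95):
    """Return (point estimate, lower bound, upper bound) for the population mean.

    Assumes the sample mean is approximately normally distributed.
    """
    n = len(sample)
    estimate = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, estimate - z * standard_error, estimate + z * standard_error
```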
Many of the standard methods for these tasks rely on certain statistical assumptions (made in the derivation of the methodology) actually holding in practice. Statistical theory studies the consequences of departures from these assumptions. In addition, it provides a range of robust statistical techniques that are less dependent on assumptions, and it provides methods for checking whether particular assumptions are reasonable for a given data set.
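A small numerical sketch can show what reduced dependence on assumptions means in practice. The comparison below, between the mean and the median and between the standard deviation and the median absolute deviation, is one common illustration and not the only form of robustness; the function names are hypothetical.

```python
import statistics


def location_estimates(sample):
    """Return (mean, median): a non-robust and a robust estimate of location."""
    return statistics.mean(sample), statistics.median(sample)


def median_absolute_deviation(sample):
    """Robust measure of spread: median distance from the sample median."""
    data = list(sample)
    centre = statistics.median(data)
    return statistics.median([abs(x - centre) for x in data])
```

Adding a single gross outlier to a small sample moves the mean substantially while the median changes little, which is the behaviour robust techniques are designed to provide.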
Further reading
- Davison, A. C. (2003) Statistical Models. Cambridge University Press. ISBN 0521773393