Statistical assumptions
Statistical assumptions are general assumptions about statistical populations.

Statistics, like all mathematical disciplines, does not generate valid conclusions from nothing. In order to generate interesting conclusions about real statistical populations, it is usually required to make some background assumptions. These must be made with care, because inappropriate assumptions can generate wildly inaccurate conclusions.

The most commonly applied statistical assumptions are:
  1. independence of observations from each other: wrongly assuming independence is a common error. (see statistical independence)
  2. independence of observational error from potential confounding effects
  3. exact or approximate normality of observations: The assumption of normality is often erroneous, because many populations are not normal. However, it is standard practice to treat the mean of a sufficiently large random sample as approximately normal, because of the central limit theorem. (see normal distribution)
  4. linearity of graded responses to quantitative stimuli (see linear regression)
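
The remark about the central limit theorem in item 3 can be illustrated with a small simulation. The following is a minimal sketch, not a formal demonstration: the exponential population, the sample size of 50, the fixed seed and the helper `skewness` are all illustrative assumptions that do not come from the article. It compares the skewness of raw observations from a clearly non-normal population with the skewness of means of random samples drawn from the same population.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed standard deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


rng = random.Random(0)  # fixed seed (an arbitrary choice) so the sketch is repeatable

# A strongly skewed, non-normal population: exponential with mean 1 (assumed for illustration).
raw = [rng.expovariate(1.0) for _ in range(20000)]

# Means of 2000 random samples, each of size 50, drawn from the same population.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(50)])
    for _ in range(2000)
]

raw_skew = skewness(raw)
mean_skew = skewness(sample_means)
print(f"skewness of raw observations: {raw_skew:.2f}")
print(f"skewness of sample means:     {mean_skew:.2f}")
```

A normal distribution has skewness zero, so a value much closer to zero for the sample means than for the raw observations is consistent with the sample mean being approximately normal even though the population is not.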

Types of assumptions

Statistical assumptions can be categorised into a number of different types:
  • Non-modelling assumptions. Statistical analyses of data involve making certain types of assumption, whether or not a formal statistical model is used. Such assumptions underlie even descriptive statistics.
    • Population assumptions. A statistical analysis of data is made on the basis that the observations available derive from either a single population or several different populations, each of which is in some way meaningful. Here a "population" is informally a set of other possible observations that might have been made. The assumption here is a simple one, to the effect that the observer should know that the observations obtained are representative of the problem, topic or class of objects being studied.
    • Sampling assumptions. These relate to the way in which observations have been gathered and may often involve an assumption of random selection of some type.
  • Modelling assumptions. These may be divided into three types:
    • Distributional assumptions. Where a statistical model involves terms relating to random errors, assumptions may be made about the probability distribution of these errors. In some cases, the distributional assumption relates to the observations themselves.
    • Structural assumptions. Statistical relationships between variables are often modelled by equating one variable to a function of another (or several others), plus a random error. Models often involve making a structural assumption about the form of the functional relationship, for example as in linear regression. This can be generalised to models involving relationships between underlying unobserved latent variables.
    • Cross-variation assumptions. These assumptions involve the joint probability distributions of either the observations themselves or the random errors in a model. Simple models may include the assumption that observations or errors are statistically independent.
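
The three kinds of modelling assumption can be made concrete by writing down a simple model explicitly. The sketch below is only an illustration under assumed choices: the function names `simulate` and `least_squares`, the parameter values and the seed are hypothetical and not part of the article. It generates data from a straight-line model with independent normal errors, marking where each assumption enters, and then recovers the line by ordinary least squares.

```python
import random


def simulate(n, intercept, slope, sigma, rng):
    """Generate (x, y) pairs under three explicit modelling assumptions."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = []
    for x in xs:
        # Distributional assumption: errors are normal with mean 0 and s.d. sigma.
        # Cross-variation assumption: each error is drawn independently of the others.
        error = rng.gauss(0.0, sigma)
        # Structural assumption: y is a linear function of x, plus the random error.
        ys.append(intercept + slope * x + error)
    return xs, ys


def least_squares(xs, ys):
    """Ordinary least-squares estimates (intercept, slope) for a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(1)  # fixed seed (arbitrary) for repeatability
xs, ys = simulate(500, intercept=2.0, slope=0.5, sigma=1.0, rng=rng)
a_hat, b_hat = least_squares(xs, ys)
print(f"estimated intercept: {a_hat:.2f}, estimated slope: {b_hat:.2f}")
```

When the data really do satisfy the stated assumptions, as they do here by construction, the estimates land close to the values used in the simulation. With real data none of the three assumptions is guaranteed to hold.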

Checking assumptions

Given that the validity of conclusions drawn from a statistical analysis depends on the validity of any assumptions made, it is clearly important that these assumptions should be reviewed at some stage. In some instances, for example where data are lacking, this may have to be restricted to just making a judgement about whether an assumption is reasonable. This can be extended slightly to trying to judge what effect a departure from the assumptions might have. Where more extensive data are available, various types of procedure for statistical model validation are available, in particular for regression model validation.
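
One simple form of regression model validation is to inspect the residuals of a fitted model. The following is a minimal sketch of one such check, chosen here for illustration rather than taken from the article: a straight line is fitted to two artificial data sets, one truly linear and one curved, and the lag-1 autocorrelation of the residuals, ordered by the explanatory variable, is compared. The helper names, the data-generating formulas and the seed are all assumptions made for this example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares estimates (intercept, slope) for a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals_sorted_by_x(xs, ys):
    """Residuals from the fitted line, ordered by increasing x."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in sorted(zip(xs, ys))]


def lag1_autocorrelation(values):
    """Correlation between each value and the next one in the sequence."""
    n = len(values)
    m = sum(values) / n
    num = sum((values[i] - m) * (values[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in values)
    return num / den


rng = random.Random(2)  # fixed seed (arbitrary) for repeatability
xs = [rng.uniform(0.0, 10.0) for _ in range(300)]

# Data for which the linearity assumption holds, and data for which it does not.
linear_ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
curved_ys = [1.0 + 0.5 * x * x + rng.gauss(0.0, 1.0) for x in xs]

r_linear = lag1_autocorrelation(residuals_sorted_by_x(xs, linear_ys))
r_curved = lag1_autocorrelation(residuals_sorted_by_x(xs, curved_ys))
print(f"residual lag-1 autocorrelation, linear data: {r_linear:.2f}")
print(f"residual lag-1 autocorrelation, curved data: {r_curved:.2f}")
```

If the structural assumption of linearity is reasonable, neighbouring residuals should look unrelated and the autocorrelation should be near zero. A value near one indicates that the residuals follow a systematic pattern, which suggests that the assumed functional form is wrong.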
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 