Glossary of probability and statistics
The following is a glossary of terms. It is not intended to be all-inclusive.

Concerned fields

  • Probability theory
  • Algebra of random variables (linear algebra)
  • Statistics
  • Measure theory
  • Estimation theory

Glossary

  • Atomic event : another name for elementary event.
  • Bias can refer either to a sample not being representative of the population, or to the difference between the expected value of an estimator and the true value.
  • Binary data is data that can take only two values, usually represented by 0 and 1.
  • Conditional distribution : Given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X (written "Y | X") is the probability distribution of Y when X is known to be a particular value.
  • Conditional probability is the probability of some event A, assuming that event B has occurred. Conditional probability is written P(A|B), and is read "the probability of A, given B".
  • Completeness is a property of a statistic in relation to a model for a set of observed data. In essence, it is a condition which ensures that the parameters of the probability distribution representing the model can all be estimated on the basis of the statistic.
  • Correlation, also called correlation coefficient, is a numeric measure of the strength of the linear relationship between two random variables (one can use it to quantify, for example, how shoe size and height are correlated in the population). An example is the Pearson product-moment correlation coefficient, which is found by dividing the covariance of the two variables by the product of their standard deviations. Independent variables have a correlation of 0, although a correlation of 0 does not by itself imply independence.
  • Count data is data arising from counting that can take only non-negative integer values.
  • The Covariance between two random variables X and Y, with expected values E(X) = μ and E(Y) = ν, is defined as the expected value of the random variable (X − μ)(Y − ν), and is written cov(X, Y). It is used for measuring correlation.
  • Credence : a subjective estimate of probability.
  • A data set is a sample and the associated data points.
  • A data point is a typed measurement: it can be a boolean value, a real number, a vector (in which case it is also called a data vector), etc.
  • A Distribution function is the function that gives the probability distribution of a random variable. It cannot be negative, and its integral on the probability space is equal to 1.
  • Efficiency : an efficient estimator is an estimator that estimates the quantity of interest in some "best possible" manner. The notion of "best possible" relies upon the choice of a particular loss function, the function which quantifies the relative degree of undesirability of estimation errors.
  • An Elementary event (or atomic event) is an event with only one element. For example, when pulling a card out of a deck, "getting the jack of spades" is an elementary event, while "getting a king or an ace" is not.
  • An Estimator is a function of the known data that is used to estimate an unknown parameter; an estimate is the result from the actual application of the function to a particular set of data. For example, the sample mean can be used as an estimator of the population mean.
  • The Expected value (or expectation) of a random variable is the sum, over all possible outcomes of the experiment, of the probability of each outcome multiplied by its payoff ("value"). Thus, it represents the average amount one "expects" to win per bet if bets with identical odds are repeated many times. For example, the expected value of a fair six-sided die roll is 3.5. The concept is similar to the mean. The expected value of random variable X is typically written E(X) or μ (mu).
  • Experiment : a repeatable procedure with a well-defined set of possible outcomes (see sample space).
  • An event is a subset of the sample space, to which a probability can be assigned. For example, on rolling a die, "getting a five or a six" is an event (with a probability of one third if the die is fair).
  • Generating function : a formal power series in one indeterminate, whose coefficients encode information about a sequence of numbers indexed by the natural numbers.
  • Independence or Statistical independence : Two events are independent if the occurrence of one does not affect the probability of the other (for example, getting a 1 on one die roll does not affect the probability of getting a 1 on a second roll). Similarly, when we assert that two random variables are independent, we intuitively mean that knowing something about the value of one of them does not yield any information about the value of the other.
  • Joint distribution : Given two random variables X and Y, the joint distribution of X and Y is the probability distribution of X and Y together.
  • Joint probability is the probability of two events occurring together. The joint probability of A and B is written P(A ∩ B) or P(A, B).
  • Kurtosis is a measure of the "peakedness" of the probability distribution of a real-valued random variable. Higher kurtosis means more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly sized deviations.
  • A likelihood function (or just likelihood) is a conditional probability function considered as a function of its second argument with its first argument held fixed. For example, imagine pulling a numbered ball with the number k from a bag of n balls, numbered 1 to n. Then you could describe a likelihood function for the random variable N as the probability of getting k given that there are n balls: the likelihood will be 1/n for n greater than or equal to k, and 0 for n smaller than k. Unlike a probability distribution function, this likelihood function will not sum up to 1 on the sample space.
  • Marginal distribution : Given two jointly distributed random variables X and Y, the marginal distribution of X is simply the probability distribution of X ignoring information about Y.
  • Marginal probability is the probability of an event, ignoring any information about other events. The marginal probability of A is written P(A). Contrast with conditional probability.
  • The Mean of a random variable is its expected value. The mean (or sample mean) of a data set is just the average value.
  • Moment about the mean
  • Mutual independence : A collection of events is mutually independent if for any subset of the collection, the joint probability of all events occurring is equal to the product of the probabilities of the individual events. For example, the results of a series of coin flips are mutually independent. This is a stronger condition than pairwise independence.
  • Pairwise independence : A pairwise independent collection of random variables is a set of random variables any two of which are independent.
  • Parameter, or "statistical parameter" : Can be a population parameter, a distribution parameter, or an unobserved parameter (with different shades of meaning). In statistics, this is often a quantity to be estimated.
  • Prior probability : In Bayesian inference, this represents prior beliefs or other information that is available before new data or observations are taken into account.
  • A population or statistical population is a set of entities about which statistical inferences are to be drawn, often based on random sampling. One can also talk about a population of measurements or values.
  • Population parameter : See statistical parameter.
  • Posterior probability : The result of a Bayesian analysis that encapsulates the combination of prior beliefs or information with observed data.
  • Probability density is used to describe probability in a continuous probability distribution. For example, you can't say that the probability of a man being exactly six feet tall is 20%, but you can say he has a 20% chance of being between five and six feet tall. Probability density is given by a probability density function. Contrast with probability mass.
  • A probability density function gives the probability distribution for a continuous random variable.
  • A probability distribution is a function that gives the probability of all elements in a given space: see List of probability distributions.
  • Probability interpretations : The different ways of understanding what a probability means, for example as the real, physical tendency of something to occur, or as a measure of how strongly one believes it will occur.
  • A Probability measure gives the probability of events in a probability space.
  • A probability space is a sample space over which a probability measure has been defined.
  • Random function : A function chosen at random from a finite family of functions. Typically, the family consists of the set of all maps from the domain to the codomain.
  • A random variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. It can be, for example, the result of a die roll before the die is thrown: it has possible outcomes, but it is not yet assigned a value. The distribution function of a random variable gives the probability of different results. We can also derive the mean and variance of a random variable.
    • Discrete random variable
    • Continuous random variable
  • A Random vector (or multivariate random variable) is a vector whose components are random variables on the same probability space.
  • The Range is the length of the smallest interval which contains all the data. It is calculated by subtracting the smallest observation from the greatest.
  • A sample is that part of a population which is actually observed.
  • The sample space is the set of possible outcomes of an experiment. For example, the sample space for rolling a six-sided die will be {1, 2, 3, 4, 5, 6}.
  • Sampling is a process of selecting observations to obtain knowledge about a population. There are many methods for choosing which sample to observe.
  • A sampling distribution is the probability distribution, under repeated sampling of the population, of a given statistic.
  • Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable. Roughly speaking, a distribution has positive skew (right-skewed) if the higher tail is longer and negative skew (left-skewed) if the lower tail is longer (confusing the two is a common error).
  • The standard deviation is the most commonly used measure of statistical dispersion. It is the square root of the variance, and is generally written σ (sigma).
  • Standardized moment
  • A statistic is the result of applying a statistical algorithm to a data set. It can also be described as an observable random variable.
  • Statistical inference is inference about a population from a random sample drawn from it or, more generally, about a random process from its observed behavior during a finite period of time.
  • Statistical dispersion (also called statistical variability) is a measure of how spread out some data is. It can be expressed by the variance or the standard deviation.
  • A Statistical parameter is a parameter that indexes a family of probability distributions.
  • Sufficiency : A statistic is sufficient with respect to a statistical model and its associated unknown parameter if no other statistic which can be calculated from the same sample provides any additional information about that parameter.
  • The variance of a random variable is a measure of its statistical dispersion, indicating how far from the expected value its values typically are. The variance of random variable X is typically designated as var(X), σ_X², or simply σ².
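
Several of the entries above (expected value, variance, standard deviation, covariance, correlation) can be illustrated with a short Python sketch. It assumes a fair six-sided die for the first part; the helper names covariance and correlation and the shoe-size and height figures are hypothetical, invented only for illustration, and the covariance shown is the population form (dividing by n).

```python
from fractions import Fraction
from math import sqrt

# Sample space of a fair six-sided die; each outcome has probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)

# Expected value: sum of probability times value over all outcomes.
expected = sum(p * x for x in outcomes)

# Variance: expected squared deviation from the expected value.
variance = sum(p * (x - expected) ** 2 for x in outcomes)

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)


def covariance(xs, ys):
    """Population covariance of paired data: mean of (x - mean_x)(y - mean_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical paired measurements, made up for this example.
shoe_sizes = [38, 40, 42, 44, 46]
heights_cm = [160, 168, 175, 181, 190]

print(expected)  # 7/2, i.e. 3.5
print(variance)  # 35/12
print(std_dev)
print(correlation(shoe_sizes, heights_cm))
```

Exact fractions are used for the die so that the expected value comes out as exactly 7/2 rather than a rounded float; the paired-data helpers use ordinary floating-point arithmetic.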

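The numbered-ball example given under likelihood function can be sketched in Python as follows. The function name likelihood, the observed value k = 3, and the cut-off of 10 candidate bag sizes are assumptions chosen for illustration; the true range of n is unbounded.

```python
from fractions import Fraction


def likelihood(n, k):
    """Probability of drawing ball number k from a bag of n balls numbered 1..n,
    read as a function of n with the observed k held fixed."""
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive integers")
    return Fraction(1, n) if n >= k else Fraction(0)


k = 3  # the number seen on the drawn ball (assumed for this example)
candidates = range(1, 11)  # candidate bag sizes, truncated at 10 for display

values = [likelihood(n, k) for n in candidates]

# The candidate bag size with the highest likelihood.
best_n = max(candidates, key=lambda n: likelihood(n, k))

# Summing over the candidate bag sizes does not give 1.
total = sum(values)

print(best_n)
print(total)
```

The likelihood is largest at n = k and decreases as n grows, and the values do not add up to 1 over the candidate bag sizes, which is the contrast with a probability distribution drawn in the glossary entry.
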
See also

  • Notation in probability and statistics
  • Probability axioms
  • Glossary of experimental design
  • List of statistical topics
  • List of probability topics

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 