List of statistical topics

0–9

  • 1.96
    1.96
    1.96 is the approximate value of the 97.5th percentile of the standard normal distribution, used in probability and statistics. About 95% of the area under a normal curve lies within roughly 1.96 standard deviations of the mean, and due to the central limit theorem, this number is therefore used in the... (a quick numerical check follows below)

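    A minimal numerical check of these two facts, using only Python's standard library (the values are computed, not assumed):

      from statistics import NormalDist

      nd = NormalDist()  # standard normal distribution
      # 97.5th percentile (upper 2.5% point)
      print(round(nd.inv_cdf(0.975), 4))                  # 1.96
      # central probability mass within +/-1.96 standard deviations
      print(round(nd.cdf(1.96) - nd.cdf(-1.96), 4))       # 0.95
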
  • 2SLS (two-stage least squares) — redirects to instrumental variable
    Instrumental variable
    In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables is used to estimate causal relationships when controlled experiments are not feasible....

  • 3SLS — redirects to Three-stage least squares
  • 68-95-99.7 rule
    68-95-99.7 rule
    In statistics, the 68-95-99.7 rule, or three-sigma rule, or empirical rule, states that for a normal distribution about 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively... (see the sketch below)

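    A minimal sketch verifying the three percentages with Python's standard library:

      from statistics import NormalDist

      nd = NormalDist()  # standard normal
      for k in (1, 2, 3):
          inside = nd.cdf(k) - nd.cdf(-k)   # probability within k standard deviations
          print(k, round(inside, 4))        # 1 0.6827 / 2 0.9545 / 3 0.9973
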
  • 100-year flood
    100-year flood
    A one-hundred-year flood is calculated to be the level of flood water expected to be equaled or exceeded once every 100 years on average. The 100-year flood is more accurately referred to as the 1% annual exceedance probability flood, since it is a flood that has a 1% chance of being equaled or exceeded...


A

  • A posteriori probability (disambiguation)
  • A priori probability
    A priori probability
    The term a priori probability is used in distinguishing the ways in which values for probabilities can be obtained. In particular, an "a priori probability" is derived purely by deductive reasoning...

  • A priori (statistics)
    A priori (statistics)
    In statistics, a priori knowledge is prior knowledge about a population, rather than that estimated by recent observation. It is common in Bayesian inference to make inferences conditional upon this knowledge, and the integration of a priori knowledge is the central difference between the Bayesian...

  • Abductive reasoning
    Abductive reasoning
    Abduction is a kind of logical inference described by Charles Sanders Peirce as "guessing". The term refers to the process of arriving at an explanatory hypothesis. Peirce said that to abduce a hypothetical explanation a from an observed surprising circumstance b is to surmise that a may be true...

  • Absolute deviation
    Absolute deviation
    In statistics, the absolute deviation of an element of a data set is the absolute difference between that element and a given point. Typically the point from which the deviation is measured is a measure of central tendency, most often the median or sometimes the mean of the data set: D_i = |x_i - m|... (a worked example follows below)

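    A short worked example of the formula above (the data values are arbitrary); it also summarizes the deviations with the median absolute deviation:

      from statistics import median

      data = [2, 2, 3, 4, 14]
      m = median(data)                          # central point, here the median (3)
      deviations = [abs(x - m) for x in data]   # D_i = |x_i - m|
      print(deviations)                         # [1, 1, 0, 1, 11]
      print(median(deviations))                 # median absolute deviation: 1
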
  • Absolute risk reduction
    Absolute risk reduction
    In epidemiology, the absolute risk reduction or risk difference is the decrease in risk of a given activity or treatment in relation to a control activity or treatment. It is the inverse of the number needed to treat... (a small numeric example follows below)

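    A small numeric illustration with made-up risks (20% in the control group, 15% in the treatment group):

      risk_control = 0.20
      risk_treatment = 0.15

      arr = risk_control - risk_treatment   # absolute risk reduction (risk difference)
      nnt = 1 / arr                         # number needed to treat = 1 / ARR
      print(round(arr, 2), round(nnt, 1))   # 0.05 20.0
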
  • ABX test
    ABX test
    An ABX test is a method of comparing two kinds of sensory stimuli to identify detectable differences. A subject is presented with two known samples, A and B, and one unknown sample X, for three samples total. X is randomly selected from A and B, and the subject identifies X as being either A or B...

  • Accelerated failure time model
    Accelerated failure time model
    In the statistical area of survival analysis, an accelerated failure time model is a parametric model that provides an alternative to the commonly used proportional hazards models...

  • Acceptable quality limit
    Acceptable quality limit
    The acceptable quality limit is the worst tolerable process average, expressed as a percentage or ratio, that is still considered acceptable: that is, it is at an acceptable quality level...

  • Acceptance sampling
    Acceptance sampling
    Acceptance sampling uses statistical sampling to determine whether to accept or reject a production lot of material. It has been a common quality control technique used in industry and particularly the military for contracts and procurement. It is usually done as products leave the factory, or in...

  • Accidental sampling
    Accidental sampling
    Accidental sampling is a type of nonprobability sampling which involves the sample being drawn from that part of the population which is close to hand. That is, a sample population selected because it is readily available and convenient...

  • Accuracy and precision
    Accuracy and precision
    In the fields of science, engineering, industry and statistics, the accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity's actual value. The precision of a measurement system, also called reproducibility or repeatability, is the degree to which...

  • Accuracy paradox
    Accuracy paradox
    The accuracy paradox for predictive analytics states that predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy...

  • Acquiescence bias
    Acquiescence bias
    Acquiescence bias is a category of response bias in which respondents to a survey have a tendency to agree with all the questions or to indicate a positive connotation. Acquiescence is sometimes referred to as "yea-saying" and is the tendency of a respondent to agree with a statement when in doubt...

  • Actuarial science
    Actuarial science
    Actuarial science is the discipline that applies mathematical and statistical methods to assess risk in the insurance and finance industries. Actuaries are professionals who are qualified in this field through education and experience...

  • ADAPA
    ADAPA
    ADAPA is intrinsically a predictive decisioning platform. It combines the power of predictive analytics and business rules to facilitate the tasks of managing and designing automated decisioning systems...

     – software
  • Adapted process
    Adapted process
    In the study of stochastic processes, an adapted process is one that cannot "see into the future". An informal interpretation is that X is adapted if and only if, for every realisation and every n, Xn is known at time n...

  • Adaptive estimator
    Adaptive estimator
    In statistics, an adaptive estimator is an estimator in a parametric or semiparametric model with nuisance parameters such that the presence of these nuisance parameters does not affect efficiency of estimation...

  • Additive Markov chain
    Additive Markov chain
    In probability theory, an additive Markov chain is a Markov chain with an additive conditional probability function. Here the process is a discrete-time Markov chain of order m and the transition probability to a state at the next time is a sum of functions, each depending on the next state and one...

  • Additive model
    Additive model
    In statistics, an additive model is a nonparametric regression method. It was suggested by Jerome H. Friedman and Werner Stuetzle and is an essential part of the ACE algorithm. The AM uses a one dimensional smoother to build a restricted class of nonparametric regression models. Because of this,...

  • Additive smoothing
    Additive smoothing
    In statistics, additive smoothing, also called Laplace smoothing, or Lidstone smoothing, is a technique used to smooth categorical data... (a small worked example follows below)

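    A small worked example, assuming the usual additive-smoothing formula (x_i + alpha) / (N + alpha * d), with made-up counts and alpha = 1 (Laplace smoothing):

      counts = {"red": 3, "green": 1, "blue": 0}
      alpha = 1
      N = sum(counts.values())   # total observations
      d = len(counts)            # number of categories

      smoothed = {c: (x + alpha) / (N + alpha * d) for c, x in counts.items()}
      print(smoothed)            # "blue" gets 1/7 instead of probability zero
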
  • Additive white Gaussian noise
    Additive white Gaussian noise
    Additive white Gaussian noise is a channel model in which the only impairment to communication is a linear addition of wideband or white noise with a constant spectral density and a Gaussian distribution of amplitude. The model does not account for fading, frequency selectivity, interference,...

  • Adjusted Rand index — redirects to Rand index
    Rand index
    The Rand index or Rand measure in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings...

     (subsection)
  • ADMB
    ADMB
    ADMB or AD Model Builder is a free and open source software suite for non-linear statistical modeling. It was created by David Fournier and now being developed by the ADMB Project, a creation of the non-profit ADMB Foundation...

     – software
  • Admissible decision rule
    Admissible decision rule
    In statistical decision theory, an admissible decision rule is a rule for making a decision such that there isn't any other rule that is always "better" than it, in a specific sense defined below....

  • Age adjustment
    Age adjustment
    In epidemiology and demography, age adjustment, also called age standardisation, is a technique used to better allow populations to be compared when the age profiles of the populations are quite different....

  • Age-standardized mortality rate
  • Age stratification
    Age stratification
    In critical sociology, age stratification refers to the hierarchical ranking of people into age groups within a society. Age stratification based on an ascribed status is a major source of inequality, and thus may lead to ageism...

  • Aggregate data
    Aggregate data
    In statistics, aggregate data describes data combined from several measurements. In economics, aggregate data or data aggregates describes high-level data that is composed of a multitude or combination of other more individual data...

  • Aggregate pattern
    Aggregate pattern
    An Aggregate pattern can refer to concepts in either statistics or computer programming. Both uses deal with considering a large case as composed of smaller, simpler pieces...

  • Akaike information criterion
    Akaike information criterion
    The Akaike information criterion is a measure of the relative goodness of fit of a statistical model. It was developed by Hirotsugu Akaike, under the name of "an information criterion", and was first published by Akaike in 1974... (a minimal formula sketch follows below)

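    A minimal sketch of the usual formula AIC = 2k - 2 ln(L^), where k is the number of estimated parameters and L^ the maximized likelihood; the two models and their log-likelihoods below are hypothetical:

      def aic(k, log_likelihood):
          # Akaike information criterion: 2k - 2 ln(L^); lower is preferred
          return 2 * k - 2 * log_likelihood

      print(round(aic(k=3, log_likelihood=-120.5), 1))   # 247.0
      print(round(aic(k=5, log_likelihood=-118.9), 1))   # 247.8 -> the simpler model wins here
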
  • Algebra of random variables
    Algebra of random variables
    In the algebraic axiomatization of probability theory, the primary concept is not that of probability of an event, but rather that of a random variable. Probability distributions are determined by assigning an expectation to each random variable...

  • Algebraic statistics
    Algebraic statistics
    Algebraic statistics is the use of algebra to advance statistics. Algebra has been useful for experimental design, parameter estimation, and hypothesis testing....

  • Algorithmic inference
    Algorithmic inference
    Algorithmic inference gathers new developments in the statistical inference methods made feasible by the powerful computing devices widely available to any data analyst...

  • Algorithms for calculating variance
    Algorithms for calculating variance
    Algorithms for calculating variance play a major role in statistical computing. A key problem in the design of good algorithms for this problem is that formulas for the variance may involve sums of squares, which can lead to numerical instability as well as to arithmetic overflow when dealing with... (a stable one-pass sketch follows below)

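    One commonly cited numerically stable approach is Welford's one-pass (online) algorithm; a minimal sketch:

      def online_variance(data):
          # Welford's online algorithm: single pass, no explicit sum of squares
          n, mean, m2 = 0, 0.0, 0.0
          for x in data:
              n += 1
              delta = x - mean
              mean += delta / n
              m2 += delta * (x - mean)      # running sum of squared deviations
          return m2 / (n - 1) if n > 1 else float("nan")

      print(online_variance([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))  # ~4.571 (sample variance)
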
  • All-pairs testing
    All-pairs testing
    All-pairs testing or pairwise testing is a combinatorial software testing method that, for each pair of input parameters to a system, tests all possible discrete combinations of those parameters...

  • Allan variance
    Allan variance
    The Allan variance, also known as two-sample variance, is a measure of frequency stability in clocks, oscillators and amplifiers. It is named after David W. Allan. It is expressed mathematically as \sigma_y^2(\tau)...

  • Alignments of random points
    Alignments of random points
    Alignments of random points, as shown by statistics, can be found when a large number of random points are marked on a bounded flat surface. This might be used to show that ley lines exist due to chance alone. One precise definition which expresses the generally accepted meaning of "alignment"...

  • Almost surely
    Almost surely
    In probability theory, one says that an event happens almost surely if it happens with probability one. The concept is analogous to the concept of "almost everywhere" in measure theory...

  • Alpha beta filter
    Alpha beta filter
    An alpha beta filter is a simplified form of observer for estimation, data smoothing and control applications. It is closely related to Kalman filters and to linear state observers used in control theory...

  • Alternative hypothesis
  • Analyse-it
    Analyse-it
    Analyse-it is a statistical analysis add-in for Microsoft Excel. Analyse-it is the successor to Astute, developed in 1992 for Excel 4 and the first statistical analysis add-in for Microsoft Excel...

     – software
  • Analysis of categorical data
    Analysis of categorical data
    This is a list of statistical procedures which can be used for the analysis of categorical data, also known as data on the nominal scale and as categorical variables: the categorical distribution (general model), stratified analysis, the chi-squared test...

  • Analysis of covariance
  • Analysis of molecular variance
    Analysis of molecular variance
    Analysis of molecular variance , is a statistical model for the molecular variation in a single species, typically biological. The name and model are inspired by ANOVA. The method was developed by Laurent Excoffier, Peter Smouse and Joseph Quattro at Rutgers University in 1992.Since developing...

  • Analysis of rhythmic variance
    Analysis of rhythmic variance
    In statistics, analysis of rhythmic variance is a method for detecting rhythms in biological time series, published by Peter Celec. It is a procedure for detecting cyclic variations in biological time series and quantification of their probability...

  • Analysis of variance
    Analysis of variance
    In statistics, analysis of variance is a collection of statistical models, and their associated procedures, in which the observed variance in a particular variable is partitioned into components attributable to different sources of variation...

  • Analytic and enumerative statistical studies
    Analytic and enumerative statistical studies
    Analytic and enumerative statistical studies are two types of scientific studies: In any statistical study the ultimate aim is to provide a rational basis for action. Enumerative and analytic studies differ by where the action is taken...

  • Ancestral graph
    Ancestral graph
    An ancestral graph is a graph with three types of edges: directed edge, bidirected edge, and undirected edge such that it can be decomposed into three parts: an undirected subgraph, a directed subgraph, and directed edges pointing from the undirected subgraph to the directed subgraph. An ancestral...

  • Anchor test
    Anchor test
    In psychometrics, an anchor test is a common set of test items administered in combination with two or more alternative forms of the test with the aim of establishing the equivalence of the test scores on the alternative forms. The purpose of the anchor test is to provide a baseline for an...

  • Ancillary statistic
    Ancillary statistic
    In statistics, an ancillary statistic is a statistic whose sampling distribution does not depend on which of the probability distributions among those being considered is the distribution of the statistical population from which the data were taken...

  • ANCOVA
    ANCOVA
    In statistics, analysis of covariance is a general linear model with a continuous outcome variable and two or more predictor variables where at least one is continuous and at least one is categorical. ANCOVA is a merger of ANOVA and regression for continuous variables...

     – redirects to Analysis of covariance
  • Anderson–Darling test
  • ANOVA
  • ANOVA on ranks
    ANOVA on ranks
    In statistics, one purpose for the analysis of variance is to analyze differences in means between groups. The test statistic, F, assumes independence of observations, homogeneous variances, and population normality...

  • ANOVA-simultaneous component analysis
    ANOVA-simultaneous component analysis
    ASCA, ANOVA-SCA, or analysis of variance – simultaneous component analysis is a method that partitions variation and enables interpretation of these partitions by SCA, a method that is similar to PCA. This method is a multi or even megavariate extension of ANOVA. The variation partitioning is...

  • Anomaly detection
    Anomaly detection
    Anomaly detection, also referred to as outlier detection, refers to detecting patterns in a given data set that do not conform to an established normal behavior...

  • Anomaly time series
    Anomaly time series
    In atmospheric sciences and some other applications of statistics, an anomaly time series is the time series of deviations of a quantity from some mean. Similarly a standardized anomaly series contains values of deviations divided by a standard deviation...

  • Anscombe transform
    Anscombe transform
    In statistics, the Anscombe transform, named after Francis Anscombe, is a variance-stabilizing transformation that transforms a random variable with a Poisson distribution into one with an approximately standard Gaussian distribution. The Anscombe transform is widely used in photon-limited imaging... (the transform formula is sketched below)

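    A minimal sketch of the transform itself, A(x) = 2 * sqrt(x + 3/8), applied to a few count values:

      import math

      def anscombe(x):
          # variance-stabilizing transform for Poisson counts
          return 2.0 * math.sqrt(x + 3.0 / 8.0)

      print([round(anscombe(x), 3) for x in (0, 1, 4, 9, 16)])
      # [1.225, 2.345, 4.183, 6.124, 8.093]
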
  • Anscombe's quartet
    Anscombe's quartet
    Anscombe's quartet comprises four datasets that have identical simple statistical properties, yet appear very different when graphed. Each dataset consists of eleven points. They were constructed in 1973 by the statistician F.J...

  • Antecedent variable
    Antecedent variable
    In statistics and social sciences, an antecedent variable is a variable that can help to explain the apparent relationship between other variables that are nominally in a cause and effect relationship...

  • Antithetic variates
    Antithetic variates
    The antithetic variates method is a variance reduction technique used in Monte Carlo methods. Considering that the error reduction in the simulated signal has a square root convergence, a very large number of sample paths is required to obtain an accurate result...

  • Approximate Bayesian computation
    Approximate Bayesian computation
    Approximate Bayesian computation is a family of computational techniques in Bayesian statistics. These simulation techniques operate on summary data to make broad inferences with less computation than might be required if all available data were analyzed in detail...

  • Arcsine distribution
  • Area chart
    Area chart
    An area chart or area graph graphically displays quantitative data. It is based on the line chart. The area between the axis and the line is commonly emphasized with colors, textures and hatchings...

  • Area compatibility factor
  • ARGUS distribution
  • Arithmetic mean
    Arithmetic mean
    In mathematics and statistics, the arithmetic mean, often referred to as simply the mean or average when the context is clear, is a method to derive the central tendency of a sample space...

  • Armitage–Doll multistage model of carcinogenesis
  • Arrival theorem
  • Artificial neural network
    Artificial neural network
    An artificial neural network, usually called a neural network, is a mathematical model or computational model that is inspired by the structure and/or functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons, and it processes...

  • Ascertainment bias
  • ASReml
    ASReml
    ASReml is a statistical software package for fitting linear mixed models using restricted maximum likelihood, a technique commonly used in plant and animal breeding and quantitative genetics as well as other fields...

      – software
  • Association (statistics)
    Association (statistics)
    In statistics, an association is any relationship between two measured quantities that renders them statistically dependent. The term "association" refers broadly to any such relationship, whereas the narrower term "correlation" refers to a linear relationship between two quantities.There are many...

  • Association mapping
    Association mapping
    Association mapping, also known as "linkage disequilibrium mapping", is a method of mapping quantitative trait loci that takes advantage of historic linkage disequilibrium to link phenotypes to genotypes. Association mapping is based on the idea that traits that have entered a population...

  • Association scheme
    Association scheme
    The theory of association schemes arose in statistics, in the theory of experimental design for the analysis of variance. In mathematics, association schemes belong to both algebra and combinatorics. Indeed, in algebraic combinatorics, association schemes provide a unified approach to many topics,...

  • Assumed mean
    Assumed mean
    In statistics the assumed mean is a method for calculating the arithmetic mean and standard deviation of a data set. It simplifies calculating accurate values by hand. Its interest today is chiefly historical but it can be used to quickly estimate these statistics...

  • Asymptotic distribution
    Asymptotic distribution
    In mathematics and statistics, an asymptotic distribution is a hypothetical distribution that is in a sense the "limiting" distribution of a sequence of distributions...

  • Asymptotic equipartition property
    Asymptotic equipartition property
    In information theory the asymptotic equipartition property is a general property of the output samples of a stochastic source. It is fundamental to the concept of typical set used in theories of compression....

     (information theory)
  • Asymptotic normality – redirects to Asymptotic distribution
    Asymptotic distribution
    In mathematics and statistics, an asymptotic distribution is a hypothetical distribution that is in a sense the "limiting" distribution of a sequence of distributions...

  • Asymptotic relative efficiency redirects to Efficiency (statistics)
    Efficiency (statistics)
    In statistics, an efficient estimator is an estimator that estimates the quantity of interest in some “best possible” manner. The notion of “best possible” relies upon the choice of a particular loss function — the function which quantifies the relative degree of undesirability of estimation errors...

  • Asymptotic theory (statistics)
    Asymptotic theory (statistics)
    In statistics, asymptotic theory, or large sample theory, is a generic framework for assessment of properties of estimators and statistical tests...

  • Atkinson index
    Atkinson index
    The Atkinson index is a measure of income inequality developed by British economist Anthony Barnes Atkinson...

  • Attack rate
    Attack rate
    In epidemiology, an attack rate is the cumulative incidence of infection in a group of people observed over a period of time during an epidemic, usually in relation to foodborne illness....

  • Augmented Dickey–Fuller test
  • Aumann's agreement theorem
    Aumann's agreement theorem
    Aumann's agreement theorem says that two people acting rationally and with common knowledge of each other's beliefs cannot agree to disagree...

  • Autocorrelation
    Autocorrelation
    Autocorrelation is the cross-correlation of a signal with itself. Informally, it is the similarity between observations as a function of the time separation between them...

    • Autocorrelation plot redirects to Correlogram
      Correlogram
      In the analysis of data, a correlogram is an image of correlation statistics. For example, in time series analysis, a correlogram, also known as an autocorrelation plot, is a plot of the sample autocorrelations r_h versus h...

  • Autocovariance
    Autocovariance
    In statistics, given a real stochastic process X, the autocovariance is the covariance of the process with itself, i.e. the covariance of the variable against a time-shifted version of itself...

  • Autoregressive conditional duration
    Autoregressive conditional duration
    In financial econometrics, an autoregressive conditional duration model considers irregularly spaced and autocorrelated intertrade durations. ACD is analogous to GARCH...

  • Autoregressive conditional heteroskedasticity
    Autoregressive conditional heteroskedasticity
    In econometrics, AutoRegressive Conditional Heteroskedasticity models are used to characterize and model observed time series. They are used whenever there is reason to believe that, at any point in a series, the terms will have a characteristic size, or variance...

  • Autoregressive fractionally integrated moving average
    Autoregressive fractionally integrated moving average
    In statistics, autoregressive fractionally integrated moving average models are time series models that generalize ARIMA models by allowing non-integer values of the differencing parameter and are useful in modeling time series with long memory...

  • Autoregressive integrated moving average
    Autoregressive integrated moving average
    In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average model is a generalization of an autoregressive moving average model. These models are fitted to time series data either to better understand the data or to predict future points...

  • Autoregressive model
    Autoregressive model
    In statistics and signal processing, an autoregressive model is a type of random process which is often used to model and predict various types of natural phenomena...

  • Autoregressive moving average model
    Autoregressive moving average model
    In statistics and signal processing, autoregressive–moving-average models, sometimes called Box–Jenkins models after the iterative Box–Jenkins methodology usually used to estimate them, are typically applied to autocorrelated time series data.Given a time series of data Xt, the ARMA model is a...

  • Auxiliary particle filter
    Auxiliary particle filter
    The auxiliary particle filter is a particle filtering algorithm introduced by Pitt and Shephard in 1999 to improve some deficiencies of the sequential importance resampling algorithm when dealing with tailed observation densities....

  • Average
    Average
    In mathematics, an average, or central tendency of a data set is a measure of the "middle" value of the data set. Average is one form of central tendency. Not all central tendencies should be considered definitions of average....

  • Average treatment effect
  • Averaged one-dependence estimators
  • Azuma's inequality

B

  • BA model
    BA model
    The Barabási–Albert model is an algorithm for generating random scale-free networks using a preferential attachment mechanism. Scale-free networks are widely observed in natural and man-made systems, including the Internet, the world wide web, citation networks, and some social...

     – model for a random network
  • Backfitting algorithm
    Backfitting algorithm
    In statistics, the backfitting algorithm is a simple iterative procedure used to fit a generalized additive model. It was introduced in 1985 by Leo Breiman and Jerome Friedman along with generalized additive models...

  • Balance equation
    Balance equation
    In probability theory, a balance equation is an equation that describes the probability flux associated with a Markov chain in and out of states or set of states...

  • Balanced incomplete block design redirects to Block design
  • Balanced repeated replication
    Balanced repeated replication
    Balanced repeated replication is a statistical technique for estimating the sampling variability of a statistic obtained by stratified sampling. Outline of the technique: select balanced half-samples from the full sample...

  • Balding–Nichols model
  • Banburismus
    Banburismus
    Banburismus was a cryptanalytic process developed by Alan Turing at Bletchley Park in England during the Second World War. It was used by Bletchley Park's Hut 8 to help break German Kriegsmarine messages enciphered on Enigma machines. The process used sequential conditional probability to infer...

     — related to Bayesian networks
  • Bapat–Beg theorem
  • Bar chart
    Bar chart
    A bar chart or bar graph is a chart with rectangular bars with lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally....

  • Barabási–Albert model
  • Barber–Johnson diagram
  • Barnard's test
    Barnard's test
    In statistics, Barnard's test is an exact test of the null hypothesis of independence of rows and columns in a contingency table. It is an alternative to Fisher's exact test but is more time-consuming to compute...

  • Barnardisation
    Barnardisation
    Barnardisation is a method of disclosure control for tables of counts that involves randomly adding or subtracting 1 from some cells in the table....

  • Barnes interpolation
    Barnes interpolation
    Barnes interpolation, named after Stanley L. Barnes, is the interpolation of unstructured data points from a set of measurements of an unknown function in two dimensions into an analytic function of two variables...

  • Bartlett's method
  • Bartlett's test
    Bartlett's test
    In statistics, Bartlett's test is used to test if k samples are from populations with equal variances. Equal variances across samples is called homoscedasticity or homogeneity of variances. Some statistical tests, for example the analysis of variance, assume that variances are equal across groups...

  • Base rate
    Base rate
    In probability and statistics, base rate generally refers to the class probabilities unconditioned on featural evidence, frequently also known as prior probabilities...

  • Baseball statistics
    Baseball statistics
    Statistics play an important role in summarizing baseball performance and evaluating players in the sport.Since the flow of a baseball game has natural breaks to it, and normally players act individually rather than performing in clusters, the sport lends itself to easy record-keeping and statistics...

  • Basu's theorem
    Basu's theorem
    In statistics, Basu's theorem states that any boundedly complete sufficient statistic is independent of any ancillary statistic. This is a 1955 result of Debabrata Basu....

  • Bates distribution
    Bates distribution
    In probability and statistics, the Bates distribution is a probability distribution of the mean of a number of statistically independent uniformly distributed random variables on the unit interval...

  • Baum–Welch algorithm
  • Bayes' rule
    Bayes' rule
    In probability theory and applications, Bayes' rule relates the odds of event A_1 to event A_2, before and after conditioning on event B. The relationship is expressed in terms of the Bayes factor, \Lambda. Bayes' rule is derived from and closely related to Bayes' theorem...

  • Bayes' theorem
    Bayes' theorem
    In probability theory and applications, Bayes' theorem relates the conditional probabilities P(A|B) and P(B|A). It is commonly used in science and engineering. The theorem is named for Thomas Bayes... (a numeric example follows after this entry)

    • Evidence under Bayes theorem
      Evidence under Bayes theorem
      Bayes' theorem provides a way of updating the probability of an event in the light of new information. In the evidence law context, for example, it could be used as a way of updating the probability that a genetic sample found at the scene of the crime came from the defendant in light of a genetic...

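    A standard numeric illustration of Bayes' theorem, P(D|+) = P(+|D) P(D) / P(+), using made-up figures for a diagnostic test:

      p_disease = 0.01            # prior P(D)
      p_pos_given_d = 0.99        # sensitivity P(+|D)
      p_pos_given_not_d = 0.05    # false-positive rate P(+|not D)

      p_pos = p_pos_given_d * p_disease + p_pos_given_not_d * (1 - p_disease)
      p_d_given_pos = p_pos_given_d * p_disease / p_pos
      print(round(p_d_given_pos, 3))   # 0.167: a positive result is far from conclusive
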
  • Bayes estimator
    Bayes estimator
    In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function. Equivalently, it maximizes the posterior expectation of a utility function...

  • Bayes factor
    Bayes factor
    In statistics, the use of Bayes factors is a Bayesian alternative to classical hypothesis testing. Bayesian model comparison is a method of model selection based on Bayes factors...

  • Bayes linear statistics
  • Bayesian — disambiguation
  • Bayesian additive regression kernels
    Bayesian additive regression kernels
    Bayesian additive regression kernels is a non-parametric statistical model for regression and statistical classification.The unknown mean function is represented as a weighted sum of kernel functions, which is constructed by a prior using...

  • Bayesian average
    Bayesian average
    A Bayesian average is a method of estimating the mean of a population consistent with Bayesian interpretation, where instead of estimating the mean strictly from the available data set, other existing information related to that data set may also be incorporated into the calculation in order to...

  • Bayesian brain
    Bayesian brain
    Bayesian brain is a term that is used to refer to the ability of the nervous system to operate in situations of uncertainty in a fashion that is close to the optimal prescribed by Bayesian statistics. This term is used in behavioural sciences and neuroscience and studies associated with this term...

  • Bayesian econometrics
    Bayesian econometrics
    Bayesian econometrics is a branch of econometrics which applies Bayesian principles to economic modelling. Bayesianism is based on a degree-of-belief interpretation of probability, as opposed to a relative-frequency interpretation....

  • Bayesian experimental design
    Bayesian experimental design
    Bayesian experimental design provides a general probability-theoretical framework from which other theories on experimental design can be derived. It is based on Bayesian inference to interpret the observations/data acquired during the experiment...

  • Bayesian game
    Bayesian game
    In game theory, a Bayesian game is one in which information about characteristics of the other players is incomplete. Following John C. Harsanyi's framework, a Bayesian game can be modelled by introducing Nature as a player in a game...

  • Bayesian inference
    Bayesian inference
    In statistics, Bayesian inference is a method of statistical inference. It is often used in science and engineering to determine model parameters, make predictions about unknown variables, and to perform model selection...

  • Bayesian inference in phylogeny
    Bayesian inference in phylogeny
    Bayesian inference in phylogeny generates a posterior distribution for a parameter, composed of a phylogenetic tree and a model of evolution, based on the prior for that parameter and the likelihood of the data, generated by a multiple alignment. The Bayesian approach has become more popular due...

  • Bayesian information criterion
  • Bayesian linear regression
    Bayesian linear regression
    In statistics, Bayesian linear regression is an approach to linear regression in which the statistical analysis is undertaken within the context of Bayesian inference...

  • Bayesian model comparison — redirects to Bayes factor
    Bayes factor
    In statistics, the use of Bayes factors is a Bayesian alternative to classical hypothesis testing. Bayesian model comparison is a method of model selection based on Bayes factors...

  • Bayesian multivariate linear regression
  • Bayesian network
    Bayesian network
    A Bayesian network, Bayes network, belief network or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph. For example, a Bayesian network could represent the probabilistic...

  • Bayesian probability
    Bayesian probability
    Bayesian probability is one of the different interpretations of the concept of probability and belongs to the category of evidential probabilities. The Bayesian interpretation of probability can be seen as an extension of logic that enables reasoning with propositions, whose truth or falsity is...

  • Bayesian search theory
    Bayesian search theory
    Bayesian search theory is the application of Bayesian statistics to the search for lost objects. It has been used several times to find lost sea vessels, for example the USS Scorpion. The usual procedure is as follows:...

  • Bayesian spam filtering
    Bayesian spam filtering
    Bayesian spam filtering is a statistical technique of e-mail filtering. It makes use of a naive Bayes classifier to identify spam e-mail. Bayesian classifiers work by correlating the use of tokens with spam and non-spam e-mails and then using Bayesian inference to calculate a probability that an...

  • Bayesian statistics
    Bayesian statistics
    Bayesian statistics is that subset of the entire field of statistics in which the evidence about the true state of the world is expressed in terms of degrees of belief or, more specifically, Bayesian probabilities...

  • Bayesian VAR
    Bayesian VAR
    Bayesian Vector Autoregression is a term which indicates that Bayesian methods are used to estimate a vector autoregression. In that respect, the difference with standard VAR models lies in the fact that the model parameters are treated as random variables, and prior probabilities are assigned to...

     — Bayesian Vector Autoregression
  • BCMP network
    BCMP network
    In queueing theory, a discipline within the mathematical theory of probability, a BCMP network is a class of queueing network for which a product form equilibrium distribution exists. It is named after the authors of the paper where the network was first described: Baskett, Chandy, Muntz and Palacios...

     – queueing theory
  • Bean machine
    Bean machine
    The bean machine, also known as the quincunx or Galton box, is a device invented by Sir Francis Galton to demonstrate the central limit theorem, in particular that the normal distribution is approximate to the binomial distribution....

  • Behrens–Fisher problem
  • Belief propagation
    Belief propagation
    Belief propagation is a message passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node, conditional on any observed nodes...

  • Belt transect
    Belt transect
    Belt transects are used in biology to investigate the distribution of organisms in relation to a certain area, such as the seashore or a meadow. The technique records all the species found between two lines, how far they are from a certain place or area, and how many of them there are...

  • Benford's law
    Benford's law
    Benford's law, also called the first-digit law, states that in lists of numbers from many real-life sources of data, the leading digit is distributed in a specific, non-uniform way...

  • Bennett's inequality
    Bennett's inequality
    In probability theory, Bennett's inequality provides an upper bound on the probability that the sum of independent random variables deviates from its expected value by more than any specified amount...

  • Berkson error model
    Berkson error model
    The Berkson error model is a description of random error in measurement. Unlike classical error, Berkson error causes little or no bias in the measurement. It was proposed by Joseph Berkson in a paper entitled Are there two regressions?, published in 1950.An example of Berkson error arises in...

  • Berkson's paradox
    Berkson's paradox
    Berkson's paradox or Berkson's fallacy is a result in conditional probability and statistics which is counter-intuitive for some people, and hence a veridical paradox. It is a complicating factor arising in statistical tests of proportions...

  • Berlin procedure
    Berlin procedure
    The so-called Berlin procedure is a mathematical procedure for time series decomposition and seasonal adjustment of monthly and quarterly economic time series. The mathematical foundations of the procedure were developed in the 1960s at the Technical University of Berlin and the German Institute...

  • Bernoulli distribution
  • Bernoulli process
    Bernoulli process
    In probability and statistics, a Bernoulli process is a finite or infinite sequence of binary random variables, so it is a discrete-time stochastic process that takes only two values, canonically 0 and 1. The component Bernoulli variables Xi are identical and independent...

  • Bernoulli sampling
    Bernoulli sampling
    In the theory of finite population sampling, Bernoulli sampling is a sampling process where each element of the population that is sampled is subjected to an independent Bernoulli trial which determines whether the element becomes part of the sample during the drawing of a single sample...

  • Bernoulli scheme
    Bernoulli scheme
    In mathematics, the Bernoulli scheme or Bernoulli shift is a generalization of the Bernoulli process to more than two possible outcomes. Bernoulli schemes are important in the study of dynamical systems, as most such systems exhibit a repellor that is the product of the Cantor set and a smooth...

  • Bernoulli trial
    Bernoulli trial
    In the theory of probability and statistics, a Bernoulli trial is an experiment whose outcome is random and can be either of two possible outcomes, "success" and "failure"....

  • Bernstein inequalities (probability theory)
  • Bernstein–von Mises theorem
    Bernstein–von Mises theorem
    In Bayesian inference, the Bernstein–von Mises theorem provides the basis for the important result that the posterior distribution for unknown quantities in any problem is effectively independent of the prior distribution once the amount of information supplied by a sample of data is large...

  • Berry–Esseen theorem
    Berry–Esséen theorem
    The central limit theorem in probability theory and statistics states that under certain circumstances the sample mean, considered as a random quantity, becomes more normally distributed as the sample size is increased...

  • Bertrand's ballot theorem
  • Bertrand's box paradox
    Bertrand's box paradox
    Bertrand's box paradox is a classic paradox of elementary probability theory. It was first posed by Joseph Bertrand in his Calcul des probabilités, published in 1889. There are three boxes: a box containing two gold coins,...

  • Bessel process
    Bessel process
    In mathematics, a Bessel process, named after Friedrich Bessel, is a type of stochastic process. The n-dimensional Bessel process is the real-valued process X given by X_t = \| W_t \|,...

  • Bessel's correction
    Bessel's correction
    In statistics, Bessel's correction, named after Friedrich Bessel, is the use of n − 1 instead of n in the formula for the sample variance and sample standard deviation, where n is the number of observations in a sample: it corrects the bias in the estimation of the population variance,... (see the sketch below)

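    A minimal sketch of the n versus n - 1 divisors using the standard library (the data values are arbitrary):

      from statistics import pvariance, variance

      data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
      print(round(pvariance(data), 3))  # 4.0   -> divides by n (population variance)
      print(round(variance(data), 3))   # 4.571 -> divides by n - 1 (Bessel's correction)
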
  • Best linear unbiased prediction
    Best linear unbiased prediction
    In statistics, best linear unbiased prediction is used in linear mixed models for the estimation of random effects. BLUP was derived by Charles Roy Henderson in 1950 but the term "best linear unbiased predictor" seems not to have been used until 1962...

  • Beta (finance)
  • Beta-binomial distribution
  • Beta-binomial model
  • Beta distribution
  • Beta function – for incomplete beta function
  • Beta negative binomial distribution
  • Beta prime distribution
  • Beverton–Holt model
  • Bhatia–Davis inequality
    Bhatia–Davis inequality
    In mathematics, the Bhatia–Davis inequality, named after Rajendra Bhatia and Chandler Davis, is an upper bound on the variance of any bounded probability distribution on the real line....

  • Bhattacharya coefficient redirects to Bhattacharyya distance
    Bhattacharyya distance
    In statistics, the Bhattacharyya distance measures the similarity of two discrete or continuous probability distributions. It is closely related to the Bhattacharyya coefficient which is a measure of the amount of overlap between two statistical samples or populations. Both measures are named after A...

  • Bias (statistics)
    Bias (statistics)
    A statistic is biased if it is calculated in such a way that it is systematically different from the population parameter of interest. The following lists some types of, or aspects of, bias which should not be considered mutually exclusive:...

  • Bias of an estimator
    Bias of an estimator
    In statistics, bias of an estimator is the difference between this estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. Otherwise the estimator is said to be biased.In ordinary English, the term bias is...

  • Biased random walk (biochemistry)
    Biased random walk (biochemistry)
    In cell biology, a biased random walk enables bacteria to search for food and flee from harm. Bacteria propel themselves with the aid of flagella in a process called chemotaxis, and a typical bacteria trajectory has many characteristics of a random walk. They move forward for a certain distance,...

  • Biased sample – redirects to Sampling bias
  • Biclustering
    Biclustering
    Biclustering, co-clustering, or two-mode clustering is a data mining technique which allows simultaneous clustering of the rows and columns of a matrix....

  • Big O in probability notation
    Big O in probability notation
    The order in probability notation is used in probability theory and statistical theory in direct parallel to the big-O notation which is standard in mathematics...

  • Bienaymé–Chebyshev inequality
    Chebyshev's inequality
    In probability theory, Chebyshev’s inequality guarantees that in any data sample or probability distribution, "nearly all" values are close to the mean: the precise statement being that no more than 1/k² of the distribution’s values can be more than k standard deviations away from the mean... (an empirical check follows below)

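    An empirical check of the 1/k^2 bound on a deliberately skewed (exponential) sample; the sample is simulated with a fixed seed, so the observed fractions are illustrative only:

      import random
      from statistics import mean, pstdev

      random.seed(1)
      data = [random.expovariate(1.0) for _ in range(50_000)]
      mu, sigma = mean(data), pstdev(data)

      for k in (2, 3, 4):
          frac = sum(abs(x - mu) > k * sigma for x in data) / len(data)
          print(k, round(frac, 4), "<=", round(1 / k**2, 4))  # observed tail fraction vs bound
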
  • Bills of Mortality
    Bills of Mortality
    The London Bills of Mortality were the main source of mortality statistics, designed to monitor deaths from the plague from the 17th century to the 1830s. They were used mainly as a way of warning about plague epidemics...

  • Bimodal distribution
    Bimodal distribution
    In statistics, a bimodal distribution is a continuous probability distribution with two different modes. These appear as distinct peaks in the probability density function, as shown in Figure 1....

  • Binary classification
    Binary classification
    Binary classification is the task of classifying the members of a given set of objects into two groups on the basis of whether they have some property or not. Some typical binary classification tasks are...

  • Bingham distribution
    Bingham distribution
    In statistics, the Bingham distribution, named after Christopher Bingham, is an antipodally symmetric probability distribution on the n-sphere...

  • Binomial distribution
  • Binomial proportion confidence interval
    Binomial proportion confidence interval
    In statistics, a binomial proportion confidence interval is a confidence interval for a proportion in a statistical population. It uses the proportion estimated in a statistical sample and allows for sampling error. There are several formulas for a binomial confidence interval, but all of them rely... (one of the simplest is sketched below)

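    A sketch of one of the simplest formulas, the normal-approximation (Wald) interval p ± z·sqrt(p(1-p)/n); note it can be inaccurate for small n or proportions near 0 or 1:

      import math
      from statistics import NormalDist

      def wald_interval(successes, n, confidence=0.95):
          z = NormalDist().inv_cdf(0.5 + confidence / 2)   # 1.96 for 95%
          p = successes / n
          half_width = z * math.sqrt(p * (1 - p) / n)
          return p - half_width, p + half_width

      low, high = wald_interval(40, 100)
      print(round(low, 3), round(high, 3))   # 0.304 0.496
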
  • Binomial regression
    Binomial regression
    In statistics, binomial regression is a technique in which the response is the result of a series of Bernoulli trials, or a series of one of two possible disjoint outcomes...

  • Binomial test
    Binomial test
    In statistics, the binomial test is an exact test of the statistical significance of deviations from a theoretically expected distribution of observations into two categories...

  • Bioinformatics
    Bioinformatics
    Bioinformatics is the application of computer science and information technology to the field of biology and medicine. Bioinformatics deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software...

  • Biometrics (statistics) — redirects to Biostatistics
    Biostatistics
    Biostatistics is the application of statistics to a wide range of topics in biology...

  • Biostatistics
    Biostatistics
    Biostatistics is the application of statistics to a wide range of topics in biology...

  • Biplot
    Biplot
    Biplots are a type of exploratory graph used in statistics, a generalization of the simple two-variable scatterplot. A biplot allows information on both samples and variables of a data matrix to be displayed graphically. Samples are displayed as points while variables are displayed either as...

  • Birnbaum–Saunders distribution
  • Birth-death process
    Birth-death process
    The birth–death process is a special case of continuous-time Markov process where the states represent the current size of a population and where the transitions are limited to births and deaths...

  • Bispectrum
    Bispectrum
    In mathematics, in the area of statistical analysis, the bispectrum is a statistic used to search for nonlinear interactions. The Fourier transform of the second-order cumulant, i.e., the autocorrelation function, is the traditional power spectrum...

  • Bivariate analysis
    Bivariate analysis
    Bivariate analysis is one of the simplest forms of quantitative analysis. It involves the analysis of two variables, for the purpose of determining the empirical relationship between them...

  • Bivariate von Mises distribution
    Bivariate von Mises distribution
    In probability theory and statistics, the bivariate von Mises distribution is a probability distribution describing values on a torus. It may be thought of as an analogue on the torus of the bivariate normal distribution. The distribution belongs to the field of directional statistics. The general...

  • Black–Scholes
  • Bland–Altman plot
  • Blind deconvolution
    Blind deconvolution
    In image processing and applied mathematics, blind deconvolution is a deconvolution technique that permits recovery of the target scene from a single or set of "blurred" images in the presence of a poorly determined or unknown point spread function ....

  • Blind experiment
  • Block design
  • Blocking (statistics)
    Blocking (statistics)
    In the statistical theory of the design of experiments, blocking is the arranging of experimental units in groups that are similar to one another. For example, an experiment is designed to test a new drug on patients. There are two levels of the treatment, drug, and placebo, administered to male...

  • BMDP
    BMDP
    BMDP is a statistical package developed in 1961 at UCLA. Based on the older BIMED program for biomedical applications, it used keyword parameters in the input instead of fixed-format cards, so the letter P was added to the letters BMD, although the name was later defined as being an abbreviation...

     – software
  • Bochner's theorem
    Bochner's theorem
    In mathematics, Bochner's theorem characterizes the Fourier transform of a positive finite Borel measure on the real line...

  • Bonferroni correction
    Bonferroni correction
    In statistics, the Bonferroni correction is a method used to counteract the problem of multiple comparisons. It was developed and introduced by Italian mathematician Carlo Emilio Bonferroni... (see the sketch below)

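    A minimal sketch of the usual form of the correction: with m tests and a desired family-wise error rate alpha, each individual test is performed at level alpha / m (the p-values here are made up):

      alpha = 0.05
      p_values = [0.003, 0.012, 0.045, 0.20, 0.76]
      m = len(p_values)

      threshold = alpha / m                          # 0.01 for these five tests
      rejected = [p <= threshold for p in p_values]
      print(round(threshold, 4), rejected)           # 0.01 [True, False, False, False, False]
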
  • Bonferroni inequalities – redirects to Boole's inequality
    Boole's inequality
    In probability theory, Boole's inequality, also known as the union bound, says that for any finite or countable set of events, the probability that at least one of the events happens is no greater than the sum of the probabilities of the individual events...

  • Boole's inequality
    Boole's inequality
    In probability theory, Boole's inequality, also known as the union bound, says that for any finite or countable set of events, the probability that at least one of the events happens is no greater than the sum of the probabilities of the individual events...

  • Boolean analysis
    Boolean analysis
    Boolean analysis was introduced by Flament. The goal of a Boolean analysis is to detect deterministic dependencies between the items of a questionnaire or similar data-structures in observed response patterns. These deterministic dependencies have the form of logical formulas connecting the items...

  • Bootstrap aggregating
    Bootstrap aggregating
    Bootstrap aggregating is a machine learning ensemble meta-algorithm to improve machine learning of statistical classification and regression models in terms of stability and classification accuracy. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision...

  • Bootstrap error-adjusted single-sample technique
    Bootstrap error-adjusted single-sample technique
    In statistics, the bootstrap error-adjusted single-sample technique is a non-parametric method that is intended to allow an assessment to be made of the validity of a single sample. It is based on estimating a probability distribution representing what can be expected from valid samples...

  • Bootstrapping (statistics)
    Bootstrapping (statistics)
    In statistics, bootstrapping is a computer-based method for assigning measures of accuracy to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using only very simple methods... (a resampling sketch follows below)

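    A minimal resampling sketch for the mean of a small made-up sample, using the simple percentile method for an approximate 95% interval:

      import random
      from statistics import mean

      random.seed(0)
      data = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 7.2, 5.0, 4.4, 6.3]

      # resample with replacement many times and record the statistic each time
      boot_means = sorted(mean(random.choices(data, k=len(data))) for _ in range(10_000))
      low, high = boot_means[249], boot_means[9749]    # 2.5th and 97.5th percentiles
      print(round(mean(data), 2), round(low, 2), round(high, 2))  # observed mean and interval
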
  • Bootstrapping populations
  • Borel–Cantelli lemma
  • Bose–Mesner algebra
    Bose–Mesner algebra
    In mathematics, a Bose–Mesner algebra is a set of matrices, together with set of rules for combining those matrices, such that certain conditions apply...

  • Box–Behnken design
  • Box–Cox distribution
  • Box–Cox transformation – redirects to Power transform
    Power transform
    In statistics, the power transform is a family of functions that are applied to create a rank-preserving transformation of data using power functions. This is a useful data processing technique used to stabilize variance, make the data more normal distribution-like, improve the correlation...

  • Box–Jenkins
  • Box–Muller transform
  • Box–Pierce test
  • Box plot
    Box plot
    In descriptive statistics, a box plot or boxplot is a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation, lower quartile, median, upper quartile, and largest observation...

  • Branching process
    Branching process
    In probability theory, a branching process is a Markov process that models a population in which each individual in generation n produces some random number of individuals in generation n + 1, according to a fixed probability distribution that does not vary from individual to...

  • Bregman divergence
    Bregman divergence
    In mathematics, the Bregman divergence or Bregman distance is similar to a metric, but does not satisfy the triangle inequality nor symmetry. There are two ways in which Bregman divergences are important. Firstly, they generalize squared Euclidean distance to a class of distances that all share...

  • Breusch–Godfrey test
    Breusch–Godfrey test
    In statistics, the Breusch–Godfrey test is used to assess the validity of some of the modelling assumptions inherent in applying regression-like models to observed data series...

  • Breusch–Pagan statistic – redirects to Breusch–Pagan test
  • Breusch–Pagan test
  • Brown–Forsythe test
  • Brownian bridge
    Brownian bridge
    A Brownian bridge is a continuous-time stochastic process B whose probability distribution is the conditional probability distribution of a Wiener process W given the condition that B(0) = B(T) = 0. The expected value of the bridge is zero, with variance t(T − t)/T, implying that the most...

  • Brownian excursion
    Brownian excursion
    In probability theory a Brownian excursion process is a stochastic process that is closely related to a Wiener process. Realisations of Brownian excursion processes are essentially just realisations of a Wiener process selected to satisfy certain conditions...

  • Brownian motion
    Brownian motion
    Brownian motion or pedesis is the presumably random drifting of particles suspended in a fluid or the mathematical model used to describe such random movements, which is often called a particle theory.The mathematical model of Brownian motion has several real-world applications...

  • Brownian tree
    Brownian tree
    A Brownian tree, whose name is derived from Robert Brown via Brownian motion, is a form of computer art that was briefly popular in the 1990s, when home computers started to have sufficient power to simulate Brownian motion...

  • Bruck–Ryser–Chowla theorem
  • Burke's theorem
    Burke's theorem
    In probability theory, Burke's theorem is a theorem in queueing theory, proved by Paul J. Burke while working at Bell Telephone Laboratories, which states that for an M/M/1, M/M/m or M/M/∞ queue in the steady state with arrivals forming a Poisson process with rate parameter λ: the departure process is a Poisson...

  • Burr distribution
  • Business statistics
    Business statistics
    Business statistics is the science of good decision making in the face of uncertainty and is used in many disciplines such as financial analysis, econometrics, auditing, production and operations including services improvement, and marketing research....

  • Bühlmann model
    Bühlmann model
    The Bühlmann model is a random effects model used in credibility theory in actuarial science to determine the appropriate premium for a group of insurance contracts....

  • Buzen's algorithm
    Buzen's algorithm
    In queueing theory, a discipline within the mathematical theory of probability, Buzen's algorithm is an algorithm for calculating the normalization constant G in the Gordon–Newell theorem. This method was first proposed by Jeffrey P. Buzen in 1973. Once G is computed the probability distributions...
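
    A sketch of the convolution recursion behind Buzen's algorithm for a closed network of single-server (load-independent) stations; the relative utilizations X[m] (visit ratio times mean service time) below are illustrative values, not from the source:

      def buzen_G(X, N):
          """Return [G(0), ..., G(N)] given relative utilizations X[0..M-1]."""
          g = [1.0] + [0.0] * N              # network with zero stations
          for x in X:                        # fold stations in one at a time
              for n in range(1, N + 1):
                  g[n] += x * g[n - 1]       # g(n, m) = g(n, m-1) + X_m * g(n-1, m)
          return g

      print(buzen_G([0.5, 0.4, 0.6], N=3))   # normalization constants G(0)..G(3), illustrative demands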

  • BV4.1 (software)
    BV4.1 (software)
    The application software BV4.1 is a user-friendly tool for decomposing and seasonally adjusting monthly or quarterly economic time series by the so-called Berlin procedure. It is being developed by the Federal Statistical Office of Germany...


C

  • c-chart
  • Càdlàg
    Càdlàg
    In mathematics, a càdlàg , RCLL , or corlol function is a function defined on the real numbers that is everywhere right-continuous and has left limits everywhere...

  • Calculating demand forecast accuracy
    Calculating Demand Forecast Accuracy
    Calculating demand forecast accuracy is the process of determining the accuracy of forecasts made regarding customer demand for a product....

  • Calculus of predispositions
    Calculus of predispositions
    Calculus of predispositions is a basic part of predispositioning theory and belongs to the indeterministic procedures. “The key component of any indeterministic procedure is the evaluation of a position...

  • CalEst
    CalEst
    CalEst is a statistics package which also includes probability functions as well as tutorials to enhance the learning of Statistics and Probability...

     – software
  • Calibrated probability assessment
    Calibrated probability assessment
    Calibrated probability assessments are subjective probabilities assigned by individuals who have been trained to assess probabilities in a way that historically represents their uncertainty. In other words, when a calibrated person says they are "80% confident" in each of 100 predictions they...

  • Calibration (probability) – subjective probability, redirects to Calibrated probability assessment
    Calibrated probability assessment
    Calibrated probability assessments are subjective probabilities assigned by individuals who have been trained to assess probabilities in a way that historically represents their uncertainty. In other words, when a calibrated person says they are "80% confident" in each of 100 predictions they...

  • Calibration (statistics)
    Calibration (statistics)
    There are two main uses of the term calibration in statistics that denote special types of statistical inference problems. Thus "calibration" can mean...

     – the statistical calibration problem
  • Cancer cluster
    Cancer cluster
    Cancer cluster is a term used by epidemiologists, statisticians, and public health workers to define an occurrence of a greater-than-expected number of cancer cases within a group of people in a geographic area over a period of time....

  • Candlestick chart
    Candlestick chart
    A candlestick chart is a style of bar-chart used primarily to describe price movements of a security, derivative, or currency over time. It is a combination of a line-chart and a bar-chart, in that each bar represents the range of price movement over a given time interval. It is most often used in...

  • Canonical analysis
    Canonical analysis
    In statistics, canonical analysis belongs to the family of regression methods for data analysis. Regression analysis quantifies a relationship between a predictor variable and a criterion variable by the coefficient of correlation r, coefficient of determination r², and the standard regression...

  • Canonical correlation
    Canonical correlation
    In statistics, canonical correlation analysis, introduced by Harold Hotelling, is a way of making sense of cross-covariance matrices. If we have two sets of variables, x_1, \dots, x_n and y_1, \dots, y_m, and there are correlations among the variables, then canonical correlation analysis will...

  • Canopy clustering algorithm
    Canopy clustering algorithm
    The canopy clustering algorithm is an unsupervised pre-clustering algorithm, often used as preprocessing step for the K-means algorithm or the Hierarchical clustering algorithm....

  • Cantor distribution
  • Carpet plot
    Carpet plot
    A carpet plot is any of a few different specific types of diagram. Probably the more common plot referred to as a carpet plot is one that illustrates the interacting behaviour of two independent variables, which among other things facilitates interpolation...

  • Cartogram
    Cartogram
    A cartogram is a map in which some thematic mapping variable – such as travel time or Gross National Product – is substituted for land area or distance. The geometry or space of the map is distorted in order to convey the information of this alternate variable...

  • Case-control
    Case-control
    A case-control study is a type of study design in epidemiology. Case-control studies are used to identify factors that may contribute to a medical condition by comparing subjects who have that condition with patients who do not have the condition but are otherwise similar. Case-control studies are...

     – redirects to Case-control study
  • Case-control study
  • Catastro of Ensenada
    Catastro of Ensenada
    In 1749 a large-scale census and statistical investigation was conducted in the Crown of Castile. It included population, territorial properties, buildings, cattle, offices, all kinds of revenue and trades, and even geographical information from each place...

     – a census of part of Spain
  • Categorical data
    Categorical data
    In statistics, categorical data is that part of an observed dataset that consists of categorical variables, or for data that has been converted into that form, for example as grouped data...

  • Categorical distribution
    Categorical distribution
    In probability theory and statistics, a categorical distribution is a probability distribution that describes the result of a random event that can take on one of K possible outcomes, with the probability of each outcome separately specified...

  • Categorical variable
  • Cauchy distribution
    Cauchy distribution
    The Cauchy–Lorentz distribution, named after Augustin Cauchy and Hendrik Lorentz, is a continuous probability distribution. As a probability distribution, it is known as the Cauchy distribution, while among physicists, it is known as the Lorentz distribution, Lorentz function, or Breit–Wigner...

  • Cauchy–Schwarz inequality
    Cauchy–Schwarz inequality
    In mathematics, the Cauchy–Schwarz inequality , is a useful inequality encountered in many different settings, such as linear algebra, analysis, probability theory, and other areas...

  • Causal Markov condition
    Causal Markov condition
    The Markov condition for a Bayesian network states that any node in a Bayesian network is conditionally independent of its non-descendants, given its parents. A node is conditionally independent of the entire network, given its Markov blanket....

  • Ceiling effect
    Ceiling effect
    The term ceiling effect has two distinct meanings, referring to the level at which an independent variable no longer has an effect on a dependent variable, or to the level above which variance in an independent variable is no longer measured or estimated...

  • Censored regression model
    Censored regression model
    Censored regression models commonly arise in econometrics in cases where the variable of interest is only observable under certain conditions. A common example is labor supply. Data are frequently available on the hours worked by employees, and a labor supply model estimates the relationship between...

  • Censoring (clinical trials)
    Censoring (clinical trials)
    The term censoring is used in clinical trials to refer to mathematically removing a patient from the survival curve at the end of their follow-up time. Censoring a patient reduces the sample size available for analysis after the time of censoring...

  • Censoring (statistics)
    Censoring (statistics)
    In statistics, engineering, and medical research, censoring occurs when the value of a measurement or observation is only partially known.For example, suppose a study is conducted to measure the impact of a drug on mortality. In such a study, it may be known that an individual's age at death is at...

  • Centering matrix
    Centering matrix
    In mathematics and multivariate statistics, the centering matrix is a symmetric and idempotent matrix, which when multiplied with a vector has the same effect as subtracting the mean of the components of the vector from every component....
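
    A small sketch of the definition C = I - J/n (J the all-ones matrix), checking the mean-subtraction and idempotence properties (NumPy assumed; vector values illustrative):

      import numpy as np

      n = 4
      C = np.eye(n) - np.ones((n, n)) / n
      v = np.array([1.0, 2.0, 3.0, 10.0])   # illustrative vector
      print(C @ v)                          # identical to v - v.mean()
      print(np.allclose(C @ C, C))          # idempotent
      print(np.allclose(C, C.T))            # symmetric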

  • Centerpoint (geometry)
    Centerpoint (geometry)
    In statistics and computational geometry, the notion of centerpoint is a generalization of the median to data in higher-dimensional Euclidean space...

     — Tukey median redirects here
  • Central composite design
    Central composite design
    In statistics, a central composite design is an experimental design, useful in response surface methodology, for building a second order model for the response variable without needing to use a complete three-level factorial experiment....

  • Central limit theorem
    Central limit theorem
    In probability theory, the central limit theorem states conditions under which the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed. The central limit theorem has a number of variants. In its common...

    • Central limit theorem (illustration) — redirects to Illustration of the central limit theorem
      Illustration of the central limit theorem
      This article gives two concrete illustrations of the central limit theorem. Both involve the sum of independent and identically-distributed random variables and show how the probability distribution of the sum approaches the normal distribution as the number of terms in the sum increases.The first...

    • Central limit theorem for directional statistics
      Central limit theorem for directional statistics
      In probability theory, the central limit theorem states conditions under which the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed....

    • Lyapunov's central limit theorem
    • Martingale central limit theorem
      Martingale central limit theorem
      In probability theory, the central limit theorem says that, under certain conditions, the sum of many independent identically-distributed random variables, when scaled appropriately, converges in distribution to a standard normal distribution...
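
    A minimal simulation sketch of the classical central limit theorem covered by the entries above, assuming NumPy (the exponential source distribution and the sample size are arbitrary illustrative choices):

      import numpy as np

      rng = np.random.default_rng(2)
      n, reps = 100, 20_000                              # illustrative sample size and replications
      x = rng.exponential(scale=1.0, size=(reps, n))     # mean 1, variance 1
      z = (x.mean(axis=1) - 1.0) * np.sqrt(n)            # standardized sample means
      print(z.mean(), z.std())                           # close to 0 and 1
      print(np.mean(np.abs(z) <= 1.96))                  # close to 0.95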

  • Central moment
    Central moment
    In probability theory and statistics, central moments form one set of values by which the properties of a probability distribution can be usefully characterised...

  • Central tendency
    Central tendency
    In statistics, the term central tendency relates to the way in which quantitative data is clustered around some value. A measure of central tendency is a way of specifying a central value...

  • Census
    Census
    A census is the procedure of systematically acquiring and recording information about the members of a given population. It is a regularly occurring and official count of a particular population. The term is used mostly in connection with national population and housing censuses; other common...

  • Cepstrum
    Cepstrum
    A cepstrum is the result of taking the Fourier transform of the logarithm of the spectrum of a signal. There is a complex cepstrum, a real cepstrum, a power cepstrum, and phase cepstrum....

  • CHAID
    CHAID
    CHAID is a type of decision tree technique, based upon adjusted significance testing. The technique was developed in South Africa and was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on this topic...

     — CHi-squared Automatic Interaction Detector
  • Chain rule for Kolmogorov complexity
    Chain rule for Kolmogorov complexity
    The chain rule for Kolmogorov complexity is an analogue of the chain rule for information entropy, which states: H(X,Y) = H(X) + H(Y|X). That is, the combined randomness of two sequences X and Y is the sum of the randomness of X plus whatever randomness is left in Y once we know X. This follows immediately from the...

  • Challenge-dechallenge-rechallenge
    Challenge-dechallenge-rechallenge
    Challenge-dechallenge-rechallenge is a medical testing protocol in which a medicine or drug is administered, withdrawn, then re-administered, while being monitored for adverse effects at each stage...

  • Change detection
    Change detection
    In statistical analysis, change detection tries to identify changes in the probability distribution of a stochastic process or time series. In general the problem concerns both detecting whether or not a change has occurred, or whether several changes might have occurred, and identifying the times...

    • Change detection (GIS)
      Change detection (GIS)
      Change detection for GIS is a process that measures how the attributes of a particular area have changed between two or more time periods. Change detection often involves comparing aerial photographs or satellite imagery of the area taken at different times...

  • Chapman–Kolmogorov equation
  • Chapman–Robbins bound
    Chapman–Robbins bound
    In statistics, the Chapman–Robbins bound or Hammersley–Chapman–Robbins bound is a lower bound on the variance of estimators of a deterministic parameter. It is a generalization of the Cramér–Rao bound; compared to the Cramér–Rao bound, it is both tighter and applicable to a wider range of problems...

  • Characteristic function (probability theory)
    Characteristic function (probability theory)
    In probability theory and statistics, the characteristic function of any random variable completely defines its probability distribution. Thus it provides the basis of an alternative route to analytical results compared with working directly with probability density functions or cumulative...

  • Chauvenet's criterion
    Chauvenet's criterion
    In statistical theory, Chauvenet's criterion is a means of assessing whether one piece of experimental data — an outlier — from a set of observations is likely to be spurious....

  • Chebyshev center
    Chebyshev center
    In geometry, the Chebyshev center of a bounded set Q having non-empty interior is the center of the minimal-radius ball enclosing the entire set Q, or, alternatively, the center of the largest inscribed ball of Q....

  • Chebyshev's inequality
    Chebyshev's inequality
    In probability theory, Chebyshev’s inequality guarantees that in any data sample or probability distribution, "nearly all" values are close to the mean — the precise statement being that no more than 1/k² of the distribution’s values can be more than k standard deviations away from the mean...
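
    An empirical check of the 1/k² bound, sketched with NumPy on an arbitrary (here exponential) sample:

      import numpy as np

      rng = np.random.default_rng(3)
      x = rng.exponential(scale=1.0, size=100_000)       # illustrative sample
      mu, sigma = x.mean(), x.std()
      for k in (2.0, 3.0):
          frac = np.mean(np.abs(x - mu) >= k * sigma)
          print(k, frac, "<=", 1.0 / k ** 2)             # the bound always holds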

  • Checking if a coin is biased — redirects to Checking whether a coin is fair
  • Checking whether a coin is fair
  • Cheeger bound
    Cheeger bound
    In mathematics, the Cheeger bound is a bound of the second largest eigenvalue of the transition matrix of a finite-state, discrete-time, reversible stationary Markov chain. It can be seen as a special case of Cheeger inequalities in expander graphs....

  • Chemometrics
    Chemometrics
    Chemometrics is the science of extracting information from chemical systems by data-driven means. It is a highly interfacial discipline, using methods frequently employed in core data-analytic disciplines such as multivariate statistics, applied mathematics, and computer science, in order to...

  • Chernoff bound
    Chernoff bound
    In probability theory, the Chernoff bound, named after Herman Chernoff, gives exponentially decreasing bounds on tail distributions of sums of independent random variables...

     – a special case of Chernoff's inequality
  • Chernoff face
  • Chernoff's distribution
    Chernoff's distribution
    In probability theory, Chernoff's distribution, named after Herman Chernoff, is the probability distribution of the random variable that maximizes W(s) - s² over all real s, where W is a "two-sided" Wiener process satisfying W(0) = 0...

  • Chernoff's inequality
  • Chi distribution
  • Chi-squared distribution
  • Chi-squared test
  • Chinese restaurant process
  • Choropleth map
    Choropleth map
    A choropleth map (Greek χώρος "area/region" + πληθαίνω "multiply") is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita...

  • Chow test
    Chow test
    The Chow test is a statistical and econometric test of whether the coefficients in two linear regressions on different data sets are equal. The Chow test was invented by economist Gregory Chow. In econometrics, the Chow test is most commonly used in time series analysis to test for the presence of...

  • Chronux
    Chronux
    Chronux is an open-source software package developed for the loading, visualization and analysis of a variety of modalities / formats of neurobiological time series data...

     – software
  • Circular distribution
  • Circular error probable
    Circular error probable
    In the military science of ballistics, circular error probable is an intuitive measure of a weapon system's precision...

  • Circular statistics
    Circular statistics
    Directional statistics is the subdiscipline of statistics that deals with directions, axes or rotations in Rn...

     – redirects to Directional statistics
  • Circular uniform distribution
    Circular uniform distribution
    In probability theory and directional statistics, a circular uniform distribution is a probability distribution on the unit circle whose density is uniform for all angles....

  • Clark–Ocone theorem
  • Class membership probabilities
    Class membership probabilities
    In general problems of classification, class membership probabilities reflect the uncertainty with which a given individual item can be assigned to any given class. Although statistical classification methods by definition generate such probabilities, applications of classification in machine...

  • Classic data sets
  • Classical definition of probability
    Classical definition of probability
    The classical definition of probability is identified with the works of Pierre-Simon Laplace. As stated in his Théorie analytique des probabilités, this definition is essentially a consequence of the principle of indifference...

  • Classical test theory
    Classical test theory
    Classical test theory is a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of test-takers. Generally speaking, the aim of classical test theory is to understand and improve the reliability of psychological...

      - psychometrics
  • Classification rule
    Classification rule
    Given a population whose members can be potentially separated into a number of different sets or classes, a classification rule is a procedure in which the elements of the population set are each assigned to one of the classes. A perfect test is such that every element in the population is assigned...

  • Classifier (mathematics)
  • Climate ensemble
    Climate ensemble
    In physics, a statistical ensemble is a large set of copies of a system, considered all at once; each copy of the system representing a different possible detailed realisation of the system, consistent with the system's observed macroscopic properties....

  • Clinical significance
    Clinical significance
    In medicine and psychology, clinical significance refers to either of two related but slightly dissimilar concepts whereby certain findings or differences, even if measurable or statistically confirmed, may or may not have additional significance, either by being of a magnitude that conveys...

  • Clinical study design
  • Clinical trial
    Clinical trial
    Clinical trials are a set of procedures in medical research and drug development that are conducted to allow safety and efficacy data to be collected for health interventions...

  • Clinical utility of diagnostic tests
    Clinical utility of diagnostic tests
    The clinical utility of a diagnostic test is its capacity to rule a diagnosis in or out and to make it possible to adopt or reject a therapeutic action. It can be integrated into clinical prediction rules for specific diseases or outcomes....

  • Cliodynamics
    Cliodynamics
    Cliodynamics is a new multidisciplinary area of research focused on the mathematical modeling of historical dynamics. The term was originally coined by Peter...

  • Closed testing procedure
    Closed testing procedure
    In statistics, the closed testing procedure is a general method for performing more than one hypothesis test simultaneously....

  • Cluster analysis
  • Cluster analysis (in marketing)
    Cluster analysis (in marketing)
    Cluster analysis is a class of statistical techniques that can be applied to data that exhibit “natural” groupings. Cluster analysis sorts through the raw data and groups them into clusters. A cluster is a group of relatively homogeneous cases or observations. Objects in a cluster are similar to...

  • Cluster randomised controlled trial
    Cluster randomised controlled trial
    A cluster randomised controlled trial is a type of randomised controlled trial in which groups of subjects are randomised...

  • Cluster sampling
    Cluster sampling
    Cluster sampling is a sampling technique used when "natural" groupings are evident in a statistical population. It is often used in marketing research. In this technique, the total population is divided into these groups and a sample of the groups is selected. Then the required information is...

  • Cluster-weighted modeling
    Cluster-weighted modeling
    In statistics, cluster-weighted modeling is an algorithm-based approach to non-linear prediction of outputs from inputs based on density estimation using a set of models that are each notionally appropriate in a sub-region of the input space...

  • Clustering high-dimensional data
    Clustering high-dimensional data
    Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Such high-dimensional data spaces are often encountered in areas such as medicine, where DNA microarray technology can produce a large number of measurements at once, and...

  • CMA-ES
    CMA-ES
    CMA-ES stands for Covariance Matrix Adaptation Evolution Strategy. Evolution strategies are stochastic, derivative-free methods for numerical optimization of non-linear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation...

     (Covariance Matrix Adaptation Evolution Strategy)
  • Coalescent theory
    Coalescent theory
    In genetics, coalescent theory is a retrospective model of population genetics. It attempts to trace all alleles of a gene shared by all members of a population to a single ancestral copy, known as the most recent common ancestor...

  • Cochran's C test
    Cochran's C test
    In statistics, Cochran's C test, named after William G. Cochran, is a one-sided upper limit variance outlier test. The C test is used to decide if a single estimate of a variance is significantly larger than a group of variances with which the single estimate is supposed to be comparable...

  • Cochran's Q test
  • Cochran's theorem
    Cochran's theorem
    In statistics, Cochran's theorem, devised by William G. Cochran, is a theorem used to justify results relating to the probability distributions of statistics that are used in the analysis of variance....

  • Cochran-Armitage test for trend
    Cochran-Armitage test for trend
    The Cochran-Armitage test for trend, named for William Cochran and Peter Armitage, is used in categorical data analysis when the aim is to assess the presence of an association between a variable with two categories and a variable with k categories. It modifies the chi-squared test to...

  • Cochran–Mantel–Haenszel statistics
    Cochran–Mantel–Haenszel statistics
    In statistics, the Cochran–Mantel–Haenszel statistics are a collection of test statistics used in the analysis of stratified categorical data. They are named after William G. Cochran, Nathan Mantel and William Haenszel. One of these test statistics is the Cochran–Mantel–Haenszel test, which allows...

  • Cochrane–Orcutt estimation
  • Coding (social sciences)
    Coding (social sciences)
    Coding refers to an analytical process in which data, in either quantitative or qualitative form, are categorised to facilitate analysis....

  • Coefficient of coherence — redirects to Coherence (statistics)
    Coherence (statistics)
    In probability theory and statistics, coherence can have two meanings.*When dealing with personal probability assessments, or supposed probabilities derived in nonstandard ways, it is a property of self-consistency across a whole set of such assessments...

  • Coefficient of determination
    Coefficient of determination
    In statistics, the coefficient of determination R2 is used in the context of statistical models whose main purpose is the prediction of future outcomes on the basis of other related information. It is the proportion of variability in a data set that is accounted for by the statistical model...

  • Coefficient of dispersion
    Coefficient of dispersion
    In probability theory and statistics, the index of dispersion, dispersion index, coefficient of dispersion, or variance-to-mean ratio, like the coefficient of variation, is a normalized measure of the dispersion of a probability distribution: it is a measure used to quantify whether a set of...

  • Coefficient of variation
    Coefficient of variation
    In probability theory and statistics, the coefficient of variation is a normalized measure of dispersion of a probability distribution. It is also known as unitized risk or the variation coefficient. The absolute value of the CV is sometimes known as relative standard deviation, which is...
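
    A one-line computation of the ratio sigma/mu on illustrative data (NumPy assumed; meaningful only for ratio-scale data with a nonzero mean):

      import numpy as np

      x = np.array([12.0, 15.0, 9.0, 11.0, 13.0])   # illustrative measurements
      cv = x.std(ddof=1) / x.mean()
      print(round(cv, 3), f"{100 * cv:.1f}%")        # CV and its percentage form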

  • Cognitive pretesting
    Cognitive pretesting
    Cognitive interviewing is a field research method used primarily in pre-testing survey instruments developed in collaboration by psychologists and survey researchers. It allows survey researchers to collect verbal information regarding survey responses and is used in evaluating whether the...

  • Cohen's class distribution function
    Cohen's class distribution function
    Bilinear time–frequency distributions, or quadratic time–frequency distributions, arise in a subfield of signal analysis and signal processing called time–frequency signal processing, and in the statistical analysis of time series data...

     – a time-frequency distribution function
  • Cohen's kappa
    Cohen's kappa
    Cohen's kappa coefficient is a statistical measure of inter-rater agreement or inter-annotator agreement for qualitative items. It is generally thought to be a more robust measure than simple percent agreement calculation since κ takes into account the agreement occurring by chance. Some...
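
    A minimal sketch of the formula kappa = (p_o - p_e) / (1 - p_e) from a two-rater contingency table (NumPy assumed; the counts are illustrative):

      import numpy as np

      def cohens_kappa(table):
          table = np.asarray(table, dtype=float)
          n = table.sum()
          p_o = np.trace(table) / n                                     # observed agreement
          p_e = (table.sum(axis=1) * table.sum(axis=0)).sum() / n ** 2  # chance agreement
          return (p_o - p_e) / (1.0 - p_e)

      # Rows: rater A's categories; columns: rater B's categories (illustrative counts).
      print(cohens_kappa([[20, 5], [10, 15]]))    # 0.4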

  • Coherence (statistics)
    Coherence (statistics)
    In probability theory and statistics, coherence can have two meanings.*When dealing with personal probability assessments, or supposed probabilities derived in nonstandard ways, it is a property of self-consistency across a whole set of such assessments...

  • Cohort (statistics)
    Cohort (statistics)
    In statistics and demography, a cohort is a group of subjects who have shared a particular experience during a particular time span. Cohorts may be tracked over extended periods in a cohort study. The cohort can be modified by censoring, i.e...

  • Cohort effect
    Cohort effect
    The term cohort effect is used in social science to describe variations in the characteristics of an area of study over time among individuals who are defined by some shared temporal experience or common life experience, such as year of birth, or year of exposure to radiation.Cohort effects are...

  • Cohort study
    Cohort study
    A cohort study or panel study is a form of longitudinal study used in medicine, social science, actuarial science, and ecology. It is an analysis of risk factors and follows a group of people who do not have the disease, and uses correlations to determine the absolute risk of subject contraction...

  • Cointegration
    Cointegration
    Cointegration is a statistical property of time series variables. Two or more time series are cointegrated if they share a common stochastic drift....

  • Collectively exhaustive events
  • Collider (epidemiology)
    Collider (epidemiology)
    In epidemiology, a collider is a variable which is the effect of two other variables. It is known as a collider because, in graphical models, the other variables lead to it in such a way that their arrow heads appear to collide on the same node...

  • Combinatorial data analysis
    Combinatorial data analysis
    Combinatorial data analysis is the study of data sets where the arrangement of objects is important. CDA can be used either to determine how well a given combinatorial construct reflects the observed data, or to search for a suitable combinatorial construct that does fit the data....

  • Combinatorial design
    Combinatorial design
    Combinatorial design theory is the part of combinatorial mathematics that deals with the existence and construction of systems of finite sets whose intersections have specified numerical properties....

  • Combinatorial meta-analysis
  • Common mode failure
  • Common-cause and special-cause
    Common-cause and special-cause
    Common- and special-causes are the two distinct origins of variation in a process, as defined in the statistical thinking and methods of Walter A. Shewhart and W. Edwards Deming...

  • Comparing means
    Comparing means
    The following tables provide guidance to the selection of the proper parametric or non-parametric statistical tests for a given data set....

  • Comparison of general and generalized linear models
  • Comparison of statistical packages
    Comparison of statistical packages
    The following tables compare general and technical information for a number of statistical analysis packages, including basic information about each product...

  • Comparisonwise error rate
  • Complementary event
    Complementary event
    In probability theory, the complement of any event A is the event [not A], i.e. the event that A does not occur. The event A and its complement [not A] are mutually exclusive and exhaustive. Generally, there is only one event B such that A and B are both mutually exclusive and...

  • Complete-linkage clustering
    Complete-linkage clustering
    In cluster analysis, complete linkage or farthest neighbour is a method of calculating distances between clusters in agglomerative hierarchical clustering...

  • Complete spatial randomness
    Complete spatial randomness
    Complete spatial randomness describes a point process whereby point events occur within a given study area in a completely random fashion. Such a process is often modeled using only one parameter, i.e. the density of points ρ within the defined area...

  • Completely randomized design
    Completely randomized design
    In the design of experiments, completely randomized designs are for studying the effects of one primary factor without the need to take other nuisance variables into account. This article describes completely randomized designs that have one primary factor. The experiment compares the values of a...

  • Completeness (statistics)
    Completeness (statistics)
    In statistics, completeness is a property of a statistic in relation to a model for a set of observed data. In essence, it is a condition which ensures that the parameters of the probability distribution representing the model can all be estimated on the basis of the statistic: it ensures that the...

  • Compositional data
    Compositional data
    In statistics, compositional data are quantitative descriptions of the parts of some whole, conveying exclusively relative information. This definition, given by John Aitchison, has several consequences:...

  • Composite bar chart
    Composite bar chart
    Composite bar charts are bar charts which always total 100, but each element is shown as a percentage of the bar allowing different sample sizes to be more easily compared....

  • Compound Poisson distribution
    Compound Poisson distribution
    In probability theory, a compound Poisson distribution is the probability distribution of the sum of a "Poisson-distributed number" of independent identically-distributed random variables...

  • Compound Poisson process
    Compound Poisson process
    A compound Poisson process is a continuous-time stochastic process with jumps. The jumps arrive randomly according to a Poisson process and the size of the jumps is also random, with a specified probability distribution...

  • Compound probability distribution
    Compound probability distribution
    In probability theory, a compound probability distribution is the probability distribution that results from assuming that a random variable is distributed according to some parametrized distribution F with an unknown parameter θ that is distributed according to some other distribution G, and then...

  • Computational formula for the variance
  • Computational learning theory
    Computational learning theory
    In theoretical computer science, computational learning theory is a mathematical field related to the analysis of machine learning algorithms.-Overview:Theoretical results in machine learning mainly deal with a type of...

  • Computational statistics
    Computational statistics
    Computational statistics, or statistical computing, is the interface between statistics and computer science. It is the area of computational science specific to the mathematical science of statistics....

  • Computer experiment
    Computer experiment
    In the scientific context, a computer experiment refers to mathematical modeling using computer simulation. It has become common to call such experiments in silico...

  • Concordance correlation coefficient
    Concordance correlation coefficient
    In statistics, the concordance correlation coefficient measures the agreement between two variables, e.g., to evaluate reproducibility or for inter-rater reliability....

  • Concordant pair
  • Concrete illustration of the central limit theorem
  • Concurrent validity
    Concurrent validity
    Concurrent validity is a parameter used in sociology, psychology, and other psychometric or behavioral sciences. Concurrent validity is demonstrated where a test correlates well with a measure that has previously been validated. The two measures may be for the same construct, or for different, but...

  • Conditional change model
    Conditional change model
    The conditional change model in statistics is the analytic procedure in which change scores are regressed on baseline values, together with the explanatory variables of interest. The method has some substantial advantages over the usual two-sample t-test recommended in textbooks....

  • Conditional distribution — redirects to Conditional probability distribution
  • Conditional expectation
    Conditional expectation
    In probability theory, a conditional expectation is the expected value of a real random variable with respect to a conditional probability distribution....

  • Conditional independence
    Conditional independence
    In probability theory, two events R and B are conditionally independent given a third event Y precisely if the occurrence or non-occurrence of R and the occurrence or non-occurrence of B are independent events in their conditional probability distribution given Y...

  • Conditional probability
    Conditional probability
    In probability theory, the "conditional probability of A given B" is the probability of A if B is known to occur. It is commonly notated P(A|B), and sometimes P_B(A). P(A|B) can be visualised as the probability of event A when the sample space is restricted to event B...

  • Conditional probability distribution
  • Conditional random field
    Conditional random field
    A conditional random field is a statistical modelling method often applied in pattern recognition.More specifically it is a type of discriminative undirected probabilistic graphical model. It is used to encode known relationships between observations and construct consistent interpretations...

  • Conditional variance
    Conditional variance
    In probability theory and statistics, a conditional variance is the variance of a conditional probability distribution. Particularly in econometrics, the conditional variance is also known as the scedastic function or skedastic function...

  • Conditionality principle
    Conditionality principle
    The conditionality principle is a Fisherian principle of statistical inference that Allan Birnbaum formally defined and studied in his 1962 JASA article. Together with the sufficiency principle, Birnbaum's version of the principle implies the famous likelihood principle...

  • Confidence band
    Confidence band
    A confidence band is used in statistical analysis to represent the uncertainty in an estimate of a curve or function based on limited or noisy data. Confidence bands are often used as part of the graphical presentation of results in a statistical analysis...

  • Confidence distribution
    Confidence distribution
    In statistics, the concept of a confidence distribution has often been loosely referred to as a distribution function on the parameter space that can represent confidence intervals of all levels for a parameter of interest...

  • Confidence interval
    Confidence interval
    In statistics, a confidence interval is a particular kind of interval estimate of a population parameter and is used to indicate the reliability of an estimate. It is an observed interval, in principle different from sample to sample, that frequently includes the parameter of interest, if the...

  • Confidence region
    Confidence region
    In statistics, a confidence region is a multi-dimensional generalization of a confidence interval. It is a set of points in an n-dimensional space, often represented as an ellipsoid around a point which is an estimated solution to a problem, although other shapes can occur.The confidence region is...

  • Configural frequency analysis
    Configural frequency analysis
    Configural frequency analysis is a method of exploratory data analysis. The goal of a configural frequency analysis is to detect patterns in the data that occur significantly more or significantly less often than expected by chance...

  • Confirmation bias
    Confirmation bias
    Confirmation bias is a tendency for people to favor information that confirms their preconceptions or hypotheses regardless of whether the information is true. David Perkins coined the term "myside bias", referring to a preference for "my" side of an issue...

  • Confirmatory factor analysis
    Confirmatory factor analysis
    In statistics, confirmatory factor analysis is a special form of factor analysis. It is used to test whether measures of a construct are consistent with a researcher's understanding of the nature of that construct. In contrast to exploratory factor analysis, where all loadings are free to vary,...

  • Confounding
    Confounding
    In statistics, a confounding variable is an extraneous variable in a statistical model that correlates with both the dependent variable and the independent variable...

  • Confounding factor
  • Confusion of the inverse
    Confusion of the inverse
    Confusion of the inverse, also called the conditional probability fallacy, is a logical fallacy in which a conditional probability is equated with its inverse: that is, given two events A and B, the probability P(A|B) is assumed to be approximately equal to P(B|A). In one study, physicians...

  • Conjoint analysis
    Conjoint analysis
    Conjoint analysis, also called multi-attribute compositional models or stated preference analysis, is a statistical technique that originated in mathematical psychology. Today it is used in many of the social sciences and applied sciences including marketing, product management, and operations...

    • Conjoint analysis (in healthcare)
      Conjoint analysis (in healthcare)
      Pharmaceutical manufacturers need increasingly detailed market information they can rely on to make the right decisions and to identify the most promising market opportunities. They can obtain great benefits from understanding physicians’ prescription...

    • Conjoint analysis (in marketing)
      Conjoint analysis (in marketing)
      Conjoint analysis is a statistical technique used in market research to determine how people value different features that make up an individual product or service....

  • Conjugate prior
    Conjugate prior
    In Bayesian probability theory, if the posterior distribution p(θ|x) is in the same family as the prior probability distribution p(θ), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood...
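
    The standard Beta–binomial pairing makes the idea concrete: a Beta(a, b) prior on a binomial success probability gives a Beta posterior in closed form (the hyperparameters and data below are illustrative):

      a, b = 2.0, 2.0              # illustrative prior hyperparameters
      successes, failures = 7, 3   # illustrative observed binomial data

      a_post, b_post = a + successes, b + failures
      print(f"posterior: Beta({a_post}, {b_post}), "
            f"mean = {a_post / (a_post + b_post):.3f}")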

  • Consensus-based assessment
  • Consensus forecast
  • Consistency (statistics)
    Consistency (statistics)
    In statistics, consistency of procedures such as confidence intervals or hypothesis tests involves their behaviour as the number of items in the data-set to which they are applied increases indefinitely...

     (disambiguation)
  • Consistent estimator
    Consistent estimator
    In statistics, a sequence of estimators for parameter θ0 is said to be consistent if this sequence converges in probability to θ0...

  • Constant elasticity of substitution
    Constant Elasticity of Substitution
    In economics, constant elasticity of substitution is a property of some production functions and utility functions. More precisely, it refers to a particular type of aggregator function which combines two or more types of consumption, or two or more types of productive inputs into an aggregate...

  • Constant false alarm rate
    Constant false alarm rate
    Constant false alarm rate detection refers to a common form of adaptive algorithm used in radar systems to detect target returns against a background of noise, clutter and interference.Other detection algorithms are not adaptive...

  • Constraint (information theory)
    Constraint (information theory)
    Constraint in information theory refers to the degree of statistical dependence between or among variables.Garner provides a thorough discussion of various forms of constraint with application to pattern recognition and psychology....

  • Consumption distribution
    Consumption distribution
    In economics, the consumption distribution is an alternative to the income distribution for judging economic inequality, comparing levels of consumption rather than income or wealth....

  • Contact process (mathematics)
  • Content validity
    Content validity
    In psychometrics, content validity refers to the extent to which a measure represents all facets of a given social construct. For example, a depression scale may lack content validity if it only assesses the affective dimension of depression but fails to take into account the behavioral dimension...

  • Contiguity (probability theory)
    Contiguity (probability theory)
    In probability theory, two sequences of probability measures are said to be contiguous if asymptotically they share the same support. Thus the notion of contiguity extends the concept of absolute continuity to the sequences of measures....

  • Contingency table
    Contingency table
    In statistics, a contingency table is a type of table in a matrix format that displays the frequency distribution of the variables...

  • Continuity correction
    Continuity correction
    In probability theory, if a random variable X has a binomial distribution with parameters n and p, i.e., X is distributed as the number of "successes" in n independent Bernoulli trials with probability p of success on each trial, then...
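
    A sketch comparing the exact binomial tail with the normal approximation, with and without the usual 0.5 correction (standard library only; the values of n, p and k are illustrative):

      from math import comb, erf, sqrt

      n, p, k = 40, 0.3, 10                # illustrative parameters
      exact = sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))
      mu, sigma = n * p, sqrt(n * p * (1 - p))
      Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF

      print(exact)                         # exact P(X <= k)
      print(Phi((k + 0.5 - mu) / sigma))   # with continuity correction
      print(Phi((k - mu) / sigma))         # without correction, noticeably worse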

  • Continuous distribution — redirects to Continuous probability distribution
  • Continuous mapping theorem
    Continuous mapping theorem
    In probability theory, the continuous mapping theorem states that continuous functions are limit-preserving even if their arguments are sequences of random variables. A continuous function, in Heine’s definition, is such a function that maps convergent sequences into convergent sequences: if x_n → x...

  • Continuous probability distribution
  • Continuous stochastic process
    Continuous stochastic process
    In probability theory, a continuous stochastic process is a type of stochastic process that may be said to be "continuous" as a function of its "time" or index parameter. Continuity is a nice property for a process to have, since it implies that the process is well-behaved in some sense, and,...

  • Continuous-time Markov process
  • Continuous-time stochastic process
    Continuous-time stochastic process
    In probability theory and statistics, a continuous-time stochastic process, or a continuous-space-time stochastic process is a stochastic process for which the index variable takes a continuous set of values, as contrasted with a discrete-time process for which the index variable takes only...

  • Contrast (statistics)
    Contrast (statistics)
    In statistics, particularly analysis of variance, a contrast is a linear combination of two or more factor level means whose coefficients add up to zero. A simple contrast is the difference between two means...

  • Control chart
    Control chart
    Control charts, also known as Shewhart charts or process-behaviour charts, in statistical process control are tools used to determine whether or not a manufacturing or business process is in a state of statistical control....

  • Control event rate
    Control event rate
    In epidemiology and biostatistics, the control event rate is a measure of how often a particular statistical event occurs within the scientific control group of an experiment....

  • Control limits
    Control limits
    Control limits, also known as natural process limits, are horizontal lines drawn on a statistical process control chart, usually at a distance of ±3 standard deviations of the plotted statistic from the statistic's mean....

  • Control variate
    Control variate
    The control variates method is a variance reduction technique used in Monte Carlo methods. It exploits information about the errors in estimates of known quantities to reduce the error of an estimate of an unknown quantity....
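
    A small sketch of the technique for estimating E[exp(U)] with U ~ Uniform(0, 1), using U itself (known mean 1/2) as the control variate (NumPy assumed; sample size illustrative):

      import numpy as np

      rng = np.random.default_rng(4)
      u = rng.uniform(size=100_000)              # illustrative sample size
      f = np.exp(u)                              # plain Monte Carlo samples

      c = -np.cov(f, u)[0, 1] / u.var()          # estimated optimal coefficient
      f_cv = f + c * (u - 0.5)                   # adjusted samples, same expectation

      print(f.mean(), f.var())                   # naive estimator
      print(f_cv.mean(), f_cv.var())             # same target (e - 1), smaller variance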

  • Controlling for a variable
    Controlling for a variable
    Controlling for a variable refers to the deliberate varying of the experimental conditions in order to see the impact of a specific variable when predicting the outcome variable. Controlling tends to reduce the experimental error...

  • Convergence of measures
    Convergence of measures
    In mathematics, more specifically measure theory, there are various notions of the convergence of measures. Three of the most common notions of convergence are described below....

  • Convergence of random variables
    Convergence of random variables
    In probability theory, there exist several different notions of convergence of random variables. The convergence of sequences of random variables to some limit random variable is an important concept in probability theory, and its applications to statistics and stochastic processes...

  • Convex hull
    Convex hull
    In mathematics, the convex hull or convex envelope for a set of points X in a real vector space V is the minimal convex set containing X....

  • Convolution of probability distributions
    Convolution of probability distributions
    The convolution of probability distributions arises in probability theory and statistics as the operation in terms of probability distributions that corresponds to the addition of independent random variables and, by extension, to forming linear combinations of random variables...

  • Convolution random number generator
  • Conway–Maxwell–Poisson distribution
  • Cook's distance
    Cook's distance
    In statistics, Cook's distance is a commonly used estimate of the influence of a data point when doing least squares regression analysis. In a practical ordinary least squares analysis, Cook's distance can be used in several ways: to indicate data points that are particularly worth checking for...

  • Cophenetic correlation
    Cophenetic correlation
    In statistics, and especially in biostatistics, cophenetic correlation is a measure of how faithfully a dendrogram preserves the pairwise distances between the original unmodeled data points...

  • Copula (statistics)
    Copula (statistics)
    In probability theory and statistics, a copula can be used to describe the dependence between random variables. Copulas derive their name from linguistics....

  • Correct sampling
    Correct sampling
    During sampling of particulate materials, correct sampling is defined in Gy's sampling theory as a sampling scenario in which all particles in a population have the same probability of ending up in the sample....

  • Correction for attenuation
    Correction for attenuation
    Correction for attenuation is a statistical procedure, due to Spearman, to "rid a correlation coefficient from the weakening effect of measurement error", a phenomenon also known as regression dilution. In measurement and statistics, it is also called disattenuation...

  • Correlate summation analysis
    Correlate summation analysis
    Correlate summation analysis is a data mining method. It is designed to find the variables that are most covariant with all of the other variables being studied, relative to clustering. Aggregate correlate summation is the product of the totaled negative logarithm of the p-values for all of the...

  • Correlation
    Correlation
    In statistics, dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence....

  • Correlation and dependence
  • Correlation does not imply causation
    Correlation does not imply causation
    "Correlation does not imply causation" is a phrase used in science and statistics to emphasize that correlation between two variables does not automatically imply that one causes the other. The opposite assumption, that correlation proves causation (related to "ignoring a common cause" and questionable cause), is a...

  • Correlation clustering
    Correlation clustering
    In machine learning, correlation clustering or cluster editing operates in a scenario where the relationships between the objects are known instead of the actual representations of the objects...

  • Correlation function
    Correlation function
    A correlation function is the correlation between random variables at two different points in space or time, usually as a function of the spatial or temporal distance between the points...

    • Correlation function (astronomy)
      Correlation function (astronomy)
      In astronomy, a correlation function describes the distribution of galaxies in the universe. By default, correlation function refers to the two-point autocorrelation function. For a given distance, the two-point autocorrelation function is a function of one variable which describes the...

    • Correlation function (quantum field theory)
      Correlation function (quantum field theory)
      In quantum field theory, the matrix element computed by inserting a product of operators between two states, usually the vacuum states, is called a correlation function....

    • Correlation function (statistical mechanics)
      Correlation function (statistical mechanics)
      In statistical mechanics, the correlation function is a measure of the order in a system, as characterized by a mathematical correlation function, and describes how microscopic variables at different positions are correlated....

  • Correlation implies causation
    Correlation implies causation
    "Correlation does not imply causation" is a phrase used in science and statistics to emphasize that correlation between two variables does not automatically imply that one causes the other. The opposite assumption, that correlation proves causation (related to "ignoring a common cause" and questionable cause), is a...

  • Correlation inequality
    Correlation inequality
    In probability and statistics, a correlation inequality is one of a number of inequalities satisfied by the correlation functions of a model. Such inequalities are of particular use in statistical mechanics and in percolation theory....

  • Correlation ratio
    Correlation ratio
    In statistics, the correlation ratio is a measure of the relationship between the statistical dispersion within individual categories and the dispersion across the whole population or sample. The measure is defined as the ratio of two standard deviations representing these types of variation...

  • Correlogram
    Correlogram
    In the analysis of data, a correlogram is an image of correlation statistics. For example, in time series analysis, a correlogram, also known as an autocorrelation plot, is a plot of the sample autocorrelations r_h versus h....

  • Correspondence analysis
    Correspondence analysis
    Correspondence analysis is a multivariate statistical technique proposed by Hirschfeld and later developed by Jean-Paul Benzécri. It is conceptually similar to principal component analysis, but applies to categorical rather than continuous data...

  • Cosmic variance
    Cosmic variance
    Cosmic variance is the statistical uncertainty inherent in observations of the universe at extreme distances. It is based on the idea that it is only possible to observe part of the universe at one particular time, so it is difficult to make statistical statements about cosmology on the scale of...

  • Cost-of-living index
    Cost-of-living index
    Cost of living is the cost of maintaining a certain standard of living. Changes in the cost of living over time are often operationalized in a cost of living index. Cost of living calculations are also used to compare the cost of maintaining a certain standard of living in different geographic areas...

  • Count data
  • Counternull
    Counternull
    In statistics, and especially in the statistical analysis of psychological data, the counternull is a statistic used to aid the understanding and presentation of research results...

  • Counting process
  • Covariance
    Covariance
    In probability theory and statistics, covariance is a measure of how much two variables change together. Variance is a special case of the covariance when the two variables are identical...
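
    For reference, the standard definition (not quoted from this entry) can be written in LaTeX as:

      \operatorname{Cov}(X, Y) = \mathbb{E}\!\left[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\right]
                               = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y],
      \qquad
      \operatorname{Var}(X) = \operatorname{Cov}(X, X).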

  • Covariance and correlation
    Covariance and correlation
    In probability theory and statistics, the mathematical descriptions of covariance and correlation are very similar. Both describe the degree of similarity between two random variables or sets of random variables....

  • Covariance intersection
    Covariance intersection
    Covariance intersection is an algorithm for combining two or more estimates of state variables in a Kalman filter when the correlation between them is unknown...

  • Covariance matrix
    Covariance matrix
    In probability theory and statistics, a covariance matrix is a matrix whose element in the i, j position is the covariance between the i th and j th elements of a random vector...

  • Covariance function
    Covariance function
    In probability theory and statistics, covariance is a measure of how much two variables change together and the covariance function describes the variance of a random variable process or field...

  • Covariate
    Covariate
    In statistics, a covariate is a variable that is possibly predictive of the outcome under study. A covariate may be of direct interest or it may be a confounding or interacting variable....

  • Cover's theorem
    Cover's Theorem
    Cover's Theorem is a statement in computational learning theory and is one of the primary theoretical motivations for the use of non-linear kernel methods in machine learning applications...

  • Coverage probability
    Coverage probability
    In statistics, the coverage probability of a confidence interval is the proportion of the time that the interval contains the true value of interest. For example, suppose our interest is in the mean number of months that people with a particular type of cancer remain in remission following...
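
    A minimal simulation sketch (illustrative only; the sample size, distribution and seed are arbitrary assumptions) that estimates the coverage probability of the usual normal-theory 95% interval for a mean:

      import numpy as np

      rng = np.random.default_rng(1)
      n, trials, true_mean = 30, 10_000, 5.0
      z = 1.96  # approximate 97.5th percentile of the standard normal
      hits = 0
      for _ in range(trials):
          sample = rng.normal(loc=true_mean, scale=2.0, size=n)
          half_width = z * sample.std(ddof=1) / np.sqrt(n)
          if abs(sample.mean() - true_mean) <= half_width:
              hits += 1
      print(f"estimated coverage: {hits / trials:.3f}")  # close to, though slightly below, 0.95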

  • Cox process
    Cox process
    A Cox process , also known as a doubly stochastic Poisson process or mixed Poisson process, is a stochastic process which is a generalization of a Poisson process...

  • Cox's theorem
    Cox's theorem
    Cox's theorem, named after the physicist Richard Threlkeld Cox, is a derivation of the laws of probability theory from a certain set of postulates. This derivation justifies the so-called "logical" interpretation of probability. As the laws of probability derived by Cox's theorem are applicable to...

  • Cox–Ingersoll–Ross model
  • Cramér–Rao bound
  • Cramér–von Mises criterion
  • Cramér's theorem
    Cramér's theorem
    In mathematical statistics, Cramér's theorem is one of several theorems of Harald Cramér, a Swedish statistician and probabilist...

  • Cramér's V
    Cramér's V
    In statistics, Cramér's V is a popular measure of association between two nominal variables, giving a value between 0 and +1...
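
    A small illustrative sketch (assuming SciPy is available; the contingency table is hypothetical) of computing Cramér's V from a table of counts via the chi-squared statistic:

      import numpy as np
      from scipy.stats import chi2_contingency

      def cramers_v(table):
          """Cramér's V for an r x c contingency table of counts."""
          table = np.asarray(table, dtype=float)
          chi2 = chi2_contingency(table)[0]
          n = table.sum()
          r, c = table.shape
          return np.sqrt(chi2 / (n * (min(r, c) - 1)))

      # Hypothetical 2 x 3 table of counts
      print(round(cramers_v([[10, 20, 30], [20, 20, 10]]), 3))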

  • Craps principle
    Craps principle
    In probability theory, the craps principle is a theorem about event probabilities under repeated iid trials. Let E_1 and E_2 denote two mutually exclusive events which might occur on a given trial...
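
    In symbols (a standard statement of the result, not quoted from this entry): if E_1 and E_2 are mutually exclusive and the trials are i.i.d., the probability that E_1 occurs before E_2 is

      \Pr(E_1 \text{ before } E_2) \;=\; \frac{\Pr(E_1)}{\Pr(E_1) + \Pr(E_2)}.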

  • Credible interval
    Credible interval
    In Bayesian statistics, a credible interval is an interval in the domain of a posterior probability distribution used for interval estimation. The generalisation to multivariate problems is the credible region...

  • Cricket statistics
    Cricket statistics
    Cricket is a sport that generates a large number of statistics.Statistics are recorded for each player during a match, and aggregated over a career. At the professional level, statistics for Test cricket, one-day internationals, and first-class cricket are recorded separately...

  • Crime statistics
    Crime statistics
    Crime statistics attempt to provide statistical measures of the crime in societies. Given that crime is usually secretive by nature, measurements of it are likely to be inaccurate....

  • Critical region — redirects to Statistical hypothesis testing
    Statistical hypothesis testing
    A statistical hypothesis test is a method of making decisions using data, whether from a controlled experiment or an observational study . In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold...

  • Cromwell's rule
    Cromwell's rule
    Cromwell's rule, named by statistician Dennis Lindley, states that one should avoid using prior probabilities of 0 or 1, except when applied to statements that are logically true or false...

  • Cronbach's α
    Cronbach's alpha
    Cronbach's α is a coefficient of reliability. It is commonly used as a measure of the internal consistency or reliability of a psychometric test score for a sample of examinees. It was first named alpha by Lee Cronbach in 1951, as he had intended to continue with further coefficients...
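
    An illustrative sketch (not from the source entry; the score matrix is hypothetical) of the usual computation of Cronbach's α from a respondents-by-items score matrix:

      import numpy as np

      def cronbach_alpha(scores):
          """Cronbach's alpha; scores has shape (n_respondents, k_items)."""
          scores = np.asarray(scores, dtype=float)
          k = scores.shape[1]
          item_variances = scores.var(axis=0, ddof=1)
          total_variance = scores.sum(axis=1).var(ddof=1)
          return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

      # Hypothetical scores for 5 respondents on 3 items
      scores = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [3, 3, 4], [5, 4, 5]]
      print(round(cronbach_alpha(scores), 3))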

  • Cross-correlation
    Cross-correlation
    In signal processing, cross-correlation is a measure of similarity of two waveforms as a function of a time-lag applied to one of them. This is also known as a sliding dot product or sliding inner-product. It is commonly used for searching a long-duration signal for a shorter, known feature...

  • Cross-covariance
  • Cross-entropy method
    Cross-entropy method
    The cross-entropy method, attributed to Reuven Rubinstein, is a general Monte Carlo approach to combinatorial and continuous multi-extremal optimization and importance sampling. The method originated from the field of rare event simulation, where...

  • Cross-sectional data
    Cross-sectional data
    Cross-sectional data or cross section in statistics and econometrics is a type of one-dimensional data set. Cross-sectional data refers to data collected by observing many subjects at the same point of time, or without regard to differences in time...

  • Cross-sectional regression
    Cross-sectional regression
    A Cross-sectional regression is a type of regression model in which the explained and explanatory variables are associated with one period or point in time...

  • Cross-sectional study
    Cross-sectional study
    Cross-sectional studies form a class of research methods that involve observation of all of a population, or a representative subset, at one specific point in time...

  • Cross-spectrum
    Cross-spectrum
    In time series analysis, the cross-spectrum is used as part of a frequency domain analysis of the cross correlation or cross covariance between two time series...

  • Cross tabulation
    Cross tabulation
    Cross tabulation is the process of creating a contingency table from the multivariate frequency distribution of statistical variables. Heavily used in survey research, cross tabulations can be produced by a range of statistical packages, including some that are specialised for the task. Survey...

  • Cross-validation (statistics)
  • Crystal Ball function
    Crystal Ball function
    The Crystal Ball function, named after the Crystal Ball Collaboration , is a probability density function commonly used to model various lossy processes in high-energy physics. It consists of a Gaussian core portion and a power-law low-end tail, below a certain threshold...

     - a probability distribution
  • Cumulant
    Cumulant
    In probability theory and statistics, the cumulants κ_n of a probability distribution are a set of quantities that provide an alternative to the moments of the distribution. The moments determine the cumulants in the sense that any two probability distributions whose moments are identical will have...

  • Cumulant generating function — redirects to cumulant
    Cumulant
    In probability theory and statistics, the cumulants κ_n of a probability distribution are a set of quantities that provide an alternative to the moments of the distribution. The moments determine the cumulants in the sense that any two probability distributions whose moments are identical will have...

  • Cumulative distribution function
    Cumulative distribution function
    In probability theory and statistics, the cumulative distribution function , or just distribution function, describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x. Intuitively, it is the "area so far"...
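
    In standard notation (not quoted from this entry), the cumulative distribution function of a real-valued random variable X is

      F_X(x) = \Pr(X \le x), \qquad x \in \mathbb{R},

    a non-decreasing, right-continuous function with limits 0 and 1 as x tends to minus and plus infinity respectively.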

  • Cumulative frequency analysis
    Cumulative frequency analysis
    Cumulative frequency analysis is the application of estimation theory to exceedance probability. The complement, the non-exceedance probability, concerns the frequency of occurrence of values of a phenomenon staying below a reference value. The phenomenon may be time or space dependent...

  • Cumulative incidence
    Cumulative incidence
    Cumulative incidence or incidence proportion is a measure of frequency, as in epidemiology, where it is a measure of disease frequency during a period of time...

  • Cunningham function
    Cunningham function
    In statistics, the Cunningham function or Pearson–Cunningham function ω_{m,n} is a generalisation of a special function introduced by and studied in the form here by...

  • CURE data clustering algorithm
    CURE data clustering algorithm
    CURE is an efficient data clustering algorithm for large databases that is more robust to outliers and identifies clusters having non-spherical shapes and wide variances in size...

  • Curve fitting
    Curve fitting
    Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints. Curve fitting can involve either interpolation, where an exact fit to the data is required, or smoothing, in which a "smooth" function...

  • CUSUM
    CUSUM
    In statistical quality control, the CUSUM is a sequential analysis technique due to E. S. Page of the University of Cambridge. It is typically used for monitoring change detection...
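
    A minimal sketch of one common tabular CUSUM formulation (a simplification, not necessarily Page's original presentation; the in-control mean, reference value k and decision limit h are arbitrary choices):

      import numpy as np

      def tabular_cusum(x, mu0, k, h):
          """Indices where an upper or lower one-sided CUSUM exceeds the decision limit h."""
          c_plus = c_minus = 0.0
          alarms = []
          for i, xi in enumerate(x):
              c_plus = max(0.0, c_plus + xi - (mu0 + k))    # accumulates upward deviations beyond k
              c_minus = max(0.0, c_minus + (mu0 - k) - xi)  # accumulates downward deviations beyond k
              if c_plus > h or c_minus > h:
                  alarms.append(i)
          return alarms

      rng = np.random.default_rng(2)
      data = np.concatenate([rng.normal(0, 1, 50), rng.normal(1.5, 1, 50)])  # mean shifts upward at i = 50
      print(tabular_cusum(data, mu0=0.0, k=0.5, h=5.0))  # alarms shortly after the shift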

  • Cuzick–Edwards test
  • Cyclostationary process

D

  • d'
    D'
    The sensitivity index or d' is a statistic used in signal detection theory. It provides the separation between the means of the signal and the noise distributions, in units of the standard deviation of the noise distribution....

  • d-separation
  • D'Agostino's K-squared test
    D'Agostino's K-squared test
    In statistics, D’Agostino’s K² test is a goodness-of-fit measure of departure from normality; that is, the test aims to establish whether or not the given sample comes from a normally distributed population...

  • Dagum distribution
  • DAP
    DAP (software)
    Dap is a statistics and graphics program, that performs data management, analysis, and graphical visualization tasks which are commonly required in statistical consulting practice....

     — open source software
  • Data analysis
    Data analysis
    Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making...

  • Data assimilation
    Data assimilation
    Applications of data assimilation arise in many fields of geosciences, perhaps most importantly in weather forecasting and hydrology. Data assimilation proceeds by analysis cycles...

  • Data binning
    Data binning
    Data binning is a data pre-processing technique used to reduce the effects of minor observation errors. The original data values which fall in a given small interval, a bin, are replaced by a value representative of that interval, often the central value...
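
    A small illustrative sketch (values and bin count are arbitrary) that replaces each value by the centre of the equal-width bin it falls into, as the entry describes:

      import numpy as np

      def bin_to_centers(values, n_bins=5):
          """Replace each value by the centre of the equal-width bin containing it."""
          values = np.asarray(values, dtype=float)
          edges = np.linspace(values.min(), values.max(), n_bins + 1)
          centers = (edges[:-1] + edges[1:]) / 2
          idx = np.clip(np.digitize(values, edges[1:-1]), 0, n_bins - 1)
          return centers[idx]

      print(bin_to_centers([1.0, 1.2, 3.7, 4.9, 5.0], n_bins=2))  # [2. 2. 4. 4. 4.]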

  • Data classification (business intelligence)
    Data classification (Business Intelligence)
    In business intelligence, data classification has close ties to data clustering, but where data clustering is descriptive, data classification is predictive. In essence data classification consists of using variables with known values to predict the unknown or future values of other variables. It...

  • Data cleansing
    Data cleansing
    Data cleansing, data cleaning, or data scrubbing is the process of detecting and correcting corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, irrelevant, etc...

  • Data clustering
    Data clustering
    Cluster analysis or clustering is the task of assigning a set of objects into groups so that the objects in the same cluster are more similar to each other than to those in other clusters....

  • Data collection
    Data collection
    Data collection is a term used to describe a process of preparing and collecting data, for example, as part of a process improvement or similar project. The purpose of data collection is to obtain information to keep on record, to make decisions about important issues, to pass information on to...

  • Data Desk
    Data Desk
    Data Desk is a software program for visual data analysis, visual data exploration, and statistics. It carries out exploratory data analysis and standard statistical analyses by means of dynamically linked graphic data displays that update any change simultaneously. Data Desk was developed...

     – software
  • Data dredging
    Data dredging
    Data dredging is the inappropriate use of data mining to uncover misleading relationships in data. Data-snooping bias is a form of statistical bias that arises from this misuse of statistics...

  • Data generating process (disambiguation)
  • Data mining
    Data mining
    Data mining , a relatively young and interdisciplinary field of computer science is the process of discovering new patterns from large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems...

  • Data reduction
    Data reduction
    Data reduction is the transformation of numerical or alphabetical digital information, derived empirically or experimentally, into a corrected, ordered, and simplified form...

  • Data point
    Data point
    In statistics, a data point is a set of measurements on a single member of a statistical population, or a subset of those measurements for a given individual...

  • Data quality assurance
    Data quality assurance
    Data quality assurance is the process of profiling the data to discover inconsistencies and other anomalies in the data, and performing data cleansing activities...

  • Data set
    Data set
    A data set is a collection of data, usually presented in tabular form. Each column represents a particular variable, and each row corresponds to a given member of the data set in question, listing its values for each of the variables, such as the height and weight of an object or values of random numbers. Each...

  • Data-snooping bias
  • Data transformation (statistics)
    Data transformation (statistics)
    In statistics, data transformation refers to the application of a deterministic mathematical function to each point in a data set — that is, each data point zi is replaced with the transformed value yi = f, where f is a function...

  • Data visualization
    Data visualization
    Data visualization is the study of the visual representation of data, meaning "information that has been abstracted in some schematic form, including attributes or variables for the units of information"....

  • DataDetective
    DataDetective
    DataDetective is a data mining platform developed by Sentient Information Systems. Since 1992, the software has been applied in organizations that need to retrieve patterns and relations from their typically large databases...

      – software
  • Dataplot
    Dataplot
    Dataplot is a public-domain software system for scientific visualization and statistical analysis. It was developed at the National Institute of Standards and Technology...

      – software
  • Davies–Bouldin index
    Davies–Bouldin index
    The Davies–Bouldin index, introduced in 1979, is a metric for evaluating clustering algorithms. It is an internal evaluation scheme, in which the validation of how well the clustering has been done is made using quantities and features inherent to the dataset...

  • Davis distribution
  • De Finetti's game
  • De Finetti's theorem
    De Finetti's theorem
    In probability theory, de Finetti's theorem explains why exchangeable observations are conditionally independent given some latent variable to which an epistemic probability distribution would then be assigned...

  • de Moivre's law
    De Moivre's law
    De Moivre's law is a survival model applied in actuarial science, named for Abraham de Moivre. It is a simple law of mortality based on a linear survival function. De Moivre's law has a single parameter ω called the ultimate age...

  • De Moivre–Laplace theorem
    De Moivre–Laplace theorem
    In probability theory, the de Moivre–Laplace theorem is a normal approximation to the binomial distribution. It is a special case of the central limit theorem...
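
    A standard statement of the local approximation (not quoted from this entry): for X ~ Binomial(n, p) and large n,

      \Pr(X = k) \;\approx\; \frac{1}{\sqrt{2\pi\, n p (1-p)}}\,
      \exp\!\left(-\frac{(k - np)^2}{2\, n p (1-p)}\right).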

  • Decision boundary
    Decision boundary
    In a statistical-classification problem with two classes, a decision boundary or decision surface is a hypersurface that partitions the underlying vector space into two sets, one for each class...

  • Decision theory
    Decision theory
    Decision theory in economics, psychology, philosophy, mathematics, and statistics is concerned with identifying the values, uncertainties and other issues relevant in a given decision, its rationality, and the resulting optimal decision...

  • Decomposition of time series
  • Deep sampling
    Deep sampling
    Deep sampling is a variation of statistical sampling in which precision is sacrificed for insight. Small numbers of samples are taken, with each sample containing much information. The samples are taken approximately uniformly over the resource of interest, such as time or space...

  • Degenerate distribution
  • Degrees of freedom (statistics)
    Degrees of freedom (statistics)
    In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the...

  • Delphi method
    Delphi method
    The Delphi method is a structured communication technique, originally developed as a systematic, interactive forecasting method which relies on a panel of experts.In the standard version, the experts answer questionnaires in two or more rounds...

  • Delta method
    Delta method
    In statistics, the delta method is a method for deriving an approximate probability distribution for a function of an asymptotically normal statistical estimator from knowledge of the limiting variance of that estimator...
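
    A standard univariate statement (not quoted from this entry): if T_n is asymptotically normal and g is differentiable at θ with g'(θ) ≠ 0, then

      \sqrt{n}\,\bigl(T_n - \theta\bigr) \xrightarrow{d} N(0, \sigma^2)
      \quad\Longrightarrow\quad
      \sqrt{n}\,\bigl(g(T_n) - g(\theta)\bigr) \xrightarrow{d} N\!\bigl(0, \sigma^2\,[g'(\theta)]^2\bigr).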

  • Demand forecasting
    Demand forecasting
    Demand forecasting is the activity of estimating the quantity of a product or service that consumers will purchase. Demand forecasting involves techniques including both informal methods, such as educated guesses, and quantitative methods, such as the use of historical sales data or current data...

  • Deming regression
    Deming regression
    In statistics, Deming regression, named after W. Edwards Deming, is an errors-in-variables model which tries to find the line of best fit for a two-dimensional dataset...

  • Demographics
    Demographics
    Demographics are the most recent statistical characteristics of a population. These types of data are used widely in sociology , public policy, and marketing. Commonly examined demographics include gender, race, age, disabilities, mobility, home ownership, employment status, and even location...

  • Demography
    Demography
    Demography is the statistical study of human population. It can be a very general science that can be applied to any kind of dynamic human population, that is, one that changes over time or space...

    • Demographic statistics
      Demographic statistics
      Among the kinds of data that national leaders need are the demographic statistics of their population. Records of births, deaths, marriages, immigration and emigration, and a regular census of population provide information that is key to making sound decisions about national policy. A useful summary...

  • Dendrogram
    Dendrogram
    A dendrogram is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering...

  • Density estimation
    Density estimation
    In probability and statistics, density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function...

  • Dependent and independent variables
    Dependent and independent variables
    The terms "dependent variable" and "independent variable" are used in similar but subtly different ways in mathematics and statistics as part of the standard terminology in those subjects...

  • Descriptive research
    Descriptive research
    Descriptive research, also known as statistical research, describes data and characteristics about the population or phenomenon being studied. Descriptive research answers the questions who, what, where, when, why and how...

  • Descriptive statistics
    Descriptive statistics
    Descriptive statistics quantitatively describe the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics , in that descriptive statistics aim to summarize a data set, rather than use the data to learn about the population that the data are...

  • Design effect
    Design effect
    In statistics, the design effect is an adjustment used in some kinds of studies, such as cluster randomised trials, to allow for the design structure. The adjustment inflates the variance of parameter estimates, and therefore their standard errors, which is necessary to allow for correlations among...

  • Design matrix
    Design matrix
    In statistics, a design matrix is a matrix of explanatory variables, often denoted by X, that is used in certain statistical models, e.g., the general linear model....

  • Design of experiments
    Design of experiments
    In general usage, design of experiments or experimental design is the design of any information-gathering exercises where variation is present, whether under the full control of the experimenter or not. However, in statistics, these terms are usually used for controlled experiments...

    • The Design of Experiments
      The Design of Experiments
      The Design of Experiments is a 1935 book by the British statistician R.A. Fisher, which effectively founded the field of design of experiments. The book has been highly influential...

       (book by Fisher)
  • Detailed balance
    Detailed balance
    The principle of detailed balance is formulated for kinetic systems which are decomposed into elementary processes : At equilibrium, each elementary process should be equilibrated by its reverse process....

  • Detection theory
    Detection theory
    Detection theory, or signal detection theory, is a means to quantify the ability to discern between information-bearing energy patterns and random energy patterns that distract from the information...

  • Determining the number of clusters in a data set
    Determining the number of clusters in a data set
    Determining the number of clusters in a data set, a quantity often labeled k as in the k-means algorithm, is a frequent problem in data clustering, and is a distinct issue from the process of actually solving the clustering problem....

  • Detrended correspondence analysis
    Detrended Correspondence Analysis
    Detrended correspondence analysis is a multivariate statistical technique widely used by ecologists to find the main factors or gradients in large, species-rich but usually sparse data matrices that typify ecological community data. For example, Hill and Gauch analyse the data of a vegetation...

  • Detrended fluctuation analysis
    Detrended fluctuation analysis
    In stochastic processes, chaos theory and time series analysis, detrended fluctuation analysis is a method for determining the statistical self-affinity of a signal. It is useful for analysing time series that appear to be long-memory processes...

  • Deviance (statistics)
  • Deviance information criterion
    Deviance information criterion
    The deviance information criterion is a hierarchical modeling generalization of the AIC and BIC . It is particularly useful in Bayesian model selection problems where the posterior distributions of the models have been obtained by Markov chain Monte Carlo simulation...

  • Deviation (statistics)
    Deviation (statistics)
    In mathematics and statistics, deviation is a measure of difference for interval and ratio variables between the observed value and the mean. The sign of the deviation reports the direction of that difference...

  • Deviation analysis (disambiguation)
  • DFFITS — a regression diagnostic
  • Dickey–Fuller test
  • Difference in differences
    Difference in differences
    Difference in differences is a quasi-experimental technique used in econometrics that measures the effect of a treatment at a given period in time. It is often used to measure the change induced by a particular treatment or event, though may be subject to certain biases...

  • Differential entropy
    Differential entropy
    Differential entropy is a concept in information theory that extends the idea of entropy, a measure of average surprisal of a random variable, to continuous probability distributions...
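
    In standard notation (not quoted from this entry), for a random variable X with density f,

      h(X) = -\int f(x)\,\log f(x)\,dx.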

  • Diffusion process
    Diffusion process
    In probability theory, a branch of mathematics, a diffusion process is a solution to a stochastic differential equation. It is a continuous-time Markov process with continuous sample paths....

  • Diffusion-limited aggregation
    Diffusion-limited aggregation
    Diffusion-limited aggregation is the process whereby particles undergoing a random walk due to Brownian motion cluster together to form aggregates of such particles. This theory, proposed by Witten and Sander in 1981, is applicable to aggregation in any system where diffusion is the primary means...

  • Dimension reduction
  • Dilution assay
    Dilution assay
    The term dilution assay is generally used to designate a special type of bioassay in which one or more preparations are administered to experimental units at different dose levels inducing a measurable biological response. The dose levels are prepared by dilution in a diluent that is inert in...

  • Direct relationship
    Direct relationship
    In mathematics and statistics, a positive or direct relationship is a relationship between two variables in which change in one variable is associated with a change in the other variable in the same direction. For example all linear relationships with a positive slope are direct relationships...

  • Directional statistics
  • Dirichlet distribution
  • Dirichlet process
    Dirichlet process
    In probability theory, a Dirichlet process is a stochastic process that can be thought of as a probability distribution whose domain is itself a random distribution...

  • Disattenuation
  • Discrepancy function
    Discrepancy function
    A discrepancy function is a mathematical function which describes how closely a structural model conforms to observed data. Larger values of the discrepancy function indicate a poor fit of the model to data. In general, the parameter estimates for a given model are chosen so as to make the...

  • Discrete choice
    Discrete choice
    In economics, discrete choice problems involve choices between two or more discrete alternatives, such as entering or not entering the labor market, or choosing between modes of transport. Such choices contrast with standard consumption models in which the quantity of each good consumed is assumed...

  • Discrete choice analysis
  • Discrete distribution
  • Discrete phase-type distribution
    Discrete phase-type distribution
    The discrete phase-type distribution is a probability distribution that results from a system of one or more inter-related geometric distributions occurring in sequence, or phases. The sequence in which each of the phases occur may itself be a stochastic process...

  • Discrete probability distribution
  • Discrete time
    Discrete time
    Discrete time is the discontinuity of a function's time domain that results from sampling a variable at a finite interval. For example, consider a newspaper that reports the price of crude oil once every day at 6:00AM. The newspaper is described as sampling the cost at a frequency of once per 24...

  • Discretization of continuous features
    Discretization of continuous features
    In statistics and machine learning, discretization refers to the process of converting or partitioning continuous attributes, features or variables to discretized or nominal attributes/features/variables/intervals. This can be useful when creating probability mass functions – formally, in density...

  • Discriminant function analysis
    Discriminant function analysis
    Discriminant function analysis is a statistical analysis to predict a categorical dependent variable by one or more continuous or binary independent variables. It is different from an ANOVA or MANOVA, which is used to predict one or multiple continuous dependent variables by one or more...

  • Discriminative model
    Discriminative model
    Discriminative models are a class of models used in machine learning for modeling the dependence of an unobserved variable y on an observed variable x...

  • Disorder problem
    Disorder problem
    In the study of stochastic processes in mathematics, a disorder problem has been formulated by Kolmogorov. Specifically, the problem is to use ongoing observations of a stochastic process to decide whether or not to raise an alarm that the probabilistic properties of the process have changed...

  • Distance correlation
    Distance correlation
    In statistics and in probability theory, distance correlation is a measure of statistical dependence between two random variables or two random vectors of arbitrary, not necessarily equal dimension. Its important property is that this measure of dependence is zero if and only if the random...

  • Distributed lag
    Distributed lag
    In statistics and econometrics, a distributed lag model is a model for time series data in which a regression equation is used to predict current values of a dependent variable based on both the current values of an explanatory variable and the lagged values of this explanatory variable.The...

  • Divergence (statistics)
    Divergence (statistics)
    In statistics and information geometry, divergence or a contrast function is a function which establishes the “distance” of one probability distribution to the other on a statistical manifold...

  • Diversity index
    Diversity index
    A diversity index is a statistic which is intended to measure the diversity of a set consisting of various types of objects. Diversity indices can be used in many fields of study to assess the diversity of any population in which each member belongs to a unique group, type or species...

  • Divisia index
    Divisia index
    A Divisia index is a theoretical construct to create index number series for continuous-time data on prices and quantities of goods exchanged.It is designed to incorporate quantity and price changes over time from subcomponents which are measured in different units -- e.g...

  • Divisia monetary aggregates index
  • Dixon's Q test
  • Dominating decision rule
    Dominating decision rule
    In decision theory, a decision rule is said to dominate another if the performance of the former is sometimes better, and never worse, than that of the latter....

  • Donsker's theorem
    Donsker's theorem
    In probability theory, Donsker's theorem, named after M. D. Donsker, identifies a certain stochastic process as a limit of empirical processes. It is sometimes called the functional central limit theorem....

  • Doob decomposition theorem
    Doob decomposition theorem
    In the theory of discrete time stochastic processes, a part of the mathematical theory of probability, the Doob decomposition theorem gives a unique decomposition of any submartingale as the sum of a martingale and an increasing predictable process. The theorem was proved by and is named for J. L....

  • Doob martingale
    Doob martingale
    A Doob martingale is a mathematical construction of a stochastic process which approximates a given random variable and has the martingale property with respect to the given filtration...

  • Doob's martingale convergence theorems
    Doob's martingale convergence theorems
    In mathematics — specifically, in stochastic analysis — Doob's martingale convergence theorems are a collection of results on the long-time limits of supermartingales, named after the American mathematician Joseph Leo Doob....

  • Doob's martingale inequality
    Doob's martingale inequality
    In mathematics, Doob's martingale inequality is a result in the study of stochastic processes. It gives a bound on the probability that a stochastic process exceeds any given value over a given interval of time...

  • Doob–Meyer decomposition theorem
  • Doomsday argument
    Doomsday argument
    The Doomsday argument is a probabilistic argument that claims to predict the number of future members of the human species given only an estimate of the total number of humans born so far...

  • Dot plot (bioinformatics)
    Dot plot (bioinformatics)
    A dot plot is a graphical method that allows the comparison of two biological sequences and the identification of regions of close similarity between them. It is a kind of recurrence plot...

  • Dot plot (statistics)
    Dot plot (statistics)
    A dot chart or dot plot is a statistical chart consisting of data points plotted on a simple scale, typically using filled in circles. There are two common, yet very different, versions of the dot chart. The first is described by Wilkinson as a graph that has been used in hand-drawn graphs to...

  • Double counting (fallacy)
    Double counting (fallacy)
    Double counting is a fallacy in which, when counting events or occurrences in probability or in other areas, a solution counts events two or more times, resulting in an erroneous number of events or occurrences which is higher than the true result...

  • Double exponential distribution — disambiguation
  • Double mass analysis
    Double mass analysis
    Double mass analysis is a commonly used data analysis approach for investigating the behaviour of records made of hydrological or meteorological data at a number of locations. It is used to determine whether there is a need for corrections to the data to account for changes in data collection...

  • Doubly stochastic model
    Doubly stochastic model
    In statistics, a doubly stochastic model is a type of model that can arise in many contexts, but in particular in modelling time-series and stochastic processes....

  • Drift rate — redirects to Stochastic drift
    Stochastic drift
    In probability theory, stochastic drift is the change of the average value of a stochastic process. A related term is the drift rate which is the rate at which the average changes. This is in contrast to the random fluctuations about this average value...

  • Dudley's theorem
    Dudley's theorem
    In probability theory, Dudley’s theorem is a result relating the expected upper bound and regularity properties of a Gaussian process to its entropy and covariance structure. The result was proved in a landmark 1967 paper of Richard M...

  • Dummy variable (statistics)
  • Duncan's new multiple range test
    Duncan's new multiple range test
    In statistics, Duncan's new multiple range test is a multiple comparison procedure developed by David B. Duncan in 1955. Duncan's MRT belongs to the general class of multiple comparison procedures that use the studentized range statistic q_r to compare sets of means. Duncan's new multiple range test...

  • Durbin test
    Durbin test
    In the analysis of designed experiments, the Friedman test is the most common non-parametric test for complete block designs. The Durbin test is a nonparametric test for balanced incomplete designs that reduces to the Friedman test in the case of a complete block design. In a randomized...

  • Durbin–Watson statistic
  • Dutch book
    Dutch book
    In gambling a Dutch book or lock is a set of odds and bets which guarantees a profit, regardless of the outcome of the gamble. It is associated with probabilities implied by the odds not being coherent....

  • Dvoretzky–Kiefer–Wolfowitz inequality
    Dvoretzky–Kiefer–Wolfowitz inequality
    In the theory of probability and statistics, the Dvoretzky–Kiefer–Wolfowitz inequality predicts how close an empirically determined distribution function will be to the distribution function from which the empirical samples are drawn...

  • Dyadic distribution
    Dyadic distribution
    A dyadic distribution is a specific type of discrete or categorical probability distribution that is of some theoretical importance in data compression...

  • Dynamic Bayesian network
    Dynamic Bayesian network
    A dynamic Bayesian network is a Bayesian network that represents sequences of variables. These sequences are often time series or sequences of symbols. The hidden Markov model can be considered as a simple dynamic Bayesian network...

  • Dynamic factor

E

  • E-statistic
  • Earth mover's distance
    Earth Mover's Distance
    In computer science, the earth mover's distance is a measure of the distance between two probability distributions over a region D. In mathematics, this is known as the Wasserstein metric...

  • Ecological correlation
    Ecological correlation
    In statistics, an ecological correlation is a correlation between two variables that are group means, in contrast to a correlation between two variables that describe individuals. For example, one might study the correlation between physical activity and weight among sixth-grade children...

  • Ecological fallacy
    Ecological fallacy
    An ecological fallacy is a logical fallacy in the interpretation of statistical data in an ecological study, whereby inferences about the nature of specific individuals are based solely upon aggregate statistics collected for the group to which those individuals belong...

  • Ecological study
    Ecological study
    An ecological study is an epidemiological study in which the unit of analysis is a population rather than an individual. For instance, an ecological study may look at the association between smoking and lung cancer deaths in different countries...

  • Econometrics
    Econometrics
    Econometrics has been defined as "the application of mathematics and statistical methods to economic data" and described as the branch of economics "that aims to give empirical content to economic relations." More precisely, it is "the quantitative analysis of actual economic phenomena based on...

  • Econometric model
    Econometric model
    Econometric models are statistical models used in econometrics. An econometric model specifies the statistical relationship that is believed to hold between the various economic quantities pertaining to a particular economic phenomenon under study...

  • Econometric software – a list of software articles
  • Economic data
    Economic data
    Economic data or economic statistics may refer to data describing an actual economy, past or present. These are typically found in time-series form, that is, covering more than one time period, or as cross-sectional data covering a single time period...

  • Economic epidemiology
    Economic epidemiology
    Economic epidemiology is a field at the intersection of epidemiology and economics. Its premise is to incorporate incentives for healthy behavior and their attendant behavioral responses into an epidemiological context to better understand how diseases are transmitted...

  • Economic statistics
    Economic statistics
    Economic statistics is a topic in applied statistics that concerns the collection, processing, compilation, dissemination, and analysis of economic data. It is also common to call the data themselves 'economic statistics', but for this usage see economic data. The data of concern to economic ...

  • Eddy covariance
    Eddy covariance
    The eddy covariance technique is a key atmospheric flux measurement technique to measure and calculate vertical turbulent fluxes within atmospheric boundary layers...

  • Edgeworth series
    Edgeworth series
    The Gram–Charlier A series and the Edgeworth series are series that approximate a probability distribution in terms of its cumulants...

  • Effect size
    Effect size
    In statistics, an effect size is a measure of the strength of the relationship between two variables in a statistical population, or a sample-based estimate of that quantity...

  • Efficiency (statistics)
    Efficiency (statistics)
    In statistics, an efficient estimator is an estimator that estimates the quantity of interest in some “best possible” manner. The notion of “best possible” relies upon the choice of a particular loss function — the function which quantifies the relative degree of undesirability of estimation errors...

  • Efficient estimator
  • Ehrenfest model
    Ehrenfest model
    The Ehrenfest model of diffusion was proposed by Paul Ehrenfest to explain the second law of thermodynamics. The model considers N particles in two containers. Particles independently change container at a rate λ...

  • Eigenpoll
    Eigenpoll
    An eigenpoll is a type of statistical survey which gathers knowledge from the community. It differs from opinion polls by finding the best solution, rather than finding the most popular opinion...

  • Elastic map
    Elastic map
    Elastic maps provide a tool for nonlinear dimensionality reduction. By their construction, they are a system of elastic springs embedded in the data space. This system approximates a low-dimensional manifold...

  • Elliptical distribution
    Elliptical distribution
    In probability and statistics, an elliptical distribution is any member of a broad family of probability distributions that generalize the multivariate normal distribution and inherit some of its properties...

  • Ellsberg paradox
    Ellsberg paradox
    The Ellsberg paradox is a paradox in decision theory and experimental economics in which people's choices violate the expected utility hypothesis. An alternate viewpoint is that expected utility theory does not properly describe actual human choices...

  • Elston–Stewart algorithm
  • Empirical
    Empirical
    The word empirical denotes information gained by means of observation or experimentation. Empirical data are data produced by an experiment or observation....

  • Empirical Bayes method
    Empirical Bayes method
    Empirical Bayes methods are procedures for statistical inference in which the prior distribution is estimated from the data. This approach stands in contrast to standard Bayesian methods, for which the prior distribution is fixed before any data are observed...

  • Empirical distribution function
    Empirical distribution function
    In statistics, the empirical distribution function, or empirical cdf, is the cumulative distribution function associated with the empirical measure of the sample. This cdf is a step function that jumps up by 1/n at each of the n data points. The empirical distribution function estimates the true...
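
    A minimal sketch (the sample is hypothetical) of the step function described above, with F_n(x) equal to the number of observations less than or equal to x divided by n:

      import numpy as np

      def ecdf(sample):
          """Return a function F_n with F_n(x) = (number of observations <= x) / n."""
          xs = np.sort(np.asarray(sample, dtype=float))
          n = len(xs)
          return lambda x: np.searchsorted(xs, x, side="right") / n

      F = ecdf([3.1, 0.2, 1.7, 2.8, 0.9])
      print(F(1.0), F(2.8), F(10.0))  # 0.4 0.8 1.0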

  • Empirical measure
    Empirical measure
    In probability theory, an empirical measure is a random measure arising from a particular realization of a sequence of random variables. The precise definition is found below. Empirical measures are relevant to mathematical statistics....

  • Empirical orthogonal functions
    Empirical orthogonal functions
    In statistics and signal processing, the method of empirical orthogonal function analysis is a decomposition of a signal or data set in terms of orthogonal basis functions which are determined from the data. It is the same as performing a principal components analysis on the data, except that the...

  • Empirical probability
    Empirical probability
    Empirical probability, also known as relative frequency, or experimental probability, is the ratio of the number of "favorable" outcomes to the total number of trials, not in a sample space but in an actual sequence of experiments...

  • Empirical process
    Empirical process
    The study of empirical processes is a branch of mathematical statistics and a sub-area of probability theory. It is a generalization of the central limit theorem for empirical measures...

  • Empirical statistical laws
    Empirical statistical laws
    An empirical statistical law or a law of statistics represents a type of behaviour that has been found across a number of datasets and, indeed, across a range of types of data sets. Many of these observed regularities have been formulated and proved as statistical or probabilistic theorems, and the term...

  • Endogeneity (economics)
    Endogeneity (economics)
    In an econometric model, a parameter or variable is said to be endogenous when there is a correlation between the parameter or variable and the error term. Endogeneity can arise as a result of measurement error, autoregression with autocorrelated errors, simultaneity, omitted variables, and sample...

  • End point of clinical trials
    End point of clinical trials
    An endpoint is something which is measured in a clinical trial or study. Measuring the selected endpoints is the goal of a trial. The response rate and survival are examples of the endpoints....

  • Energy distance
    Energy distance
    Energy distance is a statistical distance between probability distributions. If X and Y are independent random vectors in Rd, with cumulative distribution functions F and G respectively, then the energy distance between the distributions F and G is defined by D²(F, G) = 2E‖X − Y‖ − E‖X − X'‖ − E‖Y − Y'‖, where X, X' are independent and identically...

  • Energy statistics
    Energy statistics
    Energy statistics refers to collecting, compiling, analyzing and disseminating data on commodities such as coal, crude oil, natural gas, electricity, or renewable energy sources , when they are used for the energy they contain...

  • Encyclopedia of Statistical Sciences
    Encyclopedia of Statistical Sciences
    The Encyclopedia of Statistical Sciences is the largest-ever encyclopaedia of statistics. It is published by John Wiley & Sons.The first edition, in nine volumes, was edited by Norman Lloyd Johnson and Samuel Kotz and appeared in 1982. The second edition, in 16 volumes, was published in 2006. ...

     (book)
  • Engineering statistics
    Engineering statistics
    Engineering statistics combines engineering and statistics. Design of experiments, for example, is a methodology for formulating scientific and engineering problems using statistical models; the protocol specifies a randomization procedure for the experiment and specifies the primary data-analysis,...

  • Engineering tolerance
  • Engset calculation
  • Ensemble forecasting
    Ensemble forecasting
    Ensemble forecasting is a numerical prediction method that is used to attempt to generate a representative sample of the possible future states of a dynamical system...

  • Ensemble Kalman filter
    Ensemble Kalman filter
    The ensemble Kalman filter is a recursive filter suitable for problems with a large number of variables, such as discretizations of partial differential equations in geophysical models...

  • Entropy (information theory)
  • Entropy estimation
    Entropy estimation
    Estimating the differential entropy of a system or process, given some observations, is useful in various science/engineering applications, such as Independent Component Analysis, image analysis, genetic analysis, speech recognition, manifold learning, and time delay estimation...

  • Entropy power inequality
    Entropy power inequality
    In mathematics, the entropy power inequality is a result in probability theory that relates to so-called "entropy power" of random variables. It shows that the entropy power of suitably well-behaved random variables is a superadditive function. The entropy power inequality was proved in 1948 by...

  • Environmental statistics
    Environmental statistics
    Environmental statistics is the application of statistical methods to environmental science. It covers procedures for dealing with questions concerning both the natural environment in its undisturbed state and the interaction of humanity with the environment...

  • Epi Info
    Epi Info
    Epi Info is public domain statistical software for epidemiology developed by Centers for Disease Control and Prevention in Atlanta, Georgia ....

     — software
  • Epidata
    Epidata
    EpiData refers to a group of applications used in combination for creating documented data structures and analysing quantitative data. The EpiData Association, which created the software, was founded in 1999 and is based in Denmark...

     — software
  • Epidemic model
    Epidemic model
    An epidemic model is a simplified means of describing the transmission of communicable disease through individuals. The outbreak and spread of disease have been questioned and studied for many years...

  • Epidemiological methods
    Epidemiological methods
    The science of epidemiology has matured significantly from the times of Hippocrates and John Snow. The techniques for gathering and analyzing epidemiological data vary depending on the type of disease being monitored but each study will have overarching similarities....

  • Epilogism
    Epilogism
    Epilogism is a style of inference invented by the ancient Empiric school of medicine. It is a theory-free method of looking at history by accumulating facts with minimal generalization and being conscious of the side effects of making causal claims. Epilogism is an inference which moves entirely...

  • Epitome (image processing)
    Epitome (image processing)
    In image processing, an epitome is a condensed digital representation of the essential statistical properties of ordered datasets, such as matrices representing images, audio signals, videos, or genetic sequences...

  • Epps effect
    Epps effect
    In econometrics and time series analysis, the Epps effect, named after T. W. Epps, is the phenomenon that the empirical correlation between the returns of two different stocks decreases as the sampling frequency of data increases. The phenomenon is caused by non-synchronous/asynchronous...

  • Equating
    Equating
    Test equating traditionally refers to the statistical process of determining comparable scores on different forms of an exam. It can be accomplished using either classical test theory or item response theory....

     – test equating
  • Equipossible
    Equipossible
    Equipossibility is a philosophical concept in possibility theory that is a precursor to the notion of equiprobability in probability theory. It is used to distinguish what can occur in a probability experiment...

  • Equiprobable
    Equiprobable
    Equiprobability is a philosophical concept in probability theory that allows one to assign equal probabilities to outcomes when they are judged to be equipossible or to be "equally likely" in some sense...

  • Erdős–Rényi model
    Erdős–Rényi model
    In graph theory, the Erdős–Rényi model, named for Paul Erdős and Alfréd Rényi, is either of two models for generating random graphs, including one that sets an edge between each pair of nodes with equal probability, independently of the other edges...
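
    An illustrative sketch (parameters and seed are arbitrary) of sampling the G(n, p) variant, in which each pair of nodes is joined independently with probability p, as a symmetric adjacency matrix:

      import numpy as np

      def gnp_adjacency(n, p, seed=None):
          """Sample an undirected G(n, p) random graph as a 0/1 adjacency matrix."""
          rng = np.random.default_rng(seed)
          upper = np.triu(rng.random((n, n)) < p, k=1)  # each pair included independently with prob. p
          return (upper | upper.T).astype(int)

      A = gnp_adjacency(6, 0.4, seed=3)
      print(A.sum() // 2, "edges")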

  • Erlang distribution
  • Ergodic theory
    Ergodic theory
    Ergodic theory is a branch of mathematics that studies dynamical systems with an invariant measure and related problems. Its initial development was motivated by problems of statistical physics....

  • Ergodicity
    Ergodicity
    In mathematics, the term ergodic is used to describe a dynamical system which, broadly speaking, has the same behavior averaged over time as averaged over space. In physics the term is used to imply that a system satisfies the ergodic hypothesis of thermodynamics. The word ergodic is...

  • Error bar
    Error bar
    Error bars are a graphical representation of the variability of data and are used on graphs to indicate the error, or uncertainty in a reported measurement. They give a general idea of how accurate a measurement is, or conversely, how far from the reported value the true value might be...

  • Error correction model
    Error correction model
    An error correction model is a dynamical system with the characteristics that the deviation of the current state from its long-run relationship will be fed into its short-run dynamics....

  • Error function
    Error function
    In mathematics, the error function is a special function of sigmoid shape which occurs in probability, statistics and partial differential equations...

  • Errors and residuals in statistics
    Errors and residuals in statistics
    In statistics and optimization, statistical errors and residuals are two closely related and easily confused measures of the deviation of a sample from its "theoretical value"...

  • Errors-in-variables models
    Errors-in-variables models
    In statistics and econometrics, errors-in-variables models or measurement errors models are regression models that account for measurement errors in the independent variables...

  • An Essay towards solving a Problem in the Doctrine of Chances
    An Essay towards solving a Problem in the Doctrine of Chances
    An Essay towards solving a Problem in the Doctrine of Chances is a work on the mathematical theory of probability by the Reverend Thomas Bayes, published in 1763, two years after its author's death. It included a statement of a special case of what is now called Bayes' theorem. In 18th-century...

  • Estimating equations
    Estimating equations
    In statistics, the method of estimating equations is a way of specifying how the parameters of a statistical model should be estimated. This can be thought of as a generalisation of many classical methods --- the method of moments, least squares, and maximum likelihood --- as well as some recent...

  • Estimation
    Estimation
    Estimation is the calculated approximation of a result which is usable even if input data may be incomplete or uncertain.In statistics,*estimation theory and estimator, for topics involving inferences about probability distributions...

  • Estimation theory
    Estimation theory
    Estimation theory is a branch of statistics and signal processing that deals with estimating the values of parameters based on measured/empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the...

  • Estimation of covariance matrices
    Estimation of covariance matrices
    In statistics, sometimes the covariance matrix of a multivariate random variable is not known but has to be estimated. Estimation of covariance matrices then deals with the question of how to approximate the actual covariance matrix on the basis of a sample from the multivariate distribution...

  • Estimation of signal parameters via rotational invariance techniques
  • Estimator
    Estimator
    In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule and its result are distinguished....

  • Etemadi's inequality
    Etemadi's inequality
    In probability theory, Etemadi's inequality is a so-called "maximal inequality", an inequality that gives a bound on the probability that the partial sums of a finite collection of independent random variables exceed some specified bound...

  • Ethical problems using children in clinical trials
    Ethical problems using children in clinical trials
    In health care, a clinical trial is a comparison test of a medication or other medical treatment, versus a placebo, other medications or devices, or the standard medical treatment for a patient's condition...

  • Event (probability theory)
    Event (probability theory)
    In probability theory, an event is a set of outcomes to which a probability is assigned. Typically, when the sample space is finite, any subset of the sample space is an event...

  • Event study
    Event study
    An event study is a statistical method to assess the impact of an event on the value of a firm. For example, the announcement of a merger between two business entities can be analyzed to see whether investors believe the merger will create or destroy value...

  • Evidence under Bayes theorem
    Evidence under Bayes theorem
    Bayes' theorem provides a way of updating the probability of an event in the light of new information. In the evidence law context, for example, it could be used as a way of updating the probability that a genetic sample found at the scene of the crime came from the defendant in light of a genetic...

  • Evolutionary data mining
    Evolutionary data mining
    Evolutionary data mining, or genetic data mining is an umbrella term for any data mining using evolutionary algorithms. While it can be used for mining data from DNA sequences, it is not limited to biological contexts and can be used in any classification-based prediction scenario, which helps...

  • Ewens's sampling formula
    Ewens's sampling formula
    In population genetics, Ewens' sampling formula describes the probabilities associated with counts of how many different alleles are observed a given number of times in a sample...

  • EWMA chart
    EWMA chart
    In statistical quality control, the EWMA chart is a type of control chart used to monitor either variables or attributes-type data using the monitored business or industrial process's entire history of output...

  • Exact statistics
    Exact statistics
    Exact statistics, such as that described in exact test, is a branch of statistics that was developed to provide more accurate results pertaining to statistical testing and interval estimation by eliminating procedures based on asymptotic and approximate statistical methods...

  • Exact test
    Exact test
    In statistics, an exact test is a test where all assumptions upon which the derivation of the distribution of the test statistic is based are met, as opposed to an approximate test, in which the approximation may be made as close as desired by making the sample size big enough...

  • Examples of Markov chains
    Examples of Markov chains
    A game of snakes and ladders or any other game whose moves are determined entirely by dice is a Markov chain, indeed, an absorbing Markov chain. This is in contrast to card games such as blackjack, where the cards represent a 'memory' of the past moves. To see the...

  • Excess risk
    Excess risk
    In statistics, excess risk is a measure of the association between a specified risk factor and a specified outcome...

  • Exchange paradox
  • Exchangeable random variables
  • Expander walk sampling
    Expander walk sampling
    In the mathematical discipline of graph theory, the expander walk sampling theorem states that sampling vertices in an expander graph by doing a random walk is almost as good as sampling the vertices independently from a uniform distribution....

  • Expectation-maximization algorithm
    Expectation-maximization algorithm
    In statistics, an expectation–maximization algorithm is an iterative method for finding maximum likelihood or maximum a posteriori estimates of parameters in statistical models, where the model depends on unobserved latent variables...
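
    As a concrete illustration of the iteration described above, here is a minimal sketch (assuming NumPy) that fits a two-component one-dimensional Gaussian mixture by EM, treating the component labels as the unobserved latent variables; the function name, data and starting values are illustrative, not taken from the source.

        import numpy as np

        def em_gaussian_mixture(x, iters=50):
            # Illustrative starting values for weights, means and variances.
            w = np.array([0.5, 0.5])
            mu = np.array([x.min(), x.max()], dtype=float)
            var = np.array([x.var(), x.var()])
            for _ in range(iters):
                # E-step: responsibility of each component for each observation.
                dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
                resp = w * dens
                resp /= resp.sum(axis=1, keepdims=True)
                # M-step: re-estimate parameters from the responsibility-weighted data.
                nk = resp.sum(axis=0)
                w = nk / len(x)
                mu = (resp * x[:, None]).sum(axis=0) / nk
                var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
            return w, mu, var

        rng = np.random.default_rng(0)
        data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])
        print(em_gaussian_mixture(data))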

  • Expectation propagation
    Expectation propagation
    Expectation propagation is a technique in Bayesian machine learning, developed by Thomas Minka.EP finds approximations to a probability distribution. It uses an iterative approach that leverages the factorization structure of the target distribution. It differs from other Bayesian approximation...

  • Expected utility hypothesis
    Expected utility hypothesis
    In economics, game theory, and decision theory the expected utility hypothesis is a theory of utility in which "betting preferences" of people with regard to uncertain outcomes are represented by a function of the payouts, the probabilities of occurrence, risk aversion, and the different utility...

  • Expected value
    Expected value
    In probability theory, the expected value of a random variable is the weighted average of all possible values that this random variable can take on...
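
    As a one-line illustration of the weighted average described above, using a fair six-sided die (the values and probabilities below are just an example):

        # E[X] = sum_i x_i * P(X = x_i) for a fair six-sided die
        values = [1, 2, 3, 4, 5, 6]
        probs = [1 / 6] * 6
        expected = sum(x * p for x, p in zip(values, probs))
        print(expected)  # 3.5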

  • Expected value of sample information
    Expected value of sample information
    In decision theory, the expected value of sample information is the expected increase in utility that you could obtain from gaining access to a sample of additional observations before making a decision. The additional information obtained from the sample may allow you to make a more informed,...

  • Experiment
    Experiment
    An experiment is a methodical procedure carried out with the goal of verifying, falsifying, or establishing the validity of a hypothesis. Experiments vary greatly in their goal and scale, but always rely on repeatable procedure and logical analysis of the results...

  • Experimental design diagram
    Experimental Design Diagram
    Experimental Design Diagram is a diagram used by scientists to design an experiment. This diagram helps to identify the essential components of an experiment...

  • Experimental event rate
    Experimental event rate
    In epidemiology and biostatistics, the experimental event rate is a measure of how often a particular statistical event occurs within the experimental group of an experiment...

  • Experimental research design
  • Experimental uncertainty analysis
    Experimental uncertainty analysis
    The purpose of this introductory article is to discuss the experimental uncertainty analysis of a derived quantity, based on the uncertainties in the experimentally measured quantities that are used in some form of mathematical relationship to calculate that derived quantity...

  • Experimental techniques — redirects to Experimental research design
  • Experimenter's bias
    Experimenter's bias
    In experimental science, experimenter's bias is subjective bias towards a result expected by the human experimenter. David Sackett, in a useful review of biases in clinical studies, states that biases can occur in any one of seven stages of research:...

  • Experimentwise error rate
    Experimentwise error rate
    In statistics, during multiple comparisons testing, experimentwise error rate is the probability of at least one false rejection of the null hypothesis over an entire experiment. The α that is assigned applies to all of the hypothesis tests as a whole, not individually as in the comparisonwise...

  • Explained sum of squares
    Explained sum of squares
    In statistics, the explained sum of squares is a quantity used in describing how well a model, often a regression model, represents the data being modelled...

  • Explained variation
    Explained variation
    In statistics, explained variation or explained randomness measures the proportion to which a mathematical model accounts for the variation of a given data set...

  • Explanatory variable
  • Exploratory data analysis
    Exploratory data analysis
    In statistics, exploratory data analysis is an approach to analysing data sets to summarize their main characteristics in easy-to-understand form, often with visual graphs, without using a statistical model or having formulated a hypothesis...

  • Exponential dispersion model
    Exponential dispersion model
    Exponential dispersion models are statistical models in which the probability distribution is of a special form. This class of models represents a generalisation of the exponential family of models which themselves play an important role in statistical theory because they have a special structure...

  • Exponential distribution
    Exponential distribution
    In probability theory and statistics, the exponential distribution is a family of continuous probability distributions. It describes the time between events in a Poisson process, i.e...

  • Exponential family
    Exponential family
    In probability and statistics, an exponential family is an important class of probability distributions sharing a certain form, specified below. This special form is chosen for mathematical convenience, on account of some useful algebraic properties, as well as for generality, as exponential...

  • Exponential-logarithmic distribution
  • Exponential power distribution — redirects to Generalized normal distribution
  • Exponential random numbers — redirect to subsection of Exponential distribution
    Exponential distribution
    In probability theory and statistics, the exponential distribution is a family of continuous probability distributions. It describes the time between events in a Poisson process, i.e...

  • Exponential smoothing
    Exponential smoothing
    Exponential smoothing is a technique that can be applied to time series data, either to produce smoothed data for presentation, or to make forecasts. The time series data themselves are a sequence of observations. The observed phenomenon may be an essentially random process, or it may be an...
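
    A minimal sketch of simple (single) exponential smoothing, assuming NumPy; the recursion s_t = α·x_t + (1 − α)·s_{t−1} is standard, while the series, the smoothing factor and the function name are illustrative:

        import numpy as np

        def exponential_smoothing(series, alpha=0.3):
            # s_t = alpha * x_t + (1 - alpha) * s_{t-1}, initialised at the first observation.
            smoothed = [series[0]]
            for x in series[1:]:
                smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
            return np.array(smoothed)

        y = np.array([3.0, 10.0, 12.0, 13.0, 12.0, 10.0, 12.0])
        s = exponential_smoothing(y)
        print(s)        # smoothed series
        print(s[-1])    # can serve as a one-step-ahead forecast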

  • Exponentiated Weibull distribution
    Exponentiated Weibull distribution
    In statistics, the exponentiated Weibull family of probability distributions was introduced by Mudholkar and Srivastava as an extension of the Weibull family obtained by adding a second shape parameter....

  • Exposure variable
  • Extended Kalman filter
    Extended Kalman filter
    In estimation theory, the extended Kalman filter is the nonlinear version of the Kalman filter which linearizes about the current mean and covariance...

  • Extended negative binomial distribution
    Extended negative binomial distribution
    In probability and statistics the extended negative binomial distribution is a discrete probability distribution extending the negative binomial distribution. It is a truncated version of the negative binomial distribution for which estimation methods have been studied. In the context of actuarial...

  • Extensions of Fisher's method
    Extensions of Fisher's method
    In statistics, extensions of Fisher's method are a group of approaches that allow approximately valid statistical inferences to be made when the assumptions required for the direct application of Fisher's method are not valid...

  • External validity
    External validity
    External validity is the validity of generalized inferences in scientific studies, usually based on experiments as experimental validity....

  • Extrapolation domain analysis
    Extrapolation domain analysis
    Extrapolation domain analysis is a methodology for identifying geographical areas that seem suitable for adoption of innovative ecosystem management practices on the basis of sites exhibiting similarity in conditions such as climatic, land use and socio-economic indicators...

  • Extreme value theory
    Extreme value theory
    Extreme value theory is a branch of statistics dealing with the extreme deviations from the median of probability distributions. The general theory sets out to assess the type of probability distributions generated by processes...

  • Extremum estimator
    Extremum estimator
    In statistics and econometrics, extremum estimators is a wide class of estimators for parametric models that are calculated through maximization of a certain objective function, which depends on the data...


F

  • F-distribution
  • F-divergence
    F-divergence
    In probability theory, an ƒ-divergence is a function D_f that measures the difference between two probability distributions P and Q...

  • F-statistics
    F-statistics
    In population genetics, F-statistics describe the level of heterozygosity in a population; more specifically the degree of a reduction in heterozygosity when compared to Hardy–Weinberg expectation...

     – population genetics
  • F-test
    F-test
    An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis.It is most often used when comparing statistical models that have been fit to a data set, in order to identify the model that best fits the population from which the data were sampled. ...

  • F-test of equality of variances
    F-test of equality of variances
    In statistics, an F-test for the null hypothesis that two normal populations have the same variance is sometimes used, although it needs to be used with caution as it can be sensitive to the assumption that the variables have this distribution....

  • F1 score
    F1 Score
    In statistics, the F1 score is a measure of a test's accuracy. It considers both the precision p and the recall r of the test to compute the score: p is the number of correct results divided by the number of all returned results and r is the number of correct results divided by the number of...
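
    The score is the harmonic mean of precision and recall, F1 = 2pr / (p + r); a minimal sketch from binary labels (the label vectors below are illustrative):

        def f1_score(y_true, y_pred):
            tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
            fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
            fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
            precision = tp / (tp + fp)
            recall = tp / (tp + fn)
            return 2 * precision * recall / (precision + recall)

        print(f1_score([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))  # 0.75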

  • Factor analysis
    Factor analysis
    Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved, uncorrelated variables called factors. In other words, it is possible, for example, that variations in three or four observed variables...

  • Factor regression model
  • Factor graph
    Factor graph
    In probability theory and its applications, a factor graph is a particular type of graphical model, with applications in Bayesian inference, that enables efficient computation of marginal distributions through the sum-product algorithm...

  • Factorial code
    Factorial code
    Most real world data sets consist of data vectors whose individual components are not statistically independent, that is, they are redundant in the statistical sense. Then it is desirable to create a factorial code of the data, i...

  • Factorial experiment
    Factorial experiment
    In statistics, a full factorial experiment is an experiment whose design consists of two or more factors, each with discrete possible values or "levels", and whose experimental units take on all possible combinations of these levels across all such factors. A full factorial design may also be...

  • Factorial moment
  • Factorial moment generating function
  • Failure rate
    Failure rate
    Failure rate is the frequency with which an engineered system or component fails, expressed for example in failures per hour. It is often denoted by the Greek letter λ and is important in reliability engineering....

  • Fair coin
    Fair coin
    In probability theory and statistics, a sequence of independent Bernoulli trials with probability 1/2 of success on each trial is metaphorically called a fair coin. One for which the probability is not 1/2 is called a biased or unfair coin...

  • Falconer's formula
    Falconer's formula
    Falconer's formula is used in twin studies to determine the genetic heritability of a trait based on the difference between twin correlations. The formula is hb² = 2(rmz − rdz), where hb² is the broad sense heritability, rmz is the identical twin correlation, and rdz is the fraternal twin correlation...
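
    A one-line worked illustration of the formula, with hypothetical twin correlations:

        # Broad-sense heritability via Falconer's formula: h2 = 2 * (r_mz - r_dz)
        r_mz, r_dz = 0.70, 0.45   # hypothetical identical- and fraternal-twin correlations
        h2 = 2 * (r_mz - r_dz)
        print(h2)  # 0.5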

  • False discovery rate
    False discovery rate
    False discovery rate control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. In a list of rejected hypotheses, FDR controls the expected proportion of incorrectly rejected null hypotheses...

  • False negative
    Type I and type II errors
    In statistical test theory the notion of statistical error is an integral part of hypothesis testing. The test requires an unambiguous statement of a null hypothesis, which usually corresponds to a default "state of nature", for example "this person is healthy", "this accused is not guilty" or...

  • False positive
    Type I and type II errors
    In statistical test theory the notion of statistical error is an integral part of hypothesis testing. The test requires an unambiguous statement of a null hypothesis, which usually corresponds to a default "state of nature", for example "this person is healthy", "this accused is not guilty" or...

  • False positive rate
    False positive rate
    When performing multiple comparisons in a statistical analysis, the false positive rate is the probability of falsely rejecting the null hypothesis for a particular test among all the tests performed...

  • False positive paradox
    False positive paradox
    The false positive paradox is a statistical result where false positive tests are more probable than true positive tests, occurring when the overall population has a low incidence of a condition and the incidence rate is lower than the false positive rate...

  • Familywise error rate
    Familywise error rate
    In statistics, familywise error rate is the probability of making one or more false discoveries, or type I errors, among all the hypotheses when performing multiple pairwise tests...

  • Fan chart (time series)
    Fan chart (time series)
    In time series analysis, a fan chart is a chart that joins a simple line chart for observed past data with ranges for possible values of future data, together with a line showing a central estimate or most likely value for the future outcomes...

  • Fano factor
  • Fast Fourier transform
    Fast Fourier transform
    A fast Fourier transform is an efficient algorithm to compute the discrete Fourier transform and its inverse. "The FFT has been called the most important numerical algorithm of our lifetime ." There are many distinct FFT algorithms involving a wide range of mathematics, from simple...

  • Fast Kalman filter
    Fast Kalman filter
    The fast Kalman filter, devised by Antti Lange, is an extension of the Helmert-Wolf blocking method from geodesy to real-time applications of Kalman filtering such as satellite imaging of the Earth...

  • FastICA
    FastICA
    FastICA is an efficient and popular algorithm for independent component analysis invented by Aapo Hyvärinen at Helsinki University of Technology. The algorithm is based on a fixed-point iteration scheme maximizing non-Gaussianity as a measure of statistical independence...

     – fast independent component analysis
  • Fat tail
    Fat tail
    A fat-tailed distribution is a probability distribution that, like the heavy-tailed distributions, exhibits extremely large skewness or kurtosis. This comparison is often made relative to the ubiquitous normal distribution, which itself is an example of an...

  • Feasible generalized least squares
  • Feature extraction
    Feature extraction
    In pattern recognition and in image processing, feature extraction is a special form of dimensionality reduction. When the input data to an algorithm are too large to be processed and are suspected to be highly redundant, the input data will be transformed into a reduced representation...

  • Feller process
    Feller process
    In probability theory relating to stochastic processes, a Feller process is a particular kind of Markov process. Let X be a locally compact topological space with a countable base...

  • Feller's coin-tossing constants
    Feller's coin-tossing constants
    Feller's coin-tossing constants are a set of numerical constants which describe asymptotic probabilities that in n independent tosses of a fair coin, no run of k consecutive heads appears....

  • Feller-continuous process
    Feller-continuous process
    In mathematics, a Feller-continuous process is a continuous-time stochastic process for which the expected value of suitable statistics of the process at a given time in the future depends continuously on the initial condition of the process...

  • Felsenstein's tree peeling algorithm
    Felsenstein's Tree Peeling Algorithm
    In statistical genetics, Felsenstein's tree-pruning algorithm, due to Joseph Felsenstein, is an algorithm for computing the likelihood of an evolutionary tree from nucleic acid sequence data...

     — statistical genetics
  • Fides (reliability)
    Fides (reliability)
    Fides is a guide allowing estimated reliability calculation for electronic components and systems. The reliability prediction is generally expressed in FIT or MTBF...

  • Fiducial inference
    Fiducial inference
    Fiducial inference is one of a number of different types of statistical inference. These are rules, intended for general application, by which conclusions can be drawn from samples of data. In modern statistical practice, attempts to work with fiducial inference have fallen out of fashion in...

  • Field experiment
    Field experiment
    A field experiment applies the scientific method to experimentally examine an intervention in the real world rather than in the laboratory...

  • Fieller's theorem
    Fieller's theorem
    In statistics, Fieller's theorem allows the calculation of a confidence interval for the ratio of two means...

  • File drawer problem
  • Filtering problem (stochastic processes)
    Filtering problem (stochastic processes)
    In the theory of stochastic processes, the filtering problem is a mathematical model for a number of filtering problems in signal processing and the like. The general idea is to form some kind of "best estimate" for the true value of some system, given only some observations of that system...

  • Financial econometrics
    Financial econometrics
    People working in the finance industry often use econometric techniques in a range of activities. For example in support of portfolio management, risk management and in the analysis of securities...

  • Financial models with long-tailed distributions and volatility clustering
  • Finite-dimensional distribution
    Finite-dimensional distribution
    In mathematics, finite-dimensional distributions are a tool in the study of measures and stochastic processes. A lot of information can be gained by studying the "projection" of a measure onto a finite-dimensional vector space...

  • First-hitting-time model
    First-hitting-time model
    In statistics, first-hitting-time models are a sub-class of survival models. The first hitting time, also called first passage time, of a set A with respect to an instance of a stochastic process is the time until the stochastic process first enters A....

  • First-in-man study
    First-in-man study
    A first-in-man study is a clinical trial where a medical procedure, previously developed and assessed through in vitro or animal testing, or through mathematical modelling is tested on human subjects for the first time....

  • Fishburn–Shepp inequality
    Fishburn–Shepp inequality
    In combinatorial mathematics, the Fishburn–Shepp inequality is an inequality for the number of extensions of partial orders to linear orders. It states that if x, y, and z are incomparable elements of a finite poset, then...

  • Fisher consistency
    Fisher consistency
    In statistics, Fisher consistency, named after Ronald Fisher, is a desirable property of an estimator asserting that if the estimator were calculated using the entire population rather than a sample, the true value of the estimated parameter would be obtained...

  • Fisher information
    Fisher information
    In mathematical statistics and information theory, the Fisher information is the variance of the score. In Bayesian statistics, the asymptotic distribution of the posterior mode depends on the Fisher information and not on the prior...

  • Fisher information metric
    Fisher information metric
    In information geometry, the Fisher information metric is a particular Riemannian metric which can be defined on a smooth statistical manifold, i.e., a smooth manifold whose points are probability measures defined on a common probability space....

  • Fisher kernel
    Fisher kernel
    In statistical classification, the Fisher kernel, named in honour of Sir Ronald Fisher, is a function that measures the similarity of two objects on the basis of sets of measurements for each object and a statistical model...

  • Fisher transformation
    Fisher transformation
    In statistics, hypotheses about the value of the population correlation coefficient ρ between variables X and Y can be tested using the Fisher transformation applied to the sample correlation coefficient r...
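
    A minimal sketch of the usual test, assuming NumPy and SciPy: the sample correlation r is mapped to z = arctanh(r), which is approximately normal with standard error 1/√(n − 3). The values of r, ρ0 and n below are illustrative:

        import numpy as np
        from scipy.stats import norm

        r, rho0, n = 0.45, 0.30, 50        # sample correlation, null value, sample size
        z = np.arctanh(r)                  # Fisher transformation of r
        z0 = np.arctanh(rho0)              # transformed null value
        se = 1.0 / np.sqrt(n - 3)          # approximate standard error of z
        stat = (z - z0) / se
        p_value = 2 * norm.sf(abs(stat))   # two-sided p-value
        print(stat, p_value)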

  • Fisher's exact test
    Fisher's exact test
    Fisher's exact test is a statistical significance test used in the analysis of contingency tables where sample sizes are small. It is named after its inventor, R. A...

  • Fisher's inequality
    Fisher's inequality
    In combinatorial mathematics, Fisher's inequality, named after Ronald Fisher, is a necessary condition for the existence of a balanced incomplete block design satisfying certain prescribed conditions....

  • Fisher's linear discriminator
  • Fisher's method
    Fisher's Method
    In statistics, Fisher's method, also known as Fisher's combined probability test, is a technique for data fusion or "meta-analysis". It was developed by and named for Ronald Fisher...

  • Fisher's noncentral hypergeometric distribution
    Fisher's noncentral hypergeometric distribution
    In probability theory and statistics, Fisher's noncentral hypergeometric distribution is a generalization of the hypergeometric distribution where sampling probabilities are modified by weight factors...

  • Fisher's z-distribution
  • Fisher-Tippett distribution — redirects to Generalized extreme value distribution
  • Fisher–Tippett–Gnedenko theorem
  • Five-number summary
    Five-number summary
    The five-number summary is a descriptive statistic that provides information about a set of observations. It consists of the five most important sample percentiles: the sample minimum, the lower quartile or first quartile...
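
    A minimal sketch using NumPy percentiles (the data are illustrative; note that quartile conventions vary slightly between textbooks and software):

        import numpy as np

        data = np.array([7, 15, 36, 39, 40, 41])
        summary = (data.min(),
                   np.percentile(data, 25),   # lower quartile
                   np.median(data),
                   np.percentile(data, 75),   # upper quartile
                   data.max())
        print(summary)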

  • Fixed effects estimator
    Fixed effects estimator
    In econometrics and statistics, a fixed effects model is a statistical model that represents the observed quantities in terms of explanatory variables that are treated as if the quantities were non-random. This is in contrast to random effects models and mixed models in which either all or some of...

     and Fixed effects estimation — redirect to Fixed effects model
  • FLAME clustering
    FLAME clustering
    Fuzzy clustering by Local Approximation of MEmberships is a data clustering algorithm that defines clusters in the dense parts of a dataset and performs cluster assignment solely based on the neighborhood relationships among objects...

  • Fleiss' kappa
    Fleiss' kappa
    Fleiss' kappa is a statistical measure for assessing the reliability of agreement between a fixed number of raters when assigning categorical ratings to a number of items or classifying items. This contrasts with other kappas such as Cohen's kappa, which only work when assessing the agreement...

  • Fleming-Viot process
    Fleming-Viot process
    In probability theory, a Fleming–Viot process is a member of a particular subset of probability-measure valued Markov processes on compact metric spaces, as defined in the 1979 paper by Wendell Helms Fleming and Michel Viot...

  • Flood risk assessment
    Flood risk assessment
    A flood risk assessment is an assessment of the risk of flooding, particularly in relation to residential, commercial and industrial land use...

  • Floor effect
    Floor effect
    In statistics, the term floor effect refers to the situation in which data cannot take on a value lower than some particular number, called the floor. An example of this is when an IQ test is given to young children who have either been given training or been given no training...

  • FNN algorithm
    FNN algorithm
    The false nearest neighbor algorithm is an algorithm for estimating the embedding dimension....

      (false nearest neighbour algorithm)
  • Focused information criterion
    Focused information criterion
    In statistics, the focused information criterion is a method for selecting the most appropriate model among a set of competitors for a given data set...

  • Fokker–Planck equation
  • Folded normal distribution
    Folded Normal Distribution
    The folded normal distribution is a probability distribution related to the normal distribution. Given a normally distributed random variable X with mean μ and variance σ², the random variable Y = |X| has a folded normal distribution. Such a case may be encountered if only the magnitude of some...

  • Forecast bias
    Forecast bias
    A forecast bias occurs when there are consistent differences between actual outcomes and previously generated forecasts of those quantities; that is, forecasts may have a general tendency to be too high or too low...

  • Forecast error
    Forecast error
    In statistics, a forecast error is the difference between the actual or real and the predicted or forecast value of a time series or any other phenomenon of interest....

  • Forecast skill
    Forecast skill
    Skill in forecasting is a scaled representation of forecast error that relates the forecast accuracy of a particular forecast model to some reference model....

  • Forecasting
    Forecasting
    Forecasting is the process of making statements about events whose actual outcomes have not yet been observed. A commonplace example might be estimation for some variable of interest at some specified future date. Prediction is a similar, but more general term...

  • Forest plot
    Forest plot
    A forest plot is a graphical display designed to illustrate the relative strength of treatment effects in multiple quantitative scientific studies addressing the same question. It was developed for use in medical research as a means of graphically representing a meta-analysis of the results of...

  • Fork-join queue
    Fork-join queue
    In queueing theory, a discipline within the mathematical theory of probability, a fork-join queue is a queue where incoming jobs are split on arrival for service by numerous servers and joined before departure. The model is often used for parallel computations or systems where products need to be...

  • Formation matrix
    Formation matrix
    In statistics and information theory, the expected formation matrix of a likelihood function L is the matrix inverse of the Fisher information matrix of L, while the observed formation matrix of L is the inverse of the observed information matrix of L.Currently, no notation for dealing with...

  • Forward measure
    Forward measure
    In finance, a T-forward measure is a pricing measure absolutely continuous with respect to a risk-neutral measure but rather than using the money market as numeraire, it uses a bond with maturity T...

  • Foster's theorem
    Foster's theorem
    In probability theory, Foster's theorem, named after F. G. Foster, is used to draw conclusions about the positive recurrence of Markov chains with countable state spaces...

  • Foundations of statistics
    Foundations of statistics
    Foundations of statistics is the usual name for the epistemological debate in statistics over how one should conduct inductive inference from data...

  • Founders of statistics
    Founders of statistics
    Statistics is the theory and application of mathematics to the scientific method including hypothesis generation, experimental design, sampling, data collection, data summarization, estimation, prediction and inference from those results to the population from which the experimental sample was drawn...

  • Fourier analysis
  • Fraction of variance unexplained
    Fraction of variance unexplained
    In statistics, the fraction of variance unexplained in the context of a regression task is the fraction of variance of the regressand Y which cannot be explained, i.e., which is not correctly predicted, by the explanatory variables X....

  • Fractional Brownian motion
  • Fractional factorial design
  • Fréchet distribution
  • Fréchet mean
    Fréchet mean
    The Fréchet mean is the point x that minimizes the Fréchet function, in cases where such a unique minimizer exists. The value at a point p of the Fréchet function associated to a random point X on a complete metric space is the expected squared distance from p to X...

  • Free statistical software
    Free statistical software
    In this article, the word free generally means that the software can be legally obtained without paying any money. Just a few of the software packages mentioned here are also free in the sense of free speech: they are not only open source but also free software in the sense that the source code of the software...

  • Freedman's paradox
    Freedman's paradox
    In statistical analysis, Freedman's paradox, named after David Freedman, describes a problem in model selection whereby predictor variables with no explanatory power can appear artificially important. Freedman demonstrated that this is a common occurrence when the number of variables is similar to...

  • Freedman–Diaconis rule
  • Freidlin–Wentzell theorem
  • Frequency (statistics)
    Frequency (statistics)
    In statistics the frequency of an event i is the number n_i of times the event occurred in the experiment or the study. These frequencies are often graphically represented in histograms...

  • Frequency distribution
    Frequency distribution
    In statistics, a frequency distribution is an arrangement of the values that one or more variables take in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way, the table summarizes the distribution of...

  • Frequency domain
    Frequency domain
    In electronics, control systems engineering, and statistics, frequency domain is a term used to describe the domain for analysis of mathematical functions or signals with respect to frequency, rather than time....

  • Frequency probability
    Frequency probability
    Frequency probability is the interpretation of probability that defines an event's probability as the limit of its relative frequency in a large number of trials. The development of the frequentist account was motivated by the problems and paradoxes of the previously dominant viewpoint, the...

  • Frequentist inference
    Frequentist inference
    Frequentist inference is one of a number of possible ways of formulating generally applicable schemes for making statistical inferences: that is, for drawing conclusions from statistical samples. An alternative name is frequentist statistics...

  • Friedman test
    Friedman test
    The Friedman test is a non-parametric statistical test developed by the U.S. economist Milton Friedman. Similar to the parametric repeated measures ANOVA, it is used to detect differences in treatments across multiple test attempts. The procedure involves ranking each row together, then...

  • Friendship paradox
    Friendship paradox
    The friendship paradox is the phenomenon first observed by the sociologist Scott L. Feld in 1991 that most people have fewer friends than their friends have, on average. It can be explained as a form of sampling bias in which people with greater numbers of friends have an increased likelihood of...

  • Frisch–Waugh–Lovell theorem
  • Fully crossed design
  • Function approximation
    Function approximation
    The need for function approximations arises in many branches of applied mathematics, and computer science in particular. In general, a function approximation problem asks us to select a function among a well-defined class that closely matches a target function in a task-specific way. One can...

  • Functional data analysis
    Functional data analysis
    Functional data analysis is a branch of statistics that analyzes data providing information about curves, surfaces or anything else varying over a continuum...

  • Funnel plot
    Funnel plot
    A funnel plot is a useful graph designed to check the existence of publication bias in systematic reviews and meta-analyses. It assumes that the largest studies will be near the average, and small studies will be spread on both sides of the average...

  • Fuzzy logic
    Fuzzy logic
    Fuzzy logic is a form of many-valued logic; it deals with reasoning that is approximate rather than fixed and exact. In contrast with traditional logic theory, where binary sets have two-valued logic: true or false, fuzzy logic variables may have a truth value that ranges in degree between 0 and 1...

  • Fuzzy measure theory
    Fuzzy measure theory
    Fuzzy measure theory considers a number of special classes of measures, each of which is characterized by a special property. Some of the measures used in this theory are plausibility and belief measures, fuzzy set membership function and the classical probability measures...

  • FWL theorem
    FWL theorem
    In econometrics, the Frisch–Waugh–Lovell theorem is named after the econometricians Ragnar Frisch, Frederick V. Waugh, and Michael C. Lovell.The Frisch–Waugh–Lovell theorem states that if the regression we are concerned with is:...

     — relating regression and projection

G

  • G-network
  • G-test
    G-test
    In statistics, G-tests are likelihood-ratio or maximum likelihood statistical significance tests that are increasingly being used in situations where chi-squared tests were previously recommended....

  • Galbraith plot
    Galbraith plot
    In statistics, a Galbraith plot is one way of displaying several estimates of the same quantity that have different standard errors...

  • Gallagher Index
    Gallagher Index
    The Gallagher Index is used to measure the disproportionality of an electoral outcome, that is, the difference between the percentage of votes received and the percentage of seats a party gets in the resulting legislature. This is especially useful for comparing proportionality across electoral...

  • Galton–Watson process
  • Galton's problem
    Galton's problem
    Galton’s problem, named after Sir Francis Galton, is the problem of drawing inferences from cross-cultural data, due to the statistical phenomenon now called autocorrelation. The problem is now recognized as a general one that applies to all nonexperimental studies and to experimental design as well...

  • Gambler's fallacy
    Gambler's fallacy
    The Gambler's fallacy, also known as the Monte Carlo fallacy, and also referred to as the fallacy of the maturity of chances, is the belief that if deviations from expected behaviour are observed in repeated independent trials of some random process, future deviations in the opposite direction are...

  • Gambler's ruin
    Gambler's ruin
    The term gambler's ruin is used for a number of related statistical ideas. The original meaning is that a gambler who raises his bet to a fixed fraction of bankroll when he wins, but does not reduce it when he loses, will eventually go broke, even if he has a positive expected value on each bet...

  • Gambling and information theory
    Gambling and information theory
    Statistical inference might be thought of as gambling theory applied to the world around. The myriad applications for logarithmic information measures tell us precisely how to take the best guess in the face of partial information. In that sense, information theory might be considered a formal...

  • Game of chance
    Game of chance
    A game of chance is a game whose outcome is strongly influenced by some randomizing device, and upon which contestants may or may not wager money or anything of monetary value...

  • Gamma distribution
  • Gamma test (statistics)
    Gamma test (statistics)
    In statistics, a gamma test tests the strength of association of the cross tabulated data when both variables are measured at the ordinal level. It makes no adjustment for either table size or ties. Values range from −1 to +1...

  • Gamma process
  • Gamma variate
  • GAUSS (software)
    GAUSS (software)
    GAUSS is a matrix programming language for mathematics and statistics, developed and marketed by Aptech Systems. Its primary purpose is the solution of numerical problems in statistics, econometrics, time-series, optimization and 2D- and 3D-visualization...

  • Gauss's inequality
    Gauss's inequality
    In probability theory, Gauss's inequality gives an upper bound on the probability that a unimodal random variable lies more than any given distance from its mode....

  • Gauss–Kuzmin distribution
  • Gauss–Markov process
    Gauss–Markov process
    Gauss–Markov stochastic processes are stochastic processes that satisfy the requirements for both Gaussian processes and Markov processes. The stationary Gauss–Markov process is a very special case because it is unique, except for some trivial exceptions...

  • Gauss–Markov theorem
    Gauss–Markov theorem
    In statistics, the Gauss–Markov theorem, named after Carl Friedrich Gauss and Andrey Markov, states that in a linear regression model in which the errors have expectation zero and are uncorrelated and have equal variances, the best linear unbiased estimator of the coefficients is given by the...

  • Gauss–Newton algorithm
  • Gaussian function
  • Gaussian isoperimetric inequality
  • Gaussian measure
    Gaussian measure
    In mathematics, Gaussian measure is a Borel measure on finite-dimensional Euclidean space Rn, closely related to the normal distribution in statistics. There is also a generalization to infinite-dimensional spaces...

  • Gaussian noise
    Gaussian noise
    Gaussian noise is statistical noise that has its probability density function equal to that of the normal distribution, which is also known as the Gaussian distribution. In other words, the values that the noise can take on are Gaussian-distributed. A special case is white Gaussian noise, in which...

  • Gaussian process
    Gaussian process
    In probability theory and statistics, a Gaussian process is a stochastic process whose realisations consist of random values associated with every point in a range of times such that each such random variable has a normal distribution...

  • Gaussian process emulator
    Gaussian process emulator
    In statistics, Gaussian process emulator is one name for a general type of statistical model that has been used in contexts where the problem is to make maximum use of the outputs of a complicated computer-based simulation model. Each run of the simulation model is computationally expensive and...

  • Gaussian q-distribution
    Gaussian q-distribution
    In mathematical physics and probability and statistics, the Gaussian q-distribution is a family of probability distributions that includes, as limiting cases, the uniform distribution and the normal distribution...

  • Geary's C
    Geary's C
    Geary's C is a measure of spatial autocorrelation. Like autocorrelation, spatial autocorrelation means that adjacent observations of the same phenomenon are correlated. However, autocorrelation is about proximity in time. Spatial autocorrelation is about proximity in space...

  • GEH
    GEH
    The GEH Statistic is a formula used in traffic engineering, traffic forecasting, and traffic modelling to compare two sets of traffic volumes. The GEH formula gets its name from Geoffrey E. Havers, who invented it in the 1970s while working as a transport planner in London, England. Although its...

     — a statistic comparing modelled and observed counts
  • General linear model
    General linear model
    The general linear model is a statistical linear model. It may be written as Y = XB + U, where Y is a matrix with a series of multivariate measurements, X is a matrix that might be a design matrix, B is a matrix containing parameters that are usually to be estimated and U is a matrix containing errors or...
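
    A minimal sketch, assuming NumPy, of estimating B in Y = XB + U by ordinary least squares on synthetic data (all names and values below are illustrative):

        import numpy as np

        rng = np.random.default_rng(1)
        n, p, k = 100, 3, 2                              # observations, predictors, responses
        X = rng.normal(size=(n, p))                      # design matrix
        B_true = rng.normal(size=(p, k))                 # true coefficient matrix
        Y = X @ B_true + 0.1 * rng.normal(size=(n, k))   # Y = XB + U

        B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)    # least-squares estimate of B
        print(B_hat)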

  • General matrix notation of a VAR(p)
  • Generalizability theory
    Generalizability theory
    Generalizability theory, or G Theory, is a statistical framework for conceptualizing, investigating, and designing reliable observations. It is used to determine the reliability of measurements under specific conditions. It is particularly useful for assessing the reliability of performance...

  • Generalized additive model
    Generalized additive model
    In statistics, the generalized additive model is a statistical model developed by Trevor Hastie and Rob Tibshirani for blending properties of generalized linear models with additive models....

  • Generalized additive model for location, scale and shape
    Generalized additive model for location, scale and shape
    In statistics, the generalized additive model for location, scale and shape is a class of statistical model that provides extended capabilities compared to the simpler generalized linear models and generalized additive models. These simpler models allow the typical values of a quantity being modelled...

  • Generalized canonical correlation
    Generalized canonical correlation
    In statistics, the generalized canonical correlation analysis , is a way of making sense of cross-correlation matrices between the sets of random variables when there are more than two sets. While a conventional CCA generalizes Principal component analysis to two sets of random variables, a gCCA ...

  • Generalized chi-squared distribution
  • Generalized Dirichlet distribution
    Generalized Dirichlet distribution
    In statistics, the generalized Dirichlet distribution is a generalization of the Dirichlet distribution with a more general covariance structure and twice the number of parameters...

  • Generalized entropy index
    Generalized entropy index
    The generalized entropy index is a general formula for measuring redundancy in data. The redundancy can be viewed as inequality, lack of diversity, non-randomness, compressibility, or segregation in the data. The primary use is for income inequality...

  • Generalized estimating equation
  • Generalized expected utility
    Generalized expected utility
    The expected utility model developed by John von Neumann and Oskar Morgenstern dominated decision theory from its formulation in 1944 until the late 1970s, not only as a prescriptive, but also as a descriptive model, despite powerful criticism from Maurice Allais and Daniel Ellsberg who showed...

  • Generalized extreme value distribution
  • Generalized gamma distribution
    Generalized gamma distribution
    The generalized gamma distribution is a continuous probability distribution with three parameters. It is a generalization of the two-parameter gamma distribution...

  • Generalized Gaussian distribution
  • Generalised hyperbolic distribution
  • Generalized inverse Gaussian distribution
  • Generalized least squares
    Generalized least squares
    In statistics, generalized least squares is a technique for estimating the unknown parameters in a linear regression model. The GLS is applied when the variances of the observations are unequal, or when there is a certain degree of correlation between the observations...

  • Generalized linear array model
    Generalized linear array model
    In statistics, the generalized linear array model is used for analyzing data sets with array structures. It is based on the generalized linear model with the design matrix written as a Kronecker product...

  • Generalized linear mixed model
    Generalized linear mixed model
    In statistics, a generalized linear mixed model is a particular type of mixed model. It is an extension to the generalized linear model in which the linear predictor contains random effects in addition to the usual fixed effects...

  • Generalized linear model
    Generalized linear model
    In statistics, the generalized linear model is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to...

  • Generalized logistic distribution
    Generalized logistic distribution
    The term generalized logistic distribution is used as the name for several different families of probability distributions. For example, Johnson et al. list four forms, which are listed below. One family described here has also been called the skew-logistic distribution...

  • Generalized method of moments
    Generalized method of moments
    In econometrics, generalized method of moments is a generic method for estimating parameters in statistical models. Usually it is applied in the context of semiparametric models, where the parameter of interest is finite-dimensional, whereas the full shape of the distribution function of the data...

  • Generalized multidimensional scaling
  • Generalized normal distribution
  • Generalized p-value
    Generalized p-value
    In statistics, a generalized p-value is an extended version of the classical p-value, which, except in a limited number of applications, provides only approximate solutions...

  • Generalized Pareto distribution
  • Generalized Procrustes analysis
    Generalized Procrustes analysis
    Generalized Procrustes analysis is a method of statistical analysis that can be used to compare the shapes of objects, or the results of surveys, interviews, or panels. It was developed for analysing the results of free-choice profiling, a survey technique which allows respondents to describe a...

  • Generalized randomized block design
    Generalized randomized block design
    In randomized statistical experiments, generalized randomized block designs are used to study the interaction between blocks and treatments...

  • Generalized Tobit
    Generalized Tobit
    A generalized Tobit is a generalization of the econometric Tobit model after James Tobin. It is also called Heckit after James Heckman. Another name is "type 2 Tobit model". Tobit models assume that a variable is truncated...

  • Generalized Wiener process
    Generalized Wiener process
    In statistics, a generalized Wiener process is a continuous time random walk with drift and random jumps at every point in time...

  • Generative model
    Generative model
    In probability and statistics, a generative model is a model for randomly generating observable data, typically given some hidden parameters. It specifies a joint probability distribution over observation and label sequences...

  • Genetic epidemiology
    Genetic epidemiology
    Genetic epidemiology is the study of the role of genetic factors in determining health and disease in families and in populations, and the interplay of such genetic factors with environmental factors...

  • GenStat
    GenStat
    GenStat is a general statistical package. Early versions were developed for large mainframe computers. Up until version 5, there was a Unix binary available, and this continues to be used by many universities and research institutions...

     – software
  • Geo-imputation
    Geo-imputation
    In data analysis involving geographical locations, geo-imputation or geographical imputation methods are steps taken to replace missing values for exact locations with approximate locations derived from associated data...

  • Geodemographic segmentation
    Geodemographic Segmentation
    In marketing, Geodemographic segmentation is a multivariate statistical classification technique for discovering whether the individuals of a population fall into different groups by making quantitative comparisons of multiple characteristics with the assumption that the differences within any...

  • Geometric Brownian motion
    Geometric Brownian motion
    A geometric Brownian motion is a continuous-time stochastic process in which the logarithm of the randomly varying quantity follows a Brownian motion, also called a Wiener process...
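
    A minimal simulation sketch, assuming NumPy: because the logarithm of the process follows a Brownian motion with drift, each step multiplies the current value by exp((μ − σ²/2)·dt + σ·√dt·Z). The parameter values and function name are illustrative:

        import numpy as np

        def simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, T=1.0, steps=252, seed=0):
            rng = np.random.default_rng(seed)
            dt = T / steps
            z = rng.standard_normal(steps)
            # Log-increments are normal with mean (mu - sigma^2/2) * dt and sd sigma * sqrt(dt).
            log_increments = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
            return s0 * np.exp(np.cumsum(log_increments))

        print(simulate_gbm()[-1])   # terminal value of one simulated path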

  • Geometric data analysis
    Geometric data analysis
    Geometric data analysis can refer to geometric aspects of image analysis, pattern analysis and shape analysis or the approach of multivariate statistics that treats arbitrary data sets as clouds of points in n-dimensional space...

  • Geometric distribution
  • Geometric median
    Geometric median
    The geometric median of a discrete set of sample points in a Euclidean space is the point minimizing the sum of distances to the sample points. This generalizes the median, which has the property of minimizing the sum of distances for one-dimensional data, and provides a central tendency in higher...

  • Geometric standard deviation
    Geometric standard deviation
    In probability theory and statistics, the geometric standard deviation describes the spread of a set of numbers whose preferred average is the geometric mean...

  • Geometric stable distribution
  • Geospatial predictive modeling
    Geospatial predictive modeling
    Geospatial predictive modeling is conceptually rooted in the principle that the occurrences of events being modeled are limited in distribution...

  • Geostatistics
    Geostatistics
    Geostatistics is a branch of statistics focusing on spatial or spatiotemporal datasets. Developed originally to predict probability distributions of ore grades for mining operations, it is currently applied in diverse disciplines including petroleum geology, hydrogeology, hydrology, meteorology,...

  • German tank problem
    German tank problem
    In the statistical theory of estimation, estimating the maximum of a uniform distribution is a common illustration of differences between estimation methods...
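
    To illustrate the "differences between estimation methods" mentioned above, the sketch below (assuming NumPy, with illustrative values) compares the naive sample maximum with the classical frequentist estimator m + m/k − 1 for the maximum of a discrete uniform distribution:

        import numpy as np

        rng = np.random.default_rng(42)
        N_true, k = 300, 5                       # true maximum and sample size (illustrative)
        sample = rng.choice(np.arange(1, N_true + 1), size=k, replace=False)

        m = sample.max()
        naive = m                                # sample maximum alone: biased low
        adjusted = m + m / k - 1                 # classical "German tank" estimator
        print(N_true, naive, adjusted)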

  • Gerschenkron effect
    Gerschenkron effect
    The Gerschenkron effect was developed by Alexander Gerschenkron, and claims that changing the base year for an index determines the growth rate of the index. This description is from the OECD website:...

  • Gibbs sampling
    Gibbs sampling
    In statistics and in statistical physics, Gibbs sampling or a Gibbs sampler is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables...
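
    A minimal sketch of the algorithm, assuming NumPy, for a standard bivariate normal with correlation ρ, where each full conditional is itself a normal distribution; the parameter values and function name are illustrative:

        import numpy as np

        def gibbs_bivariate_normal(rho=0.8, n_samples=5000, seed=0):
            rng = np.random.default_rng(seed)
            samples = np.empty((n_samples, 2))
            x, y = 0.0, 0.0
            cond_sd = np.sqrt(1 - rho ** 2)       # sd of each full conditional
            for i in range(n_samples):
                x = rng.normal(rho * y, cond_sd)  # draw x given the current y
                y = rng.normal(rho * x, cond_sd)  # draw y given the new x
                samples[i] = (x, y)
            return samples

        draws = gibbs_bivariate_normal()
        print(np.corrcoef(draws.T)[0, 1])         # should be close to rho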

  • Gillespie algorithm
    Gillespie algorithm
    In probability theory, the Gillespie algorithm generates a statistically correct trajectory of a stochastic equation. It was created by Joseph L...

  • Gini coefficient
    Gini coefficient
    The Gini coefficient is a measure of statistical dispersion developed by the Italian statistician and sociologist Corrado Gini and published in his 1912 paper "Variability and Mutability" ....
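
    A minimal sketch, assuming NumPy, that computes the Gini coefficient of a sample from its sorted cumulative shares; the income figures are illustrative:

        import numpy as np

        def gini(values):
            x = np.sort(np.asarray(values, dtype=float))
            n = len(x)
            cum = np.cumsum(x)
            # Equivalent to (mean absolute difference) / (2 * mean) for the sample.
            return (n + 1 - 2 * (cum / cum[-1]).sum()) / n

        incomes = np.array([20_000, 30_000, 30_000, 45_000, 200_000])
        print(gini(incomes))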

  • Girsanov theorem
    Girsanov theorem
    In probability theory, the Girsanov theorem describes how the dynamics of stochastic processes change when the original measure is changed to an equivalent probability measure...

  • Gittins index
    Gittins index
    The Gittins index is a measure of the reward that can be achieved by a process evolving from its present state onwards with the probability that it will be terminated in future...

  • GLIM (software)
    GLIM (software)
    GLIM is a statistical software program for fitting generalized linear models. It was developed by the Royal Statistical Society's Working Party on Statistical Computing...

     – software
  • Glivenko–Cantelli theorem
  • GLUE (uncertainty assessment)
    GLUE (uncertainty assessment)
    In hydrology, Generalized Likelihood Uncertainty Estimation is a statistical method for quantifying the uncertainty of model predictions. The method has been introduced by Beven and Binley...

  • Goldfeld–Quandt test
    Goldfeld–Quandt test
    In statistics, the Goldfeld–Quandt test checks for homoscedasticity in regression analyses. It does this by dividing a dataset into two parts or groups, and hence the test is sometimes called a two-group test. The Goldfeld–Quandt test is one of two tests proposed in a 1965 paper by Stephen...

  • Gompertz function
  • Gompertz–Makeham law of mortality
  • Good–Turing frequency estimation
  • Goodhart's law
    Goodhart's law
    Goodhart's law, although it can be expressed in many ways, states that once a social or economic indicator or other surrogate measure is made a target for the purpose of conducting social or economic policy, then it will lose the information content that would qualify it to play that role...

  • Goodman and Kruskal's lambda
    Goodman and Kruskal's lambda
    In probability theory and statistics, Goodman & Kruskal's lambda is a measure of proportional reduction in error in cross tabulation analysis...

  • Goodness of fit
    Goodness of fit
    The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g...

  • Gordon–Newell network
  • Gordon–Newell theorem
    Gordon–Newell theorem
    In queueing theory, a discipline within the mathematical theory of probability, the Gordon–Newell theorem is an extension of Jackson's theorem from open queueing networks to closed queueing networks of exponential servers. We cannot apply Jackson's theorem to closed networks because the queue...

  • Graeco-Latin square
    Graeco-Latin square
    In mathematics, a Graeco-Latin square or Euler square or orthogonal Latin squares of order n over two sets S and T, each consisting of n symbols, is an n×n arrangement of cells, each cell containing an ordered pair (s, t), where s is in S and t is in T, such that every row and every column contains...

  • Grand mean
    Grand mean
    The grand mean is the mean of the means of several subsamples. For example, consider several lots, each containing several items. The items from each lot are sampled for a measure of some variable and the means of the measurements from each lot are computed. The mean of the measures from each lot...

  • Granger causality
    Granger causality
    The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another. Ordinarily, regressions reflect "mere" correlations, but Clive Granger, who won a Nobel Prize in Economics, argued that there is an interpretation of a set of tests...

  • Graph cuts in computer vision
    Graph cuts in computer vision
    As applied in the field of computer vision, graph cuts can be employed to efficiently solve a wide variety of low-level computer vision problems, such as image smoothing and the stereo correspondence problem, and many other problems that can be formulated in terms of energy minimization...

     – a potential application of Bayesian analysis
  • Graphical model
    Graphical model
    A graphical model is a probabilistic model for which a graph denotes the conditional independence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning....

  • Graphical models for protein structure
    Graphical models for protein structure
    Graphical models have become powerful frameworks for protein structure prediction, protein–protein interaction and free energy calculations for protein structures...

  • GraphPad InStat
    GraphPad InStat
    GraphPad InStat is a commercial scientific statistics software published by GraphPad Software, Inc., a privately owned California corporation. InStat is available for both Windows and Macintosh computers...

     – software
  • GraphPad Prism
    GraphPad Prism
    GraphPad Prism is a commercial scientific 2D graphing and statistics software published by GraphPad Software, Inc., a privately-held California corporation...

     – software
  • Gravity model of trade
    Gravity model of trade
    The gravity model of trade in international economics, similar to other gravity models in social science, predicts bilateral trade flows based on the economic sizes of and distance between two units. The model was first used by Tinbergen in 1962...

  • Greenwood statistic
  • Gretl
    Gretl
    gretl is an open-source statistical package, mainly for econometrics. The name is an acronym for Gnu Regression, Econometrics and Time-series Library. It has a graphical user interface and can be used together with X-12-ARIMA, TRAMO/SEATS, R, Octave, and Ox. It is written in C, uses GTK as widget...

  • Group family
    Group family
    In probability theory, especially as that field is used in statistics, a group family of probability distributions is a family obtained by subjecting a random variable with a fixed distribution to a suitable family of transformations such as a location-scale family, or otherwise a family of...

  • Group method of data handling
    Group method of data handling
    Group method of data handling is a family of inductive algorithms for computer-based mathematical modeling of multi-parametric datasets that features fully automatic structural and parametric optimization of models....

  • Group size measures
    Group size measures
    Many animals, including humans, tend to live in groups, herds, flocks, bands, packs, shoals, or colonies of conspecific individuals. The size of these groups, as expressed by the number of participant individuals, is an important aspect of their social environment...

  • Grouped data
    Grouped data
    Grouped data is a statistical term used in data analysis. A raw dataset can be organized by constructing a table showing the frequency distribution of the variable...

  • Grubbs' test for outliers
    Grubbs' test for outliers
    Grubbs' test, also known as the maximum normed residual test, is a statistical test used to detect outliers in a univariate data set assumed to come from a normally distributed population...

  • Guess value
    Guess value
    A guess value is more commonly called a starting value or initial value. These are necessary for most optimization problems which use search algorithms, because those algorithms are mainly deterministic and iterative, and they need to start somewhere...

  • Guesstimate
    Guesstimate
    Guesstimate is an informal English contraction of guess and estimate, first used by American statisticians in 1934 or 1935. It is defined as an estimate made without using adequate or complete information, or, more strongly, as an estimate arrived at by guesswork or conjecture...

  • Gumbel distribution
  • Guttman scale
    Guttman scale
    In statistical surveys conducted by means of structured interviews or questionnaires, a subset of the survey items having binary answers forms a Guttman scale if they can be ranked in some order so that, for a rational respondent, the response pattern can be captured by a single index on that...

  • Gy's sampling theory
    Gy's sampling theory
    Gy's sampling theory is a theory about the sampling of materials, developed by Pierre Gy from the 1950s to the beginning of the 2000s in articles and books including Sampling nomogram and Sampling of particulate materials; theory and practice...

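    Illustrative sketch (Gibbs sampling): the Gibbs sampling entry above describes generating samples from a joint distribution by repeatedly drawing each variable from its conditional distribution given the current values of the others. The Python sketch below is purely illustrative and is not taken from any of the articles indexed here; it samples a standard bivariate normal with correlation rho, whose full conditionals are themselves normal, and the function name and defaults are chosen only for this example.

        import math
        import random

        def gibbs_bivariate_normal(rho, n_samples=10000, burn_in=1000, seed=1):
            """Gibbs sampler for a standard bivariate normal with correlation rho.

            Each full conditional is N(rho * other, 1 - rho**2), so the sampler
            alternates drawing x | y and y | x, discarding an initial burn-in.
            """
            rng = random.Random(seed)
            x, y = 0.0, 0.0
            sd = math.sqrt(1.0 - rho * rho)
            samples = []
            for i in range(n_samples + burn_in):
                x = rng.gauss(rho * y, sd)   # draw x from p(x | y)
                y = rng.gauss(rho * x, sd)   # draw y from p(y | x)
                if i >= burn_in:
                    samples.append((x, y))
            return samples

        draws = gibbs_bivariate_normal(0.8)
        print(sum(x * y for x, y in draws) / len(draws))   # roughly 0.8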

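    Illustrative sketch (Goldfeld–Quandt test): the Goldfeld–Quandt entry above notes that the test checks for homoscedasticity by splitting the data into two groups and comparing their residual variances. The numpy sketch below is a simplified illustration of that idea, not a full implementation of the published test; the function names, the default fraction of central observations dropped, and the simple-regression setup are all assumptions made for this example.

        import numpy as np

        def goldfeld_quandt_statistic(x, y, drop_frac=0.2):
            """Ratio of residual variances from separate fits on low-x and high-x groups.

            Observations are sorted by x, a central fraction is dropped, a straight
            line is fitted to each remaining group by least squares, and the ratio
            of the two residual variances is returned (compared with an F
            distribution in a full treatment of the test).
            """
            order = np.argsort(x)
            x, y = np.asarray(x, float)[order], np.asarray(y, float)[order]
            n = len(x)
            drop = int(n * drop_frac)
            half = (n - drop) // 2
            lo, hi = slice(0, half), slice(n - half, n)

            def residual_variance(xs, ys):
                X = np.column_stack([np.ones_like(xs), xs])      # intercept + slope
                beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
                resid = ys - X @ beta
                return resid @ resid / (len(ys) - X.shape[1])    # unbiased estimate

            return residual_variance(x[hi], y[hi]) / residual_variance(x[lo], y[lo])
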
H

  • h-index
    H-index
    The h-index is an index that attempts to measure both the productivity and impact of the published work of a scientist or scholar. The index is based on the set of the scientist's most cited papers and the number of citations that they have received in other publications...

  • Hájek–Le Cam convolution theorem
    Hájek–Le Cam convolution theorem
    In statistics, the Hájek–Le Cam convolution theorem states that any regular estimator in a parametric model is asymptotically equivalent to a sum of two independent random variables, one of which is normal with asymptotic variance equal to the inverse of Fisher information, and the other having...

  • Half circle distribution
  • Half-logistic distribution
  • Half-normal distribution
    Half-normal distribution
    The half-normal distribution is the probability distribution of the absolute value of a random variable that is normally distributed with expected value 0 and variance σ². I.e...

  • Halton sequence
  • Hamburger moment problem
  • Hannan–Quinn information criterion
  • Harris chain
  • Hardy–Weinberg principle – statistical genetics
  • Hartley's test
    Hartley's test
    In statistics, Hartley's test, also known as the Fmax test or Hartley's Fmax, is used in the analysis of variance to verify that different groups have a similar variance, an assumption needed for other statistical tests. It was developed by H. O...

  • Hat matrix
    Hat matrix
    In statistics, the hat matrix, H, maps the vector of observed values to the vector of fitted values. It describes the influence each observed value has on each fitted value...

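     – a small numerical sketch appears at the end of this section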
  • Hammersley–Clifford theorem
    Hammersley–Clifford theorem
    The Hammersley–Clifford theorem is a result in probability theory, mathematical statistics and statistical mechanics, that gives necessary and sufficient conditions under which a positive probability distribution can be represented as a Markov network...

  • Hausdorff moment problem
  • Hausman specification test — redirects to Hausman test
    Hausman test
    The Hausman test or Hausman specification test is a statistical test in econometrics named after Jerry A. Hausman. The test evaluates the significance of an estimator versus an alternative estimator...

  • Haybittle–Peto boundary
    Haybittle–Peto boundary
    The Haybittle–Peto boundary is a rule for deciding when to stop a clinical trial prematurely. The typical clinical trial compares two groups of patients. One group is given a placebo or conventional treatment, while the other group is given the treatment that is being tested...

  • Hazard function — redirects to Failure rate
    Failure rate
    Failure rate is the frequency with which an engineered system or component fails, expressed for example in failures per hour. It is often denoted by the Greek letter λ and is important in reliability engineering....

  • Hazard ratio
    Hazard ratio
    In survival analysis, the hazard ratio is the ratio of the hazard rates corresponding to the conditions described by two sets of explanatory variables. For example, in a drug study, the treated population may die at twice the rate per unit time as the control population. The hazard ratio would be...

  • Heaps' law
    Heaps' law
    In linguistics, Heaps' law is an empirical law which describes the portion of a vocabulary which is represented by an instance document consisting of words chosen from the vocabulary. This can be formulated as V_R = K·n^β...

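     – a short code sketch of the formula appears at the end of this section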
  • Health care analytics
    Health care analytics
    Health care analytics is a rapidly evolving field of health care business solutions that makes extensive use of data, statistical and qualitative analysis, explanatory and predictive modeling...

  • Heart rate variability
    Heart rate variability
    Heart rate variability is a physiological phenomenon where the time interval between heart beats varies. It is measured by the variation in the beat-to-beat interval....

  • Heavy-tailed distribution
    Heavy-tailed distribution
    In probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded: that is, they have heavier tails than the exponential distribution...

  • Heckman correction
    Heckman correction
    The Heckman correction is any of a number of related statistical methods developed by James Heckman from 1976 through 1979 which allow the researcher to correct for selection bias...

  • Hedonic regression
    Hedonic regression
    In economics, hedonic regression or hedonic demand theory is a revealed preference method of estimating demand or value. It decomposes the item being researched into its constituent characteristics, and obtains estimates of the contributory value of each characteristic...

  • Hellin's law
    Hellin's Law
    Hellin's Law is the principle that twins occur in about one in 89 births, triplets in about one in 89² births, and quadruplets in about one in 89³ births...

  • Hellinger distance
    Hellinger distance
    In probability and statistics, the Hellinger distance is used to quantify the similarity between two probability distributions. It is a type of f-divergence...

  • Helmert–Wolf blocking
  • Herfindahl index
    Herfindahl index
    The Herfindahl index is a measure of the size of firms in relation to the industry and an indicator of the amount of competition among them. Named after economists Orris C. Herfindahl and Albert O. Hirschman, it is an economic concept widely applied in competition law, antitrust and also...

  • Heston model
    Heston model
    In finance, the Heston model, named after Steven Heston, is a mathematical model describing the evolution of the volatility of an underlying asset...

  • Heteroscedasticity
  • Heteroscedasticity-consistent standard errors
    Heteroscedasticity-consistent standard errors
    The topic of heteroscedasticity-consistent standard errors arises in statistics and econometrics in the context of linear regression and also time series analysis...

  • Heteroskedasticity — redirects to Heteroscedasticity
  • Hidden Markov model
    Hidden Markov model
    A hidden Markov model is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved states. An HMM can be considered as the simplest dynamic Bayesian network. The mathematics behind the HMM was developed by L. E...

  • Hidden Markov random field
    Hidden Markov random field
    A hidden Markov random field is a generalization of a hidden Markov model. Instead of having an underlying Markov chain, hidden Markov random fields have an underlying Markov random field. Suppose that we observe a random variable Y_i, where i ∈ S...

  • Hidden semi-Markov model
    Hidden semi-Markov model
    A hidden semi-Markov model is a statistical model with the same structure as a hidden Markov model except that the unobservable process is semi-Markov rather than Markov. This means that the probability of there being a change in the hidden state depends on the amount of time that has elapsed...

  • Hierarchical Bayes model
    Hierarchical Bayes model
    The hierarchical Bayes model is a method in modern Bayesian statistical inference. It is a framework for describing statistical models that can capture dependencies more realistically than non-hierarchical models....

  • Hierarchical clustering
    Hierarchical clustering
    In statistics, hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:...

  • Hierarchical hidden Markov model
    Hierarchical hidden Markov model
    The hierarchical hidden Markov model is a statistical model derived from the hidden Markov model. In an HHMM each state is considered to be a self-contained probabilistic model. More precisely each state of the HHMM is itself an HHMM...

  • Hierarchical linear modeling
    Hierarchical linear modeling
    In statistics, hierarchical linear modeling , a form of multi-level analysis, is a more advanced form of simple linear regression and multiple linear regression. Multilevel analysis allows variance in outcome variables to be analysed at multiple hierarchical levels, whereas in simple linear and...

  • High-dimensional statistics
    High-dimensional statistics
    In statistical theory, the field of high-dimensional statistics studies data whose dimension is larger than dimensions considered in classical multivariate analysis. High-dimensional statistics relies on the theory of random vectors...

  • Higher-order factor analysis
    Higher-order factor analysis
    Higher-order factor analysis is a statistical method consisting of repeated steps: factor analysis, oblique rotation, then factor analysis of the rotated factors. Its merit is that it enables the researcher to see the hierarchical structure of the studied phenomena...

  • Higher-order statistics
    Higher-order statistics
    Higher-order statistics are descriptive measures of, among other things, qualities of probability distributions and sample distributions, and are, themselves, extensions of first- and second-order measures to higher orders. Skewness and kurtosis are examples of this...

  • Hirschman uncertainty
    Hirschman uncertainty
    In quantum mechanics, information theory, and Fourier analysis, the Hirschman uncertainty is defined as the sum of the temporal and spectral Shannon entropies. It turns out that Heisenberg's uncertainty principle can be expressed as a lower bound on the sum of these entropies...

  • Histogram
    Histogram
    In statistics, a histogram is a graphical representation showing a visual impression of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson...

  • Historiometry
    Historiometry
    Historiometry is the historical study of human progress or individual personal characteristics, using statistics to analyze references to geniuses, their statements, behavior and discoveries in relatively neutral texts...

  • History of randomness
    History of randomness
    In ancient history, the concepts of chance and randomness were intertwined with that of fate. Many ancient peoples threw dice to determine fate, and this later evolved into games of chance...

  • History of statistics
    History of statistics
    The history of statistics can be said to start around 1749 although, over time, there have been changes to the interpretation of what the word statistics means. In early times, the meaning was restricted to information about states...

  • Hitting time
    Hitting time
    In the study of stochastic processes in mathematics, a hitting time is a particular instance of a stopping time, the first time at which a given process "hits" a given subset of the state space...

  • Hodges’ estimator
    Hodges’ estimator
    In statistics, Hodges’ estimator is a famous counterexample: an estimator which is "superefficient", i.e. it attains smaller asymptotic variance than regular efficient estimators...

  • Hodges–Lehmann estimator
  • Hoeffding's independence test
  • Hoeffding's lemma
    Hoeffding's lemma
    In probability theory, Hoeffding's lemma is an inequality that bounds the moment-generating function of any bounded random variable. It is named after the Finnish–American mathematical statistician Wassily Hoeffding....

  • Hoeffding's inequality
    Hoeffding's inequality
    In probability theory, Hoeffding's inequality provides an upper bound on the probability for the sum of random variables to deviate from its expected value. Hoeffding's inequality was proved by Wassily Hoeffding. Let X_1, …, X_n...

  • Holm–Bonferroni method
  • Holtsmark distribution
    Holtsmark distribution
    The Holtsmark distribution is a continuous probability distribution. The Holtsmark distribution is a special case of a stable distribution with the index of stability or shape parameter α equal to 3/2 and skewness parameter β of zero. Since β equals zero, the distribution is...

  • Homogeneity (statistics)
    Homogeneity (statistics)
    In statistics, homogeneity and its opposite, heterogeneity, arise in describing the properties of a dataset, or several datasets. They relate to the validity of the often convenient assumption that the statistical properties of any one part of an overall dataset are the same as any other part...

  • Homoscedasticity
    Homoscedasticity
    In statistics, a sequence or a vector of random variables is homoscedastic if all random variables in the sequence or vector have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity...

  • Hoover index
  • Horvitz–Thompson estimator
    Horvitz–Thompson estimator
    In statistics, the Horvitz–Thompson estimator, named after Daniel G. Horvitz and Donovan J. Thompson, is a method for estimating the mean of a superpopulation in a stratified sample. Inverse probability weighting is applied to account for different proportions of observations within strata...

  • Hosmer–Lemeshow test
    Hosmer–Lemeshow test
    The Hosmer–Lemeshow test is a statistical test for goodness of fit for logistic regression models. It is used frequently in risk prediction models. The test assesses whether or not the observed event rates match expected event rates in subgroups of the model population. The Hosmer–Lemeshow...

  • Hotelling's T-squared distribution
  • How to Lie with Statistics
    How to Lie with Statistics
    How to Lie with Statistics is a book written by Darrell Huff in 1954 presenting an introduction to statistics for the general reader. Huff was a journalist who wrote many "how to" articles as a freelancer, but was not a statistician....

     (book)
  • Howland will forgery trial
    Howland will forgery trial
    The Howland will forgery trial was a U.S. court case in 1868 to decide Henrietta Howland Robinson's contest of the will of Sylvia Ann Howland. It is famous for the forensic use of mathematics by Benjamin Peirce as an expert witness...

  • Hubbert curve
    Hubbert curve
    The Hubbert curve is an approximation of the production rate of a resource over time. It is a symmetric logistic distribution curve, often confused with the "normal" Gaussian function. It first appeared in "Nuclear Energy and the Fossil Fuels" by geophysicist M...

  • Huber–White standard error — redirects to Heteroscedasticity-consistent standard errors
    Heteroscedasticity-consistent standard errors
    The topic of heteroscedasticity-consistent standard errors arises in statistics and econometrics in the context of linear regression and also time series analysis...

  • Huber loss function
    Huber Loss Function
    In statistical theory, the Huber loss function is a function used in robust estimation that allows construction of an estimate in which the effect of outliers is reduced, while non-outliers are treated in a more standard way...

  • Human subject research
  • Hurst exponent
    Hurst exponent
    The Hurst exponent is used as a measure of the long term memory of time series. It relates to the autocorrelations of the time series and the rate at which these decrease as the lag between pairs of values increases....

  • Hyper-exponential distribution
  • Hyper-Graeco-Latin square design
    Hyper-Graeco-Latin square design
    In the design of experiments, hyper-Graeco-Latin squares are efficient designs to study the effect of one primary factor in the presence of 4 blocking factors. They are restricted, however, to the case in which all the factors have the same number of levels.Designs for 4- and 5-level factors are...

  • Hyperbolic distribution
    Hyperbolic distribution
    The hyperbolic distribution is a continuous probability distribution that is characterized by the fact that the logarithm of the probability density function is a hyperbola. Thus the distribution decreases exponentially, more slowly than the normal distribution...

  • Hyperbolic secant distribution
  • Hypergeometric distribution
  • Hyperparameter
    Hyperparameter
    In Bayesian statistics, a hyperparameter is a parameter of a prior distribution; the term is used to distinguish them from parameters of the model for the underlying system under analysis...

  • Hyperprior
    Hyperprior
    In Bayesian statistics, a hyperprior is a prior distribution on a hyperparameter, that is, on a parameter of a prior distribution.As with the term hyperparameter, the use of hyper is to distinguish it from a prior distribution of a parameter of the model for the underlying system...

  • Hypoexponential distribution
    Hypoexponential distribution
    In probability theory the hypoexponential distribution or the generalized Erlang distribution is a continuous distribution, that has found use in the same fields as the Erlang distribution, such as queueing theory, teletraffic engineering and more generally in stochastic processes...

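    Illustrative sketch (hat matrix): the hat matrix entry above says that H maps the vector of observed values to the vector of fitted values; for ordinary least squares this is H = X (XᵀX)⁻¹ Xᵀ. The numpy sketch below simply evaluates that formula on random data; the data and the function name are illustrative only.

        import numpy as np

        def hat_matrix(X):
            """Return the OLS hat matrix H = X (X'X)^(-1) X' for a design matrix X."""
            X = np.asarray(X, dtype=float)
            return X @ np.linalg.solve(X.T @ X, X.T)   # avoids forming an explicit inverse

        rng = np.random.default_rng(0)
        X = np.column_stack([np.ones(5), rng.normal(size=5)])   # intercept + one regressor
        y = rng.normal(size=5)
        H = hat_matrix(X)
        print(np.allclose(H @ H, H))   # H is idempotent
        print(H @ y)                   # fitted values y_hat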

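    Illustrative sketch (Heaps' law): the Heaps' law entry above gives the relation V_R = K·n^β between text length n and vocabulary size. The snippet below just evaluates that formula; the values of K and β are arbitrary placeholders, not estimates from any corpus.

        def heaps_vocabulary_size(n_tokens, K=10.0, beta=0.5):
            """Expected number of distinct words under Heaps' law, V = K * n**beta."""
            return K * n_tokens ** beta

        for n in (1_000, 10_000, 100_000):
            print(n, round(heaps_vocabulary_size(n)))
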
I

  • Idealised population
    Idealised population
    (Main article: effective population size.) In population genetics an idealised population, also sometimes called a Fisher-Wright population after R.A. Fisher and Sewall Wright, is a population whose members can mate and reproduce with any other member of the other gender, has a sex ratio of 1 and no...

  • Idempotent matrix
    Idempotent matrix
    In algebra, an idempotent matrix is a matrix which, when multiplied by itself, yields itself. That is, the matrix M is idempotent if and only if MM = M...

  • Identifiability
    Identifiability
    In statistics, identifiability is a property which a model must satisfy in order for inference to be possible. We say that the model is identifiable if it is theoretically possible to learn the true value of this model’s underlying parameter after obtaining an infinite number of observations from it...

  • Ignorability
    Ignorability
    In statistics, ignorability refers to an experimental design in which the method of data collection does not depend on the missing data...

  • Illustration of the central limit theorem
    Illustration of the central limit theorem
    This article gives two concrete illustrations of the central limit theorem. Both involve the sum of independent and identically-distributed random variables and show how the probability distribution of the sum approaches the normal distribution as the number of terms in the sum increases.The first...

  • Image denoising
    Image denoising
    Image denoising refers to the recovery of a digital image that has been contaminated by additive white Gaussian noise...

  • Importance sampling
    Importance sampling
    In statistics, importance sampling is a general technique for estimating properties of a particular distribution, while only having samples generated from a different distribution rather than the distribution of interest. It is related to umbrella sampling in computational physics...

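     – a minimal code sketch appears at the end of this section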
  • Imprecise probability
    Imprecise probability
    Imprecise probability generalizes probability theory to allow for partial probability specifications, and is applicable when information is scarce, vague, or conflicting, in which case a unique probability distribution may be hard to identify...

  • Imputation (statistics)
    Imputation (statistics)
    In statistics, imputation is the substitution of some value for a missing data point or a missing component of a data point. Once all missing values have been imputed, the dataset can then be analysed using standard techniques for complete data...

  • Incidence (epidemiology)
    Incidence (epidemiology)
    Incidence is a measure of the risk of developing some new condition within a specified period of time. Although sometimes loosely expressed simply as the number of new cases during some time period, it is better expressed as a proportion or a rate with a denominator.Incidence proportion is the...

  • Inclusion probability
    Inclusion probability
    In statistics, in the theory relating to sampling from finite populations, the inclusion probability of an element or member of the population is its probability of becoming part of the sample during the drawing of a single sample....

  • Increasing process
  • Indecomposable distribution
    Indecomposable distribution
    In probability theory, an indecomposable distribution is a probability distribution that cannot be represented as the distribution of the sum of two or more non-constant independent random variables: Z ≠ X + Y. If it can be so expressed, it is decomposable:...

  • Independence of irrelevant alternatives
    Independence of irrelevant alternatives
    Independence of irrelevant alternatives is an axiom of decision theory and various social sciences. The term is used with different meanings in different contexts...

  • Independent component analysis
    Independent component analysis
    Independent component analysis is a computational method for separating a multivariate signal into additive subcomponents supposing the mutual statistical independence of the non-Gaussian source signals...

  • Independent and identically distributed random variables
    Independent and identically distributed random variables
    In probability theory and statistics, a sequence or other collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent....

  • Index number
  • Index of coincidence
    Index of coincidence
    In cryptography, coincidence counting is the technique of putting two texts side-by-side and counting the number of times that identical letters appear in the same position in both texts...

  • Index of dispersion
  • Indicators of spatial association
    Indicators of spatial association
    Indicators of spatial association are statistics that evaluate the existence of clusters in the spatial arrangement of a given variable. For instance, if we are studying cancer rates among census tracts in a given city, local clusters in the rates mean that there are areas that have higher or lower...

  • Indirect least squares
  • Inductive inference
    Inductive inference
    Around 1960, Ray Solomonoff founded the theory of universal inductive inference, the theory of prediction based on observations; for example, predicting the next symbol based upon a given series of symbols...

  • An inequality on location and scale parameters — redirects to Chebyshev's inequality
    Chebyshev's inequality
    In probability theory, Chebyshev’s inequality guarantees that in any data sample or probability distribution, "nearly all" values are close to the mean: the precise statement being that no more than 1/k² of the distribution’s values can be more than k standard deviations away from the mean...

  • Inference
    Inference
    Inference is the act or process of deriving logical conclusions from premises known or assumed to be true. The conclusion drawn is also called an inference. The laws of valid inference are studied in the field of logic...

  • Inferential statistics — redirects to Statistical inference
    Statistical inference
    In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation...

  • Infinite divisibility (probability)
    Infinite divisibility (probability)
    The concepts of infinite divisibility and the decomposition of distributions arise in probability and statistics in relation to seeking families of probability distributions that might be a natural choice in certain applications, in the same way that the normal distribution is...

  • Infinite monkey theorem
    Infinite monkey theorem
    The infinite monkey theorem states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type a given text, such as the complete works of William Shakespeare....

  • Influence diagram
    Influence diagram
    An influence diagram is a compact graphical and mathematical representation of a decision situation...

  • Info-gap decision theory
    Info-gap decision theory
    Info-gap decision theory is a non-probabilistic decision theory that seeks to optimize robustness to failure – or opportuneness for windfall – under severe uncertainty, in particular applying sensitivity analysis of the stability radius type to perturbations in the value of a given estimate of the...

  • Information bottleneck method
    Information bottleneck method
    The information bottleneck method is a technique introduced by Naftali Tishby et al. for finding the best tradeoff between accuracy and complexity when summarizing a random variable X, given a joint probability distribution between X and an observed relevant variable Y...

  • Information geometry
    Information geometry
    Information geometry is a branch of mathematics that applies the techniques of differential geometry to the field of probability theory. It derives its name from the fact that the Fisher information is used as the Riemannian metric when considering the geometry of probability distribution families...

  • Information gain ratio
    Information gain ratio
    Information gain calculation: let Attr be the set of all attributes and Ex the set of all training examples; value(x, a), with x ∈ Ex, defines the value of a specific example x for attribute a ∈ Attr, and H specifies the entropy...

  • Information ratio
    Information ratio
    The Information ratio is a measure of the risk-adjusted return of a financial security . It is also known as Appraisal ratio and is defined as expected active return divided by tracking error, where active return is the difference between the return of the security and the return of a selected...

     – finance
  • Information source (mathematics)
    Information source (mathematics)
    In mathematics, an information source is a sequence of random variables ranging over a finite alphabet Γ, having a stationary distribution.The uncertainty, or entropy rate, of an information source is defined as...

  • Information theory
    Information theory
    Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Information theory was developed by Claude E. Shannon to find fundamental limits on signal processing operations such as compressing data and on reliably storing and...

  • Inherent bias
    Inherent bias
    The term "inherent bias" refers to the effect of underlying factors or assumptions that skew viewpoints on a subject under discussion. There are multiple formal definitions of "inherent bias" which depend on the particular field of study...

  • Inherent zero
    Inherent zero
    In statistics, an inherent zero is a reference point used to describe data sets which are indicative of magnitude of an absolute or relative nature. Inherent zeros are used on ratio scales....

  • Injury prevention
    Injury prevention
    Injury prevention refers to efforts to prevent or reduce the severity of bodily injuries caused by external mechanisms, such as accidents, before they occur. Injury prevention is a component of safety and public health, and its goal is to improve the health of the population by preventing injuries and...

     – application
  • Innovation (signal processing)
    Innovation (signal processing)
    In time series analysis — as conducted in statistics, signal processing, and many other fields — the innovation is the difference between the observed value of a variable at time t and the optimal forecast of that value based on information available prior to time t...

  • Innovations vector
    Innovations vector
    The innovations vector or residual vector is the difference between the measurement vector and the predicted measurement vector. Each difference represents the deviation of the observed random variable from the predicted response. The innovation vector is often used to check the validity of a...

  • Institutional review board
    Institutional review board
    An institutional review board , also known as an independent ethics committee or ethical review board , is a committee that has been formally designated to approve, monitor, and review biomedical and behavioral research involving humans with the aim to protect the rights and welfare of the...

  • Instrumental variable
    Instrumental variable
    In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables is used to estimate causal relationships when controlled experiments are not feasible....

  • Intention to treat analysis
    Intention to treat analysis
    In epidemiology, an intention to treat analysis is an analysis based on the initial treatment intent, not on the treatment eventually administered. ITT analysis is intended to avoid various misleading artifacts that can arise in intervention research...

  • Interaction (statistics)
    Interaction (statistics)
    In statistics, an interaction may arise when considering the relationship among three or more variables, and describes a situation in which the simultaneous influence of two variables on a third is not additive...

  • Interaction variable – see Interaction (statistics)
    Interaction (statistics)
    In statistics, an interaction may arise when considering the relationship among three or more variables, and describes a situation in which the simultaneous influence of two variables on a third is not additive...

  • Interclass correlation
    Interclass correlation
    In statistics, the interclass correlation measures a bivariate relation among variables.The Pearson correlation coefficient is the most commonly used interclass correlation....

  • Interdecile range
    Interdecile range
    In statistics, the interdecile range is the difference between the first and the ninth deciles . The interdecile range is a measure of statistical dispersion of the values in a set of data, similar to the range and the interquartile range....

  • Interim analysis
    Interim analysis
    Clinical trials are unique in that enrollment of patients is a continual process staggered in time. This means that if a treatment is particularly beneficial or harmful compared to the concurrent placebo group while the study is on-going, the investigators are ethically obliged to assess that...

  • Internal consistency
    Internal consistency
    In statistics and research, internal consistency is typically a measure based on the correlations between different items on the same test . It measures whether several items that propose to measure the same general construct produce similar scores...

  • Internal validity
    Internal validity
    Internal validity is the validity of inferences in scientific studies, usually based on experiments, as experimental validity...

  • Interquartile mean
    Interquartile mean
    The interquartile mean is a statistical measure of central tendency, much like the mean, the median, and the mode...

  • Interquartile range
    Interquartile range
    In descriptive statistics, the interquartile range , also called the midspread or middle fifty, is a measure of statistical dispersion, being equal to the difference between the upper and lower quartiles...

  • Inter-rater reliability
    Inter-rater reliability
    In statistics, inter-rater reliability, inter-rater agreement, or concordance is the degree of agreement among raters. It gives a score of how much homogeneity, or consensus, there is in the ratings given by judges. It is useful in refining the tools given to human judges, for example by...

  • Interval estimation
    Interval estimation
    In statistics, interval estimation is the use of sample data to calculate an interval of possible values of an unknown population parameter, in contrast to point estimation, which is a single number. Neyman identified interval estimation as distinct from point estimation...

  • Intervening variable
    Intervening variable
    An intervening variable is a hypothetical internal state that is used to explain relationships between observed variables, such as independent and dependent variables, in empirical research...

  • Intra-rater reliability
    Intra-rater reliability
    In statistics, intra-rater reliability is the degree of agreement among multiple repetitions of a diagnostic test performed by a single rater. See also: inter-rater reliability, reliability, repeatability, test-retest reliability...

  • Intraclass correlation
    Intraclass correlation
    In statistics, the intraclass correlation is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups. It describes how strongly units in the same group resemble each other...

  • Invariant estimator
    Invariant estimator
    In statistics, the concept of being an invariant estimator is a criterion that can be used to compare the properties of different estimators for the same quantity. It is a way of formalising the idea that an estimator should have certain intuitively appealing qualities...

  • Invariant extended Kalman filter
    Invariant extended Kalman filter
    The invariant extended Kalman filter is a new version of the extended Kalman filter for nonlinear systems possessing symmetries . It combines the advantages of both the EKF and the recently introduced symmetry-preserving filters...

  • Inverse distance weighting
    Inverse distance weighting
    Inverse distance weighting is a method for multivariate interpolation, a process of assigning values to unknown points by using values from a usually scattered set of known points...

  • Inverse Gaussian distribution
    Inverse Gaussian distribution
    In probability theory, the inverse Gaussian distribution (also known as the Wald distribution) is a two-parameter family of continuous probability distributions with support on the positive real line...

  • Inverse Mills ratio
    Inverse Mills ratio
    In statistics, the inverse Mills ratio, named after John P. Mills, is the ratio of the probability density function to the cumulative distribution function of a distribution....

  • Inverse probability
    Inverse probability
    In probability theory, inverse probability is an obsolete term for the probability distribution of an unobserved variable.Today, the problem of determining an unobserved variable is called inferential statistics, the method of inverse probability is called Bayesian probability, the "distribution"...

  • Inverse relationship
    Inverse relationship
    An inverse or negative relationship is a mathematical relationship in which one variable, say y, decreases as another, say x, increases. For a linear relation, this can be expressed as y = a-bx, where -b is a constant value less than zero and a is a constant...

  • Inverse-chi-squared distribution
  • Inverse-gamma distribution
  • Inverse transform sampling
  • Inverse-variance weighting
    Inverse-variance weighting
    In statistics, inverse-variance weighting is a method of aggregating two or more random variables to minimize the variance of the sum. Each random variable in the sum is weighted in inverse proportion to its variance....

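     – a short code sketch appears at the end of this section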
  • Inverse-Wishart distribution
  • Iris flower data set
    Iris flower data set
    The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by Sir Ronald Aylmer Fisher as an example of discriminant analysis...

  • Irwin–Hall distribution
  • Isomap
    Isomap
    In statistics, Isomap is one of several widely used low-dimensional embedding methods, where geodesic distances on a weighted graph are incorporated with the classical scaling . Isomap is used for computing a quasi-isometric, low-dimensional embedding of a set of high-dimensional data points...

  • Isotonic regression
  • Item response theory
    Item response theory
    In psychometrics, item response theory also known as latent trait theory, strong true score theory, or modern mental test theory, is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring abilities, attitudes, or other variables. It is based...

  • Item-total correlation
    Item-total correlation
    The item-total correlation test arises in psychometrics in contexts where a number of tests or questions are given to an individual and where the problem is to construct a useful single quantity for each individual that can be used to compare that individual with others in a given population...

  • Item tree analysis
    Item tree analysis
    Item tree analysis is a data analytical method which allows constructing a hierarchical structure on the items of a questionnaire or test from observed response patterns. Assume that we have a questionnaire with m items and that subjects can...

  • Iterative proportional fitting
    Iterative proportional fitting
    The iterative proportional fitting procedure is an iterative algorithm for estimating cell values of a contingency table such that the marginal totals remain fixed and the estimated table decomposes into an outer...

  • Iteratively reweighted least squares
  • Itō calculus
    Ito calculus
    Itō calculus, named after Kiyoshi Itō, extends the methods of calculus to stochastic processes such as Brownian motion . It has important applications in mathematical finance and stochastic differential equations....

  • Itō isometry
    Ito isometry
    In mathematics, the Itō isometry, named after Kiyoshi Itō, is a crucial fact about Itō stochastic integrals. One of its main applications is to enable the computation of variances for stochastic processes....

  • Itō's lemma
    Ito's lemma
    In mathematics, Itō's lemma is used in Itō stochastic calculus to find the differential of a function of a particular type of stochastic process. It is named after its discoverer, Kiyoshi Itō...

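    Illustrative sketch (importance sampling): the importance sampling entry above describes estimating properties of one distribution using draws from a different one, with each draw reweighted by the ratio of the two densities. The Python sketch below estimates E[X²] = 1 for a standard normal target using a wider normal proposal; the proposal width, sample size, and function names are arbitrary choices made only for this example.

        import math
        import random

        def normal_pdf(x, mu=0.0, sigma=1.0):
            return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

        def estimate_mean_square(n=100_000, proposal_sigma=2.0, seed=1):
            """Importance-sampling estimate of E_p[X^2] for p = N(0,1), using draws from q = N(0, proposal_sigma^2)."""
            rng = random.Random(seed)
            total = 0.0
            for _ in range(n):
                x = rng.gauss(0.0, proposal_sigma)                      # draw from the proposal q
                w = normal_pdf(x) / normal_pdf(x, 0.0, proposal_sigma)  # importance weight p(x)/q(x)
                total += w * x * x
            return total / n

        print(estimate_mean_square())   # close to 1.0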

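    Illustrative sketch (inverse-variance weighting): the inverse-variance weighting entry above says each estimate is weighted in inverse proportion to its variance; the pooled estimate and its variance then take the standard form evaluated below. The numbers are made up purely to show the calculation.

        def inverse_variance_weighted_mean(estimates, variances):
            """Pool independent estimates with weights 1/variance.

            Returns the pooled estimate and its variance, which is the
            reciprocal of the summed weights.
            """
            weights = [1.0 / v for v in variances]
            pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
            return pooled, 1.0 / sum(weights)

        print(inverse_variance_weighted_mean([1.2, 0.8, 1.0], [0.04, 0.09, 0.01]))
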
J

  • Jaccard index
    Jaccard index
    The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for comparing the similarity and diversity of sample sets...

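     – a short code sketch appears at the end of this section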
  • Jackknife (statistics) — redirects to Resampling (statistics)
    Resampling (statistics)
    In statistics, resampling is any of a variety of methods for doing one of the following: (1) estimating the precision of sample statistics by using subsets of available data or drawing randomly with replacement from a set of data points; (2) exchanging labels on data points when performing significance...

  • Jackson network
  • Jackson's theorem (queueing theory)
  • Jadad scale
    Jadad scale
    The Jadad scale, sometimes known as Jadad scoring or the Oxford quality scoring system, is a procedure to independently assess the methodological quality of a clinical trial...

  • James–Stein estimator
  • Jarque–Bera test
    Jarque–Bera test
    In statistics, the Jarque–Bera test is a goodness-of-fit test of whether sample data have the skewness and kurtosis matching a normal distribution. The test is named after Carlos Jarque and Anil K. Bera...

  • Jeffreys prior
    Jeffreys prior
    In Bayesian probability, the Jeffreys prior, named after Harold Jeffreys, is a non-informative prior distribution on parameter space that is proportional to the square root of the determinant of the Fisher information:...

  • Jensen's inequality
    Jensen's inequality
    In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906. Given its generality, the inequality appears in many forms depending on the context,...

  • Jensen–Shannon divergence
    Jensen–Shannon divergence
    In probability theory and statistics, the Jensen–Shannon divergence is a popular method of measuring the similarity between two probability distributions. It is also known as information radius or total divergence to the average. It is based on the Kullback–Leibler divergence, with the notable ...

  • JMulTi
    JMulTi
    JMulTi is an open-source interactive software for econometric analysis, specialised in univariate and multivariate time series analysis. It has a Java graphical user interface....

     – software
  • Johansen test
    Johansen test
    In statistics, the Johansen test, named after Søren Johansen, is a procedure for testing cointegration of several I(1) time series. This test permits more than one cointegrating relationship so is more generally applicable than the Engle–Granger test which is based on the Dickey–Fuller test for...

  • Joint probability distribution
  • JMP (statistical software)
    JMP (statistical software)
    JMP is a computer program that was first developed by John Sall and others to perform simple and complex statistical analyses. It dynamically links statistics with graphics to interactively explore, understand, and visualize data...

  • Jump process
    Jump process
    A jump process is a type of stochastic process that has discrete movements, called jumps, rather than small continuous movements.In physics, jump processes result in diffusion...

  • Jump-diffusion model
  • Junction tree algorithm

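    Illustrative sketch (Jaccard index): the Jaccard index entry above describes a statistic for comparing the similarity of sample sets; it is the size of the intersection divided by the size of the union, |A ∩ B| / |A ∪ B|, as in this small sketch (the convention of returning 1.0 for two empty sets is a choice made here, not something asserted by the article).

        def jaccard_index(a, b):
            """Jaccard similarity |A ∩ B| / |A ∪ B| of two finite sets."""
            a, b = set(a), set(b)
            if not a and not b:
                return 1.0   # both empty: treat as identical (a convention)
            return len(a & b) / len(a | b)

        print(jaccard_index({"cat", "dog", "bird"}, {"dog", "bird", "fish"}))   # 0.5
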
K

  • K-distribution
    K-distribution
    The K-distribution is a probability distribution that arises as the consequence of a statistical or probabilistic model used in Synthetic Aperture Radar imagery...

  • K-means algorithm
    K-means algorithm
    In statistics and data mining, k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean...

     redirects to k-means clustering
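     – a compact code sketch appears at the end of this section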
  • K-means++
    K-means++
    In applied statistics, k-means++ is an algorithm for choosing the initial values for the k-means clustering algorithm. It was proposed in 2007 by David Arthur and Sergei Vassilvitskii, as an approximation algorithm for the NP-hard k-means problem—a way of avoiding the sometimes poor...

  • K-medians clustering
    K-medians clustering
    In statistics and machine learning, k-medians clustering is a variation of k-means clustering where instead of calculating the mean for each cluster to determine its centroid, one instead calculates the median...

  • K-medoids
    K-medoids
    The k-medoids algorithm is a clustering algorithm related to the k-means algorithm and the medoidshift algorithm. Both the k-means and k-medoids algorithms are partitional and both attempt to minimize squared error, the distance between points labeled to be in a cluster and a point designated as the...

  • Kalman filter
    Kalman filter
    In statistics, the Kalman filter is a mathematical method named after Rudolf E. Kálmán. Its purpose is to use measurements observed over time, containing noise and other inaccuracies, and produce values that tend to be closer to the true values of the measurements and their associated calculated...

  • Kaplan–Meier estimator
  • Kappa coefficient
  • Kappa statistic
  • Karhunen–Loève theorem
  • Kendall tau distance
    Kendall tau distance
    The Kendall tau distance is a metric that counts the number of pairwise disagreements between two lists. The larger the distance, the more dissimilar the two lists are. Kendall tau distance is also called bubble-sort distance since it is equivalent to the number of swaps that the bubble sort...

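     – a simple code sketch appears at the end of this section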
  • Kendall tau rank correlation coefficient
    Kendall tau rank correlation coefficient
    In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's tau coefficient, is a statistic used to measure the association between two measured quantities...

  • Kendall's notation
    Kendall's notation
    In queueing theory, Kendall's notation is the standard system used to describe and classify the queueing model that a queueing system corresponds to. First suggested by D. G...

  • Kendall's W
    Kendall's W
    Kendall's W is a non-parametric statistic. It is a normalization of the statistic of the Friedman test, and can be used for assessing agreement among raters...

     – Kendall's coefficient of concordance
  • Kent distribution
  • Kernel density estimation
    Kernel density estimation
    In statistics, kernel density estimation is a non-parametric way of estimating the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample...

  • Kernel methods
    Kernel methods
    In computer science, kernel methods are a class of algorithms for pattern analysis, whose best known element is the support vector machine...

  • Kernel principal component analysis
  • Kernel regression
    Kernel regression
    Kernel regression is a non-parametric technique in statistics to estimate the conditional expectation of a random variable. The objective is to find a non-linear relation between a pair of random variables X and Y...

  • Kernel smoother
    Kernel smoother
    A kernel smoother is a statistical technique for estimating a real-valued function f by using its noisy observations, when no parametric model for this function is known...

  • Kernel (statistics)
    Kernel (statistics)
    A kernel is a weighting function used in non-parametric estimation techniques. Kernels are used in kernel density estimation to estimate random variables' density functions, or in kernel regression to estimate the conditional expectation of a random variable. Kernels are also used in time-series,...

  • Khmaladze transformation
    Khmaladze transformation
    The Khmaladze transformation is a statistical tool. Consider the sequence of empirical distribution functions F_n based on a sequence of i.i.d. random variables X_1, …, X_n, as n increases. Suppose F is the hypothetical distribution function of...

     (probability theory)
  • Killed process
    Killed process
    In probability theory — specifically, in stochastic analysis — a killed process is a stochastic process that is forced to assume an undefined or "killed" state at some time.-Definition:...

  • Khintchine inequality
    Khintchine inequality
    In mathematics, the Khintchine inequality, named after Aleksandr Khinchin and spelled in multiple ways in the Roman alphabet, is a theorem from probability, and is also frequently used in analysis...

  • Kingman's formula
  • Kirkwood approximation
    Kirkwood approximation
    The Kirkwood superposition approximation was introduced by Matsuda as a means of representing a discrete probability distribution. The name apparently refers to a 1942 paper by John G. Kirkwood...

  • Kish grid
    Kish grid
    The Kish grid is a method for selecting members within a household to be interviewed. In telephone surveys, the next-birthday method is sometimes preferred to the Kish grid...

  • Kitchen sink regression
    Kitchen sink regression
    A kitchen sink regression is an informal and usually pejorative term for a regression analysis which uses a long list of possible independent variables to attempt to explain variance in a dependent variable. In economics, psychology, and other social sciences, regression analysis is typically used...

  • Knightian uncertainty
    Knightian uncertainty
    In economics, Knightian uncertainty is risk that is immeasurable, not possible to calculate. It is named after University of Chicago economist Frank Knight, who distinguished risk and uncertainty in his work Risk, Uncertainty, and Profit...

  • Kolmogorov backward equation
    Kolmogorov backward equation
    The Kolmogorov backward equation and its adjoint sometimes known as the Kolmogorov forward equation are partial differential equations that arise in the theory of continuous-time continuous-state Markov processes. Both were published by Andrey Kolmogorov in 1931...

  • Kolmogorov continuity theorem
    Kolmogorov continuity theorem
    In mathematics, the Kolmogorov continuity theorem is a theorem that guarantees that a stochastic process that satisfies certain constraints on the moments of its increments will be continuous...

  • Kolmogorov extension theorem
    Kolmogorov extension theorem
    In mathematics, the Kolmogorov extension theorem is a theorem that guarantees that a suitably "consistent" collection of finite-dimensional distributions will define a stochastic process...

  • Kolmogorov’s criterion
    Kolmogorov’s criterion
    In probability theory, Kolmogorov's criterion, named after Andrey Kolmogorov, is a theorem in Markov processes concerning stationary Markov chains...

  • Kolmogorov’s generalized criterion
  • Kolmogorov's inequality
    Kolmogorov's inequality
    In probability theory, Kolmogorov's inequality is a so-called "maximal inequality" that gives a bound on the probability that the partial sums of a finite collection of independent random variables exceed some specified bound...

  • Kolmogorov's zero-one law
    Kolmogorov's zero-one law
    In probability theory, Kolmogorov's zero-one law, named in honor of Andrey Nikolaevich Kolmogorov, specifies that a certain type of event, called a tail event, will either almost surely happen or almost surely not happen; that is, the probability of such an event occurring is zero or one. Tail...

  • Kolmogorov–Smirnov test
  • KPSS test
    KPSS test
    In econometrics, Kwiatkowski–Phillips–Schmidt–Shin tests are used for testing a null hypothesis that an observable time series is stationary around a deterministic trend. Such models were proposed in 1982 by Alok Bhargava in his Ph.D. thesis where several John von Neumann or Durbin–Watson type...

  • Kriging
    Kriging
    Kriging is a group of geostatistical techniques to interpolate the value of a random field at an unobserved location from observations of its value at nearby locations....

  • Kruskal–Wallis one-way analysis of variance
  • Kuder-Richardson Formula 20
    Kuder-Richardson Formula 20
    In statistics, the Kuder-Richardson Formula 20 first published in 1937 is a measure of internal consistency reliability for measures with dichotomous choices. It is analogous to Cronbach's α, except Cronbach's α is also used for non-dichotomous measures...

  • Kuiper's test
    Kuiper's test
    Kuiper's test is used in statistics to test whether a given distribution, or family of distributions, is contradicted by evidence from a sample of data. It is named after Dutch mathematician Nicolaas Kuiper.

  • Kullback's inequality
    Kullback's inequality
    In information theory and statistics, Kullback's inequality is a lower bound on the Kullback–Leibler divergence expressed in terms of the large deviations rate function. If P and Q are probability distributions on the real line, such that P is absolutely continuous with respect to Q, i.e...

  • Kullback–Leibler divergence
    Kullback–Leibler divergence
    In probability theory and information theory, the Kullback–Leibler divergence is a non-symmetric measure of the difference between two probability distributions P and Q...

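    As an illustrative aside (not part of the original list entry), a minimal pure-Python sketch of the discrete Kullback–Leibler divergence D(P||Q) = Σ p_i log(p_i / q_i); the example distributions are made up for illustration:

      import math

      def kl_divergence(p, q):
          """Discrete KL divergence D(P||Q); assumes p[i] > 0 implies q[i] > 0."""
          return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

      p = [0.5, 0.3, 0.2]
      q = [0.4, 0.4, 0.2]
      print(kl_divergence(p, q))   # positive, and not equal to kl_divergence(q, p): the measure is non-symmetric
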
  • Kumaraswamy distribution
    Kumaraswamy distribution
    In probability and statistics, the Kumaraswamy's double bounded distribution is a family of continuous probability distributions defined on the interval [0,1] differing in the values of their two non-negative shape parameters, a and b....

  • Kurtosis
    Kurtosis
    In probability theory and statistics, kurtosis is any measure of the "peakedness" of the probability distribution of a real-valued random variable...

  • Kushner equation
    Kushner equation
    In filtering theory the Kushner equation is an equation for the conditional probability density of the state of a stochastic non-linear dynamical system, given noisy measurements of the state. It therefore provides the solution of the nonlinear filtering problem in estimation theory...


L

  • L-estimator
  • L-moment
    L-moment
    In statistics, L-moments are statistics used to summarize the shape of a probability distribution. They are analogous to conventional moments in that they can be used to calculate quantities analogous to standard deviation, skewness and kurtosis, termed the L-scale, L-skewness and L-kurtosis...

  • Labour Force Survey
    Labour Force Survey
    Labour Force Surveys are statistical surveys conducted in a number of countries designed to capture data about the labour market. All European Union member states are required to conduct a Labour Force Survey annually. Labour Force Surveys are also carried out in some non-EU countries. They are...

  • Lack-of-fit sum of squares
    Lack-of-fit sum of squares
    In statistics, a sum of squares due to lack of fit, or more tersely a lack-of-fit sum of squares, is one of the components of a partition of the sum of squares in an analysis of variance, used in the numerator in an F-test of the null hypothesis that says that a proposed model fits well...

  • Lady tasting tea
    Lady tasting tea
    In the design of experiments in statistics, the lady tasting tea is a famous randomized experiment devised by Ronald A. Fisher and reported in his book Statistical methods for research workers . The lady in question was Dr...

  • Lag operator
  • Lag windowing
    Lag windowing
    Lag windowing is a technique that consists of windowing the auto-correlation coefficients prior to estimating linear prediction coefficients. The windowing in the auto-correlation domain has the same effect as a convolution in the power spectral domain and helps stabilize the result of the...

  • Lambda distribution — disambiguation
  • Landau distribution
  • Lander–Green algorithm
    Lander–Green algorithm
    The Lander–Green algorithm is an algorithm, due to Eric Lander and Philip Green for computing the likelihood of observed genotype data given a pedigree. It is appropriate for relatively small pedigrees and a large number of markers. It is used in the analysis of genetic linkage....

  • Language model
    Language model
    A statistical language model assigns a probability P(w1, ..., wm) to a sequence of m words by means of a probability distribution. Language modeling is used in many natural language processing applications such as speech recognition, machine translation, part-of-speech tagging, parsing and information...

  • Laplace distribution
  • Laplace principle (large deviations theory)
    Laplace principle (large deviations theory)
    In mathematics, Laplace's principle is a basic theorem in large deviations theory, similar to Varadhan's lemma. It gives an asymptotic expression for the Lebesgue integral of exp over a fixed set A as θ becomes large...

  • Large deviations theory
    Large deviations theory
    In probability theory, the theory of large deviations concerns the asymptotic behaviour of remote tails of sequences of probability distributions. Some basic ideas of the theory can be tracked back to Laplace and Cramér, although a clear unified formal definition was introduced in 1966 by Varadhan...

  • Large deviations of Gaussian random functions
    Large deviations of Gaussian random functions
    A random function – of either one variable, or two or more variables – is called Gaussian if every finite-dimensional distribution is a multivariate normal distribution. Gaussian random fields on the sphere are useful when analysing the anomalies in the cosmic microwave background...

  • LARS — see least-angle regression
    Least-angle regression
    In statistics, least-angle regression is a regression algorithm for high-dimensional data, developed by Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani....

  • Latent variable
    Latent variable
    In statistics, latent variables are variables that are not directly observed but are rather inferred from other variables that are observed. Mathematical models that aim to explain observed variables in terms of latent variables are called latent variable models...

  • Latent variable model
    Latent variable model
    A latent variable model is a statistical model that relates a set of variables to a set of latent variables.It is assumed that 1) the responses on the indicators or manifest variables are the result of...

  • Latent class model
    Latent class model
    In statistics, a latent class model relates a set of observed discrete multivariate variables to a set of latent variables. It is a type of latent variable model. It is called a latent class model because the latent variable is discrete...

  • Latent Dirichlet allocation
    Latent Dirichlet allocation
    In statistics, latent Dirichlet allocation is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar...

  • Latent growth modeling
    Latent growth modeling
    Latent growth modeling is a statistical technique used in the structural equation modeling framework to estimate growth trajectory. It is a longitudinal analysis technique to estimate growth over a period of time. It is widely used in the field of behavioral science, education and social science. ...

  • Latent semantic analysis
    Latent semantic analysis
    Latent semantic analysis is a technique in natural language processing, in particular in vectorial semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close...

  • Latin rectangle
    Latin rectangle
    In combinatorial mathematics, a Latin rectangle is an r × n matrix that has the numbers 1, 2, 3, ..., n as its entries with no number occurring more than once in any row or column where r ≤ n. An n × n Latin rectangle is called a...

  • Latin square
    Latin square
    In combinatorics and in experimental design, a Latin square is an n × n array filled with n different symbols, each occurring exactly once in each row and exactly once in each column...

  • Latin hypercube sampling
    Latin hypercube sampling
    Latin hypercube sampling is a statistical method for generating a distribution of plausible collections of parameter values from a multidimensional distribution. The sampling method is often applied in uncertainty analysis....

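    As an illustrative aside, a minimal sketch of Latin hypercube sampling on the unit hypercube: each dimension is split into n equal strata, one point is drawn in each stratum, and the strata are shuffled independently per dimension. The function name and defaults are this sketch's own, not a library API:

      import random

      def latin_hypercube(n, d, rng=random.Random(0)):
          """Return n points in [0,1]^d with exactly one point in each of n equal strata per dimension."""
          columns = []
          for _ in range(d):
              # one uniform draw inside each of the n strata, then shuffle the strata order
              column = [(i + rng.random()) / n for i in range(n)]
              rng.shuffle(column)
              columns.append(column)
          # transpose: a list of n points, each of dimension d
          return list(zip(*columns))

      for point in latin_hypercube(5, 2):
          print(point)
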
  • Law (stochastic processes)
    Law (stochastic processes)
    In mathematics, the law of a stochastic process is the measure that the process induces on the collection of functions from the index set into the state space...

  • Law of averages
    Law of averages
    The law of averages is a lay term used to express a belief that outcomes of a random event will "even out" within a small sample. As invoked in everyday life, the "law" usually reflects bad statistics or wishful thinking rather than any mathematical principle...

  • Law of comparative judgment
    Law of comparative judgment
    The law of comparative judgment was conceived by L. L. Thurstone. In modern day terminology, it is more aptly described as a model that is used to obtain measurements from any process of pairwise comparison...

  • Law of large numbers
    Law of large numbers
    In probability theory, the law of large numbers is a theorem that describes the result of performing the same experiment a large number of times...

  • Law of the iterated logarithm
    Law of the iterated logarithm
    In probability theory, the law of the iterated logarithm describes the magnitude of the fluctuations of a random walk. The original statement of the law of the iterated logarithm is due to A. Y. Khinchin . Another statement was given by A.N...

  • Law of the unconscious statistician
    Law of the unconscious statistician
    In probability theory and statistics, the law of the unconscious statistician is a theorem used to calculate the expected value of a function g of a random variable X when one knows the probability distribution of X but one does not explicitly know the distribution of g(X). The form of the law can...

  • Law of total covariance
    Law of total covariance
    In probability theory, the law of total covariance or covariance decomposition formula states that if X, Y, and Z are random variables on the same probability space, and the covariance of X and Y is finite, then...

  • Law of total cumulance
    Law of total cumulance
    In probability theory and mathematical statistics, the law of total cumulance is a generalization to cumulants of the law of total probability, the law of total expectation, and the law of total variance. It has applications in the analysis of time series...

  • Law of total expectation
    Law of total expectation
    The proposition in probability theory known as the law of total expectation, the law of iterated expectations, the tower rule, or the smoothing theorem, among other names, states that if X is an integrable random variable and Y is any random variable on the same probability space, then E(X) = E(E(X | Y))...

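    As an illustrative aside, a small numerical check of the tower rule E(X) = E(E(X | Y)) on a made-up discrete joint distribution:

      # Joint pmf of (X, Y) over a small finite set; the numbers are chosen only for illustration.
      joint = {(0, 'a'): 0.1, (1, 'a'): 0.3, (0, 'b'): 0.2, (1, 'b'): 0.4}

      ex_direct = sum(x * p for (x, _), p in joint.items())

      # E[E[X | Y]]: condition on Y, average X within each slice, then weight by P(Y = y).
      ex_iterated = 0.0
      for y0 in {y for (_, y) in joint}:
          p_y = sum(p for (_, y), p in joint.items() if y == y0)
          e_x_given_y = sum(x * p for (x, y), p in joint.items() if y == y0) / p_y
          ex_iterated += e_x_given_y * p_y

      print(ex_direct, ex_iterated)   # both equal 0.7
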
  • Law of total probability
    Law of total probability
    In probability theory, the law of total probability is a fundamental rule relating marginal probabilities to conditional probabilities. It is the proposition that if {Bn : n = 1, 2, 3, ...} is a finite or countably infinite partition of a sample space, then for any event A, P(A) = Σn P(A | Bn) P(Bn).

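    As an illustrative aside, a tiny worked example of the rule P(A) = Σn P(A | Bn) P(Bn), using made-up numbers for a three-set partition:

      # Partition B1, B2, B3 of the sample space, with P(Bn) and P(A | Bn) given (illustrative numbers).
      p_b = [0.5, 0.3, 0.2]
      p_a_given_b = [0.1, 0.4, 0.9]

      p_a = sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))
      print(p_a)   # 0.05 + 0.12 + 0.18 = 0.35
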
  • Law of total variance
    Law of total variance
    In probability theory, the law of total variance or variance decomposition formula states that if X and Y are random variables on the same probability space, and the variance of Y is finite, then...

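    As an illustrative aside, a numerical check of the decomposition Var(Y) = E[Var(Y | X)] + Var(E[Y | X]) on a made-up discrete joint distribution:

      # Small discrete joint distribution of (X, Y); the values are illustrative only.
      joint = {(0, 1): 0.2, (0, 3): 0.3, (1, 2): 0.1, (1, 6): 0.4}

      def e(f):
          return sum(f(x, y) * p for (x, y), p in joint.items())

      var_y = e(lambda x, y: y ** 2) - e(lambda x, y: y) ** 2

      within, between_terms = 0.0, []
      for x0 in {x for (x, _) in joint}:
          p_x = sum(p for (x, _), p in joint.items() if x == x0)
          ey = sum(y * p for (x, y), p in joint.items() if x == x0) / p_x
          ey2 = sum(y ** 2 * p for (x, y), p in joint.items() if x == x0) / p_x
          within += (ey2 - ey ** 2) * p_x          # contributes to E[Var(Y | X)]
          between_terms.append((ey, p_x))
      mean_ey = sum(ey * p for ey, p in between_terms)
      between = sum((ey - mean_ey) ** 2 * p for ey, p in between_terms)   # Var(E[Y | X])

      print(var_y, within + between)   # equal up to floating-point rounding (both 4.01)
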
  • Law of Truly Large Numbers
    Law of Truly Large Numbers
    The law of truly large numbers, attributed to Persi Diaconis and Frederick Mosteller, states that with a sample size large enough, any outrageous thing is likely to happen. Because we never find it notable when likely events occur, we highlight unlikely events and notice them more...

  • Layered hidden Markov model
    Layered hidden Markov model
    The layered hidden Markov model is a statistical model derived from the hidden Markov model. A layered hidden Markov model consists of N levels of HMMs, where the HMMs on level i + 1 correspond to observation symbols or probability generators at level i. Every level i of the LHMM...

  • Le Cam's theorem
    Le Cam's theorem
    In probability theory, Le Cam's theorem, named after Lucien Le Cam, is as follows. Suppose X1, ..., Xn are independent random variables, each with a Bernoulli distribution (not necessarily identically distributed), with Pr(Xi = 1) = pi for i = 1, 2, ..., n; let λn = p1 + ... + pn and Sn = X1 + ...

  • Lead time bias
    Lead time bias
    Lead time is the length of time between the detection of a disease and its usual clinical presentation and diagnosis ....

  • Least absolute deviations
    Least absolute deviations
    Least absolute deviations , also known as Least Absolute Errors , Least Absolute Value , or the L1 norm problem, is a mathematical optimization technique similar to the popular least squares technique that attempts to find a function which closely approximates a set of data...

  • Least-angle regression
    Least-angle regression
    In statistics, least-angle regression is a regression algorithm for high-dimensional data, developed by Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani....

  • Least squares
    Least squares
    The method of least squares is a standard approach to the approximate solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns. "Least squares" means that the overall solution minimizes the sum of the squares of the errors made in solving every...

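    As an illustrative aside, a minimal sketch of solving an overdetermined linear system in the least-squares sense; it assumes NumPy is available and uses np.linalg.lstsq, which minimises the sum of squared residuals. The data are made up for illustration:

      import numpy as np

      # Overdetermined system: more equations (rows) than unknowns (columns).
      A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # intercept-plus-slope design
      b = np.array([1.1, 1.9, 3.2, 3.9])

      # Find x minimising ||A x - b||^2.
      x, residual_ss, rank, _ = np.linalg.lstsq(A, b, rcond=None)
      print(x)             # fitted intercept and slope
      print(residual_ss)   # sum of squared residuals
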
  • Least-squares spectral analysis
    Least-squares spectral analysis
    Least-squares spectral analysis is a method of estimating a frequency spectrum, based on a least squares fit of sinusoids to data samples, similar to Fourier analysis...

  • Least squares support vector machine
    Least squares support vector machine
    Least squares support vector machines are least squares versions of support vector machines , which are a set of related supervised learning methods that analyze data and recognize patterns, and which are used for classification and regression analysis...

  • Least trimmed squares
    Least Trimmed Squares
    Least trimmed squares , or least trimmed sum of squares, is a robust statistical method that attempts to fit a function to a set of data whilst not being unduly affected by the presence of outliers...

  • Learning theory (statistics)
  • Leftover hash-lemma
    Leftover hash-lemma
    The leftover hash lemma is a lemma in cryptography first stated by Russell Impagliazzo, Leonid Levin, and Michael Luby.Imagine that you have a secret key X that has n uniform random bits, and you would like to use this secret key to encrypt a message. Unfortunately, you were a bit careless with the...

  • Lehmann–Scheffé theorem
    Lehmann–Scheffé theorem
    In statistics, the Lehmann–Scheffé theorem is prominent in mathematical statistics, tying together the ideas of completeness, sufficiency, uniqueness, and best unbiased estimation...

  • Length time bias
    Length time bias
    Length time bias is a form of selection bias, a statistical distortion of results which can lead to incorrect conclusions about the data. Length time bias can occur when the lengths of intervals are analysed by selecting intervals that occupy randomly chosen points in time or space...

  • Levene's test
    Levene's test
    In statistics, Levene's test is an inferential statistic used to assess the equality of variances in different samples. Some common statistical procedures assume that variances of the populations from which different samples are drawn are equal. Levene's test assesses this assumption. It tests the...

  • Level of measurement
    Level of measurement
    The "levels of measurement", or scales of measure are expressions that typically refer to the theory of scale types developed by the psychologist Stanley Smith Stevens. Stevens proposed his theory in a 1946 Science article titled "On the theory of scales of measurement"...

  • Levenberg–Marquardt algorithm
  • Leverage (statistics)
    Leverage (statistics)
    In statistics, leverage is a term used in connection with regression analysis and, in particular, in analyses aimed at identifying those observations that are far away from corresponding average predictor values...

  • Levey–Jennings chart — redirects to Laboratory quality control
    Laboratory quality control
    Laboratory quality control is designed to detect, reduce, and correct deficiencies in a laboratory's internal analytical process prior to the release of patient results and improve the quality of the results reported by the laboratory. Quality control is a measure of precision or how well the...

  • Lévy's convergence theorem
  • Lévy's continuity theorem
    Lévy's continuity theorem
    In probability theory, the Lévy’s continuity theorem, named after the French mathematician Paul Lévy, connects convergence in distribution of the sequence of random variables with pointwise convergence of their characteristic functions...

  • Lévy arcsine law
    Lévy arcsine law
    In probability theory, the Lévy arcsine law, found by Paul Lévy, states that the probability distribution of the proportion of the time that a Wiener process is positive is a random variable whose probability distribution is the arcsine distribution...

  • Lévy distribution
  • Lévy flight
    Lévy flight
    A Lévy flight is a random walk in which the step-lengths have a probability distribution that is heavy-tailed. When defined as a walk in a space of dimension greater than one, the steps made are in isotropic random directions...

  • Lévy process
    Lévy process
    In probability theory, a Lévy process, named after the French mathematician Paul Lévy, is any continuous-time stochastic process that starts at 0, admits càdlàg modification and has "stationary independent increments" — this phrase will be explained below...

  • Lewontin's Fallacy
    Lewontin's Fallacy
    Human genetic diversity: Lewontin's fallacy is a 2003 paper by A. W. F. Edwards that refers to an argument first made by Richard Lewontin in his 1972 article The apportionment of human diversity, which argued that race for humans is not a valid taxonomic construct. Edwards' paper criticized and...

  • Lexis diagram
    Lexis diagram
    In demography a Lexis diagram is a two dimensional diagram that is used to represent events that occur to individuals belonging to different cohorts...

  • Lexis ratio
    Lexis ratio
    The Lexis ratio is used in statistics as a measure which seeks to evaluate differences between the statistical properties of random mechanisms where the outcome is two-valued — for example "success" or "failure", "win" or "lose"...

  • Lies, damned lies, and statistics
    Lies, damned lies, and statistics
    "Lies, damned lies, and statistics" is a phrase describing the persuasive power of numbers, particularly the use of statistics to bolster weak arguments...

  • Life expectancy
    Life expectancy
    Life expectancy is the expected number of years of life remaining at a given age. It is denoted by ex, which means the average number of subsequent years of life for someone now aged x, according to a particular mortality experience...

  • Life table
    Life table
    In actuarial science, a life table is a table which shows, for each age, what the probability is that a person of that age will die before his or her next birthday...

  • Lift (data mining)
    Lift (data mining)
    In data mining, lift is a measure of the performance of a model at predicting or classifying cases, measuring against a random choice model.For example, suppose a population has a predicted response rate of 5%, but a certain model has identified a segment with a predicted response rate of 20%...

  • Likelihood function
    Likelihood function
    In statistics, a likelihood function is a function of the parameters of a statistical model, defined as follows: the likelihood of a set of parameter values given some observed outcomes is equal to the probability of those observed outcomes given those parameter values...

  • Likelihood principle
    Likelihood principle
    In statistics, the likelihood principle is a controversial principle of statistical inference which asserts that all of the information in a sample is contained in the likelihood function...

  • Likelihood-ratio test
    Likelihood-ratio test
    In statistics, a likelihood ratio test is a statistical test used to compare the fit of two models, one of which is a special case of the other . The test is based on the likelihood ratio, which expresses how many times more likely the data are under one model than the other...

  • Likelihood ratios in diagnostic testing
    Likelihood ratios in diagnostic testing
    In evidence-based medicine, likelihood ratios are used for assessing the value of performing a diagnostic test. They use the sensitivity and specificity of the test to determine whether a test result usefully changes the probability that a condition exists.-Calculation:Two versions of the...

  • Likert scale
    Likert scale
    A Likert scale is a psychometric scale commonly involved in research that employs questionnaires. It is the most widely used approach to scaling responses in survey research, such that the term is often used interchangeably with rating scale, or more accurately the Likert-type scale, even though...

  • Lilliefors test
    Lilliefors test
    In statistics, the Lilliefors test, named after Hubert Lilliefors, professor of statistics at George Washington University, is an adaptation of the Kolmogorov–Smirnov test...

  • Limited dependent variable
    Limited dependent variable
    A limited dependent variable is a variable whose range of possible values is "restricted in some important way." In econometrics, the term is often used when estimation of the relationship between the limited dependent variable...

  • Limiting density of discrete points
    Limiting density of discrete points
    In information theory, the limiting density of discrete points is an adjustment to the formula of Claude Elwood Shannon for differential entropy. It was formulated by Edwin Thompson Jaynes to address defects in the initial definition of differential entropy...

  • Lincoln index
    Lincoln Index
    The Lincoln index is a statistical measure used in several fields to estimate the number of cases that have not yet been observed, based on two independent sets of observed cases. It is also sometimes known as the Lincoln–Petersen method...

  • Lindeberg's condition
    Lindeberg's condition
    In probability theory, Lindeberg's condition is a sufficient condition for the central limit theorem to hold for a sequence of independent random variables...

  • Lindley equation
    Lindley equation
    In probability theory, the Lindley equation, Lindley recursion or Lindley process is a discrete-time stochastic process A_n, where n takes integer values, and...

  • Lindley's paradox
    Lindley's paradox
    Lindley's paradox is a counterintuitive situation in statistics in which the Bayesian and frequentist approaches to a hypothesis testing problem give opposite results for certain choices of the prior distribution...

  • Line chart
    Line chart
    A line chart or line graph is a type of graph, which displays information as a series of data points connected by straight line segments. It is a basic type of chart common in many fields. It is an extension of a scatter graph, and is created by connecting a series of points that represent...

  • Line-intercept sampling
    Line-intercept sampling
    In statistics, line-intercept sampling is a method of sampling elements in a region whereby an element is sampled if a chosen line segment, called a “transect”, intersects the element ....

  • Linear classifier
    Linear classifier
    In the field of machine learning, the goal of statistical classification is to use an object's characteristics to identify which class it belongs to. A linear classifier achieves this by making a classification decision based on the value of a linear combination of the characteristics...

  • Linear discriminant analysis
    Linear discriminant analysis
    Linear discriminant analysis and the related Fisher's linear discriminant are methods used in statistics, pattern recognition and machine learning to find a linear combination of features which characterizes or separates two or more classes of objects or events...

  • Linear least squares — disambiguation
  • Linear least squares (mathematics)
  • Linear model
    Linear model
    In statistics, the term linear model is used in different ways according to the context. The most common occurrence is in connection with regression models and the term is often taken as synonymous with linear regression model. However the term is also used in time series analysis with a different...

  • Linear prediction
    Linear prediction
    Linear prediction is a mathematical operation where future values of a discrete-time signal are estimated as a linear function of previous samples....

  • Linear probability model
  • Linear regression
    Linear regression
    In statistics, linear regression is an approach to modeling the relationship between a scalar variable y and one or more explanatory variables denoted X. The case of one explanatory variable is called simple regression...

  • Linguistic demography
  • LISREL
    LISREL
    LISREL, an acronym for linear structural relations, is a statistical software package used in structural equation modeling. LISREL was developed in the 1970s by Karl Jöreskog, then a scientist at Educational Testing Service in Princeton, NJ, and Dag Sörbom, later both professors of Uppsala University,...

     — proprietary statistical software package
  • List of basic statistics topics — redirects to Outline of statistics
  • List of convolutions of probability distributions
  • List of graphical methods
  • List of information graphics software
  • List of probability topics
  • List of random number generators
  • List of scientific journals in statistics
  • List of statistical packages
  • List of statisticians
  • Listwise deletion
  • Little's law
    Little's law
    In the mathematical theory of queues, Little's result, theorem, lemma, law or formula says that the long-term average number of customers in a stable system, L, equals the long-term average arrival rate, λ, multiplied by the average time a customer spends in the system, W; that is, L = λW. It is a restatement of the Erlang formula, based on the work of Danish mathematician Agner Krarup Erlang...

  • Littlewood's law
    Littlewood's law
    Littlewood's Law states that individuals can expect a "miracle" to happen to them at the rate of about one per month. The law was framed by Cambridge University Professor J. E...

  • Ljung–Box test
    Ljung–Box test
    The Ljung–Box test is a type of statistical test of whether any of a group of autocorrelations of a time series are different from zero...

  • Local convex hull
  • Local independence
    Local independence
    Local independence is the underlying assumption of latent variable models. The observed items are conditionally independent of each other given an individual score on the latent variable. This means that the latent variable explains why the observed items are related to one another...

  • Local martingale
    Local martingale
    In mathematics, a local martingale is a type of stochastic process, satisfying the localized version of the martingale property. Every martingale is a local martingale; every bounded local martingale is a martingale; however, in general a local martingale is not a martingale, because its...

  • Local regression
    Local regression
    LOESS, or LOWESS , is one of many "modern" modeling methods that build on "classical" methods, such as linear and nonlinear least squares regression. Modern regression methods are designed to address situations in which the classical procedures do not perform well or cannot be effectively applied...

  • Location estimation redirects to Location parameter
    Location parameter
    In statistics, a location family is a class of probability distributions that is parametrized by a scalar- or vector-valued parameter μ, which determines the "location" or shift of the distribution...

  • Location estimation in sensor networks
    Location estimation in sensor networks
    Location estimation in wireless sensor networks is the problem of estimating the location of an object from a set of noisy measurements, when the measurements are acquired in a distributed manner by a set of sensors...

  • Location parameter
    Location parameter
    In statistics, a location family is a class of probability distributions that is parametrized by a scalar- or vector-valued parameter μ, which determines the "location" or shift of the distribution...

  • Location test
    Location test
    A location test is a statistical hypothesis test that compares the location parameter of a statistical population to a given constant, or that compares the location parameters of two statistical populations to each other...

  • Location-scale family
    Location-scale family
    In probability theory, especially as that field is used in statistics, a location-scale family is a family of univariate probability distributions parametrized by a location parameter and a non-negative scale parameter; if X is any random variable whose probability distribution belongs to such a...

  • Local asymptotic normality
    Local asymptotic normality
    In statistics, local asymptotic normality is a property of a sequence of statistical models, which allows this sequence to be asymptotically approximated by a normal location model, after a rescaling of the parameter...

  • Locality (statistics)
  • Loess curve redirects to Local regression
    Local regression
    LOESS, or LOWESS , is one of many "modern" modeling methods that build on "classical" methods, such as linear and nonlinear least squares regression. Modern regression methods are designed to address situations in which the classical procedures do not perform well or cannot be effectively applied...

  • Log-Cauchy distribution
    Log-Cauchy distribution
    In probability theory, a log-Cauchy distribution is a probability distribution of a random variable whose logarithm is distributed in accordance with a Cauchy distribution...

  • Log-Laplace distribution
    Log-Laplace distribution
    In probability theory and statistics, the log-Laplace distribution is the probability distribution of a random variable whose logarithm has a Laplace distribution. If X has a Laplace distribution with parameters μ and b, then Y = eX has a log-Laplace distribution...

  • Log-normal distribution
  • Log-linear model
  • Log-linear modeling
  • Log-log graph
    Log-log graph
    In science and engineering, a log-log graph or log-log plot is a two-dimensional graph of numerical data that uses logarithmic scales on both the horizontal and vertical axes...

  • Log-logistic distribution
    Log-logistic distribution
    In probability and statistics, the log-logistic distribution is a continuous probability distribution for a non-negative random variable. It is used in survival analysis as a parametric model for events whose rate increases initially and decreases later, for example mortality from cancer following...

  • Logarithmic distribution
  • Logarithmic mean
    Logarithmic mean
    In mathematics, the logarithmic mean is a function of two non-negative numbers which is equal to their difference divided by the logarithm of their quotient...

  • Logistic distribution
  • Logistic function
    Logistic function
    A logistic function or logistic curve is a common sigmoid curve, given its name in 1844 or 1845 by Pierre François Verhulst who studied it in relation to population growth. It can model the "S-shaped" curve of growth of some population P...

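    As an illustrative aside, a minimal sketch of the logistic curve L / (1 + exp(-k (x - x0))); the parameter defaults give the standard sigmoid and are chosen only for illustration:

      import math

      def logistic(x, L=1.0, k=1.0, x0=0.0):
          """General logistic curve L / (1 + exp(-k (x - x0))); defaults give the standard sigmoid."""
          return L / (1.0 + math.exp(-k * (x - x0)))

      for x in (-4, -2, 0, 2, 4):
          print(x, round(logistic(x), 4))   # S-shaped: values rise from near 0 to near 1
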
  • Logistic regression
    Logistic regression
    In statistics, logistic regression is used for prediction of the probability of occurrence of an event by fitting data to a logistic curve. It is a generalized linear model used for binomial regression...

  • Logit
    Logit
    The logit function is the inverse of the sigmoidal "logistic" function used in mathematics, especially in statistics. Log-odds and logit are synonyms. The logit of a number p between 0 and 1 is given by the formula logit(p) = log(p / (1 - p))...

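    As an illustrative aside, a minimal sketch of the log-odds formula logit(p) = log(p / (1 - p)) and of the fact that it inverts the logistic function:

      import math

      def logit(p):
          """Log-odds of a probability p in (0, 1)."""
          return math.log(p / (1.0 - p))

      def logistic(x):
          return 1.0 / (1.0 + math.exp(-x))

      p = 0.8
      print(logit(p))             # log(4) ≈ 1.386
      print(logistic(logit(p)))   # recovers 0.8, so logit is the inverse of the logistic function
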
  • Logit analysis in marketing
    Logit analysis in marketing
    Logit analysis is a statistical technique used by marketers to assess the scope of customer acceptance of a product, particularly a new product. It attempts to determine the intensity or magnitude of customers' purchase intentions and translates that into a measure of actual buying behaviour...

  • Logit-normal distribution
  • Lognormal distribution
  • Logrank test
    Logrank test
    In statistics, the logrank test is a hypothesis test to compare the survival distributions of two samples. It is a nonparametric test and appropriate to use when the data are right skewed and censored...

  • Lomax distribution
  • Long-range dependency
    Long-range dependency
    Long-range dependency is a phenomenon that may arise in the analysis of spatial or time series data. It relates to the rate of decay of statistical dependence, with the implication that this decays more slowly than an exponential decay, typically a power-like decay...

  • Long Tail
    Long tail
    Long tail may refer to:*The Long Tail, a consumer demographic in business*Power law's long tail, a statistics term describing certain kinds of distribution*Long-tail boat, a type of watercraft native to Southeast Asia...

  • Long-tail traffic
    Long-tail traffic
    Long-tail traffic analysis covers a range of tools from different disciplines that may be used in the important science of determining the probability of rare events...

  • Longitudinal study
    Longitudinal study
    A longitudinal study is a correlational research study that involves repeated observations of the same variables over long periods of time — often many decades. It is a type of observational study. Longitudinal studies are often used in psychology to study developmental trends across the...

  • Lorenz curve
    Lorenz curve
    In economics, the Lorenz curve is a graphical representation of the cumulative distribution function of the empirical probability distribution of wealth; it is a graph showing the proportion of the distribution assumed by the bottom y% of the values...

  • Loss function
    Loss function
    In statistics and decision theory a loss function is a function that maps an event onto a real number intuitively representing some "cost" associated with the event. Typically it is used for parameter estimation, and the event in question is some function of the difference between estimated and...

  • Lot quality assurance sampling
    Lot Quality Assurance Sampling
    Lot quality assurance sampling is a simple, low-cost random sampling methodology developed in the 1920s to control the quality of output in industrial production processes....

  • Lotka's law
  • Low birth weight paradox
    Low birth weight paradox
    The low birth weight paradox is an apparently paradoxical observation relating to the birth weights and mortality of children born to tobacco smoking mothers. Low birth weight children born to smoking mothers have a lower infant mortality rate than the low birth weight children of non-smokers...

  • Lucia de Berk
    Lucia de Berk
    Lucia de Berk, often called Lucia de B. or Lucy de B, is a Dutch licensed paediatric nurse who was the subject of a miscarriage of justice. She was sentenced to life imprisonment in 2003 for four murders and three attempted murders of patients in her care...

     – prob/stats related court case
  • Lukacs's proportion-sum independence theorem
    Lukacs's proportion-sum independence theorem
    In statistics, Lukacs's proportion-sum independence theorem is a result that is used when studying proportions, in particular the Dirichlet distribution...

  • Lumpability
    Lumpability
    In probability theory, lumpability is a method for reducing the size of the state space of some continuous-time Markov chains, first published by Kemeny and Snell...

  • Lusser's law
    Lusser's Law
    Lusser's law, named after Robert Lusser, is a prediction of reliability. It is also called the "probability product law of series components". It states that the reliability of a series system is equal to the product of the reliability of its component subsystems, if their...

  • Lyapunov's central limit theorem

M

  • M/G/1 model
  • M/M/1 model
    M/M/1 model
    In queueing theory, a discipline within the mathematical theory of probability, an M/M/1 queue represents the queue length in a system having a single server, where arrivals are determined by a Poisson process and job service times have an exponential distribution. The model name is written in...

  • M/M/c model
    M/M/c model
    In the mathematical theory of random processes, the M/M/c queue is a multi-server queue model. It is a generalisation of the M/M/1 queue.Following Kendall's notation it indicates a system where:*Arrivals are a Poisson process...

  • M-estimator
    M-estimator
    In statistics, M-estimators are a broad class of estimators, which are obtained as the minima of sums of functions of the data. Least-squares estimators and many maximum-likelihood estimators are M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new...

    • Redescending M-estimator
      Redescending M-estimator
      In statistics, Redescending M-estimators are Ψ-type M-estimators which have Ψ functions that are non-decreasing near the origin, but decreasing toward 0 far from the origin...

  • M-separation
    M-separation
    In statistics, m-separation is a measure of disconnectedness in ancestral graphs and a generalization of d-separation for directed acyclic graphs. It is the opposite of m-connectedness....

  • Machine learning
    Machine learning
    Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases...

  • Mahalanobis distance
    Mahalanobis distance
    In statistics, Mahalanobis distance is a distance measure introduced by P. C. Mahalanobis in 1936. It is based on correlations between variables by which different patterns can be identified and analyzed. It gauges similarity of an unknown sample set to a known one. It differs from Euclidean...

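    As an illustrative aside, a minimal sketch of the Mahalanobis distance sqrt((x - mu)^T S^-1 (x - mu)); it assumes NumPy is available, and the data points are made up for illustration:

      import numpy as np

      def mahalanobis(x, mean, cov):
          """Mahalanobis distance of point x from a distribution with the given mean and covariance."""
          diff = np.asarray(x) - np.asarray(mean)
          return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

      data = np.array([[1.0, 2.0], [2.0, 2.5], [3.0, 4.0], [4.0, 4.5], [5.0, 6.0]])
      mean = data.mean(axis=0)
      cov = np.cov(data, rowvar=False)   # sample covariance, columns as variables
      print(mahalanobis([4.0, 3.0], mean, cov))
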
  • Main effect
    Main effect
    In the design of experiments and analysis of variance, a main effect is the effect of an independent variable on a dependent variable averaging across the levels of any other independent variables...

  • Mallows' Cp
    Mallows' Cp
    In statistics, Mallows' Cp, named for Colin L. Mallows, is used to assess the fit of a regression model that has been estimated using ordinary least squares. It is applied in the context of model selection, where a number of predictor variables are available for predicting some outcome, and the...

  • MANCOVA
    MANCOVA
    Multivariate analysis of covariance is an extension of analysis of covariance methods to cover cases where there is more than one dependent variable and where the dependent variables cannot simply be combined....

  • Manhattan plot
    Manhattan plot
    A Manhattan plot is a type of scatter plot, usually used to display data with a large number of data-points - many of non-zero amplitude, and with a distribution of higher-magnitude values, for instance in genome-wide association studies...

  • Mann–Whitney U
  • MANOVA
    MANOVA
    Multivariate analysis of variance is a generalized form of univariate analysis of variance . It is used when there are two or more dependent variables. It helps to answer : 1. do changes in the independent variable have significant effects on the dependent variables; 2. what are the interactions...

  • Mantel test
    Mantel test
    The Mantel test, named after Nathan Mantel, is a statistical test of the correlation between two matrices. The matrices must be of the same rank; in most applications, they are matrices of interrelations between the same vectors of objects...

  • MAP estimator — redirects to Maximum a posteriori estimation
  • Marchenko–Pastur distribution
    Marchenko–Pastur distribution
    In random matrix theory, the Marchenko–Pastur distribution, or Marchenko–Pastur law, describes the asymptotic behavior of singular values of large rectangular random matrices...

  • Marcinkiewicz–Zygmund inequality
    Marcinkiewicz–Zygmund inequality
    In mathematics, the Marcinkiewicz–Zygmund inequality, named after Józef Marcinkiewicz and Antoni Zygmund, gives relations between moments of a collection of independent random variables...

  • Marcum Q-function
  • Margin of error
    Margin of error
    The margin of error is a statistic expressing the amount of random sampling error in a survey's results. The larger the margin of error, the less faith one should have that the poll's reported results are close to the "true" figures; that is, the figures for the whole population...

  • Marginal distribution
    Marginal distribution
    In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. The term marginal variable is used to refer to those variables in the subset of variables being retained...

  • Marginal likelihood
    Marginal likelihood
    In statistics, a marginal likelihood function, or integrated likelihood, is a likelihood function in which some parameter variables have been marginalised...

  • Marginal model
    Marginal model
    In statistics, marginal models are a technique for obtaining regression estimates in multilevel modeling, also called hierarchical linear models....

  • Marginal variable — redirects to Marginal distribution
    Marginal distribution
    In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. The term marginal variable is used to refer to those variables in the subset of variables being retained...

  • Mark and recapture
    Mark and recapture
    Mark and recapture is a method commonly used in ecology to estimate population size. This method is most valuable when a researcher fails to detect all individuals present within a population of interest every time that researcher visits the study area...

  • Markov additive process
  • Markov blanket
    Markov blanket
    In machine learning, the Markov blanket for a node A in a Bayesian network is the set of nodes ∂A composed of A's parents, its children, and its children's other parents. In a Markov network, the Markov blanket of a node is its set of neighbouring nodes...

  • Markov chain
    Markov chain
    A Markov chain, named after Andrey Markov, is a mathematical system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a random process characterized as memoryless: the next state depends only on the current state and not on the...

    • Markov chain geostatistics
      Markov chain geostatistics
      Markov chain geostatistics refer to the Markov chain models, simulation algorithms and associated spatial correlation measures based on the Markov chain random field theory, which extends a single Markov chain into a multi-dimensional field for geostatistical modeling. A Markov chain random field...

    • Markov chain mixing time
      Markov chain mixing time
      In probability theory, the mixing time of a Markov chain is the time until the Markov chain is "close" to its steady state distribution.More precisely, a fundamental result about Markov chains is that a finite state irreducible aperiodic chain has a unique stationary distribution π and,...

  • Markov chain Monte Carlo
    Markov chain Monte Carlo
    Markov chain Monte Carlo methods are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a large number of steps is then used as a sample of the...

  • Markov decision process
    Markov decision process
    Markov decision processes , named after Andrey Markov, provide a mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying a wide range of optimization problems solved via...

  • Markov information source
    Markov information source
    In mathematics, a Markov information source, or simply, a Markov source, is an information source whose underlying dynamics are given by a stationary finite Markov chain...

  • Markov kernel
    Markov kernel
    In probability theory, a Markov kernel is a map that plays the role, in the general theory of Markov processes, that the transition matrix does in the theory of Markov processes with a finite state space...

  • Markov logic network
    Markov logic network
    A Markov logic network is a probabilistic logic which applies the ideas of a Markov network to first-order logic, enabling uncertain inference...

  • Markov model
    Markov model
    In probability theory, a Markov model is a stochastic model that assumes the Markov property. Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable...

  • Markov network
    Markov network
    A Markov random field, Markov network or undirected graphical model is a set of variables having a Markov property described by an undirected graph. A Markov random field is similar to a Bayesian network in its representation of dependencies...

  • Markov process
    Markov process
    In probability theory and statistics, a Markov process, named after the Russian mathematician Andrey Markov, is a time-varying random phenomenon for which a specific property holds...

  • Markov property
    Markov property
    In probability theory and statistics, the term Markov property refers to the memoryless property of a stochastic process. It was named after the Russian mathematician Andrey Markov....

  • Markov random field
  • Markov's inequality
    Markov's inequality
    In probability theory, Markov's inequality gives an upper bound for the probability that a non-negative function of a random variable is greater than or equal to some positive constant...

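    As an illustrative aside, a small simulation check of the bound P(X ≥ a) ≤ E(X)/a for a non-negative random variable; the exponential sample and the choice a = 3 are illustrative only:

      import random

      rng = random.Random(0)
      n, a = 100_000, 3.0
      draws = [rng.expovariate(1.0) for _ in range(n)]   # non-negative draws with mean 1

      empirical = sum(x >= a for x in draws) / n
      bound = (sum(draws) / n) / a                       # E(X)/a, estimated from the same sample
      print(empirical, bound)                            # the empirical tail probability sits below the bound
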
  • Markovian arrival processes
    Markovian arrival processes
    In queueing theory, Markovian arrival processes are used to model the arrival of customers to a queue. Some of the most common include the Poisson process, Markov arrival process and the batch Markov arrival process...

  • Marsaglia polar method
    Marsaglia polar method
    The polar method is a pseudo-random number sampling method for generating a pair of independent standard normal random variables...

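    As an illustrative aside, a minimal sketch of the polar (rejection) method for generating a pair of independent standard normal variates; the function name is this sketch's own:

      import math
      import random

      def marsaglia_polar(rng=random.Random(0)):
          """Return one pair of independent standard normal variates via the polar method."""
          while True:
              u = 2.0 * rng.random() - 1.0
              v = 2.0 * rng.random() - 1.0
              s = u * u + v * v
              if 0.0 < s < 1.0:   # accept points inside the unit circle, excluding the origin
                  factor = math.sqrt(-2.0 * math.log(s) / s)
                  return u * factor, v * factor

      print(marsaglia_polar())
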
  • Martingale (probability theory)
    Martingale (probability theory)
    In probability theory, a martingale is a model of a fair game where no knowledge of past events can help to predict future winnings. In particular, a martingale is a sequence of random variables for which, at a particular time in the realized sequence, the expectation of the next value in the...

  • Martingale difference sequence
    Martingale difference sequence
    In probability theory, a martingale difference sequence is related to the concept of the martingale. A stochastic series Y is an MDS if its expectation with respect to past values of another stochastic series X is zero...

  • Martingale representation theorem
    Martingale representation theorem
    In probability theory, the martingale representation theorem states that a random variable which is measurable with respect to the filtration generated by a Brownian motion can be written in terms of an Itô integral with respect to this Brownian motion....

  • Master equation
    Master equation
    In physics and chemistry and related fields, master equations are used to describe the time-evolution of a system that can be modelled as being in exactly one of a countable number of states at any given time, and where switching between states is treated probabilistically...

  • Matched filter
    Matched filter
    In telecommunications, a matched filter is obtained by correlating a known signal, or template, with an unknown signal to detect the presence of the template in the unknown signal. This is equivalent to convolving the unknown signal with a conjugated time-reversed version of the template...

  • Matching pursuit
    Matching pursuit
    Matching pursuit is a type of numerical technique which involves finding the "best matching" projections of multidimensional data onto an over-complete dictionary D...

  • Matching (statistics)
    Matching (statistics)
    Matching is a statistical technique which is used to evaluate the effect of a treatment by comparing the treated and the non-treated in a non-experimental design. People use this technique with observational data...

  • Matérn covariance function
    Matérn covariance function
    In statistics, the Matérn covariance is a covariance function used in spatial statistics, geostatistics, machine learning, image analysis, and other applications of multivariate statistical analysis on metric spaces...

  • Mathematica
    Mathematica
    Mathematica is a computational software program used in scientific, engineering, and mathematical fields and other areas of technical computing...

     – software
  • Mathematical biology
    Mathematical biology
    Mathematical and theoretical biology is an interdisciplinary scientific research field with a range of applications in biology, medicine and biotechnology...

  • Mathematical modelling in epidemiology
    Mathematical modelling in epidemiology
    It is possible to mathematically model the progress of most infectious diseases to discover the likely outcome of an epidemic or to help manage them by vaccination...

  • Mathematical modelling of infectious disease
  • Mathematical statistics
    Mathematical statistics
    Mathematical statistics is the study of statistics from a mathematical standpoint, using probability theory as well as other branches of mathematics such as linear algebra and analysis...

  • Matthews correlation coefficient
    Matthews Correlation Coefficient
    The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes...

  • Matrix normal distribution
    Matrix normal distribution
    The matrix normal distribution is a probability distribution that is a generalization of the normal distribution to matrix-valued random variables...

  • Matrix population models
    Matrix population models
    Population models are used in population ecology to model the dynamics of wildlife or human populations. Matrix population models are a specific type of population model that uses matrix algebra...

  • Mauchly's sphericity test
    Mauchly's sphericity test
    Mauchly's sphericity test is a statistical test used to validate repeated measures factor ANOVAs. The test was introduced by ENIAC co-inventor John Mauchly in 1940...

  • Maximal ergodic theorem
  • Maximum a posteriori estimation
  • Maximum entropy classifier redirects to Logistic regression
    Logistic regression
    In statistics, logistic regression is used for prediction of the probability of occurrence of an event by fitting data to a logistic curve. It is a generalized linear model used for binomial regression...

  • Maximum entropy Markov model
    Maximum entropy Markov model
    In machine learning, a maximum-entropy Markov model , or conditional Markov model , is a graphical model for sequence labeling that combines features of hidden Markov models and maximum entropy models...

  • Maximum entropy method redirects to Principle of maximum entropy
    Principle of maximum entropy
    In Bayesian probability, the principle of maximum entropy is a postulate which states that, subject to known constraints, the probability distribution which best represents the current state of knowledge is the one with the largest entropy. Let some testable information about a probability distribution...

  • Maximum entropy probability distribution
    Maximum entropy probability distribution
    In statistics and information theory, a maximum entropy probability distribution is a probability distribution whose entropy is at least as great as that of all other members of a specified class of distributions....

  • Maximum entropy spectral estimation
    Maximum entropy spectral estimation
    The maximum entropy method applied to spectral density estimation. The overall idea is that the maximum entropy rate stochastic process that satisfies the given constant autocorrelation and variance constraints, is a linear Gauss-Markov process with i.i.d...

  • Maximum likelihood
    Maximum likelihood
    In statistics, maximum-likelihood estimation is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters....

  • Maximum likelihood sequence estimation
    Maximum Likelihood Sequence Estimation
    Maximum likelihood sequence estimation is a mathematical algorithm to extract useful data out of a noisy data stream. For an optimized detector for digital signals, the priority is not to reconstruct the transmitter signal, but to produce a best estimate of the transmitted data with the...

  • Maximum parsimony
    Maximum parsimony
    Parsimony is a non-parametric statistical method commonly used in computational phylogenetics for estimating phylogenies. Under parsimony, the preferred phylogenetic tree is the tree that requires the least evolutionary change to explain some observed data....

  • Maximum spacing estimation
    Maximum spacing estimation
    In statistics, maximum spacing estimation , or maximum product of spacing estimation , is a method for estimating the parameters of a univariate statistical model...

  • Maxwell speed distribution
    Maxwell Speed Distribution
    Classically, an ideal gas' molecules bounce around with somewhat arbitrary velocities, never interacting with each other. In reality, however, an ideal gas is subjected to intermolecular forces. It is to be noted that the aforementioned classical treatment of an ideal gas is only useful when...

  • Maxwell–Boltzmann distribution
  • Maxwell’s theorem
  • MCAR
    MCAR
    In statistical analysis, data-values in a data set are missing completely at random if the events that lead to any particular data-item being missing are independent both of observable variables and of unobservable parameters of interest....

     (missing completely at random)
  • McCullagh's parametrization of the Cauchy distributions
  • McDiarmid's inequality
  • McDonald–Kreitman test — statistical genetics
  • McNemar's test
    McNemar's test
    In statistics, McNemar's test is a non-parametric method used on nominal data. It is applied to 2 × 2 contingency tables with a dichotomous trait, with matched pairs of subjects, to determine whether the row and column marginal frequencies are equal...

  • Meadow's law
    Meadow's law
    Meadow's Law was a precept much in use until recently in the field of child protection, specifically by those investigating cases of multiple cot or crib death — SIDS — within a single family.-History:...

  • Mean
    Mean
    In statistics, mean has two related meanings:* the arithmetic mean .* the expected value of a random variable, which is also called the population mean....

  • Mean – see also expected value
    Expected value
    In probability theory, the expected value of a random variable is the weighted average of all possible values that this random variable can take on...

  • Mean absolute error
    Mean absolute error
    In statistics, the mean absolute error is a quantity used to measure how close forecasts or predictions are to the eventual outcomes. The mean absolute error is given by...
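
    A minimal illustrative computation in Python (the data values are hypothetical):

        import numpy as np

        def mean_absolute_error(y_true, y_pred):
            # Average absolute difference between outcomes and forecasts.
            y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
            return np.abs(y_true - y_pred).mean()

        print(mean_absolute_error([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]))  # 0.5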

  • Mean absolute percentage error
    Mean Absolute Percentage Error
    Mean absolute percentage error is a measure of the accuracy of a fitted time series in statistics, specifically in trend estimation. It usually expresses accuracy as a percentage and is defined by the formula:...
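
    A minimal illustrative computation in Python (hypothetical values; note that the measure is undefined when an actual value is zero):

        import numpy as np

        def mape(actual, forecast):
            # Mean absolute percentage error, expressed in percent.
            actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
            return 100.0 * np.abs((actual - forecast) / actual).mean()

        print(mape([100, 200], [90, 220]))  # 10.0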

  • Mean absolute scaled error
    Mean absolute scaled error
    In statistics, the mean absolute scaled error is a measure of the accuracy of forecasts . It was proposed in 2006 by Australian statistician Rob Hyndman, who described it as a "generally applicable measurement of forecast accuracy without the problems seen in the other measurements."The mean...
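
    A hedged sketch in Python of the non-seasonal form (names and data are illustrative): forecast errors are scaled by the in-sample mean absolute error of the one-step naive forecast.

        import numpy as np

        def mase(y_train, y_true, y_pred):
            y_train = np.asarray(y_train, dtype=float)
            scale = np.abs(np.diff(y_train)).mean()  # naive-forecast MAE in sample
            errors = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
            return errors.mean() / scale

        print(mase([10, 12, 11, 13, 12], [14, 15], [13, 16]))  # about 0.67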

  • Mean and predicted response
    Mean and predicted response
    In linear regression mean response and predicted response are values of the dependent variable calculated from the regression parameters and a given value of the independent variable...

  • Mean deviation
  • Mean difference
    Mean difference
    The mean difference is a measure of statistical dispersion equal to the average absolute difference of two independent values drawn from a probability distribution. A related statistic is the relative mean difference, which is the mean difference divided by the arithmetic mean...

  • Mean integrated squared error
  • Mean of circular quantities
    Mean of circular quantities
    In mathematics, a mean of circular quantities is a mean which is suited for quantities like angles, daytimes, and fractional parts of real numbers. This is necessary since most of the usual means fail on circular quantities...
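
    A small sketch in Python (the angles, in degrees, are hypothetical): the circular mean averages the sine and cosine components and takes the arctangent, which handles the wrap-around at 0°/360° that the ordinary mean does not.

        import math

        def circular_mean_deg(angles):
            s = sum(math.sin(math.radians(a)) for a in angles) / len(angles)
            c = sum(math.cos(math.radians(a)) for a in angles) / len(angles)
            return math.degrees(math.atan2(s, c))

        # 350 and 10 straddle 0 degrees: the arithmetic mean gives 180,
        # the circular mean gives a value at (or numerically very near) 0.
        print(circular_mean_deg([350, 10]))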

  • Mean percentage error
    Mean Percentage Error
    In statistics, the mean percentage error is the computed average of percentage errors by which estimated forecasts differ from actual values of the quantity being forecast. The formula for the mean percentage error is:...

  • Mean preserving spread
  • Mean reciprocal rank
    Mean reciprocal rank
    Mean reciprocal rank is a statistic for evaluating any process that produces a list of possible responses to a query, ordered by probability of correctness. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer...
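
    A minimal illustration in Python (the query ranks are hypothetical; None marks a query with no correct answer returned):

        def mean_reciprocal_rank(first_correct_ranks):
            recip = [0.0 if r is None else 1.0 / r for r in first_correct_ranks]
            return sum(recip) / len(recip)

        # First correct answers at ranks 3, 2 and 1 for three queries:
        print(mean_reciprocal_rank([3, 2, 1]))  # (1/3 + 1/2 + 1) / 3 ≈ 0.61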

  • Mean signed difference
  • Mean square quantization error
    Mean square quantization error
    Mean square quantization error is a figure of merit for the process of analog to digital conversion. As the input is varied, the input's value is recorded when the digital output changes. For each digital output, the input's difference from ideal is normalized to the value of the least significant...

  • Mean square weighted deviation
    Mean square weighted deviation
    Mean square weighted deviation is used extensively in geochronology, the science of obtaining information about the time of formation of, for example, rocks, minerals, bones, corals, or charcoal, or the time at which particular processes took place in a rock mass, for example recrystallization and...

  • Mean squared error
    Mean squared error
    In statistics, the mean squared error of an estimator is one of many ways to quantify the difference between the values implied by an estimator and the true values of the quantity being estimated. MSE is a risk function, corresponding to the expected value of the squared error loss or...
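
    A minimal illustrative computation in Python (the values are hypothetical):

        import numpy as np

        def mean_squared_error(y_true, y_est):
            # Average of the squared estimation errors.
            y_true, y_est = np.asarray(y_true, float), np.asarray(y_est, float)
            return ((y_true - y_est) ** 2).mean()

        print(mean_squared_error([1, 2, 3], [1, 2, 5]))  # 4/3 ≈ 1.33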

  • Mean squared prediction error
  • Mean time between failures
  • Mean-reverting process — redirects to Ornstein–Uhlenbeck process
  • Mean value analysis
    Mean value analysis
    In queueing theory, a specialty within the mathematical theory of probability, mean value analysis is a technique for computing expected queue lengths in equilibrium for a closed separable system of queues...

  • Measurement, level of — see level of measurement
    Level of measurement
    The "levels of measurement", or scales of measure are expressions that typically refer to the theory of scale types developed by the psychologist Stanley Smith Stevens. Stevens proposed his theory in a 1946 Science article titled "On the theory of scales of measurement"...

  • MedCalc
    MedCalc
    MedCalc is a statistical software package designed for the biomedical sciences. It has an integrated spreadsheet for data input and can import files in several formats...

     – software
  • Median
    Median
    In probability theory and statistics, a median is described as the numerical value separating the higher half of a sample, a population, or a probability distribution, from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to...

  • Median absolute deviation
    Median absolute deviation
    In statistics, the median absolute deviation is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample....
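
    A minimal sketch in Python (the data are hypothetical) showing why the MAD is robust: a single large outlier barely moves it.

        import numpy as np

        def median_absolute_deviation(x):
            x = np.asarray(x, dtype=float)
            return np.median(np.abs(x - np.median(x)))

        print(median_absolute_deviation([1, 1, 2, 2, 4, 6, 100]))  # 1.0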

  • Median polish
    Median polish
    The median polish is an exploratory data analysis procedure proposed by the statistician John Tukey. It finds an additively-fit model for data in a two-way layout table of the form row effect + column effect + overall median....

  • Median test
    Median test
    In statistics, Mood's median test is a special case of Pearson's chi-squared test. It is a nonparametric test that tests the null hypothesis that the medians of the populations from which two samples are drawn are identical...

  • Mediation (statistics)
    Mediation (Statistics)
    In statistics, a mediation model is one that seeks to identify and explicate the mechanism that underlies an observed relationship between an independent variable and a dependent variable via the inclusion of a third explanatory variable, known as a mediator variable...

  • Medical statistics
    Medical statistics
    Medical statistics deals with applications of statistics to medicine and the health sciences, including epidemiology, public health, forensic medicine, and clinical research...

  • Medoid
    Medoid
    Medoids are representative objects of a data set or a cluster within a data set whose average dissimilarity to all the objects in the cluster is minimal. Medoids are similar in concept to means or centroids, but medoids are always members of the data set...

  • Memorylessness
    Memorylessness
    In probability and statistics, memorylessness is a property of certain probability distributions: the exponential distributions of non-negative real numbers and the geometric distributions of non-negative integers....

  • Mendelian randomization
    Mendelian randomization
    In epidemiology, Mendelian randomization is a method of using measured variation in genes of known function to examine the causal effect of a modifiable exposure on disease in non-experimental studies...

  • Mentor (statistics)
    Mentor (statistics)
    Mentor is a flexible and sophisticated statistical analysis system produced by CfMC. It specializes in the tabulation and graphical display of market and opinion research data, and is integrated with their Survent data collection software....

     – software
  • Meta-analysis
    Meta-analysis
    In statistics, a meta-analysis combines the results of several studies that address a set of related research hypotheses. In its simplest form, this is normally by identification of a common measure of effect size, for which a weighted average might be the output of a meta-analysis. Here the...

  • Meta-analytic thinking
  • Method of moments (statistics)
  • Method of simulated moments
    Method of simulated moments
    In econometrics, the method of simulated moments is a structural estimation technique introduced by Daniel McFadden. It extends the generalized method of moments to cases where theoretical moment functions cannot be evaluated directly, such as when moment functions involve high-dimensional...

  • Method of support
    Method of support
    In statistics, the method of support is a technique that is used to make inferences from datasets.According to A. W. F. Edwards, the method of support aims to make inferences about unknown parameters in terms of the relative support, or log likelihood, induced by a set of data for a particular...

  • Metropolis–Hastings algorithm
  • Mexican paradox
    Mexican paradox
    The Mexican paradox is the observation that the Mexican people exhibit a surprisingly low incidence of low birth mass, contrary to what would be expected from their socioeconomic status...

  • Microdata (statistics)
    Microdata (statistics)
    In the study of survey and census data, microdata is information at the level of individual respondents. For instance, a national census might collect age, home address, educational level, employment status, and many other variables, recorded separately for every person who responds; this is...

  • Midhinge
    Midhinge
    In statistics, the midhinge is the average of the first and third quartiles and is thus a measure of location.Equivalently, it is the 25% trimmed mid-range; it is an L-estimator....
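
    A minimal sketch in Python (the data are hypothetical; quartile conventions differ, and this uses NumPy's default interpolation):

        import numpy as np

        def midhinge(x):
            q1, q3 = np.percentile(np.asarray(x, dtype=float), [25, 75])
            return (q1 + q3) / 2.0

        print(midhinge([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # 5.0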

  • Mid-range
  • MinHash
    MinHash
    In computer science, MinHash is a technique for quickly estimating how similar two sets are...
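
    A hedged toy sketch in Python (the salted built-in hash stands in for a family of random hash functions; the word sets are made up): the fraction of positions where two signatures agree estimates the Jaccard similarity of the underlying sets.

        import random

        def signature(items, seeds):
            # One minimum per salted hash function.
            return [min(hash((seed, it)) for it in items) for seed in seeds]

        def estimated_jaccard(sig_a, sig_b):
            return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

        random.seed(0)
        seeds = [random.random() for _ in range(256)]
        a = set("the quick brown fox jumps over the lazy dog".split())
        b = set("the quick brown fox naps beside the lazy dog".split())
        print(len(a & b) / len(a | b))                 # exact Jaccard: 0.6
        print(estimated_jaccard(signature(a, seeds),
                                signature(b, seeds)))  # typically near 0.6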

  • Minimax
    Minimax
    Minimax is a decision rule used in decision theory, game theory, statistics and philosophy for minimizing the possible loss for a worst case scenario. Alternatively, it can be thought of as maximizing the minimum gain...

  • Minimax estimator
  • Minimisation (clinical trials)
  • Minimum distance estimation
    Minimum distance estimation
    Minimum distance estimation is a statistical method for fitting a mathematical model to data, usually the empirical distribution....

  • Minimum mean square error
  • Minimum-variance unbiased estimator
    Minimum-variance unbiased estimator
    In statistics a uniformly minimum-variance unbiased estimator or minimum-variance unbiased estimator is an unbiased estimator that has lower variance than any other unbiased estimator for all possible values of the parameter.The question of determining the UMVUE, if one exists, for a particular...

  • Minimum viable population
    Minimum Viable Population
    Minimum viable population is a lower bound on the population of a species, such that it can survive in the wild. This term is used in the fields of biology, ecology, and conservation biology...

  • Minitab
    Minitab
    Minitab is a statistics package. It was developed at the Pennsylvania State University by researchers Barbara F. Ryan, Thomas A. Ryan, Jr., and Brian L. Joiner in 1972...

  • MINQUE
    Minque
    In statistics, the theory of minimum norm quadratic unbiased estimation was developed by C.R. Rao. Its application was originally to the estimation of variance components in random effects models.The theory involves three stages:...

     – minimum norm quadratic unbiased estimation
  • Missing completely at random
  • Missing data
  • Missing values — redirects to Missing data
  • Mittag–Leffler distribution
  • Mixed logit
    Mixed logit
    Mixed logit is a fully general statistical model for examining discrete choices. The motivation for the mixed logit model arises from the limitations of the standard logit model...

  • Misuse of statistics
    Misuse of statistics
    A misuse of statistics occurs when a statistical argument asserts a falsehood. In some cases, the misuse may be accidental. In others, it is purposeful and for the gain of the perpetrator. When the statistical reason involved is false or misapplied, this constitutes a statistical fallacy.The false...

  • Mixed data sampling
    Mixed data sampling
    Mixed data sampling is an econometric regression or filtering method developed by Ghysels et al. A simple regression example has the regressor appearing at a higher frequency than the regressand:...

  • Mixed-design analysis of variance
    Mixed-design analysis of variance
    In statistics, a mixed-design analysis of variance model is used to test for differences between two or more independent groups whilst subjecting participants to repeated measures...

  • Mixed model
    Mixed model
    A mixed model is a statistical model containing both fixed effects and random effects, that is mixed effects. These models are useful in a wide variety of disciplines in the physical, biological and social sciences....

  • Mixing (mathematics)
    Mixing (mathematics)
    In mathematics, mixing is an abstract concept originating from physics: the attempt to describe the irreversible thermodynamic process of mixing in the everyday world: mixing paint, mixing drinks, etc....

  • Mixture distribution
  • Mixture model
    Mixture model
    In statistics, a mixture model is a probabilistic model for representing the presence of sub-populations within an overall population, without requiring that an observed data-set should identify the sub-population to which an individual observation belongs...

  • Mixture (probability)
    Mixture (probability)
    In probability theory and statistics, a mixture is a combination of two or more probability distributions. The concept arises in two contexts:* A mixture defining a new probability distribution from some existing ones, as in a mixture density...

  • MLwiN
    MLwiN
    MLwiN is a statistical software package for fitting multilevel models. It uses both maximum likelihood estimation and Markov Chain Monte Carlo methods...

  • Mode (statistics)
    Mode (statistics)
    In statistics, the mode is the value that occurs most frequently in a data set or a probability distribution. In some fields, notably education, sample data are often called scores, and the sample mode is known as the modal score....

  • Model output statistics
    Model output statistics
    Model Output Statistics is a widely used statistical technique that forms the backbone of modern weather forecasting. The technique, pioneered in the 1960s and early 1970s, is used to post-process output from numerical weather forecast models...

  • Model selection
    Model selection
    Model selection is the task of selecting a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered...

  • Moderator variable redirects to Moderation (statistics)
    Moderation (statistics)
    In statistics, moderation occurs when the relationship between two variables depends on a third variable. The third variable is referred to as the moderator variable or simply the moderator...

  • Modifiable areal unit problem
    Modifiable Areal Unit Problem
    The modifiable areal unit problem is a source of statistical bias that can radically affect the results of statistical hypothesis tests. It affects results when point-based measures of spatial phenomena are aggregated into districts. The resulting summary values are influenced by the choice of...

  • Moffat distribution
    Moffat distribution
    The Moffat distribution, named after the physicist Anthony Moffat, is a continuous probability distribution based upon the Lorentzian distribution...

  • Moment (mathematics)
    Moment (mathematics)
    In mathematics, a moment is, loosely speaking, a quantitative measure of the shape of a set of points. The "second moment", for example, is widely used and measures the "width" of a set of points in one dimension or in higher dimensions measures the shape of a cloud of points as it could be fit by...

  • Moment-generating function
    Moment-generating function
    In probability theory and statistics, the moment-generating function of any random variable is an alternative definition of its probability distribution. Thus, it provides the basis of an alternative route to analytical results compared with working directly with probability density functions or...

  • Moments, method of — see method of moments (statistics)
  • Moment problem
  • Monotone likelihood ratio
  • Monte Carlo integration
  • Monte Carlo method
    Monte Carlo method
    Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. Monte Carlo methods are often used in computer simulations of physical and mathematical systems...
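
    A classic toy example in Python (entirely illustrative): estimating pi from the fraction of uniformly random points in the unit square that land inside the quarter circle.

        import random

        def estimate_pi(n, seed=0):
            rng = random.Random(seed)
            inside = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
                         for _ in range(n))
            return 4.0 * inside / n

        print(estimate_pi(100_000))  # roughly 3.14; the error shrinks as n grows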

  • Monte Carlo method for photon transport
    Monte Carlo method for photon transport
    Modeling photon propagation with Monte Carlo methods is a flexible yet rigorous approach to simulate photon transport. In the method, local rules of photon transport are expressed as probability distributions which describe the step size of photon movement between sites of photon-tissue interaction...

  • Monte Carlo methods for option pricing
  • Monte Carlo methods in finance
    Monte Carlo methods in finance
    Monte Carlo methods are used in finance and mathematical finance to value and analyze instruments, portfolios and investments by simulating the various sources of uncertainty affecting their value, and then determining their average value over the range of resultant outcomes. This is usually done...

  • Monte Carlo molecular modeling
    Monte Carlo molecular modeling
    Monte Carlo molecular modeling is the application of Monte Carlo methods to molecular problems. These problems can also be modeled by the molecular dynamics method. The difference is that this approach relies on statistical mechanics rather than molecular dynamics. Instead of trying to reproduce...

  • Moral graph
    Moral graph
    A moral graph is a concept in graph theory, used to find the equivalent undirected form of a directed acyclic graph. It is a key step of the junction tree algorithm, used in belief propagation on graphical models....

  • Moran process
    Moran process
    A Moran process, named after Patrick Moran, is a stochastic process used in biology to describe finite populations. It can be used to model variety-increasing processes such as mutation as well as variety-reducing effects such as genetic drift and natural selection...

  • Moran's I
    Moran's I
    In statistics, Moran's I is a measure of spatial autocorrelation developed by Patrick A.P. Moran. Spatial autocorrelation is characterized by a correlation in a signal among nearby locations in space. Spatial autocorrelation is more complex than one-dimensional autocorrelation because spatial...

  • Morisita's overlap index
  • Morris method
    Morris method
    In applied statistics, the Morris method for global sensitivity analysis is a so-called one-step-at-a-time method, meaning that in each run only one input parameter is given a new value. It facilitates a global sensitivity analysis by making a number r of local changes at different points x of the...

  • Mortality rate
    Mortality rate
    Mortality rate is a measure of the number of deaths in a population, scaled to the size of that population, per unit time...

  • Most probable number
    Most probable number
    The most probable number method, otherwise known as the method of Poisson zeroes, is a method of getting quantitative data on concentrations of discrete items from positive/negative data....

  • Moving average
  • Moving average model
    Moving average model
    In time series analysis, the moving-average model is a common approach for modeling univariate time series models. The notation MA refers to the moving average model of order q:...

  • Moving average representation — redirects to Wold's theorem
  • Moving least squares
    Moving least squares
    Moving least squares is a method of reconstructing continuous functions from a set of unorganized point samples via the calculation of a weighted least squares measure biased towards the region around the point at which the reconstructed value is requested....

  • Multi-armed bandit
    Multi-armed bandit
    In statistics, particularly in the design of sequential experiments, a multi-armed bandit takes its name from a traditional slot machine. Multiple levers are considered in the motivating applications in statistics. When pulled, each lever provides a reward drawn from a distribution associated...

  • Multi-vari chart
    Multi-vari chart
    In quality control, multi-vari charts are a visual way of presenting variability through a series of charts. The content and format of the charts has evolved over time.-Original concept:...

  • Multiclass classification
    Multiclass classification
    In machine learning, multiclass or multinomial classification is the problem of classifying instances into more than two classes.While some classification algorithms naturally permit the use of more than two classes, others are by nature binary algorithms; these can, however, be turned into...

  • Multiclass LDA (Linear discriminant analysis) — redirects to Linear discriminant analysis
    Linear discriminant analysis
    Linear discriminant analysis and the related Fisher's linear discriminant are methods used in statistics, pattern recognition and machine learning to find a linear combination of features which characterizes or separates two or more classes of objects or events...

  • Multicollinearity
    Multicollinearity
    Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. In this situation the coefficient estimates may change erratically in response to small changes in the model or the data...

  • Multidimensional analysis
    Multidimensional analysis
    In statistics, econometrics, and related fields, multidimensional analysis is a data analysis process that groups data into two or more categories: data dimensions and measurements. For example, a data set consisting of the number of wins for a single football team at each of several years is a...

  • Multidimensional Chebyshev's inequality
    Multidimensional Chebyshev's inequality
    In probability theory, the multidimensional Chebyshev's inequality is a generalization of Chebyshev's inequality, which puts a bound on the probability of the event that a random variable differs from its expected value by more than a specified amount....

  • Multidimensional panel data
    Multidimensional panel data
    In econometrics, panel data is data observed over two dimensions. A panel data set is termed "multidimensional" when the phenomenon is observed over three or more dimensions...

  • Multidimensional scaling
    Multidimensional scaling
    Multidimensional scaling is a set of related statistical techniques often used in information visualization for exploring similarities or dissimilarities in data. MDS is a special case of ordination. An MDS algorithm starts with a matrix of item–item similarities, then assigns a location to each...

  • Multifactor design of experiments software
    Multifactor design of experiments software
    Software that is used for designing factorial experiments plays an important role in scientific experiments generally and represents a route to the implementation of design of experiments procedures that derive from statistical and combinatoric theory...

  • Multifactor dimensionality reduction
    Multifactor dimensionality reduction
    Multifactor dimensionality reduction is a data mining approach for detecting and characterizing combinations of attributes or independent variables that interact to influence a dependent or class variable...

  • Multilevel model
    Multilevel model
    Multilevel models are statistical models of parameters that vary at more than one level...

  • Multinomial distribution
  • Multinomial logit
    Multinomial logit
    In statistics, economics, and genetics, a multinomial logit model, also known as multinomial logistic regression, is a regression model which generalizes logistic regression by allowing more than two discrete outcomes...

  • Multinomial probit
    Multinomial probit
    In econometrics and statistics, the multinomial probit model, a popular alternative to the multinomial logit model, is a generalization of the probit model that allows more than two discrete, unordered outcomes. It is not to be confused with the multivariate probit model, which is used to model...

  • Multinomial test
    Multinomial test
    In statistics, the multinomial test is the test of the null hypothesis that the parameters of a multinomial distribution equal specified values. It is used for categorical data; see Read and Cressie....

  • Multiple baseline design
    Multiple Baseline Design
    A multiple baseline design is a style of research involving the careful measurement of multiple persons, traits or settings both before and after a treatment. This design is used in medical, psychological and biological research to name a few areas. It has several advantages over AB designs which...

  • Multiple comparisons
    Multiple comparisons
    In statistics, the multiple comparisons or multiple testing problem occurs when one considers a set of statistical inferences simultaneously. Errors in inference, including confidence intervals that fail to include their corresponding population parameters or hypothesis tests that incorrectly...

  • Multiple correlation
    Multiple correlation
    In statistics, multiple correlation is a linear relationship among more than two variables. It is measured by the coefficient of multiple determination, denoted as R2, which is a measure of the fit of a linear regression...

  • Multiple correspondence analysis
    Multiple correspondence analysis
    In statistics, multiple correspondence analysis is a data analysis technique for nominal categorical data, used to detect and represent underlying structures in a data set. It does this by representing data as points in a low-dimensional Euclidean space. The procedure thus appears to be the...

  • Multiple discriminant analysis
    Multiple discriminant analysis
    Multiple Discriminant Analysis is a method for compressing a multivariate signal to yield a lower dimensional signal amenable to classification....

  • Multiple-indicator kriging
    Multiple-indicator kriging
    Multiple-indicator kriging is a recent advance on other techniques for mineral deposit modeling and resource block model estimation, such as ordinary kriging....

  • Multiple Indicator Cluster Survey
    Multiple Indicator Cluster Survey
    The Multiple Indicator Cluster Surveys are a survey program developed by the United Nations Children's Fund to provide internationally comparable, statistically rigorous data on the situation of children and women. The first round of surveys was carried out in over 60 countries in 1995 in...

  • Multiple of the median
    Multiple of the median
    A multiple of the median is a measure of how far an individual test result deviates from the median. MoM is commonly used to report the results of medical screening tests, particularly where the results of the individual tests are highly variable....

  • Multiple testing correction redirects to Multiple comparisons
    Multiple comparisons
    In statistics, the multiple comparisons or multiple testing problem occurs when one considers a set of statistical inferences simultaneously. Errors in inference, including confidence intervals that fail to include their corresponding population parameters or hypothesis tests that incorrectly...

  • Multiple-try Metropolis
    Multiple-try Metropolis
    In Markov chain Monte Carlo, the Metropolis–Hastings algorithm can be used to sample from a probability distribution which is difficult to sample from directly. However, the MH algorithm requires the user to supply a proposal distribution, which can be relatively arbitrary...

  • Multiresolution analysis
    Multiresolution analysis
    A multiresolution analysis or multiscale approximation is the design method of most of the practically relevant discrete wavelet transforms and the justification for the algorithm of the fast wavelet transform...

  • Multiscale decision making
    Multiscale decision making
    Multiscale decision making, also referred to as multiscale decision theory, is a recently developed approach in operations research that fuses game theory, multi-agent influence diagrams, in particular dependency graphs, and Markov decision processes to solve multiscale challenges across...

  • Multiscale geometric analysis
    Multiscale geometric analysis
    Multiscale geometric analysis or geometric multiscale analysis is an emerging area of high-dimensional signal processing and data analysis....

  • Multistage testing
    Multistage testing
    Multistage testing is an algorithm-based approach to administering tests. It is very similar to computer-adaptive testing in that items are interactively selected for each examinee by the algorithm, but rather than selecting individual items, groups of items are selected, building the test in stages...

  • Multitrait-multimethod matrix
    Multitrait-multimethod matrix
    The multitrait-multimethod matrix is an approach to examining Construct Validity developed by Campbell and Fiske. There are six major considerations when examining a construct's validity through the MTMM matrix, which are as follows:...

  • Multivariate adaptive regression splines
    Multivariate adaptive regression splines
    Multivariate adaptive regression splines is a form of regression analysis introduced by Jerome Friedman in 1991. It is a non-parametric regression technique and can be seen as an extension of linear models that...

  • Multivariate analysis
    Multivariate analysis
    Multivariate analysis is based on the statistical principle of multivariate statistics, which involves observation and analysis of more than one statistical variable at a time...

  • Multivariate analysis of variance
  • Multivariate distribution – redirects to Joint probability distribution
  • Multivariate kernel density estimation
    Multivariate kernel density estimation
    Kernel density estimation is a nonparametric technique for density estimation i.e., estimation of probability density functions, which is one of the fundamental questions in statistics. It can be viewed as a generalisation of histogram density estimation with improved statistical properties...

  • Multivariate normal distribution
  • Multivariate Pólya distribution
    Multivariate Polya distribution
    The multivariate Pólya distribution, named after George Pólya, also called the Dirichlet compound multinomial distribution, is a compound probability distribution, where a probability vector p is drawn from a Dirichlet distribution with parameter vector α, and a set of discrete samples is...

  • Multivariate probit
    Multivariate probit
    In statistics and econometrics, the multivariate probit model is a generalization of the probit model used to estimate several correlated binary outcomes jointly...

  • Multivariate random variable
    Multivariate random variable
    In mathematics, probability, and statistics, a multivariate random variable or random vector is a list of mathematical variables each of whose values is unknown, either because the value has not yet occurred or because there is imperfect knowledge of its value.More formally, a multivariate random...

  • Multivariate stable distribution
  • Multivariate statistics
    Multivariate statistics
    Multivariate statistics is a form of statistics encompassing the simultaneous observation and analysis of more than one statistical variable. The application of multivariate statistics is multivariate analysis...

  • Multivariate Student distribution

N

  • n = 1 fallacy
  • Naive Bayes classifier
    Naive Bayes classifier
    A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong independence assumptions...

  • Nakagami distribution
  • National and international statistical services
  • Nash–Sutcliffe model efficiency coefficient
  • National Health Interview Survey
    National Health Interview Survey
    The National Health Interview Survey is an annual, cross-sectional survey intended to provide nationally-representative estimates on a wide range of health status and utilization measures among the nonmilitary, noninstitutionalized population of the United States...

  • Natural experiment
    Natural experiment
    A natural experiment is an observational study in which the assignment of treatments to subjects has been haphazard: That is, the assignment of treatments has been made "by nature", but not by experimenters. Thus, a natural experiment is not a controlled experiment...

  • Natural exponential family
    Natural exponential family
    In probability and statistics, the natural exponential family is a class of probability distributions that is a special case of an exponential family...

  • Natural process variation
  • NCSS (statistical software)
  • Negative binomial distribution
    Negative binomial distribution
    In probability theory and statistics, the negative binomial distribution is a discrete probability distribution of the number of successes in a sequence of Bernoulli trials before a specified number of failures occur...

  • Negative multinomial distribution
  • Negative predictive value
    Negative predictive value
    In statistics and diagnostic testing, the negative predictive value is a summary statistic used to describe the performance of a diagnostic testing procedure. It is defined as the proportion of subjects with a negative test result who are correctly diagnosed. A high NPV means that when the test...

  • Negative relationship
    Negative relationship
    In statistics, a relationship between two variables is negative if the slope in a corresponding graph is negative, or—what is in some contexts equivalent—if the correlation between them is negative...

  • Negentropy
    Negentropy
    The negentropy, also negative entropy or syntropy, of a living system is the entropy that it exports to keep its own entropy low; it lies at the intersection of entropy and life...

  • Neighbourhood components analysis
    Neighbourhood components analysis
    Neighbourhood components analysis is a supervised learning method for clustering multivariate data into distinct classes according to a given distance metric over the data...

  • Nelson rules
    Nelson rules
    Nelson rules are a method in process control of determining if some measured variable is out of control. Rules for detecting "out-of-control" or non-random conditions were first postulated by Walter A. Shewhart in the 1920s...

  • Nelson–Aalen estimator
    Nelson–Aalen estimator
    The Nelson–Aalen estimator is a non-parametric estimator of the cumulative hazard rate function in case of censored data or incomplete data. It is used in survival theory, reliability engineering and life insurance to estimate the cumulative number of expected events. An event can be a failure of a...

  • Nested case-control study
    Nested case-control study
    A nested case control study is a variation of a case-cohort study in which only a subset of controls from the cohort are compared to the incident cases. In a case-cohort study, all incident cases in the cohort are compared to a random subset of participants who do not develop the disease of interest...

  • Nested sampling algorithm
    Nested sampling algorithm
    The nested sampling algorithm is a computational approach to the problem of comparing models in Bayesian statistics, developed in 2004 by physicist John Skilling....

  • Network probability matrix
    Network Probability Matrix
    The network probability matrix describes the probability structure of a network based on the historical presence or absence of edges in a network. For example, individuals in a social network are not connected to other individuals with uniform random probability. The probability structure is much...

  • Neural network
    Neural network
    The term neural network was traditionally used to refer to a network or circuit of biological neurons. The modern usage of the term often refers to artificial neural networks, which are composed of artificial neurons or nodes...

  • Neutral vector
    Neutral vector
    In statistics, and specifically in the study of the Dirichlet distribution, a neutral vector of random variables is one that exhibits a particular type of statistical independence amongst its elements...

  • Newcastle–Ottawa scale
    Newcastle–Ottawa scale
    In statistics, the Newcastle–Ottawa scale is a method for assessing the quality of nonrandomised studies in meta-analyses. The scales allocate stars, maximum of nine, for quality of selection, comparability, exposure and outcome of study participants. The method was developed as a collaboration...

  • Newey–West estimator
    Newey–West estimator
    A Newey–West estimator is used in statistics and econometrics to provide an estimate of the covariance matrix of the parameters of a regression-type model when this model is applied in situations where the standard assumptions of regression analysis do not apply. It was devised by Whitney K. Newey...

  • Newman–Keuls method
    Newman–Keuls method
    In statistics, the Newman–Keuls method is a post-hoc test used for comparisons after the performed F-test is found to be significant...

  • Neyer d-optimal test
    Neyer d-optimal test
    The Neyer D-Optimal Test is one way of analyzing a sensitivity test of explosives as described by Barry T. Neyer in 1994. This method has replaced the earlier Bruceton analysis or "Up and Down Test" that was devised by Dixon and Mood in 1948 to allow computation with pencil and paper. Samples are...

  • Neyman construction
    Neyman construction
    Neyman construction is a frequentist method to construct an interval at a confidence level C such that, if we repeat the experiment many times, the interval will contain the true value a fraction C of the time. The probability that the interval contains the true value is called the coverage....

  • Neyman–Pearson lemma
  • Nicholson–Bailey model
  • Nominal category
    Nominal category
    A nominal category or a nominal group is a group of objects or ideas that can be collectively grouped on the basis of a shared, arbitrary characteristic....

  • Noncentral beta distribution
    Noncentral beta distribution
    In probability theory and statistics, the noncentral beta distribution is a continuous probability distribution that is a generalization of the beta distribution.- Probability density function :...

  • Noncentral chi distribution
  • Noncentral chi-squared distribution
  • Noncentral F-distribution
    Noncentral F-distribution
    In probability theory and statistics, the noncentral F-distribution is a continuous probability distribution that is a generalization of the F-distribution...

  • Noncentral hypergeometric distributions
    Noncentral hypergeometric distributions
    In statistics, the hypergeometric distribution is the discrete probability distribution generated by picking colored balls at random from an urn without replacement....

  • Noncentral t-distribution
    Noncentral t-distribution
    In probability and statistics, the noncentral t-distribution generalizes Student's t-distribution using a noncentrality parameter. Like the central t-distribution, the noncentral t-distribution is primarily used in statistical inference, although it may also be used in robust modeling for data...

  • Noncentrality parameter
    Noncentrality parameter
    Noncentrality parameters are parameters of families of probability distributions which are related to other "central" families of distributions. If the noncentrality parameter of a distribution is zero, the distribution is identical to a distribution in the central family...

  • Nonlinear autoregressive exogenous model
    Nonlinear autoregressive exogenous model
    In time series modeling, a nonlinear autoregressive exogenous model is a nonlinear autoregressive model which has exogenous inputs. This means that the model relates the current value of a time series which one would like to explain or predict to both:...

  • Nonlinear dimensionality reduction
    Nonlinear dimensionality reduction
    High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lies on an embedded non-linear manifold within the higher-dimensional space...

  • Non-linear iterative partial least squares
    Non-linear iterative partial least squares
    In statistics, non-linear iterative partial least squares is an algorithm for computing the first few components in a principal component or partial least squares analysis. For very high-dimensional datasets, such as those generated in the 'omics sciences, it is usually only necessary to compute...

  • Nonlinear regression
    Nonlinear regression
    In statistics, nonlinear regression is a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables...

  • Non-homogeneous Poisson process
    Non-homogeneous Poisson process
    In probability theory, a non-homogeneous Poisson process is a Poisson process with rate parameter λ such that the rate parameter of the process is a function of time...

  • Non-linear least squares
    Non-linear least squares
    Non-linear least squares is the form of least squares analysis which is used to fit a set of m observations with a model that is non-linear in n unknown parameters. It is used in some forms of non-linear regression. The basis of the method is to approximate the model by a linear one and to...

  • Non-negative matrix factorization
  • Non-parametric statistics
    Non-parametric statistics
    In statistics, the term non-parametric statistics has at least two different meanings:The first meaning of non-parametric covers techniques that do not rely on data belonging to any particular distribution. These include, among others:...

  • Non-response bias
    Non-response bias
    Non-response bias occurs in statistical surveys if the answers of respondents differ from the potential answers of those who did not answer.- Example :...

  • Non-sampling error
    Non-sampling error
    In statistics, non-sampling error is a catch-all term for the deviations from the true value that are not a function of the sample chosen, including various systematic errors and any random errors that are not due to sampling. Non-sampling errors are much harder to quantify than sampling errors....

  • Nonparametric regression
    Nonparametric regression
    Nonparametric regression is a form of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data...

  • Nonprobability sampling
    Nonprobability sampling
    Sampling is the use of a subset of the population to represent the whole population. Probability sampling, or random sampling, is a sampling technique in which the probability of getting any particular sample may be calculated. Nonprobability sampling does not meet this criterion and should be...

  • Normal curve equivalent
    Normal curve equivalent
    In educational statistics, a normal curve equivalent (NCE), developed for the United States Department of Education by the RMC Research Corporation, is a way of standardizing scores received on a test. It is...

  • Normal distribution
  • Normal probability plot
    Normal probability plot
    The normal probability plot is a graphical technique for normality testing: assessing whether or not a data set is approximately normally distributed....

     – see also rankit
    Rankit
    In statistics, rankits of a set of data are the expected values of the order statistics of a sample from the standard normal distribution the same size as the data. They are primarily used in the normal probability plot, a graphical technique for normality testing.-Example:This is perhaps most...

  • Normal score
    Normal score
    The term normal score is used with two different meanings in statistics. One of them relates to creating a single value which can be treated as if it had arisen from a standard normal distribution...

     – see also rankit
    Rankit
    In statistics, rankits of a set of data are the expected values of the order statistics of a sample from the standard normal distribution the same size as the data. They are primarily used in the normal probability plot, a graphical technique for normality testing.-Example:This is perhaps most...

     and Z score
  • Normal variance-mean mixture
  • Normal-exponential-gamma distribution
    Normal-exponential-gamma distribution
    In probability theory and statistics, the normal-exponential-gamma distribution is a three-parameter family of continuous probability distributions...

  • Normal-gamma distribution
  • Normal-inverse Gaussian distribution
  • Normal-scaled inverse gamma distribution
  • Normality test
    Normality test
    In statistics, normality tests are used to determine whether a data set is well-modeled by a normal distribution or not, or to compute how likely an underlying random variable is to be normally distributed....

  • Normalization (statistics)
    Normalization (statistics)
    In one usage in statistics, normalization is the process of isolating statistical error in repeated measured data. A normalization is sometimes based on a property...

  • Normally distributed and uncorrelated does not imply independent
    Normally distributed and uncorrelated does not imply independent
    In probability theory, two random variables being uncorrelated does not imply their independence. In some contexts, uncorrelatedness implies at least pairwise independence....

  • Notation in probability and statistics
  • Novikov's condition
    Novikov's condition
    In probability theory, Novikov's condition is the sufficient condition for a stochastic process which takes the form of the Radon-Nikodym derivative in Girsanov's theorem to be a martingale...

  • np-chart
  • Null distribution
    Null distribution
    In statistical hypothesis testing, the null distribution is the probability distribution of the test statistic when the null hypothesis is true.In an F-test, the null distribution is an F-distribution....

  • Null hypothesis
    Null hypothesis
    The practice of science involves formulating and testing hypotheses, assertions that are capable of being proven false using a test of observed data. The null hypothesis typically corresponds to a general or default position...

  • Null result
    Null result
    In science, a null result is a result without the expected content: that is, the proposed result is absent. It is an experimental outcome which does not show an otherwise expected effect. This does not imply a result of zero or nothing, simply a result that does not support the hypothesis...

  • Nuisance parameter
  • Nuisance variable
    Nuisance variable
    In statistics, a nuisance parameter is any parameter which is not of immediate interest but which must be accounted for in the analysis of those parameters which are of interest...

  • Numerical data
    Numerical data
    Numerical data is data measured or identified on a numerical scale. Numerical data can be analyzed using statistical methods, and results can be displayed using tables, charts, histograms and graphs. For example, a researcher may ask a participant questions that include words such as how often, how...

  • Numerical methods for linear least squares
  • Numerical parameter
  • Numerical smoothing and differentiation
    Numerical smoothing and differentiation
    An experimental datum value can be conceptually described as the sum of a signal and some noise, but in practice the two contributions cannot be separated. The purpose of smoothing is to increase the Signal-to-noise ratio without greatly distorting the signal...

  • NumXL
    NumXL
    NumXL is an econometrics/time series analysis add-in for Microsoft Excel. Developed by Spider Financial, NumXL provides a wide variety of statistical and time series analysis techniques, including linear and nonlinear time series modeling, statistical tests and others...

     — software (Excel addin)
  • Nuremberg Code
    Nuremberg Code
    The Nuremberg Code is a set of research ethics principles for human experimentation set as a result of the Subsequent Nuremberg Trials at the end of the Second World War.-Background:...


O

  • Observable variable
    Observable variable
    In statistics, observable variables or manifest variables, as opposed to latent variables, are those variables that can be observed and directly measured....

  • Observational equivalence
    Observational equivalence
    In econometrics, two parameter values are considered observationally equivalent if they both result in the same probability distribution of observable data...

  • Observational error
    Observational error
    Observational error is the difference between a measured value of a quantity and its true value. In statistics, an error is not a "mistake". Variability is an inherent part of things being measured and of the measurement process....

  • Observational study
    Observational study
    In epidemiology and statistics, an observational study draws inferences about the possible effect of a treatment on subjects, where the assignment of subjects into a treated group versus a control group is outside the control of the investigator...

  • Observed information
    Observed information
    In statistics, the observed information, or observed Fisher information, is the negative of the second derivative of the "log-likelihood"...

  • Occupancy frequency distribution
    Occupancy frequency distribution
    In macroecology and community ecology, an occupancy frequency distribution is the distribution of the numbers of species occupying different numbers of areas. It was first reported in 1918 by the Danish botanist Christen C. Raunkiær in his study on plant communities...

  • Odds
    Odds
    The odds in favor of an event or a proposition are expressed as the ratio of a pair of integers, which is the ratio of the probability that an event will happen to the probability that it will not happen...

  • Odds algorithm
    Odds algorithm
    The odds-algorithm is a mathematical method for computing optimal strategies for a class of problems that belong to the domain of optimal stopping problems. Their solution follows from the odds-strategy, and the importance of the...

  • Odds ratio
    Odds ratio
    The odds ratio is a measure of effect size, describing the strength of association or non-independence between two binary data values. It is used as a descriptive statistic, and plays an important role in logistic regression...
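
    A minimal worked example in Python (the 2 x 2 counts are hypothetical):

        def odds_ratio(a, b, c, d):
            # Rows: exposed (a events, b non-events), unexposed (c events, d non-events).
            # OR = (a/b) / (c/d) = (a*d) / (b*c).
            return (a * d) / (b * c)

        print(odds_ratio(20, 80, 10, 90))  # 2.25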

  • Official statistics
    Official statistics
    Official statistics are statistics published by government agencies or other public bodies such as international organizations. They provide quantitative or qualitative information on all major areas of citizens' lives, such as economic and social development, living conditions, health, education,...

  • Ogden tables
    Ogden tables
    Ogden tables are a set of statistical tables and other information for use in court cases in the UK.Their purpose is to make it easier to calculate future losses in personal injury and fatal accident cases. The tables take into account life expectancy and provide a range of discount rates from...

  • Ogive
    Ogive
    An ogive is the roundly tapered end of a two-dimensional or three-dimensional object.-Applied physical science and engineering:In ballistics or aerodynamics, an ogive is a pointed, curved surface mainly used to form the approximately streamlined nose of a bullet or other projectile.The traditional...

  • Omitted-variable bias
    Omitted-variable bias
    In statistics, omitted-variable bias occurs when a model is created which incorrectly leaves out one or more important causal factors. The 'bias' is created when the model compensates for the missing factor by over- or under-estimating one of the other factors.More specifically, OVB is the bias...

  • Omnibus test
    Omnibus test
    Omnibus tests are a kind of statistical test. They test whether the explained variance in a set of data is significantly greater than the unexplained variance, overall. One example is the F-test in the analysis of variance. There can be legitimate significant effects within a model even if the...

  • One-class classification
    One-class classification
    One-class classification tries to distinguish one class of objects from all other possible objects, by learning from a training set containing only the objects of that class. This is different from and more difficult than the traditional classification problem, which tries to distinguish between...

  • One-factor-at-a-time method
    One-factor-at-a-time method
    The one-factor-at-a-time method is a method of designing experiments involving the testing of factors, or causes, one at a time instead of all simultaneously. Prominent text books and academic papers currently favor factorial experimental designs, a method pioneered by Sir Ronald A. Fisher, where...

  • One-tailed test — redirects to two-tailed test
    Two-tailed test
    The two-tailed test is a statistical test used in inference, in which a given statistical hypothesis, H0 , will be rejected when the value of the test statistic is either sufficiently small or sufficiently large...

  • One-way ANOVA
    One-way ANOVA
    In statistics, one-way analysis of variance is a technique used to compare means of two or more samples. This technique can be used only for numerical data....
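
    A short hedged example in Python using SciPy's implementation (the three groups of measurements are made up):

        from scipy.stats import f_oneway

        group_a = [6.1, 5.8, 6.4, 6.0]
        group_b = [7.2, 7.0, 6.9, 7.4]
        group_c = [5.5, 5.9, 5.7, 5.6]

        f_statistic, p_value = f_oneway(group_a, group_b, group_c)
        print(f_statistic, p_value)  # a large F and a small p suggest unequal means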

  • Online NMF
    Online NMF
    Online NMF is a recently developed method for real time data analysis in an online context. Non-negative matrix factorization in the past has been used for static data analysis and pattern recognition...

     Online Non-negative Matrix Factorisation
  • Open-label trial
    Open-label trial
    An open-label trial or open trial is a type of clinical trial in which both the researchers and participants know which treatment is being administered....

  • OpenEpi
    OpenEpi
    OpenEpi is a free, web-based, open source, operating system-independent series of programs for use in epidemiology, biostatistics, public health, and medicine, providing a number of epidemiologic and statistical tools for summary data. OpenEpi was developed in JavaScript and HTML, and can be run in...

     – software
  • OpenBUGS
    OpenBUGS
    OpenBUGS is computer software for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo methods. OpenBUGS is the open source variant of WinBUGS. It runs under Windows and Linux, as well as from inside the R statistical package...

      – software
  • Operational confound
  • Operational sex ratio
    Operational sex ratio
    In the evolutionary biology of sexual reproduction, the operational sex ratio is the ratio of sexually competing males that are ready to mate to sexually competing females that are ready to mate...

  • Operations research
    Operations research
    Operations research is an interdisciplinary mathematical science that focuses on the effective use of technology by organizations...

  • Opinion poll
    Opinion poll
    An opinion poll, sometimes simply referred to as a poll is a survey of public opinion from a particular sample. Opinion polls are usually designed to represent the opinions of a population by conducting a series of questions and then extrapolating generalities in ratio or within confidence...

  • Optimal decision
    Optimal decision
    An optimal decision is a decision such that no other available decision options will lead to a better outcome. It is an important concept in decision theory. In order to compare the different decision outcomes, one commonly assigns a relative utility to each of them...

  • Optimal design
    Optimal design
    Optimal designs are a class of experimental designs that are optimal with respect to some statistical criterion. In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated without bias and with minimum variance...

  • Optimal discriminant analysis
    Optimal discriminant analysis
    Optimal discriminant analysis and the related classification tree analysis are statistical methods that maximize predictive accuracy...

  • Optimal matching
    Optimal matching
    Optimal matching is a sequence analysis method used in social science, to assess the dissimilarity of ordered arrays of tokens that usually represent a time-ordered sequence of socio-economic states two individuals have experienced. Once such distances have been calculated for a set of observations...

  • Optimal stopping
    Optimal stopping
    In mathematics, the theory of optimal stopping is concerned with the problem of choosing a time to take a particular action, in order to maximise an expected reward or minimise an expected cost. Optimal stopping problems can be found in areas of statistics, economics, and mathematical finance...

  • Optimality criterion
    Optimality criterion
    In statistics, an optimality criterion provides a measure of the fit of the data to a given hypothesis. The selection process is determined by the solution that optimizes the criteria used to evaluate the alternative hypotheses...

  • Optional stopping theorem
    Optional stopping theorem
    In probability theory, the optional stopping theorem says that, under certain conditions, the expected value of a martingale at a stopping time is equal to its initial value...

  • Order of a kernel
    Order of a kernel
    The order of a kernel is the order of its first non-zero moment....

  • Order of integration
    Order of integration
    Order of integration, denoted I(d), is a summary statistic for a time series. It reports the minimum number of differences required to obtain a stationary series...
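
    A rough sketch of how the order might be estimated in practice: difference the series until an augmented Dickey–Fuller test (here statsmodels' adfuller, with an arbitrary 0.05 cutoff) no longer indicates a unit root. The random-walk series is synthetic:

      # Estimate the order of integration d by repeated differencing.
      import numpy as np
      from statsmodels.tsa.stattools import adfuller

      rng = np.random.default_rng(0)
      series = np.cumsum(rng.normal(size=500))   # a random walk, so d should come out as 1

      d, x = 0, series
      while adfuller(x)[1] > 0.05 and d < 3:     # element 1 of the result is the p-value
          x = np.diff(x)
          d += 1
      print("estimated order of integration:", d)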

  • Order statistic
    Order statistic
    In statistics, the kth order statistic of a statistical sample is equal to its kth-smallest value. Together with rank statistics, order statistics are among the most fundamental tools in non-parametric statistics and inference....

  • Ordered logit
    Ordered logit
    In statistics, the ordered logit model , is a regression model for ordinal dependent variables...

  • Ordered probit
    Ordered probit
    In statistics, ordered probit is a generalization of the popular probit analysis to the case of more than two outcomes of an ordinal dependent variable. Similarly, the popular logit method also has a counterpart ordered logit....

  • Ordered subset expectation maximization
  • Ordinary least squares
    Ordinary least squares
    In statistics, ordinary least squares or linear least squares is a method for estimating the unknown parameters in a linear regression model. This method minimizes the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear...
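
    A minimal sketch of the estimator, choosing the coefficients that minimize the sum of squared residuals ||y - Xb||^2; the data are synthetic and generated only to illustrate the call:

      # Ordinary least squares with an intercept, via numpy's least-squares solver.
      import numpy as np

      rng = np.random.default_rng(1)
      n = 200
      x = rng.normal(size=n)
      y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

      X = np.column_stack([np.ones(n), x])             # design matrix with intercept column
      beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
      print("estimated intercept and slope:", beta_hat)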

  • Ordination (statistics)
    Ordination (statistics)
    In multivariate analysis, ordination is a method complementary to data clustering, and used mainly in exploratory data analysis . Ordination orders objects that are characterized by values on multiple variables so that similar objects are near each other and dissimilar objects are farther from...

  • Ornstein–Uhlenbeck process
  • Orthogonal array testing
  • Orthogonality
    Orthogonality
    Orthogonality occurs when two things can vary independently, they are uncorrelated, or they are perpendicular. In mathematics, two vectors are orthogonal if they are perpendicular, i.e., they form a right angle...

  • Orthogonality principle
    Orthogonality principle
    In statistics and signal processing, the orthogonality principle is a necessary and sufficient condition for the optimality of a Bayesian estimator. Loosely stated, the orthogonality principle says that the error vector of the optimal estimator is orthogonal to any possible estimator...

  • Outlier
    Outlier
    In statistics, an outlier is an observation that is numerically distant from the rest of the data. Grubbs defined an outlier as: An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs....

  • Outliers in statistics – redirects to Robust statistics
    Robust statistics
    Robust statistics provides an alternative approach to classical statistical methods. The motivation is to produce estimators that are not unduly affected by small departures from model assumptions....

     (section)
  • Outliers ratio
    Outliers Ratio
    In objective video quality assessment, the outliers ratio is a measure of the performance of an objective video quality metric. It is the ratio of "false" scores given by the objective metric to the total number of scores. The "false" scores are those that lie outside an interval defined in terms of the MOS...

  • Outline of probability
    Outline of probability
    Probability is the likelihood or chance that something is the case or will happen. Probability theory is used extensively in statistics, mathematics, science and philosophy to draw conclusions about the likelihood of potential events and the underlying mechanics of complex systems.The following...

  • Outline of regression analysis
    Outline of regression analysis
    In statistics, regression analysis includes any technique for learning about the relationship between one or more dependent variables Y and one or more independent variables X....

  • Outline of statistics
  • Overdispersion
    Overdispersion
    In statistics, overdispersion is the presence of greater variability in a data set than would be expected based on a given simple statistical model....

  • Overfitting
    Overfitting
    In statistics, overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations...

  • Owen's T function
  • OxMetrics
    OxMetrics
    OxMetrics is an econometric software including the Ox programming language for econometrics and statistics, developed by Jurgen Doornik and David Hendry...

     – software

P

  • p-chart
  • p-rep
    P-rep
    In statistical hypothesis testing, p-rep or prep has been proposed as a statistical alternative to the classic p-value. Whereas a p-value is the probability of obtaining a result under the null hypothesis, p-rep computes the probability of replicating an effect...

  • P-value
    P-value
    In statistical significance testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. One often "rejects the null hypothesis" when the p-value is less than the significance level α ,...
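
    A small illustration of that definition for a two-sided test whose statistic has a standard normal null distribution; the observed value of 2.1 is made up:

      # Two-sided p-value for an observed z statistic under a standard normal null.
      from scipy.stats import norm

      z_observed = 2.1
      p_value = 2 * norm.sf(abs(z_observed))       # sf is the upper-tail probability, 1 - cdf
      print(f"two-sided p-value: {p_value:.4f}")   # roughly 0.036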

  • P-P plot
  • Page's trend test
    Page's trend test
    In statistics, the Page test for multiple comparisons between ordered correlated variables is the counterpart of Spearman's rank correlation coefficient which summarizes the association of continuous variables. It is also known as Page's trend test or Page's L test...

  • Paid survey
    Paid survey
    A paid or incentivized survey is a type of statistical survey where the participants/members are rewarded through an incentive program, generally entry into a sweepstakes program or a small cash reward, for completing one or more surveys....

  • Paired comparison analysis
    Paired comparison analysis
    In paired-comparison analysis, also known as paired-choice analysis, a range of options are compared and the results are tallied to find an overall winner. A range of plausible options is listed. Each option is compared against each of the other options, determining the preferred option in each case...

  • Paired difference test
    Paired difference test
    In statistics, a paired difference test is a type of location test that is used when comparing two sets of measurements to assess whether their population means differ...

  • Pairwise comparison
    Pairwise comparison
    Pairwise comparison generally refers to any process of comparing entities in pairs to judge which entity in each pair is preferred, or has a greater amount of some quantitative property. The method of pairwise comparison is used in the scientific study of preferences, attitudes, voting systems, social...

  • Pairwise independence
    Pairwise independence
    In probability theory, a pairwise independent collection of random variables is a set of random variables any two of which are independent. Any collection of mutually independent random variables is pairwise independent, but some pairwise independent collections are not mutually independent...

  • Panel analysis
    Panel analysis
    Panel analysis is a statistical method, widely used in social science, epidemiology, and econometrics, which deals with two-dimensional panel data. The data are usually collected over time and over the same individuals, and then a regression is run over these two dimensions...

  • Panel data
    Panel data
    In statistics and econometrics, the term panel data refers to multi-dimensional data. Panel data contains observations on multiple phenomena observed over multiple time periods for the same firms or individuals....

  • Panjer recursion
    Panjer recursion
    The Panjer recursion is an algorithm to compute the probability distribution of a compound random variable S = X_1 + X_2 + ... + X_N, where both N and the X_i are random variables of special types. In more general cases the distribution of S is a compound distribution. The recursion for the special cases considered was...

     – a class of discrete compound distributions
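
    A short sketch of the recursion for one special case, a Poisson claim count with probabilities f[0], f[1], ... for integer-valued claim sizes; the severity distribution below is made up for illustration:

      # Panjer recursion for a compound Poisson sum S = X_1 + ... + X_N:
      # g[0] = exp(lam * (f[0] - 1)) and, for s >= 1,
      # g[s] = (lam / s) * sum_{j=1..s} j * f[j] * g[s - j].
      import math

      def compound_poisson_pmf(lam, f, s_max):
          g = [math.exp(lam * (f[0] - 1.0))]
          for s in range(1, s_max + 1):
              top = min(s, len(f) - 1)
              g.append(lam / s * sum(j * f[j] * g[s - j] for j in range(1, top + 1)))
          return g

      severity = [0.1, 0.5, 0.3, 0.1]              # illustrative pmf of each X_i on {0,1,2,3}
      print(compound_poisson_pmf(lam=2.0, f=severity, s_max=10))
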
  • Paleostatistics
    Paleostatistics
    Paleontology often faces phenomena so vast and complex they can be described only through statistics. First applied to the study of a population in 1662, statistics is today a basic tool for natural sciences practitioners, and a solid acquaintance with its methods and applications is essential for...

  • Paley–Zygmund inequality
    Paley–Zygmund inequality
    In mathematics, the Paley–Zygmund inequality bounds the probability that a positive random variable is small, in terms of its mean and variance...
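
    For reference, the standard statement (in LaTeX notation): if Z >= 0 has finite variance and 0 <= theta <= 1, then

      \Pr\left(Z > \theta\,\mathrm{E}[Z]\right) \;\ge\; (1-\theta)^2 \,\frac{\mathrm{E}[Z]^2}{\mathrm{E}[Z^2]}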

  • Parabolic fractal distribution
    Parabolic fractal distribution
    In probability and statistics, the parabolic fractal distribution is a type of discrete probability distribution in which the logarithm of the frequency or size of entities in a population is a quadratic polynomial of the logarithm of the rank...

  • PARAFAC (parallel factor analysis)
  • Parallel factor analysis redirects to PARAFAC
  • Paradigm (experimental)
    Paradigm (experimental)
    In the behavioural sciences, e.g. Psychology, Biology, Neurosciences, an experimental paradigm is an experimental setup that is defined by certain fine-tuned standards and often has a theoretical background...

  • Parameter identification problem
    Parameter identification problem
    The parameter identification problem is a problem which can occur in the estimation of multiple-equation econometric models where the equations have variables in common....

  • Parameter space
    Parameter space
    In science, a parameter space is the set of values of parameters encountered in a particular mathematical model. Often the parameters are inputs of a function, in which case the technical term for the parameter space is domain of a function....

  • Parametric family
    Parametric family
    In mathematics and its applications, a parametric family or a parameterized family is a family of objects whose definitions depend on a set of parameters....

  • Parametric model
    Parametric model
    In statistics, a parametric model or parametric family or finite-dimensional model is a family of distributions that can be described using a finite number of parameters...

  • Parametric statistics
    Parametric statistics
    Parametric statistics is a branch of statistics that assumes that the data has come from a type of probability distribution and makes inferences about the parameters of the distribution. Most well-known elementary statistical methods are parametric....

  • Pareto analysis
    Pareto analysis
    Pareto analysis is a statistical technique in decision making that is used for selection of a limited number of tasks that produce significant overall effect. It uses the Pareto principle – the idea that by doing 20% of work, 80% of the advantage of doing the entire job can be generated...

  • Pareto chart
  • Pareto distribution
  • Pareto index
    Pareto index
    In economics the Pareto index, named after the Italian economist and sociologist Vilfredo Pareto, is a measure of the breadth of income or wealth distribution. It is one of the parameters specifying a Pareto distribution and embodies the Pareto principle...

  • Pareto interpolation
    Pareto interpolation
    Pareto interpolation is a method of estimating the median and other properties of a population that follows a Pareto distribution. It is used in economics when analysing the distribution of incomes in a population, when one must base estimates on a relatively small random sample taken from the...

  • Pareto principle
    Pareto principle
    The Pareto principle states that, for many events, roughly 80% of the effects come from 20% of the causes.Business-management consultant Joseph M...

  • Partial autocorrelation — redirects to Partial autocorrelation function
    Partial autocorrelation function
    In time series analysis, the partial autocorrelation function plays an important role in data analyses aimed at identifying the extent of the lag in an autoregressive model...

  • Partial autocorrelation function
    Partial autocorrelation function
    In time series analysis, the partial autocorrelation function plays an important role in data analyses aimed at identifying the extent of the lag in an autoregressive model...

  • Partial correlation
    Partial correlation
    In probability theory and statistics, partial correlation measures the degree of association between two random variables, with the effect of a set of controlling random variables removed....

  • Partial least squares
  • Partial least squares regression
    Partial least squares regression
    Partial least squares regression is a statistical method that bears some relation to principal components regression; instead of finding hyperplanes of maximum variance between the response and independent variables, it finds a linear regression model by projecting the predicted variables and the...

  • Partial leverage
  • Partial regression plot
    Partial regression plot
    In applied statistics, a partial regression plot attempts to show the effect of adding an additional variable to the model...

  • Partial residual plot
    Partial residual plot
    In applied statistics, a partial residual plot is a graphical technique that attempts to show the relationship between a given independent variable and the response variable given that other independent variables are also in the model....

  • Particle filter
    Particle filter
    In statistics, particle filters, also known as Sequential Monte Carlo methods, are sophisticated model estimation techniques based on simulation...

  • Partition of sums of squares
  • Parzen window
  • Path analysis (statistics)
  • Path coefficient
  • Path space
    Path space
    In mathematics, the term path space refers to any topological space of paths from one specified set into another. In particular, it may refer to the classical Wiener space of continuous paths, or the Skorokhod space of càdlàg paths....

  • Pattern recognition
    Pattern recognition
    In machine learning, pattern recognition is the assignment of some sort of output value to a given input value , according to some specific algorithm. An example of pattern recognition is classification, which attempts to assign each input value to one of a given set of classes...

  • Pearson's chi-squared test
    Pearson's chi-squared test
    Pearson's chi-squared test is the best-known of several chi-squared tests – statistical procedures whose results are evaluated by reference to the chi-squared distribution. Its properties were first investigated by Karl Pearson in 1900...

     (one of various chi-squared tests)
  • Pearson distribution
    Pearson distribution
    The Pearson distribution is a family of continuous probability distributions. It was first published by Karl Pearson in 1895 and subsequently extended by him in 1901 and 1916 in a series of articles on biostatistics....

  • Pearson product-moment correlation coefficient
    Pearson product-moment correlation coefficient
    In statistics, the Pearson product-moment correlation coefficient is a measure of the correlation between two variables X and Y, giving a value between +1 and −1 inclusive...
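
    For reference, the usual sample formula (standard notation, not quoted from the entry):

      r_{xy} \;=\; \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}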

  • People v. Collins
    People v. Collins
    The People of the State of California v. Collins was a 1968 jury trial in California, USA that made notorious forensic use of mathematics and probability....

     (prob/stats related court case)
  • Per capita
    Per capita
    Per capita is a Latin prepositional phrase, formed from per and capita. The phrase thus means "by heads" or "for each head", i.e. per individual or per person...

  • Per-comparison error rate
  • Per-protocol analysis
    Per-protocol analysis
    In epidemiology, per-protocol analysis is a strategy of analysis in which only patients who complete the entire clinical trial are counted towards the final results. Intention to treat analysis uses data from all patients, including those who did not complete the study....

  • Percentile
    Percentile
    In statistics, a percentile is the value of a variable below which a certain percent of observations fall. For example, the 20th percentile is the value below which 20 percent of the observations may be found...

  • Percentile rank
    Percentile rank
    The percentile rank of a score is the percentage of scores in its frequency distribution that are the same or lower than it. For example, a test score that is greater than 75% of the scores of people taking the test is said to be at the 75th percentile....
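
    A tiny sketch of that computation ("the same or lower"), with made-up scores; some texts instead use a strictly-below or midpoint convention:

      # Percentile rank: percentage of scores at or below a given value.
      def percentile_rank(scores, value):
          at_or_below = sum(1 for s in scores if s <= value)
          return 100.0 * at_or_below / len(scores)

      scores = [55, 60, 62, 70, 71, 75, 80, 85, 90, 95]
      print(percentile_rank(scores, 75))   # 60.0, since 6 of the 10 scores are <= 75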

  • Periodic variation — redirects to Seasonality
    Seasonality
    In statistics, many time series exhibit cyclic variation known as seasonality, periodic variation, or periodic fluctuations. This variation can be either regular or semi regular....

  • Periodogram
    Periodogram
    The periodogram is an estimate of the spectral density of a signal. The term was coined by Arthur Schuster in 1898 as in the following quote:...

  • Peirce's criterion
    Peirce's criterion
    In robust statistics, Peirce's criterion is a rule for eliminating outliers from data sets, which was devised by Benjamin Peirce....

  • Pensim2
    Pensim2
    Pensim2 is a dynamic microsimulation model to simulate the income of pensioners, owned by the British Department for Work and Pensions.Pensim2 is the second version of Pensim which was developed in the 1990s. The time horizon of the model is 100 years, by which time today's school leavers will...

     — an econometric model
  • Percentage point
    Percentage point
    Percentage points are the unit for the arithmetic difference of two percentages. Consider the following hypothetical example: in 1980, 40 percent of the population smoked, and in 1990 only 30 percent smoked...

  • Permutation test — redirects to Resampling (statistics)
    Resampling (statistics)
    In statistics, resampling is any of a variety of methods for doing one of the following: (1) estimating the precision of sample statistics by using subsets of available data or drawing randomly with replacement from a set of data points; (2) exchanging labels on data points when performing significance...

  • Pharmaceutical statistics
    Pharmaceutical Statistics
    Pharmaceutical Statistics is a peer-reviewed scientific journal that publishes papers related to pharmaceutical statistics. It is the official journal of Statisticians in the Pharmaceutical Industry and is published by John Wiley & Sons....

  • Phase dispersion minimization
    Phase dispersion minimization
    Phase dispersion minimization is a data analysis technique that searches for periodic components of a time series data set. It is useful for data sets with gaps, non-sinusoidal variations, poor time coverage or other problems that would make Fourier techniques unusable...

  • Phase-type distribution
    Phase-type distribution
    A phase-type distribution is a probability distribution that results from a system of one or more inter-related Poisson processes occurring in sequence, or phases. The sequence in which each of the phases occur may itself be a stochastic process. The distribution can be represented by a random...

  • Phi coefficient
    Phi coefficient
    In statistics, the phi coefficient is a measure of association for two binary variables introduced by Karl Pearson. This measure is similar to the Pearson correlation coefficient in its interpretation...

  • Phillips–Perron test
  • Philosophy of probability
  • Philosophy of statistics
    Philosophy of statistics
    The philosophy of statistics involves the meaning, justification, utility, use and abuse of statistics and its methodology, and ethical and epistemological issues involved in the consideration of choice and interpretation of data and methods of Statistics....

  • Pie chart
    Pie chart
    A pie chart is a circular chart divided into sectors, illustrating proportion. In a pie chart, the arc length of each sector is proportional to the quantity it represents. When angles are measured in turns, a percentage corresponds to the same number of centiturns...

  • Pignistic probability
    Pignistic probability
    Pignistic probability, in decision theory, is a probability that a rational person will assign to an option when required to make a decision.A person may have, at one level certain beliefs or a lack of knowledge, or uncertainty, about the options and their actual likelihoods...

  • Pinsker's inequality
    Pinsker's inequality
    In information theory, Pinsker's inequality, named after its inventor Mark Semenovich Pinsker, is an inequality that relates Kullback-Leibler divergence and the total variation distance...
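
    One standard form of the inequality (in LaTeX notation), with the Kullback–Leibler divergence measured in nats and \delta(P,Q) the total variation distance:

      \delta(P,Q) \;\le\; \sqrt{\tfrac{1}{2}\,D_{\mathrm{KL}}(P\,\|\,Q)}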

  • Pitman–Koopman–Darmois theorem
  • Pitman–Yor process
  • Pivotal quantity
    Pivotal quantity
    In statistics, a pivotal quantity or pivot is a function of observations and unobservable parameters whose probability distribution does not depend on unknown parameters....

  • Placebo-controlled study
  • Plackett–Burman design
  • Plate notation
    Plate notation
    Plate notation is a method of representing variables that repeat in a graphical model. Instead of drawing each repeated variable individually, a plate or rectangle is used to group variables into a subgraph that repeat together, and a number is drawn on the plate to represent the number of...

  • Player wins
    Player wins
    Player wins is a statistic, developed by Dean Oliver, the first full-time statistical analyst in the NBA, used to estimate the number of games a player won for his team. The formula used to calculate player wins is Player Games * Player Winning Percentage....

  • Plot (graphics)
    Plot (graphics)
    A plot is a graphical technique for representing a data set, usually as a graph showing the relationship between two or more variables. The plot can be drawn by hand or by a mechanical or electronic plotter. Graphs are a visual representation of the relationship between variables, very useful for...

  • Pocock boundary
    Pocock boundary
    The Pocock boundary is a method for determining whether to stop a clinical trial prematurely. The typical clinical trial compares two groups of patients. One group is given a placebo or conventional treatment, while the other group of patients is given the treatment that is being tested...

  • Poincaré plot
    Poincaré plot
    A Poincaré plot, named after Henri Poincaré, is used to quantify self-similarity in processes, usually periodic functions. It is also known as a return map.Given a time series of the form...

  • Point-biserial correlation coefficient
    Point-biserial correlation coefficient
    The point biserial correlation coefficient is a correlation coefficient used when one variable is dichotomous; Y can either be "naturally" dichotomous, like gender, or an artificially dichotomized variable. In most situations it is not advisable to artificially dichotomize variables...

  • Point estimation
    Point estimation
    In statistics, point estimation involves the use of sample data to calculate a single value which is to serve as a "best guess" or "best estimate" of an unknown population parameter....

  • Point pattern analysis
  • Point process
    Point process
    In statistics and probability theory, a point process is a type of random process for which any one realisation consists of a set of isolated points either in time or geographical space, or in even more general spaces...

  • Poisson binomial distribution
  • Poisson distribution
    Poisson distribution
    In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since...
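
    For reference, the probability mass function, where \lambda is the expected number of events in the interval:

      \Pr(X = k) \;=\; \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots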

  • Poisson hidden Markov model
    Poisson hidden Markov model
    In statistics, Poisson hidden Markov models are a special case of hidden Markov models where a Poisson process has a rate which varies in association with changes between the different states of a Markov model...

  • Poisson limit theorem
    Poisson limit theorem
    The Poisson limit theorem gives a Poisson approximation to the binomial distribution, under certain conditions. The theorem was named after Siméon-Denis Poisson....

  • Poisson process
    Poisson process
    A Poisson process, named after the French mathematician Siméon-Denis Poisson, is a stochastic process in which events occur continuously and independently of one another...

  • Poisson regression
    Poisson regression
    In statistics, Poisson regression is a form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable Y has a Poisson distribution, and assumes the logarithm of its expected value can be modeled by a linear combination of unknown...
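
    The model described above, written out in the usual notation (x_1, ..., x_p the explanatory variables, \beta the unknown coefficients):

      \log \mathrm{E}[Y \mid x] \;=\; \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p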

  • Poisson random numbers — redirects to section of Poisson distribution
    Poisson distribution
    In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since...

  • Poisson sampling
    Poisson sampling
    In the theory of finite population sampling, Poisson sampling is a sampling process where each element of the population that is sampled is subjected to an independent Bernoulli trial which determines whether the element becomes part of the sample during the drawing of a single sample.Each element...

  • Polar distribution — redirects to Circular distribution
  • Policy capturing
    Policy capturing
    Policy capturing or "the PC technique" is a statistical method used in social psychology to quantify the relationship between a person's judgement and the information that was used to make that judgement. Policy capturing assessments rely upon regression analysis models...

  • Political forecasting
    Political forecasting
    Political forecasting aims at predicting the outcome of elections, and most models draw on opinion polls, which are an integral part of political forecasting. However, incorporating poll results into political forecasting models can cause problems in predicting the outcome of elections...

  • Pollaczek–Khinchine formula
  • Pollyanna Creep
    Pollyanna creep
    Pollyanna Creep is a phrase that originated with John Williams, a California-based economic analyst and statistician. It describes the way the U.S. government has modified the way important economic measures are calculated with the purpose of giving a better impression of economic development. This...

  • Poly-Weibull distribution
    Poly-Weibull distribution
    In probability theory and statistics, the poly-Weibull distribution is a continuous probability distribution. It is defined as the distribution of the smallest of a number of statistically independent random variables having non-identical Weibull...

  • Polychoric correlation
    Polychoric correlation
    In statistics, polychoric correlation is a technique for estimating the correlation between two theorised normally distributed continuous latent variables, from two observed ordinal variables. Tetrachoric correlation is a special case of the polychoric correlation applicable when both observed...

  • Polynomial and rational function modeling
    Polynomial and rational function modeling
    In statistical modeling, polynomial functions and rational functions are sometimes used as an empirical technique for curve fitting. A polynomial function is one that has the form...

  • Polynomial chaos
    Polynomial chaos
    Polynomial chaos, also called "Wiener chaos expansion", is a non-sampling-based method to determine the evolution of uncertainty in a dynamical system when there is probabilistic uncertainty in the system parameters....

  • Polynomial regression
    Polynomial regression
    In statistics, polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth order polynomial...

  • Polytree
    Polytree
    In graph theory, a polytree is a directed graph with at most one undirected path between any two vertices. In other words, a polytree is a directed acyclic graph for which there are no undirected cycles either...

      (Bayesian networks)
  • Pooled standard deviation redirects to Pooled variance
    Pooled variance
    In statistics, data are often collected for a dependent variable, y, over a range of values for the independent variable, x. For example, the observation of fuel consumption might be studied as a function of engine speed while the engine load is held constant...
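
    The textbook pooled estimate for k groups with sizes n_i and sample variances s_i^2, added here for reference:

      s_p^2 \;=\; \frac{\sum_{i=1}^{k} (n_i - 1)\, s_i^2}{\sum_{i=1}^{k} (n_i - 1)}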

  • Pooling design
    Pooling design
    A pooling design is an algorithm to intelligently classify items by testing them in groups or pools rather than individually. The result from the pools is usually binary — either positive or negative. A negative result can imply that all the items tested in that pool were failures, if the...

  • Popoviciu's inequality on variances
    Popoviciu's inequality on variances
    In probability theory, Popoviciu's inequality, named after Tiberiu Popoviciu, is an upper bound on the variance of any bounded probability distribution. Let M and m be upper and lower bounds on the values of any random variable with a particular probability distribution...
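
    The inequality itself, for a random variable X taking values in [m, M]:

      \operatorname{Var}(X) \;\le\; \tfrac{1}{4}(M - m)^2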

  • Population
    Statistical population
    A statistical population is a set of entities concerning which statistical inferences are to be drawn, often based on a random sample taken from the population. For example, if we were interested in generalizations about crows, then we would describe the set of crows that is of interest...

  • Population dynamics
    Population dynamics
    Population dynamics is the branch of life sciences that studies short-term and long-term changes in the size and age composition of populations, and the biological and environmental processes influencing those changes...

  • Population ecology
    Population ecology
    Population ecology is a sub-field of ecology that deals with the dynamics of species populations and how these populations interact with the environment. It is the study of how the population sizes of species living together in groups change over time and space....

      – application
  • Population modeling
    Population modeling
    A population model is a type of mathematical model that is applied to the study of population dynamics.Models allow a better understanding of how complex interactions and processes work. Modeling of dynamic interactions in nature can provide a manageable way of understanding how numbers change over...

  • Population process
    Population process
    In applied probability, a population process is a Markov chain in which the state of the chain is analogous to the number of individuals in a population , and changes to the state are analogous to the addition or removal of individuals from the population.Although named by analogy to biological...

  • Population pyramid
    Population pyramid
    A population pyramid, also called an age structure diagram, is a graphical illustration that shows the distribution of various age groups in a population , which forms the shape of a pyramid when the population is growing...

  • Population statistics
    Population statistics
    Population statistics is the use of statistics to analyze characteristics or changes to a population. It is related to social demography and demography.Population statistics can analyze anything from global demographic changes to local small scale changes...

  • Population variance
  • Population viability analysis
    Population viability analysis
    Population viability analysis is a species-specific method of risk assessment frequently used in conservation biology. It is traditionally defined as the process that determines the probability that a population will go extinct within a given number of years. More recently, PVA has been described...

  • Portmanteau test
    Portmanteau test
    A portmanteau test is a type of statistical hypothesis test in which the null hypothesis is well specified, but the alternative hypothesis is more loosely specified. Tests constructed in this context can have the property of being at least moderately powerful against a wide range of departures from...

  • Positive predictive value
    Positive predictive value
    In statistics and diagnostic testing, the positive predictive value, or precision rate is the proportion of subjects with positive test results who are correctly diagnosed. It is a critical measure of the performance of a diagnostic method, as it reflects the probability that a positive test...

  • Post-hoc analysis
    Post-hoc analysis
    Post-hoc analysis , in the context of design and analysis of experiments, refers to looking at the data—after the experiment has concluded—for patterns that were not specified a priori. It is sometimes called by critics data dredging to evoke the sense that the more one looks the more likely...

  • Posterior probability
    Posterior probability
    In Bayesian statistics, the posterior probability of a random event or an uncertain proposition is the conditional probability that is assigned after the relevant evidence is taken into account...

  • Power law
    Power law
    A power law is a special kind of mathematical relationship between two quantities. When the frequency of an event varies as a power of some attribute of that event , the frequency is said to follow a power law. For instance, the number of cities having a certain population size is found to vary...

  • Power transform
    Power transform
    In statistics, the power transform is from a family of functions that are applied to create a rank-preserving transformation of data using power functions. This is a useful data processing technique used to stabilize variance, make the data more normal distribution-like, improve the correlation...

  • Prais–Winsten estimation
  • Pre- and post-test probability
    Pre- and post-test probability
    Pre-test probability and post-test probability are the subjective probabilities of the presence of a condition before and after a diagnostic test, respectively...

  • Precision (statistics)
    Precision (statistics)
    In statistics, the term precision can mean a specifically defined quantity, most commonly the reciprocal of the variance. This is in addition to its more general meaning in the contexts of accuracy and precision and of precision and recall....

  • Precision and recall
    Precision and recall
    In pattern recognition and information retrieval, precision is the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance...

  • Prediction interval
    Prediction interval
    In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval in which future observations will fall, with a certain probability, given what has already been observed...

  • Predictive analytics
    Predictive analytics
    Predictive analytics encompasses a variety of statistical techniques from modeling, machine learning, data mining and game theory that analyze current and historical facts to make predictions about future events....

  • Predictive inference
    Predictive inference
    Predictive inference is an approach to statistical inference that emphasizes the prediction of future observations based on past observations.Initially, predictive inference was based on observable parameters and it was the main purpose of studying probability, but it fell out of favor in the 20th...

  • Predictive informatics
    Predictive informatics
    Predictive informatics is the combination of predictive modeling and informatics applied to healthcare, pharmaceutical, life sciences and business industries....

  • Predictive modelling
    Predictive modelling
    Predictive modelling is the process by which a model is created or chosen to try to best predict the probability of an outcome. In many cases the model is chosen on the basis of detection theory to try to guess the probability of an outcome given a set amount of input data, for example given an...

  • Predictive validity
    Predictive validity
    In psychometrics, predictive validity is the extent to which a score on a scale or test predicts scores on some criterion measure.For example, the validity of a cognitive test for job performance is the correlation between test scores and, for example, supervisor performance ratings...

  • Preference regression (in marketing)
    Preference regression (in marketing)
    Preference regression is a statistical technique used by marketers to determine consumers’ preferred core benefits. It usually supplements product positioning techniques like multidimensional scaling or factor analysis and is used to create ideal vectors on perceptual maps....

  • Preferential attachment process — redirects to Preferential attachment
    Preferential attachment
    A preferential attachment process is any of a class of processes in which some quantity, typically some form of wealth or credit, is distributed among a number of individuals or objects according to how much they already have, so that those who are already wealthy receive more than those who are not...

  • Prevalence
    Prevalence
    In epidemiology, the prevalence of a health-related state (for example, a risk factor) in a statistical population is defined either as the total number of cases in the population at a given time, or as that number divided by the number of individuals in the population...

  • Principal component analysis
    • Multilinear principal-component analysis
  • Principal component regression
    Principal component regression
    In statistics, principal component regression is a regression analysis that uses principal component analysis when estimating regression coefficients...

  • Principal geodesic analysis
    Principal geodesic analysis
    In geometric data analysis and statistical shape analysis, principal geodesic analysis is a generalization of principal component analysis to a non-Euclidean, non-linear setting of manifolds suitable for use with shape descriptors such as medial representations....

  • Principal stratification
    Principal stratification
    Principal stratification is a statistical technique used in causal inference....

  • Principle of indifference
    Principle of indifference
    The principle of indifference is a rule for assigning epistemic probabilities.Suppose that there are n > 1 mutually exclusive and collectively exhaustive possibilities....

  • Principle of marginality
    Principle of marginality
    In statistics, the principle of marginality refers to the fact that the average effects of variables in an analysis are marginal to their interaction effect...

  • Principle of maximum entropy
    Principle of maximum entropy
    In Bayesian probability, the principle of maximum entropy is a postulate which states that, subject to known constraints, the probability distribution which best represents the current state of knowledge is the one with the largest entropy. Let some testable information about a probability distribution...

  • Prior knowledge for pattern recognition
    Prior knowledge for pattern recognition
    Pattern recognition is a very active field of research intimately bound to machine learning. Also known as classification or statistical classification, pattern recognition aims at building a classifier that can determine the class of an input pattern...

  • Prior probability
    Prior probability
    In Bayesian statistical inference, a prior probability distribution, often called simply the prior, of an uncertain quantity p is the probability distribution that would express one's uncertainty about p before the "data"...

  • Prior probability distribution redirects to Prior probability
    Prior probability
    In Bayesian statistical inference, a prior probability distribution, often called simply the prior, of an uncertain quantity p is the probability distribution that would express one's uncertainty about p before the "data"...

  • Probabilistic causation
    Probabilistic causation
    Probabilistic causation designates a group of philosophical theories that aim to characterize the relationship between cause and effect using the tools of probability theory...

  • Probabilistic design
    Probabilistic design
    Probabilistic design is a discipline within engineering design. It deals primarily with the consideration of the effects of random variability upon the performance of an engineering system during the design phase. Typically, these effects are related to quality and reliability...

  • Probabilistic forecasting
    Probabilistic forecasting
    Probabilistic forecasting summarises what is known about, or opinions about, future events. In contrast to a single-valued forecast, probabilistic forecasts assign a probability to each of a number of different outcomes,...

  • Probabilistic latent semantic analysis
    Probabilistic latent semantic analysis
    Probabilistic latent semantic analysis , also known as probabilistic latent semantic indexing is a statistical technique for the analysis of two-mode and co-occurrence data. PLSA evolved from latent semantic analysis, adding a sounder probabilistic model...

  • Probabilistic metric space
    Probabilistic metric space
    A probabilistic metric space is a generalization of metric spaces where the distance is no longer defined on positive real numbers, but on distribution functions....

  • Probabilistic proposition
    Probabilistic proposition
    A probabilistic proposition is a proposition with a measured probability of being true for an arbitrary person at an arbitrary time. These are some examples of probabilistic propositions collected by the Mindpixel project: "You are not human" 0.17...

  • Probabilistic relational model
    Probabilistic relational model
    A probabilistic relational model is the counterpart of a Bayesian network in statistical relational learning....

  • Probability
    Probability
    Probability is ordinarily used to describe an attitude of mind towards some proposition of whose truth we are not certain. The proposition of interest is usually of the form "Will a specific event occur?" The attitude of mind is of the form "How certain are we that the event will occur?" The...

  • Probability and statistics
    Probability and statistics
    See the separate articles on probability or the article on statistics. Statistical analysis often uses probability distributions, and the two topics are often studied together. However, probability theory contains much that is of mostly mathematical interest and not directly relevant to statistics...

  • Probability density function
    Probability density function
    In probability theory, a probability density function , or density of a continuous random variable is a function that describes the relative likelihood for this random variable to occur at a given point. The probability for the random variable to fall within a particular region is given by the...

  • Probability distribution
    Probability distribution
    In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values....

  • Probability distribution function
    Probability distribution function
    Depending upon which text is consulted, a probability distribution function is any of: a probability distribution, a cumulative distribution function, a probability mass function, or a probability density function....

     (disambiguation)
  • Probability integral transform
    Probability integral transform
    In statistics, the probability integral transform or transformation relates to the result that data values that are modelled as being random variables from any given continuous distribution can be converted to random variables having a uniform distribution...

  • Probability interpretations
    Probability interpretations
    The word probability has been used in a variety of ways since it was first coined in relation to games of chance. Does probability measure the real, physical tendency of something to occur, or is it just a measure of how strongly one believes it will occur? In answering such questions, we...

  • Probability mass function
    Probability mass function
    In probability theory and statistics, a probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value...

  • Probability matching
    Probability matching
    Probability matching is a suboptimal decision strategy in which predictions of class membership are proportional to the class base rates. Thus, if in the training set positive examples are observed 60% of the time, and negative examples are observed 40% of the time, the observer using a...

  • Probability metric
  • Probability of error
    Probability of error
    In statistics, the term "error" arises in two ways. Firstly, it arises in the context of decision making, where the probability of error may be considered as being the probability of making a wrong decision and which would have a different value for each type of error...

  • Probability of precipitation
    Probability of Precipitation
    A probability of precipitation is a formal measure of the likelihood of precipitation that is often published from weather forecasting models. Its definition varies.-U.S. usage:...

  • Probability plot
    Probability plot
    In statistics, a P-P plot is a probability plot for assessing how closely two data sets agree, which plots the two cumulative distribution functions against each other....

  • Probability plot correlation coefficient — redirects to Q-Q plot
    Q-Q plot
    In statistics, a Q-Q plot is a probability plot, which is a graphical method for comparing two probability distributions by plotting their quantiles against each other. First, the set of intervals for the quantiles are chosen...

  • Probability plot correlation coefficient plot
    Probability plot correlation coefficient plot
    Many statistical analyses are based on distributional assumptions about the population from which the data have been obtained. However, distributional families can have radically different shapes depending on the value of the shape parameter. Therefore, finding a reasonable choice for the shape...

  • Probability space
    Probability space
    In probability theory, a probability space or a probability triple is a mathematical construct that models a real-world process consisting of states that occur randomly. A probability space is constructed with a specific kind of situation or experiment in mind...

  • Probability theory
    Probability theory
    Probability theory is the branch of mathematics concerned with analysis of random phenomena. The central objects of probability theory are random variables, stochastic processes, and events: mathematical abstractions of non-deterministic events or measured quantities that may either be single...

  • Probability-generating function
    Probability-generating function
    In probability theory, the probability-generating function of a discrete random variable is a power series representation of the probability mass function of the random variable...

  • Probable error
    Probable error
    In statistics, the probable error of a quantity is a value describing the probability distribution of that quantity. It defines the half-range of an interval about a central point for the distribution, such that half of the values from the distribution will lie within the interval and...

  • Probit
    Probit
    In probability theory and statistics, the probit function is the inverse cumulative distribution function , or quantile function associated with the standard normal distribution...

  • Probit model
    Probit model
    In statistics, a probit model is a type of regression where the dependent variable can only take two values, for example married or not married....

  • Procedural confound
  • Process Window Index
    Process Window Index
    Process Window Index is a statistical measure that quantifies the robustness of a manufacturing process which involves heating and cooling, known as a thermal process...

  • Procrustes analysis
    Procrustes analysis
    In statistics, Procrustes analysis is a form of statistical shape analysis used to analyse the distribution of a set of shapes. The name Procrustes refers to a bandit from Greek mythology who made his victims fit his bed either by stretching their limbs or cutting them off.To compare the shape of...

  • Proebsting's paradox
    Proebsting's paradox
    In probability theory, Proebsting's paradox is an argument that appears to show that the Kelly criterion can lead to ruin. Although it can be resolved mathematically, it raises some interesting issues about the practical application of Kelly, especially in investing. It was named and first...

  • Product distribution
    Product distribution
    A product distribution is a probability distribution constructed as the distribution of the product of random variables having two other known distributions...

  • Product form solution
    Product form solution
    In probability theory, a product form solution is a particularly efficient form of solution for determining some metric of a system with distinct sub-components, where the metric for the collection of components can be written as a product of the metric across the different components...

  • Profile likelihood redirects to Likelihood function
    Likelihood function
    In statistics, a likelihood function is a function of the parameters of a statistical model, defined as follows: the likelihood of a set of parameter values given some observed outcomes is equal to the probability of those observed outcomes given those parameter values...

  • Progressively measurable process
    Progressively measurable process
    In mathematics, progressive measurability is a property of stochastic processes. A progressively measurable process is one for which events defined in terms of values of the process across a range of times can be assigned probabilities . Being progressively measurable is a strictly stronger...

  • Prognostics
    Prognostics
    Prognostics is an engineering discipline focused on predicting the time at which a system or a component will no longer perform its intended function . This lack of performance is most often a failure beyond which the system can no longer be used to meet desired performance...

  • Projection pursuit
    Projection pursuit
    Projection pursuit is a type of statistical technique which involves finding the most "interesting" possible projections in multidimensional data. Often, projections which deviate more from a Normal distribution are considered to be more interesting...

  • Projection pursuit regression
    Projection pursuit regression
    In statistics, projection pursuit regression is a statistical model developed by Jerome H. Friedman and Werner Stuetzle which is an extension of additive models...

  • Proof of Stein's example
    Proof of Stein's example
    Stein's example is an important result in decision theory. The following is an outline of its proof; the reader is referred to the main article for more information....

  • Propagation of uncertainty
    Propagation of uncertainty
    In statistics, propagation of error is the effect of variables' uncertainties on the uncertainty of a function based on them...

  • Propensity probability
    Propensity probability
    The propensity theory of probability is one interpretation of the concept of probability. Theorists who adopt this interpretation think of probability as a physical propensity, or disposition, or tendency of a given type of physical situation to yield an outcome of a certain kind, or to yield a...

  • Propensity score
    Propensity score
    In the design of experiments, a propensity score is the probability of a unit being assigned to a particular condition in a study given a set of known covariates...

  • Propensity score matching
    Propensity score matching
    In the statistical analysis of observational data, propensity score matching is a methodology attempting to provide unbiased estimation of treatment-effects...

  • Proper linear model
    Proper linear model
    In statistics, a proper linear model is a linear regression model in which the weights given to the predictor variables are chosen in such a way as to optimize the relationship between the prediction and the criterion. Simple regression analysis is the most common example of a proper linear model...

  • Proportional hazards models
    Proportional hazards models
    Proportional hazards models are a class of survival models in statistics. Survival models relate the time that passes before some event occurs to one or more covariates that may be associated with that quantity. In a proportional hazards model, the unique effect of a unit increase in a covariate...

  • Proportional reduction in loss
    Proportional reduction in loss
    Proportional reduction in loss refers to a general framework for developing and evaluating measures of the reliability of particular ways of making observations which are possibly subject to errors of all types...

  • Prosecutor's fallacy
    Prosecutor's fallacy
    The prosecutor's fallacy is a fallacy of statistical reasoning made in law where the context in which the accused has been brought to court is falsely assumed to be irrelevant to judging how confident a jury can be in evidence against them with a statistical measure of doubt...

  • Proxy (statistics)
    Proxy (statistics)
    In statistics, a proxy variable is something that is probably not in itself of any great interest, but from which a variable of interest can be obtained...

  • Psephology
    Psephology
    Psephology is that branch of political science which deals with the study and scientific analysis of elections. Psephology uses historical precinct voting data, public opinion polls, campaign finance information and similar statistical data. The term was coined in the United Kingdom in 1952 by...

  • Pseudo-determinant
    Pseudo-determinant
    In linear algebra and statistics, the pseudo-determinant is the product of all non-zero eigenvalues of a square matrix. It coincides with the regular determinant when the matrix is non-singular.- Definition :...

  • Pseudocount
    Pseudocount
    A pseudocount is an amount added to the number of observed cases in order to change the expected probability in a model of those data, when not known to be zero. Depending on the prior knowledge, which is sometimes a subjective value, a pseudocount may have any non-negative finite value...
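
    A minimal Python sketch of the idea, assuming a Laplace-style pseudocount alpha added to every category (function and variable names are illustrative):

      from collections import Counter

      def smoothed_probabilities(observations, vocabulary, alpha=1.0):
          # Add a pseudocount alpha to every category so that categories
          # never observed still receive a non-zero probability.
          counts = Counter(observations)
          total = len(observations) + alpha * len(vocabulary)
          return {c: (counts[c] + alpha) / total for c in vocabulary}

      print(smoothed_probabilities(["a", "a", "b"], vocabulary=["a", "b", "c"]))
      # {'a': 0.5, 'b': 0.333..., 'c': 0.166...}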

  • Pseudolikelihood
    Pseudolikelihood
    In statistical theory, a pseudolikelihood is an approximation to the joint probability distribution of a collection of random variables. The practical use of this is that it can provide an approximation to the likelihood function of a set of observed data which may either provide a computationally...

  • Pseudomedian
    Pseudomedian
    In statistics, the pseudomedian is defined as the median of all possible midpoints of pairs of observations. It is the Hodges–Lehmann one-sample estimate of the central location for a probability distribution.-References:...

  • Pseudoreplication
    Pseudoreplication
    Hurlbert defined pseudoreplication as the use of inferential statistics to test for treatment effects with data from experiments where either treatments are not replicated or replicates are not statistically independent....

  • PSPP
    PSPP
    PSPP is a free software application for analysis of sampled data. It has a graphical user interface and conventional command line interface. It is written in C, uses GNU Scientific Library for its mathematical routines, and plotutils for generating graphs....

     (free software)
  • Psychological statistics
    Psychological statistics
    Psychological statistics is the application of statistics to psychology. Some of the more common applications include: psychometrics, learning theory, perception, human development, abnormal psychology, personality tests, and psychological tests...

  • Psychometrics
    Psychometrics
    Psychometrics is the field of study concerned with the theory and technique of psychological measurement, which includes the measurement of knowledge, abilities, attitudes, personality traits, and educational measurement...

  • Pythagorean expectation
    Pythagorean expectation
    Pythagorean expectation is a formula invented by Bill James to estimate how many games a baseball team "should" have won based on the number of runs they scored and allowed. Comparing a team's actual and Pythagorean winning percentage can be used to evaluate how lucky that team was...
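
    A small Python sketch of the classic formula with exponent 2 (the run totals below are invented for illustration):

      def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2):
          # Estimated winning percentage from runs scored and runs allowed.
          rs, ra = runs_scored**exponent, runs_allowed**exponent
          return rs / (rs + ra)

      # A hypothetical team that scored 800 runs and allowed 700:
      print(round(pythagorean_win_pct(800, 700), 3))   # about 0.566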


Q

  • Q test
    Q test
    In statistics, Dixon's Q test, or simply the Q test, is used for identification and rejection of outliers. Per Dean and Dixon, and others, this test should be used sparingly and never more than once in a data set...
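
    A rough Python sketch of the Q statistic in its simplest form (gap to the nearest neighbour divided by the range); the data are invented, and the result must be compared with a tabulated critical value for the chosen sample size and confidence level:

      def dixon_q(sorted_values, suspect="high"):
          # Q = (gap between the suspect value and its nearest neighbour) / range
          if suspect == "high":
              gap = sorted_values[-1] - sorted_values[-2]
          else:
              gap = sorted_values[1] - sorted_values[0]
          return gap / (sorted_values[-1] - sorted_values[0])

      data = sorted([0.189, 0.167, 0.187, 0.183, 0.186, 0.182, 0.181])
      print(round(dixon_q(data, suspect="low"), 3))   # about 0.636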

  • Q research software
    Q research software
    Q research software is computer software for the analysis of market research data. Launched in 2007, Q is developed by Numbers International Pty Ltd.- Interactive data analysis :...

  • Q-exponential distribution
  • Q-function
    Q-function
    In statistics, the Q-function is the tail probability of the standard normal distribution. In other words, Q(x) is the probability that a standard normal random variable takes a value larger than x...
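
    A one-line Python sketch using only the standard library (Q(x) equals half the complementary error function of x/sqrt(2)):

      import math

      def q_function(x):
          # Tail probability P(Z > x) for a standard normal variable Z.
          return 0.5 * math.erfc(x / math.sqrt(2.0))

      print(q_function(0.0))    # 0.5
      print(q_function(1.96))   # roughly 0.025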

  • Q-Gaussian distribution
    Q-Gaussian distribution
    In q-analog theory, the q-Gaussian is a probability distribution arising from the maximization of the Tsallis entropy under appropriate constraints. It is one example of a Tsallis distribution. The q-Gaussian is a generalization of the Gaussian in the same way that Tsallis entropy is a...

  • Q-Q plot
    Q-Q plot
    In statistics, a Q-Q plot is a probability plot, a graphical method for comparing two probability distributions by plotting their quantiles against each other. First, a set of intervals for the quantiles is chosen...
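
    A minimal Python sketch of the empirical version for two samples of equal length (sample values are invented): sort each sample and pair the order statistics; plotting the pairs gives the Q-Q plot.

      sample_a = sorted([2.1, 3.4, 1.8, 2.9, 3.1, 2.5])
      sample_b = sorted([2.0, 3.9, 1.5, 3.0, 3.3, 2.4])

      for qa, qb in zip(sample_a, sample_b):
          # Points lying near the line y = x suggest similar distributions.
          print(qa, qb)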

  • Q-statistic
    Q-statistic
    The Q-statistic is a test statistic output by either the Box-Pierce test or, in a modified version which provides better small sample properties, by the Ljung-Box test. It follows the chi-squared distribution...

  • Quadrat
    Quadrat
    A quadrat is a square used in ecology and geography to isolate a sample, usually about 1 m² or 0.25 m². The quadrat is suitable for sampling plants, slow-moving animals, and some aquatic organisms. When an ecologist wants to know how many organisms there are in a particular habitat, it would not be...

  • Quadratic classifier
    Quadratic classifier
    A quadratic classifier is used in machine learning and statistical classification to separate measurements of two or more classes of objects or events by a quadric surface...

  • Quadratic form (statistics)
    Quadratic form (statistics)
    If ε is a vector of n random variables and Λ is an n × n symmetric matrix, then the scalar quantity εᵀΛε is known as a quadratic form in ε. Its expectation can be written in closed form...
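
    The closed-form expectation alluded to above is the standard result (stated from general knowledge), with \mu the mean vector and \Sigma the covariance matrix of \epsilon:

      \operatorname{E}\!\left[\epsilon^{T}\Lambda\,\epsilon\right] = \operatorname{tr}(\Lambda\Sigma) + \mu^{T}\Lambda\,\mu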

  • Quadratic variation
    Quadratic variation
    In mathematics, quadratic variation is used in the analysis of stochastic processes such as Brownian motion and martingales. Quadratic variation is just one kind of variation of a process.- Definition :...

  • Qualitative comparative analysis
    Qualitative comparative analysis
    Qualitative Comparative Analysis is a technique, developed by Charles Ragin in 1987, for solving the problems that are caused by making causal inferences on the basis of only a small number of cases...

  • Qualitative data
  • Qualitative variation
    Qualitative variation
    An index of qualitative variation is a measure of statistical dispersion in nominal distributions. There are a variety of these, but they have been relatively little-studied in the statistics literature...

  • Quality control
    Quality control
    Quality control, or QC for short, is a process by which entities review the quality of all factors involved in production. This approach places an emphasis on three aspects:...

  • Quantile
    Quantile
    Quantiles are points taken at regular intervals from the cumulative distribution function of a random variable. Dividing ordered data into q essentially equal-sized data subsets is the motivation for q-quantiles; the quantiles are the data values marking the boundaries between consecutive subsets...

  • Quantile function
    Quantile function
    In probability and statistics, the quantile function of the probability distribution of a random variable specifies, for a given probability, the value which the random variable will be at, or below, with that probability...

  • Quantile normalization
    Quantile normalization
    In statistics, quantile normalization is a technique for making two distributions identical in statistical properties. To quantile-normalize a test distribution to a reference distribution of the same length, sort the test distribution and sort the reference distribution...
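
    A minimal Python sketch of the procedure just described (sample values are invented): the i-th smallest test value is replaced by the i-th smallest reference value.

      def quantile_normalize(test, reference):
          order = sorted(range(len(test)), key=lambda i: test[i])  # indices of test, smallest first
          sorted_reference = sorted(reference)
          normalized = [0.0] * len(test)
          for rank, index in enumerate(order):
              normalized[index] = sorted_reference[rank]
          return normalized

      print(quantile_normalize([5, 2, 3, 4], [10, 20, 30, 40]))   # [40, 10, 20, 30]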

  • Quantile regression
    Quantile regression
    Quantile regression is a type of regression analysis used in statistics. Whereas the method of least squares results in estimates that approximate the conditional mean of the response variable given certain values of the predictor variables, quantile regression results in estimates approximating...

  • Quantitative marketing research
    Quantitative marketing research
    Quantitative marketing research is the application of quantitative research techniques to the field of marketing. It has roots in both the positivist view of the world, and the modern marketing viewpoint that marketing is an interactive process in which both the buyer and seller reach a satisfying...

  • Quantitative parasitology
    Quantitative parasitology
    Quantifying parasites in a sample of hosts or comparing measures of infection across two or more samples can be challenging. The parasitic infection of a sample of hosts inherently exhibits a complex pattern that cannot be adequately quantified by a single statistical measure...

  • Quantitative psychological research
    Quantitative psychological research
    Quantitative psychological research is defined as psychological research which performs mathematical modeling and statistical estimation or statistical inference. This definition distinguishes it from so-called qualitative psychological research; however, many psychologists do not acknowledge any...

  • Quantitative research
    Quantitative research
    In the social sciences, quantitative research refers to the systematic empirical investigation of social phenomena via statistical, mathematical or computational techniques. The objective of quantitative research is to develop and employ mathematical models, theories and/or hypotheses pertaining to...

  • Quantum (Statistical programming language)
  • Quartile
    Quartile
    In descriptive statistics, the quartiles of a set of values are the three points that divide the data set into four equal groups, each representing a fourth of the population being sampled...

  • Quartile coefficient of dispersion
    Quartile coefficient of dispersion
    In statistics, the quartile coefficient of dispersion is a descriptive statistic which measures dispersion and which is used to make comparisons within and between data sets....
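
    A short Python sketch using the standard library's quartile routine (the data, and the quartile convention used by statistics.quantiles, are illustrative):

      import statistics

      data = [2, 4, 6, 8, 10, 12, 14, 16]
      q1, _, q3 = statistics.quantiles(data, n=4)   # lower quartile, median, upper quartile
      print((q3 - q1) / (q3 + q1))                  # quartile coefficient of dispersion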

  • Quasi-birth–death process
  • Quasi-experiment
    Quasi-experiment
    A quasi-experiment is an empirical study used to estimate the causal impact of an intervention on its target population. Quasi-experimental research designs share many similarities with the traditional experimental design or randomized controlled trial, but they specifically lack the element of...

  • Quasi-experimental design — redirects to Design of quasi-experiments
  • Quasi-likelihood
    Quasi-likelihood
    In statistics, quasi-likelihood estimation is one way of allowing for overdispersion, that is, greater variability in the data than would be expected from the statistical model used. It is most often used with models for count data or grouped binary data, i.e...

  • Quasi-maximum likelihood
    Quasi-maximum likelihood
    A quasi-maximum likelihood estimate is an estimate of a parameter θ in a statistical model that is formed by maximizing a function that is related to the logarithm of the likelihood function, but is not equal to it...

  • Quasireversibility
    Quasireversibility
    In probability theory, specifically queueing theory, quasireversibility is a property of some queues. The concept was first identified by Richard R. Muntz and further developed by Frank Kelly. Quasireversibility differs from reversibility in that a stronger condition is imposed on arrival rates...

  • Queueing model
    Queueing model
    In queueing theory, a queueing model is used to approximate a real queueing situation or system, so the queueing behaviour can be analysed mathematically...

  • Queueing theory
    Queueing theory
    Queueing theory is the mathematical study of waiting lines, or queues. The theory enables mathematical analysis of several related processes, including arriving at the queue, waiting in the queue , and being served at the front of the queue...

  • Queuing delay
    Queuing delay
    In telecommunication and computer engineering, the queuing delay is the time a job waits in a queue until it can be executed. It is a key component of network delay....

  • Queuing theory in teletraffic engineering
  • Quota sampling
    Quota sampling
    Quota sampling is a method for selecting survey participants. In quota sampling, a population is first segmented into mutually exclusive sub-groups, just as in stratified sampling. Then judgment is used to select the subjects or units from each segment based on a specified proportion. For example,...


R

  • R programming language — redirects to R (programming language)
    R (programming language)
    R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians for developing statistical software and for data analysis....

  • R v Adams
    R v Adams
    R v Adams [1996] 2 Cr App R 467, [1996] Crim LR 898, CA and R v Adams [1998] 1 Cr App R 377, The Times, 3 November 1997, CA, are rulings that ousted explicit Bayesian statistics from the reasoning admissible before a jury in DNA cases.-Facts:...

     (prob/stats related court case)
  • Radar chart
    Radar chart
    A radar chart is a graphical method of displaying multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point...

  • Rademacher distribution
  • Radial basis function network
    Radial basis function network
    A radial basis function network is an artificial neural network that uses radial basis functions as activation functions. It is a linear combination of radial basis functions...

  • Raikov's theorem
    Raikov's theorem
    In probability theory, Raikov’s theorem, named after Dmitry Raikov, states that if the sum of two independent random variables X and Y has a Poisson distribution, then both X and Y themselves must have the Poisson distribution. It says the same thing about the Poisson distribution that Cramér's...

  • Raised cosine distribution
  • Ramsey RESET test
    Ramsey reset test
    The Ramsey Regression Equation Specification Error Test (RESET) is a general specification test for the linear regression model. More specifically, it tests whether non-linear combinations of the estimated values help explain the endogenous variable...

     — the Ramsey Regression Equation Specification Error Test
  • Rand index
    Rand index
    The Rand index or Rand measure in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings...

  • Random assignment
    Random assignment
    Random assignment or random placement is an experimental technique for assigning subjects to different treatments. The thinking behind random assignment is that by randomizing treatment assignment, the group attributes for the different treatments will be roughly equivalent and therefore any...

  • Random compact set
    Random compact set
    In mathematics, a random compact set is essentially a compact set-valued random variable. Random compact sets are useful in the study of attractors for random dynamical systems.-Definition:...

  • Random data — see randomness
    Randomness
    Randomness has somewhat differing meanings as used in various fields. It also has common meanings which are connected to the notion of predictability of events....

  • Random effects estimation — redirects to Random effects model
  • Random effects model
  • Random element
    Random element
    In probability theory, random element is a generalization of the concept of random variable to more complicated spaces than the simple real line...

  • Random field
    Random field
    A random field is a generalization of a stochastic process such that the underlying parameter need no longer be a simple real or integer valued "time", but can instead take values that are multidimensional vectors, or points on some manifold....

  • Random graph
    Random graph
    In mathematics, a random graph is a graph that is generated by some random process. The theory of random graphs lies at the intersection between graph theory and probability theory, and studies the properties of typical random graphs.-Random graph models:...

  • Random matrix
    Random matrix
    In probability theory and mathematical physics, a random matrix is a matrix-valued random variable. Many important properties of physical systems can be represented mathematically as matrix problems...

  • Random measure
  • Random multinomial logit
    Random multinomial logit
    In statistics and machine learning, random multinomial logit is a technique for statistical classification using repeated multinomial logit analyses via Leo Breiman's random forests.-Rationale for the new method:...

  • Random naive Bayes
    Random naive Bayes
    Random naive Bayes extends the Naive Bayes classifier by adopting the random forest principles: random input selection, bagging, and random feature selection...

  • Random permutation statistics
    Random permutation statistics
    The statistics of random permutations, such as the cycle structure of a random permutation are of fundamental importance in the analysis of algorithms, especially of sorting algorithms, which operate on random permutations. Suppose, for example, that we are using quickselect to select a random...

  • Random regular graph
  • Random sample
    Random sample
    In statistics, a sample is a subject chosen from a population for investigation; a random sample is one chosen by a method involving an unpredictable component...

  • Random sampling
  • Random sequence
    Random sequence
    The concept of a random sequence is essential in probability theory and statistics. The concept generally relies on the notion of a sequence of random variables and many statistical discussions begin with the words "let X1,...,Xn be independent random variables...". Yet as D. H. Lehmer stated in...

  • Random variable
    Random variable
    In probability and statistics, a random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. Formally, it is a function from a probability space, typically to the real numbers, which is measurable...

  • Random variate
    Random variate
    A random variate is a particular outcome of a random variable: the random variates which are other outcomes of the same random variable would have different values. Random variates are used when simulating processes driven by random influences...

  • Random walk
    Random walk
    A random walk, sometimes denoted RW, is a mathematical formalisation of a trajectory that consists of taking successive random steps. For example, the path traced by a molecule as it travels in a liquid or a gas, the search path of a foraging animal, the price of a fluctuating stock and the...

  • Random walk hypothesis
    Random walk hypothesis
    The random walk hypothesis is a financial theory stating that stock market prices evolve according to a random walk and thus the prices of the stock market cannot be predicted. It is consistent with the efficient-market hypothesis....

  • Randomization
    Randomization
    Randomization is the process of making something random; this means generating a random permutation of a sequence, or selecting a random sample of a population...

  • Randomized block design
    Randomized block design
    In the statistical theory of the design of experiments, blocking is the arranging of experimental units in groups that are similar to one another. Typically, a blocking factor is a source of variability that is not of primary interest to the experimenter...

  • Randomized controlled trial
    Randomized controlled trial
    A randomized controlled trial (RCT) is a type of scientific experiment - a form of clinical trial - most commonly used in testing the safety and efficacy or effectiveness of healthcare services or health technologies...

  • Randomized experiment
    Randomized experiment
    In science, randomized experiments are the experiments that allow the greatest reliability and validity of statistical estimates of treatment effects...

  • Randomized response
    Randomized response
    Randomized response is a research method used in structured survey interview. It was first proposed by S. L. Warner in 1965 and later modified by B. G. Greenberg in 1969. It allows respondents to respond to sensitive issues while maintaining confidentiality...

  • Randomness
    Randomness
    Randomness has somewhat differing meanings as used in various fields. It also has common meanings which are connected to the notion of predictability of events....

  • Randomness tests
    Randomness tests
    The issue of randomness is an important philosophical and theoretical question.Many random number generators in use today generate what are called "random sequences" but they are actually the result of prescribed algorithms and so they are called pseudo-random number generators.These generators do...

  • Range (statistics)
    Range (statistics)
    In descriptive statistics, the range is the length of the smallest interval which contains all the data. It is calculated by subtracting the smallest observation from the greatest and provides an indication of statistical dispersion. It is measured in the same units as the data...

  • Rank abundance curve
    Rank abundance curve
    A rank abundance curve or "Whittaker plot" is a chart used by ecologists to display relative species abundance, a component of biodiversity. It can also be used to visualize species richness and species evenness...

  • Rank correlation
    Rank correlation
    In statistics, a rank correlation is the relationship between different rankings of the same set of items. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess its significance....

     mainly links to the two following entries
    • Spearman's rank correlation coefficient
      Spearman's rank correlation coefficient
      In statistics, Spearman's rank correlation coefficient or Spearman's rho, named after Charles Spearman and often denoted by the Greek letter \rho or as r_s, is a non-parametric measure of statistical dependence between two variables. It assesses how well the relationship between two variables can...

    • Kendall tau rank correlation coefficient
      Kendall tau rank correlation coefficient
      In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's tau coefficient, is a statistic used to measure the association between two measured quantities...

  • Rank product
    Rank product
    The rank product is a biologically motivated test for the detection of differentially expressed genes in replicated microarray experiments. It is a simple non-parametric statistical method based on ranks of fold changes...

  • Rank-size distribution
    Rank-size distribution
    Rank-size distribution or the rank-size rule describes the remarkable regularity in many phenomena including the distribution of city sizes around the world, sizes of businesses, particle sizes , lengths of rivers, frequencies of word usage, wealth among individuals, etc...

  • Ranking
    Ranking
    A ranking is a relationship between a set of items such that, for any two items, the first is either 'ranked higher than', 'ranked lower than' or 'ranked equal to' the second....

  • Rankit
    Rankit
    In statistics, rankits of a set of data are the expected values of the order statistics of a sample from the standard normal distribution the same size as the data. They are primarily used in the normal probability plot, a graphical technique for normality testing.-Example:This is perhaps most...

  • Ranklet
    Ranklet
    In statistics, a ranklet is an orientation-selective non-parametric feature which is based on the computation of Mann–Whitney–Wilcoxon rank-sum test statistics...

  • RANSAC
    RANSAC
    RANSAC is an abbreviation for "RANdom SAmple Consensus". It is an iterative method to estimate parameters of a mathematical model from a set of observed data which contains outliers. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain...

  • Rational quadratic covariance function
    Rational quadratic covariance function
    In statistics, the rational quadratic covariance function is used in spatial statistics, geostatistics, machine learning, image analysis, and other fields where multivariate statistical analysis is conducted on metric spaces. It is commonly used to define the statistical covariance between...

  • Rao–Blackwell theorem
    Rao–Blackwell theorem
    In statistics, the Rao–Blackwell theorem, sometimes referred to as the Rao–Blackwell–Kolmogorov theorem, is a result which characterizes the transformation of an arbitrarily crude estimator into an estimator that is optimal by the mean-squared-error criterion or any of a variety of similar...

  • Rao-Blackwellisation — redirects to Rao–Blackwell theorem
    Rao–Blackwell theorem
    In statistics, the Rao–Blackwell theorem, sometimes referred to as the Rao–Blackwell–Kolmogorov theorem, is a result which characterizes the transformation of an arbitrarily crude estimator into an estimator that is optimal by the mean-squared-error criterion or any of a variety of similar...

  • Rasch model
    Rasch model
    Rasch models are used for analysing data from assessments to measure variables such as abilities, attitudes, and personality traits. For example, they may be used to estimate a student's reading ability from answers to questions on a reading assessment, or the extremity of a person's attitude to...

    • Polytomous Rasch model
      Polytomous Rasch model
      The polytomous Rasch model is a generalization of the dichotomous Rasch model. It is a measurement model that has potential application in any context in which the objective is to measure a trait or ability through a process in which responses to items are scored with successive integers...

  • Rasch model estimation
    Rasch model estimation
    Estimation of a Rasch model is used to estimate the parameters of the Rasch model. Various techniques are employed to estimate the parameters from matrices of response data. The most common approaches are types of maximum likelihood estimation, such as joint and conditional maximum likelihood...

  • Ratio distribution
    Ratio distribution
    A ratio distribution is a probability distribution constructed as the distribution of the ratio of two random variables whose distributions are known....

  • Rayleigh distribution
  • Raw score
    Raw score
    In statistics and data analysis, a raw score is an original datum that has not been transformed. This may include, for example, the original result obtained by a student on a test as opposed to that score after transformation to a standard score or percentile rank or the like.Often the conversion...

  • Realization (probability)
    Realization (probability)
    In probability and statistics, a realization, or observed value, of a random variable is the value that is actually observed. The random variable itself should be thought of as the process by which the observation comes about...

  • Recall bias
    Recall bias
    In psychology, recall bias is a type of systematic bias which occurs when the way a survey respondent answers a question is affected not just by the correct answer, but also by the respondent's memory. This can affect the results of the survey. As a hypothetical example, suppose that a survey in...

  • Receiver operating characteristic
    Receiver operating characteristic
    In signal detection theory, a receiver operating characteristic, or simply ROC curve, is a graphical plot of the sensitivity, or true positive rate, vs. false positive rate, for a binary classifier system as its discrimination threshold is varied...

  • Rectified Gaussian distribution
    Rectified Gaussian Distribution
    In probability theory, the rectified Gaussian distribution is a modification of the Gaussian distribution when its negative elements are reset to 0...

  • Recurrence period density entropy
    Recurrence period density entropy
    Recurrence period density entropy is a method, in the fields of dynamical systems, stochastic processes, and time series analysis, for determining the periodicity, or repetitiveness of a signal.- Overview :...

  • Recurrence plot
    Recurrence plot
    In descriptive statistics and chaos theory, a recurrence plot is a plot showing, for a given moment in time, the times at which a phase space trajectory visits roughly the same area in the phase space...

  • Recurrence quantification analysis
    Recurrence quantification analysis
    Recurrence quantification analysis is a method of nonlinear data analysis for the investigation of dynamical systems. It quantifies the number and duration of recurrences of a dynamical system presented by its phase space trajectory....

  • Recursive Bayesian estimation
    Recursive Bayesian estimation
    Recursive Bayesian estimation, also known as a Bayes filter, is a general probabilistic approach for estimating an unknown probability density function recursively over time using incoming measurements and a mathematical process model.-In robotics:...

  • Recursive least squares
  • Recursive partitioning
    Recursive partitioning
    Recursive partitioning is a statistical method for multivariable analysis. Recursive partitioning creates a decision tree that strives to correctly classify members of the population based on several dichotomous independent variables....

  • Reduced form
    Reduced form
    In statistics, and particularly in econometrics, the reduced form of a system of equations is the result of solving the system for the endogenous variables. This gives the latter as a function of the exogenous variables, if any...

  • Reference class problem
    Reference class problem
    In statistics, the reference class problem is the problem of deciding what class to use when calculating the probability applicable to a particular case...

  • Regenerative process
    Regenerative process
    In applied probability, a regenerative process is a special type of stochastic process that is defined by having a property whereby certain portions of the process can be treated as being statistically independent of each other...

  • Regression analysis
    Regression analysis
    In statistics, regression analysis includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables...

     — see also linear regression
    Linear regression
    In statistics, linear regression is an approach to modeling the relationship between a scalar variable y and one or more explanatory variables denoted X. The case of one explanatory variable is called simple regression...

  • Regression Analysis of Time Series — proprietary software
  • Regression control chart
    Regression control chart
    In statistical quality control, the regression control chart allows for monitoring a change in a process where two or more variables are correlated. The change in a dependent variable can be detected and compensatory change in the independent variable can be recommended...

  • Regression dilution
    Regression dilution
    Regression dilution is a statistical phenomenon also known as "attenuation".Consider fitting a straight line for the relationship of an outcome variable y to a predictor variable x, and estimating the gradient of the line...

  • Regression discontinuity design
  • Regression estimation
    Regression estimation
    Regression estimation is a technique used to replace missing values in data. The variable with missing data is treated as the dependent variable, while the remaining variables are treated as independent variables. A regression equation is then generated which can be used to predict missing values...

  • Regression fallacy
    Regression fallacy
    The regression fallacy is an informal fallacy. It ascribes cause where none exists. The flaw is failing to account for natural fluctuations. It is frequently a special kind of the post hoc fallacy.-Explanation:...

  • Regression model validation
  • Regression toward the mean
    Regression toward the mean
    In statistics, regression toward the mean is the phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the average on a second measurement, and—a fact that may superficially seem paradoxical—if it is extreme on a second measurement, will tend...

  • Regret (decision theory)
    Regret (decision theory)
    Regret is defined as the difference between the actual payoff and the payoff that would have been obtained if a different course of action had been chosen. This is also called difference regret...

  • Reification (statistics)
    Reification (statistics)
    In statistics, reification is the use of an idealized model of a statistical process. The model is then used to make inferences connecting model results, which imperfectly represent the actual process, with experimental observations....

  • Rejection sampling
    Rejection sampling
    In mathematics, rejection sampling is a basic pseudo-random number sampling technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or "accept-reject algorithm"....
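
    A minimal Python sketch of the accept-reject idea, drawing from the target density f(x) = 2x on [0, 1] with a uniform proposal and envelope constant M = 2 (all of these choices are illustrative):

      import random

      def rejection_sample(n, M=2.0):
          samples = []
          while len(samples) < n:
              x = random.uniform(0.0, 1.0)   # draw from the uniform proposal
              u = random.uniform(0.0, 1.0)   # acceptance test
              if u <= (2.0 * x) / M:         # target density f(x) = 2x, proposal density 1
                  samples.append(x)
          return samples

      draws = rejection_sample(10000)
      print(sum(draws) / len(draws))         # should be close to E[X] = 2/3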

  • Relationships among probability distributions
    Relationships among probability distributions
    Many statistical distributions have close relationships. Some examples include the Bernoulli, binomial, and normal distributions, and the exponential and Poisson distributions....

  • Relative change and difference
    Relative change and difference
    The relative difference, percent difference, relative percent difference, or percentage difference between two quantities is the difference between them, expressed as a comparison to the size of one or both of them. Such measures are unitless numbers...

  • Relative efficiency — redirects to Efficiency (statistics)
    Efficiency (statistics)
    In statistics, an efficient estimator is an estimator that estimates the quantity of interest in some “best possible” manner. The notion of “best possible” relies upon the choice of a particular loss function — the function which quantifies the relative degree of undesirability of estimation errors...

  • Relative index of inequality
    Relative index of inequality
    The relative index of inequality is a regression-based index which summarizes the magnitude of socio-economic status as a source of inequalities in health. RII is useful because it takes into account the size of the population and the relative disadvantage experienced by different groups...

  • Relative risk
    Relative risk
    In statistics and mathematical epidemiology, relative risk is the risk of an event relative to exposure. Relative risk is a ratio of the probability of the event occurring in the exposed group versus a non-exposed group....

  • Relative risk reduction
    Relative risk reduction
    In epidemiology, the relative risk reduction is a measure calculated by dividing the absolute risk reduction by the control event rate. The relative risk reduction can be more useful than the absolute risk reduction in determining an appropriate treatment plan, because it accounts not only for the...
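
    A toy Python calculation tying together relative risk, absolute risk reduction and relative risk reduction; the event counts are invented for illustration:

      treated_events, treated_total = 10, 100   # experimental group
      control_events, control_total = 20, 100   # control group

      eer = treated_events / treated_total      # experimental event rate, 0.10
      cer = control_events / control_total      # control event rate, 0.20

      relative_risk = eer / cer                                  # 0.5
      absolute_risk_reduction = cer - eer                        # 0.10
      relative_risk_reduction = absolute_risk_reduction / cer    # 0.5

      print(relative_risk, absolute_risk_reduction, relative_risk_reduction)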

  • Relative standard deviation
    Relative standard deviation
    In probability theory and statistics, the relative standard deviation is the absolute value of the coefficient of variation. It is often expressed as a percentage. A similar term that is sometimes used is the relative variance which is the square of the coefficient of variation...

  • Relative standard error — redirects to Relative standard deviation
    Relative standard deviation
    In probability theory and statistics, the relative standard deviation is the absolute value of the coefficient of variation. It is often expressed as a percentage. A similar term that is sometimes used is the relative variance which is the square of the coefficient of variation...

  • Relative variance — redirects to Relative standard deviation
    Relative standard deviation
    In probability theory and statistics, the relative standard deviation is the absolute value of the coefficient of variation. It is often expressed as a percentage. A similar term that is sometimes used is the relative variance which is the square of the coefficient of variation...

  • Relative survival
    Relative survival
    When describing the survival experience of a group of people or patients typically the method of overall survival is used, and it presents estimates of the proportion of people or patients alive at a certain point in time...

  • Relativistic Breit–Wigner distribution
  • Relevance vector machine
    Relevance Vector Machine
    Relevance vector machine is a machine learning technique that uses Bayesian inference to obtain parsimonious solutions for regression and classification...

  • Reliability (statistics)
    Reliability (statistics)
    In statistics, reliability is the consistency of a set of measurements or of a measuring instrument, often used to describe a test. Reliability is inversely related to random error.-Types:There are several general classes of reliability estimates:...

  • Reliability block diagram
    Reliability block diagram
    A reliability block diagram is a diagrammatic method for showing how component reliability contributes to the success or failure of a complex system. RBD is also known as a dependence diagram ....

  • Reliability engineering
    Reliability engineering
    Reliability engineering is an engineering field that deals with the study, evaluation, and life-cycle management of reliability: the ability of a system or component to perform its required functions under stated conditions for a specified period of time. It is often measured as a probability of...

  • Reliability theory
    Reliability theory
    Reliability theory describes the probability of a system completing its expected function during an interval of time. It is the basis of reliability engineering, which is an area of study focused on optimizing the reliability, or probability of successful functioning, of systems, such as airplanes,...

  • Reliability theory of aging and longevity
    Reliability theory of aging and longevity
    Reliability theory of aging and longevity is a scientific approach aimed to gain theoretical insights into mechanisms of biological aging and species survival patterns by applying a general theory of systems failure, known as reliability theory.-Overview:...

  • Rencontres numbers — a discrete distribution
  • Renewal theory
    Renewal theory
    Renewal theory is the branch of probability theory that generalizes Poisson processes for arbitrary holding times. Applications include calculating the expected time for a monkey who is randomly tapping at a keyboard to type the word Macbeth and comparing the long-term benefits of different...

  • Repeatability
    Repeatability
    Repeatability or test-retest reliability is the variation in measurements taken by a single person or instrument on the same item and under the same conditions. A less-than-perfect test-retest reliability causes test-retest variability. Such variability can be caused by, for...

  • Repeated measures design
    Repeated measures design
    The repeated measures design uses the same subjects with every condition of the research, including the control. For instance, repeated measures are collected in a longitudinal study in which change over time is assessed. Other studies compare the same measure under two or more different conditions...

  • Replication (statistics)
    Replication (statistics)
    In engineering, science, and statistics, replication is the repetition of an experimental condition so that the variability associated with the phenomenon can be estimated. ASTM, in standard E1847, defines replication as "the repetition of the set of all the treatment combinations to be compared in...

  • Representation validity
    Representation validity
    Representation validity is concerned with how well the constructs or abstractions translate into observable measures. There are two primary questions to be answered:...

  • Reproducibility
    Reproducibility
    Reproducibility is the ability of an experiment or study to be accurately reproduced, or replicated, by someone else working independently...

  • Resampling (statistics)
    Resampling (statistics)
    In statistics, resampling is any of a variety of methods for doing one of the following: (1) estimating the precision of sample statistics by using subsets of available data or drawing randomly with replacement from a set of data points; (2) exchanging labels on data points when performing significance...
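
    A minimal Python sketch of the first kind of resampling, a bootstrap estimate of the standard error of the sample mean (data and number of replicates are invented):

      import random
      import statistics

      data = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4]
      boot_means = []
      for _ in range(2000):
          resample = [random.choice(data) for _ in data]   # draw n values with replacement
          boot_means.append(statistics.mean(resample))

      print(statistics.stdev(boot_means))   # bootstrap standard error of the mean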

  • Rescaled range
    Rescaled range
    The rescaled range is a statistical measure of the variability of a time series introduced by the British hydrologist Harold Edwin Hurst...

  • Resentful demoralization
    Resentful demoralization
    Resentful demoralization is an issue in controlled experiments in which those in the control group become resentful of not receiving the experimental treatment. Alternatively, the experimental group could be resentful of the control group, if the experimental group perceive its treatment as...

     – experimental design
  • Residual — see errors and residuals in statistics
    Errors and residuals in statistics
    In statistics and optimization, statistical errors and residuals are two closely related and easily confused measures of the deviation of a sample from its "theoretical value"...

  • Residual sum of squares
    Residual sum of squares
    In statistics, the residual sum of squares is the sum of squares of residuals. It is also known as the sum of squared residuals or the sum of squared errors of prediction . It is a measure of the discrepancy between the data and an estimation model...

  • Response bias
    Response bias
    Response bias is a type of cognitive bias which can affect the results of a statistical survey if respondents answer questions in the way they think the questioner wants them to answer rather than according to their true beliefs...

  • Response rate
    Response rate
    Response rate in survey research refers to the number of people who answered the survey divided by the number of people in the sample...

  • Response surface methodology
    Response surface methodology
    In statistics, response surface methodology explores the relationships between several explanatory variables and one or more response variables. The method was introduced by G. E. P. Box and K. B. Wilson in 1951. The main idea of RSM is to use a sequence of designed experiments to obtain an...

  • Response variable
  • Restricted maximum likelihood
    Restricted maximum likelihood
    In statistics, the restricted maximum likelihood approach is a particular form of maximum likelihood estimation which does not base estimates on a maximum likelihood fit of all the information, but instead uses a likelihood function calculated from a transformed set of data, so that nuisance...

  • Restricted randomization
    Restricted randomization
    Many processes have more than one source of variation in them. In order to reduce variation in processes, these multiple sources must be understood, and that often leads to the concept of nested or hierarchical data structures. For example, in the semiconductor industry, a batch process may operate...

  • Reversible-jump Markov chain Monte Carlo
  • Reversible dynamics
    Reversible dynamics
    In mathematics, a dynamical system is invertible if the forward evolution is one-to-one, not many-to-one, so that for every state there exists a well-defined reverse-time evolution operator....

  • Rind et al. controversy – interpretations of a paper involving meta-analysis
  • Rice distribution
    Rice distribution
    In probability theory, the Rice distribution or Rician distribution is the probability distribution of the absolute value of a circular bivariate normal random variable with potentially non-zero mean. It was named after Stephen O...

  • Richardson–Lucy deconvolution
  • Ridge regression — redirects to Tikhonov regularization
    Tikhonov regularization
    Tikhonov regularization, named for Andrey Tikhonov, is the most commonly used method of regularization of ill-posed problems. In statistics, the method is known as ridge regression, and, with multiple independent discoveries, it is also variously known as the Tikhonov-Miller method, the...

  • Risk factor
    Risk factor
    In epidemiology, a risk factor is a variable associated with an increased risk of disease or infection. Sometimes, determinant is also used, being a variable associated with either increased or decreased risk.-Correlation vs causation:...

  • Risk function
    Risk function
    In decision theory and estimation theory, the risk function R of a decision rule, δ, is the expected value of a loss function L:...
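
    In standard decision-theoretic notation (written from general knowledge, with \theta the unknown parameter and X the observed data), the expression referred to is

      R(\theta, \delta) = \operatorname{E}_{\theta}\!\left[\,L\!\left(\theta, \delta(X)\right)\,\right]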

  • Risk perception
    Risk perception
    Risk perception is the subjective judgment that people make about the characteristics and severity of a risk. The phrase is most commonly used in reference to natural hazards and threats to the environment or health, such as nuclear power. Several theories have been proposed to explain why...

  • Risk theory
    Risk theory
    Risk theory connotes the study, usually by actuaries and insurers, of the financial impact on a carrier of a portfolio of insurance policies. For example, if the carrier has 100 policies that insure against a total loss of $1000, and if each policy's chance of loss is independent and has a...

  • Risk-benefit analysis
    Risk-benefit analysis
    Risk–benefit analysis is the comparison of the risk of a situation to its related benefits. Exposure to personal risk is recognized as a normal aspect of everyday life. We accept a certain level of risk in our lives as necessary to achieve certain benefits. In most of these risks we feel as though...

  • Robbins lemma
    Robbins lemma
    In statistics, the Robbins lemma, named after Herbert Robbins, states that if X is a random variable with a Poisson distribution, and f is any function for which the expected value E exists, then...

  • Robin Hood index
    Robin Hood index
    The Robin Hood index, also known as the Hoover index, is a measure of income inequality. It is equal to the portion of the total community income that would have to be redistributed for there to be perfect equality....

  • Robust confidence intervals
    Robust confidence intervals
    In statistics a robust confidence interval is a robust modification of confidence intervals, meaning that one modifies the non-robust calculations of the confidence interval so that they are not badly affected by outlying or aberrant observations in a data-set.- Example :In the process of weighing...

  • Robust regression
    Robust regression
    In robust statistics, robust regression is a form of regression analysis designed to circumvent some limitations of traditional parametric and non-parametric methods. Regression analysis seeks to find the effect of one or more independent variables upon a dependent variable...

  • Robust statistics
    Robust statistics
    Robust statistics provides an alternative approach to classical statistical methods. The motivation is to produce estimators that are not unduly affected by small departures from model assumptions.- Introduction :...

  • Root mean square
    Root mean square
    In mathematics, the root mean square , also known as the quadratic mean, is a statistical measure of the magnitude of a varying quantity. It is especially useful when variates are positive and negative, e.g., sinusoids...

  • Root mean square deviation
    Root mean square deviation
    The root-mean-square deviation is the measure of the average distance between the atoms of superimposed proteins...

  • Root mean square deviation (bioinformatics)
  • Root mean square fluctuation
  • Robust measures of scale
    Robust measures of scale
    In statistics, a robust measure of scale is a robust statistic that quantifies the statistical dispersion in a set of quantitative data. Robust measures of scale are used to complement or replace conventional estimates of scale such as the sample variance or sample standard deviation...

  • Rossmo's formula
    Rossmo's formula
    Rossmo's formula is a geographic profiling formula to predict where a serial criminal lives. The formula was developed by criminologist Kim Rossmo.-Formula:...

  • Rothamsted Experimental Station
    Rothamsted Experimental Station
    The Rothamsted Experimental Station, one of the oldest agricultural research institutions in the world, is located at Harpenden in Hertfordshire, England. It is now known as Rothamsted Research...

  • Round robin test
    Round robin test
    In experimental methodology, a round robin test is an interlaboratory test performed independently several times. This can involve multiple independent scientists performing the test with the use of the same method in different equipment, or a variety of methods and equipment...

  • Rubin causal model
    Rubin Causal Model
    The Rubin Causal Model is an approach to the statistical analysis of cause and effect based on the framework of potential outcomes. RCM is named after Donald Rubin, Professor of Statistics at Harvard University...

  • Ruin theory
    Ruin theory
    Ruin theory, sometimes referred to as collective risk theory, is a branch of actuarial science that studies an insurer's vulnerability to insolvency based on mathematical modeling of the insurer's surplus....

  • Rule of succession
    Rule of succession
    In probability theory, the rule of succession is a formula introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem....

  • Rule of three (medicine)
    Rule of three (medicine)
    In the statistical analysis of clinical trials, the rule of three states that if no major adverse events occurred in a group of n people, there can be 95% confidence that the chance of a major adverse event is less than 3/n, that is, less than one in n/3...
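
    A tiny worked example in Python (the trial size is hypothetical):

      n = 300              # subjects observed with no major adverse events
      upper_bound = 3 / n  # approximate 95% upper bound on the event probability
      print(upper_bound)   # 0.01, i.e. about one in a hundred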

  • Run chart
    Run Chart
    A run chart, also known as a run-sequence plot is a graph that displays observed data in a time sequence. Often, the data displayed represent some aspect of the output or performance of a manufacturing or other business process.- Overview :...

  • RV coefficient
    RV coefficient
    In statistics, the RV coefficient is a multivariate generalization of the Pearson correlation coefficient. It measures the closeness of two sets of points that may each be represented in a matrix....


S

  • S (programming language)
  • S-PLUS
    S-PLUS
    S-PLUS is a commercial implementation of the S programming language sold by TIBCO Software Inc..It features object-oriented programming capabilities and advanced analytical algorithms.-Historical timeline:...

  • Safety in numbers
    Safety in numbers
    Safety in numbers is the hypothesis that, by being part of a large physical group or mass, an individual is proportionally less likely to be the victim of a mishap, accident, attack, or other bad event...

  • Sally Clark
    Sally Clark
    Sally Clark was a British solicitor who became the victim of an infamous miscarriage of justice when she was wrongly convicted of the murder of two of her sons in 1999...

     (prob/stats related court case)
  • Sammon projection
  • Sample mean and covariance — redirects to Sample mean and sample covariance
    Sample mean and sample covariance
    The sample mean or empirical mean and the sample covariance are statistics computed from a collection of data on one or more random variables. The sample mean is a vector each of whose elements is the sample mean of one of the random variables that is, each of whose elements is the average of the...

  • Sample mean and sample covariance
    Sample mean and sample covariance
    The sample mean or empirical mean and the sample covariance are statistics computed from a collection of data on one or more random variables. The sample mean is a vector each of whose elements is the sample mean of one of the random variables that is, each of whose elements is the average of the...

  • Sample maximum and minimum
    Sample maximum and minimum
    In statistics, the sample maximum and sample minimum, also called the largest observation and smallest observation, are the values of the greatest and least elements of a sample....

  • Sample size determination
  • Sample space
  • Sample standard deviation — disambiguation
  • Sample (statistics)
    Sample (statistics)
    In statistics, a sample is a subset of a population. Typically, the population is very large, making a census or a complete enumeration of all the values in the population impractical or impossible. The sample represents a subset of manageable size...

  • Sample-continuous process
  • Sampling (statistics)
    Sampling (statistics)
    In statistics and survey methodology, sampling is concerned with the selection of a subset of individuals from within a population to estimate characteristics of the whole population....

    • simple random sampling
    • Snowball sampling
      Snowball sampling
      In sociology and statistics research, snowball sampling is a non-probability sampling technique where existing study subjects recruit future subjects from among their acquaintances. Thus the sample group appears to grow like a rolling snowball...

    • systematic sampling
      Systematic sampling
      Systematic sampling is a statistical method involving the selection of elements from an ordered sampling frame. The most common form of systematic sampling is an equal-probability method, in which every kth element in the frame is selected, where k, the sampling interval, is calculated as k =...

    • stratified sampling
      Stratified sampling
      In statistics, stratified sampling is a method of sampling from a population.In statistical surveys, when subpopulations within an overall population vary, it is advantageous to sample each subpopulation independently. Stratification is the process of dividing members of the population into...

    • cluster sampling
      Cluster sampling
      Cluster Sampling is a sampling technique used when "natural" groupings are evident in a statistical population. It is often used in marketing research. In this technique, the total population is divided into these groups and a sample of the groups is selected. Then the required information is...

    • multistage sampling
      Multistage sampling
      Multistage sampling is a complex form of cluster sampling. Advantages include the cost and speed with which the survey can be done, the convenience of finding the survey sample, and normally greater accuracy than cluster sampling for the same sample size. Disadvantages...

    • nonprobability sampling
      Nonprobability sampling
      Sampling is the use of a subset of the population to represent the whole population. Probability sampling, or random sampling, is a sampling technique in which the probability of getting any particular sample may be calculated. Nonprobability sampling does not meet this criterion and should be...

    • slice sampling
      Slice sampling
      Slice sampling is a type of Markov chain Monte Carlo algorithm for pseudo-random number sampling, i.e. for drawing random samples from a statistical distribution...

  • Sampling bias
  • Sampling design
    Sampling design
    In the theory of finite population sampling, a sampling design specifies for every possible sample its probability of being drawn.Mathematically, a sampling design is denoted by the function P which gives the probability of drawing a sample S....

  • Sampling distribution
    Sampling distribution
    In statistics, a sampling distribution or finite-sample distribution is the probability distribution of a given statistic based on a random sample. Sampling distributions are important in statistics because they provide a major simplification on the route to statistical inference...

  • Sampling error
    Sampling error
    In statistics, sampling error or estimation error is the error caused by observing a sample instead of the whole population. The sampling error can be found by subtracting the value of a parameter from the value of a statistic...

  • Sampling fraction
    Sampling fraction
    In sampling theory, sampling fraction is the ratio of sample size to population size or, in the context of stratified sampling, the ratio of the sample size to the size of the stratum....

  • Sampling frame
    Sampling frame
    In statistics, a sampling frame is the source material or device from which a sample is drawn. It is a list of all those within a population who can be sampled, and may include individuals, households or institutions....

  • Sampling risk
    Sampling risk
    In auditing, sampling is an inevitable means of testing. However, sampling is always associated with sampling risks which auditors have to control....

  • Samuelson's inequality
    Samuelson's inequality
    In statistics, Samuelson's inequality, named after the economist Paul Samuelson, also called the Laguerre–Samuelson inequality, after the mathematician Edmond Laguerre, states that every one of any collection x1, ..., xn is within √(n − 1) standard deviations of their mean...

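    A quick numerical check of the bound above is straightforward; note that the inequality uses the standard deviation with divisor n (not n − 1), and the data vector below is an arbitrary example.

      import math

      x = [2.0, 3.5, 7.0, 1.2, 9.9, 4.4]                     # arbitrary example data
      n = len(x)
      mean = sum(x) / n
      sd_n = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # divisor n, not n - 1

      bound = math.sqrt(n - 1) * sd_n
      assert all(abs(v - mean) <= bound + 1e-12 for v in x)  # Samuelson's inequality
      print(max(abs(v - mean) for v in x), "<=", bound)
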
  • Sargan test
    Sargan test
    The Sargan test is a statistical test used to check for over-identifying restrictions in a statistical model. The Sargan test is based on the observation that the residuals should be uncorrelated with the set of exogenous variables if the instruments are truly exogenous...

  • SAS (software)
  • SAS language
    SAS language
    The SAS language is a data processing and statistical analysis language. See more on the origins of the SAS language at SAS System and at Barr Systems. The SAS language basically divides data processing and analysis into two kinds of steps....

  • SAS System — redirects to SAS (software)
  • Savitzky–Golay smoothing filter
    Savitzky–Golay smoothing filter
    The Savitzky–Golay smoothing filter is a type of filter first described in 1964 by Abraham Savitzky and Marcel J. E. Golay.The Savitzky–Golay method essentially performs a local polynomial regression on a series of values to determine the smoothed value for each point...

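    A minimal sketch of Savitzky–Golay smoothing with SciPy's savgol_filter; the window length and polynomial order are arbitrary choices for a noisy sine example.

      import numpy as np
      from scipy.signal import savgol_filter

      t = np.linspace(0, 2 * np.pi, 200)
      noisy = np.sin(t) + np.random.normal(scale=0.2, size=t.size)

      # Local polynomial regression over a sliding 21-point window, degree 3.
      smoothed = savgol_filter(noisy, window_length=21, polyorder=3)
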
  • Sazonov's theorem
    Sazonov's theorem
    In mathematics, Sazonov's theorem, named after Vyacheslav Vasilievich Sazonov , is a theorem in functional analysis.It states that a bounded linear operator between two Hilbert spaces is γ-radonifying if it is Hilbert–Schmidt...

  • Saturated array
    Saturated array
    In experiments in which additional factors are not likely to interact with any of the other factors, a saturated array can be used. In a saturated array, a controllable factor is substituted for the interaction of two or more by-products. Using a saturated array, a two-factor test matrix could be...

  • Scale analysis (statistics)
    Scale analysis (statistics)
    In statistics, scale analysis is a set of methods to analyse survey data, in which responses to questions are combined to measure a latent variable. These items can be dichotomous or polytomous...

  • Scale parameter
    Scale parameter
    In probability theory and statistics, a scale parameter is a special kind of numerical parameter of a parametric family of probability distributions...

  • Scaled-inverse-chi-squared distribution
  • Scaling pattern of occupancy
    Scaling pattern of occupancy
    In spatial ecology and macroecology, the scaling pattern of occupancy, also known as the area-of-occupancy, is the way in which species distribution changes across spatial scales. In physical geography and image analysis, it is similar to the modifiable areal unit problem. Simon A...

  • Scatter matrix
    Scatter matrix
    In multivariate statistics and probability theory, the scatter matrix is a statistic that is used to make estimates of the covariance matrix of the multivariate normal distribution...

  • Scatter plot
  • Scatterplot smoothing
  • Scheffé's method
    Scheffé's method
    In statistics, Scheffé's method, named after Henry Scheffé, is a method for adjusting significance levels in a linear regression analysis to account for multiple comparisons...

  • Schilder's theorem
    Schilder's theorem
    In mathematics, Schilder's theorem is a result in the large deviations theory of stochastic processes. Roughly speaking, Schilder's theorem gives an estimate for the probability that a sample path of Brownian motion will stray far from the mean path . This statement is made precise using rate...

  • Schramm–Loewner evolution
  • Schuette–Nesbitt formula
    Schuette–Nesbitt formula
    In probability theory, the Schuette–Nesbitt formula is a generalization of the probabilistic version of the inclusion-exclusion principle. It is named after Donald R. Schuette and Cecil J...

  • Schwarz criterion
    Schwarz criterion
    In statistics, the Bayesian information criterion or Schwarz criterion is a criterion for model selection among a finite set of models...

  • Score (statistics)
    Score (statistics)
    In statistics, the score, score function, efficient score or informant plays an important role in several aspects of inference...

  • Score test
    Score test
    A score test is a statistical test of a simple null hypothesis that a parameter of interest \theta is equal to some particular value \theta_0. It is the most powerful test when the true value of \theta is close to \theta_0. The main advantage of the score test is that it does not require an...

  • Scoring algorithm
    Scoring algorithm
    In statistics, Fisher's scoring algorithm is a form of Newton's method used to solve maximum likelihood equations numerically...

  • Scoring rule
    Scoring rule
    In decision theory a score function, or scoring rule, is a measure of the performance of an entity, be it person or machine, that repeatedly makes decisions under uncertainty. For example, every evening a TV weather forecaster may give the probability of rain on the next day, in a type of...

  • SCORUS
    SCORUS
    An acronym for "Standing Committee of Regional and Urban Statistics", SCORUS is a sub-committee of the International Association for Official Statistics which is a section of the International Statistical Institute. The sub-committee has specific responsibility for regional and urban statistics...

  • Scott's Pi
    Scott's Pi
    Scott's pi is a statistic for measuring inter-rater reliability for nominal data in communication studies. Textual entities are annotated with categories by different annotators, and various measures are used to assess the extent of agreement between the annotators, one of which is Scott's pi...

  • SDMX
    SDMX
    SDMX is an initiative to foster standards for the exchange of statistical information. It started in 2001 and aims at fostering standards for Statistical Data and Metadata eXchange...

     – a standard for exchanging statistical data
  • Seasonal adjustment
    Seasonal adjustment
    Seasonal adjustment is a statistical method for removing the seasonal component of a time series that is used when analyzing non-seasonal trends. It is normal to report un-adjusted data for current unemployment rates, as these reflect the actual current situation...

  • Seasonality
    Seasonality
    In statistics, many time series exhibit cyclic variation known as seasonality, periodic variation, or periodic fluctuations. This variation can be either regular or semi regular....

  • Seasonal subseries plot
    Seasonal subseries plot
    Seasonal subseries plots are a tool for detecting seasonality in a time series. This plot allows one to detect both between-group and within-group patterns. This plot is only useful if the period of the seasonality is already known. In many cases, this will in fact be known. For example, monthly...

  • Seasonal variation
  • Seasonally adjusted annual rate
    Seasonally adjusted annual rate
    The Seasonally Adjusted Annual Rate refers to the rate adjustment employed when drawing comparisons between various sets of statistical data. As the name suggests, it takes into account fluctuations of values in such data which might occur due to seasonality...

  • Second moment method
    Second moment method
    In mathematics, the second moment method is a technique used in probability theory and analysis to show that a random variable has positive probability of being positive...

  • Secretary problem
    Secretary problem
    The secretary problem is one of many names for a famous problem of optimal stopping theory. The problem has been studied extensively in the fields of applied probability, statistics, and decision theory...

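    The classical solution observes roughly the first n/e candidates without choosing, then accepts the first candidate better than all of those. The short simulation below (sample size and trial count are arbitrary) estimates how often this rule picks the single best candidate; the proportion should approach 1/e ≈ 0.37.

      import math
      import random

      def best_picked(n: int) -> bool:
          ranks = list(range(n))          # 0 is the best candidate
          random.shuffle(ranks)
          r = int(n / math.e)             # observe-only phase
          threshold = min(ranks[:r]) if r else n
          for rank in ranks[r:]:
              if rank < threshold:        # first candidate better than all observed
                  return rank == 0
          return ranks[-1] == 0           # otherwise forced to take the last one

      n, trials = 100, 20000
      print(sum(best_picked(n) for _ in range(trials)) / trials)  # roughly 0.37
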
  • Secular trend
  • Secular variation
    Secular variation
    The secular variation of a time series is its long-term non-periodic variation. Whether something is perceived as a secular variation or not depends on the available timescale: a secular variation over a time scale of centuries may be part of a periodic variation over a time scale of millions of...

  • Seemingly unrelated regressions
  • Seismic to simulation
    Seismic to simulation
    Seismic to Simulation is the process and associated techniques used to develop highly accurate static and dynamic 3D models of hydrocarbon reservoirs for use in predicting future production, placing additional wells, and evaluating alternative reservoir management scenarios...

  • Selection bias
    Selection bias
    Selection bias is a statistical bias in which there is an error in choosing the individuals or groups to take part in a scientific study. It is sometimes referred to as the selection effect. The term "selection bias" most often refers to the distortion of a statistical analysis, resulting from the...

  • Selective recruitment
    Selective recruitment
    Selective recruitment is an observed effect in traffic safety. When safety belt laws are passed, belt wearing rates increase, but casualties decline by smaller percentages than estimated in a simple calculation. This is because those converted from non-use to use are not “recruited” random...

  • Self-selection bias
  • Self-similar process
    Self-similar process
    Self-similar processes are types of stochastic processes that exhibit the phenomenon of self-similarity. A self-similar phenomenon behaves the same when viewed at different degrees of magnification, or different scales on a dimension. Self-similar processes can sometimes be described using...

  • Segmented regression
    Segmented regression
    Segmented regression is a method in regression analysis in which the independent variable is partitioned into intervals and a separate line segment is fit to each interval. Segmented or piecewise regression analysis can also be performed on multivariate data by partitioning the various independent...

  • Seismic inversion
    Seismic inversion
    Seismic inversion, in geophysics, is the process of transforming seismic reflection data into a quantitative rock-property description of a reservoir...

  • Self-similarity matrix
    Self-similarity matrix
    In data analysis, the self-similarity matrix is a graphical representation of similar sequences in a data series. Similarity can be explained by different measures, like spatial distance , correlation, or comparison of local histograms or spectral properties...

  • Semantic mapping (statistics)
    Semantic mapping (statistics)
    The semantic mapping is a dimensionality reduction method that extracts new features by clustering the original features in semantic clusters and combining features mapped in the same cluster to generate an extracted feature...

  • Semantic relatedness
  • Semantic similarity
    Semantic similarity
    Semantic similarity or semantic relatedness is a concept whereby a set of documents or terms within term lists are assigned a metric based on the likeness of their meaning / semantic content....

  • Semi-Markov process
    Semi-Markov process
    A continuous-time stochastic process is called a semi-Markov process or 'Markov renewal process' if the embedded jump chain is a Markov chain, and where the holding times are random variables with any distribution, whose distribution function may depend on the two states between which the move is...

  • Semi-log graph
  • Semidefinite embedding
    Semidefinite embedding
    Semidefinite embedding or maximum variance unfolding is an algorithm in computer science, that uses semidefinite programming to perform non-linear dimensionality reduction of high-dimensional vectorial input data....

  • Semimartingale
    Semimartingale
    In probability theory, a real valued process X is called a semimartingale if it can be decomposed as the sum of a local martingale and an adapted finite-variation process....

  • Semiparametric model
  • Semiparametric regression
    Semiparametric regression
    In statistics, semiparametric regression includes regression models that combine parametric and nonparametric models. They are often used in situations where the fully nonparametric model may not perform well or when the researcher wants to use a parametric model but the functional form with...

  • Semivariance
  • Sensitivity (tests)
  • Sensitivity analysis
    Sensitivity analysis
    Sensitivity analysis is the study of how the variation in the output of a statistical model can be attributed to different variations in the inputs of the model. Put another way, it is a technique for systematically changing variables in a model to determine the effects of such changes.In any...

  • Sensitivity and specificity
    Sensitivity and specificity
    Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as a classification function. Sensitivity measures the proportion of actual positives which are correctly identified as such...

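    A minimal sketch computing both measures from the four cells of a confusion matrix; the counts are invented for illustration.

      # Invented counts: true positives, false negatives, true negatives, false positives
      tp, fn, tn, fp = 90, 10, 160, 40

      sensitivity = tp / (tp + fn)     # proportion of actual positives correctly identified
      specificity = tn / (tn + fp)     # proportion of actual negatives correctly identified
      print(sensitivity, specificity)  # 0.9, 0.8
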
  • Separation test
    Separation test
    A separation test is a statistical procedure for early-phase research, to decide whether or not to pursue further research. It is designed to avoid the prevalent situation in early-phase research, when a statistically underpowered test gives a negative result....

  • Sequential analysis
    Sequential analysis
    In statistics, sequential analysis or sequential hypothesis testing is statistical analysis where the sample size is not fixed in advance. Instead data are evaluated as they are collected, and further sampling is stopped in accordance with a pre-defined stopping rule as soon as significant results...

  • Sequential estimation
    Sequential estimation
    In statistics, sequential estimation refers to estimation methods in sequential analysis where the sample size is not fixed in advance. Instead, data is evaluated as it is collected, and further sampling is stopped in accordance with a pre-defined stopping rule as soon as significant results are...

  • Sequential Monte Carlo methods redirects to Particle filter
    Particle filter
    In statistics, particle filters, also known as Sequential Monte Carlo methods , are sophisticated model estimation techniques based on simulation...

  • Sequential probability ratio test
    Sequential probability ratio test
    The sequential probability ratio test is a specific sequential hypothesis test, developed by Abraham Wald. Neyman and Pearson's 1933 result inspired Wald to reformulate it as a sequential analysis problem...

  • Serial dependence
    Serial dependence
    In statistics and signal processing, random variables in a time series have serial dependence if the value at some time t in the series is statistically dependent on the value at another time s...

  • Seriation (archaeology)
    Seriation (archaeology)
    In archaeology, seriation is a relative dating method in which assemblages or artifacts from numerous sites, in the same culture, are placed in chronological order. Where absolute dating methods, such as carbon dating, cannot be applied, archaeologists have to use relative dating methods to date...

  • SETAR (model)
    SETAR (model)
    In statistics, Self-Exciting Threshold AutoRegressive models are typically applied to time series data as an extension of autoregressive models, in order to allow for a higher degree of flexibility in model parameters through a regime-switching behaviour. Given a time series of data xt, the SETAR...

     — a time series model
  • Sethi model
    Sethi model
    The Sethi model was developed by Suresh P. Sethi and describes the process of how sales evolve over time in response to advertising. The rate of change in sales depends on three effects: response to advertising that acts positively on the unsold portion of the market, the loss due to forgetting or...

  • Seven-number summary
    Seven-number summary
    In descriptive statistics, the seven-number summary is a collection of seven summary statistics, and is a modification or extension of the five-number summary...

  • Sexual dimorphism measures
    Sexual dimorphism measures
    Although the subject of sexual dimorphism is not in itself controversial, the measures by which it is assessed differ widely. Most of the measures are used on the assumption that a random variable is considered so that probability distributions should be taken into account...

  • Shannon–Hartley theorem
    Shannon–Hartley theorem
    In information theory, the Shannon–Hartley theorem tells the maximum rate at which information can be transmitted over a communications channel of a specified bandwidth in the presence of noise. It is an application of the noisy channel coding theorem to the archetypal case of a continuous-time...

  • Shape of the distribution
    Shape of the distribution
    In statistics, the concept of the shape of the distribution refers to the shape of a probability distribution and it most often arises in questions of finding an appropriate distribution to use to model the statistical properties of a population, given a sample from that population...

  • Shape parameter
    Shape parameter
    In probability theory and statistics, a shape parameter is a kind of numerical parameter of a parametric family of probability distributions...

  • Shapiro–Wilk test
  • Sharpe ratio
    Sharpe ratio
    The Sharpe ratio or Sharpe index or Sharpe measure or reward-to-variability ratio is a measure of the excess return per unit of deviation in an investment asset or a trading strategy, typically referred to as risk, named after William Forsyth Sharpe...

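    A sketch of the usual ex-post calculation: the mean excess return over a benchmark divided by the standard deviation of the excess returns. The return series and risk-free rate are made up, and the annualisation factor assumes monthly observations.

      import numpy as np

      returns = np.array([0.02, -0.01, 0.03, 0.015, -0.005, 0.01])  # made-up monthly returns
      risk_free = 0.001                                             # made-up monthly risk-free rate

      excess = returns - risk_free
      sharpe = excess.mean() / excess.std(ddof=1)   # per-period Sharpe ratio
      annualised = sharpe * np.sqrt(12)             # assuming monthly data
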
  • SHAZAM (software)
    SHAZAM (software)
    SHAZAM is a comprehensive econometrics and statistics package for estimating, testing, simulating and forecasting many types of econometrics and statistical models...

  • Shewhart individuals control chart
  • Shifted Gompertz distribution
  • Shifted log-logistic distribution
  • Shifting baseline
  • Shrinkage (statistics)
    Shrinkage (statistics)
    In statistics, shrinkage has two meanings. The first relates to the general observation that, in regression analysis, a fitted relationship appears to perform less well on a new data set than on the data set used for fitting; in particular, the value of the coefficient of determination 'shrinks'...

  • Shrinkage estimator
    Shrinkage estimator
    In statistics, a shrinkage estimator is an estimator that, either explicitly or implicitly, incorporates the effects of shrinkage. In loose terms this means that a naïve or raw estimate is improved by combining it with other information. The term relates to the notion that the improved estimate is...

  • Sichel distribution
  • Siegel–Tukey test
  • Sieve estimator
    Sieve estimator
    In statistics, sieve estimators are a class of nonparametric estimator which use progressively more complex models to estimate an unknown high-dimensional function as more data becomes available, with the aim of asymptotically reducing error towards zero as the amount of data increases. This method...

  • Sigma-algebra
    Sigma-algebra
    In mathematics, a σ-algebra is a technical concept for a collection of sets satisfying certain properties. The main use of σ-algebras is in the definition of measures; specifically, the collection of sets over which a measure is defined is a σ-algebra...

  • SigmaStat
    SigmaStat
    SigmaStat is a statistical software package, which was originally developed by Jandel Scientific Software in the 1980s. As of October 1996, Systat Software is now based in San Jose, California. SigmaStat users have the ability to compare effects among groups. This includes before and after or...

     – software
  • Sign test
    Sign test
    In statistics, the sign test can be used to test the hypothesis that there is "no difference in medians" between the continuous distributions of two random variables X and Y, in the situation when we can draw paired samples from X and Y...

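    Under the null hypothesis of no difference in medians, the number of positive paired differences is Binomial(n, 1/2). A minimal sketch (paired data invented, tied pairs discarded) computes the exact two-sided p-value with SciPy's binomtest, which requires SciPy 1.7 or later.

      from scipy.stats import binomtest

      x = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7, 5.3, 6.1]     # invented paired measurements
      y = [4.9, 4.9, 5.2, 5.0, 5.4, 4.8, 5.1, 5.7]

      diffs = [a - b for a, b in zip(x, y) if a != b]  # ties carry no sign information
      positives = sum(d > 0 for d in diffs)
      result = binomtest(positives, n=len(diffs), p=0.5)
      print(result.pvalue)
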
  • Signal-to-noise ratio
    Signal-to-noise ratio
    Signal-to-noise ratio is a measure used in science and engineering that compares the level of a desired signal to the level of background noise. It is defined as the ratio of signal power to the noise power. A ratio higher than 1:1 indicates more signal than noise...

  • Signal-to-noise statistic
  • Signed differential mapping
    Signed differential mapping
    Signed differential mapping or SDM is a statistical technique for meta-analyzing studies on differences in brain activity or structure which used neuroimaging techniques such as fMRI, VBM, DTI or PET...

  • Significance analysis of microarrays
    Significance Analysis of Microarrays
    Significance analysis of microarrays is a statistical technique, established in 2001 by Tusher, Tibshirani and Chu, for determining whether changes in gene expression are statistically significant. With the advent of DNA microarrays it is now possible to measure the expression of thousands of...

  • Silhouette (clustering)
    Silhouette (clustering)
    Silhouette refers to a method of interpretation and validation of clusters of data. The technique provides a succinct graphical representation of how well each object lies within its cluster. It was first described by Peter J. Rousseeuw in 1986...

  • Simfit
    Simfit
    Simfit is a free Open Source Windows package for simulation, curve fitting, statistics, and plotting, using a library of models or user-defined equations. Simfit has been in continuous development for many years by Dr Bill Bardsley of the University of Manchester...

      – software
  • Similarity matrix
    Similarity matrix
    A similarity matrix is a matrix of scores which express the similarity between two data points. Similarity matrices are strongly related to their counterparts, distance matrices and substitution matrices...

  • Simon model
    Simon model
    Aiming to account for the wide range of empirical distributions following a power law, Herbert Simon proposed a class of stochastic models that results in a power-law distribution function. It models the dynamics of a system...

  • Simple linear regression
    Simple linear regression
    In statistics, simple linear regression is the least squares estimator of a linear regression model with a single explanatory variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that makes the sum of squared residuals of the model as...

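    A minimal least-squares sketch: with a single explanatory variable, the slope is the ratio of the sample covariance of x and y to the sample variance of x, and the intercept follows from the two means (the data are invented).

      import numpy as np

      x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
      y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])          # invented responses

      slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
      intercept = y.mean() - slope * x.mean()
      residuals = y - (intercept + slope * x)
      print(slope, intercept, (residuals ** 2).sum())  # the fitted line minimises this sum
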
  • Simple moving average crossover
    Simple moving average crossover
    In the statistics of time series, and in particular the analysis of financial time series for stock trading purposes, a moving-average crossover occurs when, on plotting two moving averages each based on different degrees of smoothing, the traces of these moving averages cross...

  • Simple random sample
    Simple random sample
    In statistics, a simple random sample is a subset of individuals chosen from a larger set. Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage during the sampling process, and each subset of k individuals has...

  • Simpson's paradox
    Simpson's paradox
    In probability and statistics, Simpson's paradox is a paradox in which a correlation present in different groups is reversed when the groups are combined. This result is often encountered in social-science and medical-science statistics, and it occurs when frequency data are hastily given causal...

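    The reversal is easiest to see numerically. The sketch below uses the widely cited kidney-stone treatment counts, in which treatment A has the higher success rate within each stone-size group yet the lower rate when the groups are pooled.

      # (successes, trials) per treatment and group -- widely cited illustrative counts
      data = {
          "A": {"small": (81, 87),   "large": (192, 263)},
          "B": {"small": (234, 270), "large": (55, 80)},
      }

      for treatment, groups in data.items():
          for group, (s, n) in groups.items():
              print(treatment, group, round(s / n, 3))          # A wins in both groups
          s_tot = sum(s for s, _ in groups.values())
          n_tot = sum(n for _, n in groups.values())
          print(treatment, "overall", round(s_tot / n_tot, 3))  # yet B wins overall
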
  • Simulated annealing
    Simulated annealing
    Simulated annealing is a generic probabilistic metaheuristic for the global optimization problem of locating a good approximation to the global optimum of a given function in a large search space. It is often used when the search space is discrete...

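    A bare-bones sketch of the idea for one-dimensional minimisation: propose a random perturbation, always accept improvements, accept worsenings with probability exp(-delta/T), and slowly cool the temperature. The objective function, step size and cooling schedule are arbitrary choices.

      import math
      import random

      def objective(x: float) -> float:          # arbitrary multimodal test function
          return x * x + 10 * math.sin(x)

      x = random.uniform(-10, 10)
      best = x
      temperature = 10.0
      while temperature > 1e-3:
          candidate = x + random.gauss(0, 1)     # random neighbour
          delta = objective(candidate) - objective(x)
          if delta < 0 or random.random() < math.exp(-delta / temperature):
              x = candidate                      # always accept better, sometimes accept worse
          if objective(x) < objective(best):
              best = x
          temperature *= 0.995                   # geometric cooling schedule
      print(best, objective(best))
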
  • Simultaneous equation methods (econometrics)
  • Simultaneous equations model
    Simultaneous equations model
    Simultaneous equation models are a form of statistical model in the form of a set of linear simultaneous equations. They are often used in econometrics...

  • Single equation methods (econometrics)
    Single equation methods (econometrics)
    A variety of methods are used in econometrics to estimate models consisting of a single equation. The oldest and still the most commonly used is the ordinary least squares method used to estimate linear regressions....

  • Singular distribution
    Singular distribution
    In probability, a singular distribution is a probability distribution concentrated on a set of Lebesgue measure zero, where the probability of each point in that set is zero. These distributions are sometimes called singular continuous distributions...

  • Singular spectrum analysis
    Singular Spectrum Analysis
    Singular spectrum analysis combines elements of classical time series analysis, multivariate statistics, multivariate geometry, dynamical systems and signal processing...

  • Sinusoidal model
    Sinusoidal model
    In statistics, signal processing, and time series analysis, a sinusoidal model to approximate a sequence Yi is: Y_i = C + \alpha \sin(\omega T_i + \phi) + E_i...

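    A sketch of fitting the model Y_i = C + α·sin(ω·T_i + φ) + E_i by non-linear least squares with scipy.optimize.curve_fit; the simulated data and the starting values are assumptions for the example.

      import numpy as np
      from scipy.optimize import curve_fit

      def model(t, c, alpha, omega, phi):
          return c + alpha * np.sin(omega * t + phi)

      t = np.linspace(0, 10, 200)
      y = model(t, 1.0, 2.0, 3.0, 0.5) + np.random.normal(scale=0.3, size=t.size)

      params, _ = curve_fit(model, t, y, p0=[0.0, 1.0, 2.5, 0.0])  # rough starting values
      print(params)  # estimates of C, alpha, omega, phi
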
  • Sinkov statistic
    Sinkov statistic
    Sinkov statistics, also known as log-weight statistics, is a specialized field of statistics that was developed by Abraham Sinkov, while working for the small Signal Intelligence Service organization, the primary mission of which was to compile codes and ciphers for use by the U.S. Army...

  • Skellam distribution
  • Skew normal distribution
  • Skewness
    Skewness
    In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable. The skewness value can be positive or negative, or even undefined...

  • Skorokhod's representation theorem
    Skorokhod's representation theorem
    In mathematics and statistics, Skorokhod's representation theorem is a result that shows that a weakly convergent sequence of probability measures whose limit measure is sufficiently well-behaved can be represented as the distribution/law of a pointwise convergent sequence of random variables...

  • Slash distribution
    Slash distribution
    In probability theory, the slash distribution is the probability distribution of a standard normal variate divided by an independent standard uniform variate...

  • Slice sampling
    Slice sampling
    Slice sampling is a type of Markov chain Monte Carlo algorithm for pseudo-random number sampling, i.e. for drawing random samples from a statistical distribution...

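    A bare-bones one-dimensional slice sampler for an unnormalised density, using stepping-out and shrinkage to locate the slice; the target density and the width parameter are assumptions for the sketch.

      import math
      import random

      def density(x: float) -> float:            # unnormalised standard normal (assumption)
          return math.exp(-0.5 * x * x)

      def slice_sample(x: float, w: float = 1.0) -> float:
          y = random.uniform(0, density(x))      # vertical level defining the slice
          left = x - w * random.random()         # randomly positioned initial interval
          right = left + w
          while density(left) > y:               # step out until both ends leave the slice
              left -= w
          while density(right) > y:
              right += w
          while True:                            # shrink until a proposal lands in the slice
              candidate = random.uniform(left, right)
              if density(candidate) > y:
                  return candidate
              if candidate < x:
                  left = candidate
              else:
                  right = candidate

      x, draws = 0.0, []
      for _ in range(5000):
          x = slice_sample(x)
          draws.append(x)
      print(sum(draws) / len(draws))             # should be near 0 for this target
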
  • Sliced inverse regression
  • Slutsky's theorem
    Slutsky's theorem
    In probability theory, Slutsky’s theorem extends some properties of algebraic operations on convergent sequences of real numbers to sequences of random variables.The theorem was named after Eugen Slutsky. Slutsky’s theorem is also attributed to Harald Cramér....

  • Small area estimation
    Small area estimation
    Small area estimation is any of several statistical techniques involving the estimation of parameters for small sub-populations, generally used when the sub-population of interest is included in a larger survey....

  • Smearing retransformation
    Smearing retransformation
    The Smearing retransformation is used in regression analysis, after estimating the logarithm of a variable. Estimating the logarithm of a variable instead of the variable itself is a common technique to more closely approximate normality...

  • Smoothing
    Smoothing
    In statistics and image processing, to smooth a data set is to create an approximating function that attempts to capture important patterns in the data, while leaving out noise or other fine-scale structures/rapid phenomena. Many different algorithms are used in smoothing...

  • Smoothing spline
    Smoothing spline
    The smoothing spline is a method of smoothing using a spline function...

  • Smoothness (probability theory)
    Smoothness (probability theory)
    In probability theory and statistics, smoothness of a density function is a measure which determines how many times the density function can be differentiated, or equivalently the limiting behavior of distribution’s characteristic function....

  • Snowball sampling
    Snowball sampling
    In sociology and statistics research, snowball sampling is a non-probability sampling technique where existing study subjects recruit future subjects from among their acquaintances. Thus the sample group appears to grow like a rolling snowball...

  • Social network change detection
    Social network change detection
    Social network change detection is a process of monitoring social networks to determine when significant changes to their organizational structure occur and what caused them. This scientific approach combines analytical techniques from social network analysis with those from statistical process...

  • Social statistics
    Social statistics
    Social statistics is the use of statistical measurement systems to study human behavior in a social environment. This can be accomplished through polling a particular group of people, evaluating a particular subset of data obtained about a group of people, or by observation and statistical...

  • SOFA Statistics
    SOFA Statistics
    SOFA Statistics is an open-source statistical package, with an emphasis on ease of use, learn as you go, and beautiful output. The name stands for Statistics Open For All. It has a graphical user interface and can connect directly to MySQL, PostgreSQL, SQLite, MS Access, and Microsoft SQL Server...

     – software
  • Soliton distribution
    Soliton distribution
    A soliton distribution is a type of discrete probability distribution that arises in the theory of erasure correcting codes. A paper by Luby introduced two forms of such distributions, the ideal soliton distribution and the robust soliton distribution...

     – redirects to Luby transform code
  • Sørensen similarity index
    Sørensen similarity index
    The Sørensen index, also known as Sørensen’s similarity coefficient, is a statistic used for comparing the similarity of two samples. It was developed by the botanist Thorvald Sørensen and published in 1948....

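    For two sets A and B the index is 2|A ∩ B| / (|A| + |B|); a minimal sketch with two invented species lists.

      def sorensen(a: set, b: set) -> float:
          return 2 * len(a & b) / (len(a) + len(b))

      site1 = {"oak", "birch", "pine", "alder"}   # invented species lists
      site2 = {"oak", "pine", "spruce"}
      print(sorensen(site1, site2))               # 2*2 / (4+3) ≈ 0.571
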
  • Spaghetti plot
  • Sparse binary polynomial hashing
    Sparse binary polynomial hashing
    Sparse binary polynomial hashing is a generalization of Bayesian filtering that can match mutating phrases as well as single words. SBPH is a way of generating a large number of features from an incoming text automatically, and then using statistics to determine the weights for each of those...

  • Sparse PCA
    Sparse PCA
    Sparse PCA is a specialised technique used in statistical analysis and, in particular, in the analysis of multivariate data sets....

     – sparse principal components analysis
  • Sparsity-of-effects principle
    Sparsity-of-effects principle
    The sparsity-of-effects principle states that a system is usually dominated by main effects and low-order interactions. Thus it is most likely that main effects and two-factor interactions are the most significant responses. In other words, higher order interactions such as three-factor...

  • Spatial analysis
    Spatial analysis
    Spatial analysis or spatial statistics includes any of the formal techniques which study entities using their topological, geometric, or geographic properties...

  • Spatial dependence
    Spatial dependence
    In applications of statistics, spatial dependence is the existence of statistical dependence in a collection of random variables or a collection of time series of random variables, each of which is associated with a different geographical location...

  • Spatial descriptive statistics
    Spatial descriptive statistics
    Spatial descriptive statistics are used for a variety of purposes in geography, particularly in quantitative data analyses involving Geographic Information Systems...

  • Spatial distribution
    Spatial distribution
    A spatial distribution is the arrangement of a phenomenon across the Earth's surface and a graphical display of such an arrangement is an important tool in geographical and environmental statistics. A graphical display of a spatial distribution may summarize raw data directly or may reflect the...

  • Spatial econometrics
    Spatial econometrics
    Spatial Econometrics is the field where spatial analysis and econometrics intersect. In general, econometrics differs from other branches of statistics in focusing on theoretical models, whose parameters are estimated using regression analysis...

  • Spatial statistics redirects to Spatial analysis
    Spatial analysis
    Spatial analysis or spatial statistics includes any of the formal techniques which study entities using their topological, geometric, or geographic properties...

  • Spatial variability
    Spatial variability
    Spatial variability occurs when a quantity that is measured at different spatial locations exhibits values that differ across the locations. Spatial variability can be assessed using spatial descriptive statistics such as the range...

  • SPC XL
    SPC XL
    SPC XL is a statistical add-in for Microsoft Excel. SPC XL is a replacement for SPC KISS, which was released in 1993, making it one of the oldest statistical add-ons for Excel...

     – software
  • Spearman's rank correlation coefficient
    Spearman's rank correlation coefficient
    In statistics, Spearman's rank correlation coefficient or Spearman's rho, named after Charles Spearman and often denoted by the Greek letter \rho or as r_s, is a non-parametric measure of statistical dependence between two variables. It assesses how well the relationship between two variables can...

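    Spearman's rho is the Pearson correlation applied to the ranks of the two variables; the sketch below computes it that way and, as a cross-check, with scipy.stats.spearmanr (the data are invented).

      import numpy as np
      from scipy.stats import rankdata, spearmanr

      x = np.array([10.0, 20.0, 30.0, 40.0, 55.0])   # invented paired observations
      y = np.array([1.2, 0.9, 2.5, 2.4, 3.1])

      rx, ry = rankdata(x), rankdata(y)
      rho = np.corrcoef(rx, ry)[0, 1]                # Pearson correlation of the ranks
      print(rho, spearmanr(x, y).correlation)        # the two values agree
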
  • Spearman–Brown prediction formula
  • Species discovery curve
    Species discovery curve
    In ecology, the species discovery curve is a graph recording the cumulative number of species of living things recorded in a particular environment as a function of the cumulative effort expended searching for them...

  • Specification (regression)
    Specification (regression)
    In regression analysis and related fields such as econometrics, specification is the process of converting a theory into a regression model. This process consists of selecting an appropriate functional form for the model and choosing which variables to include. Model specification is one of the...

  • Specificity (tests)
  • Spectral density estimation
    Spectral density estimation
    In statistical signal processing, the goal of spectral density estimation is to estimate the spectral density of a random signal from a sequence of time samples of the signal. Intuitively speaking, the spectral density characterizes the frequency content of the signal...

  • Spectrum bias
    Spectrum bias
    Initially identified in 1978, spectrum bias refers to the phenomenon that the performance of a diagnostic test may change between different clinical settings owing to changes in the patient case-mix thereby affecting the transferability of study results in clinical practice...

  • Spectrum continuation analysis
    Spectrum continuation analysis
    Spectrum continuation analysis is a generalization of the concept of Fourier series to non-periodic functions of which only a fragment has been sampled in the time domain....

  • Speed prior
    Speed prior
    Jürgen Schmidhuber's speed prior is a complexity measure similar to Kolmogorov complexity, except that it is based on computation speed as well as program length. The speed prior complexity of a program is its...

  • Spherical design
    Spherical design
    A spherical design, part of combinatorial design theory in mathematics, is a finite set of N points on the d-dimensional unit hypersphere Sd such that the average value of any polynomial f of degree t or less on the set equals the average value of f on the whole sphere...

  • Split normal distribution
    Split normal distribution
    In probability theory and statistics, the split normal distribution also known as the two-piece normal distribution results from joining at the mode the corresponding halves of two normal distributions with the same mode but different variances...

  • SPRT — redirects to Sequential probability ratio test
    Sequential probability ratio test
    The sequential probability ratio test is a specific sequential hypothesis test, developed by Abraham Wald. Neyman and Pearson's 1933 result inspired Wald to reformulate it as a sequential analysis problem...

  • SPSS
    SPSS
    SPSS is a computer program used for survey authoring and deployment, data mining, text analytics, statistical analysis, and collaboration and deployment...

     – software
  • SPSS Clementine
    SPSS Clementine
    SPSS Modeler is a data mining software tool by SPSS Inc., an IBM company. It was originally named SPSS Clementine by SPSS, after which it was renamed PASW Modeler in 2009 by SPSS. It has since been acquired by IBM in its acquisition of SPSS Inc...

     – software (data mining)
  • Spurious relationship
    Spurious relationship
    In statistics, a spurious relationship is a mathematical relationship in which two events or variables have no direct causal connection, yet it may be wrongly inferred that they do, due to either coincidence or the presence of a certain third, unseen factor...

  • Square root biased sampling
    Square root biased sampling
    Square root biased sampling is a sampling method proposed by William H. Press, a professor in the fields of computer sciences and computational biology, for use in airport screenings as a mathematically efficient compromise between simple random sampling and strong profiling.Using this method, if a...

  • Squared deviations
    Squared deviations
    In probability theory and statistics, the definition of variance is either the expected value, or average value, of squared deviations from the mean. Computations for analysis of variance involve the partitioning of a sum of squared deviations...

  • St. Petersburg paradox
    St. Petersburg paradox
    In economics, the St. Petersburg paradox is a paradox related to probability theory and decision theory. It is based on a particular lottery game that leads to a random variable with infinite expected value, i.e., infinite expected payoff, but would nevertheless be considered to be worth only a...

  • Stability (probability)
    Stability (probability)
    In probability theory, the stability of a random variable is the property that a linear combination of two independent copies of the variable has the same distribution, up to location and scale parameters. The distributions of random variables having this property are said to be "stable...

  • Stable distribution
  • Stable and tempered stable distributions with volatility clustering – financial applications
  • Standard deviation
    Standard deviation
    Standard deviation is a widely used measure of variability or diversity used in statistics and probability theory. It shows how much variation or "dispersion" there is from the average...

  • Standard error (statistics)
    Standard error (statistics)
    The standard error is the standard deviation of the sampling distribution of a statistic. The term may also be used to refer to an estimate of that standard deviation, derived from a particular sample used to compute the estimate....

  • Standard normal deviate
    Standard normal deviate
    A standard normal deviate is a normally distributed random variable with expected value 0 and variance 1. A fuller term is standard normal random variable...

  • Standard normal table
    Standard normal table
    A standard normal table also called the "Unit Normal Table" is a mathematical table for the values of Φ, the cumulative distribution function of the normal distribution....

  • Standard probability space
    Standard probability space
    In probability theory, a standard probability space is a probability space satisfying certain assumptions introduced by Vladimir Rokhlin in 1940...

  • Standard score
    Standard score
    In statistics, a standard score indicates how many standard deviations an observation or datum is above or below the mean. It is a dimensionless quantity derived by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation...

  • Standardized coefficient
    Standardized coefficient
    In statistics, standardized coefficients or beta coefficients are the estimates resulting from an analysis carried out on variables that have been standardized so that their variances are 1. Therefore, standardized coefficients refer to how many standard deviations a dependent variable will change,...

  • Standardized moment
  • Standardised mortality rate
    Standardised mortality rate
    The standardised mortality rate tells how many persons, per thousand of the population, will die in a given year and what the causes of death will be...

  • Standardized mortality ratio
    Standardized mortality ratio
    The standardized mortality ratio or SMR in epidemiology is the ratio of observed deaths to expected deaths, where expected deaths are calculated for a typical area with the same age and gender mix by looking at the death rates for different ages and genders in the larger population.The SMR may be...

  • Standardized rate
    Standardized rate
    Standardized rates are a statistical measure of any rates in a population. The most common are birth, death and unemployment rates.The formula for standardized rates is as follows:...

  • Stanine
    Stanine
    Stanine is a method of scaling test scores on a nine-point standard scale with a mean of five and a standard deviation of two.Some web sources attribute stanines to the U.S. Army Air Forces during World War II...

  • STAR model
    STAR model
    In statistics, Smooth Transition Autoregressive models are typically applied to time series data as an extension of autoregressive models, in order to allow for a higher degree of flexibility in model parameters through a smooth transition. Given a time series of data xt, the STAR model is a tool for...

     — a time series model
  • Star plot — redirects to Radar chart
    Radar chart
    A radar chart is a graphical method of displaying multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point...

  • Stata
    Stata
    Stata is a general-purpose statistical software package created in 1985 by StataCorp. It is used by many businesses and academic institutions around the world...

  • Statgraphics
    Statgraphics
    Statgraphics is a statistics package that performs and explains basic and advanced statistical functions. The software was created in 1980 by Dr. Neil Polhemus...

     – software
  • Static analysis
    Static analysis
    Static analysis, static projection, and static scoring are terms for simplified analysis wherein the effect of an immediate change to a system is calculated without respect to the longer term response of the system to that change...

  • Stationary distribution
    Stationary distribution
    Stationary distribution may refer to the limiting distribution in a Markov chain, the marginal distribution of a stationary process or stationary time series, or the set of joint probability distributions of a stationary process or stationary time series...

  • Stationary ergodic process
    Stationary ergodic process
    In probability theory, a stationary ergodic process is a stochastic process which exhibits both stationarity and ergodicity. In essence this implies that the random process will not change its statistical properties with time and that its statistical properties can be deduced from a single,...

  • Stationary process
    Stationary process
    In the mathematical sciences, a stationary process is a stochastic process whose joint probability distribution does not change when shifted in time or space...

  • Stationary sequence
    Stationary sequence
    In probability theory – specifically in the theory of stochastic processes, a stationary sequence is a random sequence whose joint probability distribution is invariant over time...

  • Stationary subspace analysis
    Stationary subspace analysis
    Stationary Subspace Analysis is a blind source separation algorithm which factorizes a multivariate time series into stationary and non-stationary components...

  • Statistic
    Statistic
    A statistic is a single measure of some attribute of a sample. It is calculated by applying a function to the values of the items comprising the sample, which are known together as a set of data. More formally, statistical theory defines a statistic as a function of a sample where the function...

  • STATISTICA
    STATISTICA
    STATISTICA is a statistics and analytics software package developed by StatSoft. STATISTICA provides data analysis, data management, data mining, and data visualization procedures...

     – software
  • Statistical arbitrage
    Statistical arbitrage
    In the world of finance and investments, statistical arbitrage is used in two related but distinct ways. In academic literature, "statistical arbitrage" is opposed to arbitrage. In deterministic arbitrage, a sure profit can be obtained from being long some securities and short others...

  • Statistical assembly
    Statistical assembly
    In statistics, for example in statistical quality control, a statistical assembly is a collection of parts or components which makes up a statistical unit. Thus a statistical unit, which would be the prime item of concern, is made of discrete components like organs or machine parts...

  • Statistical assumption
  • Statistical benchmarking
    Statistical benchmarking
    In statistics, benchmarking is a method of using auxiliary information to adjust the sampling weights used in an estimation process, in order to yield more accurate estimates of totals....

  • Statistical classification
  • Statistical conclusion validity
    Statistical conclusion validity
    Statistical conclusion validity refers to the appropriate use of statistics to infer whether the presumed independent and dependent variables covary...

  • Statistical consultant
    Statistical consultant
    A statistical consultant provides statistical advice and guidance to clients interested in making decisions through the analysis or collection of data. Clients often need statistical advice to answer questions in business, medicine, biology, genetics, forestry, agriculture, fisheries, wildlife...

  • Statistical deviance—see deviance (statistics)
  • Statistical dispersion
    Statistical dispersion
    In statistics, statistical dispersion is variability or spread in a variable or a probability distribution...

  • Statistical distance
    Statistical distance
    In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two samples, two random variables, or two probability distributions, for example...

  • Statistical efficiency
  • Statistical epidemiology
    Statistical epidemiology
    Statistical epidemiology is an emerging branch of the disciplines of epidemiology and biostatistics that aims to bring more statistical rigour to bear in the field of epidemiology...

  • Statistical estimation — redirects to Estimation theory
    Estimation theory
    Estimation theory is a branch of statistics and signal processing that deals with estimating the values of parameters based on measured/empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the...

  • Statistical finance
    Statistical finance
    Statistical finance, sometimes called econophysics, is an empirical attempt to shift finance from its normative roots to a positivist framework using exemplars from statistical physics with an emphasis on emergent or collective properties of financial markets...

  • Statistical genetics — redirects to population genetics
    Population genetics
    Population genetics is the study of allele frequency distribution and change under the influence of the four main evolutionary processes: natural selection, genetic drift, mutation and gene flow. It also takes into account the factors of recombination, population subdivision and population...

  • Statistical geography
    Statistical geography
    Statistical geography is the study and practice of collecting, analysing and presenting data that has a geographic or areal dimension, such as census or demographics data. It uses techniques from spatial analysis, but also encompasses geographical activities such as the defining and naming of...

  • Statistical graphics
    Statistical graphics
    Statistical graphics, also known as graphical techniques, are information graphics in the field of statistics used to visualize quantitative data...

  • Statistical hypothesis testing
    Statistical hypothesis testing
    A statistical hypothesis test is a method of making decisions using data, whether from a controlled experiment or an observational study . In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold...

  • Statistical independence
    Statistical independence
    In probability theory, to say that two events are independent intuitively means that the occurrence of one event makes it neither more nor less probable that the other occurs...

  • Statistical inference
    Statistical inference
    In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation...

  • Statistical interference
    Statistical interference
    When two probability distributions overlap, statistical interference exists. Knowledge of the distributions can be used to determine the likelihood that one parameter exceeds another, and by how much....

  • Statistical Lab
    Statistical Lab
    The computer program Statistical Lab is an explorative and interactive toolbox for statistical analysis and visualization of data. It supports educational applications of statistics in business sciences, economics, social sciences and humanities. The program is developed and constantly advanced by...

     – software
  • Statistical learning theory
  • Statistical literacy
    Statistical literacy
    Statistical literacy is a term used to describe an individual's or group's ability to understand statistics. Statistical literacy is necessary for citizens to understand material presented in publications such as newspapers, television, and the Internet. Numeracy is a prerequisite to being...

  • Statistical model
    Statistical model
    A statistical model is a formalization of relationships between variables in the form of mathematical equations. A statistical model describes how one or more random variables are related to one or more other variables. The model is statistical as the variables are not deterministically but...

  • Statistical model validation — redirects to Regression model validation
  • Statistical noise
    Statistical noise
    Statistical noise is the colloquialism for recognized amounts of unexplained variation in a sample. See errors and residuals in statistics....

  • Statistical package
  • Statistical parameter
    Statistical parameter
    A statistical parameter is a parameter that indexes a family of probability distributions. It can be regarded as a numerical characteristic of a population or a model....

  • Statistical parametric mapping
    Statistical parametric mapping
    Statistical parametric mapping or SPM is a statistical technique created by Karl Friston for examining differences in brain activity recorded during functional neuroimaging experiments using neuroimaging technologies such as fMRI or PET...

  • Statistical parsing
    Statistical parsing
    Statistical parsing is a group of parsing methods within natural language processing. The methods have in common that they associate grammar rules with a probability. Grammar rules are traditionally viewed in computational linguistics as defining the valid sentences in a language...

  • Statistical population
    Statistical population
    A statistical population is a set of entities concerning which statistical inferences are to be drawn, often based on a random sample taken from the population. For example, if we were interested in generalizations about crows, then we would describe the set of crows that is of interest...

  • Statistical power
    Statistical power
    The power of a statistical test is the probability that the test will reject the null hypothesis when the null hypothesis is actually false . The power is in general a function of the possible distributions, often determined by a parameter, under the alternative hypothesis...

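    A sketch of the power of a one-sided z-test for a mean under assumed values of the true effect, the (known) standard deviation, the sample size and the significance level; power is the probability that the test statistic exceeds the critical value when the alternative is true.

      from math import sqrt
      from scipy.stats import norm

      effect = 0.5      # assumed true difference in means
      sigma = 2.0       # assumed known standard deviation
      n = 50            # sample size
      alpha = 0.05      # one-sided significance level

      z_crit = norm.ppf(1 - alpha)
      power = 1 - norm.cdf(z_crit - effect / (sigma / sqrt(n)))
      print(power)      # probability of rejecting H0 when the true effect is 0.5
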
  • Statistical probability
  • Statistical process control
    Statistical process control
    Statistical process control is the application of statistical methods to the monitoring and control of a process to ensure that it operates at its full potential to produce conforming product. Under SPC, a process behaves predictably to produce as much conforming product as possible with the least...

  • Statistical process control software
    Statistical process control software
    There are a number of software programs designed to aid in statistical process control. Typically the software program undertakes two functions: data collection and data analysis...

  • Statistical proof
    Statistical proof
    Statistical proof is the rational demonstration of degree of certainty for a proposition, hypothesis or theory to convince others subsequent to a statistical test of the increased understanding of the facts. Statistical methods are used to demonstrate the validity and logic of inference with...

  • Statistical randomness
    Statistical randomness
    A numeric sequence is said to be statistically random when it contains no recognizable patterns or regularities; sequences such as the results of an ideal dice roll, or the digits of π exhibit statistical randomness....

  • Statistical range – see range (statistics)
    Range (statistics)
    In descriptive statistics, the range is the length of the smallest interval which contains all the data. It is calculated by subtracting the smallest observation from the greatest and provides an indication of statistical dispersion. It is measured in the same units as the data...

  • Statistical regularity
    Statistical regularity
    Statistical regularity is a notion in statistics and probability theory that random events exhibit regularity when repeated enough times or that enough sufficiently similar random events exhibit regularity...

  • Statistical sample
  • Statistical semantics
    Statistical semantics
    Statistical semantics is the study of "how the statistical patterns of human word usage can be used to figure out what people mean, at least to a level sufficient for information access"...

  • Statistical shape analysis
    Statistical shape analysis
    Statistical shape analysis is the geometrical analysis of a set of shapes, in which statistics are computed to describe the geometrical properties of similar shapes or of different groups, for instance the difference between male and female gorilla skull shapes, or between normal and pathological bone shapes, etc...

  • Statistical signal processing
    Statistical signal processing
    Statistical signal processing is an area of Applied Mathematics and Signal Processing that treats signals as stochastic processes, dealing with their statistical properties...

  • Statistical significance
    Statistical significance
    In statistics, a result is called statistically significant if it is unlikely to have occurred by chance. The phrase test of significance was coined by Ronald Fisher....

  • Statistical survey
    Statistical survey
    Survey methodology is the field that studies surveys, that is, the sample of individuals from a population with a view towards making statistical inferences about the population using the sample. Polls about public opinion, such as political beliefs, are reported in the news media in democracies....

  • Statistical syllogism
    Statistical syllogism
    A statistical syllogism is a non-deductive syllogism. It argues from a generalization that is true for the most part to a particular case. Statistical syllogisms may use qualifying words like "most", "frequently", "almost never", "rarely",...

  • Statistical theory
    Statistical theory
    The theory of statistics provides a basis for the whole range of techniques, in both study design and data analysis, that are used within applications of statistics. The theory covers approaches to statistical-decision problems and to statistical inference, and the actions and deductions that...

  • Statistical unit
    Statistical unit
    A unit in a statistical analysis refers to one member of a set of entities being studied. It is the material source for the mathematical abstraction of a "random variable"...

  • Statisticians' and engineers' cross-reference of statistical terms
    Statisticians' and engineers' cross-reference of statistical terms
    The following terms are used by electrical engineers in statistical signal processing studies instead of the typical statistician's terms...

  • Statistics
    Statistics
    Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

  • Statistics education
    Statistics education
    Statistics education is concerned with the teaching and learning of statistics.Statistics is both a formal science and a practical theory of scientific inquiry, and both aspects are considered in statistics education. Education in statistics has similar concerns as does education in other...

  • Statistics Online Computational Resource – training materials
  • StatPlus
    StatPlus
    StatPlus is a software product that includes basic and multivariate statistical analysis, including time series analysis, nonparametric statistics, survival analysis, and the ability to build different charts...

  • StatXact
    StatXact
    StatXact is a statistical software package for exact statistics. It calculates exact p-values and confidence intervals for contingency tables and non-parametric procedures. It is marketed by Cytel Inc....

     – software
  • Stein's example
    Stein's example
    Stein's example, in decision theory and estimation theory, is the phenomenon that when three or more parameters are estimated simultaneously, there exist combined estimators more accurate on average than any method that handles the parameters separately...

    • Proof of Stein's example
      Proof of Stein's example
    Stein's example is an important result in decision theory which can be stated as follows: when three or more parameters are estimated simultaneously, there exist combined estimators that are on average more accurate than any method that handles the parameters separately. The following is an outline of its proof. The reader is referred to the main article for more information...

  • Stein's lemma
    Stein's lemma
    Stein's lemma, named in honor of Charles Stein, is a theorem of probability theory that is of interest primarily because of its applications to statistical inference — in particular, to James–Stein estimation and empirical Bayes methods — and its applications to portfolio choice...

  • Stein's unbiased risk estimate
    Stein's unbiased risk estimate
    In statistics, Stein's unbiased risk estimate is an unbiased estimator of the mean-squared error of "a nearly arbitrary, nonlinear biased estimator." In other words, it provides an indication of the accuracy of a given estimator...

  • Steiner system
    Steiner system
    The Fano plane is an S(2,3,7) Steiner triple system. The blocks are the 7 lines, each containing 3 points. Every pair of points belongs to a unique line....

  • Stemplot
    Stemplot
    A stemplot, in statistics, is a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution. They evolved from Arthur Bowley's work in the early 1900s, and are useful tools in exploratory data analysis...

  • Stepwise regression
    Stepwise regression
    In statistics, stepwise regression includes regression models in which the choice of predictive variables is carried out by an automatic procedure...

  • Stetson–Harrison method
  • Stieltjes moment problem
  • Stimulus-response model
    Stimulus-response model
    The stimulus–response model is a characterization of a statistical unit as a black box model, predicting a quantitative response to a quantitative stimulus, for example one administered by a researcher...

  • Stochastic
    Stochastic
    Stochastic refers to systems whose behaviour is intrinsically non-deterministic. A stochastic process is one whose behavior is non-deterministic, in that a system's subsequent state is determined both by the process's predictable actions and by a random element. However, according to M. Kac and E...

  • Stochastic approximation
    Stochastic approximation
    Stochastic approximation methods are a family of iterative stochastic optimization algorithms that attempt to find zeroes or extrema of functions which cannot be computed directly, but only estimated via noisy observations....

  • Stochastic calculus
    Stochastic calculus
    Stochastic calculus is a branch of mathematics that operates on stochastic processes. It allows a consistent theory of integration to be defined for integrals of stochastic processes with respect to stochastic processes...

  • Stochastic convergence
  • Stochastic differential equation
    Stochastic differential equation
    A stochastic differential equation is a differential equation in which one or more of the terms is a stochastic process, thus resulting in a solution which is itself a stochastic process....

  • Stochastic dominance
    Stochastic dominance
    Stochastic dominance is a form of stochastic ordering. The term is used in decision theory and decision analysis to refer to situations where one gamble can be ranked as superior to another gamble. It is based on preferences regarding outcomes...

  • Stochastic drift
    Stochastic drift
    In probability theory, stochastic drift is the change of the average value of a stochastic process. A related term is the drift rate which is the rate at which the average changes. This is in contrast to the random fluctuations about this average value...

  • Stochastic gradient descent
    Stochastic gradient descent
    Stochastic gradient descent is an optimization method for minimizing an objective function that is written as a sum of differentiable functions...
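
    As a rough illustration (not from the source; the data, learning rate and loss are hypothetical), the following NumPy sketch applies stochastic gradient descent to a least-squares objective, updating the parameters from one randomly chosen observation at a time:

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 3))                 # simulated features
        true_w = np.array([1.5, -2.0, 0.5])
        y = X @ true_w + 0.1 * rng.normal(size=1000)   # simulated responses

        w = np.zeros(3)      # parameters to be learned
        lr = 0.01            # step size (illustrative)
        for epoch in range(20):
            for i in rng.permutation(len(y)):          # visit observations in random order
                grad = 2 * (X[i] @ w - y[i]) * X[i]    # gradient of the single-term loss (x_i.w - y_i)^2
                w -= lr * grad
        print(w)             # should end up close to true_w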

  • Stochastic grammar
    Stochastic grammar
    A stochastic grammar is a grammar framework with a probabilistic notion of grammaticality; examples include stochastic context-free grammars, statistical parsing, data-oriented parsing, hidden Markov models, and estimation theory...

  • Stochastic kernel estimation
  • Stochastic matrix
    Stochastic matrix
    In mathematics, a stochastic matrix is a matrix used to describe the transitions of a Markov chain. It has found use in probability theory, statistics and linear algebra, as well as computer science...

  • Stochastic modelling (insurance)
  • Stochastic optimization
    Stochastic optimization
    Stochastic optimization methods are optimization methods that generate and use random variables. For stochastic problems, the random variables appear in the formulation of the optimization problem itself, which may involve random objective functions or random constraints, for example. Stochastic...

  • Stochastic ordering
    Stochastic ordering
    In probability theory and statistics, a stochastic order quantifies the concept of one random variable being "bigger" than another. These are usually partial orders, so that one random variable A may be neither stochastically greater than, less than nor equal to another random variable B...

  • Stochastic process
    Stochastic process
    In probability theory, a stochastic process, or sometimes random process, is the counterpart to a deterministic process...

  • Stochastic rounding
  • Stochastic simulation
    Stochastic simulation
    Stochastic simulation algorithms and methods were initially developed to analyse chemical reactions involving large numbers of species with complex reaction kinetics. The first algorithm, the Gillespie algorithm was proposed by Dan Gillespie in 1977...

  • Stopped process
    Stopped process
    In mathematics, a stopped process is a stochastic process that is forced to assume the same value after a prescribed time. In the definition, let (Ω, F, P) be a probability space;...

  • Stopping time
  • Stratified sampling
    Stratified sampling
    In statistics, stratified sampling is a method of sampling from a population.In statistical surveys, when subpopulations within an overall population vary, it is advantageous to sample each subpopulation independently. Stratification is the process of dividing members of the population into...
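
    A minimal sketch of proportional allocation, in which each stratum is sampled in proportion to its share of the population (the strata and sizes below are purely illustrative):

        import numpy as np

        rng = np.random.default_rng(1)
        strata = {"urban": np.arange(0, 6000), "rural": np.arange(6000, 10000)}   # unit IDs per stratum
        n_total = 500
        pop_size = sum(len(ids) for ids in strata.values())

        sample = {
            name: rng.choice(ids, size=round(n_total * len(ids) / pop_size), replace=False)
            for name, ids in strata.items()
        }
        print({name: len(ids) for name, ids in sample.items()})   # {'urban': 300, 'rural': 200}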

  • Stratonovich integral
    Stratonovich integral
    In stochastic processes, the Stratonovich integral is a stochastic integral, the most common alternative to the Itō integral...

  • Stress majorization
    Stress majorization
    Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of n m-dimensional data items, a configuration X of n points in r-dimensional space is sought that minimizes a so-called stress function...

  • Strong Law of Small Numbers
    Strong Law of Small Numbers
    "The Strong Law of Small Numbers" is a humorous paper by mathematician Richard K. Guy and also the so-called law that it proclaims: "There aren't enough small numbers to meet the many demands made of them." In other words, any given small number appears in far more contexts than may seem...

  • Strong prior
    Strong prior
    A strong prior is a preceding assumption, theory, concept or idea upon which a current assumption, theory, concept or idea is founded. In Bayesian statistics, the term is used to contrast the case of a weak or uninformative prior probability...

  • Structural break
    Structural break
    A structural break is a concept in econometrics. A structural break appears when we see an unexpected shift in a time series. This can lead to huge forecasting errors and unreliability of the model in general...

  • Structural equation modeling
    Structural equation modeling
    Structural equation modeling is a statistical technique for testing and estimating causal relations using a combination of statistical data and qualitative causal assumptions...

  • Structural estimation
    Structural estimation
    Structural estimation is a technique for estimating deep "structural" parameters of theoretical economic models. In this sense, "structural estimation" is contrasted with "reduced-form estimation," which generally provides evidence about partial equilibrium relationships in a regression...

  • Structured data analysis (statistics)
    Structured data analysis (statistics)
    Structured data analysis is the statistical data analysis of structured data. This can arise either in the form of an a priori structure such as multiple-choice questionnaires or in situations with the need to search for structure that fits the given data, either exactly or approximately...

  • Studentized range
  • Studentized residual
    Studentized residual
    In statistics, a studentized residual is the quotient resulting from the division of a residual by an estimate of its standard deviation. Typically the standard deviations of residuals in a sample vary greatly from one data point to another even when the errors all have the same standard...

  • Student's t-distribution
  • Student's t-statistic
    Student's t-statistic
    In statistics, the t-statistic is the ratio of the departure of an estimated parameter from its notional value to its standard error. It is used in hypothesis testing, for example in the Student's t-test, in the augmented Dickey–Fuller test, and in bootstrapping...

  • Student's t-test
    Student's t-test
    A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is supported. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known...
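
    As a sketch of the underlying arithmetic (simulated data, equal-variance case assumed), the two-sample t statistic can be computed directly from its definition:

        import numpy as np

        rng = np.random.default_rng(2)
        a = rng.normal(loc=5.0, scale=1.0, size=30)
        b = rng.normal(loc=5.5, scale=1.0, size=30)

        na, nb = len(a), len(b)
        sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)  # pooled variance
        t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))
        print(t)   # compare against the t distribution with na + nb - 2 degrees of freedom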

  • Student’s t-test for Gaussian scale mixture distributions — redirects to Location testing for Gaussian scale mixture distributions
    Location testing for Gaussian scale mixture distributions
    In statistics, the topic of location testing for Gaussian scale mixture distributions arises in some particular types of situations where the more standard Student's t-test is inapplicable...

  • Studentization
    Studentization
    In statistics, Studentization, named after William Sealy Gosset, who wrote under the pseudonym Student, is the adjustment consisting of division of a first-degree statistic derived from a sample, by a sample-based estimate of a population standard deviation...

  • Study design
    Study design
    Clinical study design is the formulation of trials and experiments in medical and epidemiological research, sometimes known as clinical trials. Many of the considerations here are shared under the more general topic of design of experiments but there can be others, in particular related to patient...

  • Study heterogeneity
    Study heterogeneity
    In statistics, study heterogeneity is a problem that can arise when attempting to undertake a meta-analysis. Ideally, the studies whose results are being combined in the meta-analysis should all be undertaken in the same way and to the same experimental protocols: study heterogeneity is a term used...

  • Subcontrary mean — redirects to Harmonic mean
    Harmonic mean
    In mathematics, the harmonic mean is one of several kinds of average. Typically, it is appropriate for situations when the average of rates is desired....
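
    For instance, the average speed over two legs of equal distance driven at 60 and 40 is the harmonic mean of the two speeds; a minimal sketch:

        def harmonic_mean(values):
            return len(values) / sum(1.0 / v for v in values)

        print(harmonic_mean([60, 40]))   # 48.0, not the arithmetic mean 50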

  • Subgroup analysis
    Subgroup analysis
    Subgroup analysis, in the context of design and analysis of experiments, refers to looking for patterns in a subset of the subjects....

  • Subindependence
  • Substitution model
    Substitution model
    In biology, a substitution model describes the process by which a sequence of characters changes into another set of traits. For example, in cladistics, each position in the sequence might correspond to a property of a species which can either be present or absent. The alphabet could then consist...

  • SUDAAN
    SUDAAN
    SUDAAN is a statistical software package for the analysis of correlated data, including correlated data encountered in complex sample surveys. SUDAAN originated in 1972....

     – software
  • Sufficiency (statistics) — redirects to Sufficient statistic
  • Sufficient dimension reduction
    Sufficient dimension reduction
    In statistics, sufficient dimension reduction is a paradigm for analyzing data that combines the ideas of dimension reduction with the concept of sufficiency.Dimension reduction has long been a primary goal of regression analysis...

  • Sufficient statistic
  • Sum of normally distributed random variables
    Sum of normally distributed random variables
    In probability theory, calculation of the sum of normally distributed random variables is an instance of the arithmetic of random variables, which can be quite complex based on the probability distributions of the random variables involved and their relationships...

  • Sum of squares — general disambiguation
  • Sum of squares (statistics) — redirects to Partition of sums of squares
  • Summary statistic
  • Superstatistics
    Superstatistics
    Superstatistics is a branch of statistical mechanics or statistical physics devoted to the study of non-linear and non-equilibrium systems. It is characterized by using the superposition of multiple differing statistical models to achieve the desired non-linearity...

  • Support curve
    Support curve
    Support curve is a statistical term, coined by A. W. F. Edwards, to describe the graph of the natural logarithm of the likelihood function. The function being plotted is used in the computation of the score and Fisher information, and the graph has a direct interpretation in the context of maximum...

  • Support vector machine
    Support vector machine
    A support vector machine is a concept in statistics and computer science for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis...

  • Surrogate model
    Surrogate model
    Most engineering design problems require experiments and/or simulations to evaluate design objective and constraint functions as a function of design variables. For example, in order to find the optimal airfoil shape for an aircraft wing, an engineer simulates the air flow around the wing for...

  • Survey data collection
    Survey data collection
    The methods involved in survey data collection are any of a number of ways in which data can be collected for a statistical survey. These are methods that are used to collect information from a sample of individuals in a systematic way....

  • Survey sampling
    Survey sampling
    In statistics, survey sampling describes the process of selecting a sample of elements from a target population in order to conduct a survey.A survey may refer to many different types or techniques of observation, but in the context of survey sampling it most often involves a questionnaire used to...

  • Survey methodology
    Survey Methodology
    Survey Methodology is a peer-reviewed open access scientific journal that publishes papers related to the development and application of survey techniques...

  • Survival analysis
    Survival analysis
    Survival analysis is a branch of statistics which deals with death in biological organisms and failure in mechanical systems. This topic is called reliability theory or reliability analysis in engineering, and duration analysis or duration modeling in economics or sociology...

  • Survival rate
    Survival rate
    In biostatistics, survival rate is a part of survival analysis, indicating the percentage of people in a study or treatment group who are alive for a given period of time after diagnosis...

  • Survival function
    Survival function
    The survival function, also known as a survivor function or reliability function, is a property of any random variable that maps a set of events, usually associated with mortality or failure of some system, onto time. It captures the probability that the system will survive beyond a specified time...

  • Survivorship bias
    Survivorship bias
    Survivorship bias is the logical error of concentrating on the people or things that "survived" some process and inadvertently overlooking those that didn't because of their lack of visibility. This can lead to false conclusions in several different ways...

  • Symmetric design
    Symmetric design
    In combinatorial mathematics, a symmetric design is a block design with equal numbers of points and blocks. Thus, it has the fewest possible blocks given the number of points. They are also known as projective designs....

  • Symmetric mean absolute percentage error
    Symmetric mean absolute percentage error
    Symmetric mean absolute percentage error is an accuracy measure based on percentage errors. It is usually defined as SMAPE = (100/n) Σ |F_t − A_t| / ((|A_t| + |F_t|)/2), where A_t is the actual value and F_t is the forecast value....
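
    A minimal sketch of that definition (the series below is invented for illustration):

        import numpy as np

        def smape(actual, forecast):
            actual = np.asarray(actual, dtype=float)
            forecast = np.asarray(forecast, dtype=float)
            return 100.0 * np.mean(np.abs(forecast - actual) / ((np.abs(actual) + np.abs(forecast)) / 2))

        print(smape([100, 200, 300], [110, 180, 330]))   # about 9.9 (percent)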

  • SYSTAT
    SYSTAT
    SYSTAT is a statistics and statistical graphics software package, developed by Leland Wilkinson in the late 1970s, who was at the time an assistant professor of psychology at the University of Illinois at Chicago...

     – software
  • System dynamics
    System dynamics
    System dynamics is an approach to understanding the behaviour of complex systems over time. It deals with internal feedback loops and time delays that affect the behaviour of the entire system. What makes using system dynamics different from other approaches to studying complex systems is the use...

  • System identification
    System identification
    In control engineering, the field of system identification uses statistical methods to build mathematical models of dynamical systems from measured data...

  • Systematic error
    Systematic error
    Systematic errors are biases in measurement which lead to the situation where the mean of many separate measurements differs significantly from the actual value of the measured attribute. All measurements are prone to systematic errors, often of several different types...

     (also see bias (statistics)
    Bias (statistics)
    A statistic is biased if it is calculated in such a way that it is systematically different from the population parameter of interest. The following lists some types of, or aspects of, bias which should not be considered mutually exclusive:...

     and errors and residuals in statistics
    Errors and residuals in statistics
    In statistics and optimization, statistical errors and residuals are two closely related and easily confused measures of the deviation of a sample from its "theoretical value"...

    )
  • Systematic review
    Systematic review
    A systematic review is a literature review focused on a research question that tries to identify, appraise, select and synthesize all high quality research evidence relevant to that question. Systematic reviews of high-quality randomized controlled trials are crucial to evidence-based medicine...


T

  • t-distribution; see Student's t-distribution (includes table)
  • T distribution — disambiguation
  • t-statistic
  • Tag cloud
    Tag cloud
    A tag cloud is a visual representation for text data, typically used to depict keyword metadata on websites, or to visualize free form text. 'Tags' are usually single words, and the importance of each tag is shown with font size or color...

     – graphical display of info
  • Taguchi loss function
    Taguchi loss function
    The Taguchi Loss Function is a graphical depiction of loss developed by the Japanese business statistician Genichi Taguchi to describe a phenomenon affecting the value of products produced by a company. Praised by Dr. W...

  • Taguchi methods
    Taguchi methods
    Taguchi methods are statistical methods developed by Genichi Taguchi to improve the quality of manufactured goods, and more recently also applied to engineering, biotechnology, marketing and advertising...

  • Tajima's D
    Tajima's D
    Tajima's D is a statistical test created by and named after the Japanese researcher Fumio Tajima. The purpose of the test is to distinguish between a DNA sequence evolving randomly and one evolving under a non-random process, including directional selection or balancing selection, demographic...

  • Taleb distribution
    Taleb Distribution
    In economics and finance, a Taleb distribution is a term coined by U.K. economists/journalists Martin Wolf and John Kay to describe a returns profile that appears at times deceptively low-risk with steady returns, but experiences periodically catastrophic drawdowns. It does not describe a...

  • Tampering (quality control)
    Tampering (quality control)
    Tampering in the context of a controlled process is when adjustments to the process are made based on outcomes which are within the expected range of variability. The net result is to re-align the process so that an increased proportion of the output is out of specification. The term was introduced...

  • Taylor expansions for the moments of functions of random variables
    Taylor expansions for the moments of functions of random variables
    In probability theory, it is possible to approximate the moments of a function f of a random variable X using Taylor expansions, provided that f is sufficiently differentiable and that the moments of X are finite...
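
    As a rough numerical check (assuming f(x) = exp(x) and a normal X; none of this is from the source), the first-order approximations E[f(X)] ≈ f(μ) and Var[f(X)] ≈ f′(μ)² σ² can be compared with simulation:

        import numpy as np

        mu, sigma = 0.5, 0.1
        rng = np.random.default_rng(3)
        x = rng.normal(mu, sigma, size=200_000)

        approx_mean = np.exp(mu)                       # f(mu)
        approx_var = np.exp(mu) ** 2 * sigma ** 2      # (f'(mu))^2 * sigma^2, since f' = exp here
        print(approx_mean, np.exp(x).mean())           # both near 1.65 for this small sigma
        print(approx_var, np.exp(x).var())             # both near 0.027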

  • Telegraph process
    Telegraph process
    In probability theory, the telegraph process is a memoryless continuous-time stochastic process that shows two distinct values.If these are called a and b, the process can be described by the following master equations:...

  • Test for structural change
    Test for structural change
    Test for structural change is an econometric test. It is used to verify the equality of coefficients in separate subsamples. See Chow test....

  • Test-retest (disambiguation)
  • Test score
    Test score
    A test score is a piece of information, usually a number, that conveys the performance of an examinee on a test. One formal definition is that it is "a summary of the evidence contained in an examinee's responses to the items of a test that are related to the construct or constructs being...

  • Test set
    Test set
    A test set is a set of data used in various areas of information science to assess the strength and utility of a predictive relationship. Test sets are used in artificial intelligence, machine learning, genetic programming, intelligent systems, and statistics...

  • Test statistic
    Test statistic
    In statistical hypothesis testing, a hypothesis test is typically specified in terms of a test statistic, which is a function of the sample; it is considered a numerical summary of a set of data that...

  • Testimator
    Testimator
    A testimator is an estimator whose value depends on the result of a test for statistical significance. In the simplest case the value of the final estimator is that of the basic estimator if the test result is significant, and otherwise the value is zero...

  • Testing hypotheses suggested by the data
    Testing hypotheses suggested by the data
    In statistics, hypotheses suggested by the data, if tested using the data set that suggested them, are likely to be accepted even when they are not true...

  • Text analytics
    Text analytics
    The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation. The term is roughly synonymous with text mining;...

  • The Long Tail
    The Long Tail
    The Long Tail or long tail refers to the statistical property that a larger share of population rests within the tail of a probability distribution than observed under a 'normal' or Gaussian distribution...

     — possibly seminal magazine article
  • The Unscrambler
    The Unscrambler
    The Unscrambler is a commercial software product for multivariate data analysis, used primarily for calibration in the application of near infrared spectroscopy and development of predictive models for use in real-time spectroscopic analysis of materials. The software was originally developed in...

     — software
  • Theil index
    Theil index
    The Theil index is a statistic used to measure economic inequality. It has also been used to measure the lack of racial diversity. The basic Theil index T_T is the same as redundancy in information theory, which is the maximum possible entropy of the data minus the observed entropy. It is a special...

  • Theil–Sen estimator
    Theil–Sen estimator
    In non-parametric statistics, the Theil–Sen estimator, also known as Sen's slope estimator, slope selection, the single median method, or the Kendall robust line-fit method, is a method for robust linear regression that chooses the median slope among all lines through pairs of two-dimensional...
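
    A brute-force sketch of the median-of-pairwise-slopes idea (the small data set, with one deliberate outlier, is illustrative):

        import numpy as np
        from itertools import combinations

        x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
        y = np.array([2.1, 3.9, 6.2, 8.0, 9.9, 30.0])   # the last point is an outlier

        slopes = [(y[j] - y[i]) / (x[j] - x[i]) for i, j in combinations(range(len(x)), 2)]
        slope = np.median(slopes)                        # median slope over all pairs of points
        intercept = np.median(y - slope * x)
        print(slope, intercept)                          # slope stays near 2 despite the outlier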

  • Theory of conjoint measurement
    Theory of conjoint measurement
    The theory of conjoint measurement is a general, formal theory of continuous quantity. It was independently discovered by the French economist Gerard Debreu and by the American mathematical psychologist R...

  • Therapeutic effect
    Therapeutic effect
    A therapeutic effect is a consequence of a medical treatment of any kind, the results of which are judged to be desirable and beneficial. This is true whether the result was expected, unexpected, or even an unintended consequence of the treatment...

  • Three-point estimation
    Three-point estimation
    The three-point estimation technique is used in management and information systems applications for the construction of an approximate probability distribution representing the outcome of future events, based on very limited information...
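
    A minimal sketch of the common PERT-style weighting of optimistic (a), most likely (m) and pessimistic (b) estimates (the task durations are invented):

        def pert_estimate(a, m, b):
            mean = (a + 4 * m + b) / 6.0   # weighted mean of the three points
            std = (b - a) / 6.0            # conventional spread estimate
            return mean, std

        print(pert_estimate(2.0, 4.0, 12.0))   # (5.0, 1.666...)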

  • Three-stage least squares
  • Threshold model
    Threshold model
    In mathematical or statistical modelling a threshold model is any model where a threshold value, or set of threshold values, is used to distinguish ranges of values where the behaviour predicted by the model differs in some important way...

  • Thurstone scale
    Thurstone scale
    In psychology, the Thurstone scale was the first formal technique for measuring an attitude. It was developed by Louis Leon Thurstone in 1928, as a means of measuring attitudes towards religion. It is made up of statements about a particular issue, and each statement has a numerical value...

  • Time-frequency analysis
    Time-frequency analysis
    In signal processing, time–frequency analysis comprises those techniques that study a signal in both the time and frequency domains simultaneously, using various time–frequency representations...

  • Time–frequency representation
  • Time reversibility
    Time reversibility
    Time reversibility is an attribute of some stochastic processes and some deterministic processes.If a stochastic process is time reversible, then it is not possible to determine, given the states at a number of points in time after running the stochastic process, which state came first and which...

  • Time series
    Time series
    In statistics, signal processing, econometrics and mathematical finance, a time series is a sequence of data points, measured typically at successive times spaced at uniform time intervals. Examples of time series are the daily closing value of the Dow Jones index or the annual flow volume of the...

  • Time-series regression
  • Time use survey
    Time use survey
    A time use survey is a statistical survey which aims to report data on how, on average, people spend their time. The objective is to identify, classify and quantify the main types of activity that people engage in during a definitive time period, e.g...

  • Time-varying covariate
    Time-varying covariate
    A time-varying covariate is a term used in statistics, particularly in survival analyses. It reflects the phenomenon that a covariate is not necessarily constant through the whole study...

  • Timeline of probability and statistics
    Timeline of probability and statistics
    A timeline of probability and statistics. Before 1600: in the 9th century, Al-Kindi was the first to use statistics to decipher encrypted messages, and developed the first code-breaking algorithm, based on frequency analysis, in the House of Wisdom in Baghdad...

  • TinkerPlots
    TinkerPlots
    TinkerPlots is exploratory data analysis software designed for use by students in grades 4-8. It was designed by Clifford Konold and Craig Miller at the University of Massachusetts Amherst and is published by Key Curriculum Press. It has some similarities with Fathom, and runs on Windows XP or...

     — proprietary software for schools
  • Tobit model
    Tobit model
    The Tobit model is a statistical model proposed by James Tobin to describe the relationship between a non-negative dependent variable y_i and an independent variable x_i....

  • Tolerance interval
    Tolerance interval
    A tolerance interval is a statistical interval within which, with some confidence level, a specified proportion of a population falls.A tolerance interval can be seen as a statistical version of a probability interval. If we knew a population's exact parameters, we would be able to compute a range...

  • Top-coded
    Top-coded
    In econometrics and statistics, a top-coded dataset is one in which values above an upper bound are recorded only as exceeding that bound. This is often done to preserve the anonymity of people participating in the survey....

  • Topic model
    Topic model
    In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. An early topic model was probabilistic latent semantic indexing , created by Thomas Hofmann in 1999...

     (statistical natural language processing)
  • Topological data analysis
    Topological data analysis
    Topological data analysis is a new area of study with intended applications in areas such as data mining and computer vision. The main problems include how one infers high-dimensional structure from low-dimensional representations, and...

  • Tornqvist index
    Tornqvist index
    In economics the Törnqvist index is a price or quantity index. Using price and quantity data, a Tornqvist index is a discrete approximation to a continuous Divisia index. A Divisia index is a weighted sum of the growth rates of the various components, where the weights are the component's shares in...

  • Total correlation
    Total correlation
    In probability theory and in particular in information theory, total correlation is one of several generalizations of the mutual information. It is also known as the multivariate constraint or multiinformation...

  • Total least squares
  • Total sum of squares
    Total sum of squares
    In statistical data analysis the total sum of squares is a quantity that appears as part of a standard way of presenting results of such analyses...

  • Total variation distance — a statistical distance measure
  • TPL Tables
    TPL Tables
    TPL Tables is a cross tabulation system used to generate statistical tables for analysis or publication.- Background / History :TPL Tables has its roots in the Table Producing Language system, developed at the Bureau of Labor Statistics in the 1970s and early 1980s to run on IBM mainframes. It...

      – software
  • Tracy–Widom distribution
    Tracy–Widom distribution
    The Tracy–Widom distribution, introduced by Craig Tracy and Harold Widom, is the probability distribution of the largest eigenvalue of a random Hermitian matrix in the edge scaling limit. It also appears in the distribution of the length of the longest increasing subsequence of random permutations and in current fluctuations...

  • Traffic equations
    Traffic equations
    In queueing theory, a discipline within the mathematical theory of probability, traffic equations are equations that describe the mean arrival rate of traffic, allowing the arrival rates at individual nodes to be determined...

  • Training set
    Training set
    A training set is a set of data used in various areas of information science to discover potentially predictive relationships. Training sets are used in artificial intelligence, machine learning, genetic programming, intelligent systems, and statistics...

  • Transect
    Transect
    A transect is a path along which one records and counts occurrences of the phenomena of study. It requires an observer to move along a fixed path and to count occurrences along the path and, at the same time, obtain the distance of the object from the path...

  • Transferable belief model
    Transferable belief model
    The transferable belief model is an elaboration on the Dempster-Shafer theory of evidence.-Context:Consider the following classical problem of information fusion. A patient has an illness that can be caused by three different factors A, B and C...

  • Transiogram
    Transiogram
    A transiogram is the accompanying spatial correlation measure of Markov chain random fields and an important part of Markov chain geostatistics. It is defined as a transition probability function over the distance lag. Simply put, a transiogram is a transition probability diagram. Transiograms...

  • Transmission risks and rates
    Transmission risks and rates
    Transmission of an infection requires three conditions: an infectious individual, a susceptible individual, and an effective contact between them. An effective contact is defined as any kind of contact between two individuals such that, if one individual is infectious and the other susceptible, then the...

  • Treatment group
  • Trend analysis
    Trend analysis
    Trend analysis is the practice of collecting information and attempting to spot a pattern, or trend, in the information. In some fields of study, the term "trend analysis" has more formally defined meanings....

  • Trend estimation
    Trend estimation
    Trend estimation is a statistical technique to aid interpretation of data. When a series of measurements of a process are treated as a time series, trend estimation can be used to make and justify statements about tendencies in the data...

  • Trend stationary
  • Treynor ratio
    Treynor ratio
    The Treynor ratio, named after Jack L. Treynor, is a measurement of the returns earned in excess of those that could have been earned on an investment that has no diversifiable risk, per unit of market risk assumed. The Treynor ratio relates...

  • Triangular distribution
  • Trimean
    Trimean
    In statistics the trimean, or Tukey's trimean, is a measure of a probability distribution's location defined as a weighted average of the distribution's median and its two quartiles, TM = (Q1 + 2Q2 + Q3)/4. This is equivalent to the average of the median and the midhinge...
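
    A minimal sketch of that weighted average, computed from sample quartiles (the data are illustrative; quartile conventions differ between packages):

        import numpy as np

        data = np.array([1, 3, 4, 5, 6, 7, 9, 11, 15], dtype=float)
        q1, q2, q3 = np.percentile(data, [25, 50, 75])
        print((q1 + 2 * q2 + q3) / 4)   # trimean: weighted average of the median and the two quartiles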

  • Trimmed estimator
    Trimmed estimator
    Given an estimator, a trimmed estimator is obtained by excluding some of the extreme values. This is generally done to obtain a more robust statistic: the extreme values are considered outliers....

  • Trispectrum
    Trispectrum
    In mathematics, in the area of statistical analysis, the trispectrum is a statistic used to search for nonlinear interactions. The Fourier transform of the second-order cumulant, i.e., the autocorrelation function, is the traditional power spectrum...

  • True experiment
    True experiment
    A true experiment is a method of social research in which there are two kinds of variables. The independent variable is manipulated by the experimenter, and the dependent variable is measured...

  • True variance
  • Truncated distribution
    Truncated distribution
    In statistics, a truncated distribution is a conditional distribution that results from restricting the domain of some other probability distribution. Truncated distributions arise in practical statistics in cases where the ability to record, or even to know about, occurrences is limited to values...

  • Truncated mean
    Truncated mean
    A truncated mean or trimmed mean is a statistical measure of central tendency, much like the mean and median. It involves the calculation of the mean after discarding given parts of a probability distribution or sample at the high and low end, and typically discarding an equal amount of both.For...
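
    A minimal sketch of a 10%-per-tail trimmed mean using NumPy only (the data, with one extreme value, are illustrative):

        import numpy as np

        def trimmed_mean(values, proportion=0.1):
            values = np.sort(np.asarray(values, dtype=float))
            k = int(proportion * len(values))            # number of observations cut from each end
            return values[k:len(values) - k].mean()

        data = [2, 3, 3, 4, 4, 5, 5, 6, 7, 100]          # one extreme value
        print(np.mean(data), trimmed_mean(data, 0.1))    # 13.9 versus 4.625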

  • Truncated normal distribution
    Truncated normal distribution
    In probability and statistics, the truncated normal distribution is the probability distribution of a normally distributed random variable whose value is either bounded below or above. The truncated normal distribution has wide applications in statistics and econometrics...

  • Truncated regression model
    Truncated regression model
    Truncated regression models arise in many applications of statistics, for example in econometrics, in cases where observations with values in the outcome variable below or above certain thresholds are systematically excluded from the sample...

  • Truncation (statistics)
    Truncation (statistics)
    In statistics, truncation results in values that are limited above or below, resulting in a truncated sample. Truncation is similar to but distinct from the concept of statistical censoring. A truncated sample can be thought of as being equivalent to an underlying sample with all values outside the...

  • Tsallis distribution
    Tsallis distribution
    In q-analog theory and statistical mechanics, a Tsallis distribution is a probability distribution derived from the maximization of the Tsallis entropy under appropriate constraints. There are several different families of Tsallis distributions, yet different sources may reference an individual...

  • Tsallis statistics
    Tsallis statistics
    The term Tsallis statistics usually refers to the collection of q-analogs of mathematical functions and associated probability distributions that were originated by Constantino Tsallis. Using these tools, it is possible to derive Tsallis distributions from the optimization of the Tsallis entropic...

  • Tschuprow's T
    Tschuprow's T
    In statistics, Tschuprow's T is a measure of association between two nominal variables, giving a value between 0 and 1. It is closely related to Cramér's V, coinciding with it for square contingency tables....
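
    A minimal sketch computing Tschuprow's T from a contingency table via the chi-squared statistic (the counts are invented):

        import numpy as np

        table = np.array([[30.0, 10.0],
                          [20.0, 40.0]])                  # observed counts
        n = table.sum()
        expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
        chi2 = ((table - expected) ** 2 / expected).sum()
        r, c = table.shape
        T = np.sqrt(chi2 / (n * np.sqrt((r - 1) * (c - 1))))
        print(T)   # for a square table this equals Cramér's V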

  • Tucker decomposition
    Tucker decomposition
    In mathematics, Tucker decomposition decomposes a tensor into a set of matrices and one small core tensor. It is named after Ledyard R. Tucker, although it goes back to Hitchcock in 1927....

  • Tukey's range test — multiple comparisons
  • Tukey's test of additivity
    Tukey's test of additivity
    In statistics, Tukey's test of additivity, named for John Tukey, is an approach used in two-way ANOVA to assess whether the factor variables are additively related to the expected value of the response variable...

     — interaction in two-way ANOVA
  • Tukey–Kramer method
  • Tukey lambda distribution
  • Tweedie distributions
    Tweedie distributions
    In probability and statistics, the Tweedie distributions are a family of probability distributions which include continuous distributions such as the normal and gamma, the purely discrete scaled Poisson distribution, and the class of mixed compound Poisson-Gamma distributions which have positive...

  • Twisting properties
  • Two stage least squares — redirects to Instrumental variable
    Instrumental variable
    In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables is used to estimate causal relationships when controlled experiments are not feasible....

  • Two-tailed test
    Two-tailed test
    The two-tailed test is a statistical test used in inference, in which a given statistical hypothesis, H0, will be rejected when the value of the test statistic is either sufficiently small or sufficiently large...

  • Type I and type II errors
    Type I and type II errors
    In statistical test theory the notion of statistical error is an integral part of hypothesis testing. The test requires an unambiguous statement of a null hypothesis, which usually corresponds to a default "state of nature", for example "this person is healthy", "this accused is not guilty" or...

  • Type-1 Gumbel distribution
  • Type-2 Gumbel distribution
  • Tyranny of averages
    Tyranny of averages
    The tyranny of averages is a phrase used in applied statistics to describe the often overlooked fact that the mean does not provide any information about the distribution of a data set or skewness, and that decisions or analysis based on this value—as opposed to median and standard deviation—may be...


U

  • u-chart
  • U-quadratic distribution
    U-quadratic distribution
    In probability theory and statistics, the U-quadratic distribution is a continuous probability distribution defined by a unique quadratic function with lower limit a and upper limit b.-Parameter relations:...

  • U-statistic
    U-statistic
    In statistical theory, a U-statistic is a class of statistics that is especially important in estimation theory. In elementary statistics, U-statistics arise naturally in producing minimum-variance unbiased estimators...

  • U test
  • Umbrella sampling
    Umbrella sampling
    Umbrella sampling is a technique in computational physics and chemistry, used to improve sampling of a system where ergodicity is hindered by the form of the system's energy landscape. It was first suggested by Torrie and Valleau in 1977...

  • Unbiased estimator—see bias (statistics)
    Bias (statistics)
    A statistic is biased if it is calculated in such a way that it is systematically different from the population parameter of interest. The following lists some types of, or aspects of, bias which should not be considered mutually exclusive:...

  • Unbiased estimation of standard deviation
    Unbiased estimation of standard deviation
    The question of unbiased estimation of a standard deviation arises in statistics mainly as question in statistical theory. Except in some important situations, outlined later, the task has little relevance to applications of statistics since its need is avoided by standard procedures, such as the...

  • Uncertainty
    Uncertainty
    Uncertainty is a term used in subtly different ways in a number of fields, including physics, philosophy, statistics, economics, finance, insurance, psychology, sociology, engineering, and information science...

  • Uncertainty coefficient
    Uncertainty coefficient
    In statistics, the uncertainty coefficient, also called entropy coefficient or Theil's U, is a measure of nominal association. It was first introduced by Henri Theil and is based on the concept of information entropy. Suppose we have samples of two random variables, i and j...

  • Uncertainty quantification
    Uncertainty quantification
    Uncertainty quantification is the science of quantitative characterization and reduction of uncertainties in applications. It tries to determine how likely certain outcomes are if some aspects of the system are not exactly known...

  • Uncomfortable science
    Uncomfortable science
    Uncomfortable science is the term coined by statistician John Tukey for cases in which there is a need to draw an inference from a limited sample of data, where further samples influenced by the same cause system will not be available...

  • Uncorrelated
    Uncorrelated
    In probability theory and statistics, two real-valued random variables are said to be uncorrelated if their covariance is zero. Uncorrelatedness is by definition pairwise; i.e...

  • Underdispersion — redirects to Overdispersion
    Overdispersion
    In statistics, overdispersion is the presence of greater variability in a data set than would be expected based on a given simple statistical model....

  • Unexplained variation — redirects to Explained variation
    Explained variation
    In statistics, explained variation or explained randomness measures the proportion to which a mathematical model accounts for the variation of a given data set...

  • Underprivileged area score
    Underprivileged area score
    The Underprivileged Area Score is an index to measure socio-economic variation across small geographical areas. The score is an outcome of the need, identified in the Acheson Committee Report, to create an index to identify 'underprivileged areas' where there were high numbers of patients and...

  • Uniform distribution (continuous)
    Uniform distribution (continuous)
    In probability theory and statistics, the continuous uniform distribution or rectangular distribution is a family of probability distributions such that for each member of the family, all intervals of the same length on the distribution's support are equally probable. The support is defined by...

  • Uniform distribution (discrete)
  • Uniformly most powerful test
    Uniformly most powerful test
    In statistical hypothesis testing, a uniformly most powerful test is a hypothesis test which has the greatest power 1 − β among all possible tests of a given size α...

  • Unimodal distribution redirects to Unimodal function (has some stats context)
  • Unimodality
    Unimodality
    Unimodality is a term used in several contexts in mathematics. Originally, it relates to possessing a unique mode.- Unimodal probability distribution :...

  • Unistat
    Unistat
    The Unistat computer program is a statistical data analysis tool featuring two modes of operation: The stand-alone user interface is a complete workbench for data input, analysis and visualization while the Microsoft Excel add-in mode extends the features of the mainstream spreadsheet application...

     – software
  • Unit (statistics)
  • Unit of observation
    Unit of observation
    The unit of observation is the unit on which one collects data. For example, a study may have a unit of observation at the individual level but may have the unit of analysis at the neighborhood level, drawing conclusions on neighborhood characteristics from data collected from individuals....

  • Unit root
    Unit root
    In time series models in econometrics, a unit root is a feature of processes that evolve through time that can cause problems in statistical inference if it is not adequately dealt with....

  • Unit root test
    Unit root test
    In statistics, a unit root test tests whether a time series variable is non-stationary using an autoregressive model. A well-known test that is valid in large samples is the augmented Dickey–Fuller test. The optimal finite sample tests for a unit root in autoregressive models were developed by John...

  • Unit-weighted regression
    Unit-weighted regression
    In statistics, unit-weighted regression is perhaps the easiest form of multiple regression analysis, a method in which two or more variables are used to predict the value of an outcome....

  • Unitized risk
  • Univariate
    Univariate
    In mathematics, univariate refers to an expression, equation, function or polynomial of only one variable. Objects of any of these types but involving more than one variable may be called multivariate...

  • Univariate analysis
    Univariate analysis
    Univariate analysis is the simplest form of quantitative analysis. The analysis is carried out with the description of a single variable and its attributes of the applicable unit of analysis...

  • Univariate distribution
    Univariate distribution
    In statistics, a univariate distribution is a probability distribution of only one random variable. This is in contrast to a multivariate distribution, the probability distribution of a random vector.-Further reading:...

  • Unmatched count
    Unmatched count
    In psychology and social research, unmatched count, or item count, is a technique to improve through anonymity the number of true answers to possibly embarrassing or self-incriminating questions. It is very simple to use but yields only the number of people bearing the property of interest.- Method...

  • Unsolved problems in statistics
    Unsolved problems in statistics
    There are many longstanding unsolved problems in mathematics for which a solution has not yet been found. The unsolved problems in statistics are generally of a different flavor; according to John Tukey, "difficulties in identifying problems have delayed statistics far more than difficulties...

  • Upper and lower probabilities
    Upper and lower probabilities
    Upper and lower probabilities are representations of imprecise probability. Whereas probability theory uses a single number, the probability, to describe how likely an event is to occur, this method uses two numbers: the upper probability of the event and the lower probability of the event.Because...

  • Upside potential ratio
    Upside potential ratio
    The upside-potential ratio is a measure of the return of an investment asset relative to the minimal acceptable return. The measurement allows a firm or individual to choose investments which have had relatively good upside performance, per unit of downside risk....

     – finance
  • Urn problem
    Urn problem
    In probability and statistics, an urn problem is an idealized mental exercise in which some objects of real interest are represented as colored balls in an urn or other container....

  • Ursell function
    Ursell function
    In statistical mechanics, an Ursell function, or connected correlation function, is a cumulant of a random variable. It is called a connected correlation function because it can often be obtained by summing over...

  • Utility maximization problem
    Utility maximization problem
    In microeconomics, the utility maximization problem is the problem consumers face: "how should I spend my money in order to maximize my utility?" It is a type of optimal decision problem.-Basic setup:...

  • Utilization
    Utilization
    Utilization is a statistical concept as well as a primary business measure for the rental industry.-Queueing theory:In queueing theory, utilization is the proportion of the system's resources which is used by the traffic which arrives at it. It should be strictly less than one for the system to...

  • Utilization distribution
    Utilization distribution
    A utilization distribution is a probability distribution constructed from data providing the location of an individual in space at different points in time....


V

  • Validity (statistics)
    Validity (statistics)
    In science and statistics, validity has no single agreed definition but generally refers to the extent to which a concept, conclusion or measurement is well-founded and corresponds accurately to the real world. The word "valid" is derived from the Latin validus, meaning strong...

  • Van der Waerden test
    Van der Waerden test
    Named for the Dutch mathematician Bartel Leendert van der Waerden, the Van der Waerden test is a statistical test of the hypothesis that k population distribution functions are equal. The Van der Waerden test converts the ranks from a standard Kruskal-Wallis one-way analysis of variance to quantiles of the standard...

  • Van Houtum distribution
    Van Houtum distribution
    In probability theory and statistics, the Van Houtum distribution is a discrete probability distribution named after prof. Geert-Jan van Houtum. It can be characterized by saying that all values of a finite set of possible values are equally probable, except for the smallest and largest element of...

  • Vapnik–Chervonenkis theory
  • Varadhan's lemma
    Varadhan's lemma
    In mathematics, Varadhan's lemma is a result in large deviations theory named after S. R. Srinivasa Varadhan. The result gives information on the asymptotic distribution of a statistic φ of a family of random variables Zε as ε becomes small in terms of a rate function for the variables.-Statement...

  • Variable
    Variable (mathematics)
    In mathematics, a variable is a value that may change within the scope of a given problem or set of operations. In contrast, a constant is a value that remains unchanged, though often unknown or undetermined. The concepts of constants and variables are fundamental to many areas of mathematics and...

  • Variable kernel density estimation
    Variable kernel density estimation
    In statistics, adaptive or "variable-bandwidth" kernel density estimation is a form of kernel density estimation in which the size of the kernels used in the estimate is varied...

  • Variable-order Bayesian network
    Variable-order Bayesian network
    Variable-order Bayesian network models provide an important extension of both the Bayesian network models and the variable-order Markov models...

  • Variable-order Markov model
    Variable-order Markov model
    Variable-order Markov models are an important class of models that extend the well known Markov chain models. In contrast to the Markov chain models, where each random variable in a sequence with a Markov property depends on a fixed number of random variables, in VOM models this number of...

  • Variable rules analysis
  • Variance
    Variance
    In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean. In particular, the variance is one of the moments of a distribution...

  • Variance decomposition
    Variance decomposition
    Variance decomposition or forecast error variance decomposition indicates the amount of information each variable contributes to the other variables in a vector autoregression model...

  • Variance gamma process
    Variance gamma process
    In the theory of stochastic processes, a part of the mathematical theory of probability, the variance gamma process, also known as Laplace motion, is a Lévy process determined by a random time change. The process has finite moments, distinguishing it from many Lévy processes. There is no diffusion...

  • Variance inflation factor
    Variance inflation factor
    In statistics, the variance inflation factor quantifies the severity of multicollinearity in an ordinary least squares regression analysis...
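    A minimal sketch of the definition under illustrative assumptions (the helper name vif and the toy data are not from any particular library): regress each predictor on the remaining ones and report 1/(1 − R²).

      import numpy as np

      def vif(X, j):
          # Variance inflation factor of column j of a design matrix X with no constant
          # column: regress X[:, j] on the other columns (plus an intercept) and
          # return 1 / (1 - R^2).
          y = X[:, j]
          Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
          beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
          r2 = 1.0 - ((y - Z @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
          return 1.0 / (1.0 - r2)

      rng = np.random.default_rng(0)
      x1 = rng.normal(size=200)
      x2 = x1 + 0.1 * rng.normal(size=200)      # nearly collinear with x1
      x3 = rng.normal(size=200)
      X = np.column_stack([x1, x2, x3])
      print([round(vif(X, j), 1) for j in range(3)])   # large VIFs flag x1 and x2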

  • Variance-gamma distribution
  • Variance reduction
    Variance reduction
    In mathematics, more specifically in the theory of Monte Carlo methods, variance reduction is a procedure used to increase the precision of the estimates that can be obtained for a given number of iterations. Every output random variable from the simulation is associated with a variance which...

  • Variance-stabilizing transformation
    Variance-stabilizing transformation
    In applied statistics, a variance-stabilizing transformation is a data transformation that is specifically chosen either to simplify considerations in graphical exploratory data analysis or to allow the application of simple regression-based or analysis of variance techniques. The aim behind the...
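    As an illustrative sketch (not drawn from the article), the square-root transform applied to Poisson counts: the raw variance grows with the mean, while the variance of √X stays roughly constant, near 1/4 for large means.

      import numpy as np

      rng = np.random.default_rng(0)
      for lam in (5, 20, 100):
          x = rng.poisson(lam, size=100_000)
          # Raw variance tracks the mean; sqrt(x) has variance close to 0.25 throughout.
          print(lam, round(x.var(), 2), round(np.sqrt(x).var(), 3))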

  • Variance-to-mean ratio
  • Variation ratio
  • Variational Bayesian methods
  • Variational message passing
    Variational message passing
    Variational message passing is an approximate inference technique for continuous- or discrete-valued Bayesian networks, with conjugate-exponential parents, developed by John Winn...

  • Variogram
    Variogram
    In spatial statistics the theoretical variogram 2\gamma is a function describing the degree of spatial dependence of a spatial random field or stochastic process Z...
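    A sketch of the classical empirical (Matheron) estimator of the semivariogram γ(h), binned by pair separation; the function name, binning scheme, and simulated field below are illustrative assumptions.

      import numpy as np

      def empirical_semivariogram(coords, values, bins):
          # gamma(h) = 1/(2 N(h)) * sum of (z_i - z_j)^2 over point pairs whose
          # separation falls inside each distance bin.
          d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
          sq = (values[:, None] - values[None, :]) ** 2
          iu = np.triu_indices(len(values), k=1)          # count each pair once
          d, sq = d[iu], sq[iu]
          gamma = np.empty(len(bins) - 1)
          for k in range(len(bins) - 1):
              m = (d >= bins[k]) & (d < bins[k + 1])
              gamma[k] = 0.5 * sq[m].mean() if m.any() else np.nan
          return gamma

      rng = np.random.default_rng(0)
      pts = rng.uniform(0, 10, size=(200, 2))
      z = np.sin(pts[:, 0]) + 0.1 * rng.normal(size=200)   # spatially structured field
      print(empirical_semivariogram(pts, z, bins=np.linspace(0, 5, 6)))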

  • Varimax rotation
    Varimax rotation
    In statistics, a varimax rotation is a change of coordinates used in principal component analysis and factor analysis that maximizes the sum of the variances of the squared loadings...

  • Vasicek model
    Vasicek model
    In finance, the Vasicek model is a mathematical model describing the evolution of interest rates. It is a type of "one-factor model" as it describes interest rate movements as driven by only one source of market risk...

  • VC dimension
    VC dimension
    In statistical learning theory, or sometimes computational learning theory, the VC dimension is a measure of the capacity of a statistical classification algorithm, defined as the cardinality of the largest set of points that the algorithm can shatter...

  • VC theory
  • Vector autoregression
    Vector autoregression
    Vector autoregression is a statistical model used to capture the linear interdependencies among multiple time series. VAR models generalize the univariate autoregression models. All the variables in a VAR are treated symmetrically; each variable has an equation explaining its evolution based on...
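    A hedged example of fitting a small VAR with statsmodels (assuming statsmodels is installed; the two weakly coupled simulated series are purely illustrative):

      import numpy as np
      from statsmodels.tsa.api import VAR

      rng = np.random.default_rng(0)
      y = np.zeros((200, 2))                    # two series stacked as columns
      for t in range(1, 200):
          y[t, 0] = 0.5 * y[t-1, 0] + 0.2 * y[t-1, 1] + rng.normal()
          y[t, 1] = 0.3 * y[t-1, 1] + rng.normal()

      results = VAR(y).fit(maxlags=2, ic="aic")  # lag order chosen by AIC
      print(results.summary())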

  • VEGAS algorithm
    VEGAS algorithm
    The VEGAS algorithm, due to G. P. Lepage, is a method for reducing error in Monte Carlo simulations by using a known or approximate probability distribution function to concentrate the search in those areas of the graph that make the greatest contribution to the final integral. The VEGAS algorithm...

  • Violin plot
    Violin plot
    Violin plots are a method of plotting numeric data. A violin plot is a combination of a box plot and a kernel density plot. Specifically, it starts with a box plot...

  • ViSta - Software — redirects to ViSta, The Visual Statistics system
    ViSta, The Visual Statistics system
    ViSta, the Visual Statistics system, is a freeware statistical system developed by Forrest W. Young of the University of North Carolina. The current version of ViSta is maintained by Pedro M. Valero-Mora of the University of Valencia and can be found at...

  • Voigt profile
  • Volatility (finance)
    Volatility (finance)
    In finance, volatility is a measure of the variation of the price of a financial instrument over time. Historic volatility is derived from time series of past market prices...

  • Volcano plot (statistics)
    Volcano plot (statistics)
    In statistics, a volcano plot is a type of scatter-plot that is used to quickly identify changes in large datasets composed of replicate data. It plots significance versus fold-change on the y- and x-axes, respectively...

  • Von Mises distribution
  • Von Mises–Fisher distribution
  • V-optimal histograms
  • V-statistic
    V-statistic
    V-statistics are a class of statistics named for Richard von Mises who developed their asymptotic distribution theory in a fundamental paper in 1947. V-statistics are closely related to U-statistics introduced by Wassily Hoeffding in 1948...

  • Vuong's closeness test
  • Vysochanskiï–Petunin inequality

W

  • Wald distribution redirects to Inverse Gaussian distribution
    Inverse Gaussian distribution
    Its cumulative distribution function is F(x;\mu,\lambda) = \Phi\left(\sqrt{\lambda/x}\,(x/\mu - 1)\right) + \exp(2\lambda/\mu)\,\Phi\left(-\sqrt{\lambda/x}\,(x/\mu + 1)\right), where \Phi is the standard normal cdf...

  • Wald test
    Wald test
    The Wald test is a parametric statistical test named after Abraham Wald with a great variety of uses. Whenever a relationship within or between data items can be expressed as a statistical model with parameters to be estimated from a sample, the Wald test can be used to test the true value of the...

  • Wald's decision theory
    Wald's decision theory
    Wald's decision theory was explicated in his last book, "Statistical decision functions"...

  • Wald–Wolfowitz runs test
  • Wallenius' noncentral hypergeometric distribution
    Wallenius' noncentral hypergeometric distribution
    In probability theory and statistics, Wallenius' noncentral hypergeometric distribution is a generalization of the hypergeometric distribution where items are sampled with bias....

  • Wang and Landau algorithm
    Wang and Landau algorithm
    The Wang and Landau algorithm proposed by Fugao Wang and David P. Landau is an extension of Metropolis Monte Carlo sampling. It is designed to calculate the density of states of a computer-simulated system, such as an Ising model of spin glasses, or model atoms in a molecular force field...

  • Watterson estimator
    Watterson estimator
    In population genetics, the Watterson estimator is a method for estimating the population mutation rate, \theta = 4N_e\mu, where N_e is the effective population size and \mu is the per-generation mutation rate of the population of interest...
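    A minimal sketch of the estimator itself: \theta_W = K / a_n, where K is the number of segregating (polymorphic) sites observed in a sample of n sequences and a_n = \sum_{i=1}^{n-1} 1/i; the example counts are made up.

      def watterson_theta(n_sequences, n_segregating_sites):
          # theta_W = K / a_n with a_n = sum_{i=1}^{n-1} 1/i
          a_n = sum(1.0 / i for i in range(1, n_sequences))
          return n_segregating_sites / a_n

      print(watterson_theta(n_sequences=10, n_segregating_sites=15))   # about 5.3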

  • Watts and Strogatz model
    Watts and Strogatz model
    The Watts and Strogatz model is a random graph generation model that produces graphs with small-world properties, including short average path lengths and high clustering. It was proposed by Duncan J. Watts and Steven Strogatz in their joint 1998 Nature paper...
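    A short illustration using networkx's built-in generator (assuming networkx is available; the parameter values are arbitrary):

      import networkx as nx

      # n nodes on a ring, each joined to its k nearest neighbours, with each edge
      # rewired to a random target with probability p; this variant retries until
      # the resulting graph is connected.
      G = nx.connected_watts_strogatz_graph(n=1000, k=10, p=0.05, seed=42)
      print(nx.average_shortest_path_length(G))   # short average paths (small-world)
      print(nx.average_clustering(G))             # clustering stays relatively high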

  • Weibull chart — presently redirects to Weibull distribution
  • Weibull distribution
  • Weibull modulus
    Weibull modulus
    The Weibull modulus is a dimensionless parameter of the Weibull distribution which is used to describe variability in measured material strength of brittle materials. For ceramics and other brittle materials, the maximum stress that a sample can be measured to withstand before failure may vary from...

  • Weight function
    Weight function
    A weight function is a mathematical device used when performing a sum, integral, or average in order to give some elements more "weight" or influence on the result than other elements in the same set. They occur frequently in statistics and analysis, and are closely related to the concept of a...

  • Weighted sample redirects to Sample mean and sample covariance
    Sample mean and sample covariance
    The sample mean or empirical mean and the sample covariance are statistics computed from a collection of data on one or more random variables. The sample mean is a vector each of whose elements is the sample mean of one of the random variables; that is, each of whose elements is the average of the...

  • Weighted covariance matrix redirects to Sample mean and sample covariance
    Sample mean and sample covariance
    The sample mean or empirical mean and the sample covariance are statistics computed from a collection of data on one or more random variables. The sample mean is a vector each of whose elements is the sample mean of one of the random variables; that is, each of whose elements is the average of the...

  • Weighted mean
    Weighted mean
    The weighted mean is similar to an arithmetic mean, where instead of each of the data points contributing equally to the final average, some data points contribute more than others...
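    A one-line illustration with NumPy, whose np.average accepts a weights argument (the grades and weights are invented):

      import numpy as np

      grades  = np.array([80.0, 90.0, 70.0])
      weights = np.array([0.2, 0.5, 0.3])          # each point's relative influence
      print(np.average(grades, weights=weights))   # 0.2*80 + 0.5*90 + 0.3*70 = 82.0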

  • Welch's t test
    Welch's t test
    In statistics, Welch's t test is an adaptation of Student's t-test intended for use with two samples having possibly unequal variances. As such, it is an approximate solution to the Behrens–Fisher problem....
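    A brief sketch, both via the textbook statistic and via SciPy's equal_var=False option (the sample data is invented):

      import numpy as np
      from scipy import stats

      a = np.array([27.5, 21.0, 19.0, 23.6, 17.0, 17.9, 16.9, 20.1, 21.9, 22.6])
      b = np.array([27.1, 22.0, 20.8, 23.4, 23.4, 23.5, 25.8, 22.0, 24.8, 20.2])

      # Welch statistic: t = (mean_a - mean_b) / sqrt(s_a^2/n_a + s_b^2/n_b)
      se2 = a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b)
      t_manual = (a.mean() - b.mean()) / np.sqrt(se2)

      t_scipy, p = stats.ttest_ind(a, b, equal_var=False)   # Welch's version
      print(round(t_manual, 3), round(t_scipy, 3), round(p, 3))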

  • Welch–Satterthwaite equation
  • Well-behaved statistic
    Well-behaved statistic
    A well-behaved statistic is a term sometimes used in the theory of statistics to describe part of a procedure. This usage is broadly similar to the use of well-behaved in more general mathematics...

  • Wick product
    Wick product
    In probability theory, the Wick product \langle X_1,\dots,X_k \rangle, named after physicist Gian-Carlo Wick, is a sort of product of the random variables X_1, ..., X_k, defined recursively as follows: \langle \rangle = 1...

  • Wilks' lambda distribution
    Wilks' lambda distribution
    In statistics, Wilks' lambda distribution is a probability distribution used in multivariate hypothesis testing, especially with regard to the likelihood-ratio test and multivariate analysis of variance...

  • Winsorized mean
    Winsorized mean
    A Winsorized mean is a Winsorized statistical measure of central tendency, much like the mean and median, and even more similar to the truncated mean...

  • Whipple's index
    Whipple's Index
    Survey or census respondents sometimes inaccurately report ages or dates of birth. Whipple's index, invented by the American demographer George Chandler Whipple, indicates the extent to which age data show systematic heaping on certain ages as a result of digit preference or rounding...

  • White test
    White test
    In statistics, the White test is a statistical test that establishes whether the residual variance of a variable in a regression model is constant; that is, it tests for homoscedasticity....
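    A hedged example using statsmodels' het_white helper, which runs the auxiliary regression of squared residuals on the regressors, their squares, and cross-products (assuming statsmodels is installed; the simulated heteroscedastic data is illustrative):

      import numpy as np
      import statsmodels.api as sm
      from statsmodels.stats.diagnostic import het_white

      rng = np.random.default_rng(0)
      x = rng.uniform(1, 10, size=300)
      y = 2 + 0.5 * x + rng.normal(scale=x)        # error variance grows with x
      X = sm.add_constant(x)
      resid = sm.OLS(y, X).fit().resid

      lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(resid, X)
      print(round(lm_pvalue, 4))                   # small p-value: heteroscedasticity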

  • White noise
    White noise
    White noise is a random signal with a flat power spectral density. In other words, the signal contains equal power within a fixed bandwidth at any center frequency...

  • Wide and narrow data
    Wide and narrow data
    Wide and narrow are terms used to describe two different presentations for tabular data. Wide, or unstacked, data is presented with each different data variable in a separate column. Narrow...

  • Wiener deconvolution
    Wiener deconvolution
    In mathematics, Wiener deconvolution is an application of the Wiener filter to the noise problems inherent in deconvolution. It works in the frequency domain, attempting to minimize the impact of deconvoluted noise at frequencies which have a poor signal-to-noise ratio.The Wiener deconvolution...

  • Wiener filter
    Wiener filter
    In signal processing, the Wiener filter is a filter proposed by Norbert Wiener during the 1940s and published in 1949. Its purpose is to reduce the amount of noise present in a signal by comparison with an estimation of the desired noiseless signal. The discrete-time equivalent of Wiener's work was...

  • Wiener process
    Wiener process
    In mathematics, the Wiener process is a continuous-time stochastic process named in honor of Norbert Wiener. It is often called standard Brownian motion, after Robert Brown...
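    A sketch of simulating a standard Wiener process on a time grid: independent Gaussian increments with variance equal to the time step, accumulated by a cumulative sum (grid size and seed are arbitrary choices).

      import numpy as np

      rng = np.random.default_rng(0)
      T, n = 1.0, 1000
      dt = T / n
      increments = rng.normal(0.0, np.sqrt(dt), size=n)    # W(t+dt) - W(t) ~ N(0, dt)
      W = np.concatenate([[0.0], np.cumsum(increments)])   # W(0) = 0
      t = np.linspace(0.0, T, n + 1)                       # time grid, e.g. for plotting
      print(W[-1])   # one draw of W(1); across many simulations W(1) ~ N(0, 1)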

  • Wigner quasi-probability distribution
    Wigner quasi-probability distribution
    The Wigner quasi-probability distribution is a quasi-probability distribution. It was introduced by Eugene Wigner in 1932 to study quantum corrections to classical statistical mechanics...

  • Wigner semicircle distribution
  • Wike's law of low odd primes
    Wike's law of low odd primes
    Wike's law of low odd primes is a methodological principle to help design sound experiments in psychology. It is: "If the number of experimental treatments is a low odd prime number, then the experimental design is unbalanced and partially confounded." Wike's law of low odd primes is a...

  • Wilcoxon signed-rank test
    Wilcoxon signed-rank test
    The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used when comparing two related samples or repeated measurements on a single sample to assess whether their population mean ranks differ....

  • Will Rogers phenomenon
    Will Rogers phenomenon
    The Will Rogers phenomenon is obtained when moving an element from one set to another set raises the average values of both sets. It is based on the following quote, attributed to comedian Will Rogers:...
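    A tiny numerical illustration (numbers invented): moving the value 5, which lies between the two group means, raises the mean of both groups.

      group_a = [5, 6, 7, 8, 9]     # mean 7.0
      group_b = [1, 2, 3, 4]        # mean 2.5
      group_a.remove(5)             # 5 is below A's mean but above B's mean
      group_b.append(5)
      print(sum(group_a) / len(group_a))   # 7.5 -- went up
      print(sum(group_b) / len(group_b))   # 3.0 -- also went up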

  • WinBUGS
    WinBUGS
    WinBUGS is statistical software for Bayesian analysis using Markov chain Monte Carlo methods. It is based on the BUGS project started in 1989...

     – software
  • Window function
    Window function
    In signal processing, a window function is a mathematical function that is zero-valued outside of some chosen interval. For instance, a function that is constant inside the interval and zero elsewhere is called a rectangular window, which describes the shape of its graphical representation...

  • Winpepi
    Winpepi
    WinPepi is a freeware package of statistical programs for epidemiologists, comprising seven programs with over 120 modules. WinPepi is not a complete compendium of statistical routines for epidemiologists but it provides a very wide range of procedures, including those most commonly used and many...

     – software
  • Winsorising
    Winsorising
    Winsorising or Winsorization is the transformation of statistics by limiting extreme values in the statistical data to reduce the effect of possibly spurious outliers. It is named after the engineer-turned-biostatistician Charles P. Winsor...
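    A brief example with scipy.stats.mstats.winsorize (assuming SciPy is available), clipping the lowest and highest 10% of values to the nearest retained values rather than discarding them:

      import numpy as np
      from scipy.stats.mstats import winsorize

      x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 250])    # 250 is a suspect outlier
      w = winsorize(x, limits=[0.1, 0.1])                # replace bottom/top 10%
      print(np.asarray(w))          # [2 2 3 4 5 6 7 8 9 9]
      print(x.mean(), w.mean())     # the Winsorized mean is far less outlier-driven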

  • Wishart distribution
  • Wold's theorem
  • Wombling
    Wombling
    In statistics, wombling is any of a number of techniques used for identifying zones of rapid change, typically in some quantity as it varies across some geographical or Euclidean space. It is named for statistician William H. Womble....

  • World Programming System
    World Programming System
    The World Programming System, also known as WPS, is a software product developed by a company called World Programming. WPS allows users to create, edit and run programs written in the language of SAS. The latest release of WPS closes a significant gap in its coverage: it now provides PROC REG and...

     – software
  • Wrapped Cauchy distribution
  • Wrapped distribution
    Wrapped distribution
    In probability theory and directional statistics, a wrapped probability distribution is a continuous probability distribution that describes data points that lie on a unit n-sphere...

  • Wrapped exponential distribution
  • Wrapped normal distribution
    Wrapped normal distribution
    In probability theory and directional statistics, a wrapped normal distribution is a wrapped probability distribution which results from the "wrapping" of the normal distribution around the unit circle. It finds application in the theory of Brownian motion and is a solution to the heat equation for...

  • Wrapped Lévy distribution
    Wrapped Lévy distribution
    In probability theory and directional statistics, a wrapped Lévy distribution is a wrapped probability distribution that results from the "wrapping" of the Lévy distribution around the unit circle. The pdf of the wrapped Lévy distribution is...

  • Writer invariant
    Writer invariant
    Writer invariant, also called authorial invariant or author's invariant, is a property of a text which is invariant of its author; that is, it will be similar in all texts of a given author and different in texts of different authors. It can be used to detect plagiarism or to discover the real author of...


X

  • X-12-ARIMA
    X-12-ARIMA
    X-12-ARIMA is the U.S. Census Bureau's software package for seasonal adjustment. It can be used together with gretl, which provides a graphical user interface for X-12-ARIMA. X-12-ARIMA is the successor to X-11-ARIMA...

  • x̄ chart
    X-bar chart
    In industrial statistics, the X-bar chart is a type of Shewhart control chart that is used to monitor the arithmetic means of successive samples of constant size, n. This type of control chart is used for characteristics that can be measured on a continuous scale, such as weight, temperature,...
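    A hedged sketch of the chart's center line and 3-sigma limits for subgroup means; the pooled within-subgroup sigma estimate used here is a simplification of the conventional constants-based (A2·R̄ or A3·s̄) limits, and the data is simulated.

      import numpy as np

      def xbar_chart_limits(samples, sigma=None):
          # samples: array of shape (m, n) -- m subgroups of constant size n.
          m, n = samples.shape
          center = samples.mean(axis=1).mean()                       # grand mean
          if sigma is None:
              sigma = np.sqrt(samples.var(axis=1, ddof=1).mean())    # pooled estimate
          half_width = 3 * sigma / np.sqrt(n)                        # 3-sigma limits
          return center - half_width, center, center + half_width

      rng = np.random.default_rng(0)
      data = rng.normal(loc=10.0, scale=0.5, size=(25, 5))   # 25 subgroups of size 5
      print(xbar_chart_limits(data))   # (LCL, center line, UCL)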

  • x̄ and R chart
  • x̄ and s chart
  • XLispStat
    XLispStat
    XLispStat is an open-source statistical scientific package based on the XLISP language. From the XLispStat startup message: XLISP-PLUS version 3.04, Portions Copyright 1988 by David Betz. Modified by Thomas Almy and others....

     – software
  • XLSTAT
    XLSTAT
    XLSTAT is a commercial statistical and multivariate analysis software. The software has been developed by Addinsoft and was introduced by Thierry Fahmy, the founder of Addinsoft, in 1993. It is a Microsoft Excel add-in...

     – software
  • XploRe
    XploRe
    XploRe is the name of a commercial statistics software, developed by the German software company MD*Tech. XploRe is not sold anymore. The last version, 4.8, is available for download at no cost. The user interacts with the software via the XploRe programming language, which is derived from the C...

     – software

Y

  • Yamartino method
    Yamartino method
    The Yamartino method is an algorithm for calculating an approximation to the standard deviation σθ of wind direction θ during a single pass through the incoming data...
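    A sketch of the single-pass computation as usually described (variable names are mine): average sin θ and cos θ over the pass, set ε = sqrt(1 − (s̄² + c̄²)), and take the Yamartino approximation σθ ≈ arcsin(ε)[1 + (2/√3 − 1)ε³].

      import math

      def yamartino_sigma_theta(angles_deg):
          n = len(angles_deg)
          s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
          c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
          eps = math.sqrt(max(0.0, 1.0 - (s * s + c * c)))
          sigma = math.asin(eps) * (1.0 + (2.0 / math.sqrt(3.0) - 1.0) * eps ** 3)
          return math.degrees(sigma)

      print(yamartino_sigma_theta([355, 5, 15, 350, 10]))   # spread around north, ~9 deg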

  • Yates analysis
    Yates Analysis
    Full- and fractional-factorial designs are common in designed experiments for engineering and scientific applications. In these designs, each factor is assigned two levels. These are typically called the low and high levels. For computational purposes, the factors are scaled so that the low level...

  • Yates's correction for continuity
  • Youden's J statistic
    Youden's J statistic
    Youden's J statistic is a single statistic that captures the performance of a diagnostic test. The use of such a single index is "not generally to be recommended". It is equal to the risk difference for a dichotomous test ....
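    As a quick illustration, J = sensitivity + specificity − 1, computed here from a made-up 2×2 confusion table:

      def youdens_j(tp, fn, tn, fp):
          sensitivity = tp / (tp + fn)      # true positive rate
          specificity = tn / (tn + fp)      # true negative rate
          return sensitivity + specificity - 1

      print(youdens_j(tp=90, fn=10, tn=80, fp=20))   # 0.9 + 0.8 - 1, roughly 0.7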

  • Yule–Simon distribution
  • Yxilon
    Yxilon
    Yxilon is a modular open-source statistical programming language, developed by Sigbert Klinke, Uwe Ziegenhagen, and Yuval Guri. It is a re-implementation of the XploRe language, with the intention of providing better performance by using compiled code instead of a language interpreter...

     – statistical programming language

Z

  • z-score
  • z-factor
    Z-factor
    The Z-factor is a measure of statistical effect size. It has been proposed for use in high-throughput screening to judge whether the response in a particular assay is large enough to warrant further attention....

  • z statistic
    Z statistic
    In statistics, the Vuong closeness test is a likelihood-ratio-based test for model selection using the Kullback–Leibler information criterion. This statistic makes probabilistic statements about two models. It tests the null hypothesis that the two models are equally close to the actual model against the...

  • Z-test
    Z-test
    A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution. Due to the central limit theorem, many test statistics are approximately normally distributed for large samples...
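    A one-sample sketch under a known population standard deviation (the numbers are invented): z = (x̄ − μ0)/(σ/√n), with a two-sided p-value from the standard normal CDF.

      import math

      def one_sample_z_test(xbar, mu0, sigma, n):
          z = (xbar - mu0) / (sigma / math.sqrt(n))
          # two-sided p-value via the standard normal CDF, Phi(z) = (1 + erf(z/sqrt(2)))/2
          p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
          return z, p

      print(one_sample_z_test(xbar=102.0, mu0=100.0, sigma=10.0, n=50))   # z ~ 1.41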

  • Zakai equation
    Zakai equation
    In filtering theory the Zakai equation is a linear recursive filtering equation for the un-normalized density of a hidden state. In contrast, the Kushner equation gives a non-linear recursive equation for the normalized density of the hidden state...

  • Zelen's design
  • Zero-one law (disambiguation)
  • Zeta distribution
  • Ziggurat algorithm
    Ziggurat algorithm
    The ziggurat algorithm is an algorithm for pseudo-random number sampling. Belonging to the class of rejection sampling algorithms, it relies on an underlying source of uniformly-distributed random numbers, typically from a pseudo-random number generator, as well as precomputed tables. The...

  • Zipf–Mandelbrot law — a discrete distribution
  • Zipf's law

See also

Supplementary lists
These lists include items which are related to statistics but are not included in this index:

Topic lists

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 