Bias (statistics)
A statistic is biased if it is calculated in such a way that it is systematically different from the population parameter of interest. The following lists some types of, or aspects of, bias, which should not be considered mutually exclusive:
- Selection bias, where individuals or groups are more likely to take part in a research project than others, resulting in biased samples. This can also be termed Berksonian bias.
- Spectrum bias arises from evaluating diagnostic tests on biased patient samples, leading to an overestimate of the sensitivity and specificity of the test.
- The bias of an estimator is the difference between the estimator's expectation and the true value of the parameter being estimated.
- Omitted-variable bias is the bias that appears in estimates of parameters in a regression analysis when the assumed specification is incorrect, in that it omits an independent variable that should be in the model.
- In statistical hypothesis testing, a test is said to be unbiased when the probability of rejecting the null hypothesis is less than or equal to the significance level when the null hypothesis is true, and greater than or equal to the significance level when the alternative hypothesis is true.
- Detection bias is where a phenomenon is more likely to be observed and/or reported for a particular set of study subjects. For instance, the syndemic involving obesity and diabetes may mean doctors are more likely to look for diabetes in obese patients than in less overweight patients, leading to an apparent inflation in diabetes among obese patients because of skewed detection efforts.
- Funding bias may lead to selection of outcomes, test samples, or test procedures that favor a study's financial sponsor.
- Reporting bias involves a skew in the availability of data, such that observations of a certain kind may be more likely to be reported and consequently used in research.
- Data-snooping bias comes from the misuse of data mining techniques.
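
The definition of the bias of an estimator given in the list above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the helper name `simulate_variance_bias`, the normal population, and the values of `n`, `trials`, `sigma` and `seed` are all hypothetical choices, not part of the original text. It compares two estimators of a population variance, one dividing the sum of squared deviations by n and one dividing by n - 1, and approximates each estimator's expectation by averaging over many samples.

```python
import random


def simulate_variance_bias(n=5, trials=20000, sigma=2.0, seed=0):
    """Approximate the bias of two variance estimators by simulation.

    Hypothetical helper for illustration; every default value is an assumption.
    Returns (bias when dividing by n, bias when dividing by n - 1).
    """
    rng = random.Random(seed)
    true_var = sigma ** 2  # the population parameter being estimated
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        # Draw one sample of size n from a normal population with mean 0
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        sum_sq = sum((x - mean) ** 2 for x in sample)
        total_div_n += sum_sq / n
        total_div_n_minus_1 += sum_sq / (n - 1)
    # Bias = (average of the estimator over many samples) - (true value)
    bias_div_n = total_div_n / trials - true_var
    bias_div_n_minus_1 = total_div_n_minus_1 / trials - true_var
    return bias_div_n, bias_div_n_minus_1


bias_div_n, bias_div_n_minus_1 = simulate_variance_bias()
print(f"bias when dividing by n:     {bias_div_n:+.3f}")
print(f"bias when dividing by n - 1: {bias_div_n_minus_1:+.3f}")
```

In theory the estimator that divides by n has bias equal to minus the population variance divided by n (here -4/5), while the one that divides by n - 1 is unbiased, so the first printed value should be close to -0.8 and the second close to zero, up to simulation noise.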