Stratified sampling
In statistics, stratified sampling is a method of sampling from a population.
In statistical surveys, when subpopulations within an overall population vary, it is advantageous to sample each subpopulation (stratum) independently. Stratification is the process of dividing members of the population into homogeneous subgroups before sampling. The strata should be mutually exclusive: every element in the population must be assigned to only one stratum. The strata should also be collectively exhaustive: no population element can be excluded. Then random or systematic sampling is applied within each stratum. This often improves the representativeness of the sample by reducing sampling error. It can produce a weighted mean that has less variability than the arithmetic mean of a simple random sample of the population.
In computational statistics, stratified sampling is a method of variance reduction when Monte Carlo methods are used to estimate population statistics from a known population.
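The claim that a stratified weighted mean can vary less than the mean of a simple random sample can be checked with a small simulation. The sketch below is illustrative only: the two-stratum population, its means and spreads, the sample sizes, and the function names `stratified_mean` and `srs_mean` are all assumptions made for the example, not part of the original text.

```python
import random
import statistics


def stratified_mean(strata, n_per_stratum, rng):
    """Weighted mean built from an independent random sample in each stratum."""
    total = sum(len(units) for units in strata.values())
    estimate = 0.0
    for name, units in strata.items():
        sample = rng.sample(units, n_per_stratum[name])
        weight = len(units) / total  # stratum's share of the population
        estimate += weight * statistics.fmean(sample)
    return estimate


def srs_mean(population, n, rng):
    """Arithmetic mean of a simple random sample of the whole population."""
    return statistics.fmean(rng.sample(population, n))


rng = random.Random(0)

# Hypothetical population: two strata that are homogeneous internally
# but differ strongly from each other.
strata = {
    "A": [rng.gauss(10, 1) for _ in range(600)],
    "B": [rng.gauss(50, 1) for _ in range(400)],
}
population = strata["A"] + strata["B"]
n_per_stratum = {"A": 30, "B": 20}  # proportional to stratum sizes, 50 in total

# Repeat both designs many times and compare the spread of the estimates.
stratified_estimates = [
    stratified_mean(strata, n_per_stratum, rng) for _ in range(1000)
]
srs_estimates = [srs_mean(population, 50, rng) for _ in range(1000)]

var_stratified = statistics.variance(stratified_estimates)
var_srs = statistics.variance(srs_estimates)
print("variance of stratified estimates:", var_stratified)
print("variance of simple random sample estimates:", var_srs)
```

With strata this different from one another, the stratified estimates should cluster far more tightly than the simple random sample estimates, because the between-stratum variation no longer contributes to the sampling error.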
Stratified sampling strategies
- Proportionate allocation uses a sampling fraction in each of the strata that is proportional to that of the total population. For instance, if the population consists of 60% in the male stratum and 40% in the female stratum, then the relative size of the two samples (three males, two females) should reflect this proportion.
- Optimum allocation (or disproportionate allocation): the sampling fraction of each stratum is proportionate to the standard deviation of the distribution of the variable within that stratum. Larger samples are taken in the strata with the greatest variability to generate the least possible sampling variance.
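Optimum allocation can be sketched with the formulation usually called Neyman allocation, in which the sample size for stratum h is proportional to N_h × S_h (stratum size times stratum standard deviation). The source does not name this formula, so treat it as an assumed, standard reading of "optimum allocation"; the strata, their sizes, and their standard deviations below are hypothetical.

```python
def neyman_allocation(sizes, std_devs, n):
    """Split a total sample of n across strata in proportion to N_h * S_h.

    Returns unrounded sample sizes; a real design would still need a
    rounding rule and a check that no stratum is asked for more units
    than it contains.
    """
    products = {h: sizes[h] * std_devs[h] for h in sizes}
    total = sum(products.values())
    return {h: n * products[h] / total for h in sizes}


# Hypothetical strata of equal size but unequal variability.
sizes = {"low_variability": 500, "high_variability": 500}
std_devs = {"low_variability": 2.0, "high_variability": 8.0}

allocation = neyman_allocation(sizes, std_devs, 100)
print(allocation)  # {'low_variability': 20.0, 'high_variability': 80.0}
```

Proportionate allocation would give each of these two strata 50 units; weighting by standard deviation shifts most of the sample to the more variable stratum.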
A real-world example of using stratified sampling would be a political survey. If the respondents needed to reflect the diversity of the population, the researcher would specifically seek to include participants from various minority groups, defined for example by race or religion, in proportion to their share of the total population, as described above. A stratified survey could thus claim to be more representative of the population than a survey based on simple random sampling or systematic sampling.
Similarly, if population density varies greatly within a region, stratified sampling can ensure that estimates are made with equal accuracy in different parts of the region, and that comparisons of sub-regions are made with equal statistical power. For example, in Ontario a survey taken throughout the province might use a larger sampling fraction in the less populated north, since the disparity in population between north and south is so great that a sampling fraction based on the provincial sample as a whole might result in the collection of only a handful of data from the north.
Randomized stratification can also be used to improve population representativeness in a study.
Disadvantages
Stratified sampling is not useful when the population cannot be exhaustively partitioned into disjoint subgroups.
It would be a misapplication of the technique to make subgroups' sample sizes proportional to the amount of data available from the subgroups, rather than scaling sample sizes to subgroup sizes (or to their variances, if these are known to differ significantly, e.g. as shown by an F-test). Data representing each subgroup are taken to be of equal importance if suspected variation among them warrants stratified sampling. If, on the other hand, the variances themselves differ so much among subgroups that the data need to be stratified by variance, the subgroup sample sizes cannot at the same time be made proportional to the subgroups' sizes within the total population. (What is the most efficient way to partition sampling resources among groups that vary in both their means and their variances?)
Practical example
In general the size of the sample in each stratum is taken in proportion to the size of the stratum. This is called proportional allocation. Suppose that in a company there are the following staff:
- male, full time: 90
- male, part time: 18
- female, full time: 9
- female, part time: 63
- Total: 180
and we are asked to take a sample of 40 staff, stratified according to the above categories.
The first step is to find the total number of staff (180) and calculate the percentage in each group.
- % male, full time = 90 / 180 = 50%
- % male, part time = 18 / 180 = 10%
- % female, full time = 9 / 180 = 5%
- % female, part time = 63 / 180 = 35%
This tells us that of our sample of 40,
- 50% should be male, full time: 50% of 40 is 20.
- 10% should be male, part time: 10% of 40 is 4.
- 5% should be female, full time: 5% of 40 is 2.
- 35% should be female, part time: 35% of 40 is 14.
Alternatively, without calculating the percentages, multiply each group size by the sample size and divide by the total population size (the size of the entire staff):
- male, full time = 90 × (40 / 180) = 20
- male, part time = 18 × (40 / 180) = 4
- female, full time = 9 × (40 / 180) = 2
- female, part time = 63 × (40 / 180) = 14
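The same arithmetic can be written as a short function. This is a minimal sketch using the staff figures from the example; the function name `proportional_allocation` is made up for illustration. It returns unrounded values, which happen to be whole numbers here; in general a rounding rule would be needed so that the stratum sample sizes still add up to the intended total.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Allocate sample_size across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    return {
        name: size * sample_size / total
        for name, size in stratum_sizes.items()
    }


# Staff counts from the example above (total 180).
staff = {
    "male, full time": 90,
    "male, part time": 18,
    "female, full time": 9,
    "female, part time": 63,
}

allocation = proportional_allocation(staff, 40)
for name, n in allocation.items():
    print(f"{name}: {n:g}")
```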
See also
- Opinion poll
- Statistical benchmarking
- Stratified sample size