Cluster sampling
Cluster sampling is a sampling technique used when "natural" groupings are evident in a statistical population. It is often used in marketing research. In this technique, the total population is divided into these groups (or clusters) and a sample of the groups is selected. The required information is then collected from the elements within each selected group, either from every element in the group or from a subsample of elements selected within each group. A common motivation for cluster sampling is to reduce the average cost per interview; given a fixed budget, this can allow a larger sample. For a fixed sample size, the technique gives more accurate results when most of the variation in the population lies within the groups rather than between them.
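The claim about where the variation lies can be checked with the standard variance formula for single-stage cluster sampling, assuming equal-size clusters drawn by simple random sampling without replacement: Var = (1 - n/N) * S_b^2 / n, where n of the N clusters are drawn and S_b^2 is the variance of the N cluster means. The sketch below applies it to the same twelve hypothetical values grouped in two ways; the function name and data are illustrative assumptions.

```python
from statistics import mean, variance

def cluster_mean_variance(clusters, n):
    """Exact variance of the estimated mean when n of the N equal-size
    clusters are drawn by simple random sampling without replacement and
    every element of each drawn cluster is observed."""
    N = len(clusters)
    cluster_means = [mean(c) for c in clusters]
    s2_between = variance(cluster_means)  # sample variance, divisor N - 1
    return (1 - n / N) * s2_between / n

# Hypothetical values 1..12 split into four clusters of three, two ways.
mixed = [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]    # variation mostly within clusters
grouped = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]  # variation mostly between clusters

# Both designs draw 2 clusters, i.e. 6 elements.
print(cluster_mean_variance(mixed, 2))
print(cluster_mean_variance(grouped, 2))
```

With heterogeneous clusters the variance is 5/12 (about 0.42); with homogeneous clusters it is 3.75, even though both designs observe six elements.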
Cluster elements
Elements within a cluster should ideally be as heterogeneous as possible, while the cluster means should be similar to one another (homogeneity between clusters). Each cluster should be a small-scale representation of the total population. The clusters should be mutually exclusive and collectively exhaustive. A random sampling technique is then applied to the clusters to choose which ones to include in the study. In single-stage cluster sampling, all the elements from each of the selected clusters are used. In two-stage cluster sampling, a random sampling technique is applied to the elements from each of the selected clusters.
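A minimal sketch of the single-stage and two-stage designs, assuming the sampling frame is a dictionary mapping hypothetical cluster names to lists of elements and that simple random sampling is used at each stage (other random selection schemes are possible). The function names are illustrative.

```python
import random

def single_stage(frame, k, rng):
    """Randomly choose k clusters and keep every element in them."""
    chosen = rng.sample(list(frame), k)
    return {name: list(frame[name]) for name in chosen}

def two_stage(frame, k, m, rng):
    """Randomly choose k clusters, then a random subsample of m elements in each."""
    chosen = rng.sample(list(frame), k)
    return {name: rng.sample(frame[name], m) for name in chosen}

# Hypothetical frame: 5 schools (clusters) with 4 pupils each.
schools = {f"school_{i}": [f"pupil_{i}_{j}" for j in range(4)] for i in range(5)}
rng = random.Random(42)  # fixed seed so the draw is reproducible
print(single_stage(schools, 2, rng))
print(two_stage(schools, 2, 2, rng))
```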
The main difference between cluster sampling and stratified sampling is that in cluster sampling the cluster is treated as the sampling unit, so analysis is done on a population of clusters (at least in the first stage). In stratified sampling, the analysis is done on elements within strata. In stratified sampling, a random sample is drawn from each of the strata, whereas in cluster sampling only the selected clusters are studied. The main objective of cluster sampling is to reduce costs by increasing sampling efficiency. This contrasts with stratified sampling, where the main objective is to increase precision.
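The difference in sampling unit shows up in the estimator. In the sketch below (hypothetical function names and data; equal-size clusters assumed for the cluster estimate), the stratified estimate combines a sample from every stratum, weighted by stratum size, while the single-stage cluster estimate treats each selected cluster's mean as one observation and ignores the clusters that were not drawn.

```python
from statistics import mean

def stratified_estimate(samples, stratum_sizes):
    """Stratified: every stratum is sampled; weight each stratum's sample
    mean by its share of the population."""
    total = sum(stratum_sizes.values())
    return sum(size / total * mean(samples[h]) for h, size in stratum_sizes.items())

def cluster_estimate(selected_clusters):
    """Single-stage cluster sampling with equal-size clusters: each selected
    cluster contributes one observation (its mean); unselected clusters
    contribute nothing."""
    return mean(mean(c) for c in selected_clusters)

# Hypothetical data.
print(stratified_estimate({"A": [10, 12], "B": [20, 22, 24]}, {"A": 100, "B": 300}))  # 19.25
print(cluster_estimate([[3, 5], [7, 9]]))  # 6
```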
Multistage sampling goes further, using more than two stages in which clusters are selected from within previously selected clusters.
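A recursive sketch of multistage selection, assuming a nested dictionary frame (for example, regions containing districts containing households) and simple random sampling at every stage. The frame layout, the function name and the `sizes` argument are illustrative assumptions.

```python
import random

def multistage_sample(frame, sizes, rng):
    """Draw a multistage sample from a nested frame.

    frame: nested dicts of named clusters; the innermost level is a list of elements.
    sizes: number of units to draw at each stage, outermost first
           (one entry per level of nesting).
    """
    k, rest = sizes[0], sizes[1:]
    if isinstance(frame, dict):
        # Select k clusters at this stage, then recurse into each one.
        chosen = rng.sample(list(frame), k)
        return {name: multistage_sample(frame[name], rest, rng) for name in chosen}
    # Final stage: simple random sample of elements in the last-stage cluster.
    return rng.sample(frame, k)

# Hypothetical frame: 3 regions x 3 districts x 5 households.
frame = {
    f"region_{r}": {
        f"district_{r}{d}": [f"hh_{r}{d}{h}" for h in range(5)] for d in range(3)
    }
    for r in range(3)
}
# Three stages: 2 regions, then 2 districts per region, then 3 households per district.
print(multistage_sample(frame, [2, 2, 3], random.Random(7)))
```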
Aspects of cluster sampling
One version of cluster sampling is area sampling, or geographical cluster sampling, in which the clusters are geographical areas. Because a geographically dispersed population can be expensive to survey, greater economy than simple random sampling can be achieved by treating several respondents within a local area as a cluster. It is usually necessary to increase the total sample size to achieve equivalent precision in the estimators, but the cost savings may make that feasible.
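The trade-off can be made concrete with Kish's approximation to the design effect, deff = 1 + (m - 1) * rho, where m is the number of interviews per cluster and rho is the intraclass correlation. The sketch below assumes a simple cost model (a fixed cost per area visited plus a cost per interview); the model, the function names and all figures are hypothetical.

```python
import math

def kish_deff(m, rho):
    """Kish's approximate design effect for m interviews per cluster."""
    return 1 + (m - 1) * rho

def clustered_plan(n_srs, m, rho, cost_per_cluster, cost_per_interview):
    """Interviews, clusters and cost for an area sample intended to match the
    precision of n_srs unclustered interviews (hypothetical cost model:
    a fixed cost per area visited plus a cost per interview)."""
    n = math.ceil(n_srs * kish_deff(m, rho))
    clusters = math.ceil(n / m)
    return n, clusters, clusters * cost_per_cluster + n * cost_per_interview

# Hypothetical figures: 9 interviews per area, rho = 0.0625,
# 200 per area visited, 20 per interview.
print(clustered_plan(300, 9, 0.0625, 200, 20))  # (450, 50, 19000)
# An unclustered sample of 300 needing one visit per respondent:
print(300 * (200 + 20))  # 66000
```

Under these assumptions the area sample needs 50% more interviews but costs less than a third as much.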
In some situations, cluster sampling is only appropriate when the clusters are approximately the same size. This can be achieved by combining clusters. If this is not possible, probability proportionate to size (PPS) sampling is used. In this method, the probability of selecting a cluster varies with its size, giving larger clusters a greater probability of selection and smaller clusters a lower probability. However, if clusters are selected with probability proportionate to size, the same number of interviews should be carried out in each sampled cluster so that every sampled unit has the same overall probability of selection.
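One common way to select clusters with probability proportionate to size is systematic PPS selection. The sketch below (hypothetical function names; cluster sizes given as integers) selects n clusters so that cluster i is included with probability n * M_i / M, provided no cluster is larger than the sampling interval M / n. It then checks with exact fractions that taking the same m units in every sampled cluster gives each unit the same overall probability n * m / M.

```python
import random
from fractions import Fraction

def systematic_pps(sizes, n, rng):
    """Systematic PPS selection of n clusters from integer cluster sizes.

    Cluster i is included with probability n * sizes[i] / sum(sizes),
    assuming no cluster is larger than the sampling interval sum(sizes) / n.
    """
    total = sum(sizes)
    # Stretch every size by n so the sampling interval becomes the integer
    # `total`; a random start in [0, total) then gives n equally spaced points.
    start = rng.randrange(total)
    points = [start + j * total for j in range(n)]
    chosen, upper = [], 0
    for i, size in enumerate(sizes):
        lower, upper = upper, upper + n * size  # cluster i covers [lower, upper)
        chosen.extend(i for p in points if lower <= p < upper)
    return chosen

def unit_probability(sizes, i, n, m):
    """P(cluster i selected) * P(unit selected | cluster i), with m units per cluster."""
    return Fraction(n * sizes[i], sum(sizes)) * Fraction(m, sizes[i])

sizes = [10, 20, 30, 40]  # hypothetical cluster sizes; interval = 100 / 2 = 50
print(systematic_pps(sizes, 2, random.Random(1)))
# Taking m = 5 units in each sampled cluster gives every unit probability n*m/M.
print([unit_probability(sizes, i, 2, 5) for i in range(len(sizes))])
```

For these sizes, with n = 2 and m = 5, every unit's probability is 1/10 regardless of the size of its cluster.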
Cluster sampling is used to estimate mortality in settings where it is high, such as wars, famines and natural disasters.
Advantages
- Can be cheaper than other methods, e.g. through lower travel and administration costs
Disadvantages
- Higher sampling error, which can be expressed as the so-called "design effect": the ratio between the number of subjects in the cluster study and the number of subjects in an equally reliable, randomly sampled unclustered study.
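The design effect is often approximated from the intraclass correlation rho as deff = 1 + (m - 1) * rho (Kish). The sketch below estimates rho with the one-way ANOVA estimator, assuming equal-size clusters; the function names and classroom scores are hypothetical.

```python
from statistics import mean

def anova_icc(clusters):
    """One-way ANOVA estimate of the intraclass correlation (equal-size clusters)."""
    k, m = len(clusters), len(clusters[0])
    grand = mean(x for c in clusters for x in c)
    means = [mean(c) for c in clusters]
    msb = m * sum((cm - grand) ** 2 for cm in means) / (k - 1)  # between-cluster mean square
    msw = sum((x - cm) ** 2 for c, cm in zip(clusters, means) for x in c) / (k * (m - 1))  # within
    return (msb - msw) / (msb + (m - 1) * msw)

def design_effect(clusters):
    """Kish approximation: deff = 1 + (m - 1) * rho."""
    m = len(clusters[0])
    return 1 + (m - 1) * anova_icc(clusters)

# Hypothetical test scores: 3 classrooms (clusters) of 3 pupils each.
scores = [[70, 72, 74], [60, 62, 64], [80, 82, 84]]
print(round(anova_icc(scores), 3), round(design_effect(scores), 2))  # 0.961 2.92
```

Here a clustered study of these classrooms would need roughly three times as many pupils as an equally reliable simple random sample. The ANOVA estimate can be negative when clusters are more internally varied than chance would produce, giving a design effect below 1.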