Likert scale
A Likert scale is a psychometric scale commonly used in research that employs questionnaires. It is the most widely used approach to scaling responses in survey research, so much so that the term is often used interchangeably with rating scale, or more accurately the Likert-type scale, even though the two are not synonymous. The scale is named after its inventor, the psychologist Rensis Likert. Likert distinguished between a scale proper, which emerges from collective responses to a set of items (usually eight or more), and the format in which responses are scored along a range. Technically speaking, a Likert scale refers only to the former. The difference between the two concepts reflects the distinction Likert made between the underlying phenomenon being investigated and the means of capturing variation that points to that phenomenon. When responding to a Likert questionnaire item, respondents specify their level of agreement or disagreement on a symmetric agree-disagree scale for a series of statements. Thus, the range captures the intensity of their feelings for a given item, while the analysis of multiple items (if the items are developed appropriately) reveals a pattern that has scaled properties of the kind Likert identified.
Sample question presented using a five-point Likert item
An important distinction must be made between a Likert scale and a Likert item. The Likert scale is the sum of responses on several Likert items. Because Likert items are often accompanied by a visual analog scale (e.g., a horizontal line on which a subject indicates his or her response by circling or checking tick-marks), the items are sometimes called scales themselves. This is the source of much confusion; it is better, therefore, to reserve the term Likert scale for the summed scale and Likert item for an individual item.
A Likert item is simply a statement that the respondent is asked to evaluate according to any kind of subjective or objective criteria; generally the level of agreement or disagreement is measured. It is considered symmetric or "balanced" because there are equal numbers of positive and negative positions. Often five ordered response levels are used, although many psychometricians advocate using seven or nine levels; a recent empirical study found that a 5- or 7-point scale may produce slightly higher mean scores relative to the highest possible attainable score than a 10-point scale does, and this difference was statistically significant. In terms of the other data characteristics, there was very little difference among the scale formats in variation about the mean, skewness, or kurtosis.
The format of a typical five-level Likert item, for example, could be:
- Strongly disagree
- Disagree
- Neither agree nor disagree
- Agree
- Strongly agree
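For analysis, the five response levels listed above are usually given integer codes. The following is a minimal Python sketch of one such coding; the 1-to-5 numbering and the names `LEVELS`, `CODES`, and `encode` are illustrative assumptions rather than part of any standard.

```python
# Hypothetical coding scheme: integer codes 1..5 in the order the options are listed.
LEVELS = [
    "Strongly disagree",
    "Disagree",
    "Neither agree nor disagree",
    "Agree",
    "Strongly agree",
]
CODES = {label: code for code, label in enumerate(LEVELS, start=1)}


def encode(responses):
    """Map response labels to integer codes; raises KeyError on an unknown label."""
    return [CODES[label] for label in responses]


# Made-up answers from four respondents to a single item.
answers = ["Agree", "Strongly agree", "Neither agree nor disagree", "Disagree"]
coded = encode(answers)
print(coded)  # [4, 5, 3, 2]
```

The codes preserve only the order of the options; whether the gaps between them may be treated as equal is a separate question.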
Likert scaling is a bipolar scaling method, measuring either positive or negative response to a statement. Sometimes an even-point scale is used, where the middle option of "Neither agree nor disagree" is not available. This is sometimes called a "forced choice" method, since the neutral option is removed. The neutral option can be seen as an easy option to take when a respondent is unsure, so it is questionable whether it is a true neutral option. It has been shown that when a 4-point Likert scale, which lacks the neutral option, is compared with a 5-point scale, the overall difference in the responses is negligible.
Likert scales may be subject to distortion from several causes. Respondents may avoid using extreme response categories (central tendency bias); agree with statements as presented (acquiescence bias); or try to portray themselves or their organization in a more favorable light (social desirability bias). Designing a scale with balanced keying (an equal number of positive and negative statements) can obviate the problem of acquiescence bias, since acquiescence on positively keyed items will balance acquiescence on negatively keyed items, but central tendency and social desirability are somewhat more problematic.
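The effect of balanced keying can be illustrated with a small Python sketch. It assumes responses coded 1 to 5 and a hypothetical four-item questionnaire in which two items are negatively keyed; the function name `reverse_score` and the data are invented for the example.

```python
def reverse_score(code, n_levels=5):
    """Reverse a 1..n_levels code so that agreement with a negatively
    keyed statement counts in the same direction as disagreement with a
    positively keyed one."""
    if not 1 <= code <= n_levels:
        raise ValueError(f"code {code} is outside 1..{n_levels}")
    return n_levels + 1 - code


# Hypothetical questionnaire: True marks a negatively keyed item.
negatively_keyed = [False, True, False, True]

# A respondent who strongly agrees with every statement regardless of content.
raw = [5, 5, 5, 5]

aligned = [
    reverse_score(code) if negative else code
    for code, negative in zip(raw, negatively_keyed)
]
total = sum(aligned)
print(aligned, total)  # [5, 1, 5, 1] 12
```

With four items coded 1 to 5, the possible totals run from 4 to 20, so the purely acquiescent respondent lands on the midpoint (12) rather than at the top of the range.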
Scoring and analysis
After the questionnaire is completed, each item may be analyzed separately or, in some cases, item responses may be summed to create a score for a group of items. Hence, Likert scales are often called summative scales.
Whether individual Likert items can be considered as interval-level data, or whether they should be treated as ordered-categorical data, is the subject of considerable disagreement in the literature, with strong convictions on what are the most applicable methods. This disagreement can be traced back, in many respects, to the extent to which Likert items are interpreted as being ordinal data.
There are two primary considerations in this discussion. First, Likert scales are arbitrary. The value assigned to a Likert item has no unique mathematical property, either in terms of measure theory or scale (from which a distance metric can be determined). The value assigned to each Likert item is simply determined by the researcher as providing the necessary detail for their research. By convention, however, Likert items tend to take progressive positive integer values. Likert scales typically have between 2 and 10 points, with 5 or 7 being the most common. The typical structure is such that each successive Likert item is treated as indicating a ‘better’ response than the preceding value. (This may differ in cases where reverse ordering of the Likert scale is needed.)
The second, and possibly more important, point is whether the ‘distance’ between successive Likert items is equal, which is traditionally inferred. For example, in the above 5-point Likert scale, the inference is that the ‘distance’ between items ‘1’ and ‘2’ is the same as between items ‘3’ and ‘4’. In terms of good research ethics, an equidistant presentation by the researcher is important; otherwise it will introduce a research bias into the analysis. For example, a 4-point Likert scale (Poor, Average, Good, Very Good) is unlikely to be equidistant, as there is only one item that can receive a below-average rating. This would clearly bias any result in favor of a better outcome. However, even if a researcher presents an equidistant scale, it may not be interpreted as such by the respondent.
A good Likert scale, as above, will present a symmetry of Likert items about a middle category, with clearly defined linguistic qualifiers for each item. In such symmetric scaling, equidistant attributes will typically be more clearly observed or, at least, inferred. It is when a Likert scale is symmetric and equidistant that it will behave more like an interval-level measurement. So while a Likert scale is ordinal (which cannot be denied), if it is well presented it may approximate an interval-level measurement. This is beneficial because, if it were treated just as an ordinal scale, some valuable information could be lost, since the ‘distance’ between Likert items would not be available for consideration. The important idea here is that the appropriate type of analysis depends on how the Likert scale has been presented.
Given its ordinal basis, it remains more correct to summarize the central tendency of responses from a Likert scale by using either the median or the mode, with ‘spread’ measured by quartiles or percentiles. Non-parametric tests should be preferred for statistical inferences, such as the chi-squared test, Mann–Whitney test, Wilcoxon signed-rank test, or Kruskal–Wallis test. While some commentators consider that parametric analysis is justified for a Likert scale using the central limit theorem, this should be reserved for when the Likert scale has suitable symmetry and equidistance, so that an interval-level measurement can be approximated and reasonably inferred.
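As an illustration of the ordinal summaries described above, the following minimal Python sketch computes the median, mode, and quartiles of a set of coded responses using only the standard library. The response data are made up, and the choice of the "inclusive" quartile method is an assumption for the example; other quartile conventions can give slightly different values.

```python
import statistics

# Made-up responses to one item from nine respondents, coded 1..5.
responses = [4, 2, 3, 5, 1, 4, 2, 4, 3]

# Median: the middle value once the responses are put in order.
median = statistics.median(responses)

# Mode(s): the most frequent code(s); multimode returns every tied value.
modes = statistics.multimode(responses)

# Quartiles as a measure of spread (three cut points for n=4).
quartiles = statistics.quantiles(responses, n=4, method="inclusive")

print(median, modes, quartiles)  # 3 [4] [2.0, 3.0, 4.0]
```

None of these summaries requires the gaps between adjacent codes to be equal, which is why they suit ordinal data.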
Responses to several Likert questions may be summed, provided that all questions use the same Likert scale and that the scale is a defensible approximation to an interval scale, in which case they may be treated as interval data measuring a latent variable. If the summed responses fulfill these assumptions, parametric statistical tests such as the analysis of variance can be applied. These can be applied only when more than 5 Likert questions are summed.
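A summed score of this kind can be sketched in a few lines of Python. The example assumes six items that all use the same 1-to-5 coding; the respondent identifiers, the data, and the function name `summed_score` are hypothetical.

```python
def summed_score(item_codes, n_levels=5):
    """Sum one respondent's item codes after checking that every code
    lies on the same 1..n_levels scale."""
    for code in item_codes:
        if not 1 <= code <= n_levels:
            raise ValueError(f"code {code} is outside 1..{n_levels}")
    return sum(item_codes)


# Made-up data: two respondents, six items each, all on a 5-point scale.
respondents = {
    "r1": [4, 5, 4, 3, 4, 5],
    "r2": [2, 1, 2, 3, 2, 2],
}

scores = {rid: summed_score(codes) for rid, codes in respondents.items()}
print(scores)  # {'r1': 25, 'r2': 12}
```

With six items the possible totals range from 6 to 30; any negatively keyed items would need to be reverse-scored before summing.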
Data from Likert scales are sometimes reduced to the nominal level by combining all agree and disagree responses into two categories of "accept" and "reject". The chi-squared, Cochran Q, and McNemar tests are common statistical procedures used after this transformation.
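The reduction to two categories and a subsequent chi-squared calculation can be sketched as follows. This is an illustration under stated assumptions: responses are coded 1 to 5, neutral responses are dropped (the text does not say how they should be handled), and the two groups of respondents are invented. The statistic is the ordinary Pearson chi-squared for a 2x2 table without continuity correction.

```python
def dichotomize(codes, n_levels=5):
    """Collapse codes above the midpoint to 'accept' and below it to
    'reject'. Neutral (midpoint) responses are dropped; this is an
    assumption made for the example."""
    midpoint = (n_levels + 1) / 2
    return ["accept" if c > midpoint else "reject" for c in codes if c != midpoint]


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the table [[a, b], [c, d]].
    Undefined (division by zero) if any row or column total is zero."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Made-up responses to the same item from two groups of respondents.
group_a = [5, 4, 4, 2, 3, 5, 4, 1]
group_b = [2, 1, 2, 4, 3, 2, 1, 5]

collapsed_a = dichotomize(group_a)
collapsed_b = dichotomize(group_b)
table = [
    [collapsed_a.count("accept"), collapsed_a.count("reject")],
    [collapsed_b.count("accept"), collapsed_b.count("reject")],
]
statistic = chi_squared_2x2(table[0][0], table[0][1], table[1][0], table[1][1])
print(table, round(statistic, 3))  # [[5, 2], [2, 5]] 2.571
```

The sample here is far too small for the chi-squared approximation to be trusted; the sketch only shows the arithmetic of the transformation and the statistic.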
Consensus based assessment (CBA) can be used to create an objective standard for Likert scales in domains where no generally accepted or objective standard exists. It can also be used to refine or even validate generally accepted standards.
Level of measurement
The five response categories are often believed to represent an interval level of measurement. But this can only be the case if the intervals between the scale points correspond to empirical observations in a metric sense. Reips and Funke (2008) show that this criterion is much better met by a visual analogue scale. In fact, phenomena may also appear that call even the ordinal scale level of Likert scales into question. For example, in a set of items A, B, C rated with a Likert scale, circular relations like A>B, B>C and C>A can appear. This violates the axiom of transitivity for the ordinal scale.
Rasch model
Likert scale data can, in principle, be used as a basis for obtaining interval-level estimates on a continuum by applying the polytomous Rasch model, when data can be obtained that fit this model. In addition, the polytomous Rasch model permits testing of the hypothesis that the statements reflect increasing levels of an attitude or trait, as intended. For example, application of the model often indicates that the neutral category does not represent a level of attitude or trait between the disagree and agree categories.
Again, not every set of Likert-scaled items can be used for Rasch measurement. The data have to be thoroughly checked to confirm that they fulfill the strict formal axioms of the model.
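To give a sense of what the model asserts, the following Python sketch computes response-category probabilities under one common parameterization of the polytomous Rasch model, the rating scale form, in which the probability of category x is proportional to the exponential of the sum over k = 1..x of (theta - delta - tau_k). The parameterization, the symbol names (theta for the person location, delta for the item location, tau_k for the category thresholds), and the numeric values are assumptions chosen for illustration; the sketch does not estimate parameters or test fit.

```python
import math


def category_probabilities(theta, delta, thresholds):
    """Probabilities of categories 0..m for one person and one item under
    the assumed rating scale parameterization.

    theta:      person location on the latent continuum
    delta:      item location
    thresholds: the m threshold values tau_1..tau_m
    """
    # Cumulative sums of (theta - delta - tau_k); the empty sum for category 0 is 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - delta - tau))
    # Subtract the largest logit before exponentiating, for numerical stability.
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical five-category item with evenly spaced, ordered thresholds.
probs = category_probabilities(theta=0.0, delta=0.0, thresholds=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

With ordered, symmetric thresholds and a person located exactly at the item, the middle category is the most probable and the distribution is symmetric around it. In fitted data, thresholds that come out in the wrong order are one sign that a category, such as the neutral one, is not working as an intermediate level.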
Pronunciation
Rensis Likert, the developer of the scale, pronounced his name 'lick-urt' with a short "i" sound. It has been claimed that Likert's name "is among the most mispronounced in [the] field." Although many people use the long "i" variant ('lie-kurt'), those who attempt to stay true to Dr. Likert's pronunciation use the short "i" pronunciation ('lick-urt').
See also
- Analog scale
- Bogardus Social Distance Scale
- Consensus-based assessment (CBA)
- Diamond of opposites
- Discan scale
- F-scale
- Guttman scale
- Ipsative
- Mokken scale
- Phrase completion scales
- ProScan Survey
- Rating scale
- Rating sites
- Reverse coding
- Rosenberg self-esteem scale
- Satisficing
- Semantic differential
- Thurstone scale
- Voting system
External links
- Correlation scatter-plot matrix - for ordered-categorical data - On the visual presentation of correlation between Likert scale variables