Rating scale
- Concerning rating scales as systems of educational marks, see articles about education in different countries (named "Education in ..."), for example, Education in Ukraine.
- Concerning rating scales used in the practice of medicine, see articles about diagnoses, for example, Major depressive disorder.
A rating scale is a set of categories designed to elicit information about a quantitative or a qualitative attribute. In the social sciences, common examples are the Likert scale and 1-10 rating scales in which a person selects the number which is considered to reflect the perceived quality of a product.
Background
A rating scale is an instrument that requires the rater to assign the rated object to categories that have numerals assigned to them.
Types of rating scales
Rating scales can be classified by the level of measurement of the data they produce. Three levels are described here:
- Some data are measured at the ordinal level. Numbers indicate the relative position of items, but not the magnitude of difference. One example is a Likert scale:
  - Statement: I could not live without my computer.
  - Response options:
    - Strongly disagree
    - Disagree
    - Agree
    - Strongly agree
- Some data are measured at the interval level. Numbers indicate the magnitude of difference between items, but there is no absolute zero point. Examples are attitude scales and opinion scales.
- Some data are measured at the ratio level. Numbers indicate magnitude of difference and there is a fixed zero point. Ratios can be calculated. Examples include age, income, price, costs, sales revenue, sales volume and market share.
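As a minimal sketch of what ordinal coding means in practice, the following Python snippet codes the four response options from the Likert example as ranks and summarizes a set of responses using only order-based statistics (a frequency table and the median category). The response data and the names `OPTIONS`, `RANK` and `summarize` are hypothetical, chosen purely for illustration.

```python
from collections import Counter
from statistics import median_low

# Response options from the Likert example, in ascending order of agreement.
OPTIONS = ["Strongly disagree", "Disagree", "Agree", "Strongly agree"]
# Ranks 1..4 record order only; the gaps between them carry no meaning.
RANK = {label: i + 1 for i, label in enumerate(OPTIONS)}


def summarize(responses):
    """Return (frequency table, median category) for ordinal responses."""
    counts = Counter(responses)
    table = {label: counts.get(label, 0) for label in OPTIONS}
    # median_low always returns an observed value, so it maps back to a label.
    median_rank = median_low(RANK[r] for r in responses)
    return table, OPTIONS[median_rank - 1]


# Hypothetical responses to "I could not live without my computer."
responses = ["Agree", "Strongly agree", "Disagree", "Agree", "Agree",
             "Strongly disagree", "Strongly agree"]
table, median_category = summarize(responses)
print(table)
print(median_category)
```

The sketch deliberately avoids the arithmetic mean, since the ranks here only say which option expresses more agreement, not by how much.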
More than one rating scale question is required to measure an attitude or perception due to the requirement for statistical comparisons between the categories in the polytomous Rasch model for ordered categories (Andrich, 1978). In terms of classical test theory, more than one question is required to obtain an index of internal reliability such as Cronbach's alpha (Cronbach, 1951), which is a basic criterion for assessing the effectiveness of a rating scale and, more generally, a psychometric instrument.
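To make the need for several questions concrete, here is a minimal Python sketch of Cronbach's alpha using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the score matrix are hypothetical, and the sketch assumes complete data with no missing responses.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of rows, one per respondent, each holding k item scores.
    """
    k = len(scores[0])
    if k < 2:
        # With a single item the k / (k - 1) factor is undefined.
        raise ValueError("alpha is undefined for fewer than two items")
    items = list(zip(*scores))  # one tuple of scores per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    # Population variance is used throughout; the ratio is the same with
    # sample variance as long as one convention is used consistently.
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical data: 4 respondents answering 3 items on the same scale.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
]
print(round(cronbach_alpha(scores), 3))
```

Passing a matrix with a single item column raises `ValueError`, which mirrors the point that an index of internal reliability cannot be obtained from one question.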
Rating scales used online
Rating scales are used widely online in an attempt to provide indications of consumer opinions of products. Examples of sites which employ rating scales are IMDb, Epinions.com, Internet Book List, Yahoo! Movies, Amazon.com, BoardGameGeek, TV.com and Ratings.net. The Criticker website uses a rating scale from 0 to 100 in order to obtain "personalised film recommendations".
In almost all cases, online rating scales only allow one rating per user per product, though there are exceptions such as Ratings.net, which allows users to rate products in relation to several qualities. Most online rating facilities also provide few or no qualitative descriptions of the rating categories, although again there are exceptions such as Yahoo! Movies, which labels each of the categories between F and A+, and BoardGameGeek, which provides explicit descriptions of each category from 1 to 10. Often, only the top and bottom categories are described, such as on IMDb's online rating facility.
With each user rating a product only once, for example in a category from 1 to 10, there is no means for evaluating internal reliability using an index such as Cronbach's alpha. It is therefore impossible to evaluate the validity of the ratings as measures of viewer perceptions. Establishing validity would require establishing both reliability and accuracy (i.e. that the ratings represent what they are supposed to represent).
Another fundamental issue is that online ratings usually involve convenience sampling, much like television polls, i.e., they represent only the conglomeration of those inclined to submit ratings.
Sampling is one factor which can lead to results which have a specific bias or are only relevant to a specific subgroup. To illustrate the importance of such factors, consider an example. Suppose that a film's marketing strategy and reputation are such that it appeals to a specialist audience: 90% of the audience are devotees of this particular kind of film, and only 10% are people with a general interest in movies. Suppose also that the film is very popular among those who do see it and, in addition, that only those who feel most strongly about the film are inclined to rate it online, so that the raters are all drawn from the devotees. This combination may lead to very high ratings of the film which do not generalize beyond the people who actually see the film (or possibly even beyond those who actually rate it).
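The scenario can be sketched numerically. In the following Python snippet the 90%/10% audience split comes from the example, while the individual rating values, the group sizes within the devotees, and the name `mean_rating` are invented assumptions for illustration only.

```python
def mean_rating(groups):
    """Mean rating over a list of (rating, number_of_people) groups."""
    total = sum(rating * count for rating, count in groups)
    people = sum(count for _, count in groups)
    return total / people


# Hypothetical audience of 100 viewers on a 1-10 scale.
audience = [
    (10, 20),  # devotees who feel most strongly about the film
    (8, 70),   # remaining devotees
    (5, 10),   # viewers with a general interest in movies
]

# Assumption: only the viewers who feel most strongly submit a rating.
submitted = [(10, 20)]

print("mean over everyone who saw the film:", mean_rating(audience))
print("mean over submitted ratings:", mean_rating(submitted))
```

Under these assumed numbers the published average reflects only the self-selected raters and overstates how the film was received by the audience as a whole.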
Qualitative description of categories is an important feature of a rating scale. For example, if only the points 1-10 are given without description, some people may select 10 rarely whereas others may select the category often. If, instead, "10" is described as "near flawless", the category is more likely to mean the same thing to different people. This applies to all categories, not just the extreme points.
These issues are also compounded when aggregated statistics such as averages are used for lists and rankings of products. User ratings are at best ordinal categorizations. While it is not uncommon to calculate averages or means for such data, doing so cannot be justified, because an average is meaningful only if equal intervals on the scale represent equal differences between levels of perceived quality. The key issues with aggregate data based on the kinds of rating scales commonly used online are as follows:
- Averages should not be calculated for data of the kind collected.
- It is usually impossible to evaluate the reliability or validity of user ratings.
- Products are not compared with respect to explicit, let alone common, criteria.
- Only users inclined to submit a rating for a product do so.
- Data are not usually published in a form that permits evaluation of the product ratings.
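The first point in the list can be illustrated with a small Python sketch. It recodes the same ordinal ratings with two numeric codings that both preserve the order of the categories, and shows that the ranking of two products by mean depends on the coding while the ranking by median does not. The ratings, the two codings and the name `recode` are hypothetical.

```python
from statistics import mean, median


def recode(ratings, coding):
    """Replace each category with the number assigned to it by `coding`."""
    return [coding[r] for r in ratings]


# Hypothetical ratings of two products on a five-category scale.
product_a = [3, 3, 3, 3, 3]
product_b = [2, 2, 2, 2, 5]

# Both codings keep the categories in the same order; only the spacing
# between categories differs, which ordinal data cannot determine.
equal_steps = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
stretched_top = {1: 1, 2: 2, 3: 3, 4: 4, 5: 20}

for name, coding in [("equal steps", equal_steps),
                     ("stretched top", stretched_top)]:
    a = recode(product_a, coding)
    b = recode(product_b, coding)
    print(name,
          "| mean ranks A above B:", mean(a) > mean(b),
          "| median ranks A above B:", median(a) > median(b))
```

Because nothing in ordinal ratings fixes the spacing between categories, a ranking that changes with the choice of spacing is an artifact of the coding rather than a property of the data.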
More developed methodologies include choice modelling or maximum difference (MaxDiff) methods, the latter being related to the Rasch model due to the connection between Thurstone's law of comparative judgement and the Rasch model.
See also
- Likert scale
- Rating scales for depression
- Semantic differential
- Voting system
- MaxDiff
- Advantages of Rating scale