Consensus-based assessment
Consensus-based assessment (CBA) expands on the common practice of consensus decision-making and the theoretical observation that expertise can be closely approximated by large numbers of novices or journeymen. It creates a method for determining measurement standards for very ambiguous domains of knowledge, such as emotional intelligence, politics, religion, values and culture in general. From this perspective, the shared knowledge that forms cultural consensus can be assessed in much the same way as expertise or general intelligence.
Measurement standards for general intelligence
Consensus-based assessment is based on a simple finding: samples of individuals with differing competence (e.g., experts and apprentices) rate relevant scenarios, using Likert scales, with similar mean ratings. Thus, from the perspective of a CBA framework, cultural standards for scoring keys can be derived from the population that is being assessed. Peter Legree and Joseph Psotka, working together over the past decades, proposed that psychometric g could be measured unobtrusively through survey-like scales requiring judgments. Scoring could use either each person's deviation score from the group or expert mean, or the Pearson correlation between that person's judgments and the group mean. The two techniques are perfectly correlated. Legree and Psotka subsequently created scales that asked individuals to estimate word frequency, judge binary probabilities of good continuation, identify knowledge implications, and approximate employment distributions. The items were carefully chosen to avoid objective referents, so the scales required respondents to provide judgments that were scored against broadly developed, consensual standards. Performance on this judgment battery correlated approximately 0.80 with conventional measures of psychometric g. The response keys were consensually derived. Unlike mathematics or physics questions, the selection of items, scenarios, and options to assess psychometric g was guided roughly by a theory that emphasized complex judgment, but the explicit keys were unknown until the assessments had been made: they were determined by the average of everyone's responses, using deviation scores, correlations, or factor scores.
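One way to see why deviation scores and correlations can carry the same information is the identity sketched below. It rests on an assumption the text does not state: both the person's ratings and the consensus key are standardized across the n items (using the population standard deviation), giving z-scores z_x and z_m. The symbols D, z_x, z_m and n are introduced here for illustration only.

```latex
\begin{align*}
D^2 &= \sum_{i=1}^{n} \left(z_{x,i} - z_{m,i}\right)^2 \\
    &= \sum_{i=1}^{n} z_{x,i}^2 + \sum_{i=1}^{n} z_{m,i}^2
       - 2 \sum_{i=1}^{n} z_{x,i}\, z_{m,i} \\
    &= n + n - 2 n r \;=\; 2n\,(1 - r),
\end{align*}
% where r = (1/n) \sum_i z_{x,i} z_{m,i} is the Pearson correlation
% between the person's ratings and the consensus key.
```

Under that assumption the squared deviation is a linear, decreasing function of r, so the two scores order respondents identically. With raw, unstandardized ratings the two scores can diverge, because the deviation score is also sensitive to differences in mean level and spread.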
Measurement standards for cultural knowledge
One way to understand the connection between expertise and consensus is to consider that for many performance domains, expertise largely reflects knowledge derived from experience. Since novices tend to have fewer experiences, their opinions err in various inconsistent directions. However, as experience is acquired, the opinions of journeymen through to experts become more consistent. According to this view, errors are random. Ratings data collected from large samples of respondents of varying expertise can thus be used to approximate the average ratings that a substantial number of experts would provide, were many experts available. Because the standard deviation of a mean approaches zero as the number of observations becomes very large, estimates based on groups of varying competence will provide converging estimates of the best performance standards. The means of these groups' responses can be used to create effective scoring rubrics, or measurement standards, to evaluate performance. This approach is particularly relevant to scoring subjective areas of knowledge that are scaled using Likert response scales, and the approach has been applied to develop scoring standards for several domains where experts are scarce.
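The convergence argument can be illustrated with a small simulation. This is a minimal sketch under assumptions not taken from the source: each rating is modelled as a hypothetical "true" standard plus independent, zero-mean Gaussian error, and the function names and parameter values are invented for illustration.

```python
import random


def standard_error(noise_sd, n_raters):
    """Theoretical standard deviation of the mean of n independent ratings."""
    return noise_sd / n_raters ** 0.5


def consensus_error(n_raters, noise_sd, true_rating=4.0, seed=0):
    """Absolute gap between the mean of n noisy ratings and the assumed standard.

    Assumption: rater errors are random and unbiased, as the text supposes.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ratings = [rng.gauss(true_rating, noise_sd) for _ in range(n_raters)]
    return abs(sum(ratings) / len(ratings) - true_rating)


# Larger samples of noisy raters give a tighter estimate of the standard.
for n in (10, 100, 10_000):
    print(n, standard_error(2.0, n), consensus_error(n, 2.0))
```

The sketch only covers random error. If novices shared a systematic bias, the mean would converge to a biased value, which is outside the model described above.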
Experimental results
In practice, analyses have demonstrated high levels of convergence between expert and CBA standards: the values quantifying those standards are highly correlated (Pearson r ranging from .72 to .95), and scores based on those standards are also highly correlated (r ranging from .88 to .99), provided the sample size of both groups is large (Legree, Psotka, Tremble & Bourne, 2005). This convergence between CBA and expert-referenced scores, together with the associated validity data, indicates that CBA and expert-based scoring can be used interchangeably, provided that the ratings data are collected using large samples of experts and novices or journeymen.
Factor analysis
CBA is often computed as the Pearson correlation of each person's Likert scale judgments across a set of items against the mean of all people's judgments on those same items. The correlation is then a measure of that person's proximity to the consensus. It is also sometimes computed as a standardized deviation score from the consensus means of the groups. These two procedures are mathematically isomorphic. If culture is considered to be shared knowledge, and the mean of the group's ratings on a focused domain of knowledge is considered a measure of the cultural consensus in that domain, then both procedures assess CBA as a measure of an individual person's cultural understanding.
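The two scoring procedures can be sketched in a few lines. This is an illustrative sketch, not the authors' scoring code: the function names and the toy ratings matrix are hypothetical, the consensus key includes each respondent's own ratings, and the deviation score is a plain root-mean-square distance rather than a fully standardized score.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def consensus_key(ratings):
    """Mean rating per item over all respondents (rows = people, columns = items)."""
    return [mean(col) for col in zip(*ratings)]


def cba_correlation_scores(ratings):
    """Each person's correlation with the consensus key."""
    key = consensus_key(ratings)
    return [pearson(person, key) for person in ratings]


def cba_deviation_scores(ratings):
    """Negative RMS distance from the key, so a higher score means closer to consensus."""
    key = consensus_key(ratings)
    return [-sqrt(mean([(r - k) ** 2 for r, k in zip(person, key)]))
            for person in ratings]


# Hypothetical Likert ratings: four respondents, four items.
ratings = [
    [5, 4, 2, 1],
    [4, 4, 2, 2],
    [5, 3, 1, 1],
    [1, 2, 4, 5],  # this respondent runs against the group pattern
]
print(cba_correlation_scores(ratings))
print(cba_deviation_scores(ratings))
```

In this toy data the fourth respondent receives a negative correlation score and the lowest deviation score, so both procedures flag the same person as furthest from the consensus.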
However, it may be that the consensus is not evenly distributed over all subordinate items about a topic. Perhaps the knowledge content of the items is distributed over domains with differing consensus. For instance, conservatives who are libertarians may feel differently about invasion of privacy than conservatives who feel strongly about law and order. In fact, standard factor analysis brings this issue to the fore.
In either centroid or principal components analysis (PCA), the first factor scores are created by multiplying each rating by the correlation of the factor (usually the mean of all standardized ratings for each person) against each item's ratings. This multiplication weights each item by the correlation of the pattern of individual differences on each item (the component scores). If consensus is unevenly distributed over these items, some items may be more focused on the overall issues of the common factor. If an item correlates highly with the pattern of overall individual differences, then it is weighted more strongly in the overall factor scores. This weighting implicitly also weights the CBA score, since it is those items that share a common CBA pattern of consensus that are weighted more in factor analysis.
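The weighting described above can be sketched with a centroid-style first factor. This is a simplified illustration and not a full PCA: the factor is taken to be the mean of each person's standardized ratings, each item's weight is its correlation with that factor, and the function names and toy data are hypothetical.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def standardize(xs):
    """Population z-scores; assumes the values are not all identical."""
    m = mean(xs)
    sd = sqrt(mean([(x - m) ** 2 for x in xs]))
    return [(x - m) / sd for x in xs]


def pearson(xs, ys):
    zx, zy = standardize(xs), standardize(ys)
    return mean([a * b for a, b in zip(zx, zy)])


def centroid_factor(ratings):
    """Return (item loadings, person factor scores) for rows = people, columns = items."""
    items = [standardize(col) for col in zip(*ratings)]   # z-score each item across people
    people = list(zip(*items))                            # back to one row per person
    centroid = [mean(row) for row in people]              # mean standardized rating per person
    loadings = [pearson(item, centroid) for item in items]
    scores = [sum(w * z for w, z in zip(loadings, row)) for row in people]
    return loadings, scores


# Hypothetical data: items 1-3 share a pattern, item 4 is mostly unrelated.
ratings = [
    [5, 5, 4, 3],
    [4, 4, 5, 1],
    [2, 1, 2, 4],
    [1, 2, 1, 2],
    [3, 3, 3, 5],
]
loadings, scores = centroid_factor(ratings)
print(loadings)
print(scores)
```

In this example the fourth item correlates only weakly with the centroid, so it contributes little to the factor scores, while the three items that share a common pattern dominate them.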
The transposed or Q methodology factor analysis, created by William Stephenson, brings this relationship out explicitly. CBA scores are statistically isomorphic to the component scores in PCA for a Q factor analysis. They are the loading of each person's responses on the mean of all people's responses. So, Q factor analysis may provide a superior CBA measure if it can be used first to select the people who represent the dominant dimension, over items that best represent a subordinate attribute dimension of a domain (such as liberalism in a political domain). Factor analysis can then provide the CBA of individuals along that particular axis of the domain.
In practice, when items are not easily created and arrayed to provide a highly reliable scale, the Q factor analysis is not necessary, since the original factor analysis should also select those items that have a common consensus. So, for instance, in a scale of items for political attitudes, the items may ask about attitudes toward big government, law and order, economic issues, labor issues, or libertarian issues. Which of these items bears most strongly on the political attitudes of the groups polled may be difficult to determine a priori. However, since factor analysis is a symmetric computation on the matrix of items and people, the original factor analysis of items (when these are Likert scales) selects not just those items that are in a similar domain but, more generally, those items that have a similar consensus. The added advantage of this factor analytic technique is that items are automatically arranged along a factor so that the highest Likert ratings are also the highest CBA standard scores. Once selected, that factor determines the CBA (component) scores.
Critiques
The most common critique of CBA standards is to question how an average could possibly be a maximal standard. This critique argues that CBA is unsuitable for maximum-performance tests of psychological attributes, especially intelligence. Even so, CBA techniques are routinely employed in various measures of non-traditional intelligences (e.g., practical, emotional, social, etc.). Detailed critiques are presented in Gottfredson (2003) and MacCann, Roberts, Matthews, & Zeidner (2004) as well as elsewhere in the scientific literature.
See also
- Collective intelligence
- Consensus
- Consensus decision-making
- Consensus democracy
- Consensus theory of truth
- Emotional intelligence
- Intelligence
- Participation (decision making)
- Polder model
- Social representations
- Unanimity
- Libertarian socialism
External links
- Meta Collab - a free collaborative encyclopedia on collaboration.
- Information and Collaboration Technologies (see Chapter 5): Managing Collective Intelligence, Toward a New Corporate Governance
- Smart Mobs
- The Wisdom of Crowds