Standard-setting study

A standard-setting study is an official research study conducted by a test-sponsoring organization to determine a cutscore for the test. A cutscore, also known as a passing score or passing point, is a single point on a score continuum that differentiates between classifications along the continuum.

To be legally defensible in the USA and meet the Standards for Educational and Psychological Testing (a set of testing standards developed jointly by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education), a cutscore cannot be arbitrarily determined; it must be empirically justified. For example, the organization cannot merely decide that the cutscore will be 70% correct. Instead, a study is conducted to determine what score best differentiates the classifications of examinees, such as competent vs. incompetent.

Standard-setting studies are often performed using focus groups of 5-15 subject matter experts who represent key stakeholders for the test. For example, in setting cutscores for educational testing, the experts might be instructors familiar with the capabilities of the student population for the test.

Types of standard-setting studies

Standard-setting studies fall into two categories: item-centered and person-centered. Examples of item-centered methods include the Angoff, Ebel, Nedelsky, and Bookmark methods, while examples of person-centered methods include the Borderline Survey and Contrasting Groups approaches. The categories reflect the focus of the analysis: in item-centered studies, the organization evaluates items with respect to a given population of persons, whereas in person-centered studies it evaluates persons with respect to a given set of items.

Item-centered studies

The Angoff approach is very widely used. This method requires assembling a group of subject matter experts, who are asked to evaluate each item and estimate the proportion of minimally competent examinees who would answer it correctly. The ratings are averaged across raters for each item and then summed to obtain a panel-recommended raw cutscore. This cutscore represents the score that the panel estimates a minimally competent candidate would get. The ratings are, of course, subject to decision biases such as the overconfidence effect, a well-established bias in which someone's subjective confidence in their judgments is reliably greater than their objective accuracy. Calibrating the ratings against other, more objective sources of data is preferable.
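The averaging-then-summing step described above can be sketched in a few lines of Python. This is only an illustration: the function name angoff_cutscore and the example ratings are hypothetical, and the sketch assumes every rater has rated every item.

```python
from statistics import mean


def angoff_cutscore(ratings):
    """Return a panel-recommended raw cutscore from Angoff ratings.

    ratings[r][i] is rater r's estimate of the proportion of minimally
    competent examinees who would answer item i correctly (0.0 to 1.0).
    Assumes every rater rated every item (hypothetical input layout).
    """
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every rater must rate every item")
    # Average across raters for each item ...
    item_means = [mean(row[i] for row in ratings) for i in range(n_items)]
    # ... then sum the item averages to get the raw cutscore.
    return sum(item_means)


# Two hypothetical raters, four items.
example_ratings = [
    [0.5, 0.75, 1.0, 0.25],
    [0.5, 0.25, 1.0, 0.75],
]
example_cut = angoff_cutscore(example_ratings)
print(example_cut)  # 2.5 out of 4 raw score points
```

With these made-up ratings, the panel would expect a minimally competent candidate to answer 2.5 of the 4 items correctly.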

The Bookmark method is another widely used item-centered approach. Items in a test (or a subset of them) are ordered by difficulty, and each expert places a "bookmark" in the sequence at the location of the cutscore.
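As a rough sketch of the ordering and bookmarking steps, the Python below sorts items from easiest to hardest and combines the experts' bookmarks. Several things here are assumptions rather than part of the description above: difficulty is represented by a hypothetical proportion-correct value per item, bookmarks are recorded as positions in the ordered list, and the panel's placements are combined with a median. Operational Bookmark studies typically translate the bookmark location into a cutscore using a measurement model, which this sketch does not attempt.

```python
from statistics import median


def order_items_by_difficulty(p_values):
    """Return item ids sorted from easiest to hardest.

    p_values maps an item id to the proportion of examinees who answered
    it correctly (a hypothetical difficulty measure; higher means easier).
    """
    return sorted(p_values, key=lambda item: p_values[item], reverse=True)


def panel_bookmark(bookmarks):
    """Combine individual bookmark positions (assumed rule: the median)."""
    return median(bookmarks)


# Hypothetical item difficulties and three experts' bookmark positions.
p_values = {"A": 0.9, "B": 0.4, "C": 0.7, "D": 0.55}
ordered = order_items_by_difficulty(p_values)
bookmarks = [2, 3, 3]
panel = panel_bookmark(bookmarks)
print(ordered, panel)
```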

Person-centered studies

Rather than evaluating the items that distinguish competent candidates, person-centered studies evaluate the examinees themselves. While this might seem more appropriate, it is often more difficult because examinees, unlike a list of items, are not a captive population.

For example, if a new test comes out regarding new content (as often happens with information technology tests), the test could be given to an initial sample, called a beta sample, along with a survey of professional characteristics. The testing organization could then analyze and evaluate the relationship between the test scores and important characteristics such as skills, education, and experience. The cutscore could be set as the score that best differentiates between those examinees characterized as "passing" and those characterized as "failing."
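One way to make "the score that best differentiates" concrete is to pick the cutscore that misclassifies the fewest beta-sample examinees. The Python sketch below does that under stated assumptions: each examinee already carries a "passing" or "failing" label derived from the survey (how that label is assigned is not modeled), a score at or above the cut counts as passing, and ties are broken in favor of the lowest cut. The function name and data are hypothetical.

```python
def contrasting_groups_cutscore(scores, passing):
    """Return (cut, errors) minimizing misclassified examinees.

    scores[k] is examinee k's test score; passing[k] is True if the
    survey characterizes that examinee as "passing" (assumed given).
    An examinee is classified as passing when score >= cut.
    """
    # Candidate cuts: every observed score, plus one above the maximum
    # so that "nobody passes" is also considered.
    candidates = sorted(set(scores)) + [max(scores) + 1]
    best_cut, best_errors = None, None
    for cut in candidates:
        errors = sum((s >= cut) != p for s, p in zip(scores, passing))
        # Strict comparison keeps the lowest cut when errors tie.
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


# Hypothetical beta sample of eight examinees.
beta_scores = [40, 45, 50, 55, 60, 65, 70, 75]
beta_passing = [False, False, False, True, False, True, True, True]
result = contrasting_groups_cutscore(beta_scores, beta_passing)
print(result)  # (55, 1)
```

Minimizing the raw error count treats false passes and false fails as equally costly; a testing organization that weighs them differently would need a different criterion.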

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.