Test validity
Test validity concerns the test and assessment procedures used in psychological and educational testing, and the extent to which these measure what they purport to measure. “Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests.” Although classical models divided the concept into various "validities," such as content validity, criterion validity, and construct validity, the modern view is that validity is a single unitary construct.

Introduction

Validity is the most important issue in psychological and educational testing because it concerns the meaning placed on test results. Though many textbooks present validity as a static construct, various models of validity have evolved since the first published recommendations for constructing psychological and educational tests. These models can be categorized into two primary groups: classical models, which include several types of validity, and modern models, which present validity as a single construct. The modern models reorganize classical "validities" into either "aspects" of validity or types of validity-supporting evidence.

Historical background

Although psychologists and educators were aware of several facets of validity before World War II, their methods for establishing validity were commonly restricted to correlations of test scores with some known criterion. Under the direction of Lee Cronbach, the 1954 Technical Recommendations for Psychological Tests and Diagnostic Techniques attempted to clarify and broaden the scope of validity by dividing it into four parts: (a) concurrent validity, (b) predictive validity, (c) content validity, and (d) construct validity. Cronbach and Meehl’s subsequent publication grouped predictive and concurrent validity into a "criterion-orientation", which eventually became criterion validity.

Over the next four decades, many theorists, including Cronbach himself, voiced their dissatisfaction with this three-in-one model of validity. Their arguments culminated in Samuel Messick’s 1995 article that described validity as a single construct composed of six "aspects". In his view, various inferences made from test scores may require different types of evidence, but not different validities.

The 1999 Standards for Educational and Psychological Testing largely codified Messick’s model. They describe five types of validity-supporting evidence that incorporate each of Messick’s aspects, and make no mention of the classical models’ content, criterion, and construct validities.

Validation process

According to the 1999 Standards, validation is the process of gathering evidence to provide “a sound scientific basis” for interpreting the scores as proposed by the test developer and/or the test user. Validation therefore begins with a framework that defines the scope and aspects (in the case of multi-dimensional scales) of the proposed interpretation. The framework also includes a rational justification linking the interpretation to the test in question.

Validity researchers then list a series of propositions that must be met if the interpretation is to be valid. Conversely, they may compile a list of issues that may threaten the validity of the interpretation. In either case, the researchers proceed by gathering evidence – be it original empirical research, meta-analysis or review of existing literature, or logical analysis of the issues – to support or to question the interpretation’s propositions (or the threats to the interpretation’s validity). Emphasis is placed on the quality, rather than the quantity, of the evidence.

A single interpretation of any test may require several propositions to be true (or may be questioned by any one of a set of threats to its validity). Strong evidence in support of a single proposition does not lessen the requirement to support the other propositions.

Evidence to support (or question) the validity of an interpretation falls into one of five categories:
  1. Evidence based on test content
  2. Evidence based on response processes
  3. Evidence based on internal structure
  4. Evidence based on relations to other variables
  5. Evidence based on consequences of testing


Techniques to gather each type of evidence should only be employed when they yield information that would support or question the propositions required for the interpretation in question.
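
As one illustration of the fourth category (evidence based on relations to other variables), the correlation between test scores and an external criterion can be computed directly. The following is a minimal sketch, not a procedure prescribed by the Standards: the function name pearson_r, the choice of the Pearson coefficient, and the data are all assumptions made for illustration, and the numbers are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data, invented for illustration only:
# test scores and a criterion measure (e.g. later performance ratings)
test_scores = [52, 61, 70, 74, 85, 90]
criterion_ratings = [2.9, 3.1, 3.6, 3.4, 4.2, 4.5]

r = pearson_r(test_scores, criterion_ratings)
print(f"correlation with criterion: r = {r:.2f}")
```

A coefficient of this kind supports only the specific proposition it addresses (that scores relate to the criterion). As noted above, it does not lessen the need for evidence on the other propositions behind the interpretation.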

Each piece of evidence is finally integrated into a validity argument. The argument may call for a revision to the test, its administration protocol, or the theoretical constructs underlying the interpretations. If the test and/or the interpretations meant to be made of the test’s results are revised in any way, a new validation process must gather evidence to support the new version.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 