Computerized classification test
A computerized classification test (CCT) is, as its name suggests, a test administered by computer for the purpose of classifying examinees. The most common CCT is a mastery test, where the test classifies examinees as "Pass" or "Fail," but the term also includes tests that classify examinees into more than two categories. While the term may generally be considered to refer to all computer-administered tests for classification, it is usually used to refer to tests that are interactively administered or of variable length, similar to computerized adaptive testing (CAT). Like CAT, variable-length CCTs can accomplish the goal of the test (accurate classification) with a fraction of the number of items used in a conventional fixed-form test.
A CCT requires several components:
1. An item bank calibrated with a psychometric model selected by the test designer
2. A starting point
3. An item selection algorithm
4. A termination criterion and scoring procedure
The starting point is not a topic of contention; research on CCT primarily investigates the application of different methods for the other three components. Note: the termination criterion and scoring procedure are separate in CAT but the same in CCT, because a CCT is terminated as soon as a classification is made; a CAT therefore requires five components to be specified rather than the four listed above.
An introduction to CCT is found in Thompson (2007) and in a book by Parshall, Spray, Kalohn, and Davey (2006). A bibliography of published CCT research appears below.
How a CCT Works
A CCT is very similar to a CAT. Items are administered one at a time to an examinee. After the examinee responds to the item, the computer scores it and determines whether the examinee can be classified yet. If so, the test is terminated and the examinee is classified. If not, another item is administered. This process repeats until the examinee is classified or another ending point is satisfied (all items in the bank have been administered, or a maximum test length is reached).
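This administration cycle can be sketched in code. The following is a minimal illustration, assuming dichotomously scored items; the functions passed in (select_next_item, administer, update_estimate, try_classify) are hypothetical placeholders for whichever item selection algorithm, scoring model, and termination criterion the test designer chooses, not a real API.

```python
# Minimal sketch of the generic CCT administration loop described above.
# The injected functions are hypothetical placeholders; try_classify is
# assumed to return None while no classification can yet be made, and to
# accept force=True to produce a best-guess classification at the end.

def run_cct(item_bank, max_items, select_next_item, administer,
            update_estimate, try_classify):
    available = list(item_bank)
    responses = []                  # (item, 0/1 score) pairs accumulated so far
    estimate = 0.0                  # starting point (see "Starting point" below)

    while available and len(responses) < max_items:
        item = select_next_item(available, estimate)  # adaptive or sequential
        available.remove(item)
        score = administer(item)                      # examinee's scored response
        responses.append((item, score))
        estimate = update_estimate(responses)         # provisional theta estimate
        decision = try_classify(responses, estimate)  # e.g., SPRT or CI check
        if decision is not None:                      # classification reached
            return decision                           # e.g., "Pass" or "Fail"

    # Bank exhausted or maximum length reached: force a classification
    return try_classify(responses, estimate, force=True)
```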
Psychometric Model
Two approaches are available for the psychometric model of a CCT: classical test theory (CTT) and item response theory (IRT). Classical test theory assumes a state model because it is applied by determining item parameters for a sample of examinees determined to be in each category. For instance, several hundred "masters" and several hundred "nonmasters" might be sampled to determine the difficulty and discrimination of each item, but doing so requires that a distinct set of people in each group can be readily identified. IRT, on the other hand, assumes a trait model: the knowledge or ability measured by the test is a continuum. The classification groups must be defined more or less arbitrarily along that continuum, such as by using a cutscore to demarcate masters and nonmasters, but the specification of item parameters assumes a trait model.
There are advantages and disadvantages to each. CTT offers greater conceptual simplicity. More importantly, CTT requires fewer examinees in the sample used to calibrate the item parameters for the CCT, making it useful for smaller testing programs; see Frick (1992) for a description of a CTT-based CCT. Most CCTs, however, utilize IRT. IRT offers greater specificity, but the most important reason may be that the design of a CCT (and a CAT) is expensive, and is therefore more likely to be done by a large testing program with extensive resources. Such a program would likely use IRT.
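For reference, the IRT model most often seen in the CCT literature (e.g., the 3-parameter logistic model cited in the bibliography below) gives the probability of a correct response to item i as a function of the examinee's trait level theta:

```latex
P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp\left[-a_i(\theta - b_i)\right]}
```

where a_i is the item's discrimination, b_i its difficulty, and c_i its pseudo-guessing (lower asymptote) parameter.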
Starting point
A CCT must have a specified starting point to enable certain algorithms. If the sequential probability ratio test is used as the termination criterion, it implicitly assumes a starting ratio of 1.0 (equal probability of the examinee being a master or nonmaster). If the termination criterion is a confidence interval approach, a starting point on theta must be specified. Usually this is 0.0, the center of the distribution, but it could also be drawn at random from a certain distribution if the parameters of the examinee distribution are known. Previous information regarding an individual examinee, such as their score the last time they took the test (if re-taking), may also be used.
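The three options just described can be summarized in a small sketch; the function name, signature, and the assumption of a normal examinee distribution are all illustrative rather than prescribed by any particular CCT.

```python
import random

def starting_theta(prior_score=None, pop_mean=0.0, pop_sd=1.0, randomize=False):
    """Pick a starting theta for a new CCT session.

    Order of preference, per the options above: a score carried over from a
    previous attempt, a random draw from the (assumed known) examinee
    distribution, or the center of the distribution.
    """
    if prior_score is not None:       # re-taker: reuse the last theta estimate
        return prior_score
    if randomize:                     # draw from the examinee distribution
        return random.gauss(pop_mean, pop_sd)
    return pop_mean                   # default: center of the distribution (0.0)
```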
Item Selection
In a CCT, items are selected for administration throughout the test, unlike the traditional method of administering a fixed set of items to all examinees. While this is usually done one item at a time, it can also be done in groups of items known as testlets (Luecht & Nungester, 1996; Vos & Glas, 2000).

Methods of item selection fall into two categories: cutscore-based and estimate-based. Cutscore-based methods (also known as sequential selection) maximize the information provided by the item at the cutscore, or cutscores if there is more than one, regardless of the ability of the examinee. Estimate-based methods (also known as adaptive selection) maximize information at the current estimate of examinee ability, regardless of the location of the cutscore. Both work efficiently, but the efficiency depends in part on the termination criterion employed. Because the sequential probability ratio test only evaluates probabilities near the cutscore, cutscore-based item selection is more appropriate. Because the confidence interval termination criterion is centered around the examinee's ability estimate, estimate-based item selection is more appropriate. This is because the test will make a classification when the confidence interval is small enough to be completely above or below the cutscore (see below). The confidence interval will be smaller when the standard error of measurement is smaller, and the standard error of measurement will be smaller when there is more information at the theta level of the examinee.
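The difference between the two approaches comes down to the theta value at which item information is evaluated. The sketch below is a simplified illustration assuming a two-parameter logistic (2PL) model, for which item information is a²p(1−p); the item bank and parameter values are invented for the example.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_item(items, target_theta):
    """Return the remaining item with maximum information at target_theta."""
    return max(items, key=lambda item: item_information(target_theta, item["a"], item["b"]))

# Toy item bank: a = discrimination, b = difficulty
bank = [{"a": 1.2, "b": -0.5}, {"a": 0.8, "b": 0.0}, {"a": 1.5, "b": 0.7}]

cutscore = 0.0
theta_estimate = 1.1
sequential_pick = select_item(bank, cutscore)        # cutscore-based selection
adaptive_pick = select_item(bank, theta_estimate)    # estimate-based selection
```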
Termination criterion
There are three termination criteria commonly used for CCTs. Bayesian decision theory methods offer great flexibility by presenting an infinite choice of loss/utility structures and evaluation considerations, but also introduce greater arbitrariness. A confidence interval approach calculates a confidence interval around the examinee's current theta estimate at each point in the test, and classifies the examinee when the interval falls completely within a region of theta that defines a classification. This was originally known as adaptive mastery testing (Kingsbury & Weiss, 1983), but does not necessarily require adaptive item selection, nor is it limited to the two-classification mastery testing situation. The sequential probability ratio test (Reckase, 1983) defines the classification problem as a hypothesis test that the examinee's theta is equal to a specified point above the cutscore or a specified point below the cutscore.
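Concretely, in Wald's (1947) SPRT as adapted to mastery testing, the two hypotheses are set a small distance delta below and above the cutscore, and after n dichotomous responses the likelihood ratio of the observed response pattern is compared with thresholds derived from the nominal error rates alpha and beta:

```latex
H_0:\ \theta = \theta_0 - \delta \qquad H_1:\ \theta = \theta_0 + \delta

LR_n = \prod_{i=1}^{n}
\frac{P_i(\theta_0 + \delta)^{x_i}\,\left[1 - P_i(\theta_0 + \delta)\right]^{1 - x_i}}
     {P_i(\theta_0 - \delta)^{x_i}\,\left[1 - P_i(\theta_0 - \delta)\right]^{1 - x_i}}
```

The examinee is classified above the cutscore if LR_n ≥ (1 − β)/α, below the cutscore if LR_n ≤ β/(1 − α), and otherwise another item is administered.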
A bibliography of CCT research
- Armitage, P. (1950). Sequential analysis with more than two alternative hypotheses, and its relation to discriminant function analysis. Journal of the Royal Statistical Society, 12, 137-144.
- Braun, H., Bejar, I.I., and Williamson, D.M. (2006). Rule-based methods for automated scoring: Application in a licensing context. In Williamson, D.M., Mislevy, R.J., and Bejar, I.I. (Eds.), Automated scoring of complex tasks in computer-based testing. Mahwah, NJ: Erlbaum.
- Dodd, B. G., De Ayala, R. J., & Koch, W. R. (1995). Computerized adaptive testing with polytomous items. Applied Psychological Measurement, 19, 5-22.
- Eggen, T. J. H. M. (1999). Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23, 249-261.
- Eggen, T. J. H. M, & Straetmans, G. J. J. M. (2000). Computerized adaptive testing for classifying examinees into three categories. Educational and Psychological Measurement, 60, 713-734.
- Epstein, K. I., & Knerr, C. S. (1977). Applications of sequential testing procedures to performance testing. Paper presented at the 1977 Computerized Adaptive Testing Conference, Minneapolis, MN.
- Ferguson, R. L. (1969). The development, implementation, and evaluation of a computer-assisted branched test for a program of individually prescribed instruction. Unpublished doctoral dissertation, University of Pittsburgh.
- Frick, T. W. (1989). Bayesian adaptation during computer-based tests and computer-guided exercises. Journal of Educational Computing Research, 5, 89-114.
- Frick, T. W. (1990). A comparison of three decisions models for adapting the length of computer-based mastery tests. Journal of Educational Computing Research, 6, 479-513.
- Frick, T. W. (1992). Computerized adaptive mastery tests as expert systems. Journal of Educational Computing Research, 8, 187-213.
- Huang, C.-Y., Kalohn, J.C., Lin, C.-J., and Spray, J. (2000). Estimating Item Parameters from Classical Indices for Item Pool Development with a Computerized Classification Test. (Research Report 2000-4). Iowa City, IA: ACT, Inc.
- Jacobs-Cassuto, M.S. (2005). A comparison of adaptive mastery testing using testlets with the 3-parameter logistic model. Unpublished doctoral dissertation, University of Minnesota, Minneapolis, MN.
- Jiao, H., & Lau, A. C. (2003). The Effects of Model Misfit in Computerized Classification Test. Paper presented at the annual meeting of the National Council of Educational Measurement, Chicago, IL, April 2003.
- Jiao, H., Wang, S., & Lau, C. A. (2004). An Investigation of Two Combination Procedures of SPRT for Three-category Classification Decisions in Computerized Classification Test. Paper presented at the annual meeting of the American Educational Research Association, San Antonio, April 2004.
- Kalohn, J. C., & Spray, J. A. (1999). The effect of model misspecification on classification decisions made using a computerized test. Journal of Educational Measurement, 36, 47-59.
- Kingsbury, G.G., & Weiss, D.J. (1979). An adaptive testing strategy for mastery decisions. Research report 79-05. Minneapolis: University of Minnesota, Psychometric Methods Laboratory.
- Kingsbury, G.G., & Weiss, D.J. (1983). A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. In D. J. Weiss (Ed.), New horizons in testing: Latent trait theory and computerized adaptive testing (pp. 237–254). New York: Academic Press.
- Lau, C. A. (1996). Robustness of a unidimensional computerized testing mastery procedure with multidimensional testing data. Unpublished doctoral dissertation, University of Iowa, Iowa City IA.
- Lau, C. A., & Wang, T. (1998). Comparing and combining dichotomous and polytomous items with SPRT procedure in computerized classification testing. Paper presented at the annual meeting of the American Educational Research Association, San Diego.
- Lau, C. A., & Wang, T. (1999). Computerized classification testing under practical constraints with a polytomous model. Paper presented at the annual meeting of the American Educational Research Association, Montreal, Canada.
- Lau, C. A., & Wang, T. (2000). A new item selection procedure for mixed item type in computerized classification testing. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, Louisiana.
- Lewis, C., & Sheehan, K. (1990). Using Bayesian decision theory to design a computerized mastery test. Applied Psychological Measurement, 14, 367-386.
- Lin, C.-J. & Spray, J.A. (2000). Effects of item-selection criteria on classification testing with the sequential probability ratio test. (Research Report 2000-8). Iowa City, IA: ACT, Inc.
- Linn, R. L., Rock, D. A., & Cleary, T. A. (1972). Sequential testing for dichotomous decisions. Educational & Psychological Measurement, 32, 85-95.
- Luecht, R. M. (1996). Multidimensional Computerized Adaptive Testing in a Certification or Licensure Context. Applied Psychological Measurement, 20, 389-404.
- Reckase, M. D. (1983). A procedure for decision making using tailored testing. In D. J. Weiss (Ed.), New horizons in testing: Latent trait theory and computerized adaptive testing (pp. 237–254). New York: Academic Press.
- Rudner, L. M. (2002). An examination of decision-theory adaptive testing procedures. Paper presented at the annual meeting of the American Educational Research Association, April 1–5, 2002, New Orleans, LA.
- Sheehan, K., & Lewis, C. (1992). Computerized mastery testing with nonequivalent testlets. Applied Psychological Measurement, 16, 65-76.
- Spray, J. A. (1993). Multiple-category classification using a sequential probability ratio test (Research Report 93-7). Iowa City, Iowa: ACT, Inc.
- Spray, J. A., Abdel-fattah, A. A., Huang, C., and Lau, C. A. (1997). Unidimensional approximations for a computerized test when the item pool and latent space are multidimensional (Research Report 97-5). Iowa City, Iowa: ACT, Inc.
- Spray, J. A., & Reckase, M. D. (1987). The effect of item parameter estimation error on decisions made using the sequential probability ratio test (Research Report 87-17). Iowa City, IA: ACT, Inc.
- Spray, J. A., & Reckase, M. D. (1994). The selection of test items for decision making with a computerized adaptive test. Paper presented at the Annual Meeting of the National Council for Measurement in Education (New Orleans, LA, April 5–7, 1994).
- Spray, J. A., & Reckase, M. D. (1996). Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using a computerized test. Journal of Educational & Behavioral Statistics, 21, 405-414.
- Thompson, N.A. (2006). Variable-length computerized classification testing with item response theory. CLEAR Exam Review, 17(2).
- Vos, H. J. (1998). Optimal sequential rules for computer-based instruction. Journal of Educational Computing Research, 19, 133-154.
- Vos, H. J. (1999). Applications of Bayesian decision theory to sequential mastery testing. Journal of Educational and Behavioral Statistics, 24, 271-292.
- Wald, A. (1947). Sequential analysis. New York: Wiley.
- Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361-375.
- Weissman, A. (2004). Mutual information item selection in multiple-category classification CAT. Paper presented at the Annual Meeting of the National Council for Measurement in Education, San Diego, CA.
- Weitzman, R. A. (1982a). Sequential testing for selection. Applied Psychological Measurement, 6, 337-351.
- Weitzman, R. A. (1982b). Use of sequential testing to prescreen prospective entrants into military service. In D. J. Weiss (Ed.), Proceedings of the 1982 Computerized Adaptive Testing Conference. Minneapolis, MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, 1982.
External links
- Measurement Decision Theory by Lawrence Rudner
- CAT Central by David J. Weiss