Computer-adaptive testing: Dichotomous and Polytomous CAT:
A Bayesian Maximum-Falsification [maximum accuracy] approach
as an alternative to the usual Maximum-Information [maximum precision] approach

Note: Most CAT methods are based on "maximum information". With rating scales, these methods target respondents at the center of the rating scale. Essentially, the central rating scale item difficulty substitutes for the dichotomous item difficulty as far as item selection is concerned, and the method at www.rasch.org/rmt/rmt122q.htm is used for the estimation. Here is a CAT method based on "maximum falsification": the tests are slightly longer, but yield a greater variety of expected responses. It is more complicated to operationalize, so start with the "maximum information" method, and if respondents say "this test is too bland", then switch to "maximum falsification".

Item selection and examinee measure estimation methods for computer-adaptive testing (CAT) have been motivated by the impractical and largely irrelevant goal of minimizing test length. In practice, there is little inefficiency or inconvenience associated with administering a few extra items to each examinee. In contrast, there is considerable benefit in administering longer tests in order to improve content balance and coverage, and to equalize item use.

A further benefit would be to implement CAT algorithms that are easy to check for correct operation and have face validity to non-specialists. The abstruse maximum-information item selection methods do not meet this need. A simpler Bayesian approach may serve. Its correct functioning is easy to verify, and it is also easy to explain in concrete, raw score terms.

Before we administer a dichotomous item, what performance level do we expect of examinees who will fail that item? Clearly we expect their performance to be nowhere near success, "1". But we would not administer the item to these examinees if we expected them to be clear failures, "0". So their expected performance level must be between "0" and "1", but on the failing side, less than "0.5". A Bayesian position could be that the expected performance of examinees who will fail an item is halfway between "0" and "0.5", say, 0.25 score points. Similarly, the expected performance for examinees who will succeed could be 0.75 score points.

From this standpoint, and in the absence of other information, our guess at the ability measure of examinees who succeed on an item is the measure corresponding to 0.75 score points on that item, i.e., ln(0.75/0.25) = 1.1 logits above that item's calibration. Similarly, our guess for examinees who fail is 1.1 logits below the item's calibration. Since the information in 0.75 score points is 0.75*0.25, the variance of that one-item ability measure is 1/(0.75*0.25) = 5.3, or roughly 5. Thus, after administering Ln dichotomous items to examinee n, that examinee's current ability estimate, Bn, can be approximated by the mean of the distribution of the one-item ability estimates:

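$$B_n = \frac{1}{L_n} \sum_{i=1}^{L_n} \left( D_i + 1.1\,(2X_{ni} - 1) \right)$$

with standard error

$$SE(B_n)^2 = \frac{1}{0.75 \times 0.25 \times L_n}$$

where D_i is the calibration of item i, and X_{ni} is the response of examinee n to item i, scored 1 for success and 0 for failure.
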
But a better estimate is the maximum likelihood estimate (MLE), computed according to RMT 12:2 p. 638 (polytomies) or RMT 10:2 p. 449 (dichotomies).
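
As an illustration of the arithmetic described above, the following Python sketch computes the approximate ability estimate from dichotomous responses. The function name `approximate_ability` and its arguments are hypothetical and not part of the original note. The standard error is an assumption: it treats the Ln one-item estimates as independent, each with variance 1/(0.75*0.25).

```python
import math

# Logit distance from an item's calibration to an expected score of 0.75
# (about 1.1 logits), and the variance of a one-item ability estimate.
LOGIT_OFFSET = math.log(0.75 / 0.25)
ONE_ITEM_VARIANCE = 1.0 / (0.75 * 0.25)


def approximate_ability(difficulties, responses):
    """Return (Bn, SE) from item calibrations and 0/1 responses.

    Each response contributes a one-item estimate: the item calibration
    plus 1.1 logits for a success, minus 1.1 logits for a failure.
    Bn is the mean of those one-item estimates.
    """
    if not difficulties or len(difficulties) != len(responses):
        raise ValueError("need one response per administered item")
    one_item_estimates = [
        d + LOGIT_OFFSET * (2 * x - 1)
        for d, x in zip(difficulties, responses)
    ]
    count = len(one_item_estimates)
    ability = sum(one_item_estimates) / count
    # Assumption: independent one-item estimates, so the variance of
    # their mean is the one-item variance divided by the item count.
    standard_error = math.sqrt(ONE_ITEM_VARIANCE / count)
    return ability, standard_error


if __name__ == "__main__":
    # Success on an item at 0.0 logits, failure on an item at 1.0 logits.
    print(approximate_ability([0.0, 1.0], [1, 0]))
```

Because the estimate is a simple mean, it can be checked by hand from the item calibrations and the raw responses.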

This estimate of Bn selects the next item for administration. Only at the conclusion of the test is a more exact algorithm used to produce a more precise estimate and S.E. Indeed, more precise estimates may only be needed for examinees close to criterion levels.
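
For comparison, a more exact end-of-test estimate for dichotomous items can be sketched with a standard Newton-Raphson maximum likelihood iteration. This is a generic sketch that assumes the dichotomous Rasch model with known item calibrations. It is not necessarily the exact procedure in the RMT notes cited above, and the name `rasch_mle`, the 1-logit step limit and the convergence tolerance are illustrative choices.

```python
import math


def success_probability(ability, difficulty):
    """Dichotomous Rasch model probability of success."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def rasch_mle(difficulties, responses, start=0.0, max_iter=50, tol=1e-6):
    """Return (ability, SE) by Newton-Raphson for 0/1 responses.

    `start` can be the approximate Bayesian estimate, Bn.
    """
    raw_score = sum(responses)
    if raw_score == 0 or raw_score == len(responses):
        raise ValueError("no finite MLE exists for an extreme score")
    ability = start
    for _ in range(max_iter):
        probs = [success_probability(ability, d) for d in difficulties]
        information = sum(p * (1.0 - p) for p in probs)
        step = (raw_score - sum(probs)) / information
        # Limit each change to 1 logit to keep the iteration stable.
        step = max(-1.0, min(1.0, step))
        ability += step
        if abs(step) < tol:
            break
    # Recompute the information at the final estimate for the SE.
    probs = [success_probability(ability, d) for d in difficulties]
    information = sum(p * (1.0 - p) for p in probs)
    return ability, 1.0 / math.sqrt(information)


if __name__ == "__main__":
    print(rasch_mle([-1.0, 1.0], [1, 0], start=0.5))
```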

This Bayesian approach extends to partial credit items. For a multiple category item, scored 0, 1, 2,.., k-1, k, the expected prior performance of those who score in the intermediate categories can correspond to the values of those categories. We expect examinees who score "1" to be "1"-level examinees, etc. We treat the extreme categories like dichotomies, so that the set of expected scores becomes 0.25, 1, 2,.., k-1, k-0.25. The logit values of these expected scores can be estimated for any CAT item with a calibrated partial credit scale and incorporated into the item bank.
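
One way to obtain these logit values is to invert the item's expected-score curve numerically. The sketch below assumes a partial credit parameterization in which each item has k step difficulties, in logits, and finds the measure for each expected score by bisection. The function names and the search bounds of -20 to +20 logits are assumptions made for illustration.

```python
import math


def expected_score(ability, step_difficulties):
    """Expected score on a partial credit item scored 0..k."""
    log_numerators = [0.0]
    for step in step_difficulties:
        log_numerators.append(log_numerators[-1] + ability - step)
    peak = max(log_numerators)  # subtract the peak to avoid overflow
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return sum(x * w for x, w in enumerate(weights)) / total


def measure_at_expected_score(target, step_difficulties,
                              low=-20.0, high=20.0, tol=1e-6):
    """Measure at which the expected score equals `target` (bisection)."""
    top = len(step_difficulties)
    if not 0.0 < target < top:
        raise ValueError("target must lie strictly between 0 and k")
    while high - low > tol:
        middle = (low + high) / 2.0
        if expected_score(middle, step_difficulties) < target:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0


def bayesian_score_measures(step_difficulties):
    """Measures for expected scores 0.25, 1, 2, ..., k-1, k-0.25."""
    top = len(step_difficulties)
    targets = [0.25] + [float(x) for x in range(1, top)] + [top - 0.25]
    return [measure_at_expected_score(t, step_difficulties) for t in targets]


if __name__ == "__main__":
    # A three-category item (k = 2) with steps at -1 and +1 logits.
    print(bayesian_score_measures([-1.0, 1.0]))
```

For a dichotomous item (one step), the same function returns measures about 1.1 logits below and above the item calibration, which agrees with the dichotomous case.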

After administering Ln partial credit items to examinee n, that examinee's current ability estimate, Bn, can be approximated by:

$$B_n = \frac{1}{L_n} \sum_{i=1}^{L_n} D_{ix}$$

with standard error

SE Bn


where Dix is the measure on item i corresponding to an expected score of x (using 0.25 when x = 0, and k-0.25 when x = k, the top category).

But a better estimate is the maximum likelihood estimate (MLE), computed according to RMT 12:2 p. 638.
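
The approximate polytomous estimate can be sketched as a table lookup followed by a mean, by analogy with the dichotomous case. The layout of `score_measures`, with one list of Dix values per administered item, is an assumption made for illustration, as is the function name. Only the point estimate is sketched here.

```python
def approximate_polytomous_ability(score_measures, responses):
    """Mean of the Dix values for the observed scores.

    score_measures[i][x] holds Dix, the measure on item i for an
    expected score of x, with the bottom and top categories already
    stored at expected scores of 0.25 and k-0.25.
    responses[i] is the observed score x on item i.
    """
    if not responses or len(score_measures) != len(responses):
        raise ValueError("need one response per administered item")
    one_item_estimates = [
        score_measures[i][x] for i, x in enumerate(responses)
    ]
    return sum(one_item_estimates) / len(one_item_estimates)


if __name__ == "__main__":
    # Two three-category items with hypothetical Dix values in logits.
    table = [[-2.0, 0.0, 2.0], [-1.0, 1.0, 3.0]]
    print(approximate_polytomous_ability(table, [2, 0]))
```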

With this approach, items are selected to give examinees the greatest opportunity to demonstrate their performance level. Item selection is based on score-level transitions, 0.5, 1.5, 2.5,..., k-1.5, k-0.5. Each successive response clarifies whether the examinee is performing higher or lower than currently estimated. The selection algorithm selects at random any item with a transition point near Bn. Transition points will border low scores on hard items and high scores on easy items. The Figure shows score-levels, their ability estimates, transition points and their difficulties for a typical item.
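
The selection rule can be sketched as follows. The sketch assumes a partial credit item bank stored as step difficulties per item, locates each transition point by bisection on the expected-score curve, and treats "near Bn" as within a window of 0.5 logits. The window size, the fallback to the nearest item, and all function names are assumptions made for illustration.

```python
import math
import random


def expected_score(ability, step_difficulties):
    """Expected score on a partial credit item scored 0..k."""
    log_numerators = [0.0]
    for step in step_difficulties:
        log_numerators.append(log_numerators[-1] + ability - step)
    peak = max(log_numerators)
    weights = [math.exp(v - peak) for v in log_numerators]
    return sum(x * w for x, w in enumerate(weights)) / sum(weights)


def measure_at_expected_score(target, step_difficulties, tol=1e-6):
    """Bisection for the measure at which the expected score is `target`."""
    low, high = -20.0, 20.0
    while high - low > tol:
        middle = (low + high) / 2.0
        if expected_score(middle, step_difficulties) < target:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0


def transition_measures(step_difficulties):
    """Measures at the score transitions 0.5, 1.5, ..., k-0.5."""
    return [measure_at_expected_score(x + 0.5, step_difficulties)
            for x in range(len(step_difficulties))]


def select_item(bank, ability, window=0.5, rng=random):
    """Pick at random an item with a transition point near `ability`.

    bank maps item identifiers to lists of step difficulties.
    """
    distance = {
        item: min(abs(t - ability) for t in transition_measures(steps))
        for item, steps in bank.items()
    }
    candidates = [item for item, gap in distance.items() if gap <= window]
    if not candidates:
        # Assumed fallback: use the item with the closest transition.
        candidates = [min(distance, key=distance.get)]
    return rng.choice(candidates)


if __name__ == "__main__":
    bank = {"easy": [-3.0, -2.0], "hard": [2.0, 3.0]}
    print(select_item(bank, 1.5))
```

In an operational test, items already administered to the examinee would be removed from `bank` before each call.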

[Figure: Item characteristic curve for Bayesian CAT]

An alternative view of this approach is that it is an application of Karl Popper's Principle of Falsification. The test is continually challenging the examinee to falsify our previous measure estimate.

This CAT approach is motivated by the idea of a conversation about a person's problems, attitudes, etc. The typical "maximum information" approach is like a talk with a bureaucrat during which both participants are careful to avoid probing for, or asserting, extreme positions - a "politically correct" conversation. Everything is very safely in the middle of the rating scale. At the end of the test, the respondent can say "I answered all the questions truthfully, but I never told them what I really felt on the issues". On a 5-category Likert instrument, the ideal "maximum information" response would likely be "neutral" every time! And respondents quickly see the pattern, so response sets are encouraged!

Real conversations between intimates, and particularly real counselling sessions, probe the extremities. At the end of the administration, the respondent could say "I told them what I loved, what I hated, and what I didn't care about either way." The best item administration sequence would follow a substantive plan, going from the more superficial items to the more sensitive issues, but, in general, probably a uniform random selection from "falsification thresholds" would suffice. Of course, this method will perform worse statistically than the "maximum information" method - because it is designed to optimize the psychological results, not the statistical ones.


Computer-Adaptive Testing, CAT: A Bayesian approach. Linacre JM. … Rasch Measurement Transactions, 1995, 9:1 p.412



The URL of this page is www.rasch.org/rmt/rmt91d.htm
