Diagnosing Person Misfit

"Nearly twenty years after Sato introduced his caution index, person-fit statistics still seem to be in the realm of potential...
* The research has been largely unsystematic,
* The research has been largely atheoretical,
* The research has not explored... applied settings."
Rudner et al., 1995, p. 23

The problem of idiosyncratic responses has long been known: "one must expect that some subjects will do their task in a perfunctory or careless manner... [or] fail to understand the experiment or fail to read the... instructions carefully... It has seemed desirable, therefore, to set up some criterion by which we could identify those individual records which were so inconsistent that they should be eliminated from our calculations." (Thurstone & Chave, 1929, pp. 32-33). But general acceptance of a useful person-misfit criterion has been slow in coming.

Devised in 1975, Sato's caution index quantifies deviations from Guttman ordering: "The basic condition to be satisfied is that persons who answer a question `favorably' all have higher scores than persons answering the same question `unfavorably'" (Guttman, 1950, pp. 76-77). Guttman notes that this permits person response diagnosis: "Scale analysis can actually help pick out responses that were correct by guessing from an analysis of the pattern of errors" (Guttman, 1950, p. 81). One deficiency in Sato's approach, however, is its insensitivity to item spacing: items of equal difficulty cannot be Guttman ordered, and so raise the caution index in a way irrelevant to person misfit. Another deficiency is Sato's requirement to group persons by total score, which makes the index incalculable when there are missing data.
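As an illustrative sketch (our own Python, not Sato's original matrix computation), one common form of the caution index is one minus the ratio of the covariance of a person's responses with the item p-values to the covariance of the same-score Guttman pattern with those p-values. A Guttman-consistent string then yields 0, and an anti-Guttman string yields a large value:

```python
import numpy as np

def caution_index(u, p):
    """Sketch of a Sato-style caution index.

    u: a person's 0/1 response vector
    p: item p-values (proportion of the group answering each item correctly)
    Undefined for zero or perfect scores (the Guttman covariance is then 0),
    which echoes the total-score grouping limitation noted above.
    """
    u = np.asarray(u)
    p = np.asarray(p, dtype=float)
    order = np.argsort(-p)            # easiest item first
    u, p = u[order], p[order]
    r = int(u.sum())
    g = np.zeros_like(u)
    g[:r] = 1                         # Guttman pattern with the same raw score
    return 1.0 - np.cov(u, p)[0, 1] / np.cov(g, p)[0, 1]

pvals = np.array([0.9, 0.7, 0.5, 0.3, 0.1])        # hypothetical p-values
guttman = caution_index([1, 1, 1, 0, 0], pvals)    # consistent with item ordering: 0
scrambled = caution_index([0, 0, 1, 1, 1], pvals)  # reverses the ordering: large
```

Note how two items with identical p-values contribute nothing to either covariance in a person-specific way, which is the insensitivity to item spacing mentioned above.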

Rudner et al. credit Wright (1977) with identifying a wide range of potential sources for idiosyncratic responses: "guessing, cheating, sleeping, fumbling, plodding, and cultural bias". Wright and his students are also credited with two stochastically-based solutions to the fit index problem, the statistics now known as INFIT and OUTFIT, whose distributional properties have been exhaustively investigated and reported by Richard Smith (1986, 1991).

After discussing these and various other indices, Rudner et al. chose INFIT, an information-weighted statistic, for their analysis of the NAEP data, but with probabilities computed from the reported person "plausible values" (theta estimates with their error distributions) and three-parameter item parameter estimates.

Rudner et al. chose INFIT because it:
a) "is most influenced by items of median difficulty."
See "Chi-square fit statistics" (RMT 8:2 p. 360-361, 1994) for examples of INFIT and OUTFIT behavior.

b) "has a standardized distribution".
INFIT approximates a mean-square distribution (chi^2/d.f.) with expectation 1.0. Departure from 1.0 measures the proportion of excess (or deficiency) in data stochasticity. Rudner's criterion of 1.20 rejects response strings manifesting more than 20% unmodelled noise.
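A minimal sketch of the computation under a dichotomous Rasch model (the function names and example numbers are ours, not the NAEP implementation, which uses plausible values and 3-P item estimates): INFIT divides a person's summed squared residuals by the summed model variances, so its expectation is 1.0.

```python
import numpy as np

def rasch_prob(theta, b):
    """Rasch probability of success for ability theta on items of difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - np.asarray(b, dtype=float))))

def infit_mnsq(responses, theta, b):
    """Information-weighted mean-square: summed squared residuals
    divided by summed model variances. Expectation 1.0."""
    p = rasch_prob(theta, b)
    w = p * (1.0 - p)                     # binomial variance = information weight
    r2 = (np.asarray(responses) - p) ** 2 # squared score residuals
    return r2.sum() / w.sum()

b = [-2.0, -1.0, 0.0, 1.0, 2.0]            # hypothetical item difficulties
# A Guttman-consistent able examinee: mean-square below 1.0 (overly predictable)
consistent = infit_mnsq([1, 1, 1, 1, 0], theta=2.0, b=b)
# The same examinee with a careless miss on the easiest item:
# mean-square well above Rudner's 1.20 criterion
careless = infit_mnsq([0, 1, 1, 1, 1], theta=2.0, b=b)
```

The information weights are why INFIT "is most influenced by items of median difficulty": items far from the person's ability carry little variance and so little weight.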

c) "has been shown to be near optimal in identifying spurious scores at the ability distribution tails."

Rudner's INFIT mean-square distribution for the NAEP Trial State Assessment is reassuring (see Figure 1): its mean is 0.97, its standard deviation 0.17. But the tails, though statistically acceptable, invite investigation. Rudner's other two figures show how unwanted examinee behavior is indicated by the tails.

Figure 1. Person residuals


In Figure 2, high mean-squares indicate unexpected successes or failures. Unexpected responses by low performers are bound to be improbable successes, perhaps due to special knowledge or lucky guessing. Unexpected responses by high performers are bound to be improbable failures, perhaps due to carelessness, slipping, misunderstanding or "special ignorance". In Figure 2's upper-right quadrant there are many more persons misfitting because of careless errors (or incomplete response strings) than there are persons in the upper-left quadrant benefiting from lucky guessing.

Figure 2. Ability vs. Fit


Low mean-squares indicate less randomness in the response strings than the model predicts. This could indicate a curriculum effect, i.e., competence at everything taught on a test that also includes difficult, untaught material. Another possibility is the effect of a time limit. When data are taken to be complete, comprising equally determined efforts to succeed on each item, a time limit makes the last items in a test appear harder. Slow but careful workers get all the earlier items correct, and this higher success rate makes the early items appear easier. When time runs out, these plodders "fail" the later items, and the lower success rate makes the later items appear harder. This interaction between time and item difficulty makes response strings too predictable and lowers mean-squares below 1.0.
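The time-limit artifact can be illustrated with a small simulation (all numbers hypothetical: ten items of equal true difficulty, with half the sample running out of time after the sixth item and the not-reached items scored wrong):

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 500, 10

theta = rng.normal(0.0, 1.0, n_persons)  # abilities
b = np.zeros(n_items)                    # every item truly equal in difficulty
# Rasch probabilities and simulated complete-data responses
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
resp = (rng.random((n_persons, n_items)) < p).astype(int)

# Plodders: half the sample runs out of time; not-reached items scored wrong
slow = rng.random(n_persons) < 0.5
resp[slow, 6:] = 0

pvals = resp.mean(axis=0)
# The later items now show markedly lower success rates, so they appear
# harder, although every item has the same true difficulty.
```

Against these distorted difficulties, the plodders' neatly split right-then-wrong response strings are more predictable than the model expects, pulling their mean-squares below 1.0.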

Figure 3 suggests an unexpected interaction between high ability and calculator use in the NAEP Mathematics test. 1990 was the first year that allowed calculators. Items involving calculators misfit. Perhaps high ability persons found calculators as much a liability as an asset, and so committed unexpected errors on items they would have got right by hand. Again there is an excess of unlucky errors over lucky guesses in Figure 3.

Figure 3. Fit vs. Score


Although Rudner reports that trimming unexpected response strings has minimal impact on the overall NAEP conclusions, examining and diagnosing the response strings of such individuals enables us to evaluate and improve our tests, discover when and when not to trust test results, and identify those examinees who require special personal attention for instruction, guidance and decision making.

Guttman, L. 1950. The Basis for Scalogram Analysis. Pp. 60-90 in Stouffer, S.A., et al., Measurement and Prediction. New York: John Wiley.

Rudner LM, Skaggs G, Bracey G, Getson PR. 1995. Use of Person-Fit Statistics in Reporting and Analyzing National Assessment of Educational Progress Results. NCES 95-713. Washington DC: National Center for Education Statistics.

Smith, R.M. (1986) Person fit in the Rasch model. Educational and Psychological Measurement, 46(2), 359-372.

Smith, R.M. (1991) The distributional properties of Rasch item fit statistics. Educational and Psychological Measurement, 51(3), 541-565.

Thurstone, L.L., Chave, E.J. 1929. The Measurement of Attitudes. Chicago: University of Chicago Press.

Wright, B.D. 1977. Solving Measurement Problems with the Rasch model. Journal of Educational Measurement, 14(2), 108.


Diagnosing person misfit. Rudner L, Wright BD. … Rasch Measurement Transactions, 1995, 9:2 p.430



The URL of this page is www.rasch.org/rmt/rmt92h.htm
