Dichotomous Mean-Square Fit Statistics

Georg Rasch suggests chi-square fit statistics to control the applicability of data to his model (Rasch 1980 p. 25). The chi-squares in common use are known as OUTFIT and INFIT. These are reported as mean-squares, chi-square statistics divided by their degrees of freedom, so that they have a ratio-scale form with expectation 1 and range 0 to +infinity. They are also reported in various interval-scale forms in which their expected value is zero.

OUTFIT is based on the conventional sum of squared standardized residuals. Let X be an observation, E be its expected value based on Rasch parameter estimates and σ² be its modelled variance about its expectation. Then the squared standardized residual is z² = (X-E)²/σ². OUTFIT is Sum(z²)/N, the mean of these squared standardized residuals over the N relevant observations.

INFIT is an information-weighted sum. The statistical information in a Rasch observation is its variance, σ². This is larger for targeted observations, and smaller for extreme observations, e.g., easy items administered to able persons. INFIT is Sum(z²σ²)/Sum(σ²) = Sum((X-E)²)/Sum(σ²), summed over the relevant observations.
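The two definitions can be sketched in a few lines of Python. This is an illustration only, not the original computation: it assumes the dichotomous Rasch model, so that E is the model probability of success P and σ² = P(1-P). The function names, the 16 items spaced 0.4 logits apart and centered on zero, and the person measure of 0 logits are assumptions made for the example, so the results need not reproduce the tabled values.

```python
import math

def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: probability of success (measures in logits)."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def mean_square_fit(responses, ability, difficulties):
    """Return (OUTFIT, INFIT) mean-squares for one person's 0/1 responses."""
    sum_z2 = 0.0      # Sum(z²)
    sum_resid2 = 0.0  # Sum((X-E)²)
    sum_var = 0.0     # Sum(σ²)
    for x, d in zip(responses, difficulties):
        e = rasch_probability(ability, d)  # expected value E
        var = e * (1.0 - e)                # modelled variance σ²
        sum_z2 += (x - e) ** 2 / var
        sum_resid2 += (x - e) ** 2
        sum_var += var
    outfit = sum_z2 / len(responses)       # unweighted mean of z²
    infit = sum_resid2 / sum_var           # information-weighted mean of z²
    return outfit, infit

# Assumed layout: 16 items, easy to hard, 0.4 logits apart, centered on 0.
difficulties = [0.4 * i - 3.0 for i in range(16)]
# With 8 successes on these symmetric items, the person estimate is 0 logits.
ability = 0.0

guttman = [int(c) for c in "1111111100000000"]
lucky_guess = [int(c) for c in "1111111000000001"]

guttman_outfit, guttman_infit = mean_square_fit(guttman, ability, difficulties)
lucky_outfit, lucky_infit = mean_square_fit(lucky_guess, ability, difficulties)
```

Under these assumptions the Guttman pattern yields both mean-squares well below 1.0, while the single unexpected success on the hardest item raises OUTFIT far more than INFIT, because that observation has a small σ² and so carries little weight in the INFIT sum.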

Fit statistics are formulated to test particular hypotheses. OUTFIT is dominated by unexpected outlying, off-target, low information responses and so is outlier-sensitive. INFIT is dominated by unexpected inlying patterns among informative, on-target observations and so is inlier-sensitive.

The Table shows typical dichotomous patterns. (For polytomies, see www.rasch.org/rmt/rmt103a.htm) The S.E. inflator is a multiplier which can be used to increase the imprecision due to modelled observation error to allow for the added uncertainty due to misfit. This inflator is the square-root of the maximum value of INFIT mean-square, OUTFIT mean-square and 1.0. Infit and Outfit mean-squares less than 1.0 do not increase the standard errors, but suggest that the latent variable is locally compressed for the item or person.
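The inflator rule can be written as a one-line function. The following is a minimal sketch; the function name and the example standard error of 0.30 logits are hypothetical, while the mean-square values are taken from the Miscode, Lucky Guessing and Guttman rows of the Table.

```python
import math

def se_inflator(infit_mnsq, outfit_mnsq):
    """Square-root of the largest of INFIT mean-square, OUTFIT mean-square and 1.0."""
    return math.sqrt(max(infit_mnsq, outfit_mnsq, 1.0))

miscode = se_inflator(4.3, 12.6)      # OUTFIT 12.6 dominates
lucky_guess = se_inflator(1.0, 3.8)   # OUTFIT 3.8 dominates
guttman = se_inflator(0.5, 0.3)       # both below 1.0, so no inflation

# Hypothetical modelled standard error of 0.30 logits, inflated for misfit.
inflated_se = 0.30 * miscode
```

Rounded to one decimal place, these agree with the tabled inflators of 3.5, 1.9 and 1.0.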

The "¦" in the tabled response patterns indicates a threshold from the zone in which OUTFIT is more sensitive to the zone in which INFIT is more sensitive. The > indicates the relevant diagnostic mean-square fit value for this range of item difficulties. In the outlying, OUTFIT zones, we expect nearly all successes or nearly all failures. In the transition, INFIT zone, we expect a mixture of success and failure. Departures from these expectations are flagged by the indicated fit statistics. Fit values noticeably above 1.0 indicate excessive unmodelled noise. Fit values noticeably below 1.0 indicate a local deficit in the stochastic variation necessary for useful measurement. What is noticeable depends on the nature of the data. Fit values in well-controlled data, e.g., MCQ responses, are more central than those for free-form responses and clinical observations. What is acceptable depends on what produces useful measurement in context.

Why is a Guttman response pattern, flagged by low INFIT and OUTFIT statistics, problematic? Why isn't it the ideal? A fundamental requirement for useful measurement is that it be test-free and sample-free, so that data sets that "differ materially in some relevant respects" (Rasch 1980 p. 9) produce statistically equivalent results. An obvious relevant difference is that between a hard test and an easy test. But when a Guttman pattern is split in two, it produces an easy test on which the subject performed infinitely well, and a hard test on which the same subject performed infinitely badly. This implicit contradiction exists within every Guttman pattern and so increases our uncertainty in the reported measure. Is the sharp transition really a precise indicator of the subject's measure, or is it caused by a time limit? response style? curriculum effect? scanning error? illness?
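The phrases "infinitely well" and "infinitely badly" can be illustrated numerically. The sketch below assumes the dichotomous Rasch model and a hypothetical set of 16 items spaced 0.4 logits apart; the function and variable names are illustrative. It splits a Guttman pattern into its easy and hard halves and evaluates the log-likelihood of each half at increasingly extreme person measures.

```python
import math

def log_likelihood(ability, responses, difficulties):
    """Log-likelihood of 0/1 responses under the dichotomous Rasch model."""
    total = 0.0
    for x, d in zip(responses, difficulties):
        p = 1.0 / (1.0 + math.exp(d - ability))  # probability of success
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

# Assumed layout: 16 items, easy to hard, 0.4 logits apart, centered on 0.
difficulties = [0.4 * i - 3.0 for i in range(16)]
guttman = [1] * 8 + [0] * 8

easy_responses, easy_items = guttman[:8], difficulties[:8]  # all successes
hard_responses, hard_items = guttman[8:], difficulties[8:]  # all failures

steps = [0.0, 2.0, 4.0, 8.0]
# Easy half: the likelihood keeps rising as the person measure increases.
easy_ll = [log_likelihood(b, easy_responses, easy_items) for b in steps]
# Hard half: the likelihood keeps rising as the person measure decreases.
hard_ll = [log_likelihood(-b, hard_responses, hard_items) for b in steps]
```

Neither half has a finite maximum-likelihood estimate: the easy half pushes the measure toward plus infinity and the hard half toward minus infinity, which is the contradiction described above.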

A useful rule of thumb when investigating fit is to start with extreme high OUTFIT and INFIT values, and work down towards more central values, stopping when diagnosis no longer prompts remedial action nor provokes further thought about the nature of the subjects or the test questions. Edit the data as necessary, e.g., put to one side subjects with obvious "response sets" until the final reporting run. Then reestimate and examine extreme low OUTFIT and INFIT values. Elimination of high misfit values will make most low misfit values less extreme. Low fit values provide less motivation for data editing than do high values, unless obvious duplication is found, e.g., a repeated question or a double-scanned response form. Low fit values do not disturb the meaning of a measure. They merely reduce precision.

(Dichotomous Mean-square) Chi-square fit statistics. Linacre JM, Wright BD. … Rasch Measurement Transactions, 1994, 8:2 p.360




Person Responses: Easy -- Items -- Hard	Diagnosis Pattern	OUTFIT Mean-square	INFIT Mean-square	Point-Measure Correlation	S.E. Inflator
111¦0110110100¦000	Modelled/Ideal	1.0	1.1	0.62	1.0
111¦1111100000¦000	Guttman/Deterministic	0.3	0.5	0.87	1.0
000¦0000011111¦111	Miscode	12.6	4.3	-0.87	3.5
011¦1111110000¦000	Carelessness Sleeping Slipping	3.8	1.0	0.65	1.9
111¦1111000000¦001	Lucky Guessing	3.8	1.0	0.65	1.9
101¦0101010101¦010	Response set/Miskey	4.0	2.3	0.11	2.0
111¦1000011110¦000	Special knowledge	0.9	1.3	0.43	1.1
111¦1010110010¦000	Imputed outliers *	0.6	1.0	0.62	>1.0*
111¦0101010101¦000	Low discrimination	1.5	1.6	0.46	1.3
111¦1110101000¦000	High discrimination	0.5	0.7	0.79	1.0
111¦1111010000¦000	Very high discrimination	0.3	0.5	0.84	1.0
Right¦Transition¦Wrong
high - low - high	OUTFIT sensitive to outlying observations	>>1.0 unexpected outliers	>>1.0 disturbed pattern
low - high - low	INFIT sensitive to pattern of inlying observations	<<1.0 overly predictable outliers	<<1.0 Guttman pattern
* as when a tailored test is filled out by imputing all "right" responses to easier items and all "wrong" responses to harder items. Increase S.E. based on the number of observed responses.
The exact details of these computations have been lost, but the items appear to be uniformly distributed about 0.4 logits apart.
