Dichotomous Infit and Outfit Mean-Square Fit Statistics

Georg Rasch suggests chi-square fit statistics to control the applicability of data to his model (Rasch 1980 p. 25). The chi-squares in common use are known as OUTFIT and INFIT. These are reported as mean-squares, chi-square statistics divided by their degrees of freedom, so that they have a ratio-scale form with expectation 1 and range 0 to +infinity. They are also reported in various interval-scale forms in which their expected value is zero.

OUTFIT is based on the conventional sum of squared standardized residuals. Let X be an observation, E be its expected value based on Rasch parameter estimates and σ² be its modelled variance about its expectation. Then the squared standardized residual is

z² = (X-E)² / σ²

OUTFIT is Sum(z²)/N, where N is the number of observations summed.

INFIT is an information-weighted sum. The statistical information in a Rasch observation is its variance, σ². This is larger for targeted observations, and smaller for extreme observations, e.g., easy items administered to able persons. INFIT is Sum(z²σ²)/Sum(σ²) = Sum((X-E)²)/Sum(σ²), summed over the relevant observations.
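
The two formulas can be sketched in a few lines of Python. This is a minimal illustration, assuming the dichotomous Rasch model, so that E is the modelled probability of success P and σ² = P(1-P). The function names, the ability value and the four item difficulties are hypothetical, chosen only to show the arithmetic.

```python
import math

def expected_score(ability, difficulty):
    """Dichotomous Rasch model: probability of success, which is also E."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def mean_square_fit(responses, ability, difficulties):
    """Return (OUTFIT, INFIT) mean-squares for one string of 0/1 responses."""
    sum_z2 = 0.0      # Sum(z²)
    sum_resid2 = 0.0  # Sum((X-E)²)
    sum_var = 0.0     # Sum(σ²)
    for x, d in zip(responses, difficulties):
        e = expected_score(ability, d)
        var = e * (1.0 - e)           # modelled variance σ²
        resid2 = (x - e) ** 2         # squared residual (X-E)²
        sum_z2 += resid2 / var        # squared standardized residual z²
        sum_resid2 += resid2
        sum_var += var
    outfit = sum_z2 / len(responses)  # Sum(z²)/N
    infit = sum_resid2 / sum_var      # Sum((X-E)²)/Sum(σ²)
    return outfit, infit

# Hypothetical example: an unexpected success on a very hard item (difficulty +3.0).
outfit, infit = mean_square_fit([1, 1, 0, 1], 0.0, [-3.0, 0.0, 0.0, 3.0])
print(f"OUTFIT={outfit:.2f} INFIT={infit:.2f}")
```

In this example the single off-target surprise inflates OUTFIT more than INFIT, because its small variance gives it a large z² but little weight in the information-weighted sum.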

Fit statistics are formulated to test particular hypotheses. OUTFIT is dominated by unexpected outlying, off-target, low information responses and so is outlier-sensitive. INFIT is dominated by unexpected inlying patterns among informative, on-target observations and so is inlier-sensitive.

Person Responses:                                OUTFIT       INFIT        Point-measure  S.E.
Easy -- Items -- Hard  Diagnosis                 Mean-square  Mean-square  Correlation    Inflator
---------------------  ------------------------  -----------  -----------  -------------  --------
111¦0110110100¦000     Modelled/Ideal            1.0          1.1           0.62          1.0
111¦1111100000¦000     Guttman/Deterministic     0.3          0.5           0.87          1.0
000¦0000011111¦111     Miscode                   12.6         4.3          -0.87          3.5
011¦1111110000¦000     Carelessness/Sleeping     3.8          1.0           0.65          1.9
111¦1111000000¦001     Lucky Guessing            3.8          1.0           0.65          1.9
101¦0101010101¦010     Response set/Miskey       4.0          2.3           0.11          2.0
111¦1000011110¦000     Special knowledge         0.9          1.3           0.43          1.1
111¦1010110010¦000     Imputed outliers *        0.6          1.0           0.62          >1.0*
111¦0101010101¦000     Low discrimination        1.5          1.6           0.46          1.3
111¦1110101000¦000     High discrimination       0.5          0.7           0.79          1.0
111¦1111010000¦000     Very high discrimination  0.3          0.5           0.84          1.0

Sensitivity by zone    Statistic                                            Mean-square >>1.0    Mean-square <<1.0
---------------------  ---------------------------------------------------  -------------------  ---------------------------
high - low - high      OUTFIT: sensitive to outlying observations           unexpected outliers  overly predictable outliers
low - high - low       INFIT: sensitive to pattern of inlying observations  disturbed pattern    Guttman pattern

* As when a tailored test is filled out by imputing all "right" responses to easier items and all "wrong" responses to harder items. Increase the S.E. based on the number of observed responses.
The exact details of these computations have been lost, but the items appear to be uniformly distributed about 0.4 logits apart.
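
To make the tabled patterns concrete, here is a hedged Python sketch that recomputes OUTFIT and INFIT for three of them. It assumes 16 items exactly 0.4 logits apart and centred on 0, and a person ability of 0 (each pattern has a raw score of 8 out of 16). These are assumptions, not the original settings, so the results should not be expected to reproduce the tabled values. Only the direction of the misfit is meant to be comparable.

```python
import math

# Assumption: 16 items spaced 0.4 logits apart, centred on 0 logits.
difficulties = [-3.0 + 0.4 * i for i in range(16)]
ability = 0.0  # assumed person measure for a raw score of 8 out of 16

# Response strings copied from the Table, with the zone markers removed.
patterns = {
    "Guttman/Deterministic": "1111111100000000",
    "Lucky Guessing": "1111111000000001",
    "Special knowledge": "1111000011110000",
}

def fit(pattern):
    """Return (OUTFIT, INFIT) mean-squares for a string of 0/1 responses."""
    xs = [int(c) for c in pattern]
    ps = [1.0 / (1.0 + math.exp(d - ability)) for d in difficulties]
    variances = [p * (1.0 - p) for p in ps]
    resid2 = [(x - p) ** 2 for x, p in zip(xs, ps)]
    outfit = sum(r / v for r, v in zip(resid2, variances)) / len(xs)
    infit = sum(resid2) / sum(variances)
    return outfit, infit

for label, pattern in patterns.items():
    outfit, infit = fit(pattern)
    print(f"{label:24s} OUTFIT={outfit:.2f} INFIT={infit:.2f}")
```

Under these assumptions the Guttman pattern falls below 1.0 on both statistics, the lucky guess on the hardest item shows up mainly in OUTFIT, and the run of unexpected responses in the middle of the test shows up more strongly in INFIT.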

The Table shows typical dichotomous patterns. (For polytomies, see www.rasch.org/rmt/rmt103a.htm.) The S.E. inflator is a multiplier which can be used to increase the imprecision due to modelled observation error to allow for the added uncertainty due to misfit. This inflator is the square-root of the maximum value of INFIT mean-square, OUTFIT mean-square and 1.0. Infit and Outfit mean-squares less than 1.0 do not increase the standard errors, but suggest that the latent variable is locally compressed for the item or person.
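
The inflator rule stated above is simple enough to sketch directly. The function names and the model standard error of 0.5 logits are hypothetical, used only for illustration.

```python
import math

def se_inflator(infit_mnsq, outfit_mnsq):
    """Square root of the largest of INFIT mean-square, OUTFIT mean-square and 1.0."""
    return math.sqrt(max(infit_mnsq, outfit_mnsq, 1.0))

def inflated_se(model_se, infit_mnsq, outfit_mnsq):
    """Model standard error multiplied by the misfit inflator."""
    return model_se * se_inflator(infit_mnsq, outfit_mnsq)

# Response set/Miskey row of the Table: OUTFIT 4.0, INFIT 2.3.
print(se_inflator(2.3, 4.0))       # 2.0
# Mean-squares below 1.0 leave the standard error unchanged.
print(se_inflator(0.5, 0.3))       # 1.0
# Hypothetical model S.E. of 0.5 logits, doubled by the Response set/Miskey misfit.
print(inflated_se(0.5, 2.3, 4.0))  # 1.0
```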

The "¦" in the tabled response patterns indicates a threshold from the zone in which OUTFIT is more sensitive to the zone in which INFIT is more sensitive. The > indicates the relevant diagnostic mean-square fit value for this range of item difficulties. In the outlying, OUTFIT zones, we expect nearly all successes or nearly all failures. In the transition, INFIT zone, we expect a mixture of success and failure. Departures from these expectations are flagged by the indicated fit statistics. Fit values noticeably above 1.0 indicate excessive unmodelled noise. Fit values noticeably below 1.0 indicate a local deficit in the stochastic variation necessary for useful measurement. What is noticeable depends on the nature of the data. Fit values in well-controlled data, e.g., MCQ responses, are more central than those for free-form responses and clinical observations. What is acceptable depends on what produces useful measurement in context.

Why is a Guttman response pattern, flagged by low INFIT and OUTFIT statistics, problematic? Why isn't it the ideal? A fundamental requirement for useful measurement is that it be test-free and sample-free, so that data sets that "differ materially in some relevant respects" (Rasch 1980 p. 9) produce statistically equivalent results. An obvious relevant difference is that between a hard test and an easy test. But when a Guttman pattern is split in two, it produces an easy test on which the subject performed infinitely well, and a hard test on which the same subject performed infinitely badly. This implicit contradiction exists within every Guttman pattern and so increases our uncertainty in the reported measure. Is the sharp transition really a precise indicator of the subject's measure, or is it caused by a time limit, a response style, a curriculum effect, a scanning error, or illness?

A useful rule of thumb when investigating fit is to start with extreme high OUTFIT and INFIT values, and work down towards more central values, stopping when diagnosis no longer prompts remedial action nor provokes further thought about the nature of the subjects or the test questions. Edit the data as necessary, e.g., put to one side subjects with obvious "response sets" until the final reporting run. Then reestimate and examine extreme low OUTFIT and INFIT values. Elimination of high misfit values will make most low misfit values less extreme. Low fit values provide less motivation for data editing than do high values, unless obvious duplication is found, e.g., a repeated question or a double-scanned response form. Low fit values do not disturb the meaning of a measure. They merely reduce precision.

(Dichotomous Mean-square) Chi-square fit statistics. Linacre JM, Wright BD. … Rasch Measurement Transactions, 1994, 8:2 p.360


The URL of this page is www.rasch.org/rmt/rmt82a.htm

Website: www.rasch.org/rmt/contents.htm