Person Fit Statistics: What is Their Purpose?

"The objective of person-fit measurement is to detect item-score patterns that are improbable given an IRT model or given the other patterns in a sample." (Meijer & Sijtsma, 2001)

This sounds correct to the conventional statistician, but W. E. Deming's response to this line of reasoning was: "Of course we don't wish to violate specifications, but we must do better" (Neave, 1990, p.169).

Here are ways we can do better:

(i) Person-fit measurement must identify not only patterns that are improbable, but also patterns that are too probable! The Rasch model predicts uncertainty. Too much certainty can indicate a constraint on the responses, perhaps the hidden imputation of "wrong" to not-reached items, or a response set on an attitude survey. Constraints on the data artificially reduce its randomness, and the substantive length of a logit depends on that randomness. As a result, the differences between items and persons appear greater when expressed in logits. If the constraints are limited to a subset of persons or items, then the measurement system is distorted accordingly.

(ii) Person-measurement fit statistics must flag misleading or ambiguous measures. Simply because a response string is improbable does not mean that it is misleading, and vice versa. Meijer & Sijtsma (2001) criticize residual-based fit statistics, such as OUTFIT and INFIT, because they "do not reflect the probability of ordering of the score patterns" (p.130). This is true, but is it relevant? What is needed is an indication of how much misfit disturbs the estimated measures, not the likelihood of any particular response pattern.
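
To illustrate points (i) and (ii), here is a minimal Python sketch of the conventional OUTFIT and INFIT mean-squares for one person under the dichotomous Rasch model. The item difficulties, the ability of 0 logits, and the two response strings are made-up values chosen for illustration; they are not taken from Meijer & Sijtsma (2001). The difficulties are symmetric about zero and both strings have a raw score of 5, so 0 logits is the maximum-likelihood ability for both.

```python
import math

def rasch_prob(theta, b):
    """Probability of success for ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def mean_squares(responses, theta, difficulties):
    """Return (outfit, infit) mean-squares for one person's 0/1 responses."""
    z2_sum = 0.0    # sum of squared standardized residuals
    res2_sum = 0.0  # sum of squared raw residuals
    var_sum = 0.0   # sum of model variances (statistical information)
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        var = p * (1.0 - p)
        res2 = (x - p) ** 2
        z2_sum += res2 / var
        res2_sum += res2
        var_sum += var
    outfit = z2_sum / len(responses)  # unweighted mean-square
    infit = res2_sum / var_sum        # information-weighted mean-square
    return outfit, infit

# Hypothetical item difficulties in logits, easiest to hardest.
difficulties = [-2.25, -1.75, -1.25, -0.75, -0.25,
                0.25, 0.75, 1.25, 1.75, 2.25]
theta = 0.0  # assumed ability estimate

# Too probable: a perfectly ordered (Guttman) string.
guttman = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# Improbable: failure on the easiest item, success on the hardest.
noisy = [0, 1, 1, 1, 1, 0, 0, 0, 0, 1]

out_g, in_g = mean_squares(guttman, theta, difficulties)
out_n, in_n = mean_squares(noisy, theta, difficulties)
print(f"Guttman string: outfit={out_g:.2f} infit={in_g:.2f}")
print(f"Noisy string:   outfit={out_n:.2f} infit={in_n:.2f}")
```

Under these assumptions the Guttman string yields mean-squares below the model expectation of 1.0 (too much certainty), while the noisy string yields mean-squares above 1.0. Both strings produce the same measure, so the fit statistics carry information that the measure alone does not.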

The Old Advice is still the Good Advice

Meijer & Sijtsma (2001) end their section "Improving Measurement Practice" with this sound advice:

"Smith (1985) mentioned four actions that could be taken when an item-score pattern is classified as misfitting:
(1) report several [ability] estimates (rather than just one) for an examinee based on subtests that are in agreement with the model,
(2) modify the item-score pattern (e.g., eliminate the unreached items at the end) and re-estimate [ability],
(3) do not report the [ability] estimate and retest the examinee, or
(4) decide that the error is small enough for the impact on [ability] to be [negligible]. ...

Which of these actions is taken depends on the context in which testing takes place. The usefulness of a person-fit statistic thus also depends heavily on the application for which it is intended."
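
Action (2), and the judgment called for in action (4), can be sketched in Python as follows. This is only an illustration under stated assumptions: the item difficulties are treated as known, the difficulty values and the response string are hypothetical, and the damped Newton-Raphson routine is one simple way to obtain a maximum-likelihood ability estimate, not the procedure of any particular program. Not-reached items are coded as None so that they can be either scored "wrong" or dropped.

```python
import math

def estimate_ability(responses, difficulties, tol=1e-6, max_iter=50):
    """Maximum-likelihood ability (logits) for 0/1 responses.

    Item difficulties are assumed known. Responses coded None are
    treated as missing and skipped.
    """
    pairs = [(x, b) for x, b in zip(responses, difficulties) if x is not None]
    score = sum(x for x, _ in pairs)
    if score == 0 or score == len(pairs):
        raise ValueError("extreme score: no finite estimate")
    theta = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for _, b in pairs]
        expected = sum(probs)                      # expected raw score
        info = sum(p * (1.0 - p) for p in probs)   # statistical information
        step = (score - expected) / info
        step = max(-1.0, min(1.0, step))           # damp large steps
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical difficulties (logits) in order of administration.
difficulties = [-1.0, 0.5, -0.5, 1.0, 0.0, -1.5, 0.5, -1.0, 0.0, 1.0]
# The last three items were not reached.
observed = [1, 1, 1, 0, 1, 1, 1, None, None, None]

# Not-reached items imputed as "wrong".
scored_wrong = [0 if x is None else x for x in observed]
theta_imputed = estimate_ability(scored_wrong, difficulties)

# Not-reached items eliminated before re-estimation.
theta_trimmed = estimate_ability(observed, difficulties)

print(f"not-reached scored wrong: {theta_imputed:.2f} logits")
print(f"not-reached eliminated:   {theta_trimmed:.2f} logits")
print(f"shift in measure:         {theta_trimmed - theta_imputed:.2f} logits")
```

The size of the shift between the two estimates is one indication of how much the treatment of the misfitting responses disturbs the measure. Whether that shift is negligible depends on the context in which testing takes place.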

Meijer R.R., & Sijtsma K. (2001) Methodology review: evaluating person fit. Applied Psychological Measurement, 25(2), 107-135.

Neave H.R. (1990) The Deming Dimension. Knoxville TN: SPC Press.

Smith R.M. (1985) A comparison of Rasch person analysis and robust estimators. Educational and Psychological Measurement, 16, 149-157.

Person Fit Statistics: What is Their Purpose? Meijer R.R., Sijtsma K. … Rasch Measurement Transactions, 2001, 15:2 p. 823

