A common misapprehension about person fit statistics has been the belief that the widely-known and widely-used standardized infit and outfit statistics have the power to detect all types of departures from the objective measurement model. The statistics in question, either of which is referred to here as Z, were designed to identify misfit associated with undifferentiated patterns of responses, and, in the case of outfit, the presence of lucky guessing or unlucky carelessness. What is often not appreciated is that there are other aspects of person misfit for which Z is not a suitable detective. Not only should we not expect Z to detect all aspects of misfit in persons, but any insistence on statistics that might claim such universality would be naive.
The response to critics, therefore, who point out that Z does not detect this-or-that particular type of misfit (usually induced artificially via specifically distributed simulated data) is to reemphasize which types of data problems can be usefully detected by Z. Certainly data constructed to produce two or more clearly identifiable subsets of items or persons, for example, can result in response patterns which Z would not identify as "misfitting". This is because the multi-dimensionality of such data could go unrecognized without fit analysis based on relevant partitions of the persons or the items. In this example, some form of between item-group or between person-group fit statistic is called for. The decision concerning which subsets of items or persons to investigate corresponds to an a priori hypothesis about the data. One might describe this type of misfit investigation as "confirmatory" in contrast to the "exploratory" role for which Z was designed. The practical utility of Z has been noted in the literature from the time of its invention (Wright and Panchapakesan, 1969), yet some call upon it to solve problems for which it was never intended.
There are also critics who, whilst recognizing the situations in which Z is rightly called for, claim that even there it is too crude to do a decent job of detective work, that it sometimes picks out patterns as aberrant when it shouldn't or that it sometimes fails to detect misfitting patterns when it should. Statisticians would describe these problems in terms of the power of the Z statistic: too much power in the first case and too little in the second. Recent research completed by the author and Ben Wright shows that at the exploratory level where some form of statistic is required to fine-tune our decisions about misfitting response patterns, Z is, in fact, quite satisfactory. It is our intention to report this work in a future article. For the present, some observations which have arisen out of that study are in order.
Of all the possible response patterns resulting in a particular raw score, the majority are so unlikely under the objective measurement model that one does not need a fine-tuned Z statistic to make sensible decisions about person fit. Even in the case of a very short test of L=8 items (spread uniformly in difficulty between plus and minus three logits), of the seventy patterns resulting in a raw score of r=4, 80 percent have exceedance probabilities less than 0.05.
This comment requires amplification. The probabilities referred to are the conditional probabilities of the individual response patterns, conditioned on the raw score (in this example, a raw score of four). Furthermore, determining the "exceedance" of a pattern requires the conditional probability of every response pattern that is less probable than the one under investigation: the exceedance is the accumulated probability of the pattern itself and of all less probable patterns. To implement this, all patterns are ordered from most to least likely. When items are sequenced in order of increasing difficulty, no matter how long the test or what the raw score under study, the most likely pattern is always the Guttman pattern (1111...0000) and the least likely is always its inverse (0000...1111). For the eight items mentioned and a raw score of four, (11110000) has a probability of 0.43 and (00001111) has a probability of 0.0000005. The remaining 68 patterns are listed in the Table, and have probabilities between these two values. These individual probabilities, however, are not much help by themselves, since, for any test of practical length, they are infinitesimally small for the majority of patterns. The individual probability of a response pattern such as (10101100) in the current example is 0.006, but the accumulated probability of this and the 53 more aberrant, and hence less probable, patterns amounts to 0.04. If hypothesis testing is the psychometrician's preference, then the conclusion at an alpha level of 0.05 would be that the pattern (10101100) does not fit the objective measurement model. In practice, it is unlikely that rigid alpha levels would be useful, so patterns within some exceedance interval, say .025 to .075 (flagged by "?" in the Table), might be identified as warranting investigation.
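The exceedance calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's program. It assumes item difficulties spaced exactly 6/7 logit apart between -3 and +3, and it assumes that tied patterns are handled by adding a pattern's own probability to that of all strictly less probable patterns (a convention inferred from the Table). The function name `pattern_table` is hypothetical.

```python
from itertools import combinations
from math import exp

def pattern_table(difficulties, r, tol=1e-9):
    """Map each 0/1 pattern with raw score r to (probability, exceedance)."""
    n = len(difficulties)
    rows = []
    for correct in combinations(range(n), r):
        pattern = "".join("1" if i in correct else "0" for i in range(n))
        # Sum of the difficulties of the items answered correctly.
        w = sum(difficulties[i] for i in correct)
        rows.append((pattern, w))
    # Normalizing constant over all patterns with this raw score.
    gamma = sum(exp(-w) for _, w in rows)
    table = {}
    for pattern, w in rows:
        p = exp(-w) / gamma
        # Assumed tie convention: own probability plus all strictly
        # less probable patterns (those with a larger sum w).
        less_probable = sum(exp(-w2) for _, w2 in rows if w2 > w + tol) / gamma
        table[pattern] = (p, p + less_probable)
    return table

# Assumption: eight items spaced exactly 6/7 logit apart from -3 to +3.
difficulties = [-3 + 6 * k / 7 for k in range(8)]
table = pattern_table(difficulties, 4)

for s in ("11110000", "10101100", "00001111"):
    p, exceedance = table[s]
    print(s, round(p, 4), round(exceedance, 4))
```

Under these assumptions the sketch gives a probability of about 0.43 for (11110000) and an exceedance of about 0.037 for (10101100), in line with the values quoted in the text.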
The final observation concerns an "exact" fit statistic for patterns, but one with a distribution which has to be approximated in a way more complex than Z. As with so many aspects of objective measurement, the first suggestion of an exact fit statistic came from Georg Rasch in his 1960 book. He was describing fit in an overall sense when he made the observation that an exact test would require the determination of the probabilities of all data matrices (of 0's and 1's) less probable than the one under investigation. He recognized, as have others since (Douglas, 1982), that the combinatorial problems of determining which matrices are less probable are sufficient to direct researchers to approximations, and to focus those approximations on the rows (for persons) or columns (for items) of the matrix of responses.
The conditional probability of a pattern is a function of (1) the sum of the difficulties, Di, of the items the respondent had correct, and (2) the elementary symmetric function, γr, associated with a raw score of r, that is,
$$P([X] \mid r) = \frac{\exp\left(-\sum_i X_i D_i\right)}{\gamma_r}$$
where Xi = 0, 1 (for incorrect, correct) and Di = item difficulty.
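The denominator γr does not have to be obtained by listing every pattern. The following is a minimal Python sketch, assuming the standard summation recursion for elementary symmetric functions of exp(-Di). The function names are illustrative and do not come from the article, and the difficulties are assumed to be spaced exactly 6/7 logit apart between -3 and +3.

```python
from math import exp

def elementary_symmetric(difficulties):
    """Return [gamma_0, ..., gamma_L] for items with the given difficulties."""
    eps = [exp(-d) for d in difficulties]
    gamma = [1.0] + [0.0] * len(eps)
    for e in eps:
        # Update from high r to low r so each item enters a product only once.
        for r in range(len(eps), 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma

def pattern_probability(pattern, difficulties):
    """Conditional probability of a 0/1 string given its raw score."""
    xs = [int(c) for c in pattern]
    w = sum(x * d for x, d in zip(xs, difficulties))
    return exp(-w) / elementary_symmetric(difficulties)[sum(xs)]

# Assumption: eight items spaced exactly 6/7 logit apart from -3 to +3.
difficulties = [-3 + 6 * k / 7 for k in range(8)]
gamma = elementary_symmetric(difficulties)
p_guttman = pattern_probability("11110000", difficulties)
print(round(p_guttman, 4))
```

The recursion adds one item at a time, so its cost grows with the square of the test length rather than with the number of patterns.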
There is a direct monotonic relationship between the accumulation of these probabilities (the exceedance) and the sum ΣXiDi, the negative of the exponent in the numerator, which we will call Wr. This is because, for fixed r, the denominator γr is a constant: the larger Wr is, the less probable the pattern and the smaller its exceedance.
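The cancellation of the denominator can be written out as a short worked step. Here X and X' are any two patterns with the same raw score r, and W'r denotes the sum for X' (notation introduced here for illustration only):

$$\frac{P([X] \mid r)}{P([X'] \mid r)} = \frac{\exp(-W_r)/\gamma_r}{\exp(-W'_r)/\gamma_r} = \exp\left(-(W_r - W'_r)\right)$$

As a check against the Table, the two most likely patterns have probabilities 0.4317 and 0.1832, a ratio of about 0.424. That is close to exp(-6/7), as would be expected if the two patterns differ by moving one correct response between adjacent items 6/7 logit apart.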
All response patterns may be ordered from most likely to least likely by increasing value of Wr. This gives us an "exact" fit statistic for person performance. If the difficulties of the eight items in the example being used as illustration are ±3.0, ±2.1, ±1.3, and ±0.4 (uniform spacing of 6/7 logit, rounded to one decimal place), and we focus on a raw score of r=4, the most likely pattern (11110000) has W4 of -3.0 - 2.1 - 1.3 - 0.4 = -6.8 with these rounded values, or -6.9 with the unrounded ones. An intermediate pattern such as (10101100) has W4 = -2.6, and the least likely pattern (00001111) has W4 = 6.8 (6.9 unrounded). Extreme values of W, like extreme values of Z, in either direction, indicate misfit.
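Because the denominator is the same for every pattern with a given raw score, the ordering can be produced from W alone. The sketch below is illustrative only. It assumes difficulties spaced exactly 6/7 logit apart (which round to the values quoted above), and the helper names `w_statistic` and `patterns_with_score` are hypothetical.

```python
from itertools import combinations

# Assumption: eight items spaced exactly 6/7 logit apart from -3 to +3.
difficulties = [-3 + 6 * k / 7 for k in range(8)]

def w_statistic(pattern, difficulties):
    """Sum of the difficulties of the items answered correctly."""
    return sum(d for x, d in zip(pattern, difficulties) if x == "1")

def patterns_with_score(n_items, r):
    """All 0/1 strings of length n_items containing exactly r ones."""
    for correct in combinations(range(n_items), r):
        yield "".join("1" if i in correct else "0" for i in range(n_items))

# Sorting by W orders patterns from most to least probable without
# computing any probabilities.
ranked = sorted(patterns_with_score(8, 4),
                key=lambda s: w_statistic(s, difficulties))

for s in ("11110000", "10101100", "00001111"):
    print(s, round(w_statistic(s, difficulties), 1))
```

With the unrounded difficulties, the three W4 values round to -6.9, -2.6, and 6.9, and the first and last entries of `ranked` are the Guttman pattern and its inverse.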
A preliminary investigation of this statistic and approximations to its distribution has been carried out by Molenaar and Hoijtink (1990). Since W depends on both the number and the dispersion of the item difficulties, by itself it is capable of little more than a probability ordering of the patterns for a given score. The published approximations reduce these dependencies, but they increase the complexity of the calculations substantially and would appear to have limited utility. Our research shows that the approximations embedded in Z are sufficiently accurate without resorting to complex expressions involving second and third-order symmetric functions. These conclusions add further evidence to the extensive body of research on fit carried out over the last decade by Richard Smith (1988).
Graham A. Douglas
University of Western Australia
Douglas, G.A. (1982) Issues in the fit of data to psychometric models. Education Research and Perspectives, 9, 32-43.
Molenaar, I.W. and Hoijtink, H. (1990) The many null distributions of person fit indices. Psychometrika, 55(1), 75-106.
Rasch, G. (1960, 1992) Probabilistic models for some intelligence and attainment tests. Chicago: MESA Press.
Smith, R.M. (1988) The distributional properties of Rasch standardized residuals. Educational and Psychological Measurement, 48, 657-667.
Wright, B.D. and Panchapakesan, N.A. (1969) A procedure for sample-free item analysis. Educational and Psychological Measurement, 29, 23-48.
Ways of scoring 4 on 8 items, uniformly distributed ±3 logits. Response strings in descending order of probability; items ordered from easiest to hardest. Strings sharing a row are equally probable.

Response Strings | Probability (each) | Exceedance | Diagnosis |
---|---|---|---|
11110000 (Most Likely) | 0.4317 | 1.0000 | Muted |
11101000 | 0.1832 | 0.5683 | OK |
11011000 11100100 | 0.0777 | 0.3074 | OK |
10111000 11100010 11010100 | 0.0330 | 0.1637 | OK |
01111000 10110100 11001100 11010010 11100001 | 0.0140 | 0.0747 | ? |
01110100 10101100 10110010 11001010 11010001 | 0.0059 | 0.0369 | ? |
01101100 01110010 10011100 10101010 10110001 11000110 11001001 | 0.0025 | 0.0158 | Noisy |
10011010 10100110 01011100 01101010 01110001 10101001 11000101 | 0.0011 | 0.0069 | Noisy |
00111100 01011010 01100110 10010110 10011001 10100101 11000011 01101001 | 0.0005 | 0.0026 | Noisy |
00111010 01010110 10001110 10010101 10100011 01011001 01100101 | 0.0002 | 0.0010 | Noisy |
00110110 00111001 01001110 01010101 01100011 10001101 10010011 | 0.0001 | 0.0004 | Noisy |
00101110 10001011 00110101 01001101 01010011 | 0.0000 | 0.0001 | Noisy |
10000111 00011110 00101101 00110011 01001011 | 0.0000 | 0.0000 | Noisy |
00011101 00101011 01000111 | 0.0000 | 0.0000 | Noisy |
00011011 00100111 | 0.0000 | 0.0000 | Noisy |
00010111 | 0.0000 | 0.0000 | Noisy |
00001111 (Least Likely) | 0.0000 | 0.0000 | Noisy |
Response patterns and their probabilities. Douglas GA. Rasch Measurement Transactions, 1990, 3:4 p.75
The URL of this page is www.rasch.org/rmt/rmt34a.htm