Response Patterns and Their Probabilities

A common misapprehension about person fit statistics has been the belief that the widely known and widely used standardized infit and outfit statistics have the power to detect all types of departures from the objective measurement model. The statistics in question, either of which is referred to here as Z, were designed to identify misfit associated with undifferentiated patterns of responses, and, in the case of outfit, the presence of lucky guessing or unlucky carelessness. What is often not appreciated is that there are other aspects of person misfit for which Z is not a suitable detective. Not only should we not expect Z to detect all aspects of misfit in persons, but any insistence on statistics that might claim such universality would be naive.

The response to critics, therefore, who point out that Z does not detect this-or-that particular type of misfit (usually induced artificially via specifically distributed simulated data) is to reemphasize which types of data problems can be usefully detected by Z. Certainly data constructed to produce two or more clearly identifiable subsets of items or persons, for example, can result in response patterns which Z would not identify as "misfitting". This is because the multi-dimensionality of such data could go unrecognized without fit analysis based on relevant partitions of the persons or the items. In this example, some form of between item-group or between person-group fit statistic is called for. The decision concerning which subsets of items or persons to investigate corresponds to an a priori hypothesis about the data. One might describe this type of misfit investigation as "confirmatory" in contrast to the "exploratory" role for which Z was designed. The practical utility of Z has been noted in the literature from the time of its invention (Wright and Panchapakesan, 1969), yet some call upon it to solve problems for which it was never intended.

There are also critics who, whilst recognizing the situations in which Z is rightly called for, claim that even there it is too crude to do a decent job of detective work, that it sometimes picks out patterns as aberrant when it shouldn't or that it sometimes fails to detect misfitting patterns when it should. Statisticians would describe these problems in terms of the power of the Z statistic: too much power in the first case and too little in the second. Recent research completed by the author and Ben Wright shows that at the exploratory level where some form of statistic is required to fine-tune our decisions about misfitting response patterns, Z is, in fact, quite satisfactory. It is our intention to report this work in a future article. For the present, some observations which have arisen out of that study are in order.

Of all the possible response patterns resulting in a particular raw score, the majority are so unlikely under the governance of the objective measurement model that one does not need any form of a fine-tuned Z statistic in order to make sensible decisions about person fit. Even in the case of a very short test of L=8 items (spread uniformly in difficulty between plus and minus three logits), of the seventy patterns resulting in a raw score of r=4, about 80 percent (58 of the 70) have exceedance probabilities less than 0.05.

This comment requires amplification. The probabilities referred to are the conditional probabilities of the individual response patterns, conditioned on the raw score (in this example, the raw score of four). Furthermore, the determination of "exceedance" necessitates the calculation of the conditional probability of every response pattern which is less probable than the one under investigation. To implement this, all patterns are ordered from most to least likely. When items are sequenced in order of increasing difficulty, no matter how long the test nor what the value of the raw score under study, the most likely pattern is always the Guttman pattern (1111...0000) and the least likely is always its inverse (0000...1111). For the eight items mentioned and a raw score of four, (11110000) has a probability of 0.43 and (00001111) has a probability of 0.0000005. The remaining 68 patterns, listed in the Table, have probabilities between these two values. These individual probabilities, however, are not much help by themselves, since, for any test of practical length, the individual probabilities are infinitesimally small for the majority of patterns. The individual probability of a response pattern such as (10101100) in the current example is 0.006, but the accumulated probability of this and the 53 more aberrant and hence less probable patterns amounts to 0.04. If hypothesis testing is the psychometrician's preference, then the conclusion at an alpha level of 0.05 would be that the pattern (10101100) does not fit the objective measurement model. In practice, it is unlikely that rigid alpha levels would be useful, so patterns within some exceedance interval, say 0.025 to 0.075 (flagged by "?" in the Table), might be identified as warranting investigation.
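The ordering and accumulation just described can be sketched in a few lines of Python. This is only an illustration, not the program used in the study. It assumes eight difficulties equally spaced from -3 to +3 logits, enumerates the seventy patterns by brute force, and takes exceedance to mean the probability of a pattern plus that of every strictly less probable pattern, which is the reading that reproduces the 0.04 quoted above. The names `difficulties`, `prob` and `exceedance` are illustrative.

```python
from itertools import combinations
from math import exp

L, r = 8, 4
# Assumption: difficulties equally spaced from -3 to +3 logits, easiest first.
difficulties = [-3 + 6 * i / (L - 1) for i in range(L)]


def pattern_weight(correct):
    """exp(-sum of difficulties) over the items answered correctly."""
    return exp(-sum(difficulties[i] for i in correct))


# Every way of choosing which r of the L items are answered correctly.
subsets = list(combinations(range(L), r))
# Normalising constant: the sum of the weights of all patterns with score r.
gamma_r = sum(pattern_weight(s) for s in subsets)

# Conditional probability of each response string, given the raw score r.
prob = {}
for s in subsets:
    string = "".join("1" if i in s else "0" for i in range(L))
    prob[string] = pattern_weight(s) / gamma_r


def exceedance(string, tol=1e-9):
    """Probability of this pattern plus all strictly less probable ones.

    The tolerance keeps equally probable patterns from being counted as
    'less probable' because of floating-point noise.
    """
    p0 = prob[string]
    return p0 + sum(p for p in prob.values() if p < p0 * (1 - tol))
```

Under these assumptions `prob["11110000"]` comes to about 0.43 and `exceedance("10101100")` to about 0.04, in line with the figures in the text.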

The final observation concerns an "exact" fit statistic for patterns, but one with a distribution which has to be approximated in a way more complex than Z. As with so many aspects of objective measurement, the first suggestion of an exact fit statistic came from Georg Rasch in his 1960 book. He was describing fit in an overall sense when he made the observation that an exact test would require the determination of the probabilities of all data matrices (of 0's and 1's) less probable than the one under investigation. He recognized, as have others since (Douglas 1982), that the combinatorial problems of determining which are less probable matrices are sufficient to direct researchers to approximations and to focus those approximations on the rows (for persons) or columns (for items) of the matrix of responses.

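The conditional probability of a pattern is a function of (1) the sum of the difficulties, D_i, of the items the respondent had correct, and (2) the elementary symmetric function, γ_r, associated with a raw score of r, that is,

P(X | r) = exp(-ΣX_iD_i) / γ_r,

where X_i is 1 if item i was answered correctly and 0 otherwise, and γ_r is the sum of exp(-ΣX_iD_i) over all patterns with raw score r.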

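There is a monotonic relationship between the accumulation of these probabilities (the exceedance) and the sum ΣX_iD_i of the difficulties of the items answered correctly (the negative of the exponent in the numerator), which we will call W_r: the larger W_r, the less probable the pattern and the smaller its exceedance. This is because, for fixed r, the denominator γ_r is a constant.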
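For tests too long to enumerate, γ_r need not be obtained by listing every pattern. The sketch below uses the familiar one-item-at-a-time recursion for elementary symmetric functions. This is a common technique but an assumption here, since the article does not say how γ_r was computed. The function names and the equally spaced difficulties are illustrative.

```python
from math import exp


def elementary_symmetric(difficulties):
    """Return [gamma_0, ..., gamma_L] for the item parameters exp(-D_i)."""
    gamma = [1.0]
    for d in difficulties:
        eps = exp(-d)
        # Adding one item: a score of k arises either without the new item
        # (old gamma_k) or with it (eps times old gamma_{k-1}).
        new = gamma + [0.0]
        for k in range(1, len(new)):
            without_item = gamma[k] if k < len(gamma) else 0.0
            new[k] = without_item + eps * gamma[k - 1]
        gamma = new
    return gamma


def conditional_probability(pattern, difficulties):
    """P(pattern | raw score) = exp(-sum X_i D_i) / gamma_r."""
    r = sum(pattern)
    w = sum(x * d for x, d in zip(pattern, difficulties))
    return exp(-w) / elementary_symmetric(difficulties)[r]


# Assumption: eight difficulties equally spaced from -3 to +3 logits.
difficulties = [-3 + 6 * i / 7 for i in range(8)]
p_guttman = conditional_probability([1, 1, 1, 1, 0, 0, 0, 0], difficulties)
```

With these difficulties `p_guttman` is about 0.43, the value quoted for (11110000).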

All response patterns may be ordered from most likely to least likely by increasing value of W_r. This gives us an "exact" fit statistic for person performance. The difficulties of the eight items in the example being used as illustration are ±3.0, ±2.1, ±1.3, and ±0.4 when rounded to one decimal place (the probabilities in the Table correspond to a uniform spacing of 6/7 logit, that is ±3.000, ±2.143, ±1.286, and ±0.429). If we focus on a raw score of r=4, the most likely pattern (11110000) has W₄ of -3.000 - 2.143 - 1.286 - 0.429 = -6.857, or about -6.9. An intermediate pattern such as (10101100) has W₄ = -2.6, and the least likely pattern (00001111) has W₄ = 6.9. Extreme values of W, like extreme values of Z, in either direction, indicate misfit.
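A minimal sketch of the W statistic follows, again assuming the eight difficulties are spaced exactly 6/7 logit apart. With the one-decimal values quoted above, the first sum would come to -6.8 rather than -6.9. The name `w_statistic` is illustrative.

```python
from itertools import combinations

# Assumption: exact difficulties, equally spaced from -3 to +3 logits.
difficulties = [-3 + 6 * i / 7 for i in range(8)]


def w_statistic(string):
    """Sum of the difficulties of the items answered correctly ('1')."""
    return sum(d for x, d in zip(string, difficulties) if x == "1")


# All 70 response strings with a raw score of 4.
patterns = ["".join("1" if i in c else "0" for i in range(8))
            for c in combinations(range(8), 4)]

# Increasing W runs from the most likely to the least likely pattern.
ranked = sorted(patterns, key=w_statistic)
```

Here `w_statistic("11110000")`, `w_statistic("10101100")` and `w_statistic("00001111")` round to -6.9, -2.6 and 6.9, and `ranked` begins with the Guttman pattern and ends with its inverse.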

A preliminary investigation of this statistic and approximations to its distribution has been carried out by Molenaar and Hoijtink (1990). Since W depends on both the number and the dispersion of item difficulties, however, it is capable of little more than a probability ordering of the patterns for a given score. The published approximations reduce these dependencies, but they increase the complexity of the calculations substantially and would appear to have limited utility. Our research shows that the approximations embedded in Z are sufficiently accurate without resorting to complex expressions involving second and third-order symmetric functions. These conclusions add further evidence to the extensive body of research on fit carried out over the last decade by Richard Smith (1988).

Molenaar, I.W. and Hoijtink, H. (1990). The many null distributions of person fit indices. Psychometrika, 55(1), 75-106.

Rasch, G. (1960, 1992). Probabilistic models for some intelligence and attainment tests. Chicago: MESA Press.

Smith, R.M. (1988). The distributional properties of Rasch standardized residuals. Educational and Psychological Measurement, 48, 657-667.

Wright, B.D. and Panchapakesan, N.A. (1969). A procedure for sample-free item analysis. Educational and Psychological Measurement, 29, 23-48.

Table. Ways of scoring 4 on 8 items, uniformly distributed ±3 logits. Response strings in descending order of probability; items ordered from easiest to hardest. Strings on the same row are equally probable and share one exceedance value.
Response Strings	Probability (each)	Exceedance	Diagnosis
11110000 (most likely)	0.4317	1.0000	Muted
11101000	0.1832	0.5683	OK
11011000 11100100	0.0777	0.3074	OK
10111000 11100010 11010100	0.0330	0.1637	OK
01111000 10110100 11001100 11010010 11100001	0.0140	0.0747	?
01110100 10101100 10110010 11001010 11010001	0.0059	0.0369	?
01101100 01110010 10011100 10101010 10110001 11000110 11001001	0.0025	0.0158	Noisy
10011010 10100110 01011100 01101010 01110001 10101001 11000101	0.0011	0.0069	Noisy
00111100 01011010 01100110 10010110 10011001 10100101 11000011 01101001	0.0005	0.0026	Noisy
00111010 01010110 10001110 10010101 10100011 01011001 01100101	0.0002	0.0010	Noisy
00110110 00111001 01001110 01010101 01100011 10001101 10010011	0.0001	0.0004	Noisy
00101110 10001011 00110101 01001101 01010011	0.0000	0.0001	Noisy
10000111 00011110 00101101 00110011 01001011	0.0000	0.0000	Noisy
00011101 00101011 01000111	0.0000	0.0000	Noisy
00011011 00100111	0.0000	0.0000	Noisy
00010111	0.0000	0.0000	Noisy
00001111 (least likely)	0.0000	0.0000	Noisy

Response patterns and their probabilities. Douglas GA. … Rasch Measurement Transactions, 1990, 3:4 p.75


