Meaningfulness, Measurement and Item Response Theory (IRT)

There is a basic principle of meaningfulness, accepted across a wide range of philosophical viewpoints, that justifies the use of fundamental measurement models like Rasch's to the exclusion of IRT models. That principle states that meaning is an abstract fictional ideal, approximated to the extent that it is separable from the (metaphorical, geometrical, numerical, historical, etc.) figures representing it. According to the deconstructionist Jacques Derrida (1982, p. 229), for instance, "... the sense aimed at through these figures [of metaphor] is an essence rigorously independent of that which transports it." The hermeneuticist Paul Ricoeur (1977, p. 293) concurs: "No philosophical discourse would be possible, not even a discourse of deconstruction, if we ceased to assume what Derrida justly holds to be 'the sole thesis of philosophy,' namely 'that the meaning aimed at through these figures is an essence rigorously independent of that which carries it over.'" Gadamer (1980, p. 100) agrees as well: "It is clear to us that the figure which we draw to illustrate a mathematical relationship visually is not the mathematical relationship itself. ...in a manner of speaking one looks right through the drawn circle and keeps the pure thought of the circle in mind."

The same principle is at work in Mundy's (1986, p. 392) general theory of meaningful representation: "The hallmark of a meaningless proposition is that its truth-value depends on what scale or coordinate system is employed, whereas meaningful propositions have truth-value independent of the choice of representation, within certain limits. The formal analysis of this distinction leads, in all three areas [measurement theory, geometry, and relativity], to a rather involved technical apparatus focusing upon invariance under changes of scale or changes of coordinate system." The same focus on the independence of figure and meaning, or of scale and proportion, emerges in many other works on the creation of qualitative mathematical meaning (Heidegger, 1967; Luce, 1978; Narens, 1981, 2002; Roberts, 1985, 1994, 1999).
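Mundy's criterion can be made concrete with a small example in the spirit of those discussed in the meaningfulness literature (e.g., Roberts, 1985). The Python sketch below is illustrative only: the two groups' scores are hypothetical, and it assumes that an ordinal scale admits any strictly increasing transformation while a ratio scale admits only multiplication by a positive constant. The statement "group A has the higher mean" keeps its truth-value under the ratio-scale change but loses it under an order-preserving (logarithmic) re-expression, so on an ordinal scale it is meaningless in Mundy's sense.

```python
from math import log
from statistics import mean

# Hypothetical scores for two groups, for illustration only.
group_a = [1, 1, 10]
group_b = [3, 3, 3]


def a_mean_exceeds_b(transform):
    """Truth-value of 'mean(A) > mean(B)' after re-expressing every score."""
    return mean([transform(x) for x in group_a]) > mean([transform(x) for x in group_b])


def identity(x):
    return x


def ordinal_change(x):
    # Strictly increasing on positive scores: admissible for an ordinal scale.
    return log(x)


def ratio_change(x):
    # Multiplication by a positive constant: admissible for a ratio scale.
    return 3.0 * x


ordinal_results = [a_mean_exceeds_b(f) for f in (identity, ordinal_change)]
ratio_results = [a_mean_exceeds_b(f) for f in (identity, ratio_change)]
print("ordinal scale:", ordinal_results)
print("ratio scale:  ", ratio_results)
```

Run as written, it prints [True, False] for the ordinal case and [True, True] for the ratio case: only the ratio-scale comparison is independent of the choice of representation.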

These issues of meaningfulness and measurement are explored at length by Fisher (2003a, 2003b, 2004). The basic point is that the content of tests and surveys ought to illustrate the mathematical relationships between abilities and difficulties, visually and conceptually, without being confused with those relationships themselves. We need to look right through the sample of items used to illustrate the construct and keep the pure thought of the construct in mind, just as, for a wide range of otherwise diverse philosophers, numerical and geometrical figures paradigmatically define meaningful representation (Derrida, 1989, p. 66; Descartes, 1961, p. 8; Gadamer, 1989, pp. 412-3; Kant, 1970, p. 7; see Michell, 1990, pp. 6-8 for more).

Meaningful measurement requires what Rasch (1960) called parameter separation, what Ronald Fisher (1922) called statistical sufficiency, and what Luce and Tukey (1964) called conjoint additivity. All three properties can be identified in Rasch models, but not in IRT models (Wright, 1984, 1999). In purporting to produce meaningful results, IRT models assume, but do not test or establish, the separation of figure and meaning. Wood (1978, p. 31), accordingly, found himself "persuaded by Lumsden (Lumsden 1978) that two- and three-parameter models are not the answer - test scaling models are self-contradictory if they assert both unidimensionality and different slopes for the item characteristic curves." In his study of Rasch models for measurement, David Andrich (1988, p. 67) notes that "[the 2-parameter IRT model] attempts to capture the differences in discriminations of the ICCs. The model destroys the possibility of explicit invariance of the estimates of the person and item parameters and will, therefore, not be pursued here."
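The contrast between parameter separation and its absence can be sketched numerically. The Python example below assumes the standard dichotomous Rasch model, P = 1/(1 + exp(-(θ - b))), and the two-parameter logistic (2PL) model, P = 1/(1 + exp(-a(θ - b))); the person measures, difficulties, and slopes are hypothetical values chosen for illustration. Under the Rasch model the log-odds difference between two items equals b2 - b1 for every person, and the probability of getting item 1 rather than item 2 right, given a raw score of 1, does not involve θ at all, which is one way of seeing that the raw score is sufficient. Under the 2PL both quantities change with θ, and with these values the two items even swap order for the most able person.

```python
import math


def p_rasch(theta, b):
    """Dichotomous Rasch model: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def p_2pl(theta, a, b):
    """Two-parameter logistic model with item slope a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def logit(p):
    return math.log(p / (1.0 - p))


def pattern_given_score(p1, p2):
    """P(item 1 right, item 2 wrong | exactly one right), assuming local independence."""
    p10 = p1 * (1.0 - p2)
    p01 = (1.0 - p1) * p2
    return p10 / (p10 + p01)


# Hypothetical values for illustration only.
persons = [-2.0, 0.0, 2.0]   # person measures (logits)
b1, b2 = -0.5, 1.0           # item difficulties
a1, a2 = 0.6, 1.8            # 2PL slopes

rasch_gaps, twopl_gaps, rasch_cond, twopl_cond = [], [], [], []
for theta in persons:
    r1, r2 = p_rasch(theta, b1), p_rasch(theta, b2)
    t1, t2 = p_2pl(theta, a1, b1), p_2pl(theta, a2, b2)
    # Log-odds difference between the two items for this person.
    rasch_gaps.append(logit(r1) - logit(r2))
    twopl_gaps.append(logit(t1) - logit(t2))
    # Conditional probability of the pattern (1, 0) given a raw score of 1.
    rasch_cond.append(pattern_given_score(r1, r2))
    twopl_cond.append(pattern_given_score(t1, t2))

for theta, rg, tg, rc, tc in zip(persons, rasch_gaps, twopl_gaps, rasch_cond, twopl_cond):
    print(f"theta={theta:+.1f}  Rasch gap={rg:.2f} cond={rc:.3f}   "
          f"2PL gap={tg:+.2f} cond={tc:.3f}")
```

Algebraically, the Rasch gap is (θ - b1) - (θ - b2) = b2 - b1, so θ cancels; the 2PL gap is (a1 - a2)θ - a1·b1 + a2·b2, which cancels θ only when the slopes are equal, that is, only when the 2PL reduces to the Rasch model.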

Because of these internal contradictions, estimating the IRT discrimination and guessing parameters requires much larger samples than Rasch estimation does (Lord, 1983). Even with very large samples, the estimation process may diverge instead of converge; the authors of one popular IRT software program prevent this failure by dropping the additional item parameters, and so reducing the model to a Rasch model, on alternate iterations (Stocking, 1989). Lumsden (1978, p. 22) accordingly contends that "The two- and three-parameter logistic and normal ogive scaling models should be abandoned since, if the unidimensionality requirement is met, the Rasch (1960) one-parameter model will be realized."

It has lately been noted that some IRT advocates mistakenly think that unidimensionality is merely assumed, and not tested, in Rasch model applications. This has never been the case; the misconception may be a by-product of failing to recognize and accept the paradigmatic difference between the measurement perspective (data are fit to models specifying the relational structures needed for meaningfulness) and the statistical perspective (models are fit to data as a means of describing the data). IRT's item discrimination parameter captures and describes interactions between items and respondents in an essentially statistical exercise that compromises the requirements of measurement. This is, of course, quite a reasonable way to proceed in contexts where measurement has already been accomplished, and understanding of the relations between measures is at issue. But the need for information on these interactions does not require that they be estimated at the same time that item calibrations and respondent measures are estimated.

On the contrary, the internal contradictions and estimation problems introduced by the additional item parameter(s) are unnecessary complications, easily overcome by removing those parameters from the measurement environment and placing them in the statistical environment where they belong. Model fit statistics, for instance, usually correlate very highly (0.95 and up) with the discrimination parameter, do not confound the estimation process, have been available in Rasch software since the early 1970s, and are routinely employed in evaluating data quality. The fit statistics isolate specific scale dependencies that render measures and/or calibrations meaningless, and in doing so facilitate the deconstruction of the original research question in a critical search for a more fundamental articulation capable of supporting figure-meaning separation. As Derrida (in Wood & Bernasconi, 1988, pp. 88-89) put it, "I try to place myself at a certain point at which -- and this would be the very 'content' of what I would like to 'signify' -- the thing signified is no longer easily separable from the signifier." Model fit statistics flag failures of invariance in which the clarity of mathematical representation is compromised by content-dependencies that prevent the separation of the amount measured (the signified) from the number representing it (the signifier).
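The relationship between discrimination and fit can be sketched by simulation. The Python code below is a minimal, illustrative sketch, not the way Winsteps or any other program computes fit: it simulates responses from a 2PL model with three hypothetical slopes, then computes the usual information-weighted (infit) and unweighted (outfit) Rasch mean-square statistics. To stay short it uses the generating person and item locations in place of estimates, which a real analysis would not have. An item that discriminates less than the Rasch model expects should show mean-squares above 1, and one that discriminates more should show values below 1, so the fit statistics recover what the slope parameter describes without that parameter entering the estimation of measures and calibrations.

```python
import math
import random


def p_rasch(theta, b):
    """Rasch expectation for a dichotomous item."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def simulate_2pl(thetas, items, rng):
    """items: list of hypothetical (slope, difficulty); returns a persons x items 0/1 matrix."""
    return [[1 if rng.random() < 1.0 / (1.0 + math.exp(-a * (t - b))) else 0
             for a, b in items] for t in thetas]


def item_fit(data, thetas, difficulties):
    """Return (infit, outfit) mean-squares per item under the Rasch model."""
    fits = []
    for j, b in enumerate(difficulties):
        sq_resid = 0.0  # sum of squared raw residuals (x - E)^2
        info = 0.0      # sum of model variances E(1 - E)
        z_sq = 0.0      # sum of squared standardized residuals
        for row, t in zip(data, thetas):
            e = p_rasch(t, b)
            var = e * (1.0 - e)
            r2 = (row[j] - e) ** 2
            sq_resid += r2
            info += var
            z_sq += r2 / var
        # Infit weights residuals by information; outfit is a plain average.
        fits.append((sq_resid / info, z_sq / len(thetas)))
    return fits


rng = random.Random(2005)
thetas = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # hypothetical persons
items = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.0)]         # hypothetical (slope, difficulty)
data = simulate_2pl(thetas, items, rng)
fits = item_fit(data, thetas, [b for _, b in items])

for (a, b), (infit, outfit) in zip(items, fits):
    print(f"slope {a:.1f}  infit {infit:.2f}  outfit {outfit:.2f}")
```

The middle item, whose slope of 1.0 matches the Rasch model, should fit close to 1; the flat and steep items land on opposite sides of 1, mirroring the strong (negative) association between slope and mean-square described above.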

A popular Rasch measurement software package, Winsteps (Linacre, 2005), now makes it possible to estimate IRT item parameters without confounding the estimation of measures and calibrations (Linacre, 2004). The model fit statistics routinely produced in Rasch applications enable the study of individual residual differences between observed and expected responses, and are widely recognized for their diagnostic utility. Because the IRT discrimination estimates correlate so highly with the fit statistics, they simply reproduce information already available and offer nothing beyond what the Rasch residuals provide. The recent inclusion of the IRT parameter estimates in Winsteps, then, provides an instructive point of contrast between the measurement and IRT perspectives, but does not add to the substantive value of the information previously provided (Wright, 1992).
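At the level of individual responses, the residual analysis described above can be sketched as follows. The person measure, item labels, difficulties, and responses are hypothetical, and flagging standardized residuals beyond ±2 is an illustrative convention, not a rule taken from Winsteps. For a dichotomous Rasch item the standardized residual works out to exp(-(θ - b)/2) for a correct response and -exp((θ - b)/2) for an incorrect one, so surprises grow quickly as the person moves away from the item.

```python
import math


def standardized_residual(x, theta, b):
    """z = (observed - expected) / sqrt(model variance) for a dichotomous Rasch item."""
    e = 1.0 / (1.0 + math.exp(-(theta - b)))
    return (x - e) / math.sqrt(e * (1.0 - e))


theta = 1.0  # hypothetical person measure (logits)
difficulties = {"Q1": -2.5, "Q2": -1.0, "Q3": 0.5, "Q4": 2.0, "Q5": 3.5}  # hypothetical
responses = {"Q1": 0, "Q2": 1, "Q3": 1, "Q4": 0, "Q5": 1}                 # hypothetical

residuals = {item: standardized_residual(responses[item], theta, b)
             for item, b in difficulties.items()}
# Illustrative cut-off: treat |z| > 2 as an unexpected response.
unexpected = [item for item, z in residuals.items() if abs(z) > 2.0]

for item, z in residuals.items():
    flag = "  <- unexpected" if item in unexpected else ""
    print(f"{item}: z = {z:+.2f}{flag}")
```

With these values, the missed easy item Q1 (z of about -5.75) and the correct answer to the hard item Q5 (z of about +3.49) are the two flagged responses; the other three responses are close to what the model expects.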

Andrich, D. (1988). Rasch models for measurement. Beverly Hills, California: Sage Publications.

Derrida, J. (1982). Margins of philosophy. Chicago, Illinois: University of Chicago Press.

Derrida, J. (1989). Edmund Husserl's Origin of Geometry: An introduction. Lincoln: University of Nebraska Press.

Descartes, R. (1961). Rules for the direction of the mind (L. J. Lafleur, Trans.). The Library of Liberal Arts. Indianapolis: Bobbs-Merrill.

Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, A, 222, 309-368.

Fisher, W. P., Jr. (2003a, December). Mathematics, measurement, metaphor, metaphysics: Part I. Implications for method in postmodern science. Theory & Psychology, 13(6), 753-790.

Fisher, W. P., Jr. (2003b, December). Mathematics, measurement, metaphor, metaphysics: Part II. Accounting for Galileo's "fateful omission." Theory & Psychology, 13(6), 791-828.

Fisher, W. P., Jr. (2004, October). Meaning and method in the social sciences. Human Studies: A Journal for Philosophy and the Social Sciences, 27(4), 429-454.

Gadamer, H.-G. (1980). Dialogue and dialectic: Eight hermeneutical studies on Plato (P. C. Smith, Trans.). New Haven: Yale University Press.

Gadamer, H.-G. (1989). Truth and method (J. Weinsheimer & D. G. Marshall, Trans.) (Rev. ed.). New York: Crossroad (Original work published 1960).

Heidegger, M. (1967). What is a thing? (W. B. Barton, Jr. & V. Deutsch, Trans.). South Bend, Indiana: Regnery/Gateway.

Kant, I. (1970). Metaphysical foundations of natural science (J. Ellington, Trans.). Indianapolis, Indiana: Bobbs-Merrill. (Original work published 1786)

Lord, F. M. (1983). Small N justifies Rasch model. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 51-61). New York: Academic Press, Inc.

Luce, R. D. (1978). Dimensionally invariant numerical laws correspond to meaningful qualitative relations. Philosophy of Science, 45, 1-16.

Luce, R. D., & Tukey, J. W. (1964). Simultaneous conjoint measurement: A new kind of fundamental measurement. Journal of Mathematical Psychology, 1(1), 1-27.

Lumsden, J. (1978). Tests are perfectly reliable. British Journal of Mathematical and Statistical Psychology, 31, 19-26.

Michell, J. (1990). An introduction to the logic of psychological measurement. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Mundy, B. (1986). On the general theory of meaningful representation. Synthese, 67, 391-437.

Narens, L. (1981). A general theory of ratio scalability with remarks about the measurement-theoretic concept of meaningfulness. Theory and Decision: An International Journal for Philosophy and Methodology of the Social Sciences, 13, 1-70.

Narens, L. (2002). Theories of meaningfulness (S. W. Link & J. T. Townsend, Eds.). Scientific Psychology Series. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Ricoeur, P. (1977). The rule of metaphor: Multi-disciplinary studies of the creation of meaning in language (R. Czerny, Trans.). Toronto: University of Toronto Press.

Roberts, F. S. (1985). Applications of the theory of meaningfulness to psychology. Journal of Mathematical Psychology, 29, 311-332.

Roberts, F. S. (1994). Limitations on conclusions using scales of measurement. In A. Barnett, S. Pollock & M. Rothkopf (Eds.), Operations research and the public sector (pp. 621-671). Amsterdam, The Netherlands: Elsevier.

Roberts, F. S. (1999). Meaningless statements. In R. Graham, J. Kratochvil, J. Nesetril & F. Roberts (Eds.), Contemporary trends in discrete mathematics, DIMACS Series, Volume 49 (pp. 257-274). Providence, RI: American Mathematical Society.

Stocking, M. L. (1989). Empirical estimation errors in item response theory as a function of test properties. Princeton, New Jersey: Educational Testing Service. ETS Research Reports.

Wood, R. (1978). Fitting the Rasch model: A heady tale. British Journal of Mathematical and Statistical Psychology, 31, 27-32.

Wood, D., & Bernasconi, R. (1988). Derrida and différance. Evanston, Illinois: Northwestern University Press.

Wright, B. D. (1984). Despair and hope for educational measurement. Contemporary Education Review, 3(1), 281-288.

Wright, B. D. (1992, Spring). IRT in the 1990's: Which models work best? Rasch Measurement Transactions, pp. 196-200 [www.rasch.org/rmt/rmt61.htm].

Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know (pp. 65-104 [www.rasch.org/memo64.htm]). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Meaningfulness, Measurement and Item Response Theory (IRT). Fisher W.P. Jr.… Rasch Measurement Transactions, 2005, 19:2 p. 1018-20


The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
