When the twentieth century is viewed from the perspective of mental test technology, the Rasch model stands out as a watershed between earlier forms of empirical investigation and the construction of objective social research. By its elimination of reference groups and its emphasis on objective measurement which statistically models the properties of linearity and additivity, the Rasch model offers researchers the opportunity to undertake quantitative studies of mental growth and development with a precision and clarity that, even now, we only expect in the physical sciences.
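For readers who have not met the model, its dichotomous form is usually written as below. The notation is an assumption, not taken from this article: $P_{ni}$ is the probability that person $n$ succeeds on item $i$, $B_n$ is the person's ability and $D_i$ is the item's difficulty, both in logits. The additivity mentioned above shows in the right-hand side, where person and item parameters combine by simple subtraction on a linear scale.

```latex
\log\frac{P_{ni}}{1 - P_{ni}} = B_n - D_i
```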
Unfortunately, advances in methodology, however significant, are generally resisted by a research community (Cohen, 1985). New ideas and methods, despite their benefits, require reappraisal of prevailing practice and the acquisition of new concepts and skills. In general, the more concepts and techniques that must be given up, the more resistance there is against a new methodology. This resistance to change is necessary to protect the practice of science from frivolity and triviality but, not surprisingly, it also inhibits the dissemination and dispersal of genuine advances in scientific method and thinking.
Four topics associated with traditional mental testing obscure the advantages that objective measurement has for the study of educational and mental growth: (1) the distinction between qualitative and quantitative observations, (2) empirical analyses based on grade equivalents, (3) the longing for an absolute zero for the measurement of mental characteristics, and (4) a rigid conceptualization of reliability.
Qualitative versus quantitative:
Few issues in contemporary discussions of research methodology, and
arguably in the history of social research, are more artificial and
have led to more muddled thinking than a futile distinction that some
researchers make between qualitative and quantitative observations.
Miles & Huberman (1984) argue that social reality consists of a
fundamental dichotomy between quantitative and qualitative
observations which logically prohibits the application of statistical
methods to particular social observations. Their approach attempts to
preserve the "qualitative" integrity of social phenomena from a feared
debasement by quantitative methods. Many philosophers have analyzed
this misbegotten perspective and have concluded that the claimed
distinction between qualitative and quantitative observations has no
logical basis (Kaplan, 1964; Reichardt & Cook, 1979, 1980; Walker &
Evers, 1988). Unfortunately, most measurement specialists in
contemporary social research, unable or unwilling to address this
conclusion, have abandoned vast areas of empirical study to inadequate
methods which cannot even approximate objectivity.
In the 1920s, Thurstone concluded that, while every measure begins as a qualitative experience, all empirical investigations, on close inspection, involve the application of quantitative reasoning. A measure focuses on a single aspect of experience, associated with some quality of particular interest, and describes it numerically in order to accomplish a fundamental scientific goal -- the precise description of variation.
Observations do not fall into mutually exclusive quantitative and qualitative classes. All observations are at first qualitative. The methods by which observations are used, however, are almost all quantitative. An important distinction between methods is the degree to which observations are summarized numerically. At one extreme are the "qualitative" methods that employ only non-numerical description, such as personal impression and subjective opinion. Other methods achieving greater generality apply increasingly numerical description, e.g. rank orders. At the other extreme, there are methods that rely exclusively on scientifically modelled linear measures.
Grade equivalents:
In 1972, Angoff noted the severe shortcomings of grade equivalents
(GE) when measuring intellectual growth. The definition of GE's
enforces an equal amount of growth each year, forcing all growth
curves to be straight lines of predetermined slope, thus completely
concealing variations in growth rate. Angoff explained how
differences in GE could not be interpreted as differences in ability
and so urged that GE's be avoided. Twenty years later, scholarly
journals, public school systems and government agencies, otherwise
committed to clarity and precision, continue to study and report
growth in GE's. What a scientific embarrassment!
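How the definition of GE's conceals growth rate can be sketched in a few lines. The example is an illustration only, not Angoff's own: the grade medians are hypothetical numbers chosen to show decelerating growth, and `grade_equivalent` is a made-up helper that interpolates linearly between those medians, which is one common way of defining a GE.

```python
from bisect import bisect_right

# Hypothetical median scale scores by grade (invented for illustration).
# Growth on the underlying scale slows down: +60, +40, +20, +10.
GRADE_MEDIANS = {1: 200.0, 2: 260.0, 3: 300.0, 4: 320.0, 5: 330.0}


def grade_equivalent(score, medians=GRADE_MEDIANS):
    """Linear interpolation of a scale score between grade medians."""
    grades = sorted(medians)
    scores = [medians[g] for g in grades]
    # Clamp scores outside the normed range to the end grades.
    if score <= scores[0]:
        return float(grades[0])
    if score >= scores[-1]:
        return float(grades[-1])
    hi = bisect_right(scores, score)
    lo = hi - 1
    frac = (score - scores[lo]) / (scores[hi] - scores[lo])
    return grades[lo] + frac * (grades[hi] - grades[lo])


# Follow a student who scores at the median in every grade.
median_scores = [GRADE_MEDIANS[g] for g in sorted(GRADE_MEDIANS)]
ge_values = [grade_equivalent(s) for s in median_scores]

score_gains = [b - a for a, b in zip(median_scores, median_scores[1:])]
ge_gains = [b - a for a, b in zip(ge_values, ge_values[1:])]

print("scale-score gains per year:", score_gains)
print("GE gains per year:         ", ge_gains)
```

Under these assumptions the yearly gain in GE is the same every year while the gain on the underlying scale shrinks steadily, so the GE record is a straight line whatever the actual growth curve looks like.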
Absolute zero:
For many researchers and lay-persons alike, measurement in the social
sciences will always seem fundamentally flawed because measures of
non-physical characteristics, such as mental ability or attitude, do
not seem to have the "natural" absolute zeroes so plentiful in
physics. Even measurement specialists, knowledgeable about their
particular techniques, fail to provide an adequate response to this
naive apprehension. In fact, the "no zero" criticism is frequently
accepted as an inherent limitation on the application of science to
human affairs. This, in turn, perpetuates a myth, associated with
Descartes, that the human aspects of experience are not suitable for
scientific investigation.
While the role of zero in measurement can be viewed from several perspectives, I offer the reader two from the physical sciences. First is the simple fact that many measurement applications in the physical sciences, such as pitch, hue, loudness and hardness, do well without any absolute zeroes. The Mohs hardness scale is not even a measure, but a physical operation for ranking geological specimens! In fact, the familiar measures of length and time only acquire their zeroes through the context in which they are applied. Neither length nor time has a natural origin or absolute zero. What they have is agreed-upon starting points - the points from which differences are measured. The practical importance of "natural" zeroes is vastly overrated.
Second is a lesson from thermodynamics, where researchers use scales with various zeroes, each of which has its own theoretical significance. In social research devoted to the expansion of scientific knowledge, this theoretical significance is the central concern. At the simplest level, say for measurement of temperature, the correlation of an observation with the physical expansion of a criterion requires only a convention to establish the numerical values on a scale such as centigrade or Fahrenheit. The zero is no more than a convenient means of anchoring the numbers on the scale.
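The conventional character of such a zero can be checked with simple arithmetic. The sketch below is an illustration added here, with arbitrary readings: it expresses the same three temperatures on the centigrade, Fahrenheit and Kelvin scales and compares ratios of differences, which do not depend on where the zero sits, with ratios of the raw numbers, which do.

```python
import math


def c_to_f(c):
    """Centigrade to Fahrenheit: different unit and different zero."""
    return c * 9.0 / 5.0 + 32.0


def c_to_k(c):
    """Centigrade to Kelvin: same unit, different zero."""
    return c + 273.15


def diff_ratio(values):
    """Ratio of the second difference to the first for three readings."""
    a, b, c = values
    return (c - b) / (b - a)


# Three arbitrary readings, chosen only for illustration.
readings_c = [10.0, 20.0, 40.0]
readings_f = [c_to_f(x) for x in readings_c]
readings_k = [c_to_k(x) for x in readings_c]

ratio_c = diff_ratio(readings_c)
ratio_f = diff_ratio(readings_f)
ratio_k = diff_ratio(readings_k)

# Ratios of raw values change with the choice of zero.
raw_ratio_c = readings_c[2] / readings_c[0]
raw_ratio_f = readings_f[2] / readings_f[0]

print("ratio of differences (C, F, K):", ratio_c, ratio_f, ratio_k)
print("ratio of raw values (C, F):    ", raw_ratio_c, raw_ratio_f)
```

The comparison of differences, which is what the measurement is used for, survives any choice of origin. Only statements about ratios of the raw numbers depend on the zero.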
At higher levels, speculation on theoretical constructs that might underlie the interaction of observation and instrument becomes central. The instrument developer puts greater emphasis on assigning numbers to a scale according to a reproducible consistency between numerical order and hypothesized theoretical terms. When temperature scales are based on molecular activity and heat exchange, the concept of zero acquires a theoretical context whose utility can be investigated through empirical research. When successful, this approach results in a measure with broad empirical implications, an outcome not possible when measurement is based on no more than a correlation with a criterion. The importance of conceptual insight to the development of the theoretical context for a scale of temperature with a meaningful zero applies equally well to the development of social measures.
Reliability:
Reliability is a term that has taken on a sacred and obscure status in
contemporary social research. Few measurement terms have wider
application and less meaning. Researchers rely upon reliability to
qualify the fundamental adequacy of their research. They use it as
the blanket criterion for success or failure. From the perspective of
objective measurement, however, the implications of any particular
reliability are revealed as ambiguous at best. The reliability of a
test is determined by a local and by no means general or necessary
mixture of item difficulties and person abilities. A minor, even
trivial, change in any part of this mixture will change the value of
the reliability coefficient. Indeed, it is not possible to decide
from the value of a reliability coefficient alone whether the test in
question is useful or useless. This widespread misunderstanding about
reliability leads to confusion at best and to entirely erroneous
conclusions at worst.
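The dependence of a reliability coefficient on the local mixture of persons and items can be illustrated with a small simulation. The sketch below is not from the original analysis: it assumes dichotomous Rasch data, uses KR-20 as the reliability coefficient, and invents the item difficulties, ability distributions, sample size and seed purely for illustration. The same 20 items are given to two samples that differ only in the spread of ability.

```python
import math
import random
from statistics import pvariance


def rasch_prob(ability, difficulty):
    """Probability of success under the dichotomous Rasch model (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def simulate(abilities, difficulties, rng):
    """Simulate a persons-by-items matrix of 0/1 responses."""
    return [
        [1 if rng.random() < rasch_prob(b, d) else 0 for d in difficulties]
        for b in abilities
    ]


def kr20(responses):
    """Kuder-Richardson formula 20 for a matrix of 0/1 responses."""
    n_persons = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_persons
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


rng = random.Random(1990)  # fixed seed so the run is repeatable

# Hypothetical test: 20 items spread evenly from -2 to +2 logits.
difficulties = [-2.0 + 4.0 * i / 19 for i in range(20)]

# Two hypothetical samples with the same mean but different spread.
wide_sample = [rng.gauss(0.0, 1.5) for _ in range(1000)]
narrow_sample = [rng.gauss(0.0, 0.3) for _ in range(1000)]

kr_wide = kr20(simulate(wide_sample, difficulties, rng))
kr_narrow = kr20(simulate(narrow_sample, difficulties, rng))

print(f"KR-20, wide ability spread:   {kr_wide:.2f}")
print(f"KR-20, narrow ability spread: {kr_narrow:.2f}")
```

Nothing about the items changes between the two runs, yet the coefficient is much higher for the sample with the wider spread of ability. Under these assumptions the coefficient describes the particular mixture of persons and items rather than a fixed property of the test.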
Faulty Thinking by Educational Researchers. N. Bezruczko. Rasch Measurement Transactions, 1990, 4:3, p. 114-115
The URL of this page is www.rasch.org/rmt/rmt43e.htm