Student Personality and Test Objectivity

How can we know when a single achievement test result is valid for a particular student? We can't. The universally applicable uncertainty principle applies in several ways: from a single test result, we cannot determine the effect of the measuring instrument on what is being measured. However, in some camps, there is a belief that one can tell that a test produced an invalid measure of a student's performance by perusing the pattern of responses to test items. Is this an application of science or an attractive mythology? This is a serious question with important implications for kids, teachers, and testing programs.

Is calculating "student fit" an example of detecting a difference? A discrepancy between what and what? A difference between a student's item-by-item performance and the item-by-item relationships specified by Rasch mathematics? A student's personalized test performance differing from the behavior we expected on a purely probabilistic basis? A difference between what a student did on a test and what that student should have done under more favorable sampling conditions?
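To make the first of these candidate "differences" concrete, here is a minimal sketch of one common fit calculation: the mean of squared standardized residuals between observed responses and dichotomous Rasch expectations, often called outfit. The item difficulties, the student measure, and both response patterns are hypothetical, and the student measure is treated as known, whereas in practice it is estimated from the same responses.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def outfit_mean_square(responses, ability, difficulties):
    """Mean squared standardized residual across items for one student."""
    total = 0.0
    for x, d in zip(responses, difficulties):
        p = rasch_probability(ability, d)
        # Squared residual divided by the model variance p * (1 - p).
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)

# Hypothetical item difficulties in logits, easiest to hardest.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
ability = 0.0  # hypothetical student measure in logits (assumed known)

expected_pattern = [1, 1, 1, 0, 0]    # succeeds on easy items, fails hard ones
surprising_pattern = [0, 0, 1, 1, 1]  # fails easy items, succeeds on hard ones

print(outfit_mean_square(expected_pattern, ability, difficulties))
print(outfit_mean_square(surprising_pattern, ability, difficulties))
```

Both students have the same raw score of 3. The first pattern gives a value well below 1 and the second a value well above 1, where 1 is the value the model expects. The statistic reports how surprising the pattern is under the model; it does not say what produced the surprise.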

It is always troubling when something done with numbers is equated to human behavior or misbehavior, if the only connection between them is an explanation. Explanations never prove anything; they just make people say "Oh! I see." Since William Fisher has done such an artistic in-service on the contextual nature of seemingly objective observations of reality, such as response patterns, it might be appropriate to back off and consider the sources of the response patterns in tests and the way psychometricians observe them. As ground rules, let's concern ourselves with the "what" and "how" questions of science and leave the "why" questions to philosophy.

Consider the mythology surrounding the traditional item discrimination index calculation and its use, without regard to the reality of curriculum pacing. What does this kind of "discrimination" really mean? Information about the wide or narrow range of the curriculum over which an item is supposed to reveal student knowledge is embodied in instructional programs. This range is only superficially, rather than causally, represented by high and low groups in some specified population.
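For readers who have not computed it, the traditional index is usually taken as the proportion answering the item correctly in a high-scoring group minus the proportion in a low-scoring group. A minimal sketch follows; the function name, the response data, and the group fraction are all hypothetical choices for illustration (0.27 is a common convention for the group size, not something specified here).

```python
def discrimination_index(scored_responses, item, group_fraction=0.27):
    """Upper-lower discrimination index for one item.

    scored_responses: list of per-student lists of 0/1 item scores.
    item: column index of the item of interest.
    group_fraction: share of students placed in each extreme group
        (an assumption; 0.27 is a common convention).
    """
    # Rank students by total raw score, highest first.
    ranked = sorted(scored_responses, key=sum, reverse=True)
    group_size = max(1, round(len(ranked) * group_fraction))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    p_upper = sum(student[item] for student in upper) / group_size
    p_lower = sum(student[item] for student in lower) / group_size
    return p_upper - p_lower

# Hypothetical scored responses: 6 students by 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
for item in range(4):
    print(item, discrimination_index(responses, item, group_fraction=1/3))
```

Nothing in the calculation refers to the curriculum. The index changes whenever the makeup of the high and low groups changes, which is the sense in which those groups represent the item's range only superficially.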

Another myth is that some items should count more than others because they are mathematically related to high and low groups. These examples of overextended, tangential reasoning seem plausible enough to be swallowed by some statisticians.

In contrast, a different kind of background information comes from the practice, some years ago, of providing teachers with test information that incorporated information about other related characteristics of their students. The number of items students got correct and the number they attempted on a speeded verbal test were included in the teacher's report. To get a derived score, the number correct was divided by the number tried, and the result was then multiplied once more by the number correct. All scores were reported in standard notation based on the district population. Although it was not clear whether this "score" reflected study skills, a turtle-or-hare approach to problem solving, or was just equated for time constraints, it was clear that the teachers recognized and liked the way this information reflected their own assessment of the students' approach to schooling. Some students worked carefully and got all of the few items they tried correct. Some raced ahead to increase the total number correct and took the penalty of more errors. Both behaviors reflected students' ways of working - their achievement mode. This derived "score" was more useful in reflecting ability to do school work on a day-to-day basis than any other we could derive from the data.
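The arithmetic of the derived score described above can be sketched as follows. The student counts are hypothetical. Reading "standard notation" as a z-score against a district mean and standard deviation is an assumption, and the district values used are invented for illustration.

```python
def derived_score(correct, attempted):
    """(number correct / number attempted) * number correct."""
    if attempted == 0:
        return 0.0  # assumption: a student who attempts nothing scores zero
    return (correct / attempted) * correct

def standard_score(value, district_mean, district_sd):
    """z-score relative to the district (assumed reading of 'standard notation')."""
    return (value - district_mean) / district_sd

# Hypothetical students: (number correct, number attempted).
students = {
    "careful": (10, 10),  # few items tried, all correct
    "racer": (14, 28),    # many items tried, many errors
    "middle": (12, 16),
}

# Hypothetical district statistics for the derived score.
DISTRICT_MEAN = 8.0
DISTRICT_SD = 2.0

raw = {name: derived_score(c, a) for name, (c, a) in students.items()}
z = {name: standard_score(v, DISTRICT_MEAN, DISTRICT_SD) for name, v in raw.items()}

for name in students:
    print(name, raw[name], z[name])
```

In this invented example the careful student, with 10 correct, receives a higher derived score (10.0) than the racer, with 14 correct (7.0). The formula weights accuracy as well as volume, which fits the turtle-or-hare reading the teachers recognized.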

We know that test validity is specific to a student. We assume that validity differs across students taking a test. And, although we can't know their covert behaviors, we assume each student is unique in the combination of learning style, background experiences, cortical cycling speed, learned social responses, interest in different aspects of the curriculum, success or failure with schooling, and other personality characteristics, as well as in transitory attitudes toward school, testing, teachers, and others. Can't we, then, simply say that whatever responses a student makes reflect that student's ability to answer questions in this situation? Scores obtained earlier and later can then supply the data for deciding whether to delete a score as aberrant.

Consider also that validity is actually made up of two separate components: instrument validity is one, and the student component, which we sometimes call reliability, is the other. Like heredity and environment, neither can exist without the other, but we can discuss their contributions individually. The personal part of test taking can only be guessed. Our perception of its manifestation is biased by many experiences and by our learned response of searching for a statistic when all else fails. Why not leave reliability to whatever unknown factors might be exerting influence?

We know that learning for an individual should not be expected to progress in a linear manner or at a constant pace. We know that reinforcement of different learnings in the school situation varies, and that its effects on different individuals vary. And we know that teaching some lessons makes earlier learnings evaporate. For example, students who formerly knew how to add a number to zero correctly will often answer later that a number plus zero equals zero, after they have been taught that a number multiplied by zero equals zero.

Low raw scores are a different matter. They raise a different question - poor test construction. First, get rid of ambiguous items. Second, use a different test. When students get fewer than a third of the items correct on a test, it is due to faulty test assignment procedures and can be corrected by retesting with a lower-level test. Inevitable idiosyncrasies in teaching and learning must never be penalized in a misguided effort to report higher test reliabilities.

Student personality and test objectivity. Ingebo G. … Rasch Measurement Transactions, 1990, 3:4 p.86

The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
