Psychometricians have long known that guessing is a major threat to the validity of a test score and can be a source of construct-irrelevant variance. Guessing behaviors are typically investigated in a number of ways, but almost all involve administering an exam to an appropriate sample and examining the scores and response patterns for clues that guessing might have occurred. At the University of North Carolina at Chapel Hill, we wanted to evaluate the psychometric integrity of our medical school exam items. To do so, we constructed an exam consisting of actual medical school items and administered it to university staff in the Office of Medical Education. We theorized that this sample would need to rely almost entirely on guessing strategies, as none of the participants had any formal educational or experiential training in medicine or the health sciences. By intentionally offering an exam to an inappropriate sample, we were able to investigate guessing more deliberately, identify which exam items were vulnerable to testwiseness, and better discern how guessing might affect the quality of our medical students' test scores.
As part of our experiment, a purposeful mix of easy, moderate, and difficult items was randomly pulled from each of the courses that comprise the first two (pre-clinical) years of the medical school curriculum. Items were assigned to difficulty categories using the following, admittedly arbitrary, schema. Easy items were those answered correctly by 76% or more of medical students; moderately difficult items were those answered correctly by 51%-75% of medical students; and difficult items were those answered correctly by less than 50% of medical students. The exam consisted of 63 items and was administered to 14 professional staff members in the Office of Medical Education. To participate in the study, staff had to hold at least a bachelor's degree and have no formal educational or experiential training in the physical, life, or health sciences that might offer an undue advantage on the exam. These inclusion criteria were necessary so that the exam would assess primarily guessing behaviors, with minimal influence of content knowledge.
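The difficulty schema described above can be sketched as a small function. This is only an illustration of the stated cut-offs, not the authors' procedure; the function name `classify_difficulty` is hypothetical. Because the stated bands leave some values uncovered (for example, exactly 50%), the sketch returns `None` for those values rather than guessing which band was intended.

```python
from typing import Optional


def classify_difficulty(percent_correct: float) -> Optional[str]:
    """Map the percentage of medical students answering an item correctly
    to a difficulty band, using the cut-offs stated in the text.

    Values that the stated schema does not cover (such as exactly 50%)
    return None instead of being forced into a band.
    """
    if not 0 <= percent_correct <= 100:
        raise ValueError("percent_correct must be between 0 and 100")
    if percent_correct >= 76:
        return "easy"
    if 51 <= percent_correct <= 75:
        return "moderate"
    if percent_correct < 50:
        return "difficult"
    return None  # falls in a gap the stated schema leaves undefined


if __name__ == "__main__":
    # Hypothetical percentages, used only to exercise each branch.
    for p in (82, 64, 37, 50):
        print(p, classify_difficulty(p))
```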
Accompanying each item was a follow-up question that asked test-takers to indicate the extent to which they relied on guessing strategies to answer the previous question. Using Rogers' (1999) framework for guessing, we asked test-takers to indicate whether they relied on random, cued, or informed guessing, or no guessing at all. Specifically, we provided the following item:
Please identify the strategy you used to answer the previous question from the options below:
Overall, the results reveal that a mix of guessing strategies was used. Table 1 presents information regarding the use and success of each guessing strategy. Participants reported they did not guess on 17 items, but the success rate for this strategy indicates they were correct only 70% of the time. Random guessing was used most frequently (nearly half the time) but resulted in the lowest success rate (around 24%). Cued and informed guessing resulted in nearly equal success rates (45-49%).
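A tally of this kind, giving use and success rate per self-reported strategy, could be computed along the following lines. This is a minimal sketch, not the authors' analysis: the function name `success_by_strategy`, the strategy labels, and the toy responses are all hypothetical and do not reproduce the study data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Labels assumed here for the four self-report options (Rogers' categories plus "none").
STRATEGIES = ("none", "random", "cued", "informed")


def success_by_strategy(
    responses: Iterable[Tuple[str, bool]]
) -> Dict[str, Dict[str, float]]:
    """Summarize (strategy, answered_correctly) pairs.

    Returns, for each strategy that was used, how often it was used,
    its share of all responses, and its success rate.
    """
    counts = defaultdict(lambda: [0, 0])  # strategy -> [times used, times correct]
    for strategy, correct in responses:
        if strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy!r}")
        counts[strategy][0] += 1
        counts[strategy][1] += int(correct)

    total = sum(used for used, _ in counts.values())
    return {
        strategy: {
            "used": used,
            "share": used / total,
            "success_rate": right / used,
        }
        for strategy, (used, right) in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical responses for illustration only.
    toy = [
        ("random", False), ("random", True), ("random", False), ("random", False),
        ("cued", True), ("cued", False),
        ("informed", True),
        ("none", True),
    ]
    for strategy, summary in success_by_strategy(toy).items():
        print(strategy, summary)
```

Grouping the same pairs by item difficulty band as well as by strategy would give the breakdown needed to compare strategies across easy, moderate, and difficult items.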
To take the analysis a step further, we investigated guessers' performance based on item difficulty. Using the aforementioned criteria for easy, moderate, and difficult items, we examined which type of guessing resulted in the best success rate relative to item difficulty. Results indicate that the easy items are highly vulnerable to guessing. Such high levels of contamination certainly threaten the validity of the information obtained from these items. Interestingly, cued guessing resulted in a slightly higher success rate on easy items than informed guessing. However, as the difficulty of the items increased, the advantage shifted toward informed guessing, which provided the greater probability of success. The gap between the success rates of informed guessing and cued guessing also widened as the items became more difficult.
According to Rasch measurement theory, a more knowledgeable person should always have a greater probability of success on any item than someone who is less knowledgeable. Because cued guessing (less knowledge) can result in a greater probability of success on easier items than informed guessing (some partial knowledge), this result violates Rasch theory. The results presented here illustrate the necessity for good, sound items that are not susceptible to testwiseness strategies.
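The ordering property invoked above follows from the form of the dichotomous Rasch model, in which the probability of success depends only on the difference between person ability and item difficulty (both in logits) and has no guessing parameter. The sketch below simply evaluates that formula; the function name `rasch_probability` and the ability and difficulty values are hypothetical and are not estimates from this study.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    P = exp(B - D) / (1 + exp(B - D)), with ability B and difficulty D in logits.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


if __name__ == "__main__":
    # Hypothetical values: one less able and one more able person,
    # compared across an easy, a moderate, and a difficult item.
    less_able, more_able = -1.0, 0.5
    for difficulty in (-2.0, 0.0, 2.0):
        p_low = rasch_probability(less_able, difficulty)
        p_high = rasch_probability(more_able, difficulty)
        # The more able person is favored on every item, easy or hard.
        print(f"D={difficulty:+.1f}  less able={p_low:.3f}  more able={p_high:.3f}")
```

Under this model the more able person's probability of success is higher on every item, which is why an observed reversal on easy items signals that something other than the intended trait is driving responses.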
Additional Considerations and Recommendations
Guessing can affect virtually any test score. Even on the best psychometrically functioning exams, test-takers have a minimum 20-25% chance of getting any given item correct when presented with four to five response options. Despite this ever-present threat, it remains unclear to what extent guessing threatens the validity of test scores for persons and organizations that do not have a great deal of psychometric expertise or editorial resources. Professional testing organizations go to great pains to produce items that are as "bulletproof" as possible, but for others offering moderate- to high-stakes exams, this is not always feasible. It is likely that the threat to exam score validity is even greater in such situations.
Organizations without sophisticated psychometric expertise would be wise to securely administer their exams to a sample of savvy test-takers to determine the extent to which the exam items are susceptible to guessing strategies. By asking examinees to report the type of guessing strategy they used to respond to each item, one can get a reasonable estimate of how much of a threat guessing poses to one's exam. Items deemed particularly problematic, or contaminated, could then be revised and administered on future exams. With proper equating, one could evaluate the effectiveness of the attempt to remove guessing contamination by Rasch analyzing the data and comparing the probability of success on the revised item with that on the item in its initial form. If the item's difficulty estimate increases after the revision, it is likely that the revision was successful in removing much of the guessing contamination.
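One way to make the before-and-after comparison concrete is to look at the shift in an item's difficulty estimate relative to its measurement error. The sketch below assumes the two estimates are already on a common, equated logit scale and that each comes with a standard error from the Rasch analysis. The standardized difference it computes is a common way of comparing two independent estimates and is offered as an assumption, not as the procedure the authors used; the function name and example values are hypothetical.

```python
import math
from typing import Tuple


def difficulty_shift(
    before: float, after: float, se_before: float, se_after: float
) -> Tuple[float, float]:
    """Compare an item's Rasch difficulty before and after revision.

    Returns the shift in logits (after - before) and the shift divided by
    the joint standard error of the two estimates.
    """
    joint_se = math.sqrt(se_before ** 2 + se_after ** 2)
    if joint_se == 0:
        raise ValueError("at least one standard error must be non-zero")
    shift = after - before
    return shift, shift / joint_se


if __name__ == "__main__":
    # Hypothetical estimates in logits, for illustration only.
    shift, standardized = difficulty_shift(
        before=-1.2, after=-0.2, se_before=0.3, se_after=0.4
    )
    print(f"shift = {shift:.2f} logits, standardized = {standardized:.2f}")
```

A positive shift means the revised item became harder, which is the direction expected if cues that aided guessing were removed; the standardized value helps judge whether the shift is large relative to estimation error.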
Rogers, H. J. (1999). Guessing in multiple-choice tests. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 23-42). Oxford, UK: Pergamon.
Kenneth D. Royal and Mari-Wells Hedgpeth
University of North Carolina at Chapel Hill
Suggestions for Improving AERA's Peer Review Process and Quality of Symposia. William P. Fisher, Jr. Rasch Measurement Transactions, 2013, 27:1 p. 1408-9
The URL of this page is www.rasch.org/rmt/rmt271d.htm