MEASUREMENT RESEARCH ASSOCIATES
TEST INSIGHTS
September 2009
Greetings
 

Item writers often find it difficult to write multiple-choice items that comply with good item-writing guidelines. This study shows that the extra effort spent writing good items is worthwhile.

Ross Brown
Manager, Test Development and Analysis

Consequences of Flawed Items
Many guidelines for writing good multiple-choice items are intended to reduce the measurement error that results when candidates who may know the information being tested answer an item incorrectly because of the way the item is constructed. Two item flaws that may introduce such measurement error are multiple true/false items and items with negative stems.

Multiple true/false items violate the principle that an item should focus on a single idea or issue. These items usually consist of a minimal stem followed by response options that are conceptually unrelated, so candidates must evaluate each option independently and decide whether it is true or false. For example:

                        The common cold:
                        A.  is transmitted through saliva only.
                        B.  is evident in a chest X-ray.
                        C.  will most often clear up after two days.
                        D.  is treatable with Tamiflu.

Items with negative stems require candidates to select the option that does NOT meet the conditions described in the stem. Candidates may answer these items incorrectly because they skim past the negative word and choose a response that does meet those conditions. In addition, these items do not assess what the candidate actually knows, but rather whether the candidate can identify an incorrect response to the issue presented in the stem. For example, a candidate who knows that pears are not red can answer the question below without knowing the color of a pomegranate.
                         Which of the following is NOT red?
                         A.     apples
                         B.     pomegranates
                         C.     pears
                         D.     tomatoes

This study examined the consequences of using items with these flaws in terms of (1) item difficulty and (2) candidate outcomes. It is patterned after Downing's (2005) study of items administered to medical school students. The analysis was conducted on 138 items: 69 flawed items and 69 unflawed items. The flaws were the two types described above: multiple true/false and negative stems.

An item's p-value is the proportion of candidates who answer the item correctly. The table below shows that the average p-value of the flawed items was lower than that of the unflawed items and of the total item set, indicating that the flawed items were more difficult for candidates to answer correctly.


                     Flawed Items   Unflawed Items   Total Items
Number of items           69              69             138
Average p-value          .61             .69             .65

For the purposes of this study, the passing standard was set arbitrarily at 65% correct. Candidate outcomes were then determined three ways: using the total item set, the flawed items only, and the unflawed items only. Only 37% of candidates passed when scored on the flawed items, compared with 71% when scored on the unflawed items and 52% when scored on the total item set.
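The pass/fail comparison can be sketched the same way. This minimal Python example assumes that "a score of 65% correct" means a candidate passes when their percent correct on the scored items is at least 65%; the functions and the response data are hypothetical illustrations, not the study's actual procedure or data.

```python
def passes(row, item_indices, cut_percent=65):
    """Return True if the candidate's percent correct on the selected
    items is at least cut_percent (assumed rule: score >= cut)."""
    items = list(item_indices)
    correct = sum(row[i] for i in items)
    # Integer comparison avoids floating-point error at the boundary:
    # correct / n >= cut_percent / 100  <=>  100 * correct >= cut_percent * n
    return 100 * correct >= cut_percent * len(items)

def pass_rate(responses, item_indices, cut_percent=65):
    """Proportion of candidates who pass when scored on the selected items only."""
    items = list(item_indices)
    return sum(passes(row, items, cut_percent) for row in responses) / len(responses)

# Hypothetical 0/1 scores for 4 candidates on 6 items (NOT the study's data).
# Items 0-2 are treated as flawed, items 3-5 as unflawed.
responses = [
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 0],
]
flawed = range(0, 3)
unflawed = range(3, 6)
total = range(0, 6)

for label, items in [("flawed", flawed), ("unflawed", unflawed), ("total", total)]:
    print(label, pass_rate(responses, items))
# flawed 0.25
# unflawed 0.75
# total 0.5
```

Scoring the same candidates on different item subsets can change who passes: in this toy example, the subset on which candidates score lowest yields the lowest pass rate.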
 
Although this study is a simulation based on real data, it confirms the impact of flawed items reported by Downing (2005). It also provides concrete evidence supporting the elimination of multiple true/false items and items with negative stems from examinations.

Reference
Downing, S. M. (2005). The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Advances in Health Sciences Education, 10, 133-143.
Measurement Research Associates, Inc.
505 North Lake Shore Dr., Suite 1304
Chicago, IL  60611
Phone: (312) 822-9648     Fax: (312) 822-9650

