Deleting Items Improves Reliability on Multiple Choice Examinations

MEASUREMENT RESEARCH ASSOCIATES

TEST INSIGHTS

May 2008

Greetings!

The quest to improve the reliability of certification examinations is ongoing. Item quality is the foundation of educational measurement. We have observed that removing poorly performing items (usually poorly written items) from scoring actually reduces the error of measurement and improves the reliability of the examination.

Mary E. Lunz, Ph.D.

Deleting Items Improves Reliability on Multiple Choice Examinations

The purpose of written certification examinations is to identify the candidates who are qualified to practice effectively. The mechanism for accomplishing this is usually a set of multiple-choice items with four or five options. The quality of the multiple-choice items included in an examination is the basis for the reliability, or accuracy, of the decisions made about candidate performance. In classical terms, this means each item should have an acceptable p-value (percent correct) and point-biserial correlation. In Rasch terms, it means the item difficulty, as well as the infit and outfit statistics, should be within acceptable limits. Of course, the items must also reasonably represent the pertinent content areas in the field of practice. Items that meet these performance criteria lead to a lower error of measurement and more accurate outcomes for candidates. Candidate separation reliability, (Standard Deviation² - Standard Error²)/Standard Deviation², estimates the accuracy of the measured differences among candidates. Here Standard Deviation² is the observed variance of the candidate measures and Standard Error² is the mean of the candidates' squared standard errors of measurement.
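As a rough illustration of the separation reliability formula, the sketch below computes it from a list of candidate measures and their standard errors. The function name and the use of the population variance for Standard Deviation² are assumptions made for this example, not the documented procedure of any particular program.

```python
from statistics import fmean, pvariance


def separation_reliability(measures, standard_errors):
    """Candidate separation reliability = (SD^2 - SE^2) / SD^2.

    measures: candidate ability estimates (for example, Rasch logits).
    standard_errors: the standard error of each candidate's measure.
    SD^2 is taken here as the population variance of the measures (an
    assumption), and SE^2 as the mean of the squared standard errors.
    """
    if len(measures) != len(standard_errors) or len(measures) < 2:
        raise ValueError("need matching lists with at least two candidates")
    observed_var = pvariance(measures)                       # SD^2
    error_var = fmean(se * se for se in standard_errors)      # SE^2
    if observed_var == 0:
        raise ValueError("measures have zero variance")
    # Share of the observed variance that is not measurement error
    return (observed_var - error_var) / observed_var


# Two candidates at -1 and +1 logits, each measured with SE = 0.3
print(separation_reliability([-1.0, 1.0], [0.3, 0.3]))  # approximately 0.91
```

Because the error variance is subtracted from the observed variance, anything that shrinks the candidates' standard errors (or spreads their measures further apart) raises the reliability.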

On items that are good measures, candidates who do well on the total test have the highest probability of answering the item correctly, while candidates who do poorly have the lowest probability. Many item-writing guides set out the principles for writing such items (see Item Development Guidelines at www.MeasurementResearch.com). Well-written multiple-choice items distinguish between more and less knowledgeable candidates, reduce the error of measurement, and consequently lead to higher candidate separation reliability.
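The pattern described above can be checked item by item. The following is a minimal sketch, not the authors' procedure: it computes the classical p-value and a corrected point-biserial (the item correlated with the total score on the other items), plus Rasch infit and outfit mean-squares for a dichotomous item, treating the candidate abilities and the item difficulty as already estimated. All function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def classical_item_stats(responses, item):
    """p-value and corrected point-biserial for one item.

    responses: list of candidate rows, each a list of 0/1 item scores.
    The item is correlated with the total on the *other* items, so it
    does not inflate its own correlation.
    """
    scores = [row[item] for row in responses]
    rest = [sum(row) - row[item] for row in responses]
    p_value = sum(scores) / len(scores)
    return p_value, pearson(scores, rest)


def rasch_fit(abilities, difficulty, scores):
    """Infit and outfit mean-squares for one dichotomous item (logit scale)."""
    z_squares, sq_resid, info = [], 0.0, 0.0
    for theta, x in zip(abilities, scores):
        p = 1.0 / (1.0 + math.exp(difficulty - theta))  # Rasch P(correct)
        var = p * (1.0 - p)                             # model variance of x
        z_squares.append((x - p) ** 2 / var)            # squared standardized residual
        sq_resid += (x - p) ** 2
        info += var
    outfit = sum(z_squares) / len(z_squares)  # unweighted mean-square
    infit = sq_resid / info                   # information-weighted mean-square
    return infit, outfit


abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(rasch_fit(abilities, 0.0, [0, 0, 1, 1, 1]))  # higher scorers succeed
print(rasch_fit(abilities, 0.0, [1, 1, 0, 0, 0]))  # lower scorers succeed
```

For the first response string the outfit mean-square is roughly 0.4 (the item follows the expected pattern closely), while for the reversed string it is roughly 4.2, the kind of misfit that flags an item for review.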

One way to reduce measurement error is to include a sufficient number of items on the examination, at least 100. The conventional wisdom is that more items decrease the error of measurement and increase reliability. However, after reviewing data from many examinations, we have found that test length alone does not guarantee good reliability. The consistency of item content within sections and within the whole test is critical. So is the statistical performance of each item. Whether item performance is evaluated with classical statistics or with Rasch IRT, items that perform poorly introduce measurement error and consequently reduce examination reliability. In fact, we have found that deleting poorly performing items often increases the reliability of the examination, even though the total number of items decreases. Examples that illustrate the value of deleting poorly performing items are shown in the Table below.
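A small simulation can show why a shorter test can be more reliable. This sketch uses hypothetical, randomly generated data, not the examinations discussed here: 60 items follow the Rasch model, 10 "noisy" items are answered correctly 25% of the time regardless of ability, and items with a corrected point-biserial below a hypothetical cut-off of 0.10 are deleted. It reports KR-20, the classical counterpart of separation reliability, rather than a Rasch estimate.

```python
import math
import random


def simulate(n_candidates=500, n_good=60, n_noisy=10, seed=1):
    """Hypothetical 0/1 data: Rasch items followed by ability-independent items."""
    rng = random.Random(seed)
    difficulties = [rng.uniform(-2.0, 2.0) for _ in range(n_good)]
    rows = []
    for _ in range(n_candidates):
        theta = rng.gauss(0.0, 1.0)
        good = [int(rng.random() < 1.0 / (1.0 + math.exp(b - theta))) for b in difficulties]
        noisy = [int(rng.random() < 0.25) for _ in range(n_noisy)]  # unrelated to ability
        rows.append(good + noisy)
    return rows


def kr20(rows):
    """KR-20 reliability (coefficient alpha for 0/1 items)."""
    n, k = len(rows), len(rows[0])
    item_var = 0.0
    for j in range(k):
        p = sum(row[j] for row in rows) / n
        item_var += p * (1.0 - p)
    totals = [sum(row) for row in rows]
    mean_t = sum(totals) / n
    total_var = sum((t - mean_t) ** 2 for t in totals) / n
    return k / (k - 1) * (1.0 - item_var / total_var)


def corrected_rpb(rows, j, totals):
    """Correlation between item j and the total score on the remaining items."""
    xs = [row[j] for row in rows]
    ys = [t - x for t, x in zip(totals, xs)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def delete_poor_items(rows, min_rpb=0.10):
    """Drop items whose corrected point-biserial is below min_rpb (hypothetical cut-off)."""
    totals = [sum(row) for row in rows]
    keep = [j for j in range(len(rows[0])) if corrected_rpb(rows, j, totals) >= min_rpb]
    return [[row[j] for j in keep] for row in rows], keep


rows = simulate()
trimmed, kept = delete_poor_items(rows)
print(f"{len(rows[0])} items: KR-20 = {kr20(rows):.3f}")
print(f"{len(kept)} items: KR-20 = {kr20(trimmed):.3f}")
```

The noisy items add to each candidate's score variance without adding any true-score variance, so removing them raises KR-20 even though the test loses items. Real poorly written items are rarely pure noise, so the size of the gain in practice will differ.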

Exam	Items before deletion	Candidate separation reliability before deletion	Items after deletion	Candidate separation reliability after deletion
Exam 1	150	0.89	133	0.91
Exam 2	351	0.88	313	0.90
Exam 3	225	0.77	217	0.80
Exam 4	200	0.82	190	0.83
Exam 5	150	0.83	142	0.85
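The Table runs against what the Spearman-Brown prophecy formula would predict. That formula comes from classical test theory and assumes the removed items are comparable to the retained ones, which is exactly what is not true when poorly performing items are deleted; applying it to Rasch separation reliability is only an approximation. The sketch below, using the Table's values, shows that it predicts a small drop for every exam (for Exam 1, about 0.878 with 133 items rather than the observed 0.91).

```python
def spearman_brown(reliability, old_length, new_length):
    """Reliability predicted when a test of old_length items is shortened or
    lengthened to new_length items comparable to the originals."""
    k = new_length / old_length
    return k * reliability / (1 + (k - 1) * reliability)


# (exam, items before, reliability before, items after, reliability after)
TABLE = [
    ("Exam 1", 150, 0.89, 133, 0.91),
    ("Exam 2", 351, 0.88, 313, 0.90),
    ("Exam 3", 225, 0.77, 217, 0.80),
    ("Exam 4", 200, 0.82, 190, 0.83),
    ("Exam 5", 150, 0.83, 142, 0.85),
]

for exam, n_before, r_before, n_after, r_after in TABLE:
    predicted = spearman_brown(r_before, n_before, n_after)
    print(f"{exam}: predicted {predicted:.3f}, observed {r_after:.2f}")
```

The gap between the predicted and observed values suggests the deleted items were contributing more error than information.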

Measurement Research Associates, Inc.

505 North Lake Shore Dr., Suite 1304

Chicago, IL 60611

Phone: (312) 822-9648 Fax: (312) 822-9650

www.MeasurementResearch.com

Rasch-Related Resources: Rasch Measurement YouTube Channel
Rasch Measurement Transactions & Rasch Measurement research papers - free
An Introduction to the Rasch Model with Examples in R (eRm, etc.), Debelak, Strobl, Zeigenfuse
Rasch Measurement Theory Analysis in R, Wind, Hua
Applying the Rasch Model in Social Sciences Using R, Lamprianou
El modelo métrico de Rasch: Fundamentación, implementación e interpretación de la medida en ciencias sociales (Spanish Edition), Manuel González-Montesinos M.
Rasch Models: Foundations, Recent Developments, and Applications, Fischer & Molenaar
Probabilistic Models for Some Intelligence and Attainment Tests, Georg Rasch
Rasch Models for Measurement, David Andrich
Constructing Measures, Mark Wilson
Best Test Design - free, Wright & Stone
Rating Scale Analysis - free, Wright & Masters
Virtual Standard Setting: Setting Cut Scores, Charalambos Kollias
Diseño de Mejores Pruebas - free, Spanish Best Test Design
A Course in Rasch Measurement Theory, Andrich, Marais
Rasch Models in Health, Christensen, Kreiner, Mesbah
Multivariate and Mixture Distribution Rasch Models, von Davier, Carstensen
Rasch Books and Publications: Winsteps and Facets
Applying the Rasch Model (Winsteps, Facets) 4th Ed., Bond, Yan, Heene
Advances in Rasch Analyses in the Human Sciences (Winsteps, Facets) 1st Ed., Boone, Staver
Advances in Applications of Rasch Measurement in Science Education, X. Liu & W. J. Boone
Rasch Analysis in the Human Sciences (Winsteps) Boone, Staver, Yale
Appliquer le modèle de Rasch: Défis et pistes de solution (Winsteps) E. Dionne, S. Béland
Introduction to Many-Facet Rasch Measurement (Facets), Thomas Eckes
Rasch Models for Solving Measurement Problems (Facets), George Engelhard, Jr. & Jue Wang
Statistical Analyses for Language Testers (Facets), Rita Green
Invariant Measurement with Raters and Rating Scales: Rasch Models for Rater-Mediated Assessments (Facets), George Engelhard, Jr. & Stefanie Wind
Aplicação do Modelo de Rasch (Português), de Bond, Trevor G., Fox, Christine M
Exploring Rating Scale Functioning for Survey Research (R, Facets), Stefanie Wind
Rasch Measurement: Applications, Khine
Winsteps Tutorials - free
Facets Tutorials - free
Many-Facet Rasch Measurement (Facets) - free, J.M. Linacre
Fairness, Justice and Language Assessment (Winsteps, Facets), McNamara, Knoch, Fan

Coming Rasch-related Events
Apr. 21 - 22, 2025, Mon.-Tue.	International Objective Measurement Workshop (IOMW) - Boulder, CO, www.iomw.net
Jan. 17 - Feb. 21, 2025, Fri.-Fri.	On-line workshop: Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com
Feb. - June, 2025	On-line course: Introduction to Classical Test and Rasch Measurement Theories (D. Andrich, I. Marais, RUMM2030), University of Western Australia
Feb. - June, 2025	On-line course: Advanced Course in Rasch Measurement Theory (D. Andrich, I. Marais, RUMM2030), University of Western Australia
May 16 - June 20, 2025, Fri.-Fri.	On-line workshop: Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com
June 20 - July 18, 2025, Fri.-Fri.	On-line workshop: Rasch Measurement - Further Topics (E. Smith, Facets), www.statistics.com
July 21 - 23, 2025, Mon.-Wed.	Pacific Rim Objective Measurement Symposium (PROMS) 2025, www.proms2025.com
Oct. 3 - Nov. 7, 2025, Fri.-Fri.	On-line workshop: Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com