Standard Errors for Performance Standards based on Bookmark Judgments

A variety of methods can be used to estimate the standard errors of performance standards or cut scores. These range from classical approaches based on the standard errors of mean panelist judgments (Jaeger, 1991) to more elaborate approaches based on generalizability theory (Yin & Sconing, 2008). Engelhard (2007) and his colleagues (Sullivan, Caines, Tucker, & Engelhard, 2008) recently described the use of Rasch measurement theory as a conceptual framework for evaluating the quality of panelist judgments within the context of bookmark and other item-mapping-based methods. The multifaceted Rasch measurement (MFR) model provides another approach for estimating the standard errors of performance standards. The MFR model can be used to model judgments collected with modified-Angoff procedures, as well as procedures based on item maps, such as the bookmark and mapmark procedures (Schulz & Mitzel, in press).

Modified-Angoff and item-map-based procedures are the two most popular methods for collecting judgments from standard-setting panelists (Cizek & Bunch, 2007), and the bookmark procedure (Mitzel, Lewis, Patz, & Green, 2001) is becoming the standard-setting method of choice in many statewide assessment programs. For example, one possible MFR model for bookmark judgments is:

\ln \left[ \frac{P_{nijk}}{P_{nij(k-1)}} \right] = \theta_n - \delta_i - \omega_j - \tau_k \qquad [1]

where

P_{nijk} = probability of panelist n giving a bookmark rating of k on item i for round j,

P_{nij(k-1)} = probability of panelist n giving a bookmark rating of k-1 on item i for round j,

θ_n = judged performance level for panelist n,

δ_i = judged difficulty for item i,

ω_j = judged performance level for round j, and

τ_k = judged performance standard for bookmark rating category k relative to category k-1.

The rating category coefficients, τ_k, define the performance standards or cut scores.
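To make the model concrete, the following is a minimal sketch in Python of how equation [1] turns a set of facet parameters into category probabilities. All parameter values here are illustrative assumptions, not the MEAP estimates discussed below. Because equation [1] specifies adjacent-category logits, the probability of category k is proportional to the exponential of the cumulative sum of the logits up to step k.

    # Minimal sketch of the MFR model in equation [1] for bookmark ratings.
    # All parameter values are illustrative, not the MEAP estimates.
    import numpy as np

    def category_probabilities(theta, delta, omega, tau):
        """P(rating = k) under ln[P_k / P_(k-1)] = theta - delta - omega - tau_k.

        theta : judged performance level for panelist n
        delta : judged difficulty for item i
        omega : judged performance level for round j
        tau   : array of category coefficients tau_1 .. tau_K
        """
        # Adjacent-category logits; the (log) numerator for category k is
        # the cumulative sum of the logits up to step k (category 0 gets 0).
        logits = theta - delta - omega - tau
        log_num = np.concatenate(([0.0], np.cumsum(logits)))
        num = np.exp(log_num - log_num.max())   # stabilize before normalizing
        return num / num.sum()

    # Hypothetical example: one panelist, one item, one round, three cut scores.
    tau = np.array([-1.0, 0.2, 1.4])
    print(category_probabilities(theta=0.5, delta=0.1, omega=0.0, tau=tau))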

To illustrate the use of the MFR model for estimating standard errors, data from the Michigan Educational Assessment Program (MEAP; http://www.michigan.gov/mde/) are used. The instrument examined in this study is the MEAP Grade 3 mathematics test. The standard-setting panel included 21 panelists, whose judgments were obtained in three separate rounds using a modified bookmark approach called Item Mapping.


Figure 1. Wright Map (Grade 3 mathematics)

The Wright map with the calibrations of the items, panelists, rounds, and performance standards is presented in Figure 1. The judged locations of the items represent the shared understandings of the standard-setting panelists for students within the four performance levels. Panelist locations represent their severities, while round locations represent the average difficulties of the judgments for each round. Finally, the category coefficients represent the performance standards by round (R1, R2, and R3) for these panelists on this assessment (A = Apprentice, B = Basic, M = Met, and E = Exceeded).

Table 1 presents the category statistics, with the category coefficients defined as the performance standards or cut scores. The performance standards change over rounds, and the most disagreement is found in Round 1 for the Apprentice category (OUTFIT = 3.00). The final column in Table 1 gives the standard errors. The standard errors of the performance standards do not vary much over rounds for the apprentice/basic or the basic/met cut score. However, uncertainty regarding the met/exceeded cut score increases markedly over rounds: the error variance at Round 3 is almost three times larger than the error variance at Round 1 (.0625/.0225 = 2.78). Figure 2 presents the category response functions for the Round 3 performance standards. Figure 3 presents the information function, which has a very distinctive shape with a peak at each of the performance standards; it shows graphically the precision of measurement available at each performance standard.
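Under equation [1], an information function like the one in Figure 3 can be traced numerically: at any judged performance level, the Fisher information is the variance of the rating category, and the corresponding standard error is 1/sqrt(information). The sketch below illustrates this relationship with assumed τ values (δ and ω absorbed into θ for simplicity); it is not a reproduction of the Table 1 standard errors, which are the model standard errors of the estimated category coefficients.

    # Sketch of an information function like Figure 3 and its link to
    # standard errors. The tau values are assumptions, not MEAP estimates.
    import numpy as np

    def category_probabilities(theta, tau):
        # Rating-scale form of equation [1], with delta and omega set to zero.
        log_num = np.concatenate(([0.0], np.cumsum(theta - tau)))
        num = np.exp(log_num - log_num.max())
        return num / num.sum()

    def information(theta, tau):
        # For an adjacent-categories Rasch model, the Fisher information at
        # theta is the variance of the rating k.
        p = category_probabilities(theta, tau)
        k = np.arange(len(p))
        return ((k - (k * p).sum()) ** 2 * p).sum()

    tau = np.array([-1.0, 0.2, 1.4])             # assumed cut-score locations
    theta_grid = np.linspace(-4.0, 4.0, 801)
    info = np.array([information(t, tau) for t in theta_grid])
    se = 1.0 / np.sqrt(info)                     # SE(theta) = 1 / sqrt(information)

    # The reported error variances for the met/exceeded cut score imply an
    # almost threefold increase from Round 1 to Round 3:
    print(0.0625 / 0.0225)                       # = 2.78

With widely spaced τ values, the information curve shows a local peak near each cut score, which is the distinctive multi-peaked shape described for Figure 3.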


Additional work is needed to compare different approaches for estimating standard errors for performance standards. Given the high-stakes decisions made on the basis of assessments in education, health, and the professions, it is essential to develop procedures for conveying the uncertainty inherent in the estimated performance standards. The standard errors are readily obtained using the MFR model, and the MFR model offers additional information about the quality of standard-setting judgments that is not available with approaches based on classical or generalizability theory.

George Engelhard, Jr., Ph.D.

Emory University

Cizek, G.J., & Bunch, M.B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks, CA: Sage.

Engelhard, G. (2007). Evaluating bookmark judgments. Rasch Measurement Transactions, 21(2), 1097-1098.

Jaeger, R.M. (1991). Selection of judges for standard-setting. Educational Measurement: Issues and Practice, 10(2), 3-6, 10, 14.

Mitzel, H.C., Lewis, D.M., Patz, R.J., & Green, D.R. (2001). The bookmark procedure: Psychological perspectives. In G.J. Cizek (Ed.), Setting performance standards: Concepts, methods and perspectives (pp. 249-281). Mahwah, NJ: Lawrence Erlbaum Associates.

Schulz, E.M., & Mitzel, H.C. (in press). A mapmark method of standard setting as implemented for the National Assessment Governing Board. In E.V. Smith, Jr., & G.E. Stone (Eds.), Applications of Rasch measurement in criterion-referenced testing. Maple Grove, MN: JAM Press.

Sullivan, R., Caines, J., Tucker, C., & Engelhard, G. (2008, March). Examining the bookmark ratings of standard-setting panelists: An approach based on the multifaceted Rasch measurement model. Paper presented at the International Objective Measurement Workshop (IOMW), New York, NY.

Yin, P., & Sconing, J. (2008). Estimating standard errors of cut scores for item rating and mapmark procedures: A generalizability theory approach. Educational and Psychological Measurement, 68(1), 25-41.


Figure 2. Category Response Function


Figure 3. Information Function

Standard Errors for Performance Standards based on Bookmark Judgments. Engelhard, G. Jr. … Rasch Measurement Transactions, 2008, 22:1 p. 1156-7



