Standard Errors for Performance Standards based on Bookmark Judgments

A variety of methods can be used to estimate the standard errors of performance standards or cut scores. Historically, these methods have ranged from classical methods based on the standard errors of mean panelist judgments (Jaeger, 1991) to more elaborate approaches based on generalizability theory (Yin and Sconing, 2008). Engelhard (2007) and his colleagues (Sullivan, Caines, Tucker, & Engelhard, 2008) recently described the use of Rasch measurement theory as a conceptual framework for evaluating the quality of panelist judgments within the context of bookmark and other item mapping based methods. The multifaceted Rasch measurement (MFR) model provides another approach for estimating the standard errors of performance standards. The MFR model can be used to model judgments collected from modified-Angoff procedures, as well as procedures based on item maps, such as bookmark and mapmark procedures (Schulz and Mitzel, in press).
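For comparison, the classical approach mentioned above can be sketched in a few lines. This is a minimal illustration, assuming each panelist contributes one recommended cut score and that the standard error is the sample standard deviation divided by the square root of the number of panelists. The function name and the judgments are hypothetical and are not taken from any study cited here.

```python
import math
import statistics


def classical_cut_score_se(panelist_cut_scores):
    """Return (mean cut score, standard error of the mean) for one panel.

    Assumes one recommended cut score per panelist and treats panelists
    as a simple random sample (an assumption of this sketch).
    """
    n = len(panelist_cut_scores)
    if n < 2:
        raise ValueError("at least two panelist judgments are required")
    mean_cut = statistics.fmean(panelist_cut_scores)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    se = statistics.stdev(panelist_cut_scores) / math.sqrt(n)
    return mean_cut, se


# HYPOTHETICAL judgments from four panelists, for illustration only.
hypothetical_judgments = [10.0, 12.0, 14.0, 16.0]
mean_cut, se = classical_cut_score_se(hypothetical_judgments)
print(f"mean cut score = {mean_cut:.2f}, standard error = {se:.3f}")
```

This index reflects only the spread among panelists. It says nothing about item, round, or rating-category effects, which is the gap the MFR model is meant to address.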

Modified-Angoff and item-map-based procedures are the two most popular methods for collecting judgments from standard-setting panelists (Cizek and Bunch, 2007). The bookmark procedure (Mitzel, Lewis, Patz, and Green, 2001) is becoming the standard-setting method of choice in many statewide assessment programs. For example, one possible MFR model for bookmark judgments is:

Pnijk = probability of panelist n giving a bookmark rating of k on item i for round j,

tk = judged performance standard for bookmark rating category k relative to category k-1.

The rating category coefficients, tk, define the performance standards or cut scores.
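The definitions above fit an adjacent-categories (log-odds) formulation of the many-facet model. As a sketch only, one form consistent with them is shown below. The symbols for panelist severity, judged item location, and round location, and the signs attached to them, are assumptions introduced here for illustration. Only P_nijk and the category coefficient (written tk in the text, \tau_k below) come from the definitions above.

```latex
% Assumed symbols (not from the source):
%   \theta_n = severity of panelist n
%   \delta_i = judged location of item i
%   \rho_j   = location of round j
% From the text:
%   P_{nijk} = probability of panelist n giving rating k on item i in round j
%   \tau_k   = category coefficient for rating k relative to rating k-1
\[
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
  = \theta_n - \delta_i - \rho_j - \tau_k
\]
```

Under this form, each \tau_k is the point on the logit scale at which ratings k-1 and k are equally likely once the other facets are accounted for, which is why the category coefficients can be read as cut scores.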

To illustrate the use of the MFR model for estimating standard errors, data from the Michigan Educational Assessment Program (MEAP; http://www.michigan.gov/mde/) are used. The instrument examined in this study is the MEAP Grade 3 mathematics test. There were 21 panelists on the standard-setting panel. The judgments were obtained in three separate rounds using a modified bookmark approach called Item Mapping.

The Wright map with the calibrations of the items, panelists, rounds, and performance standards is presented in Figure 1. The judged locations of the items represent the shared understandings of the standard-setting panelists for students within the four performance levels. Panelist locations represent their severities, while round locations represent the average difficulties of judgments for each round. Finally, the category coefficients represent the performance standards by round (R.1, R.2, and R.3) for these panelists on this assessment (A = Apprentice, B = Basic, M = Met, and E = Exceeded).

Table 1 presents the category statistics with the category coefficients defined as the performance standards or cut scores. The performance standards change over rounds, and the most disagreement is found in Round 1 for the Apprentice category (OUTFIT = 3.00). The final column in Table 1 gives the standard errors. The standard errors for the performance standards do not vary much over rounds for the apprentice/basic cut score or the basic/met cut score. However, uncertainty regarding the met/exceeded cut score increases substantially over rounds: the error variance at Round 3 is nearly three times the error variance at Round 1 (.0625/.0225 ≈ 2.78). Figure 2 presents the category response function for the performance standards for Round 3. Figure 3 presents the information function, which has a very distinctive shape with a peak at each of the performance standards. The information function shows graphically how the information is spread around each performance standard.
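The variance comparison and the peaked information function can both be illustrated with a short sketch. It assumes a Rasch rating-scale form in which the log-odds of adjacent rating categories equal a combined location minus a category coefficient, and it uses the fact that, under such a model, the information about the location equals the model variance of the rating. The standard errors 0.25 and 0.15 are those implied by the variances quoted above. The cut scores in the second part are hypothetical values chosen only to show the shape, not the MEAP estimates, and all function names are illustrative.

```python
import math


def error_variance_ratio(se_later, se_earlier):
    """Ratio of error variances (squared standard errors) for two rounds."""
    return (se_later ** 2) / (se_earlier ** 2)


def category_probabilities(location, thresholds):
    """Probabilities of ratings 0..m under an adjacent-categories Rasch model.

    Assumes ln(P_k / P_{k-1}) = location - thresholds[k - 1].
    """
    exponents = [0.0]  # rating 0 has exponent 0 by convention
    for tau in thresholds:
        exponents.append(exponents[-1] + (location - tau))
    largest = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def rating_information(location, thresholds):
    """Model variance of the rating, i.e. the information about the location."""
    probs = category_probabilities(location, thresholds)
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))


# Standard errors implied by the variances quoted in the text (.0625, .0225).
ratio = error_variance_ratio(0.25, 0.15)

# HYPOTHETICAL cut scores in logits, widely spaced to make the peaks visible.
hypothetical_cuts = [-6.0, 0.0, 6.0]
info_at_cut = rating_information(0.0, hypothetical_cuts)
info_between_cuts = rating_information(3.0, hypothetical_cuts)

print(f"error variance ratio: {ratio:.2f}")
print(f"information at a cut score: {info_at_cut:.3f}")
print(f"information between cut scores: {info_between_cuts:.3f}")
```

With widely spaced cut scores, the information is highest near each cut score and drops between them, which is the kind of multi-peaked shape described for Figure 3.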

Additional work is needed to compare different approaches for estimating standard errors for performance standards. Given the high-stakes decisions made on the basis of assessments in education, health, and the professions, it is essential to develop procedures for conveying the uncertainty inherent in the estimated performance standards. The standard errors are readily obtained using the MFR model, and the MFR model offers additional information about the quality of standard-setting judgments that is not available with approaches based on classical or generalizability theory.

Cizek, G.J., & Bunch, M.B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks, CA: Sage.

Engelhard, G. (2007). Evaluating bookmark judgments. Rasch Measurement Transactions, 21(2), 1097-1098.

Jaeger, R.M. (1991). Selection of judges for standard-setting. Educational Measurement: Issues and Practice, Spring, 3-6, 10, 14.

Mitzel, H.C., Lewis, D.M., Patz, R.J., & Green, D.R. (2001). The bookmark procedure: Psychological perspectives. In G.J. Cizek (Ed.), Setting performance standards: Concepts, methods and perspectives (pp. 249-281). Mahwah, NJ: Lawrence Erlbaum Associates.

Schulz, E.M., & Mitzel, H.C. (in press). A mapmark method of standard setting as implemented for the National Assessment Governing Board. In E.V. Smith, Jr., & G.E. Stone (Eds.), Applications of Rasch measurement in criterion-referenced testing. JAM Press.

Sullivan, R., Caines, J., Tucker, C., & Engelhard, G. (2008, March). Examining the bookmark ratings of standard-setting panelists: An approach based on the multifaceted Rasch measurement model. IOMW 2008, New York.

Yin, P., & Sconing, J. (2008). Evaluating standard errors of cut scores for Item Rating and Mapmark procedures: A Generalizability Theory approach. Educational and Psychological Measurement, 68(1), 25-41.

Standard Errors for Performance Standards based on Bookmark Judgments. Engelhard, G. Jr. … Rasch Measurement Transactions, 2008, 22:1 p. 1156-7
