A variety of methods can be used to estimate the standard errors of performance standards or cut scores. Historically, these have ranged from classical approaches based on the standard errors of mean panelist judgments (Jaeger, 1991) to more elaborate approaches based on generalizability theory (Yin & Sconing, 2008). Engelhard (2007) and his colleagues (Sullivan, Caines, Tucker, & Engelhard, 2008) recently described the use of Rasch measurement theory as a conceptual framework for evaluating the quality of panelist judgments within the context of bookmark and other item-mapping-based methods. The multifaceted Rasch measurement (MFR) model provides another approach for estimating the standard errors of performance standards. The MFR model can be used to model judgments collected from modified-Angoff procedures, as well as procedures based on item maps, such as bookmark and mapmark procedures (Schulz & Mitzel, in press).
Modified-Angoff and item-map-based procedures are the two most popular methods for collecting judgments from standard-setting panelists (Cizek & Bunch, 2007). The bookmark procedure (Mitzel, Lewis, Patz, & Green, 2001) is becoming the standard-setting method of choice in many statewide assessment programs. For example, one possible MFR model for bookmark judgments is:
\[
\ln\!\left[\frac{P_{nijk}}{P_{nij(k-1)}}\right] = \theta_n - \delta_i - \omega_j - \tau_k \qquad (1)
\]
where
$P_{nijk}$ = probability of panelist $n$ giving a bookmark rating of $k$ on item $i$ for round $j$,
$P_{nij(k-1)}$ = probability of panelist $n$ giving a bookmark rating of $k-1$ on item $i$ for round $j$,
$\theta_n$ = judged performance level for panelist $n$,
$\delta_i$ = judged difficulty for item $i$,
$\omega_j$ = judged performance level for round $j$, and
$\tau_k$ = judged performance standard for bookmark rating category $k$ relative to category $k-1$.
The rating category coefficients, $\tau_k$, define the performance standards or cut scores.
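To make the adjacent-category structure of equation (1) concrete, a minimal Python sketch is given below. The function name and the parameter values are illustrative only; they are not part of the original analysis and are not the MEAP estimates.

```python
import numpy as np

def category_probabilities(theta, delta, omega, tau):
    """Category probabilities implied by equation (1) for one
    panelist-item-round combination.

    theta : judged performance level for the panelist (logits)
    delta : judged difficulty for the item (logits)
    omega : judged performance level for the round (logits)
    tau   : sequence of category coefficients tau_1..tau_K
            (the performance standards), in logits
    """
    # Adjacent-category logits for k = 1..K:
    #   ln(P_k / P_{k-1}) = theta - delta - omega - tau_k
    steps = theta - delta - omega - np.asarray(tau, dtype=float)
    # Cumulate the steps; category 0 is the reference with logit 0.
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    return probs / probs.sum()

# Made-up values: a panelist at 0.5 logits, an item at -0.2 logits,
# a round at 0.0 logits, and three cut scores.
p = category_probabilities(theta=0.5, delta=-0.2, omega=0.0,
                           tau=[-1.0, 0.3, 1.2])
print(p.round(3))  # probabilities for rating categories 0, 1, 2, 3
```

Cumulating the adjacent-category logits yields the usual rating-scale formulation, so the ratio of successive category probabilities in the output recovers equation (1).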
In order to illustrate the use of the MFR model for estimating standard errors, data from the Michigan Educational Assessment Program (MEAP; http://www.michigan.gov/mde/) are used. The instrument examined in this study is the MEAP Grade 3 mathematics test. There were 21 panelists on the standard-setting panel. The judgments were obtained with a modified bookmark approach called Item Mapping and were collected in three separate rounds.
Figure 1. Wright Map (Grade 3 mathematics)
The Wright map with the calibrations of the items, panelists, rounds, and performance standards is presented in Figure 1. The judged locations of the items represent the shared understandings of the standard-setting panelists regarding students within the four performance levels. Panelist locations represent their severities, while round locations represent the average difficulty of the judgments in each round. Finally, the category coefficients represent the performance standards by round (R.1, R.2, and R.3) for these panelists on this assessment (A = Apprentice, B = Basic, M = Met, and E = Exceeded).
Table 1 presents the category statistics, with the category coefficients defined as the performance standards or cut scores. The performance standards change over rounds, and the most disagreement is found in Round 1 for the Apprentice category (OUTFIT = 3.00). The final column in Table 1 gives the standard errors. The standard errors do not vary much over rounds for the apprentice/basic or the basic/met cut score. However, uncertainty regarding the met/exceeded cut score increases substantially over rounds: the error variance at Round 3 is nearly three times as large as the error variance at Round 1 (.0625/.0225 ≈ 2.78). Figure 2 presents the category response functions for the Round 3 performance standards. Figure 3 presents the information function, which has a very distinctive shape with a peak at each of the performance standards and shows graphically how measurement precision is spread around each cut score.
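A similar sketch indicates how the category response functions (Figure 2), the information function (Figure 3), and the error-variance comparison can be computed from a set of category coefficients. The cut-score values below are placeholders rather than the Table 1 estimates, and the function names are ours.

```python
import numpy as np

def rating_scale_probs(measure, tau):
    """Category probabilities at a location on the logit scale for a
    rating scale with category coefficients tau_1..tau_K."""
    steps = measure - np.asarray(tau, dtype=float)
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    p = np.exp(logits - logits.max())
    return p / p.sum()

def information(measure, tau):
    """Fisher information at a location: the variance of the category
    score, E[k^2] - (E[k])^2, under the category probabilities."""
    p = rating_scale_probs(measure, tau)
    k = np.arange(len(p))
    return float(np.sum(p * k ** 2) - np.sum(p * k) ** 2)

# Placeholder cut scores (logits); the Table 1 Round 3 estimates would
# be substituted here to reproduce Figures 2 and 3.
tau_round3 = [-1.0, 0.3, 1.2]
grid = np.linspace(-4.0, 4.0, 161)
crf = np.array([rating_scale_probs(m, tau_round3) for m in grid])   # Figure 2
info = np.array([information(m, tau_round3) for m in grid])         # Figure 3

# Error-variance comparison for the met/exceeded cut score,
# Round 3 vs. Round 1, using the variances quoted in the text.
print(round(0.0625 / 0.0225, 2))  # about 2.78
```

The information at each location is the variance of the category score; when the cut scores are well separated on the logit scale, this variance has a local maximum near each cut score, producing the multi-peaked shape described above.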
Additional work is needed to compare different approaches for estimating standard errors for performance standards. Given the high-stakes decisions made on the basis of assessments in education, health, and the professions, it is essential to develop procedures for conveying the uncertainty inherent in the estimated performance standards. The standard errors are readily obtained using the MFR model, and the MFR model offers additional information about the quality of standard-setting judgments that is not available with approaches based on classical or generalizability theory.
George Engelhard, Jr., Ph.D.
Emory University
Cizek, G.J., & Bunch, M.B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks, CA: Sage.
Engelhard, G. (2007). Evaluating bookmark judgments. Rasch Measurement Transactions, 21(2), 1097-1098.
Jaeger, R.M. (1991). Selection of judges for standard setting. Educational Measurement: Issues and Practice, 10(2), 3-6, 10, 14.
Mitzel, H.C., Lewis, D.M., Patz, R.J., & Green, D.R. (2001). The bookmark procedure: Psychological perspectives. In G.J. Cizek (Ed.), Setting performance standards: Concepts, methods and perspectives (pp. 249-281). Mahwah, NJ: Lawrence Erlbaum Associates.
Schulz, E.M., & Mitzel, H.C. (in press). A mapmark method of standard setting as implemented for the National Assessment Governing Board. In E.V. Smith, Jr., & G.E. Stone (Eds.), Applications of Rasch measurement in criterion-referenced testing. JAM Press.
Sullivan, R., Caines, J., Tucker, C., & Engelhard, G. (2008, March). Examining the bookmark ratings of standard-setting panelists: An approach based on the multifaceted Rasch measurement model. Paper presented at the International Objective Measurement Workshop (IOMW), New York, NY.
Yin, P., & Sconing, J. (2008). Evaluating standard errors of cut scores for Item Rating and Mapmark procedures: A generalizability theory approach. Educational and Psychological Measurement, 68(1), 25-41.
Figure 2. Category Response Function
Figure 3. Information Function
Standard Errors for Performance Standards based on Bookmark Judgments. Engelhard, G., Jr. Rasch Measurement Transactions, 2008, 22:1, pp. 1156-1157.