Calibration Matrices For Test Equating

A study of student performance across grades K through 8 required the equating of 17 test forms for each of two curriculum areas: Mathematics and Reading. Advances in Rasch technology enabled this to be achieved through the construction of one response matrix for each area. I will focus on the Mathematics analysis.

Figure 1. Equating Study Design

17 test forms, comprising Levels 6 through 14 (Grades K through 8) of the ITBS Form 7, and Levels 7 through 14 (Grades 1 through 8) of CPS90, were equated in one step. Figure 1 shows the equating design. Each lettered rectangle corresponds to one test form. Some students took only one test form in the usual way. Some students took two test forms to provide common-person linking. Each of the 14 arrows in Figure 1 indicates a group of 100 to 150 students who took the two test forms at the arrow's ends. These are the common-person links between pairs of forms. The test publishers designed these test forms so that adjacent levels between Levels 9 and 14 share common items. This provides "common-item" equating at the higher levels.

Valid equating of math forms requires data that capture the math variable. Data contaminated by guessing, response sets, or disinterest must be set aside from an equating study, and reintroduced only later for diagnostic or individual reporting.

Irrelevant test behavior was "cleaned out" of these data in five stages. Set aside were: (1) Answer forms with scanning or marking problems: (a) more than three double-marked responses. (b) lightly marked forms with more than one blank response followed by non-blank responses. (c) very lightly marked forms.

(2) Response strings indicating extreme student disinterest or out-of-level testing: (a) more than 25% of the items left blank. (b) many identical responses: "response sets". (c) repeating patterns of responses.

(3) When each test form at each grade level was analyzed separately, response strings showing excessive off-variable behavior, i.e., with infit and outfit mean squares above 2.5.

(4) Response strings with infit or outfit mean-squares above 2.5 and many standardized residuals of 3 or larger (suggesting guessing or carelessness).

(5) When students took two test forms, and the standardized difference between their pair of measures was above 2, and the responses on their lower test performance showed evidence of irrelevant test-taking behavior, e.g., many omitted responses or response sets.

Standardized differences were obtained by:

t = (MH - ML) / sqrt(SH² + SL²)

where MH and ML are the higher and lower performance measures of a student relative to the mean of the common persons on that form, and SH and SL are the measures' standard errors.
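In code, this check on linked students might look like the following minimal sketch; the function name and the example measures are illustrative, not values from the study:

```python
import math

def standardized_difference(m_h, s_h, m_l, s_l):
    """Standardized difference between a student's higher (m_h) and
    lower (m_l) measures, each expressed relative to the common-person
    mean on its form, with standard errors s_h and s_l."""
    return (m_h - m_l) / math.sqrt(s_h ** 2 + s_l ** 2)

# Flag a linked student whose two measures disagree by more than 2:
print(standardized_difference(1.8, 0.25, 0.6, 0.30) > 2.0)
```

Students flagged this way were set aside only when their lower-performance responses also showed irrelevant test-taking behavior.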

This cleaning set aside 12% of the data.
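The stage (2) screens lend themselves to simple automation. The sketch below flags the two mechanical checks; the run-length threshold used to detect a "response set" is an assumption for illustration, not a value reported in the study:

```python
def screen_response_string(responses, blank=' '):
    """Flag a response string for the stage (2) disinterest checks:
    (a) more than 25% of items left blank, (b) long runs of identical
    responses ("response sets"). The run threshold of 10 is assumed."""
    flags = []
    n = len(responses)
    if responses.count(blank) > 0.25 * n:
        flags.append('too many blanks')
    run, longest = 1, 1
    for prev, cur in zip(responses, responses[1:]):
        run = run + 1 if cur == prev and cur != blank else 1
        longest = max(longest, run)
    if longest >= 10:
        flags.append('response set')
    return flags

# A string ending in twelve identical responses is flagged:
print(screen_response_string(list('ABCD') + ['C'] * 12))
```

Detecting repeating patterns of responses, check (2c), would require a further pass not shown here.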

Figure 2. Matrix for Equating Forms

The common-person and common-item links enabled all 17 test forms to be amalgamated into one block-diagonal "giant" matrix, shown schematically in Figure 2. Responses to different test items by the same person were aligned in the same row. Since many pairs of test forms had items in common, students often took the same item twice. In these cases, chronologically first responses were used. Responses to the same item by different persons were stacked in the same column.
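The row-and-column alignment described above can be sketched as follows. The record format, item labels, and missing-data code are illustrative assumptions; only the alignment rules, and keeping the chronologically first response to a repeated item, follow the article:

```python
MISSING = -1  # assumed code for "not administered"

def build_giant_matrix(records, item_index):
    """records: (person_id, {item_label: response}) pairs in
    chronological order; item_index maps item_label -> column.
    Responses by the same person go in the same row; responses to
    the same item go in the same column; when a person meets the
    same item twice, the chronologically first response is kept."""
    persons = {}
    for pid, _ in records:
        persons.setdefault(pid, len(persons))
    matrix = [[MISSING] * len(item_index) for _ in persons]
    for pid, resp in records:
        row = matrix[persons[pid]]
        for label, x in resp.items():
            col = item_index[label]
            if row[col] == MISSING:      # keep first response only
                row[col] = x
    return matrix

records = [('s1', {'M01': 1, 'M02': 0}),    # first form taken
           ('s1', {'M02': 1, 'M03': 1})]    # second form shares M02
giant = build_giant_matrix(records, {'M01': 0, 'M02': 1, 'M03': 2})
print(giant[0])   # the first response to M02 is the one retained
```

Assigning each distinct item its own column in code, rather than counting columns by hand, also avoids the miscounting problem described next.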

Clerical mistakes were hard to avoid in setting up this equating design. Positioning the common items in the giant matrix required care. ITBS items are shared by two and sometimes three tests. Each different new item was assigned its own column in the matrix. When counting out columns, it proved easy to miscount. This threw subsequent item columns out of alignment. Sometimes miscounting went unnoticed until analysis reported the number of items to be different from that expected. When that happened, it was necessary to determine which columns were misplaced, and realign them.

Once the giant matrix was correctly constructed, it was analyzed by computer in the usual way. As discussed in RMT (5:3, p.172), obtaining good estimates from the block diagonal form of Figure 2, with 86% of the data missing, required fine convergence criteria. These criteria overcame the vertical-equating "range restriction" problems sometimes reported in the literature. Convergence required 263 iterations, 240 more than usual for a single test form. The decisive convergence criterion was the maximum marginal score residual. Convergence was not satisfactory until the largest marginal score residual was less than 0.5 score points.
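The decisive criterion can be expressed directly: under the dichotomous Rasch model, compute each person's and each item's expected marginal score and compare it with the observed margin. The sketch below shows only that residual check; the estimation update step it would sit inside is hypothetical:

```python
import math

def rasch_p(theta, delta):
    """Dichotomous Rasch probability of success for ability theta
    on an item of difficulty delta."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def max_marginal_residual(data, theta, delta):
    """Largest absolute observed-minus-expected marginal score over
    all persons and items; data[i][j] is 0/1, or None where the
    person did not take the item (the block-diagonal gaps)."""
    worst = 0.0
    for i, row in enumerate(data):                       # person margins
        obs = sum(x for x in row if x is not None)
        exp = sum(rasch_p(theta[i], delta[j])
                  for j, x in enumerate(row) if x is not None)
        worst = max(worst, abs(obs - exp))
    for j in range(len(delta)):                          # item margins
        obs = sum(row[j] for row in data if row[j] is not None)
        exp = sum(rasch_p(theta[i], delta[j])
                  for i, row in enumerate(data) if row[j] is not None)
        worst = max(worst, abs(obs - exp))
    return worst

# Iterate estimation until the criterion is met (update() is hypothetical):
# while max_marginal_residual(data, theta, delta) >= 0.5:
#     theta, delta = update(data, theta, delta)
print(max_marginal_residual([[1, None], [None, 0]], [0.0, 0.0], [0.0, 0.0]))
```

With 86% of the matrix empty, the None entries simply drop out of both margins, which is why the same criterion applies unchanged to the block-diagonal design.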

The fact that all students and all test items were now part of the same connected data set, regardless of grade, test form or test publisher, enabled all student and item measures to be located on a single common scale of mathematics competency. The measures were then used for further investigation into such topics as the equivalence of test forms and the changes in math competency across grades.

Ong Kim Lee
MESA Psychometric Laboratory
University of Chicago

Calibration Matrices For Test Equating. Lee O.K. … Rasch Measurement Transactions, 1992, 6:1, 202-203
