Greetings
Test equating is a method of ensuring that candidates are measured against the same criterion-referenced standard regardless of which test administration they take. Exams that cover the same content area may vary in difficulty from one administration to the next. Test equating accounts for these differences so that the same criterion-referenced standard can be used.
Mary E. Lunz, Ph.D., Executive Director
Test Equating for Comparable Passing Standards
The purpose of test equating is to place examination administrations on the same Benchmark Scale. Differences in difficulty between administrations are accounted for, so that the same criterion-referenced standard can be used from administration to administration.
For certification testing, Rasch common-item test equating is frequently used in conjunction with a criterion-referenced standard. The process begins by establishing a criterion-referenced standard on a Benchmark Scale. Data from an exam are used to calibrate the Benchmark Scale. That exam should match the test blueprint to ensure content validity, and it should include a sufficient number of items that have been field-tested or previously used, to ensure that it is a satisfactory measure of the construct. After the Benchmark Scale is established, the criterion-referenced standard is set as a score on that scale. The standard can be established using any of the accepted methods, such as a modified Angoff, objective standard setting, bookmark (if item calibrations are available), or others.
Equating to that Benchmark Scale and criterion standard requires that each subsequent test administration include a set of items that are already calibrated to the Benchmark Scale (commonly called equators). The equators should represent all content areas and should span a range of difficulty calibrations. Their purpose is to identify statistically any difference in difficulty between the Benchmark Scale and the current test administration, which may be more difficult or easier than the Benchmark Scale. Test equating takes these differences into account so that the same criterion-referenced standard can be used.
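One common way to use the equators is a mean-shift (common-item) adjustment: the equators are calibrated again on the current administration, and the average difference between their Benchmark Scale difficulties and their new difficulties becomes a constant that moves every item on the current form onto the Benchmark Scale. The sketch below assumes that approach; the function names and the item IDs in the example are hypothetical, and the article does not specify which equating computation it uses.

```python
from statistics import mean


def equating_shift(benchmark, current):
    """Mean-shift equating constant (assumed method, not necessarily the article's).

    `benchmark` maps equator item IDs to their Benchmark Scale difficulties;
    `current` maps item IDs to difficulties from the current administration's
    own calibration. Only items present in both are treated as equators.
    """
    equators = benchmark.keys() & current.keys()
    if not equators:
        raise ValueError("no equator items in common")
    # Average gap between where the equators sit on the Benchmark Scale
    # and where they landed in the current calibration.
    return mean(benchmark[i] - current[i] for i in equators)


def equate_form(current, shift):
    """Move every item on the current form onto the Benchmark Scale."""
    return {item: d + shift for item, d in current.items()}


# Hypothetical example: three equators (E1-E3) plus one new item (N1).
bench = {"E1": 4.2, "E2": 5.0, "E3": 5.9}
form = {"E1": 3.9, "E2": 4.8, "E3": 5.5, "N1": 5.1}
shift = equating_shift(bench, form)   # about 0.30
equated = equate_form(form, shift)    # N1 lands at about 5.40
```

Operational programs typically also screen out equators whose difficulty has drifted before computing the constant; that step is omitted here for brevity.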
Using the Rasch model, the initial mean difficulty of the Benchmark Scale is set at a scaled score of 5.00, where the mean difficulty is the average difficulty of all items on the test. Consequently, once a subsequent test form has been equated to the Benchmark Scale, its mean difficulty will be greater than 5.00 if the form is more difficult and less than 5.00 if it is easier.
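As a minimal sketch, anchoring the Benchmark Scale at a mean of 5.00 and then comparing an equated form against that anchor could look like the following. The article does not say how raw Rasch (logit) calibrations are converted to scaled scores, so a pure additive shift is assumed here; the function names are hypothetical.

```python
from statistics import mean

BENCHMARK_MEAN = 5.00  # scaled-score mean difficulty of the Benchmark Scale (from the text)


def to_benchmark_scale(logit_difficulties):
    """Shift initial calibrations so their mean is 5.00 (assumes a pure shift, no rescaling)."""
    offset = BENCHMARK_MEAN - mean(logit_difficulties)
    return [d + offset for d in logit_difficulties]


def describe_form(equated_difficulties):
    """Return (mean difficulty, label) for a form already equated to the Benchmark Scale."""
    m = round(mean(equated_difficulties), 2)
    if m > BENCHMARK_MEAN:
        label = "harder"
    elif m < BENCHMARK_MEAN:
        label = "easier"
    else:
        label = "same"
    return m, label


benchmark_items = to_benchmark_scale([-1.0, 0.0, 1.0])  # [4.0, 5.0, 6.0]
print(describe_form([5.1, 5.6, 5.47]))                  # mean 5.39, "harder"
```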
The pass point determined by the standard setting is a scaled score on the Benchmark Scale. However, translating that scaled score back to a percent correct makes it easier to see how test equating works. If a test administration is more difficult, the percent correct needed to pass is lowered so that it remains equivalent to the criterion standard; if a test administration is easier, the percent correct needed to pass is raised. Test equating is the statistical process that accounts for these differences in test difficulty and adjusts the scale of the current test administration so that the same criterion standard can be used.
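Under the Rasch model, the percent correct equivalent of a pass point can be sketched as the expected score of a candidate whose measure sits exactly at the pass point: the sum, over all items on the form, of that candidate's probability of answering each item correctly. The sketch assumes the scaled scores are in logit units (no rescaling factor), and the pass point and item difficulties in the example are hypothetical.

```python
import math


def p_correct(ability, difficulty):
    """Rasch probability that a candidate at `ability` answers an item of `difficulty` correctly."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def percent_to_pass(pass_point, equated_difficulties):
    """Expected percent correct for a candidate exactly at the pass point on a given form."""
    expected_score = sum(p_correct(pass_point, d) for d in equated_difficulties)
    return 100.0 * expected_score / len(equated_difficulties)


pass_point = 5.6                           # hypothetical criterion standard
benchmark = [4.0, 5.0, 6.0]                # hypothetical equated item difficulties
harder = [d + 0.4 for d in benchmark]      # same items, each 0.4 harder
easier = [d - 0.4 for d in benchmark]

for name, form in [("harder", harder), ("benchmark", benchmark), ("easier", easier)]:
    print(name, round(percent_to_pass(pass_point, form), 1))
```

The pass point itself never moves; only the percent correct needed to reach it changes with form difficulty. Because every item contributes its own probability, the percent depends on the spread of item difficulties, not on the mean difficulty alone.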
The table below shows how the test equating process works. Five different exams are represented. The test forms are different administrations of each exam; each form includes equator items and is calibrated to the Benchmark Scale. Some administrations of a particular exam are more difficult, while others are easier. The results are simulated from samples of real data, and the percents to pass are approximations for demonstration purposes.
Mean Item Difficulty and Percent Correct Equivalent of the Criterion Standard

| Exam | Benchmark Scale (% to pass) | Test Form #1 (% to pass) | Test Form #2 (% to pass) | Test Form #3 (% to pass) |
|---|---|---|---|---|
| 1 | 5.00 (65%) | 5.39 (harder, 62%) | 4.87 (easier, 67%) | 5.35 (harder, 63%) |
| 2 | 5.00 (60%) | 5.12 (harder, 57%) | 4.99 (easier, 61%) | 5.17 (harder, 56%) |
| 3 | 5.00 (65%) | 4.98 (easier, 66%) | 4.83 (easier, 67%) | 4.82 (easier, 68%) |
| 4 | 5.00 (55%) | 5.39 (harder, 53%) | 4.99 (easier, 56%) | 5.20 (harder, 52%) |
| 5 | 5.00 (65%) | 5.28 (harder, 63%) | 5.20 (harder, 64%) | 5.42 (harder, 61%) |
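To make the pattern in the table explicit, the short sketch below transcribes its values and checks one property: every form with a mean difficulty above 5.00 needs a lower percent to pass than its Benchmark Scale, and every form below 5.00 needs a higher one. The data structure and function names are illustrative only.

```python
# Transcribed from the table: exam -> (benchmark % to pass, [(form mean difficulty, form % to pass), ...])
TABLE = {
    1: (65, [(5.39, 62), (4.87, 67), (5.35, 63)]),
    2: (60, [(5.12, 57), (4.99, 61), (5.17, 56)]),
    3: (65, [(4.98, 66), (4.83, 67), (4.82, 68)]),
    4: (55, [(5.39, 53), (4.99, 56), (5.20, 52)]),
    5: (65, [(5.28, 63), (5.20, 64), (5.42, 61)]),
}


def summarize(table, benchmark_mean=5.00):
    """List (exam, form, difficulty shift from 5.00, change in % to pass) for every form."""
    rows = []
    for exam, (bench_pct, forms) in table.items():
        for form_no, (mean_diff, pct) in enumerate(forms, start=1):
            rows.append((exam, form_no, round(mean_diff - benchmark_mean, 2), pct - bench_pct))
    return rows


def direction_consistent(rows):
    """Harder forms (positive shift) should lower the % to pass; easier forms should raise it."""
    return all((shift > 0 and change < 0) or (shift < 0 and change > 0)
               for _, _, shift, change in rows)


rows = summarize(TABLE)
print(direction_consistent(rows))  # True for all 15 forms in the table
```

Within a single exam the percent does not have to fall strictly as the mean difficulty rises (Exam 4: Form #1 at 5.39 needs 53%, while Form #3 at 5.20 needs 52%). Under the Rasch model the percent depends on the whole set of item difficulties rather than the mean alone, and these percents are approximations.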