Greetings,
Each examiner's consistency within his/her own pattern of ratings in an
oral examination should be monitored.
This brief study explores whether an examiner's internal consistency is
related to his/her severity.
Lidia Martinez, Manager, Test Development and Analysis
The Relationship between Examiner Severity and Consistency
Examiner severity is a convenient term for an examiner's
tendency to give lower or higher ratings. This tendency
towards severity or leniency is due to examiner expectations, characteristics,
and standards. In the many-facet Rasch analysis, a severity measure is
calculated for each examiner using all of the ratings the examiner gave during
the course of the examination.
Examiner consistency is measured by a
mean-square fit statistic. This statistic is based on the ratio of observed error
variance to expected error variance. Its
expected value is 1 (i.e., a ratio of 1:1). The mean-square fit statistic for
an examiner indicates his/her consistency, or how well his/her pattern of ratings
meets expectations given examiner severity and candidate ability (i.e., fit to
the model). Fit statistics that are either too high or too low are undesirable.
When the examiner's fit statistic is less than
.5, it indicates over 50% less variance in his/her ratings than is expected. It
is likely that the examiner tended to give many candidates the same rating,
regardless of their ability. This type of examiner is not only too predictable
but also fails to distinguish differences among candidates. When the fit
statistic is greater than 1.5, it indicates over 50% more variance in his/her
ratings than is expected. It is likely that the examiner gave some candidates
unexpectedly high or low ratings compared to their overall ability.
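As a rough illustration of how such a statistic behaves, the sketch below computes an outfit-style mean square as the average squared standardized residual and applies the .5 and 1.5 cut-offs described above. It assumes the expected ratings and model variances have already been produced by a many-facet Rasch analysis, which the sketch does not perform. The function names, labels, and sample numbers are hypothetical and are not taken from the study.

```python
from statistics import fmean


def outfit_mean_square(observed, expected, model_variance):
    """Average squared standardized residual over one examiner's ratings.

    observed, expected and model_variance are parallel sequences; the
    expected ratings and variances are assumed to come from a fitted model.
    """
    squared_std_residuals = [
        (x - e) ** 2 / v
        for x, e, v in zip(observed, expected, model_variance)
    ]
    return fmean(squared_std_residuals)


def classify_fit(mean_square, low=0.5, high=1.5):
    """Label a fit statistic using the cut-offs given in the text."""
    if mean_square < low:
        return "too predictable"   # less variance than expected
    if mean_square > high:
        return "too variable"      # more variance than expected
    return "acceptable"


# Hypothetical examiner who gives everyone a 3, close to what is expected.
flat = outfit_mean_square([3, 3, 3, 3], [2.8, 3.0, 3.2, 3.0], [0.5] * 4)

# Hypothetical examiner whose ratings stray far from the expected 3.
erratic = outfit_mean_square([1, 4, 2, 4], [3, 3, 3, 3], [0.5] * 4)

print(classify_fit(flat), classify_fit(erratic))
```

A value near 1 falls between the two cut-offs and is labelled acceptable.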
The question is whether there is a correlation
between measured examiner severity and examiner consistency (the outfit
mean-square fit statistic). To study this question, performance examinations were
selected at random and, for each examination, the Pearson correlation between
the examiners' severity and consistency was calculated.
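For readers who want to reproduce this kind of calculation, a minimal sketch of the Pearson correlation follows. The severity and fit values are made-up illustrative numbers, not data from the examinations, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical examiners: severity measures and outfit mean squares.
severity = [-1.0, -0.5, 0.0, 0.5, 1.0]
outfit = [1.1, 0.9, 1.0, 0.9, 1.1]

r = pearson_r(severity, outfit)
print(f"r = {r:.2f}")
```

In this made-up example the fit values do not rise or fall with severity, so the correlation is essentially zero.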
The table below shows that there are low,
non-significant correlations between examiner severity and consistency. The
table also shows that the vast majority of the examiners meet the criteria for
consistency. Taken together, these results show that 1) most examiners are
internally consistent in their rating of candidates; 2) examiners, regardless
of their measured severity, tend to be consistent in their rating of
candidates; and 3) severity does not predict consistency or vice versa.
The low number of inconsistent examiners reflects good examiner
training and an understanding of the rating process.
Exam | N of Examiners | Correlation between Severity and Consistency | Significance (ns = not significant) | Number (%) of inconsistent examiners
Exam 1 | 44 | .05 | ns | 0
Exam 2 | 24 | -.14 | ns | 0
Exam 3 | 72 | .00 | ns | 4 (5%)
Exam 4 | 146 | -.09 | ns | 4 (3%)
Exam 5 | 81 | -.06 | ns | 2 (2%)
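The correlations and examiner counts reported in the table can be checked for significance with a short sketch. It uses the Fisher z approximation and assumes a two-sided test at the .05 level; the study does not state which test or level was used, so both are assumptions, and `approx_two_sided_p` is a hypothetical helper name.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via the Fisher z transform."""
    if n <= 3:
        raise ValueError("the Fisher approximation needs n > 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# (correlation, number of examiners) as reported in the table.
reported = {
    "Exam 1": (0.05, 44),
    "Exam 2": (-0.14, 24),
    "Exam 3": (0.00, 72),
    "Exam 4": (-0.09, 146),
    "Exam 5": (-0.06, 81),
}

ALPHA = 0.05  # assumed significance level
labels = {}
for exam, (r, n) in reported.items():
    p = approx_two_sided_p(r, n)
    labels[exam] = "ns" if p >= ALPHA else "significant"
    print(f"{exam}: r = {r:+.2f}, n = {n}, p ~ {p:.2f}, {labels[exam]}")
```

Under these assumptions every examination comes out as not significant, in line with the table.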