The reliability of medically related oral certification examinations is critical because pass or fail decisions are based on them. Over the past 20 years, research and practice have demonstrated that the reliability of oral examinations is influenced by the structure of the examination and by the use of the multi-facet model to produce accurate candidate results.
Surintorn Suanthong, Ph.D.
Manager, Test Analysis and Research
Reliability of Oral Certification Examinations
Oral certification examinations are often noted for their validity rather than their reliability. However, when pass or fail decisions rest on an oral certification examination, those decisions should be as accurate as possible.
The structure of the oral certification examination is critical to ensure reliability. It is important to structure the examination scoring so that the examiners record as much information as possible about the performance of the candidates. When examiners independently give analytic ratings for standardized clinical tasks within cases, more detailed information about the candidate's performance is recorded. If only summative holistic ratings are recorded, however, examiners mentally combine information about the candidate's performance and enter a single holistic rating, which provides little scoring information. When analytic scores are used, there is enough information to calculate candidate means, standard deviations, and measurement errors, as well as candidate separation reliability estimates, (SD² - SE²)/SD², that document the accuracy of the measured differences among candidate performances. This reliability calculation requires that multiple ratings be awarded to each candidate. Most oral certification examinations use a rating scale to measure candidate performance.
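The candidate separation reliability formula above can be sketched in a few lines of Python. This is a minimal illustration and not the analysis software used for these examinations. It assumes that SD² is the observed variance of the candidate measures and that SE² is the mean of the squared standard errors, which is a common convention in Rasch-type analyses. The function name and the sample values are hypothetical.

```python
from statistics import fmean, pvariance


def separation_reliability(measures, standard_errors):
    """Candidate separation reliability: (SD^2 - SE^2) / SD^2.

    Assumptions (not stated in the article):
      * SD^2 is the population variance of the candidate measures.
      * SE^2 is the mean of the squared standard errors.
    """
    if len(measures) != len(standard_errors):
        raise ValueError("need one standard error per candidate measure")
    observed_variance = pvariance(measures)
    if observed_variance == 0:
        raise ValueError("candidate measures have no spread")
    error_variance = fmean([se ** 2 for se in standard_errors])
    return (observed_variance - error_variance) / observed_variance


# Hypothetical candidate measures and standard errors, for illustration only.
measures = [-1.0, 0.0, 1.0, 2.0]
standard_errors = [0.25, 0.25, 0.25, 0.25]
print(round(separation_reliability(measures, standard_errors), 2))
```

With these made-up values the observed variance is 1.25 and the error variance is 0.0625, so the estimate is 0.95. Smaller measurement errors relative to the spread of candidates push the estimate toward 1.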
The table below shows the structure of several medically related oral certification examinations and the candidate separation reliability. These oral certification examinations include standardized cases and require the examiner to give analytic ratings for standardized tasks within each case. Tasks such as diagnosis, treatment, outcome, or ethics may be rated. The content of the cases covers pertinent subjects for the specialty. Depending on the number of examiners, cases, and tasks, a great deal of information can be collected about the candidate's performance. Oral exams are never completely without subjectivity; however, analytic scoring and the use of the multi-facet analysis model provide enough information about the candidate that there is measurable accuracy with regard to candidate outcomes.
| Oral Certification Exam | Number of cases per candidate | Number of tasks within cases | Number of examiners per case | Total number of ratings given to a candidate | Candidate separation reliability |
|---|---|---|---|---|---|
| Exam 1 | 6 | 4 | 2 | 48 | 0.95 |
| Exam 2 | 3 | 10 | 2 | 60 | 0.97 |
| Exam 3 | 5 | 7 or 9 | 2 | 82 | 0.92 |
| Exam 4 | 6 | 7 | 2 | 84 | 0.95 |
| Exam 5 | 17 | 3 | 2 | 102 | 0.97 |
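The totals in the table follow from the examination structure: the number of ratings per candidate is the number of tasks summed over all cases, multiplied by the number of examiners per case. The short check below is a sketch under the assumption that every examiner rates every task in a case. The helper name is hypothetical. For the exam with 7 or 9 tasks per case, the article does not state how the cases divide, so the code searches for the split that matches the reported total of 82.

```python
def total_ratings(tasks_per_case, examiners_per_case):
    """Ratings for one candidate, assuming every examiner rates every task."""
    return sum(tasks_per_case) * examiners_per_case


# (cases, tasks per case, examiners per case, total reported in the table)
uniform_exams = [
    (6, 4, 2, 48),
    (3, 10, 2, 60),
    (6, 7, 2, 84),
    (17, 3, 2, 102),
]
for cases, tasks, examiners, reported in uniform_exams:
    assert total_ratings([tasks] * cases, examiners) == reported

# Exam with 5 cases of 7 or 9 tasks each: find the split that gives 82 ratings.
splits = [
    (5 - nine_task_cases, nine_task_cases)
    for nine_task_cases in range(6)
    if total_ratings([7] * (5 - nine_task_cases) + [9] * nine_task_cases, 2) == 82
]
print(splits)  # (cases with 7 tasks, cases with 9 tasks)
```

Only one split fits the reported total: two cases with 7 tasks and three cases with 9 tasks.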