Greetings
With computer-based tests, test takers have the ability to
go back and review items. This brief
study investigates the relationship between the amount of time spent reviewing
items and candidate test performance.
Lidia Martinez
Manager, Test Development and Analysis
Time Usage and Candidate Performance
Computer-based testing makes it possible to track the
amount of time candidates spend responding to and reviewing each exam
item. Time usage on one examination was studied
to understand the relationship between the amount of time a candidate spends reviewing items and the candidate's final
score. For purposes of this study,
scores are reported as percent correct, without any consideration for calibrated
item difficulty or test equating. The
question is whether the amount of time used for review affects candidate
scores.
The candidate population took a multiple-choice examination
and was divided into three groups based on the mean amount of time each candidate used to
review items. Candidates in Group 1 used an average of 5
seconds or less per item to review.
Candidates in Group 2 used an average of more than 5 and up to 20 seconds per item to
review, and candidates in Group 3 used an average of more than 20 seconds per
item to review. The groups were
compared on 1) mean time spent initially responding per item; 2) mean time
spent reviewing per item; and 3) total test percent correct. All times are given in seconds. An alpha level of .05 was used for all statistical
tests.
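The grouping rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the candidate data are hypothetical, and only the cut points (5 and 20 seconds of mean review time per item) come from the study.

```python
from typing import Sequence


def mean_review_time(review_seconds_per_item: Sequence[float]) -> float:
    """Mean review time per item (in seconds) for one candidate."""
    return sum(review_seconds_per_item) / len(review_seconds_per_item)


def review_group(mean_review_seconds: float) -> int:
    """Map a candidate's mean review time per item to Group 1, 2, or 3."""
    if mean_review_seconds < 0:
        raise ValueError("review time cannot be negative")
    if mean_review_seconds <= 5:
        return 1  # 5 seconds or less
    if mean_review_seconds <= 20:
        return 2  # more than 5 and up to 20 seconds
    return 3      # more than 20 seconds


# Hypothetical candidates: seconds spent reviewing each of four items.
candidates = {
    "A": [0.0, 2.0, 1.0, 3.0],
    "B": [10.0, 5.0, 20.0, 15.0],
    "C": [30.0, 25.0, 40.0, 21.0],
}
groups = {
    name: review_group(mean_review_time(times))
    for name, times in candidates.items()
}
```

A candidate whose mean is exactly 5 seconds falls in Group 1 and one at exactly 20 seconds falls in Group 2, matching the group labels used in the tables.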
An analysis of variance (ANOVA) showed a
significant difference among the groups in the amount of time used to initially respond to items
(p = .039). A post hoc analysis using
Tukey's HSD test revealed that Group 3's average time spent initially responding
to items was significantly less than Group 1's average time (p = .030).
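For readers who want to see what a one-way analysis of variance computes, the following is a minimal sketch of the F statistic in plain Python. The function name and the sample data are hypothetical and are not the study's data. The sketch stops at the F statistic and its degrees of freedom; turning F into a p value requires the F distribution, which a statistics package (for example SciPy's `scipy.stats.f_oneway`) would supply.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-candidate values for three groups (not the study's data).
samples = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_between, df_within = one_way_anova_f(samples)
```

A larger F means the group means differ more than the spread within the groups would suggest. The p value is then compared with the alpha level, which is .05 in this study.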
Descriptive Statistics for Time Used to Initially Respond to Items

| Group based on time used to review | Mean Time per Item (sec.) | SD | Min | Max |
|---|---|---|---|---|
| Group 1: Average Review Time ≤ 5 sec. | 57.19 | 15.33 | 35.25 | 84.31 |
| Group 2: 5 sec. < Avg. Rev. Time ≤ 20 sec. | 54.39 | 13.04 | 31.25 | 75.33 |
| Group 3: Average Review Time > 20 sec. | 47.68 | 9.78 | 30.54 | 63.43 |
| Total Population | 54.22 | 13.87 | 30.54 | 84.31 |
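The four statistics reported for each group (mean, SD, minimum, maximum) can be reproduced from per-candidate values with Python's standard library. This is a minimal sketch: the `describe` helper and the input values are hypothetical, and it assumes the SD is the sample standard deviation, since the study does not say which form was used.

```python
from statistics import mean, stdev


def describe(values):
    """Summarize one group's per-candidate values (needs at least two values)."""
    return {
        "mean": round(mean(values), 2),
        "sd": round(stdev(values), 2),  # sample SD (n - 1); an assumption
        "min": min(values),
        "max": max(values),
    }


# Hypothetical mean response times per item, in seconds, for three candidates.
summary = describe([40.0, 50.0, 60.0])
```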
An ANOVA showed a significant difference among the groups in
the amount of time used to review items (p < .001). A post hoc analysis revealed that all groups were
significantly different from one another (all p values < .001). Because the groups were defined by the amount
of time taken to review, these results are not surprising.
Descriptive Statistics for Time Used to Review Items after Initial Response

| Group based on time used to review | Mean Time per Item (sec.) | SD | Min | Max |
|---|---|---|---|---|
| Group 1: Average Review Time ≤ 5 sec. | 1.25 | 1.37 | 0.00 | 4.71 |
| Group 2: 5 sec. < Avg. Rev. Time ≤ 20 sec. | 12.05 | 4.35 | 5.31 | 19.76 |
| Group 3: Average Review Time > 20 sec. | 27.45 | 6.79 | 20.57 | 44.26 |
| Total Population | 10.55 | 10.71 | 0.00 | 44.26 |
An ANOVA showed no significant difference among the groups in percent
correct scores (p = .335). Based
on this study, the amount of time spent reviewing items does not seem to have
an effect on candidate test performance.
Descriptive Statistics for Candidate Total Percent Correct Scores

| Group based on time used to review | Mean % Correct | SD | Min | Max |
|---|---|---|---|---|
| Group 1: Average Review Time ≤ 5 sec. | 59% | 8% | 40% | 75% |
| Group 2: 5 sec. < Avg. Rev. Time ≤ 20 sec. | 61% | 7% | 51% | 74% |
| Group 3: Average Review Time > 20 sec. | 61% | 8% | 47% | 74% |
| Total Population | 60% | 8% | 40% | 75% |
In this data sample, candidates who spent more
time reviewing items spent less time initially responding to items. One possible explanation is that the more
time a candidate takes on the first pass through the exam, the less time remains
for review. Although candidates who spent more
time reviewing items earned slightly higher percent correct scores, the difference was only
2 percentage points and was not statistically significant, so it should not be read as a trend. The mean percent correct for each group is
statistically comparable.