Guidelines for Rating Scales and Andrich Thresholds

Optimizing a rating scale is "fine-tuning" to try to squeeze the last ounce of performance out of a test. So the first stage is to check that everything else about the test is working as well as is reasonable. For instance, there is no point in trying to optimize a rating scale if half the sample employs a "response set". Clean the data as much as possible. Put clearly misfitting items and idiosyncratic people to one side for the moment. When you have a core that looks like it should work well, take a look at the misfitting responses. Make sure that no data entry errors, random guessing, or other off-dimensional "bad spots" remain. Now you are ready to begin optimizing.

Remember that these are only guidelines. Not all apply, and not all are good to do under all circumstances. Keep a good eye on what is happening at the item level. The more you collapse categories, the more statistical and diagnostic information you lose.

Andrich thresholds are also called Step Calibrations and Step Difficulties.

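+----------------------------------------------------------------------------------------------------------------+
| Stage | Guideline                                          | Measure   | Measure   | Description | Inference   |
|       |                                                    | Stability | Accuracy  | of this     | for next    |
|       |                                                    |           | (Fit)     | sample      | sample      |
|-------+----------------------------------------------------+-----------+-----------+-------------+-------------|
| Pre.  | Scale oriented with latent variable                | Essential | Essential | Essential   | Essential   |
| 1.    | At least 10 observations of each category          | Essential | Helpful   |             | Helpful     |
| 2.    | Regular observation distribution                   | Helpful   |           |             | Helpful     |
| 3.    | Observed average measures (of the persons in the   | Helpful   | Essential | Essential   | Essential   |
|       | category) advance monotonically with category      |           |           |             |             |
| 4.    | OUTFIT mean-squares less than 2.0                  | Helpful   | Essential | Helpful     | Helpful     |
| 5.    | Andrich thresholds advance                         |           |           |             | Helpful     |
| 6.    | Ratings imply measures, and measures imply ratings |           | Helpful   |             | Helpful     |
| 7.    | Andrich thresholds advance by at least 1.4 logits  |           |           |             | Helpful     |
| 8.    | Andrich thresholds advance by less than 5.0 logits | Helpful   |           |             |             |
+----------------------------------------------------------------------------------------------------------------+

Summary of Guideline Pertinence (from JAM, 2002)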

This is an early research note. See Journal of Applied Measurement 3:1 2002 p.85-106.


See also:
Optimizing Rating Scales for Self-Efficacy (and Other) Research. Smith Jr. E.V.; Wakely M.B.; de Kruif R.E.L.; Swartz C.W.
Educational and Psychological Measurement, 1 June 2003, vol. 63, no. 3, pp. 369-391(23)
Abstract:
This article (a) discusses the assumptions underlying the use of rating scales, (b) describes the use of information available within the context of Rasch measurement that may be useful for optimizing rating scales, and (c) demonstrates the process in two studies. Participants in the first study were 330 fourth- and fifth-grade students. Participants provided responses to the Index of Self-Efficacy for Writing. Based on category counts, average measures, thresholds and category fit statistics, the responses on the original 10-point scale were better represented by a 4-point scale. The modified 4-point scale was given to a replication sample of 668 fourth- and fifth-grade students. The rating scale structure was found to be congruent with the results from the first study. In addition, the item fit statistics and item hierarchy indicated the writing self-efficacy construct to be stable across the two samples. Combined, these results provide evidence for the generalizability of the findings and hence utility of this scale for use with samples of respondents from the same population.


Example: Guilford's Ratings of Creativity, (Psychometric Methods p.282 Guilford 1954)

+--------------------------------------------------------------------------------------------------------------------+
|           DATA                 |   QUALITY CONTROL |RASCH-ANDRICH|  EXPECTATION  |  MOST  |  RASCH-  | Cat|Response|
|      Category Counts       Cum.|  Avge  Exp. OUTFIT| Thresholds  |  Measure at   |PROBABLE| THURSTONE|PEAK|Category|
|Score Total      Used    %    % |  Meas  Meas  MnSq |Measure  S.E.|Category  -0.5 |  from  |Thresholds|Prob|  Name  |
|--------------------------------+-------------------+-------------+---------------+--------+----------+----+--------|
|  1       4         4    4%   4%|  -.86   -.72   .8 |             |( -2.70)       |   low  |   low    |100%| lowest |
|  2       4         4    4%   8%|  -.11   -.57  2.7 |  -.64    .53|  -1.65   -2.21|        |  -1.75   | 17%|        |
|  3      25        25   24%  31%|  -.36*  -.40   .9 | -2.32    .39|   -.93   -1.26|  -1.48 |  -1.39   | 48%|        |
|  4       8         8    8%  39%|  -.43*  -.22   .5 |   .83    .25|   -.41    -.66|        |   -.46   | 11%|        |
|  5      31        31   30%  69%|  -.04   -.03   .8 | -1.48    .24|    .02    -.19|   -.32 |   -.29   | 39%| middle |
|  6       6         6    6%  74%|  -.46*   .17  4.1 |  1.71    .25|    .44     .23|        |    .34   |  9%|        |
|  7      21        21   20%  94%|   .45    .34   .6 | -1.00    .26|    .94     .68|    .35 |    .47   | 47%|        |
|  8       3         3    3%  97%|   .74    .49   .5 |  2.36    .44|   1.62    1.24|        |   1.37   | 16%|        |
|  9       3         3    3% 100%|   .77    .60   .8 |   .54    .60|(  2.69)   2.17|   1.45 |   1.70   |100%| highest|
+---------------------------------------------------------------------(Mean)---------(Modal)--(Median)---------------+

Probability Curves

 -3.0       -2.0       -1.0        0.0        1.0        2.0        3.0
  ++----------+----------+----------+----------+----------+----------++
1 |                                                                   |
  |                                                                   |
  |1                                                                 9|
  | 111                                                           999 |
  |    11                                                      999    |
P |      11                                                  99       |
r |        11                                               9         |
o |          1                                            99          |
b |           11                                         9            |
a |             1                                       9             |
b |              1        3                           99              |
i |               1   3333 333             77777777  9                |
l |                133        33   555   77        7*                 |
i |               3311          355   55*          9 7                |
t |              3    1        5533    7 55       9   77              |
y |            33      1     55    3 77    5    99      77            |
  |          33         11  5       *       55 9          77          |
  |       2**2222222222222**      77 33      9*5888888888888**        |
  |2222***3            55*****44**444*6**66***8855            ***8888 |
  |3333           4****44    7******6 ******3 6666****           7777*|
0 |*******************************************************************|
  ++----------+----------+----------+----------+----------+----------++
 -3.0       -2.0       -1.0        0.0        1.0        2.0        3.0

First, express the rating scale as a clearly defined, substantively relevant, ordered sequence of categories. Then use these guidelines to check it for measurement effectiveness.

Guideline 1: At least 10 observations of a category.

The Andrich threshold (Fk) is approximately the log-ratio of the frequencies of adjacent categories. When a category frequency is low, the Andrich threshold is poorly estimated and unstable.
In example: Used counts as low as 3.
Solution: combine adjacent categories, or omit observations (e.g., "don't know")
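
A rough sketch of this check in Python, using the Used counts from the example table. The sign convention Fk ~ ln(count of category k-1 / count of category k) is an assumption made for this illustration, and the function name is hypothetical. The result is only a crude approximation, not the estimate a Rasch program would report.

```python
import math

# "Used" counts for categories 1..9, copied from the example table.
counts = {1: 4, 2: 4, 3: 25, 4: 8, 5: 31, 6: 6, 7: 21, 8: 3, 9: 3}
MIN_COUNT = 10  # Guideline 1 cut-off


def log_ratio_thresholds(counts):
    """Crude threshold into category k: ln(count[k-1] / count[k]).

    Assumed sign convention: a heavily used category k pulls its
    threshold downward relative to category k-1.
    """
    cats = sorted(counts)
    return {k: math.log(counts[j] / counts[k]) for j, k in zip(cats, cats[1:])}


sparse = [k for k in sorted(counts) if counts[k] < MIN_COUNT]
approx = log_ratio_thresholds(counts)

for k, f in approx.items():
    print(f"approximate threshold into category {k}: {f:+.2f}")
print("categories with fewer than 10 observations:", sparse)
```

With counts this small, adding or removing a single observation in category 8 or 9 shifts the log-ratio noticeably, which is the instability the guideline warns about.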

Guideline 2: Observation distribution.

Irregularity in category observation frequency signals irregularity in usage. Look for unimodal use, or for peaking in central or extreme categories.
In example: roller-coaster Used distribution.
Solution: combine adjacent categories, or omit observations (e.g., "other")
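
One simple way to see the "roller-coaster" is to count the local peaks in the Used counts. This is only an illustrative sketch: the peak definition (strictly higher than both neighbours) and the function name are assumptions, not part of the guideline.

```python
# "Used" counts for categories 1..9, copied from the example table.
used = [4, 4, 25, 8, 31, 6, 21, 3, 3]


def local_peaks(counts, first_category=1):
    """Categories whose count is strictly higher than both neighbours.

    A missing neighbour at either end of the scale is treated as lower.
    """
    peaks = []
    for i, n in enumerate(counts):
        left = counts[i - 1] if i > 0 else float("-inf")
        right = counts[i + 1] if i < len(counts) - 1 else float("-inf")
        if n > left and n > right:
            peaks.append(first_category + i)
    return peaks


peaks = local_peaks(used)
print("peak categories:", peaks)
print("unimodal:", len(peaks) == 1)
```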

Guideline 3: Average category measures advance.

Observed Average measures (of the persons whose observations are in the category) are an empirical indicator of the context in which the category is used. Since higher categories are intended to reflect higher measures, the average measures are expected to advance with category.
In example: average measure for category 6 is noticeably less than for category 5.
Solution: combine out of order categories with those below them.
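
The check can be sketched in a few lines of Python, using the "Avge Meas" column of the example table. The function name is hypothetical.

```python
# Observed average measures for categories 1..9 ("Avge Meas" column).
avg_measure = [-0.86, -0.11, -0.36, -0.43, -0.04, -0.46, 0.45, 0.74, 0.77]


def non_advancing(values, first_category=1):
    """Categories whose average measure is not above the previous category's."""
    return [
        first_category + i
        for i in range(1, len(values))
        if values[i] <= values[i - 1]
    ]


flagged = non_advancing(avg_measure)
print("categories that fail to advance:", flagged)
```

The flagged categories are the ones marked with "*" in the example table.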

Guideline 4: Outfit mean-squares less than 2.0.

We model a definite amount of randomness in choosing categories. This amount is indicated by a mean-square of 1.0. Values over 2.0 indicate that there is more unexpected than expected randomness. A high mean-square value indicates that this category has been used in contexts in which the expected category is far different.
In example: category 6 has a mean-square of 4.1. Category 2 also exceeds 2.0, with a mean-square of 2.7.
Solution: omit observations, combine categories or drop categories.
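
A minimal sketch of the screening step, using the OUTFIT MnSq column of the example table. The variable names are illustrative only.

```python
# OUTFIT mean-squares for categories 1..9, copied from the example table.
outfit_mnsq = {1: 0.8, 2: 2.7, 3: 0.9, 4: 0.5, 5: 0.8, 6: 4.1, 7: 0.6, 8: 0.5, 9: 0.8}
MAX_MNSQ = 2.0  # Guideline 4 cut-off: mean-squares should be less than this

noisy = sorted(cat for cat, mnsq in outfit_mnsq.items() if mnsq >= MAX_MNSQ)
print("categories with OUTFIT mean-square of 2.0 or more:", noisy)
```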

Guideline 5: Andrich thresholds advance.

Advancing Andrich thresholds imply that each category in turn is most likely to be chosen. This makes the probability curves look like a range of hills. Disordered Andrich thresholds imply that a category may not be observed as one advances along the variable. Categories with narrow definitions produce disordered Andrich thresholds. Disordered Andrich thresholds do not mean that the categories are out of order. The decision to eliminate or combine narrow categories must be made substantively, based on the reasons for selecting the rating categories. For developmental scales, ordered categories support the interpretation that a rating of k implies having passed through k-1 lower categories.
In example: Andrich threshold 3 is less than Andrich threshold 2.
Solution: combine categories, edit data, but may not be attainable.
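
To illustrate how disordered thresholds stop a category from ever being the most probable one, here is a hedged Python sketch of the Rasch-Andrich rating scale probabilities for the example thresholds. Measures are taken relative to the item difficulty (an assumption of this sketch), and the grid from -6 to +6 logits and the function names are arbitrary illustrative choices.

```python
import math

# Andrich thresholds into categories 2..9, copied from the example table.
thresholds = [-0.64, -2.32, 0.83, -1.48, 1.71, -1.00, 2.36, 0.54]


def category_probabilities(measure, thresholds):
    """Rating scale model probabilities, measure relative to item difficulty.

    P(category k) is proportional to exp(sum over j <= k of (measure - F_j)),
    with an empty sum (zero) for the bottom category.
    """
    logits = [0.0]
    for f in thresholds:
        logits.append(logits[-1] + (measure - f))
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def modal_categories(thresholds, lo=-6.0, hi=6.0, step=0.01, first_category=1):
    """Categories that are the most probable somewhere on the grid of measures."""
    modal = set()
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        probs = category_probabilities(lo + i * step, thresholds)
        modal.add(first_category + probs.index(max(probs)))
    return sorted(modal)


print("categories that are ever most probable:", modal_categories(thresholds))
```

Only the categories with an entry in the "MOST PROBABLE from" column of the example table appear in the result. The others are hidden beneath their neighbours in the probability curves.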

Guideline 6: Ratings imply measures, and measures imply ratings.

This is useful for inference and for confirming the construct validity of the rating scale. Most users of your findings will assume that it is true. It is true when the observed average measure for each category approximates its expected value.
In example: the most conspicuous failure is category 6. The observed average measure is -.46 logits. The expected average measure is .17 logits. The difference is 0.63 logits.
Solution: combine categories, edit data. A reasonable approximation is usually attainable.
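
The comparison of observed and expected average measures can be sketched as follows, with both columns copied from the example table. The names are illustrative.

```python
# "Avge Meas" and "Exp. Meas" columns for categories 1..9 of the example table.
observed = [-0.86, -0.11, -0.36, -0.43, -0.04, -0.46, 0.45, 0.74, 0.77]
expected = [-0.72, -0.57, -0.40, -0.22, -0.03, 0.17, 0.34, 0.49, 0.60]

# Observed minus expected, in logits, keyed by category number.
gaps = {
    cat: round(obs - exp, 2)
    for cat, (obs, exp) in enumerate(zip(observed, expected), start=1)
}
worst = max(gaps, key=lambda cat: abs(gaps[cat]))

for cat, gap in gaps.items():
    print(f"category {cat}: observed - expected = {gap:+.2f}")
print("largest discrepancy is in category", worst)
```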

Guideline 7: Andrich thresholds advance by at least 1.4 logits.

When all Andrich threshold advances are larger than 1.4 logits, then the rating scale can be decomposed, in theory, into a series of independent dichotomous items. Even though such dichotomies may not be empirically meaningful, their possibility implies that the rating scale is equivalent to a subtest of (category count - 1) dichotomies. For a developmental scale, this supports the interpretation that a rating of k implies successful leaping of k hurdles.
The 1.4 logit minimum applies to three categories and lessens with more categories. In general, for m+1 categories -> m dichotomous items, the minimum thresholds are ln(x / (m+1-x)) for x=1 to m.
In example: this is not seen, due to disordering.
Solution: combine categories, edit data, but may not be attainable.
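
The formula ln(x / (m+1-x)) can be evaluated directly to see how the minimum advance shrinks as categories are added. This is a small sketch with hypothetical function names.

```python
import math


def minimum_thresholds(m):
    """Minimum thresholds ln(x / (m + 1 - x)) for x = 1..m (m + 1 categories)."""
    return [math.log(x / (m + 1 - x)) for x in range(1, m + 1)]


def minimum_advances(m):
    """Differences between adjacent minimum thresholds."""
    t = minimum_thresholds(m)
    return [b - a for a, b in zip(t, t[1:])]


for m in (2, 3, 4, 8):
    smallest = min(minimum_advances(m))
    print(f"{m + 1} categories: smallest required advance = {smallest:.2f} logits")
```

For three categories (m = 2) the two minimum thresholds are ln(1/2) and ln(2), so the required advance is 2 ln 2, about 1.39 logits, which is the 1.4 of the guideline.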

Guideline 8: Andrich thresholds advance by less than 5.0 logits.

When adjacent Andrich thresholds are too far apart, then a category becomes too wide and a less informative dead zone appears in the middle of the category. This corresponds to a sag in the statistical information available from the item. Often this results from Guttman-style (forced consensus) rating procedures.
In example: this is not seen. The thresholds are close together.
Solution: define more categories; change rating procedures.
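
A short sketch of the width check for the example thresholds. The mapping of each advance to the category lying between the two thresholds is an assumption of this illustration.

```python
# Andrich thresholds into categories 2..9, copied from the example table.
thresholds = [-0.64, -2.32, 0.83, -1.48, 1.71, -1.00, 2.36, 0.54]
MAX_ADVANCE = 5.0  # Guideline 8 cut-off, in logits

# advances[i] is the distance from the threshold into category i + 2
# to the threshold into category i + 3, i.e. the "width" of category i + 2.
advances = [round(b - a, 2) for a, b in zip(thresholds, thresholds[1:])]
too_wide = [i + 2 for i, adv in enumerate(advances) if adv >= MAX_ADVANCE]

print("threshold advances:", advances)
print("categories 5.0 logits wide or more:", too_wide)
```

Negative advances in the printed list are the disordered thresholds already noted under Guideline 5.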

MESA Research Note #2 by John Michael Linacre
Midwest Objective Measurement Seminar, Chicago, June 1997


The URL of this page is www.rasch.org/rn2.htm