Rank Ordering and Rasch Measurement

[Later work suggests that modeling rank-orders as partial-credit items is productive: Linacre, J. M. (2006). "Rasch Analysis of Rank-Ordered Data." Journal of Applied Measurement, 7:1, 129-136.]

The ideals and axioms of fundamental measurement which Georg Rasch espoused have generally been applied to tests in which the data to be analyzed consist of the direct responses made by examinees to items. This type of data can be termed "two-facet", the two facets being agents (test items) and objects (examinees). In the last two years fundamental measurement has been expanded to many-facet tests, such as performance assessment, in which ratings are given by judges to examinees' performances on several tasks, a three-facet (or more) situation.

Several rank orderings of the same examinees, each ordering made by a different judge, present a rather different measurement problem. A rank ordering looks like the outcome of a one-facet test. The performances of examinees are compared with each other, either by direct encounter or by the judge's thought-experiments. Thus the final ordering no longer has any quantifiable connection with the difficulty of the elements of performance on which the comparison was made, or with the severity of the judges who constructed the orderings. Removing judge severity and item difficulty from consideration is often an intended aim of rank ordering. But this type of data does not appear to be amenable to the familiar axioms of fundamental measurement, e.g., that there must be agents and objects, or to analysis by a Rasch program.

The good news is that objective measurement is possible with rank ordered data. In addition, Rasch analysis of rank ordered data exhibits the Rasch model's usual robustness against missing data, so it does not require that every rank ordering contain every examinee. Each judge need only rank the examinees with whose performance he is familiar, and may omit all the others. So long as there is some network of examinee overlap across the rankings made by the different judges, a coherent overall picture can be constructed.
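The overlap requirement can be checked before any estimation is attempted. The following is a minimal sketch, assuming each ranking is recorded as a list of examinee labels; the function name `linked_groups` and the sample labels are illustrative, not part of any Rasch program. It groups examinees that are connected, directly or through intermediaries, by appearing in a common ranking.

```python
from collections import defaultdict

def linked_groups(rankings):
    """Group examinees that are connected through shared rankings.

    `rankings` is a list of orderings, each a list of examinee labels.
    Two examinees are linked when some judge ranked both of them.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for ordering in rankings:
        if not ordering:
            continue
        root = find(ordering[0])
        for examinee in ordering[1:]:
            parent[find(examinee)] = root  # merge into the first examinee's group

    groups = defaultdict(set)
    for examinee in list(parent):
        groups[find(examinee)].add(examinee)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical data: the third judge shares no examinee with the others.
rankings = [["Ann", "Bob", "Cy"], ["Cy", "Dee"], ["Eve", "Fay"]]
groups = linked_groups(rankings)
# groups == [["Ann", "Bob", "Cy", "Dee"], ["Eve", "Fay"]]
```

More than one group means the rankings do not form a single network, so measures in different groups cannot be placed on one common scale. A single group is necessary for a coherent overall picture, though on its own it does not guarantee finite estimates for every examinee.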

This overall picture places each examinee at his competence measure on a latent variable, which is marked out in logits and has its local origin at, say, the mean ability of the examinees. Each measure has associated with it a standard error indicating the precision with which the measure has been determined. This information enables examinee measures to be compared in exactly the same manner as the examinee measures derived from the familiar two-facet test, except that it is no longer possible to relate performance levels to item difficulties and their implications for interpreting the substantive meaning of a measure.

Rasch measurement of rank ordered examinees also enables fit statistics to be calculated for the evaluation of the consistency of the performance level of each examinee as reflected in his rankings by the judges. Fit statistics can further report the degree to which each judge's rank ordering is consistent with the estimated measures based on the overall rankings. Especially deviant rankings can be flagged in precisely the same way that unexpected responses are identified in two-facet analysis.

The key to the analysis of rank-ordered data is the deduction that, for measurement to be constructed, each ranking of examinees must function as though it were independent of both the judge who made the ranking and the real or conceptual items which were used by the judge in assessing the relative performance level of examinees.

In the simplest case, rankings of pairs of examinees, what must dominate the data is the paired comparison of the ordered examinees. In each ordering, any particular examinee is ranked higher or lower than any other particular examinee. Of course, when a set of orderings is obtained, all judges will rarely, if ever, agree perfectly. In fact, we depend on a certain level of stochastic disagreement in order to construct a measurement system.

What is decisive for the quantitative comparison of examinees is the number of times one examinee is ranked higher than another. Examinee n with measure B_n might be ranked HIGHER than examinee m with measure B_m a total of H times across the orderings made by the different judges. In contrast, examinee n might be ranked LOWER than m a total of L times. The ratio H/L is the essential data for estimating the distance between examinees n and m, (B_n - B_m).
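The counting of H and L can be sketched as follows. This is an illustration under stated assumptions: each ranking is a list of labels from highest-ranked to lowest-ranked, and the names `pair_counts`, `log_odds` and the sample data are hypothetical. Taking log(H/L) for a single pair is only a rough stand-in for the distance between two examinees, since it ignores what the other pairs say about them.

```python
from itertools import combinations
from math import log

def pair_counts(rankings):
    """Count how often each examinee is ranked above each other examinee.

    Each ranking lists examinee labels from highest-ranked to lowest-ranked.
    Returns a dict mapping (n, m) to the number of orderings placing n above m.
    """
    wins = {}
    for ordering in rankings:
        # combinations() keeps list order, so `higher` always precedes `lower`.
        for higher, lower in combinations(ordering, 2):
            wins[(higher, lower)] = wins.get((higher, lower), 0) + 1
    return wins

def log_odds(wins, n, m):
    """Return log(H/L) for the pair (n, m), or None if H or L is zero."""
    h = wins.get((n, m), 0)
    l = wins.get((m, n), 0)
    if h == 0 or l == 0:
        return None  # no finite distance can be read from this pair alone
    return log(h / l)

rankings = [
    ["Ann", "Bob", "Cy"],
    ["Ann", "Cy", "Bob"],
    ["Bob", "Ann", "Cy"],
    ["Ann", "Bob"],  # this judge did not rank Cy
]
wins = pair_counts(rankings)
# Ann is ranked above Bob H = 3 times and below him L = 1 time.
ann_vs_bob = log_odds(wins, "Ann", "Bob")  # log(3), about 1.10 logits
ann_vs_cy = log_odds(wins, "Ann", "Cy")    # None: Cy is never ranked above Ann
```

The `None` case shows why some disagreement among judges is needed: when one examinee is always ranked above another, that pair alone provides no finite estimate of the distance between them.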

A straightforward derivation of a measurement model from objectivity, similar to that in RMT 1:1 (also see Rasch, 1980, p.171-172), yields the model that must underlie the intention of obtaining meaning from multiple rankings of the same examinees. This measurement model for rank orders is remarkably simple and familiar in appearance:

$$\log\left(\frac{P_{nm}}{P_{mn}}\right) = B_n - B_m$$

where P_nm is the probability that n is ranked higher than m and P_mn is the probability that m is ranked higher than n, so that P_mn + P_nm = 1.

The ratio P_nm/P_mn is realized in the rankings as H/L, and this becomes the empirical data for estimating the parameters. This model has the form of the Bradley-Terry model, but that model is motivated by data description, not measurement.
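One way to obtain measures from the paired counts is sketched below. It assumes the paired-comparison form log(P_nm/P_mn) = B_n - B_m and uses the classical fixed-point iteration for Bradley-Terry-form models. That iteration is swapped in here for illustration and is not the estimation procedure described in this note. The function name, the sample counts, and the choice to centre the measures at a mean of zero logits are likewise assumptions.

```python
from math import exp, log

def estimate_measures(wins, iterations=200):
    """Estimate logit measures B_n from pairwise counts.

    `wins[(n, m)]` is the number of times n was ranked above m.
    Measures are centred so that their mean is 0 logits.
    """
    names = sorted({name for pair in wins for name in pair})
    for n in names:
        won = sum(wins.get((n, m), 0) for m in names if m != n)
        lost = sum(wins.get((m, n), 0) for m in names if m != n)
        if won == 0 or lost == 0:
            raise ValueError(f"no finite measure for {n!r}: always first or always last")

    strength = {n: 1.0 for n in names}  # strength = exp(B_n), started equal
    for _ in range(iterations):
        updated = {}
        for n in names:
            won = sum(wins.get((n, m), 0) for m in names if m != n)
            denom = sum(
                (wins.get((n, m), 0) + wins.get((m, n), 0))
                / (strength[n] + strength[m])
                for m in names if m != n
            )
            updated[n] = won / denom
        # Rescale so the mean of the log strengths (the logit measures) is zero.
        mean_log = sum(log(v) for v in updated.values()) / len(names)
        strength = {n: v / exp(mean_log) for n, v in updated.items()}
    return {n: log(v) for n, v in strength.items()}

# Hypothetical counts: every examinee is ranked above and below someone.
wins = {
    ("Ann", "Bob"): 3, ("Bob", "Ann"): 1,
    ("Bob", "Cy"): 2, ("Cy", "Bob"): 1,
    ("Ann", "Cy"): 2, ("Cy", "Ann"): 1,
}
measures = estimate_measures(wins)
```

With these counts the sketch places Ann above Bob and Bob above Cy. The check at the top only catches examinees who are always ranked first or always ranked last against everyone they meet; it is a partial safeguard, not a full test that finite estimates exist.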

For rankings of more than two examinees, there are added constraints because examinees are not compared independently, but are reported in a composite rank-order. This alters the final form of the estimation equations from those presented in Rating Scale Analysis (Wright and Masters, 1982) and elsewhere.

Alternative estimation equations can be formulated. One approach is to decompose the rank orderings into paired comparisons. A more convenient conceptualization, however, is to imagine that the judges internalize a rating scale defined such that one examinee is found in each category (or multiple examinees, if tied rankings are allowed). A measurement model for this conceptualization is

$$\log\left(\frac{P_{rnk}}{P_{rn(k+1)}}\right) = B_n - B_r - F_{rk}$$

where P_rnk is the probability that, in ordering r, examinee n will be ranked k. B_n is the ability of examinee n. B_r is the mean ability of the examinees included in ordering r. F_rk is the step difficulty up from a ranking of k+1 to a ranking of k within ordering r.
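The rating-scale conceptualization can be made concrete by computing the probability of each rank for one examinee. The sketch below assumes the adjacent-rank form log(P_rnk / P_rn(k+1)) = B_n - B_r - F_rk, with rank 1 the highest rank. That form is inferred from the definitions given here, and the function and parameter names are illustrative.

```python
from math import exp, log

def rank_probabilities(b_n, b_r, steps):
    """Probabilities of ranks 1..K for one examinee in one ordering.

    Assumes log(P_k / P_{k+1}) = b_n - b_r - steps[k-1] for k = 1..K-1,
    where rank 1 is the highest rank and `steps` holds F_r1 .. F_r(K-1).
    Returns a list whose element k-1 is the probability of rank k.
    """
    k_max = len(steps) + 1
    log_weights = [0.0] * k_max  # the lowest rank K is the reference, log weight 0
    for k in range(k_max - 1, 0, -1):  # k = K-1, ..., 1
        log_weights[k - 1] = log_weights[k] + (b_n - b_r - steps[k - 1])
    top = max(log_weights)  # subtract the maximum for numerical stability
    weights = [exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

# An examinee 1 logit above the mean of an ordering of three examinees.
probs = rank_probabilities(b_n=1.0, b_r=0.0, steps=[0.5, -0.5])
# log(probs[0] / probs[1]) is 1.0 - 0.0 - 0.5 = 0.5 by construction.
```

An examinee whose measure equals the mean of the ordering, with all step difficulties at zero, is equally likely to receive any rank.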

A delight of these measurement models is that it doesn't matter, in general, how many judges include each examinee in their rankings. Nor does it matter how many examinees each judge ranks, or even what numerical system is used to record the ranks. The estimates of the measures are derived merely from counting each examinee's location in each ordering.
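Because only each examinee's location within an ordering matters, whatever numbers a judge writes down can first be reduced to ordinal positions. The following is a minimal sketch, assuming each judge's record is a mapping from examinee label to a recorded number; the function name and the `higher_is_better` flag are hypothetical conveniences.

```python
def ordinal_positions(recorded, higher_is_better=False):
    """Map each examinee's recorded rank value to an ordinal position.

    `recorded` maps examinee label -> the number the judge wrote down.
    Tied values share a position. Position 1 is the top rank.
    """
    distinct = sorted(set(recorded.values()), reverse=higher_is_better)
    position = {value: i + 1 for i, value in enumerate(distinct)}
    return {name: position[value] for name, value in recorded.items()}

# Two judges using different numbering produce the same ordering.
judge_a = ordinal_positions({"Ann": 1, "Bob": 2, "Cy": 3})
judge_b = ordinal_positions({"Ann": 10, "Bob": 20, "Cy": 30})
# A judge who records scores, with a tie between Bob and Cy.
judge_c = ordinal_positions({"Ann": 95, "Bob": 80, "Cy": 80}, higher_is_better=True)
# judge_c == {"Ann": 1, "Bob": 2, "Cy": 2}
```

Both numbering systems in the first two calls reduce to the same positions, so they would contribute identical information to the estimation.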

Initial application of this technique looks promising. A more comprehensive paper will be published in Mark Wilson's (1992) Objective Measurement: Theory into Practice, Vol. 1. (See Chap. 12, p. 195-209.)

Rank ordering and Rasch measurement. Linacre JM. … Rasch Measurement Transactions, 1989, 2:4 p.41-42


