The Efficacy of Unconditional [Joint] Maximum Likelihood Bias Correction

Jansen, van den Wollenberg and Wierda (JWW, 1988) object to the bias in unconditional [joint] maximum likelihood estimation (JMLE, formerly UMLE or UCON) of Rasch parameters. Their comments on the necessity of the Rasch model for measurement and their algebra are impeccable. The practical consequences of their work, however, contradict their objections. The crucial question for practitioners is whether there is a convenient correction for JMLE bias which is accurate enough for practical purposes. Psychometricians can, in fact, find firm support for the use of JMLE in the articles by JWW.

Even though JWW perseverate on their discovery that the Wright-Douglas (1977) correction of ((K-1)/K) for JMLE bias (K is the number of test items) is slightly inexact for very short tests, the conclusion JWW actually report is that the difference between the bias predicted by (K/(K-1)) and the bias they observe "practically disappears" (their statement) when tests have more than 10 items. This JWW statement goes further than the Wright and Douglas (1977) recommendation of ((K-1)/K) for tests of more than 20 items.
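
To make the arithmetic concrete, here is a minimal sketch in Python (the function name and the example estimates are illustrative assumptions of mine, not JWW's data): the JMLE item difficulties are centered at zero, and their spread is shrunk by the factor (K-1)/K.

    import numpy as np

    def correct_jmle_bias(d_jmle):
        """Wright-Douglas correction: shrink centered JMLE item
        difficulties by (K-1)/K, where K is the number of items."""
        d = np.asarray(d_jmle, dtype=float)
        K = len(d)
        d = d - d.mean()              # center item difficulties at zero
        return d * (K - 1) / K        # undo the K/(K-1) expansion bias

    # Five JMLE estimates that are K/(K-1) = 1.25 times too dispersed:
    print(correct_jmle_bias([-2.5, -1.25, 0.0, 1.25, 2.5]))
    # -> [-2., -1., 0., 1., 2.]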

JMLE bias has no effect on the relative position of items and so no effect on substantive interpretations of the variable defined by the item calibrations. There are, however, two applications in which bias in item difficulties could become a practical problem. These are the effects of item calibration bias on person measurement and on test equating.

PERSON MEASUREMENT BIAS?

What effect does JMLE item calibration bias have on person measures? The aim of testing is to provide person measures sufficiently accurate for fair evaluation. The ((K-1)/K) correction for bias is applied to the item difficulties after they are centered at zero. As a result, the measures most affected by error in the bias correction are those associated with the extreme scores R = 1 and R = K-1 (R is the number of right answers and K is the number of test items). To discover the maximum effect of JWW inaccuracy in ((K-1)/K) on person measures, I use the values JWW claim to be "correct" in their Tables 1-3, and their associated item distributions and test lengths (K = 5, 10, 15, 20), to calculate the person measurement bias when R = 1.

The relevant UFORM formulae (which are exact for the uniform tests used by JWW) are derived in Wright and Douglas (1975, 21-24, 32) and applied in Wright and Stone (1979, 143-151, 212-215). I express the maximum person measurement bias in log-odds units to show its tiny magnitude and in standard error units to show its statistical insignificance.

Maximum Measurement Bias Due to JMLE Item Calibration After Correction (K-1)/K,
in Logits

Test Length               Item Parameter Range
     K           -2,+2     -3,+1     -3,+3     -4,+4
     5            .05       .11       .20       .43
    10            .02       .06       .09       .18
    15            .02       .05       .06       .15
    20            .02       .04       .05       .10
Maximum Measurement Bias Due to JMLE Item Calibration After Correction (K-1)/K,
in Standard Errors of Measurement

Test Length               Item Parameter Range
     K           -2,+2     -3,+1     -3,+3     -4,+4
     5            .04       .09       .15       .31
    10            .02       .05       .08       .15
    15            .02       .05       .05       .14
    20            .02       .04       .05       .09

For method of computation see BEST TEST DESIGN, Wright and Stone, 1979, pages 143-151, 212-215.
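
For readers who want to check the general magnitudes, the sketch below (my own Python illustration, not the UFORM derivation itself) computes the ML person measure and its standard error for a raw score R on a test of known item difficulties; dividing a logit bias by that standard error expresses it in standard-error units, as in the second table.

    import numpy as np

    def rasch_measure(r, difficulties, tol=1e-8):
        """ML person measure for raw score r (0 < r < K) on items of
        known difficulty, by Newton-Raphson on  sum_i P_i(b) = r."""
        d = np.asarray(difficulties, dtype=float)
        b = d.mean() + np.log(r / (len(d) - r))   # starting value
        for _ in range(100):
            p = 1.0 / (1.0 + np.exp(d - b))       # P(right) on each item
            info = np.sum(p * (1.0 - p))          # test information at b
            step = (r - np.sum(p)) / info
            b += step
            if abs(step) < tol:
                break
        return b, 1.0 / np.sqrt(info)             # measure, standard error

    # A uniform 10-item test spanning -3 to +3 logits, scored R = 1:
    b_hat, se = rasch_measure(1, np.linspace(-3, 3, 10))
    # A .09 logit calibration bias, expressed in standard-error units:
    print(0.09 / se)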

One sees immediately that, even for K = 5, JMLE item bias is of no practical consequence as far as person measurement is concerned. Except for the 5-item, 8-logit test, a very rare configuration, maximum measurement bias is less than .21 logits (less than .16 standard errors of measurement!).

For tests of usual length and width - more than 10 items, less than 6 logits - the maximum measurement bias due to JWW's results is ALWAYS less than .09 logits (less than .08 standard errors of measurement!). Even these minute discrepancies occur only when scores are extreme, R = 1 or R = K-1. When tests are on target, observed scores cluster around K/2, where JMLE measurement bias is zero. It is clear that person measurement bias cannot be a reason to avoid JMLE.

TEST EQUATING BIAS?

What effect does JMLE item calibration bias have on test equating? The Rasch way to equate two tests is to include a subset of common items in both, to calibrate each test separately, to plot the resulting pairs of item estimates for the common items and to use the intercept of a line with a slope of one fitted to these common item points as the equating constant (Wright and Stone 1979, 108-118).

In this procedure, inaccuracy in ((K-1)/K) tends to cancel, especially when tests are similar in length and difficulty (the usual situation), because then the inaccuracy is similar for the two calibrations. If, however, tests differ substantially in length and difficulty, then fitting a line with a slope adjusted to the distributions of common item difficulties can remove the effect of bias.
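
When the two calibrations of the common items are in hand, the slope-one intercept is simply the mean difference between the paired estimates. A minimal Python sketch (the item values are hypothetical):

    import numpy as np

    def equating_constant(d_form_a, d_form_b):
        """Least-squares intercept of a slope-1 line through the
        common-item points (d_a, d_b): the mean of the differences."""
        return float(np.mean(np.asarray(d_form_b) - np.asarray(d_form_a)))

    # Calibrations of four common items in two separate test analyses:
    form_a = [-1.2, -0.3, 0.4, 1.1]
    form_b = [-0.7, 0.1, 0.9, 1.7]
    shift = equating_constant(form_a, form_b)   # 0.5 logits
    # Subtracting `shift` from Form B measures puts them on Form A's scale.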

The least biased and most efficient way to equate two or more tests linked by a network of common items and/or common persons is to combine the data from each administration into one large matrix with a column for every item included in any test and a row for every person included in any sample, indicating missing data whenever a person does not take an item. The single Rasch analysis of this one large matrix provides item calibrations and person measures on a common linear scale for all items and all persons involved in any test (Wright and Linacre 1985, Schulz 1987, Wright, Schulz, Congdon and Rossner 1987).
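
As an illustration of what such a single analysis involves, here is a bare-bones JMLE sketch in Python (my own illustration of the standard alternating Newton iteration; it is not the MSCALE algorithm, and it omits the (K-1)/K correction and the removal of zero and perfect scores that a working program requires):

    import numpy as np

    def jmle_sparse(X, n_iter=100, tol=1e-6):
        """JMLE on one large response matrix: persons x items, 0/1,
        with np.nan wherever a person did not take an item.
        Persons or items with zero or perfect scores on the data
        they share must be removed before calling this."""
        mask = ~np.isnan(X)
        R = np.where(mask, X, 0.0)              # observed responses
        b = np.zeros(X.shape[0])                # person measures
        d = np.zeros(X.shape[1])                # item difficulties
        for _ in range(n_iter):
            p = np.where(mask, 1.0 / (1.0 + np.exp(d - b[:, None])), 0.0)
            w = p * (1.0 - p)
            b_step = (R.sum(1) - p.sum(1)) / w.sum(1)   # Newton step, persons
            b = b + b_step
            p = np.where(mask, 1.0 / (1.0 + np.exp(d - b[:, None])), 0.0)
            w = p * (1.0 - p)
            d_step = (p.sum(0) - R.sum(0)) / w.sum(0)   # Newton step, items
            d = d + d_step
            d = d - d.mean()                    # anchor scale: items center at 0
            if max(abs(b_step).max(), abs(d_step).max()) < tol:
                break
        return b, d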

CONDITIONAL ESTIMATION?

JWW advocate a minimum chi-square pair-wise estimation as their cure for the effects of ((K-1)/K) inaccuracy on JMLE. They would have done better by their readers to remind them, instead, of the logically equivalent but statistically superior maximum likelihood pair-wise estimation described by Rasch (1960/1980, 171-172) and Choppin (1968) and applied extensively by Choppin (1976, 1977, 1978, 1983). This method has significant antecedents in Case V of Thurstone's 1927 Law of Comparative Judgment (1927a, 1927b), Bradley and Terry's 1952 method of paired comparisons and Luce's 1959 probabilistic theory of choice. It is easy to use and understand, and it generalizes directly to rating scale and partial credit models (Wright and Masters 1982, 67-72, 82-85). Should a real situation actually arise where conditional estimation is seriously deemed worth the trouble, then the Rasch/Choppin pair-wise approach is the method of choice.
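
A minimal sketch of the pair-wise idea (my own minimal implementation, not Choppin's program): for the persons who attempt both items i and j, the count of those right on i but wrong on j, relative to the reverse count, estimates the difficulty difference with the person parameters cancelled out.

    import numpy as np

    def pairwise_difficulties(X):
        """Rasch item difficulties by the pair-wise method.
        X: persons x items matrix of 0/1, np.nan where not taken.
        n_ij = persons right on item i and wrong on item j; then
        ln(n_ij / n_ji) estimates d_j - d_i free of the person
        parameters.  Zero counts (short tests, small samples)
        would need smoothing in a working program."""
        right = (X == 1).astype(float)          # np.nan compares False
        wrong = (X == 0).astype(float)
        N = right.T @ wrong                     # N[i, j] = n_ij
        with np.errstate(divide="ignore", invalid="ignore"):
            L = np.log(N / N.T)                 # L[i, j] estimates d_j - d_i
        np.fill_diagonal(L, 0.0)                # d_i - d_i = 0 by definition
        d = -L.mean(axis=1)                     # mean_j(d_j) - d_i -> -d_i
        return d - d.mean()                     # center difficulties at zero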

CONCLUSION

For practitioners working with tests of more than 10 items, the articles by Jansen, van den Wollenberg and Wierda give no reason at all to avoid unconditional maximum likelihood estimation of Rasch item calibrations and person measures. In fact, their articles provide data which firmly support the adequacy of this practice.

Benjamin D. Wright, 1988

MESA Research Memorandum Number 45
MESA PSYCHOMETRIC LABORATORY

REFERENCES

Bradley, R.A. and Terry, M.E. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 1952, 39, 324-345.

Choppin, B.H. An item bank using sample-free calibration. Nature, 1968, 219, 870-872.

Choppin, B.H. Recent developments in item banking. In D.N.M. de Gruijter and L.J.T. van der Kamp (Eds.), Advances in Psychological and Educational Measurement. New York: Wiley, 1976.

Choppin, B.H. Developments in item banking. In R. Sumner (Ed.), Monitoring National Standards of Attainment in Schools. Windsor: NFER, 1977.

Choppin, B.H. Item Banking and the Monitoring of Achievement. Slough: NFER, 1978.

Choppin, B.H. A Fully Conditional Estimation Procedure for Rasch Model Parameters. Los Angeles: UCLA CSE Technical Report No. 196, ERIC Document No. ED 228267, 1983.

Jansen, P.G., van den Wollenberg, A.L. and Wierda, F.W. Correcting unconditional parameter estimates in the Rasch model for inconsistency. Applied Psychological Measurement, 1988, 12(3), 297-306.

Luce, R.D. Individual Choice Behavior. New York: Wiley, 1959.

Rasch, G. Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen, 1960; Chicago: MESA Press, 1992.

Schulz, E.M. One-step vertical equating with MSCALE. Presented at the Fourth International Workshop on Objective Measurement, University of Chicago, 17 April 1987, and at the American Educational Research Association Annual Meeting, Washington, 22 April 1987.

Thurstone, L.L. A law of comparative judgment. Psychological Review, 1927a, 34, 273-86.

Thurstone, L.L. The method of paired comparisons for social values. Journal of Abnormal and Social Psychology, 1927b, 21, 384-400.

van den Wollenberg, A.L., Wierda, F.W. and Jansen, P.G. Consistency of Rasch model parameter estimation: A simulation study. Applied Psychological Measurement, 1988, 12(3), 307-313.

Wright, B.D. and Douglas, G.A. Best test design and self-tailored testing. Research Memorandum No. 19, Statistical Laboratory, Department of Education, University of Chicago, 1975.

Wright, B.D. and Douglas, G.A. Best procedures for sample-free item analysis. Applied Psychological Measurement, 1977, 1, 281-294.

Wright, B.D. and Linacre, J.M. MICROSCALE. Westport: MEDIAX, 1985.

Wright, B.D. and Masters, G.N. Rating Scale Analysis. Chicago: MESA Press, 1982.

Wright, B.D., Schulz, E.M., Congdon, R.T. and Rossner, M. The MSCALE Program for Rasch Measurement. Chicago: MESA Press, 1987.

Wright, B.D. and Stone, M.H. Best Test Design. Chicago: MESA Press, 1979.

This appeared in Applied Psychological Measurement, 12(3), 315-318, September 1988.

