MESA Memo 50: Motor and Cognitive Structure of the FIM

The Functional Independence Measure (FIM)^SM records the severity of disability of rehabilitation patients. The necessarily S-shaped relationship between the finite range of recorded FIM raw scores and the conceptually infinite range of additive disability measures is resolved through Rasch analysis. The analysis of admission and discharge FIM ratings of 14,799 patients shows that the 18 FIM items define two statistically and clinically different indicators. 13 items define disability in motor functions. 5 items define disability in cognitive functions. Additive measures for each indicator have the same characteristics at admission and discharge, so that these measures can be used to assess change in patient status.

Key words: Disability evaluation; FIM; Measurement; Scales; Rasch Model; Rehabilitation

The deficiencies of numerical analysis based on ordinal "raw score" measures are well documented.^3,7 The fact that useful linear, interval measures can be obtained from most ordinal scale data has also been demonstrated.^2,5 What requires further investigation are the circumstances under which the transformation from ordinal scores to linear measures can be accomplished successfully. Raw scores are aggregated counts of successes on whatever test items are included in the assessment instrument. Interval measures are constructed from ratings on test items that work together as indicators of "more" of one and the same underlying trait. In this paper, as an example of an approach to the general problem of converting ordinal scores to interval measures, the raw scores on one instrument are examined for the extent and manner in which they support interval measurement.

The Functional Independence Measure (FIM¹) was designed to measure disability from total dependency to completely independent functioning. The FIM employs 18 items in which a patient's degree of disability and burden of care are apparent. These items are listed in Figure 1. Each item is rated according to the seven-level classification shown in Figure 2. Ratings are accumulated across items to indicate severity of disability. The level of independence of a patient's performance on each item is rated by therapists and other care-providers, once at admission to rehabilitation and again at discharge from rehabilitation. The 29,598 resulting sets of ratings (admission and discharge records for 14,799 patients) are the empirical basis for this examination of the measurement properties of the FIM.

A FIM raw score is not a "measure" in the usual sense of the word. The familiar measurement devices of physical science, meter rulers and spring-balances, implement two conditions which are required for mathematical manipulation of measures.⁶ The first condition is that of order: the greater the quantity, the larger the number associated with it. The second is addition: the additional quantity associated with the increase of a number by one unit is of the same size, whatever the size of the original quantity. Since the FIM raw score covers a limited range and is the sum of numerically labelled, but actually only ordinal, recorded ratings, it fails to meet the addition condition for measurement. The relationship between FIM raw scores and their equivalent measures is not linear. For raw scores on the FIM to attain the measurement status expected of physical science measures, the raw scores must be transformed into the measures they imply on an additive scale. This transformation is done by Rasch analysis^4,9 and enables the degree of disability reported for each patient to be located at an explicit position along an additive scale of disability.

The FIM must not only function as a measuring device, but must also measure what its designers intend. This requires an investigation of the validity of the FIM as a measure of disability. The empirically determined difficulties of the FIM items must concur with clinical experience of the inherent difficulty of these activities. Less difficult FIM items must correspond to activities that are clinically observed to be easier to perform, and vice versa. Furthermore, each patient measure must communicate a clear picture of the degree of disability it represents. The extent to which the FIM achieves these goals is evaluated by comparing each patient's pattern of ratings on the FIM items with the pattern of ratings predicted by that patient's FIM measure.

Whenever the FIM is used, its 18 items are intended to combine to yield an unambiguous quantitative summary which provides one "basic indicator of severity of disability".¹ But comparison of FIM raw scores at different time-points in the rehabilitation process requires that the FIM function in a stable manner, i.e., in the same way each time it is used. The stability of the FIM can be inferred by comparing how it works at admission with how it works at discharge. If the FIM functions in the same way at admission and discharge, then FIM measures provide a valid basis for analyzing changes in patient status.

The admission and discharge FIM ratings for 14,799 patients were selected from the patient records of the Uniform Data System for Medical Rehabilitation (UDS) at the State University of New York at Buffalo. All Health Care Financing Administration-defined (HCFA) impairment groups were represented, including 6,412 stroke patients and 3,562 orthopedic patients, but only 25 cases of congenital deformity and 12 of burns. To minimize the participation of atypical patients with unusual medical complications, selection was restricted to first-time rehabilitation admissions. Patient records with FIM ratings of total dependence at admission or complete independence at discharge, on either motor or cognitive items, were excluded because these records indicate performance outside the effective range of the FIM. Also excluded were patient records in which some FIM items were not rated, or in which the ratings on any of the three items with alternate rating modes indicated the less common mode of functioning, i.e., wheelchair for "Walk or wheelchair" (L), visual for "Comprehension" (N) or nonverbal for "Expression" (O).
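The selection rules above can be sketched as a record filter. This is a minimal illustration only: the record layout (`first_admission`, `modes`, `admission`, `discharge`) is hypothetical and is not the UDS file format. The sketch also assumes that "total dependence" and "complete independence" refer to extreme ratings on every item of the motor or cognitive set, which is one reading of the exclusion rule.

```python
MOTOR_ITEMS = list("ABCDEFGHIJKLM")
COGNITIVE_ITEMS = list("NOPQR")
ALL_ITEMS = MOTOR_ITEMS + COGNITIVE_ITEMS

# The more common rating modes retained in the analysis (see Figure 1).
COMMON_MODES = {"L": "walk", "N": "auditory", "O": "verbal"}


def all_rated_at(ratings, items, level):
    """True when every listed item carries the same rating level."""
    return all(ratings[item] == level for item in items)


def keep_record(record):
    """Apply the exclusion rules to one (hypothetical) patient record."""
    admission, discharge = record["admission"], record["discharge"]
    # First-time rehabilitation admissions only.
    if not record["first_admission"]:
        return False
    # Every item must be rated at both time points.
    for ratings in (admission, discharge):
        if any(ratings.get(item) is None for item in ALL_ITEMS):
            return False
    # Drop records using the less common mode on items L, N or O.
    if record["modes"] != COMMON_MODES:
        return False
    # ASSUMED reading: drop floor scores at admission and ceiling scores
    # at discharge, checked separately for motor and cognitive items.
    for items in (MOTOR_ITEMS, COGNITIVE_ITEMS):
        if all_rated_at(admission, items, 1) or all_rated_at(discharge, items, 7):
            return False
    return True
```

Checking the motor and cognitive sets separately reflects the phrase "on either motor or cognitive items"; a different reading of that phrase would change only the last loop.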

The FIM ratings were transformed to linear measures by Rasch analysis with the computer program BIGSCALE.⁸ The calibration for each item is the additive measure of the difficulty it entails, averaged across the seven rating levels of dependency. FIM item calibrations and patient independence measures are reported in statistically convenient, additive measurement units (called "logits") on a shared linear scale.
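The memo does not write out the model behind this transformation. As a hedged illustration, the rating scale form of the Rasch model commonly used for instruments of this kind can be written as below. The symbols are introduced here for illustration and do not appear in the original: $B_n$ is the independence measure of patient $n$, $D_i$ is the calibration of item $i$, and $F_k$ is the step from rating level $k-1$ to level $k$.

```latex
\ln\!\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = B_n - D_i - F_k ,
\qquad k = 2, \dots, 7 ,
\qquad \sum_{k=2}^{7} F_k = 0
```

Here $P_{nik}$ is the probability that patient $n$ is rated at level $k$ on item $i$. With the steps constrained to sum to zero, $D_i$ is the item's difficulty averaged across the rating levels, which matches the description of the calibrations above. All three terms are expressed in logits on one shared scale.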

The initial analysis of FIM items attempted to combine all 18 items to quantify a single measurement scale of disability. The "Combined Analysis Measures" are shown in Table 1. The admission and discharge calibrations in Table 1 are plotted in Figure 3. Figure 3 shows the expected and encouraging result that the easiest FIM item to accomplish is "Eating" (A). The most difficult item is "Stairs" (M) -- defined as ascending and descending 12 to 14 stairs.

Accompanying each measure in Table 1 is a fit or validity statistic comparing the variance of the observed ratings with that predicted by the Rasch analysis. Values near 1.0 indicate satisfactory functioning of the item. Values below 0.7 or above 1.3 indicate misfit and threaten valid measurement. Such values are marked by asterisks. Values above 1.3 indicate a lack of coherence between ratings on the item and the overall levels of disability of patients. Values less than 0.7 indicate that the item is falling short in providing independent information as to patient status.
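A mean-square statistic of this kind can be sketched as follows. The memo does not say which variant was reported, so the information-weighted form used here is an assumption, and the function names are illustrative. The expected ratings and model variances would come from the Rasch analysis itself.

```python
def mean_square(observed, expected, variance):
    """Ratio of observed residual variance to the variance the model predicts.

    observed: ratings given on one item, one per patient
    expected: model-expected ratings for the same patients
    variance: model variance of each rating
    """
    squared_residuals = sum((x - e) ** 2 for x, e in zip(observed, expected))
    return squared_residuals / sum(variance)


def classify_fit(value, low=0.7, high=1.3):
    """Apply the memo's cut-offs: below 0.7 or above 1.3 signals misfit."""
    if value > high:
        return "misfit: ratings less coherent than predicted"
    if value < low:
        return "misfit: little independent information"
    return "acceptable"


# Made-up numbers for four patients on one item, for illustration only.
example = mean_square([4, 5, 3, 6], [4.2, 4.6, 3.5, 5.4], [1.1, 1.0, 1.2, 0.9])
print(round(example, 2), classify_fit(example))
```

A value of 1.0 means the ratings vary around their expectations exactly as much as the model predicts, which is why values near 1.0 are read as satisfactory.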

The fit statistics in Table 1 reveal misfit among the 5 cognitive items. Every cognitive item provokes fit statistics above 1 at both time points. In contrast, 8 of the 13 motor items show fit statistics below 1 on both occasions. This indicates that the motor and cognitive items do not work together in a homogeneous way to measure disability.

An even clearer indication of the disharmony between motor and cognitive items appears in Figure 3. The separation between the straight line connecting the 13 motor items and the straight line connecting the 5 cognitive items shows that, when all items are combined into one set, the admission and discharge calibrations of all 18 FIM items are not equivalent. The mixture of 13 motor and 5 cognitive items produces a measuring system in which the meaning of a particular measure depends on whether the measure is for admission or discharge. This shows that a single pattern of disability combining 13 motor and 5 cognitive items cannot be identified for patient measures on the FIM.

The systematic discontinuity between the 13 motor and the 5 cognitive items in evidence in Table 1 and Figure 3 suggests that the statistical validity and clinical meaning of FIM measures can be improved by dividing the 18 items into two substantively-distinguishable groups of items. Accordingly, two separate Rasch analyses, first of the 13 motor items, and then of the 5 cognitive items, were performed. The results are in Tables 2 and 3, and Figures 4 and 5.

The measurement capability of the FIM is improved when the motor and cognitive items are separated into two different FIM indicators. In Table 3, the 5 cognitive items now show acceptable fit statistics at admission and discharge. A further indication of the improvement of the cognitive part of the FIM is that the range of cognitive item calibrations is about 50% greater in Table 3 than in Table 1, where all 18 FIM items were calibrated together. This amounts to a 50% increase in the capacity of the FIM to distinguish levels of cognitive disability. "Social interaction" (P) emerges as the most irregular of the 5 cognitive items. It displays not only the most misfit at each time point, but also the largest change in calibration between time points, as shown in Figure 5.
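The range comparison can be checked directly from the cognitive calibrations in Tables 1 and 3. The short sketch below uses the published two-decimal values, so its results are subject to rounding.

```python
# Cognitive item calibrations (items N, O, P, Q, R) in logits.
COMBINED = {  # Table 1: all 18 items calibrated together
    "admission": [-0.70, -0.69, -0.46, -0.14, -0.24],
    "discharge": [-0.34, -0.32, -0.22, 0.29, 0.20],
}
SEPARATE = {  # Table 3: cognitive items calibrated on their own
    "admission": [-0.41, -0.39, -0.02, 0.48, 0.33],
    "discharge": [-0.37, -0.35, -0.21, 0.52, 0.40],
}


def calibration_range(values):
    """Distance in logits between the hardest and easiest item."""
    return max(values) - min(values)


def range_increase(time_point):
    """Proportional growth in range when cognitive items are analysed alone."""
    before = calibration_range(COMBINED[time_point])
    after = calibration_range(SEPARATE[time_point])
    return after / before - 1.0


for time_point in ("admission", "discharge"):
    print(time_point, f"{range_increase(time_point):.0%}")
```

On these rounded values the increase is about 59% at admission and about 41% at discharge, which brackets the 50% figure quoted in the text.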

Comparing Tables 2 and 1, the range of the 13 motor item calibrations has also increased as a consequence of the removal of the cognitive items. In addition, misfit for "Eating" (A), "Bladder" (G), "Bowel" (H), and "Stairs" (M) is now evident in this more precisely constructed context.

The item calibrations in Table 2 enable the construction of FIM motor measures from FIM motor raw scores. Figure 6 plots the relationship between FIM ratings and FIM measures for motor items at admission. The S-shaped curve, called an ogive, shows how the bounded range of FIM raw ratings maps onto an unbounded linear scale of FIM patient performance. This also shows how unsatisfactory it is to treat FIM raw scores as though they were already linear measures. The slopes at the extremes of the S-shape are almost flat for the FIM raw scores but steep for the linear measures. A change in disability of 10 raw score points at the extremes of the FIM range means four times as much change on the linear scale as a change of 10 raw score points at the center of the FIM range. This explains the apparent, but illusory, ceiling effect so often lamented towards the end of rehabilitation. Ever greater actual improvement in functioning is required for each extra raw score point gain as the extremes of the score range are approached. This is one reason why conversion from raw scores to linear measures is so essential to quantifying changes in patient status.
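The shape of this ogive can be illustrated with a small sketch. It uses the admission calibrations from Table 2, but the step thresholds between rating levels are hypothetical, because the memo does not report them. The resulting numbers therefore show the pattern only and will not reproduce Figure 6 or the fourfold ratio exactly.

```python
import math

# Admission calibrations (logits) for motor items A-M, copied from Table 2.
MOTOR_ADMISSION = [-1.19, -0.78, 0.20, -0.48, 0.24, 0.05, -0.54,
                   -0.66, 0.00, 0.14, 0.84, 0.44, 1.73]

# HYPOTHETICAL step thresholds between adjacent rating levels (1-2, ..., 6-7).
# They are not from the memo; they sum to zero by construction.
THRESHOLDS = [-2.0, -1.2, -0.4, 0.4, 1.2, 2.0]


def expected_rating(measure, difficulty):
    """Model-expected rating (1-7) on one item under a rating scale model."""
    log_numerators = [0.0]
    for step in THRESHOLDS:
        log_numerators.append(log_numerators[-1] + measure - difficulty - step)
    peak = max(log_numerators)  # subtract the peak for numerical stability
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return sum((k + 1) * w for k, w in enumerate(weights)) / total


def expected_raw_score(measure):
    """Expected motor raw score (13-91) for a patient measure in logits."""
    return sum(expected_rating(measure, d) for d in MOTOR_ADMISSION)


def measure_for_score(score, low=-15.0, high=15.0):
    """Invert the ogive by bisection; the score rises steadily with the measure."""
    for _ in range(60):
        mid = (low + high) / 2.0
        if expected_raw_score(mid) < score:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


center_change = measure_for_score(57) - measure_for_score(47)
top_change = measure_for_score(88) - measure_for_score(78)
print(f"10 points near the center: {center_change:.2f} logits")
print(f"10 points near the top:    {top_change:.2f} logits")
```

The same 10 raw score points correspond to a larger change in logits near the top of the range than at the center. The size of the ratio depends on the assumed thresholds.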

After splitting the FIM into two indicators, Tables 2 and 3 still report slightly different calibrations for the FIM items at admission and discharge. Consequently the raw score-to-measure relationships are not identical at admission and discharge, as shown in Figure 7. In this Figure, the effect on patient measures of the differences between FIM functioning at the two time-points is indicated by the vertical discrepancies between the paired solid and dotted curves. Since the standard error of a patient's disability measure is 0.25 linear units, however, these particular discrepancies are small enough to be ignored in practice.

Disability is manifestly complex. Each FIM item probes disability in a different way. An extreme position could be adopted in which each FIM item defined a unique path of rehabilitation, but then there would no longer be a general frame of reference within which to study patient progress, therapy effectiveness, or by which to allocate scarce resources. This analysis has shown that combining the FIM items into motor and cognitive groups is practical and provides good measurement characteristics.

Unique substantive characteristics of each item remain, and some are prominent enough to provoke statistical misfit among the ratings. Stairs (Item M) shows misfit at both admission and discharge. This happens because, when this item is not observable, or when the patient does not climb stairs, the FIM Guide instructs users to score the item "Total assist" (1). Because of the safety considerations associated with climbing stairs, clinicians do not attempt to assess a patient's performance on this item unless they are confident that the patient can perform the activity safely. The intercession in the rating process of this clinical decision causes Stairs to be rated irregularly. Thus Stairs ratings cannot be expected to participate uniformly in defining an exactly stable overall measure. Nevertheless, the inclusion of Stairs in the FIM can be useful for patients approaching completely independent functioning. Revised scoring instructions, in which "Not Testable" is differentiated from "Total Assist," will improve the measurement properties of the Stairs item.

A prime concern is to measure the change of patient disability from admission to discharge. To do this, the measuring instrument must function in the same way at both times. The quantitative differences between each item's calibrations at admission and discharge will always be statistically significant because of the enormous sample sizes, and hence extremely small standard errors. Only a few items, however, have meaningfully different calibrations. These are identified in Figures 3, 4 and 5. In the combined analysis, the 5 cognitive items have differential performance. This identifies them as a second FIM indicator. In the separate motor and cognitive analyses the only noticeably time-point-dependent calibrations belong to items that have already been identified as misfitting: "Bladder" (G) and "Bowel" (H), which involve neurological as well as functional deficits, "Stairs" (M), the rating of which also considers safety, and "Social Interaction" (P), which may be affected by the increase of staff familiarity with patients during rehabilitation.
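The size of each item's shift between time points can be read off Tables 2 and 3. The sketch below ranks items by the absolute difference between discharge and admission calibrations. It uses the published two-decimal values and applies no significance test.

```python
# (admission, discharge) calibrations in logits, from Tables 2 and 3.
MOTOR = {
    "A": (-1.19, -1.31), "B": (-0.78, -0.77), "C": (0.20, 0.28),
    "D": (-0.48, -0.52), "E": (0.24, 0.23), "F": (0.05, 0.10),
    "G": (-0.54, -0.43), "H": (-0.66, -0.51), "I": (0.00, 0.01),
    "J": (0.14, 0.16), "K": (0.84, 0.91), "L": (0.44, 0.42),
    "M": (1.73, 1.43),
}
COGNITIVE = {
    "N": (-0.41, -0.37), "O": (-0.39, -0.35), "P": (-0.02, -0.21),
    "Q": (0.48, 0.52), "R": (0.33, 0.40),
}


def calibration_shifts(table):
    """Discharge minus admission calibration, largest absolute shift first."""
    shifts = [(item, round(discharge - admission, 2))
              for item, (admission, discharge) in table.items()]
    return sorted(shifts, key=lambda pair: abs(pair[1]), reverse=True)


print(calibration_shifts(MOTOR)[:4])
print(calibration_shifts(COGNITIVE)[:2])
```

On the Table 2 values, Stairs (M) shifts most, followed by Bowel (H), Eating (A) and Bladder (G). On the Table 3 values, Social interaction (P) shifts most.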

It could be argued that the item misfit and differential calibrations across time-points, evident in Tables 2 and 3, are indications that the FIM items could be separated into yet further, still more consistent sub-groups. Here, the principles of parsimony and clinical relevance come into play. Though each FIM item contributes its own unique information, it is clinically impractical and analytically burdensome to allow each item to act as a separate measurement device. One of the reasons for grouping items is to allow what they have in common to dominate the ways in which they differ. It is when the commonality is undermined by systematic differences, such as those between motor and cognitive items, that separate sub-grouping becomes imperative.

The successful endeavor to obtain interval measures from FIM scores, as an illustration of general procedures, has disclosed more of the nature of the FIM. The FIM was intended to quantify one unambiguous disability indicator. Analysis of FIM ratings confirms that the FIM has measurement capability that can be realized by converting the ordinal FIM raw scores into two linear measures. The FIM items detect two substantively different aspects of disability: motor function and cognitive function. Separate analyses of the items representing these two indicators show that two distinct measures provide more useful information from the FIM than one combined measure. Further, the two measures have the same meaning at admission and discharge, permitting a valid quantitative comparison of measures at the two time-points. Inspection of the residual misfit of the data from these two newly defined FIM measures suggests that further refinement of the FIM motor items would produce an even more precise and useful measurement system.

1. Forer S, Granger C, et al. Functional Independence Measure. Buffalo NY: The Buffalo General Hospital, State University of New York at Buffalo, 1987.

2. McArthur DL, Cohen MJ, Schandler SL. Rasch analysis of functional assessment scale: an example using pain behaviors. Arch Phys Med Rehabil 1991;72:296-304.

3. Merbitz C, Morris J, Grip JC. Ordinal scales and foundations of misinference. Arch Phys Med Rehabil 1989;70:308-312.

4. Rasch G. Some probabilistic models for intelligence and attainment tests. Chicago: University of Chicago Press, 1980.

5. Silverstein BS, Fisher WP, Kilgore KM, et al. Applying psychometric criteria to functional assessment in medical rehabilitation: II. Defining interval measures. Arch Phys Med Rehabil 1992;73:507-518.

6. Smith BO. Logical aspects of educational measurement. New York: Columbia University Press, 1938.

7. Wright BD, Linacre JM. Observations are always ordinal; measurements, however, must be interval. Arch Phys Med Rehabil 1989;70:857-860.

8. Wright BD, Linacre JM, Schulz M. BIGSCALE Rasch analysis computer program. Chicago: MESA Press, 1989.

This appeared in
Archives of Physical Medicine and Rehabilitation
75 (2) pp. 127-132, February 1994

Table 1. All 18 FIM items calibrated at admission and discharge with mean-square variance-ratio fit statistics. "*" indicates serious misfit.

Table 2. 13 FIM Motor items calibrated at admission and discharge with mean-square variance-ratio fit statistics. "*" indicates serious misfit within the more consistent, improved measurement system.

Table 3. 5 FIM Cognitive items calibrated at admission and discharge with mean-square variance-ratio fit statistics.

Figure 3. FIM item calibrations for all 18 items showing the dissimilarity of admission and discharge calibrations when motor and cognitive items are combined. Rehabilitation improves patient independence more on motor items (A-M) than it does on cognitive items (N-R).

Figure 4. FIM item calibrations for 13 Motor Items showing the similarity of admission and discharge calibrations when motor items are calibrated separately. The calibrations lie along the identity line except for Stairs (M) which is noticeably harder at admission.

Figure 5. FIM item calibrations for 5 Cognitive Items showing similarity of calibrations, when cognitive items are calibrated separately. The calibrations lie along the identity line except for Social Interaction (P) which is noticeably harder at admission.

Figure 6. Relationship between raw Motor FIM rating scores and linear measures at Admission showing its non-linear, S-shape.

Figure 7. Relationship between raw FIM rating scores and linear measures showing practical equivalence of admission and discharge relationships for Cognitive and for Motor measures. Solid lines indicate the admission relationships. Dotted lines (...) indicate the discharge relationships.

Allen W. Heinemann, Ph.D.
Rehabilitation Institute of Chicago
Department of Physical Medicine and Rehabilitation
Northwestern University Medical School
Chicago, Illinois

Carl V. Granger, M.D.
Byron B. Hamilton, M.D., Ph.D.
Center for Functional Assessment Research
State University of New York at Buffalo
Buffalo, New York

An earlier version of this appeared in
Archives of Physical Medicine and Rehabilitation
70 (12) pp. 857-860, November 1989

Table 1. Combined Analysis
FIM Item	Calibration (Admission)	Calibration (Discharge)	Fit Statistic (Admission)	Fit Statistic (Discharge)
Motor Functions:
A. Eating	-.79	-.96	1.1	1.0
B. Grooming	-.45	-.54	.8	.8
C. Bathing	.32	.24	.6	.7
D. Dressing - upper body	-.21	-.35	.7	.8
E. Dressing - lower body	.36	.20	.6	.7
F. Toileting	.21	.11	.7	.7
G. Bladder management	-.26	-.28	1.4*	1.4*
H. Bowel management	-.35	-.34	1.2	1.1
I. Bed, chair, wheelchair	.17	.04	.6	.5
J. Toilet	.28	.16	.6	.5
K. Tub, Shower	.84	.70	1.2	1.0
L. Walk	.52	.35	1.0	.7
M. Stairs	1.59	1.07	1.7*	1.3
Cognitive Functions:
N. Comprehension (auditory)	-.70	-.34	1.4*	1.7*
O. Expression (verbal)	-.69	-.32	1.6*	1.9*
P. Social interaction	-.46	-.22	1.1	1.2
Q. Problem solving	-.14	.29	1.1	1.2
R. Memory	-.24	.20	1.2	1.3

Table 2. Separate "Motor" Analysis
FIM Item	Calibration (Admission)	Calibration (Discharge)	Fit Statistic (Admission)	Fit Statistic (Discharge)
Motor Functions:
A. Eating	-1.19	-1.31	1.3	1.4*
B. Grooming	-.78	-.77	.9	1.0
C. Bathing	.20	.28	.7	.8
D. Dressing - upper body	-.48	-.52	.8	.9
E. Dressing - lower body	.24	.23	.6	.7
F. Toileting	.05	.10	.7	.7
G. Bladder management	-.54	-.43	1.7*	1.7*
H. Bowel management	-.66	-.51	1.4*	1.4*
I. Bed, chair, wheelchair	.00	.01	.6	.6
J. Toilet	.14	.16	.6	.5
K. Tub, Shower	.84	.91	1.2	1.3
L. Walk	.44	.42	1.1	.9
M. Stairs	1.73	1.43	1.6*	1.5*

Table 3. Separate "Cognitive" Analysis
FIM Item	Calibration (Admission)	Calibration (Discharge)	Fit Statistic (Admission)	Fit Statistic (Discharge)
Cognitive Functions:
N. Comprehension (auditory)	-.41	-.37	1.0	1.1
O. Expression (verbal)	-.39	-.35	1.1	1.2
P. Social interaction	-.02	-.21	1.2	1.2
Q. Problem solving	.48	.52	.8	.7
R. Memory	.33	.40	.8	.8

Figure 1. FUNCTIONAL INDEPENDENCE MEASURE
Classification	Item	In this analysis
Self-care	A. Eating
Self-care	B. Grooming
Self-care	C. Bathing
Self-care	D. Dressing - upper body
Self-care	E. Dressing - lower body
Self-care	F. Toileting
Sphincter control	G. Bladder management
Sphincter control	H. Bowel management
Mobility	I. Bed, chair, wheelchair
Mobility	J. Toilet
Mobility	K. Tub, Shower
Locomotion	L. Walk or wheelchair	walk only
Locomotion	M. Stairs
Communication	N. Comprehension	auditory only
Communication	O. Expression	verbal only
Social cognition	P. Social interaction
Social cognition	Q. Problem solving
Social cognition	R. Memory

Figure 2. Seven-level classification
Degree of Dependency	Level of functioning
No helper	7. Complete independence
No helper	6. Modified independence
Modified dependence on a helper	5. Supervision
Modified dependence on a helper	4. Minimal assist (at least 75% independence)
Modified dependence on a helper	3. Moderate assist (at least 50% independence)
Complete dependence on a helper	2. Maximal assist (at least 25% independence)
Complete dependence on a helper	1. Total assist (less than 25% independence)

The Structure and Stability of the Functional Independence Measure (FIM)^SM