Outcomes Instruments and Information

Understanding Outcomes Scoring, Normative Study, and Reliability/Validity


The AAOS Outcomes Instruments Normative Data Study was conducted to provide users of the instruments with general/healthy population scores.

The AAOS Outcomes Instruments were tested for validity and reliability in collaboration with the Council of Musculoskeletal Specialty Societies (COMSS) and the Council of Spine Societies (COSS).

Summary means and standard deviations for each Instrument are already included in the formulae on each Scoring Worksheet, but you may open up a summary table that has all of the standardized and normative scores for each instrument (see below).

Understanding Scoring

Scoring of the AAOS outcomes instrument scales for comparative purposes can be done with the computation of a standardized and normative score for each patient. The Excel worksheets included with the instruments include formulae that build in any necessary item recoding, computation of missing items, and known general population means and standard deviations, as needed. A summary of the overall general population means and standard deviations is also available.

Standardized and normative scores CANNOT be computed if fewer than one-half (< 51%) of the items in a scale have been answered, that is, if more than half of the items are missing.

Individual Standardized Score Overview

The individual's Standardized score is based on the mean of items that make up the scale. Before computing this, all items must be recalibrated so that they are all in the same metric.

The most straightforward way to understand the scoring is that each response is rescaled so that every item has a value in the range 0 through 5 (i.e., lowest possible score = 0 and maximum possible score = 5).

Next, all of the items comprising a given scale are averaged over the number of items answered.

This average of the rescaled values is multiplied by a constant so that each scale's resulting value falls between 0 and 100. If the items are scored such that high values represent the most disability, this number must be subtracted from 100 to reverse score the scale, so that high scores represent the least disability.

All standardized scores are calculated in the worksheets such that a 0 represents the MOST disability and 100 represents LEAST disability.
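
The overview above can be written compactly as a formula. This is a sketch only: the symbols are introduced here for illustration and do not appear in the worksheets, and the general rescaling expression is inferred from the description (each item mapped linearly onto 0 through 5). Here x_i is the raw response to item i, a_i and b_i are the lowest and highest possible responses for that item, n is the number of items answered, and S is the standardized score for a scale whose items are rated so that high values mean the most disability.

```latex
\[
r_i = 5 \cdot \frac{x_i - a_i}{b_i - a_i}, \qquad
\bar{r} = \frac{1}{n} \sum_{i=1}^{n} r_i, \qquad
S = 100 - 20\,\bar{r}
\]
```

For a scale whose items are already rated so that high values mean the least disability, the final subtraction would be omitted, giving S = 20 times the average rescaled value.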

Individual Standardized Score Example

Suppose a given scale has 7 items (called A, B, C, D, E, F, G) that are all rated so that a "1" means No Pain and the highest rating represents Most Pain.

Further suppose that four of these items (A, B, C, D) are rated on a 1 - 6 scale, but the rest (E, F, G) are on a scale of 1 - 4.

The first step is to rescale all of the items to have a range of 0 to 5.

  • For items A - D, this is accomplished by subtracting the value "1" from each item's score. Now each of these four items has a value of 0 to 5.
  • Items E-G need to be rescaled in two steps:
    • First, subtract the value "1" from each item (resulting in values of 0 to 3).
    • Then multiply each item by 5 / 3 (resulting in values of 0 to 5 for each item).

The next step is to average these rescaled values (sum all rescaled items, then divide by the number of items answered).

The resulting numbers have the range of 0 to 5 and need to be multiplied by 20 to have a range of 0 to 100.

This number is then subtracted from 100 (so that a 0 is MOST disability and 100 is LEAST disability). This is the patient's Standardized Score and will be in the range of 0 to 100.
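
The steps in this example can be sketched in Python. This is an illustration under stated assumptions, not the formulae used in the Excel worksheets: the function name, its parameters, and the sample responses are hypothetical, every item is assumed to start at 1, and the missing-item rule is assumed to mean that at least half of the items must be answered.

```python
from typing import Optional, Sequence


def standardized_score(
    responses: Sequence[Optional[float]],
    item_max: Sequence[int],
    item_min: int = 1,
    min_answered_fraction: float = 0.5,  # assumption: at least half answered
) -> Optional[float]:
    """Return a 0-100 standardized score, or None if too many items are missing.

    Assumes every item is rated so that item_min means least disability
    and the item's maximum means most disability.
    """
    if len(responses) != len(item_max):
        raise ValueError("responses and item_max must have the same length")

    # Rescale each answered item to the 0-5 range; None marks a missing item.
    rescaled = [
        (r - item_min) * 5.0 / (hi - item_min)
        for r, hi in zip(responses, item_max)
        if r is not None
    ]

    # Refuse to score when fewer than the required fraction were answered.
    if len(rescaled) < min_answered_fraction * len(responses):
        return None

    # Average over answered items, stretch to 0-100, then reverse so that
    # 0 is MOST disability and 100 is LEAST disability.
    mean_rescaled = sum(rescaled) / len(rescaled)
    return 100.0 - 20.0 * mean_rescaled


# Items A-D are on a 1-6 scale, items E-G on a 1-4 scale.
maxima = [6, 6, 6, 6, 4, 4, 4]
answers = [2, 3, 1, 4, 2, 1, 3]  # hypothetical patient responses
print(round(standardized_score(answers, maxima), 2))  # 68.57
```

With these hypothetical responses the rescaled values sum to 11, the average is 11 / 7, and the score is 100 minus 20 times that average, about 68.57.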

Normative Scale Scoring Overview

In order to provide the user with a method of interpreting/comparing the results of a given patient's functioning to a healthy population, Normative values were created. While the AAOS Outcomes Instrument standardized scores are all in the range of 0 to 100, interpretation of the standardized score is not consistent between scales due to differences in how the general, healthy population scored.

For example, a standardized score of 80 on scale "X" appears to be equal to a standardized score of 80 on scale "Y". In actuality, the patient's standardized score of 80 on scale "X" may be 10 points below the average healthy population's standardized score for that scale, but a patient's standardized score of 80 on scale "Y" may be 6 points above the average healthy population's standardized score.

To make the scores comparative across various scales, the Normative Data Study's results were transformed for each scale so that each has a mean Normative Score of 50. Thus, a patient scoring above 50 on a particular scale is above the general population's average, while a patient scoring below 50 on a scale is below the general, healthy population's norm.

A mean for the overall scale scores was derived from the general United States population and is set at 50, with a standard deviation of 10. (Forcing to a set mean and standard deviation rather than using a standard z-score transformation with a mean of 0 and a standard deviation of 1 provides the basis of comparison for the Normative Scoring.)

  • Each scale is transformed to the 0 to 100 metric (i.e., is made into a standardized score).
  • Using the actual mean and standard deviation of the 0 to 100 scale from the general, healthy population, a formula is applied to derive the normative score. This formula is:
    • Subtract the general population standardized mean from each individual's standardized score.
    • Divide this by the general population's standard deviation.
    • Multiply the resulting value by 10.
    • Add 50 to the resulting number.
    • The final value is the Normative Score for that patient.
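
The steps listed above amount to a single expression. The symbols are introduced here for illustration only: S is the individual's standardized score, and the mean and standard deviation are those of the general, healthy population's standardized scores for the same scale.

```latex
\[
N = 50 + 10 \cdot \frac{S - \mu_{\text{pop}}}{\sigma_{\text{pop}}}
\]
```

In this form, a patient whose standardized score equals the population mean receives a Normative Score of exactly 50, and each population standard deviation above or below that mean moves the Normative Score by 10 points.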

Normative Scale Scoring Example

Computing the individual normative score requires knowing the general population's mean (standardized) score and the corresponding standard deviation. These values are already included in each Instrument's Scoring Worksheet, but can also be found in the Outcomes Means Table.

An example of this methodology shows the following calculations:

  • The Standardized score for a patient is 84 on the 0 to 100 scale (calculated as above).
  • Suppose it was found that the healthy population has a standardized mean of 75, and standard deviation of 20 for that scale.
  • The Normative Score for this patient becomes { [(84 - 75) / 20] * 10} + 50 = 54.5.
  • Based on a general population mean of 50, this person's functioning is slightly higher (less disability) on this measure than what is found in the general population.
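
The calculation in this example can be checked with a short Python sketch. The function name and parameter names are hypothetical and chosen for illustration; the population mean of 75 and standard deviation of 20 are the supposed values from the example, not figures from the Outcomes Means Table.

```python
def normative_score(standardized: float, pop_mean: float, pop_sd: float) -> float:
    """Convert a 0-100 standardized score to a normative score.

    The normative metric has a general population mean of 50 and a
    standard deviation of 10.
    """
    if pop_sd <= 0:
        raise ValueError("population standard deviation must be positive")
    # Subtract the population mean, divide by the population SD,
    # multiply by 10, then add 50.
    return (standardized - pop_mean) / pop_sd * 10.0 + 50.0


# Patient score of 84; supposed population mean 75 and SD 20.
print(normative_score(84, 75, 20))  # 54.5
```

A patient scoring exactly at the population mean would receive 50 under this sketch, regardless of the scale's standard deviation.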

Normative Study

In 2000, the AAOS completed a Normative Data Study for all existing instruments to provide users with general, healthy population scale scores against which they can compare their patients' scores, and to further assess the reliability and validity of the instruments.

The sampling methodology for the Normative Data Study was designed to collect current health data from a non-institutionalized, general United States population. The sampling plan was stratified by the following demographic markers: gender, co-morbid conditions, ethnicity, and age. A panel methodology was selected as the simplest way to attain the desired sampling distribution. The panel consisted of a group of households selected by National Family Opinion Research (NFO) from among their more than 475,000 participating members to be representative of general, non-institutionalized individuals and the families in which they reside within the United States population.

The Normative Data Study was fielded for the AAOS by the National Research Corporation (NRC) and distributed by direct mail to the representative sample of the general United States population (n=32,108). The overall response rate across all conditions, 67.4% (21,639 responses), met study expectations. Of the total responses, 20,631 (94.1%) were valid returns. For each of the core instruments sampled, the precision target set a priori (a confidence interval of ± 3% at a 95% confidence level) was also exceeded.

Reliability and Validity

Initial testing for reliability and validity of all AAOS outcomes instruments was conducted in collaboration with the Council of Musculoskeletal Specialty Societies (COMSS) and the Council of Spine Societies (COSS). On the basis of these findings, the instruments were further tested using a general population in the Normative Data Study.

Analysis of the normative data using a Multitrait/Multi-item Analysis Program showed that all sub-scales within each of the core instruments exhibited high internal reliability, as well as discriminant and convergent validity. Items within each of the sub-scales contributed roughly equal proportions of information to the total scale scores.

