Outcomes Instruments and Information

Understanding Outcomes Scoring, Normative Study, and Reliability/Validity


The AAOS Outcomes Instruments Normative Data Study was conducted to provide users of the instruments with general/healthy population scores.

The AAOS Outcomes Instruments were tested for validity and reliability in collaboration with the Council of Musculoskeletal Specialty Societies (COMSS) and the Council of Spine Societies (COSS).

Summary means and standard deviations for each Instrument are already included in the formulae on each Scoring Worksheet, but you may open up a summary table that has all of the standardized and normative scores for each instrument (see below).

Understanding Scoring

Scoring of the AAOS outcomes instrument scales for comparative purposes can be done with the computation of a standardized and normative score for each patient. The Excel worksheets included with the instruments include formulae that build in any necessary item recoding, computation of missing items, and known general population means and standard deviations, as needed. A summary of the overall general population means and standard deviations is also available.

Standardized and normative scores CANNOT be computed if less than one-half (< 51%) of the items in a scale are answered.

Individual Standardized Score Overview

The individual's Standardized score is based on the mean of items that make up the scale. Before computing this, all items must be recalibrated so that they are all in the same metric.

The most straightforward way to understand the scoring is that each response is rescaled so that every item has a value in the range 0 through 5 (i.e., lowest possible score = 0 and maximum possible score = 5).

Next, all of the items comprising a given scale are averaged over the number of items answered.

This average of the rescaled values is multiplied by a constant so that each scale's resulting value falls between 0 and 100. If the items are scored such that high values represent the most disability, this number must be subtracted from 100 to reverse score the scale.

All standardized scores are calculated in the worksheets such that a 0 represents the MOST disability and 100 represents LEAST disability.
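
The overview above can be sketched in Python as follows. This is a minimal illustration, not the formulae used in the Excel worksheets: the function name, its parameters, and the use of None for an unanswered item are all assumptions made for this example. It also assumes the missing-item rule means that at least half of the items in a scale must be answered.

```python
from typing import Optional, Sequence, Tuple


def standardized_score(
    responses: Sequence[Optional[float]],
    ranges: Sequence[Tuple[float, float]],
    high_is_worse: bool = True,
) -> Optional[float]:
    """Return a 0-100 standardized score (100 = least disability), or None.

    responses: one raw rating per item, None for an unanswered item.
    ranges: the (lowest, highest) possible rating for each item.
    high_is_worse: True when a high raw rating means more disability.
    """
    rescaled = []
    for value, (low, high) in zip(responses, ranges):
        if value is None:
            continue  # unanswered items are left out of the average
        # Put every item on the common 0-5 metric.
        rescaled.append((value - low) * 5.0 / (high - low))

    # Assumed reading of the rule: at least half the items must be answered.
    if not rescaled or len(rescaled) * 2 < len(ranges):
        return None

    mean = sum(rescaled) / len(rescaled)  # average over items answered
    score = mean * 20.0                   # stretch 0-5 to 0-100
    # Reverse so that 0 is MOST disability and 100 is LEAST disability.
    return 100.0 - score if high_is_worse else score
```

Returning None rather than a number for an unscorable scale keeps a missing score from being mistaken for a score of 0 (MOST disability).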

Individual Standardized Score Example

Suppose a given scale has 7 items (called A, B, C, D, E, F, G) that are all rated so that a "1" means No Pain and the highest rating represents Most Pain.

Further suppose that four of these items (A, B, C, D) are rated on a 1 - 6 scale, but the rest (E, F, G) are on a scale of 1 - 4.

The first step is to rescale all of the items to have a range of 0 to 5.

  • For items A - D, this is accomplished by subtracting the value "1" from each item's score. Now each of these four items has a value of 0 to 5.
  • Items E-G need to be rescaled in two steps:
    • First, subtract the value "1" from each item (resulting in values of 0 to 3).
    • Then multiply each item by 5 / 3 (resulting in values of 0 to 5 for each item).

The next step is to average these rescaled values (Sum all rescaled items, divide by number of items).

The resulting numbers have the range of 0 to 5 and need to be multiplied by 20 to have a range of 0 to 100.

This number is then subtracted from 100 (so that a 0 is MOST disability and 100 is LEAST disability). This is the patient's Standardized Score and will be in the range of 0 to 100.
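
The same steps can be traced with concrete numbers. The ratings below are hypothetical values chosen for illustration only (they do not come from the instruments or the Normative Data Study), and the variable names are likewise assumptions.

```python
# Hypothetical ratings for one patient on the 7-item scale (illustration only).
responses = {"A": 2, "B": 3, "C": 1, "D": 4, "E": 2, "F": 1, "G": 3}
six_point = ("A", "B", "C", "D")  # rated 1 - 6
four_point = ("E", "F", "G")      # rated 1 - 4

rescaled = {}
for item in six_point:
    # Subtract 1 so the item runs from 0 to 5.
    rescaled[item] = responses[item] - 1
for item in four_point:
    # Subtract 1 (0 to 3), then multiply by 5 / 3 (0 to 5).
    rescaled[item] = (responses[item] - 1) * 5 / 3

# Average the rescaled items, stretch to 0-100, then reverse.
mean_rescaled = sum(rescaled.values()) / len(rescaled)
standardized = 100 - mean_rescaled * 20

print(round(standardized, 1))  # 68.6
```

Here the rescaled items sum to 11, so the mean is 11 / 7 (about 1.57), which becomes about 31.4 on the 0 to 100 metric and about 68.6 after reversal.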

Normative Scale Scoring Overview

In order to provide the user with a method of interpreting/comparing the results of a given patient's functioning to a healthy population, Normative values were created. While the AAOS Outcomes Instrument standardized scores are all in the range of 0 to 100, interpretation of the standardized score is not consistent between scales due to differences in how the general, healthy population scored.

For example, a standardized score of 80 on scale "X" appears to be equal to a standardized score of 80 on scale "Y". In actuality, the patient's standardized score of 80 on scale "X" may be 10 points below the average healthy population's standardized score for that scale, but a patient's standardized score of 80 on scale "Y" may be 6 points above the average healthy population's standardized score.

To make the scores comparable across scales, the Normative Data Study's results were transformed for each scale so that each has a mean Normative Score of 50. Thus, a patient scoring above 50 on a particular scale is above the general population's average, while a patient scoring below 50 on a scale is below the general, healthy population's norm.

The mean of the overall scale scores, derived from the general United States population, is set at 50, with a standard deviation of 10. (Fixing the mean and standard deviation at these values, rather than using a standard z-score transformation with a mean of 0 and a standard deviation of 1, provides the basis of comparison for the Normative Scoring.)

  • Each scale is transformed to the 0 to 100 metric (i.e., is made into a standardized score).
  • Using the actual mean and standard deviation of the 0 to 100 scale from the general, healthy population, a formula is applied to derive the normative score. This formula is:
    • Subtract the general population standardized mean from each individual's standardized score.
    • Divide this by the general population's standard deviation.
    • Multiply the resulting value by 10.
    • Add 50 to the resulting number.
    • The final value is the Normative Score for that patient.
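
The steps above can be written as a single expression. The symbols are introduced here for illustration and do not appear in the worksheets: S is the patient's standardized score, \mu and \sigma are the general population's standardized mean and standard deviation for that scale, and N is the resulting Normative Score.

```latex
N = 50 + 10 \cdot \frac{S - \mu}{\sigma}
```

A patient whose standardized score equals the population mean (S = \mu) therefore receives N = 50, and each population standard deviation above or below the mean shifts N by 10 points.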

Normative Scale Scoring Example

Computing the individual normative score requires knowing the general population's mean (standardized) score and corresponding standard deviation. These values are already included in each Instrument's Scoring Worksheet, but can also be found in the Outcomes Means Table.

An example of this methodology shows the following calculations:

  • The Standardized score for a patient is 84 on the 0 to 100 scale (calculated as above).
  • Suppose it was found that the healthy population has a standardized mean of 75, and standard deviation of 20 for that scale.
  • The Normative Score for this patient becomes { [(84 - 75) / 20] * 10} + 50 = 54.5.
  • Based on a general population mean of 50, this person's functioning is slightly higher (less disability) on this measure than what is found in the general population.
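
The calculation above can be reproduced with a short Python sketch. The function name and parameter names are assumptions made for this illustration; the worksheets build the equivalent arithmetic into their own formulae.

```python
def normative_score(standardized: float, pop_mean: float, pop_sd: float) -> float:
    """Rescale a 0-100 standardized score to the mean-50, SD-10 normative metric."""
    if pop_sd <= 0:
        raise ValueError("population standard deviation must be positive")
    # Subtract the population mean, divide by its SD, multiply by 10, add 50.
    return (standardized - pop_mean) / pop_sd * 10 + 50


# Values from the example: patient score 84, population mean 75, SD 20.
print(normative_score(84, 75, 20))  # 54.5
```

A patient scoring exactly at the population mean would receive 50, which is the reference point used to interpret every scale.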

Normative Study

In 2000, the AAOS completed a Normative Data Study for all existing instruments to provide users with general, healthy population scale scores against which they can compare their patients' scores, and to further assess the reliability and validity of the instruments.

The sampling methodology for the Normative Data Study was designed to collect current health data from a non-institutionalized, general United States population. The sampling plan was stratified by the following demographic markers: gender, co-morbid conditions, ethnicity, and age. A panel methodology was selected as the simplest way to attain the desired sampling distribution. The panel consisted of a group of households selected by National Family Opinion Research (NFO) from among their more than 475,000 participating members to be representative of general, non-institutionalized individuals and the families in which they reside within the United States population.

The Normative Data Study was fielded for the AAOS by the National Research Corporation (NRC) and distributed by direct mail to the representative sample of the general United States population (n=32,108). The overall response rate across all conditions, at 67.4% (21,639 responses), met study expectations. Of the total responses, 20,631 (94.1%) were valid returns. For each of the core instruments sampled, the overall confidence interval of ± 3% at a 95% confidence level set a priori was also exceeded.

Reliability and Validity

Initial testing for reliability and validity of all AAOS outcomes instruments was conducted in collaboration with the Council of Musculoskeletal Specialty Societies (COMSS) and the Council of Spine Societies (COSS). On the basis of these findings, the instruments were further tested using a general population in the Normative Data Study.

Analysis of the normative data using a Multitrait/Multi-item Analysis Program showed that all sub-scales within each of the core instruments exhibited high internal reliability, as well as discriminant and convergent validity. Items within each of the sub-scales contributed roughly equal proportions of information to the total scale scores.

Selected Bibliography

Among the published studies of the AAOS outcomes instruments are the following (listed by topic area):


  • Asher M, Min Lai S, Burton D, Manna B. Scoliosis research society-22 patient questionnaire: responsiveness to change associated with surgical treatment. Spine. 2003 Jan 1;28(1):70-3.
  • Dvorak MF, Johnson MG, Boyd M, Johnson G, Kwon BK, Fisher CG. Long-term health-related quality of life outcomes following Jefferson-type burst fractures of the atlas. J Neurosurg Spine. 2005 Apr;2(4):411-7.
  • McMillan MR, Patterson PA, Parker V. Percutaneous laser disc decompression for the treatment of discogenic lumbar pain and sciatica: a preliminary report with 3-month follow-up in a general pain clinic population. Photomed Laser Surg. 2004 Oct;22(5):434-8.
  • Molinari RW, Gerlinger T. Functional outcomes of instrumented posterior lumbar interbody fusion in active-duty US servicemen: a comparison with nonoperative management. Spine J. 2001 May-Jun;1(3):215-24.
  • Padua R, Padua L, Ceccarelli E, Romanini E, Bondi R, Zanoli G, Campi A. Cross-cultural adaptation of the lumbar North American Spine Society questionnaire for Italian-speaking patients with lumbar spinal disease. Spine. 2001 Aug 1;26(15):E344-7.

Lower Extremities

  • Haddad FS, Masri BA, Garbuz DS, Duncan CP. Femoral bone loss in total hip arthroplasty: classification and preoperative planning. Instr Course Lect. 2000;49:83-96.
  • Johanson NA, Liang MH, Daltroy L, Rudicel S, Richmond J. American Academy of Orthopaedic Surgeons lower limb outcomes assessment instruments. Reliability, validity, and sensitivity to change. J Bone Joint Surg Am. 2004 May;86-A(5):902-9.
  • Katz JN, Phillips CB, Poss R, Harrast JJ, Fossel AH, Liang MH, Sledge CB. The validity and reliability of a Total Hip Arthroplasty Outcome Evaluation Questionnaire. J Bone Joint Surg Am. 1995 Oct;77(10):1528-34. Comment in: J Bone Joint Surg Am. 1996 Sep;78(9):1445-6.
  • Pinzur MS, Evans A. Health-related quality of life in patients with Charcot foot. Am J Orthop. 2003 Oct;32(10):492-6.
  • Thordarson DB, Ebramzadeh E, Rudicel SA, Baxter A. Age-adjusted baseline data for women with hallux valgus undergoing corrective surgery. J Bone Joint Surg Am. 2005 Jan;87(1):66-75.
  • Thordarson DB, Rudicel SA, Ebramzadeh E, Gill LH. Outcome study of hallux valgus surgery--an AOFAS multi-center study. Foot Ankle Int. 2001 Dec;22(12):956-9. Erratum in: Foot Ankle Int 2002 Feb;23(2):96.
  • Tran T, Thordarson D. Functional outcome of multiply injured patients with associated foot injury. Foot Ankle Int. 2002 Apr;23(4):340-3.
  • Vannah WM, Davids JR, Drvaric DM, Setoguchi Y, Oxley BJ. A survey of function in children with lower limb deficiencies. Prosthet Orthot Int. 1999 Dec;23(3):239-44.

PODCI/POSNA (Pediatric/Adolescent)

  • Abel MF, Damiano DL, Blanco JS, Conaway M, Miller F, Dabney K, Sutherland D, Chambers H, Dias L, Sarwark J, Killian J, Doyle S, Root L, LaPlaza J, Widmann R, Snyder B. Relationships among musculoskeletal impairments and functional health status in ambulatory cerebral palsy. J Pediatr Orthop. 2003 Jul-Aug;23(4):535-41.
  • Daltroy LH, Liang MH, Fossel AH, Goldberg MJ. The POSNA pediatric musculoskeletal functional health questionnaire: report on reliability, validity, and sensitivity to change. Pediatric Outcomes Instrument Development Group. Pediatric Orthopaedic Society of North America. J Pediatr Orthop. 1998 Sep-Oct;18(5):561-71.
  • Haynes RJ, Sullivan E. The Pediatric Orthopaedic Society of North America pediatric orthopaedic functional health questionnaire: an analysis of normals. J Pediatr Orthop. 2001 Sep-Oct;21(5):619-21.
  • Lerman JA, Sullivan E, Haynes RJ. The Pediatric Outcomes Data Collection Instrument (PODCI) and functional assessment in patients with adolescent or juvenile idiopathic scoliosis and congenital scoliosis or kyphosis. Spine. 2002 Sep 15;27(18):2052-7; discussion 2057-8.
  • McCarthy JJ, Kim DH, Eilert RE. Posttraumatic genu valgum: operative versus nonoperative treatment. J Pediatr Orthop. 1998 Jul-Aug;18(4):518-21.
  • McCarthy ML, Silberstein CE, Atkins EA, Harryman SE, Sponseller PD, Hadley-Miller NA. Comparing reliability and validity of pediatric instruments for measuring health and well-being of children with spastic cerebral palsy. Dev Med Child Neurol. 2002 Jul;44(7):468-76.
  • Noonan KJ, Flynn JM, Skaggs DL. Report on the 2002 Pediatric Orthopaedic Society of North America Traveling Fellowship. J Pediatr Orthop. 2004 Mar-Apr;24(2):231-4.
  • Oeffinger DJ, Tylkowski CM, Rayens MK, Davis RF, Gorton GE 3rd, D'Astous J, Nicholson DE, Damiano DL, Abel MF, Bagley AM, Luan J. Gross Motor Function Classification System and outcome tools for assessing ambulatory cerebral palsy: a multicenter study. Dev Med Child Neurol. 2004 May;46(5):311-9.
  • Payne WK 3rd, Ogilvie JW. Back pain in children and adolescents. Pediatr Clin North Am. 1996 Aug;43(4):899-917.
  • Pencharz J, Young NL, Owen JL, Wright JG. Comparison of three outcomes instruments in children. J Pediatr Orthop. 2001 Jul-Aug;21(4):425-32.
  • Pirpiris M, Graham HK. Uptime in children with cerebral palsy. J Pediatr Orthop. 2004 Sep-Oct;24(5):521-8.
  • Tervo RC, Azuma S, Stout J, Novacheck T. Correlation between physical functioning and gait measures in children with cerebral palsy. Dev Med Child Neurol. 2002 Mar;44(3):185-90.
  • Ugwonali OF, Lomas G, Choe JC, Hyman JE, Lee FY, Vitale MG, Roye DP Jr. Effect of bracing on the quality of life of adolescents with idiopathic scoliosis. Spine J. 2004 May-Jun;4(3):254-60.
  • Vitale MG, Levy DE, Moskowitz AJ, Gelijns AC, Spellmann M, Verdisco L, Roye DP Jr. Capturing quality of life in pediatric orthopaedics: two recent measures compared. J Pediatr Orthop. 2001 Sep-Oct;21(5):629-35.
  • Yassir WK, Grottkau BE, Goldberg MJ. Costello syndrome: orthopaedic manifestations and functional health. J Pediatr Orthop. 2003 Jan-Feb;23(1):94-8.
  • Young NL, Wright JG. Measuring pediatric physical function. J Pediatr Orthop. 1995 Mar-Apr;15(2):244-53.


  • Hunsaker FG, Cioffi DA, Amadio PC, Wright JG, Caughlin B. The American academy of orthopaedic surgeons outcomes instruments: normative values from the general population. J Bone Joint Surg Am. 2002 Feb;84-A(2):208-15.


  • Asher M, Min Lai S, Burton D, Manna B. Scoliosis research society-22 patient questionnaire: responsiveness to change associated with surgical treatment. Spine. 2003 Jan 1;28(1):70-3.
  • Engelberg R, Martin DP, Agel J, Obremsky W, Coronado G, Swiontkowski MF. Musculoskeletal Function Assessment instrument: criterion and construct validity. J Orthop Res. 1996 Mar;14(2):182-92.
  • Jaglal S, Lakhani Z, Schatzker J. Reliability, validity, and responsiveness of the lower extremity measure for patients with a hip fracture. J Bone Joint Surg Am. 2000 Jul;82-A(7):955-62.
  • Kessler S, Pfander T, Nelitz M, Puhl W, Gunther KP. [The Pediatric Musculoskeletal Functional Health Questionnaire. A function assessment questionnaire for detection of illnesses of the locomotor system in children and adolescents--initial results of validating a German version] [Article in German] Z Orthop Ihre Grenzgeb. 2001 Mar-Apr;139(2):134-7.
  • Martin DP, Engelberg R, Agel J, Snapp D, Swiontkowski MF. Development of a musculoskeletal extremity health status instrument: the Musculoskeletal Function Assessment instrument. J Orthop Res. 1996 Mar;14(2):173-81.
  • Swiontkowski MF, Engelberg R, Martin DP, Agel J. Short musculoskeletal function assessment questionnaire: validity, reliability, and responsiveness. J Bone Joint Surg Am. 1999 Sep;81(9):1245-60.


  • Atroshi I, Gummesson C, Andersson B, Dahlgren E, Johansson A. The disabilities of the arm, shoulder and hand (DASH) outcome questionnaire: reliability and validity of the Swedish version evaluated in 176 patients. Acta Orthop Scand. 2000 Dec;71(6):613-8.
  • Beaton DE, Katz JN, Fossel AH, Wright JG, Tarasuk V, Bombardier C. Measuring the whole or the parts? Validity, reliability, and responsiveness of the Disabilities of the Arm, Shoulder and Hand outcome measure in different regions of the upper extremity. J Hand Ther. 2001 Apr-Jun;14(2):128-46.
  • Germann G, Wind G, Harth A. [The DASH(Disability of Arm-Shoulder-Hand) Questionnaire--a new instrument for evaluating upper extremity treatment outcome] [Article in German] Handchir Mikrochir Plast Chir. 1999 May;31(3):149-52.
  • Hudak PL, Amadio PC, Bombardier C. Development of an upper extremity outcome measure: the DASH (disabilities of the arm, shoulder and hand) [corrected]. The Upper Extremity Collaborative Group (UECG) Am J Ind Med. 1996 Jun;29(6):602-8. Erratum in: Am J Ind Med 1996 Sep;30(3):372.


  • Johanson NA, Liang MH, Daltroy L, Rudicel S, Richmond J. American Academy of Orthopaedic Surgeons lower limb outcomes assessment instruments. Reliability, validity, and sensitivity to change. J Bone Joint Surg Am. 2004 May;86-A(5):902-9.
  • Marx RG, Jones EC, Allen AA, Altchek DW, O'Brien SJ, Rodeo SA, Williams RJ, Warren RF, Wickiewicz TL. Reliability, validity, and responsiveness of four knee outcome scales for athletic patients. J Bone Joint Surg Am. 2001 Oct;83-A(10):1459-69.
