Table 1: Performance metrics of a ConvNeXt-Tiny model trained on a dataset of pelvic radiographs in detecting hip joint abnormalities. PPV, positive predictive value; NPV, negative predictive value; AUROC, area under the receiver operating characteristic curve.
Source: Khosravi B, et al: “Characterizing hip joint morphology using a multitask deep learning model”

AAOS Now

Published 9/11/2025 | Rebecca Araujo

Can a deep learning model identify morphological hip abnormalities using radiographic images?

Researchers from Mayo Clinic in Rochester, Minnesota, developed a novel technique for automated detection of radiographic morphological hip pathologies using a deep learning (DL) model. Their findings were presented at the AAOS 2025 Annual Meeting by Lainey Bukowiec, MD, orthopaedic surgery resident at Mayo Clinic.

Early, reliable, and routine identification of abnormal pediatric hip morphologies is critical, given their predisposition to hip pain, functional decline, and osteoarthritis (OA). “Plain radiographs remain the gold standard for initial workup; however, expert classification of morphological abnormalities seen in [femoral acetabular impingement (FAI)] and [developmental dysplasia of the hip (DDH)] has shown poor inter- and intra-rater reliability,” wrote Dr. Bukowiec and coauthors. For this study, researchers from the Orthopedic Surgery Artificial Intelligence (AI) Laboratory at Mayo Clinic investigated the utility of DL models compared with human experts in classifying basic hip morphological abnormalities.

To begin, the team manually annotated 400 anteroposterior (AP) radiographs to identify ischial spine sign, femoral head cam deformity, hip dysplasia, or any combination thereof. A fellowship-trained hip surgeon determined the ground truth for a holdout test set of 100 radiographs.

Then the researchers utilized a previously validated object-detection model to localize regions of interest (ROIs) within the AP pelvic radiographs, identifying any critical anatomical landmarks. “Joint ROIs that contained any hardware were removed from the dataset,” the authors noted.

The DL model used for this study was ConvNeXt-Tiny, a convolutional neural network. The model predicts four joint characteristics (i.e., ischial spine sign, dysplasia, cam deformity, other abnormalities) as well as patient sex. The model was used to analyze radiographs from 500 patients (mean age, 47 years; 49% female). Each radiograph was processed in 832 milliseconds. Table 1 lists the performance metrics.
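To make the multitask output concrete, the sketch below shows one way five raw model outputs could be turned into yes/no predictions. It is an illustration only: the article does not describe the model's output layer, so the use of one independent sigmoid per task, the 0.5 threshold, the task ordering, and the names `TASKS`, `sigmoid`, and `decode_outputs` are all assumptions.

```python
import math

# Task names follow the article; the ordering and the "sex_female"
# encoding are hypothetical choices made for this sketch.
TASKS = (
    "ischial_spine_sign",
    "dysplasia",
    "cam_deformity",
    "other_abnormality",
    "sex_female",
)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_outputs(logits: list[float], threshold: float = 0.5) -> dict[str, bool]:
    """Map one raw score per task to a binary prediction.

    Assumes each task is an independent binary decision, which the
    article does not confirm.
    """
    if len(logits) != len(TASKS):
        raise ValueError(f"expected {len(TASKS)} logits, got {len(logits)}")
    return {task: sigmoid(z) >= threshold for task, z in zip(TASKS, logits)}


# Hypothetical raw scores for a single hip joint (not study data).
prediction = decode_outputs([2.0, -1.5, 0.3, -3.0, 0.8])
for task, flagged in prediction.items():
    print(f"{task}: {flagged}")
```

In a setup like this, a single radiograph pass yields all five answers at once, which is the practical appeal of a multitask design.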

Overall, the authors reported that the model achieved high accuracy and area under the receiver operating characteristic curve (AUROC) scores. In particular, the model was highly specific in identifying ischial spine sign (96.0%) and all abnormalities (85.3%). Sensitivity was high for cam deformity (80.0%) and dysplasia (75.0%).
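For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, PPV, NPV, and accuracy follow from the four cells of a confusion matrix for a single label. The counts are hypothetical and are not taken from the study; the names `ConfusionCounts` and `binary_metrics` are introduced here for illustration. AUROC is omitted because it depends on the model's continuous scores rather than on thresholded counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # abnormal joints the model flagged
    fp: int  # normal joints the model flagged
    tn: int  # normal joints the model cleared
    fn: int  # abnormal joints the model missed


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Threshold-based metrics for one label.

    Raises ZeroDivisionError if a denominator is zero, e.g. when the
    test set contains no abnormal joints.
    """
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of abnormal joints caught
        "specificity": c.tn / (c.tn + c.fp),  # share of normal joints cleared
        "ppv": c.tp / (c.tp + c.fp),          # share of flags that are correct
        "npv": c.tn / (c.tn + c.fn),          # share of clearances that are correct
        "accuracy": (c.tp + c.tn) / total,
    }


# Hypothetical counts for one label on a 100-joint test set (not study data).
example = ConfusionCounts(tp=16, fp=4, tn=76, fn=4)
metrics = binary_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because PPV and NPV depend on how common the abnormality is in the test set, the same sensitivity and specificity can yield different predictive values in a different patient population.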

The model achieved a high positive predictive value for all abnormalities (93.5%) and ischial spine sign (87.5%). Inter-rater agreement, measured via Gwet’s AC1, was “substantial” for dysplasia (0.83) and all abnormalities (0.88) and moderate for ischial spine sign (0.75) and cam deformity (0.61).
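As background on the agreement statistic, the sketch below computes Gwet's AC1 for two raters using the standard two-rater formula: observed agreement corrected by a chance-agreement term built from the average of the two raters' category proportions. The ratings are hypothetical and are not study data, the function name `gwet_ac1` is introduced here, and the article does not state how many raters were compared or how the authors implemented the calculation.

```python
from collections import Counter
from typing import Hashable, Sequence


def gwet_ac1(
    rater_a: Sequence[Hashable],
    rater_b: Sequence[Hashable],
    categories: Sequence[Hashable] = (0, 1),
) -> float:
    """Gwet's AC1 for two raters scoring the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must score the same non-empty item list")
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: (1 / (q - 1)) * sum of pi * (1 - pi), where pi is
    # the mean proportion of ratings the two raters placed in a category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    chance = 0.0
    for cat in categories:
        pi = (counts_a[cat] + counts_b[cat]) / (2 * n)
        chance += pi * (1 - pi)
    chance /= q - 1

    return (observed - chance) / (1 - chance)


# Hypothetical labels for 10 joints (1 = sign present, 0 = absent).
surgeon_1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
surgeon_2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(f"AC1 = {gwet_ac1(surgeon_1, surgeon_2):.2f}")
```

AC1 is often preferred over Cohen's kappa when one category is rare, because its chance-agreement term shrinks rather than grows as ratings concentrate in a single category.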

“The results demonstrate the efficacy of a ConvNeXt-Tiny model trained on a dataset of AP pelvic radiographs, achieving good, but nowhere near perfect, accuracy in predicting various hip joint characteristics. Human expert inter-rater reliability was also good, but not perfect,” the authors summarized. “Taken together, these results show the promise and current limitations of both using plain radiographs for simple morphological hip classifications and the downstream impact this has on developing reliable DL models.”

They noted that this study contributes to a “growing body of evidence supporting the application of DL in musculoskeletal radiographic analysis to the diagnosis of FAI and DDH.”

Among the limitations of the model was its narrow range of detectable radiographic signs: it classifies ischial spine sign, cam deformity, and dysplasia but does not measure continuous metrics, such as Tönnis or center-edge angles. “Clinicians often use MRI and CT scans to evaluate morphological hip abnormalities; however, the current paper deliberately examined the most basic form of hip pain workup, radiographs without accounting for angles, to evaluate whether DL could effectively classify simple measures with performance comparable to human interpretation,” the authors noted. This work can be readily expanded to evaluate more radiographic parameters as DL models continue to mature.

Overall, the authors concluded, “The findings suggest a promising avenue for leveraging AI-driven technologies to enhance musculoskeletal radiographic interpretation and improve patient care but also introduce a strong note of caution in training models based on metrics upon which even experts have poor consensus, which will ultimately influence model capability.”

Dr. Bukowiec’s coauthors of “Characterizing hip joint morphology using a multitask deep learning model” are Bardia Khosravi; John Patrick Mickley; Jacob Francis Oeding, MS; Pouria Rouzrokh, MD, MPH, MHPE; Bradley James Erickson, MD, PhD; Emmanouil Grigoriou, MD; Michael J. Taunton, MD, FAAOS; and Cody Wyles, MD.

Rebecca Araujo is the managing editor of AAOS Now. She can be reached at raraujo@aaos.org.