Comparison of ultrasonographic, radiographic and intra-operative findings in severe hip osteoarthritis

The aim of this study was to assess the ultrasonographic (US) findings of patients with late-stage hip osteoarthritis (OA) undergoing total hip arthroplasty (THA), and to compare the US findings with conventional radiography (CR) and intraoperative findings. Moreover, the inter-rater reliability of hip US and the association between US findings and the Oxford Hip Score (OHS) were evaluated. Sixty-eight hips were included, and intraoperative findings were available for 48 hips. The mean patient age was 67.6 years, and 38% were male. OA findings (osteophytes at the femoral collum and the anterosuperior acetabulum, femoral head deformity and effusion) were assessed on US, on CR and during THA. The diagnostic performance of US and CR was compared using the THA findings as the gold standard. Osteoarthritic US findings were very common, but no association between the US findings and the OHS was observed. The pooled inter-rater reliability (n = 65) varied from moderate to excellent (PABAK 0.538–0.815). With the THA findings as the gold standard, US detected femoral collum osteophytes with 95% sensitivity, 0% specificity, 81% accuracy and 85% positive predictive value. For acetabular osteophytes, the respective values were 96%, 0%, 88% and 91%; for femoral head deformity, 92%, 36%, 38% and 83%; and for effusion, 49%, 85%, 58% and 90%. US detects osteophytes as well as CR does, and it outperforms CR in detecting femoral head deformity. The inter-rater reliability of the US evaluation varies from moderate to excellent, and no association between US findings and the OHS was observed in this patient cohort.

Figure 2. Large femoral collum osteophytes (white arrows) in a longitudinal ultrasonographic view (a) and a lateral conventional radiographic view (b). In the anteroposterior radiograph (c), the same osteophyte is barely visible (white arrowhead). During total hip arthroplasty, the femoral head was removed, and the corresponding osteophytes (black arrow) are clearly distinguishable at the femoral collum (d). Ultrasonography also shows a small effusion (measured between the dashed lines).

US findings vs. THA.
To evaluate how well US and CR detect osteoarthritic changes, the macroscopic findings during THA were used as the gold standard. For US-detected femoral collum osteophytes, the sensitivity was 95%, the specificity 0%, the accuracy 81% and the positive predictive value 85%. For acetabular osteophytes, the respective values were 96%, 0%, 88% and 91%. For femoral head deformity, they were 92%, 36%, 38% and 83%, respectively. For US-detected effusion, the respective values were 49%, 85%, 58% and 90%. Table 2 summarizes the performance of US.
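Each set of four percentages above derives from a single 2×2 cross-tabulation of the imaging result against the intraoperative reference. The following minimal Python sketch shows that arithmetic. The class and function names are hypothetical, and the counts in the usage line are invented for illustration, not taken from Tables 2-4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Imaging finding cross-tabulated against the intraoperative (THA) reference."""
    tp: int  # imaging positive, THA positive
    fp: int  # imaging positive, THA negative
    fn: int  # imaging negative, THA positive
    tn: int  # imaging negative, THA negative


def _ratio(num: int, den: int) -> float:
    # An empty denominator (e.g. no THA-negative hips) yields NaN instead of an error.
    return num / den if den else float("nan")


def diagnostic_metrics(t: TwoByTwo) -> dict:
    """Sensitivity, specificity, accuracy, PPV and NPV as proportions."""
    n = t.tp + t.fp + t.fn + t.tn
    return {
        "sensitivity": _ratio(t.tp, t.tp + t.fn),
        "specificity": _ratio(t.tn, t.tn + t.fp),
        "accuracy": _ratio(t.tp + t.tn, n),
        "ppv": _ratio(t.tp, t.tp + t.fp),
        "npv": _ratio(t.tn, t.tn + t.fn),
    }


# Hypothetical counts: US calls nearly every hip positive, so there are no true negatives.
example = diagnostic_metrics(TwoByTwo(tp=40, fp=6, fn=2, tn=0))
print({k: round(v, 3) for k, v in example.items()})
# {'sensitivity': 0.952, 'specificity': 0.0, 'accuracy': 0.833, 'ppv': 0.87, 'npv': 0.0}
```

When there are no true negatives, specificity is 0 no matter how high the sensitivity and PPV are. This is the pattern seen here for osteophytes.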
When the performance of CR was evaluated in the same way, femoral collum osteophytes were detected with 90% sensitivity, 14% specificity, 79% accuracy and 86% positive predictive value. For acetabular osteophytes, the respective values were 98%, 0%, 90% and 92%, and for femoral head deformity, 46%, 64%, 50% and 81%. Table 3 outlines the performance of CR. When the diagnostic performance of US and CR was compared using McNemar's test, US outperformed CR only in detecting femoral head deformity. Table 4 shows the diagnostic capacity of US and CR with the THA findings as the gold standard.
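McNemar's test compares the two paired sensitivities using only the THA-positive hips on which US and CR disagree. The sketch below uses the exact (binomial) form of the test. The text does not say whether the exact or the chi-squared variant was used, so that choice is an assumption, as are the function names and the example data.

```python
from math import comb


def discordant_pairs(us: list, cr: list) -> tuple:
    """Count (US+/CR-, US-/CR+) pairs among hips that were positive at THA."""
    if len(us) != len(cr):
        raise ValueError("US and CR results must be paired hip by hip")
    us_only = sum(1 for u, r in zip(us, cr) if u and not r)
    cr_only = sum(1 for u, r in zip(us, cr) if r and not u)
    return us_only, cr_only


def mcnemar_exact_p(us_only: int, cr_only: int) -> float:
    """Two-sided exact McNemar p-value.

    If the sensitivities are equal, each discordant hip is equally likely to fall
    in either cell, so the smaller count follows Binomial(n, 0.5).
    """
    n = us_only + cr_only
    if n == 0:
        return 1.0  # no discordant hips: no evidence of a difference
    tail = sum(comb(n, k) for k in range(min(us_only, cr_only) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired results for ten THA-positive hips (True = finding detected).
us = [True] * 9 + [False]
cr = [True, False, False, True, False, True, False, True, False, True]
b, c = discordant_pairs(us, cr)
print(b, c, mcnemar_exact_p(b, c))
# 5 1 0.21875
```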

Discussion
In this study, we compared the US and CR findings of late-stage hip OA with the intraoperative findings of THA. The results indicate that US is as good as CR in detecting osteophytes of the femoral collum and the anterosuperior acetabulum. Figure 2 shows an example of a large femoral collum osteophyte that is clearly visible on US but barely visible on anteroposterior CR. For femoral head morphology, US slightly outperformed CR (Fig. 3). It is [...] detect hip OA than CR; however, the authors studied only hip osteophytes and femoral head deformity 18 . Although our results suggest that US detects osteophytes with high sensitivity, the absence of normal findings drives the specificity to zero. Interestingly, every US feature assessed in our study showed a high positive predictive value (83-91%). This reflects a general property of US in OA: positive findings are trustworthy, but a negative finding does not rule out pathology. This contrasts sharply with the 2012 expert opinion of the European Society of Musculoskeletal Radiology, which does not recommend US for diagnosing hip osteoarthritis 19 . Because no relevant Doppler signal was obtained in the first 20 patients, we assessed only capsular bulging at the femoral collum. With the corresponding THA finding as the gold standard, we observed poor sensitivity (49%) but good specificity (85%) and positive predictive value (90%) for detecting hip effusion (≥ 8 mm). Previously, Qvistgaard et al. (2006) reported virtually no association between US-detected effusion and joint aspiration; however, their gold standard was less reliable than ours 10 . The failure of Doppler imaging remains somewhat concerning and may be related to the US device we used. In a small sample of 24 hips undergoing THA, Walther et al. found Power Doppler US to be a reliable tool for detecting synovitis, with a significant correlation between US and histopathologic findings 20 .

Table 1. Inter-rater reliability of hip ultrasonography between the radiologist and the three independent sonographers. Prevalence- and bias-adjusted kappa (PABAK) values are presented with 95% confidence intervals.

Figure 3. Ultrasonography (US) detects femoral head deformity better than conventional radiography (CR). In the longitudinal US view of the hip joint (a), marked deformity of the femoral head is seen (white arrows), whereas on the anteroposterior (b) and lateral CR views (c) the femoral head retains its normal ball-like appearance (white arrowheads). Additionally, a large acetabular osteophyte (black arrow) is seen on US (a) and on anteroposterior CR (b). Furthermore, a femoral collum osteophyte (dashed white arrow) is seen on US (a) and on lateral CR (c).

Scientific Reports | (2020) 10:21108 | https://doi.org/10.1038/s41598-020-78235-z www.nature.com/scientificreports/
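The Discussion argues that positive US findings are trustworthy while negative ones are not. Bayes' theorem makes this concrete, because predictive values depend on prevalence as well as on sensitivity and specificity. The sketch below is illustrative only. The prevalence values are hypothetical, and the pairing of 95% sensitivity with 0% specificity simply echoes the femoral collum figures reported above.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple:
    """Return (PPV, NPV) from test characteristics and pre-test prevalence (Bayes' theorem)."""
    tp = sensitivity * prevalence              # expected true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
    tn = specificity * (1 - prevalence)        # expected true-negative fraction
    fn = (1 - sensitivity) * prevalence        # expected false-negative fraction
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Same test characteristics, different (hypothetical) prevalence of the finding.
for prevalence in (0.85, 0.50, 0.30):
    ppv, npv = predictive_values(0.95, 0.0, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
# prevalence 0.85: PPV 0.84, NPV 0.00
# prevalence 0.50: PPV 0.49, NPV 0.00
# prevalence 0.30: PPV 0.29, NPV 0.00
```

In this sketch, once specificity is zero the PPV stays close to, and just below, the prevalence. A high PPV in a cohort of end-stage hips therefore mostly reflects how common the finding is in that cohort.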
We found no association between the US findings and the OHS. In contrast, Qvistgaard et al. found an association between US-detected hip OA and the patient's pain at rest and when walking 10 . In general, the association of OA with symptoms remains debatable. In a large study of over 5000 subjects, Kim et al. established that many hips with radiographic OA were not painful, and many painful hips did not show radiographic hip OA 4 . Using MRI, Kumar et al. suggested that acetabular cartilage defects, bone marrow lesions and subchondral cysts were associated with hip pain and dysfunction. However, the authors observed no association between radiographic hip OA and pain, which supports our findings 21 .
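The association analyses relate ordinal US findings, such as the 0-4 US sum score, to the OHS. Spearman's rho is Pearson's correlation computed on ranks, with tied values given their average rank. The following dependency-free sketch uses hypothetical data and hypothetical function names. It does not compute a p-value; in practice a statistics package (for example SciPy's scipy.stats.spearmanr) would supply one.

```python
from statistics import mean


def average_ranks(values: list) -> list:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average 1-based rank of the tie block
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: list, y: list) -> float:
    """Spearman's rank correlation (Pearson correlation of average ranks)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant variable has no rank correlation
    return sxy / (sxx * syy) ** 0.5


# Hypothetical US sum scores (0-4) and Oxford Hip Scores for six hips.
us_sum = [4, 3, 4, 2, 3, 4]
ohs = [18, 22, 25, 20, 15, 30]
print(round(spearman_rho(us_sum, ohs), 3))
```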
Only two previous studies have assessed the reproducibility of US in the osteoarthritic hip joint. In 100 patients, Qvistgaard et al. reported an intra-rater reliability (kappa) of 0.75 for osteophytes, 0.69 for femoral head deformity, 0.58 for effusion and 0.55 for synovitis. The corresponding inter-rater reliability values were 0.49, 0.35, 0.38 and 0.43, respectively. A minor weakness of that study was that only the images were re-interpreted; no second US scan was performed 10 . Later, Abraham et al. (2014) found that inter-rater reliability was only moderate (kappa 0.47) for femoral collum osteophytes and femoral head deformity 18 . Our results are in line with these previous studies, with inter-rater agreement ranging from moderate to excellent (PABAK 0.538-0.815). Our slightly better reliability values may be due to technical advances in US devices and the resulting improvements in spatial resolution and contrast.
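The present study reports PABAK, whereas the earlier studies report Cohen's kappa, so the two sets of numbers are not directly interchangeable. For two raters and k categories, PABAK = (k·p_o − 1)/(k − 1), which reduces to 2·p_o − 1 for binary ratings. Kappa instead corrects p_o for the chance agreement implied by the raters' marginal frequencies. The sketch below uses hypothetical function names and ratings to show how the two statistics can diverge when nearly every hip is rated positive. The agreement bands follow the scale given under Statistical analysis, with the band boundaries treated as continuous (an assumption).

```python
from collections import Counter


def observed_agreement(a: list, b: list) -> float:
    """Proportion of hips on which the two raters gave the same rating."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be paired and non-empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: list, b: list) -> float:
    """Cohen's kappa: observed agreement corrected for chance (marginal) agreement."""
    n = len(a)
    p_o = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / n ** 2
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")


def pabak(a: list, b: list, categories: int = 2) -> float:
    """Prevalence- and bias-adjusted kappa: (k * p_o - 1) / (k - 1)."""
    return (categories * observed_agreement(a, b) - 1) / (categories - 1)


def agreement_band(value: float) -> str:
    """Map a PABAK value to the verbal scale used in this study."""
    for upper, label in ((0.0, "no"), (0.20, "none to slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if value <= upper:
            return label
    return "excellent"


# Hypothetical binary osteophyte ratings (1 = present) for 20 hips: raters agree on 18.
rater_1 = [1] * 18 + [1, 0]
rater_2 = [1] * 18 + [0, 1]
print(round(cohen_kappa(rater_1, rater_2), 3), round(pabak(rater_1, rater_2), 3))
# -0.053 0.8
print(agreement_band(0.538), agreement_band(0.815))
# moderate excellent
```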
Our study has several limitations. First, the high prevalence of OA findings introduces bias, reflected mainly in the very low specificity of the US evaluation, since almost every patient had a positive finding. Second, the time from the US assessment to the THA operation varied slightly, and the amount of effusion could have changed in the meantime. Third, the cartilage of the femoral head was not assessed on US, so it could not be correlated with joint-space narrowing on CR; however, the US view of the femoral head cartilage is presumably insufficient for diagnosis. Fourth, the high BMI of a few patients limited the diagnostic US window, which may have led to misinterpretation of US findings. Fifth, the relatively large number of operating orthopedic surgeons may have introduced variation in the documentation of THA findings; therefore, the grading was kept as simple as possible. Additionally, the surgeons were not blinded to the CR images, which may have biased the classification of the intraoperative findings.
In conclusion, in severe hip OA, US detects osteophytes as well as CR does. For femoral head deformity, US performs better than CR. The inter-rater reliability of the US evaluation varies from moderate to excellent, and no association between US findings and the Oxford Hip Score was observed.

Methods
Patients. Sixty-six patients scheduled for THA for late-stage OA of the hip were enrolled consecutively between November 2017 and March 2018. The mean age was 67.6 years (range 50 to 88 years), and 38% were male. Two patients underwent bilateral THA; thus, 68 hips were initially included in this study. One patient did not report the OHS, and intraoperative findings were available for 48 hips. Written informed consent was obtained from every patient. This prospective diagnostic study (level of evidence: III) was carried out in accordance with the Declaration of Helsinki and approved by the Ethical Committee of Northern Finland Health Care District, Oulu University Hospital (number 106/2017).
Ultrasonography. US imaging was performed using a clinical US device (LOGIQ S7, GE Healthcare, Milwaukee, WI, USA) with a 15 MHz linear transducer (ML6-15). If the deep location of the hip joint hindered visibility, a 5 MHz convex transducer (C1-5-D) was used to depict the anatomy. B-mode imaging settings were kept constant for each subject, and the focus was set at the level of the region of interest. Hip US was performed by a single radiologist with four years of experience. A second ultrasonographic assessment was performed on 65 hips by three independent sonographers, all with more than five years of experience, to evaluate inter-reader reliability. All observers were blinded to the clinical and radiographic findings.
The hip was scanned with the patient supine and the toes slightly turned outwards (eversion), in the anterior longitudinal plane parallel to the femoral neck, to assess effusion, osteophytes and the appearance of the femoral head. The probe was moved from medial to lateral to obtain the optimal image. Doppler imaging yielded no signal in the first 20 patients, so it was not used further in this study. Each observer graded the US findings according to Qvistgaard et al. 10 . In the literature, the threshold for hip effusion varies between 7 and 9 mm [13][14][15][16] ; thus, a fluid or capsular bulge thickness of at least 8 mm at the collum/caput interface was defined as effusion in our study. The presence and size of osteophytes were evaluated anteriorly at the femoral collum and the anterior acetabulum as follows: Grade 0 = no osteophyte, Grade 1 = small osteophyte, Grade 2 = large osteophyte. The contour of the femoral head was classified as Grade 0 = normal, Grade 1 = slightly flattened, or Grade 2 = clearly deformed.
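The grading rules above can be combined with the dichotomization described under Statistical analysis (Grade 0 versus Grades 1-2, and effusion at ≥ 8 mm) to give a small scoring function. This is a sketch under assumptions. The record layout and names are hypothetical. The text also does not list the components of the 0-4 US sum score, so counting the four dichotomized features assessed here is an inference.

```python
from dataclasses import dataclass

EFFUSION_THRESHOLD_MM = 8.0  # capsular bulge at the collum/caput interface


@dataclass(frozen=True)
class HipUltrasound:
    """One observer's grading of one hip (hypothetical record layout)."""
    collum_osteophyte: int      # 0 = none, 1 = small, 2 = large
    acetabular_osteophyte: int  # 0 = none, 1 = small, 2 = large
    head_contour: int           # 0 = normal, 1 = slightly flattened, 2 = clearly deformed
    effusion_mm: float          # measured fluid / capsular bulge thickness

    def __post_init__(self) -> None:
        for grade in (self.collum_osteophyte, self.acetabular_osteophyte, self.head_contour):
            if grade not in (0, 1, 2):
                raise ValueError(f"grade must be 0, 1 or 2, got {grade}")


def dichotomize(hip: HipUltrasound) -> dict:
    """Grade 0 -> negative, Grades 1-2 -> positive; effusion positive at >= 8 mm."""
    return {
        "collum_osteophyte": hip.collum_osteophyte >= 1,
        "acetabular_osteophyte": hip.acetabular_osteophyte >= 1,
        "head_deformity": hip.head_contour >= 1,
        "effusion": hip.effusion_mm >= EFFUSION_THRESHOLD_MM,
    }


def us_sum_score(hip: HipUltrasound) -> int:
    """Number of positive dichotomized findings (0-4); the composition is assumed."""
    return sum(dichotomize(hip).values())


hip = HipUltrasound(collum_osteophyte=2, acetabular_osteophyte=1, head_contour=0, effusion_mm=9.5)
print(dichotomize(hip), us_sum_score(hip))
```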
Radiography. One to 95 weeks (mean 20 weeks) before the US study, the patients underwent standard anteroposterior hip CR with an additional cross-table lateral view of the symptomatic side. For the anteroposterior view, the patient was supine with the lower extremities internally rotated by 15°-20°, and the image was centered on the upper part of the symphysis pubis. For the cross-table lateral view, the patient was supine with the hip in 15°-20° internal rotation and the contralateral hip and knee flexed to 90° to exclude the unaffected lower extremity from the image. The projection was directed toward the groin region at 45° of incidence parallel to the longitudinal axis of the femur. The radiographs were assessed by one radiologist with five years of experience for osteophytes, joint-space narrowing, appearance of the femoral caput and Kellgren-Lawrence (KL) grade. Osteophytes were graded at the femoral collum and the superior anterolateral acetabulum as follows: Grade 0 = no osteophyte, Grade 1 = marginal/small osteophyte, Grade 2 = definite osteophyte. Joint space was classified as either normal or narrowed. The contour of the femoral head was classified as Grade 0 = normal, Grade 1 = slightly irregular, or Grade 2 = clearly deformed. Finally, an overall KL grade was assigned to the hip joint. The reader was blinded to the clinical and US findings.
Total hip arthroplasty findings. THA was performed 3 to 39 days (mean 15 days) after the US evaluation by eight orthopedic surgeons, each with at least 10 years of THA experience. The surgeons were blinded to the US findings but not to the clinical history or the CR findings. A routine posterior approach was used, and the surgical findings were recorded as follows: effusion (no, yes), anterior osteophytes on the femoral collum (no, yes) and on the acetabulum (no, yes), and deformity of the femoral caput (no, yes). The intraoperative grading was kept simple because several different surgeons performed the THAs.

Statistical analysis.
Owing to the relatively small sample size and the distributions of the US findings, cut-offs were applied to dichotomize certain variables. Both US-detected and CR-detected osteophytes were categorized as non-significant (Grade 0) or significant (Grades 1 and 2); similarly, femoral head deformity was graded as non-significant (Grade 0) or significant (Grades 1 and 2) on both imaging modalities. After dichotomization, a US sum score ranging from zero to four was created. US and CR findings are given as numbers of true-positive and true-negative findings relative to the intraoperative findings. Sensitivity, specificity, accuracy, positive predictive value and negative predictive value were calculated for each finding, with 95% confidence intervals. Confidence intervals for sensitivity, specificity and accuracy were calculated using the Clopper-Pearson method; for predictive values, standard logit confidence intervals were applied 17 . The sensitivities of US and CR were compared within the intraoperatively positive cases using McNemar's test. The associations of US and CR findings with the OHS were evaluated using the Mann-Whitney and Kruskal-Wallis tests. For correlation analyses, Spearman's rho was calculated. A P-value < 0.05 was considered statistically significant. Inter-reader agreement was evaluated with the prevalence-adjusted and bias-adjusted kappa (PABAK), interpreted as follows: values ≤ 0 indicated no agreement, 0.01-0.20 none to slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1.00 excellent agreement. SPSS 24.0 and R 3.5.2 were used for data analysis.
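The two interval methods named above can be sketched without external dependencies. The Clopper-Pearson (exact) interval inverts the binomial distribution, solved here by bisection. The logit interval back-transforms logit(p̂) ± z·sqrt(1/x + 1/(n − x)). This sketch assumes a predictive value can be treated as a simple binomial proportion (for example PPV = TP/(TP + FP)); reference 17 may use a different variance formula. The function names and example counts are hypothetical. In practice, validated software such as the R installation used in this study should be preferred.

```python
from math import comb, exp, log, sqrt
from statistics import NormalDist


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f, target: float) -> float:
    """Find p in [0, 1] where the increasing function f(p) equals target."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact two-sided CI for a proportion x/n (e.g. sensitivity = TP / (TP + FN))."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    # Lower bound solves P(X >= x | p) = alpha / 2; that tail increases with p.
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha / 2, i.e. 1 - cdf = 1 - alpha / 2.
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def logit_ci(x: int, n: int, alpha: float = 0.05) -> tuple:
    """Standard logit CI for x/n (e.g. PPV = TP / (TP + FP)); undefined at x = 0 or x = n."""
    if not 0 < x < n:
        raise ValueError("logit interval needs 0 < x < n")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = log(x / (n - x))  # logit of the point estimate x/n
    half_width = z * sqrt(1 / x + 1 / (n - x))

    def expit(t: float) -> float:
        return 1 / (1 + exp(-t))  # inverse logit

    return expit(centre - half_width), expit(centre + half_width)


# Hypothetical counts: 40 of 42 THA-positive hips detected; 40 of 46 US-positive hips confirmed.
print(clopper_pearson(40, 42))
print(logit_ci(40, 46))
```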

Data availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.