Leg length measures appear inaccurate in the early phase following total hip arthroplasty

The aims of this study were to (1) assess the reliability of leg length discrepancy (LLD) measurements at different anatomical landmarks, (2) longitudinally investigate LLD within the first year following total hip arthroplasty (THA), and (3) correlate changes in LLD with functional outcome. Ninety-nine patients with short-stem THA (53.3% male, mean age: 61.0 ± 8.1 years) were prospectively included. Upright anteroposterior (a.p.) pelvic radiographs taken at 6 timepoints (preoperatively, at discharge, and 6, 12, 24, and 52 weeks postoperatively) were used to assess LLD at 5 anatomical landmarks (iliac crest, upper sacroiliac joint, lower sacroiliac joint, tear drop figure, greater trochanter). WOMAC and Harris Hip Score (HHS) were obtained preoperatively and at 6 and 52 weeks. LLD measures increased significantly in the initial phase following THA, from discharge to 6 weeks postoperatively, and remained constant thereafter. Documented LLDs depend on the measurement site: LLDs measured at the trochanter and at the iliac crest differed significantly from those measured at the tear drop figure (p < 0.001). Functional assessments did not correlate with the occurrence of LLDs [WOMAC (p = 0.252); HHS (p = 0.798)]. Radiographic assessment of LLD following THA should not be performed early postoperatively, as measurements at this time appear to reflect actual LLDs inaccurately, potentially due to incomplete leg extension and/or inhibited weight-bearing.


Material and methods
One hundred patients undergoing primary THA with a standardised press-fit cementless short-stem hip system (ANA NOVA Proxy® Stem and Alpha® Cup, ImplanTec GmbH, Mödling, Austria) and ceramic bearings were consecutively enrolled in this observational study. All surgeries and the corresponding preoperative digital templating were performed by a single experienced surgeon at our institution between February 2016 and March 2017. Ninety-nine patients were eligible for final analysis, including 3 patients with concurrent bilateral THA (6.1%), 2 patients with metachronous bilateral THA (4.0%) and 89 patients with unilateral THA (89.9%). One patient was excluded as a statistical outlier because of a pre-existing leg length difference of more than 5.0 cm, caused by a functional soft tissue contracture that could be corrected during surgery.
Operations were performed through a minimally invasive anterolateral approach in the supine position after preoperative digital templating with mediCAD 2D (Hectec GmbH, Altdorf bei Landshut, Germany; Version 5.5, since updated to Version 6.0; see https://www.medicad.eu/en/medicad/medicad-classic). A single preoperative dose of antibiotic prophylaxis was given. Patients were allowed full weight-bearing from the first postoperative day. Crutches were prescribed for 6 postoperative weeks.
The study was reviewed and approved by our local institutional review board (EK-Nr. 28-152 ex 15/16). All patients provided informed consent prior to their participation.
Leg length discrepancy measures. Preoperatively, LLD was measured on an anteroposterior pelvic radiograph taken in an upright standing position (neutral abduction and flexion, 15° internal rotation; X-ray tube-to-film distance of 150 cm, beam perpendicular to the film). Likewise, postoperative anteroposterior pelvic radiographs were obtained in the same standardised standing position at discharge (after 4 to 7 days) and after 6, 12, 24, and 52 weeks. The true leg length discrepancy (LLD) between the (to-be) operated and the contralateral leg was calculated as the difference between both sides in the vertical distance from a horizontal line to each of five anatomical landmarks: the acetabular tear drop figure, the most inferior portion of the sacroiliac joint, the most superior portion of the sacroiliac joint, the most superior portion of the iliac crest, and the most cranial point of the greater trochanter (Fig. 1). Digital image viewing and measurements were performed in mediCAD 2D (Hectec GmbH) by two independent investigators after calibrating the images either to a standardised 25 mm-diameter calibration ball or to the femoral head of the operated hip (32 to 36 mm in diameter). Neither investigator was the operating surgeon.
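The landmark-based LLD described above can be illustrated with a short Python sketch. It is not the mediCAD 2D implementation. It assumes hypothetical pixel coordinates for each landmark pair, image rows that increase downwards, a horizontal reference line given as a pixel row, and a sign convention in which a positive value means the landmark on the operated side lies higher. The names `Point`, `mm_per_pixel`, and `landmark_lld_mm` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """A landmark position in image pixels (row index y grows downwards)."""
    x: float
    y: float


def mm_per_pixel(measured_diameter_px: float, true_diameter_mm: float) -> float:
    """Scale factor from a calibration object of known diameter,
    e.g. a 25 mm calibration ball or a femoral head of known size."""
    if measured_diameter_px <= 0:
        raise ValueError("measured diameter must be positive")
    return true_diameter_mm / measured_diameter_px


def landmark_lld_mm(operated: Point, contralateral: Point,
                    reference_row_px: float, scale_mm_per_px: float) -> float:
    """Signed LLD at one bilateral landmark (hypothetical sign convention:
    positive = landmark on the operated side lies higher)."""
    # Vertical distance of each landmark above the horizontal reference line.
    height_operated = reference_row_px - operated.y
    height_contralateral = reference_row_px - contralateral.y
    # Only the side-to-side difference matters, so the reference row cancels
    # out as long as the line is truly horizontal.
    return (height_operated - height_contralateral) * scale_mm_per_px


# Usage with made-up coordinates: a 25 mm ball measured at 50 px gives 0.5 mm/px.
scale = mm_per_pixel(50.0, 25.0)
landmarks = {
    "tear_drop": (Point(420.0, 610.0), Point(780.0, 616.0)),
    "iliac_crest": (Point(300.0, 180.0), Point(900.0, 190.0)),
}
for name, (op, contra) in landmarks.items():
    print(name, landmark_lld_mm(op, contra, reference_row_px=1000.0,
                                scale_mm_per_px=scale))
```

Keeping the signed value preserves the direction of the discrepancy; the statistical models in this study used the magnitude only.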
Assessment and evaluation of the clinical outcome. Hip-related pain and function were assessed with the clinician-based Harris Hip Score (HHS) preoperatively and at 6 weeks and 12 months postoperatively. The HHS is a frequently used, valid, and reliable clinician-based instrument [9][10][11][12] for assessing hip-related symptoms and function after THA [13][14][15][16]. It ranges from 0 to 100, with higher scores reflecting better function and outcome.
The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) was used to assess patient-reported pain, stiffness, and physical function. The questionnaire is a widely used, reliable, and validated instrument with high sensitivity to changes in the health status of patients with hip and/or knee osteoarthritis or joint replacement [17][18][19]. It comprises 24 items covering 3 dimensions: pain (5 items), stiffness (2 items), and function (17 items). Each item is answered on an ordinal 11-point Likert scale, and higher scores indicate worse pain, stiffness, and functional impairment. We administered a validated German version of the WOMAC 20.
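As a minimal sketch of how WOMAC responses could be aggregated, the Python function below assumes several things the source does not state: each item is scored 0 to 10 on the 11-point scale, subscale scores are simple item sums (pain 0 to 50, stiffness 0 to 20, function 0 to 170, total 0 to 240), and an optional 0 to 100 rescaling of the total is wanted. The function name `score_womac` is hypothetical.

```python
# Number of items per WOMAC subscale (24 items in total).
WOMAC_ITEMS = {"pain": 5, "stiffness": 2, "function": 17}
MAX_ITEM_SCORE = 10  # assumed 0-10 scoring of the 11-point scale


def score_womac(responses: dict) -> dict:
    """Sum item responses per subscale; higher = worse (assumed aggregation)."""
    scores = {}
    for subscale, n_items in WOMAC_ITEMS.items():
        items = responses[subscale]
        if len(items) != n_items:
            raise ValueError(f"{subscale}: expected {n_items} items, got {len(items)}")
        if any(not 0 <= value <= MAX_ITEM_SCORE for value in items):
            raise ValueError(f"{subscale}: item scores must lie in 0..{MAX_ITEM_SCORE}")
        scores[subscale] = sum(items)
    total = sum(scores.values())  # computed before the "total" key is added
    n_total_items = sum(WOMAC_ITEMS.values())
    scores["total"] = total
    # Optional rescaling of the total to 0-100 (0 = no symptoms).
    scores["total_0_100"] = 100.0 * total / (MAX_ITEM_SCORE * n_total_items)
    return scores


example = {"pain": [2] * 5, "stiffness": [5, 5], "function": [1] * 17}
print(score_womac(example))
```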
Orthopaedic arch supports used by patients for actual or perceived LLDs following surgery were documented. All methods were carried out in accordance with relevant guidelines and regulations. Ethical approval was obtained from the institutional review board.
www.nature.com/scientificreports/
Statistical methods. Statistical analyses were performed using Stata/MP 13.0 (StataCorp, College Station, Texas). Parametric tests were chosen over non-parametric tests, as previous simulation studies have shown that samples of around 100 are "sufficiently large" for parametric tests regardless of the normality assumption 21. Patient characteristics are presented with descriptive statistics (proportions, means [standard deviations], or medians [interquartile ranges, IQR]) as appropriate. Chi-squared tests were used to assess group differences in binary and categorical variables. (Paired) t-tests or Wilcoxon rank-sum tests were used for normally and non-normally distributed continuous variables, respectively. A p value < 0.05 was considered statistically significant. Intraclass correlation coefficients (ICCs) for absolute agreement, based on a two-way random-effects model, were used to assess the interrater reliability of LLD measurements at the various anatomical landmarks. Agreement between the two observers was interpreted as follows: values below 0.5 indicate poor, values between 0.5 and 0.75 moderate, values between 0.75 and 0.9 good, and values above 0.9 excellent reliability 22.
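To make the reliability analysis concrete, the following Python sketch computes a two-way random-effects, absolute-agreement ICC from the ANOVA mean squares and applies the interpretation thresholds above. It is an illustrative re-implementation, not the Stata routine used in the study. The single-measure form (often written ICC(2,1) or ICC(A,1)) is an assumption, because the source does not say whether single or average measures were reported. How values lying exactly at 0.5, 0.75, or 0.9 are categorised is also an assumption.

```python
from statistics import fmean
from typing import Sequence


def icc_absolute_agreement(ratings: Sequence[Sequence[float]]) -> float:
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    `ratings` holds one row per subject (e.g. one radiograph and landmark)
    and one column per rater.
    """
    n = len(ratings)  # subjects
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(ratings[0])  # raters
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with k >= 2")
    grand = fmean(v for row in ratings for v in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # The rater term (ms_cols) enters the denominator, so systematic offsets
    # between raters lower absolute agreement.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def interpret_icc(icc: float) -> str:
    """Map an ICC to the reliability categories used in the text."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical LLD readings (mm) by two observers; observer 2 reads 1 mm higher.
readings = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
icc = icc_absolute_agreement(readings)
print(round(icc, 3), interpret_icc(icc))
```

In this toy example the two observers rank the readings identically, yet the constant 1 mm offset lowers absolute agreement to about 0.77. A consistency-type ICC would ignore that offset, which is why the absolute-agreement form suits landmark comparisons where systematic bias matters.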
A multilevel linear mixed-effects model for repeated measurements, with potentially correlated random LLD intercepts and slopes nested within anatomical landmarks and fitted by restricted maximum likelihood (REML), was constructed to study the variation in LLD measurements over time (with all LLDs converted to positive values). Because LLD measurements at the tear drop figure in particular are recommended as a reference on a.p. pelvic radiographs 23, they served as the reference category against which the other landmark-based measurements were compared in the statistical models. Scientifically relevant demographic variables (age, gender, and BMI) were added to the model to determine their potential effects on the outcome. Quadratic terms were added to test whether effects are better described by a linear or a quadratic relationship. Coefficients (b), corresponding 95% confidence intervals, standard errors (SEs), z-values, and p values are reported. Furthermore, multilevel linear mixed-effects models for repeated measurements with REML were used to investigate longitudinal changes in clinical outcomes (i.e. HHS and WOMAC). Relevant demographic data were added to improve the models' predictive ability. LLD measures (at the iliac crest) were converted to positive values and added to the model to assess a potential independent effect on functional outcomes. An interaction term between LLD and elapsed time was added to investigate whether the effect of LLD is time-dependent.
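The fixed-effect structure described above can be sketched as design rows in Python. This covers data preparation only, under stated assumptions, and is not the Stata model itself. The landmark labels are hypothetical, time is assumed to be coded in weeks since surgery, and the tear drop figure is the omitted (reference) category of a dummy-coded landmark factor. The iliac-crest LLD enters the outcome model together with its interaction with time. Random intercepts and slopes would then be estimated by a mixed-model routine, for example Stata's `mixed` command with its `reml` option, which is not reproduced here.

```python
LANDMARKS = ("tear_drop", "lower_si", "upper_si", "iliac_crest", "trochanter")
REFERENCE = "tear_drop"  # omitted category: other landmarks are contrasted with it


def lld_model_row(landmark: str, weeks: float, lld_mm: float,
                  age: float, male: bool, bmi: float) -> dict:
    """Outcome and fixed-effect covariates for the LLD-over-time model."""
    if landmark not in LANDMARKS:
        raise ValueError(f"unknown landmark: {landmark}")
    row = {
        "abs_lld": abs(lld_mm),   # all LLDs turned into positive values
        "weeks": weeks,
        "weeks_sq": weeks ** 2,   # quadratic term to test for curvature over time
        "age": age,
        "male": 1.0 if male else 0.0,
        "bmi": bmi,
    }
    # Dummy coding: one indicator per non-reference landmark.
    for name in LANDMARKS:
        if name != REFERENCE:
            row[f"lm_{name}"] = 1.0 if name == landmark else 0.0
    return row


def outcome_model_row(weeks: float, iliac_crest_lld_mm: float,
                      age: float, male: bool, bmi: float) -> dict:
    """Covariates for the HHS/WOMAC models, including the LLD x time interaction."""
    abs_lld = abs(iliac_crest_lld_mm)
    return {
        "weeks": weeks,
        "abs_lld": abs_lld,
        "lld_x_weeks": abs_lld * weeks,  # does the LLD effect change over time?
        "age": age,
        "male": 1.0 if male else 0.0,
        "bmi": bmi,
    }


print(lld_model_row("iliac_crest", 6, -4.5, 61, True, 29.0))
print(outcome_model_row(52, -3.0, 61, False, 29.0))
```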
There are currently no generally accepted standards for sample size calculation in linear mixed-effects models. However, it is accepted practice to recast random effects into simple plot models and to base calculations on multivariate linear models in which the random effects are modelled as multiple response values 24. By this analogy, a sample of 85 participants provides 80% statistical power, at an alpha level of 0.05, for a regression model with four predictors that accounts for 15% or more of the variability in the outcome.
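For orientation, the quantities behind such a fixed-model regression power calculation are summarised below. The summary assumes the common convention, implemented for example in G*Power, in which power is obtained from a noncentral F distribution with noncentrality λ = f²N; the source does not state which software or convention was used. Cohen's effect size f² relates to the explained variance R² as shown, so R² = 0.15 and f² = 0.15 are different effect sizes.

```latex
\begin{aligned}
f^2 &= \frac{R^2}{1 - R^2}, \qquad
R^2 = 0.15 \Rightarrow f^2 = \tfrac{0.15}{0.85} \approx 0.176, \qquad
f^2 = 0.15 \Rightarrow R^2 = \tfrac{0.15}{1.15} \approx 0.130 \\
\lambda &= f^2 N, \qquad u = 4, \qquad v = N - u - 1 \\
1 - \beta &= \Pr\left[ F'(u, v, \lambda) > F_{1-\alpha}(u, v) \right], \qquad \alpha = 0.05
\end{aligned}
```

Under this convention, f² = 0.15 (a "medium" effect in Cohen's terms) with four predictors reaches approximately 80% power at N = 85.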

Ethics approval and consent to participate.
The present study has been approved by the Institutional Review Board of the Medical University of Graz (IRB-No. 28-152 ex 15/16). All patients gave their written informed consent prior to being included in this study.

Consent for publication.
As no patient-identifying information is made public, no specific consent with regard to this publication was obtained from patients.

Results
General. Table 1 shows the demographic distribution of the study population (N = 99; mean age: 61 years; 46% female, 54% male; mean BMI: 29). Indications for surgery were osteoarthritis in 95 cases (96%) and avascular necrosis of the femoral head in 4 (4%). All patients showed substantial clinical impairment in the preoperative assessments, with a median HHS of 50 (IQR: 42-60). Preoperative LLD measurements were normally distributed at all measurement points and ranged from −16.5 mm to 22 mm across the various hip and pelvic landmarks (Tables 1 and 2). Mean LLD measures and standard errors of the study population over time are shown in Fig. 2. LLD measures at the tear drop figure and at the upper and lower sacroiliac joints were nearly congruent, whereas LLD measures at the iliac crest were distinctly larger over the entire observation period. Notably, LLD increased across the entire sample in the initial postoperative phase between week 1 and week 6 and reached a steady plateau thereafter. Because this increase between weeks 1 and 6 was also seen in the measurements at the trochanter (which, in the upright standing position, can serve as a reference mark, since THA does not affect limb length distal to the hip joint), it may be inferred that patients were not standing upright with both legs fully extended at the early examinations, possibly resulting in hip flexion or pelvic torsion. By week 6, the trochanter reference measures had reached a balanced LLD, indicating that a standard upright stance and balanced weight-bearing had been regained. The influence of demographics on postoperative leg length differences over time was investigated with a linear mixed-effects model (Tables 3 and 4).
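The summary shown in Fig. 2 (mean LLD and its standard error per landmark and timepoint) can be reproduced with a short Python sketch. The record layout `(landmark, week, lld_mm)`, the sample values, and the use of absolute LLD values (mirroring the mixed model) are assumptions. The standard error is taken as the sample standard deviation divided by √n.

```python
from collections import defaultdict
from math import sqrt
from statistics import fmean, stdev


def mean_and_se(records):
    """Group (landmark, week, lld_mm) records; return {(landmark, week): (mean, se)}."""
    groups = defaultdict(list)
    for landmark, week, lld_mm in records:
        groups[(landmark, week)].append(abs(lld_mm))  # assumed: magnitude only
    summary = {}
    for key, values in groups.items():
        if len(values) < 2:
            raise ValueError(f"need at least two values for {key}")
        summary[key] = (fmean(values), stdev(values) / sqrt(len(values)))
    return summary


# Hypothetical trochanter readings at discharge (week 1) and week 6.
example = [("trochanter", 1, 2.0), ("trochanter", 1, -4.0), ("trochanter", 1, 6.0),
           ("trochanter", 6, 5.0), ("trochanter", 6, 7.0)]
for (landmark, week), (m, se) in sorted(mean_and_se(example).items()):
    print(landmark, week, round(m, 2), round(se, 2))
```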
HHS improved over time, both from preoperatively to 12 weeks postoperatively and from there to 12 months postoperatively (Table 4). Furthermore, male gender was significantly associated with higher HHS over time (b = 5.06 points, SE: 1.56; p = 0.001; Table 4). On the other hand, the use of orthopaedic arch supports for perceived LLD was significantly associated with poorer functional outcome (Table 5).

Discussion
In the current study, LLDs increased at all 5 anatomical landmarks during the initial postoperative phase and did not change significantly thereafter. The best interobserver agreement was found for the trochanter, followed by the iliac crest and the upper sacroiliac joint. Moreover, compared with the tear drop figure, LLDs assessed at the iliac crest and at the trochanter differed on average by 1.8 mm and 2.8 mm, respectively. It is therefore important to report the landmark used for LLD assessment. Harris Hip Score and WOMAC score improved significantly over time, whilst larger LLDs did not appear to affect functional outcomes negatively. Limitations of the study include the relatively small sample of 99 patients. Results that were close to, but did not reach, statistical significance might therefore have become more or less significant had further patients been included. On the other hand, the results were based on over 2300 repeated measurements taken at distinct time points, which strengthens the consistency of the longitudinal estimates. Notably, LLD was measured exclusively on radiographs, without additional direct clinical methods; it therefore cannot be ruled out that length differences in the femoral shaft, knee, lower leg, or ankle, as well as pelvic tilt itself, may account for the LLDs observed. Although the advantages and favourable precision of other methods for assessing LLD, such as computed tomography, magnetic resonance imaging (MRI), or X-ray, are well known 5,[25][26][27][28][29], these methods may not be routinely applicable in patients following THA because of high costs and radiation exposure, as well as metal artefacts in the case of MRI-based techniques 5,[25][26][27][28].
Thus, considering the high interobserver agreement for all anatomical landmarks used, as well as the uniform and significant change in LLDs over time in each patient, we consider that the chosen measurement allows reproducible and reliable objectification of LLD. Another limitation of the study is that, although a plateau in LLD measurements was observed from the 6th postoperative week onwards, it cannot be concluded with certainty from which postoperative timepoint reliable LLD measurements can be obtained, as no measurements were performed between discharge and the 6th postoperative week. As previously described and as frequently applied in clinical practice, we used plain a.p. pelvic X-rays to assess LLDs following THA [30][31][32][33]. The tear drop figure in particular has been recommended as a reference point for assessing LLD on a.p. pelvic radiographs, as its configuration may not be significantly affected by pelvic rotation 23. In addition to the tear drop figure, we used the upper and lower sacroiliac joint, the trochanter, and the iliac crest to assess LLD. In line with previous findings that the lower sacroiliac joint and especially the tear drop figure may be difficult to identify following THA 34, the best interobserver agreement was found for the trochanter, iliac crest, and upper sacroiliac joint. Notably, LLDs varied depending on the anatomical landmark used, with the largest differences between the trochanter and the tear drop figure and between the tear drop figure and the iliac crest.
Longitudinal X-ray-based measurements revealed significantly smaller LLDs at discharge than at 6 weeks, after which LLDs no longer changed significantly. Of note, owing to the lack of measurements between the early postoperative period and the 6th postoperative week, the exact time point from which reliable LLD measurements can be obtained is difficult to define. Whilst LLD measurements at the upper and lower sacroiliac joint and at the tear drop figure changed comparably over time, those at the trochanter and the iliac crest differed markedly. For the trochanter, which may serve as the reference landmark because THA does not affect leg length distal to the hip joint, stable measurements were obtained from the 6th postoperative week onwards, implying that early postoperatively patients did not fully extend and/or fully weight-bear the affected leg during radiographic examination. At the iliac crest, on the other hand, LLD measurements on a.p. radiographs are significantly affected by pelvic torsion 35,36, which could explain the marked difference from LLD measurements at the lower and upper sacroiliac joint and the tear drop figure.
The observation that LLD measurements on a.p. pelvic radiographs increase from discharge to the 6th postoperative week seemingly contradicts the fact that femoral component subsidence, which would lead to a decrease in LLD, may also occur during this period 37. However, subsidence, if it occurs, is usually small, with reported migration of 0.5 mm 38 to 0.96 mm 39 during the first 6 to 12 postoperative weeks. Thus, postural changes producing an apparent increase in LLD during the early postoperative period may outweigh simultaneously occurring migration of the femoral component.
Considering that LLDs are found in up to 40% of patients following THA and may even lead to litigation, we further investigated whether LLD influences clinical outcome scores 30,40,41. Similar to previous studies, HHS and WOMAC improved from preoperatively to the third postoperative month and continued to improve thereafter 42,43. Notably, there was no clear correlation between change in LLD and HHS or WOMAC over time. On the other hand, male gender was associated with a significant improvement in HHS, whereas this was not the case for WOMAC. Strikingly, the use of orthopaedic arch supports for patient-perceived LLD was significantly associated with poorer functional outcome, as reflected by lower HHS and higher WOMAC scores at final follow-up. Similar observations have been made by Mavcic et al. 44 and Sykes et al. 45, with patients reporting subjectively perceived LLD presenting with worse clinical outcome scores following THA.
In conclusion, radiographic assessment of LLD following THA should not be performed early postoperatively, as measurements at this time appear to reflect actual LLDs inaccurately. However, from the 6th postoperative week onwards at the latest, stable and reproducible LLD measurements can be expected from plain a.p. pelvic radiographs, preferably using the upper or lower sacroiliac joint or the tear drop figure as reference landmarks.

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.