Automated analysis of knee joint alignment using detailed angular values in long leg radiographs based on deep learning

Malalignment of the lower limb occurs for various reasons, and accurate evaluation of limb alignment is necessary when malalignment needs correction. We aimed to create an automated support system to evaluate lower limb alignment by quantifying the mechanical tibiofemoral angle (mTFA), mechanical lateral distal femoral angle (mLDFA), medial proximal tibial angle (MPTA), and joint line convergence angle (JLCA) on full-length weight-bearing radiographs of both lower extremities. In this retrospective study, we analysed 404 radiographs from one hospital for algorithm development and testing and 30 radiographs from another hospital for external validation. The performance of the segmentation algorithm was compared with that of manual segmentation using the Dice similarity coefficient (DSC). Agreement on the alignment parameters was assessed using the intraclass correlation coefficient (ICC) in internal and external validation. The time taken to load the data and measure the four alignment parameters was recorded. The segmentation algorithm demonstrated excellent agreement with human-annotated segmentation for all anatomical regions (average DSC: 89–97%). Internal validation yielded good to very good agreement for all alignment parameters (ICC range: 0.7213–0.9865). Interobserver agreement between manual and automatic measurements in external validation was good to very good (ICC range: 0.7126–0.9695). The computer-aided measurement was 3.44 times faster than manual measurement. Our deep learning-based automated measurement algorithm accurately quantified lower limb alignment from radiographs and was faster than manual measurement.

www.nature.com/scientificreports/
caused by distal femoral wear (mLDFA = 89°), tibial varus obliquity (MPTA = 87°), and lateral joint line opening (JLCA = 3°) 7. However, abnormalities in individual parameters can compensate for one another and produce an apparently balanced limb position. Therefore, measuring each parameter is vital for understanding alignment abnormalities and identifying their primary cause 7. Such measurement, however, can be a laborious and time-consuming task for radiologists.
Therefore, there is a clinical need for a standardised and reproducible automatic analysis tool that measures lower limb alignment on full-length weight-bearing radiographs 8,9. Moreover, an artificial intelligence-based technical framework applicable in clinical settings may be feasible 9. Our objective was to create, train, and validate an automated support system to evaluate lower limb alignment by quantifying mTFA, mLDFA, MPTA, and JLCA on full-length weight-bearing radiographs of both lower extremities (Fig. 1).

Study participants and radiograph data
This retrospective study was approved by the institutional review boards of a tertiary hospital (A) (Yonsei University Gangnam Severance Hospital, Institutional Review Board, No 3-2020-0127) and a military hospital (B) (Armed Forces Capital Hospital, Institutional Review Board, 2023-02-002), and the requirement for informed consent was waived because the data used in this retrospective study were fully de-identified to protect patient confidentiality. All methods were performed in accordance with the ethical standards of the Declaration of Helsinki. A total of 404 full-length weight-bearing radiographs of both lower extremities from 404 patients (mean age: 44.3 years, 188 men, 186 women) from hospital A were used to develop and test the algorithm. An external test set of 30 consecutive radiographs from 30 men (mean age: 30.2 years) from hospital B was included. The patients underwent long-leg radiography at the two institutions between March 2015 and January 2019 and between August 2022 and September 2022, respectively. Patients from hospital A with Kellgren-Lawrence (K-L) grade 4, intra-articular fracture, deformity due to previous trauma, or knee arthroplasty, and those younger than 19 years were excluded (n = 426) (Fig. 2). The long-leg radiographs were obtained using two imaging acquisition systems and covered the entire lower limbs from the hips to the ankles in a single anteroposterior exposure. Philips DigitalDiagnost (Philips, Best, The Netherlands) and Carestream DRX-Evolution (Carestream Health, Rochester, NY, USA) were used in hospitals A and B, respectively.
Next, 30 of the 404 radiographs, chosen by stratified random splitting based on the K-L grade, were used for clinical verification of the algorithm's anatomical feature points. The remaining 374 radiographs were used to develop and validate the automatic segmentation algorithm. Cases with overlapping bones (n = 12), bones containing metal (n = 33), and unclear bone outlines (n = 32) were excluded to ensure methodological consistency 10. For algorithm development, 342 radiographs were used for the femoral head, 352 for the distal femur, 341 for the proximal tibia, 362 for the distal tibia, and 367 for the talus. The collected radiographs were divided into training (80%), validation (10%), and test (10%) sets (Fig. 2).
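The case allocation described above can be sketched as follows. This is a minimal illustration only, assuming each case is a `(case_id, kl_grade)` pair, a fixed random seed, proportional per-grade sampling with largest-remainder rounding, and integer truncation for the 80/10/10 cut; the function names `stratified_holdout` and `split_80_10_10` are hypothetical, and the study does not report its exact stratification or rounding rules.

```python
import random
from collections import defaultdict


def stratified_holdout(cases, n_holdout, seed=0):
    """Randomly hold out n_holdout cases while keeping each K-L grade's share.

    cases: list of (case_id, kl_grade) tuples (hypothetical input format).
    Returns (holdout_ids, remaining_ids).
    """
    rng = random.Random(seed)
    by_grade = defaultdict(list)
    for case_id, grade in cases:
        by_grade[grade].append(case_id)
    # Proportional quota per grade, rounded with the largest-remainder rule
    quotas = {g: n_holdout * len(ids) / len(cases) for g, ids in by_grade.items()}
    counts = {g: int(q) for g, q in quotas.items()}
    leftover = n_holdout - sum(counts.values())
    for g in sorted(quotas, key=lambda g: quotas[g] - counts[g], reverse=True)[:leftover]:
        counts[g] += 1
    holdout = []
    for g in sorted(by_grade):
        ids = list(by_grade[g])
        rng.shuffle(ids)
        holdout.extend(ids[:counts[g]])
    chosen = set(holdout)
    remaining = [case_id for case_id, _ in cases if case_id not in chosen]
    return holdout, remaining


def split_80_10_10(ids, seed=0):
    """Shuffle ids and cut them into training, validation, and test lists."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * 8 // 10
    n_val = len(ids) // 10
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```

With 342 femoral head radiographs, for example, this integer cut gives 273 training, 34 validation, and 35 test images.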

Manual segmentation
The femoral heads, knee joints, and ankle joints were manually segmented using Adobe Photoshop CC 2018 (Adobe Systems Inc., San Jose, CA, USA) to create masks, which served as the reference for comparison. A radiology technician, supervised by an experienced radiologist, labeled the masks.

Manual reference measurements
Lower limb alignment was evaluated based on the following anatomic feature points (Fig. 3): (1) the centre of the femoral head, (2) the centre of the femoral intercondylar notch, (3) the centre of the medial and lateral tibial spines, (4) the two most distal points of the medial and lateral femoral condyles, (5) the two most proximal points of the medial and lateral tibial plateaus, and (6) the mid-malleolar point (centre of the ankle). The mechanical axis of the femur was defined as the line drawn from the centre of the femoral head to the centre of the femoral intercondylar notch. The mechanical and anatomical axes of the tibia were defined as the line connecting the centre of the tibial spines and the centre of the ankle. The distal femoral articular axis was defined as the line connecting the most distal points of the medial and lateral femoral condyles. The proximal tibial articular axis was defined as the line connecting the two most proximal points of the tibial plateaus. The four alignment parameters (mTFA, mLDFA, MPTA, and JLCA) were measured using these eight feature points and four lines.
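To make the geometry concrete, the four parameters can be computed from the eight feature points as angles between vectors. This is a sketch under stated assumptions: points are `(x, y)` pixel coordinates; mTFA is returned as the unsigned deviation between the femoral and tibial mechanical axes (0° = collinear), without any varus/valgus sign convention the study may use; mLDFA is taken as the lateral angle between the femoral mechanical axis and the distal femoral articular axis, MPTA as the medial angle between the tibial mechanical axis and the proximal tibial articular axis, and JLCA as the unsigned angle between the two articular axes. The function and parameter names are hypothetical.

```python
import math


def angle_between(u, v):
    """Unsigned angle in degrees (0 to 180) between 2D vectors u and v."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return math.degrees(math.atan2(abs(cross), dot))


def vec(a, b):
    """Vector from point a to point b."""
    return (b[0] - a[0], b[1] - a[1])


def alignment_angles(hip, notch, spines, ankle,
                     med_condyle, lat_condyle, med_plateau, lat_plateau):
    # Mechanical axes, both oriented proximal -> distal
    femoral_axis = vec(hip, notch)
    tibial_axis = vec(spines, ankle)
    # Articular axes, both oriented medial -> lateral
    femoral_joint = vec(med_condyle, lat_condyle)
    tibial_joint = vec(med_plateau, lat_plateau)
    # Unsigned deviation of the tibial axis from the femoral axis (assumption)
    mtfa = angle_between(femoral_axis, tibial_axis)
    # Lateral angle: proximally directed femoral axis vs lateral joint line
    mldfa = angle_between(vec(notch, hip), femoral_joint)
    # Medial angle: distally directed tibial axis vs medial joint line
    mpta = angle_between(tibial_axis, vec(lat_plateau, med_plateau))
    jlca = angle_between(femoral_joint, tibial_joint)
    return {"mTFA": mtfa, "mLDFA": mldfa, "MPTA": mpta, "JLCA": jlca}
```

For a straight limb with horizontal joint lines this returns mTFA = 0°, mLDFA = MPTA = 90°, and JLCA = 0°; tilting only the tibial plateau lowers MPTA and raises JLCA by the same amount.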
We developed a graphical user interface (GUI) tool for measuring alignment parameters using MATLAB's GUI Development Environment (GUIDE). The tool allows the user to designate landmarks for angle measurement and calculates the angles from these points (Fig. 3). To assess the intraobserver and interobserver agreement of the measured values between the readers and the algorithm, an orthopedic fellow measured the angles in the clinical verification data set (n = 30) twice, with a 2-week interval between the measurement sessions. A radiology fellow measured the angles once. For the external test set, a fellowship-trained radiologist measured the angles twice. The time taken to load the data and measure the four alignment parameters using the tool was recorded.

Automated segmentation algorithm
Representative semantic segmentation models include fully convolutional networks (FCNs), U-Net, and SegNet. An FCN upsamples through learned deconvolution layers, which require trainable weights; SegNet omits this learned upsampling and instead reuses the max-pooling indices stored by the encoder, which reduces the number of learnable parameters. U-Net also combines encoder information during decoding through skip connections, but it transfers the entire feature map of each encoder level to the corresponding decoder level and concatenates it. U-Net is therefore heavier than SegNet, which passes on only the max-pooling indices.
For this reason, we used SegNet to automatically segment the outline of each bone. The SegNet architecture consists of a downsampling (encoding) path and a corresponding upsampling (decoding) path, followed by a final pixel-wise classification layer. The encoder path contains 13 convolutional layers that match the first 13 convolutional layers of the VGG16 network. Each encoder layer has a corresponding decoder layer; therefore, the decoder network also has 13 convolutional layers. The output of the final decoder layer is fed into a multi-class softmax classifier that produces class probabilities for each pixel independently 11.
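The step that distinguishes SegNet's decoder, upsampling with stored max-pooling indices rather than learned deconvolution or concatenated encoder features, can be illustrated on a single-channel 2D array. This pure-Python sketch uses hypothetical function names and shows only the pooling/unpooling bookkeeping; it omits the convolution, batch normalisation, and ReLU layers of the actual network.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2; also records where each maximum came from."""
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h - h % 2, 2):
        prow, irow = [], []
        for j in range(0, w - w % 2, 2):
            window = [(x[i + di][j + dj], (i + di, j + dj))
                      for di in (0, 1) for dj in (0, 1)]
            value, pos = max(window, key=lambda t: t[0])
            prow.append(value)
            irow.append(pos)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices


def max_unpool_2x2(pooled, indices, shape):
    """Put each pooled value back at its recorded position; other cells stay zero.

    Nothing is learned here, which is why this upsampling step adds no parameters.
    """
    h, w = shape
    out = [[0] * w for _ in range(h)]
    for prow, irow in zip(pooled, indices):
        for value, (i, j) in zip(prow, irow):
            out[i][j] = value
    return out
```

The unpooled map is sparse; in SegNet the following trainable decoder convolutions densify it before the final pixel-wise softmax.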
To automatically segment the contours of each bone, we implemented a two-step segmentation algorithm (Fig. 4). In the first step, we identified the region of interest (ROI) containing the target bone; in the second step, we delineated the boundaries of the target bone within this ROI. The learning rate was set to 1 × 10⁻². The SegNet model was trained using the training and validation data and implemented with MATLAB R2018b on a GeForce GTX 1080 Ti graphics processing unit.
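The two-step scheme can be sketched as: take the coarse mask predicted on the raw image, compute its bounding box, pad it by a margin, crop, run the second-stage model on the crop, and paste the fine mask back into full-image coordinates. The sketch below covers only the cropping and mapping logic, assuming masks are nested lists of 0/1 values; the `margin` value and all function names are hypothetical, as the study does not report how the ROI was sized.

```python
def roi_from_mask(mask, margin=16):
    """Padded bounding box (top, left, bottom, right) around nonzero pixels.

    bottom and right are exclusive; the box is clipped to the image.
    Returns None when the first stage found no bone.
    """
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    if not rows:
        return None
    h, w = len(mask), len(mask[0])
    top = max(rows[0] - margin, 0)
    left = max(cols[0] - margin, 0)
    bottom = min(rows[-1] + margin + 1, h)
    right = min(cols[-1] + margin + 1, w)
    return top, left, bottom, right


def crop(image, box):
    """Cut the ROI out of a 2D image (nested lists)."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def paste_back(fine_mask, box, shape):
    """Map a mask predicted on the ROI crop back into full-image coordinates."""
    top, left, _, _ = box
    h, w = shape
    out = [[0] * w for _ in range(h)]
    for i, row in enumerate(fine_mask):
        for j, v in enumerate(row):
            out[top + i][left + j] = v
    return out
```

Cropping lets the second-stage network see the bone at a larger relative size than in the full-length radiograph, which is the usual motivation for this kind of coarse-to-fine design.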

Automatic determination of anatomic feature points
The mechanical axes for lower limb alignment were automatically determined from the segmentation masks (Fig. 5). The computer-aided measurement time, from image data loading to determination of the four alignment parameters, was recorded.
The femoral head anatomic feature point
A circle was fitted to the segmentation outline of the femoral head to determine its centre.
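The fitting method is not specified above. As an illustration, the sketch below uses the Kåsa algebraic least-squares fit, which solves x² + y² + Dx + Ey + F = 0 for D, E, and F and recovers the centre (−D/2, −E/2); this choice, the centring step, and the function names are our assumptions, not the authors' implementation.

```python
import math


def _solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_circle(points):
    """Kasa least-squares circle fit to (x, y) outline points; returns (cx, cy, r)."""
    # Centre the data first to keep the normal equations well conditioned
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for x, y in points:
        u, v = x - mx, y - my
        row = (u, v, 1.0)
        rhs = -(u * u + v * v)  # from u^2 + v^2 + D*u + E*v + F = 0
        for i in range(3):
            atb[i] += row[i] * rhs
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    d, e, f = _solve3(ata, atb)
    cu, cv = -d / 2.0, -e / 2.0
    return cu + mx, cv + my, math.sqrt(cu * cu + cv * cv - f)
```

Because the fit is linear in D, E, and F, it also works on a partial arc, which is relevant when only part of the femoral head outline is segmented cleanly.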

The distal femur anatomic feature point and the distal femur surface line
The distal femur anatomic feature point was located within the region bounded by the distal femur surface line and the centroid of the segmentation outline. The distal femur surface line was determined by minimising the distance between the bottom edge of the bounding box and the segmentation outline, which yielded two points. The highest point of the outline within this region was designated as the distal femur anatomic feature point.
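One possible reading of this procedure is sketched below. Assumptions, all ours: outline points are `(x, y)` pixels with y increasing distally; the two surface-line points are the outline points nearest the bounding box's bottom edge on either side of the centroid column; and "highest point within the region" is taken as the most proximal point of the lower outline envelope between those two points. Function and variable names are hypothetical.

```python
def distal_femur_points(outline):
    """Return (condyle_a, condyle_b, feature_point) from a distal femur outline.

    outline: list of (x, y) pixel coordinates, y increasing downward (distal).
    """
    cx = sum(x for x, _ in outline) / len(outline)  # centroid column
    bottom = max(y for _, y in outline)             # bottom edge of the bounding box
    left = [p for p in outline if p[0] < cx]
    right = [p for p in outline if p[0] >= cx]
    # Surface-line points: smallest gap to the bottom edge on each side
    condyle_a = min(left, key=lambda p: bottom - p[1])
    condyle_b = min(right, key=lambda p: bottom - p[1])
    # Lower envelope: the most distal outline point in every column
    envelope = {}
    for x, y in outline:
        if x not in envelope or y > envelope[x]:
            envelope[x] = y
    between = [(x, y) for x, y in envelope.items() if condyle_a[0] < x < condyle_b[0]]
    feature_point = min(between, key=lambda p: p[1])  # highest = smallest y
    return condyle_a, condyle_b, feature_point
```

Restricting the search to the lower envelope keeps the metaphyseal sides of the outline from being mistaken for the highest point between the condyles.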
The proximal tibia anatomic feature points
Two peaks were detected on the segmentation outline, and the midpoint between them was computed. Next, a line orthogonal to the segment connecting the two peaks was constructed through this midpoint, and the position along the segmented outline where the distance to this orthogonal line was minimised was defined as the proximal tibia anatomic feature point.
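A sketch of this step under explicit assumptions: peaks are local maxima of the upper outline envelope (image y increases downward, so "higher" means smaller y); the two highest peaks are kept; and the search for the outline point nearest the orthogonal line is restricted to the upper envelope between the peaks so that the distal outline is not picked. The peak rule, the restriction, and the names are our illustrative choices.

```python
import math


def proximal_tibia_point(outline):
    """Return (peak_a, peak_b, feature_point) from a proximal tibia outline.

    outline: list of (x, y) pixel coordinates, y increasing downward.
    """
    # Upper envelope: the most proximal (smallest y) outline point per column
    env = {}
    for x, y in outline:
        if x not in env or y < env[x]:
            env[x] = y
    cols = sorted(env)
    # Peaks: strictly higher than the left neighbour, at least as high as the right
    peaks = [(cols[i], env[cols[i]]) for i in range(1, len(cols) - 1)
             if env[cols[i]] < env[cols[i - 1]] and env[cols[i]] <= env[cols[i + 1]]]
    if len(peaks) < 2:
        raise ValueError("fewer than two peaks on the outline")
    # Keep the two highest peaks, ordered left to right
    peak_a, peak_b = sorted(sorted(peaks, key=lambda p: p[1])[:2])
    mx, my = (peak_a[0] + peak_b[0]) / 2, (peak_a[1] + peak_b[1]) / 2
    dx, dy = peak_b[0] - peak_a[0], peak_b[1] - peak_a[1]

    def dist_to_orthogonal(p):
        # Distance from p to the line through the midpoint perpendicular to peak_a -> peak_b
        return abs(dx * (p[0] - mx) + dy * (p[1] - my)) / math.hypot(dx, dy)

    between = [(x, env[x]) for x in cols if peak_a[0] <= x <= peak_b[0]]
    return peak_a, peak_b, min(between, key=dist_to_orthogonal)
```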

The proximal tibia surface line
The convex hull 12,13 and bounding box of the segmentation outline were calculated. To determine the feature points, candidate points were selected from the points above the centroid of the segmentation outline within the region defined by the convex hull. The proximal tibia surface line was then defined by the two candidate points closest to the upper corners of the bounding box.
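The surface-line step can be sketched with Andrew's monotone chain convex hull. As assumptions, candidates are taken to be the hull vertices lying above (proximal to) the outline centroid, with image y increasing downward, and distances are Euclidean; the study does not state which hull algorithm or distance it used, and the function names are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain convex hull; returns the hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def proximal_tibia_surface_line(outline):
    """Return the two endpoints of the proximal tibia surface line.

    outline: list of (x, y) pixel coordinates, y increasing downward.
    """
    hull = convex_hull(outline)
    cy = sum(y for _, y in outline) / len(outline)  # centroid row of the outline
    xs = [x for x, _ in outline]
    top = min(y for _, y in outline)
    upper_left, upper_right = (min(xs), top), (max(xs), top)  # bounding-box corners
    candidates = [p for p in hull if p[1] < cy]  # hull vertices above the centroid
    left = min(candidates, key=lambda p: math.dist(p, upper_left))
    right = min(candidates, key=lambda p: math.dist(p, upper_right))
    return left, right
```

The same function could be applied to the talus outline to obtain the two talus feature points described in the next subsection.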

Distal tibia anatomic feature points
Two talus feature points were defined using the same method as that used to define the proximal tibia surface line.
Next, a line orthogonal to the segment connecting the two talus feature points was constructed through their midpoint, and the position where the distance between this orthogonal line and the segmented outline of the distal tibia was minimal was defined as the distal tibia anatomic feature point.

Statistical analysis
We used global accuracy, mean accuracy, mean intersection over union (IoU), weighted IoU, and the Dice similarity coefficient (DSC) to evaluate the performance of the segmentation algorithm by comparing the automated segmentation masks with the human-annotated segmentation masks. With DSC as the representative measure, we considered a DSC ≥ 0.7 to indicate excellent agreement between two segmented regions, following previous studies 14,15. We confirmed the normality of mTFA, mLDFA, MPTA, and JLCA in each group using the Shapiro-Wilk test and compared their group means and standard deviations (SDs) using repeated measures analysis of variance (ANOVA) among three groups or paired t-tests between two groups.
We evaluated the intraobserver and interobserver agreement of mTFA, mLDFA, MPTA, and JLCA between the readers and the algorithm using the intraclass correlation coefficient (ICC) to assess measurement reproducibility. Following Altman, we considered an ICC of 0.81–1.00 as very good, 0.61–0.80 as good, and 0.41–0.60 as moderate (13). When a reader performed two measurements, the result of the second session was used in the interobserver agreement analysis.
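These overlap metrics can be sketched for one binary mask pair (bone vs background). The definitions of mean accuracy, mean IoU, and weighted IoU follow the usual per-class averaging, with weighted IoU weighting each class by its ground-truth pixel count; this is our assumption about how the reported values were computed. The function names are hypothetical, and both classes are assumed present so no denominator is zero.

```python
def confusion(pred, truth):
    """Pixel counts (tp, fp, fn, tn) for two binary masks given as nested lists."""
    tp = fp = fn = tn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    tp, fp, fn, tn = confusion(pred, truth)
    total = tp + fp + fn + tn
    iou_bone, iou_bg = tp / (tp + fp + fn), tn / (tn + fp + fn)
    acc_bone, acc_bg = tp / (tp + fn), tn / (tn + fp)
    n_bone, n_bg = tp + fn, tn + fp  # ground-truth pixel count per class
    return {
        "global_accuracy": (tp + tn) / total,
        "mean_accuracy": (acc_bone + acc_bg) / 2,
        "mean_iou": (iou_bone + iou_bg) / 2,
        "weighted_iou": (n_bone * iou_bone + n_bg * iou_bg) / total,
        "dice": 2 * tp / (2 * tp + fp + fn),
    }
```

Under the threshold stated above, a mask pair with `dice >= 0.7` would be counted as excellent agreement.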
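The ICC model is not specified above. As an illustration only, the sketch below computes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1) in Shrout and Fleiss's notation, from a subjects × raters table, and maps the value onto the Altman categories quoted above. The choice of ICC form and the function names are assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per subject (radiograph), each with k measurements.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


def altman_category(icc):
    """Altman's labels as quoted in the text; values of 0.40 or less are not graded here."""
    if icc > 0.80:
        return "very good"
    if icc > 0.60:
        return "good"
    if icc > 0.40:
        return "moderate"
    return "below moderate"
```

For a reader-versus-algorithm comparison, each row would hold one angle (for example, mTFA) measured by the reader and by the algorithm on the same radiograph.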

Segmentation performance
As shown in Table 1, we assessed segmentation performance using global accuracy, mean accuracy, mean IoU, weighted IoU, and DSC. The segmentation algorithm demonstrated excellent agreement with the human-annotated segmentation for all anatomical regions, with an average DSC of 93% for the femoral head, 95% for the distal femur, 95% for the proximal tibia, 89% for the distal tibia, and 97% for the talus. The other metrics ranged from 96% to 98% for the femoral head, 95% to 96% for the distal femur, 96% to 98% for the proximal tibia, 93% to 96% for the distal tibia, and 94% to 98% for the talus.

Measurement times
The time taken for manual measurement of lower limb alignment in the internal test set (n = 30) averaged 86 min across the two readers (average of 172 s/patient). In contrast, computer-aided automatic measurement took 25 min, including data loading time (average of 50 s/patient), which was 3.44 times faster than manual measurement. The processing time after data loading averaged 20 s/patient.
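As a quick arithmetic check, the per-patient times and the 3.44-fold speed-up follow directly from the totals reported for the 30-case internal test set:

```python
n_patients = 30
manual_s = 86 * 60 / n_patients     # 172.0 s per patient (two-reader average)
automatic_s = 25 * 60 / n_patients  # 50.0 s per patient, including data loading
speedup = manual_s / automatic_s    # 3.44
print(f"{manual_s:.0f} s vs {automatic_s:.0f} s per patient: {speedup:.2f}x faster")
```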

External validation of the algorithm
External validation included 30 long-leg radiographs from consecutive patients at an external hospital. Intraobserver agreement between sessions 1 and 2 for Reader 3 (ICC range: 0.9393–0.9979) was very good, and interobserver agreement between the manual and automatic measurements (ICC range: 0.7126–0.9695) was good to very good (Table 5). There was no statistically significant difference between the reader's and the algorithm's measurements of lower limb alignment in the external validation. The average angle differences between the reader and the algorithm are shown in Fig. 6.

Discussion
The variability of conventional alignment measurements has been a source of controversy. Surgeons have reported inconsistencies between conventional radiographic measurements and intraoperative navigation measurements 16,17. Wright et al. reported three sources of measurement inconsistency: physiological variation, procedural variability (inconsistent positioning), and intra- and interobserver variability 18. The mean interobserver difference was 1.4° (SD = 1.1), and the mean intraobserver difference was 0.7° (SD = 0.9). Laskin et al. reported up to 7° of variability in tibiofemoral angle measurements among 50 surgeons 19. Automated measurement can reduce observer-related errors by minimising subjectivity. We proposed a time-efficient system that automatically measures mTFA, mLDFA, MPTA, and JLCA on full-length weight-bearing leg radiographs. The system's measurements agreed closely with the manual measurements in both the internal and external institution tests.
Accurate segmentation is required for the automatic measurement of lower limb alignment. Previous studies segmented the femur and tibia using traditional spectral clustering and active shape models 20 or unsupervised or atlas-guided approaches 21–23. Deep learning methods have been widely applied to image segmentation, with U-Net being popular in the medical field. However, U-Net may not be the most efficient option for relatively simple data (images containing a small number of large objects), as it may require more computational resources. In this study, we therefore used a SegNet model for image segmentation.
Several studies have used long-leg radiographs to investigate detailed angular values related to coronal alignment 24–28. However, these studies commonly rely on landmarks annotated directly by humans, on which the algorithms are subsequently trained. This approach can introduce bias into the reference values, because the ground truth is produced by humans marking points manually. In contrast, our approach performs segmentation and then identifies landmarks using predetermined rules. This method has the potential to reduce interobserver variability in the ground truth, as it eliminates reliance on manual point annotation. Moreover, the segmentation masks generated by the algorithm can be used to identify new geometric landmarks.
Zheng et al. proposed a deep learning method for automatically measuring leg length discrepancy in a pediatric population 29. The method showed high concordance between manual and automatic segmentation of the pediatric leg, with a Dice coefficient of 0.94. However, their study used broad exclusion criteria. In contrast, Schock et al. achieved high concordance across a wide range of clinical and pathological indications, with an average Sørensen-Dice coefficient of 0.97 for the femur and 0.96 for the tibia 10.
In our internal validation, the readers and the algorithm showed high concordance. The algorithm required less than 1 min per patient (average, 50 s), compared with approximately 3 min (average, 172 s) for manual measurement. In the external validation, the algorithm's results correlated significantly with the manual measurements. However, the external validation population consisted of young soldiers aged 20-30 years from a military hospital and may not represent the general population. JLCA values tended to be lower in patients from the military hospital than in those from the other included hospital. Nevertheless, these findings suggest that our algorithm may be useful in other populations.
Our study had several limitations. First, the training data did not include images from patients with skeletal dysplasia or orthopedic hardware, limiting the clinical variability of the images. Second, several cases showed a large absolute error (> 5°) between the manual and automated measurements. Future studies should include a larger and more varied training data set to reduce these errors. Third, our study used 374 images from 374 patients for algorithm development, which may be considered few compared with larger studies. However, studies by Zheng et al. and Schock et al. enrolled 179 and 255 patients, respectively, and showed convincing results, suggesting that the number of cases analysed in our study (n = 374) may be sufficient to achieve excellent performance 10,29.
In conclusion, our deep learning-based automated measurement algorithm accurately quantified the clinical values of lower limb alignment from long-leg radiographs and was faster than manual measurement. The algorithm may be applicable in clinical settings, as it was validated on diverse patient images and clinical and pathological situations.

Figure 3. Alignment parameter measurement tool, in which 8 feature points and 4 lines are selected manually: (a) femoral head centre; (b) centre of the femoral intercondylar notch, centre of the tibial spines, two most distal points of the medial and lateral femoral condyles, and two most proximal points of the medial and lateral tibial plateaus; and (c) mid-malleolar point.

Figure 4. Flowchart of the automatic segmentation algorithm. The first step was performed on raw images. The second step was performed on the region of interest (ROI) image created by cropping the raw image.

Figure 5. Flowchart of the algorithm for automatic determination of anatomic feature points. (a) Segmented images. (b) Anatomic feature points automatically determined from the segmented images. (c) Mechanical axes for lower limb alignment.

Table 3 .
Detailswhich was 3.44 times faster than that for manual measurement.The processing time taken after data loading averaged 20 s/patient.

Table 5. Details of intraobserver and interobserver agreement on lower limb alignment between manual and automatic measurements in the external validation. ICC intraclass correlation coefficient, CI confidence interval, R3 reader 3, mTFA mechanical tibiofemoral angle, mLDFA mechanical lateral distal femoral articular angle, MPTA medial proximal tibial angle, JLCA joint line convergence angle, AI artificial intelligence.

Table 6. Details of manual and automatic measurements of lower limb alignment in the external validation. SD standard deviation, mTFA mechanical tibiofemoral angle, mLDFA mechanical lateral distal femoral articular angle, MPTA medial proximal tibial angle, JLCA joint line convergence angle.