A machine learning-based diagnostic model associated with knee osteoarthritis severity

Knee osteoarthritis (KOA) is characterized by pain and decreased gait function. We aimed to identify KOA-related gait features based on patient-reported outcome measures (PROMs) and to develop regression models using machine learning algorithms to estimate KOA severity. The study included 375 volunteers with varying KOA grades. The severity of KOA was determined using the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), and WOMAC scores were used to classify disease severity into three groups. A total of 1087 features were extracted from the gait data. An ANOVA and Student's t-test were performed, and only significant features were selected for inclusion in the machine learning algorithm. The three WOMAC subscales (physical function, pain, and stiffness) were each further divided into three classes, and an ANOVA was performed to determine which selected features were significantly related to each subscale. Both a linear regression model and a random forest regression model were used to estimate patients' WOMAC scores. Forty-three features were selected based on the ANOVA and Student's t-test results: 12 from the hip, 1 from the pelvis, 17 from the knee, 9 from the ankle, 1 from the foot, and 3 from spatiotemporal parameters. Significance levels of < 0.0001 and < 0.00003 were set for the ANOVA and t-test, respectively. The physical function, pain, and stiffness subscales were related to 41, 10, and 16 features, respectively. The linear regression model showed a correlation of 0.723, and the machine learning algorithm showed a correlation of 0.741. The severity of KOA was predicted by gait analysis features, which were incorporated to develop an objective estimation model for KOA severity. The identified features may serve as a tool to guide rehabilitation and progress assessments.
In addition, the estimation model presented here suggests an approach for clinical application of gait analysis data for KOA evaluation.

accurate answers 8. In addition, discordance between WOMAC scores and actual improvement in physical gait has been noted 14.
Clinical gait analysis is a powerful technique that provides objective and reliable biomechanical information, including temporal waveforms for each of the lower-body joints 15. Devices for gait quantification include 3D motion capture systems, force plates, instrumented mats, and wearable sensors with inertial measurement units and accelerometers 16. Because gait dysfunction can be evaluated objectively with this method, it has been suggested as an alternative tool for measuring patient disability 8,14,17. In a previous study 18, we identified an association between gait analysis features and KOA radiological grade and successfully estimated the Kellgren-Lawrence (KL) grade using a machine learning algorithm based on key gait features.
Information is limited regarding the relationship between PROMs and kinetic and kinematic gait features. Such analyses could provide objective measures of symptoms and insight into the relationship between the symptomatic diagnosis of disease and gait quality. Current barriers to the clinical application of gait analysis include the absence of a standard method for gait evaluation and the large volume and high complexity of gait analysis data 19. Feature extraction is a widely used method for analyzing complex signals. Previous studies have extracted features from gait signals and analyzed the relationship between these features and KOA severity 7,20,21. However, the features reported in most previous studies were limited to traditional features and a limited set of joints. In this study, we extracted as many features as possible from the gait data of multiple joints, using both traditional and engineering methods. We also anticipate that a WOMAC estimation model based on gait features will explain the biomechanical differences across KOA severity levels and provide further understanding of the relationship between KOA and gait function.
Our cross-sectional study analyzed the relationship between gait data and the WOMAC scores of KOA patients. To avoid longitudinal bias and other possible inaccuracies, only WOMAC indices from KOA patients without cognitive impairment or depression who were willing to answer accurately were included. We hypothesized that the WOMAC index and its three subscales would be closely related to KOA patients' gait function and that specific features would change with disease progression. Overall, our study aimed to identify the key features associated with the WOMAC index and its three subscales, and to apply these key features to develop estimation models for WOMAC that could improve rehabilitation and suggest a standardized application of gait analysis.

Methods
Participants. This study was approved by the Institutional Review Board of Seoul National University Hospital (IRB no. 1810-004-974) and was performed in accordance with relevant guidelines and regulations. Written informed consent was obtained from all participants. This study used our gait lab database, which consists of gait reports from 2013 to 2017 of KOA patients with various degrees of knee pain and of healthy volunteers without any knee pain. We excluded subjects based on the following criteria: (1) missing some data for both legs; (2) age < 20 years; (3) spine disease, or hip or ankle arthritis on x-ray; (4) inflammatory or traumatic arthritis of the knee; (5) any prior bone surgery in the lower extremities; and (6) cognitive impairment or depression. A total of 375 subjects were included in our study.

Data collection.
All gait analysis data, including kinetic, kinematic and spatiotemporal data, were collected at the Human Motion Analysis Laboratory of Seoul National University Hospital following the OrthoTrack 6.6 Reference Manual 22, with a daily quality check to keep the error within 1 mm. All data collection was performed by an operator with 20 years of experience. Subjects had a few minutes to warm up and acclimate to the setting before reflective markers were placed according to the Helen Hayes marker set. After marker placement, the operator asked the subjects to walk along a 9 m track. Motion data were collected using twelve charge-coupled device cameras with a three-dimensional optical motion capture system (Motion Analysis Corp., Santa Rosa, CA, USA) at a sampling frequency of 120 Hz. Two floor-embedded force plates were used to obtain the kinetic data. For each joint, the kinetic and kinematic data were averaged over five or six trials of the 9 m walk.
All participants completed the self-administered Korean version of the WOMAC 23, which has three subscales: pain (5 questions), stiffness (2 questions) and physical function (17 questions). Each question was answered on a numeric scale ranging from 0 (no symptoms) to 4 (extreme symptoms).
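Because each of the 24 items is scored 0 to 4, the subscale scores are item sums and the total WOMAC score ranges from 0 to 96. The following minimal Python sketch shows this summation; the dictionary layout and the function name `score_womac` are hypothetical and are not taken from the study's MATLAB code.

```python
# Item counts per subscale for the 24-item WOMAC described above.
SUBSCALE_SIZES = {"pain": 5, "stiffness": 2, "function": 17}


def score_womac(responses):
    """Sum 0-4 item responses into subscale scores and a 0-96 total.

    `responses` is a hypothetical dict mapping subscale name -> list of item
    scores; it is an illustrative layout, not the study's data format.
    """
    scores = {}
    for name, n_items in SUBSCALE_SIZES.items():
        items = responses[name]
        if len(items) != n_items:
            raise ValueError(f"{name}: expected {n_items} items, got {len(items)}")
        if any(not 0 <= x <= 4 for x in items):
            raise ValueError(f"{name}: item scores must lie in 0..4")
        scores[name] = sum(items)
    # The total is simply the sum of the three subscale sums.
    scores["total"] = sum(scores[n] for n in SUBSCALE_SIZES)
    return scores


example = {"pain": [2, 1, 3, 2, 1], "stiffness": [2, 2], "function": [1] * 17}
print(score_womac(example))  # {'pain': 9, 'stiffness': 4, 'function': 17, 'total': 30}
```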
Feature extraction and statistical analysis. All data analyses and classification were performed using MATLAB 2018b (MathWorks, Massachusetts). The gait features were extracted from the gait parameters, which are temporal signals of the kinetic and kinematic data of the hip, pelvis, knee and ankle. These features included, but were not limited to, the area under the curve, the maximum value during the swing phase, and the minimum value of the curve. An additional 16 gait characteristics (e.g., velocity and cadence) were also selected as classification model features. Only the right leg was included to avoid statistical dependency from multiple observations of single individuals 24. Detailed information on the extracted features is provided in Supplementary Table S1.
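To make the waveform features concrete, the Python sketch below computes a few of the named features (area under the curve during stance, maximum during swing, minimum of the curve) plus variance, RMS and the peak-to-RMS ratio from a single time-normalized gait waveform. It is an illustration under stated assumptions, not the study's feature set: the waveform is assumed to be evenly sampled over 0 to 100% of the gait cycle, and the default toe-off at 60% is a hypothetical placeholder for the event that would normally come from force-plate data.

```python
import numpy as np


def extract_waveform_features(signal, toe_off_pct=60.0):
    """Illustrative features from one gait-cycle waveform.

    Assumes `signal` is time-normalized to 0-100 % of the gait cycle with
    evenly spaced samples. `toe_off_pct` is a hypothetical default that
    splits stance from swing; real data would use detected gait events.
    """
    y = np.asarray(signal, dtype=float)
    pct = np.linspace(0.0, 100.0, y.size)  # gait-cycle axis in percent
    stance = pct <= toe_off_pct
    swing = ~stance

    def auc(x, v):
        # Trapezoidal area under the curve over the given samples.
        return float(np.sum((v[1:] + v[:-1]) / 2.0 * np.diff(x)))

    rms = float(np.sqrt(np.mean(y ** 2)))
    return {
        "auc_stance": auc(pct[stance], y[stance]),
        "max_swing": float(y[swing].max()),
        "min_curve": float(y.min()),
        "variance": float(y.var()),
        "rms": rms,
        # Peak-to-RMS ratio: largest absolute value divided by the RMS.
        "peak_to_rms": float(np.max(np.abs(y)) / rms),
    }
```

Applying the same function to every joint angle, moment and power waveform, and to sub-windows of the cycle, is one way a feature table with many columns per parameter could be assembled.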
To statistically analyze the relationship between the WOMAC score and gait features, WOMAC severity was classified into three classes: mild, moderate, and severe. Each WOMAC question is answered with one of five responses: none (0), mild (1), moderate (2), severe (3) and extreme (4). To divide the WOMAC score into three severities, 1.5 (the midpoint between mild and moderate) and 2.5 (the midpoint between moderate and severe) were chosen as cut points and multiplied by 24, the number of WOMAC questions. Accordingly, WOMAC scores below 36 were classified as mild, scores between 36 and 60 as moderate, and scores above 60 as severe. The WOMAC subscales were divided into three classes using the same procedure. A one-way analysis of variance (ANOVA) with a significance level of 0.0001 was performed. For features that differed significantly in the ANOVA, Student's t-test was used to analyze differences between each pair of severity groups. To correct for multiple comparisons, a Bonferroni-corrected alpha of < 0.00003 was used as the significance level 25. Features that were significant in all three pairwise comparisons were selected as key features. Student's t-test was then performed again on the selected key features between the severity groups defined by each WOMAC subscale.
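The classification and two-stage selection described above can be sketched in Python with SciPy's `f_oneway` and `ttest_ind`. This is a minimal illustration, not the authors' MATLAB code. Assumptions: a score of exactly 60 counts as moderate (the text says "between 36 and 60"), the subscale cut points scale with the item count (1.5 and 2.5 times the number of items), and the t-test threshold is the paper's stated 0.00003, which is close to 0.0001 divided by the three pairwise comparisons (about 3.3 × 10^-5).

```python
import numpy as np
from scipy import stats


def womac_class(score, n_items=24):
    """Return 0 (mild), 1 (moderate) or 2 (severe).

    Cut points are 1.5 and 2.5 times the number of items (36 and 60 for the
    24-item total). Treating exactly 2.5 * n_items as moderate is our reading.
    """
    if score < 1.5 * n_items:
        return 0
    if score <= 2.5 * n_items:
        return 1
    return 2


def select_key_features(X, scores, n_items=24, alpha_anova=1e-4, alpha_t=3e-5):
    """Indices of columns of X that pass the ANOVA and all pairwise t-tests.

    X: (n_subjects, n_features) array; scores: WOMAC total or subscale scores.
    """
    labels = np.array([womac_class(s, n_items) for s in scores])
    groups = [X[labels == k] for k in range(3)]
    pairs = [(0, 1), (0, 2), (1, 2)]
    keys = []
    for j in range(X.shape[1]):
        # Stage 1: one-way ANOVA across the three severity groups.
        p_anova = stats.f_oneway(*(g[:, j] for g in groups)).pvalue
        if p_anova >= alpha_anova:
            continue
        # Stage 2: every pairwise Student's t-test must pass the corrected alpha.
        if all(stats.ttest_ind(groups[a][:, j], groups[b][:, j]).pvalue < alpha_t
               for a, b in pairs):
            keys.append(j)
    return keys
```

Passing `n_items=5` or `n_items=2` applies the same cut-point rule to the pain or stiffness subscale.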
A multiple linear regression was performed to estimate the WOMAC index, to examine its relationship with the WOMAC key features, and to assess the feasibility of the estimation model. To address class imbalance in the dataset, we down-sampled it to 231 subjects. A random forest algorithm 26, an ensemble learning method constructed from multiple decision trees, was used to build the regression model for WOMAC index estimation. The Statistics and Machine Learning Toolbox of MATLAB was used for the machine learning analysis.
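The text does not state how the down-sampling was carried out. One common approach, assumed here purely for illustration, is random undersampling of each severity class to the size of the smallest class. The Python sketch below pairs that with an in-sample multiple linear regression using scikit-learn's `LinearRegression`; the helper names `downsample_balanced` and `fit_linear` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def downsample_balanced(X, y, labels, seed=0):
    """Randomly undersample every class to the size of the smallest class.

    This balancing rule is an assumption; the source only reports the
    resulting sample size.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    n_min = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(idx)  # mix the classes so row order carries no label information
    return X[idx], y[idx], labels[idx]


def fit_linear(X, y):
    """Fit a multiple linear regression and report in-sample r and RMSE."""
    model = LinearRegression().fit(X, y)
    y_hat = model.predict(X)
    r = float(np.corrcoef(y, y_hat)[0, 1])
    rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
    return model, r, rmse
```

Because the source applies hold-out validation only to the random forest, this sketch reports the linear model's fit on the same data it was trained on.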
The hold-out method was used for model validation, for the random forest model only. Seventy percent of the data were randomly selected to train the model, and the remaining thirty percent were used for validation. The model was evaluated using the root mean square error (RMSE) and the correlation between actual and estimated WOMAC scores.
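A Python analogue of this 70/30 hold-out evaluation, using scikit-learn's `train_test_split` and `RandomForestRegressor`, might look like the sketch below. It is not the authors' MATLAB implementation; the number of trees (100) and the random seed are hypothetical choices, since the text does not report the forest's hyperparameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def holdout_random_forest(X, y, test_size=0.3, seed=0, n_trees=100):
    """Train on 70 % of the data and report RMSE and Pearson r on the rest.

    `n_trees` and `seed` are illustrative defaults, not values from the study.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed)
    rf = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    rf.fit(X_tr, y_tr)
    y_hat = rf.predict(X_te)
    # RMSE and correlation between actual and estimated scores on held-out data.
    rmse = float(np.sqrt(np.mean((y_te - y_hat) ** 2)))
    r = float(np.corrcoef(y_te, y_hat)[0, 1])
    return rf, rmse, r, len(y_te)
```

A single random split like this gives one estimate whose value depends on the split; repeating it over several seeds, or using k-fold cross-validation, would show how stable the reported RMSE and correlation are.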
Clinical implication. The gait function of KOA patients declines because of typical KOA symptoms such as tenderness, loss of flexibility, and swelling. This study statistically analyzed the relationship between gait data and the symptomatic severity of KOA and applied a machine learning algorithm for WOMAC estimation. The implications of this study are as follows: (1) it provides further understanding of the relationship between KOA symptoms and gait data; (2) the estimation model can be applied to patients who cannot properly complete a WOMAC evaluation because of cognitive impairment or other clinical problems; and (3) if gait analysis can be performed with more accessible technology, such as wearable sensors and camera-based pose estimation, this study can serve as foundational research for patient-independent diagnosis.

Results
Table 1 summarizes the participants' demographic characteristics and symptomatic severity. A total of 1083 features (from 23 gait parameters) were extracted from the gait analysis dataset, and 42 features (12 hip, 1 pelvic, 17 knee, 8 ankle, 1 foot, and 3 spatiotemporal) were selected according to the ANOVA and t-test results. The gait parameter features included hip rotation moment, hip flexion angle, hip adduction angle, hip power, pelvic obliquity angle, knee extension moment, knee flexion angle, knee power, knee varus angle, ankle plantarflexion moment, ankle power, foot progression angle, total speed, duration of the single limb support phase (% of gait cycle), timing of initial double limb support (% of gait cycle), and timing of weight acceptance (% of gait cycle). Physical function was significantly related to all features except hip power. Pain differed significantly in relation to hip adduction angle, hip power, knee power, knee varus angle, ankle plantarflexion moment, and ankle power. Stiffness differed significantly in relation to hip rotation moment, hip adduction angle, knee flexion angle, and knee varus angle.
Fig. 1 shows the mean values of representative parameters for each group, divided according to WOMAC score. Table 2 summarizes the key WOMAC features with their means and standard deviations. All features listed in Table 2 differed significantly among all severity groups according to Student's t-test. The area under the curve during the stance phase of the hip adduction angle, the variance of the knee flexion angle, the area under the curve during the stance phase and the mid-reference level of the knee varus angle, and the peak-to-RMS ratio of ankle power showed the most significant differences among the three groups. The RMSE was 16.10 for linear regression and 17.38 for random forest regression. The correlation between actual and estimated WOMAC scores was 0.722 for linear regression and 0.741 for random forest regression (Fig. 2).

Discussion
While previous studies 14,27,28 have reported the relationship between spatiotemporal gait features, such as speed and stride length, and the WOMAC indices of KOA or hip OA patients, this is the first study to analyze the relationship between kinetic and kinematic gait parameters and the WOMAC indices. Biomechanical intervention is recognized as an alternative method to control pain and improve physical function 29. Gait analysis provides meaningful biomechanical information on KOA, but its complexity has limited its clinical applicability 19,30. Here, we statistically analyzed key gait cycle features and identified critical KOA biomechanical information. In addition, we built linear and machine learning estimation models for the WOMAC index based on the identified features. While PROM methods are cheap, easy and quick, they are not applicable to patients who are unable or unwilling to perform the task. Despite the ability of gait analysis to provide valuable information about KOA biomechanical properties, a standardized method is not available for clinical use. Our estimation model provides objective and reliable symptomatic results and suggests utility as a consistent method for evaluating gait analysis data. Finally, we extracted key features based on both conventional methods, such as the mean value of the curve, and novel engineering methods, such as the occupied bandwidth of the curve in the frequency domain. Conventional features, such as peak and minimum gait data values, are limited to the load or motion at a single time point during the gait cycle and do not contain information over the whole gait cycle 31. We developed methods that include information over the entire gait cycle, such as the area under the curve, root mean square (RMS) and power spectrum. We also conducted a detailed feature analysis during gait cycle sub-phases: loading response, mid-stance, terminal stance and pre-swing of the stance phase, and initial swing, mid-swing and terminal swing of the swing phase. We identified well-known joint parameters that are specific to KOA patients and that function in gait performance (listed in Table 2). Ankle dorsiflexion moment, for example, is an ankle joint movement involved in supination and pronation and in three-dimensional ankle joint motion 32. Previous studies have shown that changes in knee varus angle are closely related to KOA 33,34. Lo and colleagues reported an association between knee varus angle and knee pain during weight-bearing activities, most likely due to narrowing of the medial joint space, opening of the lateral space or increased lateral soft tissue pretension. We found that hip, knee and ankle joint power, the product of torque and angular velocity, differed significantly according to WOMAC severity. Similarly, Segal et al. 35 reported joint power differences between symptomatic KOA patients and high-functioning controls. In a previous study, we reported that the difference between the maximum and minimum values of both hip flexion angle and hip adduction angle was smaller in KOA patients than in a control group 36. Weidow et al. 37 reported that the maximum value of hip rotation moment differed significantly between symptomatic and asymptomatic groups. KOA patients have been reported to have significantly lower knee flexion range of motion in the swing and stance phases of the gait cycle 38, which is in agreement with our findings. McCarthy et al. 39 argued that knee extension moment is also an important gait characteristic for analyzing the relationship between KOA and gait data. Bechard et al. 40 reported that the toe-out (foot progression) angle was significantly smaller in patients with KOA, and pelvic obliquity angle has been reported to be correlated with KOA symptoms.
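The whole-cycle, sub-phase and frequency-domain features mentioned above can be illustrated with a short Python sketch. Two assumptions should be flagged. First, the sub-phase boundaries below are the conventional approximate percentages commonly used in clinical gait description, not values reported in this study. Second, `occupied_bandwidth` is a simplified re-implementation in the spirit of MATLAB's `obw` (the band holding 99% of the periodogram power, with 0.5% left on each side); it is not guaranteed to reproduce MATLAB's output exactly.

```python
import numpy as np
from scipy.signal import periodogram

# Conventional approximate sub-phase boundaries in % of the gait cycle (assumed).
SUB_PHASES = {
    "loading_response": (0, 10),
    "mid_stance": (10, 30),
    "terminal_stance": (30, 50),
    "pre_swing": (50, 60),
    "initial_swing": (60, 73),
    "mid_swing": (73, 87),
    "terminal_swing": (87, 100),
}


def sub_phase_rms(signal):
    """RMS of a time-normalized (0-100 %) waveform within each sub-phase."""
    y = np.asarray(signal, dtype=float)
    pct = np.linspace(0.0, 100.0, y.size)
    out = {}
    for name, (lo, hi) in SUB_PHASES.items():
        mask = (pct >= lo) & (pct <= hi)
        out[name] = float(np.sqrt(np.mean(y[mask] ** 2)))
    return out


def occupied_bandwidth(x, fs, fraction=0.99):
    """Width (Hz) of the band containing `fraction` of total spectral power.

    Leaves (1 - fraction) / 2 of the power below and above the band.
    """
    f, pxx = periodogram(x, fs=fs)  # one-sided power spectral density
    cum = np.cumsum(pxx)
    cum /= cum[-1]  # normalized cumulative power, rising from ~0 to 1
    tail = (1.0 - fraction) / 2.0
    f_lo = f[np.searchsorted(cum, tail)]
    f_hi = f[np.searchsorted(cum, 1.0 - tail)]
    return float(f_hi - f_lo)
```

For example, a raw joint-angle trace sampled at the laboratory's 120 Hz could be passed as `occupied_bandwidth(trace, fs=120.0)`, while `sub_phase_rms` expects a waveform already normalized to the gait cycle.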
In our study 18, physical function was influenced by the greatest number of features (42 from 13 parameters), indicating that WOMAC is a comprehensive score that incorporates the movement of many joints. This is reasonable given that KOA also affects the kinetics and kinematics of the hip and ankle joints. Thus, to improve patients' physical function, it is important to train not only the knee joint but also other KOA-affected joints 41. The results of our study provide guidance for KOA exercise and rehabilitation (Table 2). Pain and stiffness were most related to knee-specific parameters. This pattern is consistent with tibiofemoral OA, a fairly common form of OA associated with varus alignment; tibiofemoral OA patients report higher pain levels than patellofemoral OA patients. Knee extension moment was not significantly related to pain. However, the WOMAC pain questionnaire includes only one stair-related question, which may have influenced this result. In addition, the questionnaire lacks questions related to knee adduction moment. Stiffness showed a significant relationship with knee flexion angle, a sagittal plane parameter. This is notable because the main movements of the knee, extension and flexion, occur in the sagittal plane.
A limitation of our study is that the model was validated only internally; external validation is needed to check the model for overfitting. In addition, the features identified in this study were not applied to actual rehabilitation. Future studies should apply the key features to patient rehabilitation and determine their therapeutic effects.
In conclusion, we have built estimation models for the WOMAC index and have identified features associated with the WOMAC and its subscales. The features were extracted using feature engineering techniques and were statistically selected and validated. The estimation models were generated using traditional linear regression and random forest regression. Our estimation models and list of key features represent an objective, alternative option for KOA symptom diagnosis and rehabilitation.

Figure 1.Mean values of representative gait parameters for each symptomatic severity of KOA where features were extracted from the (a) ankle power, (b) hip adduction angle, (c) knee flexion angle, and (d) knee varus angle.The shaded area represents standard deviation.

Figure 2. Regression results for WOMAC estimation using (a) linear regression and (b) the random forest algorithm with the identified key features.

Table 2. Mean and standard deviation of selected features that differed significantly among WOMAC severity groups. Bolded rows are the four features that showed the most significant differences among the groups.