Introduction

The prevalence of knee osteoarthritis (OA) is increasing worldwide. In the United States, for example, knee OA affects 12% of those above 60 years old and has a significant economic impact on health systems, with a cost of approximately USD 140,000 per patient over their lifetime1,2,3. Strong evidence indicates that structural and mechanical changes in joints are responsible for the development and progression of OA4,5. The intensity and distribution of forces in knee regions throughout life are related to articular cartilage degeneration6,7,8.

During the gait stance phase, the resultant tibiofemoral contact force presents a waveform with two clear peaks: the first results from the initial contact of the foot with the ground during the loading response sub-phase, and the second is related to the forward propulsion of the body9,10. Because the intensity and distribution of forces on the knee are among the factors responsible for the onset and progression of knee OA, these forces must be analyzed. For a non-invasive analysis of joint contact forces, musculoskeletal (MSK) modeling approaches have been used10,11. MSK modeling platforms extend the utility of biomechanics lab measurements by coupling joint kinematics and ground reaction forces (GRFs) with computational methods to estimate muscle and joint reaction forces during human movements. With the development of theoretical and experimental methods to improve accuracy and reliability, human motion analysis has become a useful investigative and diagnostic tool in many research and clinical areas, such as medicine, ergonomics, and sports12.

Quantitative analysis in biomechanics generally requires a set of equipment, synchronization devices, and consumable materials. In practice, this means that substantial investments are needed, considering the technologies commercially available to support data acquisition. This demand restricts the exploration and application of biomechanical analysis in smaller clinical settings, hospitals in underdeveloped countries, schools, sports clubs, etc. A suitable alternative relies on inexpensive (semi-)automatic methods. Machine learning (ML) is a class of algorithms frequently used for making predictions based on statistical patterns discovered from data. Several studies have applied these data-driven algorithms to gait lab prediction tasks to avoid hardware or computational bottlenecks by exploiting the inference capabilities of trained machine learning models13,14,15,16. Although some relevant research has been conducted with machine learning algorithms to predict discrete and time series kinematic and kinetic data for gait analysis, generalized models applied to both healthy and pathological participants are underreported.

To address this issue, this study assessed the prediction of gait tibiofemoral contact forces of healthy and OA individuals using various regression techniques (24 different algorithms in total). We also assessed the performance of each regression technique considering different sets of predictors, i.e., we evaluated whether primary kinematic (joint angles) and kinetic data (ground reaction forces) alone are enough to yield accurate predictions, and how including post-processing data, such as joint moments and muscle forces, influences the predictions. We hypothesized that it is possible to accurately predict tibiofemoral contact forces of both healthy and pathological individuals from primary kinetic and kinematic data using ML methods, but that better performance would be achieved when post-processing data are included.

Background

Biomechanics is the science that examines forces acting upon and within a biological structure and the effects produced by such forces. Biomechanics takes an interdisciplinary approach and offers several investigation methods that provide information about the internal and external mechanics associated with locomotion17.

Kinematic information obtained from the quantitative analysis provides data from the body and segments’ position, orientation, velocity, and acceleration. Combined with kinetic and segment parameters data, information relative to the center of mass, segment energy levels and power, joint moments and forces can be computed18. Joint kinematics and ground reaction forces (GRFs) offer measurable quantities that characterize movement quality and form the basis of a biomechanics laboratory assessment.

MSK models have been used for a non-invasive analysis of joint contact forces. When using MSK models, the choice of the model is critical, considering the variables of interest and the capacity of the model to estimate the contact forces19. However, deriving patient kinematics and mechanics from MSK models requires a multi-stage computational pipeline, including subject-specific calibration and scaling, as well as manual optimization procedures13. In this sense, ML algorithms are a suitable alternative to predict muscle and joint reaction forces.

Burton and colleagues13 evaluated four different ML algorithms to estimate joint contact and muscle forces during activities of daily living based on anthropometric, GRF, and joint angle data of total knee arthroplasty (TKA) patients. Patient mechanics were accurately predicted by recurrent neural networks, even with fewer predictor variables. A similar approach was conducted by Giarmatzis et al.14 with young and elderly participants during treadmill walking. The authors assessed artificial neural networks (ANNs) and support vector regression (SVR) algorithms based on kinematic data and considered the inclusion/exclusion of GRFs in the dataset during training. ANNs were the best-performing predictor of knee contact forces, and excluding GRF data did not substantially decrease the predictive power. Also using ANNs, Stetter and colleagues20 predicted healthy participants’ knee flexion and adduction moments during various locomotion tasks. Recent research15 showed promising results for Random Forest (RF) and Convolutional Neural Network (CNN) algorithms to predict kinematic and kinetic outcomes from inertial measurement unit (IMU) data of healthy individuals during walking trials. Regarding pathological conditions, Aljaaf and colleagues21 successfully predicted the frontal plane internal knee abduction moment of patients with alkaptonuria. From kinematic data, the authors evaluated four ML algorithms: Decision Tree, Random Forest, Linear Regression, and Multilayer Perceptron neural network. The Multilayer Perceptron neural network presented superior results, considering both performance and speed. Also, in a previous study22, knee contact force was accurately predicted by integrating the Artificial Fish Swarm and the Random Forest algorithms. However, the authors evaluated data from only three patients implanted with an instrumented knee replacement, so the generalization of the algorithms to a dataset with larger variability still needs to be evaluated.

Regarding knee OA patients, a previous study23 considered almost 500 participants. Personal cameras were used to record a 5-trial sit-to-stand task. Later, participants were invited to answer a survey covering physical and mental health characteristics and OA status. The authors reported that trunk kinematic parameters are sensitive enough to predict physical health and OA. A recent study24 applied a probabilistic principal component analysis (PPCA) model to IMU data of knee OA patients to predict tibiofemoral contact forces during gait. The root mean square error ranged from 0.15 to 0.40 of body weight, with moderate to strong correlations between contact forces estimated by MSK and PPCA models. Finally, the feasibility of using IMU training data from people with knee OA performing multiple clinically important activities was evaluated to predict knee joint sagittal plane kinematics using a deep learning approach25. However, none of these studies dealt with predicting joint reaction forces in both knee OA patients and healthy individuals. One can argue that generic models may not properly predict the biomechanical data of pathological groups, and vice versa.

Relevant research has provided insights into the use of machine learning algorithms to address classification and prediction tasks involving biomechanical data. In the present study, we advance the state of the art by exploring a broader set of ML techniques and their parameter settings to predict tibiofemoral contact forces for both healthy individuals and OA patients. To the best of our knowledge, this is the first study to explore such a range of techniques and the first to do so with knee OA patients. We also investigated the accuracy of different combinations of discrete data to predict the first and the second tibiofemoral contact force peaks during the gait stance phase. For a comprehensive evaluation with a clinical focus, we trained the models using data from both healthy individuals and OA patients, and we additionally tested each group separately to verify accuracy.

Materials and methods

Participants

The study evaluated 14 individuals with severe unilateral knee OA (KL4)26. The group included six females and eight males, with a median age of 63.7 (55.2; 68.1) years, height of 1.67 m (1.61; 1.77), and weight of 80.2 kg (70.4; 85.3). For the control group, 14 healthy individuals were evaluated, seven females and seven males, with a median age of 63 (60; 64) years, height of 1.69 m (1.63; 1.73), and weight of 73.6 kg (61.0; 77.7). Participants with a body mass index (BMI) higher than 35 kg/m\(^2\) and a waist circumference higher than 102 cm for males and 88 cm for females were excluded from both groups. In the OA group, participants who had undergone any lower-limb joint replacement or who had any degenerative joint condition other than the affected knee were excluded. In both groups, participants with any other condition that could affect gait were also excluded.

The University of Ottawa and the Ottawa Hospital Research Institute ethics committees approved the study. All participants provided written informed consent, and the research was conducted in accordance with the principles of good clinical practice and the Declaration of Helsinki.

Data collection

Data collection was performed with ten infrared cameras (200 Hz, 2 Vantage V5 and 8 Vero 2.2, Vicon, Oxford Metrics, UK) and four force plates (1000 Hz, model 9286B, Kistler; model FP4060, Bertec, USA) embedded in the floor in the middle of a ten-meter walkway. For tracking the segments, the University of Ottawa Motion Analysis Model (UOMAM) marker set was used27. A static kinematic capture was performed in a position similar to the anatomical position, with shoulder abduction of around 30 degrees. Next, three to five gait trials were performed at a self-selected pace.

Data processing

The workflow of the study methodology is presented in Fig. 1. First, the marker trajectories were labeled using the manufacturer’s software, and the gaps were filled. The force plate data were filtered with a 4th order (zero lag) Butterworth filter with a cut-off frequency of 10 Hz. A Woltring filter with a mean standard error of 15 mm was applied to the kinematic data. The gait stance phase was cropped between foot strike and foot off, both detected from the vertical force signal of the force plate with a threshold of 10 N. The stance phase was normalized to 101 points, and the data were then exported to OpenSim formats.
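The stance cropping and time normalization described above can be sketched as follows. This is a minimal pure-Python illustration, not the processing code used in this study: the function names `crop_stance` and `normalize_to_101` are hypothetical, linear interpolation is an assumption (the resampling method is not stated), the filtering steps are omitted, and the force values are synthetic.

```python
def crop_stance(fz, threshold=10.0):
    """Return (start, end) sample indices of the first stance phase.

    Foot strike is the first sample whose vertical force exceeds the
    threshold; foot off is the last consecutive sample above it.
    """
    start = next((i for i, f in enumerate(fz) if f > threshold), None)
    if start is None:
        raise ValueError("no sample exceeds the threshold")
    end = start
    while end + 1 < len(fz) and fz[end + 1] > threshold:
        end += 1
    return start, end


def normalize_to_101(signal, n_points=101):
    """Linearly resample a signal to n_points (0 to 100% of stance)."""
    if len(signal) < 2:
        raise ValueError("need at least two samples")
    last = len(signal) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)  # fractional index in the original
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


# Synthetic vertical force in newtons: swing, stance, swing
fz = [0.0, 2.0, 5.0, 50.0, 400.0, 700.0, 650.0, 700.0, 300.0, 20.0, 4.0, 0.0]
start, end = crop_stance(fz)                  # 10 N threshold from the text
stance = normalize_to_101(fz[start:end + 1])  # 101 points from the text
```

The same resampling would be applied to every kinematic and kinetic channel of a trial so that all variables share the 0 to 100% stance axis.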

Figure 1 The methodology workflow, from data collection to machine learning setup for tibiofemoral contact forces prediction.

Using the OpenSim 3.3 software28, a generic model was scaled using a marker-based approach. The generic MSK model employed29 implemented muscle parameters that reduced late-stance knee contact force30. The adjustments were: (a) knee mediolateral translation was locked, (b) passive muscle forces and tendon compliance were adjusted as proposed by Uhlrich et al.30, and (c) muscle-tendon unit paths for gluteus medius, gluteus minimus, and tensor fascia latae were adjusted at the origin (moved superiorly and laterally) and insertion (moved anteriorly). The MSK model developed in OpenSim and employed in this study is available for download (see Sect. 7). The model included 80 lower-limb Hill-type muscle-tendon units with 37 degrees of freedom and 17 ideal torque actuators driving the upper body31. The model allowed for estimating the medial and lateral compartments of the vertical tibiofemoral contact force9,32.

The inverse kinematics, inverse dynamics, static optimization, and joint reaction analyses (JRA) were processed using the Batch OpenSim Processing Scripts (BOPS) Matlab toolbox33. Static optimization was employed to calculate the muscle activations and forces by minimizing the sum of squared muscle activations11. The JRA computed the resultant forces and moments in each joint. For tibiofemoral forces, the total force was considered as the sum of the lateral and medial compartment vertical forces32. The time series for all variables were then extracted as a function of the stance phase.
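For reference, the static optimization problem and the total contact force described above can be written in a generic form. This is a sketch under stated assumptions, not a formulation taken from the toolbox used here: the symbols are introduced for illustration only, with \(a_m\) the activation of muscle \(m\), \(F_m^{\max}\) the force that muscle can produce at full activation in its current state, \(r_{m,j}\) its moment arm about joint coordinate \(j\), and \(\tau_j\) the net joint moment from inverse dynamics.

$$\begin{aligned} \min_{a_1,\dots,a_n} \; J = \sum_{m=1}^{n} a_m^{2} \quad \text{subject to} \quad \sum_{m=1}^{n} a_m \, F_m^{\max} \, r_{m,j} = \tau_j \;\; \text{for every } j, \qquad 0 \le a_m \le 1 \end{aligned}$$

$$\begin{aligned} F_{TF} = F_{medial} + F_{lateral} \end{aligned}$$

Here \(n = 80\) corresponds to the muscle-tendon units of the model, and \(F_{TF}\) is the total vertical tibiofemoral contact force whose two stance-phase peaks are the prediction targets.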

Dataset organization and machine learning algorithms

Considering that all participants (\(n = 28\)) performed 3–5 trials, the dataset comprised 126 samples. Data were split into 90 samples for training (approximately 70%) and 36 for testing (approximately 30%), according to recent recommendations regarding the optimal ratio for data splitting34. Samples related to a single participant were included either in the training set or in the test set, i.e., no participants from the training dataset were included in the test dataset. The test data were further evaluated independently in three forms: All Participants (36 samples), OA Patients (20 samples), and Control Individuals (16 samples).
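A participant-wise split of this kind can be sketched as follows. The sketch is illustrative only: the function name `split_by_participant`, the random shuffling, and the synthetic trial list are assumptions, since the exact assignment procedure is not described, and no balancing between OA and control participants is attempted.

```python
import random
from collections import defaultdict


def split_by_participant(trials, test_fraction=0.3, seed=0):
    """Split trials so that each participant appears in only one subset.

    trials: list of (participant_id, trial_data) tuples.
    Participants are shuffled, then assigned to the test set until it
    holds at least test_fraction of all trials; the rest go to training.
    """
    by_participant = defaultdict(list)
    for pid, data in trials:
        by_participant[pid].append((pid, data))

    ids = sorted(by_participant)
    random.Random(seed).shuffle(ids)

    target = test_fraction * len(trials)
    train, test = [], []
    for pid in ids:
        if len(test) < target:
            test.extend(by_participant[pid])
        else:
            train.extend(by_participant[pid])
    return train, test


# Synthetic example: 28 participants with 3 to 5 trials each
rng = random.Random(1)
trials = [(pid, f"trial_{k}") for pid in range(28) for k in range(rng.randint(3, 5))]
train, test = split_by_participant(trials)
train_ids = {pid for pid, _ in train}
test_ids = {pid for pid, _ in test}
```

Splitting on participant identifiers rather than on individual trials is what prevents trials from the same person from appearing on both sides of the split.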

To evaluate the dependency between the predicting variables and the accuracy of the tibiofemoral contact forces, three input datasets were assessed (Table 1):

  • Input 1: only with primary kinematic and kinetic data;

  • Input 2: data from Input 1 with hip and knee moments; and

  • Input 3: data from Input 2 with muscle forces.

Table 1 Dataset input assessed by the machine learning algorithms.

In total, 24 machine learning algorithms were evaluated in the present study. Several experiments were performed for each algorithm to identify the best parameters based on training accuracy. The algorithms were selected based on previous literature on the prediction of biomechanics and health sciences data13,14,23,25,35. A brief description of the parameters and hyperparameters (when applicable) tested and selected over the experiments, with the respective references for each algorithm, is presented in Table 12 in Appendix A.

Model evaluation and statistical analysis

The performance of each model for the training dataset and each test dataset (All Participants, OA Patients, and Control Individuals), considering the three input options (Input 1, Input 2, and Input 3), was evaluated based on mean absolute error (MAE), root mean squared error (RMSE), Mean Delta Force (i.e., the difference between the MSK model tibiofemoral force and the predicted tibiofemoral force) with its 95% Confidence Interval (CI), Pearson Correlation Coefficient (R), and the coefficient of determination (R\(^2\)). A coefficient of determination R\(^2>0.70\) was defined as high36,37. Additionally, to have a measure of the error relative to the peak values estimated by the MSK model, we calculated the relative peak error (RPE), expressed as a percentage:

$$\begin{aligned} RPE = \frac{ \left| Predicted_{Peak}-MSK_{Peak} \right| }{MSK_{Peak}}\times 100 \end{aligned}$$
(1)
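The metrics listed above, including Eq. (1), can be computed as in the following sketch. It is not the evaluation code used in this study: the function name `evaluate` and the example values are hypothetical, the 95% CI uses a normal approximation, R\(^2\) is computed as one minus the ratio of residual to total sum of squares, and the RPE of Eq. (1) is averaged over samples, all of which are assumptions.

```python
import math


def evaluate(predicted, reference):
    """Compute error metrics for paired predicted and MSK reference peaks."""
    n = len(reference)
    deltas = [r - p for p, r in zip(predicted, reference)]  # MSK minus predicted
    mae = sum(abs(d) for d in deltas) / n
    rmse = math.sqrt(sum(d * d for d in deltas) / n)

    # Mean delta force with a normal-approximation 95% confidence interval
    mean_delta = sum(deltas) / n
    sd = math.sqrt(sum((d - mean_delta) ** 2 for d in deltas) / (n - 1))
    half_width = 1.96 * sd / math.sqrt(n)

    # Pearson correlation coefficient
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_r = sum((r - mean_r) ** 2 for r in reference)
    pearson_r = cov / math.sqrt(ss_p * ss_r)

    # Coefficient of determination: 1 - SS_res / SS_tot
    r2 = 1.0 - sum(d * d for d in deltas) / ss_r

    # Relative peak error, Eq. (1), averaged over samples, in percent
    rpe = sum(abs(p - r) / r for p, r in zip(predicted, reference)) / n * 100

    return {
        "mae": mae,
        "rmse": rmse,
        "mean_delta": mean_delta,
        "ci95": (mean_delta - half_width, mean_delta + half_width),
        "r": pearson_r,
        "r2": r2,
        "rpe": rpe,
    }


# Hypothetical peak forces, in any consistent unit
reference = [2.0, 2.5, 3.0, 3.5]
predicted = [2.1, 2.4, 3.2, 3.3]
metrics = evaluate(predicted, reference)
```

Note that R\(^2\) defined this way can differ from the square of the Pearson coefficient when predictions are biased, so the two values are reported separately.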

All the algorithms and performance analyses were run using Matlab (MATLAB R2021b, MathWorks, Inc., Natick, MA, USA). Some of the algorithms were implemented by the authors based on previously published code35. The parameter/hyperparameter experiments for each model, the training, and the independent tests were performed on a machine with an Intel® Core™ i7-9750H processor and an NVIDIA GeForce RTX 2060 GPU.

Results

Figure 2 presents, for both OA and Control groups, violin plots of selected kinetic and kinematic data used as predictor variables, as well as the predicted variables (1st and 2nd tibiofemoral force peaks) during gait. The top of the figure shows the vertical ground reaction force peaks during gait. The center shows kinematic data, exemplified by the hip and knee range of motion in the sagittal plane. The bottom shows the key predicted variables of the present study, calculated from the musculoskeletal model.

Tables 13, 14 and 15 in Appendix B present the training results for each model, considering Inputs 1, 2, and 3, respectively. As expected, most of the ML models presented high coefficients of determination and low errors.

Figure 2 Descriptive statistics of selected data explored in the present study for both control and osteoarthritis groups.

The independent tests were performed considering three groups: All Participants (formed by healthy individuals and knee osteoarthritis patients), OA patients, and Controls (formed only by healthy participants). Tables 2, 3, and 4 present the performance of each model for the All Participants group, considering Inputs 1, 2, and 3 as training data, respectively.

When Input 1 was applied as training data, MAE for the 1st peak ranged from 0.17 to 0.49. The Gaussian Regression (Kernel-exponential) presented the highest accuracy (in bold lettering), but good performance was also identified for Gaussian Regression (Kernel-matern 32) and Gaussian SVR. For the prediction of the 2nd peak, results presented lower accuracy, with MAE ranging from 0.28 to 0.91. The highest accuracy was achieved by the DNNE model (in bold text). When Input 2 was set as training data, MAE for the 1st peak ranged from 0.19 to 0.68, with the highest accuracy found for Gaussian Regression (Kernel-matern 32). For the 2nd peak, poor results were found, with MAE ranging from 0.29 to 0.75.

Interestingly, for both peaks, providing more information as training data (i.e., Input 2 comprises the data from Input 1 plus joint moments) did not increase accuracy. However, when Input 3 was used as training data, increased performance was identified. For the 1st peak, MAE ranged from 0.09 to 0.67. The Gaussian SVR model achieved the highest accuracy, but promising results were also identified for Gaussian Regression (Kernel-matern 32 and 52). For the predictions of the 2nd peak, MAE ranged from 0.16 to 0.55, with the highest accuracy found for the Linear SVR model.

Table 2 Summary of the performance of the algorithms for the All Participants group, considering Input 1 as training data.
Table 3 Summary of the performance of the algorithms for the All Participants group, considering Input 2 as training data.
Table 4 Summary of the performance of the algorithms for the All Participants group, considering Input 3 as training data.

Tables 5, 6, and 7 present the performance of each model for the OA group, considering Inputs 1, 2, and 3 as training data, respectively. When Input 1 was set as training data, MAE for the 1st peak predictions ranged from 0.12 to 0.57. The highest accuracy was identified for Gaussian Regression (Kernel-matern 32), with a coefficient of determination of 0.86, but excellent performance was also achieved by Gaussian Regression (Kernel-exponential) and Linear Regression. For the 2nd peak, the best accuracy was achieved by the DNNE regressor, with a coefficient of determination of 0.90 and an RPE lower than 5%.

Considering Input 2 as the training dataset, MAE ranged from 0.14 to 0.64 with the highest accuracy for 1st peak predictions identified for the Gaussian Regression (Kernel-matern 32) considering its highest coefficient of determination and an RPE lower than 7%. For the 2nd peak, the highest coefficient of determination was identified for the DNNE model, but with an MAE around 0.80.

When Input 3 was used as the training dataset, MAE ranged from 0.07 to 1.11 for 1st peak predictions, and the Gaussian Regression (Kernel-matern 32) was the model with the highest coefficient of determination. Excellent results were also identified for the Linear and Gaussian SVR, with an RPE of around 3% or lower. For the 2nd peak predictions, MAE ranged from 0.15 to 0.73, with the highest coefficient of determination identified for the Linear SVR and the lowest RPE for the Neural Networks.

Table 5 Summary of the performance of the algorithms for the OA group, considering Input 1 as training data.
Table 6 Summary of the performance of the algorithms for the OA group, considering Input 2 as training data.
Table 7 Summary of the performance of the algorithms for the OA group, considering Input 3 as training data.

Finally, independent tests were also performed for healthy participants, labelled as the Control group. Tables 8, 9, and 10 present the algorithms’ performance with Inputs 1, 2, and 3 as the training datasets, respectively. Input 1 as the training dataset resulted in MAE ranging from 0.17 to 0.50 for the 1st peak prediction, with the highest coefficient of determination achieved by Gaussian regressors (Kernel matern 52 and rational quadratic). Excellent performance was also identified for the Kernel squared exponential Gaussian model, with an RPE lower than 7%. For the 2nd peak prediction, MAE ranged from 0.38 to 0.72, with the highest accuracy achieved by the Gaussian Regressor (Kernel squared exponential).

When Input 2 was set as training data, for the 1st peak prediction, MAE ranged from 0.17 to 0.47. The highest accuracy was identified for the Cubic SVR, with an RPE lower than 8% and a coefficient of determination higher than 0.70. The Gaussian regressors (Kernel matern 32 and 52) also achieved promising performance, with an RPE lower than 7%. For the 2nd peak prediction, Quadratic SVR achieved the highest performance, with an RPE lower than 6% and a coefficient of determination of 0.80. Considering all models, MAE ranged from 0.14 to 0.55.

MAE ranged from 0.10 to 0.26 for the 1st peak prediction, considering Input 3 as the training dataset. The lowest RPE was identified for the Cubic SVR model and the highest coefficient of determination (0.98) for the Lasso Regression. For the 2nd peak, MAE ranged from 0.09 to 0.30. The lowest RPE was identified for the Gaussian SVR, while Kernel Ridge Regression presented the highest coefficient of determination (0.92).

Table 8 Summary of the performance of the algorithms for the Control group, considering Input 1 as training data.
Table 9 Summary of the performance of the algorithms for the Control group, considering Input 2 as training data.
Table 10 Summary of the performance of the algorithms for the Control group, considering Input 3 as training data.

Discussion

This study presented a comprehensive evaluation of different machine learning models to predict tibiofemoral contact forces during gait in healthy individuals and knee OA patients. Results were analyzed in light of different training datasets. The main results were: (a) accurate predictions of the tibiofemoral contact forces were possible using machine learning algorithms, independent of the participants’ features (healthy or OA); (b) in general, the 1st force peak was not very sensitive to changes in the input dataset, reaching promising results with only primary kinetic and kinematic data; (c) in general, the 2nd force peak was sensitive to changes in the input data, since better results were achieved when a greater range of variables was used as training data; (d) when analyzed independently by the pre-trained machine learning models, the OA and Control groups showed promising accuracy for both peaks using primary data alone or primary data combined with lower-limb joint moments.

Machine learning algorithms’ performance was evaluated considering different numbers of predictor variables (labelled as Input 1, 2, and 3) in the training dataset. It is important to emphasize that the training dataset was composed of healthy individuals and knee OA patients, while independent tests were performed on a mixed group (labelled as All Participants, with healthy and symptomatic individuals) and on the separate groups. No participants included in the training dataset were evaluated during the independent tests, so that the reported performance reflects generalization to new, unseen participants rather than dependency between training and test data38. In general, our results presented similar or higher accuracy for knee contact force prediction when compared to a previous study with total knee replacement patients13, which reported mean correlation coefficients ranging from 0.93 to 0.94, and when compared to the study of Giarmatzis and colleagues14, which reported correlation coefficients ranging from 0.89 to 0.98. However, the previous study included some trials from the participants in the training set and other trials from the same participants in the test set. When the trials from each participant were kept entirely in either the training set or the test set, correlation coefficients ranged from 0.45 to 0.85.

In general, our results show that the 1st force peak was accurately predicted, even when only primary kinematic and kinetic data were used as the training dataset. Gaussian regressors and variations (Kernel exponential, matern 32, and matern 52) provided promising results, with coefficients of determination above 0.70 and relative peak error under 7%. The Gaussian regressors family is considered a non-parametric model, which considers the probability distribution over all admissible functions that fit the data, allowing for flexible modeling of complex and non-linear relationships between variables35. During gait, the 1st tibiofemoral contact force peak is clinically relevant because it is related to the maximum force experienced by the knee joint following the initial contact of the foot with the ground. This instant is associated with quadriceps eccentric contraction to counterbalance knee flexion during the loading response phase. A good prediction of this variable extends the possibility of understanding the knee compressive loads, which may reach approximately 3 times body weight at normal walking speed39. Our comprehensive evaluation suggests that this information may be accurately predicted with a relatively small amount of biomechanical data.
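To make the Gaussian regressors discussed above more concrete, the following is a minimal sketch of Gaussian process regression with a Matérn 3/2 kernel. It is not the Matlab implementation used in this study: the function names (`matern32`, `solve`, `gp_fit`, `gp_predict`), the hand-fixed hyperparameters (length scale, signal and noise standard deviations), the constant mean, and the toy data with a single hypothetical predictor are all assumptions made for illustration.

```python
import math


def matern32(x1, x2, length_scale=1.0, sigma_f=1.0):
    """Matern 3/2 covariance between two feature vectors."""
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
    s = math.sqrt(3.0) * r / length_scale
    return sigma_f ** 2 * (1.0 + s) * math.exp(-s)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_fit(X, y, noise=1e-3, **kernel_args):
    """Precompute alpha = (K + noise^2 I)^-1 (y - mean(y))."""
    mean_y = sum(y) / len(y)
    K = [[matern32(a, b, **kernel_args) + (noise ** 2 if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = solve(K, [v - mean_y for v in y])
    return X, alpha, mean_y, kernel_args


def gp_predict(model, x_new):
    """Posterior mean at x_new: mean(y) + k(x_new, X) . alpha."""
    X, alpha, mean_y, kernel_args = model
    return mean_y + sum(a * matern32(x, x_new, **kernel_args)
                        for x, a in zip(X, alpha))


# Toy data: one hypothetical predictor and a peak force target
X = [[0.0], [1.0], [2.0], [3.0]]
y = [2.0, 2.6, 2.9, 2.7]
model = gp_fit(X, y)
near = gp_predict(model, [1.0])   # at a training point
far = gp_predict(model, [100.0])  # far from all training data
```

The prediction reproduces the training targets almost exactly when the noise term is small and falls back to the mean of the targets far from the training data, which illustrates why such models interpolate flexibly but extrapolate conservatively.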

On the other hand, ML models needed more information to perform well in predicting the 2nd tibiofemoral contact force peak, mainly when the All Participants group was evaluated. For predictions specifically in the OA or Control groups, the Input 1 and Input 2 datasets were enough. Linear SVR presented the highest accuracy for the All Participants group. However, it demanded more complex data for good predictions, such as information on muscle forces. For the OA group, promising results were identified for Fast Decorrelated Neural Network Ensembles (DNNE), for which Input 1 training data alone was enough for accurate predictions. DNNE randomly initializes the hidden layer parameters of base random vector functional link networks and then employs the least squares method with a negative correlation learning scheme to analytically calculate the output weights of these base networks40. It is a fast and efficient solution to build ensemble models, which facilitates its application for analyzing biomechanical data, reducing the computational bottleneck for obtaining internal biomechanical parameters. For the 2nd tibiofemoral force peak in the Control group, promising results were obtained with the Quadratic SVR model, with Input 2 as the training dataset. Quadratic SVR also performed well in other health science problems (e.g., brain age prediction), showing flexibility in data generalization35. The second tibiofemoral contact force peak is clinically relevant during gait because it is associated with the push-off phase. This phase is critical for efficient forward movement and may be connected to functional ability. The difficulty in predicting the second peak may be attributed to the different coordination patterns observed during late stance. As shown in Fig. 2, exploratory analysis reveals that the data distribution for the 2nd peak presents great variability, mainly for the OA group. We speculate that this variability may explain the worse predictions of the machine learning models for the 2nd peak.

The most promising results were achieved when the OA and Control participants were tested separately. This indicates that using models according to participants’ diagnoses/characteristics may improve the model’s output. Table 11 provides a summary of the best model for each group based on the following criteria: the need for the least amount of data as input (i.e., Input 1 is preferred over Input 2 and Input 2 over Input 3), and the model with the lowest MAE but with at least 0.7 variance explained36,37. The models may be chosen accordingly if a participant is properly classified as knee OA or healthy. If there is no clear classification, the participant may be evaluated as belonging to the ‘All participants’ group. For these situations, it may be necessary to collect all variables included in Input 3 to obtain a more accurate prediction of the 2nd tibiofemoral force peak during gait. However, in terms of avoiding hardware or computational bottlenecks, Input 3 represents almost the entire process of data processing and analysis, including the time-consuming static optimization procedure. One can argue that there is no great advantage in using Input 3 to reduce the associated processing time. Nevertheless, it is important to emphasize that when Input 3 was used as the training dataset, RPE was around 4.7%, against \(\approx\)10% for the Quadratic SVR when Input 2 was used. Thus, researchers and clinical professionals may weigh the pros and cons of each model and input combination to choose the most appropriate procedure depending on the evaluation objectives and accepted error thresholds. Additionally, it seems promising to perform an in-depth evaluation of the role that each variable plays in the quality of the predictions. Although an evaluation of the weight of each variable in the model is possible for the linear models, the non-linear models are more complex and require further development of new algorithms to identify the key variables and the explained variance for the predictions with the best models.
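The selection criteria stated above can be expressed as a short sketch. It reflects one possible reading of those criteria (first the smallest input set that has any model reaching the R\(^2\) threshold, then the lowest MAE within that set), and the function name `select_best`, the model labels, and the numbers are placeholders that do not correspond to any value in the tables.

```python
def select_best(results, min_r2=0.70):
    """Pick a model following the stated criteria.

    results: list of dicts with keys 'model', 'input' (1, 2 or 3),
    'mae' and 'r2'. Models below min_r2 are discarded; among the rest,
    the smallest input set wins, with ties broken by the lowest MAE.
    """
    eligible = [r for r in results if r["r2"] >= min_r2]
    if not eligible:
        return None
    return min(eligible, key=lambda r: (r["input"], r["mae"]))


# Placeholder results, not taken from the tables of this study
results = [
    {"model": "model_a", "input": 1, "mae": 0.30, "r2": 0.65},
    {"model": "model_b", "input": 2, "mae": 0.25, "r2": 0.75},
    {"model": "model_c", "input": 2, "mae": 0.20, "r2": 0.72},
    {"model": "model_d", "input": 3, "mae": 0.10, "r2": 0.90},
]
best = select_best(results)
```

In this example the Input 3 model has the lowest error, but an Input 2 model is selected because it already meets the threshold with less required data, which mirrors the trade-off between accuracy and processing effort discussed above.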

Future work may investigate, building on the evaluation of these 24 models, a fusion41 of the best-performing ones to further improve prediction accuracy with the least amount of data required. Additionally, the promising results for estimating tibiofemoral contact forces from primary kinematic and kinetic data highlight a broad possibility of providing accurate biomechanical analysis in clinical settings. Moreover, IMU24 and markerless systems42 represent low-cost alternatives to provide biomechanical data that, together with the ML algorithms evaluated in the present study, may supply joint contact force reports with much less time-consuming protocols.

Table 11 Summary of the best algorithms for each group.

Finally, this study has some limitations to be highlighted. Although our sample size is relatively large considering the specific inclusion criteria for the OA patients, larger datasets are desirable for ML evaluation study designs. Thus, it is possible that, with more samples, other ML models may outperform the ones reported in the present study, or that the accuracy of the best models presented here may improve further. Also, the symptomatic group was composed of patients with severe unilateral knee OA (KL4 class). Thus, our results may not generalize to different stages of OA. Further studies assessing ML methods in scenarios in which more variation in OA characteristics between patients is included in both training and test datasets will help to improve model predictions. In this sense, two alternatives deserve attention. The first is that machine learning models may benefit from public multimodal datasets43 to improve the training step. However, cooperation from the scientific community is also necessary to provide public datasets not only of injury-free participants but also of pathological individuals, such as OA patients. The second promising alternative is to apply deep learning (DL) solutions presented in the literature for synthetic data generation, such as “generative adversarial networks” (GANs)44. Future studies may also investigate the potential of such data augmentation strategies to improve the accuracy of the models, specifically for pathological individuals with respect to their physical function condition. We also emphasize that both the training and test datasets included males and females. One can argue that sex-specific regression models may outperform generic models. However, an additional split of our data into female and male datasets for training and testing would restrict the generalization of the results. On the other hand, considering the very promising results reported in the present study with a combined sample, future studies with refined models are highly recommended. Lastly, it is important to consider that the tibiofemoral forces used in this study are derived from musculoskeletal simulations, and the outcomes are influenced by factors such as the choice of the model, scaling techniques, and optimization processes10,45. However, it should be noted that direct in vivo measurements have limitations in terms of sample size and applicability, as they rely on the use of an instrumented knee prosthesis.

Conclusion

This study evaluated 24 machine learning models to predict tibiofemoral contact forces in healthy individuals and knee OA patients. Machine learning models could predict tibiofemoral contact forces and may be an alternative for sites with limited infrastructure for biomechanical evaluations. Our study provided insights into the most promising models considering the amount of biomechanical data required as input according to the participant’s classification (healthy or knee OA), representing an important starting point for the generalization of biomechanical analyses in clinical settings, as well as for improvements in the equations that musculoskeletal models use to calculate joint reaction forces.