Deep Learning Predicts Total Knee Replacement from Magnetic Resonance Images

Knee Osteoarthritis (OA) is a common musculoskeletal disorder in the United States. When diagnosed at early stages, lifestyle interventions such as exercise and weight loss can slow OA progression, but at later stages, only an invasive option is available: total knee replacement (TKR). Though a generally successful procedure, only 2/3 of patients who undergo the procedure report their knees feeling “normal” post-operation, and complications can arise that require revision. This necessitates a model to identify a population at higher risk of TKR, particularly at less advanced stages of OA, such that appropriate treatments can be implemented that slow OA progression and delay TKR. Here, we present a deep learning pipeline that leverages MRI images and clinical and demographic information to predict TKR with AUC 0.834 ± 0.036 (p < 0.05). Most notably, the pipeline predicts TKR with AUC 0.943 ± 0.057 (p < 0.05) for patients without OA. Furthermore, we develop occlusion maps for case-control pairs in test data and compare regions used by the model in both, thereby identifying TKR imaging biomarkers. As such, this work takes strides towards a pipeline with clinical utility, and the biomarkers identified further our understanding of OA progression and eventual TKR onset.

Given the multitude of factors on which a decision to pursue TKR is made, devising a model to predict if the invasive intervention will be necessary is a difficult task, but one with obvious utility. For a patient in earlier stages of OA, a model predicting the patient to be at risk of TKR can be the impetus for a more aggressive nonsurgical treatment. Meanwhile, for a late-stage OA patient, a model predicting them to undergo TKR may facilitate a doctor and patient opting for the treatment earlier than they otherwise would, thereby reducing time spent pursuing nonsurgical alternatives with minimal probability of success while dealing with serious pain. Beyond this, if the model were to draw from medical images of the knee, it could identify anatomic regions most correlated with a TKR prediction. To date, few studies have been conducted in this space, and those that have primarily investigate the importance of cartilage volume loss, subchondral bone defects, and bone marrow lesions [17][18][19]. An identification of more such biomarkers for TKR, however, could greatly improve understanding of both OA and TKR, and ultimately guide treatment strategies.
Predictive modeling of TKR, however, has a limited history, particularly with models that use medical images. A few studies have leveraged random forest regression, Cochran-Armitage tests for trend, and t-tests to identify demographic, general health, and physical examination measurements that most strongly correlate with TKR or total joint arthroplasty (TJA) [20][21]. Others have taken these efforts further, using techniques such as multiple regression and multivariate risk prediction models to predict TKR outright [22][23]. To our knowledge, only one group has developed a predictive model of TKR that accepts image inputs, attaining performance that surpasses that of models using only clinical and demographic information [24]. Notably, past TKR predictive models largely measure performance by evaluating the area under the receiver operating characteristic (ROC) curve, which plots true positive rate against false positive rate [25]. However, in most datasets used in this space, the number of patients who eventually undergo TKR is dramatically higher among those who have advanced OA as opposed to those with no or moderate OA. Consequently, this performance metric (AUC), while effectively capturing a model's combination of sensitivity and specificity, can be inflated for TKR prediction by indiscriminately predicting patients without OA not to undergo TKR, while more accurately predicting patients with severe OA to undergo TKR, the latter of which is easier. As a result, while past works have made clear progress in predicting TKR, none have overcome datasets imbalanced with respect to OA severity to report sensitive and specific prediction at these early stages, where a model would have the most utility.
One technique that has shown promise in delivering such performance is deep learning (DL). DL, especially convolutional neural networks (CNNs), has made strides in image classification tasks, attaining performances on the popular ImageNet classification challenge that approach or surpass human performance [26][27][28]. DL shines when afforded large datasets, as its automated feature extraction allows one to solve problems too complex for conventional approaches [29]. Given the complex prognostic features in TKR recommendation, CNNs become more promising for TKR prediction. In the past, DL had seen limited utility in OA and TKR prediction due to the large dataset requirement for efficacy; that limitation has been somewhat mitigated by the curation of large cohort studies such as the Osteoarthritis Initiative (OAI) [30]. Consequently, DL has recently been applied for knee OA classification and progression prediction [9][31][32][33]. The success of these works further suggests the feasibility of leveraging DL to predict TKR.
In this study, we formulate a DL-based pipeline that incorporates knee joint images in addition to clinical and demographic information to predict the onset of TKR (Fig. 1). We demonstrate that the pipeline's performance using solely Magnetic Resonance Imaging (MRI) images matches that of past work, while the integration of MRI image-based predictions with non-imaging variables facilitates TKR prediction with especially high sensitivity and specificity for patients without radiographic OA. Furthermore, we show the increase in pipeline performance when using 3D MRI images as opposed to 2D radiographs, suggesting MRI may have a role in TKR risk screening despite higher costs and more limited availability. And finally, we leverage occlusion maps to conduct a thorough analysis of the pipeline and identify imaging biomarkers of TKR.
Figure 1. Pipeline predicting if patient will undergo TKR within 5 years from MRI/X-ray images and non-imaging variables. MRI and X-ray images are center-cropped and cropped to a region centered around the joint, respectively, and normalized. DenseNet-121 is pretrained to predict OA and fine-tuned to predict TKR. Image-based predictions and clinical information are fed to a logistic regression (LR) ensemble based on OA severity. Each ensemble, whose hyperparameters were optimized for Youden's index in a hyperparameter search, averages predictions of the LR models for its OA severity category to give the final TKR prediction. The pipeline is subsequently analyzed through occlusion map analysis to identify imaging biomarkers of TKR.
Pipeline architecture. The DL-based pipeline is based on a DenseNet-121 with the following parameters: 16 filters in the initial layer, growth rate of 32, pooling block configuration of [6,12,24,16], 4 bottleneck layers, and 2 classes. The same architecture was used for the radiograph and MRI pipelines, but for the MRI pipeline, we modified the convolutional layers, batch normalization layers, pooling layers, and leaky rectified linear unit (ReLU) layers to allow for 3D image input [40]. The network yielded a scalar reflecting certainty of TKR within 5 years, which was added to the 27 non-imaging variables. The 28 resulting variables were fed into one of three Logistic Regression (LR) ensembles, each optimized to maximize sensitivity and specificity for one OA severity category: no OA (KL = 0, 1), moderate OA (KL = 2, 3), or severe OA (KL = 4). Each sample was routed to the LR ensemble matching its KL grade, yielding a prediction as to whether the patient will undergo a TKR within 5 years.
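The routing step described above can be illustrated with a minimal sketch. It assumes each fitted LR model is available as a callable that maps the 28-variable feature vector to a TKR probability; the names `severity_group` and `predict_tkr`, the dictionary layout, and the 0.5 decision threshold are hypothetical choices for illustration and are not taken from the study.

```python
from statistics import mean

def severity_group(kl_grade):
    """Map a Kellgren-Lawrence grade to the OA severity category used for routing."""
    if kl_grade in (0, 1):
        return "none"
    if kl_grade in (2, 3):
        return "moderate"
    if kl_grade == 4:
        return "severe"
    raise ValueError(f"unexpected KL grade: {kl_grade}")

def predict_tkr(kl_grade, features, ensembles, threshold=0.5):
    """Average the probabilities of the LR models in the matching ensemble.

    `ensembles` maps a severity category to a list of callables (hypothetical
    stand-ins for fitted LR models). The threshold is an assumed value.
    """
    models = ensembles[severity_group(kl_grade)]
    probability = mean([model(features) for model in models])
    return probability, probability >= threshold
```

A sample with KL grade 1 would thus be scored only by the models in the "none" ensemble, so each ensemble can be tuned independently for its own severity category.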

Training.
A DenseNet-121 was initially pretrained to predict knee OA using the entire training set, assessing cross-entropy loss and accuracy on the validation set after completion of each epoch. Pretraining was stopped when validation loss began to increase. The pretrained model was subsequently fine-tuned to predict TKR. We utilized a random search to determine the optimal learning rate, dropout rate, weights of the cross-entropy loss function, and number of layers to freeze during fine-tuning. The search was carried out for 25 iterations, after which the set of parameters was selected that yielded the best combination of accuracy, sensitivity, and specificity on the validation set. Due to computational intensity, the hyperparameter search was not conducted on the entire dataset: for the 2D DenseNet-121, 10% of the training and validation sets were used, whereas for the 3D DenseNet-121, 2.5% of both were used. After the search, the model fine-tuned using the subset of the training set was further fine-tuned on the entire training set using the optimal parameters until validation loss began to increase. The test set was held out during training and predictions for it were evaluated just once after fine-tuning, which marked the end of model optimization.
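A random search of this kind can be sketched as follows. The sampling ranges, the log-uniform learning-rate distribution, and the single scalar returned by the hypothetical `evaluate` callback are assumptions made for illustration; the study reports only which hyperparameters were searched and that 25 iterations were run.

```python
import random

def sample_config(rng):
    """Draw one candidate configuration (all ranges are assumed, not from the study)."""
    w_tkr = rng.uniform(0.5, 1.0)  # weight on the TKR class
    return {
        "learning_rate": 10 ** rng.uniform(-6, -1),  # log-uniform draw
        "dropout": rng.uniform(0.0, 0.5),
        "class_weights": (1.0 - w_tkr, w_tkr),       # (non-TKR, TKR), summing to 1
        "n_finetuned_layers": rng.randint(1, 5),
    }

def random_search(evaluate, n_iter=25, seed=0):
    """Evaluate n_iter random configurations and return the best-scoring one.

    `evaluate` is a hypothetical callback that fine-tunes a model with the
    given configuration and returns one validation score to maximize.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_iter):
        config = sample_config(rng)
        trials.append((evaluate(config), config))
    best_score, best_config = max(trials, key=lambda trial: trial[0])
    return best_config, best_score, trials
```

In practice the score would have to combine validation accuracy, sensitivity, and specificity in some way; how the authors weighed the three is not stated, so that choice is left to the callback.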
Integration of imaging and non-imaging data. Random forest regression, support vector machine, neural network, and LR architectures were assessed for efficacy of integrating imaging and non-imaging predictions, with LR providing the best results on validation data. The LR architecture was thus used: all 28 imaging and non-imaging variables were fed into an LR model, the optimal parameters of which were also identified through a random search. The search was conducted for 100 iterations, seeking to optimize the cross-entropy loss function weights afforded to both classes. For the cases of no, moderate, and severe OA, ideal parameters were identified by selecting those that maximized Youden's index within each OA classification in the search [41]. Predictions of the best few models in each classification were averaged to yield final TKR predictions. The number of predictions averaged in each classification was selected by finding a value that optimized validation accuracy, AUC, and Youden's index. The resulting LR models were ensembled and run on test data just once. Confidence intervals of accuracy, sensitivity, and specificity for each OA severity were obtained by bootstrapping, sampling 100% of test data with replacement (B = 100). Confidence intervals for AUC were calculated in the same manner. Results are reported on 3 versions of each model: the sole DenseNet-121 output (image only), the output of a single LR model trained to predict TKR using solely the 27 non-imaging variables without class weighting in the loss function (non-imaging info. only), and the output of the LR ensemble with image predictions (integrated model).
Statistical analysis. The accuracies of the X-ray and MRI pipelines within each OA classification and overall were compared using McNemar's test [42][43]. This test was appropriate because it specifically tests for differences in a dichotomous variable in matched groups. In our case, the variable was correct TKR prediction and the groups were the X-ray and MRI pipelines. Initially, the McNemar test statistic was modeled with a chi-squared distribution to test for significant differences between the pipelines, and if one existed, a binomial distribution was used to interrogate which pipeline yielded the significantly higher performance. All tests were carried out at α = 0.05.
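As an illustration of the two-step McNemar procedure, the sketch below takes per-sample correctness flags for two pipelines evaluated on the same test cases. It assumes the chi-squared statistic is computed without a continuity correction and that the follow-up binomial test is one-sided; the source does not specify either detail, and the function name `mcnemar` is hypothetical.

```python
from math import comb, erfc, sqrt

def mcnemar(correct_a, correct_b):
    """McNemar's test on paired correctness flags for pipelines A and B."""
    # Discordant pairs: b = A right and B wrong, c = B right and A wrong.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return {"b": 0, "c": 0, "chi2": 0.0, "p_chi2": 1.0, "p_binom": 1.0}
    chi2 = (b - c) ** 2 / n           # uncorrected statistic (assumption)
    p_chi2 = erfc(sqrt(chi2 / 2.0))   # chi-squared survival function, 1 d.o.f.
    # One-sided exact binomial tail: probability of a split at least as
    # lopsided as the observed one, in the observed direction, under p = 0.5.
    k = max(b, c)
    p_binom = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return {"b": b, "c": c, "chi2": chi2, "p_chi2": p_chi2, "p_binom": p_binom}
```

Only the discordant pairs enter the statistic, which is why the test suits a comparison of two models scored on identical test cases.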
Relative sensitivity and specificity of the X-ray and MRI pipelines were assessed by comparing their AUCs within each OA classification and overall. This test is appropriate because the ROC curve plots true positive rate (sensitivity) against false positive rate (1 − specificity); consequently, the closer the AUC is to 1, the better the combination of sensitivity and specificity. 100% of test data was sampled with replacement (B = 100), and for each corresponding pair of X-ray and MRI pipelines (matched by OA classification and use of images only or both image and non-image information), AUCs were calculated. To test if one outperformed the other, differences in AUCs were calculated at each iteration, and the mean and standard deviation of the differences were used to conduct a Student's t-test with 99 degrees of freedom. This test is applicable to each matched pair of X-ray and MRI pipelines due to the number of iterations for which test data was sampled, allowing the central limit theorem to apply. For confidence intervals, the mean and standard deviation of AUCs of individual models were calculated and used to report 95% intervals.
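The paired bootstrap comparison can be sketched as below. The AUC is computed from pairwise score comparisons (equivalent to the Mann-Whitney statistic), and the t statistic is formed as the mean difference divided by its standard error over the B replicates. That last step is one reading of the description above and should be treated as an assumption, as should the function names.

```python
import random
from statistics import mean, stdev

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_difference(labels, scores_a, scores_b, n_boot=100, seed=0):
    """Resample the test set with replacement and compare AUCs of two models."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # redraw samples that contain a single class
        diffs.append(auc(y, [scores_a[i] for i in idx])
                     - auc(y, [scores_b[i] for i in idx]))
    m, s = mean(diffs), stdev(diffs)
    # t statistic with n_boot - 1 degrees of freedom (assumed formulation).
    t = m / (s / n_boot ** 0.5) if s > 0 else None
    return m, s, t
```

Using the same resampled indices for both models keeps the comparison paired, so variation due to which patients were drawn cancels in the difference.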

Imaging biomarker identification.
For all 124 true positives in the test data for the integrated MRI pipeline, corresponding controls were identified by randomly sampling from test data true negatives, keeping OA status distributions identical and using a Student's t-test with 123 degrees of freedom to ensure no significant difference in KOOS pain scores across cases and corresponding controls at α = 0.05. Occlusion maps were generated for all cases and controls using a voxel size of 12 × 32 × 32 and a stride of 12. For each pixel, the value displayed represented the magnitude of change in the scalar pipeline output resulting when that pixel was occluded, averaged across all occlusions in which that pixel existed. Pixels for which the change in scalar pipeline output lay in the top 5% were designated as "hotspots." Anatomic regions of these hotspots were identified and odds ratios (OR) calculated to interrogate possible imaging biomarkers of TKR. 95% OR confidence intervals were calculated for each anatomic region investigated in this analysis using Cornfield's method, as this method performs well with relatively small sample sizes [44]. P values of ORs were calculated using a two-tailed Fisher's exact test [45]. Tissues where p values fell below the significance level of α = 0.05 and in which 95% OR confidence intervals did not include 1 were deemed significant. These test selections were appropriate, as they allowed for direct comparison of the frequencies at which several tissues were hotspots across cases and controls, and as such, identified significant tissues with regard to TKR onset.
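The occlusion procedure described above can be sketched in plain Python on a small nested-list volume. The fill value of 0, the use of one stride for all three axes, and the tie handling at the hotspot threshold are assumptions; `predict` is a hypothetical stand-in for the full pipeline's scalar output.

```python
def occlusion_map(volume, predict, patch, stride, fill=0.0):
    """Average absolute change in predict() over every occlusion covering each voxel."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    pd, ph, pw = patch
    baseline = predict(volume)
    total = [[[0.0] * width for _ in range(height)] for _ in range(depth)]
    count = [[[0] * width for _ in range(height)] for _ in range(depth)]
    for z0 in range(0, depth - pd + 1, stride):
        for y0 in range(0, height - ph + 1, stride):
            for x0 in range(0, width - pw + 1, stride):
                # Work on a copy so the input volume is never modified.
                occluded = [[row[:] for row in plane] for plane in volume]
                block = [(z, y, x)
                         for z in range(z0, z0 + pd)
                         for y in range(y0, y0 + ph)
                         for x in range(x0, x0 + pw)]
                for z, y, x in block:
                    occluded[z][y][x] = fill
                delta = abs(predict(occluded) - baseline)
                for z, y, x in block:
                    total[z][y][x] += delta
                    count[z][y][x] += 1
    # Voxels never covered by any occlusion keep a value of 0.
    return [[[total[z][y][x] / count[z][y][x] if count[z][y][x] else 0.0
              for x in range(width)] for y in range(height)] for z in range(depth)]

def hotspot_mask(heat, top_fraction=0.05):
    """Flag voxels whose value is at or above the top-fraction cutoff (ties included)."""
    values = sorted((v for plane in heat for row in plane for v in row), reverse=True)
    k = max(1, int(top_fraction * len(values)))
    threshold = values[k - 1]
    return [[[v >= threshold for v in row] for row in plane] for plane in heat]
```

With a stride smaller than the patch size in some axes, occlusions overlap and each voxel's value is an average over several patches, which matches the averaging described in the text.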

Results
OA pretrain utility in TKR prediction. To test information learned from the OA pretrain, pretrained models themselves were used to predict TKR, with results depicted in Table 3. Predictably, the radiograph OA pretrain model had poor sensitivity for patients without OA, and poor specificity in moderate and severe cases of OA. While the MRI OA pretrain model expectedly yielded more balanced sensitivity and specificity across all OA stages, it too left room for improvement, particularly in sensitivity at no OA and specificity at severe OA. This confirmed the pretrain provided useful information to both architectures but fine-tuning and integration of non-imaging variables were necessary to attain desired TKR prediction performance.

X-Ray pipeline optimization and performance.
For the X-Ray model, hyperparameter tuning steps found the following to yield the best combination of validation accuracy, sensitivity, and specificity: learning rate of 3.981 × 10⁻⁶, TKR class weight in cross-entropy loss function of 0.927 and non-TKR class weight of 0.073, dropout rate of 0.375, and only the last 2 layers fine-tuned after OA pretrain.
A radiograph model was fine-tuned to predict TKR with these parameters, and its predictions were fed into an LR ensemble. Averaging predictions of the best 5 LR models found through random search in each of the 3 OA categories yielded the best validation performance, so this ensemble was used on the test set. Test accuracy, sensitivity, and specificity are provided in Table 4.

MRI pipeline optimization and performance.
Similarly, a hyperparameter search was carried out for the MRI pipeline to optimize parameters for eventual fine-tuning. The following hyperparameters were found optimal: learning rate of 1.906 × 10⁻², TKR class cross-entropy weight of 0.902 and non-TKR class weight of 0.098, dropout rate of 0.329, and only the last layer of the model fine-tuned after OA pretrain.
An MRI-based model was fine-tuned from these parameters. The resulting predictions were fed into an LR ensemble, where averaging predictions of the best 4 models in each OA category optimized validation performance. Performance of the resulting architecture on test data is reported in the same manner as the radiograph pipeline, in Table 4.

Comparison of MRI and radiograph pipeline performances.
A comparison of AUCs attained by the integrated MRI and X-ray pipelines within each OA grade and overall shows that at no OA and severe OA, the MRI pipeline outperformed the X-ray pipeline (No OA, B = 100: p = 3.04 × 10⁻²; Moderate OA, B = 100: p = 9.55 × 10⁻¹; Severe OA, B = 100: p = 4.57 × 10⁻²; Overall, B = 100: p = 9.94 × 10⁻¹). The MRI pipeline thus has a better combination of sensitivity and specificity than the X-ray pipeline for patients without OA and those with severe OA. The AUCs obtained by the image-only pipelines were also compared, and showed the MRI pipeline to outperform the X-ray pipeline for patients without OA and overall (No OA, B = 100: p = 6.10 × 10⁻⁵; Moderate OA, B = 100: p = 7.58 × 10⁻¹; Severe OA, B = 100: p = 4.37 × 10⁻¹; Overall, B = 100: p = 1.16 × 10⁻²). These results follow intuition: while radiographic imaging is primarily capable of illuminating bones in the joint, MRI can visualize soft tissues such as cartilage, muscle, and meniscus [46][47]. It follows that an MRI model will exhibit a better combination of sensitivity and specificity, especially in early OA stages at which few radiographic changes in the knee have occurred. ROC curves for pipeline versions and OA classifications in which the MRI architecture yielded a significantly better AUC than its X-ray counterpart are shown in Fig. 3.
McNemar's test assessed relative accuracies of these pipelines. There was a statistically significant difference between the accuracies of the integrated X-ray and MRI pipelines for patients at no OA, moderate OA, and overall (No OA, n = 537: p = 1.65 × 10⁻⁵⁹; Moderate OA, n = 521: p = 1.13 × 10⁻⁹; Severe OA, n = 47: p = 8.84 × 10⁻¹; Overall, n = 1,105: p = 1.52 × 10⁻⁵⁴), and in each of those 3 statistically significant cases, the X-ray pipeline outperformed the MRI pipeline (No OA, n = 537: p = 1.11 × 10⁻¹⁶; Moderate OA, n = 521: p = 5.97 × 10⁻¹⁰; Overall, n = 1,105: p = 1.11 × 10⁻¹⁶). In interpreting these tests and the AUC tests holistically, it is evident that the X-ray pipeline is able to attain superior accuracy in several OA classifications by compromising on its combination of sensitivity and specificity. This is further supported by the accuracies and sensitivities reported for the respective pipelines in Table 4, which show that while the X-ray pipeline is more accurate than its MRI counterpart at every OA classification, the opposite is true for sensitivity, drastically so for patients without OA. In the clinic, where sensitivity as to whether a patient is at risk of eventual TKR is paramount, these results would show the MRI pipeline to be the more useful model.
It is also worth noting the improvement in performance that occurs for patients without OA when non-imaging variables are integrated with imaging predictions in both pipelines. In the X-ray pipeline, the model's AUC increased from 0.514 ± 0.087 to 0.799 ± 0.055 when non-imaging variables were added to the radiographs, a sizeable increase when compared to the MRI pipeline, which saw AUC increase from 0.897 ± 0.039 to 0.943 ± 0.029 (p < 0.05 for all). This suggests that non-imaging variables such as various pain scales add critical information to the X-ray pipeline, while the same information is less important in the MRI pipeline.
Biomarker identification and analysis. Of the 152 patients in test data who underwent a TKR, 124 were detected by the MRI pipeline. Occlusion maps were generated for these cases and their corresponding true negative controls, an example of which is shown in Fig. 4. Tissues and their hotspot percentages across these true positives and corresponding true negative controls can be found in Supplementary Tables S2 and S3, respectively. ORs, 95% confidence intervals, and associated p values for each tissue can be found in Table 5.
Three tissues saw ORs and 95% confidence intervals that lay above 1 and p values below α = 0.05: the medial patellar retinaculum, gastrocnemius tendon, and plantaris muscle. Thus, we conclude there is a substantial and statistically significant difference in the risk of TKR within 5 years when these tissues are identified as hotspots by the pipeline. From the ORs, we see that the risk of TKR increases when any of the three are identified as hotspots: for the medial patellar retinaculum, the risk is 1.98 times higher with a 95% confidence interval from 1.02 to 3.99; for the gastrocnemius tendon, it is 2.97 times higher with a 95% confidence interval from 1.12 to 10.0; and for the plantaris muscle, it is 2.84 times higher with a 95% confidence interval from 1.47 to 5.82. As such, these results provide evidence that all are imaging biomarkers of TKR.
On the other hand, several tissues located within or near the tibiofemoral joint (namely, cartilage and bone in both medial and lateral locations of the joint, menisci in all tested regions, and the ACL) saw ORs and 95% confidence intervals entirely below 1 and p values below α = 0.05. Consequently, for all of these tissues, we find a statistically significant difference in the risk of TKR within 5 years when these tissues are identified as hotspots. In the case of each, the risk of TKR appears to decrease when these tissues are identified as hotspots. Interestingly, each of these tissues has either been implicated as an imaging biomarker of OA progression, or damage within it is associated with OA onset [48][49][50]. These results, in conjunction with the three tissues in which risk of TKR increased when identified as hotspots, suggest that compared to OA progression, TKR prediction relies less on tissues in and around the tibiofemoral joint and more on tissues in other locations of the joint. TKR has been considered an outcome of OA progression, but these results demonstrate in part how it is a more nuanced problem.
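For readers who want to reproduce the per-tissue statistics, the following sketch computes a sample odds ratio and a two-tailed Fisher's exact p value from a 2 × 2 table of hotspot counts. The function names are hypothetical, any counts passed in are the reader's own, and the Cornfield confidence interval used in the study is not implemented here.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Sample odds ratio for the table [[a, b], [c, d]].

    a, b: cases with / without the tissue flagged as a hotspot.
    c, d: controls with / without the tissue flagged as a hotspot.
    """
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)

def fisher_exact_two_sided(a, b, c, d):
    """Two-tailed Fisher's exact p value with fixed row and column totals.

    Sums the hypergeometric probabilities of every table that is no more
    likely than the observed one. Integer weights are compared directly,
    which avoids floating-point ties.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)

    def weight(k):
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    total = comb(row1 + row2, col1)
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / total
```

As a worked check with made-up counts, the table [[8, 2], [1, 5]] gives an odds ratio of 20 and a p value of 400/11440 ≈ 0.035.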

Discussion
In this work, we present a pipeline that integrates MR imaging and non-imaging features to attain strong TKR prediction performance, reporting accuracy of 78.5 ± 0.134%, sensitivity of 81.8 ± 0.643%, and specificity of 78.4 ± 0.138% (intervals calculated with standard error of measurement (s.e.m.), p < 0.05). Comparisons of AUCs showed the MRI pipeline to outperform the X-ray pipeline for patients without OA and with severe OA, thereby showing the MRI model to have a better combination of sensitivity and specificity in these OA classifications. That it did so particularly for patients without OA shows the utility of the MRI pipeline in screening for patients at risk of TKR despite higher costs. It was also interesting that, particularly among patients with no OA, the X-ray model improved drastically more than the MRI model when non-imaging information was added, judging by disparities in AUCs. This suggests the MRI-trained DenseNet-121 may have learned to predict some of the non-imaging features from the images themselves, indicating that MRI images may intrinsically contain information regarding pain, quality of life, and physical performance, among other non-imaging variables used in this study. The utility of MRI in predicting these variables through DL is certainly worth further investigation.
Figure 3. ROC curves for MRI and X-ray pipelines at selected OA classifications and pipeline versions in which MRI performance was significantly better than that of X-ray. MRI pipeline outperforms X-ray pipeline at no OA for both image-only and integrated models, as seen in (a,c). As shown in (b), integrated MRI pipeline also outperformed integrated X-ray pipeline for patients with severe OA, while (d) shows image-only MRI pipeline outperformed image-only X-ray pipeline across all OA stages. AUCs are displayed in the figure with p < 0.05. Standard deviations used to calculate confidence intervals. ROC curves with AUCs within 1 standard deviation of the mean for each pipeline version during bootstrapping are also shown on plots.
A comparison of the MRI pipeline performance to past work is insightful. The closest analog to our work was conducted by Wang, T. et al. [24], who trained independent residual networks to predict TKR from both DESS and Turbo Spin Echo (TSE) MRI images, integrating both predictions with non-imaging variables in an LR model to yield a final TKR prediction. This yielded a model with AUC of 0.86 ± 0.01 (p < 0.01) when solely DESS or TSE images were used, and 0.88 ± 0.02 (p < 0.01) when both images and non-imaging features were integrated. Our MRI image-only model saw AUC of 0.886 ± 0.020 (image only, p < 0.05) and an integrated AUC of 0.834 ± 0.036 (combined, p < 0.05). Our image-only model thus yields performance superior to its image-only counterpart, with a 95% confidence interval lying entirely above the mean AUC of the image-only model by Wang, T. et al. [24]. Our integrated model, as discussed previously, was optimized to maximize Youden's index within each OA classification rather than overall AUC, explaining why our integrated model has a lower overall AUC than our image-only model. However, due to this decision, we obtained strong performance at early and moderate OA stages, with sensitivity and specificity of 92.2 ± 1.68% and 82.4 ± 0.173% at no OA, respectively, and 78.9 ± 0.974% and 74.7 ± 0.228% at moderate OA (intervals calculated using s.e.m., p < 0.05). In particular, the AUC of 0.943 ± 0.029 (interval calculated with s.d., p < 0.05) obtained by the MRI pipeline for patients without OA, the most difficult OA classification from which to predict TKR, by far surpasses that of past TKR predictive models that include patients across all stages of OA. This performance marks progress towards a model that identifies patients at risk for TKR such that nonsurgical treatment strategies can be implemented to delay TKR.
The biomarker analysis conducted also has implications, as it identified several tissues located within or near the tibiofemoral joint as reducing risk of TKR when identified as hotspots by the full MRI pipeline: medially and laterally located cartilage and bone, all examined meniscal regions, and the ACL. These tissues or damage within them all have been associated with progression or onset of OA, and that our model shows TKR onset to be less reliant on these imaging features in cases compared to controls demonstrates TKR onset to be a more complicated problem than OA progression, despite the relationship between the two. On the other hand, the model identifies three tissues as increasing risk of TKR when identified as hotspots in the pipeline: the medial patellar retinaculum, gastrocnemius tendon, and plantaris muscle. The medial patellar retinaculum is crucial for lateral stabilization of the knee joint, and as such, damage to it results in a patella that more easily dislocates [51]. Past work has shown patellar dislocation increases risk for OA, and TKR can be an effective procedure to treat inveterate patellar dislocation, showing a previous link between this tissue's functionality and eventual OA and TKR [52][53]. The gastrocnemius tendon and plantaris muscle, on the other hand, are both posteriorly located tissues within the knee that play a key role in knee flexion [54]. While literature regarding the plantaris muscle is rather sparse, injuries to the muscle can be implicated in knee and calf pain felt by a patient [55]. Given their related functionality and location, the gastrocnemius tendon and plantaris muscle can jointly be implicated in conditions such as "tennis leg," which refers to mid-calf pain felt during extension of the leg, usually due to damage to one of these tissues or their associated muscles or tendons [56]. The significance of the plantaris muscle and gastrocnemius tendon to OA progression and TKR, however, has not been well characterized, and these results justify future studies to these ends.
This study had some limitations. The first is specific to the OAI dataset, which tends towards older, female patients, all from the United States: across 4,796 patients, the mean age is 61 years and 58% of patients are female. This is not emblematic of the general population, so the robustness of the pipeline could be strengthened by testing on a dataset such as the Multicenter Osteoarthritis Study (MOST). A further limitation of the dataset is that, despite the fairly large size, there is a very limited number of patients with the classification of most interest: those without radiographic OA that still undergo TKR within 5 years. Only 66 such cases existed in the entire OAI dataset, and 12 were in the test set. As such, the OAI dataset and the number of comparison experiments we ran within and across OA classifications limit the statistical power of our conclusions. Furthermore, in this study, pixels in MRI images were compressed to 14 possible values to optimize performance; a version of the pipeline was also constructed and evaluated without the compression, but its TKR prediction performance was not as strong. Ideally, a model that uses all available information would be used in occlusion map analysis to draw more precise conclusions regarding anatomic regions that associate with TKR, but this compromise was necessary to improve performance. A final limitation was computational intensity in occlusion map generation: the voxel size and stride used were 12 × 32 × 32 and 12, respectively. These ideally would be smaller so maps could yield more precise insights, but doing so was infeasible in a reasonable amount of time.
To conclude, this work presents a predictive model that delivers performance not previously seen in predicting TKR, especially for patients without OA. By delivering such performance, this pipeline can identify patients at risk of TKR with high sensitivity and specificity, and for patients with no or moderate OA, this can allow a non-invasive treatment to be implemented that prolongs good health of the knee and delays TKR. The biomarker analysis identifies the medial patellar retinaculum, gastrocnemius tendon, and plantaris muscle as increasing risk of TKR when identified as a hotspot by the model, while its assessment that several tissues within and near the tibiofemoral joint appear to reduce risk of TKR helps demonstrate the added complexity of predicting TKR onset.
Table 5. Summary of occlusion map analysis comparing frequencies with which selected knee joint tissues were indicated as hotspots in analysis. Hotspots were defined as pixels that, when occluded, were among the top 5% of all pixels in change of pipeline TKR prediction output. Odds ratios, 95% confidence intervals calculated using Cornfield's method, and p values calculated using Fisher's exact test are displayed. Tissues that were significant at α = 0.05 are designated with a *. N value for all tests was n = 124.