Development and validation of a clinical prediction model for patient-reported pain and function after primary total knee replacement surgery

The aim of this study was to develop and validate a clinical prediction model of patient-reported pain and function after total knee replacement (TKR). We used data from 1,649 patients in the Knee Arthroplasty Trial (KAT) who received primary TKR across 34 centres in the UK. The external validation included 595 patients from Southampton University Hospital and the Nuffield Orthopaedic Centre (Oxford). The outcome was the Oxford Knee Score (OKS) 12 months after TKR. Pre-operative predictors, including patient characteristics and clinical factors, were considered. Bootstrap backward linear regression analysis was used. Low pre-operative OKS, living in poor areas, high body mass index, and patient-reported anxiety or depression were associated with worse outcome. The clinical factors associated with worse outcome were worse pre-operative physical status, presence of other conditions affecting mobility and previous knee arthroscopy. Presence of fixed flexion deformity and an absent or damaged pre-operative anterior cruciate ligament (compared with intact) were associated with better outcome. Discrimination and calibration statistics were satisfactory. In the external validation, the model explained 21.1% of the variance in outcome. This is the first clinical prediction model for self-reported pain and function 12 months after TKR to be externally validated. It will help to inform patients about the expected outcome after knee replacement surgery.


Predictors of outcome: model development.
Of the 14 variables entered into a backward regression model, ten variables were identified as predictors of post-operative OKS in the KAT study. Regression coefficients and 95% confidence intervals (CI) for each predictor are shown in Table 2. A positive regression coefficient value indicates that the group had better post-operative pain/function and a negative value indicates that the group had worse post-operative pain/function.
Worse pre-operative OKS, self-reported anxiety/depression, presence of ASA grade 3/4 (compared to fit and healthy), presence of other conditions affecting mobility and previous knee arthroscopy were strongly associated with worse outcome. Increasing deprivation score and increasing BMI were associated with decreasing OKS at 12 months follow-up (worse pain/function). Presence of fixed flexion deformity and damaged/absent ACL (compared with intact) were significantly associated with better outcome at 12 months after TKR.
A significant interaction between age and gender was found (P-value < 0.001) and was included in the final model. Patients younger than 60 and those aged 80 or older had worse pain and functional status at 12 months after knee surgery, and this effect also varied by gender. Younger women (age < 60) had better outcomes than men, but in the oldest age group (age 80 or more) women had worse outcomes than men. There was no difference in OKS outcome between genders in the middle age groups (age 60 to 80).
Internal validation. The bias-corrected R² statistic for the final predictive model, including the patient and clinical factors, was 17.6%. Age, sex and pre-operative OKS explained 12.8% of the variability in outcome; this rose to 14.7% when the other patient characteristics were included and to 17.6% when the clinical variables were added. Model calibration was good, with close agreement between predicted and observed values of post-operative OKS at 12 months after TKR (Fig. 1).
External validation. In the external validation, the model showed better discriminatory ability than in the internal validation, with an adjusted R² of 21.1%. Calibration showed that the predicted values underestimated the observed values in COASt, although agreement between the two scores remained close (Fig. 1).

Discussion
We have developed and externally validated a new prediction model for patient-reported pain and function after TKR, using patient characteristics and clinical variables that are easy to measure. To our knowledge, this is the first study to attempt external validation of a model predicting outcome after TKR. The model showed adequate predictive validity, with an R² of 17.5%, and had good calibration across all deciles of predicted 12-month OKS. In the external validation, the explained variance rose to 21.1%, although the model underestimated observed OKS.
Presence of a fixed flexion deformity and an absent or damaged pre-operative ACL (compared with intact ACL) were both significantly associated with better outcome.
Determinants of worse outcome included in the final model were: worse pre-operative OKS, living in poor areas, high BMI, worse mental health, worse ASA grade, presence of other conditions affecting mobility and previous knee arthroscopy.
All predictors have undergone internal validation using bootstrap techniques to ensure they were consistently identified as significant predictors.
This prediction model provides an individualised estimate of post-operative OKS, and of change in OKS, and this information will help to inform patients about the expected outcome after knee replacement surgery.
Few papers have described clinical risk prediction models for outcomes of knee arthroplasty 13,22-24. These studies are informative for decision-making but require confirmation and external validation in new patient cohorts. We included known risk factors in our model and report coefficients similar in both magnitude and direction to those reported elsewhere for outcome after replacement. Within our study, we found that worse pre-operative knee pain and function score and greater socioeconomic deprivation, based on residential area, were significantly associated with worse patient-reported outcomes, and these associations have been well documented 12-14,16,25,26. We also found that patients with self-reported anxiety/depression were more likely to have worse post-operative OKS. These results are consistent with previous reports using measures of mental health such as the SF-36 27 and the EQ-5D-3L 13. Within this study, higher BMI was associated with worse post-operative OKS. The role of BMI is controversial: some previous studies found an association between high BMI and worse outcome 28,29, whereas others found no evidence of an association 13,30. A possible explanation for this association is that, as BMI is known to be associated with limited physical performance 31, obese patients may judge their health status on the basis of functional mobility. Although the relationship was statistically significant, the effect size was not clinically important, meaning that BMI should not be a barrier to surgery. In relation to the clinical variables, we found that worse pre-operative physical status (ASA grade 3-4, compared with fit and healthy), pre-operative disability and previous knee arthroscopy were associated with worse outcome, and these results were consistent with other studies 3,18.
We found that patients with a fixed flexion deformity, or an absent pre-operative ACL, achieved better outcomes one year after TKR, even after adjustment for pre-operative OKS, and this finding has been previously reported in another study using KAT data 32. Patients with a pre-operative fixed flexion deformity were more likely to present with an absent/damaged pre-operative ACL in this study. These patients also presented with similar or even better pre-operative characteristics compared with those without fixed flexion deformity and with an intact pre-operative ACL (data not shown).
This study has several strengths: (1) a large sample size, with a total of 1,649 patients included in the development analysis; (2) the use of a wide range of predictor variables, including socio-demographic and almost unique data on clinical factors; (3) the use of multiple imputation and bootstrapping as an internal validation technique 33, to ensure that the significant predictors are not anomalous to this dataset; (4) confirmation of good reproducibility of the model by external validation 34,35; and (5) the collation of these risk factors to develop a clinical prediction model that may be informative for decision making.
There are some potential limitations to this study. First, although calibration of the prediction model was good for predicting attained post-operative OKS, and change (improvement in OKS), it does not currently predict a dichotomous "good" or "poor" outcome (e.g. a change in score of less than 5 points). This model could be tested against such definitions once they have been agreed. Second, potential predictive factors, principally radiographic variables (e.g. Kellgren and Lawrence (KL) grade of OA), were not available in this study. In addition, lifestyle measures such as smoking, alcohol consumption and exercise were not available in the development dataset, and hence we did not consider them in this study. We therefore used BMI and area deprivation, which have been shown to correlate well with behavioural risk factors 36. Similarly, race was not collected in the KAT study, and hence this variable was not included in the analysis. Since patients included in the COASt study were predominantly white (over 95%), and both studies had similar patient characteristics, the results of this study may not be generalizable to other racial groups. Future research should focus on predictors of outcome after TKR in different racial groups.
Third, non-response bias limits the generalizability of the study findings. Non-responders were more likely to have worse outcome and to report anxiety and depression, which implies that the true effects of these predictors may be underestimated in this study. Importantly, external validation has confirmed the calibration and discrimination ability of the model.
Finally, R² values were relatively low: predictive factors explained 17.5% and 21% of the variability in outcome, suggesting that health status in TKR patients is multifactorial. However, since strong, statistically significant predictors were found, we can still draw important conclusions about how changes in the predictor values are associated with changes in the outcome. Those values are consistent with other studies attempting to explain the variability in outcome of TKR 13,22,23 and also with other well-known prediction tools, such as QRISK and the Framingham score, that explain around 30% of the variability in outcome 37.
We have developed and, for the first time, externally validated a clinical prediction model for outcome 12 months after TKR. Clinicians could use information on the expected level of improvement in patient outcome when counselling patients about the prognosis of TKR, allowing patients to be involved in the decision whether to undergo surgery. External validation has confirmed its performance and validity, and it can already be used in clinical practice.

Patients and Methods
Data sources. Development dataset. This study was carried out using data from the KAT trial 20,21: a pragmatic, partial-factorial, unblinded randomised controlled trial (International Standard Randomized Trial No. ISRCTN45837371). Patients were recruited from July 1999 to January 2003 through a random sample, stratified by surgeon according to age group, gender and site of disease. The KAT study contains information on patients receiving primary TKR across 34 centres in the UK.
Participants. Development dataset. Patients were eligible for inclusion in KAT if a decision had been made for them to undergo primary TKR 21. A participant was not eligible if the surgeon considered a particular type of operation to be clearly indicated (metal-backed tibial component, patellar resurfacing and/or a mobile bearing). A total of 4,070 potentially eligible patients were identified, and 2,374 (58%) provided their consent and were randomised. Of those, 22 were later excluded because they had been randomised in error, which left 2,352 participants in the trial. For the purpose of the present study, 1,649 individuals (41%) with available OKS data at baseline and at year 1 were selected.
Validation dataset. Patients included had osteoarthritis (OA) or rheumatoid arthritis (RA). They were over 18 years of age and were competent and willing to consent to undergo primary TKR. A total of 1,674 patients who underwent knee replacement were initially accepted into the COASt study. Fifty-seven patients (3%) were excluded because of (a) duplication, (b) ineligibility, (c) withdrawal, (d) limited consent, (e) the pre-operatively assessed knee being different from the knee actually operated on, or (f) no data provided. Patients undergoing patellofemoral resurfacing (n = 16, 1%), TKR revision (n = 112, 7%) and unicompartmental knee replacement (n = 643, 38%) were also excluded. Finally, 595 patients (36%) who completed the OKS at year 1 were used for the external validation.
Outcome. The outcome was the patient's pain and functional status as measured by the OKS at 12 months after primary TKR. The OKS is a validated patient-administered questionnaire consisting of 12 questions relating to knee pain and physical function limitations during the past 4 weeks 38. Each question is scored on a Likert scale from 0 to 4, with 4 being the best outcome. The item scores are summed to give a total score ranging from 0 (severe symptoms and dysfunction) to 48 (no problem on any item) 7. OKS questionnaires were also completed by participants pre-operatively. In both the development and validation studies, patients were asked to answer the OKS questionnaires with reference to their operated knee.
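The scoring rule described above can be illustrated with a minimal sketch. The function name `oks_total` is hypothetical, and the handling of invalid input (raising an error) is an assumption; the source does not describe how incomplete questionnaires were scored.

```python
from typing import Sequence


def oks_total(item_scores: Sequence[int]) -> int:
    """Sum twelve OKS item scores (each 0-4, 4 = best) into a 0-48 total.

    Hypothetical helper for illustration only; rejecting incomplete or
    out-of-range answers is an assumption, not the study's stated rule.
    """
    if len(item_scores) != 12:
        raise ValueError("The OKS has exactly 12 items")
    for score in item_scores:
        if score not in (0, 1, 2, 3, 4):
            raise ValueError(f"Item scores must be integers from 0 to 4, got {score!r}")
    return sum(item_scores)


# Best possible answers on every item give the maximum score.
print(oks_total([4] * 12))
# Worst possible answers on every item give the minimum score.
print(oks_total([0] * 12))
```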
Predictor variables. The pre-operative patient characteristics included were age, gender, marital status, socioeconomic deprivation measured by the Index of Multiple Deprivation 2004 (IMD) 39, BMI and mental health (Table 3). As the association between age and outcome was non-linear, we considered age according to the categories used in the randomization process: less than 60 years; 60 to 69 years; 70 to 79 years; and 80 years or older. The IMD 2004, based on patients' residential postcodes, combines weighted scores for each postcode in seven deprivation domains, where a high score indicates increased deprivation. BMI was calculated at baseline as the ratio of weight in kilograms to the square of height in metres (kg/m²). Values of IMD 2004 and BMI were collapsed every 10 units. Mental health was assessed using the anxiety/depression item of the European Quality of life-five domain (EQ-5D-3L) questionnaire 40.
The following variables were considered as potential clinical risk factors: (a) physical status as classified by the American Society of Anesthesiologists (ASA) grade. This grading system is a standard assessment of the patient's general physical health prior to surgery 41. It comprises four categories (1, fit and healthy; 2, mild disease; 3, incapacitating disease; and 4, life-threatening disease), but for this study the last two categories were collapsed; (b) disease type (OA, RA); (c) disease side (left, right) 42; (d) previous knee arthroscopy (at any time for the development dataset, whereas for the validation dataset it referred only to the 12 months before TKR); (e) presence of other conditions affecting mobility; (f) pre-operative fixed flexion deformity; and (g) pre-operative status of the anterior cruciate ligament (ACL) (Table 3).
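As an illustration of two of the derived predictors, the following sketch computes BMI from weight and height and assigns the age bands used in the randomization process. The function names and category labels are hypothetical and chosen only for this example.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("Weight and height must be positive")
    return weight_kg / height_m ** 2


def age_category(age_years: float) -> str:
    """Map age to the four bands described in the text (labels are illustrative)."""
    if age_years < 60:
        return "<60"
    if age_years < 70:
        return "60-69"
    if age_years < 80:
        return "70-79"
    return "80+"


# A hypothetical patient: 81 kg, 1.80 m, aged 72.
print(round(bmi(81, 1.80), 1), age_category(72))
```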
Clinical pre-operative factors refer to the operated knee in the development and the validation datasets.
Statistics. All analyses were conducted using Stata version 13.1 statistical software (StataCorp, College Station, Texas). To assess selection (response) bias, patient characteristics were compared between responders and non-responders to both the pre-operative and the 12-month post-operative OKS questionnaires in the development dataset. General linear models were used to identify predictors of post-operative OKS. Linearity of continuous variables with the outcome was assessed using fractional polynomials, and collinearity between variables was assessed by the variance inflation factor (VIF). Because the variance of the residuals was non-constant (evidence of heteroscedasticity), robust standard errors were obtained with the sandwich variance estimator 43. Interactions of age and sex with all other factors were tested. To obtain a final model with the smallest number of variables and the best predictive validity, we followed these steps 44-46: Step 1: multiple imputation by chained equations was used to investigate the impact of missing data 47. Forty imputed datasets were generated using all potential factors (including the outcome), and estimated parameters were combined using Rubin's rules; Step 2: a bootstrap linear regression method with the sandwich variance estimator was used. We drew 200 bootstrap samples with replacement from the combined 40 imputed datasets. Within each bootstrap sample, automatic backward selection was applied using a significance level of 0.157 (except that age and gender were forced into all models) 48; Step 3: variables that were selected in at least 70% of the bootstrap samples were retained in the final regression model.
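Two of the steps above lend themselves to a short sketch: pooling a coefficient across imputed datasets with Rubin's rules (Step 1) and retaining variables by bootstrap inclusion frequency (Step 3). This is an illustration in Python rather than the Stata code used in the study; the function names are hypothetical, and the imputation and backward selection themselves are assumed to have been run elsewhere.

```python
from collections import Counter
from statistics import mean, variance
from typing import Iterable, List, Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation variance
    and B is the between-imputation variance of the estimates.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("Need at least two imputations and matching lengths")
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, denominator m - 1
    return pooled, within + (1 + 1 / m) * between


def retained_variables(selections: Sequence[Iterable[str]], threshold: float = 0.70) -> List[str]:
    """Keep variables selected in at least `threshold` of the bootstrap samples.

    `selections` holds, for each bootstrap sample, the variables that survived
    backward selection in that sample (assumed to be computed elsewhere).
    """
    if not selections:
        raise ValueError("No bootstrap selections supplied")
    counts = Counter(var for chosen in selections for var in set(chosen))
    n = len(selections)
    return sorted(var for var, count in counts.items() if count / n >= threshold)


# Hypothetical inputs: three imputations of one coefficient, ten bootstrap samples.
print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
samples = [{"bmi", "asa"}] * 8 + [{"bmi", "side"}] * 2
print(retained_variables(samples))
```

With the study's settings, `estimates` would hold 40 values per coefficient and `selections` would hold 200 sets, one per bootstrap sample.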

Internal validation.
To check the internal validity of the model, 200 bootstrap samples with replacement, combined with multiple imputation, were again used to obtain bias-corrected estimates of predictive ability 43. Predictive ability was assessed by examining measures of discrimination (R² statistic) and calibration 48,49. For calibration, patients were divided into ten groups (deciles) according to their predicted OKS. For each decile, the means of predicted and observed OKS were obtained. A 95% Poisson confidence interval was also calculated for each observed mean.
External validation. For the external validation, missing values of the predictors selected in the prediction model were imputed using the same procedure described above. We applied the pooled coefficients of the model developed in KAT to the forty imputed COASt datasets and then calculated discrimination and calibration. Discrimination was calculated as the pooled R² statistic using Fisher's r-to-z transformation 50.
Sensitivity analysis. Finally, to evaluate whether there were differences in post-operative OKS for patients with RA versus OA, we conducted an analysis excluding patients with RA; because no substantial model differences were observed, these supporting analyses are not shown.
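The calibration table and the pooled R² can be sketched as follows. This is an illustrative Python version, not the study's Stata code. It assumes that pooling by Fisher's r-to-z means taking the square root of each imputed-dataset R², averaging on the z scale and back-transforming; the function names are hypothetical, and the Poisson confidence intervals are omitted.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def calibration_by_decile(
    predicted: Sequence[float], observed: Sequence[float], groups: int = 10
) -> List[Tuple[float, float]]:
    """Return (mean predicted, mean observed) per group, ordered by predicted score."""
    if len(predicted) != len(observed) or len(predicted) < groups:
        raise ValueError("Need matching inputs with at least one patient per group")
    pairs = sorted(zip(predicted, observed))  # order patients by predicted score
    n = len(pairs)
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        table.append((mean([p for p, _ in chunk]), mean([o for _, o in chunk])))
    return table


def pool_r2_fisher(r2_values: Sequence[float]) -> float:
    """Pool R-squared across imputed datasets on Fisher's z scale (assumed procedure)."""
    if not r2_values or any(not 0 <= r2 < 1 for r2 in r2_values):
        raise ValueError("Each R-squared must lie in [0, 1)")
    z_values = [math.atanh(math.sqrt(r2)) for r2 in r2_values]
    return math.tanh(mean(z_values)) ** 2


# Hypothetical data: 20 patients whose observed score is always 1 point above prediction.
pred = list(range(20))
obs = [p + 1 for p in pred]
print(calibration_by_decile(pred, obs)[0])
print(round(pool_r2_fisher([0.20, 0.21, 0.22]), 3))
```

A table in which the observed means sit consistently above the predicted means, as in the toy data here, corresponds to the kind of underestimation reported for the external validation.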

Ethics.
For the KAT trial, ethical approval was obtained from the Multi Centre Research Ethics Committee for Scotland in November 1998 (research protocol MREC/98/0/100), and the trial was approved by the Local Research Ethics Committees in each study centre recruiting trial participants.
COASt has been approved by the Oxford REC A (Ethics Reference: 10/H0604/91). The sponsoring organisation of the study is the University Hospitals Southampton NHS Foundation Trust (UHS).
Both cohorts conform to national guidelines and individual ethics and data protection requirements. Data were collected within the two cohorts in an anonymised format, as confirmed by the study participants in their written informed consent and as directed by the ICH-GCP guidelines and appropriate local and international legislation. It was not possible to re-identify the donors. Datasets were stored at the University of Oxford in a secure database. The data storage, management and handling were protected in accordance with European Commission Directive 95/46/EC and appropriate national regulations.