Introduction

Primary total hip (THR) and total knee replacements (TKR) are amongst the most common elective orthopaedic procedures1. In the UK, the mortality-adjusted lifetime risk at fifty years old is 11.6% for THR and 10.8% for TKR in females, and 7.1% and 8.1%, respectively, in males, whilst, in the USA, over 50% of patients diagnosed with symptomatic knee osteoarthritis will undergo TKR2,3. Most frequently, these surgeries are performed in the management of end-stage osteoarthritis, a degenerative process which accounts for 2% of global disability years and affects, for example, 10% of UK adults4,5.

Following the COVID-19 pandemic, the total number of THRs and TKRs performed in the UK has halved and not yet recovered6,7. Concurrently, demand has continued to rise, resulting in extended waiting lists6,7. Similar trends can be seen in comparable health systems, including Canada, the Netherlands, and Denmark8,9. Consequently, understanding how increased waiting times for elective primary THR/TKR influence patient-centred outcomes will be essential for future healthcare planning.

Previously, a systematic review of 1,646 patients with osteoarthritis awaiting THR and TKR by Hoogeboom et al. identified no evidence of deterioration in self-reported pain status10. However, this analysis was limited to the first 180 days of waiting for THR or TKR10. More recently, Patten et al. concluded that the pain levels of patients with osteoarthritis remained stable for the first year after addition to a surgical waiting list, although they reported a median follow-up of only 13.6 weeks11. Furthermore, neither review measured changes in joint functionality, quality of life, or wider patient perceptions, nor explored how prolonged pre-surgical waiting time might impact postoperative outcomes.

To better inform elective service planning, this systematic review and meta-analysis sought to understand how pre-surgical waiting time—defined as the time from placement on surgical waiting list until surgery—for patients undergoing elective primary THR or TKR influenced both pre- and post-operative joint specific pain and functional status, global health-related quality of life (HRQOL), and patient perspectives.

Results

Literature summary and evaluation

Study selection is summarised in Supplementary Fig. 1. After deduplication, 525 studies were initially screened for eligibility with substantial inter-rater reliability (k = 0.75)12. The remaining thirty-four studies were then assessed in detail, with twenty-six being included within this review13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38. Subsequently, sixteen studies were suitable for quantitative meta-analysis. Reasons for exclusion and summary characteristics for the eight fully assessed but excluded studies are summarised in Supplementary Table 239,40,41,42,43,44,45,46.

The included studies capture a reported population of 89,996 patients (60.6% female, mean age 67.4 years) between 2001–2022 (Table 1)13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38. Similar study composition and methodologies enabled valid comparisons. Individual patient meta-analysis was not feasible. Risk of bias for each extracted study outcome from randomised controlled trials, non-randomised (case–control and cohort) studies, and cross-sectional studies is summarised in Supplementary Tables 3, 4, and 5 with the ROB-2, ROBINS-I, and JBI frameworks, respectively47,48,49. Supplementary Table 6 presents Grading of Recommendations Assessment, Development and Evaluation (GRADE) outcome certainty evaluations50.

Table 1 Summary characteristics of included studies.

Reported joint-specific pain and function and HRQOL outcome data with a preoperative endpoint are qualitatively summarised in Table 2. Comparatively, the studies presented in Table 3 show joint-specific pain and function and HRQOL data postoperatively. Table 4 outlines patient perspectives on prolonged presurgical waiting time for primary elective THR and TKR.

Table 2 Summary of findings from studies reporting changes in clinical functionality and health-related quality of life between baseline and preoperative endpoint.
Table 3 Summary of findings from studies reporting clinical functionality and health-related quality of life with a postoperative endpoint.
Table 4 Summary of findings from studies exploring patient perspectives on prolonged waiting times.

Changes in joint functionality whilst awaiting total hip or knee replacement surgery

Fifteen studies commented on deteriorating joint pain/function whilst awaiting THR or TKR, capturing 9,070 patients. Seven of these (n = 995 patients) were suitable for meta-analysis, reporting mean difference changes in joint-specific pain and functional outcomes across nine cohorts of patients awaiting either THR or TKR13,16,17,21,23,24,26,27,31,33,34. Figure 1A presents a forest plot comparing changes in joint-specific outcome (0–100, worst-best) whilst awaiting surgery. A deterioration was observed between baseline and presurgical scores (MD 3.05; 95% CI − 0.32,6.32; p = 0.07; I2 = 80.8%), which trended towards significance. Consequently, a linear meta-regression was performed (Fig. 1B) to explore the influence of continuous waiting time on this observed heterogeneity, which identified a significant signal corresponding to an absolute deterioration of 0.0575 on a 100-point scale per additional day of waiting time (95% CI 0.0064,0.1086; p = 0.028 (to 4 d.p.); I2 = 73.1%).
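To make the size of this slope concrete, the short sketch below extrapolates the reported coefficient and its confidence limits over a few waiting periods. It assumes that the linear relationship holds across the whole wait, which is the working assumption of a linear meta-regression rather than an established fact, and the waiting periods chosen (90, 180, and 365 days) are illustrative only.

```python
# Reported meta-regression slope: points lost per additional day of waiting
# on the common 0-100 joint-specific scale, with its 95% confidence limits.
SLOPE_PER_DAY = 0.0575
CI_LOW, CI_HIGH = 0.0064, 0.1086


def projected_change(days, slope=SLOPE_PER_DAY):
    """Linear extrapolation of a daily slope over a waiting period of `days`."""
    return slope * days


# Illustrative waiting periods (assumed, not taken from the included studies).
for days in (90, 180, 365):
    point = projected_change(days)
    low = projected_change(days, CI_LOW)
    high = projected_change(days, CI_HIGH)
    print(f"{days:>3} days: {point:.2f} points (95% CI {low:.2f} to {high:.2f})")
```

Under these assumptions, a 180-day wait corresponds to a projected deterioration of roughly 10 points on the 100-point scale, although the width of the confidence interval means the projection should be read as indicative.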

Figure 1

Preoperative Joint-Specific Outcome Data (n = 995). (A) Forest plot of mean difference; (B) Meta-regression by mean waiting time; (C) Funnel plot. Outcome measures reported include: WOMAC score (Yellow); Oxford Hip Score (Orange) and Oxford Knee Score (Blue).

Tuominen et al. was excluded from this meta-analysis as, despite reporting an acceptable joint-specific outcome (Knee Society Clinical Rating System), the short (n = 132) and non-fixed waiting time (n = 198) groups reported mean differences of zero (to 3 decimal places) after mean respective waits of 94.6 and 239.2 days27,51. Given expected random variation in outcome measurement, this introduced concerns of possible reporting error. Both Brown et al. and Johnson et al. were cross-sectional studies and thus unable to identify temporal changes in outcomes. However, in both studies, the majority of patients associated subjective increases in pain with increased waiting times (487/848 and 68/117, respectively)31,34.

Farrow et al. reported an odds ratio of 1.84 (95% CI 1.29,2.62; p < 0.001) associated with any opioid prescription at presurgical assessment between a 2014–2017 control cohort and a 2020 COVID-19 cohort (median waiting times 365 and 455, respectively)33. Although a surrogate for both joint functional outcome and HRQOL, opioid prescription is of high clinical relevance to patient and clinician stakeholders and merits the inclusion of this study within qualitative syntheses52.

Overall certainty in this outcome was high (Supplementary Table 6). The funnel plot presented in Fig. 1C showed limited risk of publication biases, whilst there was insufficient threat of imprecision, inconsistency, and bias (Supplementary Table 5) to downgrade certainty. Subsequently, this analysis concludes that the observed meta-regression is representative of the true effect and that deterioration in joint function occurs with increasing preoperative waiting period.

Post-operative impact of waiting time on the joint specific outcome of total hip or knee joint replacement

Thirteen studies, capturing 81,523 patients, reported on postoperative joint pain and function14,15,16,17,18,20,25,27,28,29,30,37,38. Fifteen reported cohorts across eight studies were suitable for quantitative meta-analysis (n = 66,836 patients)14,15,16,17,27,28,29,30. Improvement in joint-specific outcome was observed postoperatively (MD (0–100 score, worst-to-best) 38.57; 95% CI 34.00,43.14; p < 0.001; I2 = 99.7%) and is presented as a forest plot (Fig. 2A). In the absence of mixed cohorts, it was possible to perform a sensitivity analysis comparing THR and TKR data, which is also shown in Fig. 2A. There was a significantly greater (p < 0.001) reported cumulative effect in THR patients (MD 44.25; 95% CI 41.30,47.20; p < 0.001; I2 = 95.8%), relative to TKR (MD 28.81; 95% CI 26.29,31.33; p = 0.003; I2 = 57.1%).

Figure 2

Postoperative Joint-Specific Outcome Data (N = 66,836). (A) Forest plot of mean difference showing Total Hip; (B) Meta-regression by mean waiting time; (C) Meta-regression sensitivity analysis of Nikolova et al. 2016 by mean waiting time (N = 4,873); (D) Funnel plot. Outcome measures reported included: Western Ontario and McMaster Universities Osteoarthritis Index score (Yellow); Oxford Hip Score (Orange); Oxford Knee Score (Blue); Knee Society Clinical Rating System (Purple); and Knee Injury and Osteoarthritis Outcome Score (Magenta).

To explore how waiting time discriminated this outcome, Fig. 2B presents a pooled linear meta-regression of preoperative waiting time against MD (coefficient 0.00383; 95% CI − 0.0411,0.0487; p = 0.8684 (to 4 d.p.); I2 = 99.7%). Figure 2C presents the sensitivity analysis of this regression with respect to THR (coefficient 0.0234; 95% CI − 0.0064,0.0531; p = 0.1230 (to 4 d.p.); I2 = 27.3%) and TKR subgroups (coefficient 0.0151; 95% CI − 0.0135,0.0437; p = 0.2996 (to 4 d.p.); I2 = 96.6%). Applying Clogg’s method, there was no significant difference between the THR and TKR subgroups in the rate of decreasing postoperative functional gain53. The symmetrical funnel plot (Fig. 2D) alleviated concerns about publication bias when considering outcome certainty.
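The slope comparison can be illustrated with the z-test for two independent regression coefficients that is commonly attributed to Clogg and colleagues. The sketch below is not the authors' code: the standard errors are not reported in the text, so they are back-calculated here from the reported 95% confidence intervals under a normal approximation, which is an assumption.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def se_from_ci(lower, upper):
    """Approximate a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * Z95)


def slope_difference_test(b1, se1, b2, se2):
    """z-test for the difference between two independent coefficients."""
    z = (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p


# Reported subgroup slopes; standard errors are derived, not reported.
se_thr = se_from_ci(-0.0064, 0.0531)
se_tkr = se_from_ci(-0.0135, 0.0437)
z, p = slope_difference_test(0.0234, se_thr, 0.0151, se_tkr)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

With these inputs the test statistic is well below conventional significance thresholds, in keeping with the statement that the THR and TKR slopes did not differ significantly.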

Five studies were unsuitable for quantitative synthesis due to outcome reporting18,20,25,37,38. Of these, three associated decreasing postoperative functional gains with increasing time awaiting surgery18,20,25. Apart from Holzapfel et al., all study outcomes were captured 6 to 12 months post-operatively. Holzapfel et al. explored the impact of presurgical postponement, e.g., following patient admission for primary elective THR and TKR, over 10 years at a single German centre (N = 10,140) and identified an increased risk of complications and revisions within the postponed surgery group37.

Within multivariate regressions of their reported THR (n = 29,303) and TKR (n = 32,602) populations, Nikolova and colleagues identified statistically significant deterioration in 6-month post-operative OHS (0.0951, p < 0.01) and OKS (0.0385, p < 0.01) per additional day of waiting time, exceeding the estimated effect sizes proposed in this review (Fig. 2C)30. This apparent inconsistency between univariate effect and a sufficiently powered multivariate analysis raised concerns around imprecision and unmeasured confounding, and limited certainty in the proposed effect size to “moderate” (Supplementary Table 6). Consequently, it is likely that a negative correlation between preoperative waiting time and postoperative joint function exists.

Changes in health-related quality of life whilst awaiting total hip or knee replacement surgery

Eleven studies, representing 7,831 patients, commented on changes in HRQOL in the preoperative period13,16,17,19,22,24,26,27,32,33,35. Seven of these studies (containing ten cohorts and capturing 2,153 patients) were suitable for meta-analysis13,16,17,22,27,32,35. Figure 3A presents a forest plot of the change in single preference-based health-related quality of life index associated with waiting for total hip or knee replacement. Whilst awaiting surgery, there was a significant deterioration in patients’ HRQOL (MD 0.04; 95% CI 0.00,0.09; p = 0.04; I2 = 88.8%) on a scale of 0–1 (worst-to-best). Linear meta-regression (Fig. 3B) showed a statistically significant daily deterioration coefficient of 0.0005 (95% CI − 0.0001,0.0009; p = 0.011 (to 4 d.p.); I2 = 80.6%).

Figure 3

Preoperative Health-Related Quality of Life Outcome Data (N = 2,153). (A) Forest plot of mean difference; (B) Meta-regression by mean waiting time; (C) Funnel plot. Outcome measures reported include EQ-5D (Pink) and 15D Score (Green).

Notably, this analysis excluded the non-fixed waiting time cohort of Hirvonen et al. Despite 143/183 patients breaking protocol, the reported per-protocol and intention-to-treat admission 15D scores and standard deviations were identical to 3 decimal places. In the absence of author clarification, this raised significant concerns around both reporting errors and a negative compliance bias22.

Hirvonen et al. reported no statistical difference in 15D score whilst awaiting total hip or knee replacement (MD 0.008; 95% CI 0.002–0.0019; p = 0.123), relative to a matched population cohort19. McHugh et al. did not report SF-36 subdomains with appropriate granularity to enable quantitative synthesis but identified significant improvements in patient perceptions of the role emotional and general health outcomes (8.5; 95% CI 1.2,15.9 and 7.9; 95% CI 3.6,12.1, respectively)24. Desmeules et al. reported HRQOL using only the SF-36 physical functioning, physical health, and bodily pain subdomains, which could not be converted into a single preference-based index26,54. This study reported significant deterioration in the physical function (100-point scale) of patients awaiting hip and knee replacements (MD 4.8; 95% CI 7.2,2.4)26. Specifically, this was driven by deteriorations within the 3–6 (MD 4.4; 95% CI 7.6,1.1), 9–12 (MD 11.3; 95% CI 18.4,4.1), and > 12 month (MD 7.1; 95% CI 12.9,1.3) waiting time subgroups26. Significant deterioration was also seen in the 9–12-month subgroup in physical health (MD 20.6; 95% CI 35.1,6.1)26.

Although publication bias was not of concern (Fig. 3C), risk of bias, primarily due to limited control of confounding, enabled only moderate certainty in this conclusion (Supplementary Table 6)16,24,35. Notably, McHugh et al., the study which contributed to this risk of bias, was not included within the quantitative synthesis, and therefore did not influence the slope of the meta-regression (0.0005; 95% CI − 0.0001,0.0009; p = 0.011 (to 4 d.p.); I2 = 79.8%). Thus, this risk of bias is immaterial when considering the certainty of the proposed effect size. In summary, HRQOL deteriorates every day whilst awaiting primary elective THR or TKR surgery.

Post-operative impact of waiting time on health related quality of life following total hip or knee joint replacement

Seven studies reported postoperative health-related quality of life and duration of waiting time, representing 62,777 patients14,16,18,27,28,29,30. Of these, five studies, containing eight cohorts and representing 62,501 patients, reported single preference-based indices of HRQOL (0–1, worst-to-best) and were thus suitable for meta-analysis16,17,27,29,30. Figure 4A presents a forest plot of these outcomes, showing an expected significant improvement in long-term HRQOL post-operatively (MD: 0.22; 95% CI 0.13,0.33; p < 0.01; I2 = 98.0%). Subgroup analysis identified a significant improvement in HRQOL at long-term follow-up, which again favoured THR (MD: 0.32; 95% CI 0.25,0.38; p < 0.01; I2 = 74.0%) relative to TKR (MD: 0.12; 95% CI 0.04,0.20; p < 0.01; I2 = 98.0%) patients.

Figure 4

Postoperative Health-Related Quality of Life Outcome Data (n = 62,501). (A) Forest plot of mean difference showing total hip and knee replacement subgroups; (B) Pooled meta-regression by mean waiting time; (C) Meta-regression sensitivity analysis of total hip and knee replacement by mean waiting time. Outcome measures reported include EQ-5D (Pink) and 15D Score (Green).

To explore heterogeneity with regard to presurgical waiting time, Fig. 4B presents a linear meta-regression, identifying a coefficient for daily reduction in post-operative HRQOL of 0.0003 (95% CI − 0.0012,0.0017; p = 0.715 (to 4 d.p.); I2 = 99.9%). Fig. 4C compares the influence of type of joint replacement on this. Both TKR (coefficient: 0.0006; 95% CI − 0.0005,0.0016; p = 0.287 (to 4 d.p.); I2 = 96.4%) and THR (coefficient: − 0.0004; 95% CI − 0.0018,0.0009; p = 0.5554 (to 4 d.p.); I2 = 61.4%) had no significant deterioration with waiting time. Clogg’s method did not identify significant differences between these slopes53.

Three studies lacked reported parameters necessary for quantitative synthesis14,18,28. Using logistic regression, Fielden et al. did not observe a relationship between patients waiting longer than 6 months and EQ-5D18. Although commenting on postoperative SF-36, Mahon et al. did not present the subscale data within the article text (there were no accompanying supplementary materials and the authors could not be contacted), precluding extraction14. Desmeules et al. reported a significant reduction in the 6-month postoperative SF-36 role physical domain within a subgroup of patients waiting longer than 9 months for TKR, relative to the 3–6 and 6–9 month subgroups, with no statistically significant changes in the additional reported SF-36 subdomains (physical functioning and bodily pain only)28.

Six months postoperatively, Nikolova et al. identified significant EQ-5D reductions of 0.0620 (n = 29,303) and 0.0587 (n = 32,602) per additional day spent waiting for THR and TKR, respectively (rescaled 0–100, worst-to-best)30. Subsequently, a post-hoc sensitivity analysis of Nikolova et al. demonstrated negligible influence on the results of the postoperative HRQOL synthesis (N = 596; MD 0.0003; 95% CI − 0.0015,0.0022; p = 0.7437 (to 4 d.p.); I2 = 96.0%). Although the magnitude of Nikolova’s proposed multivariate effect size was consistent with the estimated effect (Fig. 4C) for TKR, the THR outcome showed marked discrepancy with this synthesis. This presented a challenge to certainty when considering this analysis’s univariate meta-regression, particularly around unmeasured confounding within long-term, post-operative outcomes (Supplementary Table 6). As such, whilst THR patients reported greater improvements in HRQOL at 6–12 months post-operatively relative to TKR, there is likely an additional relationship with presurgical waiting time that could not be observed in this analysis. The influence of publication bias, as assessed by funnel plot (Fig. 4D), was negligible.

Patient perspectives on delayed elective total hip or knee replacement surgery

Only five studies that reported patients’ qualitative responses to and perceptions of delayed THR or TKR were identified, summarised in Table 4 (below). All studies (3/3) commenting on psychological responses reported increased patient anxiety, whilst two studies (2/2) reported worsening patient perspectives of both THR and TKR services.

However, there were several limitations to this outcome. Firstly, the qualitative nature of these outcomes introduced diverse outcome measures, limiting both the directness and comparability of the outcome synthesis. Consequently, this precluded quantitative synthesis and assessment of publication bias, reducing certainty in this outcome to “moderate”. Despite these limitations, all reported outcomes captured negative patient perspectives on delaying THR and TKR, speaking to an underlying deleterious patient experience associated with delayed surgical care.

Discussion

Summary of findings

This review presents a contemporary and high-quality (Oxford Centre for Evidence-Based Medicine, OCEBM, Level 1) analysis exploring the impact of presurgical waiting time on primary, elective THR and TKR outcomes55. In this systematic review and meta-analysis, a significant association between increased presurgical waiting time and deteriorations in patient joint-related function and HRQOL for primary elective THR and TKR was identified. Given reported thresholds, this synthesis estimates that clinically meaningful deterioration (i.e., more than the minimum clinically important difference) in joint-specific outcomes and HRQOL likely occurs within 6 months of waiting for surgery32,56,57. Furthermore, qualitative synthesis indicated that pre-surgical waiting time may deleteriously influence clinical outcomes up to 12 months post-operatively. In alignment with current evidence, post-operative meta-analysis identified a decreased gain in joint-specific outcomes and HRQOL at 6 to 12 months post-operatively for primary elective TKR patients, relative to THR58. Patients also indicated strongly deleterious perspectives on prolonged presurgical wait for these procedures.

Limitations

There are several considerations when interpreting this review. Firstly, limitations within the reported literature prevented the use of gold-standard individual patient meta-analysis and/or multivariate meta-regression approaches.

Secondly, the outcome measures synthesised for the primary outcomes of this meta-analysis were diverse. However, only validated outcome measures were utilised in quantitative analysis, and recommended approaches to comparing these were utilised to preserve the validity of this review. Interestingly, there was no trend in preference for particular outcome measures over time, indicating broad consensus on the utility and validity of each system.

Furthermore, 61,905 of the 89,996 patients reported were drawn from one study30. Although this study had a low risk of bias and, through its own multivariate regression, estimated a greater effect size in post-operative joint-function and HRQOL outcomes than was observed in the current meta-analysis, it is possible that this review was underpowered to directly observe any relationship between pre-surgical waiting time and postoperative joint function and HRQOL30. Consequently, a deleterious relationship remains plausible. Reassuringly, this meta-analysis independently replicated previous findings in literature, showing a greater improvement in joint functionality and HRQOL following THR relative to TKR in patients 6–12 months post-operatively58. Despite this, these postoperative outcomes also likely carried greater exposure to confounding due to the multifactorial nature of postoperative recovery (e.g., when considering the impact of independent predictors of postoperative outcomes, such as capacity for self-care, comorbid diseases (e.g. diabetes), and exercise)59,60.

Finally, all reported surgeries were performed in Europe, North America, or New Zealand. This limits the applicability and generalisability of these findings to African, Asian, and South American patient populations61,62,63. Indeed, future research should seek to expand this evidence base for patients in underserviced health systems. However, when considering the influence of patient heterogeneity and confounding on the reported analyses, both these geographical restrictions and the broader homogeneity of included patient populations serve to mitigate unmeasurable confounders (e.g., socioeconomic circumstance) and preserve the internal validity of this work. In alignment with this, whilst included studies capture a 21-year study period with consequent practice variability, reported effect sizes show broad temporal agreement. Interestingly, the significant reduction in preoperative HRQOL associated with prolonged pre-surgical wait time was driven by more recent studies, perhaps indicating the influence of additional risk factors and multimorbidity within modern patient populations.

Conclusions

This is the first meta-analysis and systematic review to explore both post-operative outcomes and patients’ perspectives on delayed primary elective THR and TKR surgeries. Despite variable outcome assessment, patient voice was unanimous that delays to surgery carried deleterious impacts on both their psychosocial wellbeing and perceptions of care. Indeed, beyond this context, this finding is reflected within broader elective orthopaedic literature39,42,44. Importantly, given its influence on HRQOL, future work should seek to further explore patient perspectives on delayed surgical care provision.

The findings of this review differ from previous systematic reviews and meta-analyses on the effect of waiting for THR or TKR. When considering previous reviews, this reported population (preoperative outcomes N = 9,020) exceeds both previous sample sizes (N = 1,646 and N = 2,490 patients for Hoogeboom and Patten, respectively), enhancing the relative power of this meta-analysis10,11. Given this larger patient population, coupled with this review’s implementation of meta-regression to prevent false negative errors from arising around binary discrimination, the pre-operative arm of this analysis is likely adequately powered.

Whilst this meta-analysis clearly shows the deleterious effect of prolonging pre-surgical waiting time in primary elective THR and TKR, no explicit relationship between waiting time and postoperative HRQOL and joint functionality was identified. However, as pre-surgical wait times continue to increase, and patient pre-operative joint function (itself an independent predictor of postoperative joint function) continues to deteriorate, this effect may become more apparent in future25,45.

In conclusion, this systematic review and meta-analysis of 89,996 patients undergoing primary elective THR and TKR demonstrates a significant association between prolonged pre-surgical waiting and deleterious preoperative joint-specific outcomes and HRQOL. Furthermore, patient voice unanimously condemned delayed care. Postoperatively, there was a plausible relationship between waiting time and 12-month joint function and HRQOL, whilst patients undergoing THR experience greater joint functionality and HRQOL gains from surgery, relative to TKR. Future work should elucidate the determinants of post-operative joint function and HRQOL following primary THR and TKR, across the global economic spectrum. However, in lieu of this, urgent action should be taken to minimise the ongoing deterioration of patients’ joint functionality, HRQOL, and psychological status whilst awaiting elective primary THR/TKR.

Methods

Protocol registration and ethics

The study protocol was registered on the International Prospective Register of Systematic Reviews (PROSPERO) database on 01/03/2022 (CRD42022288128) and was undertaken in accordance with the Preferred Reporting Items for Systematic Review and Meta-analyses (PRISMA, 2020) and Cochrane Handbook for Systematic Reviews of Interventions (2021)64,65. Ethical approval was not required.

Information sources and search strategy

Reported populations of patients undergoing elective primary THRs or TKRs were interrogated to explore how the duration of the pre-surgical waiting period, defined as the period from placement on waiting list until surgery, influenced joint-specific outcome scores, indices of health-related quality of life, and psychological responses. On 30/1/2023, this search strategy (Supplementary Table 1) was executed on the “MEDLINE(R) and In-Process, In-Data-Review & Other Non-Indexed Citations 1946 to January 30th, 2023”, “EMBASE 1980 to 2023 Week 5”, PubMed, and Cochrane CENTRAL databases. No limitations were placed on these searches.

Selection and eligibility criteria

Primary published observational and randomised-controlled literature reporting elective primary TKR and THR populations, their time-to-treatment on presurgical waiting lists, and inclusion of relevant outcome data were eligible. The primary outcomes of this systematic review were validated measures of joint-specific outcomes, HRQOL, or psychosocial perspective, with qualitative data being sought secondarily. Individual case-studies and unpublished data were ineligible. To ensure internal validity and preserve external validity, reported populations containing patients aged 16 or under and those undergoing non-elective (e.g., unplanned trauma surgery), partial, or secondary revision of THR/TKR were also excluded.

Following execution of the search strategy, identified records were de-duplicated and independently screened by two authors to assess relevance. Inter-rater reliability was thus assessed using Cohen’s kappa and disagreements resolved by the senior author12,65. Subsequently, relevant abstracts were reviewed to establish final eligibility. All relevant records were reviewed to identify further references of interest. Non-English language manuscripts were only to be excluded after attempting to contact the corresponding authors.
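As a reminder of how the agreement statistic is formed, the sketch below computes Cohen's kappa for two screeners making include/exclude decisions. The function name and the counts in the example call are hypothetical and chosen purely for illustration; they are not the screening data of this review.

```python
def cohens_kappa(both_include, only_a, only_b, both_exclude):
    """Cohen's kappa for two raters making a binary include/exclude decision.

    Arguments are the four cells of the 2x2 agreement table.
    """
    n = both_include + only_a + only_b + both_exclude
    observed = (both_include + both_exclude) / n

    # Marginal inclusion rates for each rater.
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n

    # Agreement expected by chance if the raters decided independently.
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    return (observed - expected) / (1 - expected)


# Hypothetical counts, for illustration only.
kappa = cohens_kappa(both_include=20, only_a=5, only_b=5, both_exclude=70)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which matters in screening because most records are excluded by both reviewers.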

Data collection and analysis

Following identification, primary outcome data was extracted by the primary author under supervision from the final author. This data was then analysed in prespecified pre- and post-operative subgroups to prevent confounding. Where duplicate data was reported in literature, the revised analysis was retained. Preferred primary outcome measures for meta-analysis included clinically validated joint-specific outcome scores (e.g., Oxford Hip and Knee scores (OHS, OKS)) and health-related quality of life scores (HRQOL, e.g., Short Form 36 (SF-36) and EuroQol-5D (EQ5D)). However, qualitative, subjective, non-validated, or otherwise incompatible outcome measures were also recorded and manually tabulated for systematic review.

Potential sources of ascertainment bias and heterogeneity were identified through further collection of descriptive datasets (e.g., patient eligibility criteria, duration of presurgical waiting time, study period, study location(s), sample sizes, proportion female sex, and mean patient age were sought for reported populations). Where missing data was identified or further information required for analysis, clarification was sought from the relevant corresponding author.

Given the diverse outcome measures within scoping literature, comparable data was transformed and represented as mean and standard deviation to facilitate quantitative synthesis. Joint-specific pain and function scores were converted into a common weighted score (0–100, worst-to-best of pain and function subdomains)65. Similarly, SF-36 HRQOL (0–100 scale) was converted into a single preference-based EQ-5D index (0–1, worst-to-best) following Ara and Brazier’s method54. Where necessary, the standard deviation was imputed using Wan et al.’s adaptive method66. Effect sizes were summarised as either mean difference (MD, percentage change) or odds ratios (OR), where possible.
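The two transformations described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this review: the rescaling is a plain linear map onto 0–100 (the subdomain weighting mentioned above is not reproduced), the function names are hypothetical, and the standard deviation estimate shown is the interquartile-range estimator published by Wan et al., which may not be the specific variant applied here.

```python
from statistics import NormalDist


def rescale_to_100(mean, sd, scale_min, scale_max, higher_is_better=True):
    """Linearly map a score's mean and SD onto a 0-100, worst-to-best scale."""
    span = scale_max - scale_min
    new_mean = (mean - scale_min) / span * 100
    if not higher_is_better:
        # Flip instruments where a higher raw score means worse status.
        new_mean = 100 - new_mean
    new_sd = sd / span * 100  # SD scales with the span; direction is irrelevant
    return new_mean, new_sd


def sd_from_quartiles(q1, q3, n):
    """Estimate SD from the interquartile range and sample size (Wan et al.)."""
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    return (q3 - q1) / (2 * z)


# Illustrative inputs only: a 0-48 score where higher is better.
print(rescale_to_100(mean=24, sd=6, scale_min=0, scale_max=48))
# Illustrative inputs only: quartiles of 40 and 60 in a sample of 100.
print(round(sd_from_quartiles(q1=40, q3=60, n=100), 2))
```

For large samples the divisor in the quartile-based estimate approaches 1.35, the familiar conversion between an interquartile range and a standard deviation under normality.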

In the absence of individual patient data, summary meta-analysis was undertaken using a random-effects residual maximum likelihood model of mean difference. Ninety-five percent confidence intervals (95% CI) and heterogeneity (I2) were also calculated. Linear meta-regressions were performed to explore heterogeneity due to variable reported mean pre-surgical waiting times, whilst sensitivity analyses of regression curves were performed using Clogg’s method to enable comparison of postoperative hip and knee arthroplasty data53,65. Mixed model meta-analyses and meta-regressions were performed using the “Metafor” R software package67,68. Publication bias of quantitative syntheses was assessed using Egger’s funnel plot tests65,68,69.
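The logic of a random-effects pooled estimate and the I2 statistic can be illustrated with a short, dependency-free sketch. Note the substitution: for simplicity it uses the closed-form DerSimonian-Laird estimator of between-study variance rather than the residual maximum likelihood estimator used in this review, so results would not match the Metafor output exactly. The function name and the effect sizes in the example are hypothetical.

```python
from math import sqrt


def random_effects_pool(effects, variances):
    """Pool study effects with DerSimonian-Laird random-effects weights.

    Returns the pooled effect, its 95% CI, tau^2, and I^2 (as a percentage).
    """
    k = len(effects)
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Cochran's Q measures dispersion of study effects around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2, i2


# Hypothetical mean differences and sampling variances, for illustration only.
pooled, ci, tau2, i2 = random_effects_pool([2.0, 5.0, -1.0, 4.0],
                                           [1.0, 1.5, 2.0, 1.0])
print(f"MD {pooled:.2f}; 95% CI {ci[0]:.2f},{ci[1]:.2f}; I2 = {i2:.1f}%")
```

A meta-regression extends the same weighting scheme by regressing the study effects on a moderator, here mean waiting time, so that the slope describes the change in effect per additional day of waiting.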

Risk of bias and certainty assessment

Risk of bias from randomised, non-randomised, and cross-sectional studies was assessed following the gold-standard Risk of Bias 2 (RoB2), Risk of Bias in Non-Randomised Studies (ROBINS-I), and Joanna Briggs Institute (JBI) frameworks, respectively47,48,70. In each instance, the overall risk of bias for sought outcomes was ascribed following the highest constituent domain. Sensitivity analysis was indicated following identification of maximal overall risk of bias. The Grading of Recommendations Assessment, Development and Evaluation (GRADE) framework was also used to assess the certainty of evidence for each outcome synthesised as either “high”, “moderate”, “low” or “very low”50.
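The "highest constituent domain" rule amounts to taking the worst judgement across domains. A minimal sketch is given below, assuming the four ordered ROBINS-I judgement levels; the function names and the example domain judgements are hypothetical, and other frameworks would need their own ordering.

```python
# Ordered ROBINS-I judgement levels, from least to most concerning.
ROBINS_I_ORDER = ["low", "moderate", "serious", "critical"]


def overall_risk(domain_judgements, order=ROBINS_I_ORDER):
    """Overall risk of bias is the worst judgement across all domains."""
    return max(domain_judgements, key=order.index)


def needs_sensitivity_analysis(domain_judgements, order=ROBINS_I_ORDER):
    """Flag outcomes whose overall risk reaches the top of the scale."""
    return overall_risk(domain_judgements, order) == order[-1]


# Hypothetical domain judgements for a single study outcome.
domains = ["low", "serious", "moderate", "low"]
print(overall_risk(domains), needs_sensitivity_analysis(domains))
```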