Introduction

Surgical site infections (SSI) complicate up to 40% of surgical procedures, depending on the type of operation1. By definition, an SSI occurs within 30 days of surgery (or within 90 days if an implant is left in place)2. The mean postoperative inpatient stay is currently four days, so the majority of SSI become apparent after discharge3,4. Early recognition of SSI is essential to minimise associated morbidity and mortality, yet patients frequently seek care from primary or community care providers, who may not be familiar with managing surgical complications. Strategies are required to enable secondary care providers to conduct robust surveillance and follow-up of surgical wounds5,6.

Telemedicine is an innovative solution for monitoring patients and their wounds postoperatively. Remote consultations remove the need to leave home and the associated carer requirement, reduce travel times and costs, and reduce waiting room times and risk of nosocomial infection7,8,9. Patients frequently find the experience reassuring and many would prefer future consultations by this method10,11,12. A reduction in patient travel seems to have wider implications still: a recent review concluded that the use of telemedicine consistently reduces carbon footprint compared with face-to-face reviews, even when factoring in the impact of equipment and resource use13. With national targets such as net zero emissions by 2045, implementation of remote measures may become a mainstay of practice in years to come14. The SARS-CoV-2 pandemic catalysed integration of digital health models worldwide and telemedicine was applied at the forefront of many patient-facing services. Surgical services followed with rapid adoption of remote post-operative follow-up, but cautious examination is warranted before it is welcomed as standard practice15. Telephone consultations, whilst providing invaluable information at a fraction of clinic resource use, do not provide direct visualisation of a patient’s post-operative wound. However, even with the addition of a visual aspect in photo- or video-based approaches, there are barriers to this service. For example, erythema, a hallmark characteristic of accepted SSI definitions, has poor levels of interobserver agreement on photograph assessment14,16,17. Before telemedicine can be unanimously recognised as established practice, substantial evidence of diagnostic accuracy is required.

The aims of this study were to (1) establish the overall accuracy of telemedicine for diagnosis of SSI; (2) identify factors associated with heterogeneity of findings between studies; and (3) assess the effect of individual telemedicine methods and the impact of varying reference standards on diagnostic accuracy.

Results

Study selection

The study selection process flow diagram is shown in Fig. 1. A total of 1400 records were screened after 488 duplicates were removed. After title and abstract screening, 61 full text reports were assessed for eligibility. The final review included 19 studies, and 17 had paired designs taken into a meta-analysis16,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35. A total of 11,437 observations were made in 19,090 patients, as ten studies only included telemedicine investigation in a subset of patients18,19,20,21,22,25,26,28,30,32. Three reports could not be retrieved. For each, contact was attempted through the publishing journal and first author on two separate occasions, after which the studies were excluded from review.

Fig. 1: PRISMA flow diagram for study selection process.

Identification of studies via databases and registers is found in the left column, and via other methods on the right.

Characteristics of included studies

Studies were conducted in nine countries across five continents. Five were in low or lower-middle income economies, as per the World Bank classification18,19,22,27,28. The remaining reports were from high income economies. Weighted mean age of participants across the included reports was 47.1 ± 13.3 years. Female patients made up 57.4% of participants. Pooled SSI rate was 5.6% (95% CI, 5.49–5.74). Individual study characteristics can be found in Table 1.

Table 1 Individual study characteristics.

Methodological quality of included studies

A summary of QUADAS-2 assessments is presented in Fig. 2. Risk of bias was present in all studies, and two studies were scored as high risk of bias in all domains22,30. This was largely owing to non-consecutive sampling, a lack of clarity over whether index tests were interpreted without knowledge of the reference standard (and vice versa), the time interval between interpretation of index test and reference standard, and only subgroups of patients being included in study analysis. Nine reports had high applicability concerns, principally from the index test, patients or reference standard differing from the review question22,23,24,26,28,29,32,34. Risk proportions are displayed in Fig. 3.

Fig. 2: Risk of bias and applicability concerns summary: review authors’ judgements about each domain for each study.

High, unclear and low risk of bias and applicability concerns are represented as shown in the legend.

Fig. 3: Risk of bias and applicability concerns graph: review authors’ judgements about each domain presented as percentages across included studies.

High, unclear and low risk of bias or applicability concerns are represented as shown in the legend.

Synthesis of results

Individual study estimates of test accuracy are presented in Fig. 4 as a coupled forest plot of sensitivity and specificity. Index tests were categorised into photograph, telephone and questionnaire based methods, with five16,21,23,24,25, nine18,19,22,27,28,29,30,31,33 and three studies20,26,32 available for each respectively. Fifteen manuscripts16,18,19,20,21,24,25,26,27,28,29,30,31,32,33 utilised a CDC based reference standard, with the remaining two22,23 using empirical or site-specific protocols. Two studies22,23 conducted follow-up within 14 days, and a further four studies24,26,30,32 were unclear as to the timeframe for reference standard review. There were no studies available that compared multiple index tests or reference standards.

Fig. 4: Coupled forest plot presenting sensitivity and specificity of SSI diagnosis by telemedical methods.

Final two columns display the sensitivity and specificity respectively with 95% confidence intervals.

The mean sensitivity of all telemedical methods for SSI diagnosis is 87.9% (95% CI, 68.4–96.1) and mean specificity is 96.8% (95% CI, 93.5–98.4). Mean values broken down by index test are shown in Table 2. Youden’s index is acceptable at 0.847. The mean positive and negative predictive values for diagnosis of SSI are 54.8% (95% CI, 52.1–57.4) and 98.5% (95% CI, 98.2–98.6) respectively. The random effects SROC curve for all methods of telemedicine in diagnosis of SSI shows a symmetric design approaching the top left corner and is plotted in Fig. 5. Heterogeneity seen in the 95% prediction region is explored further in subgroup analysis. The overall diagnostic odds ratio indicates high effectiveness for SSI diagnosis at 217.6 (95% CI, 47.0–1006.8).
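
As a rough arithmetic check, Youden’s index and a diagnostic odds ratio can be recomputed from the summary sensitivity and specificity quoted above. The sketch below is illustrative only: the function names are hypothetical, and an odds ratio derived from the rounded summary point need not match the model-based estimate of 217.6 exactly.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = [sens / (1 - sens)] * [spec / (1 - spec)]."""
    positive_odds = sensitivity / (1.0 - sensitivity)
    negative_odds = specificity / (1.0 - specificity)
    return positive_odds * negative_odds


# Summary estimates reported for all telemedicine methods (as proportions).
sens, spec = 0.879, 0.968

j = youden_index(sens, spec)
dor = diagnostic_odds_ratio(sens, spec)

print(f"Youden's index: {j:.3f}")
print(f"DOR from the rounded summary point: {dor:.1f}")
```

The small gap between this back-of-envelope odds ratio and the reported value is expected, since the published figure comes from the fitted bivariate model rather than from rounded percentages.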

Table 2 Summary test accuracy of surgical site infection diagnosis by index test method.

Fig. 5: Random effects bivariate summary receiver operator characteristic curve of telemedicine for the diagnosis of surgical site infection.

Summary curve and point estimates display high levels of accuracy. Elliptical data points represent weight sensitivity-specificity trade-off for each study. The summary point is expressed in the summary curve with dotted line 95% confidence region and dashed line 95% prediction region.

Subgroup analysis

Five studies utilised photograph based telemedicine16,21,23,24,25. Two studies16,21 retrieved images with a digital camera, another two23,25 used smartphones and a final study24 did not specify the platform used. No studies used machine learning or ‘artificial intelligence’ methods to assist in diagnosis of SSI. A total of 1638 observations were available in 2287 patients, again due to subsets being included in diagnostic test accuracy analysis. The weighted average age was 46.8 ± 11.7 years and 35.8% of patients were female. All studies were conducted in high income countries (HIC). SSI rate across the available studies was 3.72% (95% CI, 3.16–4.29). The mean sensitivity for photograph based methods is 63.9% (95% CI, 30.4–87.8) and mean specificity 92.6% (95% CI, 89.9–94.5). The mean positive and negative predictive values for SSI diagnosis are 15.6% (95% CI, 11.6–20.7) and 97.6% (95% CI, 97.0–98.0) respectively. The random effects SROC curve for photograph based methods shows a symmetric distribution and is displayed in Fig. 6. The overall diagnostic odds ratio indicates good test effectiveness at 22.0 (95% CI, 4.7–102.5). Heterogeneity is largely reduced, although the region of confidence is conversely enlarged.

Fig. 6: Random effects bivariate summary receiver operator characteristic curve for photograph based recognition of surgical site infection.

Elliptical data points represent weight sensitivity-specificity trade-off for each study. The summary point is expressed in the summary curve with dotted line 95% confidence region and dashed line 95% prediction region.

Comparative index test SROC curve analysis reveals three distinct distributions of symmetric plots with telephone methods showing superior test accuracy, approaching the upper left corner (Supplementary Fig. 1). CDC criteria were used as the reference standard in all but two studies22,23. Analysis of tests standardised by CDC reference standard marginally increases overall sensitivity to 90.3% (95% CI, 69.5–97.4) but has no significant impact on specificity (96.8%; 95% CI, 93.2–98.5). The SROC curve for telemedicine using CDC criteria reflects this marginal increase in sensitivity but the 95% prediction region is also increased in size (Supplementary Fig. 2). Summary test accuracy by CDC reference standard is presented in Supplementary Table 2. All methods of telemedical follow-up are informative with diagnostic odds ratios >10. Five studies diagnosed SSI through review by surgeons. Observations from 4451 participants were available, resulting in lower overall sensitivity at 84.5% (95% CI, 22.3–99.0) and specificity at 94.5% (95% CI, 90.9–96.8). Heterogeneity in diagnosis was reduced with this limitation but at the cost of a greater 95% confidence region.

Discussion

Telemedicine achieves good sensitivity (88%) and high specificity (97%) irrespective of geographical location and socioeconomic status. As such, remote methods could be considered globally as a screening tool for SSI post discharge, on the basis that correctly identifying patients without an infection can prevent them from needing to travel long distances to see a clinician, and conversely those diagnosed as having had infection can be signposted to appropriate post-operative care at an earlier stage. Telephone-based telemedicine appears to be the most accurate method for SSI diagnosis, and has been the most extensively studied index test in the last two decades. Arguably this is the most readily deployable and economically viable option. Telephone discussions enable real-time data collection from the patient with follow-up questions adaptable to the scenario, and conversely, can be used to deliver validated questionnaires by untrained individuals, should the need arise (as part of widespread screening, for example). Clinicians can obtain further information from patients in response to signalling, such as exploring systemic symptoms of infection, that may not be derived through images alone.

In contrast, photograph-based methods are contemporary, with all studies conducted within the last seven years. This index test offers additional visual stimuli which other methods do not, debatably of paramount importance in the diagnosis of SSI. Unlike telephone methods, photograph reviews are asynchronous, with data collected at a time prior to their review, preventing flexible further questioning. There may be a lack of standardisation across photograph based studies, as image quality and provision of wound photography guidance (or training) are important factors in determining test accuracy. However, these elements were not extractable from the available literature and may have contributed to the lower accuracy compared with telephone based methods36. In addition, diagnosis of SSI based on appearance alone is subjective, whether in person or using digital images, and this subjectivity may reduce the overall accuracy of digital assessment. Standardisation of wound photograph technique, and patient acceptability in digitally naïve populations (due to both age and socioeconomic status) are important research factors to be established in this area. Further, no single study investigated more than one diagnostic method, the impact on test accuracy of a combination of techniques (e.g. photograph review with simultaneous questionnaire submission or with telephone review for concurrent data extraction), or the impact of video based assessments, where dialogue and a contemporaneous wound assessment can take place. Future studies should assess the diagnostic accuracy of combined or novel telemedicine methods in order to determine the optimal approach.

Postoperative wound surveillance is notoriously challenging, resulting in underreporting of SSI rates37,38. Overstretched primary care services are often burdened with facilitating management of such complications after discharge. Digital telemedicine requires limited resources and has potential utility in alleviating primary care exigency by offering a direct connection with secondary or tertiary care providers. Healthy wounds can be easily identified without the inconvenience of travelling great distances to clinic. Equally, obvious and indeterminate infection can be swiftly identified and either appropriately managed remotely or returned to secondary care for definitive treatment.

The National Health Service (NHS) in the UK is committed to delivering a net zero service by 204514. In the NHS in England, up to 10% of total carbon dioxide emissions are attributable to personal travel, and specifically, emissions from patient travel have almost doubled since 1990 (0.63 to 1.23 Mt CO2e)39. Each hospital outpatient appointment is estimated to produce 76 kg CO2e, and a visit to general practice produces an estimated 66 kg CO2e39. Digital, remote follow-up offers some mitigation in the personal travel targets for a greener NHS. Further, artificial intelligence, or machine learning, has been identified as a route to buttress the emission reduction effort40. If established into practice, machine learning could reasonably be applied to digital wound surveillance models to save clinician time, minimise carbon footprint and reduce clinical resource use.

The telemedicine population studied is relatively young (weighted mean age 47.1 years), which may reflect usability and is unlikely to represent the entire surgical population. Vascular patients, for instance, are frequently much older (average age 64.1 years) and comorbid, and as such may not be able to comply with smartphone-based telemedicine methods41. Investigation of telemedicine in elderly patients has identified a lack of access and experience with technology, and hearing, visual and communication challenges as barriers to utilisation42,43. Widespread adoption of such strategies without efforts to improve inclusion may be disproportionately disadvantageous to the elderly or infirm population groups.

This study has some limitations. High levels of heterogeneity were apparent in the initial meta-analysis (Fig. 5), as expected from diagnostic test accuracy studies. Photograph based subgroup analysis provided moderation (Fig. 6), but it rested on substantially fewer observations (1638 compared with 11,437 for all telemedicine), all from high income economies, and should be interpreted cautiously. All included studies contained high risk of bias and so no exclusions were made on this basis alone. Most studies did not report test accuracy as their primary outcome and as such only subgroups of participants were included in analysis. No studies used a case-control design and one had a retrospective nature32. Three reports either had all or subsets of patients recruited in non-consecutive samples20,26,32. An unclear or inadequate (more than one week) time interval between index test and reference standard was apparent in five studies23,24,26,30,32. Two studies investigated index tests or reference standards within two weeks of surgery22,23. The same reference standard was not used throughout, giving rise to potential for verification bias. Accuracy of diagnosis and heterogeneity did not, however, alter significantly in subgroup analysis using CDC criteria. Whilst recognised as the gold standard, the CDC criteria are not without challenges. The classification is subjective and has poor interrater agreement, resulting in variable comparisons of wounds44. When compared with the ASEPSIS criteria, which are objective, ASEPSIS over classify SSI but under report SSI if pus is present17,44,45. The Southampton score is another alternative45. CDC is the most widely used, and is often regarded as the reference standard despite the inherent flaws. The need for a robust gold standard has been identified, but does not seem to have been accomplished yet.

The evidence suggests that using telemedicine, in the form of telephone consultation, with or without photographic adjuncts, to diagnose SSI is highly specific and as such could be utilised as an effective screening tool in patients post discharge. Implementation of this method has great potential in the reduction of resource use, associated healthcare cost, and patient and clinician time expenditure. It has widespread applications spanning geographical and socioeconomic barriers and would improve the carbon footprint of health services globally. However, the average age of participants in all studies is relatively young and as such may under-represent the surgical population. Widespread adoption of telemedicine without strategies to improve inclusion may therefore disproportionately discriminate against the elderly or infirm. Included studies were also at risk of bias which may impact upon the validity of results. Further work is required to maximise engagement with telemedicine in digitally naïve or incapable populations, and to determine the specific utility of telemedicine within clinical practice in order to maximise its benefits.

Methods

This study was conducted in accordance with the Cochrane handbook for systematic reviews of diagnostic test accuracy, and has been reported in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA-DTA) statement, a copy of which is attached to this article in the supplementary information (supplementary table 1)46,47,48. The protocol for this review was prospectively registered with PROSPERO (ID CRD42021290610) and has been submitted for peer reviewed publication, with a pre-print available online49.

Search strategy and selection criteria

Studies meeting the following criteria were considered for inclusion:

  i. Participants: All post-operative patients over 18, of any operation type. No restrictions were placed on the study setting or length of follow-up.

  ii. Index tests: Telemedicine by any method (telephone, video call, photograph or questionnaire), including the use of questionnaires as these can be delivered remotely.

  iii. Reference standards: Face to face review, as per the United States (US) Centers for Disease Control and Prevention (CDC) criteria for SSI, is deemed the gold standard, but no restrictions were placed if other methods were used. This was to ensure all available evidence would be synthesised.

  iv. Target condition: SSI as defined by the CDC criteria; infection within 30 days of surgery or within 90 days if an implant is left in place50.

  v. Study design: Abstracts, reviews and conference proceedings were excluded. All other research designs were included in the systematic review, but only comparative, paired methodologies were taken forward to meta-analysis as all patients would experience index tests and reference standards.

Studies were excluded if they did not meet the inclusion criteria or were not presented in English (for lack of resources to translate other languages). The following databases were searched from inception to January 2022: Medline, Embase, CENTRAL and CINAHL. Search terms were formed from a combination of synonyms related to the keywords “telemedicine” AND “surgical wound infection”. The strategy used for Medline, Embase and CINAHL can be found in supplementary information (supplementary methods 1).

The search strategy was developed with and conducted by an information specialist who uploaded results onto Rayyan, a bespoke tool for conducting systematic reviews51. These were deduplicated before screening of titles and abstracts by two independent reviewers against the inclusion criteria. Relevant manuscripts were retrieved for full text review and assessed for eligibility by two independent reviewers. Reference lists of these articles were searched manually for any additional studies not identified in the preliminary search. Any disagreement at each stage was resolved by a third reviewer for consensus.

There were no limitations placed on study design for qualitative synthesis to comprehensively synthesise the literature. Reports with paired designs were taken forward for quantitative analysis to enable random-effects bivariate meta-analysis, and summary receiver operator characteristic (SROC) curves to be plotted.

Data extraction

A bespoke data spreadsheet (Microsoft Excel Version 16.59) was designed and utilised for data extraction by two independent authors. Data on study and diagnostic characteristics (author, year, country, study design, sample size, gender, age, telemedicine method, reference standard, type of surgery, follow-up schedule) and potential confounding factors (diabetes, BMI, and smoking status) were collected in addition to SSI rates, sensitivity, and specificity of diagnosis.

Surgical site infections were defined as per CDC criteria2. Only superficial SSI were included due to inherent barriers of diagnosing deep SSI remotely. No restrictions were placed on classification of telemedicine, reference standard type, or other characteristics.

Assessment of methodological quality

Risk of bias and the applicability of studies were assessed again by two independent reviewers with the QUADAS-2 tool52. The tool was first piloted by the reviewers on two of the included studies, with agreement of 80% across all categories considered sufficient before further assessment of the remaining studies, as recommended by the Cochrane handbook for systematic reviews of diagnostic test accuracy47. QUADAS-2 contains four domains, each assessed for risk of bias: patient selection, index tests, reference standard, and flow and timing. The first three domains are also investigated for applicability concerns. For each domain category, signalling questions are asked to assist judgements with answers ‘yes’, ‘no’ or ‘unclear’, such that ‘yes’ indicates low risk of bias. If any question is answered ‘no’, this domain category is judged as high risk of bias or as having applicability concerns. Answers of ‘unclear’ are used only if there was insufficient data reported. Risk of bias and applicability scores were taken into consideration for subgroup meta-analysis, ascertaining a strength of recommendation from data retrieved.
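
The domain-level judgement rule described above can be sketched in a few lines. This is an illustration only, not the reviewers' actual tooling: the function name `judge_domain` is hypothetical, and treating a mix of ‘yes’ and ‘unclear’ answers as an ‘unclear’ judgement is an assumption, since the text only specifies the all-‘yes’ and any-‘no’ cases.

```python
from typing import Iterable, List

ALLOWED_ANSWERS = {"yes", "no", "unclear"}


def judge_domain(answers: Iterable[str]) -> str:
    """Map signalling-question answers for one domain to a risk judgement.

    Rules taken from the text: any 'no' gives 'high'; all 'yes' gives 'low'.
    Assumption (not stated in the text): any remaining mix that includes
    'unclear' is judged 'unclear'.
    """
    cleaned: List[str] = [a.strip().lower() for a in answers]
    if not cleaned:
        raise ValueError("at least one signalling answer is required")
    invalid = [a for a in cleaned if a not in ALLOWED_ANSWERS]
    if invalid:
        raise ValueError(f"unexpected answers: {invalid}")

    if "no" in cleaned:
        return "high"
    if all(a == "yes" for a in cleaned):
        return "low"
    return "unclear"


# Hypothetical answers for the four QUADAS-2 domains of one study.
study = {
    "patient selection": ["yes", "no", "yes"],
    "index test": ["yes", "yes"],
    "reference standard": ["yes", "unclear"],
    "flow and timing": ["yes", "yes", "yes"],
}
judgements = {domain: judge_domain(a) for domain, a in study.items()}
```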

Statistical analysis

Continuous descriptive characteristics were expressed as weighted mean averages with standard error. A bivariate model for meta-analysis was used to produce summary measures of sensitivity and specificity with confidence regions. All studies with paired designs had pooled forest plots and summary receiver operator characteristic curves synthesised in the initial exploratory analysis. Analysis was conducted with MetaDTA and plots constructed with Review Manager 5.453,54. Additional sources of heterogeneity were investigated through covariates (study country, type of surgery, telemedicine method, reference standard used). Overall index test effectiveness is expressed through diagnostic odds ratios.
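
For readers less familiar with paired diagnostic accuracy data, the per-study quantities that feed a forest plot can be sketched from a 2×2 table. The counts below are hypothetical, the names are illustrative, and the Wilson score interval is an assumption chosen for the example; the source does not state which interval method the software used.

```python
import math
from typing import NamedTuple, Tuple


class PairedTable(NamedTuple):
    """Index test result cross-classified against the reference standard."""
    tp: int  # index test positive, reference positive
    fp: int  # index test positive, reference negative
    fn: int  # index test negative, reference positive
    tn: int  # index test negative, reference negative


def sensitivity(t: PairedTable) -> float:
    return t.tp / (t.tp + t.fn)


def specificity(t: PairedTable) -> float:
    return t.tn / (t.tn + t.fp)


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical study: 50 patients with SSI, 450 without.
example = PairedTable(tp=45, fp=20, fn=5, tn=430)

sens = sensitivity(example)
spec = specificity(example)
sens_ci = wilson_interval(example.tp, example.tp + example.fn)
spec_ci = wilson_interval(example.tn, example.tn + example.fp)

print(f"sensitivity {sens:.3f} (95% CI {sens_ci[0]:.3f}-{sens_ci[1]:.3f})")
print(f"specificity {spec:.3f} (95% CI {spec_ci[0]:.3f}-{spec_ci[1]:.3f})")
```

These per-study estimates are only the inputs; the bivariate random-effects model then pools sensitivity and specificity jointly, which this sketch does not attempt.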

For cases of multi-threshold test positivity, the cut-off achieving the maximum possible sensitivity–specificity trade-off was taken forward. Indeterminate index test results were classified as ‘no SSI’ as this more closely reflects what would happen in practice. Tests were grouped as a unified ‘telemedicine’ and through the sub-groups ‘photograph,’ ‘telephone,’ and ‘questionnaire.’ No studies reported video-based methods.
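
The two handling rules above can be illustrated with a short sketch. Interpreting the “maximum possible trade-off” as the cut-off with the largest Youden’s index is an assumption made for this example, and the cut-off labels, accuracy values and function names are all hypothetical.

```python
from typing import Dict, Tuple


def best_cutoff(accuracy_by_cutoff: Dict[str, Tuple[float, float]]) -> str:
    """Return the cut-off whose (sensitivity, specificity) pair maximises
    Youden's J = sensitivity + specificity - 1 (assumed selection rule)."""
    if not accuracy_by_cutoff:
        raise ValueError("no cut-offs supplied")
    return max(
        accuracy_by_cutoff,
        key=lambda c: accuracy_by_cutoff[c][0] + accuracy_by_cutoff[c][1] - 1.0,
    )


def dichotomise(result: str) -> bool:
    """Map an index test result to SSI (True) or no SSI (False).

    Indeterminate results are counted as 'no SSI', as described in the text.
    """
    normalised = result.strip().lower()
    if normalised not in {"ssi", "no ssi", "indeterminate"}:
        raise ValueError(f"unexpected result: {result!r}")
    return normalised == "ssi"


# Hypothetical study reporting accuracy at three score thresholds.
cutoffs = {
    "score >= 1": (0.95, 0.70),
    "score >= 2": (0.88, 0.93),
    "score >= 3": (0.60, 0.99),
}
chosen = best_cutoff(cutoffs)
calls = [dichotomise(r) for r in ["SSI", "indeterminate", "no SSI"]]
```

Counting indeterminate results as negative is a conservative choice for specificity but can lower sensitivity, which is worth bearing in mind when reading the pooled estimates.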

Subgroup analysis

All studies which compared photograph review to face to face review are referred to as photograph-based telemedicine methods. Photograph-based methods utilise visual input whereas questionnaire and telephone methods do not incorporate trained physicians viewing a patient’s wound. As such, pre-specified analysis was conducted for studies including these methods for their sensitivity and specificity. Further analyses were performed as per the reference standard used and whether a pre-specified threshold was stated.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.