Performance of a novel risk model for deep sternal wound infection after coronary artery bypass grafting

Clinical prediction models for deep sternal wound infections (DSWI) after coronary artery bypass graft (CABG) surgery exist, although they have a poor impact in external validation studies. We developed and validated a new predictive model for 30-day DSWI after CABG (REPINF) and compared it with the Society of Thoracic Surgeons model (STS). The REPINF model was created through a multicenter cohort of adults undergoing CABG surgery (REPLICCAR II Study) database, using least absolute shrinkage and selection operator (LASSO) logistic regression, internally and externally validated comparing discrimination, calibration in-the-large (CL), net reclassification improvement (NRI) and integrated discrimination improvement (IDI), trained between the new model and the STS PredDeep, a validated model for DSWI after cardiac surgery. In the validation data, c-index = 0.83 (95% CI 0.72–0.95). Compared to the STS PredDeep, predictions improved by 6.5% (IDI). However, both STS and REPINF had limited calibration. Different populations require independent scoring systems to achieve the best predictive effect. The external validation of REPINF across multiple centers is an important quality improvement tool to generalize the model and to guide healthcare professionals in the prevention of DSWI after CABG surgery.


Methods
The REPLICCAR II study is an observational, multicenter, prospective cohort study (9 hospitals in the state of São Paulo) conducted between August 2017 and June 2019. The Ethical Committee Board of the Heart Institute of the Hospital das Clínicas, Faculty of Medicine, University of São Paulo, Brazil approved this study as a sub-analysis of the REPLICCAR project (CAPPesq: 2.507.078). Thus, informed consent was waived due to the analysis of pre-established data logs.
We declare that all methods were performed in accordance with relevant guidelines and regulations. All consecutive patients over 18 years undergoing isolated CABG surgery (first cardiac surgery) constituted the sample. The indications for CABG surgery were according to guidelines 9 . All patients received antibiotic prophylaxis at least one hour before skin incision according to institutional policies.
The variables included in REPLICCAR II were defined using the STS ACSD (Adult Cardiac Surgery Database) collection tool (version 2.9, 2017). Approximately 760 variables were collected preoperatively, intraoperatively, and postoperatively, and included risk factors, clinical and laboratory characteristics, and complications of surgery. The data were collected using a secure web application for building and managing online surveys and databases, the REDCap platform (Research Electronic Data Capture, https:// www. proje ct-redcap. org/).
The participating hospitals, their researchers, and data managers participated in meetings and data training before and during the data collection period. Data were audited twice by the REPLICCAR II team to evaluate the accuracy and validity of the information collected by the trained data managers 10 .
A trained surgical clinical nurse reviewed the infection criteria and definitions following the infection control surveillance system (Standard CDC National Healthcare Safety Network definitions following the National Healthcare Safety Network-NHSN) 11 . All infections involving the subcutaneous tissue to the mediastinum within 30 days following CABG surgery were considered DSWI. This involves fascia and muscle layers as well as organs, spaces and/or deep soft tissues. "Mediastinitis" refers to an infection of the mediastinum, which can be caused by different etiologies, including DSWI following sternotomy 11 . A computed tomography imaging study was performed in all patients with suspected DSWI and/or Mediastinitis for diagnostic confirmation. The infection control services of the participating hospitals on REPLICCAR II perform routinely the active surveillance to report surgical wound infections that progressed to deep planes and Mediastinitis. Data were verified by a specialist nurse from the coordinating center while carrying out her doctoral project. Thus, only cases diagnosed with the definitions and criteria of the CDC-NHSN were included in our analysis. The reference was reviewed and changed for the one used for the infection control services during the study. Patients who have a fascia or muscle affected by an infection during hospitalization often receive surgical wound debridement, antibiotics and negative-pressure wound therapy (vacuum-assisted closure) to prevent mediastinitis.
Approach. Confounders and predictors. First, we eliminated all variables missing in more than 30% of the patients because the imputed values would be driven by the imputation model. We next identified variables related to incidence of DSWI in the scientific literature and found 160 variables in our database. Of these, 55 variables with statistical association or clinical significance for DSWI were considered as predictors (supplementary Table 1). Therefore, the variables were chosen for the initial analysis according to their relationship with the scientific literature and subsequently for their statistical significance, all of this given the multifactorial nature related to infections. Our goal was to build a model that was the most rational and at the same time scientifically robust, including both pre and intraoperative variables.
Treating missing data with multivariate imputation. We used chained equations (MICE) to impute missing data and created 10 imputed datasets.
Sample distribution was captured with histograms and descriptive statistics.
Statistical analysis. The training sample was created using the REPLICCAR II database that included 4,085 patients. Information from an additional 498 patients from a different set of hospitals was assembled to create an external validation set (from 2015 to 2016). The model development for variables selection and regularization was performed with the least absolute shrinkage and selection operator (LASSO) logistic regression tenfold cross-validation, to enhance the prediction accuracy and interpretability of the statistical model produced.
We calculated the area under the receiver operating characteristic curve (c-index) to evaluate the discriminatory performance of the model and calibration in-large (CL) containing the observed and predicted values (ratio of observed/predicted). The discriminative ability was also evaluated by net reclassification improvement www.nature.com/scientificreports/ (NRI) and integrated discrimination improvement (IDI) 12 . The results were plotted to compare the new model (REPINF) with the STS in both the training and validation databases. We follow the guidelines recommended in the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis statement checklist (TRIPOD) 13 .

Ethics approval. This study was submitted and approved by the Ethics Commission for Analysis of Research
Projects (CAPPesq) under number 2016/15163-0. The free and informed consent was dismissed due to the analysis dealing with pre-established data logs.

Results
Nine hospitals started data collection, but 7 hospitals actively participated during the 2 years of the project. After exclusion of the patients from 2 hospitals (n = 53), our final sample size was 4085 patients undergoing isolated CABG surgery as the first cardiac surgery. The mean age was 63.3 years (95%CI 62.9-63.5) and 74% were male. The mean body mass index (BMI) was 27 kg/m 2 (95%CI 26.9-27.2) and common comorbidities included diabetes (49%), hypertension (88%), dyslipidemia (62%) and previous myocardial infarction (52%). The baseline characteristics and missing percentages are described in supplementary Table 2. After excluding variables with more than 30% missing, 5% of all patients had at least one missing variable in the REPLICCAR II database (n = 4085 and 160 variables).
The incidence of DSWI during 30 days from surgery was 2.47% (n = 101). We observed 104 deaths, a competing risk for DSWI, within 30 days (3.1%); of these, 3 patients died with DSWI in the period (2.9%). Characteristics between infected and non-infected patients are described in Table 1.

Model development.
Out of a total 55 of variables related to pre-and intraoperative factors, 7 were included in the Lasso modeling after tenfold cross-validation (Table 2). In the training sample (n = 4085), the REPINF had a c-index of 0.81 (95%CI 0.77-0.86) compared to STS c-index of 0.70 (95%CI 0.64-0.75) (Fig. 1). The predicted mean for DSWI was 0.12% (SD = 0.08) using the STS. The calibration in-the-large plot (Fig. 2) demonstrated that STS predictions tended to underestimate the DSWI risk in our sample.

External model validation. The validation database included 498 patients undergoing isolated CABG
during 2015-2016. The mean age was 61.7 (SD = 9.5), 78.5% were male, 54.6% had diabetes and 88% had hypertension. Relative to the STS, REPINF demonstrated improved classification with a NRI of 29% (Table 3) and, the discrimination IDI was 0.065 (6,5%). The incidence of DSWI in this sample was 1.61%. The STS model predicted 0.24% (SD = 0.15) of DSWI events and, REPINF 2.65% (SD = 1.58). The c-index was 0.83 (95%CI 0.72-0.95) and 0.72 (95%CI 0.56-0.88) for REPINF and STS in the external validation sample, respectively.    www.nature.com/scientificreports/ It is important to note that EuroSCORE II and STS are the most used worldwide to predict mortality risk after cardiac surgery. However, only the STS model has a validated index to predict the risk of DSWI. Therefore, our research team decided to compare our model only with the STS in order not to bring the comparison bias if we choose to use the EuroSCORE II since our outcome would not be mortality but DSWI. Below, we show in Table 4 some basic characteristics between the STS, REPINF and EuroSCORE II models.

Discussion
The development of prognostic models that combine patient characteristics, risk profiles, and surgical practice to produce predictions about future outcomes allow informed clinical decision-making 8 . Risk models can be used for quality measurement, clinical practice improvement, voluntary public reporting, and research.
The STS systematically underestimates the risk of infection in CABG patients, possibly due to the low rate of this complication for this type of procedure (less than 0.5%), yielding a c-index of 0.68 for CABG surgery patients 3 . The REPINF demonstrated better discrimination (IDI = 6,5%) and net reclassification improvement (NRI = 29%) compared to STS model. Differences in health management and performance across countries may also affect model discrimination. The MagedanzSCORE 4 (2010) was created using information from a single Brazilian institution among adults (n = 2809) undergoing isolated CABG and valve surgery. The score was developed and validated in the same population and thus provides overly optimistic model performance metrics 5 .
A recent prospective multicentric study with 16 centers of cardiac surgery in 6 European countries (England, Finland, France, Germany, Italy and Sweden) reported an incidence of DSWI of 2.5% and the following  www.nature.com/scientificreports/ independent predictors: female gender, BMI ≥ 30 kg/m 2 , estimated glomerular filtration rate < 45 mL/min/1.73 m 2 , diabetes, chronic lung disease, preoperative atrial fibrillation, critical preoperative state and BITA grafting. The model achieved better discrimination than the usual scores (Alfred Hospital Risk Index, Friedman Score, and Brompton-Harefield Infection Score). Compared to these values, improvements in discrimination (IDI) ranged from 1.2 to 2.1%, but were not compared with the validated cardiac surgery model STS 24 . A single ''calibrated model'' to make predictions across patients undergoing many different surgery types is challenging 25,26 . Our model achieved better accuracy than the STS having c-indices of 0.83 (REPINF) and 0.72 for STS in the validation cohort. It is important to note that the external validation database corresponds to an active participant in STS reports since 2014, which is center that is likely not representative of all Brazilian hospitals. The STS systematically underestimated DSWI in both the internal and external validation datasets (Calibration: Fig. 2). Our score overestimated risk in external validation cohort for those at the lower end of the risk scale. This overestimation may be related to the largest volume of patients coming from the public health system used for the elaboration of the REPINF. For validation, the REPINF was evaluated in a specific population of private network patients. This may have influenced REPINF to overestimate the risk of infection in the validation sample.
Calibration is an important aspect in models constructed for predictive purposes. It is necessary to keep data collection guided by rigorous quality registries and criteria to achieve and maintain the best accuracy in predictions, considering that this information may be often contaminated by noise 26 . To improve calibration, risk scores should be adjusted for the case-mix of hospitals, with recalibration or remodeling being recommended 27 . Adding more variables and optimizing estimates of improvement may increase model performance but at the same time cause overfitting. REPINF model was created considering these situations, where all variables included for LASSO regression were associated with DSWI creating a difficult task for the variable selection.
LASSO was originally formulated for linear regression, and it's applied in statistics and machine learning for variable selection and regularization. Before LASSO, the stepwise approach is the most widespread method for choosing covariates. Also, LASSO improves prediction error by shrinking the sum of the squares of the regression coefficients to be less than a fixed value to reduce overfitting 26,28 .
Accurate information is essential to access patient's prognosis, which simultaneously considers a number of factors and provides an estimate of the patient's absolute risk of an event and, for DSWI, it is a great challenge. Clinicians and surgeons need an accurate risk prediction for decision support, quality of care assessment, and patient education. Continuous evaluation of the model performance is important to ascertain that the classification performance does not degrade with time. Some models are redeveloped periodically to adjust for temporal trends. Recently, STS updated the model to predict mortality in children following cardiac surgery using the proposed machine learning method [29][30][31][32][33] .
We suggest that REPINF score should be estimated when the patient arrives to the intensive care unit. This moment becomes fundamental in the management of patients after CABG. By this way, the professional team would be able to establish a clear plan of care based on the patient risk, minimizing thereby the potential complications, and reducing costs and hospital length of stay. Also, specific protocol may be developed by the infection control team 34 . More investigation should be performed to determine cut-offs on risk classification and timing for the application such preventing strategies. This paper describes the development and validation of the deep sternal wound infection model for CABG patients. According to the medical literature, factors related to intraoperative timing are also associated with Mediastinitis, so we included these variables in our registry for analysis. Limitations related to data completeness and accuracy were carefully addressed during quality audits for all institutions. Still, some important clinical aspects were not evaluated and could increase the sensibility and precision of the model, for example, the use of pedicled or skeletonized harvesting conduits, glycated hemoglobin, albumin, bilirubin, and variables related to DSWI treatment (fluid or tissue culture, antibiotics, wound intervention, bandages, and others). In our institutions, the infection control service follows these data for epidemiological surveillance, and to guide preventive protocols according to the CDC surgical site prevention manuals 34 . Future studies should consider all possible detailed information and recommend standard prevention interventions to avoid bias and increase accuracy.
Another important issue is related to the DSWI detection method (30-day follow-up), which may vary across institutions 24,25 . In our study, this limitation was controlled by having trained researchers to make contact 30 days after surgery with each patient, with only 5.97% of incomplete follow-up.
In summary, this study considered a structured, standardized approach to model development, and validation to identify factors to help multidisciplinary teams prevent DSWI after CABG. More studies should be performed to validate these findings, but we suggest that REPINF, as well as the STS prediction models 35 , provides the highest generalizability for future data. Thus, it's proven that different populations require independent scoring systems to achieve the best predictive effect.

Data availability
The data generated during the current study are not publicly available due to ethical restrictions; patients did not consent to their deidentified data being publicly shared but are available on reasonable request to the Scientific Committee Director Renata do Val (renata.doval@incor.usp.br; https:// www. incor. usp. br/ sites/ incor 2013/ index. php/ 16-pesqu isa/ comis sao-cient ifica/ 158-fale-conos co).