Validation of SinoSCORE for isolated CABG operation in East China

From January 2010 to December 2016, 1616 consecutive patients who underwent isolated coronary artery bypass grafting (CABG) were evaluated for predicted mortality using the online Sino System for Coronary Operative Risk Evaluation (SinoSCORE), the European System for Cardiac Operative Risk Evaluation II (EuroSCORE II) and the Society of Thoracic Surgeons (STS) risk evaluation system. To evaluate the performance of the three risk evaluation systems, calibration and discrimination in the whole cohort and in the subsets were assessed by the Hosmer-Lemeshow (H-L) statistic and the C statistic, respectively. The observed mortality was 1.92% (31/1616). The mortality predicted by SinoSCORE, EuroSCORE II and the STS risk evaluation system was 1.35%, 1.74% and 1.05%, respectively. SinoSCORE achieved the best discrimination. When patients were grouped by risk, SinoSCORE also achieved the best discrimination in the high-risk group, followed by the STS risk evaluation system and EuroSCORE II, while SinoSCORE and EuroSCORE II performed excellently in the low-risk group. In terms of calibration, SinoSCORE, EuroSCORE II and the STS risk evaluation system all achieved good calibration (H-L: P > 0.05) in the overall population and in the grouped subsets. SinoSCORE achieved good predictive efficiency in East China patients undergoing isolated CABG and performed no worse than EuroSCORE II and the STS risk evaluation system.

Calibration plots showed that all three risk evaluation systems deviated from the diagonal, indicating that they underestimated mortality in the whole cohort; SinoSCORE performed slightly better than the others (Fig. 3).
Decision curve analysis (DCA) was used to represent the clinical usefulness of the three risk evaluation systems for predicting operative mortality. The results are shown as a graph with the selected probability threshold (i.e., the degree of certainty of postoperative mortality above which patients would refuse the operation) plotted on the abscissa and the net benefit of the risk evaluation system on the ordinate. In the entire cohort, the decision curves of EuroSCORE II and SinoSCORE were similar, with the curve of EuroSCORE II slightly above that of SinoSCORE for thresholds between 0 and 30%; both were always above the curve of the STS risk evaluation system regardless of the selected threshold (Fig. 4C). In the high-risk group, the net benefit of the STS risk evaluation system was lower than that of SinoSCORE and EuroSCORE II regardless of the selected threshold, and the curve of SinoSCORE was slightly above that of EuroSCORE II for thresholds between 0 and 40% (Fig. 4A). In the low-risk group, the net benefit of SinoSCORE was always greater than that of EuroSCORE II and the STS risk evaluation system for thresholds between 0 and 20% (Fig. 4B).
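As a minimal sketch of how a single point on such a decision curve can be computed, the Python below uses the standard net-benefit definition: the true-positive proportion minus the false-positive proportion weighted by threshold / (1 - threshold). The function names and the toy data are illustrative assumptions; they are neither the study data nor the R code used for the published curves.

```python
def net_benefit(predicted, died, threshold):
    """Net benefit when patients with predicted risk >= threshold count as positive.

    predicted: predicted mortality probabilities in (0, 1) (hypothetical inputs)
    died:      1 if the patient died in hospital, otherwise 0
    threshold: probability threshold, strictly between 0 and 1
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(died)
    true_pos = sum(1 for p, d in zip(predicted, died) if p >= threshold and d == 1)
    false_pos = sum(1 for p, d in zip(predicted, died) if p >= threshold and d == 0)
    # Relative harm of a false positive versus a false negative at this threshold.
    harm_weight = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * harm_weight


def net_benefit_all_positive(died, threshold):
    """Reference curve: every patient is classified as positive."""
    prevalence = sum(died) / len(died)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data for illustration only (not from the study).
toy_predicted = [0.01, 0.02, 0.05, 0.10, 0.30, 0.60]
toy_died = [0, 0, 0, 1, 0, 1]

# One (threshold, net benefit) pair per threshold traces out a decision curve.
toy_curve = [(t, net_benefit(toy_predicted, toy_died, t)) for t in (0.05, 0.10, 0.20, 0.30)]
```

Evaluating `net_benefit` over a grid of thresholds for each scoring system, together with the all-positive reference and the zero line for classifying nobody as positive, gives curves of the kind compared in Fig. 4.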

Discussion
In recent years, with the rapidly increasing number of CABG patients and the growing demand for high-risk surgery, both patients and surgeons have become aware of risk evaluation systems. These systems play an important role in surgical decision-making and have improved the quality of medical treatment, preoperative patient education and consent, the allocation of medical resources and the standardisation of comparisons among different centers or surgeons [6][7][8]. Risk evaluation systems aim to provide a more accurate assessment to guide surgery for individual patients by balancing the potential risks and benefits 9. A sound risk evaluation system should be built on a large database that is representative of current clinical practice, and systematic data validation should be used to confirm its accuracy 10. Risk evaluation systems for heart surgery have been studied in developed countries for decades and are based primarily on European (EuroSCORE II) and North American (STS risk evaluation system) databases, which may lead to obvious errors when they are applied to the Chinese population [11][12][13][14][15]. In this context, SinoSCORE, which was built on a Chinese database, was developed in 2010. At the same time, the previously developed risk evaluation systems have been under continuous revision to improve the accuracy and representativeness of their databases, reflecting the increasing numbers of research centers and cases and the modification or removal of outdated risk factors [16][17][18][19][20][21]. SinoSCORE, EuroSCORE II and the STS risk evaluation system have therefore all been in use for several years. The First Affiliated Hospital of Nanjing Medical University and the East Hospital affiliated to Tongji University are both regional central hospitals, located in Nanjing and Shanghai in East China. Patients from the two hospitals can be considered representative of typical East China patients.
Because China's territory is vast, there are great differences among its regions. The proportions of several risk factors differed between our study database and the SinoSCORE database, including age, diabetes, hypertension, renal failure, cerebrovascular accident and previous cardiac surgery, among others (Table 5). It is therefore meaningful to compare the three risk evaluation systems in East China patients. Among validation studies in Chinese patients, excluding those on isolated valve surgery, only one has been published; it indicated that EuroSCORE II performed well in predicting mortality overall and in the low-to-middle-risk group, but not in the high-risk group 22. Although the EuroSCORE II database differed significantly from our study database in terms of regions and populations, EuroSCORE II achieved excellent predictive value overall (AUC = 0.814) as well as in the low-risk group (AUC = 0.861). Similar to the result of Bai et al. 22, the discrimination of EuroSCORE II in the high-risk group was not satisfactory. The number of patients classified as high-risk by EuroSCORE II was two times higher than by SinoSCORE and the STS risk evaluation system, so some low-risk patients may have been assigned to the high-risk group, which might explain the unsatisfactory discrimination of EuroSCORE II in that group.
The STS risk evaluation system, which is as well known as EuroSCORE II, comprises three parts: isolated CABG, isolated valve surgery and valve surgery plus CABG 17,19,20. The validation database affirmed the clinical application value of this system 19. In recent years, the STS risk evaluation system has been reported to be well validated in British and New Zealand patients and in Indian patients (in whom it had satisfactory calibration but poor discriminatory power) undergoing heart surgery 2,23,24. In our study, the STS risk evaluation system achieved good calibration (H-L: P > 0.05) in the entire cohort and in the subsets, which is in accordance with Zhang et al. 23, who reported that this system might be a potentially appropriate choice for Chinese patients undergoing isolated CABG. However, the discrimination of the STS risk evaluation system (AUC = 0.687), like that of EuroSCORE II (AUC = 0.647), was poor in the high-risk group. One possible reason is that the preoperative parameters of patients in the high-risk group varied dramatically. Another possible reason is that EuroSCORE II and the STS risk evaluation system also predict mortality for other cardiac operations, so evaluating only their ability to predict isolated CABG mortality may understate their performance. SinoSCORE addressed the lack of a heart surgery risk evaluation system developed specifically for China. Although relatively new, SinoSCORE has performed well in assessments at several medical centers throughout China [25][26][27][28][29][30]. In theory, therefore, SinoSCORE should be more relevant to Chinese patients than the other systems. In our study, SinoSCORE remained the most valuable risk evaluation system (AUC = 0.888). There are several reasons. First, the patients in our study database and in the SinoSCORE database share the same ethnic background.
Second, more risk factors, such as sex, peripheral vascular disease, active endocarditis and critical preoperative state, were similarly distributed in our study database and the SinoSCORE database 3,4, which might explain the excellent predictive power of SinoSCORE. Third, all the patients used to build SinoSCORE underwent CABG only, whereas EuroSCORE II and the STS risk evaluation system were derived from patients who underwent different kinds of cardiac operations. For risk evaluation systems, improving the ability to predict outcomes in high-risk patients is particularly meaningful. Although the discrimination of the three risk evaluation systems was lower in the high-risk group than in the low-risk group, SinoSCORE had the best discrimination in the high-risk group. Some of the patients in this study were also included in the development of SinoSCORE, which might explain its satisfactory discrimination in the high-risk group. Although the three systems all had good calibration and discrimination, they appreciably underestimated mortality in the entire cohort and in the subsets. One possible reason is that although cardiac surgery and perioperative care in China have developed rapidly in recent decades, some gaps remain compared with developed countries. Another possible reason is that 65 patients (3.87%) were excluded from the study because of incomplete data. The discrimination of the risk evaluation systems was tested by the AUC, which assesses how well a system discriminates between survivors and non-survivors. The AUC is therefore considered one of the most important indicators for evaluating such systems. As a comprehensive indicator of a system's performance, it is more important than predictive accuracy alone.
This study has some limitations.
First, this was a two-center, retrospective, non-randomised observational study. Second, the study population was still small compared with the large patient populations from which the risk evaluation systems were derived. Third, EuroSCORE II and the STS risk evaluation system are designed for a variety of cardiac operations, and the STS risk evaluation system can also predict other outcomes; evaluating only their ability to predict isolated CABG mortality may understate their performance. These points might introduce bias, so the mortality statistics may be limited to some degree.
In summary, for East China patients undergoing isolated CABG, SinoSCORE fitted the data well, with excellent discrimination and good calibration, and performed no worse than EuroSCORE II and the STS risk evaluation system.

Methods
The study included all patients (1681 enrolled) who underwent isolated CABG at two hospitals (the First Affiliated Hospital of Nanjing Medical University and the East Hospital affiliated to Tongji University) between January 2010 and December 2016, and was approved by the ethics committees of the two hospitals. All experiments were performed in accordance with relevant guidelines and regulations. Written informed consent was obtained before data collection. Sixty-five patients (3.87%) were excluded from the analyses because of incomplete data, leaving a total of 1616 procedures in the study database. The database included 1267 males and 349 females, with a mean age of 65.21 ± 8.50 years. Each patient's diagnosis was confirmed by coronary arteriography. Using the study database, the operative risk was predicted with the online calculators for SinoSCORE (available at http://www.cvs-china.com/sino.asp), EuroSCORE II (available at http://www.euroscore.org/calc.html) and the STS risk evaluation system (available at http://riskcalc.sts.org/STSWebRiskCalc273/de.aspx). The predicted mortality of each patient was calculated with each of the systems. Mortality was defined as post-operative in-hospital death, including deaths after discharge against medical advice.
To further explore the predictive efficacy of the three evaluation systems, the patients scored by each system were divided into two subgroups according to the observed mortality rate (1.92%, 31/1616): a high-risk group (predicted mortality ≥1.92%) and a low-risk group (predicted mortality <1.92%). The calibration and discrimination of the three systems were assessed and compared in the whole cohort and in each subset. To make a fair comparison among the three systems, we compared the predicted and observed mortality rates in the whole cohort and in each subset.
Statistical Analysis. Continuous variables were presented as mean ± standard deviation or interquartile range and compared with the t test; categorical variables were expressed as percentages and compared with the χ² (chi-square) test. P < 0.05 was considered statistically significant.
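The subgroup definition given above (predicted mortality at or above versus below the observed rate of 31/1616) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the toy patient records are hypothetical, and only the cutoff and the exclusion counts are taken from the text.

```python
def split_by_risk(predicted, died, cutoff):
    """Return (high_risk, low_risk) lists of (predicted, died) pairs."""
    pairs = list(zip(predicted, died))
    high_risk = [(p, d) for p, d in pairs if p >= cutoff]
    low_risk = [(p, d) for p, d in pairs if p < cutoff]
    return high_risk, low_risk


def summarize(group):
    """Observed and mean predicted mortality for one subgroup."""
    n = len(group)
    if n == 0:
        return {"n": 0, "observed": None, "predicted": None}
    observed = sum(d for _, d in group) / n
    mean_predicted = sum(p for p, _ in group) / n
    return {"n": n, "observed": observed, "predicted": mean_predicted}


# Figures reported in the text.
cutoff = 31 / 1616            # observed mortality, about 1.92%
excluded_share = 65 / 1681    # patients excluded for incomplete data, about 3.87%

# Toy records for illustration only (not study data).
toy_predicted = [0.005, 0.010, 0.015, 0.030, 0.080, 0.200]
toy_died = [0, 0, 0, 0, 1, 1]

high_risk, low_risk = split_by_risk(toy_predicted, toy_died, cutoff)
high_summary = summarize(high_risk)
low_summary = summarize(low_risk)
```

Because each scoring system assigns its own predicted mortality, the same cutoff yields a different high-risk group for each system, which is why the subgroup sizes differ between SinoSCORE, EuroSCORE II and the STS risk evaluation system.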
Calibration and discrimination were used to assess predictive efficiency. Calibration was assessed by the Hosmer-Lemeshow (H-L) statistic and was considered good if P > 0.05, which indicates that the system predicts mortality accurately 31. Discrimination, which measures a system's capacity to distinguish patients who experience the outcome (illness or death) from those who do not, was assessed by the C statistic, i.e., the area under the receiver operating characteristic curve (AUC). The AUC ranges from 0.50 to 1.00, and AUC > 0.70, > 0.75 and > 0.80 indicate acceptable, good and excellent discrimination, respectively 32.
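To make the two measures concrete, the sketch below computes the C statistic as the proportion of (non-survivor, survivor) pairs in which the non-survivor received the higher predicted risk, and an H-L statistic over groups of patients ranked by predicted risk. It is an illustration in plain Python, not the SPSS procedure used in the study. The choice of 10 groups, the chi-square reference distribution with groups - 2 degrees of freedom, and all function names and demo data are assumptions.

```python
import math


def auc(predicted, died):
    """C statistic: share of (death, survivor) pairs ranked correctly; ties count 0.5."""
    deaths = [p for p, d in zip(predicted, died) if d == 1]
    survivors = [p for p, d in zip(predicted, died) if d == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for x in deaths:
        for y in survivors:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(deaths) * len(survivors))


def hosmer_lemeshow(predicted, died, groups=10):
    """H-L chi-square statistic over groups of patients ranked by predicted risk."""
    if len(predicted) < groups:
        raise ValueError("need at least one patient per group")
    if any(not 0.0 < p < 1.0 for p in predicted):
        raise ValueError("predicted risks must lie strictly between 0 and 1")
    pairs = sorted(zip(predicted, died))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected deaths in the group
        observed = sum(d for _, d in chunk)   # observed deaths in the group
        statistic += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    return statistic


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Demo data for illustration only (not study data): 40 patients, 4 deaths.
demo_predicted = [i / 100 for i in range(1, 41)]
demo_died = [1 if i in (25, 33, 38, 40) else 0 for i in range(1, 41)]

demo_auc = auc(demo_predicted, demo_died)
demo_hl = hosmer_lemeshow(demo_predicted, demo_died, groups=10)
demo_p = chi2_sf_even_df(demo_hl, df=10 - 2)  # P > 0.05 is read as good calibration
```

With few deaths per group, as in a cohort with about 2% mortality, the H-L test has limited power, so a non-significant P value is weaker evidence of good calibration than the threshold alone suggests.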
Calibration plots of observed versus predicted mortality rates were constructed for each of the three systems, with patients ranked by predicted risk and divided into 20 equally sized groups. Ideally calibrated predictions lie on the 45° line; points below the diagonal indicate overestimation and points above it indicate underestimation.
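The grouping behind such a plot can be sketched as follows: patients are ranked by predicted risk, cut into equally sized groups, and each group contributes one point (mean predicted mortality, observed mortality). The function names and toy data are hypothetical, and the plotting step itself is omitted.

```python
def calibration_points(predicted, died, groups=20):
    """One (mean predicted, observed rate) point per group of ranked patients."""
    if len(predicted) < groups:
        raise ValueError("need at least one patient per group")
    pairs = sorted(zip(predicted, died))
    n = len(pairs)
    points = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_rate = sum(d for _, d in chunk) / len(chunk)
        points.append((mean_predicted, observed_rate))
    return points


def classify(point):
    """Position relative to the 45-degree line (x = predicted, y = observed)."""
    mean_predicted, observed_rate = point
    if observed_rate > mean_predicted:
        return "underestimation"   # point lies above the diagonal
    if observed_rate < mean_predicted:
        return "overestimation"    # point lies below the diagonal
    return "on diagonal"


# Toy data for illustration only (not study data), using 2 groups instead of 20.
toy_points = calibration_points([0.1, 0.1, 0.5, 0.5], [0, 0, 1, 1], groups=2)
toy_labels = [classify(pt) for pt in toy_points]
```

When the number of patients is not a multiple of the number of groups, the integer slicing above produces groups whose sizes differ by at most one.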
The net benefit of the three risk evaluation systems for predicting in-hospital mortality was assessed by decision curve analysis (DCA). DCA subtracts the proportion of all patients who are false-positive from the proportion who are true-positive, weighting by the relative harm of a false-positive and a false-negative result 33. The statistical analysis was performed with SPSS Version 18 (SPSS Inc., Chicago, Illinois, USA). DCA was performed with R software version 3.4.0 (The R Foundation for Statistical Computing, Vienna, Austria) with the Decision curve package.
Data Availability. All data generated or analyzed during this study are included in this published article.