Risk stratification based on components of the complete blood count in patients with acute coronary syndrome: A classification and regression tree analysis

To develop a risk stratification model based on complete blood count (CBC) components in patients with acute coronary syndrome (ACS) using a classification and regression tree (CART) method. CBC variables and the Global Registry of Acute Coronary Events (GRACE) scores were determined in 2,693 patients with ACS. The CART analysis was performed to classify patients into different homogeneous risk groups and to determine predictors for major adverse cardiovascular events (MACEs) at 1-year follow-up. The CART algorithm identified the white blood cell count, hemoglobin, and mean platelet volume levels as the best combination to predict MACE risk. Patients were stratified into three categories with MACE rates ranging from 3.0% to 29.8%. Kaplan-Meier analysis demonstrated MACE risk increased with the ascending order of the CART risk categories. Multivariate Cox regression analysis showed that the CART risk categories independently predicted MACE risk. The predictive accuracy of the CART risk categories was tested by measuring discrimination and graphically assessing the calibration. Furthermore, the combined use of the CART risk categories and GRACE scores yielded a more accurate predictive value for MACEs. Patients with ACS can be readily stratified into distinct prognostic categories using the CART risk stratification tool on the basis of CBC components.

developed in patients undergoing coronary angiography for all-cause mortality 11 and clinical morbidity endpoints 12 , and subsequently rederived in individuals with no cardiovascular disease history 13 . The CBC score, based on beta coefficients from a logistic regression model, was a powerful predictor of poor outcomes in patients with suspected cardiovascular disease, suggesting that combined use of CBC components can provide valuable additional risk information to clinicians. However, few studies have produced a clinically practical way of integrating various CBC variables to stratify risk in patients with ACS.
Classification and regression tree (CART) analysis is an innovative and powerful statistical technique where the most important predictors of outcomes are identified and patients are divided into different homogeneous risk groups 14 . CART analysis has been shown to allow reliable risk stratification for in-hospital mortality in patients with acute heart failure 15 and myocardial infarction (MI) 16 . Moreover, compared with logistic or Cox regression models requiring a nomogram reference to calculate risk, the CART analysis produces a decision tree that is simple to interpret and apply at the bedside 14 . In the present study, CART analysis was performed to identify key CBC components and develop a risk stratification model. The incremental prognostic value of combining the CART risk model with the GRACE score was also determined.

Results
CART for CBC to stratify risk. Of the 2,693 patients with ACS, 240 (8.9%, 95% CI 7.8-10.0%) had experienced major adverse cardiovascular events (MACEs) at 1-year follow-up. The CART method identified the WBC count from the 18 CBC metrics as the best single discriminator between those with and without MACEs. For the node with patients having a WBC count of less than 8.62 × 10 9 /L, a hemoglobin level of less than 133 g/L provided additional prognostic value. For the node with patients having a WBC count of 8.62 × 10 9 /L or higher (≥8.62 × 10 9 /L), the next best predictor of a MACE was MPV. For the node with patients having the WBC count of 8.62 × 10 9 /L or higher (≥8.62 × 10 9 /L) and a MPV level of less than 12.90 fL, a hemoglobin level of less than 128 g/L provided increased prognostic information. Based on the number of positive biomarkers in the CART analysis, patients were stratified into three risk groups: 1) low-risk with 0 positive biomarker (WBC count <8.62 × 10 9 /L and hemoglobin level ≥133 g/L), 2) intermediate-risk with 1 positive biomarker (WBC count <8.62 × 10 9 /L and hemoglobin level <133 g/L; or WBC count ≥8.62 × 10 9 /L, MPV level <12.90 fL, and hemoglobin level ≥128 g/L), and 3) high-risk with 2 positive biomarkers (WBC count ≥8.62 × 10 9 /L and MPV level ≥12.90 fL; or WBC count ≥8.62 × 10 9 /L, MPV level <12.90 fL, and hemoglobin level <128 g/L). Figure 1 depicts the final tree generated by the CART analysis and each child node of this tree.
Baseline characteristics in the stratified patients with ACS. Baseline demographics and clinical characteristics of patients in the three risk groups are shown in Table 1. High-risk and intermediate-risk patients were more likely to be older, female and have a history of diabetes mellitus, advanced Killip class, cardiac arrest, elevated cardiac enzymes, ischemic ST-segment changes on electrocardiograms (ECG), and a diagnosis Figure 1. Predictors of 1-year major adverse cardiac events using the classification and regression tree model, and risk stratification for patients with acute coronary syndrome. Each predictor was written within line, and each node is based on the data available for each of the predictive variables presented. Each approval rate for each predictor is marked within ovals (intermediate node) or squares (terminal node). WBC, white blood cell; MPV, mean platelet volume; MACEs, major adverse cardiovascular events. . The linear test of trends across the CART risk categories was significant (P trend < 0.001). After adjustment for multiple covariates using five different models, the CART risk categories, which were assessed using tertiles and a linear trend test, remained independent predictors of MACEs in patients with ACS (P < 0.001).
To further validate the predictive ability of the CART algorithm, the association between the three CBC components in the CART analysis and outcomes was determined ( Table 2). The three biomarkers (WBC, hemoglobin, and MPV) evaluated as both continuous and categorical variables were significantly associated with higher risk of MACEs in the unadjusted and adjusted Cox models (P < 0.001).
Discrimination and calibration testing. First, we evaluated the usefulness of the CART risk categories in risk discrimination. As shown in Fig. 3, the area under the receiver operating characteristic curve (ROC-AUC) of the CART risk categories for predicting MACEs was 0.721 (95% CI 0.704-0.738). The discriminatory performance of the CART risk categories was similar to the GRACE score (0.732 [95% CI 0.714-0.748]) (P = 0.674). When the CART risk categories were incorporated into the GRACE score, the updated prediction model underwent an improvement in the ROC-AUC to 0.792 (95% CI 0.776-0.807). There was a significant difference in the ROC-AUC between the updated model and GRACE score alone (0.061, 95% CI 0.036-0.086, P < 0.001). With regard to the classification accuracy, by including the CART risk categories into the GRACE score, 38.6% (95% CI 25.7-51.4%) of patients were correctly reclassified (net reclassification improvement [NRI], P < 0.001) (Fig. 4). The integrated discrimination improvement (IDI) was estimated as 0.05 (95% CI 0.04-0.07).
Second, we evaluated the calibration of the CART risk model in patients with ACS. Comparison of the predicted probabilities with the observed frequencies of MACEs at 1-year follow-up demonstrated good calibration, as shown in Fig. 5. Figure 6 displays a forest plot showing HRs for the linear trend by the CART risk categories after adjusting for the GRACE score in various subgroups. The CART risk categories predicted risk in all relevant subgroups with no evidence for differential associations (P value for interaction >0.05 for all subgroups).

Discussion
In the present study, the CART method identified three of 18 potential CBC components as the most important prognostic variables. WBC count, hemoglobin, and MPV levels were independent predictors of MACE risk at 1-year follow-up. Patients were stratified into low-, intermediate-, and high-risk categories based on these three variables. The CART risk categories were strongly associated with MACE risk, and had good discrimination and calibration. Moreover, incremental prognostic information was achieved when the CART risk categories were added to the GRACE score.
Clinical risk prediction tools are helpful in informing medical decisions in a diverse patient population 3 . A variety of risk stratification models have been proposed for the prediction of poor outcomes in patients with ACS. Unfortunately, these models often require the collection of extensive data from clinical characteristics and test results, or they have many calculation difficulties for application to the clinical care process 17 . The CBC is the most widely available, relatively inexpensive laboratory test in the early in-hospital period. In certain clinical settings, such as STEMI patients undergoing primary percutaneous coronary intervention (PCI), the CBC may be the only test determined before or during the procedure. The present study developed a simple risk stratification tool on the basis of the CBC measures that can effectively identify patients with ACS at low-, intermediate-, and high-risk hemoglobin; MCHC, mean corpuscular hemoglobin concentration; MPV, mean platelet volume; PDW, platelet distribution width; P-LCR, platelet large cell ratio; LDL, low-density lipoprotein cholesterol; ECG, electrocardiograms; LVEF, left ventricular ejection fraction; GRACE, Global Registry of Acute Coronary Events; ACS, acute coronary syndrome; NSTE-ACS, non-ST-segment elevation ACS; STEMI, ST-segment elevation MI; ACEI, angiotensin-converting enzyme inhibitor; and ARB, angiotensin-receptor blocker. Values are expressed as number (percentage) or median (interquartile range). Low-risk category is defined as patients with WBC count <8.62 × 10 9 /L and a hemoglobin level ≥133 g/L. Intermediate-risk category is defined as patients with either WBC count (<8.62 × 10 9 /L) + hemoglobin level (<133 g/L) or WBC count (≥8.62 × 10 9 /L) + MPV level (<12.90 fL) + hemoglobin level (≥128 g/L). High-risk category is defined as patients with either WBC count (≥8.62 × 10 9 /L) + MPV level (≥12.90 fL) or WBC count (≥8.62 × 10 9 /L) + MPV level (<12.90 fL) + hemoglobin level (<128 g/L).
SCieNTiFiC RepoRts | (2018) 8:2838 | DOI:10.1038/s41598-018-21139-w for 1-year MACEs. The CART risk stratification model uses existing clinical laboratory data with no additional costs, without the need for either a physician to collect other data in addition to the CBC, or application of a calculator to assist in the risk evaluation of adverse outcomes. Furthermore, the CART risk categories showed similar discrimination capacity as the GRACE score, but with added prognostic value. Therefore, it could be easily and routinely implemented into everyday clinical practice.
Although 18 CBC variables were considered in this study, only three were actually used in construction of the CART. These three indices, namely, the WBC count, hemoglobin, and MPV levels, have previously been demonstrated as independent predictors of adverse outcomes in patients with ACS. The WBC count is an important marker of inflammation measured on routine hemograms, and recent studies demonstrated a persistent association between the WBC count and MACE risk in patients with ACS 9 or all-comers population undergoing PCI 8 . Several hypotheses have been postulated to explain the mechanism responsible for the association, such as leukocyte-exacerbated coronary thrombus formation, leukocyte-mediated microvascular injury, and release of proinflammatory and vasculotoxic factors 18 . MPV, the most commonly used measure of platelet size, is considered a marker of platelet activation 19 . Increased MPV is associated with high platelet aggregation activity, more thromboxane synthesis and granule secretion, and high expression of adhesion molecules 19 . Studies have reported that elevated MPV levels were closely related to higher incidence of MACEs in patients with ACS 10,20 . Anemia has the potential to worsen the myocardial ischemic insult in ACS, both by decreasing oxygen supply to  21 and by increasing myocardial oxygen demand through stimulating a greater cardiac output to maintain adequate blood oxygen levels 22 . Anemia may represent larger extent of ischemic myocardium, and was a powerful predictor of ischemic events and cardiovascular mortality in patients with ACS 6 . The excess mortality associated with high hemoglobin concentration has been shown in STEMI patients with high WBC count, whereas the association did exist in those with low WBC count 23 . The interaction between hemoglobin level and WBC count was also observed in our present study. The combined use of the WBC count, hemoglobin, and MPV levels may represent a biochemical-integrated assessment of inflammatory status, thrombotic risk, and the extent of myocardium at risk. The results showing that the WBC count, hemoglobin, and MPV levels were the three predictors providing the best MACE risk discrimination underscored the importance of these pathophysiologic mechanisms in patients with ACS. Because there are multiple risk factors in the same patient, a meaningful risk factor analysis must consider factors in combination rather than isolation 15 . In a study including 29,526 patients with suspected cardiovascular disease 11 , Figure 3. Receiver operating characteristic curve analysis. Addition of the risk categories to the GRACE score improved the predictive value for major adverse cardiovascular events. See Table 1 footnote for the definitions of risk categories determined by the white blood cell count, hemoglobin, and mean platelet volume levels.  Table 1 footnote for the definitions of risk categories determined by the white blood cell count, hemoglobin, and mean platelet volume levels.
SCieNTiFiC RepoRts | (2018) 8:2838 | DOI:10.1038/s41598-018-21139-w a CBC score was developed for the prediction of mortality with the use of the multivariable logistic regression coefficients. Results of the prospective cohort study showed the CBC score which concurrently considered the WBC count, platelet count, and four RBC-related metrics (RDW, mean corpuscular volume [MCV], mean corpuscular hemoglobin concentration [MCHC], and hematocrit [HCT]) divided patients into subgroups at markedly different mortality risks (<1% to >14%). The CBC score was significantly superior to models based only on WBC count alone, HCT alone, or traditional risk factors. Furthermore, the CBC score was significantly associated with morbidity endpoints consisted of heart failure, myocardial infarction, coronary artery disease, atrial fibrillation, and other risk factors that may lead to death 12 . These findings suggested that the combined use of CBC parameters may provide excellent risk prediction and clinical acceptance. Previous studies examining the predictive value of the combined use of CBC markers in patients with ACS often interpreted some individual CBC components as a ratio or aggregated a risk score. The elevated WBC count to MPV level ratio was associated with worse outcomes in patients with STEMI 24 or non-ST-segment elevation ACS (NSTE-ACS) 25 . The role of the aggregate risk score based on the WBC count, hemoglobin, and PDW levels has been assessed in patients with acute MI 26 . This study by Bae et al. found that patients with scores of 2 and 3 had increased risk for in-hospital death compared with those with scores of 0 and 1 26 . The aggregate risk score provided an independent prognostic value beyond cardiovascular risk factors, such as age, gender, heart rate, systolic blood pressure, Killip class, creatinine, and myocardial enzyme levels. However, the study did not construct a series of logistic or Cox regression models to consider other CBC markers simultaneously, which may yield more accurate prognostic  26 . Moreover, the study did not test the model performance (discrimination and calibration), which inform clinicians about the accuracy of the prediction 17 . In the CART analysis, all variables have equal opportunities at each decision level at optimal cut-off values. The CART method can identify importance of variables and search for optimal combinations of variables that best predict outcomes. Results from the CART are presented as tree diagrams similar to the mode of thinking of clinicians in assessing the prognosis. In the present study, we used the CART method to identify the most important predictors from the CBC panel. Three variables, WBC count, hemoglobin, and MPV, were selected to stratify patients into groups at low-, intermediate-, and high-risk for MACEs at 1-year follow-up. To reduce the risk of confounding, we determined the predictive role of the CART risk categories after adjusting for a comprehensive list of characteristics that may bias our results. Both each of the three variables and the CART risk categories independently predicted the risk of MACEs, which indicated the utility of the CART risk categories for identification of high-risk patients. The predictive accuracy of the CART risk categories was validated by measuring discrimination (C statistic of 0.721) and calibration (assessed graphically). Given the wide availability of CBC data, the CART risk model may provide valuable risk information to clinicians.
The GRACE risk score has been demonstrated to a clinically prognostic model in patients with ACS 3,4 . The GRACE score alone had an AUC of 0.732 in our patients, which confirmed the score as a valuable tool for risk assessment. Therefore, the GRACE score was chosen as the reference to test the additive value of the CART risk categories. This added value was evidenced by the significant increase in three complementary statistical metrics (the C-statistic, NRI, and IDI). The GRACE score did not take the CBC components into account as candidate variables during the development of the model 4 . Our study found a significant but weak correlation between the CART risk categories and GRACE score, suggesting that they may reflect some different pathophysiological aspects related to outcomes. It is possible that indicators reflecting other pathophysiological processes of ACS could help to further classify patients into homogeneous subgroups. Therefore, combining the CART risk categories with the GRACE score may be a more valuable model than the GRACE score alone to identify patients with increased risk of adverse outcomes. Consequently, the combined model may help in further optimizing management decisions and improving outcomes in patients with ACS. Additional studies in large-scale populations are required to fully elucidate the clinical utility of combined of the CART risk categories with the GRACE score.
The current study had several limitations. First, this was a retrospective observational analysis. The study findings might have been influenced by some residual confounders, although we attempted to control our results for multiple prognostic factors by means of multivariable and subgroup analyses. Second, the study used only the CBC data on admission. The CBC parameters obtained from a repeated measurement over time may provide additional research ability to stratify risk. Third, given a lower occurrence of MACEs in our entire study population, we did not perform a subgroup analysis using separate end points. Fourth, as the sample size in the study was relatively small, the prognostic value of the CART risk categories needed to be externally validated on large cohorts.  Table 1 footnote for the definitions of risk categories determined by the white blood cell count, hemoglobin, and mean platelet volume levels.
With the help of the optimal combination of WBC count, hemoglobin, and MPV levels as determined by CART analysis, patients with ACS can be quickly and accurately stratified into subgroups with distinct MACE risks at 1-year follow-up. Moreover, the CART risk categories added prognostic information to the GRACE score. Because the CBC is relatively inexpensive and routinely obtained on admission, the CART risk model could be a potentially useful tool for early risk stratification of patients with ACS. Further studies are required to confirm the usefulness of the CART risk categories in clinical practice, such as improvement of the evaluation and, potentially, management and outcomes of patients with ACS.  Table 1 footnotes for definitions of risk categories determined by white blood cell count, hemoglobin, and mean platelet volume levels. § Adjusted for the Global Registry of Acute Coronary Events score. Cardiology of the First Hospital of Lanzhou University with initial diagnoses of ACS were screened. ACS was defined as STEMI and NSTE-ACS, including non-STEMI (NSTEMI) and unstable angina. According to the Universal Definition of Myocardial Infarction 27 , STEMI was diagnosed in patients showing a rise and/or fall of cardiac biomarker values with at least one value above the 99 th percentile of the upper reference limit and at least one of the following: ischemic symptoms, new-onset ST-segment elevation or left bundle-branch block in the index or subsequent ECG, pathological Q waves on ECG, and/or imaging evidence indicative of new ischemia. Cases of NSTEMI required at least one instance of positive cardiac biochemical marker of necrosis without new ST-segment elevation observed on ECG 28 . Unstable angina was defined as rest, new-onset, or worsening angina, respectively, with or without ischemic changes on the ECG and normal myocardial enzymes 28 . The exclusion criteria included (1) cancer or hematological proliferative diseases, (2)  The GRACE risk score was derived from eight variables that were available on admission (age, heart rate, systolic blood pressure, serum creatinine concentration, Killip class, cardiac arrest, ST-segment deviation, and elevated cardiac enzymes) 4 . Values for these variables were entered into a GRACE risk calculator (www. outcomes-umassmed.org/grace/) to obtain scores for each patient. The GRACE score was originally designed to estimate the cumulative risks of all-cause mortality or nonfatal MI in the period from admission to six months, and it has been shown to have good predictive value for cardiovascular events within 1-year of admission 5,29 . Clinical Outcomes. The outcome of interest was MACEs defined as a composite of all-cause mortality, nonfatal MI, stroke, heart failure, or ischemia-driven revascularization at 1-year follow-up. Investigators who were unaware of the aims of the study and blinded to laboratory findings performed the follow-up via medical records or telephone contact. Data Collection. Trained investigators collected data by reviewing hospital medical records using a standardized case report form. Data extraction included information regarding demographic characteristics, medical histories, presentation features, laboratory and echocardiographic parameters, medication, and clinical outcomes.
CART Analysis. The CART analysis was used to identify the best predictors of adverse outcomes and develop the risk stratification model. The CART analysis can establish a binary-branching tree to classify patients into homogeneous subgroups by using recursive partitioning and regression techniques 30 . The CART method involves two steps: First, the tree branches are divided into two child nodes that contain a subgroup of samples from a root node that includes all samples. The child nodes in turn can become parent nodes producing additional child nodes until some minimum subgroup size is reached. The criterion for branching is selected based on the Gini Index after examining every value of each candidate variable. Second, the tree structure is pruned from the bottom of the tree until the tree fits without overfitting the information contained in the data set. The CART method is a nonparametric procedure that can handle continuous variables that are highly skewed and categorical data with either a nominal or ordinal structure. For our study, the independent variable included 18 CBC components, namely total WBC count and its subtypes (neutrophil, lymphocyte, eosinophil, basophil, and monocyte), RBC count, HCT, MCV, RDW, hemoglobin, MCH, MCHC, platelets, MPV, PDW, P-LCR, and plateletcrit. The minimum size of cases in parent node and final child nodes were 100 and 50, respectively. A 10-fold cross-validation, where each run consisted of 10 random partitions of the samples into 90% training and 10% test sets, was used to prune the tree. The MACE rate was calculated for each of the terminal nodes in the CART analysis. Branch points of the CART dividing patients into categories with a higher incidence of MACE were considered as positive biomarkers. Patients were further classified into low-, intermediate-, and high-risk categories according to the number of positive biomarkers. The predictive value of the risk stratification model was determined using univariable and multivariable Cox proportional hazards models. The model discrimination was assessed by calculating the ROC-AUC (C-statistic). The ability of the CART risk stratification to more accurately stratify individuals into higher or lower risk categories (reclassification) was evaluated by using the NRI and IDI metrics. The calibration of the risk categories were tested by comparing the predicted with the observed outcomes in each quartile of predicted risk by CBC variables in the CART analysis.
Statistical Analyses. The Kolmogorov-Smirnov test was used to assess the normal distribution of quantitative variables. Continuous variables are reported as medians with interquartile ranges and compared with Kruskal-Wallis test, as all data were non-normally distributed. Categorical variables are presented as frequencies and percentages and compared with χ 2 test, Fisher exact test, or linear-by-linear association, as appropriate. The relationship between the CART risk categories and the GRACE score was assessed by Spearman rank correlation. The Kaplan-Meier method was used to analyze the cumulative incidence of events among the different risk categories during follow-up, and the pairwise log-rank test was performed to assess the difference between the groups. The relationship between the CART risk categories and outcomes was determined using Cox regression in unadjusted models and in five different models adjusted for demographic variables (model one: age, gender, and body mass index), demographic variables plus clinically relevant covariates for MACE outcomes (model two: age, gender, body mass index, smoker, diagnosis of hypertension, diabetes, dyslipidemia, history of MI, PCI, stroke, heart rate, systolic blood pressure, Killip class, cardiac arrest on admission, elevated cardiac enzymes, ST-segment deviation on ECG, baseline creatinine, glucose, low-density lipoprotein levels, and left ventricle ejection fraction [LVEF]), other CBC components (model three: neutrophil, lymphocyte, eosinophil, basophil, monocyte, RBC count, HCT, MCV, RDW, MCH, MCHC, platelets, PDW, P-LCR, and plateletcrit), the GRACE risk score (model four), or the variables that were statistically different among the 3 risk groups plus other variables known to affect prognosis after ACS (model five: age, gender, diabetes, heart rate, systolic blood pressure, Killip class, cardiac arrest on admission, baseline neutrophil, eosinophil, basophil, monocyte, RBC count, HCT, MCV, RDW, MCH, MCHC, platelets, PDW, P-LCR, plateletcrit, creatinine, glucose, and low-density lipoprotein levels, elevated cardiac enzymes, ST-segment deviation on ECG, LVEF, GRACE score, and ACS presentation). In the Cox regression analysis, the CART risk categories were modeled as either nominal (individual groups) or ordinal (a linear trend test) categorical variables. Additionally, individual CBC biomarkers in the CART risk categories were also evaluated both as continuous and as categorical variables based on cut-off points identified with the CART method. Results are reported as HRs with 95% CIs. The predictive accuracy of the CART risk categories alone, and the combined CART risk categories and the GRACE score, were estimated by applying DeLong's test to compare the AUC from each of the models 31 . The continuous NRI and IDI measures were also computed to assess the increased discriminative power after the addition of the CART risk categories to the GRACE score by using the PredictABEL package in R 32 . The continuous NRI determines net percentages of patients who do and do not have events who were correctly reclassified using the updated model 33 . In graphical representation of the continuous NRI, for those without events, decreasing predicted risk is good; for those with events, increasing predicted risk is good. The IDI is equal to the difference of initial and updated models in discrimination slope formed between the mean predicted probabilities in those with and without events 33 . Calibration plot was made using the rms package in R 34 . Subgroup analyses were conducted to examine whether the prognostic value of the CART risk categories varied according to specified factors including age (<65 years or ≥65 years), sex, smoker, diagnosis of hypertension, diabetes, dyslipidemia, heart rate (<median or ≥median), systolic blood pressure (<median or ≥median), Killip class (I or >I), low-density lipoprotein level (<median or ≥median), glucose level (<median or ≥median), creatinine level (<median or ≥median), elevated cardiac enzymes, ST-segment deviation on ECG, LVEF (<50% or ≥50%), GRACE score (<median or ≥median), and ACS presentation (STEMI or NSTE-ACS). We evaluated effect modification using interaction terms between subgroups. A 2-tailed P < 0.05 was considered significant. Data availability statement. All data generated or analysed during this study are included in this published article.