Comparison of aneurysmal subarachnoid hemorrhage grading scores in patients with aneurysm clipping and coiling

Past studies revealed that prognosis differs between aneurysmal subarachnoid hemorrhage (aSAH) patients treated with surgical clipping and those treated with endovascular coiling. We retrospectively reviewed aSAH patients in our institution to compare the effectiveness of grading scores in the two groups. In the surgical clipping group (n = 349), VASOGRADE had a favorable performance for predicting delayed cerebral ischemia (DCI) (area under curve (AUC) > 0.750), and had better results than clinical scores (World Federation of Neurosurgical Societies (WFNS), Hunt & Hess (HH)) and radiological scores (modified Fisher Scale (mFS), Subarachnoid Hemorrhage Early Brain Edema Score (SEBES)) (P < 0.05). Clinical and combined scores (VASOGRADE, HAIR) had favorable performance for predicting poor outcome (AUC > 0.750), and had better results than radiological scores (P < 0.05). In the coiling group (n = 320), none of the grading scores demonstrated favorable predictive accuracy for DCI (AUC < 0.750). Only WFNS and VASOGRADE had AUC > 0.700, with better performance than mFS (P < 0.05). The clinical and combined scores showed favorable performance for predicting a poor outcome (AUC > 0.750), and were better than the radiological scores (P < 0.05). Radiological scores appeared inferior to the clinical and combined scores in both the clipping and coiling groups. VASOGRADE can be an effective grading score for predicting DCI and poor outcome in patients treated with clipping or coiling.

the other single scores in aSAH patients 1 . Furthermore, the radiological scores had the poorest performance when predicting the outcome when compared with other scores 1,9 . These seem to deviate from the initial score results.
The surgical clipping and endovascular coiling groups had significantly different outcomes after aSAH. Coiling yielded a better clinical outcome in many large prospective randomized studies 11 . The effectiveness of grading scores may differ among aneurysm patients treated with clipping or coiling due to the different clinical courses. Herein, we compared the values of various grading systems in patients treated with clipping or coiling. The evaluation of different grading scores in different aSAH patients may provide neurologists with an optimal tool to predict outcomes.

Methods
Patient selection. With approval from the Institutional Review Board of Second Affiliated Hospital of Zhejiang University, this study retrospectively reviewed SAH patients admitted to our neurosurgery department between January 2014 and December 2015. Patients whose diagnosis was confirmed by radiography or lumbar puncture were included. Because the study was retrospective, the institutional review board determined that patient informed consent was not required. All methods were performed in accordance with relevant guidelines and regulations.
Patients with spontaneous SAH were eligible for inclusion. Exclusion criteria were a negative angiogram, a history of trauma or previous brain injury (e.g. stroke, hemorrhage, or surgery that left associated chronic changes on CT), arteriovenous malformation, missing radiological data, serious comorbidities before SAH onset (e.g. coagulation defects, uncontrollable hypertension, or arrhythmia), and initial radiological assessment performed more than 3 days after SAH onset. Patients who underwent only external ventricular drainage (EVD) or decompressive craniotomy, or who received conservative treatment, were also excluded. Because intra-parenchymal hemorrhage is common in severe SAH, patients with diffuse SAH and associated intra-parenchymal hemorrhage were included in this study.
Routine CT scans were conducted on all patients at admission to evaluate the severity of SAH. The presence of an aneurysm was confirmed via digital subtraction angiography (DSA). Surgical and endovascular treatments were performed by neurosurgeons within 3 days following hospital admission. The decision to perform surgical clipping or endovascular coiling was determined by aneurysm-related factors (i.e. geometry and location). The aSAH patients were divided into clipping and coiling groups, and treated according to available guidelines 12 .
Baseline characteristics and scores. For patients' characteristics, the following data were recorded: age, sex, history of drinking or smoking, hypertension, hyperlipidemia, diabetes, aneurysm location (anterior circulation, posterior circulation or multiple aneurysms), aneurysm size, and SAH-related complications: DCI, hydrocephalus (defined as radiological ventricular enlargement or the appearance of clinical symptoms 13 ), rebleeding (defined as new or expanded hemorrhage 8 ), and seizure. Information on aneurysm size and location was collected from angiographic records.
The clinical scores, including HH 14 and WFNS 15 , and radiological scores, such as mFS 4 and SEBES 5 , were reviewed. Combined scores including VASOGRADE, HAIR, and the SAH Score were evaluated according to the original criteria of each score 8,16,17 . HH and WFNS were taken from the medical history documented at admission. mFS and SEBES were independently scored by two blinded neurosurgeons. An independent third examiner adjudicated when the two blinded neurosurgeons disagreed.
Outcome measures. The presence of DCI during hospitalization was the primary outcome measure. The definition of DCI followed the criteria of previous studies 18,19 . Briefly, DCI was defined as clinical cerebral vasospasm (clinical deterioration after exclusion of other causes) or cerebral infarction (new cerebral infarction on CT or MRI, excluding infarctions that appeared within 48 hours after surgery or coiling).
The secondary outcome measure was the development of poor outcome, defined as a modified Rankin Scale (mRS) 20 score of 3 to 6, assessed at 3 months after discharge. An additional dichotomization of mRS (poor outcome defined as mRS 4 to 6) was also used in the analysis of the supplemental data. The mRS data were obtained from telephone follow-up or outpatient follow-up records.
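The two dichotomizations of mRS described above can be written as a single thresholded rule. The following is a minimal sketch for illustration only; the function name `is_poor_outcome` and the example values are hypothetical and are not part of the study's analysis.

```python
def is_poor_outcome(mrs: int, threshold: int = 3) -> bool:
    """Return True when an mRS value counts as a poor outcome.

    threshold=3 corresponds to the primary definition (mRS 3 to 6);
    threshold=4 corresponds to the supplemental definition (mRS 4 to 6).
    """
    if not 0 <= mrs <= 6:
        raise ValueError("mRS must be an integer from 0 to 6")
    return mrs >= threshold


# Hypothetical follow-up scores, for illustration only (not study data).
example_mrs = [0, 2, 3, 4, 6]
primary = [is_poor_outcome(m) for m in example_mrs]
supplemental = [is_poor_outcome(m, threshold=4) for m in example_mrs]
```

A patient with mRS 3 is the only case classified differently by the two definitions, which is why the supplemental analysis serves as a sensitivity check.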
Statistical analysis. Statistical analysis was performed using SPSS 22.0 (SPSS Institute, Chicago, IL, USA) and MedCalc Statistical Software version 18.2.1 (MedCalc Software bvba, Ostend, Belgium; http://www.medcalc.org; 2018). Continuous variables are expressed as the mean with standard deviation (SD). Categorical variables are expressed as frequency and percentage. Comparisons between groups were performed using the parametric t-test for continuous parameters and the Chi-square test or Fisher's exact test for categorical parameters. P < 0.05 was considered statistically significant.
To assess the grading scores' predictive performance for DCI and poor outcome in different groups after aSAH, the trends between grade and either DCI or poor outcome rate were analyzed using the Cochran-Armitage trend test 21,22 . Binary logistic regression analysis was performed with grading scores to estimate the odds ratio (OR) and 95% confidence interval (CI) 1 . Receiver operating characteristic (ROC) curves were calculated to evaluate the area under the ROC curve (AUC), which estimates the discrimination of grading scores 23 . An AUC greater than 0.750 was considered to indicate good predictive accuracy (discrimination) 24-26 . AUCs were compared using the DeLong test, and P < 0.05 was considered statistically significant 27 .
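Two of the statistics named above, the Cochran-Armitage trend test and the AUC, can be sketched in a few lines. The example below is an illustrative sketch using only the Python standard library, not the SPSS or MedCalc procedures used in the study; the function names and the input numbers are hypothetical. It assumes the standard Cochran-Armitage Z statistic with a two-sided normal approximation, and computes the AUC as the probability that a patient with the event has a higher grade than a patient without it (ties count as one half). Logistic regression and the DeLong comparison are not shown.

```python
import math


def cochran_armitage(grades, events, totals):
    """Cochran-Armitage trend test for event rates across ordered grades.

    grades: numeric grade values; events: number of events per grade;
    totals: number of patients per grade.
    Returns (z, two_sided_p) under the normal approximation.
    """
    n_total = sum(totals)
    p_bar = sum(events) / n_total  # overall event rate
    t_stat = sum(t * (r - n * p_bar) for t, r, n in zip(grades, events, totals))
    s1 = sum(n * t for t, n in zip(grades, totals))
    s2 = sum(n * t * t for t, n in zip(grades, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 ** 2 / n_total)
    z = t_stat / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))


def auc_from_scores(scores, outcomes):
    """AUC of a grading score via pairwise comparison (Mann-Whitney form)."""
    pos = [s for s, o in zip(scores, outcomes) if o]
    neg = [s for s, o in zip(scores, outcomes) if not o]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute one half
    return wins / (len(pos) * len(neg))


# Hypothetical data, for illustration only (not study data).
z, p_trend = cochran_armitage([1, 2, 3], [2, 5, 8], [10, 10, 10])
auc = auc_from_scores([1, 2, 2, 3], [0, 1, 0, 1])
```

Because grading scores take only a few discrete values, ties between patients are common, so the half-credit rule for ties matters more here than it would for a continuous marker.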

Results
Baseline characteristics. From January 2014 to December 2015, a total of 1,119 patients with SAH were admitted to our hospital. The following patients were excluded before analysis: 190 patients with a negative angiogram, 9 patients with a history of trauma or suspected trauma, 25 patients with a history of other brain injuries, 23 patients diagnosed with arteriovenous malformation, 22 patients with serious comorbidities, 97 patients who arrived at the hospital more than three days after onset of symptoms, 43 patients with an aneurysm who underwent neither clipping nor coiling, and 41 patients with missing radiological data. In total, 669 aSAH patients were included in our analysis. Of these, 349 patients underwent clipping and 320 patients underwent coiling (Fig. 1).
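The cohort flow reported above can be checked with simple arithmetic. The short sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
# Counts taken from the text above; variable names are illustrative.
admitted = 1119
exclusions = {
    "negative angiogram": 190,
    "trauma or suspected trauma": 9,
    "other brain injuries": 25,
    "arteriovenous malformation": 23,
    "serious comorbidities": 22,
    "arrival more than three days after onset": 97,
    "neither clipping nor coiling": 43,
    "missing radiological data": 41,
}
included = admitted - sum(exclusions.values())
clipping, coiling = 349, 320
```

The exclusions sum to 450, leaving 669 included patients, which matches the sum of the clipping and coiling groups.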
Predictive performance of grading scores in clipping group. No patients fell into HAIR grades 4, 5, 7, or 8, or into grades 6, 7, or 8 of the SAH Score (Fig. 2A). In predicting DCI, each score presented a strong trend between increased score and DCI rate (P for trend <0.001) (Fig. 2A). Each score was significantly associated with DCI. VASOGRADE (OR = 5.421) had the highest OR value, followed by HH, HAIR, WFNS, mFS, the SAH Score, and SEBES (all OR > 1) (Table 2).
For predicting poor outcome, each score showed a good trend between increased scores and poor outcome rate (P for trend <0.001) (Fig. 2A). Each score was also significantly associated with poor outcome. VASOGRADE (OR = 6.123) had the highest OR value, followed by HH, HAIR, the SAH Score, WFNS, mFS, and SEBES (all OR > 1) (Table 2).
The clinical scores, WFNS (AUC = 0.785) and HH (AUC = 0.773), and the two combined scores, VASOGRADE (AUC = 0.786) and HAIR (AUC = 0.797), showed favorable predictive accuracy for poor outcome (Table 2) (Fig. 3A). The radiological scores performed worse than the clinical and combined scores. The AUC of mFS (AUC = 0.671) was significantly lower than that of WFNS, HH, VASOGRADE, and HAIR (all P ≤ 0.006). Similarly, the AUC of SEBES (AUC = 0.654) was significantly lower than that of WFNS, HH, VASOGRADE, and the SAH Score (all P ≤ 0.040). It should be mentioned that the AUC of the SAH Score was significantly higher than the AUC of SEBES (P = 0.040), but lower than that of HH (P = 0.042) (Table 3).
The ability of each score to predict mRS ranging from 3 to 6 was consistent with its ability to predict mRS ranging from 4 to 6 (Supplemental Table 1).
Predictive performance of grading scores in coiling group. Distributions of each score are shown in [...] 0.001), and combined scores (VASOGRADE and HAIR, P for trend <0.001; the SAH Score, P for trend = 0.002) presented a good trend between increased score and DCI rate. Furthermore, all were significantly associated with an increased incidence of DCI, with VASOGRADE demonstrating the strongest association (OR = 3.432), followed by HH, WFNS, HAIR, mFS, and the SAH Score (all OR > 1) (Table 2).

Discussion
Choice of grading scores in patients with clipping and coiling. In this study, we retrospectively compared the performance of different grading scores for predicting DCI and poor outcome in patients treated with surgical clipping and endovascular coiling. The performance of different grading scores varied in each patient group. However, we found that VASOGRADE maintained a leading predictive accuracy in both clipping and coiling patients. The radiological scores showed poor predictive power in each group of patients. The predictive performances of clinical and combined scores were acceptable and comparable between the two groups for predicting poor outcome (except the SAH Score in the clipping group), but varied for predicting DCI.
In clipping patients, VASOGRADE may be the first choice for predicting DCI. The clinical scores and the other combined scores had a similar power to predict DCI, which was significantly better than that of the radiological scores (P < 0.05). Clinical scores, such as WFNS and HH, as well as combined scores, such as HAIR and VASOGRADE, can be optimal in predicting poor outcomes. It should be mentioned that the SAH Score did not reach favorable accuracy and performed worse than HH in this study (P = 0.042).
In coiling patients, WFNS and VASOGRADE can be recommended for predicting DCI, although no score showed favorable predictive accuracy and statistical significance was reached only in comparison with mFS (P < 0.05). Additionally, the performance of the SAH Score was significantly lower than that of the clinical scores, which seems contrary to its intended purpose. Both clinical and combined scores showed favorable performance for predicting poor outcome, and were also significantly better than the radiological scores (P < 0.05). Similarly, when predicting DCI, the performance of the SAH Score was significantly lower than that of WFNS (P = 0.021).
Performance of grading scores in literature. The predictive performance of clinical, radiological, and combined scores has been investigated previously. However, the results of these comparisons conflicted across studies 1,5,9,10,17,28 . A recent study of 423 aSAH patients compared three types of scores and found that the combined grading scores (VASOGRADE, HAIR) were not superior to the clinical scores (WFNS, HH) in predicting either cerebral infarction or unfavorable outcome. Additionally, the radiological scores (mFS, Barrow Neurological Institute Grading Scale (BNI) 29 ) had the poorest predictive performance among the three categories 1 . In another study of 279 aSAH patients, the radiological scores also had poor predictive performance. The AUCs of BNI (AUC 0.684 and 0.680 for predicting unfavorable outcome and mortality) and mFS (AUC 0.604 and 0.554 for predicting unfavorable outcome and mortality) at discharge were significantly lower than those of HH (AUC 0.806 and 0.782 for predicting unfavorable outcome and mortality) and WFNS (AUC 0.785 and 0.740 for predicting unfavorable outcome and mortality) 9 .
In the study that introduced the SEBES score, mFS had the poorest AUC value (AUC = 0.66) for predicting unfavorable outcome (mRS score of 4 to 6 at 3 months) when compared with clinical scores (WFNS, HH). However, no difference was found in predicting DCI, and no AUC was higher than 0.750 (AUC = 0.60 for WFNS, AUC = 0.56 for HH, AUC = 0.58 for mFS) 5 . The predictive performance of SEBES in our study was not as good as that described in the initial study. In another study, the combined score, HAIR, had a higher AUC value than the clinical score, HH 10 . Additionally, the initial study of the SAH Score showed a favorable AUC value of HH (AUC = 0.771) and WFNS (AUC = 0.777) 17 .
Potential causes of performance discrepancy. The inconsistent results across the literature may stem from patient cohorts that comprised different proportions of clipping, coiling, negative angiograms, and other treatments (i.e. EVD, decompressive craniotomy, or conservative treatment). The proportion of clipping patients varied from 37.2% to 59.1%, and that of coiling patients from 19.6% to 62.8%, in the previous studies 1,5,9,10,17,28 . Generally, the aneurysms of clipping patients are more likely to be wide-necked, larger in size, and located in the anterior circulation 30 . Coiling offers more benefit for the short-term outcome due to its reduced invasiveness relative to clipping 11 . In this study, we confirmed these differences in characteristics, clinical course, and outcome between the clipping and coiling groups (Table 1). Thus, we reasoned that cohorts with more coiling patients may have a relatively better outcome than cohorts with more clipping patients. Different outcome distributions may in turn lead to different results when the performance of the grading scores is analyzed. Therefore, we analyzed the performance of grading scores separately in patients who had undergone coiling or clipping.
Despite the variation in clinical course and outcome between clipping and coiling patients, we observed that the performance of radiological scores was generally poor in both groups, similar to the results of previous studies 1,5 . We speculate that this may be due to differences between what the imaging scores and the prognostic measures evaluate. Prognostic assessment is quantified only by clinical symptoms (clinical cerebral vasospasm (a part of DCI), mRS, and Glasgow Outcome Scale, which reflect changes in the ability to work and live rather than radiological change). Although the radiological data may partially indicate severity after SAH, the clinical symptoms of patients with the same degree of bleeding or brain edema vary because of each individual's tolerance.
VASOGRADE simply combines the data from clinical and radiological scores to predict DCI and poor outcome, and avoids inaccurate predictions for patients with mild clinical symptoms and a significant amount of bleeding 7 . VASOGRADE has fewer grades than the other combined scores, which can improve its discrimination. In contrast, HAIR and the SAH Score have more finely divided categories, so some grades often contain no patients, which is a problem for the patient distribution. Unlike VASOGRADE, HAIR and the SAH Score were derived from multiple logistic regression models; their accuracy is therefore greatly limited by the prior factors included in the analysis. The characteristics and size of the patient cohort, especially a small sample size, affect the performance of such a predictive model. Moreover, these scores were initially developed to predict in-hospital mortality rather than DCI or poor outcome 8,17 . They showed no superiority to single grading systems, but their predictive performance was acceptable for predicting DCI and poor outcome.

Limitations
Our study has some limitations that should be addressed. First, the retrospective nature and single-center observational study design may introduce some potential biases. To limit this impact, all clinical scores were reviewed from the medical records at admission and reconfirmed by signs and symptoms documented at admission. Radiological scores and DCI confirmation were conducted by two examiners who were blinded to clinical information. There is a possibility that clinically silent infarctions were missed in our study, especially in patients with mild symptoms and a short hospital stay. However, all patients in this study had a routine CT evaluation before discharge to reduce the chance of missing clinically silent infarctions. The DCI rate was consistent with other studies (ranging from 21.0% to 31.3%) 5,29,31,32 . Second, differences in the dichotomization of mRS and in the timing of outcome evaluation may introduce some biases to the performance of different grading scores. We adopted the dichotomization of mRS (mRS scores of 0-2 defined as favorable outcome, and scores of 3-6 defined as poor outcome) and the assessment of outcome at 3 months after discharge, as utilized by previous studies 1,29 . We also analyzed the predictive accuracy of each score with poor outcome defined as mRS > 3 (Supplemental Table 1), and the results were consistent with the former dichotomization. Future studies should validate the predictive performance for long-term prognosis. Third, differences in selection criteria for clipping versus coiling may introduce bias. There were more smokers in the clipping group than in the coiling group. This potentially limits the generalizability of our results.