Fully automated detection and segmentation of intracranial aneurysms in subarachnoid hemorrhage on CTA using deep learning

In aneurysmal subarachnoid hemorrhage (aSAH), accurate diagnosis of the aneurysm is essential for subsequent treatment to prevent rebleeding. However, aneurysm detection is challenging and time-consuming. The purpose of this study was to develop and evaluate a deep learning model (DLM) to automatically detect and segment aneurysms in patients with aSAH on computed tomography angiography. In this retrospective single-center study, three different DLMs were trained on 68 patients with 79 aneurysms treated for aSAH (2016–2017) using five-fold-cross-validation. Their outputs were combined into a single DLM via ensemble learning. The DLM was evaluated on an independent test set consisting of 185 patients with 215 aneurysms (2010–2015). Independent manual segmentations of aneurysms in a 3D voxel-wise manner by two readers (neurosurgeon, radiologist) provided the reference standard. For aneurysms > 30 mm³ (mean diameter of ~ 4 mm) on the test set, the DLM provided a detection sensitivity of 87% with false positives (FPs)/scan of 0.42. Automatic segmentations achieved a median Dice similarity coefficient (DSC) of 0.80 compared to the reference standard. Aneurysm location (anterior vs. posterior circulation; P = .07) and bleeding severity (Fisher grade ≤ 3 vs. 4; P = .33) did not impede detection sensitivity or segmentation performance. For aneurysms > 100 mm³ (mean diameter of ~ 6 mm), a sensitivity of 96% with a DSC of 0.87 and FPs/scan of 0.14 were obtained. In the present study, we demonstrate that the proposed DLM detects and segments aneurysms > 30 mm³ in patients with aSAH with high sensitivity, independent of cerebral circulation and bleeding severity, while producing less than one FP finding per scan. Hence, the DLM can potentially assist treating physicians in aSAH by providing automated detection and segmentation of aneurysms.


Scientific Reports (2020) 10

Aneurysmal subarachnoid hemorrhage (aSAH) is caused by spontaneous rupture of an intracranial aneurysm and represents a severe neurological condition with mortality ranging between 8 and 67% 1,2. In patients with non-traumatic SAH, accurate and reliable diagnosis of the aneurysm is essential for subsequent treatment to prevent re-bleeding and further neurological deterioration 3,4. Usually, CT-angiography (CTA) is performed immediately upon radiological proof of SAH, with sensitivity rates for detection of aneurysms ranging between 85 and 98% compared to digital subtraction angiography (DSA), which is considered the gold standard for aneurysm imaging 5,6. Due to advances in imaging and diagnostic quality over the last decades as well as its non-invasiveness, CTA has the potential to replace DSA in pre-treatment assessment of aSAH in selected patients 7,8. Timely aneurysm occlusion by endovascular or surgical means represents a key concept in modern aSAH management, with selection of the most suitable treatment depending on various factors such as aneurysm localization, size, and shape 3,4,9. With the introduction and advancement of convolutional neural networks (CNN) over the last decade, deep learning algorithms have shown great potential in performing diagnostic and analytic tasks on medical imaging in different subspecialties 10-12. Aneurysm detection on CTA, especially of smaller aneurysms, is challenging, and misdiagnosis of aSAH can be associated with a poor clinical outcome 13-15. Hence, a deep learning model (DLM) that automatically detects and segments intracranial aneurysms would be of valuable assistance to the radiologist. This is of particular interest due to the growing workload and consequent fatigue of radiologists, which correlates with an increased risk of missing relevant findings 16.
Previous studies have proposed several approaches for (semi-) automated detection of intracranial aneurysms on CTA 10 and time-of-flight magnetic resonance angiography (TOF-MRA) 17,18. However, these studies focused on unruptured intracranial aneurysms (UIAs) and did not include patients with aSAH. Hence, it remains unclear how DLM algorithms perform on patients with acutely ruptured intracranial aneurysms (RIAs) and whether the extent of hemorrhage impedes detection sensitivity.
The objective of this study was to develop and validate a DLM for automatic detection and segmentation of aneurysms on CTA in aSAH. Furthermore, we evaluated the performance of the algorithm with regard to aneurysm size, location, and bleeding severity using an independent test set of patients with aSAH.

Materials and methods
Patient population. All consecutive patients treated for aSAH at our tertiary-care university hospital setting between January 2010 and December 2017 were reviewed and served as our dataset (n = 340). Exclusion criteria were: (1) unavailable CTA scans, (2) no distinct aneurysm finding in CTA, (3) severe motion artifacts on CTA, (4) insufficient contrast of CTA, and (5) previously treated aneurysms. There were no exclusions due to aneurysm size. Consequently, 87 patients were excluded, resulting in 253 patients (mean age: 54.7 ± 13.9 years, 67.6% female) with 294 aneurysms for analysis. All included scans between 2016 and 2017 (n = 68 patients/79 aneurysms) served as the training dataset, whereas scans before 2016 were allocated to the independent test set (n = 185 patients/215 aneurysms). Figure 1 provides an overview of patient selection. CTA source images were anonymized and exported to IntelliSpace Discovery (ISD, v3.0, Philips Healthcare, Best, the Netherlands).
Imaging. Included examinations were performed on different multidetector CT scanners, namely iCT (n = 229), Brilliance 64 (n = 8), and Brilliance 16 (n = 14) (Philips Healthcare, Best, the Netherlands). On all of these scanners, our institutional standard clinical protocol for head and neck (n = 222) or head (n = 29) CTA was used, with slice thickness ranging from 0.62 to 1.25 mm. Two CTA datasets of the head from referring hospitals were included. These were acquired using Siemens Somatom Definition AS (Siemens Healthineers, Erlangen, Germany) and Toshiba Aquilion 64 (Canon Medical Systems Corporation, Otawara, Japan). (Pre-) interventional DSA was performed on a biplane angiography system (AlluraClarity FD 20/15 or FD 20 C-arm system, Philips Healthcare, Best, the Netherlands). Two hundred thirty-four patients received (pre-) interventional DSA.
Reference standard. To establish the aneurysm count and location, aneurysms were confirmed by a neurosurgeon with four years, a radiologist with three years, and a board-certified neuroradiologist with twelve years of experience in neurovascular imaging. Together, they reviewed the original radiology report of the CTA and double-reviewed the CTA as well as DSA images (if available). Further, they collectively reviewed non-enhanced CT scans to determine the respective Fisher grade of aSAH in consensus. The reference standard for aneurysm segmentations was provided by the above-mentioned neurosurgeon and radiologist, who performed semi-automatic 3D voxel-wise segmentations of aneurysms on ISD in consensus.

[...] 19. Second, a multi-scale vessel enhancement filter was applied to the brain-masked images to enhance the arteries against the background of the CTA scans 20. In this context, two vessel-enhanced images were computed: one with a scale of 0.5-5 voxels and the other with a scale of 5-15 voxels. Together, the two vessel-enhanced images help to distinguish between blood vessels and aneurysms. Third, image standardization was performed by resampling to an isotropic resolution of 0.5 × 0.5 × 0.5 mm and normalizing intensities. The original CTA image was normalized between 5-95% of its intensity values and the vessel-enhanced images were Z-score normalized. The fully automatic image pre-processing workflow is shown in Fig. 2.

[...] 21. 3D image-segments centered at the same image location provide inputs to the two pathways. However, for the second pathway, the image is down-sampled to a third of its original size. The model comprises 11 layers with kernels of size 3³. The model also has residual connections for layers 4, 6, 8, and 10, whereas layers 9 and 10 are fully connected. A number of preliminary experiments were carried out to define the model training strategy.
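The intensity standardization step could be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: it assumes that "normalized between 5-95%" means clipping to the 5th and 95th intensity percentiles and rescaling to [0, 1]. The function names `normalize_cta` and `zscore` and the random toy volumes are hypothetical, and the resampling to 0.5 mm isotropic resolution is omitted.

```python
import numpy as np

def normalize_cta(volume, low_pct=5.0, high_pct=95.0):
    """Clip to the [low_pct, high_pct] percentile range and rescale to [0, 1].

    Assumption: this is one plausible reading of the 5-95% normalization.
    """
    lo, hi = np.percentile(volume, [low_pct, high_pct])
    if hi <= lo:  # constant image: nothing to rescale
        return np.zeros_like(volume, dtype=np.float32)
    clipped = np.clip(volume, lo, hi)
    return ((clipped - lo) / (hi - lo)).astype(np.float32)

def zscore(volume, eps=1e-8):
    """Z-score normalization: zero mean and unit standard deviation."""
    v = volume.astype(np.float64)
    return (v - v.mean()) / (v.std() + eps)

# Toy stand-ins for a CTA volume and one vessel-enhanced image.
rng = np.random.default_rng(0)
cta = rng.normal(100.0, 50.0, size=(8, 8, 8))
vessel = rng.random((8, 8, 8))

cta_n = normalize_cta(cta)
vessel_n = zscore(vessel)
```

Clipping before rescaling limits the influence of extreme intensities such as dense bone or metal, which is a common motivation for percentile-based normalization.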
Based on this work, we found it most promising to use three different training procedures, henceforth referred to as DLM-Orig, DLM-Vess and DLM-LDim.
(I) For DLM-Orig, the input to the CNN is a single-channel original CTA image, with the size of extracted image-segments for training being set to 25³ voxels. (II) For DLM-Vess, the input to the CNN is multi-channel, comprising the CTA source images and the two vessel-enhanced images. The size of the extracted image-segments was set to 25³ voxels. (III) For DLM-LDim, CTA source images represent the input, but the size of the extracted image-segments was increased from 25³ to 45³ voxels.
To increase the number of training samples for all three models, image augmentation was employed by flipping the images along their axes. The training batch size was set to 15, batch normalization was applied, and the parametric rectified linear unit was used as the activation function. The Dice similarity coefficient (DSC) served as the loss function, and the number of training epochs was set to 30.
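As a rough illustration of the flip augmentation and the Dice-based loss, consider the sketch below. It is written under assumptions: the smoothed "soft Dice" formulation with `eps` is a common variant and not necessarily the exact loss used here, and the helper names `flip_augment` and `soft_dice_loss` are hypothetical.

```python
import numpy as np

def flip_augment(patch):
    """Return the patch plus one mirrored copy per spatial axis."""
    return [patch] + [np.flip(patch, axis=a) for a in range(patch.ndim)]

def soft_dice_loss(pred, target, eps=1e-6):
    """1 - Dice overlap; works for binary masks or probabilities in [0, 1]."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = np.sum(pred * target)
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy 25^3 image-segment with an off-center cubic "aneurysm" label.
label = np.zeros((25, 25, 25))
label[2:7, 10:15, 10:15] = 1.0

augmented = flip_augment(label)                  # original + 3 flipped copies
loss_same = soft_dice_loss(label, label)         # perfect overlap
loss_flip = soft_dice_loss(augmented[1], label)  # flipped copy no longer overlaps
```

Because the Dice term is normalized by the size of both masks, the loss is not dominated by the large background, which is helpful for small lesions such as aneurysms.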
The three DLMs were trained on the training-set by a five-fold-cross-validation approach using an 80-20% training-validation split without overlapping data. Similar to the work of Kamnitsas et al. 22 , we created an ensemble model by combining the outputs of the three separate DLMs. We refer to this combination strategy as DLM-Ens.
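A patient-level five-fold split without overlap could be set up as in the following sketch. The shuffling seed, the integer patient identifiers, and the function name `five_fold_splits` are assumptions for illustration; how patients were actually assigned to folds is not described here.

```python
import random

def five_fold_splits(patient_ids, n_folds=5, seed=0):
    """Return (train, validation) lists of patient IDs for each fold.

    Every patient appears in exactly one validation fold, so training and
    validation data never overlap within a fold (roughly an 80-20% split).
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        validation = folds[k]
        train = [p for j, fold in enumerate(folds) if j != k for p in fold]
        splits.append((train, validation))
    return splits

# 68 training patients, identified here by hypothetical indices 0..67.
splits = five_fold_splits(range(68))
for train, validation in splits:
    print(len(train), len(validation))
```

Splitting by patient rather than by image-segment avoids leaking segments from the same scan into both the training and the validation part of a fold.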
During inference, the trained DLMs (DLM-Orig, DLM-Vess and DLM-LDim) were applied to the test dataset. Each trained DLM consisted of five individual sub-models from the five-fold-cross-validation training approach. Outputs from these five sub-models were fused together using simultaneous truth and performance level estimation (STAPLE) 23 . Subsequently, STAPLE outputs from the three DLMs were passed to DLM-Ens to produce final aneurysm segmentation prediction.
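The two-stage fusion can be illustrated with a deliberately simplified sketch. Note the substitution: plain majority voting is used here in place of STAPLE, which additionally estimates a performance level for each input segmentation, and majority voting is also assumed for the final ensemble step, whose exact combination rule is not specified in the text. All names and toy masks are hypothetical.

```python
import numpy as np

def majority_vote(masks):
    """Voxel-wise strict majority vote over a list of binary masks.

    Simplified stand-in for STAPLE (assumption, not the authors' method).
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    return stack.sum(axis=0) * 2 > stack.shape[0]

# Toy ground truth: a small cube inside a 4 x 4 x 4 volume.
truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:3] = True
empty = np.zeros_like(truth)
full = np.ones_like(truth)

# Stage 1: fuse the five cross-validation sub-models of each DLM.
fused_orig = majority_vote([truth, truth, truth, empty, full])
fused_vess = majority_vote([truth, truth, truth, truth, empty])
fused_ldim = majority_vote([empty, empty, empty, truth, full])

# Stage 2: combine the three fused outputs into the ensemble prediction.
final = majority_vote([fused_orig, fused_vess, fused_ldim])
```

In this toy case the third model misses the lesion entirely, yet the ensemble still recovers it because two of the three fused outputs agree.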
Statistical analysis. Statistical analysis was performed with SPSS (V22.0; IBM, Armonk, NY, USA), with P < .05 considered statistically significant. Categorical variables (e.g. scale results) are presented in the form of percentages, frequencies, and medians. Comparison of paired nonparametric variables was performed with Wilcoxon signed-rank tests. Normally distributed variables are given as mean ± standard deviation, with comparisons being conducted using Student's t-test.
The following measures were computed to determine the detection sensitivity of the aneurysms, with TP being true positives, FP false positives, and FN false negatives:

\[ \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP} \]

Since no CTA scans without aneurysms were included, a true specificity cannot be determined; hence, precision was calculated, as is usual in machine learning tasks. To evaluate the segmentation performance of the DLMs, the automatically obtained segmentations (S_DLM) were compared to the manual annotations (S_RS), with the spatial overlap between the segmentations being computed using the DSC:

\[ \text{DSC} = \frac{2\,|S_{DLM} \cap S_{RS}|}{|S_{DLM}| + |S_{RS}|} \]

Resulting DSCs are reported as median. For quantitative volumetric measurements, Pearson's correlation coefficient (r) was calculated.
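The evaluation measures can be computed as in the sketch below, which uses the standard definitions of sensitivity, precision, F1 score, DSC, and Pearson's r. The counts and voxel sets are made-up toy values, not study data, and the function names are hypothetical.

```python
import math

def detection_metrics(tp, fp, fn, n_scans):
    """Lesion-level detection measures (denominators assumed non-zero)."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "precision": precision,
            "f1": f1, "fp_per_scan": fp / n_scans}

def dice(seg_a, seg_b):
    """DSC between two segmentations given as sets of voxel coordinates."""
    if not seg_a and not seg_b:
        return 1.0  # convention chosen here for two empty masks
    return 2 * len(seg_a & seg_b) / (len(seg_a) + len(seg_b))

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy values for illustration only.
metrics = detection_metrics(tp=8, fp=2, fn=2, n_scans=10)
s_dlm = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
s_rs = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}
dsc = dice(s_dlm, s_rs)
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Precision and FPs/scan describe false alarms from two angles: precision relative to all detections, FPs/scan relative to the number of examinations.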
Ethics approval and consent to participate. The local institutional review board approved this retrospective, single-center study (reference number: 19-1329; Ethikkomission der Medizinischen Fakultät der Universität zu Köln) and waived the requirement for written informed patient consent. All methods were conducted in accordance with the relevant guidelines and regulations.

Results
Baseline patient and aneurysm characteristics were comparable between the training and validation groups, as outlined in Table 1.
Evaluation of the different DLMs on the training data. In the training set, 79 aneurysms were identified as the reference standard.
In contrast, the DLM-Ens provided a sensitivity of 72% with a median DSC of 0.74, precision of 80%, and FPs/scan of 0.21. For aneurysms > 30 mm³ (n = 64) and > 50 mm³ (n = 55), the DLM-Ens achieved higher sensitivity (84%, 94%) and higher DSC (0.79, 0.81) with high precision (88%, 94%) and a decreased number of FPs/scan (0.1, 0.05). In this context, Table 2 [...]

Overall volumetric correlation with the reference standard. For true positive aneurysm findings, the DLM achieved a mean volume of 154.4 ± 236.0 mm³, showing a significant correlation with manual segmentations (r = 0.95, P < .001). The strongest correlation between manual annotations and automatic DLM segmentations was observed for large aneurysms, while the correlation was not significant (P = .810) for small aneurysms, as detailed in Table 3 and Fig. 3b. Volume correlation plots between manual and automatic segmentations of the training and test set using the DLM are presented in Fig. 4.

Discussion
In the present study, we developed and trained a DLM for automatic detection and segmentation of intracranial aneurysms on CTA in patients with aSAH and evaluated its performance on an independent test set. As a major finding of the study, the DLM achieves a detection rate of 87% for aneurysms > 30 mm³ with high segmentation performance and less than one FP finding per scan. Previous studies have investigated DLMs for detection of UIAs, providing detection rates between 83 and 95% on TOF-MRA 17,18 and CTA 10. However, these studies either reported 6 FPs/scan 18 and 10 FPs/scan 17 or did not report FPs at all 10, which calls their usefulness in clinical routine into question. The present study is the first to evaluate the performance of a DLM for automatic detection and segmentation of aneurysms in patients with aSAH. In the current study, the DLM provided a sensitivity of 87% for aneurysms > 30 mm³ 18. For aneurysms > 100 mm³ (~ mean diameter of 6 mm), sensitivity of the DLM increased to almost 100%. Since aneurysms > 7 mm bear the highest risk of rupture 24-26, the proposed DLM enables accurate detection of the clinically most relevant aneurysms in aSAH.
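The relation between the volume thresholds and the quoted approximate diameters can be checked with a short calculation. The sketch assumes an idealized spherical aneurysm, d = (6V/π)^(1/3), which may differ from how the mean diameter was actually measured, and it assumes the 0.5 mm isotropic voxel size used after resampling; the function names are hypothetical.

```python
import math

def volume_from_voxels(n_voxels, spacing_mm=0.5):
    """Volume in mm^3 of n_voxels isotropic voxels with the given spacing."""
    return n_voxels * spacing_mm ** 3

def equivalent_sphere_diameter(volume_mm3):
    """Diameter in mm of a sphere with the given volume (idealization)."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)

v30 = volume_from_voxels(240)    # 240 voxels at 0.5 mm spacing = 30 mm^3
v100 = volume_from_voxels(800)   # 800 voxels at 0.5 mm spacing = 100 mm^3
d30 = equivalent_sphere_diameter(v30)
d100 = equivalent_sphere_diameter(v100)
print(f"{v30:.0f} mm^3 -> {d30:.1f} mm, {v100:.0f} mm^3 -> {d100:.1f} mm")
```

Under this spherical assumption, 30 mm³ and 100 mm³ correspond to diameters of roughly 3.9 mm and 5.8 mm, in line with the approximate values of 4 mm and 6 mm given in the text.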
Achieving an overall DSC of 0.75 for aneurysm segmentation and of 0.87 for aneurysms > 100 mm³, the proposed DLM yields high segmentation performance (superior to the DLM provided by Sichtermann et al. 18, who reported a DSC of 0.53) despite the small lesion size. Therefore, the DLM can be of assistance to treating physicians regarding treatment planning, e.g. by providing 3D geometrical characterization of aneurysms.
Using ensemble learning of three different DLMs, an average number of FPs of less than one per scan was observed, which is lower than in the aforementioned studies 17,18. For detection of larger aneurysms (> 100 mm³), FPs were almost zero per scan. Among the remaining findings, the majority of FPs were found in venous vessels or adjacent to bone and were usually not associated with arterial vessels. Therefore, most of them were easily identifiable as incorrect findings.
There was no significant difference between the anterior and posterior circulation regarding aneurysm detection. However, ICA aneurysms showed the lowest detection rates with a significant difference compared to the MCA. This discrepancy is most likely due to presence of calcified plaques, dilatation and elongation of the vessel due to hypertension, complex anatomy of aneurysms, and, in particular, proximity to the skull 27 . In future, additional training methods identifying bone geometry might help the DLM to better distinguish bone from adjacent vessels and aneurysms.
Furthermore, we evaluated whether the presence of aSAH would impede the sensitivity of the DLM. Remarkably, the algorithm performed independently of the Fisher grade with additional presence of parenchymal or intraventricular hemorrhage neither decreasing detection rates nor increasing FPs/scan. Therefore, the DLM provides high feasibility and detection sensitivity even in severe aSAH with potential hydrocephalus or midline shift possibly leading to an overall complex image of the brain 28 .
CTA-based detection of intracranial aneurysms can be time-consuming and challenging and shows a large variability among physicians, especially for small aneurysms. Small aneurysms show a highly variable interrater agreement depending on various factors, e.g., localization and subspecialty training, resulting in a lower detection rate than for larger aneurysms even among experienced clinicians 13,14. In this study, we aimed to overcome these limitations by establishing a robust reference standard based on review of CTA scans and reports by three readers as well as availability of DSA in the majority of patients, in order to find aneurysms that had potentially not been described in the initial CTA report.
As proof of concept, the results of this study indicate that deep learning is able to provide sufficient detection of aneurysms in aSAH, especially of larger ones, which bear the highest risk of rupture 24-26. With an overall detection rate of 82%, the DLM enables detection rates comparable to a human reader (e.g., sensitivity of 83% for UIAs > 3 mm in the study by Park et al. 10). In this context, the DLM may provide support to physicians who lack concentration due to fatigue or who lack training. This is important in the setting of aSAH if the treating physician detects one aneurysm, preferably the one causing the bleeding, but potentially misses a second aneurysm because of decreased concentration due to the "satisfaction of search" phenomenon 16,29,30. Nevertheless, the DLM should be further improved to achieve 100% detection sensitivity, which is required to guarantee patient safety in aSAH.

Table 4. Sensitivity, median Dice similarity coefficient (DSC), precision, F1 score, and average false positives (FPs)/scan of the DLM-Ens in the combined test cohort in relation to aneurysm volume.

Limitations
Besides its retrospective design, our study has a few limitations. Although scans from five different CT scanners were included in this study, 91% were acquired using the iCT; hence, the true performance of the DLM on CTA images acquired on scanners other than the iCT is unknown and the evaluation of generalizability is limited. This limitation should be addressed in future studies. As this was a single-center study, only two scans from referring institutions were included; therefore, a multi-center study investigating the performance of the DLM on other CT scanners using different protocols should be conducted. Further, we did not include previously treated aneurysms; hence, the detection performance in these patients (with additional artifacts potentially impeding detection) still needs to be investigated.

Conclusions
In conclusion, we developed a DLM able to provide sufficient detection of aneurysms in aSAH with almost 100% sensitivity for aneurysms > 100 mm³ (~ mean diameter of 6 mm). Furthermore, high volumetric correlation to human segmentations and a low number of FPs/scan were obtained. Confounders such as cerebral circulation and bleeding severity did not significantly affect the performance of the DLM. Additional training is required to increase sensitivity for smaller aneurysms; however, the DLM may already be of assistance to treating physicians by providing automated detection of aneurysms in aSAH.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.