Deep learning segmentation of major vessels in X-ray coronary angiography

X-ray coronary angiography is a primary imaging technique for diagnosing coronary diseases. Although quantitative coronary angiography (QCA) provides morphological information of coronary arteries with objective quantitative measures, considerable training is required to identify the target vessels and understand the tree structure of coronary arteries. Despite the use of computer-aided tools, such as the edge-detection method, manual correction is necessary for accurate segmentation of coronary vessels. In the present study, we proposed a robust method for major vessel segmentation using deep learning models with fully convolutional networks. When angiographic images of 3302 diseased major vessels from 2042 patients were tested, deep learning networks accurately identified and segmented the major vessels in X-ray coronary angiography. The average F1 score reached 0.917, and 93.7% of the images exhibited a high F1 score > 0.8. The most narrowed region at the stenosis was distinctly captured with high connectivity. Robust predictability was validated for the external dataset with different image characteristics. For major vessel segmentation, our approach demonstrated that prediction could be completed in real time with minimal image preprocessing. By applying deep learning segmentation, QCA analysis could be further automated, thereby facilitating the use of QCA-based diagnostic methods.

Deep learning models were evaluated using datasets from two institutes, and the impact of dataset composition and dataset size on segmentation performance was investigated.

Methods
Study population. In this study, 3309 patients who underwent X-ray coronary angiography at Asan Medical Center from February 2016 to November 2016 were retrospectively enrolled (Fig. 1a). This study complies with the Declaration of Helsinki, and research approval was granted by the Institutional Review Board of Asan Medical Center with a waiver of research consent. Each X-ray coronary angiography series comprised two or three acquisitions at different angles for each of the three major vessels (right coronary artery, RCA; left anterior descending artery, LAD; left circumflex artery, LCX) (Fig. 1b). For each major vessel with at least one lesion (diameter stenosis > 30%), one image was collected from the projection demonstrating the most severe narrowing. After excluding cases in which the coronary tree structure could not be identified, such as chronic total occlusion or overlap with a medical device used for prior treatment, 3302 images from 2042 patients were ultimately included in this study. The baseline characteristics of the patients are summarized in Table 1.
X-ray coronary angiography and labeling process. Catheterization was performed via the femoral or radial route using standard catheters, and coronary angiograms were digitally recorded. Personal patient information in the DICOM files was removed using the anonymization tool provided by the Health Innovation Big Data Center of Asan Medical Center. To generate label masks, the major vessel area on each angiogram was annotated by two experts, each with more than ten years of experience, using CAAS Workstation 7.5 (Pie Medical Imaging, Netherlands). On the end-diastolic image frame, an initial mask of the major vessel boundary was generated using the semi-automatic edge-detection tool and then manually corrected. For each major vessel, the segmented area extended from the ostium to the far distal end (Fig. 1c). For RCA, the distal end of the segmented area was the bifurcation point between its two branches, the posterior descending artery (PDA) and the posterolateral artery (PL). Pixel information at the major vessel boundaries was captured and extracted using a customized Python script, and label masks were created separately ("internal dataset"; Fig. 1d).
Deep learning networks. Four deep learning models, each built on the U-Net architecture for semantic segmentation 13, were evaluated (Fig. 1e). Deep learning models based on U-Net have demonstrated strong performance in binary semantic segmentation of grayscale images 13,15,16. U-Net consists of a fully convolutional encoder, called the backbone, and a deconvolution-based decoder; the original U-Net is referred to here as 'SimpleUNet'. The other three models were obtained by replacing the U-Net backbone with one of the most popular image classification networks, namely ResNet101 17, DenseNet121 18, or InceptionResNet-v2 19, and applied to segmentation of X-ray angiography (see Appendix for network details). Input images of 512 × 512 pixels were normalized using 2-D min/max normalization, and the initial weights were taken from ImageNet for transfer learning.
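The 2-D min/max normalization step can be illustrated with a short NumPy sketch. It is a minimal example under stated assumptions: the function names `minmax_normalize` and `to_three_channels` are hypothetical, the `eps` guard for a constant frame is an addition not described in the study, and replicating the grayscale frame into three channels is only a common way of feeding ImageNet-pretrained backbones; the source does not state how channel count was handled.

```python
import numpy as np

def minmax_normalize(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale one 2-D grayscale angiogram to [0, 1] using its own min and max.

    `eps` is an assumed guard against division by zero for a constant frame.
    """
    img = image.astype(np.float32)
    lo, hi = float(img.min()), float(img.max())
    return (img - lo) / max(hi - lo, eps)

def to_three_channels(norm: np.ndarray) -> np.ndarray:
    """Hypothetical helper: repeat an (H, W) image into (H, W, 3) for a
    backbone pretrained on RGB ImageNet images."""
    return np.repeat(norm[..., np.newaxis], 3, axis=-1)

frame = np.array([[0, 128], [255, 64]], dtype=np.uint8)  # toy 2 x 2 "angiogram"
norm = minmax_normalize(frame)
print(norm.min(), norm.max())          # 0.0 1.0
print(to_three_channels(norm).shape)   # (2, 2, 3)
```

Because the normalization uses only the statistics of the frame itself, it needs no dataset-level parameters and can be applied identically at training and inference time.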
Loss function. The generalized Dice loss (GD) 20 was adopted to train the binary segmentation. GD is defined as

$$GD = 1 - 2\,\frac{\sum_{l=1}^{c} w_l \sum_{n=1}^{p} G_{ln} P_{ln}}{\sum_{l=1}^{c} w_l \sum_{n=1}^{p} \left(G_{ln} + P_{ln}\right)}, \qquad w_l = \frac{1}{\left(\sum_{n=1}^{p} G_{ln}\right)^{2} + \varepsilon},$$

where $c$ is the number of classes, $p$ is the number of pixels, $G_{ln}$ is the ground truth, and $P_{ln}$ is the prediction for class $l$ at pixel $n$. The invariance term $w_l$, which is inversely proportional to the square of the label frequency ($\sum_{n}^{p} G_{ln}$), was introduced to mitigate the class imbalance between the major vessel region and the other areas, where $\varepsilon = 10^{-6}$. When $w_l$ is equal across classes, GD reduces to the standard Dice loss. The major vessel area accounted for 2.69% ± 0.86% of the 512 × 512 pixel images in the internal dataset.

Training setup. Prediction models were trained for a maximum of 400 epochs with a mini-batch size of 16.
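A minimal NumPy sketch of the generalized Dice loss defined in the Loss function paragraph is given below. It assumes the Sudre-style weight w_l = 1/((Σ_n G_ln)² + ε), predictions that are already class probabilities, and flattened (classes, pixels) arrays; the function names are hypothetical, and the authors' TensorFlow implementation is not described in the source.

```python
import numpy as np

def generalized_dice_loss(G: np.ndarray, P: np.ndarray, eps: float = 1e-6) -> float:
    """Generalized Dice loss for arrays of shape (c, p): c classes, p pixels.

    G holds one-hot ground truth, P holds predicted class probabilities.
    Class weights follow w_l = 1 / ((sum_n G_ln)^2 + eps).
    """
    G = G.astype(np.float64)
    P = P.astype(np.float64)
    w = 1.0 / (G.sum(axis=1) ** 2 + eps)             # one weight per class
    numerator = np.sum(w * np.sum(G * P, axis=1))     # weighted overlap
    denominator = np.sum(w * np.sum(G + P, axis=1))   # weighted total volume
    return float(1.0 - 2.0 * numerator / denominator)

def binary_to_two_class(mask: np.ndarray) -> np.ndarray:
    """Stack vessel and background channels of a binary mask into shape (2, p)."""
    m = mask.reshape(-1).astype(np.float64)
    return np.stack([m, 1.0 - m])

mask = np.zeros((8, 8))
mask[3:5, :] = 1                        # toy horizontal "vessel"
G = binary_to_two_class(mask)
print(generalized_dice_loss(G, G))      # 0.0 (perfect prediction)
print(generalized_dice_loss(G, 1 - G))  # 1.0 (completely wrong prediction)
```

With the reported vessel fraction of about 2.69% of a 512 × 512 image (roughly 7,050 of 262,144 pixels), these squared-volume weights make the vessel class weight roughly (255,090/7,050)² ≈ 1,300 times the background weight, which is how the loss counteracts the imbalance.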
Data augmentation was performed with rotation (−20° to 20°), translation (0–10% of the image size along the horizontal and vertical axes), and zoom (0–10%). For training, the Adam optimizer 21 was applied with β1 = 0.9 and β2 = 0.999; the learning rate, initially set to 10⁻³, was halved each time the validation loss plateaued for 20 epochs, until it fell below 10⁻⁶. The deep learning networks were implemented in Python using the TensorFlow library and trained on a workstation with an Intel Core i9-7900X CPU (3.3 GHz), 128 GB RAM, and four NVIDIA GeForce GTX 1080 Ti GPUs.
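The learning-rate rule above can be simulated in plain Python to see how many reductions it allows. This is a sketch under assumptions: `plateau_halving_schedule` is a hypothetical name, a plateau is taken to mean no strict decrease of the validation loss, and reductions stop once the rate has dropped below 10⁻⁶.

```python
def plateau_halving_schedule(val_losses, lr0=1e-3, factor=0.5,
                             patience=20, min_lr=1e-6):
    """Learning rate used at each epoch under a reduce-on-plateau rule.

    The rate is multiplied by `factor` whenever the validation loss has not
    improved for `patience` consecutive epochs; reductions stop once the rate
    has dropped below `min_lr`.
    """
    lr, best, wait = lr0, float("inf"), 0
    lrs = []
    for loss in val_losses:
        lrs.append(lr)                 # rate in effect during this epoch
        if loss < best:                # improvement: reset the plateau counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience and lr >= min_lr:
                lr *= factor           # halve after a 20-epoch plateau
                wait = 0
    return lrs

# Worst case: the validation loss never improves after the first epoch.
lrs = plateau_halving_schedule([1.0] * 400)
print(lrs[20], lrs[21])  # 0.001 0.0005
print(lrs[-1])           # 9.765625e-07  (= 1e-3 / 2**10, first value below 1e-6)
```

Starting from 10⁻³, ten halvings are needed to go below 10⁻⁶, so within the 400-epoch limit the rate can reach its final value after about 200 stagnant epochs. Keras offers a similar built-in, `tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=20, min_lr=1e-6)`, although it clamps the rate at `min_lr` rather than stopping just below it and by default requires an improvement larger than `min_delta=1e-4`; the source does not say which mechanism was used.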
The evaluation metrics used to assess the predictability of the deep learning models were precision, recall, and F1 score, defined as precision = TP/(TP + FP), recall = TP/(TP + FN), and F1 = 2 × precision × recall/(precision + recall), where TP is true positive, FP is false positive, and FN is false negative. Evaluation metrics were calculated only for the major vessel area; i.e., TP is the number of pixels correctly predicted as belonging to the major vessel.
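The pixel-wise metrics can be computed from a pair of binary masks as sketched below. The function name `mask_metrics` is hypothetical, and returning 0.0 when a denominator is empty is an assumed convention that the source does not specify.

```python
import numpy as np

def mask_metrics(gt, pred):
    """Pixel-wise precision, recall and F1 for the major-vessel (positive) class."""
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    tp = np.count_nonzero(gt & pred)     # vessel pixels predicted as vessel
    fp = np.count_nonzero(~gt & pred)    # background predicted as vessel
    fn = np.count_nonzero(gt & ~pred)    # vessel pixels that were missed
    # Returning 0.0 for an empty denominator is an assumed convention.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gt   = [[1, 1, 0],
        [0, 1, 0]]
pred = [[1, 0, 0],
        [0, 1, 1]]
print(mask_metrics(gt, pred))  # TP = 2, FP = 1, FN = 1, so each metric is about 0.667
```

For binary masks this F1 score is algebraically identical to the Dice coefficient, 2TP/(2TP + FP + FN), so it is directly comparable with the Dice-based training objective.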
Dataset and experimental setup. The constructed dataset was divided into five folds according to the exam date (Fig. 1d). Each fold contained almost the same number of patients, and all angiograms of a given patient were assigned to the same fold. First, to compare the segmentation performance of the deep learning networks, cross-validation was performed with folds comprising all three major vessels (Table 2). Then, to investigate the impact of dataset composition, deep learning analyses were conducted with a separate dataset for each major vessel, similar to previous approaches 10,11. In the cross-validation, folds were allocated to the training, validation, and test sets in a 3:1:1 ratio, and the fold assignment was rotated by cyclic permutation.

External validation. Although each major vessel has a standard view for coronary angiography (CAG) acquisition, CAG characteristics are affected by clinical settings, such as view angle, magnification ratio, use of contrast media, and … (Table 1). A total of 181 label masks in the "external dataset" were created using the same protocol as for the internal dataset. The external dataset was used as the test set for the model trained on the internal dataset (Table 2). Research approval was granted by the Institutional Review Board of Chungnam National University Hospital with a waiver of research consent.
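The date-ordered, patient-level five-fold split and the 3:1:1 cyclic rotation described under Dataset and experimental setup can be sketched as follows. The helper names, the dictionary input format, and ordering patients by a single exam date with patient ID as a tie-breaker are assumptions for illustration; the exam dates in the example are invented.

```python
from datetime import date

def assign_folds_by_date(first_exam, n_folds=5):
    """Assign each patient to one of `n_folds` contiguous, date-ordered folds.

    `first_exam` maps a patient ID to that patient's exam date. Every image of
    a patient inherits the patient's fold, so no patient spans two folds.
    """
    patients = sorted(first_exam, key=lambda pid: (first_exam[pid], pid))
    n = len(patients)
    return {pid: i * n_folds // n for i, pid in enumerate(patients)}

def cyclic_splits(n_folds=5, n_train=3, n_val=1):
    """Rotate fold roles so that each fold serves as the test set exactly once."""
    splits = []
    for k in range(n_folds):
        order = [(k + i) % n_folds for i in range(n_folds)]
        splits.append({
            "train": order[:n_train],
            "val": order[n_train:n_train + n_val],
            "test": order[n_train + n_val:],
        })
    return splits

# Ten hypothetical patients with made-up exam dates in 2016.
exams = {f"P{i:02d}": date(2016, 2 + i // 2, 1 + i) for i in range(10)}
folds = assign_folds_by_date(exams)
for split in cyclic_splits():
    print(split)
# first line: {'train': [0, 1, 2], 'val': [3], 'test': [4]}
```

Splitting at the patient level rather than the image level keeps several projections of the same patient from appearing in both training and test folds, which would otherwise inflate the test scores.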

Statistical analysis.
Continuous values are presented as mean ± standard deviation or median and interquartile range, as appropriate. Categorical variables are presented as numbers and percentages. The Mann-Whitney U test was applied to assess differences in evaluation metrics associated with the deep learning networks, dataset composition (combined vs. separate), and dataset size. The Kruskal-Wallis test was used to evaluate the impact of the minimum lumen diameter on the local F1 score around the stenosis. Values of p < 0.05 were considered statistically significant. Statistical analyses were performed using R and SPSS 17.0 for Windows (SPSS, Inc., Chicago, IL, USA).
www.nature.com/scientificreports/
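The study ran its tests in R and SPSS; the sketch below only illustrates how a Mann-Whitney U comparison of per-image F1 scores works. It is a minimal pure-Python version using midranks for ties and a two-sided normal approximation without tie or continuity correction, so its p-values can differ from those of statistical packages. The function names and the F1 values are made up for illustration and are not study data. In SciPy, `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal` provide library implementations.

```python
import math

def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """U statistic of x and a two-sided p-value from the normal approximation
    (no tie or continuity correction)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p

# Illustrative per-image F1 scores for two networks (made-up numbers).
f1_net_a = [0.91, 0.93, 0.88, 0.95, 0.90, 0.92]
f1_net_b = [0.84, 0.86, 0.89, 0.81, 0.85, 0.87]
u, p = mann_whitney_u(f1_net_a, f1_net_b)
print(u, round(p, 4))  # U = 35.0 out of a possible 36
```

Because the test uses only ranks, it does not assume normally distributed F1 scores, which matters here since per-image F1 values are bounded by 1 and skewed toward it.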

Effects of hyperparameters on segmentation performance.
To determine the hyperparameter set for evaluation, segmentation performance was examined while varying the epoch limit for the plateau, the augmentation parameters, and the optimizer (Table 3). A plateau limit of 20 epochs and a rotation range of 20° yielded the highest F1 score among the parameter combinations considered. Image flipping offset the improvement achieved by the other augmentation techniques. The Adam optimizer yielded a higher mean and a larger standard deviation of the F1 score than stochastic gradient descent (SGD) and root mean square propagation (RMSprop).

Performance in the combined dataset of three major vessels. In major vessel segmentation, ResNet101, DenseNet121, and InceptionResNet-v2 significantly outperformed SimpleUNet in terms of recall, precision, and F1 score (p < 0.001; Table 4). DenseNet121 achieved the highest F1 scores: 0.917 ± 0.103 overall and 0.940 ± 0.058 in the RCA subset. Although LCX segmentation showed lower performance than that of the other major vessels, the average F1 score for LCX was ≥ 0.878 for all the considered networks except SimpleUNet.
In a cumulative histogram, 93.7% of all images exhibited an F1 score > 0.8 with DenseNet121 (Fig. 2a). Histogram differences between DenseNet121 and InceptionResNet-v2 were negligible for all three major vessels, especially for images with an F1 score > 0.9 (Fig. 2b-d). Only for RCA did SimpleUNet achieve outcome quality comparable to that of the other, deeper networks (Fig. 2b).
Representative examples of deep learning segmentation are shown in Fig. 3. Despite overlap with catheters and other blood vessels, DenseNet121 and InceptionResNet-v2 accurately predicted the lumen area of the major vessels and exhibited improved connectivity at the site of stenosis. Around the stenosis, the lumen boundary at the most narrowed location was sharply captured by DenseNet121 and InceptionResNet-v2 (Fig. 4a).

Analysis of segmentation errors in the combined dataset. Error types of major vessel segmentation were classified for images with an F1 score < 0.8 (Fig. 5a). The most frequent segmentation errors were mask separation into multiple blobs and misidentification. Catheters predicted as major vessels, which have hindered improvements in segmentation performance 6, were rarely observed in deep learning segmentation with DenseNet121. Among the images with low F1 scores, the deep learning algorithms recognized a side branch as the distal part of the major vessel, which may be regarded as a correct identification depending on the analyst (Fig. 5b). In the local region around the stenosis, the minimum lumen diameter had a significant impact on the local F1 score (Fig. 4b).
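Mask separation, one of the error types above, can be detected automatically by counting connected components in a predicted mask. The source does not say how error types were identified, so the sketch below is a hypothetical screening check: `count_blobs` is an illustrative name, 8-connectivity by default is an assumption, and flagging any mask with more than one blob is an invented rule. `scipy.ndimage.label` performs the same labeling much faster on full 512 × 512 masks.

```python
from collections import deque
import numpy as np

def count_blobs(mask, connectivity=8):
    """Number of connected foreground regions in a binary mask (BFS flood fill)."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    seen = np.zeros_like(mask)
    if connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0)]
    else:  # 4-connectivity: edge-sharing neighbours only
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    blobs = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                blobs += 1                   # new, unvisited region found
                seen[y, x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in offsets:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
    return blobs

pred = np.zeros((5, 9), dtype=np.uint8)
pred[2, 0:4] = 1   # proximal segment
pred[2, 5:9] = 1   # distal segment, disconnected at column 4 (e.g. at a stenosis)
n = count_blobs(pred)
print(n, "blobs -> flag as separation" if n > 1 else "blob")  # 2 blobs -> flag as separation
```

Since a major vessel segmented from the ostium to the distal end should form a single region, a count above one is a cheap signal that the connectivity at a narrowed segment was lost.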
Comparison of separate and combined datasets. Separate datasets yielded average F1 scores comparable to those of the combined dataset, despite training with fewer images (Fig. 6). Adding images of the other major vessels to the training set of a given major vessel significantly improved the predictability of InceptionResNet-v2 (by 0.008–0.012 in F1 score), whereas SimpleUNet produced better outcomes with the separate RCA and LAD datasets (by 0.009–0.014 in F1 score).
Impact of dataset size. Even with approximately one quarter of the dataset used for cross-validation (fold 1 in Fig. 7), SimpleUNet achieved an F1 score of 0.833 ± 0.142, while the other networks approached an average of 0.9. When more than three folds were used for the training and validation sets, segmentation performance was almost saturated.

External validation. DenseNet121 demonstrated robustness to changes in angiographic image characteristics, achieving an F1 score of 0.896 ± 0.138. A noticeable degradation in LCX segmentation led to an overall reduction in segmentation performance on the external dataset.

Discussion
The major finding of the present study is that deep learning networks accurately identified and segmented the major vessels in X-ray coronary angiography. The average F1 score reached 0.917, and 93.7% of the images exhibited a high F1 score > 0.8. The most narrowed region at the stenosis was distinctly captured with high connectivity. Deep learning segmentation was assessed for a large number of angiographic images, and robust predictability was validated for the external dataset with different image characteristics.
In recent years, combinations of novel deep learning networks with U-Net 13 have been proposed, remarkably improving the performance of semantic image segmentation 14. In the present study, DenseNet121 and InceptionResNet-v2 also outperformed the base U-Net model in major vessel segmentation, even with a relatively small dataset. In the current setting, DenseNet121 and InceptionResNet-v2 also showed better results than more recent deep learning networks designed for multi-class segmentation 23,24 (see Appendix for comparison results). DenseNet121 tended to cover most of the lumen area (higher recall), whereas InceptionResNet-v2, with a similar F1 score, tended to exclude areas outside the lumen (higher precision). For DenseNet121, although the advantage of fewer parameters was offset by increased memory usage from its inter-layer connections, the training time per epoch was 10.9% shorter than that of InceptionResNet-v2 (Table 5).
An advantage of the current architecture for major vessel segmentation was that image preprocessing was limited to min/max intensity normalization, which was seamlessly integrated into the deep learning model. The processes of extracting the entire coronary tree 5 and segmenting catheters 6 were not required. With fewer steps, the segmentation time was shortened to approximately 0.04 s per angiogram, which is shorter than the frame interval of the recording system at 8–16 frames/s (62.5–125 ms per frame).
Compared with previous deep learning approaches using fully convolutional networks, the present study achieved higher F1 scores than those reported for RCA (0.704 in Au et al. 10), LAD (0.676 in Jo et al. 11), and three major vessels including normal arteries (0.890 in Jun et al. 12). The primary reasons for the improved segmentation performance in the current setting were the annotation criteria based on apparent anatomical landmarks and the application of appropriate image augmentation techniques. By using the vessel ostium and major bifurcations as fiducial points, the inter- and intra-observer variability resulting from visual estimation of a vessel segment or a diseased lesion 25 could be avoided. Near the bifurcation of the LAD and LCX, where overlapping lumen areas inevitably occurred, the ostial boundaries of the major vessels were consistently divided by referring to adjacent image frames. Because angiographic acquisition conditions vary within a certain range around the standard imaging parameters (Fig. 1b), augmentation was restricted to limited amplitudes of rotation, translation, and zoom. Flip and crop techniques, which do not correspond to coronary anatomy or typical imaging settings, were excluded (Table 3). Although a single static image was used as the input to the deep learning networks in this study, multi-view approaches could further improve the outcome by encompassing dynamic changes in the coronary arteries caused by the heartbeat 26,27.
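The restricted augmentation policy discussed above (small rotation, translation, and zoom; no flips or crops) can be sketched as one affine transform applied identically to the angiogram and its label mask. This is an illustration under assumptions: all function names are hypothetical, the zoom range is interpreted as a scale factor in [0.9, 1.1], transforms are taken about the image centre, and nearest-neighbour sampling with zero fill is chosen for simplicity. A comparable policy could be written with Keras's `ImageDataGenerator(rotation_range=20, width_shift_range=0.1, height_shift_range=0.1, zoom_range=0.1)`, but the source does not say which tool was used.

```python
import numpy as np

def sample_params(rng, max_rot_deg=20.0, max_shift=0.10, max_zoom=0.10):
    """Draw one set of small, anatomy-preserving affine parameters (no flips)."""
    return {
        "theta": np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg)),
        "shift": rng.uniform(-max_shift, max_shift, size=2),  # fraction of (w, h)
        "zoom": rng.uniform(1.0 - max_zoom, 1.0 + max_zoom),
    }

def affine_matrix(params, h, w):
    """3 x 3 map from input (x, y) to output (x, y): rotate and zoom about the
    image centre, then translate."""
    cx, cy = (w - 1) / 2, (h - 1) / 2
    c, s, z = np.cos(params["theta"]), np.sin(params["theta"]), params["zoom"]
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    rot_zoom = np.array([[z * c, -z * s, 0], [z * s, z * c, 0], [0, 0, 1]], dtype=float)
    back = np.array([[1, 0, cx + params["shift"][0] * w],
                     [0, 1, cy + params["shift"][1] * h],
                     [0, 0, 1]], dtype=float)
    return back @ rot_zoom @ to_origin

def warp_nearest(image, M):
    """Inverse-map every output pixel to its nearest input pixel; pixels that
    fall outside the input are filled with 0."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    out_xy = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(M) @ out_xy
    sx, sy = np.rint(src[0]).astype(int), np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w, dtype=image.dtype)
    out[valid] = image[sy[valid], sx[valid]]
    return out.reshape(h, w)

rng = np.random.default_rng(0)
image = rng.random((64, 64)).astype(np.float32)   # stand-in for an angiogram
label = (image > 0.9).astype(np.uint8)            # stand-in for its vessel mask
params = sample_params(rng)
M = affine_matrix(params, *image.shape)
aug_image, aug_label = warp_nearest(image, M), warp_nearest(label, M)  # same transform for both
```

Sharing one matrix between image and mask keeps the label aligned with the augmented vessel, and nearest-neighbour sampling keeps the warped mask strictly binary.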
Deep learning applications could change clinical practice that relies on segmentation of X-ray angiography. Offline QCA analysis could be completed with less manual correction, and morphological information on coronary lesions could be obtained simply by adjusting the reference locations. Therefore, the time required to calculate the SYNTAX score, which requires QCA analysis of the entire coronary tree, would be reduced. Diagnostic methods based on 3-D QCA 28, which is constructed by combining the 2-D QCA of multiple angiograms, could also be more widely used. In the prediction of post-stenotic fractional flow reserve (FFR) and vulnerable plaque using fluid dynamics simulation 3, precise 3-D reconstruction of the coronary artery is a prerequisite for accurate analysis 29. The reduced time requirement for QCA analysis may allow real-time application in the catheterization room, where the clinician's hands are not free to operate analysis tools, eventually reducing dependence on visual assessment and quantitatively guiding stent selection and optimization.
Although deep learning segmentation distinguished most luminal areas of the major vessels, some aspects must be improved for practical use. First, angiographic images with a low F1 score due to misidentification or separation may require greater modification of the lumen boundaries than when edge-detection methods are applied. For a comprehensive interpretation of a coronary tree with stenoses, geometric analyses of the left main artery and side branches are necessary. In the assessment of bifurcation lesions, evaluating the location and shape of the narrowed area is important for both the major vessel and the side branch, which have different implications for cardiovascular risk. Among the excluded cases, segmentation of normal arteries and of totally or subtotally occluded lesions may be necessary for more general use of automated QCA, as well as for calculation of the SYNTAX score. Another limitation is that the QCA analysis tasks before and after major vessel segmentation still rely on the competence of the analyst. To provide more automated and improved QCA outcomes with reduced analyst dependency, integration with conventional image processing techniques, such as edge detection and ECG-based frame selection, would be helpful in a complementary manner, as would the application of deep learning across multiple stages. Finally, the external dataset was too small to generalize the segmentation capability of our method. Therefore, the robustness and reliability of deep learning segmentation must be validated against the diversity of angiographic characteristics, which vary depending on the institute or operator.

Figure 7. Impact of dataset size on segmentation performance of deep learning networks. *p < 0.001 for all deep networks; † p = 0.027 for DenseNet121.

Conclusion
Deep learning networks accurately identified and segmented the major vessels in X-ray coronary angiography. The prediction process could be completed in real time with minimal image preprocessing. By applying deep learning segmentation, QCA analysis could be further automated, thus facilitating the use of QCA-based diagnostic methods.

Data availability
The datasets generated and/or analyzed during the current study are not publicly available because permission to share patient data was not granted by the Institutional Review Board, but they are available from the corresponding author on reasonable request.