Background & Summary

Coronary artery disease is a leading cause of death worldwide1, prompting a large body of research on coronary anatomy and blood flow, disease progression and treatment options2,3,4. With rapid advances in computation, additive manufacturing and other technologies capable of taking advantage of virtual organ models, computational models of coronary arteries are increasingly used in research, including the design and testing of medical devices, as well as for education and training purposes5.

While different modalities can be used to image the coronary arteries, only Computed Tomography Coronary Angiography (CTCA) is non-invasive and offers sufficient sub-millimetre resolution to allow reconstruction of the small coronary arteries. It is therefore commonly used and well suited as the underlying modality for image segmentation and virtual coronary artery reconstruction. Segmentation typically requires manual refinement after an initial automatic thresholding step, owing to the small vessel scale, the lack of clear contrast with surrounding tissue and common image artefacts, especially around calcified lesions. Segmentation of the full coronary tree is particularly difficult because, even with the highest-resolution CTCA scanners available today, distal vessels are captured by only a few image pixels. As a result, despite the wealth of CTCA data available to date, very few virtual coronary models are publicly available, and applying reconstruction workflows on a large-scale patient-specific basis remains cost and time intensive.

Traditional segmentation methods are extremely time consuming6, generally requiring semi-automated segmentation closely supervised by a human expert who guides the algorithm and corrects errors. The resulting segmentations are highly sensitive to the individual expert, making consistent segmentation across different experts difficult. Consequently, no public datasets are currently available for applications that require accurate patient-specific coronary models. Related datasets are limited. The ‘Visible Heart Project’ focuses on educational images and videos acquired with magnetic resonance imaging; although access to a limited number of CTCA images may be provided, these come without annotations or reconstructed models7. The ‘Rotterdam dataset’8,9 is the primary public dataset, but it is focused on stenosis detection and evaluation with sub-voxel accuracy, may only be used for its stated purpose of stenosis detection and lumen segmentation, and is no longer publicly available from the challenge website (https://coronary.bigr.nl/).

To overcome the problems with these traditional segmentation methods, we created high-quality segmentations of the coronary arteries to serve both as a benchmark dataset for newly developed segmentation methods and as pre-existing segmentations for further processing, for example investigating differences in helicity between stented idealized and patient-specific vessels10. This formed part of the ‘Automated Segmentation of Coronary Arteries’ (ASOCA) Challenge11,12, which we facilitated during the Medical Image Computing and Computer Assisted Intervention (MICCAI) 2020 conference to encourage the development of automated segmentation algorithms using these data, providing a convenient system for submitting results and for automated evaluation and ranking.

The coronary artery CTCA images were available to us through the Coronary Atlas13, an ongoing collection of CTCA images and associated clinical and demographic data used to investigate differences in coronary anatomy14 and haemodynamic behaviour between patients15,16,17. A set of 40 patient-specific coronary artery trees is provided here, including anonymized CTCA images in .nrrd format, combined high-quality manual voxel annotations derived from three experts, and other associated data such as centrelines, smoothed meshes in .stl format and calcification scores. These served as the training dataset for the ASOCA challenge. Our dataset is the only public dataset providing annotations and associated data for the full coronary tree in 20 normal and 20 diseased cases. Additionally, a separate set of 20 CTCA images (the test set images for the ASOCA challenge) is provided primarily to facilitate participation in the challenge. To preserve the integrity of the challenge, no other information is provided with these images. Researchers can participate via the challenge website (asoca.grand-challenge.org), using the training data to develop segmentation algorithms and submitting results to the website for automatic evaluation and scoring.

In summary, the current dataset has several advantages over previously available coronary artery datasets. While our dataset is based solely on CTCA and cannot provide sub-voxel segmentation and stenosis identification as accurate as the Rotterdam dataset, it does provide high-quality segmentation of all coronary vessels visible in CTCA. In contrast to the Rotterdam dataset, ours is available to all researchers, including commercial projects. Further, the inclusion of all arteries larger than 1 mm, rather than selected vessel segments, allows for expanded applications such as more complex simulations and more comprehensive training and educational applications. The balanced set of normal and diseased patients ensures that effects of disease can be studied independently, and that newly developed segmentation algorithms can robustly handle diseased cases. The dataset is sufficiently large and balanced for training machine learning models. Device manufacturers and researchers with an interest in cardiovascular modelling, prediction and treatment of coronary artery disease can analyse these data directly or combine them with other available datasets. The smooth surface meshes and centrelines can be used directly for computational modelling16, 3D printed for experiments18,19,20,21, used to assist in developing and testing medical devices such as stents22,23,24, and employed in Virtual Reality applications for education and training25,26,27. Moreover, our dataset allows for the development and benchmarking of new segmentation algorithms aiming to annotate the coronary arteries automatically and efficiently, as per the ASOCA challenge28.

Methods

Patient cohort

Forty patients were randomly selected from a retrospective dataset based on the calcification, stenosis and image quality reported by the cardiologist; only images of acceptable quality, as assessed by the cardiologist, were included. The dataset was divided into twenty normal patients with no evidence of stenosis and non-obstructive disease, and twenty diseased patients with calcium scores greater than 0 and obstructive disease. The calcification score in the diseased group ranged from 1 to 986 with a mean of 254, and obstructions in the diseased group ranged from 30% to 70% stenosis. Patients were included during routine procedures after written informed consent and approval from the University of New South Wales Human Research Ethics Committee (Ref. 022961).

Imaging

The CTCA imaging was undertaken using a multi-detector CT scanner (GE Lightspeed 64 multi-slice scanner, USA) with retrospective ECG gating. A contrast medium (Omnipaque 350) was used for imaging, and the patient heart rate was controlled at around 60 bpm by administration of beta blockers. The end-diastolic time step was saved for analysis and the images exported as DICOM files. Images were converted to Nearly Raw Raster Data (NRRD) format during the anonymization process, and the intensities were rescaled to Hounsfield units based on the appropriate DICOM tags.
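As an illustration of this conversion step, the following is a minimal sketch assuming a SimpleITK-based workflow; the input directory and output file name are placeholders rather than part of the dataset, and it is an assumption that the underlying GDCM reader applies the DICOM rescale tags (RescaleSlope/RescaleIntercept) so that voxel values end up in Hounsfield units.

```python
# Sketch: convert a DICOM series to NRRD with intensities in Hounsfield units.
# Assumes SimpleITK is installed; paths are placeholders, not part of the dataset.
import SimpleITK as sitk

dicom_dir = "path/to/dicom_series"      # hypothetical input directory
output_path = "Normal_1.nrrd"           # output following the dataset naming scheme

reader = sitk.ImageSeriesReader()
series_files = reader.GetGDCMSeriesFileNames(dicom_dir)
reader.SetFileNames(series_files)
image = reader.Execute()                # GDCM applies RescaleSlope/RescaleIntercept,
                                        # so voxel values should already be in Hounsfield units

sitk.WriteImage(image, output_path)     # NRRD format is inferred from the .nrrd extension
```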

Annotation

The open-source software 3D Slicer (version 4.3)29 was used to manually annotate the coronary artery images. The annotation process was performed independently by three annotators, who were instructed to segment the left and right coronary trees starting at the aortic root. Thresholding at a cut-off chosen by the expert was used to generate an initial rough segmentation of the vessel, followed by manual correction of the vessel contours in each slice. All coronary vessels with a diameter larger than 1 mm, corresponding to 1–2 voxels, were included in the segmentation. Where significant imaging artefacts made further segmentation of a vessel unfeasible, the remainder of that vessel was excluded. A sample of the annotated CTCA images and the resulting 3D reconstructions is shown in Fig. 1. Figure 2 shows a diseased case with calcified plaque and stenosis.

Fig. 1

Samples of annotated data showing (a) an annotated slice of the CTCA images, (b) volumetric rendering of the labelled voxels, and (c) smooth surface mesh (left to right) generated from the normal (top two rows) and diseased (bottom two rows) coronary artery image annotations.

Fig. 2

Calcified and non-calcified plaques present in the dataset.

Post processing

The annotations were combined to produce a final segmentation of the arteries by majority vote among the three annotations, i.e. retaining regions where at least two of the annotators agreed. Small vessels (<1 mm, i.e. 1–2 voxels) were removed if they had been mistakenly included. The segmentations are available as voxel-wise annotations as well as smoothed surface meshes. Surface meshes were produced from the annotations using the Flying Edges algorithm30. Note that with the voxel-wise labelling used in this dataset, rather than a tubular parametrization, further smoothing is necessary to recover a smooth vessel shape. The annotations were smoothed using Taubin’s algorithm31, implemented in the open-source Vascular Modelling Tool Kit (VMTK, https://www.vmtk.org)32,33, with a passband of 0.03 and 30 iterations before being exported as an STL file. Taubin’s smoothing method is commonly used when processing vessel segmentations34 and is expected to preserve the topology and volume of the vessels35. These settings correspond to the smoothing used in the Coronary Atlas to calculate shape parameters. The raw annotations provided can be used to produce surface meshes with different smoothing settings if needed. Vessel centrelines were extracted by manually marking the inlet and outlet points on the mesh for automated centreline calculation in VMTK, as shown in Fig. 3.
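The following is a minimal sketch of this post-processing chain, assuming pynrrd, NumPy and VTK are installed. The per-annotator file names are hypothetical (the dataset ships only the combined annotation), and VTK’s windowed-sinc filter is used here as a stand-in for the VMTK Taubin smoothing step with the same passband and iteration settings; whether this exactly matches VMTK’s internal implementation is an assumption.

```python
# Sketch of the post-processing chain: majority vote, surface extraction, smoothing.
import numpy as np
import nrrd
import vtk

# --- Majority vote over three independent annotations (hypothetical file names) ---
labels = [nrrd.read(f"Normal_1_annotator{i}.nrrd")[0] for i in (1, 2, 3)]
votes = np.sum(np.stack(labels) > 0, axis=0)
consensus = (votes >= 2).astype(np.uint8)          # keep voxels where at least two annotators agree
header = nrrd.read("Normal_1_annotator1.nrrd")[1]  # reuse spacing/origin from one input
nrrd.write("Normal_1_consensus.nrrd", consensus, header)

# --- Surface extraction and smoothing (stand-in for the VMTK Taubin step) ---
reader = vtk.vtkNrrdReader()
reader.SetFileName("Normal_1_consensus.nrrd")

surface = vtk.vtkFlyingEdges3D()                   # Flying Edges isosurface at the label boundary
surface.SetInputConnection(reader.GetOutputPort())
surface.SetValue(0, 0.5)

smoother = vtk.vtkWindowedSincPolyDataFilter()     # windowed-sinc smoothing with the paper's settings
smoother.SetInputConnection(surface.GetOutputPort())
smoother.SetNumberOfIterations(30)
smoother.SetPassBand(0.03)
smoother.NormalizeCoordinatesOn()

writer = vtk.vtkSTLWriter()
writer.SetInputConnection(smoother.GetOutputPort())
writer.SetFileName("Normal_1.stl")
writer.Write()
```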

Fig. 3

Sample coronary tree surface and centreline.

ASOCA test data set

An additional 10 normal and 10 diseased CTCA cases, separate from the 20 normal and 20 diseased cases used for the training data, were selected based on the same criteria to serve as the test set for the ASOCA challenge. These cases are distributed alongside the annotated dataset to facilitate further participation in the challenge. Ground truth annotations and other associated data for these cases are not publicly available.

Data Records

The dataset is available on the UK Data Service (https://reshare.ukdataservice.ac.uk/855916/)36. Patients are labelled sequentially from 1 to 20, with normal and diseased patients labelled separately (i.e. Normal_1…Normal_20 represent the normal patients and Diseased_1…Diseased_20 the diseased patients). CTCA scans are provided as Nearly Raw Raster Data (NRRD) files labelled sequentially by patient name (Normal_1.nrrd, Normal_2.nrrd, …). This naming convention is used for the rest of the data folders. The annotations folder contains the final annotation for each patient as voxel-wise labels, with background voxels assigned a value of 0 and foreground (vessel lumen) voxels a value of 1. Both the CTCA images and annotations have anisotropic resolution, a common characteristic of most CT machines, with a z-axis resolution of 0.625 mm and an in-plane resolution ranging from 0.3 mm to 0.4 mm depending on the patient. The SurfaceMeshes directory contains smooth surface meshes generated from the voxel annotations, provided in STL format with an average of 37,000 vertices to capture the arterial geometry.
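A minimal sketch of loading an image and its annotation with pynrrd is given below, assuming the dataset has been downloaded locally; the folder names used here are placeholders inferred from the descriptions above.

```python
# Sketch: load a CTCA scan and its annotation; assumes pynrrd is installed
# and the dataset is in the working directory (folder names are assumed).
import numpy as np
import nrrd

image, image_header = nrrd.read("CTCA/Normal_1.nrrd")
label, label_header = nrrd.read("Annotations/Normal_1.nrrd")

print(image.shape, label.shape)          # voxel grids should match
print(image_header["space directions"])  # anisotropic spacing: ~0.3-0.4 mm in-plane, 0.625 mm in z
print(np.unique(label))                  # background = 0, vessel lumen = 1
```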

The centrelines folder contains centrelines of the coronary arteries for each patient, provided in VTK Poly Data (VTP) format, which allows for efficient storage of centreline data. Figure 3 shows a sample of an extracted centreline and the underlying surface mesh. The spreadsheet DiseaseReports.xlsx reports the calcium score and stenosis levels for each patient.
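Similarly, the centrelines and surface meshes can be loaded with standard VTK readers, as in the sketch below; the folder layout is assumed from the descriptions above.

```python
# Sketch: load a centreline (VTP) and its surface mesh (STL) with VTK.
# Folder names are assumed placeholders based on the dataset description.
import vtk

centreline_reader = vtk.vtkXMLPolyDataReader()
centreline_reader.SetFileName("Centrelines/Normal_1.vtp")
centreline_reader.Update()
centreline = centreline_reader.GetOutput()
print(centreline.GetNumberOfPoints(), "centreline points")

mesh_reader = vtk.vtkSTLReader()
mesh_reader.SetFileName("SurfaceMeshes/Normal_1.stl")
mesh_reader.Update()
mesh = mesh_reader.GetOutput()
print(mesh.GetNumberOfPoints(), "mesh vertices")
```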

Technical Validation

The Dice Similarity Coefficient (DSC)37 is frequently used to measure the degree of overlap between annotations. For two sets of voxels A and B, DSC is defined as in Eq. 1. Similarly, the Hausdorff Distance (HD), shown in Eq. 2, measures the largest distance from a point in one annotation to the closest point in the other. In practice, the 95th percentile HD is commonly used rather than the maximum to reduce sensitivity to outliers38.

$${\rm{DSC}}=\frac{2|A\cap B|}{|A|+|B|}$$
(1)
$${\rm{HD}}=\max \left(\mathop{\max }\limits_{x\in A}\,\mathop{\min }\limits_{y\in B}d(x,y),\ \mathop{\max }\limits_{y\in B}\,\mathop{\min }\limits_{x\in A}d(x,y)\right)$$
(2)

We used DSC (Table 1) and 95th percentile HD (Table 2) to measure the variability of each annotator relative to the final ground truth generated for each case. The average Dice Score among the three annotators was 85.6% ± 7.7% (mean ± standard deviation), with an average HD of 5.92 ± 7.3 mm (mean ± standard deviation). The concordance between annotators was higher for normal cases than for diseased cases (87.4% vs 83.9%, p = 0.01 using Welch’s t test39), due to the presence of stenosis and calcified plaques complicating the annotation of diseased images. The Hausdorff Distance showed similar results (4.45 mm in normal cases vs 7.38 mm in diseased, p = 0.028). A Dice Score of 1 (indicating perfect agreement) is difficult to achieve, as this dataset attempts to segment the full coronary artery tree, including small arteries near the limit of CTCA imaging resolution. These Dice Score and Hausdorff Distance values indicate high agreement between the annotators and are unlikely to adversely affect usage of this dataset. Table 3 shows the Hausdorff Distance between the centre of the voxel labels and the smoothed mesh.
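For reference, the sketch below shows one way these metrics can be computed from two binary label maps, assuming NumPy and SciPy; the 95th percentile HD is approximated via surface distances and is not necessarily the exact evaluation code used for the challenge.

```python
# Sketch: Dice and 95th-percentile Hausdorff distance between two binary masks.
# Assumes numpy and scipy; not necessarily the challenge's exact evaluation code.
import numpy as np
from scipy import ndimage

def dice(a, b):
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hd95(a, b, spacing):
    """95th percentile of symmetric surface distances, in the units of `spacing`."""
    a, b = a.astype(bool), b.astype(bool)
    # Surface voxels = mask minus its erosion
    surf_a = a & ~ndimage.binary_erosion(a)
    surf_b = b & ~ndimage.binary_erosion(b)
    # Distance from every voxel to the nearest surface voxel of the other mask
    dist_to_b = ndimage.distance_transform_edt(~surf_b, sampling=spacing)
    dist_to_a = ndimage.distance_transform_edt(~surf_a, sampling=spacing)
    distances = np.concatenate([dist_to_b[surf_a], dist_to_a[surf_b]])
    return np.percentile(distances, 95)

# Example with the dataset's anisotropic spacing (in-plane 0.35 mm assumed, 0.625 mm in z):
# dice_score = dice(annotation, ground_truth)
# hd = hd95(annotation, ground_truth, spacing=(0.35, 0.35, 0.625))
```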

Table 1 Annotator Dice Similarity Coefficient for each patient.
Table 2 Annotator 95th percentile Hausdorff Distance for each patient.
Table 3 95th percentile Hausdorff Distance between smoothed meshes and voxel labelmap.

Usage Notes

These recommendations focus on free, open-source software; however, as the dataset is provided in commonly used formats, commercially available software suites can also be utilised. CTCA and ground-truth data are provided in NRRD format, compatible with all common medical imaging software such as 3D Slicer29 and ITK-SNAP40. 3D Slicer is the recommended software for working with these data, providing tools for common editing operations and various add-ons for specialised tasks. The centrelines are saved in VTK Poly Data (VTP) format, intended to be used with the Visualization Toolkit (VTK)41 and the Vascular Modelling Toolkit32,33; VMTK is also available as a 3D Slicer add-on. Surface meshes are provided in Standard Tessellation Language (STL) format, compatible with most mesh software. Both 3D Slicer and VMTK allow editing and processing of STL meshes, including the addition of flow extensions and the generation of volume meshes for computational fluid dynamics simulations, as sketched below. Specialised mesh editing software such as Meshlab42 can be used for more complex tasks. The dataset can also be used to develop new segmentation algorithms and evaluate their performance on the standardised ASOCA challenge. Submission instructions are available on the challenge website (https://asoca.grand-challenge.org/SubmittingResults/).
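As a sketch of the volume meshing step mentioned above, the following assumes VMTK’s Python interface (vmtkscripts); the attribute names mirror the corresponding command-line options and should be verified against the installed VMTK version, and the target edge length is purely illustrative.

```python
# Sketch: generate a volume mesh for CFD from a provided STL surface,
# assuming VMTK's Python interface (vmtkscripts) is available.
from vmtk import vmtkscripts

# Read a smoothed surface mesh distributed with the dataset
reader = vmtkscripts.vmtkSurfaceReader()
reader.InputFileName = "SurfaceMeshes/Normal_1.stl"
reader.Execute()

# Generate a tetrahedral volume mesh; the target edge length is illustrative only
mesher = vmtkscripts.vmtkMeshGenerator()
mesher.Surface = reader.Surface
mesher.TargetEdgeLength = 0.3      # mm, chosen for illustration
mesher.Execute()

writer = vmtkscripts.vmtkMeshWriter()
writer.Mesh = mesher.Mesh
writer.OutputFileName = "Normal_1_volume.vtu"
writer.Execute()
```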

The dataset can be used for research and commercial purposes. Researchers should request access through the UK Data Service36 and provide evidence of ethics review and approval, or a waiver, for their project.