Background & Summary

Pulmonary embolism (PE) is a sudden blockage of a lung artery by a deep vein thrombosis (DVT) clot, typically originating in the pelvic veins and carried by the blood flow through the heart into the lungs. Because PE may reduce respiratory capability by closing the pulmonary artery (PA), early diagnosis and treatment of DVT can decrease the risk of PE.

However, once arterial obstruction exceeds 50% of the cross-sectional area, massive PE may occur with acute and severe cardiopulmonary failure because of right ventricular overload. It was reported that 70% of patients died within the first hour after onset of the above symptoms. Therefore, early and precise diagnosis of PE is important, due to the high morbidity and mortality risk1,2.

Contrast-enhanced computed tomography (called CT angiography or CTA) images have been widely used for PE diagnosis3–5 because of their suitable lesion discrimination in blood vessels6. Specifically, PE regions appear as dark spots among the bright regions of the arteries in CTA images7. The radiologist should record the CTA image within a suitable time interval after injection of the contrast material and before it travels from the arteries to the veins. In this case, although the vein and PE regions may have similar gray-levels in the CTA image, the latter can be distinguished from the former by their higher contrast. Nevertheless, lymphatic tissue, parenchymal disease, and the partial volume effect may also produce similar dark regions (especially on artery boundaries) in CTA images8. This is why the manual delineation of PE regions is a time-consuming task that depends on expert insight9.

In recent years, with advances in computing technologies, computer-aided detection (CAD) systems have gained increasing impact in clinical and research applications5. However, due to the above challenges, automated or semi-automated detection of PE is still a challenging endeavor for radiologists, physicians, and biomedical engineers. These groups are unable to precisely evaluate and compare their results with each other, due to the lack of a proper dataset of PE-CTA images with suitable ground-truth, evaluation scores, and prognosis information. To tackle this problem, some researchers have generated private datasets, which are not widely shared4,8,10. Recently, the Madrid-MIT M+Visión Consortium11 supplied a public dataset of 20 PE-CTA images with ground-truth. However, they reported neither the clinical information of the subjects nor evaluation scores for the PE-CTA images.

In this paper, we present a new dataset of three-dimensional (3D) PE-CTA images, called FUMPE (standing for Ferdowsi University of Mashhad's PE dataset), for computer-aided detection in research and education. It includes 35 PE-CTA images with a total of 8792 slices. Furthermore, an expert radiologist manually and precisely delineated the PE regions in every slice of each CTA image as the ground-truth. We took advantage of a semi-automated software tool to enhance the segmentation results. The final PE regions were re-examined by another expert radiologist. In addition, for further evaluation, the first radiologist provided five CTA measurements for every benchmark image.

Methods

We first obtained ethical approval from the ethics committee of Mashhad University of Medical Sciences. Although all images have been published anonymously in the proposed dataset to avoid the risk of a privacy breach, we also obtained signed informed consent from every patient. As shown in Fig. 1, the development process of the proposed dataset consists of contrast material injection, image acquisition, image selection, image segmentation, and Qanadli scoring, as described in the following subsections.

Figure 1: Five steps of the development process of the proposed dataset.

Contrast material injection

In a normal PE-CTA, the pulmonary arteries should be full of the contrast material while the aorta should be empty of it. Therefore, a total of 70-100 mL of non-ionic contrast material (containing 300-370 milligrams of iodine per milliliter) was injected into the right antecubital vein through 18- or 16-gauge catheters (at a flow rate of 4-5 mL per second), 10-12 seconds before imaging.

Prognosis symptoms

To collect the FUMPE dataset, 400 PE-CTA images were initially recorded from different patients. The most common patient complaints were dyspnea, tachypnea, and pleuritic chest pain with haemoptysis. Moreover, some patients had non-specific signs and symptoms, such as tachycardia, palpitations, wheezing, and cough. Patients with massive PE, however, had hypotension, extreme hypoxemia, cyanosis, syncope, or even cardiac arrest. Furthermore, the DVT test was performed for non-urgent patients.

Imaging

CT scanning was performed in Emam-Reza and Ghaem Medical Centers (http://quaem.mums.ac.ir/index.php/en) by using the NeuViz 16 multi-slice helical CT scanners of Philips and Neusoft Medical System Co., Ltd with 120 kVp, 0.75 mm × 16 collimation, a gantry rotation time of 0.75 s, and a beam-pitch of 1.2. Also, in order to automatically adjust the tube current, the scanners took advantage of both dose modulation and angular/longitudinal tube-current modulation (with automatic current selection) for all subjects except Patient21. The range of the tube current variations for each subject is reported in Table 1. All PE-CTA images of FUMPE were acquired in one breath hold with:

  • slice-thickness ≤ 1 mm (except for Patient24 and Patient32 with slice-thickness = 2 mm)

  • slice-interval ≤ 1.5 mm (except for Patient03, Patient10, and Patient28 with slice-interval = 4 mm)

  • in the caudocranial direction (except for Patient12 and Patient13 with the craniocaudal direction)

Table 1 Different characteristics of CTA images reported in FUMPE.

Image selection

It has frequently been demonstrated that CAD systems extract PE regions better in the main arteries than in the peripheral vessels, due to higher contrast and better discrimination8. Thus, a suitable benchmark dataset for the evaluation of CAD systems should include a large number of PE regions in the peripheral arteries. Therefore, from among all the recorded PE-CTA images, we chose by visual inspection the 35 images with the largest number of PE regions in the peripheral arteries to make up the proposed dataset.

Image segmentation

To establish the ground-truth, a board-certified radiologist (with over 5 years of experience in PE-CTA analysis) first delineated all PE regions of interest (PE-ROIs) in each PE-CTA image. To ensure delineation accuracy, he also took advantage of a semi-automated software tool called MIS (standing for medical image segmenter), which supports coronal and sagittal reconstructions in addition to the original axial view. Finally, the delineated PE-ROIs were re-examined and approved by the head of the radiology department (with 18 years of experience) of Emam-Reza Medical Center (http://emamreza.mums.ac.ir/index.php/en).

Code availability

We developed the MIS software tool in the MATLAB R2017 environment. It consists of a GUI window in which the user can view a 3D DICOM image in the axial, coronal, and sagittal views. The user can also choose the region of interest in each slice by multiple mouse selections. The software uses a semi-automated segmentation algorithm that consists of thresholding and connected-component analysis steps12. Given a seed point chosen by the user, it determines the local region connected to that seed through a gray-level similarity criterion. The source code (in the MATLAB environment), the compiled executable file, and a pictorial user manual of MIS are publicly available at https://doi.org/10.6084/m9.figshare.6289085 (in the Figshare Repository).
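The seed-based idea described above can be illustrated with a minimal sketch. This is not the MIS code (which is written in MATLAB); it is a plain-Python illustration under stated assumptions: a single 2D slice stored as nested lists, 4-connectivity, and a hypothetical `tolerance` parameter standing in for the gray-level similarity criterion. The gray levels in the toy slice are invented for the example.

```python
from collections import deque


def grow_region(image, seed, tolerance):
    """Return the set of (row, col) pixels connected to `seed` whose gray
    level differs from the seed's gray level by at most `tolerance`.

    Thresholding (the similarity test) and connected-component analysis
    (the breadth-first traversal) are combined in one pass.
    """
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Assumption: 4-connected neighbourhood (up, down, left, right).
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Toy slice with invented gray levels: bright contrast-filled vessel (300)
# containing two separate dark spots.
toy_slice = [
    [300, 300, 300, 300, 300],
    [300,  60,  70, 300, 300],
    [300,  65, 300, 300,  55],
    [300, 300, 300, 300,  60],
]

clot = grow_region(toy_slice, seed=(1, 1), tolerance=20)
print(sorted(clot))
```

Only the dark spot that contains the seed is returned; the second dark spot has similar gray levels but is not connected to the seed, which is the behaviour a user-guided tool needs when nearby dark structures are not part of the clot.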

CTA measurements

We provided five measurements for each PE-CTA image, as follows:

  • RV/LV Ratio: Right ventricular (RV) failure is one of the most important causes of early death after PE13. CTA enables the radiologist to assess RV dysfunction by calculating the ratio of the RV diameter to the left ventricular (LV) diameter (called the RV/LV ratio) in the reconstructed four-chamber views.

  • Reflux into IVC: Reflux of the contrast material into the inferior vena cava (IVC), which can be observed in CTA images, is associated with right heart failure due to PE14.

  • Straight Septum & PA Diameter: Severe PE increases the right heart pressure. In this case, the interventricular septum may be abnormally shifted toward the left ventricle15; and also, the diameter of the main pulmonary artery (lateral to the ascending aorta and at the level of its bifurcation) may be increased16.

  • Q-score: After image segmentation, we assessed the arterial clots of each subject according to the Qanadli scoring system (Q-score). Generally, the Q-score is computed as the superposition $Q = \sum_{k=1}^{n} d_k$, where $n$ indicates the total number of proximal clot sites and $d_k$ is the obstruction index of the k-th one. In more detail, in the left lung, the upper, lingular, and lower lobar arteries branch into three (apical, posterior, and anterior), two (superior and inferior), and five (superior, medial, lateral, posterior, and anterior) segments, respectively. Similarly, the lobar arteries of the right lung are separated into 10 segments in the same manner. Thus, as illustrated in Fig. 2, there are n=20 segments in total across both lungs. For the k-th segment (k=1,2,…,n), $d_k$ is set to 0, 1, or 2 for the clot-free, partial obstruction, and total occlusion situations, respectively17. When an embolus lies at the most proximal arterial level, its index is computed as the superposition of the obstruction indices of all segmental arteries arising distally. For example, Fig. 2 illustrates the obstruction indices of all arterial segments in the left and right lungs of Patient16. The Q-score can be used for prognosis evaluation, assessment of treatment response, and determination of the anticoagulant treatment period18. Also, patients with Q-scores larger than 18 have high mortality and morbidity rates19.

    Figure 2: Sample Q-Score computation.

    The arterial segments of the left and right lungs of Patient16 (used for Q-score computation) are illustrated as an example. The obstruction index of each segment is indicated in the figure. The total Q-score (i.e. 19) was computed as the superposition of all the obstruction indices.
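As a rough illustration of the Q-score description above, the following sketch sums per-segment obstruction indices for 20 segmental arteries. The function name, the list layout (10 left-lung values followed by 10 right-lung values), and the example indices are all hypothetical; they are not taken from Table 2 or from any FUMPE patient.

```python
N_SEGMENTS = 20           # 10 segmental arteries per lung, as described above
HIGH_RISK_THRESHOLD = 18  # Q-scores larger than 18 are reported as high risk


def q_score(obstruction_indices):
    """Sum the obstruction indices d_k over all segmental arteries.

    Each d_k must be 0 (clot-free), 1 (partial obstruction),
    or 2 (total occlusion).
    """
    if len(obstruction_indices) != N_SEGMENTS:
        raise ValueError(f"expected {N_SEGMENTS} segment indices")
    if any(d not in (0, 1, 2) for d in obstruction_indices):
        raise ValueError("each obstruction index must be 0, 1, or 2")
    return sum(obstruction_indices)


# Hypothetical patient (invented values, for illustration only).
left_lung = [2, 2, 1, 0, 0, 1, 1, 0, 0, 0]
right_lung = [0, 1, 1, 0, 0, 1, 1, 1, 1, 1]

score = q_score(left_lung + right_lung)
is_high_risk = score > HIGH_RISK_THRESHOLD
print(score, is_high_risk)
```

A proximal embolus is handled in this sketch simply by assigning its obstruction indices to the segments arising distally before calling `q_score`, which mirrors the superposition rule stated in the text.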

Data Records

All data records described in this paper are available on the Figshare Repository, organized as 35 patient archives (Patient01 to Patient35, Data Citation 1) and one ground-truth archive (Ground Truth, Data Citation 1). Each patient archive includes all slices of the corresponding 3D CTA image (stored in the DICOM file format), while the ground-truth archive contains all the 3D ground-truth images of FUMPE in the MAT file format. MAT files can be loaded into the MATLAB programming environment by using the function load. In every ground-truth image, the foreground and background voxels are indicated by the gray-levels 1 and 0, respectively.
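Because the ground-truth volumes are binary (1 for PE voxels, 0 for background), a candidate segmentation can be compared against them voxel by voxel. The sketch below computes the Dice overlap, which is one common choice and is not prescribed by this paper. It assumes that both volumes have already been loaded and converted to nested lists indexed as [slice][row][column]; reading the DICOM and MAT files themselves is outside the scope of the example, and the tiny volumes are invented.

```python
def dice_coefficient(prediction, ground_truth):
    """Dice overlap between two binary volumes given as nested lists
    indexed [slice][row][col], with 1 = PE voxel and 0 = background."""
    pred_flat = [v for sl in prediction for row in sl for v in row]
    true_flat = [v for sl in ground_truth for row in sl for v in row]
    if len(pred_flat) != len(true_flat):
        raise ValueError("volumes must contain the same number of voxels")
    overlap = sum(1 for p, t in zip(pred_flat, true_flat) if p == 1 and t == 1)
    total = pred_flat.count(1) + true_flat.count(1)
    # Convention chosen here: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Invented 2 x 2 x 2 volumes, for illustration only.
truth = [[[0, 1], [1, 1]], [[0, 0], [0, 1]]]
guess = [[[0, 1], [0, 1]], [[0, 0], [1, 1]]]

dice = dice_coefficient(guess, truth)
print(dice)
```

The empty-mask convention matters for subjects whose ground truth contains no PE voxels; a different convention (for example, reporting such cases separately) may suit some evaluations better.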

Technical Validation

Each image was visually checked by an experienced CT technologist to confirm that it was artifact-free and had sufficient contrast for image analysis. If the image quality was not acceptable, the image acquisition process was repeated.

Summary of the dataset

Table 1 reports the characteristics of all CTA images of the proposed dataset, including the subject gender and age, DVT test, slice thickness and interval, range of the tube current, imaging direction, number of slices, and the number of PE regions in the main and in the peripheral arteries. FUMPE includes the PE-CTA images of 17 male and 18 female patients (aged 24-82 years). In addition, among all FUMPE images (8792 slices in total), only Patient24 and Patient32 have no PE clots (false positives) in either the main or the peripheral arteries.

For example, Fig. 3 illustrates the source and ground-truth images of the 77th, 80th, 106th, 116th, 119th, 133rd, 139th, and 151st slices of Patient16. Note that all PE regions of the ground-truth are shown here in semi-transparent green over the source image for better visual inspection. Also, the size of the PE regions varies widely, from a few to hundreds of voxels.

Figure 3: A sample CTA image of FUMPE.

Including the source (left-hand side in each pair) and ground-truth (right-hand side in the same pair) images corresponding to the (a) 77th, (b) 80th, (c) 106th, (d) 116th, (e) 119th, (f) 133rd, (g) 139th, and (h) 151st slices of Patient16.

For every ground-truth image of the dataset, we counted the number of PE regions in the main and peripheral arteries. As reported in Table 1, the proposed dataset includes 3438 PE-ROIs in total, most of which (i.e. 67%) are located in the peripheral arteries. Therefore, FUMPE is a challenging benchmark for the evaluation of CAD systems.

Finally, Table 2 reports the five specified CTA measurements (including RV/LV ratio, reflux into IVC, straight septum, PA diameter, and Q-score) for all FUMPE images. As further illustrated in Fig. 4, the Q-scores ranged from 0 to 31. Also, the most frequent Q-scores were 20, 7, and 3, occurring in 5, 4, and 3 subjects, respectively. Moreover, 11 patients, with Q-scores larger than 18, had a high mortality/morbidity risk.

Table 2 Five different measurements reported for each CTA image of FUMPE.

Figure 4: The Q-score histogram of FUMPE.

Comparing with other PE datasets

As shown in Table 3, FUMPE is further compared with 14 PE-CTA datasets reported in3–5,8,10,11,20–27. All the counterpart datasets, except M+Visión11, are private (i.e. not publicly accessible). Clearly, FUMPE includes a large number of PE-ROIs compared to the other datasets. Furthermore, it is the first public PE dataset with the Q-score evaluation, which can be used for the development of automatic scoring algorithms and for medical education. It is also the only dataset that provides complementary prognosis information such as the DVT test, RV/LV ratio, reflux into IVC, straight septum, and PA diameter.

Table 3 Comparing FUMPE with 14 different PE-CTA datasets.

Additional information

How to cite this article: Masoudi, M. et al. A new dataset of computed-tomography angiography images for computer-aided detection of pulmonary embolism. Sci. Data 5:180180 doi: 10.1038/sdata.2018.180 (2018).
