Automatic lung nodule detection with deep learning for segmentation and imbalanced data labeling

In this study, a novel method based on the U-Net network architecture, 2D U-Net, is employed to segment the position of lung nodules, which are an early sign of lung cancer and have a high probability of becoming a carcinoma, especially when a lung nodule is bigger than 15 mm². A serious problem in applying deep learning to medical images is the imbalanced labeling between foreground and background. The lung nodule is the foreground, which accounts for a low percentage of the whole image. The evaluation function adopted in this study is the dice coefficient loss, which is commonly used in image segmentation tasks. The pre-processing method proposed in this study uses complementary labeling as the input to U-Net. With this method, the labeling is swapped: positions without nodules are labeled, and the position of the nodule becomes non-labeled. The results show that the proposed method is efficient for small quantities of data, so complementary labeling can be used in small-data scenarios. With the use of an ROI segmentation model in the data pre-processing, the results of lung nodule detection can be improved considerably, as shown in the experiments.

The evaluation chosen in this study is the dice coefficient loss, which is a typical loss function in the image segmentation field. In testing, the results show that complementary labeling achieves better results than the original labeling when the data quantity is small. Since it is hard to label medical images, the quantity of labeled medical images is usually insufficient. It is well known that resolving the problem of small data quantity is hard. Semi-supervised learning 11 for auto-labeling is one efficient way of resolving the lack-of-labeling problem; using complementary labeling is another. In this study, a network is used to perform ROI segmentation on CT images to get the position of the lungs. It can be viewed as an attention model for the process. With the use of the ROI segmentation model in the data pre-processing, the results of lung nodule detection can be improved considerably, as shown in the experiments.

Methods
Data description. We confirm that all methods were carried out in accordance with relevant guidelines and regulations. This study was approved by the Taipei Medical University Hospital Joint Institutional Review Board and is not a retrospective study. All of the patients signed informed consent regarding identifying information and image publishing. Furthermore, all records from patients with pathologically confirmed diagnoses were retrospectively reviewed against the final pathological confirmation or clinical diagnosis from 2016 to 2019. CT reports were searched for target patients by initially using the keywords "CT" and "nodule". After obtaining the first round of filtered cases, cases with the keywords "nodule", "opacity", "GGO (ground-glass opacity)", "adenocarcinoma", "granuloma", "metastasis", and "cancer" in the type section and "pleural", "hilum", "pulmonary", "lung", "RUL", "RLL", "RML", "LLL", "LUL" in the position section were kept by manual CT report screening. Note that CT reports with "shadow", "emphysema", "pneumonia", "pneumonitis", "cysts", "fibrotic foci", "inflammation", and "consolidative patchy" were not included. The inclusion criteria for the study were as follows: (1) patients were scanned with routine CT using a slice thickness of 5 mm; (2) diagnosis without distant metastasis was confirmed by surgery and pathology; (3) only the last CT scan before surgery or biopsy was chosen; (4) the diameter of each nodule was smaller than 30 mm. Under the above criteria, a total of 457 cases (220 women, 236 men; average age, 65 ± 13) with 472 lung nodules were enrolled in the study. All chest CT images were taken under free-breathing conditions with the patients in the supine position on the scanning bed. CT scanners produced by three manufacturers (GE Medical Systems, Philips Medical Systems, Siemens) were used for the acquisition of the CT images at 110-120 kV and 10-20 mA.
The image slice matrix was 512 × 512, with a slice thickness of 5 mm and a pixel spacing of 0.168 × 0.168 mm².
There are two types of nodule annotation, segmentation and semantic features, in this dataset. All were discussed with consensus by three radiologists with 10 to 20 years of radiological experience. These CT images are analyzed mainly in the lung window (HU values ranging from −1400 to 400). There are two steps in nodule segmentation. The first is to use a semi-automatic segmentation method to get the target nodules in commercial software (IntelliSpace Discovery, Philips Healthcare, Netherlands) and then store them in DICOM format. In the second step, the segmented contours were checked again by another doctor; if necessary, the nodule contour was modified by freehand drawing using self-developed software running on Matlab (r2018b, The MathWorks, USA). The data format is DICOM with de-identification. DICOM (Digital Imaging and Communications in Medicine) is a general communication protocol. It can integrate medical applications from multiple manufacturers such as scanners, servers, workstations, and printers. DICOM is widely used in hospitals as well as local clinics and dental clinics.
Pre-processing. In our implementation, the data format is converted from DICOM to NIfTI. NIfTI (Neuroimaging Informatics Technology Initiative) is an open file format. It was developed for neuroimaging at first but is now widely used for brain images and other medical images. A key feature of this format is an affine coordinate transform that links each voxel index (i, j, k) to its spatial position (x, y, z).
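As an illustration of the voxel-to-world mapping just described, a 4 × 4 affine matrix sends a voxel index to a position in millimetres. The matrix values below are hypothetical (built from the 0.168 mm in-plane spacing and 5 mm slice thickness reported for this dataset, with arbitrary offsets); a real affine comes from the NIfTI header, e.g. via nibabel.

```python
# Sketch of the NIfTI voxel-to-world mapping: a 4x4 affine maps a voxel
# index (i, j, k) to a spatial position (x, y, z). The matrix here is a
# hypothetical example; real affines are read from the NIfTI header.

def voxel_to_world(affine, i, j, k):
    """Apply a 4x4 affine to a homogeneous voxel index (i, j, k, 1)."""
    v = (i, j, k, 1.0)
    return tuple(sum(affine[r][c] * v[c] for c in range(4)) for r in range(3))

affine = [
    [0.168, 0.0,   0.0,  -43.0],   # x = 0.168 * i + x-offset
    [0.0,   0.168, 0.0,  -43.0],   # y = 0.168 * j + y-offset
    [0.0,   0.0,   5.0, -250.0],   # z = 5.0   * k + z-offset (slice axis)
    [0.0,   0.0,   0.0,    1.0],
]

origin = voxel_to_world(affine, 0, 0, 0)   # -> (-43.0, -43.0, -250.0)
```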
Each scan in the dataset is a 3D medical image with a resolution of 512 × 512 spanning about 100 slices. The NIfTI format is used because NIfTI is designed for 3D image processing; the DICOM format would consume more memory because of its data layout. CLAHE 12 (Contrast Limited Adaptive Histogram Equalization) is used to balance the contrast in the CT images. The Hounsfield unit (HU) ranges of the CT images are divided into three different windows in this study, original, soft-tissue, and lung window, as shown in Fig. 2. The main difference between them is the contrast and image detail. The Hounsfield center value and width are −600 and 1500 for the lung window and 50 and 400 for the soft-tissue window, respectively. The lung window is selected for our input images because this is the most common Hounsfield range used in the clinical diagnosis of lung images. The Hounsfield values of different parts of the body are shown in Table 1. After windowing, images are resized from 512 × 512 to 256 × 256 to decrease memory consumption. Normalization is the next step, which decreases the computing cost. Then CLAHE is used to adjust the contrast to make the nodules more obvious.
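The windowing and normalization steps above can be sketched as follows. The lung-window parameters (center −600 HU, width 1500 HU) are those stated above; the sample pixel values are illustrative, and the resize to 256 × 256 would be done with an image library and is omitted here.

```python
# Sketch of the HU windowing and normalization step. The lung window
# (center -600, width 1500) clips intensities to [-1350, 150] and then
# rescales them linearly to [0, 1].

def apply_window(hu_values, center=-600.0, width=1500.0):
    """Clip a flat list of HU values to the window and normalize to [0, 1]."""
    lo, hi = center - width / 2, center + width / 2
    clipped = [min(max(v, lo), hi) for v in hu_values]
    return [(v - lo) / (hi - lo) for v in clipped]

pixels = [-2000.0, -600.0, 150.0, 400.0]   # illustrative HU samples
windowed = apply_window(pixels)            # -> [0.0, 0.5, 1.0, 1.0]
```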

CLAHE. CLAHE (Contrast Limited Adaptive Histogram Equalization) is widely used in image processing.
It differs from AHE 13 (Adaptive Histogram Equalization). Traditional adaptive histogram equalization tends to over-amplify noise in relatively homogeneous regions of an image because the histogram in these regions is highly concentrated; as a result, AHE may amplify noise in near-constant regions. CLAHE limits the concentration of the histogram. Its advantage is that it does not discard the concentrated part of the histogram; instead, it keeps those counts and redistributes the excess equally among all histogram bins. As mentioned in study 14, using the Wiener filter can efficiently decrease the noise in images, and PET/CT images are prone to constant-power additive noise. Figure 3 shows the performance of pre-processing with the Wiener filter and CLAHE on the TMUH dataset.
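The clip-and-redistribute mechanism described above can be shown with a toy histogram. This is only the core idea; a real implementation such as OpenCV's cv2.createCLAHE additionally works tile-wise and interpolates between tiles.

```python
# Toy sketch of the CLAHE clipping step: counts above the clip limit are
# not discarded but redistributed equally among all bins, which limits
# contrast amplification in near-constant (highly concentrated) regions.

def clip_and_redistribute(hist, clip_limit):
    """Clip histogram bins at clip_limit and spread the excess evenly."""
    excess = sum(max(h - clip_limit, 0) for h in hist)
    clipped = [min(h, clip_limit) for h in hist]
    share, rem = divmod(excess, len(hist))
    # leftover counts (rem) go to the first bins so the total is preserved
    return [h + share + (1 if i < rem else 0) for i, h in enumerate(clipped)]

hist = [50, 3, 2, 1]                        # one dominant bin
flattened = clip_and_redistribute(hist, 10)  # -> [20, 13, 12, 11]
```

Note that the total pixel count is unchanged, which is what distinguishes CLAHE's redistribution from simply truncating the histogram.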
U-net. U-net is an encoder-decoder deep learning model well known for its use on medical images. It was first used in biomedical image segmentation. U-net contains three main blocks: downsampling, upsampling, and concatenation. As shown in Fig. 4, U-net is well known for its architecture, which some literature refers to as the encoder-decoder architecture. Since the shape of the network looks like the uppercase letter U, it is named U-net. As shown in the figure, each box corresponds to a multi-channel feature map; the number of channels is shown on top of each box, and the bottom right corner explains the meaning of the arrows. He initialization is adopted for the initial weights without pre-training in this model, and the difference between the network used in our study and the original U-net is that LeakyReLU with alpha = 0.3 is used as the activation function in each convolution layer. The important difference between U-net and other segmentation networks is that U-net uses a different feature fusion method: concatenation. It concatenates the feature channels together to form a feature group, which decreases the loss of features across the convolution layers, and the training result is better than without it.

Without pre-processing on the images, the results are not good. In the studies 15,16,17, thresholds were used to get the position of the object to be recognized. In the paper 18, inverting the labeling is considered: inverting the labeling lets the labeled area become a higher percentage of the whole CT image, as shown in Fig. 6. In the paper 19, a method like complementary labeling is proposed. The difference between that paper and ours is the use of the labeling: their proposal is a loss function that corrects the loss calculation, whereas our proposal is data pre-processing, which means the calculation of the loss is different. Another method is the hybrid labeling input.
The input labeling becomes two-dimensional: one dimension is the original labeling and the other is the complementary labeling, so the output yields two different masks with positive and negative labeling. The dice coefficient is calculated for each one separately. As the input labeling becomes two-dimensional, the weight per pixel decreases. Ideally, by using the same loss function, this makes the backpropagation calculation more objective.
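The complementary and hybrid labeling described above can be sketched as follows (pure Python, binary masks as nested lists; the function names are ours, not from the paper):

```python
# Sketch of complementary and hybrid labeling: the foreground (nodule)
# mask is inverted so the labeled area covers the larger background
# region; hybrid input stacks both labelings as two channels.

def complement(mask):
    """Swap foreground and background in a binary mask."""
    return [[1 - p for p in row] for row in mask]

def hybrid_labels(mask):
    """Two-channel target: (positive labeling, complementary labeling)."""
    return mask, complement(mask)

mask = [[0, 0, 0],
        [0, 1, 0],    # single nodule pixel (foreground)
        [0, 0, 0]]

pos, neg = hybrid_labels(mask)
# neg labels every pixel except the nodule pixel
```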

Results
Pre-processing. The dice coefficient loss is selected as the loss function. The dice coefficient, as given in (1), is often used in medical image segmentation 18,20.
It is usually used to calculate the similarity of two samples. Data augmentation is not employed in this experiment because the amount of data is sufficient. 2D images are considered in this study. One reason is that this efficiently decreases the computing cost; the other is GPU limitations, since on most GPUs it is hard to train 3D models due to memory constraints. We are aware that there are many other 2D or 3D models; in this work, we intend to report our study on some possible ideas to resolve problems in segmentation, especially imbalanced and small data. Regarding model training, the training progress converged within 200 epochs out of the 3000 epochs set in this study. The results obtained from the models are shown in the confusion matrix, where the ground truth is compared with the model's predicted values, from which the accuracy, sensitivity, and specificity are calculated as follows.
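As a concrete sketch of the evaluation quantities above, the dice coefficient/loss and the confusion-matrix metrics can be written as below for flat binary (0/1) masks. The smoothing term is a common implementation detail to avoid division by zero on empty masks, not a value taken from the paper.

```python
# Sketch of the dice coefficient loss and the confusion-matrix metrics
# (accuracy, sensitivity, specificity) used to evaluate segmentation.

def dice_coefficient(pred, target, smooth=1.0):
    """Dice = 2|A ∩ B| / (|A| + |B|) on flat binary masks, with smoothing."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + smooth) / (sum(pred) + sum(target) + smooth)

def dice_loss(pred, target):
    return 1.0 - dice_coefficient(pred, target)

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def sensitivity(tp, fn):    # true positive rate (recall)
    return tp / (tp + fn)

def specificity(tn, fp):    # true negative rate
    return tn / (tn + fp)

pred, target = [1, 1, 0, 0], [1, 0, 0, 0]
# dice = (2*1 + 1) / (2 + 1 + 1) = 0.75, so dice_loss = 0.25
```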
As shown in Fig. 7, the testing loss without pre-processing stops decreasing at around 0.4 (Fig. 7a), while with ROI segmentation, CLAHE, and the Wiener filter it can reach as low as 0.1 (Fig. 7b). This shows that these pre-processing methods are effective in decreasing the testing loss. To show the effectiveness of ROI segmentation, some examples of the obtained detections are shown in Fig. 8. Figure 8a shows cases with only CLAHE and Wiener pre-processing; the left side is the ground truth and the right side is the model's prediction. The wrong predictions show that the model often predicts in non-pulmonary areas. Figure 8b shows the results after adding ROI segmentation. It is clearly evident that this resolves the problem of the model labeling the wrong areas. Because of that, the trained model reaches a lower loss, which keeps decreasing as training goes on, and the training process becomes smoother.

Complementary labeling. As mentioned, complementary labeling is considered to train the U-net. With two labeling dimensions using the same loss function, the model tends to decrease the loss in both labeling dimensions; in other words, the weight per pixel decreases. In the output, the final prediction is obtained by combining the two predicted dimensions. Table 2 shows the comparison among different methods on our testing data, without pre-processing and with pre-processing (CLAHE, Wiener filter, and ROI segmentation). From the table, it can be found that the sensitivity is improved from 0.90 to 0.96 and the dice coefficient from 0.573 to 0.790 due to the pre-processing. The results are also compared across labeling schemes, mono and hybrid.
Mono means only one label input, positive or negative (complementary labeling), is used, and hybrid means two label inputs, positive and negative, are used. Whether positive or negative ground truth is used as the model input, the hybrid result is not significantly better than the mono input. It seems that complementary labeling is not effective under the large-data condition. Two data quantities, 472 cases and 50 cases, are considered in this study. As shown in Table 3, using complementary labeling with pre-processing gets a better result than using mono input, but without pre-processing, complementary labeling is not efficient. It is worthwhile to use the complementary labeling method with small data quantities; that is why complementary labeling is considered in this study. For the sensitivity with small data, the hybrid negative (0.9716) is higher than the mono positive input (0.9560) under the pre-processing condition. Thus, if the data quantity is not large enough, complementary labeling is a good method for getting better results.

ROI segmentation model. Note that the ROI segmentation model used in this study is different from other approaches in the literature 21,22. In those approaches, 3D ROI segmentation methods are employed to get the position of the lungs; in this study, only a 2D segmentation method is considered. Although the result of our approach is not very good, there is future work to be done. The reason the results are not good enough is that the segmentation is imperfect in slices containing only a small part of the lungs. Nodules seldom appear at the top or bottom of the lungs, and the segmentation in those positions is imperfect, so the U-net prediction model cannot get good results in these slices. Transfer learning from 3D ROI segmentation methods is necessary future work.
Nevertheless, using this method on the TMUH dataset can efficiently increase the dice coefficient score. The model can also be widely applied to other CT lung images. Compared with the traditional method, the ROI segmentation model can achieve better results even with 2D segmentation.

Discussion
We have shown in this study that complementary labeling is efficient with a small data quantity. However, pre-processing methods such as CLAHE and ROI segmentation are also efficient, as shown in our experiments. The results obtained by combining pre-processing with complementary labeling are good, as shown in Table 3. In addition, although the testing dataset contains only 4 cases, Table 4 shows that the mean and standard deviation across the different datasets do not differ markedly, because the testing data were split from the same source as the training and validation data. To conclude, the model trained is U-net and the loss function used is the dice coefficient loss. As the results in Tables 2 and 3 show, complementary labeling does not perform well with a large quantity of data but shows better results with a small quantity of data. The main reason is that the background (no nodule) is found more easily than the foreground (nodule) in a small data set, while the foreground is found more easily than the background in a large data set. There were only 50 cases of patient CT images at first in this study; for this reason, we chose complementary labeling, somewhat like a data augmentation method, to train the model. Using ROI segmentation is efficient, but it is not efficient in some image slices that contain no lung, where the model tends to predict something anyway. The main idea of this study is to use a 2D ROI segmentation model to get the position of the lungs. In future work, transfer learning is necessary to make this model more complete. Although complementary labeling is not efficient with a large quantity of data, its computing power and time consumption are acceptable. Using an NVIDIA RTX 2080 Ti GPU with 11 GB RAM, training took almost 3 days for 3000 epochs. Future research should apply data augmentation if GPU memory is sufficient.
As shown in the paper 23, using a large batch can achieve better results than using a small batch. However, 3D images consume more GPU memory, so finding a trade-off between the input image size and the batch size is an important issue to consider.