A review and comparison of breast tumor cell nuclei segmentation performances using deep convolutional neural networks

Breast cancer is currently the second most common cause of cancer-related death in women. Presently, the clinical benchmark in cancer diagnosis is tissue biopsy examination. However, the manual process of histopathological analysis is laborious, time-consuming, and limited by the quality of the specimen and the experience of the pathologist. This study's objective was to determine whether deep convolutional neural networks can be trained, with transfer learning, on a set of histopathological images independent of breast tissue to segment tumor nuclei of the breast. Several deep convolutional neural networks were evaluated, including U-Net, Mask R-CNN, and a novel network (GB U-Net). The networks were trained on a set of Hematoxylin and Eosin (H&E)-stained images of eight diverse types of tissue. GB U-Net demonstrated superior performance in segmenting sites of invasive disease (AJI = 0.53, mAP = 0.39 and AJI = 0.54, mAP = 0.38, respectively), validated on two hold-out datasets exclusively containing breast tissue images with a combined total of approximately 7,582 annotated cells. The results of the networks, trained on images independent of breast tissue, demonstrated that tumor nuclei of the breast could be accurately segmented.


Methods
Training datasets. The training dataset implemented in this study featured open-source images from the Multi-Organ Nucleus Segmentation (MoNuSeg) challenge 16,37 . The MoNuSeg dataset, obtained from the National Cancer Institute's The Cancer Genome Atlas 38 (TCGA), included high-resolution whole slide images (WSI) of H&E-stained slides from nine tissue types, digitized at eighteen different hospitals at 40 × magnification 16 . The types of tissue included breast, liver, kidney, prostate, bladder, colon, stomach, lung, and brain. Sub-regions (1,000 × 1,000 pixels), densely populated with nuclei, were extracted from the WSI. This study used the ground truth annotations released during the challenge. These annotations included all epithelial and stromal cells, which were annotated by students and then approved by an expert pathologist, with less than a 1% error in detection 16 . There were approximately 28,000 nuclei annotated across the entire dataset. The training set for this study consisted of colour normalized 39 H&E images from all tissue types, excluding breast.
Testing datasets. The hold-out (test) set exclusively contained H&E-stained images of breast tissue. There were fifty-eight images in total from two separate datasets: eight from the MoNuSeg dataset and fifty from the Triple Negative Breast Cancer (TNBC) dataset 19 . The ground truth annotations associated with these images were curated and released with the respective datasets. The TNBC dataset contained fifty H&E-stained images taken at 40 × magnification from eleven TNBC patients 19 . Three to eight sub-regions (512 × 512 pixels), with varying cellularity, were extracted per patient. The annotations were performed by three expert pathologists. Annotated cells included normal epithelial cells, myoepithelial breast cells, invasive carcinoma cells, fibroblasts, endothelial cells, adipocytes, macrophages, and inflammatory cells. As the cell class was unavailable for both datasets, performance measures were based on segmenting all annotated cells within the test images.
All test images were colour normalized 39 , then tiled to a size of 256 × 256 pixels with a 50% overlap. Post-processing the probability map, from any given U-Net like DCNN, involved calculating a threshold that maximized precision at a given intersection over union (IoU) threshold. Furthermore, we implemented morphological operations that filled missing pixels and removed small predicted artifacts (a sketch of this post-processing follows this subsection).
U-Net like DCNNs. Four U-Net like DCNNs were evaluated, with VGG-19 47 , ResNet-101 48 , DenseNet-121 49 , and Inception-v3 50 encoders. These DCNN encoders were chosen because of their success in image classification 14,51 and their wide use in a range of instance and semantic segmentation tasks 52-54 . Weights pretrained on the ImageNet 55 dataset initialized each DCNN. All DCNN decoders consisted of up-sampling, concatenation of the respective feature map from the encoding path, followed by two blocks of 3 × 3 convolutions, batch normalization 56 , and rectified linear unit 57 (ReLU) activations. The final layer of all DCNNs consisted of a 1 × 1 convolution with SoftMax activation. All U-Net like DCNNs were trained on images with randomly applied combinations of rotation, distortion, skew, shear, and flip augmentations. The normalized training images were passed through an augmentation pipeline in groupings of H&E images, annotations, and weighted maps, ensuring that all corresponding images remained spatially aligned. All training images were further tiled to a size of 256 × 256 pixels, with a 10% overlap.
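Below is a minimal sketch of the post-processing step described above, using standard SciPy/scikit-image operations; the threshold and minimum artifact size shown are illustrative placeholders rather than the study's tuned values.

```python
# Hypothetical sketch of probability-map post-processing: binarize the
# SoftMax output, fill missing pixels, and remove small predicted artifacts.
# The threshold and min_size values are illustrative assumptions.
import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.morphology import remove_small_objects
from skimage.measure import label

def postprocess_probability_map(prob_map: np.ndarray,
                                threshold: float = 0.5,
                                min_size: int = 30) -> np.ndarray:
    """Binarize a nuclei probability map and clean it up morphologically."""
    mask = prob_map >= threshold                          # threshold tuned on validation data
    mask = binary_fill_holes(mask)                        # fill missing pixels inside nuclei
    mask = remove_small_objects(mask, min_size=min_size)  # drop small artifacts
    return label(mask)                                    # connected components as instances
```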
There were 6,720 images used during training, split at a ratio of 80:20 between the training and validation sets, respectively. The adaptive moment estimation 58 (Adam) optimizer, with a learning rate of 1e-4, was used to fine-tune all U-Net like DCNNs. All DCNNs were trained until convergence. Each DCNN had a combined loss function of weighted cross-entropy 26 and soft Dice (Eq. 1):

$$L = L_{WCE} + L_{Dice} \quad (1)$$

Equation (2), which represents weighted cross-entropy 26 , was formulated as:

$$L_{WCE} = -\sum_{x \in X} W(x)\,\log\!\big(P_{\alpha(x)}(x)\big) \quad (2)$$

where X was the image used during training, α was the ground truth annotation of each image, across all classes, and W was the weighted map introduced to the loss function. Dice loss (Eq. 3) was formulated as:

$$L_{Dice} = 1 - \frac{2\sum_{x} P(x)\,gt(x)}{\sum_{x} P(x) + \sum_{x} gt(x)} \quad (3)$$

where P was the output of the SoftMax activation function, and gt was the ground truth annotation.
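As a concrete illustration, the following NumPy sketch implements Eqs. (1)-(3) for a single tile; the array shapes, one-hot encoding, and epsilon smoothing term are our assumptions rather than the authors' exact implementation.

```python
# Minimal NumPy sketch of the combined loss (Eqs. 1-3); shapes and the
# epsilon smoothing term are assumptions, not the authors' exact code.
import numpy as np

def weighted_cross_entropy(P, gt, W, eps=1e-7):
    """Eq. (2): P is the SoftMax output (H, W, C), gt is one-hot ground
    truth (H, W, C), and W is the pre-calculated weighted map (H, W)."""
    pixel_ce = -np.sum(gt * np.log(P + eps), axis=-1)  # per-pixel cross-entropy
    return np.mean(W * pixel_ce)                       # weight, then average

def soft_dice_loss(P, gt, eps=1e-7):
    """Eq. (3): soft Dice on the foreground (nuclei) channel."""
    p, g = P[..., 1].ravel(), gt[..., 1].ravel()
    return 1.0 - (2.0 * np.sum(p * g) + eps) / (np.sum(p) + np.sum(g) + eps)

def combined_loss(P, gt, W):
    """Eq. (1): sum of weighted cross-entropy and soft Dice."""
    return weighted_cross_entropy(P, gt, W) + soft_dice_loss(P, gt)
```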
Mask R-CNN. Prior to training Mask R-CNN, all training images were tiled to a resolution of 256 × 256 pixels with a 10% overlap, for a total of 560 images. The images were then randomly split into training and validation sets at a ratio of 80:20, respectively. The CNN backbone of Mask R-CNN was ResNet-101 and, through optimization experiments, gradient descent 59 was chosen as the optimizer for the network. Furthermore, ResNet-101 was initialized with weights pre-trained on the Common Objects in Context 60 (COCO) dataset. Gradient descent was configured with a momentum of 0.9, a learning rate of 1e-3, and a weight regularizer of 1e-4, and the network was trained until convergence.
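For illustration, the sketch below reproduces this training configuration with torchvision's off-the-shelf Mask R-CNN. Note that the stock COCO-pretrained model ships with a ResNet-50-FPN backbone rather than the ResNet-101 used in the study, so this is a stand-in under that assumption, not the exact network.

```python
# Sketch of the Mask R-CNN training setup using torchvision as a stand-in.
# The paper used a ResNet-101 backbone; torchvision's stock COCO-pretrained
# model uses ResNet-50-FPN, so this approximates rather than reproduces it.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Re-head the network for two classes: background and nucleus.
num_classes = 2
in_feat = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)
in_feat_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feat_mask, 256, num_classes)

# Optimizer settings from the text: momentum 0.9, learning rate 1e-3,
# weight regularizer (weight decay) 1e-4.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-3, momentum=0.9, weight_decay=1e-4,
)
```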
Ensemble networks. U-Net ensemble calculated the pixel-wise mean of the probability maps from the U-Net like DCNNs with VGG-19, ResNet-101, and DenseNet-121 encoders.
A gradient boosting network (GB U-Net) was developed to improve localization. GB U-Net is a two-stage network that, at a high level, concatenates the feature maps from the U-Net like DCNNs with VGG-19, ResNet-101, and DenseNet-121 encoders with the colour normalized H&E image, then uses the concatenated volume as input to an additional U-Net (Fig. 1). The feature maps concatenated with the H&E image were taken from the last convolutional layer, just before the SoftMax activation. The second stage of GB U-Net was trained with a slightly modified encoder, which featured repeated blocks of 3 × 3 convolutions, batch normalization 56 , ReLU 57 activation, and Dropout 61 layers. The decoder followed the same structure as all previous U-Net like DCNNs. GB U-Net was trained until convergence using the 6,720 augmented H&E images, masks, and weighted maps previously mentioned. The network featured the combined loss function of weighted cross-entropy 26 and soft Dice outlined previously.
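Both ensembling strategies reduce to a few array operations; the sketch below, with assumed array shapes, shows the pixel-wise averaging of the U-Net ensemble and the channel-wise concatenation that forms the GB U-Net second-stage input.

```python
# Sketch of the two ensembling strategies described above; array shapes are
# assumptions. Inputs come from the VGG-19, ResNet-101, and DenseNet-121
# U-Nets.
import numpy as np

def unet_ensemble(prob_maps):
    """Pixel-wise mean of (H, W, C) SoftMax probability maps."""
    return np.mean(np.stack(prob_maps, axis=0), axis=0)

def gb_unet_input(feature_maps, he_image):
    """Concatenate pre-SoftMax feature maps (H, W, F each) with the colour
    normalized H&E tile (H, W, 3) to form the second-stage U-Net input."""
    return np.concatenate(feature_maps + [he_image], axis=-1)
```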
Evaluation metrics. This study's evaluation metrics were the Aggregated Jaccard Index 16 (AJI) and mean average precision (mAP). Kumar et al. 16 proposed AJI, which penalized errors at both the pixel and object level: missed objects, falsely detected objects, and under- and over-segmented objects. It did this by computing the ratio of the aggregated cardinality of the intersection and union of predicted nuclei matched with annotated nuclei. The mAP is a popular metric in object detection and segmentation tasks, which yields the average precision (AP) over multiple IoU thresholds. Specifically, ten IoU thresholds, starting at 0.5 and increasing linearly by 0.05 to 0.95, were used for this study.
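A sketch of the mAP computation is given below; it assumes the challenge-style definition in which AP at a threshold is TP/(TP + FP + FN), as popularized by the 2018 Data Science Bowl, so whether this matches the authors' exact matching procedure is an assumption.

```python
# Sketch of mAP over IoU thresholds 0.50-0.95 (step 0.05). AP at a threshold
# is taken as TP / (TP + FP + FN); this exact variant is an assumption.
import numpy as np

def instance_iou_matrix(pred_masks, gt_masks):
    """IoU between every predicted and ground truth instance mask (boolean)."""
    ious = np.zeros((len(pred_masks), len(gt_masks)))
    for i, p in enumerate(pred_masks):
        for j, g in enumerate(gt_masks):
            inter = np.logical_and(p, g).sum()
            union = np.logical_or(p, g).sum()
            ious[i, j] = inter / union if union else 0.0
    return ious

def mean_average_precision(pred_masks, gt_masks):
    ious = instance_iou_matrix(pred_masks, gt_masks)
    aps = []
    for t in np.arange(0.5, 1.0, 0.05):              # 0.50, 0.55, ..., 0.95
        matches = ious >= t
        tp = np.count_nonzero(matches.any(axis=1))   # predictions matching a GT
        fp = len(pred_masks) - tp                    # unmatched predictions
        fn = np.count_nonzero(~matches.any(axis=0))  # unmatched ground truths
        denom = tp + fp + fn
        aps.append(tp / denom if denom else 1.0)
    return float(np.mean(aps))
```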

Results
The overall top-scoring network in both datasets was GB U-Net. For the MoNuSeg dataset, GB U-Net achieved an AJI of 0.5331 and a mAP of 0.3909, while for the TNBC dataset the network achieved an AJI of 0.5403 and a mAP of 0.3772. Figure 2 displays AP plotted against the evaluated IoU thresholds for several segmentation methods using the MoNuSeg test dataset. GB U-Net and Mask R-CNN scored highest in this metric. Both networks displayed similar AP values throughout the IoU thresholds; however, GB U-Net displayed superior performance at the lower thresholds, while Mask R-CNN performed better at thresholds greater than 0.65. The ResNet-101 encoder and U-Net ensemble displayed similar mAP scores, and Fig. 2 also demonstrated that their AP values remained similar across all IoU thresholds. A noticeable difference between the networks was that the U-Net ensemble scored slightly higher in AP throughout the mid-range of IoU thresholds (0.55-0.8). Additionally, Fig. 2 demonstrated that the DCNNs outperformed the top-scoring classical segmentation method (Fiji) across all IoU thresholds.
Figure 3 displays the predicted maps of the MoNuSeg test images with the lowest and highest combined AJI. The predicted maps were colour coded, with green indicating correctly classified pixels (true positive), red indicating pixels not classified as nuclei (false negative), and blue indicating pixels misclassified as nuclei (false positive). Row 1 (Fig. 3) provided examples of predicted maps associated with the lowest scoring image. The AJI values of these images are summarized in Table 3. The image was difficult to segment accurately, as large areas displayed densely crowded regions of tumor nuclei. Fiji struggled to correctly separate nuclei within these areas, which resulted in over-segmentation; however, Fiji also under-segmented the surrounding areas. The DCNNs were better able to separate nuclei within the crowded areas, though the networks tended to over-segment the outer areas. The top-scoring networks, GB U-Net and Mask R-CNN, performed similarly on both images; under qualitative analysis, the main difference between the networks was that Mask R-CNN tended to over-segment images compared to GB U-Net.

Table 3. Individual Aggregated Jaccard Index scores of the eight breast tissue images, which composed the MoNuSeg test dataset.

Discussion
The work presented in this study reflects the emerging field of pathomics, which combines digital pathology and novel AI-based software. Overall, pathomics-based approaches provide novel techniques to augment clinical workflows by automating WSI feature extraction for diagnostic, treatment, and prognostic applications. For example, breakthroughs in AI-based technologies have automated histological grading of breast cancer specimens 62 , mitosis detection 12 , nuclei pleomorphism segmentation 63 , and tubule nuclei classification of estrogen receptor (ER)+ breast cancers 64 . Furthermore, DL networks have automated important diagnostic and prognostic factors such as the classification of human epidermal growth factor receptor 2 (HER2) score 65 , along with ER and progesterone receptor (PR) status 66 . These molecular markers are fundamental to clinical decision making, both at the time of diagnosis and at recurrence, to select the most effective therapies. Recent work has also identified certain tumor nuclei morphological features to be independent prognostic factors for tumor size and tubular formation 67 , alongside features that are significantly associated with eight-year disease-free survival 68 . Additionally, recent work by Ali et al. 69,70 introduced a prognostic assay that evaluated lymphocyte density as a predictor of pathological complete response (pCR) in breast cancer patients on neoadjuvant chemotherapy regimens. As the field of oncology moves into an era of precision medicine 71 , novel insights and innovations in tumor nuclei segmentation and feature extraction may lead to more precise breast cancer treatment selection, for example through AI-driven diagnostic and theragnostic tools that leverage DL architectures.
U-Net excels in biomedical semantic segmentation 29 ; however, the network struggles with correctly classifying close or touching objects. In semantic segmentation, each pixel is identified with a class label specific to the object. Object instances within the same class are not separated; therefore, all foreground objects possess the same class label within binary segmentation. Objects that are close together or overlapping, i.e., areas of crowded nuclei, may be incorrectly classified as a single object. Weighted cross-entropy 26 is a loss function used to improve pixel classification of close or touching objects. Pre-calculated weighted maps 26 are passed to the network and used to weight the cross-entropy loss. Weighted cross-entropy severely penalizes incorrect pixel classification of close or touching nuclei.
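These weighted maps are not formulated explicitly here; the sketch below assumes the distance-based formulation of the original U-Net paper 26, w(x) = w_c(x) + w_0 · exp(−(d_1(x) + d_2(x))² / 2σ²), with the class-balance term w_c simplified to 1 and the paper's default w_0 = 10, σ = 5.

```python
# Sketch of the pre-calculated weight map from the U-Net paper (ref. 26):
# w(x) = w_c(x) + w0 * exp(-(d1(x) + d2(x))^2 / (2 * sigma^2)), where d1 and
# d2 are distances to the nearest and second-nearest nucleus. w0 = 10 and
# sigma = 5 are the U-Net paper's defaults; their use here is an assumption,
# and the class-balance term w_c is simplified to 1.
import numpy as np
from scipy.ndimage import distance_transform_edt

def unet_weight_map(instance_labels, w0=10.0, sigma=5.0):
    """instance_labels: (H, W) integer map, 0 = background."""
    ids = [i for i in np.unique(instance_labels) if i != 0]
    if len(ids) < 2:
        return np.ones(instance_labels.shape)
    # Distance from every pixel to each nucleus (stack: n_nuclei x H x W).
    dists = np.stack([distance_transform_edt(instance_labels != i) for i in ids])
    dists.sort(axis=0)
    d1, d2 = dists[0], dists[1]            # two smallest distances per pixel
    border_term = w0 * np.exp(-((d1 + d2) ** 2) / (2 * sigma ** 2))
    return 1.0 + border_term * (instance_labels == 0)  # emphasize gaps between nuclei
```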
Symmetrical encoder-decoder architectures such as U-Net's further present an opportunity to modify the network's structure and improve performance. The original U-Net encoder followed a typical CNN structure, where successive convolution, activation, and pooling layers calculated feature maps. However, the complexity of training DCNNs has driven the advancement of network architectures 15 and contributed to the development of novel networks such as VGG 47 , ResNet 48 , DenseNet 49 , and GoogLeNet 50 . These newer networks have achieved state-of-the-art object detection results 14 , with improvements in feature extraction, minimizing the vanishing gradient, and aiding network convergence. By modifying the network architecture or replacing the standard encoder with a deeper CNN and using U-Net's long skip connections, researchers have improved object- and segmentation-level performance 52,53,72 , as sketched below.
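As an illustration of this encoder swap, the sketch below uses torchvision's feature-extraction utility to expose the multi-scale feature maps of an ImageNet-pretrained ResNet-101 for the long skip connections; the node names are torchvision's ResNet layer names, and wiring the maps into a decoder is left out.

```python
# Sketch of swapping the U-Net encoder for an ImageNet-pretrained ResNet,
# exposing intermediate feature maps for the long skip connections.
import torch
from torchvision.models import resnet101, ResNet101_Weights
from torchvision.models.feature_extraction import create_feature_extractor

backbone = resnet101(weights=ResNet101_Weights.IMAGENET1K_V1)
encoder = create_feature_extractor(
    backbone,
    return_nodes={"relu": "skip1", "layer1": "skip2",
                  "layer2": "skip3", "layer3": "skip4", "layer4": "bottleneck"},
)

x = torch.randn(1, 3, 256, 256)
features = encoder(x)   # dict of multi-scale feature maps for a U-Net decoder
for name, f in features.items():
    print(name, tuple(f.shape))
```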
Mask R-CNN is a novel network that was proposed to provide instance-level annotations. The first stage of Mask R-CNN is identical to the region proposal network 33 (RPN) of Faster R-CNN. The CNN backbone of Mask R-CNN provides the feature map as input to the convolutional layer specific to the RPN; the convolutional layer's output then feeds both regression and classification layers. The regression layer predicts region proposals, while the classification layer predicts the probability of an object being bound within the proposal. The second stage of Mask R-CNN uses the region proposals to calculate the object class and bounding box offset, and segments the object using an FCN 27 . Region of interest (RoI) align was introduced with Mask R-CNN to correct RoI pool's round-off error, aligning the segmented map with the object. RoI pool, introduced with the fast region-based convolutional neural network 73 (Fast R-CNN), was developed to reduce computational time by taking feature maps, defined by a RoI, and scaling them to a fixed size; this introduced misalignment between the feature map and the input image. RoI align incorporates bilinear interpolation to calculate floating-point values at the sampling points, avoiding quantization. Mask R-CNN outputs bounding box coordinates, an object class probability score, and a segmented map of each object instance.
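The quantization-free behaviour of RoI align can be demonstrated directly with torchvision's op; the feature map size, box coordinates, and stride below are illustrative values only.

```python
# Illustration of RoI align's bilinear sampling using torchvision's op.
# Box coordinates are in input-image space; spatial_scale maps them onto
# the (smaller) feature map without rounding, unlike RoI pooling.
import torch
from torchvision.ops import roi_align

feature_map = torch.randn(1, 256, 50, 50)           # e.g., backbone output
boxes = [torch.tensor([[13.7, 21.2, 92.4, 77.9]])]  # one RoI, in image coords
pooled = roi_align(
    feature_map, boxes,
    output_size=(7, 7),     # fixed-size output for the downstream heads
    spatial_scale=50 / 400, # feature-map size relative to a 400-px image
    sampling_ratio=2,       # bilinear samples per output bin
    aligned=True,           # half-pixel correction, avoiding quantization
)
print(pooled.shape)         # torch.Size([1, 256, 7, 7])
```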
ML-based approaches work exceptionally well when the training and testing data are drawn from the same feature space 74 . In circumstances where data are sparse, transfer learning can transfer knowledge from one domain to a task within an independent domain 74 . However, it is imperative to ensure that both domains are related, to avoid negative transfer. Transfer learning has been extensively implemented by data scientists for tasks such as text classification 75-77 . However, to the author's knowledge, this was the first study investigating the use of transfer learning to segment tumor nuclei of breast tissue exclusively. Currently, there are few open-source annotated breast tissue datasets (Table 1); it is therefore essential to identify whether DCNNs can leverage transfer learning to accurately segment breast nuclei from open-source datasets. Previous studies 34,35,78 , including work by Graham et al. 78 , have used the MoNuSeg and TNBC datasets to evaluate the segmentation accuracy of novel DCNNs. The MoNuSeg test dataset of these studies included fourteen tissue samples from seven organs, three of which (bladder, colon, stomach) were withheld from the training set. Many of the studies mentioned above featured DCNNs developed to provide precise segmentation of tumor nuclei, explicitly improving on segmenting areas of densely populated nuclei. Differentiation of nuclei is essential for spatial features; however, a study by Boucheron et al. 86 identified that, specific to tumor nuclei classification, perfectly segmented nuclei do not guarantee optimum classification accuracy. Although GB U-Net did not outperform all DCNNs, the results demonstrate that transfer learning can be implemented while training DCNNs to segment breast tumor nuclei if precise segmentations are not explicitly required.
A limitation of this study is the relatively small dataset used for training and testing, a consequence of the limited open-source datasets available. Future work will involve expanding the number of images and types of tissues included in the dataset. Generative adversarial networks (GANs) are a promising approach to data synthesis, in which synthetic H&E-stained histopathology images and their ground truth annotations can be generated from features learned from the training data 87-89 . GANs have the potential to significantly increase histopathological training sets without requiring additional annotations from expert pathologists.

Conclusion
The study's objective was to determine whether transfer learning could be implemented with DCNNs to accurately segment tumor nuclei of breast tissue. This study introduced a novel ensemble network (GB U-Net), which scored highest in AJI and mAP. Mask R-CNN scored slightly lower in AJI and mAP compared to GB U-Net; however, one of the main differences between the networks was that Mask R-CNN displayed improved AP at higher IoU thresholds (greater than 0.65). Overall, this study demonstrated that DCNNs trained on images independent of breast tissue could accurately segment invasive carcinoma of the breast by implementing transfer learning. One of the main limitations facing DL-based researchers and data scientists is the limited availability of training and testing data. As it is impractical to expect expert pathologists to provide the time commitment required to produce large-scale histopathological datasets, various boosting and data synthesis options, such as transfer learning and GANs, should be further explored to augment clinical workflows and ultimately lead to improved patient care.

Data availability
The MoNuSeg dataset used for the current study is available in the Medical Image Computing and Computer Assisted Intervention Society (MICCAI) 2018 MoNuSeg challenge repository, https://monuseg.grand-challenge.org/Data/. The TNBC dataset used for the current study is available in the Peter Jack Naylor GitHub repository accompanying "Segmentation of nuclei in histopathology images by deep regression of the distance map", https://github.com/PeterJackNaylor/DRFNS. The authors have made every effort to provide a detailed description of the software and hardware implemented in this study. All data that have not been published in tables or alongside the article will be made available from the corresponding author upon request.