Introduction

Breast cancer remains the most commonly diagnosed cancer and the second most common cause of cancer-related death in women1,2. Improved patient survival requires a multidisciplinary approach, with specialists in pathology and in medical, surgical, and radiation oncology3. Clinical management and accurate diagnosis of breast cancer type, stage, and grade require tissue biopsies. Biopsy specimens must be processed onto slides and stained using histological techniques, including immunohistochemical (IHC) staining procedures4. Hematoxylin and eosin (H&E) is a common stain4 used to identify nuclei and cytoplasm: hematoxylin stains nuclei blue, while eosin stains the cytoplasm pink4,5,6. Examination of tissue biopsy specimens, prepared onto slides, is currently considered the gold standard for the clinical diagnosis of cancer6. A recent update to the American Joint Committee on Cancer (AJCC) breast cancer staging manual7 now also incorporates histological features, such as grade and receptor status, into the prognostic staging process. However, the manual process of histopathological analysis is laborious, time-consuming, and limited by the quality of the specimen and the experience of the pathologist8,9. As a result, there has been growing interest in automating and digitizing pathological workflows, in addition to using computer-aided diagnosis (CAD) to streamline and optimize the process of tissue analysis8.

The emerging field of digital pathology has been made possible by the advent of high-resolution whole slide image (WSI) scanners10 and has been further driven by advancements in imaging technologies, data storage, and computer vision algorithms11. The high-resolution WSIs obtained by scanning histopathological specimens allow for in silico analysis. The application of artificial intelligence (AI), in particular its subfields of machine learning (ML) and deep learning (DL), to in silico analysis is proving to be a promising tool10. Researchers and data scientists have applied convolutional neural networks (CNN) and machine classifiers to segment and classify objects, predict disease diagnosis and treatment response, tailor medical treatment, develop diagnostic assays, and determine patient response to treatment therapies10. For example, Wang et al.12 demonstrated that, by combining a CNN with random forest classifiers, mitotic nuclei could be accurately detected in H&E stained breast tissue. Digital pathology aims to develop clinically relevant computational methods to enhance specimen characterization and automate the detection of conventional features such as nuclear pleomorphism (e.g. nuclear shape and size) and tissue-spatial characteristics (e.g. cellular distribution)13.

Recent developments in CNN architectures reflect the rapidly growing field of DL14. CNNs trained on sizeable supervised image datasets have been used to develop state-of-the-art learning algorithms15 capable of carrying out classification and segmentation tasks for medical image analysis. However, a significant challenge in advancing these algorithms is the limited availability of supervised histopathological datasets. Table 1 presents a review of publicly available supervised histopathological datasets16,17,18,19,20,21,22,23,24,25. These datasets feature expert annotations of histopathological structures from a variety of tissues and staining techniques. A number of these datasets contain breast tissue samples; however, the manner in which the histopathological structures were annotated is not consistent. Of the seven datasets that contain breast tissue, two17,21 were not exhaustively annotated, and two others18,23 contained annotations in which the borders of the nuclei were not outlined. In general, training CNNs for semantic or instance segmentation requires exhaustive, fully outlined annotations of specific histopathological structures19.

Table 1 Open-source histopathological datasets.

U-Net26 is an example of a deep CNN (DCNN) that provides semantic segmentation. The network was first developed by Ronneberger et al., building on the fully convolutional network27 (FCN). U-Net improved on the FCN by reducing training time and enhancing localization through the addition of long skip connections and up-sampling layers to the encoder-decoder architecture. Since its introduction, U-Net has been shown to yield state-of-the-art segmentation results28,29,30,31.

Innovative segmentation algorithms that have gained widespread attention further include the Mask regional convolutional neural network32 (Mask R-CNN), a two-stage DCNN. He et al.32 introduced Mask R-CNN to provide instance segmentation, expanding on the faster regional convolutional neural network33 (Faster R-CNN). At a high level, Mask R-CNN localizes each object instance to a bounding box and then segments each instance.

Many recent state-of-the-art DCNN architectures, together with network weights pre-trained on large supervised image repositories, have been made publicly available for researchers and data scientists to implement. However, one of the restrictive factors in breast pathomics is the limited availability of supervised histopathological datasets. Previous studies34,35 and computer vision competitions22,36 have tested the generalizability of novel DCNNs on multi-organ and multi-organism datasets; however, no study has evaluated whether DCNNs trained on a variety of organs can accurately segment breast tumor nuclei. Therefore, this study aimed to determine whether DCNNs trained on histology images independent of breast tissue can accurately segment tumor nuclei of the breast. Two DCNN architectures were evaluated: one providing semantic segmentation (U-Net) and one providing instance segmentation (Mask R-CNN). Additionally, classical segmentation methods were used to segment breast tumor nuclei, and their results were compared with those of the DCNNs.

Methods

Training datasets

The training dataset implemented in this study featured open-source images from the Multi-Organ Nucleus Segmentation (MoNuSeg) challenge16,37. The MoNuSeg dataset, obtained from the National Cancer Institute's Cancer Genome Atlas38 (TCGA), included high-resolution WSIs of H&E stained slides from nine tissue types, digitized at eighteen different hospitals at 40 × magnification16. The tissue types included breast, liver, kidney, prostate, bladder, colon, stomach, lung, and brain. Sub-regions (1,000 × 1,000 pixels), densely populated with nuclei, were extracted from the WSIs. This study used the ground truth annotations released during the challenge. These annotations included all epithelial and stromal cells, which were annotated by students and then approved by an expert pathologist, with less than a 1% error in detection16. Approximately 28,000 nuclei were annotated across the entire dataset. The training set for this study consisted of colour normalized39 H&E images from all tissue types, excluding breast.

Testing datasets

The hold-out (test) set exclusively contained H&E stained images of breast tissue. There were fifty-eight images in total from two separate datasets: eight images from the MoNuSeg dataset and fifty from the Triple Negative Breast Cancer (TNBC) dataset19. The ground truth annotations associated with these images were curated and released with the respective datasets. The TNBC dataset contained fifty H&E stained images taken at 40 × magnification from eleven TNBC patients19. Three to eight sub-regions (512 × 512 pixels), with varying cellularity, were extracted per patient. The annotations were performed by three expert pathologists. Annotated cells included normal epithelial cells, myoepithelial breast cells, invasive carcinoma cells, fibroblasts, endothelial cells, adipocytes, macrophages, and inflammatory cells. As cell-class labels were not available for either dataset, performance measures were based on segmenting all annotated cells within the test images.

All images were colour normalized39 and then tiled to a size of 256 × 256 pixels with a 50% overlap. Post-processing of the probability map from any given U-Net like DCNN involved calculating a threshold that maximized precision at a given intersection over union (IoU) threshold. Furthermore, we implemented morphological operations that filled missing pixels and removed small predicted artifacts.
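As an illustration, a minimal sketch of this tiling and post-processing step with NumPy and scikit-image (the package listed under Software and hardware) is shown below. The tile size and overlap follow the text, while the probability threshold and the minimum object and hole sizes are placeholder assumptions; the tuned threshold was selected to maximize precision at a chosen IoU cut-off.

```python
import numpy as np
from skimage import measure, morphology

def tile_image(img, tile=256, overlap=0.5):
    """Yield square tiles with fractional overlap (50% here). Edge regions
    that do not fit a full tile would need padding in practice."""
    step = int(tile * (1 - overlap))
    h, w = img.shape[:2]
    for y in range(0, max(h - tile, 0) + 1, step):
        for x in range(0, max(w - tile, 0) + 1, step):
            yield img[y:y + tile, x:x + tile]

def postprocess_probability_map(prob_map, threshold=0.5, hole_size=32, min_size=32):
    """Threshold the U-Net probability map, fill missing pixels inside
    nuclei, and remove small false-positive artifacts."""
    mask = prob_map >= threshold
    mask = morphology.remove_small_holes(mask, hole_size)    # fill missing pixels
    mask = morphology.remove_small_objects(mask, min_size)   # drop small artifacts
    return measure.label(mask)  # one integer label per predicted nucleus
```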

Software and hardware

All software related to this study was written in the Python programming language, version 3.7.6, using Anaconda (https://www.anaconda.com). Each DCNN was trained and implemented with Keras version 2.3.140 and TensorFlow version 2.1.041. U-Net like architectures were implemented with the segmentation models package (https://github.com/qubvel/segmentation_models), and Mask R-CNN was implemented with the matterport package (https://github.com/matterport/Mask_RCNN). The ImageJ2-Fiji42,43 (Fiji) package (https://imagej.nih.gov/ij/download.html) was applied for nuclei segmentation. Otsu thresholding44 and the watershed transform45,46 were applied with the scikit-image package (https://scikit-image.org), version 0.17.2. All experiments were performed on a workstation equipped with an AMD (Advanced Micro Devices, Inc., Santa Clara, USA) Ryzen Threadripper 1920X 12-Core Processor, 64 GB of RAM, and a single NVIDIA (NVIDIA Corporation, Santa Clara, USA) GeForce RTX 2080 Ti graphics processing unit (GPU).

Classical segmentation techniques

Fiji is an open-source image processing and analysis software package that features a built-in nuclei segmentation pipeline. To prepare the images for segmentation, they were first colour normalized39 and then converted to grayscale. The pipeline involved applying a Gaussian filter, thresholding the images, and then applying the watershed transform. The classical segmentation techniques (Otsu threshold, watershed transform, Fiji) were applied to the images of this study to provide a baseline against which the performance of the DCNNs could be compared.
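For reference, a comparable Otsu-plus-watershed pipeline can be assembled with scikit-image. This is a sketch in the spirit of the classical baselines (grayscale conversion, Gaussian smoothing, thresholding, watershed on the distance transform), with illustrative parameter values rather than the exact settings used in Fiji.

```python
import numpy as np
from scipy import ndimage
from skimage import color, feature, filters, segmentation

def classical_nuclei_segmentation(rgb_image):
    """Sketch of an Otsu + watershed nuclei segmentation pipeline."""
    gray = color.rgb2gray(rgb_image)
    smoothed = filters.gaussian(gray, sigma=1)

    # Nuclei are dark (hematoxylin) on a lighter background, hence "<".
    mask = smoothed < filters.threshold_otsu(smoothed)

    # Watershed on the distance transform to split touching nuclei.
    distance = ndimage.distance_transform_edt(mask)
    peaks = feature.peak_local_max(distance, min_distance=5, labels=mask)
    markers = np.zeros(mask.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    return segmentation.watershed(-distance, markers, mask=mask)
```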

U-Net

Eight DCNNs with U-Net like architectures were trained for this study, with the standard encoder replaced by one of the following networks: visual geometry group47 (VGG)-16, VGG-1947, Residual Network48 (ResNet)-50, ResNet-10148, ResNet-15248, Dense Convolutional Network49 (DenseNet)-121, DenseNet-20149, and Inception-v350. These encoders were chosen because of their success in image classification14,51 and their wide use in a range of instance and semantic segmentation tasks52,53,54. Each DCNN was initialized with weights pre-trained on the ImageNet55 dataset. All DCNN decoders consisted of up-sampling, concatenation of the respective feature map from the encoding path, followed by two blocks of 3 × 3 convolutions, batch normalization56, and rectified linear unit57 (ReLU) activation. The final layer of all DCNNs consisted of a 1 × 1 convolution with SoftMax activation.
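As a minimal sketch, a U-Net like DCNN matching this description can be instantiated through the segmentation models package referenced under Software and hardware. The two-class SoftMax output and the 256 × 256 input size are assumptions consistent with the text, not a copy of the study's exact build configuration.

```python
import segmentation_models as sm

sm.set_framework('tf.keras')  # use the tf.keras backend

# One of the eight encoder variants; swapping backbone_name for 'vgg16',
# 'vgg19', 'resnet50', 'resnet101', 'resnet152', 'densenet201', or
# 'inceptionv3' yields the other configurations described above.
model = sm.Unet(
    backbone_name='densenet121',
    input_shape=(256, 256, 3),
    classes=2,                    # nucleus vs. background
    activation='softmax',         # final 1 x 1 convolution with SoftMax
    encoder_weights='imagenet',   # encoder initialized with ImageNet weights
)
```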

All U-Net like DCNNs were trained on images with randomly applied combinations of rotation, distortion, skew, shear, and flip augmentations. The normalized training images were passed through an augmentation pipeline in groupings of H&E images, annotations, and weighted maps, ensuring that all corresponding images remained spatially aligned. All images were further tiled to a size of 256 × 256 pixels with a 10% overlap.
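The study does not name its augmentation library; as one illustrative possibility, the albumentations package can apply an identical random transform to the H&E tile, its annotation mask, and its weight map so the group stays spatially aligned. The transform choices and probabilities below are assumptions, not the study's exact configuration.

```python
import numpy as np
import albumentations as A

# Dummy 256 x 256 tile, mask, and weight map stand in for real data here.
he_tile = np.zeros((256, 256, 3), dtype=np.uint8)
annotation = np.zeros((256, 256), dtype=np.uint8)
weights = np.ones((256, 256), dtype=np.float32)

transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),            # flip
        A.VerticalFlip(p=0.5),
        A.Rotate(limit=90, p=0.5),          # rotation
        A.Affine(shear=(-10, 10), p=0.3),   # shear / skew
        A.ElasticTransform(p=0.3),          # distortion
    ],
    additional_targets={"weight_map": "mask"},  # same transform for all three
)

augmented = transform(image=he_tile, mask=annotation, weight_map=weights)
he_aug = augmented["image"]
mask_aug = augmented["mask"]
w_aug = augmented["weight_map"]
```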

A total of 6,720 images were used during training, split 80:20 between the training and validation sets. The adaptive moment estimation58 (Adam) optimizer, with a learning rate of 1e-4, was used to fine-tune all U-Net like DCNNs. All DCNNs were trained until convergence. Each DCNN used a combined loss function of weighted cross-entropy26 and soft Dice (Eq. 1):

$${\mathcal{L}}_{combined}= {\mathcal{L}}_{WCE}+ \left(1-{\mathcal{L}}_{Dice}\right)$$
(1)

The weighted cross-entropy26 (Eq. 2) was formulated as:

$${\mathcal{L}}_{WCE}= -\sum_{x\in {\mathbb{X}}}{\mathcal{W}}_{map}\left(x\right)\,\mathrm{log}\left({p}_{\alpha (x)}\left(x\right)\right)$$
(2)

where \(x\) denotes a pixel in the domain \(\mathbb{X}\) of the training image, \(\alpha(x)\) is the ground truth class label of pixel \(x\), \({p}_{\alpha(x)}(x)\) is the SoftMax probability assigned to that class, and \({\mathcal{W}}_{map}\) is the weighted map introduced to the loss function. The Dice loss (Eq. 3) was formulated as:

$${\mathcal{L}}_{Dice}= \frac{2\sum_{x\in {\mathbb{X}}}p\left(x\right)\,gt\left(x\right)}{\sum_{x\in {\mathbb{X}}}\left({p\left(x\right)}^{2}+ {gt\left(x\right)}^{2}\right)}$$
(3)

where \(p\) was the output of the SoftMax activation function and \(gt\) was the ground truth annotation.
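To make Eqs. (1)–(3) concrete, the sketch below expresses the combined loss with TensorFlow. Packing the weight map into the last channel of y_true is one convenient wiring for Keras and is an assumption, as is averaging the weighted cross-entropy over pixels; this is not a reproduction of the study's exact code.

```python
import tensorflow as tf

def combined_loss(y_true, y_pred):
    """Sketch of Eq. (1): weighted cross-entropy + (1 - soft Dice).

    Assumes y_true packs [background, nucleus, weight_map] in its last
    dimension and y_pred is the two-channel SoftMax output.
    """
    eps = tf.keras.backend.epsilon()
    labels, w_map = y_true[..., :2], y_true[..., 2]

    # Weighted cross-entropy (Eq. 2): per-pixel log-probability of the
    # ground-truth class, scaled by the pre-computed weight map.
    ce = -tf.reduce_sum(labels * tf.math.log(y_pred + eps), axis=-1)
    wce = tf.reduce_mean(w_map * ce)  # averaged here to keep both terms on
                                      # a similar scale (an assumption)

    # Soft Dice (Eq. 3) on the nucleus channel of the SoftMax output.
    p, gt = y_pred[..., 1], labels[..., 1]
    dice = (2.0 * tf.reduce_sum(p * gt)) / (tf.reduce_sum(p ** 2 + gt ** 2) + eps)

    return wce + (1.0 - dice)
```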

Mask R-CNN

Prior to training Mask R-CNN, all training images were tiled to a resolution of 256 × 256 pixels with a 10% overlap, for a total of 560 images. The images were then randomly split into training and validation sets at a ratio of 80:20. The CNN backbone of Mask R-CNN was ResNet-101, and through optimization, gradient descent59 was chosen as the optimizer for the network. ResNet-101 was initialized with weights pre-trained on the common objects in context60 (COCO) dataset. Gradient descent was configured with a momentum of 0.9, a learning rate of 1e-3, and a weight regularizer of 1e-4, and the network was trained until convergence.
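A configuration sketch for the matterport Mask_RCNN package referenced under Software and hardware could look as follows. The class name, weights file path, images-per-GPU setting, and epoch count are illustrative assumptions, while the backbone, learning rate, momentum, and weight decay mirror the values given above.

```python
from mrcnn.config import Config
from mrcnn import model as modellib

class NucleusConfig(Config):
    """Hypothetical configuration reflecting the settings described above."""
    NAME = "nucleus"
    BACKBONE = "resnet101"
    NUM_CLASSES = 1 + 1            # background + nucleus
    LEARNING_RATE = 1e-3
    LEARNING_MOMENTUM = 0.9
    WEIGHT_DECAY = 1e-4
    IMAGES_PER_GPU = 2             # assumption; depends on GPU memory

config = NucleusConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="./logs")

# Initialize with COCO weights, skipping layers whose shapes depend on the
# number of classes.
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])

# dataset_train / dataset_val would be mrcnn.utils.Dataset subclasses built
# from the 256 x 256 nucleus tiles (not shown here).
# model.train(dataset_train, dataset_val,
#             learning_rate=config.LEARNING_RATE, epochs=50, layers="all")
```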

Ensemble networks

The U-Net ensemble calculated the pixel-wise mean of the probability maps from the U-Net like DCNNs with VGG-19, ResNet-101, and DenseNet-121 encoders.

A gradient boosting network (GB U-Net) was developed to improve localization. GB U-Net is a two-stage network that, at a high level, concatenates feature maps from the U-Net like DCNNs with VGG-19, ResNet-101, and DenseNet-121 encoders with the colour normalized H&E image, then uses the concatenated image as input to an additional U-Net (Fig. 1). The feature maps concatenated with the H&E image were taken from the last convolutional layer, just before the SoftMax activation. The second stage of GB U-Net was trained with a slightly modified encoder, which featured repeated blocks of 3 × 3 convolutions, batch normalization56, ReLU57 activation, and Dropout61 layers. The decoder followed the same structure as all previous U-Net like DCNNs. GB U-Net was trained until convergence using the 6,720 augmented H&E images, masks, and weighted maps previously mentioned. The network featured the combined loss function of weighted cross-entropy26 and soft Dice outlined previously.
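A compact sketch of the two ensembling ideas, the pixel-wise mean of the U-Net ensemble and the stage-one input construction of GB U-Net, is given below. The model objects are hypothetical stand-ins for the trained U-Net like DCNNs (or, for GB U-Net, for versions truncated just before the SoftMax layer).

```python
import numpy as np

def unet_ensemble(models, image):
    """Pixel-wise mean of the SoftMax probability maps from several
    trained U-Net like DCNNs (e.g. VGG-19, ResNet-101, DenseNet-121)."""
    probs = [m.predict(image[np.newaxis, ...])[0] for m in models]
    return np.mean(probs, axis=0)

def gb_unet_stage_one(feature_models, image):
    """Stage one of GB U-Net: concatenate each network's last feature map
    (taken just before the SoftMax activation) with the colour normalized
    H&E image; the result feeds the final U-Net of stage two (not shown)."""
    maps = [m.predict(image[np.newaxis, ...])[0] for m in feature_models]
    return np.concatenate([image] + maps, axis=-1)
```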

Figure 1

Network architecture of GB U-Net, a gradient boosting network. The first stage of GB U-Net concatenated feature maps from three U-Net like DCNNs (VGG-19, DenseNet-121, and ResNet-101 encoders) with the colour normalized H&E image. The second stage of GB U-Net passed the concatenated image through a final U-Net. The original H&E images were curated by the TCGA. Ground truth annotations were released by Kumar et al.16, and additional annotations were released during the MoNuSeg challenge.

Evaluation metrics

This study's evaluation metrics were the Aggregated Jaccard Index16 (AJI) and mean average precision (mAP). Kumar et al.16 proposed AJI, which penalizes errors at both the pixel and object level: missed objects, falsely detected objects, and under- and over-segmented objects. It does so by computing the ratio of the aggregated cardinality of the intersection to that of the union of predicted nuclei matched with annotated nuclei. The mAP is a popular metric used in object detection and segmentation tasks; it averages precision over multiple IoU thresholds. Specifically, ten IoU thresholds, starting at 0.5 and increasing linearly by 0.05 to 0.95, were used for this study.
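For clarity, a direct (unoptimized) sketch of AJI as described by Kumar et al.16 is given below. It assumes the ground truth and prediction are integer-labeled instance maps in which 0 denotes background.

```python
import numpy as np

def aggregated_jaccard_index(gt_labels, pred_labels):
    """Each ground-truth nucleus is matched to the prediction with the
    largest IoU; matched intersections and unions are aggregated, and the
    area of every unmatched prediction is added to the denominator."""
    gt_ids = [i for i in np.unique(gt_labels) if i != 0]
    pred_ids = [j for j in np.unique(pred_labels) if j != 0]
    used = set()
    inter_sum, union_sum = 0, 0

    for i in gt_ids:                       # O(|G| * |P|), kept simple for clarity
        g = gt_labels == i
        best_iou, best_j, best_inter, best_union = 0.0, None, 0, g.sum()
        for j in pred_ids:
            p = pred_labels == j
            inter = np.logical_and(g, p).sum()
            if inter == 0:
                continue
            union = np.logical_or(g, p).sum()
            iou = inter / union
            if iou > best_iou:
                best_iou, best_j, best_inter, best_union = iou, j, inter, union
        inter_sum += best_inter
        union_sum += best_union            # unmatched ground truth adds its own area
        if best_j is not None:
            used.add(best_j)

    # Penalize false-positive predictions that matched no ground truth nucleus.
    for j in pred_ids:
        if j not in used:
            union_sum += (pred_labels == j).sum()

    return inter_sum / union_sum if union_sum > 0 else 0.0
```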

Results

Table 2a,b summarize the results of the DL and classical segmentation methods for the MoNuSeg and TNBC datasets, respectively. Table 2a is composed of three categories: 1) classical segmentation methods (Otsu threshold, watershed transform, Fiji), 2) U-Net like DCNNs with varying encoders, and 3) Mask R-CNN and the ensemble networks. With an AJI of 0.3396 and a mAP of 0.237, Fiji considerably outperformed Otsu thresholding and the watershed transform. However, the DL methods outperformed Fiji across all metrics. Among the U-Net like DCNNs, the DenseNet-201 encoder scored highest in AJI with 0.5083, while the ResNet-101 encoder scored highest in mAP with 0.3318. With the highest AJI, the DenseNet-201 encoder surpassed all other U-Net like DCNNs in both object level and pixel-wise segmentation accuracy. In contrast, the ResNet-101 encoder excelled in accurately distinguishing nuclei from other histopathological structures. Furthermore, the DenseNet-201 encoder scored highest in F1 at an IoU threshold of 0.5 and in recall at IoU thresholds of 0.5 and 0.7, demonstrating that the network excelled in correctly identifying all nuclei while also providing the best balance between correctly identifying and separating nuclei from other structures. With regard to the TNBC dataset (Table 2b), the ResNet-101 encoder scored highest in AJI with 0.5080, in mAP with 0.3306, and in AP at both IoU thresholds.

Table 2 F1, recall, and precision metrics are reported for two intersection over union thresholds, 0.5 and 0.7.

The overall top-scoring network on both datasets was GB U-Net. For the MoNuSeg dataset, GB U-Net achieved an AJI of 0.5331 and a mAP of 0.3909, while for the TNBC dataset the network achieved an AJI of 0.5403 and a mAP of 0.3772.

Figure 2 displays AP plotted against the evaluated IoU thresholds for several segmentation methods on the MoNuSeg test dataset. GB U-Net and Mask R-CNN scored highest in this metric. Both networks displayed similar AP values throughout the IoU thresholds; however, GB U-Net displayed superior performance at the lower thresholds, while Mask R-CNN performed better at thresholds greater than 0.65. The ResNet-101 encoder and the U-Net ensemble displayed similar mAP scores, and Fig. 2 shows that their AP values remained similar across all IoU thresholds; a noticeable difference was that the U-Net ensemble scored slightly higher in AP throughout the mid-range of IoU thresholds (0.55–0.8). Additionally, Fig. 2 demonstrates that the DCNNs outperformed the top-scoring classical segmentation method (Fiji) across all IoU thresholds.

Figure 2

Average precision graphically displayed, across ten intersection over union thresholds, for the MoNuSeg test dataset.

Figure 3 displays the predicted maps of the MoNuSeg test images with the lowest and highest combined AJI. The predicted maps were colour coded, with green indicating correctly classified pixels (true positives), red indicating nuclei pixels not classified as nuclei (false negatives), and blue indicating pixels misclassified as nuclei (false positives). Row 1 (Fig. 3) provides examples of predicted maps associated with the lowest scoring image; the AJI values of these images are summarized in Table 3. The image was difficult to segment accurately, as large areas of the image displayed densely crowded regions of tumor nuclei. Fiji struggled to correctly separate nuclei within these areas, which resulted in over-segmentation, while it also under-segmented the surrounding areas. The DCNNs were better able to separate nuclei within the crowded areas; however, the networks tended to over-segment the outer areas. The top-scoring networks, GB U-Net and Mask R-CNN, performed similarly on both images; under qualitative analysis, the main difference between the networks was that Mask R-CNN tended to over-segment images compared to GB U-Net.

Figure 3

Nuclei annotations of the lowest and highest AJI images of the MoNuSeg dataset. Annotations pertaining to the classical segmentation methods and the DCNNs have been colour coded, such that green indicates true positive, red indicates false negative, and blue indicates false positive pixels. The original H&E images were curated by the TCGA. Ground truth annotations were released by Kumar et al.16, and additional annotations were released during the MoNuSeg challenge.

Table 3 Individual Aggregated Jaccard Index scores of the eight breast tissue images, which composed the MoNuSeg test dataset.

Discussion

This was the first study to investigate the accuracy of DCNNs in segmenting invasive carcinoma of the breast when trained on H&E stained digital images of liver, kidney, prostate, bladder, colon, stomach, lung, and brain cancers. The eleven DCNNs trained for this study included eight U-Net like architectures with various encoders, two gradient boosting networks, and Mask R-CNN. The study's three classical segmentation methods were the Otsu threshold, the watershed transform, and Fiji. Overall, the DCNNs outperformed all classical segmentation methods. The top-scoring DCNN in AJI and mAP was GB U-Net. Mask R-CNN scored slightly lower; however, it outperformed all U-Net like DCNNs on the MoNuSeg test dataset. The segmentation performance of the DCNNs demonstrated that, by using transfer learning, the networks could effectively apply features learned independently of breast tissue to segment tumor nuclei within breast tissue.

The work presented in this study reflects the emerging field of pathomics, which combines digital pathology and novel AI-based software. Overall, pathomics-based approaches provide novel techniques to augment clinical workflows by automating WSI feature extraction for diagnostic, treatment, and prognostic applications. For example, breakthroughs in AI-based technologies have automated the histological grading of breast cancer specimens62, mitosis detection12, nuclear pleomorphism segmentation63, and tubule nuclei classification of estrogen receptor (ER)-positive breast cancers64. Furthermore, DL networks have automated the assessment of important diagnostic and prognostic factors such as the human epidermal growth factor receptor 2 (HER2) score65, along with ER and progesterone receptor (PR) status66. These molecular markers are fundamental to clinical decision making, both at diagnosis and at recurrence, in selecting the most effective therapies. Recent work has also identified certain tumor nuclei morphological features to be independent prognostic factors for tumor size and tubular formation67, alongside features significantly associated with eight-year disease-free survival68. Additionally, recent work by Ali et al.69,70 introduced a prognostic assay that evaluated lymphocyte density as a predictor of pathological complete response (pCR) in breast cancer patients receiving neoadjuvant chemotherapy regimens. As the field of oncology moves into an era of precision medicine71, novel insights and innovations in tumor nuclei segmentation and feature extraction may lead to more precise breast cancer treatment selection. One such avenue is AI-driven diagnostic and theragnostic tools that leverage DL architectures.

U-Net excels at biomedical semantic segmentation29; however, the network struggles to correctly classify close or touching objects. In semantic segmentation, each pixel is assigned a class label specific to the object. Object instances within the same class are not separated; therefore, all foreground objects possess the same class label in binary segmentation. Objects that are close together or overlapping, such as areas of crowded nuclei, may be incorrectly classified as a single object. Weighted cross-entropy26 is a loss function used to improve pixel classification of close or touching objects. Pre-calculated weighted maps26 are passed to the network and used to weight the cross-entropy loss. Weighted cross-entropy severely penalizes incorrect pixel classification between close or touching nuclei.
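As an illustration of how such a weight map can be pre-calculated, the sketch below follows the border-emphasis formula of Ronneberger et al.26, w(x) = w_c(x) + w0 exp(-(d1(x) + d2(x))^2 / (2 sigma^2)), where d1 and d2 are distances to the nearest and second-nearest nucleus. The w0 and sigma values are those suggested in the original U-Net paper and the inverse-frequency class term is an assumption, so this is not necessarily the exact map used in the present study.

```python
import numpy as np
from scipy import ndimage

def unet_weight_map(instance_labels, w0=10.0, sigma=5.0):
    """instance_labels: 2-D array, 0 = background, each nucleus a unique
    positive integer. Returns a per-pixel weight map."""
    labels = np.unique(instance_labels)
    labels = labels[labels > 0]
    fg = instance_labels > 0

    # Class-balancing term w_c: weight each class by its inverse frequency
    # (an assumption; the original paper leaves w_c dataset-dependent).
    weights = np.ones(instance_labels.shape, dtype=np.float64)
    if fg.any() and (~fg).any():
        weights[fg] = fg.size / (2.0 * fg.sum())
        weights[~fg] = fg.size / (2.0 * (~fg).sum())

    if len(labels) < 2:
        return weights

    # Distance from every pixel to each nucleus (one map per nucleus).
    dists = np.stack(
        [ndimage.distance_transform_edt(instance_labels != lab) for lab in labels],
        axis=0,
    )
    dists.sort(axis=0)
    d1, d2 = dists[0], dists[1]  # nearest and second-nearest nucleus

    # Emphasize background pixels squeezed between touching nuclei.
    border = w0 * np.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))
    weights[~fg] += border[~fg]
    return weights
```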

Symmetrical encoder-decoder architectures such as U-Net also offer an opportunity to modify the network's structure and improve performance. The original U-Net encoder followed a typical CNN structure, in which successive convolution, activation, and pooling layers calculated feature maps. The complexity of training DCNNs has driven the advancement of network architectures15 and contributed to the development of novel networks such as VGG47, ResNet48, DenseNet49, and GoogLeNet50. These newer networks have achieved state-of-the-art object detection results14 through improvements in feature extraction, mitigation of the vanishing gradient problem, and faster network convergence. By modifying the network architecture or replacing the standard encoder with a deeper CNN while retaining U-Net's long skip connections, researchers have improved both object level and segmentation level performance52,53,72.

Mask R-CNN is a novel network that was proposed to provide instance level annotations. The first stage of Mask R-CNN is identical to the region proposal network33 (RPN) of Faster R-CNN, which proposes candidate object regions. The CNN backbone of Mask R-CNN provides the feature map as input to the convolutional layer specific to the RPN; the convolutional layer's output then feeds both regression and classification layers. The regression layer predicts region proposals, while the classification layer predicts the probability that an object is contained within the proposal. The second stage of Mask R-CNN uses the region proposals to calculate the object class and bounding box offset, and segments the object using an FCN27. Region of interest (RoI) align was introduced with Mask R-CNN to correct RoI pool's round-off error, aligning the segmented mask with the object. RoI pool, introduced with the fast regional convolutional neural network73 (Fast R-CNN), was developed to reduce computational time by taking feature maps defined by an RoI and scaling them to a fixed size; this quantization introduced misalignment between the feature map and the input image. RoI align instead uses bilinear interpolation to compute feature values at floating-point sampling locations, avoiding quantization. Mask R-CNN outputs bounding box coordinates, an object class probability score, and a segmented map of each object instance.
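The bilinear sampling idea can be illustrated with TensorFlow's crop_and_resize operation, which implementations such as the matterport package use to approximate RoI align. The feature map shape, box coordinates, and 7 × 7 output size below are arbitrary illustrative values.

```python
import tensorflow as tf

# Bilinear RoI sampling in the spirit of RoI align: the box is specified in
# normalized floating-point coordinates, so features are interpolated at
# exact sampling locations instead of being snapped to the feature-map grid.
feature_map = tf.random.normal([1, 64, 64, 256])     # backbone feature map
boxes = tf.constant([[0.10, 0.20, 0.43, 0.57]])      # one RoI: [y1, x1, y2, x2]
box_indices = tf.constant([0])                       # batch element per box
pooled = tf.image.crop_and_resize(feature_map, boxes, box_indices,
                                  crop_size=[7, 7], method='bilinear')
print(pooled.shape)  # (1, 7, 7, 256): fixed-size features for the RoI heads
```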

ML-based approaches work exceptionally well when the training and testing data are drawn from the same feature space74. In circumstances where data are sparse, transfer learning can transfer knowledge from one domain to a task within an independent domain74. However, it is imperative to ensure that both domains are related to avoid negative transfer. Transfer learning has been extensively implemented by data scientists for tasks such as text classification75,76,77. To our knowledge, however, this was the first study to investigate the use of transfer learning to segment tumor nuclei of breast tissue exclusively. Currently, there are few open-source annotated breast tissue datasets (Table 1); it is therefore essential to identify whether DCNNs can leverage transfer learning from open-source datasets to accurately segment breast nuclei. Previous studies34,35,78 have used the MoNuSeg and TNBC datasets to evaluate the segmentation accuracy of novel DCNNs. The MoNuSeg test dataset in these studies included fourteen tissue samples from seven organs, three of which (bladder, colon, stomach) were withheld from the training set. Specifically, Graham et al.78 compared a novel DCNN (HoVer-Net) to various other DCNNs. In their first experiment, Graham and colleagues trained the DCNNs using the MoNuSeg dataset; comparing the results of the current study to theirs, GB U-Net outperformed CellProfiler79 (AJI: 0.366), QuPath80 (AJI: 0.432), FCN827 (AJI: 0.281), FCN8 + WS (AJI: 0.429), SegNet81 (AJI: 0.377), SegNet + WS (AJI: 0.508), the deep contour-aware network (DCAN)82 (AJI: 0.525), and CNN316 (AJI: 0.508). In their second experiment, Graham and colleagues evaluated the DCNNs using the TNBC dataset; compared with those results, GB U-Net outperformed FCN8 + WS (AJI: 0.506), FCN8 (AJI: 0.281), Mask R-CNN (AJI: 0.529), DCAN (AJI: 0.537), Micro-Net83 (AJI: 0.531), and DIST19 (AJI: 0.523). Furthermore, Wang et al.34 evaluated various DCNNs trained using the MoNuSeg dataset; GB U-Net again outperformed FCN (AJI: 0.452), U-Net26 (AJI: 0.513), SegNet (AJI: 0.505), and DCAN (AJI: 0.518). Additionally, Liu et al.35 used thirteen images from the TNBC dataset to evaluate multiple DCNNs; GB U-Net outperformed GCN84 (AJI: 0.1907), Mask R-CNN (AJI: 0.5297), and pixel2pixel85 (AJI: 0.4760). However, novel DCNNs including HoVer-Net (AJI: 0.618 and 0.59), Bending Loss34 (AJI: 0.641), and panoptic segmentation35 (AJI: 0.5865) achieved a higher AJI than GB U-Net.

Many of the studies mentioned above featured DCNNs developed to provide precise segmentation of tumor nuclei, explicitly improving segmentation in areas of densely populated nuclei. Differentiation of individual nuclei is essential for spatial features; however, Boucheron et al.86 identified that, specific to tumor nuclei classification, perfectly segmented nuclei do not guarantee optimum classification accuracy. Although GB U-Net did not outperform all DCNNs, the results demonstrate that transfer learning can be implemented when training DCNNs to segment breast tumor nuclei if highly precise segmentations are not explicitly required.

A limitation of this study is the relatively small dataset used for training and testing, a consequence of the limited number of open-source datasets. Future work will involve expanding the number of images and types of tissue included in the dataset. Generative adversarial networks (GANs) are a promising approach to data synthesis, in which synthetic H&E stained histopathology images and corresponding ground truth masks can be generated from features learned from the training data87,88,89. GANs have the potential to significantly increase the size of histopathological training sets without requiring additional annotations from expert pathologists.

Conclusion

The study's objective was to determine whether transfer learning could be implemented with DCNNs to accurately segment tumor nuclei of breast tissue. This study introduced a novel ensemble network (GB U-Net), which scored highest in AJI and mAP. Mask R-CNN scored slightly lower in AJI and mAP than GB U-Net; however, one of the main differences between the networks was that Mask R-CNN displayed improved AP at higher IoU thresholds (greater than 0.65). Overall, this study demonstrated that DCNNs trained on images independent of breast tissue could accurately segment invasive carcinoma of the breast by implementing transfer learning. One of the main limitations facing DL-based researchers and data scientists is the limited availability of training and testing data. As it is impractical to expect expert pathologists to provide the time commitment required to produce large-scale histopathological datasets, various boosting and data synthesis options, such as transfer learning and GANs, should be further explored to augment clinical workflows and ultimately improve patient care.