PathoNet introduced as a deep neural network backend for evaluation of Ki-67 and tumor-infiltrating lymphocytes in breast cancer

The nuclear protein Ki-67 and tumor-infiltrating lymphocytes (TILs) have been introduced as prognostic factors for predicting both tumor progression and probable response to chemotherapy. The value of the Ki-67 index and TILs in the approach to heterogeneous tumors such as breast cancer (BC), the most common cancer in women worldwide, has been highlighted in the literature. Because estimation of both factors depends on professional pathologists' observation and is subject to inter-individual variation, automated methods using machine learning, specifically approaches based on deep learning, have attracted attention. Yet, deep learning methods require considerable annotated data. In the absence of publicly available benchmarks for BC Ki-67 cell detection and further annotated classification of cells, in this study we propose SHIDC-BC-Ki-67 as a dataset for the aforementioned purpose. We also introduce a novel pipeline and backend for estimating Ki-67 expression and simultaneously determining the intratumoral TILs score in breast cancer cells. Further, we show that despite the challenges our proposed model has encountered, our proposed backend, PathoNet, outperforms the state-of-the-art methods proposed to date with regard to the harmonic mean measure acquired. The dataset is publicly available at http://shiraz-hidc.com and all experiment code is published at https://github.com/SHIDCenter/PathoNet.


A.1 Imaging Setup
The camera sensor used for imaging has the following specifications: Aptina 1/2.3-inch color CMOS with a resolution of 4912 × 3684 pixels (18 megapixels) and a pixel size of about 1.25 µm × 1.25 µm. Complementary details for this camera are as follows: SNR: 36.3 dB; dynamic range: 65.8 dB; frame rate: 5.6 FPS at 4912 × 3684, 18.1 FPS at 2456 × 1842, 32.2 FPS at 1228 × 922.

A.2 Data Label Representation
The motivation behind using a distribution over the center pixel as ground truth (GT) is to prevent the data from becoming imbalanced. This motivation is further investigated in Table 1, which shows the effect of label type on the models' performance.
To obtain the GT, after experts determine a center pixel, a Gaussian distribution with a variance of 9 pixels and a maximum value of 255 is fitted at that center. Figure 1 shows the discrete Gaussian distribution used in this study.
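The label-rendering step above can be sketched as follows. This is a minimal illustration, not the released PathoNet code; the function name is ours, and the handling of overlapping cells (keeping the stronger peak) is an assumption.

```python
import numpy as np

def gaussian_label_map(centers, shape, variance=9.0, peak=255.0):
    """Render a density-map label: one 2-D Gaussian per annotated cell
    center, with variance 9 px and maximum value 255 as described above.
    `centers` is a list of (row, col) expert annotations."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    label = np.zeros(shape, dtype=np.float32)
    for r, c in centers:
        g = peak * np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * variance))
        label = np.maximum(label, g)  # overlapping cells: keep the stronger peak
    return label

lbl = gaussian_label_map([(10, 10)], (32, 32))
```

The map peaks at exactly 255 on each annotated center and decays smoothly around it, so far fewer pixels are exactly zero-valued background than with single-pixel dot labels.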

A.3 SHIDC-Lab Software
In this study, experts used SHIDC-Lab to annotate the images. SHIDC-Lab is designed to provide flexibility, speed, easy monitoring, and revision ability, and it runs on any device with a web browser so that experts can perform labeling in their spare time. In SHIDC-Lab, the process starts by uploading images to the server and defining annotation classes with their corresponding signs and colors; in our case, immunopositive, immunonegative, and TIL types are defined. Afterward, the head expert assigns images to each expert for labeling. Finally, the head expert revises the annotations and marks images as accepted or re-assigns them for another round of labeling.

A.4 Pre-processing and training
In order to prepare the data, we first crop each 4912 × 3684 image and its corresponding label into non-overlapping 1228 × 1228 regions. Next, the cropped images are resized to 256 × 256 pixels. The dataset is then split into two parts: 70% of the data for training and 30% for testing. Further, data augmentation is performed by flipping images with respect to the x and y axes as well as by 90-, 180-, and 270-degree rotations. After augmentation, the final data comprises 8,280 images and 566,130 cells.
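The cropping and augmentation steps above can be sketched as follows. This is a simplified illustration with our own function names; the resize to 256 × 256 (e.g. via OpenCV or Pillow) is omitted, and the exact set of augmented variants kept by the authors is an assumption.

```python
import numpy as np

def crop_tiles(img, tile=1228):
    """Split an image into non-overlapping tile x tile regions;
    a 4912 x 3684 slide yields 4 x 3 = 12 tiles."""
    h, w = img.shape[:2]
    return [img[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]

def augment(img):
    """Flips about both axes plus 90/180/270-degree rotations,
    as described above (same transforms are applied to the label)."""
    return [img,
            np.flipud(img), np.fliplr(img),
            np.rot90(img, 1), np.rot90(img, 2), np.rot90(img, 3)]

tiles = crop_tiles(np.zeros((3684, 4912, 3), dtype=np.uint8))
```

Each geometric transform must be applied identically to the image and its density-map label so that annotated centers stay aligned.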
To train PathoNet, a simple MSE loss function with the Adam optimizer is used. The learning rate is set to 0.0001 and decreases by a factor of 0.1 every ten epochs. The Keras framework was used to train the network on two NVIDIA GeForce GTX 1060 GPUs and an Intel Core i5-6400 processor.
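The learning-rate policy above is a standard step decay; a minimal sketch (our own function name, usable for instance with Keras' `LearningRateScheduler` callback) is:

```python
def lr_schedule(epoch, base_lr=1e-4, decay=0.1, step=10):
    """Step decay described above: start at 1e-4 and multiply
    the learning rate by 0.1 every ten epochs."""
    return base_lr * decay ** (epoch // step)
```

So epochs 0–9 train at 1e-4, epochs 10–19 at 1e-5, and so on.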

A.5 Threshold Tuning
As explained in the Methodology section, thresholding is applied to extract cell center pixels from the network's density map. To tune this threshold for each model, all values from 0 to 255 with a step size of 5 were evaluated in terms of the F1-score of the cell predictions, and the best threshold was picked for each model.

A.6 Generalization
U-Net performs well on the train and test+train sets but performs poorly on the test set. The results indicate that U-Net could not generalize well on this task, which may stem from 1) its high number of parameters or 2) its simple architecture compared to the rest of the models. The results of DeepLabv3-Xception put the first hypothesis aside, because DeepLabv3-Xception has approximately 32% more parameters than U-Net. We believe that U-Net is a powerful yet simple structure that needs modifications to be utilized in more complex tasks. In this study, by designing a U-Net architecture with residual and dilated inception modules, we demonstrated such a model with fewer parameters yet capable of generalizing to complex tasks.
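The threshold sweep described in Sec. A.5 can be sketched as below. This is a schematic, not the authors' implementation: `f1_fn` stands in for the paper's cell-level F1 scoring (which matches detected centers to annotated ones), and the function name is ours.

```python
import numpy as np

def tune_threshold(density_maps, gt_masks, f1_fn, step=5):
    """Sweep thresholds 0..255 in steps of 5 and keep the one that
    maximizes the mean F1-score of the resulting cell predictions.
    `f1_fn(pred, gt)` scores one thresholded map against ground truth."""
    best_t, best_f1 = 0, -1.0
    for t in range(0, 256, step):
        f1 = float(np.mean([f1_fn(dm > t, gt)
                            for dm, gt in zip(density_maps, gt_masks)]))
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Since the sweep is over only 52 candidate values, an exhaustive search per model is cheap.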

A.7 Watershed Algorithm Parameters
In this study, the Watershed implementation from the scikit-image (skimage) library was used. To improve the Watershed algorithm's accuracy, the maxima of the density map are passed to the algorithm as initial seed points. As another parameter, the minimum distance between two maxima is set to 5.
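The seeding described above can be sketched with scikit-image as follows. This is an illustrative reconstruction, not the released code: the function name is ours, and inverting the density map (so that its maxima become the basins the watershed floods from) and masking to nonzero density are our assumptions.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def split_touching_cells(density, min_distance=5):
    """Seed the watershed with density-map maxima, enforcing a minimum
    distance of 5 px between two maxima as described above."""
    coords = peak_local_max(density, min_distance=min_distance)
    seeds = np.zeros(density.shape, dtype=bool)
    seeds[tuple(coords.T)] = True
    markers, _ = ndi.label(seeds)  # one integer label per seed
    # Flood the inverted map so each density peak becomes a basin.
    return watershed(-density, markers, mask=density > 0)
```

The `min_distance` parameter suppresses spurious double seeds inside a single nucleus, while separate nuclei whose density blobs touch are still split into distinct labels.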