Quantifying the cell morphology and predicting biological behavior of signet ring cell carcinoma using deep learning

Signet ring cell carcinoma (SRCC) is a malignant tumor of the digestive system. This tumor has long been considered to be poorly differentiated and highly invasive because it has a higher rate of metastasis than well-differentiated adenocarcinoma. But some studies in recent years have shown that the prognosis of some SRCC is more favorable than other poorly differentiated adenocarcinomas, which suggests that SRCC has different degrees of biological behavior. Therefore, we need to find a histological stratification that can predict the biological behavior of SRCC. Some studies indicate that the morphological status of cells can be linked to the invasiveness potential of cells, however, the traditional histopathological examination can not objectively define and evaluate them. Recent improvements in biomedical image analysis using deep learning (DL) based neural networks could be exploited to identify and analyze SRCC. In this study, we used DL to identify each cancer cell of SRCC in whole slide images (WSIs) and quantify their morphological characteristics and atypia. Our results show that the biological behavior of SRCC can be predicted by quantifying the morphology of cancer cells by DL. This technique could be used to predict the biological behavior and may change the stratified treatment of SRCC.

www.nature.com/scientificreports/ reveal important information to render an accurate diagnosis and thus, more accurate treatment strategies for the treatment of SRCC.

Results
Overview of the datasets. A total of 607 whole-slide images (WSIs) extracted from high resolution were analyzed. More than 29 million cells were quantified by the deep-learning framework we developed (Fig. 1). Four inherent cell properties were measured: average cell area (SC), nucleus area (SN), ellipticity (EP), and nuclear-plasma ratio (NCR). The standard deviation of each term is used to represent the atypia. According to the primary site, these sections were divided into stomach (259, 42.7%) and colorectum (348, 56.8%). Except for the primary tumor (87/259, 33.6%; 218/348, 62.6%), we also analyzed the metastatic regional lymph nodes (74/259, 28 Biological behavior and cell morphology of gastric SRCC . Gastric SRCC is a special entity of poorly differentiated adenocarcinoma in the pathological classification of gastric cancer. The prognosis of gastric SRCC in different TNM stages is significantly different: early gastric SRCC has a low rate of lymph node metastasis and a better prognosis than other types of early gastric cancer, while advanced gastric SRCC has high invasion and metastatic ability and poor prognosis [1][2][3] . Early gastric SRCC can be cured by endoscopic submucosal dissection (ESD) or endoscopic mucosal resection (EMR) to reduce postoperative complications 12 . Here we analyze the condition of gastric SRCC. This part of the analysis cohort consists of 87 WSIs, which can be divided into either lymph node involvement or not, or T1 ~ T4 according to the depth of invasion according to the current AJCC TNM staging system (Supplementary  13 conclusion that the shape of nucleus can encode prognostic information for different types of cancer. Lymph node metastasis is required to assess in pathologic staging of gastric SRCC for its prognostic value of postoperative patients. However, most studies believe that there is no reliable approach to predict gastric SRCC lymph node metastasis before the operation, even with the tissues from sentinel node biopsy 14 , which becomes one of the reasons for overtreatment. Our study revealed that there was a statistically significant correlation between the ellipticity difference and lymph node metastasis in gastric SRCC. The cells in the lymph node-positive group tend to have a larger EP SD than the lymph node-negative group (0.1013:0.0991, p = 0.045) (Supplementary Table 2), which means that their shapes are more irregular. These results suggest that it is possible to predict lymph node metastasis by cell morphology, which depends on the physical properties of the cell membrane, which has been proved to be related to cancer metastasis in some studies. Accurate prediction of lymph node metastasis in patients with gastric SRCC is of great significance for effective clinical treatment and ensuring a better prognosis.
Lymph node involvement and cell morphology of colorectal SRCC . Colorectal SRCC is a rare subtype of colorectal cancer with distinctive molecular characteristics, including a low incidence of KRAS, PIK3CA, and APC mutations 1 . A study shows that diffuse infiltrating colorectal SRCC with little or no extracellular mucin is more invasive and has a worse prognosis than mucin-rich SRCC, which is often accompanied by peritoneal dissemination 15 . This suggests that there is a certain relationship between the morphology of tumor cells in colorectal SRCC and their biological behavior. Compared with gastric SRCC, colorectal SRCC exhibit more aggressive biological behavior and patients were often diagnosed with neoplasms of greater size, which explained the limited number of T1 and T2 cases in our cohort, hence we only analyze the status of lymph node involvement of colorectal SRCC. This part of the analysis cohort consists of 218 HE stained WSIs taken from primary lesions of colorectal SRCC, which are either negative or exhibit metastases in sentinel lymph nodes.
Compared with gastric SRCC, colorectal SRCC has larger SC (p < 0.001) and SN (p = 0.002), while the NCR was lower (p = 0.003). It is worth noting that atypia of colorectal SRCC is more obvious than gastric SRCC which is mainly reflected in SC SD (p < 0.001), SN SD (p < 0.001), and EP SD (p = 0.031) (Supplementary Table 3). The results are consistent with the objective fact that the prognosis of colorectal SRCC is worse, and further explain its aggressive biological behavior.
In addition, our results further confirmed some independent significant predictors of lymph node metastasis in colorectal SRCC. We found the SC (p = 0.001), SN (p = 0.001) and EP (p = 0.001) were the most correlated inherent properties with lymph node involvement (Supplementary Table 4). The cases with lymph node involvement have bigger cell and nuclear cross-sectional area than the rests, while the cell ellipticity is relatively smaller. SC SD (p = 0.001), SN SD (p = 0.001) and are the most related parameters of atypia with lymph node metastasis. Moreover, EP (HR 2.836, 95%CI 0.224-35.906, p = 0.002) and SN SD (HR 22.672, 95%CI 2.808-183.03, p = 0.001) were identified as independent significant predictors of lymph node metastasis in colorectal SRCC (Fig. 4). Obviously, in cases with lymph node involvement, the atypia of cancer cells is greater in all three categories. When colorectal SRCC lymph nodes are involved, the tumor cells have greater sizes with enlarged, irregular-shaped www.nature.com/scientificreports/

Different forms of metastatic lesions.
It is well known that in addition to the higher rate of lymph node metastasis, SRCC also has several other paths of metastasis. Peritoneal dissemination is a classic metastasis form of gastrointestinal cancer. The "seed and soil" theory has been established as the basic theory 17 . Besides, about 50% of the metastatic tumors of the ovary are Krukenberg tumors, most of them came from SRCC 18 . Gastric SRCC is the most common primary tumor, followed by colorectal cancer. It is generally believed that the tumor cells in the metastatic lesion are one or more subclones in the primary lesion which is competent to overcome the metastatic barriers 16 . To metastasize, a cell must overcome multiple obstacles in the metastatic cascade-invasion and migration through the stromal, vessels, survival from shear forces of blood flow, successful re-attachment to blood vessel walls, and thus settle down into another place 17 . Different metastatic lesions from the same tumor need to face distinct metastatic barriers, its internal molecular changes are diverse, and the prognostic significance is also different-these are directly associated with the cellular physical properties, such as the formation and destruction of the cytoskeleton 19 , the process of epithelial-mesenchymal transition 20 , or the effect of extracellular matrix on cells 21 , which are all directly related to cell morphology. Therefore, to explore the differences among metastatic lesions, we analyzed the metastatic regional lymph nodes, peritoneal implants, and Krukenburg tumors by DL. The differences in cellular properties and atypia among the three categories are listed in Supplementary Table 5.
In terms of SC and its atypia, peritoneal implants were the largest, Krukenburg tumors were the second, and regional lymph nodes were the smallest (p < 0.01). Metastatic regional lymph nodes are so different from peritoneal implants that all four cellular properties show a significant difference. Peritoneal implants have bigger SC (p < 0.001) and SN (p < 0.001) than regional lymph nodes and smaller EP (p = 0.049) and NCR (p = 0.012). Krukenburg tumor shows less atypia than regional lymph nodes (p = 0.031) and peritoneal implants (p = 0.011) in the degree of EP, and less atypia than peritoneal implants in the degree of SN (p = 0.016). The atypia of NCR is more obvious than peritoneal implants (p = 0.02). It is easy to notice that the atypia of peritoneal implants is the most prominent, which may mean that there are relatively fewer barriers to metastasis to the peritoneal, and more tumor subclones will be able to meet this condition. Moreover, compared to regional lymph nodes, the difference in cell size of Krukenburg tumors is more obvious, but the shape is relatively more regular. This may indicate that more consideration should be given to whether these genes are located in the molecular pathways that play a major role in cell growth when we look for genes that play an important role in metastasis to the ovary. While the genes that play a more important role in lymph node metastasis may regulate the physical properties of cell membrane fluid 22 , affect the cytoskeletal rearrangements within cancer cells 23 , and drive their invasion and migration through the stroma 24 .
The results of this experiment showed that there were significant differences in cell properties and atypia among metastatic regional lymph nodes, peritoneal implants, and Krukenburg tumors, and further confirmed that different forms of metastatic lesions represented different biological events. A study has analyzed the whole gene expression of gastric cancer cells with different metastatic potential 25 . The results suggest that there is a www.nature.com/scientificreports/ significant difference in gene expression between gastric cancer cells with abdominal metastasis and lymph node metastasis. The same result has also been confirmed by experiments in other common tumors, such as breast cancer and pancreas cancer 26 . These studies are carried out on cell lineages, and the biggest limitation is that they can not reflect the most real biological behavior of tumors objectively in a variety of complex body environments. However, our study makes up for the above regret.

Discussion
Whether SRCC is an independent tumor entity has been questioned for a long, because of its vague definition and variable prognosis. It is challenging to predict biological behavior and formulate accurate treatment due to the limitations of traditional pathological examination [1][2][3]14 . However, pathologists could only visually detect signet ring cells under the microscope, and carry out limited qualitative analysis and inaccurate quantitative analysis. Several deep learning models for the diagnose of gastrointestinal tract cancers based on WSIs have been reported. The advance functions including automated histological classification 4,26,27 , immunohisto-chemistry results interpretation 28 , or even microsatellite instability prediction 29 . However, these studies are based on histology and do not perform the identification of individual tumor cells. Moreover, all of these studies acknowledge that there are some deficiencies in their diagnosis of poorly differentiated cancers, including SRCC, because they do not have complex histological structures such as glands or papillaries that can be identified in well-differentiated adenocarcinomas. For the first time, we used DL to analyze the morphology of gastrointestinal SRCC and its different metastatic lesions (lymph nodes, peritoneal implants, and Kukenburg tumors) and accurately measure the inherent properties of signet ring cells, including the cross-sectional area of cell plasma and nuclear, the cell ellipticity, as well as the nuclear/cytoplasmic ratio. We analyzed these inherent properties and used them to define different dimensions of cell atypia. As is known to all, cell morphology is the macroscopic expression of the final coding proteins by the tumor genome 30 . Compared with the analysis of some specific genes, cell morphology is relatively easier to observe and measure 31 . A tumor with greater heterogeneity, or an "ugly tumor", usually has a worse prognosis, which has been widely recognized by pathologists. The measurement of intra-tumor heterogeneity can be used as a biomarker to predict treatment and improve outcome 32 , while the same picture can also be captured by DL to achieve a better prediction effect 33,34 . Compared with previous studies on SRCC, our results quantify the specific manifestations of atypia, or how ugly the tumor is, in more detail, and clarify that cell morphology reflects the outcome of the tumor objectively. Our results provide new insight into the biological processes involved in disease etiology and can be used as biomarkers for the diagnosis or prognosis of SRCC.
Our research has the potential to be further developed. The research was conducted by Sundar et al. 35 on primary gastric cancer intratumoral heterogeneity has revealed the significant differences in gene expression in tumor superficial areas, deep areas, and regional lymph node from the gene level, which may affect the treatment. Therefore, we're indicated to concentrate our future work on evaluating the multi-dimensional information of different subregions in the primary lesion and develop appropriate segmentation algorithms to confirm the impact of temporal and spatial heterogeneity of the tumor at the cell level. In addition, cell morphology reflects the complex genomic and gene expression changes of cancer cells, and the measurement of morphological heterogeneity combined with functional spectrum can be a powerful, high-throughput, economical and effective means to diagnose and guide treatment. Therefore, we can further explore and clarify the relationship between the morphology and molecular changes of SRCC tumor cells. In addition, we will try to establish a prognosis cohort of different patients from different regions to confirm whether our algorithm can predict prognosis or guide treatment as good as or better than the existing tumor staging system.
As more and more data being available and algorithms become perfect, we hope that our results can be combined with biopsy, pathological examination, gene sequencing, and other means of testing methods, which will contribute to more effective stratified treatment, survival prediction, and patient management, and improve the treatment decisions and outcomes of SRCC patients.

Methods
Datasets information. Each case of our data set was prepared in the Department of Pathology, Ruijin Hospital, Shanghai Jiaotong University School of Medicine in 2014-2020. The WSIs were produced at × 40 magnification (0.238 μm/pixel) by the National Medical Products Administration-cleared KFBio KF-PRO-005 digital scanner. All of the sections were evaluated by two pathologists and reviewed by a senior pathologist through a standardized procedure. We adopt the 4th edition of the WHO Classification of Tumors of the Digestive System as the reference standard. The patients who received neoadjuvant chemotherapy were excluded to eliminate the effect of treatment on cell morphology.
For gastric and colorectal SRCC, we selected 1-4 HE sections of each case, which was determined by the number of tumor cells at the time of initial evaluation. In order to ensure the stability of the test, sections containing at least 500 signet ring cells will be selected. We also selected all the Krukenburg tumors and peritoneal implants of SRCC in our case bank over the past six years. Cases with poor image quality were excluded. Additional clinical information was listed (Supplementary Table 1 www.nature.com/scientificreports/ with 95% CI of patients with lymph node involvement in colorectal SRCC. Youden index was used in calculating its corresponding optimal cutoff point. Signet ring cell and nuclei segmentation. The deep learning methodology was conducted by Pytorch (Release 1.1.0) to perform Signet Ring Cell and Nuclei segmentation. The architecture of the deep learning model is called Deep Layer Aggregation 9 , a variant to UNet, which is an elaborately designed connection manner of convolution modules to assemble image features from various scales. A coarse UNet was established to find an approximate region that contains Signet Ring Cell at X10 magnification, to swiftly discard most of the benign parts in whole slide images. We developed another fine UNet to carry out four class segmentation for the constant size of an image input in 512 pixels height and width, at X40 magnification, who classifies each pixel to the background, cell, nuclei, or instance boundary. To extend the model robustness on the various source of whole slide images, while it is unattainable to make pixel-level annotations to numerous images, a semi-supervised learning methodology was conducted as described in Li et al. 11 , for models running on X40 magnification. For each independent Signet Ring Cell, the nucleus which has the largest intersection area is assigned to this cell. Then those morphological features of each cell could be easily obtained based on precise cell and nuclei mask, with the help of Opencv (2.4.9) and Scikit-Image (0.17.2). Based on thousands of cells will we have statistical results for each whole slide image for analysis. In details, based on the exact mask of cells and nuclei.
Cell Area is determined as the total number of pixels of Signet Ring Cell instance mask, Nucleus Area is the number of pixels of Nucleus instance mask. The aspect ratio of the minimum area rotated bounding box of each SRCC instance mask is called Cell Aspect Ratio. Nuclear Cytoplasmic Ratio is defined as the Nucleus Area divided by Cell Area.
Signet ring cell detection algorithms were developed by Pytorch1.1.0 and Python3.6.1. Thumbnail image and patches at 0.25 μm are extracted from whole slide image by Openslide3.4.1. After detecting cells at 0.25 μm, we downscale cell coordinates into a mask whose size is same as thumbnail image to generate gaussian map, pixel values near by cell coordinates are higher. With such gaussian map we could generate heatmap, then merge together with the thumbnail image to be those in Fig. 2.
Inference status. Previously as described in our former work 11 , during three folds cross-validation, there exists obvious scores degrading from easy (Ins Recall 0.705, Nor FPs 1.45, Ins FROC 0.692) separation to hard (Ins Recall 0.658, Nor FPs 0.943, Ins FROC 0.657) separation for not enough training data. In this paper we add extra 1000 unlabeled patches containing crowded signet ring cell and 2000 benign patches into the cooperative training procedure, leading to similar scores between easy (Ins Recall 0.713, Nor FPs 0.912, Ins FROC 0.702) and hard (Ins Recall 0.709, Nor FPs 0.904, Ins FROC 0.695) mode. With similar scores between the two data separation modes, we believe the model generalization could be enough for new whole slide images. By introducing a coarse signet ring cell detection model on X10 magnification to discard most of the benign regions, averagely for each whole slide image inference time is reduced from 20 to 3 min, compared to predicting all the areas on × 40 magnification.
#FROC: by adjusting the confidence threshold, when the number of normal region's false positives is 1, 2, 4, 8, 16, 32, the FROC is the average of relevant recall at those confidence thresholds. Ins Recall: instance-level recall, Nor FPs: normal region false positives.
Ethical statements. This study did not involve human trials or participants, so it did not require approval from ethical committee.