Identification of metastatic primary cutaneous squamous cell carcinoma utilizing artificial intelligence analysis of whole slide images

Cutaneous squamous cell carcinoma (cSCC) harbors metastatic potential and causes mortality. However, clinical assessment of metastasis risk is challenging. We approached this challenge by harnessing artificial intelligence (AI) algorithm to identify metastatic primary cSCCs. Residual neural network-architectures were trained with cross-validation to identify metastatic tumors on clinician annotated, hematoxylin and eosin-stained whole slide images representing primary non-metastatic and metastatic cSCCs (n = 104). Metastatic primary tumors were divided into two subgroups, which metastasize rapidly (≤ 180 days) (n = 22) or slowly (> 180 days) (n = 23) after primary tumor detection. Final model was able to predict whether primary tumor was non-metastatic or rapidly metastatic with slide-level area under the receiver operating characteristic curve (AUROC) of 0.747. Furthermore, risk factor (RF) model including prediction by AI, Clark’s level and tumor diameter provided higher AUROC (0.917) than other RF models and predicted high 5-year disease specific survival (DSS) for patients with cSCC with 0 or 1 RFs (100% and 95.7%) and poor DSS for patients with cSCCs with 2 or 3 RFs (41.7% and 40.0%). These results indicate, that AI recognizes unknown morphological features associated with metastasis and may provide added value to clinical assessment of metastasis risk and prognosis of primary cSCC.


Results
Performance of AI-models. For testing of AI-models we utilized input data from 45 whole slide images (WSIs) representing mcSCCs and 59 WSIs representing non-mcSCCs. For rapid metastasis -AI-model, WSIs of a subcohort of 22 rapidly (≤ 180 days) metastatic mcSCCs was used. Tumor characteristics of the rapid metastasis -AI-model are shown in Table 1.
In single tile -AI-model, a slide level area under the receiver operating characteristic curve (AUROC) of 0.689 was reached at best. However, the model was dramatically overfitting to the training data despite heavy regulation and data augmentation and ultimately the model could not reliably reproduce the results between different sampling of folds. Invasive front -AI-model produced results inferior to the single tile -AI-model with tile level AUROC of 0.629 at best, regardless of whether tiles inside the annotation, outside the annotation or both were included. Multi-tile -AI-model did not produce more convincing results than the previous approaches (AUROC 0.672 at best) on stack tile level. This may be due to the fact that using n tiles as input effectively reduced the size of the dataset by n-fold, since we retained from showing a tile in different stacks multiple times.
Rapid metastasis -AI-model generated most convincing results from the beginning. Finally, AUROCs of 0.754-0.814 were reached on tile level depending on the fold (Fig. 1A). Furthermore, an average AUROC of 0.747 was achieved on slide level (Fig. 1B). Slide level results visualizing summary confusion matrices show that sensitivity of the rapid metastasis -AI-model was 64%, specificity 76% and accuracy 73% ( Supplementary Fig. 1). This rapid metastasis -AI-model was utilized in the creation of RFMs and survival analyses.
Risk factor analysis and evaluation by dermatopathologist. Next, we evaluated whether traditional histopathological features could predict the risk of metastasis better than the rapid metastasis -AI-model and whether the predictive power of the model could be explained by histopathological features. Pearson correlation was conducted taking into account every variable in Table 1. The highest correlation of AI prediction (0.329) was with AJCC-8 tumor staging and second highest (0.256) with BWH tumor staging (Supplementary Table 1). In comparison, Pearson correlations regarding prediction by pathologist were higher with highest correlation of 0.494 with BWH staging system and second highest (0.415) with Clark's level (Supplementary Table 1). Thus, it can be concluded that AI prediction did not strongly rely on any of the classical clinicopathological variables but was based on other morphological features of the tumor.
Logistic regression analysis was conducted and AUROCs created to evaluate the metastasis risk for each variable and to estimate the classification power of the rapid-metastasis -AI-model. Rapid metastasis -AI-model provided a slide level AUROC of 0.747 and an odds ratio (OR) of 5.63. In comparison, pathologist reached an AUROC of 0.694 and an OR of 5.71. Clark's level provided higher OR (13.75) and AUROC (0.788) than invasion beyond fat, and diameter provided highest AUROC of 0.804 of the individual clinicopathological variables. AJCC-8 and BWH tumor staging systems provided slightly higher AUROCs (0.816 and 0.818, respectively), but in logistic regression analysis the increase of risk was non-linear (Table 2).
Next, it was evaluated, whether prediction as metastatic by AI could act as an individual risk factor in multifactorial risk factor model (RFM) and improve the clinical risk assessment. For comparison, a RFM (conventional-RFM) taking into account tumor diameter ≥ 30 mm and Clark's level 5 as risk factors provided higher AUROC (0.862) than AJCC-8 or BWH staging systems. Furthermore, a RFM containing prediction by AI as metastatic and Clark's level 5 as risk factors provided even higher AUROC of 0.872. Finally, a RFM including prediction by AI as metastatic, Clark's level 5 and diameter ≥ 30 mm as risk factors (AI-RFM) produced an AUROC of 0.917. Similar RFMs that took into account prediction by pathologist instead of AI resulted in lower AUROCs of 0.807 and 0.841 respectively ( Survival analyses. Survival analyses were conducted to further evaluate the discriminative power of AI and AI-RFM from prognostic point of view. First, survival between slow and rapid metastasis cohorts, as well as non-metastatic cohort was analyzed. Survival was higher in slow metastasis cohort up to approximately 4 years until slow metastasis cohort reached the level of rapid metastasis cohort with 50% of patients alive at 1.2 years for rapid metastasis and 3.4 years for slow metastasis cohort, when OS was considered and 1.3 and 4.1 years, respectively, when disease-specific survival (DSS) was considered ( Supplementary Fig. 2). This notion emphasizes the importance of identifying especially mcSCCs, which metastasize rapidly. Next, the prognostic power of discrimination by both AI and dermatopathologist was analyzed. AI and pathologist provided nearly similar OS and DSS prediction ( Fig. 2A,B).
In order to elucidate the feasibility of AI-RFM, the prognostic power of conventional histopathologic parameters was evaluated and visualized. Notably, the survival prediction by histological grade was inferior to diameter and Clark's level, especially with respect to DSS (Fig. 3A,B). These observations support the inclusion of Clark's level 5 and diameter ≥ 30 mm in AI-RFM as risk factors for prediction of metastasis in combination with AI ( Fig. 3C,D). The AI-RFM provided survival prediction with excellent prognosis for patients with cSCC with 0 (5-year DSS estimate of 100%) or 1 (5-year DSS estimate of 95.7%) risk factors and poor prognosis for patients with cSCC with 2 (5-year DSS estimate of 41.7%) or 3 (5-year DSS estimate of 40.0%) risk factors (Figs. 3D, 4B,D). The discriminative power of AI-RFM was superior to conventional-RFM which included diameter ≥ 30 mm and Clark's level 5 as risk factors with respect to DSS (Fig. 3D). Furthermore, the AI-RFM was compared to BWH tumor staging (Fig. 4A,B) and to the comparative RFM utilizing prediction by pathologist instead of AI (pathologist-RFM) (Fig. 4C,D). When DSS was considered the discriminative power of AI-RFM was superior to both BWH tumor staging (Fig. 4B) and pathologist-RFM (Fig. 4D). Notably, although the survival predictions by both AI and pathologist alone were almost identical ( Fig. 2A,B), the AI-RFM was superior to pathologist-RFM ( Fig. 4D) emphasizing the notion that the evaluation by AI is based on yet unestablished morphological features unlike the evaluation by pathologist. The superiority is based on the discrimination of cases by AI-RFM into good (0-1 risk factors) and poor prognosis (2-3 risk factors) and the lack of "grey zone" which turns out to be a problem with comparative RFMs and BWH staging.

Discussion
To examine the ability of AI to recognize primary cSCCs that will develop metastasis, several approaches were tested with cross-validation. This is a clinically challenging task as to date there are no established biomarkers to predict the risk of metastasis or prognosis of primary cSCC. Conventional clinicopathological features are utilized in tumor staging systems, but these are unsatisfactory in predicting the risk of metastasis 7 .
Our results show that an experienced dermatopathologist and AI with limited training and testing data perform quite similarly in order to distinguish rapidly metastatic primary cSCCs from non-metastatic cSCCs. www.nature.com/scientificreports/ Although pathologists in clinical practice do not directly evaluate the risk of metastasis of primary cSCC, here the dermatopathologist was requested to do so. Notably, AI and pathologist had only one WSI available for analysis and no access to whole tissue material of the tumor or to clinical information from which the actual diameter or invasion depth of the whole tumor could have been deduced. By Pearson correlation (Supplementary Table 1) it appears that prediction by pathologist is based more on conventional histopathological features, such as invasion depth than the prediction by AI. This notion is further supported by similar survival prediction performance by AI and pathologist alone ( Fig. 2A,B), in comparison to superior discriminative performance of AI-RFM to pathologist-RFM (Fig. 4D). These findings provide evidence, that the prediction by AI is based on yet unestablished morphological features or feature combinations. Additionally, these observations support our hypothesis, that these morphologic features appear in primary tumors temporally close to the time of metastasis. The results obtained with the AI-RFM generated in this study show, that inclusion of AI algorithm in RFM improved the assessment of cSCC metastasis risk, as shown by AUROC of 0.917, which was superior to other RFMs and staging systems tested. Furthermore, the AI-RFM clearly differentiated cases with 0 or 1 risk factors from cases with 2 or 3 risk factors with respect to DSS prediction. This observation indicates that primary cSCCs with ≤ 1 risk factor should be considered as cases with low risk for metastasis and cSCCs with ≥ 2 risk factors as cases with high risk for metastasis when utilizing AI-RFM generated in this study.
To date, there are few published studies with similar study design in cancer field and no studies on metastasis risk of cutaneous cancers. A leave-one-out cross-validation accuracy of 80.77% was lately reported when predicting the risk of metastasis in pancreatic neuroendocrine tumors 21 . Furthermore, AUROC of 0.68 was achieved in prediction of lymph node metastasis of prostate cancer from primary tumor tissue utilizing CNN 22 . A third study harnessed CNN to predict lung cancer recurrence and metastasis from histopathological images and reached an AUROC of 0.79 23 . Our results provide evidence, that the AI prediction exploiting multifactorial RFM appears to be a more encouraging approach to the prediction of metastasis risk by AI.
It is conceivable, that further studies on AI and metastasis risk of primary cSCCs with larger cohorts for training and testing are warranted. It would be advisable to use tissue specimens scanned on same occasion or sufficiently large, preferably equally sized cohorts of samples scanned on different dates. Same approach is  www.nature.com/scientificreports/ also recommended with usage of different scanners. Based on our findings, small subcohorts scanned with different scanners or on another date can generate bias and therefore identical scanning settings should be used. Furthermore, with larger cohorts it would be useful to further analyze the ability of AI algorithm to recognize both rapid and slow metastasis cases as well as biopsies and resections. Unfortunately, in this study with limited tissue specimens further analysis of the impact of above mentioned qualities as well as analysis of tissue samples at different stages of tumor progression could not be executed. In summary, our results provide proof of concept, that there are certain yet unknown morphological features or feature combinations, which associate with the risk of cSCC metastasis and can be recognized by AI. Further studies are warranted in order to unveil these and to further develop AI algorithm as prognostic tool in     www.nature.com/scientificreports/ the analyses. mcSCCs were further divided into two subcohorts: one with primary mcSCCs, which developed metastasis within 180 days (rapid metastasis cohort) (n = 22) and another with primary mcSCCs that developed metastasis after 180 days from the initial diagnosis of primary tumor (slow metastasis cohort) (n = 23). Clinical, histopathological, follow-up and demographic data were gathered manually from patient records and pathology reports (Tables 1,2). All mcSCCs developed at least one nodal metastasis and part also distant metastases. In survival analyses death was the primary clinical endpoint. The exact cause of death was rarely reported and autopsies were infrequently performed, but the cause of death could be deduced from patient records with acceptable reliability by clinician (JSK). Both unambiguous OS and deduced DSS were utilized in survival analyses. Survival time was calculated from the date of initial diagnosis of the primary cSCC with 5 years of follow-up. Death represented the end of follow-up.
Tissue specimens (slides) were scanned with 3DHistech scanner (3DHistech, Budapest, Hungary) and annotated using CaseViewer (3DHistech, Budapest, Hungary). The WSIs representing cSCCs were manually annotated. Annotated area included whole tumor represented on the WSI, including tumor cells, intratumoral and peritumoral stroma, as well as intratumoral inflammatory cells (Supplementary Fig. 3). Additionally, manual www.nature.com/scientificreports/ exclusions were made if artefacts were detected in annotated tumor area. These annotated areas of the WSIs were used to train and validate the neural networks. As WSIs are too large to feed any ML model, the images were cut into 1024 × 1024 and 512 × 512 pixel tiles at the largest zoom level (20 ×) from the annotated tumor areas. The tissue content of the tiles was further analyzed by using a combination of binary and Otsu thresholding: a tile was considered valid if it contained more than 50% of tissue pixels as defined by the thresholding algorithm. Similarly, a tile was considered a tumor tile if it contained more than 50% pixels from the tumor annotation mask. All tiles inherited the cohort label from the original slide. Every tile was then resized into 299 × 299 pixels before training the algorithms.
Training setup. We analyzed the task of distinguishing mcSCCs from non-mcSCCs as a binary classification problem on the level of a single tumor tile. Each tumor tile was assigned to a single cohort and the binary classification problem was to classify the tiles into these two classes. As the data set is relatively small we approached the task with cross-validation, in which data is sectioned into subsets and one subset at a time is used for testing and rest for training. Thus, several models are trained and the performance of each model is tested with alternating validation subset.
Final reported results of the rapid metastasis -AI-model are based on the ResNet18 architecture with a custom head consisting of average pooling and two dense layers with heavy dropout to combat overfitting. More complex ResNet models (ResNet50) were also tested but were more prone to overfitting, which lead to the choice of a simpler model for the final rapid metastasis -AI-model. The models were trained using threefold cross-validation (3-CV) with different extraction tile sizes until the final rapid metastasis -AI-model in which fourfold cross-validation (4-CV) was used. The loss function used was the binary cross-entropy loss. In training, it was confirmed that the tiles from a given patient and slide were exclusively sampled into one of the crossvalidation-folds to prevent the leakage of information between the training and validation sets. With every fold the model was trained 10 or 20 epochs.
Visualization of the results. In addition to analyzing the cross-validation accuracy and loss, we created the out-of-the-fold (OOF) tile level receiver operating characteristic (ROC) curves and calculated area under the curve (AUC) values. The OOF tile level predictions were also mapped to slide level. Namely, each tile and the corresponding location in the WSI was assigned a prediction probability between 0 and 1 for developing a metastasis. This probability was scaled to [0,100] and then shifted to the interval [− 50, 50] for each tile. The scores on WSI were then spatially smoothed using a median filter with window size 2. This meant that a window of 2 × 2 tiles was moved along the slide and the tile value was replaced with the median of the tile values within the window. The idea was to remove noise according to our hypothesis, that the features representing metastasis risk would vary smoothly across the tissue slide. To date, there is no way to scientifically select a correct size for the smoothing window. Our selection was based on visualizations of the results. Probability maps of the annotated tumors were generated for metastasis score visualization (Fig. 5).
After the previous steps, where the OOF tile level predictions were accumulated to slide level, a simple majority vote of the scaled predictions was performed to determine the predicted label of the WSI and the tumor by assessing the mean slide level score. We took 0 as the decision threshold to discriminate between the low/high metastasis risk tumors and visualized the slide level results in ROC curves with AUC scores and summary confusion matrices. The workflow from WSI input to tile level result is visualized in Fig. 6. AI-models and hypotheses. In our initial approach we used single tile -AI-model, in which individual tiles served as input. Hypothetically, as most of the cells are the offspring of the cancerous cells and inherit the genetic alterations of the first generation of tumor cells, most tiles should represent the possible differences between the metastatic and non-metastatic tumors. It was noted that this classification algorithm is easy to implement and it is easy to aggregate and visualize the single tile predictions at slide level, but the model is prone to label noise. Another approach was to focus on the invasive front of the tumor (invasive front -AImodel). Hypothetically, metastatic characteristics of the tumor are more likely visualized in the invasive front and focusing on this area was expected to lead to less noisy training labels. Third approach was to use stack of tiles (multi-tile -AI-model) as an input data in order to reduce the noise in the labels. Instead of just one tile, randomly selected sample of stack of n tiles from the same WSI was used as the input of the classifier. In above mentioned approaches ResNet50 architecture was used with 3-CV and single tiles from annotated WSIs representing the two cohorts (primary mcSCCs and primary non-mcSCCs) were used as input data. In multi-tile -AI-model instead of single tile a stack of tiles and in invasive front -AI-model only tiles adjacent to the edge of the annotation (either inside and/or outside of the annotated area) from annotated WSIs were used as input data.
Due to previous notion of rapid development of metastasis 3 we subdivided the mcSCC cohort into cases that metastasize rapidly and slowly. Furthermore, we hypothesized that features characteristic for the metastatic tumor would be more prominent or more probably already present at the date of tissue specimen in primary tumors, which metastasize rapidly. Therefore, only tumors, which metastasized rapidly were utilized in the rapid metastasis -AI-model. Both stack of tiles and single tiles from annotated WSIs representing two cohorts (primary non-mcSCCs and primary rapid mcSCCs) were used as input data. Fourfold cross-validation and ResNet18 architecture were ultimately used. To further prevent overfitting into the more exclusive dataset, we used a "zoomed-in" approach in which 512 × 512 pixels instead of 1024 × 1024 pixels tiles were used.
Regarding technical execution of the study, during repeated runs of 3 × or 4 × CV analysis on all of the available slides for training, it seemed that depending on the choice of the CV split, the results varied considerably. Often it seemed that one fold was much more difficult for the model to analyze than the others, leading to considerably lower AUROC values. We took this as a sign that the dataset contained some examples that were www.nature.com/scientificreports/ difficult for the model to analyze, for example by being very atypical cases of rapid metastasis tumors. More careful analysis of the metadata revealed, however, that the difficult cases appeared in folds where the slide data had been scanned in 2020 instead of 2016. Most of the rapid metastasis cases were scanned earlier because various research projects have been conducted with the samples in the past. The scanner used was the same, but we hypothesized that the scanning software, image packing algorithms or the physical components of the scanner itself could have been updated affecting the image colors, noise patterns or other qualities. We used heavy color augmentations in training of the algorithm, but to study the possible effect of slides scanned at different times we tried and restricted the data to old samples only. This reduced the size of the dataset available, but was useful in ruling out at least one more possible source of error. This affected the AUROC results somewhat and brought the average AUROC value up.
Blinded assessment by pathologist. For comparison purposes, every WSI included in the final rapid metastasis -AI-model was analyzed by experienced dermatopathologist (LT). Only tissue sample ID and information, whether the specimen represented biopsy or resection was provided to the pathologist in accordance with access to CaseCenter folder including 81 WSIs representing non-mcSCCs and rapid metastasis mcSCCs without the knowledge of the proportion of cases. The histological criteria used by the pathologist for predicting Figure 5. Probability maps of annotated whole slide images analyzed by artificial intelligence (AI). The color in the probability map indicates the predicted metastasis score by AI on tile level in annotated tumor area. Red color represents high and blue color low score i.e. red color indicates tiles analyzed as metastatic and blue color tiles analyzed as non-metastatic by AI algorithm. White color on the edge of the slides represents excluded tissue outside manual annotations. (A) and (D) represent rapid metastasis and (B) and (C) non-metastatic cSCCs that were classified correctly on slide level by AI. www.nature.com/scientificreports/ metastasis in this series included invasion depth, histological grade, tumor size, perineural invasion and also the subjective interpretation of the pattern of invasion (large vs. small nests/single cells). Although for instance invasion depth is a part of staging systems, no staging system as such was used. The pathologist also had the option not to classify the specimen into the cohorts, if assessment was not possible, as was the case with biopsy samples.

Statistical analysis.
All statistical analyses were conducted using IBM SPSS Statistics for Windows, version 25.0 (IBM Corp., Armonk, NY, USA). Bidirectional p values < 0.05 and 95% confidence intervals (95% CIs) of odds ratios (ORs) not including 1.00 were considered statistically significant. Baseline tumor characteristics were analyzed using descriptive statistics mainly crosstabs and frequency tabulation. Statistical analyses were conducted with Pearson χ 2 test and Fisher's exact test. Binary logistic regression analyses with 95% CIs were performed in order to determine ORs regarding the risk of metastasis. For every risk factor and risk factor combination an AUROC was calculated in order to further visualize results in relation to AI prediction. Pearson correlation was conducted in order to visualize multicollinearity and to examine whether predictions correlated with some of the clinicopathological variables. The Kaplan-Meier method was applied to generate survival estimate curves and define survival probabilities.

Data availability
The data collected during this study is patient data obtained under Ethical Committees approval and cannot be shared.
Received: 27 January 2022; Accepted: 26 May 2022 Figure 6. The rapid metastasis -AI-model workflow. The whole slide images are divided into small tiles. The tiles are assigned the binary yes/no tumor labels based on the annotations. The tumorous tiles are further labeled based on the metadata to yes/no rapid metastasis. This is done for all of the WSI images. The ResNet-18 model is trained to classify the tiles according to the labels. Batches of tiles are fed to the model, which then learns to extract relevant visual features (feature encoding) from them and produce a classification. Finally, the confidence scores "P(metastatic)" are aggregated to produce whole slide level results.