Creating an atlas of normal tissue for pruning WSI patching through anomaly detection

Patching whole slide images (WSIs) is an important task in computational pathology. While most of them are designed to classify or detect the presence of pathological lesions in a WSI, the confounding role and redundant nature of normal histology are generally overlooked. In this paper, we propose and validate the concept of an “atlas of normal tissue” solely using samples of WSIs obtained from normal biopsies. Such atlases can be employed to eliminate normal fragments of tissue samples and hence increase the representativeness of the remaining patches. We tested our proposed method by establishing a normal atlas using 107 normal skin WSIs and demonstrated how established search engines like Yottixel can be improved. We used 553 WSIs of cutaneous squamous cell carcinoma to demonstrate the advantage. We also validated our method applied to an external dataset of 451 breast WSIs. The number of selected WSI patches was reduced by 30% to 50% after utilizing the proposed normal atlas while maintaining the same indexing and search performance in leave-one-patient-out validation for both datasets. We show that the proposed concept of establishing and using a normal atlas shows promise for unsupervised selection of the most representative patches of the abnormal WSI patches.


Introduction
Pathology is the field of studying the cause, development, and effects of diseases.This field involves examining sampled tissues, cells, or bodily fluids to diagnose pathologic disorders or help with determining the prognosis of a disorder's progression.Examining tissue samples of solid tumors, microscopes have been the most commonly used tool in this field before the introduction of digital scanners 1 to convert glass slides with mounted processed and fixed tissue samples into a digital image.This digitization introduced a new concept to the field commonly referred to as digital pathology 2 .While digital pathology is the general term used to describe integrating innovative digital tools into pathology, computational pathology describes a field that uses computational techniques and algorithms to analyze and interpret pathology data with the goal of improving the diagnosis, prognosis, and treatment of diseases 3 .
As tissue glass slides constitute a major source of information in pathology, whole slide imaging and whole slide image (WSI) analysis comprise a major and new subset of computational pathology.The current market of digital scanners is very diverse with most of the commercially available digital scanners being able to capture whole slide images at magnifications as high as 40X, generating extremely large image files 4 .While storing and leveraging the information in such a huge amount of data can be challenging, the stored retrospective data can be effectively used to introduce assistive tools into digital pathology increasing the speed and accuracy of pathologists hence reducing the workload.Artificial intelligence (AI) has shown promise in utilizing mass medical archives.
Deep learning uses artificial neural networks to learn complex patterns in any type of data 5 .In contrast to conventional machine learning which mainly uses structured data, e.g., tabular data, the so-called deep models (deep artificial neural networks) can take in unstructured data including text, sound, and image, without pre-processing or pre-extraction of features.This makes deep networks extremely useful in real-life applications, including medicine.Deep models are mainly composed of multiple layers of connected artificial neurons.The general workflow of training and using a deep model in a supervised manner includes presenting the model with enough samples so the model can be properly trained to perform the desired task on new cases 5 .
It has been shown that searching in an indexed dataset with image-level labels may provide a base for computational consensus [6][7][8] .An index dataset can be called an "atlas" of medical information where previously established knowledge is stored and can be retrieved for patient matching.In such an atlas, newly encountered cases, i.e., new patients, are indexed using the same methods and are matched against all the atlas cases to find the most similar patients.Diagnosis, staging, prognosis, and any other decision or prediction can be then inferred from the top similar atlas patients for which all outcomes are known.
Indexing visual content as generating a vector is the major arXiv:2310.03106v1[eess.IV] 4 Oct 2023 task for atlas creation.While deep networks are mainly used as classifiers to provide definite labels on the input data, it has been shown that the numerical representations of some pre-decision layers of the network, also called embeddings or deep features, contain important information about the content of the input data 9,10 .The concept of image retrieval in an atlas using deep features has been used in both radiology and pathology 7,8,11 .Any deep network can be used to obtain deep features, but the architecture of the network and the data it has been trained on can greatly affect the representativeness of the features for the task at hand.For instance, KimiaNet is a dense convolutional neural network fine-tuned using all diagnostic histopathology images from the TCGA repository 12 .Vision transformers have also shown a promising ability to capture the details in a given image 13,14 .
While deep networks are widely used for generating deep features from images, they also impose limitations regarding image size.State-of-the-art graphical processing units (GPUs) which are commonly used in deep learning can process up to a certain size of the image in each iteration.Therefore, any whole slide image (WSI) processing requires the image to be broken down into many smaller images, also known as patches 15 .While patching gigapixel WSIs is a crucial step in computational pathology, one WSI can generate several hundred and even several thousands of patches of ordinary size, say 224 by 224 or 512 by 512 pixels, depending on the size of the WSI, magnification, and the biopsy type.One approach would be to feed all the patches into the deep network and use the output for the downstream tasks.Such brute force approach would require massive computational power, a factor that would limit the adoption of the technology.
Processing WSIs is a difficult problem that requires a divide & conquer approach.Patching is generally the divide and extracting deep features is the conquer part.Various methods have been proposed to select a smaller subset of WSI patches as representative ones for downstream tasks.One approach as described in Yottixel is clustering patches based on color distribution or RGB histogram and proximity of patches in WSI to select a diverse but smaller set of patches from all morphological structures within a WSI 7 .This is called a mosaic of WSI.The Yottixel mosaic concept is simple and reliable and has been adopted by other works 16,17 .Still, patch selection remains one of the open fields for research because it can greatly affect the performance of the downstream analyses.Yottixel's mosaic ignores normal histology and selects normal and malignant patches based on their frequency in the clustering space.
As pathologists have been trained to recognize normal histologies and disregard them when focusing on the pathology, it can be helpful for a better patching strategy to remove normal histology specific to the site.One way to formulate this task is by integrating anomaly detection into the patching scheme.Provided WSIs depicting entirely normal tissue are available, one-class classifiers 18 could learn the typical features of normal histology.The trained classifiers can then accompany the mosaic generation, i.e., the patching, to remove normal patches.
In this paper, we propose and validate an "atlas of normal tissue" solely using samples of WSIs obtained from normal tissue biopsies.The proposed method leverages a weakly supervised multiple-instance learning method using one-class classifiers to exclude the normal patches and focus on clinically important parts of any given WSI.We argue that this method will reduce the number of patches selected, hence removing redundancy, from each WSI while maintaining the representativeness of the patches for downstream tasks.

Experiments Setup
Datasets.Two datasets from two different organs were used to test the proposed atlas of normal atlas and the one-class classifier for normal patch detection.
• One dataset included a total number of 660 skin tissue WSIs of patients diagnosed with cutaneous squamous cell carcinoma (cSCC) which were pulled from the internal Mayo Clinic database (REDCap).The cases were first identified through enterprise-wide internal search engines which identified patients with a histopathologic diagnosis of cSCC or metastatic cSCC from pathology reports of archived tissue specimens.These cases subsequently underwent chart review to confirm the primary tumor as part of the inclusionary criteria.Samples were either taken at Mayo Clinic locations (Minnesota, Arizona, Florida) or at outside facilities and shared with Mayo Clinic for consultation.The tissue may be from either biopsy (punch, shave, etc.) or from subsequent excision.All tissue included in the pilot underwent both initial review and re-review by a dermatopathologist at one of the Mayo Clinic sites.The purpose of the re-review was to confirm cSCC, tumor characteristics (depth, differentiation, PNI, etc.), and tumor stage, and to select cases most appropriate for scanning and sequencing.
Selected cases included 386 well-differentiated, 100 moderately differentiated, and 67 poorly differentiated cSCC.There were also 107 normal WSIs selected to represent the normal skin tissue.The only associated label used is the degree of differentiation or being normal.A total number of 10 slides were randomly selected from well-differentiated and poorly differentiated cases for internal validation of the normal atlas at the patch level.A pathologist selected the regions with the most prominent abnormal morphology in these slides.The annotations were later used to generate normal and abnormal patches for validation purposes only.
• The second collected dataset included 21 breast tissue WSIs obtained from 8 patients with no prior diagnosis of breast cancer at the Mayo Clinic.For validation purposes, 78 WSIs from cases with lobular carcinoma of the breast and 354 WSIs from cases with ductal carcinoma of the breast were obtained from the TCGA Research Network.These were all the diagnostic WSIs available on the repository for the two selected medical conditions at the time of collecting the data.Tissue samples from cases with no prior malignancy and no treatment at the time of sampling were selected to imitate the diagnostic pipeline of a deployed model.
Normal Atlas Creation.The idea of the normal atlas relies upon leveraging anomaly detection algorithms used in machine learning to detect outliers in a dataset.One-class classifiers are best suited for this task.Due to the complex nature of the data, we selected Isolation Forest 19 and One-Class SVM 20 as the abnormality detection mechanisms in this study.Both methods were presented with a series of deep features obtained from normal patches.The fitted classifiers were then used to classify normal and abnormal in a given set of deep features they had not seen before.Data preprocessing included tissue segmentation, patching, and color normalization.The tissue segmentation was carried out using an in-house trained U-NET segmentation model.Patching was done at 20X magnification with a patch size of 1024 by 1024 and an overlap of 30 percent between adjacent patches, both in height and width.A minimum of 75 percent patch area coverage was selected as the criteria for excluding patching with insufficient tissue.All patches were separately fed to three different deep networks to obtain the deep features.These networks included one convolutional neural network with the architecture of DenseNet121 specifically trained on pathology patches named KimiaNet 12 , one pre-trained vision transformer using a method named DINO, and another convolutional neural network with the architecture of ResNet50 trained using the same DINO approach 14 .
When training the one-class classifiers with the normal data, the sample was assumed to be contaminated with some abnormal data.In the context of normal WSIs, this may include artifacts, tissue folds, stain sedimentation, pen markings, and out-of-focus slide areas.Both Isolation Forest and one-class SVM use methods to incorporate this assumption.One-class SVM utilizes a factor called ν to set an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors relative to the total number of training examples 21 .We set ν = 5% of the total cases according to the expert's opinion and let the Isolation Forest classifier deduct this parameter automatically according to the method described in the original paper 19 .The pseudocode of creating and using the normal atlas is outlined in algorithms 1 and 2. The pipeline of creating the normal atlas is also visually shown in Figure 1.We used the Python library SciKit Learn for the implementation of Isolation Forest and One-Class SVM 22 .
Index and Search Workflow.To demonstrate the role of removing the normal patches from the WSI, we have used the general workflow for indexing and searching of WSIs for retrieval of similar cases as previously explained in Yottixel with some minor adjustments 7 .The tissue segmentation and patching follow the same method used to obtain the normal patches.To reduce the computational and storage cost, a subset of patches is then selected from each WSI using the patch selection method of Yottixel 7 .This method involves clustering patches into 9 clusters using a histogram of their RGB values and then selecting 15 percent of the patches from each cluster in a spatially homogenous manner.This results in selecting almost 15 percent of the total patches from each  WSI, collectively called mosaic.The deep features for the patches of each WSI are used as an indexed version of that WSI.The median of minimum one-to-one Euclidean distances between all the patches of the two WSI is used to find the most similar indexed WSIs to any given WSI.The same approach for obtaining similarity is used in Yottixel with the difference being the Hamming distance for barcoded features 7 instead of the Euclidean between deep features which was used in our study.The schematic workflow of indexing and searching WSIs is shown in Figure 2.
Internal and External Validation.The performance of the normal atlas at the patch level was tested by obtaining normal and abnormal patches from 10 randomly selected cSCC test slides which were annotated by a pathologist.Having a minimum of 50 percent surface area of the patch annotated as the abnormal region was set as the threshold to label the patch as abnormal.The rest of the patches were considered normal.The patches were obtained in the same size and magnification as the original datasets used, i.e., 1024 by 1024 at 20X.This process yielded 6479 normal and 9352 abnormal patches.Precision, recall, and F1 scores were selected as performance metrics for this classification task.
Three different setups of indexing and search pipeline were examined to test the effect of the normal atlas at the WSI level.The base setup included the original indexing and search pipeline for the WSIs as described in the Yottixel paper 7 .Two other setups were almost the same with one added step of excluding normal patches using an atlas of normal tissue after applying the Yottixel's mosaic to acquire patches.One setup used the Isolation Forest as the method of anomaly detection and another one used the one-class SVM.The resulting patches were also used to perform indexing and search in the same manner as the base setup.WSIs with all patches excluded as normal patches were labeled as normal.Performance of the indexing and search pipelines were measured using a leave-one-patient-out validation for all included patches by taking the output attached to top-1 and a majority vote among top-3 and top-5 most similar WSIs.Performance metrics such as precision, recall, and F1 score were calculated for each label inference.The overall performance of each  configuration was also reported using the weighted F1 score of all the labels in the dataset according to their number of occurrences.

Results
Normal Atlas Patch-Level Validation.The patch-level internal validation for the normal atlases was carried out using the 10 WSIs which were annotated for abnormal and normal regions by a pathologist.The performance varied based on the deep network used to obtain the deep features and the anomaly detection method used.KimiaNet's features showed the best performance both using Isolation Forest and one-class SVM, with F1 scores of 0.82 and 0.80, respectively.Among the networks trained with natural images, the DINO-trained vision transformer combined with one-class SVM showed the best results with an F1 score of 0.79.The detailed patch-level validation results for normal skin atlases are shown in Figure 3.One visual example of how patch-level classification of different deep networks and anomaly detection algorithms comply with pathologist annotations is shown in Figure 4 for a cSCC WSI and in Figure 5 for a breast cancer WSI.
Indexing and Search Results.Indexing and search were carried out on both datasets of cSCC and breast cancer WSIs using the three previously described experiment setups.The detailed performance results of the indexing and search experiments are reported in Tables 2 and 3 in the supplementary material section.As the confusion matrices for the leaveone-patient-out classification depicted in Figure 6 for the skin dataset show the two mostly mislabeled classes are 'welldifferentiated' and 'moderately differentiated'.While this is true for the indexing and search using the deep features from all three deep networks, KimiaNet (trained specifically on pathology images 12 ) shows the best performance.The classification, regardless of the network used to obtain the deep features, shows almost the same level of performance in all three setup configurations of using no normal atlas, using a normal atlas with Isolation Forest, and using a normal atlas with one-class SVM.To show this in the skin dataset, the F1 score for different experimental setups of the top-5 consensus is illustrated in Figure 8 the indexing and search leave-one-patient-out validation for the breast dataset are shown in Figure 7. Indexing and searching for breast carcinoma cases using deep features from all three models also show suboptimal performance for lobular carcinoma and infiltrating ductal carcinoma.However, the performance stays almost the same in three setups of not using the normal atlas, using one with Isolation Forest, and using one with a one-class SVM.This is shown in the F1 scores plot of the top-5 consensus of the corresponding experiments in Figure 9.While the performance of the classification using the indexing and search stays almost the same in both skin and breast datasets, the number of total patches selected in the whole dataset to represent the WSIs is reduced by at least 12% and at most 86% of the original number of patches as shown in Table 1.The exact level of patch reduction depends on the deep network used to obtain the deep features and on the anomaly detection algorithm used in the normal atlas.

Discussion
While most of the computational pathology tasks are designed to classify or detect the presence of pathological lesions in gigapixel WSIs, the presence of computationally redundant nor-mal tissue in WSIs is often ignored.While a large body of scientific work has been done in the field of WSI retrieval systems to address the numerical representation of the WSI 12,[23][24][25] and WSI search engines 7,8,26,27 , no study has addressed the effect of normal tissue in the indexing and search pipeline.Therefore, removing normal tissue is not an established practice in this field.While some works have used normal tissue exclusion as part of their workflow 28 , the potential effect of redundant normal tissue in WSI retrieval is an unknown area.
In this work, we tested the use of two anomaly detection algorithms as a means of excluding abnormal patches from Yottixel's indexing and search 7 .Our methods included no WSI-level annotation and used multiple instance learning for known normal WSIs.
The urgent need for novel WSI patching is an important aspect of current works in the field of computational pathology.Patching involves dividing an image into smaller regions or patches and using these patches to represent the tissue in the image.There are several benefits to using tissue patching in computational pathology.First, it allows for the efficient analysis of large images, as the patches can be processed independently and in parallel.This can significantly reduce the Table 1.Number of patches used for indexing and search in three different experimental setups including no normal atlas, normal atlas with Isolation Forest, and normal atlas with one-class SVM for skin and breast WSIs.For each setup, the numbers have been reported for deep features obtained from all three networks.The percentage inside the parenthesis for each cell shows the number of patches excluded compared to the base configuration, i.e., no normal atlas used, in each experiment.Rows represent results in three experimental setups using no normal atlas, using a normal atlas with Isolation Forest, and using a normal atlas with one-class SVM, respectively.

Experiment
computational resources required for image analysis.Second, WSI patching can improve the accuracy of algorithms by pro- viding a more representative set of tissue patches.The main focus in most digital pathology tasks is to find the unique histopathologic fingerprint each WSI represents which is basically the diagnosis of the disease.This fingerprint might be unique to that WSI while normal (healthy) tissue constitutes a normal variation of cellular morphology and tissue structure that is common to all human histology in each primary site.Including these common features, namely normal histology, in indexing WSIs with relevant features, namely malignant lesions, introduces redundancy into the WSI retrieval workflow.
Our results show that removing the areas with normal histol-  ogy during the patching process helps with selecting a smaller set of representative patches for each WSI, therefore decreasing the computational and storage cost of the search engine, while maintaining the overall retrieval performance of the search engine.
As shown in Figure 8, by using a one-class SVM to detect and exclude the abnormal patches in the cSCC dataset using features from KimiaNet, the classification performance for all three classes was either maintained at the same level or even improved.At the same time, as Table 1 depicts, the number of patches for selected cases was reduced by 67% after excluding the normal tissue.Contrary to this improvement, the performance for correctly classifying the normal cases decreases, with a decrease in the F1 score from 0.97 to 0.79.This drop can be attributed to the fact that excluding patches that look normal from a normal WSI will leave it with patches mainly containing artifacts and distortions which makes the indexing and search return mismatched results.In contrast to the improvement seen for the experiments with deep features obtained by KimiaNet, those done on deep features obtained using DINO-trained ViT and ResNet50 show rather mixed results.
The results for the breast dataset show almost the same pattern as the results for the cSCC dataset.As Figure 9 shows, the classification performance for ductal carcinoma has shown minor changes regardless of the anomaly detection method selected and the network used to obtain the deep features.However, excluding normal tissue inflicts varying changes to the lobular carcinoma classification performance based on the network and anomaly detection method.Excluding normal tissue from breast WSIs also decreased the number of patches selected by 12% to 47% as depicted in Table 1.The same effect of decreased performance in normal cases for the cSCC dataset can also be observed in the breast dataset.
The changes in indexing and search performance seen in the skin dataset are also in concordance with the patch-level classification performance seen for annotated validation cases which is shown in Figure 3.The validation results show that using Isolation Forest and one-class SVM classifiers on deep features obtained from KimiaNet are the best combination with an F1 score of 0.82 and 0.80, respectively.The same configurations also showed the best results in indexing and search as shown in Figure 8. Applying one-class SVM on DINOtrained ViT and Isolation Forest on DINO-trained ResNet50 also result in acceptable performance with an F1 score of 0.79 and 0.52, respectively.These two configurations also showed mediocre performance in indexing and search.Even though these two pairs of combinations showed good results, combining them the other way around does not show promising performance.These figures are in concordance with the visual demonstration of the normal versus abnormal region selection and the pathologist-approved ground-truths shown for two sample cases in Figures 4 and 5.These findings prove that patch-level classification performance is an indicator of how a normal atlas would perform in an indexing and search pipeline.This calls for the need to validate the performance of the normal atlas at the patch level before applying it in the pipeline.
There are some limitations to this study.As previously stated, the indexing and search performance for some cases was suboptimal before applying the normal atlas.Examples of such cases include moderately versus differentiated cases of cSCC and lobular versus ductal carcinoma of the breast.We speculate that this is an active research area involving WSI representation.Improving the performance of WSI search engines and pathology-specific deep networks is not in the scope of this study.We selected the currently established pipeline of Yottixel as the state-of-the-art search engine for our experiments.
Looking toward future research, other studies have shown the role of vision transformers in the area of anomaly de-tection [29][30][31] .These approaches mainly rely on using the reconstruction score of previously trained vision transformer encoder and decoder to extinguish normal versus abnormal image features.This approach hypothesizes that a coupled encoder and decoder trained only on images of normal phenomena would struggle to reconstruct an image of an abnormal instance.This approach holds promise for training unsupervised anomaly detection models in medical imaging and should be further investigated in future studies.

Figure 1 .
Figure 1.The general workflow of creating the normal atlas: (a) Normal WSIs are generated by scanning slides of known normal tissue biopsies, (b) Deep features of normal WSIs are obtained by passing the patches from each WSI through a deep network; the resulting deep features are stored in the normal atlas, (c) The constructed normal atlas is used to differentiate normal vs abnormal deep features from unknown patches obtained using the same preprocessing and feature extraction method.

Figure 2 .
Figure 2. General workflow of index and search: (a) Deep features of known slides are obtained via passing the preprocessed patches through a deep network and are stored with their corresponding labels, (b) The deep features of a new (diagnostically unknown) slide are obtained through the same process, and (c) Stored deep features of the known slides are searched to find the closest matches to the newly generated deep features to extract the most probable label through a majority vote among the top-n closest labels.

Figure 3 .
Figure 3. Normal atlas performance at patch-level demonstrated using validation results for 10 manually annotated WSIs by a pathologist shown as confusion matrices using normal atlases established using Isolation Forest (a,b,c) and one-class SVM (d,e,f) on deep features extracted by KimiaNet (a,d), ViT-DINO (b,e), and ResNet50-DINO (c,f).

Figure 4 .
Figure 4. Visual results in a cSCC case: (a) Pathologist ground-truth, (b) Yottixel's mosaic, (c) KimiaNet and Isolation Forest, (d) KimiaNet and one-class SVM, (e) ViT-DINO and Isolation Forest, (f) ViT-DINO and one-class SVM, (g) ResNet50-DINO and Isolation Forest, (h) ResNet50-DINO and one-class SVM.Red boxes show the patches selected for indexing and search within each WSI.

Figure 5 .
Figure 5. Visual results of a breast carcinoma case: (a) Pathologist ground truth, (b) Yottixel's mosaic, (c) KimiaNet and Isolation Forest, (d) KimiaNet and one-class SVM, (e) ViT-DINO and Isolation Forest, (f) ViT-DINO and one-class SVM, (g) ResNet50-DINO and Isolation Forest, and (h) ResNet50-DINO and one-class SVM.Red boxes show the patches selected for indexing and search within each WSI.

Figure 6 .
Figure 6.Indexg and search confusion matrices for leave-one-patient-out validation for top-5 consensus predictions in skin dataset.Columns represent the indexing and search using deep features from KimiaNet, DINO-trained vision transformer, and DINO-trained ResNet50, respectively.Rows represent results in three experimental setups using no normal atlas, using a normal atlas with Isolation Forest, and using a normal atlas with one-class SVM, respectively.

Figure 7 .
Figure 7. Indexing and search confusion matrices for leave-one-patient-out validation for top-5 consensus predictions in breast dataset.Columns represent the indexing and search using deep features from KimiaNet, DINO-trained vision transformer, and DINO-trained ResNet50, respectively.Rows represent results in three experimental setups using no normal atlas, using a normal atlas with Isolation Forest, and using a normal atlas with one-class SVM, respectively.

Figure 8 .
Figure 8. Indexing and search F1 score for leave-one-patient-out validation for top-5 consensus predictions in skin dataset plotted for each class, deep network, and experimental setup separately (WD = well-differentiated, MD = moderately differentiated, PD = poorly differentiated, NL = normal, B = base configuration of using no atlas, IF = using normal atlas based on Isolation Forest, OS = using normal atlas based on one-class SVM).

Figure 9 .
Figure 9. Indexing and search F1 score leave-one-patient-out for top-5 consensus predictions in breast dataset plotted for each class, deep network, and experimental setup separately (DC = infiltrating ductal carcinoma, LC = lobular carcinoma, NL = normal, B = base configuration of using no atlas, IF = using normal atlas based on Isolation Forest, OS = using normal atlas based on one-class SVM).
Algorithm 1 Pseudocode for creating the normal atlas given a set of normal WSIs I N Pseudo-code for using the normal atlas given a patch P and fitted one-class classifier M F