A new technological approach in diagnostic pathology: mass spectrometry imaging-based metabolomics for biomarker detection in urachal cancer

Urachal adenocarcinomas (UrC) are rare but aggressive. Although their distinction is of profound therapeutic relevance, UrC cannot be differentiated by histomorphology alone from other adenocarcinomas of differential diagnostic importance. As no reliable tissue-based diagnostic biomarkers are available, we aimed to identify such biomarkers by integrating mass spectrometry imaging-based metabolomics and digital pathology, a multimodal approach that preserves spatial information. To this end, a cohort of UrC (n = 19) and of colorectal adenocarcinomas (CRC, n = 27), the differential diagnosis of highest therapeutic relevance, was assembled, tissue micro-arrays (TMAs) were constructed, and pathological data were recorded. Hematoxylin and eosin (H&E) stained tissue sections were scanned and annotated, and an algorithm trained on these annotations automatically discriminated tumor from non-tumor areas. Spectral information from tumor regions, obtained via matrix-assisted laser desorption/ionization (MALDI)-Orbitrap mass spectrometry imaging (MSI), was subsequently extracted in an automated workflow. On this basis, metabolic differences between UrC and CRC were revealed using machine learning algorithms. The study demonstrated the feasibility of MALDI-MSI for the evaluation of formalin-fixed paraffin-embedded (FFPE) tissue in UrC and CRC and the potential to combine spatial metabolomics data with annotated histopathological data from digitized H&E slides. An overall area under the curve (AUC) of 0.94, and of 0.77 for the analyte taurine alone (diagnostic accuracy for taurine: 74%), makes the technology a promising tool for this differential diagnostic dilemma. Although the data have to be considered a proof-of-concept study, they present a new application of this technology in a scenario in which reliable diagnostic biomarkers (such as immunohistochemical markers) are currently not available.


Introduction
The urachus is a fetal structure that connects the forming urinary bladder to the allantois during early intrauterine development. It obliterates to form the median umbilical ligament, which runs in the midline within the space of Retzius from the dome of the bladder to the umbilicus [1]. While macroscopic residues are uncommon, microscopic urachal remnants can be detected in up to 32% of adults [2]. Urachal cancer, with an incidence of <1 case per 1,000,000 people per year, rarely arises from these remnants; urachal adenocarcinomas (UrC) account for over 90% of cases [3-6]. Most non-cystic UrC (57%) exhibit mucinous histology, followed by intestinal, not otherwise specified (NOS), mixed, and signet ring cell subtypes [4,7]. These histological subtypes, however, show striking overlaps with other types of adenocarcinomas, which can pose a major differential diagnostic problem in the histopathological evaluation of biopsies from this region. Correct distinction from other tumors is nevertheless vital, as the therapeutic regimens differ. Most importantly, a colorectal adenocarcinoma (CRC) growing into the bladder mostly represents a palliative situation, whereas localized UrC can be cured by partial cystectomy with resection of the median umbilical ligament and umbilicus. As immunohistochemistry is of little utility in this specific setting, and radiology and clinical examination are often inconclusive, tissue-based diagnostic biomarkers are urgently needed to allow a correct preoperative diagnosis and individual therapy planning [4]. We therefore sought to identify metabolic diagnostic biomarkers using mass spectrometry imaging (MSI), an approach that has not previously been applied in this field.
Tumor metabolism is known to differ from that of the corresponding normal cells [8]. Reprogramming of energy metabolism is one of the hallmarks of cancer [9] and includes elevated glutaminolysis [10] and enhanced glycolysis even under aerobic conditions, known as the Warburg effect [11]. This altered metabolism is required for the enhanced proliferation rate of tumor cells [12,13]. Matrix-assisted laser desorption/ionization (MALDI) MSI is a powerful tool for analyzing metabolic alterations in situ [14]. Depending on the applied matrix, different types of analytes can be detected on a single tissue section. By rastering a laser across the section and combining this with sensitive pixel-wise detection by mass spectrometry, the spatial distribution of various metabolites in the tissue is revealed. Combining MSI data with histopathological information, known as multimodal imaging, is crucial to avoid artefacts in data analysis [15] and allows correct classification of profiles into cancerous and non-cancerous tissue. Highly detailed histological or immunohistochemical data for co-registration can be obtained by digital pathology, supported by expert annotations from pathologists [16].
Multimodal analysis of MSI data with fluorescence image data from digital pathology software was recently demonstrated [17]. A similar approach was also recently described for combining MSI data with digital pathology information from hematoxylin and eosin (H&E) stained slides [18]. This is relevant because H&E staining of tissue specimens is a routine histological technique of high informative value at the cellular and non-cellular level, in the context of tissue structure and composition. We therefore, for the first time, used this new technological approach to detect diagnostic biomarkers in a critical tissue-based differential diagnostic setting, focusing on the discrimination of UrC and CRC.

Material and methods
Cohort and construction of tissue micro-arrays
A cohort of UrC and CRC was retrospectively collected from the archive of the Institute of Pathology at the University Hospital Essen (UrC: n = 14, CRC: n = 27) and of the Institute of Pathology at the University Hospital Göttingen (UrC: n = 5). Details on clinico-pathological data are given in Table 1. Diagnoses of UrC and CRC were established following WHO criteria [19,20]. Histopathological information was compiled after review by a genitourinary (GU) pathologist (HR). Tumor areas were marked on the H&E slides (HR) and TMAs were constructed using an automated platform (TMA Grand Master, 3DHISTECH, Budapest, Hungary) with three cores per case (diameter: 1.3 mm). The study was approved by the ethics committee of the University of Duisburg-Essen (15-6372-BO) and it was performed in accordance with the Helsinki declaration and its amendments.

Digital pathology
TMAs were sectioned and H&E stained on a HE600 platform (Ventana/Roche Diagnostics, Oro Valley, AZ, USA) using standard diagnostic protocols. Stained TMAs were scanned on an Aperio AT2 system (Leica Biosystems, Wetzlar, Germany) to create digital whole slide images (WSIs). WSIs were annotated by a GU pathologist (HR) in QuPath v0.1.2 [21], and these annotations served as the basis for adjusting the automated tumor detection thresholds (JMN). After TMA dearraying and cell detection, smoothed features were calculated at 25 µm and 50 µm. A random trees classifier was then trained on tumor and non-tumor regions using 23,755 training objects, and the classification results were verified by the pathologist.
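Conceptually, smoothed features augment each cell's own measurements with a distance-weighted average over neighboring cells, so that the classifier also sees local tissue context. The sketch below illustrates that idea under stated assumptions; it is not QuPath's implementation. It assumes a Gaussian weight whose full width at half maximum (FWHM) equals the stated radius, and the function name `smooth_features` and the synthetic cell data are hypothetical.

```python
import numpy as np

def smooth_features(xy_um: np.ndarray, feats: np.ndarray, fwhm_um: float) -> np.ndarray:
    """Gaussian-weighted average of per-cell features (hypothetical helper).

    xy_um:   (n_cells, 2) cell centroids in micrometres
    feats:   (n_cells, n_features) per-cell measurements
    fwhm_um: full width at half maximum of the Gaussian weight (assumption)
    """
    sigma = fwhm_um / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # convert FWHM to standard deviation
    # Pairwise squared distances between all cell centroids
    d2 = ((xy_um[:, None, :] - xy_um[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))   # each cell weights itself with 1
    w /= w.sum(axis=1, keepdims=True)      # normalize weights per cell
    return w @ feats

rng = np.random.default_rng(0)
xy = rng.uniform(0, 500, size=(200, 2))   # hypothetical centroids in a 500 µm field
f = rng.normal(size=(200, 3))             # hypothetical per-cell features
# Raw features plus 25 µm and 50 µm smoothed versions, as classifier input
augmented = np.hstack([f, smooth_features(xy, f, 25.0), smooth_features(xy, f, 50.0)])
print(augmented.shape)  # (200, 9)
```

The augmented matrix would then be passed, together with tumor/non-tumor labels from the annotations, to a random trees (random forest) classifier. The all-pairs distance matrix is fine for a few hundred cells per core; whole slides would need a neighbor search structure instead.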

Mass spectrometry imaging
Serial TMA sections (4 µm thickness) were cut onto indium tin oxide-coated glass slides (Bruker Daltonik GmbH, Bremen, Germany) using a fresh blade for every new block. Sections were stored at 4°C in the dark until further use. Directly before matrix application, TMA sections were deparaffinized twice for 8 min in reagent-grade xylene, as described previously [22]. The matrix N-(1-naphthyl)ethylenediamine dihydrochloride (NEDC) (≥99% p.a., Carl Roth GmbH + Co. KG, Karlsruhe, Germany) was used at a concentration of 7 mg/ml in methanol/water (70/30, v/v).
Matrix was applied with a TM-Sprayer (HTX Technologies, LLC, Chapel Hill, USA) at a flow rate of 0.12 ml/min, a velocity of 1200 mm/min, and a track spacing of 3 mm for 30 passes at a nozzle temperature of 70°C. Samples were stored in a dry cabinet (Eureka Dry Tech/Taiwan Dry Tech, Taipei City, Taiwan) until measurement.
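The spray parameters above determine a nominal matrix load on the slide. The following back-of-the-envelope sketch assumes the commonly used estimate (concentration × flow rate × passes) / (nozzle velocity × track spacing), perfectly uniform deposition without overspray losses, and a formula weight of 259.17 g/mol for NEDC dihydrochloride (C12H14N2·2HCl). None of these derived values are reported in the study.

```python
# Nominal matrix preparation and deposition figures from the stated parameters.
NEDC_MOLAR_MASS_G_PER_MOL = 259.17  # assumed formula weight of C12H14N2·2HCl

conc_mg_per_ml = 7.0          # NEDC in methanol/water 70/30 (v/v)
flow_ml_per_min = 0.12
velocity_mm_per_min = 1200.0
track_spacing_mm = 3.0
passes = 30

# mg/ml equals g/l; dividing by g/mol gives mol/l, times 1000 gives mmol/l
molarity_mM = conc_mg_per_ml / NEDC_MOLAR_MASS_G_PER_MOL * 1000.0
# Matrix mass per mm of nozzle travel, spread over one track width (µg/mm² per pass)
ug_per_mm2_per_pass = (conc_mg_per_ml * flow_ml_per_min * 1000.0
                       / velocity_mm_per_min / track_spacing_mm)
density_ug_per_mm2 = ug_per_mm2_per_pass * passes

print(f"{molarity_mM:.1f} mM, {ug_per_mm2_per_pass:.3f} µg/mm² per pass, "
      f"{density_ug_per_mm2:.1f} µg/mm² total")
```

With these inputs the estimate is about 27.0 mM and 7.0 µg/mm² (0.7 mg/cm²). The real coating density will be lower if part of the spray misses the slide.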
MALDI-Orbitrap-MSI was performed on a Spectroglyph MALDI/ESI Injector (Spectroglyph, LLC, Kennewick, USA) coupled to a Q Exactive Plus Orbitrap mass spectrometer (Thermo Fisher Scientific Inc., Waltham, USA). Pierce Negative Ion Calibration Solution (Thermo Fisher Scientific Inc.) was used for external mass calibration. The raster step size was set to 75 µm. The mass range m/z 85-1000 was recorded in negative ion mode with a fixed injection time of 250 ms and a mass resolution of 70,000.
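To gauge spatial sampling, the number of 75 µm pixels per 1.3 mm TMA core can be estimated from the core area. This hypothetical estimate ignores pixels that only partially overlap the core. It also treats the 250 ms injection time as a strict lower bound on per-pixel time, because stage movement and other instrument overheads add to it.

```python
import math

core_diameter_um = 1300.0  # TMA core diameter (1.3 mm)
step_um = 75.0             # raster step size
inject_time_s = 0.250      # fixed injection time per pixel

core_area_um2 = math.pi * (core_diameter_um / 2) ** 2
pixels_per_core = core_area_um2 / step_um ** 2           # each pixel covers step × step
min_time_per_core_s = pixels_per_core * inject_time_s    # lower bound, ignores overheads

print(round(pixels_per_core), round(min_time_per_core_s))  # 236 59
```

Roughly 236 spectra per core are therefore available for averaging, before tumor masking removes stroma and necrosis.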

Data analysis
MALDI-Orbitrap-MSI data were converted to imzML format using the software Spectroglyph Image Insight Ver 0.1.0.17171 (Spectroglyph). For data exploration, TMAs were combined into one dataset in SCiLS Lab MVS 2020a Pro (Bruker Daltonik GmbH). A peak list was created manually to exclude artefacts and matrix peaks. Spectra were normalized to the total ion count (TIC), and ion images were generated with an m/z interval of ±1 mDa. Analytes were putatively annotated by accurate mass using METLIN [23] and the Human Metabolome Database (HMDB) [24].
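TIC normalization and ion image extraction can be illustrated with plain NumPy. This sketch assumes all pixel spectra share one m/z axis, as in continuous-mode or binned data. With pyimzML, per-pixel arrays can be read via `ImzMLParser(path).getspectrum(i)`. The helper names and the three-pixel dataset are hypothetical, and SCiLS Lab's own normalization may scale the spectra differently, for example relative to the mean TIC.

```python
import numpy as np

def tic_normalize(intensities: np.ndarray) -> np.ndarray:
    """Scale each pixel spectrum (row) so that its intensities sum to 1 (total ion count)."""
    tic = intensities.sum(axis=1, keepdims=True)
    # Pixels with zero TIC stay zero instead of producing NaN
    return np.divide(intensities, tic, out=np.zeros_like(intensities, dtype=float), where=tic > 0)

def ion_image(mz: np.ndarray, intensities: np.ndarray, target_mz: float,
              half_window: float = 0.001) -> np.ndarray:
    """Sum intensities within target_mz ± half_window (Da) for every pixel."""
    in_window = np.abs(mz - target_mz) <= half_window
    return intensities[:, in_window].sum(axis=1)

def da_to_ppm(delta_da: float, mz: float) -> float:
    """Express an absolute m/z tolerance as parts per million at a given m/z."""
    return delta_da / mz * 1e6

# Hypothetical 3-pixel dataset on a shared, centroided m/z axis
mz_axis = np.array([115.0026, 124.0064, 124.0090, 170.0231])
spectra = np.array([[10.0, 30.0, 5.0, 55.0],
                    [20.0, 10.0, 0.0, 70.0],
                    [0.0, 0.0, 0.0, 0.0]])
norm = tic_normalize(spectra)
taurine_img = ion_image(mz_axis, norm, 124.0064)  # 124.0090 lies outside ±1 mDa
print(taurine_img)                                # [0.3 0.1 0. ]
print(round(da_to_ppm(0.001, 124.0064), 2))       # 8.06
```

A fixed ±1 mDa window corresponds to about 8 ppm at m/z 124 but only about 1 ppm at m/z 1000, so the relative tolerance tightens toward the upper end of the recorded mass range.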
Further data analysis was performed with Python 3.7 (Python Software Foundation, Wilmington, USA). MSI imzML data were imported using the pyimzML parser. A software solution was implemented for the automated co-registration of MSI data with digital pathology results from QuPath using OpenCV [25]. Spectral information was extracted from the classified tumor regions of TMA cores for manually picked peaks (n = 199), and mean intensities were calculated for each core. Analytes with absolute mean intensities above 70 (n = 173) were used to calculate feature importance by random forest classification; applying a threshold of 0.01 yielded 27 ion channels (Supplementary information). Different algorithms (k-nearest neighbors (KNN), support-vector machine (SVM), and random forest) were used to classify cases based on their metabolic profiles using eightfold cross-validation. For this purpose, mean intensities for each case were calculated by combining core intensities. The diagnostic ability of the classifiers was visualized with receiver operating characteristic (ROC) curves using scikit-learn [26]. Differences between groups were furthermore visualized via t-distributed stochastic neighbor embedding (t-SNE) [27] and boxplot analyses. Boxplots were generated from tumor cases, using mean intensities for each tumor. Statistical significance was calculated with the statannot package using the Kruskal-Wallis test with Bonferroni correction.
Notes to Table 1: Tumor stage of UrC is indicated using the Sheldon [39] or Mayo staging system [40] as commonly accepted. The Tumor, Node, Metastasis (TNM) system in its 7th edition was used for staging of CRC [41]. y, years; n/a, data not available; NOS, not otherwise specified.
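The classification workflow described above can be sketched with scikit-learn. This is a minimal sketch on synthetic data, not the authors' code. The intensity scale, random seeds, hyperparameters (300 trees, k = 5, RBF-kernel SVM) and the standardization before KNN and SVM are all assumptions. One deliberate deviation: the importance-based selection (threshold 0.01) sits inside each pipeline, so it is refit within every one of the eight folds. Selecting features on all cases before cross-validation lets the test folds influence the selection and can inflate the accuracy estimates.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
n_urc, n_crc, n_peaks = 19, 27, 199
# Synthetic per-case mean intensities (core means already combined per case);
# the first five peaks carry a group difference
X = rng.lognormal(mean=5.0, sigma=0.5, size=(n_urc + n_crc, n_peaks))
y = np.array([0] * n_urc + [1] * n_crc)  # 0 = UrC, 1 = CRC
X[y == 1, :5] *= 2.5
X = X[:, X.mean(axis=0) > 70]            # intensity filter (hypothetical intensity scale)

def selector():
    # Keep peaks whose random forest importance exceeds 0.01
    return SelectFromModel(RandomForestClassifier(n_estimators=300, random_state=0),
                           threshold=0.01)

models = {
    "random forest": make_pipeline(selector(), RandomForestClassifier(n_estimators=300, random_state=0)),
    "KNN": make_pipeline(selector(), StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(selector(), StandardScaler(), SVC(probability=True, random_state=0)),
}
cv = StratifiedKFold(n_splits=8, shuffle=True, random_state=0)  # eightfold cross-validation

results = {}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    # Out-of-fold class probabilities feed the ROC analysis
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    results[name] = (acc.mean(), acc.std(), roc_auc_score(y, proba))
    print(f"{name}: accuracy {acc.mean():.2f} (±{acc.std():.2f}), AUC {results[name][2]:.2f}")
```

The out-of-fold probabilities could also be passed to `sklearn.metrics.roc_curve` to draw the ROC curves themselves.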

Results
To characterize UrC versus CRC, TMAs were constructed. First, thin sections were classified histopathologically after H&E staining; second, the TMAs were analyzed by MSI.

Histopathological classification and transfer onto mass spectrometry imaging data
Cells from 146 TMA cores were automatically detected using the QuPath software and classified with a random trees algorithm based on cell features derived from the H&E images. In this way, stained tissue sections (Fig. 1A) were divided into tumor and non-tumor regions, e.g., stroma and necrotic tissue (Fig. 1B, D and E). The resulting tumor mask was transferred onto the MSI data after image transformation, minimizing the inclusion of non-tumor regions in the data analysis (Fig. 1C).
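The mask transfer can be pictured as resampling a high-resolution binary tumor mask onto the coarse MSI pixel grid. The study used OpenCV for this step; the sketch below swaps in a NumPy-only nearest-neighbor lookup with a known affine transform. In OpenCV, `cv2.warpAffine` with `flags=cv2.INTER_NEAREST` could perform a comparable resampling. The 5 µm/pixel H&E mask, the transform, and the helper name `transfer_mask` are hypothetical.

```python
import numpy as np

def transfer_mask(he_mask: np.ndarray, msi_shape: tuple, msi_to_he: np.ndarray) -> np.ndarray:
    """Resample a binary H&E tumor mask onto the MSI pixel grid (hypothetical helper).

    msi_to_he: 2x3 affine matrix mapping MSI pixel indices (col, row) to continuous
    H&E coordinates, where H&E pixel i covers [i, i + 1). Nearest-neighbor lookup keeps
    the mask binary; MSI pixels mapping outside the H&E image are set to False.
    """
    rows, cols = np.indices(msi_shape)
    pts = np.stack([cols.ravel(), rows.ravel(), np.ones(rows.size)])  # homogeneous (x, y, 1)
    x_he, y_he = np.floor(msi_to_he @ pts).astype(int)
    inside = (x_he >= 0) & (x_he < he_mask.shape[1]) & (y_he >= 0) & (y_he < he_mask.shape[0])
    out = np.zeros(rows.size, dtype=bool)
    out[inside] = he_mask[y_he[inside], x_he[inside]]
    return out.reshape(msi_shape)

# Hypothetical example: H&E mask at 5 µm/pixel, MSI raster at 75 µm/pixel (scale factor 15)
he_mask = np.zeros((300, 300), dtype=bool)
he_mask[:, :150] = True                  # left half annotated as tumor
# MSI pixel c covers H&E coordinates [15c, 15c + 15); its centre is 15c + 7.5
msi_to_he = np.array([[15.0, 0.0, 7.5],
                      [0.0, 15.0, 7.5]])
tumor = transfer_mask(he_mask, (20, 20), msi_to_he)

ion_img = np.random.default_rng(0).random((20, 20))  # one hypothetical ion image
core_tumor_mean = ion_img[tumor].mean()              # mean intensity restricted to tumor pixels
print(tumor[0, :12].astype(int))  # [1 1 1 1 1 1 1 1 1 1 0 0]
```

Sampling the mask at each MSI pixel centre is the simplest rule. A stricter variant would keep only MSI pixels whose whole footprint is tumor, which further reduces stromal contamination at tumor borders.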

Differentiation of UrC and CRC through multivariate analyses
Metabolic differences in tumor regions of UrC and CRC tissues were demonstrated by multivariate analyses. Twenty-seven m/z channels were selected by feature importance and used for the calculations (Supplementary information). Similarities between metabolic phenotypes were visualized using the t-SNE algorithm. UrC and CRC cores separate, although the two tumor groups overlap at their transition (Fig. 2A). Considering the tumor subtypes of all analyzed cases revealed that the outliers consist particularly of mucinous CRC (n = 2), which appear to resemble the metabolic profiles of mucinous UrC specimens (Fig. 2B). Different classifiers were trained on the metabolite intensity data of UrC and CRC cases. Using cross-validation, a classification accuracy of 0.87 (±0.15) was achieved with a random forest algorithm, 0.87 (±0.22) with a KNN algorithm, and 0.83 (±0.24) with an SVM algorithm. The corresponding ROC analysis, which describes the ability to distinguish between UrC and CRC tumors, shows an area under the curve (AUC) of 0.94 for the random forest classifier, 0.90 for the KNN classifier, and 0.88 for the SVM classifier (Fig. 2C).
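An AUC such as 0.94 has a concrete reading: it is the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. The NumPy sketch below computes the ROC curve by sweeping the threshold over all observed scores. It then checks that the trapezoidal area equals that pairwise ranking probability. In practice `sklearn.metrics.roc_curve` and `roc_auc_score` do this; the function names and the six-case example here are hypothetical.

```python
import numpy as np

def roc_curve_points(scores: np.ndarray, labels: np.ndarray):
    """FPR/TPR when thresholding at every observed score (labels: 1 = positive)."""
    order = np.argsort(-scores, kind="mergesort")
    s, lab = scores[order], labels[order]
    tps = np.cumsum(lab)      # true positives above each cut
    fps = np.cumsum(1 - lab)  # false positives above each cut
    last = np.r_[np.diff(s) != 0, True]  # keep only the last index of each run of tied scores
    tpr = np.r_[0.0, tps[last] / lab.sum()]
    fpr = np.r_[0.0, fps[last] / (1 - lab).sum()]
    return fpr, tpr

def auc_trapezoid(fpr: np.ndarray, tpr: np.ndarray) -> float:
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

def auc_rank(scores: np.ndarray, labels: np.ndarray) -> float:
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    diff = scores[labels == 1][:, None] - scores[labels == 0][None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4])
labels = np.array([1, 1, 0, 1, 0, 0])
print(round(auc_trapezoid(*roc_curve_points(scores, labels)), 3),
      round(auc_rank(scores, labels), 3))  # 0.889 0.889
```

Because the AUC integrates over all thresholds, it can differ noticeably from the accuracy obtained at one chosen cut-off.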

UrC metabolite levels differ from CRC metabolite levels
Several metabolites differed significantly in abundance between UrC and CRC specimens. However, no analyte was found exclusively in one tumor group. The antioxidant taurine (m/z 124.0064) showed higher signal intensities in cores of the CRC group (Fig. 3), which was confirmed by statistical analysis (p = 0.0009). A random forest classifier based solely on taurine levels achieved a classification accuracy of 0.74. Intensity levels of taurine and further analytes that differ significantly between the tumors are visualized in boxplots (Fig. 4). Ion channels m/z 170.0231 and m/z 186.0188 are enhanced in CRC samples as well. These m/z values represent the chloride adduct ions of the purine bases adenine and guanine, both with p values of 0.0003 (Fig. 4C, D). Supporting these results, taurine, adenine, and guanine were each detected as both [M-H]− and [M + Cl]− ions, with similar differences between groups for both ion species. Therefore, only the ion channel with the higher intensity is shown for each analyte.
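The annotations rest on accurate-mass matching. As a hedged cross-check, the theoretical m/z of the deprotonated ([M-H]−) and chloride adduct ([M+Cl]−) ions can be computed from standard monoisotopic atomic masses. The neutral formulas (taurine C2H7NO3S, fumaric acid C4H4O4, adenine C5H5N5, guanine C5H5N5O) are assumed here. The listed m/z values are taken from the text. How closely the two should agree depends on calibration and on whether the listed values are measured peak positions, which the text does not state.

```python
# Monoisotopic masses (u) of the most abundant isotopes
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462,
        "S": 31.97207117, "Cl": 34.96885268}
ELECTRON = 0.00054858
PROTON = MASS["H"] - ELECTRON

def monoisotopic(formula: dict) -> float:
    return sum(MASS[el] * n for el, n in formula.items())

def mz_deprotonated(m: float) -> float:
    """[M-H]-: remove a proton."""
    return m - PROTON

def mz_chloride_adduct(m: float) -> float:
    """[M+Cl]-: add a chlorine atom plus one electron."""
    return m + MASS["Cl"] + ELECTRON

compounds = {  # assumed neutral formula, ion type, m/z listed in the text
    "taurine": ({"C": 2, "H": 7, "N": 1, "O": 3, "S": 1}, mz_deprotonated, 124.0064),
    "fumaric acid": ({"C": 4, "H": 4, "O": 4}, mz_deprotonated, 115.0026),
    "adenine": ({"C": 5, "H": 5, "N": 5}, mz_chloride_adduct, 170.0231),
    "guanine": ({"C": 5, "H": 5, "N": 5, "O": 1}, mz_chloride_adduct, 186.0188),
}
theoretical = {}
for name, (formula, ion, listed) in compounds.items():
    mz = ion(monoisotopic(formula))
    theoretical[name] = mz
    print(f"{name}: theoretical m/z {mz:.4f}, listed {listed:.4f}, "
          f"difference {(listed - mz) * 1e3:+.2f} mDa")
```

Running the same helpers with `mz_chloride_adduct` for taurine, or `mz_deprotonated` for adenine and guanine, gives the partner ion species that the text reports as showing similar group differences.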
Conversely, several analytes showed higher abundances in UrC. The analyte with m/z 115.0026 was annotated as fumarate. This tricarboxylic acid cycle metabolite showed increased intensity levels in UrC specimens compared with CRC specimens (p = 0.0006) (Fig. 4A). Likewise, ion channels m/z 232.0829 and m/z 238.0485 (the latter annotated as an N-acyl-alpha amino acid) showed significantly higher levels in UrC specimens, with p values of 0.0002 and 0.0341, respectively (Fig. 4E, F).
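The group comparisons used a Kruskal-Wallis test with Bonferroni correction. With only two groups, this rank-based test is essentially equivalent to a two-sided Mann-Whitney U test. The sketch below uses `scipy.stats.kruskal` on synthetic per-case intensities for three hypothetical analytes. It assumes that the Bonferroni factor equals the number of analytes tested. The study performed this step with the statannot package, whose correction counts the comparisons within each plot, so the actual factor may differ.

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(1)
# Hypothetical per-case mean intensities for 19 UrC and 27 CRC cases
urc = {"analyte_1": rng.normal(3.0, 1.0, 19), "analyte_2": rng.normal(2.0, 1.0, 19),
       "analyte_3": rng.normal(2.5, 1.0, 19)}
crc = {"analyte_1": rng.normal(2.0, 1.0, 27), "analyte_2": rng.normal(3.0, 1.0, 27),
       "analyte_3": rng.normal(2.5, 1.0, 27)}

n_tests = len(urc)  # assumption: the correction spans all analytes tested
raw, adjusted = {}, {}
for analyte in urc:
    h, p = kruskal(urc[analyte], crc[analyte])
    raw[analyte] = p
    adjusted[analyte] = min(p * n_tests, 1.0)  # Bonferroni: multiply by number of tests, cap at 1
    print(f"{analyte}: H = {h:.2f}, raw p = {p:.4f}, Bonferroni p = {adjusted[analyte]:.4f}")
```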

Discussion
The histopathological differential diagnosis of UrC is of major therapeutic importance. However, as supportive diagnostic technologies have proven helpful only in a subset of cases or in specific settings [4,28,29], diagnostic biomarkers are urgently needed.
For the detection of diagnostic biomarkers in UrC, we therefore employed MALDI-MSI, a technology that had not yet been used in this setting. The aims of the present study were to (i) show the feasibility of MALDI-MSI for the evaluation of FFPE tissue in UrC and in CRC as its most relevant differential diagnosis, (ii) combine spatial MALDI-MSI data with annotated histopathological data from digitized H&E slides, and (iii) evaluate metabolites as differential diagnostic biomarkers in UrC versus CRC.
Regarding the first aim, the analysis of metabolites from FFPE tissue remains a major challenge. Although the feasibility of MALDI-MSI for the evaluation of FFPE tissue has been demonstrated in principle [22], fewer metabolites are detectable by MS in FFPE samples than in fresh frozen tissue [30]. However, for rare tumors such as UrC, where only a few tissue samples accumulate over years, the use of FFPE material is inevitable. In this study, the less commonly used matrix NEDC [31] was applied successfully to FFPE tissue samples. Considering these results, the present study is the first to use these techniques in this setting and to show metabolic differences.
Taurine, the metabolite with the most prominent difference in abundance between UrC and CRC, is an amino acid with antioxidative properties that can induce apoptosis and suppress proliferation in tumor cells [32,33]. Increased taurine levels in CRC compared with non-tumorous specimens have been reported previously [34]. This emphasizes the need to spatially separate tumorous from non-tumorous tissue in MALDI-MSI analysis, so that the detected metabolites derive from the cancer cells themselves rather than from the stromal compartment or non-tumorous epithelia. We addressed this issue by H&E staining of serial sections of the TMAs used for MALDI-MSI analysis. These H&E slides were scanned, and cancer cells were identified digitally by an algorithm trained on the pathologist's annotations, with manual validation of the final detection results. After the MALDI-MSI data were merged with the digital pathology data, the metabolic profile could be spatially assigned to the cancer itself, providing proof of concept for the study's second aim of combining MALDI-MSI and histopathological data with spatial discrimination (Fig. 1). The relevance of this multimodal approach is underscored by the 2021a release of the commercially available MSI software SCiLS Lab (Bruker Daltonik GmbH), which now also allows QuPath annotations to be exported into SCiLS Lab.
Besides taurine, several other small molecules with significantly different levels in the two tumor types were detected, which contributed to the AUC of 0.94 in the ROC analyses (Fig. 2C). The best classification result was obtained with a random forest algorithm, which limits overfitting and has been used in various MSI studies before [35]. For taurine alone, the diagnostic accuracy was 74% with an AUC of 0.77, a promising result for a single analyte in the present study setup. Regarding the overlap of the two tumor types in the t-SNE visualization, it is noteworthy that most outliers were of the mucinous subtype in both UrC and CRC, whereas intestinal-type tumors were clearly discriminated (Fig. 2A and B). These differences have to be kept in mind when applying the technology in this scenario. Nevertheless, considering the third aim of the study, the diagnostic accuracy of taurine levels measured by MALDI-MSI considerably outperforms currently available adjunctive technologies such as immunohistochemistry for beta-catenin or CK7 [4,36,37].
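The 74% accuracy for taurine alone was obtained with a random forest. To illustrate how a single-analyte cut-off can be evaluated without letting test cases influence the threshold, the sketch below swaps in a simpler decision stump: one intensity threshold plus a direction, chosen by training accuracy within each of eight folds. The function names and the synthetic values are hypothetical; this is not the study's classifier.

```python
import numpy as np

def fit_stump(x: np.ndarray, y: np.ndarray):
    """Return (threshold, sign) maximising training accuracy of: predict 1 if sign * (x - t) > 0."""
    xs = np.sort(x)
    candidates = np.r_[xs[0] - 1, (xs[1:] + xs[:-1]) / 2, xs[-1] + 1]  # midpoints between values
    best_acc, best = -1.0, (0.0, 1)
    for t in candidates:
        for sign in (1, -1):
            acc = ((sign * (x - t) > 0) == y).mean()
            if acc > best_acc:
                best_acc, best = acc, (t, sign)
    return best

def predict_stump(x: np.ndarray, stump) -> np.ndarray:
    t, sign = stump
    return (sign * (x - t) > 0).astype(int)

def cv_accuracy(x: np.ndarray, y: np.ndarray, k: int = 8, seed: int = 0) -> float:
    """k-fold cross-validated accuracy; the cut-off is refit on each training split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    correct = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(x)), test_idx)
        stump = fit_stump(x[train_idx], y[train_idx])
        correct += (predict_stump(x[test_idx], stump) == y[test_idx]).sum()
    return correct / len(x)

# Hypothetical, perfectly separated intensities: 19 UrC (label 0) and 27 CRC (label 1)
x = np.r_[np.linspace(0.0, 1.0, 19), np.linspace(2.0, 3.0, 27)]
y = np.r_[np.zeros(19, dtype=int), np.ones(27, dtype=int)]
print(cv_accuracy(x, y))  # 1.0
```

Unlike the AUC, this accuracy depends on where the cut-off lands, which is one reason a single marker can show 74% accuracy alongside an AUC of 0.77. A stump also yields an explicit intensity cut-off, which would be easier to validate in an independent cohort than a forest of trees.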
Taurine has also been reported to be elevated in urine samples of patients with colorectal neoplasia [38]. Conversely, the lower taurine levels found in UrC specimens might also be reflected in urine samples, which should be analyzed further.
Our study has some limitations. As stated above, data quality could have been improved if fresh frozen tissue of both UrC and CRC had been used. However, as UrC is such a rare tumor type, FFPE tissue samples are the only source of material available in sufficient numbers. Although we analyzed multiple cores per sample and discriminated tumorous from non-tumorous areas, the number of samples, i.e., the cohort size, in the present study is still low. Therefore, the results should be considered a proof of concept that identified a promising diagnostic biomarker (taurine) with a combined MALDI-MSI/digital pathology approach, which has to be validated in further studies and larger cohorts.

Data availability
The datasets used and analyzed in this study are available from the corresponding author on reasonable request.
Author contributions HR, HB, TS, JMN, KN, and UKe contributed to the study concept and design; JMN developed the MSI methodology and performed data acquisition, analysis and interpretation, and statistical analysis; HR annotated the TMAs; JMN and NN performed software engineering; HCK, FB, PN, and UKr provided support for TMA generation as well as technical and material support; HR and JMN wrote the paper. All authors reviewed the paper and read and approved the final version.

Compliance with ethical standards
Conflict of interest The authors declare no competing interests.
Ethics approval The study was approved by the ethics committee of the University of Duisburg-Essen (15-6372-BO) and it was performed in accordance with the Helsinki declaration and its amendments.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.