MACE RNA sequencing analysis of conjunctival squamous cell carcinoma and papilloma using formalin-fixed paraffin-embedded tumor tissue

Recent advances in the field of biomedical research allow for elucidation of the transcriptional signature of rare tumors such as conjunctival squamous cell carcinoma (SCC). In this study we compare its expression profile to conjunctival papilloma (Pap) and healthy conjunctival tissue (Ctrl) and develop a classification tool to differentiate these entities. Seven conjunctival SCC, seven Pap and ten Ctrl were formalin-fixed and paraffin-embedded (FFPE) and analyzed using Massive Analysis of cDNA Ends (MACE) RNA sequencing. Differentially expressed genes (DEG) and gene ontology (GO) clusters were explored and the abundance of involved cell types was quantified by xCell. Finally, a classification model was developed to distinguish SCC from Pap and Ctrl. Among the most prominent DEG in SCC a plethora of keratins were upregulated when compared to Pap and Ctrl. xCell analysis revealed an enrichment of immune cells, including activated dendritic cells and T-helper type 1 cells (Th1), in SCC when compared to Ctrl. The generated classification model could reliably discriminate between the three entities according to the expression pattern of 30 factors. This study provides a transcriptome-wide gene expression profile of rare conjunctival SCC. The analysis identifies distinct keratins, as well as dendritic and Th1 cells as important mediators in SCC. Finally, the provided gene expression classifier may become an aid to the conventional histological classification of conjunctival tumors in uncertain cases.


Scientific Reports
| (2020) 10:21292 | https://doi.org/10.1038/s41598-020-78339-6 www.nature.com/scientificreports/ standard genome annotations required for probe design and possible cross-hybridization problems 20 . The exact cellular and molecular mediators and the signaling pathways involved in the development of SCC remain not fully understood and a targeted molecular therapy approach is therefore not available. Transcriptome-wide gene expression analysis is currently evolving as an important tool in cancer research to reveal gene expression signatures and molecular diagnostic markers and to define potential therapeutic targets [21][22][23] . In light of these recent endeavors in the field of biomedical research, the aim of this study was to elucidate the cellular and molecular mediators of conjunctival SCC by comparing them to Pap and healthy conjunctiva and to define their expression signature as an aid to the histological classification. The transcriptional signatures of conjunctival SCC presented in this study may improve our understanding of the disease, as well as its diagnosis and eventually contribute to successful personalized therapeutic approaches.

Patients.
A total of 24 conjunctival samples from 24 patients were included in this study. The tissue samples were collected during therapeutic procedures and not specifically for this study. Seven subjects with conjunctival SCC and seven subjects with Pap who had been treated at the Eye Center of the University of Freiburg from 2003 to 2017 were retrospectively enrolled in this study. Healthy conjunctival tissue samples (superior quadrants in proximity to the limbus) from ten subjects who underwent buckling or 20-gauge vitrectomy surgery for retinal detachment served as controls. All tissue samples were analyzed in an anonymized manner. Ethics Committee approval was granted from the Ethics Committee Freiburg for this study and a written informed consent was obtained from each patient. All experiments were performed in accordance with the standards of the Declaration of Helsinki.
Paraffin embedding. Formalin fixation and paraffin embedding (FFPE) of tumor and control tissue were performed according to routine protocols as previously described 24 . Briefly, specimens were fixed immediately after surgery in 4% formalin for 12 h, dehydrated in alcohol and finally processed for paraffin embedding. Paraffinized tissue samples were then processed for RNA isolation. 4-µm-thick sections were cut, mounted on silanized slides and deparaffinized in xylol-alcohol. Following routine histological staining, the histological diagnosis for each specimen was determined by an experienced ophthalmic pathologist. RNA isolation. Total RNA was isolated from ten to fifteen formalin-fixed and paraffin-embedded (FFPE) 4-µm-thick sections from each specimen using the Quick-RNA FFPE Kit (Zymo Research, USA). Following a DNAse I digestion using the Baseline-ZERO kit (Epicentre, USA), the RNA concentration was measured with the Qubit RNA HS Assay Kit on a Qubit Fluorometer (Life Technologies, USA). The RNA quality was determined with the RNA Pico Sensitivity Assay on a LabChip GXII Touch (PerkinElmer, USA).

MACE libraries.
RNA sequencing was performed using massive analysis of cDNA ends (MACE), a 3′ RNA sequencing method. The preparation of MACE libraries was carried out using 1 µg of total RNA, as previously described 25 . The barcoded libraries (seven SCCs, seven Pap and ten healthy conjunctival samples) were sequenced on the NextSeq 500 (Illumina, USA) with 1× 75 bp. PCR bias was removed using unique molecular identifiers. The sequence data have been submitted to the Gene Expression Omnibus database under accession number GSE147449.

Statistics and bioinformatics.
Sequencing data were uploaded to and analyzed on the Galaxy web platform (usegalaxy.eu) 26 , as previously described 27 . Quality control was performed with FastQC Galaxy Version 0.72 28 . Reads were mapped to the human reference genome (Gencode, release 31, hg38) with RNA STAR Galaxy Version 2.7.2b 29 with default parameters using the Gencode annotation file (Gencode, release 31, https :// www.genco degen es.org/human /relea ses.html). Reads mapped to the human reference genome were counted using featureCounts Galaxy Version 1.6.4 30 with default parameters using the aforementioned annotation file. The output of featureCounts was imported to RStudio (Version 1.2.1335, R Version 3.5.3). Gene symbols and gene types were determined based on ENSEMBL release 97 (July 2019) (Human genes, GRCh38.p12, download on 09/28/2019) 31 . Genes with 0 mean reads in at least one group were removed from analysis. Batch effects were removed by the limma function removeBatchEffect 32 . After principal component analysis, differential gene expression with Benjamini-Hochberg adjusted (adj.) p-values was analyzed using the R package DESeq2 Version 1.22.2 33 with default parameters. Transcripts with log2 fold change (log2 FC) > 2 or < − 2, adjusted p-value < 0.05 and mean of normalized reads > 10 in the respective upregulated group were considered as differentially expressed genes (DEG). Heatmaps were created with the R package ComplexHeatmap 1.20.0 34 . Venn diagrams were created using the VennDiagram package 35 . Other data visualization was performed using the ggplot2 package 36 . Gene enrichment analysis and its visualization were done using the R package clusterProfiler 3.10.1 37 . To elucidate the cellular profile within the normal conjunctival, Pap and SCC microenvironment, we applied the computational method xCell, which quantifies the abundance scores of 64 immune and stroma cell types including hematopoietic progenitors, epithelial cells, extracellular matrix cells as well as adaptive and innate immune cells 38 . For this purpose, transcripts per million were calculated based on the output of featureCounts (assigned reads and feature length), as previously described 39 . Enrichment scores were compared between different groups using Mann-Whitney U test. Cell types with p < 0.05 were considered to be significantly differentially enriched.
For each group, specific genes were identified based on the overlap of upregulated factors in comparison to the two other groups. The 10 highest expressed genes in each group were used to develop a 30-gene classifier, as previously described 40 . Briefly, Pearson correlation was calculated for each sample's expression of these 30 genes to the mean expression of each group using leave one out validation 41 www.nature.com/scientificreports/ one out sample was compared to the remaining samples of each group, repeating the procedure for each sample. For example, a higher Pearson correlation with SCC mean expression than with Papilloma mean expression predicted the sample to be SCC. Each sample's correlation pair was plotted (SCC vs. Pap and Pap vs. Ctrl). The distance of each point from the diagonal with slope 1 defined a score to differentiate between SCC and Pap or between Pap and healthy conjunctiva, respectively. Cutoff values were determined based on standard deviation of absolute values of a group's distance.

Results
24 conjunctival samples from 24 patients were included in this study-seven conjunctival SCC (five male and two female patients, mean age of 70 ± 13 years), seven conjunctival Pap (three male and four female patients, mean age of 37 ± 24 years) and ten healthy conjunctival tissue samples (nine male and one female patients, mean age of 55 ± 8 years) (Fig. 1A). Principal component analysis revealed samples clustering into three distinct groups, which co-segregated with clinical features of healthy conjunctiva, conjunctival papilloma and carcinoma (Fig. 1B). Differentially expressed genes between the three groups are illustrated in a heatmap, showing their clear segregation on a transcriptional level. No obvious sex-or time-in-FFPE-dependent influence on the transcriptional profile was found (Fig. 1C).
Differential expression of genes in conjunctival SCC and Pap compared to healthy conjunctiva. Next, we explored the differentially expressed genes in conjunctival SCC and Pap compared to conjunctiva. MACE analysis revealed 480 upregulated and 359 downregulated factors in SCC compared to healthy conjunctiva ( Fig. 2A   www.nature.com/scientificreports/ genes in SCC versus control tissue and in Pap versus control tissue (Fig. 2B). While 98 transcripts were upregulated in both SCC and Pap when compared to healthy conjunctiva, we detected 382 genes showing a specific overexpression in conjunctival SCC only (Fig. 2B). The GO terms with the most outstanding p enrichment value among this set of carcinoma-specific transcripts were skin development, epidermis development and keratinocyte differentiation (Fig. 2C). On the other hand we found 317 upregulated factors, which were exclusive in Pap and contributed to the GO terms nuclear division, organelle splitting and meiotic cell cycle (Fig. 2D).
To further determine SCC-specific factors we compared the differentially expressed genes in SCC versus control tissue and in SCC versus papilloma. We found 222 factors to be upregulated in SCC when compared to both, healthy conjunctiva and papilloma (Fig. 3A). The most significantly upregulated transcripts within this group (according to the mean of normalized reads in SCC) were the keratins 14, 17 and 16 (KRT14, KRT16 and KRT17) (Fig. 3B).
Papilloma-specific factors were also defined by comparing the differentially expressed genes in Pap versus SCC and Pap versus control tissue. 184 transcripts were upregulated in Pap when compared to both other entities. The highest expressed transcript within this group (according to mean of normalized reads in Pap) was YTHDC1 (YTH domain containing 1) (SupplFig. 1 A,B).

Cell type enrichment analysis in conjunctival tumors.
Since the RNA-Seq profile is a conglomerate of RNA of multiple cell types including resident and infiltrating cells, we first conducted cell type enrichment analysis to recover the identity of cell types found in conjunctival SCC, Pap and healthy conjunctiva, using the gene signature expression-based cell type enrichment tool xCell 38 . Cell type enrichment scores (ES) across 64 immune and stromal cell types were obtained for the three entities (Fig. 4A, SupplFig. 3). Our data demonstrates that immune scores representing the immune cell content were overall higher in SCC (0.08 ± 0.05) when compared to healthy conjunctiva (0.03 ± 0.03 for Ctrl, p = 0.04) and similar to Pap (0.1 ± 0.15, p = 0.52). The stromal scores, in contrast, were overall reduced in SCC (0.01 ± 0.01) and Pap (0.01 ± 0.02) when compared to healthy conjunctiva (0.03 ± 0.02; for SCC p = 0.004, for Pap, p = 0.03). Among the different cell subpopulations, activated dendritic cells (aDC) and T-helper type 1 cells (Th1) were significantly increased in SCC samples compared to both other groups (p < 0.05, Fig. 4B). In Pap, smooth muscle cells and CD8 + naïve T cells were significantly enriched in comparison to SCC and normal conjunctiva (p < 0.05), while control conjunctiva was characterized by an enrichment of hematopoietic stem cells and endothelial cells (p < 0.05, Fig. 4B).
Classification model. To build a classifier distinguishing SCC, Pap and normal conjunctiva, we developed a classification model on the basis of factors specific for the three analyzed groups. For this purpose, Pearson correlation was calculated for the mean expression of the top 10 SCC- (Fig. 3B), Pap-(SupplFig. 1) and Ctrlspecific transcripts (SupplFig. 2) to the mean expression of each group using leave one out validation (Fig. 5A,B).
The classification model showed a reliable classification of the diagnosis according to the expression pattern of the 30 genes (Fig. 5C). Using this approach, the sensitivity and specificity for discrimination of SCC, as well as for normal conjunctiva was 100.0% each. For Papilloma, the sensitivity and specificity were 57.1% and 100.0%, respectively. The overall accuracy of the model was 85.7% (AUC (area under the curve) for the whole model: 0.86).

Discussion
Transcriptome-wide gene expression analysis is emerging as an important tool in cancer research to decipher altered gene expression patterns, detect molecular diagnostic markers and to define potential therapeutic targets. However, the transcriptional analysis of rare tumors, such as SCC, has been impeded in the past by their low incidence, which makes a prospective study with fresh tissue challenging. Formalin-fixed paraffin-embedded (FFPE) tissue samples of rare tumors, including conjunctival SCC and Pap, are, among others, routinely collected in clinical histological archives. The advent of 3′ RNA-Seq methods, such as the Massive Analysis of cDNA Ends (MACE) technology, allows for high throughput sequencing of FFPE specimens that have been stored in histological archives over the past decades 43,44 . In this study we conduct the first comparative analysis of the cellular tumor microenvironment and expression profiles of conjunctival SCC with conjunctival Pap and healthy conjunctiva using the novel MACE technique for sequencing archived samples paired with xCell data deconvolution for cell type discrimination in bulk transcriptomic data. www.nature.com/scientificreports/ In line with histological observations of changed epithelial cell differentiation and keratinization 2 , our transcriptional analysis reveals a plethora of altered factors in SCC that contribute to biological GO processes such as skin development, epidermis development and keratinocyte differentiation. Almost the entire gene cluster of the epidermal differentiation complex (EDC) is cumulatively upregulated in conjunctival SCC when compared to healthy conjunctival tissue. The EDC comprises a gene complex of over fifty genes encoding molecules acting during the coordinated process of cornification, the specific form of differentiation and programmed cell death characteristic for keratinocytes 45 . Among them, e.g. FLG2, filaggrin family member 2, LCE3D, late cornified envelope 3D, SPRR2G, small proline rich protein 2G, and many more were upregulated in SCC in our dataset. The notion of altered differentiation in the malign tumor, as evidenced by our GO term analysis, is in contrast to the mere upregulation of proliferation-associated factors in the benign papilloma.  www.nature.com/scientificreports/ In accordance with the above-mentioned GO terms, we found several keratins (KRT14, KRT17, KRT16, KRT6A, KRT6B, KRT6C) to be significantly increased in conjunctival SCC when compared to both healthy conjunctiva and Pap. Changes in keratin subtype expression have been linked to various tumor species, particularly cutaneous epithelial tumors 46 , but also other squamous cell carcinoma types 40,[47][48][49][50] . Their expression and organization have an impact on cell growth, migration and invasion, all hallmarks of cancer metastasis 51,52 . Furthermore, keratins function in immune recognition, tenacity to cellular stresses, posttranslational modifications that modulate intermediate filament assembly and cellular motility and thereby are highly relevant in cancer progression and tumor cell dissemination 53 . In line with our analysis SCC of the skin is characterized by the expression of the stratified epithelial keratins KRT5 and KRT14 and the hyperproliferative keratinocyte-type keratins KRT6, KRT16 and KRT17, overexpressed upon cell stress and injury 54 . Furthermore, KRT17 has been previously found to be upregulated in SCC of the conjunctiva in microarray analyses 19 . Taken together, this data supports the notion that stratified-epithelial keratins, in particular KRT6 and KRT17, are useful as general markers for squamous cell carcinomas in histologically uncertain, poorly differentiated conjunctival SCC samples 54 .
In addition to keratins, S100 calcium-binding proteins (S100A2 and S100A7) stand out as highly upregulated SCC-specific factors in our analysis. S100 family members are involved in a plethora of cellular processes including proliferation, differentiation, inflammation and apoptosis 55 . Altered S100A7 expression has been linked to poor clinical outcomes in several solid cancer types by promoting invasiveness of cancer cells via upregulation of MMPs 56 . Moreover, S100A2 is a crucial factor for proliferation and differentiation of epithelial cells 57 and promoting neoplastic transformation 19 . Thus, S100A2 and S100A7 may possess certain prognostic and therapeutic potential in conjunctival SCC.
SCC are characterized by tissue infiltrating immune cells, epithelial cell differentiation and keratinization 2,58 . In line with these histological findings, the x-Cell-based bioinformatic cell type enrichment analysis demonstrated an overall increased immune cell score when compared to healthy conjunctiva. Among the different cell subpopulations, activated dendritic cells (aDC) and T-helper type 1 cells (Th1) were significantly increased in SCC samples compared to both other groups. These results concur with reports suggesting that the presence of dendritic cells correlates with poor prognosis for SCC of the ovary 59 and SCC of the head and neck 60 . Dendritic cells are antigen presenting cells mediating a link between innate and adaptive immunity by secreting interleukin (IL)-12 and type I interferons and thus activating Th1 cells 61 , which are also substantially represented in conjunctival SCC in our cell type analysis. Thus, the accumulation of these immune cell types opens new perspectives for cell-specific treatment of conjunctival malignancies.
In order to define an expression signature as an aid to the histological classification of conjunctival papilloma and squamous cell carcinoma and to reduce potential observer bias and variability, we developed a classification model including 30 factors identified in our analysis as specific for the three entities (Ctrl, Pap and SCC) using a strategy described recently 40 . The model showed an overall accuracy of 0.86 (AUC), indicative of robust allocation. Thus, the presented RNA classifier provides an objective quantitative method to aid in the pathological diagnosis of conjunctival SCC and Pap. This non-subjective molecular classification of SCC and Pap may be useful in the future as an additional diagnostic tool for complicated histological cases and may serve to optimize the reproducibility and accuracy of the diagnosis. However, further studies featuring more patients and especially challenging cases (e.g. dysplastic changes in epithelium) are warranted to validate the discriminatory power of the classifier and confirm its applicability in clinical routine.
Although this study is retrospective, includes a limited sample size and relies on internal expression signature validation only (leave one out validation approach 41,42 ), its findings are consistent with previous observations in conjunctival and cutaneous epithelial tumors, which supports this approach. In addition, a transcriptome-wide evaluation is inherently more comprehensive and serves as a basis for the generation of new hypotheses to be tested in the future. Single-cell RNA sequencing could provide a detailed impression of cell heterogeneity, but www.nature.com/scientificreports/ is not feasible with archived FFPE material. We employed cell type enrichment analysis in bulk RNAseq data using xCell 38 to assess the identity of the major cell types found in conjunctival samples. It is important to note, that these results were not validated by histopathology due to the limited amount of available tissue and sample acquisition difficulties for a considerable number of immunohistochemical stainings. Further prospective studies with increased power are warranted for clinical validation of the developed classification model. Taken together, the present study provides a high throughput transcriptome-wide gene expression profile of rare squamous cell carcinoma of the conjunctiva using the MACE technology in FFPE archival tumor samples. The analysis identifies, among others, dendritic cells and Th1 cells as primary accumulating cell types and keratins as highly expressed factors in SCC, which all together may contribute to tumor growth and disease progression. Finally, we developed a sensitive and specific gene expression classifier for conjunctival tumors that distinguishes SCC from Pap and normal conjunctiva, which may be useful as an additional diagnostic tool for complicated histological cases and serve to optimize the reproducibility and accuracy of the diagnosis.

Data availability
The sequence data have been submitted to the Gene Expression Omnibus database under accession number GSE147449. The password is available from the corresponding author upon request. www.nature.com/scientificreports/ Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.