Whole transcriptome targeted gene quantification provides new insights on pulmonary sarcomatoid carcinomas

Pulmonary sarcomatoid carcinomas (PSC) are a rare group of lung cancer with a median overall survival of 9–12 months. PSC are divided into five histotypes, challenging to diagnose and treat. The identification of PSC biomarkers is warranted, but PSC molecular profile remains to be defined. Herein, a targeted whole transcriptome analysis was performed on 14 PSC samples, evaluated also for the presence of the main oncogene mutations and rearrangements. PSC expression data were compared with transcriptome data of lung adenocarcinomas (LUAD) and squamous cell carcinomas (LUSC) from The Cancer Genome Atlas. Deregulated genes were used for pathway enrichment analysis; the most representative genes were tested by immunohistochemistry (IHC) in an independent cohort (30 PSC, 31 LUAD, 31 LUSC). All PSC cases were investigated for PD-L1 expression. Thirty-eight genes deregulated in PSC were identified, among these IGJ and SLMAP were confirmed by IHC. Moreover, Forkhead box signaling and Fanconi anemia pathways were specifically enriched in PSC. Finally, some PSC harboured alterations in genes targetable by tyrosine kinase inhibitors, as EGFR and MET. We provide a deep molecular characterization of PSC; the identification of specific molecular profiles, besides increasing our knowledge on PSC biology, might suggest new strategies to improve patients management.

Transcriptome analyses. The differential expression analyses between each tumor dataset and the respective normal control revealed 115, 118 and 275 differentially expressed genes in PSC, LUAD and LUSC respectively. Among these 38, 22 and 152 genes were deregulated only in PSC, LUAD and LUSC respectively, whereas 44 genes were deregulated in all datasets. Deregulated genes are reported in Table 2. FDR and fold changes are reported in Supplementary Table S1A-C. Immunohistochemistry. Among the 38 genes specifically deregulated in PSC group, 4 genes had a logFC ratio between PSC, LUAD and LUSC over the 90 th percentile: sarcolemma associated protein (SLMAP), joining chain of multimeric IgA and IgM (IGJ), RNA binding region (RNP1, RRM) containing 3 (RNPC3) and RRN3 homolog, RNA polymerase I transcription factor pseudogene 2 (RRN3P2). According to literature data, IGJ and SLMAP seemed to be the most relevant in human cancer, and they were selected to validate the differential gene expression by IHC on an independent cohort, including 30 PSC, 31 LUAD and 31 LUSC cases. The number of IGJ positive inflammatory cells infiltrating the tumor was lower in PSC than in LUSC samples (FDR = 0.0015) and in PSC than in LUAD samples (FDR = 0.0035) (Fig. 1a). The expression of SLMAP was higher in PSC (FDR < 0.0001) and LUAD (FDR = 0.0010) in comparison with LUSC samples (Fig. 1b).

Id
Age Gender T-stage N-stage Histotype* Histotype components Genotype SLMAP score** IGJ score** www.nature.com/scientificreports www.nature.com/scientificreports/ Results related to the immunohistochemistry evaluation of PD-L1 expression on tumor and infiltrating immune cells of PSC cases are reported in Table 3. IGJ expression did not correlate with PD-L1, whereas PD-L1 expression levels were correlated in tumor and immune cells (ρ = 0.71, P < 0.0001).
Pathway enrichment analysis. Pathway analysis revealed 6, 6 and 5 enriched pathways in PSC, LUAD and LUSC respectively. Cell cycle and p53 signaling pathways were enriched in all groups, Forkhead box (FOXO) signaling and Fanconi anemia (FA) pathways were specifically enriched in PSC. Details are reported in Fig. 2.

Discussion
PSC are a group of rare and aggressive tumors, which are extremely challenging both to diagnose and to treat 11 . Preoperative pathological diagnosis, by bronchoscopy or CT-guided fine needle biopsy, may fail to identify these tumors, probably because of their high heterogeneity 1 . Currently, no specific signs, symptoms or biomarkers have been found for PSC in comparison with other NSCLC types. Concerning therapeutic options, the high relapse and low survival rates, even after surgical treatment, weaken the role of surgery itself, which could be considered only in a minority of patients mainly at early stage 11 . Furthermore, PSC are highly resistant to conventional chemotherapy 12 and only few authors reported an increased overall survival associated with perioperative chemotherapy 11,13 .
PSC have been scarcely studied according to their rarity, and the unresolved diagnostic, prognostic and therapeutic issues underline the urgent need to better define the molecular profiles of these tumors to identify new potential markers.
In this study we characterized the gene expression profiles of a well selected series of 14 PSC cases, followed by a comparison of PSC expression data with transcriptome data of LUAD and LUSC available on TGCA database. The analysed 14 PSC cases included GCC, SCC, CS, PLC and one case of PB. As described in the introduction section, each PSC type is characterized by peculiar aspects, making this group a real heterogeneous one. Particularly, PB appears as the most histogenetically and morphologically different from all the other subtypes, it is a biphasic tumor where the epithelial component is fetal-type adenocarcinoma with a mesenchymal component that is heterologous and/or immature (blastema-like) 14 and some evidences suggested that blastomas may have unique histologic, immunohistochemical and molecular characteristics 15 . Anyway, all PSC types represent poorly differentiated or dedifferentiated forms of conventional NSCLC. Even though our study cohort includes different PSC histotypes, we considered them as a unique group in the comparison with LUAD and LUSC, because all of them are generally referred to as sarcomatoid pulmonary carcinomas 2 and, to date, no relevant genetic differences have  www.nature.com/scientificreports www.nature.com/scientificreports/ been reported between them 5 . Moreover, we have performed a principal component analysis using transcriptome data and unsupervised clustering (Supplementary Fig. 1) using the expression levels of the 10.000 genes with the highest variance. In our series histotypes do not significantly influence gene expression profiles.
To analyse gene expression profiles of our samples, we used a targeted whole transcriptome sequencing approach reported as highly accurate and perfectly comparable with RNA-sequencing approaches and next generation sequencing platforms utilized to obtain TCGA data 16 . Although TCGA samples are fresh frozen and we analysed only formalin-fixed and paraffin-embedded (FFPE) tissues, the comparison of transcriptome data is supported by several evidences demonstrating a high correlation of gene expression profiles between these two types of specimens [17][18][19] . Anyway, in order to overcome this potential limitation, a validation of results was carried out on an independent cohort of PSC, LUAD and LUSC FFPE tissues from our archives.
In detail, the transcriptome analysis identified genes deregulated in PSC, two of which, IGJ and SLMAP, were confirmed by IHC on the second series on cases.
IGJ gene encodes for Immunoglobulin J chain, it is essential for cell development 20 , immunological defence and antibody secreting cells 21 . IGJ is a linker protein for immunoglobulin alpha and mu polypeptides, it links together IgM or IGA monomers to form pentameric IgM or dimeric IgA. IGJ participates in B cell differentiation and activation and it shows a high expression during the last stages of B cell activation 22 . However, it was reported that IGJ is transcriptionally active also during early stages of both B cell and intrathymic stages of T cell differentiation, but not in peripheral T cells, monocytes or natural killer cells 23 . In addition, an increased expression of IGJ was observed in early B cells in comparison with hemopoietic stem cells and pro-B cells 20 .
Our results demonstrated that it is down-regulated at RNA level and that the number of IGJ positive inflammatory cells infiltrating the tumor is lower in PSC in comparison to other NSCLC types. In some cancers, such as B-acute lymphoblastic leukaemia and Acute Myeloid Leukemia, IGJ expression was found to be correlated with prognosis 24,25 and in hepatocellular carcinoma a high expression of IGJ together with CD5 antigen-like (CD5L) and galectin-3-binding protein (LGALS3BP) was reported to predict response to the chemotherapeutic agent Sorafenib 26 . As regards lung cancer, Martin and collaborators found that this gene could serve as a biomarker for the smoking exposure response 27 . In addition, Kuo and collaborators have recently identified a unique immune gene expression signature of bronchoalveolar lavage cells of tumor-bearing lung segments and tumor adiacent non-neoplastic lung tissues. This signature that included IGJ correlated to inhibitory checkpoints expression 28 . Moreover, IGJ transcription in the lungs was reported to decrease during tumorigenesis, probably due to the immunosuppressive effects of the tumor cells 26 .
The exact role of IGJ in lung cancer has not been definitively identified, but it is clear that the characterization of tumor-infiltrating immune cells has been acquiring an increasing importance, since it could provide crucial information for therapeutic assessment 28 . In the same context we evaluated the expression of PD-L1 on tumor and infiltrating immune cells of all PSC cases included in this study. Interestingly, as shown in Table 3 CS never had a high PD-L1 expression both in the tumor and immune cells. On the other hand, PLC never showed absence of PD-L1 expression in tumor cells. Notably, in our series IGJ and PD-L1 expression were not correlated, whereas PD-L1 levels were positively correlated in tumor and immune cells. Indeed, we cannot draw conclusions with the present data, but it is worth investigating PD-L1 expression across different PSC histotypes.
SLMAP gene was up-regulated in PSC only in comparison to LUSC samples. This gene encodes for an integral membrane protein containing C-terminal regions of coiled-coil structure, which is a component of a conserved striatin-interacting phosphatase and kinase complex 29 . The striatin complexes are involved in several cellular processes such as signaling, cell migration, cell cycle control and apoptosis. Moreover, it was demonstrated that depletion of SLMAP results in the constitutive activation of the Hippo Pathway, whose disruption was associated with tumorigenesis, cancer progression and tumor immunogenicity [29][30][31] .
Besides emerging single biomarkers, the pathway enrichment analysis revealed two pathways specifically enriched in PSC: the FOXO signaling pathway and the FA pathway, strongly involved in cancer processes.
FOXO pathway, essential for both cell growth and differentiation, comprises proteins from the Forkhead-box (FOX) family, which consist of a FOX domain and a transactivation domain and include DNA-binding proteins regulating transcription and DNA repair. In detail, FOXO transcription factors have a tumor suppression function, control apoptosis by regulating E2F transcription factor 1 (E2F1) transcriptional specificity, enhance the transcription of the pro-apoptotic mediator phorbol-12-myristate-13-acetate-induced protein 1 (PMAIP1) and are usually lost in cancer cells [32][33][34] . In addition, FOXO proteins were reported as key regulators of epithelial-mesenchymal transition 35 , which is involved in the sarcomatoid morphologic change 36 .
FA pathway is responsible for repairing DNA crosslinks and double strand breaks and maintaining chromosomal stability 37 . It was demonstrated that FA pathway activation occurs during DNA replication or upon DNA damage induced by carcinogens in cigarette smoke 38 or by DNA cross linking agents, such as the chemotherapeutic agents gemcitabine and cisplatin 39 . Chemotherapy still plays an important role in the management of advanced NSCLC and platinum-based agents are the most effective in PSC 3,40 . In this context, it was proved that the up-regulation of FA pathway is linked to acquired resistance to cisplatin. Considering that this pathway is essential for the response to DNA damage, its enrichment in PSC might then explain why these tumors are less susceptible to DNA cross-link based chemotherapy and may suggest new therapeutic strategies based on the inhibition or downregulation of FA, which was already demonstrated to be able to reverse acquired resistance to cisplatin in NSCLC cell lines 41 .
In addition, it was reported that PSC may harbour molecular alterations similar to other NSCLC types 5,7 . Indeed, we found 9 out of 14 patients harbouring alterations in commonly mutated oncogenes, among which 5 had alterations in genes currently recommended for testing in NSCLC, as MET, ALK, EGFR and BRAF 5,6,[8][9][10] . In this study, the analysis of targetable gene alterations seemed to further support a possible involvement of targeted therapy also in this subgroup of patients 6 . (2019) 9:3536 | https://doi.org/10.1038/s41598-019-40016-8 www.nature.com/scientificreports www.nature.com/scientificreports/ Moreover, some mutated oncogenes may be directly involved in the deregulation of enriched pathways, for instance PIK3-AKT signaling activation leads to functional loss of FOXO transcription factors 32 . In order to investigate such a relationship, we analysed the expression levels of 130 genes belonging to FOXO signaling according to KEGG database. Principal component analysis and hierarchical clustering were performed, and the results are showed in Fig. 3. The two PIK3CA mutants (i.e. SL4 and SL5) did not have similar variation pattern nor expression profile, and they did not have altered levels of FOXO mRNA. Nevertheless, this could be explained by the post-translational effect of PI3K/AKT axis on FOXOs. In fact, activating mutations in PIK3CA result in increased AKT activity and the consequent phosphorylation of FOXO proteins. Phosphorylated FOXOs are retained in the cytoplasm and eventually degraded 32 . Interestingly, the pulmonary blastoma (PB) case had a peculiar FOXO-gene profile. This is in agreement with the theoretical lower differentiation of PB, since FOXO signaling plays a pivotal role in differentiation and epithelial-mesenchymal transition 35 .
Although this study suffers from some limitations, mainly due to the low number of samples on which transcriptome analysis was performed, the comparison with TCGA data and the IHC validation on an independent cohort let us be quite confident about the reliability of our analyses.
In conclusion, we identified 38 genes specifically deregulated in PSC samples, of which two were validated by IHC as deregulated in comparison with LUSC and LUAD samples, and two pathways with a crucial role in cancer specifically enriched in PSC.

Study population. This study was retrospectively conducted in accordance to the principles of the Helsinki
Declaration of 1975 and it was approved by the ethics committee "Comitato Etico di Area Vasta Nord Ovest". Only archival and anonymous samples were analyzed, no protected health information was used and informed consents were obtained from patients.
A first series of 14 PSC tissues (surgical specimens) was selected for transcriptome analysis, comprising 1 PB, 4 CS, 2 SCC, 2 GCC and 5 PLC cases; 3 samples of normal lung parenchyma were also included in the transcriptome analysis. The validation of deregulated genes was conducted by IHC on a second series of lung tissues including 30 PSC, 31 LUAD, 31 LUSC surgical samples. Tumor samples were selected from the archives of the unit of pathological anatomy of the University Hospital of Pisa. All tissues were FFPE. The tumors were independently revised and classified by two expert pathologists (GA and GF) according to the current WHO histologic criteria 1  Nucleic acid purification. Only cases with adequate tumor material (>40% tumor cells) and minimal contamination from benign cells were selected for molecular analysis. For each FFPE tissue, three 10 µm thick sections and four 5 µm thick sections underwent standard deparaffinization for DNA and RNA purification, respectively.
All samples were enriched by manual macrodissection. Total DNA and total RNA were isolated using Qiagen Qiamp DNA mini kit and Qiagen RNeasy FFPE kit (Qiagen, Hilden, Germany) respectively, according to the manufacturer's instructions.
The concentration of total DNA was assessed using a spectrophotometer (ND1000; NanoDrop Technologies, ThermoFisher Scientific, Waltham, MA, USA), whereas the concentration of RNA was determined using a Qubit fluorometer and the Qubit RNA BR assay kit (Life Technologies, Carlsbad, CA, USA). Moreover, each RNA sample was inspected for quality using the RNA 6000 Nano kit by Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA).
DNA samples with a concentration greater or equal to 5 ng/µl were considered for molecular analyses. RNA samples with a concentration greater than 10 ng/µl and at least 50% of RNA fragments longer than 200 base pair (bp) were considered adequate for further analyses.  Human transcriptome gene expression analysis. A targeted whole transcriptome analysis was performed using the Ion Ampliseq Transcriptome Human Gene Expression kit (Life Technologies, Carlsbad, CA, USA). This method relies on a highly sensitive amplicon-based transcript quantification by semiconductor sequencing, allowing the simultaneous amplification and evaluation of more than 20.000 human genes. Ten ng of total RNA were reverse transcribed to cDNA using the Invitrogen SuperScript VILO cDNA Synthesis kit (Applied Biosystem -ThermoFisher Scientific, San Francisco, CA, USA).
Transcriptome libraries were generated according to the manufacturer's instructions. Briefly, after targets amplification and primers digestion, adapters and molecular barcodes were ligated to the amplicons followed by magnetic bead purification; then libraries were amplified and purified. Libraries were evaluated and quantified using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Multiplexed barcoded libraries were pooled for emulsion PCR on Ion sphere particles using the Ion PI Template OT2 200 kit on the One Touch 2 instrument. An automated Ion OneTouch ES enrichment of template-positive Ion sphere particles was then performed. Finally, samples were loaded on Ion PI chips, and sequenced, using the Ion PI Sequencing 200 kit (ThermoFisher Scientific, Waltham, MA, USA) on the Ion Proton sequencing system (Life Technologies, Carlsbad, CA, USA).
Immunohistochemistry. IHC analyses were performed on 3 µm thick tissue sections from the second series of samples. Immunoreaction was displayed using the avidin-biotin-peroxidase complex method and peroxidase activity was visualized with diaminobenzidine. Counterstaining was performed with hematoxylin and negative controls were carried out by omitting the primary antibodies. Immunostaining was executed as a fully automated assay using BenchMark XT automated slide stainers (Ventana Medical Systems, Tuscon, AZ, USA). All cases were independently evaluated by 2 pathologists (GA and GF), who were blinded to the clinical-pathological characteristics of the patients; discordant cases were revised until mutual agreement was reached.
For SLMAP immunohistochemical staining, the sections were incubated with a mouse antihuman SLMAP monoclonal antibody (clone SJ-09; Santa Cruz Biotechnology, Inc., Dallas, Texas) used at a 1:100 dilution. FFPE human testis tissue showing membrane and cytoplasmic localization was used as positive control. Immunostaining was heterogeneous so all cases were analyzed using a semiquantitative histologic scoring (H score), as previously reported 43 . Briefly, immunostaining intensity of each case was scored as follows: 0, none; 1, weak; 2, moderate; and 3, intense. In addition, the percentage of positive neoplastic cells was evaluated. For each case, a value designated H score was obtained by multiplying each intensity with the corresponding percentage of positive cells, thereby obtaining a final resulting score value (range 0-300) (Fig. 4a-c). www.nature.com/scientificreports www.nature.com/scientificreports/ For IGJ immunohistochemistry, sections were incubated with a rabbit monoclonal antibody to IGJ (1:100 dilution; clone SP105; Abcam, Cambridge, UK) using FFPE human tonsil tissue as positive control. IGJ is a secreted protein expressed by inflammatory cells in the plasma cell stage. The relative number of cytoplasmic positive plasma cells infiltrating the tumors was assessed in five randomly chosen 200X microscopic fields and the average of their counts was calculated (Fig. 4d-f).
For PD-L1 immunohistochemical staining, the slides were incubated with a rabbit monoclonal antibody to PD-L1 (clone SP263, Ventana Medical Systems, Tuscon, AZ, USA) according to manufacturer's instructions. Placental tissue was used as external positive control and inflammatory cells, such as dendritic cells, macrophages, mast cells, and T-and B-lymphocytes, were used as internal control. The PD-L1 expression was evaluated by tumor proportion score (TPS) as previously described 44 . Only viable tumor cells were included in the scoring. All other stained cells, such as tumor-associated immune cells, normal/non neoplastic cells and necrotic cells, were excluded from evaluation. The scoring was interpreted as no-PD-L1 expression (TPS < 1%), low PD-L1 expression (TPS 1-49%), and high PD-L1 expression (TPS ≥ 50%). We also evaluated the immunohistochemical expression of PD-L1 in the population of immune cells (IC) tipically found in the intratumoral and peritumoral regions. The immune cells were scored using the proportion of the tumor area that was occupied by PD-L1-positive immune cells of any intensity, including any PD-L1 staining regardless of the type of immune cell or its location, as previously described 45 . In detail, tumour-infiltrating immune cells were scored as IC3 to IC0 (IC3 ≥ 10%, IC2 ≥ 5% and < 10%, IC1 ≥ 1% and < 5%, and IC0 < 1%).
All evaluations were conducted using a LEICA DMLB microscope (Leica Microsystems Srl, Milan, Italy).

Bioinformatics analyses.
Raw sequencing reads were evaluated on the Torrent Server using a free ampliS-eqRNA plug-in that provides quality control, visualization and normalized counts per million for each gene. RNA-seq reads for each library were mapped to the amplicon sequences of the panel (reference human genome hg19) using the same plug-in. Raw counts for each gene in the UCSC coding gene annotation (hg19) were measured with HTSeq (http:// www-huber.embl.de/users/anders/HTSeq/) version 0.5.3p3. Genes with very low counts across all libraries were filtered. In particular, we retained only those genes with at least a count of 10 in all samples. Differential gene expression was first determined between our PSC samples and normal controls; the same analysis was then performed between LUAD and LUSC datasets and their respective normal controls (TCGA data).
In detail, differential expression was detected by edgeR version 2.6.8 46 . Briefly, the raw read count data was first scaled to library size followed by normalizing with weighted trimmed mean of M values (TMM) to consider the compositional bias in sequenced libraries. Then, the dispersion of the reads counts was estimated and an exact test was performed to detect differential expressed mRNAs between the groups. Genes showing an absolute logFC value ≥ 2 and a FDR < 0.05 were defined as differentially expressed.
To select genes with the strongest deregulation in PSC, the ratio between the logFC in PSC with those ones obtained in LUAD and LUSC was calculated. IHC validation was then performed on genes with a ratio over than 90 percentile of this distribution. Differences of IHC scores between groups were estimated by Kruskal-Wallis and Dunn's test with Benjamini-Hochberg correction for multiple comparison using dunn.test package version 1.3.5 47 . IHC scores of IGJ and PD-L1 (both in tumor and immune cells) were correlated by Spearman's correlation