The tumor immune microenvironmental analysis of 2,033 transcriptomes across 7 cancer types

Understanding the tumor microenvironment is important to efficiently identify appropriate patients for immunotherapies in a variety of cancers. Here, we presented the tumor microenvironmental analysis of 2,033 cancer samples across 7 cancer types: colon adenocarcinoma, skin cutaneous melanoma, kidney renal papillary cell carcinoma, sarcoma, pancreatic adenocarcinoma, glioblastoma multiforme, and pheochromocytoma / paraganglioma from The Cancer Genome Atlas cohort. Unsupervised hierarchical clustering based on the gene expression profiles separated the cancer samples into two distinct clusters, and characterized those into immune-competent and immune-deficient subtypes using the estimated abundances of infiltrated immune and stromal cells. We demonstrated differential tumor microenvironmental characteristics of immune-competent subtypes across 7 cancer types, particularly immunosuppressive tumor microenvironment features in kidney renal papillary cell carcinoma with significant poorer survival rates and immune-supportive features in sarcoma and skin cutaneous melanoma. Additionally, differential genomic instability patterns between the subtypes were found across the cancer types, and discovered that immune-competent subtypes in most of cancer types had significantly higher immune checkpoint gene expressions. Overall, this study suggests that our subtyping approach based on transcriptomic data could contribute to precise prediction of immune checkpoint inhibitor responses in a wide range of cancer types.

The tumor microenvironment (TME) is composed of many different types of cells such as fibroblasts and myofibroblasts, neuroendocrine cells, extracellular matrix, stromal cells, and immune cells 1 . As TME significantly contributes to the cancer development and malignancy 2 , understanding of TME is important. The paradigm of clinical cancer treatment has been shifted towards the use of immune checkpoint inhibitor (ICI) treatments, which target T cell inhibitory receptors 3 . The ICIs have shown promising clinical effects in several types of cancer, especially in non-small-cell lung cancer (NSCLC) 4 . However, most of the patients in different types of cancer still show non-responsiveness to the treatment, and instead suffer intolerable side effects 5,6 . The predictive and prognostic biomarkers for ICIs have been developed by estimating the expression levels of immune checkpoint genes including PD-1, PD-L1 and CTLA4 as well as mutational burden in cancer samples, but the heterogeneity of tumor microenvironment around tumor cells was not considered 7 . In addition, the expressions of immune checkpoint genes and mutational burden are not sufficient to select the adequate patients and predict the responses to ICIs in several cancer types 8,9 .
The classifications of immunological associated subtypes in cancer have demonstrated its clinical significance as prognostic and predictive factors that could be used for a personalized cancer immunotherapy [10][11][12] . For instance, enhanced cytolytic immune functions in infiltrating lymphocytes CD8 T cells improved efficacy of immunotherapy 5,6 , and the relative contribution of each immune cells was considered to estimate the anti-tumor response 13,14 . Since immunosuppression from abnormalities of the TME critically interrupts immunotherapeutic

Results
Unsupervised hierarchical clustering and immune characterization using TME scores separated 2,033 cancer samples into TME-related immune subtypes of 7 cancer types from TCGA cohorts. We conducted unsupervised hierarchical clustering of 7,762 cancer samples and 622 non-cancer controls across 22 cancer types using gene expression data. Among these cancer types, non-cancer controls in BLCA, BRCA, CESC, ESCA, HNSC, KIRC, PRAD, STAD, THCA, THYM and UCEC were separated into 2 or 3 clusters simultaneously along with cancer samples, which indicated that clusters cannot be defined into cancer-specific subtypes. Additionally, there was only one cancer sample at one of the clusters in READ. We thus excluded these 12 cancer types that were not clearly differentiated, and identified that 2,508 cancer samples in 10 cancer types were clearly separated into subtypes by the clustering.
The subtyping approach distinguished samples in 6 cancer types at k = 2 and 4 types at k = 3 via additional clustering. The principal component analysis (PCA) represented clustering patterns between the immune subtypes of cancer samples and non-cancer controls. At k = 2, immune-competent subtypes were closely clustered with non-cancer controls in PAAD, PCPG, SARC and SKCM while immune-deficient subtypes were closely clustered with controls in GBM and LICH (Fig. 1a). At k = 3, non-cancer controls, subtype A and B were distinctly clustered in CHOL, COAD, KICH and KIRP (Fig. 1b). Cluster dendrograms further demonstrated differential clustering patterns in 10 cancer types ( Supplementary Fig. 1). Gene set enrichment analysis of the 1,000 most variables genes between cancer and controls for clustering in 10 cancer types demonstrated these genes overlapped with at least one immune-related gene set except KICH (Supplementary Table 2).
The subtypes of 2,508 cancer samples were defined by comparing their tumor microenvironmental (TME) characteristics based on the estimated abundance of infiltrating immune and stromal cells in tumor tissues, called immune and stromal scores, respectively, from Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) 13 . Samples with relatively lower mean scores were classified as immune-deficient subtype (subtype A) and those with relatively higher mean scores as immune-competent subtype (subtype B).
To verify the significance of TME characteristics in immune-competent subtypes across 10 cancer types, we compared immune and stromal scores using ESTIMATE, and estimated levels of cytolytic activity (CYT) scores using Tumor IMmune Estimation Resource (TIMER) 14 . Also tumor purity scores were compared and calculated using ESTIMATE, which showed strong correlations with the purity scores inferred by ABSOLUTE algorithm using copy number alteration (CNA) and somatic mutation data 17 (Supplementary Fig. 2). Immune-competent subtypes in 10 cancer types showed enriched immune, stromal, CYT and lower tumor purity scores ( Fig. 2a-d, respectively). We discovered significant differences in these well-established TME predictors between the immune subtypes in COAD, GBM, KIRP, PAAD, PCPG, SARC and SKCM. Although correlation between tumor purity data from score. The level of significance denoted as: ns., non-significant, *p < 0.05, **p < 0.01, ***p < 0.001 and ****p < 0.0001. Statistical significances between subtypes were measured by unpaired Student t test.
ESTIMATE and ABSOLUTE was poor in PAAD (R2 = 0.16 and P = 5.08 × 10-2; Supplementary Fig. 2), we included this cancer for further analysis due to strong evidences of low tumor purity: significantly elevated immune, stromal and CYT scores.
In summary, we carried out unsupervised hierarchical clustering on 7,762 samples in 22 cancer types, and 11 cancers that were not clearly differentiated were excluded. A total of 2,675 samples in 11 cancer types were clearly differentiated at k = 2 and k = 3. Among these clearly differentiated cancer types, READ was excluded due to dramatically unbalanced distribution of the samples between subtypes, and we also excluded CHOL, KICH and LIHC based on criteria for TME characteristics using four estimated scores. Therefore, a total of 2,033 samples in 7 cancer types were selected and analyzed for further investigation ( Supplementary Fig. 3.). Among these samples, 728 cancer samples were identified as immune-deficient subtypes and 1,305 samples as immune-competent subtypes in 7 cancer types ( Table 1).
Immune-competent subtypes in 7 cancer types demonstrated differential TME characteristics. To further elucidate TME features of immune-competent subtypes in 7 different types of cancer, we compared the estimated abundances of infiltrated immune cells, including B cells, CD4 + T cells, CD8 + T cells, neutrophils, macrophages and dendritic cells (DC) from TIMER, which were appropriate for inter-sample comparison and validated using multiple approaches 14,18 . We found that most of these immune cells were significantly infiltrated in the TME of immune-competent subtypes across 7 cancer types (Fig. 3a).
Specifically, M1 macrophages, M2 macrophages and regulatory B (B reg ) cells were known to show differential effects on the TME: M2 macrophages and B reg were involved in immune evasion, and reduced the sensitivity to immune-checkpoint inhibitors, while M1 macrophages along with CD8 + T cells and DC played roles in anti-tumor activity 19,20 . Therefore, we investigated average z scores of signature gene expressions for M1 macrophages, M2 macrophages 21,22 , and B reg cell infiltrations that were experimentally validated 23,24 between the subtypes in this study. As immune-competent subtypes in 7 cancers demonstrated relatively increased abundance of infiltrated immune cells, the expressions of signature genes for M1 macrophages, M2 macrophages and B reg cells were also elevated in these TME highlighted subtypes. The signature genes for both M2 macrophage and B reg were significantly elevated in immune-competent subtypes of KIRP and PAAD among 7 types of cancer and only B reg cells in immune-competent subtypes of SKCM. While the signature genes for M1 macrophages were significantly elevated in immune-competent subtypes of PAAD and SARC (Fig. 3b). We also compared several immune-related molecules that were up-regulated within high cytolytic activity subtypes in colon cancer 25 for clarification of TME within our immune-competent subtypes. The majority of expressions of immune-related genes were also elevated in immune-competent subtypes across 7 cancers which clearly indicated high proportion of immune molecules within these subtypes ( Supplementary Fig. 4).
In addition to signature gene expressions for these immune cells and molecules, we compared the expression levels to estimate the abundance of NK cells 26 , which were well known to be involved in anti-tumor effects in the TME 27,28 . We discovered that immune-competent subtypes showed significantly elevated expressions of signature genes for NK cell infiltration in both KIRP and SKCM among 7 cancers (Fig. 3b). We further analyzed detailed transcriptomic signatures of anti-tumor effects induced by NK cells within two types of cancer: KIRP and SKCM as immune-competent subtypes in these cancer types showed significant elevation of genes related to NK cell infiltration compared to the other cancer types. Recruiting conventional type 1 dendritic cells (cDC1) into the TME by NK cells induced anti-tumor effects 29 . Interestingly, we found that the expressions of signature genes for cDC1 were only significant in SKCM (P = 9.60 × 10-3; Fig. 3c). In contrast, only immune-competent subtypes in KIRP had significantly increased expression of HLA class 1 molecules antigen E (HLA-E), which suppressed NK cell activity 30 (P = 4.76 × 10-5; Fig. 3c). At gene levels, immune-competent subtypes in KIRP and immune-deficient subtypes in SKCM showed decreased expression of C-type lectin domain containing 9 A (CLEC9A) (P = 4.82 × 10-1 and P = 4.08 × 10-3, respectively; Supplementary Fig. 5). The expressions of other markers for cDC1, including XCR1, CLNK and BATF3 were increased in both immune-competent subtypes of SKCM and KIRP. We also investigated NK inhibitory receptors: KLRC1 (NKG2A) and KLRD1 (CD94) that suppressed NK cell anti-tumor activity via binding HLA-E 30 to evaluate differential NK cell-mediated effects in KIRP and SKCM. However, they were increased in both immune-competent subtypes of KIRP and SKCM ( Supplementary Fig. 5).
We conducted survival analysis between the subtypes across 7 cancer types to support immune characteristics of TME in the immune-competent subtypes and to evaluate clinical significance of immune subtyping. www.nature.com/scientificreports www.nature.com/scientificreports/ We excluded PAAD and PCPG for the survival analysis because of the following reasons: overall survival (OS) was used as an endpoint since it is recommended to be used for the majority of TCGA cancer types except PCPG due to insufficient follow-up time 31 , and all of immune-deficient subtypes were identified as alive in PAAD. Our survival analysis found that immune-competent subtypes had significantly poorer survival rates than immune-deficient subtypes in KIRP (P = 1.42 × 10-2; Fig. 3d). In contrast to KIRP, immune-deficient subtypes showed poorer survival rates than immune-competent subtypes in SARC and SKCM (P = 9.04 × 10-3 and P = 1.11 × 10-5, respectively; Fig. 3d). Unlike KIRP, SARC and SKCM, non-significant differences in survival rates between the subtypes in COAD and GBM were found (P = 4.84 × 10-1 and P = 3.67 × 10-1, respectively; Fig. 3d).
To investigate the potentials of immune-competent subtypes as efficient targets for immune checkpoint inhibitors, we compared the expressions of immune checkpoint genes: PD-1, PD-L1, PD-L2 and CTLA4 which are well-known targets of immune checkpoint inhibitors 3 between the subtypes in 7 cancer types. Immune-competent subtypes in 7 cancers had increased expression of these genes compared to immune-deficient subtypes except PD-1 expression in GBM and PD-L1 in PCPG, and CTLA4 in SARC. Particularly, the expressions of PD-L1 and PD-L2 in immune-competent subtypes were significantly elevated in 7 types except PCPG. In addition to those ligands, the expressions of PD-1 were significantly increased in COAD, PAAD and SKCM and those of CTLA4 in COAD, KIRP, PAAD and PCPG (Fig. 4c). There are several immune checkpoint genes including VTCN1, VISTA, LAG3, IDO1, IDO2 and TIM3 [32][33][34] that are emerging in the development of immunotherapy, and we compared the expressions of these genes between the immune subtypes. The patterns of differences between expressions across 7 cancers were inconsistent. The majority of immune-competent subtypes had increased expressions. However, immune-competent subtypes in several cancer types had relatively lower expressions than immune-deficient subtypes, particularly VTCN1 in GBM and LAG3 and IDO1 in PCPG were significantly elevated in immune-deficient subtypes (Fig. 4c). Comparing absolute RNA expressions of four immune-checkpoint molecules across 7 cancers types also demonstrated elevated expressions in the immune-competent subtypes. Furthermore, in terms of cancer types, relatively higher expressions of PD-1 and CTLA4 in SARC and SKCM compared to the others, elevated expressions of PD-L2 in GBM, SARC and SKCM, and similar levels of PD-L1 expressions across cancer types were observed (Fig. 4d).

Discussion
Predictive methods for ICI response including immunohistochemistry (IHC) of immune checkpoint genes and emerging prognostic markers using genomic data such as TMB and CNV burdens 35,36 are conventionally used, but more precise estimations with non-complex approaches still remain to be investigated and developed. Here, we represent a computational strategy to potentially maximize choosing targets for ICIs based on only transcriptomic data that gives analytical advantages in measuring the expressions over IHC 37 .
Using unsupervised hierarchical clustering based on the gene expression, we presented the TME-dependent differentiation of cancer samples in 7 cancer types from TCGA. The identified immune-competent subtypes in these cancers showed TME signatures: significantly elevated immune and stromal cell infiltration and cytolytic activities, and lower tumor purity. The characteristics of TME in these subtypes were differentially identified using survival data along with the expressions of signature genes for several immune cells and molecules which were known to be involved in the TME. They often exhibit immunosuppressive characteristics that inhibit the functions of effector T cells, and reduce the efficacy of immunotherapy 38 . Interestingly from our results, the signatures for immunosuppressive TME were clearly represented in only KIRP, shown by significantly poorer OS with elevated expression for signature genes of M2 macrophage and B reg , and clear transcriptomic patterns of impaired NK cell induced anti-tumor activities.
Meanwhile, immune-competent subtypes showed significantly improved OS in SARC and SKCM. These subtypes in SARC had significantly increased signature gene expressions for M1 macrophage and estimated infiltration of CD8 + T cells and DC which are associated with anti-tumor activity 19,20 . In SKCM, we demonstrated that transcriptomic patterns in immune-competent subtypes potentially reflected anti-tumor activity from NK cell even though the expressions of signature genes for immunosuppressive B reg were significantly increased in SKCM. The results from SARC and SKCM implied that infiltrated immune cells in TME were not associated with immune evasion, and possibly formed immune-supportive TME in immune competent subtypes.
We then evaluated genomic instabilities including sCNV burden, TMB, and neoepitope loads between the immune subtypes. Differences in number of CNV segments between subtypes in the majority of cancer types showed inconsistent results in sCNV with previous TME studies from LUSC and colorectal cancers (CRC) 11,25 . At (2020) 10:9536 | https://doi.org/10.1038/s41598-020-66449-0 www.nature.com/scientificreports www.nature.com/scientificreports/ gene levels, recurrent amplifications and deletions at immune checkpoint genes were found in several subtypes of cancers but not in TME-dependent manner. We also demonstrated that immune-competent subtypes had significantly increased TMB and neoepitope loads in SARC and SKCM. Subtypes in COAD also had increased these genomic instabilities without significance in TMB elevation, and correlation analysis suggested that neoepitope loads may play a role in CYT activity in both immune subtypes of COAD. In GBM, KIRP, and PCPG, immune-competent subtypes had lower TMB and neoepitope loads. Particularly, these subtypes in KIRP showed significant difference like LUSC 11 . cancers. The level of significance denoted as: ns., non-significant, *p < 0.05, **p < 0.01, ***p < 0.001 and ****p < 0.0001. Statistical significances of the genomic instability scores and the expression of the genes between subtypes were measured by unpaired Student t test.
The fact that expressions of immune checkpoint genes were significantly elevated in the immune-competent subtypes and higher expressions in certain types of cancer could potentially suggest and optimize usage of ICIs. However, we need to compare these immune-related subtypes in terms of responses to ICIs, to evaluate the prognostic value of this immune subtyping in 7 cancer types in the future.
Since we applied a single subtyping approach to different types of cancer originated from different types of tissue, inconsistency in identifying the characteristics of TME in immune-competent subtypes across 7 cancers was discovered. Also M1 and M2 macrophages are not always associated with anti-tumor effect and tumor-associated macrophages, respectively 39 as macrophage populations are tissue-and tumor-specific. Hence experimental validation of immune cell infiltration is needed to be conduct to clarify the TME characteristics for each cancer type in the future.
Although further studies are required on this approach to become clinically significant, our subtyping provides convenient prediction for identifying the prognostic and predictive factors that could guide personalized cancer immunotherapies using only transcriptomic data. We believe that considering the immune subtypes in a TME dependent manner will be a decent diagnostic biomarker and predictor for responses to immunotherapy.

Materials and Methods
The Cancer Genome Atlas (TCGA) data sets. RNA sequencing data in TCGA, which are composed of cancer and non-cancer control data were used for characterizing the tumor microenvironment and classifying immune subtypes in 22 cancer types. We excluded ACC, DLBC, LAML, LGG, MESO, OV, TGCT, UCS and UVM as there were no non-cancer controls in those types, and also excluded previously studied LUAD and LUSC. The raw reads of RNA expression datasets as htseq count format were obtained in the TCGA 40 . The list of TCGA cancer type abbreviations is available (see URL).
Identification of immune subtypes. Immune subtypes of TCGA cancers were identified by unsupervised complete linkage clustering method, then we cut cluster trees into 2 and 3 groups to identify subtypes. Raw reads were processed and transformed to variance stabilizing data (VSD) using R package 'DESeq2' 41 , then 1,000 most variable genes were used for subtype classification of cancer samples. The PCA was plotted using first three principal components with a 95% confidence interval.
Processing RNA-seq based gene expression. For normalized RNA data, fragments per kilobase million (FPKM) was computed from the raw reads from HTseq counts by using the R package ' edgeR' 42 , and the FPKM expression values were adjusted to log2 and centered median expression values by cluster 3.0 43 . For comparing the expressions for immune checkpoint gene between the subtypes, log2 FPKM was used. We converted FPKM to transcripts per kilobase million (TPM) using the formula as described below 44  Since there was only one non-cancer control sample in SKCM, we calculated Z scores for SKCM using mean FPKM of cancer samples instead of values from non-cancer control samples.
Scores for genomic instability. The mutation rates, scores for sCNV and neoantigen loads were derived from previous studies 45,46 . We excluded the samples without available genomic instability data for this analysis. Silent and non-silent mutation rates were the number of mutations divided by target length of the genome in Mb. In sCNV burden, 'Number of segments' equals to total number of segments in each sample's copy number profiles.
Identification of recurrent sCNV. The copy number SEG file of TCGA samples was downloaded 17 .
GISTIC analysis 47 was conducted to find recurrent amplification and deletion in the subtypes of 7 cancer types. The confidence level for the significant region was 0.90 and the q value was 0.25.
Statistical analyses. R-3.3.0 was used for statistical analyses. In the case that the sample size was bigger than 30, we used the unpaired student's t test for comparison between immune subtypes. The p-value for overall survival curve was compared by the log-rank test. The Cox hazardous ratio was estimated by using the R package 'survminer' . The boxplot with the quantitative data was presented as mean ± standard deviation. For correlation, Pearson's product-moment correlation was used.