Pan-cancer analysis connects tumor matrisome to immune response

Recent sequencing efforts unveil genomic landscapes of tumor microenvironment. A key compartment in this niche is the extracellular matrix (ECM) and its related components – matrisome. Yet, little is known about the extent to which matrisome pattern is conserved in progressive tumors across diverse cancer types. Using integrative genomic approaches, we conducted multi-platform assessment of a measure of deregulated matrisome associated with tumor progression, termed as tumor matrisome index (TMI), in over 30,000 patient-derived samples. Combined quantitative analyses of genomics and proteomics reveal that TMI is closely associated with mutational load, tumor pathology, and predicts survival across different malignancies. Interestingly, we observed an enrichment of specific tumor-infiltrating immune cell populations, along with signatures predictive of resistance to immune checkpoint blockade immunotherapy, and clinically targetable immune checkpoints in TMIhigh tumors. B7-H3 emerged as a particularly promising target for anti-tumor immunity in these tumors. Here, we show that matrisomal abnormalities could represent a potential clinically useful biomarker for prognostication and prediction of immunotherapy response.


INTRODUCTION
The extracellular matrix (ECM) is a complex multi-spatial meshwork of macromolecules with structural, biochemical and biomechanical cues, influencing virtually all fundamental aspects of cell biology. 1 Although tightly controlled during normal development, ECM is frequently altered in many diseases, including cancer. 2,3 Despite clear evidence of abnormal tumor matrix in cancer, characterization and understanding their functional role in tumors have been challenging, possibly owing to complex nature of ECM proteins and their associated factors, or matrisome. [4][5][6] Little is known about the extent to which matrisome pattern is shared across various carcinomas or unique in tumors of differing metastatic potential; it remains unknown as to whether there exist subclasses of tumor matrisome that modulates tumor initiation and response to therapy, particularly in the context of immune response.
Tumor matrisome index (TMI) is a 29-matrisome-gene expression based classifier that has been validated for its predictive value in prognosis and adjuvant therapy response in early-stage nonsmall cell lung cancer (NSCLC). 7,8 TMI comprises exclusively of a small set of matrisome gene signatures, primarily encoding noncore matrisome proteins, which were found to be most differentially expressed in NSCLC relative to matched normal epithelium. In essence, TMI is calculated by the sum of the expression level that is multiplied by predefined Cox proportional hazards model coefficient of each TMI gene (see Methods).
Here we hypothesized that this specific pattern of deregulated matrisome could be a common determinant of tumor aggression irrespective of tumor origin. Given the prognostic significance of TMI in predicting the response to adjuvant chemotherapy (ACT) in NSCLC, 8 we investigated if TMI would be associated with signatures predictive of clinical response to immune checkpoint inhibitor (ICI) treatments, including total mutational burden (TMB), PD-L1 expression, tumor-infiltrating lymphocytes (TILs), and immune gene signatures. 9 Through parallel analyses of wholetranscriptomic and matched proteomic data, we report robust associations of TMI with TMB, histopathological and clinical features, and reveal the immune landscape of matrisomederegulated tumors. Our resource of curated compendiums of 8,386 genome-wide profiles, and molecular and clinical associations of TMI may advance our understanding on the underlying role of biophysical matrix changes that is common across cancer types.

Matrisome is commonly deregulated in human cancers
The present study was conducted to extend the investigation of TMI performance into 11 major cancer types (Fig. 1a). To facilitate multi-platform parallel analyses with RNA-seq-acquired TCGA data, we first generated a unified, cancer type-specific, merged microarray-acquired dataset (MMD), comprising 8386 transcriptome profiles of tumor and tumor-free tissues, using our previously developed R pipeline 7 (Supplementary Data files S1 and S2; see Methods). As TMI biomarkers were derived from lung cancer, we first queried the extent to which genome-wide expression patterns of lung cancer would be shared across various cancers.
Using MMD and TCGA, we found that the differential expression patterns of lung cancers, relative to non-tumor controls, were comparable to that of breast, ovarian, bladder, colorectal, and prostate cancers, regardless of assayed platforms (Tables S1 and  S2), as recently observed in systematic pan-cancer analyses. 10,11 Several of the 29 TMI genes were consistently placed in the top ranked differentially expressed gene (DEG) list across multiple cancer types at the individual gene level ( Fig. 1b and Table S3). We identified a subset of genes that was significantly enriched in lung adenocarcinoma (UAD), prostate adenocarcinoma (PRAD), kidney renal papillary cell carcinoma (KIRP), stomach adenocarcinoma (STAD), colon adenocarcinoma (COAD), breast invasive carcinoma (BRCA), liver hepatocellular carcinoma (LIHC) and bladder urothelial carcinoma (BLCA), and we termed it as a generic TMI signature (Supplementary Data file S3; see Methods). This initial screen supported our hypothesis that altered matrisome dynamics might represent a common phenotype across different malignancies.
TMI distinguishes cancers from normal tissues Except for TCGA PAAD (pancreatic adenocarcinoma), which had an insufficient number of normal samples (n = 4), all cancer types demonstrated significant difference in the TMI between tumor and non-tumor tissues ( Fig. 1c and Supplementary Data files S1, S4 and S5). Expectedly, NSCLC-derived TMI achieved near-perfect diagnostic accuracy in lung cancer datasets, where the area under the receiver operating characteristic (ROC) curve (AUC) at the optimal cut-offs was 0.946, 0.999, and 0.999 in lung MMD, TCGA LUAD, and LUSC (lung squamous cell carcinoma), respectively (Fig.  1d). Across all malignancies, we observed an overall classification accuracies of 82% and 92% using MMD and TCGA datasets, respectively (Table S4). Of note, the observed difference between the two assayed platforms (MMD vs. TCGA) is possibly due to different number of TMI genes filtered in TCGA for final index computation (Supplementary Data file S2; see Methods).
To perform a pan-cancer analysis with minimal technical variability across samples, we next analyzed an independent data source in which tumors were procured under standard conditions ( Fig. S1 and Supplementary Data file S6; see Methods). Although over a narrower range than for lung cancer, 1492 carcinomas spanning 10 cancer types displayed a wide and diverse TMI distribution, suggesting a varying degree of matrisome abnormalities at different stages of cancer progression and a specific TMI spectrum for each cancer type.
TMI is predictive of patient survival We obtained a total of 72 independent validation cohorts annotated with clinical information, including overall survival (OS), disease-specific survival (DSS), disease-free survival (DFS), relapse-free survival (RFS), metastasis-free survival (MFS), and/or progression-free survival (PFS) data. As previously described, 8 a cut-off index was determined for each dataset to stratify patients into TMI low and TMI high groups. Univariable survival analyses revealed a cancer-specific association of TMI in predicting time to death, recurrence, and distant metastasis (Fig. 2a, b; Supplementary Data file S7; see our earlier work for lung cancer 8 ). TMI high was an unfavorable prognostic factor for OS in colon, liver, renal, and breast cancers, whereas it appeared to confer a favorable prognosis in ovarian and gastric cancers, even after adjustment for clinical parameters on multivariable analyses using TCGA Fig. 1 Common matrisome variation in human cancers. a Schematic of the study design: 11 cancer type-specific, merged microarray-acquired dataset (MMD) were newly generated for parallel analyses with TCGA cohorts. b Circular plot illustrating the ranked position of matrisome genes based on differential expression (cancer vs. normal) in TCGA cohorts. 1 denotes the most differentially expressed gene (DEG). Black and gray lines represent the ranked position of generic and lung-specific TMI signature, respectively. c TMI in tumor vs. non-tumor tissues across 11 cancer types. The black horizontal line indicates the mean of the samples. ***Mann-Whitney U-test P < 0.001, **P < 0.01, *P < 0.05. d Area under the ROC curves (AUCs) of the TMI classifier for all cancer types. Smooth ROC curves are drawn for MMDs (left) and TCGA cohorts (right) datasets ( Fig. 2c and Supplementary Data file S8). Comparing with traditional clinical features between TMI low and TMI high patients from TCGA datasets, we found that the TMI high group had a higher proportion of patients staged pathologically as T3 or T4, diagnosed as lymph node and distant metastasis positive, and classified as late stage (Fig. 2d).
TMI is associated with pathological and molecular features We next asked if there would be a change in TMI during multistep carcinogenesis. In breast, colorectal, and pancreatic cancers, we observed increasing TMI values corresponding to the progressive steps of oncogenesis (normal to adenoma to carcinoma in-situ to invasive carcinoma; Fig. 3a and Supplementary Data file S9). Next, we further investigated for an association between TMI and known prognostic molecular phenotypes in breast cancers (Luminal A/B, HER2+ and basal subtypes). 12,13 Our prognostic TMI was highest among the most adverse molecular phenotypes (basal and HER2+ tumors), which were known to harbor the worst prognosis ( Fig. 3b and Supplementary Data file S10).
We also explored if TMI expression would be correlated with genomic mutational burden, and analyzed matched whole-exome sequencing-derived data of nine TCGA cohorts (pancreas, stomach, prostate, colon, lung, breast, liver, kidney, and bladder), for which TMI has previously been computed (Table S5). TMI was positively correlated with somatic TMB across cancer types (Spearman's rho test r s = 0.23 and P = 1.77E-14 (Breast) to r s = 0.48 and P = 1.41E−11 (Pancreas); Fig. 3c, top; Supplementary Data file S11). Additionally, based on a previously defined cut-off for discretization into TMI high and TMI low , the former harbored significantly higher TMB in all cancers (Fig. 3c, bottom), except for renal (Mann-Whitney U-test P = 0.104) and liver cancers (Mann-Whitney U-test P = 0.065).
TMI in the context of immune response Given the evidence in support of TMB as a robust determinant of tumor immunogenicity in many solid tumors, 14,15 and that TMI is closely associated with TMB, we next asked if there would be any direct association of this specified matrisomal pattern with the cancer immune landscape. Applying machine learning-based CIBERSORT to MMD and TCGA cancer patient samples (see Methods), we observed a close correlation between TMI and the composition of specific TIL populations for several cancers (Fig.  4a). Enrichment of these TILs related to both innate and adaptive immunity was diverse and cancer type-specific. Relative abundance of M0 and M1 macrophages, neutrophils, activated mast cells, regulatory T cells (Treg), and T follicular helper (Tfh) cells, activated CD4+ memory T cells increased, while resting CD4+ memory T cells, mast cells, naive B cells, and resting dendritic cells decreased with tumor-promoting matrisomal change (Supplementary Data file S12). Impact on immune infiltrates was particularly pronounced in gastric, lung, colorectal and breast cancers, having significant positive and negative correlations with TMI, thus highlighting the influence on the immune contexture during matrisome remodeling in this subset of cancers.
Early works suggest that signatures of T cell states, particularly that of CD8+ T cells, may predict clinical response to ICI-based immunotherapy. [16][17][18][19] Of all cancer types analyzed, breast (BRCA) and pancreatic (PAAD) cancers demonstrated pronounced negative correlation with the estimated abundance of CD8+ T cells, indicating that TMI low tumors harbored higher CD8+ T-cell infiltration levels in these cancers. To validate their differential expression at the protein level, we assessed matched proteomes of TCGA samples provided by the NCI Clinical Proteomic Tumor Analysis Consortium for breast (BRCA) cancers, for which samples were previously classified as either TMI low or TMI high at the transcriptomic level (CPTAC; see Methods).
A matched-cohort assessment of 108 BRCA samples revealed higher protein levels of CIBERSORT-defined CD8+ T cell signatures, including CD8A, GZMB, LIME1, and RASA3, in TMI low tumors (Fig. 4b). Interestingly, differential expression analysis of whole proteomes comparing the two groups (TMI high vs. TMI low ) identified MAGEA3, a frequently expressed tumor-specific antigen, as the second most highly expressed protein in TMI low tumors (Fig.  4c). The functional contribution of spontaneously occurring MAGEA3-reactive CD8+ T cells to favorable prognosis 20 may explain the better patient outcomes consistently observed in TMI low breast tumors.
Abundance of CD8+ T cells has been associated with better response to immunotherapies. 9,17,18 Given the enriched CD8+ T cell signatures at both transcript and protein levels in TMI low tumors, we next investigated the association between TMI and recent reported predictive signatures for immunotherapy response. The responder signature comprises 161 genes, which were highly expressed in anti-PD-1 responding melanoma patients compared to non-responding patients 21 (see Methods). Intriguingly, TMI was highly negatively correlated with the gene set variation analysis (GSVA) z-scores of the responder signature for each breast cancer sample (TCGA BRCA; see Methods); higher levels of GSVA z-scores of the responder signature were found in TMI low tumors, in which CIBERSORT-defined CD8+ T cell signatures were enriched in these selected tumors (Fig. 4d).
Extending the analysis to the other cancer types, we found that except for melanomas, TMI had negative correlations between the two variables, to different extents, with the most pronounced association seen in lung cancer (Fig. 4e).
B7-H3 as a potential immune target for TMI high tumors We next correlated the index with the expression of 20 potentially targetable immune checkpoints-that are currently in preclinical or clinical trial stages, and/or FDA-approved 22 (Fig. 5a and Supplementary Data file S13). Unexpectedly, the data revealed strong correlations of TMI with B7-H3 expression in all (MMD) and nearly all (TCGA) tumor samples regardless of cancer types. Having observed mRNA-protein expression correlation using the processed proteomic data in TCGA BRCA cohort ( Fig. 5b; Spearman's rho test r s = 0.38 and P = 4.19E−05), we found markedly higher levels of B7-H3 protein expression in TMI high breast tumors (Fig.  5c). We thus speculate that the present combined quantitative analyses may help to unveil specific immune targets that could be selectively efficacious in a subset of tumors across diverse carcinomas. Promising predictive value for immunotherapy and other potential targets We questioned if TMI could further be associated with innate PD-1 resistance signature (IPRES), including genes involved in immunosuppression, angiogenesis, monocyte and macrophage chemotaxis, and EMT, 21 comparatively with other commercially available multi-gene tests (MGTs). Given that TMI signatures were derived from lung cancer, we chose lung cancer-derived prognostic MGTs and their predictive values for immunotherapy were assessed using lung cancer datasets (lung MMD and TCGA LUAD), with MGT-specific predefined cut-off thresholds for risk stratification (see Methods). The Myriad myplan TM Lung Cancer and Pervenio TM Lung RS tests are the only currently available prognostic MGTs for lung cancers, which comprise 31 cell cycle progression (CCP) genes and 11 cancer-related pathway genes, respectively. For each validation dataset, IPRES-enriched patients were identified as previously described 21 (see Methods), and demonstrated significant differences in TMI, but not in other prognostic indices, with the exception of Pervenio TM in the MMD cohort (Fig. 5d).
To identify potential therapeutic targets for both TMI low and TMI high groups, we performed differential proteomic analyses comparing the two groups and investigated if there would be any overlapping targets identified by other commercially available MGTs. Of note, as matched proteomic data were available only for breast cancer (TCGA BRCA), we analyzed two breast cancer MGTs (Oncotype DX and MammaPrint) in addition to the two lung cancer MGTs (Fig. 5e; Supplementary Data file S14; see Methods). Interestingly, while both breast and lung cancer MGTs demonstrated several differentially expressed proteins between the two groups, we observed new potential targets that were exclusive to either lung or breast cancers ( Fig. 5f; Supplementary Data file S15). Further, a total of 4 (CASP14, CENPI, PAX9, and TDRD12) and 12 (FAM174A, UBXN10, BBS5, C9orf40, CACNA2D2, TFF1, UBXN10, RIPK4, CALHM6, WFDC2, NAGPA) proteins were consistently enriched in MGT high and MGT low tumors in TCGA BRCA cohort, respectively (Fig. 5g).

DISCUSSION
This work is based on multi-platform evaluation of TMI at multiple molecular levels across 11 major cancer types. By curating a total of 8836 patient-derived tumor and tumor-free samples, we Considering that the index was computed under the assumptions of each gene having concordant regulation as observed in lung cancer, it is noteworthy that tumors with highly dynamic and constantly remodeling ECM could have ubiquitously altered expression of matrisome genes across genetically and phenotypically diverse epithelial tumors. The data may imply the presence of shared signal transduction pathways activated across these selected tumors, such as recently reported TGFβ or HIF1α/VEGF pathways, in controlling ECM balance or cell-intrinsic mechanism regulating the expression of set of ECM genes contributing to tumor development. 23 The functions of TMI matrisome genes, including secreted factors and metalloproteinases, in forming the pre-metastatic niche at specific target site remain poorly defined despite their wellestablished role in local matrix degradation and regulation. 24,25 Exceptions are MMP1, which induces vascular permeability and mediates breast cancer metastasis to the lung, 24,26 and S100 proteins, which generate pro-inflammatory phenotypes and recruit non-resident, bone marrow-derived cells. 24,27,28 Intriguingly, our survival analyses revealed that TMI was correlated with a trophism for metastasis development to the lungs, but not to the bones in breast cancer patients, albeit in a subset of studies (Fig.  2b). This may suggest that the TMI signatures could inform on the tumor promoting microenvironment that facilitates metastasis to respective tissue organs. Through a matched proteomic data analysis, we further identified a small number of specific genes consistently highly expressed at protein level in both risk groups classified by TMI and other commercially available MGTs. Although their expression levels and prognostic values in malignant tumors remain inconclusive, high expression of CASP14 29 and CENPI 30 have been associated with poor prognosis in breast cancer, consistent with our observations in TMI high patient group.
In addition to close associations with patient survival, we provide a comprehensive overview of commonly deregulated matrisome pattern in the context of tumor genotypes, molecular phenotypes, and immune response. Particularly, our combined quantitative analyses of matched proteomes revealed significant enrichment of MAGEA3 and CD8+ T cell signatures in TMI low tumors. MAGEA3 is a cancer-germline gene highly expressed in various carcinomas, but the gene is silent in normal adult tissues, except for the testis and placenta. 31 Independent of tumor burden, the prognostic value of MAGEA3-reactive CD8+ T cells for overall survival was reported in esophageal squamous cell carcinoma. 20 Further, the study observed elevated expansion of functional MAGEA3-specific CD8+ T cells both in vitro and in vivo upon anti-PD-1 treatment. 20 It is thus tempting to speculate that the genes included in the TMI signature might have functional roles for immune surveillance in the tumor microenvironment, given that MAGEA3 was the second most differentially expressed gene and anti-PD-1 therapy responder signature was highly enriched in TMI low tumors.
To assess if the present approach could provide new indications for immunotherapy, we next associated TMI directly with the previously reported predictive biomarkers for ICI-based treatments, such as TMB, PD-L1 expression, TIL density, peripheral blood markers and other immune gene signatures. 9 Of 20 targetable immune checkpoints, B7-H3 was the only gene with higher protein expression in TMI high tumors in TCGA BRCA cohort. Consistent with our findings, increasing evidence suggests B7-H3 as a negative predictor of patient outcomes in solid tumors, including breast cancer. [32][33][34][35] Further, its functional contribution to T cell inhibition and immune evasion is increasingly being recognized, making the molecule an appealing target as a novel immunotherapeutic drug. 36,37 Beyond its role as an immune regulator, the functional impact of B7-H3 on cancer progression, including migration, 38 invasion, 39 angiogenesis, 40 and gene regulation via epigenetic modifiers have also been suggested in a variety of cancers. 36 Here we demonstrate a potential clinical utility of TMI as a pan-tumor predictive biomarker to identify a subset of patients who may benefit from anti-B7-H3 treatment. This concept could be validated using the same predefined cut-off for TMI-based risk stratification in a prospective clinical trial of this potential drug target.
Nonetheless, a clinically important finding in our study relates to the relative predictive potential of the respective immune response signatures in specific cancer types. For example, in lung cancer, the IPRES signatures were enriched in TMI high tumors, which had higher TMB and no significant change in PD-L1 expression level as well as CD8+ T cell density relative to TMI low tumors. These data point toward a closer association of IPRES with matrisomal abnormalities than with other previously reported signatures predictive of clinical response to ICI-based immunotherapy in this subset of tumors. As evidenced by recent clinical studies, tumors with high PD-L1 expression might not necessarily have low TIL density, or vice versafor which either PD-L1 level or TIL abundance alone might not serve as an ideal predictive biomarker. 9,41,42 However, more recent evidence do suggest that specific biomarkers such as TMB harbor a broader clinical utility to predict immunotherapy response across tumor types. 14,15 Taken together, we propose the design of combinatorial approaches whereby tumor agnostic biomarkers are integrated with tumorspecific molecular indices, so as to optimize the accuracy of predicting immunotherapy response in specific cancer types.

Data preprocessing and MMD generation
This article and the accompanying data descriptor were previously published as preprints. 43,44 Ethics approval was not required. Public datasets analyzed in this study are summarized in Supplementary Data file S1. Preprocessing methods, number of patient sample, platform assayed, and genes included in the computation of TMI are recorded in Supplementary Data file S2. Raw data of independent studies were RMAnormalized using the affy package 45 or preprocessed-or author-defined normalized-data were used (Supplementary Data file S2). Most were assayed with the full 29-gene platform (i.e., Affymetrix Human Genome U133 Plus 2.0 Array). Probes having maximum expression values were collapsed to the genes for subsequent index scoring. Comprising a total of 8,386 samples, 95 independent GEO datasets (http://www.ncbi.nlm.nih. gov/geo), where raw data profiled on GPL570 platform were preprocessed, merged, and ComBat-adjusted (batch-effect removed) using the inSilico-Merging package 46 based on cancer type, as previously done for lung MMD. 7

TCGA datasets
Using TCGA-Assembler package, 47 the Cancer Genome Atlas (TCGA) data of 11 epithelial cancer types were collected, processed, and annotated with clinical parameters (Supplementary Data files S1 and S2). Due to lack of normal tissue samples, OV and SKCM cohorts, representing ovarian and melanoma cancers respectively, were excluded in differential expression analyses. TCGA-Assembler R package 47 was used to extract normalized RPKM count values. In each dataset, we excluded genes without minimum 1 counts per million (cpm) or RPMK value in <20% of total number of samples were excluded using edgeR package. 48 These filtered genes were normalized by Trimmed Mean of M-values (TMM) and were subjected to the voom function in the limma package 49 for further analyses.

Generic TMI signature
We extracted the ranking of 29 TMI genes (Table S3) and visualized their positions in each DEG list from each cancer-specific TCGA cohort using circular plots generated via the circlize package. 50 To further investigate the extent to which TMI genes exhibit greater degree of differential expression variation among 29 lung-specific TMI genes across all tumors, we derived a generic TMI signature based on a weight computed for each TMI gene using the following equation (1), which was slightly modified from equation that was used to derive generic EMT signature in a previous study 51 : where D is the total number of diseases (D = 7 in this study; prostate, renal, gastric, colorectal, breast, liver, and bladder cancer), fc gd and P gd are the fold-change and adjusted P value of the TMI gene, g, of disease, d, and n d is the number of patient samples in each TCGA cohort (Supplementary Data file S3). As not all 29 TMI genes were present in the final cancer-specific DEG list due to preprocessing, each gene was computed with different number of total n i for the weight. Generic TMI signature was then derived from 17 TMI genes having a weighted sum >3.90 and further visualized for their position in the ranked DEG list in a circular plot (Fig. 1b). SFTPC gene was not present in the final DEG list of all TCGA cohorts, except for lung cancer cohort which was not included in the weight computation.

TMI spectrum quantification
We analyzed the Expression Project for Oncology (expO) data provided by the International Genomics Consortium (IGC, USA, www.intgen.com) for TMI spectrum quantification, for tumor samples procured and processed under standard conditions, resulting in minimal non-biological variations across multiple cancer types. All tumor specimens annotated with prostate, lung, renal, liver, gastric, breast, ovarian, pancreatic, colorectal, and bladder cancer from the expO dataset were included in the spectrum quantification ( Fig. S1 and Supplementary Data file S6).

Survival analyses
A statistical summary of datasets used in survival analyses is recorded in Supplementary Data file S7. Kaplan-Meier (KM) survival curves were derived for OS and DSS and other multiple endpoints using the survival package in R (http://CRAN.R-project.org/package = survival). Clinical data for TCGA datasets were collected via the embedded DownloadBiospeci-menClinicalData function in the TCGA-Assembler package. 47 Multivariate Cox regression analyses were performed to adjust confounding factors including age, race, gender, pT, pN, and pM status (Supplementary Data file S8). For all cancer types, patients with available survival data and TMI were all included in the KM analyses.

Molecular subtyping in breast cancer
Of analyzed datasets, five datasets (BRCA, GSE20711, GSE21653, GSE19615, and GSE50567) were annotated with either ER, PR, and HER2 status or predefined molecular subtypes of breast cancers (normal breast-like, luminal A, luminal B, HER2 positive, and basal-like). Tumors harboring positive status of ER and/or negative status of HER2 were classified as luminal cancers while the basal-like tumors were defined as ER-, PR-, and HER2-negative cancers (Supplementary Data file S10).

Total mutational burden
The mutational load of TCGA tumor samples across nine epithelial cancer types, for which TMI was previously computed, was obtained from the National Cancer Institute GDC Data Portal (http://portal.gdc.cancer.gov/ projects/). The portal defines the mutational load as the total number of simple somatic mutations. Only tumors harboring at least one mutation were included and log 10 transformed to compute Spearman correlations between TMI and mutational load. A descriptive statistics including the number of patient samples available for both TMI gene expression and mutational load data, cut-off points used for patient stratification, and key outputs of Spearman correlation tests in each dataset is shown in Supplementary Data file S11.

IPRES and GSVA
Gene signatures previously associated with resistance to PD-1 immunotherapy, termed Innate PD-1 RESistance (IPRES), were obtained from Broad MSigDB (http://software.broadinstitute.org/gsea/msigdb) and Supplementary Data provided by the original publication. 21 The gene set variation analysis (GSVA) scores of the signatures were computed for each MMD and TCGA patient sample. GSVA scores were transformed to z-scores, and were correlated with TMI. As previously described, 21 a cut-off of > 0.35 was applied to the mean z-score for a patient to be determined as IPRESenriched. As these signatures were derived from non-responder (resistant) tumors, we further defined responder signatures as 161 highly expressed genes (log FC > 2 and Mann-Whitney P < 0.1) in responders compared to non-responders using the list of 693 DE genes provided by the original work. 21

CIBERSORT
The estimated fraction of individual immune cell types was computed using the beta version of CIBERSORT (http://cibersort.stanford.edu/). For any given sample, we calculated Spearman correlation between TMI and relative abundance of each immune cell type using our 11 generated MMDs and 11 TCGA cohorts (LUAD, OV, SKCM, BLCA, LIHC, BRCA, COAD, STAD, KIRC, PRAD, and PAAD). As our MMDs of breast, colorectal, and lung cancer exceeded maximum load capacity (500 MB), 1000 tumors were randomly selected for input data files. We selected LM22 (22 immune cell types) for signature gene file, 100 for permutations, and disabled quantile normalization for all runs.

Quantitative proteomic analysis
Relative protein abundance data of 10,625 protein-coding genes were generated by the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC; https://cptc-xfer.uis.georgetown.edu/ publicData/Phase_II_Data/TCGA_Breast_Cancer/). Log ratios (base 2), representing relative abundance of each sample compared to the pooled reference sample, were obtained from TCGA_Breast_BI_Proteome.itraq.tsv for a total of 108 TCGA BRCA breast cancer samples, for which TMI was previously computed.
Differential expression analysis and heatmaps R/Bioconductor limma package 49 was used to assess differential protein expression between two patient groups classified by five MGTs. FC > 1.5 or FC < −1.5 and limma P < 0.05 were applied to determine DE proteins. A descriptive list of all DEGs at the protein level found in both groups is provided in Supplementary Data file S15. Heatmaps used in this study were generated using Morpheus (http://software.broadinstitute.org/morpheus/).

CODE AVAILABILITY
The R codes used to preprocess, merge and batch-effect correct datasets for generation of all 11 MMDs can be found in Figshare (https://doi.org/10.6084/m9. figshare.7878086).