Main

The development of CPIs has had a tremendous impact on cancer therapy1. However, the response of patients with cancer to these agents varies considerably1,2,3,4,5,6, and important immune-related adverse events may appear as a result of treatment7. Consequently, intense research has been dedicated in recent years to identifying features that influence the response to CPIs2,4,8,9,10,11,12,13, leading to the identification of potential biomarkers.

These studies have made it increasingly clear that the response to CPIs is mediated by several characteristics of the tumor, its microenvironment and the host12, which we may regard as latent factors defining CPI response and survival across patients. However, it is likely that different biomarkers identified across a multitude of studies—often focused on one or a small group of features—represent different versions of the same underlying latent factor. For example, the expression of a number of genes and gene sets previously identified as biomarkers may represent the degree of infiltration of cytotoxic cells in the tumor14,15,16,17. Furthermore, given that separate research groups independently test different sets of potential biomarkers, there is no effective control of the potential false positives associated with multiple testing. As a result of these problems, it is not clear at present how many such independent latent factors of CPI response and survival there are, what aspects of the tumor, its microenvironment or the host they represent and whether they are relevant across different tumor types.

To answer these questions, we exploited a richly profiled and annotated cohort of patients with metastatic tumors (sampled as fresh-frozen biopsies) treated with CPIs (part of the cohort profiled by the Hartwig Medical Foundation (HMF)18,19; n = 479). Specifically, we aimed to identify features of the tumors, their microenvironment or the host that appeared to be significantly associated with CPI response and survival, both across the pan-cancer HMF-CPI cohort and within each of the represented cancer types. To this end, we carried out an exhaustive analysis, not biased by prior knowledge, of thousands of molecular and clinical features to detect their association with CPI response or survival. We discovered that virtually all significantly associated features collapse into one of five independent latent factors that are relevant across all tumor types represented in this cohort. They are the tumor mutation burden (TMB), effective T cell infiltration, whether the patients received any prior treatment, the activity of transforming growth factor-beta (TGF-β) in the tumor microenvironment and the proliferative potential of the tumor. We verified that at the current level of statistical power, there are no other latent factors of CPI response and survival common to all cancer types analyzed. We validated the association of these five latent factors with CPI response and survival in six independent cohorts (n = 1,491 patients) spanning six major cancer types; to our knowledge, this is the largest such validation effort.

Results

Extracting features from a metastatic cancer cohort

Within the HMF18,19 cohort (n = 5,288), 479 patients with metastatic cancer who were part of the Center for Personalized Cancer Treatment study (https://www.cpct.nl/cpct-02) received anti-PD1/PDL1 or a combination of anti-PD1/PDL1 and anti-CTLA4 therapy. We refer to these patients as the HMF-CPI cohort. These include patients who had suffered from primary tumors of the skin (melanomas, n = 191), lung (n = 110), bladder (n = 88) and other cancer types (other; n = 90; Fig. 1a,b and Supplementary Table 1). Whole-genome somatic alterations of the metastatic tumors before CPI treatment were identified across all tumors in the HMF-CPI cohort and, for 396 of them, the whole transcriptome of the tumor was also sequenced. Rich clinical data were also available (Supplementary Table 1), including treatments received before the diagnosis of the metastatic tumors, response to the CPI therapies assessed following the Response Evaluation Criteria in Solid Tumors (RECIST)20 (n = 467) and survival information (n = 479).

To carry out a systematic de novo discovery of biomarkers of CPI response, we computed 27,923 features (Fig. 1c,d and Supplementary Note 1). These included the mutational (single nucleotide variants + indels) status of 15,829 genes, the copy number status of 2,415 genomic regions, 64 aggregated somatic mutation features (for example, TMB, frameshift indel burden, activity of mutational signatures) and the occurrence of known driver structural variants as well as features summarizing the genomic instability (for example, total number of chromosomal fragments, ploidy, whole-genome doubling, and so forth). We also used the expression level of 8,817 genes and clinical characteristics of the patients such as sex, type of treatment received for the primary tumor and age at the time of diagnosis of the metastasis as features. Finally, human leukocyte antigen (HLA) features that can affect the immune response to the tumor were also included, such as their HLA haplotype and the number of somatically lost HLA alleles.

Fig. 1: Extracting features from the HMF-CPI cohort.

a–c, For 479 patients with metastatic cancer of different cancer types in the HMF-CPI database, we obtained 18 clinical features, 19 germline HLA allotype features, 18,382 somatic features (based on single base substitutions, indels, copy number variants and other structural variants affecting specific genomic elements or summaries thereof) and 8,817 transcriptomic features, corresponding to all expressed genes. d, Numeric feature values were rescaled and re-normalized (Methods), yielding a large table describing the cohort. LOH, loss of heterozygosity; RECIST, Response Evaluation Criteria in Solid Tumors; CNV, copy number variant; SV, structural variant; WGD, whole-genome doubling; CR, complete response; PR, partial response; SD, stable disease; PD, progressive disease; OS, overall survival; PFS, progression-free survival.

Five latent factors of CPI response and survival

To identify which of the more than 27,000 features computed per patient were significantly associated with CPI response, we performed univariate regressions (adjusted for the site of origin of the tumor, age of the patients, site of biopsy of the metastasis and tumor purity). After controlling for multiple testing21, we identified several hundred features that appeared to be significantly associated with CPI response (Fig. 2a, Extended Data Fig. 1a–c and Supplementary Note 1).
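The multiple-testing step can be illustrated with a minimal sketch of the Benjamini–Yekutieli step-up procedure, the correction named in the figure legends. The function name, the significance level of 0.05 and the toy P values are assumptions made for illustration only; they are not taken from the study, and this is not the authors' code.

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return one boolean per P value: True where the null is rejected.

    alpha=0.05 is an illustrative assumption, not a value from the study.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic-sum penalty c(m) = 1 + 1/2 + ... + 1/m, which keeps the
    # false discovery rate controlled under arbitrary dependence.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the P values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: largest rank k with p_(k) <= k * alpha / (m * c(m)).
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            max_rank = rank
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Toy P values (hypothetical), one per feature.
toy_p_values = [0.0001, 0.004, 0.03, 0.2, 0.9]
print(benjamini_yekutieli(toy_p_values))
```

Because the harmonic-sum penalty grows with the number of tests, the threshold applied to more than 27,000 features is considerably stricter than the nominal level, which is consistent with only a few hundred features surviving.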

Fig. 2: Three latent factors associated with CPI response.

a, Logistic regression analysis (represented as a volcano plot) identified features significantly associated with CPI response (Methods and Supplementary Note 1). Dots with larger sizes represent significant features, and they are colored following the type of feature. P values shown in the plots were computed by logistic regressions. These are, by definition, two-sided. b, All significant features were selected and clustered based on their pairwise correlations. The colors denoting the clusters are inherited from the type of feature included in each of them according to the color legend in a. c, Mean expression values of cluster R3 (x-axis), and the 'T-cell effector' gene set (y-axis), across patients (dots). The Pearson's correlation coefficient is indicated. d, To discern the nature of cluster R3, the correlation of its mean to 255 gene sets collected from the literature was computed across patients (as illustrated in panel (c)). Dots represent gene sets. e, Relationship between the significance of the association with the response (y axis) and the correlation (x axis) to the mean of the cluster of the features in each cluster. P values shown in the plots were computed by logistic regressions. These are, by definition, two-sided. Dots in these three panels appear in darker color if they represent features significantly associated with CPI response and with a correlation coefficient above 0.5 with the mean of their respective cluster. In (a), (d) and (e), the horizontal dashed lines represent the significance threshold according to the Benjamini–Yekutieli correction.

Then, we asked how these significant features relate to each other and which underlying latent factors of CPI response they represent. To answer these questions, we clustered all significant features based on their pairwise correlations (Fig. 2b). Virtually all of them (Supplementary Note 1) could be unambiguously assigned to one of three clusters (R1, R2 or R3, encompassing somatic, clinical and transcriptomic features, respectively). This implies that only three latent factors associated with CPI response were detected from the more than 27,000 features analyzed.
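The idea of grouping features by their pairwise correlations can be sketched as follows. The clustering algorithm actually used is described in the Methods and is not reproduced here; this sketch assumes a simple single-linkage grouping (connected components of a graph that links two features whenever their signed Pearson correlation reaches a threshold). The threshold of 0.5, the feature names and the toy values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_clusters(features, min_r=0.5):
    """Group features whose correlation is >= min_r (assumed threshold).

    `features` maps a feature name to its values across patients.
    Returns a sorted list of sorted name lists, one per cluster.
    """
    names = list(features)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(features[a], features[b]) >= min_r:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy values for six hypothetical patients.
toy_features = {
    "tmb": [1, 2, 3, 4, 5, 6],
    "clonal_tmb": [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],
    "cd8a_expr": [5, 1, 4, 2, 6, 3],
    "gzmb_expr": [5.2, 1.1, 3.9, 2.2, 5.8, 3.1],
}
print(correlation_clusters(toy_features))
```

In this toy example the two mutation-burden features fall into one group and the two expression features into another, mirroring how many measured features can collapse into a small number of latent factors.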

To understand the nature of cluster R1, we first computed the mean of its constituent features. The single feature in the cluster with the highest correlation to the mean was the overall TMB, with other aggregated mutational features (for example, clonal TMB) also showing a high correlation. Specifically, an increase in TMB is associated with a higher probability of response and with longer survival (Extended Data Fig. 2a and Supplementary Fig. 1). Thus, we named this latent factor TMB, and although it could be measured using any of the features in the cluster, we selected the TMB to represent it. Importantly, the mutation rate of virtually all genes (some of which have been previously associated with CPI response3,4,10,22,23,24) also appears to be highly correlated with the TMB as part of this cluster of features, and indeed, some heavily mutated genes exhibit lower P values in the regression analysis than TMB (Supplementary Note 1). This implies that identifying the mutations of individual genes as biomarkers of CPI response independently of the TMB is a very challenging task.

Cluster R2 comprised two highly correlated clinical features: prior exposure to systemic therapy11 and prior exposure to any therapy. These two features appear to be significantly negatively associated with CPI response and survival (Fig. 2b, Extended Data Fig. 2b, Supplementary Fig. 1 and Supplementary Note 1), perhaps owing to increased tumor aggressiveness or deteriorated patient condition. Thus, we named cluster R2 ‘prior treatment’, and for the following analyses, we represented this cluster using exposure to any prior treatment.

Cluster R3 grouped the expression of 48 genes. We reasoned that the expression of gene sets representing biological functions in the tumor or its microenvironment could aid in the interpretation of this cluster. Thus, we computed the mean expression of 255 gene sets (225 representing all Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and cancer hallmarks obtained from the Molecular Signatures Database (MSigDB)25,26 and 30 collected from the literature12,27,28; Supplementary Table 2, Fig. 2c,d and Methods). The mean expression of 13 gene sets was significantly associated with the response to CPI (Fig. 2d, Extended Data Figs. 2c and 3a and Supplementary Fig. 1), and all of them showed a high correlation (Pearson’s coefficient of >0.8) with the mean expression of the genes in the cluster. They all represent some aspect of immune infiltration in the tumor; most specifically, T lymphocyte infiltration. Therefore, we named this third latent factor ‘effective T cell infiltration’ and represented it through the mean expression of all genes in the cluster. The increase in effective T cell infiltration appears significantly associated with a higher probability of CPI response and longer survival.
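The comparison between a cluster and a gene set comes down to two steps: averaging the expression of the member genes per patient, then correlating the two per-patient means. The sketch below illustrates this on toy data. The gene names, expression values and set memberships are placeholders invented for illustration, and any normalization applied in the study (Methods) is omitted.

```python
from math import sqrt


def mean_expression(expr, genes):
    """Per-patient mean over `genes`; `expr` maps gene -> values per patient."""
    n_patients = len(expr[genes[0]])
    return [sum(expr[g][p] for g in genes) / len(genes) for p in range(n_patients)]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )


# Toy expression matrix for five hypothetical patients (placeholder genes).
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 2.5, 3.5, 4.5, 6.0],
    "geneC": [1.5, 2.0, 3.0, 4.0, 5.5],
    "geneD": [5.0, 1.0, 4.0, 2.0, 3.0],
}
cluster_genes = ["geneA", "geneB"]       # stands in for a feature cluster
related_set = ["geneB", "geneC"]         # stands in for a related gene set
unrelated_set = ["geneD"]                # stands in for an unrelated gene set

cluster_mean = mean_expression(toy_expr, cluster_genes)
for label, gene_set in [("related", related_set), ("unrelated", unrelated_set)]:
    r = pearson(cluster_mean, mean_expression(toy_expr, gene_set))
    print(label, round(r, 3))
```

Repeating this comparison over a collection of annotated gene sets and ranking them by correlation is what allows a data-driven cluster to be given a biological name.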

There is a clear positive relationship between the correlation of every feature to the representative of its corresponding latent factor (that is, TMB, prior treatment and the mean expression of genes in cluster R3) and the significance of its association with CPI response. The higher the correlation of a feature with the mean of its corresponding cluster, the more significant its association with CPI response (Fig. 2e and Extended Data Fig. 2a–c). These three latent factors are also significantly associated with overall survival and progression-free survival upon CPI treatment (Extended Data Fig. 2a–c).

We then asked whether any other latent factors of the tumor, its microenvironment or the host specifically influence the survival of patients, independently of the previous three latent factors associated with response (given that response is, by itself, a major determinant of survival). To answer this question, we focused on features that appeared to be significantly associated with overall survival, after controlling (in addition to the aforementioned covariates) for the three latent factors previously associated with response (Fig. 3a). Again, to discern how many latent factors were represented by these features, we clustered them based on their pairwise correlations (Fig. 3b).

Fig. 3: Two latent factors associated with survival.

a, Features significantly associated with survival residuals, that is, after correction for the three latent factors associated with response. Larger dots represent significant features. Features with high correlation (Pearson coefficient of >0.5) with any of the three previously identified latent factors are removed. P values shown in the plots were computed by Cox regressions. These are, by definition, two-sided. b, Clusters of features based on their pairwise correlations. c,d, Cluster S1 and cluster S2.1 are highly correlated with gene sets representing the tumor proliferative potential and the activity of TGF-β in the tumor microenvironment, respectively. Dots represent the mean expression of two gene sets (y axis) and the mean expression of the genes in clusters S1 and S2.1 (x axis) across patients. Pearson's correlation coefficients are indicated. e, Features significantly associated with overall survival. Larger dots represent significant features. f, Significance of the association with the response (y axis) and the correlation (x axis) to the mean of the cluster of the features in each cluster. P values shown in the plots were computed by Cox regressions. These are, by definition, two-sided. g, Depiction of the five latent factors associated with CPI response and survival. Upwards arrows, positive association with response and/or survival; downwards arrows, negative associations. In (a), (e) and (f), the horizontal dashed lines represent the significance threshold according to the Benjamini–Yekutieli correction.

One of the three clusters was clearly orthogonal to the other two, which exhibited a certain degree of inter-correlation. Thus, we named them clusters S1, S2.1 and S2.2, as they only represent two mutually orthogonal latent factors (Fig. 3b and Supplementary Note 1). To interpret them, we analyzed the correlation of their mean expression with that of 255 gene sets (Supplementary Table 2), as explained above. The mean of cluster S1 showed the highest correlation with a gene set named ‘Proliferation potential’ (Fig. 3c) and a high correlation with other gene sets representing cell cycle and overall cell proliferation (Extended Data Figs. 2d and 3b,c and Supplementary Fig. 1). We thus named it ‘tumor proliferative potential’ and represented it through the mean expression of all genes in the cluster.

The mean expression of the genes in cluster S2.1 showed the highest correlation with a gene set representing TGF-β in fibroblasts (Fig. 3d) and a high correlation with other gene sets related to this biological process (Extended Data Figs. 2e and 3b,d–f). As in other cases, we represented this latent factor through the mean expression of the genes in the cluster. Low values of this latent factor (TGF-β activity in the microenvironment) are associated with longer survival of patients upon CPI treatment even without correcting for the effect of the three response-associated latent factors (Fig. 3e).

We next asked whether the latent factors are specifically associated with CPI treatment or whether they represent general elements that influence response and survival upon any type of therapy. To answer this question, we analyzed the data for 2,497 patients in the HMF cohort who received non-CPI therapies and found that the effects of TMB, effective T cell infiltration and TGF-β activity in the microenvironment are unique to CPI therapies, whereas prior treatment appears to affect the response to both CPI and non-CPI therapies, and the effect of tumor proliferative potential appears even larger across non-CPI than CPI-treated patients (Extended Data Fig. 4). Virtually all features that appear significantly associated with CPI response and/or survival are grouped in one of the five latent factors (Extended Data Fig. 5), indicating that, at the current level of statistical power, no further latent factor remains to be discovered in the HMF-CPI cohort.

In summary, five mutually orthogonal latent factors underlying CPI response and survival across the HMF-CPI cohort (Fig. 3g) emerged from this unbiased analysis. Each of them can be represented through a number of features that are clustered by virtue of their pairwise correlations. Supplementary Dataset 1 lists the results of the unbiased analysis of features in their association with CPI response and survival.

Validation of the five latent factors

Next, we asked whether the five latent factors, identified across HMF-CPI, were of comparable importance in the four groups of tumors with different tissues of origin represented in the cohort. To answer this question, we conducted, for each latent factor, multivariate regressions (adjusted for age, tumor purity, biopsy location and the remaining four latent factors) of their effect on CPI response and survival (Fig. 4). We found that the direction of the association of each factor (with response or survival) was maintained for all tumor types as in the pan-cancer analysis, with small differences in the effect size and the significance of their associations. One exception is the association of effective T cell infiltration with response in patients with lung tumors, which was not significant (although the significance in the association with survival is maintained). The other is prior treatment, which does not exhibit a significant association with response across bladder tumors. Very similar results were obtained in cancer type-wise univariate regressions (Extended Data Fig. 6a). In summary, we find that the latent factors, with few exceptions, appear to underlie CPI response and survival across all tumor types represented in the HMF-CPI cohort (Supplementary Note 1).

Fig. 4: Validation of the latent factors across independent cohorts.

Forest plots illustrating the association of latent factors across groups of tumors with different origins in the HMF-CPI cohort (left) and across six independent cohorts (right) with CPI response and overall survival. The value of each latent factor was computed as the mean of the cluster of features obtained in the HMF-CPI cohort across each validation cohort, except in the VHIO cohort, where the transcriptomics latent factors were estimated from alternative sets of genes (Methods). In the forest plots, the dots represent the strength (coefficients estimated through multivariate logistic or Cox regression) of the association between the latent factor and response or survival across cohorts. The horizontal bars denote the 95% confidence intervals. Gray dots represent latent factors whose estimates are within one standard error at either side of 0, dots with a light color (green or red) represent non-significant associations with coefficient estimates above (or below) one standard error of 0 and dark-colored dots represent significant associations. Green dots represent positive associations with improved outcomes (higher response odds or lower hazard ratio), while red dots represent negative associations (lower response or higher hazard ratio). Mixed denotes cohorts integrated by patients with multiple tumor types.

We next asked whether the five latent factors are validated in independent cohorts of the same and other tumor types representing the wide diversity of approaches of sample processing and tumor profiling used in the clinic. To this end, we collected data from the literature for five independent cohorts or metacohorts (INSPIRE29, Lyon30, MARIATHASAN27, PARKER ICI31, RAVI32) and obtained the data from another cohort of patients treated at the Vall d’Hebron Institute of Oncology (VHIO). These validation cohorts comprised 1,491 patients with primary or metastatic tumors of different organs (Supplementary Table 1 and Supplementary Note 1). For 339 of these patients, we obtained sufficient information to compute the five latent factors, while for the remaining 1,152, we could compute only four or three latent factors. Using the available clinical information, we were able to evaluate the association of the latent factors with CPI response for 1,294 patients across all cohorts, while the association with overall survival could be computed for 1,165 patients across five cohorts. Unlike in the case of the HMF-CPI cohort, most of these studies (except INSPIRE) started from formalin-fixed, paraffin-embedded samples. The approaches used to identify somatic mutations range from whole-exome tumor-normal paired sequencing to tumor-only sequencing of a panel of 432 genes. The expression of genes was measured by whole-transcriptome, targeted RNA sequencing (RNA-seq) or a panel of 170 genes using the nCounter (NanoString) platform.

In a multivariate analysis, pooling all external cohorts, the associations were consistent between each of the five latent factors and CPI response or survival, with all except tumor proliferative potential reaching significance (Fig. 4). In some individual cohorts, the association of a particular latent factor with CPI response or survival could not be verified, such as the TMB in the VHIO cohort. In this case, owing to the lack of a control sample to reliably call somatic mutations, the calculation of TMB is probably not reliable (Supplementary Note 1). Nevertheless, despite the differences in cohorts, profiling and sample collections, the associations observed in the HMF-CPI cohort for the five latent factors were, overall, reproduced across the validation cohorts. Effective T cell infiltration was positively associated with CPI response across five validation cohorts (three significantly), TGF-β activity in the microenvironment was negatively associated with survival in the five cohorts in which it could be evaluated (four significantly) and tumor proliferative potential was negatively associated with survival in four out of five cohorts (two significantly; Fig. 4). The association with prior treatment was validated in all but one cohort (two significantly; Fig. 4 and Supplementary Note 1). Genes closer to the mean of clusters R3 (effective T cell infiltration), S2.1 (TGF-β activity in the microenvironment) and S1 (tumor proliferative potential) in HMF-CPI also tend to correlate better with one another across the four validation cohorts with transcriptomics data (Methods; Extended Data Fig. 6b).

In summary, despite the wide differences in tumor sample processing and profiling, many of the associations between the five latent factors and CPI response or survival previously discovered in the HMF-CPI are also observed across six independent cohorts.

Multivariate models to predict CPI response and survival

We next asked how the effects of the five latent factors combine (through accumulation or interaction) to influence CPI response and survival. To that end, we trained multivariate machine-learning (tree-based gradient-boosting) models33 to predict the response, overall survival or progression-free survival of patients in the HMF-CPI cohort. To exploit the higher statistical power provided by the full cohort and the specificity inherent in the response across cancer types, we first constructed pan-cancer models and then used them as the base to obtain hybrid models; that is, subjecting the pan-cancer models to added rounds of training on the data corresponding to each tumor type (Fig. 5a and Supplementary Note 1). The hybrid models trained on the five latent factors outperformed models trained solely on tumor type-specific data (Supplementary Fig. 2a) as well as equivalent models trained solely on values of TMB and PDL1 expression (Supplementary Fig. 2b) within the HMF-CPI cohort. Models trained on different representations of the five latent factors showed comparable performance, supporting the idea that the features of each cluster constitute alternative representations of the latent factors (Supplementary Fig. 3a,b). The variability in the influence of the five latent factors across different tumor types in the HMF-CPI cohort observed in the multivariate regression analysis described above is verified through a survey of their relative importance on the prediction cast by the multivariate machine-learning models of CPI response and overall survival (Methods; Extended Data Fig. 7a,b).

Fig. 5: Multivariate models to predict patients’ response and survival.

a, The values of the representative biomarkers of the five latent factors across patients in the HMF-CPI cohort were used to train hybrid (pan-cancer-informed tumor type-specific) gradient-boosting models to predict CPI response and survival. The performance of the models was assessed through cross-validation (Methods and Supplementary Note 1). b, Stratifying patients based on model predictions. We separated the patients in the HMF-CPI cohort into three groups based on their predicted probability of response (histograms and the three-segment bar below them). We then calculated the fraction of responders within each group (bar plots below each histogram). c, Differences in overall survival between the three groups of patients are represented by Kaplan–Meier curves. The P value for each cohort (annotated in the plot) was calculated with a one-sided log-rank test. The line colors correspond to the three groups of patients defined in b. d, The TMB for each patient in the HMF-CPI cohort (with complete data for all five latent factors) was computed with a measure commonly used in the clinic: the number of mutations per genomic megabase. Tumors were classified as low-TMB or high-TMB based on a simple cutoff (10 mutations per megabase). The bars are colored according to the fraction of patients with high or low TMB in each of them. Interestingly, a number of patients with high-TMB tumors are predicted to have a low probability of response, whereas some patients with low-TMB tumors appear in the high probability of response group. The bottom bar plots present the percentage of patients in the low-TMB and high-TMB groups that showed clinical response to CPIs. OS, overall survival; BOR, best overall response according to RECIST; MB, megabase.

We then stratified the 396 patients in the HMF-CPI cohort with all data types (jointly and separately by tumor type) into three groups of low (below 0.1), medium (between 0.1 and 0.5) and high (greater than 0.5) predicted probability of response to CPI. Only 2 (3%) of the 67 patients in the low-probability group actually responded to CPI treatment, compared with 61 out of 97 (63%) patients in the high probability of response group (Fig. 5b). This stratification also significantly separated patients in the HMF-CPI cohort based on their survival (Fig. 5c). Stratifying the patients based on a threshold of TMB used in clinical practice (ten mutations per Mbp34,35) to separate high-TMB and low-TMB tumors discriminates less well, with 17% of responders among patients with low-TMB tumors and 42% among those with high-TMB tumors (Fig. 5d). Interestingly, patients in the group with low probability of response exhibit a range of predicted hazards according to the overall survival pan-cancer model (Extended Data Fig. 8a–g and Supplementary Note 1). Across the VHIO and INSPIRE cohorts, the stratification based on the predicted probability of response produced a perfect identification of patients with a low probability of response, while results were less accurate across the RAVI cohort (Extended Data Fig. 9a and Supplementary Fig. 4).
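The two stratification rules compared above can be written down in a few lines. The sketch below uses the cutoffs stated in the text (0.1 and 0.5 for the predicted probability, ten mutations per megabase for TMB). Several details are assumptions: predictions exactly at 0.1 or 0.5 are assigned to the medium group, a tumor at exactly ten mutations per megabase is counted as high-TMB, and the genome size used as the denominator is a caller-supplied parameter. The patient records are toy data, not values from the cohort.

```python
def probability_group(p, low=0.1, high=0.5):
    """Low / medium / high group from a predicted response probability.

    Boundary values (exactly 0.1 or 0.5) go to "medium"; this is an assumption.
    """
    if p < low:
        return "low"
    if p > high:
        return "high"
    return "medium"


def tmb_group(n_mutations, genome_mb, cutoff=10.0):
    """High/low TMB from mutations per megabase.

    `genome_mb` (callable genome size in Mb) must be supplied by the caller;
    treating exactly `cutoff` as high-TMB is an assumption.
    """
    per_mb = n_mutations / genome_mb
    return "high-TMB" if per_mb >= cutoff else "low-TMB"


def responder_fraction(patients, key):
    """Fraction of responders within each group returned by `key(patient)`."""
    counts = {}
    for patient in patients:
        group = key(patient)
        total, responders = counts.get(group, (0, 0))
        counts[group] = (total + 1, responders + int(patient["responded"]))
    return {group: r / t for group, (t, r) in counts.items()}


# Toy records (hypothetical predictions and outcomes).
toy_patients = [
    {"p": 0.05, "responded": False},
    {"p": 0.08, "responded": False},
    {"p": 0.30, "responded": True},
    {"p": 0.40, "responded": False},
    {"p": 0.70, "responded": True},
    {"p": 0.90, "responded": True},
]
print(responder_fraction(toy_patients, key=lambda pt: probability_group(pt["p"])))
print(tmb_group(n_mutations=450, genome_mb=30.0))  # toy numbers: 15 per Mb
```

Passing a different `key` function (for example, one based on `tmb_group`) to `responder_fraction` gives the responder fractions for the TMB-only stratification, so both rules can be compared on the same patients.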

When applied to patients in the HMF cohort who did not receive CPIs, the multivariate models of response identified an important fraction of the patients with skin (35%), bladder (42%) and lung (16%) tumors with a high likelihood of response to the treatment (Extended Data Fig. 10). Interestingly, patients suffering from other metastatic malignancies (some not usually considered as candidates for CPI) were also identified as potentially good responders. For example, 18 (4%) patients with breast cancer, 10 (3%) patients with colorectal cancer, 10 (19%) patients with kidney tumors and 5 (15%) patients with liver cancer exhibited high probability of response to CPI.

In summary, we illustrate that multivariate models combining the five latent factors produce a more accurate stratification of patients according to their predicted probability of response than the TMB alone.

Discussion

In this work, we followed a completely unbiased approach to discover genomics, transcriptomics and clinical features associated with CPI response and survival across patients with cancer. We aimed to answer how many and which aspects of the tumor, its microenvironment and the host influence the response to CPIs across patients (that is, latent factors), in an effort to provide a framework of reference to existing copious reports of biomarkers. First, through univariate logistic and Cox regressions, we identified a few hundred features that are significantly associated with CPI response and/or survival. Five latent factors emerge when these significant features are clustered based on their pairwise correlation. These represent mutually independent aspects of the tumor, its microenvironment and the host that influence the response of a patient to CPI and their hazard after the treatment.

Although it had been reported before that some genomics and transcriptomics features may represent the same aspect of tumors, their microenvironment or the host, here we show that an array of different, highly intercorrelated features (for example, expression of genes related to T cell function) actually represent different measurements of the same latent factor. This is particularly striking in the case of TMB: the mutation rate of hundreds of genes (including cancer driver genes) appears to be highly correlated with TMB, suggesting that the association of mutations in a given gene with CPI response, rather than constituting an independent biomarker, is just an alternative proxy measurement of TMB. Although the mutation status of some genes may still be a bona fide biomarker of CPI response, independently of TMB, any analysis to identify such genes should account for the confounding effect of their correlation with TMB. We also demonstrate that the associations of the latent factors with CPI response and survival are observed across different tumor types, and we validated them across six independent cohorts of patients. Most of the associations discovered in the HMF-CPI cohort were corroborated across these independent cohorts, despite differences in sample collection and processing procedures and profiling methods between these cohorts and the HMF-CPI. This indicates that these latent factors are mostly universally associated with CPI response and survival. They may thus potentially be used in clinical practice in the future, despite the heterogeneity of sample processing and tumor profiling approaches used. The variability observed across tumors of different origins in more than one cohort (for example, the smaller association of effective T cell infiltration with the response of lung tumors) may point to findings that could be pursued further. To our knowledge, this constitutes the most extensive exploration, to date, of the biomarkers of CPI response and survival across cohorts with tumors from different organs.

Importantly, no features other than these five latent factors were significantly associated with CPI response or survival. That is, virtually all significant features cluster within one of them. However, some relevant features may still lie below the detection limit set by the statistical power of the HMF-CPI cohort or appear significantly associated with CPI response or survival in only one tumor type. This is particularly important for features that may be relevant for a fraction of patients, such as mechanisms of immune escape (for example, B2M deletion, an event that we observe as significant across melanomas but not other tumors in the HMF-CPI cohort; see Supplementary Note 1)8,22,24,36. Other examples of such features relevant for specific groups of patients may include common polymorphisms that affect the immune response37 and the heterozygosity at HLA loci38. Particularities of the tumor types not represented in the HMF-CPI cohort are, of course, also absent from our current catalog of proxy biomarkers. These will be discovered when larger CPI cohorts are analyzed; it is not inconceivable that even more latent factors will become apparent then. Our discovery of latent factors is also limited by the profiling technologies used, which rely on deconvolution of the immune infiltrate based on bulk RNA-seq data. More detailed studies of this infiltrate, based on fine mapping of immune populations and their interactions with other cells in the microenvironment, will contribute in the future to refining the landscape of biomarkers of CPI response and survival.

A test of multivariate models combining the five latent factors produced a stratification of the patients in the HMF-CPI cohort, based on their predicted response probabilities, that discriminates better between responders and non-responders than the TMB threshold frequently used in the clinic. This could be regarded as a proof-of-principle for the application of the five latent factors to clinical practice. Being able to identify patients with a very low probability of response would help to spare them the potential side effects of the therapy7. Additionally, it may aid in reducing the financial burden on healthcare providers39. There is also the possibility, illustrated through the analyses described above, of using such multivariate models to identify patients who have a high probability of responding even though their tumors are not usually considered suitable candidates for CPI. In the future, when these types of models can be used in support of clinical decision-making, this could potentially contribute to expanding the therapeutic options for patients suffering from these malignancies.

In summary, we envision that the results of this work can provide a frame of reference for research into biomarkers of CPI response and survival, so that every significant feature identified can be classified either into one of these five latent factors or into a completely independent one.

Methods

Discovery cohort

Whole-genome somatic mutations, copy number and other structural variants across metastatic tumors from 4,484 patients in the HMF cohort were obtained from the HMF database18,19 (version DR-263_update1). Of these patients, 479 subsequently received CPI therapy (HMF-CPI cohort), and all somatic variation information was available for them. Whole-transcriptome RNA-seq data were available from the same source for a subset of 396 patients in the HMF-CPI cohort. Several features computed by the HMF pipeline from these data for each tumor (for example, number of neoepitopes, activity of mutational signatures) as well as the patients’ germline features (such as HLA allotypes) were obtained as part of the dataset. It also included all relevant clinical information regarding exposure to treatment for their primary malignancy, the subsequent treatment regimen for the metastasis and longitudinal measurements of the outcome (Supplementary Note 1 and Supplementary Table 1). Ethical approval to use these data in research was obtained by the HMF.

Validation cohorts

INSPIRE

Whole-exome somatic mutations, the whole-transcriptome RNA-seq gene expression of tumors and all clinical data pertaining to prior treatment as well as outcome upon treatment with pembrolizumab within the INSPIRE basket trial (NCT02644369)29 were obtained for 64 patients from https://github.com/pughlab/inspire-genomics.

Lyon

Targeted RNA-seq (2,559 genes) and clinical data from 315 patients treated at several hospitals in Lyon and Paris30 were obtained from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE159067, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE161537, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE162519 and https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE162520.

MARIATHASAN

Whole-exome somatic mutations, the whole-transcriptome RNA-seq gene expression of tumors and all clinical data in a previously published study27 of 348 patients were obtained from http://research-pub.gene.com/IMvigor210CoreBiologies.

PARKER ICI

Whole-exome somatic mutations, the whole-transcriptome RNA-seq gene expression of tumors and all clinical data of several cohorts of tumors (including those within clinical trials CheckMate 038 and CheckMate 067 and two cohorts published within other studies) compiled in a previous publication31 totaling 315 patients were obtained from https://github.com/ParkerICI/MORRISON-1-public.

RAVI

Whole-exome somatic mutations, the whole-transcriptome RNA-seq gene expression of tumors and all clinical data of the SU2C-MARK cohort from a previous publication32 comprising 352 patients were obtained from https://zenodo.org/records/7625517.

VHIO

The estimated TMB and the expression (via NanoString) of 170 genes across the tumors of 74 patients with cancer profiled and treated at the VHIO Hospital in Barcelona were obtained directly from the Cancer Genomics Group. Clinical data of these patients were provided by attending oncologists at VHIO.

Details of all cohorts appear in Supplementary Table 1 and Supplementary Note 1.

Ethical approval to use the data of the first five validation cohorts in research was obtained by the original institutions, who obtained written informed consent from patients and made the data available through scientific publications. The Vall d’Hebron University Hospital Ethics Committee of Clinical Research approved the study according to local guidelines and regulations, and written consent was obtained from all the patients included in this study.

Feature extraction for systematic analysis

Somatic features (18,382) representing single nucleotide variants, indels, copy number variants and other structural variants were extracted from files directly downloaded from the HMF database. These included the list of variants as well as summary statistics, such as TMB, burden of structural variants and predicted neoantigen burden. Although some features were obtained directly from the files, others were derived. The level of expression (transcripts per million) of 8,817 genes (measured through whole-transcriptome RNA-seq) was also obtained from files downloaded from the HMF database after pre-processing (see below). Other RNA-seq features were derived from these values, mainly by summarizing the expression of gene sets into separate features12,27,40,41, or by CIBERSORT17 derivation of immune cell populations from gene expression data (Supplementary Table 2). The HLA allotypes of HMF-CPI patients were directly obtained from files downloaded from the HMF database, while somatic HLA loss of heterozygosity in the tumors was estimated using the LILAC tool22. Clinical information regarding courses of treatment before the biopsy of the metastasis and the subsequent outcome of CPI treatment was also obtained from files downloaded from the HMF database. Again, part of that information was directly converted into features, while other features, such as the time elapsed between the end of the prior treatment and the biopsy of the metastasis, were derived from these data. Some of these clinical features were converted into outcomes of the analysis (best overall response to CPI, overall survival and progression-free survival upon CPI), while others were maintained as potentially predictive features. A detailed description of the strategy followed for the extraction of features in the HMF-CPI cohort appears in Supplementary Note 1.

Pre-processing

All outcomes and features were computed across all samples in the HMF database. The data were then joined based on the sample identifiers to produce a data frame ready for statistical analyses. Before the systematic analyses, several pre-processing steps were performed. First, to reduce multiple testing, we applied filters to remove features with little chance of providing meaningful associations. For somatic mutations by gene, only genes with at least one mutation per 20 samples were kept for the analyses. For RNA expression, only coding genes with a mean and standard deviation of adjusted transcript per million values greater than 0.5 were considered. For the driver features, only driver genes mutated in at least one in 30 samples were included. Similarly, for mutational signatures, only signatures with exposure greater than 0.02 for at least one in 20 samples were included. Second, all features were standardized to have a mean of zero and a standard deviation of one across the CPI samples. This standardization allowed for fair comparisons of estimated effect sizes. Apart from the primary tissue location, all features in the analyses were numeric or ordinal.
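As an illustration of the two pre-processing steps described above, the following minimal Python sketch applies a mutation-frequency filter and then standardizes a feature. It is not the code used in this study (which was written in R): the function names and gene names are hypothetical, the filter is interpreted as "mutated in at least 1 in 20 samples", and the sample standard deviation (n − 1 denominator) is an assumption.

```python
import math


def mutation_frequency_filter(mutations, min_fraction=1 / 20):
    """Keep genes mutated in at least `min_fraction` of the samples.

    `mutations` maps a gene name to a list of 0/1 flags, one per sample.
    """
    kept = {}
    for gene, flags in mutations.items():
        if sum(flags) / len(flags) >= min_fraction:
            kept[gene] = flags
    return kept


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        raise ValueError("a constant feature cannot be standardized")
    return [(v - mean) / sd for v in values]


# Toy data with hypothetical gene names: 20 samples each.
toy_mutations = {
    "GENE_A": [1] + [0] * 19,  # mutated in 1 of 20 samples: kept
    "GENE_B": [0] * 20,        # never mutated: dropped
}
kept_genes = mutation_frequency_filter(toy_mutations)
z_scores = standardize([2.0, 4.0, 6.0, 8.0])
```

After standardization, a regression coefficient can be read as the change in log odds (or log hazard) per standard deviation of the feature, which is what makes effect sizes comparable across features measured on different scales.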

Systematic analyses

Each feature was tested individually for the strength of its association with best overall response, progression-free survival and overall survival. Generalized linear models and their native maximum-likelihood-based tools were used for all estimation, standard error calculation and hypothesis testing.

The best overall response was modeled with logistic regression, in which we assumed that the response followed a Bernoulli distribution with mean \(p\), the probability of response. For each feature \(X\), we accounted for primary tissue, biopsy location, tumor purity and age as model covariates. Formally, let \(I_j\) represent the covariate indicator functions for primary tissue (skin, lung, bladder, other tissue), let \(I_k\) represent the indicator functions for biopsy location (lung, liver, lymph node, primary, skin, other tissue), let \(X_{age}\) represent patient age and let \(X_{purity}\) represent the tumor purity. The full and reduced models were fit as follows.

Full model:

$${Logit}\left(\;p\right)={\beta }_{0}+X{\beta }_{X}+\sum _{j\in {tissue}}{I}_{j}{\beta }_{j}+\sum _{k\in {biopsy}}{I}_{k}{\beta }_{k}+{X}_{{purity}}{\beta }_{{purity}}+{X}_{{age}}{\beta }_{{age}}$$

Reduced model:

$${Logit}\left(\;p\right)={\beta }_{0}+\sum _{j\in {tissue}}{I}_{j}{\beta }_{j}+\sum _{k\in {biopsy}}{I}_{k}{\beta }_{k}+{X}_{{purity}}{\beta }_{{purity}}+{X}_{{age}}{\beta }_{{age}}$$

The models were fitted with the base R glm function.

Progression-free and overall survival outcomes were modeled for each feature with Cox proportional hazards models. The hazard rates, denoted h(t), were modeled as follows.

Full model:

$$\log \left(h\left(t\right)\right)=X{\beta }_{X}+\sum _{j\in {tissue}}{I}_{j}{\beta }_{j}+\sum _{k\in {biopsy}}{I}_{k}{\beta }_{k}+{X}_{{purity}}{\beta }_{{purity}}+{X}_{{age}}{\beta }_{{age}}$$

Reduced model:

$$\log \left(h\left(t\right)\right)=\sum _{j\in {tissue}}{I}_{j}{\beta }_{j}+\sum _{k\in {biopsy}}{I}_{k}{\beta }_{k}+{X}_{{purity}}{\beta }_{{purity}}+{X}_{{age}}{\beta }_{{age}}$$

Survival models were fitted using the coxph function from the survival package in R.

For all analyses, P values were computed based on the likelihood ratio tests comparing the full and reduced models.
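For reference, the likelihood ratio test can be written out as follows. The symbols \(\Lambda\) and \(\ell\) are notation introduced here for illustration: \(\ell\) denotes the maximized log-likelihood of a model (the partial log-likelihood in the case of Cox regression). For a numeric or ordinal feature, the full and reduced models differ only by the single coefficient \(\beta_X\), so under the null hypothesis \(\beta_X = 0\) the statistic is asymptotically chi-squared distributed with one degree of freedom:

$$\Lambda =2\left({\ell }_{{full}}-{\ell }_{{reduced}}\right),\qquad P=\Pr \left({\chi }_{1}^{2}\ge \Lambda \right)$$

A feature entering the model through several coefficients (for example, a categorical feature) would instead be compared against a chi-squared distribution with the corresponding number of degrees of freedom.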

For the main analyses of best overall response, progression-free survival and overall survival, the covariates included were the indicators for primary tissue, the age of patients, the site of biopsy of the metastasis and the tumor purity. For the overall survival residuals analysis, the covariates additionally included the representative biomarkers of the three latent factors explaining response: TMB, T cell effective infiltration and pretreatment. For all model–feature–covariate combinations, the P values were calculated from the likelihood ratio test comparing the full model to the reduced model (with the feature of interest removed). Effect sizes (log odds ratio for the logistic regression and hazard ratios for the Cox regression) and standard errors were estimated with maximum likelihood from the full models. All effect sizes, standard errors and corresponding P values were stored for further analysis. Given the strong dependency among tests, we used the Benjamini–Yekutieli multiple testing threshold to control the false discovery rate21. Several exhaustive analyses were run with different sets of covariates, each producing similar conclusions. Full documentation of all exhaustive analyses can be found in Supplementary Note 1.
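The Benjamini–Yekutieli step-up procedure can be sketched in a few lines of Python. This is an illustration rather than the code used in this study: the function name is hypothetical, and the false discovery rate level of 0.05 and the toy P values are assumptions chosen only for the example.

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return one boolean per P value: True where the null is rejected.

    The procedure is valid under arbitrary dependency among tests.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic correction c(m) = sum_{i=1..m} 1/i.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the P values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= k * alpha / (m * c(m)).
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            largest_k = rank
    # Reject the hypotheses with the `largest_k` smallest P values.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


toy_p_values = [0.0001, 0.04, 0.001, 0.5, 0.03]
significant = benjamini_yekutieli(toy_p_values, alpha=0.05)
# significant == [True, False, True, False, False]
```

Compared with the Benjamini–Hochberg procedure, the only difference is the factor c(m), which makes the threshold more conservative in exchange for validity when tests are correlated, as is the case for highly intercorrelated genomic and transcriptomic features.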

Identification of latent factors

Latent factors were defined as the independent biological mechanisms underlying the features most predictive of CPI response and survival. To label latent factors, we first focused on features passing the Benjamini–Yekutieli multiple test significance threshold. For these significant features, we computed pairwise Pearson correlations and identified clusters using hierarchical clustering (hclust() in R with the ‘ward.D2’ method). The optimal number of clusters was defined using the fviz_nbclust function of the R package ‘factoextra’, with the ‘silhouette’, ‘wss’ and ‘gap_stat’ options.

To label transcriptomics clusters, we computed the expression of 255 gene sets reported in the literature. The gene sets (Supplementary Table 2) were collected by downloading the Hallmark and KEGG annotated gene sets from MSigDB25,26 (version 2023.1.Hs). These gene sets were further complemented by others obtained from previous publications27, including a paper describing the CPI-1000 analyses12. Gene sets with a Pearson correlation of >0.8 with the mean of a specific cluster and passing the multiple test P value threshold of association with CPI response or survival were considered cluster-specific and thus used to discern the nature of the cluster.

Stability of transcriptomics latent factors

For each gene in each transcriptomics cluster, we calculated the silhouette score42 using the silhouette() function from the package ‘cluster’ in R. This score reflects how close the particular data point is to the cluster of assignment and how far it is from other clusters. The matrix of distances between genes used to calculate silhouette scores was obtained from the correlation matrix

$$d=\sqrt{1-\left|{cor}\right|},$$

where cor is the correlation matrix of gene expression levels.
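The distance and silhouette computations can be illustrated with a small, self-contained Python sketch. It is not the code used in this study (which relied on the silhouette() function of the R package ‘cluster’): the gene names, expression values and cluster labels are toy assumptions, and the sketch assumes at least two clusters.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(profiles):
    """Distance d = sqrt(1 - |cor|) for every ordered pair of genes."""
    genes = list(profiles)
    return {
        # max() guards against tiny negative values from rounding.
        (g, h): math.sqrt(max(0.0, 1.0 - abs(pearson(profiles[g], profiles[h]))))
        for g in genes
        for h in genes
    }


def silhouette(gene, clusters, dist):
    """Silhouette score of one gene given fixed cluster assignments."""
    members = {}
    for other, label in clusters.items():
        if other != gene:
            members.setdefault(label, []).append(other)
    own = clusters[gene]
    if own not in members:  # singleton cluster: score is 0 by convention
        return 0.0
    # a: mean distance to the other genes in the assigned cluster.
    a = sum(dist[(gene, o)] for o in members[own]) / len(members[own])
    # b: smallest mean distance to the genes of any other cluster.
    b = min(
        sum(dist[(gene, o)] for o in group) / len(group)
        for label, group in members.items()
        if label != own
    )
    return 0.0 if max(a, b) == 0 else (b - a) / max(a, b)


# Toy expression profiles (5 samples) for four hypothetical genes.
profiles = {
    "G1": [1, 2, 3, 4, 5],
    "G2": [2, 4, 6, 8, 11],
    "G3": [5, 1, 4, 2, 3],
    "G4": [4, 1, 5, 2, 3],
}
clusters = {"G1": 1, "G2": 1, "G3": 2, "G4": 2}
dist = correlation_distance(profiles)
scores = {g: silhouette(g, clusters, dist) for g in profiles}
```

Because the distance uses the absolute value of the correlation, strongly anticorrelated genes are treated as close to each other. Keeping `clusters` fixed while recomputing `dist` from another dataset corresponds to evaluating, in an independent cohort, a clustering obtained in the discovery cohort.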

We then calculated silhouette scores for each gene in every other cohort with available expression data (INSPIRE, MARIATHASAN, PARKER ICI, RAVI) using gene expression levels from the corresponding dataset but keeping the initial clustering obtained in the HMF cohort. Aggregation of silhouette scores across datasets was performed using the aggregateRanks (method = ‘stuart’) function from the ‘RobustRankAggreg’ R package43.

Multivariate machine-learning models

Multivariate models were fitted using the Extreme Gradient Boosting (XGBoost) package in R44. The training in all cases sought to find the tree function T (sum of trees) that minimizes the expected loss between the observed and predicted response values; that is:

$$\hat{T}=\mathop{\mathrm{arg\,min}}\limits_{T}\;{E}_{X,Y}\,L\left(Y,T\left(X\right)\right)$$

where \(X\) and \(Y\) are the feature and response data, respectively, and \(L\) is the loss function of choice. In our setting, the loss function was chosen to be a negative log-likelihood compatible with the typical distributional assumptions for each type of data. Specifically, the best overall response was modeled using the logistic regression likelihood, while progression-free survival and overall survival were modeled using the Cox proportional hazards partial likelihood.
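For concreteness, the two loss functions can be written in their standard textbook forms. The notation is introduced here for illustration and is an assumption: \(y_i\in \{0,1\}\) is the response of patient \(i\), \(t_i\) the observed time, \(\delta_i\) the event indicator (1 if the event was observed, 0 if censored), and the Cox expression ignores tied event times; the exact form implemented in the software may differ in such details.

Logistic loss (best overall response), with \(p_i=1/\left(1+{e}^{-T\left({x}_{i}\right)}\right)\):

$$L=-\sum _{i}\left[{y}_{i}\log {p}_{i}+\left(1-{y}_{i}\right)\log \left(1-{p}_{i}\right)\right]$$

Cox negative partial log-likelihood (progression-free and overall survival):

$$L=-\sum _{i:{\delta }_{i}=1}\left[T\left({x}_{i}\right)-\log \sum _{j:{t}_{j}\ge {t}_{i}}{e}^{T\left({x}_{j}\right)}\right]$$

In both cases \(T\left(x\right)\) plays the role of the linear predictor of the corresponding regression model: a log odds for response and a log hazard ratio for survival.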

We trained three pan-cancer models (one per outcome) incorporating all available training data (479 patients). We also trained 12 hybrid models based on the three pan-cancer models followed by further cycles of training on patients of each tumor type, but maintaining exactly the same loss function and all hyperparameters. Patients suffering from malignancies other than skin melanomas and lung or bladder tumors were pooled within a group labeled as ‘other’ tumor types. This model fit procedure found a compromise between low variance but high bias from the pan-cancer models and low bias but high variance in pure tumor type-specific models. Finally, we also trained pure tumor type-specific models, starting from the patients in each of the four groups separately (details in Supplementary Note 1).

The XGBoost models require many tuning parameters (learning rate, depth, sub-sampling, minimum tree leaf size) that guide the internal model fitting. Initially, our model building used grid searches to select optimal internal tuning parameters. However, in our cross-validation study, we found that simple additive models (depth, 1; fast learning rate, 0.05; minimum leaf size, 5; sub-sampling, 0.75) had the best performance.

For the best overall response, the model casts the prediction outputs as log odds ratio scores that can then be recast into probability scores (continuous values between 0 and 1). For progression-free survival and overall survival, the models cast the prediction outputs as log hazard ratios that can then be recast into hazard ratios (continuous and positive).
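The two conversions mentioned above amount to applying the logistic function and the exponential function, respectively. A minimal Python sketch follows; the function names are hypothetical and introduced only for illustration.

```python
import math


def log_odds_to_probability(score):
    """Logistic function, written to avoid overflow for large |score|."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def log_hazard_to_hazard_ratio(score):
    """Exponentiate a log hazard ratio to obtain a positive hazard ratio."""
    return math.exp(score)


p_neutral = log_odds_to_probability(0.0)      # 0.5
hr_neutral = log_hazard_to_hazard_ratio(0.0)  # 1.0
```

A log odds of zero therefore corresponds to a response probability of 0.5, and a log hazard ratio of zero to a hazard ratio of 1 (no change in risk relative to the baseline).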

Separate models were trained solely on TMB and PDL1 expression (the continuous value reported in the HMF-CPI cohort by whole-transcriptome RNA-seq). These models were used to represent the predictive power of clinically approved biomarkers across analyses of the performance of multivariate models.

Calculation of Shapley values

Given that the final tree-based models were additive, the calculation and extraction of Shapley values was straightforward. For a given additive model, each feature value maps directly to a single Shapley value for that feature, regardless of the values taken by the other features of the sample. This relationship between feature and Shapley values is visualized by the marginal dependence plots in Extended Data Figure 7a,b. In R, using the predict function applied to the XGBoost output, we set the argument contribution = TRUE to extract the Shapley values. The extracted Shapley values measure additive feature contribution to the log odds ratio for response models and the log hazard ratio for the Cox survival models.
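The reason why Shapley values are simple for additive models can be illustrated with a short Python sketch. For a model of the form f(x) = Σ_j f_j(x_j), the Shapley value of feature j is f_j(x_j) minus the average of f_j over a background set of samples, so it depends on x_j only. This is an illustration under stated assumptions, not the XGBoost implementation: the component functions, thresholds, feature names and background data are hypothetical, and the background average is an unweighted mean over toy samples.

```python
def additive_shapley(components, background, sample):
    """Shapley values for an additive model f(x) = sum_j f_j(x_j).

    components: dict mapping feature name -> function f_j
    background: list of dicts used to estimate the average of each f_j
    sample:     dict with the feature values to explain
    Returns (contributions, bias); their sum equals the model output.
    """
    baseline = {
        name: sum(f(row[name]) for row in background) / len(background)
        for name, f in components.items()
    }
    contributions = {
        name: f(sample[name]) - baseline[name]
        for name, f in components.items()
    }
    bias = sum(baseline.values())
    return contributions, bias


# Hypothetical depth-1 "stumps", one per feature, on the log odds scale.
components = {
    "tmb": lambda x: 0.8 if x > 10 else -0.4,
    "tcell": lambda x: 0.5 if x > 0 else -0.5,
}
background = [
    {"tmb": 2, "tcell": -1.0},
    {"tmb": 15, "tcell": 0.5},
    {"tmb": 5, "tcell": 1.2},
    {"tmb": 30, "tcell": -0.3},
]
sample = {"tmb": 20, "tcell": 0.7}

contributions, bias = additive_shapley(components, background, sample)
model_output = sum(f(sample[name]) for name, f in components.items())
```

The contributions plus the bias term add up to the model output for the sample, which is the additivity property that allows Shapley values to be read as per-feature contributions to the log odds or the log hazard ratio.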

Proxy biomarkers in the VHIO cohort

In the VHIO cohort, the TMB was estimated from the mutations detected using a 432-gene hybrid capture-based panel45. The expression of 170 genes was measured using the nCounter (NanoString) platform46. Normalized NanoString counts were log-transformed and standardized, and proxy biomarkers were selected based on their correlation with the representative biomarkers of the five latent factors in the HMF-CPI cohort. For the T cell effective infiltration gene set, CXCL9, CXCL10, CXCL11, GZMA, GZMB and IFNG were selected. Overall, this gene set was strongly correlated with the original T cell effective infiltration gene set (ρ = 0.97) and showed high statistical significance in the exhaustive analysis (\(P=7.0\times {10}^{-8}\)). To represent the latent factors of TGF-β activity in the tumor microenvironment and tumor proliferative potential, we selected genes with a correlation of >0.5 to the respective gene set. This process yielded BRCA1, BRCA2 and TUBB for the tumor proliferative potential gene set. Although none of these genes were included in the representative biomarker obtained from the HMF-CPI cohort, they all showed a strong correlation to this gene set. The proxy gene set also showed a statistically significant association with overall survival residuals. For the VHIO TGF-β gene set, the same process yielded DLL4, HEYL, NOTCH3, NOTCH4, SERPINE1, TGFB1 and TGFB3. This gene set also showed a very strong correlation to the representative biomarker of TGF-β activity in the tumor microenvironment (Supplementary Note 1).

Statistics and reproducibility

The systematic analysis to identify features associated with CPI response and survival was carried out through logistic and Cox regressions, and the results were filtered for multiple testing as described in the Methods and Supplementary Information. These features were grouped into latent factors based on their pairwise correlations. Standard statistical approaches, such as univariate and multivariate regressions or Kaplan–Meier analysis, were used downstream for the analysis of the latent factors across validation cohorts. No statistical method was used to determine sample size for the analysis. All available samples from the discovery and validation cohorts were used; none were excluded from the analysis. Given that the study consisted entirely of the analysis of existing data, it was not randomized and the investigators were not blinded, as no allocation of samples in groups was carried out. All data used in this study are publicly available (see below) and the code used to reproduce the analysis described in the paper has been deposited in public repositories (see below).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.