Introduction

Immune checkpoint blockade immunotherapy is an emerging cancer treatment, and several regimens have been approved by the US Food and Drug Administration for treatment of melanoma and lung cancer with and without metastases, as well as renal cell carcinoma (RCC), bladder cancer, and Hodgkin’s lymphoma1. However, despite impressive response rates, the clinical benefit of these regimens has been limited to subgroups of patients with specific tumor types2 and certain interaction patterns between cancer cells and immune cells within the microenvironment1. There is an unmet need for biomarker identification to correctly stratify patients for proper treatment3,4,5.

Several tumor biomarkers, including Programmed Death Ligand 1 (PD-L1) immunohistochemistry and RNA gene expression profile (GEP) IFN-gamma cytolytic signatures, have been reported to predict the clinical response to immune checkpoint blockade immunotherapy, especially to anti- PD-L1 and anti-Programmed Death 1 (PD-1) immunotherapies. Nonsynonymous somatic mutations alter amino acid residues of a protein; in turn, these altered amino acid residues create new T-cell epitopes (neoepitopes) from a previously self-peptide6,7, serving as neoantigens capable of eliciting an antitumor immune response. Each nonsynonymous mutation increases the chances of immunogenic neoantigen formation—a probabilistic event. Therefore, tumor mutational burden (TMB)—defined as the number of nonsynonymous mutations in the tumor—can be used as a predictive biomarker for early response to immune checkpoint blockade immunotherapy.

Whole exome sequencing can be used to estimate TMB, which is directly associated with the success of checkpoint blockade immunotherapy8,9,10. For example, Snyder et al.8 showed that melanoma patients with a tumor mutational load ≥ 100 received significantly more clinical benefit from CTLA-4 blockade therapy than patients with mutation loads less than 100. The same researchers found that a high mutational load (> 100) was also significantly correlated with improved overall survival (OS) in learning data but failed to detect such effect on testing data. Likewise, Rizvi et al.9 showed that a tumor mutational load cutoff of ≥ 178 was associated with significant long-term clinical benefits in 34 patients with non-small cell lung cancer (NSCLC) treated with PD-1 blockade therapy compared to patients with lower mutation loads; they also observed that a high mutational load (> 178) was significantly associated with improved progression-free survival [PFS; area under the receiver operating characteristic curve (AUROC) = 0.86]11,12,13,14,15,16. TMB has been found to predict clinical response to pembrolizumab monotherapy among NSCLC patients11 as well as in patients with a variety of rare cancers12,13,14,15. However, TMB did not predict clinical response to pembrolizumab combined with chemotherapy or nivolumab combined with ipilimumab (the Checkmate-227 study)17. It is possible that TMB may play a different role in the setting of combination therapies due to differences in how combined therapies overcome tumor resistance in biomarker-defined samples18. Similarly, a recent study emphasized the importance of studies that explore the combination of additional biomarkers and interrogate a wider set of clinical data16; however, combinations of established makers that were tested using generalized linear models showed only moderate predictive ability (AUROC = 0.78). Therefore, use of TMB should be further validated in separate samples of patients treated with monotherapy and combination therapy.

Given that whole exome sequencing is costly and not regularly performed as part of routine care, the generation of mutation load markers that depend on a denominator derived from the number of gene mutations processed is not practical. As a result, several studied have attempted to evaluate TMB using a sub-sample of genes. For example, Campesato et al.19 used established cancer-gene panels (CGP), such as the Foundation Medicine Panel, for mutation load marker identification in NSCLC; while ability to predict six-month clinical benefit of immunotherapy was “good” (AUROC 0.72–0.84), the biomarker failed to predict overall survival. Roszik et al.20 used a set of 170 gene mutations to predict overall survival for patients with melanoma or NSCLC. Another recent study21 showed that TMB determined by the Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT) gene panel predicted survival after blockade immunotherapy across multiple cancer types in patients with advanced cancer; the cutoffs (defined as number of mutations) varied by cancer type. However, it was not clear whether those gene combinations or TMB predicted 6-month clinical benefit. Although 6-month clinical benefit of immunotherapy is a strong surrogate endpoint for overall survival, TMB biomarkers that predict clinical benefit may not predict survival, or vice versa8,9,10,19,20. There remains an unmet need for a cost efficient and more predictive biomarker to ensure that cancer patients benefit from the promise of checkpoint blockade immunotherapies.

We hypothesized that advanced machine-learning technologies and proper modeling approaches could improve biomarker predictive ability and identify a set of gene mutations that could correctly classify which patients were likely to achieve clinical benefits at six months after blockade immunotherapy initiation, and that this same set of mutations would be a prognostic marker for increased overall survival.

Methods

We downloaded whole exome sequencing mutation data for two independent samples of melanoma patients (n = 39 and n = 25) published by Snyder et al.8 and a sample of NSCLC patients (n = 34) published by Rizvi9; links to data are available in the referenced manuscripts. For the original studies, matched normal DNA extracted from peripheral blood was used to filter germline variants and to enable unambiguous identification of somatic mutations. All patients received blockade immunotherapy [either anti-cytotoxic T lymphocyte-associated antigen (CTLA-4) or anti-PD-1]. The primary endpoint—long-term clinical benefit—was defined by radiographic evidence of freedom from disease or evidence of a stable or decreased volume of disease for more than 6 months. Patients were followed up for overall survival as the secondary endpoint.

Following the methods of Breiman et al.22, we used the larger melanoma dataset (n = 39) for learning data and the smaller melanoma dataset (n = 25) for testing data to develop and validate the mutation model. We then used the remaining dataset (NSCLC sample, n = 34) for external validation of the model classification ability.

Data availability

Melanoma data is embedded in Supplementary Tables available in the Appendix to Snyder et al.8: https://www.nejm.org/doi/suppl/10.1056/NEJMoa1406498/suppl_file/nejmoa1406498_appendix.pdf.

NSCLC data is available in the Supplementary Materials to Rizvi et al9: https://science.sciencemag.org/content/suppl/2015/03/11/science.aaa1348.DC1?_ga=2.203136467.331745608.1594129794-458264062.1591795446.

Statistical modeling for 6-month clinical benefits

We used a comprehensive approach to construct our modeling process in four steps (see Fig. 1): (1) variable screening and selection; (2) creation of an initial multi-variable model by including variables with individual effects (in univariate analyses); (3) multi-variable modeling with a pruning approach to improve parsimony while retaining excellent predictive ability; and (4) model validation using independent samples.

Figure 1
figure 1

Four-step modeling process.

We first selected genetic mutations common to the two Snyder datasets8, given the large number of mutations reported (9291–18,850) between them. To screen for mutations potentially associated with the outcomes of interest (Step 1 in Fig. 1), two-sample tests (presence and absence of long-term benefit) were performed with various cut-off p-values (0.2, 0.3, and 0.5) using learning data. Mutations were selected for inclusion in the initial multivariable model (Step 2) if they demonstrated predictive ability in the univariate models within a reasonable computational time (i.e. minutes versus hours).

A Least Absolute Shrinkage and Selection Operator (LASSO) machine-learning technique23 and Salford Predictive Modeler (SPM v8)24 was used to identify combinations of mutations with the maximum ability to predict long-term clinical benefit (Step 3 in Fig. 1). LASSO estimates the contribution of each covariate based on ordinary least squares (OLS), emphasizing combined effects rather than individual covariate effects. An elasticity option in LASSO accounts for correlations between variables and restricts the OLS estimation based on a linear combination of the sum of the absolute regression coefficients as well as on the sum of the squared regression coefficients. Pruning (Step 3) is similar to backward variable selection in a regression model; this approach was used to scale down the number of mutations/ covariates included in the model while retaining same level of the model predictive ability (as measured by AUROC) compared the first multivariable model.

The most parsimonious model that retained “excellent” goodness-of-fit (i.e., AUROC > 0.90) on the testing data was considered to be optimal. In addition, we generated a cut-off point/threshold that optimized both sensitivity and specificity to classify patients into two groups—those with a high-weight tumor mutational burden (HW-TMB) and those with a low-weight tumor mutation burden (LW-TMB). Finally, the model predictive ability (AUROC) was validated using external data.

Testing the effect of HW-TMB on overall survival

Kaplan–Meier and log-rank tests were used to test differences in overall survival between the HW-TMB and LW-TMB groups. First, the effect of the HW-TMB marker on overall survival was tested using two independent samples from Snyder8, and then validated using the external NSCLC sample data9.

Ethics approval and consent to participate

The study was conducted based on published data.

Results

Patient characteristics and clinical follow-up are presented in Table 1. In general, patients’ age and sex were similar across all samples, except that patients were less likely to be male in the NSCLC sample9. Roughly half (41–66%) were determined to have received clinical benefit from treatment based on radiological evidence. Median follow-up was approximately 2 years in melanoma samples and 0.36 years for the NSCLC sample. The number of genetic mutations processed was 18850 in the first melanoma sample (n = 39), roughly twice that of the 9000 genes processed in the second melanoma and the NSCLC sample.

Table 1 Patient characteristics across study data sets.

TMB biomarker to predict long-term clinical benefit at 6 months

We used a p-value cutoff of 0.30 to screen for genetic mutations; this yielded 2139 genes for consideration in the initial multivariable model. The first LASSO model retained 670 genetic mutations with an AUROC of 1.0 on both learning and testing data and 100% accuracy for long-term clinical benefit. Our pruning approach reduced the number of mutations in the model to 161 (Table 2); among these, only nine genes (< 6%) had negative coefficients, indicating that our weighted gene mutation combination model was consistent with previous published mutation load markers for 6-month clinical benefits of immunotherapy. One hundred percent of patients with HW-TMB demonstrated clinical benefits of immunotherapy at 6 months; classification accuracy was 100% (AUROC = 1.0 on both learning and testing data; see Table 2). In further validation using lung cancer data9 AUROC was 0.83.

Table 2 161-gene tumor mutational burden (TMB) model: gene mutations and coefficients included in the final least absolute shrinkage and selection operator (LASSO) model.

TMB biomarker as a prognostic marker for overall survival

In the melanoma learning data sample (n = 39), estimated median survival was was 7.9 years for patients with HW-TMB versus 0.8 years in patients with LW-TMB (p-value < 0.0001; Fig. 2). In the melanoma testing data sample (n = 25), estimated median survival was > 7 years in HW-TMB patients, compared to 0.9 years in LW-TMB patients (p < 0.0001; Fig. 3). In the NSCLC sample, estimated median survival was 0.69 years in HW-TMB patients versus 0.26 in LW-TMB patients (p = 0.0057; Fig. 4).

Figure 2
figure 2

Kaplan–Meier curves for overall survival in advanced melanoma patients treated with immune checkpoint blockade immunotherapy (learning data n = 39); patients with the high-weight tumor mutation burden (HW-TMB) had significant longer survival, compared to those with low-weight tumor mutation burden (LW-TMB).

Figure 3
figure 3

Kaplan–Meier curves for overall survival in advanced melanoma patients treated with immune checkpoint blockade immunotherapy (testing data n = 25); patients with high-weight tumor mutation burden (HW-TMB) had significant longer survival, compared to those with low-weight tumor mutation burden (LW-TMB).

Figure 4
figure 4

Kaplan–Meier curves for overall survival in advanced non-small cell lung cancer patients treated with immune checkpoint blockade immunotherapy (testing data n = 34); patients with high-weight tumor mutation burden (HW-TMB) had significant longer survival, compared to those with low-weight tumor mutation burden (LW-TMB).

Discussion

Our four-step proper model building process and advanced machine-learning approach resulted in a validated model based on a set of 161 mutations generated a model with an “outstanding” 100% predictive accuracy to classify melanoma patients into two groups based on tumor mutation burden. All patients in the HW-TMB group responded to immunotherapy, showing clinical benefit at six months, while all patients in the LW-TMB group were non-responders. This model also had “good” classification ability (AUROC = 0.83) among patients with NSCLC. Furthermore, melanoma patients classified as HW-TMB has median survival time approximately eightfold higher than those with LW-TMB in two independent studies (p < 0.0001). Among NSCLC patients, patients with the same HW-TMB marker had median survival almost three times that of patients in the LW-TMB group (p < 0.001).

Although previous publications have demonstrated that tumors with a high number of nonsynonymous somatic mutations are likely to respond to immune checkpoint blockade immunotherapy in patients with melanoma, these biomarkers depend on a denominator derived from the number of gene mutations processed and fail to predict a better overall survival. In addition, whole exome sequencing is not routinely available in clinical practice; the process can also be costly and time-consuming (results may take up to 5–6 weeks based on current technology). As a result, several studies have attempted to evaluate TMB using a gene panel assays with reduced turnaround time (approximately 2 weeks). Campesato et al.19 proposed using established cancer-gene panels (CGPs) for biomarker identification. The study showed that a mutational load cutoff point of 7 (based on a gene panel consisting of 315 genes) was comparable to a cutoff ≥ 190 points (based on whole exome sequencing) for prediction of durable clinical benefit (partial or stable response lasting > 6 months) among patients with NSCLC; a high mutational load was significantly associated with progression-free survival but no data was presented for overall survival. However, the CGP-mutational load biomarker was not associated with clinical benefit from CTLA-4 blockade therapy in melanoma patients, and there were no observed differences in overall survival between the high and low mutational load groups. Although model predictive ability (AUROC) for durable clinical benefit in NSCLC patients was moderate (0.73–0.84) based on the derivation sample, none of biomarkers predict long term overall survival. In contrast, Roszik et al.20 used a panel of 170 gene mutations to predict overall survival. Their study showed that a weighted 169 gene mutation combination significantly predicted overall survival for patients with melanoma or NSCLC treated with immunotherapies. However, the study was restricted to a fixed panel of gene mutations, and it was unclear whether those gene combinations predicted the clinical benefit of immunotherapy at 6 months.

To address limitations of these methods, we used a comprehensive approach to construct our modeling process in four steps (Fig. 1): (1) variable screening and selection; (2) creation of an initial multi-variable model by including variables with individual effects (in univariate analyses); (3) multi-variable modeling with a pruning approach to improve parsimony while retaining excellent predictive ability; and (4) model validation using independent samples. Our analysis focused on a subset of gene mutations to predict clinical benefits of immunotherapy at 6 months among patients with metastatic melanoma. As described above, whole exome sequencing usually identifies a large quantity of mutations which may vary across studies; this creates challenges for model development and validation. We found that the process of screening potential mutations of interest was an essential first step for a successful model-building process. In the current analysis, we used p-values ranging from 0.1 to 0.5 for individual genetic mutation selection before multivariable modeling; in this analysis, a p-value of 0.3 was chosen in order to retain a sufficient number of gene mutations for the initial multivariable model, given that lower p-values did not identify enough mutations to be useful for the modeling analysis, and higher p-values required too much computational time and added noise to the model.

Our advanced machine learning LASSO method, which emphasizes the combined effects of multiple mutations, generated a model that predicted both long-term clinical outcomes and overall survival across two cancer types among patients who received blockade immunotherapies. Because overall survival is a commonly used outcome for testing treatment efficacy in cancer research and clinical practice, our marker may be useful for identifying appropriate patients for better treatments and care.

We also note that the final HW-TMB model was not unique. The first model retained as many as 670 gene-mutation combination and yielded AUROC = 100% predictive accuracy based on the testing data. During model pruning process, we trimmed the number of mutations until the model reached 161 mutations, the smallest subset as the final model which continued yielding AUROC = 100% on testing data. This 161-mutation HW-TMB biomarker has much better predictive ability for both clinical benefit at 6 months and overall survival than previously published biomarkers based on either whole exome sequencing data or subsets of whole exome sequencing8,9.

Recently, Samstein et al.21 showed that TMB, defined as mutation load based on a pre-defined subset of 468 cancer-related genes from MSK-IMPACT, predicted survival after blockade immunotherapy across multiple cancer types in patients with advanced cancer. However, TMB cutoffs derived from targeted sequencing (gene panel)s may vary across cancer types. Our results are consistent with the Samstein results21, despite the difference in TMB measures as well as the study populations. Our 161 gene-mutation signature/marker was a prognostic marker for survival in patients with one of two advanced stage cancers (melanoma or NSCLC) treated with blockade immunotherapy. This supports the use of 6-month clinical response as a surrogate endpoint for overall survival and the use of a 6-month clinical response biomarker as a surrogate for survival. Another study16 has interrogated established markers and tested their predictive ability for a wider set of clinical endpoints; result were only moderately predictive (AUROC = 0.78). One limitation of that study could be the inclusion of both monotherapy and combination therapy data; the role of TMB in monotherapy should be studied separately from combination therapy due to different resistance mechanisms overcome by combined agents.

As shown in Table 2, our marker accounts for both the weight and value of mutations, which allows us to reduce the number of genes included in the model. This parsimony translates into significantly reduced computational processing time, compared to whole exome sequencing, without loss of predictive accuracy—resulting in a marker that can be adapted for clinical practice, allowing patients to receive appropriate treatment in a timely fashion with clear expectations regarding the clinical benefits.

There are a number of limitations to our analysis. First, the 161 gene-mutation signature is hypothesis-driven; however, the relationship of TMB—measured by either a set of mutation genes or a total mutation cut off—to clinical response in immune monotherapy is well recognized. It is possible that the gene mutations identified in our model may be associated with well-studied genes or may be new candidates for study. A goal for future work includes investigation of the correlation between these new and previously recognized genes to determine their biological relevance. Second, although we have used only a subset of mutations in our analysis, implementation of this method in clinical practice may still be challenging, given the number of genes included. However, compared to whole-exome sequencing or CGPs based on 315 genes, our method may be more cost-efficient. Although our analysis was focused on development and validation among patients with melanoma, we observed “good” predictive ability (AUROC = 0.83) for NSCLC outcomes in our external validation. It is likely that a more specific biomarker would improve predictive ability for patients with NSCLC, but such an exercise was outside the scope of the current analysis, given the small dataset and a lack of existing external data for validation. Finally, while our biomarker was validated using external melanoma data, the validation dataset was derived from patients from the same institution or using the same whole exome sequencing processing technology as the learning data. We were unable to validate our model using published metastatic melanoma data from Hugo et al.25 (n = 37) or by Van Allen et al.26 (n = 110) because there were differences in the whole exome sequencing technology used. Future work will include validation of our marker using a larger independent external dataset based on the same whole exome sequencing technology, evaluated against the same outcomes of interest.

Conclusions

Although previous studies showed that TMB (defined as the number of mutations) predicted the six-month clinical benefits of blockade immunotherapy, predictive ability of these biomarkers was only “good;” these markers also failed to accurately predict overall survival. Our four-step proper model building process and advanced machine-learning approach generated a 161-HW-TMB that predicted six-month clinical response to treatment with 100% discrimination ability (AUROC = 1.0). This same set of mutations, further validated using external data sets, was a prognostic marker for overall survival among patients with melanoma and lung cancer, regardless of heterogeneous patient characteristics, previous treatment status, or type of immunotherapy. We believe our biomarker shows great promise to be adapted for clinical practice and will improve cancer treatment and care.