Introduction

Tumor mutational burden has been described as a predictor of tumor behavior and immunological response1,2,3. At its core, mutation formation promotes carcinogenesis via activation or inactivation of genes and associated pathways, thus generating novel peptide sequences which can stimulate immune response. High mutational burden may in some cases represent a high underlying number of drivers, and indicate a higher-risk tumor: for example patients with high mutational burden lung adenocarcinoma tumors showed a 14-month survival decrease4, supporting that high mutation burden may be a harbinger of poor clinical outcomes. Alternatively, highly mutated tumors may develop many novel peptides and thus display more neoantigens, rendering them more susceptible T-cell targets5. For example, patients with melanomas with a high mutational load showed improved survival with ipilimumab6 and improved overall survival7; patients with highly mutated ovarian cancer had improved postoperative chemotherapy response and higher overall survival2.

Here, we systematically analyzed mutational burden survival effects across multiple cancer types. We hypothesized that tumor exonic missense mutational burden (TEMMB) is predictive of underlying total exonic mutational burden (TEMB), and that TEMMB is independent of critical demographic and tumor-specific factors. Furthermore, we hypothesized that TEMMB is a predictive marker of tumor immune surveillance and clinical outcomes. We sought to test these hypotheses, and to describe the potential genetic underpinnings for the impact of TEMMB on survival. We focused on somatic missense mutation burden in subsequent analyses. Missense mutations represent the most common observed oncogenic variants8, and are known to alter sequences of expressed transcripts and thus lead to downstream translation of mutated proteins9. Furthermore, missense variants specifically have been suggested to be the most frequent class of alterations to carry the potential for neoepitope generation in chronic lymphocytic leukemia malignancy (as compared to frameshift or splice-site variants)10. In multiple myeloma, missense mutational load was found to be highly correlated with predicted neoantigen loads11. Missense mutations produce specific amino acid changes in a known pattern, allowing for a systematic way to characterize mutational profiles by defining recurrent and non-recurrent mutations.

Results

Tumor missense mutational burden (TEMMB) variability among cancers

Total missense mutational burden across all cohorts ranged from a low of 8 (median) missense mutations among acute myeloid leukemia (LAML) and thymoma (THYM), to 256 median mutations among the skin cutaneous melanoma (SKCM) cohort (Fig. S1). 10 individuals were removed as TEMMB outliers (Fig. S2). Total (TEMB) and missense (TEMMB) tumor exonic mutational burden were found to be closely correlated among all cohorts: Pearson’s r ranged from 0.95–1.00 for all cohorts other than uveal melanoma (UVM) which also revealed a strong positive correlation with r = 0.88 likely due to a small (N = 79) sample size (p < 2.2 × 10−16 for all cohorts) (Fig. S3).

TEMMB relations to age, sex and tumor stage

Increasing patient age was significantly correlated with high TEMMB among 17 of 31 (55%) cohorts (Table 1). Male sex was significantly associated with high TEMMB in renal papillary cell carcinoma (KIRP), sarcoma (SARC), and cutaneous melanoma (SKCM). Female sex was significantly associated with high TEMMB in colorectal adenocarcinoma (COAD) and glioblastoma multiforme (GBM). High tumor stage (Stage III and above) was observed to be significantly associated with both high TEMMB in 3 cohorts and low TEMMB in 7 cohorts.

Table 1 Contributions of age, sex, and tumor stage to tumor exonic missense mutational burden (TEMMB). Each model was additionally adjusted by recruitment center (IRR and p-values not shown).

Melanoma, ovarian carcinoma, and bladder carcinoma benefit from high mutational load

Following multivariate adjustment for age, sex, stage, and patient recruitment center and exclusion of seven cohorts with a low number of non-censored events, TEMMB was found to be significantly correlated with survival in 4 of 24 TCGA cohorts (Fig. 1). High TEMMB correlated with improved survival in skin cutaneous melanoma (SKCM): HR = 0.71 [0.60–0.85], p = 0.0002, bladder urothelial carcinoma (BLCA): HR = 0.74 [0.59–0.93], p = 0.01, and ovarian carcinoma (OV): HR = 0.80 [0.70–0.93], p = 0.003. High TEMMB was associated with decreased survival in colorectal adenocarcinoma (COAD): HR = 1.32 [1.00–1.74], p < 0.05 (p = 0.0497). Following Bonferroni adjustment of α = 0.05 for 24 comparisons, yielding a p-value cutoff of α/24 = 0.0021, only cutaneous melanoma retained a significant correlation between TEMMB and survival.

Figure 1
figure 1

Survival effects of tumor exonic missense mutational burden (TEMMB). Survival is expressed as hazard ratio (HR) per effective multivariate-adjusted TEMMB. Raw two-sided Wald-test p-values are reported with *indicating p < 0.05 and **indicating Bonferroni-adjusted significance for 24 multiple comparisons. Ovarian carcinoma, cutaneous melanoma, bladder carcinoma, and colorectal adenocarcinoma showed significant survival benefit with high TEMMB.

Relative burden of recurrent and non-recurrent mutations expressed with recurrence score (RS)

To characterize the somatic mutational profile of each cancer, we determined the relative burden of recurrent mutations to total mutations within each cohort, expressed as a recurrence score (RS). Recurrent mutations were defined as specific amino acid changes observed among greater than 1% of each cohort’s population. Mutational profiles, and thus RS, varied significantly between distinct cancers (Fig. S4). Several cohorts, notably adrenocortical carcinoma (ACC), acute myeloid leukemia (LAML), brain lower grade glioma (LGG), pheochromocytoma and paraganglioma (PCPG), thyroid carcinoma (THCA), thymoma (THYM), and uveal melanoma (UVM), revealed mutations occurring at high prevalence among the sequenced population. The recurrent mutations can be readily visualized as sharp peaks in the cancers’ mutational profiles. Such cohorts were found to have high recurrence scores (RS). Other cohorts, such as skin cutaneous melanoma (SKCM) and ovarian carcinoma (OV), displayed mutational profiles with fewer pronounced recurring mutations (sharp peaks). These cohorts carried a higher enrichment of non-recurrent mutations, and thus were found to have lower RS (Fig. 2).

Figure 2
figure 2

Recurrence scores (RS) of all cancer cohorts, calculated as the fraction of recurrent missense mutations to total missense mutations in the pool. Recurrent mutations were defined as those which exceeded 1% prevalence in the cohort.

We catalogued the specific tumor mutations observed among these cohorts displaying a highly-recurrent mutational landscape. In the adrenocortical carcinoma (ACC) cohort, 0.29% of all pooled missense mutations were in the ZNF517 gene (p.V349A), and 0.29% of missense mutations were recurrent GARS (p.P42A) mutations. In uveal melanoma (UVM) cohort, 2.54% were recurrent GNA11 (p.Q209P) mutations, 2.01% were recurrent GNAQ p.Q209P, and 0.75% GNAQ p.Q1209L. In thyroid carcinoma (THCA), 5.23% were BRAF p.V600E, 0.65% were NRAS p.Q61R, and 0.25% HRAS p.Q61R. In the acute myeloid leukemia (LAML) cohort, 1.38% of missense mutations were in DNMT3A gene (p.R882H), 1.05% were IDH2 p.R140Q, and 0.79% were IDH1 p.R132C. In pheochromocytoma and paraganglioma (PCPG), 0.64% of mutations were recurrent HRAS p.Q61R, and 0.36% were CHEK2 p.K152E.

Cancers with low recurrence scores (RS) show survival benefit from high TEMMB

We identified a significant positive correlation (r = 0.49, p = 0.016) among all cancer cohorts between the survival effect, or Hazard Ratio (HR), of adjusted-TEMMB and cancer recurrence score (RS) (Fig. 3). Cancers with low RS tended to exhibit survival benefit (HR < 1) with increased adjusted-TEMMB. Conversely, cancers with high RS were observed to have a decrease in survival (HR > 1) with increased adjusted-TEMMB. Testing an alternate recurrence cutoff of 5% (traditional cutoff for minor allele frequency) confirmed a significant positive correlation: r = 0.66, p = 0.002 (Table S1).

Figure 3
figure 3

Correlation of log-adjusted mutational burden survival Hazard Ratios (HR) with cohorts’ log-adjusted recurrence scores (RS). Cancers with high recurrent mutation enrichment showed survival harm with increasing TEMMB, while tumors with low recurrent missense mutation burden tended to show survival benefit (r = 0.49, p = 0.016).

Discussion

Exonic missense mutation distribution displays considerable variability among cancers studied in TCGA. We identified cutaneous melanoma and lung squamous cell carcinoma as the tumors with the highest TEMMB, and acute myeloid leukemia and thyroid carcinoma as among the lowest. These results were consistent with previously-reported mutational burden distribution12. Somatic missense mutations strongly contribute to the generation of novel tumor epitopes. Understanding whether a more highly-immunogenic tumor carries a direct link to mutational burden could provide a mechanistic explanation for observed clinical survival patterns. In our results, TEMMB was closely correlated with TEMB among all TCGA cohorts, supporting TEMMB’s role as a robust proxy for TEMB.

Exonic missense mutational burden showed strong consistent positive association with age, supporting current understanding of human mutagenesis. While age-related mutagenesis rates do vary between individuals and tissue types, a consistent positive correlation between mutational load and age has been supported by animal and human research13,14,15,16,17,18. Several “clock-like” mutational signatures may be contributory to this chronological mutagenesis phenomenon19.

Interestingly, low tumor stage was correlated (after Bonferroni adjustment) with high TEMMB in breast carcinoma, colon and rectal adenocarcinoma, and uveal melanoma. Chromosomal and microsatellite instability (MSI) are observed in early stages of adenomas, and significant chromosomal instability has been proposed as an underlying feature present prior to malignant transformation20,21,22. Low-stage adenocarcinoma tumors may thus carry higher mutational loads due to the pronounced underlying genomic instability. Although the role of immune therapy is not yet strongly established in colorectal cancer (CRC), the immune tumor microenvironment in CRC is an important factor in disease progression23,24. Likewise, breast carcinogenesis has been proposed to be regulated by innate and adaptive inflammatory responses25. It is possible that during progression towards high-stage adenocarcinoma tumors in breast and colorectal cancers, highly-immunogenic or high-TEMMB cells are cleared through immune targeting and elimination, thus selecting for a population of low-TEMMB cells with low neoantigen loads. Uveal melanoma has a low mutational burden which has been suggested as a possible reason for low success of immunotherapy in its treatment26. Given the high propensity for rapid metastasis in uveal melanoma, it is possible that intercepting such tumors at an early stage may partially be explained by a higher mutational load and thus more favorable immune response.

Driver mutations impart tumor growth advantage and are positively selected in cancer evolution, while biologically inert passengers accumulate without directional selection over the tumor growth timespan27. Many established bioinformatics methods to study drivers rely on techniques that identify recurrent mutations28, and thus we quantified recurrent and non-recurrent mutations to serve as proxy for relative amounts of drivers and passengers within a cancer type. Our results suggest high TEMMB tends to confer survival benefit in cancers with more non-recurrent (likely passenger) mutations, and decreased survival in cancers with high recurrent (likely driver) fractions. We propose that in malignancies with large enrichments of non-recurrent mutations, high TEMMB marks a high passenger count, and increasing passenger mutation load increases neoantigen presentation29 without imparting additional growth advantage or aggressiveness. Our observed benefit with high TEMMB supports literature findings for melanoma6 and ovarian carcinoma2. In cases of malignancies with higher relative amounts of recurrent or driver mutations, for instance in adrenocortical carcinoma (ACC), uveal melanoma (UVM), and brain lower grade glioma (LGG), high mutational burden correlates with increased drivers of aggressiveness and invasion. In our study, increasing TEMMB showed a trend towards survival harm in these highly somatically-recurrent tumors.

Recent work has suggested a “double-edged” effect of increased DNA variants, noting that on the one hand, high DNA variation increases accumulation of drivers which are beneficial to tumor adaptation; conversely, high concurrent passenger loads may outweigh the driver effects30. Our results suggest a model for improved understanding of the variable manifestations of this molecular tug-of-war among a variety of cancer types. We found the underlying mutational landscape of DNA changes to be quite variable among malignancies documented in TCGA. A group of cancers such as adrenocortical carcinoma, uveal melanoma, and brain glioma emerged as a “driver-enriched” class, while a second group – including cutaneous melanoma and ovarian carcinoma – emerged as a “passenger-enriched” class. Increasing DNA variation in these two classes, quantified as TEMMB, yielded opposing survival effects. Our findings highlight TEMMB as an independent survival biomarker with potential utility for risk-stratification and identification of those patients who may benefit from immunotherapy. Classification of malignancies into driver- or passenger-rich classes may also aid in identifying suitable candidate cancers for immune therapy trials.

The study was limited by the following factors: first, TCGA describes exome sequences, and thus mutations in noncoding regions could not be analyzed. Thus, TEMMB reflects specifically the exonic mutational burden rather the full genome scale. It is possible that non-coding DNA contributes significantly to survival, and further study with comprehensive full genome sequencing may help elucidate such effects. Second, details of therapy and treatment course were available not for all patients, and thus we were unable to systematically study effect modification and confounding by treatment differences. Third, in-silico findings are important for discovery of novel relationships and insights in tumor biology, but in-vivo studies are required to further analyze mechanisms by which TEMMB affects tumor immune surveillance, metabolic, and growth properties. Future work will focus on analysis of immunological mechanisms responsible for clearing high-TEMMB tumors with a low enrichment of recurrent mutations. Lastly, the study is also significantly limited by a lack of controlled population-based recruitment among the TCGA cohorts. We adjusted TEMMB to account for recruitment center to partially address this limitation. However, future work would benefit from a study with more clearly and regularly ascertained cohorts.

Our overall analyses suggest that positive and negative TEMMB effects on survival may depend on the enrichment of underlying recurrent mutations. Cancers with higher proportions of non-recurrent and thus likely passenger mutations showed survival benefit with high TEMMB, while cancers with higher recurrent mutation fractions (likely drivers) revealed a decrease in survival. Mutational signatures for some cancers might contribute significantly to overall TEMMB (e.g. UV-signature in the cutaneous melanoma cohort), thus, in part environmental effects contribute to the TEMMB survival effect. These findings highlight the relationship of tumor mutational burden to driver and passenger effects. Understanding how tumor mutational burden correlates with clinical outcomes for certain classes of malignancies will help guide clinical decisions regarding TEMMB as a useful biomarker for predicting survival and response to immunotherapy.

Methods

R statistical language (Version 3.4.4)31 with ‘RTCGAToolbox’32, ‘MASS’33, ‘survminer’34, ‘forestplot’35 were used for analysis and plotting. We obtained somatic mutation and clinical data for 31 cancer cohorts in The Cancer Genome Atlas (TCGA). 6947 individuals had available mutation data; 6717 of the set had complete clinical data on age, sex, and stage; 2113 patients were deceased and had available time-to-death survival data.

We examined individuals with maximum TEMMB value in each cohort, excluding those with TEMMB greater than triple of the next largest TEMMB value. As an initial quality control (QC) step, 10 (0.1% of total) samples were excluded as outliers potentially representing technical batch effects in tumor DNA analysis. Pearson’s correlation was used to examine the relationship between TEMMB and TEMB across all cohorts. We then analyzed the relationship between TEMMB and patients’ clinical factors. Negative binomial regression was used to model TEMMB as a function of age (continuous variable: “years”), sex (categorical variable: “male” and “female”), tumor stage (categorical variable: “low” defined as Stage 0, I, II, “high” defined as Stage III, IV), and recruitment center (categorical variable). Sex was omitted from the model for those cancers affecting exclusively one gender – cervical squamous cell carcinoma (CESC), ovarian carcinoma (OV), prostate adenocarcinoma (PRAD), uterine corpus endometrial carcinoma (UCEC), and uterine carcinosarcoma (UCS). Staging data was not available for glioblastoma multiforme (GMB), acute myeloid leukemia (LAML), brain lower grade glioma (LGG), pheochromocytoma and paraganglioma (PCPG), prostate adenocarcinoma (PRAD), sarcoma (SARC), and thymoma (THYM).

Next, we examined the effect of TEMMB on survival. We considered the residuals obtained from the negative binomial regression models as the effective TEMMB adjusted for age, sex, stage, and recruitment center. We used these residuals as inputs to Cox-proportional hazards models to predict survival (in days) as a function of effective TEMMB. Survival effects were expressed as hazard ratios (HR), which can be defined as the effective hazard per day conferred by effective TEMMB. Because TEMMB is an overdispersed count variable, it was adjusted well through negative binomial regression. The significance of Cox-proportional hazards models was calculated with two-sided Wald tests. Survival analysis for all 31 cohorts is reported in Figure S5A. We observed that in certain cohorts, such as pheochromocytoma and paraganglioma (PCPG), fewer than 10 patients were tracked until death, with the majority lost to follow-up. In such cases, we suspected that the survival analysis was dominated by censored data points (Fig. S5B). Thus, we performed an additional QC step by excluding cohorts in the bottom 5th, 10th, and 20th percentiles of number of non-censored events. Results upon stringent exclusion of the bottom 20th percentile of cohorts are reported in the main text.

We aggregated all nonsynonymous missense mutations among all individuals in each cancer. Missense variants resulting in identical amino acid changes were aggregated as one specific variant type. Recurrent mutations were defined as those variants exceeded 1% prevalence in the cohort, which is the traditional allele frequency cutoff for eliminating rare DNA variation36,37. A somatic recurrence score (RS) was calculated as the fraction of total mutations in the cohort’s pool comprised by recurrent mutations as defined above:

$${RS}=\frac{\sum {Recurrent}\,{Missense}\,{Variants}}{\sum {All}\,{Missense}\,{Variants}}$$

A RS was assigned to each cancer type, and the correlation between log10-adjusted survival HR and log10-adjusted RS was computed with Pearson’s correlation. To demonstrate robustness to parameter choice, an additional recurrent mutation prevalence definition of 5% (traditional Minor Allele Frequency cutoff for common DNA variation38) was tested.