# Strength of immune selection in tumors varies with sex and age

## Abstract

Individual MHC genotype constrains the mutational landscape during tumorigenesis. Immune checkpoint inhibition reactivates immunity against tumors that escaped immune surveillance in approximately 30% of cases. Recent studies demonstrated poorer response rates in female and younger patients. Although immune responses differ with sex and age, the role of MHC-based immune selection in this context is unknown. We find that tumors in younger and female individuals accumulate more poorly presented driver mutations than those in older and male patients, despite no differences in MHC genotype. Younger patients show the strongest effects of MHC-based driver mutation selection, with younger females showing compounded effects and nearly twice as much MHC-II based selection. This study presents evidence that strength of immune selection during tumor development varies with sex and age, and may influence the availability of mutant peptides capable of driving effective response to immune checkpoint inhibitor therapy.

## Introduction

The major histocompatibility complex (MHC) exposes protein content on the cell surface to allow detection of antigens by the immune system. This applies to nonself antigens such as viral proteins, and self-proteins that include tumor antigens. Tumor cells harbor oncogenic alterations that can be presented to the immune system by the MHC, causing immune recognition and elimination (immune surveillance)1. However, in order to grow, invade, and spread, tumors must evade immune surveillance. Common mechanisms of immune evasion include loss of the MHC molecules and the upregulation of immune checkpoint molecules on cell surfaces that normally regulate the amplitude and duration of a T-cell response2. Immune checkpoint blockade (ICB) uses antibodies to block these immune checkpoint molecules, and can invigorate inactive and/or exhausted T cells to produce antitumor effects that confer long-term survival benefits in certain types of cancer3. However, ICB is effective in only 10–40% of patients for reasons that remain unclear. Meta-analyses of clinical trials in multiple cancer types treated with ICB suggest that young and female patients are characterized by low response rates4,5,6,7,8. The reason(s) for the poor response of these two populations remains elusive.

An accumulating body of literature points to sexual dimorphism in immune responses9. Moderated by genetic and hormonal factors, females have twice the antibody response to influenza vaccines10 and higher CD4+ T-cell counts than males11. Moreover, females are far more susceptible to autoimmune diseases12, demonstrating a stark imbalance in the way the immune response causes diseases in the two sexes. Immunosequencing of over 800 individuals revealed sex associated differences in the extent to which HLA molecules propagate selection and expansion of CD8+ T cells13. Interestingly, a stronger immune response in females has been observed across several species14,15,16, and sexual dimorphism has been demonstrated in immune selection and restriction of intratumor genetic heterogeneity in a mouse model of B-cell lymphoma17. In addition, a recent study has found sex-based differences in molecular biomarkers and immune checkpoint expression in multiple tumor types treated with ICB8. Altogether, these studies suggest that these differences are sex-specific and not lifestyle dependent.

Studies have demonstrated age-related changes in immune response as well. As humans age, there is a decrease of general immune function including production of IL-2, a pivotal growth factor for T cells18. Reduced thymic output, lower numbers of naive T cells, and overall reshaping of the size and specificity of the T-cell repertoire by microbial pathogens may explain why, for example, about 90% of excess deaths during flu season occur in patients greater than 65 years of age19. In addition, elderly people have reduced phagocytic function and HLA-II expression on antigen presenting cells20. Collectively, these factors render elderly individuals less able to mount a T-cell response to new antigens and respond to vaccination.

Recently, we developed the Patient Harmonic-mean Best Rank (PHBR) score that quantifies patients’ ability to present somatic mutations in their tumor by their specific MHC-I and MHC-II haplotypes21,22. PHBR-I and PHBR-II scores aggregate predicted peptide-MHC molecule binding affinities from established tools23,24 to produce a mass spectrometry-validated, residue-centric, and patient-specific presentation score that captures a mutant peptide’s visibility to the immune system. In previous publications we used PHBR scores to assess the role of MHC genotype in shaping mutation accumulation during tumorigenesis21,22. We found that patients tend to accumulate driver mutations that cannot be effectively presented by their own MHC molecules, likely a consequence of immune-based elimination of tumor cells harboring well-presented driver mutations, a selective process referred to as immunoediting25. This analysis revealed that thyroid carcinoma and low-grade glioma patients experience the highest MHC-based selective pressure on driver mutations21,22. Interestingly, these tumor types also had the youngest average age at diagnosis compared to all studied tumor types. In light of these observations, we reasoned that younger and female patients may experience stronger immunoediting early in their tumor history, accumulating mutations that are less favorably presented by their MHC, i.e., mutations more invisible to their immune system, at the time of diagnosis. Predictably, a depletion of potentially immunogenic mutant peptides would cause ICB to be ineffective. At first approximation we ruled out an effect due to sex-specific (MHC-I Pearson R = 0.99, MHC-II Pearson R = 0.99) or age-specific (MHC-I Pearson R = 0.98, MHC-II Pearson R = 0.99) imbalances in MHC genotype frequencies. Therefore, we sought to test the hypothesis that sex- and age-specific differences in driver mutation presentation are the result of differential immunoediting.

In this study we find that female and younger patients exhibit stronger immune selection in their tumors than male and older patients, as measured by the predicted MHC binding affinity of their observed, expressed driver mutations. MHC-II appears to have a stronger effect than MHC-I. Our findings, based on TCGA samples, are validated in an independent cohort.

## Results

### Fewer presentable drivers in female and younger patients

We focused on a set of 1018 driver mutations, defined in ref. 21, as driver mutations are more prevalent in the clonal architecture of an individual’s cancer and confer a selective growth advantage. We assigned MHC-I and MHC-II types using PolySolver and HLA-HD, two exome-based calling methods26,27, and considered only microsatellite-stable TCGA tumors. After excluding 515 patients from class I and 1064 patients from class II analyses due to HLA genotype incompatibility with NetMHCpan affinity prediction software, 9913 patients with MHC-I calls and 7174 patients with MHC-II calls remained. These patients were diverse in sex, with more males than females (Supplementary Fig. 1A), and a broad distribution of age at diagnosis (Supplementary Fig. 1B). PHBR-I and -II scores were calculated for all patients across the 1018 driver events by taking the harmonic mean of each allele’s best NetMHCpan percentile rank affinity score, providing an estimate of each patient’s potential to present each mutation via MHC-I and MHC-II, respectively. Importantly, the PHBR-I and PHBR-II scores aggregate percentile rank scores of mutated peptides relative to large numbers of random peptides provided by NetMHCpan-4.0 and NetMHCIIpan-3.2. For single peptide-HLA pairs, percentile rank scores of 0.5% and 2% for MHC-I and 2% and 10% for MHC-II have been used to represent strong and weak binding cutoffs, respectively28,29.

To rule out other covariates, we performed a series of control analyses. We categorized patients into subgroups according to sex (male versus female) and age (younger versus older based on pan-cancer 30th and 70th percentiles at age of diagnosis for categorical analyses). For sex-specific analyses, we further excluded seven sex-specific tumor types (breast, cervical, ovarian, uterine, prostate, and testicular cancer). First, we established that there were similar average numbers of driver mutations across sex and age patient groups (Supplementary Fig. 2). We previously found that TCGA patients with somatic MHC-I mutations had altered mutational landscapes, with a higher fraction of binding mutant peptides than patients without MHC-I mutations30. To ensure that somatic MHC-I mutations would not skew the driver mutation PHBR-I score distributions, we compared scores for patients with and without MHC-I mutations grouped by sex and age and found no significant differences (Supplementary Fig. 3). We then compared the distributions of patient PHBR-I and PHBR-II scores across the 1018 driver mutations (Supplementary Fig. 4A–D) and found significant p values, but very small effect sizes between groups. To ensure that the potential to present driver mutations was consistent across sex and age, we compared the fraction of presented drivers at various score thresholds, and found no significant differences (Supplementary Fig. 4E, F). The overall similarity of MHC presentation suggests that patients of both sexes and various ages at diagnosis present driver mutations with roughly equivalent efficacy, implying that specificity of MHC presentation resulting from specific allele combinations is not a mechanism causing differences in ICB response rate.
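The percentile-based age grouping described above can be sketched as follows. This is a minimal illustration only: the function name `age_groups`, the inclusive percentile interpolation, and the handling of ages exactly at a threshold are assumptions, since the text specifies only the pan-cancer 30th and 70th percentiles.

```python
from statistics import quantiles

def age_groups(ages, low_pct=30, high_pct=70):
    """Return (low_cut, high_cut, labels) for a list of ages at diagnosis.

    Labels are 'younger', 'older', or None for the excluded middle group.
    """
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = quantiles(ages, n=100, method="inclusive")
    low_cut, high_cut = cuts[low_pct - 1], cuts[high_pct - 1]
    labels = []
    for age in ages:
        if age <= low_cut:        # assumption: threshold itself counts as younger
            labels.append("younger")
        elif age >= high_cut:     # assumption: threshold itself counts as older
            labels.append("older")
        else:
            labels.append(None)
    return low_cut, high_cut, labels
```

For tumor-type-specific analyses, the same function could be applied to the ages within a single tumor type rather than to the pooled cohort.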

We therefore reasoned that the discrepancy might be due to differences in the strength of immune selection, e.g., tumors with stronger immunoediting should retain fewer driver mutations that are presentable to T cells by the patient’s own MHC molecules. For sex- and age-specific groups in each cohort, we compared the PHBR-I and PHBR-II score distributions for RNA-expressed driver mutations observed in patient tumors, excluding 4782 patients with no drivers from the list of 1018. While the number of observed drivers was not significantly different between sex and age groups (Supplementary Fig. 2), younger female patients were overrepresented in the group with no observed driver mutations (Fisher’s exact test: class I: OR = 1.12, p < 0.12; class II: OR = 1.28, p < 0.015). We note this group had an overrepresentation of thyroid cancer cases, a disease that is associated with low mutational burden and typically has only a single driver mutation31. We therefore performed sex-specific analysis for 2900 unique patients and age-specific analysis for 3928 unique patients.
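The overrepresentation test above is a standard Fisher's exact test on a 2 × 2 table. The sketch below shows how an odds ratio and two-sided p value arise from such a table. It is written with the standard library only, the function name is hypothetical, and the counts in the usage line are made up for illustration (the study's actual contingency tables are not given here).

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Sample odds ratio and two-sided Fisher p value for the table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Two-sided p value: sum over all tables no more probable than the observed one.
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return odds_ratio, min(p_value, 1.0)

# Hypothetical counts: rows = younger female / other, columns = no drivers / drivers.
odds_ratio, p_value = fisher_exact_2x2(8, 2, 1, 5)
```

In practice the same result would usually be obtained with `scipy.stats.fisher_exact`.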

Across pan-cancer cohorts, females were at a significant disadvantage (higher PHBR scores) in presenting their driver mutations by both their MHC-I and MHC-II molecules (Fig. 1a, b, p < 2.6e−04 and p < 1.2e−07, respectively). Younger patients also tended to have worse presentation of driver mutations by both MHC-I and MHC-II molecules (Fig. 1c, d, p < 2.4e−05 and p < 7.3e−04, respectively). Notably, the shift in PHBR score distributions between groups occurs near the threshold for weak binding. Given that a limited number of somatic mutations generate mutant peptides and not all of these are immunogenic, this small shift may translate to significantly less opportunity to generate a host antitumor response upon ICB. Importantly, we found that these observed between-group differences in PHBR scores were far greater (falling outside the 99% confidence interval) than differences when we randomly reassigned mutations across patients and recalculated patient-specific PHBR scores (Methods; Supplementary Fig. 5), and were an order of magnitude greater than the effect sizes observed when comparing score distributions independent of mutation occurrence (Supplementary Fig. 4). We also found differences in affinity independent of the PHBR score, using median NetMHCpan affinity scores across all alleles (Supplementary Fig. 6). Altogether, this suggests that score differences do indeed result from the interaction of inherited MHC genotype with the observed mutations. Interestingly, the mutation-specific fraction of RNA reads mapping to these driver mutations was significantly lower for females and younger patients (Supplementary Fig. 7), further supporting sex- and age-based differential strength in immune selection.
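The randomization control mentioned above can be illustrated with a small permutation sketch. The details here are assumptions rather than the study's exact procedure: the between-group statistic is taken to be a difference in medians, mutations are shuffled freely across all patients, and the names `median_gap`, `permutation_null`, and the group labels "A" and "B" are hypothetical.

```python
import random
from statistics import median

def median_gap(pairs, phbr, group):
    """Median PHBR in group 'A' minus median PHBR in group 'B'.

    pairs: list of (patient, mutation); phbr[patient][mutation] is a score;
    group[patient] is 'A' or 'B' (e.g. female/male or younger/older).
    """
    a = [phbr[p][m] for p, m in pairs if group[p] == "A"]
    b = [phbr[p][m] for p, m in pairs if group[p] == "B"]
    return median(a) - median(b)

def permutation_null(pairs, phbr, group, n_perm=1000, seed=0):
    """Null distribution of the gap when mutations are reassigned at random."""
    rng = random.Random(seed)
    patients = [p for p, _ in pairs]
    mutations = [m for _, m in pairs]
    null = []
    for _ in range(n_perm):
        shuffled = mutations[:]
        rng.shuffle(shuffled)  # break the patient-mutation pairing
        # Scores are re-read from each patient's own PHBR row for the new mutation.
        null.append(median_gap(list(zip(patients, shuffled)), phbr, group))
    return null
```

The observed gap would then be compared against the central 99% of the null values to judge whether it falls outside that interval.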

We next examined evidence for sex and age differences in specific tumor types, adjusting age thresholds according to tumor type. There was a general trend for female and younger patients’ tumors to have higher median PHBR-I and II scores across tumor types, although the difference was only statistically significant in melanoma (Supplementary Fig. 8A). We observed more variability in the trends across tumor types by age. Younger individuals trended toward higher median PHBR-I and II scores in tumors where the 30th/70th percentile was associated with a large age gap and the younger age threshold was under 55, with some notable exceptions that included rectal cancer, thyroid cancer, stomach cancer, and liver cancer (Supplementary Fig. 8B). Overall these trends suggest that stronger pan-cancer immune selection in younger and female patients results broadly from effects observed across multiple tumor types.

Next, we explored the effect of age and sex in the context of the immune system’s ability to eliminate effectively-presented mutations by modeling the relationship between mutation occurrence and immune visibility as measured by PHBR-I and II scores. We constructed sex- and age-specific generalized additive models with random effects to account for variation in mutation rate across individuals, and examined the coefficients corresponding to independent and interaction effects for PHBR-I, PHBR-II, and sex or age to assess their contribution to immune selection for expressed mutations observed ≥2 times in the cohort, excluding patients with no observed, expressed driver mutations. To control for the fact that some driver mutations occurred in the same tumor, and thus are not completely independent events, we included patient ID as a random effect in our models. In both models, we found that PHBR-I and PHBR-II scores alone had significant effects on the probability of a mutation to be a target of immune selection (Table 1). Positive coefficients for both PHBR scores indicate that the higher the PHBR score (i.e., poorer presentation), the higher the probability of mutation. Furthermore, when we quantified the influence of both scores on probability of mutation using odds ratios between respective 25th and 75th percentiles, we found that PHBR-II (OR: 3.4, CI [3.19, 3.6]) has a much larger impact on probability of mutation than PHBR-I (OR: 1.27, CI [1.26, 1.29]), echoing the larger effect sizes seen in Fig. 1. As expected, sex and age alone did not influence the probability of mutation; however, of particular interest are the interaction terms that indicate the influence of PHBR scores on probability of mutation within the context of sex and age. Both the PHBR-I:sex and PHBR-I:age interactions as well as the PHBR-II:sex and PHBR-II:age interactions were significant.
The negative PHBR:age estimates indicate stronger effects of PHBR-I as well as PHBR-II contribution to the probability of mutation in younger patients. On the other hand, positive PHBR:sex estimates indicate stronger effects of PHBR-I and PHBR-II contributing to probability of mutation in females according to the model formulation (Methods). Collectively, these results suggest stronger immune selection in females and younger patients.
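An odds ratio between the 25th and 75th percentiles of a predictor can be derived from a logistic model coefficient as exp(β × (Q75 − Q25)). The sketch below shows that arithmetic. Whether the study computed its odds ratios in exactly this way is an assumption, and the coefficient and scores in the usage lines are illustrative values, not results from Table 1.

```python
from math import exp, log
from statistics import quantiles

def interquartile_odds_ratio(beta, log_scores):
    """Odds ratio for moving a predictor from its 25th to its 75th percentile.

    beta: logistic regression coefficient for the (log) PHBR predictor.
    log_scores: observed values of that predictor.
    """
    q25, _, q75 = quantiles(log_scores, n=4, method="inclusive")
    # In a logistic model, a shift of (q75 - q25) multiplies the odds by exp(beta * shift).
    return exp(beta * (q75 - q25))

# Illustrative values only: a coefficient of ln(2)/2 and an interquartile range of 2.
example_or = interquartile_odds_ratio(log(2) / 2, [0.0, 1.0, 2.0, 3.0, 4.0])
```

Because the odds ratio depends on the spread of each score as well as on its coefficient, two predictors with similar coefficients can still yield different interquartile odds ratios.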

As females and younger patients both demonstrated stronger immune selection compared to males and older patients, we further partitioned the cohorts simultaneously by sex and age, and investigated the distribution of PHBR-I and -II scores for these groups. We found that sex and age effects are cumulative, with tumors in younger females exhibiting significantly higher selective pressure by MHC than those in the other three groups (Fig. 2). We noticed a profound difference in PHBR score distributions between younger females and older males. Because younger males had worse presentation of their driver mutations compared to older females (Fig. 2), we sought to ensure that sex had an effect on immune selection independent of age. In two models incorporating sex, age, and PHBR-I and PHBR-II scores, respectively, both PHBR:sex and PHBR:age were independently significant for both class I and class II (Supplementary Table 1). These results demonstrate that more aggressive immune selection in younger females selects for tumors with driver mutations that are less visible to the immune system.

### Mutational signatures do not explain differential selection

We next explored whether sex- and age-specific effects could be driven by differences in environmental exposure rather than the strength of immune selection. Mutational signatures assign specific mutations to different mutagenic processes, allowing the exploration of differences in environmental exposure across sex and age. We compared the sex-specific occurrence of mutational signatures in each tumor type and found only a minority of instances where signature strength was weakly but significantly associated with sex (Fig. 3a). Importantly, only three of the signatures (01, 02, and 05) where we observed significant sex-specific differences contribute to the set of driver mutations used for this analysis (Fig. 3b). Since signatures 01 and 05 are endogenous rather than exposure associated signatures, this suggests a very low impact of environmental exposures on sex-specific effects of immune selection on drivers. Furthermore, when we excluded the tumor types with significant signature differences (glioblastoma multiforme, GBM and liver hepatocellular carcinoma, LIHC), we still observed sex- and age-related differences (Supplementary Table 2). In addition, only two signatures correlated with age, both of which have known association with aging32. We examined C>T and T>C mutations, which are hallmarks of signature 01 and 05, respectively, and found that observed driver mutations in these categories were broadly distributed across age at diagnosis. To explain weaker immune selection in older individuals, age-related mutations would have to be better presented (have lower PHBR scores) than other mutations. Instead, we found that C>T and T>C mutations were significantly more poorly presented (had slightly higher PHBR scores) than other mutations across all possible MHC-I and MHC-II alleles, suggesting that these mutations, and by extension, signatures 01 and 05, could not drive the apparent age-associated difference in immune selection (Fig. 3c). 
Thus, we conclude that the sex- and age-specific effects on immune selection are not likely due to environmental exposure differences32,33.

### Validation in an independent non-TCGA cohort

We sought validation of our findings in a cohort of 342 patients (309 with compatible MHC-I type calls and 277 with MHC-II type calls) compiled from published dbGaP studies and non-TCGA samples in the International Cancer Genome Consortium (ICGC) database34 and filtered to exclude tumor types not represented in TCGA. While fewer tumor types were represented relative to the discovery cohort, these patients were diverse with respect to sex and age at diagnosis, with slightly more males than females, and similar average numbers of driver mutations. As in the discovery cohort, we found some significant differences in patient PHBR score distributions across the 1018 driver mutations, also with very small effect sizes between groups. Likewise, there was no difference in the fraction of presented drivers at various score thresholds (Supplementary Fig. 9). The majority of our validation cohort did not have expression data, so we predicted RNA expression using a logistic regression classifier trained on the TCGA cohort (Methods).

We found, as in the discovery cohort, that effectively-presented driver mutations were significantly depleted in younger and female patients compared to older and male patients (Fig. 4a–d). These differences were an order of magnitude greater than the effect sizes observed when comparing score distributions independent of mutation occurrence (Supplementary Fig. 9E–H). When we examined the simultaneous effects of sex and age (Fig. 4e, f), younger females once again had significantly worse presentation of their driver mutations than older males across both MHC-I and MHC-II (p < 0.001, p < 0.007). We repeated the sex- and age-specific analyses using the generalized additive models and found that, for both sex and age, PHBR-II scores alone significantly influenced the probability of mutation, with higher PHBR scores (i.e., worse presentation) leading to higher probability of mutation (Supplementary Table 3). While PHBR-II:sex and PHBR-II:age coefficients trended in the same direction, with stronger effects in females and younger patients, they did not reach significance, likely due to sample size.

## Discussion

Here, we present evidence that both sex and age impact the driver mutations that arise and persist during tumorigenesis. We found that younger and female patients accumulate driver mutations in their tumors that are less readily presented by their MHC molecules (Fig. 5), suggesting a stronger toll by immune selection early in tumorigenesis. This finding is consistent with recent meta-analyses across multiple tumors showing sex- and age-dependent differences in response to ICB4,5,6,7. We also observed the strongest effects in MHC-II based selection, in agreement with the fact that females have higher CD4+ T-cell counts than males35. A prevalent role of MHC-II driven immune selection can be explained by the fact that CD4+ T cells, besides direct effector function comparable to that of CD8+ T cells, also play a deep-rooted regulatory role in cooperating with CD8+ T cells via associative recognition of antigen36,37. Their function in orchestrating T-cell immunity, in general terms, makes them privileged actors, hence targets of immune selection as revealed herein. In older individuals, immune selection effects by MHC-II presentation of driver mutations are mitigated by a reduced CD4+/CD8+ ratio38 and greater telomere attrition in CD4+ T cells than in CD8+ T cells39 leading to accelerated senescence. Taken together, the evidence suggests that tumors developing in younger and female patients are prone to stronger immunoediting than those in older and male patients.

Our findings based on the TCGA were reproduced in the smaller validation cohort where we once again observed poorer MHC-based presentation of driver mutations in females versus males and younger versus older patients, with presentation being worse in younger and female patients. When modeling the influence of MHC genotype on the probability of observing driver mutations, the estimated effect sizes are modest, although relatively large compared to effects detected by genome wide association studies, where odds ratios are often below 1.2 (ref. 40). Several sources of uncertainty, including errors in patient genotyping, prediction of the peptide-HLA binding affinities used to calculate the PHBR score, and errors in somatic mutation calling, could obscure the true effects21. More accurate estimates will likely require larger sample sizes, and ideally availability of expression data, as non-expressed mutations should not reflect the effects of immune selection.

In this analysis, we focused on a set of recurrent missense and indel mutations in established driver genes developed in our previous work. This is motivated by the assumption that these are more likely to occur early during tumorigenesis, and may thus provide a view of immune selection before various mechanisms of immune evasion occur22. However it is unlikely that immune selection operates differently on different categories of mutation, and nondriver mutation-derived neoantigens should be equally capable of triggering a T-cell response. Whether tumor cells can evade T-cell responses more easily when they are targeted against nonessential nondriver mutations remains an important question. It has been suggested that ICB responses are most effective when a clonal driver neoantigen is present41. While we did not observe large sex or age bias in the mutational signatures associated with the 1018 driver mutations, we speculate that it is possible nondriver mutations could show differences in their potential to serve as neoantigens if the underlying mutational processes are active at different times or are biased to generate mutations in expressed protein coding sequences with characteristics that bias their presentation.

Notwithstanding some limitations, our analysis provides a compelling case for the paradigm that immune selection exerts its toll differently with respect to sex and age, with a greater effect in younger females. Of note, the younger female cohort had the poorest driver mutation presentation across both the discovery and validation cohorts, suggesting that these effects are strong and complementary. Although our analysis suggests that younger age is associated with stronger antitumor immune responses, we strongly suggest caution in considering whether this trend could generalize to pediatric tumors. The genomic landscape of pediatric tumors is distinct from that of adult tumors, with lower mutation burdens, different driver events, and more germline factors, and the characteristics of the pediatric immune system differ greatly from those of an adult42. Furthermore, we are unable to control for other sex- and age-related factors beyond predicted MHC presentation of driver mutation-derived peptides. These possibilities may include (a) differences in the antigen processing machinery preceding surface exposure of MHC-peptide complexes, and (b) genetic and epigenetic factors causing preferential mutation accumulation in the cohorts for reasons other than immunoediting.

In conclusion, this study indicates that immune selection exerts its toll differently with respect to sex and age, with a greater effect in younger females. As such, the response rate to ICB may be dependent on the strength of immune selection occurring early in tumorigenesis. Methods to accurately predict the impact of immunoediting on a patient-specific basis may lead to better predictive algorithms for response to therapy. As a corollary, we posit that ICB treatment is likely to have a reduced effect in younger female patients since this treatment will attempt to reactivate T cells for immunologically invisible neoantigens. Rather, adoptive T-cell therapy against patient-validated neoantigens or therapeutic vaccination against conserved antigens will likely be more beneficial in these patients. Notably, prior to treatment with ICB, male sex (and, less consistently, older age) is associated with higher risk of recurrence and death in melanoma, and these patients may stand to benefit more from ICB43,44; thus it is also possible that overall stronger immune surveillance could prove advantageous in the context of ICB despite differences in the quality of neoantigens. Finally, these findings shed light on the role of immune surveillance in cancer progression.

## Methods

### HLA typing

HLA genotyping was performed for class I genes HLA-A, HLA-B, HLA-C, and class II genes HLA-DRB1, HLA-DPA1, HLA-DPB1, HLA-DQA1, and HLA-DQB1, which encode three protein determinants of MHC-II peptide binding specificity, HLA-DR, HLA-DP, and HLA-DQ. TCGA samples were typed for class I with PolySolver26 and for class II with HLA-HD27, both run with default parameters. Both tools require germline (whole blood or tissue matched) whole exome sequenced samples. Samples with very low coverage on specific genes are left untyped by HLA-HD. Patients were assigned an HLA-DR type if they were successfully typed for HLA-DRB1. Patients were assigned HLA-DP and -DQ types if they had successful typing for HLA-DPA1/HLA-DPB1 and HLA-DQA1/HLA-DQB1, respectively. Class I and class II types were validated by xHLA45, run with default parameters, and only patients where all alleles agreed in both classes were included in the analysis.

### Presentation score assignment

We used patient presentation scores, as defined in ref. 21, to represent a particular patient’s ability to present a residue given their distinct set of HLA types. For class I, 6 HLA alleles were considered (two each of HLA-A, HLA-B, and HLA-C). For class II, 12 HLA-encoded MHC-II molecules were considered: 4 combinations each of HLA-DPA1/DPB1 and HLA-DQA1/DQB1, and the 2 alleles of HLA-DRB1 counted twice each (since HLA-DRA1 is invariant) for consistency between resulting molecules. NetMHCpan4.028 and NetMHCIIpan3.229 were used to calculate binding affinities. The PHBR score was assigned as the harmonic mean of the best residue presentation scores for each group of MHC-I and MHC-II molecules. A lower patient presentation score indicates that the patient’s MHC molecules are more likely to present a residue on the cell surface.
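The aggregation step can be sketched in a few lines. This is a minimal illustration, assuming the percentile ranks for every peptide overlapping the mutated residue have already been obtained from the binding predictor; the function name `phbr_score`, the input layout, and the example ranks are hypothetical.

```python
from statistics import harmonic_mean

def phbr_score(ranks_per_molecule):
    """Harmonic mean of the best (lowest) percentile rank for each MHC molecule.

    ranks_per_molecule: list of (molecule_name, [percentile ranks of all peptides
    that overlap the mutated residue]). Homozygous alleles appear twice, so the
    list has 6 entries for class I and 12 for class II.
    """
    best_ranks = [min(ranks) for _, ranks in ranks_per_molecule]
    return harmonic_mean(best_ranks)

# Hypothetical class I genotype with made-up percentile ranks.
example = [
    ("HLA-A*02:01", [1.0, 7.5, 30.0]),
    ("HLA-A*02:01", [1.0, 7.5, 30.0]),   # homozygous allele counted twice
    ("HLA-B*07:02", [2.0, 12.0]),
    ("HLA-B*44:02", [2.0, 40.0]),
    ("HLA-C*07:02", [4.0, 9.0]),
    ("HLA-C*05:01", [4.0, 55.0]),
]
score = phbr_score(example)
```

The harmonic mean is dominated by the smallest values, so a single molecule that presents the residue well is enough to pull the patient-level score down.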

### Set of driver mutations

Somatic mutations were considered to be recurrent and oncogenic if they occurred in one of the 100 most highly ranked oncogenes or tumor suppressors described by Davoli et al.46 and were observed in at least three TCGA samples. Among these, we retained only mutations that would result in predictable protein sequence changes that could generate neoantigens, including missense mutations and inframe indels. A total of 1018 mutations (512 missense mutations from oncogenes, 488 missense mutations from tumor suppressors, 11 indels from oncogenes and 7 indels from tumor suppressors) were obtained21.
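The selection criteria above amount to a filter followed by a recurrence count. The sketch below assumes mutation calls are available as simple tuples; the tuple layout, the variant class labels, and the function name `recurrent_drivers` are hypothetical, and the gene set would come from the ranked list in ref. 46.

```python
from collections import Counter

# Assumed labels for variant classes that yield predictable protein changes.
PROTEIN_CHANGING = {"missense", "inframe_indel"}

def recurrent_drivers(calls, driver_genes, min_samples=3):
    """Return (gene, protein_change) pairs seen in at least min_samples samples.

    calls: iterable of (sample_id, gene, protein_change, variant_class).
    driver_genes: set of oncogene and tumor suppressor symbols.
    """
    # De-duplicate per sample so a mutation is counted once per tumor.
    seen = {
        (sample, gene, change)
        for sample, gene, change, variant_class in calls
        if gene in driver_genes and variant_class in PROTEIN_CHANGING
    }
    counts = Counter((gene, change) for _, gene, change in seen)
    return sorted(mut for mut, n in counts.items() if n >= min_samples)
```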

### Modeling the effects of PHBR score on mutation probability

We built two matrices, for PHBR-I scores and PHBR-II scores, from the 1018 mutations and the 1912 patients with both PHBR-I and -II calls. Next, we built a binary mutation matrix yij ∈ {0,1} indicating whether patient i has a specific mutation j. We evaluated the relationship between this binary matrix, the matched 1912 × 1018 matrices with log PHBR-I and -II scores, x1ij and x2ij, respectively, and the variable of interest (sex or age) for patient i and mutation j. We fit a generalized additive model for the centered log PHBR-I, centered log PHBR-II scores, centered sex (coded 0/1 for males/females) or centered age, and mutation probability with the gam function in the mgcv R package47. To estimate the effects of PHBR and sex or age on probability of mutation, we considered the following random effects models:

$$\mathrm{Logit}\left(\mathrm{P}\left(y_{ij} = 1\right)\right) = \beta_1\,{x1}_{ij} + \beta_2\,{x2}_{ij} + \beta_3\,\mathrm{Sex}_i + \beta_4\left({x1}_{ij} \times \mathrm{Sex}_i\right) + \beta_5\left({x2}_{ij} \times \mathrm{Sex}_i\right) + \eta_i, \tag{1}$$

$$\mathrm{Logit}\left(\mathrm{P}\left(y_{ij} = 1\right)\right) = \beta_1\,{x1}_{ij} + \beta_2\,{x2}_{ij} + \beta_3\,\mathrm{Age}_i + \beta_4\left({x1}_{ij} \times \mathrm{Age}_i\right) + \beta_5\left({x2}_{ij} \times \mathrm{Age}_i\right) + \eta_i. \tag{2}$$

And PHBR-I and PHBR-II specific models (results in Supplementary Table 1):

$$\mathrm{Logit}\left(\mathrm{P}\left(y_{ij} = 1\right)\right) = \beta_1\,{x1}_{ij} + \beta_2\,\mathrm{Age}_i + \beta_3\,\mathrm{Sex}_i + \beta_4\left({x1}_{ij} \times \mathrm{Sex}_i\right) + \beta_5\left({x1}_{ij} \times \mathrm{Age}_i\right) + \eta_i, \tag{3}$$

$$\mathrm{Logit}\left(\mathrm{P}\left(y_{ij} = 1\right)\right) = \beta_1\,{x2}_{ij} + \beta_2\,\mathrm{Age}_i + \beta_3\,\mathrm{Sex}_i + \beta_4\left({x2}_{ij} \times \mathrm{Sex}_i\right) + \beta_5\left({x2}_{ij} \times \mathrm{Age}_i\right) + \eta_i, \tag{4}$$

where $\eta_i \sim N(0, \theta_\eta)$ are patient-level random effects, indexed by patient ID, that capture differences in mutation propensity among patients. In these models, the coefficients $\beta_n$ measure the effects of log PHBR-I, log PHBR-II, sex or age, and their interactions. This analysis was repeated for the validation cohort.
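To make the structure of model (1) concrete, the following minimal Python sketch centers the covariates, forms the interaction terms, and evaluates the logit for a few patient–mutation pairs. It is not the mgcv fit used in the study: the toy PHBR values, the coefficients, the zero random effects, and the helper names (`center`, `inv_logit`, `linear_predictor`) are all hypothetical and chosen only for illustration. No intercept is included because the equations as written contain none.

```python
import math

def center(values):
    """Subtract the mean so main effects are read at the average covariate value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def inv_logit(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_predictor(beta, x1, x2, sex, eta):
    """Right-hand side of model (1) for one patient-mutation pair.

    beta: (b1, ..., b5); x1, x2: centered log PHBR-I and log PHBR-II;
    sex: centered sex; eta: patient-level random effect.
    """
    b1, b2, b3, b4, b5 = beta
    return b1 * x1 + b2 * x2 + b3 * sex + b4 * (x1 * sex) + b5 * (x2 * sex) + eta

# Hypothetical PHBR scores for one mutation across four patients.
phbr1 = [0.5, 2.0, 8.0, 1.0]
phbr2 = [4.0, 10.0, 40.0, 2.5]
sex = [0, 1, 1, 0]  # 0 = male, 1 = female, as coded in the text

x1 = center([math.log(v) for v in phbr1])
x2 = center([math.log(v) for v in phbr2])
sex_c = center(sex)

beta = (0.10, 0.05, 0.0, 0.03, 0.04)  # hypothetical coefficients
eta = [0.0, 0.0, 0.0, 0.0]            # random effects set to zero for illustration

probs = [inv_logit(linear_predictor(beta, a, b, s, e))
         for a, b, s, e in zip(x1, x2, sex_c, eta)]
```

In this sketch, the slope of the log-odds with respect to log PHBR-I is `b1 + b4 * sex`, so a positive interaction coefficient gives a steeper slope for patients whose centered sex value is positive (females under the 0/1 coding).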

### Mutational signature analysis

Mutational signature analysis was performed using the previously developed computational framework SigProfiler48. A detailed description of the workflow can be found in ref. 48, and the code can be downloaded freely from https://www.mathworks.com/matlabcentral/fileexchange/38724-sigprofiler.

### Predicting RNA expression from DNA variant allelic fraction

To predict binary RNA expression (≥5 reads supporting the mutant allele), we used the LogisticRegressionCV function from the Python sklearn v0.20.3 package to train a logistic classifier on the TCGA discovery cohort, using DNA variant allelic fraction (VAF), VAF percentile rank within the patient, and mutated gene as features. We conducted 10-fold cross-validation, achieving a mean area under the receiver operating characteristic curve of 72%.
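As an illustration of the inputs to such a classifier, the sketch below derives the three features named above and the binary expression label from per-variant read counts. The record keys (`patient`, `gene`, `dna_alt_reads`, `dna_total_reads`, `rna_alt_reads`), the toy values, and the particular percentile-rank definition are assumptions made for this example, not details taken from the study.

```python
from collections import defaultdict

MIN_MUTANT_READS = 5  # expression threshold stated in the text

def percentile_rank(value, values):
    """Percentage of `values` that are <= `value` (one common definition)."""
    return 100.0 * sum(v <= value for v in values) / len(values)

def build_examples(variants):
    """Turn variant records into (features, label) pairs.

    Each record is a dict with hypothetical keys: patient, gene,
    dna_alt_reads, dna_total_reads, rna_alt_reads.
    """
    # First pass: collect every VAF observed in each patient.
    vaf_by_patient = defaultdict(list)
    for v in variants:
        vaf_by_patient[v["patient"]].append(v["dna_alt_reads"] / v["dna_total_reads"])

    # Second pass: build features and the binary expression label.
    examples = []
    for v in variants:
        vaf = v["dna_alt_reads"] / v["dna_total_reads"]
        features = {
            "vaf": vaf,
            "vaf_percentile": percentile_rank(vaf, vaf_by_patient[v["patient"]]),
            "gene": v["gene"],  # categorical; would need encoding before fitting
        }
        label = int(v["rna_alt_reads"] >= MIN_MUTANT_READS)
        examples.append((features, label))
    return examples

# Hypothetical toy records.
variants = [
    {"patient": "P1", "gene": "TP53", "dna_alt_reads": 30, "dna_total_reads": 100, "rna_alt_reads": 12},
    {"patient": "P1", "gene": "KRAS", "dna_alt_reads": 10, "dna_total_reads": 100, "rna_alt_reads": 2},
    {"patient": "P2", "gene": "BRAF", "dna_alt_reads": 45, "dna_total_reads": 90, "rna_alt_reads": 5},
]
examples = build_examples(variants)
```

The resulting feature dictionaries would still need the gene column encoded numerically before being passed to a classifier; the model fitting and cross-validation steps are omitted here to keep the sketch free of external dependencies.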

### Statistical analysis

All box plot comparisons were evaluated using the default one-tailed Mann–Whitney U test from the scipy.stats Python package. Sex-specific distributions of mutational signatures were also compared using the one-tailed Mann–Whitney U test, and p values were adjusted using the Benjamini–Hochberg procedure. In all box plots, the center line marks the median, the box spans the interquartile range (IQR), the whiskers cover the rest of the data distribution, and points mark outliers lying more than 1.5 × IQR beyond the box. Effect sizes were calculated using Cliff’s d (Cliff 1993).
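Two of the quantities named above are simple enough to sketch directly. The following dependency-free Python example computes Cliff’s d, which is a rescaling of the Mann–Whitney U statistic (d = 2U/(mn) − 1 for group sizes m and n), and Benjamini–Hochberg adjusted p values. The function names and the toy inputs are hypothetical, and this is an illustrative sketch rather than the code used in the study.

```python
def cliffs_delta(xs, ys):
    """Cliff's d: P(x > y) - P(x < y), estimated over all cross-group pairs."""
    greater = sum(x > y for x in xs for y in ys)
    less = sum(x < y for x in xs for y in ys)
    return (greater - less) / (len(xs) * len(ys))

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical toy data: scores in two groups and four raw p values.
group_a = [1.2, 2.5, 3.1, 4.0]
group_b = [0.8, 1.0, 2.0, 2.2]
delta = cliffs_delta(group_a, group_b)  # 14 pairs greater, 2 less, 16 total

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Cliff’s d ranges from −1 to 1, with 0 indicating complete overlap between the two groups, which makes it a natural companion to a rank-based test.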

## Data availability

Discovery cohort: data were obtained from publicly available sources, including The Cancer Genome Atlas (TCGA) Research Network [http://cancergenome.nih.gov/]. TCGA normal exome sequences and TCGA clinical data were downloaded from the GDC on June 23–26, 2018 and April 25, 2017, respectively, using the gdc-client v1.3.0. TCGA somatic mutations were accessed from the NCI Genomic Data Commons [https://portal.gdc.cancer.gov/] on May 14, 2017.

Validation cohort: dbGaP studies (accession numbers: phs001493.v1.p1.c2, phs001041.v1.p1.c1, phs001425.v1.p1.c1, phs001493.v1.p1.c1, phs000980.v1.p1.c1, phs001469.v1.p1.c1, phs000452.v2.p1.c1, phs001451.v1.p1.c1, phs001519.v1.p1.c1, phs001565.v1.p1.c1) were obtained from the dbGaP database using the ascp tool from AsperaConnect v3.9.5.172984, and WXS/WGS data were obtained from the Sequence Read Archive (SRA)49 using the SRA toolkit v2.9.2. Somatic mutation files were obtained from the papers associated with each study. WXS/WGS data for additional non-TCGA patients were obtained from the ICGC using the EGA download client v2.2.2 and icgc-get v0.6.1, and somatic mutation data were obtained from the ICGC DCC Data Release [https://dcc.icgc.org/] on April 2, 2019 (PCAWG) and March 18, 2019 (THCA-SA) (Supplementary Dataset 1). The validation cohort’s MHC-I and -II genotypes were typed using HLA-HD27, and PHBR scores were calculated using the method described in “Presentation score assignment”.

All remaining relevant data are available in the article, Supplementary Information, or from the corresponding author upon reasonable request.

## Code availability

Code to reproduce findings and figures can be freely accessed at https://github.com/CarterLab/HLA-immunoediting.

## References

1. Burnet, F. M. The concept of immunological surveillance. Prog. Exp. Tumor Res. 13, 1–27 (1970).
2. Schreiber, R. D., Old, L. J. & Smyth, M. J. Cancer immunoediting: integrating immunity’s roles in cancer suppression and promotion. Science 331, 1565–1570 (2011).
3. Ribas, A. & Wolchok, J. D. Cancer immunotherapy using checkpoint blockade. Science 359, 1350–1355 (2018).
4. Nosrati, A. et al. Evaluation of clinicopathological factors in PD-1 response: derivation and validation of a prediction scale for response to PD-1 monotherapy. Br. J. Cancer 116, 1141–1147 (2017).
5. Wu, Y. et al. Correlation between sex and efficacy of immune checkpoint inhibitors (PD-1 and CTLA-4 inhibitors). Int. J. Cancer https://doi.org/10.1002/ijc.31301 (2018).
6. Botticelli, A. et al. The sexist behaviour of immune checkpoint inhibitors in cancer therapy? Oncotarget 8, 99336–99346 (2017).
7. Kugel, C. H., 3rd et al. Age correlates with response to anti-PD1, reflecting age-related differences in intratumoral effector and regulatory T-cell populations. Clin. Cancer Res. https://doi.org/10.1158/1078-0432.CCR-18-1116 (2018).
8. Ye, Y. et al. Sex-associated molecular differences for cancer immunotherapy. Nat. Commun. 11, 1779 (2020).
9. Klein, S. L. & Flanagan, K. L. Sex differences in immune responses. Nat. Rev. Immunol. 16, 626–638 (2016).
10. Engler, R. J. M. Half- vs. full-dose trivalent inactivated influenza vaccine (2004–2005). Arch. Intern. Med. 168, 2405–2414 (2008).
11. Abdullah, M. et al. Gender effect on in vitro lymphocyte subset levels of healthy individuals. Cell. Immunol. 272, 214–219 (2012).
12. Jacobson, D. L., Gange, S. J., Rose, N. R. & Graham, N. M. Epidemiology and estimated population burden of selected autoimmune diseases in the United States. Clin. Immunol. Immunopathol. 84, 223–243 (1997).
13. Schneider-Hohendorf, T. et al. Sex bias in MHC I-associated shaping of the adaptive immune system. Proc. Natl Acad. Sci. U.S.A. 115, 2168–2173 (2018).
14. Hill-Burns, E. M. & Clark, A. G. X-linked variation in immune response in Drosophila melanogaster. Genetics 183, 1477–1491 (2009).
15. Mondal, S. & Rai, U. Sexual dimorphism in phagocytic activity of wall lizard’s splenic macrophages and its control by sex steroids. Gen. Comp. Endocrinol. 116, 291–298 (1999).
16. Pap, P. L., Czirják, G. A., Vágási, C. I., Barta, Z. & Hasselquist, D. Sexual dimorphism in immune function changes during the annual cycle in house sparrows. Naturwissenschaften 97, 891–901 (2010).
17. Milo, I. et al. The immune system profoundly restricts intratumor genetic heterogeneity. Sci. Immunol. 3 (2018).
18. Simon, A. K., Hollander, G. A. & McMichael, A. Evolution of the immune system in humans from infancy to old age. Proc. Biol. Sci. 282, 20143085 (2015).
19. Jiang, N. et al. Lineage structure of the human antibody repertoire in response to influenza vaccination. Sci. Transl. Med. 5, 171ra19 (2013).
20. Agrawal, A., Agrawal, S. & Gupta, S. Dendritic cells in human aging. Exp. Gerontol. 42, 421–426 (2007).
21. Marty, R. et al. MHC-I genotype restricts the oncogenic mutational landscape. Cell 171, 1272–1283.e15 (2017).
22. Marty, R., Thompson, W. K., Salem, R. M., Zanetti, M. & Carter, H. Evolutionary pressure against MHC class II binding cancer mutations. Cell https://doi.org/10.1016/j.cell.2018.08.048 (2018).
23. Nielsen, M. & Andreatta, M. NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets. Genome Med. 8, 33 (2016).
24. Karosiene, E. et al. NetMHCIIpan-3.0, a common pan-specific MHC class II prediction method including all three human MHC class II isotypes, HLA-DR, HLA-DP and HLA-DQ. Immunogenetics 65, 711–724 (2013).
25. Dunn, G. P., Bruce, A. T., Ikeda, H., Old, L. J. & Schreiber, R. D. Cancer immunoediting: from immunosurveillance to tumor escape. Nat. Immunol. 3, 991–998 (2002).
26. Shukla, S. A. et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat. Biotechnol. 33, 1152–1158 (2015).
27. Kawaguchi, S., Higasa, K., Shimizu, M., Yamada, R. & Matsuda, F. HLA-HD: an accurate HLA typing algorithm for next-generation sequencing data. Hum. Mutat. 38, 788–797 (2017).
28. Jurtz, V. et al. NetMHCpan-4.0: improved peptide-MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. J. Immunol. 199, 3360–3368 (2017).
29. Jensen, K. K. et al. Improved methods for predicting peptide binding affinity to MHC class II molecules. Immunology 154, 394–406 (2018).
30. Wong, W. C. et al. CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer. Bioinformatics 27, 2147–2148 (2011).
31. Cancer Genome Atlas Research Network. Integrated genomic characterization of papillary thyroid carcinoma. Cell 159, 676–690 (2014).
32. Alexandrov, L. B. et al. Clock-like mutational processes in human somatic cells. Nat. Genet. 47, 1402–1407 (2015).
33. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
34. Zhang, J. et al. International cancer genome consortium data portal—a one-stop shop for cancer genomics data. Database 2011, bar026–bar026 (2011).
35. Amadori, A. et al. Genetic control of the CD4/CD8 T-cell ratio in humans. Nat. Med. 1, 1279–1283 (1995).
36. Keene, J. A. & Forman, J. Helper activity is required for the in vivo generation of cytotoxic T lymphocytes. J. Exp. Med. 155, 768–782 (1982).
37. Gerloni, M. et al. Functional cooperation between T helper cell determinants. Proc. Natl Acad. Sci. U.S.A. 97, 13269–13274 (2000).
38. Goronzy, J. J., Fang, F., Cavanagh, M. M., Qi, Q. & Weyand, C. M. Naive T cell maintenance and function in human aging. J. Immunol. 194, 4073–4080 (2015).
39. Son, N. H., Murray, S., Yanovski, J., Hodes, R. J. & Weng, N. Lineage-specific telomere shortening and unaltered capacity for telomerase expression in human T and B lymphocytes with age. J. Immunol. 165, 1191–1196 (2000).
40. Hodge, S. E. & Greenberg, D. A. How can we explain very low odds ratios in GWAS? I. Polygenic models. Hum. Hered. 81, 173–180 (2016).
41. McGranahan, N. et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science 351, 1463–1469 (2016).
42. Gröbner, S. N. et al. The landscape of genomic alterations across childhood cancers. Nature 555, 321–327 (2018).
43. Natale, C. A. et al. Activation of G protein-coupled estrogen receptor signaling inhibits melanoma and improves response to immune checkpoint blockade. Elife 7 (2018).
44. Zhai, Y., Haresi, A. J., Huang, L. & Lang, D. Differences in tumor initiation and progression of melanoma in the BrafCA;Tyr-CreERT2;Ptenf/f model between male and female mice. Pigment Cell Melanoma Res. 33, 119–121 (2020).
45. Xie, C. et al. Fast and accurate HLA typing from short-read next-generation sequence data with xHLA. Proc. Natl Acad. Sci. U.S.A. 114, 8059–8064 (2017).
46. Davoli, T. et al. Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell 155, 948–962 (2013).
47. Wood, S. N. mgcv: GAMs and generalized ridge regression for R. R News 1, 20–25 (2001).
48. Alexandrov, L. B., Nik-Zainal, S., Wedge, D. C., Campbell, P. J. & Stratton, M. R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 3, 246–259 (2013).
49. Leinonen, R., Sugawara, H., Shumway, M. & International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 39, D19–D21 (2010).

## Author information

### Contributions

Original concept, R.M.P.; project supervision, H.C. and M.Z.; project planning and experimental design, A.C., R.M.P., C.P.D., M.Z., and H.C.; statistical advising, X.Z., W.K.T.; data acquisition, processing, and analysis, A.C. and R.M.P.; mutational signature analysis, L.A.; preparation of paper, A.C., R.M.P., M.Z., and H.C.

### Corresponding author

Correspondence to Hannah Carter.

## Ethics declarations

### Competing interests

R.M.P. is an employee and holds stock in Personalis. The remaining authors declare no competing interests.

Peer review information: Nature Communications thanks Joshua Rubin and the other, anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Castro, A., Pyke, R.M., Zhang, X. et al. Strength of immune selection in tumors varies with sex and age. Nat Commun 11, 4128 (2020). https://doi.org/10.1038/s41467-020-17981-0
