Introduction

Glioma is the most common type of primary malignant brain tumor in the United States (US), with an average annual age-adjusted incidence rate of 6.0/100,0001. Glioma can be broadly classified into glioblastoma (GBM, 61.9% of gliomas in adults 18+ in the US) and lower-grade glioma (non-GBM glioma, 24.2% of adult gliomas), with tumors such as ependymoma (6.3%), unclassified malignant gliomas (5.1%), and pilocytic astrocytoma (1.9%) making up the majority of other cases1. Many environmental exposures have been investigated as sources of glioma risk, but the only validated risk factors for these tumors are ionizing radiation (which increases risk) and history of allergies or other atopic disease (which decreases risk)2. These tumors are significantly more common in people of European ancestry, in males, and in older adults1. The contribution of common low-penetrance SNPs to the heritability of sporadic glioma in persons with no documented family history is estimated to be ~25%3. A recent glioma genome-wide association study (GWAS) meta-analysis validated 12 previously reported risk loci4 and identified 13 new risk loci. These 25 loci in total are estimated to account for ~30% of heritable glioma risk. This suggests that there are both undiscovered environmental risk factors (which account for ~75% of incidence variance) and undiscovered genetic risk factors (which account for ~70% of heritable risk)3,4.

Population-based studies consistently demonstrate that incidence of gliomas varies significantly by sex. Most glioma histologies occur with a 30–50% higher incidence in males, and this male preponderance of glial tumors increases with age in adult glioma (Fig. 1)1. Several studies have attempted to estimate the influence of lifetime estrogen and progestogen exposure on glioma risk in women5,6. Results of these analyses have been mixed, and it is not possible to conclusively determine the impact of hormone exposure on glioma risk. Male predominance in incidence occurs broadly across multiple cancer types and is also evident in cancers that occur in pre-pubertal children and in post-menopausal adults7,8. Together these observations suggest that other mechanisms in addition to acute sex hormone actions must be identified to account for the magnitude of sex difference in glioma incidence.

Figure 1

Average Annual Incidence of all glioma, glioblastoma and lower grade glioma by sex and age at diagnosis (CBTRUS 2010–2014).

Though sex differences exist in glioma incidence, sex differences have not been interrogated in previous glioma GWAS. Sex-specific analyses have the potential to reveal genetic sources of sexual dimorphism in risk, as well as to increase power for detection of loci where effect size or direction may vary by sex9,10. The aim of this analysis is to investigate potential sex-specific sources of genetic risk for glioma that may contribute to observed sex-specific incidence differences.

Results

Study population

There were 4,831 male cases, 5,216 male controls, 3,206 female cases, and 5,470 female controls (Table 1). A slightly larger proportion of male cases were GBM (58.7% of male cases vs 52.5% of female cases). Controls were slightly older than cases. GBM cases had a higher mean age than non-GBM cases, which was consistent with known incidence patterns of these tumors. Male and female cases within histology groups had similar age at diagnosis. The proportion of non-GBM cases varied by study due to differing recruitment patterns and study objectives (see original publications for details of recruitment patterns and inclusion criteria)4,11,12,13,14.

Table 1 Population characteristics by study and sex.

Previously discovered glioma risk regions

There were 5,934 SNPs within 500 kb of 26 previously discovered glioma risk loci with IMPUTE2 information score (INFO) > 0.7 and MAF > 0.01 that were previously found to have at least a nominal (p < 5 × 10−4) association with glioma4, and results were considered significant at p < 2.8 × 10−6 level (adjusted for 6,000 tests in each of three histologies [18,000 tests], see Fig. 2A for schematic of study design). Among the 25 previously validated glioma risk loci, nine loci contained 10 SNPs with pM < 2.8 × 10−6 and/or pF < 2.8 × 10−6 in any histology: 1p31.3 (RAVER2), 5p15.33 (TERT), 7p11.2 (EGFR, two independent loci), 8q24.21 (intergenic region near MYC), 9p21.3 (CDKN2B-AS1), 11q23.3 (PHLDB1), 16p13.3 (RHBDF1), 17p13.1 (TP53), and 20q13.33 (RTEL1) (Table 2). ORM and ORF were similar in the majority of these loci.

Figure 2

Study Schematic for analyses of (A) autosomal SNPs and (B) SNPs on sex chromosomes.

Table 2 Previously identified glioma risk loci and histology-specific odds ratios (OR) and 95% confidence intervals (95% CI) stratified by sex.

For one of two independent loci at 7p11.2 (rs11979158), there was a significant association only in males for all glioma (ORM = 1.33 [95% CI = 1.23–1.44], pM = 4.87 × 10−12) and GBM (ORM = 1.40 [95% CI = 1.28–1.54], pM = 1.26 × 10−12) but the sex differences did not meet the significance threshold (overall pD = 0.0055, and GBM pD = 0.1184) (Fig. 3, Table 2).

Figure 3

Sex-specific odds ratios overall and by histology grouping, 95% CI and p values for selected previous GWAS hits and 3p21.31 (rs9841110) for all glioma, GBM, and non-GBM.

The previously identified SNP at 8q24.21 (rs55705857) was the most significant SNP in both males and females. The odds ratio for rs55705857 in all glioma was significantly higher in females (ORF = 2.45 [95% CI = 2.14–2.80], pF = 1.22 × 10−39) than in males (ORM = 1.56 [95% CI = 1.40–1.75], pM = 1.09 × 10−14), with pD = 3.46 × 10−7. In non-GBM only, ORF (ORF = 4.71 [95% CI = 3.94–5.63], pF = 1.85 × 10−65) was also elevated as compared to ORM (ORM = 2.66 [95% CI = 2.28–3.10], pM = 8.13 × 10−36), with pD = 8.44 × 10−7 (Fig. 3, Table 2). This association was further explored in a case-only analysis, where there was a significant difference between males and females overall (p = 0.0012) and in non-GBM (p = 0.0084) (Supplemental Table 1).

Previous studies have found a strong association between rs55705857 and oligodendroglial tumors (particularly tumors with isocitrate dehydrogenase 1/2 (IDH1/2) mutation and loss of the short arm of chromosome 1 [1p] and the long arm of chromosome 19 [19q]), so this association was further explored in the non-GBM (lower grade glioma [LGG]) histology groups (Table 3). For World Health Organization (WHO) grade II–III astrocytoma, the effect was stronger in females (ORF = 4.64 [95% CI = 3.53–6.09], pF = 2.15 × 10−28) than in males (ORM = 2.87 [95% CI = 2.31–3.56], pM = 1.19 × 10−21), with pD = 0.0065. For WHO grade II–III oligodendroglioma, the effect was stronger than that observed in WHO grade II–III astrocytoma, and it was again stronger in females (ORF = 12.15 [95% CI = 8.96–16.48], pF = 3.68 × 10−58) than in males (ORM = 5.47 [95% CI = 4.16–7.19], pM = 5.37 × 10−34), with pD = 6.60 × 10−5. Oligoastrocytic tumors were not included in sub-analyses because recent research suggests that these tumors are not an entity that is molecularly distinct from oligodendrogliomas or astrocytomas15.

Table 3 Sex-specific odds ratios (OR), 95% confidence intervals (95% CI), and p values from meta-analysis for rs11979158, rs55705857 and rs9841110 by specific non-GBM histologies.

Genome-wide scan of nominally significant regions

In a previous eight-study meta-analysis, ~12,000 SNPs (INFO > 0.7, MAF > 0.01) were identified as having a nominally significant (p < 5 × 10−4) association with all glioma, GBM, or non-GBM4. A sex-stratified genome-wide scan was conducted within this set of SNPs and results were considered significant at pD < 1.4 × 10−6 (adjusted for 12,000 tests in each of three histologies [36,000 tests], see Fig. 2A for schematic of study design). Similar genome-wide peaks were observed between males and females (Fig. 4). One large region within 3p21.31 (49,400–49,600 kb, ~200 kb) was identified as being significantly associated with glioma and GBM in females only (Fig. 5, Supplemental Fig. 1). There were 243 SNPs with nominally significant associations within this region in the previous eight-study meta-analysis (p < 5 × 10−4), and 32 of these had nominally significant sex-specific associations (pF < 5 × 10−6 or pM < 5 × 10−6) in all glioma or GBM. The strongest association in females within this region was at rs9841110, in both all glioma (ORF = 1.22 [95% CI = 1.14–1.32], pF = 5.55 × 10−8, with pD = 1.77 × 10−4) and GBM only (ORF = 1.27 [95% CI = 1.16–1.38], pF = 3.86 × 10−7, with pD = 6.04 × 10−4), while there were no significant associations detected in males (Fig. 3). No SNPs in this region were significantly associated with non-GBM. In a case-only analysis a marginally significant difference was detected between males and females overall (p = 0.0520) and in GBM (p = 0.0428) (Supplemental Table 1).

Figure 4

Manhattan plot of -log(p) values for all glioma in (A) males and (B) females, for GBM in (C) males and (D) females, and for non-GBM in (E) males and (F) females.

Figure 5

Plot of region on chromosome 3 identified as having a sex-specific association with GBM for (A) males and (B) females.

Agnostic scan of sex chromosome loci

SNPs on the sex chromosomes were analyzed in GICC only. There were 245,746 SNPs with INFO > 0.7 and MAF > 0.01 on the X chromosome after quality control and imputation, and results were considered significant at p < 2 × 10−7 (corrected for 250,000 tests, see Fig. 2B for a schematic of study design). No SNPs met this significance threshold. After quality control procedures were complete, there were 300 SNPs remaining on the Y chromosome. There was no imputation performed of the Y chromosome data and only the 300 genotyped SNPs were evaluated. No significant signals were detected on the Y chromosome.

Combined analysis of germline variants and somatic characterization

Due to the lack of molecular classification data included in the GICC, MDA-GWAS, SFAGS-GWAS and GliomaScan datasets, glioma data obtained from TCGA datasets (GBM and LGG) were used to explore the potential confounding due to molecular subtype variation within histologies. There were 758 individuals from the TCGA dataset with available germline genotyping, molecular characterization, sex, and age data (Supplemental Table 2). Overall, a slightly larger proportion of females (53.2%) than males (47.2%) had IDH1/2 mutant glioma, but this difference was not statistically significant (p = 0.1104) (Supplemental Fig. 2). When tumors were stratified by histological type, approximately equal proportions of males and females had IDH1/2 mutations present in their tumors (GBM: 6.0% in males, and 5.2% in females; LGG: 17.9% in males, and 17.7% in females). There were also no significant differences by sex in IDH/TERT/1p19q subtype (Supplemental Fig. 3, overall p = 0.2859), or pan-glioma methylation subgroup (Supplemental Fig. 4, overall p = 0.4153).

SNPs found to be nominally significant (p < 5 × 10−4) in a previous eight-study meta-analysis, with imputation quality (r2) ≥ 0.7, were identified within the TCGA germline genotype data, and D’ and r2 values in CEU were used to select proxy SNPs (Supplemental Table 3)16. A case-only analysis was conducted using sex as a binary phenotype for proxy SNPs in the TCGA dataset. For the proxy SNP at 3p21.31, there was a nominally significant signal in glioblastoma in the overall case-only meta-analysis (Table 4). There was no significant association in the TCGA set, but RAF was elevated in females as compared to males in the GBM set, as well as in all IDH1/2 wild type gliomas. MAF in LGG and IDH1/2 mutant glioma was similar among males and females. For the proxy SNP at 7p11.2, there was a nominally significant signal in the case-only meta-analysis and no significant association in the TCGA set, but RAF was elevated in males as compared to females in the GBM set, as well as in all IDH1/2 wild type gliomas. For the proxy SNP at 8q24.21, no significant signal was detected in the overall case-only meta-analysis or within the TCGA set. Among both LGG and IDH1/2 mutant glioma, RAF was elevated in females as compared to males.

Table 4 Risk allele frequencies (RAF), case-only odds ratios, 95% confidence intervals (95% CI), and p values for marker SNPs from the four-study meta-analysis and The Cancer Genome Atlas genotyping data.

Sex-stratified genotypic risk scores

In order to estimate the cumulative effects of significant variants by sex, unweighted risk scores (URS) were calculated by summing all risk alleles for each individual using the 10 SNPs (rs12752552, rs9841110, rs10069690, rs11979158, rs55705857, rs634537, rs12803321, rs3751667, rs78378222, and rs2297440) found to be significantly associated with glioma in this analysis. GBM-specific (URS-GBM) and non-GBM-specific (URS-NGBM) URS were calculated using only the sets of 6 SNPs that were significantly associated with these histologies (URS-GBM: rs9841110, rs10069690, rs11979158, rs634537, rs78378222, and rs2297440; URS-NGBM: rs10069690, rs55705857, rs634537, rs12803321, rs78378222, and rs2297440). See Methods for additional information on score calculation. Median URS, URS-GBM, and URS-NGBM were significantly different (p < 0.0001) between cases and controls in both males and females in all histology groups (Supplemental Fig. 5). There was no significant difference in median risk scores between male and female cases for any histology group. Glioma risk increased with increasing number of risk alleles in both males and females for the 10 SNPs included in the overall URS, as well as the 6 SNPs in the URS-GBM and 6 SNPs in URS-NGBM (Fig. 6, Supplemental Table 4). Risk was higher in females (OR = 3.97 [95% CI = 2.42–6.80]) as compared to males (OR = 1.74 [95% CI = 1.21–2.53]) in all glioma for individuals with 13–16 risk alleles, though the difference between these estimates was not statistically significant. Risk was also higher among females (OR = 2.69 [95% CI = 1.98–3.66]) as compared to males (OR = 1.79 [95% CI = 1.38–2.32]) in GBM for individuals with 8–11 risk alleles, as well as in non-GBM for individuals with 6–11 risk alleles (females: OR = 2.83 [95% CI = 2.12–3.78], males: OR = 1.70 [95% CI = 1.31–2.19]), though the differences between these estimates were not statistically significant. These estimates may underestimate actual risk due to varying effect sizes and allele frequencies between risk variants.

Figure 6

Odds ratios and 95% confidence intervals for unweighted risk score (URS) in (A) all glioma, (B) GBM-specific URS (URS-GBM) in GBM, and (C) non-GBM-specific URS (URS-NGBM) in non-GBM.

Discussion

This is the first analysis of inherited risk variants in sporadic glioma focused specifically on sex differences, and the first agnostic unbiased scan for glioma risk variants on the X and Y sex chromosomes. One SNP at the 7p11.2 locus (rs11979158) showed significant association in males only, in both all glioma and GBM (Table 2, see Supplemental Table 5 and Supplemental Fig. 6 for study-specific estimates). This variant is within one of two previously identified independent glioma risk loci located near epidermal growth factor receptor (EGFR) and is most strongly associated with risk for GBM4,17. Though EGFR is implicated in many cancer types and is a target for many anti-cancer therapies, this risk locus has not been previously associated with any other cancer type. While estrogen has been demonstrated to interact with EGFR as well as other growth factors, previous epidemiological studies have not consistently found an association between proxy markers for endogenous estrogen exposure and decreased glioma risk18. Cell intrinsic, hormone independent sex differences in EGF effects have been observed in a murine model of gliomagenesis, where EGF treatment was transforming for male but not female astrocytes that had been rendered null for neurofibromin and p53 function19. While this SNP was not genotyped on the germline genotyping array used for TCGA, a SNP in strong LD with rs11979158 (rs7785013, D’ = 1, r2 = 1 in CEU16) was evaluated using a case-only approach. The association was not statistically significant in any histology group, but a similar trend to that observed in the overall meta-analysis in sex-specific RAF was observed in both the overall GBM and the IDH1/2 wild type groups.

The association at 8q24.21 (rs55705857) is the strongest that has been identified by glioma GWAS to date4, with an odds ratio of 1.99 (95% CI = 1.85–2.13, p = 9.53 × 10−79) in glioma overall, and an odds ratio of 3.39 (95% CI = 3.09–3.71, p = 7.28 × 10−149) in non-GBM (see Supplemental Table 5 and Supplemental Fig. 7 for study-specific estimates). The identified SNP, rs55705857, is located in an intergenic region near coiled-coil domain containing 26 (CCDC26, a long non-coding RNA). This analysis found a stronger association in females than males in all glioma and non-GBM, where female odds ratio estimates are ~2× those of males (Table 2). ORs were higher in women than men in all studies included in the analysis (see Supplemental Tables 5 and 6 for study-specific estimates and MAF). A sensitivity analysis was conducted to assess the effect of study heterogeneity on this estimate in non-GBM using only the GICC, MDA-GWAS, and GliomaScan datasets. The exclusion of SFAGS-GWAS did not substantially change the results (main analysis pD = 1.20 × 10−6 and sensitivity pD = 1.49 × 10−5). A histology-specific analysis found similar sex differences in ORs for rs55705857 for both non-GBM astrocytoma and oligodendroglioma (Table 3, see Supplemental Table 7 for study-specific estimates). This variant is strongly associated with IDH1/2 mutant and 1p/19q codeleted glioma tumors, but data on these molecular markers were not available for the four GWAS datasets used20,21. The TCGA GBM and LGG datasets22,23,24 were used to assess potential sex differences in frequency of IDH1/2 mutation within histologies. Approximately the same proportion of males as females with histologically confirmed GBM had IDH1/2 mutations (6.0% vs 5.2%, respectively, Supplemental Fig. 2). While this SNP was not genotyped on the germline genotyping array used for TCGA, a SNP in weak LD with rs55705857 (rs4636162, D’ = 1; r2 = 0.104, in CEU16) was evaluated using a case-only approach. There was no significant association in the overall meta-analysis for this SNP, and the association in the analysis of TCGA cases was not statistically significant in any group.

A large region in 3p21.31 was identified that was associated with all glioma and GBM in females only (Table 2, see Supplemental Table 5 and Supplemental Fig. 8 for study-specific estimates). The strongest association in this region was rs9841110, an intronic variant located upstream of dystroglycan 1 (DAG1) within an enhancer region. While this SNP was not genotyped on the germline genotyping array used for TCGA, a SNP in strong LD with rs9841110 (rs9814873, D’ = 1, r2 = 1 in CEU16) was able to be evaluated using a case-only approach. The association was not statistically significant in any group, but a similar trend in RAF was observed in the overall GBM and IDH1/2 wild type groups. Though this region has not previously been associated with glioma, previous GWAS have detected associations at 3p21.31 for a large variety of traits, including several autoimmune diseases as well as increased age at menarche25,26,27,28. If increased lifetime estrogen exposure decreases glioma risk, as some have hypothesized, it is reasonable that variants which increase age at menarche (potentially decreasing total lifetime estrogen exposure) may increase glioma risk in females. Due to the complexity of measuring lifetime estrogen exposure (which is affected by age at menarche, age at menopause, parity, breast feeding patterns, and estrogen replacement therapy post-menopause) it is difficult to determine the ‘true’ effect that this exposure might have on glioma risk.

As compared to a model containing age at diagnosis and sex alone, the three SNPs (rs55705857, rs9841110 and rs11979158) identified as having sex-specific effects explain an additional 1.4% of trait variance within the GICC set. The variance explained by these SNPs varies by histology (0.6% in GBM, and 3.3% in non-GBM). The variance explained by the addition of these three SNPs was higher in females for all glioma (1.3% in males and 2.2% in females) and non-GBM glioma (2.3% in males and 5.3% in females), and slightly higher in males for GBM (0.9% in males and 0.7% in females). Unweighted risk scores (URS) were generated to compare the cumulative effects of glioma risk variants by sex by summing all risk alleles using the 10 SNPs found to be significantly associated with glioma in this analysis. GBM-specific (URS-GBM) and non-GBM-specific (URS-NGBM) URS were calculated using the sets of six SNPs that were significantly associated with each histology. Individuals with lower numbers of risk alleles had significantly lower odds of glioma, and those with higher numbers of alleles had increased odds of glioma, with statistically significant trends in each histology group. Males and females with low risk scores had similar odds of glioma, while females had increased odds in the upper strata of scores as compared to males. Development of risk scores that weight alleles by effect size, and use sex-specific estimates for variants for which effect size varies by sex (such as 7p11.2 and 8q24.21), may lead to better predictive values.

While often not included in GWAS, sex-stratified analyses can reveal genetic sources of sexual dimorphism in risk9,10. Sex-stratified analyses not only contribute to understanding of sources of sex difference in incidence, but may also suggest mechanisms and pathways of disease development that vary by sex. Sex variation in genetic susceptibility to disease is likely not due to sex differences in actual DNA sequence, but is thought to be the result of sex-specific regulatory functions29,30,31. In addition to genetic sources of difference, there are likely several additional factors acting in combination which contribute to sex differences in glioma incidence. Sex differences in disease can also be linked to in utero development, during which time gene expression and risk phenotypes are patterned through the action of X chromosome alleles that escape inactivation and genes on the non-pseudo-autosomal component of the Y chromosome, as well as the epigenetic effects of in utero testosterone32. A previous twin-based analysis of the heritability of brain and CNS tumors attempted to estimate sex-specific relative risks, but these analyses were limited by a small sample size33. Further investigation of the inheritance patterns of familial glioma by sex may also provide additional information about sex differences in this disease.

There are several limitations to this analysis. Individuals included in these datasets were recruited during different time periods from numerous institutions, with no central review of pathology. Molecular tumor markers were unavailable for all datasets, and as a result classifications are based on the treating pathologist using the prevailing histologic criteria at time of diagnosis. The variant at 8q24.21 has been shown to have significant association with particular molecular subtypes, and without molecular data it was not possible to determine whether the observed result is an artifact of varying molecular features by sex. Oligodendroglioma as a histology is highly enriched for IDH1/2 mutant and 1p/19q co-deleted tumors (117/174, or ~67% within TCGA) and it is therefore likely that the analysis using only tumors classified as oligodendroglioma captured most of this molecular subtype. Males and females within histology groups have different frequencies of IDH1/2 mutation22, which may have confounded the estimates for 8q24.21. The TCGA dataset was used to explore sex differences in allele frequency within molecular groups, but none of the identified SNPs could be directly validated within this set; however, SNPs in strong LD were evaluated for all loci except 8q24.21. The 8q24.21 region is not well characterized on the array used for the TCGA genotyping, and as a result this region imputed poorly. No proxy SNP in strong LD with rs55705857 could be identified. Similar trends in RAF to those observed in the overall meta-analysis were seen in the TCGA set, though these differences were not statistically significant. Further interrogation in datasets with molecular classification and direct genotyping of these regions is warranted in order to confirm the sex-specific associations observed in this analysis.

Conclusions

Sex and other demographic differences in cancer susceptibility can provide important clues to etiology, and these differences can be leveraged for discovery in genetic association studies. This analysis identified potential sex-specific effects in two previously identified glioma risk loci (7p11.2 and 8q24.21) and one newly identified autosomal locus (3p21.31). Odds ratios for the highest strata of an unweighted risk score calculated by summing total risk alleles were higher in females as compared to males in all three histology groups. These significant differences in effect size may be a result of differing biological function of these variants by sex due to biological sex differences, or interaction between these variants and unidentified risk factors that vary in prevalence or effect by sex.

Materials and Methods

Study cohorts

This study was approved locally by the institutional review board (IRB) at University Hospitals Cleveland Medical Center and by each participating study site’s IRB. Written informed consent was obtained from all participants. All research was performed in accordance with relevant guidelines and regulations. In this study, data were combined from four prior glioma GWAS: Glioma International Case-Control Study (GICC), San Francisco Adult Glioma Study GWAS (SFAGS-GWAS), MD Anderson Glioma GWAS (MDA-GWAS), and National Cancer Institute’s GliomaScan (Fig. 2A)4,11,12,13,14. The SFAGS-GWAS includes controls from the Illumina iControls dataset, and MDA-GWAS includes controls from Cancer Genetic Markers of Susceptibility (CGEMS) breast and prostate studies34,35,36. Details of data collection and classification are available in previous publications4,11,12,13,14.

Genotyping and imputation of GWAS datasets

GICC cases and controls were genotyped on the Illumina Oncoarray37. The array included 37,000 beadchips customized to include previously-identified glioma-specific candidate single nucleotide polymorphisms (SNPs). SFAGS-GWAS cases and some controls were genotyped on Illumina’s HumanCNV370-Duo BeadChip, and the remaining controls were genotyped on the Illumina HumanHap300 and HumanHap550. MDA-GWAS cases were genotyped on the Illumina HumanHap610 and controls using the Illumina HumanHap550 (CGEMS breast34,36) or HumanHap300 (CGEMS prostate35). GliomaScan cases were genotyped on the Illumina 660 W, while controls were selected from cohort studies and were genotyped on Illumina 370D, 550 K, 610Q, or 660 W (see Rajaraman et al. for specific details of genotyping)14. Details of DNA collection and processing are available in previous publications4,12,13,14. Individuals with a call rate (CR) <99% were excluded, as well as all individuals who were of non-European ancestry (<80% estimated European ancestry using the FastPop38 procedure developed by the GAMEON consortium). For all apparent first-degree relative pairs (identified using estimated identity by descent [IBD] ≥ 0.5), one individual was removed: in a case-control pair the control was removed; otherwise, the individual with the lower call rate was excluded. SNPs with a call rate < 95% were excluded, as were those with a minor allele frequency (MAF) <0.01 or displaying significant deviation from Hardy-Weinberg equilibrium (HWE) (p < 1 × 10−5). Additional details of quality control procedures have been previously described in Melin et al.4. All datasets were imputed separately using SHAPEIT v2.837 and IMPUTE v2.3.2 using a merged reference panel consisting of data from phase three of the 1000 Genomes Project and the UK10K39,40,41,42,43,44.
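The SNP-level exclusion criteria above (call rate, MAF, HWE) can be sketched as a single filter. This is a minimal illustration and not the pipeline used in the study: the function name, the argument layout, and the use of a 1-df chi-square goodness-of-fit test for HWE are assumptions, since the text does not state which HWE test was applied.

```python
import math

def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.95, min_maf=0.01, hwe_p_min=1e-5):
    """Return True if a SNP passes the call-rate, MAF, and HWE filters.

    n_aa, n_ab, n_bb are genotype counts; n_missing is the number of
    samples without a genotype call. Thresholds default to those in the text.
    """
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    if call_rate < min_call_rate or maf < min_maf:
        return False
    # Assumed HWE test: chi-square goodness of fit with 1 degree of freedom.
    expected = (n_called * freq_a ** 2,
                2 * n_called * freq_a * (1.0 - freq_a),
                n_called * (1.0 - freq_a) ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Upper-tail probability of a 1-df chi-square statistic.
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    return hwe_p >= hwe_p_min
```

In practice such filters are usually applied with dedicated software, and HWE is often assessed with an exact test in controls only.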

TCGA cases were genotyped on the Affymetrix Genomewide 6.0 array using DNA extracted from whole blood (see previous manuscript for details of DNA processing23,24), and underwent standard GWAS QC, and duplicate and related individuals within datasets have been excluded4. Ancestry outliers were identified in TCGA using principal components analysis in plink 1.945. Resulting files were imputed using Eagle 2 and Minimac3 as implemented on the Michigan imputation server (https://imputationserver.sph.umich.edu) using the Haplotype Reference Consortium Version r1.1 2016 as a reference panel46,47,48. Somatic characterization of TCGA cases was obtained from the final dataset used for the TCGA pan-glioma analysis22, and classification schemes were adopted from Eckel-Passow, et al.49 and Ceccarelli, et al.22.

Sex-stratified scan of the autosomal chromosomes

The data were analyzed using sex-stratified logistic regression models in SNPTEST for all SNPs on autosomal chromosomes within 500 kb of previously identified risk loci, and/or those found to be nominally significant (p < 5 × 10−4) in a previous meta-analysis (Fig. 2A)4,50. Sex-specific betas (βM and βF), standard errors (SEM and SEF), and p-values (pM and pF) were generated using sex-stratified logistic regression models that were adjusted for the principal components found to differ significantly between cases and controls within each study in a previous meta-analysis4,50. Genomic inflation factors were calculated after excluding SNPs with MAF < 0.05 or INFO score < 0.7, and SNPs that significantly violated Hardy-Weinberg equilibrium in controls (p < 5 × 10−8) (Males: GICC: λadjusted = 1.04; SFAGS-GWAS: λadjusted = 1.01; MDA-GWAS: λadjusted = 1.02; GliomaScan: λadjusted = 1.01. Females: GICC: λadjusted = 1.03; SFAGS-GWAS: λadjusted = 1.02; MDA-GWAS: λadjusted = 1.04; GliomaScan: λadjusted = 1.01).
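As a rough illustration of how a genomic inflation factor can be obtained from association p-values, the sketch below converts each p-value to a 1-df chi-square statistic and divides the observed median by the median expected under the null (about 0.455). The function name is an assumption, and because the text does not describe how λadjusted was derived, no sample-size adjustment is applied here.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2

def genomic_inflation(p_values):
    """Unadjusted lambda: median observed chi-square / expected null median.

    p_values must be two-sided p-values in the interval (0, 1].
    """
    # A two-sided p-value p corresponds to z = Phi^-1(p / 2); chi-square = z^2.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

Under the null, p-values are uniformly distributed and the function returns a value close to 1; values well above 1 suggest systematic inflation such as residual population stratification.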

Estimation of sex difference and test of statistical significance

βD and SED were estimated using the sex-specific betas and standard errors separately for each dataset, as follows:

$${\beta }_{D}={\beta }_{M}-{\beta }_{F}$$
(1)
$$S{E}_{D}=\sqrt{S{E}_{M}^{2}+S{E}_{F}^{2}}$$
(2)

The difference between the groups was then tested using a z test51,52. Sex-stratified results and difference estimates from the four studies were separately combined via inverse-variance weighted fixed effects meta-analysis in META53. See Fig. 2A for schematic of autosomal analysis methods. Case-only analyses were performed for SNPs found to be significant in agnostic analyses, using sex as the outcome, for all glioma, GBM, and non-GBM by study, and betas and standard errors were combined via inverse-variance weighted fixed effects meta-analysis in META53.
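Equations (1) and (2), the z test, and the fixed effects pooling can be sketched in a few lines. This is an illustrative sketch and not the META implementation: the function names are assumptions, and the helper that recovers a standard error from a published 95% CI assumes a Wald interval on the log odds ratio scale.

```python
import math

def se_from_ci(lower, upper):
    """Approximate SE of log(OR) from a 95% CI (assumes a Wald interval)."""
    return (math.log(upper) - math.log(lower)) / (2 * 1.959964)

def sex_difference_test(beta_m, se_m, beta_f, se_f):
    """Apply Eqs. (1) and (2), then a two-sided z test of beta_D = 0."""
    beta_d = beta_m - beta_f
    se_d = math.sqrt(se_m ** 2 + se_f ** 2)
    z = beta_d / se_d
    # Two-sided p-value from the standard normal distribution.
    p_d = math.erfc(abs(z) / math.sqrt(2.0))
    return beta_d, se_d, z, p_d

def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed effects pooled beta and SE."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled_beta, pooled_se

# Worked example using the pooled rs55705857 estimates for all glioma
# reported in the Results (ORM = 1.56 [1.40-1.75], ORF = 2.45 [2.14-2.80]).
beta_d, se_d, z, p_d = sex_difference_test(
    math.log(1.56), se_from_ci(1.40, 1.75),
    math.log(2.45), se_from_ci(2.14, 2.80),
)
```

Applied to the pooled estimates, the test gives a negative βD (larger effect in females) and a p-value on the order of 10−7. It is not expected to reproduce the reported pD exactly, because the study computed βD and SED within each dataset before meta-analysis.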

Sex chromosome analysis

X and Y chromosome data were available from the GICC set only. Males and females were imputed separately for the X chromosome using the previously described merged reference panel. X chromosome SNPs were analyzed using a logistic regression model in the SNPTEST module ‘newml’, assuming complete inactivation of one allele in females and treating males as homozygous females (Fig. 2B). For prioritized SNPs in the combined model, sex-specific effect estimates were generated using stratified logistic regression models. Y chromosome data were analyzed using logistic regression in SNPTEST (Fig. 2B)54. Figures were generated using LocusZoom and R 3.3.2 using GenABEL, qqman, and ggplot55,56,57,58,59.
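The X chromosome coding described above, in which hemizygous males are placed on the same scale as homozygous females, can be illustrated as follows. This sketch only shows the genotype coding; it is not the SNPTEST implementation, and the function name and sex labels are assumptions.

```python
def x_dosage(risk_allele_copies, sex):
    """Code an X-chromosome genotype on a 0-2 scale.

    Females ("F") carry 0, 1, or 2 copies, used as-is.
    Males ("M") carry 0 or 1 copy and are coded 0 or 2, so that a male
    carrier is treated like a homozygous female.
    """
    if sex == "F":
        if risk_allele_copies not in (0, 1, 2):
            raise ValueError("females carry 0, 1, or 2 copies")
        return risk_allele_copies
    if sex == "M":
        if risk_allele_copies not in (0, 1):
            raise ValueError("males carry 0 or 1 copy")
        return 2 * risk_allele_copies
    raise ValueError("sex must be 'M' or 'F'")
```

With this coding, a single log-additive effect per unit of dosage is estimated across both sexes, which matches the assumption of complete inactivation of one allele in females.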

Analysis of TCGA germline and somatic data

Only newly diagnosed cases from TCGA GBM and LGG with no neo-adjuvant treatment or prior cancer were used. Demographic characteristics, molecular classification and somatic alterations data was obtained from Ceccarelli, et al.22. Chi-square tests were used to compare the frequency of somatic alterations between age groups. SNPs found to be nominally significant (p < 5 × 10−4) in a previous 8 study meta-analysis4, with imputation quality ≥ 0.7 were identified within the TCGA genotype data and D’ and r2 values in CEU were used to select proxy SNPs16. Using these SNPs, a case-only analysis using sex as a binary phenotype was conducted using logistic regression in SNPTEST assuming an additive model to estimate beta, standard error, and p values50. Results were considered significant at p < 0.003 (Bonferroni correction for 15 tests, for the three assessed loci in each of five histology groups).
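The significance thresholds quoted in this paper are consistent with a Bonferroni correction at a family-wise error rate of 0.05. The short sketch below recomputes them; the value α = 0.05 is an assumption inferred from the reported thresholds, and the dictionary labels are illustrative.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# Test counts as stated in the text; alpha = 0.05 is assumed.
thresholds = {
    "known risk loci (18,000 tests)": bonferroni_threshold(18_000),      # ~2.8e-6
    "nominally significant SNPs (36,000 tests)": bonferroni_threshold(36_000),  # ~1.4e-6
    "X chromosome (250,000 tests)": bonferroni_threshold(250_000),       # 2e-7
    "TCGA case-only (15 tests)": bonferroni_threshold(15),               # ~0.003
}
```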

Calculation of unweighted genetic risk scores

In order to estimate the cumulative effects of significant variants by sex, histology-specific unweighted risk scores were calculated using the SNPs found to be significantly associated with each outcome. Data from all four studies were merged, and any imputed genotypes with genotype probability >0.8 were converted to hard calls. Unweighted risk scores (URS) were calculated by summing all risk alleles for each individual. An overall URS was generated using the sum of risk alleles at rs12752552, rs9841110, rs10069690, rs11979158, rs55705857, rs634537, rs12803321, rs3751667, rs78378222, and rs2297440. As risk alleles are known to have histology-specific associations4, histology-specific scores were generated for GBM and non-GBM using only the SNPs found to have a significant association with each histology. The GBM-specific URS (URS-GBM) was calculated by summing the number of risk alleles at rs9841110, rs10069690, rs11979158, rs634537, rs78378222, and rs2297440. The non-GBM-specific URS (URS-NGBM) was calculated by summing the number of risk alleles at rs10069690, rs55705857, rs634537, rs12803321, rs78378222, and rs2297440. Differences in median scores between groups were tested using Wilcoxon rank sum tests. Scores were compared against the median score for each set (URS: ten alleles, URS-GBM: six alleles, URS-NGBM: four alleles). Odds ratios and 95% confidence intervals for each level of the score were estimated using sex-stratified logistic regression adjusted for age at diagnosis (for controls where only an age range was available, the mean value of the range was used), where each score was compared to the median score within the entire population as described in Shete et al.13.
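The score calculation can be sketched as follows, using the SNP sets listed above. The function names and input layout are assumptions, as is the handling of genotypes that do not reach the hard-call threshold: the text does not say how these were treated, so this sketch returns no score for an individual with any missing genotype.

```python
# SNP sets as listed in the text.
URS_SNPS = ["rs12752552", "rs9841110", "rs10069690", "rs11979158", "rs55705857",
            "rs634537", "rs12803321", "rs3751667", "rs78378222", "rs2297440"]
URS_GBM_SNPS = ["rs9841110", "rs10069690", "rs11979158",
                "rs634537", "rs78378222", "rs2297440"]
URS_NGBM_SNPS = ["rs10069690", "rs55705857", "rs634537",
                 "rs12803321", "rs78378222", "rs2297440"]

def hard_call(genotype_probs, min_prob=0.8):
    """Convert imputed probabilities (P(0), P(1), P(2) risk alleles) to a count.

    Returns the most likely risk-allele count if its probability exceeds
    min_prob, otherwise None (treated as missing; an assumption).
    """
    best = max(range(3), key=lambda g: genotype_probs[g])
    return best if genotype_probs[best] > min_prob else None

def unweighted_risk_score(risk_allele_counts, snps):
    """Sum risk-allele counts over snps; None if any genotype is missing."""
    counts = [risk_allele_counts.get(snp) for snp in snps]
    if any(c is None for c in counts):
        return None
    return sum(counts)

# Hypothetical individual who is heterozygous at every SNP.
person = {snp: 1 for snp in URS_SNPS}
overall_score = unweighted_risk_score(person, URS_SNPS)      # 10
gbm_score = unweighted_risk_score(person, URS_GBM_SNPS)      # 6
```

Because every risk allele contributes one point regardless of its effect size, the score ranges from 0 to 20 for the overall URS and from 0 to 12 for each histology-specific score.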

Calculation of trait variance explained by SNPs with sex-specific effects

In order to determine whether the identified SNPs with sex-specific effects estimate odds of glioma more accurately than sex alone, logistic regression models were used to estimate odds of all glioma, GBM, and non-GBM glioma based on sex using the GICC data only. The proportion of variance in odds of glioma explained by sex-specific SNPs was calculated using R2 estimated from the log likelihood of the null model (sex, age at diagnosis, and the first two principal components only) and the full model (additionally including the identified SNPs rs9841110, rs11979158, and rs55705857)60, calculated as follows:

$${R}^{2}=1-\frac{\mathrm{log}({L}_{full})}{\mathrm{log}({L}_{null})}$$
(3)

Proportion of variance explained was also calculated separately by sex for each histology (null model adjusted for age at diagnosis, and the first two principal components only).
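Equation (3) can be evaluated directly from the fitted probabilities of the two models. The sketch below is illustrative only: the function names are assumptions, and the outcomes and fitted probabilities are toy values rather than study data.

```python
import math

def bernoulli_loglik(outcomes, fitted_probs):
    """Log likelihood of binary outcomes (1 = case, 0 = control)."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y, p in zip(outcomes, fitted_probs))

def likelihood_r2(loglik_full, loglik_null):
    """Eq. (3): R^2 = 1 - log(L_full) / log(L_null)."""
    if loglik_null >= 0:
        raise ValueError("null log likelihood must be negative")
    return 1.0 - loglik_full / loglik_null

# Toy example: four individuals, a null model that predicts 0.5 for everyone,
# and a full model whose predictions lean toward the observed outcome.
outcomes = [1, 0, 1, 0]
ll_null = bernoulli_loglik(outcomes, [0.5, 0.5, 0.5, 0.5])
ll_full = bernoulli_loglik(outcomes, [0.8, 0.2, 0.8, 0.2])
r2 = likelihood_r2(ll_full, ll_null)
```

A full model that fits no better than the null gives R2 = 0, and the value approaches 1 as the full model's log likelihood approaches zero.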