Introduction

Gliomas are the most common primary brain and spinal tumors, representing 81% of malignant brain tumors. Gliomas occur in the brain and central nervous system (CNS) especially in glial or precursor cells1,2. In the 2007 World Health Organization (WHO) classification of tumors of the CNS, gliomas were classified according to their histological characteristics as Grade I–IV3,4. In the 2016 WHO classification, gliomas were classified according to molecular properties, such as isocitrate dehydrogenase (IDH) and 1p/19q status. According to its histological and molecular properties, a glioma is classified as a diffuse astrocytoma, anaplastic astrocytoma, oligodendroglioma, anaplastic oligodendroglioma, or glioblastoma (GBM)1,5,6.

Genome-wide association studies (GWASs) have been performed to identify regions associated with the risk of gliomas. Previous studies have reported variants at 27 loci associated with the risk of glioma7,8,9,10, these include, eight loci associated with all glioma (3q26.2, 5p15.33, 7p11.2, 8q24.21, 9p21.3, 11q23.3, 17p13.1, and 20q13.33), seven loci associated with GBM (1p31.3, 11q14.1, 12q23.3, 12q23.33, 16q12.1, 16p13.3, and 22q13.1), and 12 loci for non-GBM glioma (1q32.1, 1q44, 2q33.3, 3p14.1, 10q24.33, 10q25.2, 11q21, 11q23.2, 12q21.2, 14q12, 15q24.2, and 16q13.3).

Epidermal growth factor receptor (EGFR) is located at 7p11.2, and is essential for cell survival and development11. Many cancers, including glioma, are known to increase EGFR activity due to gene mutations, overexpression, or amplification.12,13. EGFR plays an especially key role in gliomas12. Several studies have shown that EGFR variants are associated with the risk of glioma. For example, rs1468727 and rs730437 are associated with an increased risk in the Han Chinese population14,15. Similarly, rs2252586 and rs11979158 are associated with an increased risk in the Caucasian population11,16. In a meta-analysis, rs11506105 was associated with an increased risk in both Asian and Caucasian populations17. Previous studies have confirmed the association between common genetic variants of EGFR and the heritable risk of gliomas. However, the association between the risk of gliomas and EGFR SNPs has not been studied in Korean populations.

To examine this association, we first selected SNPs of EGFR. Due to the large number of EGFR variants (> 5500 variants), we only considered important coding variants and previous glioma variants. We also performed an association analysis between susceptibility alleles and glioma subgroups with respect to clinical characteristics such as grades and histological and molecular properties.

Material and methods

Study subjects

A total of 804 subjects that are 324 cases, and 480 controls was analyzed in this study. The sample of glioma patients (n = 324) were collected at the Yonsei University Severance Hospital and collaborating hospitals, diagnosed between 2006 and 2016. Case subjects were divided to glioma subgroups based on the histologic and molecular properties according to 2007 and 2016 WHO classification of CNS tumors3,4. Patients who had history of other cancers were excluded through clinical record review. The population control (PC) samples (n = 480), which excluded participants who had a past medical history of various cancer types, were provided by the National Biobank of Korea, the Korean Genome and Epidemiology Study (KoGES) Consortium18. The controls were composed of quality-controlled biospecimen collections from population-based cohorts which comprised 10,038 blood donors aged 40–60 years from the Ansung-Ansan Community-based Cohort in 2001. The institutional review board of Yonsei University Severance Hospital approved the study protocols and the patients gave written informed consent for participation. Genomic DNA was extracted from blood samples using the Wizard Genomic DNA Purification Kit (Promega, Madison, WI).

The molecular alterations (IDH mutation and 1p/19q codeletion) were assessed in the following methods at Yonsei University Severance Hospital19. They investigated the molecular profile of all patients, which included 1p/19q codeletion, O-6-methylguanine-DNA methyltransferase (MGMT) promoter methylation, and IDH mutation status. The IDH mutation status was initially evaluated using immunostaining for the IDH1-R132H mutation using a Ventana Bench Mark XT autostainer (Ventana Medical System, Inc., Tucson, AZ, USA) according to the protocol. The antibody used was anti-human IDH1 R132H mouse monoclonal antibody (Clone H09L, 1:80 dilution; Dianova, Hamburg, Germany). In the absence of a positive mutant IDH1-R132H with immunohistochemistry, sequencing of IDH1 codon 132 and IDH2 codon 172 was performed. FISH analysis of 1p/19q status was performed using the LSI 1p36/1q25 and 19q13/19p13 Dual-Color Probe Kit (Abbott Molecular Inc., Abbott Park, IL, USA). Acquired images were interpreted by an experienced neuropathologist as the basis for Euro-CNS protocols20. If the numbers of “deleted” nuclei exceed 50%, the tumor was considered to show a “deletion” for the targeted chromosome.

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

SNP selection and genotyping

The candidate SNPs of EGFR were selected for genotyping from the Japanese and Han Chinese population in the 1000 genomes database with minor allele frequency (MAF) > 5%. The final 13 SNPs in EGFR were selected based on functional variants position and high linkage disequilibrium (LD) between SNPs interest (r2 > 0.98). Also, we included four SNPs (rs11979158, rs2252586, rs11506105 and rs1468727) that previously were reported to have association with the risk of gliomas. The primer tool was designed for the Fludigm SNP Type™ (San Francisco, CA, USA) to detect candidate SNPs except for two SNPs (rs17290169 and rs56183713) because of non-designable. In addition, genotyping was performed in all 804 subjects (324 cases and 480 controls) by using the Fludigm EP1 system (Fludigm 96.96 SNPtype™, San Francisco, CA, USA). The genotype data were analyzed with the BioMark SNP Genotyping analysis software (version 4.3.2). All candidate SNPs have been submitted to dbSNP (batch ID: EGFR_Glioma_SNP): https://www.ncbi.nlm.nih.gov/SNP/snp_viewTable.cgi?handle=GDLABSOGANGLF.

Statistical analysis

Linkage disequilibrium (LD) analysis between genotyped SNPs was carried out using the haploview v4.2 software from the Broad Institute (http://www.broadinstitute.org/mpg/haploview). Each individual haplotypes were estimated using PHASE 2.1 software21. To analyze the association with EGFR variants, logistic regression analysis under additive model was used for calculating Odds ratios (ORs), 95% confidence intervals, and corresponding P-values by adjusting age and sex as covariates using Golden helix SVS8 software (Bozeman, MT, USA). Also, the genotypes distribution such as the minor allele frequency (MAF) and Hardy–Weinberg equilibrium (HWE) of each SNP was compared in glioma patients and PCs. The P-values were corrected by Bonferroni correction for multiple testing of 13 times. In addition, to identify independent association among the significant EGFR variants, stepwise analysis and conditional logistic analysis were conducted using Statistical Analysis System (SAS) 9.4 software (SAS Inc., Cary, NC, USA). Subsequently, referent model analysis based on the allele distribution of SNPs (rs2227983 and rs1050171) was performed to verify detailed genetic effect using the Golden Helix SVS8 software (Bozeman, MT, USA). An in silico analysis was conducted for identifying function of associated SNPs using the SNPinfo (http://snpinfo.niehs.nih.gov/snpinfo/snpfunc.html).

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Results

Subjects’ characteristics

Glioma patient cases (n = 324, mean age = 51.0 ± 14.8 years, 52.7% male) were classified according to histological characteristics into diffuse astrocytoma (n = 32, mean age = 46.3 ± 12.2 years, 53.1% male), anaplastic astrocytoma (n = 46, mean age = 41.9 ± 14.5 years, 47.8% male), oligodendroglioma (n = 16, mean age = 46.1 ± 7.2 years. 50.0% male), anaplastic oligodendroglioma (n = 22, mean age = 44.0 ± 10.7 years, 63.6% male) and GBM (n = 201, mean age = 55.4 ± 14.4 years, 52.7% male). According to the 2016 WHO classification of CNS tumors, of 324 glioma patients, IDH-mutants were found in 87 patients while 1p/19q codeletion were found in 68 patients. The population control group consisted of 480 individuals over the age of 40 years (mean age = 54.8 ± 9.5 years, 49.4% male). The detailed classifications of cases are summarized in Table 1.

Table 1 Clinical characteristics of study subjects.

Genotyping EGFR genetic variants

A physical map of genotyped EGFR SNPs located on chromosome 7p11.2, is shown in Fig. 1A. One linkage disequilibrium (LD) block was constructed as shown in Fig. 1C. The LD block was composed of four haplotypes with a frequency > 5%, as shown in Fig. 1B. Additional information, such as SNP alleles, coordinates, and positions, is presented in Table 2.

Figure 1
figure 1

Physical map, haplotypes, and LDs of EGFR (epidermal growth factor receptor). (A) Physical map of EGFR and its SNPs genotyped in this study. Black blocks indicate coding exons; white blocks indicate 5'-untranslated region (UTR) and 3'-UTR. Score in the bracket indicates the minor allele frequency (MAF) of SNP. (B) Haplotypes of EGFR. Only common haplotypes with frequency over 0.05 are analyzed for association analyses. (C) LD plot of EGFR. SNPs investigated in this study compose one LD block. Number in block represents the value of LD coefficient │D'│.

Table 2 Genotyped EGFR SNP information and association of variants with risk of glioma.

Associations between EGFR SNPs and glioma risk

To identify causal variants among EGFR SNPs associated with the risk of glioma in a Korean population, a logistic regression analysis under an additive model adjusted for age and sex as covariates was performed as shown in Table 2. As a result, five SNPs (rs2252486, rs2072454, rs2227983, rs2227984 and rs1050171) were significantly associated with the risk of glioma. After applying the Bonferroni correction, the two SNPs rs2227983 (Pcorr = 0.009 in the additive model) and rs1050171 (Pcorr = 0.005 in the additive model) remained significantly associated with the risk of glioma. Furthermore, three haplotypes (frequency > 5%) were used for logistic regression analysis, which revealed that EGFR-ht3 (OR = 0.69, P = 0.01) was associated with a decreased risk of glioma. Additionally, EGFR-ht2 was associated with an increased risk of glioma (OR = 1.32, P = 0.02). Additional information is provided in Supplementary Table S1.

Genetic effects of variants on glioma risk

Stepwise and conditional analyses were performed on the two significant EGFR variants to verify the independent association between significant SNPs and glioma risk. In the stepwise analysis, two SNPs (rs2227983 and rs1050171) remained in the model at the parametric discriminant P-value (0.05). Subsequently, conditional logistic regression analysis indicated that the two SNPs were variants with independent genetic effects. The results of the two analyses are summarized in Table 3. The genetic effects of the two SNPs (rs2227983 and rs1050171) were then analyzed separately in the referent model. The GG genotype of rs2227983 (OR = 2.07, 95% confidence interval [CI] 1.36—3.14) had a higher OR than the AG genotype (OR = 1.33, 95% CI 0.95–1.85) in referent analysis model (compared with AA referent groups) (Table 4). Thus, patients with two G alleles are likely to have a higher risk of glioma than patients with one G allele. Additionally, the AA genotype of rs1050171 (OR = 2.60, 95% CI 0.96–7.01) had a higher OR than the GA genotype (OR = 1.71 95% CI 1.22–2.39) (Table 4) in referent analysis model. Thus, patients with two A alleles in rs1050171 are likely to have a higher risk of glioma than patients with one A allele. Additionally, we investigated differences in the association between the two independent SNPs (rs2227983 and rs1050171) and glioma subgroups in relation to clinical characteristics such as WHO grade and, histological and molecular properties. These two variants were identified to be particularly associated with an increased risk of glioma in cases of GBM, IDH-wildtype and 1p/19q codeletion, as shown in Fig. 2.

Table 3 Independent association signals among glioma-associated EGFR variants.
Table 4 Logistic analysis of rs2227983 and rs1050171 in EGFR with the risk of Glioma.
Figure 2
figure 2

The association result of two independent SNPs between glioma subgroups and PCs. Logistic regression between glioma subgroups and PCs (n = 480) under additive model, adjusted by age and sex as covariates, was used for calculating ORs (95% CI) and P-values at rs2227983 and rs1050171. Each plot indicates the point estimate of ORs on the X-axis shown with 95% CI on the error bars. Significant associations are bolded. PC population control, WHO world health organization, AST astrocytomas, ODG oligodendrogliomas, GBM glioblastomas, IDH-mutant IDH1 or IDH2-mutated gliomas, IDH-wildtype IDH-wildtype gliomas, 1p/19q (-) 1p/19q codeletion, 1p/19q ( +) 1p/19q non-codeletion, OR odds ratio, PCs population control, CI confidence interval.

Discussion

This study suggests that specific loci in EGFR are associated with an increased risk of glioma. Moreover, two independent coding variants (rs2227983 and rs1050171) of gliomas were found in the Korean population. Additionally, we verified the association between EGFR coding variants and glioma subgroups based on histological characteristics and molecular properties by referring to previous studies 5. The ORs for all glioma subgroups were higher than 1, but P -values for some subgroups were not significant, as shown in Fig. 2.

A previous study indicated that rs11979158 and rs2252586 were significantly associated with gliomas in several European populations11,22,23,24. However, the association of these two SNPs with gliomas was not identified in Korean subjects. According to the 1000 Genomes database, the MAFs of rs11979158 and rs2252586 in the European populations (EUR) were 0.17 (rs11979158) and 0.28 (rs2252586). In our study, the MAFs of these SNPs in the Korean populations were 0.0006 (rs11979158) and 0.02 (rs2252586) (Supplementary Table S2). Thus, despite being reported as risk factors for glioma in Europeans, these SNPs were not risk factors for Korean glioma patients. The possible causal variants (rs2227983 and rs1050171) in this study confirmed that the major and minor alleles between East Asian and European populations can differ based on the 1000 Genome Project (Supplementary Table S2). No studies have analyzed the association between these two variants and glioma in the European population. Consequently, glioma-associated genetic variants may vary by race or ethnicity, as allele frequencies differed by race (Supplementary Table S4).

The two SNPs (rs1468727 and rs11506105) that were previously reported to be linked to glioma risk in other Asian populations were also analyzed14,15. No signals were detected with rs11506105 or rs1468727. However, rs2072454 (in absolute LD with rs730437 in a Chinese population14) was significantly associated with glioma risk in our study (P = 0.008 before correction for multiple testing). However, considering the uncorrected P values in a study of Chinese population (P = 0.016 in the additive model) as well as this study (P = 0.008 in the additive model), these associations might be not reliable, as no statistical significance remained after correction in both studies.

EGFR is a cell membrane receptor that is activated by the binding of ligands such as EGF. Ligand binding to EGFR induces the activation of various signaling pathways, including the PI3K/AKT, Jak/Stat, JUNK, and MEK/ERK pathways, which can contribute to tumorigenesis. Variants in EGFR lead to overexpression of the EGFR protein have been associated with many cancers, including gliomas. Previous studies have reported that EGFR overexpression contributes to tumorigenesis and tumor progression in the classical subtype of gliomas25,26. According to Han et al. (2016), rs2227983 is associated with the expression of TP53 and p21 in Chinese hepatitis B virus-related hepatocellular carcinoma. In particular, the G allele has a higher p21 expression than the A allele. Additionally, according to their in silico analysis, p21 and EGFR mRNAs were expressed in the same pathway or co-expressed27. In another in vitro experiment, the rs2227983 variant (R521K, P = 0.0007 in this study) reduced EGFR ligand binding, growth stimulation, tyrosine kinase activation, and induction of proto-oncogenes28,29. This suggests that the rs2227983 variant can increase EGFR activity through a substitution of the A allele with the G allele, leading to a change from lysine (K) to arginine (R). Moreover, this variant can induce overexpression of EGFR30, which can increase the risk of glioma12,31.

Previous studies have reported that rs2227983 and rs1050171 were associated with the risk of breast, lung, and colon cancer32,33,34, though one study found no association between rs2227983 and the risk of lung cancer in Korean populations35. Other studies have shown that the EGFR 521R variant is associated with a poor prognosis in bladder cancer and colon cancer36,37. However, to date, no studies have reported the association of these variants with the risk of glioma. The rs1050171 variant is associated with the risk of lung cancer in European and Korean populations38,39, and one study showed that this variant is associated with renal disease risk in the Korean population40. Although rs1050171 is a synonymous mutation that does not substitute amino acids, it can affect mRNA stability or protein structure folding41. The variant rs1050171 (G > A) is located in a highly conserved region, as predicted by SNPinfo (Supplementary Table S3). The synonymous variant rs1050171 has a higher regulatory potential value (Reg potential = 0.489) than the nonsynonymous variant rs2227983 (Reg potential = 0.390), shown in Supplementary Table S3. Moreover, rs1050171 may affect EGFR gene expression and predispose patients to gliomas. Collectively, these two variants could increase the risk of gliomas by activating downstream signaling pathways through the overexpression of EGFR proteins.

We further investigated whether that two SNPs (rs2227983 and rs1050171) are associated with brain tissue gene expression using eQTL calculators in the GTEx database (https://gtexportal.org/home/testyourown) 42. We found that the two variants were associated with gene expression in some brain tissues. This information is shown in Supplementary Table S5. However, no information was found regarding sQTLs.

Recently, advances in gene expression analysis, such as molecular profiling, have provided more predictive information than WHO classification of glioma43. Mutations in IDH1 and IDH2 have been frequently observed in astrocytoma and oligodendroglioma patients44. The 1p/19q codeletion is most common among oligodendroglioma patients and is used as a prognostic biomarker43,45,46. Oligodendroglioma patients also have both IDH mutations and 1p/19q codeletion in almost all cases, as shown in Table 1. In particular, rs2227983 and rs1050171 have a more significant association with IDH-wildtype subgroups than IDH-mutant subgroups. A previous study showed that primary GBM patients typically exhibit IDH-wildtype properties, obtained similar to the results in this study47 (shown in Table 1). These findings suggest that the risk of GBM is associated with belonging to IDH-wildtype subgroups. Additionally, the ORs of rs1050171 were higher than those of rs2227983 in almost all glioma subgroups except the WHO Grade III groups, as shown in Fig. 2. Because of limitations in statistical power, such as the low MAFs in 6 SNPs (MAF < 0.1) and small sample sizes, especially, in glioma subgroups analyses, interpretation of this study’s results requires caution. In this study, we used PCs matched for age and sex with insufficient clinical information, such as susceptibility to glioma, for detailed inclusion and exclusion criteria. Despite the use of these PCs, considering the difficulty of collecting large numbers of controls, this study can be considered as an alternative method to identify the genetic effects on gliomas48. Therefore, to determine the genetic effect of rs2227983 and rs1050171 on gliomas in a Korean population, subsequent clinical studies, such as mRNA and protein analyses, will be essential. In addition, although stepwise and conditional logistic analysis indicated two independent associations, it is not possible to know which SNP(s) are causal, because the causal variant(s) may be SNP(s) in LD with these SNPs. Further evidence from functional studies is needed to more confidently identify causal SNPs.

The purpose of this study was to investigate the genetic association between SNPs in EGFR and the risk of glioma in a Korean population. This study provided the first evidence that potentially functional polymorphisms in the EGFR gene, especially rs2227983 (K521R) and rs1050171 (Q787Q), may contribute to glioma susceptibility in the Korean population. Furthermore, it is essential for researchers in different populations to perform association studies of EGFR variants with glioma samples isolated from local population, as glioma-associated genetic variants may vary by ethnicity. This study will be useful for understanding and predicting the effect of SNPs on glioma susceptibility in Korean populations.