Main

The chromosome 9p21.3 region has been identified as a genetic susceptibility locus for multiple disease phenotypes, including coronary artery disease, diabetes and cancer (Pasmant et al, 2011). This region encompasses several genes, including the tumour suppressor genes cyclin-dependent kinase inhibitor 2A (CDKN2A) and CDKN2B, a non-coding RNA (CDKN2BAS, or ANRIL) and methylthioadenosine phosphorylase (MTAP). CDKN2A and CDKN2B are well-recognised tumour suppressor genes involved in the regulation of the cell cycle, aging, senescence and apoptosis (Yang et al, 2010b). CDKN2A confers susceptibility to familial melanoma, and germline CDKN2A mutations occur in about 20% of melanoma families (Goldstein, 2004). CDKN2A encodes both p16 (INK4A), a negative regulator of cyclin-dependent kinases, and p14 (ARF), an activator of p53. The exact function of CDKN2BAS is unknown, but it has been shown to regulate the expression of CDKN2A/2B, and SNPs in this locus have been associated with cardiovascular disease, cancer and other diseases in genome-wide association studies (GWAS) (Yap et al, 2010; Pasmant et al, 2011). MTAP, identified by GWAS as a naevus- and melanoma-associated gene (Bishop et al, 2009; Falchi et al, 2009), encodes an enzyme involved in polyamine metabolism. Loss of MTAP expression can exert a tumour-promoting effect and has been observed in a variety of other tumours (Stevens et al, 2009), suggesting that MTAP may function as a tumour suppressor gene. The 9p21.3 region also includes a cluster of type I interferon (IFN) genes, which encode pleiotropic cytokines with strong antiviral, antiproliferative and immunomodulatory effects (Stark et al, 1998).

Beyond the well-established role of 9p21.3 in melanoma, deletions of this region have been observed in a variety of cancers (van der Riet et al, 1994; Okami et al, 1997; Waber et al, 1997; Nakanishi et al, 1999; Perinchery et al, 1999; Schmid et al, 2000; Sanchez-Cespedes et al, 2001; Hu et al, 2004; Hustinx et al, 2005; Bartoletti et al, 2007; Gu et al, 2008), and SNPs in 9p21.3 have been associated with breast cancer, melanoma and glioma by GWAS (Bishop et al, 2009; Shete et al, 2009; Wrensch et al, 2009; Turnbull et al, 2010). These findings are consistent with a broad role for 9p21.3 genes in carcinogenesis. However, whether genetic polymorphisms in 9p21.3 confer susceptibility to other cancers remains unclear. The goal of the current study was to systematically evaluate the associations of variants in 9p21.3 with the risk of multiple cancers and tumours.

Materials and methods

Study population

This study sample included data from eight studies that participated in iSelect, a joint project in the Division of Cancer Epidemiology and Genetics of the National Cancer Institute (NCI) with the goal of evaluating common genetic variants in selected genes and pathways in multiple tumours, especially rare cancers (Gao et al, 2009; Yang et al, 2010a; Gao et al, 2011; Mirabello et al, 2011; Han et al, 2012; Neta et al, 2012). The study samples comprised seven cancers (renal cell carcinoma (RCC), endometrial cancer (EC), thyroid cancer (ThC), colorectal cancer (CRC), oesophageal squamous cell carcinoma (ESCC), gastric cardia adenocarcinoma (GCA) and osteosarcoma (OS)) and one benign condition (colorectal adenoma (CA)). Study participants were Chinese (ESCC and GCA studies) or white (all other studies). The study designs included nested case–control (RCC (1994; Prorok et al, 2000; Han et al, 2012), CRC and CA (Gao et al, 2011)), population-based case–control (EC (Yang et al, 2010a)), hospital-based case–control (OS (Troisi et al, 2006; Mirabello et al, 2011)) and case–control studies of mixed design (ThC (Neta et al, 2012), ESCC and GCA (Blot et al, 1993; Gao et al, 2009)). After excluding subjects with a low genotyping completion rate (<80%), the final analysis for each tumour outcome included 437 RCC cases and 1603 controls; 417 EC cases and 407 controls; 344 ThC cases and 452 controls; 393 CRC cases and 434 controls; 1234 CA cases and 1368 controls; 1027 ESCC cases and 1452 controls; 753 GCA cases and 1452 controls; and 96 OS cases and 1428 controls. We pooled controls for ESCC and GCA (1452 controls in total), as these cases were drawn from the same underlying studies. The RCC and OS studies shared subsets of PLCO controls with the CA study (1170 and 1363 controls, respectively). Detailed information for each study is summarised in Table 1 and Supplementary Table S1.

Table 1 Description of the study samples

After correction for multiple testing, the ESCC, GCA and CA studies had adequate power (94%, 87% and 98%, respectively) to detect an association with a SNP of minor allele frequency (MAF) 0.35 and an odds ratio (OR) of 1.4, whereas all other studies were underpowered (power <80%). However, our aim was to identify genetic variants in the 9p21 region associated with multiple cancer/tumour outcomes using data combined across studies.
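The text does not say which test or model underlies these power figures. As a rough cross-check, the Python sketch below approximates power for a per-allele (allelic) test, assuming a multiplicative model, a Wald statistic on the log OR and Woolf's variance formula. It is not necessarily the authors' method, so its numbers may differ somewhat from the reported 94%, 87% and 98%. The threshold 0.05/(203 × 8) is the Bonferroni SNP-level threshold defined under Statistical analyses, and the function name `allelic_power` is hypothetical.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases, n_controls, maf, odds_ratio, alpha):
    """Approximate power of a two-sided Wald test on the allelic log odds ratio.

    Assumptions (not taken from the study): a multiplicative per-allele model,
    alleles treated as independent draws, and Woolf's variance for the log OR.
    """
    norm = NormalDist()
    # Minor-allele frequency expected in cases when each allele multiplies the odds by OR
    p_case = odds_ratio * maf / (1.0 - maf + odds_ratio * maf)
    # Expected allele counts (two alleles per subject)
    a = 2 * n_cases * p_case
    b = 2 * n_cases * (1.0 - p_case)
    c = 2 * n_controls * maf
    d = 2 * n_controls * (1.0 - maf)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    shift = abs(math.log(odds_ratio)) / se
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    # Probability that |Z| exceeds the critical value under the alternative
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Illustrative usage with case/control counts from the text and the SNP-level threshold
alpha_snp = 0.05 / (203 * 8)
for name, cases, controls in [("ESCC", 1027, 1452), ("GCA", 753, 1452),
                              ("CA", 1234, 1368), ("OS", 96, 1428)]:
    print(name, round(allelic_power(cases, controls, 0.35, 1.4, alpha_snp), 2))
```

Because the pooled ESCC/GCA controls are shared, the larger case count is what separates the two in this approximation.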

SNP selection, genotyping and quality control

SNP selection, genotyping and quality control have been described previously (Yang et al, 2010b). In brief, 252 tag SNPs for 22 genes located in the chromosome 9p21.3 region (19.9–32.8 Mb) were genotyped at the NCI Core Genotyping Facility (Advanced Technology Center, Gaithersburg, MD; http://snp500cancer.nci.nih.gov) using a custom-designed iSelect Infinium assay (Illumina, www.illumina.com). From telomere to centromere, these genes were: IFNB1, IFNW1, IFNA21, IFNA10, IFNA16, IFNA17, IFNA14, IFNA5, KLHL9, IFNA6, IFNA2, IFNA8, IFNA1, IFNE1, MTAP, CDKN2A, CDKN2B, CDKN2BAS, TUSC1, PLAA, IFNK and ACO1. For each gene, SNPs were selected from the region spanning 20 kb 5′ of the transcription start site (exon 1) to 10 kb 3′ of the last exon. Tag SNPs with MAF >5% were selected from HapMap data for white (CEU) and Yoruba (YRI) samples using a tagging algorithm (Carlson et al, 2004). Selected SNPs are listed in Supplementary Table S2.
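The tagging step can be illustrated with a simplified greedy LD-binning sketch in the spirit of Carlson et al (2004). The r² threshold of 0.8, the single-population input and the names `greedy_tag_snps`, `maf` and `r2` are assumptions for illustration; the text states only the MAF criterion and the HapMap CEU/YRI source data, and the actual selection combined two populations.

```python
def greedy_tag_snps(snps, maf, r2, maf_min=0.05, r2_min=0.8):
    """Greedy LD binning (simplified sketch, one population).

    snps: list of SNP ids; maf: dict id -> minor allele frequency;
    r2: dict frozenset({id1, id2}) -> pairwise r^2 (missing pairs treated as 0).
    Returns a list of (tag SNP, sorted bin members).
    """
    # Only SNPs passing the MAF criterion are eligible for binning
    remaining = {s for s in snps if maf[s] > maf_min}
    bins = []

    def neighbours(s):
        # Remaining SNPs in LD with s at or above the r^2 threshold
        return {t for t in remaining
                if t != s and r2.get(frozenset((s, t)), 0.0) >= r2_min}

    while remaining:
        # Choose the SNP that captures the most remaining SNPs (ties: first in sorted order)
        tag = max(sorted(remaining), key=lambda s: len(neighbours(s)))
        members = {tag} | neighbours(tag)
        bins.append((tag, sorted(members)))
        remaining -= members
    return bins


# Hypothetical example: rs5 fails the MAF filter, rs1 tags rs2 and rs3
r2_example = {frozenset(("rs1", "rs2")): 0.9, frozenset(("rs1", "rs3")): 0.85,
              frozenset(("rs2", "rs3")): 0.7, frozenset(("rs3", "rs4")): 0.1}
maf_example = {"rs1": 0.3, "rs2": 0.25, "rs3": 0.2, "rs4": 0.4, "rs5": 0.02}
print(greedy_tag_snps(list(maf_example), maf_example, r2_example))
```

Every SNP in a bin is in LD with the chosen tag, so genotyping one SNP per bin captures the others at the assumed r² level.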

The iSelect panel was validated using all three HapMap analysis panels (CEU, YRI and Japanese/Chinese (JPT+CHB)). SNPs with a low genotyping completion rate (<90%), a low concordance rate (<95%) or deviation from Hardy–Weinberg equilibrium among controls (P<0.001) were excluded separately for each participating study. The number of SNPs included in the final analyses was 170 for RCC, 202 for EC, 195 for ThC, 193 for CRC, 203 for CA, 139 for ESCC and GCA, and 200 for OS. In the ESCC and GCA study, more SNPs were excluded because of low MAF (<5%), probably reflecting allele frequency differences between white and Asian populations.
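A minimal Python sketch of these per-study SNP filters follows. The text does not specify the Hardy–Weinberg test, so a 1-df chi-square goodness-of-fit test is assumed here (an exact test is a common alternative); applying the 5% MAF filter to control genotype counts is likewise an assumption, and the function names are hypothetical.

```python
import math


def hwe_p_value(n_hom_major, n_het, n_hom_minor):
    """P-value of a 1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_hom_major + n_het + n_hom_minor
    p = (2 * n_hom_major + n_het) / (2 * n)  # major-allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_major, n_het, n_hom_minor)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(completion, concordance, control_genotypes,
              min_completion=0.90, min_concordance=0.95,
              hwe_alpha=0.001, min_maf=0.05):
    """Keep a SNP only if it meets every threshold stated in the text.

    control_genotypes: (hom. major, het., hom. minor) counts among controls.
    """
    n_hom_major, n_het, n_hom_minor = control_genotypes
    n = n_hom_major + n_het + n_hom_minor
    maf = (2 * n_hom_minor + n_het) / (2 * n)
    maf = min(maf, 1.0 - maf)
    return (completion >= min_completion
            and concordance >= min_concordance
            and maf >= min_maf
            and hwe_p_value(*control_genotypes) >= hwe_alpha)


# Hypothetical SNP: 96% complete, 99.5% concordant, controls in HWE
print(passes_qc(0.96, 0.995, (250, 500, 250)))
```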

Statistical analyses

We first assessed the association between each SNP and each cancer outcome separately. Unconditional logistic regression was used to estimate ORs, 95% confidence intervals (CIs) and P-values for trend, using additive coding of genotypes (0, 1 or 2 minor alleles), with the common-allele homozygote as the reference group. When fewer than five subjects were homozygous for the minor allele, heterozygous and minor-allele homozygous genotypes were combined and a dominant genetic model was used. Covariate adjustment for each tumour outcome was determined in consultation with the principal investigators of each study (Table 1).
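The genotype coding rule and the per-SNP model can be sketched in plain Python as follows. This is an unadjusted, single-covariate logistic regression fitted by Newton–Raphson, a stand-in for the covariate-adjusted models actually used (which would normally be fitted with a standard routine such as R's `glm`); the function names and the example data are hypothetical.

```python
import math
from statistics import NormalDist


def code_genotypes(minor_allele_counts, min_hom_minor=5):
    """Additive 0/1/2 coding; collapse to dominant 0/1 if minor homozygotes are too rare."""
    n_hom_minor = sum(1 for g in minor_allele_counts if g == 2)
    if n_hom_minor < min_hom_minor:
        return [1 if g > 0 else 0 for g in minor_allele_counts], "dominant"
    return list(minor_allele_counts), "additive"


def _score_and_information(x, y, b0, b1):
    """Score vector and 2x2 Fisher information of the logistic log-likelihood."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)


def snp_logistic(x, y, max_iter=50, tol=1e-10):
    """Unadjusted logistic regression of case status y (1/0) on genotype score x.

    Returns (OR, lower 95% CI, upper 95% CI, Wald P) for the genotype term.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = _score_and_information(x, y, b0, b1)
        det = i00 * i11 - i01 * i01
        step0 = (i11 * g0 - i01 * g1) / det  # Newton step: inverse information times score
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    _, (i00, i01, i11) = _score_and_information(x, y, b0, b1)
    se = math.sqrt(i00 / (i00 * i11 - i01 * i01))  # sqrt of the (b1, b1) element of I^-1
    norm = NormalDist()
    z975 = norm.inv_cdf(0.975)
    p_value = 2.0 * (1.0 - norm.cdf(abs(b1) / se))
    return math.exp(b1), math.exp(b1 - z975 * se), math.exp(b1 + z975 * se), p_value


# Hypothetical data: 100 cases then 100 controls, only 3 minor homozygotes in total
genotypes = [0] * 60 + [1] * 38 + [2] * 2 + [0] * 75 + [1] * 24 + [2] * 1
status = [1] * 100 + [0] * 100
x, model = code_genotypes(genotypes)
print(model, snp_logistic(x, status))
```

With a single binary (dominant) covariate, the fitted OR equals the 2 × 2 cross-product ratio, here (40 × 75)/(60 × 25), which makes the sketch easy to check by hand.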

To examine whether 9p21 variants were associated with multiple cancer/tumour outcomes, we conducted meta-analyses combining data from the eight studies. To combine SNP results across studies, we first used a standard fixed-effect meta-analysis and then a newly developed subset-based statistical approach (ASSET) (Bhattacharjee et al, 2012). ASSET is a modified fixed-effect meta-analysis that allows SNP effects to be heterogeneous across outcomes by exhaustively exploring subsets of the studies for the presence of association signals. The ASSET test statistic Z(S) for a given subset S of k studies is a weighted sum of the k study-specific test statistics, Z(S) = a1 Z1 + … + ak Zk, where the weight aj is the square root of the proportion of the total sample size of the studies in subset S that is contributed by the jth study. The overall evidence of association for the SNP is then based on Zmax = maxS |Z(S)|, i.e. the maximum of the subset-specific test statistics over all possible subsets of the studies. Under the null hypothesis, the vector of values Z(S) has a multivariate normal distribution with mean zero and unit variances; the correlation between Z(A) and Z(B) for two different subsets A and B is given by Bhattacharjee et al (2012). We computed a two-sided version of the test that also allows the detection of effects in opposite directions. Both fixed-effect and subset-based meta-analyses were performed using the ‘ASSET’ R package, which can account for controls shared across studies. Because the minor allele of a SNP may differ across studies, we aligned the effects to a common allele before combining the data by multiplying the SNP beta-coefficients by 1 or −1.
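The following Python sketch illustrates the main ingredients described above: sign alignment of per-allele effects, a standard inverse-variance fixed-effect combination, and an exhaustive subset search for Zmax = maxS |Z(S)| with weights √(nj/n(S)). It is not the ASSET package. In particular, the sketch assumes independent studies (no shared controls) and obtains a P-value by Monte Carlo simulation under the null, whereas the package computes P-values in its own way and handles shared controls. All function names and example values are hypothetical.

```python
import itertools
import math
import random
from statistics import NormalDist


def align_beta(beta, coded_allele, reference_allele):
    """Flip a per-allele log OR so that it refers to a common reference allele."""
    return beta if coded_allele == reference_allele else -beta


def fixed_effect(betas, ses):
    """Inverse-variance fixed-effect meta-analysis: (pooled beta, SE, two-sided P)."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(beta / se)))
    return beta, se, p


def subset_z(z, n, subset):
    """Z(S) = sum over j in S of sqrt(n_j / n(S)) * Z_j."""
    n_s = sum(n[j] for j in subset)
    return sum(math.sqrt(n[j] / n_s) * z[j] for j in subset)


def subset_search(z, n):
    """Exhaustive search: Zmax = max over non-empty subsets S of |Z(S)|, plus the best S."""
    best, best_subset = -1.0, ()
    for k in range(1, len(z) + 1):
        for subset in itertools.combinations(range(len(z)), k):
            value = abs(subset_z(z, n, subset))
            if value > best:
                best, best_subset = value, subset
    return best, best_subset


def zmax_p_value(zmax_obs, n, n_sim=2000, seed=1):
    """Monte Carlo null P-value for Zmax, assuming independent studies."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        z_null = [rng.gauss(0.0, 1.0) for _ in n]
        if subset_search(z_null, n)[0] >= zmax_obs:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Hypothetical example: three studies, signal concentrated in the first one
z_obs, n_obs = [3.5, 0.4, -0.2], [2000, 800, 800]
zmax, best = subset_search(z_obs, n_obs)
print(zmax, best, zmax_p_value(zmax, n_obs))
```

With independent studies, the square-root weights give each Z(S) unit variance under the null, which is what lets a single maximum be compared across subsets of different sizes.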

Gene-based analyses were performed on the 22 genes to assess the significance of the joint effect of multiple SNPs in each gene on each outcome separately. Gene-based P-values (Pgene) were computed using the minP method implemented in the Adaptive Rank Truncated Product (ARTP) program (Yu et al, 2009). The minimum SNP P-value within each gene was used as the test statistic, and its significance was assessed with a permutation test (10 000 permutations), which accounts for the number of SNPs genotyped in each gene and their linkage disequilibrium (LD) structure.
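The logic of the minP permutation test can be sketched in Python as below. Case–control labels are shuffled while each subject's multi-SNP genotypes stay together, so the LD structure among SNPs is preserved automatically. As a simplification, the per-SNP P-values come from an unadjusted trend test in its N·r² (1-df chi-square) form rather than from the covariate-adjusted logistic models; this is not the ARTP program, and the function names are hypothetical.

```python
import math
import random


def trend_p(x, y):
    """Trend test P-value using chi2 = N * r^2 (1 df), r = corr(genotype score, case status)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # monomorphic SNP or a single outcome class: no information
    chi2 = n * sxy * sxy / (sxx * syy)
    return math.erfc(math.sqrt(chi2 / 2.0))


def min_p(genotypes, y):
    """Smallest per-SNP P-value in a gene; genotypes is a list of per-SNP score vectors."""
    return min(trend_p(g, y) for g in genotypes)


def gene_minp_permutation(genotypes, y, n_perm=10_000, seed=1):
    """Observed minP and its permutation P-value (Pgene)."""
    rng = random.Random(seed)
    observed = min_p(genotypes, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # permute labels only; genotype vectors (and LD) untouched
        if min_p(genotypes, y_perm) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical gene with two SNPs, one strongly associated and one null
status = [1] * 20 + [0] * 20
gene = [[2] * 20 + [0] * 20, [0, 1] * 20]
print(gene_minp_permutation(gene, status, n_perm=500))
```

Because the same shuffled labels are applied to all SNPs at once, correlated SNPs yield correlated permutation P-values, which is how the test adjusts for the number of SNPs and their LD without a Bonferroni-style penalty.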

We used Bonferroni correction to account for the number of SNPs or genes and studies tested; P<3.1 × 10−5 (0.05/(203 × 8)) and Pgene<2.8 × 10−4 (0.05/(22 × 8)) therefore defined SNP-based and gene-based statistical significance, respectively. In the meta-analyses, P<2.46 × 10−4 (0.05/203) for the combined analysis was considered statistically significant after Bonferroni correction for the number of SNPs. All statistical analyses were performed using R software.
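The three Bonferroni thresholds above are simple arithmetic; a few lines of Python reproduce them. Using 203, the largest number of SNPs analysed in any single study (CA), makes the per-SNP threshold slightly conservative for studies in which fewer SNPs passed quality control.

```python
alpha = 0.05
n_snps_max, n_genes, n_studies = 203, 22, 8

p_snp = alpha / (n_snps_max * n_studies)  # per-SNP threshold across all studies
p_gene = alpha / (n_genes * n_studies)    # gene-based threshold
p_meta = alpha / n_snps_max               # combined (meta-analysis) threshold

print(f"SNP: {p_snp:.2e}  gene: {p_gene:.2e}  meta: {p_meta:.2e}")
# SNP: 3.08e-05  gene: 2.84e-04  meta: 2.46e-04
```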

Results

When analysing each study separately, we found one SNP in CDKN2A (rs3731239) that was significantly associated with ESCC after Bonferroni correction (P=7 × 10−6; Table 2a). The minor allele (G, MAF=0.12) of this SNP was associated with increased ESCC risk (OR=1.51, 95% CI=1.25–1.84 for AG vs AA; OR=1.88, 95% CI=1.04–3.41 for GG vs AA; Table 2a). As Figure 1 shows, the LD pattern and genotype frequencies among controls differed between the Chinese and Caucasian samples.

Table 2a Association between rs3731239 and ESCC in a Chinese population
Figure 1

Linkage disequilibrium structures and genotype frequencies of rs3731239 among controls of the Chinese (A) and Caucasian (B) samples. The LD maps (r²) were drawn with Haploview software, based on genotyping data from the control samples for ESCC (A) and CA (B). LD patterns in the other Caucasian studies were similar to that in CA.

We also found 18 additional SNPs that were associated with at least one tumour outcome at P<0.01 (Table 2b), although none of these associations was significant after Bonferroni correction. Among them, rs7023329 in MTAP, which was previously associated with melanoma and naevi in several GWAS (Bishop et al, 2009; Barrett et al, 2011), was associated with CA (P=0.0005). Another SNP previously identified by a GWAS of glioma (Shete et al, 2009), rs4977756 in CDKN2BAS, was associated with EC (P=0.009) and ESCC (P=0.002).

Table 2b Selected SNP-based results, with P-value<0.01 for at least one tumour outcome

In fixed-effect meta-analyses, only rs7023329 in MTAP showed a marginal association (fixed-effect P<0.05) before correction for multiple testing (Table 2c). In the subset-based approach (ASSET), rs7023329 showed a suggestive association with multiple tumours (positive-effect P=0.007), with the strongest signal obtained from the subset combining the CRC, CA and OS studies (Table 2c). In addition, the subset approach identified four SNPs (rs3731239 in CDKN2A, rs615552 and rs573687 in CDKN2B, and rs564398 in CDKN2BAS) that were significantly associated with EC and ESCC after Bonferroni correction (positive-effect P<2.46 × 10−4; Table 2c), although sensitivity analyses excluding ESCC indicated that these associations were driven mainly by ESCC. For all SNPs with P≤0.01 in the subset analyses, the effects were in the same (positive) direction across the contributing study outcomes (Tables 2b and 2c).

Table 2c Meta-analyses of selected SNPs with two-sided subset search P≤0.01

Gene-based analyses showed that the CDKN2A gene was significantly associated with ESCC (Pgene=0.0001) and the association remained significant after adjusting for multiple testing (Table 3). Other genes in the nearby region, MTAP (Pgene=0.015), CDKN2B (Pgene=0.01) and CDKN2BAS (Pgene=0.009), also showed suggestive associations with ESCC (Table 3). In addition, MTAP showed a suggestive association with CA (Pgene=0.006).

Table 3 Gene-based P-values for 9p21.3 genes in association with eight tumour outcomes

Discussion

In this study, we evaluated associations of up to 203 SNPs in 22 genes located on chromosome 9p21.3 with the risk of eight tumour outcomes, using data from eight case–control studies. When analysing each tumour outcome separately, we identified a single SNP in CDKN2A (rs3731239) that was significantly associated with the risk of ESCC after correction for multiple comparisons. Gene-based analyses also showed that the CDKN2A gene was significantly associated with ESCC. In the subset-based meta-analyses, four SNPs (rs3731239 in CDKN2A, rs615552 and rs573687 in CDKN2B, and rs564398 in CDKN2BAS) showed significant associations with ESCC and EC. Two SNPs previously identified by GWAS, rs7023329 in MTAP (melanoma and naevi) and rs4977756 in CDKN2BAS (glioma), showed suggestive associations in our study with CA and with EC and ESCC, respectively. Our findings further highlight the importance of 9p21.3, in particular the MTAP-CDKN2A/2B/CDKN2BAS region, in the pathogenesis of multiple tumours.

The SNP rs3731239 has previously shown weak associations with breast cancer (Driver et al, 2008; Mavaddat et al, 2009) and ovarian cancer (Goode et al, 2009) in predominantly Caucasian populations. In our study, this SNP was significantly associated with ESCC in Chinese and only weakly associated with EC in Caucasians. The minor allele of this SNP is more common in Caucasians (frequency 0.39 among controls in our study) than in Chinese (0.12 among ESCC study controls). In addition, the two ethnic populations showed distinct LD patterns in the region flanking this SNP, which may also contribute to the observed differences in association.

Recent studies have suggested that the 9p21.3 region is enriched in regulatory sequences, such as enhancers, that regulate the expression of genes in this region (MTAP-CDKN2A/2B/CDKN2BAS) and downstream (such as IFNA21), thereby establishing a functional link between 9p21 genetic variation and immune signalling pathways (Harismendy et al, 2011). Interestingly, rs564398 in CDKN2BAS, which showed suggestive associations with both EC and ESCC in our study (Tables 2b and 2c), is located within a predicted enhancer sequence. The most significant SNP in our study, rs3731239 in CDKN2A, lies adjacent to the CDKN2A promoter region (about 500 bp from a CpG island and from predicted transcription factor binding and DNase I hypersensitive sites based on ENCODE data, http://www.genome.ucsc.edu/ENCODE/). A previous study correlating 9p21 SNPs with gene expression found that rs3731239 was significantly associated with allele-specific expression of CDKN2BAS (P=10−25) (Cunnington et al, 2010). Three other SNPs in the CDKN2B/CDKN2BAS locus (rs1063192, rs564398 and rs11790231) that showed suggestive associations with EC (and/or ESCC and ThC) in our study were also significantly associated with allele-specific expression of CDKN2BAS. CDKN2BAS, a non-coding RNA within the CDKN2A/2B locus, has been implicated by GWAS of multiple diseases; compared with the other three genes of the cluster (MTAP, CDKN2A and CDKN2B), its expression showed the strongest association with the multiple phenotypes linked to the 9p21.3 region (coronary disease, stroke, diabetes, melanoma and glioma) (Pasmant et al, 2011). CDKN2BAS is involved in regulating CDKN2A/2B expression through a cis-acting mechanism, as well as in regulating cell proliferation and senescence through pathways independent of CDKN2A/2B (Visel et al, 2010; Congrains et al, 2012).
Beyond CDKN2BAS, two SNPs in MTAP (rs10757257 and rs7027989) that were suggestively associated with CA in our study have also been identified as expression quantitative trait loci for MTAP (Zeller et al, 2010). These data, combined with previous publications, indicate that common genetic variants in this region may influence disease risk by regulating gene expression in cis. With rapid progress in mapping regulatory elements and the growing availability of cell- and tissue-specific gene expression data, future studies should be able to evaluate the functional relevance of genetic variants at 9p21.3.

Somatic 9p21 deletions frequently occur in human cancers such as bladder cancer, pancreatic cancer, oesophageal cancer, glioma and melanoma (Schmid et al, 2000; Hu et al, 2004; Hustinx et al, 2005; Bartoletti et al, 2007; Gu et al, 2008; Rakosy et al, 2008). In a previous study conducted in the same Chinese population from which the ESCC cases in the current study were drawn, most (73%) of the ESCC tumour specimens analysed showed loss of heterozygosity (LOH) at 9p21–22, and 25% (14 of 56) of tumours had CDKN2A mutations (point mutations, deletions and insertions) (Hu et al, 2004). In addition, the CDKN2A promoters are typically methylated in ESCC tumours (Roth et al, 2006).

Ours is the first systematic evaluation of genetic variation in the 9p21.3 region in relation to multiple tumour outcomes. The strengths of our study include the careful and comprehensive selection of genes across the entire 9p21.3 region, the application of a newly developed subset-based method to combine SNP data across multiple studies, and the use of a gene-based permutation method to evaluate the joint association of variants within each gene with cancer risk. In addition, SNPs were genotyped for all studies using the same platform and quality control procedures. Our findings suggest that combining data across multiple cancer outcomes may provide additional information for understanding disease associations with GWAS variants.

Our study has several limitations. First, the included studies varied in design and population ethnicity, and some shared controls, which posed challenges both for combining data and for generalising the findings. We therefore applied a new statistical approach specifically designed to handle heterogeneity across studies. Second, sample sizes were generally small, which may have limited the statistical power to identify significant associations, particularly in the smaller individual studies. Indeed, most associations were not significant after correction for multiple testing, the exception being the association of rs3731239 in CDKN2A with ESCC, one of the largest studies. However, Bonferroni correction is conservative, particularly when the tested SNPs are correlated through LD and for SNPs previously identified by GWAS; the associations observed in our study therefore warrant further investigation in larger samples.

In conclusion, our data indicate that genetic variants in the 9p21.3 region, particularly near the MTAP-CDKN2A/2B/CDKN2BAS locus, may be associated with ESCC and possibly several other tumours. Our findings further highlight the importance of the 9p21.3 region in disease susceptibility and cancer aetiology. Future studies are needed to further investigate the role of this chromosomal region in cancer pathogenesis. Data on somatic alterations and gene expression of this region in tumour tissue will be particularly helpful for identifying the mechanisms underlying the observed associations.