Association of Genome-Wide Association Study (GWAS) Identified SNPs and Risk of Breast Cancer in an Indian Population

To date, no studies have investigated the association of GWAS-identified SNPs with breast cancer (BC) risk in the Indian population. We investigated the association of 30 previously reported and replicated BC susceptibility SNPs in 1,204 cases and 1,212 controls from a hospital-based case-control study conducted at the Tata Memorial Hospital, Mumbai. As a measure of total susceptibility burden, the polygenic risk score (PRS) for each individual was defined as the weighted sum of genotypes from 21 independent SNPs, with weights derived from previously published estimates of association odds ratios. Logistic regression models were used to assess the risk associated with individual SNPs and the overall PRS, both overall and stratified by menopausal and receptor status. A total of 11 SNPs from eight genomic regions (FGFR2, 9q31.2, MAP3K1, CCND1, ZMIZ1, RAD51L1, ESR1 and UST) showed statistically significant (p-value ≤ 0.05) evidence of association, either overall or when stratified by menopausal status or hormone receptor status. BC SNPs previously identified in Caucasian populations showed evidence of replication in the Indian population, mainly with respect to risk of postmenopausal and hormone receptor positive BC.


Design of Custom SNP Panel. A customized panel of 384 SNPs was designed by including GWAS-identified BC risk loci, BC SNPs identified from candidate gene studies, SNPs previously reported to be associated with obesity-related traits and other SNPs in obesity genes. This paper focuses on a subset of 31 BC GWAS SNPs identified in Caucasian and East Asian populations using the Human Genome Epidemiology (HuGE) Navigator and the National Institute of Health (NIH) GWAS Catalog 12,13. We also included 45 BC SNPs from candidate gene studies. Only 30 BC GWAS SNPs and 42 BC candidate SNPs were used for the final analysis after quality assessment. For BC GWAS SNPs, only SNPs with a P value < 5 × 10⁻⁸ were included in the analysis. Duplicate SNPs between the HuGE Navigator and the NIH GWAS Catalog were removed. There were no overlapping SNPs between GWAS and candidate gene studies. The panel was designed in March 2011 and therefore SNPs identified after this time were not included.
Since results from candidate gene SNP studies have had a poor record of replication 14, we have presented those results in Supplementary Table S3 and have not evaluated them in the main tables.
Quality Assessment. The reproducibility rate of replicate samples (n = 160) for all assays was > 98%. Examination of negative controls indicated no inter-sample contamination. A designability rank score (0-1.0) was calculated for each SNP by Illumina for conversion of the SNP into a successful GoldenGate Assay. All SNPs had a score of 1.0, indicating a high success rate. Following completion of the assay, data were cleaned using the Illumina Genome Studio software version 1.9.4. Automatic allele calling was performed using a GenCall (GC) threshold of 0.25. The software assigned three clusters on a graph based on the fluorescence obtained. Seventeen samples with a call rate < 90% were excluded, and a total of 2,399 samples (1,194 cases and 1,205 controls) were included in the final analysis. One SNP with MAF < 1% and 3 SNPs with diffuse clusters in our study population were excluded, yielding a list of 30 GWAS SNPs and 42 candidate SNPs (i.e. a total of 72 SNPs across 41 genes) for the final analysis. All SNPs had a call frequency above 95%. No deviation from Hardy-Weinberg equilibrium (HWE) was observed using the chi-square test (deviation defined as P < 0.001), and all SNPs had a GenTrain score of 0.4 or above. All quality control dashboards provided in the Genome Studio software showed that the quality of the assays was satisfactory, including allele-specific extension, PCR uniformity, extension gap, and first and second hybridization.
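The HWE check described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `hwe_chi_square` and the genotype counts are hypothetical, and the sketch assumes a standard 1-degree-of-freedom Pearson chi-square test on the three genotype counts of a biallelic SNP.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium.

    Takes the observed counts of the three genotypes of a biallelic SNP
    and returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip genotype classes with zero expected count (monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, for illustration only.
chi2, p_value = hwe_chi_square(25, 50, 25)
flagged = p_value < 0.001   # threshold stated in the text
```

With the threshold used in the text, a SNP would be flagged only when `p_value < 0.001`.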
Out of 1,194 cases and 1,205 controls, information on menopausal status could be obtained for 1,193 cases (607 premenopausal and 586 postmenopausal) and 1,191 controls (650 premenopausal and 541 postmenopausal). When we stratified cases by hormone receptor status, there were 408 estrogen receptor positive (ER+)/progesterone receptor positive (PR+) cases, 529 estrogen receptor negative (ER−)/progesterone receptor negative (PR−) cases irrespective of their HER2 status, and 340 triple negative breast cancer (TNBC) cases.
Statistical Analysis. Unconditional logistic regression was used to estimate adjusted odds ratios (OR) and corresponding 95% confidence intervals (CI) for the association between genotype and BC case-control status. The model was adjusted for age (continuous variable) and region of residence (North, South, East, West and Central India). An additive model of inheritance (continuous effect of an increasing number of variant alleles: 0 versus 1 versus 2) was assumed, and the genotypes were coded as 0 = wild type, 1 = heterozygous and 2 = homozygous variant. Further analyses were performed by menopausal and hormone receptor status. ORs for all SNPs were reported with respect to the risk allele as identified in previous GWAS. All statistical tests were two-sided, and a P value equal to or less than 0.05 was considered statistically significant. To investigate the association between BC risk and total susceptibility burden defined by the combination of the SNPs, a polygenic risk score (PRS) was derived for each individual using the formula $\mathrm{PRS} = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$. For any given SNP $i$, $\beta_i$ is the log odds ratio associated with the risk allele reported in the literature from prior GWAS conducted in Caucasian populations 2, and $x_i$ is the number of risk alleles carried by an individual in our study. Only independent SNPs were included in the PRS analysis. If there were multiple SNPs in LD (r² > 0.2) from one region, the SNP with the strongest association signal reported in previous GWAS was picked for our analysis, resulting in a total of 21 independent SNPs. Logistic regression models were used to estimate the odds ratios for BC by percentile of the PRS, with the 25th percentile as the reference. Based on the previously reported ORs of all the SNPs, their allele frequencies in the Indian population, and the sample size, we estimated the power of replication of each SNP in the current study.
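The PRS definition above can be sketched in a few lines. The SNP labels, odds ratios, genotype counts and quartile cut points below are hypothetical placeholders, not values from this study or from the cited GWAS; the helper names `polygenic_risk_score` and `quartile` are likewise illustrative.

```python
import math

# Hypothetical published per-allele odds ratios (NOT values from this study).
published_or = {"snp_a": 1.26, "snp_b": 1.13, "snp_c": 1.08}

def polygenic_risk_score(risk_allele_counts, odds_ratios):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP).

    Each weight is the log of the published per-allele odds ratio.
    """
    return sum(math.log(odds_ratios[snp]) * count
               for snp, count in risk_allele_counts.items())

def quartile(score, cut_points):
    """Map a score to quartile 1-4 given the three cut points (25th, 50th, 75th)."""
    return 1 + sum(score > c for c in cut_points)

# One hypothetical individual: 2, 1 and 0 risk alleles at the three SNPs.
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
prs = polygenic_risk_score(person, published_or)

# Hypothetical cut points; how they were derived in the study is not stated here.
prs_quartile = quartile(prs, (0.2, 0.4, 0.6))
```

Once each individual is assigned to a quartile, the quartile indicator can be entered into the logistic regression model with the lowest category as the reference.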
The probability of observing a larger number of significant associations (P ≤ 0.05) than would be expected by chance is a function of the binomial distribution 15. We conducted a global test of the hypothesis that the number of significant associations with BC (P ≤ 0.05) was greater than expected for the number of loci tested. All analyses were performed using the statistical software Stata version 12.0 16.
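As a minimal sketch of the binomial reasoning, the functions below compute an exact upper-tail probability and a two-sided test against a null proportion of 0.5. The function names are illustrative, and the exact form of the test used in the study is not stated; the two-sided version is an assumption. Under that assumption, 22 of 30 SNPs with effects in the previously reported direction gives a P value of about 0.016, in line with the value reported in the Results.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def sign_test_two_sided(k, n):
    """Two-sided exact binomial test of k successes in n trials against p = 0.5."""
    upper = binom_upper_tail(k, n, 0.5)          # P(X >= k)
    lower = 1 - binom_upper_tail(k + 1, n, 0.5)  # P(X <= k)
    return min(1.0, 2 * min(upper, lower))

# 22 of 30 SNPs with the same direction of effect as in previous GWAS.
p_direction = sign_test_two_sided(22, 30)
```

The same `binom_upper_tail` function could be used with `p = 0.05` to ask whether the number of nominally significant SNPs exceeds what chance alone would produce.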

Results
A total of 1,194 cases and 1,205 controls were included in the final analysis. Table 1 describes the distribution of cases and controls with respect to age, education, region of residence at enrolment, menopausal status, and family history of breast, ovary or endometrial cancer. The risk of TNBC was two-fold higher in women with a family history of breast, ovary or endometrial cancer (OR = 2.00; 95% CI = 1.01-3.97) (data not shown). The direction of effect was the same as in previously reported GWAS for 22 of the 30 SNPs, but different for rs10069690 (TERT), rs13387042 (TNP1), rs1562430 (FAM84B), rs2180341 (RNF146), rs2981575 (FGFR2), rs3757318 (C6orf97), rs6504950 (STXBP4) and rs999737 (RAD51L1) (Table 2). For the risk of overall BC, we confirmed previously reported associations for 5 GWAS-identified BC susceptibility SNPs, with the exception of rs2981575, which was associated with BC risk in our population but with the reverse direction of effect (Table 2). A binomial test for enrichment, applied to the 22 of 30 GWAS BC SNPs with effects in the same direction as in previous GWAS, indicated that this pattern was unlikely to be due to chance (p-value = 0.016). When cases were stratified by menopausal status, a number of GWAS-identified BC SNPs appeared to show a stronger association in postmenopausal than in premenopausal women (Table 3). In particular, 7 SNPs achieved statistical significance (P ≤ 0.05) for association with postmenopausal BC: FGFR2 (rs1219648, rs2981575, rs2981579 and rs2981582), MAP3K1 (rs889312), ESR1 (rs2046210) and 9q31.2 (rs865686), whereas only one SNP, RAD51L1 (rs999737), showed an association with premenopausal BC. Details of all SNPs stratified by menopausal status are presented in Supplementary Table S1. Analyses by ER/PR status showed that a total of 8 SNPs achieved statistical significance for association with ER+/PR+ BC.
In addition, rs2046210 (ESR1) and rs9485372 (UST) showed a statistically significant increased risk of ER−/PR− BC and TNBC but not of ER+/PR+ BC (Table 4). The minor alleles of rs1219648, rs2981579, rs2981582 (FGFR2), rs889312 (MAP3K1), rs614367 (CCND1) and rs704010 (ZMIZ1) increased the risk, whereas the T allele of rs2981575 (FGFR2) and the C allele of rs999737 (RAD51L1) decreased the risk of hormone receptor positive BC (Table 4). Details of all SNPs analysed by hormone receptor status are presented in Supplementary Table S2. The distribution of the PRS was shifted upwards in cases compared with controls. Among all the outcomes considered, the PRS showed the strongest association with the risk of BC among postmenopausal cases. In particular, postmenopausal women in the highest quartile of the PRS had an 83% increased BC risk compared with women in the lowest quartile. Results were similar when study participants with a family history of breast, ovary or endometrial cancer were excluded from the analysis (Table 5).

We also attempted to replicate associations for 42 BC SNPs previously identified in candidate gene studies and observed significant associations in 3 regions, viz. rs2420946 (FGFR2), rs3218408 (XRCC2), and rs1641535 and rs1641536 (ATP1B2) (Supplementary Table S3).

Discussion
We used 1,194 cases and 1,205 controls from a hospital-based case-control study in India to report evidence of replication for BC susceptibility SNPs that have previously been identified primarily through GWAS conducted in Caucasian populations. To our knowledge, this is the first study to evaluate the risk of BC associated with GWAS-identified SNPs in the Indian population. The study population is unique in that it is an unscreened population with no use of hormone replacement therapy and 52% premenopausal women. The average tumour size in our study cases was well over 2 cm 17 and cases were not diagnosed mammographically. Out of 30 GWAS-identified SNPs analysed, 11 SNPs from eight genomic regions (FGFR2, 9q31.2, MAP3K1, CCND1, ZMIZ1, RAD51L1, ESR1 and UST) showed statistically significant (P ≤ 0.05) evidence of association, either overall, when stratified by menopausal status, or when analysed separately by hormone receptor status. The direction of effect was the same as in the previously reported GWAS results for 22 of the 30 SNPs.
We also attempted to replicate associations for 42 SNPs reported to be associated with BC in candidate gene studies. We observed statistical significance for only 4 SNPs (9.5%), indicating that associations observed in candidate gene studies are more prone to false positive results.
Our current data support the conclusions of previous GWAS 18,19 that FGFR2 (a tumour suppressor gene) polymorphisms (rs2981582 and rs2981575), first identified as susceptibility loci for BC in Caucasian populations 20,21, are associated with overall BC. The risk allele frequencies of FGFR2 SNPs were similar to those reported in previous GWAS, with the exception of rs2981575, which was much more common in our population. rs865686 in 9q31.2 was significantly associated with BC. Furthermore, the rare allele (G) frequency of rs865686 obtained in our study (MAF = 0.14) was comparable to that reported in a previous study in Asians (MAF = 0.09) 22.
Consistent with results from Caucasian and East Asian populations, we found that both rs889312 (MAP3K1) and rs2046210 (close to ESR1) were associated with increased risk of overall BC in our Indian population.

rs889312 lies in an LD block of approximately 280 kb which includes the MAP3K1 gene 23. The MAP3K1 gene encodes a 196-kDa serine/threonine protein kinase that activates the extracellular signal regulated kinase (ERK), c-Jun NH2-terminal kinase (JNK) and nuclear factor-κB (NF-κB) pathways 24. Downstream signal transduction genes regulate cell survival, differentiation, proliferation and apoptosis, and appear to be involved in tumour development and tumour progression 25-27. The SNP rs2046210 is located 180 kb upstream of the transcription initiation site of the first coding exon of the ESR1 gene. Considering the relative vicinity of rs2046210 to the ESR1 gene, it has been speculated that either the SNP itself, or causal variants in LD with it, might alter ESR1 gene expression, thus affecting susceptibility to BC. Functional genomic analyses and in vitro functional experiments conducted by Cai et al. 28 provided no support for the potential involvement of the polymorphism itself in the regulation of ESR1. The function of this SNP therefore is still unclear; future fine-mapping of the BC susceptibility loci tagged by rs2046210 is warranted and the underlying biological mechanism of this polymorphism needs further investigation.
Results of our hormone receptor analyses replicated previously reported GWAS loci for hormone receptor positive BCs in FGFR2 (rs1219648, rs2981575, rs2981579 and rs2981582) 29, MAP3K1 (rs889312) 30,31, CCND1 (rs614367) and ZMIZ1 (rs704010) 3, although the inverse direction of effect, compared with the previous GWAS reports, for FGFR2 (rs2981575) and RAD51L1 (rs999737) was unexpected. Our observed association of increased risk with rs2046210 (ESR1) was consistent with previous studies suggesting that rs2046210 tends to increase BC risk in ER− tumours by a greater magnitude than in ER+ tumours 32-34. Consistent with the literature, we also found SNP rs9485372 (UST) to be associated more strongly with ER−/PR− BC and TNBC 1.
Our analyses stratified on menopausal status showed 7 SNPs to be associated with postmenopausal BC, as opposed to only 1 SNP associated with premenopausal BC. SNPs in FGFR2 (rs2981582, rs2981575, rs1219648 and rs2981579), MAP3K1 (rs889312) and the 9q31.2 region (rs865686) were associated with BCs in postmenopausal women. Large scale GWAS studies have previously reported that the association of rs865686 is stronger in postmenopausal women 2,22,35 . Our observed associations of rs2046210 and BC were also consistent with prior literature suggesting that risk in postmenopausal BC cases was greater than risk in premenopausal BC cases 4 .
We observed evidence of replication of association for previously reported GWAS BC SNPs mostly among postmenopausal or ER+/PR+ BC patients. This is not surprising given that these SNPs were identified mainly in American/European populations comprised largely of postmenopausal/older women 21,36,37. Among the SNPs associated with ER+/PR+ BC, most (80%) were significant for postmenopausal women (data not shown). None of the 19 SNPs that could not be replicated in the present study (except rs2180341) had statistical power of 80% or more for replication (Supplementary Table S4), suggesting that larger sample sizes may be needed in order to detect an association. Nonetheless, the consistency analysis showed that the similarity in direction of effect for these 18 non-replicated SNPs was greater than expected by chance (P = 0.016). On the other hand, the null observation in our study for rs2180341 (an association initially observed in an Ashkenazi Jewish GWAS population), even with 98% power for replication, is more likely to reflect a true difference in association across different ethnicities.
Our results for the 21-SNP PRS (including only the strongest SNP from each group of SNPs in LD, r² > 0.2) showed that postmenopausal women in the highest quartile of the PRS had an OR of 1.83 (95% CI = 1.28-2.60) when compared to women in the lowest quartile. An increase in the risk of ER+/PR+ BC was also observed, with OR = 1.36 (95% CI = 1.04-1.79) per unit increase in PRS.
In conclusion, our study provides early evidence that the genetic architecture of postmenopausal and/or hormone receptor positive BC in the Indian population may be similar to that of Caucasian populations. More population-based studies in the Indian population are needed in order to identify additional BC susceptibility SNPs, especially for hormone receptor negative BCs.