INTRODUCTION

Breast cancer is the most common cancer in women and a common cause of cancer-related death worldwide.1 The risk of breast cancer is determined by both genetic and lifestyle factors.2 The variation in breast cancer incidence between populations can be explained in part by differences in lifestyle factors, such as reproductive patterns or diet. However, there is substantial variation within a population that seems to be determined by inherited genetic risk factors, possibly modified by external factors.3 The overall contribution of inherited genes to the development of a disease can be quantified by the familial aggregation of the disease. Epidemiological studies have shown that breast cancer is about twice as common in first-degree relatives of women with the disease as in the general population, reflecting the heritable component of the disease.4 Twin studies have demonstrated a considerably higher risk in monozygotic than in dizygotic twins of affected women, suggesting that the familial aggregation is largely determined by heritable rather than environmental risk factors.5, 6

Genome-wide association studies (GWAS), together with earlier linkage and candidate gene association studies, have identified many susceptibility genes, the most prominent being the high-penetrance genes BRCA1 and BRCA2, which account for up to 20% of hereditary breast cancer.7 In addition, other high-penetrance (TP53 and PTEN) and moderate-penetrance (CHEK2, ATM, BRIP1 and PALB2) genes were shown to predispose to breast cancer.8 However, these genes account for only 8% of the genetic risk of breast cancer,9 and it is unlikely that any single variant will have a major impact on risk prediction. The residual genetic risk is therefore likely to be due to a large number of common variants. The risk conferred by each of these alleles may be small, but the alleles may combine in an additive or synergistic fashion to affect breast cancer susceptibility.10, 11

In the past, breast cancer incidence in China was low, but a substantial increase in new cases is expected because of rapidly changing reproductive and lifestyle risk factors among Chinese women. While the overall cancer rate in urban Shanghai decreased by 0.5% per year between 1972 and 1994, the breast cancer incidence increased by about 50% over the same 23-year period.12 A recent study on reproductive and demographic changes stated that by the year 2021, the incidence of breast cancer is expected to increase from current rates of 10–60 cases per 100 000 women to more than 100 new cases per 100 000 women aged 55–69 years.13 Efficient future health-care planning urgently requires that this disease be incorporated into the Chinese public health-care system. Recently, a risk assessment model that integrated genetic and demographic factors was used to evaluate the importance of genetic risk factors for breast cancer screening and prevention programs.14

Currently, little is known about the differential role of single-nucleotide polymorphisms (SNPs) contributing to breast cancer in the Chinese population compared with the European population. However, the use of studies from multiple populations with different patterns of linkage disequilibrium (LD) can substantially reduce the number of variants that need to be subjected to post-GWAS functional analysis.15, 16

In this study, we evaluated the association between candidate SNPs and the risk of breast cancer in Chinese and German cohorts. Eighteen candidate SNPs were selected from previous GWAS in Caucasian and Chinese populations. The objective was to validate the SNPs in an independent German cohort (311 cases vs 960 controls) and to identify SNPs not previously associated with breast cancer risk in the Chinese population (984 cases vs 2206 controls) and vice versa.

MATERIALS AND METHODS

Study population

Two independent study groups were evaluated in this work. All Chinese samples used for genotyping were Chinese Han. These included 984 breast cancer cases and 2206 healthy controls obtained by doctors through collaborations with multiple hospitals from provinces in the central area of China. All Chinese controls were clinically assessed to be without breast cancer, other neoplastic diseases, systemic disorders or family history of neoplastic diseases (including first-, second- and third-degree relatives). All breast cancer patients were diagnosed and categorized according to the TNM breast cancer classification. Clinical information was collected from the affected individuals through a full clinical checkup by breast cancer specialists. Additional demographic information was collected from cases through a structured questionnaire. The German cases were obtained from the University Medical Center Mannheim and other hospitals in the German Rhein-Neckar region and included 311 cases and 960 controls. German controls were derived from healthy blood donors, obtained by the German Red Cross and have partly been used as control groups in previous studies on breast cancer.17, 18, 19 The characteristics of the two study populations are presented in Table 1. All of the cases and controls were females. Cases with one or more first-degree relatives having breast or ovarian cancer were considered to have a family history of breast cancer. All participants provided written informed consent. The study was approved by the Institutional Ethical Committee of Anhui Medical University and the Medical Ethics Commission II of the Medical Faculty of Mannheim, University Heidelberg and was conducted according to Declaration of Helsinki principles.

Table 1 Baseline patient characteristics of breast cancer cases and controls

SNP selection

In recent years, several GWAS have identified more than 27 SNPs within 20 different loci to be associated with the risk of breast cancer. Most of these studies were conducted on women of European descent. In this study, 18 SNPs from previously published GWAS in the European20, 21, 22, 23, 24, 25, 26, 27 and Chinese28, 29 populations were selected (Table 2). Only SNPs with a minor allele frequency (MAF) higher than 5% in the HapMap CHB or CEU data were selected. These 18 SNPs represent 13 independent loci, either present in genes or in intergenic regions.

Table 2 Information about the 18 candidate SNPs selected

Genotyping and quality controls

Genotyping analyses were conducted using the Sequenom MassArray system at the State Key Laboratory Incubation Base of Dermatology, Ministry of National Science and Technology, Hefei, Anhui, China. Genomic DNA was extracted from whole blood or buffy coat using FlexiGene DNA kits (QIAGEN, Hilden, Germany). All samples were checked for DNA quality using a Nanodrop Spectrophotometer ND-2000 (Thermo Scientific, Wilmington, DE, USA) and by agarose gel electrophoresis to ensure genomic integrity. Approximately 15 ng of genomic DNA was used to genotype each sample. Locus-specific PCR and detection primers were designed using the MassARRAY Assay Design 3.0 software (Sequenom, San Diego, CA, USA). Following the manufacturer’s instructions, the DNA samples were amplified by multiplex PCR, and the PCR products were then used for locus-specific single-base extension reactions. The resulting products were desalted and transferred to a 384-element SpectroCHIP array. Allele detection was performed using MALDI-TOF MS. The mass spectrograms were analyzed with the MassARRAY Typer software (Sequenom). Exclusion criteria for genotyped SNPs were a call rate <95%, MAF <0.05 and deviation from Hardy–Weinberg equilibrium (HWE, P<0.05) in the controls. Twelve SNPs passed the quality control and were subjected to statistical analysis.
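The exclusion criteria above can be sketched as a simple per-SNP filter. The following Python snippet is a minimal illustration only, not the pipeline used in this study: the function names, the genotype counts and the simplification of evaluating call rate, MAF and HWE on the same set of control genotypes are assumptions. HWE is tested with a 1-df χ2 goodness-of-fit test.

```python
import math


def hwe_p_value(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(control_genotypes, n_attempted,
              min_call_rate=0.95, min_maf=0.05, hwe_alpha=0.05):
    """Apply the three exclusion criteria to one SNP (hypothetical helper)."""
    n_aa, n_ab, n_bb = control_genotypes
    n_called = n_aa + n_ab + n_bb
    call_rate = n_called / n_attempted
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_p_value(n_aa, n_ab, n_bb) >= hwe_alpha)


# Hypothetical counts: genotypes in exact HWE proportions, all samples called
print(passes_qc((250, 500, 250), n_attempted=1000))
```

The thresholds are exposed as keyword arguments so that the same helper could be reused with other cut-offs.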

Statistical analysis

The association between the SNPs and disease susceptibility was assessed using the χ2 test or Fisher’s exact test. The strength of association was estimated by calculating the odds ratio (OR) with 95% confidence interval (CI). As a quality control of the genotyping, genotype data were analyzed for deviations from Hardy–Weinberg equilibrium using χ2 statistics. All SNPs retained for analysis were in Hardy–Weinberg equilibrium. Associations with subphenotypes (age of onset, clinical stage of breast cancer) were analyzed by comparing cases with a certain subphenotype with controls. All statistics were analyzed with the SPSS 13.0 (SPSS Inc. Chicago, IL, USA) and Plink 1.07 (http://pngu.mgh.harvard.edu/~purcell/plink/) software packages. Haplotype analyses were performed using Haploview v4.2 to generate haplotype frequencies and calculate the significance of the associations.
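As an illustration of the association statistics described above, the following minimal Python sketch computes an allelic OR with a Woolf-type (log-scale) 95% CI and a Pearson χ2 P-value without continuity correction from a 2 × 2 table of allele counts. The function name and the example counts are hypothetical and are not taken from this study; the analyses reported here were run in SPSS and Plink, and the exact test options used are not stated.

```python
import math


def allelic_association(case_minor, case_major, control_minor, control_major):
    """Return (OR, (CI lower, CI upper), chi-square, P) for a 2 x 2 allele table."""
    a, b, c, d = case_minor, case_major, control_minor, control_major
    odds_ratio = (a * d) / (b * c)

    # Woolf (log-scale) standard error and 95% confidence interval
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = 1.959963984540054  # 97.5th percentile of the standard normal
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)

    # Pearson chi-square for a 2 x 2 table (1 df, no continuity correction)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, (lower, upper), chi2, p_value


# Hypothetical allele counts: 500 cases and 500 controls (1000 alleles each)
or_, ci, chi2, p = allelic_association(300, 700, 250, 750)
print(f"OR={or_:.2f}, 95% CI=({ci[0]:.2f}, {ci[1]:.2f}), P={p:.3g}")
```

For small expected cell counts, Fisher’s exact test (as mentioned above) would be the more appropriate choice than the χ2 approximation used in this sketch.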

RESULTS

Clinical evaluation of patients and controls

Two populations were evaluated in this study: participants recruited from the Anhui province in China and German patients treated at the University Medical Center Mannheim. The age of German controls was well matched to that of German breast cancer patients, with a mean age of 60.6 years. As shown in Table 1, the percentage of patients with carcinoma in situ (Tis) was similar in the Chinese and German populations, but tumor size (T1–T4) was larger and infiltration of lymph nodes (N0–N3) and metastatic spread (M0, M1) were increased in the Chinese cohort. These differences may be due in part to the breast cancer screening program established in Germany in the last decade, which results in the diagnosis of breast cancer at an early, non-invasive stage.

Association analyses

Six of the 18 SNPs (Table 2) were excluded from further analyses because they did not pass quality control, owing either to a significant deviation from HWE (rs1562430, rs889312, rs1011970 and rs2180341) or to a low call rate (rs9902718 and rs13281615). All of the remaining SNPs genotyped in this study showed allele frequencies similar to the HapMap data (Supplementary Table 1), indicating that our study cohorts are representative of the Chinese and the German population, respectively. Of the 12 SNPs analyzed in this study, seven SNPs (rs2046210, rs3757318, rs4784227, rs1219648, rs3803662, rs8051542 and rs2981582) in the Chinese population and five SNPs (rs2046210, rs4784227, rs1219648, rs3803662 and rs8051542) in the German population were significantly associated with breast cancer (Table 3).

Table 3 Result of genotyped SNPs in Chinese and German populations

Three susceptibility loci shared by the Chinese and the European populations were confirmed in this study. The five breast cancer susceptibility SNPs identified in the German population represent three independent genetic loci: ESR1 (6q25.1; rs2046210), FGFR2 (10q26.13; rs1219648) and TOX3 (16q12.1; rs3803662, rs8051542, rs4784227). All five were also significant in the Chinese population, together with two additional SNPs (rs3757318 and rs2981582) at two of these loci (ESR1 and FGFR2, respectively).

ESR1 locus

The present study identified a significant association for rs2046210 on chromosome 6q25.1 in both populations (Table 3). In the Chinese population, rs2046210, previously reported at the ESR1 locus,29 was highly significant (P=1.9 × 10−10) with an OR (95% CI) of 1.42 (1.28–1.59); in the German population, the corresponding values were P=3.62 × 10−2 and OR (95% CI)=1.23 (1.01–1.48) (Table 3). This SNP showed the highest significance level and the strongest association with breast cancer risk in the Chinese population in our study sample. However, in the German population this SNP showed the weakest association among the significant SNPs and was only marginally significant. Another SNP, rs3757318, which is located 200 kb upstream of the transcription start site of ESR1, showed a significant association in the Chinese population (P=1.94 × 10−6) with an OR (95% CI) of 1.33 (1.18–1.49), but was not significantly associated with breast cancer in the German study group (P=1.74 × 10−1). Notably, the minor allele frequency of rs3757318 was much lower in the German (MAF=0.08 in the controls) than in the Chinese population (MAF=0.26 in the controls).

The two SNPs in this locus, rs2046210 and rs3757318, are in weak linkage disequilibrium in the Asian population (r2=0.50) and in the Caucasian population (r2=0.09). The pattern of linkage disequilibrium of our study populations is consistent with the HapMap data (Supplementary Table 2) and indicates that the two SNPs in this region are independent of each other. Haplotype analysis of the two SNPs in the Chinese samples found one protective haplotype, GC, which had a lower frequency in cases (53%) than in controls (62%). This haplotype showed stronger evidence of association with breast cancer in the Chinese population than either SNP alone (P=8.3 × 10−11, OR=0.69; Table 4).
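As a consistency check, the OR reported for the GC haplotype can be approximately recovered from the rounded haplotype frequencies given above (53% in cases, 62% in controls). The short Python sketch below does this arithmetic; the function name is hypothetical, and because the inputs are rounded percentages the result is only an approximation of the value estimated by Haploview.

```python
def odds_ratio_from_frequencies(freq_cases, freq_controls):
    """Odds ratio of carrying a haplotype, given its frequency in cases and controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# GC haplotype at the ESR1 locus in the Chinese samples (frequencies from the text)
or_gc = odds_ratio_from_frequencies(0.53, 0.62)
print(round(or_gc, 2))  # 0.69
```

An OR below 1 corresponds to a haplotype that is less frequent in cases than in controls, which is why GC is described as protective.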

Table 4 Haplotype analysis

FGFR2 locus

Two SNPs in the FGFR2 locus, rs2981582 and rs1219648, were found in this study to be associated with breast cancer in cases with a strong family history of the disease. SNP rs2981582, which lies within intron 2 of FGFR2, showed a significant association in the Chinese population (P=1.39 × 10−3) with an OR (95% CI) of 1.20 (1.07–1.34), but did not reach significance in the German population (P=2.6 × 10−1). The other SNP in this locus, rs1219648, displayed a significant association in the Chinese (P=1.41 × 10−4) and the German population (P=1.5 × 10−2) with a similar OR in each population (1.23 (1.11–1.37) and 1.26 (1.05–1.51), respectively). Haplotype analysis of the two SNPs in the German samples found one risk haplotype, GC, which had a higher frequency in cases (3%) than in controls (1%). This haplotype showed stronger evidence of association with breast cancer in the German population than either SNP alone (P=5.4 × 10−5, OR=3.4) (Table 4).

TOX3 locus

All three SNPs within the TOX3 locus (rs8051542, rs3803662 and rs4784227) showed a significant association with breast cancer in both populations. In the Chinese population, the strongest effect was found for rs4784227, with an OR (95% CI) of 1.31 (1.16–1.47), significant at P=9.3 × 10−6. The significance level for this SNP in the German population was lower (P=2.9 × 10−2), mainly owing to the smaller sample size. In the German population, the overall strongest association in this study (P=4.01 × 10−4) was found for rs3803662, with an OR (95% CI) of 1.43 (1.17–1.74). The third SNP in the TOX3 locus, rs8051542, had a higher effect size (OR (95% CI)=1.30 (1.08–1.56)) and significance (P=5.2 × 10−3) in the German population compared with the Chinese population.

Haplotype analysis showed that the most common haplotype in the German population (CCC) was present in 43% of all cases and 51% of all controls and was associated with significant protection against breast cancer (OR=0.73; P=9.4 × 10−4) (Table 4).

Stratification

Stratification for age of onset was performed by dividing Chinese and German patients into groups younger and older than 50 years (Table 5). Analysis of the Chinese samples showed a higher effect size for rs4784227 in patients <50 years (P=1.37 × 10−7, OR=1.49) than in the total samples (P=9.3 × 10−6, OR=1.31) and no association in cases ≥50 years (P=4.31 × 10−1, OR=1.08), suggesting that this SNP confers a higher risk of developing breast cancer in younger patients. By contrast, rs3757318 displayed a higher risk (OR=1.44) in cases older than 50 years compared with the total samples (OR=1.33). In German patients younger than 50 years, rs2046210 and rs1219648 showed significant P-values and an increased effect size for breast cancer, whereas risk alleles for rs3803662 and rs8051542 were enriched in patients older than 50 years. In addition, patients were stratified according to clinical stage (Table 6). SNPs rs2046210 and rs3757318 were significant and had a higher effect size in Stage 0 and Stage IV breast cancer cases compared with the combined analysis, suggesting that risk variants associated with the ESR1 locus were overrepresented in Stage 0 and Stage IV in the Chinese population. SNPs rs1219648 and rs2981582 had a higher OR in Stage 0, suggesting a role for FGFR2 in early stages of breast cancer. The association of all significant SNPs in Stage II did not differ from that in all samples, and stratification of Stage III samples resulted in no significant SNPs, most likely because of the small number of cases (107) belonging to this stage. Additionally, three SNPs (rs4784227, rs3803662 and rs8051542) that were significant in the total sample analysis were also significant in Stage I, with an increased OR.

Table 5 Stratification: age of onset

Table 6 Stratification: clinical stage of breast cancer (A) Chinese; (B) German

DISCUSSION

Eighteen SNPs previously shown to be associated with breast cancer were evaluated, of which 12 passed quality control. Of these, seven SNPs were significantly associated with breast cancer in the Chinese population and five in the German population. To our knowledge, we have shown for the first time that rs3757318, reported to be associated with breast cancer risk in the European population, is also associated with breast cancer in the Chinese population. These results extend the findings from previous GWAS in populations of European descent to the Chinese population and may help to capture the causal variants in fine-mapping approaches.

In 2010, Turnbull et al27 identified SNP rs3757318 for the first time in the Caucasian population, with an OR (95% CI) of 1.30 (1.17–1.46). Interestingly, no significant association with breast cancer risk could be replicated for this SNP in the pooled analysis of the German samples, although after stratification a significant association was observed in Stage II breast cancer cases. However, a similar association (OR=1.33) was obtained for our Chinese study cohort. This SNP is located at chromosome 6q25.1, 200 kb upstream of the transcription start site of the gene encoding estrogen receptor 1 (ESR1). In 2009, Zheng et al29 conducted the first GWAS of breast cancer in an Asian population, in which a highly significant association (P=2.0 × 10−15) was identified for rs2046210, a SNP located 180 kb upstream of ESR1. In our Chinese samples, the strongest association was likewise identified for rs2046210, although at a lower significance level (P=1.9 × 10−10), potentially because of the smaller study sample size. The minor allele frequency of rs3757318 was much lower in the German (MAF=0.08 in the controls) than in the Chinese population (MAF=0.26 in the controls). Taking into account the smaller German sample size, the power to identify a significant association for rs3757318 may not have been high enough in the German study group.
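The power argument above can be made concrete with a rough calculation. The following Python sketch is not a calculation performed in this study; it uses a normal approximation for a two-sided allelic test comparing allele frequencies between cases and controls. It assumes a true allelic OR of 1.30 (the estimate of Turnbull et al quoted above), takes the control MAFs and sample sizes from the text, ignores the negligible opposite tail, and uses a hypothetical function name.

```python
import math
from statistics import NormalDist


def allelic_test_power(maf_controls, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    p0 = maf_controls
    # Allele frequency in cases implied by the assumed allelic odds ratio
    odds_cases = odds_ratio * p0 / (1.0 - p0)
    p1 = odds_cases / (1.0 + odds_cases)
    # Each individual contributes two alleles
    se = math.sqrt(p1 * (1.0 - p1) / (2 * n_cases)
                   + p0 * (1.0 - p0) / (2 * n_controls))
    z_effect = abs(p1 - p0) / se
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(z_effect - z_crit)


# rs3757318: control MAFs and sample sizes from the text, assumed OR of 1.30
power_german = allelic_test_power(0.08, 1.30, n_cases=311, n_controls=960)
power_chinese = allelic_test_power(0.26, 1.30, n_cases=984, n_controls=2206)
print(f"German: {power_german:.2f}, Chinese: {power_chinese:.2f}")
```

Under these assumptions, both the lower allele frequency and the smaller sample size reduce the expected test statistic in the German cohort relative to the Chinese cohort.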

The two SNPs in this locus, rs2046210 and rs3757318, are in weak LD in the German and Chinese populations (r2=0.15 and 0.44, respectively). The pattern of LD of our study populations is consistent with the HapMap data (Supplementary Table 2) and indicates that the two SNPs in this region are independent of each other. Interestingly, the ESR1 locus has recently been confirmed in East Asians by Long et al.31 However, in that study SNP rs9383951, located in intron 5 of the ESR1 gene, showed the strongest association with breast cancer. ESR1 is a hormone receptor that functions as a transcription factor upon ligand binding. The estrogen receptors have a central role in the pathology of breast cancer. Several studies have focused on the relationship between genetic variants in ESR1 and an increased breast cancer risk in the Asian32, 33, 34 and the Caucasian population.35, 36 The close proximity of rs3757318 to ESR1 suggests that correlated variants change expression patterns of ESR1 and thereby increase breast cancer risk.

The first GWAS on breast cancer in the Caucasian population identified SNP rs2981582 as associated with breast cancer in cases with a strong family history of the disease.21 This SNP lies in an LD block within intron 2 of FGFR2. At the same time, Hunter et al23 identified rs1219648, located 6 kb upstream of rs2981582, as being significantly associated with sporadic breast cancer in postmenopausal women. SNPs rs2981582 and rs1219648 were highly significant in the pooled analysis (P=2 × 10−76 and P=4.2 × 10−10) with a heterogeneous OR of 1.23 each, similar to the ORs in our Chinese and German cases (OR=1.23 and 1.26, respectively). FGFR2 is a receptor tyrosine kinase with several alternatively spliced variants, resulting in differential ligand binding and signal transduction.37 Amplification and overexpression of FGFR2 have been observed in primary tumors of the breast38 and breast cancer cell lines.39, 40 Interestingly, expression of FGFR2 has been implicated in the development of estrogen receptor-positive breast cancer,20, 41, 42 suggesting an interaction with genetic breast cancer risk variants involved in the metabolism of sex hormones.

The third locus identified in this work involves three SNPs located close to or in the open reading frame of TOX3 on chromosome 16q12.1. The pattern of LD in this region differs markedly between the Asian and the Caucasian populations (Supplementary Table 2). In the Chinese samples, SNP rs4784227 is in low LD with rs3803662 and rs8051542, with r2 values of 0.14 and 0.37, respectively. In the German samples, however, SNP rs4784227 is in strong LD with rs3803662, with an r2 value of 0.86, whereas LD with rs8051542 is low (r2=0.20). Loss of chromosomal material on the long arm of chromosome 16 has been observed frequently in several cancers, including breast, prostate, ovarian and fallopian tube cancer.43, 44 The function of TOX3 is unknown, but a putative high-mobility group motif suggests that it might act as a transcription factor or be involved in the alteration of chromatin structure.45
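For readers less familiar with the r2 values quoted in the LD comparisons above, the measure can be computed from the frequency of a two-SNP haplotype and the two corresponding allele frequencies. The Python sketch below shows the standard definition; the function name and the input frequencies are hypothetical, and the values reported in this study were obtained with Haploview.

```python
def ld_r_squared(freq_ab, freq_a, freq_b):
    """Pairwise LD r^2 from the A-B haplotype frequency and the two allele frequencies."""
    d = freq_ab - freq_a * freq_b  # LD coefficient D
    return d * d / (freq_a * (1.0 - freq_a) * freq_b * (1.0 - freq_b))


# Hypothetical frequencies, for illustration only
print(ld_r_squared(0.50, 0.50, 0.50))  # alleles always inherited together
print(ld_r_squared(0.25, 0.50, 0.50))  # alleles statistically independent
```

Values near 1 indicate that the two SNPs carry largely the same information, whereas values near 0 indicate that they behave as independent markers.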

In conclusion, using a Chinese and a German population-based case-control study, we identified SNPs associated with breast cancer in the Chinese and German populations. These results provide additional support that associations of SNPs detected in Europeans are also found in the Chinese population. In spite of the difference in segregation of complex traits between the Chinese and German populations, in principle the same loci were identified as associated with breast cancer. Although common variants confer only a minor increase in risk, a combination of genetic risk variants might be useful as a powerful predictor of the development of breast cancer. Fine-mapping is an essential step in the identification of a functional variant, which may help to improve clinical therapies for breast cancer.