Introduction

Genome-wide association studies (GWAS) of breast cancer have substantiated a role for common low-risk alleles in disease susceptibility.1, 2, 3, 4, 5, 6 To date, discovery and replication efforts have been limited primarily to populations of European ancestry, which traditionally have been the focus of most genetic studies of cancer. Prediction of individual risk on the basis of multiple risk markers is one potential use of this new genetic information, although it has been persuasively argued7, 8, 9, 10 that many more common low-penetrance genetic markers would need to be identified before they would have, individually or as a group, any public health utility. Nevertheless, private companies have already begun to market genetic tests for the currently known risk markers, such as deCODE BreastCancer (deCODE, Reykjavik, Iceland) for women of European ancestry, even though the causal alleles underlying the marker associations have not yet been identified. Whether these tests have any relation to risk at all in non-White populations, which make up a third of the US population, is not known.

Associations with cancer risk alleles may not be consistent across populations for a number of reasons,11 including racial/ethnic differences in the linkage disequilibrium patterns relating a risk marker to the causal variant(s), and/or context dependency of the association resulting from genetic and environmental modifiers that vary in frequency across populations. Studies in multiple populations12, 13, 14 are needed to examine the generalizability of these markers before any public health utility can be extended to populations of non-European ancestry. Here, we report on the association of 12 validated risk alleles, identified in breast cancer GWAS conducted primarily in populations of European ancestry, among European-American, African-American, Native Hawaiian, Japanese-American and Latino breast cancer cases and controls from the Multiethnic Cohort (MEC) study.15

Materials and methods

Study population: the MEC

The MEC study is a prospective cohort study initiated between 1993 and 1996. The cohort consists of 215 251 adult men and women living in Hawaii and California (mainly Los Angeles County), drawn primarily from the following populations: European-American, African-American, Native Hawaiian, Japanese-American and Latino. Drivers’ license files were the primary source used to identify study subjects. Participants entered the cohort by completing and returning a self-administered questionnaire that asked about general demographic characteristics as well as known breast cancer risk factors. Cases were identified through cohort linkage to population-based Surveillance, Epidemiology, and End Results (SEER) cancer registries in California and Hawaii.15 Through December 31, 2005, the breast cancer case-control study nested in the MEC and assembled for genetic studies included 2224 cases and 2827 controls frequency-matched on race/ethnicity and age. For this study, we included additional African-American controls to allow for more precise risk estimation. The median ages of cases and controls were 66 and 65 years, respectively, and ranged from 44 to 87. This study was approved by the Institutional Review Boards at the University of Southern California and at the University of Hawaii.

Laboratory assays

We genotyped 12 SNPs from GWAS of breast cancer.1, 2, 3, 4, 5, 6 We also tested one additional variant in FGFR2 that was revealed by fine-mapping, an effort in which the MEC participated (African-Americans only).16 Genotyping was performed using the TaqMan allelic discrimination assay.17 We substituted rs10483813 for rs999737 (14q24) because genotyping of the latter failed; the two SNPs are perfectly correlated in European-Americans (r²=1). The overall genotyping call rate ranged from 94.6 to 100.0% (average, 97.9%) for the 13 variants. For blinded duplicates, the mismatch rate was <2% for all 13 SNPs (average <1%). Hardy–Weinberg equilibrium (HWE) testing was conducted for each variant in each population using a 1-df χ²-test, and all 13 variants were consistent with HWE using a criterion of P>0.01 in controls (Supplementary Table 1).
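The HWE check described above can be illustrated with a minimal sketch of a 1-df chi-square goodness-of-fit test on genotype counts. This is not the authors' code (the analyses were run in SAS). The function name `hwe_chi_square` and the example counts are hypothetical, and the sketch assumes a biallelic SNP with both alleles observed.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi2, p_value) for a 1-df test of Hardy-Weinberg equilibrium.

    Inputs are the observed counts of the three genotypes (AA, AB, BB).
    Assumes both alleles are present, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical control genotype counts for one SNP in one population
chi2, p_value = hwe_chi_square(400, 400, 200)
consistent_with_hwe = p_value > 0.01  # criterion stated in the text
```

Under the criterion in the text, a variant would be flagged only when the P-value in controls falls at or below 0.01.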

Statistical analysis

In each of the five racial/ethnic groups constituting the MEC, we examined the distribution of, and the breast cancer risk associated with, an unweighted summary score, taken as the number of risk alleles for 12 variants (1p11, rs11249433; 2q35, rs13387042; 3p24, rs4973768; 5p12, rs10941679; 5q11, rs889312; 6q25, rs2046210; 8q24, rs13281615; 10q26, rs2981582; 11p15, rs3817198; 14q24, rs10483813; 16q12, rs3803662; 17q23, rs6504950) (Supplementary Table 2), in order to determine their combined contribution to breast cancer risk. Analysis was conducted on 2171 cases and 2795 controls, as individuals missing four or more SNP genotypes were excluded (53 (2.4%) cases and 32 (1.1%) controls). Missing genotypes were assigned the mean score for that locus within each population. Odds ratios were estimated for this risk score, which ranged from 4 to 18 risk alleles per individual, with a median of 11, over all 5 racial/ethnic groups. Odds ratios were adjusted for age (quartiles) and race. As ancestry may differ by case-control status, SNPs may be associated with risk simply because they vary in frequency across racial/ethnic groups. Although we adjusted for self-reported ethnicity, several of the populations we consider here are known to be admixed between two or more ancestral groups. We used principal components analysis18 to control for hidden population stratification (including admixture) that could otherwise confound SNP effects with unreported ethnicity (or ethnic mixture). Specifically, we computed the first 10 eigenvectors from a principal components analysis of a panel of >1300 SNPs from previous studies that are not linked to the 12 markers of interest here.19 These were included as adjustment variables in all models. All statistical analyses were performed using SAS 9.1 (SAS Institute Inc., Cary, NC, USA).
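As an illustration of the score construction described above, the following sketch counts risk alleles per individual, excludes individuals missing four or more genotypes, and fills remaining gaps with the per-locus mean. It is meant to be applied to one population at a time. The function name `risk_scores`, the input layout, and the choice to compute locus means over retained individuals only are assumptions made for this example and are not specified in the text.

```python
def risk_scores(genotypes, max_missing=3):
    """Unweighted risk-allele counts with per-locus mean imputation.

    genotypes: dict mapping a person id to a list of risk-allele counts
               (0, 1 or 2) per SNP, with None marking a missing genotype.
               All individuals should come from the same population.
    Returns a dict of id -> score for individuals with at most
    max_missing missing genotypes; everyone else is excluded.
    """
    kept = {pid: g for pid, g in genotypes.items()
            if sum(x is None for x in g) <= max_missing}
    if not kept:
        return {}
    n_snps = len(next(iter(kept.values())))
    # Mean risk-allele count at each locus among observed genotypes
    # (assumes every locus has at least one observed genotype)
    locus_means = []
    for j in range(n_snps):
        observed = [g[j] for g in kept.values() if g[j] is not None]
        locus_means.append(sum(observed) / len(observed))
    return {pid: sum(locus_means[j] if x is None else x
                     for j, x in enumerate(g))
            for pid, g in kept.items()}

# Hypothetical data: 4 individuals typed at 5 SNPs (the study used 12)
example = {
    "a": [2, 1, 0, 1, 2],
    "b": [1, None, 1, 1, 0],
    "c": [None, None, None, None, 1],  # 4 missing, so excluded
    "d": [0, 1, 2, 1, 1],
}
scores = risk_scores(example)
```

Because imputed values are means, scores for individuals with missing genotypes need not be whole numbers.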

Results

Using the unweighted summary score, we observed a highly significant association with breast cancer risk in an ethnic-pooled analysis (per allele: OR, 1.09; 95% CI, 1.06–1.12; P=2.0 × 10⁻¹⁰) (Table 1), with women in the upper quintile having a 1.6-fold greater risk than women in the bottom quintile (>12 alleles vs <9 alleles; 95% CI, 1.32–1.97; P=3.0 × 10⁻⁶) (Supplementary Table 3) and women in the highest decile having a 2.0-fold greater risk of breast cancer, compared with those in the bottom decile (>13 alleles vs <8 alleles; 95% CI, 1.52–2.73; P=2.3 × 10⁻⁶).
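A per-allele odds ratio is usually read under a log-additive model, in which each additional risk allele multiplies the odds by the same factor. Assuming that interpretation applies here (the text reports per-allele ORs but does not spell out the model), the odds ratio implied for two women whose scores differ by a given number of alleles can be sketched as follows. The function name is illustrative. The result compares specific allele counts and is not a substitute for the quintile and decile estimates above, which were estimated directly from the data.

```python
import math

def implied_odds_ratio(per_allele_or, allele_difference):
    """OR between two allele counts under a log-additive per-allele model."""
    return math.exp(allele_difference * math.log(per_allele_or))

# Pooled per-allele OR from the text, applied to a 4-allele difference
or_four_alleles = implied_odds_ratio(1.09, 4)
```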

Table 1 The summary associations of validated breast cancer risk variants in diverse populations

However, significant racial/ethnic heterogeneity was noted (P=0.030, 4-df test). Specifically, the summary score variable was found to be positively and significantly associated with breast cancer risk (P≤3.9 × 10⁻³) with effects per allele of ≥1.10 in all populations, except in African-Americans (per allele: OR=1.03, 95% CI=0.98–1.08, P=0.23; Table 1). The apparent lack of an association in African-Americans between breast cancer risk and either this aggregate allele count variable or many of the individual SNPs (Supplementary Table 4) suggests that few of these variants are likely to be markers of risk in the African-American population.

Given that several of the validated risk alleles have been more strongly associated with ER-positive disease,2, 3, 5, 20 we tested for heterogeneity of the risk score by ER status in each population. As is expected, the score was more informative for ER-positive disease than ER-negative disease (Supplementary Table 5). We observed the same pattern of association in ER-positive disease as in the overall pooled analysis, with the summary risk score being significantly associated with disease risk in most populations, and no significant association observed in African-Americans (Supplementary Table 5).

We tested for dominant and recessive effects of single SNPs and observed no significant evidence of a better model fit when the genotypes for each SNP were modeled in combination with the summary risk score (Supplementary Table 6). Odds ratios estimated over the range of observed allele counts were also inspected and were found to be very consistent with the assumption of linear allelic effect (Supplementary Table 3). We also tested for pair-wise gene by gene interactions and observed seven nominally significant interactions; however, none remained statistically significant after Bonferroni correction for multiple comparisons. We also constructed a second summary score, weighting each SNP by its published log OR (Supplementary Table 2). This score measure was highly correlated with the unweighted score (r=0.88), and was not superior to the unweighted score when included in the same model.
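To give a sense of scale for the Bonferroni correction mentioned above, the sketch below counts the unordered SNP pairs and the corresponding per-test threshold. It assumes that all pairs among the 12 score SNPs were tested once each at a family-wise alpha of 0.05. The text does not state the number of tests or the alpha used, so both are assumptions, and the function name is illustrative.

```python
from math import comb

def bonferroni_threshold(n_snps, alpha=0.05):
    """Return (number of SNP pairs, Bonferroni-corrected per-test alpha)."""
    n_tests = comb(n_snps, 2)  # unordered pairs of SNPs
    return n_tests, alpha / n_tests

n_tests, threshold = bonferroni_threshold(12)
# Nominal hits expected by chance if no interaction is real
expected_by_chance = 0.05 * n_tests
```

Under these assumptions there are 66 pairwise tests, so an interaction would need P below roughly 0.00076 to survive correction, and about 3.3 nominally significant results would be expected by chance alone.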

Discussion

An ultimate public health goal of mapping risk alleles is to predict individual risk so that we can identify those at greater risk, among whom targeted intervention and preventive measures may be applied. An understanding of the polygenic component of breast cancer risk would undoubtedly add significantly to risk prediction,7, 8 and to the efficacy of population-based programs for prevention and early detection. Nevertheless, as noted by Gail,9 a much larger number of modestly penetrant risk variants may be needed to make a significant impact on the problem of breast cancer risk prediction. Replicating aggregate allele counts, or other summary variables, is thus much more important to this problem than replicating any one specific risk marker. Our sample sizes were too small to fully replicate, in any single racial/ethnic group, the modest risks that each of the 12 validated markers has shown in Whites (Supplementary Tables 2 and 4); however, we did have very good power to detect the effect of the aggregate variable in any specific racial/ethnic group, assuming homogeneity of effect by ethnic group.

Possible explanations for the lack of association with the aggregate variable in African-Americans are that the majority of true risk alleles underlying the marker associations are rare in African-Americans and/or that linkage disequilibrium does not extend as far in persons of African ancestry. Both possibilities emphasize the need to conduct full-scale high-density association studies to identify racial/ethnic-specific risk markers or to further refine the association signals in the regions containing these risk alleles in racial/ethnic groups. For example, fine-mapping of FGFR2 has revealed a stronger marker of risk in African-Americans (rs2981578).16 This marker made some improvement to risk prediction with the aggregate score in this population (per allele: OR, 1.05; 95% CI, 1.00–1.10, P=0.054), which emphasizes the value to be gained from comprehensively surveying genetic variation across all risk loci in all populations. Another possible source of the difference in association among ethnic groups could be environmental exposures that vary in frequency across populations and that may modify the effect of these variants. However, recent studies provide little support for known breast cancer risk factors serving as modifiers of the associations with these alleles.21

In summary, in this multiethnic study, we evaluated the generalizability of breast cancer risk markers identified by GWAS to other populations. We observed strong evidence that, in aggregate, the 12 published risk variants are associated with breast cancer risk in the majority of, but not all, populations considered. However, even for populations of non-African origin, it is clear that many more variants will be needed for this risk score to be informative in predicting breast cancer risk. Larger studies that include even more diverse populations, aimed at discovery, validation and fine-mapping, are needed to identify an accurate and more complete set of risk alleles that could better determine the contribution of these genetic regions to breast cancer risk in various populations, especially for women of African ancestry.