Cross-ancestry GWAS meta-analysis identifies six breast cancer loci in African and European ancestry women


Our study describes breast cancer risk loci using a cross-ancestry GWAS approach. We first identify variants that are associated with breast cancer at P < 0.05 from African ancestry GWAS meta-analysis (9241 cases and 10193 controls), then meta-analyze with European ancestry GWAS data (122977 cases and 105974 controls) from the Breast Cancer Association Consortium. The approach identifies four loci for overall breast cancer risk [1p13.3, 5q31.1, 15q24 (two independent signals), and 15q26.3] and two loci for estrogen receptor-negative disease (1q41 and 7q11.23) at genome-wide significance. Four of the index single nucleotide polymorphisms (SNPs) lie within introns of genes (KCNK2, C5orf56, SCAMP2, and SIN3A) and the other index SNPs are located close to GSTM4, AMPD2, CASTOR2, and RP11-168G16.2. Here we present risk loci with consistent direction of associations in African and European descendants. The study suggests that replication across multiple ancestry populations can help improve the understanding of breast cancer genetics and identify causal variants.


Breast cancer is the most common cancer in women worldwide and accounted for 2.1 million new cases and 627,000 deaths in 20181. Studies have shown a significant contribution of genetic factors to breast cancer risk2,3, yet the landscape of this contribution has not been fully elucidated. Mutations in high- and moderate-penetrance genes confer relatively high risks of breast cancer but are rare in the population and account for <5–10% of cases4. Genome-wide association studies (GWAS) have been successful in identifying common low-penetrance genetic variation and approximately 200 risk loci have now been identified5,6,7. The risk loci so far identified have provided clues to elucidating breast cancer tumorigenesis through previously unknown mechanisms. Additionally, when combined into risk scores, these polymorphisms can be used for breast cancer risk prediction8.

Despite the usefulness of GWAS, the majority of GWAS have been performed among European ancestry populations9,10,11,12,13. It is therefore unclear whether the same genetic risk factors are also important in other populations, which may limit the applicability of the findings to other groups14. The earliest GWAS conducted in African ancestry populations identified genetic variants at 5p15.33 (TERT-CLPTM1L) associated with estrogen receptor (ER) negative breast cancer15. A larger analysis of African ancestry individuals, which included several consortia, identified a SNP at 3q26.21 also associated with ER-negative breast cancer16. Some common susceptibility loci are shared across populations, and the shared disease-associated variants are more likely to be causal6,9,14.

Here we present, using a cross-ancestry GWAS approach in 248,000 women, genetic risk variants at 1p13.3, 5q31.1, 15q24, and 15q26.3 for overall breast cancer, and at 1q41 and 7q11.23 for ER-negative disease. The consistency of the directions of the risk for these loci in African and European samples increases the likelihood of their being causal variants.


We discovered six loci containing seven SNPs significantly associated with breast cancer at P < 5 × 10−8 on cross-ancestry meta-analysis, with odds ratios (OR) ranging from 0.95 to 1.05 (Tables 1, 2; Supplementary Figs. 1, 2). Five SNPs were associated with overall breast cancer risk (rs17024629 at 1p13.3, rs2522057 at 5q31.1, rs1869959 at 15q24.1, rs60381548 at 15q24.2, rs181337095 at 15q26.3) and two were associated with ER-negative breast cancer (rs67931591 at 1q41 and rs1637365 at 7q11.23). The two SNPs at the 15q24 region were about 582 kb apart and independently associated with breast cancer risk. Four SNPs were within genes (rs67931591 in KCNK2, rs2522057 in C5orf56, rs1869959 in SCAMP2, and rs60381548 in SIN3A) and the others were in intergenic regions. The direction of the associations was consistent for the pooled African and European estimates. The estimates for overall and ER-negative breast cancer were generally consistent across the five contributing studies of African ancestry participants (Supplementary Table 2) and the BCAC European datasets (Supplementary Table 3).

Table 1 Novel breast cancer risk loci identified by cross-ancestry meta-analysis of African and European populations.
Table 2 Association analysis of novel SNPs in cross-ancestry combined meta-analysis by estrogen receptor status.

Conditional analysis revealed three additional independent signals significant at p < 10−4 at the 1p13.3 locus (rs116363925, rs114351980, and 1:109969874:C:T), two independent signals at 15q24 (rs113939578, rs12917507), and one each at 5q31.1 (5:132149322:G:GGCCGCCGCC) and 15q26.3 (rs117793215) for overall breast cancer risk. Another independent SNP at 1q41 that was associated with ER-negative breast cancer was rs5780828 (Table 3).

Table 3 Conditional regression analysis of top SNPs and others in the loci.

Concerning pleiotropy, none of the SNPs identified above have been reported in previous GWAS associations at genome-wide significance with cancers. Associations with mosquito bite size and asthma had been reported for rs2522057 and SNPs in LD with this lead SNP. For the 15q24 region, associations with cardiovascular phenotypes have been previously reported for rs1869959 while body height, glomerular filtration rate, and type 2 diabetes have been associated with rs60381548 and SNPs highly correlated with this lead SNP (Supplementary Table 4).

The eQTL analysis of breast tumors revealed significant associations in four loci: 1p13.3, 5q31, 15q24.1, and 15q24.2 (Supplementary Table 5A). There were significant associations (P < 10−6) between the protective allele of rs17024629 (T allele) at 1p13.3 and increased expression of GSTM1, GSTM2, and GSTM4, which are located 19 kb, 31 kb, and 51 kb downstream of the SNP. At 5q31, the top SNP rs2522057, located 15 kb downstream of IRF1, was most significantly associated with the gene’s expression levels. At 15q24.1, rs1869959, located 35 kb upstream of MPI and 12 kb upstream of ULK3, was significantly associated with the expression of these two genes. The other top SNP at the 15q24 locus, rs60381548, located in an intron of SIN3A, 30 kb downstream of PTPN9, 162 kb downstream of SNUPN, and 212 kb upstream of SNX33, was correlated with the expression of all four genes. The 1q41 locus revealed a significant association between rs67931591 and PTPN14. The SNP at 7q11.23 was significantly correlated with STAG3L2, a pseudogene. A previously published report on normal breast tissues from GTEx revealed associations between rs2522057 and the SLC22A5 gene, and between rs1869959 and the ULK3 gene (Supplementary Table 5B).

Functional annotation analyses indicated relationships with genomic functional biofeatures for rs2522057, rs17024629, rs1869959, and rs60381548 or SNPs in strong LD with these top SNPs in breast tissue-originated cell lines (Supplementary Tables 6, 7A, 7B). Active enhancer and promoter states were found for SNPs in strong LD with rs2522057 (rs2188962, rs4705950, rs72797306, rs11741255) using the 25-state chromatin model. Additional associations were found with histone modifications. These included H3K4me1 and H3K27ac enhancer peaks for: rs2522057 and other SNPs in strong LD (rs2188962, rs17622378, rs12521868, rs146604341, rs11951091, rs6866614, rs4705950, rs72797303, rs2706396, rs2522052, rs2706403, rs2706336, rs72797306, rs2248116, rs11741255); those in strong LD with rs17024629 (rs538388, rs560674, rs568686, rs669426, rs3850616, rs17024628); a SNP in strong LD with rs1869959 (rs7180432); and the top SNP rs60381548. H3K4me3 and H3K9ac promoter peaks were found for: rs2522057 and other SNPs in strong LD (rs12515180, rs11951091, rs72797306); SNPs in LD with rs17024629 (rs538388, rs669426, rs3850616); and rs1869959 and other SNPs in strong LD (rs4886613, rs936230).

We evaluated the consistency of the association of the identified loci in Latinos, and found the effect and direction of the association were consistent in 8 out of 11 evaluated variants (Supplementary Table 8). However, none of these consistent variants was statistically significant at p < 0.05 in the Latino study of 2385 cases and 6416 controls.


We found seven variants associated with breast cancer risk among women of African ancestry that may contribute to better prediction of breast cancer risk and provide further insights into mechanisms of breast cancer carcinogenesis. Although the discovery of the loci is largely driven by effects in European ancestry populations, observation of risk loci in multiple ancestral populations lends credence to the chances of those variants being causal. We designed our current approach of cross-ancestry meta-analysis to uncover genetic variants shared across ancestry.

The SNPs identified in this study lie in regions that are close to genes that have been previously implicated in cancer. Interestingly, we found four variants located within the introns of genes. One of the variants, rs67931591, was found in KCNK2 (also known as TREK1), which encodes the protein potassium channel subfamily K member 2, a member of the two-pore-domain background potassium channel family. Potassium channels are known to play a role in cancer, and studies using TCGA data have shown associations between DNA methylation in the KCNK genes and triple negative breast cancer. Additionally, overexpression of KCNK5, KCNK9, and KCNK12 and under-expression of KCNK6 and KCNK15 were associated with triple negative breast cancer17. Other studies investigated expression of the KCNK2 gene as a potential prognostic marker. For example, Innamaa et al.18 found increased KCNK2 expression in human ovaries and a role in cell proliferation and apoptosis for KCNK2 modulators in ovarian cancer cell lines. Li et al.19 found differential expression of KCNK2, KCNK15 and KCNK17 in liver cancer cells compared to healthy tissue. KCNK2 has also been reported in amplified regions in a genome-wide scan of chromosomal alterations in esophageal squamous cell carcinoma20.

We found two independent SNPs at the 15q24 locus about 582 kb apart (rs1869959 at 15q24.1 in the SCAMP2 intron and rs60381548 at 15q24.2 in the SIN3A gene). The SIN3A gene was associated with rs60381548 in the eQTL analysis of breast tumors in the present study. Switch-independent 3 family A (SIN3A) is a transcriptional regulator that, along with its paralog and corepressor, plays important roles in normal breast development, cancer and metastasis21,22,23. Furthermore, SIN3A mediates STAT3 transcriptional repressor activity24 and, along with genes involved in histone modification such as HDAC and lysine-specific demethylase (LSD), inhibits several cancer genes including CASP7, TGFB2, CDKN1A, HIF1A, TERT and MDM225. Studies have shown key roles for SIN3A in breast cancer, including sensitivity to chemotherapy25 and breast cancer progression26,27.

The other SNP at 15q24, rs1869959, is located in the intron of the SCAMP2 gene, which codes for secretory carrier associated membrane protein 2, a carrier to the cell surface in post-Golgi recycling pathways28. The recent GTEx project pilot study found significant associations between the SNP and SCAMP2 in esophageal mucosa, and ULK3 in breast mammary tissue, adipose, whole blood, and lung tissue29,30. We also found that rs1869959 was associated with the expression of ULK3 in breast tumors. ULK3 is a serine/threonine kinase that activates GLI2, a key component of the Hedgehog signaling pathway, and is implicated in many cancers31,32.

The C5orf56 gene harboring the rs2522057 SNP returned no notable associations with cancer. However, nearby genes in the 5q31 locus include RAD50, which codes for a DNA repair protein that is part of the MRE11-RAD50-NBS1 complex33. Other nearby genes include SLC22A5 (solute carrier family 22 member 5), encoding the organic cation transporter protein OCTN2, and IRF1, which encodes interferon regulatory factor 1. SLC22A5 is an estrogen-dependent gene whose expression is associated with ER status in breast cancer cell lines and tissue specimens34. Significantly decreased levels of SLC22A5 have been reported in colorectal cancer tissues compared to normal tissues in eQTL studies35. Moreover, eQTL studies report associations between rs2522057 and gene expression in several tissues including breast mammary tissue, lymphocytes, esophageal mucosa, lung, skeletal muscle, skin, thyroid and whole blood29,30. We found an association between rs2522057 and IRF1 expression levels in the eQTL analysis of breast cancer in this study, lending support to the involvement of the IRF1 gene in the mechanism by which the SNP affects breast cancer carcinogenesis. Additionally, IRF1 has been shown to have tumor suppressor functions in breast cancer through its inhibition of NF-κB36 and through CASP8 activation and induction of apoptosis37.

The majority of GWAS-identified SNPs are located in non-coding regions of the genome, and three loci in the present study were found in intergenic regions. The closest gene to rs17024629 is AMPD2 (adenosine monophosphate deaminase 2), high expression of which has recently been shown to predict worse outcomes in undifferentiated pleomorphic sarcoma38. Earlier studies39 found high expression levels of AMPD2 in hepatocellular carcinoma, though the levels did not differ substantially from those in the non-tumorous organ. It is noteworthy that our eQTL analysis did not find a significant association with AMPD2 expression. The carcinogen metabolism genes GSTM1, GSTM2, and GSTM4 are also located in this region, and our eQTL analysis of breast tumors revealed highly significant associations between rs17024629 and these genes. The GSTM1 null genotype has been associated with several cancers including cancers of the colorectum, oral cavity, lung, cervix, and stomach40,41,42,43,44,45,46,47. In eQTL studies, GSTM4 was significantly associated with gene expression in several tissues including the aorta, lungs, tibial nerve and whole blood29,30.

The rs1637365 SNP at the 7q11.23 locus is near the CASTOR2 gene (cytosolic arginine sensor for MTORC1 protein, also known as GATSL1, GATS-like protein 1). The CASTOR proteins are arginine sensors that function as negative regulators of the TORC1 signaling pathway, an often dysregulated pathway in human cancer, through the GATOR complex, inhibiting mTORC148,49. The rs181337095 SNP is located 6 kb 5′ of RP11-168G16.2, an antisense DNA.

A potential limitation of this study is the different genotyping platforms used by the different consortia. However, stringent QC measures pre- and post-imputation were carried out. Additionally, the meta-analysis did not reveal significant heterogeneity across studies. Secondly, the sample size for ER-negative breast cancer cases was relatively small, thus reducing the precision of the estimates and providing less power for detecting risk loci. The third limitation is related to the additional SNPs identified at the same loci as the index SNPs in the conditional regression analysis. The regression procedures were based on a liberal p value cutoff of 10−4, and the chance that some of the identified SNPs could be spurious findings cannot be ruled out. Another noteworthy point is that identification of genetic variants in GWAS is just the first step in the discovery of true causal variants and genes associated with breast cancer. Further studies are needed, including in vitro and in vivo functional studies, to elucidate the mechanisms by which the identified putative causal variants act and to identify the targeted genes. Finally, although the direction and strength of the associations were consistent between African and European populations, and mostly consistent with Latino populations, we could not find statistically significant replication of the identified variants in Latinos, which is likely due to the modest sample size of the Latino study.

Our study found six loci that could provide further insights into pathways for breast cancer carcinogenesis. That these genetic variants are shared across ancestry populations makes them possible causal variants. Functional studies on these loci are desirable to identify causal variants and elucidate the mechanisms of breast cancer carcinogenesis. In addition, future studies can evaluate these variants for breast cancer risk prediction, particularly in African ancestry populations.


Study population

Data for this study were obtained from four consortia of African ancestry populations (ROOT, AMBER, AABC, and BCAC-African ancestry)16 and the Ghana Breast Health Study (GBHS)50,51, with a combined sample size of 19434 participants including 9241 cases and 10193 controls (Supplementary Table 1). Estimates from these studies were meta-analysed to generate pooled African ancestry estimates of breast cancer risk. Additionally, we used summary estimates (odds ratios, ORs) of breast cancer from European ancestry BCAC datasets (GWAS, iCOGs and OncoArray) with a combined sample size of 228,951 (122,977 cases and 105,974 controls)6.

Genotyping and quality control

Genotyping and quality control (QC) procedures have been described in detail for the three consortia16 and the BCAC European ancestry data6. The AABC was genotyped using the Illumina Human 1M-Duo BeadChip. After QC, a total of 3007 cases (1518 ER-positive, 987 ER-negative) and 2720 controls remained in the analysis52. Genotyping in the ROOT consortium was done using the Illumina HumanOmni 2.5-8v1 array, and 1657 cases (374 ER-positive, 403 ER-negative) and 2029 controls passed QC. In the BCAC-African ancestry consortium, genotyping was done using the Illumina OncoArray (260K GWAS backbone); after removing samples that overlapped between OncoArray and AABC, AMBER, or ROOT and samples that failed QC, a total of 2271 cases (1130 ER-positive, 613 ER-negative) and 1406 controls remained for analysis. The Illumina MEGA array was used for genotyping in the AMBER consortium, and 1407 cases (952 ER-positive, 385 ER-negative) and 2408 controls passed QC and remained in the analysis. In the GBHS, the Illumina Global Screening Array was used for genotyping, and 899 cases (296 ER-positive, 277 ER-negative) and 1630 controls were included in the analysis. Imputation for all studies was done using the cosmopolitan reference panel of the 1000 Genomes Project (Phase 3 release).

In addition, we examined the association between the identified SNPs of interest and breast cancer risk in a GWAS of Latinos (2385 cases and 6416 controls). Details of the genotyping, QC and data analysis have been published53.

Data analysis


In the ROOT and AABC GWAS, genotyped SNPs and imputed SNPs with imputation score >0.3 and minor allele frequency >0.01 were analyzed, to account for uncertainty in imputation. Unconditional logistic regression was used to examine the association of each SNP with breast cancer risk, adjusting for age, study site and eigenvectors from Principal Components Analysis (PCA). In the ROOT GWAS, the first four eigenvectors were used to control for population stratification, as only these were associated with case status. The AABC GWAS adjusted for the first 10 eigenvectors from the PCA. ORs and 95% confidence intervals (CI) were calculated from the multivariable logistic regressions. All tests of statistical significance were two sided. Using similar methods, separate analyses were conducted to compare ER-positive and ER-negative breast cancers with controls. The AMBER consortium estimated ORs and P values using unconditional logistic regression, adjusting for 10-year age group, sample type (saliva, blood, other), study (Black Women’s Health Study (BWHS) versus others) and PCs that were associated with breast cancer at P < 0.1. The GBHS estimated per-allele ORs and 95% CI for each SNP on allele counts (dosages) using unconditional logistic regression, adjusting for the first ten principal components, self-reported ethnicity and age. In the OncoArray African ancestry samples, a total of 27 million SNPs with MAF ≥ 0.1% and imputation quality score ≥0.3 were included in the analysis. PCs were estimated using EIGENSTRAT. ORs and P values for each SNP were estimated using unconditional logistic regression, adjusting for age, study (Women of African Ancestry Breast Cancer Study, WAABCS, versus other) and the first ten PCs.

The BCAC European study used a two-stage imputation approach, using SHAPEIT2 for phasing and IMPUTE version 2 for imputation. The first ten principal components and country were adjusted for in the logistic regression, and per-allele ORs and standard errors were computed6.
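As an illustration of how the per-allele ORs and confidence intervals described above follow from a logistic regression coefficient, the sketch below converts a log-odds estimate and its standard error into an OR with a Wald 95% CI. The function name and the input values are hypothetical and are not taken from the study.

```python
import math

def per_allele_or(beta, se, z=1.959964):
    """Convert a log-odds coefficient and its standard error
    into an odds ratio with a Wald 95% confidence interval."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper

# Illustrative values only (not study results).
or_, lo, hi = per_allele_or(beta=0.05, se=0.01)
print(f"OR = {or_:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Working on the log-odds scale and exponentiating at the end keeps the interval symmetric around the coefficient, which is also the scale on which study estimates are usually pooled.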


Regression coefficient estimates from the five contributing African ancestry studies were combined in a fixed effects meta-analysis using METAL54. Variants associated with breast cancer at P < 0.05 from the African ancestry meta-analysis were then combined in another fixed effects meta-analysis with the coefficients from the BCAC European ancestry data. Heterogeneity in both meta-analyses was assessed using the I2 statistic. SNPs that were significant genome-wide (P < 5 × 10−8) in the cross-ancestry meta-analysis, and >500 kb away from the 180 loci known to be associated with breast cancer risk were identified5,6. Conditional analysis below confirmed the identified loci. All analyses were done separately for ER-positive, ER-negative, and overall breast cancer risk.
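The two-stage procedure can be sketched as follows. METAL supports both a sample-size-weighted and an inverse-variance-weighted scheme, and the text does not state which was used; because regression coefficients were combined, this sketch assumes inverse-variance weighting. The function names and all input values are hypothetical.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects pooling of per-study log-odds ratios,
    with a two-sided normal p-value and the I^2 heterogeneity statistic (%)."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    # Cochran's Q and I^2 = max(0, (Q - df) / Q)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "p": p, "i2": i2}

def cross_ancestry(afr_betas, afr_ses, eur_beta, eur_se, stage1_alpha=0.05):
    """Stage 1: pool African ancestry studies and keep variants with P < 0.05.
    Stage 2: combine the pooled African estimate with the European estimate."""
    afr = fixed_effects_meta(afr_betas, afr_ses)
    if afr["p"] >= stage1_alpha:
        return None  # variant not carried forward
    return fixed_effects_meta([afr["beta"], eur_beta], [afr["se"], eur_se])

# Illustrative inputs for one variant across five studies (not study data).
result = cross_ancestry([0.06, 0.05, 0.07, 0.04, 0.06],
                        [0.04, 0.05, 0.05, 0.05, 0.06],
                        eur_beta=0.045, eur_se=0.007)
print(result)
```

Because the European sample is far larger, its standard error is smaller and its weight dominates the second stage, so the first-stage filter mainly enforces that an effect is at least nominally detectable in African ancestry women.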

Regression analysis conditional on index SNPs

In order to identify independent SNPs in the identified loci, conditional analysis was done in each of the regions, including all variants in the flanking ±500 kb region of the lead SNP. The 15q24 region had two SNPs about 582 kb apart that were both genome-wide significant (see results for details). Hence, all variants in the region extending from 500 kb upstream of the proximal SNP to 500 kb downstream of the other SNP were included in the conditional analysis for this region. We used the GCTA software with the –COJO option55, which utilizes summary statistics and population-specific linkage disequilibrium (LD) from the 1000 Genomes Project, for the computation of conditional beta coefficients. SNPs significant at p < 10−4 after adjusting for the lead SNP were considered independent signals. The p < 10−4 cutoff was derived by applying a factor of 3000 (the ratio of the 3 billion base pairs genome-wide to the 1 million base pairs in each region in the conditional analysis) to the GWAS significance threshold of 5 × 10−8. This procedure was repeated until no additional independent signals were significant. In addition to the conditional analysis involving the lead SNP and one other candidate SNP, we also determined joint ORs including all independent loci in the same model. Separate analyses were done for African and European ancestry data, and the estimates from the conditional analysis were combined in a meta-analysis.
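The arithmetic behind the regional cutoff can be written out as a short sketch. The function name is hypothetical; the inputs are the figures given in the text. Scaling the genome-wide threshold by the ratio of genome size to region size gives 1.5 × 10−4, which is of the same order as the 10−4 cutoff used.

```python
def regional_threshold(gwas_alpha=5e-8, genome_bp=3_000_000_000, region_bp=1_000_000):
    """Relax the genome-wide threshold in proportion to the share of the
    genome covered by one conditional-analysis region."""
    return gwas_alpha * (genome_bp / region_bp)

threshold = regional_threshold()
print(f"{threshold:.1e}")  # 1.5e-04
```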

Functional annotation

The functional annotations of the SNPs were determined using HaploReg v4.156. Using data from ENCODE57 and the Roadmap Epigenomics Consortium58, we examined the chromatin states, including the core 15-state model and the 25-state model using 12 imputed marks, H3K4me1 and H3K27ac (enhancers), and H3K4me3 and H3K9ac (promoters), for each identified SNP and other SNPs in strong LD with these lead SNPs (r2 > 0.8). We also assessed evolutionarily conserved regions, DNase hypersensitivity sites, variant effects on regulatory motifs, proteins bound, and eQTL hits from previous studies.
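For readers unfamiliar with the r2 measure of LD used to select proxy SNPs, the following sketch computes it from phased haplotypes using the textbook definition r2 = D2 / (pA(1 − pA)pB(1 − pB)), where D = pAB − pApB. This is an illustration only: the function name and toy haplotypes are hypothetical, and the study's LD values came from reference panel data rather than from a calculation like this.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.
    Each haplotype is a pair (a, b) of 0/1 alleles at SNP A and SNP B."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom

# Toy haplotypes: the two SNPs agree on 9 of 10 chromosomes.
toy = [(1, 1)] * 4 + [(0, 0)] * 5 + [(1, 0)]
print(ld_r2(toy) > 0.4)
```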

eQTL analysis

We carried out a cis-eQTL analysis to understand possible target genes in the six loci. All genes within ±1 Mb of each index SNP were evaluated, and gene expression in breast tumors from TCGA breast cancer patients (African ancestry, n = 164 and European ancestry, n = 778) was used in the analysis. A linear regression model estimated additive effects for each SNP, adjusting for age, ancestry, copy number variation, batch effect, and molecular subtype. Separate analyses were done for African and European ancestry samples and the estimates were meta-analysed to obtain overall estimates. Bonferroni significance levels were applied to determine statistical significance. We also checked associations between the identified loci and gene expression in several tissues, including normal breast, that had been published from previous eQTL analyses on the HaploReg website.
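A minimal sketch of the additive model is given below, assuming a single SNP and no covariates; the published analysis additionally adjusted for age, ancestry, copy number variation, batch effect, and molecular subtype, which this simplification omits. The function names, the number of tests, and the data are hypothetical.

```python
import math

def additive_eqtl(dosages, expression):
    """Least-squares slope of expression on allele dosage (0, 1, 2)
    and its standard error; covariates are deliberately omitted."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    beta = sxy / sxx
    rss = sum((y - mean_y - beta * (x - mean_x)) ** 2
              for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se

def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests

# Toy data: expression rises by roughly one unit per copy of the allele.
dosages = [0, 1, 2, 0, 1, 2]
expression = [1.0, 2.0, 3.0, 1.2, 1.9, 3.1]
beta, se = additive_eqtl(dosages, expression)
print(beta, se, bonferroni_alpha(n_tests=50))
```

The slope is the expected change in expression per additional copy of the allele. Ancestry-specific slopes and standard errors estimated this way are the quantities that would then be pooled across the African and European ancestry samples.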

Allelic pleiotropy

We searched the GWAS catalog for previously reported associations between phenotypes and the identified lead SNPs or any other SNPs in LD with them (r2 > 0.4).

Ethical approval

Informed consent was obtained from all subjects included in the analysis. The relevant ethical review boards at all participating institutions approved study protocols.

Data availability

The genotype datasets used in this study are publicly available via dbGaP ( including AABC under accession code phs000851.v1.p1, ONCO under accession code phs001265.v1.p1, AMBER under accession code phs000669.v1.p1, ROOT under accession code phs000383.v1.p1, and GBHS under accession code phs002387.v1.p1. Data for TCGA is available via The remaining data are available within the Article, Supplementary Information or Source Data file.

Code availability

The codes and summary statistics of the top 10000+ variants are stored on github ( The remaining data are available within the Article, Supplementary Information or available from the authors upon request.


  1. 1.

    Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424 (2018).

    PubMed  PubMed Central  Article  Google Scholar 

  2. 2.

    Lichtenstein, P. et al. Environmental and heritable factors in the causation of cancer–analyses of cohorts of twins from Sweden, Denmark, and Finland. N. Engl. J. Med. 343, 78–85 (2000).

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  3. 3.

    Mucci, L. A. et al. Familial risk and heritability of cancer among twins in nordic countries. JAMA 315, 68–76 (2016).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  4. 4.

    Couch, F. J., Nathanson, K. L. & Offit, K. Two decades after BRCA: setting paradigms in personalized cancer care and prevention. Science 343, 1466–1470 (2014).

    ADS  CAS  PubMed  PubMed Central  Article  Google Scholar 

  5. 5.

    Lilyquist, J., Ruddy, K. J., Vachon, C. M. & Couch, F. J. Common genetic variation and breast cancer risk-past, present, and future. Cancer Epidemiol. Biomark. Prev. 27, 380–394 (2018).

    CAS  Article  Google Scholar 

  6. 6.

    Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).

    ADS  PubMed  PubMed Central  Article  CAS  Google Scholar 

  7. 7.

    Shu, X. et al. Identification of novel breast cancer susceptibility loci in meta-analyses conducted among Asian and European descendants. Nat. Commun. 11, 1217 (2020).

    ADS  CAS  PubMed  PubMed Central  Article  Google Scholar 

  8. 8.

    Mavaddat, N. et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am. J. Hum. Genet. 104, 21–34 (2019).

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  9. 9.

    Milne, R. L. et al. Identification of ten variants associated with risk of estrogen-receptor-negative breast cancer. Nat. Genet. 49, 1767–1778 (2017).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  10. 10.

    Michailidou, K. et al. Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer. Nat. Genet. 47, 373–380 (2015).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  11. 11.

    Fletcher, O. et al. Novel breast cancer susceptibility locus at 9q31.2: results of a genome-wide association study. J. Natl Cancer Inst. 103, 425–435 (2011).

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  12. 12.

    Turnbull, C. et al. Genome-wide association study identifies five new breast cancer susceptibility loci. Nat. Genet. 42, 504–507 (2010).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  13. 13.

    Ghoussaini, M. et al. Genome-wide association analysis identifies three new breast cancer susceptibility loci. Nat. Genet 44, 312–318 (2012).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  14. 14.

    Park, S. L., Cheng, I. & Haiman, C. A. Genome-wide association studies of cancer in diverse populations. Cancer Epidemiol. Biomark. Prev. 27, 405–417 (2018).

    CAS  Article  Google Scholar 

  15. 15.

    Haiman, C. A. et al. A common variant at the TERT-CLPTM1L locus is associated with estrogen receptor-negative breast cancer. Nat. Genet. 43, 1210–1214 (2011).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  16. 16.

    Huo, D. et al. Genome-wide association studies in women of African ancestry identified 3q26.21 as a novel susceptibility locus for oestrogen receptor negative breast cancer. Hum. Mol. Genet. 25, 4835–4846 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  17. 17.

    Dookeran, K. A., Zhang, W., Stayner, L. & Argos, M. Associations of two-pore domain potassium channels and triple negative breast cancer subtype in The Cancer Genome Atlas: systematic evaluation of gene expression and methylation. BMC Res. Notes 10, 475 (2017).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  18. 18.

    Innamaa, A. et al. Expression and effects of modulation of the K2P potassium channels TREK-1 (KCNK2) and TREK-2 (KCNK10) in the normal human ovary and epithelial ovarian cancer. Clin. Transl. Oncol. 15, 910–918 (2013).

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  19. 19.

    Li, W. C. et al. KCNK levels are prognostic and diagnostic markers for hepatocellular carcinoma. Aging (Albany NY) 11, 8169–8182 (2019).

    CAS  Article  Google Scholar 

  20. 20.

    Chattopadhyay, I. et al. Genome-wide analysis of chromosomal alterations in patients with esophageal squamous cell carcinoma exposed to tobacco and betel quid from high-risk area in India. Mutat. Res. 696, 130–138 (2010).

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  21. 21.

    Lewis, M. J. et al. SIN3A and SIN3B differentially regulate breast cancer metastasis. Oncotarget 7, 78713–78725 (2016).

    PubMed  PubMed Central  Article  Google Scholar 

  22. 22.

    Bansal, N., David, G., Farias, E. & Waxman, S. Emerging Roles of Epigenetic Regulator Sin3 in Cancer. Adv. Cancer Res. 130, 113–135 (2016).

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  23. 23.

    Ellison-Zelski, S. J. & Alarid, E. T. Maximum growth and survival of estrogen receptor-alpha positive breast cancer cells requires the Sin3A transcriptional repressor. Mol. Cancer 9, 263 (2010).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  24. 24.

    Gambi, G. et al. The transcriptional regulator Sin3A contributes to the oncogenic potential of STAT3. Cancer Res. 79, 3076–3087 (2019).

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  25. 25.

    Yang, Y. et al. LSD1 coordinates with the SIN3A/HDAC complex and maintains sensitivity to chemotherapy in breast cancer. J. Mol. Cell Biol. 10, 285–301 (2018).

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  26. 26.

    Watanabe, K. et al. A novel somatic mutation of SIN3A detected in breast cancer by whole-exome sequencing enhances cell proliferation through ERalpha expression. Sci. Rep. 8, 16000 (2018).

  27.

    Li, W. et al. The FOXN3-NEAT1-SIN3A repressor complex promotes progression of hormonally responsive breast cancer. J. Clin. Investig. 127, 3421–3440 (2017).

  28.

    Singleton, D. R., Wu, T. T. & Castle, J. D. Three mammalian SCAMPs (secretory carrier membrane proteins) are highly related products of distinct genes having similar subcellular distributions. J. Cell Sci. 110, 2099–2107 (1997).

  29.

    GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).

  30.

    Westra, H. J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).

  31.

    Goruppi, S. et al. The ULK3 kinase is critical for convergent control of cancer-associated fibroblast activation by CSL and GLI. Cell Rep. 20, 2468–2479 (2017).

  32.

    Maloverjan, A., Piirsoo, M., Michelson, P., Kogerman, P. & Osterlund, T. Identification of a novel serine/threonine kinase ULK3 as a positive regulator of Hedgehog pathway. Exp. Cell Res. 316, 627–637 (2010).

  33.

    Zhong, Q. et al. Association of BRCA1 with the hRad50-hMre11-p95 complex and the DNA damage response. Science 285, 747–750 (1999).

  34.

    Wang, C., Uray, I. P., Mazumdar, A., Mayer, J. A. & Brown, P. H. SLC22A5/OCTN2 expression in breast cancer is induced by estrogen via a novel intronic estrogen-response element (ERE). Breast Cancer Res. Treat. 134, 101–115 (2012).

  35.

    Zou, D. et al. Integrative expression quantitative trait locus-based analysis of colorectal cancer identified a functional polymorphism regulating SLC22A5 expression. Eur. J. Cancer 93, 1–9 (2018).

  36.

    Armstrong, M. J. et al. IRF-1 inhibits NF-kappaB activity, suppresses TRAF2 and cIAP1 and induces breast cancer cell specific growth inhibition. Cancer Biol. Ther. 16, 1029–1041 (2015).

  37.

    Bouker, K. B. et al. Interferon regulatory factor-1 (IRF-1) exhibits tumor suppressor activities in breast cancer associated with caspase activation and induction of apoptosis. Carcinogenesis 26, 1527–1535 (2005).

  38.

    Orth, M. F. et al. Functional genomics identifies AMPD2 as a new prognostic marker for undifferentiated pleomorphic sarcoma. Int. J. Cancer 144, 859–867 (2019).

  39.

    Szydlowska, M. & Roszkowska, A. Expression patterns of AMP-deaminase isozymes in human hepatocellular carcinoma (HCC). Mol. Cell Biochem. 318, 1–5 (2008).

  40.

    Li, J. et al. Breast cancer genetic risk profile is differentially associated with interval and screen-detected breast cancers. Ann. Oncol. 26, 517–522 (2015).

  41.

    Liu, X. et al. Meta-analysis of GSTM1 null genotype and lung cancer risk in Asians. Med. Sci. Monit. 20, 1239–1245 (2014).

  42.

    Cai, X., Yang, L., Chen, H. & Wang, C. An updated meta-analysis of the association between GSTM1 polymorphism and colorectal cancer in Asians. Tumour Biol. 35, 949–953 (2014).

  43.

    Zhao, Y. et al. GSTM1 polymorphism and lung cancer risk among East Asian populations: a meta-analysis. Tumour Biol. 35, 6493–6500 (2014).

  44.

    Zhang, X. L. & Cui, Y. H. GSTM1 null genotype and gastric cancer risk in the Chinese population: an updated meta-analysis and review. Onco Targets Ther. 8, 969–975 (2015).

  45.

    Yang, H. et al. The association of GSTM1 deletion polymorphism with lung cancer risk in Chinese population: evidence from an updated meta-analysis. Sci. Rep. 5, 9392 (2015).

  46.

    Gu, J. et al. GSTM1 null genotype is associated with increased risk of gastric cancer in both ever-smokers and non-smokers: a meta-analysis of case-control studies. Tumour Biol. 35, 3439–3445 (2014).

  47.

    Economopoulos, K. P., Choussein, S., Vlahos, N. F. & Sergentanis, T. N. GSTM1 polymorphism, GSTT1 polymorphism, and cervical cancer risk: a meta-analysis. Int. J. Gynecol. Cancer 20, 1576–1580 (2010).

  48.

    Kim, L. C., Cook, R. S. & Chen, J. mTORC1 and mTORC2 in cancer and the tumor microenvironment. Oncogene 36, 2191–2201 (2017).

  49.

    Chantranupong, L. et al. The CASTOR Proteins Are Arginine Sensors for the mTORC1 Pathway. Cell 165, 153–164 (2016).

  50.

    Nyante, S. J. et al. Recruiting population controls for case-control studies in sub-Saharan Africa: the Ghana Breast Health Study. PLoS ONE 14, e0215347 (2019).

  51.

    Brinton, L. A. et al. Design considerations for identifying breast cancer risk factors in a population-based study in Africa. Int. J. Cancer 140, 2667–2677 (2017).

  52.

    Feng, Y. et al. A comprehensive examination of breast cancer risk loci in African American women. Hum. Mol. Genet. 23, 5518–5526 (2014).

  53.

    Hoffman, J. et al. Identification of novel common breast cancer risk variants at the 6q25 locus among Latinas. Breast Cancer Res. 21, 3 (2019).

  54.

    Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).

  55.

    Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).

  56.

    Ward, L. D. & Kellis, M. HaploReg v4: systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease. Nucleic Acids Res. 44, D877–D881 (2016).

  57.

    ENCODE Project Consortium. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9, e1001046 (2011).

  58.

    Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).



The ROOT Consortium was supported by National Cancer Institute grants R01-CA142996, R01-CA89085, R01-CA228198, and P20-CA233307. D.H. and G.G. were also partially supported by the National Cancer Institute (R03-CA227357 and R01-CA242929). D.H., O.I.O., and K.L.N. were also partially supported by the Breast Cancer Research Foundation (BCRF-20-071). K.L.N. was also supported by the Basser Center for BRCA. AABC was supported by a Department of Defense Breast Cancer Research Program Era of Hope Scholar Award to C.A.H. [W81XWH-08-1-0383] and the Norris Foundation. Each of the participating AABC studies was supported by the following grants: MEC (National Institutes of Health grants R01-CA63464 and R37-CA54281); CARE (National Institute for Child Health and Development grant NO1-HD-3-3175, K05 CA136967); WCHS (U.S. Army Medical Research and Materiel Command (USAMRMC) grant DAMD-17-01-0-0334, the National Institutes of Health grant R01-CA100598, and the Breast Cancer Research Foundation); SFBCS (National Institutes of Health grant R01-CA077305 and United States Army Medical Research Program grant DAMD-17-96-6071); NC-BCFR (National Institutes of Health grant U01-CA069417); CBCS (National Institutes of Health Specialized Program of Research Excellence in Breast Cancer, grant number P50-CA58223, and Center for Environmental Health and Susceptibility National Institute of Environmental Health Sciences, National Institutes of Health, grant number P30-ES10126); PLCO (Intramural Research Program, National Cancer Institute, National Institutes of Health); NBHS (National Institutes of Health grant R01-CA100374). The Breast Cancer Family Registry (BCFR) was supported by the National Cancer Institute, National Institutes of Health under RFA-CA-06-503 and through cooperative agreements with members of the Breast Cancer Family Registry and Principal Investigators.
The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centers in the BCFR, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government or the BCFR. MP was supported by Breast Cancer Research Foundation, Tower Cancer Research Foundation, and a gift from Dr. Richard Balch. AMBER was supported by the National Cancer Institute grants P01-CA151135, R01-CA098663, R01-CA058420, UM1-CA164974, R01-CA100598, P50-CA58223, and the University Cancer Research Fund of North Carolina. JRP was supported by the Susan G. Komen Foundation and the Karin Grunebaum Foundation. Pathology data were obtained from numerous state cancer registries (Arizona, California, Colorado, Connecticut, Delaware, District of Columbia, Florida, Georgia, Hawaii, Illinois, Indiana, Kentucky, Louisiana, Maryland, Massachusetts, Michigan, New Jersey, New York, North Carolina, Oklahoma, Pennsylvania, South Carolina, Tennessee, Texas, Virginia). The results reported do not necessarily represent their views or the views of the National Institutes of Health. The study protocol for the BWHS was approved by the Boston University Medical Center IRB and the IRBs of participating cancer registries, as required. BCAC is funded by Cancer Research UK [C1287/A16563, C1287/A10118], the European Union’s Horizon 2020 Research and Innovation Program (grant numbers 634935 and 633784 for BRIDGES and B-CAST respectively), and by the European Community’s Seventh Framework Program under grant agreement number 223175 (grant number HEALTH-F2-2009-223175) (COGS). The EU Horizon 2020 Research and Innovation Program funding source had no role in study design, data collection, data analysis, data interpretation or writing of the report. The Sister Study was funded by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01-ES044005).
GBHS authors acknowledge the research contributions of the Cancer Genomics Research Laboratory for their expertise, execution, and support of this research in the areas of project planning, wet laboratory processing of specimens, and bioinformatics analysis of generated data. This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under NCI Contract No. 75N910D00024. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.

Author information





Study design: D.H., C.A.H.; Data analysis: B.A., Z.D., and G.G.; Data interpretation: B.A., Z.D., G.G., M.G., J.R.P., C.A.H., and D.H.; Writing of the paper: B.A., D.H.; Review of the paper: B.A., Z.D., G.G., T.A., K.L.L., G.Z., J.F., E.M.J., L.B., W.Z., J.J.H., R.G.Z., S.N., E.V.B., S.A.I., M.F.P., S.L.D., J.L.R., S.Y., T.O.O., O.O., W.B., M.A.T., K.L.N., A.H., B.N., S.A., P.N.F., L.E.S., J.T.B., L.H.K., G.T., D.H., L.F., M.K.B., J.D., A.M.D., D.F.E., K.M., P.D.P.P., Q.W., D.P.S., J.A.T., K.M.O., C.M.K., A.G.F., C.B., J.Y., B.A., B.A.W., S.J.C., A.F.O., C.B.A., D.V.C., E.Z., O.I.O., M.G., J.R.P., C.A.H., and D.H.

Corresponding authors

Correspondence to Christopher A. Haiman or Dezheng Huo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

About this article

Cite this article

Adedokun, B., Du, Z., Gao, G. et al. Cross-ancestry GWAS meta-analysis identifies six breast cancer loci in African and European ancestry women. Nat Commun 12, 4198 (2021).
