SNP variants at the MAP3K1/SETD9 locus 5q11.2 associate with somatic PIK3CA variants in breast cancers


Genome-wide association studies have revealed many breast cancer (BC) risk-associated genetic variants that might functionally interact with other molecular determinants of BC. We analysed the association of 21 known risk-associated single-nucleotide variants (SNVs) with recurrent somatic variants in two cohorts of 77 and 754 oestrogen receptor α-positive BCs. Four SNVs located at 5q11.2 were found to be associated with the somatic PIK3CA variant status in the pilot cohort of 77 cases with odds ratio (OR) up to 6.5 indicating strong effects, and were selected for the validation phase. Two of these SNVs, rs252913 and rs331499, located in the MAP3K1/SETD9 gene boundary, were confirmed to be associated with somatic PIK3CA variants in the large cohort with OR 2.97 (1.17–7.75) and 1.76 (1.11–2.77), respectively, notably higher than their BC risk-associated values, both around 1.1. In the presence of the SNV or of somatic PIK3CA variants, cancers express significantly elevated levels of MAP3K1 and SETD9, with synergy of SNV and PIK3CA variants in MAP3K1 gene overexpression, consistent with a preferential PIK3CA-dependent regulation of the variant alleles.


Fewer than 10% of human breast cancers (BCs) show pronounced familial clustering that can be explained by high penetrance germline variants, but sporadic BCs are also co-determined by low penetrance variants.1, 2, 3 Genome-wide association studies comparing BC cohorts with the general population have identified almost 100 single-nucleotide variants (SNVs) associated with BC, with odds ratios (ORs) below 1.2 for most single SNVs.4, 5 However, unlike for high penetrance variants, how SNVs functionally increase BC risk is difficult to establish and mostly unknown.

In sporadic BC, next-generation sequencing has revealed a mutational landscape characterised by a large number of somatic variants (SM), but few recurrently mutated genes: PIK3CA (25–35%), TP53 (20–30%), and to a lesser extent MAP3K1, GATA3 and CDH1, with a certain preference for specific BC (sub)types.6, 7, 8 Recurrent somatic variants drive tumour progression, but little is known about what leads to the development of tumours carrying these variants. We reasoned that risk-associated genetic variants might modify driver gene penetrance, and investigated the issue by analysing the association of a small series of known BC risk-associated SNVs with the occurrence of SM in the frequently mutated genes PIK3CA, TP53 and MAP3K1.


Twenty-one SNVs were selected (Supplementary Table S1): 11 from O’Brien et al.,9 rs889312, an expression quantitative trait locus (eQTL) SNV,10, 11 and three further SNVs in high linkage disequilibrium (LD) with the former, all located at 5q11.2 (Supplementary Table S2), and seven further SNVs from Rhie et al.12 We preferred SNVs associated with oestrogen receptor α-positive (ER+) BC risk (often higher than for all BC) and close to coding sequences, to maximise the use of genotyped data without imputation of allelic variant status.

Because no previous OR assessment was available, we started by analysing a small pilot data set with sufficient power to detect strong associations: the whole-genome and exome sequences of Ellis et al.,13 comprising samples from 77 ER+ BC patients of CEU ethnicity (genomic data from the dbGap14 repository, authorisation #7444; SM data from Ellis et al.13). To confirm the observed associations, we used the 754 ER+ BC patient data from the BRCA data set of the TCGA consortium collection7 (genomic data available from the CGHub15 repository, authorisation #7821; public data are available online and through many research portals such as CBIO16).

Clinical and protected SRR/BAM files of normal blood samples, collecting selected sequences that included the 21 SNVs for the Ellis 2012 data set or the four 5q11.2 SNVs for the BRCA data set, were downloaded by using dbGap’s SRA-Toolkit service for Ellis 2012 and the beta2 version of CGHub’s BAM-Slicer service for TCGA-BRCA. Downloading only selected sequences crucially reduced the secure handling of multi-terabyte files that full data sets require; to our knowledge, this is the first report of an analysis performed entirely on selected downloads. Data were processed by Python and R scripts for realignment, filtering and allele calls, then linked to public (level 3) SM data of BC samples.

For the Ellis 2012 pilot data set, ORs were calculated for each allele (dominant model) of the 21 SNVs vs the SM status of each of PIK3CA, MAP3K1 and TP53. Associations with an uncorrected Fisher’s exact test P<0.1 and OR >3 were selected. Allele correlations were assessed by Pearson’s r coefficient. For the BRCA data set, ER+ samples were selected and a similar analysis was performed on the four 5q11.2 SNVs vs PIK3CA SM only. Fisher’s P values were reported without multiple test correction owing to the high correlations among the four SNVs.17 Logistic regression and ethnicity strata calculation with forest plots and the Breslow–Day homogeneity test were also performed.

Public (level 3) log2 normalised gene expression data for the BRCA BC samples were merged with the SNV/SM/clinical data, and the significance of differences in expression was assessed by using Student’s t-test. Statistical analyses were performed by using R (x64) 3.1.3. Result data were submitted to the GWAS Central repository.
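
The pilot-phase association test described above (dominant-model carrier status vs somatic status, OR plus Fisher's exact test, then the P<0.1 and OR >3 filter) can be sketched as follows. This is only an illustrative sketch in Python, not the authors' scripts: the function names and the example counts are hypothetical, and the two-sided P value follows the usual convention of summing all tables that are no more likely than the observed one.

```python
from math import comb


def dominant_table(alt_allele_counts, mutated):
    """Build the 2x2 table (a, b, c, d) under a dominant model.

    a: carriers with a somatic variant      b: carriers, wild type
    c: non-carriers with a somatic variant  d: non-carriers, wild type
    """
    a = b = c = d = 0
    for count, is_mutated in zip(alt_allele_counts, mutated):
        carrier = count >= 1  # dominant model: one copy of the allele is enough
        if carrier and is_mutated:
            a += 1
        elif carrier:
            b += 1
        elif is_mutated:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); infinite when b or c is zero."""
    if b == 0 or c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P for the 2x2 table with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among the mutated cases
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def passes_pilot_filter(table, p_max=0.1, or_min=3.0):
    """Apply the selection thresholds quoted in the text."""
    return fisher_exact_two_sided(*table) < p_max and odds_ratio(*table) > or_min


# Hypothetical counts, for illustration only (not data from the study)
example = (12, 5, 20, 40)
print(odds_ratio(*example), fisher_exact_two_sided(*example))
print(passes_pilot_filter(example))
```

In practice a statistics package (for example R's `fisher.test`, as the authors used R) would also report a confidence interval for the OR, which this sketch omits.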


In the Ellis 2012 pilot data set, we found strong indications that four SNVs in the 5q11.2 locus were associated with the mutational status of PIK3CA with high OR (Table 1). As expected from their LD, high correlation was found among these variants. Most of the other high OR SNV/SM associations had very low significance levels (full results in Supplementary Table S3). We focused on the associations of the four 5q11.2 SNVs with PIK3CA SM and verified them in the 754 ER+ BC patient data from the TCGA-BRCA data set. Two variants, rs331499 (hg19.chr5:g.56210923A>G) and rs252913 (hg19.chr5:g.56195846G>A), located at the boundary of the MAP3K1 and SETD9 genes, were confirmed to be associated with PIK3CA SM with high OR; a third, rs832552 (hg19.chr5:g.56113850T>G), inside MAP3K1, had few valid samples but a similar OR trend (Table 1). High correlation (Pearson's r range: 0.73–0.94) among the variants was confirmed (Supplementary Table S4). In logistic regression, no evidence of significant heterogeneity for ethnicity was found (Supplementary Tables S4 and S5).
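
The pairwise allele correlations mentioned above can be illustrated with a minimal Python sketch of Pearson's r computed on per-sample allele counts (0, 1 or 2 copies). The SNV labels and genotype vectors below are hypothetical placeholders, not data from either cohort, and the helper names are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_r(genotypes):
    """Return {(snv1, snv2): r} for every pair of SNVs in the dictionary."""
    return {
        (s1, s2): pearson_r(genotypes[s1], genotypes[s2])
        for s1, s2 in combinations(genotypes, 2)
    }


# Hypothetical allele counts for six samples (illustration only)
genotypes = {
    "snv_a": [0, 1, 1, 2, 0, 1],
    "snv_b": [0, 1, 0, 2, 0, 1],
    "snv_c": [1, 1, 0, 2, 0, 2],
}
for pair, r in pairwise_r(genotypes).items():
    print(pair, round(r, 2))
```

Values of r close to 1 between two SNVs indicate that their alleles are largely inherited together, which is why the associations of such SNVs with PIK3CA SM are not independent tests.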

Table 1 Association of SNVs close to the MAP3K1 gene with somatic PIK3CA variants, two cohorts

The three 5q11.2 variants were found to be associated with the overexpression of one or both of their nearest genes, MAP3K1 and SETD9; for MAP3K1, associations were stronger in PIK3CA SM than in wild type (WT) samples (Table 2 and Supplementary Tables S6–S8). Furthermore, we found a direct association between PIK3CA SM status and the overexpression of both MAP3K1 and SETD9. For MAP3K1 expression in PIK3CA SM vs WT, the difference of means was 0.38 (95% CI: 0.19, 0.57), P=4e−4; for SETD9 expression in PIK3CA SM vs WT, the difference of means was 1.44 (95% CI: 1.24, 1.63), P=2e−16.
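
A difference of means between two expression groups, as reported above, can be sketched in Python as a pooled-variance (Student's) two-sample comparison. This is an illustrative sketch, not the authors' R code: the function names and the example values are hypothetical, and the confidence interval uses a normal approximation (an assumption that is reasonable only for large groups), whereas a t-based interval would be slightly wider for small samples.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pooled_t_test(x, y):
    """Return (difference of means, standard error, t statistic, df)
    for a two-sample Student's t-test with pooled variance."""
    nx, ny = len(x), len(y)
    diff = mean(x) - mean(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = sqrt(pooled_var * (1 / nx + 1 / ny))
    return diff, se, diff / se, df


def normal_approx_ci(diff, se, level=0.95):
    """Large-sample confidence interval for the difference of means."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


# Hypothetical log2 expression values (illustration only)
expr_sm = [9.1, 9.4, 8.8, 9.6, 9.3]  # samples with a somatic variant
expr_wt = [8.7, 8.9, 9.0, 8.5, 8.8]  # wild-type samples
diff, se, t_stat, df = pooled_t_test(expr_sm, expr_wt)
print(diff, t_stat, df, normal_approx_ci(diff, se))
```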

Table 2 5q11.2 SNV association with MAP3K1 and SETD9 gene expression in TCGA-BRCA (ER+) data set


In this short report, we show that germline SNVs located near the MAP3K1/SETD9 genes associate with PIK3CA SM in ER+ BC with OR values (1.75 and 2.97 for rs331499 and rs252913, respectively) much higher than their OR of association with BC or BC subtypes (OR about 1.1,18 similar to the OR of most cancer-risk SNVs4). SNV data are consistent with gene expression data: the SNV associations with MAP3K1/SETD9 overexpression strengthen as the distance from the target gene decreases and, for MAP3K1, are stronger in PIK3CA SM BC samples. The overall picture is compatible with a MAP3K1/SETD9 variant-dependent overexpression affecting PIK3CA SM penetrance. Moreover, we found a clear direct association of PIK3CA SM with MAP3K1 and SETD9 overexpression. Indeed, inter-regulation between the PI3K and MAP-kinase pathways has been described in in vitro experiments and computer simulations,19 and combinations of drugs targeting both pathways are under clinical investigation.20 A possible SETD9 involvement is suggested by the strong SNV associations with SETD9 overexpression; moreover, a 5q11.2 SNV eQTL for SETD9 has also been reported in normal blood.18 However, we found a synergy of PIK3CA SM and SNV only for MAP3K1 overexpression.

Two of our findings indicate that a complex BC risk SNV structure is present in the 5q11.2 region. First, only the SNVs at the boundary of the MAP3K1/SETD9 genes (but not the reference risk SNP rs889312, with which they are in high LD) were found to be associated with PIK3CA SM. Second, phasing data showed that the SNV alleles associated with increased PIK3CA SM (and MAP3K1/SETD9 overexpression) are actually correlated with the reduced BC risk allele (A) of rs889312 (ref. 11; Supplementary Table S10). Hence, their opposite alleles should be associated with BCs in which PIK3CA is not mutated, to account for their overall increased BC risk.18 This ‘reverse’ phase should not be surprising, because the SNV/PIK3CA SM associations found have an allele-unbalancing effect one order of magnitude stronger than the reported SNV/BC risk. However, it predicts the presence of multiple classes of SNV BC risk in the 5q11.2 segment that split when probed for PIK3CA somatic variants.

This multiplicity could be a consequence of the ubiquitin ligase activity that MAP3K1 has in addition to its kinase activity, which allows it both to activate and to destabilise MAP-kinases.21, 22 The complex BC risk SNV structure has been confirmed by a recent fine-scale analysis of the 5q11.2 region in a large cohort of patients (not available when we started our investigation) that identified, by logistic regression, four BC risk-associated haplo-blocks.18 By analysing four genotyped SNVs representative of the haplo-blocks in the BRCA data set, we found that only one SNV allele correlated with enriched PIK3CA SM, and it was associated with a reduced BC risk (Supplementary Table S9).

In conclusion, the germline 5q11.2 variants, rs331499 allele A and rs252913 allele G, are associated with MAP3K1 and SETD9 overexpression, and correlate with increased PIK3CA SM frequency in ER+ BC. Genome-wide analysis of SNV/SM associations can increase our understanding of tumour biology with relevant information for precision medicine.


  1. Foulkes WD: Inherited susceptibility to common cancers. N Engl J Med 2008; 359: 2143–2153.
  2. Pharoah PD, Antoniou A, Bobrow M, Zimmern RL, Easton DF, Ponder BA: Polygenic susceptibility to breast cancer and implications for prevention. Nat Genet 2002; 31: 33–36.
  3. Mucci LA, Hjelmborg JB, Harris JR et al: Familial risk and heritability of cancer among twins in Nordic countries. JAMA 2016; 315: 68–76.
  4. Michailidou K, Beesley J, Lindstrom S et al: Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer. Nat Genet 2015; 47: 373–380.
  5. Harlid S, Ivarsson MI, Butt S et al: Combined effect of low-penetrant SNVs on breast cancer risk. Br J Cancer 2012; 106: 389–396.
  6. Stephens PJ, Tarpey PS, Davies H et al: The landscape of cancer genes and mutational processes in breast cancer. Nature 2012; 486: 400–404.
  7. Cancer Genome Atlas Network: Comprehensive molecular portraits of human breast tumours. Nature 2012; 490: 61–70.
  8. Stratton MR, Campbell PJ, Futreal PA: The cancer genome. Nature 2009; 458: 719–724.
  9. O'Brien KM, Cole SR, Engel LS et al: Breast cancer subtypes and previously established genetic risk factors: a bayesian approach. Cancer Epidemiol Biomarkers Prev 2014; 23: 84–97.
  10. Li Q, Seo JH, Stranger B et al: Integrative eQTL-based analyses reveal the biology of breast cancer risk loci. Cell 2013; 152: 633–641.
  11. Lu PH, Yang J, Li C et al: Association between mitogen-activated protein kinase kinase kinase 1 rs889312 polymorphism and breast cancer risk: evidence from 59977 subjects. Breast Cancer Res Treat 2011; 126: 663–670.
  12. Rhie SK, Coetzee SG, Noushmehr H et al: Comprehensive functional annotation of seventy-one breast cancer risk Loci. PLoS One 2013; 8: e63925.
  13. Ellis MJ, Ding L, Shen D et al: Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature 2012; 486: 353–360.
  14. Tryka KA, Hao L, Sturcke A et al: NCBI's Database of Genotypes and Phenotypes: dbGaP. Nucleic Acids Res 2014; 42: D975–D979.
  15. Wilks C, Cline MS, Weiler E et al: The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data. Database 2014, e-pub ahead of print 29 September 2014; doi:10.1093/database/bau093.
  16. Cerami E, Gao J, Dogrusoz U et al: The cBio Cancer Genomics Portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov 2012; 2: 401.
  17. Nyholt DR: A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am J Hum Genet 2004; 74: 765–769.
  18. Glubb DM, Maranian MJ, Michailidou K et al: Fine-scale mapping of the 5q11.2 breast cancer locus reveals at least three independent risk variants regulating MAP3K1. Am J Hum Genet 2015; 96: 5–20.
  19. Aksamitiene E, Kiyatkin A, Kholodenko BN: Cross-talk between mitogenic Ras/MAPK and survival PI3K/Akt pathways: a fine balance. Biochem Soc Trans 2012; 40: 139–146.
  20. Britten CD: PI3K and MEK inhibitor combinations: examining the evidence in selected tumor types. Cancer Chemother Pharmacol 2013; 71: 1395–1409.
  21. Lu Z, Xu S, Joazeiro C, Cobb MH, Hunter T: The PHD domain of MEKK1 acts as an E3 ubiquitin ligase and mediates ubiquitination and degradation of ERK1/2. Mol Cell 2002; 9: 945–956.
  22. Xia Y, Wang J, Xu S, Johnson GL, Hunter T, Lu Z: MEKK1 mediates the ubiquitination and degradation of c-Jun in response to osmotic stress. Mol Cell Biol 2007; 27: 510–517.


The results shown here are in part based upon data generated by the TCGA Research Network. We thank Paola Ghiorzo, Genova, for helpful comments.

Author information



Corresponding author

Correspondence to Roberto Puzone.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Supplementary Information accompanies this paper on the European Journal of Human Genetics website.

Cite this article

Puzone, R., Pfeffer, U. SNP variants at the MAP3K1/SETD9 locus 5q11.2 associate with somatic PIK3CA variants in breast cancers. Eur J Hum Genet 25, 384–387 (2017).
