The breast cancer risk variants identified in genome-wide association studies explain only a small fraction of the familial relative risk, and the genes responsible for these associations remain largely unknown. To identify novel risk loci and likely causal genes, we performed a transcriptome-wide association study evaluating associations of genetically predicted gene expression with breast cancer risk in 122,977 cases and 105,974 controls of European ancestry. We used data from the Genotype-Tissue Expression Project to establish genetic models to predict gene expression in breast tissue and evaluated model performance using data from The Cancer Genome Atlas. Of the 8,597 genes evaluated, significant associations were identified for 48 at a Bonferroni-corrected threshold of P < 5.82 × 10⁻⁶, including 14 genes at loci not yet reported for breast cancer. We silenced 13 genes and showed an effect for 11 on cell proliferation and/or colony-forming efficiency. Our study provides new insights into breast cancer genetics and biology.
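The Bonferroni-corrected threshold quoted in the abstract can be reproduced with one line of arithmetic. The sketch below assumes a family-wise error rate of 0.05, which the abstract does not state explicitly but which is consistent with the reported value; the variable names are illustrative only.

```python
# Bonferroni correction: divide the family-wise error rate by the number of tests.
n_genes = 8597  # genes evaluated, as reported in the abstract
alpha = 0.05    # ASSUMPTION: family-wise error rate, not stated in the abstract

threshold = alpha / n_genes
print(f"Bonferroni threshold: {threshold:.2e}")
```

Under that assumption the result rounds to the threshold reported above.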
The authors thank J. He, W. Wen, A. Giri and T. Edwards of Vanderbilt Epidemiology Center and R. Tao of the Department of Biostatistics, Vanderbilt University Medical Center for their help with the data analysis of this study. The authors would also like to thank all of the individuals for their participation in the parent studies and all of the researchers, clinicians, technicians and administrative staff for their contribution to the studies. We are also grateful to H. K. Im of University of Chicago for her help. The data analyses were conducted using the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University. This project at Vanderbilt University Medical Center was supported in part by grants R01CA158473 and R01CA148677 from the US National Institutes of Health as well as funds from Anne Potter Wilson endowment. L.W. is supported by NCI K99 CA218892 and the Vanderbilt Molecular and Genetic Epidemiology of Cancer (MAGEC) training program (US NCI grant R25 CA160056 awarded to X.-O.S.). Genotyping of the OncoArray was principally funded from three sources: the PERSPECTIVE project, funded by the Government of Canada through Genome Canada and the Canadian Institutes of Health Research, the Ministère de l’Économie, de la Science et de l’Innovation du Québec through Genome Québec and the Quebec Breast Cancer Foundation; the NCI Genetic Associations and Mechanisms in Oncology (GAME-ON) initiative and the Discovery, Biology and Risk of Inherited Variants in Breast Cancer (DRIVE) project (National Institutes of Health (NIH) grants U19 CA148065 and X01HG007492); and Cancer Research UK (C1287/A10118 and C1287/A16563). BCAC is funded by Cancer Research UK (C1287/A16563), by the European Community’s Seventh Framework Programme under grant agreement 223175 (HEALTH-F2-2009-223175) (COGS) and by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreements 633784 (B-CAST) and 634935 (BRIDGES). 
Genotyping of the iCOGS array was funded by the European Union (HEALTH-F2-2009-223175), Cancer Research UK (C1287/A10710), the Canadian Institutes of Health Research for the ‘CIHR Team in Familial Risks of Breast Cancer’ program, and the Ministry of Economic Development, Innovation and Export Trade of Quebec (grant no. PSR-SIIRI-701). Combining of the GWAS data was supported in part by the NIH Cancer Post-Cancer GWAS initiative grant U19 CA 148065 (DRIVE, part of the GAME-ON initiative). A full description of funding and acknowledgments for BCAC studies, along with consortium membership, is included in the Supplementary Note.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated Supplementary Information
Supplementary Figure 1 Study design flow chart.
Supplementary Figure 2 Performance of expression prediction models in GTEx and TCGA datasets for genes with at least 10% correlation in GTEx data.
The x axis represents the prediction performance (R²) in the GTEx dataset (n = 67). The y axis represents the prediction performance in the TCGA dataset (n = 86). Each dot represents the expression prediction model for one gene. There is a trend that genes with high internal prediction performance in GTEx data also have high external prediction performance in TCGA data (Pearson's correlation coefficient: 0.55).
a, Quantile–quantile plot of P values in –log scale of associations between the genetically predicted expression levels of 8,597 genes and breast cancer risk. b, Quantile–quantile plot of P values in –log scale of associations between all 11.8 million SNPs and breast cancer risk in BCAC. c, Quantile–quantile plot of P values in –log scale of associations between the over 250,000 SNPs predicting expression levels of the 8,597 genes and breast cancer risk in BCAC.
Supplementary Figure 4 Heatmap of log fold change (FC) of selected genes normalized to expression levels in 184A1 breast cells.
Two or three primer sets were designed for each gene (y axis), and mRNA levels were quantified by qPCR in the indicated cell lines (x axis), including 184A1. The FC of genes normalized to that in 184A1 equals the mRNA level in the indicated cells divided by the mRNA level in 184A1. The log₂(FC) over 184A1 is depicted as a heatmap. An X represents ‘not detectable’ with all primer sets. The experiment was repeated independently twice with similar results.
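The fold-change definition in this legend can be written as a short function. This is a minimal sketch: the function name, the use of `None` for ‘not detectable’, and the example values are hypothetical and are not taken from the study.

```python
import math


def log2_fold_change(level_in_cells, level_in_184a1):
    """Return log2 of the mRNA level in a cell line relative to 184A1.

    Returns None when either level is missing or non-positive, which
    stands in for the 'not detectable' (X) cells of the heatmap.
    """
    if level_in_cells is None or level_in_184a1 is None:
        return None
    if level_in_cells <= 0 or level_in_184a1 <= 0:
        return None
    return math.log2(level_in_cells / level_in_184a1)


# Hypothetical relative mRNA levels, for illustration only.
print(log2_fold_change(4.0, 1.0))   # fourfold higher than 184A1
print(log2_fold_change(0.5, 1.0))   # half the 184A1 level
print(log2_fold_change(None, 1.0))  # not detectable
```

On the log₂ scale, equal fold increases and decreases are symmetric around zero, which is why the heatmap is drawn from log₂(FC) rather than from the raw ratio.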
184A1, MCF7 and T47D cells, transfected with the indicated siRNAs, were harvested after 36 h for qPCR analysis to assess knockdown efficiency. The fold changes over NTCsi-transfected parental cells are plotted. The experiment was repeated three times independently with similar results.
a–c, 184A1 (a), MCF7 (b) and T47D (c) cells were transfected with the indicated siRNAs over 7 d, and phase-contrast images were collected using an IncuCyte ZOOM. Each cell proliferation time course was normalized to the baseline confluency and analyzed in GraphPad Prism. Corrected proliferation % = 100 ± (relative proliferation in indicated siRNA – proliferation in control siRNA (consi))/knockdown efficiency. Related to Fig. 2a.
MCF7 cells were transfected with the indicated siRNAs and then reseeded after 16 h for colony formation (CF) assays. At day 14, colonies were fixed with methanol, stained with crystal violet, scanned and batch analyzed by ImageJ. Corrected CF efficiency (CFE) % = 100 ± (relative CFE in indicated siRNA – CFE in control siRNA (consi))/knockdown efficiency. Error bars, s.d. (n = 4). P values were determined by one-way ANOVA followed by Dunnett’s multiple-comparisons test: *P < 0.05. Related to Fig. 2b.
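The corrected proliferation and corrected CFE formulas in the two legends above share one form, sketched below. Several points are assumptions rather than details from the legends: the ‘±’ is read as adding the signed difference (so a reduction gives a value below 100), the measured values are expressed as percentages of control, and knockdown efficiency is a fraction between 0 and 1. The function name and example numbers are hypothetical.

```python
def corrected_percent(relative_value, control_value, knockdown_efficiency):
    """Scale the difference from control by knockdown efficiency.

    ASSUMPTIONS: relative_value and control_value are percentages,
    knockdown_efficiency is a fraction in (0, 1], and the '±' of the
    legend is interpreted as adding the signed difference to 100.
    """
    if not 0 < knockdown_efficiency <= 1:
        raise ValueError("knockdown efficiency must be in (0, 1]")
    difference = relative_value - control_value
    return 100 + difference / knockdown_efficiency


# Hypothetical example: 70% of control proliferation at 75% knockdown.
print(corrected_percent(70.0, 100.0, 0.75))
```

Dividing by knockdown efficiency extrapolates the observed effect to what a complete knockdown would produce under a linear dose-response assumption, so genes silenced with different efficiencies can be compared on one scale.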
The simulation analysis is based on 122,977 cases and 105,974 controls. Gene expression was generated from the empirical distribution of predicted gene expression levels in the BCAC. Statistical power was calculated at P < 5.82 × 10⁻⁶ (the significance level used in the main TWAS analyses) according to cis-heritability (h²), which we aim to capture using gene expression prediction models (R²). The figure shows results per 1 s.d. increase (or decrease) in the gene expression based on 1,000 replicates.
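As a rough cross-check of how power depends on cis-heritability, the sketch below uses a normal approximation to a two-sided Wald test. It is not the simulation procedure described in the legend. It assumes that the odds ratio is expressed per 1 s.d. of total expression, that the predicted component has variance h², and that effects are small enough for the usual logistic-regression variance approximation to hold. The function name and the odds ratio of 1.1 are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(odds_ratio, h2, n_cases, n_controls, alpha):
    """Approximate power of a two-sided Wald test in a case-control design.

    ASSUMPTION: the test statistic is normal with unit variance and mean
    |log(OR)| * sqrt(h2 * n_cases * n_controls / (n_cases + n_controls)).
    """
    beta = math.log(odds_ratio)
    n_eff = n_cases * n_controls / (n_cases + n_controls)
    ncp = abs(beta) * math.sqrt(h2 * n_eff)
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the statistic falls beyond either critical value.
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)


# Sample sizes and threshold from the legend; OR = 1.1 is hypothetical.
for h2 in (0.05, 0.1, 0.2):
    power = approx_power(1.1, h2, 122977, 105974, 5.82e-6)
    print(f"h2 = {h2:.2f}: approximate power = {power:.3f}")
```

The approximation shows the qualitative pattern one would expect: for a fixed effect size, power rises with h², because a larger share of expression variance is captured by the genetic predictor.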
About this article
Cite this article
Wu, L., Shi, W., Long, J. et al. A transcriptome-wide association study of 229,000 women identifies new candidate susceptibility genes for breast cancer. Nat Genet 50, 968–978 (2018). https://doi.org/10.1038/s41588-018-0132-x