A transcriptome-wide association study of 229,000 women identifies new candidate susceptibility genes for breast cancer

Abstract

The breast cancer risk variants identified in genome-wide association studies explain only a small fraction of the familial relative risk, and the genes responsible for these associations remain largely unknown. To identify novel risk loci and likely causal genes, we performed a transcriptome-wide association study evaluating associations of genetically predicted gene expression with breast cancer risk in 122,977 cases and 105,974 controls of European ancestry. We used data from the Genotype-Tissue Expression Project to establish genetic models to predict gene expression in breast tissue and evaluated model performance using data from The Cancer Genome Atlas. Of the 8,597 genes evaluated, significant associations were identified for 48 at a Bonferroni-corrected threshold of P < 5.82 × 10−6, including 14 genes at loci not yet reported for breast cancer. We silenced 13 genes and found that knockdown of 11 of them affected cell proliferation and/or colony-forming efficiency. Our study provides new insights into breast cancer genetics and biology.
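
Two quantitative steps in the abstract can be made concrete with a short sketch. It assumes that genetically predicted expression is a weighted sum of allele dosages and computes a gene-level z-score from GWAS summary statistics with the formula popularized by S-PrediXcan, Z_g ≈ Σ_i w_i (σ_i/σ_g) Z_i, where σ_g² = wᵀΣw and Σ is the SNP dosage covariance (LD) matrix. This excerpt does not state which formula the study used, and all weights, covariances and z-scores below are hypothetical. The Bonferroni step divides α = 0.05 by the 8,597 genes tested.

```python
import math
from statistics import NormalDist


def predicted_expression(dosages, weights):
    """Genetically predicted expression: weighted sum of allele dosages (0 to 2)."""
    return sum(w * d for w, d in zip(weights, dosages))


def predicted_expression_sd(weights, snp_cov):
    """SD of predicted expression, sqrt(w' C w), with C the SNP dosage covariance (LD)."""
    n = len(weights)
    var = sum(weights[i] * weights[j] * snp_cov[i][j] for i in range(n) for j in range(n))
    return math.sqrt(var)


def gene_z_score(weights, snp_cov, snp_z):
    """S-PrediXcan-style gene z-score: sum_i w_i * (sd_i / sd_g) * z_i."""
    sd_g = predicted_expression_sd(weights, snp_cov)
    return sum(w * math.sqrt(snp_cov[i][i]) / sd_g * z
               for i, (w, z) in enumerate(zip(weights, snp_z)))


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level controlling the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical two-SNP model: weights, LD covariance and GWAS z-scores are made up.
weights = [0.3, -0.2]
snp_cov = [[0.25, 0.04],
           [0.04, 0.16]]
snp_z = [4.0, -3.0]

z_g = gene_z_score(weights, snp_cov, snp_z)
threshold = bonferroni_threshold(0.05, 8597)
print(f"gene z = {z_g:.2f}, P = {two_sided_p(z_g):.2e}, threshold = {threshold:.2e}")
print("significant:", two_sided_p(z_g) < threshold)
```

Dividing 0.05 by 8,597 gives about 5.816 × 10−6, which rounds to the threshold reported in the abstract.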

Fig. 1: A Manhattan plot of the association results from the breast cancer transcriptome-wide association study.
Fig. 2: Heat maps of proliferation and colony-forming efficiency (CFE) in breast cells.

Acknowledgements

The authors thank J. He, W. Wen, A. Giri and T. Edwards of the Vanderbilt Epidemiology Center and R. Tao of the Department of Biostatistics, Vanderbilt University Medical Center, for their help with the data analysis of this study. The authors would also like to thank all of the individuals for their participation in the parent studies and all of the researchers, clinicians, technicians and administrative staff for their contributions to the studies. We are also grateful to H. K. Im of the University of Chicago for her help. The data analyses were conducted using the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University. This project at Vanderbilt University Medical Center was supported in part by grants R01CA158473 and R01CA148677 from the US National Institutes of Health as well as funds from the Anne Potter Wilson endowment. L.W. is supported by NCI K99 CA218892 and the Vanderbilt Molecular and Genetic Epidemiology of Cancer (MAGEC) training program (US NCI grant R25 CA160056 awarded to X.-O.S.). Genotyping of the OncoArray was principally funded from three sources: the PERSPECTIVE project, funded by the Government of Canada through Genome Canada and the Canadian Institutes of Health Research, the Ministère de l’Économie, de la Science et de l’Innovation du Québec through Genome Québec and the Quebec Breast Cancer Foundation; the NCI Genetic Associations and Mechanisms in Oncology (GAME-ON) initiative and the Discovery, Biology and Risk of Inherited Variants in Breast Cancer (DRIVE) project (National Institutes of Health (NIH) grants U19 CA148065 and X01HG007492); and Cancer Research UK (C1287/A10118 and C1287/A16563). BCAC is funded by Cancer Research UK (C1287/A16563), by the European Community’s Seventh Framework Programme under grant agreement 223175 (HEALTH-F2-2009-223175) (COGS) and by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreements 633784 (B-CAST) and 634935 (BRIDGES). Genotyping of the iCOGS array was funded by the European Union (HEALTH-F2-2009-223175), Cancer Research UK (C1287/A10710), the Canadian Institutes of Health Research for the ‘CIHR Team in Familial Risks of Breast Cancer’ program, and the Ministry of Economic Development, Innovation and Export Trade of Quebec, grant no. PSR-SIIRI-701. Combining the GWAS data was supported in part by the NIH Cancer Post-Cancer GWAS initiative grant U19 CA 148065 (DRIVE, part of the GAME-ON initiative). A full description of funding and acknowledgments for BCAC studies, along with consortium membership, is included in the Supplementary Note.

Author information

Contributions

W.Z. and J. Long conceived the study. L.W. contributed to the study design and performed statistical analyses. L.W., W.Z. and G.C.-T. wrote the manuscript with significant contributions from W.S., J. Long, X.G. and S.L.E. W.S. performed the in vitro experiments. G.C.-T. directed the in vitro experiments. X.G. contributed to the model building and pathway analyses. J.B. contributed to the bioinformatics analyses. F.A.-E., E.R. and S.L.E. contributed to the in vitro experiments. Y.L. and C.Z. contributed to the model building. K.M., M.K.B., X.-O.S., Q.W., J.D., B.L., C.Z., H.F., A.G., R.T.B., A.M.D., P.D.P.P., J.S., R.L.M., P.K. and D.F.E. contributed to manuscript revision, statistical analyses and/or BCAC data management. I.L.A., H.A.-C., V.A., K.J.A., P.L.A., M. Barrdahl, C.B., M.W.B., J.B., M. Bermisheva, C.B., N.V.B., S.E.B., H. Brauch, H. Brenner, L.B., P.B., S.Y.B., B.B., Q.C., T.C., F.C., B.D.C., J.E.C., J.C.-C., X.C., T.-Y.D.C., H.C., C.L.C., NBCS Collaborators, M.C., S.C., F.J.C., D.C., A.C., S.S.C., J.M.C., K.C., M.B.D., P.D., K.F.D., T.D., I.d.S.S., M. Dumont, M. Dwek, D.M.E., U.E., H.E., C.E., M.E., L.F., P.A.F., J.F., D.F.-J., O.F., H.F., L.F., M. Gabrielson, M.G.-D., S.M.G., M.G.-C., M.M.G., M. Ghoussaini, G.G.G., M.S.G., D.E.G., A.G.-N., P.G., E. Hahnen, C.A.H., N.H., P. Hall, E. Hallberg, U.H., P. Harrington, A. Hein, B.H., P. Hillemanns, A. Hollestelle, R.N.H., J.L.H., G.H., K.H., D.J.H., A.J., W.J., E.M.J., N.J., K.J., M.E.J., A. Jung, R.K., M.J.K., E.K., V.-M.K., V.N.K., D.L., L.L.M., J. Li, S.L., J. Lissowska, W.-Y.L., S. Loibl, J. Lubinski, C.L., M.P.L., R.J.M., T.M., I.M.K., A. Mannermaa, J.E.M., S.M., D.M., H.M.-H., A. Meindl, U.M., J.M., A.M.M., S.L.N., H.N., P.N., S.F.N., B.G.N., O.I.O., J.E.O., H.O., P.P., J.P., D.P.-K., R.P., N.P., K.P., B.R., P.R., N.R., G.R., H.S.R., V.R., A. Romero, J.R., A. Rudolph, E.S., D.P.S., E.J.S., M.K.S., R.K.S., A.S., R.J.S., C.G.S., S.S., M.S., M.J.S., A.S., M.C.S., J.J.S., J.S., H.S., A.J.S., R.T., W.T., J.A.T., M.B.T., D.C.T., A.T., K.T., R.A.E.M.T., D.T., T.T., M.U., C.V., D.V.D.B., D.V., Q.W., C.R.W., C.W., A.S.W., H.W., W.C.W., R.W., A.W., L.X., X.R.Y., A.Z., E.Z. and kConFab/AOCS Investigators contributed to the collection of the data and biological samples for the original BCAC studies. All authors have reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Georgia Chenevix-Trench or Wei Zheng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated Supplementary Information

Supplementary Figure 1 Study design flow chart.

Supplementary Figure 2 Performance of expression prediction models in GTEx and TCGA datasets for genes with a correlation of at least 0.1 (10%) between predicted and observed expression in GTEx data.

The x axis represents the prediction performance (R²) in the GTEx dataset (n = 67). The y axis represents the prediction performance (R²) in the TCGA dataset (n = 86). Each dot represents the expression prediction model for one gene. Genes with high internal prediction performance in GTEx data tend also to show high external prediction performance in TCGA data (Pearson's correlation coefficient: 0.55).

Supplementary Figure 3 Quantile–quantile plots.

a, Quantile–quantile plot of −log10-transformed P values for associations between the genetically predicted expression levels of 8,597 genes and breast cancer risk. b, Quantile–quantile plot of −log10-transformed P values for associations between all 11.8 million SNPs and breast cancer risk in the Breast Cancer Association Consortium (BCAC). c, Quantile–quantile plot of −log10-transformed P values for associations between the more than 250,000 SNPs used to predict expression levels of the 8,597 genes and breast cancer risk in BCAC.
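
For readers reproducing such plots, the sketch below computes the expected and observed −log10 P quantiles that a QQ plot compares, together with the genomic inflation factor λ that is commonly reported alongside QQ plots (the caption itself does not report λ). The plotting position (i + 0.5)/n and the perfectly uniform input are illustrative assumptions.

```python
import math
from statistics import NormalDist, median


def qq_points(p_values):
    """(expected, observed) -log10 P pairs for a QQ plot under a uniform null."""
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    # Expected quantiles use the (i + 0.5) / n plotting position, largest first.
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))


def genomic_inflation(p_values):
    """lambda_GC: median 1-df chi-square statistic divided by its null median (about 0.455).

    Converts two-sided P values back to chi-square values via the normal quantile;
    very small P values (roughly below 1e-15) lose precision here because 1 - p/2
    rounds toward 1, so a dedicated statistics library is safer for real GWAS output.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2) / nd.inv_cdf(0.75) ** 2


# Hypothetical, perfectly uniform P values: points lie on the diagonal and lambda is 1.
p_null = [(i + 0.5) / 1001 for i in range(1001)]
points = qq_points(p_null)
print(points[0], genomic_inflation(p_null))
```

Points above the diagonal at the upper tail indicate associations stronger than expected under the null, whereas a λ well above 1 across the whole distribution points to inflation (for example from population structure or polygenicity).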

Supplementary Figure 4 Heatmap of log2 fold change (FC) in the expression of selected genes relative to 184A1 breast cells.

Two or three primer sets were designed for each gene (y axis), and mRNA levels were quantified by qPCR in the indicated cell lines (x axis), including 184A1. For each gene, FC equals the mRNA level in the indicated cells divided by the mRNA level in 184A1, and log2(FC) relative to 184A1 is shown as a heatmap. An X denotes ‘not detectable’ with all primer sets. The experiment was repeated independently twice with similar results.
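
A minimal sketch of how the heatmap values follow from the FC definition in this caption. It assumes each gene has already been summarized to a single relative mRNA level per cell line (how the two or three primer sets were combined is not stated), uses None for ‘not detectable’, and uses hypothetical gene names and levels; 184A1, MCF7 and T47D are the cell lines named in these captions.

```python
import math


def log2_fc_table(mrna, reference="184A1"):
    """log2 fold change of each gene's mRNA level relative to the reference cell line.

    mrna maps gene -> {cell line -> relative mRNA level, or None if not detectable}.
    Returns gene -> {cell line -> log2 FC, or "X" when FC is undefined}.
    """
    table = {}
    for gene, levels in mrna.items():
        ref = levels[reference]
        row = {}
        for cell, level in levels.items():
            # FC is undefined if either the sample or the reference is not detectable.
            row[cell] = "X" if level is None or ref is None else math.log2(level / ref)
        table[gene] = row
    return table


# Hypothetical relative mRNA levels (arbitrary units).
mrna = {
    "GENE_A": {"184A1": 1.0, "MCF7": 4.0, "T47D": 0.5},
    "GENE_B": {"184A1": 2.0, "MCF7": None, "T47D": 2.0},
}
table = log2_fc_table(mrna)
print(table)
```

Working on the log2 scale makes a fourfold increase (+2) and a fourfold decrease (−2) symmetric around 0, which is why fold changes are usually displayed this way in heatmaps.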

Supplementary Figure 5 Validation of knockdown.

184A1, MCF7 and T47D cells transfected with the indicated siRNAs were harvested after 36 h for qPCR analysis to assess knockdown efficiency. Fold changes relative to parental cells transfected with non-targeting control siRNA (NTCsi) are plotted. The experiment was repeated independently three times with similar results.
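
The knockdown efficiency used to correct the proliferation and CFE readouts can be sketched from qPCR data. This assumes the common 2^−ΔΔCt relative-quantification method with a housekeeping reference gene and roughly equal primer efficiencies, and defines knockdown efficiency as 1 minus the fold change over the NTCsi control; neither choice is stated in the caption, and the Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_si, ct_ref_si, ct_target_ntc, ct_ref_ntc):
    """Relative expression by the 2^-ddCt method (assumes ~100% amplification efficiency).

    ct_ref_* are Ct values of a housekeeping reference gene in the same samples.
    """
    delta_si = ct_target_si - ct_ref_si      # normalized Ct in siRNA-transfected cells
    delta_ntc = ct_target_ntc - ct_ref_ntc   # normalized Ct in NTCsi-transfected cells
    return 2 ** -(delta_si - delta_ntc)


def knockdown_efficiency(fold_change):
    """Fraction of target mRNA removed relative to the non-targeting control."""
    return 1 - fold_change


# Hypothetical Ct values: the target appears two cycles later after knockdown.
fc = fold_change_ddct(27.0, 18.0, 25.0, 18.0)
print(fc, knockdown_efficiency(fc))
```

A delay of two PCR cycles corresponds to a quarter of the control mRNA level, that is, a 75% knockdown.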

Supplementary Figure 6 Proliferation in breast cells using two independent siRNAs.

a–c, 184A1 (a), MCF7 (b) and T47D (c) cells were transfected with the indicated siRNAs, and phase-contrast images were collected over 7 d using an IncuCyte ZOOM. Each cell proliferation time course was normalized to the baseline confluency and analyzed in GraphPad Prism. Corrected proliferation % = 100 + (relative proliferation with the indicated siRNA − proliferation with control siRNA (consi))/knockdown efficiency. Related to Fig. 2a.
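
The correction in this caption rescales the observed siRNA effect by the knockdown efficiency, approximating the effect of a complete knockdown. A minimal sketch, assuming the formula reads 100 + (siRNA value − control value)/knockdown efficiency with proliferation expressed as a percentage of the control; the input numbers are hypothetical. The same function would apply to the CFE correction in Supplementary Figure 7.

```python
def corrected_percent(relative_si, relative_control, knockdown_eff):
    """Scale an siRNA effect up to the effect expected from a complete knockdown.

    relative_si, relative_control: readout (%) with the targeting and control siRNA.
    knockdown_eff: fraction of target mRNA removed, in (0, 1].
    """
    if not 0 < knockdown_eff <= 1:
        raise ValueError("knockdown efficiency must be in (0, 1]")
    return 100 + (relative_si - relative_control) / knockdown_eff


# Hypothetical: proliferation falls from 100% to 70% with a 75% knockdown.
print(corrected_percent(70, 100, 0.75))
```

With 75% knockdown, an observed drop from 100% to 70% is scaled to 60%; with complete knockdown the correction leaves the observed value unchanged.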

Supplementary Figure 7 Colony formation efficiency in MCF7 cells using two independent siRNAs.

MCF7 cells were transfected with the indicated siRNAs and reseeded after 16 h for colony formation (CF) assays. On day 14, colonies were fixed with methanol, stained with crystal violet, scanned and batch-analyzed with ImageJ. Corrected CF efficiency (CFE) % = 100 + (relative CFE with the indicated siRNA − CFE with control siRNA (consi))/knockdown efficiency. Error bars, s.d. (n = 4). P values were determined by one-way ANOVA followed by Dunnett’s multiple-comparisons test: *P < 0.05. Related to Fig. 2b.
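
The statistical step in this caption starts with a one-way ANOVA across the siRNA groups. Below is a dependency-free sketch of the F statistic; the four replicate values per group are hypothetical. The post hoc Dunnett comparison against the control requires the multivariate t distribution, so a library is the safer choice there: SciPy provides `scipy.stats.f_oneway` for the ANOVA and `scipy.stats.dunnett` (SciPy 1.11 or later) for the many-to-one comparison.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic with (between, within) degrees of freedom."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical corrected CFE (%) for control siRNA and two targeting siRNAs, n = 4 each.
control = [100, 98, 102, 100]
si_1 = [60, 62, 58, 60]
si_2 = [95, 97, 93, 95]
f, df_b, df_w = one_way_anova_f([control, si_1, si_2])
print(f"F({df_b}, {df_w}) = {f:.1f}")
```

A large F only says that at least one group differs; Dunnett's test then identifies which siRNAs differ from the control while controlling the family-wise error rate.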

Supplementary Figure 8 Power calculation of the TWAS analysis.

The simulation analysis is based on 122,977 cases and 105,974 controls. Gene expression was generated from the empirical distribution of predicted gene expression levels in the BCAC. Statistical power was calculated at P < 5.82 × 10−6 (the significance level used in the main TWAS analyses) as a function of the cis-heritability (h²) of gene expression, which the expression prediction models aim to capture (R²). The figure shows results per 1-s.d. increase (or decrease) in gene expression, based on 1,000 replicates.
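
The caption describes a simulation; as a rough cross-check, the sketch below swaps in a standard analytic approximation rather than the authors' procedure. It assumes the odds ratio is per 1-s.d. change in true expression, that the predicted component captures a fraction R² of that variance (so its per-s.d. log odds ratio is attenuated by √R²), and that effects are small enough for the usual large-sample variance 1/(Nφ(1 − φ)), with φ the case fraction. The odds ratio and R² values are hypothetical.

```python
import math
from statistics import NormalDist


def twas_power(n_cases, n_controls, odds_ratio, r2, alpha=5.82e-6):
    """Approximate power of a two-sided test of standardized predicted expression.

    Assumptions (not from the paper): odds_ratio is per 1-s.d. change in true
    expression, attenuated by sqrt(r2) for the predicted component, and
    Var(beta_hat) is about 1 / (N * phi * (1 - phi)) with phi the case fraction.
    """
    nd = NormalDist()
    n = n_cases + n_controls
    phi = n_cases / n
    beta = math.log(odds_ratio) * math.sqrt(r2)
    mean_z = abs(beta) * math.sqrt(n * phi * (1 - phi))  # expected z statistic
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(mean_z - z_crit) + nd.cdf(-mean_z - z_crit)


for r2 in (0.01, 0.05, 0.10):
    power = twas_power(122_977, 105_974, odds_ratio=1.10, r2=r2)
    print(f"R2 = {r2:.2f}: power = {power:.2f}")
```

The simulation in the caption instead draws expression from the empirical BCAC distribution and counts rejections over 1,000 replicates, so values from this approximation should be treated only as a rough guide.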

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8, Supplementary Tables 1, 5, 6, 8–11 and 13, and Supplementary Note

Reporting Summary

Supplementary Tables

Supplementary Tables 2–4, 7 and 12

About this article

Cite this article

Wu, L., Shi, W., Long, J. et al. A transcriptome-wide association study of 229,000 women identifies new candidate susceptibility genes for breast cancer. Nat Genet 50, 968–978 (2018). https://doi.org/10.1038/s41588-018-0132-x
