Deep neural network improves the estimation of polygenic risk scores for breast cancer


Polygenic risk scores (PRS) estimate an individual's genetic risk for a complex disease based on many genetic variants across the whole genome. In this study, we compared a series of computational models for estimating breast cancer PRS. A deep neural network (DNN) was found to outperform alternative machine learning techniques and established statistical algorithms, including BLUP, BayesA, and LDpred. In the test cohort with 50% prevalence, the areas under the receiver operating characteristic curve (AUC) were 67.4% for DNN, 64.2% for BLUP, 64.5% for BayesA, and 62.4% for LDpred. BLUP, BayesA, and LDpred all generated PRS that followed a normal distribution in the case population. However, the PRS generated by DNN in the case population followed a bimodal distribution composed of two normal distributions with distinctly different means. This suggests that DNN was able to separate the case population into a high-genetic-risk case subpopulation, with an average PRS significantly higher than that of the control population, and a normal-genetic-risk case subpopulation, with an average PRS similar to that of the control population. This allowed DNN to achieve 18.8% recall at 90% precision in the test cohort with 50% prevalence, which can be extrapolated to 65.4% recall at 20% precision in a general population with 12% prevalence. Interpretation of the DNN model identified salient variants that were assigned insignificant p values by association studies but were important for DNN prediction. These variants may be associated with the phenotype through nonlinear relationships.
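The dependence of precision on prevalence mentioned above can be illustrated with a short calculation. The following is a minimal sketch, not the authors' extrapolation procedure: it assumes that recall and the false positive rate are properties of the classifier at a fixed score threshold and do not change with prevalence, and the function names are hypothetical. Because it holds the threshold fixed, it does not reproduce the 65.4% recall at 20% precision figure, which corresponds to a different, more permissive operating point.

```python
def false_positive_rate(recall, precision, prevalence):
    """Recover the false positive rate implied by recall and precision
    measured in a cohort with the given case prevalence."""
    true_pos = recall * prevalence                      # fraction of cohort: true positives
    false_pos = true_pos * (1.0 - precision) / precision  # from precision = TP / (TP + FP)
    return false_pos / (1.0 - prevalence)               # FP divided by fraction of controls


def precision_at_prevalence(recall, fpr, prevalence):
    """Precision at the same threshold in a population with a different
    prevalence (assumes recall and FPR are unchanged)."""
    true_pos = recall * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Operating point reported for the test cohort (50% prevalence).
fpr = false_positive_rate(recall=0.188, precision=0.90, prevalence=0.50)

# Same threshold applied to a population with 12% prevalence.
ppv = precision_at_prevalence(recall=0.188, fpr=fpr, prevalence=0.12)

print(f"implied false positive rate: {fpr:.4f}")
print(f"precision at 12% prevalence, same threshold: {ppv:.3f}")
```

Under these assumptions, the threshold that gives 90% precision in the balanced cohort gives a noticeably lower precision at 12% prevalence with the same 18.8% recall. Lowering the threshold trades further precision for higher recall, which is the direction of the extrapolated figure quoted above.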




We would like to thank the OU Supercomputing Center for Education & Research (OSCER) for supercomputing technical support, the DRIVE project for the GWAS data, NIH dbGaP for data access authorization, and Dr. Xu Chao for helpful discussions. The study was funded by Dr. Pan’s startup funding from the University of Oklahoma and by the Oak Ridge National Laboratory (ORNL) Laboratory Directed Research and Development (LDRD) funding. Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract Number DE-AC05-00OR22725.

Author information


Corresponding author

Correspondence to Chongle Pan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Badré, A., Zhang, L., Muchero, W. et al. Deep neural network improves the estimation of polygenic risk scores for breast cancer. J Hum Genet 66, 359–369 (2021).
