A genomic and epigenomic atlas of prostate cancer in Asian populations

Abstract

Prostate cancer is the second most common cancer in men worldwide¹. Over the past decade, large-scale integrative genomics efforts have enhanced our understanding of this disease by characterizing its genetic and epigenetic landscape in thousands of patients²,³. However, most tumours profiled in these studies were obtained from patients from Western populations. Here we produced and analysed whole-genome, whole-transcriptome and DNA methylation data for 208 pairs of tumour tissue samples and matched healthy control tissue from Chinese patients with primary prostate cancer. Systematic comparison with published data from 2,554 prostate tumours revealed that the genomic alteration signatures in Chinese patients were markedly distinct from those of Western cohorts: specifically, 41% of tumours contained mutations in FOXA1 and 18% each had deletions in ZNF292 and CHD1. Alterations of the genome and epigenome were correlated and were predictive of disease phenotype and progression. Coding and noncoding mutations, as well as epimutations, converged on pathways that are important for prostate cancer, providing insights into this devastating disease. These discoveries underscore the importance of including population context in constructing comprehensive genomic maps for disease.

Fig. 1: Molecular landscape of the CPGEA cohort.
Fig. 2: The genomic alteration landscape in CPGEA and TCGA.
Fig. 3: FOXA1 mutations in CPGEA.
Fig. 4: DNA methylation abnormalities and subtypes of the CPGEA prostate cancer cohort.

Data availability

All data, including raw data, mutation calls, and clinical information, have been deposited to the Genome Sequence Archive for Human (http://bigd.big.ac.cn/gsa-human/) at the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, under the accession number PRJCA001124. The raw sequencing data and somatic and germ-line mutation calls contain information unique to an individual and require controlled access. The deposited and publicly available data are compliant with the regulations of the Ministry of Science and Technology of the People’s Republic of China. Source Data for Figs. 2, 4 and Extended Data Figs. 6–8 are provided with the paper.

Code availability

All computational code used in this study is available at the supporting website (http://www.cpgea.com).

References

  1. Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424 (2018).

  2. Cancer Genome Atlas Research Network. The molecular taxonomy of primary prostate cancer. Cell 163, 1011–1025 (2015).

  3. Armenia, J. et al. The long tail of oncogenic drivers in prostate cancer. Nat. Genet. 50, 645–651 (2018).

  4. Shoag, J. & Barbieri, C. E. Clinical variability and molecular heterogeneity in prostate cancer. Asian J. Androl. 18, 543–548 (2016).

  5. Kimura, T. East meets West: ethnic differences in prostate cancer epidemiology between East Asians and Caucasians. Chin. J. Cancer 31, 421–429 (2012).

  6. Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).

  7. Barbieri, C. E. et al. Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nat. Genet. 44, 685–689 (2012).

  8. Beltran, H. et al. Divergent clonal evolution of castration-resistant neuroendocrine prostate cancer. Nat. Med. 22, 298–305 (2016).

  9. Fraser, M. et al. Genomic hallmarks of localized, non-indolent prostate cancer. Nature 541, 359–364 (2017).

  10. Gao, D. et al. Organoid cultures derived from patients with advanced prostate cancer. Cell 159, 176–187 (2014).

  11. Grasso, C. S. et al. The mutational landscape of lethal castration-resistant prostate cancer. Nature 487, 239–243 (2012).

  12. Hieronymus, H. et al. Copy number alteration burden predicts prostate cancer relapse. Proc. Natl Acad. Sci. USA 111, 11139–11144 (2014).

  13. Kumar, A. et al. Substantial interindividual and limited intraindividual genomic diversity among tumors from men with metastatic prostate cancer. Nat. Med. 22, 369–378 (2016).

  14. Robinson, D. et al. Integrative clinical genomics of advanced prostate cancer. Cell 161, 1215–1228 (2015).

  15. Taylor, B. S. et al. Integrative genomic profiling of human prostate cancer. Cancer Cell 18, 11–22 (2010).

  16. Yuan, J. et al. Integrated analysis of genetic ancestry and genomic alterations across cancers. Cancer Cell 34, 549–560.e9 (2018).

  17. Abida, W. et al. Prospective genomic profiling of prostate cancer across disease states reveals germline and somatic alterations that may affect clinical decision making. JCO Precis. Oncol. https://doi.org/10.1200/PO.17.00029 (2017).

  18. Dall’Era, M. A., deVere-White, R., Rodriguez, D. & Cress, R. Changing incidence of metastatic prostate cancer by race and age, 1988–2015. Eur. Urol. Focus 5, 1014–1021 (2019).

  19. Ren, S. et al. Whole-genome and transcriptome sequencing of prostate cancer identify new genetic alterations driving disease progression. Eur. Urol. 73, 322–339 (2017).

  20. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).

  21. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

  22. Shen, M. M. & Abate-Shen, C. Molecular genetics of prostate cancer: new prospects for old challenges. Genes Dev. 24, 1967–2000 (2010).

  23. Tomlins, S. A. et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 310, 644–648 (2005).

  24. Quigley, D. A. et al. Genomic hallmarks and structural variation in metastatic prostate cancer. Cell 174, 758–769.e9 (2018).

  25. Viswanathan, S. R. et al. Structural alterations driving castration-resistant prostate cancer revealed by linked-read genome sequencing. Cell 174, 433–447.e19 (2018).

  26. Cortés-Ciriano, I. et al. Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing. Nat. Genet. 52, 331–341 (2020).

  27. Yu, Y. P. et al. Novel fusion transcripts associate with progressive prostate cancer. Am. J. Pathol. 184, 2840–2849 (2014).

  28. Jang, J. S. et al. Common oncogene mutations and novel SND1-BRAF transcript fusion in lung adenocarcinoma from never smokers. Sci. Rep. 5, 9755 (2015).

  29. Fishilevich, S. et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database (Oxford) 2017, bax028 (2017).

  30. Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).

  31. Rheinbay, E. et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 578, 102–111 (2020).

  32. Zhu, H. et al. Candidate cancer driver mutations in distal regulatory elements and long-range chromatin interaction networks. Mol. Cell. https://doi.org/10.1016/j.molcel.2019.12.027 (2020).

  33. Jozwik, K. M. & Carroll, J. S. Pioneer factors in hormone-dependent cancers. Nat. Rev. Cancer 12, 381–385 (2012).

  34. Sahu, B. et al. Dual role of FoxA1 in androgen receptor binding to chromatin, androgen signalling and prostate cancer. EMBO J. 30, 3962–3976 (2011).

  35. Espiritu, S. M. G. et al. The evolutionary landscape of localized prostate cancers drives clinical aggression. Cell 173, 1003–1013.e15 (2018).

  36. Gao, N. et al. The role of hepatocyte nuclear factor-3 alpha (Forkhead Box A1) and androgen receptor in transcriptional regulation of prostatic genes. Mol. Endocrinol. 17, 1484–1507 (2003).

  37. Adams, E. J. et al. FOXA1 mutations alter pioneering activity, differentiation and prostate cancer phenotypes. Nature 571, 408–412 (2019).

  38. Parolia, A. et al. Distinct structural classes of activating FOXA1 alterations in advanced prostate cancer. Nature 571, 413–418 (2019).

  39. McGranahan, N. et al. Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci. Transl. Med. 7, 283ra54 (2015).

  40. Mina, M. et al. Conditional selection of genomic alterations dictates cancer evolution and oncogenic dependencies. Cancer Cell 32, 155–168.e6 (2017).

  41. Ishizaki, F. et al. Androgen deprivation promotes intratumoral synthesis of dihydrotestosterone from androgen metabolites in prostate cancer. Sci. Rep. 3, 1528 (2013).

  42. Berman, B. P. et al. Regions of focal DNA hypermethylation and long-range hypomethylation in colorectal cancer coincide with nuclear lamina-associated domains. Nat. Genet. 44, 40–46 (2011).

  43. Hansen, K. D. et al. Increased methylation variation in epigenetic domains across cancer types. Nat. Genet. 43, 768–775 (2011).

  44. Hon, G. C. et al. Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer. Genome Res. 22, 246–258 (2012).

  45. Mazor, T. et al. DNA methylation and somatic mutations converge on the cell cycle and define similar evolutionary histories in brain tumors. Cancer Cell 28, 307–317 (2015).

  46. Xiao, Q. et al. Systematic analysis reveals molecular characteristics of ERG-negative prostate cancer. Sci. Rep. 8, 12868 (2018).

  47. Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. https://doi.org/10.1200/PO.17.00011 (2017).

  48. Xu, B. et al. Altered chromatin recruitment by FOXA1 mutations promotes androgen independence and prostate cancer progression. Cell Res. 29, 773–775 (2019).

  49. Gao, S. et al. Forkhead domain mutations in FOXA1 drive prostate cancer progression. Cell Res. 29, 770–772 (2019).

  50. Gao, X., Wang, H., Wang, Y., Xu, C. & Sun, Y. Construction and clinical application of prostate cancer database (PC-Follow) based on browser/server schema. Chin. J. Urol. 36, 694–698 (2015).

  51. Bergmann, E. A., Chen, B. J., Arora, K., Vacic, V. & Zody, M. C. Conpair: concordance and contamination estimator for matched tumor-normal pairs. Bioinformatics 32, 3196–3198 (2016).

  52. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).

  53. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  54. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).

  55. Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).

  56. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

  57. Trapnell, C. et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31, 46–53 (2013).

  58. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protocols 7, 562–578 (2012).

  59. Friedländer, M. R., Mackowiak, S. D., Li, N., Chen, W. & Rajewsky, N. miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. 40, 37–52 (2012).

  60. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  61. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).

  62. Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28, 1811–1817 (2012).

  63. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).

  64. Thorvaldsdóttir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform. 14, 178–192 (2013).

  65. Boeva, V. et al. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 28, 423–425 (2012).

  66. Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE blacklist: identification of problematic regions of the genome. Sci. Rep. 9, 9354 (2019).

  67. Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).

  68. Yang, L. et al. Diverse mechanisms of somatic structural variations in human cancer genomes. Cell 153, 919–929 (2013).

  69. Jia, W. et al. SOAPfuse: an algorithm for identifying fusion transcripts from paired-end RNA-Seq data. Genome Biol. 14, R12 (2013).

  70. Panigrahi, P., Jere, A. & Anamika, K. FusionHub: A unified web platform for annotation and visualization of gene fusion events in human cancer. PLoS One 13, e0196588 (2018).

  71. Shugay, M., Ortiz de Mendíbil, I., Vizmanos, J. L. & Novo, F. J. Oncofuse: a computational framework for the prediction of the oncogenic potential of gene fusions. Bioinformatics 29, 2539–2546 (2013).

  72. Gonzalez-Perez, A. et al. Computational approaches to identify functional genetic variants in cancer genomes. Nat. Methods 10, 723–729 (2013).

  73. Porta-Pardo, E. et al. Comparison of algorithms for the detection of cancer drivers at subgene resolution. Nat. Methods 14, 782–788 (2017).

  74. Dees, N. D. et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).

  75. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).

  76. Sondka, Z. et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696–705 (2018).

  77. Fu, Y. et al. FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer. Genome Biol. 15, 480 (2014).

  78. Melton, C., Reuter, J. A., Spacek, D. V. & Snyder, M. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat. Genet. 47, 710–716 (2015).

  79. Weinhold, N., Jacobsen, A., Schultz, N., Sander, C. & Lee, W. Genome-wide analysis of noncoding regulatory mutations in cancer. Nat. Genet. 46, 1160–1165 (2014).

  80. Clark, K. L., Halay, E. D., Lai, E. & Burley, S. K. Co-crystal structure of the HNF-3/fork head DNA-recognition motif resembles histone H5. Nature 364, 412–420 (1993).

  81. Humphrey, W., Dalke, A. & Schulten, K. VMD: visual molecular dynamics. J. Mol. Graph. 14, 33–38 (1996).

  82. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  83. Wu, H. et al. Detection of differentially methylated regions from whole-genome bisulfite sequencing data without replicates. Nucleic Acids Res. 43, e141 (2015).

  84. Kishore, K. et al. methylPipe and compEpiTools: a suite of R packages for the integrative analysis of epigenomics data. BMC Bioinformatics 16, 313 (2015).

  85. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  86. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).

  87. McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).

  88. Weisenberger, D. J. et al. CpG island methylator phenotype underlies sporadic microsatellite instability and is tightly associated with BRAF mutation in colorectal cancer. Nat. Genet. 38, 787–793 (2006).

  89. Noushmehr, H. et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell 17, 510–522 (2010).

  90. Mo, Q. et al. Pattern discovery and cancer gene identification in integrated cancer genomic data. Proc. Natl Acad. Sci. USA 110, 4245–4250 (2013).

  91. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

  92. Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).

Acknowledgements

We thank the patients and their families. This study was supported by the ‘Key Research and Development Project on Precision Medicine’ fund (2016YFC090220) granted by the Chinese Ministry of Science and Technology, the Shanghai Key Laboratory of Cell Engineering (14DZ2272300), the Shanghai ‘Top Priority’ Medical Center Project (2017ZZ01005), the ‘National Major New Drug Discovery Initiative’ Fund (2017ZX093040300002) granted by the ‘13th Five-Year Plan’ (Subproject), National Natural Science Foundation of China (81602467, J.L.), and the ‘Zhangjiang National Innovation Demonstration Zone’ Initiative Development Fund. H.J.L., N.M.S., E.C.P. and Ting Wang were supported by American Cancer Society grant RSG-14-049-01-DMC, and E.C.P. was supported by a Postdoctoral Fellowship, PF-17-201-01, from the American Cancer Society. We thank K.-l. Huang for technical assistance on iCluster, and X. Zhang for managing and organizing this project.

Author information

Contributions

J.L., Ting Wang and Y.S. conceived and implemented the study. Xu Gao performed surgeries and set up the follow-up database. J.L. and H.J.L. performed data analyses. Z.Z. performed basic WGS analysis. X. Li performed basic RNA-seq analysis. C.X., S.R., H.W., Xiaofeng Gao, J.H., L.W., B.Y., Qing Yang, H.Y., T.Z., Shuo Wang, Z.W., Jun Jiang, C.L., Jianquan Hou, C.H., M.C., N.J., D.Z., S. Wu, Jinjian Yang, Y.C., J.C. and W.Y. contributed to sample collection as surgeons. X. Lu, Yan Wang, M.Q., R.C., H.C., F.Z. and B.L. took care of patients as attending doctors. Qingsong Yang performed radiology review of all patients. Y. Yu and Y. Zhu performed pathology review of all samples. Y. Zhang, J.X. and Shaogang Wang followed up the patients. W.Z., N.M.S., E.C.P. and Tao Wang contributed to computational analysis. X.Z. stored samples in liquid nitrogen. C.Y. and C.W. generated sequencing libraries. Y.W. and G.X. performed Sanger sequencing validation. Junfeng Jiang and Y. Yang performed fusion validation experiment. J.L., H.J.L. and Ting Wang prepared the manuscript with input from all authors.

Corresponding authors

Correspondence to Xu Gao, Ting Wang or Yinghao Sun.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Arul Chinnaiyan, Colin Collins, Colin Cooper and Charlie Massie for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A list of affiliations appears at the end of the paper

Extended data figures and tables

Extended Data Fig. 1 Clinical samples, data generation and somatic mutation landscape of CPGEA.

a, Clinical and pathological patient characterization. b, Study design, indicating the number of tumours with each data type. The cohort consisted of 208 patients who underwent radical prostatectomy. All tumours were analysed by WGS, as was a matched normal para-tumour specimen from each patient. In addition, RNA-seq (n = 134 tumours), miRNA-seq (n = 105), and whole-genome DNA methylation (n = 187) data were generated for a subset of patients. c–f, Comparison between somatic alteration calls from two pipelines for the TCGA PRAD (primary prostate tumour) cohort. ‘CPGEA pipeline’ indicates the pipeline used in this study. ‘TCGA report’ indicates publicly available somatic alteration calls. c, Distribution of mutation burdens in each cohort. Each dot corresponds to a mutation burden calculated from a tumour–normal pair. Red horizontal bars indicate the median mutation burden from the CPGEA pipeline and TCGA (both 0.70 per Mb). d, Genomic regions with significantly recurrent somatic CNAs called by GISTIC2.0. e, Heat map showing genome-wide CNAs. Top, 114 tumours clustered using the WGS-based CPGEA pipeline. Bottom, array-based TCGA results for the same tumours, arranged in the same order. f, Gene-level alteration frequencies from the two pipelines for the TCGA cohort. g, Alexandrov signatures in CPGEA and their association with clinical features. Top, percentage of samples per signature. Bottom, mutation counts for each signature, ordered from low to high by individual patient. h, Box plot showing the correlation of signatures 8 and 16 with Gleason score (Kruskal–Wallis test). Box plots as in Fig. 4b. Each dot corresponds to a tumour sample.

Extended Data Fig. 2 Landscape of CNA in CPGEA.

a, Heat map representation of CNA segments grouped by CNA burden subgroup (high, intermediate and low). b, Kaplan–Meier plot of biochemical relapse-free survival in different CNA burden subgroups, using the intermediate CNA burden (7.85%) as a cut-off value. P = 0.0024, two-sided log-rank test. c, Cancer genes with a significant CNA in CPGEA and Western cohorts. The inner circle displays a CNA heat map of individual patients sorted by chromosome, with CNA frequencies and significantly altered genes on the outer rim. d, Number of intra-chromosomal rearrangements as a function of the deletion status of CHD1. P values were determined by two-sided Mann–Whitney U-test. Box plots as in Fig. 4b.
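The grouping by CNA burden in panels a and b can be illustrated with a minimal sketch. It assumes that CNA burden means the percentage of the genome covered by copy-number-altered segments, which the legend does not state explicitly. The segment layout, the function names and the two-way split at the quoted 7.85% cut-off are hypothetical choices made for illustration and are not taken from the CPGEA pipeline.

```python
from typing import Iterable, Tuple

# Hypothetical segment layout: (start_bp, end_bp, is_altered).
# Segments are assumed to be non-overlapping.
Segment = Tuple[int, int, bool]

CUTOFF_PERCENT = 7.85  # cut-off value quoted in the legend


def cna_burden_percent(segments: Iterable[Segment], genome_length_bp: int) -> float:
    """Percentage of the genome covered by copy-number-altered segments."""
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    altered_bp = sum(end - start for start, end, altered in segments if altered)
    return 100.0 * altered_bp / genome_length_bp


def burden_group(burden_percent: float, cutoff: float = CUTOFF_PERCENT) -> str:
    """Place a tumour on the 'high' or 'low' side of the cut-off (assumed split)."""
    return "high" if burden_percent >= cutoff else "low"


if __name__ == "__main__":
    # Toy genome of 100 bp with one altered 10-bp segment.
    toy_segments = [(0, 50, False), (50, 60, True), (60, 100, False)]
    burden = cna_burden_percent(toy_segments, genome_length_bp=100)
    print(burden, burden_group(burden))
```

The resulting group labels would then feed a survival comparison such as the log-rank test named in panel b, which is not implemented here.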

Extended Data Fig. 3 Landscape of structural variations in CPGEA.

a, Types of structural variation and numbers for individual tumours (columns). Chromoplexy and chromothripsis status, CHD1 deletion status, and ERG fusion status are displayed as a heat map. b, Frequency of recurrent structural variations and their affected genes for five types of structural variation. c, A recurrent inversion potentially disrupts a TAD boundary and results in enhancer hijacking. HiC map for the LNCaP cell line over the inversion. The inversion and TAD boundaries are marked. Expression levels of potentially affected genes are displayed as box plots. P values were determined by two-sided Mann–Whitney U-test. Box plots are as in Fig. 4b. Each dot corresponds to a normal sample (n = 134), a tumour with no structural variation (wild-type (WT), n = 131), or a tumour with structural variation (n = 3). d, Definition of five tiers of structural variation patterns based on genomic annotation of the 5′ and 3′ breakpoints. e, Genomic location distribution of 5′ (left) and 3′ (middle) breakpoints, and distribution of different types of structural variation across the five defined tiers (right).

Extended Data Fig. 4 Landscape of gene fusions and SMGs in CPGEA.

a, The circle represents gene fusions in Chinese and Western cohorts. Recurrent fusions (more than two samples) are displayed as connected gene pairs, in which the width of the connecting arc represents the number of samples that contained the fusion. Red indicates novel gene fusions not present in public databases (FusionHub). b, Fusion was validated by Sanger sequencing and RNA-seq data. Red cells indicate validated fusion events, and green cells indicate PCR failure. c, Circos plot displaying ETS family fusions. Expression levels are shown as a function of copy number. d, The SCHLAP1-UBE2E3 gene fusion. e, AMACR fusions. f, A heterozygous SND1-BRAF fusion found in CPGEA. g, In total, 83 SMGs were detected by MuSiC, including 7 genes called by both MuSiC and MutSigCV. h, Fraction of primary, metastatic, and other cancer types investigated by each study. i, Venn diagrams of SMGs defined in different studies. j, Genes significantly mutated in CPGEA, Western primary, and Western metastatic cohorts. Purple cells indicate that the gene was defined as an SMG in the study. h–j, The Western cohorts are from CPCG⁹, SU2C¹¹, T/C/B (Trento/Cornell/Broad, neuroendocrine prostate cancer)⁸, B/C (Broad/Cornell)⁷, CRC¹³, M/DFCI³, TCGA², Michigan¹¹, MSKCC¹⁵, Organoid¹⁰, CNA-PNAS¹² and MSK¹⁷.

Extended Data Fig. 5 Noncoding mutations in CPGEA.

a, Schematic workflow of noncoding mutation analysis in CPGEA. b, Distribution of noncoding mutations across different genomic features. c, Significance of mutation hotspots in noncoding regulatory regions. Each hotspot is colour-coded for its regulatory region annotation, and the statistical significance (false discovery rate (FDR)) and number of hits per sample are displayed. d, Significance of recurrent mutations in regulatory regions of interest. Regulatory regions for individual genes are displayed based on local and global measures of statistical significance (FDR). Colours indicate regulatory region annotations, and key genes are labelled. e, Enrichment of noncoding mutations resulting in gain or loss of transcription factor-binding sites. For each transcription factor, the match score to the position weight matrix (PWM) was determined for mutations that could potentially destroy or create a binding site for that transcription factor. Plotted for each transcription factor is the mean difference in the match scores for the mutated and reference alleles. Red indicates FDR < 0.05. P values for differences in mean match score were computed by two-sided paired Wilcoxon rank-sum test. f–h, Examples of noncoding mutations in selected genes. TBL1XR1 (f), FOXA1 (g) and FLI1 (h) are shown. Genome browser views show the location of the noncoding mutation. The genomic coordinates and types of noncoding mutation are labelled above the genome browser. Gene expression of genes with noncoding mutations is depicted.
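The match-score comparison described for panel e can be sketched as follows. This is a minimal illustration under stated assumptions: the score is taken to be a log-odds sum against a uniform background of 0.25, and the four-position matrix `TOY_PPM` is invented for the example. The legend does not specify the scoring scheme or software used, so none of the names below come from the study.

```python
import math
from typing import Dict, List

# Hypothetical 4-position probability matrix; each column sums to 1.
TOY_PPM: List[Dict[str, float]] = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def pwm_score(seq: str, ppm: List[Dict[str, float]],
              background: float = BACKGROUND) -> float:
    """Log-odds match score of a sequence against the matrix."""
    if len(seq) != len(ppm):
        raise ValueError("sequence and matrix must have the same length")
    return sum(math.log2(col[base] / background) for base, col in zip(seq, ppm))


def score_difference(ref_seq: str, alt_seq: str,
                     ppm: List[Dict[str, float]]) -> float:
    """Match score of the mutated allele minus that of the reference allele."""
    return pwm_score(alt_seq, ppm) - pwm_score(ref_seq, ppm)


if __name__ == "__main__":
    # A C>T change at the second position weakens the toy motif match.
    print(score_difference("ACGT", "ATGT", TOY_PPM))
```

A negative difference suggests that the mutation weakens a predicted binding site and a positive one suggests that it strengthens or creates one. Averaging these differences per transcription factor gives the quantity the legend describes as plotted.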

Extended Data Fig. 6 FOXA1 mutations in CPGEA.

a, FOXA1 mutation validation. Two representative validations by Sanger sequencing and reconstructed RNA-seq analysis. b, Validation of a FOXA1 in-frame deletion-derived peptide by mass spectrometry. c, Mapping of FOXA1 mutations onto the three-dimensional structure of FOXA1 and bound DNA (based on PDB registry 1VTN⁷⁸). d, DNA methylation over FOXA1-binding sites in tumours with FOXA1 truncation/in-frame deletion. Top, FOXA1-binding motifs in the ENCODE chromatin immunoprecipitation with high-throughput sequencing (ChIP–seq) dataset (left) versus FOXA1-binding motifs outside of FOXA1 ChIP–seq peaks (right). Bottom, wild-type FOXA1-binding sites (left) and mutant FOXA1-binding sites (right) from recently published ChIP–seq data³⁸. P values were determined by one-sided Mann–Whitney U-tests. Box plots are as in Fig. 4b. Each dot corresponds to a normal or tumour sample. e, Clonal analysis of FOXA1 in CPGEA. f, Mutual exclusivity or co-occurrence of gene alterations between genes belonging to 12 important curated pathways. Only alterations with at least one significant interaction (P < 0.05) are included. Asterisks indicate significant relationships. g, Allele frequency distribution of FOXA1 mutations in CPGEA and TCGA processed with the CPGEA pipeline. h, Significant mutual exclusions and co-occurrences between FOXA1 mutations and other genetic lesions in CPGEA, identified by OncoPrint from cBioPortal⁹². i, FOXA1 mutations and downstream pathways. Pairwise comparison of expression levels of important pathways. The z-score of specific genes and clinical features are displayed in a heat map grouped by different mutation subtypes.

Extended Data Fig. 7 DNA methylation abnormalities in CPGEA.

a, Heat map of DNA methylation levels in the CPGEA cohort. Rows represent defined genomic regions including PMDs, hypoDMRs and hyperDMRs, and columns represent samples. Tumours (right) and matched normal samples (left) are sorted by epimutation rate. In each category, genomic regions are sorted by chromosomal coordinates. The top panel shows clinicopathological features of patients (as in Fig. 1), genetic alterations including fusions and coding mutations, and other molecular phenotypes. b, Two-dimensional density plot of the average CpG methylation level in normal versus tumour samples from the same patient. c, Average methylation level of CpGs overlapping different genomic features. P values were determined by two-sided Wilcoxon signed-rank test. CDS, coding sequence. Each dot corresponds to a normal prostate or tumour sample. d, Average methylation level of CpGs overlapping different repeat element classes. P values were determined by two-sided Wilcoxon signed-rank test. Each dot corresponds to a normal prostate or tumour sample. e, Average non-CG methylation level in tumours and matched normal samples. The mean was 0.37% for each group. P values were determined by two-sided Wilcoxon signed-rank test. Each dot corresponds to a normal prostate or tumour sample. f, Genome-wide methylation levels in 100-kb bins, clustered across tumour samples. Rows represent samples, and columns represent 100-kb genomic bins, with the DNA methylation level of each bin represented by the heat map. g, The genome fraction of total PMD length in each tumour, in decreasing order. The leftmost bar represents the genome fraction of the union set of PMDs across all tumours. h, PMD recurrence. The red line represents PMDs shared by at least 100 tumours (711 out of 2,218). i, Mutation frequency inside versus outside PMDs. P = 7.5 × 10⁻³², two-sided Wilcoxon signed-rank test. Mutation frequency was measured as the average number of SNVs per Mb. Each dot corresponds to a tumour sample (n = 187). j, Expression level of genes located in PMDs (n = 4,043) or outside PMDs (n = 15,344) in tumours versus matched normal samples. P values were determined by one-sided Wilcoxon signed-rank test. Genes in PMDs had significantly lower expression than genes outside PMDs in both tumours and normal samples (P = 0, two-sided Mann–Whitney U-test). Outlier genes with very high expression were omitted from the plot. All box plots are as in Fig. 4b.
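
The SNVs-per-Mb measure used in panel i can be sketched for a single tumour as follows. This is an illustration under stated assumptions rather than the authors' code: `snvs_per_mb` is a hypothetical helper, coordinates are treated as one linear space, PMD intervals are half-open and non-overlapping, and all numbers are made up.

```python
def snvs_per_mb(snv_positions, pmd_intervals, genome_length):
    """Return (inside, outside) SNV densities per Mb for one tumour.

    pmd_intervals are half-open [start, end) and assumed not to overlap.
    """
    pmd_length = sum(end - start for start, end in pmd_intervals)
    inside = sum(
        1
        for pos in snv_positions
        if any(start <= pos < end for start, end in pmd_intervals)
    )
    outside = len(snv_positions) - inside
    outside_length = genome_length - pmd_length
    return inside / (pmd_length / 1e6), outside / (outside_length / 1e6)


# Toy tumour: a 10-Mb coordinate space with a single 2-Mb PMD.
toy_snvs = [1_500_000, 2_000_000, 2_500_000, 2_900_000, 5_000_000, 8_000_000]
toy_pmds = [(1_000_000, 3_000_000)]
inside_rate, outside_rate = snvs_per_mb(toy_snvs, toy_pmds, 10_000_000)
print(inside_rate, outside_rate)
```

Computing the pair of densities per tumour yields the paired values that a signed-rank test would then compare across the cohort.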

Extended Data Fig. 8 DMRs and CIMP in CPGEA.

a, Recurrence of hypoDMRs. A total of 1,172 hypoDMRs were shared by at least 10 tumours (red line). b, Recurrence of hyperDMRs. A total of 4,214 hyperDMRs were shared by at least 10 tumours (red line). c, Genomic location of the union set of hypoDMRs and recurrent hypoDMRs. The innermost circle represents the reference genome background. d, Genomic location of the union set of hyperDMRs and recurrent hyperDMRs. The innermost circle represents the reference genome background. e, MSigDB perturbation enrichment analysis of recurrent hypoDMRs (n = 1,172) using GREAT (ref. 87). f, Gene Ontology (GO) enrichment analysis of recurrent hyperDMRs (n = 4,214) using GREAT. The top 20 GO biological process terms are shown. g, Scatter plots of example epigenetically silenced genes. Each dot represents a normal sample (red), a tumour without a silenced gene (blue), or a tumour with a silenced gene (black). TPM, transcripts per million. h, Heat map of CIMP-CGI methylation levels. Rows represent CIMP-CGIs, and columns represent samples. Tumours (right) were clustered by CIMP-CGI methylation levels, and matched normal samples (left) were sorted in the same order. CIMP-CGIs were sorted by chromosome and genomic coordinates. The top panel shows clinicopathological features of patients (as in Fig. 1), genetic alterations, including fusions and coding mutations, and other molecular phenotypes. i, Proportion of recurrent hyperDMRs overlapping CGIs. j, Association of CIMP+ tumours (n = 33) with gene mutation status. Red vertical line represents P = 0.05 (two-sided Fisher’s exact test). k, Kaplan–Meier plot of biochemical recurrence-free survival in patients with CIMP+ and CIMP− tumours. P values were determined by two-sided log-rank test. l, m, Correlation between epimutation burden and mutation (l) or CNA (m) burden. Spearman’s correlation coefficient ρ = 0.37, P = 2.5 × 10⁻⁷ for mutation burden, and ρ = 0.65, P = 1.2 × 10⁻²³ for CNA burden. Each dot represents a tumour (n = 187).
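
Spearman's ρ, as reported for panels l and m, is the Pearson correlation of rank-transformed values. The sketch below shows that calculation in plain Python on made-up burden values; `average_ranks`, `pearson` and `spearman_rho` are hypothetical helper names, and the P value calculation is omitted.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up burdens for five tumours (not CPGEA data).
toy_epimutation_burden = [10, 20, 30, 40, 50]
toy_cna_burden = [1, 3, 2, 5, 4]
rho = spearman_rho(toy_epimutation_burden, toy_cna_burden)
print(round(rho, 3))
```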

Extended Data Fig. 9 Molecular subtypes of prostate cancer.

a, Molecular taxonomy across eight cohorts based on seven important oncogenic drivers identified by TCGA. b, Mutation burden, CNA burden and epimutation burden across the four molecular subtypes in CPGEA. c, Key CNA events, CIMP and fusion events across the four subtypes. ERG fusion-positive calls combine results from Meerkat, SOAPfuse and high-expression samples. d, Annotation of each molecular subtype. e, Kaplan–Meier plot of biochemical relapse-free survival for iCluster subtype D compared to the other three iCluster subtypes. P values were determined by two-sided log-rank test. f–h, Clustering of tumours using single datasets, using RNA-seq analysis (f), DNA methylation (g), and miRNA data (h). h, Rows represent miRNAs and columns represent tumours. The top panel shows clinical features of patients (as in Fig. 1) along with four miRNA clusters and four iCluster subtypes. i, Violin plots of mutation, CNA and epimutation burdens for four miRNA clusters. Mutation burden, P = 0.85, 0.43, 0.61, 0.58, 0.24 and 0.16, for the comparison between miRNA clusters of 1–2, 1–3, 1–4, 2–3, 2–4 and 3–4, respectively. CNA burden, P = 5.9 × 10⁻⁶, 0.00025, 0.29, 0.045, 1.3 × 10⁻²⁶, and 4.1 × 10⁻⁵, in the same order. Epimutation burden, P = 0.0052, 0.090, 0.24, 0.20, 6.1 × 10⁻⁵ and 0.0080, in the same order. P values were determined by two-sided Mann–Whitney U-test. Each dot corresponds to a tumour sample belonging to miRNA cluster 1 (n = 21), 2 (n = 37), 3 (n = 34), or 4 (n = 13). j, Box plots of miRNA expression levels in normal samples and four miRNA-based tumour clusters (cluster 1 (n = 21), 2 (n = 37), 3 (n = 34), or 4 (n = 13)). Box plots are as in Fig. 4b. k, Kaplan–Meier plot of biochemical recurrence-free survival in patients with tumours belonging to miRNA cluster 2 or other clusters. P values were determined by two-sided log-rank test. Primary tumours without any treatment were included.
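
The Kaplan–Meier curves in panels e and k rest on the product-limit estimator, which a short sketch can make concrete. The follow-up times and event flags below are invented, `kaplan_meier` is a hypothetical helper, and the log-rank comparison between groups is not included.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    times: follow-up time per patient (for example, months).
    events: True if recurrence was observed, False if the patient was censored.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, observed in zip(times, events) if observed})
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Recurrences observed exactly at time t.
        recurrences = sum(
            1 for u, observed in zip(times, events) if u == t and observed
        )
        survival *= 1 - recurrences / at_risk
        curve.append((t, survival))
    return curve


# Invented follow-up data for five patients (not CPGEA data).
toy_times = [6, 12, 12, 24, 30]
toy_events = [True, False, True, True, False]
toy_curve = kaplan_meier(toy_times, toy_events)
for time_point, probability in toy_curve:
    print(time_point, round(probability, 3))
```

Censored patients count towards the at-risk set until their last follow-up but never trigger a step in the curve, which is why the estimate drops only at observed recurrences.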

Extended Data Fig. 10 Oncogenic pathways in prostate cancer.

a, Summary of genetic and epigenetic lesions in 12 curated pathways across the Chinese prostate cancer subtypes. b, Comparison of the frequency of disturbances in the AR pathway between CPGEA (primary), TCGA (primary) and SU2C (metastasis) cohorts. The frequency of coding mutations in each AR pathway gene is shown. c, The frequency of fusions, structural variations, noncoding mutations and epimutations in each AR pathway gene in the CPGEA cohort. Information on additional pathways is provided at http://www.cpgea.com. d, Comparison of pathway-level alterations across the CPGEA (206 samples, excluding 2 microsatellite instability (MSI) samples), TCGA (114 samples processed with the CPGEA pipeline), and SU2C cohorts (150 samples downloaded from cBioPortal). To compare across cohorts, only coding mutations and CNAs were considered. e, Frequency of coding alterations (CNAs, fusion genes and nonsynonymous coding mutations), noncoding alterations, and both for each pathway in the CPGEA cohort. f, Different levels of actionable mutations predicted by OncoKB in CPGEA and TCGA.
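
A pathway-level alteration frequency of the kind compared in panel d can be read as the fraction of samples carrying at least one altered gene from the pathway. The sketch below assumes that definition; `pathway_frequency`, the sample labels and the two-gene set are hypothetical and do not reproduce the curated pathways.

```python
def pathway_frequency(sample_alterations, pathway_genes):
    """Fraction of samples with at least one altered gene in the pathway."""
    genes = set(pathway_genes)
    altered_samples = sum(
        1 for altered in sample_alterations.values() if genes & set(altered)
    )
    return altered_samples / len(sample_alterations)


# Made-up alteration calls for four tumours (not CPGEA data).
toy_calls = {
    "T1": {"FOXA1"},
    "T2": {"CHD1", "ZNF292"},
    "T3": set(),
    "T4": {"AR"},
}
# Toy gene set standing in for a pathway; not the curated AR pathway.
toy_pathway = ["AR", "FOXA1"]
frequency = pathway_frequency(toy_calls, toy_pathway)
print(frequency)
```

Restricting the input calls to one alteration class (for example, coding mutations and CNAs only) before calling the function mirrors the restriction the legend describes for cross-cohort comparison.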

Supplementary information

Supplementary Information

This file contains Supplementary Discussion.

Reporting Summary

Supplementary Data

Supplementary Data 1: Metadata of public large cohorts of prostate cancer genomics studies.

Supplementary Data 2: Clinical and pathological information of specimens.

Supplementary Data 3: Comparison of SMG, CNA, and fusion frequencies between CPGEA pipeline and TCGA report on TCGA cohort, and between CPGEA and other public cohorts. Arm-level copy number alterations were estimated by GISTIC. Focal copy number alterations and affected genes were also estimated by GISTIC.

Supplementary Data 4: GISTIC output of CPGEA cohort (n = 208).

Supplementary Data 5: Complete list of structural variations in CPGEA cohort.

Supplementary Data 6: Complete list of fusion events and validation in CPGEA cohort.

Supplementary Data 7: Hotspot, local, global, and TF (n = 117) analysis of noncoding mutations in CPGEA cohort.

Supplementary Data 8: FOXA1 mutations, validations, and mutual exclusivity and co-occurrences with 74 other genetic alterations from 206 tumours (SELECT output).

Supplementary Data 9: Pathway comparison between CPGEA, TCGA and SU2C.

About this article

Cite this article

Li, J., Xu, C., Lee, H.J. et al. A genomic and epigenomic atlas of prostate cancer in Asian populations. Nature 580, 93–99 (2020). https://doi.org/10.1038/s41586-020-2135-x

  • DOI: https://doi.org/10.1038/s41586-020-2135-x
