The genomic landscape of metastatic castration-resistant prostate cancers reveals multiple distinct genotypes with potential clinical impact

Metastatic castration-resistant prostate cancer (mCRPC) has a highly complex genomic landscape. With the recent development of novel treatments, accurate stratification strategies are needed. Here we present the whole-genome sequencing (WGS) analysis of fresh-frozen metastatic biopsies from 197 mCRPC patients. Using unsupervised clustering based on genomic features, we define eight distinct genomic clusters. We observe potentially clinically relevant genotypes, including microsatellite instability (MSI), homologous recombination deficiency (HRD) enriched with genomic deletions and BRCA2 aberrations, a tandem duplication genotype associated with CDK12−/− and a chromothripsis-enriched subgroup. Our data suggest that stratification on WGS characteristics may improve identification of MSI, CDK12−/− and HRD patients. From WGS and ChIP-seq data, we show the potential relevance of recurrent alterations in non-coding regions identified with WGS and highlight the central role of AR signaling in tumor progression. These data underline the potential value of using WGS to accurately stratify mCRPC patients into clinically actionable subgroups.


Introduction
Prostate cancer is a notoriously heterogeneous disease and the genetic basis for this interpatient heterogeneity is poorly understood 1,2. The ongoing development of new therapies for metastatic prostate cancer that target molecularly defined subgroups further increases the need for accurate patient classification and stratification 3-5. Analysis of whole-exome sequencing data of metastatic prostate cancer tumors revealed that 65% of patients had actionable targets in non-androgen receptor related pathways, including PI3K, Wnt and DNA repair 6. Several targeted agents involved in these pathways, including mTOR/AKT pathway inhibitors 7 and PARP inhibitors 8

In this study, we analyzed the WGS data obtained from 197 metastatic castration-resistant prostate cancer (mCRPC) patients. We describe the complete genomic landscape of mCRPC, including tumor-specific single and multi-nucleotide variants (SNVs and MNVs), small insertions and deletions (InDels), copy number alterations (CNAs), mutational signatures, kataegis, chromothripsis and structural variants (SVs). Next, we compared the mutational frequency of the detected driver genes and genomic subgroups with an unmatched WGS cohort of primary prostate cancer (n = 210), consisting exclusively of Gleason score 6-7 tumors 15,25. We investigated the presence of possible driver genes by analyzing genes with enriched (non-synonymous) mutational burdens and recurrent or high-level copy number alterations 26,27. By utilizing various basic genomic features reflecting genomic instability and employing unsupervised clustering, we were able to define eight distinct genomic subgroups of mCRPC patients. We combined our genomic findings with AR, FOXA1 and H3K27ac ChIP-seq data and confirmed that important regulators of AR-mediated signaling are located in non-coding regions with open chromatin, highlighting the central role of AR signaling in tumor progression.

Characteristics of the mCRPC cohort and sequencing approach
We analyzed fresh-frozen metastatic tumor samples and matched blood samples from 197 castration-resistant prostate cancer (CRPC) patients using WGS, generating the largest WGS dataset for mCRPC to date (Figure 1a). Clinical details on biopsy site, age and previous treatments of the included patients are described in Figure 1b-

enriched.
Our copy number analysis revealed distinct amplified genomic regions including 8q and Xq and deleted regions including 8p, 10q, 13q and 17p (Supplementary figure 2d). Well-known prostate cancer driver genes 8,16, such as AR, PTEN, TP53 and RB1, are located in these regions. In addition to large-scale chromosomal copy number alterations, we could identify narrow genomic regions with recurrent copy number alterations across samples which could reveal important prostate cancer driver genes (Supplementary table 3).

TMPRSS2-ERG gene fusions were the most common fusions in our cohort (n = 84 out of 197; 42.6%) and constituted the majority of ETS family fusions (n = 84 out of 95; 88.4%; Figure 2 and Supplementary figure 3). This is comparable to primary prostate cancer, where ETS fusions are found in approximately 50% of tumors 13,15. The predominant break point was located upstream of the second exon of ERG, which preserves its ETS-domain in the resulting fusion gene.

In 42 patients (21.3%), we observed regional hypermutation (kataegis; Figure 2 and Supplementary figure 4). In addition, we did not observe novel mutational signatures specific for metastatic disease or possible pre-treatment histories (Supplementary figure 5) 29.

To further investigate whether the genome-wide mutational burden and the observed alterations in drivers and/or subtype-specific genes in mCRPC were specific to metastatic disease, we compared our data against an unmatched WGS cohort of primary prostate cancer (n = 210) 15,25, consisting of Gleason score 6-7 disease. Comparison of the median genome-wide TMB (SNVs and InDels per Mbp) revealed that the TMB was roughly 3.8 times higher in mCRPC (Figure 3a). The frequency of structural variants was likewise higher in mCRPC (Figure 3b), consistent with an increase as the disease progresses.
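The TMB comparison described above (SNVs and InDels per Mbp, compared by median between cohorts) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names, the assumed callable genome size and all counts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import median
from typing import Sequence


def tmb_per_mbp(n_snv: int, n_indel: int, callable_mbp: float) -> float:
    """Tumor mutational burden: (SNVs + InDels) per megabase of callable genome."""
    if callable_mbp <= 0:
        raise ValueError("callable_mbp must be positive")
    return (n_snv + n_indel) / callable_mbp


def median_fold_change(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Ratio of the median TMB of group_a to the median TMB of group_b."""
    return median(group_a) / median(group_b)


# Assumed callable genome size in Mbp (illustrative value, not from the study).
GENOME_MBP = 3000.0

# Hypothetical (SNV, InDel) counts per sample; NOT data from either cohort.
metastatic_counts = [(6000, 900), (9000, 1500), (7500, 1500)]
primary_counts = [(2000, 400), (2500, 500), (3000, 600)]

metastatic = [tmb_per_mbp(s, i, GENOME_MBP) for s, i in metastatic_counts]
primary = [tmb_per_mbp(s, i, GENOME_MBP) for s, i in primary_counts]

# Fold change of the median TMB between the two toy groups.
fold = median_fold_change(metastatic, primary)
```

Using the median rather than the mean keeps the comparison robust to a few hypermutated samples, such as MSI tumors, which would otherwise dominate the cohort-level figure.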
Analysis of selected driver and subtype-specific genes showed that the mutational frequency of several genes (AR, TP53, MYC, ZMYM3, PTEN, PTPRD, ZFP36L2, ADAM15, MARCOD2, BRIP1, APC, KMT2C, CCAR2, C8orf58 and RYBP) was significantly altered (q ≤ 0.05) between the primary and metastatic cohorts (Figure 3c-e). All genes for which we observed significant differences in mutational frequency, based on coding mutations, were enriched in mCRPC (Figure 3d). We did not identify genomic features that were specific for the metastatic setting, beyond androgen deprivation therapy-specific aberrations involving AR (no aberrations in the hormone-sensitive setting versus 137 aberrations in the castration-resistant setting). We cannot exclude from these data that matched sample analysis or larger scale analysis could reveal such aberrations.

We next determined whether previous treatments affect the mutational landscape. Using treatment history information, we grouped prior secondary anti-hormonal therapy, taxane-based

WGS-based stratification defines genomic subgroups in mCRPC
Our comprehensive WGS data and large sample size enabled us to perform unsupervised clustering on several WGS characteristics to identify genomic scars that can define subgroups of mCRPC patients. We clustered our genomic data using the total number of SVs, relative frequency of SV events (translocations, inversions, insertions, tandem duplications and deletions), genome-wide TMB encompassing SNVs, InDels and MNVs, and tumor ploidy. Prior to clustering, we subdivided tandem duplications and deletions into two major categories based on the respective genomic size of the aberration (smaller and larger than 100 kbp) since previous studies revealed distinctions based on similar thresholds for these structural variants in relation to specific mutated genes 19-21,32

were no distinct genomic signatures or biologic rationale for patient clustering (clusters C, E, G, H). In cluster C, conjoint aberrations of BRCA1 and TP53 were observed in one patient with a high HR-deficiency prediction score (CHORD), which is known to lead to a small tandem duplication phenotype (< 100 kbp) 32. Two other patients within cluster C displayed a weak CHORD score associated with HR-deficiency; however, no additional definitive evidence was found for a BRCA1 loss-of-function mutation in these patients.

In addition to our unsupervised clustering approach, we clustered our samples using the

Furthermore, only one sample (1%) with MSI-like and high TMB (> 10), respectively, was observed in the primary cancer cohort. Indeed, there is a striking difference in the mutational load between the two disease settings.
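The per-sample features used for the unsupervised clustering in this section (total number of SVs, relative frequency of each SV event type with tandem duplications and deletions split at 100 kbp, genome-wide TMB and tumor ploidy) could be assembled along the following lines. This is a hedged sketch only: the function names, the input layout and the example values are hypothetical, and assigning an event of exactly 100 kbp to the larger category is an assumption, since the text only distinguishes "smaller and larger than 100 kbp".

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

SIZE_THRESHOLD_BP = 100_000  # 100 kbp split described in the text
SPLIT_TYPES = {"tandem_duplication", "deletion"}  # only these are split by size
CATEGORIES = [
    "translocation", "inversion", "insertion",
    "tandem_duplication_small", "tandem_duplication_large",
    "deletion_small", "deletion_large",
]


def sv_category(sv_type: str, size_bp: Optional[int]) -> str:
    """Map one SV call to its clustering category (size-split where applicable)."""
    if sv_type in SPLIT_TYPES:
        if size_bp is None:
            raise ValueError(f"{sv_type} requires a size")
        # Assumption: an event of exactly 100 kbp is counted as large.
        suffix = "small" if size_bp < SIZE_THRESHOLD_BP else "large"
        return f"{sv_type}_{suffix}"
    if sv_type not in CATEGORIES:
        raise ValueError(f"unknown SV type: {sv_type}")
    return sv_type


def sample_features(
    svs: Iterable[Tuple[str, Optional[int]]], tmb: float, ploidy: float
) -> Dict[str, float]:
    """Build one sample's feature dictionary for clustering."""
    counts = Counter(sv_category(t, s) for t, s in svs)
    total = sum(counts.values())
    features: Dict[str, float] = {"total_svs": total, "tmb": tmb, "ploidy": ploidy}
    for cat in CATEGORIES:
        # Relative frequency of each SV category; 0.0 if the sample has no SVs.
        features[f"rel_{cat}"] = counts[cat] / total if total else 0.0
    return features


# Hypothetical sample: (SV type, size in bp); translocations carry no size.
example_svs = [
    ("deletion", 5_000),
    ("deletion", 250_000),
    ("tandem_duplication", 80_000),
    ("translocation", None),
]
example = sample_features(example_svs, tmb=2.5, ploidy=2.0)
```

Expressing SV types as relative frequencies alongside the absolute SV count separates the dominant mutational process in a sample from its overall degree of genomic instability. Because the features sit on different scales, they would normally need to be scaled before being passed to a clustering routine.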

Discussion
We performed WGS of metastatic tumor biopsies and matched normal blood obtained from 197 patients with mCRPC to provide an overview of the genomic landscape of mCRPC. The size of our cohort enables classification of patients into distinct disease subgroups using unsupervised clustering. Our data suggest that classification of patients using genomic events, as detected by WGS, improves patient stratification, specifically for clinically actionable subgroups such as BRCA-deficient and MSI patients. Furthermore, we confirm the central role of AR signaling in mCRPC, which mediates its effect through regulators located in non-coding regions, and the apparent difference between primary and metastatic prostate cancers.

In addition, PCAT1 is a long non-coding RNA which is known to be upregulated in prostate cancer and negatively regulates BRCA2 expression while positively affecting MYC expression 38,39. Combining our WGS approach with AR, FOXA1 and H3K27ac ChIP-seq data, we identify non-coding regions affecting both AR itself, and possibly MYC, through AR-enhancer amplification as a potential mechanism contributing to castration resistance.

A potential pitfall of our clustering analysis is the selection of features used; for this we made a number of assumptions based on the literature and the distribution of the structural variants within our cohort 19-21,32. As the clustering outcome inherently depends on the input features and their weights, we performed additional clustering analyses using various combinations of these features and applied alternative approaches, but did not detect striking differences compared to the current approach. Another potential pitfall of the employed hierarchical clustering scheme is that patients are only attributed to a single cluster. An example of this can be seen in cluster A, where a patient is grouped based on the predominant genotype (MSI) and associated mutations in MMR-related genes (MLH1, POLE, POLD3 and BLM), but this sample also displays an increased number of structural variants and increased ploidy status and harbors a pathogenic BRCA2 mutation. However, it is missing the characteristic number of genomic deletions (< 100 kbp) and the BRCA mutational signature associated with the BRCA2−/− samples that define cluster D. Despite these pitfalls we conclude that unbiased clustering contributes towards improved classification of patients.

The CPCT-02 study was designed to examine the correlation of genomic data with treatment outcome after biopsy at varying stages of disease. Our cohort contains patients with highly variable pre-treatment histories, and since current treatments for mCRPC patients significantly impact overall survival, the prognosis of patients differs greatly. Therefore, correlation between genomic data and clinical endpoints, such as survival, is inherently flawed due to the very heterogeneous nature of the patient population. Moreover, our analysis comparing primary and metastatic samples shows a significant increase in the number of genomic aberrations with advancing disease, meaning that the difference in timing of the biopsies may bias the prognostic value of the data. In future studies we plan to gather all known clinically defined prognostic information and determine whether the genomic subtypes increase the ability to predict outcome. Unfortunately, some clinical parameters with prognostic importance, such as ethnicity, will not be available due to ethical regulations. Moreover, we will increase the sample size in order to correlate genomic features with clinical parameters and to better determine whether the subtypes we identified are stable over time. Therefore, we are currently unable to present meaningful correlations between clinical endpoints and the clusters we identified.