Introduction

Homologous recombination repair mutations (HRRms) in metastatic castration-resistant prostate cancer (mCRPC) are associated with aggressive disease and can indicate potential tumor susceptibility to polyadenosine diphosphate–ribose polymerase (PARP) inhibition [1, 2]. The PARP inhibitors olaparib and rucaparib have recently been approved to treat mCRPC [3,4,5,6,7,8,9]. Based on results of the phase III PROfound trial, the United States Food and Drug Administration (FDA) approved olaparib in 2020 for patients with HRR-mutated mCRPC whose disease had progressed following prior treatment with enzalutamide or abiraterone [10, 11]. In the US prescribing information for olaparib, HRRm is defined as a pathogenic mutation in any of the following 14 genes: BRCA1, BRCA2, ATM, BRIP1, BARD1, CDK12, CHEK1, CHEK2, FANCL, PALB2, RAD51B, RAD51C, RAD51D, and RAD54L [11], all of which can be detected by the FDA-approved FoundationOne CDx [12]. Other tests, FoundationOne Liquid CDx [13] and BRACAnalysis CDx [14], were approved for detecting BRCA1/2 and ATM mutations and germline BRCA1/2 mutations, respectively. Also in 2020, based on results from the phase II TRITON2 trial [15], the US FDA approved rucaparib for patients with mCRPC associated with a deleterious germline and/or somatic BRCA mutation who had previously been treated with androgen receptor–directed therapy and taxane-based chemotherapy [9]. Additionally, the phase III TRITON3 trial showed that in patients with mCRPC and a BRCA mutation, median imaging-based progression-free survival was significantly longer with rucaparib than with a physician's choice control of docetaxel or a second-generation androgen receptor pathway inhibitor (abiraterone acetate or enzalutamide) [16].

Clinical trial evidence suggests that nearly a quarter of patients with mCRPC have tumors with DNA repair pathway gene mutations or alterations [2]. Genetic testing to identify patients with HRRm is therefore an essential tool to guide treatment in mCRPC. Several next-generation sequencing (NGS) platforms are used in real-world practice to determine HRRm status in mCRPC, including the FDA-approved tests mentioned above for determining eligibility for olaparib, as well as platforms that are standard of care at individual institutions. Of note, FoundationOne CDx is the only companion diagnostic approved for detecting somatic and germline mutations in all 14 HRR genes listed in the US prescribing information for olaparib [12]. In PROfound, the prevalence of any pathogenic mutation among those 14 genes was 26.9% in men whose tumors were successfully sequenced with the FoundationOne CDx assay [10]. However, clinical trial populations may not be representative of real-world populations, and real-world data on overall HRRm prevalence as defined by these 14 genes remain incomplete.

Accurate estimation of HRRm prevalence is key to identifying patients who may benefit from PARP inhibitor monotherapy. Published data on HRRm prevalence in advanced prostate cancer derive from heterogeneous definitions of HRRm. For example, studies may assess prevalence in tumor versus germline samples or use different gene sets to define HRRm, and few studies have analyzed differences by patient characteristics (e.g., disease state or race/ethnicity) or by testing panel [17,18,19]. The objectives of this study were (1) to describe real-world HRRm prevalence in advanced prostate cancer, as defined by the 14 genes in the olaparib US label, using two clinicogenomic databases: the American Association for Cancer Research (AACR) Project Genomics Evidence Neoplasia Information Exchange (GENIE) [20] and the Flatiron Health (FH) and Foundation Medicine, Inc. (FMI) Clinico-Genomic Database (CGDB) [21]; and (2) to understand how HRRm prevalence may vary by patient demographics, clinical characteristics, and treatment center.

Materials and methods

Study design and data sources

The CGDB [21, 22] is a de-identified longitudinal database originating from approximately 280 US cancer clinics (~800 sites of care). Retrospective longitudinal clinical data were derived from electronic health record data comprising patient-level structured and unstructured data, curated via technology-enabled abstraction, and were linked to genomic data derived from FMI comprehensive genomic profiling tests in the CGDB by de-identified, deterministic matching. The data used for this study were updated through December 31, 2020. GENIE [20] is an international cancer registry that provides clinical-grade, de-identified, next-generation cancer genomic sequencing data collected during routine medical practice. The data used in this analysis were from GENIE public version 10.1, which was updated through June 30, 2020.

Patients

The CGDB patient cohort included men aged ≥18 years who were diagnosed with metastatic or advanced prostate cancer between January 1, 2018, and December 31, 2019, and whose tumors were assayed by FoundationOne CDx. Patients were required to have an available loss of heterozygosity score; at least two documented clinical visits at a site in the FH research network on or after January 1, 2011; and demographic information at FH and FMI that was uniquely and deterministically matched by a third-party linking vendor. Patients with histology not otherwise specified (e.g., not adenocarcinoma) were excluded [21, 22]. For the CGDB database, the data included age at sequencing, time between specimen collection and sequencing, race (White, Black, Asian, other, unknown or not collected), Gleason score and stage at initial diagnosis, Eastern Cooperative Oncology Group (ECOG) performance status, prostate-specific antigen (PSA) level, castration-resistant or hormone-sensitive status, metastatic status at specimen collection and at sequencing, practice type (community based, academic), and sample type (primary tumor, metastatic site). Ethnicity data were not available in the CGDB database. The GENIE cohort included men aged ≥18 years with prostate cancer who underwent NGS as standard of care between January 2011 and June 30, 2020. Patients whose sample type (e.g., primary vs. metastatic) was not specified were excluded. For the GENIE database, the data included age at sequencing, race (White, Black, Asian, other, unknown or not collected), ethnicity (non-Hispanic, Hispanic, unknown or not collected), sample type (primary tumor, metastatic site), and sequencing platform/treatment center (Dana-Farber Cancer Institute [DFCI]; Duke Cancer Institute [Duke]; Memorial Sloan Kettering Cancer Center [MSK]; or other). Each of the three main centers used a different sequencing platform: DFCI and MSK each used a custom institutional panel, Duke used an FMI panel, and other sites used a mix of panels (Table S1).
In addition to differences in the available demographic and clinical variables, a key difference between the datasets is the type of mutations reported. The CGDB reports pathogenic mutations regardless of somatic or germline origin, while filtering out the most common benign germline variants. Although the NGS performed on GENIE patients' tumor tissue captured both germline and somatic mutations for clinical care, germline mutation data were ultimately filtered out of the GENIE database to protect patient privacy. This filtering process has been described previously [23]. Although the filtering may not have removed all germline mutations (those with <0.0005% population frequency may remain), the HRR mutations available for analysis in the GENIE dataset can be considered primarily somatic. Key differences between PROfound and the databases are listed in Table S2.

Objectives and analyses

The primary objective was to determine the prevalence of HRRm based on the 14 genes listed in the olaparib US prescribing information. The algorithm used to determine HRRm from tissue samples was similar to that used in the PROfound trial [10, 24]. Only pathogenic or likely pathogenic gene alterations were included. Qualifying DNA alterations were those that result in truncation of the protein (nonsense, frameshift, and splice site mutations); large-scale (i.e., affecting at least a whole exon) genomic deletions, insertions, or rearrangements; homozygous deletions; and other mutation types identified and reported as deleterious variants in the Breast Cancer Information Core database [25] or ClinVar [26]. Copy number alterations (e.g., genomic deletions/insertions and rearrangements, and homozygous deletions) were assessed for FoundationOne CDx only, because copy number alteration calling could not readily be harmonized across the different NGS panels used by AACR GENIE centers. Exploratory objectives were to describe the prevalence of mutations in BRCA1 and BRCA2 (jointly as BRCAm), ATM, and CDK12, and to understand how HRRm prevalence may vary by patient demographics, clinical characteristics, and treatment center. The analysis for this retrospective study was completed in April 2022. Because of key differences between the databases, separate analyses were conducted for CGDB and GENIE.
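The variant-inclusion rules above amount to a small decision procedure applied to each reported alteration. The sketch below is a hypothetical illustration under stated assumptions, not the study's or PROfound's actual pipeline: the `Variant` record, its field names, and the alteration-type labels are invented for clarity, and lookups against the Breast Cancer Information Core database or ClinVar are represented only by a precomputed boolean flag.

```python
from dataclasses import dataclass

# The 14 HRR genes named in the US olaparib prescribing information.
HRR_GENES = frozenset({
    "BRCA1", "BRCA2", "ATM", "BRIP1", "BARD1", "CDK12", "CHEK1",
    "CHEK2", "FANCL", "PALB2", "RAD51B", "RAD51C", "RAD51D", "RAD54L",
})

# Hypothetical labels for protein-truncating short variants.
TRUNCATING = frozenset({"nonsense", "frameshift", "splice_site"})

# Hypothetical labels for large-scale (>= whole exon) structural/copy number events.
COPY_NUMBER = frozenset({
    "large_deletion", "large_insertion", "rearrangement", "homozygous_deletion",
})


@dataclass(frozen=True)
class Variant:
    gene: str
    kind: str                        # e.g. "nonsense", "missense", "homozygous_deletion"
    pathogenic: bool                 # pathogenic or likely pathogenic
    deleterious_in_db: bool = False  # assumed precomputed BIC/ClinVar "deleterious" flag


def is_hrrm(v: Variant, copy_number_assessed: bool) -> bool:
    """Return True if the alteration counts as an HRRm under the sketched rules."""
    if v.gene not in HRR_GENES or not v.pathogenic:
        return False
    if v.kind in TRUNCATING:
        return True
    if v.kind in COPY_NUMBER:
        # Counted only where copy number calling is assessed
        # (FoundationOne CDx in this study).
        return copy_number_assessed
    # Any other alteration type counts only if a curated database lists it as deleterious.
    return v.deleterious_in_db


def patient_has_hrrm(variants, copy_number_assessed: bool) -> bool:
    """A patient is HRRm-positive if any qualifying alteration is present."""
    return any(is_hrrm(v, copy_number_assessed) for v in variants)
```

For example, `is_hrrm(Variant("CDK12", "homozygous_deletion", pathogenic=True), copy_number_assessed=False)` returns `False`, mirroring the choice to assess copy number alterations only on FoundationOne CDx rather than across heterogeneous GENIE panels.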

Statistical methods

Means with standard deviations and medians with interquartile ranges were calculated for continuous and count variables. Frequencies and percentages were reported for categorical variables. Missing data were reported, and categories with low frequencies or small proportions were combined with other categories.

The overall prevalence (with 95% CIs) of HRRm was calculated as the number of patients with HRRm divided by the total number of study patients and was stratified by demographics, clinical characteristics, and NGS testing panels. In both databases, only one sample was analyzed from each patient. For patients with multiple samples, the most recent sample was selected if the results for each sample were concordant for HRRm, whereas the most recent positive sample was chosen for analysis of patients with discordant results. All analyses were descriptive, and no statistical comparisons were made. Due to the overall heterogeneity of available covariates in the two different databases and the lack of germline data for GENIE, no direct comparisons were performed between the cohorts.
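The one-sample-per-patient rule and the prevalence estimate can be sketched as follows. This is a minimal illustration, not the study's code: the `Sample` record and all function names are hypothetical, and because the text does not state how the 95% CIs were computed, the Wilson score interval is used here as an assumed method.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt


@dataclass(frozen=True)
class Sample:
    patient_id: str
    collected: date
    hrrm: bool


def select_one_sample_per_patient(samples):
    """Keep one sample per patient: the most recent one if all HRRm calls agree,
    otherwise the most recent HRRm-positive sample."""
    by_patient = {}
    for s in samples:
        by_patient.setdefault(s.patient_id, []).append(s)
    selected = {}
    for pid, group in by_patient.items():
        if len({s.hrrm for s in group}) == 1:   # concordant results
            pool = group
        else:                                    # discordant: positives only
            pool = [s for s in group if s.hrrm]
        selected[pid] = max(pool, key=lambda s: s.collected)
    return selected


def wilson_ci(k, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def prevalence(samples):
    """Patients with HRRm divided by all study patients, with a 95% CI."""
    chosen = list(select_one_sample_per_patient(samples).values())
    n = len(chosen)
    k = sum(s.hrrm for s in chosen)
    lo, hi = wilson_ci(k, n)
    return k / n, lo, hi
```

The Wilson interval is one common choice for proportions; an exact Clopper–Pearson interval would be an equally reasonable assumption, and for cohorts of several hundred patients the two give similar bounds.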

Study ethics

Institutional review board approval was not required for the CGDB or the GENIE databases because all personally identifiable characteristics had been intentionally omitted. Both FH-FMI and GENIE data were stored on a secure server owned by the study sponsor.

Results

In the analysis, a total of 487 patients were included from CGDB and 3270 patients from GENIE. In CGDB, the mean age at sequencing was 69.2 years (Table 1). Of the patients with data on race (n = 452), 70.8% were White, 10.2% were Black, 1.5% were Asian, and 17.5% did not fit into any of the above categories (other). Most patients (78.2%) in the CGDB database received primary oncology care at community-based practices. Most patients had high-risk disease (80.8% had a Gleason score of 8–10) and/or advanced disease (84.8% had stage IV disease) at diagnosis. Further, 77.4% of patients had metastatic disease and 17.2% had castration-resistant disease at the time of specimen collection. By the time of sequencing, these proportions had increased to 97.9% and 51.1%, respectively. Of the 480 patients with data on sample type, 60.8% had a sample from the primary tumor and 39.2% from metastatic tissue (Table 1).

Table 1 CGDB patient characteristics.

In the GENIE database, the mean age at sequencing was 66.9 years (Table 2). Patient characteristics were relatively consistent across treatment centers, with the exception of race and ethnicity. Of the 3270 patients in this database, 85.6% were White, 9.0% were Black, 3.4% were Asian, and 2.0% did not fit into any of the above categories (other). Across treatment centers, the lowest percentage of Black patients was at DFCI (5.2%) and the highest was at Duke (18.4%). Among 2987 patients with data on ethnicity, 95.3% were non-Spanish/non-Hispanic. DFCI and Duke had very few patients of Spanish/Hispanic ethnicity (1.1% and 0%, respectively) compared with MSK and other centers (5.1% and 11.2%, respectively) (Table 2). Across all sites, 62.6% of samples were from primary tumors and 37.4% were from metastatic tumors. Most patients (78.0%) were treated at MSK. Treatment centers used different NGS platforms; however, all platforms included in the analysis covered at least 10 of the 14 genes, and only rare genes (expected prevalence <1%: BRIP1, PALB2, BARD1, RAD51B, RAD54L, RAD51D, CHEK1, FANCL, RAD51C [10, 27]) were allowed to be missing (Table S1).
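The platform inclusion rule described above reduces to a set check on each panel's gene content. This is a minimal sketch, assuming a panel's covered genes are available as a collection of gene symbols; the function name `panel_is_eligible` and its `min_covered` parameter are hypothetical.

```python
# The 14 HRR genes in the US olaparib prescribing information.
HRR_GENES = frozenset({
    "BRCA1", "BRCA2", "ATM", "BRIP1", "BARD1", "CDK12", "CHEK1",
    "CHEK2", "FANCL", "PALB2", "RAD51B", "RAD51C", "RAD51D", "RAD54L",
})

# Rare genes (expected prevalence <1%) that a panel was allowed to omit.
RARE_HRR_GENES = frozenset({
    "BRIP1", "PALB2", "BARD1", "RAD51B", "RAD54L",
    "RAD51D", "CHEK1", "FANCL", "RAD51C",
})


def panel_is_eligible(covered_genes, min_covered=10):
    """Hypothetical check: a panel qualifies if it covers at least `min_covered`
    of the 14 HRR genes and every uncovered HRR gene is one of the rare genes."""
    covered = HRR_GENES & set(covered_genes)
    missing = HRR_GENES - covered
    return len(covered) >= min_covered and missing <= RARE_HRR_GENES
```

For instance, a panel covering every HRR gene except FANCL passes, whereas a panel lacking ATM fails even though it covers 13 genes, because ATM is not among the rare genes that may be omitted.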

Table 2 GENIE patient characteristics.

The overall prevalence of HRRm was 24.6% in CGDB (somatic and germline) and 11.0% in GENIE (somatic only) (Table 3). The percent contributions for the individual gene components were relatively consistent for patients in PROfound [27] and FH-FMI-CGDB, with the most common HRRm component genes (ATMm, BRCAm, and CDK12m) contributing ~90% of the mutations (Table 3). No major differences were found in the overall HRRm prevalence by race in the CGDB data (Fig. 1A). In the GENIE database, HRRm prevalence varied by academic center (Fig. 1B). Patients treated at DFCI and Duke had higher HRRm prevalence (18.4% and 15.5%, respectively) compared with those treated at MSK (9.5%).

Table 3 HRRm prevalence and contribution by gene in PROfound, CGDB, and GENIE.
Fig. 1: HRRm prevalence by selected characteristics.

Estimates are listed with 95% confidence intervals. A CGDB. B GENIE. CGDB clinico-genomic database, GENIE Genomics Evidence Neoplasia Information Exchange.

In the CGDB data, the prevalence of mutations in the most common individual component genes (BRCAm, ATMm, and CDK12m) was also generally consistent across racial subgroups and subgroups defined by clinical characteristics (Table 4). In the GENIE database, however, there were suggestive differences in the prevalence of these genes by treatment center and race (Table 4). Specifically, the prevalence of BRCAm and ATMm was higher in patients treated at DFCI (6.8% and 6.5%, respectively) than in patients treated at MSK (3.0% and 2.0%, respectively) and Duke (3.5% and 0.7%, respectively). The prevalence of CDK12m was higher in patients treated at Duke (7.7%) than in patients treated at DFCI (3.8%) and MSK (4.2%). White patients had a higher prevalence of BRCAm (3.7%) and ATMm (2.7%) than Black patients (2.2% and 1.1%, respectively), whereas Black patients had a higher prevalence of CDK12m (6.9%) than White patients (3.9%) (Table 4).

Table 4 CGDB and GENIE prevalence of BRCAm, ATMm, and CDK12m by selected characteristics.

Discussion

Approximately one-quarter of patients with advanced/metastatic prostate cancer in the CGDB database who had relevant testing data had tumors with HRRm. This real-world prevalence was consistent with findings from the PROfound trial, which used a similar NGS platform and algorithm to define HRRm [10, 27]. Further, the contributions of the 14 component genes were also relatively consistent between the CGDB data and PROfound, especially for the most common component genes (BRCAm, ATMm, and CDK12m). No major differences in HRRm prevalence by patient characteristics were observed in the CGDB database; however, the stratified subgroups were small (46 Black and seven Asian patients), and no ethnicity data were available.

Direct comparisons between GENIE and PROfound or FH-FMI-CGDB were not possible because the GENIE database filtered out germline mutations, whereas pathogenic germline mutations were retained in FH-FMI-CGDB. A breakdown of somatic versus germline prevalence was not available for PROfound or the FH-FMI-CGDB database. In addition, the relative contributions of germline and somatic mutations can vary across the HRRm component genes. For example, studies have reported the prevalence of BRCA2 mutations in mCRPC to range from 3.3% to 6.0% for germline mutations and from 5.0% to 15.1% for somatic mutations [28]. Approximately 36% to 52% of BRCA2 mutations have been estimated to be germline, whereas CDK12 mutations are almost always somatic [29,30,31].

The large sample size of the GENIE database (n = 3270) and the diversity of its patients and treatment centers allowed us to assess differences in somatic HRRm prevalence by treatment center and race. HRRm prevalence differed by treatment center in the GENIE database, and several factors may have contributed to these differences. First, different NGS platforms were used to detect mutations (Table S1). The NGS platform used by MSK is a matched tumor–normal platform that may filter out germline mutations more efficiently than the tumor-only platforms used at DFCI and Duke, which may have contributed to the lower overall HRRm prevalence observed at MSK. Gene coverage is unlikely to have contributed substantially: no NGS panel in this study was missing more than four genes, and each missing gene had <1% expected prevalence. For example, the lowest HRRm prevalence was found in patients treated at MSK, whose NGS platform was missing only FANCL, which had a prevalence of 0.1% in the PROfound trial [10, 27]. Different platforms may also differ in their sensitivity to detect specific variants and/or in their postprocessing bioinformatic variant-calling algorithms. For example, CDK12m is predominantly somatic and would not be affected by germline filtering [30], yet we still observed differences in its prevalence by treatment center/NGS platform. Armenia et al. [32] similarly reported CDK12m prevalence that varied by NGS platform (5% in MSK-IMPACT data vs 11% in FMI data).

Differences in the racial composition of patients at each site might also have contributed to the observed differences in HRRm by treatment center. The prevalence of CDK12m was higher in patients at Duke (7.7%) than in patients at MSK (4.2%) and DFCI (3.8%). We observed that Black patients had a higher prevalence of CDK12m than White patients (6.9% vs 3.9%) and that Duke had a higher proportion of Black patients (18.4%) than DFCI (5.2%) and MSK (9.0%).

Even though no differences were found in overall HRRm prevalence by race, differences in BRCAm, ATMm, and CDK12m by race were observed. Most advanced/metastatic prostate cancer cohorts have lacked diversity, and previously published literature has used variable definitions of component genes. Two prior studies examined the overall genomic landscape among patients from GENIE, but these analyses were limited to MSK and DFCI data [33, 34]. In one study among patients with metastatic prostate cancer (n = 909), Black men were more likely than White men to have actionable mutations overall, and specifically in DNA repair pathway genes (defined as ERCC5, MRE11, TP53BP1, POLE, RAD21, MSH2, MSH6, BRCA1/2, ATR, and ATM). The frequency of CDK12m was also higher in Black than in White men with metastatic prostate cancer. Racial differences were less pronounced in patients with primary tumor samples [33]. The other study used a similar cohort but removed 458 duplicate samples and did not find a statistically significant difference in DNA repair mutations between Black and White men [34]. Compared with these previous GENIE analyses, our study used updated data, included all contributing centers with relevant NGS platforms, and focused on the olaparib-specific HRRm definition; it also included a larger number of Black men (n = 277, compared with n = 71 reported by Mahal et al. [33] and n = 77 reported by Schumacher et al. [34]). In a study of 2069 men with prostate cancer, including 169 Black men, genomic differences by race were found using MSK-IMPACT data [29]. Tumors from Black men harbored fewer phosphatase and tensin homolog mutations and more androgen receptor alterations than tumors from White men, and tumors from Asian men had more forkhead box A1 mutations and more zinc finger homeobox 3 alterations than tumors from White men. In our study, no differences were observed in overall DNA repair alterations by race, but the definitions of HRRm in our study and the previous studies were inconsistent [29]. Another study compared the prevalence of pathogenic/likely pathogenic germline variants in Black and White men with metastatic prostate cancer and found that Black men were more likely to have a germline BRCA1 mutation and less likely to have a non-BRCA DNA repair germline variant (defined as MSH2, MSH6, PMS2, MLH1, ATM, RAD50, RAD51D, NBN, CHEK2, BRIP1, PALB2, RAD51C, BLM, and TP53) [35]. Somatic BRCA1 mutations were too rare in our study (0.5%) to assess differences by race. More research in diverse populations is needed to confirm whether these differences truly exist and are clinically significant.

Study limitations

The GENIE database mostly represents major academic centers, whereas the CGDB database is mostly community based. Although complementary, these databases may not be generalizable to the overall real-world population of patients with advanced prostate cancer. For example, the CGDB patient cohort was predominantly stage IV at initial diagnosis, and at the time of sequencing, 97.9% had metastatic disease and 51.1% had castration-resistant disease. GENIE provided limited clinical data, with no information on Gleason score or stage at diagnosis or on whether patients had metastatic or castration-resistant disease at the time of sequencing [20]. Furthermore, both databases were limited to patients who received NGS testing as standard of care. Although the National Comprehensive Cancer Network guidelines for prostate cancer began recommending HRRm testing for patients with metastatic prostate cancer in 2019 [36], a survey of providers treating advanced prostate cancer found that in early 2020, only 38% of US patients with mCRPC had been tested for HRRm [37]. In another study using data from 2014 to 2018 (before the approval of PARP inhibitors for prostate cancer), only 13% of patients with mCRPC had been tested for HRRm [19]. In addition, because the GENIE database filters out most germline mutations, we could not determine the total prevalence of HRRm in this population or make direct comparisons with CGDB or PROfound. However, the large and diverse GENIE database provided the opportunity to assess differences by testing center and race. Finally, the data represent prevalence only among patients who were tested in real-world practice, which may introduce selection bias.

Conclusion

In summary, the CGDB data showed that the prevalence of HRRm in real-world clinical practice (24.6%) was consistent with that in the PROfound trial (26.9%) when a similar NGS platform (FoundationOne CDx) and algorithm were used. When testing was performed on different NGS platforms (GENIE database), HRRm prevalence varied across treatment centers. Suggestive racial differences were observed for the most common HRRm genes, but not for overall HRRm prevalence. Because Black men have the highest incidence of and mortality from prostate cancer, more studies with greater diversity in genomic testing cohorts are urgently needed. To our knowledge, this is the first and largest analysis to provide HRRm prevalence data using the 14-gene definition consistent with the olaparib indication and to assess differences by patient characteristics and treatment center/NGS platform.