Evaluating BRCA mutation risk predictive models in a Chinese cohort in Taiwan

Accurate estimation of carrier probabilities of cancer susceptibility gene mutations is an important part of pre-test genetic counselling. Many predictive models are available, but their applicability in the Asian population is uncertain. We evaluated the performance of five BRCA mutation risk predictive models in a Chinese cohort of 647 women who underwent germline DNA sequencing of a cancer susceptibility gene panel. Using areas under the curve (AUCs) of receiver operating characteristic (ROC) curves as performance measures, the models performed comparably to their reported performance in western cohorts (BOADICEA 0.75, BRCAPRO 0.73, Penn II 0.69, Myriad 0.68). For unaffected women with family history of breast or ovarian cancer (n = 144), the BOADICEA, BRCAPRO, and Tyrer-Cuzick models had excellent performance (AUC 0.93, 0.92, and 0.92, respectively). For women with both personal and family history of breast or ovarian cancer (n = 241), all models performed fairly well (BOADICEA 0.79, BRCAPRO 0.79, Penn II 0.75, Myriad 0.70). For women with personal history of breast or ovarian cancer but no family history (n = 262), most models did poorly. Of the two best-performing models, BOADICEA underestimated mutation risks while BRCAPRO overestimated them (expected/observed ratio 0.67 and 2.34, respectively). Among 424 women with personal history of breast cancer and available tumor ER/PR/HER2 data, the predictive models performed better for women with triple negative breast cancer (AUC 0.74 to 0.80) than for women with luminal or HER2 overexpressed breast cancer (AUC 0.63 to 0.69). However, incorporating ER/PR/HER2 status into the BOADICEA model calculation did not improve its predictive accuracy.

counterparts. Several studies reported poor model performance in Asian breast cancer cohorts, including an evaluation of the Manchester Scoring System 14 and BOADICEA 10 in a Malaysian cohort of 187 patients with a BRCA mutation rate of 14.4% 21 , and an evaluation of BRCAPRO 11 and Myriad 12 in a Korean cohort of 236 patients with a BRCA mutation rate of 19.5% 22 . In contrast, an updated Manchester Scoring System that included adjustment for breast cancer receptor status and high grade serous type ovarian cancer was shown to be as effective in BRCA mutation prediction in the Singapore cohort as in the Manchester population 23 . In a cohort of 212 Chinese familial breast cancer patients with a BRCA mutation rate of 15.6%, the BRCAPRO, Penn II 16 , and Myriad models showed accuracy comparable to that in western cohorts 24 . In a study of a Hong Kong Chinese cohort consisting of 310 female and male breast or ovarian cancer patients with a BRCA mutation rate of 13.9% 25 , BOADICEA appeared to be the most accurate of the five tested models in combined BRCA1/2 mutation prediction, while BRCAPRO better predicted mutations of BRCA1 alone. These two models performed slightly better in that Chinese cohort than in several previously reported western and Asian cohorts. The BRCAPRO and Myriad models were also tested in a Korean ovarian cancer cohort of 232 patients with 24.6% BRCA mutation prevalence 26 , in which both models had acceptable performance. The aforementioned studies tested the models only in affected individuals with breast or ovarian cancer, and the cohorts were relatively small. More studies in Asian cohorts are clearly needed to validate these models and to discern how best to use them.
In the present study, we aimed to evaluate the mutation predictive accuracies of BOADICEA, BRCAPRO, Myriad, Penn II, and Tyrer-Cuzick models in those at risk for hereditary breast and ovarian cancer syndrome in a Chinese cohort in Taiwan. We explored model performances in various subgroups to determine how best to use the models in pre-test genetic counselling.

Results
A total of 647 female participants from 488 families were included in the study. The cohort was divided into three subgroups based on the presence or absence of personal history and family history (FH) of breast cancer (BC) or ovarian cancer (OC). The personal characteristics, cancer characteristics, and mutation frequencies for the entire cohort and for the subgroups are shown in Table 1. The mean age at study enrolment was 50.2 years (range 16 to 96). Of the participants, 503 (77.7%) had a personal history of BC or OC, and 385 (59.5%) had a family history of BC or OC. The subgroup with personal history but no family history of BC or OC (BC/OC(+)FH(−)) was younger (mean age 47.8) and included more early onset cancer and triple-negative breast cancer. In the entire cohort, 48 individuals were found to carry a BRCA mutation (12 BRCA1, 36 BRCA2), giving a carrier rate of 7.4%. The subgroup with both personal and family history of BC or OC (BC/OC(+)FH(+)) had the highest BRCA mutation carrier rate, 10.4%.
Model performance by genes.
Figure 1 shows the performance of four mutation predictive models on ROC curves. The AUCs for having either a BRCA1 or a BRCA2 mutation were: 0.75 (95% CI, 0.67-0.83) for BOADICEA, 0.73 (95% CI, 0.64-0.81) for BRCAPRO, 0.68 (95% CI, 0.59-0.77) for Myriad, and 0.69 (95% CI, 0.60-0.77) for Penn II (Fig. 1a). At the optimal cut-points, defined as the points closest to the upper left corner, the values of the mutation carrier probabilities varied widely, with BOADICEA having the lowest cut-off value of 3.3%, BRCAPRO having the highest cut-off value of 24.6%, and Myriad and Penn II having the middle values of 5.3% and 11.5%, respectively. The sensitivities at the optimal cut-points were between 0.56 and 0.69, while the specificities were between 0.57 and 0.81 (Fig. 1e).
Three of the models also predict the BRCA1 and BRCA2 mutation carrier probabilities separately. The models performed very well for BRCA1 and worse for BRCA2. Figure 1b shows the AUCs for BRCA1-only predictions: 0.98 (95% CI, 0.95-1.00) for BOADICEA, 0.93 (95% CI, 0.86-1.00) for BRCAPRO, and 0.85 (95% CI, 0.73-0.97) for Penn II. The optimal cut-point values for BRCA1 mutation carrier probability were: 7.2% for BOADICEA, 19.6% for BRCAPRO, and 8.5% for Penn II. Figure 1c shows the ROC curves for BRCA2-only prediction, and the AUCs were: 0.69 (95% CI, 0.60-0.78) for BOADICEA, 0.64 (95% CI, 0.54-0.75) for BRCAPRO, and 0.60 (95% CI, 0.50-0.69) for Penn II. The optimal cut-point values for BRCA2 mutation carrier probability were: 1.0% for BOADICEA, 5.0% for BRCAPRO, and 5.5% for Penn II. Table S2 shows the numbers and proportions of BRCA mutation carriers in each predicted range category of the mutation carrier probability. BOADICEA gave good correlation between actual mutation rates and predicted mutation probability ranges. For the other models, the mutation rates had an upward trend with increasing ranges of predicted probabilities, but the actual rate values did not always fit the predicted probability ranges.
Figure 1. Performance of four BRCA mutation risk predictive models using ROC curves; the respective AUC for each model is shown at the lower right corner of the curves and in panel (e); the optimal cut-points (closest point to the upper left corner) are shown as triangles on each curve. (a) BRCA1/2: probability of BRCA1 or BRCA2 mutation prediction using the BOADICEA, BRCAPRO, Myriad, and Penn II models; (b) BRCA1 only: probability of BRCA1 mutation prediction using the BOADICEA, BRCAPRO, and Penn II models; (c) BRCA2 only: probability of BRCA2 mutation prediction using the BOADICEA, BRCAPRO, and Penn II models; (d) Non-BRCA HR pathway genes: probability of HR pathway gene other than BRCA1/2 (ATM, BARD1, BRIP1, PALB2, RAD50, RAD51C, RAD51D) mutation prediction using the BOADICEA, BRCAPRO, Myriad, and Penn II models; (e) AUC with 95% confidence interval (CI) for each model and each gene set, as well as mutation carrier probability, sensitivity, and specificity at the optimal cut-point for each curve are listed.

Model performance by clinical subgroups.
To find the population group where the models are most applicable, we performed subgroup analyses based on personal and familial BC/OC status. In the subgroup that had no personal history of BC/OC but had family history of BC/OC (designated BC/OC(−)FH(+) in Fig. 2a), three models had superb performance: AUC 0.93 (95% CI, 0.81-1.05) for BOADICEA, 0.92 (95% CI, 0.82-1.03) for BRCAPRO, and 0.92 (95% CI, 0.83-1.02) for the Tyrer-Cuzick model. In the BC/OC(+)FH(+) subgroup, the models performed fairly well, with AUCs between 0.70 and 0.79 (Fig. 2b). In the BC/OC(+)FH(−) subgroup (Fig. 2c), the models had poor accuracy, except Myriad, whose AUC (0.62) was comparable to its AUCs in the subgroups with family history (BC/OC(−)FH(+) 0.62 or BC/OC(+)FH(+) 0.70). The optimal cut-off values for mutation carrier probability were much higher for BRCAPRO than for all other models (Fig. 1d).
The observed and expected numbers of BRCA mutation carriers for the entire cohort and the subgroups are shown in Table 2. Examining the expected/observed (E/O) ratio, we found that the BOADICEA model gave a fairly accurate estimation of mutation rate (E/O 0.91) in the BC/OC(+)FH(+) subgroup but an underestimation in the other two subgroups. The Myriad prevalence table gave a good estimation of mutation rates in all subgroups (E/O 0.79 to 0.86), probably because the Myriad table was based on prevalence rates and was constructed by personal and family history of BC/OC, similar to our subgroup division. The BRCAPRO and Penn II models gave an approximately 2-fold overestimation in all subgroups.
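The E/O comparison described above can be illustrated with a short sketch. It assumes the common definition in which the expected number of carriers is the sum of a model's predicted carrier probabilities and the observed number is the count of confirmed carriers; the function name and the toy data are hypothetical and are not taken from the study.

```python
def expected_observed_ratio(predicted_probs, is_carrier):
    """Return (expected, observed, E/O) for one model in one group.

    expected: sum of predicted carrier probabilities (assumed definition)
    observed: number of individuals with a confirmed mutation
    """
    if len(predicted_probs) != len(is_carrier):
        raise ValueError("inputs must have the same length")
    expected = sum(predicted_probs)
    observed = sum(1 for flag in is_carrier if flag)
    if observed == 0:
        raise ValueError("E/O is undefined when no carriers are observed")
    return expected, observed, expected / observed


# Hypothetical toy data, not taken from the study.
toy_probs = [0.10, 0.30, 0.05, 0.55]
toy_carrier = [False, True, False, True]

expected, observed, eo = expected_observed_ratio(toy_probs, toy_carrier)
# E/O < 1: the model expects fewer carriers than were observed (underestimation).
# E/O > 1: the model expects more carriers than were observed (overestimation).
print(f"expected={expected:.2f} observed={observed} E/O={eo:.2f}")
```

Under this definition, an E/O near 1 indicates good calibration in the group as a whole, although it says nothing about how well the model ranks individuals.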
The BOADICEA model allows inclusion of the ER/PR/HER2 data in the calculation for BRCA mutation prediction. We compared the predictive accuracy with and without the receptor status included in the model, as shown in Table 3 and Fig. 4. The BOADICEA model performance did not improve with the additional receptor information for the group as a whole (AUC 0.71 to 0.70, Fig. 4a), or for any of the pathology or clinical subgroups (Fig. 4b,c).
Setting the positive test threshold.
The performance measures using sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for each predictive model are shown in Table 4, with the positive test threshold for carrying a BRCA1 or BRCA2 mutation set at 10% or 20% carrier probability. For the entire study cohort, at the same threshold level, BRCAPRO had a relatively high sensitivity while BOADICEA had the highest specificity and PPV. When the threshold was moved from 10% to 20%, the performance of BOADICEA and BRCAPRO was affected only slightly, while the performance of Penn II changed markedly, with a large drop in sensitivity (0.82 to 0.25) and a large rise in specificity (0.39 to 0.91). The PPVs were generally low because of the low mutation prevalence in the study cohort, although at the 20% threshold BOADICEA gave a PPV of 0.46. The NPVs of all four models were high (range 0.94-0.96). In the BC/OC(−)FH(+) subgroup at a positive test threshold of 10%, the Tyrer-Cuzick, BOADICEA, and BRCAPRO models gave good sensitivities (0.67, 0.83, and 0.83, respectively) and specificities (0.96, 0.98, and 0.70, respectively). Moving the threshold to 20% markedly lowered the sensitivity to 0.33 for Tyrer-Cuzick and BOADICEA without much gain in specificity. BRCAPRO's performance did not differ much between the 10% and 20% thresholds.
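As an illustration of how the four measures depend on the chosen threshold, the following sketch computes sensitivity, specificity, PPV, and NPV from predicted carrier probabilities. The function name, the toy data, and the choice to treat a probability equal to the threshold as a positive test are assumptions made for this example, not details taken from the study.

```python
def threshold_metrics(predicted_probs, is_carrier, threshold):
    """Classify each individual as test-positive when the predicted carrier
    probability is >= threshold (inclusive cut-off is an assumption) and
    return sensitivity, specificity, PPV, and NPV."""
    tp = fp = tn = fn = 0
    for prob, carrier in zip(predicted_probs, is_carrier):
        positive = prob >= threshold
        if positive and carrier:
            tp += 1
        elif positive and not carrier:
            fp += 1
        elif not positive and carrier:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # None marks a measure that is undefined for this sample.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical toy data, not taken from the study.
toy_probs = [0.02, 0.08, 0.12, 0.25, 0.40, 0.05]
toy_carrier = [False, False, False, True, True, True]

for cutoff in (0.10, 0.20):
    print(cutoff, threshold_metrics(toy_probs, toy_carrier, cutoff))
```

In the toy data, raising the cut-off from 10% to 20% removes the single false positive, so specificity and PPV rise while sensitivity is unchanged; real cohorts typically trade one for the other, as with Penn II above.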

Discussion
Our study tested five widely used BRCA mutation risk predictive models in a large Chinese cohort in Taiwan, which included breast or ovarian cancer patients with or without family history of breast or ovarian cancer, as well as unaffected women with family history of breast or ovarian cancer. Participants were enrolled in the study with comprehensive personal and pedigree data collection, and the same experimental and data analytical protocols for genetic testing were used for all participants. Our data thus allowed consistent evaluation of model performance not only in the cohort as a whole, but also in different subgroups of women with or without personal or family history of cancer. Using AUCs of the ROC curves for combined BRCA1/2 predictions, we showed that the BOADICEA and BRCAPRO models performed as well in this Chinese cohort (AUCs 0.75 and 0.73) as in previous studies of western cohorts (AUCs ranging from 0.71 to 0.77), while the Myriad and Penn II models performed less well (AUCs 0.68 and 0.69) than in western cohorts (AUCs ranging from 0.71 to 0.79) 16,27-30 .
In pre-test genetic counselling, it is important to know which models are best used for which type of patient. We showed that BOADICEA and BRCAPRO were particularly well suited for unaffected women with family history (BC/OC(−)FH(+) subgroup), achieving AUCs of 0.93 and 0.92, respectively. These two models also performed fairly well (AUCs 0.79 and 0.79) for women with both personal and family history of breast or ovarian cancer (BC/OC(+)FH(+) subgroup), but they were close to unhelpful (AUCs 0.57 and 0.54) for affected women without family history. The Tyrer-Cuzick model can only be applied to unaffected women with family history; it worked very well for this subgroup, achieving an AUC of 0.92. The Penn II and Myriad models worked best for women with both personal and family history (AUC 0.75 and 0.70), and less well for unaffected women with family history (AUC 0.69 and 0.62). For women with breast or ovarian cancer but no family history, Myriad performed relatively well (AUC 0.62) compared to the other models.
Among women with breast cancer and known ER/PR/HER2 receptor status, we found that all models performed better for those with triple negative cancer than for those with luminal or HER2 overexpressed cancer, although none of the models actually used the receptor data in the prediction. In fact, incorporating the receptor status in the BOADICEA model had almost no effect on its predictive accuracy for the whole group or any of the subgroups.
All the models that predict separate BRCA1 and BRCA2 probabilities performed much better for BRCA1 than for BRCA2: AUC of 0.98 vs 0.69 for BOADICEA, 0.93 vs 0.64 for BRCAPRO, and 0.85 vs 0.60 for Penn II. A similar difference was observed in previous studies, but it was less pronounced than in our results 16,21,25,29 . This difference could explain the poorer model performance in some Asian cohorts, since BRCA2 mutations appear to be found more often than BRCA1 mutations in Asians, while the opposite holds in Whites 31-33 . Most models performed even more poorly for non-BRCA HR pathway genes, probably because these genes have lower penetrance than BRCA1/2. However, the BRCAPRO model gave a similar AUC for these HR genes (0.65) to that for BRCA2 (0.64).
A mutation risk threshold of 10% is often used to recommend genetic testing. The threshold could be set lower or higher depending on the resources available to the individual or to the healthcare system. The availability of new cancer treatment options targeting BRCA-mutated tumors, such as PARP inhibitors, could lower the threshold for testing. Moreover, the threshold should also depend on the model used to determine the risk. Our study showed that at the optimal points on the ROC curves, the cut-off carrier probability values were much higher for BRCAPRO (24.6%) than for BOADICEA (3.3%) (Fig. 1), consistent with the results shown in Table 2.
It is clear that most models rely heavily on family history of cancer. The models that performed better overall in the cohort (e.g. BOADICEA, BRCAPRO, Tyrer-Cuzick) used detailed pedigree data rather than the categorical yes/no or age cut-off clinical variables used in the other models, while the models that incorporated more personal cancer information (e.g. Myriad) performed better for affected women without family history of breast or ovarian cancer. However, family histories are often limited by many factors, including inaccurate or unavailable information, small families, scarcity of females in a family, premature death due to war or natural causes, and migration or separation within a family. In modern societies, extended family history will probably become more and more limited. With more genetic testing results available, new models may be developed using personal history together with clinical and genetic information of nuclear families.
There are several limitations in our study. First, despite the relatively large Chinese cohort, our sample size was still limited. A larger pool of at-risk individuals with genetic data available would make the model assessment more accurate, and new models could possibly be developed. Second, our cohort consisted of a fairly homogeneous Chinese population, so the results may not extend to other Asian ethnic groups. Third, the cohort included only women and a very small group of ovarian cancer patients, and therefore the results may not be applicable to men or to women with ovarian cancer.
In summary, the five mutation predictive models generally performed as well in this Chinese cohort as in western cohorts. The predictions were the most accurate for unaffected women with family history of breast or ovarian cancer using the BOADICEA, BRCAPRO and Tyrer-Cuzick models. The predictions were also fairly accurate for women with both personal and family history of breast or ovarian cancer, as well as for women with triple negative breast cancer. For breast or ovarian cancer patients with no family history, the predictions were quite unreliable. Of the two better-performing models, BOADICEA seemed to underestimate mutation risk while BRCAPRO seemed to overestimate it; we therefore recommend setting a higher risk threshold for genetic testing when using BRCAPRO (e.g. 20%) and a lower risk threshold when using BOADICEA (e.g. 5%).
Methods
Study cohort and data collection.
The study was conducted in accordance with the Declaration of Helsinki, and the study protocol was approved by the Institutional Review Board Committee at Koo Foundation Sun Yat-Sen Cancer Center (case No. 20141222A). Written informed consent was obtained from each study participant. Eligible individuals were enrolled between July 2015 and April 2017 at Koo Foundation Sun Yat-Sen Cancer Center (KF-SYSCC) to participate in germline testing of a panel of cancer susceptibility genes. Participants had to fulfil at least one of the following eligibility criteria: family history of breast or ovarian cancer at any age (2 or more individuals on the same lineage of the family), personal history of breast cancer or ovarian cancer with age of diagnosis less than or equal to 40, bilateral breast cancer, triple negative breast cancer, or both breast and ovarian cancer in the same individual. None of the participants had known mutation status in any cancer susceptibility genes prior to enrolment. Through participant surveys, detailed personal and family history regarding all cancers was collected, and pedigrees were extended to third-degree relatives as far as possible. The data for each pedigree were manually checked and formatted into a relational database for analysis. For the analyses in this study, male probands were excluded. ER, PR, and HER2 immunohistochemical (IHC) stains were available for a majority of invasive breast tumors in this cohort. ER(+) or PR(+) was defined as 1+, 2+ or 3+ on IHC stain. HER2(+) was defined as HER2 overexpression (3+ on IHC stain, or positive on dual in situ hybridization or fluorescence in situ hybridization).
Genetic testing for BRCA1/2 and other cancer susceptibility genes.
Exonic and exon-flanking regions of twenty cancer susceptibility genes were sequenced on a next generation sequencing platform and variants were identified using standard protocols, details of which have been published previously 34 . only protein-truncating variants including nonsense, frameshift, and splice-site mutations as pathogenic mutations. In addition to BRCA1 and BRCA2, seven genes including ATM, BARD1, BRIP1, PALB2, RAD50, RAD51C and RAD51D were denoted as homologous recombination pathway genes for predictive model analyses.
Calculation of germline mutation carrier probabilities.
Relevant proband and pedigree information was formatted and stored in a local database, and input data for running the models were generated in an automated fashion by in-house scripts (Perl, PHP, R and shell scripts). Five mutation predictive models, BOADICEA, BRCAPRO, Myriad, Penn II, and Tyrer-Cuzick, were used to estimate the carrier probability of BRCA gene mutations. The prediction results were filtered and stored in the local database for statistical analyses.
The BOADICEA and BRCAPRO models compute the individual BRCA mutation carrier probability based on individual information on the proband and each of her relatives, including current age or age at death, incidence of breast, ovarian and other cancers, age at diagnosis and relationship to the proband. For BOADICEA, the predicted probability of carrying either a BRCA1 or a BRCA2 mutation was generated using the BOADICEA web application v3 (https://pluto.srl.cam.ac.uk/cgi-bin/bd3/v3/bd.cgi). BOADICEA allows input of the breast tumor ER/PR/HER2 (receptor) data. For comparison among the models, receptor status was not included in the calculation. Separate analyses comparing the predictive accuracies of BOADICEA with and without inclusion of the receptor status were performed. For BRCAPRO, the BayesMendel R package version 2.1-3 (available at https://projects.iq.harvard.edu/bayesmendel/bayesmendel-r-package) was used. The Penn II and Myriad models take a summary of personal and family cancer history as input. The Penn II predictions were carried out through a web interface (available at https://pennmodel2.pmacs.upenn.edu/penn2/). The Myriad model considers only the combined probability of carrying a mutation in either BRCA1 or BRCA2; its predictions were obtained from a mapping table downloaded from the Myriad Genetics website (https://myriadgenetics.eu/healthcare-professional-treating-diseases/hereditary-cancer-testing/hereditary-breast-and-ovarian-cancer-hboc-syndrom/prevalence-tables/). The Tyrer-Cuzick model is applicable only to individuals who have no personal history of breast or ovarian cancer but have a family history of these cancers. The International Breast Cancer Intervention Study (IBIS) breast cancer risk evaluation tool (v8) was used to calculate the mutation carrier probabilities.
Statistical analysis.
Characteristics of the cohort were summarized using descriptive statistics stratified by subgroups.
To determine the performance of the predictive models, receiver operating characteristic (ROC) curves were constructed, and the areas under the curve (AUCs) and 95% confidence intervals (CIs) were calculated. For each model, the mutation carrier probability, sensitivity, and specificity at the optimal cut-off point, defined as the point on the curve closest to the upper left corner, were recorded. Subgroup analyses were done based on personal history and family history status. The applicable models were compared within the subgroups. Calibration was assessed with the expected/observed (E/O) ratio, which compares the number of mutation carriers expected from the predicted probabilities with the number actually observed, for the entire cohort and for each subgroup. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for each risk model at the 10% and 20% thresholds for mutation carrier probability. All ROC curves were plotted using SigmaPlot (Systat Software, Inc.) version 12.0.
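The two ROC-based quantities used above, the AUC and the cut-point closest to the upper left corner, can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the SigmaPlot procedure used in the study: the AUC is computed as the proportion of carrier/non-carrier pairs in which the carrier receives the higher predicted probability (ties count one half), the cut-point minimizes the Euclidean distance from (1 − specificity, sensitivity) to the corner (0, 1), a probability equal to the cut-off counts as a positive test, and confidence intervals are omitted. All function names and the toy data are hypothetical.

```python
import math


def roc_auc(scores, is_carrier):
    """AUC as the fraction of (carrier, non-carrier) pairs ranked correctly."""
    carriers = [s for s, c in zip(scores, is_carrier) if c]
    non_carriers = [s for s, c in zip(scores, is_carrier) if not c]
    if not carriers or not non_carriers:
        raise ValueError("need at least one carrier and one non-carrier")
    wins = 0.0
    for pos in carriers:
        for neg in non_carriers:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties contribute one half
    return wins / (len(carriers) * len(non_carriers))


def closest_to_corner_cutpoint(scores, is_carrier):
    """Return (cut-off, sensitivity, specificity) for the ROC point nearest
    the upper left corner; score >= cut-off is treated as test-positive."""
    n_pos = sum(1 for c in is_carrier if c)
    n_neg = len(is_carrier) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one carrier and one non-carrier")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, c in zip(scores, is_carrier) if c and s >= cutoff)
        tn = sum(1 for s, c in zip(scores, is_carrier) if not c and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        dist = math.hypot(1.0 - sens, 1.0 - spec)
        if best is None or dist < best[0]:
            best = (dist, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical toy data, not taken from the study.
toy_scores = [0.02, 0.08, 0.12, 0.25, 0.40, 0.05]
toy_carrier = [False, False, False, True, True, True]

print("AUC:", round(roc_auc(toy_scores, toy_carrier), 3))
print("cut-point:", closest_to_corner_cutpoint(toy_scores, toy_carrier))
```

Because the cut-point depends on the scale of each model's predicted probabilities, two models with similar AUCs can have very different optimal cut-offs, which is the pattern reported for BOADICEA and BRCAPRO.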