INTRODUCTION

Genetic testing for germline pathogenic variants in BRCA1 and BRCA2 is widely used in clinical practice to identify individuals at increased risk of breast, ovarian, and other cancers. One factor limiting the clinical utility of such testing is the prevalence of rare sequence variants of uncertain clinical relevance in these genes. These are primarily missense changes but also include in-frame deletions and insertions, and variants (both intronic and exonic) that may affect splicing efficiency.1,2,3,4

Initial studies aimed at establishing the clinical relevance of variants of unknown significance (VUS) in BRCA1 and BRCA2 used clinical genetic testing results from ~60,000 individuals.2 Likelihood ratios for individual variants were derived using a logistic regression equation based on the characteristics of the personal and family histories of the individuals with known pathogenic variants (largely loss of function [LOF]) compared with individuals who did not have BRCA1 or BRCA2 pathogenic variants or VUS. In that analysis, it was estimated that, overall, 20% of the VUS analyzed were pathogenic; in subsequent analyses,3,5 variants with certain in silico characteristics based on the A-GVGD prediction model3 (A-GVGD score C65) had an estimated proportion of pathogenic variants of 0.81, while other bioinformatics categories were predicted to contain a very small fraction of pathogenic variants. The values derived from these analyses of in silico prediction are posted on the website6 and are used as prior probabilities in multifactorial models for classification of VUS by the BRCA1/2 expert panel (ENIGMA) and other groups.7

Because of the efforts of commercial labs, independent research efforts, and in particular the work of the ENIGMA consortium8,9 over the ten years following the initial study, many VUS have been reclassified as pathogenic or benign with respect to cancer risk. However, we note that due to the increased volume of genetic testing, the absolute numbers of individuals receiving an inconclusive BRCA1/BRCA2 test result are still quite large, with roughly equal numbers of VUS (7360) and pathogenic/likely pathogenic variants (6968) in BRCA1/BRCA2 listed in the National Institutes of Health (NIH) repository ClinVar (https://www.ncbi.nlm.nih.gov/clinvar; accessed 17 May 2019).10

Moreover, clinical characteristics of patients undergoing BRCA1/BRCA2 testing have also changed over time. For example, BRCA1/BRCA2 testing criteria have broadened and now include additional indications such as personal history of pancreatic cancer and unaffected women with only minimal family history. In addition, individuals with constellations of cancer beyond breast and ovarian frequently undergo BRCA1/BRCA2 testing as part of multigene panel testing (MGPT).11

Considering these changes in genetic testing practices, we re-examined the utility of a clinical history–based approach for classification of BRCA1/BRCA2 VUS in a cohort of >170,000 individuals undergoing hereditary cancer MGPT to develop prediction models that in turn allow inferences to be made about individual variants that can be included in multifactorial classification models. The results can then be used to provide updated calibration of in silico predictors that are often used as prior probabilities in these models.7

MATERIALS AND METHODS

Source of data and variants analyzed

The data analyzed in this report came from the large database of patients who underwent MGPT at Ambry Genetics (Aliso Viejo, CA) from March 2012 to December 2016. MGPT included comprehensive analysis of 5–49 genes, depending on the panel ordered.

Clinical history information was obtained from test requisition forms (available at https://www.ambrygen.com/file/material/view/984/Cancer_Comp_TRF_0918_final.pdf), and entered into Ambry’s database in a standardized manner by a team of trained clinical data curators. Ambry has previously shown that information reported on the TRF is accurate when compared with clinical records and pedigree drawings.12 This study has been exempted from review by the Western Institutional Review Board.

A total of 154,653 individuals were tested for BRCA1 and BRCA2. Of these, 10,534 were excluded due to insufficient information regarding personal or family history of cancer, leaving 144,119 potential subjects. Lastly, 5777 were excluded from the analysis because they had a pathogenic variant or a variant classified as likely pathogenic (VLP) in another gene (ATM, PALB2, BARD1, CDH1, CHEK2, MLH1, MSH2, MSH6, NBN, NF1, PTEN, RAD51C, RAD51D, TP53). After these exclusions, 138,342 individuals tested for BRCA1 and BRCA2 were eligible for analysis. These individuals either had a pathogenic variant, VLP, or VUS in BRCA1 or BRCA2, or were MGPT negative; that is, they had no pathogenic variant (or VLP) identified in any of the other breast cancer susceptibility genes tested.

BRCA prediction models

To construct a predictive model to be applied to VUS/VLP, a logistic regression model was used comparing the clinical histories (CH) of individuals with known pathogenic variants versus those with no reportable variant in these genes. We have included variants classified as VLP here as the analyses could in many cases generate evidence to move them into the pathogenic category. For the analysis of BRCA1, individuals who had pathogenic variants, VLP or VUS in BRCA2 were excluded, leaving 131,352 included in the analysis of that gene. Conversely, probands with pathogenic variants, VLP, or VUS in BRCA1 were excluded for the BRCA2 analysis leaving 131,465 eligible. In total there were 14 parameters included for personal cancer history and 15 for family history, making 29 total clinical history parameters to be estimated in the logistic regressions. All logistic regressions included personal and family histories of the following cancers: breast cancer, ovarian cancer, pancreatic cancer, and prostate cancer. For breast cancer, age at diagnosis was categorized as <50 years or ≥50 years, and for ovarian and prostate cancers, ages were categorized as <60 years or ≥60 years. We also included ductal carcinoma in situ (DCIS), bilateral breast cancer, and male breast cancer as separate predictors. No age considerations were applied for pancreatic cancer. For personal history of breast cancer, additional subcategories were incorporated within each age group according to triple-negative (TN) status: TN, not TN (e.g., ER+), or status unknown. For family history, cancer counts were restricted to first- and second-degree relatives (maternal and paternal combined), with the number of affected relatives categorized as 0, 1, 2, 3+ for breast cancer and 0, 1, 2+ for the other cancer types.

Since index cases of different ethnic backgrounds might be expected to present with different distributions/frequencies of variants and different distributions of personal and family histories of cancer,13,14,15,16 we performed separate logistic regression analyses for each of four racial/ethnic groups: (1) Caucasian plus mixed or unknown race/ethnicity, (2) African American, (3) Asian, and (4) Hispanic. The predicted probabilities rk of carrying a pathogenic variant for each tested individual were then derived using the predict option in Stata. We denote by r0 the corresponding probability under the null hypothesis that the variant is unrelated to family history, or equivalently the prior probability of a pathogenic variant in the tested population. This is estimated by the overall proportion of individuals who have a pathogenic variant in the given gene (rather than a normal sequence). For example, using the data in Table 1, for the European group for BRCA1, r0 = 1706/(1706 + 108,602) = 0.015.

Table 1 Number of individuals and variants included in the study by gene and classification

We have shown previously that the required likelihood ratio (LR)

$$\frac{{L[CH|V\;is\;pathogenic]}}{{L[CH|V\;is\;neutral]}}$$

is given by:

$$\frac{{r_k\left( {1 - r_0} \right)}}{{\left( {1 - r_k} \right)r_0}}$$

That is, the LR is the ratio of the odds that an individual with the given clinical history is a carrier of a pathogenic variant to the corresponding odds under the null.2 For each variant, LRs were multiplied for each individual carrying that variant (potentially in different race/ethnicity groups) to arrive at a per-variant LR.
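The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study: the function names are hypothetical, and the predicted probabilities for the three carriers are invented values standing in for the output of the fitted logistic regression. Only the counts used for r0 (1706 and 108,602) come from the text.

```python
import math

def prior_probability(n_pathogenic, n_negative):
    """r_0: overall proportion of tested individuals with a pathogenic variant."""
    return n_pathogenic / (n_pathogenic + n_negative)

def likelihood_ratio(r_k, r_0):
    """Odds of carrying a pathogenic variant given the clinical history,
    divided by the corresponding odds under the null."""
    return (r_k * (1.0 - r_0)) / ((1.0 - r_k) * r_0)

def variant_log10_lr(carriers):
    """Combine individuals carrying the same variant.

    carriers: list of (r_k, r_0) pairs, one per individual; r_0 may differ
    between individuals drawn from different race/ethnicity groups.
    Summing log10 LRs is equivalent to multiplying the LRs.
    """
    return sum(math.log10(likelihood_ratio(r_k, r_0)) for r_k, r_0 in carriers)

# r_0 for BRCA1 in the European group, using the counts quoted in the text.
r_0 = prior_probability(1706, 108602)

# Hypothetical predicted probabilities for three carriers of one variant.
hypothetical_carriers = [(0.10, r_0), (0.004, r_0), (0.05, r_0)]
log10_lr = variant_log10_lr(hypothetical_carriers)
combined_lr = 10 ** log10_lr
```

An individual whose predicted probability equals r0 contributes an LR of 1 and therefore leaves the per-variant LR unchanged. Working on the log scale avoids numerical underflow when many individuals carry the same variant.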

Heterogeneity analysis

To estimate the proportion of variants in the data set that are likely to be pathogenic as a function of bioinformatically predicted classifications, we performed a heterogeneity analysis analogous to that used previously in linkage analysis. Specifically, the required likelihood for a given class C is given by:

$$\mathop {\prod }\limits_{i = 1}^{N_C} [\alpha LR_i + (1 - \alpha )]$$

where α is the proportion of variants in the class that are pathogenic, NC is the number of variants in the class, and LRi is the combined likelihood ratio for all probands carrying the ith variant.

This likelihood (in practice, the log-likelihood) is then maximized over α, and approximate 95% confidence intervals (CIs) can be constructed by finding those values of α where −2ln(likelihood) differs by 3.84 from the −2ln(likelihood) at the maximum value. Hypotheses regarding differences in values of α as a function of partitions of the total variant space are tested using likelihood ratio tests.
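A minimal sketch of this maximization is given below. It assumes a simple grid search over α, which is our choice for illustration; the text does not specify the optimizer used in the study. The function names and the example LRs are hypothetical.

```python
import math

def log_likelihood(alpha, lrs):
    """Log of prod_i [alpha * LR_i + (1 - alpha)] for one class of variants.

    Assumes every LR_i is strictly positive.
    """
    return sum(math.log(alpha * lr + (1.0 - alpha)) for lr in lrs)

def estimate_alpha(lrs, steps=1000, threshold=3.84):
    """Grid-search estimate of alpha with an approximate 95% CI.

    The CI contains every grid value of alpha whose -2 ln(likelihood)
    lies within `threshold` of the value at the maximum.
    """
    grid = [i / steps for i in range(steps + 1)]
    loglik = [log_likelihood(a, lrs) for a in grid]
    best = max(range(len(grid)), key=lambda i: loglik[i])
    inside = [a for a, ll in zip(grid, loglik)
              if 2.0 * (loglik[best] - ll) <= threshold]
    return grid[best], min(inside), max(inside)

# Hypothetical per-variant LRs for a small class of five variants.
example_lrs = [50.0, 0.1, 0.2, 30.0, 0.05]
alpha_hat, ci_lower, ci_upper = estimate_alpha(example_lrs)
```

Because the log-likelihood is a sum of logarithms of functions that are linear in α, it is concave, so the set of α values within the threshold forms a single interval. When the estimate falls on the boundary (0 or 1), the interval is one-sided.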

RESULTS

A total of 2383 distinct VUS/VLP in 4644 tested probands were identified through MGPT and were analyzed using the methods described above. Table 1 displays the BRCA1 and BRCA2 variant status of these individuals by racial/ethnic group. Tables 2 and 3 show the estimated odds ratios (OR) and corresponding 95% confidence intervals for the personal and family history factors, respectively, in the model as predictors of BRCA1 and BRCA2 pathogenic variant status across the four race/ethnicity groups. The numbers of tested individuals in each personal and family history category for each race/ethnicity group are provided in Supplementary Tables 1 and 2. For all racial and ethnic groups, the area under the receiver operating characteristic curve (AUC) for the predicted pathogenic variant status was higher for BRCA1 (0.79–0.83) than for BRCA2 (0.66–0.70), due primarily to the higher predictive power of ovarian cancer and the association with triple-negative breast cancer in BRCA1. For BRCA1, the African American sample had the highest AUC (0.83), which was significantly higher than that for Caucasians. This is likely due to the higher prevalence of triple-negative breast cancer cases among African Americans. For BRCA2, the AUC was highest for the Asian sample (0.70), though not significantly different from the other racial/ethnic groups.

Table 2 Odds ratios and corresponding 95% confidence intervals for personal history factors as predictors of BRCA1 and BRCA2 pathogenic variant carrier status

Table 3 Odds ratios and corresponding 95% confidence intervals for family history predictors of BRCA1 and BRCA2 pathogenic variant carrier status

Variant classification

The calculated log-likelihood ratio scores and number of probands for each variant observed are shown in Supplementary Tables 3 (BRCA1) and 4 (BRCA2).

Assuming the previous prior probabilities based on the data in Tavtigian et al.,3 LRs from the present study provide evidence to support classification of 26 VUS with prior probabilities of pathogenicity between 0.29 and 0.81. Of these, 19 could be classified as benign or likely benign and 7 as pathogenic or likely pathogenic. In addition, LRs for 15 variants with an assumed prior probability of pathogenicity of 0.03 (variants in key functional domains that have A-GVGD scores of C0, indicative of neutrality) provide evidence to support a benign classification, as they were associated with odds of greater than 10:1 against pathogenicity. It should be emphasized here and in subsequent discussion that for many of these variants there is likely other evidence that is not considered in this paper; true clinical classifications should take into account all other available data, such as the multifactorial summary for a large number of variants in Parsons et al.7

VUS with clinical history LRs indicative of high probability of pathogenicity

There were 22 variants labeled as VUS that had odds of >10:1 in favor of pathogenicity; of these, seven were missense variants not located in functional domains and with no bioinformatics evidence of interfering with normal splicing. Among the other 15, BRCA1 EX16–18dup was notable with odds in favor of pathogenicity of 1895:1 and thus should be reclassified as pathogenic if considered appropriate in the context of other supporting evidence. Another variant, BRCA1 c.5332G>A; p.D1778N, occurring at the last nucleotide of exon 21, had a moderate probability of damage to the wild-type splice donor, and odds in favor of pathogenicity of 884:1 based on clinical histories of eight individuals with this variant. If we assume the previously estimated prior probability of 0.34,5 this variant has a posterior probability of pathogenicity of 0.997 and can also be considered pathogenic based on multifactorial likelihood analysis, assuming no conflicting data from other sources. BRCA2 c.383A>G, for which bioinformatic analysis indicates a moderate probability of creating a de novo splice donor with an assigned prior probability of 0.3 (http://priors.hci.utah.edu/PRIORS),6 had odds of 18:1 in favor of pathogenicity based on the clinical histories of three individuals with this variant.
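The step from a prior probability and a clinical history LR to a posterior probability is the usual Bayes update on the odds scale. The short sketch below illustrates that arithmetic; the function name is hypothetical, and the inputs are the prior probabilities and odds quoted above.

```python
def posterior_probability(prior, lr):
    """Combine a prior probability of pathogenicity with a likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# BRCA1 c.5332G>A: prior probability 0.34, odds of 884:1 in favor of pathogenicity.
p_c5332 = posterior_probability(0.34, 884.0)

# BRCA2 c.383A>G: prior probability 0.3, odds of 18:1 in favor of pathogenicity.
p_c383 = posterior_probability(0.30, 18.0)
```

For BRCA1 c.5332G>A this evaluates to approximately 0.998, in line with the posterior probability of 0.997 reported above. The value computed for BRCA2 c.383A>G is shown only to illustrate the arithmetic and is not a classification; as noted earlier, clinical classification should take all other available evidence into account.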

Analysis of variants classified as likely pathogenic (VLP)

We observed four variants (two in each gene) that were classified previously as likely pathogenic, but had odds of greater than 10:1 against pathogenicity in this analysis. In BRCA1, a frameshift variant, c.5578dupC, had odds against pathogenicity of ~11:1 based on personal and family histories of five index cases with this variant. This likely indicates that this variant results in a stable, almost full-length protein and does not undergo nonsense-mediated decay and that the truncated residues do not have any functional importance. In BRCA2, a splice variant c.517-2A>G had odds of 22:1 against pathogenicity based on six families. This variant was shown to result in deletion of BRCA2 exon 7 and to lead to a frameshift.17 Lastly, analysis of BRCA2 c.7878G>C;p.W2626C resulted in odds of 12:1 against pathogenicity based on 12 families. This variant was also evaluated in the Pruss et al.18 analysis and it was concluded that this was likely a hypomorphic allele, a finding consistent with our results.

Heterogeneity analysis

Based on the family history log-likelihood ratios for each variant, we estimated the proportion of pathogenic variants within subgroups of variants using the admixture model described in “Materials and Methods”. We first divided the missense variants into two groups based on their presence/absence in one of the known key functional domains: BRCA1 nucleotides 1–294 and nucleotides 4987–5577 encompassing the start codon, the RING domain, and the BRCT repeats; and the DNA binding domain of BRCA2 nucleotides 7669–9558. Then, for those variants in one of these key domains, we grouped them according to Align GVGD.3 Variants more likely to affect function by splicing than by an altered protein were categorized by their likelihood to create a de novo donor or damage the wild type as predicted by MaxENT scan.5 These groupings are reported on the website (http://priors.hci.utah.edu/PRIORS).6 Table 4 shows the results of these analyses for each of these subgroups. For each, we performed the analysis in two ways: first, including all variants that met the specified prior probability criterion irrespective of the Ambry Genetics classification (reflecting more closely the Easton et al.2 analysis); and then, removing all variants previously classified as pathogenic from these analyses. As shown in Table 4, 23% of VUS/VLP variants that were located in a key domain but were in A-GVGD class C0 were estimated to be pathogenic, compared with the previous estimate of 1% for this group. For variants within a key domain with an A-GVGD score of C65, for which the prior probability was estimated to be 0.81 in the 2007 study,2 we found the estimate to be numerically lower in the current analysis. The estimate was 0.77 when Ambry pathogenic variants (most classified after 2007, so they would have been considered VUS in the original 2007 analysis) were included versus 0.60 when Ambry pathogenic variants were excluded. However, the newly estimated proportions were not significantly different (upper 95% confidence limits of 0.92 and 0.80, respectively).

Table 4 Heterogeneity analysis by bioinformatic groups

Sensitivity analyses

We had included 16,500 subjects with unknown or mixed ethnicity within the large European/Caucasian group. To ensure that this did not bias our analyses, we performed a sensitivity analysis excluding patients with unknown/mixed ethnicity. All previously identified factors were significant in this analysis and ORs and model fits were similar (AUC 0.795 vs. 0.786).

In addition, we performed the analyses excluding individuals from the reference group who had been previously tested for BRCA1/2 and found negative, to avoid any biases with regard to family history for qualifying for previous BRCA1/2 testing. Again, very little difference was observed in the parameter estimates and model performance (AUC 0.795 vs. 0.786).

DISCUSSION

We have used a large clinical MGPT data set to inform classification for VUS/VLP in BRCA1 and BRCA2 based on analysis of personal and family history of >135,000 tested individuals. Of the 2383 such variants, 45 (5 of which were based on CH data from at least 5 probands) had LRs in favor of pathogenicity of >10.0 and 150 (57 of which had CH data from at least 5 probands) had odds against pathogenicity of >10.0. Integration of these LRs with in silico and other existing genetic and functional data should allow the clinical classification of significant numbers of VUS and VLP thus reducing uncertainty, improving the utility of genetic testing, and providing useful information to the individuals who carry these variants.

Racial and ethnic differences in models

The large sample size of tested individuals here allowed us to fit models separately for each of four groups based on race/ethnicity. Tables 2 and 3 show some differences in strength of predictor variables between ethnic groups; for example, breast or ovarian cancer at older ages was a weaker predictor of BRCA1 status in African American individuals than in the larger Caucasian set. Interestingly, we observed that prostate cancer, particularly when diagnosed over age 60, was a significant predictor of BRCA2 pathogenic variant status in men, with relatively strong effects of both personal and family history of prostate cancer in Hispanics and African Americans. This could be of interest in setting testing criteria in these populations and may be due to a higher prevalence of aggressive prostate cancer and/or to higher Gleason scores, which are known to be associated with germline BRCA2 pathogenic variants.19,20

Differences in model from the 2007 paper

The current study differs from the earlier similar study2 in several respects. First, because testing criteria were stricter in 2007 due to the cost of testing, the frequency of observed pathogenic variants in the historic cohort was approximately double that observed here, reflecting ascertainment bias; however, the larger sample size in the present study results in similar numbers of variants in the logistic regressions. Secondly, in the current study we were able to exclude individuals with pathogenic variants in several other genes from the non-BRCA group because of the current availability of multigene testing. Another important difference is that in the analyses presented here we included the triple-negative status of the breast cancer in the index case, which is particularly important for BRCA1 prediction as demonstrated in Table 2. Lastly, we expanded our analysis to include two other cancer types, prostate and pancreatic, that were not included as predictors in the 2007 model. Although many of the factors related to breast and ovarian cancer were significant predictors of pathogenicity for both genes and across all races and ethnicities, there were a few differences, particularly with regard to pancreatic cancer and male breast cancer, which were shown to be important predictors for BRCA2, but not BRCA1. Interestingly, DCIS alone in the index case and having first-degree relatives with breast cancer over age 50 were predictors of the absence of a BRCA1 pathogenic variant, but predictors of the presence of a BRCA2 pathogenic variant.

Heterogeneity analyses

The analyses of groups of variants by bioinformatics predictions shown in Table 4 illustrate a number of important findings. First, our analysis confirms previous indications that missense variants in regions of the gene that are not in functionally important domains of the protein are very unlikely to be pathogenic. In contrast to a previous study in which only 1% of variants predicted to be neutral (A-GVGD class C0) that were in recognized functional domains were estimated to be pathogenic, here we found that a significant fraction (23%, 95% CI 12–37%) of such variants were estimated to be pathogenic. Variants in this group with high odds of pathogenicity should be examined in detail, including detailed functional assays. For example, BRCA1 c.5527G>C; p.Ala1843Pro, classified by Ambry as VLP, had odds of 21:1 in favor of pathogenicity (though based on only a single proband) and was classified as loss of function in both the assays of Woods et al.21 and Findlay et al.22 However in the in silico analysis, this variant was predicted to be neutral, with considerable variation observed in the multiple species sequence alignment with the variant amino acid proline observed. Lastly, the VLP variant BRCA2 c.8188G>C;p.Ala2730Pro also scored as A-GVGD class C0 in the priors database but had odds of 18:1 in favor of pathogenicity based on five families; this variant displayed impaired homology directed repair (HDR) function in Hart et al.23

The estimates for the variants with predicted aberrant splicing as the more likely driver of pathogenicity indicated that, for variants predicted to moderately damage the wild-type donor/acceptor site, nearly 80% could be predicted to be pathogenic, which was similar to the estimate for variants expected to severely damage the donor/acceptor. That only 80–90% of the variants that essentially impact the consensus splice site were as a group expected to be pathogenic is surprising, and may be due to the presence of alternative transcripts or in-frame exons that do not contain any important functional domains, both of which are known to be present in these genes (e.g., BRCA1 c.594-2A>C24). Conversely, few if any of the variants predicted to create de novo donor sites were likely to be pathogenic, indicating that this is an unlikely mechanism of pathogenicity in these genes, or that the splice prediction algorithms currently do not use information on additional splice motifs or other context to differentiate which de novo donor sites are more likely to be used. It is also worth mentioning that the estimates in Vallee et al.5 were based on only eight variants and the confidence intervals were extremely wide; the present study had 25 variants in this category and the upper confidence limit was 0.32.

It would be of interest to compare our results here to other in silico predictors that have shown similar/higher correlations with pathogenicity. In addition, further discussion is needed regarding the calibration of the prior probabilities as delineated in the online database (http://priors.hci.utah.edu/PRIORS)6 as this is widely used and is integrated into a number of important resources such as the BRCA Exchange database (www.brcaexchange.org).25

Caveats and limitations

At the individual level many variants (65%) were observed in only a single individual so that for these variants the results should not be overinterpreted. Only 177 variants (7.4%) were observed in 5 or more individual probands (Supplementary Tables 3 and 4). However, the general results for groups of variants should be sufficiently precise to draw valid conclusions, and to reclassify a substantial number of individual variants from VUS to the benign or likely benign categories, given other information.

Conclusions

Based on the results presented here, it seems clear that personal and family history analyses in large clinical data sets are useful for providing statistical evidence about pathogenicity of VUS that then can be combined with other lines of evidence (e.g., cosegregation) in multifactorial models to derive clinically useful classifications for BRCA1/2 variants.

The LRs calculated from this analysis can be utilized in a variant assessment scoring system such as the standards and guidelines recommended by the American College of Medical Genetics and Genomics (ACMG).24,25,26,27 For example, our analysis suggests that the location of a variant outside of known functional domains, together with in silico predictions indicating that it is not likely to affect splicing, could be considered strong evidence (BS4) that the variant is benign based on the ACMG schema.28 LRs could also be more generally applied as “other data” to support pathogenic versus benign classification. For example, a variant with odds of >10:1 for or against pathogenicity based on five or more informative families could be considered as supporting evidence of pathogenicity or benign impact, respectively. Similar to cosegregation data, LRs could be used as even stronger evidence with increasing data such as additional families or increasing odds.
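The example thresholds in the preceding paragraph (odds of >10:1 in either direction, based on five or more informative families) can be written as a simple decision rule. The sketch below is purely illustrative: the function name and the returned labels are hypothetical and are not ACMG terminology, and the rule is one possible reading of the example rather than a validated scoring scheme. For the example calls, the number of probands or individuals quoted in the text is treated as the number of informative families, which is also an assumption.

```python
def clinical_history_evidence(lr, n_families, min_odds=10.0, min_families=5):
    """Map a per-variant clinical history LR to an illustrative evidence label.

    lr: odds in favor of pathogenicity (values below 1 are odds against).
    n_families: number of informative families contributing to the LR.
    """
    if n_families < min_families:
        return "insufficient data"
    if lr > min_odds:
        return "supporting pathogenic"
    if lr < 1.0 / min_odds:
        return "supporting benign"
    return "uninformative"

# Odds and counts quoted in the Results, used here only as example inputs.
label_c5332 = clinical_history_evidence(884.0, 8)       # BRCA1 c.5332G>A
label_c7878 = clinical_history_evidence(1.0 / 12.0, 12)  # BRCA2 c.7878G>C
label_c5527 = clinical_history_evidence(21.0, 1)         # BRCA1 c.5527G>C
```

Under this rule the first variant is labeled "supporting pathogenic", the second "supporting benign", and the third "insufficient data" because it was observed in a single proband. Stronger evidence categories for larger odds or more families, as suggested in the text, would require additional thresholds that are not specified here.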

The results of this study bode well for applying this approach to other data sets from large clinical testing centers, and potentially from large population-based sequencing studies, provided family history data are sufficiently detailed. Moreover, this method should work for other cancer susceptibility genes for which the penetrance is high enough (e.g., TP53, PALB2) and/or for which rare cancers are associated with pathogenic variants in the given gene, such that personal/family history is predictive of carrier status. Beyond the application to cancer susceptibility genes, the approach taken here should work when (1) phenotypic features are sufficiently predictive of individuals/families segregating a pathogenic variant in the gene of interest (e.g., AUC>0.65), (2) the sample size and frequency of pathogenic variants are sufficiently high that the number of index cases with pathogenic variants provides statistical power, and (3) relevant personal and family history data can be accurately and systematically collected.

The evidence presented in this paper should be integrated into ongoing efforts to provide large-scale multifactorial classification7 as well as translated directly into components of qualitative classifications such as the ACMG criteria used by many clinical testing laboratories,26,28,29 and further should be integrated into public resources displaying variant data (BRCA Challenge). These efforts will reduce the prevalence of VUS classifications that are so problematic from both provider and patient perspectives.