Population-based estimates of breast cancer risk for carriers of pathogenic variants identified by gene-panel testing

Population-based estimates of breast cancer risk for carriers of pathogenic variants identified by gene-panel testing are urgently required. Most prior research has been based on women selected for high-risk features, and more data are needed to make inference about breast cancer risk for women unselected for family history, an important consideration for population screening. We tested 1464 women diagnosed with breast cancer and 862 age-matched controls participating in the Australian Breast Cancer Family Study (ABCFS), and 6549 healthy, older Australian women enrolled in the ASPirin in Reducing Events in the Elderly (ASPREE) study, for rare germline variants using a 24-gene panel. Odds ratios (ORs) were estimated using unconditional logistic regression adjusted for age and other potential confounders. We identified pathogenic variants in 11.1% of the ABCFS cases, 3.7% of the ABCFS controls and 2.2% of the ASPREE (control) participants. The estimated breast cancer OR [95% confidence interval] was 5.3 [2.1–16.2] for BRCA1, 4.0 [1.9–9.1] for BRCA2, 3.4 [1.4–8.4] for ATM and 4.3 [1.0–17.0] for PALB2. Our findings provide a population-based perspective on gene-panel testing for breast cancer predisposition and opportunities to improve predictors for identifying women who carry pathogenic variants in breast cancer predisposition genes.


INTRODUCTION
Data from breast cancer predisposition gene panel testing are accumulating rapidly as testing becomes more affordable and more accessible in different settings, including clinical care. Large studies based in clinical and commercial testing laboratories have demonstrated that gene panel testing has increased the number of clinically actionable variants identified in women undergoing testing, compared with previous testing that included only BRCA1 and BRCA2. This increase in actionable findings is in the order of 5-10%, depending on the setting, the inclusion criteria for actionable variants and the study design 1-5. Most of this work has been based on women selected for high-risk features, such as personal or family history of breast cancer, who underwent gene panel testing for cancer susceptibility at commercial laboratories [6][7][8]. Far fewer data are available to make inference about breast cancer risk for women unselected for family history, which is important to consider for population screening of affected and unaffected women. The value of combining population-based case-control studies with gene panel testing has recently been illustrated by Hu et al. 9, who reported the outcome of a US-based study (CARRIERS consortium) involving over 32,000 affected and 32,000 unaffected women, and by Dorling et al. 10, who reported the outcome of an international study (BRIDGES) involving 60,000 women affected and over 53,000 women unaffected by breast cancer. These studies provided improved estimates of the prevalence and the magnitude of breast cancer risk associated with pathogenic variants in known breast cancer predisposition genes to guide genetic counselling.
Several studies have only included women affected by breast cancer (case only) and have reported variant prevalence [1][2][3][4] . Kurian et al. 11 linked cancer registries from Georgia and California (USA) to the gene panel testing outcomes from four key clinical testing laboratories. Their study linked 24.1% of the 77,085 women with breast cancer to genetic testing results and reported that panel testing increased the frequency of actionable genetic findings by 1.5%. Several large studies have reported estimates of breast cancer risk associated with carrying a rare germline variant in a gene included in these gene panel tests [7][8][9][10][12][13][14][15][16] .
Here, we report the prevalence and breast cancer risk estimates associated with pathogenic rare variants identified in breast cancer predisposition gene panel tests, conducted in an Australian population-based case-control study of breast cancer (with an emphasis on early age at disease onset), involving both (i) age-matched population-based controls and (ii) a healthy older group of Australian women as controls.

RESULTS
Table 1 and Fig. 1 give descriptive statistics for the study subjects. Regarding the six potential confounders that we used as adjustment variables in our main analyses, the ABCFS cases and controls were very similar to each other, and both were similar to the ASPREE controls except that the ASPREE controls were older, as expected due to study design differences.

Gene panel testing
There were 162 (11%) ABCFS cases with a pathogenic variant, compared with 32 (4%) of the ABCFS controls and 145 (2%) of the ASPREE controls (Table 2). Further details of the pathogenic variants detected are provided in Supplementary Tables 1-3 for the ABCFS cases, ABCFS controls and ASPREE participants, respectively. The numbers of carriers of pathogenic variants in genes other than BRCA1 or BRCA2 in the ABCFS cases, ABCFS controls and ASPREE controls were 73 (5%), 22 (3%) and 128 (2%), respectively. Carriers of pathogenic variants in BRCA1 or BRCA2 were also more likely to have a family history of breast cancer than carriers of pathogenic variants in the other genes (p < 0.0001).

Statistical analyses
Pathogenic variants in BRCA1 were strongly associated with an increased risk of ER-negative breast cancer (Supplementary Table 5). There was also weak evidence that pathogenic variants in CHEK2 were associated with risk of ER-negative breast cancer, though only after age adjustment. Pathogenic variants in ATM, BRCA1, BRCA2 and PALB2 were all associated with the risk of ER-positive breast cancer (Supplementary Table 6). Carriers of pathogenic variants in BRCA1 were less likely than non-carriers to have ER-positive breast cancer, presumably as a consequence of the strong association between pathogenic variants in BRCA1 and ER-negative breast cancer, since ER-positive and ER-negative breast cancer were treated as separate diseases for these analyses.
A sensitivity analysis showed that excluding the ASPREE controls gave adjusted OR estimates broadly similar to those of the main analyses, though with wider confidence intervals (Supplementary Table 7), validating our adjustment for age. Another sensitivity analysis showed that our results were almost unaffected by the exclusion of subjects with pathogenic variants in two or more genes (Supplementary Table 8).

DISCUSSION
The OR for breast cancer risk associated with a pathogenic variant in BRCA1 and BRCA2 in this study is consistent with other estimates 8 . These estimates are lower than reports that involve cases selected via criteria targeting breast and ovarian cancer syndromes and triple-negative breast cancer 12,13 . The breast cancer risk estimates reported here for BRCA1 and BRCA2 are less than the point estimates published by Dorling et al. and Hu et al. but are not statistically significantly different 9,10 .
Consistent with many other studies we found that, after pathogenic variants in BRCA1 or BRCA2, pathogenic variants in CHEK2 were the most frequently identified. The prevalence of CHEK2 c.1100delC in some populations makes it possible for analyses to consider the risk associated with this variant individually. Although there is some evidence that this risk may be higher than that associated with all other CHEK2 pathogenic variants, our estimates did not reach statistical significance. For  [17][18][19][20][21][22] ) and could benefit from specific screening modalities such as magnetic resonance imaging 23 .
We identified 17 carriers of pathogenic variants in ATM, only one of which was ATM c.7271T > G, a pathogenic variant that is well described in the Australian population and has an established association with a substantially increased risk of breast cancer [24][25][26] . Breast cancer risk estimates for women carrying other pathogenic variants in ATM have been consistently reported to be in the order of 2-3-fold [8][9][10][12][13][14]16 .
The literature has consistently reported a prevalence of germline pathogenic variants in genes other than BRCA1 and BRCA2 of ~4% when affected women attending high-risk clinics are the study subjects and gene panel tests are applied 1,2. The prevalence of pathogenic variants in genes other than BRCA1 or BRCA2 is also similar in reports from clinical series of affected women (unselected for family history) 4. Our findings are consistent with previous work in this setting and illustrate that, in contrast to pathogenic variants in BRCA1 and BRCA2, which are less prevalent in women diagnosed at older ages, the prevalence of pathogenic variants in other breast cancer genes is independent of age at diagnosis 4,9. In our Australian population-based case subjects, the frequency of pathogenic variants in genes other than BRCA1 or BRCA2 was 5%, as high as that reported from groups of women from high-risk clinical settings. This frequency may be surprising given that the population-based ABCFS recruited participants unselected for family history. However, other attributes of the ABCFS participants and the nature of the breast cancer risks associated with pathogenic variants in genes other than BRCA1 and BRCA2 may partially explain it. Predictive factors used to identify families appropriate to refer to high-risk genetics clinics, which include family history, have not been found to be as predictive for carrying pathogenic variants in genes other than BRCA1 and BRCA2 4. In addition, the average penetrance of pathogenic variants in these genes, although not precisely estimated beyond PALB2 27,28, is anticipated to be lower than for pathogenic variants in BRCA1 and BRCA2. The relative risks estimated here and elsewhere support this expectation. Therefore, by not selecting for affected women with a family history, yet having a focus on early onset disease, the ABCFS has a prevalence of pathogenic variants in other breast cancer susceptibility genes similar to that of highly selected women attending genetics services. Improved predictors for identifying women and families who carry pathogenic variants in breast cancer predisposition genes other than BRCA1 and BRCA2 are urgently needed.

Fig. 1 Box-and-whisker plots of potential confounders. Boxplots of potential confounders, used as adjustment variables in the analyses, for controls from the ASPirin in Reducing Events in the Elderly (ASPREE; in green) study, and for cases and controls from the Australian Breast Cancer Family Study (ABCFS; in purple and blue). Panels are for age (years), height (m), body mass index (kg/m²), number of children, education (years) and alcohol consumption (drinks per week), as indicated. For each boxplot, the horizontal lines are at the potential confounder's median (bold line), 25th and 75th percentiles (horizontal bounds of the box) and most extreme data points within a distance, from the box, of 1.5 times the interquartile range (shorter horizontal lines).

A significant challenge for studies in this setting, and for the assessment of very rare variants, is the availability of suitable datasets to use as reference controls. Few studies have had resources that provide population-specific reference control datasets of suitable size to incorporate into risk estimation methodology 29,30. Large publicly available databases have recently become available, including ExAC and gnomAD. Although they constitute invaluable resources as variant frequency databases and can be used to filter out "common" variants that are unlikely to be associated with increased risk of disease, these databases have important limitations when used as controls in case-control studies 7,12. These limitations include (i) potential technical artefacts resulting from the aggregation of data generated by different sequencing platforms, (ii) differences in the call sets due to the cases and controls not being jointly processed or annotated, (iii) absent or limited lifestyle and ancestry information for the control subjects and (iv) the absence of genetic information at the individual subject level, as only variant-based data are available. In the context of gene-burden analysis for rare conditions, these public databases can serve as reasonable control datasets with additional computational precautions to mitigate the abovementioned issues, as described by Guo et al. 31. For common diseases including cancer, they are still likely to contain affected individuals, even when excluding the TCGA sample set of gnomAD and ExAC. By using 862 population-based age-matched controls from the ABCFS and 6549 older healthy Australian women participating in ASPREE, our study overcame some of these limitations.
The inclusion of older controls from ASPREE in this study means that our unadjusted OR estimates are biased, so we adjusted all ORs in our main analyses for age and other potential confounders (though we note that our unadjusted p-values are valid, since ascertainment bias disappears under the null hypothesis in our case, and super-cases and super-controls can validly be used for gene discovery). A sensitivity analysis based only on the age-matched cases and controls from the ABCFS gave ORs broadly similar to those of the main analyses, validating our adjustment methods and our adjusted OR estimates.
In our study, although different sequencing platforms were used to generate the raw sequencing data, we aimed to reduce potential artefactual variant calls by utilising the processing pipeline that was the most appropriate for the sequencing technology used to produce the raw sequencing data for the case and the control subjects, then harmonising the variant calls by (i) restricting calls to regions that are equally able to be called across the three targeted regions and (ii) applying the same filtering and annotation pipelines. Our study used ClinVar to select pathogenic variants to include, as a group, in our association analysis. Although the level of confidence in ClinVar calls can be variable, as demonstrated by the star rating system or the "Conflicting evidence of pathogenicity" label, this approach allowed us to harmonise our pathogenicity calls with other studies, e.g., from Ambry 8 or Myriad 11, who regularly deposit their classification calls into ClinVar. For genes such as BRCA1, BRCA2, TP53 or the mismatch repair genes, we were also able to keep our pathogenicity assessment contemporary with regular updates from the genes' respective expert panels. A limitation of our approach is the potential to underestimate the contribution of missense variants, as they are very challenging to classify. Functional assays can provide important additional information for variant classification but are currently less well developed for breast cancer predisposition genes other than BRCA1 and BRCA2, although some recent and promising progress has been made for PALB2 32. A large number of unclassified variants (n = 924) were identified in the case subjects of the ABCFS in this study. It is likely that only a very small number of these variants will be classified as pathogenic in the future. Recent data from Dorling et al. 10 provided further evidence for breast cancer risk for missense variants in a number of breast cancer susceptibility genes, most notably CHEK2 10,33. Considerable effort has been invested by the ENIGMA consortium to understand the effect of deleterious variants in these genes and keep the variant classification up-to-date and publicly available.
Our data provide a population perspective to gene panel testing for breast cancer predisposition and contribute to international efforts to refine the breast cancer risk estimates for genetic variants identified in panel testing in women enriched for early age at breast cancer diagnosis and unselected for family history.

METHODS
Subjects
The present study includes cases and controls from the Australian Breast Cancer Family Study (ABCFS) and participants from the ASPirin in Reducing Events in the Elderly (ASPREE) study.
Aspects of the ABCFS relevant to this study are the population-based probands and corresponding data collected at baseline. Briefly, the ABCFS probands were either breast cancer cases (identified through population-complete cancer registries) or age-matched controls. All probands completed interviewer-administered risk factor questionnaires and verification of cancers was sought through pathologist reviews of cancer tissue, pathology reports, cancer registries, medical records, and death certificates 34,35.
The ASPREE study is a randomized, placebo-controlled trial of daily low-dose aspirin. We selected Australian participants aged 70 years or older at enrolment, without a previous diagnosis or current symptoms of atherothrombotic cardiovascular disease, physical disability, or dementia. Study design, recruitment, baseline characteristics and outcomes have been previously described 36,37. Our statistical analysis only used ASPREE data that were collected at baseline. ASPREE female participants who reported at baseline a personal history of breast cancer were excluded from the statistical analysis.
Written informed consent was obtained from all individual participants included in the study. This study was approved by the Human Research Ethics Committee of the University of Melbourne and Monash University.

Gene panel testing
We analysed rare genetic variants identified in the blood-derived germline DNA of 1451 women diagnosed with breast cancer and 857 age-matched controls participating in the ABCFS, and 13,197 individuals (6549 women) enrolled in the ASPREE trial. Only selected regions of PMS2 were targeted, as described previously 38. Gene-panel testing and alignment of raw DNA sequencing reads to the reference genome GRCh37 were performed as described in Nguyen-Dumont et al. and Lacaze et al. for the ABCFS and ASPREE subjects, respectively 38,39. Briefly, the ABCFS subjects were sequenced in-house, using either a Hi-Plex panel on the NextSeq550 40 or a HaloPlexHS panel on the HiSeq3000 (both Illumina, San Diego, CA, USA). The ASPREE subjects were sequenced using an AmpliSeq panel on the Ion Torrent S5™ XL (Thermo Fisher Scientific, Waltham, MA, USA) and aligned sequencing files (BAMs) were provided for variant calling in this study.

Variant calling and filtering
Variant calling was performed using VarDict 1.7 41 and restricted to the overlap of the regions targeted by the three panels. For ASPREE controls sequenced on the Ion Torrent platform, variant calling had also been performed using the Torrent Variant Calling Suite v1.5 as previously described 42 and the intersection with the variant calls from VarDict was used in downstream analyses. Subsequent genetic analyses were restricted to variants meeting the following read depth and variant allele frequency thresholds: 50X and 0.2 for Hi-Plex and AmpliSeq samples, and 30X and 0.15 for HaloPlexHS samples. In addition, for the ASPREE samples, we determined a conservative but high-confidence call set by filtering out (i) variants present in more than 0.05% of all ASPREE participants (n = 65), under the assumption that common variants were either sequencing artefacts or too common to be associated with disease risk, and (ii) variants that had passed our quality filters described above in <95% of the genotype calls at a given genomic location, to ensure that variants that progressed to the next analysis stage were adequately covered.
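The quality filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the record layout (`Call`), the function names, and the treatment of the thresholds as inclusive minimums are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Minimum (read depth, variant allele frequency) per panel, taken from the text.
# Treating them as inclusive minimums is an assumption of this sketch.
THRESHOLDS = {
    "Hi-Plex": (50, 0.20),
    "AmpliSeq": (50, 0.20),
    "HaloPlexHS": (30, 0.15),
}


@dataclass
class Call:
    """Hypothetical record for one variant call in one sample."""
    panel: str
    depth: int
    vaf: float


def passes_quality(call: Call) -> bool:
    """Return True if the call meets the depth and VAF minimums for its panel."""
    min_depth, min_vaf = THRESHOLDS[call.panel]
    return call.depth >= min_depth and call.vaf >= min_vaf


def keep_aspree_variant(n_carriers: int, n_participants: int,
                        n_passing_calls: int, n_calls: int,
                        max_carrier_fraction: float = 0.0005,
                        min_pass_fraction: float = 0.95) -> bool:
    """Apply the two ASPREE-specific filters described in the text.

    A variant is dropped if it is present in more than 0.05% of participants,
    or if fewer than 95% of genotype calls at its position passed the
    quality filters.
    """
    too_common = n_carriers / n_participants > max_carrier_fraction
    poorly_covered = n_passing_calls / n_calls < min_pass_fraction
    return not (too_common or poorly_covered)
```

Keeping the thresholds in a single lookup table makes the platform-specific differences explicit and easy to audit.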
Variant annotation was performed using VarSeq VSClinical v2.2 (Golden Helix Inc., Bozeman, MT, USA) and included ClinVar annotations from July 2020. This study focused on rare predicted protein-truncating variants (PTVs) and pathogenic (including likely pathogenic) variants. Rare variants were defined as those identified in ExAC v.0.3 with a minor allele frequency ≤0.01 in the non-Finnish European population (NFE non-TCGA). Genetic variants were considered pathogenic if they were annotated as "Pathogenic" or "Likely Pathogenic" in ClinVar. Mono-allelic pathogenic MUTYH variant carriers are reported in Supplementary Table 1 but not included in our analysis. Predicted PTVs that were classified as "Conflicting" in ClinVar with annotations tending towards pathogenicity (e.g., CHEK2 c.1100delC) were included in this analysis. Also included were PTVs that were absent (unreported) in ClinVar, except if they were located in the last coding exon. Further details can be found in Supplementary Table 1.
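The inclusion rules above can be written as a single decision function. The sketch below is a hedged illustration only: the function name, its parameters, and the convention that `clinvar=None` means "absent from ClinVar" are hypothetical; it assumes a variant not observed in ExAC is passed with a frequency of 0.0; and it does not model the separate handling of mono-allelic MUTYH carriers.

```python
from typing import Optional

RARE_MAF_CUTOFF = 0.01  # ExAC NFE non-TCGA minor allele frequency, from the text


def include_variant(clinvar: Optional[str],
                    is_ptv: bool,
                    exac_nfe_maf: float,
                    in_last_coding_exon: bool = False,
                    conflicting_leans_pathogenic: bool = False) -> bool:
    """Decide whether a variant enters the association analysis.

    clinvar: ClinVar classification label, or None if the variant is
             absent from ClinVar (a convention assumed for this sketch).
    """
    # Only rare variants are considered.
    if exac_nfe_maf > RARE_MAF_CUTOFF:
        return False

    label = clinvar.strip().lower() if clinvar is not None else None

    # ClinVar "Pathogenic" or "Likely Pathogenic" variants are included.
    if label in {"pathogenic", "likely pathogenic"}:
        return True

    # The remaining rules apply to predicted protein-truncating variants only.
    if not is_ptv:
        return False

    # PTVs labelled "Conflicting" whose annotations tend towards pathogenicity.
    if label == "conflicting":
        return conflicting_leans_pathogenic

    # PTVs unreported in ClinVar, unless located in the last coding exon.
    if label is None:
        return not in_last_coding_exon

    # Any other ClinVar label is left out; the text does not state a rule
    # for these, so exclusion here is an assumption of the sketch.
    return False
```

Whether a "Conflicting" entry tends towards pathogenicity is a curation judgement, so it is passed in as a flag rather than computed.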

Statistical methods
For each of the genes considered, pathogenic variants were combined and an odds ratio (OR) for their association with breast cancer was estimated using unconditional multivariable logistic regression. These analyses were adjusted for the following potential confounders (or, where indicated, a subset of these): age at enrolment, height, body mass index, number of children, number of years of education and number of alcoholic drinks per week. These potential confounders are the known breast cancer risk factors that are available in both the ABCFS and ASPREE datasets, and data harmonisation was performed to make the relevant variables from these two studies comparable.
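For orientation, the unadjusted counterpart of such an analysis can be computed directly from a 2×2 table of carrier status by case status. A logistic regression with carrier status as its only covariate yields the same crude OR. The sketch below is not the adjusted model used in the study: the function name is hypothetical, the interval is a simple Wald interval on the log scale, and no confounders are accounted for.

```python
import math


def crude_odds_ratio(case_carriers: int, case_total: int,
                     control_carriers: int, control_total: int,
                     z: float = 1.96):
    """Crude odds ratio with a Wald confidence interval from a 2x2 table.

    Returns (odds_ratio, lower, upper). z = 1.96 gives an approximate
    95% interval.
    """
    a = case_carriers                      # carriers among cases
    b = case_total - case_carriers         # non-carriers among cases
    c = control_carriers                   # carriers among controls
    d = control_total - control_carriers   # non-carriers among controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper
```

As a usage illustration, `crude_odds_ratio(162, 1464, 32, 862)` takes the carrier counts and group sizes reported for the ABCFS cases and controls. The resulting any-gene crude OR is not an estimate reported by the study, and for comparisons involving the older ASPREE controls an unadjusted figure of this kind would be biased by the age difference, which is why the main analyses rely on adjusted regression.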
Excluded from all statistical analyses were males, ASPREE females with a previous diagnosis of breast cancer, and females with no gene-panel testing data. Women with missing data were excluded from analyses involving the relevant variables, though <1% had missing values for each variable (except for number of children, which was missing for 4.3% of ASPREE). Women with pathogenic variants in two or more genes (other than MUTYH, see above) were excluded from the main analyses.
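The exclusion rules in this paragraph can be expressed as an eligibility check. The sketch below is illustrative only: the `Participant` record, its field names and the function name are hypothetical, and it assumes that MUTYH is simply not counted when deciding whether a woman carries pathogenic variants in two or more genes. Missing covariate data are handled per analysis in the text and are not modelled here.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class Participant:
    """Hypothetical participant record used only for this illustration."""
    study: str                       # "ABCFS" or "ASPREE"
    sex: str                         # "F" or "M"
    has_panel_data: bool
    prior_breast_cancer: bool = False  # baseline self-report (ASPREE)
    pathogenic_genes: FrozenSet[str] = field(default_factory=frozenset)


def eligible(p: Participant, exclude_multi_gene: bool = True) -> bool:
    """Return True if the participant would be kept for analysis."""
    if p.sex != "F":
        return False
    if p.study == "ASPREE" and p.prior_breast_cancer:
        return False
    if not p.has_panel_data:
        return False
    if exclude_multi_gene:
        # MUTYH is ignored when counting genes (assumption of this sketch).
        counted = p.pathogenic_genes - {"MUTYH"}
        if len(counted) >= 2:
            return False
    return True
```

The `exclude_multi_gene` flag makes it straightforward to run the analysis both with and without carriers of pathogenic variants in two or more genes.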