Current gene panels account for nearly all homologous recombination repair-associated multiple-case breast cancer families

It was hypothesized that variants in underexplored homologous recombination repair (HR) genes could explain unsolved multiple-case breast cancer (BC) families. We investigated HR deficiency (HRD)-associated mutational signatures and second hits in tumor DNA from familial BC cases. No candidates genes were associated with HRD in 38 probands previously tested negative with gene panels. We conclude it is unlikely that unknown HRD-associated genes explain a large fraction of unsolved familial BC.

It was hypothesized that variants in underexplored homologous recombination repair (HR) genes could explain unsolved multiplecase breast cancer (BC) families. We investigated HR deficiency (HRD)-associated mutational signatures and second hits in tumor DNA from familial BC cases. No candidates genes were associated with HRD in 38 probands previously tested negative with gene panels. We conclude it is unlikely that unknown HRD-associated genes explain a large fraction of unsolved familial BC.
npj Breast Cancer (2021) 7:109 ; https://doi.org/10.1038/s41523-021-00315-8 Many multiple-case BC families remain unexplained by a pathogenic variant, despite routine clinical testing of an increasing number of recognized BC susceptibility genes (BCSGs). These cases pose a major clinical challenge for disease management of patients and cancer prevention.
To identify germline pathogenic variants (GPVs) in patients, Next-Generation Sequencing (NGS) of blood-derived DNA is used, often through pan-cancer panels containing up to 25 genes. The three major BCSGs -BRCA1, BRCA2, and PALB2are involved in homology-directed DNA repair (HR). Several other, less wellstudied genes, also involved in DNA repair and all variably associated with increased risk for hereditary breast and ovarian cancer (HBOC), are also included on some panels. As PARP inhibitors have been approved for treating cancers with HR defects, it is important to identify all possible HR pathway genes reliably implicated in BC susceptibility. To this end, many clinical BC susceptibility genes (BCSGs) testing panels include putative BCrelated HR pathway genes, but their candidacy remains unproven. This can lead to inappropriate use of expensive therapies.
We and others showed that tumor sequencing is a powerful tool to find genes or variants that cause HRD breast cancer. In BC cases due to known BCSGs, the inherited pathogenic variant confers genetic susceptibility, but tumorigenesis is usually driven by a somatic inactivating "second-hit" event resulting in loss of gene function, often via loss of heterozygosity (LOH) through deletion of the wild-type allele. In this tumor suppressor model of BCSG function, as a nascent tumor cell loses the ability to repair DNA, lesions accumulate, including in genes favoring tumor progression 1 . Tumor genomes of patients with BRCA1/2 inactivation were found to be enriched in copy number losses and rearrangements. This observation led to the development of an HRD index based on these somatic genomic abnormalities that can be used to detect HRD implicated tumors rather than on the specific mutational spectrum at the nucleotide level of the altered the DNA sequence [2][3][4][5] . In addition, in tumors with anomalies in HR repair, single base-pair changes have been shown to accumulate in the tumor genome, giving rise to what is referred to as mutational signatures. We showed that using sequencing of paired tumor and normal tissues we can detect these HRDassociated mutational signatures as well as second hits 6,7 . COSMIC tumor mutational signature 3 (Sig 3) has been associated with homologous repair deficiency (HRD) 8 . Monoallelic inactivation of BRCA1/2 does not give rise to Sig 3, whereas Sig 3 is seen in tumors with biallelically inactivated BRCA1/2.
The mutational signature analysis helped to define the role of various HR genes directly from case-only data by association with Sig 3 levels. We showed that inactivation of RAD51C, RAD51D, PALB2, and BARD1 are all associated with elevated Sig 3 levels, indicating that GPVs in these genes lead to BRCA-like/HRD cancers. In contrast, GPVs in other established and candidate BCSG genes such as ATM, ATR, BRIP1, MRE11, NBN, and RAD50 do not lead to BRCA-like tumors 6 . Moreover, we developed a framework to reclassify germline BRCA1/2 variants of unknown significance (VUS) as benign or pathogenic variants in the context of HRD in tumors 6 . A variant was classified as a pathogenic or diseasecausing variant if the tumor that harbored it showed a second hit and Sig 3. This approach had 100% sensitivity and 98.5 % specificity in classifying 142 known benign and pathogenic missense mutations in BRCA1/2. Thus, Sig 3 levels can be used to identify new HR genes and new pathogenic variants in genes that lead to BRCA-like tumors.
In this study, we designed a pipeline encompassing variant calling and detection of second events involving somatic variants, combined with mutational signature analysis for HRD detection.
The pipeline was optimized for archival formalin-fixed paraffinembedded (FFPE) tumor tissues. We studied 38 unsolved BC patients with either early-onset BC or a strong family history of BC with no identified GPVs in known BCSGs (Supplementary Table 1). We tested the hypothesis that their tumors are linked to HRD due to alterations in genes that have not yet been definitively found associated with heritable risk to BC (Fig. 1).
To identify possible genetic causes that underly BC in our cases, we compiled a list of candidate genes implicated in HR which are not on current genetic testing panels (see methods). We analyzed whole-exome sequencing data (WES) of tumor/normal pairs to look for potentially pathogenic rare variants and to search for evidence of a second hit in the tumor based on a tumor suppressor gene model as well as for evidence of an HRD signature 6 (Fig. 1).
Three identified variants were considered as pathogenic by ClinVar: two variants were in the FANCA and GJB2 genes, but neither showed a second hit. The third was a nonsense variant in LIG4 that showed LOH in the tumor. We observed LOH of the WT allele for 18 additional germline variants. We found five VUSs and three variants with Conflicting Interpretation of pathogenicity (CIOP) in APC, ATM, BRCA2, CHEK2, FLCN, and RAD51D; and the remaining variants were not classified in ClinVar (Fig. 2, Supplementary Fig. 1). No second somatic LOF alteration was found in combination with a germline candidate variant. All variants are summarized in Tables 1, 2.
To identify patients with HRD-related tumors, we used two HRD classifiers tools. SigMA is tailored for WES tumor data and is used to identify Sig 3 (Fig. 2, Supplementary Fig. 2). We also used ScarHRD, which is based on evaluating genomics scars due to HRD in tumor DNA and has been recently adapted for use with WES data. We identified 7 HRD tumors based on SigMA, and 12 based on ScarHRD ( Supplementary Fig. 3). All 7 cases exhibiting an HRD signature by SigMA analysis were those also classified as having genomic scars consistent with HR defects by ScarHRD. Given that WES is not ideally suited to detect large genomic events, we considered the 7 tumors (18%) that were called positive by both tools for further analysis (Fig. 2). This percentage is within the range of detection reported by other HRD studies 7,[9][10][11] . The only HRD tumor with a germline BCSG candidate variant was from a BC patient with the missense variant RAD51D p.S207L (case C10758). Using copy number analysis, we were able to identify duplication of the germline missense variant along with a trans allele loss, resulting in copy neutral LOH (CN-LOH) event at the gene locus as the second hit. We found a somatic deletion at a donor splice-site in RAD51C associated with duplication-LOH in another tumor with HRD (case JGH1), without germline events (Fig. 2). No other HRD tumor was found to be associated with our candidate genes either somatically or in the germline.
Variants in known BC drivers 12 were found in 84% of tumors ( Fig. 2, Supplementary Fig. 4). Of these, TP53, GATA3, and CBFB were considered as a driver (see methods). Three patients harbored a PIK3CA hotspot mutation. We also detected somatic alterations in three BC genes (PTEN, PALB2, and CDH1) in 4 other patients. Copy number profiles were similar to observations from previous BC cohort studies 12,13 .
Our pipeline worked well in calling variants in archival FFPE tumors, which are more accessible than fresh tumor material. Except for the RAD51D variant, which has a conflicting interpretation of pathogenicity (https://www.ncbi.nlm.nih.gov/clinvar/ variation/142102/), we did not find supporting evidence to reclassify any ClinVar-described variants found in our HRD tumors as pathogenic. The fact that we were able to independently identify p.S207L RAD51D variant as a candidate pathogenic variant using our pipeline demonstrates the robustness of the genomic tools used in this study to reassess VUS. Although we sequenced a highly selected population of BC patients from multiple-case BC families, we did not find any other plausible GPVs in genes beyond  (2) received prior clinical screening. (3) unsolved cases were selected for WES through normal/tumor pair samples from blood lymphocytes and FFPE tissues. (4) sequencing data were analyzed through a germline candidate variant detection algorithm focused on homologydirected repair candidates following a classical BCSG model. (5) tumor landscapes were evaluated through a variant calling pipeline designed to minimize artifacts and a second-hit detection strategy based on the biological mechanism (i.e. defects in the hr pathway). (6) candidate BCSGs were re-evaluated in the context of a patient harboring a germline variant with an HRD tumor and second hit.
T.S. Matis et al. those currently included in the clinical gene testing panels, which is in line with recent studies 14 . We focused on HRD as a causal mechanism in BC, but our approach could be of use to identify germline drivers in other tumor types with mutational signatures that are associated with defects in DNA repair pathways, such as mismatch repair, hyper-mutated tumors, and base excision repair.
There are a number of caveats to our study. Our inability to detect potentially pathogenic variants in recently confirmed BCSGs (e.g. RAD51C and BARD1 10,15 ) as well in largely unexplored HR genes could be due to the small sample size of our study. As WES is not ideally suited to identify large structural rearrangements associated with HR defects we could have missed tumor harboring anomalies due to HRD. Also, it is possible that undiscovered BCSGs do not operate via the HR pathway, and in this case our HRD-based framework will not detect variants in these genes. Finally, we cannot exclude the possibility that variants were missed for technical reasons given the age of some of the FFPE material, as one-quarter of our cases were diagnosed more than 10 years prior to analysis, which likely affected the quality of DNA for analysis. This limitation would not be as relevant for somatic testing of freshly processed tissues in a clinical context. Importantly, our study focused on WES data, so we could not assess other molecular mechanisms affecting gene function as potential contributors, such as complex structural variations 9 or somatic methylation events 16 .
Nones et al. 17 took a similar approach using fresh frozen tumors and were able to show that all but one causally unexplained BCs with HRD were due to inactivation of BRCA1, BRCA2, or PALB2. Taken together with the results we present here, we can conclude that the tail of missing heritability due to HRD genes must be long and thin. It also means that existing gene testing panels are sufficient to identify carriers of clinically important BC genes using a normal/tumor pair sequencing-based approach. We must therefore look elsewhere to find the missing inherited susceptibility present in these families.

Samples selection
This study called "Genome-wide approaches in hereditary cancer families", study number A08-M61-09B, was created in 2011 and approved by the McGill University Research Ethics Board. Participants to this study were consented at two Montreal sites. Some participants were consented directly into the scientific study at McGill University and its affiliated hospitals. Others were recruited from a Montreal-based biobank entitled "Banque d'échantillons biologiques et de données (cliniques et biologiques) associées à des fins de recherche sur les cancers métastatiques du

Germline variant calling and filtering
Exome sequencing data analysis was performed using the pipeline we developed for this project. Briefly, BWA (v. 0.5.9) and the Genome Analysis Toolkit (GATK) 18 were used to align the sequenced reads to the reference genome (UCSC hg19) and to perform local realignment of reads around small insertions and deletions (indels). PCR duplicate reads were marked using Picard (https://github.com/broadinstitute/picard). The coverage of consensus coding sequence (CCDS) bases was calculated by GATK, resulting in a mean coverage of 141.3X (ranging from 26.9X to 250.6X) and 97.3% of CCDS bases were covered by at least 10 reads. Germline variants were called individually for each sample using HaplotypeCaller from GATK and annotated by wAnnovar 19 . From the generated list of germline variants, variants were filtered out if they fulfilled any one of the following quality criteria: (i) genomic position of variant covered by <5 reads; (ii) <2 reads support the alternative variant; (iii); allelic fraction between < 0.35 or >0.5; (iv) and mapping quality < 30. We then focused on four categories of genes (n = 864): 20 BCSG frequently used in clinical testing, and previously tested in our HBC cases; 594 "Candidate HR genes" list (C-HRGs) purported to have co-evolved 20 or known to interact in HR (unpublished data from Dr. Alexander Orthwein); 103 known DNA repair genes; and 147 other known Cancer Susceptibility Genes (CSGs) 21 (Supplementary Table 2). Only variants having an allele frequency <0.01 based on allele frequencies in GnomAD 22 and 1000 Genomes project 23 were retained for further investigation.

Somatic variant calling and filtering
Somatic variants were called individually through normal/tumor paired caller using Mutect2 24 for SNV and indel. The PoN was used as well as germline population resources according to GATK's best practice. Variants were filtered out if they (i) displayed strand bias, to remove artifacts caused by the formalin fixation process; (ii) were germline in origin; (iii) were due to technical artifacts revealed by comparison to PoN, and (iv) had a depth < 10 reads. Variants were next annotated with VEP 25 .

Somatic Copy Number Analysis and LOH detection
With GATK CNA, read coverage counts across the target were collected. We created a separate copy number variation (CNV) PoN to reduce read coverage data noise to obtain denoising copy ratios against the PoN. Then

Variant visualization
The Integrative Genomics Viewer 29 was used for the manual examination and visualization of all germline candidate variants, and somatic events.

Somatic cancer driver gene identification
A combination of several tools was used to infer drivers of oncogenesis based on statistical methods through q-score to identify genes that were mutated more often than expected by chance: MutSigCV 30 , Oncodri-veFML 31 , MutPanning 32 , and FishHook 33 . Genes that were found mutated more than once with a q-score < 0.05 were considered as cancer driver genes.

Mutational signature and homologous recombination repair deficiency detection
Different tools were used to assess the mutational signature from normaltumor paired WES VCF data: MutationalPatterns 34 , Maftools 35 , Decon-structSig 36 for tumor signature clustering and the mutational signature contribution in each tumor. Different approaches were used: a de novo mutational signature extraction using a non-negative matrix factorization (NMF) and a COSMIC signature similarity using cosine similarity. We used SigMA 37 to detect the mutational signature associated with HR defect designed for WES. We also used ScarHRD 38 to detect HRD genomic scarsgenomic abnormalities characteristic of HRD, which is based on the sum of LOH score, Large Scale Transition (LST) score, and Telomeric Allelic Imbalance (TAI) score. A score of ≥ 42 classifies tumors as HRD (see Supplementary Fig. 3).

Candidate variant selection algorithm
A list of candidate pathogenic variants was generated using our variant detection algorithm ( Supplementary Fig. 5). ClinVar annotations were used to classify variants already reported, and when no ClinVar data was available, we used the ACMG criteria. Variants classified as likely benign or benign by ClinVar were filtered out. The master list of variants was comprised of truncating, missense, or splice-site alterations found in gene candidates. For non BSCG, missense and splice-site variants predicted to be deleterious by at least 4 of 8 bioinformatic prediction tools (SIFT, PolyPhen2, MutationTaster, FATHM, Provean, MetaSVM, MetaLR, and CADD) were considered for further analysis. For known BSCG genes, we considered all variants regardless of scores as these can be evaluated by expertise. Truncating variants were classified separately. The candidate list was comprised of 231 variants: 7 missense VUSs or variants with conflicting interpretations of pathogenicity in known BCSGs, 46 truncating variants, and 178 predicted pathogenic variants in C-HRGs, DNA repair genes, and in CSGs. From this list of candidate genes, we searched for evidence of allelic imbalance at each gene locus in the tumor using statistical tests (Chisquared test) and filtered out variants where the p-value was >0.05 and confirmed with FACET results that there was a deletion of one haplotype (see above). We also looked for somatic second-hit point mutations. Our final list of candidate variants was comprised of genes that showed evidence of a possible second mutational hit in the tumor.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
The data generated and analyzed during this study are described in the following data record: https://doi.org/10.6084/m9.figshare.14802954 39 . Tumor and germline datasets generated during the current study are not publicly available as the consent provided by study participants did not include a provision for widespread disclosure. However, direct requests can be made to the corresponding authors for access. The following files are openly available as part of the figshare data record: 'Master_list_variant.xlsx' (underlying Table 1 Table 2). All other data are housed on institutional storage and are not openly available in order to protect patient privacy as informed consent to share participantlevel data was not obtained prior to or during data collection. However, these data can be requested from Dr Foulkes. The files are: 'all germline_sample.vcf', 'data.tumor. maf', 'all tumor_sample.vcf', 'all sample.out.gzsmall.seqz.gz'.

CODE AVAILABILITY
The bioinformatics analyses were performed using open-source software, including Burrows-Wheeler alignment tool (v. 0.5.