Many multiple-case BC families remain unexplained by a pathogenic variant, despite routine clinical testing of an increasing number of recognized BC susceptibility genes (BCSGs). These cases pose a major clinical challenge for disease management of patients and cancer prevention.

To identify germline pathogenic variants (GPVs) in patients, Next-Generation Sequencing (NGS) of blood-derived DNA is used, often through pan-cancer panels containing up to 25 genes. The three major BCSGs – BRCA1, BRCA2, and PALB2 – are involved in homology-directed DNA repair (HR). Several other, less well-studied genes, also involved in DNA repair and all variably associated with increased risk for hereditary breast and ovarian cancer (HBOC), are also included on some panels. As PARP inhibitors have been approved for treating cancers with HR defects, it is important to identify all possible HR pathway genes reliably implicated in BC susceptibility. To this end, many clinical BC susceptibility genes (BCSGs) testing panels include putative BC-related HR pathway genes, but their candidacy remains unproven. This can lead to inappropriate use of expensive therapies.

We and others showed that tumor sequencing is a powerful tool to find genes or variants that cause HRD breast cancer. In BC cases due to known BCSGs, the inherited pathogenic variant confers genetic susceptibility, but tumorigenesis is usually driven by a somatic inactivating “second-hit” event resulting in loss of gene function, often via loss of heterozygosity (LOH) through deletion of the wild-type allele. In this tumor suppressor model of BCSG function, as a nascent tumor cell loses the ability to repair DNA, lesions accumulate, including in genes favoring tumor progression1. Tumor genomes of patients with BRCA1/2 inactivation were found to be enriched in copy number losses and rearrangements. This observation led to the development of an HRD index based on these somatic genomic abnormalities that can be used to detect HRD implicated tumors rather than on the specific mutational spectrum at the nucleotide level of the altered the DNA sequence2,3,4,5. In addition, in tumors with anomalies in HR repair, single base-pair changes have been shown to accumulate in the tumor genome, giving rise to what is referred to as mutational signatures. We showed that using sequencing of paired tumor and normal tissues we can detect these HRD-associated mutational signatures as well as second hits6,7. COSMIC tumor mutational signature 3 (Sig 3) has been associated with homologous repair deficiency (HRD)8. Monoallelic inactivation of BRCA1/2 does not give rise to Sig 3, whereas Sig 3 is seen in tumors with biallelically inactivated BRCA1/2.

The mutational signature analysis helped to define the role of various HR genes directly from case-only data by association with Sig 3 levels. We showed that inactivation of RAD51C, RAD51D, PALB2, and BARD1 are all associated with elevated Sig 3 levels, indicating that GPVs in these genes lead to BRCA-like/HRD cancers. In contrast, GPVs in other established and candidate BCSG genes such as ATM, ATR, BRIP1, MRE11, NBN, and RAD50 do not lead to BRCA-like tumors6. Moreover, we developed a framework to reclassify germline BRCA1/2 variants of unknown significance (VUS) as benign or pathogenic variants in the context of HRD in tumors6. A variant was classified as a pathogenic or disease-causing variant if the tumor that harbored it showed a second hit and Sig 3. This approach had 100% sensitivity and 98.5 % specificity in classifying 142 known benign and pathogenic missense mutations in BRCA1/2. Thus, Sig 3 levels can be used to identify new HR genes and new pathogenic variants in genes that lead to BRCA-like tumors.

In this study, we designed a pipeline encompassing variant calling and detection of second events involving somatic variants, combined with mutational signature analysis for HRD detection. The pipeline was optimized for archival formalin-fixed paraffin-embedded (FFPE) tumor tissues. We studied 38 unsolved BC patients with either early-onset BC or a strong family history of BC with no identified GPVs in known BCSGs (Supplementary Table 1). We tested the hypothesis that their tumors are linked to HRD due to alterations in genes that have not yet been definitively found associated with heritable risk to BC (Fig. 1).

Fig. 1: Study workflow.
figure 1

(1) all eligible women were selected based on their personal and family history of cancer (red and pink women) and (2) received prior clinical screening. (3) unsolved cases were selected for WES through normal/tumor pair samples from blood lymphocytes and FFPE tissues. (4) sequencing data were analyzed through a germline candidate variant detection algorithm focused on homology-directed repair candidates following a classical BCSG model. (5) tumor landscapes were evaluated through a variant calling pipeline designed to minimize artifacts and a second-hit detection strategy based on the biological mechanism (i.e. defects in the hr pathway). (6) candidate BCSGs were re-evaluated in the context of a patient harboring a germline variant with an HRD tumor and second hit.

To identify possible genetic causes that underly BC in our cases, we compiled a list of candidate genes implicated in HR which are not on current genetic testing panels (see methods). We analyzed whole-exome sequencing data (WES) of tumor/normal pairs to look for potentially pathogenic rare variants and to search for evidence of a second hit in the tumor based on a tumor suppressor gene model as well as for evidence of an HRD signature6 (Fig. 1).

Three identified variants were considered as pathogenic by ClinVar: two variants were in the FANCA and GJB2 genes, but neither showed a second hit. The third was a nonsense variant in LIG4 that showed LOH in the tumor. We observed LOH of the WT allele for 18 additional germline variants. We found five VUSs and three variants with Conflicting Interpretation of pathogenicity (CIOP) in APC, ATM, BRCA2, CHEK2, FLCN, and RAD51D; and the remaining variants were not classified in ClinVar (Fig. 2, Supplementary Fig. 1). No second somatic LOF alteration was found in combination with a germline candidate variant. All variants are summarized in Tables 1, 2.

Fig. 2: Co-mutation plot of germline and somatic landscape after HRD detection.
figure 2

The HRD status evaluated by SigMA (that estimates the presence of Signature 3) is shown in the second row from the top, and columns with HRD based on SigMA are highlighted in vertical green columns. ScarHRD, shown in row 3, calculates the HRD index and cases with HRD determined by this method that has scores great than 41 are represented by red bars. The germline and somatic mutation status of candidate genes across patients are shown for four classes of genes (rows 4–7). BCDG: Breast Cancer Driver Gene; CN-LOH: copy neutral LOH; Dup-LOH: Duplication with LOH; Del-LOH: hemizygous deletion with LOH.

Table 1 List of missenses variants retained in the tumor by LOH.
Table 2 List of truncating variants retained in the tumor by LOH.

To identify patients with HRD-related tumors, we used two HRD classifiers tools. SigMA is tailored for WES tumor data and is used to identify Sig 3 (Fig. 2, Supplementary Fig. 2). We also used ScarHRD, which is based on evaluating genomics scars due to HRD in tumor DNA and has been recently adapted for use with WES data. We identified 7 HRD tumors based on SigMA, and 12 based on ScarHRD (Supplementary Fig. 3). All 7 cases exhibiting an HRD signature by SigMA analysis were those also classified as having genomic scars consistent with HR defects by ScarHRD. Given that WES is not ideally suited to detect large genomic events, we considered the 7 tumors (18%) that were called positive by both tools for further analysis (Fig. 2). This percentage is within the range of detection reported by other HRD studies7,9,10,11. The only HRD tumor with a germline BCSG candidate variant was from a BC patient with the missense variant RAD51D p.S207L (case C10758). Using copy number analysis, we were able to identify duplication of the germline missense variant along with a trans allele loss, resulting in copy neutral LOH (CN-LOH) event at the gene locus as the second hit. We found a somatic deletion at a donor splice-site in RAD51C associated with duplication-LOH in another tumor with HRD (case JGH1), without germline events (Fig. 2). No other HRD tumor was found to be associated with our candidate genes either somatically or in the germline.

Variants in known BC drivers12 were found in 84% of tumors (Fig. 2, Supplementary Fig. 4). Of these, TP53, GATA3, and CBFB were considered as a driver (see methods). Three patients harbored a PIK3CA hotspot mutation. We also detected somatic alterations in three BC genes (PTEN, PALB2, and CDH1) in 4 other patients. Copy number profiles were similar to observations from previous BC cohort studies12,13.

Our pipeline worked well in calling variants in archival FFPE tumors, which are more accessible than fresh tumor material. Except for the RAD51D variant, which has a conflicting interpretation of pathogenicity (https://www.ncbi.nlm.nih.gov/clinvar/variation/142102/), we did not find supporting evidence to reclassify any ClinVar-described variants found in our HRD tumors as pathogenic. The fact that we were able to independently identify p.S207L RAD51D variant as a candidate pathogenic variant using our pipeline demonstrates the robustness of the genomic tools used in this study to reassess VUS. Although we sequenced a highly selected population of BC patients from multiple-case BC families, we did not find any other plausible GPVs in genes beyond those currently included in the clinical gene testing panels, which is in line with recent studies14. We focused on HRD as a causal mechanism in BC, but our approach could be of use to identify germline drivers in other tumor types with mutational signatures that are associated with defects in DNA repair pathways, such as mismatch repair, hyper-mutated tumors, and base excision repair.

There are a number of caveats to our study. Our inability to detect potentially pathogenic variants in recently confirmed BCSGs (e.g. RAD51C and BARD110,15) as well in largely unexplored HR genes could be due to the small sample size of our study. As WES is not ideally suited to identify large structural rearrangements associated with HR defects we could have missed tumor harboring anomalies due to HRD. Also, it is possible that undiscovered BCSGs do not operate via the HR pathway, and in this case our HRD-based framework will not detect variants in these genes. Finally, we cannot exclude the possibility that variants were missed for technical reasons given the age of some of the FFPE material, as one-quarter of our cases were diagnosed more than 10 years prior to analysis, which likely affected the quality of DNA for analysis. This limitation would not be as relevant for somatic testing of freshly processed tissues in a clinical context. Importantly, our study focused on WES data, so we could not assess other molecular mechanisms affecting gene function as potential contributors, such as complex structural variations9 or somatic methylation events16.

Nones et al.17 took a similar approach using fresh frozen tumors and were able to show that all but one causally unexplained BCs with HRD were due to inactivation of BRCA1, BRCA2, or PALB2. Taken together with the results we present here, we can conclude that the tail of missing heritability due to HRD genes must be long and thin. It also means that existing gene testing panels are sufficient to identify carriers of clinically important BC genes using a normal/tumor pair sequencing-based approach. We must therefore look elsewhere to find the missing inherited susceptibility present in these families.

Methods

Samples selection

This study called “Genome-wide approaches in hereditary cancer families”, study number A08-M61-09B, was created in 2011 and approved by the McGill University Research Ethics Board. Participants to this study were consented at two Montreal sites. Some participants were consented directly into the scientific study at McGill University and its affiliated hospitals. Others were recruited from a Montreal-based biobank entitled “Banque d'échantillons biologiques et de données (cliniques et biologiques) associées à des fins de recherche sur les cancers métastatiques du sein et de l’ovaire” previously created in 2000 and approved by the CHUM Institutional Review Board, approval number BD 04.002. We identified over 70 candidate families eligible for this study. All had early-onset BC or a strong family history of breast cancer (median Manchester score 24, range 10-59). None carried a BRCA1/2 or PALB2 pathogenic or likely pathogenic variant on clinical genetic testing. We reviewed approximately 130 pathology blocks to retrieve available BC tumor material that would be suitable for analysis. Blood samples were only obtained from patients where tumor material was successfully retrieved and processed. Of 59 FFPE tumors with extracted DNA, 20 samples failed sequencing due either to poor quality or insufficient quantity of DNA. WES was thus performed on 39 normal/tumor pairs of women with BC diagnosed between 1995 and 2017, but we removed one because of unresolved contamination issues. The interval between diagnosis and sequencing ranged from a few months to 22 years (median 3 years).

Normal/tumor WES sequencing

Genomic DNA was extracted from the patient’s peripheral blood lymphocytes and tumor using published commercial protocols and kits optimized in our lab (Maxwell RSC Buffy Coat DNA/Tissue DNA kits, Promega). Library preparation, exome capture, and sequencing were conducted over a period of 2 years, as material was obtained, reflecting repeat assays for DNA samples that failed. A Panel of Normals (PoN) comprised of 47 normal tissue samples, all sequenced using the same protocol as we applied to tumor tissues, was used to assist in removing recurrent technical artifacts due to formalin fixation. Genomic DNA libraries were generated using one of the following: Nextera Rapid Capture Exome Target (Illumina, San Diego, California), Sure Select Human All Exon V6, or Sure Select Human All Exon V7 (Agilent). DNA samples were then sequenced using Illumina sequencers and appropriate protocols available at the time of sequencing: Illumina HiSeq 2000 as 100 bp paired-end reads, Illumina HiSeq 2500 as 125 bp paired-end reads, and Illumina HiSeq 4000 as 100 bp paired-end reads (Illumina, San Diego, California).

Germline variant calling and filtering

Exome sequencing data analysis was performed using the pipeline we developed for this project. Briefly, BWA (v. 0.5.9) and the Genome Analysis Toolkit (GATK)18 were used to align the sequenced reads to the reference genome (UCSC hg19) and to perform local realignment of reads around small insertions and deletions (indels). PCR duplicate reads were marked using Picard (https://github.com/broadinstitute/picard). The coverage of consensus coding sequence (CCDS) bases was calculated by GATK, resulting in a mean coverage of 141.3X (ranging from 26.9X to 250.6X) and 97.3% of CCDS bases were covered by at least 10 reads. Germline variants were called individually for each sample using HaplotypeCaller from GATK and annotated by wAnnovar19. From the generated list of germline variants, variants were filtered out if they fulfilled any one of the following quality criteria: (i) genomic position of variant covered by <5 reads; (ii) <2 reads support the alternative variant; (iii); allelic fraction between < 0.35 or >0.5; (iv) and mapping quality < 30. We then focused on four categories of genes (n = 864): 20 BCSG frequently used in clinical testing, and previously tested in our HBC cases; 594 “Candidate HR genes” list (C-HRGs) purported to have co-evolved20 or known to interact in HR (unpublished data from Dr. Alexander Orthwein); 103 known DNA repair genes; and 147 other known Cancer Susceptibility Genes (CSGs)21 (Supplementary Table 2). Only variants having an allele frequency <0.01 based on allele frequencies in GnomAD22 and 1000 Genomes project23 were retained for further investigation.

Somatic variant calling and filtering

Somatic variants were called individually through normal/tumor paired caller using Mutect224 for SNV and indel. The PoN was used as well as germline population resources according to GATK’s best practice. Variants were filtered out if they (i) displayed strand bias, to remove artifacts caused by the formalin fixation process; (ii) were germline in origin; (iii) were due to technical artifacts revealed by comparison to PoN, and (iv) had a depth < 10 reads. Variants were next annotated with VEP25.

Somatic Copy Number Analysis and LOH detection

With GATK CNA, read coverage counts across the target were collected. We created a separate copy number variation (CNV) PoN to reduce read coverage data noise to obtain denoising copy ratios against the PoN. Then we collected counts of reference versus alternate alleles for allelic imbalance. We used GISTIC26 to assess recurrent focal amplifications and deletions in tumors. We used FACETS27 and Sequenza28 to infer the integer copy number of each pair of alleles as well as LOH. Hemizygous deletions were defined as the loss of one allele in the tumor. Homozygous deletions were defined as the loss of both alleles in the tumor. Amplification was defined as a copy number status ≥ 6 per tumor for ERBB2, MYC, and ZNF703.

Variant visualization

The Integrative Genomics Viewer29 was used for the manual examination and visualization of all germline candidate variants, and somatic events.

Somatic cancer driver gene identification

A combination of several tools was used to infer drivers of oncogenesis based on statistical methods through q-score to identify genes that were mutated more often than expected by chance: MutSigCV30, OncodriveFML31, MutPanning32, and FishHook33. Genes that were found mutated more than once with a q-score < 0.05 were considered as cancer driver genes.

Mutational signature and homologous recombination repair deficiency detection

Different tools were used to assess the mutational signature from normal-tumor paired WES VCF data: MutationalPatterns34, Maftools35, DeconstructSig36 for tumor signature clustering and the mutational signature contribution in each tumor. Different approaches were used: a de novo mutational signature extraction using a non-negative matrix factorization (NMF) and a COSMIC signature similarity using cosine similarity. We used SigMA37 to detect the mutational signature associated with HR defect designed for WES. We also used ScarHRD38 to detect HRD genomic scars—genomic abnormalities characteristic of HRD, which is based on the sum of LOH score, Large Scale Transition (LST) score, and Telomeric Allelic Imbalance (TAI) score. A score of ≥ 42 classifies tumors as HRD (see Supplementary Fig. 3).

Candidate variant selection algorithm

A list of candidate pathogenic variants was generated using our variant detection algorithm (Supplementary Fig. 5). ClinVar annotations were used to classify variants already reported, and when no ClinVar data was available, we used the ACMG criteria. Variants classified as likely benign or benign by ClinVar were filtered out. The master list of variants was comprised of truncating, missense, or splice-site alterations found in gene candidates.

For non BSCG, missense and splice-site variants predicted to be deleterious by at least 4 of 8 bioinformatic prediction tools (SIFT, PolyPhen2, MutationTaster, FATHM, Provean, MetaSVM, MetaLR, and CADD) were considered for further analysis. For known BSCG genes, we considered all variants regardless of scores as these can be evaluated by expertise. Truncating variants were classified separately. The candidate list was comprised of 231 variants: 7 missense VUSs or variants with conflicting interpretations of pathogenicity in known BCSGs, 46 truncating variants, and 178 predicted pathogenic variants in C-HRGs, DNA repair genes, and in CSGs. From this list of candidate genes, we searched for evidence of allelic imbalance at each gene locus in the tumor using statistical tests (Chi-squared test) and filtered out variants where the p-value was >0.05 and confirmed with FACET results that there was a deletion of one haplotype (see above). We also looked for somatic second-hit point mutations. Our final list of candidate variants was comprised of genes that showed evidence of a possible second mutational hit in the tumor.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.