Sequencing for germline mutations in Swedish breast cancer families reveals novel breast cancer risk genes

Identifying genetic cancer risk factors will lead to improved genetic counseling, cancer prevention and cancer care. Analyzing families with a strong history of breast cancer (BC) has been a successful method to identify genes that contribute to the disease. This has led to discoveries of high-risk genes like the BRCA genes. Nevertheless, many BC cases have no known cause. In this study, exome sequencing of 59 BC patients from 24 Swedish families with a strong history of BC was performed to identify variants in known and novel BC predisposing genes. First, we screened known BC genes and identified two pathogenic variants in the BRIP1 and PALB2 genes. Secondly, to identify novel BC genes, rare, high-impact variants that segregated in families were analyzed, yielding 544 variants in novel BC candidate genes. Of those, 22 variants were defined as high-risk variants. Several interesting genes, either previously linked with BC or in pathways that when flawed could contribute to BC, were among the detected genes. The strongest candidates identified are the FANCM gene, involved in DNA double-strand break repair, and the RAD54L gene, involved in DNA recombination. Our study shows that identifying pathogenic variants is challenging despite a strong family history of BC. Several interesting candidates were observed here that need to be further studied.


Scientific Reports | (2021) 11:14737 | https://doi.org/10.1038/s41598-021-94316-z | www.nature.com/scientificreports/

Here, we performed exome sequencing on 59 BC patients from 24 Swedish families with the aim of identifying variants that could contribute to BC. First, pathogenic variants in known BC susceptibility genes were analyzed. Secondly, rare and high-impact variants in new BC candidate genes shared by all affected family members were identified.

Material and methods
Families. The individuals in this study were BC patients from families that had undergone genetic counseling at the Department of Clinical Genetics, Karolinska University Hospital Solna, Sweden. All families comprised at least three close relatives with BC (range 3-8 BC patients). As a part of the study, additional family members were recruited when possible. For each family, one to four individuals were whole-exome sequenced, resulting in 59 BC patients from 24 families, represented by 1st to 4th degree relatives. In total, the study used three families with four sequenced individuals (WES-4s, average age of onset 50.8 ± 6.7 years, consisting of 1st to 3rd degree relatives), six families with three sequenced individuals (WES-3s, average age of onset 49.4 ± 11.9 years, consisting of 1st to 4th degree relatives), 14 families with two sequenced individuals (WES-2s, average age of onset 49.4 ± 10.9 years, consisting of 1st to 4th degree relatives) and one family with one sequenced individual (WES-1s, age of onset 47 years).
Bioinformatics workflow. Sequencing reads were aligned to the reference genome GRCh37 using BWA 14, and Picard (http://broadinstitute.github.io/picard/) was used to mark PCR-duplicated reads. Variants were called using GATK by following the best practice procedure implemented at the Broad Institute 15. Variant annotation was done with ANNOVAR 16, including RefSeq gene 17 and dbSNP150 18. Max minor allele frequency (MMAF) was calculated from the ExAC 19, 200Danes 20, SweGen 21, and 1000 Genomes Project 22 allele frequencies. To assess and predict pathogenic effects of the variants, ClinVar 23,24, ACMG classification 25 and the in silico predictor tool CADD 26 were used. CADD > 20 and CADD > 30 indicate the 1% and 0.1% most deleterious variants, respectively.
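The two annotation quantities described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the handling of missing database records, and the folding of allele frequencies above 0.5 are assumptions made for illustration. The CADD conversion relies on the PHRED-like scaling of CADD scores, under which a scaled score of 20 corresponds to the top 1% and 30 to the top 0.1% of ranked variants.

```python
from typing import Mapping, Optional


def max_minor_allele_frequency(freqs: Mapping[str, Optional[float]]) -> float:
    """Return the largest minor allele frequency across reference datasets.

    Assumptions (not stated in the text): a dataset without a record for the
    variant is passed as None and ignored, a variant absent from every
    dataset gets MMAF 0, and an allele frequency above 0.5 is folded to
    1 - AF to obtain the minor allele frequency.
    """
    observed = [min(af, 1.0 - af) for af in freqs.values() if af is not None]
    return max(observed, default=0.0)


def cadd_top_fraction(scaled_score: float) -> float:
    """Fraction of most deleterious variants implied by a scaled CADD score."""
    return 10 ** (-scaled_score / 10)


# Hypothetical allele frequencies for one variant in the four datasets.
example_freqs = {
    "ExAC": 0.0004,
    "200Danes": None,  # variant not present in this dataset
    "SweGen": 0.0015,
    "1000Genomes": 0.0002,
}
example_mmaf = max_minor_allele_frequency(example_freqs)
```

With these hypothetical inputs the MMAF is driven by the SweGen frequency, so a variant that is rare globally but more frequent in the Swedish reference population would still be filtered on its highest observed frequency.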
To exclude variants with missing data, BC genotype frequency (BC_GF) was calculated for every variant. A variant with a BC_GF of 0.8 indicates that 80% of the patients had genotypes for that particular variant. No alternative method was used to confirm the genetic variants identified in this study. The presence of high-risk variants was confirmed by manual inspection of the BAM files in the IGV software 27.

Ethical statements. All patients gave written informed consent to participate in the study and to donate blood samples. The study was approved by the research ethics committee at Karolinska Institutet and the regional ethics committee in Stockholm. All methods were conducted in accordance with the Declaration of Helsinki guidelines.

Results
Pathogenic variants in known BC-predisposing genes were seen in five BC families. Previously, only one affected individual in each family had been tested for variants in known BC and OC-predisposing genes. Therefore, we searched for variants in the 15 genes from the clinical panel (see "Material and methods" section) in all 59 BC patients from the 24 families. In total, 10 variants were seen in 13 individuals from 9 families (Table 1). Three of the variants were known pathogenic variants: (1) c.2108delTinsGGA (rs786203384, p.(Lys703fs)) in the BRIP1 gene, (2) c.2748+1G>T (rs753153576) in the PALB2 gene, and (3) c.1100delC (rs555607708) in the CHEK2 gene. The BRIP1 frameshift variant and the PALB2 splice donor variant result in protein truncation and were observed in one family each. The CHEK2 variant c.1100delC was seen in four individuals from three different families: in the WES-2 family Br15, in the WES-3 family Br7 and in two individuals from the WES-4 family Br1 (Table 1). Five additional missense variants listed as VUS (variant of uncertain significance) or as having conflicting interpretations of pathogenicity were detected in the BC families (Table 1).
Since family members of families Br4 and Br16 carried clear pathogenic variants in the PALB2 and the BRIP1 genes, these two families were excluded from further analysis.
Nearly 40 pathogenic variants in novel BC candidate genes were seen in BC families. In the remaining 22 BC families we searched for new BC-predisposing genes. All variants that (1) were observed in all sequenced family members within a family, (2) had BC_GF > 0.8, (3) had MMAF < 0.01 and (4) had CADD > 20 were selected for further analysis.
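The four selection criteria can be sketched in Python as follows. This is an illustrative sketch rather than the pipeline used in the study: the `Variant` class, the function names and the sample IDs are hypothetical, and it assumes that a missing genotype call is represented as `None` and that BC_GF is the fraction of all sequenced BC patients with a genotype call.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Variant:
    mmaf: float  # max minor allele frequency across reference datasets
    cadd: float  # scaled CADD score
    # Patient ID -> True (carrier), False (non-carrier), None (no genotype call)
    carriers: Dict[str, Optional[bool]]


def bc_genotype_frequency(variant: Variant, all_patients: Sequence[str]) -> float:
    """Fraction of BC patients with a genotype call for this variant."""
    called = sum(1 for p in all_patients if variant.carriers.get(p) is not None)
    return called / len(all_patients)


def passes_candidate_filter(
    variant: Variant,
    family_members: Sequence[str],
    all_patients: Sequence[str],
    min_bc_gf: float = 0.8,
    max_mmaf: float = 0.01,
    min_cadd: float = 20.0,
) -> bool:
    """Apply the four criteria: segregation, BC_GF, MMAF and CADD."""
    # (1) every sequenced member of the family carries the variant
    segregates = all(variant.carriers.get(m) is True for m in family_members)
    return (
        segregates
        and bc_genotype_frequency(variant, all_patients) > min_bc_gf  # (2)
        and variant.mmaf < max_mmaf  # (3)
        and variant.cadd > min_cadd  # (4)
    )


# Hypothetical cohort of five patients; p1 and p2 belong to the same family.
patients = ["p1", "p2", "p3", "p4", "p5"]
well_covered = Variant(
    mmaf=0.002, cadd=24.0,
    carriers={"p1": True, "p2": True, "p3": False, "p4": False, "p5": False},
)
poorly_covered = Variant(
    mmaf=0.002, cadd=24.0,
    carriers={"p1": True, "p2": True, "p3": False, "p4": None, "p5": None},
)
```

In this toy cohort `well_covered` passes all four criteria for the family of p1 and p2, whereas `poorly_covered` is rejected because only three of five patients have a genotype call (BC_GF of 0.6).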
In two families, the WES-4 and WES-2 families Br2 and Br18, no variants were observed after applying the criteria. In the remaining 20 families, 544 variants in 521 genes were observed (Tables 2, S1-S2), where the majority of the variants were missense (n = 506, Table S2). There were a total of 38 variants with potential pathogenic effect (stop-gain, splicing and frameshift indels), where 20 variants were detected in four of the 12 WES-2s families (Br10, Br12, Br21 and Br22) (Table S1).
Recurrent genes, defined as genes with segregating variants in more than one family, were seen among the BC families, where we identified 46 variants located in 23 genes (Table S3). Two of the variants were detected in two families each: (1) rs142493383 in the ALPP gene in families Br11 and Br24, and (2) rs200175537 in the CLEC16A gene in families Br12 and Br17. The DNAH14 and OBSCN genes harbored three missense variants each, while we observed two variants in each of the remaining 19 genes. All variants were missense, apart from one stop-gain variant seen in the LHCGR gene in the WES-2 family Br19, and two frameshift deletions in the RTN3 and TTLL12 genes seen in the WES-2 families Br22 and Br10, respectively (Table S3).
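The definition of a recurrent gene can be expressed as a small grouping step. The sketch below is illustrative only: the function name and the input layout of (family, gene, variant) triples are assumptions. The ALPP and CLEC16A entries are taken from the text, and the `GENE_X` entry is a hypothetical non-recurrent gene added for contrast.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def recurrent_genes(
    segregating: Iterable[Tuple[str, str, str]]
) -> Dict[str, Set[str]]:
    """Map each gene with segregating variants in more than one family
    to the set of families in which such variants were seen.

    Each input record is a (family, gene, variant_id) triple.
    """
    families_by_gene: Dict[str, Set[str]] = defaultdict(set)
    for family, gene, _variant_id in segregating:
        families_by_gene[gene].add(family)
    return {g: fams for g, fams in families_by_gene.items() if len(fams) > 1}


records = [
    ("Br11", "ALPP", "rs142493383"),
    ("Br24", "ALPP", "rs142493383"),
    ("Br12", "CLEC16A", "rs200175537"),
    ("Br17", "CLEC16A", "rs200175537"),
    ("Br10", "GENE_X", "hypothetical_variant"),  # seen in a single family
]
recurrent = recurrent_genes(records)
```

Grouping by gene rather than by variant means that a gene counts as recurrent both when the same variant is shared between families and when different variants in the same gene segregate in different families.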
To further identify the most likely high-risk variants, stricter criteria were applied to identify very rare and high-impact variants in the larger families with sequencing data from 3 to 4 family members. The variants with MMAF < 0.001 and CADD > 25 were considered the most likely high-risk variants. In total, 22 variants in 22 genes were identified in six of the nine WES-3s and WES-4s families (Table 3). All variants except one were detected in the WES-3 families, and all but two were missense. Most high-risk variants were detected in family Br7, followed by families Br9, Br5 and Br6 (n = 6, 5, 4 and 4, respectively) (Table 3). A stop-gain variant, rs143701013 (c.C889T, p.(R297*)), in the last exon of the ZNF563 gene was observed in the WES-4 family Br3, where one individual was a homozygous carrier, and a frameshift deletion, rs769623079 (c.631_632del, p.(C211fs)), was seen in exon 7 of the FANK1 gene in the WES-3 family Br5 (Tables 3, S1).

FANCM stop-gain variant was observed in BC patients from four BC families. Finally, we searched for rare variants with high CADD that were observed in several BC patients, although not segregating within the families. In total, 15 variants with MMAF < 0.01 and CADD > 25 that were detected in at least three families were seen (Table S4). Among these, a stop-gain variant in the FANCM gene was observed in BC patients from four families (Table S4). Similarly, a missense variant, rs1065746 (c.G3244C, p.(D1082H)), in the HTT gene was observed in both family members of the WES-2 family Br20, as well as in two individuals from the WES-4 family Br3 and two individuals from the WES-3 family Br8 (Table S4). A missense variant, rs149133270 (c.G1379A, p.(R460Q)), in the MPO gene found in the WES-1 family Br24 was also seen in four individuals from two WES-4 families and one WES-3 family. The remaining seven variants were seen in three families each (Table S4).
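This last search drops the segregation requirement and instead counts the families with at least one carrier. A minimal sketch under stated assumptions follows: the function name, the record layout and the patient IDs are hypothetical, and the MMAF and CADD thresholds are assumed to have been applied beforehand. The HTT family assignments follow the text, and the second variant is a hypothetical one seen in only two families.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def variants_in_min_families(
    carrier_records: Iterable[Tuple[str, str, str]], min_families: int = 3
) -> Dict[str, int]:
    """Count carrier families per variant and keep variants seen in at
    least `min_families` families, whether or not they segregate.

    Each input record is a (variant_id, family, patient_id) triple.
    """
    families_by_variant: Dict[str, Set[str]] = defaultdict(set)
    for variant_id, family, _patient_id in carrier_records:
        families_by_variant[variant_id].add(family)
    return {
        v: len(fams)
        for v, fams in families_by_variant.items()
        if len(fams) >= min_families
    }


carrier_records = [
    ("rs1065746", "Br20", "Br20_a"),
    ("rs1065746", "Br20", "Br20_b"),
    ("rs1065746", "Br3", "Br3_a"),
    ("rs1065746", "Br3", "Br3_b"),
    ("rs1065746", "Br8", "Br8_a"),
    ("rs1065746", "Br8", "Br8_b"),
    ("hypothetical_variant", "Br1", "Br1_a"),
    ("hypothetical_variant", "Br2", "Br2_a"),
]
shared = variants_in_min_families(carrier_records)
```

Counting families rather than individuals avoids giving extra weight to variants that are merely shared by several relatives within one large family.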

Discussion
To identify known and novel causative variants that could contribute to hereditary BC, we exome sequenced a selection of patients from 24 Swedish BC families. First, we screened for variants with a pathogenic or possible pathogenic consequence in known BC predisposing genes. Secondly, we searched for rare variants that segregated in BC families with predicted high impact and which could have contributed to the disease.
Three pathogenic variants in the BRIP1, PALB2 and CHEK2 genes were found in five families. Since loss-of-function variants in the BRIP1 and PALB2 genes increase the risk of BC and OC 28,29, these variants were considered to be the main cause of the increased cancer risk in these two families. The c.1100delC variant in the CHEK2 gene is a well-known variant considered to confer an increased risk of BC 30. However, the risk is considered moderate, and it cannot be concluded that this variant solely explains the BC risk in these families. Several variants with uncertain significance were detected in the BC families. However, further analyses are needed to determine their contribution to BC.
To identify new BC predisposing genes, strict filtering was performed on the remaining families. Variants shared by all sequenced family members and with deleterious effects or high CADD were used as criteria for possible high-risk predisposing variants in the families. In total, 38 deleterious variants and over 500 missense variants were seen in the families, most of them in WES-2s families. Of the 506 missense variants, 29 had CADD > 30 and were considered strong candidates to predispose to the disease. We observed variants located within genes that have previously been linked to BC. The FANCM gene is part of the Fanconi anemia complementation group, which includes the well-known BC risk genes BRCA2, BRIP1 and PALB2. Like those genes, FANCM is involved in DNA double-strand break repair and has been linked to BC 31-33. The stop-gain variant in the FANCM gene seen here in Swedish BC patients has previously been reported in BC patients 31,32, including familial cases 31, and it has been suggested to be common in Finnish triple-negative BC patients 32. Here, it was found in four families, although not in all family members, suggesting this variant can be a risk factor for BC.
Several other interesting variants were seen in genes that could contribute to BC, such as the RAD54L and FN1 genes. The RAD54L gene is involved in DNA recombination, along with the RAD51C and RAD51D genes, and has been linked to BC 34. The variant is located in exon 7 that contains helicase motif I and Ia 35 [37][38][39][40][41]. Further studies are needed to understand their contribution to BC.

This study has several limitations that need to be considered. The cohort consists of a limited number of BC patients and families. Because the patients were exome sequenced, variants outside of the exons are not analyzed here, and our analysis is limited to single nucleotide variants and smaller indels. Furthermore, strict selection criteria were applied to identify novel risk variants that are rare and assumed to have a high impact, thereby excluding more common variants that might contribute to the disease. Since part of our criteria was that variants needed to segregate within all family members sequenced, we have a bias towards more variants detected in smaller families and families containing close relatives. Finally, only affected family members were analyzed. Including unaffected family members could have been beneficial for variant filtering.