Introduction

Hereditary breast and ovarian cancer (HBOC) syndrome is mainly caused by pathogenic variants in the BRCA1 [1] and BRCA2 [2] genes. Several additional genes, such as PALB2 [3], RAD51C [4], and RAD51D [5], are now recognized as susceptibility genes for HBOC, although they are associated with a lower risk of cancer than are BRCA1 and BRCA2. The French Genetic and Cancer Group (GGC)–Unicancer guidelines recommend testing 13 genes (BRCA1, BRCA2, PALB2, TP53, CDH1, PTEN, RAD51C, RAD51D, MLH1, MSH2, MSH6, PMS2, and EPCAM) in individuals with HBOC [6]. However, the mutation detection rate remains low, at around 15–20%, even in high-risk families. Pathogenic variants in known susceptibility genes that escape current screening methods could account for some of the cases that still lack a molecular diagnosis. One potential class of such variants is mobile element insertions (MEI). MEI occur when an active mobile element inserts into a new genomic location [7]. Depending on the insertion site, an MEI can cause disease by disrupting a gene coding sequence, altering gene regulation, or altering splicing. One example in familial breast cancer is the c.156_157insAlu in exon 3 of BRCA2, which represents 27% of all deleterious BRCA1 and BRCA2 mutations in northern/central Portugal, due to a founder effect [8]. Pathogenic MEI have been shown to be involved in various conditions and are estimated to account for approximately 0.03–0.1% of pathogenic variants responsible for genetic diseases [9]. However, this is likely to be a substantial underestimate, since this type of variant is difficult to detect using routine molecular tests.

Here we report the detection of a pathogenic Alu insertion in the PALB2 gene using a dedicated MEI detection pipeline. This variant, c.2872_2888delinsGTGTCCCCAATACGTCTAATAAATAGGCCTGCAGGTCTAGAGCTCAAGAAAGAGCTCAGAGGTAGAAATGTATGTTTGCAAGTAGTCAGCACCAGATTTTATATTTCCAGATGA (or c.2872_2888delins114AluL2), was detected in a patient with a strong hereditary predisposition to breast cancer in whom no pathogenic variant had previously been identified by NGS data analysis.

Methods

Patients

All patients included in this study were seen for genetic consultation or are being treated at a specialist center for the disease of interest. Blood samples were sent to our laboratory for molecular analysis because of their clinical presentation and/or family history. All patients provided written informed consent for genetic analysis.

Target capture sequencing method

Libraries were prepared according to the Kapa sample preparation protocol (Kapa Biosystems®). Libraries were pooled and captured using the SeqCap EZ Choice Library (Roche/NimbleGen, Madison, WI) according to the manufacturer’s protocol and sequenced using an Illumina Platform (Illumina, San Diego, CA).

Data analysis

Reads were mapped to hg19 using BWA (0.7.12). Only alignments with at least 10 soft-clipped bases were kept. Coverage for all bases was assessed using GATK (3.5) and was used to detect significant differences between two consecutive positions (breakpoints) using an in-house R script (R 3.5.3). The detection step is based on differences in read depth (all reads vs. only soft-clipped reads) and depth ratios between the coordinates. Some putative breakpoints are filtered out using mismatch ratio and indel ratio parameters, which excludes events that appear in repeat sequences or regions with low mappability. Consensus sequences around each breakpoint were predicted using Trinity (2.8.4) from soft-clipped reads and their mates. Rebuilt contigs were filtered based on the number of reads used for Trinity reconstruction and the CIGAR alignment pattern. Selected sequences were annotated using Blast (2.0.9+) and the RepeatMasker track from UCSC (hits with p value < 0.01). Finally, a depiction of the region was created using the IGV-snapshot-automator script.
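The coverage-based breakpoint preselection described above can be illustrated with a minimal Python sketch. It is not the in-house R script used in this study: the function name `find_breakpoints`, the thresholds `min_ratio`, `min_clipped_depth` and `min_jump`, and the toy depth vectors are all assumptions made for illustration. The sketch flags a position when soft-clipped read depth rises sharply relative to the preceding position and when soft-clipped reads make up a sizeable fraction of total depth.

```python
from typing import List


def find_breakpoints(
    total_depth: List[int],
    clipped_depth: List[int],
    min_ratio: float = 0.2,       # assumed: clipped / total depth at the position
    min_clipped_depth: int = 5,   # assumed: minimum number of clipped reads
    min_jump: float = 3.0,        # assumed: fold change vs. the previous position
) -> List[int]:
    """Return 0-based indices of candidate breakpoints.

    total_depth[i]   : depth from all reads at position i
    clipped_depth[i] : depth from soft-clipped reads only at position i
    """
    if len(total_depth) != len(clipped_depth):
        raise ValueError("depth vectors must have the same length")

    candidates = []
    for i in range(1, len(total_depth)):
        clipped = clipped_depth[i]
        if clipped < min_clipped_depth or total_depth[i] == 0:
            continue
        # Fraction of reads at this position that are soft-clipped.
        ratio = clipped / total_depth[i]
        # Fold change in clipped depth between two consecutive positions
        # (the denominator is floored at 1 to avoid division by zero).
        jump = clipped / max(clipped_depth[i - 1], 1)
        if ratio >= min_ratio and jump >= min_jump:
            candidates.append(i)
    return candidates


# Toy example: a sharp rise in soft-clipped depth at index 3.
total = [100, 102, 101, 99, 100, 98]
clipped = [0, 1, 1, 40, 2, 1]
print(find_breakpoints(total, clipped))  # [3]
```

In practice the thresholds would need to be tuned against positive controls, and candidates would still go through the mismatch-ratio and indel-ratio filters described above.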

Dedicated mobile element insertion polymerase chain reaction and Sanger sequencing

Each detected event was validated by dedicated MEI PCR and Sanger sequencing. Based on the rebuilt sequence generated bioinformatically, we designed MEI-specific primers located within or at one breakpoint of the MEI. DNA was PCR-amplified using primers specific for the gene of interest or Alu insertion–specific primers, where one primer (forward or reverse) annealed to the Alu insertion breakpoint or within the Alu insertion, and the second primer was specific for the gene of interest. All amplifications were performed using the following conditions: 95 °C for 10 min, (95 °C for 30 s, 60 °C for 45 s, 72 °C for 1 min) ×40, and 72 °C for 20 min. The PCR products were then subjected to Sanger sequencing analysis using an ABI 3730 DNA Analyzer (Life Technologies).
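As a quick check on the cycling conditions given above, the following Python sketch adds up the programmed hold times. The function name `total_hold_seconds` is hypothetical, and the result excludes thermocycler ramp times, which depend on the instrument.

```python
from typing import Sequence


def total_hold_seconds(
    initial_s: int, cycle_steps_s: Sequence[int], n_cycles: int, final_s: int
) -> int:
    """Sum the programmed hold times of a PCR protocol (ramp times excluded)."""
    return initial_s + n_cycles * sum(cycle_steps_s) + final_s


# Conditions from the text: 95 °C 10 min; (95 °C 30 s, 60 °C 45 s, 72 °C 1 min) x40; 72 °C 20 min
total = total_hold_seconds(
    initial_s=10 * 60,
    cycle_steps_s=[30, 45, 60],
    n_cycles=40,
    final_s=20 * 60,
)
print(total / 60)  # 120.0 minutes of programmed hold time
```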

Results

We performed a retrospective analysis of targeted NGS data from 362 cases referred to our laboratory for molecular diagnosis of HBOC (n = 197) or angiogenetic diseases (n = 162), and for whom no pathogenic variants (single nucleotide variants or copy number variants) had previously been identified. Analysis of MEI was performed using a dedicated pipeline that involves the following key steps: (1) selection of clipped reads; (2) preselection of events based on coverage (depth ratio); (3) a first filtering step based on alignment quality; (4) reconstruction of the inserted sequence; (5) a second filtering step based on the reconstructed sequence; and (6) annotation of the inserted sequence (Supplementary Fig. 1).
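Step (1), the selection of clipped reads, can be illustrated with a short Python sketch that counts soft-clipped bases in a SAM CIGAR string and applies the 10-base threshold given in the Methods. The function names `soft_clipped_bases` and `keep_alignment` are hypothetical, and summing the clips at both ends of a read is an assumption; the pipeline itself may treat each end separately.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar: str) -> int:
    """Total number of soft-clipped bases ('S' operations) in a CIGAR string."""
    return sum(int(length) for length, op in _CIGAR_OP.findall(cigar) if op == "S")


def keep_alignment(cigar: str, min_clipped: int = 10) -> bool:
    """Keep an alignment if it carries at least `min_clipped` soft-clipped bases.

    Assumption: clips at both ends of the read are summed.
    """
    return soft_clipped_bases(cigar) >= min_clipped


for cigar in ["100M", "85M15S", "5S90M5S", "4S96M"]:
    print(cigar, soft_clipped_bases(cigar), keep_alignment(cigar))
```

A fully matched read (`100M`) is discarded, whereas a read with 15 clipped bases at one end (`85M15S`) is retained as a potential breakpoint-spanning read.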

Fifteen distinct MEI were detected in 55 samples (Table 1). We also analyzed three samples with known Alu insertions as positive controls (Supplementary Table 1). Two samples had the c.156_157insAlu in exon 3 of BRCA2, previously detected by MLPA. The third sample had a previously described Alu insertion within MSH2 exon 8 [10].

Table 1 New mobile element insertions (MEI) detected in the retrospective cohort (n = 362).

Each event was analyzed by dedicated MEI PCR followed by Sanger sequencing. We molecularly confirmed the presence of nine MEI. According to American College of Medical Genetics and Genomics (ACMG) criteria [11, 12], six events (1, 2, 4, 5, 6, and 15) were classified as variants of unknown significance based on the frequency of their detection, and three events (11, 12, and 13) detected in the same sample were classified as benign since these events correspond to the integration of a SMAD4 processed pseudogene [13]. No PCR amplification of the mutated allele was observed for six MEI, whereas wild-type allele amplification was obtained using specific primers for the gene of interest (Supplementary Fig. 2). This suggests that these events were false positives.

One MEI was detected in exon 9 of PALB2 (Fig. 1A). The rebuilt sequence predicted a 114-bp LINE type L2 element insertion associated with deletion of 17 bp of the PALB2 sequence at the insertion site (Fig. 1B). Molecular analysis confirmed the presence of this insertion. This variant, c.2872_2888delins114AluL2, disrupts the PALB2 coding sequence and leads to the production of a truncated protein, p.(Gln958Valfs*38) (Fig. 1C, D). This variant was classified as pathogenic according to ACMG criteria [11, 12]. It was detected in a woman diagnosed with breast cancer at 55 years of age. This woman had a family history of breast cancer: her mother and maternal grandmother had both died from breast cancer, at the age of 42 and before the age of 50, respectively. No DNA was available to perform a segregation study.
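The frameshift reported for the PALB2 variant follows from simple arithmetic on the coordinates given above, as the Python sketch below shows. The function names `codon_number` and `delins_effect` are hypothetical helpers; the sketch only checks the reading frame and the first affected codon, and does not attempt to predict the position of the premature stop codon, which would require the reference and inserted sequences.

```python
def codon_number(cds_position: int) -> int:
    """Codon containing a 1-based coding (c.) position."""
    return (cds_position - 1) // 3 + 1


def delins_effect(del_start: int, del_end: int, inserted_len: int) -> dict:
    """Summarize the reading-frame effect of a c.<start>_<end>delins event."""
    deleted_len = del_end - del_start + 1
    net_change = inserted_len - deleted_len
    return {
        "deleted_bp": deleted_len,
        "net_change_bp": net_change,
        # A net change that is not a multiple of 3 shifts the reading frame.
        "frameshift": net_change % 3 != 0,
        "first_affected_codon": codon_number(del_start),
    }


# c.2872_2888delins114: 17 bp deleted, 114 bp inserted
effect = delins_effect(2872, 2888, 114)
print(effect)
# {'deleted_bp': 17, 'net_change_bp': 97, 'frameshift': True, 'first_affected_codon': 958}
```

The net gain of 97 bp is not a multiple of three, and position c.2872 is the first base of codon 958, which is consistent with the protein-level description p.(Gln958Valfs*38).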

Fig. 1: Mobile element insertion (MEI) in exon 9 of the PALB2 gene.

A Integrative Genomics Viewer visualization of reads used for bioinformatics detection. B MEI rebuilt sequence generated by bioinformatic analysis. The PALB2 reference sequence (NM_024675.3) is indicated in black, with intronic sequences shown in lowercase letters and exonic sequence shown in uppercase letters. The detected LINE type L2 insertion sequence is indicated in blue. The PALB2 fragment that was deleted at the insertion breakpoints is indicated in bold. The primer sequences used for Sanger sequencing confirmation are underlined. C Gel electrophoresis of PALB2 exon 9 polymerase chain reaction (PCR) products from one control individual (C) and the patient (P) showing an additional band in the P sample compared to the C sample that corresponds to the allele with the insertion. D Schematic representation of the LINE type L2 element inserted into exon 9 of PALB2 and Sanger sequencing of the PCR product shown in C that confirmed the in silico rebuilt sequence.

Discussion

MEI are known to cause genetic disease, but detecting them in the context of routine diagnosis remains a challenge. We used a dedicated MEI detection pipeline to analyze targeted NGS data generated from the routine molecular diagnosis of 359 patients. We were able to identify a pathogenic MEI in exon 9 of the PALB2 gene, suggesting that this type of variant may be responsible for cancers in some high-risk families for whom a pathogenic variant has not yet been identified. PALB2 (partner and localizer of BRCA2; also known as FANCN) was originally identified as a BRCA2-interacting protein that mediates BRCA2 recruitment to DNA damage sites and is therefore essential for BRCA2 function in double-strand break repair by homologous recombination [14]. Subsequently, it was also shown to interact with BRCA1, acting as a bridge between these two proteins [15]. PALB2 bi-allelic germline loss-of-function variants cause Fanconi anemia [16], whereas mono-allelic loss-of-function variants are associated with an increased risk of breast cancer [17]. The frequency of PALB2 pathogenic variants varies among populations, ranging from 0.73% to 3.40%. This is the first report of an MEI in this gene, even though it is known that large rearrangements can occur when initial breakpoints occur near Alu elements, and Alu-related tandem duplications have been reported in PALB2 [18, 19].

This case demonstrates that, even if MEI account for only a small proportion of pathogenic variants in HBOC syndrome, detecting these events could resolve some cases for which no molecular diagnosis was available. Identification of a genetic cause is essential for clinical management and genetic counseling in HBOC families. This study highlights that MEI detection in routine diagnosis, using a dedicated pipeline that could be easily integrated into current diagnostic pipelines, could improve molecular diagnostic yield and the assessment of the prevalence of MEI in human disease.