Introduction

Since the identification of germline inactivating variants within the BRCA1 and BRCA2 genes that are implicated in hereditary breast and ovarian cancer (HBOC),1, 2 numerous other genes have also been found to be involved in this syndrome. Pathogenic variants within genes like TP53, PTEN, STK11 and CDH1 are associated with an increased risk of breast cancer, as well as other cancers and/or pathologies, defining the Li-Fraumeni syndrome, Cowden syndrome, Peutz-Jeghers syndrome and hereditary diffuse gastric cancer, respectively.3, 4, 5, 6 In these syndromes, the clinical phenotype points more readily to the gene involved in the disease. Pathogenic variants in the PALB2 gene have a risk spectrum for developing breast cancer similar to that of BRCA2. Pathogenic variants in RAD51C and in BRIP1 confer an increased risk of ovarian cancer,7, 8 whereas pathogenic variants of CHEK2 are associated with moderate risks of breast cancer.9 Finally, variants of other genes, including BARD1, RAD51B, RAD51D, XRCC2 and XRCC3, are associated with a low risk of breast and/or ovarian cancer, but their contribution and their penetrance remain to be characterized.10, 11, 12, 13, 14

Analysis of gene panels by next-generation sequencing (NGS) has resulted in the detection of a large number of new variants of unknown significance (VUS).15 However, current genetic counselling practice only considers variants if they directly affect protein structure; VUS are mostly ignored simply because their potentially deleterious character has not yet been confirmed. One way VUS can be deleterious is if they modify RNA splicing.16 Splicing depends on highly redundant but specific sequences in the gene’s pre-mRNA, such as 3′ and 5′ splice site (3′ss and 5′ss) consensus sequences and splicing regulatory elements.17, 18 VUS can directly impact these important sequences.19, 20 Most exons are constitutively included in transcripts, but a few are alternatively included in a regulated manner, and both alternative and constitutive splicing can be disturbed by splicing variants.19 The aberrant transcript that results can then contribute to carcinogenesis, as can mis-regulation of expression levels between transcripts.21, 22 Often the effect of the splicing variant is partial and normal transcript continues to be expressed by the mutated allele at varying levels.23, 24 In HBOC genes, it is estimated that one-third of potentially deleterious variants impact mRNA splicing.15 Therefore, studying how VUS impact gene expression at the RNA level in HBOC-related genes is a major means of identifying new molecular alterations that could be used for genetic counselling. However, the complexity of splicing regulation, mentioned above, makes the interpretation of VUS effects difficult. Moreover, while the splicing pattern of BRCA1 and BRCA2 has been extensively characterized,25, 26 the splicing pattern of the other HBOC-related genes has been less studied and the lack of a reference set of common splicing junctions currently hinders the interpretation of splicing VUS.

Targeted high-throughput sequencing of mRNA (targeted RNA-Seq) is a powerful method for detection of any alternative or abnormal transcripts.27 Here we chose a method based on the capture of exons28 without designing specific baits of known exon–exon junctions, to target a selected panel of transcripts of interest. In parallel, we developed a specific quantitative and qualitative bioinformatics and biostatistics pipeline to analyse transcripts and splicing variants in targeted RNA-Seq data. This pipeline enabled the identification of new alternative and abnormal junctions. For the first time, we describe, using this targeted RNA-Seq method, the splicing pattern of 11 HBOC-related genes: BARD1, BRCA1, BRCA2, BRIP1, CHEK2, PALB2, RAD51B, RAD51C, RAD51D, XRCC2 and XRCC3.

Patients and methods

Gene nomenclature

Nucleotide numbering of the transcripts was based on the cDNA sequences denoting c.1 as the first nucleotide of the translation initiation codon, according to the Human Genome Variation Society recommendations. Descriptions containing intronic positions were based on a genomic reference sequence. The NCBI accession numbers of the sequences used in this study are listed as follows: BARD1 (NG_012047.2 and NM_000465.3), BRCA1 (NG_005905.2 and NM_007294.3), BRCA2 (NG_012772.3 and NM_000059.3), BRIP1 (NG_007409.2 and NM_032043.2), CHEK2 (NG_008150.1 and NM_001005735.1), PALB2 (NG_007406.1 and NM_024675.3), RAD51B (NG_023267.2 and NM_133510.3), RAD51C (NG_023199.1 and NM_058216.2), RAD51D (NG_031858.1 and NM_002878.3), XRCC2 (NG_027988.1 and NM_005431.1) and XRCC3 (NG_011516.1 and NM_005432.3).

Biological material

Total RNA from eight patient lymphoblastoid cell lines (LCLs) was used as a positive control for the validation of the assay. These patients were carriers of variants causing abnormal splicing in BRCA1, BRCA2 or RAD51C: BRCA1 c.4675G>C (https://www.ncbi.nlm.nih.gov/clinvar/ SCV000538190); BRCA2 c.39−1G>A (SCV000536676); BRCA2 c.156_157insAlu (SCV000538191); BRCA2 c.475+3A>G (SCV000538192); BRCA2 c.7975A>G (SCV000538193); BRCA2 c.9501+3A>T (SCV000538194); RAD51C c.706−2A>G (SCV000536677); RAD51C c.1026+5_1026+7del (SCV000536678). Each patient carried a splicing variant in only one gene, so each was used as a negative control for the other 10 non-mutated genes. In addition, total RNA from one voluntary healthy donor was used as an independent negative control, as was total RNA from one healthy breast tissue sample obtained after cosmetic contralateral surgery in a breast cancer patient with no HBOC familial history. Total RNA from 15 patient LCLs was also studied. These patients with suspicion of HBOC syndrome, who had previously tested negative for a BRCA1/2 pathogenic variant, were selected based on a predisposition probability higher than 90% according to the Claus model.29 All subjects gave informed consent for genetic analysis, and the study was approved by the French Biomedicine Agency.

RNA extraction

LCLs were established for patients and controls. Total RNA was extracted using either TRIzol reagent (Invitrogen, Carlsbad, CA, USA), the AllPrep DNA/RNA Kit (Qiagen, Courtaboeuf, France) or the NucleoSpin RNA Kit (Macherey-Nagel, Hoerdt, France), according to the manufacturer’s instructions. Quality of all RNAs was assessed on the 2200 TapeStation (Agilent, Santa Clara, CA, USA) by the RNA integrity number. For all samples RNA integrity number was >7.5.

Sample preparation and targeted enrichment for NGS

We used Agilent eArray (SureDesign; Agilent) to design 120 nucleotide SureSelect solution library baits that target all known exons of the 11 genes of interest (see Supplementary Data). Fifty nucleotides of intron surrounding exons were also covered by baits, to allow bait design for all small exons up to 60 nucleotides. There were no baits spanning exon–exon junctions, to avoid bias for the enrichment of already known exon–exon junctions. The enrichment for targets of interest was performed on 1 μg of total RNA using the SureSelect RNA Reagent Kit, ILM (Agilent) according to the manufacturer’s instructions. Libraries were then sequenced on a NextSeq500 (Illumina, San Diego, CA, USA) using the high-output paired-end 2 × 101 bps program, with 16 to 18 samples per run.
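The effect of the 50-nucleotide intronic padding can be illustrated with a short sketch. A 60-nucleotide exon alone is shorter than a 120-nucleotide bait, whereas the padded target spans 160 nucleotides. The code below is only an illustration of this interval arithmetic, not the SureDesign algorithm; the function name `pad_and_merge` and the exon coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def pad_and_merge(exons: List[Interval], flank: int = 50) -> List[Interval]:
    """Extend each exon by `flank` nt on both sides, then merge overlapping
    or directly adjacent padded intervals into single capture targets."""
    padded = sorted((max(1, start - flank), end + flank) for start, end in exons)
    merged: List[Interval] = []
    for start, end in padded:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps (or touches) the previous target: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical exon coordinates (not taken from any of the 11 genes).
exons = [(1000, 1059), (1120, 1300), (5000, 5200)]
targets = pad_and_merge(exons)
for start, end in targets:
    print(start, end, end - start + 1)
```

In this toy example the first two exons are separated by a 60-nucleotide intron, so their padded regions fuse into one target, which is the expected behaviour when short introns separate neighbouring exons.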

Bioinformatics and biostatistics pipeline for the analysis of RNA-Seq data

We developed a specific bioinformatics pipeline for quantitative and qualitative analysis of targeted RNA-Seq data (Supplementary Figure S1). RNA-Seq reads were aligned to the human reference genome (hg19) using STAR v.2.3.0e with a splice junction database consisting of junctions from UCSC known genes and RefSeq.30 Splice junction read coverage was obtained from the STAR output file, together with SAMtools v.1.3 and BEDTools v.2.17.0, to derive counts for known or unknown splice junctions.31, 32 The HTseq-count v.0.6.1 program was used to count RNA-Seq reads by gene, with the ‘union’ mode option.33 Gene expression was normalized by DESeq2 v.2.1.6.0.34 We used BEDTools and in-house scripts (available upon request from GenoSplice technology) for read counts and junction annotations. Only junctions with 100 reads or more in at least one sample per run were analysed; those whose coverage was below this threshold were considered as background sequencing noise or related to very weakly expressed spliced transcripts. However, with this method, every aberrant junction, even if detected in only one sample, was selected for analysis. The Δ6q, 7 transcript in BRCA2, which contains an exon 6-derived TG dinucleotide between flanking exons 5 and 8, was detected and analysed using an in-house pattern-search script (available upon request from GenoSplice technology). We classified detected junctions into different types of splicing events: cryptic exon inclusion, exon skipping and use of intronic/exonic cryptic splice site. For each junction and each individual, we considered the ratio of alternative/constitutive junction counts. Thereafter, these ratios were expressed as percentages. We considered only spliced junctions that were expressed at more than 0.1%.
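The two thresholds described above (at least 100 reads in at least one sample, and an alternative/constitutive ratio above 0.1%) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scripts: the exact formulae depend on the event type (Figure 1), the 0.1% threshold is applied per sample here for simplicity, and all names and read counts are hypothetical.

```python
from typing import Dict, Optional

MIN_READS = 100      # read threshold stated in the text
MIN_PERCENT = 0.1    # expression threshold (percent) stated in the text


def passes_read_filter(counts_by_sample: Dict[str, int]) -> bool:
    """Keep a junction if at least one sample of the run has >= MIN_READS reads."""
    return any(count >= MIN_READS for count in counts_by_sample.values())


def junction_percent(alt_reads: int, constitutive_reads: int) -> Optional[float]:
    """Alternative/constitutive junction ratio expressed as a percentage.
    Returns None when the constitutive junction has no reads (ratio undefined)."""
    if constitutive_reads == 0:
        return None
    return 100.0 * alt_reads / constitutive_reads


# Hypothetical read counts for one alternative junction and its
# constitutive counterpart in three samples of the same run.
alt = {"S1": 12, "S2": 150, "S3": 40}
const = {"S1": 20000, "S2": 25000, "S3": 16000}

kept: Dict[str, float] = {}
if passes_read_filter(alt):
    for sample in alt:
        pct = junction_percent(alt[sample], const[sample])
        if pct is not None and pct > MIN_PERCENT:
            kept[sample] = pct

print(kept)
```

Here the junction passes the read filter because sample S2 reaches 150 reads, after which the percentage threshold is evaluated sample by sample.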

We established the splicing pattern of each captured gene (see formulae in Figure 1) with a calculation method described in Supplementary Information. First, for each junction, we applied a procedure to eliminate outliers that might be abnormal or overexpressed junctions. Then we modelled the distribution of percentages of junction reads according to a gamma distribution or a negative binomial distribution (see Supplementary Information for further details). Given this model of the percentages of junction reads, we computed for each percentage the probability that the value belonged to the distribution. Given the ith event and a random variable Ri distributed according to the model (gamma or negative binomial distribution), the probability that a percentage (p) belongs to the distribution is defined as P(Ri>p). A P-value <5% was then considered significant.
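The tail probability P(Ri>p) under a gamma model can be sketched with the standard library alone. The fitting procedure used in the study is described in its Supplementary Information and is not reproduced here; the sketch below assumes a simple method-of-moments fit, omits the outlier-removal step and any multiple-testing correction, and uses hypothetical control percentages. The function names are illustrative.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def fit_gamma_moments(percentages: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments gamma fit (an assumption): returns (shape, scale)."""
    m = mean(percentages)
    v = variance(percentages)  # sample variance
    return m * m / v, v / m


def _lower_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) by series (used for x < a + 1)."""
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > abs(total) * 1e-16 and n < 10000:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def _upper_fraction(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) by continued fraction
    (modified Lentz method, used for x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * h


def gamma_sf(p: float, shape: float, scale: float) -> float:
    """P(R > p) for R following a gamma(shape, scale) distribution."""
    if p <= 0:
        return 1.0
    x = p / scale
    if x < shape + 1.0:
        return 1.0 - _lower_series(shape, x)
    return _upper_fraction(shape, x)


# Hypothetical control percentages for one junction, and one tested value.
controls = [0.05, 0.10, 0.12, 0.16, 0.20, 0.31]
shape, scale = fit_gamma_moments(controls)
p_value = gamma_sf(13.4, shape, scale)
print(shape, scale, p_value, p_value < 0.05)
```

Splitting the computation between a series and a continued fraction keeps the tail probability numerically stable for strongly overexpressed junctions, where the tested percentage lies far above the control distribution.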

Figure 1

Splicing junctions and calculation of the percentage of junction reads. We classified each detected junction and calculated the percentage of junction reads using the formulae on the right. Exons are represented by black boxes, constitutive junctions are represented by thick black lines and alternative junctions are represented by thin grey lines. The calculation method is described in Supplementary Information. Types of splicing modifications considered are: (a) cryptic exon inclusion; (b) exon skipping; (c) multiple exon skipping; (d) splice intronic donor shift; (e) splice intronic acceptor shift; (f) splice exonic donor shift and (g) splice exonic acceptor shift.

Alternative spliced transcripts nomenclature

The nomenclature of alternative spliced transcripts in this study follows the convention used by Colombo et al25 and Fackenthal et al,26 and the Human Genome Variation Society guidelines. The letter delta (Δ) indicates an alternative event resulting from single exon skipping. Commas or dashes indicate events resulting in the skipping of two, or more than two, contiguous exons, respectively. Events involving a shift of the 3′ss (proximal) or 5′ss (distal) sites are indicated with p or q, respectively, as in the acceptor shift Δ4p and the donor shift Δ15q. Cryptic alternative 3′ss or 5′ss uses within introns are indicated as Xp or Xq, where X represents the exon number. Cryptic exon inclusion is indicated as XA.

Validation of new alternative spliced transcripts

Reverse transcriptase PCRs (RT-PCRs) were performed from 200 ng of total RNA using the Onestep RT-PCR Kit (Qiagen) as described previously.35 Two strategies were used depending on the frequency of the alternative spliced transcript tested. For highly expressed alternative spliced transcripts, RT-PCRs were performed with primers flanking the spliced region. To detect very weakly expressed alternative spliced transcripts, we chose primers overlapping the splice junction, with 1–4 nucleotides' overlap (sequences available upon request).
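The junction-overlapping primer strategy can be sketched as a simple string operation. The primer sequences used in the study are available only upon request, so the sketch below rests on assumptions: it builds a forward primer whose 3′ end extends 1 to 4 nucleotides into the downstream exon, and the function name, primer length and exon sequences are all hypothetical. Melting temperature and specificity checks are left out.

```python
def junction_primer(upstream_exon: str, downstream_exon: str,
                    overlap: int = 3, length: int = 20) -> str:
    """Forward primer spanning an exon-exon junction: the last `overlap`
    nucleotides come from the downstream exon, the rest from the 3' end
    of the upstream exon."""
    if not 1 <= overlap <= 4:
        raise ValueError("overlap must be between 1 and 4 nucleotides")
    upstream_part = length - overlap
    if upstream_part < 1 or upstream_part > len(upstream_exon):
        raise ValueError("upstream exon too short for the requested primer")
    if overlap > len(downstream_exon):
        raise ValueError("downstream exon too short for the requested overlap")
    return upstream_exon[-upstream_part:] + downstream_exon[:overlap]


# Hypothetical exon sequences (not from any of the 11 genes).
upstream = "ACGTACGTACGTACGTACGTAAAA"
downstream = "CCCGGGTTT"
primer = junction_primer(upstream, downstream, overlap=3, length=20)
print(primer)
```

With such a short overlap, the primer can only be extended efficiently when the two exons are actually joined in the template, which is what makes this design useful for very weakly expressed junctions.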

Results

Description of the splicing patterns of the 11 captured HBOC genes

We used a tailored exon capture enrichment strategy to focus on the RNA splicing junctions of 11 HBOC genes: BARD1, BRCA1, BRCA2, BRIP1, CHEK2, PALB2, RAD51B, RAD51C, RAD51D, XRCC2 and XRCC3. On average, for each run, we generated 900 million sequence reads, with at least 50 million reads per sample. Coverage of all the small and large exons of interest was 100%, with a sequencing depth between 700x and 305 000x (data not shown). First, the global splicing pattern of these genes was determined using all LCL samples, by considering all alternative spliced transcripts with a percentage of alternative/normal junction reads of more than 0.1% and excluding outlier events (see Patients and Methods) (Supplementary Table S1).

To verify in more detail whether our method could reliably detect alternative splicing, we compared BRCA1 junctions detected by our method, in controls, with those described previously by Colombo et al.25 All of the 10 alternative spliced transcripts previously classified as predominant were detected, as well as the majority of the very weakly expressed alternative spliced transcripts (Supplementary Table S2). Transcripts containing alternative terminal exons (eg, BRCA1-IRIS),36 resulting from the use of alternative transcription termination sites, could not be detected in our analyses because they do not create an additional junction. We confirmed the presence of the newly discovered transcripts by RT-PCR. These include three inclusions of cryptic exons (2Aa (0.95%), 2Ab (2.1%) and 2B (0.28%)) and two splice donor shifts (Δ15q (0.16%) and 16q (0.24%)) (Supplementary Figure S2). Of the 38 spliced transcripts detected in the BRCA1 gene by RNA-Seq, 13 (34%) generate premature stop codons (Supplementary Table S2). The 17 most frequent BRCA1 spliced transcripts defined according to our method are represented in Figure 2a.

Figure 2

Schematic representation of the main ubiquitously expressed alternative splicing events in 11 HBOC-related genes, BRCA1, BRCA2, RAD51C, PALB2, BARD1, RAD51B, BRIP1, XRCC2, CHEK2, XRCC3, RAD51D (a-k respectively). Summary of splicing events detected by targeted RNA-Seq in the current study. Genes are represented in grey, boxes correspond to exons and horizontal lines correspond to introns. Exons and introns are not drawn to scale. For each alternative junction, we calculated the percentage of junction reads and established the splicing pattern of each captured gene (see Figure 1 and Supplementary Information for the calculation method). All the splicing events shown here were detected with a percentage of junction reads >1%. Junctions expressed more than 10% are represented in bold, junctions expressed between 5 and 10% are represented by dotted lines and junctions expressed <5% are represented with thin lines. The exhaustive list of the splicing events detected in all captured genes is available in the Supplementary Table S1.

For BRCA2, we performed the same comparison of detected junctions between our method and those recently described by Fackenthal et al.26 As for BRCA1, all of the alternative spliced transcripts previously classified as predominant were detected, as well as the majority of the very weakly expressed alternative spliced transcripts (Supplementary Table S3). We confirmed nine newly discovered splicing events in BRCA2 by RT-PCR: five cryptic exon inclusions (18A (0.31%), 20B (0.22%), 24A (0.7%), 24B (3.1%) and 25A (0.81%)), three exon-skipping events (Δ7 (0.09%), Δ(9,10) (0.14%) and Δ17_19 (0.04%)) and one splice acceptor shift (Δ4p (1.2%)) (Supplementary Figures S3 and S4). The new exon skipping Δ17_19 was the most weakly expressed event to be detected by RNA-Seq and confirmed by RT-PCR. Of the 27 spliced transcripts detected in the BRCA2 gene by RNA-Seq, 15 (56%) generate premature stop codons (Supplementary Table S3). The most frequent BRCA2 spliced transcripts defined according to our method are represented in Figure 2b.

These results confirmed that our approach was able to identify known and new alternative spliced transcripts in BRCA1 and BRCA2. In the same way, we described the main splicing pattern of the other nine genes studied (Figure 2). Our targeted RNA-Seq method could detect all splicing species (cryptic exon inclusion, single or multiple exon(s) skipping, splice donor/acceptor shifts). All splicing junctions detected by RNA-Seq for these 11 genes are listed in the Supplementary Table S1. Next, we performed a comprehensive screen of alternative spliced transcripts in one healthy breast tissue sample. Interestingly, according to statistical analysis, we qualitatively and quantitatively detected the same well-represented spliced transcripts previously identified in these 11 genes in LCLs (Supplementary Table S4).

Detection of splicing alterations in captured HBOC genes

Here, we analysed RNA extracted from eight LCLs with known splicing variants in BRCA1, BRCA2 and RAD51C (Table 1). All specific transcript anomalies were detected in these specific LCLs (see final column in Table 1), thus validating our technique.

Table 1 BRCA1, BRCA2 and RAD51C splicing variants carried by patients used in the current study showing detection of aberrant splices

For the BRCA1 c.4675G>C pathogenic variant (Table 1), located in the last position of exon 15, a total effect on splicing has been shown by RT-PCR (Supplementary Figure S5). Analysing the junction data, we observed, as expected, a slight strengthening of the out-of-frame skipping of exon 15 (Δ15) (1.3% against 0.15±0.11%, corrected P-value <0.01) and a strong reinforcement of the out-of-frame deletion of the last 11 nucleotides of BRCA1 exon 15 (Δ15q) (13.4% against 0.16±0.13%, corrected P-value <0.001) (Figure 3a and Supplementary Table S5). When visualizing the mapped RNA-Seq sequences at position c.4675, only the nucleotide G corresponding to the wild-type allele was observed, confirming the drastic splicing defect (Figure 3b).

Figure 3

Impact of the BRCA1 c.4675G>C pathogenic variant on the splicing pattern of BRCA1 exon 15 in lymphoblastoid cell lines of a heterozygous HBOC patient. (a) Cartography of the percentage of junction reads observed between BRCA1 exons 14 and 16. Bold lines represent the junctions 14–15 and 15–16. Thin lines represent the alternative junctions 14–16, corresponding to exon 15 skipping. The abnormally enhanced junction Δ15q detected in the cell line carrying the variant is shown below and represents the out-of-frame deletion of the last 11 nucleotides of BRCA1 exon 15. (b) Coverage, sequencing depth and sequence observed at the variant positions.

For the BRCA2 c.39−1G>A pathogenic variant (Table 1), located in intron 1, a total effect on splicing has also been identified by RT-PCR (Supplementary Figure S5). Analysing the junction data, we observed, as expected, the aberrant skipping of exon 2 (Δ2) (20.9%) and aberrant combined skipping of exons 2 and 3 (Δ2, 3) (4.8%). These two spliced transcripts were not detected in the controls (Supplementary Table S1). Both these events cause loss of the translation initiation codon of BRCA2, which is naturally located in exon 2. The LCL carrying this variant also carried the heterozygous single-nucleotide polymorphism c.−26G>A. When visualizing the mapped RNA-Seq sequences at position c.−26, only the nucleotide G was observed, confirming the drastic splicing defect (data not shown).

In the same manner, our junction data confirmed that the BRCA2 c.156_157insAlu, which is a Portuguese founder pathogenic variant,37 causes the strong reinforcement of the in-frame skipping of exon 3 (Δ3) (46% against 1.2±0.9%, corrected P-value <0.001) (Table 1 and Supplementary Table S5).

Another total effect on splicing was observed for the BRCA2 c.475+3A>G pathogenic variant, which causes the out-of-frame skipping of exon 5, as shown in our RT-PCR experiments (Table 1 and Supplementary Figure S5). Our targeted RNA-Seq data confirmed the strong exon 5 skipping (Δ5) (94% against 2.5±2%, corrected P-value <0.001) (Supplementary Table S5).

The BRCA2 c.7975A>G unclassified variation, located in the penultimate position of exon 17, caused the moderate in-frame skipping of exon 17 (Δ17) (3% against undetected here), as expected38 (Table 1 and Supplementary Table S5). When visualizing the mapped RNA-Seq sequences at position c.7975, the two alleles were observed, which confirms the partial effect of this variant on splicing (data not shown).

Another partial effect on splicing was observed for the BRCA2 c.9501+3A>T unclassified variation, located in intron 25, which causes moderate (2.6%) frameshift skipping of exon 25 (Table 1).23 This spliced transcript was not detected in the controls (Supplementary Table S1).

The RAD51C c.1026+5_1026+7del (intron 8) and the RAD51C c.706−2A>G (intron 4) pathogenic variants caused the total frameshift skipping of exon 8 and in-frame skipping of exon 5, respectively11 (Table 1). These two exon skips were detected as very weak spliced transcripts in controls (Supplementary Table S1). For the first pathogenic variant, we correctly detected the exon 8 skipping (Δ8) (17.9% against 0.08±0.05%, corrected P-value <0.001). For the second pathogenic variant, we found that spliced transcript without exon 5 was overexpressed (Δ5) (29.2% against 0.03±0.03%, corrected P-value <0.001), but there were two other spliced transcripts, including overexpression of the frameshifting combined skipping of exons 4 and 5 (Δ4, 5) (3.6% against 0.04±0.05%, corrected P-value <0.001), as well as the expression of an abnormal frameshifting combined skipping of exons 5, 6 and 7 (Δ5_7) (0.48%) (Figure 4a and Supplementary Table S5). We confirmed these two aberrant spliced transcripts by RT-PCR with primers located in exons 2 and 8 (Figure 4b). We also noticed the presence of a weakly expressed additional transcript corresponding to the combination of the skipping of exons 5 and 7 (Δ5, Δ7) caused by the mutated allele. Moreover, we detected a new splice junction, corresponding to an alternative acceptor splice site, created de novo by the variant and recognized by the splicing machinery (Δ5p) (Figure 4a). Overall, these results confirm the detection and quantification of known and new mRNA transcripts in HBOC-relevant genes by our targeted RNA-Seq approach.

Figure 4

Impact of the RAD51C c.706−2A>G pathogenic variant on the splicing of RAD51C exon 5 in LCLs of an HBOC patient, heterozygous for this variation. (a) RAD51C exon 5 cartography of the abnormal splicing junctions detected in LCLs from a patient with the splicing variant c.706−2A>G (Table 1). The gene is represented by boxes corresponding to exons and horizontal lines corresponding to introns. Exons and introns are not drawn to scale. Junctions that are used at over 10% are represented in bold, junctions expressed between 1 and 10% are shown as dotted lines and junctions <1% are thin lines. (b) RT-PCR analysis of the RAD51C variation’s effect on splicing, using primers located in exons 2 and 8. Identities of the different transcripts are indicated on the right of the gel. The Δ7 splice transcript is detected in both control and patient samples, whereas the Δ5, Δ4, 5 and combination of Δ5 and Δ7 (Δ5, Δ7) (in bold) are only detected in the patient.

Discussion

Interpretation of VUS is a major challenge for the laboratories performing the molecular diagnosis of HBOC, especially considering that many genes are now known to be involved in the syndrome. One of the main areas contributing to understanding the functional impact of these variants is an investigation of their effects on RNA splicing to find out if they could lead to aberrant RNAs and consequently to potential loss of function of the proteins. Theoretically, any detected variation can affect RNA splicing.20 Here we developed a targeted RNA-Seq approach with bioinformatics and biostatistics analyses, which allows detection and quantification of splicing junctions in many genes simultaneously, with excellent sensitivity. Our ‘exon-restricted’ capture set designed for 11 HBOC-related genes enables the efficient capture of all exons and the efficient detection of known and new alternative splicing junctions, as well as cryptic exon inclusions. This strategy is applicable to any RNA-Seq platform capable of sequencing at least 5 Gb per sample after targeted capture.

All presented analyses were performed on LCL samples. However, the PAXgene system, which provides a snapshot of the transcripts at the time of sampling, is widely used in laboratories for molecular diagnosis and for studying the effect of a variant on splicing.24 Today this system is even used for RNA collection in the ‘100 000 Genomes Project’39 led by Genomics England for the National Health Service (http://www.genomicsengland.co.uk). In a preliminary step, we nevertheless observed that the PAXgene system does not seem to be suited to RNA-Seq for very weakly expressed HBOC-associated genes, such as BRCA2 (Supplementary Figure S6). Analysis of this sample type would require a technical adjustment, with a decrease in the number of samples sequenced per run and an increase in sequencing capacity. However, these changes would have a significant impact on sequencing costs. Analysis of LCLs from patients, albeit time consuming, overcomes this limitation because in this type of sample the expression levels of all the genes were compatible with splicing analysis by RNA-Seq. However, Epstein–Barr virus transformation, which modifies global gene expression,40 might modify the splicing pattern. These results were obtained by sequencing mRNA from LCLs that we did not treat with a nonsense-mediated mRNA decay inhibitor, such as puromycin. In this way, we detected the most representative splicing pattern of these 11 genes (Supplementary Table S1). Moreover, we detected all expected abnormal spliced transcripts (Table 1). Of note, no outlier event was detected in the 15 patient LCLs (Supplementary Table S5). We assumed that they did not have splicing anomalies that could explain the familial presentation.

For BRCA1 and BRCA2, our method detected all of the most frequent spliced transcripts and most of the minor alternative spliced transcripts described in the literature by fragment analysis25, 26 (Supplementary Tables S1–S3). All the events that we did not detect were known to be very weakly expressed. This difference in sensitivity can be explained by the fact that we do not use a targeted RT-PCR method but a capture of exons to study simultaneously the set of captured transcripts. Targeted RNA-Seq has the advantage of simultaneously analysing the alternative spliced transcripts in many genes with high sensitivity. Our targeted RNA-Seq strategy does not attempt to describe rare splicing junctions since we established a detection limit of 100 reads (Supplementary Tables S1–S5). Importantly, our targeted RNA-Seq method allowed the detection of 14 novel alternative spliced transcripts in BRCA1 and BRCA2, with a percentage of junction reads between 0.04 and 3.1%.

We present here the characterization of the splicing pattern of 11 genes of interest in HBOC. This is the first time that near-exhaustive sets of alternative splicing junctions have been described by RNA-Seq in HBOC-related genes other than BRCA1 and BRCA2, and for these two genes we have increased the described repertoire of alternative spliced transcripts. Altogether, these new results provide an important resource for interpreting the splicing impact of VUS in these genes. Furthermore, our data suggest that predominant alternative splicing in these 11 genes is similar in blood and breast tissues. We conclude that splicing analyses performed on easy-to-obtain blood samples are relevant for diagnosis.

In addition, our bioinformatics and biostatistics pipeline detected all the qualitatively and quantitatively abnormal splice junctions caused by variants that were previously detected by conventional methods in BRCA1, BRCA2 and RAD51C (Table 1). Indeed, we detected both partial and total effects on splicing without clearly distinguishing them in all cases, as our approach gives globally indicative rather than definitive exact quantification. Interestingly, we observed that a variant may cause a series of complex splicing anomalies (Figure 4a). Here we suggest a method for detection of the most representative splicing pattern and aberrant spliced transcripts in 11 HBOC-related genes. The method allows detection of abnormal splicing junctions caused by variants, including potential variants located in deep intronic regions that are far away from those that can be detected by routine exon-centric DNA sequencing. Our method could be included in a global strategy to classify variants for their pathogenicity. Indeed, we could consider our method as a first-line method to detect abnormal splicing in patients. These events have to be confirmed with specific targeted approaches such as minigene assay or quantitative RT-PCR, to characterize the total or partial effect on splicing.23, 41 If the interpretation of RNA studies remains tricky, these results may be included in a multifactorial likelihood model calculating the posterior probability that the variant is pathogenic.42, 43, 44

The proposed strategy for the data analysis could be applied for studying splicing by targeted RNA-Seq for any complex genetic disease.