Introduction

Most mammalian genes consist of multiple exons interspersed with long intronic sequences [1]. To create mature mRNA, the introns must be correctly identified and ‘spliced out’, and the exons joined together [1]. The spliceosome, the splicing machinery responsible for this process, recognises conserved motifs at or near the intron ends and a branch site within the intron [1]. Splice regulatory elements (SREs) near exon–intron boundaries, which are bound by proteins such as serine–arginine rich (SR) or heterogeneous nuclear ribonucleoprotein (hnRNP) proteins, are indispensable for correct splice site identification [2]. These elements can enhance or repress splicing and play an important role in alternative splicing [1, 2].

Of the variants that cause disease, 15–60% are proposed to disrupt splicing [3]. These include variants in the canonical splice site, which directly alter splice site efficiency, but also intronic and exonic variants that alter the SREs or result in the creation of a new splice site or activation of a cryptic splice site [3, 4]. The latter could result in inclusion of a pseudoexon, an intronic sequence wrongly interpreted as an exon. Exon skipping or inclusion of a pseudoexon often shifts the open reading frame, resulting in a premature stop codon or a non-functional or less functional protein. Nonsense-mediated decay (NMD), in which mRNAs with a premature termination codon are (partially) degraded, can remove aberrant mRNAs encoding truncated proteins, ensuring mRNA quality [5].

Current assessment of whether variants result in aberrant RNA transcripts often consists of in silico prediction with bioinformatics tools, sometimes followed by reverse transcriptase polymerase chain reaction (RT-PCR) analysis of RNA extracted from blood [6,7,8] or functional splicing reporter minigene assays [4, 9]. Assessing alternative splicing using RNA-seq data has been described, but until now this has only been shown to be feasible with high-quality RNA [10, 11]. Additionally, splicing microarrays can be used for large-scale identification of splicing differences but are not always implemented in current diagnostics [3]. Although analysis of high-quality patient RNA is usually preferred, this RNA is not always available, or the analysis is hampered by degradation of aberrant transcripts through NMD [1].

With the current high-throughput sequencing methods applied in molecular tumour diagnostics, many variants are found, most of which are variants of uncertain significance (VUS); hence, functional tests are required to classify these variants. A proportion of these unclassified variants is predicted to affect splicing. Specific kits are available to isolate RNA from formalin-fixed paraffin-embedded (FFPE) tissue blocks, and previous studies show that PCR, RT-PCR and even next-generation sequencing (NGS) are possible on these RNA samples [12, 13]. Analysis of RNA isolated from FFPE tissue is currently not routinely performed but would enable the analysis of somatic splice site variants.

In the current study, the effect of splice site variants was examined in multiple cancer susceptibility genes, MLH1, MSH2, MSH6, APC and BRCA1. MLH1, MSH2 and MSH6 are part of the mismatch repair (MMR) pathway. Pathogenic heterozygous germ line variants in the MMR genes cause Lynch Syndrome, an autosomal dominant predisposition for colorectal, endometrial and other cancers [14]. Other known causes of MMR deficiency are somatic MLH1 promoter hypermethylation and the recently described biallelic somatic inactivation of the MMR genes caused by somatic variants [15,16,17]. Pathogenic germ line variants in the APC gene are known to result in familial adenomatous polyposis, a dominant disorder characterised by the occurrence of hundreds to even thousands of adenomas throughout the colon [18, 19]. In a small percentage of patients, the tumour phenotype can be explained by mosaic APC variants [20,21,22]. These variants can be easily detected by screening multiple adenomas, because the APC variant is present with a higher variant allele frequency in the tumour [20, 23]. Pathogenic variants in the BRCA1 gene, a key player in the homologous recombination repair pathway, result in a high susceptibility to breast and ovarian cancers [24, 25]. Because BRCA-mutation status affects treatment strategies (PARP-inhibitors), the ability to detect and functionally assess both germ line and somatic mutations in BRCA1 and BRCA2 must increase [26, 27]. With the shift towards increased diagnostic screening of tumour tissue for all three syndromes, more somatic variants are found, which require functional tests to assess their pathogenicity.

The aim of our study was to investigate the possibility of analysing RNA isolated from the FFPE tissue to assess the effect of germ line and somatic variants predicted to affect splicing. We hypothesised that formalin fixation could inhibit RNA degradation, enabling the detection of aberrant RNA in FFPE tissues.

Materials and methods

Selection of variants

In total, 13 variants in the cancer susceptibility genes MLH1, MSH2, MSH6, APC and BRCA1 were tested for their effect on splicing (Supplemental Table 1). Of all variants, eight were somatic variants found between 2014 and 2017 with NGS in a previous study [28] or through molecular tumour diagnostic NGS, with all having a variant allele frequency of at least 12%. Five were germ line splice site variants, all previously demonstrated to result in aberrant RNA (Supplemental Table 1). The MLH1 c.(453+1_454-1)_(545+1_546-1)del, a germ line genomic exon 6 deletion, was added as a positive control.

Total nucleic acid isolation and cDNA synthesis

For 11 variants, total nucleic acid was obtained from tissue cores punched from FFPE blocks embedded between 2000 and 2016. Tumour areas were marked on a hematoxylin and eosin-stained slide by a pathologist. Tissue cores from the corresponding area on the FFPE block were punched with a 0.6-mm biopsy needle. Total nucleic acid (undivided RNA/DNA) was isolated from the obtained punches and microdissected areas with a Tissue Preparation System with VERSANT Tissue Preparation Reagents (Siemens Healthcare Diagnostics, Tarrytown, NY, USA) as previously described [29]. For two variants (MLH1 c.791-1G>C and MSH2 c.1511-2A>G), no FFPE tissue was available, but Epstein-Barr virus (EBV)-immortalised B-cells were cultured. Additionally, RNA from the MSH2 c.1511-2A>G B-cell line was isolated after incubating the cells with 4% formalin for 5 h. RNA from the B-cell lines and three colorectal cancer cell lines (SW480, SW837 and LS180) was isolated using a Nucleospin RNA isolation kit (Macherey-Nagel-06/2015, Rev.17, Düren, Germany) according to the manufacturer’s protocol. Colorectal cancer cell lines were used as a positive control for RNA expression.

All cDNA was synthesised from total nucleic acid using oligo(dT) and random primers with the SuperScript VILO cDNA Synthesis Kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s protocol. To show that RNA was successfully isolated and cDNA was synthesised, expression of the housekeeping genes HNRNPM and/or CPSF6 was assessed. In addition, expression of the tested genes (MLH1, MSH2, MSH6, APC and BRCA1) was assessed using qPCR primers.

Splice site prediction/variant nomenclature

For (canonical) splice site prediction, Alamut (Interactive Biosoftware, Rouen, France) was used. This software package includes the in silico splice site prediction algorithms SpliceSite Finder (SSF), MaxEntScan (MES) (http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html), NNSPLICE (http://www.fruitfly.org/seq_tools/splice.html) and Human Splicing Finder (HSF, http://www.umd.be/HSF/). Variants were annotated according to the Human Genome Variation Society (HGVS) guidelines. Following the HGVS recommendations, the term “variant” was used instead of “mutation” (http://www.hgvs.org/mutnomen/recs.html). The following GenBank reference sequences were used: NM_000249.2/NG_007109.2 for MLH1, NM_000251.2/NG_007110.2 for MSH2, NM_000179.2/NG_007111.1 for MSH6, NM_000038.5/NG_008481.4 for APC and NM_007294.3/NG_005905.2 for BRCA1. All variants have been submitted to the LOVD databases. The MMR and APC variants were submitted to www.insight-database.org (patient IDs 00033694-00033800 and 00033702-00033705), and the BRCA1 variants were added to https://databases.lovd.nl/shared (patient IDs 00144415 and 00144416).
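To make the HGVS position notation concrete, the following minimal Python sketch classifies a single-nucleotide substitution by its location relative to the exon boundary. It is an illustration only, not part of the study pipeline and not a full HGVS parser: the regular expression and the function name `classify` are assumptions introduced here, and notations such as deletions are deliberately reported as unsupported. A positive intronic offset counts from the last nucleotide of the upstream exon (donor side) and a negative offset counts back from the first nucleotide of the downstream exon (acceptor side).

```python
import re

# Simplified pattern for single-nucleotide substitutions in HGVS coding (c.)
# notation, e.g. "c.791-1G>C" or "c.1548G>A". Hypothetical helper for
# illustration only; it does not cover the full HGVS grammar.
SUBSTITUTION = re.compile(r"^c\.(\d+)([+-]\d+)?([ACGT])>([ACGT])$")


def classify(variant: str) -> str:
    """Describe where a substitution lies relative to the exon boundary."""
    match = SUBSTITUTION.match(variant)
    if match is None:
        return "unsupported notation"
    offset = match.group(2)
    if offset is None:
        # No intronic offset: the position lies within the exon.
        return "exonic"
    distance = int(offset)
    site = "donor" if distance > 0 else "acceptor"
    if abs(distance) <= 2:
        # Positions +1/+2 and -1/-2 form the canonical splice dinucleotides.
        return f"canonical {site} site ({offset})"
    return f"intronic, near {site} site ({offset})"


if __name__ == "__main__":
    for name in ("c.791-1G>C", "c.834+2T>A", "c.212+3A>T",
                 "c.213-12A>G", "c.1548G>A"):
        print(name, "->", classify(name))
```

An exonic result does not rule out a splice effect: a substitution at the last nucleotide of an exon can still weaken the adjacent donor site, which this position-only sketch cannot tell.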

Primer design and PCR

For all variants with a (predicted) exon skip, two primer pairs were created to amplify two exon–exon boundaries. Primers were used in three combinations: Forward1/Reverse1, Forward2/Reverse2 and Forward1/Reverse2. For all variants with a partial exon skip or pseudoexon insertion as (predicted) RNA effect, primers were designed to amplify the exon–exon boundary. All primer sequences are listed in Supplemental Table 2.
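The logic of the three primer combinations can be sketched in a few lines of Python. The layout below is an assumption made for illustration: Forward1 lies in the upstream exon, Reverse1 and Forward2 lie in the exon predicted to be skipped, and Reverse2 lies in the downstream exon. The exon lengths and primer positions are hypothetical and are not taken from Supplemental Table 2.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical exon lengths (nt); exon "B" is the exon predicted to be skipped.
EXON_LENGTHS = {"A": 120, "B": 94, "C": 150}

# A primer is (exon, offset): for a forward primer the 0-based offset of the
# first amplified base, for a reverse primer the offset of the last one.
Primer = Tuple[str, int]
F1: Primer = ("A", 80)
R1: Primer = ("B", 40)
F2: Primer = ("B", 50)
R2: Primer = ("C", 60)


def amplicon_size(transcript: Sequence[str], forward: Primer,
                  reverse: Primer) -> Optional[int]:
    """Return the amplicon length in nt, or None if a primer cannot bind."""
    starts = {}
    position = 0
    for exon in transcript:
        starts[exon] = position
        position += EXON_LENGTHS[exon]
    f_exon, f_offset = forward
    r_exon, r_offset = reverse
    if f_exon not in starts or r_exon not in starts:
        return None  # the exon carrying this primer site is absent
    return (starts[r_exon] + r_offset) - (starts[f_exon] + f_offset) + 1


if __name__ == "__main__":
    wild_type = ["A", "B", "C"]
    exon_skip = ["A", "C"]
    for label, fwd, rev in (("F1/R1", F1, R1), ("F2/R2", F2, R2),
                            ("F1/R2", F1, R2)):
        print(label, amplicon_size(wild_type, fwd, rev),
              amplicon_size(exon_skip, fwd, rev))
```

Under these assumptions, Forward1/Reverse1 and Forward2/Reverse2 yield a product only from transcripts that retain the exon, whereas Forward1/Reverse2 amplifies both transcripts and the skipped transcript gives a product shorter by exactly the length of the skipped exon.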

Real-time PCR was used to amplify the exon–exon boundaries and to assess the expression of the affected gene. All PCR reactions were performed on a CFX96 Touch Real-Time PCR machine (Bio-Rad, Hercules, CA, USA) with the following PCR program: 95 °C for 5 min (1 cycle), 95 °C for 15 s, 60 °C for 30 s and 72 °C for 30 s (38 cycles), followed by a melt curve from 65 °C to 95 °C with a 0.5 °C increment for 5 s with plate read. When no PCR product was detected, PCR was repeated with a higher cDNA input and 44 instead of 38 cycles. Because of limited cDNA, only F1/R2 was repeated for variants with two primer pairs when the first PCR failed. All PCR products were analysed on a Qiaxcel capillary electrophoresis system (Qiagen, Hilden, Germany) and sequenced with Sanger sequencing.
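As a rough check on the cycling program, the short Python sketch below adds up the programmed hold times for the 38-cycle and 44-cycle runs and counts the melt-curve reads. It is arithmetic on the values stated above, not instrument software: ramp times and plate-read overhead are ignored, and the sketch assumes that both melt-curve end points (65 °C and 95 °C) are read. All names are hypothetical.

```python
# Programmed hold times in seconds, taken from the PCR program in the text.
INITIAL_DENATURATION_S = 5 * 60      # 95 °C for 5 min, 1 cycle
CYCLE_STEPS_S = (15, 30, 30)         # 95 °C, 60 °C and 72 °C per cycle
MELT_HOLD_S = 5                      # hold per melt-curve increment


def cycling_hold_time(cycles: int) -> int:
    """Total programmed hold time (s) for denaturation plus cycling."""
    return INITIAL_DENATURATION_S + cycles * sum(CYCLE_STEPS_S)


def melt_curve_reads(start: float = 65.0, stop: float = 95.0,
                     increment: float = 0.5) -> int:
    """Number of plate reads, assuming both end points are included."""
    return int(round((stop - start) / increment)) + 1


if __name__ == "__main__":
    for cycles in (38, 44):
        print(cycles, "cycles:", cycling_hold_time(cycles) / 60, "min")
    print("melt curve:", melt_curve_reads(), "reads,",
          melt_curve_reads() * MELT_HOLD_S, "s")
```

By this count the repeat run with 44 cycles adds 7.5 min of programmed hold time to the 52.5 min of the standard 38-cycle run, before the melt curve.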

Results

MMR variants

Five MMR splice variants and one MLH1 genomic exon deletion in RNA isolated from FFPE tissue were analysed for their effect on splicing (Table 1). Total nucleic acid was isolated from the FFPE blocks and converted to cDNA using oligo(dT) and random primers. The quality of cDNA was evaluated by detecting the expression of housekeeping genes (HKG) HNRNPM and/or CPSF6. From five FFPE tissue blocks, HKG expression was detected, and in three of the five, cDNA from the affected MMR gene could be amplified and analysed (MLH1 c.(453+1_454-1)_(545+1_546-1)del, MLH1 c.2104-1G>C and MSH6 c.3801+1_3801+5del). The amplified products of the three MMR cDNAs from FFPE tissues were measured with Qiaxcel (Fig. 1a) and sequenced (Fig. 1b). Size determination of the cDNAs carrying the MLH1 c.(453+1_454-1)_(545+1_546-1)del and MSH6 c.3801+1_3801+5del variants showed only a product smaller than that of the wildtype (WT) control, whereas MLH1 c.2104-1G>C only showed a product comparable in size with that of the WT product. Sequencing showed an aberrant product in two of the three FFPE samples: the MLH1 genomic exon 6 deletion (r.454_545del) and an r.3647_3801 deletion in the MSH6 c.3801+1_3801+5del sample (Fig. 1b), whereas for MLH1 c.2104-1G>C, sequencing was normal, as was that for the WT control.

Table 1 Results of splicing assay

Fig. 1 Size determination and sequencing results. Qiaxcel results showing the size of the MMR variants (a), the APC variants (c) and BRCA1 variants (e) of patient material (pat), cell lines (Cell) and control RNA isolated from colorectal cell lines (C+). The MSH2 cell line was analysed with (Cell+) and without (Cell−) formalin fixation of the cells. The BRCA1 variants were analysed with the same primers, pat1 shows the BRCA1 c.212+3A>T and pat2 shows the BRCA1 c.213-12A>G. b, d and f Sanger sequencing results of variants showing aberrant products

Additionally, RNA isolated from the two EBV-immortalised B-cell lines carrying an MSH2 c.1511-2A>G and an MLH1 c.791-1G>C splice site variant was tested. Size determination showed two products for the MLH1 c.791-1G>C sample, one of approximately 350 bp (comparable with the WT control) and one of 250 bp, and a single product comparable in size with that of the WT for the MSH2 c.1511-2A>G sample. Sequencing detected an aberrant product only in the MLH1 c.791-1G>C sample, which showed an r.791_884 deletion. To mimic FFPE conditions, 4% formalin was added to fix EBV-immortalised B-cells carrying the MSH2 variant. After fixation, RNA was isolated, and cDNA was synthesised following the same protocol as that for the non-formalin-fixed cells. cDNA from the formalin-fixed cells was tested and showed a size comparable with that of the WT (Fig. 1a), and an aberrant product was detected with Sanger sequencing (Fig. 1b).

APC variants

Three APC variants were analysed for their effect on splicing (Table 1). RNA was successfully isolated from all three FFPE blocks, shown by the detection of HKG expression. From all three samples, cDNA from APC could be amplified (Fig. 1c) and sequenced (Fig. 1d). Compared with the control without the variant, the difference in size of APC c.1548G>A was almost 125 bp. Sequencing showed an aberrant product for all three variants: an APC r.830_834 deletion in the APC c.834+2T>A sample; a one-nucleotide deletion (r.1959del) in the APC c.1959-1G>A sample; and an r.1409_1548 deletion in the APC c.1548G>A sample (Fig. 1d).

BRCA1 variants

Two BRCA1 splice site variants were analysed for their effect on splicing (Table 1). RNA was successfully isolated from both FFPE blocks (shown by HKG detection), and BRCA1 cDNA was amplified and analysed. PCR was performed with the same primers for both variants. Size determination indicated a smaller (BRCA1 c.212+3A>T) and a slightly larger (BRCA1 c.213-12A>G) product compared with that of the WT control (C+, Fig. 1e). Sequencing showed a BRCA1 r.191_212 deletion (BRCA1 c.212+3A>T) and an inclusion of the last 11 nucleotides of intron 5 corresponding to the BRCA1 c.213-11_c.213-1 sequence (BRCA1 c.213-12A>G, Fig. 1f).
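Whether a transcript-level change shifts the reading frame follows from its length: a deletion or insertion that is not a multiple of three nucleotides moves the downstream codons out of frame. The Python sketch below applies this check to the r. coordinates reported in the Results. It is an illustrative calculation, not part of the study's analysis; the dictionary and function names are hypothetical, and the check says nothing about where a premature stop codon arises or whether NMD is triggered.

```python
# Deletions reported in the Results, as inclusive r. coordinates (first, last).
DELETIONS = {
    "MLH1 r.454_545del": (454, 545),
    "MSH6 r.3647_3801del": (3647, 3801),
    "MLH1 r.791_884del": (791, 884),
    "APC r.830_834del": (830, 834),
    "APC r.1959del": (1959, 1959),
    "APC r.1409_1548del": (1409, 1548),
    "BRCA1 r.191_212del": (191, 212),
}

# Insertion reported in the Results: the last 11 nucleotides of BRCA1 intron 5.
INSERTIONS = {"BRCA1 c.213-11_c.213-1 inclusion": 11}


def deleted_length(first: int, last: int) -> int:
    """Number of nucleotides removed by an inclusive deletion range."""
    return last - first + 1


def shifts_frame(length: int) -> bool:
    """True if a gain or loss of this many nucleotides breaks the frame."""
    return length % 3 != 0


if __name__ == "__main__":
    for name, (first, last) in DELETIONS.items():
        n = deleted_length(first, last)
        print(f"{name}: {n} nt deleted, frameshift={shifts_frame(n)}")
    for name, n in INSERTIONS.items():
        print(f"{name}: {n} nt inserted, frameshift={shifts_frame(n)}")
```

None of the reported lengths is a multiple of three, so each of these changes is expected to shift the reading frame.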

Discussion

We performed RNA analysis for six splice site variants known to result in aberrant splicing and five variants predicted to result in aberrant splicing using RNA isolated from FFPE tissues. For the six variants shown previously to result in aberrant splicing, the reported splice effect was confirmed for four. For the other two variants, RNA analysis was not possible because either RNA isolation from FFPE tissue failed (no expression of HKG) or the affected gene (i.e., MLH1) did not show expression in the presence of positive HKG expression. In all four confirmed splice effects, no WT product was identified. For two variants (MLH1 c.(453+1_454-1)_(545+1_546-1)del and BRCA1 c.213-12A>G), this result was expected because of the high variant allele frequencies (approximately 100%, as detected with Sanger sequencing, and 93%, respectively). The APC c.1548G>A and MSH6 c.3801+1_3801+5del had variant allele frequencies of 28% and 50%, respectively, but only aberrant product was detected, which could be due to preferential amplification of the smaller (aberrant) product or possible FFPE-induced RNA degradation. For the APC variant predicted to result in an exon 11 skip, the F2/R2 primers that amplified the boundary of exon 11–exon 12 produced a product, which showed that WT cDNA product with exon 11 was present.

For two variants, MLH1 c.791-1G>C and MSH2 c.1511-2A>G, RNA was isolated from EBV-immortalised B-cell lines because the FFPE tissue was not available. Both were previously shown to result in a (partial) exon skip [4, 30,31,32], which changed the reading frame and led to a premature stop codon; however, in the current study, aberrant splicing was only confirmed for one of the two variants (MLH1 c.791-1G>C), whereas only WT transcript was detected for the MSH2 c.1511-2A>G. The B-cell lines were cultured without NMD inhibitors, and we hypothesised that the aberrant MSH2 RNA was possibly degraded through NMD. Detection of the aberrant MLH1 c.791-1G>C RNA and no detection of the aberrant MSH2 c.1511-2A>G RNA is consistent with previous studies in which NMD inhibitors were omitted [33, 34]. Notably, the aberrant RNA was detected after formalin fixation of the B-cells carrying the MSH2 c.1511-2A>G, which is consistent with our hypothesis that aberrant RNA in the FFPE tissue can be detected because formalin fixation inhibits the degradation of RNA.

For the five variants predicted to affect the canonical splice site, three resulted in aberrant splicing, showing the predicted splice effect. For the other two, in one (MLH1 c.2104-1G>C), only the WT transcript was detected, and in the other, no expression of the affected gene (MLH1) was detected in the presence of normal HKG expression. The APC c.834+2T>A and APC c.1959-1G>A showed WT and aberrant product, which can be explained by the variant allele frequencies (12% and 40%, respectively). The BRCA1 c.212+3A>T had a variant allele frequency of 54%, although only the aberrant product was detected, which was possibly due to preferential amplification of the smaller (aberrant) product or because no RNA expression of the WT allele occurred. However, FFPE-related RNA degradation could not be excluded.

Two of the five variants predicted to affect splicing (MLH1 c.885-2A>G and APC c.1959-1G>A) have previously been described to be (likely) pathogenic, even though the functional effect was unknown [35, 36]. The remaining three variants (MLH1 c.2104-1G>C, APC c.834+2T>A and BRCA1 c.212+3A>T) are novel, although the BRCA1 c.212+3A>G variant has been described to result in aberrant splicing. Aberrant RNA was detected in three variants predicted to affect splicing, while RNA could not be assessed for the pathogenic MLH1 c.885-2A>G variant. The RNA effect seen matched the predicted effect, supporting the view that in silico splice prediction tools are reliable, particularly for variants disrupting the canonical splice site, although experimental analysis is still required [4, 7, 37]. For two novel variants, this study supports their pathogenic effect.

Interestingly, no aberrant splicing was shown for the pathogenic MLH1 c.2104-1G>C variant, possibly because the splice effect falls outside the amplification window. This illustrates the limitations of the current study. First, RNA from FFPE tissue blocks is often degraded to fragments less than ~300 bases in length [38]. To analyse this RNA, small amplicons must be designed instead of large amplicons containing multiple exons, as is preferred in leucocyte RNA testing. The design of the primers is very specific for the variant and is based on splicing prediction software. Although these algorithms are described as accurate for variants in the canonical splice site [4, 7, 8], aberrant products that are not predicted and fall outside the amplification window of the assay will not be detected. Second, when an aberrant product is not detected, this can reflect poor RNA quality, a wrongly predicted effect or the absence of a splice effect, but also lack of expression or complete degradation of the mutant product by NMD. Therefore, the assay can confirm aberrant splicing, but a negative result does not show whether aberrant splicing is truly absent or whether the lack of aberrant product is due to other factors. Negative results should be confirmed with other methods, such as a minigene splicing assay [4, 9]. Further evidence for FFPE-based RNA analysis should be obtained from a larger study with more samples and variants. Targeted RNA-seq of the RNA isolated from the FFPE tissue might enable high-throughput analysis of somatic splice site variants. Finally, possibly because of a size-related amplification bias, the reaction seems to favour the aberrant splice product. Leaky splicing (a partial splice defect in which normal RNA is still expressed) might therefore remain unnoticed.

FFPE blocks were collected from the archives and were embedded between 2000 and 2016, with most (n = 9) embedded after 2009. Notably, from the two samples embedded before 2009, no RNA was successfully isolated from the FFPE block, indicating that isolating RNA from the FFPE tissue might not be possible in older blocks. From eight of the nine blocks embedded after 2009, RNA was successfully isolated, showing WT and/or aberrant RNA. Combined with the formalin fixation results from the cell lines, this suggests that formalin fixation possibly inhibits RNA degradation. With the analysis of the RNA from the FFPE tissues, the splicing effect of somatic variants that are only present in the tumour can be analysed. Furthermore, available material can be retrospectively analysed, without having to request leucocyte RNA, which is not always available.

RNA analysis on total nucleic acid isolated from FFPE tissue blocks is a valuable tool for fast and easy detection of aberrant splicing, offering additional support for the pathogenicity of a (predicted) splice variant. While the current study focuses on MMR, APC and BRCA1 gene variants, this method could be applied to splice variants in other genes as well. With this assay, we correctly showed the splice effect of six known splice site variants and showed the splice effect of three variants predicted to affect splicing. This assay can be used to analyse somatic variants found in FFPE tumour tissue, with formalin fixation possibly inhibiting RNA degradation, and can be easily implemented in current molecular tumour diagnostics to help classify the high number of variants of uncertain significance currently found with high-throughput sequencing.