Genome sequencing-based discovery of a novel deep intronic APC pathogenic variant causing exonization

Familial adenomatous polyposis (FAP) is a hereditary cancer syndrome caused by germline mutations in the APC gene. Despite a clear clinical diagnosis of FAP, a proportion of APC variants are not readily detectable by conventional genotyping routines. We performed genome sequencing in a duo of the disease-affected proband and a non-affected sibling, followed by in silico predictions and a series of RNA-based assays to clarify variant functionality. By prioritizing variants obtained by genome sequencing, we discovered the novel deep intronic alteration APC:c.531 + 1482 A > G, which was demonstrated to cause out-of-frame exonization of 56 base pairs from intron 5 of the gene. Further cDNA assays confirmed that the aberrant splicing event was complete and that its splice product was subject to nonsense-mediated decay. Co-segregation was observed between variant carrier status and the disease phenotype. Cumulative evidence confirmed that APC:c.531 + 1482 A > G is a pathogenic variant causative of the disease.


INTRODUCTION
Familial adenomatous polyposis (FAP) is an autosomal dominantly inherited polyposis-type colorectal cancer syndrome, accounting for 0.5-1% of all colorectal cancer cases [1]. FAP shows a strong genotype correlation with the susceptibility locus adenomatous polyposis coli (APC), with a germline mutation detection rate of 60-90% [2,3]. The missing heritability can largely be explained by APC mutation types that fall outside the scope of routine genotyping methods. Such mutations include promoter variants [4], copy-neutral inversions [2], and deep intronic variants [5]. Nevertheless, detecting the causative genetic element may be critical for selecting appropriate therapies and preventive measures and for the genetic counseling of affected family members. Here, we describe a mutation detection history that uncovered a novel deep intronic APC pathogenic variant in a polyposis family meeting the clinical criteria of FAP. Our investigation demonstrates the utility of whole genome sequencing (WGS) for the discovery of mutations beyond exonic regions.

Family selection and germline genetic screening
A classical FAP family with adenomatous polyposis affecting multiple members over three generations was referred for genetic analysis to the Department of Molecular Genetics, National Institute of Oncology, Hungary. DNA was extracted from blood leukocytes using the Gentra DNA Blood extraction Kit (Qiagen, Hilden, Germany). Bidirectional Sanger sequencing of the coding exons and exon-intron boundaries of the APC gene and multiplex ligation-dependent probe amplification (MLPA) testing for copy number variants with the P043-E1 kit (MRC-Holland, The Netherlands) were performed. Whole exome sequencing was done as previously described [6,7] (see Supplementary Methods). Whole genome sequencing (WGS) was designed as a duo of the disease-affected proband and his non-affected brother (details in Supplementary Methods). The resulting variant list was prioritized for colon polyposis genes [8], and only changes with a frequency below 0.01 in the gnomAD database that were present exclusively in the affected family member were considered. All participants gave written informed consent for the genetic testing.
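The duo prioritization described above can be sketched as a simple filter. This is a minimal illustration, not the pipeline used in the study: the `Variant` record, the `POLYPOSIS_GENES` set and the example records other than the APC variant are hypothetical, and the real gene list is the one defined in ref. [8].

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical gene set for illustration; the study used the list from ref. [8].
POLYPOSIS_GENES = {"APC", "MUTYH"}
# Frequency cut-off stated in the text (gnomAD allele frequency below 0.01).
MAX_GNOMAD_AF = 0.01


@dataclass(frozen=True)
class Variant:
    gene: str
    hgvs: str
    gnomad_af: Optional[float]  # None means the variant is absent from gnomAD
    in_proband: bool
    in_unaffected_sibling: bool


def passes_duo_filter(v: Variant) -> bool:
    """Keep rare variants in polyposis genes seen only in the affected proband."""
    in_gene = v.gene in POLYPOSIS_GENES
    rare = v.gnomad_af is None or v.gnomad_af < MAX_GNOMAD_AF
    proband_only = v.in_proband and not v.in_unaffected_sibling
    return in_gene and rare and proband_only


def prioritize(variants: List[Variant]) -> List[Variant]:
    return [v for v in variants if passes_duo_filter(v)]


# Example records: only the first reflects the variant reported in this paper;
# the other three are invented placeholders that each fail one criterion.
calls = [
    Variant("APC", "c.531+1482A>G", None, True, False),
    Variant("APC", "hypothetical_common", 0.25, True, False),
    Variant("APC", "hypothetical_shared", 0.001, True, True),
    Variant("GENE_X", "hypothetical_offtarget", None, True, False),
]
kept = prioritize(calls)
print([v.hgvs for v in kept])
```

Treating a variant that is absent from gnomAD as rare is a design choice made here so that novel variants are not discarded by the frequency filter.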

RNA-based testing
Total RNA was isolated from peripheral blood using the Tempus Spin RNA Isolation Kit (ThermoFisher Scientific). RNA was reverse transcribed with the Protoscript II First Strand Synthesis Kit (New England Biolabs, MA, USA) using random hexamers. PCR reactions from the cDNA template (RT-PCR) were carried out using the Multiplex PCR Kit (Qiagen). RT-PCR products were visualized by electrophoresis on 1.5% agarose gels and on DNA1000 chips (Agilent Technologies, CA, USA). The allelic imbalance test and the test for completeness of aberrant splicing were performed as previously reported [10]. Fluorescent fragment analysis was performed as follows: 24 cycles of duplex PCR were carried out with the Multiplex PCR Kit (Qiagen) using two forward primers, designed to selectively amplify the normal and the aberrant transcript, respectively, paired with a FAM-labelled reverse primer in one reaction. The PCR products were subjected to capillary electrophoresis on an ABI3500 Genetic Analyzer (ThermoFisher Scientific) and visualized with Gene Mapper Software v.6. (Primers are listed in Supplementary Table).

Genotyping
A family with a clinical diagnosis suggestive of classic FAP was enrolled for germline genetic testing. Several family members spanning 3 generations were affected with polyposis-type colorectal cancer, with 10-300 adenomatous polyps at diagnosis, thus meeting the requirements for the syndrome and being eligible for genetic counseling according to the relevant guidelines [11]. Age of onset of the disease ranged from 18 to 53 years (pedigree is shown in Fig. 1A). Genomic DNA from the proband (II/1) was tested by Sanger sequencing for all coding exons and exon-intron boundaries of the APC gene. MLPA testing for large copy number variations of the gene was also performed. No pathogenic variant or large deletion/duplication was identified. Subsequently, whole exome sequencing (WES) was carried out to screen for potentially deleterious variants in established polyposis genes [8]. Again, no clinically relevant variant was recognized. Subject II/1 then underwent WGS, the most comprehensive type of genotyping, along with his unaffected sibling (II/2), who had not developed the disease by age 55. Variants with adequate quality metrics called within the polyposis genes were listed and filtered for rare variants (<0.01 in the gnomAD database), and only those present in the proband but absent from the healthy relative were evaluated (Fig. 1B). Annotation with in silico splice prediction tools revealed a deep intronic variant with a potential splicing effect: (hg19) chr5:112112916 A > G; NM_000038.5(APC):c.531 + 1482 A > G (prioritization workflow is shown in Fig. 1C). Confirmatory Sanger sequencing was performed (Fig. 1D). This variant was novel: it was not registered in the Variant Database of Hereditary Gastrointestinal Tumors (http://insight-database.org/genes/APC) and was absent from large population screening databases (gnomAD https://gnomad.broadinstitute.org, 1000 Genomes https://www.internationalgenome.org or ExAC http://exac.broadinstitute.org).
The SpliceAI algorithm predicted (score: 0.93) that this base change activates a splice donor site five nucleotides upstream. Human Splice Finder indicated that the variant affected several intronic splice silencer motifs (Supplementary Figure). The variant tested positive in two additional family members (II/4 and III/4), both affected with the disease, whereas it was not present in the healthy family member II/2 (current age: 55 years).
RNA-based testing
RNA was obtained from both the proband (II/1) and another affected relative (III/4) who was also heterozygous for the variant. RT-PCR from the cDNA was carried out with primers designed for the exons surrounding the variant. Gel electrophoresis demonstrated the presence of the same aberrant product in both carriers (Fig. 2A). Bidirectional Sanger sequencing confirmed that it was a 56-bp exonization from intron 5 (Fig. 2B), resulting from the activation of a pre-existing cryptic GT donor site five nucleotides upstream of the variant (Fig. 2C). The activated donor site recruited a nearby AG dinucleotide as a functional acceptor site, and together these established an extra exon from the intronic sequence (NM_000038.5(APC):r.531_532ins531 + 1422_531 + 1477).
The pseudoexon was out-of-frame, and the inferred translated product was p.(Phe178Ilefs*14). The exclusive presence of the aberrantly spliced product in variant carriers was confirmed by fluorescent fragment analysis (Fig. 2D).
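The frame and donor-site arithmetic behind these statements can be checked directly from the reported RNA-level description. The sketch below assumes only the coordinates given in the text; the helper `pseudoexon_span` is a hypothetical convenience function, not part of any HGVS library.

```python
import re
from typing import Tuple


def pseudoexon_span(r_notation: str) -> Tuple[int, int]:
    """Return the intronic offsets (start, end) from an 'ins531+A_531+B' description."""
    compact = r_notation.replace(" ", "")
    m = re.search(r"ins(\d+)\+(\d+)_(\d+)\+(\d+)", compact)
    if m is None:
        raise ValueError(f"unsupported description: {r_notation}")
    anchor_a, start, anchor_b, end = (int(g) for g in m.groups())
    if anchor_a != anchor_b or end < start:
        raise ValueError("inconsistent intronic coordinates")
    return start, end


# RNA-level description as reported in the text.
start, end = pseudoexon_span("r.531_532ins531 + 1422_531 + 1477")

# Inclusive length of the exonized segment.
length = end - start + 1
# A length that is not a multiple of 3 shifts the reading frame.
frame_remainder = length % 3
out_of_frame = frame_remainder != 0

# Distance from the last pseudoexon base to the variant at c.531+1482.
variant_offset = 1482
variant_to_junction = variant_offset - end

print(length, frame_remainder, out_of_frame, variant_to_junction)
```

The inclusive span from c.531+1422 to c.531+1477 is 56 nucleotides, which leaves a remainder of 2 when divided by 3, so inclusion of the pseudoexon shifts the reading frame. The variant lies five nucleotides downstream of the last pseudoexon base, in agreement with the donor site being described as five nucleotides upstream of the variant.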
The possible nonsense-mediated decay (NMD) effect on the aberrant splice product was tested by allelic imbalance analysis using heterozygous exonic variant positions as markers. Only the proband (II/1) was eligible for this test, because he was heterozygous for c.1458 T > C (rs2229992) and c.1635 G > A (rs351771). The relative allelic area-under-the-curve ratios measured at the heterozygous positions were 0.57 and 0.56, respectively (Fig. 2E), indicating that the transcript carrying the minor alleles was reduced by about half. Family member III/4 was genotyped as homozygous for both polymorphic variants, providing indirect evidence that the c.531 + 1482 A > G variant is in phase with the minor alleles of these markers. This is consistent with the allelic imbalance result: the allele that also carries c.531 + 1482 A > G was detectable in lower quantity.
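The allelic imbalance readout can be illustrated with a short calculation. This is a sketch under a stated assumption: the relative ratio is taken to be the minor-to-major peak-area ratio in cDNA normalized to the same ratio in genomic DNA, which is a common approach but is not spelled out in the text (the exact procedure is that of ref. [10]). The peak areas below are invented for illustration; only the ratios 0.57 and 0.56 come from the results.

```python
def relative_allelic_ratio(cdna_minor: float, cdna_major: float,
                           gdna_minor: float, gdna_major: float) -> float:
    """Minor/major peak-area ratio in cDNA, normalized to the same ratio in gDNA.

    Assumed normalization; a value near 1 means balanced expression and a
    value below 1 means the minor-allele transcript is under-represented.
    """
    for area in (cdna_minor, cdna_major, gdna_minor, gdna_major):
        if area <= 0:
            raise ValueError("peak areas must be positive")
    return (cdna_minor / cdna_major) / (gdna_minor / gdna_major)


def fractional_reduction(ratio: float) -> float:
    """Fraction by which the minor-allele transcript is reduced."""
    return 1.0 - ratio


# Hypothetical peak areas chosen to reproduce a ratio of 0.57.
example_ratio = relative_allelic_ratio(570.0, 1000.0, 980.0, 980.0)

# Ratios reported in the text for rs2229992 and rs351771.
reported = {"rs2229992": 0.57, "rs351771": 0.56}
reductions = {snp: round(fractional_reduction(r), 2) for snp, r in reported.items()}
print(round(example_ratio, 2), reductions)
```

Under this reading, the reported ratios correspond to reductions of 0.43 and 0.44 in the transcript carrying the minor alleles, that is, a partial rather than complete loss of the variant-carrying transcript.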
Next, we addressed the completeness of the aberrant splicing: long RT-PCR was performed using primers designed to amplify only the normal-length wild-type transcript, and the allelic composition of the product was assessed by examining the tagging positions. Sanger sequencing showed the exclusive presence of the major alleles at both tagging polymorphic positions, rs2229992 and rs351771 (Fig. 2F), indicating that the normal transcript contained no contribution from the variant-carrying allele. This outcome confirmed the completeness of the aberrant splicing event.

DISCUSSION
The selected family showed obvious clinical features of classical FAP, and the appearance of the disease in several generations conclusively indicated the presence of a heritable factor and ruled out mosaicism [12]. However, routine genetic testing failed to detect a causative germline susceptibility variant in the APC gene. Therefore, we performed two further rounds of sequencing with increasing comprehensiveness. WES likewise did not reveal possible pathogenic mutations in the various polyposis genes. Finally, WGS resolved the conundrum of the disease inheritance. Computational prioritization of the WGS variants highlighted a novel deep intronic alteration, NM_000038.5:c.531 + 1482 A > G. Functional characterization at the cDNA level revealed that the variant elicited an out-of-frame exonization. The aberrant splicing event was complete, which is critical in terms of pathogenicity [13]. The aberrant transcript was present only in variant carriers, and the variant co-segregated with the disease. The variant-containing transcript was partially degraded, presumably by NMD. Based on the summed score of these pieces of evidence (score: 12), this genetic alteration was judged to be a genuine pathogenic variant according to the ACMG guidelines [9]. Diverse pseudoexon formations due to deep intronic variants have been described in the APC gene by others [14,15]; most of them were infrequent findings reported only in single families. Approximately 15-20% of all germline APC pathogenic variants arise de novo [16], so isolated and diverse pathogenic hits can be attributed to independent germline mutation events in this gene. So far, WGS-based discovery of causative mutations has been reported in various clinical syndromes with strong phenotype-genotype correlations [17-20]. APC whole-gene sequencing, or even WGS, is especially warranted in multigenerational FAP families when panel sequencing results are negative. Comparing the variants of affected and non-affected family members aids clinical evaluation.
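The step from a summed evidence score to a classification can be sketched as follows. This assumes that the score of 12 refers to the point-based adaptation of the ACMG/AMP framework, in which evidence strengths are converted to points and the total is mapped to a category; the thresholds used here are those of that point system as we understand it and are an assumption, since the text does not list the individual criteria or the scoring table applied.

```python
def classify_by_points(points: int) -> str:
    """Map a summed evidence score to a category (assumed point-based thresholds)."""
    if points >= 10:
        return "Pathogenic"
    if points >= 6:
        return "Likely pathogenic"
    if points >= 0:
        return "Uncertain significance"
    if points >= -6:
        return "Likely benign"
    return "Benign"


# Summed score reported in the text for c.531+1482A>G.
reported_score = 12
classification = classify_by_points(reported_score)
print(classification)
```

Under these assumed thresholds, a score of 12 exceeds the cut-off for the pathogenic category, in line with the classification reported above.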