mRNA analysis identifies deep intronic variants causing Alport syndrome and overcomes the problem of negative results of exome sequencing

Mutations in the COL4A3, COL4A4 and COL4A5 genes cause Alport syndrome (AS). However, exome sequencing does not detect pathogenic variants in some AS patients. The aim of this study was to identify the underlying genetic causes in five unrelated AS probands with negative next-generation sequencing (NGS) results. Urine COL4A3–5 mRNAs were analyzed in the probands with an uncertain inheritance mode of AS, and COL4A5 mRNA from skin fibroblasts was analyzed in the probands with X-linked AS. RT-PCR and direct sequencing were performed to detect mRNA abnormalities. PCR and direct sequencing were then used to analyze the exons and flanking intronic sequences corresponding to the mRNA abnormalities. Six novel deep intronic splicing variants in COL4A4 and COL4A5, which cannot be captured by exome sequencing, were identified in four of the AS probands. One intronic variant caused exon skipping, and five caused retention of an intronic fragment. In the remaining AS proband, COL4A5 variants c.2677+646C>T and r.2678_2767del were detected at the DNA and RNA levels, respectively, although it remains unclear whether c.2677+646C>T causes r.2678_2767del. Our results show that mRNA analysis of AS genes from either urine or skin fibroblasts can resolve the genetic diagnosis in AS patients with negative NGS results. We recommend analyzing COL4A3–5 mRNA from urine as the first choice for these patients because it is feasible and non-invasive.

However, to our knowledge, the value of detecting mutations in COL4A5 and COL4A4 genes in mRNA isolated from urine has not been adequately studied.
The aim of this study was to identify the genetic etiologies of five unrelated AS patients with negative NGS results. Using an approach we developed to analyze the entire coding regions of COL4A3, COL4A4, and COL4A5 mRNAs isolated from urine and of COL4A5 mRNA extracted from cultured skin fibroblasts, we identified deep intronic splicing variants in the enrolled patients. These findings indicate that this approach may help medical practitioners and genetic counselors provide personalized management of AS.

Results
Analysis of urine NPHS2 and COL4A3-5 mRNAs of the control. As shown in Fig. 1A,B, agarose gel electrophoresis showed that the products of two independent RT-PCR assays for NPHS2 mRNA in the control's urine had the expected size, and subsequent sequencing confirmed that the amplified sequence corresponded exactly to 388 bp of the published NPHS2 mRNA sequence (NM_014625.3), indicating that the urine pellets contained podocytes. All ten overlapping fragments covering the entire coding sequence of COL4A3, COL4A4, and COL4A5 mRNA had the expected sizes (Fig. 1C), and sequencing of these RT-PCR products confirmed that the amplified sequences mapped precisely to the published COL4A3 (NM_000091.5), COL4A4 (NM_000092.5), and COL4A5 (NM_000495.5 and NM_033381) mRNA sequences. The original data are available from the first author on request.
Clinical features of AS patients. Five unrelated Alport syndrome probands were enrolled according to the inclusion and exclusion criteria listed in Materials and methods. Patient clinical information and pedigrees are shown in Table 1 and Fig. 2, respectively. Proband 1 was diagnosed with AS on the basis of characteristic ultrastructural GBM features of AS; the inheritance pattern was uncertain because of a negative family history and normal staining of the α5(IV) chain in skin tissue. XLAS was strongly suspected in proband 2 because of a positive family history of hematuria and end stage renal disease (ESRD) and diffuse thinning of the GBM (less than 200 nm by ultrastructural examination of 3 glomeruli) in a renal biopsy of her daughter taken at age 3.25 years in another hospital; however, normal staining of the α5(IV) chain in skin tissue did not support the diagnosis. XLAS was diagnosed in probands 3-5 on the basis of abnormal staining of the α5(IV) chain in skin specimens.
Figure 1. (A,B) Agarose gel electrophoresis and sequences of the products of two independent RT-PCR assays for NPHS2 mRNA in the control's urine; (C) agarose gel electrophoresis of COL4A3, COL4A4, and COL4A5 RT-PCR products. M: DNA molecular mass marker. S1 and S2: products of two independent RT-PCR tests for NPHS2 mRNA. Lanes 1 to 10: ten overlapping PCR products covering the entire cDNA of the selected gene from the control.
Table 1. Clinical features and analyzed samples of the 5 probands in this study. Patients 1 and 4 did not undergo staining of the type IV collagen α5 chain in their renal specimens; hearing loss and ocular changes were not detected in any of the five patients. *Age at which serum creatinine tests were carried out in the four probands was 25, 30, 10 and 14 years, respectively. **Positive family history of hematuria and/or end stage renal disease. EM, electron microscopy; EBM, epithelial basement membrane; NA, not available; ND, not done.
www.nature.com/scientificreports/
Gene variants in the five probands. In proband 1 (II-2 of family 1), no abnormal transcripts were detected in COL4A3 or COL4A5 mRNA isolated from urine (Suppl. Figure 1A and B). Agarose gel electrophoresis showed that COL4A4 mRNA from proband 1's urine was successfully amplified (Fig. 3A), and sequencing of the 10 RT-PCR products revealed heterozygous skipping of exon 3 (r.72_113del, p.Trp24*) and of exon 25 (r.1804_1987del, p.Gly602Valfs*8) (Fig. 3B,D, Suppl. Figure …).
In proband 2 (II-4 of family 2), no abnormal transcripts were detected in COL4A3 or COL4A4 mRNA isolated from urine (Suppl. Figure 1C and D). Agarose gel electrophoresis showed that COL4A5 mRNA from proband 2's urine was successfully amplified (Fig. 4A), and sequencing of the 10 RT-PCR products revealed heterozygous skipping of COL4A5 exon 32 (r.2678_2767del, Suppl. Figure 2D), which leads to an in-frame deletion (p.Thr894_Gly923del) (Fig. 4B). COL4A5 exons 31-33 and the flanking introns 31 and 32 were amplified by PCR from the genomic DNA of proband 2, her husband, and her daughter and then sequenced. Proband 2 and her daughter (III-1) were both heterozygous for two intron 31 variants, c.2677+487C>A and c.2677+646C>T (Fig. 4C,D), neither of which was identified in her husband (II-5). Variant c.2677+487C>A has a frequency of 21.96% in gnomAD and is therefore benign, whereas c.2677+646C>T has not been reported in gnomAD, HGMD, or ClinVar. In addition, haplotype reconstruction supported an X-linked mode of inheritance in family 2 (Fig. 4E).
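The frame status of the exon-skipping deletions above follows from simple coordinate arithmetic. The following is a minimal Python sketch, not part of the study's methods, assuming that r. positions are counted from the A of the ATG start codon (as in HGVS coding numbering); the function name `deletion_effect` is hypothetical. It reports the deletion length, whether the reading frame is kept, and the codon in which the deletion starts. It does not predict the amino acid change itself, which depends on the actual sequence at the new exon junction.

```python
"""Coordinate arithmetic for exon-skipping deletions (illustrative sketch)."""


def deletion_effect(start: int, end: int) -> dict:
    """Summarise a deletion of coding positions start..end (1-based, inclusive).

    Assumes positions are counted from the A of the ATG start codon,
    as in HGVS c./r. numbering for coding transcripts.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    length = end - start + 1
    return {
        "length_nt": length,
        # A deletion keeps the reading frame only if its length is a multiple of 3.
        "in_frame": length % 3 == 0,
        # Codon that contains the first deleted nucleotide.
        "first_codon_touched": (start - 1) // 3 + 1,
        # True when the deletion starts inside a codon, so a new codon is formed
        # across the junction; its identity (e.g. a stop) needs the actual sequence.
        "splits_codon": (start - 1) % 3 != 0,
    }


if __name__ == "__main__":
    # Deletions reported for exon skipping in urine-derived mRNA.
    for label, start, end in [
        ("COL4A4 r.72_113del (exon 3)", 72, 113),
        ("COL4A4 r.1804_1987del (exon 25)", 1804, 1987),
        ("COL4A5 r.2678_2767del (exon 32)", 2678, 2767),
    ]:
        print(label, deletion_effect(start, end))
```

With these coordinates the sketch gives a 184 nt frameshifting deletion starting at codon 602 for COL4A4 exon 25, consistent with p.Gly602Valfs*8, and in-frame deletions of 42 nt and 90 nt for COL4A4 exon 3 and COL4A5 exon 32. The 90 nt deletion removes 30 codons, matching the 30 residues in p.Thr894_Gly923del; it begins inside codon 893, so whether the protein description starts at codon 893 or 894 depends on the residue encoded by the rebuilt junction codon. Likewise, the stop in p.Trp24* arises from the codon rebuilt across the exon 2/exon 4 junction, which the arithmetic can flag (`splits_codon`) but not predict.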
In proband 4 (III-1 of family 4), agarose gel electrophoresis showed that COL4A5 mRNA from skin fibroblasts was successfully amplified (Fig. 5D). Sequencing of the 10 RT-PCR products revealed a 53 bp segment of COL4A5 intron 29 inserted between exons 29 and 30 (r.2395_2396ins[2395+1308_2395+1360]) (Fig. 5E, Suppl. Figure 3B), which causes a frameshift and premature termination of the α5(IV) chain (p.Gly799Alafs*15). COL4A5 intron 29 was amplified by PCR from the genomic DNA of proband 4 and his mother (II-2) and sequenced. Two hemizygous variants were found in proband 4's genomic DNA: a C to G substitution 1275 bp and a G to T substitution 1292 bp downstream of exon 29 (IVS29 c.2395+1275C>G and c.2395+1292G>T) (Fig. 5F). Neither variant was identified in his mother, and neither has been reported in gnomAD, HGMD, or ClinVar.
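The intronic coordinates in proband 4 can be related to each other with a little bookkeeping. Below is a minimal Python sketch, offered as an illustration rather than the authors' procedure; the helper names `parse_intronic` and `pseudo_exon_summary` are hypothetical, and only positions of the form `c.N+k` or `c.N-k` (without the base change) are accepted. Applied to the retained intron 29 segment, it checks that the insertion is 53 nt (not a multiple of three, hence a frameshift), that the junction lies in codon 799, and how far each genomic variant sits from the first retained intronic base. The distances are arithmetic only and say nothing about the splicing mechanism.

```python
import re

# Intronic HGVS c. position, e.g. "c.2395+1308" (downstream of c.2395) or
# "c.2396-20" (upstream of c.2396). Exonic and UTR positions are not handled.
_INTRONIC_POS = re.compile(r"^c\.(\d+)([+-])(\d+)$")


def parse_intronic(pos: str) -> tuple:
    """Return (anchor coding position, signed intronic offset) for pos."""
    match = _INTRONIC_POS.match(pos)
    if match is None:
        raise ValueError(f"not an intronic c. position: {pos!r}")
    anchor, sign, offset = int(match.group(1)), match.group(2), int(match.group(3))
    return anchor, offset if sign == "+" else -offset


def pseudo_exon_summary(first: str, last: str, variants: list) -> dict:
    """Describe an intronic segment retained in mRNA downstream of one exon.

    first/last bound the retained segment and must be '+' positions sharing
    the same anchor; variants are positions in that same intron.
    """
    anchor, start = parse_intronic(first)
    anchor_last, end = parse_intronic(last)
    if anchor != anchor_last or start < 1 or end < start:
        raise ValueError("segment must be c.N+a_c.N+b with 1 <= a <= b")
    length = end - start + 1
    offsets = {}
    for v in variants:
        v_anchor, v_offset = parse_intronic(v)
        if v_anchor != anchor:
            raise ValueError(f"{v} is not described relative to c.{anchor}")
        # Negative values: the variant lies upstream of the retained segment.
        offsets[v] = v_offset - start
    return {
        "inserted_nt": length,
        "frameshift": length % 3 != 0,
        # Codon containing c.<anchor>, the last exonic base before the insertion.
        "codon_at_junction": (anchor - 1) // 3 + 1,
        "offset_from_segment_start": offsets,
    }


if __name__ == "__main__":
    print(pseudo_exon_summary("c.2395+1308", "c.2395+1360",
                              ["c.2395+1275", "c.2395+1292"]))
```

For proband 4 the sketch places c.2395+1275 and c.2395+1292 33 nt and 16 nt upstream of the retained segment, respectively.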
According to the American College of Medical Genetics and Genomics guidelines 16 , all of the rare deep intronic variants described above except COL4A5 variant c.2677+646C>T were classified as pathogenic. Table 2 shows the output of HSF, NNSPLICE, NetGene2, SpliceAI, and MaxEntScan for each rare deep intronic splice variant identified in this study. Only two of the seven variants (COL4A4 c.1623+702T>A and COL4A5 c.609+879A>G) were correctly predicted as deleterious by four tools; three variants were predicted by two tools, and one variant by only one tool. For the remaining variant, COL4A5 c.2677+646C>T, four tools predicted no effect on RNA splicing, whereas skipping of exon 32 was observed at the RNA level.

Discussion
In this study, by analyzing COL4A3-5 mRNAs from urine or skin fibroblasts, we identified six deep intronic pathogenic variants in four unrelated AS patients with negative NGS results; in the remaining AS patient, it is difficult to assess the contribution of COL4A5 variant c.2677+646C>T to the aberrant transcript r.2678_2767del. These findings indicate that our approach may help provide personalized evaluation and care of patients and their families. Moreover, compared with the mRNA-based approach using skin fibroblasts, which finds (likely) pathogenic variants causing XLAS, the urine mRNA method has a clear advantage in identifying the genetic causes of AS with an uncertain inheritance pattern. In addition, this is the first report of compound heterozygous deep intronic splicing variants in the COL4A4 gene in an AS patient. Numerous studies have shown that NGS effectively detects single nucleotide variants and small indels in exons and flanking intronic regions 17 . However, some genetic events, such as deep intronic variants, copy number variants, and somatic mosaicism, may be missed by NGS 18 . Therefore, for a patient with clinically diagnosed or suspected AS in whom NGS detects no pathogenic variants, COL4A3-5 should be further analyzed by mRNA sequencing, chromosome microarray analysis, droplet digital PCR, or other approaches to improve genetic diagnosis 19,20 .
According to the literature and public databases (HGMD and the Leiden Open Variation Database), splicing variants account for 14.9% to 24.5% of pathogenic COL4A5 variants 21,22 . Of these, 70.4% (112/159) occur at consensus splice sites, and only seven are located in introns more than 100 base pairs up- or downstream of the exon-intron junctions. About 72% (23/32) of pathogenic COL4A3 splicing variants occur at consensus splice sites, and only two are located in introns more than 100 base pairs upstream of the exons. No deep intronic COL4A4 splicing variants have been reported to date. These findings suggest that deep intronic COL4A3-5 variants are rare. The six novel deep intronic pathogenic variants identified in the present study extend the mutational spectrum of AS and highlight COL4A3-5 mRNA analysis as an effective complement to NGS in the molecular diagnosis of this disease.
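The "more than 100 base pairs from the exon-intron junction" criterion used above can be applied mechanically to HGVS coding-DNA descriptions, because the number after `+` or `-` is the distance into the intron from the nearest exon boundary. A minimal Python sketch, assuming standard HGVS `c.` notation; the 100 bp threshold comes from the text, while the helper names are hypothetical and 5'/3' UTR positions (`c.-N`, `c.*N`) are not handled.

```python
import re
from typing import Optional

# Threshold used in the text: more than 100 bp from the exon-intron junction.
DEEP_INTRONIC_MIN_BP = 100

# Captures the intronic offset of c. descriptions such as "c.1623+702T>A".
# UTR positions (c.-N, c.*N) do not match and are treated as not intronic.
_OFFSET = re.compile(r"^c\.\d+[+-](\d+)")


def intronic_distance(hgvs_c: str) -> Optional[int]:
    """Distance (bp) from the nearest exon boundary, or None if not intronic."""
    match = _OFFSET.match(hgvs_c)
    return int(match.group(1)) if match else None


def is_deep_intronic(hgvs_c: str, threshold: int = DEEP_INTRONIC_MIN_BP) -> bool:
    """True when the variant lies more than `threshold` bp into an intron."""
    distance = intronic_distance(hgvs_c)
    return distance is not None and distance > threshold


if __name__ == "__main__":
    for variant in ["c.1623+702T>A", "c.609+879A>G", "c.2677+646C>T",
                    "c.2395+1275C>G", "c.2395+1292G>T",
                    "c.2395+1G>A",  # hypothetical canonical splice-site variant
                    "c.2678G>A"]:   # hypothetical exonic variant
        print(variant, intronic_distance(variant), is_deep_intronic(variant))
```

All of the intronic variants named in this study fall well beyond the 100 bp threshold. The two hypothetical controls illustrate that canonical splice-site and exonic variants are excluded.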
Previous studies have reported that GBM collagen α3α4α5(IV) is synthesized solely by podocytes 23 , and that the urinary podocyte detachment rate (assessed by podocin mRNA in urine pellets) is increased in AS patients 24,25 . Therefore, extracting RNA directly from patients' urine may be a valuable approach for analyzing variants in all three Alport genes, as the findings of the present study demonstrate. Previous studies showed that urine-derived podocyte-lineage cells can be used as primary material for identifying variants in known nephropathy genes 26,27 ; compared with that method, our approach of isolating RNA directly from urine is simpler and more practical. A weakness of our approach is that it requires patient cooperation to obtain enough fresh urine, so young patients who cannot rapidly drink 1000-1500 ml of water are not suitable for urine mRNA analysis. In addition, the complexity of intron sequences may hamper the amplification and sequencing needed to detect variants, and compelling evidence is needed to link exon skipping in the causative gene mRNA from patients' urine to the only plausible genomic variant in the candidate region. In family 2, a positive family history of hematuria and ESRD, diffuse thinning of the GBM in the proband's daughter, and co-segregation of the haplotype of three microsatellite markers around COL4A5 with the proband and her affected daughter were important clues to the diagnosis of XLAS. However, the limited set of splicing prediction tools used did not indicate that the only novel rare deep intronic variant in COL4A5 in this family causes the aberrant splicing pattern we observed.
Although pseudo-exon inclusion in the mature mRNA of the causative gene appears to be the major effect of deep intronic variants, exon skipping caused by intronic variants located away from the canonical splice sites has also been reported 28-30 . Given that the deep intronic variants identified in the present study could be detected by whole genome sequencing, and that in silico splicing prediction tools are commonly used to select variants predicted to affect splicing in a molecular diagnostic setting 31 , we assessed the reliability of HSF, NNSPLICE, NetGene2, SpliceAI, and MaxEntScan in discriminating between neutral and pathogenic variants. When a variant was counted as correctly predicted if at least one tool gave a splicing outcome consistent with the transcript analysis, all six pathogenic variants detected in this study were correctly predicted, indicating that these tools are useful for selecting deep intronic variants worth analyzing at the RNA level. However, in silico predictions should be compared extensively with transcript analysis results to determine their benefit in molecular diagnosis.
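How much the "at least one tool" rule matters can be shown with the per-variant counts reported in the Results: of the six pathogenic deep intronic variants, two were predicted by four tools, three by two tools, and one by a single tool. The Python sketch below, an illustration rather than an analysis from the study, computes the fraction of these variants that would be flagged for RNA analysis if at least `k` of the five tools had to agree. The names `TOOLS_CORRECT` and `detection_rate` are hypothetical. Because only pathogenic variants are included, this measures sensitivity alone; specificity cannot be estimated from these data, and c.2677+646C>T is left out because its pathogenicity is unresolved.

```python
# Number of the five tools (HSF, NNSPLICE, NetGene2, SpliceAI, MaxEntScan) that
# predicted a splicing effect for each of the six pathogenic deep intronic
# variants, as summarised in the Results (two by four tools, three by two
# tools, one by one tool). Which variant has which count is not needed here.
TOOLS_CORRECT = [4, 4, 2, 2, 2, 1]
N_TOOLS = 5


def detection_rate(counts: list, min_tools: int) -> float:
    """Fraction of variants flagged when at least `min_tools` tools must agree."""
    if not counts:
        raise ValueError("counts must not be empty")
    flagged = sum(1 for c in counts if c >= min_tools)
    return flagged / len(counts)


if __name__ == "__main__":
    for k in range(1, N_TOOLS + 1):
        print(f">= {k} tool(s): {detection_rate(TOOLS_CORRECT, k):.2f}")
```

Requiring only one tool flags all six variants. Requiring two flags five of six, and requiring four flags only two, which is why a permissive consensus rule suits the pre-selection of variants for RNA analysis.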
In summary, two novel pathogenic COL4A4 variants and four novel pathogenic COL4A5 splicing variants were detected in four unrelated AS patients with negative NGS test results. All identified variants were deep intronic variants. As obtaining urine is feasible and non-invasive, we suggest analyzing COL4A3-5 mRNA from urine as the preferred method for evaluation of patients with clinically diagnosed or suspected AS with negative results of NGS analysis of coding regions.

Materials and methods
All methods were carried out in accordance with relevant guidelines and regulations.
Ethical considerations. The Ethical Committee of Peking University First Hospital approved the procedures in this study.

Patients.
Patients with hematuria or hematuria and proteinuria were enrolled from August 2019 to August 2020 by pediatric nephrologists from the Department of Pediatrics, Peking University First Hospital based on fulfillment of the following two criteria: diagnosed or suspected AS and no pathogenic COL4A3-5 variants identified by exome sequencing. Patients were diagnosed with AS if they met one of the following three criteria: 1. abnormal staining of the type IV collagen α5 chain in skin and/or renal specimens; 2. ultrastructural alterations in the GBM typical of AS; or 3. positive family history of hematuria and/or ESRD. Informed consent was obtained from adult subjects or the parents or legal guardians of the subjects who were less than 18 years of age. Patients were excluded if they were unwilling to participate in the study.
Urine RNA extraction and sequencing of NPHS2 and COL4A3-5 cDNA from the control. To obtain fresh urine, a healthy volunteer without hematuria or proteinuria was asked to empty the bladder, rapidly drink approximately 1000-1500 ml of water, and then void spontaneously every 30-45 min. Approximately 500 ml of urine per subject was collected and distributed into 50 ml centrifuge tubes pre-treated with RNAlater (Qiagen, 145023696). Urine samples were centrifuged for 5 min (1200 rpm, 4 °C), and the supernatants were carefully removed with pipettes. The urinary pellets were washed twice with ice-cold PBS supplemented with RNAlater (1 ml RNAlater per 50 ml PBS), and the samples were centrifuged for 5 min (1200 rpm, 4 °C). Total RNA was isolated from the urinary pellets using TRIzol reagent (Gibco, Grand Island, NY, USA) according to the manufacturer's instructions. RNA concentration was quantified with a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). Reverse transcription was performed using the RevertAid First Strand cDNA Synthesis Kit (TAKARA, K1622). Because NPHS2 encodes the podocyte protein podocin, NPHS2 was used to assess the presence of podocytes in urine. A 388 bp fragment of NPHS2 (NM_014625.3) cDNA, including exons 3, 4, and 5 and partial sequences of exons 2 and 6, was amplified by PCR using one primer pair (F: 5'-GGT ACC AAA TCC TCC GGC TTA-3', R: 5'-CCA AGG CAA CCT TTG CAT CTT-3'). Ten pairs of PCR primers were designed to amplify the entire coding sequence of each of COL4A3 (NM_000091.5), COL4A4 (NM_000092.5), and COL4A5 (NM_000495.5); the primer sequences are listed in Table 3. Figure 6 shows the strategy for PCR amplification of the COL4A3-5 cDNAs. The touchdown PCR program included annealing temperatures decreasing from 64 °C to 57 °C by 1 °C every two cycles, followed by 26 cycles of annealing at 57 °C.
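The touchdown program can be written out cycle by cycle. A minimal Python sketch, assuming that the stepping phase runs from 64 °C down to 58 °C (each temperature held for two cycles) before the 26 cycles at 57 °C. The text does not state whether 57 °C is also part of the stepping phase, and denaturation and extension settings are not given, so they are not modeled. The function name `touchdown_schedule` is hypothetical.

```python
def touchdown_schedule(start_c: int = 64, end_c: int = 57, step_c: int = 1,
                       cycles_per_step: int = 2, final_cycles: int = 26) -> list:
    """Annealing temperature (deg C) for each cycle of a touchdown PCR.

    Assumes the stepping phase runs from start_c down to just above end_c,
    each temperature held for cycles_per_step cycles, followed by
    final_cycles cycles at end_c. Denaturation/extension are not modeled.
    """
    if start_c <= end_c or step_c <= 0 or cycles_per_step <= 0 or final_cycles < 0:
        raise ValueError("invalid touchdown parameters")
    temps = []
    t = start_c
    while t > end_c:
        temps.extend([t] * cycles_per_step)
        t -= step_c
    temps.extend([end_c] * final_cycles)
    return temps


if __name__ == "__main__":
    schedule = touchdown_schedule()
    print(len(schedule), "cycles")
    print(schedule[:16])
```

Under this reading the program has 14 stepping cycles plus 26 cycles at 57 °C, i.e. 40 cycles in total. If the stepping phase also included two cycles at 57 °C, the total would be 42.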
The PCR amplification products were checked by 2% agarose gel electrophoresis and sequenced on an ABI 3730XL (SinoGenoMax Company Limited, China).
Analysis of urine COL4A3-5 mRNA from AS patients. For AS patients with an uncertain inheritance pattern, COL4A3-5 mRNAs from urine were analyzed. When available, RNA from parents was sequenced to assess the segregation of variants with the disease in the respective families. RT-PCR and direct sequencing followed the above protocol.
Analysis of COL4A5 mRNA from skin fibroblasts. For patients with XLAS, COL4A5 mRNA from cultured skin fibroblasts was analyzed. Dermal fibroblasts were cultured as described previously 11 . COL4A5 cDNA was amplified with the same primers as listed in Table 3, and RT-PCR and direct sequencing followed the above protocol.
Genomic DNA analysis. Genomic DNA was extracted from peripheral blood lymphocytes. Once abnormal COL4A3-5 transcripts were detected, the corresponding exons with flanking intronic sequences were further analyzed by PCR and direct sequencing to identify point variants that may create new splice sites. PCR primers are available on request. The public database gnomAD (http://gnomad-sg.org/) was used to assess variant frequencies, and HGMD and ClinVar were used to identify previously reported pathogenic variants.
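The population-frequency step can be expressed as a simple triage rule. The sketch below is a minimal Python illustration, not the authors' pipeline, using the ACMG/AMP stand-alone benign criterion BA1 (allele frequency above 5%). It assumes that the gnomAD value reported for c.2677+487C>A (21.96%) is an allele frequency. The names `frequency_triage` and `BA1_ALLELE_FREQUENCY` are hypothetical, and disease-specific thresholds (BS1) and BA1 exception lists are not implemented.

```python
from typing import Optional

# ACMG/AMP stand-alone benign criterion BA1: allele frequency above 5%.
BA1_ALLELE_FREQUENCY = 0.05


def frequency_triage(allele_frequency: Optional[float]) -> str:
    """Coarse population-frequency triage of a candidate variant.

    allele_frequency is the population allele frequency as a fraction,
    or None when the variant is absent from the database.
    """
    if allele_frequency is None:
        return "absent: rare, keep for further evaluation"
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    if allele_frequency > BA1_ALLELE_FREQUENCY:
        return "BA1: stand-alone benign"
    return "present below BA1 threshold: keep for further evaluation"


if __name__ == "__main__":
    # Frequencies reported in the Results for the two COL4A5 intron 31 variants.
    print("c.2677+487C>A", frequency_triage(0.2196))
    print("c.2677+646C>T", frequency_triage(None))
```

Frequency filtering only removes common variants. A rare variant that passes still needs segregation, transcript, and in silico evidence before classification.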
Haplotype analysis. Following Tazón-Vega et al 13 , three short tandem repeats (DXS1120, DXS6802, and DXS1210) around the COL4A5 gene were used for linkage analysis in II-4 (proband 2), II-5, and III-1 of family 2. PCRs were performed with the published primers 13 (the forward primer of each pair labeled with 6-carboxyfluorescein) according to the above protocol. PCR products were separated on a 3730XL automatic sequencer (Applied Biosystems) and analyzed with GeneMapper software (version 4.0).
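The logic of the segregation check in family 2 can be made explicit. A father passes his single X chromosome to every daughter, so at each marker the daughter's maternal allele is whatever remains after the paternal allele is removed. The Python sketch below is an illustration only. The marker names come from the text, but the allele sizes are invented, and the function name `maternal_haplotype` is hypothetical. It reconstructs the haplotype the daughter inherited from her mother, which can then be compared with the haplotypes of affected relatives; the mother's own phase is not reconstructed.

```python
def maternal_haplotype(daughter: dict, mother: dict, father: dict) -> dict:
    """Infer, marker by marker, the X-linked allele a daughter received from her mother.

    daughter, mother: marker -> (allele, allele) genotype (allele sizes in bp).
    father: marker -> single allele, because males are hemizygous for X markers.
    Raises ValueError if a marker is incompatible with X-linked transmission.
    """
    transmitted = {}
    for marker, (a1, a2) in daughter.items():
        paternal = father[marker]
        # Every daughter receives her father's only X chromosome.
        if paternal == a1:
            maternal = a2
        elif paternal == a2:
            maternal = a1
        else:
            raise ValueError(f"{marker}: paternal allele {paternal} absent in daughter")
        if maternal not in mother[marker]:
            raise ValueError(f"{marker}: allele {maternal} not found in mother")
        transmitted[marker] = maternal
    return transmitted


if __name__ == "__main__":
    # Hypothetical allele sizes for illustration only; not data from family 2.
    mother = {"DXS1120": (180, 184), "DXS6802": (220, 226), "DXS1210": (150, 154)}
    father = {"DXS1120": 182, "DXS6802": 224, "DXS1210": 152}
    daughter = {"DXS1120": (180, 182), "DXS6802": (224, 226), "DXS1210": (152, 154)}
    print(maternal_haplotype(daughter, mother, father))
```

A marker at which the daughter lacks the father's allele, or carries a non-paternal allele absent from the mother, raises an error. This flags a genotyping error, a mutation at the marker, or a pedigree inconsistency.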