Molecular diagnosis of pediatric patients with citrin deficiency in China: SLC25A13 mutation spectrum and the geographic distribution

Citrin deficiency (CD) is a Mendelian disease due to biallelic mutations of SLC25A13 gene. Neonatal intrahepatic cholestasis caused by citrin deficiency (NICCD) is the major pediatric CD phenotype, and its definite diagnosis relies on SLC25A13 genetic analysis. China is a vast country with a huge population, but the SLC25A13 genotypic features of CD patients in our country remains far from being well clarified. Via sophisticated molecular analysis, this study diagnosed 154 new CD patients in mainland China and identified 9 novel deleterious SLC25A13 mutations, i.e. c.103A > G, [c.329 − 154_c.468 + 2352del2646; c.468 + 2392_c.468 + 2393ins23], c.493C > T, c.755 − 1G > C, c.845_c.848 + 1delG, c.933_c.933 + 1insGCAG, c.1381G > T, c.1452 + 1G > A and c.1706_1707delTA. Among the 274 CD patients diagnosed by our group thus far, 41 SLC25A13 mutations/variations were detected. The 7 mutations c.775C > T, c.851_854del4, c.1078C > T, IVS11 + 1G > A, c.1364G > T, c.1399C > T and IVS16ins3kb demonstrated significantly different geographic distribution. Among the total 53 identified genotypes, only c.851_854del4/c.851_854del4 and c.851_854del4/c.1399C > T presented different geographic distribution. The northern population had a higher level of SLC25A13 allelic heterogeneity than those in the south. These findings enriched the SLC25A13 mutation spectrum and brought new insights into the geographic distribution of the variations and genotypes, providing reliable evidences for NICCD definite diagnosis and for the determination of relevant molecular targets in different Chinese areas.

Novel mutations and cDNA cloning analysis. As shown in Table 2 To address the issue whether and how c.755− 1G > C, c.845_c.848 + 1delG and c.933_c.933 + 1insG-CAG affect the splicing process of the relevant pre-mRNA molecules, cDNA cloning analysis of the transcripts from the affected SLC25A13 alleles in PBLs were performed. They were found to give rise to the aberrant transcripts r.755_756delAG (p.252fs269X), r.845_848delG (p.D283fsX285) and r.933_934insGCAG (p.A312fsX317), respectively (Fig. 2b).
In patient C0360, four high-frequency mutation screening just detected the maternal mutation IVS16ins3kb. Further cDNA cloning analysis revealed that all the transcripts from the paternal allele featured exon 5 skipping ( Table 3). The following screening of large insertion/deletion with primer set located in the adjacent intronic regions of exon 5 revealed a paternally-inherited unexpected PCR band of 1 kb in size, as illustrated in Fig. 3. Direct sequencing of this unexpected product revealed a 2646 bp deletion, involving the entire exon 5 and the adjacent intronic sequences, along with a 23 bp insertion 40 bp behind the breakpoint. According to the nomenclature rules 39,40 , this complex mutation was described as [c.329− 154_c.468 + 2352del2646; c.468 + 2392_c.468 + 2393ins23], predictively leading to the production of a truncated citrin molecule p.E110fs127X.
In patient C0388, the maternal SLC25A13 allele harbored two mutations [c.851_854del4; c.1452 + 1G > A], while the paternal mutation was not detected by direct DNA sequencing. The subsequent cDNA cloning analysis also shown that all the transcripts were from the maternal allele with r.851_854del4; moreover, half of them (12/21) demonstrated exon 14 skipping (r.1312_1452del, p.Ala438_Lys484del) ( Table 3), indicating that the mutation c.1452 + 1G > A affected the splicing process of pre-mRNA molecules transcribed from the maternal allele.
Bioinformatic and functional findings of the novel missense mutation c.103A > G (p.M35V).
Alignment analysis in 11 different species indicated that the amino acid M35 in citrin protein is highly conserved (Supplemental Information 1). The probability value of disease-causing potential was > 0.9999 upon MutationTaster analysis and strongly indicated its deleterious nature. Meanwhile, this mutation is predicted to be probably damaging by using PolyPhen-2 with a score of 0.989 (sensitivity: 0.72; specificity: 0.97), and have a deleterious effect by using PROVEAN with a score of − 3.25.
As depicted in Fig. 4, after growing for 96 hours, the growth ability of the pYX212-mutant (p.M35V) was not significantly different (P = 0.341) from that of the empty vector pYX212 (vector). However, both of them had significantly lower (P = 0.000) growth ability in comparison to pYX212-citrin (citrin). These findings indicated that the mutation p.M35V, as a lack-of-function variation, caused the elimination of the AGC2 function of citrin protein.
SLC25A13 mutation spectrum and regional distribution. SLC25A13 mutations/variations were detected in 522 out of the 528 independent alleles in Table 2 c.851_854del4, c.1078C > T (p.R360X), IVS11 + 1G > A, c.1364G > T (p.R455L), c.1399C > T (p.R467X) and IVS16ins3kb presented with significantly different geographic distribution with P all < 0.05 (Table 2). In pairwise comparisons, the southern population had a higher relative frequency of c.851_854del4 than the border (P = 0.001) and northern (P = 0.000) populations. By contrast, the relative frequency of IVS16ins3kb was lower  SLC25A13 genotype and allelic heterogeneity. Except the 6 patients with only one mutation detected and the 13 ones with parents of different geographic origins, in this paper, the remaining 245 unrelated individuals were enrolled for the comparison of the genotype distribution. As shown in Further pairwise comparisons revealed that the relative frequency of c.851_854del4/ c.851_854del4 in the south was much higher than that in the north (P = 0.000).
The observed homozygosity calculated based on the genotype frequencies was 17.39% (4/23) in the north, 34.28% (12/35) in the border and 52.94% (99/187) in the south, respectively. In addition, the theoretical homozygosity value, which was calculated based on allele frequencies, was 15.24% in the north, 24.31% in the border and 44.81% in the south, respectively. In pairwise comparisons, both of the observed and theoretical homozygosity values were higher in the south than in the north, with the P values of 0.001 and 0.012, respectively. In other words, the northern population had a higher level of allelic heterogeneity at the SLC25A13 locus than the southern population.

Discussion
The first case of NICCD in mainland China was reported by our group in 2006 43 . Since then, more and more Chinese CD patients were definitely diagnosed by SLC25A13 genetic analysis in our department 23,24,27,33,34 . During this process, conventional DNA analytic approaches such as PCR/LA-PCR, PCR-RFLP and Sanger sequencing played important roles. In this study, as shown in Table 2, the four mutations c.851_854del4, c.1638_1660dup, IVS6 + 5G > A and IVS16ins3kb together had a relative frequency of 84.47%, indicating that the screening of these high-frequency mutations should be initially performed for the rapid molecular diagnosis of CD patients. In addition, direct sequencing of the 18 SLC25A13 exons and their adjacent intronic regions could identified the remaining micro mutations, which accounted for 12.5%. Besides, the functional and bioinformatic tools also made substantial contribution to the pathogenicity confirmation of the novel missense mutation c.103A > G (p.M35V). As a result, the 154 new CD patients diagnosed in this paper, together with those reported in our department previously, constituted a 274-case cohort. So far as we know, this is the hitherto largest CD cohort in official references worldwide, laying a foundation for our subsequent clinical investigation. In particular, the 9 novel mutations identified in this study enriched the SLC25A13 mutation spectrum, and provided reliable laboratory evidences not only for the definite diagnosis of the corresponding individuals, but also for the genetic counseling of their families in the future.
It was noteworthy that, as an unique technique developed by our group 42 , the SLC25A13 cDNA cloning analysis using PBLs had proved to be a feasible tool for the detection of the large insertion/deletion mutations 23,24,27 . In this paper, this molecular tool also played unique roles in the definite diagnosis of NICCD patients. The aberrant transcripts detected by cDNA analysis proved that c.755 − 1G > C, c.845_c.848 + 1delG, c.933_c.933 + 1insG-CAG and c.1452 + 1G > A all had influence on the splicing process of the pre-mRNA. In addition, by using this tool, the novel mutation [c.329 − 154_c.468 + 2352del2646; c.468 + 2392_c.468 + 2393ins23] in patient C0360 was identified as the third large mutation resulting in exon 5 skipping (r.329_468, p.E110fs127X) following the mutation IVS4ins6kb (GenBank accession number: KF425758) 27   Of particular note, in patient C0388, we confirmed that the maternally-inherited allele harbored two deleterious mutations based on the cDNA cloning results. Although the paternal mutation remained to be explored currently, the undetectable transcriptional product from the paternal allele clearly indicated the existence of a pathogenic mutation, providing direct laboratory evidences supporting the CD diagnosis. Unfortunately, due to the lack of fresh PBLs or liver specimens for further cDNA cloning analysis, mutations in 6 SLC25A13 alleles remained obscure in this study; however, the rate of unidentified mutations (1.14% of all mutated alleles) in our study is much lower than those in previous publications using the conventional molecular approaches only. SLC25A13 cDNA cloning analysis using PBLs should be taken as an important tool for the molecular diagnosis of CD patients.
Although the relative frequencies of c.1638_1660dup and IVS6 + 5G > A were uniform in different regions of China, some SLC25A13 mutations demonstrated different geographic distribution in this study. The relative frequencies of c.851_854del4 tended to decrease gradually from south to north, while that of IVS16ins3kb had an opposite tendency, as shown in the Table 2. The genetic variation following a continuous pattern from south to north might be attributed to the genetic flow occurred between distinct populations. Recent genetic studies have suggested that modern humans colonized East Asia via Southern and Northern routes on both sides of the Himalayas. Genetic flow between populations, which took place when the two migration routes overlapped, probably lasted a long time, resulting in the continuous pattern of genetic variation 44,45 . The continuous patterns of genetic variation at the SLC25A13 locus in the present study are compatible with this presumed migration model. Actually, previous haplotype study 28 suggested that c.851_854del4 originated around the Guangxi and Yunan areas. Its higher frequency in south China can be explained as a result of the founder effect, while its lower frequency in the north, by genetic drift. Interestingly, besides the four common mutations, c.1399C > T (p.R467X) was relatively common in the border, while IVS11 + 1G > A and c.1364G > T (p.R455L) were relatively common in the north. This phenomenon might be attributed to different founding populations but a lower migration rate among different areas in mainland China. These mutations should be considered as targets when establishing a screening strategy for CD patients in relevant populations.
Moreover, although a diversity of genotypes with a total number of 53 was discovered in the large CD cohort, c.851_854del4/c.851_854del4 was the unique genotype with higher relative frequency in the south than in the north (49.20% vs 4.35%), as shown in Table 4. This could be explained once again by the aforementioned founder effect of the c.851_854del4 mutation, which might occurred in a far remote ancestor in the south; Furthermore, subsequent homozygosity comparison demonstrated that, different from the CD patients from the south who had higher homozygosity, patients in the north showed higher allelic heterogeneity at the SLC25A13 locus. This finding suggested that some CD patients might be missed while the CD prevalence be underestimated in the north,  when the same high-frequency mutations as in the south were choosed as the molecular targets for the detection of CD patients and SLC25A13 carriers in the north area. Therefore, the exploration of additional SLC25A13 mutations should be regarded as an important issue in the north area in terms of CD molecular diagnosis and epidemic survey. In summary, via sophisticated molecular, functional and in silico analysis of SLC25A13 gene and its cDNA, this paper reported 154 new CD patients and identified 9 novel pathogenic mutations. The SLC25A13 mutation spectrum in the hitherto largest CD cohort of 274 cases and their different geographic distribution formed a substantial contribution to the in-depth understanding of the genotypic feature of CD patients in China, and provided reliable evidences for the development of molecular diagnostic strategies in different Chinese areas.   Reverse transcription-PCR (RT-PCR) and cDNA cloning analysis. RT-PCR and cDNA cloning analysis were subsequently carried out in patients still with undetected mutation by all the approaches above, and in those with novel mutations that might affect the splicing of pre-mRNA molecules. In brief, total RNA was extracted from peripheral blood lymphocytes (PBLs), which were collected from 2 ml fresh EDTA-anticoagulant peripheral venous blood. Then RT-PCR was performed to synthesized cDNA following the kit manufacture's protocol (Invitrogen, USA). With the cDNA as template, nest-PCR was then performed for the target products, and the purified nest-PCR products were cloned into PMD-18T vector (Takara, Japan) and transformed into DH5α Escherichia coli competent cells. The positive clones were tested by LA-PCR with the universal primer set RVM (GAGCGGATAACAATTTCACACAGG) and M13-47 (CAGCACTGACCCTTTTGGGACCGC) and then sequenced, as previously described 42 .

Functional study.
A diploid AGC1-disrupted yeast model, BYagc1Δ , which was constructed in our previous publication 34 , was used to evaluate the functional effect of the novel missense mutation. The normal citrin-coding sequence (NM_014251.2) was amplified and the novel missense mutation was introduced into the wild type SLC25A13 cDNA by overlap-extension PCR. These products were purified and cloned into the vector pYX212 (Novagen, USA) to constitute the plasmid pYX212-citrin and pYX212-mutant, respectively. The BYagc1Δ strains were then transfected with the recombinant plasmids and the empty vector pYX212, respectively. The positive clones were screened using the uracil minus medium SD-URA and cultured in SA medium with acetate as the unique carbon source. After 96 hour of culture, the growth abilities of these three strains were assessed by the cell density measured at OD 600 .
Geographic division. The Yangtze River has been considered as a historically significant boundary of the Chinese population [52][53][54] . In addition, the previously estimated carrier rate of SLC25A13 mutations was 1/48 in the south but 1/940 in the north of this river 28 . Accordingly, the distribution of the mutations and genotypes in this study were compared among the north, border and south regions relevant to this boundary, based on the origin of the parents of each case. In this study, individuals in the north area referred to those from Beijing, Inner Mongolia, Shangdong, Shanxi, Shaanxi, Henan, Hebei, Liaoning, Jilin and Heilongjiang; in the border, from Shanghai, Jiangsu, Anhui, Hubei, Sichuan and Chongqing; and in the south, form Guangdong, Guangxi, Yunnan, Guizhou, Hunan, Fujian, Zhejiang, Jiangxi, Hainan and Taiwan, respectively.
Calculation of homozygosity. The theoretical homozygosity (J) at a locus in a given population is measured by J = ∑ Χ i 2 , where ∑ stands for summation over all alleles, and Χ i is the frequency of the ith allele 55,56 . If the number of the ith allele is m, Χ i is calculated to be m/N, where N is the total number of the mutant alleles being investigated. The alleles harboring obscure mutations were counted as SLC25A13 alleles different from those with detected variations, and thus, each of them was defined to have a frequency of 1/N. Statistical analysis. The frequencies of the SLC25A13 mutations and genotypes, as well as the homozygosity values among the three geographic areas, were compared by means of Chi-square test or Fisher's exact tests, respectively. When the Chi-square test of 3 × 2 table was significant with P < 0.05, pairwise comparisons were then performed with Bonferroni corrections of the P values. There were 3 pairwise comparisons: (1) north vs border, (2) north vs south, and (3) border vs south; accordingly, the adjusted P values of 2 × 2 Chi-square test for significance was 0.017 (0.05/3) 57 . The data of growth abilities of the yeast strains were analyzed by using one-way ANOVA followed by the Games-Howell test for the pairwise comparison of the non-homogeneity of variances, with P < 0.05 as the significant criteria. All statistical calculations were performed on the software SPSS17.0.