Genetic analysis by targeted next-generation sequencing and novel variation identification of maple syrup urine disease in Chinese Han population

Maple syrup urine disease (MSUD) is a rare autosomal recessive disorder that affects the degradation of branched-chain amino acids (BCAAs). Only a few cases of MSUD have been documented in Mainland China. In this report, 8 patients (4 females and 4 males) with MSUD from 8 unrelated Chinese Han families were diagnosed at ages ranging from 6 days to 4 months. All coding regions and exon/intron boundaries of the BCKDHA, BCKDHB, DBT and DLD genes were analyzed by targeted NGS in the 8 MSUD pedigrees. Targeted NGS revealed 2 pedigrees with MSUD type Ia, 5 pedigrees with type Ib and 1 pedigree with type II. In total, 13 variants were detected: 2 variants (p.Ala216Val and p.Gly281Arg) in the BCKDHA gene, 10 variants (p.Gly95Ala, p.Ser171Pro, p.Phe175Leu, p.Arg183Trp, p.Lys222Thr, p.Arg285Ter, p.Arg111Ter, p.S184Pfs*46, p.Arg170Cys, p.I160Ffs*25) in the BCKDHB gene and 1 variant (p.Arg431Ter) in the DBT gene. Four of these variants had not been identified previously (p.Gly281Arg in the BCKDHA gene; p.Ser171Pro, p.Gly95Ala and p.Lys222Thr in the BCKDHB gene). NGS combined with Sanger sequencing is effective and accurate for genetic diagnosis. Computational structural modeling indicated that these novel variants probably affect structural stability, and they were considered likely pathogenic variants.

the gold standard for diagnosis, and 50% of patients have leucine levels exceeding 1500 μmol/L at the time of diagnosis. Determination of BCKAD complex enzyme activity and variant analysis of four genes (BCKDHA, BCKDHB, DBT and DLD) are helpful for a definite diagnosis. Tandem mass spectrometry (MS/MS) was used to analyze the blood amino acid profile, with leucine, isoleucine and valine levels as the main indicators and the leucine/phenylalanine and valine/phenylalanine ratios as the secondary indicators. Those who screened positive were further diagnosed by urine organic acid, plasma amino acid and gene analysis. MSUD is inherited in an autosomal recessive pattern, and it is very rare in most populations, with an incidence of 1:185,000 7 . BCKDC is located in the mitochondrial inner membrane and consists of 4 subunits, E1α, E1β, E2 and E3, which are encoded by the BCKDHA, BCKDHB, DBT and DLD genes, respectively 8 . According to the subunit involved, MSUD is divided into the following types: (1) type Ia (OMIM 608348), caused by biallelic pathogenic variants in the BCKDHA gene encoding the E1α subunit; (2) type Ib (OMIM 248611), caused by biallelic pathogenic variants in the BCKDHB gene encoding the E1β subunit; (3) type II (OMIM 248610), caused by biallelic pathogenic variants in the DBT gene encoding the E2 subunit; (4) type III (OMIM 238331), caused by biallelic pathogenic variants in the DLD gene encoding the E3 subunit 7 . Two further subtypes, type IV and type V, are caused by variants in the specific kinase and phosphatase genes, respectively.
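The gene-to-subtype assignment described above can be written as a small lookup. The following is a minimal sketch only: the names `MSUD_TYPE_BY_GENE` and `msud_type` are hypothetical and not part of the study, and the table covers only the four BCKDC subunit genes listed in the text (types IV and V are omitted because their genes are not named here).

```python
# Mapping taken from the classification in the text: gene -> (subtype, subunit).
# The identifiers below are illustrative assumptions, not part of the study.
MSUD_TYPE_BY_GENE = {
    "BCKDHA": ("Ia", "E1α"),
    "BCKDHB": ("Ib", "E1β"),
    "DBT": ("II", "E2"),
    "DLD": ("III", "E3"),
}


def msud_type(gene: str) -> str:
    """Return the MSUD subtype for a gene carrying biallelic pathogenic variants."""
    key = gene.strip().upper()
    if key not in MSUD_TYPE_BY_GENE:
        raise ValueError(f"{gene!r} is not one of the four BCKDC subunit genes")
    subtype, subunit = MSUD_TYPE_BY_GENE[key]
    return f"MSUD type {subtype} ({subunit} subunit)"


if __name__ == "__main__":
    for gene in ("BCKDHA", "BCKDHB", "DBT", "DLD"):
        print(gene, "->", msud_type(gene))
```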
MSUD is a genetically heterogeneous disease, and traditional sequencing technology is time-consuming and costly. High-throughput sequencing based on target gene capture can detect variants in all four causative genes simultaneously, not only providing accurate genetic diagnosis for patients, but also giving clinicians a basis for differential diagnosis, drug treatment, subsequent genetic counseling, and prenatal diagnosis. In this study, we applied targeted high-throughput sequencing to the target regions of the BCKDHA, BCKDHB, DBT and DLD genes in peripheral blood samples of patients or parents from 8 families with MSUD, and Sanger sequencing was subsequently performed to confirm suspected pathogenic variants.

Methods
Subjects. This is a retrospective study of clinical cases from the First Affiliated Hospital of Zhengzhou University between 2015 and 2020. Eight unrelated families of Chinese Han nationality with children affected by MSUD were recruited from a single center (Fig. 1). Written informed consent was obtained from the legal guardians. All of the procedures and informed consent were approved by the Medical Ethics Committee of the First Affiliated Hospital of Zhengzhou University (KS-2018-KY-36), and were performed according to the principles of the Declaration of Helsinki.
Blood amino acid and ester acylcarnitine spectra analysis. Urine and venous blood were collected from the children after fasting for more than 4 h. Urinary organic acid analysis was performed using gas chromatography-mass spectrometry (GC-MS), and blood amino acid and ester-acylcarnitine profiling was performed using liquid chromatography-tandem mass spectrometry (LC-MS/MS).
DNA extraction. Blood samples (2 ml) were collected by venipuncture in EDTA tubes from each patient and their parents in families 3 and 7. For the remaining six families, parental blood samples were collected. Genomic DNA was extracted from peripheral blood leukocytes using a DNA Blood Mini Kit (Qiagen, Cat. No. 51106, Germany) according to the recommended protocol.
Targeted next-generation sequencing. Target genes were chosen according to the OMIM database (https://omim.org/), and the panel was designed by the MyGenostics company (Beijing, China). The metabolic disease gene panel was specifically captured and enriched using an array-based hybridization chip (NimbleGen, Madison, USA), followed by HiSeq2000 (Illumina, San Diego, USA) sequencing to generate 100-bp paired-end reads according to the manufacturer's protocol. Then, the final products were amplified by PCR and validated using the Agilent Bioanalyzer. Fastq-format reads were aligned to the human reference genome (GRCh37/hg19, https://genome.ucsc.edu/) using BWA 9 software (Burrows-Wheeler Aligner). Base quality score recalibration together with SNP and short indel calling was conducted with GATK 3.8 10 . Quality metrics were evaluated: the average depth was 80× per sample, with at least 97% of the target region covered by 10 or more reads. The VCF files were then annotated using SnpEff 11 . Variants with > 1% frequency in the population variant databases (1000 Genomes Project 12 , Exome Variant Server (EVS, http://evs.gs.washington.edu/EVS/) and Exome Aggregation Consortium (ExAC, http://exac.broadinstitute.org)) or > 5% frequency in the local database of 150 exome datasets were filtered out. Intergenic, intronic and synonymous variants were then filtered out, except those located at canonical splice sites. The Human Gene Mutation Database (HGMD, http://www.hgmd.org/) was searched to determine whether each variant was a known pathogenic variant. The nomenclature of new variants was based on the international gene variant nomenclature system (http://www.hgvs.org/mutnomen).
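The frequency and consequence filters described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, its field names, the simplified effect labels and the example records (including their allele frequencies) are all hypothetical; only the thresholds (1% population frequency, 5% local frequency) and the splice-site exception come from the text.

```python
from dataclasses import dataclass

POP_AF_MAX = 0.01    # variants above 1% in any population database are excluded
LOCAL_AF_MAX = 0.05  # variants above 5% in the local exome database are excluded
NONCODING_OR_SILENT = {"intergenic", "intronic", "synonymous"}


@dataclass
class Variant:
    """Hypothetical, simplified annotated variant record."""
    name: str
    effect: str                    # simplified consequence label (assumption)
    pop_af: dict                   # database name -> allele frequency
    local_af: float = 0.0
    canonical_splice: bool = False


def passes_filter(v: Variant) -> bool:
    """Apply the frequency filter first, then the consequence filter."""
    if any(af > POP_AF_MAX for af in v.pop_af.values()):
        return False
    if v.local_af > LOCAL_AF_MAX:
        return False
    if v.effect in NONCODING_OR_SILENT and not v.canonical_splice:
        return False
    return True


if __name__ == "__main__":
    # Invented example records; the frequencies are placeholders, not study data.
    candidates = [
        Variant("rare_missense", "missense", {"1000G": 0.0, "EVS": 0.0, "ExAC": 0.0001}),
        Variant("common_missense", "missense", {"1000G": 0.03, "EVS": 0.02, "ExAC": 0.025}),
        Variant("deep_intronic", "intronic", {"1000G": 0.0}),
        Variant("splice_donor", "intronic", {"1000G": 0.0}, canonical_splice=True),
        Variant("local_artifact", "missense", {"1000G": 0.0}, local_af=0.2),
    ]
    print([v.name for v in candidates if passes_filter(v)])
```

Applying the frequency filter before the consequence filter mirrors the order given in the text; because both filters are exclusions, the surviving set does not depend on that order.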
Validation tests of Sanger sequencing. Gene tool software was used to design primers for the suspected variants. Routine PCR reactions were performed. PCR products were purified and directly sequenced on an ABI 3130xl sequencing instrument using the ABI BigDye 3.1 sequencing kit (Thermo Fisher Scientific, USA), and the sequencing data were compared and analyzed using ABI Sequencing Analysis 5.1.1 software.
In silico webservers and structure prediction. Multiple sequence alignments were performed using the HomoloGene database (http://www.ncbi.nlm.nih.gov/homologene) to verify the degree of conservation. The pathogenicity of the variants was then evaluated using three in silico webservers: PolyPhen2 (http://genetics.bwh.harvard.edu/pph2/), SIFT (http://provean.jcvi.org/index.php) and Mutation Taster (http://www.mutationtaster.org). The American College of Medical Genetics and Genomics (ACMG) guideline was applied to assess the pathogenicity of novel variants. Computational modeling was carried out to observe the effect of new missense variants on protein structure. The three-dimensional structure of the target protein sequence was constructed using PyMOL protein model structure simulation software to determine the effect of amino acid substitution on protein structure.
Summary of published data. A literature search of the PubMed and WanFang databases was conducted to identify reported variants in Chinese MSUD patients. The searches used the keywords "Maple syrup urine disease (MSUD)" or "BCKDHA" or "BCKDHB" or "DBT" or "DLD" and "Chinese".

Results
Characteristics of recruited subjects. Between 2015 and 2020, a total of 8 families were recruited in our study. The study design is shown in Fig. 1. Characteristics of these cases are shown in Table 1. Only the children in families 3 and 7 received timely diagnosis and treatment after neonatal screening. All children in the 8 families were screened by tandem mass spectrometry and received positive screening results. As shown in Table 1, the remaining 6 children developed the disease between 3 days and 4 months of age, and died at 16 days, 2 months, 20 days, 1 month, 10 days and 1 month, respectively.  Table 2.

Molecular analysis in
Sanger sequencing results. The suspected variants found by NGS were confirmed by Sanger sequencing.
The patient in family 3 carried the compound heterozygous BCKDHB gene variants c.511T>C (p.Ser171Pro) and c.508C>T (p.Arg170Cys), and the child in pedigree 7 carried the compound heterozygous variants c.523T>C (p.Phe175Leu) and c.478-552del (p.I160Ffs*25). Their parents were heterozygous carriers of the respective variants. In the remaining six families, heterozygous variants in the same causative gene of MSUD were detected in both members of each couple. Gene sequences of the four novel variants in the BCKDHA and BCKDHB genes are shown in Fig. 2.
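The reasoning behind calling a child compound heterozygous from parental carrier status can be sketched as a simple set check: the child carries two different heterozygous variants in one gene, and each parent carries exactly one of them. This is a minimal illustration, not the study's analysis code; the function name is hypothetical, and the assignment of the two family 3 variants to mother and father in the example is an assumption, since the text does not state which parent carried which variant.

```python
def is_compound_heterozygous(child: set, mother: set, father: set) -> bool:
    """Check that two heterozygous variants in one gene were inherited in trans.

    Each argument is the set of heterozygous variant names found in the same
    gene for that individual.
    """
    if len(child) != 2:
        return False
    from_mother = child & mother
    from_father = child & father
    # Exactly one of the child's variants must come from each parent,
    # and the two inherited variants must differ.
    return (
        len(from_mother) == 1
        and len(from_father) == 1
        and from_mother != from_father
    )


if __name__ == "__main__":
    # BCKDHB variants reported for family 3; the parental assignment is assumed.
    child = {"c.511T>C", "c.508C>T"}
    mother = {"c.511T>C"}
    father = {"c.508C>T"}
    print(is_compound_heterozygous(child, mother, father))
```

If both of the child's variants were found in the same parent, the check returns False, because the variants could then lie on one chromosome (in cis) and would not by themselves explain a recessive disease.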

In silico prediction of novel gene variant sequences. Mutation Taster and PolyPhen-2 analysis showed that the four novel missense variants (p.Gly95Ala, p.Gly281Arg, p.Lys222Thr and p.Ser171Pro) were highly likely to be pathogenic/deleterious variants. We used the American College of Medical Genetics and Genomics (ACMG) 13 guideline to assess the pathogenicity of these novel variants (Table 2).
Three-dimensional structure of proteins. The predicted three-dimensional structures of the 4 novel variants in the BCKDHA and BCKDHB genes are shown in Fig. 3. In the BCKDHA gene, glycine 281 is located in a random coil region of the protein secondary structure. Glycine lacks a side chain (only one H atom), whereas arginine is a charged basic amino acid, so the substitution affects the stability of the E1α tertiary structure and thus protein function.
In the BCKDHB gene, as shown in Fig. 3d, Gly95 is located in the β-turn region; because glycine lacks a side chain (only one H atom), there is no steric hindrance, allowing a U-shaped turn of the peptide chain [...] affecting the stability of protein secondary structure. Therefore, it is speculated that the p.Ser171Pro variant has a greater impact on protein function. As shown in Fig. 3f, amino acid 222 is located in an α-helix of the protein secondary structure and forms hydrogen bonds with amino acids 79, 83, 218, 225, 252 and 254. After substitution with threonine, it instead forms hydrogen bonds with amino acids 76, 218 and 225. The secondary structure of the protein is changed, which disrupts the stability of the protein and may affect the cleavage and activation function of the protein. (Table 3).

Discussion
Maple syrup urine disease is a disorder of branched-chain amino acid metabolism caused by deficiency of the mitochondrial branched-chain α-keto acid dehydrogenase complex (BCKDC). Scaini et al. 14 suggested that cognitive impairment after accumulation of branched-chain amino acids is mainly due to oxidative damage to the brain. The clinical manifestations of MSUD are nonspecific, with rapid onset. Measuring amino acid levels and the ratios between related amino acids in dried blood spots on filter paper by tandem mass spectrometry 15 allows for early screening of MSUD and provides an important basis for further diagnosis and treatment. In this study, blood tandem mass spectrometry in all families showed that both leucine and valine were significantly elevated, accompanied by changes in amino acid ratios, consistent with the biochemical findings of MSUD.
In our study, a total of 13 variants (15.4% located in the BCKDHA gene, 76.9% in the BCKDHB gene and 7.7% in the DBT gene, with no variants in the DLD gene) were identified in 16 alleles in 8 families. In the systematic literature review of MSUD reported in the Chinese population, 81 mutations have been detected in 61 patients in China, including the 8 patients in our study. Of these, 26 (32.1%) variants are located in the BCKDHA gene, 45 (55.6%) in the BCKDHB gene and 10 (12.3%) in the DBT gene, with none in the DLD gene. Variants in the BCKDHB gene may therefore be the major cause of MSUD in the Chinese population. Gene variants of MSUD patients are mainly concentrated in the BCKDHB gene, followed by the BCKDHA and DBT genes 16 . A previous study suggested that DLD gene variants account for 13% 17 . Our data are inconsistent with this, and DLD gene variants may be very rare in the Chinese population. MSUD shows high allelic heterogeneity, with the exception of mutation hotspots found in a minority of ethnic groups, such as the BCKDHA gene c.1312T>A (p.Tyr393Asn) variant, the most common mutation in the Mennonite community 18 , and the Portuguese Gypsy hotspot mutation c.117delC 19 . The BCKDHB gene c.538G>C variant is a common mutation in Ashkenazi Jews 20 , and exon 5 of the BCKDHB gene may be a region of genetic variation and a hotspot region 21,22 . Hotspot mutations have not been found in the remaining populations 23-25 . No significant hotspot mutations have been identified in the Chinese population 26-32 . The variants c.331C>T and c.853C>T in the BCKDHB gene may be relatively common in Chinese patients. Four variants were novel, one located in the BCKDHA gene (p.Gly281Arg) and three in the BCKDHB gene (p.Gly95Ala, p.Lys222Thr, p.Ser171Pro), which illustrates that the disease has high allelic heterogeneity.
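The per-gene percentages quoted above follow directly from the variant counts. A minimal sketch of the arithmetic is given below; the helper name `shares` is hypothetical, and the counts are those stated in the text (2, 10 and 1 of 13 variants in this study; 26, 45 and 10 of 81 in the literature review).

```python
def shares(counts: dict) -> dict:
    """Return each gene's share of all variants, in percent, to one decimal."""
    total = sum(counts.values())
    return {gene: round(100 * n / total, 1) for gene, n in counts.items()}


# Variant counts as reported in the text.
this_study = {"BCKDHA": 2, "BCKDHB": 10, "DBT": 1, "DLD": 0}
chinese_literature = {"BCKDHA": 26, "BCKDHB": 45, "DBT": 10, "DLD": 0}

if __name__ == "__main__":
    print(shares(this_study))
    # {'BCKDHA': 15.4, 'BCKDHB': 76.9, 'DBT': 7.7, 'DLD': 0.0}
    print(shares(chinese_literature))
    # {'BCKDHA': 32.1, 'BCKDHB': 55.6, 'DBT': 12.3, 'DLD': 0.0}
```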
Protein structure prediction was carried out for BCKDHA and BCKDHB in order to analyze the variants visually. In BCKDHA, the p.Gly281Arg variant is located in a coil, and minimal change was found in the protein structure. In BCKDHB, p.Gly95Ala is located in the β-turn region, which may alter the turn and cause a change in local steric conformation. Two variants (p.Lys222Thr and p.Ser171Pro) are located in α-helices of the protein secondary structure and thus have a great impact on protein function. According to the American College of Medical Genetics and Genomics (ACMG) standards and guidelines, three variants (p.Gly281Arg of BCKDHA, p.Gly95Ala and p.Ser171Pro of BCKDHB) are variants of uncertain significance (PM3 + PP3 + PP4), and p.Lys222Thr of BCKDHB is likely pathogenic (PM3 + PM5 + PP3 + PP4). All the novel variants are predicted to be disease causing by the prediction software (PolyPhen2, SIFT and Mutation Taster). In summary, these four novel variants may be causative variants.
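The two classifications above can be reproduced from the evidence-combining rules of the 2015 ACMG/AMP guideline. The sketch below is a simplified illustration, not the authors' procedure: it handles pathogenic criteria only (PVS, PS, PM, PP), ignores benign criteria and conflicting evidence, and the function name is hypothetical.

```python
from collections import Counter


def classify_pathogenic_evidence(criteria) -> str:
    """Combine ACMG pathogenic criteria codes (e.g. 'PM3') into a class.

    Simplified: benign criteria and conflicting evidence are not handled.
    """
    n = Counter()
    for code in criteria:
        for prefix in ("PVS", "PS", "PM", "PP"):
            if code.upper().startswith(prefix):
                n[prefix] += 1
                break
        else:
            raise ValueError(f"unsupported criterion: {code!r}")
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm == 1 and pp == 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4)))
    )
    if pathogenic:
        return "pathogenic"

    likely_pathogenic = (
        (pvs == 1 and pm == 1)
        or (ps == 1 and 1 <= pm <= 2)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    if likely_pathogenic:
        return "likely pathogenic"
    return "uncertain significance"


if __name__ == "__main__":
    print(classify_pathogenic_evidence(["PM3", "PP3", "PP4"]))         # uncertain significance
    print(classify_pathogenic_evidence(["PM3", "PM5", "PP3", "PP4"]))  # likely pathogenic
```

One moderate and two supporting criteria do not satisfy any likely pathogenic rule, whereas two moderate plus two supporting criteria do, which matches the classifications reported for the four novel variants.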
The relationship between MSUD genotype and phenotype has not yet been established. The incidence of the disease is low, and few cases are included in each study, making it difficult to obtain an exact genotype-phenotype relationship. Current studies suggest that patients with BCKDHA and BCKDHB gene variants mostly present with the classic type, with BCKDH activity of less than 2%. DBT gene variants account for about 24%, and most of them are of the thiamine-responsive type 6 . The clinical manifestations of these patients are relatively mild, including developmental retardation and hypotonia. Patients with nonsense variants presented with the severe classic phenotype. The p.Arg111Ter and p.Arg285Ter variants in the BCKDHB gene generate premature termination codons, and the encoded protein seriously affects the activity of the complex 33 . Our cases in families 5 and 6 carry the nonsense variants p.Arg111Ter and p.Arg285Ter, respectively, and they have the classic phenotype. However, the same genetic variant can also lead to different clinical phenotypes. For example, in the BCKDHA gene, p.Glu327Lys has been reported to be associated with the intermediate phenotype 8 , while the same variant resulted in the classic phenotype in patients in another study 22 . In our study, patient 4 carries the missense variant p.Arg183Trp in the BCKDHB gene and shows the intermediate phenotype, while a patient with the same variant showed the classic phenotype in a previous report 34 . Therefore, we could not establish any genotype-phenotype correlation in our patients with MSUD. Half of our cases have the classic phenotype and half have the intermediate phenotype.
The majority of patients with the intermediate phenotype had variants in the BCKDHB gene. All three genes are implicated in the classic phenotype. Most Chinese patients carrying DBT gene variants were diagnosed with the classic type, while in Norway 35 , patients with DBT gene variants had the intermittent phenotype. All patients with the classic phenotype had the worst clinical outcome.
MSUD is a fatal and disabling inherited metabolic disease that is difficult to treat and has a poor prognosis. Untreated children with the classic type mostly die shortly after birth. The principles of treatment are to remove the inducement, reduce the toxic effect of blood leucine, correct acute metabolic disorders, maintain plasma branched-chain amino acids in the ideal range, and ensure good nutrition, growth and development 36 . MSUD treatment mainly includes acute phase management, dietary management and vitamin B1 treatment. In recent years, liver transplantation for MSUD has been reported 37,38 . However, the shortage of donor livers, the high cost, and the need to take immunosuppressive agents for a long time after surgery are disadvantages of this treatment. Currently, the best preventive strategy for the disease is to avoid the birth of affected children through prenatal diagnosis. When MSUD is clinically suspected, capture-based high-throughput sequencing followed by Sanger sequencing confirmation allows accurate and effective detection of variants in the causative genes.
In conclusion, we present the clinical characteristics and 13 variants identified in 8 patients with MSUD and explore the genotype-phenotype relationship. By applying high-throughput sequencing based on target gene capture, we identified four novel variants in the BCKDHA and BCKDHB genes that have not been previously reported in the Chinese population and that may be causative. This article will contribute to a better understanding of the MSUD variant spectrum identified so far. NGS combined with Sanger sequencing can detect variants in the causative genes effectively, providing clinicians with a basis for differential diagnosis, drug treatment, subsequent genetic counseling and prenatal diagnosis.