Novel findings from family-based exome sequencing for children with biliary atresia

Biliary atresia (BA) is a progressive inflammation and fibrosis of the biliary tree characterized by the obstruction of bile flow, which results in liver failure, scarring and cirrhosis. This study aimed to explore the elusive aetiology of BA by conducting whole exome sequencing for 41 children with BA and their parents (35 trios, including 1 family with 2 BA-diagnosed children and 5 child-mother cases). We exclusively identified and validated a total of 28 variants (17 X-linked, 6 de novo and 5 homozygous) in 25 candidate genes from our BA cohort. These variants were among the 10% most deleterious and had a low minor allele frequency against the employed databases: Kinh Vietnamese (KHV), GnomAD and 1000 Genome Project. Interestingly, AMER1, INVS and OCRL variants were found in unrelated probands and were first reported in a BA cohort. Liver specimens and blood samples showed identical variants, suggesting that somatic variants were unlikely to occur during morphogenesis. Consistent with earlier attempts, this study implicated genetic heterogeneity and non-Mendelian inheritance of BA.


Results
Clinical features. We recruited a total of 42 children who had been diagnosed with BA based on intraoperative findings and liver biopsy. All patients showed typical BA symptoms, such as prolonged jaundice, acholic stool and abnormalities of the biliary tract at early infantile. The patients, including 23 males and 19 females born from 2009 to 2019, but the majority of patients were born in recent years. All patients underwent Kasai surgery immediately after birth (mostly after their 2 months of life), but the concentrations of bilirubin and serum enzymes indicating liver function, such as ALP, ALT, AST and γ-GT, remained high at the time of enrolment (Table 1). Some patients have developed liver cirrhosis (BA002_3, BA005, BA012, BA013 and BA018). One BA patient has infected with CMV (BA037). Several probands whose siblings were reported to develop liver diseases or other genetic conditions, including BA (BA002_4), primary sclerosing cholangitis (BA032), choledochal cyst (BA042) and haemophilia (BA025). Four mothers experienced abnormal pregnancy (BA024, BA027; BA036, BA038). The remaining families did not show any significant concern during their pregnancy and had no family history of BA or other genetic conditions. Excluding one family who failed to come for blood drawing after the first health examination, we were finally able to collect blood samples from 41 BA-affected children and   Genetic properties. We applied a strict filtering strategy by removing variants with MAF > 1%, synonymous variants and variants with a CADD scaled score < 10. Finally, we identified a total of 28 variants in 25 genes from our BA-affected cohort ( Table 2). All variants were subsequently confirmed by Sanger sequencing (Fig. S1). Among the 28 detected variants, 17 X-linked variants (61%) were detected in 17 different genes, 6 de novo variants (21%) were detected in 6 genes from 5 probands, including INVS, ELP2, TINAG, CEP63, CCDC136, and BCAR1, and 5 homozygous variants were identified in 5 genes (18%) (Fig. 1), including HACE1, VPS13C, RAPGEF4, FOCAD and INVS (Table 2). Family #2 involved two siblings with similar phenotypes (early onset jaundice, BA diagnosed). Two X-linked and 1 homozygous variants were detected in the male sib of family #2, and none were detected in his sister (Table 2). Interestingly, several genes with genetic predisposition were observed in unrelated patients, including AMER1 (BA004 and BA007), INVS (BA014 and BA041), and OCRL (BA032 and BA041). Noticeably, proband BA014 carried an INVS de novo variant, while proband BA041 carried an INVS homozygous variant (Table 2). In addition to blood samples, we were able to collect 18 liver specimens from our BA cohort. Of these, blood and liver samples from 8 children shared identical variants (BA009, BA016, BA032, BA036, BA037, BA038, BA040 and BA041). Additionally, we did not detect any significant variants based on our rationales for variant classification (Table 3). In other words, this study did not detect any somatic variants from liver samples.
Effect of genetic predisposition. The detected variants showed extremely low MAFs against three employed databases: Kinh Vietnamese (KHV), GnomAD and 1000 Genome Project (Table 2). We noticed that the MAFs of the HACE1 and VPS13C variants were above 1% against the KHV database, while the rest were significantly below the thread hold of 1%. All variants with CADD Phred scaled scores were above 10 and mostly above 20, indicating either the 10% or 1% most deleterious substitutions, respectively. Among these variants, INVS:c.C208 > T was the most deleterious, with the highest scaled score of 37 ( Table 2).
The Polyphen-2 and SIFT tools showed a consensus on the damaging impact of HACE1, PHKA1, XIAP, and AMER1 (c.A1075 > T), POF1B, MAOA, BCAR1, FOCAD, ARSF and OCRL variant, while the rest varied from tools (Table S1). We used I-Mutant to predict the stability of amino acid substitution for 28 identified variants via the change of free energy change values (DDG). The results show that except OCRL:c.T2603 > A(p.Met876Lys), which increased the stability of the mutant compared to that of the wild-type variants, all variants showed decreased stability (Table S2). The HOPE tool was used to predict the structural effect of missense variants, showing changes in residue size and hydrophobic and structural stability (Fig. S2). Changes in amino acid size and charge resulted in a loss of interaction and disturbance of protein function. Several variants, for example HACE1:c.G1660 > A(pAla651Thr) and PHKA1:c.G478 > A(p.Asp160Asn), whose wild-type residues are located in important domains. Thus, any substitution in these regions was predicted to lead to a functional disturbance. In contrast to I-mutant prediction, HOPE showed that an alternation of methionine by lysine residue in the variant OCRL:c.T2603 > A(p.Met876Lys) can disturb the hydrophobic interaction of the altered residue with other molecules on the surface of the protein (Fig. S2).

Analysis of biological function and human disease phenotype. Compute overlaps of 25 candidate
genes to the human phenotype ontology from the Molecular Signatures Database, involving 4,494 gene sets (FDR q value < 0.05), indicated that the candidate genes felt into various human phenotype gene sets, ranging from gonosomal inheritance and X-linked recessive inheritance to involuntary movements (Table 4). We also computed our gene set to find the association of these genes with the reported phenotypes available from the HPO and Monarch Initiative (Table S3). However, we did not find any overlapping phenotypes from these databases. The reason might be a lack of genes/pathways associated with the BA phenotype in these available databases, which are often dominated by studies on Caucasians, where the prevalence of BA in this group is much lower than that in Asians. By applying the same strategy to identify the potential contribution of ciliary dysgenesis underlying the BA phenotype, we used a gene set containing 2016 genes of interest 43 . We found that some genes from our study, including BCOR, INVS and OCRL, were included in this gene set. This result suggested the novelty of BCOR, INVS and OCRL from our BA cohort.
Similar to a previous study 43 , we did not identify any variants in some genes that have been previously suggested to be associated with BA or BA-related diseases, such as PKD2 (polycystic kidney disease 2, polycystic kidney and hepatic disease 1), CFC1 (polysplenia), JAG1 (Alagille syndrome) and PKD1L1 (biliary atresia splenic malformation syndrome-BASM). We also did not find significant variants in the susceptibility loci of ADD3, XPNPEP1, GPC1, ARF6 and EFEMP1, as suggested by GWAS 44 .

Discussion
Similar to other previous studies, we attempted to reveal the genetic pattern of BA disorder by conducting triobased exome sequencing for 40 families involving 41 children with BA. Going beyond this establishment in a genetic study for such a rare and complex disorder, we further tested our hypothesis of whether the detected variants occurred in somatic or germline cells by sequencing both blood and available liver specimens obtained from our BA cohort. Due to the complexity of BA, we applied a stringent bioinformatics pipeline and tight quality control to determine either the rarest variants or putative de novo events from our BA cohort, which would avoid a huge number of variants as often experienced from mass sequencing. Taking this straightforward principle enabled us to end up with a total of 28 variants in 25  www.nature.com/scientificreports/ and liver samples allowed us to rule out the occurrence of somatic variants in the development of the disease as previously hypothesized 45 .
In agreement with previous studies, our results showed an intriguing genetic aspect of BA, which was highly heterogeneous. It is worth noting that along with other variants, this study found 3 genes whose variants occurred    43 further supported the possibility of the causative role of these genes in BA. AMER1 (MIM#300647) encodes APC membrane recruitment protein 1, which acts as an inhibitor of the canonical Wnt/beta-catenin signalling pathway 46 and controls hepatobiliary development during embryogenesis. In mature healthy liver cells, it is mostly inactive, and the abnormal Wnt/beta-catenin signalling pathway can promote the development of liver diseases 47 . AMER1 associates with osteopathia striata with cranial sclerosis 48 and Wilms tumour development [49][50][51] . The gene is involved in the activation of the Wnt/ beta-catenin signalling pathway, which drives hepatocarcinoma and cholangiocarcinoma 52 . In addition, analysis of the effect of genetic predispositions of AMER1 variants indicated that they were damaging because the alternated residues were located in highly conserved positions. The alternations might lead to destabilization of the local conformation and a loss of protein interaction (Table S1, S2, Fig. S2). Despite a lack of AMER1 to typical BA phenotypes, we inferred its indirect role in the development of BA as a result of activation of the Wnt/betacatenin signalling pathway.
Our study highly suggested INVS as a BA candidate gene owing to INVS variant detection in 2 unrelated probands, their mode of inheritance and the effect of genetic predisposition. In particular, INVS: c.C208 > T (p.Arg396*) was de novo, and a loss-of-function variant with a CADD score of 37 and its allele frequency was absent from all employed databases. INVS encodes inversin protein, which plays a role in primary cilia function and is involved in the cell cycle. Intriguingly, inactivation of INVS in a mouse model shows a significant increase in bilirubin levels compared to that of the wild-type and pathogenic changes in ductal plate malformation in the intrahepatic biliary of the mutant mouse 53 . The association of INVS with BA had not been previously established due to an absence of INVS variants detected in BA patients 54,55 . However, INVS is associated with infantile nephronophthisis type 2 [56][57][58] . In our study, we detected an INVS heterozygous de novo variant and a homozygous variant from 2 BA unrelated patients (BA014 and BA041). To our knowledge, this novelty is first reported in BA patients, although future studies are needed to clearly explore the role of INVS in BA development. Similar to BCOR and INVS, OCRL encodes inositol polyphosphate 5-phosphatase, which might be involved in primary cilia assembly. OCRL has been widely reported to be linked to Lowe and Dent syndrome, where clinical manifestations often overlap with Zellweger spectrum disorders, characterized by low muscle tone, feeding difficulty, seizures and liver dysfunction [59][60][61] . Likewise, a lack of an association of OCRL and BA or liver diseases remains a gap for future investigation.
As a result of a rapidly declining cost of DNA sequencing, dozens of rare and previously undiagnosed genetic disorders are currently detectable. For the last 10 years, NGS technology has revolutionized our understanding of human genetics with a high level of accuracy, cost effectiveness and high throughput capability. NGS is steadily becoming a standard in routine diagnostic practices 62 . In BA studies, mitochondrial DNA has been found www.nature.com/scientificreports/ to associate with BA, suggesting the role of mitochondria in underling the pathogenic mechanism 17 . WES has revealed dozen candidate genes either encode ATP-binding cassette transporters (the ABC superfamily) 18,19 or are involved in the Notch signalling pathway, such as JAG1 19,63 and NOTHC2 20 . GWAS have highlighted a strong association between BA and some variants in the ADD3 gene located on 10q24.2 64 . Another subsequent study on 171 BA patients and 1,630 controls of European descent found the strongest signal at rs7099604 in the ADD3 gene 65 . A significant association was found between variant rs17095355 on the XPNPEP1 gene and the disease 66 . Taken together, the aetiology of BA remains challenging due to the involvement of multiple genes and complex mechanisms. Being encouraged by the pioneers, we provided a concrete genetic aspect obtained from an exome trio-based study of a Vietnamese BA cohort. The findings add to our knowledge of the genetic heterogeneity and complexity of BA disorder.

Conclusion
The aetiology of BA remains challenging because there is a lack of conclusive evidence despite extensive research and medical practices for hundreds of years. However, the recent development of NGS technology and its application in studies of BA and liver diseases have gradually revealed the hidden genetic picture of BA aetiology, where dozens of BA-associated genes have been found. Our study identified 28 variants in 25 genes (all validated) from 41 children with BA. These variants were in the 10% most deleterious and were either rare or extremely rare in the population genome database. A combination of functional prediction and analysis of biological processes enabled us to suggest these candidate genes for the development of BA, particularly with those detected in unrelated BA individuals, including AMER1, INVS and OCRL. Identical variants detected from blood and liver wedge specimens from each BA individual suggested that somatic variants in the liver cells were unlikely to occur during morphogenesis. Taken together, we highlighted the genetic heterogeneity of BA and ruled out the Mendelian model. Future studies are needed to further explore the roles of these genes in the development of BA.

Data availability
Data are available from this manuscript and supplementary information.