Introduction

Biliary atresia (BA) is a progressive inflammatory and fibrotic disease of the biliary tree that results in cholestatic liver disease. BA was first described by the surgeon John Thomson in 18921 and is among the most fatal diseases of infancy, leading to severe complications. The disease presents in the neonatal period and can be treated by hepatic portoenterostomy, also known as the Kasai operation2. After surgical treatment, however, approximately 50% of affected infants require liver transplantation, while the rest retain their native liver up to the age of 5–10 years3. A study of Vietnamese BA patients reported that 84% and 71% of Kasai-treated patients survived after 1 and 2 years, respectively, whereas the corresponding rates were 52% and 28% for the group without Kasai treatment4. It is estimated that after hepatic portoenterostomy, 70–80% of patients with BA still require liver transplantation by adulthood owing to progressive liver scarring, cirrhosis and liver failure5,6.

Although BA has been extensively studied, its aetiology and pathogenesis remain elusive. Several hypotheses have been proposed to explain the cause of the disease, including viral infection, autoimmune-mediated bile duct destruction, biliary toxins and genetic abnormalities7. Regarding genetic aspects, a Mendelian mechanism for the disease has been debated because familial BA is rare and BA presents discordantly in monozygotic twins8. Nevertheless, some cases of familial BA have been reported, suggesting that either autosomal recessive inheritance or a combination of genetic and acquired factors might contribute to the disease’s aetiology9,10,11,12. In addition, some studies have examined an association between BA and microchimerism, where the genetic trait is transferred from the mother and later contributes to phenotypic heterogeneity and non-Mendelian inheritance13,14. More specifically, a heterozygous transition, CFC1:c.433G > A, found in 5 BA patients with polysplenia syndrome implies a genetic predisposition to BA splenic malformation15. In a mouse model, inactivation of the hepatocyte nuclear factor 1 beta gene (Hnf1β) causes abnormalities of the gallbladder and intrahepatic bile ducts, resulting in severe jaundice16. Observations of an increased incidence of BA in some groups, such as Asian and Polynesian populations, suggest that both genetic and environmental factors might contribute to the disease. Recent genetic studies have revealed a linkage between cholestatic jaundice and genetic predispositions in both nuclear DNA and mitochondrial DNA17,18,19,20.

The prevalence of BA is 1 in 8000–18,000 live births and varies among countries and groups, with a female predominance21. The disease occurs more frequently in Southeast Asia and the Pacific region22. The prevalence is approximately 1 in 5000 live births in Taiwan compared to 1 in 14,000–20,000 in North America and Western Europe6,23,24. To our knowledge, no epidemiological study of BA has been conducted in the Vietnamese population, so the prevalence of this fatal disease in Vietnam has not been formally established. It has been estimated to be as high as 1 in 2400 live births, equal to that of the Pacific region22. Although BA and BA-related liver diseases are often observed in Vietnamese infants and are life-threatening, few studies have been reported thus far4,25,26. To date, the Kasai portoenterostomy procedure has been introduced as a routine surgical practice and offers patients a better outcome25. However, a number of patients still require liver transplantation after the operation or have a low quality of life due to the disease’s complications. Recently, next-generation sequencing (NGS), particularly whole exome sequencing (WES), has been increasingly applied to detect variants in patients with cholestasis27. It appears to be a powerful tool to aid diagnosis and to guide timely and accurate treatment. Therefore, we aimed to investigate the genetic pattern of BA by conducting family-based WES for children with BA, in the hope of identifying novel and previously characterized causative variants that would shed light on the aetiology of this deadly disease.

Materials and methods

Patient recruitment

BA diagnosis was based on intraoperative findings and liver biopsy. Patients with confirmed BA and their parents were recruited at Vinmec International Hospital and Vietnam National Hospital of Pediatrics in Hanoi from May 2019 to May 2020. A written informed consent form was provided to the parents for their participation. The study was approved by the Ethics Committee of the hospitals in accordance with the Declaration of Helsinki.

Sample collection and DNA extraction

Approximately 2 mL of peripheral blood from patients and their parents was collected in an EDTA anticoagulant tube and stored at −80 °C. Liver wedge specimens were collected during the Kasai operation, snap-frozen in liquid nitrogen and stored at −80 °C. Genomic DNA was extracted by using a DNA Mini Blood Isolation Kit according to the manufacturer’s protocol (Qiagen, Germany). DNA samples were quantified by fluorescence using a Qubit BR Quantification Kit (Invitrogen, USA). Extracted DNA samples were preserved at −80 °C for future use.

Whole exome sequencing

Exome sequencing libraries were prepared by using a Nextera Rapid Capture Kit (Illumina, Calif, USA) according to the manufacturer’s protocol with slight modifications. The library concentration was quantified by a Qubit dsDNA Broad Range Assay Kit (Invitrogen, USA). Library size was measured by using a LabChip 3 K Hisense Kit (PerkinElmer, USA). Paired-end exome sequencing with 150 bp reads was performed on a HiSeq 4000 (Illumina, Calif, USA), targeting an average depth of 100×.

Bioinformatics analysis

Variant calling and annotation were performed with widely used tools28. Low-quality reads, adapters and noise were removed prior to downstream analysis by using FastQC and Trimmomatic. Reads were aligned to the reference genome GRCh38 29. Bowtie2, BWA and Qualimap were used for quality control30. To minimize the false-positive rate, multiple variant calling tools, including the Genome Analysis Toolkit (GATK)31, SAMtools mpileup32 and Freebayes33, were used in combination.
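The idea of combining several callers to reduce false positives can be illustrated with a minimal sketch that keeps only variants reported by a minimum number of callers. The variant keys, the function name and the two-of-three threshold are illustrative assumptions; the exact consensus rule applied in this study is not specified in the text.

```python
from collections import Counter

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def consensus_variants(call_sets: dict[str, set[Variant]],
                       min_callers: int = 2) -> set[Variant]:
    """Return variants reported by at least `min_callers` of the callers."""
    support = Counter(v for calls in call_sets.values() for v in calls)
    return {v for v, n in support.items() if n >= min_callers}


# Hypothetical toy calls, not data from the study.
calls = {
    "gatk": {("chr9", 100, "C", "T"), ("chrX", 200, "G", "A")},
    "samtools": {("chr9", 100, "C", "T"), ("chr1", 300, "A", "G")},
    "freebayes": {("chr9", 100, "C", "T"), ("chrX", 200, "G", "A")},
}
shared = consensus_variants(calls, min_callers=2)
```

Raising `min_callers` to the number of callers gives a strict intersection, which trades sensitivity for a lower false-positive rate.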

Variant classification, functional prediction and genotype–phenotype analysis

A stringent strategy was applied for variant classification, including (i) inclusion of rare and nonsynonymous variants with a minor allele frequency (MAF) < 1% against three databases: the Kinh Vietnamese population (KHV) obtained from our previous study on the Vietnamese human genome database34, gnomAD (https://gnomad.broadinstitute.org/) and the 1000 Genomes Project29; (ii) inclusion of variants with 3 types of inheritance modes: X-linked, homozygous and putative de novo; and (iii) inclusion of variants with a CADD Phred score of > 10, indicating the 10% most deleterious variants in the genome35. In silico prediction tools, including SIFT36, PolyPhen-237, MutationTaster38, I-Mutant39 and HOPE40, were employed to predict the impact of genetic changes. The Molecular Signatures Database (MSigDB v7.2) was used to compute overlaps between the candidate genes and the gene sets of the human phenotype ontology41,42.
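The three criteria above can be expressed as a single predicate. The following is a minimal sketch only: the `Candidate` record, its field names, the database keys and the toy values are hypothetical, and treating a variant that is missing from a database as having a frequency of zero is an assumption rather than a documented step of the pipeline.

```python
from dataclasses import dataclass

MAF_CUTOFF = 0.01    # criterion (i): minor allele frequency < 1%
CADD_CUTOFF = 10.0   # criterion (iii): CADD Phred score > 10
INHERITANCE_MODES = {"x_linked", "homozygous", "de_novo"}  # criterion (ii)
DATABASES = ("KHV", "gnomAD", "1000G")


@dataclass
class Candidate:
    gene: str
    consequence: str        # e.g. "missense", "stopgain", "synonymous"
    maf: dict[str, float]   # allele frequency per database
    inheritance: str
    cadd_phred: float


def passes_filters(v: Candidate) -> bool:
    """Apply the rarity, consequence, inheritance and CADD criteria."""
    # Assumption: a variant absent from a database has frequency 0.
    rare = all(v.maf.get(db, 0.0) < MAF_CUTOFF for db in DATABASES)
    nonsynonymous = v.consequence != "synonymous"
    return (rare and nonsynonymous
            and v.inheritance in INHERITANCE_MODES
            and v.cadd_phred > CADD_CUTOFF)


# Hypothetical toy records, not variants from the study.
rare_stop = Candidate("GENE_A", "stopgain", {"gnomAD": 0.0001}, "de_novo", 37.0)
common = Candidate("GENE_B", "missense", {"KHV": 0.05}, "homozygous", 25.0)
silent = Candidate("GENE_C", "synonymous", {}, "x_linked", 12.0)
kept = [v.gene for v in (rare_stop, common, silent) if passes_filters(v)]
```

Keeping the thresholds as named constants makes it straightforward to test how sensitive the candidate list is to each cutoff.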

Validation of WES results

Selected variants were then confirmed by bidirectional Sanger sequencing. Specific primers were designed for these variants, followed by PCR amplification and sequencing on an ABI 3500 DX system using a BigDye Terminator v3.1 kit (Thermo Fisher, Massachusetts, USA).

Ethics approval

The study was approved by the Ethics Committee of Vinmec International General Hospital JSC (Decision No. 48/2019/QD-VMEC).

Consent to participate

A written informed consent form was provided to the parents prior to their participation.

Consent for publication

The participants provided consent for publication of all relevant data and this manuscript.

Results

Clinical features

We recruited a total of 42 children who had been diagnosed with BA based on intraoperative findings and liver biopsy. All patients showed typical BA symptoms in early infancy, such as prolonged jaundice, acholic stool and abnormalities of the biliary tract. The patients comprised 23 males and 19 females born from 2009 to 2019, with the majority born in recent years. All patients underwent Kasai surgery in early infancy (mostly after 2 months of life), but the concentrations of bilirubin and serum enzymes indicating liver function, such as ALP, ALT, AST and γ-GT, remained high at the time of enrolment (Table 1). Some patients had developed liver cirrhosis (BA002_3, BA005, BA012, BA013 and BA018). One patient was infected with CMV (BA037). Several probands had siblings who were reported to have liver diseases or other genetic conditions, including BA (BA002_4), primary sclerosing cholangitis (BA032), choledochal cyst (BA042) and haemophilia (BA025). Four mothers experienced abnormal pregnancies (BA024, BA027, BA036 and BA038). The remaining families did not report any significant concerns during pregnancy and had no family history of BA or other genetic conditions. Excluding one family who did not return for blood sampling after the first health examination, we collected blood samples from 41 BA-affected children and their parents. From 18 of these 41 children, we also collected liver specimens during the Kasai operation.

Table 1 Clinical features of children with biliary atresia.

Genetic properties

We applied a strict filtering strategy by removing variants with MAF > 1%, synonymous variants and variants with a CADD scaled score < 10. We thereby identified a total of 28 variants in 25 genes from our BA-affected cohort (Table 2). All variants were subsequently confirmed by Sanger sequencing (Fig. S1). Among the 28 detected variants, 17 X-linked variants (61%) were detected in 17 different genes; 6 de novo variants (21%) were detected in 6 genes from 5 probands, namely INVS, ELP2, TINAG, CEP63, CCDC136 and BCAR1; and 5 homozygous variants (18%) were identified in 5 genes, namely HACE1, VPS13C, RAPGEF4, FOCAD and INVS (Fig. 1, Table 2). Family #2 involved two siblings with similar phenotypes (early-onset jaundice and a BA diagnosis). Two X-linked variants and one homozygous variant were detected in the male sibling of family #2, whereas none were detected in his sister (Table 2). Interestingly, variants in several genes were observed in unrelated patients, including AMER1 (BA004 and BA007), INVS (BA014 and BA041) and OCRL (BA032 and BA041). Notably, proband BA014 carried a de novo INVS variant, while proband BA041 carried a homozygous INVS variant (Table 2).

Table 2 Genetic characteristics of Vietnamese children with biliary atresia.
Figure 1

Mode of inheritance of the variants identified in the biliary atresia cohort. X-linked variants are shown without fill (blank), de novo variants in grey and autosomal recessive variants with a dotted pattern.
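The three inheritance modes can be assigned from trio genotypes with a few simple rules. The sketch below is an illustration under stated assumptions, not the classification code used in this study: genotypes are encoded as alternate-allele counts, a hemizygous male X genotype is assumed to be coded as at least 1, and the function name and rule order are hypothetical.

```python
from typing import Optional


def classify_inheritance(chrom: str, child_sex: str,
                         child: int, father: int, mother: int) -> Optional[str]:
    """Classify a proband variant from trio genotypes.

    Genotypes are alternate-allele counts (0, 1 or 2).
    Returns "de_novo", "x_linked", "homozygous" or None.
    """
    if child == 0:
        return None  # the proband does not carry the variant
    if father == 0 and mother == 0:
        return "de_novo"  # present in the child, absent from both parents
    if chrom == "X" and child_sex == "male" and mother >= 1:
        return "x_linked"  # hemizygous son of a carrier mother
    if chrom != "X" and child == 2 and father >= 1 and mother >= 1:
        return "homozygous"  # one allele inherited from each carrier parent
    return None


# Hypothetical genotypes, not data from the study.
examples = [
    classify_inheritance("9", "female", 1, 0, 0),
    classify_inheritance("X", "male", 1, 0, 1),
    classify_inheritance("6", "male", 2, 1, 1),
]
```

A call labelled `de_novo` by such rules remains putative until the absence of the allele in both parents is confirmed, which is why Sanger validation of the trio is needed.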

In addition to blood samples, we collected 18 liver specimens from our BA cohort. Of these, blood and liver samples from 8 children shared identical variants (BA009, BA016, BA032, BA036, BA037, BA038, BA040 and BA041). We did not detect any additional significant variants in the liver samples under our variant classification criteria (Table 3). In other words, this study did not detect any somatic variants in the liver samples.

Table 3 Identical variants detected from blood and liver samples.
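The blood and liver comparison amounts to a set difference per individual: a variant present in liver tissue but absent from the matched blood sample is a somatic candidate. The sketch below uses hypothetical variant keys and a hypothetical function name, and it ignores practical issues such as coverage differences between the two samples.

```python
def somatic_candidates(blood: set, liver: set) -> set:
    """Variants seen in liver tissue but absent from the matched blood sample."""
    return liver - blood


# Hypothetical toy variant sets, not data from the study.
blood_calls = {("chr9", 100, "C", "T"), ("chrX", 200, "G", "A")}
liver_calls = {("chr9", 100, "C", "T"), ("chrX", 200, "G", "A")}

shared_calls = blood_calls & liver_calls                    # germline variants seen in both tissues
liver_only = somatic_candidates(blood_calls, liver_calls)   # empty set when no somatic candidate exists
```

An empty `liver_only` set for every individual corresponds to the situation reported here, in which blood and liver samples carried identical variants.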

Effect of genetic predisposition

The detected variants showed extremely low MAFs against the three employed databases: Kinh Vietnamese (KHV), gnomAD and the 1000 Genomes Project (Table 2). We noticed that the MAFs of the HACE1 and VPS13C variants were above 1% against the KHV database, while the rest were well below the threshold of 1%. All variants had CADD Phred-scaled scores above 10, and most were above 20, placing them among the 10% or 1% most deleterious substitutions, respectively. Among these variants, INVS:c.C208 > T was the most deleterious, with the highest scaled score of 37 (Table 2).
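The correspondence between the scaled score and the "10%" or "1%" most deleterious substitutions follows from the Phred-like scaling of CADD, in which a scaled score s corresponds to the top 10^(-s/10) fraction of all possible substitutions. A minimal sketch of this conversion is given below; the function name is hypothetical.

```python
def cadd_top_fraction(phred: float) -> float:
    """Fraction of substitutions ranked at least as deleterious as this scaled score."""
    return 10 ** (-phred / 10)


# Thresholds and the maximum score mentioned in the text.
fractions = {score: cadd_top_fraction(score) for score in (10, 20, 37)}
```

Under this scaling, a score of 10 corresponds to the top 10%, a score of 20 to the top 1%, and a score of 37 to roughly the top 0.02% of substitutions.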

The PolyPhen-2 and SIFT tools agreed on the damaging impact of the HACE1, PHKA1, XIAP, AMER1 (c.A1075 > T), POF1B, MAOA, BCAR1, FOCAD, ARSF and OCRL variants, while predictions for the remaining variants differed between tools (Table S1). We used I-Mutant to predict the effect of the amino acid substitutions on protein stability for the 28 identified variants via the change in free energy (DDG). The results showed that all variants decreased stability, except OCRL:c.T2603 > A(p.Met876Lys), which increased the stability of the mutant relative to the wild type (Table S2). The HOPE tool was used to predict the structural effects of missense variants, showing changes in residue size, hydrophobicity and structural stability (Fig. S2). Changes in amino acid size and charge were predicted to result in a loss of interactions and disturbance of protein function. For several variants, for example HACE1:c.G1660 > A(p.Ala651Thr) and PHKA1:c.G478 > A(p.Asp160Asn), the wild-type residues are located in important domains; thus, any substitution in these regions was predicted to lead to a functional disturbance. In contrast to the I-Mutant prediction, HOPE showed that the substitution of methionine by lysine in the variant OCRL:c.T2603 > A(p.Met876Lys) can disturb the hydrophobic interaction of the altered residue with other molecules on the surface of the protein (Fig. S2).

Analysis of biological function and human disease phenotype

Computing overlaps between the 25 candidate genes and the human phenotype ontology collection of the Molecular Signatures Database, which comprises 4,494 gene sets (FDR q value < 0.05), indicated that the candidate genes fell into various human phenotype gene sets, ranging from gonosomal inheritance and X-linked recessive inheritance to involuntary movements (Table 4). We also used our gene set to search for associations between these genes and the reported phenotypes available from the HPO and the Monarch Initiative (Table S3). However, we did not find any overlapping phenotypes in these databases. The reason might be a lack of genes or pathways associated with the BA phenotype in these databases, which are often dominated by studies on Caucasian populations, in which the prevalence of BA is much lower than in Asian populations. By applying the same strategy to identify the potential contribution of ciliary dysgenesis underlying the BA phenotype, we used a gene set containing 2016 genes of interest43. We found that some genes from our study, including BCOR, INVS and OCRL, were included in this gene set. This result highlighted BCOR, INVS and OCRL as novel candidates in our BA cohort.

Table 4 Analysis of human phenotype ontology.
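The significance of an overlap between a query gene list and a gene set is commonly assessed with a hypergeometric test, and the sketch below assumes that approach. The function name and the toy numbers are hypothetical, the size of the gene universe is an assumption that strongly affects the result, and the multiple-testing correction behind an FDR q value is not shown.

```python
from math import comb


def overlap_p_value(universe: int, gene_set: int, query: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe: total number of genes considered
    gene_set: number of genes in the annotated gene set
    query:    number of genes in the candidate list
    overlap:  number of candidate genes found in the gene set
    """
    upper = min(gene_set, query)
    total = comb(universe, query)
    tail = sum(comb(gene_set, k) * comb(universe - gene_set, query - k)
               for k in range(overlap, upper + 1))
    return tail / total


# Toy numbers for illustration only, not values from the study.
p_toy = overlap_p_value(universe=100, gene_set=10, query=5, overlap=2)
```

Because the tail probability depends on how the gene universe is defined, the same overlap can appear more or less significant under different background sets.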

Similar to a previous study43, we did not identify any variants in genes that have previously been suggested to be associated with BA or BA-related diseases, such as PKD2 (polycystic kidney disease 2, polycystic kidney and hepatic disease 1), CFC1 (polysplenia), JAG1 (Alagille syndrome) and PKD1L1 (biliary atresia splenic malformation syndrome, BASM). We also did not find significant variants in the susceptibility loci ADD3, XPNPEP1, GPC1, ARF6 and EFEMP1 suggested by GWAS44.

Discussion

Similar to previous studies, we attempted to reveal the genetic pattern of BA by conducting trio-based exome sequencing for 40 families involving 41 children with BA. Going beyond this standard design for the genetic study of such a rare and complex disorder, we further tested whether the detected variants occurred in somatic or germline cells by sequencing both blood and the available liver specimens from our BA cohort. Given the complexity of BA, we applied a stringent bioinformatics pipeline and tight quality control to identify the rarest variants and putative de novo events in our cohort, thereby avoiding the very large number of variants that mass sequencing often yields. This straightforward principle yielded a total of 28 variants in 25 genes. The identical variants detected in blood and liver samples allowed us to rule out a role for somatic variants in the development of the disease, as previously hypothesized45.

In agreement with previous studies, our results showed that the genetics of BA are highly heterogeneous. It is worth noting that, along with other variants, this study found 3 genes whose variants occurred in unrelated probands: AMER1, INVS and OCRL. While the aetiology of BA remains unclear and is unlikely to follow a Mendelian model, our results implicate these genes in the disease's development. The overlap of BCOR, INVS and OCRL in the Vietnamese BA cohort with a large, comprehensive set of ciliopathy and biliary genes of interest from a previous study43 further supports a possible causative role of these genes in BA. AMER1 (MIM#300647) encodes APC membrane recruitment protein 1, which acts as an inhibitor of the canonical Wnt/beta-catenin signalling pathway46, a pathway that controls hepatobiliary development during embryogenesis. In mature healthy liver cells, the pathway is mostly inactive, and its abnormal activation can promote the development of liver diseases47. AMER1 is associated with osteopathia striata with cranial sclerosis48 and Wilms tumour development49,50,51. The gene is involved in the activation of the Wnt/beta-catenin signalling pathway, which drives hepatocarcinoma and cholangiocarcinoma52. In addition, analysis of the predicted effects of the AMER1 variants indicated that they were damaging because the altered residues were located in highly conserved positions. The alterations might lead to destabilization of the local conformation and a loss of protein interactions (Table S1, S2, Fig. S2). Despite the lack of a known link between AMER1 and typical BA phenotypes, we inferred an indirect role in the development of BA through activation of the Wnt/beta-catenin signalling pathway.

Our study strongly suggested INVS as a BA candidate gene owing to the detection of INVS variants in 2 unrelated probands, their mode of inheritance and their predicted effect. In particular, INVS: c.C208 > T (p.Arg396*) was a de novo loss-of-function variant with a CADD score of 37 and was absent from all employed databases. INVS encodes the inversin protein, which plays a role in primary cilia function and is involved in the cell cycle. Intriguingly, inactivation of INVS in a mouse model leads to a significant increase in bilirubin levels compared to the wild type and to ductal plate malformation in the intrahepatic biliary system of the mutant mouse53. The association of INVS with BA had not been previously established due to an absence of INVS variants detected in BA patients54,55. However, INVS is associated with infantile nephronophthisis type 256,57,58. In our study, we detected a heterozygous de novo INVS variant and a homozygous INVS variant in 2 unrelated BA patients (BA014 and BA041). To our knowledge, this is the first report of such INVS variants in BA patients, although future studies are needed to clarify the role of INVS in BA development. Similar to BCOR and INVS, OCRL, which encodes inositol polyphosphate 5-phosphatase, might be involved in primary cilia assembly. OCRL has been widely reported to be linked to Lowe syndrome and Dent disease, whose clinical manifestations often overlap with Zellweger spectrum disorders, characterized by low muscle tone, feeding difficulty, seizures and liver dysfunction59,60,61. Likewise, the lack of a reported association between OCRL and BA or liver diseases remains a gap for future investigation.

As a result of the rapidly declining cost of DNA sequencing, dozens of rare and previously undiagnosed genetic disorders are now detectable. Over the last 10 years, NGS technology has revolutionized our understanding of human genetics with a high level of accuracy, cost effectiveness and high-throughput capability. NGS is steadily becoming a standard in routine diagnostic practice62. In BA studies, mitochondrial DNA has been found to be associated with BA, suggesting a role of mitochondria in the underlying pathogenic mechanism17. WES has revealed dozens of candidate genes that either encode ATP-binding cassette transporters (the ABC superfamily)18,19 or are involved in the Notch signalling pathway, such as JAG119,63 and NOTCH220. GWAS have highlighted a strong association between BA and some variants in the ADD3 gene located on 10q24.264. A subsequent study on 171 BA patients and 1,630 controls of European descent found the strongest signal at rs7099604 in the ADD3 gene65. A significant association was also found between variant rs17095355 in the XPNPEP1 gene and the disease66. Taken together, the aetiology of BA remains difficult to resolve owing to the involvement of multiple genes and complex mechanisms. Encouraged by these pioneering studies, we provide genetic evidence from a trio-based exome study of a Vietnamese BA cohort. The findings add to our knowledge of the genetic heterogeneity and complexity of BA.

Conclusion

The aetiology of BA remains unresolved because conclusive evidence is lacking despite extensive research and medical practice for more than a century. However, the recent development of NGS technology and its application in studies of BA and liver diseases have gradually revealed the hidden genetic picture of BA aetiology, in which dozens of BA-associated genes have been found. Our study identified 28 variants in 25 genes (all validated) from 41 children with BA. These variants were among the 10% most deleterious and were either rare or extremely rare in the population genome databases. A combination of functional prediction and analysis of biological processes enabled us to propose these genes as candidates in the development of BA, particularly those detected in unrelated BA individuals, namely AMER1, INVS and OCRL. Identical variants detected in blood and liver wedge specimens from each BA individual suggested that somatic variants in the liver cells were unlikely to occur during morphogenesis. Taken together, we highlighted the genetic heterogeneity of BA and ruled out the Mendelian model. Future studies are needed to further explore the roles of these genes in the development of BA.