Introduction

Autosomal recessive polycystic kidney disease (ARPKD; Online Mendelian Inheritance in Man (OMIM) 263200) is a rare severe genetic disorder that occurs in 1:20 000 live births.1 Typically, the disease arises in the perinatal period and patients present with enlarged kidneys and liver, respiratory failure, hypertension and urinary tract infection. Moreover, enlarged and echogenic kidneys, leading to oligohydramnios and possibly to pulmonary hypoplasia, can be observed in utero by fetal ultrasound. Pulmonary hypoplasia is the major cause of morbidity and mortality in the newborn period. In the survivors, hypertension and renal insufficiency, including end-stage renal disease (up to one-third of children require renal replacement therapy), are the major signs of renal disease.2 A further pathognomonic sign of disease is the biliary dysgenesis resulting in congenital hepatic fibrosis, plus intrahepatic bile duct dilatation (Caroli disease). Late-onset presentation of the disease, in childhood or adulthood, has been reported, and these patients predominantly have liver disease and milder kidney involvement.3, 4 ARPKD is caused by mutations of the PKHD1 (polycystic kidney and hepatic disease 1) gene mapped on the short arm of chromosome 6 (6p12.2).5, 6 PKHD1 is one of the largest human genes, extending over 469 kb encompassing 86 exons, 67 of which lead to the longest transcript encoding a 4074-amino-acid protein, polyductin/fibrocystin. The protein has a predicted molecular weight of 447 kDa and it is expressed in the basal body and primary cilia of renal and bile duct epithelial cells.6, 7 Mutations are spread along the entire gene, but they are not equally scattered. Except for a few population-specific founder alleles (that is, p.Arg496*,8 p.Met627Lys9 and the common p.Thr36Met amino acid change), PKHD1 is characterized by significant allelic heterogeneity and the great majority of the patients are compound heterozygous. 
The gene has been screened in several cohorts of different sample sizes as well as in isolated familial cases. To date, >800 variants have been identified in the PKHD1 gene, 748 of which have been recorded in the public ARPKD/PKHD1 database (http://www.humgen.rwth-aachen.de; last update: August 2013).8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32 Detection rates range between 42 and 87%, depending on the technique used and the heterogeneity of the population studied. The mutational spectrum consists of truncating mutations (frameshift, nonsense and splice site) and a large number of missense mutations spread over the entire gene. Single-exon or multiexon deletions/duplications of the PKHD1 gene are very rare, with these copy number variations being reported by only three groups.16, 22, 32 In individuals who are PKHD1 negative, mutations in other PKD genes, such as HNF1B, PKD1 and PKD2, have been detected.33, 34, 35 In 2005, based on the analysis of a large number of PKHD1-mutated alleles reported in the literature, Bergmann et al.36 defined a worldwide exon algorithm for PKHD1 mutation screening, with 50% of the detected mutations affecting 7 exons (3, 58, 32, 36, 57, 61 and 9). Taking into account the algorithm proposed by Bergmann et al.,36 we analyzed by Sanger sequencing 130 unrelated ARPKD patients, 117 of whom were of Italian origin. In addition, we screened a subset of the patients by applying the multiplex ligation-dependent probe amplification (MLPA) technique in order to search for exon deletions/duplications.

Materials and Methods

Patients

Up to December 2013, at the Medical Genetics Center of IRCCS Casa Sollievo della Sofferenza Hospital, San Giovanni Rotondo (Italy), we collected blood samples from 116 unrelated ARPKD patients and from 28 parents of 14 affected subjects who were not themselves available, for a total of 130 probands (59 females and 71 males). Seventy-six had disease onset at <1 year of age (27 neonatal cases, 40 perinatal cases and 9 pregnancy terminations) and 54 had childhood, juvenile or adult disease onset, with an average age of 13.3 years (range 1–54). Signed informed consent for genetic testing was obtained for each individual.

Inclusion criteria for an ARPKD diagnosis were dilated collecting ducts and congenital hepatic fibrosis, as documented by renal and liver histology when available. If histology was not available, imaging findings had to provide strong evidence of ARPKD (bilateral nephromegaly, increased renal echogenicity, loss of corticomedullary differentiation and signs of hepatic fibrosis). Moreover, all probands presented with at least one of the following clinical features: oligohydramnios or anhydramnios, pulmonary hypoplasia, portal hypertension, Potter’s face or affected siblings.

This study was approved by the Ethics Committee of the IRCCS CSS Hospital and complies with the guidelines of the Declaration of Helsinki.

DNA amplification and sequencing

DNA was extracted from blood using the EZ1 DNA Blood Kit (QIAGEN, Hilden, Germany). PCR was performed with 70–80 ng of genomic DNA, 15 pmol of each primer and AmpliTaq Gold DNA polymerase (Applied Biosystems, Austin, TX, USA) under the following conditions: 12 min at 94 °C followed by 35 cycles of steps at 94 °C, an amplicon-specific annealing temperature (56, 58, 60, 62, 64 or 65 °C) and 72 °C. PCR products were purified with ExoSAP-IT (USB, Affymetrix, Cleveland, OH, USA) and sequenced in both directions using BigDye terminator v1.1 chemistry on an ABI PRISM 3130xl Genetic Analyzer (Applied Biosystems, Austin, TX, USA).

DNA sequencing was performed on all coding exons (2–67) of the longest open reading frame of PKHD1 and their flanking intronic sequences (20 to 70 bp on each side of each exon). These regions were amplified in 74 amplicons. Exons 32, 58 and 61 were sequenced in overlapping fragments because of their large size (5, 3 and 3 fragments, respectively). Primers were designed using the Primer3 program (http://frodo.wi.mit.edu/), and the amplicons were arranged and analyzed first according to the algorithm proposed by Bergmann et al.36 and second according to their annealing temperatures in order to optimize the workflow. DNA alignment with the reference sequence (NM_138694.3) and variant analysis were carried out using Sequencher (GeneCodes, Ann Arbor, MI, USA).
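The amplicon count given above can be checked with a short sketch. It assumes, as the text implies but does not state outright, that every coding exon other than 32, 58 and 61 was covered by exactly one amplicon; the function and variable names are illustrative only.

```python
# Exon layout as described in the text: coding exons 2-67 of the longest ORF,
# with three large exons split into overlapping fragments.
coding_exons = range(2, 68)  # exons 2..67 inclusive
fragments_per_large_exon = {32: 5, 58: 3, 61: 3}


def count_amplicons(exons, split):
    """Assume one amplicon per exon, except exons listed in `split`."""
    return sum(split.get(exon, 1) for exon in exons)


n_exons = len(coding_exons)
n_amplicons = count_amplicons(coding_exons, fragments_per_large_exon)
print(n_exons, n_amplicons)  # 66 74
```

Under this assumption, 63 single-amplicon exons plus 11 fragments reproduce the 74 amplicons reported.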

MLPA analysis

Patients who tested negative or carried a single PKHD1 mutation underwent MLPA analysis in order to detect gene deletions or duplications. Similarly, subjects carrying variants of a priori doubtful pathogenicity (that is, novel missense variants) were also analyzed by MLPA.

The P341 and P342 probemixes, developed by R Vijzelaar at MRC-Holland (Amsterdam, The Netherlands), were used; they contain 71 probes interrogating all exons of the longest open reading frame (NM_138694.3 sequence) except exon 17.

MLPA was performed according to the manufacturer’s protocol. Briefly, 100 ng of genomic DNA was used as starting material; after hybridization, ligation and amplification, the PCR products were size-separated by capillary electrophoresis on an ABI PRISM 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). Electropherograms were visualized with GeneMapper v.4.0 (Applied Biosystems, Foster City, CA, USA) and data analysis was carried out using Coffalyser.Net, the free software designed by MRC-Holland specifically for the analysis of MLPA data and compatible with data files produced directly by all major capillary electrophoresis systems.

Samples showing copy number variations at MLPA analysis were reanalyzed in a second independent experiment and, when possible, the result was confirmed in the parents and other available relatives.
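As an illustration of how probe dosage ratios can be screened for candidate copy number changes, the sketch below flags probes that fall outside the 0.7–1.3 normal range quoted in the Figure 2 caption. It is not the Coffalyser.Net procedure; the function names and the example ratios are hypothetical.

```python
NORMAL_RANGE = (0.7, 1.3)  # dosage-ratio bounds quoted in the Figure 2 caption


def classify_ratio(ratio, normal_range=NORMAL_RANGE):
    """Label a probe ratio relative to the normal range (bounds inclusive)."""
    low, high = normal_range
    if ratio < low:
        return "reduced"
    if ratio > high:
        return "increased"
    return "normal"


def flag_probes(ratios):
    """Return only the probes whose ratio lies outside the normal range."""
    labels = {probe: classify_ratio(ratio) for probe, ratio in ratios.items()}
    return {probe: label for probe, label in labels.items() if label != "normal"}


# Hypothetical ratios, for illustration only (not measured data).
example = {"exon 37": 1.02, "exon 38": 0.52, "exon 39": 0.49, "exon 40": 0.97}
print(flag_probes(example))  # {'exon 38': 'reduced', 'exon 39': 'reduced'}
```

A reduced ratio on a single probe is only a candidate finding; as reported in the Results, a point mutation near a probe site can mimic a deletion, so confirmation by a second method or by segregation remains necessary.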

Bioinformatics analysis

The putative pathogenicity of nonsynonymous PKHD1 variants was inspected by querying the dbNSFP version 3.0b2a,37 a database containing multiple precomputed functional impact predictions for every possible amino acid changing mutation within the human genome. In particular, SIFT,38 PROVEAN 1.1,39 PolyPhen-2 v2.2.2,40 LRT,41 MutationTaster2,42 MutationAssessor2,43 FATHMM v2.3,44 and CADD v1.245 predictions were extracted for each single missense variant.

These tools implement different methods (often based on supervised machine learning) to assign a pathogenicity score together with a (typically) binary categorical prediction (‘neutral’ or ‘pathogenic’).
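One simple way to summarize such categorical predictions for a single variant is to tally the calls across tools. The sketch below is only an illustration: the text does not state how the predictions were combined, the tallying rule is an assumption, and the calls shown are hypothetical rather than taken from Table 3.

```python
from collections import Counter


def tally_predictions(calls):
    """Count categorical calls across tools, ignoring missing ones (None)."""
    return Counter(call for call in calls.values() if call is not None)


# Hypothetical calls for one missense variant (illustration only).
calls = {
    "SIFT": "pathogenic",
    "PROVEAN": "pathogenic",
    "PolyPhen-2": "pathogenic",
    "LRT": "neutral",
    "MutationTaster2": "pathogenic",
    "MutationAssessor2": "neutral",
    "FATHMM": "neutral",
    "CADD": None,  # no categorical call available in this example
}

counts = tally_predictions(calls)
print(counts["pathogenic"], counts["neutral"])  # 4 3
```

A tally of this kind is a convenience for reading a predictions table; it does not replace segregation analysis or population frequency data.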

Furthermore, evolutionary conservation on corresponding genomic sites was inspected by retrieving the PhyloP7way46 and GERP++ indexes47 (see Table 3 for details).

Mutations were also checked for their presence in public variant databases, such as dbSNPv14448 and ExAC v0.3 (Exome Aggregation Consortium; http://exac.broadinstitute.org/, accessed 20 October 2015).

For PKHD1 variants located in splicing regions, the possible outcome was predicted using the Human Splice Finder algorithm on the HSF3.0 webserver,49 a position weight matrix-based method that identifies splicing signals and computes the variation in similarity score between the wild-type and the mutant sequence with respect to a signal-specific position weight matrix. Briefly, a splice site signal is identified along a DNA sequence by its similarity to a matrix (that is, it reaches a determined score threshold); if a mutation alters such a sequence, the splice site is tagged as ‘broken’ when its score decreases by at least 10% of the wild-type value. Conversely, when the similarity score increases by at least 10%, the mutation is considered to have created a new putative splice site.
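The 10% rule described above amounts to a relative score change. The following sketch applies it to the wild-type and mutant scores reported in the Results for the c.2407+4A>G variant (81.15 and 72.81); the function names are illustrative and the code is not part of the HSF3.0 webserver.

```python
def score_variation(wild_type, mutant):
    """Relative change of the mutant score, as a percentage of the wild type."""
    return (mutant - wild_type) / wild_type * 100.0


def classify_site(wild_type, mutant, threshold=10.0):
    """Apply the 10% rule: 'broken' on a drop, 'new site' on a gain."""
    variation = score_variation(wild_type, mutant)
    if variation <= -threshold:
        return "broken"
    if variation >= threshold:
        return "new site"
    return "no significant change"


variation = score_variation(81.15, 72.81)
print(f"{variation:.2f}%", classify_site(81.15, 72.81))  # -10.28% broken
```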

Furthermore, information on protein structure and domains was retrieved by querying the UniProt (http://www.uniprot.org/) and Pfam (http://pfam.xfam.org/) web resources.

Results

Mutation analysis

A total of 130 unrelated subjects underwent genetic testing for ARPKD, of whom 117 (90%) were Italian; the remaining patients were from Morocco, Albania, China and Pakistan (6, 4, 2 and 1, respectively).

Mutation analysis of the PKHD1 gene was carried out by direct sequencing, applying the algorithm described by Bergmann et al.,36 and by the MLPA technique. We performed an exhaustive analysis in 110 out of 130 individuals, whereas in the other 20 subjects, all of Italian origin, only a partial screening was carried out because the DNA sample was insufficient and no further sample was available. For each of these 20 subjects, between 12 and 45 PKHD1 exons were analyzed by sequencing.

Taking into account the 110 patients who were completely analyzed (97 Italian and 13 from different countries), we identified 173 mutations out of a total of 220 expected mutated alleles, with a detection rate of 78.6%. At least one mutation was identified in 98 families (89.1%), whereas no mutation was identified in 12 probands (Table 1). Sixty-five patients were compound heterozygous and 10 patients were homozygous. Segregation analysis of the parents showed that, in 54 of the 75 probands carrying 2 mutations, the mutant alleles reside on separate chromosomes (Table 2).
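The figures in this paragraph are mutually consistent, as the following sketch shows. It assumes that every proband contributes 0, 1 or 2 characterized alleles, so the number of probands with a single mutation is derived here rather than quoted from the text.

```python
# Figures quoted in the text for the 110 completely analyzed probands.
probands = 110
compound_heterozygous = 65
homozygous = 10
no_mutation = 12

# Assumption: every proband contributes 0, 1 or 2 characterized alleles.
two_alleles = compound_heterozygous + homozygous
one_allele = probands - two_alleles - no_mutation
characterized = 2 * two_alleles + one_allele
expected = 2 * probands

detection_rate = 100 * characterized / expected
families_with_mutation = two_alleles + one_allele
family_rate = 100 * families_with_mutation / probands

print(one_allele, characterized, expected)          # 23 173 220
print(f"{detection_rate:.1f}% {family_rate:.1f}%")  # 78.6% 89.1%
```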

Table 1 Number of patients characterized for PKHD1 mutations by exhaustive sequencing and MLPA analysis
Table 2 Genotype description in a cohort of 130 ARPKD families

On the other hand, considering the total number (n=130) of analyzed individuals, 107 different mutations were detected in 193 mutated alleles. Of these 107 mutations, 45 had been previously described and reported in the literature: 12 truncating mutations, 1 splice site mutation and 30 missense mutations that are considered pathogenic in the public ARPKD/PKHD1 database. Two additional variants, p.Glu1448Gly and p.Trp1229Ser, with a very low frequency (minor allele frequency=0.002 and 0.003 in the ExAC database, respectively), were considered possible mutations even though they are not clearly characterized as pathogenic in this locus-specific database.

In all, 62 were novel mutations (Tables 2 and 3); 24 were truncating mutations considered definitely pathogenic: 11 nonsense, 6 frameshift, 2 multiexon deletions detected by MLPA and 5 splice site mutations; 2 were in-frame deletions. Two splice site mutations affected nucleotides at noncanonical positions of the donor and acceptor splice sites, in intron 23 and intron 29, respectively. The c.2407+4A>G variant, identified in patient pkd220, was predicted to significantly alter the wild-type donor splice site according to the HSF3.0 resource (wild-type site score: 81.15; mutant site score: 72.81; variation: −10.28%). In contrast, the c.3365-3C>T change was not predicted to be deleterious; however, its detection in two unrelated patients (pkd82 and pkd133), its segregation in their families and its absence from the public database make it a possible pathogenic mutation rather than a neutral variant. Thirty-four were missense variants classified as likely pathogenic mutations according to the segregation analysis and in silico evaluation (Table 3). Figure 1 shows the map of amino acid variants along the schematic structure of the PKHD1 protein. Three variants were identified in each of two patients: the nonsense p.Arg124* mutation and the novel missense p.Thr713Ala and p.Leu2244His changes in patient pkd189, and the novel missense mutations p.Asp2528Asn and p.Ala3224Pro, the latter in the homozygous state, in patient pkd30. Unfortunately, the parents of pkd189 were not available and thus it was not possible to verify the segregation pattern. The p.Ala3224Pro variant was found in homozygosity in patient pkd30 and in compound heterozygosity in patients pkd31 and pkd214, all of whom originate from the same geographical area of northern Italy. In all, 69 subjects underwent MLPA analysis: 25 carried a single variant, 13 had no mutations and 31 carried novel missense mutations in homozygosity or in compound heterozygosity.
The latter were included in order to exclude the presence of deleterious mutations other than the novel missense variants found. In total, four alleles were characterized by MLPA. A heterozygous deletion of exons 38 and 39 (c.6122-?_6490+?del) was identified in patients pkd111 and pkd156 and confirmed in the available parents. A second large deletion encompassing exons 38–41 (c.6122-?_6808+?del) was found in patients pkd100 and pkd190 (Figure 2). At the protein level, the deletion of exons 38–39 (p.Ser2042_Gly2164del) partially affects the G8 domain50 (amino acid positions: 1933–2052), whereas the deletion of exons 38–41 (p.Ser2042_Gly2270del) affects the region between the G8 and the β-helix secondary motif (residues 1933–2052 and 2242–2419, respectively).
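The relationship between the two deletions and the annotated domains can be made explicit with a small interval comparison. The sketch below uses only the coordinates quoted above; it treats the c. positions as exon boundaries (the genomic breakpoints are unknown, as the "?" notation indicates), and the helper names are illustrative.

```python
# Residue ranges quoted in the text.
DOMAINS = {"G8": (1933, 2052), "beta-helix": (2242, 2419)}

# Deletions as reported: cDNA exon boundaries and protein-level description.
DELETIONS = {
    "exons 38-39": {"cdna": (6122, 6490), "protein": (2042, 2164)},
    "exons 38-41": {"cdna": (6122, 6808), "protein": (2042, 2270)},
}


def length(interval):
    """Number of positions in an inclusive (start, end) interval."""
    start, end = interval
    return end - start + 1


def overlap(a, b):
    """Inclusive overlap of two intervals, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


for name, deletion in DELETIONS.items():
    codons = length(deletion["cdna"]) / 3
    residues = length(deletion["protein"])
    hits = {dom: overlap(deletion["protein"], rng) for dom, rng in DOMAINS.items()}
    print(name, codons, residues, hits)
```

Under these coordinates, both deletions overlap the last 11 residues of the G8 range (2042–2052), and only the exons 38–41 deletion reaches the β-helix range (2242–2270). The cDNA spans (369 and 687 nucleotides) correspond to the 123 and 229 residues of the protein-level descriptions.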

Table 3 Description of the putative effect of missense PKHD1 mutations
Figure 1

Map of amino acid variants along the schematic structure of the PKHD1 (polycystic kidney and hepatic disease 1) protein. Novel missense, frameshift and nonsense variants were mapped on a ‘lollipop plot’ of the PKHD1 protein using the MutationMapper v1.01 webtool (www.cbioportal.org). The plot was downloaded in ‘scalable vector graphics’ file format and refined appropriately. Gray-scale blocks represent IPT/TIG, G8 and β-helix domains, respectively.

Figure 2

Characterization of two PKHD1 (polycystic kidney and hepatic disease 1) exonic deletions. Bar chart of the multiplex ligation-dependent probe amplification (MLPA) analysis performed on patients pkd111 and pkd100, showing the deletion of exons 38–39 and 38–41 of the PKHD1 gene, respectively. The normal range of the dosage ratio is between 0.7 and 1.3.

In addition, 10 subjects showed an apparent deletion of exon 33 or exon 57 because the mutations p.Arg1775*, p.Asn1779Ser, p.Asn1781Ser and p.Ile2957Thr interfered with the MLPA reaction.

Discussion

Here we present one of the largest studies of genetic mutations in ARPKD, representing the first screening of the PKHD1 gene in a large cohort of Italian patients. In fact, only 21 subjects of Italian origin had been studied and reported in the public ARPKD/PKHD1 database until now.

In order to identify causative mutations in our cohort of 130 probands, we applied the algorithm defined by Bergmann et al.36 to sequence all coding regions of the gene. Moreover, to search for exon deletions/duplications, MLPA analysis was carried out in a subset of subjects who tested negative or carried only one mutation after sequencing analysis.

As only 13 out of 130 probands (10%) analyzed in this study were not Italian, we believe these results can be considered representative of the Italian population.

In the completely analyzed 110 ARPKD probands, 173 out of 220 expected mutated alleles were characterized, achieving a detection rate of 78.6%, in line with whole-coding sequence screenings of the PKHD1 gene in cohorts of similar size.13, 14 Failure to detect mutations in 21.4% of chromosomes may have several explanations. The missing mutations in heterozygous subjects and in 12 patients without identified mutations could be located in deep intronic or other regulatory regions distant from the splice donor and acceptor sites that have not been screened so far; alternatively, causative mutations could reside in other genes such as HNF1B, PKD1 and PKD2, not analyzed in the present study.

If we consider all 130 probands, a total of 107 different mutations were detected, accounting for a total of 193 characterized alleles. Of these 107 mutations, 45 (42.1%) had been previously described and reported in the literature, whereas 62 (57.9%) were novel. The most frequent mutations identified in our cohort were c.5895dupA (p.Leu1966Thrfs*4) and p.Arg1624Trp (4.7%), p.Thr36Met and p.Arg1775* (3.6%) and p.Ile222Val (3.1%) among the known mutations, and p.Asn1779Ser (2.6%) and p.Gln2708* (2.1%) among the novel ones. The use of the MLPA technique to search for PKHD1 deletions or duplications allowed us to identify 4 alleles (2%) and proved indispensable for reaching a conclusive molecular diagnosis in 3 patients. The absence of a founder mutation reflects the high degree of genetic heterogeneity of the Italian population. Indeed, taking into account the families carrying two mutated alleles, only 5 out of 67 (7.4%) of the Italian probands showed homozygosity, not due to consanguineous marriages, compared with 5 out of 8 (62%) of the non-Italian probands (Table 1). Before our work, Krall et al.29 applied the algorithm defined by Bergmann et al.36 to analyze 50 individuals affected by ARPKD and proposed a sequencing strategy to facilitate genetic testing for Hispanic populations. They found that only 21 exons were sufficient to identify 86% of the expected mutated alleles. In our Italian population, mutations were detected in 45 of the 67 PKHD1 exons, with more than half of them concentrated in only 7 exons (32, 36, 58, 33, 61, 3 and 9), thus providing a 51% chance of finding at least one mutation.
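The idea behind an exon-prioritized screening strategy can be sketched as a cumulative yield calculation: exons are ranked by the number of mutated alleles they carry, and the fraction of alleles recovered is tracked as exons are added. This is a minimal sketch, not the procedure of Bergmann et al. or Krall et al.; the exon labels and counts are hypothetical placeholders, since per-exon counts are not given in the text.

```python
def cumulative_yield(alleles_per_exon):
    """Rank exons by allele count and return (exon, cumulative fraction) pairs."""
    total = sum(alleles_per_exon.values())
    ranked = sorted(alleles_per_exon.items(), key=lambda item: item[1], reverse=True)
    running = 0
    result = []
    for exon, count in ranked:
        running += count
        result.append((exon, running / total))
    return result


def exons_needed(alleles_per_exon, target):
    """Smallest number of top-ranked exons reaching the target fraction."""
    for n, (_, fraction) in enumerate(cumulative_yield(alleles_per_exon), start=1):
        if fraction >= target:
            return n
    return None


# Hypothetical per-exon allele counts (placeholders, not data from this study).
counts = {"exon A": 12, "exon B": 9, "exon C": 7, "exon D": 4,
          "exon E": 4, "exon F": 2, "exon G": 2}

print(exons_needed(counts, 0.50))  # 2
print(exons_needed(counts, 0.86))  # 5
```

With real per-exon counts, the same ranking would show how many exons must be sequenced before a given share of mutated alleles is expected to be found.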

In conclusion, our report expands the spectrum of PKHD1 mutations and confirms the allelic heterogeneity of this disorder. The studied population represents the largest Italian ARPKD cohort reported to date. Nevertheless, we think this could be one of the last screenings carried out by Sanger sequencing in such a large cohort of patients. In fact, large genes with allelic heterogeneity, such as PKHD1, make Sanger-based sequencing of single amplicons labor intensive and expensive and fully justify the use of next-generation sequencing (NGS) technologies. In addition, the NGS approach makes it possible to evaluate several genes of interest (that is, HNF1B, PKD1 and PKD2 in the case of ARPKD) in a single test. In the past years, some PKHD1 mutations were identified by exome sequencing.23, 30, 31 Recently, Tavira et al.51 estimated that the cost of NGS of 30 samples, plus Sanger sequencing of PCR fragments to assign the identified mutations, was 3.5-fold lower than that of Sanger sequencing of all the PKHD1 amplicons on an ABI3130xl sequencer. Moreover, Sanger sequencing-based molecular screening would have required several weeks, whereas the NGS project was completed in only 2 weeks. Exome sequencing and targeted resequencing are thus attractive tools that can improve molecular diagnosis in ARPKD and many other genetic diseases.