Introduction

Diagnosis of congenital anomalies of the kidney and urinary tract (CAKUT) is based on the recognition of a broad spectrum of renal and urinary tract malformations that, in aggregate, constitute the most common cause of end-stage renal disease in children.1,2 CAKUT may result in chronic kidney disease that leads to severe impairment of physical and psychosocial development.3 CAKUT also poses a substantial economic burden on families and health-care systems. CAKUT is a clinically heterogeneous phenotype that encompasses renal agenesis, renal hypoplasia/dysplasia, multicystic kidney dysplasia, cross-fused ectopia, duplex renal collecting system, ureteropelvic junction obstruction, mega-ureter, posterior urethral valves, and vesicoureteral reflux (VUR).

Multiple lines of evidence suggest that genetic factors contribute to CAKUT. This evidence includes familial segregation of CAKUT cases and the identification of causative genes.4 Discovery of an underlying genetic etiology facilitates molecular diagnosis and can aid physicians and family members by clarifying associated risks and allowing improved genetic counseling.

In the past, genetic diagnosis was limited to the analysis of individual candidate genes, but whole-exome sequencing (WES) provides an opportunity to arrive at an accurate molecular diagnosis with a single test.5,6 WES is able to identify single-nucleotide variants (SNVs); more recently, it has also been used to uncover copy-number variants (CNVs), including small CNVs encompassing a single gene or even one exon.7 Extraction of CNV information from WES data is challenging, partly due to the potential artifacts introduced during the exon targeting and amplification steps of WES.8 Moreover, WES can aid the discovery of novel genes that potentially contribute to disease.

Initial reports of clinical WES at a Clinical Laboratory Improvement Amendments–certified laboratory indicated a molecular diagnostic rate of 25% for patients referred for genetic evaluation.5,6 However, neurological phenotypes constituted 80% of the patient population in that study. The clinical utility of using WES in common, sporadic birth defects is undergoing active investigation. Current evaluations of CAKUT patients involve diagnostic imaging, but involvement of other organs may go undiagnosed during such evaluations. Here, we investigated the utility of WES to define a molecular diagnosis (SNVs and CNVs) in patients clinically diagnosed with CAKUT.

Materials and Methods

Patients and their families were recruited from the pediatric urology and renal diseases clinics at the Texas Children’s Hospital in Houston, Texas. Inclusion criteria were nonsyndromic forms of CAKUT (as defined above) and syndromic forms of CAKUT for which a genetic etiology had not been identified. Exclusion criteria were syndromic forms of CAKUT in which an underlying genetic etiology was known and nonsyndromic, nonfamilial forms of VUR. Therefore, individuals with syndromic features without a known diagnosis were included in the study. The study protocol was approved by the Institutional Review Board for the Protection of Human Subjects at Baylor College of Medicine. Standard procedures were used to recruit subjects for this study. Demographics of the families and phenotypic details of subjects with CAKUT are summarized in Table 1 and Supplementary Table S1 online. Blood samples or saliva-based specimens were collected by standard procedures and according to the families’ wishes. DNA extraction was performed with a QIAamp kit (Qiagen) per the manufacturer’s instructions. DNA was quantified with a NanoDrop spectrophotometer, and 1 µg of DNA was used for WES. In familial multiplex cases, WES was performed on the available affected family members who were most distantly related in their respective pedigrees (see Figure 1). Among apparently isolated singleton cases, we performed WES for the proband and the two apparently unaffected parents (case–parent trios in 20 families) when both parents were available. In all other cases, WES was performed only for the proband.

Table 1 Demographics and different phenotypes of the 62 families with CAKUT who underwent whole-exome sequencing
Figure 1

Pedigrees and genotypes of the families with pathogenic single-nucleotide variants (SNVs) in genes known to cause CAKUT and a novel CAKUT gene (family 38, FOXP1). *Individuals for whom whole-exome sequencing was performed. NT, not tested. Family 1: Solid black fill shows renal dysplasia and solid gray fill shows proteinuria. CG/CG is normal and CG/C− denotes heterozygous deletion of G (c.70delG) in PAX2. Family 2: Proband has cystic renal dysplasia. C/C is normal and C/CC is heterozygous duplication of C (c.1132dupC) in HNF1B. Family 3: Proband has vesicoureteral reflux and multicystic dysplastic kidney. G/G is normal and G/A denotes heterozygous splice site variant (c.867+5G>A) in EYA1. Family 38: Proband has unilateral renal agenesis and hydrocephaly. G/G is normal and G/T denotes heterozygous de novo FOXP1 p.P225T SNV.

Whole-exome sequencing analysis

WES analysis started with conversion of raw sequencing data (bcl files) to the fastq format by Casava. The short reads were then mapped to a human genome reference sequence (GRCh37) by the Burrows-Wheeler Aligner (BWA). Subsequently, recalibration was performed by GATK,9 and variant calling was performed by the Atlas2 suite.10 These steps are implemented in the Mercury pipeline, which is available in the cloud via DNAnexus (http://blog.dnanexus.com/2013-10-22-run-mercury-variant-calling-pipeline/).

SNV prioritization and filtering workflow

After detection of all biallelic (homozygous or compound heterozygous) and de novo variants from WES data, we established an SNV prioritization workflow. This included sequential analysis of biallelic predicted loss-of-function variants (stopgain, frameshift indels, and splicing), biallelic missense variants, de novo truncating variants, and de novo missense variants. We then examined the rare variants shared among affected family members and parents to detect potential mosaic variants in parents. This SNV prioritization workflow was followed by filtering of variants based on their frequencies (minor allele frequency ≤0.1%) in internal and external databases, including the Baylor-Hopkins Center for Mendelian Genomics (BHCMG), the Exome Aggregation Consortium, the Exome Sequencing Project (ESP), the 1000 Genomes Project, and the Atherosclerosis Risk in Communities Study (ARIC) databases. To retrieve potentially deleterious and conserved missense changes, we utilized various bioinformatics tools, including the PhyloP conservation score and MutationTaster, SIFT (the “sorting intolerant from tolerant” algorithm), and PolyPhen-2 prediction scores. Next, these potential rare causative variants were analyzed in terms of (1) gene function and the associated phenotype in OMIM and PubMed; (2) gene-associated animal models; (3) tissue expression of the encoded protein; (4) association with the already known gene/genes linked to the patient’s phenotype in terms of (i) gene networks, (ii) gene families, (iii) coexpression, (iv) physical protein–protein interaction, (v) predicted protein–protein interaction, and (vi) molecular pathways; and (5) location of the variant with respect to functional protein domains. The most promising candidate variants were then confirmed by Sanger sequencing, which was also used to test their segregation within families.
Finally, the confirmed variants in candidate genes were interrogated in the BHCMG and Baylor Miraca Genetics Laboratories (BMGL) databases and/or through GeneMatcher for the identification of additional affected cases with similar phenotypes.
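The tiered ordering and the frequency cutoff described above can be illustrated with a minimal sketch. It is not the workflow used in the study: the `Variant` record, its field names, and the example genes are hypothetical, and the frequency filter is applied together with the tiering rather than as a separate later step.

```python
from dataclasses import dataclass

MAF_CUTOFF = 0.001  # minor allele frequency <= 0.1%, as stated in the text

# Tier order follows the sequential analysis described in the text.
TIERS = [
    ("biallelic", "lof"),
    ("biallelic", "missense"),
    ("de_novo", "lof"),
    ("de_novo", "missense"),
]

# Predicted loss-of-function classes named in the text.
LOF_EFFECTS = {"stopgain", "frameshift", "splicing"}


@dataclass
class Variant:  # hypothetical record, for illustration only
    gene: str
    effect: str       # "stopgain", "frameshift", "splicing", "missense", ...
    inheritance: str  # "biallelic" or "de_novo"
    max_maf: float    # highest frequency across the databases consulted


def tier(v):
    """Return the 0-based priority tier of a variant, or None if it fits no tier."""
    kind = "lof" if v.effect in LOF_EFFECTS else v.effect
    key = (v.inheritance, kind)
    return TIERS.index(key) if key in TIERS else None


def prioritize(variants):
    """Keep rare variants that fall into a tier, ordered from highest priority."""
    kept = [v for v in variants if v.max_maf <= MAF_CUTOFF and tier(v) is not None]
    return sorted(kept, key=tier)


example = [
    Variant("GENE_A", "missense", "de_novo", 0.0),
    Variant("GENE_B", "frameshift", "biallelic", 0.0005),
    Variant("GENE_C", "stopgain", "de_novo", 0.02),    # too common, dropped
    Variant("GENE_D", "synonymous", "de_novo", 0.0),   # fits no tier, dropped
]
ranked = [v.gene for v in prioritize(example)]
```

In this toy input, the biallelic frameshift variant ranks ahead of the de novo missense variant, and the common and synonymous variants are excluded.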

Copy-number variant inference

To identify copy-number variants (CNVs), our WES data were analyzed using CoNIFER11 software and CoNVex algorithms.12 In CoNVex, as a first step, read depth information from WES data was extracted. Then, the generalized additive model correction method was applied to remove systematic bias from the read depth information. Next, the Smith-Waterman algorithm was used to infer the CNV state and score the detected CNV regions. Each potential CNV region was assigned a confidence score. We further filtered CoNVex-detected CNV calls by selecting those with an associated confidence score ≥5 and ≥5 probes. Afterward, the CNV calls detected by CoNVex were overlapped with the CNV calls detected by CoNIFER by using the GRanges function in the R Bioconductor GenomicRanges package. Overlapping CNVs in previous studies were subjected to validation by array comparative genomic hybridization (aCGH).13,14 Nonoverlapping CNVs were also investigated. Among them, rare CNVs (not present or present with low frequency per the Database of Genomic Variants, http://dgv.tcag.ca) involving genes potentially contributing to kidney abnormalities were selected for aCGH validation. The flowchart of CNV discovery from WES data is provided in Supplementary Figure S1a online.
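The filtering and overlap steps described above can be sketched as follows. The study performed the overlap with GenomicRanges in R; this Python version is only an illustration of the same idea, and the `CNV` record, its field names, and the coordinates are hypothetical. Whether the two callers were also required to agree on CNV state is not stated in the text, so the sketch compares coordinates only.

```python
from typing import NamedTuple


class CNV(NamedTuple):  # hypothetical record, for illustration only
    chrom: str
    start: int          # inclusive
    end: int            # inclusive
    score: float = 0.0  # caller confidence score
    probes: int = 0     # number of supporting probes


def passes_convex_filter(c, min_score=5, min_probes=5):
    """Thresholds from the text: confidence score >= 5 and >= 5 probes."""
    return c.score >= min_score and c.probes >= min_probes


def overlaps(a, b):
    """True when two closed intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def split_by_support(convex_calls, conifer_calls):
    """Split filtered CoNVex calls into those with and without CoNIFER overlap."""
    filtered = [c for c in convex_calls if passes_convex_filter(c)]
    shared = [c for c in filtered if any(overlaps(c, d) for d in conifer_calls)]
    unique = [c for c in filtered if c not in shared]
    return shared, unique


convex = [
    CNV("16", 100, 200, score=8.0, probes=10),
    CNV("16", 500, 600, score=3.0, probes=10),  # fails the score threshold
    CNV("22", 100, 200, score=9.0, probes=6),
]
conifer = [CNV("16", 150, 300)]
shared, unique = split_by_support(convex, conifer)
```

Calls in `shared` are supported by both tools; calls in `unique` correspond to the nonoverlapping CNVs that the text says were reviewed separately.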

SNV and CNV interpretation criteria

SNV interpretation was based on the most recent guidelines published by the American College of Medical Genetics and Genomics (ACMG).15 Accordingly, only variants that met strict criteria were called pathogenic. CNV interpretation was based on size, gene content, overlap with known disease-associated regions, and phenotype overlap according to the ACMG guidelines for postnatal CNV calling.16

Known CAKUT-associated genes4,17

In this study, Codified Genomics software (https://www.scienceexchange.com/labs/codified-genomics) was used to search WES data for pathogenic SNVs, variants of uncertain clinical significance (VUS), and benign SNVs in the following 19 dominantly inherited genes reported to be associated with CAKUT: BMP4, BMP7, CDC5L, CHD1L, DSTYK, EYA1, GATA3, HNF1B, KAL1, PAX2, RET, ROBO2, SALL1, SIX1, SIX2, SIX5, SOX17, TNXB, and UPK3A. The following 10 recessive CAKUT genes were interrogated for two SNVs: AGT, ACE, REN, AGTR1, FRAS1, FREM2, GRIP1, HPSE2, LRP4, and ROR2. In addition, we searched for pathogenic SNVs in the following six dominantly inherited genes: GLI3, JAG1, NOTCH2, TFAP2A, TBX18, and WNT4.
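The panel logic above (a single SNV is of interest in a dominantly inherited gene, whereas two SNVs are required in a recessive gene) can be sketched as a simple lookup. The gene lists are copied from the text; the function name and the input format (candidate SNV counts per gene) are hypothetical and do not represent how Codified Genomics software works.

```python
# Dominantly inherited genes: the 19 CAKUT-associated genes plus the six
# additional dominantly inherited genes listed in the text.
DOMINANT = {
    "BMP4", "BMP7", "CDC5L", "CHD1L", "DSTYK", "EYA1", "GATA3", "HNF1B",
    "KAL1", "PAX2", "RET", "ROBO2", "SALL1", "SIX1", "SIX2", "SIX5",
    "SOX17", "TNXB", "UPK3A",
    "GLI3", "JAG1", "NOTCH2", "TFAP2A", "TBX18", "WNT4",
}

# Recessive genes, interrogated for two SNVs.
RECESSIVE = {
    "AGT", "ACE", "REN", "AGTR1", "FRAS1", "FREM2", "GRIP1", "HPSE2",
    "LRP4", "ROR2",
}


def genes_to_review(snv_counts):
    """Return panel genes whose candidate SNV count fits the inheritance model.

    snv_counts maps a gene symbol to the number of candidate SNVs found in
    one individual (hypothetical input format).
    """
    hits = []
    for gene, count in snv_counts.items():
        if gene in DOMINANT and count >= 1:
            hits.append(gene)
        elif gene in RECESSIVE and count >= 2:
            hits.append(gene)
    return sorted(hits)


panel_size = len(DOMINANT) + len(RECESSIVE)
flagged = genes_to_review({"PAX2": 1, "FRAS1": 1, "FREM2": 2, "TP53": 3})
```

The two sets together contain the 35 known genes referred to in Table 2. In the toy input, a single SNV in FRAS1 is not flagged because the recessive model requires two, and TP53 is ignored because it is not on the panel.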

Codified Genomics software was used to annotate, filter, and prioritize variants. Variants were filtered as previously described.5 Annotations were generated by Annovar18 and VEP19 against the UCSC, RefSeq, and Ensembl gene models. Variants and genes were further annotated using dbNSFP,20 Illumina body map, Uniprot, HPO, and OMIM databases, among others. Variants were prioritized based on patient phenotype similarity to known disease genes and mutation type, and nonsynonymous variants were prioritized based on predicted deleteriousness.

Results

We performed WES for 112 individuals from 62 families with CAKUT (Table 1 and Supplementary Table S1 online). Probands were mostly children and young adults ranging in age from 2 months to 24 years. In 31% of the probands, more than one organ (other than the kidney and urinary tract) was involved, which suggests that these patients potentially harbor a syndromic form of CAKUT. In approximately 16% of the families, WES was performed in a familial mode because more than one individual was affected with CAKUT. The most common phenotypic indications were “renal dysplasia” and “agenesis/hypoplasia.”

WES results were interrogated for SNVs in known genes that cause CAKUT as described in Methods. Pathogenic SNVs were identified in three known genes (EYA1, HNF1B, and PAX2) in three families (approximately 5%) (Table 2). Two of these variants were frameshift variants and one was a splice site variant, with each suggesting a loss-of-function mechanism. The frameshift variant in HNF1B is a novel pathogenic allele. Pedigrees of these families are illustrated in Figure 1. Among the pathogenic SNVs identified in these three families, two were de novo from trio analyses and one was inherited from an affected parent. All selected SNVs identified in the probands and their parents (when available) were confirmed by Sanger sequencing.

Table 2 Pathogenic single-nucleotide variants in 35 known genes identified in 62 families with CAKUT by whole-exome sequencing

Among the three families with pathogenic SNVs, clinical assessment had not identified anomalies of any other organs prior to WES. Importantly, WES elicited further clinical assessment and the delineation of the additional organ system involvement in families 1 and 2 retrospectively. In family 3, defects in other organs have not been observed clinically.

The initial diagnosis of family 1 with p.G24fs SNV in PAX2 was a familial form of renal dysplasia and membranous nephropathy. After the familial variant was identified and in recognition of the current understanding of the phenotype of patients with PAX2 variants (renal-coloboma syndrome; MIM 120330), the first-degree relatives were referred to an ophthalmologist with expertise in the diagnosis and management of genetic disorders. This clinical evaluation revealed optic nerve colobomata and other congenital optic nerve abnormalities in those first-degree relatives who were proven to be variant carriers.

After the initial clinical diagnosis of cystic renal dysplasia (CRD) in family 2, and in light of the WES results (p.Q378fs in HNF1B), the patient’s clinical presentation was further reviewed. The patient had a recent diagnosis of gout and elevated liver-function test (LFT) results. The patient also had increased echogenicity of the pancreas (one of the known signs of HNF1B variants) noted previously by abdominal ultrasound; its significance became apparent after the genetic analyses. A known recurrent de novo intronic variant (c.867+5G>A) was identified in EYA1 in the proband of family 3, who has VUR and multicystic dysplastic kidney. Both parents were negative for this variant based on both trio WES and Sanger sequencing.

In this study, we also defined VUSs in 19 dominantly inherited known CAKUT genes (Supplementary Table S2 online). In this cohort, SNVs were not identified in SIX1, SOX17, GATA3, or UPK3A. Benign SNVs and VUSs were identified in BMP7, CDC5L, CHD1L, SALL1, SIX5, SIX2, ROBO2, BMP4, KAL1, TNXB, RET, PAX2, EYA1, and DSTYK (Supplementary Table S2 online). Further allele frequencies and prediction data for all SNVs identified in known CAKUT genes in this cohort are summarized in Supplementary Table S3 online. Probability of loss-of-function score (pLI) is also provided in this table. The closer the pLI score is to 1 (unity), the more LoF (loss-of-function)-intolerant the gene appears to be (http://exac.broadinstitute.org). We attempted to confirm all VUSs with Sanger sequencing. Details of confirmation are provided in Supplementary Table S2 online.

Novel CAKUT gene identification: Forkhead Box P1 (FOXP1)

Trio analyses consisting of WES for the proband and both biological parents to evaluate for new mutations were performed for 20 families. We confirmed relationships (paternity and maternity) in the trios by review of the de novo SNVs in each family. No proband had more than the expected number of de novo SNVs (>2) in the coding regions of the genome, which is consistent with the expected rate of 1.20 × 10⁻⁸ per nucleotide per generation.21 We identified a de novo SNV (p.P225T) in FOXP1 (MIM 605515) in a proband with hydrocephaly and unilateral renal agenesis (family 38). This patient was enrolled initially into this study at age 4 months. Later, the patient manifested delay in gross motor and speech development. In addition, he was diagnosed with strabismus and left optic atrophy. The pedigree of this family is shown in Figure 1 (family 38).
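The link between the per-nucleotide mutation rate and the threshold of two coding de novo SNVs per proband can be made explicit with a short back-of-the-envelope calculation. The mutation rate is taken from the text; the coding target size of roughly 30 Mb per haploid genome and the Poisson model are assumptions introduced here for illustration and are not stated in the study.

```python
import math

MUTATION_RATE = 1.20e-8    # per nucleotide per generation (from the text)
CODING_BASES = 30_000_000  # ASSUMPTION: ~30 Mb of coding sequence per haploid genome
HAPLOID_COPIES = 2         # one maternal and one paternal copy


def expected_de_novo(rate=MUTATION_RATE, bases=CODING_BASES, copies=HAPLOID_COPIES):
    """Expected number of coding de novo SNVs in one proband."""
    return rate * bases * copies


def prob_more_than(k, lam):
    """Poisson probability of observing more than k events when the mean is lam."""
    cumulative = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))
    return 1.0 - cumulative


mean_count = expected_de_novo()           # about 0.72 under the stated assumptions
tail = prob_more_than(2, mean_count)      # chance of seeing more than two
```

Under these assumptions the expected count is below one per proband, so observing more than two coding de novo SNVs in a trio would be unusual and could point to a sample mix-up or a misattributed parental relationship.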

We next attempted to identify other subjects/families with variants in FOXP1. The database of the Whole Genome Laboratory at BMGL was queried for other de novo SNVs in FOXP1. We identified seven more de novo SNVs in this gene among approximately 5,000 patients (Table 3). Relationships (paternity and maternity) were confirmed by inheritance of rare SNVs from each parent in cases 3–8. In case 2, paternity was confirmed by inheritance of rare SNVs from the father. Maternity, however, could not be genetically confirmed per consent and was verified by pregnancy history.

Table 3 Novel pathogenic de novo FOXP1 SNVs identified in one individual from this cohort (family 38) and in seven additional individuals (cases 2–8) from the clinical whole-exome sequencing database

All eight individuals had neurodevelopmental phenotypes consistent with loss-of-function variants in FOXP1 (MIM 613670). However, four out of eight individuals also had upper urinary tract defects, and five had defects in the lower genitourinary (GU) tract, including undescended testis, hypospadias, and neurogenic bladder. In addition, these patients have brain and heart involvement, which is consistent with the role of FOXP1 in development of these two organs.22,23,24 CNS malformations including hydrocephaly and cardiac defects were among the phenotypes of the patients in this study. The genotypes and phenotypes of these individuals (6 out of 8 with upper or lower urinary tract defects) are summarized in Table 3. The pLI score of FOXP1 is 1, which suggests this gene is intolerant to loss-of-function variants.

CNV discovery from WES data

CNVs were inferred from WES data as described in the Materials and Methods section. Pathogenic CNVs and CNVs of uncertain clinical significance are summarized in Table 4. A de novo 22q11.1q11.21 triplication was identified in family 34; the proband had syndromic VUR. This triplication is proximal to the DiGeorge region, consistent with the gain of genetic material seen with type I supernumerary inv dup(22)(q11), which is associated with cat-eye syndrome25 (MIM 115470). This patient’s phenotype overlaps with Goldenhar or oculo-auriculo-vertebral spectrum (OAV; MIM 164210) and VATER Association (MIM 192350). The patient was an 11-year-old Latin American boy with short stature, imperforate anus, thumb anomaly, severe gastroesophageal reflux disease (GERD), VUR, neurogenic bladder, right renal hypoplasia with evidence of scarring/renal damage, bilateral ear tags, ocular Duane anomaly, and left microphthalmia. The patient had normal cognition, although with some difficulty in mathematics.

Table 4 Copy-number variants (CNVs) identified (from whole-exome sequencing data of 62 families) that are relevant to the patient’s phenotype

Three other pathogenic CNVs were found in regions associated with known syndromes, namely 16p11.2 deletion, 16p11.2 duplication, and 16p13.11 duplication. CNVs in all individuals in Table 4 were validated by aCGH. Parental studies were also performed by aCGH. In family 34, in which samples were available from both parents, the CNV was found to be de novo. The flowchart of copy-number data inference, CoNIFER, and aCGH data for de novo 22q11 triplication are shown in Supplementary Figure S1 online.

Discussion

In the past 4 years, WES has become a powerful clinical test for defining both recognized and previously undefined genes and potential variant susceptibilities to establish molecular diagnoses for birth defects. Clinical WES identifies pathogenic SNVs in approximately 25% of pediatric cases (mostly syndromic) that represent diagnostic dilemmas refractory to clinical diagnosis despite previous extensive medical evaluations.5,6 Nevertheless, the utility of WES for molecular diagnosis in isolated birth defects including CAKUT remains uncertain. This study shows that WES can be used in the diagnostic setting to define the molecular defects that underlie CAKUT and to reveal additional insights into the clinical presentation of the disorder. In addition, WES can be used for the identification of new candidate genes.

In family 1 (Figure 1 and Table 2), the diagnosis of renal coloboma syndrome (MIM 120330) was possible only after WES results became available. Prior to WES, the proband was receiving immunosuppressants for proteinuria; after the molecular diagnosis, management was changed to a tapering dose of immunosuppressive therapy. This avoids unnecessary immunosuppression because the etiology of kidney disease in this family is not immunologic. The molecular diagnosis of this family obtained by WES thus affected clinical decision making for the patient and the prognosis and management for family members. These data also expand the phenotype related to PAX2 pathogenic variants because the proband has membranous nephropathy and other family members have proteinuria and renal dysplasia. To date, focal segmental glomerulosclerosis has been reported with PAX2 variants26; membranous nephropathy is a novel finding. In family 2, a novel frameshift pathogenic SNV (p.Q378fs) was identified in HNF1B. This molecular diagnosis, reached by WES, further substantiated the clinical phenotype described in the Results, thus minimizing the need for additional diagnostic evaluation. The variant in this family is novel, which adds to current knowledge of diseases related to HNF1B.

The phenotype of family 3 with the de novo EYA1 variant suggests that underlying genetic predisposition can lead to or at least exacerbate renal pathology in patients with VUR. Variants in EYA1 can cause branchio-oto-renal syndrome (BOR; MIM 113650), an autosomal dominant disorder characterized by sensorineural, conductive, or mixed hearing loss, structural defects of the outer, middle, and inner ear, branchial fistulas or cysts, and renal abnormalities ranging from mild hypoplasia to agenesis. The c.867+5G>A SNV does not affect the invariant splice site; nevertheless, RNA analysis of samples from patients with BOR showed that this SNV affects EYA1 splicing, producing an aberrant mRNA transcript that lacks exon 8 and results in premature termination in exon 9.27 The proband in this family will be evaluated for hearing impairment because the SNV in this individual causes BOR. This family provides another example that WES can improve the clinical diagnosis of syndromic forms of CAKUT beyond clinical evaluation alone.

Families 1, 2, and 3 exemplify the effects of WES on the clinical management of the patients and families because identification of the SNVs in PAX2, HNF1B, and EYA1, respectively, warranted further investigations in other organ systems. We identified pathogenic SNVs in only a small fraction of families (3/62 = 4.8%), similar to the yield (6.3%) of a recent large study that evaluated 749 individuals from 650 families with CAKUT for variants in 17 known CAKUT genes.28

CAKUT is a clinically heterogeneous spectrum; therefore, many more genes and causal variants are likely to be identified. Next-generation sequencing and specifically WES have improved discovery of novel causative genes.7,29,30,31,32,33,34 We have identified a novel gene (FOXP1) that likely contributes to the CAKUT phenotype. We found eight different novel de novo pathogenic SNVs in FOXP1 in unrelated individuals from both clinical and research WES. As summarized in Table 3, the phenotypes observed in these individuals suggest a clinical pattern that may be recognizable. Structural brain anomalies (including hydrocephaly), intellectual disability, developmental delay, cardiac defects, hypotonia, behavior problems, and renal/GU defects are some of the more common features of this syndrome. Six out of eight individuals in this study (Table 3) have a known renal/GU phenotype in addition to other organ involvement. Although de novo disruptions in FOXP1 were recently discovered to cause intellectual disability (OMIM 613670),23,24 here we defined a new syndrome that is characterized by hydrocephalus/brain malformation, cognitive impairment, cardiac defects, and CAKUT attributable to a single gene with pleiotropic effect. We recommend that patients with pathogenic or likely pathogenic variants in FOXP1 undergo renal ultrasound. Upper tract defects may remain undiagnosed if ultrasound is not performed.

Although FOXP1 has been shown to have important roles in the developmental process of key organs including lung, heart, and brain,22,24,35 there are no data regarding the role of this master transcription factor in kidney and urinary tract development. In this study, we showed the role of FOXP1 in CAKUT and lower urinary tract defects. All the FOXP1 SNVs identified in this study were de novo and novel variants. These variants included frameshift, as documented for other birth defects such as the megacystis microcolon intestinal hypoperistalsis syndrome due to de novo SNVs in ACTG2.36

Based on previous investigations, CNVs account for approximately 16% of CAKUT cases.13 We hypothesized that CNVs underlie a substantive fraction of birth defects in our families as well; therefore, we inferred CNV data with two detection tools. There are some known limitations to CNV discovery from WES data.37 One primary limitation is a high false-positive rate, particularly for small CNVs. We used a stringent approach to identify potentially pathogenic CNVs for validation by aCGH. Approximately 6.5% (4/62) of our cohort have pathogenic CNVs related to the patients’ phenotype.

The four pathogenic CNVs identified (Table 4) are in disease-associated regions and have been evaluated based on ACMG guidelines. Although the fraction of families with pathogenic CNVs (6.5%) is lower than that in studies designed specifically to identify CNVs, we included only known pathogenic CNVs and not CNVs of uncertain significance.

Among the most intriguing CNVs identified in this study is triplication of 22q11. Although the patient with proximal 22q11 triplication did not have chromosome analysis to determine if a marker chromosome was present, the gain of euchromatic genetic material is the same as what is seen in cat-eye syndrome. Urogenital malformations are present in ~70% of reported individuals with this syndrome and include male and female genital malformations, renal agenesis, hydronephrosis, VUR, dysplastic or cystic kidneys, and bladder defects.38 Individuals with partial gains of the cat-eye critical region and renal anomalies have been described, providing evidence that the distal portion of the region, including CECR2, SLC25A18, and ATP6V1E1, may be responsible for these features.39 Our patient carried a clinical diagnosis of OAV; however, once the triplication was uncovered, most of his features were recognized as consistent with cat-eye syndrome.

Our findings support the concept that WES could be an adjunct diagnostic tool even in cases of nonsyndromic CAKUT because the involvement of other organs may be subtle or not manifest at the time of primary evaluation. WES may identify novel candidate genes, as exemplified here, and uncover underlying CNVs that contribute to the CAKUT spectrum.

This study reports the use of WES for molecular diagnosis of the genetic contribution to CAKUT. Nearly 5% of individuals with CAKUT have pathogenic SNVs in known key genes that can be uncovered by WES. In addition, 6.5% of these patients have pathogenic CNVs that were extracted from WES data. In some families, organ involvement beyond CAKUT was sought retrospectively, after the review of WES results. We identified previously unrecognized genes and genetic variants (both SNVs and CNVs) in this cohort and expanded the phenotype of several known genes. Pathogenic SNVs in FOXP1 in individuals with GU/renal phenotype strongly suggest an important role for this gene in urinary tract development.

Disclosure

The Department of Molecular and Human Genetics at Baylor College of Medicine derives revenue from clinical exome sequencing offered by the Baylor Miraca Genetics Laboratories and Whole Genome Laboratory. Authors who are faculty members in the Department of Molecular and Human Genetics at Baylor College of Medicine are identified as such in the affiliation section. The authors declare no conflict of interest with the following exceptions: M.N.B. is the founder of Codified Genomics LLC, a genomic interpretation company; R.A.G. is CSO of the Baylor Miraca Genetics Laboratories; D.J.L. is on the Scientific Advisory Board of Cellmatrix Inc.; J.R.L. has stock ownership in 23andMe and Lasergen, is a paid consultant for Regeneron, and is a coinventor of US and European patents related to molecular diagnostics for inherited neuropathies, eye diseases, and bacterial genomic fingerprinting.