Introduction

Glaucoma is the second leading cause of blindness, affecting nearly 65 million people worldwide1,2. Primary congenital glaucoma (PCG; OMIM # 231300) is a rare form of glaucoma characterized by defective development of the anterior chamber structures, which leads to aqueous outflow obstruction, increased intraocular pressure (IOP), and optic nerve damage3,4. It is usually inherited as an autosomal recessive disorder with incomplete penetrance3,4. Increased IOP results in enlargement of the globe (buphthalmos), and irritation of the cornea causes corneal edema and haze. Other clinical findings include Haab's striae, conjunctival erythema, and optic atrophy in the later stages of the disease5,6.

PCG is a genetically heterogeneous disorder and, to date, four genetic loci, GLC3A (CYP1B1, 2p22-p21), GLC3B (1p36.2–36.1), GLC3C (14q24.3), and GLC3D (LTBP2, 14q24.2–24.3), have been reported7,8,9,10. Mutations in Cytochrome P450 Family 1 Subfamily B Member 1 (CYP1B1) (OMIM # 601771) and Latent Transforming Growth Factor-beta Binding Protein 2 (LTBP2) (OMIM # 602091) have been identified in patients with autosomal recessive PCG11,12, whereas the genes at the remaining two loci, GLC3B and GLC3C, are yet to be cloned. Moreover, genetic variants responsible for autosomal dominant PCG have been reported13,14. Souma and colleagues13 reported multiple heterozygous mutations in tunica interna endothelial cell kinase (TEK) responsible for autosomal dominant PCG in a multiethnic cohort of familial and sporadic cases. Likewise, Thomson and colleagues14 reported one missense and two nonsense heterozygous variants in angiopoietin-1 (ANGPT1) responsible for autosomal dominant PCG in three human subjects.

Multiple studies have reported mutations in CYP1B1 and LTBP2 responsible for PCG in the Pakistani population15,16,17,18,19. We previously reported the identification of pathogenic mutations in CYP1B1 and LTBP2 responsible for PCG in families of Pakistani descent20,21,22. Here, we employed next-generation whole exome sequencing to identify the genetic basis of PCG in ten affected individuals belonging to four familial cases excluded for mutations in CYP1B1 and LTBP2.

Results and Discussion

In an ongoing effort to identify the genetic determinants responsible for PCG in patients of Pakistani descent, we have ascertained a large cohort consisting of 48 families with at least two affected individuals per family. We previously employed short tandem repeat (STR) markers that localized two-thirds of our familial cohort (i.e., 32 families) to the reported PCG loci and, as summarized in Fig. 1, sequencing identified mutations in CYP1B1 and LTBP2 in these families20,21,22. Importantly, the remaining one-third of the cohort (16 families), detailed in Table 1 and excluded for linkage to the reported PCG loci, remains unsolved (Fig. 1). In the present study, we investigated the exomes of ten affected individuals manifesting cardinal symptoms of PCG from four unlinked families (PKGL034, 036, 044 and 062) through next-generation sequencing (Fig. 2), along with four control samples consisting of genomic DNA from affected individuals harboring mutations in CYP1B1 and LTBP2. These four families were selected from the 16 unlinked families based on a stronger pedigree structure, with a higher number of affected individuals and consanguineous marriages within the family.

Table 1 Summary of the unlinked familial cases in our cohort with primary congenital glaucoma patients.
Figure 1

Pie chart illustrating the contributions of CYP1B1 and LTBP2 mutant alleles responsible for primary congenital glaucoma (PCG) in a cohort of familial cases of Pakistani descent. Distribution of (A) PCG loci, (B) CYP1B1 mutations, and (C) LTBP2 mutations in the PCG cohort. Missense, nonsense, and frameshift mutations were identified in both CYP1B1 and LTBP220,21,22.

Figure 2

Pedigree drawings illustrating segregation of primary congenital glaucoma in four familial cases. (A) PKGL034, (B) PKGL036, (C) PKGL044, and (D) PKGL062 examined by exome sequencing. Squares are males, circles are females, filled symbols are affected individuals, a double line between individuals indicates consanguinity, and a diagonal line through a symbol is a deceased family member.

Affected individuals in the four families (PKGL034, 036, 044 and 062) underwent detailed medical examination, including tonometry and slit-lamp microscopy, at Layton Rahmatulla Benevolent Trust (LRBT) in Lahore, Pakistan. The ophthalmic examination in these four families revealed common symptoms of PCG, including elevated IOP, increased corneal diameter, increased cup-to-disc (CD) ratio, and visual acuity reduced to hand movement and/or light perception (Table 2). Moreover, bilateral buphthalmos, corneal opacity, central corneal haze, megalocornea, nystagmus, and myopic fundus were identified in some but not all affected individuals (Table 2).

Table 2 Clinical characteristics of primary congenital glaucoma patients.

Prior to next-generation sequencing, we reconfirmed the exclusion of linkage to the reported PCG loci through STR marker-based exclusion analysis (Table 3). Once exclusion was reconfirmed, we selected 10 affected individuals from PKGL034, 036, 044 and 062, and performed whole exome sequencing as described in the materials and methods.

Table 3 Exclusion of GLC3A/CYP1B1 (D2S2163, D2S177, D2S1346), GLC3B (D1S228, D1S402, D1S507, D1S2672), and GLC3D/LTBP2 (D14S43, D14S1036, D14S61, D14S59, D14S74) through linkage analysis.

The quality control analysis of the exome data revealed that > 99% of the reads were 100 or 150 base pairs long, while 95% of the sequencing data had a PHRED score of 30 or above. High-throughput sequencing yielded 39–71 million paired-end reads for each sample, and ~39 to 69 million reads (> 97% of total reads) were uniquely mapped to the human genome (GRCh38.p13), representing an average of 89× to 127× coverage for all ten exomes (Table 4).
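For readers less familiar with the PHRED scale, the relation between a quality score Q and the base-call error probability p is Q = -10 log10(p), so a score of 30 corresponds to an expected error rate of 1 in 1000 bases. The short sketch below illustrates this conversion; the function names are ours and are not part of any sequencing software used in this study.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a PHRED quality score Q to the base-call error probability p."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a base-call error probability p (0 < p <= 1) to a PHRED score."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


# Q30 corresponds to roughly one expected error per 1000 base calls,
# and Q20 (the read-quality cutoff used for variant filtering) to one per 100.
q30_error = phred_to_error_probability(30)
q20_error = phred_to_error_probability(20)
```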

Table 4 Summary of the statistics of next-generation sequencing data.

A multifaceted filtering approach was used to identify pathogenic variants responsible for PCG (Fig. 3). Briefly, we included homozygous variants, based on the disease segregation pattern (autosomal recessive), that were common to all affected individuals examined by exome sequencing. We interrogated missense and nonsense alleles, small insertions and deletions (indels), and variants at splice sites and in untranslated regions (UTRs) based on either their absence (novel) or a minor allele frequency (MAF) < 0.01 in public databases (i.e., dbSNP (Ver. 153), 1000 Genomes, NHLBI ESP, and gnomAD), and their absence in the in-house exome dataset. Any variants passing these filtering criteria were examined for segregation with the disease phenotype in their respective families.
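The first step of this approach, retaining only homozygous variants shared by every affected individual of a family, amounts to a set intersection. The following is a minimal sketch of that idea under simplifying assumptions: the data layout, the genotype labels ("hom"/"het"), and the toy variants are hypothetical and do not reflect the format of the software used in this study.

```python
from typing import Dict, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def shared_homozygous(calls: Dict[str, Dict[Variant, str]]) -> Set[Variant]:
    """Return variants that are homozygous in every sample of one family.

    `calls` maps a sample identifier to {variant: genotype}, where the
    genotype is the hypothetical label "hom" or "het".
    """
    per_sample = [
        {variant for variant, genotype in genotypes.items() if genotype == "hom"}
        for genotypes in calls.values()
    ]
    return set.intersection(*per_sample) if per_sample else set()


# Toy data for three affected individuals (entirely made up for illustration).
v1: Variant = ("chr1", 1000, "A", "G")
v2: Variant = ("chr2", 2000, "C", "T")
v3: Variant = ("chr3", 3000, "G", "A")
toy_calls = {
    "affected_1": {v1: "hom", v2: "het", v3: "hom"},
    "affected_2": {v1: "hom", v2: "hom", v3: "hom"},
    "affected_3": {v1: "hom", v2: "hom"},
}
toy_shared = shared_homozygous(toy_calls)  # only v1 is homozygous in all three
```

Variants that are heterozygous or absent in any affected individual drop out at this stage, which is why the approach cannot detect compound heterozygous alleles.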

Figure 3

Flow chart depicting the protocol used for the bioinformatic analysis of whole exome sequencing data. The paired-end reads were aligned to the human genome (GRCh38.p13) using SeqMan NGen (Ver. 12; DNASTAR) and mapped reads were processed for variant calling and annotation with ArrayStar (Ver. 12; DNASTAR). The non-synonymous homozygous variants in the coding regions of the genome segregating in multiple affected individuals of the same family were selected for analyses. Variants were excluded from the analyses unless they were novel or had MAF < 0.01 in public databases (i.e., dbSNP (Ver. 153), 1000 Genomes, NHLBI ESP, and gnomAD) and were absent from the in-house exome dataset (> 50 ethnically matched exomes without PCG phenotype). MS missense, NS nonsense, Indel insertion/deletion, UTR untranslated region, MAF minor allele frequency, N.A. not applicable.

Whole exome sequencing identified 29,014 variants common to the three affected individuals from PKGL034 (Fig. 3 and Supplementary Tables 1–3). We identified 1143 non-synonymous variants in coding regions, splice sites, and UTRs (Fig. 3 and Supplementary Tables 1–3). As shown in Fig. 3, none of these 1143 variants met the low allele frequency criterion (MAF < 0.01). Exome sequencing identified 72,262 variants common to both affected individuals from PKGL036 (Fig. 3 and Supplementary Tables 4 and 5). We identified 2777 non-synonymous variants in coding regions, splice sites, and UTRs (Fig. 3 and Supplementary Tables 4 and 5). As shown in Fig. 3, none of these 2777 variants met the low allele frequency criterion (MAF < 0.01). Exome sequencing identified 39,348 variants common to the three affected individuals from PKGL044 (Fig. 3 and Supplementary Tables 6–8). We identified 1207 non-synonymous variants in coding regions, splice sites, and UTRs (Fig. 3 and Supplementary Tables 6–8). As shown in Fig. 3, none of these 1207 variants met the low allele frequency criterion (MAF < 0.01). Finally, whole exome sequencing identified 73,302 variants common to the two affected individuals from PKGL062 (Fig. 3 and Supplementary Tables 9 and 10). We identified 2680 non-synonymous variants in coding regions, splice sites, and UTRs (Fig. 3 and Supplementary Tables 9 and 10). As shown in Fig. 3, none of these 2680 variants met the low allele frequency criterion (MAF < 0.01). Taken together, the whole exome analysis failed to identify any variants that satisfy the criteria of causality, including but not limited to low MAF.

To rule out the possibility that our next-generation sequencing strategy is unable to identify causal mutations in genes responsible for PCG, we included two families, PKGL067 and PKGL015, that we previously reported to harbor mutations in CYP1B1 and LTBP2, respectively21,22. We included two affected individuals from each family (individuals 9 and 20 of PKGL067, and individuals 8 and 13 of PKGL015; please see Refs.21,22 for pedigree drawings of PKGL067 and PKGL015, respectively) and performed whole exome sequencing as a positive control. Exome sequencing identified 80,742 variants common to the two affected individuals from PKGL067 (Fig. 3 and Supplementary Tables 11 and 12). We identified 1699 non-synonymous variants in coding regions, splice sites, and UTRs (Fig. 3 and Supplementary Tables 11 and 12). Importantly, we identified the missense allele c.1169G>A (p.Arg390His) in CYP1B1 previously reported as responsible for PCG in PKGL06721. Likewise, exome sequencing identified 42,545 variants common to the two affected individuals from PKGL015 (Fig. 3 and Supplementary Tables 13 and 14). We identified 1945 non-synonymous variants in coding regions, splice sites, and UTRs (Fig. 3 and Supplementary Tables 13 and 14). Importantly, we identified the single base deletion (c.3427delC; p.Gln1143Argfs*35) in LTBP2 previously reported as responsible for PCG in PKGL01522.

Although mutations in CYP1B1 are the most common cause of PCG and are responsible for 27% of sporadic and 87% of familial cases worldwide23, a number of sporadic and familial PCG cases do not localize to CYP1B1 (or to other reported PCG loci)24. Previously, two independent studies reported familial and sporadic cases of PCG in which whole exome sequencing failed to identify pathogenic homozygous mutations24,25. Kuchtey and colleagues24 presented results of exome sequencing using an autosomal recessive model of inheritance that failed to identify any causative variant in a familial case with six affected members. Sharafieh and colleagues25 performed whole exome sequencing of 24 families (30 PCG patients negative for mutations in both CYP1B1 and LTBP2) but failed to detect any homozygous variants responsible for PCG in the affected cases.

It is worth noting that we have successfully applied linkage coupled with whole exome26,27 and whole genome28,29 sequencing approaches to delineate pathogenic variants responsible for ocular dystrophies. Likewise, a similar approach to delineate the genetic basis of extraocular diseases has been adopted by our group30,31 and many other groups32,33. Therefore, we propose genome-wide homozygosity or linkage mapping coupled with a whole genome sequencing approach to delineate the unknown genetic determinants of PCG in the 16 unsolved familial cases. Importantly, advancements in exome capture technologies (e.g., resolving the insufficient capture of GC-rich sequences) and the removal of other current limitations (e.g., the failure to detect large deletions or copy number variations (CNVs)) will also help to delineate the genetic basis of the unsolved familial cases in our cohort.

In summary, next-generation whole exome sequencing of multiple affected individuals from consanguineous families failed to identify the genetic basis of PCG. The lack of pathogenic variants in the exome data strengthens the notion that compound heterozygous coding variants, non-coding RNA variants, or non-coding variants in intronic or intergenic regions are likely responsible for the PCG phenotype in the cohort of families excluded for mutations in CYP1B1 and LTBP2.

Materials and methods

Subject recruitment and clinical evaluation

Patients affected with PCG were identified and recruited from the pediatric departments of LRBT Lahore. Informed written consent was obtained from all participating family members before enrollment, consistent with the tenets of the Declaration of Helsinki. This study was approved by the Institutional Review Boards (IRBs) of the Johns Hopkins University School of Medicine (Baltimore, MD), the National Institutes of Health (Bethesda, MD), and the National Centre of Excellence in Molecular Biology (Lahore, Pakistan).

A detailed medical and clinical history was obtained by interviewing members of the families. Ophthalmic examination, including slit-lamp microscopy, was performed at the LRBT Hospital. Elevated IOP (> 16 mmHg for children and > 21 mmHg for adults), corneal edema, increased corneal diameter (> 12.0 mm), and a larger cup-to-disc (CD) ratio were the inclusion criteria for patients.

Approximately 10 ml of blood was drawn from all participating members and the samples were stored in 50 ml Sterilin Falcon tubes with 20 mM EDTA. Genomic DNA was extracted from white blood cells using a non-organic modified procedure as described20,21,22.

Exclusion and linkage analysis

The reported loci/genes associated with PCG were screened by genotyping 12 polymorphic short tandem repeat (STR) markers spanning GLC3A/CYP1B1 (D2S2163, D2S177, D2S1346), GLC3B (D1S228, D1S402, D1S507, D1S2672), and GLC3D/LTBP2 (D14S43, D14S1036, D14S61, D14S59, D14S74). PCR amplification for genotyping was performed as described20,21,22. Two-point linkage analysis was performed using the FASTLINK version of MLINK from the LINKAGE Program Package34,35. The maximum two-point LOD scores were calculated using ILINK. PCG was analyzed as a fully penetrant autosomal recessive trait with an affected allele frequency of 0.001. The marker order and distances between respective markers were obtained from NCBI (National Center for Biotechnology Information; https://www.ncbi.nlm.nih.gov/) chromosomes 1, 2, and 14 sequence maps and Marshfield database (https://www.biostat.wisc.edu/~kbroman/publications/mfdmaps/).
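To make the notion of a two-point LOD score concrete, the sketch below computes Z(θ) = log10[L(θ) / L(0.5)] for the textbook case of phase-known, fully informative meioses, where L(θ) = θ^r (1 − θ)^(n − r) for r recombinants among n meioses. This is a deliberately simplified illustration and not a reimplementation of the pedigree likelihoods computed by FASTLINK; the function name and example counts are hypothetical. By common convention, a LOD score of 3 or more is taken as evidence of linkage and a score of −2 or less as exclusion.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known, fully informative meioses.

    Z(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n )
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    likelihood = theta ** recombinants * (1.0 - theta) ** non_recombinants
    if likelihood == 0.0:
        # A recombinant observed at theta = 0 excludes complete linkage.
        return float("-inf")
    return math.log10(likelihood) - meioses * math.log10(0.5)


# Ten informative meioses without recombination support linkage at theta = 0,
# whereas a single recombinant rules out theta = 0 entirely.
lod_no_recombinants = two_point_lod(0, 10, 0.0)
lod_one_recombinant = two_point_lod(1, 10, 0.0)
```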

Next-generation whole exome sequencing

Whole exome library preparation and next-generation sequencing were performed either in-house or commercially by Novogene Corporation Inc (Sacramento, CA). The in-house exome libraries were prepared using the Nextera Rapid Capture Expanded Exome kit (Catalog # FC-140-1005; Illumina Inc., San Diego, CA) according to the manufacturer's protocol. Genomic DNA was quantitated using a Qubit Fluorometer (Qubit 2.0; Invitrogen, Carlsbad, CA). Approximately 50 ng of genomic DNA was subjected to an enzyme-based tagmentation process followed by amplification using barcode-specific indexes to prepare the genomic libraries. The genomic libraries were further processed for exome enrichment using expanded exome oligos (Illumina Inc., San Diego, CA). The exome-enriched libraries were quantitated using a high-sensitivity DNA chip on an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA) and quantitative PCR (qPCR) according to the manufacturer's instructions. The bar-coded exome libraries were pooled and clustered using the TruSeq Cluster Kit (Ver. 3; Illumina Inc., San Diego, CA) at 13 pM concentration and were paired-end (2 × 100 bp) sequenced on a single lane of HiSeq2000. The commercially processed exomes were captured with Agilent SureSelect Human All Exon kits (Ver. 6) (Agilent Technologies, Inc., Santa Clara, CA) and sequenced in a paired-end fashion (2 × 150 bp) using the Illumina (Illumina Inc., San Diego, CA) HiSeq X-10 platform.

Lasergene Genomics Suite (DNASTAR, Madison, WI) was used for reference-guided genome alignment and variant calling/annotation of the whole exome sequencing data. The paired-end raw reads were aligned to the human genome (GRCh38.p13) using SeqMan NGen (Ver. 12) with default parameters. The mapped reads in the BAM file format were converted into the DNASTAR-specific format and processed for variant analysis. In the next step, mapped reads were further processed with ArrayStar (Ver. 12) for variant calling and annotation. Stringent criteria were used to separate false-positive results from potentially causal variants. To ensure data quality, variants with low sequencing depth (< 2) or low read quality (< Q20) were excluded.

Based on the disease segregation pattern (autosomal recessive) and consanguinity of familial cases, we assumed that a causal variant must be homozygous. We therefore excluded all heterozygous variants from the analyses. Next, we removed all synonymous and intronic homozygous variants, and only non-synonymous homozygous variants located in the coding and splice regions of the genes were selected for further analysis. The non-synonymous homozygous variants were further scrutinized based on their absence (novel) or minor allele frequency (MAF) < 0.01 in public databases (i.e., dbSNP (Ver. 153), 1000 Genomes, NHLBI ESP, and gnomAD), and their absence in the in-house exome dataset (> 50 ethnically matched exomes excluded for PCG). Note: Our strategy also includes the segregation analysis of potential causal variants with the PCG phenotype in their respective familial cases.
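The filtering criteria described above can be summarized as a sequence of simple predicates. The following is a minimal sketch using the thresholds stated in this section (depth of at least 2, read quality of at least Q20, MAF < 0.01); the record layout, field names, consequence labels, and example variants are hypothetical and do not correspond to the output format of ArrayStar or any other tool used here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

MIN_DEPTH = 2        # variants with depth < 2 are excluded
MIN_QUALITY = 20.0   # variants with read quality < Q20 are excluded
MAF_CUTOFF = 0.01    # variants with public MAF >= 0.01 are excluded
# Hypothetical consequence labels for coding and splice-site changes.
KEPT_CONSEQUENCES = {"missense", "nonsense", "indel", "splice_site"}


@dataclass(frozen=True)
class VariantRecord:
    gene: str
    zygosity: str                     # "hom" or "het"
    consequence: str
    depth: int
    quality: float
    max_public_maf: Optional[float]   # None means absent from public databases
    in_house_carriers: int            # carriers among in-house control exomes


def is_candidate(v: VariantRecord) -> bool:
    """Apply the quality, zygosity, consequence, and frequency filters in turn."""
    if v.depth < MIN_DEPTH or v.quality < MIN_QUALITY:
        return False
    if v.zygosity != "hom":
        return False
    if v.consequence not in KEPT_CONSEQUENCES:
        return False
    if v.max_public_maf is not None and v.max_public_maf >= MAF_CUTOFF:
        return False
    return v.in_house_carriers == 0


def filter_candidates(variants: Iterable[VariantRecord]) -> List[VariantRecord]:
    return [v for v in variants if is_candidate(v)]


# Made-up records: only the first one passes every filter.
toy_variants = [
    VariantRecord("GENE_A", "hom", "missense", 40, 35.0, None, 0),
    VariantRecord("GENE_B", "hom", "missense", 55, 38.0, 0.12, 0),
    VariantRecord("GENE_C", "het", "nonsense", 60, 37.0, None, 0),
    VariantRecord("GENE_D", "hom", "synonymous", 48, 36.0, 0.001, 0),
    VariantRecord("GENE_E", "hom", "splice_site", 1, 30.0, None, 0),
    VariantRecord("GENE_F", "hom", "indel", 30, 33.0, 0.002, 3),
]
toy_candidates = filter_candidates(toy_variants)
```

Variants surviving these filters would still need to be tested for segregation with the phenotype in the respective family before being considered causal.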