Background

Hearing impairment (HI) is the most common sensory deficit in the world; one to two per 1000 children are born with congenital HI [1]. Over 50% of these cases are due to a genetic cause, most commonly with autosomal recessive (AR) inheritance. To date, ~70 genes have been identified for AR nonsyndromic (NS) HI (Hereditary Hearing Loss Homepage). The DFNB67 locus was mapped to 6p21.1-p22.3 and afterward Homo sapiens lipoma HMGIC fusion partner-like 5 (LHFPL5; MIM 609427), also known as Tetraspan membrane protein of hair cell stereocilia (TMHS), was identified as the causal gene for this locus [2]. Nine pathogenic variants in LHFPL5 (c.1A > G, c.89dupG, c.246delC, c.250delC, c.258_260delCTC, c.380A > G, c.494C > T, c.518T > A, c.649delG) have been reported in ARNSHI families without vestibular dysfunction from Pakistan, India, Turkey, Palestine, Algeria, Iran and Tunisia [2,3,4,5,6,7]. In the mouse, a missense variant c.482G > T (p.Cys161Phe) in the Tmhs gene of hurry-scurry mice was reported to cause deafness and vestibular dysfunction [8]. The protein encoded by LHFPL5 is transiently expressed in hair cell stereocilia bundles from E16.5 to P3 and is presumed to organize a transient cytoskeleton–cell membrane interaction necessary for proper hair cell bundle morphogenesis that is critical for auditory function [2].

In this study, seven Pakistani ARNSHI families were mapped to the DFNB67 region using genome-wide linkage analyses. Sequencing of LHFPL5 revealed three families with previously reported pathogenic variants: two families with the c.250delC variant and one family with a c.380A > G variant. Novel variants were observed in four families: two with missense variant c.452G > T (p.Gly151Val) and two with a splice site variant c.*16 + 1G > A located in the 3′-untranslated region (3′-UTR). Variants in LHFPL5 causing HI highlight the important roles of hair cell stereocilia and their ability to transmit auditory signals from external stimuli within the inner ear.

Methods

Subjects

This study was approved by the Institutional Review Boards of Quaid-i-Azam University and Baylor College of Medicine and Affiliated Hospitals. Informed consent was obtained from each family member participating in the study. Known and novel pathogenic variants in LHFPL5, which underlies ARNSHI, were identified in seven consanguineous Pakistani families (Fig. 1). These families are from different ethnic groups: families 4072A, 4072B, 4298 and 4464 are from the Punjab province and speak Punjabi; family 4275 is from the Punjab province but speaks Saraiki; family 4506 is from the Khyber Pakhtunkhwa province and speaks Pashto; and family 4194 is from Balochistan and speaks Balochi. Clinical histories were recorded to rule out syndromic forms of HI and non-genetic causes of HI, such as maternal or perinatal infections, administration of ototoxic medications, or trauma. Physical exams, including tandem gait and Romberg tests, were performed to evaluate for gross vestibular deficits. Pure tone air conduction audiometric testing at 250–8000 Hz was performed on hearing-impaired family members.

Fig. 1

Pedigree drawings of the seven ARNSHI families with LHFPL5 variants. Families 4275 and 4506 segregate the known variant c.250delC; -/- signifies that the family member is homozygous for the c.250delC variant and +/- indicates heterozygous carriers of the c.250delC variant. Family 4298 segregates the known variant c.380A > G. Families 4194 and 4464 segregate the novel variant c.452G > T, and families 4072A and 4072B segregate another novel variant, c.*16 + 1G > A. Families 4072A and 4072B were reported to be distantly related, but the exact relationship is unknown. Filled symbols represent individuals with HI and clear symbols represent hearing individuals. Arrows mark the six individuals whose audiograms are displayed in Fig. 2. Haplotypes are presented for the two families 4072A and 4464, which have novel variants. A boxed haplotype carries the pathogenic variant. For the other five families, the corresponding nucleotide substitutions are presented below each sequenced individual.

Genotyping and linkage analyses

Venous blood was obtained from both hearing and hearing-impaired members of the seven families (Fig. 1). DNA extraction was performed following a phenol–chloroform protocol. The coding region of GJB2 was screened, as well as two variants that are common causes of ARNSHI in Pakistan: c.482 + 1986_88delTGA in HGF and c.272A > G (p.Phe91Ser) in CIB2. DNA samples underwent whole-genome genotyping using the Illumina Human Linkage-panels containing ~6000 single-nucleotide polymorphism (SNP) marker loci at the Center for Inherited Disease Research (CIDR).

The genotype data underwent quality control using MERLIN [9] to detect occurrences of double recombination events over short genetic distances, which could be due to genotyping errors, and PEDCHECK [10] to identify Mendelian inconsistencies. Two-point and multipoint linkage analyses were performed using Superlink Online [11], whereas haplotypes were constructed using Simwalk2 [12]. For linkage analysis, an AR mode of inheritance with complete penetrance and disease allele frequency of 0.001 was used. Marker allele frequencies were estimated from observed genotypes and reconstructed genotypes of founders from Pakistani families that were genotyped at the same time. Genetic map positions of the marker loci were obtained through interpolation using the Rutgers combined linkage-physical map of the human genome (hg19) [13]. The linkage region was defined by the three-unit support intervals and regions of homozygosity.
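The three-unit support interval used to delimit the linkage region can be illustrated with a minimal sketch. It assumes a list of marker map positions with their multipoint LOD scores and returns the markers bounding the region in which the LOD score stays within three units of the peak. The function name, the example positions and the example LOD scores are hypothetical and are not taken from the study data; the linkage software used here may also interpolate between markers, which this sketch does not do.

```python
from typing import List, Tuple


def support_interval(
    positions_cm: List[float], lod_scores: List[float], drop: float = 3.0
) -> Tuple[float, float]:
    """Return the (start, end) marker positions bounding the LOD support interval.

    The interval is the contiguous run of markers around the peak whose LOD
    score stays within `drop` units of the maximum LOD score.
    """
    if len(positions_cm) != len(lod_scores) or not lod_scores:
        raise ValueError("positions and LOD scores must be non-empty and equal in length")

    # Index of the marker with the highest LOD score.
    peak = max(range(len(lod_scores)), key=lambda i: lod_scores[i])
    threshold = lod_scores[peak] - drop

    # Walk outward from the peak until the LOD score falls below the threshold.
    left = peak
    while left > 0 and lod_scores[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lod_scores) - 1 and lod_scores[right + 1] >= threshold:
        right += 1

    return positions_cm[left], positions_cm[right]


# Hypothetical multipoint LOD scores along a chromosome (illustrative only).
example_positions = [40.0, 42.5, 45.0, 47.5, 50.0, 52.5, 55.0]
example_lods = [-1.2, 1.1, 2.6, 5.0, 4.1, 1.5, -0.8]
interval = support_interval(example_positions, example_lods)
```

A more conservative convention would extend the interval to the first flanking marker on each side that falls below the threshold; which convention applies depends on the analysis.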

DNA sequencing

Primers for all four exons of LHFPL5 were designed using Primer3 [14]. ExoSAP-IT (USB Corp., Cleveland, OH, USA) was used to purify PCR-amplified products. Sequencing was performed on an ABI 3730 DNA Analyzer using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems Inc, Foster City, CA, USA). DNA sequences were aligned and analyzed using Sequencher software v4.9 (GeneCodes Corp., Ann Arbor, MI, USA).

Bioinformatics analyses

Pathogenicity of the identified variants was investigated using Polyphen-2 [15], MutationTaster [16], SIFT [17], LRT [18], Mutation Assessor [19], FATHMM [20] and CADD [21]. Nucleotide conservation was assessed with GERP++, whereas amino-acid residue conservation was investigated by retrieving similar non-human proteins from UniProt [22] and aligning the protein sequences with ClustalW2 [23]. The transmembrane helical structure was predicted with the TMHMM 2.0 server [24].

The functional effects of the donor splice site variant c.*16 + 1G > A were investigated using extensive bioinformatics approaches, as RNA samples of hearing-impaired individuals were not available. First, the effect of the variant and possible cryptic splicing were predicted using three different software tools: NNsplice [25], HSF [26] and NetGene2 [27]. For the 3′-UTRs based on the natural and cryptic splice sites, RNA secondary structures and the minimum free energies (MFEs) were predicted on the Vienna RNA web-server [28]. Possible regulatory elements and microRNA (miRNA)-binding sites were identified via UTRscan [29] and the PITA algorithm [30], respectively.

Results

Hearing-impaired individuals from all families had no clinical history or physical examination findings that suggested that the HI is part of a syndrome. All hearing-impaired family members had prelingual bilateral profound HI. Air conduction audiometry showed bilateral hearing thresholds in the profound impairment range for all frequencies (Fig. 2). There was no evidence of gross vestibular dysfunction based on the results of tandem gait and Romberg testing.

Fig. 2

Air conduction thresholds of six hearing-impaired individuals. Circles represent the right ear and crosses the left ear. All the tested subjects have bilateral and profound HI across all frequencies

Linkage analysis was performed on all families, and the resulting log of odds (LOD) scores can be found in Table 1. Three of the seven ARNSHI families in this study had significant LOD scores of ≥ 4.0, whereas the remaining families had suggestive evidence of linkage (LOD scores 1.7–2.7) to the 6p21.1–22.3 region, which contains LHFPL5. For each of the two novel variants, one family established linkage on its own: family 4072A (c.*16 + 1G > A) had a LOD score of 5.0 and family 4464 (p.Gly151Val) had a LOD score of 5.4 (Table 1; Fig. 1).
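To make the scale of these LOD scores concrete, the following sketch uses the textbook two-point LOD formula for phase-known, fully informative meioses. This is a simplification for illustration only: the analyses in this study were run on whole consanguineous pedigrees with Superlink Online, which evaluates the full pedigree likelihood. The function names and the example counts are hypothetical.

```python
import math


def two_point_lod(theta: float, recombinants: int, meioses: int) -> float:
    """LOD score for phase-known meioses at recombination fraction `theta`.

    Compares the likelihood of the observed recombinants at `theta` against
    the likelihood under free recombination (theta = 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and the number of meioses")

    non_recombinants = meioses - recombinants
    likelihood_ratio = (
        (theta ** recombinants) * ((1.0 - theta) ** non_recombinants) / (0.5 ** meioses)
    )
    if likelihood_ratio == 0.0:
        # A recombinant was observed but theta = 0 forbids recombination.
        return float("-inf")
    return math.log10(likelihood_ratio)


def lod_to_odds(lod: float) -> float:
    """Convert a LOD score to odds in favour of linkage (10 ** LOD to 1)."""
    return 10.0 ** lod


# Ten informative meioses with no recombinants, evaluated at theta = 0.
lod_no_recombinants = two_point_lod(0.0, 0, 10)
# Odds corresponding to the LOD score of 5.0 reported for family 4072A.
odds_family_4072a = lod_to_odds(5.0)
```

Under this simplified model each non-recombinant meiosis contributes log10(2), or about 0.3, to the LOD score at theta = 0, and a LOD score of 5.0 corresponds to odds of 100,000 to 1 in favour of linkage.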

Table 1 Bioinformatics analyses results of four pathogenic mutations found in seven Pakistani families

LHFPL5 was selected for follow-up because, for each family, it is the only known HI gene within the linkage region. DNA samples from all available family members (Fig. 1) underwent Sanger sequencing to determine whether pathogenic variants lie within this gene. Known LHFPL5 variants segregated with HI in three of the seven families: c.250delC (p.Leu84*) for families 4275 and 4506 and c.380A > G (p.Tyr127Cys) for family 4298 (Fig. 1). Families 4194 and 4464 had a novel missense variant c.452G > T (p.Gly151Val) in exon 2, and families 4072A and 4072B had a novel nucleotide substitution (G > A) at the 5′ donor splice site of exon 3 at c.*16 + 1 (Fig. 3a, c), which segregated with HI. The known and novel variants were not observed in 200 and 600 Pakistani control chromosomes, respectively (Table 1). None of the variants were reported in the Greater Middle East (GME) Variome or the Trans-Omics for Precision Medicine (TOPMed) Bravo database. The novel variant c.452G > T (p.Gly151Val) was observed in gnomAD exome data at a frequency of 8.1 × 10^-6, in two heterozygous South Asian individuals (minor allele frequency (MAF) = 6.5 × 10^-5). The known variant c.380A > G (p.Tyr127Cys; rs104893975) was also observed in gnomAD exome sequence data, with a MAF of 2.8 × 10^-5: two heterozygous individuals among South Asians (MAF = 6.5 × 10^-5) and five among non-Finnish Europeans (MAF = 4.5 × 10^-5). It is also reported in dbSNP and ClinVar as a clinically associated pathogenic variant (Table 1).
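The absence of a variant from a finite control panel only bounds its population frequency from above. As a rough sketch of what 200 and 600 variant-free control chromosomes imply, the function below computes the one-sided upper confidence bound on allele frequency when zero copies are observed. It assumes the control chromosomes are independent draws from the population, which may not hold strictly in populations with frequent consanguinity. The function name is hypothetical and this calculation is not part of the original analysis.

```python
def max_allele_frequency(n_chromosomes: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on allele frequency when 0 copies are observed.

    If the true frequency is p, the chance of seeing no copies in n
    independently sampled chromosomes is (1 - p) ** n. Setting that equal
    to 1 - confidence and solving for p gives the bound.
    """
    if n_chromosomes <= 0:
        raise ValueError("n_chromosomes must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_chromosomes)


# Control panel sizes quoted in the text for the known and novel variants.
bound_200 = max_allele_frequency(200)
bound_600 = max_allele_frequency(600)
```

With these panel sizes the 95% upper bounds are roughly 1.5% for 200 chromosomes and 0.5% for 600 chromosomes, so the control data exclude common polymorphisms, while the much larger gnomAD sample is what shows the variants to be very rare.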

Fig. 3

a Chromatograms displaying the novel variants c.452G > T in families 4194 and 4464 and c.*16 + 1G > A in families 4072A and 4072B. b ClustalW2 sequence alignment of amino acids across LHFPL5 proteins from various species; asterisks indicate fully conserved amino acids, whereas colons indicate conservation between groups with strongly similar properties. The glycine151 residue is indicated with an arrow and is fully conserved across all species (left panel). DNA sequence alignment containing the guanine nucleotide c.*16 + 1, indicated with an arrow. The position is fully conserved across various species (right panel). c Schematic representation of the exon–intron structure with 11 pathogenic variants. The boxed variants indicate novel variants found in this study (top panel). The wild-type structure of exon 3–intron 3–exon 4 of LHFPL5 (middle panel). In the c.*16 + 1 mutant transcript, exon 3 is predicted to be extended by 357 bp due to the activation of a cryptic splice site. The regulatory element binding sites in the 3′-UTR are indicated with an arrow (bottom panel). d Predicted transmembrane helices in LHFPL5 (adapted from the result of TMHMM 2.0 analysis) and depiction of the amino-acid positions of the two previously reported missense variants and the novel variant found in this study. The dotted line arrow indicates the location of the missense variant found in hscy mice.

The novel missense variant c.452G > T (p.Gly151Val) has a CADD C-score of 28 and was predicted to be “disease causing”, “damaging” or “functional” by several bioinformatics prediction tools (Table 1). The guanine nucleotide at c.452 has a GERP++ score of 5.53, indicating that it is under strong evolutionary constraint. Based on the ClustalW2 alignment, the glycine residue at p.151 is fully conserved across 14 species ranging from frog to gorilla (Fig. 3b). The glycine residue (p.Gly151) is predicted to be located on the second extracellular loop of the transmembrane protein (Fig. 3d).
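The notion of a "fully conserved" residue in a multiple sequence alignment can be sketched as follows. The code marks each alignment column with an asterisk when every sequence carries the same non-gap residue, which mirrors only the asterisk line of a ClustalW2 alignment and not its colon or period annotations for similar residues. The function names and the toy sequences are hypothetical; they are not the LHFPL5 orthologs aligned in this study.

```python
from typing import List


def column_is_conserved(aligned: List[str], column: int) -> bool:
    """True if every aligned sequence has the same non-gap residue at `column` (0-based)."""
    residues = {seq[column] for seq in aligned}
    return len(residues) == 1 and "-" not in residues


def conservation_line(aligned: List[str]) -> str:
    """Build a line with '*' under fully conserved columns and ' ' elsewhere."""
    if not aligned:
        raise ValueError("alignment must contain at least one sequence")
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    return "".join(
        "*" if column_is_conserved(aligned, i) else " " for i in range(width)
    )


# Toy alignment of four short peptide fragments (illustrative only).
toy_alignment = [
    "LAGVCT",
    "LSGVCT",
    "LAGICT",
    "L-GVCT",
]
marks = conservation_line(toy_alignment)
```

In the toy alignment the third column (index 2) is the analogue of a fully conserved glycine: all four sequences carry G there, so it receives an asterisk.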

The donor splice site variant c.*16 + 1G > A was predicted to be disease causing by MutationTaster (Table 1) due to natural splice site disruption (Table 2). Several bioinformatics tools predict that it leads to loss of the 5′ donor splice site in the 3′-UTR of LHFPL5 (Table 2). In addition, the adaptive boosting (ADA) and random forest (RF) scores for this variant are 0.99 and 0.93, respectively (scores > 0.60 are predicted to impact pre-mRNA splicing), based on annotation from the dbscSNV database [31]. The guanine nucleotide at c.*16 + 1 has a GERP++ score of 3.49, indicating that the nucleotide is under strong evolutionary constraint (Fig. 3b). Further in silico analysis using three splice site analysis tools (Table 2) predicts that the c.*16 + 1G > A variant activates a cryptic splice site 357 bp toward the 3′ direction. The extended 3′-UTR leads to a different RNA secondary structure with a much lower MFE (Supplementary Figure 1). A UTR analysis tool and miRNA target scanning software predicted that the extended 3′-UTR may include additional regulatory elements, “K-BOX” and “SXL binding site” (Fig. 3c), and new miRNA-binding sites (Supplementary Table 1).
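The dbscSNV cutoff quoted above can be expressed as a simple filter. The sketch below assumes the ADA and RF scores have already been retrieved for a variant and flags it when both exceed 0.60. Requiring both scores, rather than either one, is an assumption made here for illustration, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Cutoff quoted in the text for both the ADA and the RF score.
SPLICE_SCORE_CUTOFF = 0.60


@dataclass(frozen=True)
class SpliceScores:
    """Ensemble splice-impact scores for one variant (both range from 0 to 1)."""

    ada: float  # adaptive boosting score
    rf: float  # random forest score


def predicted_to_alter_splicing(
    scores: SpliceScores, cutoff: float = SPLICE_SCORE_CUTOFF
) -> bool:
    """Flag a variant when both scores exceed the cutoff (assumed rule)."""
    return scores.ada > cutoff and scores.rf > cutoff


# Scores reported in the text for c.*16 + 1G > A.
utr_splice_variant = SpliceScores(ada=0.99, rf=0.93)
flagged = predicted_to_alter_splicing(utr_splice_variant)
```

Both reported scores lie well above the cutoff, so the variant is flagged under either the "both" or the "either" rule.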

Table 2 Splice site analyses results for the mutation, c.*16 + 1G > A, found in families 4072A and 4072B

Discussion

Variants underlying deafness and vestibular dysfunction in the Tmhs gene were identified in hscy mice [8]. This finding was followed by the identification of variants underlying HI in the human ortholog (LHFPL5) in families with HI without vestibular dysfunction that segregated DFNB67 [2,3,4,5]. Previously, nine pathogenic variants were reported. LHFPL5 is predicted to be a tetraspan transmembrane protein with two extracellular loops (Fig. 3d). The known variant c.250delC (p.Leu84*) observed in families 4275 and 4506 causes a frame-shift in exon 1, introducing a premature stop codon (Fig. 3c), and the mRNA is eventually degraded by nonsense-mediated decay. The second known variant, c.380A > G (p.Tyr127Cys), seen in family 4298, replaces the tyrosine residue with a cysteine. This amino-acid change occurring within the third transmembrane helix is predicted to result in the mis-localization of the protein (Fig. 3d) [2].

Novel variant c.452 G > T (p.Gly151Val) replaces the glycine residue located on the second extracellular loop with a valine. This variant is in close proximity to the p.Cys161Phe hscy mouse variant [8] and a previously reported human pathogenic variant p.Thr165Met [4] (Fig. 3d), implicating the functional significance of the second extracellular loop. As LHFPL5 is presumed to organize a transient cytoskeleton–membrane interaction in the stereocilia of sensory hair cells, the variant p.Gly151Val may cause dysfunction or mis-localization of LHFPL5, leading to stereocilia pathology similar to that found in hscy mice [8].

The second novel variant c.*16 + 1G > A disrupts the 5′ donor splice site of exon 3 (Fig. 3c), which is predicted to activate a cryptic splice site 357 bp downstream and extend the 3′-UTR. The change occurs in the 3′-UTR, to which regulatory molecules and miRNAs can bind to regulate gene expression. The extended 3′-UTR sequence may affect the expression of LHFPL5 in different ways: (1) longer 3′-UTR sequences may introduce unstable RNA secondary structures, which eventually lead to translational repression; (2) additional regulatory elements – “K-BOX”, “SXL binding site” (Fig. 3c) – may negatively affect gene expression; (3) new miRNA-binding sites in the extended 3′-UTR sequences (Supplementary Table 1) may reduce mRNA activity. Causal variants in 3′-UTR regions have not frequently been reported, especially variants that affect 3′-UTR splicing and thereby gene function [32]. This low identification rate could be attributed to the limited understanding of their functional impact.

miRNAs play important roles in the auditory system, and variants in miR-96 have been reported to cause deafness in humans [33, 34] and mice [35]. miRNAs contribute to hearing function by inhibiting target mRNAs through translational repression and destabilization of RNA secondary structure [36]. Among the new miRNA-binding sites found in the extended region, miR-5787 was reported to repress cellular growth by targeting eukaryotic translation initiation factor 5 in fibroblasts [37]. Interestingly, in Drosophila, “K-Box” bound miRNAs (K-Box miRNAs) were reported to inhibit the Notch pathway [38], which is involved in cochlear development and deafness [39]. Thus, new miRNA-binding sites and regulatory elements such as “K-Box” in the longer 3′-UTR sequences may alter the expression of LHFPL5, which may result in pathogenic effects.

Variants in LHFPL5 are a relatively rare cause of ARNSHI, but knowledge of all variants contributing to the HI phenotype provides a valuable resource for diagnostic genetic testing. Moreover, the unusual splice site variant in 3′-UTR provides a deeper understanding of the functional roles of various 3′-UTR regulatory motifs in the etiology of human deafness.