Comprehensive molecular-genetic analysis of mid-frequency sensorineural hearing loss

The genetic heterogeneity of sensorineural hearing loss (SNHL) is a major hurdle to the detection of disease-causing variants. We aimed to identify underlying causal genes associated with mid-frequency hearing loss (HL), which contributes to less than about 1% of SNHL cases, by whole exome sequencing (WES). Thirty families segregating mid-frequency SNHL, in whom biallelic GJB2 mutations had been previously excluded, were selected from among 851 families in our DNA repository of SNHL. DNA samples from the probands were subjected to WES analysis and searched for candidate variants associated with SNHL. We were able to identify the genetic aetiology in six probands (20%). In total, we found three pathogenic and three likely pathogenic variants in four genes (COL4A5, OTOGL, TECTA, TMPRSS3). One more proband was a compound heterozygote for a pathogenic variant and a variant of uncertain significance (VUS) in MYO15A gene. To date, MYO15A and TMPRSS3 have not yet been described in association with mid-frequency SNHL. In eight additional probands, eight candidate VUS variants were detected in five genes (DIAPH1, MYO7A, TECTA, TMC1, TSPEAR). Seven of these 16 variants have not yet been published or mentioned in the available databases. The most prevalent gene was TECTA, identified in 23% of all tested families. Furthermore, we confirmed the hypothesis that a substantive portion of cases with this conspicuous audiogram shape is a consequence of a genetic disorder.

Recessive variants. We detected six recessive variants in genes MYO15A, OTOGL, TMPRSS3 and TSPEAR in four families (27%) (Fig. 2). Three families (FAM253, FAM834 and FAM837) were characterized by sporadic HL. Familial HL occurred in FAM174, with two affected siblings. We identified a pathogenic and VUS variant in the MYO15A gene (NM_016239.4) in family FAM837, which was characterized with early-onset HL. Due to rapid HL progression, the proband received a cochlear implant at the age of 9 years with excellent early outcomes (a free field pure tone average 38.75 dB at 10 months post-implantation). One pathogenic variant and one likely pathogenic variant in the OTOGL gene (NM_173591.5) were detected in the proband D1315 (family FAM834), who had congenital HL. In both families, it would be appropriate to investigate the genotype of the parents to exclude the cis form of the variants in the probands.
The two remaining families carrying homozygous variants are of Roma ethnicity. In family FAM253, with perilingual HL, we detected the known p.Arg106Cys variant in the TMPRSS3 gene (NM_024022. 3). From a clinical point of view, the proband's hearing gradually deteriorated to deafness between the 4th and 14th year of life. In the last family, FAM174, we detected VUS variant p.Glu624* in the TSPEAR gene (NM_144991. 3). The proband's sister (D1032) is also affected by HL but interestingly was found to only be a heterozygous carrier of the p.Glu624* variant. The HL phenotype was similar in both siblings. in two affected siblings (D815, D817), both of whom showed a clinical picture of Alport syndrome. In addition to HL, they had both suffered from microscopic haematuria since childhood as well as symptoms of chronic glomerulonephritis. In subject D817, Alport syndrome was confirmed by kidney biopsy and the condition has progressively led to end-stage kidney failure with the need for haemodialysis.

Discussion
Mutations in GJB2 have been reported as the most frequent cause of hereditary HL 14,15 . For this reason, we first analyzed the GJB2 gene by Sanger sequencing in all 851 probands. Using this method, we determined the genetic aetiology in 245 (29%) of the probands, who were then excluded from further analyses.
In the next step, the remaining probands were selected based on the clinical phenotype. Although we focused on a relatively uniform group of patients from the audiological point of view (mid-frequency HL), the number of potentially involved genes was large (> 226). Moreover, the spectrum of the genes responsible for this particular form of HL is mostly unknown. It is not practical to analyze each gene separately in such a case. Therefore, next generation sequencing (NGS) allowing the analysis of many genes simultaneously seemed to be the best solution 16,17 .
Using WES, we identified the genetic aetiology in six of the 30 probands (20%) with mid-frequency HL. Additionally, novel candidate variants (VUS) were found in eight probands. A compound heterozygous pathogenic Glycoproteins expressed in the tectorial membrane. The tectorial membrane (TM) of the cochlea is a ribbon-like strip of extracellular matrix that contacts the tips of the sensory hair bundles of the outer hair cells. Sound induces movement of these hair cells relative to the TM, deflects the stereocilia, and leads to fluctuations in hair-cell membrane potential, transducing sound into electrical signals 30,31 .
The TECTA gene encodes a protein called alpha-tectorin. Mutations in TECTA account for 2-4% of all autosomal dominant non-syndromic SNHL characterized by mild prelingual HL from a clinical point of view 15,[32][33][34] . TECTA contains several extracellular matrix (ECM)-interacting domains, including a nidogen-like domain and  www.nature.com/scientificreports/ and p.Tyr1942Cys) located in the ZP domain were identified in probands with a U-shaped audiogram. However, the p.Cys650Arg variant identified in the cysteine-rich TIL1 domain is responsible for mild HL, which became clinically manifest at the age of 16 years. The audiometric curve has a shallow U-shape profile. Similarly, Yasukawa et al. 37 identified the likely pathogenic p.Cys606Gly variant in the TIL1 domain in a proband with mid-frequency HL. Variant p.Lys2150Asnfs*9 is located in the last exon and causes replacement of the last six amino acids and extension of the glycosylphosphatidylinositol (GPI) anchorage of TECTA protein by two amino acids. It has been shown that GPI-dependent release of TECTA is required for elongation and bundled organization of collagen fibrils on the apical surface of inner supporting cells during formation of the TM 38 . Therefore, any change in the conformation or length of the GPI anchorage may impair this process and lead to a clinical phenotype. However, further study of these mechanisms is required. In our study, 10% of all probands with a U-shaped audiogram carried pathogenic variants in TECTA , and when we also take into account all the identified VUS variants, the theoretical contribution of TECTA to hearing impairment in our cohort may be as high as 47%. This is more than the published prevalence of TECTA in patients with U-shaped audiograms, for example in Japan, where 6% of patients with U-shaped audiograms could be attributed to TECTA 36 . The difference may be explained by high variability in the prevalence of causal deafness variants and genes among different ethnic populations as a well-known phenomenon in hereditary HL 39 .
The OTOGL gene encodes an otogelin-like protein, which has structural similarities to the epithelial-secreted mucin protein family 40 . Its mutations lead to non-syndromic prelingual HL, which is mild to moderate and stable, with autosomal recessive inheritance 41 . The 2,343-amino acid protein contains an N-terminal signal peptide, four paired von Willebrand factor (VWF) and cysteine-rich (C8) domains, four trypsin inhibitor-like (TIL) domains, and a C-terminal cystine knot motif 40 . We identified two heterozygous variants in proband D1315. The first is the known pathogenic nonsense variant p.Gln520* in vWFD2 42 . The second likely pathogenic variant, c.4032_4054+30del, has not yet been published. It is located at the interface of the 34th exon and 34th intron and, according to HSF and MaxEnt, it can cause an alteration of splicing due to activation of several cryptic acceptor and donor sites. However, we cannot rule out exon skipping, which may result in a more significant shortening of the protein sequence located between domains vWFD3 and vWFD4. The exon skipping does not lead to a frameshift, and therefore, according to Abou Tayoun et al. 43 , only the PVS1_strong classification criterion can be applied. On the other hand, vWFD domains containing the multimerization consensus site CGLC are essential for the multimer assembly of proteins expressed in the tectorial membrane (like α-tectorin, otogelin or otogelin-like protein) to form a filament and higher order structures. Although little is known about these domains in OTOGL, mutations in the vWFDs of TECTA have been clearly connected with hearing impairment 37,44 . Moreover, the shape of the audiometric curve observed in our proband is almost identical to the audiogram obtained from the compound heterozygous patient carrying mutations p.Gln520* and p.Arg925* reported by Bonnet et al. 42 . The vestibular hypofunction observed in patients by Oonk et al. 45 was not diagnosed in our proband.
Collagens expressed in the cochlea. The collagen superfamily comprises 28 types (I-XXVIII) in vertebrates 46 . Collagens are fibrous structural proteins involved in the construction of skin, cartilage, bone, eye, and other tissues 47 . All collagen molecules are comprised of one, two, or three different types of α-chain subunits tightly wrapped into a triple helix. A single missense mutation of a glycine to another residue in the triple helix can result in almost 40 genetic diseases 46 . Collagens encoded by genes COL2A1, COL4A3, COL4A4, COL4A5, COL4A6, COL9A1, COL9A3, COL11A1 and COL11A2 are essential for proper auditory function. Mutations in these genes are linked to non-syndromic or syndromic HL (Marshall syndrome, fibrochondrogenesis, Alport syndrome, Stickler syndrome) 19,48 . In this study, we present a novel variant in the COL4A5 gene. Variants in this gene are associated with X-linked Alport syndrome 49 . In 80% of all cases, Alport syndrome is caused by mutations in the COL4A5 gene 50 . The disorder shows considerable heterogeneity in the age of onset of end-stage renal disease and the occurrence of deafness among particular families 51 . Most patients have mid-frequency HL, although high frequency and flat SNHL curves may be present, as well. SNHL in males may first manifest around the age of 10 years and usually gets progressively worse 49,52 . In family FAM618 we identified the variant p.Gly472Glu, which interrupts the Gly-Val-Lys triplet in the α-chain and results in disruption of the triple helix's function, changing the scaffolding and flexibility of collagen IV 46 . This results in an Alport syndrome phenotype in both siblings (D815, D817).

Genes expressed in hair cells.
Myosins play an important role in the cellular organization of the cochlea. In general, myosins are motor proteins, and based on their variable C-terminal binding domains, they are included in conventional myosins (class II) and unconventional myosins (classes I and III to XV) 53 . Unconventional myosins IA, IIIA, VI, VIIA and XVA occur in the cochlea. MYO7A and MYO15A, encoding unconventional myosins, are expressed in the outer and inner hair cells of the Organ of Corti 54,55 .
MYO15A is known to be the third-most mutated gene causing severe to profound ARNSHL, after GJB2 and SLC26A4 mutations. Mutations in this gene cause both pre-and post-lingual forms of progressive HL 41,56 . Our patient D1321 clinically manifests profound non-syndromic HL. Interestingly, she passed the neonatal hearing screening, and her HL became apparent at the age of 1.5 years. We identified two variants, c.5693G>A and c.8050T>C, the latter of which was previously described in trans with other pathogenic variants 57,58 , whereas c.5693G>A variant was submitted only to the ClinVar archive and to the Deafness Variation Database (https:// deafn essva riati ondat abase. org/) 59 . It is located in the myosin motor domain of the MYO15A protein. This domain consists of ATP-and actin-binding sites, which can generate force and move actin filaments; therefore, motor domain dysfunction results in shorter stereocilia with an ectopic staircase structure of stereocilia associated with a severe deafness phenotype 60  www.nature.com/scientificreports/ The MYO7A tail contains a pair of myosintail homology 4-protein 4.1, ezrin, radixin, moesin (MyTH4-FERM) tandems separated by an SH3 domain. Mutations in the head and tail regions of MYO7A lead to defects in mechanotransduction and result in Usher syndrome 1B (USH1B), an autosomal recessive disorder with sensory impairment 61 . In addition to syndromic conditions, mutations in MYO7A can cause dominant (DFNA11) 62 and recessive (DFNB2) 63 non-syndromic HL. The p.Arg1338His variant detected in FAM411 is located in the FERM 1 domain of the tail of the myosin VIIA protein. The nonsense variants p.Arg1338AlafsTer61, p.Gln1336Ter or p.Glu1337SerfsTer62, linked to Usher syndrome, were identified close to our identified p.Arg1338His variant. VUS missense variants p.Arg1338Ser or p.Arg1338Cys with unclear clinical phenotype were also identified (https:// deafn essva riati ondat abase. org/). Our proband had HL onset at the age of 22 years with mild progression of hearing impairment and a U-shape audiogram. Moreover, no retinal abnormalities were recorded, indicating the dominant non-syndromic HL.
Transmembrane channel-like protein 1 is a protein encoded by the TMC1 gene. The predicted structure of TMC1 shows a dimeric channel with 10 transmembrane domains (S1-S10) and intracellular amino-and carboxyl-termini in each monomer. Transmembrane helices S4-S7 form a groove that lines the pore of sensory transduction channels 64 . The transmembrane channels are proteins necessary for hair cell mechanotransduction 65 . Dominant mutations in TMC1 cause the late-onset progressive HL phenotype, whereas ARSNHL cases are linked to congenital severe-to-profound HL 66,67 . We identified a missense variant, c.1949C>T, which is predicted to substitute a conserved proline at position 650 for leucine, which may influence the correct folding or assembly of TMC1 into multimers. It is located in the helical S9 domain. Two pathogenic autosomal recessive variants were previously identified in close proximity to our variant: p.Met654Val 68 and p.Ser647Pro 69 . However, according to the phenotype observed in FAM416, we cannot rule out the dominant character of inheritance of the p.Pro650Leu variant or a yet undetected causal variant in a different gene. One young family member, D1337, showed still normal hearing, although she carried the same variant as her father, thus supporting the possibility of late-onset HL.
The TMPRSS3 gene encodes a member of the serine protease family. Transmembrane Serine Protease 3 is linked to hair cell sterocilia mechanics and to actin network formation, which supports cell motility and integrity. Apart from hair cells, the TMPRSS3 gene is also expressed in the inner and outer pillar cells and the Deiters cells, stria vascularis or ganglion spirale 70,71 . This gene was identified by its association with childhood onset of an autosomal recessive deafness. The hearing impairment in families with mutations in the TMPRSS3 gene has a prelingual to postlingual onset. A common feature of patients with TMPRSS3 mutations is progressive HL beginning in high frequencies and often leading to partial or complete deafness. However, Sasamori et al. 72 reported a case of progressive mid-to low-frequency sensorineural HL associated with mutation of the TMPRSS3 gene. We identified a likely pathogenic variant, p.Arg106Cys, which is located within exon 4 and causes the negatively charged residue arginine to change to a neutral hydrophobic cysteine at position 106. This change can result in the loss of hydrogen bonds and/or disturb the correct folding of TMPRSS3. It has been observed in the compound heterozygote state with p.(Phe13Serfs*12) and p.Ala306Thr, where it cosegregated with prelingual profound hearing impairment or postlingual milder hearing impairment, respectively 70 . The homozygous form observed in proband D298 has not previously been observed. Although Gao et al. 70 predicted a very subtle or no effect on auditory function of homozygous p.Arg106Cys, we have shown that it is associated with moderate to severe perilingual progressive HL with a typical U-shape audiogram.
The DIAPH1 gene encodes human protein DIA1, a formin that elongates unbranched actin 73, 74 . The correct polymerization of actin is critical for the formation and elongation of stereocilia on the apical surface of cochlear hair cells 75 . Expression of DIAPH1 in the Organ of Corti has been proven in the inner pillar hair cells as well as at the base of the outer hair cells and in the outer pillar cells 76 . DIA1 consists of the GTPase-binding domain (GBD), a partially overlapping N-terminal diaphanous inhibitory domain (DID), formin homology (FH) domains FH1 and FH2, and the C-terminal DAD. It is held inactive in the resting state through an autoinhibitory intramolecular interaction between the DID and the DAD, which is regulated by Rho family GTPases 74 . In this study, we identified the VUS variant p.Gly1197Asp which, according to the Human Splicing Finder, has no significant impact on splicing signals. It is located in close proximity to the MDxLLxxL recognition site (1199-1206aa), which presents the central consensus motif of the DAD domain and is essential for binding to the DID domain. We hypothesize that our missense c.3590G>A variant results in a change from a small, uncharged amino acid glycine to a larger, charged residue aspartate. This may have a profound effect on the conformation of the amphipathic helix of the DAD domain and subsequently impair interaction with the acidic groove of the DID. Disruption of the autoinhibitory intramolecular DID-DAD interaction and consequent activation of DIA1 may explain the DFNA1 pathology 74 . DIAPH1 mutations have been associated with initially mild, low-frequency SNHL, characterized by progressive deafness starting in childhood 73 . In contrast, some described variants showed hearing impairment beginning in the high-frequency range and progressing to deafness involving all frequencies 74 . Audiograms of our proband and family members showed a typical U-shape. Autosomal dominant deafness caused by mutations in the DIAPH1 gene can be associated with thrombocytopenia 76,77 . However, in the affected family members D1248, D1249 and D1250 no thrombocytopenia was confirmed suggesting nonsyndromic HL. The only pathological finding identified by differential blood count analysis was neutropenia in the family member D1250.
The TSPEAR gene encodes the thrombospondin-type laminin G domain and EAR repeats protein (TSPEAR), which was detected at the basal region of the stereocilia of hair cells. Mutations in TSPEAR cause non-syndromic SNHL with prelingual onset 78 . We identified the nonsense variant p.Glu624* in exone 12 resulting in a loss of the EAR7 repeat. Seven epilepsy-associated repeats (EARs) of TSPEAR are predicted to form a beta-propeller structure, function as part of a ligand-binding domain and mediate protein-protein interactions 79 . Delmaghani et al. 78 identified in 3 affected brothers from a consanguineous Iranian family segregating autosomal recessive nonsyndromic sensorineural deafness a homozygous frame-shift mutation (c.1726G>T+c.1728delC) that was predicted to result in termination of translation in the EAR6 repeat, connected with defective secretion of the www.nature.com/scientificreports/ mutant protein and congenital profound sensorineural deafness. As both affected siblings in our family FAM174 segregated a similar HL phenotype, but only one of them carried the homozygous variant in TSPEAR, we may assume that in addition to the identified p.Glu624* variant in the TSPEAR gene a second, yet unknown gene may contribute to HL in this family. In general, autosomal recessive variants are mainly responsible for prelingual forms of severe to profound hearing impairment, whereas autosomal dominant variants cause HL that usually appears later and is often mild or moderate. Exceptions comprising postlingual HL include recessive variants in the TMPRSS3 gene; prelingual include dominant variants in the TECTA gene 32 .
The precise data on the prevalence of pathogenic variants in the genes described above in the general and hearing impaired populations in Slovakia are not yet available. Most of the variants were found in single families, although pathogenic variants in TMPRSS3 were also identified in two probands with typical ski-slope audiograms (unpubl. data).
Conclusions. This study confirms the genetic background in at least 20% of patients with mid-frequency SNHL. Two of the identified genes (TMPRSS3 and MYO15A) are reported in association with this audiogram configuration for the first time. Moreover, we attempted to narrow the spectrum of candidate genes in this specific and rare SNHL form to 3 distinct groups based on the site of their expression and their function in the cochlea. In our opinion, Sanger sequencing of TECTA or WES are currently two cost-effective options for DNA analysis of mid-frequency SNHL.

Methods
Patients recruitment. Patients with mid-frequency HL were selected from among 851 families (1074 hearing impaired individuals) with bilateral SNHL with onset before the age of 60 years whose DNA samples and clinical data were collected throughout the whole Slovak Republic over the last decade. The subjects were referred from major ENT departments in Bratislava or recruited at boarding schools for hearing-disabled children and in selected isolated Roma communities. Peripheral blood and/or buccal swabs were taken for DNA isolation. The GJB2 gene was analyzed in all probands first to exclude the most frequent cause of hereditary HL. Only non-GJB2 associated cases (71% of all probands) were considered for further analyses. All individuals or their legal representatives (in subjects under 18 years of age) signed informed consent to genetic testing and the study was approved by the Ethics Committee of University Hospital in Bratislava.
Mediocochlear HL was defined by the highest pure tone thresholds in mid-frequencies (1-4 kHz). In typical cases, the difference between the worst mid-frequency threshold and the high-and low-frequency thresholds was at least 20 dB. However, individual variability of the audiometric curve shapes was relatively common among the patients and sometimes also between the two ears of the same individual. Borderline cases with shallow curves were included only if the mid-frequency HL was confirmed to be reproducible during the audiometric follow-up. Preschool children and subjects in whom reliable audiometric data were not available were excluded from the study. Based on the criteria mentioned above, we selected 30 probands (27 of Caucasian and 3 of Roma ethnicity) with mid-frequency HL of yet unknown aetiology.
Whole exome sequencing. Probands meeting the inclusion criteria were selected for WES analysis.
Genomic DNA was extracted from peripheral blood using standard procedures. WES was performed by service providers: BGI, Hong Kong, and Theragen, Republic of Korea. A DNA library was prepared using the BGI 59M Human Exome kit (probands D527, D753, D815, D821 and D822) and the SureSelect XT V6 kit (all remaining probands), respectively. Sequencing data were processed by the provider's standard bioinformatics pipeline, which encompassed base calling and alignment of generated reads to the GRCh38 reference genome without the unplaced or alternate loci. Variant calling was routinely performed in-house using The Genetic Analysis Tool Kit (GATK) HaplotypeCaller, combined with the GATK UnifiedGenotyper in the case of BGI samples 80 .
Aligned reads and called variants were obtained in standard bioinformatics formats and subjected to the following bioinformatics pipeline. Variants were decomposed and normalized using vt 81 . Variant effect predictor 82 was used to annotate the variants with respect to their potential effects on genes and transcripts and to add scores from in silico prediction algorithms PolyPhen and SIFT 83 . Finally, the Gemini framework 84 was used for submitting variants into a newly created SQLite database and for annotating variants with additional data from genome annotation databases, e.g. the Single Nucleotide Polymorphism Database (dbSNP) 85 , Encyclopedia of DNA Elements (ENCODE) (The ENCODE Project Consortium), ClinVar 86 , 1000 Genomes (The 1000 Genomes Project Consortium), the Exome Sequencing Project (ESP) (http:// evs. gs. washi ngton. edu/ EVS/), Exome Aggregation Consortium (ExAC) and the Genome Aggregation Database gnomAD 87 . Afterwards, variant prioritization and filtering were carried out by removing those variants with an MAF ≥ 0.01 (based on maximal MAF in all used population databases), and the resulting variant set was first narrowed down by removing variants not lying inside the regions of 226 genes with known association with SNHL. The set of evaluated SNHL genes was based mainly on the webpage https:// hered itary heari ngloss. org/. In the case that no candidate variants were found among the virtual panel of 226 genes, the whole exome was evaluated. Candidate variants were clinically classified according to the American College of Medical Genetics and Genomics (ACMG) guidelines 88 adapted by Oza et al. 89 , with modification for the PVS1 criterion according to Abou Tayoun et al. 43 . The VUS variants were selected as candidates only if they belonged to previously reported deafness genes. The limited number of available family members needed for cosegregation analysis makes it difficult to classify new gene variants among novel genes. Moreover, the further classification of VUS variant in unknown genes would be inconclusive without functional studies. www.nature.com/scientificreports/ Sanger sequencing. Sanger sequencing was performed to verify variants identified by WES and to determine co-segregation of the candidate variants with HL in all participating family members. The primers were designed using the program Primer Blast 90 . The resulting PCR products were sequenced using the Big Dye Terminator v3.1 kit and analyzed with the ABI3500 Genetic Analyzer. Sequence chromatograms were analyzed by Sequence Scanner Software 2.0 (available from http:// www. therm ofish er. com/). All methods were performed in accordance with the relevant guidelines and regulations. Copy Number Variants (CNV) were not investigated. Some of the unsolved families could be due to CNVs.