Predicting pathogenicity for novel hearing loss mutations based on genetic and protein structure approaches

Hearing loss is a heterogeneous disorder. Identification of causative mutations is demanding due to genetic heterogeneity. In this study, we investigated the genetic cause of sensorineural hearing loss in patients with severe/profound deafness. After the exclusion of GJB2-GJB6 mutations, we performed whole exome sequencing in 32 unrelated Argentinean families. Mutations were detected in 16 known deafness genes in 20 patients: ACTG1, ADGRV1 (GPR98), CDH23, COL4A3, COL4A5, DFNA5 (GSDDE), EYA4, LARS2, LOXHD1, MITF, MYO6, MYO7A, TECTA, TMPRSS3, USH2A and WSF1. Notably, 11 variants affecting 9 different non-GJB2 genes resulted novel: c.12829C > T, p.(Arg4277*) in ADGRV1; c.337del, p.(Asp109*) and c.3352del, p.(Gly1118Alafs*7) in CDH23; c.3500G > A, p.(Gly1167Glu) in COL4A3; c.1183C > T, p.(Pro395Ser) and c.1759C > T, p.(Pro587Ser) in COL4A5; c.580 + 2 T > C in EYA4; c.1481dup, p.(Leu495Profs*31) in LARS2; c.1939 T > C, p.(Phe647Leu), in MYO6; c.733C > T, p.(Gln245*) in MYO7A and c.242C > G, p.(Ser81*) in TMPRSS3 genes. To predict the effect of these variants, novel protein modeling and protein stability analysis were employed. These results highlight the value of whole exome sequencing to identify candidate variants, as well as bioinformatic strategies to infer their pathogenicity.


Results
Validation of variants and genetic diagnosis. Thirty-two patients with different forms of hearing loss were studied by WES, followed by data filtering based on 183 genes reported for this pathology. Approximately 90,000 variants were identified for each patient studied. Filtering reduced the list of candidate variants to 1-10 for each patient. After analysis following the American College of Medical Genetics and Genomics (ACMG) and Hearing Loss Variant Curation Expert Panel (HL-EP) parameters, variants were classified from benign to pathogenic in each patient and confirmed by Sanger sequencing and family segregation when available (full data are detailed in Supplementary Figure S1).
Overall, 27 different mutations were identified in the following 16 genes: ACTG 1, ADGRV1 (GPR98), CDH23, COL4A3, COL4A5, DFNA5 (GSDDE), EYA4, LARS2, LOXHD1, MITF, MYO6, MYO7A, TECTA, TMPRSS3, USH2A and WSF1 in 20 of the 32 studied patients (62.5%). The other 12 cases that remained undiagnosed had recessive variants in the heterozygous state or variants that failed to segregate with the pathology in the family. Therefore, since they did not fulfill the ACMG criteria, they were not reported as positive. Since neither large insertions/deletions nor repetitions or deep intronic variants were studied, exome data remain available for further analysis.
In positive cases, segregation within the family was performed in 16 out of the 20 cases, and variant inheritance was established. All variants are listed in Table 2, together with the patients´ auditory phenotype (pedigrees are detailed in Fig. 1 and Supplementary Figure S1 Supplementary Table S4. Five of the thirty-two patients under study presented features compatible with syndromic forms of hearing loss: 2 Alport, 2 Usher and 1 Waardenburg type 2 (WS2) ( Table 1). Both Alport and Usher cases resulted in causative mutations in genes related to a syndromic phenotype, and the Waardenburg patient remained undiagnosed (nevertheless large deletions and/or insertions cannot be ruled out by this approach). Three additional cases presenting nonsyndromic HL at consultation resulted in pathogenic variants in genes associated both with syndromic and nonsyndromic forms: 2 Usher syndrome and 1 Perrault syndrome. These molecular genetic results might modify the future clinical outcome, genetic counseling and clinical follow-up to the patients and families. Of the 20 patients with identified variants, 13 required no further studies since they had nonsyndromic hearing loss.
Clinical characteristics of relevant cases. Case #1. Two novel variants in CDH23 (NM_022124.5) were identified, c.337del, p.(Val113*) and c.3353del, p.(Gly1118Alafs*7), in a 23-year-old patient with cochlear implant in childhood (Fig. 1A). Both variants were associated with Usher Type 1D and resulted in truncated proteins, probably with no residual function. According to HL-EP recommendations, they were both interpreted as pathogenic. Both parents were healthy carriers by Sanger sequencing. The patient already showed some initial retinopathy compatible with Usher signs, further indicating that the mutations were causative of the phenotype and that the signs will progress. Case #2. The patient had postlingual bilateral moderate hearing loss with a U-shaped audiogram, which was caused by a novel heterozygous variant in the MYO6 gene (NM_004999.4): c.1939T > C in exon 18 p.(Phe647Leu). The patient had four affected siblings (not available for molecular diagnosis). Thus, the variant was linked to a dominant form of inheritance (DFNA22). To further study the effect of the variant in the MYO6 structure, a model of the affected motor domain bearing the p.(Phe647Leu) mutation was generated (Fig. 3).
Case #3. The patient had postlingual bilateral moderate hearing loss that was caused by a novel heterozygous variant in EYA4: c.580+2T > C (splicing site). The variant cosegregated with the affected mother in the family, consistent with a dominant form of inheritance (Fig. 1C).   For each domain mutations affect the electrostatic surface of the protein as well as the distance between neighboring residues. The p.Thr629Met variant lies in the LS domain between the hairpin of beta strand I-I, altering the folding of this loop and compromising the stability of the region (Zoom F). Detailed information regarding the genetic variants analyzed can be found in Table 3.
tion in the family was confirmed by Sanger sequencing. The affected proband, a 1-year-old boy, presented isolated prelingual hearing loss at the time of genetic diagnosis, with no retinal or vestibular pathologies (Fig. 1D). Since syndromic and nonsyndromic forms have been reported due to mutations in the MYO7A gene, patient clinical follow-up was recommended.
Case #5. A familial case with two affected cochlear implanted siblings (8-and 12-year-old boys). Two variants were identified in LARS2 (NM_015340.3): c.1886C > T, p.Thr629Met was previously reported only once in ClinVar related to Perrault syndrome, and the novel c.1481dup, p.(Leu495Thrfs*31) (Fig. 1E). This latter novel frameshift mutation is predicted to be pathogenic based on its truncating effect on LeuRS, leading to the loss of its catalytic, leucine-specific and anticodon-binding domains. Pathogenicity was further confirmed by the deleterious effect of the mutation on the LeuRS structure through molecular modeling analysis (Fig. 2). Segregation analysis indicated that the parents were carriers for the mutations.
Case #7. A family case with a proband diagnosed with prelingual bilateral profound sensorineural hearing loss and three other affected members with a similar phenotype (son, sister and mother), who were cochlear implanted. The reported nonsense mutation c.877C > T (NM_000248.3) in MITF was found and predicted to be pathogenic, leading to an early truncated and nonfunctional protein p.(Arg293*). The variant cosegregated with the pathology in all of the affected members of the family. The same mutation has been reported in a family with www.nature.com/scientificreports/ Waardenburg syndrome type 2 (WS2) 15 . It is striking that no Waardenburg signs were observed in any of the members of the family (Fig. 1F).
Case #8:. An affected girl (15 years old) with isolated postlingual hearing loss and a sloping audiogram was diagnosed at 11 years old. Heterozygous variants were detected in TMPRSS3 (NM_024022.2): c.1276 G > A, p.(Ala426Thr), reported several times as likely pathogenic in ClinVar supporting a deleterious effect (rs56264519) and the novel c.733C > T, p.(Ser81*) mutation. Parents were found to be carriers of these mutations, consistent with recessive inheritance.
Case #11. A 27-year-old patient with high frequency hearing loss. Two variants were identified in CDH23: c.1515-12G > A, previously reported as variant of uncertain significance (VUS) and now reclassified as likely pathogenic, and c.1096G > A, p.(Ala366Thr) classified as benign based on its high population frequency (BA1 applied) (Fig. 1B). It remains unclear whether these genetic findings are related to the pathology in this family, since the proband had no retinopathies or vestibular abnormalities, as seen in Usher Type D syndromes associated with mutations in CDH23 (OMIM #601067).
Variant curation. After WES analysis, 28 different variants were found in 20 patients. Sixteen variants were already reported in the ClinVar database: 5 pathogenic, 6 pathogenic/likely pathogenic, 1 VUS, 3 conflicting interpretations and 1 benign ( In the case of novel variants, classification was established following the standard protocol as per ACMG/ AMP for variant interpretation for genetic hearing loss and the updated recommendations of the ClinGen Hearing Loss Expert Panel, and in some cases (LARS2 and MYO6), their final classification was further established through molecular modeling and in silico strategies.
Combined in silico analysis. Functional assays are essential for the interpretation of missense variants associated with pathology. However, experiments for functional validation are time consuming and not always feasible in the clinical context. Therefore, bioinformatic tools that predict protein malfunction appear to be valid predictable tools of pathogenicity. We implemented full modeling and domain modeling as bioinformatic approaches to determine the in silico implications of missense variants.
Full modeling of LeuRS (LARS2 gene). The LARS2 gene encodes a mitochondrial leucyl-tRNA synthetase (LeuRS) that catalyzes the aminoacylation of a specific tRNA. The protein architecture of LeuRS includes motifs that are catalytically important (HIGH and KMSKS) and different domains: catalytic, editing, leucine-specific (LS), anticodon-recognition, and the C-terminal domains (C-ter) (Fig. 2). Sequence variants in LARS2 have been previously associated with Perrault syndrome, characterized by premature ovarian failure, hearing loss and other severe multisystem metabolic disorders (OMIM #604544).
In Case #9, we identified two variants: c.1481dup, p.(Leu495Thrfs*31) and c.1886C > T (p.Thr629Met). The former is novel and predicted to yield a truncated nonfunctional protein. Although the latter variant has already been reported, we performed a deeper follow-up analysis to understand the impact of the mutation on the translated protein. Thus, we conducted molecular modeling of the entire human LeuRS and analyzed its stability, electrostatic surface and tRNA interaction. In addition to the p.Thr629Met variant found in the proband of Case #5, we included in our analysis 17 additional missense variants reported as likely pathogenic or pathogenic for the LARS2 gene in LOVD, Deafness Variation Database and ClinVar (2 frameshift and 1 nonsense reported variant were not included). These variants are identified above the primary structure of LeuRS ( Fig. 2A). Most of the variants were located in the catalytic, LS and editing domains, and none were located in the C-terminal or anticodon domain.
The model shows that protein stability was altered in 7 of the variants analyzed (41%), the electrostatic charge in 3 (18%) and the tRNA-protein interaction in 3 (18%) of the previously reported mutations. The analysis was nonconclusive for 4 variants (24%) ( Table 3).
The p.Thr629Met variant detected in the patient of Case #5 lies in the LS domain between the hairpin of beta strand I-I altering the folding of this loop, compromising the stability of the region ( Fig. 2F and Table 3). An additional previously reported variant, p.(Glu638Lys), also lies in the LS domain and alters the interaction with de t-RNA through the change of a negative to a positive side chain of the residue (Fig. 2F). Ten genetic variants are located in the catalytic domain of the protein. According to the model, p.Met117 (light green helix in Fig. 2C) interacts with residues p.Trp800, p.Val820, p.Trp825 and p.Val786, of the anticodon domain (red helix in Fig. 2C). The methionine to isoleucine change in p.(Met117Ile) results in altered protein stability (   . 2C).
Under the magnifying glass of the present model, variant p.Arg453Gln lies within the catalytic domain of the protein, in contrast with the previously reported model that positioned it in the t-RNA binding domain 16 . Thus, the pathogenicity of p.(Arg453Gln) is related to a combination of a change in surface electrostatic charge and a steric effect of this residue in the region (Table 3 and Fig. 2E) www.nature.com/scientificreports/ the structure of the catalytic domain through a steric effect. In addition, p.(Pro507Arg) affects the structure of the loop between the G-G β-strand, and p.(Pro536Leu) affects the structure of the H19-H20 α-helix (Fig. 2E, green). Three missense variants were found in the editing domain. Variants p.(Glu294Lys) and p.(Glu413Lys) generate a change from glutamic acid to arginine, decreasing the electrostatic surface charge. The p.Thr300 residue, mutated to M in a previously reported patient 17 , is crucial for leucine-tRNA edition in the β-strand of this domain, as observed in the center of Fig. 2D.
The new model of LeuRS presented in the present work can be considered a new parameter (PP3 score applied) to classify variants reported in databases according to the Hearing Loss Expert Panel classification. The final classification of reported LARS2 variants is detailed in Table 3.

Modeling of the N-terminal motor (head) domain of the MYO6 protein.
Myosins are actin-based motor molecules with ATPase activity. Myosin VI is a reverse-direction motor protein that moves toward the minus-end of actin filaments and plays a crucial role in the organization of the stereociliary bundle and the maintenance of the cuticular plate anchoring of stereocilia rootlets in hair cells 18 . Mutation p.(Phe647Leu) in MYO6 detected in Case #6 was not previously reported in ClinVar and was not found in the GnomAd (Genome Aggregation Database) or 1000 Genomes databases. To determine the potential effect of the missense variant on MYO6 function, we further performed in silico studies. The bioinformatic analysis predicted that c.1939T > C, (p.Phe647Leu) is a disease causing variant by Mutation Taster (0,81), damaging by PolyPhen-2 HumDiv (score 0.986), damaging by SIFT (0), deleterious by CADD, damaging by PROVEAN (-5,7) and pathogenic by the UMD-Predictor (score 75), REVEL (0, 9).
With the aim of further analyzing the impact of the mutation on the protein, specific protein domain modeling was performed in two conformations. The root mean-square deviation (RMSD) values of the protein backbone distances between the two MYO6 motor domain conformations (pre-powerstroke and PI release) were first calculated to determine whether the residue is located in a motil region. In particular, two high RMSD values were obtained for amino acid positions 395 to 405 and 621 to 649 in the two conformations. Therefore, amino acid changes in these regions could affect the interaction with nearby residues, affecting the proper function of the protein (Fig. 3B). Our analysis indicates that residue Phe647 is indeed located in a motile region between both conformations, resulting in a significant influence on the structure of the protein. This change leads to a steric effect in the amino acid side chain, altering interactions with nearby nonpolar residues. In particular, when Phe647 is mutated to Leu, the distance between the side chains of the three nearest residues (less than 5 Å in the wild type), Ile457, Cys476 and Leu651, is significantly increased. For the pre-powerstroke state conformation, the average distance between residue 647 and these amino acids is 4.98 Å (Phe) and 8.42 Å (Leu). For the Pi release state conformation, the average is 3.95 Å (Phe) and 7.64 Å (Leu). Thus, in both the pre-powerstroke and Pi release conformations, the average distance was almost twofold increased in the Leu647 variant (distances are detailed in Fig. 3D). In addition, model analysis shows that the alpha helix containing the Phe647 residue is adjacent to the actin binding region (Fig. 3C, in green β-strandand). Hence, it is possible to infer that alterations in the surrounding areas may affect the structure and/or function of the actin binding site.

Discussion
Genotype-phenotype characterization in HL patients is not a straightforward endeavor. Nonequivocal genotypic information is crucial for the clinical care and genetic counseling of HI patients. Gene variants leading to frameshift and nonsense mutations or affecting canonical splice sites most likely lead to a null translation of the mutated allele. However, predicting the effect of DNA substitutions leading to missense mutations is far more complicated. These can lead to a myriad of effects that end in protein malfunction, including altered stability, nonfunctional protein domains and lack of catalytic function. Compiled by field experts gathered in the Hearing Loss Expert Panel, genotype validation is accelerating. In the present work, we report new gene variants in a cohort of Argentinian HI patients. These were curated following the ACMG and Hearing Loss Expert Panel guidelines. Moreover, the pathogenicity of some of the variants was validated through in silico protein analysis. In addition, the pathogenicity of previously reported variants in the ClinVar database was reclassified. The present work adds to the standardization of HI variant interpretation as a crucial step to provide consistent and accurate diagnoses for families and professionals involved, as well as for a better understanding of the mechanisms underlying disease pathogenesis.
Overall, 27 different mutations were identified in 16 hearing loss genes in 20 out of the 32 patients studied. The rate of genetic diagnosis was 63%, significantly higher than the 36% standard of care (GJB2/6 sequencing only), previously reported in our laboratory 8,19,20 . There was a significant diversity in the overall diagnostic rate. In patients with a family history, the diagnostic rate was 50% (8/16), while in isolated cases, 69% were diagnosed (11/16). This bias can be attributed to the fact that the isolated patients who were diagnosed had mostly syndromic forms, where the genetic target to be studied is more enclosed and has a higher diagnostic success rate. The curation procedure was effective since eleven of the sixteen (69%) reported variants changed or refined their category previously reported in ClinVar. In this regard, rigorous variant manual curation demonstrates its importance in accurate variant interpretation and hence precise genetic counseling to patients. These outcomes are in accordance with our previous report 19 .
The two sisters of Case #11 with nonsyndromic hearing loss exhibited two variants in CDH23, a gene that encodes a putative cell adhesion protein with multiple cadherin-like domains. CDH23 is responsible for both Usher syndrome and DFNB12 nonsyndromic deafness 1 . It has been established that nonsense, splice-site, or frameshift mutations in CHD23 are related to Usher 1D, whereas missense mutations are related to a milder phenotype (nonsyndromic HL, DFNB12). Moreover, concerning the type of hearing loss, previous genotype-phenotype reports showed that the majority of patients have some residual hearing at lower frequencies and a www.nature.com/scientificreports/ characteristic high frequency hearing loss sloping pattern. This is in accordance with the audiogram of our studied siblings 21,22 . After the curation process, variant c.1515-12G > A in CDH23, applying criteria PM2_Supporting, PP3, PM3, PP4 and PP1_Supporting, changed its previous classification in ClinVar from VUS to Likely Pathogenic. Further functional analysis, as well as new reports in Usher patients would provide confident evidence concerning its pathogenicity. Variant c.1096G > A; p.(Ala366Thr) was classified as benign due to its high population frequency in the Ashkenazi population (BA1 criteria applied). Hidden variants in gene regions that we did not explore, such as deep intronic mutations that can disrupt transcription regulatory motifs and/or noncoding RNA regions, might underlie the observed phenotype 22,23 . Taking into account all the evidence presented we cannot unequivocally conclude that these two variants are the cause of the hearing loss in this family. Nevertheless, due to the reclassification of c.1515-12G > A to likely pathogenic, the case is worth mentioning and discussed. In Case #10 a nonsense variant was detected in MITF. Most mutations in MITF have been mostly associated with Waardenburg syndrome type 2, a dominant syndromic form of hearing loss. It is associated with hypopigmentation of the skin, hair and eyes, since MITF has a regulatory effect on TYR transcription, a key enzyme involved in melanin synthesis 24 . The variant identified in our patients had already been reported in a Waardenburg type 2 family case 15 . Notably, none of the four affected members of this family presented any other signs in addition to profound HL. This finding is in accordance with some reports in which variants in the MITF gene cause only hearing loss 25,26 . The variable phenotype expression could be explained by the presence of modifier genes, as well as interactions with environmental factors 27,28 .
Several bioinformatic algorithms have been developed to predict the functional consequences of single nucleotide variants in protein coding regions. These in silico approaches are an alternative to tedious and timeconsuming experimental approaches to infer pathogenicity. Thus, the new model presented in the present work for the LeuRS human protein aided in defining the plausible pathogenicity of the identified p.Thr629Met variant in Case #9. Moreover, it can be further used to predict the pathogenicity of other reported variants. The p.Thr629Met mutation in the patient lies in the beta hairpin of LeuRS between strands I-I of the LS domains and leads to the disruption of the motif and protein stability. This is in accordance with the observation that beta hairpin motifs are implicated in protein stability 29,30 . Moreover, the substitution of p.Ala508 in Escherichia coli LeuRS (analogous to human LeuRS p.Thr629) with a nonpolar methionine disrupts the structure and/or www.nature.com/scientificreports/ position of the leucine-specific domain and thus shifts the location of the KMSKS loop and reduces the catalytic efficiency 16,31 . It should be noted that the present model of human LeuRS is an improved version of those previously reported 16,32 since we used a variety of selected crystals. In this regard, the present model shows that most of the variants in the catalytic domain produce a severe effect on protein stability, affecting the proper function of the protein. Our new model shows that variants reported as pathogenic in databases effectively induce significant structural changes of the protein that would cause potential functional changes, playing an important role in genotype/phenotype prediction. Thus, we propose that this model should be used for future in silico predictions of variant pathogenicity in patients with Perrault syndrome, becoming a bridge between genomics and structural data to guide the interpretation of human genetic variants.
In Case #6, we identified a novel MYO6 variant c.1939T > C, p.(Phe647Leu) in a family with late-onset autosomal dominant nonsyndromic hearing loss. Most MYO6-known genetic variants present progressive hearing loss [33][34][35] and myosin VI is required for structural integrity and proper functioning of inner ear hair cells 18,36 . More than half of the known pathogenic mutations are located in the motor head domain, indicating that it plays an important role in the function of MYO6 34 . The motor-head domain modeling approach applied in the present work reveals that mutation p.(Phe647Leu) alters the proper function, most likely through the increased distance of this amino acid from the three nearby nonpolar residues. Moreover, the structural change caused by the mutation in the alpha-helix of this highly motile region could potentially affect the contiguous actin binding region. In this regard, mutations in the MYO6 motor domain alter anchoring of the membrane of stereocilia to actin filaments, leading to disruption of hair bundle organization 36,37 .
In conclusion, the present work highlights the importance of the curation of genetic variants leading to HL following recommendations of experts for the correct phenotype-genotype correlation. Moreover, we show the importance of the incorporation of integrated workflows for predicting the biomedical impact of the variations identified by exome analysis. Most importantly, we propose a multitarget approach including genomics, protein structure and data analysis to guide the interpretation and standardization of human genetic variants leading to hearing loss.

Materials and methods
Subjects and selection criteria. Thirty-two unrelated Argentinean families were included in this study ( Fig. 1 and Supplementary Figure S1). Hearing loss was bilateral and moderate to severe (45-95 dB) or profound (> 95 dB), and onset was either congenital or postlingual progressive (average 20-40 years). Audiological evaluation included pure-tone audiometry at four frequencies (0.5, 1, 2, and 4 kHz). The pure-tone average (PTA) was calculated from the audiometric thresholds. The HL patients were divided into three groups based on severity: moderate (41-70 dB HL), severe (71-95 dB HL), and profound (> 95 dB HL). The audiometric configurations were classified into low-frequency, middle-frequency (U-shaped), high-frequency and flat types 38 . Patients underwent auditory brainstem response (ABR), tympanometry, fundus examination, and cardiac and renal ultrasonography to detect undiagnosed syndromic forms. Clinical examinations revealed symptoms suggesting a syndromic form of deafness in five patients: two with visual defects, two others with hematuria, and one with hair pigment abnormalities. All data were reviewed by a clinical geneticist. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of "Administración Nacional de Laboratorios e Institutos de Salud" (ANLIS-19122018) and "Fundación para la Lucha contra las Enfermedades Neurológicas de la Infancia" (FLENI -04,092,020). Written informed consent for testing and publication was obtained from patients or parents in the case of minors.
Genomic DNA was extracted from peripheral blood samples using the standard method. Quality and concentration were measured by agarose gel electrophoresis and absorbance-based nucleic acid quantification (Thermo Scientific-NanoDrop™). A total of 695 patients with different forms of deafness were recruited for GJB2/GJB6 mutation screening, identifying biallelic pathogenic mutations in 103 of them (15%). Thirty-two patients undiagnosed for GJB2/GJB6 mutations were selected for WES screening. The age of the patients varied between 6 months and 50 years. Seventeen out of the 32 patients (53%) were females, and fifteen (47%) were males. Among the 32 probands, 50% (16/32) were sporadic and 50% (16/32) had at least two affected relatives with HL (familial cases).
Whole exome sequencing and bioinformatics analysis. Massive parallel sequencing was carried out on an Illumina NovaSeq6000. Base calling, read mapping and annotation of variants were performed by Macrogen Genome Sequencing Services (Macrogen, Korea) and data were processed according to the Genome-Analysis-Toolkit (GATK) best practices workflow. Variants for 183 deafness-related genes were filtered from the WES on samples from affected patients and relatives when available (analyzed genes are listed in Supplementary Table S1). The average read length covered 148 bp, with an average exon depth coverage in analyzed genes of 100X; 97% of targeted reads had > tenfold coverage. Aligned reads were compared to the human reference genome (GRCh37/hg19). After variant annotation, mutations that arose from known deafness genes were selected by filtering with an in silico panel using a homemade Python script pipeline. The missense, nonsense, insertion/deletion and splicing variants were selected from the detected variants. The minor allele frequency threshold (MAF) considered was ≤ 0.01 and 0.005 for recessive and dominant alleles, respectively, when compared with those reported in gnomAD-http:// gnomad. broad insti tute. org/) and 1000 genomes (https:// www. inter natio nalge nome. org/ 1000-genom es-brows ers/). The pathogenic potential of selected variants was analyzed using bioinformatic programs and databases: Mutation Taster-http:// www. mutat ionta setr. org 39  www.nature.com/scientificreports/ dbNSFP-https:// sites. google. com/ site/ jpopg en/ dbNSFP. All information was compiled and criteria rules were combined to reach a variant classification according to data retrieved from InterVar -http:// winte rvar. wglab. org/, Varsome-https:// varso me. com/ and REVEL-https:// sites. google. com/ site/ revel genom ics 42 . Once annotation of variants was performed, the number of significant variations was reduced from more than eighty thousand to a small, manageable number (1 to 10) of putative candidate disease-causative mutations for further validation. This process included various parameters, such as mode of inheritance, mutation localization, mutation type (nonsynonymous variants, splice acceptor or donor site mutations, and coding noninframe In/Dels), frequency, pathogenicity of variants, published reports, and so forth. Pathogenicity prediction and variant classification were assessed taking into account criteria defined by the ACMG and further modifications, as well as recommendations of the HL-EP and data from deafness variation databases (http:// deafn essva riati ondat abase. org/) [43][44][45] . Variants were classified as pathogenic, likely pathogenic, VUS, likely benign or benign. Patients were informed of the results, and clinical follow-up was recommended when necessary, since some likely pathogenic variants were identified in syndromic genes, before the onset of additional clinical manifestations. Positive results were always confirmed by Sanger sequencing prior to reporting, and segregation through family was performed when possible.
Variant curation. To further analyze and validate the identified variants in patients, a combined in silico analysis was performed. In the first curation step, the prediction of pathogenicity of the variants previously reported in ClinVar was reanalyzed following the guidelines mentioned before 43 , collecting new data regarding segregation, new PubMed reports or publications and new updated data of frequency and phenotype/genotype correlations. The putative pathogenic effect of novel variants was manually scored using Varsome 46 .

Bioinformatic analysis.
To predict the potential consequences of some of the missense variants on protein function, different bioinformatic analyses were performed. These included a molecular homology modeling (MHM) approach and structural analysis of the mutated proteins. The line protein graphs in figures were generated with the PyGame library in the Python 2.7 programming language (http:// www. python. org) 47 .
Molecular modeling and structural analysis. MHM was performed for mitochondrial (LeuRS) (UniProt_ID Q15031-1) and the MYO6 motor-head domain (UniProt_ID Q9UM54-3, between amino acid residues 4 and 771). The evaluation criteria to select the template for the MHM were as follows: (1) protein sequence identity as template, (2) X-ray crystal resolution less than 2.4 Å, (3) crystallography conditions, (4) chain length and amount of residues of each template with respect to sequence identity and gaps, (5) structural conformation of each protein template, and (6) species of the template. The alignment between template and target sequences was performed with structural alignments (Expresso extension) in the T-Coffee web server 48,49 , MODELLER alignment scripts and Kluskal-Wallis in MEGA5 software, taking into account the secondary structures and topology of the regions 50 . Homology modeling was generated using the MODELLER program 51 . The models were first optimized with the variable target function method with conjugate gradients and then refined using molecular dynamics with simulated annealing 52 . Model quality evaluation was performed using DOPE 53 , QMEAN/QME-ANDisCo 54,55 and ProSA 56 to predict local pairwise residue-residue distances to the assessed model. The UCSF Chimera program 57 and the backbone-dependent rotamer library were used for structural interpretation and visualization 58 .
For the LeuRS protein model, template structures were selected from the RCSB Protein Data Base 59 and Uni-Prot (https:// www. unipr ot. org/). Fifteen potential templates were considered using a cutoff in sequence identity larger than 36% and a resolution less than 2.5 Å. For the alignment between templates and the LeuRS sequence, the N-terminal mitochondrial targeting signals (presequences) were deleted using MitoFates 60 . Structural regions that did not have reference templates were not considered for the structural analysis and are not shown in the visualization of the structure. After evaluation, six structures in the editing conformation were selected (Supplementary Table S2). Thirty molecular homology models with the selected templates were generated, and the final structure of the tRNA was determined with the 4AS1 template.
The MYO6 protein structure is divided into three regions: an N-terminal motor-head-domain (which comprises the N-terminal SH3-like, myosin motor, actin binding site and ATP catalytic region), followed by a neck domain (with a calmodulin-binding linker domain and a single IQ motif), and a C-terminal tail region (with a three-helix bundle region, an SAH domain and a unique globular domain required for interaction with other proteins such as cargo-binding) 61 (Fig. 3A). The modeled N-terminal motor (head) contains the N-terminal SH3-like (aa 4-53) and myosin motor (aa 57-771) domains with 98.15% identity between the target sequence and templates and a resolution of less than 2.5 Å. Modeling was performed in two MYO6 conformations according to ATPase cycle states: the pre-powerstroke, which is the key force-generating step (PDBID: 2V26, 4DBR, 4E7Z), and the next state with the release of phosphate, Pi release (PDBID: 4PFP, 4PFO, 4PJN) 62 (Supplementary  Table S3).
Stability analysis. To evaluate the effect of mutations on the stability/folding of LeuRS, we first analyzed the structural and energetic details of the interactions for each mutated residue using FoldX (http:// foldx suite. crg. eu/). 63 . Structures were optimized to the FoldX force field command from the molecular homology models. The ∆∆G values were estimated as the difference between the energy of the wild-type protein and the average of five replicas for each point mutation 64 . Values above 1,6 kcal/mol (twice the standard deviation) were considered to significantly destabilize the protein. To favor the wild-type conformation, all residues involved in tRNA interactions (5 Å distance) were fixed when optimizing the structures to the FoldX force field.