Introduction

The Sultanate of Oman is the third largest country in the Arabian Peninsula. Its population of ~2.3 million is predominantly tribal and has a rate of consanguinity estimated at between 30% (ref. 1) and 55%,2 one of the highest in the world. In particular, the rate of first-cousin marriages is 39%, and this rate is likely to remain stable or to decline only slowly in the future.3

The most widely recognized medical implication of consanguinity is the increased risk of autosomal recessive conditions in the offspring of closely related couples.4 In highly inbred populations, evolutionary forces such as genetic drift and founder effects restrict the mutational spectrum of a given disease gene to a few mutations.5, 6 Therefore, in a tribal and genetically isolated population such as that of Oman, the same mutation is likely to be found in most patients affected by a given autosomal recessive disease.

The prevalence of bilateral disabling hearing loss in Oman was estimated at 21/1000 in a comprehensive community-based cross-sectional study.7 Genetic hearing loss (GHL) was estimated to account for 2.69% of this total, corresponding to a GHL-specific prevalence of ~6/10 000.
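The GHL-specific figure follows directly from the two reported quantities: 2.69% of 21/1000 is about 5.6 per 10 000, which rounds to the ~6/10 000 quoted above. A minimal Python check of that arithmetic, using only the numbers given in the text:

```python
# Reported figures: overall prevalence and the fraction attributed to GHL.
total_prevalence = 21 / 1000   # bilateral disabling hearing loss, per person
ghl_fraction = 0.0269          # 2.69% of cases attributed to GHL

ghl_prevalence = total_prevalence * ghl_fraction
ghl_per_10000 = ghl_prevalence * 10_000

print(f"GHL prevalence: {ghl_per_10000:.2f} per 10 000")  # 5.65 per 10 000
```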

A higher rate of consanguinity was observed among the parents of Omani children with severe to profound bilateral sensorineural hearing loss,8 suggesting a major role for autosomal recessive forms. Worldwide, the most common genetic causes of GHL are mutations in GJB2 (MIM 121011), the gene encoding the gap junction protein connexin 26, which are associated with autosomal dominant or recessive forms of deafness. As in other recently reviewed Middle Eastern populations,9 GJB2 mutations were reported to be absent in Oman.10 Only recently did Rajab et al.11 compile 300 mutations found in Omani populations, revealing the presence of three different GJB2 mutations.

Because of the extreme genetic heterogeneity underlying a stereotyped phenotype, the molecular genetic diagnosis of GHL is particularly challenging. Biallelic mutations in >40 different genes have been observed in autosomal recessive forms, accounting for >50% of the families with this type of deafness in different populations. Most of these mutations have been identified in countries with consanguinity rates as high as Oman's, such as Pakistan, Tunisia, Iran, Palestine, India and Turkey.12 Most are private mutations reported in only a few families, and their worldwide distribution is largely unknown; only a few recurrent mutations, due either to a founder effect or to multiple independent origins, have been detected in distinct populations.12

Targeted DNA enrichment and massively parallel sequencing, ranging from candidate-gene panels to whole-exome sequencing (WES), now enable the simultaneous evaluation of hundreds to thousands of genes in a single individual, making them an ideal test for genetically heterogeneous disorders.13 These large-scale, rapid and increasingly cost-effective approaches reduce the effort needed to carry out sequencing studies and make it possible to characterize populations that, for historical, geographical or other reasons, have not yet benefited from in-depth analysis of their genetic disorders.

Materials and methods

Patients

Twenty-six consanguineous families from North Oman (degree of parental relatedness: first cousins), each with one to three siblings, were enrolled in this study after written informed consent was obtained from the patients or their legal guardians, in accordance with the regulations of the local Ethics Committee. Ninety-seven DNA samples were collected from probands, affected and unaffected siblings, and parents.

Audiological evaluation was carried out at the ENT Department of Al Nahdha Hospital in Muscat (Oman) by pure-tone audiometry, brainstem evoked response audiometry (diagnostic, screening and threshold), otoacoustic emissions and acoustic immittance tests. On the basis of these results, patients were advised to wear hearing aids or underwent cochlear implantation. All patients had a full ENT examination and showed various degrees of bilateral sensorineural deafness, with no symptoms or malformations suggesting a syndromic form of deafness at clinical examination. Family history was consistent with an autosomal recessive mode of inheritance of GHL in all cases.

Whole-exome sequencing

Whole-exome DNA from the patients' whole blood was captured using the TruSeq exome enrichment kit (Illumina Inc., San Diego, CA, USA; three samples) and sequenced as 100-bp paired-end reads on the Illumina HiSeq2000 platform (Illumina Inc., San Diego, CA, USA). Reads were processed for variant calling as reported elsewhere.14 Non-synonymous single-nucleotide variants, splice-site substitutions and small insertions/deletions (InDels) were considered candidate mutations only if they were rare (absent from the ExAC database or with an allele frequency <1% in it) and were not found in the homozygous state in our in-house database of non-deafness samples.
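The candidate-variant filter described above can be sketched as a simple predicate. This is an illustrative sketch, not the authors' pipeline: the consequence labels, the `Variant` record and the variant IDs are hypothetical, and only the thresholds (ExAC allele frequency <1%, never homozygous in in-house non-deafness samples) come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical consequence labels; real annotation tools use their own vocabularies.
CANDIDATE_CONSEQUENCES = {"missense", "stop_gained", "stop_lost", "splice_site", "indel"}
MAX_EXAC_AF = 0.01  # "absent or with an allele frequency <1% in ExAC"

@dataclass
class Variant:
    variant_id: str
    consequence: str
    exac_af: Optional[float]       # None if the variant is absent from ExAC
    homozygous_in_inhouse: bool    # observed homozygous in non-deafness samples

def is_candidate(v: Variant) -> bool:
    """Keep rare, protein-altering variants never seen homozygous in controls."""
    if v.consequence not in CANDIDATE_CONSEQUENCES:
        return False
    is_rare = v.exac_af is None or v.exac_af < MAX_EXAC_AF
    return is_rare and not v.homozygous_in_inhouse

# Toy variants (hypothetical IDs and annotations).
variants: List[Variant] = [
    Variant("var1", "missense", None, False),      # absent from ExAC: kept
    Variant("var2", "synonymous", None, False),    # synonymous: dropped
    Variant("var3", "indel", 0.02, False),         # too common: dropped
    Variant("var4", "splice_site", 0.001, True),   # homozygous in controls: dropped
]
candidates = [v.variant_id for v in variants if is_candidate(v)]
print(candidates)  # ['var1']
```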

H3M2 (ref. 15) was used to identify Runs Of Homozygosity (ROH) from the WES alignments of the three siblings.
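H3M2 infers ROH from WES data with a heterogeneous hidden Markov model; the sketch below does not reproduce it. To show what an ROH call represents, it swaps in a much simpler rule: a run of consecutive homozygous sites, with minimum site count and length, is reported as an ROH. Positions, calls and thresholds are hypothetical.

```python
from typing import List, Tuple

def find_roh(positions: List[int], is_hom: List[bool],
             min_snps: int = 3, min_length_bp: int = 1_000) -> List[Tuple[int, int]]:
    """Report runs of consecutive homozygous sites as (start, end) positions.

    Any heterozygous site ends the current run; runs shorter than min_snps
    sites or min_length_bp base pairs are discarded.
    """
    runs: List[Tuple[int, int]] = []
    run_start = None
    for i, hom in enumerate(is_hom + [False]):  # sentinel closes the last run
        if hom and run_start is None:
            run_start = i
        elif not hom and run_start is not None:
            start, end = positions[run_start], positions[i - 1]
            if i - run_start >= min_snps and end - start >= min_length_bp:
                runs.append((start, end))
            run_start = None
    return runs

# Toy data: hypothetical positions (bp) and homozygosity calls.
positions = [100, 900, 1500, 2600, 3000, 3100, 3300, 5000]
is_hom = [True, True, True, True, False, True, True, True]
print(find_roh(positions, is_hom))  # [(100, 2600), (3100, 5000)]
```

A likelihood-based model such as H3M2 is preferable on real exome data, where genotyping errors and uneven site spacing would make this hard-threshold rule fragile.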

Duplication-specific PCR-based amplification assay

The duplication-specific forward primer was designed so that its 3′ end was specific to the MYO15A GCCATCT duplication (F 5′-ACGAGGCCATCTGCCATCTA, R 5′-TTGCTGCTCGAAGAAGGCG; see Supplementary Figure 1). Under these conditions, only DNA carrying the duplication is amplified, yielding a single band on agarose gel electrophoresis. PCR was carried out under standard conditions.
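The principle of the duplication-specific assay can be illustrated with a string comparison. The wild-type context below is an assumption: it is the forward primer with one copy of GCCATCT removed, padded with placeholder N bases for unknown flanking sequence, and it presumes the primer matches the coding strand. An exact substring test stands in for primer annealing, which in reality depends on hybridization conditions and tolerates some mismatches.

```python
DUP = "GCCATCT"                     # tandem-duplicated 7-mer
PRIMER_F = "ACGAGGCCATCTGCCATCTA"   # duplication-specific forward primer (5'->3')

# HYPOTHETICAL wild-type context: the primer with one copy of the 7-mer
# removed, padded with "N" placeholders for unknown flanking sequence.
wild_type = "NNNNN" + "ACGAG" + DUP + "A" + "NNNNN"

# Build the duplicated allele by inserting a second copy right after the first.
insert_at = wild_type.index(DUP) + len(DUP)
duplicated = wild_type[:insert_at] + DUP + wild_type[insert_at:]

def full_primer_match(template: str, primer: str) -> bool:
    """Exact string match only; real annealing also tolerates mismatches."""
    return primer in template

print(full_primer_match(wild_type, PRIMER_F))   # False
print(full_primer_match(duplicated, PRIMER_F))  # True
```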

SNP array genotyping

Single-nucleotide polymorphism (SNP) genotyping with the Omni Express 24 v.10 array by Illumina (716 503 SNPs) was used to extend ROH analysis to three unrelated affected subjects who had not undergone WES. In total, four unrelated subjects were genotyped, including one of the siblings in whom the duplication was initially identified. After quality control, the overall genotyping rate was 99.8%, and 685 443 autosomal SNPs were retained for the analysis. PLINK16 was used to identify ROH from the SNP data.
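PLINK's ROH routine is based on scanning fixed-size windows of consecutive SNPs and flagging SNPs covered by a sufficient proportion of homozygous windows. The toy Python sketch below illustrates that window-scanning idea only; the parameter names and values are illustrative assumptions, not PLINK's defaults, not its implementation, and not the settings used in this study.

```python
from typing import List, Tuple

def window_scan_roh(positions: List[int], is_hom: List[bool], window: int = 5,
                    max_het: int = 0, threshold: float = 0.05,
                    min_snps: int = 5) -> List[Tuple[int, int]]:
    """Toy sliding-window ROH scan (illustrative, not PLINK's code or defaults)."""
    n = len(is_hom)
    hom_windows = [0] * n   # homozygous windows overlapping each SNP
    all_windows = [0] * n   # all windows overlapping each SNP
    for start in range(n - window + 1):
        n_het = sum(not h for h in is_hom[start:start + window])
        window_is_hom = n_het <= max_het
        for j in range(start, start + window):
            all_windows[j] += 1
            hom_windows[j] += int(window_is_hom)
    # Flag a SNP when enough of the windows covering it are homozygous.
    flagged = [a > 0 and h / a >= threshold for h, a in zip(hom_windows, all_windows)]

    segments: List[Tuple[int, int]] = []
    run_start = None
    for i, f in enumerate(flagged + [False]):  # sentinel closes the last run
        if f and run_start is None:
            run_start = i
        elif not f and run_start is not None:
            if i - run_start >= min_snps:
                segments.append((positions[run_start], positions[i - 1]))
            run_start = None
    return segments

# Toy data: 8 homozygous SNPs followed by alternating het/hom calls.
positions = [i * 100 for i in range(12)]
is_hom = [True] * 8 + [False, True, False, True]
print(window_scan_roh(positions, is_hom))  # [(0, 700)]
```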

LD and haplotype analyses

SNP array data from the four affected individuals and whole-exome data from 2120 unrelated healthy individuals (2091 subjects from the 1000 Genomes Project, comprising 595 AFR, 504 EAS, 503 EUR and 489 SAS; 21 Omani subjects; 8 Qatari subjects) were used to investigate linkage disequilibrium (LD) patterns. Haplotype structure was reconstructed for the genomic region surrounding the MYO15A duplication using 46 SNPs that covered 528 kb upstream and 215 kb downstream of the duplication and were shared by the SNP array and WES data; the data were converted to PLINK format with VCFtools.17 LD blocks were identified with the Haploview software,18 and the 31 SNPs included in an approximately contiguous LD block (including the MYO15A duplication) were retained for haplotype phasing, which was performed with the Bayesian algorithm implemented in the PHASE software.19

Evolutionary relationships among inferred haplotypes that were present in the Omani sample, or that showed cumulative frequencies higher than 1% in the populations sequenced by the 1000 Genomes Project, were investigated by haplotype network reconstruction using the median-joining network algorithm20 implemented in the Network software (http://www.fluxus-engineering.com). The ancestral haplotype was inferred from the chimpanzee alleles at each of the 31 polymorphic sites considered.

A mutation rate per site per year (μ) of 0.76 × 10⁻⁹ for the investigated genomic region was obtained by dividing the total nucleotide divergence (Dxy = 0.0091) between the chimpanzee and healthy human samples representative of Omani, African (YRI, Yoruba), European (CEU, CEPH) and East Asian (CHS, Southern Han Chinese) populations by twice the divergence time between the species (6 million years). This rate was then used to compute rough estimates of the time to the most recent common ancestor (TMRCA) for the haplotypes of interest.
For this purpose, the rho statistic21 was calculated assuming one mutation every 6113 years in the investigated region, as obtained from the formula 1/(μ × L), where L is the length of the investigated genomic region in bp.
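The molecular-clock arithmetic above can be checked in a few lines. The sketch reproduces μ from Dxy and the divergence time, then back-solves the region length L implied by the reported 6113 years per mutation; that length is not stated in the text and is shown here only as a derived consistency check.

```python
D_XY = 0.0091               # human-chimpanzee nucleotide divergence (Dxy)
SPLIT_YEARS = 6e6           # human-chimpanzee divergence time, years
YEARS_PER_MUTATION = 6113   # reported waiting time for one mutation in the region

# Divergence accumulates along both lineages, hence the factor of 2.
mu = D_XY / (2 * SPLIT_YEARS)   # substitutions per site per year

# Back-solve 1 / (mu * L) = 6113 years for the implied region length L (bp).
implied_length_bp = 1 / (mu * YEARS_PER_MUTATION)

print(f"mu = {mu:.2e} per site per year")   # mu = 7.58e-10 per site per year
print(f"implied length = {implied_length_bp:,.0f} bp")
```

The implied L is roughly 2.2 × 10⁵ bp.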

Results

Identification of a MYO15A founder duplication

First, we performed WES on the three affected siblings of one family (family 041; Figure 1). WES yielded a mean coverage of 112× over the targeted exome, with 83% of exonic positions covered at >20× on average. Filtering for rare, potentially damaging homozygous variants shared by the three siblings (absent from ExAC or with an allele frequency <1%, and not homozygous in our in-house database of non-deafness samples) left a single variant: a novel homozygous 7-bp frameshift duplication, c.1171_1177dupGCCATCT (p.Y393Cfs*41; Figure 1; Supplementary Material 1), in MYO15A (OMIM 602666), a gene associated with autosomal recessive deafness 3 (DFNB3, MIM 600316).
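Why a 7-bp duplication starting at c.1171 first alters codon 393 can be shown by translating a short local fragment. In this sketch, c.1171 is the first base of codon 391; bases c.1171 to c.1178 (GCCATCTA) follow from the variant itself and from the forward primer sequence, assuming the primer matches the coding strand, while c.1179 is a hypothetical placeholder (TAT and TAC both encode Tyr). The downstream stop (*41) cannot be reproduced without the full sequence.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

def translate(seq: str) -> str:
    """Translate all complete codons of an in-frame sequence."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def apply_dup(seq: str, seq_start: int, dup_start: int, dup_end: int) -> str:
    """Apply c.dup_start_dup_end dup to seq, whose first base is c.seq_start."""
    i, j = dup_start - seq_start, dup_end - seq_start + 1
    return seq[:j] + seq[i:j] + seq[j:]

# Local coding fragment starting at c.1171, the first base of codon 391.
# c.1171-1178 (GCCATCTA) follow from the variant and primer sequences;
# c.1179 = "T" is a HYPOTHETICAL placeholder (TAT and TAC both encode Tyr).
SEQ_START = 1171
wild_type = "GCCATCTAT"
mutant = apply_dup(wild_type, SEQ_START, 1171, 1177)

wt_protein, mut_protein = translate(wild_type), translate(mutant)
first_diff = next(k for k, (a, b) in enumerate(zip(wt_protein, mut_protein)) if a != b)
first_codon = (SEQ_START - 1) // 3 + 1 + first_diff

print(wt_protein, mut_protein)                    # AIY AICHL
print(f"first altered codon: {first_codon}")      # first altered codon: 393
print("frameshift:", (1177 - 1171 + 1) % 3 != 0)  # frameshift: True
```

Codon 393 changes from TAT (Tyr) to TGC (Cys), matching the p.Y393C part of the annotation, and the 7-bp insertion (not a multiple of 3) shifts the reading frame from that point on.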

Figure 1

Segregation of the MYO15A duplication in family 041, location of the duplication in the MYO15A sequence, and the surrounding ROH on chromosome 17. From the top: the MYO15A c.1171_1177dupGCCATCT duplication segregates with the disease in the nuclear pedigree (parental degree of consanguinity: first cousins); one of the three electropherograms showing the homozygous duplication in the three siblings (only one is shown); the site of insertion of the tandem-duplicated sequence in the wild-type MYO15A nucleotide sequence; the H3M2 plot showing ROHs on chromosome 17 of one sibling of the first family (041–112), in which the magnification of the region surrounding the MYO15A duplication highlights the ROH around MYO15A. ROH, Runs Of Homozygosity. A full color version of this figure is available at the Journal of Human Genetics online.

Sanger sequencing confirmed the duplication in the heterozygous state in the parents and in the homozygous state in the affected offspring. ROH analysis with H3M2 detected an 849 kb ROH surrounding MYO15A in the three affected siblings (Figure 2). Given its relatively short length, this haplotype was likely to originate from background population relatedness rather than from recent parental relatedness.22 This finding raised the question of whether the same duplication could be present in other, apparently unrelated, Omani GHL patients. Screening of the remaining 25 families identified the same homozygous duplication segregating in seven families, for a total of 8/26 families (31%). All homozygous carriers of the duplication were diagnosed with bilateral profound prelingual hearing loss (Table 1) within the first year of life, and all required cochlear implants. In the sibling who was also genotyped on the SNP array, the ROH surrounding MYO15A detected from SNP array data was larger than that detected from WES data (Figure 2). This is likely because SNP density in this region was higher in the WES data than in the SNP array data (data not shown), making WES data more informative for distinguishing homozygous from heterozygous SNPs here. We therefore took the region detected by WES as the minimal region shared by the three siblings. Extending the ROH analysis to three other unrelated subjects revealed ROHs of different lengths surrounding MYO15A, all of which showed an exact allelic match with the 849 kb interval initially detected in the first family, indicating a founder haplotype shared by all the c.1171_1177dupGCCATCT homozygous subjects (Figure 2).
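The founder-haplotype argument rests on a simple comparison across the shared interval: every subject must be homozygous at every site, and all subjects must carry the same allele at each site. A minimal sketch of that check, with hypothetical subject IDs and genotypes:

```python
from typing import Dict, List

def exact_allelic_match(genotypes: Dict[str, List[str]]) -> bool:
    """True if every subject is homozygous at every site of the interval and
    all subjects carry the same allele at each site."""
    haplotypes = set()
    for calls in genotypes.values():
        if any(call[0] != call[1] for call in calls):
            return False  # a heterozygous call breaks the shared run
        haplotypes.add("".join(call[0] for call in calls))
    return len(haplotypes) == 1

# Hypothetical genotypes at four SNPs inside a shared interval.
founder = {"S1": ["AA", "CC", "GG", "TT"],
           "S2": ["AA", "CC", "GG", "TT"],
           "S3": ["AA", "CC", "GG", "TT"]}
print(exact_allelic_match(founder))                                      # True
print(exact_allelic_match({**founder, "S4": ["AA", "CT", "GG", "TT"]}))  # False
print(exact_allelic_match({**founder, "S5": ["AA", "TT", "GG", "TT"]}))  # False
```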

Figure 2

ROH analysis reveals a founder haplotype harboring the MYO15A duplication. Red horizontal bars: ROH initially identified by H3M2 from WES data in the three siblings of the first family. Green horizontal bars: ROH identified by PLINK from SNP array data in four unrelated patients, including one sibling of the first family (041–112). WES data were more informative than SNP array data and thus enabled us to narrow the ROH in subject 041–112 from 1.4 Mb to 849 kb. * denotes the same individual. Orange horizontal bar: haplotype h193, identified by the presence of the MYO15A c.1171_1177dupGCCATCT duplication, displays an exact allelic match across the four unrelated subjects and is therefore identified as a founder haplotype. ROH, Runs Of Homozygosity; SNP, single-nucleotide polymorphism; WES, whole-exome sequencing. A full color version of this figure is available at the Journal of Human Genetics online.

Table 1 Clinical audiometric data of patients with the MYO15A duplication

We then screened 284 subjects from the Omani general population (568 chromosomes) for the duplication using a duplication-specific PCR-based amplification assay that discriminates between mutated and non-mutated alleles by specifically amplifying the duplicated sequence (Supplementary Figure 1). We identified two heterozygous individuals, whose genotypes were confirmed by Sanger sequencing, corresponding to a carrier frequency of 0.7% (2/284) and an allele frequency of 0.35% (2/568).
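The two frequencies differ only in the denominator: carriers are counted per person, alleles per chromosome. A minimal sketch, assuming diploid subjects and no homozygotes in the screened sample (as observed here):

```python
def carrier_and_allele_frequency(heterozygotes: int, subjects: int):
    """Carrier (per person) and allele (per chromosome) frequencies for a
    diploid sample in which no homozygotes were observed."""
    carrier_freq = heterozygotes / subjects
    allele_freq = heterozygotes / (2 * subjects)
    return carrier_freq, allele_freq

carrier_freq, allele_freq = carrier_and_allele_frequency(2, 284)
print(f"carrier frequency {carrier_freq:.2%}, allele frequency {allele_freq:.2%}")
# carrier frequency 0.70%, allele frequency 0.35%
```

Applied to the 190 Northern Omani subjects alone, the same function gives a carrier frequency of 2/190 ≈ 1.05%.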

LD and haplotype structure at MYO15A surrounding genomic regions

LD patterns across the genomic region encompassing the 46 SNPs shared between the SNP array and WES data, as well as the MYO15A duplication, revealed multiple LD blocks with diverse overall LD levels, located both upstream and downstream of the duplication (Supplementary Figure 2). Among them, the blocks mapping downstream of the MYO15A duplication were consecutive and showed considerably higher LD levels. Accordingly, only the 31 SNPs forming this approximately contiguous LD block were retained for haplotype phasing. Haplotype reconstruction identified a substantially larger number of haplotypes (193; Supplementary Figure 3) than haplotype-constituting variants (31), highlighting the considerable recombination that has occurred within the chromosomal region under investigation. As expected, the haplotype carrying the causal duplication (h193) was private to Omani patients.
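LD blocks in Haploview are defined from pairwise LD statistics between SNPs. The sketch below computes only the two standard pairwise measures, |D′| and r², from phased haplotypes of two biallelic SNPs; it does not implement any block-partitioning algorithm, and the toy haplotypes are hypothetical.

```python
from typing import List, Tuple

def pairwise_ld(haplotypes: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair of alleles (SNP1, SNP2) coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # freq of allele 1 at SNP1
    p_b = sum(b for _, b in haplotypes) / n  # freq of allele 1 at SNP2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                     # coefficient of LD
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2

# Toy phased haplotypes: complete LD, then D' = 1 with r^2 = 1/3.
print(pairwise_ld([(1, 1), (1, 1), (0, 0), (0, 0)]))  # (1.0, 1.0)
print(pairwise_ld([(1, 1), (1, 0), (0, 0), (0, 0)]))
```

The second example illustrates why the two measures can disagree: D′ = 1 says no recombinant haplotype is observed, while the lower r² reflects unequal allele frequencies at the two SNPs.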

The four cosmopolitan haplotypes most frequent in the whole data set were h147 (24%), h25 (14%), h3 (11%) and h182 (8%) (Figure 3); together they accounted for 55% of the Omani 'healthy' chromosomes. These haplotypes were found in all the examined human groups and showed the highest cumulative frequency when the 1000 Genomes Project populations were considered as a whole. Overall, the frequency patterns of these haplotypes in the 'healthy' Omani samples were most comparable to those observed in populations of European ancestry. The remaining 19 haplotypes found in the Omani group were either private or shared with only one or two populations (mainly African or Qatari samples). Of these low-frequency haplotypes, 12 were private singletons of the Omani sample (that is, they were observed only once). Haplotype network reconstruction of the evolutionary relationships among the inferred haplotypes revealed a topology with at least three distinct haplotype clusters. The first clade emerged directly from the ancestral haplotype and was mainly represented in African populations, with the sole exception of the high-frequency h25 haplotype, which showed a worldwide distribution. The other two clades were composed of haplotypes spread across all the considered populations. One contained two of the most frequent haplotypes discussed above, h182 and h147, pointing to their close phylogenetic relationship. The third cluster contained haplotype h3, from which the haplotype carrying the duplication (h193) branched out. Haplotype h3 was the third most frequent allelic combination in the whole data set and the second most represented in the Omani 'healthy' sample, with frequencies of ~20% in Qatar and Europe, 17% in Oman, 14% in South Asia, 11% in Africa and 0.3% in East Asia.
Rough TMRCA estimates were computed for haplotypes h3 and h193: ~73 766 and ~74 056 years, respectively, would be needed to accumulate enough variation in the examined genomic region to differentiate them from h122, the human haplotype exclusive to African populations and closest to the ancestral haplotype observed in the chimpanzee. A comparable TMRCA (~91 500 years) was obtained when the other four common cosmopolitan haplotypes (h147, h25, h182 and h171, with a cumulative frequency including h3 of ~66%) were also considered. Unfortunately, because GHL patients cannot be considered representative of the overall Omani population (that is, they were not randomly selected from it), we could not apply the same approach to date directly when the pathological haplotype (h193) originated from the common h3 haplotype. Nevertheless, h3 and h193 differ only by the causal duplication, which suggests that the other h193 variants do not modulate the pathological phenotype; the difference between their TMRCAs (74 056 − 73 766 = 290 years) can therefore be interpreted as an approximation of the age of the pathological variant. Accordingly, the MYO15A duplication seems to have been introduced into the Omani gene pool within the past two to three centuries.
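The rho statistic is the average number of mutational differences between the sampled haplotypes and a root haplotype, and multiplying it by the waiting time per mutation converts it into years. A minimal sketch using the 6113-year figure from Materials and methods; the 4-site haplotypes and counts are hypothetical, and the standard error of rho is omitted.

```python
from typing import Dict

YEARS_PER_MUTATION = 6113  # from Materials and methods

def rho(haplotype_counts: Dict[str, int], root: str) -> float:
    """Rho statistic: average number of differences between the sampled
    haplotypes and the root haplotype, weighted by haplotype counts."""
    total = sum(haplotype_counts.values())
    differences = sum(count * sum(a != b for a, b in zip(hap, root))
                      for hap, count in haplotype_counts.items())
    return differences / total

def tmrca_years(rho_value: float) -> float:
    """Convert rho (mutations) into years using the per-region mutation clock."""
    return rho_value * YEARS_PER_MUTATION

# Hypothetical 4-site haplotypes (0 = ancestral allele) and their counts.
sample = {"0011": 3, "0111": 1}
r = rho(sample, root="0000")
print(r, tmrca_years(r))  # 2.25 13754.25

# Dating argument used in the text: difference between the reported TMRCAs.
age_of_duplication = 74_056 - 73_766
print(age_of_duplication)  # 290
```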

Figure 3

Geographical map of the four cosmopolitan haplotypes most frequent in the whole data set. h147, h25, h3 and h182 are the most frequent haplotypes across all populations in the data set; h193 branched out from h3 within the past two to three centuries. A full color version of this figure is available at the Journal of Human Genetics online.

Discussion

Mutations of MYO15A are a well-known cause of recessively inherited nonsyndromic deafness worldwide. As recently reviewed by Rehman et al.,23 a total of 192 recessive MYO15A variants have been associated with hearing loss; the authors classified 82 of them as pathogenic according to the following criteria: frameshift, nonsense and ±1 or ±2 splice-site variants with an allele frequency <0.5% in controls that are carried by deaf individuals in the homozygous state or in compound heterozygosity (if both likely pathogenic alleles have been described). Clinically, the only documented phenotype associated with MYO15A mutations is the combination of prelingual deafness and vestibular dysfunction,24 and the reported phenotypes have been severe to profound congenital hearing loss,25, 26, 27, 28, 29, 30, 31 except in a recent study providing the first evidence that patients with MYO15A mutations can also present a milder phenotype with postlingual onset and progressive hearing loss.32 Here we describe a novel MYO15A duplication that is pathogenic according to the Rehman et al. criteria and shows a straightforward genotype–phenotype correlation, as all affected individuals have congenital profound hearing loss (Table 1).
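The Rehman et al. criteria, as summarized above, translate into a single boolean rule. In this sketch the variant-type labels and the `MYO15AVariant` record are hypothetical; for the duplication described here, the control frequency used is the Omani general-population estimate reported in the Results (2/568).

```python
from dataclasses import dataclass

# Variant classes counted as pathogenic in the criteria summarized above;
# the labels themselves are hypothetical.
QUALIFYING_TYPES = {"frameshift", "nonsense", "splice_pm1", "splice_pm2"}
MAX_CONTROL_AF = 0.005  # allele frequency < 0.5% in controls

@dataclass
class MYO15AVariant:
    variant_type: str
    control_af: float
    homozygous_in_deaf: bool                   # seen homozygous in deaf individuals
    compound_het_with_described_allele: bool   # in trans with a described likely pathogenic allele

def meets_rehman_criteria(v: MYO15AVariant) -> bool:
    return (v.variant_type in QUALIFYING_TYPES
            and v.control_af < MAX_CONTROL_AF
            and (v.homozygous_in_deaf or v.compound_het_with_described_allele))

# c.1171_1177dupGCCATCT: frameshift, homozygous in deaf patients; the control
# frequency used here is the Omani general-population estimate (2/568).
dup = MYO15AVariant("frameshift", 2 / 568, True, False)
missense = MYO15AVariant("missense", 0.0, True, False)
print(meets_rehman_criteria(dup), meets_rehman_criteria(missense))  # True False
```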

Moreover, this duplication, found in 8 of the 26 families studied, emerges as the major cause of GHL in Oman, and the carrier frequency of 0.7% in a population sample indicates that it is not a rare allele in Oman. All the families enrolled in this study originate from North Oman and belong to three distinct Omani tribes. In contrast, the Omani subjects screened for the duplication have different geographical origins: 190 subjects come from Northern Oman (380 chromosomes) and 94 from Dhofar (188 chromosomes), a region in Southern Oman separated from the North by several hundred miles of desert, whose population is ethnically distinct from the coastal Arabs and probably related to the neighboring populations of Yemen.3, 33 The two heterozygous carriers of the MYO15A duplication identified with the duplication-specific assay both come from Northern Oman, giving a carrier frequency of 1.05% (2/190) in the region of origin of the families studied, whereas no carriers were found among the 94 subjects from Dhofar. However, the number of subjects tested is too small to conclude that the MYO15A duplication is absent from Southern Oman.

The founder MYO15A ROH haplotype (849 kb) is shorter than the ROHs usually surrounding homozygous mutations that cause Mendelian disorders in the offspring of consanguineous couples.34 ROHs of intermediate length (500 to 1500 kb) have been interpreted as the result of background population relatedness,22 originating from relatively recent but unknown kinship between the parents of affected offspring. To test this hypothesis, we combined SNP array and WES data to explore haplotype structure in the MYO15A genomic region and found that haplotype variation in the healthy Omani population is characterized by a few cosmopolitan haplotypes, with frequencies comparable to those observed in populations of European ancestry, and by many Omani-specific allelic combinations. Most of these Omani-specific haplotypes were extremely rare and thus plausibly evolved very recently. In particular, the haplotype carrying the MYO15A duplication seems to have been introduced into the Omani gene pool within the past two to three centuries, and high inbreeding levels, together with the demographic explosion of recent decades,3 have likely contributed to its spread.
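The length-based interpretation above can be expressed as a simple binning rule using the 500 kb and 1500 kb cut-offs quoted in the text. The class names "short" and "long", and the treatment of values exactly at the boundaries, are assumptions of this sketch.

```python
def roh_length_class(length_kb: float) -> str:
    """Bin an ROH by the 500 kb and 1500 kb cut-offs quoted in the text."""
    if length_kb < 500:
        return "short"
    if length_kb <= 1500:
        return "intermediate"
    return "long"

# ROH lengths reported for the MYO15A region (WES- and SNP-array-based).
for label, kb in [("WES, family 041", 849), ("SNP array, 041-112", 1400)]:
    print(label, roh_length_class(kb))  # both "intermediate"
```

Both estimates of the MYO15A ROH fall in the intermediate class, consistent with background population relatedness rather than recent parental relatedness.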

In conclusion, the identification of this founder MYO15A duplication, together with its frequency and its spread across different Northern Omani tribes, has important implications for designing an efficient and cost-effective strategy for the prevention of GHL in Oman: the duplication-specific PCR-based test that we devised to screen our population sample could be implemented as a simple and rapid carrier screening test for GHL. As discussed by Rajab et al.,35 the presence of founder mutations in a population with a genetic structure such as that of Oman offers a unique opportunity to plan population screening for common autosomal recessive disorders. Moreover, the rapid and cost-effective collection of molecular data now made possible by next-generation sequencing technologies enables the application of genetic epidemiology methods, such as the Homozygosity Index method,36 which can be applied effectively even to small samples to determine which autosomal recessive disorders are most frequent and to set priorities for screening and intervention policies for autosomal recessive disorders.37 About a third of Omani families with GHL referred for genetic counseling may carry the founder MYO15A duplication described here. WES-based studies of other Mendelian disorders in Oman may likewise reveal single mutations of high diagnostic value, encouraging the application of this approach to dissect the molecular causes of many other autosomal recessive diseases in Oman, as well as in other countries of the so-called 'consanguinity belt' (1.2 billion people), where the frequency of consanguineous marriages can reach 30–50%.38