Introduction

Global Developmental Delay (GDD)/Intellectual disability (ID) represents a genetically, phenotypically, and clinically heterogeneous group of disorders that affect approximately 1% of children worldwide1. GDD/ID is defined by significant limitations in both intellectual functioning and adaptive behavior, originating during brain development. Non-genetic causes such as infections, autoimmunity, and environmental factors have been described, but the majority of such disorders have a genetic basis2.

Hundreds of genes have been implicated in the etiology of ID3, and the list continues to expand: according to the SysNDD database, there are now 2841 primary and candidate human ID genes4. In the last decade, advances in genetic technologies such as next-generation sequencing (NGS) have revolutionized clinical practice in medical genetics, aided clinical diagnosis, and proved very effective in discovering an ever-increasing number of ID-related genes. They have also helped decipher the heterogeneous genetic mechanisms underlying ID5.

Whole-Exome Sequencing (WES), as a clinical diagnostic test, has a diagnostic yield of about 30–40%6. The diagnostic yield of chromosomal microarray in children with unexplained ID is around 15–20%7. In trio-based WES of cohorts of children with severe ID, the yield ranged from 13 to 35%8. In contrast, exome sequencing of samples from consanguineous families with various ID-associated phenotypes has produced a high yield. For example, in several studies of ID from the Middle East, the NGS yield ranged from 37 to 90%, depending on the patient cohort, the study design, and the classification of the variants9,10,11,12,13,14,15.

This study reports the diagnostic yield and candidate genes of a genetic study of intellectual disability in 118 Omani families. The study extended over five years and included 215 affected individuals. The diagnostic yield, considering both pathogenic and uncertain variants, was 55%. Variants in candidate genes possibly associated with the ID phenotype were detected in 32/118 families (27%), comprising 64 affected individuals. Identifying the causes of ID enables precise counseling for affected families and aids primary prevention. It also spares the costs of unnecessary investigations and shortens diagnostic odysseys. The paper also discusses the pitfalls and challenges of candidate gene discovery in a consanguineous population.

Materials and methods

Human subjects

The study was approved by the Medical Research Ethics Committee of Sultan Qaboos University (SQU MREC#1362). Informed written consent was obtained from all participants or their guardians. All methods were performed in accordance with the relevant guidelines and regulations and with the Declaration of Helsinki. The target patients presented with global developmental delay or intellectual disability and were all assessed clinically by medical geneticists (detailed methods in SUPP_S1). Patients with severe phenotypes causing death within the neonatal or early infantile period were also included when a neurological phenotype, such as seizures, hypotonia, or brain malformations, was evident. Families with a likely autosomal recessive pattern of inheritance were selected. Patients with a known molecular diagnosis at the time of recruitment were excluded. The study and exome data analysis were carried out over five years, between 2016 and 2021.

Whole exome sequencing and variant interpretation

Whole-exome sequencing was performed for all affected individuals for whom samples were available. A detailed methodology is presented in the supplemental data (SUPP_S1). In brief, exome enrichment and capture used hybrid capture technology (Agilent SureSelect Human All Exon V6 or V7). Sequencing was performed on Illumina platforms (HiSeq 2500, HiSeq 4000, or NovaSeq 6000) with 150 bp paired-end reads at 150–200X coverage. Reads were mapped to UCSC GRCh37/hg19 or GRCh38/hg38. Filtering and variant prioritization were performed using an in-house pipeline (SUPP_S1). Variant filtration retained only novel or rare variants (allele frequency ≤ 1%). Public databases such as 1000 Genomes, the Exome Variant Server, and gnomAD were used for allele frequencies. To filter out variants common in Middle Eastern populations, the Greater Middle East (GME) Variome and “al mena” databases, comprising data from 2497 samples, were used16,17. Our in-house population-specific exome database, which contains data from 1564 WES, was also used. Both the phenotype and the mode of inheritance were considered during filtration. All potential variants identified after prioritization were confirmed by Sanger sequencing. The number of family members included in the segregation analysis ranged from 3 to 12 per family, depending on DNA availability. In most families, segregation was tested in the parents and siblings alongside the index patient. When a definitive cause could not be identified or when a candidate gene was considered, copy number variants were further analyzed from exome data using ExomeDepth18 (SUPP_S1).
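The frequency and inheritance filters described above can be sketched as follows. This is a minimal illustration, not the in-house pipeline: the `Variant` record, its field names, and the database labels are hypothetical, and the recessive model is simplified to homozygous variants in a shared autozygous region (compound heterozygotes are ignored).

```python
from dataclasses import dataclass, field

# Hypothetical variant record; field names are illustrative, not the pipeline's schema.
@dataclass
class Variant:
    gene: str
    hgvs_p: str
    zygosity: str  # "hom" or "het" in the proband
    # Allele frequency per database; a missing key means the variant is absent (novel) there.
    allele_freqs: dict = field(default_factory=dict)
    in_autozygous_region: bool = False

MAX_AF = 0.01  # "novel or rare (<= 1%)" threshold from the Methods
# Assumed labels for the public, regional, and in-house databases.
DATABASES = ("1000G", "EVS", "gnomAD", "GME", "al_mena", "in_house")

def is_rare(v: Variant, max_af: float = MAX_AF) -> bool:
    """True if the variant is absent from, or at most max_af in, every database."""
    return all(v.allele_freqs.get(db, 0.0) <= max_af for db in DATABASES)

def fits_recessive_model(v: Variant) -> bool:
    """Simplified AR model: homozygous variant inside a shared autozygous region."""
    return v.zygosity == "hom" and v.in_autozygous_region

def prioritize(variants):
    """Keep variants passing both the frequency and the inheritance filters."""
    return [v for v in variants if is_rare(v) and fits_recessive_model(v)]
```

In this sketch a variant that is common in a single regional or in-house database is removed even if it is rare in gnomAD, which is the reason population-specific databases are added to the public ones.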

Variants were classified according to the published ACMG guidelines19,20. Pathogenic or likely pathogenic variants in known disease-causing genes that could be linked to the reported phenotypes of the affected patients were categorized as disease-causing variants. The second category comprised variants in known disease-causing genes whose associated phenotypes overlapped with the patient’s phenotype; these were rare and damaging variants of uncertain significance (VUS) and were considered possible disease-causing variants. The third category comprised variants predicted to be deleterious in candidate genes, that is, genes not previously confirmed to be implicated in human disease. These genes were either novel, previously reported in a single family, or previously associated with a strikingly different phenotype and a different mode of inheritance. Supporting evidence for candidate genes included location of the variant within a shared autozygous region, a high or moderate predicted impact, population frequency data supporting genic intolerance to such variants, and in-silico predictions of a damaging effect. Data on gene function and networks, gene expression, and animal models were also considered.
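The three reporting categories can be expressed as a simple decision function. This is an illustrative sketch, not the authors' classification procedure: the boolean inputs are hypothetical summaries of evidence that in practice required clinical and ACMG review, and a family is assigned the highest tier reached by any of its segregating variants.

```python
from enum import Enum

class Category(Enum):
    # Defined in priority order: iteration over the Enum follows this order.
    DISEASE_CAUSING = "P/LP in a known gene explaining the phenotype"
    POSSIBLE = "VUS in a known gene overlapping the phenotype"
    CANDIDATE = "deleterious variant in a candidate gene"
    UNSOLVED = "no qualifying variant"

def categorize(acmg_class: str, gene_established_for_phenotype: bool,
               predicted_deleterious: bool) -> Category:
    """Map one segregating variant to a reporting tier (simplified sketch).

    gene_established_for_phenotype is False for novel genes, genes reported in a
    single family, or genes known only for a different phenotype/inheritance.
    """
    if gene_established_for_phenotype:
        if acmg_class in ("P", "LP"):
            return Category.DISEASE_CAUSING
        if acmg_class == "VUS":
            return Category.POSSIBLE
    elif predicted_deleterious:
        return Category.CANDIDATE
    return Category.UNSOLVED

def family_category(variant_categories) -> Category:
    """A family takes the highest-priority tier among its variants."""
    for tier in Category:
        if tier in variant_categories:
            return tier
    return Category.UNSOLVED
```

For example, a family with a VUS in an established gene and a deleterious variant in a candidate gene would be counted in the VUS group under this sketch.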

Results

Whole-exome sequence analysis was performed for 188 individuals representing 118 characterized families with a total of 215 affected individuals. Of the 118 families, 93 (78.8%) had a family history of one or more affected individuals in addition to the index patient, all with a similar phenotype. The age of the affected individuals at first clinical assessment ranged from birth to 34 years. The average age at which WES was completed was 8.5 years, and children below five years of age represented 30% of the affected individuals when WES was completed. Males represented 57.2% of the studied group (123 males: 92 females). The rate of consanguineous marriage within the included families was 91%. The affected patients exhibited diverse phenotypes, including global developmental delay, seizures, brain malformations, microcephaly, facial dysmorphism, and other systemic manifestations (Table 1 and Supplementary SUPP_Table1).

DNA samples from a total of 420 family members, both healthy and affected, were available for WES or for Sanger sequencing for segregation analysis. However, DNA samples were not available for 22 of the 215 affected individuals. Sanger sequencing was used to confirm all candidate variants and their genotype-phenotype segregation. Only variants that were confirmed and segregated with the phenotype are reported.

Variants in previously known and described ID genes were seen in 65/118 families (55%). Following the ACMG guidelines for variant classification, pathogenic (P) or likely pathogenic (LP) variants were detected in 32/118 families (27%), and variants of uncertain significance in 33/118 families (28%). The majority of variants in these two groups were homozygous (51/65; 78.5%). These variants are rare, explain the disease manifestations, are predicted to be damaging, were confirmed by Sanger sequencing, and segregate with the phenotype (all listed in SUPP_Table1).

Variants in candidate genes possibly associated with the ID phenotype were detected in 32/118 families (27%), comprising 64 affected individuals (Table 1). These candidate genes were selected based on the rarity of the variants and their absence in the homozygous state in local control exomes and public databases. The variants are predicted to be damaging. Expression patterns or mouse models supported an association with neurological dysfunction. Importantly, Sanger sequencing confirmed segregation of all variants in candidate genes across up to three generations of the family pedigrees (Fig. 1). In total, 28 candidate genes were identified. Table 1 shows detailed findings for the candidate genes.

Figure 1

Pedigrees of families with candidate genes, showing the variants. Shaded symbols indicate affected individuals, arrows indicate the proband, and the zygosity for the variant is shown for each tested individual.

Twenty-one families (18%) with 40 affected individuals remain unsolved despite multiple molecular tests, including WES, chromosomal microarray, and Fragile X testing where applicable. Unresolved cases tended to have milder phenotypes, usually non-syndromic intellectual disability. However, we cannot draw a firm conclusion because of the small numbers.
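The family-level counts reported above are mutually exclusive and account for the whole cohort. The short calculation below, using only numbers stated in the Results, reproduces the reported percentages (27%, 28%, 27%, 18%, and the combined 55% and 82%) and the share of homozygous variants among known-gene findings.

```python
# Family-level counts reported in the Results (n = 118 families).
TOTAL_FAMILIES = 118
counts = {
    "P/LP in known genes": 32,
    "VUS in known genes": 33,
    "candidate genes": 32,
    "unsolved": 21,
}

def pct(n: int, total: int = TOTAL_FAMILIES) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * n / total, 1)

# Every family falls into exactly one category.
assert sum(counts.values()) == TOTAL_FAMILIES

for label, n in counts.items():
    print(f"{label}: {n}/{TOTAL_FAMILIES} = {pct(n)}%")

known = counts["P/LP in known genes"] + counts["VUS in known genes"]  # 65 families
explained = known + counts["candidate genes"]                         # 97 families
print(f"known genes: {known}/{TOTAL_FAMILIES} = {pct(known)}%")
print(f"possible explanation: {explained}/{TOTAL_FAMILIES} = {pct(explained)}%")
print(f"homozygous among known-gene findings: 51/{known} = {pct(51, known)}%")
```

Rounded to whole numbers, the outputs match the figures quoted in the text (for example, 65/118 is 55.1% and 97/118 is 82.2%).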

Discussion

Global Developmental Delay (GDD)/Intellectual disability (ID) represents a genetically, phenotypically, and clinically heterogeneous group of disorders that affect approximately 1% of children worldwide. This study presents the results of 188 exome analyses representing 118 consanguineous Omani families. The cohort included 215 individuals affected with intellectual disability and associated features, including global developmental delay, seizures, brain malformations, microcephaly, facial dysmorphism, and other systemic manifestations. Overall, 82% of families (97/118) were found to have a possible explanation: 55% had variants in previously described and known genes (P/LP or VUS) and 27% had variants in possible candidate genes.

With the enrichment for consanguineous families (91%), it was not surprising that the majority (85.5%) of the variants across the three groups were homozygous. The consanguinity rate is high because families with autosomal recessive phenotypes and multiple affected individuals were preferentially included. This study detected pathogenic (P) or likely pathogenic (LP) variants in 32 families, a diagnostic rate of 27%. Although the main goal of this study was to recruit unsolved cases, the observed diagnostic rate was higher than anticipated. This can be explained by frequent reanalysis of exome data over the five years of the study, which enabled the detection of variants in newly published genes. In addition, VUSs, including non-coding variants, were reconsidered as further evidence of pathogenicity became available. Finally, during the study's initial phase, some families did not have access to clinical exome sequencing and were therefore channeled to the research exome.

In a large-scale exome sequencing study, Monies and colleagues15 reported the yield of exome sequencing in 2219 families from Saudi Arabia. The overall diagnostic yield, based on cases with confirmed pathogenic or likely pathogenic variants, was 43.3%. When variants of uncertain significance (VUS) in established disease-related genes and candidate genes with compelling biological candidacy were also considered, the yield rose to 73%. The high-throughput design of that study led to the discovery of 236 genes with no established OMIM phenotype, which were proposed as candidate genes. Negative results (unsolved cases) accounted for the remaining 27%.

In total, 28 candidate genes for intellectual disability were identified in this study. During the course of the study, GeneMatcher21 was used to seek further evidence of association with the phenotype. Through this work and in collaboration with the scientific community, several of these genes have been confirmed to cause intellectual disability (Table 1). One interesting candidate variant in XPR1 was shared by multiple affected individuals from four apparently unrelated families (Table 1). Although these families come from different geographical areas of Oman, haplotype analysis using exome data indicated that they all shared the same haplotype (data not shown). The XPR1 protein mediates phosphate export from the cell and binds inositol hexakisphosphate and related inositol polyphosphates, which are key intracellular signaling molecules22. Variants in XPR1 are known to be associated with the autosomal dominant condition idiopathic basal ganglia calcification-6 (OMIM 616413)23. The onset of this condition is in the third to fourth decade of life, with symptoms of cerebrovascular insufficiency associated with movement disorders, cognitive decline, and psychiatric symptoms. Our patients’ phenotype is strikingly different: they carry biallelic XPR1 variants, and their parents are apparently healthy. Their phenotype included variable signs of neonatal pulmonary hypertension, cardiomyopathy, serum hypophosphatemia, chronic lung disease requiring oxygen therapy, severe developmental delay, and basal ganglia calcification. Further functional characterization of these variants has commenced.

Whole-genome sequencing (WGS) can cover up to 98% of the human genome, whereas WES covers about 95% of the coding regions, which make up only 1–2% of the genome. WES has a lower cost per sample than WGS, a greater depth of coverage in target regions, smaller storage requirements, and easier data analysis24. It is, however, worth highlighting the pitfalls and challenges that can arise in WES analysis. WES is a high-throughput, complex technique with potential pitfalls at every step, and missed molecular diagnoses resulting from these pitfalls are a recognized phenomenon25. Contributing factors include technological limitations in variant detection, failure to enroll additional family members, a large number of variants identified per proband, and causal variants located outside the coding regions25. Pitfalls related to WES analysis can be categorized into three main groups (Table 2).

The first group of pitfalls is sequence-related. Large rearrangements or complex structural variants are one example. Structural variants are genomic rearrangements larger than 50 bp, and they account for about 1% of the variation in human genomes26. Complex structural variants have been shown to contribute to human genomic variation and to cause Mendelian disease27. Unfortunately, these cannot be identified easily by WES. However, multiple pipelines for CNV analysis are available. In this study, ExomeDepth detected homozygous CNVs as the cause of the phenotype (families 11MS8800 and 10DH12500). Mitochondrial DNA variants are another causative factor that WES typically cannot detect. Other such factors include mosaicism, abnormal methylation, and uniparental disomy.
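The principle behind read-depth CNV detection from exome data can be illustrated with a per-exon depth ratio: read counts in the proband are normalized for library size and compared with the mean of reference samples. This is a deliberately simplified sketch and not ExomeDepth's method, which fits a beta-binomial model against an optimized reference set; the function names and the ratio thresholds are illustrative assumptions.

```python
import math
from statistics import mean

def depth_ratios(test_counts, reference_counts):
    """Per-exon ratio of library-size-normalized read counts, test vs. mean of references.

    test_counts: read counts per exon for the proband.
    reference_counts: one such list per reference sample (same exon order).
    """
    norm_test = [c / sum(test_counts) for c in test_counts]
    norm_refs = [[c / sum(ref) for c in ref] for ref in reference_counts]
    expected = [mean(ref[i] for ref in norm_refs) for i in range(len(test_counts))]
    return [t / e if e > 0 else math.nan for t, e in zip(norm_test, expected)]

def call_state(ratio, hom_del_max=0.15, het_del_max=0.7, dup_min=1.3):
    """Crude copy-number call from a depth ratio; thresholds are illustrative assumptions."""
    if math.isnan(ratio):
        return "no call"
    if ratio <= hom_del_max:
        return "homozygous deletion"   # roughly 0 copies, expected ratio ~0
    if ratio <= het_del_max:
        return "heterozygous deletion" # roughly 1 copy, expected ratio ~0.5
    if ratio >= dup_min:
        return "duplication"           # 3 or more copies, expected ratio >= ~1.5
    return "normal"
```

A homozygous deletion, as expected in a consanguineous family, appears as an exon with almost no reads in the proband but normal depth in the references, which makes it the easiest state to call from exome depth alone.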

Other causes of missed variants in WES include decreased coverage, locus-specific features such as GC-rich regions, and sequencing biases28,29. Difficulty in aligning indels (insertions/deletions) longer than 20–50 nucleotides is another limitation and a likely reason why WES misses such variants30. Incomplete annotation and sequence of the human genome might also affect the accuracy of variant mapping and annotation. For instance, an apparently intronic variant could be located in an unannotated exon31. Another potential cause is the high sequence similarity between pseudogenes and their corresponding functional genes32.

The second group comprises pitfalls due to annotation and prioritization errors. Annotation and prioritization steps are used in WES analyses to reduce thousands of variants to a few candidates. During filtering, all annotations except those for the “canonical” transcript (often defined as the longest transcript of a gene) are initially ignored. Consequently, pathogenic variants can be missed if alternative transcripts are not fully considered33. Splicing is thought to be involved in 15–30% of all inherited disease variants34. Despite advances in exome capture methods and in machine learning for detecting variants that affect splicing, accurate detection of deep intronic variants remains limited, which could lead to missed splicing variants. In our cohort, intronic variants likely to affect splicing were detected in 19/118 families (16.1%), of which 8 were at non-canonical splice sites.
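The canonical-transcript pitfall can be shown with a toy example: the same variant is intronic on the canonical transcript but introduces a stop codon in an exon present only in an alternative transcript. The `Annotation` record, transcript names, and the high-impact set are hypothetical; the consequence terms follow Sequence Ontology names as used by common annotators.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    transcript: str   # hypothetical transcript identifier
    consequence: str  # Sequence Ontology term, e.g. "intron_variant"
    canonical: bool

# Illustrative set of high-impact consequences used for prioritization.
HIGH_IMPACT = {
    "stop_gained", "frameshift_variant", "start_lost",
    "splice_acceptor_variant", "splice_donor_variant",
}

def has_high_impact(annotations, canonical_only: bool) -> bool:
    """True if any considered transcript annotation is high impact."""
    pool = [a for a in annotations if a.canonical] if canonical_only else annotations
    return any(a.consequence in HIGH_IMPACT for a in pool)

# One variant, two transcripts: intronic on the canonical one,
# stop-gain in an alternative exon.
variant_annotations = [
    Annotation("TX_CANONICAL", "intron_variant", canonical=True),
    Annotation("TX_ALT", "stop_gained", canonical=False),
]
```

Here `has_high_impact(variant_annotations, canonical_only=True)` returns False and the variant would be filtered out, whereas considering all transcripts flags it; whether the alternative exon is expressed in the relevant tissue still has to be assessed separately.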

Databases such as OMIM and HGMD are used to find gene-disease and variant-disease associations in the literature, and variants or genes listed in these databases are flagged as potentially disease-causing35. Nevertheless, one cause of an initial false-negative result is that such variant or disease databases are not kept up to date. A recent study by Bruel and colleagues illustrated this issue in a cohort of 313 individuals36. Likewise, deleterious variants with a relatively high population frequency might be filtered out. This can happen because the penetrance of a disease might be influenced by numerous factors, including other hidden rare variants, family history, inheritance, additional medical problems, and ethnic background.

Synonymous variants, also known as ‘silent’ variants, represent almost 50% of the variants identified by WES. Filtering out synonymous variants shortens the variant list because they are assumed to be benign. However, growing knowledge of the relationship between genetic variants and disease has shown that synonymous variants play a significant role in human disease risk and other complex traits, including through effects on splicing37. Indeed, a recent study challenged the concept that synonymous variants are neutral: in yeast, Shen and colleagues showed strong non-neutrality of most synonymous mutations38. If this holds true for other genes and organisms, numerous biological conclusions, including conclusions about disease causation by synonymous mutations, would require re-examination. An example from our cohort is family 16SS2600, in which the GPT2 variant p.Gly245Gly was initially missed because it was flagged as a silent variant.
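The difference between discarding all synonymous variants and a more cautious filter can be sketched as below. The cautious filter keeps synonymous variants close to an exon boundary or with a high score from a splicing predictor. The record fields, the 3-base boundary window, and the 0.5 score cutoff are illustrative assumptions, not settings from this study, and synonymous effects other than splicing are not captured.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CodingVariant:
    gene: str
    consequence: str             # e.g. "synonymous_variant", "missense_variant"
    dist_to_exon_boundary: int   # bases to the nearest exon-intron junction (assumed precomputed)
    splice_score: Optional[float] = None  # output of a splicing predictor, if available (hypothetical)

def naive_filter(variants):
    """Drops every synonymous variant as presumed benign."""
    return [v for v in variants if v.consequence != "synonymous_variant"]

def splice_aware_filter(variants, max_boundary_dist=3, score_cutoff=0.5):
    """Keeps synonymous variants near exon boundaries or with a high predicted splicing effect."""
    kept = []
    for v in variants:
        if v.consequence != "synonymous_variant":
            kept.append(v)
        elif v.dist_to_exon_boundary <= max_boundary_dist:
            kept.append(v)
        elif v.splice_score is not None and v.splice_score >= score_cutoff:
            kept.append(v)
    return kept
```

With a mixed list of variants, `naive_filter` returns only the non-synonymous ones, while `splice_aware_filter` also retains synonymous variants at exon edges or with a high splicing score, at the cost of a somewhat longer list to review.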

The third group, pitfalls related to clinical factors and phenotypes, is encountered by all laboratories. A negative WES result must therefore be interpreted in the context of the patient’s clinical history to determine whether reevaluation or further testing is necessary. For example, within our cohort (family 10MS6600), two related families with multiple affected individuals were enrolled as having the same phenotype, and WES analysis was initially negative. However, detailed clinical reevaluation and exome reanalysis showed that two different diseases were possibly segregating in the family. Some of the affected members were found to harbor a deep intronic variant in PGAP3, which was recently reported to cause hyperphosphatasia with mental retardation syndrome type 4 (OMIM 615716). Family 10MS16500 is another example, demonstrating how two or more different diseases occurring in multiple individuals can complicate the phenotype. Similarly, in family 10DF10800, multiple affected individuals presented with developmental delay, congenital cataracts, and bilateral sensorineural hearing loss. WES analysis identified two different variants in two genes responsible for two different syndromes, one of which was a novel underlying cause39. Another challenge is the mode of inheritance: conditions known to be autosomal dominant may manifest as autosomal recessive. Monies and colleagues reported many examples of genes or diseases inherited in both AD and AR patterns15.

Table 1 Candidate Genes. Several were published previously through gene matching.
Table 2 Pitfalls and challenges of exome analysis.

Conclusion

In conclusion, the use of WES to identify novel causes of human disease has changed the research landscape of genetic and neurodevelopmental disorders. Although WES is a comprehensive technology, its limitations must be considered when negative results are obtained. The pitfalls of WES can reduce the effectiveness of this technique in biological and medical research as well as in clinical settings. Finally, it is worth emphasizing that identifying a likely candidate gene is often just the start of a long process to confirm the variant’s pathogenicity.