Main

Intellectual and developmental disabilities (IDD; often referred to as mental retardation/developmental delay) are genetically highly heterogeneous. Despite this enormous heterogeneity, affected children share clinical features such as low IQ and poor adaptive behavior. The genetic heterogeneity presents significant problems when attempting to identify the underlying etiologies of IDD. In particular, genetic linkage and association studies perform suboptimally in these circumstances. Hence, there is a need to develop alternative strategies to identify the various genetic causes of IDD because, even with comprehensive evaluation, the underlying causes cannot be identified in the vast majority of patients (1).

We had previously shown that diffusion tensor imaging (DTI) can demonstrate abnormal development of brain white matter pathways (particularly the arcuate fasciculus) in approximately one-third to one-half of patients with IDD (2,3,4). Since the development of higher cognitive functions (such as language) depends on the integrity of the arcuate fasciculus, demonstration of abnormalities in this pathway could shed some light on the underlying neurologic mechanisms contributing to IDD. The classical arcuate fasciculus connects Wernicke’s and Broca’s areas and plays a central role in language and cognitive development. The arcuate fasciculus has been shown to be abnormal in a variety of conditions causing cognitive impairment, such as global developmental delay (4), autism (5), Angelman syndrome (6,7), and congenital bilateral perisylvian syndrome (8).

Given that large numbers of rare variants are present in any individual, the search space from which causal variants need to be extracted is large. Demonstration of an abnormal language pathway could focus one toward the genes that participate in the development of white matter pathways, thus helping to narrow down the causal search space. In the present study, we combined our data regarding integrity of the arcuate fasciculus with genetic data obtained by whole-exome sequencing.

Exome sequencing is a promising new technology that has been applied to identify the genetic causes of several disorders (9,10). These studies have focused on Mendelian disorders, where the technique has been quite successful (10,11). It is not yet clear whether such success can be replicated in a population of patients with heterogeneous genetic causes such as IDD. One issue is that exome sequencing reveals a very large number of common and rare variants, and discriminating the causal/susceptibility variants from nonpathogenic variants is generally quite difficult. This is further complicated by the variability in the genetic architectures of different diseases with respect to the number, effect size, and population frequencies of risk alleles. Some of the most common approaches used in exome sequencing studies involve the segregation pattern of the variants in pedigrees and the identification of highly disruptive de novo mutations. Even though these approaches are often useful for identifying multiple, rare risk alleles of large effect (12), their low etiologic yield suggests that alternative approaches are required.

In the present study, we developed one such alternative approach to identify candidate genes for IDD with and without an identifiable arcuate fasciculus. We hypothesized that rare, nonsynonymous variants in ultraconserved genes that are known to cause abnormal brain morphology in mutant mice are important risk alleles for IDD. Rare, nonsynonymous variants are commonly identified in most exome sequencing projects. The specific approach to identifying ultraconserved genes and the combination of this information with systematic knowledge about the phenotypic effects of mouse mutants (available for more than 16,000 genes) are novel aspects of the present study.

Results

Patients had an average of 16,493 exonic single-nucleotide variants each (with a minimum coverage of 20). On average, there were 3,167 novel, nonsynonymous variants per person (with a minimum coverage of 20), corresponding to a total of 9,927 genes across all 18 patients. Only seven novel, nonsynonymous variants (all heterozygous, missense) belonged to ultraconserved genes that are known to cause abnormal brain morphology in mutant mice (Table 1). One of these, a variant in the gene ISL1, was found to be de novo. Similarly, only three novel, nonsynonymous variants (all heterozygous, missense) belonged to known IDD genes (Table 1). A variant in the gene MID1 belonged to both gene sets (Table 1). All nine variants (Table 1) were validated by Sanger sequencing (Supplementary Figure S1 online).

Table 1 Demographic, genetic, and imaging characteristics of the IDD patients

Given the enormous heterogeneity and pleiotropy of IDD and the limited sample size of the present study, we attempted to evaluate the functional relevance of the candidate genes in individual patients instead of performing association tests. Seven out of 18 IDD patients had an underdeveloped/unidentifiable arcuate fasciculus (Table 1). Two variants (involving MID1 and EN2) were present in two patients with an underdeveloped arcuate fasciculus. Figure 1 illustrates the underdeveloped arcuate fasciculus in the patient with a novel, nonsynonymous variant in EN2. Both these genes (MID1 and EN2) are transcription factors known to regulate axon guidance genes (13,14).

Figure 1

The abnormal white matter morphology in the region of the arcuate fasciculus in a patient (#14 in Table 1) with a rare variant in EN2 (left) compared with normal white matter morphology (right). The color code represents the orientation of white matter tracts (red, transverse fibers; green, anteroposterior fibers; blue, superoinferior fibers). A well-developed arcuate fasciculus (right) appears green in the indicated region with minimal red color. Note that in this patient (left), red fibers dominate the green fibers, resulting in an underdeveloped/unidentifiable arcuate fasciculus. Since axon guidance cues determine these white matter orientation patterns, the identified mutation in a gene regulating axon guidance seems to be a strong candidate for producing this abnormal white matter pattern.


We also highlight additional observations in some individual patients below. A patient (#11 in Table 1) with a novel, nonsynonymous variant in MAFB has ataxia and cleft lip in addition to IDD. This is notable because Beaty et al. (15) conducted a genome-wide association study (GWAS) of nonsyndromic cleft lip/palate involving 1,908 case-parent trios, and the SNP rs13041247, near MAFB, was the most highly significant SNP (P = 1.44 × 10⁻¹¹). Replication studies further confirmed the evidence for association of cleft lip/palate with MAFB. In addition, the MAFB mutant mouse shows abnormal development of the hindbrain (16). This suggests that the combination of cleft lip, ataxia, and IDD in this patient is likely due to the rare variant in MAFB. Such independent agreement between the present study and the GWAS suggests that the rare, nonsynonymous variant in MAFB is likely to be a risk allele of large effect in this patient. Similarly, another patient (#10 in Table 1) with a glycine receptor β subunit (GLRB) variant has a 2-deoxy-2-(18F)fluoro-D-glucose positron emission tomography scan that revealed severe cerebellar hypometabolism (Figure 2). GLRB mutant mice have previously been shown to have abnormal gait (17) and abnormal Purkinje cell morphology (18). This suggests that the GLRB variant may have played an important functional role in producing the 2-deoxy-2-(18F)fluoro-D-glucose positron emission tomography imaging pattern and IDD in this patient. However, definitive evidence of the role of this GLRB mutation requires the identification of additional patients with this mutation who show the same 2-deoxy-2-(18F)fluoro-D-glucose positron emission tomography imaging pattern.

Figure 2

The 2-deoxy-2-(18F)fluoro-D-glucose positron emission tomography scan of a patient (#10 in Table 1) carrying a rare variant in the glycine receptor β subunit (GLRB). Severely reduced glucose metabolism in both cerebellar hemispheres (relative to cerebral cortex) is indicated by an arrow. GLRB-mutant mice are known to have abnormal gait and abnormal Purkinje cell morphology.


Discussion

In the present study, we developed a new approach to identify candidate genes for IDD using exome sequencing. Whole-exome sequencing is an emerging and powerful technology to identify disease-causing variants; however, this remains a challenging task because of the large number of variants any given individual possesses. Although different filtering criteria can be applied to identify causal/susceptibility variants, the lack of detailed knowledge about the genetic architecture or pathogenic mechanisms of diseases permits only generic approaches to the variant filtering process. The typical filters in exome sequencing search for de novo variants that are highly disruptive (12) or variants that perfectly segregate with phenotype (11,19). While these approaches have been successful in identifying the causes of diseases which follow Mendelian inheritance patterns (9,10,11,19), the etiologic yield is low for the more common heterogeneous conditions such as IDD or autism (12).

A recent study reported that de novo nonsynonymous variants were found in as many as 7 out of 10 (70%) IDD patients (20). Nevertheless, it is clear that other nonsynonymous (missense) variants in some of these genes (e.g., rs10129889 in DYNC1H1, rs115820667, rs114966386, rs62623436, rs74834692 in ZNF599, rs34114147 in DEAF1, rs143918134 in PGA5, and rs141764282 in CIC) are present in at least one population at polymorphic (>1%) frequencies. Given the tolerance of other nonsynonymous (missense) variants in some of these genes at polymorphic frequencies, the pathogenic potential of some of these de novo, nonsynonymous variants is uncertain. Looking for nonsynonymous variants only in the ultraconserved genes (as employed in the present study) avoids the identification of such variants at the expense of decreased sensitivity.

A key strength of the current study is the integration of systematic knowledge about the phenotypic effects of mutant mice into the filtering process. Currently, such knowledge is available for more than 16,000 genes (21) and is expected to involve all genes soon. The availability of standardized phenotypic consequences in mutant mice facilitates easier integration into variant filtration procedures. Failure of diverse developmental mechanisms (such as neurogenesis, migration, axon pathfinding, dendritic morphogenesis, pruning, myelination, etc.) and diverse brain structures (such as cerebral cortex, white matter, and subcortical structures) may cause IDD. Hence, we reasoned that a brain-specific and mechanism/structure-generic phenotypic term such as “abnormal brain morphology” will capture the IDD genes at reasonably high specificity.

There are some additional advantages to using this dataset that complement the conventional procedures for identifying genetic causes. The mouse phenotypes are often the result of knock-out of single genes, indicating that these phenotypes can arise from monogenic, large-effect risk alleles. Additionally, the difficulties in the causal attributions of human observational studies are minimized since the phenotypes are the direct result of targeted experimental manipulations rather than the associations of an observational study. Given that such experimental manipulations are not feasible in human studies, the use of experimental data from model systems appears to be an excellent method to infer true causal associations. The large-scale international efforts to determine the phenotypic effects of different mutations of every mouse gene, organized using a controlled vocabulary, are an important resource of such experimental data. Because these data are stored in controlled-vocabulary format in a mammalian phenotype browser, this knowledge can be easily integrated into the variant filtration procedures of exome sequencing projects.

Some interesting observations further attest to the potential value of the proposed method. In particular, the association of novel, nonsynonymous variants in MAFB and GLRB with unique combinations of clinical features, and their concordance with a GWAS and with abnormal phenotypes in mutant mice (as described in the Results section), demonstrates the translational potential of this approach. Further, we also found two candidate variants in two out of seven patients with an underdeveloped arcuate fasciculus. One can speculate that IDD in this subset of patients arose from specific abnormalities in white matter development, particularly in the mechanisms by which axons are guided to their targets. Consistent with this notion, both these genes (MID1 and EN2) are transcription factors regulating axon guidance (13,14) that are highly expressed early in development. MID1 mutation is known to cause Opitz BBB/G syndrome with clinical features such as IDD, hypertelorism (both features are present in patient #2), callosal agenesis, and abnormal midline development. The association with callosal agenesis and its interaction with axon pathfinding systems (14) suggest that MID1 has a key role in regulating axon guidance pathways. Similarly, it is well established that EN2 regulates the expression of ephrinA5 (13). The importance of ephrin signaling can be further ascertained from its role in misrouting retinotectal (22) and thalamocortical pathways (23). While direct evidence linking abnormal ephrin signaling to an underdeveloped arcuate fasciculus is unavailable, this seems to be an attractive downstream mechanism behind the abnormal development of the arcuate fasciculus.

There are some limitations to the current approach. The sample size of the study is small, and whole-genome association approaches are not feasible with this sample size. We also did not attempt to determine the de novo status of the variants in most patients due to lack of parental samples. The specific choice of mouse phenotype term could also influence our results. We have also not directly demonstrated the morphological abnormalities in the patients. Finally, looking for variants only in ultraconserved genes is likely to decrease the sensitivity even though it will increase the specificity. Despite these limitations, some of the identified variants could be readily associated with rare and unique clinical abnormalities in a surprising manner. We believe that the above limitations could potentially be offset by the known genotype–phenotype associations of large-scale mouse “experiments” organized in the mammalian phenotype browser. Nevertheless, it should be noted that interspecies differences in the function of these genes could decrease the value of this approach. Outside of Mendelian disorders, it has generally been difficult to reliably identify such risk alleles of large effect in human studies of heterogeneous conditions such as IDD. Future studies that improve upon this and other novel approaches may pave the way toward the long-cherished goal of personalized medicine.

Methods

Subjects

Eighteen children with IDD (age: 67 ± 36 mo, 9 females) without a specific diagnosis were included in the present study. All subjects were patients seen at the pediatric neurology clinics of Detroit Medical Center. DTI was subsequently performed on these patients. All recruited IDD subjects were new patients unrelated to the sample of patients we studied previously (4). Nevertheless, they belong to the same population of IDD patients. Quantitative developmental assessment (Vineland Adaptive Behavior Scales) was used to identify the IDD patients. Vineland Adaptive Behavior Scales scores were obtained in all patients; because IQ was measured only in a subset of patients, IQ was not used for patient selection. Vineland Adaptive Behavior Scales scores of less than 70 were considered abnormal. Patients with known genetic etiology (such as fragile X syndrome, Rett syndrome, etc.) or known environmental insults (such as perinatal hypoxia, fetal alcohol syndrome, etc.) were excluded from the study. Demographic characteristics of the participants are described in Table 1. Informed consent was obtained from the parents. The study was approved by the Institutional Review Board at Wayne State University.

Exome Capture and Sequencing

Genomic DNA was extracted from whole blood using a DNA extraction kit (DNeasy blood and tissue kit; Qiagen, Valencia, CA). Twenty micrograms of genomic DNA was used to capture the exome sequences with the Agilent SureSelect target enrichment protocol. Next-generation sequencing was performed on the SOLiD 4 platform (Applied Biosystems, Foster City, CA) and was outsourced to a third-party sequencing facility (Beckman Coulter Genomics, Danvers, MA).

Alignment and Variant Calling

The SOLiD 4 analysis pipeline was used to perform the alignment and initial variant calling. Variants that did not align within exonic regions or that had coverage of less than 20 were discarded. Subsequent downstream identification of presumptive pathogenic variants was performed as described below.

Derivation of Candidate Gene Set

A set of potentially important candidate genes for IDD was defined in a two-step procedure. In the first step, data from the Exome Sequencing Project database (National Heart, Lung, and Blood Institute and collaborating research institutions, Seattle, Washington) were downloaded to determine the population frequencies of exonic variants in about 5,400 individuals. For each gene, the number of nonsynonymous variants with >0.1% population frequency was derived by using plinkseq software (Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA). Genes that scored zero on this measure were considered ultraconserved genes (i.e., genes with nonsynonymous variants at <0.1% population frequency at all sites within the gene). A total of 2,638 genes satisfied this criterion. In the second step, genes in which mutations are known to cause abnormal brain morphology in mice were identified. Using the phenotype term “abnormal brain morphology” in the MGI mammalian phenotype browser database (The Jackson Laboratory, Bar Harbor, Maine), 2,297 human orthologous genes were identified. The 369 genes belonging to both groups were considered an important list of candidate genes for IDD.
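The two-step derivation above can be sketched in Python as follows. This is only a minimal illustration and not the plinkseq workflow used in the study: the tuple layout of the variant records, the function names, and the toy gene symbols and frequencies are all assumptions introduced here for clarity.

```python
from collections import defaultdict

FREQ_THRESHOLD = 0.001  # 0.1% population frequency, as stated in the text


def ultraconserved_genes(variants, all_genes, threshold=FREQ_THRESHOLD):
    """Return genes with zero nonsynonymous variants above `threshold`.

    `variants` is an iterable of (gene, is_nonsynonymous, frequency) tuples.
    This record layout is hypothetical, not the Exome Sequencing Project format.
    """
    counts = defaultdict(int)
    for gene, is_nonsynonymous, frequency in variants:
        if is_nonsynonymous and frequency > threshold:
            counts[gene] += 1
    # Assumption: a gene with no recorded variants also scores zero.
    return {gene for gene in all_genes if counts[gene] == 0}


def candidate_genes(ultraconserved, mouse_brain_genes):
    """Step 2: keep ultraconserved genes that also carry the mouse phenotype term."""
    return set(ultraconserved) & set(mouse_brain_genes)


# Toy data with placeholder gene names (not real study data).
variants = [
    ("GENE_A", True, 0.0005),   # rare nonsynonymous: does not count
    ("GENE_A", False, 0.20),    # common but synonymous: ignored
    ("GENE_B", True, 0.03),     # common nonsynonymous: GENE_B is excluded
    ("GENE_C", True, 0.0002),
]
all_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
mouse_brain_genes = {"GENE_A", "GENE_B", "GENE_D"}

ultra = ultraconserved_genes(variants, all_genes)
candidates = candidate_genes(ultra, mouse_brain_genes)
print(sorted(candidates))
```

In this toy example, GENE_B is dropped in the first step and GENE_C in the second, mirroring how the 2,638 ultraconserved genes and the 2,297 mouse-phenotype genes were intersected to give 369 candidates.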

Variant Filtration and Validation

The SeattleSeq annotation server (University of Washington, Seattle, WA) was used initially to identify the novel, nonsynonymous single-nucleotide variants. Subsequently, nonsynonymous variants belonging to the candidate gene set described above were extracted. These variants are provided in Table 1. All the variants were validated by Sanger sequencing. In addition, the de novo status of the variants was evaluated by Sanger sequencing in two of the patients in whom parental DNA could be obtained.
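The filtering logic described in this and the preceding sections (exonic, coverage of at least 20, novel, nonsynonymous, and within the candidate gene set) could be expressed along the following lines. The `Variant` record, its field names, and the example values are hypothetical; in the study the annotations came from SeattleSeq and the SOLiD pipeline rather than from a structure like this.

```python
from dataclasses import dataclass

MIN_COVERAGE = 20  # variants with coverage below 20 were discarded


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotated variant record (field names are assumptions)."""
    gene: str
    coverage: int
    is_exonic: bool
    is_novel: bool
    is_nonsynonymous: bool


def filter_variants(variants, candidate_genes, min_coverage=MIN_COVERAGE):
    """Keep novel, nonsynonymous, well-covered exonic variants in candidate genes."""
    return [
        v for v in variants
        if v.is_exonic
        and v.coverage >= min_coverage
        and v.is_novel
        and v.is_nonsynonymous
        and v.gene in candidate_genes
    ]


# Toy input with placeholder gene names (not real study data).
calls = [
    Variant("GENE_A", 35, True, True, True),    # passes every filter
    Variant("GENE_A", 12, True, True, True),    # coverage too low
    Variant("GENE_D", 40, True, False, True),   # already known, not novel
    Variant("GENE_D", 40, True, True, False),   # synonymous
    Variant("GENE_Z", 50, True, True, True),    # not in the candidate set
]
kept = filter_variants(calls, candidate_genes={"GENE_A", "GENE_D"})
print([v.gene for v in kept])
```

Variants surviving such a filter would still need orthogonal confirmation, which in the present study was done by Sanger sequencing.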

DTI

Axial DTI data were acquired with the array spatial sensitivity encoding technique on a 3 T GE Signa scanner (GE Healthcare, Milwaukee, WI) at TR = 1,250 ms, TI = 88.7 ms, field of view 240 cm, 128 × 128 matrix, contiguous 3-mm thickness slices to cover the whole brain with 55 isotropic gradient directions with b = 1,000 s/mm², one b = 0 acquisition, and number of excitations = 1 for a total acquisition time of 12 min. A 3-D fast-spoiled gradient-echo image was also acquired from the whole brain with TR/TE/TI of 9.12/3.66/400 ms, slice thickness of 1.2 mm, and planar resolution of 0.9375 × 0.9375 mm². The fast-spoiled gradient-echo image was used as the anatomic reference for this study. A double refocusing pulse was used to reduce eddy current artifacts. In addition, the array spatial sensitivity encoding technique was used to further reduce geometric distortion due to the sequence design. All of the children were sedated during the scan (performed as part of the clinical evaluation) and were monitored during the procedure by a trained sedation nurse. After acquisition, all data sets were assessed for quality and deemed acceptable for analysis. No scans were repeated as a result of movement or other artifacts. Tensor calculation and tractography were performed using DtiStudio software (Johns Hopkins University, Baltimore, MD). Tractography was carried out on the basis of the fiber assignment by continuous tracking (FACT) algorithm, with fiber propagation starting at a fractional anisotropy threshold value of >0.2. The fiber propagation was stopped at a fractional anisotropy threshold of <0.2 or an angle threshold of >60 degrees. The tracking protocol followed to isolate the arcuate fasciculus has been described previously (4). The patients were classified into two groups: those with an identifiable/well-developed arcuate fasciculus and those with an unidentifiable/underdeveloped arcuate fasciculus. Two raters, blinded to the clinical status, made a dichotomous determination of whether the arcuate fasciculus was identifiable or not. There was no disagreement either within or between the two observers.
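The stopping rule used during fiber propagation (fractional anisotropy below 0.2 or a turning angle above 60 degrees) can be illustrated with a short Python sketch. This is not DtiStudio code; the function names are hypothetical, and treating the principal diffusion direction as sign-ambiguous (using the absolute value of the dot product) is an assumption made here for the illustration.

```python
import math

FA_THRESHOLD = 0.2      # propagation stops below this fractional anisotropy
MAX_ANGLE_DEG = 60.0    # propagation stops above this turning angle


def turning_angle_deg(u, v):
    """Angle in degrees between two 3-D direction vectors.

    Assumption: directions are treated as axes without sign, so the
    absolute value of the dot product is used.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_theta = min(1.0, abs(dot) / norm)  # clamp against rounding error
    return math.degrees(math.acos(cos_theta))


def should_stop(fa, prev_dir, next_dir):
    """Return True when either termination criterion from the text is met."""
    return fa < FA_THRESHOLD or turning_angle_deg(prev_dir, next_dir) > MAX_ANGLE_DEG


# Example checks on hand-made values.
print(should_stop(0.15, (1, 0, 0), (1, 0, 0)))  # low anisotropy
print(should_stop(0.50, (1, 0, 0), (1, 1, 0)))  # 45-degree turn
print(should_stop(0.50, (1, 0, 0), (0, 1, 0)))  # 90-degree turn
```

A tracking loop would call a check like `should_stop` at every step and end the streamline as soon as it returns true.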

Statement of Financial Support

This work was supported by National Institute of Child Health and Human Development (National Institutes of Health, Bethesda, MD) grant R01HD059817 (PI: S.S.).

Disclosure:

No potential/perceived conflict of interest to disclose.