Introduction

Intellectual disability (ID) is the most common reason for referral to clinical genetic centers, and is an unsolved problem in health care with important impact on the individuals, their families, and the health system. Although mild forms of ID (intelligence quotient (IQ) 70–50) may, in some cases, be of complex genetic nature, severe forms (IQ<50, incidence of 0.4%) are thought to be caused by striking environmental or genetic factors.1 In addition to the classification based on severity, patients may be classified into syndromic and non-syndromic forms. In syndromic forms, dysmorphisms, biochemical parameters, imaging findings, and organ anomalies exist, and may help to identify a syndrome of known etiology. In the non-syndromic form, further non-specific symptoms such as epilepsy, ataxia, mild microcephaly, and behavioral disorders may exist. This means that a characteristic pattern of subtle symptoms may be overlooked in the first patient presenting, and will be recognized only when examining several patients with mutations in the same gene. Non-specific ID may thus be considered a diagnosis of exclusion with respect to distinguishable, syndromic ID, with a considerable gray zone between both forms.

In the last few years, it has become clear that genetic factors, that is, chromosomal anomalies and point mutations, have a central role in the etiology of ID. In a representative study, Rauch et al2 showed that 15% of the cases of ID are due to cytogenetic anomalies, and that a further 15% are due to submicroscopic aberrations. X-chromosomal mutations are possibly the cause in about 10% of the cases.3 This means that in more than half of cases, the etiology of ID remains unsolved. The majority of these probably have an autosomal form of ID. In industrial countries, both dominant and recessive forms usually appear as sporadic cases due to small family size. This hampers genetic approaches for identifying the underlying genetic cause. Genetically distinct subtypes are often clinically indistinguishable, precluding pooling of data from unrelated families, and rendering their elucidation even more difficult. It is therefore not surprising that, to date, only a few genes have been identified (recently reviewed by Ropers).1

The frequency of autosomal recessive, non-specific ID is unknown. Although functional considerations and empirical data from mouse models suggest that most gene defects are inherited as recessive disorders,1, 4, 5 a recent study suggests that autosomal dominant de novo mutations are most prevalent.6 Until now, 26 significantly linked non-specific intellectual disability of autosomal recessive inheritance (NS-ARID) loci have been described,7, 8, 9, 10 and only 10 genes (six of them located in described regions) have been identified: PRSS12 (OMIM#606709, on 4q26, former MRT1), CRBN (OMIM#609262, on 3p26, former MRT2), CC2D1A (OMIM#610055, on 19p13.12, former MRT3), ST3GAL3 (OMIM#606494, on 1p34.3), GRIK2 (OMIM#138277, on 6q16, former MRT6), TUSC3 (OMIM#601385, on 8p22, former MRT7), ZNF526 (no OMIM#, on 19q13.2), TRAPPC9 (OMIM#613192, on 8q24, former MRT13), ZC3H14 (OMIM#613279, on 14q31.3), and TECR (MIM*610057 on 19p13.12) (Figure 1).1, 11, 12, 13, 14, 15, 16, 17, 18, 19

Figure 1

Overview of NS-ARID loci reported to date7, 8, 9, 10 (blue bars), and novel loci reported here (red bars). Positions of known NS-ARID genes are marked with an arrow next to the gene symbol. The diagram was drawn with the help of the GenomeGraph tool of the UCSC genome browser.

Taking into account that over 90 genes are responsible for only about 40% of X-chromosomal recessive ID cases, and that about half of the estimated 22 000 human genes are expressed in the brain, the total number of ARID genes may run into the hundreds.1 Considering the heterogeneity observed in previous linkage studies, and the fact that the genes identified so far do not account for a significant fraction of NS-ARID cases, systematic approaches are the most promising strategy to identify further genes.7, 8, 10 Large consanguineous families provide enough information to map loci on the basis of analysis in a single family, and thus represent the best starting point. We present here the results of homozygosity mapping in a series of 64 Syrian multiplex consanguineous families with NS-ARID.

Patients and methods

Affected families were identified with the help of local pediatricians. After obtaining written informed consent, all affected individuals were clinically evaluated by a pediatrician (AAb, AAl, MF, SH, AI, or SM) and by a geneticist (RAJ), with special attention to neurological, morphological, ophthalmological, dermatological, and skeletal symptoms. The mental status of affected and unaffected individuals was estimated by investigation of verbal and motor aptitudes at the time of the visit. Parents were interviewed to establish the prenatal, perinatal, and neonatal medical history of all children, whether affected or unaffected. As formal assessment of ID was not possible, we used a description of developmental milestones and abilities to evaluate the severity of ID, as previously described.20 A detailed pedigree was constructed, and the genealogical relationship within families was crosschecked by interviewing different family members. Blood samples were drawn, and DNA was isolated from all available individuals. For one patient of each nuclear family, testing for Fragile X syndrome was carried out by PCR. In addition, all mothers were checked for X-inactivation status.

Of the 85 families initially ascertained, 11 were excluded because of insufficient blood samples, non-random X-inactivation in the mother of the index case suggesting X-linked inheritance, or uninformative clinical and family history. Four further families were excluded because a diagnosis could be made (phenylketonuria, Miller-Dieker syndrome, van der Knaap leukoencephalopathy, Cohen syndrome). Six further families with only one affected child each were also not analyzed further, as an autosomal dominant cause could not be excluded. Of the remaining 64 families, six nuclear families could be aggregated into two complex families, with three branches of affected cousins each, and a further 16 families into eight complex families, with two branches each. In total, 41 nuclear and 10 complex families consisted of 305 children (140 affected), and 128 parents were subject to molecular genetic analysis. Of those, 31 families had two affected children, and 20 families had three or more affected children. Kinship coefficients varied between 0.0039 and 0.1563. Of these families, 12 had only affected males, and 11 only affected females. Though X-chromosomal recessive inheritance cannot be definitely excluded, it is unlikely, as mothers of affected males were healthy and had several healthy brothers, nephews, and uncles on the maternal side, Fragile X syndrome was excluded, and X-inactivation was unremarkable.

Affected children presented with moderate ID in 45 nuclear families and with severe ID in 19 nuclear families. All cases were non-specific, that is, no peculiar or distinguishable combination of symptoms was found. Further non-specific symptoms documented were epilepsy in 17 families, growth retardation in 11 families, muscular hypertonia in five families and hypotonia in 17 families, microcephaly in 14 families, ataxia in four families, and developmental regression in three families. The symptoms and severity were comparable in sibs. As many syndromes have variable phenotypes, we cannot exclude syndromic forms with certainty (Table 1).

Table 1 Structure and clinical description of the 12 families, which showed one linkage locus

All available affected individuals and their healthy siblings were genotyped; healthy parents were genotyped only when the grandparents were also consanguineous. In total, 154 healthy and 128 affected persons were genotyped, 11 using Human610-Quad DNA Analysis BeadChips (Illumina, San Diego, CA, USA), 72 with HumanCytoSNP-12 DNA Analysis BeadChips (Illumina), and 201 with Genome-Wide Human SNP Array 6.0 (Affymetrix, Santa Clara, CA, USA) microarrays. The platform used was consistent within each family, with the exception of three individuals recruited at a later stage. Marker density was approximately 600 000, 300 000, and 900 000 SNPs, respectively. The minimal coverage of approximately 300 000 SNPs is more than sufficient to identify homozygous regions of at least 500 kb (average of 50 markers).
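The figure of about 50 markers per 500 kb can be checked with simple arithmetic. The sketch below is illustrative only: it assumes evenly spaced markers and a genome length of roughly 3100 Mb (an assumption of this sketch, consistent with the Results, where 1% of the genome is given as 31 Mb). The function and constant names are hypothetical.

```python
GENOME_MB = 3100   # assumed genome length in Mb (not stated in the Methods)
SEGMENT_KB = 500   # minimal homozygous segment length stated in the text

def expected_markers(n_snps, segment_kb=SEGMENT_KB, genome_mb=GENOME_MB):
    """Expected number of SNPs in a segment, assuming uniform marker spacing."""
    return n_snps * segment_kb / (genome_mb * 1000)

# Approximate marker counts per platform, as given in the text.
platforms = [
    ("HumanCytoSNP-12", 300_000),
    ("Human610-Quad", 600_000),
    ("Genome-Wide Human SNP Array 6.0", 900_000),
]

for name, n_snps in platforms:
    print(f"{name}: ~{expected_markers(n_snps):.0f} markers per {SEGMENT_KB} kb")
```

With these assumptions the sparsest array yields about 48 markers in a 500 kb segment, in line with the stated average of 50.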

To detect the presence of copy number variants, molecular karyotyping was performed using either the Affymetrix Genotyping Console Software (Version 3.0.2), or the PennCNV software21 for the Illumina arrays. In identical-by-descent (IBD) regions, we set filter criteria to ≥10 kb and ≥5 markers (Affymetrix), and ≥20 kb and a confidence value ≥10 (Illumina). To discriminate between genomic polymorphisms and aberrations containing MR-associated genes or regions, we compared findings within each family with the Database of Genomic Variants (http://projects.tcag.ca/variation/) and with 820 (Affymetrix) and 750 (Illumina) internal controls.
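The platform-specific filter criteria above can be written down as a small predicate. This is a hedged sketch only: the `CnvCall` record, its field names, and the example calls are hypothetical and do not reflect the output format of the Genotyping Console or PennCNV. Only the thresholds are taken from the text.

```python
from dataclasses import dataclass

@dataclass
class CnvCall:
    platform: str             # "affymetrix" or "illumina" (hypothetical labels)
    length_kb: float          # length of the called segment in kb
    n_markers: int            # number of markers supporting the call
    confidence: float = 0.0   # confidence value, used here for Illumina calls only

def passes_filter(call: CnvCall) -> bool:
    """Apply the size and support thresholds stated in the text."""
    if call.platform == "affymetrix":
        return call.length_kb >= 10 and call.n_markers >= 5
    if call.platform == "illumina":
        return call.length_kb >= 20 and call.confidence >= 10
    raise ValueError(f"unknown platform: {call.platform}")

# Made-up example calls, for illustration only.
calls = [
    CnvCall("affymetrix", length_kb=12.0, n_markers=6),
    CnvCall("affymetrix", length_kb=50.0, n_markers=3),
    CnvCall("illumina", length_kb=25.0, n_markers=8, confidence=14.2),
    CnvCall("illumina", length_kb=15.0, n_markers=9, confidence=30.0),
]
kept = [c for c in calls if passes_filter(c)]
print(len(kept), "of", len(calls), "calls pass the filter")
```

Calls that pass such a filter would still need to be compared with polymorphism databases and internal controls, as described above.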

Mendelian segregation was calculated using PedCheck and the EasyLinkage interface software, and was confirmed in all instances.22, 23, 24 Genome-wide analysis was performed using Homozygosity Mapper and re-checked manually.25 In addition, linkage analysis was performed using ALLEGRO or GeneHunter under an autosomal recessive mode of inheritance, with 99% penetrance and a disease allele frequency of 0.001, using the EasyLinkage interface software.23, 26, 27
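The idea behind homozygosity mapping can be illustrated with a deliberately simple sketch. It is not the algorithm used by Homozygosity Mapper, ALLEGRO, or GeneHunter: it ignores genotyping errors, missing calls, and unaffected relatives. The genotype coding ("AA", "AB", "BB"), the function name, and the default thresholds (borrowed from the 500 kb and 50 marker figures mentioned earlier) are assumptions made for illustration.

```python
def shared_homozygous_runs(positions, genotypes, min_markers=50,
                           min_length_bp=500_000):
    """Find runs of markers at which all affected individuals are homozygous
    for the same allele.

    positions: sorted base-pair positions of the markers on one chromosome.
    genotypes: dict mapping sample name to a list of genotype strings,
               one per marker, in the same order as `positions`.
    Returns a list of (start_bp, end_bp, n_markers) tuples.
    """
    samples = list(genotypes.values())
    runs, start = [], None
    # One extra iteration closes a run that extends to the last marker.
    for i in range(len(positions) + 1):
        shared = False
        if i < len(positions):
            calls = {g[i] for g in samples}
            shared = len(calls) == 1 and calls <= {"AA", "BB"}
        if shared and start is None:
            start = i
        elif not shared and start is not None:
            end = i - 1
            n = end - start + 1
            if n >= min_markers and positions[end] - positions[start] >= min_length_bp:
                runs.append((positions[start], positions[end], n))
            start = None
    return runs

# Toy data: 100 markers spaced 10 kb apart, two affected siblings.
positions = [i * 10_000 for i in range(100)]
genotypes = {"affected_1": ["AA"] * 100, "affected_2": ["AA"] * 100}
genotypes["affected_2"][80] = "AB"   # a heterozygous call breaks the run

print(shared_homozygous_runs(positions, genotypes))  # [(0, 790000, 80)]
```

In the toy data, the heterozygous call at marker 80 splits the chromosome into a reported run of 80 markers and a second run of 19 markers that falls below the marker threshold.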

Results and discussion

Novel ARID gene loci

Twelve families showed only one IBD region each, with significant LOD scores above 3. Eleven of these were not included in published regions and were thus designated MRT numbers (HUGO Gene Nomenclature Committee, http://www.genenames.org/index.html, see Tables 1 and 2). In the 12th family, MR055, we identified a mutation in TRAPPC9. The linked regions vary in length between 1.2 and 45.6 Mb, and include between 9 and 625 RefSeq genes (Table 2).

Table 2 Genetic findings in 12 families showing a unique linkage locus

In the era of next-generation sequencing, families with IBD chromosomal regions are of great value, even if they have more than one IBD region. In addition to the 11 families with a single locus, a further 17 families showed IBD regions with a total length of less than 1% of the genome (<31 Mb), distributed over two (six families), three (eight families), four (one family), or five (two families) IBD regions. These IBD segments will be highly valuable for identifying further mutations in candidate genes. CNV analysis in IBD regions identified no deletion or duplication in candidate genes. In one family with seven IBD regions, we found a relatively common, recently described deletion encompassing the gene OTOA at 16p12.2, explaining their hearing impairment, but not the ID.28

Identifying the causative mutation in TRAPPC9 in family MR055

Family MR055 consists of two branches (a and b). In the first branch, parents were cousins I°, and there were three affected females (two are true twins) and one healthy male. In the second branch, parents were cousins III° once removed, and there were three affected males (two are true twins) and one healthy female. The two branches are distantly related, as great-grandparents were cousins.

At the time of examination, the three affected boys of family MR055a were 13 and 7 (twins) years old, 141, 116, and 116 cm tall, respectively, and had head circumferences of 47, 45.5, and 46 cm, respectively. Pregnancies and births were unremarkable; in the neonatal period, parents reported muscular hypotonia. The affected boys learned to walk at the age of 5 years; they can understand simple phrases, but cannot speak. All have stereotypic movements and hand-flapping behavior. The elder brother had a few epileptic seizures. The affected brothers resemble each other, with low frontal hairline, synophrys, and microcephaly. At the time of examination, the three affected women of family MR055b were 33 and 23 (twins) years old, 132, 133, and 139 cm tall, respectively, and had head circumferences of 43.5, 42, and 42 cm, respectively. Pregnancies and births were unremarkable. They learned to walk at the age of 7 years, never learned to speak, and do not understand simple phrases. All have stereotypic movements and hand-flapping behavior. None of the affected women had epileptic seizures. The elder affected woman is spontaneously losing her teeth and losing weight. The parents reported that this started at the age of 20, and it can also be observed in the two younger affected sisters (23 years at the time of examination). As in the first branch, the affected sisters resemble each other, with low frontal hairline, synophrys, and microcephaly.

Family MR055 showed significant linkage on chromosome 8 (Table 2), including TRAPPC9, which is a perfect candidate gene.16, 17, 18 We sequenced the gene and identified the mutation c.1423C>T;p.R475X recently reported in a Palestinian family.17 Because of the geographical and ethnic proximity (our family MR055 is from the south of Syria), and assuming an extremely low frequency of this mutation, we postulate that this is a founder mutation of this specific population and/or geographical area. Symptoms of the Palestinian and Syrian patients were comparable: moderate to severe ID, microcephaly, hand-flapping movements, no regression, and no epilepsy. In contrast to our patients, Mir et al16 reported normal height and less severe ID, and Philippe et al18 reported normal height and obesity. Thus, we conclude that mutations in TRAPPC9 cause ID, with a tendency to severe forms and microcephaly. Further symptoms such as short stature, stereotypic movements, epilepsy, and remarkable MRI findings do not seem to be specific. Because of the relative similarity of our patients and the Palestinian patients, we do not exclude a genotype–phenotype correlation.

Loci overlapping known ARID loci

Apart from family MR055 with the TRAPPC9 mutation, none of the families with one linkage locus overlapped any of the 10 ARID genes described to date. Nonetheless, five families overlap seven already known loci for which the causative gene has not yet been identified (Figure 1). Overlapping findings are of importance, as these might facilitate the finding of a second allele. Further, we checked for overlapping IBD regions across all families studied, that is, including families with several IBD regions. We identified five regions, each with overlapping IBD regions in five families, on chromosomes 3 (175.5–177.5 and 178.5–180.6 Mb), 19 (7.6–7.9 and 8.3–9.2 Mb), and 22 (20.2–21.1 Mb, all hg19), and 13 regions with four overlapping IBD regions each. These regions harbor interesting candidates for further analysis. Nevertheless, in the 51 families, we identified a total of 275 IBD segments encompassing 3196 Mb; thus any overlap may also be coincidental.
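The search for regions shared by several families can be sketched as an interval-overlap problem. The code below is a minimal illustration under stated assumptions, not the procedure used in this study: segments are given as (family, chromosome, start, end) tuples in Mb, and the family labels and coordinates in the example are invented.

```python
from collections import defaultdict

def shared_regions(segments, min_families):
    """Return maximal regions covered by IBD segments of at least
    `min_families` distinct families.

    segments: iterable of (family, chrom, start_mb, end_mb) tuples.
    Returns a list of (chrom, start_mb, end_mb) tuples.
    """
    by_chrom = defaultdict(list)
    for family, chrom, start, end in segments:
        by_chrom[chrom].append((start, end, family))

    regions = []
    for chrom in sorted(by_chrom):
        segs = by_chrom[chrom]
        # All segment boundaries split the chromosome into elementary intervals.
        points = sorted({p for s, e, _ in segs for p in (s, e)})
        for a, b in zip(points, points[1:]):
            families = {f for s, e, f in segs if s <= a and b <= e}
            if len(families) < min_families:
                continue
            if regions and regions[-1][0] == chrom and regions[-1][2] == a:
                # Extend the previous region when the intervals are adjacent.
                regions[-1] = (chrom, regions[-1][1], b)
            else:
                regions.append((chrom, a, b))
    return regions

# Invented example segments (Mb), for illustration only.
example = [
    ("F1", "3", 175.0, 178.0),
    ("F2", "3", 175.5, 177.5),
    ("F3", "3", 176.0, 180.0),
]
print(shared_regions(example, min_families=3))  # [('3', 176.0, 177.5)]
print(shared_regions(example, min_families=2))  # [('3', 175.5, 178.0)]
```

Because the elementary intervals are checked one by one, this version is quadratic in the number of segments per chromosome, which is acceptable for a few hundred segments; a sweep-line variant would scale better.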

Intrafamilial heterogeneity in complex families

The two complex families MR061 and MR071, each consisting of three branches, a, b, and c, are an example of the impressive heterogeneity of the NS-ARID phenotype (Figure 2). Neither family showed one common linkage region for all three branches. When analyzed together, families MR061a and MR061b showed one homozygous locus on chromosome 14q11-q12 (MRT26, 9.1 Mb long, LOD score of 3.85), whereas branch c does not map to this locus (Figure 2). The nuclear families MR071a, MR071b, and MR071c also did not show a common homozygous region. Depending on the analyzed combination of nuclear families, different linked loci are possible. Although the combination of either MR071a and MR071b, or MR071a and MR071c showed four linkage regions of up to 30 Mb with LOD scores of ≥2.4 each, the combination MR071b and MR071c showed one homozygous region of about 33 Mb on chromosome 15q21-q25 (LOD score of 2.7). Though we would expect that genotyping further members of this family of extensive consanguinity (kinship coefficient >0.15) might define the true linkage region, this family shows that even high LOD scores must be interpreted cautiously. On the other hand, suggestive LOD scores may turn out to be true findings. An example has been shown by Najmabadi et al,7 who identified a LOD score of only 2 in family M001; a mutation was later identified in TRAPPC9, which is located in this region, showing that the locus was genuine although the linkage analysis was not significant.7, 16

Figure 2

Overview of the 12 families that showed one linkage locus. The pedigrees were simplified as far as possible, and important information such as healthy brothers of mothers or grandmothers is shown. Asterisks (*) indicate genotyped DNAs.

Eight complex families, MR006, MR013, MR018, MR019, MR022, MR043, MR049, and MR055, have two branches each, a and b. Families MR022a and MR022b, and MR049a and MR049b showed no common homozygous region (Supplementary Figure 1). Five of the other six families, MR006, MR013, MR019, MR043, and MR055, each showed one common homozygous region with a LOD score >3, on 6q12-q15 (MRT18), 18p11.32-p11.31 (MRT19), 16p12.1-q12.1 (MRT20), 11p15.5-p15.4 (MRT21), and over TRAPPC9, respectively (Table 2).

We conclude that large and complex families represent a promising template to identify candidate regions, but bear the risk of false positive results because of heterogeneity. At the extreme end, such heterogeneity may exist within the same nuclear family.

Conclusions

Our findings suggest that NS-ARID is extremely heterogeneous in the Syrian population, which is in line with earlier findings in other populations.7, 8 Autosomal recessive disorders are an even greater challenge in populations with low rates of consanguineous marriage, as affected individuals will often be compound heterozygous, and their disease thus not due to IBD mutations. In addition, it seems certain by now that mutations in the currently known NS-ARID genes (PRSS12, CRBN, CC2D1A, GRIK2, TUSC3, TRAPPC9, ZNF526, ZC3H14, ST3GAL3, and TECR) are responsible for only a small fraction of NS-ARID cases, whereas hundreds of additional genes are likely to be identified in the near future. The most promising concept for this is the systematic analysis of a large number of consanguineous families, followed by ultra-deep sequencing strategies. Identification of these genes will improve genetic counseling of affected families, and advance our understanding of molecular networks involved in cognitive processes. Understanding these processes should allow the replacement of symptomatic and supportive therapies with pharmacotherapies based on a principled understanding of the causes of cognitive impairment, as is already becoming evident for several neurodevelopmental disorders.29