Introduction

22q13.3 deletion syndrome, also called Phelan–McDermid syndrome (PMS or PHMDS, OMIM 606232), is a neurodevelopmental disorder-associated syndrome caused by heterozygous contiguous gene deletions at chromosome 22q13 or by pathological variants of SHANK3 [1]. The clinical PMS phenotype can vary widely and lacks a distinguishing feature. PMS is characterized by intellectual disability (global developmental delays), loss or absence of speech, neonatal hypotonia, autism spectrum symptoms, mild to severe motor impairments, and minor dysmorphic traits [2], resulting in significant, life-long care needs. The overwhelming majority of cases identified thus far involve SHANK3, a distal gene of 22q13.33 [2,3,4,5,6,7,8,9]. Further, SHANK3 variants have been identified in individuals with a full range of phenotypical severity [7, 10,11,12,13]. In general, SHANK3 variants have been associated with cases of intellectual disability (ID) at a rate of about 2% [14], autism spectrum disorder (ASD) at a rate of about 0.7% [14] and schizophrenia at a rate of 0.6–2.16% [15, 16], making it an important target for possible pharmacological intervention in these major neuropsychiatric and neurodevelopmental disorders.

The genetic landscape of PMS is large. Terminal deletions can vary in size from roughly 0.2 to 9.2 Mbp, impacting up to 108 separate protein-coding genes (PMS-associated genes, “PMS genes”). The 108 PMS genes extend from 22q13.2 to 22q13.33 [5, 7, 10, 11, 17,18,19]. Studying the influences of genes other than SHANK3 has been difficult because terminal deletions that include SHANK3 comprise the bulk of reported cases. Some individuals with interstitial deletions, who retain both copies (presumed without deleterious variants) of SHANK3, can have phenotypes indistinguishable from individuals with SHANK3 involvement [5, 10, 17, 18], raising the possibility that much of what is seen in the broader PMS population may be under the influence of many genes of 22q13.3. Further, cohort studies have consistently demonstrated deletion size effects on both core features and other manifestations of PMS [9, 11, 20,21,22].

The large and, as yet, poorly documented range of genetic contributions to PMS impacts the search for therapeutic interventions. Individuals with PMS provide the only available cohorts for SHANK3-targeted clinical studies of ASD [23,24,25]. Cohorts are assembled from fewer than 1600 cases worldwide [26]. The majority of this population have terminal deletions with no evidence for a preferred breakpoint [7, 10]. A possible recurrent breakpoint of ~150 bp in SHANK3 was postulated based on three cases with similar breakpoints [27]; however, these authors later reported no evidence of preferred deletion size for terminal deletions in 30 subjects [11]. We calculated the mean and median terminal deletion sizes from the graphical data in Sarasua et al. [10] (5.20 and 4.77 Mbp, respectively) and the tabular data in Soorya et al. [7] (4.27 and 4.28 Mbp, respectively). The mean and median deletion sizes are similar both within and between studies, as would be expected for a uniform distribution. If the average terminal deletion size is taken as 4.5 Mbp, that value represents a loss of just under half of the 108 PMS genes (Supplementary Figure S1). Therefore, PMS commonly entails the loss of many genes. The frequency of loss for individual genes is quantified in Methods.
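The mean-versus-median comparison can be illustrated with a minimal sketch. The deletion sizes below are hypothetical, evenly spaced values over the reported 0.2 to 9.2 Mbp range, not data from either cited study, and the function name is an assumption introduced for illustration. For such an even spread, the mean and median coincide at the midpoint of the range.

```python
from statistics import mean, median


def summarize_deletion_sizes(sizes_mbp):
    """Return (mean, median) of terminal deletion sizes in Mbp."""
    return mean(sizes_mbp), median(sizes_mbp)


# Hypothetical cohort: 10 evenly spaced sizes from 0.2 to 9.2 Mbp.
hypothetical_sizes = [0.2 + i * 1.0 for i in range(10)]
avg, med = summarize_deletion_sizes(hypothetical_sizes)
```

A skewed cohort (for example, one dominated by small deletions) would pull the mean away from the median, which is why their agreement is consistent with, though not proof of, a uniform distribution of breakpoints.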

In the present study, we describe multiple genes near SHANK3 that are highly expressed in the brain and have a very high probability of “loss of function intolerance” as described in the bioinformatics study by Lek et al. [28, 29]. Loss of function intolerance is based on genetic variations in the general population that reflect locations on the genome where variation is likely to impact normal gene function in a way that negatively influences the individual’s wellbeing. Based on these probabilities, we identify both well-characterized and poorly-characterized PMS genes that may contribute to clinically relevant manifestations of the syndrome. We expect these results to guide validating functional studies, aid interpretation of SHANK3-targeted clinical studies and provide data for identification of new therapeutic targets for improving functioning of individuals with PMS.

Materials and methods

Methods for gene function discovery and likelihood estimation of gene contributors to PMS

Identifying potential contributors to features associated with PMS involves two processes that are not mutually exclusive: (1) matching up gene function with PMS phenotype (function discovery) and (2) estimating the probability that a gene or a group of related genes contribute to the disorder (likelihood estimation). Figure 1 shows the decision pipeline.

Fig. 1

Gene selection and classification pipeline. PMS genes were selected for classification through a “functional discovery” literature review, as described in Materials and methods. PMS genes were independently rank ordered based on “probability of loss of function intolerance” (pLI) scores from the ExAC database (http://exac.broadinstitute.org/) and the population impact factor (PIF), as described in Materials and methods. Two overlapping groups were created: Group I (High pLI, pLI > 0.9 and PIF > 0.5) and Group II (High PIF, pLI > 0.7 and PIF > 0.95). Genes in either group with sufficient information were included in the gene reviews along with incidentally associated genes (see Table 1). Numbers in brackets are the numbers of genes. Groups I and II share six genes.

Function discovery

We applied several methods for function discovery. The Online Mendelian Inheritance in Man (OMIM) catalog of genetic disorders [30] was searched to identify which PMS genes might be associated with reported phenotype characteristics of PMS. Genes associated with recessive inheritance were included under the assumption that they may contribute to the PMS phenotype with only single copy loss when these genes have a high likelihood of contribution [29].

Our second approach for function discovery was to survey existing literature on individual genes, including cellular expression, regional expression, metabolic pathways, disease association, animal models and in vitro models. This survey began with four widely used online resources: PubMed [31], Ensembl [32], Gene Ontology Consortium (GO) [33] and GeneCards [34]. The evidence for functional association is discussed separately for each PMS gene that is reviewed here. We considered any manifestation that has been reported with a frequency ≥20% in at least one of the following studies of 10 or more individuals [6,7,8,9,10,11, 35,36,37]. For the purpose of function discovery, we assume that loss of one or more PMS genes drives that trait. Poorly characterized genes (unknown function) are treated as their own “phenotype” class.

We used curated databases and genome-wide association studies (GWAS) for both functional discovery and likelihood of contribution of neuropsychiatric-neurodevelopmental genes by locating PMS genes found in the following sources: Class I–III genes from ASD quartets [38], TADA-65 genes [39], and Category S and Category 1–3 genes in SFARI Gene as of March 2017 [40]. We did not explicitly include genes from the Autism Speaks MSSING project [41], because our likelihood estimation criteria are a superset of that project’s initial screening criteria.

Likelihood estimation

Likelihood estimation is based on the probability of loss of function (LoF) intolerance (pLI) score from the aggregation of over 60,000 genomes provided by the ExAC project [28, 29]. pLI was computationally determined from high-quality exome sequencing of adults with no reported major childhood/developmental disorders. The full database was used to create a reference sequence for each gene, then variants from this reference were counted for each gene in each individual. Conceptually, impactful genes that are sensitive to variants will be selected out of a population, leaving fewer variants than expected by chance [42]. Lek et al. [29] developed a model that probabilistically assigns to each gene the likelihood that it should be classified into these categories: not sensitive to variants, haplosufficient and haploinsufficient, based on the difference between expected number of variants and observed number of variants. The degree to which each gene might belong in one, two, or all three of these categories, was fit with empirically derived frequencies of haplosufficient and haploinsufficient genes using Bayesian maximum-likelihood estimation. The pLI is calculated by normalizing the probability that a gene is haploinsufficient relative to the three possible categories. Thus, pLI is intolerance for either heterozygous or homozygous LoF. A pLI value of 1 is a high likelihood of intolerance (predicted fully haploinsufficient gene). We used a pLI > 0.9 for LoF variants as the threshold for likely candidates except as noted. In some cases, likelihood is discussed based on other evidence, which will be described in detail.
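The normalization step can be written compactly. This is a sketch in illustrative notation, not the notation of Lek et al.: the symbols $\pi_c$ (fitted mixture proportion of category $c$) and $\mathcal{L}_c(g)$ (likelihood of the observed LoF variant count for gene $g$ under category $c$) are assumptions introduced here, with NS, HS and HI standing for the not sensitive, haplosufficient and haploinsufficient categories.

```latex
\mathrm{pLI}(g) =
  \frac{\pi_{\mathrm{HI}}\,\mathcal{L}_{\mathrm{HI}}(g)}
       {\pi_{\mathrm{NS}}\,\mathcal{L}_{\mathrm{NS}}(g)
        + \pi_{\mathrm{HS}}\,\mathcal{L}_{\mathrm{HS}}(g)
        + \pi_{\mathrm{HI}}\,\mathcal{L}_{\mathrm{HI}}(g)}
```

Under this form, pLI lies between 0 and 1 and approaches 1 only when the haploinsufficient category explains the observed depletion of variants far better than the other two.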

Plotting of gene positions

To capture the probable impact of deletion size on phenotype, we plot gene candidates as a function of chromosome position. Gene loci are plotted based upon the most distal exon of the most common isoform. Two-dimensional plots use the same positional criteria for the abscissa and the likelihood estimation (pLI) for the ordinate.

Population impact factor and gene selection

Genes other than SHANK3 were selected as candidates for detailed review of their possible contribution to PMS. Initial gene selection was based on pLI and the estimated population impact factor (PIF). The PIF is the proportion of the population in which the gene has been disrupted. The PIF is estimated for a given gene by dividing the number of cases of gene disruption by the size of the population. We use terminal deletions from Sarasua et al. [10] for this estimation because it is the largest published sample currently available, although the results are unlikely to vary substantively if we used data from other published studies because of the uniformity in breakpoint locations [7, 11]. Disruption is defined as any deletion that removes part, or all, of the canonical transcript of a gene. To calculate the PIF value, the breakpoints of all 105 terminal deletion cases from Sarasua et al. were located on genome build hg38 based upon the reported deletion sizes. This list was compared to the locus of the last exon of the canonical transcript for each gene. The PIF is calculated as the number of case breakpoints that are greater than the locus of the last exon divided by the total number of cases (105). That is, the PIF is the number of individuals who experienced loss or disruption of the gene divided by the total number of individuals in the population.
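The PIF calculation can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name, the five-case cohort and the gene offsets are hypothetical, and positions are expressed as distances from the chromosome terminus (an assumption made here so that a larger deletion size always means a more proximal breakpoint).

```python
def population_impact_factor(gene_distal_offset_mbp, deletion_sizes_mbp):
    """Fraction of terminal-deletion cases that disrupt a gene.

    gene_distal_offset_mbp: distance from the chromosome terminus to the
        most distal exon of the gene's canonical transcript (assumed input).
    deletion_sizes_mbp: terminal deletion sizes, one value per case.
    """
    if not deletion_sizes_mbp:
        raise ValueError("at least one case is required")
    # A terminal deletion removes everything between its breakpoint and the
    # terminus, so it reaches the gene when it is longer than the offset.
    disrupted = sum(
        1 for size in deletion_sizes_mbp if size > gene_distal_offset_mbp
    )
    return disrupted / len(deletion_sizes_mbp)


# Hypothetical cohort of five cases and two hypothetical genes.
cases = [0.3, 1.5, 4.0, 6.5, 9.0]
pif_distal = population_impact_factor(0.1, cases)    # gene near the terminus
pif_proximal = population_impact_factor(5.0, cases)  # gene 5 Mbp from terminus
```

In this toy cohort the distal gene is disrupted in every case (PIF = 1.0), whereas the proximal gene is disrupted only by the two largest deletions (PIF = 0.4), which shows why the metric favors genes close to the terminus.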

Two overlapping groups of genes were selected for review based on a combination of pLI and PIF. The first group (Group I) has pLI > 0.9 and PIF > 0.5 (very probably intolerant of LoF variants and over half of the population potentially affected by LoF). The PIF criterion was relaxed for SULT4A1 because it has been specifically implicated in a study of interstitial deletions [18]. In that study, Disciglio et al. identified a minimal deleted region for nine cases of interstitial deletions roughly 3–7 Mbp in size. SULT4A1 and PARVB were identified as functionally the most likely candidates among 12 genes in the minimal deleted region [18] (see Supplementary Figure S1). Relaxing the PIF criterion for PARVB would have no effect given its very low pLI.

The second group of genes (Group II) has pLI > 0.7 and PIF > 0.95 (likely intolerant of LoF variants and 95% of the population potentially affected by LoF), to not overlook somewhat less likely pLI genes that could impact a high percentage of the population.
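The two selection rules can be expressed as a short sketch. The thresholds are those stated above; the function name, gene labels and score pairs are hypothetical placeholders, and the relaxed PIF criterion for SULT4A1 is deliberately not modeled.

```python
def classify_gene(pli, pif):
    """Return the set of review groups ("I", "II") a gene falls into."""
    groups = set()
    if pli > 0.9 and pif > 0.5:    # Group I: high pLI
        groups.add("I")
    if pli > 0.7 and pif > 0.95:   # Group II: high PIF
        groups.add("II")
    return groups


# Hypothetical (pLI, PIF) pairs; these are not values from the study.
examples = {
    "GENE_A": (1.00, 0.99),  # meets both criteria
    "GENE_B": (0.77, 1.00),  # high PIF only
    "GENE_C": (1.00, 0.60),  # high pLI only
    "GENE_D": (0.47, 0.99),  # meets neither
}
membership = {
    gene: classify_gene(pli, pif) for gene, (pli, pif) in examples.items()
}
```

Returning a set rather than a single label reflects the fact that the groups overlap: a gene with very high pLI and very high PIF belongs to both.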

Genes were not reviewed if we could not find sufficient evidence to identify a possible association between a phenotypic manifestation of PMS and the known clinical or molecular role of the gene, either because of insufficient literature or an unclear association. Genes were not reviewed if no pLI values were reported for those genes [29]. PMS genes closely associated with a reviewed gene are included in the review for completeness.

Results

Function discovery

Manual curation provided gene candidates that might contribute to common PMS characteristics reported in both retrospective and prospective studies [7,8,9,10,11, 19,20,21,22, 43]. We were unable to find specific studies on some genes (e.g., TTC38, PRR34), whereas other genes have a rich literature (e.g., SHANK3). We found a mean of about 2 references/gene (227/108 = 2.10). Using the production of a transgenic mouse model as a measure of research investment into gene function, we found that mouse models have been created and validated for nearly one third of PMS genes (34/108 = 31%). The results are summarized using the format recommended by Cody and Hale [44] in Supplementary Table S1. Curated databases and GWAS identified a short list of ASD-associated neuropsychiatric genes (SHANK3, CHKB, SBF1, and TCF20).

Classification

Function discovery provided sufficient information to arrange 44 genes into 14 phenotype-related classes (Supplementary Table S2). Classes are not mutually exclusive to allow for pleiotropy. Twelve of these classes are summarized in Fig. 2. All classes, genes assigned to each class, evidence for the assignment and all associated pLI and PIF scores are listed in Supplementary Table S2.

Fig. 2

Gene function discovery for 12 phenotype characteristics. Top part of figure shows the 12 characteristics + unclassified plotted across 9.2 Mbp of 22q13.3. The terminus is at the right edge of the plot. Numbers below the top plot show chromosome locations in millions of bp (Mbp) referenced to human genome build hg38. The bottom part is an expanded plot of the distal 1 Mbp of the chromosome. Symbols are the same within a characteristic. Pleiotropic genes are represented more than once in some cases. pLI estimation is evaluated separately. There are 64 unclassified genes; only those with pLI > 0.9 are plotted.

Functional discovery identified 43 of the 108 PMS genes with possible phenotype association. Eighteen of the forty-three genes have either a direct association with a neurodevelopmental disorder or regulate a key feature of neurodevelopment (e.g., neuronal migration, differentiation, cell polarity). Eleven of these genes reside on the distal 1 Mbp of the chromosome (Fig. 2 expanded graph and Supplementary Table S2). However, the remaining neurodevelopmental-associated genes are distributed throughout the adjacent ~8 Mbp of chromosome 22. Notably, genes associated with known autosomal recessive degenerative diseases (ARSA, CHKB, ODF3B, SCO2, SBF1, TUBGCP6, MLC1, ALG12, TBC1D22A, ATXN10) are all localized to distal ~5 Mbp of the chromosome.

Likelihood estimation

Of the 108 PMS genes, 100 meet criteria for assigning a pLI. The genes that do not meet criteria are: SYCE3, LMF2, TTLL8, C22orf34, PRR34, ARHGAP8, SHISA8, and C22orf46 [29]. The pLI values for PMS genes are tabulated in Supplementary Table S3 and plotted as a function of position on chromosome 22 in Supplementary Figure S2. Figure 3 shows the distribution of the 18 PMS genes with a pLI > 0.9, 7 of which can be found within the distal 1 Mbp of 22q13.33.

Fig. 3

Probability of LoF intolerance for PMS genes. Genes with high pLI scores (pLI > 0.9) are plotted as a function of chromosome position. Inset graph expands the distal chromosome. Each gene is assigned to only one class.

Reviews

Genes other than SHANK3 that meet the Group I and/or Group II criteria are listed in Table 1. Well-documented genes from this table are reviewed in detail along with genes that are incidentally associated with these genes (Supplementary Document S1). General features emerging from these reviews are summarized here.

Table 1 Genes that met one or both group criteria

The picture that emerges from reviewing high pLI and high PIF genes of PMS is that groups of genes may contribute to common features of the disorder, not unlike the groupings for all PMS genes (Fig. 2). Some of these groups can be tied to synaptic function and thus potentially contribute directly to the phenotypic manifestations associated with SHANK3 haploinsufficiency. Other groups are associated with more fundamental processes, such as brain development and neuronal organization, which may impact brain function separately from synaptic function per se. Such groups have the potential for pleiotropic impact. Five groups are described here and in detail in Supplementary Document S1.

The first group of PMS genes is the group of MAP kinase genes. There are three members of this group deleted in PMS patients, MAPK8IP2 (pLI = 1.00), MAPK11 (pLI = 0.47), and MAPK12 (pLI = 0.01). The MAPK pathway is a potent regulator of plasticity-related signaling cascades at excitatory synapses [45, 46]. MAP kinases are expressed in multiple brain regions (prefrontal cortex, hippocampus, cerebellum) and have been implicated in regulatory functions for different glutamate receptors (AMPA, NMDA, mGluR) as well as actin dynamics [47,48,49,50,51,52,53]. MAPK8IP2 has preclinical evidence as a potential contributor to the PMS phenotype, including behavioral, motor and motor learning deficits observed in homozygous (null) mice [47]. However, there is substantial potential for additional impact of MAPK11 and MAPK12 loss that would modify losses associated with SHANK3 and MAPK8IP2 because of their common pathways, association with neurotransmitters and association with the MEF2C transcription factor (Supplementary Document S1).

The second group of PMS genes is a pair of adjacent genes, each of which has been independently associated with neuropsychiatric disorders: BRD1 (pLI = 0.99) and ZBED4 (pLI = 0.99). Both genes are highly expressed in the brain prenatally and both are DNA-binding regulatory genes. Haploinsufficiency of BRD1 has a wide-ranging impact on neurodevelopment in general and on schizophrenia in particular [54,55,56].

The third group includes three PMS genes associated with multicellular organization. Complementary to the SHANK3 protein’s role as a scaffolding for internal cell structure [57], CELSR1 (pLI = 1.00), Plexin-B2 (PLXNB2, pLI = 0.99), and Fibulin-1 (FBLN1, pLI = 0.85) proteins provide critical external structure for neural assemblies. CELSR1 is involved in tissue patterning in the cochlea, lungs, brain and elsewhere (see Supplementary Document S1). Plexin-B2 regulates the timing of neuronal differentiation and migration during development [58]. Fibulin-1, an extracellular matrix protein, is associated with syndactyly and thinning of the myocardium in mice [59] and homozygous variants have been associated with syndactyly in humans [60]. Its impact on the nervous system is limited to cranial nerves IX and X, but it may also affect other organs [59].

The fourth group of PMS genes impacts cerebellar development and function. That group comprises six PMS genes (counting SHANK3). These genes are highly LoF intolerant and are highly and/or differentially expressed in the cerebellum: SHANK3 (granule cells) [61], ZBED4 (cerebellum) [62], MAPK8IP2 (Purkinje cells) [47], SULT4A1 (granule cell layer) [63], PIM3 (upregulated in cerebellum after seizures) [64], and PLXNB2 (granule cell differentiation and migration) [65]. In one study, structural abnormalities of the cerebellum were observed in 8 out of 10 subjects with PMS [37]. Testing of cerebellar motor function may be an especially useful biomarker to investigate the impact of polygenic loss on central nervous system (CNS) function (e.g., refs. [66, 67]).

The last group of PMS genes is mitochondria-associated genes. Four of these genes have previously been identified as a functional group associated with PMS, GRAMD4 (pLI = 1.00), SCO2 (pLI = 0), TYMP (pLI = 0), and CPT1B (pLI = 0) [43]. The role of mitochondrial genes in PMS is unclear, since there was no consistent phenotype or correlation with deletion size [43]. The review of RABL2B (pLI = 0.77) provides a possible alternative association with mitochondrial function. Rabl2, the mouse ortholog of human RABL2A/B, is important for mitochondria transport as demonstrated in the mouse liver [68]. It is notable that mitochondria are normally moved to post-synaptic boutons in an activity-dependent manner [69, 70]. We speculate that poor activity-dependent transport of mitochondria might exacerbate activity-dependent deficits of SHANK3 at glutamate post-synaptic terminals in the face of SHANK3 haploinsufficiency in a pattern that follows the differential patterning of RABL2A versus RABL2B in the brain [71]. Unfortunately, RABL2B specific function cannot be studied using typical methods in model species because of the unique evolutionary pattern of RABL2B in humans. Human-induced pluripotent stem cells (iPSCs) and iPSC-derived neurons from patients might provide an opportunity for studying the loss of RABL2B in PMS.

Discussion

To the best of our knowledge, this is the first systematic overview that identifies protein-coding genes commonly deleted in the 22q13 region that are likely contributors to manifestations of PMS. We focused on genes most frequently lost [7, 10] and those most likely to influence survival based on computational modeling of variant frequency (LoF intolerance) in the general population [29], using pre-determined selection criteria (Fig. 1). In parallel, we performed a literature-based functional discovery to classify all 108 PMS genes. We show that for PIF > 0.95 (95% of terminal deletions), six genes are very highly LoF intolerant and 5/6 are potential contributors to CNS dysfunction over and above SHANK3 haploinsufficiency.

Classification

Of the 108 PMS genes, we found sufficient information to classify 44 into functional categories (Supplementary Table S2). Genes that met the pLI and PIF criteria fell into five groups of potential impact on CNS function, including MAP kinases and genes associated with DNA regulation, multicellular organization, cerebellar function, and mitochondria. Identifying functional groups has two implications. First, multiple gene losses within a group may have cumulative effects. Second, if there are cumulative effects, therapeutic target selection may benefit from that knowledge.

Interstitial deletions

By design, the PIF metric favors distal genes to estimate population impact on the most common cases of PMS, terminal deletions. However, PMS does not exclude interstitial deletions that spare SHANK3 [2] and there is no distinguishing phenotype difference between terminal deletions and interstitial deletions that do not include SHANK3 [5, 10, 17, 18]. This similarity suggests that interstitial genes may contribute significantly to the common characteristics of PMS. Although the impact of deleted non-coding regions of chromosome 22 on SHANK3 regulation cannot be ruled out, it seems likely that high pLI genes, notably SULT4A1 [18], are potential causal genes for cases of interstitial deletions. The possibility of undetected pathological SHANK3 variants must also be ruled out.

Pleiotropic effects

Some of the genes we reviewed have evidence for pleiotropic effects. PIM3 and MAPK8IP2, for example, both participate in the regulation of glucose, and PIM3 has been associated with regulation of the circadian rhythm. ZBED4, FBLN1, and CELSR1 all influence kidney development (Supplementary Document S1). One third (4/12) of the genes most likely to contribute to the PMS phenotype are poorly characterized, but these genes may also be pleiotropic and they may not be any less important than the well characterized genes [72]. Some pleiotropic effects may be more penetrant than others for a given gene, and thus may contribute to the variability of features reported in individuals with PMS, such as sleep disturbances or kidney disorders.

Limitations of the study

Our results are computationally based and, as such, can only identify candidate genes. The actual contributions of each candidate to the PMS phenotype will depend on clinical and functional studies, with supporting evidence from animal models and cellular/molecular studies. Crucial genes may be missed using the pLI as a predictor [29]. The pLI is also not aimed at major polygenic losses as in PMS, whereas other approaches do address polygenic loss [73]. We note that our function discovery is based on previously identified roles for each gene, which may not capture critical additional (e.g., pleiotropic) action in the central nervous system. Separating the impact of SHANK3 disruption from the disruption of other genes is problematic, perhaps best addressed through the study of individuals with restricted interstitial deletions that spare SHANK3. However, this is challenging work considering the relatively small number of individuals with such genetic abnormalities.

Future directions

PMS cases with a deleted or an otherwise disrupted copy of SHANK3 provide valuable information on phenotype–genotype investigations of this important ASD-associated gene. Interpreting these studies, however, must consider the broader genetic landscape. The present work begins to fill in the picture, although considerably more needs to be done to understand the genetic origins of the PMS phenotype. Protein-coding genes are only part of the DNA lost in terminal or interstitial deletions. More detailed work is required to account for other genes, regulatory elements [74], and even the possible contributions of small open reading frames [75]. The unknown functions of poorly characterized protein-coding genes with very high pLI need to be addressed. Most importantly, more clinical studies are needed to validate the contributions of individual genes to the PMS phenotype, especially studies that include interstitial deletions. These steps will bring an opportunity to identify new therapeutic targets and bring aid to those awaiting relief from the difficult realities of PMS.