The impact of the metabotropic glutamate receptor and other gene family interaction networks on autism

Although multiple reports show that defective genetic networks underlie the aetiology of autism, few have translated into pharmacotherapeutic opportunities. Since drugs compete with endogenous small molecules for protein binding, many successful drugs target large gene families with multiple drug binding sites. Here we search for defective gene family interaction networks (GFINs) in 6,742 patients with the ASDs relative to 12,544 neurologically normal controls, to find potentially druggable genetic targets. We find significant enrichment of structural defects (P≤2.40E−09, 1.8-fold enrichment) in the metabotropic glutamate receptor (GRM) GFIN, previously observed to impact attention deficit hyperactivity disorder (ADHD) and schizophrenia. Also, the MXD-MYC-MAX network of genes, previously implicated in cancer, is significantly enriched (P≤3.83E−23, 2.5-fold enrichment), as is the calmodulin 1 (CALM1) gene interaction network (P≤4.16E−04, 14.4-fold enrichment), which regulates voltage-independent calcium-activated action potentials at the neuronal synapse. We find that multiple defective gene family interactions underlie autism, presenting new translational opportunities to explore for therapeutic interventions.

T he autism spectrum disorders (ASDs) represent a group of highly heritable childhood neuropsychiatric disorders characterized by a variable phenotypic spectrum of neurodevelopmental deficits of impaired socialization, reduced communication and restricted, repetitive, or stereotyped behaviour 1 . ASDs are four times more common in boys 2,3 , and the most recent prevalence estimates across the United States range from 1% 4 to 2% 5 , although a recent study reported a prevalence as high as 2.6% in a general school-aged population in South Korea 6 . The ASDs have an estimated heritability as high as 90% 7 based on data on monozygotic twin concordance studies [8][9][10] , whereas recent estimates of the sibling recurrence risk range from 19% to 22% 11,12 .
Despite being highly heritable, the vast majority of family studies suggest that the ASDs do not segregate as a simple Mendelian disorder, but rather display clinical and genetic heterogeneity consistent with a complex trait 13 . Indeed, recent studies estimate that the ASDs may comprise up to 400 distinct genetic and genomic disorders that phenotypically converge 14,15 . Common variants such as single-nucleotide polymorphisms seem to contribute to ASD susceptibility, but, taken individually, their effects appear to be small 16 . However, there is increasing evidence that the ASDs can arise from rare or 'private' highly penetrant mutations that segregate in families but are less generalizable to the general population [17][18][19] . Many genes implicated thus far, which are involved in chromatin remodelling, metabolism, mRNA translation and synaptic function, seem to converge in common pathways or genetic networks affecting neuronal and synaptic homeostasis 16 .
Such remarkable phenotypic and genotypic heterogeneity when coupled to the private nature of mutations in the ASDs has hindered identification of new genetic risk factors with therapeutic potential. However, it is noteworthy that many of the rare gene defects implicated in the ASDs belong to gene families. For instance, rare defects impacting multiple members of both the post-synaptic neuroligin (NLGN) gene family 20 as well as their pre-synaptic neurexin molecular-interacting partners 21,22 have long been reported in patients with ASDs. In addition, a number of other defective gene families with important functional roles have subsequently been well-characterized including ubiquitin conjugation 23 , gamma-aminobutyric acid receptor signalling [24][25][26][27] and cadherin/protocadherin cell junction proteins 28 in the brain. Furthermore, multiple defects in voltage-gated calcium channels have been found in schizophrenia 29 , and a defective network of metabotropic glutamate (GRM) receptor signalling was found in both ADHD 30 and schizophrenia [31][32][33][34][35][36] , two neuropsychiatric disorders that are highly coincident with the ASDs. Also, the vast majority of significant defective genes identified from recent whole-exome sequences belong to gene families [17][18][19] .
Many studies have found defective genetic networks in the ASDs 21,23,37-40 (see ref. 16 for review), and we complement these in this work by uncovering new networks and implicating specific defective gene families that may be enriched for novel potential therapeutic targets. Drug-binding sites on proteins usually exist out of functional necessity 33 , and gene families derive from gene duplication events that present additional binding sites for a given drug to exert its effects. Most successful drugs achieve their activity by competing for a binding site on a protein with an endogenous small molecule 41 ; therefore, many successful pharmacologic gene targets are within large gene families. Indeed, nearly half of the pharmacologic gene targets fall into just six gene families: G-protein-coupled receptors (GPCRs), serine/threonine and tyrosine protein kinases, zinc metallopeptidases, serine proteases, nuclear hormone receptors and phosphodiesterases 41 . Moreover, many large gene families are localized to pre-and post synaptic neuronal terminals to coordinate the highly complex and evolutionarily conserved process of neurotransmission 42 , which is thought to be compromised to varying degrees in the autistic brain 43 . Therefore, we hypothesize that we may select more druggable targets for the ASDs by enriching for defective interaction networks defined by gene families.
Here we perform a large genome-wide association study (GWAS) of structural variants that disrupt gene family protein interaction networks in patients with autism. We find multiple defective networks in the ASDs, most notably rare copy-number variants (CNVs) in the metabotropic glutamate receptor (mGluR) signalling pathway in 5.8% of patients with the ASDs. Defective mGluR signalling was found in both ADHD 30 and schizophrenia [31][32][33][34][35][36] , two common neuropsychiatric disorders that are highly coincident with the ASDs. Furthermore, we find other attractive candidates such as the MAX dimerization protein (MXD) network that is implicated in cancer, and a Calmodulin 1 (CALM1) gene interaction network that is active in neuronal tissues. The numerous defective gene family interactions we find to underlie autism present many novel translational opportunities to explore for therapeutic interventions.

Results
To identify and comprehensively characterize defective genetic networks underlying the ASDs, we performed a large-scale genome association study for copy-number variation (CNVs) enriched in patients with autism. By combining the affected cases from previously published large ASD studies 21,23,28,44 with more recently recruited cases from the Children's Hospital of Philadelphia, we executed one of the largest searches for rare pathogenic CNVs in ASDs to date. In sum, 6,742 genotyped samples from patients with the ASDs were compared with those from 12,544 neurologically normal controls recruited at The Children's Hospital of Philadelphia (CHOP).
These cases were each screened by neurodevelopmental specialists to exclude patients with known syndromic causes for autism. Genotyping was performed at CHOP for the vast majority of the ASD cases as well as all the controls. After cleaning the data to remove sample duplicates and performing standard QC for CNVs, we first inferred the continental ancestry of 5,627 affected cases and 9,644 disease-free controls using a training set defined by populations from HapMap 3 (ref. 45) and the Human Genome Diversity Panel 46 (Table 1). Using this QC criteria, we estimated that the sensitivity and specificity of calling CNVs is B70% and 100%, respectively, across 121 different genomic regions assayed by PCR (Methods). Across all ethnicities, there was an increased burden of CNVs in cases versus controls, a statistically significantly difference (Pr0.001) in the larger European (63.3 versus 54.5 Kb, respectively) and African-derived (70.4 versus 48.0 Kb, respectively) populations.
We then searched for pan-ethnic CNV regions (CNVRs) discovered in the European-derived data set (4,602 cases versus 4,722 controls; Pr0.0001 by Fisher's exact test) and replicated in an independent ASD data set of African ancestry (312 cases versus 4,169 controls; Pr0.001 by Fisher's exact test) with subsequent measurement of overall significance across the entire multi-ethnic discovery cohort (5,627 cases versus 9,644 controls) for maximal power (Fig. 1, Table 2). On the basis of these selection criteria, two large well-known ASD risk loci emerged that harboured multiple duplications in the Prader Willi/Angelman syndrome (15q11-13) critical region, and multiple deletions were detected in the DiGeorge syndrome (22q11) critical region, albeit notably smaller than the 22q11 deletion syndrome. A third locus harbouring deletions in poly ADP-ribose polymerase family 8 (PARP8) on chromosome 5q11 was also discovered. PARP8 was previously identified as associated with the ASDs in a Dutch population 47 , but it has not previously been described for its pan ethnic distribution across European-derived and African-derived populations.
We examined the genetic interaction networks derived from gene families with members localized to the the Prader Willi/ Angelman syndrome (15q11-13) critical region, the DiGeorge syndrome (22q11) critical region, and the novel PARP8 (5q11) region using a method previously applied to ADHD 30 ; however, hardly any of the most significant genes harbouring significant CNVRs clustered within gene families. Consequently, we broadened our search for gene family interaction networks (GFINs) and searched the entire genome for GFINs with CNVs enriched in autism. For every gene family, we defined a GFIN as the genetic interaction network spawned by its multiple duplicated members. We used standard HUGO 48 gene names to define 1,732 GFINs across which we searched for enrichment of network defects associated with the ASDs. However, because there is an a priori excess of CNV burden in ASD cases over disease-free controls (Table 1), larger GFINs are expected to display significant enrichment of case defects by virtue solely of their increased size and complexity. Therefore, for each GFIN, we used a network permutation test of case enrichment across 1,000 random sets of networked genes to control for the GFIN size and complexity. With this approach, we robustly identified network defects associated with the ASDs by minimizing statistical artefact derived from any a priori excessive CNV burden in cases over controls, as well as other unknown biases that may be inherent in the human interactome data 49-51 that we mined.
Out of 1,732 GFINs, we used the network permutation test to rank 1,557 GFINs with defined CNVs for enrichment of genetic defects in the ASDs. Among the top GFINs (Table 3) was the metabotropic glutamate receptor (mGluR) pathway defined by the GRM family of genes that impacts glutamatergic neurotransmission. The GRM family contains eight members, all of which were defined in the human interactome to cumulatively spawn a GFIN of 279 genes (Fig. 2). Across this GFIN for the GRM family of genes, we found CNV defects in 5.8% of   . By 1,000 random network permutations, we found this excess of enrichment across cases in the mGluR pathway to also be statistically significant (P perm r0.05). In addition, 69.2% (124/181) of the informative genes within our mGluR network showed an excess of CNVs among cases. However, the component genes that harbour the most significant CNVRs contributing to this overall network significance reveal that the duplicated mGluR genes themselves (GRM1, GRM3, GRM4, GRM5, GRM6, GRM7 and GRM8) fail to achieve significance individually, although there is a trend for an excess of CNV defects across a specific subset of mGluR receptors (GRM1, GRM3, GRM5, GRM7, GRM8) that is unique to cases (Supplementary Table 1). Many large studies of CNVs implicate genes within the glutamatergic signaling pathway in the aetiology of the ASDs 21,23,37-40 , and SNP 52,53 and CNV duplications 54 of GRM8 have been reported in association with the ASDs before in humans. Moreover, a recent functional study demonstrated that in mouse models of tuberous sclerosis and fragile X, two different forms of syndromic autism, the autistic phenotype was ameliorated by modulation of GRM5 in opposite directions for each syndrome, which suggests that GRM5 functional activity is central in defining the axis of synaptopathophysiology in syndromic autism 55 . Our GRM network findings implicate rare defects in mGluR signalling also contribute to the ASDs outside of fragile X and tuberous sclerosis, and we posit that functional mGluR synaptopathophysiology may be initiated from many dozens if not hundreds of defective genes within the mGluR pathway that may account for as much as 6% of the endophenotypes of the ASDs (Table 3).
In addition, we recently demonstrated the importance of mGluRs in ADHD 30,56 , a highly co-incident neuropsychiatric disorder within the autism spectrum. However, in contrast to ADHD where defects within the mGluR receptors themselves (GRMs) were among the most significant copy-number defects contributing to the overall network significance, we found that in the ASDs defects of component GRMs contributed only modestly to the overall significance of the mGluR pathway. Nonetheless, the defects within GRM1, GRM3, GRM5, GRM7 and GRM8 that we identified as unique to cases and thus enriched are the same GRMs we identified as being pathogenic in ADHD and may impact glutamatergic signalling.
Among the most highly ranked GFINs by permutation testing, the MAX dimerization protein (MXD) GFIN (P Fisher r3.83E À 23,  or dup), the closest gene impacted, the chromosomal band, the approximate size of the defect (Kb), the number of contributing SNPs, the numbers of affected cases and controls, as well as P-value and odds ratio (OR) from Fisher's exact test for across all populations, and subsets of European-derived and African-derived populations. *Genes with an asterix (*) harbour CNVRs that disrupt their exons of directly, while those without the asterix are located in the genomic region around the intergenic CNVRs.  Table 2). It is notable that we found PARP8 as significant across ethnicities as described earlier (Table 2), and we previously described the importance of structural defects in UBE3A in the ASDs 23 .
Other notable significant GFINs uncovered were POU class 5 homeobox (POU5F) GIFN (P Fisher r2.96E À 17, enrichment ¼ 2.3, P perm r0.008, and the SWI/SNF related, matrix associated, actin-dependent regulator of chromatin, subfamily c (SMARCC) GFIN (P Fisher r1.22E À 09, enrichment ¼ 1.9, P perm r0.035). The POU5F family of genes encodes for transcription factors containing a POU homeodomain, and their role has been demonstrated in embryonic development, especially during early embryogenesis, and it is necessary for embryonic stem cell pluripotency. Component genes of the SMARCC gene family are members of the SWI/SNF family of proteins, whose members display helicase and ATPase activities and which are thought to regulate transcription of certain genes by altering the chromatin structure around those genes. Most interestingly, the KIAA family of genes ranked among the top GFINs (P Fisher r3.12E À 23, enrichment ¼ 1.6, P perm r0.040). KIAA genes have been identified in the Kazusa cDNA sequencing project 61 and are predicted from novel large human cDNAs; however, they have no known function.
We also hypothesized that some component members of gene families may contribute disproportionately to the significance of a GFIN because they are highly connected to interacting gene partners that are enriched for CNV defects in ASD. Therefore, we decomposed the 1,732 gene families into their 15,352 component ARTICLE duplicated genes of which 1,218 had defined networks with data to test for significance by genome-wide network permutation. The calmodulin 1 (CALM1) gene interaction network ranked highest by network permutation testing of case enrichment for CNV defects across 1,000 random gene networks (Fig. 3, Table 4) and represents a novel and attractive candidate gene for the ASDs. Across the CALM1 network, we found CNV defects in 14/ 4,618 cases versus only 1/4726 controls (P fisher r4.16E À 04, enrichment ¼ 14.37, P perm r0.002), and these defects were distributed such that 90% (9/10) of genes that harboured CNVs in the CALM1 interactome were enriched in cases. Closer inspection of the most significant CNVR contributing to the CALM1 network significance (Supplementary Table 3) revealed that no single gene was significant on its own; instead, with the exception of only one gene (PTH2R), each contributing CNVR tagged highly penetrant rare defects unique to cases. Calmodulin is the archetype of the family of calcium-modulated proteins of which nearly 20 members have been found. Calmodulin contains 149 amino acids that define four calcium-binding domains used for Ca 2 þ -mediated coordination of a large number of enzymes, ion channels and other proteins including kinases and phosphatases; its functions include roles in growth and cell cycle regulation as well as in signal transduction and the synthesis and release of neurotransmitters [MIM 114180] 57 .
Among other highly ranked first degree gene interaction networks were the nuclear receptor co-repressor 1 (NCOR1; P fisher r1.11E À 06, enrichment ¼ 13.37, P perm r0.004) and BCL2-associated athanogene 1 (BAG1; P fisher r2.18E À 04, enrichment ¼ 15.40, P perm r0.014) networks. NCOR1 is a transcriptional coregulatory protein that appears to assist nuclear receptors in the downregulation of DNA expression through recruitment of histone deacetylases to DNA promoter regions; it is a principal regulator in neural stem cells 51 . The oncogene BCL2 is a membrane protein that blocks the apoptosis pathway, and BAG1 forms a BCL2-associated athanogene and represents a link between growth factor receptors and antiapoptotic mechanisms. The BAG1 gene has been implicated in age-related neurodegenerative diseases, including Alzheimer's disease 62,63 .
In summary, given the private nature of mutations in the ASDs, considering the cumulative contributions of rare highly penetrant genetic defects boosts our power to discover and prioritize significant pathway defects. As a result, our comprehensive, unbiased analytical approach has identified a diverse set of specific defective biological pathways that contribute to the underlying aetiology of the ASDs. Among GFINs robustly enriched for structural defects, the most enriched was that of the MXD family of genes that has been implicated in cancer pathogenesis 58 , thereby providing concrete genetic defects to explore the reported coincidence of specific cancers with the ASDs 59 . The most highly ranked component duplicated gene interaction network involves defects in CALM1 and its multiple interacting partners that are important in regulating voltageindependent calcium-activated action potentials at the neuronal synapse. Moreover, we found significant enrichment for defects within the GFIN for GRM that defines the mGluR pathway that has previously been shown to be defective in other neuropsychiatric diseases 29,30 . While specific mGluR gene family members have been shown to underlie syndromic ASDs 55 , our findings suggest that rare defects in mGluR signalling also contribute to idiopathic autism across the entire GFIN for GRM genes.
Consequently, in addition to specific neuronal pathways that are expected to be defective in the ASDs like those defined by GRM and CALM duplicate genes, we implicate completely novel The table lists the name and gene family member tested, the number and frequency of network genes enriched, the number and frequency of cases harbouring defects, the number and frequency of controls harbouring defects, and the significance of association by Fisher's exact test, the odds ratio of the effect size, and the significance of association by random permutation of network while controlling for number of genes tested. biological pathways such as the MXD pathway specific forms of which may be associated with the ASDs 59 . Given the unmet need for better treatment for neurodevelopmental diseases 64 , the functionally diverse set of defective genetic interaction networks we report presents attractive genetic biomarkers to consider for targeted therapeutic intervention in ASDs and across the neuropsychiatric disease spectrum.

Methods
Ethics statement. The research presented here has been approved by the Children's Hospital of Philadelphia IRB (CHOP IRB#: IRB 06-004886). Some patients and their families were recruited through CHOP outreach clinics. Written informed consent was obtained from the participants or their parents using IRB approved consent forms prior to enrollment in the project. There was no discrimination against individuals or families who chose not to participate in the study. All data were analysed anonymously and all clinical investigations were conducted according to the principles expressed in the Declaration of Helsinki.
Sample processing. The majority of cases (5,049 of 6,742) and all controls (12,544) were genotyped with genome-wide coverage using the Infinium II platform across various iterations of the HumanHap BeadChip with 550 K, 610 K, 660 K and 1 M markers by the Center for Applied Genomics at The Children's Hospital of Philadelphia (CHOP). There were 1,693 cases genotyped by the AGP consortium. All cases and B50% of controls were re-used from previously published large ASD studies 21,23,28,44 . All cases were diagnosed by ADI-R/ADOS and fulfilled standard criteria for ASDs. Duplicate samples were removed by selecting unique samples with the best quality (based on genotyping statistics used to QC samples) from clusters defined by single linkage clustering of all pairs of samples with high pairwise identity by state measures (IBS Z0.9) across 140 K noncorrelated SNPs. Ethnicity of samples was inferred by a supervised k-means classification (k ¼ 3) of the first 10 eigenvectors estimated by principal component analysis across the same subset of 140 K non-correlated SNPs. We used HapMap 3 (ref. 45) and the Human Genome Diversity Panel 46 samples with known continental ancestry to train the k-means classifier implemented by the R Language for Statistical Computing 65 .
CNV inference and association. We called CNVs with the PennCNV algorithm 66 , which combines multiple values, including genotyping fluorescence intensity (Log R Ratio), population frequency of SNP minor alleles (B-allele frequency) and SNP spacing into a hidden Markov model. The term 'CNV' represents individual CNV calls, whereas 'CNVR' refers to population-level variation shared across subjects. Quality control thresholds for sample inclusion in CNV analysis included a high call rate (call rate Z95%) across SNPs, low s.d. of normalized intensity (s.d. r0.3), low absolute genomic wave artefacts (|GCWF| r0.02) and low numbers of CNVs called (#CNVs r100). Genome-wide differences in CNV burden, defined as the average span of CNVs, between cases and controls and estimates of significance were computed using PLINK 67 . CNVRs were defined based on the genomic boundaries of individual CNVs, and the significance of the difference in CNVR frequency between cases and controls was evaluated at each CNVR using Fisher's exact test.
Gene family interaction networks definition and association. We extended our previous work on ADHD 30 here to rank all GFINs by a network permutation test. Specifically, using merged human interactome data from three different yeast two hybrid generated data sets [49][50][51] accessed through the Human Interactome Database 68 , we defined the directed second-degree gene interaction network for all gene families here just as we did for the sole metabotropic glutamate receptor gene family network in ADHD. Specifically, here we use GFIN to refer to these gene family-derived interaction networks. In sum, we found 2,611 gene families with at least two members based on official HUGO 48 gene nomenclature, and generated 1,732 GFINs using. For 1,557 GFINs with defined CNVs, we calculated an odds ratio of cumulative network enrichment over all genes harbouring CNVs within the network. Moreover, for each GFIN, we quantified its enrichment by a permutation test of 1,000 second-degree gene interaction networks derived from a random set of N genes, where N is the number of members of a given gene family. Because the CNVs we are focused on are so rare, we are relatively underpowered to achieve significance by permutation testing after correcting for multiple GFIN tests. However, we report all GFINs in the manuscript in order of their nominal/marginal significance.
Experimental validation of CNVs. Significant CNVRs that we identified were validated using commercially available qPCR Taqman probes run on the ABI GeneAmp 9700 system from Life Technology. Supplementary Data 1 lists 251 reactions that we tested using 121 different genomic probes across 85 different samples for which DNA was available. For deletions, our sensitivity ¼ 0.65, specificity ¼ 1.00, NPV ¼ 1.00 and PPV ¼ 0.88. For duplications, our sensitivity ¼ 0.68, specificity ¼ 0.99, NPV ¼ 0.94 and PPV ¼ 0.91.