Introduction

Neurodevelopmental and neuropsychiatric disorders are complex traits that result from multiple genetic determinants interacting in the context of poorly understood environmental factors to give rise to clinically diverse phenotypes.1, 2, 3 Affected individuals harbour different risk alleles in a heterogeneous genetic background, which makes candidate disorder genes difficult to detect. In spite of this difficulty, there are now hundreds of candidate genes with DNA copy number variations or single nucleotide polymorphisms (SNPs) characterised from clinically diagnosed individuals. The power of modern genetic screening approaches has led to the identification of candidate genes associated with autism spectrum disorders (ASDs),4, 5, 6, 7, 8 X-linked intellectual disability (XLID),9, 10 attention deficit hyperactivity disorder (ADHD)11, 12, 13 and schizophrenia (SZ).14, 15, 16, 17, 18, 19, 20 More recently, whole-genome sequencing (exome capture) coupled with computational approaches that integrate protein interaction information has helped to build hypotheses concerning molecular pathways and processes that are likely to underpin these disorders.6, 7, 21

ASDs are a group of clinically diverse neurodevelopmental disorders (1–2% of the population) with significant genetic heterogeneity.7, 8, 22, 23, 24 Although large twin studies have shown monozygotic concordance rates with high heritability estimates of 90%,25 the underlying genetic determinants remain largely unknown. Only three genetic loci (5p14.113, 5p15.214 and 7q31–q35) have statistically significant support for association with ASDs, suggesting common variation will account for only a small proportion of the heritability in ASD.7, 26 In contrast, recent copy number variation and SNP data show many rare variants occur in key neurological molecules that function in the synaptic junctions of neurons. These molecules include members of the neurexin-neuroligin complex (NRXN1, CNTNAP2, NLGN1, NLGN3, NLGN4X, NLGN4Y, LRRTM1 and LRRTM227, 28) and interacting proteins (SHANK3, PSD95, SHANK2, SHANK1, SYNGAP1, DLGAP2, FOXP1, GRIN2B, SCN1A and LAMC37, 22), suggesting that synapse development and function represents a major pathogenetic hub for ASDs and related disorders.27

XLID accounts for 5–10% of intellectual disability in males. There are over 150 XLID syndromes, including fragile X syndrome and Rett syndrome, and numerous non-syndromal XLID disorders, many of which are caused by 102 genes on the X chromosome.29 Among these are genes that also contribute to ASD (that is, NLGN4, RPL10 and RAB39). Although familial linkage studies have facilitated a better understanding of the biological basis of XLID, the genetic basis for autosomal intellectual disability remains poorly defined.10

ADHD is a neuropsychiatric condition of childhood. Although underlying molecular mechanisms are poorly understood, genetic influences are recognised as important aetiological components of ADHD. Large twin studies show heritability estimates of 75–90%30 for this disorder. A polygenic model is consistent with the high prevalence of ADHD (2–10%) and high concordance in monozygotic twins, but modest risk to first-degree relatives. Pathogenetic models of ADHD have traditionally focused on molecules involved in neurotransmission and catecholamine synaptic dysfunction,31, 32 including dopamine transporter DAT1 (SLC6A3), dopamine receptors DRD4, DRD5 and synaptosomal protein SNAP-25.11 More recently neural developmental genes including cadherin 13 (CDH13) and cGMP-dependent protein kinase I (PRKG1)33, 34, 35, 36 have been associated with ADHD.

SZ is similarly a highly heritable disorder (80%) with monozygotic twin concordance rates estimated to be as high as 40–65%.37, 38 Similar to ASD and ADHD, there have been significant efforts to identify possible common and rare genetic variants that might explain susceptibility to this disorder. Recent genome-wide association studies39 have confirmed a substantial polygenic component associated with SZ. Not unlike ASD and ADHD, SZ is likely to encompass a broad genetic aetiology. Increasing evidence suggests the onset of neurological symptoms probably occurs when a threshold of cumulative genetic liability is reached.40 A compelling feature concerning SZ is the number of genetic factors that are shared with other disorders. DNA variants identified in SZ have also been associated with bipolar disorder,41 ASD,23, 42 mental retardation,43, 44 and ADHD.12 A notable example of genetic comorbidity is the documented structural variation in the neurexin-1 (NRXN1) gene that increases risk for both ASD and SZ.45

Here, we propose a novel biological systems approach, a hypothetical ‘gene network model’ that can be used to analyse candidate genes and predict the association of genetic screening data with ASD, XLID, ADHD and SZ. This is based on observed differences in gene distributions and functional patterns that are informative for each disorder. The gene network model was successfully validated using cohort data from six recent disorder studies.

Materials and methods

A diagram of our computational approach is depicted in Supplementary Figure S1. Detailed descriptions of the genes, biological databases and computational and statistical methods used in this study are presented in the Supplementary Information.

Results

Primary database of neurodevelopmental and neuropsychiatric disorder genes

Our approach used current information concerning genes that have a documented association with neurodevelopmental and neuropsychiatric disorders. We used publicly available molecular data to create a comprehensive database of primary candidate genes associated with ASD, XLID, ADHD and SZ. For candidate genes to be included in the analysis, we required evidence of DNA variation including SNPs, insertions and deletions, and larger copy number variations. A primary database of 700 genes comprising 361 genes for ASD,4, 7 93 for XLID,9, 10 158 for ADHD11, 12, 33, 34, 35, 36 and 218 for SZ14, 15, 16, 17, 18, 19, 20 provided a priori information for our analyses (Supplementary Table S1). There were 125 genes found to be associated with more than one disorder (Figure 1a).

Figure 1

(a) Venn diagram of primary candidate genes associated with four disorders: blue (ASD), green (SZ), yellow (ADHD) and red (XLID). (b) Venn diagram of primary candidate genes and their first degree interacting neighbours in the PPI network. (c) PPI networks of all four disorders showing primary candidate genes and their adjacent neighbours. The AXAS–PPI network is the union of all four disorder PPI networks. Those genes that overlapped more than one disorder are marked as black circles (Supplementary Table S1 and S2).

Protein–protein interaction networks of neurodevelopmental and neuropsychiatric disorders

A computational analysis was designed to build a functional map of the neurodevelopmental and neuropsychiatric disorder genes based on the fundamental features of protein–protein interaction (PPI) and gene regulation (that is, identifying transcription factor (TF) and microRNA (miRNA) regulatory sites). First, a database of nonredundant PPIs was created by joining the BioGRID (biological general repository for interaction data sets)46 and HPRD (human protein reference database) databases47, 48, 49 (Supplementary Table S2).

The BioGRID database (version 3.1.77) includes 9115 proteins and 37 748 interactions between proteins, and the HPRD (release 9) comprises 9563 proteins and 38 774 interactions. The union of the two major PPI networks yields a nonredundant network with 11 028 proteins and 58 256 interactions. This indicates that more than 80% of these proteins and 65% of the interactions are commonly found in both databases (Supplementary Table S2).
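The merging step described above can be illustrated with a minimal sketch. It assumes that each interaction is an unordered pair of protein identifiers and that both sources already use the same identifier scheme; the function names and the toy records are hypothetical and are not taken from BioGRID or HPRD.

```python
def normalise(edges):
    """Represent each interaction as an unordered pair so that
    (A, B) and (B, A) count as the same interaction."""
    return {frozenset(pair) for pair in edges}


def merge_ppi(db_a, db_b):
    """Return the nonredundant union, the shared interactions and
    the set of proteins present in the merged network."""
    a, b = normalise(db_a), normalise(db_b)
    union = a | b      # every interaction seen in either source, once
    shared = a & b     # interactions reported by both sources
    proteins = {p for edge in union for p in edge}
    return union, shared, proteins


# Toy records (hypothetical; for illustration only).
source_one = [("NRXN1", "NLGN1"), ("NLGN1", "DLG4"), ("DLG4", "SHANK3")]
source_two = [("NLGN1", "NRXN1"), ("SHANK3", "DLG4"), ("SHANK3", "HOMER1")]

union, shared, proteins = merge_ppi(source_one, source_two)
print(len(union), len(shared), len(proteins))
```

In practice the identifier mapping between databases is likely to be the harder part; the set operations themselves only remove duplicates once both sources agree on names.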

The PPI network is represented as a graph where the nodes are proteins and the edges are interactions between proteins. PPI networks for ASD, XLID, ADHD and SZ were created by retrieving all possible interactions from the PPI database between primary candidate genes and their respective first-order interactors (Table 1). A total of 534 out of 700 (76%) encoded primary candidate genes were found in the merged PPI database. We built a neurodevelopmental and neuropsychiatric disorder PPI network named AXAS to reflect the precise origins of the data (ASD, XLID, ADHD and SZ). The AXAS–PPI network was created by joining these 534 encoded proteins with 3413 first-order interacting proteins. Many proteins of the AXAS–PPI network were found to interact with primary candidate genes from more than one disorder (Figure 1b).
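A disorder network of this kind can be sketched as follows. The sketch assumes that an interaction is retained whenever at least one of its two proteins is a primary candidate, and that candidates absent from the merged database are simply dropped; whether interactions between two neighbours were also retained is not stated here, so that choice is an assumption. All names and data are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency map: protein -> set of interacting proteins."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def first_order_network(edges, seeds):
    """Return (seeds found in the PPI data, their first-order
    interactors, interactions that touch at least one seed)."""
    adj = build_adjacency(edges)
    present = {s for s in seeds if s in adj}
    neighbours = set()
    for s in present:
        neighbours |= adj[s]
    neighbours -= present  # report neighbours separately from seeds
    kept = [(a, b) for a, b in edges if a in present or b in present]
    return present, neighbours, kept


# Toy interaction list and candidate set (hypothetical).
ppi = [
    ("NRXN1", "NLGN1"),
    ("NLGN1", "DLG4"),
    ("DLG4", "SHANK3"),
    ("SHANK3", "HOMER1"),
    ("HOMER1", "GRM5"),
]
candidates = {"NRXN1", "SHANK3", "GENE_NOT_IN_PPI"}

present, neighbours, kept = first_order_network(ppi, candidates)
print(sorted(present), sorted(neighbours), len(kept))
```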

Table 1 Structural properties of PPI networks

The AXAS–PPI network reconstructed from 534 candidate genes represents 35% (3946 out of 11 028; see Table 1 and Figure 1c) of the merged BioGRID–HPRD PPI network. Strikingly, the AXAS–PPI network encompasses 20% of encoded genes found in the human genome. We also validated the neurological context of the AXAS–PPI network using available transcriptome data for the whole brain.50 A nucleotide sequence comparison of 3946 protein-coding genes found that 92% of genes in the AXAS–PPI network are expressed in the human brain (Supplementary Table S3).

All four individual disorder PPI networks, as well as the entire AXAS–PPI network, were characterised using six structural properties of complex networks (Table 1). We further tested whether the distributions of the structural properties were similar or different between AXAS–PPI networks and PPI networks constructed from an equivalent number of randomly sampled proteins from the merged PPI database (Table 1 and Supplementary Figure S2). Kolmogorov–Smirnov tests showed that the structural properties of average degree, density and clustering coefficient were significantly higher in the AXAS–PPI networks than in the random networks, while the average path length was smaller in the disorder PPI networks (Table 1). This control analysis confirms that genes previously associated with neurodevelopmental and neuropsychiatric disorders are significantly more interconnected than expected by chance. A functional enrichment analysis of the 534 primary candidate genes revealed that specific biological processes are overrepresented in each of these disorders. This higher-order functional pattern provides a basis to discriminate between these polygenic disorders (Table 2 and Supplementary Table S4). There is no single functional domain that clearly characterises any one disorder.
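Four of the structural properties named above (average degree, density, clustering coefficient and average path length) can be computed as in the following sketch, which uses their standard definitions for an undirected, unweighted graph. It is not the authors' implementation: the function names and the four-node toy graph are hypothetical, and the path-length average is taken over connected pairs only, which is an assumption.

```python
from collections import deque
from itertools import combinations


def adjacency(edges):
    """Undirected adjacency map, ignoring self-interactions."""
    adj = {}
    for a, b in edges:
        if a != b:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def average_degree(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def density(adj):
    """Observed edges divided by the n(n-1)/2 possible edges."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1))


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over connected ordered pairs,
    found by breadth-first search from every node."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Toy graph (hypothetical): a triangle A-B-C with a tail C-D.
toy = adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(average_degree(toy), density(toy),
      average_clustering(toy), average_path_length(toy))
```

Applying the same functions to networks built from randomly sampled proteins would give the null distributions against which the disorder networks are compared.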

Table 2 Functional analysis of primary candidate disorder genes

Another common property of biological networks is community structure.51, 52 We found that the AXAS–PPI network is highly modular (Q=0.61), being subdivided into 30 structural communities (modules). The AXAS–PPI modules were ranked in order of the number of proteins that contribute to each module; M1 is the largest with 700 proteins, whereas M30 is the smallest with only three proteins. Most of the AXAS–PPI modules (21 out of 30) also show a significant enrichment of proteins that are functionally clustered (Supplementary Table S5). To illustrate these features, the AXAS–PPI network was summarised as a simple graph showing the contribution of modules to the four disorders (Figure 2). Interestingly, M2 (synaptic transmission, signal transduction and phosphorylation) interacts with 12 other modules and is the main protein hub or functional focus of the network. M2 significantly integrates key biological processes, including the regulation of transcription and biosynthetic processes (M1), cell cycle (M3), cell–cell adhesion and development (M4) and cell–cell communication and differentiation and protein transport (M5) (Supplementary Table S5). Although primary candidate genes associated with ASD, XLID, ADHD and SZ contribute to most of the AXAS–PPI modules, they are not equally distributed in 9 out of the 13 most populated modules (Supplementary Table S6).
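For readers unfamiliar with the modularity score Q, the sketch below computes it for a given partition using the widely used Newman formulation, Q = Σ_c [l_c/m − (d_c/2m)²], where l_c is the number of edges inside community c, d_c is the summed degree of its nodes and m is the total number of edges. That this is the exact definition behind the reported Q=0.61 is an assumption; the function name and toy graph are hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected,
    unweighted graph given as a list of distinct edges."""
    m = len(edges)
    internal = {}    # community -> number of edges inside it
    degree_sum = {}  # community -> summed degree of its nodes
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Toy graph (hypothetical): two triangles joined by a single edge.
toy_edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "E"), ("E", "F"), ("D", "F"),
    ("C", "D"),
]
toy_partition = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}

q = modularity(toy_edges, toy_partition)
print(round(q, 3))
```

Values of Q close to zero indicate a partition no better than chance, so a higher score reflects more pronounced community structure.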

Figure 2

Graphical representation of the 13 most populated protein modules of the AXAS–PPI network. The size of each module is proportional to the number of proteins. Weighted lines with numbers represent protein interactions between modules. Distributions of primary candidate genes are shown as a pie chart for each module. The proteins and intraconnections of each module are described in Supplementary Table S5. Modules with significant differences in the frequency of primary candidate genes are indicated by *(Supplementary Table S6).

cis-Regulatory networks of neurodevelopmental and neuropsychiatric disorders

We analysed the upstream control region (genomic DNA containing cis-regulatory elements) and downstream untranslated regions (3′-UTR) of putative mRNA transcripts in the 30 AXAS–PPI modules. The aim was to test the hypothesis that genes in the same functional module or disorder share a common pattern of regulatory elements. In addition to identifying candidate genes that are TFs or contain miRNAs, we characterised the number and distribution of TF binding and miRNA target sites associated with genes of the AXAS–PPI network. This upstream control region analysis identified overrepresented DNA motifs in the upstream regions of genes that group in the same biological processes and disorder (Supplementary Figure S3). We used functional criteria defined by the Gene Ontology consortium to identify TF binding sites in the 13 most populated AXAS–PPI modules (Figure 2 and Supplementary Table S5). A total of 401 functional groups were identified among the hierarchical levels in the Gene Ontology database (Supplementary Table S5), including 52 DNA motifs that were found to be enriched in these functional groups (Supplementary Table S7). Of the 52, 38 show similarity to TF binding sites previously described in the TRANSFAC53 or JASPAR54 databases. The remaining 14 are novel motifs that are likely to function as binding sites for unknown TFs or other regulatory molecules. Interestingly, more than 80% of the TFs that bind to these enriched DNA target sites are proteins found in the AXAS–PPI network (Supplementary Table S7). This indicates that TFs that are part of the AXAS–PPI network also regulate many of the genes in this network.

The 3′-UTR analysis was designed to identify enriched miRNA target sites of genes in the AXAS–PPI modules. We used available sequence data from the miRBase database,55 which contains 1223 human miRNA sequences (miRBase version 16). We identified 621 miRNAs that have target sites enriched in the AXAS–PPI network, including 154 that were enriched in the 534 primary candidate genes (1.5-fold enriched; P-value<0.01; Supplementary Table S8). A total of 683 miRNAs target a greater percentage of genes in the AXAS–PPI network than in the whole genome (Supplementary Figure S4). We also found that 3.9% (149) of the genes in the AXAS–PPI network contain miRNAs embedded within their predicted transcripts, and that this occurrence is nearly twice that observed for the whole human genome (2.0%; P-value<0.01). Furthermore, although the AXAS–PPI network only represents 20% of encoded genes in the human genome, more than half of the known miRNAs (56%) target transcripts of this network. We also show that the number of DNA motifs enriched in the upstream control regions is positively correlated with the number of miRNA target sites in the 3′-UTRs (Pearson’s correlation=0.64; t=4.67; df=32; P-value<0.01; Supplementary Figure S5). The correlation between increased transcriptional and post-transcriptional control suggests that the molecular network that underpins neurodevelopmental and neuropsychiatric disorders is likely to be evolving in a concerted manner.
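The relationship between a Pearson correlation and its t statistic can be made explicit with a short sketch. It uses the standard identity t = r·sqrt(df / (1 − r²)) with df = n − 2; the function names and toy counts are hypothetical and no data from the study are reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, df):
    """t statistic for testing r against zero, with df = n - 2."""
    return r * math.sqrt(df / (1 - r * r))


# Toy counts (hypothetical): motifs versus miRNA target sites per group.
motifs = [1, 2, 3, 4, 5]
mirna_sites = [2, 1, 4, 3, 5]

r = pearson_r(motifs, mirna_sites)
t = t_statistic(r, df=len(motifs) - 2)
print(round(r, 2), round(t, 2))

# Using the rounded values quoted in the text (r = 0.64, df = 32):
t_reported_inputs = t_statistic(0.64, 32)
print(round(t_reported_inputs, 2))
```

With the rounded inputs the identity gives a value close to, but not identical with, the reported t=4.67; a difference of that size is what rounding r to two decimal places would produce.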

SNP loci associated with neurodevelopmental and neuropsychiatric disorders are enriched with regulatory elements

On the basis of analysis of data from recent autism26, 56 and SZ genome-wide association studies41, 57, 58, 59, 60, 61 (Supplementary Table S9), we created a database comprising 4850 unique SNPs associated with ASD and SZ, and found that a number of these SNPs map to TF binding sites enriched in the AXAS–PPI network (Supplementary Table S7 and S9). The most overrepresented DNA motifs in the genome-wide association data sets were binding sites for STAT1 and STAT6 (signal transducer and activator of transcription) TF proteins. SNP loci associated with STAT1 and STAT6 binding sites were overrepresented in seven out of eight genome-wide association data sets (Supplementary Table S9). Our analyses also highlight that TF binding sites for SREBF1 (sterol regulatory element binding factor 1) and CREB (cAMP response element binding) are exclusively associated with autism and SZ, respectively.

There are 789 of 4839 nonredundant SNPs (16%) located near or within 435 genes in the AXAS–PPI network (Supplementary Table S10). Only 4% of 789 SNPs (32) occur within coding regions, whereas 81% (636) are located within intronic regions and include SNPs that map to alternate promoter regions of genes. The remaining 15% were located within 10 kb of the 5′-upstream (54) or 3′-downstream (67) regions of AXAS–PPI genes. We also compared SNP data with TF binding sites data validated by chromatin immunoprecipitation-sequencing at ENCODE (Encyclopaedia of DNA Elements)62, and found that 113 SNP loci (14%) were associated with at least one TF binding site. Interestingly, we found 12 autism and 13 SZ SNPs associated with seven validated TF binding sites (STAT1, STAT3, YY1, SP1, ETS1, E2F1 and USF2) enriched in our AXAS–PPI network (Supplementary Table S11). Most of these SNPs are associated with genes involved in synaptic transmission, with multiple different SNPs often occurring in the same gene. These genes include NTRK3 (neurotrophic tyrosine receptor kinase, type 3), NRXN1, SLC25A12 (solute carrier family 25, mitochondrial carrier, Aralar, member 12), GRID1 (glutamate receptor, ionotropic, delta 1) and SDC2 (syndecan 2). Approximately 50% of the 789 SNPs map to three functional modules: M2 (25%, synaptic transmission), M1 (14%, regulation of transcription), and M4 (11%, cell–cell adhesion) (Supplementary Figure S5A). Our approach offers the potential to characterise SNPs according to molecular function, with an example being shown for NTRK3 and NRXN1 (Supplementary Figure S5B and S5C). These functions only become apparent when integrating information concerning gene organisation, regulatory mechanisms and PPI.

Cross-validation of the AXAS–PPI network

We have tested our gene network model using genetic data from six recent studies: ASD,5, 6, 63 XLID,64 ADHD65 and SZ66 (Supplementary Table S12). A binomial test and a standardised score (Z-score) were used to assess how these data sets would be distributed across the disorders represented in our AXAS–PPI network. A compelling result of the analysis was a demonstrated capacity to correctly predict the association of these data with ASD, XLID, ADHD and SZ, based on the highest Z-score (Figure 3; Supplementary Table S13). Our analysis also confirms that cohort screening data can vary significantly in genetic composition between studies of the same disorder. Interestingly, two data sets (ASD5 and XLID64) show significant Z-scores for more than one disorder. Although cross-validation analysis shows the AXAS–PPI network model can potentially assess risk from large genetic screens, to assess individual risk, we would need to build a reliable classifier based on the statistical representation of gene variants across different populations.
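The scoring idea can be sketched as follows. Because the exact procedure is given only in the Supplementary Information, everything here is an assumption: the expected hit rate p is taken as the fraction of a background gene set that belongs to each disorder network, the Z-score uses the normal approximation to the binomial, and the gene identifiers, disorder labels and function names are hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p ** i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )


def z_score(k, n, p):
    """Standardised excess of observed hits over the expectation n*p."""
    return (k - n * p) / math.sqrt(n * p * (1 - p))


def rank_disorders(query, disorder_genes, background_size):
    """Score a query gene set against each disorder gene set and
    return the disorder with the highest Z-score plus all scores."""
    n = len(query)
    scores = {}
    for name, genes in disorder_genes.items():
        k = len(query & genes)              # observed overlap
        p = len(genes) / background_size    # assumed chance hit rate
        scores[name] = (z_score(k, n, p), binomial_upper_tail(k, n, p))
    best = max(scores, key=lambda name: scores[name][0])
    return best, scores


# Toy data (hypothetical): 100 background genes, two disorder sets.
disorder_sets = {
    "disorder_A": {f"g{i}" for i in range(0, 20)},
    "disorder_B": {f"g{i}" for i in range(20, 40)},
}
query_set = {f"g{i}" for i in range(0, 8)} | {"g20", "g50"}

best, scores = rank_disorders(query_set, disorder_sets, background_size=100)
print(best, round(scores[best][0], 2))
```

A data set can score highly for more than one disorder under such a scheme, which matches the observation that some cohorts show significant Z-scores for several disorders.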

Figure 3

Binomial distribution radar plots with standardised Z-scores that examine association of genetic data with the AXAS–PPI model. The longest axis from the origin indicates the most significant association. All six genetic screening data sets were correctly predicted: ASD (a, b and c); XLID (d), SZ (e) and ADHD (f); see also Supplementary Table S12 and S13.

Discussion

The AXAS–PPI network indicates that there are up to 4000 genes that may contribute to neurodevelopmental and neuropsychiatric disorders (Figure 1). Therefore, the number of polygenic combinations, including de novo variants,67, 68 that potentially contribute to a neurological deficit will be extraordinarily large. How, then, can this diversity manifest with high incidence (that is, 1–2% for ASD, 1% for SZ) in the human population? The answer may lie in the ‘small world’ properties of the AXAS–PPI network, which is characterised by a high clustering coefficient and a short average path length (Table 1). Greater than 90% of the 4000 genes function with less than 4.0 degrees of separation between any two genes (average path length equals 4.4). Evidently, DNA variation is anchored by functional proximity (many genes contributing to the same pathways and processes) and neurodevelopmental and neuropsychiatric disorders are not unlike a journey where there are various paths to the same destination.

For this reason, we can look to higher order interactions including biochemical pathways and biological processes to examine the nature of neurodevelopmental and neuropsychiatric disorders. We observed that genes associated with ASD, XLID, ADHD and SZ are mainly distributed in 13 modules, whose encoded proteins are predominantly involved in the regulation of transcription, synaptic transmission, cell–cell communication, intracellular signalling pathways, cell cycle, metabolic processes and nervous system development. However, genes associated with ASD, XLID, ADHD and SZ are not equally distributed in 9 of these 13 modules. Some functional domains, such as synaptic transmission (M2), proteolysis (M6), phosphorylation (M10) and G-protein signalling (M12), seem to be commonly affected in all four disorders, yet other modules, such as regulation of transcription (M1), vesicle-mediated transport (M11) and protein kinase signal transduction (M7), are overrepresented in ASD, ADHD and SZ, respectively (Figure 2).

The great majority of proteins (92%) in the AXAS–PPI network were found to be expressed in adult human brain and are likely to be regulated by similar mechanisms. We addressed this possibility by analysing the regulatory elements in the upstream control region and 3′-UTR of disorder genes in AXAS–PPI modules. A number of TF and miRNA binding sites were found to be significantly enriched in the non-coding regions of these genes. These analyses confirm that genes found in functional modules of the AXAS–PPI network are likely to be coregulated. It is also worth noting that genes with alternatively spliced 3′-UTRs are potentially regulated by different miRNAs. This is particularly important for neurodevelopmental and neuropsychiatric disorders considering that alternative splicing of genes is more prevalent in the brain and liver than in other tissues.50 An inherent capacity to target many specific genes or variant transcripts in parallel makes miRNAs a powerfully configurable regulatory mechanism that has evolved for dynamic control of genes, biochemical pathways and biological processes.69, 70 Although miRNAs can be thought of as fine-tuning controls,69 transcript levels of some genes in the brain might be under a dynamic balance that is delicate enough that even a slight change mediated by a miRNA or miRNA target mutation could translate into an observable phenotype. The importance of transcript levels might be demonstrated in certain genes, where haploinsufficiency can be enough to cause phenotypic effects.71

In summary, our hypothetical framework can be used to examine the molecular basis of neurodevelopmental and neuropsychiatric disorders, and to assist with the selection of candidate genes for further analysis. A noteworthy outcome of this approach is the potential to classify which disorder would arise from a particular set of genetic variations, with the eventual possibility of new diagnostic tools and therapeutic strategies. However, we are mindful of genetic heterogeneity in different populations, sparse statistical representation and potential problems with clinical/behavioural diagnosis. These issues will need to be addressed in order to build a more specific and sensitive diagnostic tool capable of assessing risk for individual genotypes.