Introduction

Gene expression correlation, protein-protein interaction and other high-throughput experiments in the post-genomic era have revealed that genes tend to form complex, scale-free networks where most genes have a few connections with others and a few have a high number of interactions, commonly referred to as “hubs”, establishing them as important central genes in these gene networks1. These highly interconnected genes have become the targets of intense research expecting them to play prominent roles in genetic diseases. However, measurements found only a weak correlation between disease genes and hubs, e.g. Barabasi et al.2 found that disease genes have 32% more interactions with other proteins than non-disease genes, arguing that genetic mutations in topologically central, widely expressed genes are more likely to result in severe impairment of normal development, leading to lethality in utero and eventual deletion from the population2.

This is apparently not the case in schizophrenia, a disease that steadily affects about one percent of the population despite the lower fecundity of the affected individuals3. In most cases of schizophrenia no known mutations exist in protein-coding genes4 and the role of gene expression dysregulation in schizophrenia is increasingly recognized5,6.

A recent meta-analysis, examining the mutations in the genomes of 36,000 schizophrenics, identified 108 distinct genomic regions with significantly higher mutation rates7. We integrated this set with gene expression data from the Allen Brain Atlas8,9. After determining the correlating and anti-correlating neighbors of genes in the 108 loci we found that the median number of correlating neighbors were about 35 times higher for the genes in the 108 regions than in the rest of the genome (32 times for the anti-correlating pairs).

We also integrated the data with a recent methylome study in schizophrenics10. Ranking the genes for the hypermethylated probes of the positively correlating gene pairs identified the top gene as SYNGR1, a synapse-related gene whose regulatory region overlaps with one of the 108 SNP-rich loci in ref. 7. Ranking the genes for the hypermethylated probes of the negatively correlating gene pairs identified MBP and MOBP. They are both myelin-related, anti-correlating with a large number of synapse-related and glutamate receptor genes, offering a model that connects these frequently observed but so far disjoint pathologies in schizophrenia.

Results

Genes in the most mutated genomic regions have the highest number of correlating partners

In the first step we identified all human gene expression pairs with a Pearson correlation of >=0.8 or <=−0.7 in human brain tissues using the Allen Brain Atlas (website: brain-map.org)8,9, and a recent database with pre-calculated Pearson correlations for all relevant gene pairs at http://www.szdb.org/download.html#coexpression by Wu et al.11. This resulted in 1,257,407 positively correlating and 1,108,585 negatively correlating unique gene pairs belonging to 16829 and 16761 individual genes, respectively (Supplementary Tables 1 and 2).

Subsequently we mapped all known human genes in 108 highly mutated human genomic regions identified in ref. 7 separately for genes whose promoters or only cis regulatory elements fall into these 108 loci. Filtering for genes present in the correlating or anti-correlating pairs by Wu et al.11 resulted in 254 promoter-selected and 462 cis-selected genes (indicated by 1p/1c in Supplementary Tables 1 and 2, respectively).

Counting the positively and negatively correlating partners for all the genes except in the 108 loci in ref. 7 resulted in a median number of 71 positively and 63 negatively correlating pairs, respectively (Fig. 1). Unexpectedly, the median numbers of correlating pairs for the promoter-selected genes were 2472 and 2013, corresponding to a 35-fold and 32-fold increase for the positive and negative pairs, respectively.

Figure 1: Boxplots of inteacting partners for all genes and promoter-selected and cis-selected genes in 108 SNP-rich genomic regions taken from ref. 7.
figure 1

(A) Positive, (B) negative correlating partners.

The number of correlating pairs for the cis-selected genes also proved to be higher (386 and 320 pairs on average for positively and negatively correlating pairs, respectively), corresponding to an approximately 2.5-fold increase when compared to the general gene population. The median values were similar to the average gene population in this category (Fig. 1) but an unpaired t-test showed that the cis-derived genes and the general gene population (excluding the 108-loci genes) are significantly different (p-value = 0.012).

We also investigated the number of differentially methylated gene pairs using a table of 56001 differentially methylated probes from a genome-wide methylome study in schizophrenics10. We counted separately the number of hypermethylated and hypomethylated genes for the highly correlating and anti-correlating gene pairs and also the total sum of hyper- and hypomethylated probes belonging to these genes. The sums of scores for both the number of differentially methylated gene pairs and the total sum of their differentially methylated probes are shown in Supplementary Tables 1 and 2, for the positively and negatively correlating pairs, respectively.

To determine which measurement distinguishes the best between schizophrenia-related and non-specific genes we calculated the effect size for the 108-loci genes and also for genes annotated by Genecards and Malacards12,13 as schizophrenia-related (Fig. 2), using Cliff’s delta for five different measurements (see legend for details).

Figure 2: The effect sizes for the ratios of gene neighbor numbers between specific genes and their complementary gene sets (i.e. the rest of the ~16 k genes present in the study): 108-loci cis genes; Malacards-annotated schizophrenia genes; Genecards-annotated schizophrenia genes; 108-loci promoter genes.
figure 2

Effect sizes were calculated for 5 different measurements: (i) pairs, the total number of correlating gene pairs; (ii) pairs_hypometh, the number of correlating gene pairs that are hypomethylated (defined as in at least one probe the gene is hypomethylated); (iii) pairs_hypermeth, the number of correlating gene pairs that are hypermethylated; (iv) pairs_hypoSum, the total number of hypomethylated probes of the correlating gene pairs; (v) pairs_hyperSum, the total number of hypermethylated pairs for the correlating gene pairs. The numbers were filtered using only pairs of genes where both genes were differentially methylated in ref. 10. (A) Positively, (B) Negatively correlating gene pairs.

Figure 2a and b shows the effect sizes for all relevant pairs of complementary gene sets (i.e. promoter-derived vs. non-promoter-derived, Genecards vs. non-Genecards genes, etc.) comparing the numbers of correlating gene partners in one set to the numbers of the gene partners in the other gene set. As expected the promoter-derived 108-loci genes had the greatest effect size, followed by Malacards and Genecards, the 108-loci cis-derived genes having the smallest effect. The results were similar for the negatively correlating pairs (Fig. 2b). Here we also filtered for gene pairs that were both differentially methylated in ref. 10, which increased the effect size by 0.039 for “pairs” in Fig. 2 for the promoter-derived positive set and by 0.061 for the same for the negative set. (Unfiltered charts shown in Supplementary Figure 1A,B).

Specific genes with high numbers of correlating partners

We ranked all genes according to the number of hypermethylated probes (taken from ref. 10) of the correlating gene partners. Tables 1 and 2 show the top 20 genes with the most hypermethylated probes, in the positively and negatively correlating partners, respectively. The top-ranking gene in Table 1 is SYNGR1, synaptogyrin, followed by GRIN1, a glutamate receptor and CHRNB2, a cholinergic receptor. All three genes have synapse-related functions, SYNGR1 and GRIN1 regulate synaptic plasticity whereas CHRNB2 regulates synapse assembly14.

Table 1 The top 20 positively correlating genes, ranked according to the hypermethylated probes (listed in column 6) in the correlating gene partners.
Table 2 The top 20 negatively correlating genes, ranked according to the hypermethylated probes (listed in column 6) in the anti-correlating gene partners.

SYNGR1 is also present in the 108 most mutated loci in ref. 7 although only for its regulatory region, not for its promoter. Further functional analysis identified 5 more synapse-related genes in Table 1: NISCH regulates synaptic transmission, RIMS1 and STX1A regulate postsynaptic potential, CPLX2 regulates synaptic plasticity and GRIA1 regulates long-term synaptic depression.

Altogether 11 out of the 20 genes in Table 1 are annotated in Genecards as schizophrenia-related and 14 appear in the 108 genomic regions in ref. 7. The only gene in Table 1 that appears in neither is ARHGEF11, a glutamate transport enhancer, however it had a significantly higher expression in the thalamus of schizophrenics in ref. 15.

Table 2 lists the top 20 genes with the highest number of hypermethylated probes for the negatively correlating partners. The top-ranking gene, DBI, is a GABA receptor modulator; its role having been contemplated in schizophrenia16 the authors concluded that DBI might have a symptom modulatory rather than an etiological role in schizophrenia. Unexpectedly, MOBP, myelin-associated oligodendrocyte basic protein and MBP, myelin basic protein, both appear in Table 2. While myelin-related abnormalities are one of the hallmarks of schizophrenia, surprisingly, not much functional information is available about MOBP beyond its role in the formation of the myelin sheath17 and a knockout mouse was phenotypically indistinguishable from the wild type18. Nevertheless, the authors argue that MOBP probably has a so far undiscovered function, due to the conservation of several alternatively spliced variants in rat and mouse18.

Altogether, 11 of the 20 genes with the greatest numbers of hypermethylated probes for their anti-correlating gene partners (Table 2) are annotated as schizophrenia-related in Genecards and 10 appear in the 108 loci in ref. 7.

Interestingly, several genes in both tables are cancer-related (indicated with a “*” next to the gene identifiers in Tables 1 and 2). Among the top 20 genes for hypermethylated probes of correlating partners are ARHGEF1119,20 and SREPF2, both associated with prostate and breast cancer21,22 while ALDOA functions as an oncogene in highly metastatic pancreatic cancer23. NISCH has a tumor-suppressive function in breast cancer24, CHGB is associated with aggressive VHL-associated pancreatic neuroendocrine tumors25. SEZ6L2 is a prognostic marker for lung cancer26, SLC45A1 is deleted in neuroblastoma27. GRIA1 might play a role in glioma proliferation28 while TCF20 is again associated with prostate29 and breast cancer30.

Among the top 20 genes for hypermethylated probes of anti-correlating partners are RRAS, an oncogene, also involved in neuronal axon guidance31 and MBP, associated with oligodendrogliomas32. TMEM219 regulates apoptosis33; CDK2AP1 is a putative oral cancer suppressor34; PSMB10, a proteasome subunit, is upregulated via the NFKB1 pathway in cancer cells35. S100A1 is also associated with several tumor types and inhibits apoptosis in ventricular cardiomyocytes36.

DGCR6 is associated with DiGeorge syndrome, a consequence of microdeletions in chromosomal region 22q11.2 and also has increased levels in metastatic mammary tumour cells37. FABP7 plays a role in neurogenesis and is a marker of glioma stem cells38. Epithelial splicing regulatory protein 2 (ESRP2) suppresses cancer cell motility39. ALDH1A1 is a marker and prognostic factor for several human cancers, bladder40, pancreatic41, lung42, colorectal and breast cancer43, among others.

Altogether, 10 genes among the top 20 for the greatest number of hypermethylated probes for the positively correlating gene partners (Table 1) and 13 of the 20 genes with the greatest numbers of hypermethylated probes for their anti-correlating gene partners (Table 2) are annotated as cancer-related in Genecards.

The connection between the positive and negative hubs

The existence of the two kinds of hub genes with high numbers of positive or negative correlating partners raises the question about their functionality and the underlying neurobiological pathways: are they related or do they form mostly separate networks? To answer this question we constructed three gene networks (Fig. 3): two representing the positive and negative correlations only for SYNGR1 and MOBP, respectively (Fig. 3a,b), and one (Fig. 3c) showing a combined network of the two. We chose SYNGR1 for the positive hub for it is the top ranking gene in Table 1 and is also located in one of the PGC regions whereas we chose MOBP because it ranks highly among the anti-correlating genes, indicating a putative regulatory function, which, however, has not been observed yet experimentally.

Figure 3: Gene networks of hub genes generated from hypermethylated Malacards-annotated genes.
figure 3

(A) SYNGR1-centered, positively correlating gene network. The red-colored edges show correlations with SYNGR1. (B) Negatively correlating gene network, centered on MOBP. All the red-colored edges show negative correlations with MOBP. All the other pairwise negative correlations are shown in grey. The purple-colored genes overlap with 108 highly mutated genomic loci in schizophrenics (see text for details). (C) The combined network in (A and B). All light green-colored genes correlate positively with SYNGR1 and negatively with MOBP. The purple-colored and turquoise-colored genes correlate only with MOBP or SYNGR1, respectively.

To reduce the size of the networks we selected only Malacards-annotated schizophrenia-related genes that are hypermethylated in ref. 10. SYNGR1 correlates positively with 49 genes (Fig. 3a); MOBP also has 49 negatively correlating partners (Fig. 3b). The combined network is shown in Fig. 3c. Strikingly, 41 genes are shared interacting partners between MOBP and SYNGR1. Each of the two hub genes interacts with only 8 genes that are not shared with the other hub. Clearly, the positive and negative correlations - and such interactions – form highly interconnected networks that provide synaptic functions, the core functionality of the human brain.

The shared genes reflect remarkably on the nature and most consistently observed features of schizophrenia: 6 shared genes are glutamate receptors (GRIA4, GRIK3, GRIK5, GRIN1, GRM1, GRM2), one GABA receptor (GABBR1), a cannabinoid receptor (CNR1), GAD2, a glutamate decarboxylase, two synapse-related (SYN2, SYT11) and several neuron-specific genes are also present among the shared genes (Fig. 3c).

We repeated the selection process replacing the Malacards-genes with the 108-loci genes. This resulted in a similar number of interacting genes that correlate positively with SYNGR1 (50 genes) and negatively with MOBP (48 genes). They share 39 common genes (Fig. 4). Biological processes derived from a Gene Ontology44 analysis of the shared genes either in the Malacards set or the 108-loci set and in both cases shared between SYNGR1 and MOBP and significantly enriched (p-value < 0.05) for both sets are shown in Table 3. For both gene sets the top two biological processes with the highest significance are “synaptic transmission” and “modulation of synaptic transmission”.

Figure 4
figure 4

(A) All positively correlating genes in the 108 loci. The network is again centered on SYNGR1. The red-colored edges show correlations with SYNGR1. (B) All the genes in the 108 loci that correlate negatively with MOBP (edges colored red) or one another (grey edges). (C) The combined network in (A and B). All light green-colored genes correlate positively with SYNGR1 and negatively with MOBP (edges show in red in both cases). The purple-colored and turquoise-colored genes correlate only with MOBP or SYNGR1 (of the two), respectively.

Table 3 Biological processes shared between the Malacards gene set interacting with both SYNGR1 and MOBP in Fig. 3C (41 light green-colored genes) and the 108-loci genes interacting with both SYNGR1 and MOBP in Fig. 4C (39 light green-colored genes).

Functional analysis of schizophrenia genes correlating with the top 20 hub genes

To get a further insight into the biological processes the top ranking genes and their network neighbors (i.e. their correlating and anti-correlating gene partners) partake in we took the top 20 genes with the most hypermethylated probes for the correlating (and anti-correlating) gene partners and selected those gene neighbors that correlated with minimum 19 of the top 20 genes. This resulted in 421 genes for the positively correlating set and 460 genes for the negatively correlating one.

For this in silico functional analysis we used an online tool provided by STRING45 to identify Gene Ontology terms that are over-represented in our gene set. The positive gene set was associated with 278 significantly enriched GO terms for biological processes while the negative set had 151 such GO terms. Interestingly, while the two gene sets share only 117 common genes i.e. about 1/4th of the total gene number for either set, they also share 117 common biological processes (Table 4), a much higher fraction for either set (42% of the positive set-derived biological processes and 77% of the negative set-derived processes are shared).

Table 4 Biological processes shared between the positively correlating and negatively correlating gene neighbors of the top 20 genes in Table 1 and Table 2.

This result also shows that gene networks as a whole are remarkably redundant, i.e. similar functions are often carried out by several functional homologs in the brain. This network view of the genes associated with schizophrenia is also revelatory for the polygenic nature of schizophrenia: once a mutation perturbs the expression of a highly connected hub gene with many interacting partners, this in turn will lead to the perturbation of several hundred or even thousand interacting genes. The differential methylation of several thousands of genes in the prefrontal cortex of schizophrenics might be one such manifestation of the complexity of the disease.

The shared GO terms ranked by significance are listed in Table 4. The top process is “nervous system development”, followed by “synaptic transmission”, reflecting two major aspects of schizophrenia, i.e. it is considered a (neuro)developmental disease46 affecting mostly the synapse47. On a different note, the most frequently occurring word in Table 4 is “regulation” (occurring 29 times), followed by “transport” (24x), perhaps underlining somewhat less pronounced aspects of schizophrenia.

Positive hub genes are longer, negative hub genes are shorter than average

We also calculated the protein length and protein disorder statistics for both the top 20 positive and negative hubs. Surprisingly, while the top 20 positive hub proteins were significantly longer (p-value < 10−5) than the average human protein (894 and 442 amino acids, on average, respectively), the top 20 negative hub proteins were significantly shorter (327 amino acids on average). We did not find their disorder to be significantly different from that of the general human protein population. Likewise, the 108 region-derived proteins among the top 20 positive and negative hubs in Tables 1 and 2, were also longer (14 proteins in Table 1a with an average length of 983 amino acids) and shorter (10 proteins in Table 1b with an average length of 261), respectively, than the average human protein (442 aa).

Discussion

It has been known for at least a decade that myelin has an inhibitory role in axonal regeneration48 in the CNS. Myelin is dysfunctional in schizophrenia49 and this dysfunctionality leads to changes in synaptic formation and function, another hallmark of schizophrenia49. Several studies have identified genes whose expression is abnormal in schizophrenic brains affecting myelin-related50 and synapse-related biochemical pathways47. However, this is the first time, to our knowledge, that a model for a complete network of gene interactions is presented that would account for both of these recurring anomalies in schizophrenia.

Our gene network has two hubs, one, SYNGR1, with a synaptic function, correlates positively and apparently interacts with a high number of genes that also interact with one another and at the same time interact negatively with MOBP, a myelin gene. Myelin is known to inhibit axonal sprouting, a step considered important for synaptic formation51. While neither MOBP, nor MBP are known to have such inhibitory functions, another myelin-related protein, RTN4 (also called Nogo-A), ranking 88th among the negatively correlating genes (Supplementary Table 2), does have such a function, inhibiting axon growth51. It is tempting to hypothesize that either MOBP or MBP also have such a - so far undiscovered - inhibitory function. MOBP is the 3rd most abundant protein in the CNS myelin and has several alternatively spliced variants. As highlighted by Montague et al.18 a physiological function for MOBP has not been found yet. Our model with two antagonistic hub genes would also account for the antagonistic relationship between myelin genes and synapse-related functions52.

We also found a high representation of cancer-related genes in both the top correlating and anti-correlating genes (Tables 1 and 2). While only limited association has been found so far connecting schizophrenia- and cancer-related gene hubs53, there are several studies pinpointing the essential role of hub proteins in cancer2,54. Only a handful of papers have been published on gene networks in schizophrenia, e.g. a study highlighting shared hubs and gene networks between schizophrenia and diabetes55 and our own work on microRNA-driven networks and hub genes in schizophrenia56. Perhaps the most relevant study connecting cancer- and schizophrenia-related gene networks is by Ibanez et al.57 who found an inverse comorbidity between the two diseases, observing expression deregulations in opposite directions of the same genes and gene networks.

Our model is also supported by the extraordinary enrichment of correlating partners for the genes encoded in the 108 genomic regions with the highest mutation rates in the genomes of schizophrenics identified in ref. 7. As mentioned above, the median number of correlating partners for those genes whose promoters are located in the 108 loci is 35 times higher for the positively correlating genes than for the rest of the genes in this study and 32 times higher for the negatively correlating genes. The effect size for these comparisons is in the range of 0.8–0.9, making the number of interacting partners, reflecting the centrality of a gene, the single most important factor when considering the biological significance of the individual schizophrenia-related genes and their contribution to the disease. There is a significant but smaller enrichment (2.5 times on average) for those 108-loci genes whose promoters are not, only their enhancers are present in these loci, which raises the intriguing possibility that for most genes in this set the mutations fall into or close to the promoter regions, compromising their functionality as in most cases of schizophrenia the protein sequences are not corrupted by mutations.

The assumption that it is the regulatory regions, not the protein-coding regions that are affected mostly in schizophrenia is also apparent in the fact that despite the strong centrality (“hubness”) of the affected genes we do find surviving phenotypes, which are the patients, exactly. As raised in the introduction, hub genes with a mutation in the coding region would make these mutations lethal in most instances2.

The robustness of our findings is also supported by the fact that when we replace the Malacards genes with the 108-loci genes in Fig. 3 (sharing only SYNGR1, SRR and NRGN as indicated in Fig. 3) and analyze the Gene Ontology terms for the 39 genes correlated positively by SYNGR1 and negatively by MOBP, we find that the most significant biological process associated with this gene set is again “synaptic transmission”. This also shows the remarkable redundancy of the gene networks in the human brain. Altogether, beyond providing an intriguing new model for schizophrenia, with more details for the underlining gene networks than before, it is also fascinating and quite fitting that synaptic transmission, perhaps the most complex and dynamic part of the human brain also entails the most genetic complexity, i.e. the most connected gene networks with the highest number of correlating/interacting gene partners.

Methods

Determining pairwise gene expression correlations in the human brain

In the first step all pairwise Pearson correlations were determined for those genes expressed in the brain that have expression data in the Allen Brain Atlas (website: brain-map.org)8,9. Gene expression was measured for 50,000 genes in 524 different tissues taken from several compartments of the brains of several individuals spanning the human lifetime between 2 weeks of post-conception and 40 years of age. We used an in-house Perl script to calculate pairwise correlations complemented by correlation data taken from SZDB.org11. We filtered the results keeping only pairwise Pearson correlation that were either minimum 0.8 or maximum −0.7. If there were several values for the same gene pair we used the most extreme ones, the highest and lowest values for the positive and negative correlations, respectively.

We mapped all known human genes in 108 highly mutated human genomic regions identified in ref. 7 separately for genes whose promoters or only cis regulatory elements fall into these 108 loci. We determined the location of the promoters and cis elements in the human genome using a study by Thurman et al.58.

Counting correlating gene pairs and their differentially methylated probes

For each gene whose expression correlated (r >= 0.8) or anti-correlated (r <= −0.7) with other genes in our data set (16830 and 16762 genes with correlating and anti-correlating partners, respectively) we counted the number of correlating and anti-correlating partners. Using a methylome data set in ref. 10 that recorded 56001 differentially methylated probes between two subgroups of schizophrenia patients we also counted gene partners for each gene that were hypermethylated or hypomethylated (with at least one hypermethylated or hypomethylated probe, respectively) and also the total number of hyper- or hypomethylated probes for the correlating and anti-correlating gene partners. The genes were ranked for the total sum of hypermethylated probes of the ‘neighboring’ (i.e. either correlating or anti-correlating) genes as this ranking placed SYNGR1, a gene present with its regulatory region in the 108 SNP rich regions in schizophrenics, identified by the Psychiatric Genomics Consortium (PGC) to the top of the list (in Table 1). This ranking also produced 14 genes out of the top 20 genes in Table 1 with either its promoter or cis-regulatory region overlapping with the SNP-rich PGC regions in ref. 7.

Determining the effect size to distinguish between schizophrenia-related and unrelated genes

To determine which measurement distinguishes the best between schizophrenia-related and non-specific genes we calculated the effect size for the 108-loci genes and also for genes annotated by Genecards and Malacards12,13 as schizophrenia-related (Fig. 2) regarding each specific feature mentioned in the previous paragraph (i.e. the number of correlating genes, the number of hypermethylated, hypomethylated correlating genes and the total sum of the differentially methylated probes of the correlating pairs). We repeated the calculations for the negatively correlating pairs as well. To calculate the effect size we used Cliff’s delta. Cliff’s delta is defined as

where the two distributions are of size m and n with items xi and xj (with i running from 1 to m and j running from 1 to n) respectively, and # is defined as the number of times. Cliff’s delta shows that out of all possible pairwise comparisons between the numbers in set A and set B in what proportion will the numbers in set A be bigger than in set B. Cliff’s delta does not require any assumptions about distribution types.

Functional analysis with the Gene Ontology module of STRING

To carry out functional analysis of the top 20 genes and their network neighbors we used the Gene Ontology (GO) module14 of the STRING45 webserver. The server lists all functional categories that are significantly enriched in the provided gene set, and supplies the corresponding p-values. We recorded the biological processes for the top 20 positive and negative genes and also their network neighbors.

Network visualization, statistical calculations

To visualize the gene correlation networks in Figs 3 and 4 Cytoscape59 was used. To carry out t-test calculations and calculate the corresponding p-values we used cpan’s Statistics package, a Perl library. Whenever not mentioned explicitly, calculations and data manipulation was carried out with in-house Perl script (available on request from the author).

Additional Information

How to cite this article: Hegyi, H. Connecting myelin-related and synaptic dysfunction in schizophrenia with SNP-rich gene expression hubs. Sci. Rep. 7, 45494; doi: 10.1038/srep45494 (2017).

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.