Connecting myelin-related and synaptic dysfunction in schizophrenia with SNP-rich gene expression hubs

Combining genome-wide mapping of SNP-rich regions in schizophrenics and gene expression data in all brain compartments across the human life span revealed that genes with promoters most frequently mutated in schizophrenia are expression hubs interacting with far more genes than the rest of the genome. We summed up the differentially methylated “expression neighbors” of genes that fall into one of 108 distinct schizophrenia-associated loci with high number of SNPs. Surprisingly, the number of expression neighbors of the genes in these loci were 35 times higher for the positively correlating genes (32 times higher for the negatively correlating ones) than for the rest of the ~16000 genes. While the genes in the 108 loci have little known impact in schizophrenia, we identified many more known schizophrenia-related important genes with a high degree of connectedness (e.g. MOBP, SYNGR1 and DGCR6), validating our approach. Both the most connected positive and negative hubs affected synapse-related genes the most, supporting the synaptic origin of schizophrenia. At least half of the top genes in both the correlating and anti-correlating categories are cancer-related, including oncogenes (RRAS and ALDOA), providing further insight into the observed inverse relationship between the two diseases.

= − 0.7 in human brain tissues using the Allen Brain Atlas (website: brain-map.org) 8,9 , and a recent database with pre-calculated Pearson correlations for all relevant gene pairs at http://www.szdb.org/download.html#coexpression by Wu et al. 11 . This resulted in 1,257,407 positively correlating and 1,108,585 negatively correlating unique gene pairs belonging to 16829 and 16761 individual genes, respectively (Supplementary Tables 1 and 2).
Subsequently we mapped all known human genes in 108 highly mutated human genomic regions identified in ref. 7 separately for genes whose promoters or only cis regulatory elements fall into these 108 loci. Filtering for genes present in the correlating or anti-correlating pairs by Wu et al. 11 resulted in 254 promoter-selected and 462 cis-selected genes (indicated by 1p/1c in Supplementary Tables 1 and 2, respectively).
Counting the positively and negatively correlating partners for all the genes except in the 108 loci in ref. 7 resulted in a median number of 71 positively and 63 negatively correlating pairs, respectively (Fig. 1). Unexpectedly, the median numbers of correlating pairs for the promoter-selected genes were 2472 and 2013, corresponding to a 35-fold and 32-fold increase for the positive and negative pairs, respectively.
The number of correlating pairs for the cis-selected genes also proved to be higher (386 and 320 pairs on average for positively and negatively correlating pairs, respectively), corresponding to an approximately 2.5-fold increase when compared to the general gene population. The median values were similar to the average gene population in this category ( Fig. 1) but an unpaired t-test showed that the cis-derived genes and the general gene population (excluding the 108-loci genes) are significantly different (p-value = 0.012).
We also investigated the number of differentially methylated gene pairs using a table of 56001 differentially methylated probes from a genome-wide methylome study in schizophrenics 10 . We counted separately the number of hypermethylated and hypomethylated genes for the highly correlating and anti-correlating gene pairs and also the total sum of hyper-and hypomethylated probes belonging to these genes. The sums of scores for both the number of differentially methylated gene pairs and the total sum of their differentially methylated probes are shown in Supplementary Tables 1 and 2, for the positively and negatively correlating pairs, respectively.
To determine which measurement distinguishes the best between schizophrenia-related and non-specific genes we calculated the effect size for the 108-loci genes and also for genes annotated by Genecards and Malacards 12,13 as schizophrenia-related (Fig. 2), using Cliff 's delta for five different measurements (see legend for details). Figure 2a and b shows the effect sizes for all relevant pairs of complementary gene sets (i.e. promoter-derived vs. non-promoter-derived, Genecards vs. non-Genecards genes, etc.) comparing the numbers of correlating gene partners in one set to the numbers of the gene partners in the other gene set. As expected the promoter-derived 108-loci genes had the greatest effect size, followed by Malacards and Genecards, the 108-loci cis-derived genes having the smallest effect. The results were similar for the negatively correlating pairs (Fig. 2b). Here we also filtered for gene pairs that were both differentially methylated in ref. 10, which increased the effect size by 0.039 for "pairs" in Fig. 2 for the promoter-derived positive set and by 0.061 for the same for the negative set. (Unfiltered charts shown in Supplementary Figure 1A  top 20 genes with the most hypermethylated probes, in the positively and negatively correlating partners, respectively. The top-ranking gene in Table 1 is SYNGR1, synaptogyrin, followed by GRIN1, a glutamate receptor and CHRNB2, a cholinergic receptor. All three genes have synapse-related functions, SYNGR1 and GRIN1 regulate synaptic plasticity whereas CHRNB2 regulates synapse assembly 14 .
SYNGR1 is also present in the 108 most mutated loci in ref. 7 although only for its regulatory region, not for its promoter. Further functional analysis identified 5 more synapse-related genes in Table 1: NISCH regulates synaptic transmission, RIMS1 and STX1A regulate postsynaptic potential, CPLX2 regulates synaptic plasticity and GRIA1 regulates long-term synaptic depression.
Altogether 11 out of the 20 genes in Table 1 are annotated in Genecards as schizophrenia-related and 14 appear in the 108 genomic regions in ref. 7. The only gene in Table 1 that appears in neither is ARHGEF11, a glutamate transport enhancer, however it had a significantly higher expression in the thalamus of schizophrenics in ref. 15. Table 2 lists the top 20 genes with the highest number of hypermethylated probes for the negatively correlating partners. The top-ranking gene, DBI, is a GABA receptor modulator; its role having been contemplated in schizophrenia 16 the authors concluded that DBI might have a symptom modulatory rather than an etiological role in schizophrenia. Unexpectedly, MOBP, myelin-associated oligodendrocyte basic protein and MBP, myelin basic protein, both appear in Table 2. While myelin-related abnormalities are one of the hallmarks of schizophrenia, surprisingly, not much functional information is available about MOBP beyond its role in the formation of the myelin sheath 17 and a knockout mouse was phenotypically indistinguishable from the wild type 18 . Nevertheless, the authors argue that MOBP probably has a so far undiscovered function, due to the conservation of several alternatively spliced variants in rat and mouse 18 .
Altogether, 11 of the 20 genes with the greatest numbers of hypermethylated probes for their anti-correlating gene partners (Table 2)   Malacards-annotated schizophrenia genes; Genecards-annotated schizophrenia genes; 108-loci promoter genes. Effect sizes were calculated for 5 different measurements: (i) pairs, the total number of correlating gene pairs; (ii) pairs_hypometh, the number of correlating gene pairs that are hypomethylated (defined as in at least one probe the gene is hypomethylated); (iii) pairs_hypermeth, the number of correlating gene pairs that are hypermethylated; (iv) pairs_hypoSum, the total number of hypomethylated probes of the correlating gene pairs; (v) pairs_hyperSum, the total number of hypermethylated pairs for the correlating gene pairs. The numbers were filtered using only pairs of genes where both genes were differentially methylated in ref. Interestingly, several genes in both tables are cancer-related (indicated with a "*" next to the gene identifiers in Tables 1 and 2). Among the top 20 genes for hypermethylated probes of correlating partners are ARHGEF11 19,20 and SREPF2, both associated with prostate and breast cancer 21,22 while ALDOA functions as an oncogene in highly metastatic pancreatic cancer 23 . NISCH has a tumor-suppressive function in breast cancer 24 , CHGB is associated with aggressive VHL-associated pancreatic neuroendocrine tumors 25 . SEZ6L2 is a prognostic marker for lung cancer 26 , SLC45A1 is deleted in neuroblastoma 27 . GRIA1 might play a role in glioma proliferation 28 while TCF20 is again associated with prostate 29 and breast cancer 30 .
Among the top 20 genes for hypermethylated probes of anti-correlating partners are RRAS, an oncogene, also involved in neuronal axon guidance 31 and MBP, associated with oligodendrogliomas 32 . TMEM219 regulates apoptosis 33 ; CDK2AP1 is a putative oral cancer suppressor 34 ; PSMB10, a proteasome subunit, is upregulated via the NFKB1 pathway in cancer cells 35 . S100A1 is also associated with several tumor types and inhibits apoptosis in ventricular cardiomyocytes 36 .
DGCR6 is associated with DiGeorge syndrome, a consequence of microdeletions in chromosomal region 22q11.2 and also has increased levels in metastatic mammary tumour cells 37 . FABP7 plays a role in neurogenesis and is a marker of glioma stem cells 38 . Epithelial splicing regulatory protein 2 (ESRP2) suppresses cancer cell motility 39 . ALDH1A1 is a marker and prognostic factor for several human cancers, bladder 40 , pancreatic 41 , lung 42 , colorectal and breast cancer 43 , among others.
Altogether, 10 genes among the top 20 for the greatest number of hypermethylated probes for the positively correlating gene partners ( Table 1) and 13 of the 20 genes with the greatest numbers of hypermethylated probes for their anti-correlating gene partners (Table 2) are annotated as cancer-related in Genecards.   The connection between the positive and negative hubs. The existence of the two kinds of hub genes with high numbers of positive or negative correlating partners raises the question about their functionality and the underlying neurobiological pathways: are they related or do they form mostly separate networks? To answer this question we constructed three gene networks ( Fig. 3): two representing the positive and negative correlations only for SYNGR1 and MOBP, respectively ( Fig. 3a,b), and one ( Fig. 3c) showing a combined network of the two. We chose SYNGR1 for the positive hub for it is the top ranking gene in Table 1 and is also located in one of the PGC regions whereas we chose MOBP because it ranks highly among the anti-correlating genes, indicating a putative regulatory function, which, however, has not been observed yet experimentally.
To reduce the size of the networks we selected only Malacards-annotated schizophrenia-related genes that are hypermethylated in ref. 10. SYNGR1 correlates positively with 49 genes (Fig. 3a); MOBP also has 49 negatively correlating partners (Fig. 3b). The combined network is shown in Fig. 3c. Strikingly, 41 genes are shared interacting partners between MOBP and SYNGR1. Each of the two hub genes interacts with only 8 genes that are not shared with the other hub. Clearly, the positive and negative correlations -and such interactions -form highly interconnected networks that provide synaptic functions, the core functionality of the human brain.
We repeated the selection process replacing the Malacards-genes with the 108-loci genes. This resulted in a similar number of interacting genes that correlate positively with SYNGR1 (50 genes) and negatively with MOBP (48 genes). They share 39 common genes (Fig. 4). Biological processes derived from a Gene Ontology 44 analysis of the shared genes either in the Malacards set or the 108-loci set and in both cases shared between SYNGR1 and MOBP   and significantly enriched (p-value < 0.05) for both sets are shown in Table 3. For both gene sets the top two biological processes with the highest significance are "synaptic transmission" and "modulation of synaptic transmission".
Functional analysis of schizophrenia genes correlating with the top 20 hub genes. To get a further insight into the biological processes the top ranking genes and their network neighbors (i.e. their correlating and anti-correlating gene partners) partake in we took the top 20 genes with the most hypermethylated probes for the correlating (and anti-correlating) gene partners and selected those gene neighbors that correlated with minimum 19 of the top 20 genes. This resulted in 421 genes for the positively correlating set and 460 genes for the negatively correlating one. For this in silico functional analysis we used an online tool provided by STRING 45 to identify Gene Ontology terms that are over-represented in our gene set. The positive gene set was associated with 278 significantly enriched GO terms for biological processes while the negative set had 151 such GO terms. Interestingly, while the two gene sets share only 117 common genes i.e. about 1/4 th of the total gene number for either set, they also share 117 common biological processes (Table 4), a much higher fraction for either set (42% of the positive set-derived biological processes and 77% of the negative set-derived processes are shared).
This result also shows that gene networks as a whole are remarkably redundant, i.e. similar functions are often carried out by several functional homologs in the brain. This network view of the genes associated with schizophrenia is also revelatory for the polygenic nature of schizophrenia: once a mutation perturbs the expression of a highly connected hub gene with many interacting partners, this in turn will lead to the perturbation of several hundred or even thousand interacting genes. The differential methylation of several thousands of genes in the prefrontal cortex of schizophrenics might be one such manifestation of the complexity of the disease.
The shared GO terms ranked by significance are listed in Table 4. The top process is "nervous system development", followed by "synaptic transmission", reflecting two major aspects of schizophrenia, i.e. it is considered a (neuro)developmental disease 46 affecting mostly the synapse 47 . On a different note, the most frequently occurring word in Table 4 is "regulation" (occurring 29 times), followed by "transport" (24x), perhaps underlining somewhat less pronounced aspects of schizophrenia.
Positive hub genes are longer, negative hub genes are shorter than average. We also calculated the protein length and protein disorder statistics for both the top 20 positive and negative hubs. Surprisingly, while the top 20 positive hub proteins were significantly longer (p-value < 10 −5 ) than the average human protein    Table 3. Biological processes shared between the Malacards gene set interacting with both SYNGR1 and MOBP in Fig. 3C (41 light green-colored genes) and the 108-loci genes interacting with both SYNGR1 and MOBP in Fig. 4C (39 light green-colored genes). 12 biological processes in the table are common in the two shared sets, despite the very limited number of actually shared genes (only NRGN is shared besides MOBP and SYNGR1 but these two were not included in the GO analysis). (894 and 442 amino acids, on average, respectively), the top 20 negative hub proteins were significantly shorter (327 amino acids on average). We did not find their disorder to be significantly different from that of the general human protein population. Likewise, the 108 region-derived proteins among the top 20 positive and negative hubs in Tables 1 and 2, were also longer (14 proteins in Table 1a with an average length of 983 amino acids) and shorter (10 proteins in Table 1b with an average length of 261), respectively, than the average human protein (442 aa).

Discussion
It has been known for at least a decade that myelin has an inhibitory role in axonal regeneration 48 in the CNS. Myelin is dysfunctional in schizophrenia 49 and this dysfunctionality leads to changes in synaptic formation and function, another hallmark of schizophrenia 49 . Several studies have identified genes whose expression is abnormal in schizophrenic brains affecting myelin-related 50 and synapse-related biochemical pathways 47 . However, this is the first time, to our knowledge, that a model for a complete network of gene interactions is presented that would account for both of these recurring anomalies in schizophrenia.
Our gene network has two hubs, one, SYNGR1, with a synaptic function, correlates positively and apparently interacts with a high number of genes that also interact with one another and at the same time interact negatively with MOBP, a myelin gene. Myelin is known to inhibit axonal sprouting, a step considered important for synaptic formation 51 . While neither MOBP, nor MBP are known to have such inhibitory functions, another myelin-related protein, RTN4 (also called Nogo-A), ranking 88th among the negatively correlating genes (Supplementary Table 2), does have such a function, inhibiting axon growth 51 . It is tempting to hypothesize that either MOBP or MBP also have such a -so far undiscovered -inhibitory function. MOBP is the 3 rd most abundant protein in the CNS myelin and has several alternatively spliced variants. As highlighted by Montague et al. 18 a physiological function for MOBP has not been found yet. Our model with two antagonistic hub genes would also account for the antagonistic relationship between myelin genes and synapse-related functions 52 .
We also found a high representation of cancer-related genes in both the top correlating and anti-correlating genes (Tables 1 and 2). While only limited association has been found so far connecting schizophrenia-and cancer-related gene hubs 53 , there are several studies pinpointing the essential role of hub proteins in cancer 2,54 . Only a handful of papers have been published on gene networks in schizophrenia, e.g. a study highlighting shared  Table 4. Biological processes shared between the positively correlating and negatively correlating gene neighbors of the top 20 genes in Table 1 and Table 2.
Scientific RepoRts | 7:45494 | DOI: 10.1038/srep45494 hubs and gene networks between schizophrenia and diabetes 55 and our own work on microRNA-driven networks and hub genes in schizophrenia 56 . Perhaps the most relevant study connecting cancer-and schizophrenia-related gene networks is by Ibanez et al. 57 who found an inverse comorbidity between the two diseases, observing expression deregulations in opposite directions of the same genes and gene networks. Our model is also supported by the extraordinary enrichment of correlating partners for the genes encoded in the 108 genomic regions with the highest mutation rates in the genomes of schizophrenics identified in ref. 7. As mentioned above, the median number of correlating partners for those genes whose promoters are located in the 108 loci is 35 times higher for the positively correlating genes than for the rest of the genes in this study and 32 times higher for the negatively correlating genes. The effect size for these comparisons is in the range of 0.8-0.9, making the number of interacting partners, reflecting the centrality of a gene, the single most important factor when considering the biological significance of the individual schizophrenia-related genes and their contribution to the disease. There is a significant but smaller enrichment (2.5 times on average) for those 108-loci genes whose promoters are not, only their enhancers are present in these loci, which raises the intriguing possibility that for most genes in this set the mutations fall into or close to the promoter regions, compromising their functionality as in most cases of schizophrenia the protein sequences are not corrupted by mutations.
The assumption that it is the regulatory regions, not the protein-coding regions that are affected mostly in schizophrenia is also apparent in the fact that despite the strong centrality ("hubness") of the affected genes we do find surviving phenotypes, which are the patients, exactly. As raised in the introduction, hub genes with a mutation in the coding region would make these mutations lethal in most instances 2 .
The robustness of our findings is also supported by the fact that when we replace the Malacards genes with the 108-loci genes in Fig. 3 (sharing only SYNGR1, SRR and NRGN as indicated in Fig. 3) and analyze the Gene Ontology terms for the 39 genes correlated positively by SYNGR1 and negatively by MOBP, we find that the most significant biological process associated with this gene set is again "synaptic transmission". This also shows the remarkable redundancy of the gene networks in the human brain. Altogether, beyond providing an intriguing new model for schizophrenia, with more details for the underlining gene networks than before, it is also fascinating and quite fitting that synaptic transmission, perhaps the most complex and dynamic part of the human brain also entails the most genetic complexity, i.e. the most connected gene networks with the highest number of correlating/interacting gene partners.

Methods
Determining pairwise gene expression correlations in the human brain. In the first step all pairwise Pearson correlations were determined for those genes expressed in the brain that have expression data in the Allen Brain Atlas (website: brain-map.org) 8,9 . Gene expression was measured for 50,000 genes in 524 different tissues taken from several compartments of the brains of several individuals spanning the human lifetime between 2 weeks of post-conception and 40 years of age. We used an in-house Perl script to calculate pairwise correlations complemented by correlation data taken from SZDB.org 11 . We filtered the results keeping only pairwise Pearson correlation that were either minimum 0.8 or maximum − 0.7. If there were several values for the same gene pair we used the most extreme ones, the highest and lowest values for the positive and negative correlations, respectively.
We mapped all known human genes in 108 highly mutated human genomic regions identified in ref. 7 separately for genes whose promoters or only cis regulatory elements fall into these 108 loci. We determined the location of the promoters and cis elements in the human genome using a study by Thurman et al. 58 .
Counting correlating gene pairs and their differentially methylated probes. For each gene whose expression correlated (r > = 0.8) or anti-correlated (r < = − 0.7) with other genes in our data set (16830 and 16762 genes with correlating and anti-correlating partners, respectively) we counted the number of correlating and anti-correlating partners. Using a methylome data set in ref. 10 that recorded 56001 differentially methylated probes between two subgroups of schizophrenia patients we also counted gene partners for each gene that were hypermethylated or hypomethylated (with at least one hypermethylated or hypomethylated probe, respectively) and also the total number of hyper-or hypomethylated probes for the correlating and anti-correlating gene partners. The genes were ranked for the total sum of hypermethylated probes of the 'neighboring' (i.e. either correlating or anti-correlating) genes as this ranking placed SYNGR1, a gene present with its regulatory region in the 108 SNP rich regions in schizophrenics, identified by the Psychiatric Genomics Consortium (PGC) to the top of the list (in Table 1). This ranking also produced 14 genes out of the top 20 genes in Table 1 with either its promoter or cis-regulatory region overlapping with the SNP-rich PGC regions in ref. 7. Determining the effect size to distinguish between schizophrenia-related and unrelated genes. To determine which measurement distinguishes the best between schizophrenia-related and non-specific genes we calculated the effect size for the 108-loci genes and also for genes annotated by Genecards and Malacards 12,13 as schizophrenia-related (Fig. 2) regarding each specific feature mentioned in the previous paragraph (i.e. the number of correlating genes, the number of hypermethylated, hypomethylated correlating genes and the total sum of the differentially methylated probes of the correlating pairs). We repeated the calculations for the negatively correlating pairs as well. To calculate the effect size we used Cliff 's delta. Cliff 's delta is defined as where the two distributions are of size m and n with items x i and x j (with i running from 1 to m and j running from 1 to n) respectively, and # is defined as the number of times. Cliff 's delta shows that out of all possible pairwise comparisons between the numbers in set A and set B in what proportion will the numbers in set A be bigger than in set B. Cliff 's delta does not require any assumptions about distribution types.

Functional analysis with the Gene Ontology module of STRING.
To carry out functional analysis of the top 20 genes and their network neighbors we used the Gene Ontology (GO) module 14 of the STRING 45 webserver. The server lists all functional categories that are significantly enriched in the provided gene set, and supplies the corresponding p-values. We recorded the biological processes for the top 20 positive and negative genes and also their network neighbors.
Network visualization, statistical calculations. To visualize the gene correlation networks in Figs 3 and