A unified resource and configurable model of the synapse proteome and its role in disease

Genes encoding synaptic proteins are highly associated with neuronal disorders, many of which show clinical co-morbidity. We integrated 58 published synaptic proteomic datasets that describe over 8000 proteins and combined them with direct protein–protein interactions and functional metadata to build a network resource that reveals the shared and unique protein components that underpin multiple disorders. All the data are provided in a flexible and accessible format to encourage custom use.

Figure. Studies included in the database. Dark grey corresponds to postsynaptic, light grey to presynaptic, and green to synaptosomal studies.
The resulting network model is embedded into a SQLite implementation, allowing users to derive custom network models based on metadata including species, disease association, synaptic compartment, brain region, and method of extraction (Fig. 2). The database and its manual are available from the Supplementary Materials and from Edinburgh DataShare (https://doi.org/10.7488/ds/3017), along with a SQLite Studio manual and an Rmd file for querying in the R environment. A screencast walk-through demonstrating use cases is available at https://youtu.be/oaW9Yr9AkXM.
The dataset can be used to answer frequent questions such as "What is known about my favourite gene? Is it pre- or postsynaptic? Which brain region was it identified in?". Beyond that, users can extend these queries to extract custom networks based on bespoke subsets of molecules. Worked examples that are easy to customise are shown in the Supplementary files.
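As an illustration of the "favourite gene" style of question, a minimal Python sketch is given here. It does not use the real schema of the resource: the table `gene_evidence`, its columns, and the placeholder rows (`GENE_A`, `GENE_B`) are hypothetical and exist only to show the shape of such a query. The actual table and column names should be taken from the manual distributed with the database.

```python
import sqlite3

# Hypothetical schema and placeholder rows, for illustration only.
# The real database's tables and columns may be named differently.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gene_evidence (
        gene_symbol  TEXT,
        compartment  TEXT,
        brain_region TEXT,
        study_id     INTEGER
    );
    """
)
placeholder_rows = [
    ("GENE_A", "postsynaptic", "hippocampus", 1),
    ("GENE_A", "presynaptic", "cortex", 2),
    ("GENE_B", "synaptosome", "striatum", 3),
]
conn.executemany("INSERT INTO gene_evidence VALUES (?, ?, ?, ?)", placeholder_rows)


def describe_gene(connection, symbol):
    """Return the distinct (compartment, brain_region) pairs recorded for a gene."""
    cursor = connection.execute(
        "SELECT DISTINCT compartment, brain_region "
        "FROM gene_evidence WHERE gene_symbol = ? "
        "ORDER BY compartment, brain_region",
        (symbol,),
    )
    return cursor.fetchall()


print(describe_gene(conn, "GENE_A"))
```

Pointing `sqlite3.connect` at the downloaded database file instead of `":memory:"`, and adapting the query to the real schema, would follow the same pattern. The parameterised `?` placeholder avoids building SQL from raw strings.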
The underlying principle of a systems biology approach is that structural features (pathways and subnetworks) underpin network functionality and that, given a network, one should be able to extract these features. Clustering algorithms 61,62 are commonly used to identify local communities within the network under the assumption that shared network topology correlates with shared function (and dysfunction). However, the more important question is how the different communities are organised to enable a controllable flow of signals across the large network. Using the PSP network as an example, we identified 1029 "Bridging" proteins as those known to interact locally with neighbours in the network, helping to organise function inside the communities they belong to 63,64 , while simultaneously influencing other communities in the network (Fig. 3A, Methods).
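To make the notion of a "Bridging" protein concrete, the sketch below computes one widely used bridgeness score from a vertex's community membership probabilities, following the definition of Nepusz and colleagues. Whether this matches the exact measure in our Methods is an assumption here, and the function name and the input vectors are illustrative rather than taken from the dataset.

```python
import math


def bridgeness(membership):
    """Bridgeness of one vertex from its community membership probabilities.

    `membership` holds the probabilities of belonging to each of C >= 2
    communities (assumed to sum to 1). The score is close to 0 when the
    vertex belongs to a single community and 1 when it is shared equally
    among all communities.
    """
    c = len(membership)
    if c < 2:
        raise ValueError("at least two communities are required")
    uniform = 1.0 / c
    # Squared deviation of the membership vector from the uniform vector.
    spread = sum((m - uniform) ** 2 for m in membership)
    # Clamp at 0 to absorb tiny negative values from floating-point rounding.
    return max(0.0, 1.0 - math.sqrt(c / (c - 1) * spread))


# Illustrative membership vectors over three communities (not real data).
print(bridgeness([1.0, 0.0, 0.0]))        # vertex confined to one community
print(bridgeness([0.5, 0.5, 0.0]))        # vertex split between two communities
print(bridgeness([1 / 3, 1 / 3, 1 / 3]))  # vertex shared equally by all three
```

Ranking proteins by such a score, given membership probabilities from a fuzzy or consensus clustering, is one way to obtain candidate bridging proteins.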
Using graph entropy as a complementary means of ranking a protein's ability to inhibit or enhance information flow 65 , we found that proteins with a high Bridgeness value are able to decrease the entropy of the network, thus facilitating signal transmission (Fig. 3B,D, Methods). Of the 1029 candidate Bridging proteins (see Region 1, Fig. 3C), we found ~43% associated with at least one known synaptopathy and ~21% linked to multiple diseases, including APP (AD&Epi&ASD&PD&HTN&MS&FTD), VDAC1 (AD&PD&MS), and MAPK14 (AD&SCH&HD&HTN&MS), which supports the functional and disease importance of "bridging" proteins. Indeed, we found significant overrepresentation for specific diseases, such as AD (P = 3.4E−6), HTN (P = 2.1E−5), HD (P = 5.2E−5), and PD (P = 2.6E−3) (Supplementary Table 2).
There are many complex co-morbidities between psychiatric disorders at the population and the genetic level, but for most the molecular basis remains elusive. The network perspective can be used to obtain a different view by linking topology and phenotype together. Gene-disease association data is noisy and far from complete, but we can partly compensate by measuring, for each disease, the distance from each protein in the network to its nearest known associated protein. This measure can be extended to disease pairs 66 to dissect how these different neurological diseases coalesce at the synapse.
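The nearest-protein distance idea for disease pairs can be sketched as follows. This is a minimal illustration that assumes the commonly used network separation score, s_AB = <d_AB> − (<d_AA> + <d_BB>)/2, where negative values indicate overlapping disease neighbourhoods. The toy path graph, the node names, and the helper functions are hypothetical, and the sketch omits the significance testing needed to obtain P values.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def nearest_distances(graph, sources, targets, exclude_self):
    """For each source, the distance to its nearest reachable target."""
    result = []
    for s in sources:
        dist = bfs_distances(graph, s)
        candidates = [
            dist[t] for t in targets
            if t in dist and not (exclude_self and t == s)
        ]
        if candidates:
            result.append(min(candidates))
    return result


def mean(values):
    if not values:
        raise ValueError("no reachable protein pairs to average over")
    return sum(values) / len(values)


def separation(graph, set_a, set_b):
    """Network separation of two protein sets; negative values suggest overlap."""
    d_aa = mean(nearest_distances(graph, set_a, set_a, exclude_self=True))
    d_bb = mean(nearest_distances(graph, set_b, set_b, exclude_self=True))
    d_ab = mean(
        nearest_distances(graph, set_a, set_b, exclude_self=False)
        + nearest_distances(graph, set_b, set_a, exclude_self=False)
    )
    return d_ab - (d_aa + d_bb) / 2


# Toy undirected path graph p1 - p2 - ... - p6 (placeholder node names).
nodes = ["p1", "p2", "p3", "p4", "p5", "p6"]
graph = {n: set() for n in nodes}
for left, right in zip(nodes, nodes[1:]):
    graph[left].add(right)
    graph[right].add(left)

print(separation(graph, {"p1", "p2"}, {"p5", "p6"}))              # distant sets
print(separation(graph, {"p1", "p2", "p3"}, {"p2", "p3", "p4"}))  # overlapping sets
```

In practice, the observed score would be compared against scores from randomised protein sets or networks to judge significance. That step is deliberately left out of this sketch.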
In both postsynaptic and presynaptic models, we found overlap for hypertension (HTN) with AD (P = 8.6E−4/1.0E−2) and with MS (P = 8.79E−5/2.12E−3) (Fig. 3E). The AD-HTN link is not, in itself, new, but it is commonly considered a cardiovascular mechanism with a neurological impact. However, the network view reveals a new potential mechanistic link at the synapse. Although we found significant overlaps for AD-HTN and AD-PD, we did not see evidence for a PD-HTN link (P = 0.17/0.36), which suggests a potential shared mechanistic pathway between AD and HTN that differs from the pathways shared between AD and PD (Fig. 3E).
To further dissect the potential sharing of pathways between AD and HTN in the PSP network (Fig. 3F), we employed Belief Propagation to propagate these gene-disease associations (GDAs) through the network's edges, and a Degree-Corrected Stochastic Block Model (DC-SBM) to model their effect on network clustering 67 . Under a prior assumption of no correlation between the GDAs and the network communities, we found evidence for the co-localization of AD and HTN (C = 31, P = 4.69E−5 and C = 43, P = 1.6E−11). Functionally, these communities are enriched for synaptic transmission and axon guidance (C = 31, GO:0007268 = 5.8E−3, GO:0007411 = 7.46E−5), and for stress-activated MAPK cascade and response to oxidative stress (C = 43, GO:0051403 = 1.92E−5, GO:0006979 = 5.34E−5).
The presented synapse proteome dataset is the largest, most complete and most up-to-date of its kind, and is freely available with lightweight tools that allow anyone to extract relevant subsets. It complements the previously published curated dataset of synaptic genes, SynGO 68 , and both resources can be used jointly, as we have cross-referenced the common genes. By mirroring the methods used, it would be straightforward for any user to add their own datasets for comparison.
Figure 2. Structure of the SQLite database, which includes 58 synaptic studies covering 8087 unique genes and 407,643 direct protein interactions. Grey ovals at the top show the annotated metadata: on the left, for nodes/genes, which include brain region, subcellular compartment, method of extraction, disease and GO function annotation, and links to published quantitative models; on the right, for edges/PPIs, which include PSI-MI type and method. The orange ovals at the bottom illustrate the possible outputs of the database, including (1) information for a specific protein/gene, and (2) information that can be obtained from the PPI network, e.g., a protein's topological importance, community-to-disease relationships, and disease-disease comorbidity. The database is available as a Supplementary File and from Edinburgh DataShare https://doi.org/10.7488/ds/3017.
www.nature.com/scientificreports/