Abstract
Comparing brain structure across species and regions enables key functional insights. Leveraging publicly available data from a novel mass cytometry-based method, synaptometry by time of flight (SynTOF), we applied an unsupervised machine learning approach to conduct a comparative study of presynapse molecular abundance across three species and three brain regions. We used neural networks and their attractive properties to model complex relationships among high dimensional data to develop a unified, unsupervised framework for comparing the profile of more than 4.5 million single presynapses among normal human, macaque, and mouse samples. An extensive validation showed the feasibility of performing cross-species comparison using SynTOF profiling. Integrative analysis of the abundance of 20 presynaptic proteins revealed near-complete separation between primates and mice involving synaptic pruning, cellular energy, lipid metabolism, and neurotransmission. In addition, our analysis revealed a strong overlap between the presynaptic composition of human and macaque in the cerebral cortex and neostriatum. Our unique approach illuminates species- and region-specific variation in presynapse molecular composition.
Introduction
Synapses are asymmetric intercellular junctions that differ in presynaptic neurotransmitters and postsynaptic receptors. In animals, presynaptic terminals are present in all neurons and may be the only exclusively neuronal feature among cells1. Despite their unique and essential role in central nervous system function2,3,4,5,6,7, the molecular diversity of human presynapses remains poorly understood. Indeed, a fuller appreciation of the molecular diversity of human presynapses has been obscured by technological limitations that force a trade-off in either capturing single presynapses with limited molecular information, or capturing more detailed molecular information using bulk synaptosome preparations8,9,10,11. Our recently developed mass cytometry- (CyTOF-) based method12,13,14, synaptometry by time of flight (SynTOF), has enabled high-throughput, multiplex analysis of single synaptic events (analogous to cellular events in CyTOF), either pre- or post-synaptic vesicles, offering a unique opportunity to characterize the molecular composition of presynaptic events at an unprecedented scale15.
Cross-species comparative analysis is a powerful method to understand the specificity of human biological processes and the evolution of biological systems16. However, integrating a large amount of heterogeneous data across multiple species requires statistically advanced tools that are computationally efficient and highly scalable. While multiple techniques are emerging to address these issues for monospecies single-event data15,17,18, no principled framework exists for multispecies single-event data. One strategy to bypass this is to analyze species data independently and identify single events separately19,20,21, but this requires identified (annotated) and well-defined single events. The limited understanding of the molecular composition of the presynapse thus precludes using this paradigm on SynTOF data.
To address this gap, we develop here an alternate, “comparative anatomy” approach that leverages an advanced machine-learning algorithm, enabling a direct cross-species comparison among the molecular composition of single presynaptic events. We leveraged publicly available single-presynapse event data: 3,657,113 from research volunteers (Hu), 759,227 from cynomolgus macaque (Macaca fascicularis; non-human primate, NHP), and 201,261 from wild-type C57Bl/6 mouse (Mu), characterized by the expression of phenotypic antibodies15,16,17,18 related to synaptic composition and organization, and showing cross-species reactivity. This dataset was collected on research volunteers without neurologic disease or neuropathologic changes22,23,24,25 (n = 6, 2 females) aged 76–97 years, healthy female NHP without neuropathologic changes (n = 4, 4 females) aged 11 years, and healthy 22-month wild-type C57Bl/6 mice (n = 5, 3 females) (Fig. 1A).
Our method identifies differences and similarities between the three species in the cerebral cortex at unparalleled scale and breadth of multiplexing. It further leads to new insight into primates’ presynaptic differences in the neostriatum and Hu and Mu presynaptic divergences in the hippocampus. It reveals a strong similarity between “disease-free” (control) Hu and NHP in presynaptic molecular composition for both cerebral cortex and neostriatum, with analogous presynaptic signatures in both primates, as well as a large divergence between primates and Mu.
Results
Non-zero cross-reactive SynTOF markers
To address concerns about comparing interspecies data derived from antibody-based detection, we first assessed the positive mean marker expression using a one-sided t-test (Fig. S1A) and determined that our panel resulted in significant non-zero marker reactivities (P-value < 0.05). We then compared the target protein avidity of each antibody to minimize any potential impact on the observed measurements26,27,28,29,30,31. To do so, we compared the mean expression values using antibodies validated to react with Mu, NHP, and Hu epitopes (Fig. 1B). Analysis of variance revealed no significant differences in mean expression levels of Hu, NHP, and Mu presynaptic proteins in cerebral cortex (P-value = 0.87 > 0.05 after multi-testing correction). Similarly, pairwise t-test comparisons between mean expression level of Hu and NHP presynaptic events in neostriatum (P-value = 0.93 > 0.05) and Hu and Mu presynaptic events in hippocampus (P-value = 0.99 > 0.05) also revealed no significant differences after multi-testing correction (Fig. S1B–C). These results show that the antibody panel did not have significant differences in reactivity across the three species.
In addition, no significant difference in marker variance was observed when merging data from different species (P-value > 0.05) (Fig. 1C). Taken together, our results show that potential sources of technical variation in antibody reactivity across species are minimal and non-significant, indicating that within our defined parameters SynTOF cross-species comparison can be pursued.
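The two tests described in this section can be illustrated with a minimal sketch. The data below are simulated, not SynTOF measurements; only the group sizes (6 Hu, 4 NHP, 5 Mu) come from the text, and the choice of SciPy is an assumption made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-subject mean expression of one marker (arbitrary units).
# Group sizes follow the cohort described above: 6 Hu, 4 NHP, 5 Mu.
hu = rng.normal(loc=1.0, scale=0.2, size=6)
nhp = rng.normal(loc=1.0, scale=0.2, size=4)
mu = rng.normal(loc=1.0, scale=0.2, size=5)

# One-sided, one-sample t-test: is mean expression greater than zero?
all_subjects = np.concatenate([hu, nhp, mu])
t_stat, p_nonzero = stats.ttest_1samp(all_subjects, popmean=0.0, alternative="greater")

# One-way analysis of variance: does mean expression differ between species?
f_stat, p_anova = stats.f_oneway(hu, nhp, mu)

print(f"non-zero reactivity: p = {p_nonzero:.3g}")
print(f"species effect (ANOVA): p = {p_anova:.3g}")
```

In a full analysis these tests would be repeated for each marker and the resulting P-values corrected for multiple testing.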
Minimal confounded model enables cross-species comparison
We explored further the extent to which species-specific variations might impact our results. To do so, our machine-learning clustering algorithm (described in “Methods”) was jointly applied to presynaptic SynTOF data from Hu, NHP, and Mu using one model per brain region, to avoid confounding our results with regional variability (Fig. 1D, E). Single events per species with a mean frequency lower than 0.01 per cluster were filtered out to abrogate the contribution of noise. Cluster consistency was validated using the silhouette score32.
Since assessing the correctness of a clustering method would require labeled data or prior knowledge of presynaptic composition, we assessed the impact of well-known technical confounding factors on data variations. T-distributed stochastic neighbor embedding (t-SNE) applied to the shared representation of single presynapses exhibited good mixing, without clear separation between subjects or sex (Figs. 1D, S1D). The impact of species on this clustering was evaluated by creating a nearest-neighbor graph built on the mean expression vector of each subject on each cluster, weighted using the inverse Euclidean norm (Fig. 1E). To assess the confounding of species in clustering output, only clusters gathering events from multiple species were considered when creating the graph: 11 clusters including both Hu and NHP (P1–P11) and one cluster gathering events from the three species (A1). The observed proximity between nodes from the same clusters in the graph suggests that our algorithm created a low-dimensional representation that clustered events by presynaptic event features rather than species differences32. A higher similarity score was observed between presynapses and marker expression within the same cluster than within each species (see Fig. S2A–D and Methods).
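A nearest-neighbor graph weighted by the inverse Euclidean norm can be sketched as follows. The function name `knn_graph_inverse_distance`, the number of neighbors `k`, and the toy node coordinates are hypothetical; the text does not state how many neighbors were used.

```python
import numpy as np

def knn_graph_inverse_distance(vectors, k=3, eps=1e-12):
    """Weight matrix W with W[i, j] = 1 / ||v_i - v_j|| for the k nearest neighbors j of i."""
    v = np.asarray(vectors, dtype=float)
    diff = v[:, None, :] - v[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude self-loops
    neighbors = np.argsort(dist, axis=1)[:, :k]
    weights = np.zeros(dist.shape)
    rows = np.repeat(np.arange(v.shape[0]), k)
    cols = neighbors.ravel()
    weights[rows, cols] = 1.0 / (dist[rows, cols] + eps)
    return weights

# Toy nodes standing in for (subject, cluster) mean expression vectors.
nodes = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
W = knn_graph_inverse_distance(nodes, k=1)
print(W.round(3))
```

Nodes that are close in expression space receive large edge weights, so nodes from the same cluster sit close together in the graph when cluster identity, rather than species, drives the representation.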
A similar validation pipeline was applied on models trained using data from other brain regions. As a meta-analysis, we found that using a separate model for each of the three species, thus completely eliminating marker reactivity issues, resulted in the same outcome as using a single model for all species. Specifically, Hu and NHP clusters from different models retained strong correlations compared to Mu (Fig. S2E–G). In addition, no significant differences were observed between the intra- and inter-species median correlations of primate (Hu and NHP) and Mu pre-synaptic subpopulations defined either using one model for all three species (single) or separate models for each species (separate) with the same number of clusters per species (Wilcoxon’s test P-value > 0.05) (Fig. S2G).
These results suggested the clustering method was minimally impacted by technical variability, reflecting observations using separate models (Fig. S2E–G) and the absence of technical confounder effects on our model results (Fig. 2A–C).
Finally, overlaying the original t-SNE with protein expression profiles revealed an overall high enrichment of presynaptic SNAP25 across species, spanning all clusters, with no significant differences between the three groups (P-value = 0.74 after Kruskal–Wallis test), further validating our presynaptic gating strategy (Fig. S3A, B). All together, these multiple controls and checks ensure that the antibodies selected for our comparative SynTOF study yielded robust single presynapse data across species with minimal confounding.
Distinct cerebral cortical presynaptic molecular composition between primates and Mu
The generated clusters mainly exhibited species-specific presynaptic subgroups, identifying 11 clusters exclusively with primate presynaptic events (P1–11) and 4 clusters composed entirely of Mu samples (Mu1-4) (Fig. 2A–D). Although one cluster grouped together events from all species (A1), the overall low expression for the 20 proteins observed in the A1 cluster suggests that presynaptic events present in this cluster remained largely undistinguished by the chosen antibody panel (Fig. 2B). A higher proportion of Mu events was observed compared to primate events in this cluster (Fig. 2C, D). Interestingly, VGLUT and GAD65 were found to be co-expressed in one Mu-specific cluster (Mu4) and one primate-specific cluster (P8) (Figs. 2B, S3C). Similar to a previous report17, one “high-expressed” primate-specific cluster was found (P4) with high expression of most of the markers (Fig. 2B) including co-expression of VGLUT, VMAT2 and SERT (Figs. 2B, S3C). Hu and NHP events were unequally distributed between primate-specific clusters, with a significant difference in event abundance between species observed in 4 clusters (P3, P7, P8, P10) (Fig. 2C). Additional statistical analysis showed high expression of GAD65 in 4 clusters (Mu2, Mu4, P8 and P9) and high expression of VGLUT in all Mu-specific clusters and 5 primate clusters (P1, P4, P5, P7 and P8) (Figs. 2B, S3B–D).
A more meaningful description of the underlying organization of the different clusters and the relative differences between species was obtained by building a Pearson correlation graph from the mean expression vectors of each species, illustrated in Fig. 2E. This graph exhibited significantly stronger correlations within primate-specific clusters than between primates and Mu, splitting the presynaptic events into two subgroups, with the multi-species cluster (A1) of unidentified events lying at the intersection of this binary partition.
This underlying molecular composition was consistent with the evolutionary tree and can be appreciated at different scales. At a high level, a cross-species correlation analysis showed lower correlation coefficients between mean expression of Mu and Hu protein levels than NHP and Hu protein levels, highlighting the close presynaptic molecular proximity between NHP and Hu compared to Hu and Mu (Figs. 2E, S3E). The resulting partitioning is also noticed at the single presynaptic event level with the t-SNE plot, depicting a species-dependent structure of the presynaptic molecular events with a large overlay between Hu and NHP samples (Fig. 2D).
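A Pearson correlation graph over mean expression vectors can be sketched as below. The cluster means are simulated so that primate-like and Mu-like clusters follow opposite trends; the edge threshold of 0.5 is a hypothetical choice, not a value reported in the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_markers = 20

# Hypothetical mean expression vectors (one per cluster), built so that
# primate-like and Mu-like clusters follow opposite marker trends.
primate_base = np.linspace(0.0, 1.0, n_markers)
mu_base = primate_base[::-1]
names = ["P1", "P2", "P3", "Mu1", "Mu2"]
means = np.vstack(
    [primate_base + rng.normal(0, 0.05, n_markers) for _ in range(3)]
    + [mu_base + rng.normal(0, 0.05, n_markers) for _ in range(2)]
)

# Pearson correlation between every pair of cluster mean vectors.
corr = np.corrcoef(means)

# Keep an edge when the correlation exceeds the (hypothetical) threshold.
threshold = 0.5
adjacency = (corr > threshold) & ~np.eye(len(names), dtype=bool)

edges = [
    (names[i], names[j], round(float(corr[i, j]), 2))
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if adjacency[i, j]
]
print(edges)
```

With this toy input the graph splits into a primate component and a Mu component, which mirrors the binary partition described in the text.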
The divergent nature of Hu and Mu presynaptic events in the cerebral cortex was also observed in hippocampus. Using the same pipeline, a new model was trained on presynaptic events from Hu and Mu hippocampi, generating a hippocampal low dimensional space with a few overlaps between events from the two species, corroborating our findings from the cerebral cortex (Fig. S4). Notably, three out of fourteen clusters contained events from both Hu and Mu, including two clusters of low-expressed markers and one cluster with a significantly greater number of Mu events found in the hippocampus (Fig. S4A–D). While these three Hu-Mu clusters showed high correlation, significantly higher expression of GAD65, Synaptobrevin and lower expression of DJ1 and ApoE in Mu compared with Hu distinguished the two species (Adjusted Wilcoxon’s P-value < 0.05) (Fig. S4E, F). Furthermore, hippocampal Hu-specific clusters showed similar profiles as primate-specific presynapses in the cerebral cortex: one “high expressed” human-specific cluster was generated aligning with our observation in the cerebral cortex and previous study17, and one Mu-specific cluster with co-expression of VGLUT and GAD65 (Fig. S4G).
More broadly, inter-individual correlations were more consistent within primates or Mu than across these species (Fig. 3A). For meta-analysis across brain regions, a correlation network associating mean expression per cluster per brain region was built, showing an overall higher correlation within Mu-specific clusters regardless of brain region (as suggested by the proximity of this community on the graph) than across clusters including primate samples (Fig. 3B). Together, this comparative analysis describes an overall quantitatively divergent nature of presynaptic molecular composition between primates and Mu.
Pseudo-bulk analysis shows species-specific molecular profiles and weak connection between nuclear transcriptomic and presynaptic proteomic data in cerebral cortex
As no common presynaptic clusters were identified between Mu and primates, we performed a pseudo-bulk differential analysis to identify species-specific expression. To do so, the adjusted Wilcoxon test was applied on pseudo-bulk marker mean expressions between Hu and NHP cerebral cortex, and between Primate and Mu. A holistic comparative analysis of the cerebral cortex indicated significantly higher enrichment of CD47, ApoE, calreticulin, GAMT, SLC6A8, GATM and VMAT2 (P-value < 0.05), and lower expression of Synaptobrevin 2 (P-value < 1e−7), in primates compared with Mu, with no significant differences between Hu and NHP (P-value > 0.05) (Fig. 3C). Furthermore, significantly higher levels of pseudo-bulk mean expression of both GAD65, an enzyme specifically expressed by inhibitory neurons, and VGLUT, a vesicular transporter expressed by excitatory neurons, were found in Mu than in the two primates (Kruskal–Wallis test P-value < 1e−2) (Fig. S3A). Similar observations were found in the hippocampus: ApoE, calreticulin, CD47, CD56 and DJ1, had significantly higher expression in Hu, while synaptobrevin2, GAD65, VGLUT, as well as APP, TMEM and LRRK2 had significantly lower expression in Hu than in Mu (Fig. S4H).
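The pseudo-bulk comparison can be sketched as follows, under stated assumptions: the Wilcoxon test is taken to be the rank-sum (Mann-Whitney U) form, the adjustment is taken to be Benjamini-Hochberg, and all data are simulated. The helper names `pseudo_bulk` and `benjamini_hochberg` are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def pseudo_bulk(events, sample_ids):
    """Average single-event expression within each sample: (n_samples, n_markers)."""
    samples = np.unique(sample_ids)
    means = np.vstack([events[sample_ids == s].mean(axis=0) for s in samples])
    return samples, means

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest P-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

rng = np.random.default_rng(0)
n_markers, events_per_sample = 3, 200
is_primate = np.array([True] * 10 + [False] * 5)  # 10 primate and 5 Mu samples

events, sample_ids = [], []
for s, primate in enumerate(is_primate):
    # Only marker 0 is simulated as higher in primates.
    shift = np.array([2.0, 0.0, 0.0]) if primate else np.zeros(n_markers)
    events.append(rng.normal(loc=1.0 + shift, scale=1.0, size=(events_per_sample, n_markers)))
    sample_ids.append(np.full(events_per_sample, s))
events = np.vstack(events)
sample_ids = np.concatenate(sample_ids)

_, bulk = pseudo_bulk(events, sample_ids)
p_raw = np.array([
    mannwhitneyu(bulk[is_primate, m], bulk[~is_primate, m], alternative="two-sided").pvalue
    for m in range(n_markers)
])
p_adj = benjamini_hochberg(p_raw)
print(p_adj)
```

Collapsing events to one mean per sample before testing treats the subject, not the single event, as the unit of replication.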
We hypothesized that the synaptic proteomic signature across species also might be observed at the transcriptomic level by investigating the relative differences between protein abundance at the presynapse level and gene expression at the nuclear level. Exploiting publicly available transcriptomic data from Hu (n = 2), NHP (in this case marmoset) (n = 2), and Mu (n = 12)33, a pseudo-bulk comparison was applied to the 20 transcripts that encode the proteins targeted by our SynTOF panel. Although limited by the small cohort size, most of our presynaptic protein expression data correlated poorly with the nuclear abundance of the corresponding transcript in the motor cortex (Fig. 3D). Interestingly, the significant relative overabundance of VGLUT in Hu was conserved at both the nuclear transcript and synaptic protein levels. Although technical confounders including the cross-species brain size and relative age differences might limit this study, these results align with what has been observed by others in bulk tissue34,35,36, and emphasize the value of SynTOF in discovering the molecular composition of synapses.
Integrated analysis between Hu and NHP presynaptic events in cerebral cortex and neostriatum exhibited strong proximity of primate presynaptic organization
Finally, we supplemented our results from the frontal cortex by performing the same clustering analysis using NHP and Hu samples from the neostriatum (Figs. 4A, S5A). Specifically, the same unsupervised workflow (with a newly trained model) was applied to single presynapses acquired from the neostriatum of both primates (see “Methods”). A correlation network built from species-specific mean expression per cluster brought out a stronger correlation among intra-cluster presynaptic events from different species, as indicated by proximity of these vectors in the correlation network, emphasizing the overall strong similarity between primate presynaptic molecular composition for the 20 proteins analyzed (Fig. 4B). Just as for the frontal cortex, the presynaptic events from the neostriatum of the two primates blend well together, forming 15 new cross-species clusters (NS-P1-15) (Fig. 4A–C), including one “high-expressed” cluster (NS-P9) and one VGLUT-VMAT2-SERT co-expressed cluster (Fig. S5A, B). However, the relative presynaptic proportion per cluster contrasts significantly between the two species in 10 clusters (Fig. 4D).
To gain further insight into the differences between Hu and NHP, we investigated markers differentially expressed between Hu and NHP within each cluster (Fig. 4E). Using 20 phenotypic markers, we compared mean marker intensity between Hu and NHP using adjusted Wilcoxon’s test. Differences between the two primates included a relatively higher expression of DJ1 and AS in frontal cortex, and ApoE in neostriatum for Hu compared to NHP with an adjusted P-value < 0.05. Conversely, parkin and LRRK2 in both regions, VMAT in frontal cortex and VGLUT in neostriatum showed overall higher expression in NHP compared to Hu samples (FDR-corrected P-value < 0.05). Minor differences in the Hu neostriatum included decreased protein expression of BIN1 and SERT (3 clusters) and increased expression of AS and GAMT (2 clusters) compared to NHP (Figs. 4E, S5C).
Discussion
Cross-species comparison of the central nervous system has a long tradition of illuminating human-specific structures and providing insight into function. Indeed, Bjornson-Hooper, et al. recently reported an extensive cross-species comparison of Hu, NHP, and Mu CyTOF data from immune cells in blood, demonstrating the importance of understanding differences between human and model organisms37. Here, we report our cross-species comparison of presynaptic molecular composition using publicly available SynTOF data. As far as we are aware, our study provides the first unsupervised integrating cross-species comparison of multiplexed, single presynaptic data. Several antibodies were used to gate for presynaptic events versus other debris in the homogenate, yielding 20-plex quantitative data on over 4.5 million single, highly enriched presynaptic events from the three species. These unparalleled broad, deep, and specific multispecies molecular data formed the basis of our comparative presynaptic investigation. Leveraging recent machine-learning advances, we developed the first integrated framework to compare the large cross-species datasets generated by SynTOF and investigated key differences and similarities among species and brain regions in presynaptic protein expression from disease-free (control) Hu, NHP, and Mu brain.
Analysis across multiple species can be challenging as data obtained from different groups might be confounded by unidentified technical and biological factors26. While many techniques have been developed to correct known unwanted variations26,27,28,29,31, determining which biological factors play the most important roles in cross-species investigations remains an open question. While merged cross-species data enables direct comparison, it has some potential limitations, notably the possible technical differences across species. This potential limitation, inherent to antibody-based experiments, was addressed by conducting statistical analyses on SynTOF signals across species and computing the dispersion between single-species and multi-species datasets. These analyses revealed no significant differences in either case (P-value > 0.05). Additional validation analysis suggested that the clustering method was minimally impacted by technical variability, reflecting observations using separate models (Fig. S2E–G) and the absence of technical confounders on our model results (Fig. 1D–F). Altogether, these results confirmed the minimal influence of technical variation on SynTOF signals, and allowed the comparison of synaptic subpopulations using our multi-species integrating approach despite known differences in target sequence.
We focused primarily on isocortical samples because this region was available from all three species. Our clustering analysis revealed two non-overlapping types of presynapse clusters: primate-specific and Mu-specific; pseudo-bulk comparative analysis of primate and Mu protein expression in cerebral cortex exhibited 8 significantly higher protein levels in primates including components of “eat-me, don’t eat me” signaling between synapses and microglia (ApoE, calreticulin, and CD47)38, creatine metabolism (GAMT, GATM and SLC6A8), and neurotransmission (VMAT2)39,40,41,42. In contrast, 3 proteins involved in the machinery of neurotransmission (GAD65, synaptobrevin2, VGLUT) had a significantly lower expression in primates. A low correlation was found between presynaptic protein expression and the transcriptomic abundance of these proteins in the motor cortex34,43,44 with the exception of lower expression of VGLUT at both the presynaptic protein and nuclear transcriptomic levels (Fig. 3C).
We supported our analysis in cerebral cortex by comparing SynTOF data from Hu and Mu in the allocortical hippocampus. The clustering algorithm revealed a small overlap between single presynapses from the two species, corroborating the pervasive difference in the molecular composition of presynapses in Hu and Mu. Although most of our results aligned well with existing knowledge of the molecular composition of synapses, some synaptic profiles were unexpected, such as GAD65 + VGLUT + SERT in H-Mu4 (Fig. S4C), and should be interpreted cautiously until an alternative highly multiplexed single-synapse technology is available that can validate our results. Pseudo-bulk analysis performed on hippocampus data led to similar significant differences between Hu and Mu presynaptic protein levels, which overlapped with isocortical presynapses (i.e., increased ApoE, calreticulin, CD47 and decreased VGLUT, synaptobrevin-2, GAD65 in Hu compared to Mu). These data support significant differences in both isocortical and hippocampal presynapse protein abundance important to basic functions such as neurotransmission, energy metabolism, and synaptic pruning.
Although differences in protein expression were observed between NHP and Hu, the generated clusters did not identify any Hu-only or NHP-only subgroups, indicating strong proximity between the molecular composition of presynapses for the two primates in both frontal cortex and neostriatum. In the frontal cortex, where only 6 presynaptic proteins were expressed significantly differently between Hu and NHP across all clusters, AS and DJ1 expression was higher in Hu, while LRRK2 and parkin showed lower expression in Hu. In neostriatum, the significant cross-cluster differences between Hu and NHP presynaptic protein levels showed higher expression of ApoE in Hu and lower expression of the same two proteins (LRRK2 and parkin) in Hu.
There are several limitations to our study, including those inherent in using available data15 that, by nature, will in hindsight have missed opportunities for maximum utility. First, our study is of course limited by the 20-marker panel chosen to characterize the presynaptic particles across species. Specifically, around 32% of Mu pre-synapses and less than 3% of primate pre-synapses are undistinguished by our panel (Fig. 2C). In addition, a practical limitation in this study was the enormous difference in the size of the cerebral cortex between the three species. We used a region of prefrontal cerebral cortex (Brodmann area 9) from Hu, prefrontal cortex from NHP, and all of cerebral cortex from Mu. In theory, this variation in subregion of cerebral cortex in the different species and differences in potential projections from synapsing cells originating in other regions might confound our results. Similarly, cellular composition and distribution34,45, synapse density46, network organization47,48, and morphological features49 can vary across species. These variations could potentially contribute to the observed differences at the presynaptic level. Indeed, Bakken et al. reported a broadly conserved cellular taxonomy across mice, marmosets and humans, albeit with differences in cellular proportion and gene expression in the motor cortex. Furthermore, they found a larger overlap of neuronal cell type composition between humans and marmosets (39%) than between primates and mice (27%), along with a significant difference in the ratio of excitatory to inhibitory neurons (2:1 in humans, 3:1 in marmosets, and 5:1 in mice). These species-specific cellular profiles may explain the substantial overlap observed between Hu and NHP presynapses, as well as the weaker similarity between primate and mouse presynaptic composition.
However, we expect that the variation in cortical subregion has limited impact because we have already shown no significant difference in data from this same SynTOF panel from Hu temporal versus parietal cortex17.
Age also was a potential source of variability in our study. Approximated as percent of maximum lifespan, the humans had lived 73% ± 4.8, the macaques had lived 40% ± 0.9, and the mice had lived 96% of their lifespan. While Hu and Mu were comparably aged relative to maximum lifespan, NHP were relatively younger. Differences in age may influence the observed pre-synaptic composition, given the diverse lifetime of synaptic proteins. A recent study50 revealed the heterogeneity and wide range of synapse protein lifetimes in various regions of the mouse brain. Although the lifespan and diversity of synaptic proteins in primates remain unexplored, synapse density and function were found to be altered during brain aging51,52. However, we found many differences in presynapse protein abundance between the similarly aged Hu and Mu, suggesting that these are valid species differences relatively uncompromised by effects of aging. We observed many fewer regional differences in SynTOF presynaptic signal between Hu and NHP, suggesting that at least for these 20 proteins there may be limited impact of aging when both clinical and pathologic examinations are used to exclude subclinical “age-related” diseases. A more comprehensive understanding of synapse protein lifetime across species would enhance our understanding of synaptic function and diversity.
In summary, we proposed a machine learning framework to compare presynapse molecular abundance across three species and three brain regions. After extensive analysis to assure the validity of cross-species comparison of SynTOF data, we observed significant differences in protein abundance of primate (Hu and NHP) vs. Mu presynapses with respect to synaptic pruning, cellular energy, lipid metabolism, and neurotransmission. In contrast, there was strong overlap between presynaptic molecular composition of Hu and NHP presynapses in both cerebral cortex and neostriatum. As expected, presynaptic composition correlated with evolutionary distance. More divergent presynaptic landscapes were observed between Hu and Mu (~ 87 MYA) than between Hu and NHP (~ 28.9 MYA)53, aligning with cross-species transcriptome and epigenome comparative analysis33.
Brain species specificity has been extensively studied at the cell level, comparing transcriptome expression from various brain regions. Our proteomic results did not correlate well with nuclear transcriptome, as has been observed by others in bulk tissue, and thereby provide a unique perspective on the comparative molecular composition of presynapses that may guide functional insight in humans and other species.
Methods
All SynTOF data used in the present study were generated in a previous publication17 and are publicly available. Please see the Supplemental Methods for details. As described therein, human, macaque, and mouse synaptosomes were prepared using established protocols10, modified for CyTOF analysis18, including mass tag barcoding15.
Unsupervised deep-learning approach for interspecies clustering
Identification of subpopulation similarities and differences between species usually relies on an integrated clustering assignment pipeline54,55,56,57. However, clustering in high-dimensional space is challenging, due to the unreliability of similarity metrics58. Dimension reduction techniques, such as Principal Component Analysis (PCA), circumvent this issue by lowering the dimensionality of the input data. Yet, the representative power of popular dimension reduction algorithms such as PCA is limited due to assumptions made about the data (e.g., linearity).
To bypass this issue, we leverage neural networks and their attractive property to model complex relationships between high dimensional data to develop a unified unsupervised framework for comparing the profile of more than 4.5 million presynaptic events among normal Hu, NHP, and Mu samples. That is, we used a fully connected neural network59 that has proven its effectiveness on large single-event datasets17,60. Indeed, this approach provides an effective framework to handle our large and heterogeneous multi-species dataset and perform direct comparisons, while being a suitable solution to simultaneously derive a conjoint cross-species low-dimensional representation and identify differences between the observed presynaptic events groups. We used the autoencoder paradigm, a non-linear embedding method, to derive a compressed low dimensional representation of the input data61 while jointly clustering it in an unsupervised way using a loss that preserves the local structure of the input data, critical to perform accurate clustering of the data.
We trained the neural network models with single-event vectors, representing expression across 20 markers, sampled equally from multiple species from the same region. Pooling the data enables direct comparisons between species. Three independent models were trained using samples from different brain regions: cerebral cortex (Hu, NHP, and Mu), hippocampus (Hu and Mu), and neostriatum (Hu and NHP). For each region, a balanced dataset was obtained by downsampling the number of events, stratified by species, to have a similar proportion of events across species and avoid creating a biased integrated space toward one dominating species.
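Stratified downsampling to a balanced dataset can be sketched as below. The function `balanced_downsample` is hypothetical, and it equalizes the species exactly, whereas the text only states that proportions were made similar. The toy counts are the reported event totals scaled down by roughly 1000.

```python
import numpy as np

def balanced_downsample(species, rng, n_per_species=None):
    """Return sorted indices that keep the same number of events for every species."""
    species = np.asarray(species)
    labels, counts = np.unique(species, return_counts=True)
    # Default: shrink every species to the size of the rarest one.
    n = counts.min() if n_per_species is None else min(n_per_species, counts.min())
    keep = [
        rng.choice(np.flatnonzero(species == label), size=n, replace=False)
        for label in labels
    ]
    return np.sort(np.concatenate(keep))

rng = np.random.default_rng(0)
species = np.array(["Hu"] * 3657 + ["NHP"] * 759 + ["Mu"] * 201)

keep = balanced_downsample(species, rng)
labels, counts = np.unique(species[keep], return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

Without this step the Hu events, which outnumber the Mu events by more than an order of magnitude, would dominate the reconstruction loss and bias the shared embedding.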
Before applying clustering, we first pretrained the two autoencoders to learn a common low dimensional representation of the multi-species input data by minimizing the standard mean square error (MSE) loss (https://github.com/tpjoe/SynTOF2021)17. Then, the weights of the network were fine-tuned by minimizing the reconstruction loss as well as a clustering loss62, added on top of the low dimensional representation of the two autoencoders. The two autoencoders consist of sequences of fully connected layers with a bottleneck in the middle that imposes a compressed representation of the original input. The optimal number of clusters was derived using the Elbow method and the centers of the clusters were initialized using the K-means algorithm. Both training stages were performed using an adaptive gradient method (Adagrad)63, with an initial learning rate of 0.1 and a batch size of 1024. A common Glorot scheme64 was adopted to initialize the weights of the networks. The optimal number of epochs was derived automatically using early stopping. Finally, the stochastic nature of the training was reduced by repeating the whole training procedure 10 times. Cluster robustness was assessed by applying a consensus meta-clustering which combines clusters from 10 runs using a greedy algorithm65, resulting in 15 clusters in the prefrontal cortex (Fscore = 0.776, NMI = 0.702). No hyperparameter selection process was employed. Our model was implemented in Python using Keras66. The architecture of the model with its parameterization is represented in Fig. S6.
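The cluster-number selection and center initialization can be illustrated with a minimal sketch. It assumes scikit-learn's `KMeans` and one common reading of the Elbow method (the largest second difference of the inertia curve); the text does not specify which elbow criterion was used, and the 2-D toy embedding stands in for the autoencoder bottleneck.

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_k(embedding, k_values, seed=0):
    """Pick k at the sharpest bend of the inertia curve (k_values must be consecutive)."""
    inertias = np.array([
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(embedding).inertia_
        for k in k_values
    ])
    # Discrete second difference: large where the curve flattens abruptly.
    curvature = inertias[:-2] - 2.0 * inertias[1:-1] + inertias[2:]
    best_k = k_values[int(np.argmax(curvature)) + 1]
    return best_k, inertias

rng = np.random.default_rng(0)
# Toy low-dimensional embedding with three well-separated groups.
centers_true = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
embedding = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers_true])

best_k, inertias = elbow_k(embedding, k_values=list(range(1, 7)))

# K-means centers that would seed the clustering layer before fine-tuning.
init_centers = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(embedding).cluster_centers_
print(best_k, init_centers.round(2))
```

Seeding the cluster centers from K-means on the pretrained embedding gives the fine-tuning stage a reasonable starting partition instead of random centers.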
For dimension reduction, we used the Python implementation of the t-SNE algorithm from the scikit-learn library67, which initializes the point positions randomly. Reproducibility of the figures was ensured by fixing the initialization.
Clustering validation
We validated the consistency of the generated clustering partition using the silhouette score32. This score represents the similarity of an event to events from its own group compared to events from other groups. Comparing silhouette scores computed with events grouped by model-defined cluster (\({s}_{cluster}\)), by species (\({s}_{species}\)), and by subject (\({s}_{subject}\)) allowed us to quantify bias in the learned low-dimensional embedding with respect to species or subject origin26. The silhouette score of the overall partitioning generated by the model reached 0.6, whereas the silhouette scores obtained by grouping single events by species or by subject were 0.1 and −0.1, respectively.
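The comparison of silhouette scores under different groupings can be sketched as follows. The helper `silhouette` is a small NumPy implementation of the standard mean silhouette coefficient with Euclidean distance (a library routine such as scikit-learn's `silhouette_score` would normally be used instead); the embedding and labels are synthetic and only illustrate the expected pattern, not the reported values.

```python
import numpy as np


def silhouette(x, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 groups)."""
    labels = np.asarray(labels)
    dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    groups = np.unique(labels)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton groups score 0 by convention
        # Mean distance to the other members of the same group
        # (the zero self-distance is excluded by dividing by n_own - 1).
        a = dist[i, own].sum() / (n_own - 1)
        # Mean distance to the nearest other group.
        b = min(dist[i, labels == g].mean() for g in groups if g != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


# Synthetic embedding: two separated clusters, each mixing all three species.
rng = np.random.default_rng(0)
embedding = np.vstack([rng.normal(0.0, 0.5, (60, 2)), rng.normal(5.0, 0.5, (60, 2))])
cluster = np.repeat([0, 1], 60)
species = np.tile(["Hu", "NHP", "Mu"], 40)

s_cluster = silhouette(embedding, cluster)
s_species = silhouette(embedding, species)
```

A high `s_cluster` together with an `s_species` close to zero is the pattern expected when the embedding is organized by cluster rather than by species of origin.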
To gain further insight into how the model generates the clustering of the single presynaptic events, we leveraged the earth mover’s distance (EMD)68, which measures the distance between distributions and has been used to quantify the impact of confounders in CyTOF data26,31. Briefly, for each marker in cerebral cortex, we computed the pairwise EMD between the marker expression distributions of different species across clusters. For almost all markers, the mean EMD was significantly lower (P-value < 0.001) when comparing marker expression between species within the same cluster than between different clusters (Fig. S2A–D).
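The within-cluster versus between-cluster EMD comparison can be sketched for a single marker as below. The one-dimensional EMD is computed from the difference between empirical cumulative distributions; the function names and the synthetic data (expression that depends on cluster but not on species) are assumptions for illustration, and no significance test is included.

```python
import numpy as np
from itertools import combinations, product


def emd_1d(a, b):
    """Earth mover's distance between two 1-D samples via their empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.sort(np.concatenate([a, b]))
    widths = np.diff(grid)
    cdf_a = np.searchsorted(a, grid[:-1], side="right") / a.size
    cdf_b = np.searchsorted(b, grid[:-1], side="right") / b.size
    return float(np.sum(np.abs(cdf_a - cdf_b) * widths))


def within_vs_across(values, species, cluster):
    """Mean cross-species EMD for same-cluster and different-cluster pairs."""
    values, species, cluster = map(np.asarray, (values, species, cluster))
    within, across = [], []
    for sp_a, sp_b in combinations(np.unique(species), 2):
        for cl_a, cl_b in product(np.unique(cluster), repeat=2):
            a = values[(species == sp_a) & (cluster == cl_a)]
            b = values[(species == sp_b) & (cluster == cl_b)]
            if a.size == 0 or b.size == 0:
                continue  # skip species absent from a cluster
            (within if cl_a == cl_b else across).append(emd_1d(a, b))
    return float(np.mean(within)), float(np.mean(across))


# Synthetic marker: expression shifts with cluster, not with species.
rng = np.random.default_rng(0)
cluster = np.repeat([0, 1], 300)
species = np.tile(["Hu", "NHP", "Mu"], 200)
values = rng.normal(loc=3.0 * cluster, scale=1.0)

mean_within, mean_across = within_vs_across(values, species, cluster)
```

A lower `mean_within` than `mean_across` indicates that species share similar marker distributions inside a cluster while clusters differ from one another.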
Statistical analysis
The coefficient of variation (CV) is a widely used metric that quantifies the spread of a distribution relative to its mean and allows relative variability to be compared between datasets69. For instance, it has been used to quantify the batch effect on data variance70,71. Here, CV was used to quantify the effect of pooling SynTOF data from multiple species on the marker expression variance by comparing the CV computed for Hu data only with the CV computed for the combined data from multiple species (Fig. 1D). There was no significant difference between the mean CV of the two datasets (t-test P-value = 0.79 in cerebral cortex, 0.99 in neostriatum, and 0.98 in hippocampus; all > 0.05), demonstrating that pooling data from different species did not significantly change the spread of the marker distributions.
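As an illustration of this comparison, the sketch below computes a per-marker CV for a human-only dataset and for a pooled dataset and compares the two sets of CVs with a t-test. The use of an unpaired two-sample test (`scipy.stats.ttest_ind`), the gamma-distributed toy data, and the variable names are assumptions; the text does not specify which t-test variant was applied.

```python
import numpy as np
from scipy import stats


def coefficient_of_variation(x):
    """Per-marker CV: standard deviation divided by mean (columns are markers)."""
    x = np.asarray(x, dtype=float)
    return x.std(axis=0, ddof=1) / x.mean(axis=0)


rng = np.random.default_rng(0)
n_markers = 20

# Hypothetical positive-valued expression: one gamma distribution per marker,
# shared by both species so that pooling should not change the spread.
shape = rng.uniform(2.0, 5.0, n_markers)
human = rng.gamma(shape, 1.0, size=(2000, n_markers))
other = rng.gamma(shape, 1.0, size=(2000, n_markers))
pooled = np.vstack([human, other])

cv_human = coefficient_of_variation(human)
cv_pooled = coefficient_of_variation(pooled)

# Assumption: unpaired two-sample t-test on the per-marker CVs.
t_stat, p_value = stats.ttest_ind(cv_human, cv_pooled)
```

A P-value above 0.05 in this setting means the test does not detect a change in relative variability after pooling, which is the reasoning applied to the real data.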
Graph-based analysis
A Pearson correlation graph was built using the spring layout of the NetworkX Python package (https://networkx.org/documentation/stable/), which is based on the Fruchterman-Reingold algorithm72. Edge weights were set equal to the absolute correlation coefficient between nodes. P-values were derived using the pearsonr function from the Python package SciPy73 and adjusted for multiple hypothesis testing using Bonferroni correction. Only significant edges were drawn and, for visualization purposes, edges with a correlation coefficient lower than the mean correlation coefficient were filtered out (Figs. 2E, 3B, S4F).
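The edge-selection step can be sketched as follows. The function `correlation_edges`, the significance threshold `alpha`, and the toy markers are hypothetical; the sketch only shows one way to combine Bonferroni-adjusted `pearsonr` P-values with the mean-weight filter and omits the layout step.

```python
import numpy as np
from itertools import combinations
from scipy.stats import pearsonr


def correlation_edges(data, names, alpha=0.05):
    """Return {(name_a, name_b): |r|} for edges that pass both filters.

    data  : array of shape (n_observations, n_nodes)
    names : node names, one per column
    """
    pairs = list(combinations(range(len(names)), 2))
    results = []
    for i, j in pairs:
        r, p = pearsonr(data[:, i], data[:, j])
        p_adj = min(1.0, p * len(pairs))  # Bonferroni correction
        results.append((i, j, abs(r), p_adj))

    # Keep only significant edges whose weight exceeds the global mean weight.
    mean_weight = np.mean([w for _, _, w, _ in results])
    return {
        (names[i], names[j]): w
        for i, j, w, p_adj in results
        if p_adj < alpha and w > mean_weight
    }


# Toy data: markers A and B are strongly correlated, C and D are independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + 0.3 * rng.normal(size=500)
c = rng.normal(size=500)
d = rng.normal(size=500)
data = np.column_stack([a, b, c, d])
edges = correlation_edges(data, ["A", "B", "C", "D"])
```

The resulting dictionary can be loaded into a NetworkX graph with weighted edges and positioned with its spring layout for drawing.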
Proximity between subjects across clusters was determined using a nearest-neighbor graph built on single-event mean expression (Fig. 1E). Edge weights were set equal to the normalized Euclidean distance between nodes, and only significant edges above the global mean edge value were drawn. A similar layout was used for this graph.
Transcriptomic data (Single nucleus RNA-seq)
Transcriptomic data (single nucleus RNA-seq) from the motor cortex of the three species, provided by Bakken et al.33, were used to perform the pseudo-bulk analysis. Statistical analysis was performed in Python using the SciPy library73 on all SCT-normalized neuronal single cells74.
Visualization
Figures were created using the Matplotlib and seaborn packages in Python. BioRender was used to generate subfigures in Figs. 1, 2, 4, S2 and S4.
Data availability
The code is available at https://github.com/elo-nsrb/SynTOF_Cross-species_analysis. The raw SynTOF data are available on Dryad at https://doi.org/10.5061/dryad.z612jm6cr (see ref. 18 for more information).
References
Südhof, T. C. Towards an understanding of synapse formation. Neuron 2018, 276–293. https://doi.org/10.1016/j.neuron.2018.09.040 (2018).
Terry, R. D. et al. Physical basis of cognitive alterations in Alzheimer’s disease: Synapse loss is the major correlate of cognitive impairment. Ann. Neurol. 30, 572–580. https://doi.org/10.1002/ana.410300410 (1991).
Honer, W. G. et al. Cognitive reserve, presynaptic proteins and dementia in the elderly. Transl. Psychiatry 2, e114 (2012).
Masliah, E., Terry, R. D., Alford, M., DeTeresa, R. & Hansen, L. A. Cortical and subcortical patterns of synaptophysin like immunoreactivity in Alzheimer’s disease. Am. J. Pathol. 138, 235–246 (1991).
DeKosky, S. T. & Scheff, S. W. Synapse loss in frontal cortex biopsies in Alzheimer’s disease: Correlation with cognitive severity. Ann. Neurol. 27, 457–464 (1990).
Slotkin, T. A. et al. Regulatory changes in presynaptic cholinergic function assessed in rapid autopsy material from patients with Alzheimer disease: Implications for etiology and therapy. Proc. Natl. Acad. Sci. U. S. A. 87, 2452–2455 (1990).
Koopmans, F. et al. SynGO: An evidence-based, expert-curated knowledge base for the synapse. Neuron 103, 217-234.e4 (2019).
Gylys, K. H. & Bilousova, T. Flow cytometry analysis and quantitative characterization of tau in synaptosomes from alzheimer’s disease brains. Methods Mol. Biol. 1523, 273–284 (2017).
Bilousova, T. et al. Synaptic amyloid-β oligomers precede p-tau and differentiate high pathology control cases. Am. J. Pathol. 186, 185–198 (2016).
Postupna, N. O. et al. Flow cytometry analysis of synaptosomes from post-mortem human brain reveals changes specific to Lewy body and Alzheimer’s disease. Lab. Invest. 94, 1161–1172 (2014).
Sokolow, S. et al. Isolation of synaptic terminals from Alzheimer’s disease cortex. Cytometry A 81, 248–254 (2012).
Bendall, S. C., Nolan, G. P., Roederer, M. & Chattopadhyay, P. K. A deep profiler’s guide to cytometry. Trends Immunol. 33, 323–332 (2012).
Bandura, D. R. et al. Mass cytometry: Technique for real time single cell multitarget immunoassay based on inductively coupled plasma time-of-flight mass spectrometry. Anal. Chem. 81, 6813–6822 (2009).
Bendall, S. C. et al. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science 332, 687–696 (2011).
Gajera, C. R. et al. Mass-tag barcoding for multiplexed analysis of human synaptosomes and other anuclear events. Cytometry A 2021, 939–945. https://doi.org/10.1002/cyto.a.24340 (2021).
Zhou, X. J. & Gibson, G. Cross-species comparison of genome-wide expression patterns. Genome Biol. 5, 232 (2004).
Phongpreecha, T. et al. Single-synapse analyses of Alzheimer’s disease implicate pathologic tau, DJ1, CD47, and ApoE. Sci. Adv. 7, eabk0473 (2021).
Gajera, C. R. et al. Mass synaptometry: High-dimensional multi parametric assay for single synapses. J. Neurosci. Methods. 312, 73–83 (2019).
Shafer, M. E. R. Cross-species analysis of single-cell transcriptomic data. Front. Cell Dev. Biol. 7, 175 (2019).
Elhmouzi-Younes, J. et al. In depth comparative phenotyping of blood innate myeloid leukocytes from healthy humans and macaques using mass cytometry. Cytometry A 91, 969–982 (2017).
Bjornson-Hooper, Z. B. et al. Cell type-specific monoclonal antibody cross-reactivity screening in non-human primates and development of comparative immunophenotyping panels for CyTOF. Biorxiv https://doi.org/10.1101/577759 (2019).
Montine, T. J. et al. National Institute on Aging-Alzheimer’s Association guidelines for the neuropathologic assessment of Alzheimer’s disease: A practical approach. Acta Neuropathol. 123, 1–11 (2012).
Hyman, B. T. et al. National Institute on Aging-Alzheimer’s Association guidelines for the neuropathologic assessment of Alzheimer’s disease. Alzheim. Dement. 8, 1–13 (2012).
Frye, B. M. et al. Aging-related Alzheimer’s disease-like neuropathology and functional decline in captive vervet monkeys (Chlorocebus aethiops sabaeus). Am. J. Primatol. 83, e23260 (2021).
Latimer, C. S. et al. A nonhuman primate model of early Alzheimer’s disease pathologic change: Implications for disease pathogenesis. Alzheimer’s Dement. 15, 93–105 (2019).
Trussart, M. et al. Removing unwanted variation with CytofRUV to integrate multiple CyTOF datasets. Elife 2020, 9. https://doi.org/10.7554/eLife.59630 (2020).
Van Gassen, S., Gaudilliere, B., Angst, M. S., Saeys, Y. & Aghaeepour, N. CytoNorm: A normalization algorithm for cytometry data. Cytometry A 97, 268–278 (2020).
Finck, R. et al. Normalization of mass cytometry data with bead standards. Cytometry A 83, 483–494 (2013).
Schuyler, R. P. et al. Minimizing batch effects in mass cytometry data. Front. Immunol. 10, 2367 (2019).
Liechti, T. et al. An updated guide for the perplexed: Cytometry in the high-dimensional era. Nat. Immunol. 22, 1190–1197 (2021).
Leipold, M. D. et al. Comparison of CyTOF assays across sites: Results of a six-center pilot study. J. Immunol. Methods 453, 37–43 (2018).
Rousseeuw, P. J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 53–65. https://doi.org/10.1016/0377-0427(87)90125-7 (1987).
Bakken, T. E. et al. Comparative cellular analysis of motor cortex in human, marmoset and mouse. Nature 598, 111–119 (2021).
de Sousa, A. R., Penalva, L. O., Marcotte, E. M. & Vogel, C. Global signatures of protein and mRNA expression levels. Mol. Biosyst. 5, 1512–1526 (2009).
Vogel, C. & Marcotte, E. M. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat. Rev. Genet. 13, 227–232 (2012).
Zhang, B. et al. Comparative transcriptomic and proteomic analyses provide insights into the key genes involved in high-altitude adaptation in the Tibetan pig. Sci. Rep. 7, 1–11 (2017).
Bjornson-Hooper, Z. B. et al. A comprehensive atlas of immunological differences between humans, mice, and non-human primates. Front. Immunol. 13, 867015 (2022).
Lehrman, E. K. et al. CD47 protects synapses from excess microglia-mediated pruning during development. Neuron 100, 120-134.e6 (2018).
Sidransky, E. et al. Multicenter analysis of glucocerebrosidase mutations in Parkinson’s disease. N. Engl. J. Med. 361, 1651–1661 (2009).
Brockmann, K. et al. GBA-associated PD presents with nonmotor characteristics. Neurology 77, 276–280 (2011).
Neumann, J. et al. Glucocerebrosidase mutations in clinical and pathologically proven Parkinson’s disease. Brain 132, 1783–1794 (2009).
Davis, M. Y. et al. Association of GBA mutations and the E326K polymorphism with motor and cognitive progression in parkinson disease. JAMA Neurol. 73, 1217–1224 (2016).
Franjic, D. et al. Transcriptomic taxonomy and neurogenic trajectories of adult human, macaque, and pig hippocampal and entorhinal cells. Neuron 110, 452–469 (2022).
Hodge, R. D. et al. Conserved cell types with divergent features in human versus mouse cortex. Nature 573, 61–68 (2019).
Beauchamp, A. et al. Whole-brain comparison of rodent and human brains using spatial transcriptomics. Elife 11, (2022).
Wildenberg, G. A. et al. Primate neuronal connections are sparse in cortex as compared to mouse. Cell Rep. 36, 109709 (2021).
Laramée, M.-E. & Boire, D. Visual cortical areas of the mouse: Comparison of parcellation and network structure with primates. Front. Neural Circuits 8, 149 (2014).
Garin, C. M. et al. An evolutionary gap in primate default mode network organization. Cell Rep. 39, 110669 (2022).
Wan, B. et al. Heritability and cross-species comparisons of human cortical functional organization asymmetry. Elife 11, (2022).
Bulovaite, E. et al. A brain atlas of synapse protein lifetime across the mouse lifespan. Neuron 110, 4057–4073.e8 (2022).
Freire-Cobo, C. et al. Neuronal vulnerability to brain aging and neurodegeneration in cognitively impaired marmoset monkeys (Callithrix jacchus). Neurobiol. Aging 123, 49–62 (2023).
Peters, A., Sethares, C. & Luebke, J. I. Synapses are lost during aging in the primate prefrontal cortex. Neurosci. 152, 970–981 (2008).
Kumar, S., Stecher, G., Suleski, M. & Hedges, S. B. TimeTree: A resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 34, 1812–1819 (2017).
Aghaeepour, N., Nikolic, R., Hoos, H. H. & Brinkman, R. R. Rapid cell population identification in flow cytometry data. Cytometry A 79, 6–13 (2011).
Stanley, N. et al. VoPo leverages cellular heterogeneity for predictive modeling of single-cell data. Nat. Commun. 11, 3738 (2020).
Van Gassen, S. et al. FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data. Cytometry A 87, 636–645 (2015).
Weber, L. M. & Robinson, M. D. Comparison of clustering methods for high-dimensional single-cell flow and mass cytometry data. Cytometry A 89, 1084–1096 (2016).
Assent, I. Clustering high dimensional data. WIREs Data Min. Knowl. Discov. 2012, 340–350. https://doi.org/10.1002/widm.1062 (2012).
Guo, X., Gao, L., Liu, X. & Yin, J. Improved deep embedded clustering with local structure preservation. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (2017) https://doi.org/10.24963/ijcai.2017/243.
Li, X. et al. Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis. Nat. Commun. 11, 2338 (2020).
Baldi, P. Autoencoders, unsupervised learning, and deep architectures. In Proceedings of ICML Workshop on Unsupervised and Transfer Learning (eds. Guyon, I. et al.) 37–49 (PMLR, 2012).
Xie, J., Girshick, R. & Farhadi, A. Unsupervised deep embedding for clustering analysis. In Proceedings of the 33rd International Conference on Machine Learning (eds. Balcan, M. F. & Weinberger, K. Q.) 478–487 (PMLR, 2016).
Duchi, J., Hazan, E. & Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 2011, 12. https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf (2011).
Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (eds. Teh, Y. W. & Titterington, M.) 249–256 (Chia Laguna Resort, 2010).
Hornik, K. A CLUE for CLUster ensembles. J. Stat. Softw. 2005, 1–25. https://doi.org/10.18637/jss.v014.i12 (2005).
Chollet, F. Keras: The Python Deep Learning library. Astrophysics Source Code Library ascl:1806.022 (2018). https://ui.adsabs.harvard.edu/abs/2018ascl.soft06022C.
Kramer, O. Scikit-Learn. In Machine Learning for Evolution Strategies (ed. Kramer, O.) 45–53 (Springer International Publishing, 2016).
Rubner, Y. The earth mover’s distance as a metric for image retrieval. Int. J. Comput. Vis. 40, 99–121 (2000).
Takkouche, B., Cadarso-Suarez, C. & Spiegelman, D. Evaluation of old and new tests of heterogeneity in epidemiologic meta-analysis. Am. J. Epidemiol. 1999, 206–215. https://doi.org/10.1093/oxfordjournals.aje.a009981 (1999).
Liu, X. et al. A comparison framework and guideline of clustering methods for mass cytometry data. Genome Biol. 20, 297 (2019).
Lo, Y.-C. et al. CytofIn enables integrated analysis of public mass cytometry datasets using generalized anchors. Nat. Commun. 13, 934 (2022).
Fruchterman, T. M. J. & Reingold, E. M. Graph drawing by force-directed placement. Softw. Pract. Exp. 1991, 1129–1164. https://doi.org/10.1002/spe.4380211102 (1991).
Virtanen, P. et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20, 296 (2019).
Acknowledgements
This work was supported by grants from the National Institutes of Health: AG049638 (T.J.M.), AG057707 (T.J.M.), GM138353 (N.A.), AG077443 (T.J.M., N.A.), HL087103 (C.A.S.), AG058829 (C.A.S. and S.C.), P30 AG072947 (S.C.), HL122393 (T.C.R.), AG057915 (S.C.B.), AG056287 (S.C.B.), AG068279 (S.C.B.), AG066509 (C.D.K.), and AG066567 (C.D.K.), and the Nancy and Buster Alvord Endowment (C.D.K.). We would like to thank Holden Maecker and staff of the Stanford Immune Monitoring Center for technical assistance (S10RR027431-01) and Allison Beller, Roomana Patel, and Aimee Shantz for outstanding administrative support. We thank the brain donors and their families without whom this research would be impossible.
Author information
Contributions
Conceptualization: T.J.M., N.A., C.R.G., and E.B. Resources: N.P., C.L., C.A.S., T.C.R., S.C., and C.D.K. Methodology: E.B., C.R.G., T.P., A.P., S.A.B., M.B., A.L.C., D.D.F., C.E., N.G.R., T.J.M., N.A., and S.C.B. Investigation: E.B. and C.R.G. Software: E.B. and T.P. Visualization: E.B. and A.P. Funding acquisition: T.J.M., N.A., C.A.S., S.C., T.C.R., S.C.B., and E.J.F. Project administration: T.J.M. and E.B. Supervision: T.J.M. and N.A. Writing – original draft: T.J.M., E.B., C.R.G., A.P., and K.S.M. Writing – review and editing: N.A., K.S.M., S.C.B., C.D.K., E.J.F., S.C., C.A.S., N.P., A.P., S.A.B., M.B., A.L.C., D.D.F., C.E., and N.G.R.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Berson, E., Gajera, C.R., Phongpreecha, T. et al. Cross-species comparative analysis of single presynapses. Sci Rep 13, 13849 (2023). https://doi.org/10.1038/s41598-023-40683-8
DOI: https://doi.org/10.1038/s41598-023-40683-8