Evolution of complexity in the zebrafish synapse proteome

The proteome of human brain synapses is highly complex and is mutated in over 130 diseases. This complexity arose from two whole-genome duplications early in the vertebrate lineage. Zebrafish are used in modelling human diseases; however, their synapse proteome is uncharacterized, and whether the teleost-specific genome duplication (TSGD) influenced complexity is unknown. We report the characterization of the proteomes and ultrastructure of central synapses in zebrafish and analyse the importance of the TSGD. While the TSGD increases overall synapse proteome complexity, the postsynaptic density (PSD) proteome of zebrafish has lower complexity than that of mammals. A highly conserved set of ∼1,000 proteins is shared across vertebrates. PSD ultrastructural features are also conserved. Lineage-specific proteome differences indicate that vertebrate species evolved distinct synapse types and functions. The data sets are a resource for a wide range of studies and have important implications for the use of zebrafish in modelling human synaptic diseases.

Biplot of the first two components of a PCA model comparing ratios of homolog counts in gene families for multiple species including additional fish.
Whole-genome data were obtained from Ensembl, and the number of homologs between mouse and each species was determined for each gene family. The resulting matrix of homolog ratios for all species was analysed using PCA.
The invertebrate and chordate species cluster clearly away from the vertebrate species. This analysis supports the assumption that gene family size is dominated by whole-genome duplication (WGD) events in the vertebrate lineage. The Spotted Gar (Lepisosteus oculatus), whose lineage diverged before the additional teleost-specific WGD (ref. 6), falls in the same quadrant as the mammalian species, supporting the premise that the major event in the genome evolution of teleost fish was the additional WGD event.
The following software was used to process RNAseq data: adapters were removed from the raw reads using Cutadapt (ref. 7). TopHat2 (ref. 8) was used as a wrapper for the alignment programme Bowtie2 (ref. 9) to map sequence reads to the reference genome.
e. Box plots for the percentage of protein identity since the last common ancestor between zebrafish and mouse of proteins identified in the zebrafish brain, divided into those found (SYN) or absent (non-SYN) in our synaptosomal preparation. Data from two independent zebrafish brain proteomes are used (refs 1-3).
f. Box plots for the percentage of protein identity since the last common ancestor between zebrafish and mouse of proteins identified in the zebrafish brain, divided into those found (PSD) or absent (non-PSD) in our postsynaptic density preparation. Data from two independent zebrafish brain proteomes are used (refs 2,3).
g. Box plots for the percentage of protein identity since the last common ancestor between zebrafish and mouse of proteins identified in a mouse brain proteome (ref. 4) and a human frontal cortex proteome (ref. 1), divided into those found (SYN) or absent (non-SYN) in our synaptosomal preparation.
h. Box plots for the percentage of protein identity since the last common ancestor between zebrafish and mouse of proteins identified in a mouse brain proteome (ref. 4) and a human frontal cortex proteome (ref. 1), divided into those found (PSD) or absent (non-PSD) in our postsynaptic density preparation.

Supplementary Note 1. Zebrafish synapse ultrastructure
Transmission electron microscopy images of zebrafish brain suggest that the morphology of postsynaptic densities (PSDs) differs between brain regions, including olfactory bulb, telencephalon, optic tectum and cerebellum (Supplementary Fig. 1-5). The most noticeable differences were found in the cerebellar corpus (Supplementary Fig. 4), where two well-differentiated types of PSD could be observed: flat PSDs, similar to mammalian ones and to those found in other zebrafish brain regions (standard, Supplementary Fig. 5a), and curved PSDs, in which the presynaptic bouton largely surrounds the postsynaptic element (Supplementary Fig. 5b). Curved PSDs were prominent in the cerebellum, accounting for 87% of the total (Supplementary Fig. 5c). Moreover, two subsets of curved PSDs could be observed: short (type 1) and long (type 2) (Supplementary Fig. 5d). To provide empirical evidence that these PSDs do differ in morphology, we measured several parameters of their shape and size (Supplementary Fig. 5 and Supplementary Table 1). As expected, the measured PSD lengths (arch length, see Supplementary Fig. 5e) showed a bimodal distribution (Supplementary Fig. 5f), indicating the presence of two subpopulations. In this distribution, type 2 curved PSDs were significantly longer than type 1 (p < 0.05, Supplementary Fig. 5g and Supplementary Table 1). Type 2 PSDs were also significantly longer than flat (standard) cerebellum PSDs (p < 0.05, Supplementary Fig. 5g and Supplementary Table 1). Additionally, to prove that the observed size differences were not due to the depth of the tissue section, we …
Finally, since the previous analysis provided valuable information on PSD morphology, we performed similar measurements on the other zebrafish brain regions. In this case we measured two variables: PSD length and area (Supplementary Fig. 5j and Supplementary Table 1). This showed that the largest forebrain PSDs were found in telencephalon synapses (p < 0.05, Supplementary Fig. 5k-l). Moreover, PSDs from the olfactory bulb, optic tectum, cerebellum type 1 and cerebellum standard were similar in size. Finally, type 2 curved PSDs from the cerebellar corpus were significantly larger than all others, making them the largest in the whole zebrafish brain (p < 0.05, Supplementary Fig. 5k-l).
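As an illustration of the kind of length measurement described above, the following minimal sketch computes the arch length of a curved PSD from points traced along it. The source does not state how the measurements were made, so the tracing approach, the function names, the coordinates and the nanometre units are all assumptions made for illustration only.

```python
import math

def arch_length(points):
    """Length of a traced profile: sum of the straight segments between
    consecutive points (hypothetical tracing of a PSD on a micrograph)."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def chord_length(points):
    """Straight-line distance between the two ends of the traced profile."""
    return math.dist(points[0], points[-1])

# Hypothetical trace of a curved PSD, coordinates in nm (not real data).
trace = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]

arch = arch_length(trace)
chord = chord_length(trace)
# A ratio above 1 means the profile is curved rather than flat.
print(f"arch = {arch} nm, chord = {chord} nm, arch/chord = {arch / chord:.2f}")
```

For a flat PSD the arch and chord lengths coincide, so the ratio of the two gives one simple way to separate flat from curved profiles before comparing their lengths.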

Supplementary Note 2. Proteins identified by mass spectrometry and filtering criteria used to define synaptosomal and PSD datasets
The exact same criteria were applied to mouse and zebrafish mass spectrometry

Supplementary Note 3. Evolutionary origins of species differences in SYN and PSD proteomes
We have shown that zebrafish has larger gene families than mouse (Fig. 3a,b). This led us to hypothesize that the teleost-specific whole genome duplication (TGD) has been the major force driving this increase in family size. Nevertheless, the larger families could also be a consequence of gene loss in mouse or, alternatively, of tandem duplications that occurred after the TGD in the fish lineages leading to zebrafish.
We have also shown that mammalian PSD proteins absent from zebrafish include protein types with important synaptic functions. Here we hypothesized that the PSD proteome incorporated new proteins after fish diverged from the rest of the vertebrates. Nevertheless, loss of the genes coding for these proteins in the zebrafish genome is an alternative explanation.
To clarify these points we have performed two analyses:
i) Evaluate family size in other fish species that appeared after the TGD.
ii) Identify mouse synaptic proteins without an orthologue in the zebrafish genome as a measure of gene loss in zebrafish.
… proteins is increased in all fish that appeared after the TGD but not in the Spotted Gar.
When a principal components analysis (PCA) is performed and the first two components are plotted, the fish species that appeared after the TGD cluster together, while the Spotted Gar clusters with the other vertebrates (Supplementary Fig. 8). These data indicate that gene families of proteins expressed at the synapse tend to be larger in all fish species that appeared after the TGD, supporting the idea that the TGD has shaped synaptic gene family size.
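The PCA step can be sketched as follows. This is a minimal illustration, assuming that each species is represented by its per-family homolog count divided by the mouse count and that the matrix is mean-centred but not scaled; the source does not give these preprocessing details. The function names and all counts are hypothetical placeholders, not data from this study.

```python
import numpy as np

def homolog_ratios(counts, reference="mouse"):
    """Build a species x family matrix of homolog counts relative to the reference.

    counts: {species: {family: number_of_homologs}}. Every family must have
    at least one member in the reference species (avoids division by zero).
    """
    families = sorted(counts[reference])
    species = sorted(s for s in counts if s != reference)
    ref = np.array([counts[reference][f] for f in families], dtype=float)
    mat = np.array([[counts[s].get(f, 0) for f in families] for s in species],
                   dtype=float)
    return species, families, mat / ref

def pca_scores(matrix, n_components=2):
    """Project rows onto the leading principal components via SVD."""
    centred = matrix - matrix.mean(axis=0)  # centre each family (column)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Hypothetical toy counts (not real Ensembl data).
toy_counts = {
    "mouse":       {"famA": 2, "famB": 1, "famC": 4},
    "human":       {"famA": 2, "famB": 1, "famC": 4},
    "spotted_gar": {"famA": 2, "famB": 1, "famC": 3},
    "zebrafish":   {"famA": 4, "famB": 2, "famC": 6},
    "medaka":      {"famA": 3, "famB": 2, "famC": 7},
}

species, families, ratios = homolog_ratios(toy_counts)
scores = pca_scores(ratios)
for name, (pc1, pc2) in zip(species, scores):
    print(f"{name:12s} PC1 = {pc1:+.3f}  PC2 = {pc2:+.3f}")
```

In this toy input the two species with inflated ratios fall on one side of the first component and the other two on the opposite side, which is the kind of separation the biplot is used to inspect. The sign of each component is arbitrary, so only relative positions are meaningful.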
ii) We have looked for zebrafish orthologues of mouse synaptic proteins to evaluate gene loss in zebrafish. Importantly, most mouse synaptic proteins have an orthologue identified in the zebrafish genome, which rules out gene loss as the main explanation for the differences observed between species. For instance, 90% of all mouse proteins considered have an orthologue in the zebrafish genome, and 80% of mouse PSD-specific proteins (Mm-sPSD) also have an orthologue in zebrafish (Supplementary Fig. 9).
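The orthologue coverage reported above is a simple proportion, which the following minimal sketch makes explicit. The function name, the protein identifiers and the orthologue assignments are hypothetical placeholders; how orthologues were actually assigned is not described in this note.

```python
def orthologue_coverage(proteins, orthologue_map):
    """Percentage of proteins with at least one orthologue in the other species.

    proteins: iterable of protein identifiers from the query species.
    orthologue_map: {query protein: list of orthologues}; a missing key or an
    empty list both count as "no orthologue found".
    """
    proteins = list(proteins)
    if not proteins:
        raise ValueError("protein list is empty")
    with_orthologue = sum(1 for p in proteins if orthologue_map.get(p))
    return 100.0 * with_orthologue / len(proteins)

# Hypothetical identifiers, for illustration only (not real data).
mouse_psd = ["prot1", "prot2", "prot3", "prot4", "prot5"]
to_zebrafish = {
    "prot1": ["zf_1a", "zf_1b"],  # two co-orthologues still count once
    "prot2": ["zf_2"],
    "prot3": ["zf_3"],
    "prot4": ["zf_4a", "zf_4b"],
    "prot5": [],                  # no orthologue identified
}

coverage = orthologue_coverage(mouse_psd, to_zebrafish)
print(f"{coverage:.0f}% of proteins have a zebrafish orthologue")
```

Counting each query protein once, regardless of how many co-orthologues it has, keeps this measure of gene loss separate from the family-size expansion examined in the first analysis.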
Overall, these findings indicate that the teleost-specific whole genome duplication is most likely the major driving force behind the expansion of gene families for proteins expressed at the zebrafish synapse. They also indicate that major gene loss has not occurred between mouse and zebrafish, at least for genes coding for proteins expressed at the synapse. Thus, differences in proteome composition between species are not mainly a consequence of gene loss.