Calpains in cyanobacteria and the origin of calpains

Calpains are cysteine proteases involved in many cellular processes. They are an ancient and large superfamily of enzymes responsible for the cleavage and irreversible modification of a large variety of substrates. They have been intensively studied in humans and other mammals, but information about calpains in bacteria is scarce. Calpains have not been found among Archaea to date. In this study, we have investigated the presence of calpains in selected cyanobacterial species using in silico analyses. We show that calpains, defined by possessing the CysPC core domain, are present in the cyanobacterial genera Anabaena, Aphanizomenon, Calothrix, Chamaesiphon, Fischerella, Microcystis, Scytonema and Trichormus. Based on in silico protein interaction analysis, we have predicted putative interaction partners for the identified cyanobacterial calpains. The phylogenetic analysis including cyanobacterial, other bacterial and eukaryotic calpains divided bacterial and eukaryotic calpains into two separate monophyletic clusters. We propose two possible evolutionary scenarios to explain this tree topology: (1) the eukaryotic ancestor or an archaeal ancestor of eukaryotes obtained the calpain gene from an unknown bacterial donor, or alternatively (2) the calpain gene was already present in the last universal common ancestor and was subsequently lost by the ancestor of Archaea, but retained by the ancestor of Bacteria and by the ancestor of Eukarya. Both scenarios would require multiple independent losses of calpain genes in various bacteria and eukaryotes.

www.nature.com/scientificreports/ ability to tolerate extreme conditions 17 . Therefore, they have colonised almost all habitats on Earth with access to sunlight and they play a significant role in biochemical processes in nature 18 . The chloroplasts of the eukaryotic supergroup Archaeplastida, comprising glaucophytes and red and green algae including land plants 19 , originated from cyanobacteria in the process termed primary endosymbiosis [20][21][22] . Due to advances in genomics, transcriptomics and proteomics, the genetic makeup of cyanobacteria has been studied more intensively and their significance in biotechnological applications has increased. Cyanobacteria can be a source of bioactive compounds including pharmaceuticals and toxins 23 . Several calcium binding proteins (CaBPs) have been discovered in cyanobacteria. These proteins play a significant role in bacterial cells, mainly in processes such as cell division and development, motility, homeostasis, stress response, secretion, molecular transport, cellular signalling and host-pathogen interactions 24 . Nevertheless, information about cysteine proteases from the calpain superfamily in cyanobacteria remains limited.
In this study, we have conducted a bioinformatic search for calpain homologs in the proteomes of selected cyanobacterial species, mainly species colonising extreme environments and species with biotechnological significance. The putative interacting partners of cyanobacterial calpains have also been identified in silico. We have also performed a phylogenetic analysis of the calpain core CysPC domain to infer the phylogenetic position of the identified cyanobacterial calpains.
Supplementary Table S2 shows the sequences of the 13 cyanobacterial calpains identified in this study. Their sequence length ranges from 382 amino acid residues in F. thermalis to significantly longer sequences in C. minutus and C. polymorphus (1145 and 1160 amino acid residues, respectively). The domain structures of all 13 putative calpains analysed using CDD and Pfam are summarised in Table 1. All identified calpains contain a conserved CysPC domain at the C-terminus and most of them also contain single or multiple bacterial prepeptidase C-terminal domains (PPCs) at the N-terminus (Table 1, Fig. 2).
The alignment of CysPC domains from 13 cyanobacterial calpains, two other bacterial calpains, human calpains CAPN1 and CAPN2, and two plant DEK1 calpains, together with the sequence logo generated from this alignment, is presented in Fig. 3. By definition, all CysPC domains should share a catalytic triad of amino acids typical for calpains: Cys (C), His (H) and Asn (N). All three residues were correctly aligned for all 13 cyanobacterial CysPC domains (Fig. 3). Figure 4 shows the alignment of PPC domains present in cyanobacterial calpains. Although most calpains are cytosolic, a few eukaryotic calpains can also be found in organelles such as mitochondria, e.g. human calpain 10 4,25 , or they are transmembrane proteins in the plasma membrane, as in the case of plant calpain DEK1 26,27 . Thus, we also performed prediction of transmembrane regions in cyanobacterial calpains. TMHMM did not identify transmembrane regions in any of the cyanobacterial calpains, suggesting their cytosolic localization.
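The catalytic triad check described above can be illustrated with a minimal Python sketch. It is not the procedure used to produce Fig. 3; the sequences, sequence names and column indices are invented placeholders, and the function name `triad_conserved` is hypothetical.

```python
# Toy check: does every aligned sequence carry Cys, His and Asn
# at the chosen alignment columns?
# ASSUMPTION: sequences and column indices are invented placeholders,
# not data taken from the alignment in Fig. 3.

def triad_conserved(aligned, columns, expected=("C", "H", "N")):
    """Return names of sequences whose residues at the given 0-based
    alignment columns differ from the expected catalytic triad."""
    mismatches = []
    for name, seq in aligned.items():
        residues = tuple(seq[c] for c in columns)
        if residues != expected:
            mismatches.append(name)
    return mismatches


toy_alignment = {
    "seq_a": "MQCW-AHLVN",
    "seq_b": "MQCWGAHL-N",
    "seq_c": "MQSW-AHLVN",  # Cys replaced by Ser, so this one is flagged
}
toy_columns = (2, 6, 9)  # hypothetical positions of Cys, His, Asn

flagged = triad_conserved(toy_alignment, toy_columns)
print(flagged)
```

An empty list would mean that all sequences carry the expected residues at the three columns.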
SmartBLAST was used to evaluate whether the identified sequences belong to the calpain superfamily. All 13 sequences show reasonable similarity with members of this superfamily. However, the level of sequence identity is relatively low (~ 30%). This might be due to the lack of annotated bacterial calpain sequences in public databases and the limited number of well-studied calpains from unicellular eukaryotes and bacteria.
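For readers unfamiliar with the measure, sequence identity can be computed from a pairwise alignment as in the following sketch. It uses one common definition (identical residues divided by the number of columns in which neither sequence has a gap); tools differ in the denominator they use, so this is not necessarily the value SmartBLAST reports. The two toy sequences are invented.

```python
# Percent identity between two aligned sequences.
# ASSUMPTION: identity = identical columns / columns without gaps.
# Other definitions (e.g. dividing by full alignment length) exist.

def percent_identity(a, b, gap="-"):
    """Return percent identity over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Invented toy alignment: 8 gap-free columns, 6 of them identical.
identity = percent_identity("ACD-EFGHIK", "ACDWEYGH-R")
print(identity)
```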
We also performed homology modelling of the 3D structure of each identified cyanobacterial CysPC domain. The results are summarised in Supplementary Table S3. Each modelled structure was aligned with the appropriate Protein Data Bank (PDB) template, and the alignment was evaluated based on the number of aligned amino acid residues and the root mean square deviation (RMSD). All 3D structures show significant similarity with the template with RMSD > 1 for all modelled CysPCs (Supplementary Table S4). The 3D structures are shown in Fig. 5.
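The RMSD used to evaluate the structural alignments is the square root of the mean squared distance between paired atoms. The sketch below shows only this final calculation and assumes the two structures have already been superimposed and their atoms paired; the superposition step itself is not shown. The coordinates are invented toy values.

```python
import math

# RMSD between two sets of paired 3D coordinates.
# ASSUMPTION: the structures are already superimposed and atom i in the
# model corresponds to atom i in the template.

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Invented toy coordinates: every atom is displaced by 1 unit along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
template = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]

value = rmsd(model, template)
print(value)
```

Lower values indicate a closer fit between the model and the template.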
Using String DB 28 , we predicted the putative interactions of cyanobacterial calpains with other proteins. Almost 40% of interaction partners of cyanobacterial calpains predicted by String DB were putative cyanobacterial proteins currently missing annotation in public databases. Putative interaction partners of cyanobacterial calpains are shown in Supplementary Fig. S1.
To determine the evolutionary relationships between cyanobacterial calpains and calpains present in other bacteria and eukaryotes, we performed a phylogenetic analysis of the CysPC domain. In contrast to other parts of calpain sequences, the CysPC domain is highly conserved in all calpains. It consists of approximately 350 amino acid residues. The results of the phylogenetic analysis are shown in Fig. 6. Bacterial and eukaryotic CysPC domains are clearly separated into two monophyletic clusters. All cyanobacterial calpains, except for S. hofmannii 2, form a monophyletic cluster within bacteria.

Discussion
We have searched for the presence of calpains in the proteomes of 50 cyanobacterial species and we have identified calpains in 10 of them based on an HMM of the catalytic CysPC domain typical for calpain proteins. The number of identified cyanobacterial species possessing calpains is relatively low, but as has been shown previously, cyanobacteria are a highly diverse group and their genome content varies significantly even at the species and
Table 1. Information about calpains found in cyanobacteria. All sequences possess a single catalytic core domain of calpains (CysPC). Most sequences also possess one, two or five bacterial pre-peptidase C-terminal domains (PPC). All identified CysPC domains contain three catalytic sites (CS) typical for calpains.
Fig. 2), which is typically present in bacterial secreted proteins at their C-terminus 30 , while in cyanobacterial calpains, it is found at the N-terminus. Transmembrane helical regions are absent from all putative cyanobacterial calpains, suggesting their cytosolic localisation. These findings are consistent with the study of calpains in other bacteria, which also possess PPC at the N-terminus and do not possess any predictable transmembrane regions 5 .
Calpains are known to be involved in many cellular processes in multicellular eukaryotes, such as aleurone bilayer development and positional cell division in plants 31 , and brain function, memory formation and the development of many pathological processes in mammals 7 . Calpains cleave a wide range of substrates, including protein kinases, receptor molecules and proteins involved in signal transduction. It has been proposed that calpains play their main role in the regulation of cell signalling rather than in protein digestion 32,33 . However, their function in bacteria remains unknown.
The predicted interaction partners of the identified cyanobacterial calpains differ significantly among the studied cyanobacterial species. No partner has been predicted to interact with calpains in all cyanobacterial species and only a few have been predicted for two, three or four species. Methionine synthase putatively interacts with calpains in four cyanobacterial species, while S8 peptidase and glycoside hydrolase family 3 proteins (such as beta-N-acetylhexosaminidase) putatively interact with calpains in three cyanobacterial species. SecA, involved in protein translocation across the cytoplasmic and thylakoid membranes, TamB (a component of the translocation and assembly module autotransporter complex) and collagen triple helix repeat protein have each been identified as putative calpain interacting partners in only two cyanobacterial species. Other annotated proteins putatively interacting with cyanobacterial calpains have been predicted only for a single cyanobacterial species, and almost 40% of the predicted interacting partners are non-annotated proteins (Supplementary Fig. S1). Based on these results, it is currently difficult to draw any meaningful conclusion about the function of cyanobacterial calpains. The predicted interaction partners and the function of cyanobacterial calpains remain to be verified experimentally.
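The comparison of predicted partners across species amounts to counting, for each partner, the number of species in which it was predicted. The following sketch shows one way to tally this. The species labels and the assignment of partners to species are invented placeholders (the real per-species lists are in Supplementary Fig. S1); only the totals for the three partners named in the example are chosen to match the counts given in the text.

```python
from collections import Counter

# ASSUMPTION: placeholder species labels and partner assignments,
# chosen only so that the totals match the counts reported in the text
# (methionine synthase: 4 species, S8 peptidase: 3, SecA: 2).
predicted = {
    "species_1": {"methionine synthase", "S8 peptidase", "SecA"},
    "species_2": {"methionine synthase", "S8 peptidase"},
    "species_3": {"methionine synthase", "S8 peptidase", "SecA"},
    "species_4": {"methionine synthase"},
}


def partner_counts(per_species):
    """Count in how many species each partner was predicted."""
    counts = Counter()
    for partners in per_species.values():
        counts.update(set(partners))  # each species counts at most once
    return counts


counts = partner_counts(predicted)
# Partners predicted in at least two species
shared = {partner: n for partner, n in counts.items() if n >= 2}
print(sorted(shared.items(), key=lambda item: -item[1]))
```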
We also conducted a phylogenetic analysis of the calpain core CysPC domain to infer the phylogenetic position of cyanobacterial calpains. The phylogenetic analysis revealed the monophyly of bacterial as well as of eukaryotic CysPCs with bootstrap support of 97 and 98, respectively (Fig. 6). No horizontal gene transfers of the CysPC domain from bacteria to eukaryotes or vice versa were detected using our taxon sampling. This is consistent with the results of Rawlings (2015), whose phylogenetic analysis identified only two recent horizontal gene transfers from eukaryotes to bacteria and no recent horizontal gene transfer from bacteria to eukaryotes 5 . The branching order within the domain Bacteria and within the domain Eukarya does not correspond to the real evolutionary relationships of bacterial and eukaryotic taxonomic groups, respectively. CysPC is thus unlikely to be a suitable marker for inferring the evolutionary relationships between organisms, and it is also possible that several horizontal transfers of calpains have occurred within bacteria as well as within eukaryotes.
With the exception of S. hofmannii 2, all cyanobacterial CysPC domains form a monophyletic group within bacterial CysPC domains (Fig. 6). The alignment of cyanobacterial CysPC domains also confirms that CysPC domain 2 from S. hofmannii is the most divergent in comparison to other cyanobacterial CysPC domains (Fig. 3). The tree topology also disproves the hypothesis that cyanobacteria, from which the chloroplasts of Archaeplastida evolved, were the endosymbiotic donors of archaeplastidial calpains.
The explanation of the origin of eukaryotic calpains depends on the view taken of the origin of eukaryotes themselves. The most popular hypothesis for the origin of eukaryotes suggests that eukaryotes evolved by the endosymbiosis of an alphaproteobacterial ancestor of mitochondria in an archaeal host 34 , probably from the group Asgard archaea 35,36 . Since archaea do not possess calpains, while some alphaproteobacteria do, under this scenario the host archaeal cell could have obtained the calpain gene from an alphaproteobacterial endosymbiont. This scenario would be supported if alphaproteobacterial CysPC domains were placed at the base of eukaryotic CysPCs in the phylogenetic tree with high bootstrap support. Since this is not the case (Fig. 6), our tree does not support an alphaproteobacterial origin of eukaryotic calpains. Nevertheless, the hypothesis that an archaeal ancestor of eukaryotes or the last common ancestor of eukaryotes obtained the calpain gene from an unknown bacterial donor, e.g. via an ancient horizontal gene transfer, cannot be rejected. The scenario that eukaryotic calpains are derived from genes horizontally transferred from a bacterium has also been suggested by Rawlings 5 . Rawlings (2015) has also proposed that the differential distribution of calpains in bacteria is the result of multiple ancient horizontal gene transfers among bacteria rather than multiple gene losses from various bacteria 5 . In our opinion, the alternative hypothesis that both the bacterial ancestor and the eukaryotic ancestor possessed calpain can still be considered. Currently less popular but still plausible hypotheses for the origin of eukaryotes suggest that Archaea and Eukarya are sister groups. The common ancestor of Archaea and Eukarya might have originated from a bacterium 37 , or these two domains had a common undefined ancestor, a sister lineage of the domain Bacteria 38 .
An undefined archaeo-eukaryotic ancestor might have been even more complex than all contemporary archaea. The domain Archaea might have arisen via reductive evolution of this archaeo-eukaryotic ancestor, and the differences between the genome contents of contemporary archaeal lineages could be explained by differential gene losses [39][40][41] . Considering this scenario, the calpain gene could have been already present in the last universal common ancestor, lost in the ancestor of Archaea, and retained in the ancestor of Bacteria and in the ancestor of Eukarya. Since calpain genes are universally distributed in neither bacteria nor eukaryotes, all the alternative scenarios mentioned would require multiple independent losses of calpain genes in various bacterial and eukaryotic lineages.

Methods
We have searched for calpains in silico in the proteomes of 50 selected cyanobacterial species (Supplementary Table S1). Our selection was focused on cyanobacteria from extreme biotopes as well as those with biotechnological potential. The proteomic data are available online in the NCBI GenBank (https://www.ncbi.nlm.nih.gov/genbank/) 42 and UniProt Proteomes (https://www.uniprot.org/help/proteomes_manual) databases 43 . Since the calpain superfamily is relatively divergent and has many members, we created a Hidden Markov Model (HMM) of the calpain catalytic core domain (CysPC) to identify potential calpain sequences. For HMM creation, annotated calpain sequences from various organisms were obtained from the UniProt database (https://www.uniprot.org/). The HMM was built using HMMER 3.2.1 44 . This model was then applied to the cyanobacterial proteomes and putative calpain sequences were identified.
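As an illustration of how hits from such a search can be collected, the sketch below parses the tabular per-sequence output that HMMER's `hmmsearch` writes with its `--tblout` option and keeps the targets below an E-value cutoff. The profile itself would typically be built with `hmmbuild` from an alignment of the annotated CysPC sequences. The cutoff of 1e-5, the protein names and the sample table are assumptions made for this example; the thresholds actually used in this study are not stated above.

```python
# Minimal parser for hmmsearch --tblout output (per-sequence hits table).
# Columns are whitespace-separated: target name, accession, query name,
# accession, full-sequence E-value, score, bias, ...; lines starting
# with '#' are comments.
# ASSUMPTION: the sample table, names and the 1e-5 cutoff are invented.

SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias  E-value  score  bias  exp reg clu ov env dom rep inc description of target
toy_protein_1  -  CysPC_toy  -  1.2e-45  150.3  0.1  2.0e-45  149.6  0.1  1.0  1  0  0  1  1  1  1  hypothetical protein
toy_protein_2  -  CysPC_toy  -  0.5  10.2  0.0  0.9  9.5  0.0  1.2  1  0  0  1  1  0  0  peptidase
"""


def parse_tblout(text, max_evalue=1e-5):
    """Return (target, evalue, score) for hits with E-value <= max_evalue."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split()
        target = fields[0]
        evalue = float(fields[4])  # full-sequence E-value
        score = float(fields[5])   # full-sequence bit score
        if evalue <= max_evalue:
            hits.append((target, evalue, score))
    return hits


hits = parse_tblout(SAMPLE_TBLOUT)
print(hits)
```

Candidate sequences retained this way would still need the domain and catalytic-site checks described in the text before being called calpains.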
SmartBLAST (https://blast.ncbi.nlm.nih.gov/smartblast/) was used to search for homologs of cyanobacterial calpains in other bacteria as well as in eukaryotes. The 3D structure of the putative calpains was predicted by Phyre2 50 to verify that the identified sequences are indeed calpains.
String DB was used for the prediction of putative interactions with other proteins 28 to elucidate the possible function of cyanobacterial calpains. String DB is a database of experimentally determined as well as predicted protein interactions. The predictions of protein interactions are based on protein homology, gene neighbourhood, gene fusions, gene co-occurrence, gene co-expression and/or text mining 28 .
The homology modelling of the 3D structure of cyanobacterial calpains was performed using the Phyre2 server 50 . Supplementary Table S3 contains the list of the five best-fitting models for each calpain. The model with the highest confidence and percentage identity was selected and aligned with an appropriate template structure downloaded from the Protein Data Bank (PDB) using the Multiprot server 51 . All 3D structures were visualised using ChimeraX software version 1.4 (https://www.cgl.ucsf.edu/chimerax) 52 .
We gathered annotated calpain sequences from various organisms from the UniProt database (Supplementary Table S5) and included them in the phylogenetic analysis together with the identified cyanobacterial calpains (Supplementary Table S2). Only the regions corresponding to the catalytic CysPC domain were used for the phylogenetic analysis. All CysPC sequences were aligned in MAFFT with automatic settings for amino acid sequences. IQ-Tree 53 was used for the construction of phylogenetic trees. Out of 168 models, WAG + F + I + G4 54 was selected as the best-fitting model for our dataset. The number of bootstrap replicates was set to 1000. The phylogenetic tree was visualised using iTOL (https://itol.embl.de/) 55 .

Data availability
All data generated or analysed during this study are included in this published article (and its Supplementary files).