Introduction

It is well known that cancer patients rarely die from the primary tumor itself; instead, they usually succumb to its dissemination to distant body sites. In this process, cancer cells undergo a series of events usually termed the invasion-metastasis cascade [1]. In order to inhabit new locations, metastatic cells must physically detach from the main tissue (tumor cell dissociation), break through the basal lamina and invade the surrounding tissue (invasion), enter nearby blood or lymphatic vessels (intravasation), survive transit through the lymphatic or blood system, and exit the vessels into distant tissue (extravasation). In distant locations, metastatic cells can form small cellular clusters, which eventually grow into macroscopic tumors (colonization) [2].

Metazoans (animals) are built as complex structures, typically organized into tissues and organs, in which every cell is committed to the wellbeing of the whole organism. Processes such as growth and cell migration are strictly controlled, and metazoan organisms have developed a series of defense mechanisms, such as apoptosis, against non-cooperating cheater cells. Tumor (metastatic) cells are often destroyed by turbulence within the vascular system, become trapped in small vessels, or are attacked by the immune system [3]. In order to disseminate, cancer cells have to acquire the capacity to invade the surrounding tissue and enter the circulatory system. Survival in a distant, unfamiliar environment, often in an unrelated tissue, is especially challenging and requires transcriptional reprogramming of the cell, which leads to major phenotypic changes usually called epithelial-to-mesenchymal transition (EMT) [4]. The precise molecular nature of the changes that occur during metastasis remains unclear. The discovery of genes/proteins that are directly involved in the metastatic cascade is therefore a big step forward in our understanding of this process.

The group of metastasis-suppressor proteins was established in 1988 after the identification of NME using differential hybridization analysis of murine K-1735 cells of different metastatic potential [5]. Metastasis suppressors are specifically involved in the regulation of one or several steps of the metastatic cascade. Their expression is, in general, lower in metastases than in the corresponding primary tumor. The key feature of a metastasis-suppressor gene is that its expression inhibits metastasis but normally does not influence primary tumor growth: upon restoration of its function, the cell is no longer metastatic although it remains tumorigenic [6]. Metastasis suppressors vary in their subcellular localization and have diverse functions in the cell, ranging from protein kinases (MAP2K4, MAP2K7, MAPK14) and nucleoside diphosphate kinases (NME) to cell–cell adhesion molecules such as cadherins (CDHs), transcription factors (KLF17), scaffolding proteins (AKAP12), and many others [7]. Many metastasis suppressors are multifunctional proteins, and one or several of their functions can be involved in metastasis suppression. The suppressor activity of a given metastasis suppressor depends on the tumor type. Furthermore, a specific protein may act as a metastasis suppressor in one tumor type and as a tumor suppressor, or even as a promoter of tumorigenesis, in another [8].

The goal of this paper is to give a general overview of the evolutionary history of known metastasis-suppressor genes/proteins in animals and to place it in the context of what is already known about the emergence of neoplasms in animal history. Herein we use the term metastasis suppressor both for the genes/proteins whose metastasis suppression activity is documented (usually in mammals) and for their homologs across metazoans. Whether those genes have similar properties and functions in other animal lineages, especially in simple animals such as sponges and cnidarians, or even in unicellular organisms, is largely unknown. Because published data on the evolutionary history of metastasis suppressors are scarce, we performed an additional bioinformatics analysis to identify homologs of human metastasis suppressor genes in the genomes of animals from diverse lineages and in the closest unicellular relatives of animals (the choanoflagellate Monosiga brevicollis and the filasterean Capsaspora owczarzaki). The information obtained was used to complement the available literature on this topic. In addition, we attempted to correlate the appearance of individual metastasis suppressor genes, or groups of them, with their biochemical/biological function, localization, and/or the step of the metastatic cascade in which they are implicated. The metastasis suppressor genes/proteins we investigated are listed in Table 1. The species we chose for our analysis, their phylogenetic relationships, common names, and the taxonomic groups to which they belong are shown in Table 2 and Fig. 1. The distribution of metastasis suppressor homologs across the studied species, as identified by our analysis, is shown in Fig. 2 and Supplementary Figure 1.

Table 1 Metastasis suppressors in humans (updated and adapted from refs. 7, 32, 44, 99)
Table 2 Representative organisms used for cross-species analysis
Fig. 1 Schematic phylogenetic tree of the species we analyzed and the taxonomic groups to which they belong

Fig. 2 The number of homologs of metastasis-related genes varies across species. The heatmap shows the number of homologs of human metastasis-related genes in each of the studied species

Bioinformatics analysis

Data

Species for the comparative analysis (Table 2; Fig. 1) were chosen to sample key branches of the metazoan phylogeny and for the completeness of their genome assemblies. Full proteomes of representative species with whole genome assemblies were downloaded from Ensembl release 87 [9] or from Ensembl Genomes release 34 [10] (Table 2). For species not represented in Ensembl or Ensembl Genomes, full genomes and proteomes were downloaded from NCBI's Genome database [11] or, if unavailable there, from the JGI portal [12]. The proteome of each species was filtered to retain only the longest protein product per gene, i.e., to eliminate all but one isoform per gene, using a custom Perl script. The custom Perl scripts are freely available upon request.
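The isoform filter described above can be illustrated with a short sketch. The original step used a custom Perl script that is not reproduced here; this is a minimal Python sketch that assumes Ensembl-style peptide FASTA headers carrying a `gene:<ID>` token and, as a further assumption, treats a protein without such a token as its own gene. Proteomes from NCBI or JGI would need their own header-parsing rule.

```python
import re
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Assumption: Ensembl-style peptide headers with a "gene:<ID>" token, e.g.
# ">ENSP... pep chromosome:... gene:ENSG... transcript:ENST...".
GENE_TOKEN = re.compile(r"\bgene:(\S+)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def longest_isoform_per_gene(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Keep only the longest protein per gene; ties keep the first one seen."""
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in read_fasta(lines):
        match = GENE_TOKEN.search(header)
        # Without a gene token, treat the protein ID as its own gene (assumption).
        gene = match.group(1) if match else header.split()[0]
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best


if __name__ == "__main__":
    demo = [">P1 pep gene:G1", "MKV", ">P2 pep gene:G1", "MKVLA", ">P3 pep gene:G2", "MA"]
    for header, seq in longest_isoform_per_gene(demo).values():
        print(f">{header}\n{seq}")
```

In the demo, gene G1 keeps its five-residue isoform P2 and gene G2 keeps its only protein P3.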

Homologous groups of proteins

The set of species sampled in this study is not represented in any publicly available homology database, so we applied a computational pipeline to assign all genes of the selected species to homology groups. Our method for determining homology groups is analogous to the approach used by many others, including EnsemblCompara [13] and OrthoMCL [14]. To assign genes to homology groups, the filtered proteomes of all species were compared in an all-against-all blastp search with an e-value cutoff of 1e−5, using NCBI BLAST version 2.4.0+ [15]. The BLAST results were loaded as a similarity graph with the program mcxload from the MCL package [16], using the options --stream-mirror --stream-neg-log10 -stream-tf 'ceil(200)', which symmetrize the graph and convert each e-value into an edge weight equal to its negative log10, capped at 200. Because this graph-based method considers the whole network of BLAST similarities rather than individual pairwise hits, it can include more distantly related genes, which gives it an advantage over BLAST-only methods, especially for finding homologs in distantly related species. Clusters were extracted from the network using the program mcl with the inflation parameter (-I) set to 3.0.
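The graph construction and clustering steps can be illustrated with a small, self-contained sketch. It is not the mcxload/mcl implementation used in the study. It reads BLAST tabular output (`-outfmt 6`, where column 11 holds the e-value), applies the same e-value cutoff and the negative-log10 weighting capped at 200, and then runs a dense, textbook version of the MCL iteration: expansion by matrix squaring, inflation by element-wise powering, and column re-normalization. Keeping the larger weight when both directions of a hit are present, the self-loop values, and reading clusters as connected components of the converged matrix are all assumptions of this sketch. The production mcl program uses sparse matrices and pruning and is far more efficient.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set

import numpy as np

CAP = 200.0  # same ceiling as the 'ceil(200)' transform


def blast_graph(rows: Iterable[str], evalue_cutoff: float = 1e-5) -> Dict[str, Dict[str, float]]:
    """Build a symmetric weighted graph from BLAST -outfmt 6 lines."""
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query == subject or evalue > evalue_cutoff:
            continue  # skip self-hits and hits above the cutoff
        weight = CAP if evalue == 0 else min(-math.log10(evalue), CAP)
        for a, b in ((query, subject), (subject, query)):  # mirror each edge
            graph[a][b] = max(weight, graph[a].get(b, 0.0))
    return dict(graph)


def mcl_clusters(graph: Dict[str, Dict[str, float]], inflation: float = 3.0,
                 max_iter: int = 100, tol: float = 1e-9) -> List[Set[str]]:
    """Toy dense Markov Cluster (MCL) run; returns clusters as sets of IDs."""
    nodes = sorted(graph)
    index = {node: i for i, node in enumerate(nodes)}
    m = np.zeros((len(nodes), len(nodes)))
    for a, neighbours in graph.items():
        for b, w in neighbours.items():
            m[index[b], index[a]] = w
    # Self-loops weighted by each column's strongest edge (assumption).
    np.fill_diagonal(m, np.maximum(m.max(axis=0), 1.0))
    m /= m.sum(axis=0, keepdims=True)  # column-stochastic matrix
    for _ in range(max_iter):
        prev = m
        m = m @ m                          # expansion
        m = m ** inflation                 # inflation
        m /= m.sum(axis=0, keepdims=True)  # re-normalize columns
        if np.abs(m - prev).max() < tol:
            break
    # Clusters = connected components of the non-negligible entries.
    parent = list(range(len(nodes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in zip(*np.nonzero(m > 1e-6)):
        parent[find(int(i))] = find(int(j))
    groups: Dict[int, Set[str]] = defaultdict(set)
    for i, node in enumerate(nodes):
        groups[find(i)].add(node)
    return sorted(groups.values(), key=sorted)
```

On a toy graph of two unconnected triangles, the sketch returns the two triangles as separate clusters. Higher inflation values generally yield finer clusters.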

Extracting homologs groups of metastasis suppressor genes

Clusters of homologous genes were filtered to retain those containing at least one known human metastasis suppressor gene (Table 1). The resulting counts of homologous genes per organism were plotted in R version 3.2.5 [17] with the heatmap.2 function from the gplots package [18].
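The filtering and counting step can be sketched in Python; the plotting itself was done in R, as stated above. The sketch assumes an exported cluster file with one cluster per line and whitespace-separated member labels, and hypothetical labels of the form `<species>|<protein ID>`. The mapping from human protein IDs to the gene names of Table 1 is likewise a hypothetical input. If two suppressors fall into the same cluster, they receive identical rows, because the method cannot separate paralogs within a homology group.

```python
from collections import Counter
from typing import Dict, Iterable, List


def suppressor_counts(cluster_lines: Iterable[str], suppressors: Dict[str, str],
                      species: List[str]) -> Dict[str, Dict[str, int]]:
    """Count homologs per species for each cluster holding a known suppressor.

    cluster_lines: one cluster per line, members separated by whitespace,
        each labelled '<species>|<protein ID>' (hypothetical format).
    suppressors: human protein ID -> gene name from Table 1 (hypothetical input).
    species: column order of the resulting count matrix.
    """
    table: Dict[str, Dict[str, int]] = {}
    for line in cluster_lines:
        members = line.split()
        names = [suppressors[m] for m in members if m in suppressors]
        if not names:
            continue  # cluster contains no known human metastasis suppressor
        per_species = Counter(m.split("|", 1)[0] for m in members)
        for name in names:
            table[name] = {sp: per_species.get(sp, 0) for sp in species}
    return table


def to_tsv(table: Dict[str, Dict[str, int]], species: List[str]) -> str:
    """Render the count matrix as tab-separated text, e.g. for import into R."""
    lines = ["gene\t" + "\t".join(species)]
    for name in sorted(table):
        lines.append(name + "\t" + "\t".join(str(table[name][sp]) for sp in species))
    return "\n".join(lines)
```

A species with no member in a retained cluster receives a count of zero, which corresponds to an empty cell in the heatmap.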

The presence of homologs of known human metastasis suppressor genes across metazoans is displayed in Fig. 2 and Supplementary Figure 1.

Interpretation of the results

Our approach cannot distinguish between speciation and duplication events in the history of the genes, i.e., it cannot distinguish orthologs from paralogs. Therefore, the members of each cluster can only be considered homologs. Furthermore, this approach does not allow a detailed reconstruction of the evolutionary history of each metastasis suppressor family. This is especially true for genes with a patchy distribution across metazoan lineages (Fig. 2; Supplementary Figure 1). The absence of a homolog from a genome assembly could mean that the gene has been lost in that lineage. However, it can also be a consequence of incomplete genomic information due to limitations or errors in sequencing, assembly, or annotation.
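The reasoning used in the following sections can be made explicit with a short sketch: a gene's origin is placed at the earliest lineage in which its homologs occur, and its absences are read as possible losses. This is a simplified, Dollo-style interpretation (a single gain, followed only by losses) and not a formal reconstruction performed in this study. The toy tree, its abbreviated species labels, and the function names are hypothetical. As noted above, each inferred loss may equally reflect incomplete genome data.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# A tree is either a species label (leaf) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]

# Hypothetical, simplified topology for illustration only:
# filasterean, choanoflagellate, cnidarian, (fly, nematode), human.
TOY_TREE: Tree = ("Cowc", ("Mbre", ("Nvec", (("Dmel", "Cele"), "Hsap"))))


def leaves(tree: Tree) -> FrozenSet[str]:
    """All species labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def origin_clade(tree: Tree, present: Set[str]) -> Tree:
    """Smallest clade containing every species that has a homolog.

    Under the single-gain assumption, the gene arose on the branch leading
    to this clade.
    """
    if not present:
        raise ValueError("present must contain at least one species")
    node = tree
    while not isinstance(node, str):
        child = next((c for c in node if present <= leaves(c)), None)
        if child is None:
            break  # homologs are spread over several children: origin is here
        node = child
    return node


def inferred_losses(clade: Tree, present: Set[str]) -> List[FrozenSet[str]]:
    """Maximal subclades of the origin clade without a homolog (one loss each)."""
    if not (leaves(clade) & present):
        return [leaves(clade)]
    if isinstance(clade, str):
        return []
    return [loss for child in clade for loss in inferred_losses(child, present)]
```

For a distribution restricted to these toy taxa, with homologs in Nvec, Dmel and Hsap only, the sketch places the origin at the node uniting the cnidarian and the bilaterians and infers a single loss, in the nematode lineage.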

Metastasis suppressors that appeared before the origin of animals

According to our analysis and previous work [19], the most prominent period of emergence of metastasis suppressors was before the origin of animals. Most of these proteins, such as MAP2Ks, MAPK14, or NME, are important for basic cellular processes common to all eukaryotic cells (Table 1). We found homologs of these genes in the genome assemblies of all or most animal species we checked (Table 2; Fig. 1) and in at least one of their unicellular relatives (Fig. 2).

NME1

NME1, also known as nucleoside-diphosphate kinase A, is the first identified member of the NME family and was the first metastasis suppressor gene to be described; its suppressor activity has been reported in many different tumor types [20, 21]. NME1's biochemical and biological properties have been extensively investigated over the last two decades, mostly in vertebrates. Besides its role in maintaining the cellular (d)NTP pool, it appears to have other biochemical functions, such as histidine kinase and transcription factor activities [22, 23]. It is still unclear which of these functions is responsible for its metastasis suppression activity. The NME family is a rare example of a gene/protein family whose evolution has been thoroughly studied [24,25,26,27,28], and its history appears to be rather complex. Members of the NME family are present in all three domains of life: Bacteria, Archaea, and Eukarya. NME1 belongs to the NME Group I proteins, which are highly conserved both within the group and between species. All NME Group I proteins possess NDP kinase activity. Group I comprises four paralogs in human, NME1–4. The Group I NME1/2 and NME3/4 genes emerged from an ancestral gene common to all chordates through the first round of whole genome duplication, early in the vertebrate lineage. NME1 and NME2 split by cis-duplication after the emergence of amphibians [24]. The sponge homolog NMEGp1Sd shows biochemical properties similar to those of human NME1 and has the potential to modulate the migratory properties of human tumor cells [26]. Similar results were recently reported for a Group I NME homolog from C. owczarzaki (Filasterea), a unicellular eukaryote related to animals (Ćetković et al., this issue). Therefore, we presume that the ancestral metazoan NME gene/protein was structurally and functionally similar to the sponge NME and its human homologs NME1/2.

In our previous work, we speculated that NME in the sponge has the same biochemical function that is responsible for metastasis suppression in human, and that this function was probably established in the ancestor of all metazoans [26] (Ćetković et al., this issue). Homologs of NME1 were present in the genome assemblies of all organisms we analyzed, from unicellular holozoans to human, with a varying number of homologs per species, which is probably a consequence of lineage-specific duplications, gene losses, or incomplete genomic information.

ARHGDIB

Rho GDP dissociation inhibitor beta (ARHGDIB) belongs to a family of proteins that regulate Rho GTPase signaling by controlling guanine nucleotide exchange. It was originally implicated in metastasis suppression in bladder carcinoma, but it is involved in other cancer types as well [29]. It has been suggested that this protein is important for modulating the tumor microenvironment [30]. We found ARHGDIB homologs in all analyzed organisms, from unicellular holozoans to human, except the lamprey Petromyzon marinus (Vertebrata/Cyclostomata) and the spider Stegodyphus mimosarum (Arthropoda). Each species usually had one to four homologs.

BRMS1

Breast cancer metastasis suppressor 1 (BRMS1) is expressed as a 246 amino acid protein in human and has been reported to suppress metastasis in breast cancer [31], but also in several other cancer types [32]. BRMS1 has been described in many species, such as the fruit fly Drosophila melanogaster and different vertebrates [33]. We found it in the genome assemblies of all organisms analyzed except the choanoflagellate M. brevicollis, the nematode Caenorhabditis elegans, and the Pacific oyster Crassostrea gigas.

DPYSL3

Dihydropyrimidinase like 3 (DPYSL3) was identified as a metastasis suppressor in prostate cancer and is a member of the collapsin response mediator protein (CRMP) family [34]. These proteins regulate axon guidance and neurite outgrowth as well as migration processes [35]. DPYSL3 homologs were present in the genome assemblies of all organisms analyzed except the choanoflagellate M. brevicollis and the ctenophore Mnemiopsis leidyi.

DRG1

Developmentally regulated GTP-binding protein 1 (DRG1) belongs to the DRG family of GTP-binding proteins, which consists of two members: DRG1 and DRG2. DRG1 seems to be involved in many metastasis-associated signaling pathways, thereby altering angiogenesis and possibly colonization. Interestingly, DRG1 was first identified as a tumor suppressor in bladder and pancreatic cancers [36], whereas its metastasis suppressor activity was discovered by further research in breast, prostate, and colon cancer [37]. Homologs (either DRG1 or DRG2) have been found throughout metazoans [38]. In our survey, one to four DRG1 homologs were found in all analyzed genome assemblies.

RRM1

Ribonucleotide reductase catalytic subunit M1 (RRM1) encodes the large subunit of ribonucleotide reductase, which carries both the catalytic site and the allosteric regulatory sites, and has been described as a suppressor of metastasis in lung adenocarcinoma [39,40,41,42]. One to three homologs of this gene were present in all other analyzed genome assemblies, from unicellular holozoans to human, but no homolog was found in the genome assembly of the lamprey P. marinus (Vertebrata/Cyclostomata).

KDM1A

Lysine demethylase 1A (KDM1A) functions as a metastasis suppressor in breast cancer, where it modulates TGFβ signaling and EMT [43]. In contrast, in some other tumors (ovarian, prostate, and colon cancer), its expression is associated with poor clinical outcomes [44]. A possible single origin of all KDM1 histone demethylase genes before the split of the major eukaryotic lineages has previously been suggested. The KDM1 genes have been conserved during evolution in both the number of homologs and domain structure, although a few duplication events were observed in plants [45]. Our analysis confirmed these findings in metazoans: one to three KDM1A homologs were present in all analyzed genome assemblies, from unicellular holozoans to human.

MAP2Ks and MAPK14

Mitogen-activated protein kinase kinases (MAP2Ks) are protein kinases that phosphorylate, and thereby activate, mitogen-activated protein kinases (MAPKs). MAP2K4 is a dual-specificity kinase that suppresses metastasis in prostate and ovarian carcinomas [46], whereas it has the opposite effect in breast and pancreatic cancer cell lines [47]. MAP2K7, MAP2K6, and MAPK14 have been found to suppress metastasis in prostate and ovarian cancer [48, 49]. Furthermore, it has recently been reported that activation of MAPK14 signaling in breast cancer cells plays an important role in repressing tumor metastasis [50]. As MAP2Ks and MAPKs are involved in many crucial cellular events, such as cell cycle progression and growth arrest, their involvement in metastasis suppression is not surprising. Two to eight MAP2K homologs were found in all genome assemblies, from the closest unicellular relatives of animals to human. Homologs of MAPK14 were also found in all analyzed genome assemblies, but their number was much higher (15 to 49).

DLC1

DLC1 (deleted in liver cancer 1), a Rho GTPase-activating protein, was identified as a metastasis suppressor in breast cancer using microarray-based transcriptional profiling of cell lines with different metastatic potential [51]. The mechanism of its action is not yet fully elucidated, but it appears to involve the regulation of Rho GTPase activity [52]. DLC1 homologs were present in all genome assemblies analyzed.

CD82

The CD82 molecule is a glycoprotein and a member of the tetraspanin superfamily. It has been found to inhibit cancer cell migration and invasion [53] and is frequently downregulated in human tumor cell lines [54]. Tetraspanins possess four transmembrane domains [55] and are found in evolutionarily distant taxa such as animals, protists, plants, and fungi [56, 57]. Our analysis confirms these findings: all analyzed animal genome assemblies contained a large number (13 to 41) of CD82 homologs. The gene was absent from the choanoflagellate M. brevicollis but present in the filasterean C. owczarzaki.

KLF17

Krüppel-like factor 17 (KLF17) belongs to the Krüppel-like factor family of highly conserved zinc finger transcription factors, which are critical regulators of essential cellular processes, including proliferation, differentiation, apoptosis, and migration [58]. KLF17 expression has been shown to be significantly downregulated in primary human breast cancer samples, and the combined expression patterns of KLF17 and ID1 (inhibitor of DNA binding 1) can serve as a potential biomarker for lymph node metastasis in breast cancer [59]. KLF homologs were present in the genome assemblies of all organisms we analyzed.

HUNK

Hormonally upregulated Neu-associated kinase (HUNK) was identified as a breast cancer metastasis suppressor that blocks actin polymerization, which leads to reduced cell motility [60]. A large number of HUNK homologs (mostly between 20 and 35) were present in the genome assemblies of all organisms we studied.

GSN

Gelsolin was identified as a metastasis suppressor gene in B16-BL6 mouse melanoma cells [61]. GSN binds actin and consequently changes actin cytoskeletal organization [62], but its role in cancer is controversial. It has been described as a metastasis suppressor in breast, bladder, and gastric carcinoma [63,64,65] but also as a marker of unfavorable prognosis for colorectal cancer patients [66]. We found two to 14 gelsolin homologs in all the genome assemblies we studied.

CSTA

Cystatin A was found to suppress metastasis formation in human esophageal squamous cell carcinoma and murine mammary carcinomas [67]. Cystatin A is an endogenous inhibitor of cathepsin B, and the balance between the two molecules regulates invasiveness in tumors. We identified CSTA homologs in the unicellular relatives of animals, but not in the placozoan Trichoplax adhaerens, the nematode C. elegans, the fruit fly D. melanogaster, the sea squirt Ciona intestinalis (Urochordata), or the lamprey P. marinus (Vertebrata/Cyclostomata).

Metastasis suppressors that appeared in the early evolution of animals

A number of metastasis suppressor genes appear to have emerged with the origin of animals: we identified their homologs in simple non-bilaterians, but not in the closest unicellular relatives of animals (Fig. 2). Although the biochemical functions of these proteins have not yet been fully elucidated, most of them appear to be involved in cell–cell communication and cell cycle control (Table 1).

PEBP1

The mechanism by which phosphatidylethanolamine binding protein 1 (PEBP1) exerts its metastasis suppressor role is not yet clear, but it is known to interfere with the Raf/MEK/ERK signaling pathway involved in metastasis formation [68]. PEBP1 acts as a metastasis suppressor in several cancer model systems [68, 69]. We detected PEBP1 homologs in the non-bilaterians Amphimedon queenslandica (Porifera) and Nematostella vectensis (Cnidaria), but our analysis did not reveal homologs in T. adhaerens (Placozoa) or M. leidyi (Ctenophora). This suggests that PEBP1 might have appeared early in animal evolution and was subsequently lost in some early-branching lineages, or that some of the genomic information from these lineages is incomplete.

TIMPs

Tissue inhibitors of metalloproteinases (TIMPs) balance the activity of matrix metalloproteinases, enzymes that digest the extracellular matrix during invasion and penetration into the vascular system [70]. Therefore, TIMPs are considered to have metastasis suppressor potential [71,72,73]. The human genome assembly contains four TIMP paralogs (TIMP-1, TIMP-2, TIMP-3, and TIMP-4), which together inhibit all known matrix metalloproteinases and several members of the ADAMTS (a disintegrin and metalloproteinase with thrombospondin motifs) family of proteinases [74]. Most vertebrates possess at least one TIMP homolog [75]. Invertebrate TIMPs share lower sequence similarity with human TIMPs [76]. We found a single TIMP homolog in the genome assembly of D. melanogaster, as previously reported [77]. We did not find TIMP homologs in the genome assemblies of Hemichordata (Saccoglossus kowalevskii), Nematoda (C. elegans), Annelida (Capitella teleta), or Platyhelminthes (Schistosoma mansoni). Among the four phyla of early-branching non-bilaterian metazoans, Porifera (A. queenslandica) and Ctenophora (M. leidyi) do not possess TIMP homologs, whereas the Placozoa (T. adhaerens) genome assembly contains one. Within Cnidaria, the genome assembly of N. vectensis has four TIMP genes, while that of Hydra vulgaris has none. Our results indicate that the TIMP family originated during early animal evolution, before the appearance of Bilateria. If the above genome assemblies are complete, TIMP genes might have been lost in some invertebrate lineages and undergone independent duplications in others.

CAV1

Caveolin-1 (CAV1) has been described as a tumor suppressor [78, 79], but it has also been shown to reduce metastasis in other tumor models [80,81,82]. The mechanism behind its metastasis suppressor activity is still unresolved, but it is probably linked to its role in caveolae function and in receiving signals from the local microenvironment [7]. A CAV1 homolog was absent from the A. queenslandica (Porifera), H. vulgaris (Cnidaria), M. leidyi (Ctenophora), and S. mansoni (Platyhelminthes) genome assemblies, but present in the N. vectensis (Cnidaria) and T. adhaerens (Placozoa) genome assemblies. CAV1 was also missing from the D. melanogaster genomic data. According to our results, CAV1 probably emerged before the separation of placozoans from Eumetazoa. All other analyzed species possessed one to six CAV1 homologs, with the exception of the Pacific oyster C. gigas (Mollusca), whose large number of CAV1 homologs (21) might be the result of assembly or annotation errors.

MTBP-MDM2

MDM2-binding protein (MTBP) is an interacting partner of MDM2. Previous research [83] determined that MTBP functions as a metastasis suppressor in an osteosarcoma model system. Our analysis places the origin of MTBP early in animal history, before the separation of cnidarians and ctenophores. We did not find MTBP homologs in the genome assemblies of the flatworm S. mansoni (Platyhelminthes), the nematode C. elegans, or the arthropods D. melanogaster and S. mimosarum.

GPR68

G protein-coupled receptor 68 (GPR68) is a metastasis suppressor in prostate cancer [84]. The proposed mechanism of its action is the inhibition of cell migration and transendothelial migration through increased expression of Gαi1 (guanine nucleotide-binding protein G(i) subunit alpha-1) [85]. GPR68 probably appeared early in animal evolution, as it was present in all analyzed organisms except the sponge A. queenslandica and the close unicellular relatives of animals. A large number of homologs, up to almost 200, were present in the genome assemblies of chordates, whereas other animals usually had up to 50 homologs.

NR1H4

Nuclear receptor subfamily 1 group H member 4 (NR1H4), a member of the nuclear hormone receptor superfamily, is predominantly expressed in tissues exposed to high levels of bile acids and has recently been designated as a metastasis suppressor [86]. An NR1H4 homolog was probably present in the common ancestor of all metazoans. A large number of its homologs (usually more than 20 and sometimes more than 100) were present in all animal genome assemblies we analyzed. The genome assemblies of the closest unicellular relatives of animals did not possess NR1H4 homologs.

CASP8

Loss of caspase-8 enhances the migration potential of neuroblastoma cells and drives the tumor towards malignancy [87]. Caspases are cysteine-dependent aspartate-directed proteases, well known for their critical role in programmed cell death. Caspase-8 is specifically involved in the extrinsic apoptotic signaling pathway [88], and its absence seems to provide a survival advantage to metastatic cells [44]. Neither apoptosis nor true caspases have been found in protists, fungi, or plants [89]. According to our analysis, CASP8 homologs were present in the genome assemblies of all Metazoa, but not in their close unicellular relatives.

DCC

Loss of heterozygosity at DCC (netrin 1 receptor), or loss of its expression, has been reported in many advanced-stage tumors (ovarian, breast, colorectal, pancreatic, and others), which implicates it as a metastasis suppressor gene [44, 90]. DCC homologs were found in the genome assemblies of all organisms that we analyzed, except the choanoflagellate M. brevicollis and the filasterean C. owczarzaki.

Metastasis suppressors that are a chordate or vertebrate innovation

Several metastasis suppressor genes appeared with the origin of vertebrates or during the early vertebrate radiation. Their homologs are present in all or most vertebrate genome assemblies that we analyzed (Table 2; Fig. 1) and were generally absent from the genome assemblies of invertebrate animals and their closest unicellular relatives (Fig. 2). The only metastasis suppressor whose origins could clearly be traced back to the origin of chordates is E-cadherin (CDH1).

CADM1

Cell adhesion molecule 1 (CADM1), which belongs to the immunoglobulin superfamily, has a role in cell–cell adhesion and is responsible for the adhesive properties of human epithelial cells [91]. Its loss is associated with poor prognosis in breast cancer patients [92]. CADM1 expression is downregulated by hypermethylation of its promoter, which in turn leads to the EMT phenotype [93,94,95]. A CADM1 homolog was present in the last common ancestor of vertebrates, and the putative homolog we found in the genome assembly of S. kowalevskii (Hemichordata) could indicate a more ancient origin.

GAS1

Growth arrest specific 1 (GAS1) was first identified as a metastasis suppressor in a genome-wide shRNA screen in B16-F10 melanoma cells [96]. It appears to exert its suppressor activity by regulating apoptosis via caspases 3 and 9 [97]. GAS1 is most probably a vertebrate innovation, although our analysis unexpectedly revealed one homolog each in the cnidarian N. vectensis and the nematode C. elegans. The N. vectensis candidate homolog has a considerably shorter protein product (135 aa, compared with 200–384 aa in vertebrates); it may contain only part of the vertebrate protein, or be a truncated gene model resulting from misassembly or misannotation. The C. elegans protein is 228 aa long, within the vertebrate range, and is most likely a true homolog.

CD44

The CD44 molecule has a dual role in tumor development, acting both as a tumor promoter and as a metastasis suppressor [98, 99]. This might be due to the enormous complexity of CD44's mechanisms of action, which are mediated by posttranslational modifications and involve multiple physiological processes in the cell [100, 101] that are not yet understood. Vertebrate genome assemblies almost always contain a single CD44 homolog.

AKAP12

AKAP12 is a scaffolding protein that affects multiple steps of the metastatic cascade in prostate cancer [102] and melanoma cells [103]. Our analysis suggests that a homolog of the metastasis suppressor AKAP12 was likely present in the last common ancestor of vertebrates.

LIFR

Leukemia inhibitory factor receptor alpha (LIFR) has been described as a metastasis suppressor in breast cancer [104], acting as a downstream target of miR-9, a metastasis promoter in breast cancer cells. LIFR homologs were present in the common ancestor of bony fishes and tetrapods (amphibians, reptiles including birds, and mammals).

KISS1

KISS1 was characterized as a metastasis suppressor gene/protein in 1996 [105]. The KISS1 gene product, kisspeptin, is a 145 aa precursor that is further processed into shorter, biologically active peptides. One of them, metastin, binds to the G protein-coupled receptor GPR54 (also known as KISS1 receptor, KISS1R) and is believed to be responsible for metastasis suppression [106]. KISS1 has previously been shown to be missing from the genomes of birds [107]. We confirmed this result and found that it is also absent from the genome assembly of the anole lizard, Anolis carolinensis. The distribution of homologs in the vertebrate genome assemblies that we analyzed indicates that KISS1 appeared in the common ancestor of tetrapods and was subsequently lost in the common ancestor of sauropsids (extant reptiles, including birds).

CDHs

Metazoans developed three major types of cellular junctions that are typically present in vertebrate epithelial tissues. One of them, the adherens junction, seems to be present in all metazoan lineages and is considered critical for the maintenance of the tissue architecture of multicellular organisms [108]. Adherens junctions are composed primarily of Type I cadherins, transmembrane glycoproteins that form homotypic complexes. Loss of cadherins (CDHs) during EMT enables cancer cells to detach from the original tissue and initiate the metastatic process. It is widely accepted that E-cadherin specifically is the key molecule at the onset of metastasis formation [109]. Cadherins and cadherin-related proteins are found throughout the animal kingdom and also in choanoflagellates, the closest unicellular relatives of animals [110]. However, CDH1, CDH2 (Type I), and CDH11 (Type II) are the only cadherins known to be involved in metastasis suppression [111]. Our analysis identified four homologs in the genome assembly of the urochordate C. intestinalis and a large number of homologs in vertebrate genome assemblies. We also detected homologs of Type I and Type II cadherins in the purple sea urchin Strongylocentrotus purpuratus and the spider S. mimosarum, which suggests a possibly more ancient origin.

Metastasis-suppressor genes have diverse evolutionary histories

Our bioinformatics analysis showed that a number of metastasis suppressors (for example PEBP1, RRM1, CSTA and ARAHGDIB) are unexpectedly missing from the genome assemblies of some animals (Fig. 2). In the sea lamprey P. marinus (Vertebrata/cyclostomata), this phenomenon is pronounced, and could be a consequence of drastic rearrangements during early embryogenesis of the lamprey P. marinus genome in which about 20% of the germline DNA from somatic tissues is shed, and potentially includes the genes we queried. It might also be a technical consequence due to the fact that the lamprey genome is highly repetitive and in parts has very high GC content which makes it difficult to sequence and assemble the genome [112, 113]. In general, not all genomes are equally well sequenced, assembled, annotated or studied. The absence of some metastasis suppressors from the genome assembly of the Pacific oyster C. gigas, spider S. mimosarum, hemichordate S. kowalevskii and other genomes with lower quality assemblies could easily be a result of incomplete genomic data. On the other hand, the absence of a gene from a genome assembly could also be due to true gene loss in specific lineages. It is known that accelerated evolution and gene loss are prominent in some animal lineages such as those leading to D. melanogaster and C. elegans [114]. Our results showed that some metastasis suppressor families in either or both of those lineages went through the same processes (MTBP, TIMPs, CAV1, BRMS1, and CSTA) (Fig. 2). Gene loss has to be taken into account while working on D. melanogaster and C. elegans model systems. For instance, until recently these organisms were considered to be appropriate models for studying apoptosis. On the basis of experiments on these organisms it was concluded that the extrinsic apoptotic pathway emerged on the level of vertebrates since C. elegans and D. melanogaster lack components required for this pathway. 
Surprisingly, recent findings on cnidarians [89, 115] have shown that both apoptotic pathways have ancient origins and were already present in the common ancestor of cnidarians and bilaterian animals, more than 550 million years ago [116]. All these findings underscore the necessity of taking evolutionary history into account when interpreting results obtained with model organisms.

The survey of the available literature, as well as our analysis, suggests that metastasis suppressors emerged at different periods in the evolution of life, with the majority grouped at three points, or peaks, of emergence: the origin of the eukaryotic cell, the emergence of multicellularity and the appearance of vertebrates. This is expected, because gene numbers and diversity increased with these major evolutionary events [19, 117, 118]. The most prominent period of emergence of metastasis suppressors seems to have occurred before the origin of animals. The appearance of numerous tumor-related genes at the level of unicellular eukaryotes might seem surprising. However, it becomes understandable as we discover that their physiological (as opposed to pathophysiological) functions are connected to core biological processes necessary for the maintenance of every living cell. Our investigation indicates that a large number of metastasis suppressors appeared with the emergence of multicellularity in the animal lineage. Although the biochemical functions of the proteins within this peak are far from fully elucidated, according to present knowledge most of them are involved in cell–cell communication and cell cycle control. Four of the eight suppressors that emerged in parallel with multicellularity (CASP8, CAV1, DCC and GPR68) are located at least partly in the membrane, and some have receptor activity (GPR68 and DCC), while several have a role in cell cycle control or apoptosis (CASP8, MTBP, NR1H4 and DCC). This, however, does not come as a surprise. Multicellular organization has clear advantages: it allows the specialization of cells for specific functions, the formation of tissues and organs, and a larger body size.
Aktipis and coworkers defined the key foundations of multicellularity as controlled proliferation, controlled cell death, division of labor, specialized systems for the transport of oxygen and nutrients, and maintenance of the extracellular environment [119]. A tumor can be defined as a disease in which individual cells attempt to “cheat” this highly organized system: tumor cells increase their own fitness but reduce the fitness of the whole organism [120]. It is presumed that, in order to fight tumors, multicellular organisms developed systems of communication and cell cycle control. The precise time when tumors became a threat in the history of the animal kingdom, and the incidence of tumors in animals living in their natural habitats, especially invertebrates, remain to be resolved. Although the data are scarce and no systematic research has been done in the field, there is evidence that tumors appeared in different lineages within the animal kingdom. Besides the well-studied diseases of mammals, neoplasms have been reported in invertebrate deuterostomes [119], in protostomes [121, 122] and even in simple non-bilaterian animals [123,124,125]. The most thoroughly described non-human tumors are from vertebrates, especially farm animals and pets [126, 127], as well as other animals kept in captivity [128]. Tumors have also been identified in invertebrate animal models (H. vulgaris, D. melanogaster), but these findings should be treated with caution, since laboratory breeding and culture conditions are far from those in natural habitats [125]. Therefore, it is questionable whether these organisms ever develop tumors in natural conditions. Indirect evidence of tumors in more distant phyla also comes from the fact that marine invertebrates produce active substances with antitumor activity against human tumor cells in culture [129].
However, it is not clear whether these substances are produced to protect the organism from potential carcinogens or for a completely different ecological or physiological purpose. Most genes and pathways implicated in human genetic diseases and in neoplasia development and progression are highly conserved throughout evolution and can be found in early-branching metazoans such as non-bilaterians, or even in unicellular eukaryotes [19, 28, 130,131,132,133,134,135,136,137,138,139]. This is probably because most human diseases arise from the misuse or distortion of basic cellular processes common to all living beings. This supports the idea that tumors are an ancient phenomenon [19]. However, at this point we can only speculate about the presence of tumors in the early evolution of animals. Although homologs of tumor-associated genes are present in invertebrates and even in unicellular relatives of animals, it is unclear whether the same genes relevant for neoplastic transformation in mammals are involved in invertebrate tumors and whether these diseases are homologous. Several studies imply that this is highly probable, since homologs of key cancer-related genes, such as TP53 and RAS, are involved in neoplastic formations in invertebrates [140,141,142,143]. Tumors in invertebrates seem rarely to be malignant [144], although exceptions to this rule have been described. For instance, marine bivalves (Mollusca) form malignant neoplasms [145], as do cnidarians [146]. For a detailed review of neoplasms detected so far across the eukaryotes, see Aktipis et al. 2017 [119]. In his comparative study of tumorigenesis and tumor immunity in invertebrates and non-mammalian vertebrates, J. Robert suggests that the progress of malignancy runs in parallel with the development of the immune system.
According to this theory, the increased capacity of the immune system of a highly complex organism to generate a strong and specific immune response results in the selection of a wide variety of more invasive tumors [144]. The highly effective vascular system found in vertebrates could also favor metastatic dissemination [147].

Our analysis suggests that the members of the third group, which emerged as vertebrate innovations (KISS1, LIFR, AKAP12, CD44, CDHs, GAS1), have only a few homologs, possibly because they appeared after the two rounds of whole-genome duplication that occurred at the origin of vertebrates [148]. The proteins encoded by these genes are membrane-bound or extracellular, which implies a role in cell–cell communication, adhesion, movement or other interactions with the microenvironment.

Although the proteins in the first two peaks serve as guards against cancer dissemination and therefore have a crucial function in maintaining an organism’s fitness, it is highly unlikely that they originally arose as metastasis suppressors, especially since metastatic tumors seem to be rare in non-vertebrate species [144]. It is more likely that their original biological function(s) were adapted in the course of evolution to fight the growing threat of malignancy. If we regard neoplasms as an inevitable side effect of multicellularity that developed in parallel with this type of organization, it is possible that some representatives of the third group emerged with this specific function, as guards against this severe and life-threatening side process. However, this is highly speculative and should be addressed by future experimental studies. The list of metastasis suppressors is likely to change in the future, either through the addition of new candidates or through the removal of those whose function was misinterpreted. At this point, we were unable to connect the evolutionary origin of specific suppressors with a specific step in the metastatic cascade. Hopefully this will become possible once the roles of metastasis suppressors are more firmly established in humans and other metazoans.