The Evolutionary History of The Orexin/Allatotropin GPCR Family: from Placozoa and Cnidaria to Vertebrata

Peptidic messengers constitute a highly diversified group of intercellular messengers widely distributed in nature that regulate a great number of physiological processes in Metazoa. Being crucial for life, it seems that they appeared in the ancestral group from which Metazoa evolved, and were highly conserved along the evolutionary process. Peptides act mainly through G-protein coupled receptors (GPCRs), a family of transmembrane molecules. GPCRs are also widely distributed in nature, being present in Metazoa, but also in Choanoflagellata and Fungi. Among GPCRs, the Allatotropin/Orexin (AT/Ox) family is particularly characterized by the presence of the DRW motif in the second intracellular loop (IC Loop 2), and seems to be present in Cnidaria, Placozoa and in Bilateria, suggesting that it was present in the common ancestor of Metazoa. To trace the evolutionary history of these GPCRs, we searched for corresponding sequences in public databases. Our results suggest that AT/Ox receptors were highly conserved along the evolutionary process, and that they are characterized by the presence of the E/DRWYAI motif at the IC Loop 2. Phylogenetic analyses show that the AT/Ox family of receptors reflects evolutionary relationships that agree with current phylogenetic understanding in Actinopterygii and Sauropsida, including the much-debated position of Testudines.


Results
The Allatotropin/orexin receptors ancestral signature. As described above, GPCRs are characterized by the presence of the E/DR motif associated to the TMIII (i.e. IC Loop 2). Based on fully characterized AT and Ox receptors, we searched GenBank for sequences in all animal phyla. After the analysis of 392 complete sequences (including the N-terminal, the C-terminal and 7 TM domains), we found that the motif E/DRWYAI in the IC Loop 2 can be tracked from Chordata and Arthropoda to Cnidaria and Placozoa. The most frequent motif is DRWYAI, present in 374 sequences, including the ancestral species T. adhaerens (Placozoa) (Table 1; Supporting Information File 1). Of the remaining eighteen sequences, seven exhibit only one conservative change, presenting ERWYAI; they correspond to phyla pertaining to Lophotrochozoa (i.e. Mollusca, Brachiopoda and Annelida). A comparison of the codons coding for aspartic acid (D) and glutamic acid (E) shows that a point mutation at the third position of the codon would be responsible for this conservative change. A particular situation is presented in H. vulgaris (Cnidaria: Hydrozoa), in which the tyrosine (Y) residue is substituted by asparagine (N); this is the only sequence analysed showing this configuration (i.e. ERWNAI). A point mutation at the first position of the codon would be responsible, and it has previously been proposed to be a sequence artefact 3.
www.nature.com/scientificreports
Predicted sequences and general relationships between the animal phyla. The multiple sequence alignment also shows that at least two regions of the AT/Ox receptor were highly conserved: one comprising the third transmembrane domain and its associated intracellular loop, and a second one comprising the TMVII (Fig. 1).
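The codon argument made for the D/E and Y/N substitutions in the signature subsection can be checked directly against the standard genetic code. The following is a minimal sketch, not part of the original analysis; the helper name `single_step_positions` is hypothetical, and only the codons for the four residues discussed are included.

```python
from itertools import product

# Standard genetic code codons for the four residues discussed in the text.
CODONS = {
    "D": ["GAT", "GAC"],  # aspartic acid
    "E": ["GAA", "GAG"],  # glutamic acid
    "Y": ["TAT", "TAC"],  # tyrosine
    "N": ["AAT", "AAC"],  # asparagine
}

def single_step_positions(src, dst):
    """Return the codon positions (1-3) at which a single nucleotide
    change turns a codon for `src` into a codon for `dst`."""
    positions = set()
    for a, b in product(CODONS[src], CODONS[dst]):
        diffs = [i + 1 for i in range(3) if a[i] != b[i]]
        if len(diffs) == 1:  # keep only one-step (point) mutations
            positions.add(diffs[0])
    return positions

print(single_step_positions("D", "E"))  # {3}
print(single_step_positions("Y", "N"))  # {1}
```

Under these codon assignments, every one-step path between D and E involves the third codon position, and every one-step path between Y and N involves the first, in line with the statements above.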
As a first approach to understand the relationships between all the sequences analysed, a Neighbour-Joining analysis was performed (Fig. 2). The analysis shows that, as might be predicted, Placozoa (two sequences) and Cnidaria (three sequences pertaining to two different species of Anthozoa) cluster together, sharing a common ancestor. Interestingly, the only sequence fitting the characteristics of the AT/Ox family of GPCRs in Hydra vulgaris (Cnidaria: Hydrozoa) clusters alone as the sister group of Bilateria (Fig. 2).
Although several genomes of the phylum Nematoda are fully sequenced, none of the GPCR sequences found in GenBank showed the DRWY motif, suggesting that the AT/Ox system is not present in this phylum. A similar situation was found for the other two groups of Metazoa with uncertain positions, Porifera and Ctenophora.
Mammals are the only group of organisms in which the existence of two different kinds of receptors has been proved (i.e. Type 1 and Type 2), suggesting that the presence of these two receptors constitutes a synapomorphy of this group. Interestingly, Lepisosteus oculatus, pertaining to the Lepisosteiformes (with only six extant species), which together with Halecostomi represent the extant groups of Neopterygii, also presents two sequences sharing the same clade with the Type 1 receptor of Mammals. A more detailed analysis performed with ML methodology (see Fig. 3) also shows that these two sequences fit in the same clade, suggesting that the Type 1 Ox receptor appeared at least twice along the evolutionary history of Vertebrata (Fig. 2).
Finally, the three best represented groups (i.e. Arthropoda, and the Type 1 and 2 receptors of Vertebrata) can be recognized at least by a highly conserved motif at the interface between TMIII and the second intracellular loop (see Table 2 and Supporting Information File 1).
Evolutionary history of orexin receptors in vertebrates. As previously stated, there exist two types of receptors in Vertebrata (i.e. Type 1 and Type 2). A ML analysis clearly divides the groups analysed into two clades based on the Type 1 and Type 2 characteristics (Fig. 3). As described above, two out of three sequences predicted for L. oculatus are grouped in the same clade as the Type 1 receptor of Mammals. The other one (accession number XP_006638920) is grouped as a Type 2 receptor in the Actinopterygii clade (Fig. 3).
Regarding the Type 2 group, although the Sarcopterygii are not grouped as a clade (Coelacanthimorpha, Amphibia and the rest of Tetrapoda share a common ancestor with Actinopterygii and Chondrichthyes), the best represented groups (i.e. Mammals, Actinopterygii and Sauropsida) are well defined as monophyletic groups (Fig. 3).
Sauropsida. As a first attempt to further understand the evolutionary history of the Ox receptor family, we decided to go deeper in the analysis of two groups of vertebrates well represented in our sample, Sauropsida and Actinopterygii, looking also for signature motifs for every group analysed. In fact, after a detailed analysis of the alignments for each group, we could find signature motifs that, once blasted in GenBank, pointed specifically to most of the groups under study (Table 2).
[Figure caption fragment] The cut-off value of replicate trees in which the associated taxa clustered together after a bootstrap test (1000 replicates) was 50%. Note that both kinds of orexin receptors (Type 1 and Type 2) group independently. Type 2 receptor is present in all the groups of vertebrates included in the analysis. Type 1 receptor is only present in mammals with the exception of Lepisosteiformes (Actinopterygii: Neopterygii), suggesting that this kind of receptor could have appeared more than once along the evolution of Vertebrata.
A ML analysis of Sauropsida shows two well supported clades: one formed by Lepidosauromorpha species, including those corresponding to Iguania and Serpentes, traditionally grouped in the order Squamata, and a second one formed by Archosauria and Testudines (Fig. 4). Regarding Squamata, sequences in the TMV-IC Loop 3 region seem to be characteristic, with Serpentes and Iguania showing the motifs APLCLMVLAYLQIFQKLWCQQ and YMAPLCLMVLAYLQIFQKLWC, respectively (Table 2).
As would be expected, Archosauria presents a well-defined phylogenetic pattern involving Crocodylomorpha, with species representing the three extant groups (i.e. Gavialoidea, Alligatoroidea and Crocodyloidea), and Aves. The clade including Crocodylomorpha seems to be characterized by four different signature motifs: two located at the N-terminal domain, one corresponding to the C-terminal, and a fourth one in the IC Loop 3 (see Table 2). With respect to birds, a sequence located at the interface between the N-terminal and TMI would act as a signature (Table 2).
The clade corresponding to Aves, currently accepted as members of Coelurosauria (Dinosauria: Saurischia), shows the fully conserved sequence YEWALIAGYIVVFIVA at the N-terminal-TMI interface (Table 2). With respect to the phylogenetic relationships, the main groups are represented and correctly grouped, including Paleognathae (Tinamus guttatus) and Neognathae, the latter forming two well supported clades, Galloanserae and Neoaves (Fig. 4). Moreover, the two groups of Galloanserae are represented by four species pertaining to different genera, grouped in the expected clades: Anser cygnoides and Anas platyrhynchos (Anseriformes), and Coturnix japonica and Gallus gallus (Galliformes), form two monophyletic groups.
Regarding the Neoaves, only two currently recognized orders, Psittaciformes (represented by two species) and Passeriformes, are well defined (Fig. 4). Passeriformes, represented by 16 sequences, would be recognized by the sequence TSNIDEAM at the C-terminal domain. Moreover, two families in this group, Pipridae and Paridae, would also be identified by signatures at the level of the C-terminal domain (Table 2).
The last point to analyse is the position of turtles, whose phylogenetic position has long been debated. Our analyses show the clade of Testudines, represented by species pertaining to three different families, as the sister group of Archosauria (Crocodylomorpha + Aves). Indeed, the sequence ASTESRKSLTTQISNFDN, corresponding to the C-terminal domain, identifies the Archosauria-Testudines clade (Fig. 4, Table 2).
Actinopterygii. Regarding Actinopterygii (represented by species corresponding only to Neopterygii), the ML analyses of the Type 2-like receptor present them as a well-supported clade, sharing a common ancestor with Chondrichthyes which are characterized by the presence of the ADYDDEFI motif at the level of the N-terminal (Fig. 5, Table 2). As expected, the sequence corresponding to the Type 2 receptor of Lepisosteiformes appears as the sister group of Halecostomi (Fig. 5). With respect to Halecostomi, only sequences corresponding to Teleostei were found; Amiiformes, one of the extant groups, is not represented in our samples. Teleostei, the most diversified group, comprises numerous species that can be grouped in 11 different clades (see tolweb.org for reference), 6 of which are represented here: Osteoglossomorpha, Ostariophysi, Clupeomorpha, Salmoniformes, Esociformes and Acanthomorpha (Fig. 5). Similarly to other studies, Osteoglossomorpha (Scleropages formosus) appears as the sister group of the clade that includes Ostarioclupeomorpha (Ostariophysi and Clupeomorpha) and Euteleostei (Protacanthopterygii and Neoteleostei).
The other two clades of Teleostei (i.e. Ostarioclupeomorpha and Euteleostei) share a common ancestor. The first one involves one Clupeomorpha species appearing as the sister group of Otophysi, which is well represented by three out of four recognized orders (Characiformes, Siluriformes and Cypriniformes) (Fig. 5). Indeed, Characiformes and Siluriformes are grouped in a clade, as expected from previous phylogenetic studies, which is the sister group of Cypriniformes. Regarding Euteleostei, the two main clades appear as sister groups. One of them, Protacanthopterygii (which could be characterized by the presence of the KFRAEFKA motif in the C-terminal), includes Esociformes (Esox lucius) and Salmoniformes. Salmoniformes are represented by three species of two different genera: Salmo salar, Oncorhynchus mykiss and O. kisutch. Moreover, the two species of the genus Oncorhynchus are recognized as a clade (Fig. 5). Regarding Salmoniformes, our analyses show the existence of 5 different motifs that might be considered as signatures (see Table 2).
With respect to Neoteleostei, a total of 28 sequences were analysed, all of them pertaining to the clade of Percomorpha (Acanthopterygii), corresponding to: Pleuronectiformes (2), Gasterosteiformes (1), Synbranchiformes (1), Tetraodontiformes (1), Beloniformes (1), Cyprinodontiformes (10) and 12 species corresponding to the non-monophyletic traditional "Perciformes". The members of two families traditionally considered as members of the order Perciformes, Pomacentridae (represented by two species) and Cichlidae (five species), are well grouped as individual clades. Indeed, the clade of Cichlidae, currently considered as the order Cichliformes 41, might be identified by three different motifs located at the N-terminal, C-terminal and IC Loop 3 (Fig. 5, Table 2). Finally, another well represented group is Cyprinodontiformes, characterized by the presence of the DNLSRLSDQ motif at the C-terminal domain, including 10 sequences corresponding to five different families, Rivulidae (2) and Poeciliidae (5 species) being the best represented. Interestingly, both of them are grouped as individual clades (Fig. 5), being characterized by the RTLRCSAQT (Rivulidae) and QRNWRTIQCS (Poeciliidae) motifs. Regarding Poeciliidae, two more motifs at the level of the N-terminal domain might be characteristic (Table 2).

Discussion
As is known, GPCRs are widely distributed in nature, being associated with the regulation of a great number of physiological mechanisms. As they are engaged in critical processes, it is not surprising that they appeared early in evolution and were conserved along the evolutionary process. Indeed, SWS1 (short-wavelength sensitive opsin), another member of the GPCR family of proteins, which is involved in light signal transduction, has proved to be a potential phylogenetic marker in Vertebrata, showing phylogenetic relationships congruent with the evolution of this group at both high and low taxonomic levels 42.
As we stated above, Allatotropin is a peptidic messenger originally characterized by its ability to stimulate the synthesis of Juvenile Hormones in the moth M. sexta 17, a highly derived function given that Juvenile Hormones are only present in insects. Originally characterized as a neuropeptide, it was also shown to be secreted by epithelial cell populations 23,25–28. AT has been shown to be pleiotropic, being associated with the regulation of multiple physiological processes such as digestive enzyme secretion and ion exchange regulation 20,21, as well as the immune response in mosquitoes 43. Moreover, it also acts as a myoregulatory peptide, modulating the visceral musculature at different levels of the gut, and as a cardioregulatory peptide 22–27,44. Regarding Ox peptides, they were originally characterized in mammals, being secreted by neurons located in the hypothalamus. Originally associated with feeding behaviour 35,36, they also act on sleep-wakefulness, being involved in the neurological disorder known as narcolepsy 45,46. Furthermore, Ox peptides are also related to mechanisms regulating the activity and differentiation of brown adipose tissue 47, a derived function given that this kind of tissue is only present in mammals 48. Moreover, although they were originally characterized as neuropeptides (i.e. secreted by neurons), Ox peptides, like AT, are also secreted by epithelial cell populations 49,50. Both families of peptides have proved to be present in other groups related to those in which they were originally characterized. Indeed, although its function was not analysed, the presence of AT in other groups of Arthropoda, such as Crustacea, Myriapoda and Chelicerata, has been suggested 51–53. The presence of Ox peptides in other groups of Vertebrata has also been proved. In fact, it was shown that Ox has an orexigenic effect on bullfrog larvae 54.
A similar effect was demonstrated in the goldfish Carassius auratus, in which Ox peptides stimulate both feeding behaviour and food intake 55. Although the activity as a feeding behaviour modulator might be absent in Sauropsida 56, it was proved that Ox acts as a sleep/wakefulness modulator in birds, playing an important role in the behaviour associated with vigilance 57. In amphibians, although no experiments on sleep/wakefulness activity have been performed, the distribution of the orexinergic fibers suggests that this function would be conserved 58. The same was proved in the zebrafish (Danio rerio), in which the overexpression of orexins induces an insomnia-like behaviour, promoting locomotion and inhibiting rest 59. Beyond the complex functions described above, Ox peptides have also been related to other functions, such as the modulation of visceral muscle activity. In fact, as for AT, the presence of Ox receptors in the gut and their activity as myoregulators of smooth muscle cells were also proved 39. Furthermore, it was recently shown that Ox peptides also act on cardiomyocytes, increasing the shortening of these cells in rats and humans 60.
Regarding AT, as we described above, the existence of AT-like peptides was also proposed in other groups of Protostomata. Moreover, treatment with AT induces muscle contraction at the level of the digestive system in Platyhelminthes 30. We have previously shown that GPCRs are present in a variety of Metazoa, including T. adhaerens, a multicellular organism pertaining to the neuron-less phylum Placozoa 1. Moreover, two predicted sequences exhibiting motifs that may be considered as signatures of the AT/Ox family of GPCRs are reported in this study. Furthermore, studies in our laboratory suggest that ATr would be present in Hydra sp. (Cnidaria: Hydrozoa) 1–3. In fact, these studies suggest the existence of an Allatotropin/Orexin homologous system that would act as a myoregulator, controlling the movements associated with the capture and digestion of prey 1–3. Beyond the multiplicity of processes regulated by AT and Ox peptides, some of them corresponding to derived functions, both peptides are involved in mechanisms controlling visceral muscle contractions from Cnidaria to Vertebrata, suggesting that this signalling system appeared associated with feeding in the common ancestor of Metazoa.
AT/Ox GPCRs are characterized by the presence of a tryptophan (W) instead of a tyrosine (Y) associated with the E/DR motif in the IC Loop 2 16. Our results show that the AT/Ox family of GPCRs may be defined by the presence of the E/DRWYAI motif, which is present in 381 out of 392 sequences analysed, covers most of the Metazoa phyla, and might be considered as a signature of the family. Interestingly, no convincing sequence showing this characteristic motif was found in either Ctenophora or Porifera. The lack of the AT/Ox family of GPCRs in those phyla might be a biological phenomenon, or perhaps an artefact. In fact, despite the great quantity of genomic and transcriptomic sequencing information available, it may be assumed that it is still incomplete. Moreover, the phylogenetic positions of Ctenophora and Porifera and their evolutionary relationships with the rest of the metazoan groups are still controversial 8,61. Furthermore, regarding GPCRs, it was already suggested that the Porifera Rhodopsin family has no orthologous relationship with the ones found in the rest of Metazoa 11.
Regarding Vertebrata, two different groups were found. Interestingly, they are not defined by their phylogenetic relationships, but by the kind of protein constituting the receptor (Type 1 and Type 2 receptor). One of these groups (i.e. Type 2) is represented in all the groups, including Chondrichthyes, Actinopterygii, Sauropsida and Mammalia, and might be defined by the presence of the CIAL/QDRWYAICHPL motif. On the other hand, with the exception of Lepisosteiformes (Actinopterygii: Neopterygii), the Type 1 receptor is exclusively expressed in Mammalia (defined by the FIALDRWYAICHPL motif). In fact, in Lepisosteiformes, three different sequences were found; two of them are grouped, in all the analyses performed, with the Type 1 receptor of mammals, showing also the FIALDRWYAICHPL motif at the interface between TMIII and the IC Loop 2. Beyond these two sequences, a third one (grouped as a Type 2 receptor) shows a phylogenetic position in accordance with the current view, as the sister group of Halecostomi. The existence of two kinds of Ox receptors might be considered a synapomorphy of Mammalia. The presence of the Type 1-like receptor in Lepisosteiformes would suggest that this receptor appeared more than once along the evolution of Vertebrata.
As a way to further understand the evolutionary history of this family of receptors, we decided to go deeper in the analysis of the phylogenetic relationships of the Type 2-like receptor in two groups of Vertebrata (Sauropsida and Actinopterygii). In both of them, our results show that the phylogenetic relationships of the sequences are mostly in agreement with current hypotheses about their phylogeny. As an example, a group of species of Neoteleostei (i.e. Oreochromis niloticus, Maylandia zebra, Neolamprologus brichardi, Haplochromis burtoni and Pundamilia nyererei), traditionally considered as the Cichlidae family pertaining to the order Perciformes (currently considered as polyphyletic), are still grouped as a clade, which in fact is now considered as the order Cichliformes 41. Another interesting point concerns the order Cyprinodontiformes. This group is represented by 10 species pertaining to five different families, which are well defined as independent groups; the two families represented by two or more species (i.e. Poeciliidae and Rivulidae) form monophyletic groups sharing a common ancestor with the rest of the species of the order. Indeed, these two families might be recognized by signatures located at the N-terminal and IC Loop 3.
Another interesting subject is the phylogeny of Sauropsida and the evolutionary position of turtles (Testudines). The phylogenetic position of turtles was long controversial, as they were traditionally considered as an order pertaining to the group Anapsida (having no temporal fenestrae in their skull). Traditional studies based on palaeontological and morphological characters positioned them as the only extant group of Anapsida, the sister group of Diapsida (a clade that includes Lepidosauromorpha and Archosauria). Based on both palaeontological and molecular phylogeny, the evolutionary relationships of Testudines were revisited, considering them either as the sister group of Lepidosauromorpha or as the sister group of Archosauria (Aves and Crocodylomorpha) (for a review see 62). The finding of a stem-turtle from the Middle Triassic finally positioned turtles as members of Diapsida 63,64. In agreement with previous molecular studies 65–68, our results, based on sequences pertaining to three different families, place Testudines as the sister group of Archosauria, sharing the ASTESRKSLTTQISNFDN motif at the C-terminal domain. Indeed, the existence of a new group including Testudines and Archosauria, named Archelosauria, was recently proposed 66.
Finally, our results show the existence of numerous motifs that might be considered as signatures for several of the groups analysed, making it possible in principle to test them as phylogenetic markers at both higher and lower taxonomic levels.
(2019) 9:10217 | https://doi.org/10.1038/s41598-019-46712-9

Methods
Data retrieval. Sequences corresponding to Vertebrata and Insecta AT/Ox GPCRs were searched in the protein database of the National Center for Biotechnology Information (NCBI) at https://www.ncbi.nlm.nih.gov/pubmed, and by protein BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome) on the basis of already annotated sequences in the Non-redundant protein sequences database. All the selected sequences were checked for the presence of the characteristic seven transmembrane domains using the TMHMM Server v. 2.0 (http://www.cbs.dtu.dk/services/TMHMM/). The presence of the E/DRW domain at the IC Loop 2 associated to TMIII was also verified. The sequences were then aligned using the Clustal Omega algorithm for multiple sequence alignment (http://www.ebi.ac.uk/Tools/msa/clustalo/) and further analysed with JalView 2.7 69. Only those sequences presenting the seven TMs and the E/DRW domain were included.
Sequence analysis and alignment. Based on the alignment of the full set of sequences, a search for motifs that might be considered as signatures of the AT/Ox family was performed. Once at least one probable signature was established, a search was done in different phyla, including Bilateria and non-bilaterian groups such as Cnidaria and Placozoa. Each sequence was analysed looking for both the seven-transmembrane-domain pattern and the E/DRW motif.
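The two inclusion criteria described above (seven TM domains and an E/DRW motif) can be expressed as a simple filter. The following is a minimal sketch under stated assumptions, not the procedure actually used: the `Candidate` record, the accession labels and the toy sequences are hypothetical, the helix count is assumed to come from an external predictor such as TMHMM, and the pattern does not check that the motif lies in the IC Loop 2, which the study verified separately.

```python
import re
from dataclasses import dataclass

# E/DRW motif: glutamate or aspartate, followed by arginine and tryptophan.
EDRW = re.compile(r"[ED]RW")

@dataclass
class Candidate:
    accession: str   # hypothetical identifier, for illustration only
    sequence: str    # protein sequence in one-letter code
    tm_helices: int  # helix count taken from an external TM predictor

def passes_filter(c: Candidate) -> bool:
    """Keep a candidate only if it has seven TM helices and an E/DRW motif."""
    return c.tm_helices == 7 and EDRW.search(c.sequence) is not None

# Toy sequences (not real receptors) built around motifs quoted in the text.
candidates = [
    Candidate("toy_1", "MKTAIALDRWYAICHPLKF", 7),
    Candidate("toy_2", "MKTAIALERWYAICHPLKF", 7),
    Candidate("toy_3", "MKTAIALDRYYAICHPLKF", 7),  # DRY instead of DRW
    Candidate("toy_4", "MKTAIALDRWYAICHPLKF", 6),  # too few TM helices
]

kept = [c.accession for c in candidates if passes_filter(c)]
print(kept)  # ['toy_1', 'toy_2']
```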
The phyla in which probable GPCRs associated to the AT/Ox family were found are: Placozoa, Cnidaria, Arthropoda, Mollusca, Annelida, Brachiopoda and Chordata (see Supporting Information File 1).
Phylogenetic analysis. Finally, the analysis of evolutionary relationships between sequences, except for the one corresponding to Fig. 2 (Neighbor-Joining), was performed using the ML method based on the Poisson correction model, including a bootstrap analysis with 1000 replicates and a 50% cut-off for the condensed tree, using Mega 6.06 software 70. The trees were then edited using FigTree software (http://tree.bio.ed.ac.uk/software/figtree/).
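As background for the Poisson correction model mentioned above: it treats all amino acid substitutions as equally likely, and in its pairwise form it corrects the observed proportion of differing sites p for multiple hits as d = -ln(1 - p). The sketch below illustrates only this pairwise distance; it is not the ML tree search performed in Mega, and the function names and the toy aligned fragments are assumptions made for illustration.

```python
import math

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        return math.inf  # fully saturated: the correction is undefined
    return -math.log(1.0 - p)

# Toy aligned fragments: the two motif variants differ at one of six sites.
p = p_distance("DRWYAI", "ERWYAI")
d = poisson_distance("DRWYAI", "ERWYAI")
print(round(p, 4), round(d, 4))
```

The corrected distance is always at least as large as p, and the two diverge as sequences become more dissimilar.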
The basic evolutionary relationships between groups are referred to the Tree of Life web Project (http://tolweb.org/tree/) 71.
Search for signatures. Once the alignments were performed, we looked manually for conserved motifs in different groups. The putative signatures were then blasted (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Only motifs for which the hits covered the total length of the blasted query with 100% identity were selected as putative signatures.
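The signature criterion can be illustrated on a local set of sequences, where an exact substring match stands in for a BLAST hit covering the full query at 100% identity. This is a minimal sketch under that assumption, not the BLAST-based search described above: the function `is_signature` and the toy sequences are hypothetical, and requiring the motif in every member of the group is an extra condition added for the example.

```python
def is_signature(motif, group_seqs, other_seqs):
    """True if `motif` occurs exactly in every sequence of the group
    and in no sequence outside it."""
    in_all_members = all(motif in s for s in group_seqs)
    absent_elsewhere = not any(motif in s for s in other_seqs)
    return in_all_members and absent_elsewhere

# Toy sequences (not real receptors) built around motifs quoted in the text.
group = ["MAAQTSNIDEAMLK", "GGTSNIDEAMPP"]
others = ["MAAQTSNIDEAVLK", "GGKFRAEFKAPP"]

print(is_signature("TSNIDEAM", group, others))  # True
print(is_signature("TSNIDE", group, others))    # False
```

In this toy set the shorter fragment fails because it also occurs outside the group, which is why the full-length, 100% identity requirement matters for specificity.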

Data Availability
All the sequences analysed are in the Supplementary File 1. The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.