Holothurians have a reduced GPCR and odorant receptor-like repertoire compared to other echinoderms

Sea cucumbers lack vision and rely on chemical sensing to reproduce and survive. However, how they recognize and respond to environmental cues remains unknown. Possible candidates are the odorant receptors (ORs), a diverse family of G protein-coupled receptors (GPCRs) involved in olfaction. The present study aimed at characterizing the chemosensory GPCRs in sea cucumbers. At least 246 distinct GPCRs, of which ca. 20% putative ORs, were found in a transcriptome assembly of putative chemosensory (tentacles, oral cavity, calcareous ring, and papillae/tegument) and reproductive (ovary and testis) tissues from Holothuria arguinensis (57 ORs) and in the Apostichopus japonicus genome (79 ORs). The sea cucumber ORs clustered with those of sea urchin and starfish into four main clades of gene expansions sharing a common ancestor and evolving under purifying selection. However, the sea cucumber ORs repertoire was the smallest among the echinoderms and the olfactory receptor signature motif LxxPxYxxxxxLxxxDxxxxxxxxP was better conserved in cluster OR-l1 which also had more members. ORs were expressed in tentacles, oral cavity, calcareous ring, and papillae/tegument, supporting their potential role in chemosensing. This study is the first comprehensive survey of chemosensory GPCRs in sea cucumbers, and provides the molecular basis to understand how they communicate.

All living organisms perceive and respond to chemical cues in their environment, which mediate a variety of activities such as feeding, predator avoidance, mating and social behaviours 1,2 . These cues can be detected over long and short distances, and include a large diversity of molecules ranging from amino acids and nucleic acids, to small volatile compounds, peptides and proteins (see review 3 ). To detect and discriminate chemical cues, animals have developed complex chemosensory organs, including the olfactory organs of vertebrates, which contain a large repertoire of chemosensory receptors 4 . Although well characterized in some animals [5][6][7] , chemosensory receptors remain largely undescribed in many metazoan lineages.
A large group of chemosensory receptors belong to the G protein-coupled receptors (GPCRs), one of the largest superfamilies of seven transmembrane domain receptors found in metazoans 8 . GPCRs convert extracellular stimuli, ranging from small molecules and photons to peptides and proteins, into intracellular biochemical signals via multiple signalling cascades (mostly cAMP and calcium secondary messengers) 9 . The large variety of ligands is reflected in the structural diversity of GPCRs which are classified into five main families based on their sequence similarity (GRAFS system): glutamate (G), rhodopsin (R), adhesion (A), frizzled (F) and secretin (S) 10 . Chemosensory functions have been associated with the glutamate-receptor family and the rhodopsin-type family 11,12 . The latter contains the largest number and the most diverse repertoire of GPCRs involved in vertebrate olfaction 8 .
Rhodopsin family members involved in vertebrate olfaction include, i) the odorant receptors (ORs) 13 , ii) the trace amine-associated receptors (TAARs) 11 and iii) the formyl peptide receptor-like proteins (FRPs) 14 . Type 1 and 2 vomeronasal receptors are also GPCRs involved mainly in vertebrate pheromone detection but poorly represented in teleost fishes [15][16][17] . Teleost fish possess olfactory receptors related to the rhodopsin and glutamate GPCRs involved in sensing of pheromones and other molecules such as amino acids [18][19][20] . ORs are the largest CCMAR -Centre of Marine Sciences, University of Algarve, Campus de Gambelas, 8005-139, Faro, Portugal. *email: nmarquet@gmail.com open GpcR transcripts and genes. The H. arguinensis transcriptome originated a total of 1,580 contigs with five, six and seven predicted transmembrane domains (TMs), of which 474 were retained as putative GPCRs. After elimination of duplicates, 246 were considered unique GPCRs and 236 were classified into the five main GRAFS families: glutamate (21), rhodopsin (141), adhesion (56), frizzled (3) and secretin (15) (Supplementary Table S1). Searches in the A. japonicus genome identified 310 GPCR genes, including 297 that were classified into the five GRAFS families (13 Glutamate, 231 Rhodopsin, 39 Adhesion, 1 Frizzled, 13 Secretin), suggesting that a similar number of receptors exists in the two sea cucumber species (Supplementary Table S2). Members of the vomeronasal and taste 2 receptors were not identified in either species.
The rhodopsin family was the largest and most represented, with more than 50% of the total GPCRs found both in the H. arguinensis transcriptome (141 transcripts) and A. japonicus genome (231 genes). The receptors within this family belonged to the four main groups represented in human (α, β, γ, δ), and the α group had the most numerous and diversified receptors in both species ( Post-QC reads  51,200,570  61,211,616  62,185,582 43,113,778  38,167,808  44,549,154  300,428,508   Total contigs  86,417  153,723  200,418  353,921  257,323  327,647  810,312   N50 (bp)  678  656  726  741  656  691  628   Average length (bp)  587  571  600  590  552 565 518  Tables S3 and S4). No hits were obtained against the profiles of other organisms. In total, 78 transcripts and 112 genes were retrieved as putative OR-like sequences for H. arguinensis and A. japonicus, respectively. Phylogenetic analysis revealed that most of the sea cucumber putative OR-like sequences cluster in proximity with the sea urchin and starfish OR-like sequences and originate four main receptor sequence clades (Fig. 1). Each of the four OR-like echinoderm clades was statistically supported by all the three branch support methods used (i.e. aLRT-Chi 2 , aBayes and SH-LRT) and were designated as OR-l1, OR-l2, OR-l3, OR-l4. The tree topology also suggested that the echinoderm OR-like groups shared common ancestry and that the clade OR-l1 was the first to diverge followed by OR-l2, OR-l3 and OR-l4 in subsequent duplication events. Gene expansions that resulted from species-specific events were also observed specially within the OR-l1 cluster (denoted as SC-1 in Fig. 1).   Fig. 1) with sequences from the sea urchin surreal-GPCRs group B and starfish OR-like group D. Clade OR-l2 contained 11 and 19 OR-like sequences from H. arguinensis and A. japonicus, respectively (SC-2 in Fig. 1), which clustered with the sea urchin surreal-GPCRs group C and the starfish OR-like groups H, I, J. The clades 3 and 4 contained the least number OR-like sequences: 6 from H. arguinensis and 8 from A. japonicus grouped with the sea urchin surreal-GPCRs group D (SC-3 in Fig. 1); and 8 from H. arguinensis and A. japonicus clustered with the starfish OR-like group F (SC-4 in Fig. 1). Figure 2 summarizes the number of GPCRs and their family members, including olfactory receptor in vertebrates and invertebrates.
The amino acid sequence similarity and identity between the H. arguinensis sequences within the same OR-like clade was on average 41% and 24%, respectively (Supplementary Table S5). When these sequences were Figure 1. Phylogeny of the sea cucumber OR-like candidates (H. arguinensis, 78 sequences and A. japonicus, 112 sequences) selected by the HMM analysis with OR-like/olfactory and non-olfactory rhodopsins from other echinoderms (A. planci, starfish and S. purpuratus, sea urchin), cnidaria (N. vectensis, sea anemone) and ORs from cephalochordates (B. floridae, amphioxus) and aquatic vertebrates (O. latipes, teleost fish). The ML tree was rooted with the chordates (amphioxus and teleost fish) cluster. The different OR-like/olfactory clusters are highlighted by a coloured line, corresponding to the respective group, around the tree. The four echinoderm OR-like clades are shaded in grey and numbered according to the cluster name: 1 is OR-l1, 2 is OR-l2, 3 is OR-l3 and 4 is OR-l4. The sea cucumber specific gene expansions are designated as SC-1, SC-2, SC-3 and SC-4. Branch support was represented only when at least one of the three methods used (aLRT-Chi 2 , aBayes and SH-LRT) had statistically significant supporting values. Tree branch symbol: full circle: three methods were significant; circle with a dot: two methods were significant and empty circle: one method was significant. SC: sea cucumber.
Genome mapping of oR-like genes. Analysis of the A. japonicus genome revealed that several OR-like receptor genes belonging to OR-l1 and OR-l2 were organized in tandem arrays (Fig. 3). However, no OR-like candidates from OR-l3 and OR-l4 were seen on the same genome scaffold.

Effect of selection on putative OR-like receptors.
To get an insight into the mechanisms that shape evolution of sea cucumber OR-like receptors, likelihood ratio analysis was carried out based on the ratios of non-synonymous (modifying amino acids) versus synonymous (non-modifying or silent) substitutions of the four OR-l clades. The likelihood ration test (LRT) for the branch (M0:M1) specific analysis revealed that for three (1, 3 and 4) out of the four OR groups, the data had a better fit to the free-ratio model (M1) where ω may vary between the branches indicating variable selective pressure regimes in each branch. The results for the site models (M7:M8) indicated that the four OR groups best fitted the M7 model which only allows for ω ≤ 1. Accordingly, the estimated ω values under the M8 model (Table 3) were significantly less than one, hereby indicating purifying (negative) selection among the different sites for each receptor clade (Table 3; Supplementary Note). echinoderm oR-like signature motifs. Several motifs have been found that discriminate ORs from non-ORs: LxxPxYxxxxxLxxxDxxxxxxxxP, MAxDRYxxxCxPLxY, KAxxTxxxH and PxxNPxxY (where x is any amino acid) which are conserved amongst vertebrates 28,29,50 . Analysis of the entire sea cucumber OR-like repertoire revealed that several amino acids were conserved in the LxxPxYxxxxxLxxxDxxxxxxxxP (intracellular loop, IL, 1)/ transmembrane domain, TM, 2), MAxDRYxxxCxPLxY (TM3/IL2) and PxxNPxxY (TM7) motifs ( Supplementary Fig. S1). However, the KAxxTxxxH motif (IL3/TM6) was poorly represented in sea cucumber ORs, where only the lysine (K) was conserved, as seen also in cnidaria.  47,48 , D. rerio 30,48 , T. rubripes 30,48 and B. floridae 29,86 . The cladogram corresponds to a species tree that was built using the ML method with concatenated sequence of four ORs per species. The values of the bootstrap are seen at the nodes of the trees. This species tree is in agreement with the generic tree defined in the Tree of Life 93 (http://tolweb.org). The percentage of OR-like Rhodopsin found within the Rhodopsin family is represented in light blue.
Four amino acid residues (alanine, A; arginine, R, tyrosine, Y; proline, P) within the MAxDRYxxxCxPLxY motif and three (asparagine, N; second proline, P; tyrosine, Y) within the PxxNPxxY motif were commonly found in sea cucumber ORs. Because these two motifs might not be specific to ORs due to the presence of the DRY and NPxxY residues that are characteristic to the rhodopsin-like family 34 , the LxxPxYxxxxxLxxxDxxxxxxxxP motif was choosen as OR target in sea cucumbers, as previously described for other echinoderms (starfish 34 and sea urchin 28 ). This motif was searched in each of the phylogenetic tree clades and a consensus motif sequence was determined for each of the four echinoderm OR-like clades. In amphioxus and sea anemone, the OR motif contained five of the six conserved residues, with the tyrosine (Y) typical of vertebrates OR motifs largely absent (Fig. 4). In the echinoderm OR-l1 clade, three residues including the two lysines (L) and the aspartic acid (D) were present (LxxxxxxxxxxLxxxD) while aspartic acid (D) was the dominant residue in OR-l2. In OR-l3 and OR-l4, the second lysine (L) and aspartic acid (D) were preserved over the six conserved residues (seen as LxxxD).
tissue expression of the oR-like candidates. Forty-eight of the 57 OR-like candidates selected through the combination of the HMM profiles and phylogenetic analysis were mapped in the six tissue transcriptomes of H. arguinensis (Supplementary Table S6). At least 75% of the 48 OR-like candidates identified were found to be present in the oral cavity (36) and the papillae/tegument (40) followed by the calcareous ring (21) and the tentacles (15), which represented 35% of the OR-like candidates mapped. None of the OR-like candidates were found in the testis and only four were mapped in the ovary (less than 10% of the OR-like candidates). The   www.nature.com/scientificreports www.nature.com/scientificreports/ calcareous ring, the papillae/tegument, and the oral cavity expressed members of all the four echinoderm OR-like candidate clades. However, OR-l3 members were not found in tentacles, and the ovary contained only a subset of OR-l1 members (Fig. 5). Most receptors within each cluster overlapped among tissues, with a few unique receptor transcripts identified in oral cavity and papillae/tegument. More OR-like transcripts were found in the oral cavity (113.02 transcripts per million, TPM) and papillae/tegument (109.88 TPM) followed by the calcareous ring (81.61 TPM), tentacles (74.84 TPM) and ovary (46.29 TPM).
Quantitative reverse-transcription polymerase chain reaction (qPCR) of 20 selected OR-like candidates confirmed generally higher levels of expression in the oral cavity and papillae/tegument and lower levels in the tentacles (Fig. 6). These receptors belonged to the four OR-like groups, and some (i.e. DN171647, DN18934, DN14406, DN113016) were 2-3 orders of magnitude more highly expressed in the oral cavity than in the tentacles (Fig. 6).

Discussion
A large diversity of GPCRs, of which 57 were considered OR-like, were identified in the H. arguinensis transcriptome and a slightly higher number (79 OR-like genes) were retrieved from the A. japonicus genome. These OR-like candidates were organized in four main clades that cluster with other echinoderm OR-like sequences from starfish and sea urchin, with each clade showing a characteristic signature motif. Most of these OR-like www.nature.com/scientificreports www.nature.com/scientificreports/ candidates were found in sensory tissues including tentacles, oral cavity, calcareous ring, and papillae/tegument, which is consistent with their potential involvement in chemical sensing.
The number of H. arguinensis and A. japonicus GPCRs (respectively 246 and 310 receptors) is of the same order of magnitude as reported for the hemichordate S. kowalevskii (260) 31 , the urochordate C. intestinalis (169) 51 and the demosponge Amphimedon queenslandica (220) 52 . However, this number is smaller than in other echinoderms which genomes possess more than 900 GPCRs such as the starfish A. planci 34 and sea urchin S. purpuratus 33 . This suggests that sea cucumbers have a more compact GPCR gene repertoire than starfish or sea urchin.
Odours are mostly detected by rhodopsin-like GPCRs 2 . However, no orthologues of the TAARs, FRPs and vomeronasal 1 and 2 receptors, characteristic of vertebrate olfaction, were identified in the sea cucumber. Neither were they found in other invertebrates such as starfish 35 , sea urchin 33 , sponge 52 or mollusc 27 . In contrast, based on HMM profiles and phylogeny, 57 H. arguinensis transcripts were considered OR-like and grouped in four OR-like clades with the starfish and sea urchin OR-like genes. Within these clades, there were large independent OR-like gene expansions that resulted from species-specific events. They were found mainly within the OR-l1 cluster, which is consistent with what has been described for the starfish and sea urchin ORs [33][34][35] . These lineage specific OR-like gene expansions could be linked to the rapid evolution and diversification of this receptor group as described in chordates [53][54][55][56] . They are also characterized by a relatively low percentage of sequence similarity and identity between the sea cucumber and other echinoderm OR-like sequences within each clade (ca. 22% of identity and 39% of similarity). The fact that sea cucumber OR-like receptor genes are mostly single exon and are organized in tandem arrays, as seen also in vertebrate ORs 6,13 and other invertebrates 27,33,35 , supports their olfactory receptor nature. Interestingly, the olfactory receptor signature sequence motif "LxxPxYxxxxxLxxxDxxxxxxxxP" was differently conserved across the four OR-like clusters and was most conserved in ORl1. Whether changes in receptor sequence are linked with their functional divergence deserves further investigation. www.nature.com/scientificreports www.nature.com/scientificreports/ Phylogenetic analysis revealed that the sea cucumber OR-like receptors emerged early in evolution and shared common ancestry with the other echinoderms. Evolution of the sea cucumber OR-like genes is likely to be the consequence of gene duplications that occurred earlier in the radiation of the echinoderms and to more recent duplications that independently occurred in Holothurians. Analysis of selection in OR-like receptors through the ratio of nonsynonymous to synonymous changes suggest that they are under purifying selection (ω < 1), which presumably maintains their function intact and reduces genetic variation. This is in agreement with several studies in vertebrates and Drosophila 57 , which are in general under weak purifying selection with no evidence for positive selection [58][59][60][61][62][63] . The extensive variation of OR genes among species, increasing rapidly in some species and undergoing mass pseudogenization in others suggests that olfactory ability is probably linked to the diversity of OR gene repertoires and their level of expression 53 .
In marine invertebrates, the OR repertoire seems to be variable between different taxonomic groups. No ORs have been found in urochordates, hemichordates and porifera 31,51,52 ; however, at least 300 OR-like were seen in starfish and sea urchin 33,34 . Compared to the rest of the echinoderms, sea cucumbers seem to have the smallest OR-like repertoire as shown in our transcriptome (57 transcripts) and genome (79 genes) analysis. The total number of OR-like transcripts in H. arguinensis could, nevertheless, have been underestimated due to our restrictive approach to select only GPCRs with at least five TMs and some GPCRs are known to be expressed only in larval stages 33,36 . Other sources of errors might have come from the HMM profiles generated, the transmembrane domain prediction or the phylogenetic clustering. Some putative ORs might also have been missed in our transcriptome due to assembly errors (formation of chimeras and fragmented contigs) even though this was limited by using the whole tissues transcriptome. However, only 22 additional OR-like genes were found in A. japonicus, suggesting that sea cucumbers possess fewer olfactory receptors than sea urchins and starfish. www.nature.com/scientificreports www.nature.com/scientificreports/ Moreover, the number of GPCRs and ORs obtained in S. purpuratus and T. rubripes with our methodology was similar, although slightly higher (and not smaller), to those found in previous studies. This increase is likely to be due to the integration in our analysis of more up-to-date models that incorporated recent data from other invertebrate OR-like sequences such as those from the cnidarian N. vectensis 28 and the phylogenetically related echinoderm A. planci 34 . Also, we have considered GPCRs containing 5 to 7 TMs while other studies, such as Raible, et al. (2006) 33 , included only 7 TMs sequences. It is also important to note that chemosensory receptors other than GPCRs may also exist in sea cucumbers. For example, putative variant ionotropic glutamate receptors were recently discovered in the starfish A. planci 64 .
The specific function of ORs in echinoderms is unknown as the vast majority of putative ORs are orphans. Also, it is not known if there are differences on how the different classes of echinoderms perceive the environment. Anatomically, sea urchins and starfish are the only ones to possess pedicellariae in addition to the tube feet that is common to all echinoderms. These appendages are mainly involved in defense mechanisms and are divided in four different types which are thought to harbour different chemoreceptors as they react to different chemical stimuli 65 . The OR-like transcripts identified were largely found in three tissues that are in direct contact with the environment -the tentacles, oral cavity and papillae/tegument -which have been previously described as sensory tissues 66,67 and qPCR of 20 OR-like confirmed that they are orders of magnitude more highly expressed in the oral cavity and papillae/tegument, supporting their potential role as chemosensory organs in sea cucumbers. This draws a parallel with the putative chemoreceptors found in body appendages, mouth and tegument in starfish 34,35 and sea urchin 33,36 . However, OR-like candidates were also found in sea cucumber internal tissues (46% of the total) such as the calcareous ring that is connected to the radial nerve cord, and in ovary. The presence of OR-like candidates has also been reported in testis, sperm, oocytes, radial nerves, stomach in other echinoderms [33][34][35]68 . These findings are not surprising as in vertebrates ORs have been found in non-chemosensory organs such as testes and sperm, lung, spleen, liver, heart and thyroid where they are thought to carry out diverse and unrelated functions, not specifically linked to sensorial perception, which include cell-cell communication, chemotaxis and tissue regeneration 69-71 . conclusion As sea cucumbers have limited visual abilities and no hearing; they rely more on chemosensation for detecting biotic and abiotic factors in their environment. The sea cucumber GPCR repertoire, of which 20% are OR-like, is extensive but less expanded than in other echinoderms. The OR-like receptors from sea cucumber grouped with other echinoderm receptors into four distinct clades, which suggests that each clade may be involved in different functions. These receptors are under purifying selection and receptors from the OR-l1 clade are the most conserved and potentially the most interesting candidates as pheromone receptors. They have a widespread distribution and are mostly expressed in tissues that are in direct contact with the external environment such as tentacles, oral cavity and papillae/tegument. Our results provide the first molecular basis of chemical sensing in sea cucumbers which is an essential step to the understanding on how these animals communicate. After arrival at the laboratory, they were anesthetized by immersion in MgCl 2 (5%) in sea water and tissue samples (tentacles, testis and ovary, papillae and tegument, oral cavity and calcareous ring, including nerve ring) were dissected and pooled from four individuals (two males and two females, except for gonads in which four individuals of each sex were pooled). All samples were frozen on dry ice and kept at −80 °C until RNA extraction.

RnA extraction and library preparation.
Total RNA was extracted using the Maxwell 16 total RNA purification kit (Promega, Madrid, Spain) according to the manufacturer's instructions. Pooled samples were homogenized using an Ultra-Turrax homogenizer (IKA T25, Staufen, Germany) and total RNA was precipitated with ethanol and quantified using a Nanodrop (1000 Spectrophotometer, Thermo Fisher Scientific, USA). Agarose gel electrophoresis (0.8%/1 × TAE: Tris-acetate-EDTA) was used to assess the RNA quality and integrity. RNA samples were subsequently treated twice with DNAse to remove any remaining genomic DNA using the Turbo DNA-free kit (Ambion, London, UK).
Sequencing library preparation and sequencing was conducted by Genenergy (Shanghai, China) using a Illumina TrueSeq mRNA-seq library Prep kit (RNA input 2 µg, insert size of 300-400 bps) and sequenced using the Illumina Hi-Seq. 1500 to generate 100 base paired-end reads (Bioproject Accession no.: PRJNA532556).
Sequence assembly. Quality control of raw reads and their respective editing was performed with Trimgalore wrapper script version 0.3.3 72 producing simple descriptive statistics and edited reads, before assembly. Tissue specific de novo assemblies were obtained using Trinity v. 2.0.6 (trinityrnaseq_r2012-05-18) with the default parameters 73 . The pair-end reads from each of the six tissue libraries were used to assemble tissue specific de novo transcriptomes. In order to increase the probability of retrieving low expressed transcripts, a whole tissues GpcR sequence annotation. The whole tissues de novo transcriptome from H. arguinensis was translated into protein using the TransDecoder v5.0.2 74 . The predicted proteins were annotated using TMHMM v2.0 75 to generate a sub-library containing transmembrane (TM) domains proteins. Only sequences with predicted five to seven TM domains were considered and were further annotated using BLASTP v2.7.1+ searches 76 against Swiss-Prot (version 2018 with 558,125 entries) and Pfam (version 32.0 with 17,929 entries) with a cut-off e-value ≤ 1e −5 . The H. arguinensis GPCRs were then classified into the five main GPCR families and categorized into subfamilies as described by Fredriksson, et al. 8 . As a comparison, the GPCR gene repertoire in the A. japonicus genome 42 were annotated the same way as for H. arguinensis using as database the predicted proteins of this species (30,221 sequences) available in NCBI (MRZV00000000.1; PRJNA354676).
Identification of the OR-like candidates: hidden markov models. To identify putative OR-like protein sequences, 23 OR and OR-like profile Hidden Markov Models (HMMs) were built using the hmmbuild of HMMER v.3.1.b2 77 according to the methodology used to identify OR-like in the starfish A. planci and described in Hall, et al. (2017) 34 . The HMM profiles were based upon the alignments of teleost fish ORs (Danio rerio, Oryzias latipes, Takifugu rubripes, Tetraodon nigroviridis, Gasterosteus aculeatus) 30 , amphioxus ORs (Branchiostoma floridae) 29 , marine gastropod mollusc chemoreceptors (Aplysia californica; groups a to c) 27 , cnidaria OR-like (Nematostella vectensis) 28 , sea urchin chemoreceptors (Strongylocentrotus purpuratus; surreal-GPCRs; groups A to F) 33 and starfish OR-like (A. planci; groups A to K) 34 . Only H. arguinensis and A. japonicus proteins that aligned to the HMMs using hmmsearch with an e-value ≤ 1e −5 were selected as OR-like candidates. To estimate the reliability of the methodology used here to identify GPCRs and putative ORs, the same procedure was applied to the sea urchin S. purpuratus and the teleost T. rubripes, and compared with the published GPCRs/ORs repertoire estimates for these species.

Identification of the OR-like candidates: Phylogenetic analysis.
To select the most likely sea cucumber (H. arguinensis and A. japonicus) OR-like candidates, a phylogenetic analysis was conducted with the sequences retrieved from the HMM profiles and the olfactory/OR-like rhodopsins and non-olfactory rhodopsins from the cnidarian sea anemone N. vectensis 28 and basal deuterostomes starfish A. planci 34 and the sea urchin S. purpuratus 33 ; the cephalochordate amphioxus B. floridae 29 and the vertebrate teleost fish O. latipes 30 . The final dataset used to build the tree consisted of an alignment performed using Aliview 78 , of 1626 sequences that was subsequently manually annotated to remove gaps and trimmed to conserved transmembrane domains. Due to the large number of sequences, TM domains were initially predicted, extracted and concatenated for ten sequences of each species and each group (i.e. olfactory/OR-like rhodopsins and non-olfactory rhodopsins) using SMART 79 which were then used as a reference to delete the non-TM regions from the remaining of the sequences in the alignment.
With the edited sequence alignment, a maximum-likelihood (ML) tree was built using PhyML v3.0 80 . The ML tree was constructed using a VT substitution model with a gamma shape (4 rate categories) of G = 1.808, as selected by the SMS (Smart Model Selection) 81 according to the Akaike Information Criterion (AIC) 82 . The tree was edited in Figtree 83 and the chordate cluster (amphioxus and fish) was used to root the tree. Branch support was estimated using three methods: two parametric methods, aLRT Chi 2 -based (approximate likelihood ratio test) and aBayes (approximate transformation Bayes test), and one non-parametric method SH-aLRT. The nodes were reported when at least one of the three methods showed significant branch supported values, defined as aBayes ≥ 0.95, aLRT Chi 2 -based ≥ 0.9 and SH-aLRT ≥ 0.85 84 . Sequence identity and similarity (only with the full length receptors) were determined using GeneDoc software 85 .
The percentage of each GPCR family and the percentage of OR-like Rhodopsin within the Rhodopsin family were compared between the sea cucumbers (H. arguinensis and A. japonicus; present study) and other vertebrates and invertebrates: Homo sapiens 47,48 , D. rerio 30,48 , T. rubripes 30,48 , B. floridae 29,86 , S. purpuratus 33,47 (ORs: surreal-GPCRs; groups A-F), A. planci 34 (ORs: groups A-K), and N. vectensis 28,47 . A species tree was built using as input the edited alignment obtained by concatenating the predicted protein sequence of four OR sequences per species using the ML method in the PhyML v3.0 program with 100 bootstrap replicates. The echinoderm OR sequences came from OR-l1 (2 sequences) and OR-l2 (2 sequences) as all species are represented in these clusters.

Genome mapping of oR-like candidates.
To determine if sea cucumber OR-like candidates were positioned in tandem in the genome, the A. japonicus OR-like candidates were searched (tBLASTn) against its own genome assembly and the position of OR genes were mapped.
Selection of putative oR-like receptors. The levels of functional constraint and functional divergence of OR-like receptors were analysed through the ratio of non-synonymous (dN) to synonymous (dS) substitutions (ω = dN/dS). Neutral evolution is defined when ω = 1, while ω > 1 and ω < 1 indicate positive (diversifying) and negative (purifying) selection, respectively 87 . The full-length nucleotide coding sequences of the four OR-l clades from the two sea cucumbers were aligned to obtain multiple codon alignments using PAL2NAL v14 88 . These alignments were used to build a phylogenetic tree for each OR-l clade using the ML method in MEGA 7 89 . The codon alignments and their respective phylogenetic trees were then used to calculate the codon substitution ratio (ω = dN/dS) with the CODEML package in PAML version 4.8 87 . The following branch (M0:M1) and site models (M7:M8) were used 87 following the methodology of Mondragón-Palomino, et al. 90 : the one ratio model (M0) and the free ratio model (M1), the beta model (M7) and the beta-ω model (M8) (see Supplementary Note for more information). www.nature.com/scientificreports www.nature.com/scientificreports/ echinoderm oR-like signature motifs. A WebLogo 91 using the entire sea cucumber OR repertoire was built to identify putative OR motifs and to highlight conserved amino acid residues that are commonly found in fish, cephalochordates and cnidaria ORs (Supplementary Fig. S1). Previous searches in echinoderms identified conserved amino acids from the OR motif LxxPxYxxxxxLxxxDxxxxxxxxP in sea urchins and starfish 28,34 , and this motif was used as target to identify OR-like motifs in our sea cucumber sequences. Another WebLogo 91 was created using the multisequence alignment of each receptor clade previously defined in the phylogenetic analysis and the conserved amino acids specific to the target motif were highlighted. tissue expression of oR-like candidates. The OR-like candidates identified in H. arguinensis were sought in the six individual tissue assemblies using tBLASTn. Sequences that produced hits with an e-value cut-off of < 1e-80, a sequence identity ≥ 97% and a sequence coverage ≥ 150 nucleotides between the query and the subject were considered to be similar. Venn diagrams were created using the web-based tool InteractiVenn 92 to analyse the distribution of the OR-like candidates from each cluster among the tissue libraries. The proportion of OR-like was estimated by dividing the number of each putative OR contigs by the total number of contigs in each tissue.
qPCR was used to confirm the expression of 20 genes from the four OR-like groups identified in tentacles, oral cavity and papillae/tegument of H. arguinensis. Total RNA (tRNA) was extracted from tissues from three adult  www.nature.com/scientificreports www.nature.com/scientificreports/ H. arguinensis using the E.Z.N.A kit (VWR, USA) according to the manufacture instructions. DNase I treatment was performed directly on the columns. Primer pairs specific for each transcript were designed using the NCBI primer BLAST and amplicon sizes were between 120 and 200 base pairs. cDNA was synthesised in 20 µl reactions containing 300 ng of DNase-treated tRNA, 200 ng of random hexamers (Jena Biosciences, Germany), 2 mM dNTPs, 100 U of RevertAid reverse transcriptase and 8 U of RiboLock RNase Inhibitor (Fermentas, Thermo Fisher) for 10 min at 25 °C, 60 min at 42 °C, and 10 min at 70 °C. qPCR reactions were performed on a CFX Connect TM Real-TIME PCR Detection System (Bio-Rad) using 96-well micro plates (Axygen). Reactions were performed in duplicate ( < 5% variation between replicates) in a final volume of 10 µl containing 2 µl of 1:5 and 1:5000 diluted cDNA for target genes and 18 S, respectively, SsoFast EvaGreen Supermix (Bio-Rad, Portugal) and 300 nM of the forward and reverse specific primer. Optimized conditions consisted of: 95 °C for 30 s, followed by 45 cycles of 95 °C for 5 s and 10 s at the appropriate annealing temperature for primers (Table 4).
Melting curves were performed to detect nonspecific products and primer dimers and target specificity was confirmed by the presence of a single peak in each melt curve. Standard curves were prepared from serial dilutions of quantified amplicons. All PCR products were sequenced to confirm their identity. Control reactions were included in all runs to confirm the absence of genomic DNA. qPCR reaction efficiencies and r 2 (coefficient of determination) were all > 90% for each target transcript. Expression normalization was performed using 18 S ribosomal RNA (18 S). ethical approval. All applicable international, national and institutional guidelines for the care and use of animals were followed.

Data availability
The sequence read data and sample information from this study were deposited in BioProject portal at NCBI and can be accessed through the following BioProject Accession no.: PRJNA532556. The datasets analysed in this study are publicly available and the sources are referenced in the text.