The rises and falls of opsin genes in 59 ray-finned fish genomes and their implications for environmental adaptation

We studied the evolution of opsin genes in 59 ray-finned fish genomes. We identified the opsin genes and adjacent genes (syntenies) in each genome. Then we inferred the changes in gene copy number (N), syntenies, and tuning sites along each phylogenetic branch during evolution. The Exorh (rod opsin) gene has been retained in 56 genomes. Rh1, the intronless rod opsin gene, first emerged in ancestral Actinopterygii, and N increased to 2 by the teleost-specific whole genome duplication, but then decreased to 1 in the ancestor of Neoteleostei fishes. For cone opsin genes, the rhodopsin-like (Rh2) and long-wave-sensitive (LWS) genes showed great variation in N among species, ranging from 0 to 5 and from 0 to 4, respectively. The two short-wave-sensitive genes, SWS1 and SWS2, were lost in 23 and 6 species, respectively. The syntenies involving LWS, SWS2 and Rh2 underwent complex changes, while the evolution of the other opsin gene syntenies was much simpler. Evolutionary adaptation in tuning sites under different living environments was discussed. Our study provides a detailed view of opsin gene gains and losses, synteny changes and tuning site changes during ray-finned fish evolution.


Results
Fish Phylogeny and Opsin Gene Prediction. We selected 59 better assembled genomes from 76 available ray-finned fish genomes ( Fig. 1 and Supplementary Table S1). The data of their living environments were collected from FishBase 28 (Supplementary Table S1). From a comprehensive molecular phylogeny of fishes 29 , TimeTree 30 and the studies of Cyprinodontiformes 31 and Cichliformes 23 , we obtained a cladogram of the 59 rayfinned fishes (Fig. 1). This cladogram was used to infer gene copy number changes and its relationship to environmental adaptation, and synteny changes.
Using the 59 genome assemblies and our bioinformatics pipeline (Materials and Methods, Supplementary  Fig. S1), we identified 547 opsin genes in the 59 genomes (Supplementary Table S2). These 547 genes include 483 complete genes, 32 pseudogenes and 32 truncated genes (missing DNA segments 100 bps); the pseudogenes and truncated genes were not included when counting the number of opsin genes in a genome. Figure 1 shows the types and numbers of opsin genes observed or predicted in each fish genome.

Opsin Gene Copy Number Changes during Fish Evolution.
We used the parsimony principle to infer the copy number of each type of opsin gene at each ancestral node and also the minimum gene copy number changes in each branch of the cladogram (Fig. 1). The minimum changes were then checked when we considered the synteny changes in each branch of the tree and modified an inferred number if it was not consistent with the inferred synteny changes. The phylogenetic trees of the five opsin gene families are shown in Supplementary Figs S2-S6. The evolution of copy number of each opsin gene family is shown in Supplementary Figs S7-S11. Our inferences are summarized below.
First, we inferred that all five types of opsin genes (LWS, Rh2, SWS1, SWS2, and Rh1 (Exorh)) already existed in the common ancestor of the 59 ray-finned fishes (Fig. 1) and a retro-duplication of the Rh1 gene occurred in the ancestor of ray-finned fishes because an intronless Rh1 gene exists in all ray-finned fishes 16 (Fig. 1 and Supplementary Figs S7-S11). Second, the 3R event in the ancestral teleost should have duplicated all five types of opsin genes, but only one copy of each of the LWS, SWS1, SWS2, Rh2, and Exorh genes, and two intronless Rh1 genes were retained in the ancestral teleost genome ( Fig. 1 and Supplementary Figs S7-S11). Third, one Rh2 gene duplication happened in Ostarioclupeomorpha ( Fig. 1 and Supplementary Fig. S10). Fourth, the common ancestor of Electrophorus electricus (electric eel), Astyanax mexicanus (Mexican tetra), Pygocentrus nattereri (red-bellied piranha), Tachysurus fulvidraco (yellow catfish) and Ictalurus punctatus (channel catfish) may have lost its SWS1 gene ( Fig. 1 and Supplementary Fig. S9). Subsequent gene gains and losses are shown in Fig. 1.
The 4R event in the ancestor of Cyprinus carpio and Sinocyclocheilus spp. fishes should have doubled the copy numbers of the then-existing opsin genes. The C. carpio lineage subsequently gained one Exorh genes but it also lost one Rh2 gene and had one SWS1 gene, two Exorh genes and one intronless Rh1 gene was truncated ( Fig. 1 and Supplementary  Fig. S12). In the Sinocyclocheilus lineage one of the SWS1 genes became a pseudogene ( Supplementary Fig. S12). The common ancestor of Oncorhynchus mykiss (rainbow trout) and Salmo salar (Atlantic salmon) also underwent a 4R event, but O. mykiss and S. salar have subsequently lost many opsin genes ( Supplementary Fig. S13).

Evolution of Opsin Gene Syntenies.
Here a synteny means the synteny of an opsin gene (or opsin genes) and its adjacent neighboring genes. Our inferences were based on the parsimony principle that requires the minimum number of genomic changes to explain the transition of a gene synteny from one ancestral node to the next (Figs 2-5). For convenience, we used numbers to denote non-opsin genes. However, the numbering system is different in different figures: e.g., Gene1 denotes the HCFC1 gene in Fig. 2, but the TNPO3 gene in Fig. 3 (see also Supplementary Information).
We inferred that the SWS2-LWS synteny in the common ancestor of the Actinopterygii 19 was Gene1-Gen e2-Gene3-Gene4-SWS2-LWS-Gene5-Gene6-Gene7, which is found in the Lepisosteus oculatus genome (Fig. 2) and Gene1-Gene2-Gene3-Gene4-SWS2 is observed in coelacanth (data not shown). This synteny lost Gene2 and was duplicated in the 3R event in the common ancestor of Teleostei. One of the two descendant syntenies subsequently lost the opsin genes and became Gene1-Gene3-Gene4-Gene6-Gene7, which will not be considered from now on. The other became Gene1-Gene4-SWS2-LWS-Gene5-Gene7, and then lost Gene4 and became Gene1-SWS2-LWS-Gene5-Gene7 in both the common ancestor of Ostariophysi and the common ancestor of Euteleosteomorpha (Fig. 2). Three translocation events were inferred for LWS. In the Scleropages formosus lineage LWS was duplicated twice and one copy was translocated to another chromosome. In the common ancestor of Astyanax mexicanus and Pygocentrus nattereri, LWS was duplicated and one copy was translocated to another chromosome. In the Esox lucius (northern pike) lineage, LWS was tandemly duplicated and then the two LWS genes were duplicated and one duplicate was translocated to another chromosome. A SWS2 tandem duplication occurred in the common ancestor of Neoteleostei and a SWS2A tandem duplication occurred in the ancestor of Percomorpha (Fig. 2). Our inferred pattern of SWS2 duplication is consistent with that of Cortesi et al. 26 .  Table S1). The opsin genes found in an extant fish genome are shown before the species name in the order of LWS, Rh2, SWS2, SWS1, Rh1/Exorh, and Rh1 (intronless); if an opsin is missing in a species, the position in that species is empty (no circle). The inferred opsin genes in each major ancestral node are presented. An inferred opsin gene duplication event on a branch is indicated by its gene symbol on the branch; the number inside a circle indicates the number of gene gains. A pseudogenization event is indicated by a black slash, while a gene loss event is indicated by a red "X". The pseudogenes are excluded when counting the total number of opsin genes in an extant fish genome. For Rh2 gene, a Rh2 gene symbol with black X indicates that the duplicated Rh2 gene experienced gene conversion even similar to what Nakamura et al. 21 observed on some of the Rh2 genes of Thunnus orientalis. For a fish order/family that only has one species included in this study, the order/family name (in parentheses) and the name of the species are shown. In addition, a gene loss event that may have resulted from incomplete assembly of the genomic region is indicated by "?". Note that a 4R event occurred in the ancestral genome of Salmoniformes and another occurred in the genome of the Cyprinus carpio (Cypriniformes).
SciEntific RepoRTs | 7: 15568 | DOI:10.1038/s41598-017-15868-7 The SWS1 synteny underwent very few changes during fish evolution as in almost all ray-finned fishes the SWS1 genes are located between Gene1 and Gene2 (Fig. 3). The only exception happened in Austrofundulus limnaeus, which might be an assembly error because the SWS1 synteny is highly conserved in Cyprinodontiformes (Fig. 3). In addition, we inferred that a SWS1 tandem duplication occurred in L. oculatus before 3R and another duplication in Cyprinodontiformes after 3R (Fig. 3).
Two ancestral-like Rh1 gene syntenies could be found in basal Lepisosteus oculatus (Fig. 5). First, Exorh was located in Gene1-Exorh-Gene2-Gene3, this synteny lost Gene2 and after the 3R event in the common ancestor of Teleostei only one copy (Gene1-Exorh-Gene3) was retained. Second, an Rh1 retro-duplication event 3 generated  Supplementary Table S1. The genomic location of each synteny is summarized in Supplementary Table S4. Due to space constraint, only up two genes upstream and up two genes downstream of the opsin genes are shown for the cases in which a translocation or rearrangement of neighboring genes occurred. Each neighboring gene is denoted by a gene number or a gene ID if its function is unknown. The complete list of gene names and gene IDs is given in Supplementary Text. an intronless Rh1 gene in the synteny Gene4-Gene5-Rh1-Gene6 in Lepisosteus oculatus, and this synteny was duplicated in the 3R event but one of the two copies lost Gene5. Therefore, the ancestor of Teleostei possessed the following three syntenies: Gene1-Exorh-Gene3, Gene4-Rh1-Gene6 and Gene4-Gene5-"Rh1-2"-Gene6. Later, the ancestor of Neoteleostei lost the intronless Rh1-2 gene in the Gene4-Gene5-"Rh1-2"-Gene6 block.
Finally, all of the opsin gene syntenies were duplicated in Cyprinus carpio and Sinocyclocheilus spp., and C. carpio experienced additional rearrangements ( Supplementary Fig. S12). Supplementary Fig. S13 showed that following the 4R event in the ancestor of Oncorhynchus mykiss and Salmo salar, the duplicated intronless Rh1 and Exorh genes in both species became pseudogenes and a newly duplicated Gene1-Gene12-(Rh2) 4 -Gene13 block lost the Rh2 genes and became Gene1-Gene12-Gene13.
Living Environment and Opsin Gene Copy Number. We made the following two pair-wise comparisons of fish groups according to their habitats: (1) freshwater vs. marine fishes, and (2) fishes living in water above 30 m vs. fishes living below 50 m ( Fig. 1 and Supplementary Table S1). As the freshwater and marine species do not constitute independent samples, we used the phylogenetic comparative analysis 33 . We reached two conclusions (Supplementary Table S5). First, the freshwater fishes possess significantly more SWS1 and LWS genes per species than the marine fishes. Second, fishes living in water above 30 m have significantly more SWS1 and LWS genes but fewer Rh2 genes than fishes living below 50 m (Supplementary Table S5). When we considered the species that only have undergone 3R event, these differences remained significant (Supplementary Table S5). Amino Acid Substitutions at Tuning Sites. To date, more than 30 tuning sites in opsin proteins have been identified to influence the λmax of vertebrate opsins 4,8 . However, many of them have been conserved in Teleostei fishes. For example, C90S and S90C changed the SWS1 λmax significantly in birds, whereas serine is found at this residue site in all of the Teleostei we studied. Therefore, we excluded all such conserved sites. Also we considered only those tuning sites that have been found to have an effect of 5 nm spectral shift 4,8,10,32,34,35 . Figure 6 summarizes the changes and their effects with experimental evidence 4,8,10,32,34,35 . We used the species tree ( Fig. 1) and maximum likelihood to infer the evolutionary changes at tuning sites (see Materials and Methods). As shown in Fig. 7, in the vast majority of cases the ancestral state of a tuning site was inferred with a probability of 1, and the probability for all other cases with one exception was 0.93.
In SWS2, the ancestral amino acids at the key tuning sites 32 were predicted to be A94, S97, T118 and W265 (Fig. 7). The tandem duplication of SWS2 in the common ancestor of Neoteleostei fishes led to SWS2A and SWS2B 8 (Fig. 2), and then A94C and S97C occurred in the ancestor of Neoteleostei SWS2B (Fig. 7). Several amino acid substitutions occurred independently in the SWS2A and SWS2B of Neoteleostei fishes. For example, W265Y occurred in the common ancestor of Cyprinodontiformes and Beloniformes SWS2B, and T118G in SWS2B occurred in the common ancestor of Fundulus heteroclitus, Cyprinodon variegatus, Poecilia spp. and Xiphophorus spp. (Fig. 7). On the other hand, substitutions at sites 94 and 97 occurred several times in Percomorpha SWS2A. For SWS1, F86 and S90 led to the UV sensing of non-avian SWS1 opsin in vertebrates 9 . In this study, F86 and S90 have been conserved in all fishes, except for A86 in Lepisosteus oculatus. In addition, substitutions occurred frequently at sites 114 and 118 during teleost SWS1 evolution.
The predicted amino acids of the ancestral Rh2 opsin at the four key tuning sites 35 are S97, E122, M207 and A292 (Fig. 7). Tandem duplications of Rh2 occurred independently in the common ancestors of many major lineages, i.e., Salmoniformes, Cyprniformes, Anguilliformes, and Neoteleostei. The Rh2 duplication in the common ancestor of Neoteleostei fishes and subsequent amino acid changes led to Rh2A and Rh2B. The blue shift E122Q occurred not only in the common ancestor of Rh2B but also in other lineages, e.g., Salmoniformes and Esociformes. In addition, S97T was observed in the Rh2 of several lineages, i.e., Anguilliformes, Cypriniformes, and some species of Neoteleostei.

Discussion
Our opsin gene identification pipeline integrated the information of known opsin genes and genome annotation ( Supplementary Fig. S1). It recovered all of the opsin genes identified in two previous studies of nine genomes 21,22 , and also our characterization of SWS2 genes is similar to that of Cortesi et al. 26 . However, one caveat is that most of the coding region of Rh2Aβ gene in the A. citrinellus reference genome was not sequenced, and Torres-Dowdall et al. 41 indeed reported that A. citrinellus Rh2β is functional.
Synteny analysis facilitates the inference of duplication and translocation events and gene gain and loss events. Our analysis supported the view that all five types of opsins already existed in the common ancestor of ray-finned fishes 42 (Fig. 1) and revealed that SWS1, Rh1 and Rh2 genes are usually located in different genomic regions, while SWS2 and LWS genes are usually linked together (Figs 2-5, Supplementary Figs S12-S13). These four syntenic regions have evolved by duplication, translocation, and point mutation events 19,43 .
After the 3R event, only the duplicated intronless Rh1 and the SWS2-LWS synteny have been retained in extant teleosts and one of the duplicated SWS2-LWS syntenies lost all LWS and SWS2 genes (Figs 2 and 5). In addition, the evolution of the Rh2 synteny was highly dynamic prior to the emergence of Neoteleostei fishes (Fig. 4). This observation supports the view that the 3R event accelerated the loss of ancestral syntenies 44 . The 4R event in the Cyprinus carpio lineage appeared to have been followed by a series of rearrangements, leading to the extant Rh1, Rh2 and Exorh syntenies in carps, while those in Sinocyclocheilus spp. underwent no massive rearrangement ( Supplementary Fig. S12). On the other hand, after the 4R event in the ancestor of Salmoniformes, a smaller number of duplicated opsin genes were retained ( Supplementary Fig. S13).
Our synteny analysis indicated that most gene duplication events involving opsin genes were tandem duplications (Figs 2-5, Supplementary Figs S12 and S13), especially opsins in the LWS and Rh2 families 3 . However, the 4R in the Cyprinus carpio and Sinocyclocheilus spp. doubled their entire opsin gene repertoire, and that in Salmoniformes contributed an extra copy of the SWS1 gene to Oncorhynchus mykiss (Supplementary Figs S12 and S13), which might be helpful for its foraging on zooplankton 45 .
Freshwater fishes possess more SWS1 and LWS genes than marine fishes (Supplementary Table S5). Most freshwaters are shallow, so that UV light can penetrate through the water. On the other hand, red light is quickly absorbed by water, so that marine fishes in deep water tend to lose LWS genes. Similar results were obtained when we compared the opsin copy number between fishes living above 30 m and those living below 50 m, because shallow water receives more UV and red light than deep water. In addition, phylogenetic comparative analysis also showed marine fishes inhabit below 50 m possess more Rh2 genes than those live above 30 m. Because the blue-green light dominate the photic environment in deep water, the more Rh2 genes a fish possess imply the better visual adaptation in different habitats or/and at different developmental stages, e.g., the pacific blue-fin tuna (Fig. 1). In Neoteleostei fishes, most loss events of Rh2 genes involved Rh2B genes, but the lost part of spectra sensitivity could be covered by SWS2A opsins. Electrophorus electricus, Ictalurus punctatus and Tachysurus fulvidraco, which live in turbid freshwater environments with reduced penetration of short wavelength lights 28,47 , have also lost the SWS1 and SWS2 genes and in E. electricus the Rh2 gene has become a pseudogene.
Fishes that inhabit diverse water bodies with different spectral backgrounds may have different opsin gene repertoires with tuning site changes that help fine tuning the visual adaptation to their habitats 21,22 (Figs 6 and 7). For LWS, the LWS phylogeny suggests that amino acid substitutions had occurred in different lineages to adapt to different photic environments (Fig. 7). For example, S164A, a blue shift in LWS, independently occurred in the Clupea harengus and Stegastes partitus (bicolor damselfish) lineages, enabling them to adapt to deeper ocean (Fig. 7). Substitutions in a duplicated LWS gene may confer a new function. For example, in Fundulus heteroclitus (mummichog), Poecilia reticulata (guppy) and Xiphophorus maculatus (southern platyfish), the spectral sensitivity of the duplicate LWS P180 was shifted to the green light spectrum owing to the three substitutions S164P, Y261F and T269A (Fig. 7). In Boleophthalmus pectinirostris (blue-spotted mudskipper), on the other hand, LWS2 had experienced the substitutions S164A, Y261F and S292A, which might have turned LWS2 into a green-light Figure 7. Predicted ancestral amino acid residues at the major tuning sites of opsins. The ancestral amino acid residues of the major tuning sites of opsin genes are shown at the major ancestral nodes (the common ancestor of the 59 fishes, Actinopterygii, Teleostei and Neoteleostei). We use the color code to indicate the opsin family with its member(s) that experienced the described amino acid substitution(s). The amino acid substitutions at tuning sites are indicated on the branches of the species tree. For the Rh2 and SWS2 genes in Neotelestei fishes, we show their subtypes (Rh2A, Rh2B, SWS2A, SWS2B, SWS2Aα and SWS2Aβ). A series of substitutions on the same set of opsins in the same state are separated by "/", followed by parentheses with the probability of each predicted ancestral amino acid. If not all opsins in the same family of a species/lineage experienced the same set of substitution(s), then we indicate the particular set of opsins experienced the substitution(s). For example, LWS1:S164A on the branch leading to Cyprinus carpio indicated that the C. carpio LWS1 opsin experienced the substitution S164A. Otherwise, a set of substitution(s) without a clear indication of any particular set of opsins means all opsins in the same fish family of a species/lineage shared the same substitution(s). For example, S118A substitution labeled in purple on the common ancestor of O. latipes and Cyprinodontiformes indicated that the S118A substitution in the SWS1 opsin is present in all of the 13 fishes belonging to this lineage. The tuning sites are numbered according to those in bovine rhodopsin. sensitive opsin 22 (Fig. 7). Therefore, the gene duplication events in LWS and subsequent tuning site changes might have led to a better visual ability to respond to middle-long wavelength lights in some fishes.
In this study, the fishes that possess SWS1 could have UV vision, except for Lepisosteus oculatus that has alanine at site 86 (Fig. 7). Site 86 is crucial for UV sensing 10,48 (Fig. 6), because fishes that lost F86 in SWS1 cannot sense UV-light 48 . The effect of F86A in SWS1 opsin is unknown. However, based on the mutagenesis experiment on Lepidopus fitchi SWS1 48 , we hypothesized that F86A might induce a red shiftthat led to a violet sensitive SWS1 opsin (Fig. 6). On the other hand, SWS1 loss events in some fishes could have resulted from environmental adaptation. For example, P. nattereri, A. mexicanus, I. punctatus and E. electricus live in turbid or tea-stained water, in which UV light could be absorbed very quickly, and these fishes lost the SWS1 genes. Some marine demersal fishes, e.g. G. morhua, L. crocea, M. miiuy and C. semilaevis, also lost their SWS1 genes possibly because UV vision is not needed in sandy or muddy habitats. In addition, some vertebrates use color lens or cornea to minimize UV-induced damages, and alter their SWS1 opsin toward violet light sensing 48 . A previous study on mudskipper's genome hypothesized that the loss of SWS1 might be the consequence of the protection their retinae from UV damage 22 . If this hypothesis holds, the SWS1 loss of shallow water species, such as Tetraodontiformes species and M. mola that often swim just under sea-surface might have also resulted from the protection against the UV-induced damage.
For SWS2, the λmax in the ancestor of Neoteleostei SWS2A could be 440~460 nm 4 , and A94C and S97C occurred in the ancestor of Neoteleostei SWS2B, shifting its λmax to 400~420 nm 4,34 (Fig. 7). In SWS2B, two blue shifts, W265Y and T118G, were observed in several lineages. A mutagenesis study showed that A94C and W265Y together in bluefin killifish could result in a 32 nm blue shift, slightly smaller than the sum of the two individual shifts 32 . In SWS2, the interaction between different tuning sites may reduce the effect of the spectral shift induced by individual replacements. Therefore, W265Y and T118G in SWS2B may only create a relatively small spectral shift because of the interaction between tuning sites.
For Rh2, E122Q caused a ~20 nm blue shift in Rh2B in the common ancestor of Neoteleostei; the λmax of Rh2A ranges from 500 to 530 nm, while that of Rh2B from 450 to 500 nm 4 (Fig. 6). In the Rh2A clade, some substitutions represent the adaptation to extreme light environments. For example, E122Q occurred in Anguilla spp. Rh2-3, Notothenia coriiceps Rh2A and Thunnus orientalis Rh2A2 for adapting to deep sea, deep Arctic and demersal habitats, respectively (Fig. 7). In addition, E122Q and M207L in some Rh2 duplicated genes of Danio rerio, Dicentrarchus labrax, Leuciscus waleckii, Sinocyclocheilus spp., Esox lucius, O. mykiss, S. salar, and M207L in that of Cyprinodon variegatus, Amphilophus citrinellus, Mola mola and several Cyprinodontiformes fishes might have conferred these fishes with different λmax's for more delicate sensing of blue and green lights (Fig. 7).
In the above we have mentioned some examples of neo-functionization of duplicated genes due to changes at tuning sites. Here we add some possible cases. First, the Rh1 duplication and the D83N/A292S tuning site changes in a duplicated copy led to the two different Rh1 opsins in Anguilliformes, which are utilized in different dwelling habitats at different developmental stages 14 . Second, the E122Q change in Rh2B led to the divergence in λmax between Rh2A and Rh2B in Neoteleostei. Third, A94C/T97C shifted the blue-light sensitive SWS2 opsin to the violet-light sensitive SWS2B opsin 32 . Fourth, H181Y/Y261F/T269A/A292S and S164P/Y261F/ T269A led the red-light sensitive LWS opsin to sense green light and divergent functions of LWS duplicates in Cyprinodontiformes 5,49 . In conclusion, opsin gene duplication and tuning site changes may provide fishes with visual diversity to deal with the selection pressure exerted by different environments.

Materials and Methods
Data Download. We first collected 76 ray-finned fish genomes from GenBank, RefSeq, other relevant databases and publications [50][51][52][53][54][55] (before December 29, 2016). Then we selected 55 assembled genomes that satisfied the criterion N50  100,000 bps. In addition, we added Anguilla anguilla, A. japonica and A. rostrata because they are basal teleosts, and also Gasterosteus aculeatus because it is a model organism. Detailed information of the 59 selected ray-finned fish genomes is given in Supplementary Table S1. In addition, we downloaded from GenBank the coding sequences of annotated opsin genes 5,16,26 (Supplementary Table S3).

Prediction of Opsin Genes in a Fish Genome.
We downloaded the zebrafish genome annotation (Zv10) from NCBI RefSeq, which included the annotation of all zebrafish opsin genes (Supplementary Tables S3). For each of the other fish genomes, the prediction workflow (pipeline) is shown in Supplementary Fig. S1. For a genome with annotation, we used all annotated protein sequences in the genome and BLASTP to search against the zebrafish protein sequences (e-value < 1 × 10 −5 ). Only the protein sequences with the best hits to annotated zebrafish opsins were kept (Supplementary Table S3), and the coding sequences of these protein sequences and their genomic locations were retrieved from the genome annotation. To reduce false negatives (i.e., an opsin gene sequence exists but was not recovered by BLASTP) and to handle genomes with no available annotation, the coding sequences and the genomic locations of opsin genes from previous studies 16,26 (Supplementary Table S3) were used to search against the genome sequences by TBLASTX (e-value < 1 × 10 −5 ) and BLASTN (e-value < 1 × 10 −5 ), respectively. In addition, we referenced the BAC clones of Watson et al. 56 (accession numbers: GQ999832, GQ999833) to predict the LWS genes in Xiphophorus hellerii, because the syntenies in the X. hellerii genome are incomplete. The coding sequences of putative opsin genes in these genomes were then obtained from the alignment with the best hits. For the cases with N's (unknown nucleotides) in coding regions, the number of N's was estimated from the alignment. Finally, we kept the coding sequences corresponding to the genomic regions with BLAST hits in both approaches or just in the direct search against genome sequences if genome annotation was not available (Fig. S1). A gene is defined as a pseudogene if a premature stop-codon(s) is found in its coding region. A gene is truncated if any of its exons are missing or the number of N's in its coding region is >100. The genomic information of all identified opsin coding sequences is summarized in Supplementary Table S2.
SciEntific RepoRTs | 7: 15568 | DOI:10.1038/s41598-017-15868-7 Synteny Analysis. To examine the synteny of opsin genes, we checked the genes that have been reported previously in the immediate upstream or downstream neighboring regions of an opsin gene 21,22,26 . These genes are called "typical adjacent genes" here. The locations of these genes in the studied genomes were obtained from genome annotation or using BLAST (e-value < 1 × 10 −5 ). The gene order and orientation in the syntenic region were defined based on the original genome annotation. We used a sliding window approach to check whether any typical adjacent genes appeared in nearby regions. The size of the sliding window was set to be 6 + n, where n is the number of opsin genes in the genomic region (i.e., n opsin genes plus 3 upstream and 3 downstream genes). If none of the typical adjacent genes could be found, we reported the three genes upstream and three genes downstream of the opsin genes. There were two cases where manual adjustments were required. The first case was the Gene1-Gene2-Gene3-Gene4-SWS2-LWS-Gene5-Gene6-Gene7 (HCFC1-TMEM187-IRAK1-MECP2-SWS 2-LWS-GNL3L-FGD1-TFE3) synteny in L. oculatus, because there were three genes between HCFC1 and SWS2 (Fig. 2). The other case was the Rh2 synteny in L. oculatus because there were 8 genes (Gene2~Gene9) between Gene1 (SLC6A13) and Rh2 (Fig. 5). For those genomes without annotation, the sequences of upstream or downstream genes in closely related species that have been annotated were used to search the location of each of these genes by BLASTN and TBLASTN (e-value < 1 × 10 −5 ). In addition, we referenced the BAC clones of Watson et al. 56 (accession numbers: GQ999832 and GQ999833) and predicted the following two syntenies that are related to LWS and SWS2 in X. hellerii HCFC1 -SWS2Aa -SWS2B -LWS S180-1 -LWS P180 -LWS S180-2 -GNL3L -TFE3 and LWS S180r, which is located in the intron of the gephyrin (GPHN) gene. These two inferred syntenies are exactly the same as those annotated by Watson et al. 56 .
We inferred the minimum number of synteny changes in each branch by the parsimony principle. We assumed that a translocation event happened if an opsin gene appeared in a new synteny that was different from the synteny at the immediate ancestral node (Figs 2-5). The inferred syntenies at each evolutionary node are summarized in Figs 2-5, Supplementary Table S4 and Figs S12-S13.

Construction of Phylogenetic Trees of Opsin Genes.
To have a good reference for predicting the duplication and diversification pattern of opsin genes, we used the predicted opsin gene sequences for each opsin families. The sequences which were regarded as pseudogenes and truncated genes were excluded in the construction of phylogenetic trees. The selected gene sequences of each families were first aligned by CLUSTALW 57 . The phylogenetic trees was inferred by using the Maximum Likelihood method based on the General Time Reversible model considering a discrete Gamma distribution for modeling evolutionary rate differences among sites (5 categories) and meanwhile allowed some sites to be evolutionarily invariable (GTR + G + I). The bootstrap procedure was repeated for 500 times. The entire process was done using MEGA 6.0 software 58 . The phylogenetic trees are shown in Supplementary Figs S2-S6.

Inference of Copy Number Changes.
As mentioned in Results, in inferring copy number changes, we used the parsimony principle to infer the minimum number of copy number changes in each branch of Fig. 1; see Supplementary Figs S7-S11 for the copy number changes in each opsin gene family. Then we checked if the minimum number of changes in a branch was consistent with the inferred synteny changes in that branch. We assumed that a WGD event (3R or 4R) doubled the copy number of each existing opsin gene. Also some tandem duplication events can be inferred from synteny changes. For example, for the SWS2 family, the tandem duplication event that produced SWS2A and SWS2B in the latest common ancestor of Neoteleostei fishes is supported by the predicted synteny Gene1-SWS2A-SWS2B-LWS-Gene5-Gene7 in the Neoteleostei fishes and the duplication event that produced SWS2Aα and SWS2Aβ in the latest common ancestor of Percomorpha fishes was supported by the predicted synteny Gene1-SWS2Aβ-SWS2Aα-SWS2B-LWS-Gene5-Gene7 in the Percomorpha fishes (Fig. 2). For the Rh2 family, the duplication event that produced Rh2A and Rh2B in the latest common ancestor of Neoteleostei was supported by the predicted Gene1-Gene12-Rh2B-Rh2A-Gene13 synteny in Neoteleostei fishes (Fig. 4). For the Rh1 family, the loss of Rh1-2 in the latest common ancestor of Neoteleostei was supported by the fact that all the predicted Rh1-2 syntenies in Neoteleostei fishes lost the Rh1-2 gene (Fig. 5).
Ancestral Sequence Prediction. We only considered complete opsin genes since truncated genes and pseudogenes are unlikely to be functional. In general, we selected the gene that is most similar to the counterpart in L. oculatus as the representative for each opsin gene family. The representative gene sequences were then aligned by CLUSTALW 57 for codon-based alignment. The species tree (Fig. 1) was used as the reference tree when using maximum likelihood approach to infer the ancestral states. The predicted syntenies (Figs 2-5) were also employed to help assign the homology between genes and determine the ancestral state of the tuning site. The analysis was done by using MEGA 6.0 software 58 . Table S1), we classified the studied fishes into two pair-wise comparisons: (1) freshwater vs. marine fishes, and (2) living water depth: <30 m vs. >50 m (Fig. 1).

Statistical tests. Using Fishbase (Supplementary
To take the fish phylogeny into consideration for statistical testing, we conducted the Phylogenetic Comparative Analysis 33 . The cladogram of the 59 ray-finned fishes (Fig. 1) was used as the reference phylogeny. The species divergence times were obtained from the literature we referenced when constructing Fig. 1 23,29,30 , the divergence time estimated by TimeTree 30 was taken when there were conflicts between different studies. For the internal nodes with unknown divergence time, we randomly assigned values that are not contradicting to the other constraints. For each test, four models of opsin gene copy number evolution were formulated. The first model (denoted as BM) is based on Brownian motion, which assumes that the evolution is a result of pure drift. It served as the null hypothesis. The rest three models which served as alternative hypotheses are based on Ornstein-Uhlenbeck process, which takes both drift and selection into consideration. The second model (denoted as OU (1)) assumed a global optimum for opsin gene copy number. The third model (denoted as OU (2)) assumed two different global optimum for opsin gene copy number as there are two types of habitats in each test. For OU (2), the habitats of ancestral states were inferred based on maximum parsimony and bottom-up principle. The last model (denoted as OU (3)), which is an extension of OU (2), assumed that the habitat in ancestral state is unknown. The parameters of each model were estimated by likelihood maximization. Finally, the likelihood ratio test, Akaike Information Criterion and Schwarz Information Criterion were used to assess which model fits better to your data. For likelihood ratio test, we regarded a test as statistically significant if the p-value is 0.05. For cases with either OU(2) and OU(3) as significant, we regarded them as cases number of opsin genes differs significantly between the two groups. Details of phylogenetic comparative analysis are summarized in Supplementary Table S5.