Deciphering the evolutionary signatures of pinnipeds using novel genome sequences: The first genomes of Phoca largha, Callorhinus ursinus, and Eumetopias jubatus

The pinnipeds, which comprise seals, sea lions, and walruses, are a remarkable group of marine animals with unique adaptations to semi-aquatic life. However, their genomes are poorly characterized. In this study, we sequenced and characterized the genomes of three pinnipeds (Phoca largha, Callorhinus ursinus, and Eumetopias jubatus), focusing on site-wise sequence changes. We detected rapidly evolving genes in pinniped lineages and substitutions unique to pinnipeds associated with amphibious sound perception. Phenotypic convergence-related sequence convergences are not common in marine mammals. For example, FASN, KCNA5, and IL17RA contain substitutions specific to pinnipeds, yet are potential candidates of phenotypic convergence (blubber, response to hypoxia, and immunity to pathogens) in all marine mammals. The outcomes of this study will provide insight into targets for future studies of convergent evolution or gene function.

Pinnipeds, which consist of three families (Phocidae, Otariidae, and Odobenidae) are distinguishable from other marine mammals 6 . Most pinnipeds are semi-aquatic, unlike other marine mammals that spend their entire lives in the water 4 , and have modified limbs as flippers that propel them both in the water and on land 7 . In addition, with the exception of the walrus, which is the only extant species of the family Odobenidae, all pinnipeds have fur coats 8 . These distinct characteristics have not been sufficiently characterized at the molecular level. Although a draft fur seal genome has recently been assembled 9 , the evolutionary and biological aspects of pinnipeds have not been investigated. Indeed, the genome of the Weddell seal (family Phocidae) has not been completed (http://software.broadinstitute.org/allpaths-lg/blog/?p=647). In addition, most phylogenetic studies of pinnipeds have used limited marker sequences, such as that of the mitochondrial genome [10][11][12] .
Comparative genomics enables investigation of the convergent evolution of distant species. For example, convergent amino acid changes for vocal learning were identified by sequencing 48 avian genomes 13 . Similarly, Parker et al. 14 reported nearly 200 convergent loci in the genomes of echolocating mammals. Although there are more studies to demonstrate to phenotypic convergence-linked sequence convergence, molecular convergence toward phenotypic convergence, at least in marine mammals, seems to be uncommon. By analyzing 22 mammalian genomes, including those of three marine mammals, Foote et al. 15 suggested that different molecular pathways could be used to reach the same phenotype. In a study of the Hox gene family in mammals, only a fraction of sites had positive selection signatures shared by three independent marine mammal lineages 16 . Rather than sequence-level, gene-level convergence was presented as widespread signatures when evolutionary rates were used 2 . Therefore, there is convergence at the functional level or higher in separate mammalian lineages, and different marine mammal lineages have used different molecular pathways to achieve phenotypic convergence.
Here, we constructed draft genomes of three species of two pinniped families: Phoca largha (Phocidae) and Callorhinus ursinus and Eumetopias jubatus (Otariidae) (Fig. S1 and Supplementary Note S1). We identified genes with a positive selection signature that were common to the three pinnipeds but absent from other mammals, which are likely related to the unique traits of pinnipeds. In addition, divergent molecular changes likely to occur only in the pinniped lineage during phenotypic convergence of marine mammals were investigated.

Results
Genome assembly and annotation. Before assembling the genomes of the three pinnipeds, we estimated the genome sizes using the 19-mer distribution of paired-end reads. The estimated genome sizes were 2.61, 2.71, and 2.64 Gbp for the spotted seal (SS), northern fur seal (NFS), and Steller sea lion (SSL), respectively (Fig. S2). The genomic DNA of the three pinnipeds was assembled to a size of approximately 2.5 Gbp, which is similar to that of previously assembled genomes (Antarctic fur seal 9 , Hawaiian monk seal [https://www.ncbi.nlm.nih.gov/ assembly/GCF_002201575.1], and Weddell seal [https://www.ncbi.nlm.nih.gov/assembly/GCF_000349705.1]). Summary statistics of the final assembly are provided in Table S1. To assess the quality of the draft genomes, we remapped paired-end reads with a 350 bp insert size, which yielded alignment rates of >98% for the three genomes (98.24, 98.74, and 98.73% for SS, NFS, and SSL, respectively). The completeness of core-orthologs was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO). Each of the three genomes contained more than 90% core-orthologs from the class Mammalia, in the form of either complete or fragmented sequences (Table S2). The GC contents of the three genomes were investigated using 500 bp bins, and were similar to those of the draft genomes of related species (Fig. S3).
Repeat elements accounted for 35.83, 40.40, and 35.78% of the SS, NFS, and SSL genomes, respectively. Of the repeat regions, long interspersed nuclear element (LINE) was the most extended element in terms of base pairs (Table S3). After masking the identified repeat elements, 33,988, 32,740, and 28,081 protein-coding genes were predicted for SS, NFS, and SSL, respectively (Table S4). Of the predicted genes, ~95% were functionally annotated to at least one of the InterPro, SwissProt, and TrEMBL databases (Table S5).
Therefore, the SS, NFS, and SSL genomes were not significantly different from one another in terms of various statistics related to genome assembly. Because the three species are related, this similarity suggests that the three genomes have similar levels of completeness.
Phylogenomics and protein-coding gene families. To identify the relationships among SS, NFS, and SSL and other related species, we constructed a maximum-likelihood (ML) tree using the amino acid sequence of one-to-one orthologs generated using a dataset of the proteomes of nine species available in public databases. In total, there were 2,907 one-to-one orthologs, the combined length of which was 982,250 amino acid residues. The newly constructed tree provided robust support for the known phylogenetic tree of marine mammals (http:// www.timetree.org/) (Fig. 1A), and the phylogenetic tree is used in the downstream analysis for positively selected genes and substitutions.
We constructed orthologous gene clusters using the genomes of six marine mammals to identify gene clusters and their functions unique to pinnipeds (Fig. S4). The pinniped genomes contained 13,919 (NFS), 13,441 (SS), and 14,165 (SSL) orthologous gene families, respectively, 9,639 of which were shared by all three pinnipeds (Fig. 1B). Of these gene families, 1,874 were present in all pinnipeds, but not in three other mammals. By Gene Ontology (GO) enrichment analysis, we found these gene families to be enriched in 31 terms (p-value < 0.05), several of which were related to an aquatic lifestyle, such as 'aorta development' , 'sterol biosynthetic process' , 'cardiac septum development' , 'coronary vasculature development' , and 'cellular response to oxidative stress' (Table S6).
To investigate gene-family expansion and contraction, a computational analysis of gene-family sizes using the orthologous gene clusters was performed in CAFÉ 17 . By comparing six marine mammals, we found that 874 gene families were expanded, while 1,925 gene families were contracted in the pinniped lineage. Of these gene families, a subset of the Protocadherin (Pcdh) family (herein named family 34) was significantly expanded in the pinniped lineage (p = 0.000346). The genomes of the pinnipeds contained a larger number of Pcdh genes than those of the Genes with accelerated evolution in the pinniped lineage. To detect positive selection in the pinniped lineage, a dN/dS analysis using the branch-site model was performed. The branch-site model allows dN/ dS (ω) to vary both among sites in the protein and across branches on the tree 23 . Therefore, we hypothesized a few sites in the pinniped branches to have different ω ratios compared to other branches and that the genes containing these sites might be related to the unique features of pinnipeds. After the filtering step (see Methods), we analyzed 2,754 one-to-one orthologs identified in the proteomes of 12 mammals, of which seven genes with 145 sites were under positive selection (Bonferroni-corrected p < 0.05, posterior probability based on Bayes empirical Bayes inference [BEB] >0.95; Table 1). Of these genes, transmembrane protein 132B (TMEM132B) contained the largest number of positively selected sites (52 sites). Of the seven genes, six contained 29 conserved domains with 74 sites (51%) under positive selection. GO terms were assigned to each gene, and the following functional associations with pinniped lifestyle were found: TECTA, sensory perception of sound (GO:0007605), SPEG, muscle organ development (GO:0007517), and ADAMTS5, defence response to bacterium (GO:0042742) and tooth eruption (GO:0044691). TECTA encodes alpha-tectorin, a major non-collagenous glycoprotein of the tectorial membrane, an extracellular matrix in the inner ear 18 . Mutations in TECTA result in hearing loss [24][25][26] (OMIM: 602574). SPEG is required for cardiac development and is associated with cardiac myopathy 27,28 (OMIM: 615950). ADAMTS5, which encodes an extracellular matrix-degrading enzyme, plays an important role in the T-cell immune response to viral infection 29,30 .
To assess their uniqueness, the amino acid residues positively selected in the pinniped lineage were compared to other species in our analysis as well as in publicly available databases. For example, we investigated 4 of the 18 sites within TECTA after manually filtering out amino acid residues with spurious alignment (Fig. 2A). The four sites were pinniped-specific compared to the other nine species (Fig. 2B). Moreover, a 100-way multi-alignment showed that two pinnipeds (Pacific walrus and Weddell seal) had residues identical to those in the three pinnipeds in this study (Fig. S5). We could only find a small number of residues matching those in 100 vertebrates at these sites ( Fig. S5). Consequently, the four sites within TECTA might be unique to pinnipeds and generated during their adaptation to a semi-aquatic environment.
Unique substitutions of pinnipeds contributed to the phenotypic convergence of marine mammals. Parallel substitutions are widespread in marine mammals; however, most are not unique to marine mammals 15,31 . Moreover, molecular convergences are rarely linked to phenotypic convergences in marine mammals 2,15,16 . In this study, about half of the parallel substitutions shared by marine mammals were also found in terrestrial mammals, and a considerable number of unique substitutions was found between species with no obvious phenotypic convergence (Figs S6-S8). Therefore, we hypothesized the existence of pinniped-specific substitutions that contributed to aquatic adaptation and are shared by marine mammals. First, we focused on gene-level convergence (Fig. S9) and conducted a dN/dS analysis of one-to-one orthologs using the branch model. The branch model allows the dN/dS (ω) ratio to vary among branches in the phylogeny and is useful for detecting positive selection acting on particular lineages 32 . In this way we aimed to detect candidate genes with different ω ratios among the marine mammal lineages rather than candidate sites, which may contribute to phenotypic convergence among marine mammals. Of the 2,754 filtered one-to-one orthologs, the branch model-based dN/dS analysis detected 853 positively selected genes in marine mammal lineages (Fig. S10b, cetaceans, pinnipeds, and sirenians, Bonferroni corrected p-value < 0.05). These are hereafter referred to as rapidly evolving genes (REGs).   We also calculated the site-wise log likelihood support (SSLS) values for the amino acid sequences of 2,754 genes (996,522 residues in total) and calculated the ΔSSLS values to detect site-wise signatures of divergent evolution. The ΔSSLS value is indicative of the goodness-of-fit of each site to a pair of phylogenetic trees. We aimed to detect genes positively selected in three marine mammal lineages with substitutions unique to pinnipeds. Therefore, we calculated the SSLS for two hypotheses: H 0 , divergence among marine mammal clades and H 1 , convergence among marine mammal clades. Therefore, a ΔSSLS (log likelihood of H 0 − log likelihood of H 1 ) value > 0 means that the site in question supports divergence among marine mammal clades. We used the ΔSSLS value as a filtering criterion to exclude sites supporting convergence among marine mammals. By excluding those with low ΔSSLS values, we identified pinniped-specific sites that support the separation clades of marine mammals. We expected that this analysis would generate more reliable sites than directly extracting unique substitutions over REGs, as it considers the overall phylogeny not just the sequence itself.
We regarded the 9,965 residues with the top 1% ΔSSLS values as being supported by divergent substitutions (support for H 0 ) rather than convergent substitutions among three marine mammal clades (support for H 1 ) (Fig. 3A). We termed the 2,159 genes containing at least one of these residues as divergent substitution genes (DSGs). DSGs covered most of the 2,754 one-to-one orthologs (78%), and 85% of total residues had positive ΔSSLS values. Therefore, the majority of the sequences supported the commonly accepted phylogeny.
Of the 853 REGs, 658 (3,277 residues) had a least one top 1% ΔSSLS site (Fig. 3B). Although these genes covered the functional categories associated with marine mammals' adaptation, a single residue divergence supported by likelihood divergence (ΔSSLS) could be vulnerable to sequencing error. We also focused on sequence changes common to the pinniped clade; that is, changes from the ancestral node sequence shared by dog to that of the node of each pinniped. Therefore, we investigated unique substitutions (any amino acid residue at the same position in all three pinnipeds that was found in neither the ancestral nodes with their respective terrestrial taxa [dog] nor in other mammals) to rule out sequence divergences other than ancestral substitutions unique to the pinniped clade. There were 1,731 genes with at least one unique substitution (7,878 residues); these were termed unique substitution genes (USGs), 63 of which contained top 1% ΔSSLS residues at the same positions as unique substitutions. Finally, we obtained 24 REGs containing top 1% ΔSSLS residues and unique substitutions at the same positions ( Fig. 3 and Table 2).
Although the 24 REGs are supported by rapid evolutionary rates (dN/dS) and fixation of amino acid residues within the pinniped clade, the precise phenotypic effects of the unique substitutions cannot currently be ascertained. However, several of the 24 REGs have known functional associations that suggest a role in the convergent phenotypic evolution of marine mammal lineages. For instance, FASN encodes fatty-acid synthase, which catalyzes the conversion of acetyl-CoA and malonyl-CoA to long-chain saturated fatty acids 33 and is related to obesity 34 . KCNA5 (potassium voltage-gated channel subfamily A member 5) encodes voltage-gated potassium

Discussion
In this study, we presented three genomes of pinnipeds (Phoca largha, Callorhinus ursinus and Eumetopias jubatus) that belong to Phocidae, and Otariidae family for the first time. Pinnipedia is a monophyletic group distinct from other marine mammals in many respects, such as its semi-aquatic lifestyle and well-developed flippers 5 . Our findings provide insight into the common features of pinniped genomes, which is less clear than the convergent evolution of pinnipeds. Pinnipeds are the most amphibious mammalian species. Possibly, for that reason, their auditory systems are challenged by the need to function efficiently underwater and in air, unlike the solely underwater hearing of cetaceans and sirenians 39,40 . TECTA, which is related to sound perception 26 , was identified as positively selected in the pinniped lineage. TECTA encodes α-tectorin, a non-collagenous component of the tectorial membrane in the cochlea 41 . The tectorial membrane is an extracellular matrix that covers the surface of the sensory epithelium in the cochlea and plays a vital role in transmitting sound to the stereocilia of hair cells, where the sound is transduced into neural signals 42 . Therefore, mutations in TECTA might be involved in the semi-aquatic adaptation of pinnipeds by tuning their hearing ranges. Indeed, mutations in TECTA are responsible for loss of hearing at particular frequencies [43][44][45] . Interestingly, the four positively selected sites in TECTA were very rare among 100 vertebrates (Fig. S5). Although its relationship with amphibious sound perception is unclear, TECTA should be investigated in future studies of amphibious sound perception in pinnipeds. The pinniped lifestyle might influence the function of other candidate genes, such as SPEG and ADAMTS5. Comparative analysis of amphibious mammals may reveal their adaptations at the molecular level and show that an amphibious lifestyle results in selection pressure.
We found that a considerable number of parallel substitutions are not unique to marine mammals, consistent with two recent reports 15,31 . This implies that molecular convergence is not a driving force of phenotypic convergence among marine mammals, and that different clades of marine mammals used different molecular pathways to reach similar phenotypes. Although this phenomenon has been observed several times in marine mammals, whether it also applies to other clades is unclear. More evidence in other clades is needed to generalize this phenomenon to other forms of phenotypic convergence.
Because sequence convergences leading to phenotypic convergences are not common, we assumed that unique substitutions contributed to the aquatic adaptation of pinnipeds. In our analyses, three genes, FASN, KCNA5, and IL17RA, were identified as candidates. The well-defined roles of these genes (blubber 46 , resistance to hypoxia 47 , and the immune response to pathogens 15 , respectively) support their contributions to phenotypic convergences of marine mammals. FASN and KCNA5 were not found to be positively selected in the branch-site model analysis using all marine mammal branches as foreground branches. In addition, only ~17% of the REGs were found to be positively selected genes by the branch-site model analysis (Fig. S11). Such results suggest that rapid evolution occurred at different sites of the candidate genes between marine mammal clades, an example of gene-level convergent evolution. Convergent evolution can occur at molecule, gene, and function levels 31,48 . We focused on convergence at the gene level. However, the functions of the majority of the putative convergent genes were unrelated to apparent phenotypic convergence, such as lipid metabolism and resistance to oxidative stress. This may be due to the missing link between convergent genes and phenotypic convergences. In this case, the results can be complemented by studying the gene functions and convergence at a higher-level.

Conclusions
We report here the genomes of Phoca largha, Callorhinus ursinus, and Eumetopias jubatus. These genomes suggest the existence of considerable sequence diversity within and across marine mammal species. We identified several unique genome-level adaptations to the semi-aquatic lifestyle of pinnipeds, and several examples for evolution of marine mammals that are convergent in gene-level, but divergent in sequence-level. These findings suggest targets for future in vitro and in vivo studies of adaptive phenotypes and provide insight into convergent evolution at the molecular level.

Methods
Ethics statement. No ethics approval was required for the collection of DNA from blood samples of bycaught carcasses.

Sample information and collection. We collected five pinniped samples from Korean waters. Three male
Northern fur seals (Callorhinus ursinus) were bycaught in set nets and collected during January and February 2016 (one was used to produce sequence data). A bycaught female Steller sea lion (Eumetopias jubatus) was collected in April 2008. A female spotted seal (Phoca largha) was collected on a beach in August 2015. All of the above were found in the waters off Gangwon-do, northeastern South Korea.
DNA sequencing and genome assembly. For whole-genome shotgun sequencing and draft genome assembly, we constructed two paired-end libraries with insert sizes of 350 and 700 bp using the Illumina TruSeq DNA Sample Preparation Kit (Illumina, San Diego, CA, USA). For the Steller sea lion genome, mate-pair libraries with insert sizes of 3, 9, and 40 kb were constructed as scaffolds using the Illumina Nextera mate-pair library construction protocol (Illumina). Sequence reads were generated using the Illumina Nextseq 500 platform. Information on the constructed libraries and sequencing data is provided in Table S7.
The 19-mer distribution of the paired-end library with an insert size of 350 bp was calculated using Jellyfish 49 , and the sizes of three genomes were estimated (Fig. S1). To retrieve high-quality sequence reads, the quality of the raw data was controlled using FASTQC 50 . Artifact sequences were removed via Trimmomatic 51 for paired-end libraries, and Nxtrim 52 for mate-pair libraries. Sequencing errors within each read were estimated and discarded using the error-correction module of Allpaths-LG 53 . We assembled error-corrected paired-end reads using IDBA_UD 54 with the option of pre-correction and kmin = 40. Scaffolding on initial contigs was conducted using the paired-end reads with a 700 bp insert size, and mate-pair reads sequentially by SSPACE 55 and ScaffMatch 56 . After scaffolding, we iteratively filled gaps using Gapcloser 57 with the -l 155 and -p 31 parameters.
RepeatModeler 58 , which includes RECON 59 , RepeatScout 60 , and TRF 61 , was used to create a custom database for each species. A custom library was constructed by integrating the custom databases into the Repbase 62 database of mammals. Repeat elements were identified and masked using RepeatMasker 63 with the custom library and '-q, no_is' options.
Genome annotation. Two approaches were used to predict protein-coding genes. First, manually curated protein sequences of Mammalia were retrieved from Swiss-Prot 64 and aligned to the pinniped genomes using tBLASTn 65 . The homologous genome sequences with E-values ≤ 1E-5 were extracted and realigned to the matched proteins using Exonerate 66 to predict splice sites. Ab initio gene prediction was conducted using Augustus 67 , Geneid 68 , and GlimmerHMM 69 software with the default options. Predicted genes using each approach were combined using EvidenceModeler 70 into a consensus gene set.
For assessment of the quality of the draft genome, we remapped paired-end reads with a 350 bp insert size and investigated completeness of core-orthologs using BUSCO 71 .
For the three gene sets, the best match of a BLASTP 72 search against the SwissProt and TrEMBL databases 73 was assigned to putative functions. Gene motifs and domains were determined using InterProScan v. 5.19 74 . The GO IDs for each gene were obtained from the corresponding InterPro entries.
Ortholog identification. The complete proteome datasets were downloaded from UCSC Genome Browser 75 for the following nine mammals: human (hg19), mouse (mm10), dog (canFam3), cow (bosTau8), manatee (triMan1), dolphin (turTru2), Minke whale (balAcu1), opossum (monDom5), and elephant (loxAfr3). Gene clusters for these nine mammals and three pinnipeds were identified using OrthoMCL v. 2.0.9 76 with the default settings. A custom python script was used to generate a dataset comprising strict one-to-one orthologs (core-orthologs) from the 12 mammals. Phylogenomic analyses using a genome-wide set of one-to-one orthologs. Amino acid sequences of 12 mammals corresponding to the one-to-one orthologs were individually aligned using ClustalW v. 2.1 77 . A concatenated alignment was then prepared by merging individual alignments. The concatenated alignment was trimmed using Gblocks v. 0.91b 78 with auto settings. The best-fit substitution model for the alignment was determined using ModelGenerator 79 . For phylogenetic analyses, RAxML v. 7.2.8 80 was used to generate ML trees. Rapid bootstrap analysis and identification of the best-scoring ML tree (-f a option) were performed using RAxML v. 7.2.8 80 . Bootstrap support values/percentages were determined using 100 replicates. A Jones-Taylor-Thornton amino acid substitution model 81 (with the PROTCATIJTTF option) as recommended by ModelGenerator 82 was used to construct the ML trees.
Detection of lineage-specific gene losses and gains. Using the gene clusters defined by Orthomcl v.
2.0.9 76 , the genes in each gene family group were enumerated and converted to input data for CAFÉ software v. 3.1 17 . Expansion or contraction of the gene families was defined by comparing the cluster size of the ancestor to that of each of the current species using CAFÉ 17 .
Detection of positively selected genes and substitutions. To detect positively selected genes, coding sequence alignments were prepared by pal2nal v. 14 83 using the amino acid alignments of the one-to-one orthologs. After trimming of the poorly aligned regions, alignments that are shorter than 100 bp or contain an internal stop codon were excluded.
To detect positive selection affecting a few sites in particular lineages (foreground branches, pinniped lineage in this study), we employed a branch-site model, which allows the ω ratio to vary both among lineages and among sites. We used the ML method of codeml in PAML v. 4.9 84 , which estimates the rate of non-synonymous substitutions (dN), the rate of synonymous substitutions (dS), and the ratio of the non-synonymous to synonymous substitution rates (ω) values using the F3X4 codon frequencies. An alternative codon substitution model was specified using model = 2, NSsites = 2 (model A 23,85 , number of parameters k = 4), which was compared with the corresponding null model ω 2 = 1 (ω ratio of foreground branches) fixed (fix_omega = 1 and omega = 1) using a likelihood-ratio test (LRT). From the alternative model, two different ω ratios of site class 2b (proportion: (1 -p 0 -p 1 ) p 1 /(p 0 + p 1 ), ω 1 = 1, ω 2 ≥ 1) for pinniped branches (foreground branches) and other branches (background branches) were estimated (Fig. S10a) to detect positive selection.
To identify fast-evolving genes in marine mammals (pinnipeds, cetaceans, and sirenians), we employed a branch model, which allows the ω ratio to vary among branches 32 . In codeml, an alternative codon substitution model was specified using model = 2 and NSistes = 0, which was compared with the basic null model (model = 0, NSsites = 0) by LRT. From the alternative model, two different ω ratios for marine mammal branches (foreground branches) and other branches (background branches) were estimated (Fig. S10b).
Genes with a maximum dS of >3 or maximum dN/dS of >5 in all branches or a log-likelihood ratio of <0 were filtered from the output of each analysis. The Bonferroni method 46 was used to correct for multiple testing, and a value of p < 0.05 was taken to indicate statistical significance.
Calculation of site-wise likelihood support. To detect sites with molecular divergence that supported the monophyly of pinnipeds, we fitted the amino acid sequence alignment of one-to-one orthologs to a null model (H 0 , species tree) and an alternative model (H 1 , monophyly of marine mammals) (Fig. 3A). The goodness-of-fit of each site to a pair of phylogenetic trees under a given model was calculated as the SSLS value and directly compared as ΔSSLS = lnL (H 0 ) -lnL (H 1 ). Positive ΔSSLS values indicate a better fit of the model to the species tree, H 0 (supporting divergence), whereas negative ΔSSLS values indicate a better fit to H 1 (supporting convergence). The substitution model for each gene was determined by ModelGenerator 82 . The SSLS value for each site of alignment was estimated by RAxML v. 7.2.8 80 .
Identification of parallel and unique substitutions. We defined parallel substitutions as any amino acid change at the same position in marine mammals different from that of the ancestral node of each marine group, but identical in the three marine groups. To identify parallel amino acid changes in marine mammals, the species tree constructed in this study was used to reconstruct the ancestral sequences. The ancestral sequences for each node were reconstructed by Joint method using FastML v. 3.1 86 . We allowed FastML 3.1 to estimate the branch length of the phylogenetic tree for each gene when the ancestral sequences were reconstructed using the set of 12 mammals. For the sites with parallel and unique substitutions, the amino acid sequences of 100 vertebrates were investigated by 100-way multi-alignment 87 with the UCSC genome browser.

Conserved domain search.
To determine whether positively selected sites are located in gene functional domains, we searched for conserved domains within positively selected genes using the CD-Search tool in the NCBI 88 . The amino acid sequences of human orthologs were used as a query set with the following settings: data source, CDD v. 3.16; expected value threshold, 0.01; composition-based statistical adjustment, applied; low-complexity filter, not applied.
Gene ontology analysis. We mapped the identified genes to GO categories in Ensembl 89 to identify those putatively associated with a specific function, such as adipose tissue development. Gene set enrichment tests were performed by DAVID functional annotation 90 using a cutoff P-value of <0.05.

Availability of Data and Material
The datasets generated during the current study are available in the NCBI repository, PRJNA422019.