## Introduction

Aquatic hypoxia (low oxygen) varies in spatial scope, severity, and frequency, and it is increasing globally due to human activities (e.g., climate change, eutrophication)1. Oxygen availability has been proposed to be a major determinant of the distribution of marine animals2,3, and, along with changes in temperature and pH, hypoxia is widely recognized as a significant threat to aquatic organisms1,2,3. The importance of oxygen arises from its central role in aerobic energy metabolism, which supports critical cellular and organismal functions, including ion transport, motility, growth, and reproduction. Hence, oxygen is essential for the normal physiological function of most metazoans, and complex regulatory mechanisms have evolved to mitigate the deleterious effects of hypoxia.

The molecular responses to low oxygen are orchestrated by the hypoxia inducible factor (HIF) family of transcription factors4,5. The HIF transcription factor is composed of two non-identical protein subunits (α and β), both of which are members of the basic helix-loop-helix Per-ARNT-Sim (bHLH-PAS) family of transcription factors6,7. The HIFβ subunit, also known as the aryl hydrocarbon receptor nuclear translocator (ARNT), is constitutively expressed, oxygen-independent, and serves other roles in cell signaling7. The oxygen-dependence of HIF function is attributed to an increase in the cellular concentration of the HIFα subunit during hypoxia, driven largely by a decrease in the rate of its degradation. At normal oxygen levels (normoxia), specific proline residues of the HIFα subunit are modified by hydroxylation, which signals the protein for ubiquitin-dependent degradation8,9,10. During hypoxia, proline hydroxylation and protein degradation of the HIFα subunit are blocked, whereupon HIFα accumulates, dimerizes with HIFβ, translocates to the nucleus, and, together with accessory proteins, binds specific DNA elements in target genes and activates their transcription. Gene targets of HIF number in the hundreds and include genes involved in oxygen transport, glucose uptake and metabolism, and cell survival and proliferation11. These molecular responses serve to ensure oxygen delivery to tissues or enhance the function of tissues during hypoxia.

HIF has been studied extensively in humans, where low tissue oxygenation is associated with several pathologies, including cardiovascular disease, pulmonary disease, and cancer, but also with fetal development and exposure to high-altitude hypoxia4,5,12,13,14,15. In humans, as in other mammals, three genes encode different HIFα subunits, HIF1α (HIF1A), HIF2α (HIF2A), and HIF3α (HIF3A; see Table 1), which likely arose from two rounds of genome duplication at the base of vertebrate evolution16. The protein products dimerize with HIFβ to form the active transcription factors, HIF1, HIF2, and HIF3. Among these, HIF1 is the most well-characterized, has the broadest tissue distribution and gene specificity, and is essential for proper development and the response to hypoxia in mammals5. HIF2, initially characterized in endothelial tissues and also known as endothelial PAS protein 1 (EPAS1), is more restricted with respect to tissue distribution and target genes, some of which are shared with HIF112,17. It is critical for angiogenesis, cancer progression, and high-altitude adaptation in humans12,14,15,18. HIF3 is the least well-described. Like HIF1α and HIF2α, HIF3α dimerizes with HIFβ to regulate the expression of specific genes; however, a variety of shortened forms, translated from splice variants, act as negative regulators of HIF1α or HIF2α19,20,21.

The ray-finned fishes (Actinopterygii) are the most speciose and diverse class of vertebrates, having over 30,000 species occupying virtually every aquatic habitat on earth22. Understanding HIF signaling among fishes could provide insights into their evolutionary history and their capacity to respond to the increasing prevalence of aquatic hypoxia. The initial characterization of HIFα subunits in fishes demonstrated that they possessed orthologs of the three genes found in mammals (reviewed in23,24,25). In a comprehensive phylogenetic analysis of HIFA genes in fishes, Rytkonen et al.26 found evidence that certain fish lineages retained duplicated copies of the HIFA genes that presumably arose from another round of genome duplication, the teleost-specific genome duplication (TGD)27,28. Specifically, the family Cyprinidae (carp and its allies, including the zebrafish, Danio rerio) was proposed to have teleost-specific paralogs HIF1Aa/b, HIF2Aa/b, and HIF3Aa/b (see Table 1). While these teleost-specific paralogs appear to be lost in most other lineages of fish, Rytkonen et al.26 present evidence that certain species among the more-derived Neoteleostei retained a shortened duplicate of HIF2Ab, a putative “relic” of the TGD.

Over the last decade, sequence analyses based primarily on HIFA transcripts from various fishes generally support the conclusions of Rytkonen et al.26; however, several questions regarding the evolution of the HIFA gene family remain unanswered. For example, the two rounds of genome duplication at the base of vertebrate evolution are predicted to result in four HIFA paralogs (“Ohnologs”29), rather than the three paralogs generally recognized to exist in vertebrates. While it is possible that nonfunctionalization of one paralog after the second round of genome duplication30 could account for this “missing” Ohnolog31, recent analyses have demonstrated that several fishes have HIF-like, HIFA-like, and HIF1A-like genes32. The relationships of these genes to one another and to the other HIFA genes, however, have not been resolved. Furthermore, the relationships among teleost-specific paralogs and their broader distribution among fishes are not well-established. This is especially true of the “relic” HIF2Ab. These uncertain relationships within and among teleost-specific paralogs have led to an inconsistent nomenclature of HIFA paralogs. Also, certain fish lineages have undergone additional rounds of genome duplication, for example the salmonid-specific genome duplication (SGD)33,34, which potentially further increased the diversity of HIFA genes in those lineages. Finally, there has been no systematic evaluation of the tissue expression of HIFA transcripts among fishes, which could help to clarify the contribution of subfunctionalization and neofunctionalization35 to the maintenance of HIFA paralogs.

The current study reexamines the evolution of HIFA in ray-finned fishes using recently sequenced genomes, including that of spotted gar (Lepisosteus oculatus) to represent a lineage that diverged prior to the TGD36. Specifically, we sought to determine (1) whether any ray-finned fishes retained the four HIFA genes that arose during the two rounds of genome duplication at the base of vertebrate evolution, (2) whether gene duplicates arising from the TGD are seen in fishes other than the Cyprinidae, (3) the distribution and phylogenetic relationship of shortened forms of HIF2Ab, (4) whether HIFA duplicates from the SGD are present in salmonid genomes, (5) the potential modes of selection acting on HIFA genes and the corresponding amino acid sites potentially under selection, and (6) the broad patterns of tissue expression of HIFA transcripts. This analysis of HIFA evolution in the ray-finned fishes provides evidence of “missing” Ohnologs, clarifies the relationships among teleost-specific paralogs, provides insights into the selective forces responsible for this diversity, and forwards a recommendation for a phylogenetically-based HIFA nomenclature.

## Results

A total of 114 putative HIFA homologs were recovered from searching the genomes of 22 species of Actinopterygii representing 14 orders (Supplemental Table S1). Phylogenetic analyses resolved four distinct clades (Fig. 1). This pattern was strongly supported by Bayesian analyses of nucleotide and deduced amino acid sequences (posterior probabilities ≥ 0.89; Supplemental Figs. S1, S3), as well as by maximum likelihood analyses (bootstrap values ≥ 0.71 for nucleotide analyses and ≥ 0.79 for amino acid analyses; Supplemental Figs. S2, S4). As expected, the branching patterns of species within each clade generally reflected the currently accepted phylogeny of fishes37. The spotted gar (Lepisosteus oculatus), a basal actinopterygian that diverged prior to the TGD, has one homolog in each clade. Thus, we infer that the four clades represent products of the two rounds of genome duplication in the ancestor of vertebrates and hereafter refer to these as HIF1A, HIF2A, HIF3A, and HIF4A. Most taxa examined here have at least one representative of all four HIFA genes, the exception being the most derived ray-finned fishes, the Neoteleostei, which appear to lack HIF4A.
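The copy-number expectation underlying this inference is simple arithmetic, sketched here under the idealized assumption that each whole-genome duplication doubles the gene count and that no duplicate is lost. The function name `expected_copies` is hypothetical and is not part of the analyses reported here.

```python
def expected_copies(rounds, ancestral=1):
    """Copies expected after `rounds` whole-genome duplications, assuming no loss."""
    return ancestral * 2 ** rounds

# Two rounds at the base of vertebrates: four Ohnologs (HIF1A to HIF4A)
print(expected_copies(2))  # 4
# Adding the teleost-specific duplication (TGD): upper bound of eight
print(expected_copies(3))  # 8
# Adding the salmonid-specific duplication (SGD): upper bound of sixteen
print(expected_copies(4))  # 16
```

Observed counts fall below these upper bounds wherever duplicates were lost, for example the apparent absence of HIF4A in Neoteleostei.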

### Teleost-specific HIFA paralogs

Teleost-specific duplicates of HIF1A (HIF1Aa and HIF1Ab) were only recovered in the Otocephala, a group including herrings, true minnows, carps, tetras, and catfish. Our results extend the observations of Rytkonen et al.26, who documented that the family Cyprinidae (e.g., zebrafish and carp) have teleost-specific paralogs of HIF1A, to include other Otocephala. Similarly, teleost-specific duplicates of HIF2A (HIF2Aa and HIF2Ab) are present in all the Otocephala examined here, as previously observed for cyprinids26. Additionally, more-derived fishes (Salmoniformes and their sister group, Esociformes, and Neoteleostei) retain a truncated version of HIF2A, previously referred to as a “relic” of the TGD26. The coding sequence of this truncated version is only one-third to one-half of “full-length” HIF2A, corresponding to the N-terminal portion of the protein (Supplemental Table S1). Although nucleotide and amino acid sequence analyses failed to reliably group it with the Otocephala HIF2Ab, evaluation of flanking genes placed the truncated form with other HIF2Ab (see below). Only one copy each of HIF3A and HIF4A was recovered in any given species (with the exception of putative salmonid-specific paralogs, see below), which suggests that one duplicate of each of these genes was rapidly lost after the TGD. Previous analyses proposed that Cyprinidae retained teleost-specific duplicates of HIF3A26. Our analyses grouped one of these genes with HIF3A from spotted gar and one with HIF4A from spotted gar. Because the spotted gar lineage diverged prior to the TGD, our results indicate that these cyprinid genes are HIF3A and HIF4A, rather than teleost-specific duplicates of HIF3A.

### Salmonid-specific HIFA paralogs

The current analysis revealed that Salmoniformes have two paralogs of HIF1Aa, HIF2Aa, and HIF3A (Fig. 1). These duplicates are not observed in the sister group Esociformes (Northern pike) and the branch lengths joining them are very short, consistent with an origin during the SGD. In support of this, Berthelot et al.33 noted that rainbow trout retained as many as 48% of the gene duplicates arising from the SGD, including an over-representation of transcription factors. In the absence of a naming convention for salmonid-specific duplicates, and to distinguish these from TGD duplicates, these paralogs are referred to as HIF1Aa_s1, HIF1Aa_s2, HIF2Aa_s1, HIF2Aa_s2, HIF3A_s1, and HIF3A_s2.

### Synteny analyses support relationships within paralogs

Because the relationship of paralogs arising from the TGD based upon sequence analyses alone can be ambiguous38, we used shared synteny among species representing major fish lineages to clarify the relationships among HIFA paralogs (Fig. 2; Supplemental Table S2). For HIF1A, several flanking genes in spotted gar (L. oculatus) are conserved throughout Actinopterygii. As expected, there are more shared flanking genes in primitive species (S. formosus) compared to more derived species. Notably, the gene order of the 10 upstream genes is perfectly conserved in HIF1Aa in Otocephala (represented by D. rerio). This pattern supports the view that HIF1Aa of Otocephala is orthologous with the single paralog of HIF1A found in other ray-finned fishes. Although there are a similar number of flanking genes conserved between spotted gar and Otocephala HIF1Ab, their order and direction are more variable.
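The comparison of flanking genes described above can be illustrated with a minimal sketch. It assumes that each gene neighborhood is an ordered list of unique gene names; the gene labels below are hypothetical placeholders rather than entries from Supplemental Table S2, and the helper functions are illustrative, not the software used in this study.

```python
def shared_flanking(reference, query):
    """Return flanking genes present in both neighborhoods, in reference order."""
    query_set = set(query)
    return [gene for gene in reference if gene in query_set]

def order_conserved(reference, query):
    """True if the shared genes occur in the same relative order in both lists."""
    shared = shared_flanking(reference, query)
    positions = [query.index(gene) for gene in shared]  # assumes unique names
    return positions == sorted(positions)

# Hypothetical neighborhoods (placeholder names, not real flanking genes)
gar = ["g1", "g2", "g3", "g4", "g5"]
paralog_a = ["g1", "g2", "g3", "g9", "g5"]
paralog_b = ["g5", "g8", "g2", "g1", "g7"]

print(len(shared_flanking(gar, paralog_a)), order_conserved(gar, paralog_a))  # 4 True
print(len(shared_flanking(gar, paralog_b)), order_conserved(gar, paralog_b))  # 3 False
```

In this toy case, the first paralog shares more genes with the reference and preserves their order, the pattern interpreted above as support for orthology with the ancestral copy.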

For HIF2A, the order of flanking genes is not as highly conserved across species, especially among Neoteleostei (represented by X. maculatus), where only the immediate upstream flanking gene is conserved in one of the two teleost-specific duplicates. This is the full-length form of HIF2A (i.e., not truncated) and its grouping with the full-length forms of HIF2A from other species is strongly supported by phylogenetic analyses (Fig. 1). Because this gene from all other species shares strong syntenic relationships with spotted gar HIF2A, these genes are HIF2Aa. The other teleost-specific paralog of HIF2A in Otocephala and Salmoniformes shares fewer flanking genes with spotted gar, and thus represents HIF2Ab, as previously proposed26,39. Importantly, up to 15 flanking genes are shared between HIF2Ab from zebrafish, Danio rerio (representing Otocephala), and the truncated HIF2Ab genes in Salmoniformes and Neoteleostei (Fig. 2, yellow arrows), a strong indication of common ancestry. Thus, we conclude that Otocephala HIF2Ab is orthologous with the truncated HIF2Ab in more derived species.

Synteny analysis of HIF3A and HIF4A showed considerable variation in the number of shared flanking genes among these paralogs in Actinopterygii. Interestingly, Otocephala HIF3A shares only three flanking genes with spotted gar, considerably fewer than observed in more-derived species. For HIF4A, the number of flanking genes roughly reflected the degree of divergence from the ancestral species, as expected, being highest in the Asian arowana (S. formosus) and the least in the rainbow trout (O. mykiss). The number of shared genes among HIF1A, HIF2A, HIF3A, and HIF4A was extremely limited (Supplemental Table S2), supporting an origin of these four paralogs in the ancient genome duplications at the base of vertebrate evolution.

Finally, synteny analysis of salmonid-specific duplicates of HIF1Aa, HIF2Aa, and HIF3A showed that one of the paralogs (s1) shares more flanking genes with the corresponding gene in Esociformes, the sister group of Salmoniformes, than the other paralog (s2) (Supplemental Table S2).

### Evidence for positive selection

We investigated whether HIFA genes experienced variable selective pressures using branch model tests performed in EasyCodeML40 on all accessions across four HIFA clades and one outgroup, Ciona intestinalis. A two-ratio model was not a better fit to the data than a one-ratio model in any of the tests, indicating that ω ratios (nonsynonymous to synonymous substitution ratios; dN/dS) were similar among all four HIFA genes (Table 2). The one-ratio model ω values were all much less than one, suggestive of overall purifying selection on each HIFA gene.
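The comparison of nested branch models rests on a likelihood ratio test. The sketch below assumes that the two-ratio model adds a single free ω parameter to the one-ratio model, so that the test statistic is compared with a chi-square distribution with one degree of freedom. The log-likelihood values are hypothetical and are not taken from Table 2.

```python
import math

def lrt_pvalue_df1(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models that differ by one parameter."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical log-likelihoods for a one-ratio (null) and two-ratio (alternative) fit
stat, p = lrt_pvalue_df1(lnl_null=-10000.0, lnl_alt=-9999.5)
print(round(stat, 3), round(p, 3))  # 1.0 0.317
```

A p-value of this size would give no reason to prefer the two-ratio model, which is the situation reported for all of the tests in Table 2.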

Next, the four HIFA clades identified by phylogenetic analyses were independently examined for gene-wide and codon-based episodic and pervasive selection. Gene-wide tests of episodic selection performed with BUSTED41 found evidence of diversifying selection for at least one site on at least one branch of each HIFA gene (HIF1A: LRT = 46.754, p = 3.52e−11; HIF2A: LRT = 175.805, p = 0; HIF3A: LRT = 150.118, p = 0; HIF4A: LRT = 9.001, p = 0.006). This result was supported by aBSREL42,43 analyses showing evidence of diversifying selection for each HIFA (Fig. 3). The percentages of branches within each gene tree displaying significant positive selection were 18% for HIF1A (Fig. 3a), 10% for HIF2A (Fig. 3b), 19% for HIF3A (Fig. 3c), and 26% for HIF4A (Fig. 3d). For each gene, several of the branches that showed significant positive selection corresponded to major taxonomic groups. For example, significant positive selection was detected for the branch leading to Otocephala HIF1Aa, the branch leading to HIF1Aa in Salmoniformes and Neoteleostei, and the branch leading to Otocephala HIF1Ab (Fig. 3a).
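As a consistency check on the gene-wide statistics, the reported p-values can be compared with the LRT values. The sketch below assumes that each p-value equals half of the upper tail of a chi-square distribution with two degrees of freedom; this assumption is inferred from the numbers reported above and is not a statement about how BUSTED is implemented. The function name is hypothetical.

```python
import math

def half_chi2_df2_tail(lrt):
    """0.5 * P(chi-square with 2 df > lrt); the 2-df upper tail is exp(-lrt / 2)."""
    return 0.5 * math.exp(-lrt / 2.0)

# LRT statistics as reported in the text
reported_lrt = {"HIF1A": 46.754, "HIF2A": 175.805, "HIF3A": 150.118, "HIF4A": 9.001}

for gene, lrt in reported_lrt.items():
    print(gene, f"{half_chi2_df2_tail(lrt):.3g}")
```

Under this assumption, the values for HIF1A and HIF4A round to the reported 3.52e−11 and 0.006, and the values for HIF2A and HIF3A are far below typical reporting precision, consistent with their being reported as 0.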

Codons potentially under episodic or pervasive positive selection were detected for the four HIFA genes (Table 3). For all HIFA genes, episodic positive selection occurred more frequently than pervasive positive selection. MEME44 identified 25 sites under episodic positive selection for HIF1A, 41 sites for HIF2A, 34 sites for HIF3A, and 18 sites for HIF4A. Of this total of 118 sites, 60 also had BUSTED evidence ratios greater than two, providing further support that these sites have experienced episodic positive selection41. FEL45 showed that pervasive positive selection has acted on one site for HIF1A, two sites for HIF2A, two sites for HIF3A, and six sites for HIF4A. Two additional tests of pervasive selection, SLAC45 and FUBAR46, corroborated these results for one site in HIF1A and one site in HIF2A. Together, these analyses identified at least one site in each HIFA gene that has experienced both episodic and pervasive positive selection.
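The site counts reported above can be cross-checked with a short tally. All numbers are taken from this paragraph; the variable names are illustrative only.

```python
# Sites under episodic positive selection (MEME), per gene, as reported above
meme_sites = {"HIF1A": 25, "HIF2A": 41, "HIF3A": 34, "HIF4A": 18}
# Sites under pervasive positive selection (FEL), per gene, as reported above
fel_sites = {"HIF1A": 1, "HIF2A": 2, "HIF3A": 2, "HIF4A": 6}
# MEME sites that also had BUSTED evidence ratios greater than two
busted_supported = 60

total_meme = sum(meme_sites.values())
total_fel = sum(fel_sites.values())

print(total_meme)                                  # 118
print(total_fel)                                   # 11
print(round(100 * busted_supported / total_meme))  # 51 (percent)
```

The tally reproduces the stated total of 118 episodic sites and shows that roughly half of them carry additional BUSTED support, while pervasive sites are about tenfold fewer.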

For each HIFA gene, the amino acid residues aligning with sites putatively under positive selection (Supplemental Table S3) were scored by their physicochemical properties47. Species were then grouped by discriminant analyses of principal components based upon these properties48,49. In agreement with the aBSREL analyses (Fig. 3), the physicochemical properties of these positively selected sites tended to group according to the species’ phylogenetic placement (Supplemental Fig. S5). In addition, these analyses distinguished the teleost-specific paralogs (HIF1Aa/b and HIF2Aa/b) from one another, with a few exceptions (e.g., HIF1Ab from the common carp, Cyprinus carpio, grouped with HIF1Aa from more-derived fishes). For HIF3A and HIF4A, the paralogs from Salmoniformes constituted a distinct group from the corresponding paralogs in other Actinopterygii. This analysis, however, did not discriminate between salmonid-specific paralogs for HIF1Aa, HIF2Aa, or HIF3A, likely reflecting their relatively recent origin.

### Structural modeling of HIFα amino acid variation

Sites potentially experiencing positive selection fell within protein domains responsible for DNA-binding (bHLH), protein dimerization (PAS-A and PAS-B), protein stability (NODD and CODD), or activation of target genes (NTAD and CTAD) for each HIFα subunit (Fig. 4). For HIF1α and HIF2α, we made structural models of their N-terminal halves based on the corresponding regions from mammalian HIF1α and HIF2α50 and mapped the sites identified as being under positive selection for these two subunits. Only five of the 25 sites in HIF1α potentially under positive selection are found in this region of the protein (Fig. 4). Of these, one was the first amino acid of the bHLH domain, two fell in the PAS-B domain, and two were in loops connecting the major structural domains (Supplemental Fig. S6). For HIF2α, on the other hand, more than half of the 42 sites potentially under positive selection occur in the N-terminal half of the protein (Fig. 4), 23 of which mapped to a structural model of HIF2α (Fig. 5). The majority of these sites are in the PAS-A and PAS-B domains, and include five residues in the PAS-B domain that directly or indirectly interact with HIFβ (ARNT) in mammals50. Although structural models were not made for HIF3α or HIF4α, positively selected sites were found in the bHLH (HIF4α) or PAS domains (HIF3α and HIF4α) (Fig. 4). In addition, 13 of the 34 sites potentially under positive selection in HIF3α fell in the C-terminal leucine zipper (LZIP) domain specific to this HIFα subunit (Fig. 4). Collectively, these results suggest that amino acids involved in DNA-binding or protein dimerization may be under positive selection in Actinopterygii.

We next asked whether positive selection may have occurred at the N-terminal and C-terminal oxygen-dependent degradation domains (NODD and CODD) that are potential targets of regulation by prolyl hydroxylases8 and the extreme C-terminal CEVN motif targeted by asparaginyl hydroxylase51. Across HIFα subunits, both the NODD and CODD were highly conserved, with two notable exceptions. First, two sites aligning with alanine and proline in the canonical hydroxylation motif of LxxLAP in the NODD of HIF3α (MSA codons 463 and 464, Table 3) are putative targets of positive selection. Second, the NODD is absent in HIF4α. These observations support suggestions that the NODD may be less critical than the CODD in determining the oxygen-dependence of HIFα subunit degradation8,52. The current analysis also confirms that HIF3α lacks the asparaginyl hydroxylation motif, CEVN21,53. Moreover, the asparagine targeted by hydroxylation is absent in salmonid HIF4α, which has threonine at this position (Supplemental Table S3).
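The motif checks described above can be sketched with a short pattern search. The example assumes protein sequences are plain one-letter strings; the two toy sequences are hypothetical and were constructed only to show an intact and a disrupted motif, and the function names are illustrative.

```python
import re

# Canonical prolyl hydroxylation motif LxxLAP, where x is any residue
LXXLAP = re.compile(r"L..LAP")

def hydroxylation_motifs(protein):
    """Return (1-based start, matched text) for each LxxLAP occurrence."""
    return [(m.start() + 1, m.group()) for m in LXXLAP.finditer(protein)]

def ends_with_cevn(protein):
    """True if the extreme C-terminus carries the CEVN motif."""
    return protein.endswith("CEVN")

# Hypothetical toy sequences, not real HIF alpha proteins
intact = "MSTLEELAPYIPMDGEDFQLRCEVN"
variant = "MSTLEELSAYIPMDGEDFQLRCEVT"

print(hydroxylation_motifs(intact), ends_with_cevn(intact))    # [(4, 'LEELAP')] True
print(hydroxylation_motifs(variant), ends_with_cevn(variant))  # [] False
```

In the toy variant, substitutions at the alanine and proline positions abolish the LxxLAP match, and a terminal threonine in place of asparagine removes the CEVN motif, mirroring the two kinds of change described for HIF3α and salmonid HIF4α.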

Although prolyl and asparaginyl hydroxylation are critical to the stability and transcriptional activity of HIFα subunits, respectively, the protein subunits are subject to a variety of other post-translational modifications (PTM) in mammals54,55. Accordingly, we determined whether the sites potentially under positive selection in HIF1α and HIF2α from fishes aligned to sites of known PTM in humans. For HIF1α, one site (MSA codon 62) aligned with a lysine in humans (K11), which, when acetylated, blocks proteasomal degradation54,56. Four other sites (MSA codons 609, 695, 761, and 774) aligned with sites that are phosphorylated in human HeLa cells (S484, S581, S657, S664)55. At one of these sites (MSA codon 695), the residue in fishes is not phosphorylatable. For HIF2α, only one site (MSA codon 1093) aligned with a residue that is subject to phosphorylation in humans (S790). Finally, although not a site identified as under positive selection, it is relevant to note that the site that aligns with mammalian S31 is glycine in most fishes. In mammals, this residue is phosphorylated under hypoxia and may reduce the transcriptional activity of HIF155. As documented by Daly et al.55, and substantiated here, only primitive fishes have a serine at this location, suggesting that this potential mechanism of transcriptional regulation has been lost in more-derived species of fish.
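Relating an alignment column (MSA codon) to a residue number in a reference protein amounts to counting ungapped positions. The sketch below assumes aligned sequences are equal-length strings with `-` as the gap character; the two toy sequences are hypothetical and are not drawn from the alignment used here, and serine, threonine, and tyrosine are taken as the phosphorylatable residues.

```python
PHOSPHORYLATABLE = set("STY")  # serine, threonine, tyrosine

def column_to_residue(aligned_seq, column):
    """Map a 1-based alignment column to a 1-based residue number (None at a gap)."""
    if aligned_seq[column - 1] == "-":
        return None
    return sum(1 for char in aligned_seq[:column] if char != "-")

def is_phosphorylatable(aligned_seq, column):
    """True if the residue at this alignment column can carry a phosphate group."""
    return aligned_seq[column - 1] in PHOSPHORYLATABLE

# Hypothetical toy alignment (reference on top, fish sequence below)
reference = "MK--SLA-S"
fish      = "MRQASLGAG"

print(column_to_residue(reference, 9))   # 6
print(column_to_residue(reference, 3))   # None
print(is_phosphorylatable(reference, 9)) # True
print(is_phosphorylatable(fish, 9))      # False
```

Column 9 of the toy alignment corresponds to residue 6 of the reference, a serine, whereas the fish sequence carries glycine at the same column and therefore could not be phosphorylated there.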

### Transcript analyses

The PhyloFish database57 was queried for HIFA transcripts in multiple tissues across a broad sampling of ray-finned fishes. In general, HIF1A demonstrated the broadest tissue distribution, being higher, on average, than the other HIFA paralogs in most tissues represented in the PhyloFish database (Fig. 6; Supplemental Table S4). Frequently, the highest levels of HIF1A transcripts were found in heart. HIF2A was more restricted in its distribution and, in many species, it was the most abundant paralog in gill. HIF3A was expressed at substantial levels in many tissues, being the most highly expressed paralog in embryo in several species. In those species having the HIF4A gene, its expression was low and limited to a few tissues (e.g., heart, gill, kidney, and bone).
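The comparisons made in this paragraph, namely which paralog dominates a given tissue and which is highest on average, can be sketched as follows. The expression values are hypothetical numbers in arbitrary units chosen only to illustrate the calculation; they are not PhyloFish data, and the function names are illustrative.

```python
# Hypothetical expression values (arbitrary units); not PhyloFish data
expression = {
    "heart":  {"HIF1A": 40.0, "HIF2A": 12.0, "HIF3A": 9.0,  "HIF4A": 1.0},
    "gill":   {"HIF1A": 20.0, "HIF2A": 35.0, "HIF3A": 8.0,  "HIF4A": 0.5},
    "embryo": {"HIF1A": 10.0, "HIF2A": 3.0,  "HIF3A": 15.0, "HIF4A": 0.0},
}

def top_paralog(levels):
    """Return the paralog with the highest expression in one tissue."""
    return max(levels, key=levels.get)

def mean_level(paralog):
    """Average expression of one paralog across all tissues."""
    return sum(levels[paralog] for levels in expression.values()) / len(expression)

for tissue, levels in expression.items():
    print(tissue, top_paralog(levels))

print(max(["HIF1A", "HIF2A", "HIF3A", "HIF4A"], key=mean_level))
```

With these illustrative values, a different paralog is most abundant in each tissue, while HIF1A has the highest mean across tissues.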

The Otocephala are the only lineage of ray-finned fishes to retain both teleost-specific paralogs of HIF1A (see above). While other lineages exclusively express HIF1Aa, Otocephala express HIF1Ab more highly and broadly across tissues, with only very low expression of HIF1Aa (Fig. 6b). On the other hand, Otocephala are like other fishes in expressing more HIF2Aa than HIF2Ab, especially in gill. In Otocephala, HIF2Ab encodes a “full-length” protein and it was also expressed in gill. Among the Salmoniformes, which have salmonid-specific duplicates of HIF1Aa, HIF2Aa, and HIF3A in their genomes, one paralog of each was preferentially expressed (Fig. 6c). HIF1Aa_s1 was broadly distributed and most abundant in heart; HIF2Aa_s1 was largely restricted to gill; HIF3A_s1 was detected at low levels in many tissues and was the most abundant paralog in embryo. In each case, the expression of the other paralog (s2) was lower and showed similar tissue distribution.

HIFA expression in more derived species reflected broad tissue distribution of HIF1Aa and HIF3A, with more restricted, gill-specific expression of HIF2Aa, as seen in other species (Fig. 6d). The truncated form of HIF2Ab, which is present in the genomes of several fish lineages (see above), was not recovered in the PhyloFish database. The transcript for HIF4A was not found in any Neoteleostei represented in the PhyloFish database, consistent with its absence from genomes of more-derived Actinopterygii.

## Discussion

### Genome duplication and HIFA diversity among ray-finned fishes

Two rounds of genome duplication in the ancestor of vertebrates, followed by additional genome duplication during the evolution of ray-finned fishes, expanded certain gene families, including those encoding HIF, a master regulator of oxygen-dependent gene expression in animals. The present analyses of HIFA genes in Actinopterygii revealed that several lineages retain four paralogs predicted from two rounds of genome duplication at the base of vertebrate evolution. The current results suggest that several sequences formerly described as “HIFA-like” or “HIF1A-like”32 should be recognized as either HIF3A or HIF4A. Although HIF3A has been previously described in vertebrates, including fish21,26,58, there has been no formal recognition of HIF4A in any vertebrate animal. Rytkonen et al.26 presented evidence that certain fishes possess HIF3Ab, a putative teleost-specific duplicate of HIF3A. We show that this gene occurs in the genome of spotted gar, representing a lineage of ray-finned fishes that diverged prior to the TGD. Thus, it is properly designated as HIF4A. This gene is found in all ray-finned fishes examined here with the exception of the more-derived Neoteleostei. Thus, HIF4A, a heretofore “missing Ohnolog”, is widely, but not uniformly, distributed among Actinopterygii.

Consistent with Rytkonen et al.26, we found teleost-specific paralogs of HIF1A and HIF2A in several lineages. For HIF1A, we present evidence that all Otocephala, not just the Cyprinidae, retain both paralogs. Similarly, duplicated “full-length” forms of HIF2A are present in all Otocephala examined here. This is significant because some species in the Otocephala are not particularly tolerant of low oxygen, meaning that retention of teleost-specific duplicates does not necessarily confer hypoxia tolerance. Rather, having duplicated HIFA genes could have permitted the evolution of hypoxia tolerance in certain lineages (e.g., cyprinids) given the proper ecological context (e.g., persistent or recurrent aquatic hypoxia)31. In addition, we found that a truncated form of HIF2Ab is more broadly distributed among ray-finned fishes than previously appreciated26. Only a single HIF3A and HIF4A were recovered in the species examined here, however, arguing that one teleost-specific duplicate was quickly lost after the TGD, consistent with the notion that nonfunctionalization is the most common fate of one paralog after gene duplication30. We also report, for the first time, the presence of duplicates of HIF1Aa, HIF2Aa, and HIF3A in Salmoniformes, which likely arose from the SGD.

### A phylogenetically-based gene nomenclature

The duplication of HIFA genes during the TGD followed by the subsequent lineage-specific loss of various paralogs has given rise to an inconsistent nomenclature. Herein, we adopt two naming conventions that recognize the evolutionary relationships of teleost-specific paralogs59. First, when one or more lineages retain both paralogs, the “a” form is the one that shares more flanking genes with the ancestral (gar) form. Applying this rule to HIF1A results in HIF1Aa and HIF1Ab that conform to the gene names currently recognized. For HIF2A, however, conclusions based upon synteny differ from the names of HIF2A paralogs in some fishes, most notably zebrafish, Danio rerio. Here, we show that in most species, including zebrafish, the paralog we propose as HIF2Aa shares more syntenic genes with gar HIF2A than the other paralog. Moreover, this paralog was the first HIF2A described in fishes60 and, when teleost-specific paralogs were initially described, it was referred to as HIF2Aa26. The paralog we propose as HIF2Ab shares fewer flanking genes with gar, is less broadly expressed among tissues and species, and is predicted to encode a truncated protein in fishes other than Otocephala26,39. In zebrafish, the “a” and “b” designations are reversed. That is, the paralog that we suggest is properly designated HIF2Aa is hif2ab (or epas1b) in zebrafish (located on chromosome 13) and the paralog we propose as HIF2Ab is hif2aa (or epas1a) in zebrafish (located on chromosome 12). This discrepancy arises because in zebrafish, “the a or b suffix does not indicate primacy of publication and will be assigned purely based on the suffix of the surrounding genes” (https://zfin.atlassian.net/wiki/spaces/general/overview). Gasanov et al.59 suggest that this convention lacks phylogenetic context and should be revisited as the syntenic relationships between individual paralogs and ancestral fishes are elucidated, as we have now done for HIF2A. Until there is consensus, however, great care will be needed when interpreting reports of paralog-specific differences in HIF2A.

The second naming convention applies when no lineage retains both paralogs, as observed for HIF3A and HIF4A in the current study. In these cases, the relationship of the paralog that has been retained in different lineages is not certain. Although one might expect that all extant species retained the same teleost-specific duplicate, it is possible that one lineage retained one duplicate and another lineage retained the other copy (i.e., reciprocal silencing30). Indeed, the very limited shared gene order between Otocephala HIF3A and HIF3A in other ray-finned fishes suggests that might have occurred for this gene. Without clear evidence of the relationships of the paralogs to the ancestral form, the use of “a” and “b” should be avoided59.

Finally, the same conventions can be applied to paralogs arising during other gene duplication events, for example the SGD. Here, we propose that duplicates of HIF1Aa, HIF2Aa, and HIF3A that share more flanking genes with the sister group, Esociformes, be recognized as the “s1” paralog and the duplicate that shares fewer flanking genes be the “s2” paralog.
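The synteny-based naming rule proposed in this section can be expressed as a small decision procedure. The sketch below assumes that the number of flanking genes shared with the reference (gar for the TGD, Esociformes for the SGD) has already been counted for each of two paralogs; the counts and paralog identifiers are hypothetical, and the function is an illustration of the rule rather than a tool used in this study.

```python
def assign_suffixes(shared_counts, labels=("a", "b")):
    """Label two paralogs by the number of flanking genes shared with a reference.

    The paralog sharing more flanking genes receives labels[0]. On a tie, no
    label is assigned, reflecting that the relationship is unresolved.
    """
    if len(shared_counts) != 2:
        raise ValueError("exactly two paralogs are required")
    (first_id, first_n), (second_id, second_n) = shared_counts.items()
    if first_n == second_n:
        return {first_id: None, second_id: None}
    if first_n > second_n:
        return {first_id: labels[0], second_id: labels[1]}
    return {first_id: labels[1], second_id: labels[0]}

# Hypothetical counts of flanking genes shared with the reference species
print(assign_suffixes({"copy_x": 4, "copy_y": 9}))
# {'copy_x': 'b', 'copy_y': 'a'}
print(assign_suffixes({"copy_x": 7, "copy_y": 2}, labels=("s1", "s2")))
# {'copy_x': 's1', 'copy_y': 's2'}
```

Returning no label on a tie corresponds to the second convention above: without clear evidence of the relationship to the ancestral form, a suffix is withheld.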

### Potential causes and functional consequences of HIFA diversity

In our comparison of the strength of natural selection acting on HIFA genes, we found all four clades had very low rates of nonsynonymous to synonymous substitutions (dN/dS; ω), suggesting that HIFA is subject to purifying selection. This result is consistent with natural selection acting to conserve the sequences of critical regulatory proteins, including transcription factors, and it agrees with previous studies reporting low values of ω for HIFA genes in fishes16,32,61,62,63. In addition, we found that values of ω were not statistically different when comparing gene clades to one another (i.e., HIF1A, HIF2A, HIF3A, and HIF4A). This result differs from that of Rytkonen et al.16, who reported that, among fishes, ω was equivalent for HIF1A and HIF2A and slightly, but significantly, lower than that for HIF3A. Based upon this, Rytkonen et al.16 proposed that HIF3A was evolving under relaxed purifying selection or adaptive positive selection. Our study differs from Rytkonen et al.16 in many ways, including the number and specific sequences used, the species designated as outgroup, and our grouping of some sequences formerly classified as HIF3A as HIF4A (see above). Despite these differences, the studies are similar in the conclusion that a major theme in HIFA evolution is one of purifying selection.

Against this backdrop of purifying selection, however, we found evidence of widespread episodic positive selection when each HIFA clade was independently evaluated with gene-wide and codon-based tests of positive selection. This finding is consistent with the idea that natural selection is episodic, but the strength of this signal may be overshadowed by strong purifying selection acting on other branches44. Our findings are in general agreement with studies on cyprinid fishes showing positive selection acting on specific genes or lineages23,26,62,64. Although we did not formally test whether teleost-specific paralogs are experiencing differing rates of selection (cf.26), branches leading to HIF1Aa and HIF1Ab were characterized by having a high proportion of sites under significant positive selection. Grouping HIFA genes according to the physicochemical properties of their deduced amino acid sequences provided further evidence of divergence between teleost-specific paralogs of HIF1A and HIF2A. For HIF3A and HIF4A, gene-wide and codon-based tests showed significant positive selection in branches leading to major taxonomic groups (e.g., Salmoniformes), which was likewise supported by divergence in the physicochemical properties of the translated proteins.

When the sites putatively under positive selection were mapped to the respective subunits’ sequences, several fell within conserved protein domains. Consistent with studies in fishes and vertebrates in general, several sites potentially under positive selection in HIFα subunits occur in the PAS domains, which are involved in DNA binding and subunit dimerization15,16,32,64. Pamenter et al.15 reviewed the amino acid sites diverging in high-altitude species or populations of terrestrial vertebrates, mainly mammals. Similar to the results reported here, they found more divergent amino acid sites in HIF2α than in HIF1α, with a preponderance of those sites in the PAS domains. Amino acid variation in the PAS domains is speculated to affect dimerization with HIF1β (ARNT), post-translational modification, and transcriptional activation15. Intriguingly, sites potentially under positive selection in HIF2A resolved in the current study mapped to two amino acids that contact HIF1β in mammals and three other sites that bind to compounds interfering with subunit dimerization50. Other sites in ray-finned fish HIFA genes that appear to be under positive selection mapped to amino acids that are subject to post-translational modification in human HIF1α or HIF2α, alter the sequence of a canonical prolyl hydroxylation domain in HIF3α, or mutate the target of asparaginyl hydroxylation in HIF4α. Whether the amino acid variation we report here affects HIFα protein stability or function, as reported for mammalian HIFα subunits15, remains largely unexplored in fishes.

### Tissue expression of HIFA in Actinopterygii

Our survey of HIFA expression across Actinopterygii supports the idea that HIF1A is broadly expressed across tissues and that HIF2A is more restricted in its distribution. Interestingly, the tissue showing highest HIF2A expression levels is gill. While elevated levels of HIF2A in gill have been documented in single-species studies39,65,66, our results show that this pattern is broadly distributed among ray-finned fishes. Recently, Pan et al.67 showed that HIF2A is highly expressed in neuroepithelial cells of zebrafish gill, suggesting it might play a role in oxygen sensing by this tissue, analogous to its role in mammalian carotid body68. In addition, gills are highly vascularized, and the presence of HIF2A transcripts could reflect a large proportion of endothelial cells, which are known to express HIF2A in mammals. The current results also demonstrate that HIF3A is expressed at substantial levels in several tissues, being the most highly expressed HIFA in embryos in many species. Previous studies have shown that HIF3A is broadly expressed among fish tissues including embryos39,52,58,66,69. In zebrafish embryos, Kopp et al.69 documented an increase in HIF3A transcripts during exercise, and Zhang et al.52 demonstrated that HIF3A acts as a hypoxia-dependent transcriptional activator during early zebrafish development. Across all Actinopterygii, HIF4A was the least expressed HIFA transcript. In general, the level of HIF4A declined from primitive to more-derived species, being lost from the genome and, consequently, not expressed in Neoteleostei. This pattern is consistent with nonfunctionalization of HIF4A during the evolution of ray-finned fishes. Of note, HIF4A is also missing from the genomes of other vertebrates, suggesting it has been nonfunctionalized in these lineages as well32.

The current survey of HIFA expression highlights processes that may serve to maintain paralogs arising from the TGD. It has been argued that subfunctionalization has been an important force in maintaining both teleost-specific duplicates of HIF1A and HIF2A in zebrafish, a member of the Otocephala26. In the case of HIF1A, one striking result is that HIF1Ab is highly expressed across a broad array of tissues in Otocephala, a pattern displayed by the other paralog, HIF1Aa, in other Actinopterygii. In Otocephala, this might have allowed HIF1Aa to assume a different role, for example during development26; such subdivision of functions was not possible in other Actinopterygii that lost HIF1Ab. In Otocephala, levels of HIF2Aa transcripts were higher than HIF2Ab, as previously reported for zebrafish66, although still limited in its tissue distribution (see above). Although levels of HIF2Ab transcripts were quite low in Otocephala, they are reported to respond robustly to low oxygen exposure, at least in zebrafish26. The truncated transcript predicted from HIF2Ab in other Actinopterygii was not recovered from the PhyloFish database, but it has been found in transcriptomic studies, albeit at very low levels39,70. Because tissues used to generate the RNA for the PhyloFish database were from fish held under standard laboratory conditions, we cannot rule out the possibility that the expression of “truncated” HIF2Ab increases under other conditions (e.g., hypoxia) or at different developmental stages.

For duplicated forms of HIFA arising from the SGD, one paralog each of HIF1Aa, HIF2Aa, and HIF3A was more highly expressed than the other across all tissues. The observation that the other paralog was expressed in the same tissues, albeit at lower levels, suggests that differing tissue specificity does not account for the maintenance of both duplicates. As mentioned above, RNA was derived from a limited number of individuals sampled under relatively benign conditions, and the less-expressed paralog may be upregulated under different environmental conditions or at different developmental stages. However, because salmonids generally occur in well-oxygenated habitats and have poor hypoxia tolerance, there does not appear to be a link between HIFA duplication and hypoxia tolerance in this group. This suggestion is supported by the observation that Northern pike, from the sister group to Salmoniformes that diverged prior to the SGD, lacks the salmonid-specific duplicates but is more hypoxia tolerant than salmonids71. It is possible that these duplicates play other roles in salmonid physiology or life history, and future research is needed to evaluate whether subfunctionalization or neofunctionalization is playing a role in maintaining these salmonid-specific duplicates. Alternatively, they may be destined for nonfunctionalization, a process that is likely still underway in this lineage given the recency of the SGD33.

## Conclusions

Here, we demonstrate that the diversity of HIFA genes in Actinopterygii is greater than previously appreciated, provide evidence that episodic positive selection is involved in generating this diversity, and report paralog- and tissue-specific HIFA expression levels. The current results present several opportunities for future research on HIFA in fishes. For example, tolerance to hypoxia measured at the organismal level demonstrates a strong phylogenetic signal among ray-finned fishes72, and future research could assess whether hypoxia-tolerant lineages are associated with the specific amino acid variants in HIFα subunits reported here. Furthermore, in fishes as in other vertebrates, HIF1A has received considerably more attention than the other HIFA paralogs. The tissue expression of HIF2A and HIF3A suggests that these paralogs may play critical roles in specific tissues, gill and embryo, respectively, that warrant further study. We also document that a truncated form of HIF2Ab is widespread in the genomes of ray-finned fishes. If transcribed and translated, the predicted protein product would have characteristics that could allow it to negatively regulate oxygen-dependent gene expression, as demonstrated for splice variants of mammalian HIF3A19,21. Finally, there is increasing appreciation of the hypoxia-independent roles of HIF signaling23,25. Perhaps some of the diversity in HIFA among ray-finned fishes is explained by functions other than regulation of oxygen-dependent gene expression. The current genomic and transcriptomic analyses may serve as a roadmap for the continued study of HIF signaling during normal fish development and physiology, as well as in the response of fishes to increasingly challenging environments.

## Methods

### Data

All actinopterygian genomes available at NCBI (https://www.ncbi.nlm.nih.gov) or Ensembl (http://www.ensembl.org/) through June 2020 were searched for HIFA genes using known or putative HIF1A, HIF2A, HIF3A, and HIFA-like transcript sequences. The corresponding coding sequences (CDS) were checked to ensure each was complete and, when multiple CDSs were available for a single locus, the longest sequence was retained. This resulted in a list of 122 sequences from 24 species. Sequences for two species (Cynoglossus semilaevis and Maylandia zebra) were not included because they were less well annotated and did not substantially contribute to the taxonomic breadth represented by the other eight Neoteleostei. The final sequence list included 114 sequences from 22 actinopterygian species plus one sequence from the sea squirt, Ciona intestinalis, as an outgroup (see Supplemental Table S1). Three-dimensional protein structural models were based upon mouse HIF1α-ARNT-DNA (4ZPR) and HIF2α-ARNT-DNA (4ZPK) crystal structures from the Protein Data Bank (http://www.wwpdb.org/)50. Data for the analyses of HIFA transcript abundance were obtained from the PhyloFish database (http://phylofish.sigenae.org/index.html)57.
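The rule of keeping one complete, longest CDS per locus can be illustrated with a minimal Python sketch. The completeness test used here (an ATG start, a terminal stop codon, and a length divisible by three) is an assumption made for illustration; the text does not specify the criteria that were applied. The function names and the record layout are likewise hypothetical.

```python
from typing import Dict, Iterable, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_cds(seq: str) -> bool:
    """Assumed completeness heuristic: ATG start, stop codon at the end, whole codons."""
    seq = seq.upper()
    return (
        len(seq) >= 6
        and len(seq) % 3 == 0
        and seq.startswith("ATG")
        and seq[-3:] in STOP_CODONS
    )


def longest_cds_per_locus(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[str, str]]:
    """Map each locus to the (transcript_id, cds) of its longest complete CDS.

    `records` yields (locus_id, transcript_id, cds) tuples; this layout is hypothetical.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for locus, transcript_id, cds in records:
        if not is_complete_cds(cds):
            continue  # drop incomplete sequences before comparing lengths
        if locus not in best or len(cds) > len(best[locus][1]):
            best[locus] = (transcript_id, cds)
    return best


# Toy records (made-up loci and sequences, for illustration only)
toy = [
    ("locA", "t1", "ATGAAATAA"),
    ("locA", "t2", "ATGAAACCCTAA"),
    ("locA", "t3", "ATGAAACCCGGG"),  # no stop codon, so treated as incomplete
    ("locB", "t4", "ATGTGA"),
]
selected = longest_cds_per_locus(toy)
```

Ties in length are resolved in favor of the first record encountered, which is an arbitrary choice of this sketch.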

### Phylogenetic and synteny analyses

Multiple sequence alignments (MSA) were made using MAFFT version 7.123b73,74 implemented through the GUIDANCE2 server with Max-iterate of 20 (http://guidance.tau.ac.il/ver2/). For phylogenetic analyses, we used the MSA after removing columns with confidence scores below the default cutoff (0.93)75,76. Bayesian analyses were performed in BEAST 2 v2.6.1 (https://www.beast2.org/) applying an uncorrelated log-normal relaxed molecular clock model77 and a Yule model prior, with a chain length of 10,000,000 and 100,000 burn-in78,79. The best-fit model of nucleotide substitution was identified as the GTR+I+G model by both jModelTest280,81 and ModelTest-NG82,83. Therefore, nucleotide analyses employed a general time reversible codon substitution model allowing for invariants and six gamma categories (GTR+I+G). Analyses of amino acid sequences employed a JTT matrix-based model84 allowing for invariants and six gamma categories, following a recent analysis of metazoan HIF-family proteins32. The maximum clade credibility tree was selected using TreeAnnotator v2.6.085. Maximum likelihood analyses were performed using MEGAX v10.1.886 with 100 bootstrap replicates87 under the same conditions used in Bayesian analyses. The trees with the highest log-likelihood were visualized and edited in FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).

Shared synteny among representative species was assessed by determining the 10 deduced open reading frames (ORF) upstream and downstream of each putative HIFA gene using the NCBI Graphical Sequence Viewer (v3.38.0). If a putative ORF lacked a clear identification, BLASTP was used to compare the deduced protein sequence against Actinopterygii. For genes lacking an abbreviation at NCBI, the gene name was used in a search of UniprotKB (https://www.uniprot.org), and the corresponding abbreviation was used. A small number of putative ORFs could not be identified and were kept in the analysis as “unknowns”.
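The synteny comparison described above was carried out with the NCBI Graphical Sequence Viewer. The following Python sketch only illustrates the underlying logic of collecting up to 10 flanking ORFs on each side of a focal gene and intersecting the neighborhoods of two species. The function names, the gene labels, and the choice to exclude "unknown" ORFs from the comparison are assumptions of this sketch, not a description of the procedure used.

```python
from typing import List, Set, Tuple


def flanking_genes(
    ordered_genes: List[str], focal: str, n: int = 10
) -> Tuple[List[str], List[str]]:
    """Return up to n genes upstream and up to n genes downstream of the focal gene."""
    i = ordered_genes.index(focal)
    upstream = ordered_genes[max(0, i - n):i]
    downstream = ordered_genes[i + 1:i + 1 + n]
    return upstream, downstream


def shared_neighbors(
    genes_a: List[str], focal_a: str,
    genes_b: List[str], focal_b: str,
    n: int = 10,
) -> Set[str]:
    """Gene abbreviations found within n ORFs of the focal gene in both species."""
    up_a, down_a = flanking_genes(genes_a, focal_a, n)
    up_b, down_b = flanking_genes(genes_b, focal_b, n)
    # Unidentified ORFs carry no information about shared ancestry, so skip them here
    neighborhood_a = set(up_a + down_a) - {"unknown"}
    neighborhood_b = set(up_b + down_b) - {"unknown"}
    return neighborhood_a & neighborhood_b


# Toy gene orders (hypothetical labels), using a window of 2 for brevity
species_a = ["g1", "g2", "hif1a", "g3", "unknown"]
species_b = ["g2", "hif1a", "g3", "g9"]
shared = shared_neighbors(species_a, "hif1a", species_b, "hif1a", n=2)
```

This version ignores gene order and orientation within the window; a stricter comparison could also require that shared neighbors occur on the same side of the focal gene.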

### Selection analyses

Translation alignments of full-length HIFA nucleotide sequences for selection analyses were created in Geneious v11.1.5 (https://www.geneious.com) using Clustal W88 alignment and BLOSUM89 substitution matrix. For each data set, we inferred the maximum likelihood gene tree using rapid hill-climbing mode in RAxML v8.2.090 as implemented through the CIPRES Science Gateway91. This was accomplished by drawing bipartition information on the best tree from 100 trees using the GTRGAMMA substitution model based on 1000 non-parametric bootstrap replicates. We replaced characters for frameshifts and stop codons, as required for selection analyses, with the exportAlignment program in MACSE v2.0092. Evolutionary selection analyses were conducted using branch models in EasyCodeML40 to explore differences in dN/dS (ω) ratios among HIFA gene clades. Four branch models were performed independently by selecting a particular HIFA gene clade as the foreground (e.g., HIF1A) and remaining clades as the background. Nested models were compared using likelihood-ratio tests (LRT)93 to assess significance of log-likelihood ratios between a one‐ratio model (Model 0) that assumes a constant ω throughout the tree and a two‐ratio model (Two-ratio Model 2) that allows ω for foreground branches to differ from branches throughout the rest of the tree94.
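The likelihood-ratio test between the one-ratio and two-ratio models can be sketched in a few lines of Python. The sketch assumes the usual chi-square approximation with one degree of freedom, because the two-ratio model adds a single ω parameter; the log-likelihood values are hypothetical and are not results from this study.

```python
import math
from typing import Tuple


def lrt_one_df(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Likelihood-ratio test for nested models that differ by one free parameter.

    lnl_null: log-likelihood of the one-ratio model (single omega for all branches)
    lnl_alt:  log-likelihood of the two-ratio model (separate foreground omega)
    Returns the test statistic 2 * (lnl_alt - lnl_null) and its p-value.
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # guard against tiny negative values
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, for illustration only
stat, p_value = lrt_one_df(lnl_null=-10000.00, lnl_alt=-9998.08)
```

A p-value below the chosen significance threshold would indicate that allowing a distinct ω for the foreground clade fits the data significantly better than a single ω across the tree.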

Additional tests of gene-wide and codon-based episodic (at a subset of sites or branches) and pervasive (across the whole phylogeny) selection were performed for individual HIFA gene subsets in the HyPhy package95,96 through the Datamonkey webserver97,98,99. To assess whether a gene has experienced positive (diversifying) selection at any site on at least one branch given a phylogeny, we implemented the Branch-site Unrestricted Statistical Test for Episodic Diversification (BUSTED)41. To test whether episodic selection occurred on any branch at a subset of sites in a gene, we used adaptive Branch-Site Random Effects Likelihood (aBSREL)42,43. We also assessed whether individual sites were subject to episodic selection on a proportion of branches using a Mixed Effects Model of Evolution (MEME)44, and pervasive selection with Fixed Effects Likelihood (FEL)45, Fast Unconstrained Bayesian AppRoximation (FUBAR)46, and Single-Likelihood Ancestor Counting (SLAC)45.

For each HIFA clade, we compiled an X-matrix of the amino acids at the sites identified as being putatively under positive selection by the HyPhy selection analyses. The physicochemical properties of each amino acid were scored by five z-descriptors as described by Sandberg et al.47: z1 (hydrophobicity), z2 (steric bulk), z3 (polarity), z4, and z5 (the latter two both related to electronic effects). We used the adegenet package48,49 in RStudio100 to identify the number of clusters across species by applying the k-means algorithm, then performed discriminant analysis of principal components on the minimum number of retained principal components.

### Protein structural modeling

Three-dimensional protein models for actinopterygian HIF1α and HIF2α were derived by structural homology modeling based upon the HIF1α:ARNT:DNA and HIF2α:ARNT:DNA complexes from mouse50. These structures correspond to residues 13-357 of mouse HIF1α (GenBank AAH26139.1) and residues 3-361 of mouse HIF2α (GenBank AAH57870.1), respectively. Three-dimensional models were built with Modeller v10.1, using align2d and the standard single-template "automodel" modeling protocol101. For both HIF1α and HIF2α, five models were produced, and the models with the lowest molpdf and DOPE scores were chosen as representative for further study. The amino acid sites putatively under positive selection in Actinopterygii were mapped to these structures using PyMOL (v2.5.1).

The N- and C-terminal oxygen-dependent degradation domains (NODD and CODD) and the C-terminal asparaginyl hydroxylation motif (CEVN) were identified from multiple sequence alignments in Jalview v2.10.5102. The NODD and CODD included the canonical LxxLAP sequence targeted by prolyl hydroxylases and adjacent residues known to play a role in oxygen-dependent regulation of HIFα103.

### HIFA transcript analyses

The PhyloFish database contains RNA-seq raw counts from multiple tissues for 23 species representing all major lineages of ray-finned fishes57. The tissues represented are brain, liver, gill, heart, skeletal muscle, kidney, bones, intestine, ovary (derived from a single female), testis (derived from a single male), and embryos. Prior to tissue sampling, fish were maintained under standard laboratory conditions (i.e., adequate aeration). Other details of library construction, sequencing, and quality control are found in Pasquier et al.57.

Data were downloaded for 19 species: spotted gar (Lepisosteus oculatus), silver arowana (Osteoglossum bicirrhosum), bowfin (Amia calva), European eel (Anguilla anguilla), Allis shad (Alosa alosa), zebrafish (Danio rerio), panga (Pangasius hypophthalmus), Mexican tetra (Astyanax mexicanus), Northern pike (Esox lucius), Eastern mudminnow (Umbra pygmaea), grayling (Thymallus thymallus), European whitefish (Coregonus lavaretus), brown trout (Salmo trutta), rainbow trout (Oncorhynchus mykiss), brook trout (Salvelinus fontinalis), Ayu sweetfish (Plecoglossus altivelis), Atlantic cod (Gadus morhua), medaka (Oryzias latipes), and European perch (Perca fluviatilis). BLASTn searches with sequences for each HIFA paralog (Supplemental Table S1) were used to find HIFA transcripts for each tissue in each species. Transcripts were normalized for gene length and total reads by determining reads per kilobase of transcript per million mapped reads (RPKM). For each HIFA in each tissue, transcripts per million (TPM) were calculated104 as $$\text{TPM} = 10^{6} \times \frac{\text{RPKM}}{\sum \text{RPKM}}$$. Results are presented as heatmaps of median TPM values for each paralog for each tissue (GraphPad Prism v8.0.0; San Diego, California, USA).
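The normalization from raw counts to RPKM and then to TPM can be written out as a short Python sketch. It assumes that the RPKM sum in the denominator is taken over all transcripts in the same tissue library, which is the conventional definition; the function names and the toy values are hypothetical.

```python
from typing import Dict


def rpkm(read_count: float, length_bp: float, total_mapped_reads: float) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def tpm_from_rpkm(rpkm_by_transcript: Dict[str, float]) -> Dict[str, float]:
    """Rescale the RPKM values of one library so that they sum to one million."""
    total = sum(rpkm_by_transcript.values())
    if total == 0:
        raise ValueError("sum of RPKM is zero; TPM is undefined")
    return {name: 1e6 * value / total for name, value in rpkm_by_transcript.items()}


# Toy library with two transcripts (made-up RPKM values)
toy_tpm = tpm_from_rpkm({"transcript_a": 1.0, "transcript_b": 3.0})
```

Because TPM values within a library always sum to the same total, they are more directly comparable across tissues and species than RPKM values.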

### Ethics approval

This study did not use animal or human subjects.