Main

Teleost fishes represent about half of all living vertebrate species1 and provide important models for human disease (for example, zebrafish and medaka)2,3,4,5,6,7,8,9. Connecting teleost genes and gene functions to human biology (Fig. 1a) can be challenging given (i) the two rounds of early vertebrate genome duplication (VGD1 and VGD2 (ref. 10), but see ref. 11) followed by reciprocal loss of some ohnologs (gene duplicates derived from genome duplication12) in teleosts and tetrapods, including humans13,14; (ii) the teleost genome duplication (TGD), which resulted in duplicates of many human genes15,16; and (iii) rapid teleost sequence evolution17,18, often due to asymmetric rates of ohnolog evolution, that frustrates ortholog identification. To help connect teleost biomedicine to human biology, we sequenced the genome of spotted gar (Lepisosteus oculatus, henceforth 'gar'; Supplementary Fig. 1 and Supplementary Note) because its lineage represents the unduplicated sister group of teleosts19,20 (Fig. 1a).

Figure 1: Spotted gar bridges vertebrate genomes.

(a) Spotted gar is a ray-finned fish that diverged from teleost fishes, including the major biomedical models zebrafish, platyfish, medaka and stickleback, before the TGD. Gar connects teleosts to lobe-finned vertebrates, such as coelacanth, and tetrapods, including human, by clarifying evolution after the two earlier rounds of vertebrate genome duplication (VGD1 and VGD2) that occurred before the divergence of ray-finned and lobe-finned fishes 450 million years ago (MYA). (b) Bayesian phylogeny inferred from an alignment of 97,794 amino acid positions for 243 proteins with a one-to-one orthology ratio from 25 jawed (gnathostome) vertebrates using PhyloBayes under the CAT + GTR + Γ4 model with rooting on cartilaginous fishes. Node support is shown as posterior probability (first number at each node) and bootstrap support from maximum-likelihood analysis (second number at each node) (Supplementary Fig. 6). The tree shows the monophyly and slow evolution of Holostei (gar and bowfin) as compared to their sister lineage, the teleosts (Teleostei). See also the Supplementary Data Set.

Gar informs the evolution of vertebrate genomes and gene functions after genome duplication and illuminates evolutionary mechanisms leading to teleost biodiversity. The gar genome evolved comparatively slowly and clarifies the evolution and orthology of problematic teleost protein-coding and microRNA (miRNA) gene families. Surprisingly, many entire gar chromosomes have been conserved with some tetrapods for 450 million years. Notably, gar facilitates the identification of conserved noncoding elements (CNEs), which are often regulatory, that teleosts and humans share but that are not detected by direct sequence comparisons. Global gene expression analyses show that expression domains and levels for TGD-generated duplicates usually sum to those for the corresponding gar gene, as expected if ancestral regulatory elements were partitioned after the TGD. By illuminating the legacy of genome duplication, the gar genome bridges teleost biology to human health, disease, development, physiology and evolution.

Results

Genome assembly and annotation

The genome of a single adult gar female collected in Louisiana was sequenced to 90× coverage using Illumina technology. The ALLPATHS-LG21 draft assembly covers 945 Mb with quality metrics comparable to those for other vertebrate Illumina assemblies21. To generate a 'chromonome' (chromosome-level genome assembly22), we anchored scaffolds to a meiotic map20, capturing 94% of assembled bases in 29 linkage groups (LGs) (Supplementary Note). Transcriptomes from adult tissues and developmental stages (Supplementary Note) facilitated the construction of a gene set of 21,443 high-confidence protein-coding genes annotated by MAKER23. Ensembl annotation identified 18,328 protein-coding genes (mostly a subset of the MAKER annotations), 42 pseudogenes and 2,595 noncoding RNAs (Supplementary Note), in comparison to 20,296 protein-coding genes in human and 25,642 in zebrafish. About 20% of the gar genome is repetitive, including transposable elements (TEs) representing most lobe-finned and teleost TE superfamilies and a TE profile similar to that of coelacanth24, thus clarifying TE phylogenetic origins (Supplementary Figs. 2–5, Supplementary Tables 1–3 and Supplementary Note).

The gar lineage evolved slowly

Phylogenies of 243 one-to-one orthologs in 25 jawed vertebrates17, including the gar genome and our transcriptome of the bowfin Amia calva (Supplementary Note and Supplementary Data Set), strongly supported the monophyly of Holostei (gar and bowfin) as the sister group to teleosts (Fig. 1b, Supplementary Fig. 6 and Supplementary Note)25,26,27,28, suggesting that morphologies shared by bowfin and teleosts29,30 may be convergent or may be ancestral traits that were altered in the gar lineage.

Darwin applied his term 'living fossil' to 'ganoid fishes', including gars31; indeed, gars show low rates of speciation and phenotypic evolution32. Evolutionary rate analyses using cartilaginous fish outgroups showed that gar and bowfin proteins have evolved significantly more slowly than teleost sequences. Holostei had a substantially shorter branch length to the cartilaginous outgroup than most other bony vertebrates except coelacanth, the slowest evolving bony vertebrate17,33 (Fig. 1b, Supplementary Table 4 and Supplementary Note). Our results support the hypothesis that the TGD could have facilitated the high rate of teleost sequence evolution17,18,34. Gar TEs also showed a low turnover rate as compared to TEs in teleosts, mammals and even coelacanth24 (Supplementary Fig. 5 and Supplementary Note).

Gar informs the evolution of bony vertebrate karyotypes

Gar represents the first chromonome22 of a non-tetrapod, non-teleost jawed vertebrate, allowing for the first time long-range gene order analyses without the confounding effects of the TGD. The gar karyotype (2n = 58) contains both macro- and microchromosomes (Fig. 2a, Supplementary Fig. 7 and Supplementary Note). Aligning gar chromosomes to those of human, chicken and teleosts highlighted distinct conservation of orthologous segments in all species (Fig. 2b–e, Supplementary Figs. 8 and 9, and Supplementary Note). Strikingly, gar-chicken comparisons showed conservation of many entire chromosomes (Fig. 2c). The chicken and gar karyotypes differed only by about 17 large fissions, fusions or translocations. Almost half of the gar karyotype (14/29 chromosomes) showed a nearly one-to-one relationship in gar-chicken comparisons, including macro- and microchromosomes with highly correlated chromosome assembly lengths (Fig. 2d and Supplementary Note). This similarity in chromosome size and gene content is strong evidence that the karyotype of the common bony vertebrate ancestor of gar and chicken possessed both macro- and microchromosomes as Ohno35 hypothesized, consistent with microchromosomes in coelacanth36 and cartilaginous fishes35, for which no chromonomes are yet available.

Figure 2: Spotted gar preserves ancestral genome structure.

(a) The spotted gar karyotype consists of macro- and microchromosomes (see Supplementary Fig. 7 for chromosome annotations). (b) Circos plot99 showing conserved synteny of gar (colored, left) and human (black, right) chromosomes. (c) Gar-chicken comparison shows strong conservation of the genomes over 450 million years and one-to-one synteny conservation for many entire chromosomes, particularly microchromosomes (for example, Loc13 and Gga14, and Loc23 and Gga11). (d) The assembled chromosome lengths for gar and chicken chromosomes with one-to-one conserved synteny are highly correlated (R² = 0.97). (e) Gar-medaka comparison shows the overall one-to-two double-conserved synteny relationship of gar to a post-TGD teleost genome (for example, gar Loc11 corresponds to medaka Ola16 and Ola11). The gar chromosomes are displayed in a different order in d than they are in b and c; asterisks indicate chromosomes inverted with respect to the arbitrarily oriented reference genome. (f) Gar-chicken-medaka comparisons illuminate the karyotype evolution leading to modern teleosts. The genome of the bony vertebrate ancestor contained both macro- and microchromosomes, some of which remain largely conserved in chicken and gar, for example, macrochromosome Loc2-GgaZ and microchromosomes Loc20-Gga15 and Loc21-Gga17. All three chromosomes possess double-conserved synteny with medaka chromosomes Ola9 and Ola12, which is explained by chromosome fusion in the lineage leading to teleosts after divergence from gar, followed by TGD duplication of the fusion chromosome and subsequent intrachromosomal rearrangements and rediploidization. Multiple examples of such pre-TGD chromosome fusions explain the absence of microchromosomes in teleosts. See the Supplementary Note for details.

The gar chromonome also tests the hypothesis that an increase in the number of interchromosomal rearrangements occurred in teleosts after, and possibly as a result of, the TGD20. For each gar chromosome segment, teleosts usually have two ohnologous segments, verifying gar-teleost divergence before the TGD20. Each TGD-derived pair in teleosts usually shows conserved synteny with more than one gar chromosome, indicating rearrangements before the TGD (Fig. 2e, Supplementary Figs. 8 and 9, and Supplementary Note). Gar shares many whole chromosomes with chicken (Fig. 2c) but few with teleosts (Fig. 2e). These results indicate that chromosome fusions thought to have occurred in the ray-finned lineage after divergence from the lobe-finned lineage37 actually occurred in the teleost lineage after divergence from gar but before the TGD (Fig. 2f and Supplementary Fig. 10). This finding explains how spotted gar has more chromosomes (n = 29; Fig. 2a) than typical teleosts (n ≈ 24 or 25; ref. 38) without experiencing the TGD. Comparisons taking the TGD into account further found an average fission and translocation rate in percomorphs (stickleback, medaka and pufferfish) relative to gar that is similar to that in the chicken lineage. Zebrafish had a higher rearrangement rate, even after accounting for the TGD (Supplementary Fig. 11 and Supplementary Note). These comparisons indicate that the TGD might not fully account for high teleost rearrangement rates.

Gar clarifies vertebrate gene family evolution

Lineage-specific loss of ohnologs often followed VGD1, VGD2 and the TGD (Fig. 1a), which complicates the identification of true orthologs22,39 and frustrates the translation of knowledge from teleost biomedical models to human biology13. Gar is uniquely informative because its lineage did not experience the TGD and often retains ancestral VGD1 and VGD2 ohnologs that were reciprocally lost in teleosts and tetrapods, thus clarifying the evolution of gene families involved in vertebrate development, physiology and immunity (Supplementary Note).

Analyses of developmental gene families showed stability in the gar gene repertoire, including for Hox gene clusters (Supplementary Note). Gar has 43 Hox genes organized into four clusters, as expected for an unduplicated ray-finned fish (Supplementary Fig. 12). No Hox gene has been completely lost in gar since divergence from the last common ray-finned ancestor. The hoxd14 gene, missing from teleosts but present in paddlefish40, is recognizable as a pseudogene in gar (Supplementary Fig. 13). In contrast, teleosts have far fewer Hox cluster genes than the 82 expected after genome duplication (for example, zebrafish has 49 genes and stickleback has 46 genes), demonstrating massive Hox gene loss after the TGD. Teleosts lack orthologs of hoxa6 and hoxd2, zebrafish lacks all HoxDb cluster protein-coding genes15 and percomorphs lack the HoxCb cluster41, but gar lacks just one Hox cluster gene from the last common bony vertebrate ancestor (hoxa14), fewer than tetrapods (for example, human has three losses) and coelacanth (two losses) (Supplementary Fig. 12). Gar ParaHox clusters (Supplementary Table 5 and Supplementary Note) are also more complete than those in teleosts and tetrapods, with four clusters containing seven genes. Gar retained cdx2, which highlights a VGD1/VGD2 ohnolog 'gone missing' from teleosts (Supplementary Fig. 14). Gar possesses the VGD1/VGD2 ohnolog pdx2, previously found only in cartilaginous fishes and coelacanth42, indicating that pdx2 was lost independently in teleosts and tetrapods (Supplementary Figs. 14 and 15). Retinoic acid regulates Hox cluster gene expression43, but retinoic acid–synthesizing Aldh enzymes (Supplementary Note) vary in number among vertebrates44: tetrapods have three genes (Aldh1a1, Aldh1a2 and Aldh1a3), zebrafish has two genes (aldh1a2 and aldh1a3) and medaka has just one (aldh1a2)45. Finding all three genes in gar rules out the hypothesis45 that Aldh1a1 was a lobe-finned innovation (Supplementary Fig. 16).

Physiological mechanisms are shared among vertebrates, including light control of circadian rhythms, despite important gene repertoire differences between teleosts and tetrapods46,47. Analyses of gar circadian clock (Supplementary Fig. 17, Supplementary Table 6 and Supplementary Note)48 and opsin (Supplementary Fig. 18, Supplementary Table 7 and Supplementary Note)49 genes link the gene repertoires of teleosts and tetrapods: for example, gar clarifies which circadian genes originated in VGD events and which originated in the TGD event. Gar has pinopsin, present in tetrapods but absent from teleosts, along with exo-rhodopsin, previously thought to compensate for the lack of pinopsin in teleosts50.

Evolution of vertebrate immunity becomes clearer using gar (Supplementary Note). Major histocompatibility complex (MHC) class I and class II genes (Supplementary Figs. 19–21) are tightly linked in tetrapods and cartilaginous fishes but are unlinked in teleosts51,52. In gar, at least one pair of class I and class II genes is linked as in tetrapods53,54, suggesting that gar retains the ancestral configuration, although most gar MHC genes remain on unassembled scaffolds (Supplementary Fig. 21). Gar has some class I genes thought to be teleost specific (Z/P-like, L-like and U/S-like, for example54,55,56; Supplementary Fig. 19) and some class II genes similar to and some distinct from teleost DA/DB and DE lineages (Supplementary Fig. 20). Several gar MHC region genes are on unassembled scaffolds linked to genes whose human orthologs are encoded in the MHC class II or class III region on Hsa6, and some are adjacent to orthologs of teleost MHC class I genes (Supplementary Table 8). The human MHC class III region on Hsa6 has syntenic segments on Hsa1, Hsa9 and Hsa19; these four ohnologs likely arose in VGD1 and VGD2 (ref. 57), as supported by the gar genome (Supplementary Table 8).

Gar immunoglobulin genes (Supplementary Fig. 22) and transcripts generally resemble those of teleosts. Unexpectedly, gar has a second, distinct IgM locus but lacks IgT (IgZ)58,59, thought to provide mucosal immunity60, suggesting that IgT is teleost specific and that gar ganoid scales may suffice for exterior surface protection. Gar T cell receptor genes (Supplementary Fig. 23) are tightly linked as in mammals but, unlike in Xenopus tropicalis61, are downstream of VH and JH segments. Phylogenetic analyses of Toll-like receptor (TLR) genes (Supplementary Fig. 24) in tetrapods, teleosts and gar showed that the 16 identifiable gar TLRs encompass all six major TLR families62. Gar TLRs appear to share evolutionary histories with the TLRs from teleosts and/or tetrapods. Gar encodes Nitr (novel immune-type receptor) genes (Supplementary Fig. 25), which function in allorecognition and were thought to be teleost specific63,64. The 17 gar Nitr genes form 15 families, suggesting few recent tandem duplications or rapid divergence after gene duplication. In sum, the gar immunogenome bridges teleosts to tetrapods.

Gar uncovers evolution of vertebrate mineralized tissues

Bony vertebrates share mineralized tissues (bone, dentin, enameloid and enamel), yet the gene repertoires for the secretory calcium-binding phosphoproteins (Scpp) that form these tissues65,66 differ substantially between teleosts and tetrapods and their evolution remains controversial18,67,68. Gar clarifies understanding of these genes and their evolution because it retains ancient characteristics both in its ganoid scales, which contain ganoin, hypothesized to be a type of enamel69, and in its teeth, which are covered by both enameloid and enamel70 (Supplementary Note). Mammalian genomes were thought to contain the largest number of Scpp genes (human, 23 genes; coelacanth, 14 genes; zebrafish, 15 genes), and only 2 genes (Spp1 and Odam) seemed to be common to lobe-finned vertebrates and teleosts68 (Fig. 3a). We identified 35 Scpp genes in gar in two clusters on LG2 and LG4 (Fig. 3a, Supplementary Fig. 26, Supplementary Table 9 and Supplementary Note), which contain spp1 and odam, respectively. Notably, gar includes orthologs of five Scpp genes previously found only in teleosts and six Scpp genes known only from lobe-finned vertebrates. Another 18 gar Scpp genes have no identified ortholog in either lobe-finned vertebrates or teleosts (Fig. 3a, Supplementary Table 9 and Supplementary Note).

Figure 3: Gar helps connect vertebrate protein-coding and miRNA genes.

(a) Scpp gene arrangements in human, coelacanth, gar and zebrafish including P/Q-rich (red) and acidic (blue) Scpp genes and Sparc-like genes (yellow) (Supplementary Note; ref. 68). Orthologies (gray vertical bars) among lobe-finned vertebrates (for example, human and coelacanth) and teleosts (for example, zebrafish) had previously been limited to Odam and Spp1 genes. Gar connects lineages through orthologs of genes previously known only from either teleosts (scpp1, scpp3, scpp5, scpp7 and scpp9) or lobe-finned vertebrates (enam, ambn, dmp1, dsppl1, ibsp and mepe). Further putative orthologies supported by only short stretches of sequence similarity (indicated by a question mark) connect gar enam, ambn and lpq14 genes with zebrafish fa93e10, scpp6 and scpp8 genes, respectively; gar lpq1 and coelacanth Scpppq4; and gar lpq5 with Amtn genes in lobe-finned vertebrates. Arrows in human and zebrafish indicate intrachromosomal rearrangements separating originally clustered genes into distant chromosomal locations (distance in Mb). Analysis of conserved synteny for the gar Scpp gene cluster on LG2 suggests that the Scpp gene regions on zebrafish chromosomes 10 and 5 are derived from the TGD (Supplementary Fig. 26 and Supplementary Note). (b) The gar 'conserved synteny bridge' (Supplementary Note) infers that the miRNA cluster of mir731 and mir462 on gar LG4 and zebrafish chromosome 8 and a miRNA-free region on zebrafish chromosome 2 are TGD ohnologous to the mammalian Mir425-191 cluster (highlighted in bold). (c) Gar newly connects through synteny zebrafish TGD-derived ohnologs mir135c-1 and mir135c-2 with mammalian Mir135B genes (highlighted in bold).

The enamel matrix protein genes encoding ameloblastin (Ambn), enamelin (Enam) and amelogenin (Amel) are found in lobe-finned vertebrates with enamel-bearing teeth but not in teleosts, which lack enamel-bearing teeth66,68. We identified ambn and enam genes (but no ortholog for Amel) in the gar genome and transcriptomes, a conclusion drawn by others using our data released before publication133. The gar ambn and enam genes show sequence similarity to zebrafish scpp6 (ref. 133) and fa93e10, respectively, suggesting that teleosts may have divergent orthologs, a hypothesis supported by conserved gene orders in the gar and zebrafish clusters (Fig. 3a).

RT-PCR and our gar skin transcriptome analysis identified expression of ambn and enam in enamel-containing gar teeth and in gar skin that includes scales with ganoin (Supplementary Table 9 and Supplementary Note), suggesting that strong expression of ambn and enam is limited to enamel and ganoin. Thus, enamel in teeth and ganoin in ganoid scales likely represent the same tissue, and common expression of Ambn and Enam in lobe-finned vertebrate enamel and in gar enamel and ganoin supports homology of these tissues. Analysis of gnathostome fossils suggested that ganoin is plesiomorphic for crown osteichthyans and arose before enamel71,133; thus, enamel-bearing teeth likely evolved by coopting enamel matrix genes originally used in ganoid scales. The Amel gene may have evolved subsequently to encode the principal organic component of the 'true enamel' that appears to have originated in lobe-finned vertebrates68,133.

Gar expresses 12 additional Scpp genes (including the odam and scpp9 hypermineralization genes66) in both teeth and scales and another 4 genes in bone (Supplementary Table 9), strongly suggesting that the common ancestor of extant bony vertebrates had a rich repertoire of Scpp genes, many of which were expressed in mineralized tissues, and that, although teleosts and lobe-finned vertebrates independently lost subsets of ancient Scpp genes65, gar has retained characteristics of both lineages.

Gar connects vertebrate microRNAomes

miRNA genes could become teleost or tetrapod specific18,72 by their loss in one lineage or gain in the other. We studied gar miRNAs computationally (Supplementary Fig. 27, Supplementary Table 10 and Supplementary Note) and annotated them using a sequence-based approach (Supplementary Note). Small RNA-seq data for four tissues identified 302 mature miRNAs derived from 233 genes, of which 229 belong to 107 families and 4 lack a known family (Supplementary Fig. 28 and Supplementary Table 11). Gar-zebrafish73,74 comparisons showed that four families and four individual miRNA genes emerged in teleosts. Of the 22 families thought to have been lost in teleosts18, 2 actually belong to the same family and orthologs of 4 gar miRNA genes were previously overlooked in teleosts. Fourteen families are absent from both gar and teleosts, and three are present in gar and many teleosts74 but absent from zebrafish. A single family present in teleosts and lobe-finned fishes (miR150) was not found in gar. Notably, no miRNA family loss was specific to teleosts, suggesting that the TGD did not accelerate family loss.

The 'gar bridge' helps to identify miRNA orthologies. For example, the mammalian Mir425 and Mir191 genes, thought to be lost in teleosts18, are orthologs of teleost mir731 and mir462, respectively (Fig. 3b). Additionally, mammalian Mir135B is orthologous to mir135c in gar and the zebrafish TGD-derived ohnologs mir135c-1 and mir135c-2 (Fig. 3c). The post-TGD retention rate for zebrafish miRNA ohnologs is 39% (81/208 analyzable cases), considerably higher than the retention rate for protein-coding genes (20–24%; ref. 75), consistent with the hypothesis that miRNA genes are likely to be retained after a duplication owing to their incorporation into multiple gene regulatory networks76,77,78,79.
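The retention-rate comparison above is simple arithmetic, and a minimal sketch may make it easier to reproduce. The counts (81 retained pairs out of 208 analyzable cases) and the 20–24% protein-coding range come from the text; the function name `retention_rate` is a hypothetical helper introduced here for illustration only.

```python
def retention_rate(retained_pairs: int, analyzable_cases: int) -> float:
    """Fraction of analyzable pre-duplication genes retained as an ohnolog pair."""
    if analyzable_cases <= 0:
        raise ValueError("analyzable_cases must be positive")
    return retained_pairs / analyzable_cases


# Counts quoted in the text for zebrafish miRNA ohnologs after the TGD.
mirna_rate = retention_rate(81, 208)

# Range quoted in the text for protein-coding genes (ref. 75).
protein_low, protein_high = 0.20, 0.24

print(f"miRNA ohnolog retention: {mirna_rate:.1%}")
print(f"fold difference vs protein-coding genes: "
      f"{mirna_rate / protein_high:.2f} to {mirna_rate / protein_low:.2f}")
```

Under these numbers the miRNA retention rate exceeds even the upper end of the protein-coding range.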

Gar highlights hidden orthology of cis-regulatory elements

CNEs often function as cis-acting regulators80,81, but many appear to be absent in teleosts, presumably because of rapid teleost sequence evolution (Fig. 1b and Supplementary Note); ancestral CNEs identified in tetrapods, however, might be detected in ray-finned fish using the slowly evolving gar.

CNE analyses near developmental gene loci (Hox and ParaHox clusters, Pax6 and IrxB) showed that gar contains more gnathostome CNEs (conserved between bony vertebrates and elephant shark) than teleosts. Analyses incorporating gar identified many bony vertebrate CNEs (absent from elephant shark) that were not predicted by direct human-teleost comparisons; furthermore, gar-based alignments identified CNEs recruited in the common ancestor of ray-finned fishes (Supplementary Figs. 14, 15 and 29–35, Supplementary Tables 12–19 and Supplementary Note).

Gar elucidates the origins of tetrapod limb enhancers, evidenced by whole-genome alignments for 13 vertebrates (including gar, five teleosts, coelacanth, five tetrapods and elephant shark; Supplementary Fig. 36, Supplementary Tables 20 and 21, and Supplementary Note). Of 153 known human limb enhancers33,82,83,84, human-centric alignments identified 71% (108) in gar, but only 53% (81) were identified through direct human-teleost alignments. Of the 72 human limb enhancers not detected by human-teleost alignment, 40% (29) aligned to gar, confirming their presence in the bony vertebrate ancestor and loss or considerable divergence in teleosts. Of these 29 enhancers, 15 also aligned to elephant shark, highlighting their existence in the gnathostome ancestor. Fourteen occurred in gar but not in teleosts and would have been incorrectly characterized as lobe-finned vertebrate innovations without gar data (Supplementary Table 22 and Supplementary Note).

Using the gar bridge (Fig. 4a), we tested whether the 29 human enhancers not directly identified in teleosts might represent rapid divergence rather than definitive loss. Inspection of human-centric and then gar-centric alignments showed 48% (14/29) aligning to at least one teleost (Supplementary Table 22). Gar thus substantially improves understanding of the evolutionary origin of vertebrate limb enhancers and their fate in teleosts (Fig. 4b, Supplementary Fig. 37 and Supplementary Table 22). Strikingly, despite using the gar bridge, we found that teleosts lost substantially more limb enhancers (15) than gar (2) (Fig. 4b and Supplementary Fig. 37), suggesting that gar might be a better model than teleosts for investigating the fin-to-limb transition85.
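The bookkeeping behind the gar bridge can be sketched with plain set operations. In this illustrative example, only the set sizes (153, 81, 108, 29 and 14) are taken from the text; the integer IDs are arbitrary placeholders, the assignment of particular IDs to each set is invented, and the helper `bridged` is hypothetical rather than the authors' pipeline.

```python
# Placeholder IDs stand in for the 153 known human limb enhancers.
all_enhancers = set(range(153))

# Elements found by direct human-teleost alignment (81 in the text).
human_to_teleost = set(range(81))

# Elements found in gar by human-centric alignment (108 in the text):
# 79 that also align directly to teleosts plus 29 that do not.
human_to_gar = set(range(79)) | set(range(81, 110))

# Second hop, modeled only for elements missed by direct alignment:
# gar-centric alignments reaching at least one teleost (14 in the text).
gar_to_teleost = set(range(81, 95))


def bridged(first_hop, second_hop, direct):
    """Elements connected human -> gar -> teleost but not human -> teleost."""
    return (first_hop & second_hop) - direct


missed_directly = all_enhancers - human_to_teleost       # 72 elements
in_gar_not_teleost = missed_directly & human_to_gar      # 29 elements
hidden = bridged(human_to_gar, gar_to_teleost, human_to_teleost)  # 14
connected = human_to_teleost | hidden                    # 95 elements

print(len(missed_directly), len(in_gar_not_teleost), len(hidden), len(connected))
print(f"bridged fraction: {len(hidden) / len(in_gar_not_teleost):.0%}")
```

The remaining 29 − 14 = 15 elements that are present in gar but still not connected to any teleost are consistent with the 15 teleost losses reported above.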

Figure 4: Gar provides connectivity of vertebrate regulatory elements.

(a) The gar bridge principle of vertebrate CNE connectivity from human through gar to teleosts. Hidden orthology is uncovered for elements that do not directly align between human and teleosts but become evident when first aligning tetrapod genomes to gar, and then aligning gar and teleost genomes. (b) Connectivity analysis of 13-way whole-genome alignments shows the evolutionary gain (green) and loss (red) of 153 human limb enhancers. Direct human-teleost orthology could only be established for 81 elements as opposed to 95 when using gar as a bridge as in a. See Supplementary Figure 37, Supplementary Table 22 and the Supplementary Note for details.

Functional studies of a HoxD limb enhancer tested the usefulness of a 'gar CNE bridge'. HoxD and HoxA clusters pattern proximal and distal mammalian limbs by 'early' and 'late' phases of gene expression, respectively86. Early-phase HoxD expression in fins and limbs shows several features that are presumed to be homologous87 and may derive from shared but cryptic regulatory elements. The CNS39 and CNS65 elements drive early-phase HoxD activation in mammals88 (Fig. 5a). Human-centric (Supplementary Table 22) and local mouse-centric (Fig. 5a) alignments failed to detect CNS39 in ray-finned fish but identified CNS65 in gar. Notably, CNS65 was identified in teleosts only by using the gar bridge (Fig. 5a and Supplementary Table 22).

Figure 5: Identification and functional analysis of the gar and teleost early-phase HoxD enhancer CNS65.

(a) Top, schematic of the mouse HoxD telomeric gene desert, which contains the CNS39 and CNS65 enhancers that drive early-phase HoxD expression in limbs. Using mouse as the baseline, VISTA alignments of the HoxD gene desert show sequence conservation with human and chicken for CNS65 but not with teleosts (zebrafish and pufferfish) (bottom left). An alignment including gar, however, shows a peak of conservation in the gar sequence (middle). Using the identified gar CNS65 as the baseline identified CNS65 orthologs in zebrafish and pufferfish (right). (b) Gar (left) and zebrafish (right) CNS65 orthologs drive robust and reproducible GFP expression in zebrafish pectoral fins at 36 hours post-fertilization (h.p.f.) (top). Gar CNS65 has pectoral fin activity beginning at 31 h.p.f., which drives GFP expression throughout the fin, and becomes deactivated around 48 h.p.f. (bottom). Dashed lines indicate the distal portion of the pectoral fins. (c) Gar CNS65 drives expression throughout the early mouse forelimbs and hindlimbs (arrows) at stage E10.5 (left). At later stages (E12.5), gar CNS65 activity is restricted to the proximal portion of the limb and is absent in developing digits (middle). Zebrafish CNS65 drives reporter expression in developing mouse limbs at E10.5 but only in forelimbs (right). The number of LacZ-positive embryos showing limb signal is indicated at the bottom right of each image; FL, forelimb; HL, hindlimb. Scale bars, 50 μm (b) and 500 μm (c). See also the Supplementary Note.

To test whether cryptic CNE orthologs preserve enhancer function, we used CNS65-driven reporter constructs to generate transgenic zebrafish and mice (Supplementary Note). CNS65 from either gar or zebrafish drove early expression in the developing zebrafish pectoral fin (Fig. 5b). Gar CNS65 drove expression in the forelimbs and hindlimbs of embryonic day (E) 10.5 mice (Fig. 5c) that was indistinguishable from the activity of mouse CNS65 (ref. 88). Zebrafish CNS65 activated forelimb expression somewhat more weakly than gar CNS65 (Fig. 5c). At E12.5, gar CNS65 activated proximal but not distal limb expression (Fig. 5c), mimicking the endogenous mouse enhancer88. These functional experiments suggest that regulation of HoxD early-phase expression in limbs and fins is an ancestral, conserved feature of bony vertebrates and that gar connects otherwise cryptic teleost regulatory mechanisms to mammalian developmental biology.

Across the gar genome, we identified approximately 28% of human-centric CNEs (39,964/143,525), more than in any of five aligned teleost genomes. Around 19,000 human-centric CNEs aligned to gar but not to any teleost (Supplementary Table 21 and Supplementary Note). Without gar, one would have erroneously concluded that these elements originated in lobe-finned vertebrates or were lost in teleosts. The gar bridge (Fig. 4a) establishes hidden orthology from human to gar to zebrafish for many of these human-centric CNEs (30–36%, depending on overlap; Supplementary Table 21 and Supplementary Note). These approximately 6,500 newly connected human CNEs contain around 1,000 SNPs linked to human conditions in genome-wide association studies (GWAS), thereby connecting otherwise undetected disease-associated haplotypes to genomic locations in zebrafish (Supplementary Table 21). The gar bridge thus helps identify biomedically relevant candidate regions in model teleosts for functional testing, potentially enhancing teleost models for biomedical research.
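As a quick consistency check on the genome-wide figures above, the arithmetic can be sketched as follows. The exact counts (39,964 and 143,525) and the 30–36% range are quoted from the text; the value 19,000 is the text's approximate figure, so the derived range is only a rough estimate, not a reported result.

```python
human_cnes_total = 143_525    # human-centric CNEs considered
human_cnes_in_gar = 39_964    # of these, identified in gar
gar_not_teleost = 19_000      # approximate: "around 19,000" in the text

# Fraction of human-centric CNEs identified in gar.
fraction_in_gar = human_cnes_in_gar / human_cnes_total

# Range of CNEs newly connected to zebrafish through gar (30-36% in the text).
bridged_low = round(0.30 * gar_not_teleost)
bridged_high = round(0.36 * gar_not_teleost)

print(f"CNEs identified in gar: {fraction_in_gar:.1%}")
print(f"newly connected CNEs: roughly {bridged_low:,} to {bridged_high:,}")
```

The approximate total of 6,500 newly connected CNEs quoted in the text falls inside this estimated range.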

Gar illuminates gene expression evolution following the TGD

Ohnologs experience several non-exclusive fates after genome duplication: loss of one copy, evolution of new expression domains or protein functions, and partitioning of ancestral functions89,90,91,92. Because the contribution of various fates has not yet been studied using a closely related TGD outgroup, we generated a list of gar genes and their orthologous TGD-derived ohnologs or singletons in zebrafish and medaka using phylogenetic93 and conserved synteny94 analyses (Fig. 6a,b, Supplementary Table 23 and Supplementary Note).

Figure 6: Gar illuminates gene expression evolution after the TGD.

(a,b) The origin (a) and distribution (b) of gar and teleost singletons and TGD-derived ohnologs (Supplementary Table 23 and Supplementary Note). (c) Neofunctionalized ohnologs for slc1a3 showing new expression in liver. (d) Subfunctionalized TGD orthologs of gpr22 with one expressed in brain as in gar and the other expressed in heart as in gar. In c and d, the r values denote the correlation of the expression profile of each ohnolog with the gar pattern. The Supplementary Note lists neofunctionalization and subfunctionalization criteria. (e–h) Expression conservation for ohnologs and singletons in zebrafish (Zf; e,g) and medaka (Md; f,h) (Supplementary Note). (e,f) Mean correlation between the expression patterns of gar genes and teleost ortholog(s). The correlation between average expression levels for ohnolog pairs and gar genes was greater than that for ohnologs alone and than that for singletons, indicating sharing of ancestral subfunctions by the ohnolog pair (multiple Wilcoxon Mann-Whitney tests with Bonferroni correction, α = 0.05 for significance). (g,h) Mean log10-transformed ratios of expression levels for gar genes and teleost ortholog(s). In comparison to gar genes, individual ohnologs were expressed at significantly lower levels than singletons; ohnolog pair/gar ratios were not statistically different from singleton/gar ratios, suggesting that the aggregate expression level of ohnolog pairs approaches the expression level of the preduplication gene (multiple two-sided Student's t tests with Bonferroni correction, α = 0.05 for significance). Error bars in e–h, s.e.m. Br, brain; Gil, gill; Hrt, heart; Mus, muscle; Liv, liver; Kid, kidney; Bo, bone; Int, intestine; Ov, ovary; Te, testis; Emb, embryo.

To compare tissue-specific gene expression patterns, we conducted RNA-seq analysis for ten adult organs and stage-matched embryos for gar, zebrafish and medaka and then normalized reads across tissues for each gene in each species (Supplementary Note). For example, gar expressed slc1a3 mainly in brain, bone and testis, but both teleosts expressed one ohnolog primarily in brain and the other primarily in liver, a novel expression domain, with little expression in bone or testis (Fig. 6c). New expression domains like this are expected if one ohnolog maintained ancestral patterns while the other evolved new functions95 before the teleost radiation. In contrast, gar expressed gpr22 mostly in brain and heart, but both teleosts expressed one ohnolog in brain and the other in heart (Fig. 6d), as expected from partitioning of ancestral regulatory subfunctions89.

To characterize the effects of the TGD on evolution of gene expression, we plotted tissue-specific expression levels in gar versus (i) expression of orthologous teleost singletons, (ii) expression of each TGD-derived ohnolog when both were retained and (iii) the averaged expression level of both retained ohnologs ('ohnolog pair'), and we then calculated correlation coefficients. Our results showed that the correlation between the expression patterns of gar genes and those of their teleost singleton orthologs was not significantly different from the correlation of expression patterns between gar genes and those of either copy of their teleost TGD-derived co-orthologs (Fig. 6e,f). Thus, when compared to ancestral single-copy genes as estimated from gar, teleost ohnologs binned at random do not appear to have evolved expression pattern differences significantly more rapidly than singletons. In contrast, the average tissue-specific patterns of both TGD-derived duplicates correlated significantly more closely with gar than did either ohnolog taken alone and also correlated more closely with gar than did singletons (Fig. 6e,f); thus, ancestral gene subfunctions tended to be partitioned between TGD-derived ohnologs, which maintained ancestral functions as a gene pair, as predicted by the subfunctionalization model89.
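The comparison of an ohnolog pair with its gar ortholog can be illustrated with a minimal sketch. It assumes that the correlation coefficient is Pearson's r (the text specifies only 'correlation coefficients') and uses invented expression values for a gpr22-like case; the function names and the numbers are hypothetical and are not taken from the PhyloFish data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ohnolog_pair_profile(ohnolog_1, ohnolog_2):
    """Tissue-by-tissue average of two ohnolog profiles ('ohnolog pair')."""
    return [(a + b) / 2 for a, b in zip(ohnolog_1, ohnolog_2)]


# Hypothetical normalized expression in five tissues
# (brain, gill, heart, muscle, liver); values are invented.
gar = [8.0, 1.0, 7.0, 1.0, 1.0]          # expressed in brain and heart
ohnolog_a = [8.0, 1.0, 1.0, 1.0, 1.0]    # retains the brain domain only
ohnolog_b = [1.0, 1.0, 7.0, 1.0, 1.0]    # retains the heart domain only

pair = ohnolog_pair_profile(ohnolog_a, ohnolog_b)
r_a = pearson(gar, ohnolog_a)
r_b = pearson(gar, ohnolog_b)
r_pair = pearson(gar, pair)
```

In this toy case, each ohnolog alone correlates only moderately with the gar profile, whereas the averaged pair reproduces it, which is the pattern expected when ancestral subfunctions are partitioned between the two copies.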

We next calculated average expression levels for each gene over the 11 tissues and computed the ratio of each teleost gene to its gar ortholog. Comparisons showed that, relative to their gar orthologs, individual ohnologs were each expressed at significantly lower levels than singletons (Fig. 6g,h). The ohnolog pair/gar expression ratios, however, showed no statistical difference from the singleton/gar expression ratios (Fig. 6g,h). This finding suggests that the aggregate expression level for ohnolog pairs tends to evolve to approximately the expression level of the preduplication gene, as expected under quantitative subfunctionalization89,90,96.
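The ratio calculation can be sketched as follows. This is an illustration only: it assumes that the level of an ohnolog pair is the sum of the two mean levels (the exact definition of ohnolog pair expression is given in the Supplementary Note), and all names and values are hypothetical.

```python
import math


def mean_level(profile):
    """Average expression level of one gene over all sampled tissues."""
    return sum(profile) / len(profile)


def log10_ratio(teleost_level, gar_level):
    """log10-transformed teleost/gar expression ratio."""
    return math.log10(teleost_level / gar_level)


# Hypothetical normalized expression values in four tissues (invented).
gar = [40.0, 20.0, 30.0, 10.0]
ohnolog_1 = [20.0, 5.0, 20.0, 5.0]
ohnolog_2 = [20.0, 15.0, 10.0, 5.0]

gar_level = mean_level(gar)
# Assumption: pair level taken as the sum of the two ohnolog levels.
pair_level = mean_level(ohnolog_1) + mean_level(ohnolog_2)

ratio_ohnolog_1 = log10_ratio(mean_level(ohnolog_1), gar_level)
ratio_ohnolog_2 = log10_ratio(mean_level(ohnolog_2), gar_level)
ratio_pair = log10_ratio(pair_level, gar_level)
```

Here each ohnolog alone yields a negative log ratio (lower expression than the gar gene), whereas the pair yields a ratio of zero, mirroring the quantitative subfunctionalization scenario described above.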

Taken together, our analyses indicate that, after the TGD, ohnolog pairs evolved so that the sum of their expression domains and the sum of their expression levels usually approximated the patterns and levels of expression for preduplication genes.

Discussion

Gar is the first ray-finned fish genome sequence not affected by the TGD. Because of gar's phylogenetic position, slow rate of sequence evolution, dense genetic map and ease of laboratory culture, this resource provides a unique bridge between tetrapods and teleost biomedical models. Our analyses show that gar bridges teleosts to tetrapods in genome arrangement, allows the identification of orthologous genes because it retains ancient VGD ohnologs lost reciprocally in teleosts and tetrapods, and elucidates the evolution of vertebrate-specific features, including adaptive immunity and mineralized tissues, and the evolution of gene expression. Clarification of gene orthology and history is crucial for the design, analysis and interpretation of teleost models of human disease, including those generated with CRISPR/Cas9-induced genome editing97,98. Gar genomic analyses show that sequences formerly considered unique to teleosts or tetrapods are often shared by ray-finned and lobe-finned vertebrates, including human. Notably, the gar bridge helps identify potential gene regulatory elements that are shared by teleosts and humans but are elusive in direct teleost-tetrapod comparisons. The availability of gar embryos and the ease of raising eggs to adults in the laboratory22 (Supplementary Fig. 1) make gar a ray-finned species of choice when analyzing many vertebrate developmental and physiological features. In conclusion, the gar bridge facilitates the connectivity of teleost medical models to human biology.

Methods

A full description of methods can be found in the Supplementary Note. Animal work was approved by the University of Oregon Institutional Animal Care and Use Committee (Animal Welfare Assurance Number A-3009-01, IACUC protocol 12-02RA).

Gar genome sequencing and assembly.

The spotted gar genome was sequenced and assembled using DNA from a single adult female gar wild-caught in Bayou Chevreuil, Louisiana (Supplementary Note). It was sequenced using Illumina sequencing technology and jumping libraries to 90× coverage and assembled into LepOcu1 (GenBank accession AHAT00000000.1) using ALLPATHS-LG21. The draft assembly is 945 Mb in size and is composed of 869 Mb of sequence plus gaps between contigs. The spotted gar genome assembly has a contig N50 size of 68.3 kb, a scaffold N50 size of 6.9 Mb and quality metrics comparable to those of other vertebrate Illumina genome assemblies21. A total of 209 scaffolds were anchored in 29 linkage groups using 2,153 of 8,406 meiotic map restriction site–associated DNA (RAD) tag markers20, thus capturing 891 Mb of sequence or 94.2% of bases in the chromonome assembly (Supplementary Note).
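For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows the standard definition: the length of the shortest contig (or scaffold) in the smallest set of longest sequences that together cover at least half of the assembly. The contig lengths used here are invented and are unrelated to LepOcu1.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence downward until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in kb (invented values).
toy_contigs_kb = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs_kb)
```

In the toy example, the two longest contigs (80 kb and 70 kb) already cover more than half of the 290-kb total, so the N50 is 70 kb.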

RNA-seq transcriptomes.

The Broad Institute gar RNA-seq transcriptome (Supplementary Note) was generated from ten tissues (stage 28 embryo100, 8-day larvae, eye, liver, heart, skin, muscle, kidney, brain and testis) and assembled using Trinity101. PhyloFish RNA-seq transcriptomes of gar, bowfin, zebrafish and medaka (Supplementary Note) were generated from ten adult tissues (ovary, testis, brain, gills, heart, muscle, liver, kidney, bone and intestine) and one embryonic stage ('pigmented eye' stage of gar, zebrafish and medaka) and assembled using the Velvet/Oases package102.

Genome annotation.

Using evidence from the Broad Institute and PhyloFish gar transcriptomes (Supplementary Note), all RefSeq teleost proteins and all UniProt/SwissProt proteins, MAKER2 (ref. 23) annotated 25,645 protein-coding genes (Supplementary Note). Using the Broad Institute transcriptome, the Ensembl gene annotation pipeline identified 18,328 protein-coding genes for 22,470 transcripts along with 42 pseudogenes and 2,595 noncoding RNAs (Supplementary Note). Annotations for 762 and 6,877 genes are specific to Ensembl and MAKER, respectively. The set of 21,443 high-confidence genes predicted by MAKER is likely close to the true number of gar protein-coding genes.

Annotation of transposable elements.

Manual and automated classification (using RepeatScout and RepeatModeler) of gar TEs was performed on the basis of Wicker's nomenclature103, and identified elements were combined into a single library (Supplementary Note), which was then used to mask the genome with RepeatMasker. The TE age profile was determined using the Kimura distances of individual TE copies to the corresponding TE consensus sequence (Supplementary Note).
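The distance underlying such an age profile can be sketched as follows, assuming that the Kimura two-parameter model is meant (the text says only 'Kimura distances') and omitting any CpG correction. The distance is K = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions among compared sites. The function name and the sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(copy_seq, consensus_seq):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing gaps or ambiguous bases are skipped. A ValueError is
    raised by math.log or math.sqrt if the pair is too divergent for the
    formula to be defined.
    """
    if len(copy_seq) != len(consensus_seq):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(copy_seq.upper(), consensus_seq.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical 10-bp TE copy differing from its consensus by one transition.
consensus = "GCGTACGTAC"
te_copy = "ACGTACGTAC"
distance = kimura_2p_distance(te_copy, consensus)
```

Copies with small distances to their consensus are interpreted as recent insertions and copies with large distances as old ones, so a histogram of these distances across all copies gives the age profile.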

Phylogenomic and evolutionary rate analyses.

Phylogenetic analyses (Supplementary Note) were based on protein-coding sequence alignments described for the coelacanth genome analysis17 but updated with orthologous sequences from gar and bowfin (Supplementary Note) and from the slowly evolving Western painted turtle104. Phylogenetic reconstructions were carried out with RAxML105 and PhyloBayes MPI106. Molecular rate analyses (Supplementary Note) were performed at the protein alignment level with Tajima's relative rate tests107 and at the level of the reconstructed phylogenies with two-cluster tests108.
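The logic of Tajima's relative rate test can be illustrated with a minimal sketch. For two ingroup sequences and an outgroup, it counts sites at which only one ingroup differs from the other two sequences and compares the two counts with a chi-square statistic (1 degree of freedom), (m1 - m2)^2/(m1 + m2). The sequences below are invented, the function name is hypothetical, and this is not the implementation used for the analyses.

```python
import math


def tajima_relative_rate(seq_1, seq_2, outgroup):
    """Tajima's relative rate test on three aligned sequences.

    Returns (m1, m2, chi2, p), where m1 and m2 are the numbers of sites at
    which only seq_1 or only seq_2, respectively, differs from the others.
    """
    m1 = m2 = 0
    for a, b, o in zip(seq_1, seq_2, outgroup):
        if "-" in (a, b, o):
            continue              # skip alignment gaps
        if a != b and b == o:
            m1 += 1               # change unique to seq_1
        elif a != b and a == o:
            m2 += 1               # change unique to seq_2
    total = m1 + m2
    if total == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / total
    # Upper-tail probability of a chi-square variable with 1 d.f.
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p


# Hypothetical aligned protein fragments (invented sequences).
outgroup_seq = "MKTAYIAKQRQISFVKSHFS"
slow_seq = "MKTAYIAKQRQISFVKSHFT"   # one lineage-specific change
fast_seq = "MRTGYLAKERQVSFIKSNFS"   # seven lineage-specific changes

m_fast, m_slow, chi2, p_value = tajima_relative_rate(
    fast_seq, slow_seq, outgroup_seq
)
```

With seven unique changes in one lineage and one in the other, the statistic is 4.5 and the null hypothesis of equal rates is rejected at α = 0.05 in this toy example.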

Genome structure analyses.

The spotted gar karyotype was determined from caudal fin fibroblast cell cultures established as described for zebrafish109 (Supplementary Note). Analyses of conserved synteny between gar, tetrapods (human and chicken) and teleosts (Supplementary Note) were performed with (i) Circos plots99 on the basis of orthology relationships from Ensembl 75 and as described in the Supplementary Note; (ii) the Synteny Database94 after integration of the gar genome assembly (Ensembl version 74); and (iii) comparative synteny maps derived as described in refs. 17,110.

Gene family analyses.

Individual gene families were analyzed as described in the Supplementary Note. RT-PCR and sequencing were performed to annotate and analyze gene expression of Scpp mineralization-related genes using cDNA libraries from gar teeth, jaw and scales (Supplementary Note).

miRNA annotation and analysis.

Gar miRNAs were studied in silico (Supplementary Note) by BLAST comparison of teleost and tetrapod miRNAs from miRBase74,111,112,113 against the gar genome assembly and confirmed with RNAfold114 (see also ref. 72). miRNA annotation and analyses based on the sequencing data of gar miRNAs (Supplementary Note) were performed as described for zebrafish73 by using small RNA-seq data from adult brain, heart, testis and ovary, which were processed and annotated with Prost! (ref. 115) according to miRNA gene nomenclature guidelines116; miRNA orthologies based on conserved synteny were established using Ensembl117, the Synteny Database94 and Genomicus118,119.

Analysis of conserved noncoding elements.

Investigation of CNEs in developmental gene loci was performed using SLAGAN120 in VISTA121 (Supplementary Note). Gar-, zebrafish- and human-centric 13-way multi-genome alignments were generated with MultiZ122 on the basis of lastZ123 pairwise whole-genome alignments. We used phyloFit124 to generate a neutral model of the evolution of fourfold-degenerate sites to identify conserved elements with phastCons124; genic elements and repetitive sequences were filtered out to obtain CNEs. Evolution of human limb enhancers33,82,83,84 was established using whole-genome alignments and conserved synteny curation. Genome-wide connectivity of CNEs and embedded GWAS SNPs from human to zebrafish through to gar was established from whole-genome alignments using liftOver125 and BEDtools126 (Supplementary Note).

HoxD enhancer functional analysis.

Gar and teleost orthologs of the HoxD early enhancer CNS65 were identified with VISTA (LAGAN)121. Gar and zebrafish CNS65 elements were cloned into pXIG-cFos-eGFP and Gateway-Hsp68-LacZ vectors for zebrafish127 and mouse (Cyagen Biosciences) transgenesis, respectively (Supplementary Note).

Comparative gene expression analyses.

Curated lists of TGD ohnologs and TGD singletons of zebrafish and medaka and their gar (co)orthologs were generated by integrating phylogenetic information from Ensembl Compara GeneTrees93 (Ensembl 74) and conserved synteny data from the Synteny Database94 (Supplementary Note). For all three species, RNA-seq reads from the PhyloFish transcriptomes (Supplementary Note) were mapped against the longest Ensembl reference coding sequence of each gene with BWA-Bowtie128,129, counted with SAMtools130 and normalized for each gene across the 11 tissues using DESeq131. The correlation of expression patterns and relative levels of expression between each zebrafish or medaka gene and its gar ortholog and of singletons, ohnolog 1, ohnolog 2 and ohnolog pairs was determined using R (ref. 132). See the Supplementary Note for additional information, including the definition of ohnolog pair expression and criteria for the detection of neofunctionalization and subfunctionalization.

Accession codes.

The spotted gar genome assembly is available from GenBank under accession GCA_000242695.1. RNA-seq data are available from the Sequence Read Archive (SRA) under accessions SRP042013 (Broad Institute gar transcriptome), SRP044781–SRP044784 (PhyloFish transcriptomes of zebrafish, gar, bowfin and medaka) and SRP063942 (gar small RNA-seq for miRNA annotation). Gar Scpp gene sequences are available from GenBank under accessions KU189274–KU189300.

URLs.

Spotted gar genome at Ensembl, http://www.ensembl.org/Lepisosteus_oculatus/Info/Index; Synteny Database, http://syntenydb.uoregon.edu/synteny_db/; PhyloFish Portal, http://phylofish.sigenae.org/index.html; RepeatMasker, http://www.repeatmasker.org/.