Main

We attempted to resolve the issue of whether the order Artiodactyla is monophyletic or paraphyletic by basing our analysis on the presence or absence of SINEs at particular orthologous loci of certain groups of species. SINEs are retroposons that have been amplified and integrated into genomes by retroposition8,9,10,11, that is, by the integration of a reverse-transcribed copy of RNA. As a consequence of the nature of retroposons, SINEs can be found specifically within members of a particular clade10,11,12,13. It is generally believed that SINEs are not excised precisely and, moreover, that SINEs have not been inserted independently at orthologous loci within different evolutionary lineages. These features mean that SINEs are very useful for the reconstruction of phylogenetic relationships among closely related species12,13.

We have characterized two new and different families of SINEs, designated the CHR-1 (for Cetacea, hippopotamus and Ruminantia) and CHR-2 family of repeats, from the genomes of several species of whales. The consensus sequences of these two families of SINEs are shown in Fig. 1a. The order Artiodactyla is traditionally divided into three suborders: Ruminantia (chevrotains, deer, cows, sheep), Tylopoda (camels) and Suiformes (pigs, peccaries and hippopotamuses). Dot-hybridization studies showed that these two families of SINEs are distributed extensively in the genomes of Cetacea, Ruminantia and hippopotamus, but were not detected in those of Tylopoda or of Suiformes other than the hippopotamus (Fig. 1b). These results suggest that whales, ruminants and hippopotamuses form a monophyletic group. This possibility prompted us to isolate specific genomic loci at which SINEs had been inserted.

Figure 1: The two newly isolated families of SINEs, CHR-1 and CHR-2, were present exclusively in the genomes of whales, ruminants and hippopotamuses.

a, The consensus sequences of CHR-1 and CHR-2. The tRNA-derived region is underlined in each case. These sequences will appear in the DDBJ, EMBL and GenBank databases under the accession numbers: AB005033 and AB005034. b, Dot hybridization experiment.

We used two approaches to isolate such loci. The first involved random screening to identify loci that contained a CHR-1 or CHR-2 SINE unit, followed by cloning and sequencing. Polymerase chain reactions (PCRs) were then performed with genomic DNA from various cetacean and artiodactyl species to determine whether or not each locus might be informative from a phylogenetic perspective. Second, we performed a comprehensive survey of the protein-coding genes in standard databases in which an intron contained one unit of CHR-1 or CHR-2. When the intron was short enough for a PCR product to be generated from the entire intron, we designed one set of primers by reference to the sequences of the exons. Together, these two approaches allowed us to characterize seven different loci with a CHR-1 or CHR-2 SINE unit, as described below.

Our analysis indicates that a CHR-2 SINE was integrated at the locus Pm52 in a common ancestor of cetaceans (Fig. 2A). The patterns of PCR products are shown in Fig. 2A, a. We performed hybridization experiments with the SINE sequence to confirm that the SINE unit had been integrated in a common ancestor of all cetaceans (Fig. 2A, b), and with the flanking sequence to confirm that the orthologous locus of each species had been amplified accurately (Fig. 2A, c). The presence of the SINE unit in longer fragments (about 620 base pairs (bp) in length) in cetaceans (lanes 1–7) and the absence of the SINE unit in shorter fragments (about 230 bp in length) in artiodactyls (lanes 8–15) were confirmed by sequencing. The small fluctuations in fragment length were due to insertions and deletions of several nucleotides (data not shown). The Pm72 locus yielded similar results (Fig. 2B). The SINE insertions at these two loci indicate that the order Cetacea forms a monophyletic group.
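The size-based reading of the Pm52 PCR products can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: the two reference sizes are the approximate values quoted in the text, while the function name `call_insertion` and the 60-bp tolerance are assumptions introduced here.

```python
# Approximate fragment sizes quoted in the text for the Pm52 locus.
WITH_SINE_BP = 620     # fragment carrying the SINE unit (cetaceans)
WITHOUT_SINE_BP = 230  # fragment lacking the SINE unit (artiodactyls)


def call_insertion(fragment_bp, tolerance_bp=60):
    """Classify one PCR fragment length as 'present', 'absent' or 'ambiguous'.

    tolerance_bp is a hypothetical allowance for the small length
    fluctuations caused by insertions and deletions of a few nucleotides.
    """
    if abs(fragment_bp - WITH_SINE_BP) <= tolerance_bp:
        return "present"
    if abs(fragment_bp - WITHOUT_SINE_BP) <= tolerance_bp:
        return "absent"
    return "ambiguous"


# Hypothetical band sizes, for illustration only.
for size in (625, 228, 450):
    print(size, call_insertion(size))
```

A length-based call of this kind is only a first pass. A longer band can also arise from an unrelated insertion elsewhere in the amplified region, which is why the calls were checked by hybridization with SINE and flanking probes and by sequencing.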

Figure 2: Analysis of the seven loci at which a SINE unit(s) was inserted during the evolution of cetaceans, ruminants and hippopotamuses: A, Pm52; B, Pm72; C, pgha3; D, c21-352; E, Gm5; F, aaa228; G, aaa792.

a, Products of PCR; b, c, results of hybridization experiments with different kinds of probe, namely, a unit sequence of the SINE (b) and the flanking sequence (c), respectively. In G, d and e show results of hybridization experiments with two different SINE probes, the CHR-2 SINE and the Bov-tA SINE, respectively.

The locus pgha3, at which a CHR-1 SINE was integrated in intron C of the gene for the α-subunit of a pituitary glycoprotein hormone (Fig. 2C), and locus c21-352, at which a CHR-1 SINE was integrated in intron C of the gene for steroid 21-hydroxylase (Fig. 2D), demonstrate the monophyly of ruminants.

Locus Gm5 was isolated by random screening with a CHR-1 SINE as probe. The SINE unit seems to have been integrated in a common ancestor of cetaceans, ruminants and hippopotamuses, suggesting that these three evolutionary lineages are monophyletic (Fig. 2E). The sequences of the fragments from the short-finned pilot whale, cow, hippopotamus and Bactrian camel confirmed that the SINE unit was present in the longer fragments and absent from the shorter ones. In lane 15 (pig), a longer band was detected, but sequencing showed that this was due to the insertion of another SINE unit, PRE-1 (ref. 14), at another site in this locus (data not shown).

The loci aaa228 and aaa792 are both derived from the gene for the α-subunit of the F(0)F(1) ATP synthase (designated the atpA1 gene) in the bovine genome. At locus aaa228, a CHR-1 SINE is present in the intron between exons 2 and 3, suggesting monophyly of cetaceans, ruminants and hippopotamuses (Fig. 2F, a). Hybridization experiments using two different kinds of probe (Fig. 2F, b and c) confirmed this conclusion.

Locus aaa792, between exons 10 and 11, is more complex. Three different families of SINEs (CHR-1, CHR-2 and Bov-tA15) became associated independently with this locus during the evolution of cetaceans and even-toed ungulates. The first integration event, involving a CHR-1 SINE, occurred in a common ancestor of cetaceans, ruminants and hippopotamuses (Fig. 2G, a–c). The pattern of PCR products is shown in Fig. 2G, a. Hybridization experiments with the CHR-1 sequence (Fig. 2G, b) and the flanking sequence (Fig. 2G, c) as probes showed that the SINE unit was integrated at the orthologous loci of the species designated above lanes 1–13, but not at those of the camel (lane 14) or the pig (lane 15). These results confirm the monophyly of cetaceans, ruminants and hippopotamuses, excluding camels and pigs. The lengths of the fragments generated by PCR (Fig. 2G, a) nevertheless varied among species (lanes 1–13), and we deduced from the sequences of the main fragments that the two other kinds of SINE were involved at this locus. Experiments using new probes confirmed that a CHR-2 SINE was integrated in the lineage of the minke and humpback whales (Fig. 2G, d, lanes 1 and 2), and that a Bov-tA SINE was integrated in the lineage of the Pecora (cows, sheep, deer and giraffes), indicating that this lineage forms a monophyletic group (Fig. 2G, e, lanes 8–11). The sequences of the major fragments are shown in Fig. 3.

Figure 3: An alignment of sequences of the aaa792 locus in the cow (BT, Bos taurus), minke whale (BA, Balaenoptera acutorostrata), hippopotamus (HA, Hippopotamus amphibius) and Bactrian camel (CB, Camelus bactrianus).

Boxed sequences indicate direct repeats arising from duplication upon retroinsertion. The underlined sequences show the sequences used for primers. Bars indicate deletions. Nucleotides identical to those in the cow are indicated by dots.

All results for the seven loci are congruent (Fig. 4), and provide conclusive evidence for the paraphyly of the order Artiodactyla, which should include the order Cetacea, and for the paraphyly of the suborder Suiformes, from which hippopotamuses should be excluded. Hippopotamuses form a monophyletic group with cetaceans and ruminants.
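The congruence of the seven loci can be illustrated with a small Python sketch. It assumes, as stated earlier in the text, that a SINE is not inserted independently at the same locus in different lineages and is not excised precisely, so that the set of lineages carrying an insertion should form a clade. Two insertions then fit the same tree only if their carrier sets are nested or disjoint. The group-level table below is transcribed from the results described above; the function names are introduced here for illustration.

```python
from itertools import combinations

# Lineages that carry the SINE at each locus, summarized at group level
# from the text. Camels and pigs lack all seven insertions.
CARRIERS = {
    "Pm52": {"cetaceans"},
    "Pm72": {"cetaceans"},
    "pgha3": {"ruminants"},
    "c21-352": {"ruminants"},
    "Gm5": {"cetaceans", "ruminants", "hippopotamuses"},
    "aaa228": {"cetaceans", "ruminants", "hippopotamuses"},
    "aaa792": {"cetaceans", "ruminants", "hippopotamuses"},
}


def compatible(a, b):
    """Two insertions fit one tree if their carrier sets are nested or disjoint."""
    return a <= b or b <= a or a.isdisjoint(b)


def conflicts(carriers):
    """Return the pairs of loci whose carrier sets cannot both be clades."""
    return [(x, y) for x, y in combinations(sorted(carriers), 2)
            if not compatible(carriers[x], carriers[y])]


def supported_clades(carriers):
    """Map each distinct carrier set to the list of loci that support it."""
    clades = {}
    for locus, groups in carriers.items():
        clades.setdefault(frozenset(groups), []).append(locus)
    return clades


print(conflicts(CARRIERS))  # an empty list means the loci are congruent
for clade, loci in supported_clades(CARRIERS).items():
    print(sorted(clade), "supported by", loci)
```

For contrast, a purely hypothetical insertion shared only by hippopotamuses and pigs would overlap the cetacean, ruminant and hippopotamus clade without being nested in it, and `conflicts` would report it against Gm5, aaa228 and aaa792.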

Figure 4: Phylogenetic relationships among cetaceans and artiodactyls, as deduced from the sites of insertion of SINEs.

Arrows indicate the timing of insertion of SINEs. Types of SINE are shown in parentheses.

The inclusion of cetaceans within the order Artiodactyla has been proposed previously5, as the Ruminantia/Cetacea clade with the Suiformes (pigs and peccaries) as an outgroup. The possibility of clustering the hippopotamus with the Cetacea has also been suggested6,7, even though hippopotamuses have traditionally been grouped with pigs and peccaries on morphological grounds16. However, careful reanalyses of available molecular data17,18,19 indicate that the hypothesis of artiodactyl paraphyly was not supported convincingly from a statistical point of view. More recently, analyses of genes for milk casein7 provided new, convincing support for the ((cetaceans, hippopotamuses), ruminants) tree, supporting a previous hypothesis5. Our analysis of SINE retropositions seems to provide unambiguous support for that hypothesis.

The conclusions from our retropositional analysis are inconsistent with earlier morphologically based hypotheses16,20,21. Palaeontological and morphological data suggest that modern whales originated from the Archaeocetes (primitive aquatic cetaceans), which first appeared in the early Eocene epoch22. The Archaeocetes are believed to have originated from mesonychians, which appeared before the Eocene20. By contrast, the most primitive artiodactyls (Dichobunids) first appeared in the early Eocene, and the origin of nearly all the families of artiodactyls can only be traced back to the middle or the late Eocene23,24. Such a sequence of appearance of these animals is inconsistent with our molecular data. However, a recent calibration of molecular clocks suggests that divergences among orders of eutherian mammals can be traced back more than 100 Myr. Hence, diversification of avian and mammalian orders might not have been an adaptive radiation after the Cretaceous/Tertiary extinction event (65 Myr ago), but might have been correlated with the fragmentation of emergent land areas during the Cretaceous25. We believe that recent molecular data will lead palaeontologists to reinterpret many fossil records of Artiodactyla to match our conclusions. Extensive morphological reversals and convergences, as well as large gaps in the fossil record, will then have to be acknowledged.

Methods

Polymerase chain reaction. PCR was performed in a 50-μl reaction mixture containing 0.2 mM dNTP, 200 ng of primer, Tth buffer (final Mg2+ concentration, 1.5 mM) and 1 unit of Tth DNA polymerase (Toyobo, Osaka). The annealing temperature was chosen in the range 49 °C to 58 °C. A portion of the PCR products was analysed by electrophoresis in an agarose gel containing 2% (w/v) Nusieve GTG and 1% (w/v) Seakem GTG (FMC BioProducts, Rockland, ME). Hybridization and washing were performed as described13.

Sequences of primers for PCR.
Pm52 locus, 5′ primer, 5′-TCCTGATTCC(C/T)CTGAACAAA-3′, and 3′ primer, 5′-GGG(G/A)AAGACT(C/T)CCA(G/A)(C/T)TTTGAAAT-3′;
Pm72 locus, 5′ primer, 5′-TTTAAAGCATGGCAGTTGGATTT(G/A)T-3′, and 3′ primer, 5′-GGATCTGTTTTTACTTTGACC-3′;
pgha3 locus, 5′ primer, 5′-TCGGTGTGGTTCTC(G/C)AC(C/T)CT-3′, and 3′ primer, 5′-TGC(C/T)CCAATCTATCA(G/A)TG(C/T)ATG-3′;
c21-352 locus, 5′ primer, 5′-GAGAATTCCTTCTG(G/A)AT(G/A)GT(G/C)AC-3′, and 3′ primer, 5′-(G/A)(C/T)CCGCAGCTCCATGGA(G/A)CC-3′;
Gm5 locus, 5′ primer, 5′-GTAATGTGATTTGGCTTAGTGC-3′, and 3′ primer, 5′-TCAGCTCCTGGTGGCAGTCT-3′;
aaa228 locus, 5′ primer, 5′-GCTTGATACCTACCACTATGAA-3′, and 3′ primer, 5′-CCTGG(A/C)(A/T)GTCT(G/C)AATTTGCAC-3′;
and aaa792 locus, 5′ primer, 5′-TGTGGA(A/T)(G/T)NTG(G/C)CAGATTT(A/T)AAAG-3′, and 3′ primer, 5′-CAGCCACTTGCTCTTCAATAGC-3′.
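Several of these primers are degenerate: a position written as (C/T) allows either base, and N is taken here to mean any of the four bases. The following Python sketch shows one way to read this notation and to count or enumerate the individual sequences in a degenerate primer pool. The function names are introduced here for illustration, and the parser assumes that primers use only the notation shown above.

```python
import re
from itertools import product
from math import prod

# One token is either a parenthesized set of alternatives, such as (C/T),
# or a single base letter; N is treated as any of A, C, G or T.
TOKEN = re.compile(r"\(([ACGT/]+)\)|([ACGTN])")


def parse_primer(notation):
    """Return a list with the allowed bases at each position of the primer."""
    positions = []
    for group, single in TOKEN.findall(notation):
        if group:
            positions.append(group.split("/"))
        elif single == "N":
            positions.append(list("ACGT"))
        else:
            positions.append([single])
    return positions


def degeneracy(notation):
    """Number of distinct sequences represented by the primer."""
    return prod(len(bases) for bases in parse_primer(notation))


def expand(notation):
    """Enumerate every distinct sequence represented by the primer."""
    return ["".join(bases) for bases in product(*parse_primer(notation))]


pm52_5prime = "TCCTGATTCC(C/T)CTGAACAAA"
aaa792_5prime = "TGTGGA(A/T)(G/T)NTG(G/C)CAGATTT(A/T)AAAG"

print(expand(pm52_5prime))
print(len(parse_primer(aaa792_5prime)), degeneracy(aaa792_5prime))
```

The Pm52 5′ primer has a single two-fold position and so represents two 20-nucleotide sequences, whereas the aaa792 5′ primer combines four two-fold positions with one N, giving 2 × 2 × 4 × 2 × 2 = 64 sequences of 24 nucleotides.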

Loci. Of the seven loci described, Pm52 and Pm72 were newly isolated by cloning and sequencing from a genomic library of the sperm whale (Physeter macrocephalus), and Gm5 was isolated from that of the short-finned pilot whale (Globicephala macrorhynchus). The other four loci were found in bovine genomic sequences in the EMBL database, as follows (accession numbers; locus name; SINE family): (X00004; pgha3; CHR-1); (M11267 and M13545; c21-352; CHR-1); (X64565 and S48112; aaa228; CHR-1); (X64565 and S48112; aaa792; CHR-1, CHR-2 and Bov-tA).