We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.
The platypus (Ornithorhynchus anatinus) has always elicited excitement and controversy in the zoological world1. Some initially considered it to be a true mammal despite its duck-bill and webbed feet. The platypus was placed with the echidnas into a new taxon called the Monotremata (meaning ‘single hole’ because of their common external opening for urogenital and digestive systems). Traditionally, the Monotremata are considered to belong to the mammalian subclass Prototheria, which diverged from the therapsid line that led to the Theria and subsequently split into the marsupials (Marsupialia) and eutherians (Placentalia). The divergence of monotremes and therians falls into the large gap in the amniote phylogeny between the eutherian radiation about 90 million years (Myr) ago and the divergence of mammals from the sauropsid lineage around 315 Myr ago (Fig. 1). Estimates of the monotreme–theria divergence time range between 160 and 210 Myr ago; here we will use 166 Myr ago, recently estimated from fossil and molecular data2.
The most extraordinary and controversial aspect of platypus biology was initially whether or not they lay eggs like birds and reptiles. In 1884, William Caldwell’s concise telegram to the British Association announced “Monotremes oviparous, ovum meroblastic”, not holoblastic as in the other two mammalian groups3,4. The egg is laid in an earthen nesting burrow after about 21 days and hatches 11 days later5,6. For about 4 months, when most organ systems differentiate, the young depend on milk sucked directly from the abdominal skin, as females lack nipples. Platypus milk changes in protein composition during lactation (as it does in marsupials, but not in most eutherians5). The anatomy of the monotreme reproductive system reflects its reptilian origins, but shows features typical of mammals7, as well as unique specialized characteristics. Spermatozoa are filiform, like those of birds and reptiles, but, uniquely among amniotes, form bundles of 100 during passage through the epididymis. Chromosomes are arranged in defined order in sperm8 as they are in therians, but not birds9. The testes synthesize testosterone and dihydrotestosterone, as in therians, but there is no scrotum and testes are abdominal10.
Other special features of the platypus are its gastrointestinal system, neuroanatomy (electro-reception) and a venom delivery system, unique among mammals11. Platypus is an obligate aquatic feeder that relies on its thick pelage to maintain its low (31–32 °C) body temperature during feeding in often icy waters. With its eyes, ears and nostrils closed while foraging underwater, it uses an electro-sensory system in the bill to help locate aquatic invertebrates and other prey12,13. Interestingly, adult monotremes lack teeth.
The platypus genome, as well as the animal, is an amalgam of ancestral reptilian and derived mammalian characteristics. The platypus karyotype comprises 52 chromosomes in both sexes14,15, with a few large and many small chromosomes, reminiscent of reptilian macro- and microchromosomes. Platypuses have multiple sex chromosomes with some homology to the bird Z chromosome16. Males have five X and five Y chromosomes, which form a chain at meiosis and segregate into 5X and 5Y sperm17,18. Sex determination and sex chromosome dosage compensation remain unclear.
Platypuses live in the waterways of eastern and southern Australia, including Tasmania. Their secretive lifestyle hampers understanding of their population dynamics and their social and family structure. Platypuses are still relatively common in the wild, but were recently reclassified as ‘vulnerable’ because of their reliance on an aquatic environment that is under stress from climate change and degradation by human activities. Water quality, erosion, destruction of habitat and food resources, and disease now threaten populations. Because the platypus has rarely bred in captivity and is the last of a long line of ornithorhynchid monotremes, its continued survival is of great importance. Here we describe the platypus genome sequence and compare it to the genomes of other mammals, and of the chicken.
Sequencing and assembly
All sequencing libraries were prepared from DNA of a single female platypus (Glennie; Glenrock Station, New South Wales, Australia) and were sequenced using established whole-genome shotgun (WGS) methods19. A draft assembly was produced from ∼6× coverage of whole-genome plasmid, fosmid and bacterial artificial chromosome (BAC) reads (Supplementary Table 1) using the assembly program PCAP20 (Supplementary Notes 1). A BAC-based physical map was developed in parallel with the sequence assembly and subsequently integrated with the WGS assembly to provide the primary means of scaffolding the assembly into larger ordered and oriented groupings (ultracontigs; Supplementary Notes 2 and 3 and Supplementary Table 2). Because there were no platypus linkage maps available, we used fluorescent in situ hybridization (FISH) to localize a subset of the sequence scaffolds to chromosomes following the agreed nomenclature21. Of the 1.84 gigabases (Gb) of assembled sequence, 437 megabases (Mb) were ordered and oriented along 20 of the platypus chromosomes. We analysed numerous metrics of assembly quality (Supplementary Notes 4–11) and we conclude that despite the adverse contiguity, the existing platypus assembly, given its structural and nucleotide accuracy, provides a reasonable substrate for the analyses presented here.
In general, the platypus genome contains fewer computationally predicted non-protein-coding (nc)RNAs (1,220 cases, excluding highly repetitive small nucleolar RNA (snoRNA) copies; see below) than do other mammalian species (for example, human with 4,421 Rfam hits), similar to observations in chicken19 (655 Rfam-based ncRNAs). This is probably because of the extensive retrotransposition of ncRNAs in therian mammals and the apparent lack of L1-mediated retrotransposition in chicken and platypus. The exception to this is the platypus family of snoRNAs, which is markedly expanded (∼2,000 matches to the Rfam covariance models) compared to that for therian mammals (∼200). snoRNAs are involved in RNA modifications, in particular of ribosomal RNA, and are often located in introns of protein-coding genes22. Our investigations revealed a novel short-interspersed-element (SINE)-like, snoRNA-related retrotransposon, which we have labelled snoRTE, that has duplicated in platypus to ∼40,000 full-length or truncated copies. It is retrotransposed by means of retrotransposon-like non-LTR (long terminal repeat) transposable elements (RTE) as opposed to the L1-mediated transposition mechanism in therians23. We constructed a complementary DNA library of small ncRNAs and identified 371 consensus sequences of small RNAs that included 166 snoRNAs23 (Supplementary Table 3). Ninety-nine of these cloned snoRNAs are found in paralogous families, and 21 of them belong to the snoRTE class. The presence of both the structural requirements known to be important in snoRNA function24 and evidence of their expression are consistent with these snoRTE elements being functional in the platypus. Similar to other unrelated ncRNAs that have proliferated in therian mammals (for example, 7SL RNA-derived primate Alu elements, tRNA-derived rodent identifier (ID) elements), this recent SINE-like expansion is probably due to chance events. However, given the RNA modification activity of snoRNAs, and our increasing awareness of the cellular importance of RNA molecules, it might be that some of the retrotranspositionally duplicated RNAs were exapted into new functions in this species.
Other small RNAs
Overall, we found commonalities with small RNA (sRNA) pathways of other mammals, but also features that are unique to monotremes. Components of the RNA interference machinery are conserved in platypus, including elements of biogenesis pathways (Dicer and Drosha) and RNA-interference effector complexes (argonaute proteins; Supplementary Table 4). Of 20,924,799 platypus and echidna sRNA reads derived from liver, kidney, brain, lung, heart and testis, 67% could be assigned to known microRNA (miRNA) families. Established patterns of miRNA expression were generally recapitulated in monotremes.
To determine the conservation patterns of miRNAs in platypus, we identified platypus miRNAs sharing at least 16-nucleotide identity with miRNAs in eutherian mammals (mouse/human) and chicken. Although most conserved miRNAs were identified across these vertebrate lineages (137 miRNAs), 10 miRNAs were shared only with eutherians (mouse/human) and 4 only with chicken (Fig. 2a). miRNAs can be classified into families based on identity of the functional ‘seed’ region at position 2–8 of the mature miRNA strand. We identified miRNA families that were shared between platypus and eutherians but not chicken (40 families), or between platypus and chicken but not eutherians (8 families), suggesting that for some miRNAs only the seed region may have been selectively conserved (Fig. 2a). Conserved miRNAs tended to be more robustly expressed in the platypus tissues analysed than lineage-restricted miRNAs (Fig. 2b).
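The seed-based family definition above can be illustrated with a minimal sketch. It groups mature miRNA sequences by nucleotides 2–8 of the mature strand; the function names and the three input sequences are illustrative assumptions and are not data or code from this study.

```python
from collections import defaultdict


def seed(mature: str) -> str:
    """Return the seed region: nucleotides 2-8 (1-based) of the mature strand."""
    rna = mature.upper().replace("T", "U")  # accept DNA or RNA alphabets
    return rna[1:8]                          # 0-based slice covering positions 2..8


def seed_families(mirnas: dict) -> dict:
    """Group miRNA names by shared seed sequence."""
    families = defaultdict(list)
    for name, sequence in mirnas.items():
        families[seed(sequence)].append(name)
    return {s: sorted(names) for s, names in families.items()}


# Illustrative inputs only (hypothetical names, not platypus reads).
toy = {
    "toy-miR-a": "UGAGGUAGUAGGUUGUAUAGUU",
    "toy-miR-b": "UGAGGUAGUAGAUUGUAUAGUU",  # differs from toy-miR-a outside the seed
    "toy-miR-c": "UAGCAGCACGUAAAUAUUGGCG",
}
families = seed_families(toy)
```

Two sequences that differ outside positions 2–8 still fall into one family here, which is the situation described above in which only the seed region appears to have been selectively conserved.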
To identify miRNAs unique to monotremes we used a heuristic search that identifies miRNA candidates in deep-sequencing data sets25. This method predicted 183 novel miRNAs in platypus and echidna (Fig. 2a). Notably, 92 of these lay in 9 large clusters, on platypus chromosome X1 and contigs 1754, 7160, 7359, 8388, 11344, 22847, 198872 and 191065. Physical mapping confirmed that at least five of these contigs are linked to the long arm of chromosome X1 (ref. 25). These abundantly expressed clusters were sequenced almost exclusively from platypus and echidna testis (Fig. 2b). The expansion of this unique miRNA class and its expression domain suggest possible roles in monotreme reproductive biology25.
Piwi-interacting RNAs (piRNAs) associate with a germline-expressed clade of argonaute proteins, known as Piwis26, and have a role in transposon silencing and genome methylation26. Monotreme piRNAs bear strong structural similarity to those in eutherians. They are ∼29 nucleotides in length and arise from large testis-specific genomic clusters with distinct genomic strand asymmetry, often with a typical ‘bidirectional’ organization. We identified 50 major platypus piRNA clusters as well as numerous smaller clusters25. In contrast to piRNAs in mouse, platypus piRNAs are repeat-rich and bear strong signatures of active transposon defence.
We set out to define the protein-coding gene content of platypus, both to illuminate the specific biology of the monotreme clade and to enable comparisons with eutherians and marsupials, or with chicken, the representative sauropsid. Protein-coding genes were predicted using the established Ensembl pipeline27 suitably modified for platypus (Supplementary Notes 14), with a greater emphasis placed on similarity matches to mammalian genes. Overall this resulted in 18,527 protein-coding genes being predicted from the current platypus assembly. The number of platypus protein-coding genes thus is similar to estimates (18,600–20,800) for human and opossum28,29.
We were interested first in identifying platypus genes that contribute most to core biological functions that are conserved across the mammals. These will typically be ‘simple’ 1:1 orthologues, genes that have remained as single copies without duplication or deletion in platypus, in Eutheria (specifically, in dog, human and mouse) and in opossum, a representative marsupial. Subsequently, we considered genes that have been duplicated or deleted in the monotreme lineage, or that have been lost in eutherian and/or marsupial lineages. Such genes are proposed to contribute most to the lineage-specific biological functions that distinguish individual mammals30. These studies required the use of an outgroup species, here chicken, a representative of the sauropsids.
As expected, the majority of platypus genes (82%; 15,312 out of 18,596) have orthologues in these five other amniotes (Supplementary Table 5). The remaining ‘orphan’ genes are expected to primarily reflect rapidly evolving genes, for which no other homologues are discernible, erroneous predictions, and true lineage-specific genes that have been lost in each of the other five species under consideration. Simple 1:1 orthologues, which have been conserved without duplication, deletion or non-functionalization across the five mammalian species, were greatly enriched in housekeeping functions, such as metabolism, DNA replication and mRNA splicing (Supplementary Table 6).
We then identified evolutionary lineages that experienced the most stringent purifying selection. The mouse terminal lineage exhibited a significantly higher degree of purifying selection (the ratio of amino acid replacement to silent substitution rates, dN/dS = 0.105, P < 0.001) than dog, opossum and chicken terminal branches (values of 0.123–0.128); human and platypus terminal lineages showed significantly reduced purifying selection (both 0.132, P < 0.03). These values probably reflect the increased efficiency of purifying selection in populations of larger effective size, such as that of mouse31. We find that at least one nucleotide substitution has occurred, on average, in synonymous sites of platypus and human orthologues since their last common ancestor (Supplementary Notes 17 and Supplementary Fig. 1). This means that most neutral sequence cannot be aligned accurately between monotreme and eutherian genomes.
Next, we determined the genetic distance of echidna (Tachyglossus aculeatus) from platypus. The median dS value of 0.125 for the orthologues of echidna and platypus, when compared to the value for the monotreme lineage, predicts that platypus and echidna last shared a common ancestor 21.2 Myr ago. Although similar to previous estimates32, this value seems to be at odds with fossil evidence, perhaps owing to relatively recent reductions of mutational rates in the monotreme lineage33.
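The logic of converting a synonymous distance into a divergence time can be sketched under a strict molecular clock. This is only an illustration of the reasoning, not the procedure actually used for the 21.2 Myr estimate; the function name is hypothetical, and the value used for the monotreme-lineage dS is a placeholder because that figure is not given in this section.

```python
def divergence_time_myr(pairwise_ds: float,
                        reference_branch_ds: float,
                        reference_age_myr: float) -> float:
    """Estimate a split time under a strict clock (an assumption).

    pairwise_ds         -- dS between the two species (sum of both branches)
    reference_branch_ds -- dS accumulated along one lineage of known age
    reference_age_myr   -- age of that reference lineage in Myr
    """
    per_lineage_ds = pairwise_ds / 2.0                  # each branch carries half
    rate_per_myr = reference_branch_ds / reference_age_myr
    return per_lineage_ds / rate_per_myr


# 0.125 and 166 Myr come from the text; 0.5 is a placeholder, NOT a reported value.
estimate = divergence_time_myr(pairwise_ds=0.125,
                               reference_branch_ds=0.5,
                               reference_age_myr=166.0)
```

Because the estimate scales inversely with the assumed lineage rate, a recent slowdown in the monotreme mutation rate would make a clock calibrated over the whole lineage underestimate the true age of the split.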
We next investigated whether the ancestral reptilian characters of monotremes are reflected in the set of genes that have been retained in platypus, sauropsids and other vertebrates from outside of the amniote clade (such as frogs and fish), but have been lost from eutherian and marsupial lineages (Fig. 1). These ancestral, sauropsid-like, characters of platypus include oviparity (egg laying) and the outward appearances of its spermatozoa and retina. Simultaneously, we sought genetic evidence within the platypus genome both for characteristics peculiar to monotremes, such as venom production and electro-reception, and for characteristics unique to mammals, in particular lactation. By investigating platypus homologues of genes already known to be involved in specific physiological processes (see Methods), we highlight those platypus genes for which evolution exemplifies the ancestral or derived physiological characters of monotremes.
The semi-aquatic platypus was expected to sense its terrestrial, but not aquatic, environment by detecting airborne odorants using olfactory receptors and vomeronasal receptors (types 1 and 2: V1Rs, V2Rs). Nevertheless large numbers of odorant receptor, V1R and V2R homologues (approximately 700, 950 and 80, respectively) are apparent in the platypus genome assembly, although for each family only a minority lack frame disruptions (approximately 333, 270 and 15, respectively)34. Many of these platypus genes and pseudogenes are monophyletic, having arisen by duplication in the 166 Myr since the last common ancestor of monotremes and therians. Although mouse and rat genomes possess greater numbers of odorant receptors and V2Rs than the platypus genome35,36, the platypus repertoire of V1Rs, showing undisrupted reading frames, is the largest yet seen, 50% more than for mouse (Fig. 3b). This is particularly noteworthy as the Anolis carolinensis lizard (sequence data used with the permission of the Broad Institute) and the chicken19 seem to possess no such receptors. The large expansion of the platypus V1R gene family might reflect sensory adaptations for pheromonal communication or, more generally, for the detection of water-soluble, non-volatile odorants, during underwater foraging.
The platypus odorant receptor gene repertoire is roughly one-half as large as those in other mammals37. Nevertheless, platypus odorant receptors fall into class, family and subfamily structures that are well represented from across the mammals, with a few notable exceptions such as family 14 (Fig. 3a). Together with the finding that lizard contains only ∼200 odorant receptor genes and pseudogenes, this indicates that the platypus olfactory repertoire is, as expected, more akin to other mammals than it is to sauropsids.
Fertilization in the platypus exhibits both sauropsid and therian characteristics. Platypus ova are small (4 mm diameter) relative to those of comparably sized reptiles and birds, and eggs hatch at an early stage of development so that most growth of the embryo and infant is dependent on lactation, as in marsupials. As in all mammals and many other amniotes, the ovum is invested with a zona pellucida when fertilization occurs. The platypus genome encodes each of the four proteins of the human zona pellucida38, as well as two ZPAX genes (Table 1) that previously were observed only in birds, amphibians and fish. The aspartyl-protease nothepsin is present in platypus, but has been lost from marsupial and eutherian genomes (Table 1). In zebrafish, this gene is specifically expressed in the liver of females under the action of oestrogens, and accumulates in the ovary39. The vitellogenins share these characteristics, indicating that nothepsin may be involved in processing vitellogenin or other egg-yolk proteins. We find that platypus has retained a single vitellogenin gene and pseudogene, whereas sauropsids such as chicken have three and the viviparous marsupials and eutherians have none.
Orthologues of many of the eutherian sperm membrane proteins related to fertilization40 are present in platypus (and marsupial) genomes. These include the genes for a number of putative zona pellucida receptors and proteins implicated in sperm–oolemma fusion. Testis-specific proteases, which in eutherians participate in degradation of the zona pellucida during fertilization, are all absent from the platypus genome assembly.
Monotreme spermatozoa undergo some post-testicular maturational changes, including the acquisition of progressive motility, loss of cytoplasmic droplets and aggregation of single spermatozoa into bundles during passage through the epididymis11. Nevertheless, maturational changes in the sperm surface that are both unique and essential in other mammals for fertilization of the ovum have yet to be identified. Also, the epididymis of monotremes is not highly adapted for sperm storage as in most marsupial and eutherian mammals. Consistent with these findings is the absence of platypus genes for the epididymal-specific proteins that have been implicated in sperm maturation and storage in other mammals. The most abundant secreted protein in the platypus epididymis is a lipocalin, the homologues of which are the most abundantly secreted proteins in the reptilian epididymis41. Notably, ADAM7, a protease that is secreted in the epididymis of eutherians, has an orthologue in the platypus. This is a bona fide protease with a characteristic Zn2+-coordinating sequence HExxH in the platypus, the opossum and the tree shrew (Tupaia belangeri). However, loss of its proteolytic activity is predicted in eutherians42 owing to a single point mutation within its active site (E to Q).
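The distinction between an intact HExxH site and the E-to-Q variant can be checked with a simple pattern search. The sketch below is a hypothetical helper run on made-up peptide fragments, not on real ADAM7 sequences; a real analysis would restrict the search to the aligned metalloprotease domain, because the short motif can also occur by chance elsewhere in a protein.

```python
import re

# Zn2+-coordinating consensus and its E-to-Q variant; x = any residue.
ACTIVE_SITE = re.compile(r"HE[A-Z]{2}H")
E_TO_Q_SITE = re.compile(r"HQ[A-Z]{2}H")


def classify_zinc_site(protein: str) -> str:
    """Report which form of the motif, if any, occurs in a protein sequence."""
    seq = protein.upper()
    if ACTIVE_SITE.search(seq):
        return "HExxH present"
    if E_TO_Q_SITE.search(seq):
        return "HQxxH present (E to Q)"
    return "no motif found"


# Made-up fragments for illustration only.
intact = classify_zinc_site("AATLAHELGHNLGM")
mutated = classify_zinc_site("AATLAHQLGHNLGM")
absent = classify_zinc_site("AATLAKKLGMNLGM")
```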
Lactation and dentition
Lactation is an ancient reproductive trait whose origin predates the origin of mammals. It has been proposed that early lactation evolved as a water source to protect porous parchment-shelled eggs from desiccation during incubation43 or as a protection against microbial infection. Parchment-shelled egg-laying monotremes also exhibit a more ancestral glandular mammary patch or areola without a nipple that may still possess roles in egg protection. However, in common with all mammals, the milk of monotremes has evolved beyond primitive egg protection into a true milk that is a rich secretion containing sugars, lipids and milk proteins with nutritional, anti-microbial and bioactive functions. Reflecting this similarity to eutherians, platypus casein genes are tightly clustered together in the genome, as they are in other mammals, although platypus contains a recently duplicated β-casein gene (Supplementary Fig. 2).
Mammalian casein genes are thought to have originally arisen by duplication of either enamelin or ameloblastin44, both of which are tooth enamel matrix protein genes that are located adjacent to the casein gene cluster in eutherians and, we find, also in platypus. Adult platypuses, as well as echidnas, lack teeth but the conservation of these enamel protein genes is consistent with the presence of teeth and enamel in the juvenile, as well as the fossil platypuses45.
Only a handful of mammals are venomous, but the male platypus is unique among them in delivering its poison not via a bite but from hind-leg spurs. Despite the obvious difficulties in obtaining samples, it is now known that platypus venom is a cocktail of at least 19 different substances46 including defensin-like peptides (vDLPs), C-type natriuretic peptide (vCNP) and nerve growth factor (vNGF). When analysed phylogenetically and mapped to the platypus genome assembly, these sequences are revealed to have arisen from local duplications of genes possessing very different functions (Fig. 4). Notably, duplications in each of the β-defensin, C-type natriuretic peptide and nerve growth factor gene families have also occurred independently in reptiles during the evolution of their venom47. Convergent evolution has thus clearly occurred during the independent evolution of reptilian and monotreme venom48.
Although the major organs of the monotreme immune system are similar to those of other mammals49, the repertoire of immunity molecules shows some important differences from those of other mammals. In particular, the platypus genome contains at least 214 natural killer receptor genes (Supplementary Notes 18) within the natural killer complex, a far larger number than for human (15 genes50), rat (45 genes50) or opossum (9 genes51).
Both platypus and opossum genomes contain gene expansions in the cathelicidin antimicrobial peptide gene family (Supplementary Fig. 3). Among eutherians, primates and rodents have a single cathelicidin gene52,53, whereas sheep and cows have numerous genes that have been duplicated only recently54. The expanded repertoire of cathelicidin genes in both marsupials and monotremes may arm their immunologically naive young with a diverse arsenal of innate immune responses. In eutherians, with their increases in length of gestation and advances in development in utero of their immune systems, the diversity of antimicrobial peptide genes may have become less critical. The platypus genome also contains an expansion in the macrophage differentiation antigen CD163 gene family (Supplementary Notes 18).
First, we analyse the phylogenetic position of platypus and confirm that marsupials and eutherians are more closely related than either is to monotremes (Supplementary Notes 19). We then describe platypus chromosomes and observe some properties of platypus interspersed and tandem repeats. We also discuss a potential relationship between interspersed repeats and genomic imprinting and investigate how the extremely high G+C fraction in platypus affects the strong association seen in eutherians between CpG islands and gene promoters.
Platypus chromosomes provide clues to the relationship between mammal and reptile chromosomes, and to the origins of mammal sex chromosomes and dosage compensation. Our analysis provides further insight with the following findings: the 52 platypus chromosomes show no correlation between the position of orthologous genes on the small platypus chromosomes and chicken microchromosomes; for the unique 5X chromosomes of platypus we reveal considerable sequence alignment similarity to chicken Z and no orthologous gene alignments to human X, implying that the platypus X chromosome evolved directly from a bird-like ancestral reptilian system55; and the genes on the five platypus X chromosomes appear to be partially dosage compensated (Supplementary Fig. 5), perhaps parallel to the incomplete dosage compensation recently described in birds56.
About one-half of the platypus genome consists of interspersed repeats derived from transposable elements. The most abundant and still active repeats are (severely truncated) copies of the 5-kb long-interspersed-element (LINE2) and its non-autonomous SINE-companion mammalian-wide interspersed repeat (MIR, Mon-1 in monotremes) that became extinct in marsupials and in eutherians 60–100 Myr ago. We estimate that there are 1.9 and 2.75 million copies of LINE2 and MIR/Mon-1, respectively, in the 2.3-Gb platypus genome. DNA transposons and LTR retroelements are quite rare in platypus, but there are thousands of copies of an ancient gypsy-class LTR element (all LTR elements previously identified in mammals, birds, or reptiles belong to the retrovirus clade). Overall, the frequency of interspersed repeats (over 2 repeats per kb) is higher than in any previously characterized metazoan genome. Population analysis using LINE2/Mon-1 elements distinguished the Tasmanian population from three other mainland clusters (Supplementary Fig. 4a, b), in good agreement with tree-based analysis, physical proximity and previous knowledge of platypus population relationships57.
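As a quick consistency check on the figures above, the copy numbers of the two most abundant families can be converted to a density per kilobase. This is a back-of-the-envelope sketch using only the numbers quoted in the text; the variable names are illustrative.

```python
# Figures quoted in the text above.
line2_copies = 1.9e6        # estimated LINE2 copies
mir_mon1_copies = 2.75e6    # estimated MIR/Mon-1 copies
genome_size_bp = 2.3e9      # 2.3-Gb genome

genome_size_kb = genome_size_bp / 1_000
repeats_per_kb = (line2_copies + mir_mon1_copies) / genome_size_kb
print(f"LINE2 + MIR/Mon-1 density: {repeats_per_kb:.2f} copies per kb")
```

These two families alone give roughly two copies per kilobase, which is in line with the overall frequency of over 2 repeats per kb stated above.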
Cluster analysis of all LINE2 copies revealed a phylogenetic relationship lacking branches, as if a single-locus, fast-evolving gene has steadily spread an exceptional number of pseudogenes over time (Supplementary Fig. 6). This ‘master gene’ appearance is, to a lesser degree, also observed for LINE1 in eutherians58, but not to the same extent for MIR/Mon-1 or other retrotransposons in mammals. The phylogeny of LINE2 and Mon-1 was also supported by a genome-wide transposition-in-transposition (TinT) analysis59 (Supplementary Tables 7 and 8). LINE2 density is similar on all chromosomes (Supplementary Fig. 7); it does not correlate with chromosome length (and recombination rate) as the CR1 LINE density does in the chicken genome19, nor is it higher on sex chromosomes than on autosomes, as LINE1 density is in eutherians (which has led to postulations on a function in dosage compensation)60.
We compared microsatellites in the platypus genome with those of representative vertebrates (Supplementary Notes 22). The mean microsatellite coverage of platypus genomic sequences assembled into chromosomes is 2.67 ± 0.34%, significantly lower than in all other mammalian genomes sequenced so far and most similar to that observed in chicken (Supplementary Fig. 8). Microsatellites are on average shorter in platypus than in other genomes (Supplementary Table 9), but microsatellite coverage surpasses that of chicken owing to very long tri- and tetranucleotide repeats (Supplementary Fig. 9). The platypus has a higher proportion of microsatellites with high A+T content, in comparison to the other vertebrates examined, an abundance distribution that has more in common with reptiles than with mammals (Supplementary Fig. 10).
Genomic imprinting is an epigenetic phenomenon that results in monoallelic gene expression. In the vertebrates, imprinting seems to have evolved recently and has only been confirmed in marsupials and eutherian mammals61,62. The autosomal localization of some imprinted orthologues in platypus is known63. Here, we examined the conservation of synteny and the distribution of retrotransposed elements in all orthologous eutherian-imprinted clustered and non-clustered genes in the platypus genome. A representative cluster is shown in Fig. 5 (see also Supplementary Fig. 12).
Clusters that became imprinted in therians (with the exception of the Prader–Willi–Angelman locus64) have not been assembled recently and reside in ancient syntenic mammalian groups, although some regions have expanded by mechanisms such as gene duplication or transposition. There were significantly fewer LTR and DNA elements across all platypus orthologous regions relative to eutherian imprinted genes (P < 0.04 and 0.04, respectively), whereas there was a significant increase in the sequences masked by SINEs (P < 0.03). The chicken had fewer total repeats and no SINEs or sRNAs. Comparison of all regions in the platypus with the orthologous regions in opossum, mouse, dog and human demonstrates that accumulation of LTR, DNA elements, and simple and low complexity repeats coincides with, and may be a driving force in, the acquisition of imprinting in these regions in therian mammals.
The CpG fraction
The eutherian and chicken genomes generally average around 41% G+C content, although many intervals differ substantially from the average, particularly in humans (Supplementary Notes 23). In contrast, the platypus genome averages 45.5% G+C content and rarely deviates far from the average. The opossum genome averages only 38% G+C content and also has a narrow distribution (Supplementary Fig. 13). The source of the elevated G+C fraction in platypus remains unclear. It is explained only in part by monotreme interspersed repeat elements, as platypus DNA outside of known interspersed repeats is 44.7% G+C. Furthermore, tandem repeats of short DNA motifs (microsatellites) in platypus show an A+T bias, as with other mammals. Recombination-driven biased gene conversion may be a factor, in agreement with what has been shown for eutherians65 and marsupials66. This is suggested by the observation that the six platypus chromosomes where the currently mapped DNA sequence averages over 45% G+C content (that is, 17, 20, 15, 14, 10 and 11 in order of decreasing G+C fraction) are among the 10 shortest (Supplementary Fig. 14), because short chromosomes have a higher recombination rate67. However, a direct test is currently lacking because platypus recombination rates have not been measured. A further examination of the CpG fraction, that associated with promoter elements, is found in Supplementary Notes 24 and Supplementary Fig. 15.
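Both the genome-wide average and the spread of G+C content across intervals, as discussed above, can be measured with a windowed calculation. The following is a minimal sketch; the function names and the toy sequence are assumptions for illustration, and ambiguous bases such as N are simply excluded from the denominator.

```python
def gc_fraction(seq: str) -> float:
    """G+C fraction of a sequence, ignoring non-ACGT characters."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return float("nan")  # nothing to measure in this interval
    return (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq: str, window: int, step: int = 0) -> list:
    """G+C fraction in consecutive windows (non-overlapping if step is 0)."""
    step = step or window
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


# Toy sequence for illustration only.
profile = windowed_gc("GGGGAAAACCGT", window=4)
```

A narrow distribution of window values around the mean corresponds to a genome that rarely deviates far from its average, whereas a wide distribution corresponds to the interval-to-interval variation described for human.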
The egg-laying platypus is a remarkable species with many biological features unique among mammals. Our sequencing of the platypus genome now enables us to compare its sequence characteristics and organization with those of birds and therian mammals in order to address the questions of platypus biology and to date the emergence of mammalian traits. We report here that sequence characteristics of the platypus genome show features of reptiles as well as mammals.
Platypus contains a largely standard repertoire of non-protein-coding RNAs (ncRNAs), except for the snoRNAs, which exhibit a marked expansion associated with at least one retrotransposed subfamily. Some of these retrotransposed snoRNAs are expressed and thus may have functional roles. The platypus has fully elaborated piRNA and miRNA pathways, the latter including many monotreme-specific miRNAs and miRNAs that are shared with either mammals or chickens. Many functional assessments of these novel miRNAs remain to be carried out and will surely add to our knowledge of mammalian miRNA evolution.
The 18,527 protein-coding genes predicted from the platypus assembly fall within the range for therian genomes. Of particular interest are families of genes involved in biology that links monotremes to reptiles, such as egg-laying, vision and envenomation, as well as mammal-specific characters such as lactation, characters shared with marsupials such as antibacterial proteins, and platypus-specific characters such as venom delivery and underwater foraging. For instance, anatomical adaptations for chemoreception during underwater foraging are reflected in an unusually large repertoire of vomeronasal type 1 receptor genes. However, the repertoire of milk protein genes is typically mammalian, and the arrangement of milk protein genes seems to have been preserved since the last common ancestor of monotremes and therian mammals.
Since its initial description, the platypus has stood out as a species with a blend of reptilian and mammalian features, which is a characteristic that penetrates to the level of the genome sequence. The density and distribution of repetitive sequence, for example, reflects this fact. The high frequency of interspersed repeats in the platypus genome, although typical for mammalian genomes, is in contrast with the observed mean microsatellite coverage, which appears more reptilian. Additionally, the correlation of parent-of-origin-specific expression patterns in regions of reduced interspersed repeats in the platypus suggests that the evolution of imprinting in therians is linked to the accumulation of repetitive elements.
We find that the mixture of reptilian, mammalian and unique characteristics of the platypus genome provides many clues to the function and evolution of all mammalian genomes. The wealth of new findings and confirmation of existing knowledge immediately evident from the release of these data promise that the availability of the platypus genome sequence will provide the critically needed background to inspire rapid advances in other investigations of mammalian biology and evolution.
Tissue was obtained from animals captured at the Upper Barnard River, New South Wales, Australia, during breeding season (AEEC permit number R.CG.07.03 to F. Grützner; Environment ACT permit number LI 2002 270 to J. A. M. Graves; NPWS permit number A193 to R. C. Jones; AEC permit number S-49-2006 to F. Grützner).
A total of 26.9 million reads was assembled using the PCAP software20. Attempts were made to assign the largest contiguous blocks of sequence to chromosomes using standard FISH techniques.
We used the established Rfam pipeline68 and de novo sequencing to detect non-protein-coding RNAs (ncRNAs). Cloning, sequencing and annotation of sRNAs from platypus, echidna and chicken as well as miRNA sequences are described in ref. 25.
Protein-coding and non-protein-coding genes were computed using a modified version of the Ensembl pipeline (Supplementary Notes 14). Gene orthology assignment followed a procedure implemented previously69. Orthology rate estimation was performed with PAML70 using the model of ref. 71. In all cases, codon frequencies were estimated from the nucleotide composition at each codon position (F3X4 model).
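As an illustration of the F3X4 idea mentioned above, the following minimal Python sketch derives expected codon frequencies from the nucleotide composition at each of the three codon positions. It is not the PAML implementation: the function name `f3x4_frequencies`, the use of the standard genetic code, and the assumption that the input consists of unambiguous sense codons are all ours.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def f3x4_frequencies(codons):
    """Expected codon frequencies from position-specific nucleotide composition.

    `codons` is a list of unambiguous sense codons (A, C, G, T only),
    for example all codons of an aligned coding sequence.
    """
    # Count nucleotides separately at codon positions 1, 2 and 3.
    pos_counts = [dict.fromkeys("ACGT", 0) for _ in range(3)]
    for codon in codons:
        for i, base in enumerate(codon.upper()):
            pos_counts[i][base] += 1
    n = len(codons)
    pos_freqs = [{b: c / n for b, c in counts.items()} for counts in pos_counts]

    # Multiply the three positional frequencies for every sense codon.
    raw = {}
    for b1, b2, b3 in product("ACGT", repeat=3):
        codon = b1 + b2 + b3
        if codon not in STOP_CODONS:
            raw[codon] = pos_freqs[0][b1] * pos_freqs[1][b2] * pos_freqs[2][b3]

    # Renormalize so the 61 sense-codon frequencies sum to 1.
    total = sum(raw.values())
    return {codon: value / total for codon, value in raw.items()}
```

Because only nine nucleotide frequencies are estimated (three free parameters at each of three positions), this parameterization is far less data-hungry than estimating 61 codon frequencies directly.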
Pairwise alignments between human and dog, mouse, opossum, platypus and chicken were projected from whole-genome alignments of 28 species (http://genome.cse.ucsc.edu/). These alignments were the basis for phylogeny, chromosome synteny, interspersed repeats, imprinting and CpG fraction analyses.
Assembly quality assessment accounted for read depth, chimaeric reads, repeat content, cloning bias, G+C content and heterozygosity (Supplementary Notes 4–11). We identified a total of ∼1.2 million single nucleotide polymorphisms (SNPs) within the 1.84-Gb sequenced female platypus genome using two independent analyses, SSAHA2 (SSAHA: a fast search method for large DNA databases72) and PCAP output20 (Supplementary Notes 11).
snoRNA annotation is as described in ref. 23. miRNAs sharing a heptamer at nucleotide position 2–8 were defined as a family. Homology with mouse/human miRNAs was based on annotated miRNAs in Rfam (http://microrna.sanger.ac.uk/sequences/index.shtml). piRNA sequences have been submitted to GEO (http://www.ncbi.nlm.nih.gov/geo/). miRNA total cloning frequency was normalized across tissue libraries by scaling cloning frequency per library by a factor representing total number of miRNA reads per library.
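The family definition and the normalization described above can be sketched in a few lines of Python. This is an illustrative reading rather than the pipeline used for the paper: the function names are hypothetical, and we assume the scaling factor is simply the total number of miRNA reads in each library.

```python
from collections import defaultdict


def seed_families(mirnas):
    """Group miRNA names by the heptamer at nucleotide positions 2-8 (1-based)."""
    families = defaultdict(list)
    for name, seq in mirnas.items():
        # Treat DNA and RNA spellings alike, then take positions 2-8.
        seed = seq.upper().replace("T", "U")[1:8]
        families[seed].append(name)
    return dict(families)


def normalize_counts(counts_by_library):
    """Divide each miRNA read count by the total miRNA reads of its library.

    Assumption: the per-library scaling factor is the library's total
    miRNA read count, so values become fractions comparable across tissues.
    """
    normalized = {}
    for library, counts in counts_by_library.items():
        total = sum(counts.values())
        normalized[library] = (
            {name: count / total for name, count in counts.items()} if total else {}
        )
    return normalized
```

With this definition, two miRNAs that differ only outside positions 2–8 fall into the same family, whereas a single substitution inside the heptamer separates them.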
Orthologue groups were selected based on whether they contained genes predicted only from the platypus, and not from the chicken, opossum, dog, mouse or human genome assemblies (Supplementary Notes 15–17). Other groups were selected where the number of in-paralogous platypus genes exceeded the numbers of the other (chicken, opossum, dog, mouse and human) terminal lineages. Some of these groups represent erroneous gene predictions where, for example, protein-coding sequence predictions represented instead transposed element or highly repetitive sequence, or overlapped, on the reverse strand, other well-established coding sequence. Such instances were discarded. Lineage-specific gene loss was detected by inspection of BLASTZ alignment chains and nets at the UCSC Genome Browser (http://genome.cse.ucsc.edu/); by the interrogation of all known cDNA, EST and protein sequences held in GenBank using BLAST; and by attempting to predict orthologous genes within genomic intervals flanked by syntenic anchors.
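The two selection criteria for orthologue groups can be expressed as a simple filter. The sketch below is a simplification under stated assumptions: each group is reduced to a per-species gene count, "exceeded the numbers of the other terminal lineages" is read as exceeding every one of them, and the function and label names are hypothetical. The manual removal of erroneous predictions described above is not modelled.

```python
OTHER_SPECIES = ("chicken", "opossum", "dog", "mouse", "human")


def classify_group(gene_counts):
    """Classify one orthologue group from its per-species gene counts.

    `gene_counts` maps a species name to the number of genes it
    contributes to the group; missing species count as zero.
    """
    platypus = gene_counts.get("platypus", 0)
    others = [gene_counts.get(species, 0) for species in OTHER_SPECIES]

    # Criterion 1: genes predicted only from the platypus assembly.
    if platypus > 0 and all(n == 0 for n in others):
        return "platypus_only"
    # Criterion 2: more platypus in-paralogues than in any other lineage
    # (assumption: "exceeded" means larger than the maximum of the others).
    if platypus > max(others):
        return "platypus_expanded"
    return None
```

Groups flagged by either criterion would then still need the curation steps described in the text before being interpreted as lineage-specific expansions.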
To establish phylogeny we extended the basic data sampling approach described previously73 to protein-coding genes, and used established techniques to analyse protein-coding indels74 and retrotransposon insertions75 (Supplementary Notes 19).
The population structure of 90 platypuses from different regions in Australia was determined using Structure software v2.1 (ref. 76) using genotypes of 57 polymorphic Mon-1 and LINE2 loci. Five thousand replications were examined (Supplementary Notes 21).
For the imprinting cluster of PEG1/MEST, comparative maps were compiled from Vega annotations for the mouse and human, and Ensembl gene builds for other species. Multiple alignments of each region for repeat distribution analyses were constructed using MLAGAN79 with translated anchoring.
We examined genomic assemblies for human (hg18), mouse (musMus8), dog (canFam2), opossum (monDom4), platypus (ornAna1) and chicken (galGal3), downloaded from the UCSC Genome Browser (http://genome.ucsc.edu), and computed the fraction of G+C nucleotides in each non-overlapping 10,000-bp window free of ambiguous bases. Bases in repeats were not distinguished and were counted along with non-repeat bases. For platypus all assembled sequence was analysed; for the other species only bases assigned to chromosomes were used.
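The windowed G+C calculation described above can be sketched as follows. This is a minimal Python illustration, not the code used for the analysis; the function name `gc_windows` is hypothetical, and we assume that a trailing partial window is dropped and that any character other than A, C, G or T marks a window as ambiguous.

```python
def gc_windows(seq, window=10_000):
    """Return the G+C fraction of each non-overlapping window of `seq`.

    Windows containing any ambiguous base (anything other than A, C, G, T)
    are skipped. Soft-masked (lowercase) repeat bases are counted together
    with non-repeat bases, as in the text.
    """
    seq = seq.upper()
    fractions = []
    # Step through full windows only; a trailing partial window is ignored.
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        counts = {base: chunk.count(base) for base in "ACGT"}
        if sum(counts.values()) != window:
            continue  # window contains N or another ambiguity code
        fractions.append((counts["G"] + counts["C"]) / window)
    return fractions
```

Applied per chromosome or scaffold, the resulting list of fractions gives both the genome-wide average and the width of the G+C distribution that the comparison between species relies on.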
The Ornithorhynchus anatinus whole-genome shotgun project has been deposited in DDBJ/EMBL/GenBank under the project accession AAPN00000000. The version described in this paper is the first version, AAPN01000000. The SNPs have been deposited in the dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP/) with Submitter Method IDs PLATYPUS-ASSEMBLY_SNPS_200801 and PLATYPUS-READS_SNPS_200801.
The sequencing of platypus was funded by the National Human Genome Research Institute (NHGRI). This research was supported by grant HG002238 from the NHGRI (W.M.), NGFN (0313358A; to J.S. and J.B.), the DFG (SCHM 1469; to J.S. and J.B.), National Science Foundation BCS-0218338 (M.A.B.) and EPS-0346411 (M.A.B.), National Institutes of Health RO1 GM59290 (M.A.B.), National Institutes of Health RO1HG02385 (E.E.E.), Australian Research Council (F.G.), UK Medical Research Council (C.P.P. and A.H.), Ministry of Science-Spain (X.S.P. and C.L.-O.) and the State of Louisiana Board of Regents Support Fund (M.A.B.). We thank T. Grant, S. Akiyama, P. Temple-Smith, R. Whittington and the Queensland Museum for platypus sample collection and DNA, and Macquarie Generation and Glenrock station for providing access and facilities during sampling. Approval to collect animals was granted by the New South Wales National Parks and Wildlife Services, New South Wales. Funding support for some platypus samples was provided by Australian Research Council and W.V. Scott Foundation. We thank M. Shelton, I. Elton and the Healesville Sanctuary for platypus pictures. We thank L. Duret for assistance on genome landscape analysis; G. Shaw for use of the silhouettes in Fig. 1; and Z.-X. Luo, M. Archer and R. Beck for advice on the Fig. 1 phylogeny. We acknowledge the approved use of the green anole lizard sequence data provided by the Broad Institute. Resources for exploring the sequence and annotation data are available through browser displays at UCSC (http://genome.ucsc.edu), Ensembl (http://www.ensembl.org) and the NCBI (http://www.ncbi.nlm.nih.gov).
The file contains Supplementary Data with all sequences for all PCR primers used to establish platypus population structure.
Nature Genetics (2018)