Article | Open | Published:

Genome evolution in the allotetraploid frog Xenopus laevis

Nature volume 538, pages 336343 (20 October 2016) | Download Citation


To explore the origins and consequences of tetraploidy in the African clawed frog, we sequenced the Xenopus laevis genome and compared it to the related diploid X. tropicalis genome. We characterize the allotetraploid origin of X. laevis by partitioning its genome into two homoeologous subgenomes, marked by distinct families of ‘fossil’ transposable elements. On the basis of the activity of these elements and the age of hundreds of unitary pseudogenes, we estimate that the two diploid progenitor species diverged around 34 million years ago (Ma) and combined to form an allotetraploid around 17–18 Ma. More than 56% of all genes were retained in two homoeologous copies. Protein function, gene expression, and the amount of conserved flanking sequence all correlate with retention rates. The subgenomes have evolved asymmetrically, with one chromosome set more often preserving the ancestral state and the other experiencing more gene loss, deletion, rearrangement, and reduced gene expression.


Ancient polyploidization events have shaped diverse eukaryotic genomes1, including two rounds of whole-genome duplication at the base of the vertebrate radiation2. While polyploidy is rare in amniotes, presumably owing to constraints on sex chromosome dosage3,4, it is common in fish5, amphibians6,7, and plants8. Polyploidy provides raw material for evolutionary diversification because gene duplicates can support new functions and networks9. However, the component subgenomes of a polyploid must cooperate to mediate potential incompatibilities of dosage, regulatory controls, protein–protein interactions and transposable element activity.

The African clawed frog X. laevis is one of a polyploid series that ranges from diploid to dodecaploid, and is therefore ideal for studying the impact of genome duplication10, especially given its status as a model for cell and developmental biology11. X. laevis has a chromosome number (2n = 36) nearly double that of the Western clawed frog Xenopus (formerly Silurana) tropicalis (2n = 20) and most other diploid frogs12, and is proposed to be an allotetraploid that arose via the interspecific hybridization of diploid progenitors with 2n = 18, followed by subsequent genome doubling to restore meiotic pairing and disomic inheritance10,13 (see Supplementary Note 1 and Extended Data Fig. 1 for discussion of the Xenopus allotetraploidy hypothesis).

Here we provide evidence for the allotetraploid hypothesis by tracing the origins of the X. laevis genome from its extinct progenitor diploids. The two subgenomes are distinct and maintain separate recombinational identities. Despite sharing the same nucleus, we find that the subgenomes have evolved asymmetrically: one of the two subgenomes has experienced more intrachromosomal rearrangement, gene loss by deletion and pseudogenization, and changes in levels of gene expression and in histone and DNA methylation. Superimposed on these global trends are local gene family expansions and the alteration of gene expression patterns.

Assembly, annotation and karyotype

We sequenced the genome of the X. laevis inbred ‘J’ strain by whole-genome shotgun methods in combination with long-insert clone-based end sequencing, (Supplementary Note 2) and organized the assembled sequences into chromosomes using fluorescence in situ hybridization (FISH) of 798 bacterial artificial chromosome clones (BACs) and in vivo and in vitro chromatin conformation capture analysis (Supplementary Note 3 and Methods). These complementary methods produced a high-quality chromosome-scale draft that includes all previously known X. laevis genes and assigns >91% of the assembled sequence (and 90% of the predicted protein-coding genes) to a chromosomal location.

We annotated 45,099 protein-coding genes and 342 microRNAs using RNA sequencing (RNA-seq) from 14 developmental stages (including the oocyte stage) and 14 adult tissues and organs (Supplementary Note 4), analysis of histone marks associated with transcription, and homology with X. tropicalis and other tetrapods (Supplementary Note 5 and Methods). Of the X. laevis protein-coding genes, 24,419 can be placed in 2:1 or 1:1 correspondence with 15,613 X. tropicalis genes, defining 8,806 homoeologous pairs of X. laevis genes with X. tropicalis orthologues and 6,807 single copy orthologues. The remaining genes are members of larger gene families (such as olfactory receptor genes) whose X. tropicalis orthology is more complex.

The X. laevis karyotype (Fig. 1a) reveals nine pairs of homoeologous chromosomes1,14,15. Each of the first eight pairs is co-orthologous to and named for a corresponding X. tropicalis chromosome, appending an L and S for the longer and shorter homoeologues, respectively16. XLA2L is the Z/W sex chromosome17, for which we determined a W-specific sequence in the q-subtelomeric region that includes the sex-determining gene dmw17 and a corresponding Z-specific haplotype. The homoeologous XLA2Sq, by contrast, has no such locus, and neither does XTR2 (Extended Data Fig. 2a and Supplementary Note 6). The ninth pair of homoeologues is a q–q fusion of proto-chromosomes homologous to XTR9 and XTR10, which probably occurred before allotetraploidization (Extended Data Fig. 2b–d and Supplementary Note 6). The S chromosomes are, on average, 13.2% shorter karyotypically16 and 17.3% shorter in assembled sequence than their L counterparts. The single nucleotide polymorphism rate in X. laevis is approximately 0.4%, far less than the approximately 6% divergence between homoeologous genes (Extended Data Fig. 1c and Supplementary Note 8.8).

Figure 1: Chromosome evolution in Xenopus.
Figure 1

a, Comparative cytogenetic map of XLA (Xenopus laevis) and XTR (Xenopus tropicalis) chromosomes. Magenta lines show relationships of chromosomal locations of 198 homoeologous gene pairs between XLA.L and XLA.S chromosomes, identified by FISH mapping using BAC clones (Supplementary Table 1 and Supplementary Note 3.1). Blue lines show relationships of chromosomal locations of orthologous genes between XTR chromosomes and (i) both XLA.L and XLA.S chromosomes (solid line) (lines between XLA.L and XLA.S are omitted), (ii) only XLA.L (dashed), or (iii) only XLA.S (dotted), which were taken from our previous studies14,15. Light blue lines indicate positional relationships of actr3 and lypd1 on XTR9q and rpl13a and rps11 on XTR10q with those on XLA9_10LS chromosomes (Supplementary Note 6.2). Double-headed arrows on the right of XLA.S chromosomes indicate the chromosomal regions in which inversions occurred. Ideograms of XTR and XLA chromosomes were taken from our previous reports15,16. b, Distribution of homoeologous genes (purple), singletons (grey) and subgenome-specific repeats across XLA1L (top) and XLA1S (bottom). Xl-TpL_harb is red, Xl-TpS_harb is blue, and Xl-TpS_mar is green. Purple lines mark homoeologous genes present in both L and S chromosomes, the black line marks the approximate centromere location on each chromosome. The homoeologous gene pairs, from left to right: rnf4, spcs3, intsl2, foxa1, sds, ap3s1, lifr, aqp7. Each bin is 3 Mb in size, with 0.5 Mb overlap with the previous bin. c, Chromosomal localization of the Xl-TpS_mar sequence with fluorescence in situ hybridization. Hybridization signals were only observed on the S chromosomes. Scale bar, 10 μm.

Subgenomes and timing of allotetraploidization

We reasoned that dispersed relicts of transposable elements specific to each progenitor would mark the descendent subgenomes in an allotetraploid (Fig. 2c and Extended Data Fig. 1). Three classes of DNA transposon relicts appeared almost exclusively on either the L or S chromosomes (Supplementary Note 7). Xl-TpL_harb and Xl-TpS_harb are subfamilies of miniature inverted-repeat transposable elements (MITE) of the PIF/harbinger superfamily18,19 whose relicts were almost completely restricted to L or S chromosomes, respectively (Fig. 1b and Extended Data Fig. 3a). Similarly, sequence relicts of the Tc1/mariner superfamily member Xl-TpS_mar (closely related to the fish MMTS subfamily20) were found almost exclusively on the S chromosomes (Fig. 1b), as confirmed by FISH analysis using Xl-TpS_mar as a probe (Fig. 1c and Supplementary Note 7.4; see Supplementary Note 7.3 for details on the rare elements that map to the opposite subgenome).

Figure 2: Molecular evolution and allotetraploidy.
Figure 2

a, The distribution of pseudogene ages, as described in Supplementary Note 9 (top). Phylogenetic tree illustrating the different epochs in Xenopus (bottom), with times based on protein-coding gene phylogeny of pipids, including Xenopus, Pipa carvalhoi, Hymenochirus boettgeri and Rana pipiens (only Xenopus depicted). We date the speciation of X. tropicalis and the X. laevis ancestor at 48 Ma, the L and S polyploid progenitors at 34 Ma and the divergence of the polyploid Xenopus radiation at 17 Ma. Using these times as calibration points, we estimate bursts of transposon activity at 18 Ma (mariner, blue star) and 33–34 Ma (harbinger, red star). The purple star is the time of hybridization, around 17–18 Ma. b, Phylogenetic tree based on protein-coding genes of tetrapods, rooted by elephant shark (not shown). Alignments were done by MACSE (multiple alignment of coding sequences accounting for frameshifts and stop codons) and the maximum-likelihood tree was built by PhyML. Branch length scale shown at the bottom for 0.08 substitutions per site.The difference in branch length between Xenopus laevis-L and Xenopus laevis-S is similar to that seen between mouse and rat. Both subgenomes of X. laevis have longer branch lengths than X. tropicalis.

The L and S chromosome sets therefore represent the descendants of two distinct diploid progenitors, confirming the allotetraploid hypothesis despite the absence of extant progenitor species. Analysis of synonymous divergence of protein-coding genes suggests that the L and S subgenomes diverged from each other around 34 Ma (T2) and from X. tropicalis around 48 Ma (T1) (Fig. 2a), consistent with prior gene-by-gene estimates from transcriptomes21,22,23,24 (Supplementary Note 8, Extended Data Fig. 4 and Methods). L- and S-specific transposable elements were active around 18–34 Ma, indicating that the two progenitors were independently evolving diploids during that period (Fig. 2a, Supplementary Note 7.5 and Extended Data Fig. 3). More recent transposon activity is more uniformly distributed across the L and S chromosomes (not shown). Finally, consistent with a common origin for tetraploid Xenopus species, we can clearly identify orthologues of L and S genes in whole-genome sequences of a related allotetraploid frog, X. borealis, and estimate the X. laevisX. borealis divergence to be around 17 Ma (T3). These considerations constrain the allotetraploid event to around 17–18 Ma (T*). This timing is consistent with other estimates of the radiation of tetraploid Xenopus species, which are presumed to emerge from the bottleneck of a shared allotetraploid founder population23,24.

Karyotype stability

With the exception of the chromosome 9–10 fusion, X. laevis and X. tropicalis chromosomes have maintained conserved synteny since their divergence around 48 Ma (Fig. 1a, b). The absence of inter-chromosomal rearrangements is consistent with the relative stability of amphibian and avian karyotypes compared to those of mammals25, which typically show dozens of inter-chromosome rearrangements26. It also contrasts with many plant polyploids, which can show considerable inter-subgenome rearrangement27. The distribution of L- and S-specific repeats along entire chromosomes implies the absence of crossover recombination between homoeologues since allotetraploidization, presumably because the two progenitors were sufficiently diverged to avoid meiotic pairing between homoeologous chromosomes, though we cannot rule out very limited localized inter-homoeologue exchanges (Supplementary Note 7).

The extensive collinearity between homologous X. laevis L and X. tropicalis chromosomes (Fig. 1a) implies that they represent the ancestral chromosome organization. In contrast, the S subgenome shows extensive intra-chromosomal rearrangements, evident in the large inversions of XLA2S, XLA3S, XLA4S, XLA5S and XLA8S, as well as shorter rearrangements (Fig. 1a). The S subgenome has also experienced more deletions. For example, the 45S pre-ribosomal RNA gene cluster is found on X. laevis XLA3Lp, but its homoeologous locus on XLA3Sp is absent (Extended Data Fig. 5a). Extensive small-scale deletions (Extended Data Fig. 5b) reduce the length of S chromosomes relative to their L and X. tropicalis counterparts (see below).

Response of subgenomes to allotetraploidy

Redundant functional elements in a polyploid are expected to rapidly revert to single copies through the fixation of disabling mutations and/or loss28 unless prevented by neofunctionalization8, subfunctionalization26, or selection for gene dosage29. Differential gene loss between homoeologous chromosomes is sometimes referred to as ‘genome fractionation’30,31,32 (Supplementary Note 1). At least 56.4% of the protein-coding genes duplicated by allotetraploidization have been retained in the X. laevis genome (Supplementary Note 10; 60.2% if genes on unassigned short scaffolds are included). Previous studies that relied on cDNA21 and expressed sequence tag (EST) surveys22,33,34 observed far lower rates of retention, probably owing to sampling biases from gene expression (Supplementary Note 8.2).

Even higher retention rates were found for homoeologous microRNAs (156 out of 180, 86.7%), similar to the salmonid-specific duplication5, and both primary copies are expressed for intergenic homoeologous microRNAs (Supplementary Note 8.6 and Extended Data Fig. 5e). Pan-vertebrate putatively cis-regulatory conserved non-coding elements (CNEs)35 were also highly retained (541 out of 550, 98.4%; Supplementary Note 8.7 and Table 1). CNEs conserved between X. laevis and X. tropicalis, however, were retained at a significantly lower rate (49%, P ≤ 1 × 1050; Table 1 and Supplemental Table 3). Longer genes (by genomic span, exon number or coding length) were more likely to be retained (Wilcoxon signed-rank test, P ≤ 10−5; Supplementary Note 10.5 and Extended Data Fig. 5 h–j), broadly consistent with the idea that longer genes have more independently mutable functions and are therefore more susceptible to subfunctionalization and subsequent retention36.

Table 1: Summary of retention of different genomic elements, in comparison to the diploid X. tropicalis genome

Genes have been lost asymmetrically between the two subgenomes of X. laevis. Similar results have been reported for some plant polyploids30 but not in rainbow trout5. For X. laevis protein-coding genes with clear 1:1 or 2:1 orthologues in X. tropicalis, we found that significantly more genes were lost from the S subgenome (31.5%) than from the L subgenome (8.3%; χ2 test P = 2.23 × 10−50; Supplementary Table 2), with the same trend for other types of functional elements, such as histone H3 lysine 4 trimethylation (H3K4me3)-enriched promoters and p300-bound enhancers (Table 1). Across most of the genome, genes appeared to be lost independently of their neighbours, as runs of gene losses were nearly geometrically distributed (Fig. 3a, right). We did observe some large block deletions (for example, several olfactory clusters (Extended Data Fig. 5b) and a few unusually long blocks of functionally unrelated genes that were retained in two copies without loss (Fig. 3a, left)).

Figure 3: Structural response to allotetraploidy.
Figure 3

a, Distributions of consecutive retentions (left) and deletions (right) in the L (red) and S (blue) subgenomes. The distributions were fit using the equation y = a × (ebx) + c × (edx). The y axis is shown on a log scale. Significant differences were seen between L and S subgenomes in both distributions (Student’s t-test, retention, P = 3.6 × 10−22; deletion, P = 4.5 × 10−84). b, Evolutionary conservation of the Xenopus major histocompatibility complex (MHC) and differential MHC silencing on the two X. laevis subgenomes. Selected gene names shown above. The ‘Adaptive MHC’ encodes tightly-linked essential genes involved in antigen presentation to T cells; this group of genes is the primordial linkage group and has been preserved in most non-mammalian vertebrates, including Xenopus. Differential gene silencing is particularly pronounced, as four genes around the class I gene are functional on the S chromosome, but absent (dma and dmb (MHC-class II domain alpha and beta) or pseudogenes (ring3, really interesting new gene 3; lmp2, large multifunctional proteasome 2) on the L chromosome. The gene map is not to scale; pseudogenes are noted as indicated. HSA, Homo sapiens MHC; XLA Xenopus laevis MHC; GGA Gallus gallus (chicken) MHC. Refer to the Supplementary Table 8 for a more detailed MHC map. TAPBP, TAP binding protein, or tapasin; TAP2, antigen peptide transporter 2; CFB, complement factor B and TNFa, tumor necrosis factor α. c, Hox gene clusters. X. laevis retains eight hox clusters, consisting of pairs of hoxa, b, c and d clusters, on L and S chromosomes. even-skipped genes (evx1 or evx2) are positioned flanking hoxa and hoxd clusters. hox genes are classified into four groups namely, labial, proboscipedia, central and posterior groups. Note that hoxb2.L (2p, black) is a pseudogene. d, Syntenies around the mix gene family. Abbreviations for species and chromosome numbers: HSA1, H. sapiens;; GGA3 G. gallus (chicken); XTR5, X. tropicalis; XLA5L and XLA5S, X. laevisL and S subgenomes); DRE20, D. rerio (zebrafish);. Each Xenopus (sub)genome experienced its own independent expansion of the family (see Extended Data Fig. 5 for details).

Many lost genes were simply deleted, as demonstrated by significantly shorter distances between conserved flanking genes in the other subgenome and in X. tropicalis. Both the size and number of deletions were greater on the S subgenome (Extended Data Fig. 5c). We identified 985 ‘unitary’ (that is, non-retrotransposed) pseudogenes out of 1,531 loci examined in detail. This 64% detection rate was similar between subgenomes in X. laevis and comparable to that reported in trout5. Based on the accumulation of non-synonymous mutations37 we estimated that most of these pseudogenes escaped evolutionary constraint around 15 Ma (Fig. 2a and Extended Data Fig. 6), consistent with the onset of extensive redundancy in the allotetraploid, although the precision of our pseudogene age estimates is low (Supplementary Note 9). Most pseudogenes showed no evidence of expression, but of 769 pseudogenes longer than 100 bp, 133 (17.2%) showed residual expression (Extended Data Fig. 6). Conversely, among homoeologous gene pairs, we found 760 for which one member had little to no expression across our 28 sampled conditions. Although these retained some gene structure (start and stop codon, no frame shifts, good splice signals), they showed increased rates of amino acid change and appeared to be under relaxed selection (Extended Data Fig. 5f). We called these nominally dying genes ‘thanagenes’ (Supplementary Note 12.5). Reduced expression may be due to mutated cis-regulatory elements, as exemplified by the six6 gene pair (Fig. 4e, Extended Data Fig. 8 g–i and Supplementary Note 13.1).

Figure 4: Retention and functional differentiation.
Figure 4

a, Comparison of L and S gene loss by KEGG categories (left) and tissue-weighted gene co-expression network analysis (WGCNA) categories (right) (Supplementary Note 10.1). Blue line denotes expected L or S loss based on genome-wide average (56.4%). Red points denote functional categories showing a high degree of loss. Magenta points denote functional categories showing a high degree of retention (χ2 test, P < 0.01). b, Box plot of log10(LTPM/STPM) for homoeologous gene pairs, zoomed in to show medians. Ovary and maternally controlled developmental time points (left, light blue and dark blue bars, respectively), zygotically controlled developmental time points and adult tissues (right, red and green bars, respectively). Red line, equal ratio log10(1). On average, maternal datasets express the L gene of a homoeologous pair 12% more strongly than the S gene (median = 0%), whereas zygotic tissues and time points express the L gene of a homoeologous pair 25% more strongly than the S gene (median = 1.8%). The difference between the mean and medians is explained by many genes with large differences between homoeologues (Extended Data Fig. 8c). c, d, Developmental expression plot (left) and epigenetic landscape (right) surrounding hoxb4 (c) and numbl (d). Right, genomic profiles of H3K4me3 (green), p300 (yellow), RNA polymerase II (RNAP II; d, purple) and H3K36me3 (d, blue) ChIP–seq tracks, as well as DNA methylation levels determined by whole-genome bisulfite sequencing (grey). Gene annotation track shows hoxb4 (c) and numbl (d) genes on L (top) and S. Grey denotes conservation between L and S genomic sequences. d, The small amount of expression seen in maternal numbl and numbl.L is consistent between replicates. Gene expression is measured in transcripts per million mapped reads (TPM). e, Representative embryos with GFP expression, as detected by in situ hybridization at stages 32–33, driven by six6.L-CNE or six6.S-CNE linked to a basal promoter-GFP cassette (six6.L-CNE:GFP and six6.S-CNE:GFP, respectively). Embryos were 4,250–4,450 μm. Semi-quantitative image analysis revealed a substantial difference in average expression level; the expression driven by six6.S-CNE (n = 27) was 0.6-fold weaker than that by six6.L-CNE in the eye region (n = 32). Given eye-specific patterns of their endogenous expression, the six6 genes probably have additional silencers for restricting enhancer activity of the CNEs in the eye.

Although tetraploidy created two ‘copies’ of nearly every gene, additional gene copies were continually produced by tandem duplication (Fig. 3d and Extended Data Fig. 7). The number of tandem clusters was greater in X. tropicalis than in the X. laevis L subgenome, which in turn was greater than in the S subgenome (Supplementary Note 11.1). Although tandem duplication was faster in X. tropicalis than in X. laevis, there was also a higher rate of loss. Since tandem duplications and deletions occur by unequal crossing over during meiosis, these differing rates were consistent with the shorter generation time of X. tropicalis (Extended Data Fig. 7 f, g). The mean time to loss of an old tandem duplicate is around 40 Ma in X. laevis (on either subgenome) compared to around 16 Ma in X. tropicalis. Homoeologous gene loss and tandem duplication can combine to yield complex histories for some gene families. We discuss how these families contribute to the literature on whole-genome duplication evolution in Supplementary Notes 10 and 13.

Functional patterns of gene retention and loss

We found preferential retention or loss of many functional categories (Fig. 4a, Extended Data Figs 4e, 9, 10 and Supplementary Note 13). DNA binding proteins, components of developmentally regulated signalling (TGFβ, Wnt, Hedgehog and Hippo) and cell cycle regulation pathways were retained at a substantially higher rate (>90%) than average (Extended Data Fig. 10). Genes retained in multiple copies after the ancient vertebrate genome duplication were also more likely to be retained as homoeologues in X. laevis (Supplementary Note 10.4), similar to teleost and trout genome duplications5. We found nearly complete retention of 37 out of 38 duplicated genes in the four pairs of homoeologous Hox clusters, with a single pseudogene (Fig. 3c). High rates of homoeologue retention in most genes in these categories suggest that stoichiometrically controlled expression levels may be needed, or subfunctionalization of homoeologues may have occurred, either in their expression domain or in their target specificity.

Conversely, homoeologous genes in other functional categories have been lost at a higher rate, presumably because of a corresponding lack of selection for dosage. For example, genes involved in DNA repair were lost at a high rate (79%) (Supplementary Note 10.1). This is consistent with relaxed selection for repair in the immediate aftermath of allotetraploidy, when all genes were present in four copies per somatic cell5. Other metabolic categories were also prone to loss, presumably because single loci encoding enzymes were sufficient38. Genomic regions with notable loss include the major histocompatibility complex genes on the S subgenome (Fig. 3b) and several olfactory receptor clusters (Extended Data Fig. 5b). We hypothesize that homoeologous genes may be functionally incompatible in these cases, leading to en bloc deletion in response to selection pressure. Specific case studies of duplicate gene retention and loss are detailed in Extended Data Figs 9, 10 and Supplementary Note 13.

Evolution of gene expression

Gene expression is also a predictor of retention, whereby more highly expressed genes are more likely to be retained (Extended Data Fig. 8b), similar to results seen in Paramecium39,40. Developmentally regulated genes whose expression levels peak at the maternal zygotic transition (MZT) or during neural differentiation were retained at higher levels (P < 0.01), based on gene expression networks constructed from developmental and adult tissue expression (Methods, Fig. 4a (right), Extended Data Fig. 10e and Supplementary Note 12.3). We speculate that the exceptional retention of developmentally regulated genes is due to selection for stoichiometric dosage of these factors, and in some cases higher expression in the physically larger allotetraploid cells and embryos relative to those of diploid frogs, although a propensity36 of these genes for sub- or neofunctionalization could also have contributed. In the adult, genes whose expression peaks in the brain and eye were also retained at higher levels (Fig. 4b).

In X. laevis, the expression of homoeologues was highly correlated (Extended Data Fig. 8a), showing that the overall expression of homoeologues diverged similarly to that of orthologues between Xenopus species41. Many homoeologous pairs, however, were differentially expressed throughout development or across adult tissues, either in a spatiotemporal pattern (a form of subfunctionalization36; Supplementary Note 12.4 and Extended Data Fig. 8d–f) or in the same pattern but with differing expression levels. When homoeologous gene pairs were both expressed, the average L copy expression level was approximately 25% higher than that of the S copy consistently across adult tissues and after the MZT42 (Fig. 4b and Supplementary Note 12.2). Excess L expression, however, averaged only around 12% in oocyte and early pre-MZT stages, suggesting that the two subgenomes were more evenly expressed as maternal transcripts but developed an increased asymmetry after the MZT. Strikingly, we found 391 cases in which one homoeologue had no detectable maternal mRNA (oocytes, egg and stage 8; Fig. 4c, d and Extended Data Fig. 8c). Compared to similar transcript data from X. tropicalis, we found cases of an apparent loss of expression (‘maternal subfunctionalization’, that is, X. tropicalis and one X. laevis gene was expressed, whereas the other X. laevis gene was silenced pre-MZT) in 238 genes (for example numbl.S). We also found gains of expression (‘maternal neofunctionalization’, that is, the X. tropicalis gene was not expressed maternally, but one X. laevis gene was expressed) in 153 genes (for example hoxb4.L). We did not see such a large divergence in other expression domains (Supplementary Note 12.2 and Extended Data Fig. 8c), suggesting a higher level of plasticity of maternal mRNA regulation between X. laevis homoeologues, similar to the trend seen between Xenopus species41.

Overall, thousands of homoeologue pairs have either divergent spatiotemporal patterns or similar patterns with differing expression levels. Such homoeologue pairs differed in substitution rate and coding sequence length difference more than those that were similar in expression (Supplementary Note 12.4 and Extended Data Fig. 8d–f), a pattern that was also found in trout homoeologous pairs5. These expression differences can largely be attributed to changes in epigenetic regulation (Random Forest classification; ROC area under the curve 0.78), with changes in H3K4me3 and DNA methylation contributing the most explanatory power among our epigenetic variables (Supplementary Note 14). Detailed comparison of the two subgenomes will facilitate identification of specific sequences that control cis-regulatory differences between homoeologues.


The two subgenomes of Xenopus laevis have evolved asymmetrically, with the L subgenome more consistently resembling the ancestral condition and the S subgenome more disrupted by deletion and rearrangement. Asymmetric gene loss has been observed in allopolyploid plants30 and yeast43 at the segmental level, but it has not been shown directly that similarly fractionated segments derive from the same progenitor (Fig. 1c). Our results are consistent with the idea that optimized gene expression levels are an important force affecting gene retention following polyploidy39,40. The asymmetry between the L and S subgenomes could have been the result of an intrinsic difference between their diploid progenitors. Alternately, the remodelling of the S genome could have been a response to the L–S merger itself, a ‘genomic shock’44 resulting from the activation of transposable elements (Fig. 2a and Supplementary Note 8.5). The popularity of Xenopus as a model for the study of vertebrate development, cell biology and immunology is now extended to a model for vertebrate polyploidy.


No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Notation and terminology

‘Homoeologous’ chromosomes are anciently orthologous chromosomes that diverged by speciation but were reunited in the same nucleus by a polyploidization event. They are a special case of paralogues. Homoeologous genes are sometimes called ‘alloalleles’ to emphasize their role as alternate forms of a gene, but since homoeologues are unlinked and assort independently, we do not use this terminology. Similarly, loss of homoeologous genes is sometimes referred to as ‘diploidization’. We prefer the simpler and more descriptive term ‘gene loss’. Note that an allotetraploid such as Xenopus laevis has two related subgenomes, but these subgenomes are each transmitted to progeny via conventional disomic inheritance. So immediately after allotetraploidization, the new species is already genetically diploid. This is clearly the case for X. laevis, since we find no evidence for recombination between homoeologous chromosomes, which would create new sequences with mixed ‘L’ and ‘S’ type transposable elements.

Sequencing and assembly

DNA was extracted from the blood of a single female from the inbred J-strain for whole-genome shotgun sequencing. We generated 4.6 billion paired-end Illumina reads from a range of inserts and used Sanger dideoxy sequencing to obtain fosmid- and bacterial artificial chromosome (BAC)-end pairs and full BAC sequences. We used meraculous45 as the primary genome assembler. See supplementary notes for more detailed information.

Chromosome scale organization

We identified 798 BACs containing genes of interest distributed across the Xenopus genome and performed fluorescence FISH to assign these BACs to specific chromosomes based on Hoechst 33258-stained late-replication banding patterns (Supplementary Table 1). Tethered chromatin conformation capture (TCC)46 and in vitro chromatin conformation capture47 were performed as previously described, and assembled with HiRise47.

Characterization of sex locus

Sex determination in X. laevis follows a female heterogametic ZZ/ZW system48. We fully sequenced BAC clones representing both W and Z haplotypes, and identified both W- and Z-specific sequences (Extended Data Fig. 2a). The existence of the Z-specific sequence was unexpected and therefore verified by PCR analysis using specific primer sets and DNA from gynogenetic frogs having either W or Z loci.

Gene annotation

We made use of extensive previously generated transcriptome data for X. laevis and X. tropicalis, including 697,015 X. laevis EST sequences49. In addition, more than 1 billion RNA-seq reads were generated for this project from 14 oocyte/developmental stages and 14 adult tissues from J-strain X. laevis (Supplementary Note 4). These data were combined with homology and ab initio predictions using the Joint Genome Institute’s integrated gene call pipeline (see Supplementary Notes 4 and 8 for more details).

Characterization of subgenome-specific transposable elements

We found subgenome-specific repeats using a RepeatMasker50 result. The repeats were used to reconstruct full-length subgenome specific transposon sequences. The specific transposons, Xl-TpL_harb, Xl-TpS_harb and Xl-TpS_mar, were classified on the basis of their target site sequence and terminal inverted repeat (TIR) sequences. The coverage lengths of the transposons on each chromosome were calculated from the results of BLASTN search (E < 10−5) using the consensus sequences of the transposons as queries. The chromosomal distribution of the Xl-TpS_mar was revealed by a FISH analysis (Supplementary Note 7.4).

Phylogeny, divergence time, and evolutionary rates

We used Hymenochirus boettgeri, Pipa carvalhoi and Rana pipiens sequences as outgroups to estimate the evolutionary rate of duplicated genes in X. laevis and their relationship to X. tropicalis. See Supplementary Notes 7 and 8 for more detail.

Deletions and pseudogenes

Pseudogene sequences contain various defects including premature stop codons, frameshifts, disrupted splicing, and/or partial coding deletions. 985 pseudogenes were identified among 1,531 ‘2-1-2 regions’, with the others deleted or rendered unidentifiable by mutation. 368 out of 985 could be timed, based on the accumulation of non-synonymous and synonymous substitution between a pseudogene, its homoeologue and its orthologue in X. tropicalis, providing a time since the loss of constraint for each pseudogene37. See Supplementary Note 9 for additional details.

Functional annotation of genes

We used several bioinformatic methods and high-throughput datasets to assign functional annotations to Xenopus genes. Protein domains were assigned using InterPro (including PFAM and Panther)51 and KEGG52. Gene Ontology was assigned using InterPro2Go51. We identified genes that encode mitochondrial proteins by mapping the MitoCarta53 database from mouse to the most recent X. tropicalis proteome. Xenopus genes associated with germ plasm were manually curated using the extensive Xenopus literature (Supplementary Note 13).

Gene expression

We analysed transcriptome data generated for 14 oocyte/developmental stages and 14 adult tissues in duplicate except for oocyte stages (see Supplementary Note 4). Expression levels were measured by mapping paired-end RNA-seq reads to predicted full length cDNA and reporting transcripts per one million mapped reads (TPM). We consider the limit of detectable expression to be TPM >0.5. Co-expression modules were defined by weighted gene correlation network analysis (WGCNA) clustering54 (Supplementary Note 12).

Epigenetic analysis

We determined DNA methylation levels (DNAme) by whole genome bisulfite sequencing and used ChIP–seq to generate profiles of the promoter mark histone H3 lysine 4 trimethylation (H3K4me3), the transcription elongation mark H3K36me3, as well as RNA polymerase II (RNAPII) and the enhancer-associated co-activator p300. To test which regulatory features would contribute most to the L versus S expression differences, we applied a Random Forest machine learning algorithm to analyse differential expression between the L and S homoeologues (See Supplementary Note 14).

Data availability

The XENLAv9.1 genome assembly and annotation are deposited at NCBI (accession LYTH00000000. The DNA read libraries of X. laevis and X. borealis were deposited at the Sequence Read Archive under accessions SRP071264 and SRP070985, respectively. Datasets of the X. laevis RNA-seq short reads were deposited in NCBI Gene Expression Omnibus (accession number GSE73430 for stages, GSE73419 for tissues). Datasets of the Hymenochirus RNA-seq short reads were deposited in NCBI GEO (accession number GSE76089). The epigenetic data have been deposited in NCBI’s Gene Expression Omnibus and are accessible through GEO Series accession numbers GSE76059 for ChIP–seq. MethylC-seq data are accessible through GEO Series accession number GSE76247. The sequence data from BAC and fosmid clones have been deposited to DDBJ/GenBank/EMBL under the accession numbers: (i) GA131508–GA227532, GA228275–GA244139, GA244852–GA274229, GA274976–GA275712, GA277157–GA344957, GA345673–GA350926 and GA351685–GA393223 for the XLB1 end-sequences; (ii) GA720358–GA756840 for the XLB2 end-sequences; (iii) GA756841–GA867435 for the XLFIC end-sequences and (iv) AP012997–AP013026,AP014660–AP014679, AP017316 and AP017317 for the finished BAC/fosmid sequences.


  1. 1.

    , & The evolutionary significance of ancient genome duplications. Nat. Rev. Genet. 10, 725–732 (2009)

  2. 2.

    , , & Gene duplications and the origins of vertebrate development. Development Suppl. , 125–133 (1994)

  3. 3.

    Why polyploidy is rarer in animals than in plants. Am. Nat. 59, 346–353 (1925)

  4. 4.

    ‘Why polyploidy is rarer in animals than in plants’ revisited. Am. Nat. 136, 759–770 (1990)

  5. 5.

    et al. The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nat. Commun. 5, 3657 (2014)

  6. 6.

    et al. The zebrafish gene map defines ancestral vertebrate chromosomes. Genome Res. 15, 1307–1314 (2005)

  7. 7.

    & Whole-genome duplication in teleost fishes and its evolutionary consequences. Mol. Genet. Genomics 289, 1045–1060 (2014)

  8. 8.

    The evolutionary consequences of polyploidy. Cell 131, 452–462 (2007)

  9. 9.

    Evolution by Gene Duplication (Springer, 1970)

  10. 10.

    & Genetics of polyploid Xenopus. Trends Genet. 2, 310–315 (1986)

  11. 11.

    & Xenopus research: metamorphosed by genetics and genomics. Trends Genet. 27, 507–515 (2011)

  12. 12.

    A list of chromosome numbers of anuran amphibians. Bull. Fukuoka Univ. Educ. 39, 83–127 (1990)

  13. 13.

    , , , & Albumin phylogeny for clawed frogs (Xenopus). Science 195, 785–787 (1977)

  14. 14.

    , , , & Homoeologous chromosomes of Xenopus laevis are highly conserved after whole-genome duplication. Heredity 111, 430–436 (2013)

  15. 15.

    et al. Inference of the protokaryotypes of amniotes and tetrapods and the evolutionary processes of microchromosomes from comparative gene mapping. PLoS One 7, e53027 (2012)

  16. 16.

    et al. A new nomenclature of Xenopus laevis chromosomes based on the phylogenetic relationship to Silurana/Xenopus tropicalis. Cytogenet. Genome Res. 145, 187–191 (2015)

  17. 17.

    et al. A W-linked DM-domain gene, DM-W, participates in primary ovary development in Xenopus laevis. Proc. Natl Acad. Sci. USA 105, 2469–2474 (2008)

  18. 18.

    et al. P instability factor: an active maize transposon system associated with the amplification of Tourist-like MITEs and a new superfamily of transposases. Proc. Natl Acad. Sci. USA 98, 12572–12577 (2001)

  19. 19.

    & PIFs meet Tourists and Harbingers: a superfamily reunion. Proc. Natl Acad. Sci. USA 98, 12315–12316 (2001)

  20. 20.

    , , , & MMTS, a new subfamily of Tc1-like transposons. Mol. Cells 26, 387–395 (2008)

  21. 21.

    et al. Sequencing and analysis of 10,967 full-length cDNA clones from Xenopus laevis and Xenopus tropicalis reveals post-tetraploidization transcriptome remodeling. Genome Res. 16, 796–803 (2006)

  22. 22.

    et al. Accelerated gene evolution and subfunctionalization in the pseudotetraploid frog Xenopus laevis. BMC Biol. 5, 31 (2007)

  23. 23.

    , , & The pipid root. Syst. Biol. 61, 913–926 (2012)

  24. 24.

    Xenopus in space and time: fossils, node calibrations, tip-dating, and paleobiogeography. Cytogenet. Genome Res. 145, 283–301 (2015)

  25. 25.

    et al. Origin of amphibian and avian chromosomes by fission, fusion, and retention of ancestral chromosomes. Genome Res. 21, 1306–1312 (2011)

  26. 26.

    & Mammalian karyotype evolution. Nat. Rev. Genet. 8, 950–962 (2007)

  27. 27.

    et al. Genomic duplication, fractionation and the origin of regulatory novelty. Genetics 166, 935–945 (2004)

  28. 28.

    The part played by recurrent mutation in evolution. Am. Nat. 67, 5–19 (1933)

  29. 29.

    & Gene balance hypothesis: connecting issues of dosage sensitivity across biological disciplines. Proc. Natl Acad. Sci. USA 109, 14746–14753 (2012)

  30. 30.

    , & Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc. Natl Acad. Sci. USA 108, 4069–4074 (2011)

  31. 31.

    , & A model for biased fractionation after whole genome duplication. BMC Genomics 13 (Suppl. 1), S8 (2012)

  32. 32.

    et al. Two evolutionarily distinct classes of paleopolyploidy. Mol. Biol. Evol. 31, 448–454 (2014)

  33. 33.

    & Preferential subfunctionalization of slow-evolving genes after allopolyploidization in Xenopus laevis. Proc. Natl Acad. Sci. USA 105, 8333–8338 (2008)

  34. 34.

    , & The odds of duplicate gene persistence after polyploidization. BMC Genomics 12, 599 (2011)

  35. 35.

    , , , & Ancient vertebrate conserved noncoding elements have been evolving rapidly in teleost fishes. Mol. Biol. Evol. 28, 1205–1215 (2011)

  36. 36.

    et al. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151, 1531–1545 (1999)

  37. 37.

    , , , & Molecular decay of the tooth gene Enamelin (ENAM) mirrors the loss of enamel in the fossil record of placental mammals. PLoS Genet. 5, e1000634 (2009)

  38. 38.

    & A common framework for understanding the origin of genetic dominance and evolutionary fates of gene duplications. Trends Genet. 20, 287–290 (2004)

  39. 39.

    et al. Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature 444, 171–178 (2006)

  40. 40.

    , , & Paramecium Post-Genomics Consortium. The relationship among gene expression, the evolution of gene dosage, and the rate of protein evolution. PLoS Genet. 6, e1000944 (2010)

  41. 41.

    , , & Mapping gene expression in two Xenopus species: evolutionary constraints and developmental flexibility. Dev. Cell 20, 483–496 (2011)

  42. 42.

    , , & New insights into the maternal to zygotic transition. Development 141, 3834–3841 (2014)

  43. 43.

    & Beyond the whole-genome duplication: phylogenetic evidence for an ancient interspecies hybridization in the baker’s yeast lineage. PLoS Biol. 13, e1002220 (2015)

  44. 44.

    The significance of responses of the genome to challenge. Science 226, 792–801 (1984)

  45. 45.

    et al. Meraculous: de novo genome assembly with short paired-end reads. PLoS One 6, e23501 (2011)

  46. 46.

    et al. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011)

  47. 47.

    et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 26, 342–350 (2016)

  48. 48.

    & Genic control and hormonal reversal of sex differentiation in Xenopus. Proc. Soc. Exp. Biol. Med. 93, 140–144 (1956)

  49. 49.

    From expression cloning to gene modeling: the development of Xenopus gene sequence resources. Genesis 50, 143–154 (2012)

  50. 50.

    , & RepeatMasker Open-4.0.

  51. 51.

    et al. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res. 43, D213–D221 (2015)

  52. 52.

    et al. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res. 42, D199–D205 (2014)

  53. 53.

    , & MitoCarta2.0: an updated inventory of mammalian mitochondrial proteins. Nucleic Acids Res. 44, D1251–D1257 (2016)

  54. 54.

    & WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008)

  55. 55.

    & Identification of Xenopus CENP-A and an associated centromeric DNA repeat. Mol. Biol. Cell 16, 1800–1810 (2005)

  56. 56.

    et al. Ohnologs are overrepresented in pathogenic copy number mutations. Proc. Natl. Acad. Sci. USA 111, 361–366 (2014)

  57. 57.

    et al. RNA sequencing reveals a diverse and dynamic repertoire of the Xenopus tropicalis transcriptome over development. Genome Res. 23, 201–216 (2013)

Download references


Please see Supplementary Note 15 for funding information.

Author information

Author notes

    • Adam M. Session
    • , Yoshinobu Uno
    •  & Taejoon Kwon

    These authors contributed equally to this work.

    • Christian Haudenschild

    Present address: Personalis Inc., 1330 O’Brien Drive, Menlo Park, California 94025, USA.


  1. University of California, Berkeley, Department of Molecular and Cell Biology and Center for Integrative Genomics, Life Sciences Addition #3200, Berkeley, California 94720-3200, USA

    • Adam M. Session
    • , Jessica B. Lyons
    • , Therese Mitros
    • , Darwin S. Dichmann
    • , Richard M. Harland
    •  & Daniel S. Rokhsar
  2. US Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA

    • Adam M. Session
    • , Jarrod A. Chapman
    • , Uffe Hellsten
    • , Shengquiang Shu
    • , Joseph Carlson
    • , Jerry Jenkins
    • , Jane Grimwood
    • , Jeremy Schmutz
    •  & Daniel S. Rokhsar
  3. Department of Applied Molecular Biosciences, Graduate School of Bioagricultural Sciences, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan

    • Yoshinobu Uno
    •  & Yoichi Matsuda
  4. Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, Texas 78712, USA

    • Taejoon Kwon
    • , Edward M. Marcotte
    •  & John B. Wallingford
  5. Department of Biomedical Engineering, School of Life Sciences, Ulsan National Institute of Science and Technology, Ulsan 689-798, Republic of Korea

    • Taejoon Kwon
  6. Center for Information Biology, and Advanced Genomics Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan

    • Atsushi Toyoda
    •  & Asao Fujiyama
  7. Amphibian Research Center, Graduate School of Science, Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima, Hiroshima 739-8526, Japan

    • Shuji Takahashi
    •  & Atsushi Suzuki
  8. Laboratory of Tissue and Polymer Sciences, Faculty of Advanced Life Science, Hokkaido University, N10W8, Kita-ku, Sapporo 060-0810, Japan

    • Akimasa Fukui
  9. Division of Human Sciences, Graduate School of Integrated Arts and Sciences, Hiroshima University, 1-7-1 Kagamiyama, Higashi-Hiroshima, Hiroshima 739-8521, Japan

    • Akira Hikosaka
  10. Misaki Marine Biological Station (MMBS), Graduate School of Science, The University of Tokyo, 1024 Koajiro, Misaki, Miura, Kanagawa 238-0225, Japan

    • Mariko Kondo
  11. Radboud University, Faculty of Science, Department of Molecular Developmental Biology, 259 RIMLS, M850/2.97, Geert Grooteplein 28, Nijmegen 6525 GA, the Netherlands

    • Simon J. van Heeringen
    • , Georgios Georgiou
    • , Sarita S. Paranjpe
    • , Ila van Kruijsbergen
    •  & Gert Jan C. Veenstra
  12. Salk Institute, Molecular Neurobiology Laboratory, La Jolla, San Diego, California 92037, USA

    • Ian Quigley
  13. Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, San Diego, California 92037, USA

    • Sven Heinz
  14. Department of Animal Bioscience, Nagahama Institute of Bio-Science and Technology, 1266 Tamura, Nagahama, Shiga 526-0829, Japan

    • Hajime Ogino
  15. Institute for Promotion of Medical Science Research, Yamagata University Faculty of Medicine, 2-2-2 Iida-Nishi, Yamagata, Yamagata 990-9585, Japan

    • Haruki Ochi
  16. Molecular Genetics Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904-0495, Japan

    • Oleg Simakov
    •  & Daniel S. Rokhsar
  17. Dovetail Genomics LLC. Santa Cruz, California 95060, USA

    • Nicholas Putnam
    •  & Jonathan Stites
  18. Department of Genome Medicine, National Research Institute for Child Health and Development, NCCHD, 2-10-1, Okura, Setagaya-ku, Tokyo 157-8535, Japan

    • Yoko Kuroki
  19. Department of Life Science and Technology, Tokyo Institute of Technology, 4259 Nagatsuta, Midori-ku, Yokohama 226-8501, Japan

    • Toshiaki Tanaka
  20. Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, 3-8-1, Komaba, Meguro-ku, Tokyo 153-8902, Japan

    • Tatsuo Michiue
  21. Institute of Institution of Liberal Arts and Fundamental Education, Tokushima University, 1-1 Minamijosanjima-cho, Tokushima 770-8502, Japan

    • Minoru Watanabe
  22. Harry Perkins Institute of Medical Research and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Western Australia 6009, Australia

    • Ozren Bogdanovic
    •  & Ryan Lister
  23. Department of Life Science, Faculty of Science, Rikkyo University, 3-34-1 Nishi-Ikebukuro, Toshima-ku, Tokyo 171-8501, Japan

    • Tsutomu Kinoshita
  24. Department of Microbiology and Immunology, University of Maryland, 655 W Baltimore St, Baltimore, Maryland 21201, USA

    • Yuko Ohta
    •  & Martin F. Flajnik
  25. Kitasato Institute for Life Sciences, Kitasato University, 5-9-1 Shirokane Minato-ku, Tokyo 108-8641, Japan

    • Shuuji Mawaribuchi
  26. HudsonAlpha Institute of Biotechnology, Huntsville, Alabama 35806, USA

    • Jerry Jenkins
    • , Jane Grimwood
    •  & Jeremy Schmutz
  27. Department of Human Genetics, University of Chicago, 920 E. 58th St, CLSC 431F, Chicago, Illinois 60637, USA

    • Sahar V. Mozaffari
  28. Department of Computational Biology and Medical Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba 277-8568, Japan

    • Yutaka Suzuki
  29. Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology (AIST), Central 5, 1-1-1 Higashi, Tsukuba, Ibaraki 305-8565, Japan

    • Yoshikazu Haramoto
    • , Yuzuru Ito
    •  & Makoto Asashima
  30. Division of Morphogenesis, Department of Developmental Biology, National Institute for Basic Biology, 38 Nishigonaka, Myodaiji, Okazaki, Aichi 444-8585, Japan

    • Takamasa S. Yamamoto
    • , Chiyo Takagi
    •  & Naoto Ueno
  31. University of California, Berkeley, Department of Molecular and Cell Biology, Life Sciences Addition #3200, Berkeley California 94720-3200, USA

    • Rebecca Heald
    •  & Kelly Miller
  32. Illumina Inc., 25861 Industrial Blvd, Hayward, California 94545, USA

    • Christian Haudenschild
  33. Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 3720 15th Ave NE, Seattle Washington 98195-5065, USA

    • Jacob Kitzman
    •  & Jay Shendure
  34. Department of Biology, University of Virginia, Charlottesville, Virginia 22904, USA

    • Takuya Nakayama
  35. Department of Biology, Faculty of Science, Niigata University, 8050, Ikarashi 2-no-cho, Nishi-ku, Niigata 950-2181, Japan

    • Yumi Izutsu
  36. Department of Microbiology & Immunology, University of Rochester Medical Center, Rochester, New York 14642, USA

    • Jacques Robert
  37. Division of Developmental Biology, Cincinnati Children's Research Foundation, Cincinnati, Ohio 45229-3039, USA

    • Joshua Fortriede
    • , Kevin Burns
    •  & Aaron M. Zorn
  38. Department of Biological Sciences, University of Calgary, Alberta T2N 1N4, Canada

    • Vaneet Lotay
    • , Kamran Karimi
    •  & Peter D. Vize
  39. Marine Genomics Unit, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan

    • Yuuri Yasuoka
  40. The University of Iowa, Department of Biology, 257 Biology Building, Iowa City, Iowa 52242-1324, USA

    • Douglas W. Houston
  41. Department of Zoology and Evolutionary Biology, University of Basel, Basel CH-4051, Switzerland

    • Louis DuPasquier
  42. Department of Biological Sciences, School of Science, Kitasato University, 1-15-1 Minamiku, Sagamihara, Kanagawa 252-0373, Japan

    • Michihiko Ito
  43. Department of Basic Biology, SOKENDAI (The Graduate University for Advanced Studies), 38 Nishigonaka, Myodaiji, Okazaki, Aichi 444-8585, Japan

    • Naoto Ueno
  44. Principles of Informatics, National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan

    • Asao Fujiyama
  45. Department of Genetics, SOKENDAI (The Graduate University for Advanced Studies), 1111 Yata, Mishima, Shizoka 411-8540, Japan

    • Asao Fujiyama
  46. Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

    • Masanori Taira


  1. Search for Adam M. Session in:

  2. Search for Yoshinobu Uno in:

  3. Search for Taejoon Kwon in:

  4. Search for Jarrod A. Chapman in:

  5. Search for Atsushi Toyoda in:

  6. Search for Shuji Takahashi in:

  7. Search for Akimasa Fukui in:

  8. Search for Akira Hikosaka in:

  9. Search for Atsushi Suzuki in:

  10. Search for Mariko Kondo in:

  11. Search for Simon J. van Heeringen in:

  12. Search for Ian Quigley in:

  13. Search for Sven Heinz in:

  14. Search for Hajime Ogino in:

  15. Search for Haruki Ochi in:

  16. Search for Uffe Hellsten in:

  17. Search for Jessica B. Lyons in:

  18. Search for Oleg Simakov in:

  19. Search for Nicholas Putnam in:

  20. Search for Jonathan Stites in:

  21. Search for Yoko Kuroki in:

  22. Search for Toshiaki Tanaka in:

  23. Search for Tatsuo Michiue in:

  24. Search for Minoru Watanabe in:

  25. Search for Ozren Bogdanovic in:

  26. Search for Ryan Lister in:

  27. Search for Georgios Georgiou in:

  28. Search for Sarita S. Paranjpe in:

  29. Search for Ila van Kruijsbergen in:

  30. Search for Shengquiang Shu in:

  31. Search for Joseph Carlson in:

  32. Search for Tsutomu Kinoshita in:

  33. Search for Yuko Ohta in:

  34. Search for Shuuji Mawaribuchi in:

  35. Search for Jerry Jenkins in:

  36. Search for Jane Grimwood in:

  37. Search for Jeremy Schmutz in:

  38. Search for Therese Mitros in:

  39. Search for Sahar V. Mozaffari in:

  40. Search for Yutaka Suzuki in:

  41. Search for Yoshikazu Haramoto in:

  42. Search for Takamasa S. Yamamoto in:

  43. Search for Chiyo Takagi in:

  44. Search for Rebecca Heald in:

  45. Search for Kelly Miller in:

  46. Search for Christian Haudenschild in:

  47. Search for Jacob Kitzman in:

  48. Search for Takuya Nakayama in:

  49. Search for Yumi Izutsu in:

  50. Search for Jacques Robert in:

  51. Search for Joshua Fortriede in:

  52. Search for Kevin Burns in:

  53. Search for Vaneet Lotay in:

  54. Search for Kamran Karimi in:

  55. Search for Yuuri Yasuoka in:

  56. Search for Darwin S. Dichmann in:

  57. Search for Martin F. Flajnik in:

  58. Search for Douglas W. Houston in:

  59. Search for Jay Shendure in:

  60. Search for Louis DuPasquier in:

  61. Search for Peter D. Vize in:

  62. Search for Aaron M. Zorn in:

  63. Search for Michihiko Ito in:

  64. Search for Edward M. Marcotte in:

  65. Search for John B. Wallingford in:

  66. Search for Yuzuru Ito in:

  67. Search for Makoto Asashima in:

  68. Search for Naoto Ueno in:

  69. Search for Yoichi Matsuda in:

  70. Search for Gert Jan C. Veenstra in:

  71. Search for Asao Fujiyama in:

  72. Search for Richard M. Harland in:

  73. Search for Masanori Taira in:

  74. Search for Daniel S. Rokhsar in:


R.M.H., M.T., D.S.R., G.J.C.V., A.Fuj., A.S., A.M.S., T.Kw., Y.U., A.Fuk., M.K. and H.Og. provided project leadership, with additional project management from Y.M., M.A., Y.Iz., N.U., J.Sh., J.B.W., E.M.M., J.Sc., A.M.Z., P.D.V. and M.I. Y.Iz. and J.R. inbred J strain frogs. A.T., C.H., A.Fuj., J.G., J.C., J.K., J.Sh., T.Mit. and J.B.L. generated genome sequence data. J.A.C., A.M.S., T.Kw., J.J., A.Fuk., M.T. and J.Sc. performed genome assembly and validation. S.T., T.Kw., A.M.S., Y.S., T.T., A.T., A.S. and M.T. generated and analysed the transcriptome data. A.M.S., T.Kw., S.J.v.H. and S.S. generated the annotations. Manual validation of annotation was done by H.Og., S.T., A.Fuk., A.S., M.K., H.Oc., T.T., T.Mic., M.W., T.Ki., Y.O., S.Ma., Y.H., T.N., Y.Y., J.F., K.B., V.L., D.W.D., M.T. and K.K. K.M., A.M.S. and R.H. generated the Hymenochirus transcriptome data. A.M.S. performed the phylogenetic analysis, with contributions from S.V.M. and U.H. M.W., A.Fuk., S.Ma., Y.U., Y.M. and M.T. performed the chromosome structure analysis. A.M.S., A.H., O.S., J.C. and Y.U. studied the transposable elements. BAC-FISH was performed by Y.U., A.Fuk., M.K., A.T., S.T., H.Og., H.Oc., Y.K., T.T., T.M., M.W., T.Ki., Y.O., Y.H., T.S.Y., C.T., T.N., A.S., Y.M., N.U., M.A., Y.Iz., A.Fuj. and M.T. I.Q., S.H., N.P. and J.St. generated and analysed the chromatin conformation capture data and their use in long-range scaffolding. H.Og. and H.Oc. performed the transgenic enhancer analysis. S.J.v.H, G.G., S.S.P., I.v.K., O.B., R.L., and G.J.C.V. generated and analysed the epigenetic data. A.S., A.M.S., T.Ki., M.K., M.T., Y.O., T.T., A.Fuk., M.W., T.Mic., D.W.H., T.N. and L.D. conducted the gene and pathway analysis. D.S.R., A.M.S., T.Kw., R.M.H., M.T., A.S., Y.U., G.J.C.V., M.K., U.H., S.J.v.H., A.Fuk., A.H., O.S., H.Og., T.T., I.Q., J.K., Y.O., S.T., M.W., T.Mic., A.T., H.Oc., T.Ki., S.Maw., Y.S., T.N., Y.Iz. and M.F.F. wrote the paper and supplementary notes, with input from all authors.

Competing interests

Dovetail Genomics LLC is a commercial entity developing genome assembly methods. N.P. and J.St. are employees of Dovetail Genomics, and D.S.R. is a scientific advisor to and minor investor in Dovetail.

Corresponding authors

Correspondence to Richard M. Harland or Masanori Taira or Daniel S. Rokhsar.

NCBI (LYTH00000000). Sequence Read Archive (SRP071264, SRP070985). NCBI Gene Expression Omnibus ( GSE73430, GSE73419, GSE76089, GSE76059, GSE76247). DDBJ/GenBank/EMBL ( AP017316 and AP017317).

Reviewer Information Nature thanks C. Amemiya, S. Burgess and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data

Supplementary information

PDF files

  1. 1.

    Supplementary Information

    This file contains Supplementary Notes 1-15, which detail analyses from the main text and additional references (see Contents for more details).

Zip files

  1. 1.

    Supplementary Tables

    This zipped file contains Supplementary Table 1-11.

About this article

Publication history





Further reading


By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.