Introduction

Polyploidy, which results in organisms with multiple chromosome sets, is an important genomic event in speciation and adaptive radiation that has led to the formation of many eukaryotic lineages (Cox et al, 2014). Polyploids with duplicated genomes may originate from single species (autopolyploidy) or from different species through interspecific hybridization (allopolyploidy) (Otto, 2007). Allopolyploids appear prevalent in nature, suggesting an evolutionary advantage of having multiple sets of genetic material for adaptation and development (Mallet, 2007). However, much remains unknown about the processes and consequences of allopolyploidy (Abbott et al, 2013).

Allopolyploidy can promote the activation of cryptic mobile elements and cause rapid genomic changes (McClintock, 1984). This ‘genomic shock’ has been reported in many allopolyploid plants, where it manifests as gene loss, chromosome mis-pairing, retrotransposon activation, altered methylation or rearrangements between parental genomes that could lead to novel gene sequences or differential homoeologous gene expression in hybrids throughout evolution (Cox et al, 2014). Rapid genomic DNA changes have also been demonstrated in several allopolyploid plants. Examples include homoeologous nonreciprocal recombination and biased expression of homoeologs in allopolyploid cotton Gossypium sp. (Salmon et al, 2010), rapid genetic and epigenetic changes in allopolyploid wheat Triticum aestivum (Feldman and Levy, 2012), and retrotransposon activation, genomic rearrangements and trait variation in rape Brassica napus (Zou et al, 2011). These studies provide evidence that a series of radical, dynamic and stochastic genomic changes (that is, ‘genomic shock’) were necessary for the establishment of new allopolyploid plant species.

Although allopolyploidization is less prevalent in animals than in plants, the development of new genetic technologies has allowed more polyploidization and hybridization events to be discovered and studied in animals (Otto, 2007). Examples include studies of the genetic fate of duplicated RAG genes and rapid epigenetic changes in the allopolyploid clawed frogs Xenopus sp. (Evans et al, 2005; Chain and Evans, 2006; Koroma et al, 2011), and of gene dosage compensation and microRNA expression in the allopolyploid Squalius alburnoides complex (an Iberian cyprinid fish) (Pala et al, 2008; Inácio et al, 2012). However, the manifestations of genomic shock remain less well understood at the genomic level in animal systems. In addition, natural allopolyploids were usually formed hundreds or even thousands of years ago, and the original diploid parental species are often extinct or difficult to identify because they have continued to evolve since the formation of their hybrid (Song et al, 1995). Thus, synthetic allopolyploids provide a model system to study and characterize radical genomic changes at early evolutionary stages.

Most genomic studies have focused on the coding sequences of functional genes, mainly because they are directly related to biological function. However, non-coding regions are equally important, especially in the context of genome evolution (Coulombe-Huntington and Majewski, 2007). Introns possess a broad spectrum of functions and are involved in virtually every step of messenger RNA processing (Carmel and Chorev, 2012). The major objective of this research is to characterize rapid genomic changes (including intron evolution) in an allotetraploid fish hybrid lineage that was created through artificial crosses between red crucian carp (RCC) (Carassius auratus red var., ♀, 2n=100) × common carp (CC) (Cyprinus carpio L., ♂, 2n=100) and maintained for over 20 generations (F20) (Liu et al, 2001). These allotetraploid fish are currently used as the paternal progenitor to produce allotriploid fish for aquaculture purposes. Our previous studies demonstrated that F1 and F2 progenies were fertile diploid hybrids (2n=100). However, from F3 onwards, fertile allotetraploid individuals of both sexes were produced (Liu et al, 2001; Liu, 2010). This allotetraploid hybrid fish system is therefore unique and provides an opportunity to test which genetic elements are susceptible to rapid genomic changes. Our previous studies also demonstrated the loss of paternal DNA fragments and a recombined mitochondrial DNA segment in this allopolyploid hybrid lineage (Liu, 2010). In the present study, we compared this artificially derived allotetraploid lineage (4nAT) with its parental species and with first-generation hybrids (F1), with the aim of providing a new perspective on genomic evolution in allopolyploid animals.

Materials and methods

BAC library construction and sequencing

DNA samples of five individuals from the 20th generation progeny of an allotetraploid hybrid lineage (4nAT) were used to construct a bacterial artificial chromosome (BAC) library. The fish were sampled from the Engineering Research Center of Polyploid Fish Breeding at Hunan Normal University (Changsha, China). A total of 10 ml of fresh peripheral blood from each fish was used for DNA extraction, following a previously reported method for generating high-molecular-weight genomic DNA (Katagiri et al, 2000). The homogenized blood sample was mixed with an equal volume of pre-warmed 1% low melting point agarose at a concentration of 5 × 10⁸ cells ml⁻¹, and cast into plugs using plug molds (Bio-Rad, Guangzhou, China). The agarose plugs were cut and digested with the restriction enzyme HindIII; DNA fragments (100–300 kb) were ligated to CopyControl pCC1BAC (HindIII Cloning-Ready) Vector (Epicentre, Madison, WI, USA) and then transformed into Escherichia coli DH10B. All positive BAC clones were hand-picked and stored at −80 °C. A total of 14 BAC clones were randomly chosen for shotgun sequencing. Subclone libraries were first constructed with insert fragment sizes of 1–2 kb and sequenced on an ABI3730xl platform by Majorbio Bio-pharm Technology Co. Ltd (Shanghai, China). The computer programs PHRED, PHRAP and CONSED were used to perform base calling and quality assessment, sequence assembly and contig ordering for each BAC clone (Ewing and Green, 1998; Gordon et al, 1998). The programs were used under their default settings.

Characterization of BAC clone sequences

Guanine–cytosine content of each BAC clone sequence was calculated using a custom Perl script. Repetitive DNA content was estimated using RepeatMasker (version 4.0.5, http://www.repeatmasker.org) against the REPBASE repeat database (20140131). The program FGENESH (Salamov and Solovyev, 2000) was used to predict the gene structure of the 14 BAC sequences using zebrafish Danio rerio as the reference organism. Predicted genes were annotated using BLASTX against the NCBI-NR (National Center for Biotechnology Information-Non-redundant) protein database with an E-value cutoff of <1E-10.
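The guanine–cytosine calculation described above can be illustrated with a minimal Python sketch. This is not the custom Perl script used in the study, which is not reproduced here; the function names gc_content and read_fasta are hypothetical, and the choice to exclude ambiguous bases (such as N) from the denominator is an assumption.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {record name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def gc_content(sequence):
    """Return GC content (%) over unambiguous bases; N and gaps are ignored (assumption)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # Toy input; real input would be the assembled BAC clone sequences.
    toy = ">bac_example\nGGCCAATT\nNNNN\n"
    for name, seq in read_fasta(toy).items():
        print(name, round(gc_content(seq), 2))
```

Applied to each assembled BAC sequence in turn, a function of this kind would yield one percentage per clone, which is the form in which the values are reported in Table 1.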

Evolution of intron density

The intron information (for example, intron number and intron length) was extracted from the gene structure predicted for each BAC sequence, and intron density was calculated as the number of introns per gene. To infer the evolutionary trend of introns in 4nAT across the fish lineage, a comparative intron analysis was conducted between 4nAT and other species by selecting annotated orthologous sequences from zebrafish (DRE), medaka (Oryzias latipes), fugu (Takifugu rubripes), tetraodon (Tetraodon nigroviridis), tilapia (Oreochromis niloticus), sea lamprey (Petromyzon marinus, PMA), three-spined stickleback (Gasterosteus aculeatus), Atlantic cod (Gadus morhua), human (Homo sapiens) and western clawed frog (Xenopus tropicalis), with H. sapiens and X. tropicalis used as outgroups. Orthologous gene sequences and detailed intron information were downloaded from the Ensembl website (http://asia.ensembl.org/index.html). Multiple alignments of protein sequences with known intron locations and lengths were used to analyze intron evolution using the Malin software (http://www.iro.umontreal.ca/~csuros/introns/malin/). The guide tree was constructed based upon ADP-ribosylation factor-like 14 gene sequences. The Dollo parsimony method was used to estimate intron loss and gain events (Csűrös, 2008).
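The logic of Dollo parsimony can be sketched in a few lines of Python. This is an illustrative toy, not the Malin implementation: the tree topology, the taxon labels A to D and the function names are hypothetical. Under the Dollo assumption, an intron at a given site arises at most once, on the branch leading to the last common ancestor of all taxa that carry it, and every absence below that ancestor is explained by a loss.

```python
def count_present(node, present):
    """Number of leaves under `node` that carry the intron."""
    if isinstance(node, str):
        return 1 if node in present else 0
    return sum(count_present(child, present) for child in node)


def dollo_events(tree, present):
    """Return (gains, losses) for one intron site under Dollo parsimony.

    `tree` is a nested tuple of leaf names; `present` is the set of leaves
    in which the intron is observed.
    """
    total = count_present(tree, present)
    if total == 0:
        return (0, 0)

    def losses_below(node, inside):
        if isinstance(node, str):
            return 0
        counts = [count_present(child, present) for child in node]
        if not inside:
            # Walk down until the last common ancestor of all carriers is reached.
            for child, k in zip(node, counts):
                if k == total:
                    return losses_below(child, False)
        losses = 0
        for child, k in zip(node, counts):
            if k == 0:
                losses += 1          # whole subclade lacks the intron: one loss
            else:
                losses += losses_below(child, True)
        return losses

    return (1, losses_below(tree, False))


if __name__ == "__main__":
    toy_tree = ((("A", "B"), "C"), "D")   # hypothetical topology
    print(dollo_events(toy_tree, {"A", "C"}))
    print(dollo_events(toy_tree, {"A", "D"}))
```

Summing such per-site counts over all aligned intron positions gives lineage-level totals of gains and losses of the kind reported for the orthologous genes, although Malin additionally assigns each event to a specific branch.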

Homologous gene amplification in parental and hybrid fishes

To verify the accuracy of sequence assembly and to estimate genomic changes, all predicted functional genes were amplified from 4nAT, diploid F1 hybrids (2nF1), RCC and CC. Polymerase chain reaction primers were designed based on the conserved regions of DRE, O. latipes, T. rubripes, T. nigroviridis, O. niloticus and 4nAT across different exons (Supplementary Table S1). Genomic DNA of 4nAT, 2nF1, RCC and CC was extracted from peripheral blood using the Ezup Column Blood Genomic DNA Extraction Kit (Sangon Biotech, Shanghai, China). Long polymerase chain reaction was carried out using TaKaRa PrimeSTAR GXL DNA Polymerase (TaKaRa, Dalian, China), with the following settings: 94 °C for 1 min; 30 cycles of 98 °C for 10 s and 68 °C for N min (where N represents the annealing time for each gene); and a final step at 72 °C for 10 min. Polymerase chain reaction products within the expected size range were extracted and purified using the SanPrep Column DNA Gel Extraction Kit (Sangon Biotech). The purified DNA fragments were inserted into the vector pMD18-T (TaKaRa) and transformed into DH5α competent cells. A total of 10 clones for each gene insert in 4nAT, 2nF1, RCC and CC were sequenced on the ABI3730 sequencer (GenScript Corporation, Nanjing, China). Gene sequences were named with the abbreviated fish name followed by Roman numerals. For example, if RCC had two gene copies, they would be named RCC-I and RCC-II, respectively.

Sequence comparison and analysis

Sequences of complementary DNA for each gene were verified and annotated through a Basic Local Alignment Search Tool X (BLASTX) search against the NCBI-NR database (http://blast.ncbi.nlm.nih.gov) and the Ensembl zebrafish DRE protein database (http://uswest.ensembl.org/Tools/Blast?db=core). Multiple sequence alignment was performed to assess sequence similarity and variation among parental species (RCC, CC) and hybrids (4nAT, 2nF1) using ClustalW (Thompson et al, 1994). MEGA 5 was used to construct phylogenetic trees based on exon, intron and whole-DNA sequences using the maximum likelihood method with 1000 bootstraps (Tamura et al, 2011). Protein structures were predicted through the SWISS-MODEL web server (Arnold et al, 2006). To identify homoeologous recombination in hybrids, diagnostic single-nucleotide polymorphisms (SNPs) for RCC and CC were first identified, and then SNP biases were examined in the sequences of hybrids. In addition, we defined two genetic patterns to assess the parental origin of hybrid sequences: (i) ‘genetic inheritance’, in which hybrid gene sequences were identical to those of either parental species; and (ii) ‘genetic variation’, in which large sequence variation was found between hybrids and their parental species, such as homoeologous recombination or the insertion and deletion of DNA fragments.
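The SNP-based screen for homoeologous recombination can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure actually scripted for the study: it takes three pre-aligned sequences of equal length (one per parent and one hybrid copy), treats every ungapped site at which the two parental sequences differ as diagnostic, and labels the hybrid base at each such site as R (matches RCC), C (matches CC) or H (matches neither). All function names and the toy sequences are hypothetical.

```python
def diagnostic_sites(rcc, cc):
    """0-based alignment columns where the two parental sequences differ (gaps excluded)."""
    return [i for i, (r, c) in enumerate(zip(rcc, cc))
            if r != c and r != "-" and c != "-"]


def parental_profile(rcc, cc, hybrid):
    """List of (1-based position, label) for each diagnostic site in the hybrid copy."""
    profile = []
    for i in diagnostic_sites(rcc, cc):
        if hybrid[i] == rcc[i]:
            label = "R"
        elif hybrid[i] == cc[i]:
            label = "C"
        else:
            label = "H"   # neither parent: hybrid-specific variation
        profile.append((i + 1, label))
    return profile


def switch_intervals(profile):
    """Intervals between consecutive informative sites where the parental label changes."""
    informative = [(pos, lab) for pos, lab in profile if lab != "H"]
    return [(a[0], b[0]) for a, b in zip(informative, informative[1:])
            if a[1] != b[1]]


if __name__ == "__main__":
    rcc = "AAAAAAAAAA"
    cc = "ATATATATAT"
    hyb = "ATATAAAAAA"   # CC-like at the 5' end, RCC-like afterwards
    prof = parental_profile(rcc, cc, hyb)
    print(prof)
    print(switch_intervals(prof))
```

A hybrid copy whose profile carries a single label throughout would fall under the ‘genetic inheritance’ pattern, whereas a profile that switches between R and C, or that contains many H sites, would be a candidate for the ‘genetic variation’ pattern. In practice, diagnostic SNPs would be called from several clones per parent rather than from a single sequence each.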

Results

Characterization of BAC clone sequences

A selection of 384–576 subclones with fragment inserts of 1–2 kb were sequenced and assembled for each of the 14 BAC clone sequences (NCBI accession numbers: KF758440-KF758444 and KJ424354-KJ424362). The lengths of the assembled BAC clone sequences ranged from 17 849 bp to 87 725 bp, with an average of 42 295 bp. Guanine–cytosine content of the BAC clones ranged from 34.38 to 39.80% with an average of 37.10% (Table 1).

Table 1 Sequence information of allotetraploid hybrids’ (4nAT) BAC clones

From the 14 assembled BAC clone sequences, 103 411 bp out of the total 592 126 bp were identified as repetitive sequences (17.46%). The classification and corresponding proportions of the different repetitive elements are shown in Table 2. The most abundant type of repetitive element in allotetraploid hybrids (4nAT) was retroelements (6.47%), followed by DNA transposons (4.17%). Other repeats included 345 simple repeats (3.67%), 118 low-complexity repeats (0.85%) and 105 small RNA repeats (1.59%). In comparison with CC, the only parental species for which a BAC library has been constructed and analyzed (Xu et al, 2011), 4nAT hybrids presented a higher proportion of retroelements and a lower proportion of DNA transposons (Table 2).
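The summary figures quoted above follow directly from the reported base-pair totals, as the short Python check below illustrates. The variable and function names are hypothetical; only the three input numbers are taken from the text.

```python
TOTAL_BP = 592_126    # combined length of the 14 assembled BAC sequences
REPEAT_BP = 103_411   # bases identified as repetitive
N_BACS = 14


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


repeat_pct = percent(REPEAT_BP, TOTAL_BP)    # share of repetitive sequence
mean_bac_len = round(TOTAL_BP / N_BACS)      # average assembled BAC length in bp

if __name__ == "__main__":
    print(repeat_pct, mean_bac_len)
```

The repeat percentage reproduces the 17.46% given above, and the mean length agrees with the average of 42 295 bp reported for the assembled BAC clones.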

Table 2 Repetitive DNA elements detected in allotetraploid hybrids’ (4nAT) BAC clone sequences

Eleven functional genes were predicted (Table 3), with total lengths ranging from 609 bp to 41 004 bp and coding DNA sequence regions 609 bp to 3579 bp long. Two genes (ADP-ribosylation factor-like 14 and chemokine (C-X-C motif) receptor 7b) had no introns. The importin-13-like gene had the largest number of exons and introns (28 exons and 27 introns). The amino-acid sequence similarity between 4nAT and zebrafish DRE ranged from 43 to 96%, depending on the gene (Table 3).

Table 3 Gene annotation of allotetraploid hybrids’ (4nAT) BAC clone sequences and comparison with zebrafish Danio rerio orthologous sequences

Evolution of intron density

Intron information of orthologous genes in the different species is detailed in Table 4. The total number of introns ranged from 81 to 112, and the intron density ranged from 9.4 in 4nAT to 12.9 in G. aculeatus. The average intron length in 4nAT (8564 bp) was larger than in the selected fish species except for DRE, G. morhua and PMA. The analysis of intron evolution demonstrated that the introns of the nine 4nAT genes evolved dynamically, with a gain of 30 introns and a loss of 39 introns. By comparison, PMA showed a gain of 19 and a loss of 10 introns, and O. latipes a gain of 2 and a loss of 15 introns (Figure 1). The number of introns shared between 4nAT and the other fish species ranged from 14 to 25. However, when DRE was compared with the remaining fish species, this value ranged from 49 to 63 introns (Figure 2), which suggests rapid intron evolution in 4nAT.

Table 4 Intron information of orthologous genes identified in allotetraploid hybrids (4nAT) and other fish species
Figure 1

Dollo parsimony prediction of intron densities for orthologous genes among fish species. Numbers in red circles indicate conserved intron numbers in a species or a phylogenetic clade, with the area of red circles proportional to the corresponding numbers; numbers with ‘−’ and ‘+’ in green boxes indicate number of introns lost and gained, respectively. TRU: Takifugu rubripes, TNI: Tetraodon nigroviridis, ONI: Oreochromis niloticus, GAC: Gasterosteus aculeatus, GMO: Gadus morhua, DRE: Danio rerio, 4nAT: allotetraploid hybrid lineage, OLA: Oryzias latipes, PMA: Petromyzon marinus, HSA: Homo sapiens and XTR: Xenopus tropicalis.

Figure 2

Number of shared introns between model fish species and allotetraploid hybrids (4nAT) (orange) or Danio rerio (blue) predicted by the Malin software based on the nine orthologous genes. TRU: Takifugu rubripes, TNI: Tetraodon nigroviridis, ONI: Oreochromis niloticus, GAC: Gasterosteus aculeatus, GMO: Gadus morhua, 4nAT: allotetraploid hybrid lineage, OLA: Oryzias latipes, PMA: Petromyzon marinus, HSA: Homo sapiens and XTR: Xenopus tropicalis.

Comparative analysis of functional genes among parental species and hybrids

The sequences of seven homologous genes for 4nAT, 2nF1, RCC and CC are available at NCBI GenBank (Accession numbers: KF769270-KF769301 and KM088001-KM088012). Different copies of these genes were described in all four species/lineages (Table 5). Eight models of genetic change could be associated with hybrid sequence inheritance, with models 1 and 2 belonging to the ‘genetic inheritance’ pattern and models 3–8 attributed to the ‘genetic variation’ pattern (Figure 3). Four genes of 2nF1 and two genes of 4nAT were associated exclusively with the ‘genetic inheritance’ pattern. Five genes showed the ‘genetic variation’ pattern in at least one sequence copy in 4nAT, whereas only three genes did so in 2nF1.

Table 5 Information on the homologous genes from allotetraploid hybrids (4nAT), diploid hybrids (2nF1), red crucian carp (RCC) and common carp (CC)
Figure 3

Patterns of genetic inheritance (Models 1 and 2) and genetic variation (Models 3–8) hypothesized for allotetraploid hybrids (4nAT). The red color indicates homoeologous sequences inherited from red crucian carp (RCC), whereas the blue color indicates homoeologous sequences inherited from common carp (CC). The green color indicates novel sequences in 4nAT. Rv, Cv and Hv denote SNP variation identical to RCC, CC or 4nAT, respectively. Numbers following Rv, Cv and Hv indicate hypothetical numbers of SNP variations.

Two genes (fizzy-related protein homolog and importin-13-like) fell strictly within the ‘genetic inheritance’ pattern in both 4nAT and 2nF1 hybrids. The 2nF1-II and 4nAT copies of the fizzy-related protein homolog gene showed a higher similarity to CC (97.2% and 97.0%, respectively) (Figure 3, Model 2; Supplementary Figure S8A), whereas 2nF1-I showed a relatively higher sequence similarity to CC (73.8%) than to RCC (55.8%) (Supplementary Table S2, Supplementary Figure S1). The 2nF1-I copy of the importin-13-like gene was found to have been inherited from RCC (99.8%), whereas the 2nF1-II copy was highly similar to CC-II (98.0%). On the other hand, sequence similarity of the importin-13-like gene was much higher between 4nAT and RCC (99.6%) than between 4nAT and CC-I (67.9%) or CC-II (70.8%) (Supplementary Table S3, Supplementary Figure S2; Figure 3, Model 1; Supplementary Figure S8B).

Two genes (denticleless homolog and iqca1) fell strictly within the ‘genetic variation’ pattern in both 4nAT and 2nF1 hybrids. The alignment and phylogenetic analyses of the denticleless homolog gene (Supplementary Figure S3) demonstrated that both 4nAT-I and 4nAT-II sequences underwent homoeologous recombination (Figure 3, Models 5 and 6). Results based on exon sequences clustered 4nAT-II with CC; however, when intron and whole-gene sequences were used, 4nAT was grouped with RCC (Supplementary Figure S8C). This discrepancy is indicative of homoeologous recombination, which was also detected in the denticleless homolog gene copy of 2nF1 hybrids, but with more mutations relative to the RCC and CC sequences (Supplementary Figure S3, Supplementary Table S4). The iqca1 gene had a single copy in all four fish species/lineages, with many mutations, deletions and insertions found in 2nF1 and 4nAT hybrids (Figure 3, Model 8; Supplementary Table S5, Supplementary Figure S4 and Supplementary Figure S8D). In particular, extensive deletions in exon 2 and intron 2 (Figure 4) and an altered protein structure in 4nAT and 2nF1 (Supplementary Figure S9) indicate marked sequence change as a consequence of hybridization.

Figure 4

Sequence alignment of iqca1 exon 2 and intron 2 from red crucian carp (RCC), common carp (CC), diploid F1 hybrids (2nF1) and allotetraploid hybrids (4nAT). Red background indicates deletion, black background with dots indicates identical region.

With regard to the chemokine (C-X-C motif) receptor 7b gene, both ‘genetic inheritance’ and ‘genetic variation’ patterns were implicated in the 2nF1 and 4nAT hybrid lineages. Sequence similarity was very high between 2nF1-II and CC (99.0%), between 4nAT-I and RCC-II (99.2%) and between 4nAT-II and CC (96.8%) (Supplementary Table S6, Supplementary Figure S5). Although 2nF1-I and 4nAT-III showed higher sequence similarity to RCC-I (96.4% and 96.3%, respectively), fragments of homoeologous recombination with CC were also detected (strong SNP bias at positions 1–150 bp and 459–542 bp) (Figure 3, Model 4), resulting in three phylogenetic clusters: (i) 2nF1-II, CC and 4nAT-II; (ii) 4nAT-I and RCC-II; and (iii) 2nF1-I, 4nAT-III and RCC-I (Figure 5).

Figure 5

Genetic relationship among chemokine (C-X-C motif) receptor 7b sequences from red crucian carp (RCC), common carp (CC), diploid F1 hybrids (2nF1) and allotetraploid hybrids (4nAT) using Danio rerio (DRE) as outgroup. Bootstrap values above 50 and percent nucleotide substitution are shown.

The guanine nucleotide-binding protein G(I)/G(S)/G(O) subunit gamma-5 gene and the phosphodiesterase 11A gene showed a strict pattern of ‘genetic inheritance’ in 2nF1 hybrids, whereas in 4nAT both patterns were found for the first gene and only ‘genetic variation’ for the second gene. Multiple sequence alignment analysis of the guanine nucleotide-binding protein G(I)/G(S)/G(O) subunit gamma-5 gene (Supplementary Figure S6) revealed higher similarity between 2nF1-I and RCC (98.6%), between 2nF1-II and CC (98.9%) (Supplementary Table S7), between 4nAT-I and CC (99.3%) and between 4nAT-II and RCC (99.6%) (Supplementary Table S7, Supplementary Figure S8E). Homoeologous recombination was detected in 4nAT-III, with a SNP bias towards CC from 1 to 60 bp (Supplementary Figure S6) and the remaining SNPs biased towards RCC (Figure 3, Model 3). Homoeologous recombination was also found in two sequences of the 4nAT phosphodiesterase 11A gene (Supplementary Figure S7). In 4nAT-I, SNPs from 1 to 4366 bp were similar to CC-I, whereas SNPs from 4367 bp onwards were similar to RCC-II (Figure 3, Model 7). In 4nAT-II, the sequence from 1 to 1991 bp was similar to RCC-II with only a few mutations, from 1992 to 4365 bp it was similar to CC, and from 4366 bp onwards it was again similar to RCC-II (Figure 3, Model 5). 2nF1-I and 2nF1-II both showed higher sequence similarity with RCC-I (93.6% and 90.8%, respectively) (Supplementary Table S8, Supplementary Figure S8F).

Discussion

Genomic properties of 4nAT BAC clone sequences

Allotetraploid hybrids are characterized by doubled genomes. The increased genome size could produce duplicated functional genes and can also be accompanied by rapid and extensive genomic DNA changes and gene silencing (Chain and Evans, 2006). DNA methylation, a critical mechanism of gene silencing, usually occurs in promoter regions as a means of regulating the expression level of duplicated genes, allowing them to stabilize in the allotetraploid population and promoting evolution (Sehrish et al, 2014). This process might explain the variable densities and distributions of CpG islands among fish genomes, which can cause variation in guanine–cytosine content (Han and Zhao, 2008). In the clawed frog Xenopus sp. hybrid system, a higher proportion of methylated fragments was found in the hybrids than in the parental species (Koroma et al, 2011). A higher guanine–cytosine content was also found in the allotetraploid hybrid 4nAT than in its male progenitor CC (Xu et al, 2011), which could be associated with an elevated level of DNA methylation (Xiao et al, 2013). However, given that only a small portion of the genome was surveyed, a definite assertion that DNA methylation is more prevalent in the allotetraploid hybrids than in the parental species requires further analysis of other genomic regions.

The proportion of all repetitive elements was comparable between 4nAT and CC (Xu et al, 2011), but 4nAT was characterized by a higher proportion of retroelements (Table 2). The duplication and/or transposition of retroelements into new sites may directly affect gene structure, and the presence of multiple copies of these elements throughout the genome could have long-term effects on recombination events and a more subtle influence on gene expression (Feschotte, 2008). The insertion of retroelements within genes could cause gene inactivation through disruption of the reading frame or promoter regions, which is another mechanism for turning off one copy of duplicated genes in allopolyploid species (Casacuberta and González, 2013). In the sunflower Helianthus sp., the genome size is at least 50% larger in hybrids than in pure species owing to retrotransposon proliferation (Ungerer et al, 2006). The genome size of allotetraploid hybrids (4nAT) (C-value: ~3.86 pg) is estimated to be larger than that of RCC (C-value: 1.88–2.14 pg) and CC (C-value: 1.61–2.03 pg) (http://www.genomesize.com), which could be associated with the proportion of retroelements (Table 2). Our limited sequence data, however, do not currently allow us to make any further assertions.

Evolution of intron density

Comparative intron evolution analysis was conducted between 4nAT and other fish species in this study to assess the rate of intron evolution in 4nAT. Our results suggest that the smallest intron density, found in 4nAT, could be attributed to DNA loss, a common mechanism of genome stabilization in allopolyploid hybrids (Buggs et al, 2012). The relatively larger intron length found in 4nAT hybrids in comparison with the selected fish species, except for DRE, G. morhua and PMA, can be explained by the doubled genome and larger genome size (C-value: ~3.86 pg) of 4nAT. On the other hand, the largest intron density found in PMA may relate to the preservation of ancestral introns (Kawaguchi et al, 2010), and the larger average intron length in DRE is likely to be mainly due to intron size expansion events (Moss et al, 2011).

A previous study on intron evolution in animals attributed differences in gene structure among mammals to intron loss but not to intron gain (Coulombe-Huntington and Majewski, 2007). Our results revealed several intron gain events, especially in 4nAT and PMA. As an early-diverging vertebrate lineage, PMA is expected to possess ancient introns that might have been lost in more derived fishes (Venkatesh et al, 1999). The dynamics of intron gain and loss in 4nAT (39 gained and 30 lost introns) indicate rapid genomic change in these hybrids. A number of factors, including genome size, breeding cycle, gene expression level and intron length, are related to intron gain and loss events (Coulombe-Huntington and Majewski, 2007). However, interspecific hybridization and polyploidization are likely the most crucial factors in the rapid intron evolution observed in 4nAT, possibly allowing for the coordinated expression of duplicated genes (Carmel and Chorev, 2012).

Models of rapid genomic change

Three genetic models of rapid genomic change were identified in 4nAT hybrids: the loss of parental DNA fragments, homoeologous recombination and the formation of novel genes. The loss of parental DNA fragments has been commonly observed in allopolyploid plants (Buggs et al, 2012; Sehrish et al, 2014). It also seems to have happened during the establishment of the 4nAT allotetraploid hybrid lineage, as only one copy of the fizzy-related protein homolog gene and one copy of the importin-13-like gene were found, even though two copies of each gene were present in 2nF1. Previous inter simple sequence repeat and amplified fragment length polymorphism studies indicated a bias of DNA loss towards the paternal (CC) genome (Liu, 2010). This observation agrees with the findings in triticale (a hybrid of wheat and rye) (Ma and Gustafson, 2006) and cordgrass Spartina sp. (Salmon et al, 2005). In this study, only two genes showed DNA fragment loss, one from each parent, which hinders the evaluation of potential parental bias. Similarly, research on clawed frogs Xenopus sp. showed no evidence of directional loss of sequences towards either parental species (Koroma et al, 2011). Larger-scale genomic studies are required to determine whether this is a particularity of allopolyploid animals.

Genetic recombination is a process that generates novelty, contributing to genetic variation and genome structural diversity in organisms (Gaut et al, 2007). In allopolyploids, recombination usually occurs among paralogous or homoeologous sequences, and it has been shown to be a major cause of genomic changes such as DNA deletion, duplication and gene conversion (Gaeta and Chris Pires, 2010). Genetic recombination has been reported both in allopolyploid plants, such as rape Brassica napus (Zou et al, 2011) and cotton Gossypium sp. (Salmon et al, 2010), and in allopolyploid animals, for example, clawed frogs and salamanders (Evans et al, 2005; Bi et al, 2008). In the present study, many variations, such as DNA deletions and insertions in the 4nAT-I sequences of the phosphodiesterase 11A and denticleless homolog genes, are suspected to be the consequence of homoeologous recombination (Figure 3, Model 6). Homoeologous recombination was detected in four out of seven genes in allotetraploid hybrids (4nAT) versus only one out of seven in 2nF1, suggesting that it is a potentially important mechanism for the rapid genomic changes observed in 4nAT.

Novel fragments have been commonly observed in allopolyploid plants and have been considered a critical mechanism for adaptation and evolution after genome duplication (Chen, 2007). The novel iqca1 gene identified in both 2nF1 and 4nAT hybrids, when compared with the parental species RCC and CC, seems to have resulted from DNA deletions, insertions and mutations (Supplementary Figure S4, Figure 4). The predicted Iqca1 protein structure differed in both hybrids from that of the parental species (Supplementary Figure S9), suggesting a possible novel gene function in the hybrids. This, in turn, provides additional evidence that interspecific hybridization could foster large genomic changes.

Conclusion

Few allopolyploidization studies have focused on animals. We used an artificially derived allotetraploid lineage of freshwater fish to help fill this gap. We sequenced and assembled 14 BAC clones of F20 allotetraploid hybrids (4nAT), analyzed the evolution of introns and compared seven genes across the parental species CC and RCC as well as 2nF1. This study demonstrated that rapid genomic changes are facilitated by intron gain and loss, homoeologous recombination and the formation of novel genes. Large-scale genomic studies are needed to verify these findings. Nevertheless, this study provides a preliminary genomic characterization of allotetraploid F20 hybrids, revealing the evolutionary and functional genomic significance of allopolyploid animals.

Data Archiving

Sequencing data from this article have been deposited in GenBank under the accession numbers: KF758440 to KF758444, KJ424354 to KJ424362, KF769270 to KF769301 and KM088001 to KM088012.