Introduction

All teleosts underwent at least three rounds of whole-genome duplication (WGD) (i.e., teleost-specific WGD) approximately 320 million years ago (Mya). For a few teleost species, including Salmo salar (approximately 80 Mya)1, Cyprinus carpio (approximately 8.2 Mya)2, and Carassius auratus red var. (approximately 18.49 Mya), a fourth WGD event occurred in a common ancestor3. Some studies have concluded that all WGD events were induced by hybridizations and led to polyploidizations, including emerge of allotetraploidizations origin from hybrid of C. auratus red var. × C. carpio 4, and autotetraploidizations which the genome duplication and the excretion of paternal chromosomes occurred in the F1 allotetraploid of C. auratus red var. × Megalobrama amblycephala 1,2,5. However, at least one previous study started to focus on subsequent genome size changes and diploidizations (i.e., returning to a diploid-like condition) in plants6. The silencing of homoeologs (i.e., originating from a parental gene), gene loss, and the formation of novel genes have contributed to the development of new specie7. These studies provided insights into the rediploidization process, which has not been fully characterized.

Rediploidization is considered the principal cause of rapid evolutionary reconciliations between two diverged genomes that accelerate the speciation process1. Some studies have revealed that the combining of two diverged genomes in hybrids can lead to the emergence of novel genotypes and phenotypes. In the latest relevant studies, analyses of total and homoeologous gene expression levels have been used to investigate phenotypic changes8,9,10. An analysis of the transcriptome shock efficiently revealed that considerable changes to gene regulatory networks occurred during a special stage. To fully characterize the rediploidization process and clarify the underlying molecular mechanism, the associated gene expression patterns will need to be investigated. These mechanisms may be related to allelic interactions and gene redundancy, and may involve non-coding RNA, DNA, and methylation and transcriptome changes, possibly leading to the appearance of novel traits11,12,13. For example, populations of gynogenetic diploid fish were observed to grow 30% faster than the parents of allotetraploid hybrid14. The analysis of homoeolog expression provides a useful platform to investigate phenotypic divergence related to growth and other traits after a rediploidization.

The hybridization between C. auratus red var. (2n = 100, ♀) (R) and C. carpio (2n = 100, ♂) (C) resulted in emergence of fertile allodiploid hybrids (F1 and F2, 2n = 100) and allotetraploid hybrids (F3-F25) because of the unreduced gametes from F2 hybrids15. Induction of gynogenesis in the fertile allotetraploid hybrids resulted in the emergence of fertile diploid gynogenetic fish lineage (G1-G10), in which the diploid gynogenetic fish produced diploid eggs with 100 chromosomes, and the fertilization of these eggs with UV-irradiated sperm from common carp resulted in the development of the subsequent diploid gynogenetic fish16,17,18. The diploid gynogenetic fish lineage (G1-G10) is the rediploidization event regarding the allotetraploid lineage. The study based on a fluorescence in situ hybridization (FISH) experiment revealed that half of the R and C (1:1) genomes were present in three hybrid lineages17,19. The allotetraploid hybrid lineage (from 2n to 4n) and the diploid gynogenetic lineage (back to 2n from 4n) with two genome level changes provided an excellent experimental system for the investigation of genetic process regarding the allotetraploidization and rediploidization events.

Studies of the three hybrid linages have focused on genetic mutations19, development of gonads and embryos20,21, and genotypes19, and confirmed the establishment of artificially cultivated and stable allopolyploid populations. The RNA-seq and quantitative real-time PCR (qPCR) techniques have gradually become more commonly used to study non-model organisms regarding the expression of homoeologs, which originate from different species9,10,22,23. In this study, the changes in homoeolog expression levels caused by a rediplodization event were investigated by comparison of homoeolog-specific single nucleotide polymorphisms (SNPs) between the reference genomes of C. auratus red var. and C. carpio. The direction and extent of the homoeolog expression bias (HEB) in diploid gynogenetic fish were then assessed by RNA-seq and qPCR. Additionally, we compared the diploid gynogenetic fish with allodiploid and allotetraploid hybrids of the C. auratus red var. × C. carpio to investigate the changes in homoeolog expression levels induced by different rediploidization mechanisms.

Results

Transcriptome sequencing

To clarify the effects of rediploidization on transcript abundance, three hybrid lineages of the C. auratus red var. × C. carpio with different ploidy levels (i.e., F1, F18, and G4) and their original parents were analyzed15,16 (Table 1 and Fig. 1). Transcriptome sequencing produced 86.5 Gb of raw data for 15 libraries of the three hybrids and their original parents (Table S1). All short-read data were deposited in the Short Read Archive with the following accession numbers: SRX668436, SRX175397, SRX668453, SRX177691, SRX671568, SRX671569, SRX668467, SRX1610992, and SRX2347299.

Table 1 Genome and ploidy levels of two cyprinids and their three types of hybrids.
Figure 1
figure 1

Genotypes of two cyprinid species and their three types of hybrid offspring, including allodiploids, allotetraploids and gynogenetic allodiploids, used for the comparison of homoeolog expression analysis.

Statistical mapping of RNA-seq data

After eliminating the read adapters and low quality reads, clean reads (77.7%) from nine libraries for three hybrid lineages with different ploidy levels were mapped to reference transcriptomes of the maternal (R) and paternal (C) parents to obtain the total gene expression profiles. Clean reads of six C. auratus red var. and C. carpio libraries (84.4%) were mapped to their respective reference transcriptomes (Tables S2 and S3). The 11,998 genes commonly expressed among all samples were used for total gene expression analyses. Meanwhile, the homoeolog expression levels of the three hybrid lineages were detected using another method (described in method “Specific mapping of the R/C homoeologs”). The clean reads for the three hybrid lineages were mapped to reference transcriptomes of the parents based on the threshold values of the specific SNPs. The analysis of R and C homoeolog expression involved only 3,540 genes.

Differential total gene expression in gynogenetic diploid fish

A comparison of total gene expression among the three hybrid groups (i.e., F1, F18, and G4) based on hierarchical clustering revealed that the hybrid groups could clearly be separated from their parents (Fig. 2A). Additionally, the G4 allodiploid was highly correlated with its parental F18 allotetraploid (Fig. 2A). The MA-plots of the comparison of G4 with F1 and F18 are presented in Fig. 2B,C, respectively. Additionally, details regarding the differential total gene expression levels are provided in Fig. 2D. We identified 507 (4.75%) and 1,846 (21.95%) genes that were differentially expressed between G4 and the original maternal C. auratus red var. and paternal C. carpio, respectively. The analysis of total gene expression levels indicated that G4 was heavily biased toward the original maternal C. auratus red var.. Based on the comparison between G4 and F18, we identified 691 genes (5.71%) that were differentially expressed (Table 2), which was a consequence of rediploidization. In contrast, the comparison between G4 and F1 revealed more than double the number of differentially expressed genes (1,358, 12.93%) (Fig. 2D). Meanwhile, a comparison among G4, F1, and F18, and the original maternal C. auratus red var. revealed that the gene expression levels in G4 tended to be low (Fig. 2D). These results suggest that the global gene expression in G4 may have gradually decreased to a novel level among diploid lineages.

Figure 2
figure 2

Global analysis of gene expression in two cyprinid species and their three types of hybrids. (A) Hierarchical clustering of gene expression in five groups revealed notable difference depending on the generation of hybrids and their ploidy levels which have considerable influence on global gene expression levels. Pearson correlation coefficients were calculated for all pairwise comparisons, and are presented in a heatmap following unsupervised clustering. (B) Comparison of the expression levels between G4 and F1 allodiploid hybrids. Black dots between two blue lines represent the genes with similar expression levels and those that are outside the blue lines represent the genes with significantly different expression levels (log2 FC > 2 and FDR < 0.05). (C) Comparison of the expression levels between G4 allodiploid and F18 allotetraploid hybrids. (D) Bold text indicates the total number and fraction of genes differentially expressed in each comparison. Proportions of the total number of differentially expressed genes that are up-regulated were also given in boxes. For example, 507 genes were differentially expressed between R and G4 groups. Of these, 329 were up-regulated in C. auratus red var. and 178 were up-regulated in G4.

Table 2 Differences in total gene expression between the allotetraploid (F18) and gynogenetic allodiploid (G4) hybrids.

We next investigated the expression level changes resulting from the rediploidization events and compared it with the hybridization and tetraploidization events (They had been described in latest reports24). Differences in total gene expression levels were identified by comparing each hybrid offspring with the original maternal C. auratus red var. and paternal C. carpio. Compared with the C. carpio expression levels, we detected 25 and 77 up- and down-regulated genes common to the three hybrid offspring (Fig. S1). However, only two down-regulated genes were identified in all three hybrid offspring lineages in a comparison with C. auratus red var. expression levels (Fig. S1).

Novel expression pattern, silencing, and homoeolog expression bias

A gene ontology analysis (level 2) indicated that some genes were not simultaneously expressed in the liver of three hybrid individuals. These genes were mainly associated with metabolic or catalytic processes (Fig. S2). The novel expression pattern had been detected in 116 genes. Among of these, 83 genes (0.69% in G4) exhibited expression silencing in F18 and re-expressed in G4 (Table S4). While the remaining 33 genes (0.28%) expressed in F18, but silenced in G4 (Table S4). To investigate changes in homoeolog expression levels accompanying a rediploidization after an allotetraploidization, the R/C homoeolog expression patterns in F1, F18, and G4 were analyzed. During a comparison between G4 and F18, three (0.08%) and five (0.14%) genes were detected as R and C homoeologs with novel expression patterns in G4, respectively. We also identified two genes (0.06%) as silenced R homoeologs, and another two genes (0.06%) as silenced C homoeologs. These results suggest that a rediploidization accelerates the emergence of novel R/C homoeologous gene expression and silencing.

Principal component analysis (PCA) confirmed a strong correlation between R and C homoeolog expression levels in all hybrids (Fig. 3A). Yet, the expression levels of R and C homoeologs in hybrids showed variation based on their genotype. The distribution of the R and C homoeolog expression levels for the three analyzed hybrids, which was consistent with the PCA results, is presented in Fig. 3B. The differences between the R and C homoeolog expression levels in G4 were smaller than those in F18. Additionally, the R and C homoeolog expression levels were more closely correlated in F18 than in F1. These results imply that the R and C homoeolog expression levels become increasingly related with the development of new generations with differing ploidy levels. After clarifying the R and C homoeolog expression levels, we obtained the distribution of the log2 fold change (FC) values between R and C in three hybrid lineages (Fig. 4A). After hybridizations, the phenomena of R- and C-HEB were observed for increasing numbers of genes as the number of generations increased. Meanwhile, the number of differentially expressed R and C homoeologs gradually decreased from F1 to G4 (Fig. 4A). For a more thorough comparison of R and C homoeolog expression levels, we divided the genes into those exhibiting HEB and those that did not (no-HEB) based on a specific differential gene expression threshold. After identifying the HEB genes in the three hybrids, we focused on R- and C-HEB genes to identify which genes maintained a strong HEB. According to the HEB analysis, 73 and 47 genes were determined to be R- and C-HEB genes in the three analyzed hybrid lineages, respectively (Fig. 4B,C).

Figure 3
figure 3

Distribution of the expression values of parental (R and C) homoeologs in their three types of hybrids. (A) Principal component analysis revealed that differences between the expression of maternal and paternal homoeologs were smaller in group G4 than that of in groups F18 and F1. F1 group had the most divergent expression of parental homoeologs. (B) Relative expression levels of parental homoeologs in their three types of hybrids.

Figure 4
figure 4

Distribution of the R/C-HEB and no-HEB genes in two cyprinid species and their three types of hybrids. (A) Comparison of the expression levels of parental (R and C) homoeologs in their three types of hybrids. The numbers of R-HEB and C-HEB genes changed with the changing ploidy levels of hybrids. The distribution of log2 FC values indicate that the expression levels of maternal and paternal homoeologs gradually approach to parental levels in hybrids during tetraploidy and rediploidization events. (B) Commonality of the R-HEB genes in the hybrids. (C) Commonality of the C-HEB genes in the hybrids.

To further investigate the effects of rediploidization on R and C homoeolog expression levels, we compared and classified only G4 and F18 genes. The HEB remained unchanged for most genes between F18 and G4. These genes were then divided into the following three categories: no-HEB (58.47%), R-HEB (13.84%), and C-HEB (4.21%) (Table 3). These observations suggest that the parental condition remained during a rediploidization. Meanwhile, the HEB of some genes changed because of a rediploidization. A total of 420 genes (11.86%) that exhibited HEB in F18, including 283 R-HEB genes (7.99%) and 137 C-HEB genes (3.87%), exhibited no-HEB in G4. Additionally, 260 (7.34%) and 143 (4.04%) genes with no-HEB in F18 exhibited R- and C-HEB in G4, respectively (Table 3). A few genes underwent considerable changes. For example, six genes (0.17%) changed from C-HEB to R-HEB, while the opposite change occurred in three genes (0.08%) (Table 3).

Table 3 Differences in homoeolog expression bias between the allotetraploid (F18) and gynogenetic allodiploid (G4) hybrids.

Homoeolog expression bias of growth-regulating genes identified by RNA-seq and qPCR

A previous study concluded that the G4 fish grow 30% faster than the parental F18 fish14. Using RNA-seq technology, we analyzed the transcript levels of 92 growth-regulating genes (Table S5). The HEB status was obtained for only 34 of these genes (Table S6) because HEB was known for only 3,540 genes of the 11,998 genes included in the total gene expression analyses (Table 2). Most of the growth-regulating genes (142 genes, 95.30%) were not differentially expressed between G4 and F18. Only the dual specificity phosphatase 22 (dusp22) gene, growth hormone secretagogue receptor (ghsra) gene, fibroblast growth factor 19 (fgf19) gene, and glypican-4 (gpc4) gene were up-regulated in G4. In contrast, the bone morphogenetic protein 10 (bmp10) gene and two endothelial cell-specific molecule-1 (esm1) genes were down-regulated in G4 (Tables 2 and Table S5). Under the condition of no-HEB in paternal F18, the neuropilin 1 (nrp1a) gene exhibited C-HEB in G4, while the fibroblast growth factor 23 (fgf23) and esm1 genes exhibited R-HEB (Tables 3 and Table S6).

To study the relationship between HEB and rapid growth, we determined the HEB of six growth regulated genes (i.e., igf1, igf2, ghr, tab1, bmp4, and mstn) in three tissues (i.e., liver, muscle, and ovaries) of G4 and F18 fish using homoeolog-specific qPCR (Fig. 5). Interestingly, we observed that the C homoeolog of the mstn gene was silenced in the muscle of G4 and F18 fish (Fig. 6). Novel expression patterns of the C homoeolog of the mstn gene were observed in the liver of G4 fish, while this homoeolog was silenced in F18 (Fig. 6). We also observed differences in the extent of R/C-HEB in the three analyzed tissues (Fig. 6). Specifically, igf1 exhibited an overall R-HEB in the muscle and liver of G4 fish, while bmp4 and mstn exhibited an overall C-HEB in the ovaries of F18 fish. In summary, R-HEB was the predominant expression bias in the liver, muscle, and ovaries.

Figure 5
figure 5

qPCR analysis of the six growth-regulating genes. The CT ratio of maternal (R) and paternal (C) homoeologs was based on tissue distribution analyses in G4 and F18 hybrid groups. (A) CT ratio of R homoeolog vs. C homoeolog for igf1. (B) CT ratio of R homoeolog vs. C homoeolog for ghr. (C) CT ratio of R homoeolog vs. C homoeolog for igf2. (D) CT ratio of R homoeolog vs. C homoeolog for tab1. (E) CT ratio of R homoeolog vs. C homoeolog for bmp4. (F) CT ratio of R homoeolog vs. C homoeolog for mstn. **potential R-HEB; *potential C-HEB. a: overall C-HEB; b: overall R-HEB. Comparative analysis revealed significant differences in gene expression (P < 0.05) (n = 3 for each group).

Figure 6
figure 6

Hierarchical clustering analysis of homoeolog expression of six genes in various tissues of the parents (R 1–3 and C 1–3) and their three hybrid offspring (F1 1–3, F18 1–3, and G4 1–3). Transcripts with high and low expression levels are indicated in yellow and blue, respectively.

Discussion

Continuous changes to homoeolog expression accompany to new generations with different ploidy levels

In this study, the initial hybridization involved the merging of C. carpio and C. auratus red var. genomes. Following a genome duplication in the F2 diploid hybrid gametes, a bisexual fertile allotetraploid population was obtained15. Gynogenesis was then exploited to breed the G1 diploid hybrid, in which the genome consisted of only half of the allotetraploid genome16. The three hybrids and the two associated genome-level changes enabled us to elucidate the relationship between genomic variation and growth differences in hybrids.

Characterizing the structural and functional changes in hybrids with different ploidy levels is a major challenge because of the complexities of genome structures and regulatory pathways. In particular, it is still unclear whether the two original parental genomes contribute equally to the gene expression levels of hybrids. We explored this uncertainty in detail using a systematic approach for dissecting the relative contributions of two homoeologs (i.e., R and C) to total gene expression levels. We also investigated the changes to homoeolog expression accompanying a genome duplication and rediploidization events. Only a few studies have focused on the differential expression of homoeologs in vertebrates because of a lack of fertilized hybrids. However, studying homoeologs originating from different species has been considered useful for clarifying the genetic structure and specific phenotypic changes occurring in hybrids with differing ploidy levels25,26. Subsequent genome-wide assessments have relied primarily on microarrays, which may not be appropriate for distinguishing the expression of closely related genes10,27. Therefore, these studies have been unable to separate the contributions of different homoeologs, particularly the R and C homoeologs originating from two diploid parental species. In this study, we used two novel analyses based on RNA-seq and qPCR to study the HEB under the condition that the gene transcripts from the parental species could be distinguished. A subsequent analysis of growth-regulating genes provided insights into growth heterosis resulting from a rediploidization.

The genome reduction following a hybridization altered the parental allotetraploid gene expression levels. Our total gene expression analysis revealed that 5.97% of the genes were differentially expressed between G4 and F18, while 21.5% of the genes were differentially expressed between F1 and F18 (Fig. 2). Additionally, analyses of homoeolog expression indicated that the number of genes exhibiting HEB decreased in new generations with ploidy-level changes (Fig. 2). These observations imply that a rediploidization leads to novel expression patterns, which are not simply the result of a reversal of the tetraploidization process. Compared with their parents, a new population was observed to exhibit differences related to growth14, random amplification of polymorphic DNA and microsatellite characteristics14, and reproduction17. These changes were closely related to gene expression, especially homoeolog expression levels. Additionally, the maternal R-HEB phenomenon was detected in three hybrids (Table 3)24. Maternal expression bias is frequently observed in hybrid fish, including the M. amblycephala × Culter alburnus 8, Oncorhynchus mykiss 28, and S. salar 28 hybrids. Our PCA results indicated that the R and C homoeolog expression levels were increasingly closely linked during the reconstruction of the hybrid genotype and the development of new generations (Fig. 3). The hybrid genome gradually started to change, with the gene expression levels inherited from the original parents slowly reaching a similar level. These gene expression level changes suggest the hybrid lineages tended to reach a steady state. However, tetraploidizations and rediploidizations were associated with increases in the number of R-HEB genes. This change also influenced hybrid phenotypes, and induced various genetic changes conducive for adaptations. Recent studies have deduced the possible mechanisms underlying changes to total gene and homoeolog expression levels. Allelic interactions and gene redundancies were considered the major causes of alterations to non-coding RNA, DNA, and methylation, which resulted in additional changes to the hybrid transcriptomes11,29.

An analysis of homoeolog expression related to hybrid characteristics has produced relatively precise information regarding the associated regulatory mechanisms30. To investigate the transcriptome shock caused by a rediploidization and the resulting increase in the growth rate of G4 fish, we analyzed the expression of growth-regulating genes. The four up-regulated and three down-regulated genes enabled additional investigations into the relationship between gene expression levels and growth rates. However, we did not conduct any related phenotypic analyses. Thus, we were only able to obtain the annotation details of these genes (i.e., dusp22, ghsra, fgf19, esm1, and gpc4), which are important for controlling cell division and regulating growth31,32,33,34. Furthermore, bmp10 encodes a potent inhibitor of endothelial cell migration and growth35. The down-regulated expression of this gene likely promotes growth (Tables 2 and Table S5). We used the analysis of HEB among growth-regulating genes to investigate the relationship between HEB and phenotypes, and to characterize the mechanism underlying heterosis in polyploid hybrids (Tables 3 and Table S6). Our RNA-seq results suggested that nrp1a was associated with C-HEB, while fgf23 and esm1 were related to R-HEB (Tables 3 and Table S6). An assessment of the localization of the expression of six key growth-regulating genes revealed that igf1 exhibits C-HEB in the muscle and liver. The original paternal parent C. carpio grows faster than the maternal parent C. auratus red var.24. The C-HEB of igf1 likely promotes the dominance of the paternal growth characteristic in G4, possibly inducing a rapid growth rate.

Interestingly, the diploid G4 retained the inherited characteristics of the parental F18. The silencing of the C homoeolog was also observed for the mstn gene (Figs 5 and 6). This may have been due to genomic imprinting, implying that the regulation of gene expression is mediated by one parental genome, while the genetic material inherited from the other parent is silenced in the hybrid36. Some genes in hybrids always exhibit single-genome–mediated expression36. An earlier study concluded that mutations in mstn always result in increased muscle mass and strength in vertebrates, making these individuals considerably stronger than their peers37. However, we determined that the expression pattern for a C homoeolog of mstn in the liver of G4 fish differed from that of F18 fish. The re-expression of mstn reflected the new regulatory mechanism influencing growth that may contribute to an increased growth rate in G4 individuals. This mstn expression pattern was similar to that observed in F1 24. However, additional studies are required to verify that a rediploidization accelerates the emergence of new phenotypes via changes to the HEB and homoeolog silencing.

Conclusion

Gynogenetic diploid offspring provide unique opportunities to study the evolutionary effects of rediploidization. The associated transcriptome shock was clarified by a novel analysis of genes exhibiting HEB. In this study, we examined this phenomenon in detail using qPCR and homoeolog-specific sequences in the transcriptome. Our data revealed the direction and extent of HEBs and homoeolog silencing changes accompanying to hybridization, tetraploidization, and rediploidization. Additionally, our findings may be useful for more comprehensively characterizing of the relationship between novel growth phenotypes and homoeolog expression differences. Additional studies on this topic might contribute to elucidation of the mechanism regulating growth heterosis.

Materials and Methods

Animal materials

Fertile allodiploid hybrids (F1 and F2, 2n = 100) were obtained from the hybridization between C. auratus red var. (2n = 100, ♀) (R) and C. carpio (2n = 100, ♂). Then, the unreduced gametes from F2 hybrids lead to emergence of bisexual fertile allotetraploid hybrids (4n = 200). Until now, the allotetraploid lineage continued into F25 by successive self-crossing. A fertile diploid gynogenetic fish lineage was obtained from the induction of gynogenesis in the female fertile allotetraploid hybrids (2n = 100). The fertilization of gynogenetic fish was performed with the eggs with UV-irradiated sperm from common carp. This technology had ensured the continuous of diploid gynogenetic fish lineage (G1-G10), which could be considered as the model of rediploidization event regarding the allotetraploid lineage. The results of FISH showed that the genotype of three hybrid lineages were half of the R and C (1:1) genomes (Table 1).

All experiments (2012–2015) were approved by the Animal Care Committee of Hunan Normal University. We followed the animal experimentation guidelines of the Science and Technology Bureau of China. The experimental fish were kept in an indoor freshwater tank which maintained to carry suitable environmental conditions in terms of photo-period, water temperatures, forage etc. in the Engineering Center of Polyploidy Fish Breeding of the National Education Ministry located at Hunan Normal University, China. Fish were deeply anesthetized with 100 mg/L MS-222 (Sigma-Aldrich, St. Louis, Missouri, USA) before being dissected. Three mature females from each ploidy group, including diploid C. auratus red var., diploid C. carpio, F1 allodiploid hybrids of C. auratus red var. × C. carpio, F18 allotetraploid hybrids of C. auratus red var. × C. carpio, and G4 allodiploid hybrids of C. auratus red var. × C. carpio (2-year-old individuals) were collected.

Their ploidy levels were confirmed by measuring the DNA contents of their erythrocytes via flow cytometry and by direct counting of chromosome numbers from metaphase spreads. To prepare metaphase spreads, we cultured the red blood in DMEM solution for 68–72 h at 25.5 °C and 5% CO2. Cells were harvested by centrifugation, and then treated in a hypotonic solution (0.075 M KCl) at 26 °C for 25–30 min. Samples were fixed in a methanol–acetic acid (3:1, v/v) solution with three changes. Then, the samples were placed on cold slides, air-dried, and stained in 4% Giemsa solution for 30 min. The chromosome numbers were observed in metaphase spreads of 15 individuals to determine the ploidy levels.

cDNA generation, library construction, and RNA sequencing

After anesthetizing the fish with 2-phenoxyethanol, the liver, muscle, and ovary tissues were excised and immediately placed in RNAlater for storage based on the manufacturer’s instructions (Ambion Life Technologies, Carlsbad, CA, USA). Total RNA extracts were treated from the harvested tissues according to a standard Trizol protocol (Invitrogen) after the RNAlater was removed. Total RNA was treated with a DNA-free™ DNA Removal Kit (Ambion) to remove any contaminating genomic DNA. The purified RNA was quantified using a 2100 Bioanalyzer system (Agilent, Santa Clara, CA, USA).

We fragmented 2 μg isolated mRNA with fragmentation buffer. The resulting short fragments were reverse transcribed and amplified to produce cDNA. An Illumina RNA-seq library was prepared according to a standard high-throughput method38. The cDNA library concentration and quality were assessed by Qbit (Invitrogen) and the Agilent 2100 Bioanalyzer, after which the library was sequenced using the Illumina HiSeq. 2000 platform. The RNA-seq experiment was conducted with three biological replicates. The transcriptome data were generated using a 101-bp paired-end setting. After removing the read adapters and low quality reads, clean reads from each library were examined by using the FastQC program (Version 0.11.3)39. Afterwards, principal component analysis (PCA) was performed on 15 liver transcriptomes to elucidate the contribution of each transcript to different classes in the 15 liver transcriptomes40.

Specific mapping of the R/C homoeologs

For detection of changes in homoeolog expression due to a rediploidization event, we obtained the genome of the maternal parent C. auratus red var. (http://rd.biocloud.org.cn/) (39,069 transcripts) and the genome of the paternal parent C. carpio (http://www.carpbase.org/) (52,610 transcripts). A database of full-length transcripts was used as a resource for gene expression details (Table S2). Additionally, analyses of putative orthologs between R and C may help to identify true orthologs. Thus, the R and C sequences were aligned by using the reciprocal BLAST (BLASTN) hit method, with an e-value cut off of 1e−20 41. Two sequences were defined as orthologs if each one was the best hit for the other, and the aligned sequences contained at least 300 bp. We used putative R and C orthologs as the reference sequences for detecting the homoeolog expression of hybrids. By using custom perl scripts, we calculated the homoeolog expression levels over the SNPs between the R and C orthologs were applied for calculating homoeolog expression levels. The mapping of reads to the reference transcriptome for three hybrids and their original parents was completed by means of TopHat2 program42.

Analyses of homoeolog expression bias

Prior to analyses, the expression level data were normalized by using Cufflink program (version 2.1.0)43. We also restricted the data analysis with the number of read counts (≥ 1) of genes to remove the negative effects of background expression noise in all biological replicates. The abundance or the coverage of each transcript was normalized by using the number of reads per kilobase of exon per million mapped reads (RPKM)44. Gene expression levels were estimated based on the RPKM values of the reads. The false discovery rate (FDR) was used for determining the threshold P value in multiple tests and analyses. Meanwhile, unigenes with an FDR ≤ 0.05 and a fold change > 2 were considered differentially expressed genes. The mapping results were analyzed with the DEGseq package of the R program (version 2.13) (R Foundation for Statistical Computing, Vienna, Austria)45.

Comparisons of the R/C homoeolog expression levels among three hybrids were used to identify the genes with potentially novel expression patterns (e.g., new expression of a gene in the liver) and silenced genes in the hybrids according to the standards described by Yoo et al.9. To screen the newly expressed or silenced genes, we analyzed the homoeolog expression categories associated with these genes, which were simultaneously expressed among the three hybrids and their original parents. For homoeolog expression bias (HEB) analyses, we compared R and C homoeolog expression levels in the three hybrids. The log2 FC of homoeologue expression value (R vs C) > 2 had been considered as the threshold values of R-HEB. Contrarily, the log2 FC (C vs R) > 2 had been considered as the threshold values of C-HEB. The extent and direction of differential homoeolog expression were assessed according to Ren et al.24.

Homoeolog-specific qPCR

To elucidate the relationship between HEB and rapid growth, we examined the HEB of six important genes (i.e., igf1, igf2, ghr, tab1, bmp4, and mstn) previously observed to regulate growth rates13,46,47. However, HEB information was lacking for some of these genes. So the homoeolog-specific qPCR had been used to detect the homoeolog expression level. Total RNA was extracted from the liver, muscle, and ovaries. First-strand cDNA was synthesized using AMV reverse transcriptase (Fermentas, Canada) with an oligo (dT)12–18 primer at 42 °C for 60 min and 70 °C for 5 min. To further detect the changes in homoeolog expression pattern of growth-regulating genes, we obtained sequence information for the R and C transcripts of a housekeeping gene (i.e., β-actin) and six key growth-regulating genes (i.e., igf1, igf2, ghr, tab1, bmp4, and mstn) from the C. auratus red var. and C. carpio transcripts. A comparison of the R and C homoeolog sequences by using Bioedit program (version 7.0.9)48 revealed the presence of homoeolog-specific loci in all seven genes. The regions containing homoeolog-specific loci were used to design R/C homoeolog-specific primers for qPCR analysis24. To obtain highly sensitive specific primers, we completed cross amplifications by using ABI Prism 7500 Sequence Detection System (Applied Biosystems, USA)24. The PCR amplification conditions were as follows: 50 °C for 5 min, 95 °C for 10 min, and 40 cycles at 95 °C for 15 s and 60 °C for 45 s. Each test was conducted three times to improve accuracy of the results. Finally, relative quantities were calculated, and a melting curve analysis was used to verify the generation of a single product. Each sample was used in triplicate for the assays and to generate standard curves. The expression data for each homoeologous gene was normalized against that of β-actin according to 2−ΔΔCt method49. The β-actin expression level in the hybrids was estimated by using the ratio of transcript abundance to gene copy number using PCR and qPCR conducted with co-extracted DNA and RNA samples. The β-actin expression levels in somatic organs and gonads were similar among the three hybrids and their original parents50,51.

Data deposition

The transcriptome data were submitted to NCBI (accession number: accession numbers: SRX668436, SRX175397, SRX668453, SRX177691, SRX671568, SRX671569, SRX668467, SRX1610992, and SRX2347299). The remaining data are available within the article and its Supplementary Information files or available from the authors upon request.