Comparative genome anatomy reveals evolutionary insights into a unique amphitriploid fish

Triploids are rare in nature because of difficulties in meiotic and gametogenic processes, especially in vertebrates. The Carassius complex of cyprinid teleosts contains sexual tetraploid crucian carp/goldfish (C. auratus) and unisexual hexaploid gibel carp/Prussian carp (C. gibelio) lineages, providing a valuable model for studying the evolution and maintenance mechanisms of unisexual polyploids in vertebrates. Here we sequence the genomes of the two species and assemble their haplotypes, which contain two subgenomes (A and B), to the chromosome level. Sequencing coverage analysis reveals that C. gibelio is an amphitriploid (AAABBB) with two triploid sets of chromosomes, each derived from a different ancestor. Resequencing data from different strains of C. gibelio show that unisexual reproduction has been maintained for over 0.82 million years. Comparative genomic analyses reveal intensive expansion and alteration of meiotic cell cycle-related genes and of an oocyte-specific histone variant. Cytological assays indicate that C. gibelio produces unreduced oocytes through an alternative ameiotic pathway; however, sporadic homologous recombination and a high rate of gene conversion also occur in C. gibelio. These genomic changes might have facilitated the purging of deleterious mutations and the maintenance of genome stability in this unisexual amphitriploid fish. Overall, the current results provide novel insights into the evolutionary mechanisms underlying the reproductive success of unisexual polyploid vertebrates.

We used the C. gibelio genome as the reference and removed all repeats. Illumina reads from two C. auratus, six C. gibelio and one C. carpio individuals (Supplementary Tables 16 and 17) were mapped to the C. gibelio genome using BWA (version 0.7.12-r1039) 6. Using the genome-wide mean read depth (m) as a criterion, sites with read depths between 0.5m and 2m were retained for subsequent analysis. C. gibelio-specific mapped non-coding regions with uniform read coverage and lengths greater than 100 bp were then defined as new non-coding elements. We investigated the functions of the closest genes within 10 kb on either side of the new non-coding elements by GO enrichment analysis and found that many of these genes were associated with meiosis.
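The depth window and length cut-off described above can be illustrated with a small Python sketch. It is not the authors' pipeline and assumes several things the text does not specify: per-base depth arrays are already available (for example from a depth tool run on the BWA alignments), "C. gibelio-specific" is approximated as zero depth in every non-gibelio sample, "uniform coverage" is approximated by the 0.5m to 2m window, and coding positions come from a boolean annotation mask. The function names are hypothetical.

```python
from itertools import groupby
from statistics import mean


def merge_runs(mask, min_len=100):
    """Merge consecutive True positions into half-open (start, end) intervals
    and keep only runs strictly longer than min_len bp."""
    regions, pos = [], 0
    for flag, run in groupby(mask):
        n = sum(1 for _ in run)
        if flag and n > min_len:
            regions.append((pos, pos + n))
        pos += n
    return regions


def new_noncoding_elements(gib_depths, other_depths, coding, min_len=100):
    """gib_depths: per-base depths of one C. gibelio sample on one contig.
    other_depths: per-base depth lists of non-gibelio samples (same contig).
    coding: per-base booleans, True inside annotated coding sequence."""
    m = mean(gib_depths)  # mean depth, here taken over the input positions
    keep = [
        0.5 * m <= d <= 2 * m                       # depth window from the text
        and not coding[i]                           # non-coding positions only
        and all(o[i] == 0 for o in other_depths)    # assumed meaning of "specific"
        for i, d in enumerate(gib_depths)
    ]
    return merge_runs(keep, min_len)


gib = [10] * 150 + [50] * 20 + [10] * 50
print(new_noncoding_elements(gib, [[0] * 220], [False] * 220))  # [(0, 150)]
```

The 50× stretch falls above 2m and is excluded, and the trailing 50 bp run is too short, so only the first 150 bp survive. Finding the closest genes within 10 kb would be a separate interval query against the annotation.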

Supplementary Note 6 | Loss of heterozygosity analysis
For loss-of-heterozygosity (LOH) analysis, one female of the G4 generation of clone F 18 was selected to establish a C. gibelio clonal line through four successive generations of gynogenetic reproduction. We sequenced 11 offspring of this gynogenetic line (~48× depth per sample) and called SNPs in each individual following the method described in "Resequencing-based ploidy analysis". To minimize both false-negative and false-positive calls, we applied the following criteria to the SNP set of the 11 offspring from the gynogenetic pedigree to identify LOH sites:
(1) Non-triploid chromosomes (Chr1B, 6A and 22A of Cg-F1 in Supplementary Fig. 13) were excluded.
(2) Trimorphic SNP sites, which carry three different bases, accounted for only 0.26% (34,916) of all SNP sites. The rest were dimorphic, and at these sites the read depth of one base was usually twice that of the other (referred to as the minor allele), indicating that most SNPs were singletons (minor allele) in the founding mother of the gynogenetic pedigree. We therefore used the dimorphic SNPs for the following analyses.
(3) SNPs with a minimum average coverage of 20× and a maximum coverage of 80× were retained.
(4) Sites directly adjacent to small insertions or deletions (indels) were removed to avoid false-positive inferences caused by misalignment.
(5) For each SNP site in each individual, a minor-allele coverage ≥ 5× was considered heterozygous, and a minor-allele coverage ≤ 1× was considered homozygous.
(6) SNP sites at which any individual had a minor-allele coverage > 1× and < 5× were considered ambiguous and removed from the SNP set.
(7) LOH sites were called only when a site was heterozygous in some individuals but unambiguously homozygous in one or more other individuals.
Finally, 64,246 LOH sites were obtained from a total of 9,780,732 SNP sites. To verify these LOH sites, we examined 101 randomly selected LOH sites by Sanger sequencing. PCR primers were designed with Primer 5 based on the C. gibelio reference genome sequence, capturing approximately 300 bp on each side of the LOH locus. Each amplified fragment was cloned, and 30 clones were sequenced to determine the genotype at the SNP site. Of the 101 sites, 97 were verified.
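Criteria (3) to (7) amount to a per-site decision rule, sketched below in Python. This is an illustrative reading rather than the authors' code: it assumes the input site is already dimorphic and on a triploid chromosome (criteria 1 and 2), that per-individual minor-allele and total read counts are available, and that criterion (3) applies to the mean depth across individuals. The function names and the near_indel flag are hypothetical.

```python
from statistics import mean

HET, HOM = "het", "hom"


def genotype(minor_depth):
    """Criteria (5)-(6): >=5x minor-allele reads is heterozygous,
    <=1x is homozygous, anything in between is ambiguous (None)."""
    if minor_depth >= 5:
        return HET
    if minor_depth <= 1:
        return HOM
    return None


def call_loh_site(minor_depths, total_depths, near_indel=False,
                  min_mean=20, max_mean=80):
    """Classify one dimorphic SNP site across all individuals.

    Returns True for an LOH site, False for an informative non-LOH site,
    and None if the site is filtered out.
    """
    if near_indel:                                       # criterion (4)
        return None
    if not min_mean <= mean(total_depths) <= max_mean:   # criterion (3), mean-based (assumed)
        return None
    calls = [genotype(d) for d in minor_depths]
    if None in calls:                                    # criterion (6): drop ambiguous sites
        return None
    return HET in calls and HOM in calls                 # criterion (7)


print(call_loh_site([8, 7, 0, 9], [45, 50, 48, 47]))  # True
```

Dropping the whole site when any individual is ambiguous (rather than just that individual's call) follows criterion (6) as written and trades some sensitivity for fewer false LOH calls.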
Unlike in diploids, where LOH (homozygous) SNP sites are continuous, an LOH block in a triploid may contain both LOH SNP sites and SNP sites that remain heterozygous (referred to as non-LOH sites) (Extended Data Fig. 4a). Given this discontinuity, we grouped LOH sites into tracts using a 100 kb window: the first LOH site found on a tract was taken as the start of a possible LOH region, and the region was iteratively extended whenever the next LOH site lay within 100 kb of the previous one. The tract length of each LOH region was measured from the midpoint between the first LOH site and the upstream non-LOH SNP site to the midpoint between the last LOH site and the downstream non-LOH SNP site. Furthermore, LOH regions containing only one LOH site present in a single individual were removed.
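The region-building step can be sketched in Python as a single pass that chains LOH sites spaced no more than 100 kb apart and then places the region boundaries at the midpoints to the flanking non-LOH SNPs. The sketch works on one chromosome of one individual and assumes a sorted list of (position, is_loh) pairs. As a simplification, it drops single-site regions per individual, whereas the text removes them only when the site occurs in a single individual; cross-individual bookkeeping is left out. When no flanking non-LOH SNP exists, the LOH site itself is used as the boundary, which is also an assumption.

```python
from bisect import bisect_left, bisect_right


def loh_regions(snps, max_gap=100_000, min_sites=2):
    """snps: (position, is_loh) tuples sorted by position, for one chromosome
    of one individual. Returns (start, end, n_loh_sites) tuples."""
    loh = [p for p, is_loh in snps if is_loh]
    non_loh = [p for p, is_loh in snps if not is_loh]

    # Chain LOH sites: extend the current region while the next LOH site
    # lies within max_gap of the previous one.
    groups = []
    for p in loh:
        if groups and p - groups[-1][-1] <= max_gap:
            groups[-1].append(p)
        else:
            groups.append([p])

    regions = []
    for g in groups:
        if len(g) < min_sites:               # simplified singleton filter
            continue
        first, last = g[0], g[-1]
        i = bisect_left(non_loh, first)      # nearest non-LOH site upstream
        up = non_loh[i - 1] if i > 0 else first
        j = bisect_right(non_loh, last)      # nearest non-LOH site downstream
        down = non_loh[j] if j < len(non_loh) else last
        regions.append(((first + up) / 2, (last + down) / 2, len(g)))
    return regions


snps = [(1000, False), (5000, True), (8000, False), (50000, True),
        (200000, True), (210000, False), (500000, True)]
print(loh_regions(snps))  # [(3000.0, 130000.0, 2)]
```

Note that the non-LOH site at 8,000 sits inside the first region without breaking it, which is the triploid discontinuity the paragraph describes.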
After identifying LOH sites and regions, we next filtered out deletion regions as follows.
(1) We plotted the distribution of the average normalized read depth of SNP sites in each LOH region, which showed two peaks, at around 49× (triploid) and 31× (diploid) (Extended Data Fig. 4b).
(2) We plotted the distribution of the average minor-allele frequency in each LOH region, which also showed two peaks, at around 0.33 (triploid) and 0.42 (diploid) (Extended Data Fig. 4c).
(3) LOH regions with both an average read depth > 40× and an average minor-allele frequency < 0.37 were considered gene conversion regions (triploid) (p-value < 0.05 in one or more lines, binomial test).
Eventually, we obtained candidate gene conversion regions containing 61,014 LOH sites (95.0% of all LOH sites) in the 11 individuals of the gynogenetic pedigree.
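The two-threshold rule in step (3) can be written as a small classifier. The text does not state the null hypothesis of the binomial test, so this Python sketch assumes a one-sided exact test of whether the pooled minor-allele reads in a region are fewer than a 1:1 (diploid) ratio predicts; that choice, the pooling of reads across SNPs, and the function names are assumptions. The binomial tail is summed in log space so that large read counts do not overflow.

```python
from math import exp, lgamma, log
from statistics import mean


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    def log_pmf(x):
        return (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
                + x * log(p) + (n - x) * log(1 - p))
    return min(1.0, sum(exp(log_pmf(x)) for x in range(k + 1)))


def classify_region(depths, minor_counts, depth_cut=40, maf_cut=0.37, alpha=0.05):
    """depths, minor_counts: total and minor-allele read counts at the SNP
    sites of one LOH region in one individual (line)."""
    avg_depth = mean(depths)
    avg_maf = mean([m / d for m, d in zip(minor_counts, depths)])
    # Assumed test: fewer minor-allele reads than a 1:1 (diploid) ratio predicts?
    p_value = binom_cdf(sum(minor_counts), sum(depths), 0.5)
    if avg_depth > depth_cut and avg_maf < maf_cut and p_value < alpha:
        return "gene_conversion"
    return "not_conversion"


print(classify_region([48, 50, 49, 47], [16, 17, 16, 15]))  # gene_conversion
print(classify_region([31, 30, 32, 29], [14, 13, 15, 12]))  # not_conversion
```

The first hypothetical region resembles the triploid peak (about 49× depth and a minor-allele frequency near 1/3); the second resembles the diploid peak and fails the depth cut-off.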
Finally, we examined the identified gene conversion regions using the SNP pattern that conversion uniquely produces in triploids. As shown in Extended Data Fig. 4a, after a deletion there can be two types of SNP sites: 1/3 of SNP sites show LOH, and the rest remain heterozygous with similar read depths for each allele (as in diploids). After a conversion, by contrast, there can be three types of SNP sites: homozygous converted sites, which result in LOH in the region; heterozygous converted sites, where the donor allele carried the minor allele before conversion; and converted sites that look unchanged, where the donor allele had the same base as the recipient allele before conversion and the site remains heterozygous with the minor allele. If a conversion region is long enough, these three types should occur in a ratio of 1/3:1/3:1/3. Accordingly, we calculated the proportions of the three types of SNP sites in the candidate gene conversion regions, and each type accounted for approximately 1/3 of all SNP sites in a conversion region (Extended Data Fig. 4d). Moreover, we phased the blocks by comparing homologous SNP sites between individuals that did or did not experience gene conversion, where the SNP genotype was determined from the read coverage of its two bases (Supplementary Fig. 18). Because gene conversion is a unidirectional transfer of DNA sequence from one haplotype to another, the donor and recipient alleles at each SNP site can be inferred in the gynogenetic pedigree, and the three haplotypes can thereby be phased. As expected, the phased blocks (Fig. 5f and Extended Data Fig. 5) show a well-defined SNP pattern characteristic of gene conversion in a triploid (Extended Data Fig. 4a). Together, these data indicate that the identified gene conversion regions are largely reliable.
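The expected 1/3:1/3:1/3 pattern follows from a simple enumeration, sketched here in Python. The sketch assumes, as criterion (2) suggests, that each dimorphic SNP has its minor allele on exactly one of the three haplotypes, with each haplotype equally likely, and that conversion copies the donor base onto the recipient. It then counts the outcomes of one conversion event and of one deletion. The haplotype indices and labels are illustrative.

```python
from collections import Counter
from fractions import Fraction

HAPLOTYPES = (0, 1, 2)  # the three homologous haplotypes of a triploid


def after_conversion(minor_hap, donor, recipient):
    """Classify a dimorphic SNP whose minor allele sits on one haplotype,
    after the donor haplotype's base is copied onto the recipient."""
    alleles = ["minor" if h == minor_hap else "major" for h in HAPLOTYPES]
    before = alleles.count("minor")
    alleles[recipient] = alleles[donor]
    after = alleles.count("minor")
    if after == 0:
        return "homozygous_converted"    # LOH
    if after > before:
        return "heterozygous_converted"  # donor carried the minor allele
    return "looks_unchanged"             # donor and recipient shared a base


def after_deletion(minor_hap, deleted):
    """Classify the same SNP after one haplotype is lost."""
    return "LOH" if minor_hap == deleted else "heterozygous_1to1"


def expected_ratios(classify, **event):
    """Enumerate every haplotype that could carry the minor allele."""
    counts = Counter(classify(h, **event) for h in HAPLOTYPES)
    return {kind: Fraction(n, len(HAPLOTYPES)) for kind, n in counts.items()}


print(expected_ratios(after_conversion, donor=0, recipient=1))
print(expected_ratios(after_deletion, deleted=0))
```

Conversion yields three equally frequent site types, whereas deletion yields only two (1/3 LOH, 2/3 heterozygous at roughly equal allele depths), which is what makes the two events distinguishable in a long enough region.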
The rate of LOH (per locus per generation) was calculated following the method of Omilian et al. 19 using the equation

$$r = \frac{h}{L \times i \times T}$$

where r is the rate per locus per generation, h is the number of observed LOH sites, L is the number of lines, i is the total number of informative sites considered, and T is the number of generations for the lines. The rates of LOH, gene conversion (GC) and gene deletion (GD) were each calculated in this way.
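As a worked illustration, the per-locus, per-generation rate can be computed as below. The sketch assumes the rate takes the form h / (L × i × T), using the symbols defined in this paragraph, and applies the same formula to LOH, GC and GD counts. The counts in the usage line are hypothetical and are not results from this study.

```python
def event_rate(h, L, i, T):
    """Per-locus, per-generation rate, assuming rate = h / (L * i * T)."""
    if min(L, i, T) <= 0:
        raise ValueError("L, i and T must be positive")
    return h / (L * i * T)


def event_rates(counts, L, i, T):
    """Apply the same formula to LOH, gene conversion (GC) and gene deletion (GD) counts."""
    return {event: event_rate(h, L, i, T) for event, h in counts.items()}


# Hypothetical counts, for illustration only (not results from this study).
print(event_rates({"LOH": 120, "GC": 100, "GD": 20}, L=10, i=1_000_000, T=4))
```

Because the denominator is shared, the GC and GD rates scale directly with their share of the observed LOH events.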

Supplementary Note 7 | Assembly of male-specific supernumerary sequences
To obtain male-specific supernumerary sequences, we sequenced a male C. gibelio individual (F strain) using Illumina sequencing technology. A total of 333 Gb of reads were mapped to the C. gibelio reference genome, with a mapping rate of 99.23%. The unmapped reads were then assembled with Platanus v1.2.4 to recover possible male-specific regions. This yielded 33 kb of sequence with an N50 of 16.6 kb. Only one gene (tufm) was found in the assembled sequence. However, this gene is highly similar to its copy in Streptococcus, indicating that it most likely derives from contamination during sampling or sequencing.

Supplementary Fig. | Phylogenetic relationships of C. auratus and C. gibelio subgenomes based on ASTRAL (a) and IQ-TREE (b). The two phylogenetic trees have the same topology, and subgenome B clusters with ... huangchuchieni and C. molitorella.

Supplementary Fig. 10 | Gene loss shown by synteny chart. The gene ... is found in S. grahami and C. carpio, but lost in C. auratus and C. gibelio.