Clonal polymorphism and high heterozygosity in the celibate genome of the Amazon molly

The extreme rarity of asexual vertebrates in nature is generally explained by genomic decay due to absence of meiotic recombination, thus leading to extinction of such lineages. We explore features of a vertebrate asexual genome, the Amazon molly, Poecilia formosa, and find few signs of genetic degeneration but unique genetic variability and ongoing evolution. We uncovered a substantial clonal polymorphism and as a conserved feature from its inter-specific hybrid origin a 10-fold higher heterozygosity than in the sexual parental species. These characteristics appear to be a main reason for the unpredicted fitness of this asexual vertebrate. Our data suggest that asexual vertebrate lineages are scarce not because they are at a disadvantage, but because the genomic combinations required to bypass meiosis and to make up a well-functioning hybrid genome are rarely met in nature.


Supplementary Note 1: Transposable element analysis
Repeat element annotation was performed to obtain a complete inventory of transposable elements (TEs) for the Amazon molly, as well as for the two parental species. TEs account for about 22-23% of the genomes of P. formosa, P. mexicana and P. latipinna (Supplementary Table 4). This is in the range of other similar sized fish genomes 1 and in particular of closely related species, the guppy (25%), Southern platyfish (22.5%), Monterrey platyfish (21.8%) and swordtail (21.1%). We found a high diversity of TE families in the Amazon molly genome (Supplementary Table 4 ), with many families that are also present in other fish genomes but absent from mammals and birds 2 . The P. formosa genome contains all major groups of vertebrate DNA transposons and retrotransposons.
Most TE families are represented by relatively low full copy number except a few that show a strong expansion, including two families of hAT transposons (>1000 copies longer than 80% of the reference sequence), two TcMariner families (>600 copies), PiggyBac transposons (about 1000 copies), Rex-Babar non-LTR retrotransposons (>400 copies) and V-SINE retrotransposons (>1200 copies) (Supplementary Table 5). No major difference in TE composition was detected between the genome of P. formosa and the parental genomes.
When we analyzed TE transposition history in the genomes of P. formosa and its parental species (Figure 2A), it became apparent that most of the superfamilies have a rather ancient transposition history represented by a high percentage of divergent copies, mostly concerning TcMariner elements. However, a burst of transposition occurred in recent timing (K-3/4), mostly involving hAT transposons. The comparison with two other Poecilids, i.e. X. maculatus and P. reticulata, indicates that the oldest major burst, which is common to all species tested, might have occurred in the Poecilid ancestral genome. The more recent burst is very different between the different poeciliid genomes and is probably genera-specific: a small burst in X. maculatus, a very high burst followed by a strong decrease in P. reticulata and an intermediate burst in P. formosa and parents. Finally, a higher fraction at most recent timing (K-0/1) in P. formosa might reflect species-specific transposition event. No major differences in transposition history between P. formosa and the parental genomes and other related species was noted.
Based on Kimura plots, TEs appear to be more active in P. formosa than in parental genomes (percentage higher in K-0). Presence of TE sequences in the transcriptome can indicate such an activity. TEs make about 1.4% of the Amazon molly transcriptome (Supplementary Table 6 ), which indicates a somehow lower overall activity compared to the platyfish (5.4%). Indeed, the most highly transcribed TE superfamilies are gypsy retrotransposons and TcMariner transposons. Expression of the Gypsy elements, but not TcMariner, might be the result of their activity rather than of general background expression because their relative fraction is remarkably higher in the transcriptome than in the genome ( Figure 2B).
Because long terminal repeat (LTR) retrotransposons are known to play fundamental roles in genome rearrangement after "catastrophic" events such as polyploidization or hybridization 3 we investigated Gypsy element activity by comparing copy-specific insertion sites within P. formosa to the parental genomes. Six of the 47 insertions of Gypsy elements are absent from both parental genomes while the remainder is found at the same genomic position in all three genomes. These P. formosa-specific insertions may be explained by post-hybridization transposition events of Gypsy elements.
Despite the absence of recombination, the RNA-mediated silencing machinery provides important mechanisms in epigenetic silencing of TEs preventing them from multiplying. Members of this critical pathway in P. formosa were not lost, corrupted or expanding faster than expected relative to other sexual teleost species (Supplementary Table 31).

Supplementary Note 2: Gene selection, gene family expansion and gene duplication Gene selection
Using analyses of protein coding genes relative to other closely related taxa, it is possible to discover signatures of positive selection along the P. formosa lineage by measuring the dN/dS ratio (the ratio of the rate of nonsynonymous substitutions to the rate of synonymous substitutions). We therefore selected 8,286 one-to-one orthologous gene alignments from eight species of teleosts (Poecilia formosa, Xiphophorus maculatus, Oryzias latipes, Oreochromis niloticus, Tetraodon nigroviridis, Gasterosteus aculeatus, Astynax mexicanus, Danio rerio). Next, using maximum likelihood ratio tests, we selected genes that showed significant results (p<0.01) for both the "free-ratio" and "two-ratio" model comparisons of branch tests. We measured the signal across the entire gene and selected only those genes that were significant for the molly lineage for further biological inference measures. Our analysis revealed 351 genes (4.3%) as significantly diverged. Canonical pathway (KEGG) enrichment analysis found only two pathways of significance (p<.002). These two pathways, calcium signaling and neuroactive ligand-receptor interaction, both have unknown phenotypic relevance specific to P. formosa. Further assessments of particular nonsynonymous substitutions within the positively selected gene set revealed that 153 genes (43.6%) contained substitutions that made significant impacts on the biological function of the translated protein.
Among total shared loss-of-function variants (LoF) P. formosa has a two-fold higher number than either P. mexicana or P. latipinna (Supplementary Table 13) however, the sexual parents shared LoF variants across all samples from different geographical locations are likely due to false calls or faulty gene models with incorrect consensus bases in the reference causing frameshifts. Once LoF variants shared across all sexual parents within species are removed the occurrence of LoFs in P. formosa is lower than either of the sexual parents of LoF types (Supplementary Figure 13A). We find the average number of LoFs per P. formosa individual to be 984 while 1,085 and 1,167 for P. latipinna and P. mexicana, respectively. If we only focus on genes that experienced a likely more damaging event, stop gain or losses, we also see P. formosa has a lower count (Supplementary Table 13). We find a total of 63,241 non-synonymous shared variants within P. formosa compared to 13,588 and 16,174 within P. latipinna and P. mexicana, respectively ( Supplementary Figure 13 B). Contrary to the overall higher polymorphic sites in P. formosa, the average of homozygous non-synonymous base changes is ~10-fold lower in P. formosa. Should the asexual genome be carrying parental copies of each allele this observation would be expected if gene conversion was rare which we have confirmed. Average non-synonymous base substitutions, heterozygous or homozygous allelic state, is only slightly higher in P. formosa. A measure of neutral mutations in sexual species is the number of synonomous sites. In P. formosa we observe a much higher number of synonomous sites compared to the sexual parents ( Supplementary Figure 13 B). We attribute this finding to higher numbers of polymorphic sites at the time of hybridization and their neutral effect on genome fitness. We find no overlapping genes that share putative LoFs within species. Only a few P. formosa genes with putative LoFs (n=4) are shared with one of the parents. These putative LoFs assumed to be fixed in each species were possibly inherited by P. formosa at the time of it's hybridization.
A total of 254 genes were found to contain LoF variants within the P. formosa genome that upon evaluation for GO enrichment revealed significant GO annotation only for the biological processes category. Other categories, i.e. molecular, were not significant. We find eight genes that belong to the neuromuscular process grouping (Supplementary Table  32). P. formosa have no known difference in neuromuscular control compared to their sexual parental species. However, proof of this difference will require additional investigations.

Gene family evolution
In order to identify rapidly evolving gene families along the Amazon molly lineage we obtained peptides from Astyanax mexicanus (Cavefish), Danio rerio (Zebrafish), Tetraodon nigroviridis (Green spotted puffer), Gasterosteus aculeatus (Stickleback), Oreochromis niloticus (Tilapia), Oryzias latipes (Medaka), Xiphophorus maculatus (Platyfish) and Poecilia formosa (Amazon Molly) from ENSEMBL 76 4 . To ensure that each gene was counted only once, we used only the longest isoform of each protein in each species and all other isoforms were filtered out. We then performed an all-vs-all BLAST 5 search on these filtered sequences. The resulting e-values from the search were used as the main clustering criterion for the MCL program to group peptides into gene families 6 . This resulted in 13,667 clusters. Of these, all single-peptide clusters were filtered out, leaving 8,719 gene families.
We also constructed an ultrametric tree (Supplementary Figure 1) for these 8 species based on ENSEMBL's ortholog identifications. We aligned the nucleotide coding sequences of the 844 one-to-one orthologs with MUSCLE 7 . Then we used RAxML 8 to construct gene trees for each alignment. These trees were combined using the average consensus method 9 implemented in SDM 10 . This resulted in a single species phylogeny with branch lengths in terms of relative number of changes. To obtain branches in terms of absolute time the tree was smoothed using the r8s program. r8s uses a semiparametric, penalized likelihood approach to estimate substitution rates and divergence times across a phylogeny, given at least one calibration time 11 . We calibrated the times on the tree based on the divergence between Zebrafish and Stickleback at 265 million years from TimeTree 12 .
With the gene family data and ultrametric phylogeny as input, we estimated gene gain and loss rates (λ) with CAFE v3.0 13 . This version of CAFE is able to estimate the amount of assembly and annotation error (ε) present in the input data using a distribution across the observed gene family counts and a pseudo-likelihood search. CAFE is then able to correct for this error and obtain a more accurate estimate of λ. We find an ε of about 0.06, which implies that 6% of gene families have observed counts that are not equal to their true counts. After correcting for this error rate, we find λ = 0.0007. Using the estimated λ value, CAFE infers ancestral gene counts and calculates p-values across the tree for each family to assess the significance of any gene family changes along a given branch. Those branches with low p-values are said to be rapidly evolving. Variations in the number of copies of genes between species have also been proposed to be a driving force of evolutionary change 14 . Both the loss and gain of genes can be adaptive and various methods exist to quantify these changes 15 . With this in mind, we sought to classify gene family evolution in the P. formosa and several closely related fish species (See methods). A summary of all gene family changes for all species (n=8) measured in this study (Supplementary Table 33) show P. formosa has a comparatively higher number of rapid expansions (110 of 131) compared to sexual fishes. Given a P. formosa unique mode of reproduction, we sought molecular evidence of any rapid gene family turnover supporting this trait. After correcting for multiple testing (Dunn-Sidak correction), we find almost no evidence for rapidly evolving gene families linked to reproductive function. However, among the families without enriched GO terms we do observe a family of matrix metallopeptidases (family 75) which has functions related to reproduction 16,17 . Another family of Diaphanous homologs (681) was found to be rapidly expanding in P. formosa and is also associated with reproduction (GO:0007292). Finally, a family of highly conserved Hox proteins (37), which partition segments of the body during development 18,19 , was found to be rapidly contracting in the P. formosa.

Gene duplication
We found 14.8 Mb, ~1.97%, of the P. formosa genome, to be located in segmental duplications (SDs) across the 19 analyzed samples (Supplementary Table 34 We intersected the genomic coordinates of the shared duplications on a per species basis with the genomic coordinates of the P. formosa genes. We required at least 80% of the gene sequence to fall within a shared duplication to consider a gene as duplicated. This resulted in a total of 283 duplicated genes within SDs among all the P. formosa samples (Supplementary Table 35), 210 in the P. latipinna and 207 in the P. mexicana samples. Altogether, we find 131 genes to be in fixed duplications across all 3 species. In Supplementary Tables 16-21 we include the GO terms significantly enriched (corrected p-value < 0.05) in each of the GO categories analyzed. We find several GO-terms associated with transposition and DNA recombination to be significantly enriched in the P. formosa and both parental species SD regions. We retrieved the 11 unique genes in duplications that contribute to the terms "DNA integration" (GO:0015074), "transposition, DNA-mediated" (GO:0006313), "transposase activity" (GO:0004803) and "DNA recombination" (GO:0006310). We compared absolute copy number in these genes between the P. formosa samples and the parental species (Supplementary Figure 14), and find 3 genes to have significantly expanded copy number in the P. formosa samples (ENSPFOG00000001300, ENSPFOG00000004306, ENSPFOG00000021805, Wilcoxon test, p<0.05). ENSPFOG00000001300 and ENSPFOG00000004306 are Tc1 transposons of the same family and ENSPFOG00000021805 is related to Gypsy transposons. We suggest each most likely represents expansions of certain TEs, which, however, do not contribute to a higher total TE content of P. formosa compared to P. mexicana and P. latipinna. To clarify the putative adaptive role of these unique gene family expansions and contractions occurring within the asexual vertebrate genome, regardless of the molecular mechanism of duplication, further experimental study is required.

Supplementary Note 3: Copy number variation
The asexual reproduction mode of the Amazon molly requires that diploid oocytes are formed without meiosis. In order to assure that homologous chromosomes and not chromatids are separated in meiotic division I the kinetochores of each pair of homologous chromosomes must capture microtubules emanating from the opposite poles of the acentrosomal meiotic spindle. Thereby the bipolarity of the kinetochore of each chromosome has to turn into unipolarity. This process is regulated by a complex of proteins that delay microtubule-kinetochore (K-MT) attachment and homolog separation for hours until anaphase of meiosis I is reached 20 , while in mitosis K-MT capture is completed within minutes. Disturbance of this process leads to missegregation of chromosomes and can even turn meiosis I into a normal mitosis 2 1 . We find that several genes involved in meiotic K-MT interaction show copy number variation in the P. formosa genome (Supplementary Table 22). In particular, cyclin dependent kinase 1 (cdk1) is present in 3 copies, all of which are expressed in ovaries of P. formosa. Cdk1 has been shown to be an essential regular of the delay of K-MT capture and increasing Cdk1 activity in mouse oocytes resulted in premature stable attachments of the spindle 22 . CNV (2 copies or more) is also observed for the noncyclin protein Ringo/Speedy, which substitutes for cyclinB in activation of Cdk1 in oocytes 23 . The biorientation of chromosomes in cell division (bod1) gene, which encodes a kinetochoreassociated protein involved in regulation of K-MT attachment stability 24 , is present in two copies in the P. formosa genome, as well as other regulators of kinetochore function, including aurora kinase b (aurkb), and securin/pituitary tumor-transforming 1 (pttg1). As local lineage-specific gene duplications we also find genes involved in cytokinesis (sept6) and progression of meiosis (mos). Interestingly, compared to the parental species the gene encoding the centrosomal protein centlein (cntl) is under positive selection in P. formosa (Supplementary Table 15).
All genes that show CNV and cntl are expressed in P. formosa ovaries with higher levels in the previtellogenic, immature ovary (Supplementary Table 22).

Supplementary Note 4: Mitochondrial DNA analysis
To elucidate the evolutionary origin and in particular the number of hybridization events represented by the present day individuals of P. formosa we produced reference mitochondrial genomes of all three species by long PCR and Sanger sequencing and reconstructed the complete mitochondrial genomes of all population samples.
We first assembled and annotated the complete mitochondrial genomes of one paternal (P. latipinna, Pla_Tam-S, origin: North of Tampico,) and maternal (P. mexicana, Pme_Nue-S, origin: Rio Purification at Nueva Padilla) ancestor and two P. formosa (Pfo_Br-S, origin: Brownsville, Pfo_SM-S, San Marcos). The four complete mitochondrial genomes are deposited on GenBank under accession numbers KT175511-14.
Thirteen different mitochondrial haplotypes were found in the 20 investigated P. formosa samples (Supplementary Figure 2). Mitochondrial haplotypes within P. formosa differed in two to 16 bases (total alignment length 16,582bp) and were more closely related to the P. mexicana haplotype (14 to 26 differences) than to any of the P. latipinna haplotypes (1,137 to 1,145 differences) confirming P. mexicana as the maternal ancestor. Identical haplotypes were not only found at the same field site (haplotypes A, C, D) but also shared between individuals from different field sites (haplotypes B, E, F). We then calculated phylogenetic trees from the complete mitochondrial genomes of P. formosa, P. mexicana, and P. latipinna allowing for a sequence length of 16,587 bp available ( Figure 3) with phylogenetic parameters; GTR model of sequence evolution; BioNJ as a starting tree, combined subtree pruning and regrafting plus nearest neighbor interchange options for tree improvement; otherwise, default parameters (http://atgc.lirmm.fr/ phyml/ for details). This revealed substantial subclustering of the maternal ancestral P. mexicana, of which only the northern subspecies (taxonomically classified as P. mexicana limantouri) forms a sister clade of P. formosa from throughout its range. Importantly, all P. formosa are united in a highly supported single clade ( Figure 3) confirming a single maternal origin of P. formosa.
To make a new estimate on the age of the hybridization event that generated P.formosa complete mitochondrial sequences of poeciliid fish including Xiphophorus maculatus (GenBank NC011379.1), X. couchianus (KT594624.1), X. helleri (FJ226476.1), Gambusia affinis (NC_004388) and Poecilia reticulata (AB898687.1) were aligned with the newly obtained complete mtDNAs of P. formosa, P. mexicana and P. latipinna. Nonalignable parts of the control region (D-loop) and maximal 123 bp of an INDEL from each of the three Xiphophorus and Gambusia mitochondrial DNAs were cut out, resulting in a final alignment of 16607 bp.
We calculated Bayesian phylogenetic trees using BEAST v. 1.8.3 25 . From the recently published fossil-based, multi-species and multi-locus phylogeny by Helmstetter et al.  Supplementary Figure 1b), we used two calibration points as priors, provided by this reference as minimal ages: (i) Xiphophorus helleri and Gambusia affinis diverged from Poecilia reticulata 18.87 Mya and (ii) Gambusia affinis from Xiphophorus helleri 14.9 Mya, assuming normal distributions of the divergence time of 1 My around each node. We estimated the time to the most recent common ancestors of each taxon, P. latipinna, P. mexicana and P. formosa, assuming a yule birth rate (as most appropriate for speciation models) under a simple JC-substitution model, obtained with jModelTest 2 7 under a strict molecular clock with free variation of the substitution rate, starting with a prior of 1, using an MCMC chain of 10 million generations. All multiplesample taxa were considered to be monophyletic, except P. mexicana, the close mitochondrial (maternal) ancestor of P. formosa. In parthenogenetic reptiles mitochondrial segmental duplications of several kilobases were detected [29][30][31] and connected to the unisexual reproduction. Either the clonal and hybrid nuclear background may cause cyto-nuclear incompatibilities resulting in improper mtDNA replication 30,31 or the two-fold effective population size of mtDNA in parthenogens relative to a same-sized sexual population may tolerate additional genetic diversity including such duplications 32 . Finally, absence of males, normally selecting for accurate mtDNA-transcription (e.g. by energy-demanding sperm), may increase the persistence of deleterious mtDNA in all-female clones 29 . Our analyses of mitochondrial genomes of P. formosa revealed the absence of large mitochondrial duplications, indicating accurate mtDNA replication. Thus, the mitochondrial genome of P. formosalike the nuclear genome -does not show negative consequences of hybrid origin and asexual reproduction.

Supplementary Note 5: Gene conversion
To detect gene conversion events we analyzed all P. formosa samples for mutations at fixed different sites in the parental species. For each tentative gene conversion site, we counted the depth of sequencing coverage (Supplementary Figure 15). All the conversion sites in P. formosa demonstrate a coverage greater than 10, the genomewide average coverage for sequenced P. formosa isolates. This observation suggests these sites experienced true conversion events, rather than genotypic change from heterozygosity to homozygosity due to deletion of one allele (LOH). Assuming an ancestral heterozygous genotype at a site (e.g., A/G), a mutation event at this site would lead to six possible genotypes (A/T, A/A, A/C, T/G, G/G, or C/G), whereas a gene conversion event would result in two possible genotypes (A/A or G/G). Within our high quality SNP dataset consisting of 53,175 fixed differences we detected 725 such single base homozygous sites. Thus, gene conversions rather than mutations are more likely to be responsible for the homozygous genotype for either parental alleles at these fixed difference sites in P. formosa.
Nonetheless, in contrast to the widely observed GC-biased conversion in eukaryotes 33 , these gene conversion events are not biased towards G+C nucleotides at AT:GC heterozygous sites (268 AT→GC vs 301 GC→AT, chi square test p = 0.1665).
Within the fixed SNP differences between the parental species, we identified 706 tentative single-base conversion events that occurred in only one P. formosa isolate with at least 12 of the remaining isolates being heterozygous. We found homozygosity for 498 fixed sites that are present in more than 12 isolates and thus are assumed to be of ancient origin.
Considering the single F1 hybrid origin of P. formosa, such sites strongly indicate phases of selective sweep and/or a demographic bottleneck associated with genetic drift following gene conversion events during early evolution of P. formosa after it arose from hybridization of the parental species.
To obtain an estimate on the overall gene conversion rate we considered the 706 singlesite conversion events from the total 53175 fixed-difference sites. Assuming an origin of P. formosa 100, 000 years ago (4 generation/year), this results in (706/53175)/(100,000 * 4) = 3.32 E-8 per site per generation.
To test the hypothesis that gene conversion is low in P. formosa due to the high heterozygosity of its hybrid genome we calculated Fst as an estimate of divergence between the two parental species for each scaffold, where we found conversion events (Supplementary Figure 16). We find no correlation of divergence and conversion event frequency, and thus no support for this explanation of paucity of gene conversions.

Supplementary Note 6: Paternal introgression
To determine paternal introgression we assessed allele imbalance at heterozygous sites across all scaffolds larger than 100Kb. We then studied the distribution of the reference allele frequency across samples and scaffolds. We identified putative paternal introgression events as bimodal distributions that result from the presence of an extra copy inherited from the male host, what makes the ratio of reads at heterozygous sites deviate from the theoretical 0.5. We confirmed that the predicted copy number throughout the affected scaffold was higher and continuous in the samples where the paternal introgression was detected. We did not observe any evident structural changes in the introgressed genomic material (Supplementary Figure 17). For further confirmation we determined whether the observed imbalance could be consistently traced back to the corresponding paternal species. Sequencing data from five P. mexicana and six P. latipinna was used to call SNPs. After filtering, we looked for fixed homozygous sites with opposing genotypes in the two species. Then these fixed homozygous sites were intersected with heterozygous variants called in each of the 20 P. formosa samples and for each site the percentage of reads carrying the allele present in P. mexicana was calculated. In the absence of paternal introgression, frequency should be ~50%. With introgression the imbalanced allele at heterozygous sites should match the allele present in the corresponding paternal species, and thus this frequency would deviate from 50%. In wild-caught isolates, we trace the introgressed sequence to the known male hosts being either P. latipinna or P. mexicana.
To evaluate the relief from genomic decay allowed by paternal introgression, we need to distinguish the following potential scenarios, since we do not know which effect on fitness a particular paternally introgressed sequence will have. To keep this analysis simple, we will pretend that all paternal introgressions are either selectively neutral, deleterious, or advantageous in the sense that they repair preexisting deleterious effects that had been caused by Muller's ratchet. The mechanistic scenario behind our model is as follows. We infer from the observed frequency of paternal introgressions in the natural population of about 25% (see above) that there's likely to be a steady state where paternal introgressions continue to occur and remain visible as B-chromosomes for some time until the B-chromosomes disappear again. Disappearance could mean complete loss without leaving a trace in the nuclear genome, or it could mean that the nuclear DNA has been replaced via gene conversion and thus a potentially beneficial effect of the paternal introgression has become permanent. We currently cannot determine which of these scenarios are most likely, and will therefore consider all of them. These scenarios suggest changing the parameter estimates that were presented in Loewe and Lamatsch 2008 as follows (previous values reused here, unless changes have been explicitly stated).
A. No or neutral paternal introgression. This plot is exactly as the corresponding plot in Loewe and Lamatsch 2008 for all parameter estimations, except that the overall age of the Amazon molly as indicated by the horizontal gray line at about 100,000 years in the past, has been adjusted to the results of this present paper. This plot is shown as a baseline with and without individual based simulation results (for comparison) and represents the case where either no paternal introgression occurs or all paternal introgression is effectively selectively neutral. B. Paternal introgression as exclusively deleterious. Same parameter combination as for A; except that the effective population size N e has been reduced by 25%, assuming that all Amazon molly in natural populations observed to carry B-chromosomes will ultimately go extinct, due to deleterious effects caused by these paternal introgressions. As argued elsewhere 34,35 , the N e most relevant for Muller's ratchet analyses is the N e that results from removing all individuals that carry strongly deleterious mutations, which will ultimately be removed deterministically by background selection. This is the case even in asexual populations 36 . Thus, reducing N e estimates by removing the fraction of individuals with potentially deleterious paternal introgressions seems to be the most appropriate approach. C. Paternal introgression as exclusively advantageous. Same parameter combination as for A; leaving N e unchanged, we now reduce the deleterious mutation rate U sdm , assuming that there is a constant flux of paternally introgressing genes that repair previously existing deleterious mutations accumulated from Muller's ratchet. To estimate this flux, we assume a steady state quasi-equilibrium that results in the observed frequency of B-chromosomes and most importantly allows us to assume that the total number of introgression events per generation per individual is effectively as frequent as the overall loss of B-chromosomes per generation per individual. Applying a logic similar to that used for quantifying neutral evolution, we translate observations of 1 B-chromosome loss per 10 or 100 generations in the lab into assumed estimates of 0.1 or 0.01 paternal introgressions per individual per generation. It is difficult to see how paternal introgression rates could be much higher in natural environments. These rates do not directly translate into advantageous mutations, because only 1% or 0.1% of the genomic DNA is transferred in a single introgression event, and might not include the sites required for a particular repair. Thus, the probability that a particular introgression will repair a particular deleterious mutation is given by the ratio of the size of the paternal introgression over the size of the genome. As a result, the overall probability of repairing damage from the initial operation of Muller's ratchet in the Amazon molly is at most 0.001 per genome per generation -as inferred from an upper limit of 0.1 paternal introgressions per genome per individual * 0.01 advantageous ratchet repairing introgressions out of all introgressions. The latter fraction of advantageous introgressions might increase substantially after many slightly deleterious mutations have accumulated throughout the genome, because then the probability of repairing at least 1 of them, will substantially increase. A full analysis of this and other related potential solutions to the genomic decay paradox in the Amazon molly (see 34 ) certainly needs further in-depth analysis. D. Paternal introgression as mostly deleterious and occasionally advantageous. Same parameter combination as in A; except that we now combine the cases B and C, and reduce N e and the mutation rate U sdm accordingly.
The corresponding U-shaped plots for the parameter estimations defined above can be found in Supplementary Figure 7, panels A -D. They show that paternal introgression has a rather small impact on the initial rate of Muller's ratchet in the Amazon molly. While this might change for Amazon molly genomes in advanced stages of genomic decay, it does not seem that our present understanding of paternal introgression reduces genomic decay processes so much that the absence of molecular signatures of decay as reported in this paper is easily explained by paternal introgression. This conclusion is not likely to be changed if a model of paternal introgression can be justified that merely modulates extinction time by some linear combination of introgression rate and introgressing genome size. The predicted U-shaped plots of such models merely move all curves slightly up the y-axis; the resulting effects are minor because the y-axis is logarithmic.

Supplementary Note 7: Heterozygosity analysis
Based on our collection of 293,324 SNPs we find abundance of medium frequency variants and no excess of low and/or high frequency polymorphisms (Tajima's D P. formosa 2.37, P. mexicana 0.82, P. latipinna 0.75).
From the SNP dataset we calculated nucleotide diversity (π) within P. formosa, P. mexicana and P. latipinna. We find a higher π for P. formosa (0.324) than for P. latipinna (0.2) and P. mexicana (0.225). These values are higher than what is usually expected from whole genome level analysis, because only the variable sites, which make up the SNP dataset are considered. Although π is usually/conventionally measured as a "virtual heterozyogosity" assuming random mating, which is not applicable to the mating system of the Amazon molly, it provides another indication for the high heterozygosity maintained in the asexual species.
Using the same SNP dataset, we reconstructed haplotypes for all sequenced isolates of P. formosa and the parental species. Taking advantage of the large amount of fixed nucleotide differences between the two parental species, the genotypes for P. formosa at such fixed different sites can be accordingly phased into two parental haplotypes, whereas the genotypes at other sites, where fixed difference is not detected, are phased randomly. For the two parental species, the alleles at a site are randomly phased to reconstruct two haplotypes. A neighbor-joining phylogenetic analysis with 100 bootstrap replicates revealed two strongly supported clades, one clade consisting of parental P. latipinna haplotypes and P. formosa haplotypes derived from P. latipinna and the other clade containing parental P. mexicana haplotypes and P. formosa haplotypes derived from P. mexicana (Supplementary Figure 4).

Supplementary Figures
Supplementary Figure 1 Ultrametric tree with branch lengths in millions of years for the 8 fish species used in this study. Branch lengths appear above the branch while the number of families that have expanded (green) and contracted (red) along that branch appear below it. Time-calibrated Bayesian phylogenetic tree obtained with BEAST v.1.8.3. Using independent previously published minimum age estimates for divergence of Gambusia affinis from Xiphophorus hellerii and the divergence of both from Poecilia reticulata for calibration the origin of Poecilia formosa is dated to at least 100 000 years ago. Bluish bars on nodes represent 95% confidence intervals.

Supplementary Figure 4
Neighbor-joining phylogeny (unrooted) of the reconstructed haplotypes of P. formosa, P.latipinna, and P.mexicana using uncorrected nucleotide divergence (292,324 sites). Numbers next to the tree represent bootstrap values for major clades. Scale bar indicates 0.05 divergence per site. The notation for each haplotype consists of the sample name followed by either 0 or 1 that indicate two haplotypes within each isolate, respectively. Phylogenetic tree reconstruction using maximum likelihood yielded the same topology.

Supplementary Figure 5
Allele specific gene expression in P. formosa liver (A), skin (B) and gills (C). Numbers in blue are genes with expression biased for the P. latipinna allele and number is red are genes with expression biased for the P. mexicana allele.

Supplementary
Selection coefficient U-Shaped plots quantifying the impact of Muller's ratchet in the presence of paternal introgression in the Amazon molly. U-shaped plots are a way to quantify the danger of extinction caused by Muller's ratchet in asexual populations. Briefly, they use a multiplicative fitness model and an assumed maximal reproduction capacity to predict the number of clicks Muller's ratchet might need to drive the population to extinction by mutational meltdown. This number of clicks is then multiplied with the time it takes Muller's ratchet to click. This click time is predicted from mutation rate (U sdm ) the selection coefficient (s, on x-axis), and the effective population size (N e , indicated indirectly on the x-axis as 1/s). Given the many uncertainties associated with these analyses, lower and upper limits are used for all respective values; thus, a range of s, U sdm values need to be considered for appropriately interpreting this plot. For more details, see 34 . The five panels shown focus on the main topic of this study by exploring the four potential roles of paternal introgression that are discussed next. In addition to the analytic predictions shown for all other panels, the first panel also shows related individual-based simulation results to clarify expectations for the range of parameter values where the analytic predictions break down.
(A) Basic scenario of no introgression or purely neutral introgression effects. Plots correspond to 35 except for adjustments in the age of the Amazon molly. Predicted extinction time (T ex ) (estimated as in 35 ) from the largest to the smallest conceivable effective population size, N e (10 7 to 10 4 ). Lines indicate the predicted extinction time for different deleterious genomic mutation rates (U sdm ) with N e =316,000, generation time T gen =1 yr and maximal reproductive capacity R max =500 offspring/generation. The thin orange (U sdm =0.5) and green (U sdm =0.05) lines represent the variability of the extinction time for the lower and upper credible deleterious mutation rate estimates as taken from 35 (U sdm =0.05, 0.5) using the corresponding upper and lower limits for N e (10 7 ,10 4 ; see bar indicating the range of s values that correspond to the populations N e values and are chosen such that the product N e * s = 1), T gen (2.5, 0.25) and R max (2,000, 50). The horizontal grey line marks the age of the Amazon molly lineage (100,000 years). (A1) This panel was generated by parameter combination A as described above; it includes all evolution@home individual-based simulation results, which were also shown in 35  Cumulative distribution of 36 base pairs kmers mapping approach applied to detect potentially hidden repeats. Figure 13 (A) Total LoFs per individual across species by variant classification. All LoF variants shared across all samples for the parental genomes were removed as these likely constitute false positives due to frameshifted gene models due to indels and very low likelihood these variants were fixed in the diverse populations selected.

Supplementary
(B) Total non-synonymous and synonomous variants estimated within the asexual and sexual parental genomes.

Supplementary Figure 15
Histogram of sequence coverage of tentative gene conversion events in P. formosa. The red vertical line indicates the 10x sequence coverage.

Supplementary Figure 17
Copy number in 1kb repeat-free windows along scaffold KI519701.1. Each panel corresponds to a different sample. Each dot represents the copy number in one 1kbrepeat-free window. Tracks for gaps, repeats, genes and segmental duplications (SDs) are included at the top of each panel. In grey, the samples with no introgression are shown. In teal, the sample (Pfo_Tan-2) carrying the introgressed scaffold (KI519701.1) is depicted. We observed an increased and constant copy number (~3) throughout the entire scaffold (KI519701.1) in sample Pfo_Tan-2, with no evidence of duplications/deletions. The same result was obtained for all other introgressed scaffolds.

Supplementary Tables
Supplementary Table 1